I was trying to read UTF-8 text from a file in Java, but couldn't figure out a simple, straightforward way to do it without some googling.

By comparison, this is very easy in PHP:

    $test = file_get_contents("file.txt");

In PHP, a string is just a byte array, so the function can read the file's raw data into memory and reference it with a variable.

Java deals with strings very differently. Every Java string is Unicode text (stored internally as UTF-16). Java also has byte arrays, but a byte array is not a string; a string must be a String object.

A file is by nature a byte array, even if it is a text file. To get a String from a file, Java has to decode the bytes with the appropriate encoding.
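
To make the bytes-versus-String distinction concrete, here is a minimal sketch. The class name `DecodeDemo` and its two helper methods are made up for illustration; the only library calls are the `String(byte[], Charset)` constructor and `StandardCharsets` (available from Java 7). It decodes the same two bytes with two different encodings and gets two different strings.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode raw bytes as UTF-8 text.
    public static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same bytes as ISO-8859-1, where every byte maps to one character.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with an acute accent).
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(raw).length());   // one character
        System.out.println(decodeLatin1(raw).length()); // two characters
    }
}
```

The bytes are identical in both calls; only the chosen encoding decides which characters end up in the String.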

I could not find a one-call equivalent of file_get_contents, so here is my version.

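    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets; // Java 7+

    public static String readFileString(String file) {
        StringBuilder text = new StringBuilder();
        // try-with-resources (Java 7+) closes the reader and the underlying stream,
        // even if reading fails part way through.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator, so every line ending is
                // normalized to "\r\n" here, including one after the last line.
                text.append(line).append("\r\n");
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException; the caller gets whatever was read so far.
            e.printStackTrace();
        }
        return text.toString();
    }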
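
On Java 7 or later, a shorter route may be to read all the bytes at once and decode them in one step. The sketch below is an alternative, not the version above: the class name `ReadWholeFile` is made up for illustration, and it assumes the file fits comfortably in memory. It uses `Files.readAllBytes` and the `String(byte[], Charset)` constructor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholeFile {
    // Read the entire file into memory and decode it as UTF-8.
    // Unlike the line-by-line version, line endings are kept exactly as stored,
    // and I/O errors are propagated to the caller instead of being printed.
    public static String readFileString(String file) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(file));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Round trip through a temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.write(tmp, "caf\u00e9\nna\u00efve".getBytes(StandardCharsets.UTF_8));
            System.out.println(readFileString(tmp.toString()));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether this is preferable depends on the use case: it is shorter and preserves the original line endings, but the line-by-line reader is the safer choice when line endings should be normalized.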

Suppose you are adding a document to a Lucene index, and you want to read the text from a file and add it to the Lucene document.

        Document doc = new Document(); // create a new document

        // Field type for the title: indexed, tokenized, and stored, with positions,
        // offsets, and term vectors.
        FieldType type = new FieldType();
        type.setIndexed(true);
        type.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        type.setStored(true);
        type.setStoreTermVectors(true);
        type.setTokenized(true);
        type.setStoreTermVectorOffsets(true);
        Field title = new Field("title", "How to read UTF8 text file into String in Java", type);

        // The content field holds the decoded text of the file.
        Field content = new TextField("content", readFileString("c:\\tmp\\content.txt"), Field.Store.YES);
        doc.add(title);
        doc.add(content);

        // indexWriter, ramDirectory, and config are assumed to be declared elsewhere.
        indexWriter = new IndexWriter(ramDirectory, config);
        indexWriter.addDocument(doc);
        indexWriter.close();