Lucene field, StringField vs TextField

A string and a piece of text: what's the difference? From a programmer's point of view, there is none.

Read a text file into Java and you always get a String.

In Lucene they are different, even if the difference is not obvious at first. A string is a single unit that is not supposed to be split or analyzed: an id, an email address, a URL, a date, and so on. The string itself is the term.

Text is content: an article, a post, a document, anything meant to be read by a human. This is what you want to index and search. It should be analyzed, indexed and optionally stored. It makes sense to wrap all of these properties in one abstraction, and that is what TextField is for: a sugar class.
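To make "the string itself is a term" versus "text gets analyzed" a bit more concrete, here is a toy sketch in plain Java. It does not use Lucene at all: the class TermSketch and both methods are hypothetical names made up for this illustration, and the lowercase-and-split logic is only a crude stand-in for what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy illustration only: these helpers are hypothetical, not Lucene classes.
class TermSketch {

    // "String" style: the whole value is kept as one term, untouched.
    static List<String> asSingleTerm(String value) {
        return Collections.singletonList(value);
    }

    // "Text" style: a crude stand-in for an analyzer that lowercases the
    // value and splits it on anything that is not a letter or a digit.
    static List<String> asAnalyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        String lowered = value.toLowerCase(Locale.ROOT);
        for (String piece : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String email = "Jane.Doe@example.com";
        System.out.println(asSingleTerm(email));    // one term, exactly as given
        System.out.println(asAnalyzedTerms(email)); // several lowercased pieces
    }
}
```

With the first method an email address can only be matched as a whole, which is what you want for an id. With the second, the address is broken into pieces and the original value can no longer be matched exactly as one term.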

Using the primitive Field class directly is usually unnecessary. If you know what you want, you can almost always find a sugar class for it in the package org.apache.lucene.document.

Suppose you want a field that represents the unique id of the document. How do you define it? This is what I did at first:

 
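        String id = UUID.randomUUID().toString();
        // stored only, so the value can be read back
        doc.add(new StoredField("id", id));
        // raw Field carrying the id as bytes, using StringField's field type
        doc.add(new Field("id", new BytesRef(id), StringField.TYPE_STORED));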
 
 

I added two fields because the first one is stored but not indexed, and I assumed the second one would be indexed without storing the value.

This is exactly what StringField is for: a field that is indexed but not analyzed, and stored if you ask for it.

 
        doc.add(new StringField("id", id, Field.Store.YES));
 

For beginners, the syntax of the Lucene field definition API can be very confusing, and it changes dramatically from version to version.