In "What is Lucene index", we mentioned that any raw data has to be modeled as a document, which consists of fields.

This post takes a look at how to define fields in Solr's schema.xml.

Schema

The schema in Solr is much like a schema in a relational database, and a field is like a column. But Solr's schema is much more flexible than an RDBMS schema: documents can have different sets of fields, and fields of the same type can be declared with different options.

Another difference is that documents in Solr have a flat structure, which means they cannot have foreign keys or recursive structures.

Fields

Fields are similar to database table columns. Each field has a type, which is defined by a fieldType element in schema.xml.

 
 
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
 
 

A type has at least a name and an implementing class. The name is referenced by the type attribute of a field element.

 
<field name="name" type="string" indexed="true" stored="true"/>
 

You can define almost any field type you want, unlike a relational database, which has a fixed set of types to choose from.
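As a rough sketch of what a custom type can look like, the fragment below defines a type for exact but case-insensitive matching. The type name string_ci and the field name brand are made up for illustration; the tokenizer and filter classes ship with Solr.

```xml
<!-- Hypothetical type "string_ci": KeywordTokenizerFactory keeps the whole
     value as a single token, LowerCaseFilterFactory lowercases it. -->
<fieldType name="string_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the type above through its type attribute. -->
<field name="brand" type="string_ci" indexed="true" stored="true"/>
```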

Each field can have options specified by attributes; the most common options are indexed and stored.

indexed: set to true if the field should be searchable or sortable. The value of the field will be analyzed and indexed.

If a field is not indexed, it should be stored. A field that is stored but not indexed cannot be searched; it is only included in search results.

stored: set to true if the field should be retrievable, that is, eligible for inclusion in search results.

multiValued: set to true if the field may contain multiple values per document. For example, a book may have more than one author. In a relational database, defining multiple columns for the same data is awkward, but in Solr it is encouraged.
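The three options can be combined freely. The sketch below shows a few combinations; the field names title, body, thumbnail_url and author are hypothetical, and the types are assumed to be defined as in the example schema.

```xml
<!-- Searchable and returned in results. -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Searchable, but the original text is not returned in results. -->
<field name="body" type="text_general" indexed="true" stored="false"/>

<!-- Returned in results, but cannot be searched or sorted on. -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>

<!-- May hold several values in one document. -->
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
```

A document posted in Solr's XML update format would then simply repeat the multiValued field once per value (the values here are placeholders):

```xml
<add>
  <doc>
    <field name="id">book-1</field>
    <field name="title">An example book</field>
    <!-- Two values for the same multiValued field. -->
    <field name="author">First Author</field>
    <field name="author">Second Author</field>
  </doc>
</add>
```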

The schema.xml of the example in the Solr release describes the field options in detail.

 
F:\setup\jar\solr-4.2.1\example\solr\collection1\conf\schema.xml
 

More examples:

 
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
 
   <field name="weight" type="float" indexed="true" stored="true"/>
   <field name="price"  type="float" indexed="true" stored="true"/>
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />
 
   <field name="store" type="location" indexed="true" stored="true"/>
 

More field type examples:

 
 
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
 
 
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
 
 
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
 
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
 
 

Dynamic fields

Dynamic fields make the schema even more flexible compared to a relational database.

 
<dynamicField name="*_dt" type="date" indexed="true"  stored="true"/>
 

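If a document contains a field that does not match any explicit field definition, Solr looks it up in the dynamic field definitions. With the definition above, a field whose name ends with _dt is considered a valid field and is handled with the date type and the options given there.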
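To illustrate, here is a minimal sketch of a document that relies on the *_dt pattern. The field names created_dt and updated_dt are hypothetical and are not declared anywhere in the schema; the sketch also assumes that the date type is defined as in the example schema, which expects values in the form 2013-04-01T00:00:00Z.

```xml
<add>
  <doc>
    <!-- "id" matches an explicit field definition. -->
    <field name="id">doc-1</field>
    <!-- No explicit definition exists for these two names; both end
         with _dt, so they are matched by the dynamicField pattern. -->
    <field name="created_dt">2013-04-01T00:00:00Z</field>
    <field name="updated_dt">2013-04-15T12:30:00Z</field>
  </doc>
</add>
```

A field whose name matches neither an explicit definition nor any dynamic pattern would be rejected when the document is indexed.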