Automatically generate an id for Solr documents

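Solr documents are flexible: different documents can have different fields. But every Solr document should have an id field, and its value must be unique. Without a unique key field in the schema, Solr (at least with the example configuration, which enables QueryElevationComponent) fails at startup and prints this error.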

 
org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField.
 

MongoDB is one NoSQL database that gets this right. It has a built-in unique id mechanism: every BSON document gets an `_id` automatically, with no configuration needed.

We all know that relational databases usually have a unique id field as the primary key, and that it is usually configured to auto-increment. Solr is more like a NoSQL system, but it still needs a similar mechanism, because almost any system needs a way to uniquely identify an object. In MySQL, it looks like this.

 
DROP TABLE IF EXISTS `mytable`;
CREATE TABLE `mytable` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    -- ....
    PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
 
 

We may query a database by id, but in a search engine the id is almost useless for searching. You will rarely need to search on the id field, so it is better to let the system generate it for us.

The unique id field is tremendously useful when you need to update or delete a document, because regular searches in Solr are term based and cannot be relied on to identify exactly one document. It is quite common for Solr beginners to ignore this issue and only run into trouble when they first need to delete or update a document.
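To make the difference concrete, here is a minimal sketch in Python (not part of the original setup) that builds the two kinds of delete command in Solr's JSON update format. The helper names and the sample id are hypothetical and only illustrate the idea.

```python
import json


def delete_by_id_command(doc_id):
    """Build a delete command that targets one document by its unique key."""
    return json.dumps({"delete": {"id": doc_id}})


def delete_by_query_command(query):
    """Build a delete command that removes every document matching the query."""
    return json.dumps({"delete": {"query": query}})


# Hypothetical id value, for illustration only.
print(delete_by_id_command("example-id-123"))
# A term-based query may match more documents than intended.
print(delete_by_query_command("name:solr"))
```

The first command can only ever affect the one document with that id, while the second affects whatever the query happens to match.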

If you are using the Solr 4 example configuration, just add this to solrconfig.xml.

 
<updateRequestProcessorChain>
    <processor class="solr.UUIDUpdateProcessorFactory">
        <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
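
Conceptually, the UUID processor in this chain fills in the configured field for any incoming document that does not already have a value there. The Python sketch below is only a mental model of that behavior, not Solr's actual code. The function name `add_missing_id` is hypothetical, and the use of a random (version 4) UUID is an assumption made for the illustration.

```python
import uuid


def add_missing_id(doc, field_name="id"):
    """Return a copy of doc, adding a random UUID if field_name has no value."""
    result = dict(doc)
    if result.get(field_name) is None:
        # Assumption: a random UUID string stands in for the generated id.
        result[field_name] = str(uuid.uuid4())
    return result


# A document without an id gets one; an existing id is left alone.
print(add_missing_id({"name": "no id yet"}))
print(add_missing_id({"id": "my-own-id", "name": "already has an id"}))
```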
 

Make sure schema.xml contains these fields.

 
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
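
With both files in place, documents can be sent without an id. The sketch below builds such an update request using only the Python standard library, but does not send it. The URL is an assumption (a local Solr 4 example server with the default `/update` handler accepting JSON), so adjust the host, port, and core path for your setup. The helper name `build_add_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: a local Solr 4 example server; adjust host, port, and core path.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_add_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON update request for a list of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# No "id" field here: the update chain is expected to fill it in.
request = build_add_request([{"name": "first document"}, {"name": "second document"}])
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

After sending and committing, a query such as `q=*:*` should show each document with a generated id, provided the update chain above is the one in effect.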