An overview of Lucene analyzers

The analyzer is the core component of the indexing process. Your query string also has to be analyzed before it is searched. Analyzing a query string is usually called parsing, but the internal mechanism is the same: both need to split the text into a token stream.
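
To make "split the text into a token stream" concrete, here is a minimal sketch in plain Java. It is not Lucene's Analyzer; the class name ToyAnalyzer and its splitting rule (break on anything that is not a letter or digit, then lowercase) are assumptions chosen only for illustration. Real analyzers may also remove stop words, apply stemming, and record token positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy analyzer, NOT Lucene's Analyzer class.
final class ToyAnalyzer {
    // Splits text on runs of characters that are neither letters nor digits,
    // and lowercases each resulting token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first piece when text starts with a delimiter
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

The same method could be applied to a document at index time and to a query string at search time, which is why both sides end up with comparable tokens.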

What is a term vector in Lucene

Lucene's Javadoc defines a term vector as follows: "A term vector is a list of the document's terms and their number of occurrences in that document." This indicates that each document has one term vector, which is a list.

A term is the basic searchable unit in Lucene. In the analysis and indexing phase, text is broken into streams, and each element of a stream is a term. In the query phase, the query is first parsed into terms, which are then used to query the Lucene index.
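
The quoted definition (terms plus their number of occurrences in one document) can be sketched as a simple map. This is a plain-Java illustration, not how Lucene stores term vectors; ToyTermVector and its tokenization rule are assumptions made for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a term vector: term -> occurrences in ONE document.
final class ToyTermVector {
    static Map<String, Integer> termVector(String documentText) {
        // TreeMap keeps the terms sorted, which makes the result easy to read
        Map<String, Integer> counts = new TreeMap<>();
        for (String piece : documentText.split("[^\\p{L}\\p{N}]+")) {
            if (piece.isEmpty()) {
                continue;
            }
            // add 1 to the count for this term, starting from 1 if unseen
            counts.merge(piece.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }
}
```

For the text "to be or not to be", the map holds four terms, with "to" and "be" each counted twice.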

How to search an index with the Lucene API

In the last post, we introduced some important classes of Lucene's index-writing components. The other component is searching the index.

The Lucene 4 API has changed a lot. IndexSearcher needs an instance of IndexReader; you get this instance by calling the static method open of DirectoryReader. The Directory is the same index data storage that was used when writing the index.
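
The dependency chain here (storage, then a reader opened from it, then a searcher built on the reader) can be sketched without Lucene on the classpath. Everything below is a hypothetical stand-in: ToyDirectory, ToyReader, and ToySearcher are invented names that only mirror the shape of Directory, DirectoryReader.open(directory), and new IndexSearcher(reader). They do not reproduce Lucene's behavior, scoring, or file format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the storage the index was written to.
final class ToyDirectory {
    final Map<String, List<Integer>> postings = new HashMap<>(); // term -> document ids

    void add(String term, int docId) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }
}

// Hypothetical read-only view, created through a static factory method.
final class ToyReader {
    private final Map<String, List<Integer>> postings;

    private ToyReader(Map<String, List<Integer>> postings) {
        this.postings = postings;
    }

    static ToyReader open(ToyDirectory directory) {
        return new ToyReader(directory.postings);
    }

    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptyList());
    }
}

// Hypothetical searcher: it cannot be constructed without a reader.
final class ToySearcher {
    private final ToyReader reader;

    ToySearcher(ToyReader reader) {
        this.reader = reader;
    }

    // Returns the ids of documents containing the exact term.
    List<Integer> search(String term) {
        return reader.docsFor(term);
    }
}
```

A caller would write new ToySearcher(ToyReader.open(directory)), passing the same directory object that was filled at indexing time.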

How to write an index with the Lucene 4 API

Solr is based on Lucene but provides a web user interface. You specify the fields in an XML configuration file and let Solr call the Lucene API to index these fields.

This post demonstrates how Solr does it under the hood. We use the Lucene API directly to index information.
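
As a rough picture of what "indexing fields" means, the sketch below builds a tiny in-memory inverted index keyed by field and term. It is plain Java under stated assumptions, not the Lucene API and not what Solr actually executes: ToyIndexWriter, the "field:term" key format, and the tokenization rule are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index writer, NOT Lucene's IndexWriter.
final class ToyIndexWriter {
    // "field:term" -> ids of documents whose field contains the term
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Indexes every field of one document and returns the id assigned to it.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String piece : field.getValue().split("[^\\p{L}\\p{N}]+")) {
                if (piece.isEmpty()) {
                    continue;
                }
                String key = field.getKey() + ":" + piece.toLowerCase(Locale.ROOT);
                List<Integer> docs = postings.computeIfAbsent(key, k -> new ArrayList<>());
                // record each document at most once per field/term pair
                if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                    docs.add(docId);
                }
            }
        }
        return docId;
    }

    Map<String, List<Integer>> postings() {
        return postings;
    }
}
```

Each document is passed as a map from field name to field text, which loosely corresponds to the fields declared in a configuration file.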
