How to do term query in Lucene index example

The term is the basic unit of indexing and searching in Lucene. To query a term, you first create an index and add data to it, then build a query object that contains the terms you want to search for. This example illustrates how to run a term query in Lucene. To learn more about terms, see What is Lucene Term.

This example uses Eclipse and the Gradle build tool. The Lucene version used here is 5.3.0.

Step 1. Create Gradle project

In Eclipse, go to File -> New -> Gradle project to create a new Gradle project, and select the Java Quickstart sample project.

This project uses Lucene 5.3.0, so modify the build.gradle file as shown below.

 
apply plugin: 'java'
apply plugin: 'eclipse'
 
 
ext.luceneVersion= "5.3.0"
 
sourceCompatibility = 1.7
version = '1.0'
jar {
    manifest {
        attributes 'Implementation-Title': 'Gradle Quickstart', 'Implementation-Version': version
    }
}
 
repositories {
    mavenLocal()
    mavenCentral()
}
 
dependencies {
    compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
 
    compile "org.apache.lucene:lucene-core:${luceneVersion}"
    compile "org.apache.lucene:lucene-analyzers-common:${luceneVersion}"
    compile "org.apache.lucene:lucene-queryparser:${luceneVersion}"
 
    testCompile group: 'junit', name: 'junit', version: '4.+'
}
 
test {
    systemProperties 'property': 'value'
}
 
uploadArchives {
    repositories {
       flatDir {
           dirs 'repos'
       }
    }
}
 
 


Step 2. Create a Lucene index in memory

First, add a new class to the project: TermQueryExample.java.

To search for a term in Lucene, we must have an index. An index is like a database, but unlike a relational database, which saves your data in tables and uses SQL to retrieve it, a Lucene index stores rich documents in an inverted index format.
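To make the idea of an inverted index concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's actual implementation: the class name ToyInvertedIndex, its methods, and the naive lowercase-and-split analysis are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index: each term maps to the ids of the documents that contain it.
// Illustration only; Lucene's real index structures are far more elaborate.
class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();
    private int nextDocId = 0;

    // Adds a document and returns the id assigned to it at index time.
    int addDocument(String text) {
        int docId = nextDocId++;
        // Naive analysis (an assumption of this sketch): lowercase, then split on non-alphanumerics.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<Integer>();
                postings.put(term, docs);
            }
            // Record each document at most once per term, even if the term repeats.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of the documents containing the given term, in insertion order.
    List<Integer> lookup(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? Collections.<Integer>emptyList() : docs;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("Lucene field boost and query time boost example");
        index.addDocument("How to do Lucene search highlight example");
        index.addDocument("What is term vector in Lucene");
        System.out.println(index.lookup("lucene")); // prints [0, 1, 2]
        System.out.println(index.lookup("boost"));  // prints [0]
    }
}
```

Looking up a term is a single map access that returns document ids directly, which is why an inverted index suits term search better than scanning every document.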

Storing the raw data in Lucene is optional. A common usage pattern is to keep the data in a traditional data store, such as a relational database or a NoSQL store like MongoDB, and keep only the index in Lucene. Lucene itself then holds just the primary id that points back to the data store. This is the so-called Not Index Here pattern.

To create an index in Lucene, you need several components: the analyzer, the index directory, and the index writer.

 
    public static Analyzer analyzer = new StandardAnalyzer();
    public static IndexWriterConfig config = new IndexWriterConfig(analyzer);
    public static RAMDirectory ramDirectory = new RAMDirectory();
    public static IndexWriter indexWriter;
 
 

The analyzer breaks your text into terms. The index directory is where the index data structures are stored; it can be flat files or in memory, and for demonstration this example uses an in-memory index. The index writer accepts documents, applies the analyzer to them, and then updates the index.

Each document has two text fields: the author and the title of a blog post.

 
    public static void createDoc(String author, String title) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("author", author, Field.Store.YES));
        doc.add(new TextField("title", title, Field.Store.YES));
 
        indexWriter.addDocument(doc);
    }
 

Now we can add documents to the index by calling createDoc from the following method:

 
    public static void createIndex() {
        try {
            indexWriter = new IndexWriter(ramDirectory, config);
            createDoc("Sam", "Lucene index option analyzed vs not analyzed");
            createDoc("Sam", "Lucene field boost and query time boost example");
            createDoc("Jack", "How to do Lucene search highlight example");
            createDoc("Smith", "Lucene BooleanQuery is depreacted as of 5.3.0");
            createDoc("Smith", "What is term vector in Lucene");

            indexWriter.close();
        } catch (IOException | NullPointerException ex) {
            System.out.println("Exception : " + ex.getLocalizedMessage());
        }
    }
 
 

Step 3. Search and display results

There are many ways to query a term in a Lucene index, so we need a common routine that runs a search query and prints the result list, as the following method shows:

 
    public static void searchIndexAndDisplayResults(Query query) {
        // try-with-resources closes the reader once the search is done
        try (IndexReader idxReader = DirectoryReader.open(ramDirectory)) {
            IndexSearcher idxSearcher = new IndexSearcher(idxReader);

            // Fetch at most the 10 best-scoring matches
            TopDocs docs = idxSearcher.search(query, 10);
            System.out.println("length of top docs: " + docs.scoreDocs.length);
            for (ScoreDoc doc : docs.scoreDocs) {
                // Load the stored fields of the matching document
                Document thisDoc = idxSearcher.doc(doc.doc);
                System.out.println(doc.doc + "\t" + thisDoc.get("author")
                        + "\t" + thisDoc.get("title"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
 

Step 4. Search a single term

With everything set up, we can now query the index. The most basic query searches for a single term. To find which titles contain the term "title:lucene" or "title:example", run the following code from the main function:

 
 
    public static void searchSingleTerm(String field, String termText){
        Term term = new Term(field, termText);
        TermQuery termQuery = new TermQuery(term);
 
        searchIndexAndDisplayResults(termQuery);
    }
 
 
    public static void main(String args []) {
        createIndex();
        searchSingleTerm("title","lucene");
        searchSingleTerm("title","example");
        searchSingleTerm("title","Example");
        searchSingleTerm("title","lucene example");
        ramDirectory.close();
    }
 
 

The output

 
length of top docs: 5
3    Smith    Lucene BooleanQuery is depreacted as of 5.3.0
4    Smith    What is term vector in Lucene
0    Sam    Lucene index option analyzed vs not analyzed
1    Sam    Lucene field boost and query time boost example
2    Jack    How to do Lucene search highlight example
length of top docs: 2
1    Sam    Lucene field boost and query time boost example
2    Jack    How to do Lucene search highlight example
length of top docs: 0
length of top docs: 0
 
 

The first column is the document id, which is assigned to each document at index time; it is followed by the fields of the document. The line before each result list shows the number of matching documents returned (at most 10 here).

Note that TermQuery is case sensitive: it looks up the term exactly as given, without analyzing it. Because the standard analyzer lowercases all terms at index time, a search for "Example" has no matches. Text is also broken into words, so there is no term such as "title:lucene example" in the index. This is why the last two queries did not match any documents.

TermQuery is mostly used to build complex queries programmatically, in combination with other queries; it is not meant to take raw user input. To accept user input we need a query parser, which analyzes the input and turns it into various query objects. A term in the input may first be converted to a TermQuery object and then combined with the other terms in the query.
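The difference between an exact term lookup and an analyzed query can be sketched in plain Java. This is a toy model under stated assumptions: ToyTermLookup and its methods are hypothetical names, and analyze() is a simplified stand-in for an analyzer, not StandardAnalyzer itself.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of why an unanalyzed term lookup is case sensitive.
class ToyTermLookup {
    // Lowercases the text and splits it into words, roughly what happens at index time.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<String>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return terms;
    }

    // Exact lookup: the query text is compared as-is, with no analysis.
    static boolean matchesRawTerm(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    // Analyzed lookup: the query text is analyzed first, then every resulting term must be present.
    static boolean matchesAnalyzed(Set<String> indexedTerms, String queryText) {
        Set<String> queryTerms = analyze(queryText);
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("How to do Lucene search highlight example");
        System.out.println(matchesRawTerm(indexed, "example"));         // prints true
        System.out.println(matchesRawTerm(indexed, "Example"));         // prints false
        System.out.println(matchesRawTerm(indexed, "lucene example"));  // prints false
        System.out.println(matchesAnalyzed(indexed, "Lucene Example")); // prints true
    }
}
```

The raw lookup fails for "Example" and "lucene example" because neither string exists as a single indexed term, while the analyzed lookup succeeds because the input is lowercased and split in the same way as the indexed text.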

Search multiple terms

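You can use BooleanQuery to search for multiple terms. For example, the following query finds documents whose author must be "sam" (Occur.MUST) and whose title should contain "lucene" (Occur.SHOULD). Because a required clause is present, the SHOULD clause is optional and only affects ranking: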

 
    public static void searchBooleanQuery() {
        TermQuery query = new TermQuery(new Term("title", "lucene"));
        TermQuery query2 = new TermQuery(new Term("author", "sam"));

        BooleanQuery booleanQuery = new BooleanQuery.Builder()
                .add(query2, Occur.MUST)
                .add(query, Occur.SHOULD)
                .build();

        searchIndexAndDisplayResults(booleanQuery);
    }
 
 

The output

 
 
length of top docs: 2
0    Sam    Lucene index option analyzed vs not analyzed
1    Sam    Lucene field boost and query time boost example
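Both results above happen to contain "lucene" in the title, so the output does not show how a required clause and an optional clause differ. The following plain-Java sketch models that difference. It is not Lucene code: ToyBooleanSearch and Post are hypothetical names, the post titled "Gradle build tips" is invented for illustration, and placing optional matches first is a crude stand-in for Lucene's scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of one required (MUST) clause combined with one optional (SHOULD) clause.
class ToyBooleanSearch {
    // A document with the two fields used in this example.
    static final class Post {
        final String author;
        final String title;

        Post(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    // True if the lowercased, word-split field text contains the exact term.
    static boolean containsTerm(String fieldText, String term) {
        for (String word : fieldText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only posts matching the required author term.
    // Posts that also match the optional title term are listed first.
    static List<Post> search(List<Post> posts, String mustAuthor, String shouldTitle) {
        List<Post> preferred = new ArrayList<Post>();
        List<Post> others = new ArrayList<Post>();
        for (Post post : posts) {
            if (!containsTerm(post.author, mustAuthor)) {
                continue; // required clause failed: the post is excluded
            }
            if (containsTerm(post.title, shouldTitle)) {
                preferred.add(post); // optional clause matched: ranked ahead
            } else {
                others.add(post); // still a hit, only ranked lower
            }
        }
        preferred.addAll(others);
        return preferred;
    }

    public static void main(String[] args) {
        List<Post> posts = new ArrayList<Post>();
        posts.add(new Post("Sam", "Gradle build tips")); // hypothetical post, not in the article's index
        posts.add(new Post("Sam", "Lucene index option analyzed vs not analyzed"));
        posts.add(new Post("Smith", "What is term vector in Lucene"));
        for (Post post : search(posts, "sam", "lucene")) {
            System.out.println(post.author + "\t" + post.title);
        }
    }
}
```

In this sketch the post by Smith is dropped even though its title contains "lucene", while the hypothetical post by Sam without "lucene" in the title is still returned, only after the one that matches both clauses.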
 
 

Using QueryParser

To make queries more flexible and user friendly, you can use QueryParser. Just type whatever you are interested in and let the parser construct the query object for you.

 
    public static void searchQueryParser(String query) throws ParseException {
        QueryParser parser = new QueryParser("title", analyzer);
        Query parsedQuery = parser.parse(query);
 
        searchIndexAndDisplayResults(parsedQuery);
    }
 
    public static void main(String args []) throws ParseException {
        createIndex();
 
        searchQueryParser("lucene term vector");
        searchQueryParser("lucene OR example");
        searchQueryParser("lucene AND example AND author:sam");
 
        ramDirectory.close();
    }
 
 
 

Output:

 
length of top docs: 5
4    Smith    What is term vector in Lucene
3    Smith    Lucene BooleanQuery is depreacted as of 5.3.0
0    Sam    Lucene index option analyzed vs not analyzed
1    Sam    Lucene field boost and query time boost example
2    Jack    How to do Lucene search highlight example
length of top docs: 5
1    Sam    Lucene field boost and query time boost example
2    Jack    How to do Lucene search highlight example
3    Smith    Lucene BooleanQuery is depreacted as of 5.3.0
4    Smith    What is term vector in Lucene
0    Sam    Lucene index option analyzed vs not analyzed
length of top docs: 1
1    Sam    Lucene field boost and query time boost example
 

QueryParser is the default, very basic parser in Lucene. There are many other query parsers that serve different purposes, for example MultiFieldQueryParser, PrecedenceQueryParser, Xml-Query-Parser, DisMax, etc. When the default parser cannot fulfill your requirements, consider these specialized parsers.