How to generate highlighted summary with Lucene and Clojure
Lucene's highlight package supports highlighting keywords in a piece of text based on a query string. Here is how you can do it in Clojure:
How to use it: suppose you have a text file. Given a query string, such as the one you would type into a search engine, the highlighter finds the best-matching text fragments and highlights the keywords from your query string.
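To make the idea of keyword highlighting concrete, here is a minimal, library-free sketch in Java. It is not Lucene's `Highlighter` and does not reproduce its behavior; the class `KeywordHighlightSketch`, the method `highlight`, and the `<b>` tags are all assumptions chosen for illustration. Matching is whole-word and case-insensitive, with no stemming or analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical helper, NOT Lucene's Highlighter.
final class KeywordHighlightSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Wraps every word of `text` that also appears in `query` in <b>...</b>.
    static String highlight(String text, String query) {
        // Query terms are lower-cased and split on whitespace.
        Set<String> terms = new HashSet<>(
                Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy whatever sits between two words (spaces, punctuation) unchanged.
            out.append(text, last, m.start());
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(word).append("</b>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        // Copy the tail after the last word.
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

Calling `KeywordHighlightSketch.highlight("Lucene makes search easy. Search is fun.", "search")` marks both occurrences of the word regardless of case. A real highlighter would run the text through the same analyzer used at indexing time instead of a plain regular expression.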
Lucene highlighter package TokenSource deprecated methods
To highlight terms, we need a token stream, and the TokenSources class is usually the first choice for getting one. This class is all about obtaining a token stream from all kinds of inputs.
The tricky part is that the text may or may not have been analyzed at indexing time, and each case needs a different way to generate the token stream.
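Whichever route produces the token stream, the highlighter relies on each token's character offsets to locate it in the original text. The sketch below shows that idea only, for the case where the raw text is tokenized again at highlight time. It is plain Java and does not use Lucene; `TokenOffsetSketch`, `Token`, and `tokenize` are hypothetical names, and the regular expression stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a hypothetical stand-in for a token stream with offsets.
final class TokenOffsetSketch {
    // A normalized term plus its [start, end) character offsets in the source text.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Tokenizes raw text, lower-casing each term but keeping the original offsets.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return tokens;
    }
}
```

Because the offsets point into the untouched text, `text.substring(token.start, token.end)` recovers the original spelling even though the term itself was normalized. This is what lets a highlighter match on normalized terms while marking up the text as the author wrote it.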
Lucene fragmenter overview
In Lucene, search highlighting consists of two components: the fragmenter and the highlighter. The first step is fragmenting; we can then optionally apply highlighting to each text fragment.
Fragmenting selects the pieces of text that best match the searched keywords from the full text of the document. It gives users a small amount of context around the searched terms, helping them judge how relevant the document is to their search.
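The fragmenting step can be sketched without Lucene as: split the text into fixed-size pieces, score each piece by how many query terms it contains, and keep the best one. This is a simplified assumption about the general approach, not Lucene's fragmenter or scorer; `FragmentSketch`, `fragment`, `score`, and `bestFragment` are hypothetical names, and fragments are measured in words here purely for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a hypothetical fragment selector, NOT Lucene's Fragmenter.
final class FragmentSketch {
    // Splits text into fragments of at most `wordsPerFragment` whitespace-separated words.
    static List<String> fragment(String text, int wordsPerFragment) {
        if (wordsPerFragment < 1) {
            throw new IllegalArgumentException("wordsPerFragment must be >= 1");
        }
        String[] words = text.trim().split("\\s+");
        List<String> fragments = new ArrayList<>();
        for (int i = 0; i < words.length; i += wordsPerFragment) {
            int end = Math.min(i + wordsPerFragment, words.length);
            fragments.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return fragments;
    }

    // Counts the words in a fragment that match a query term
    // (case-insensitive, non-word characters such as punctuation stripped).
    static int score(String fragment, Set<String> terms) {
        int hits = 0;
        for (String word : fragment.split("\\s+")) {
            String normalized = word.toLowerCase(Locale.ROOT).replaceAll("\\W", "");
            if (terms.contains(normalized)) {
                hits++;
            }
        }
        return hits;
    }

    // Returns the highest-scoring fragment; the earliest fragment wins ties.
    static String bestFragment(String text, String query, int wordsPerFragment) {
        Set<String> terms = new HashSet<>(
                Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        String best = "";
        int bestScore = -1;
        for (String candidate : fragment(text, wordsPerFragment)) {
            int s = score(candidate, terms);
            if (s > bestScore) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }
}
```

For the text "Lucene indexes documents. The highlighter marks search terms. Search results improve." and the query "search terms" with four words per fragment, the second fragment contains both query terms and is selected. Fixed-size splitting can cut across sentence boundaries, as the first fragment in this example does, which is one reason a production fragmenter is more careful about where it breaks.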