How to generate a highlighted summary with Lucene and Clojure

Lucene's highlight package can highlight the keywords of a query string in a piece of text. Here is how to do it in Clojure:

 
(import '(org.apache.lucene.analysis.standard StandardAnalyzer)
        '(org.apache.lucene.queryparser.classic QueryParser)
        '(org.apache.lucene.search.highlight Highlighter QueryScorer
                                             SimpleHTMLFormatter TokenSources))

(defn gen-frag
  "Returns the best-matching fragments of text (an array of TextFragment),
  with the terms of query wrapped in <B> tags."
  [query text]
  (let [analyzer     (StandardAnalyzer.)
        ;; "" is the default field for query terms that do not name one
        parsed       (.parse (QueryParser. "" analyzer) query)
        highlighter  (Highlighter. (SimpleHTMLFormatter.) (QueryScorer. parsed))
        token-stream (TokenSources/getTokenStream "default" text analyzer)]
    ;; false = do not merge contiguous fragments, 4 = maximum number of fragments
    (.getBestTextFragments highlighter token-stream text false 4)))
 

To use it, pass a query string (for example, what you would type into a search engine) and the contents of a text file. It finds the best-matching text fragments and highlights the query keywords in them:

 
(doseq [frag (gen-frag "read text file string utf8" (slurp "c:\\tmp\\content.txt"))]
  (println (str "--" frag "--")))
 

doseq is used rather than map because map is lazy and only yields nils here, which the REPL prints interleaved with the fragments. Each fragment is printed between -- markers, and a fragment can span several lines. The output:

 
--I was trying to <B>read</B> <B>utf8</B> <B>text</B> from a <B>text</B> <B>file</B>--
--, <B>string</B> is just a byte array. The function can just <B>read</B> the <B>file</B> raw data to memory and reference--
--.

A <B>file</B>, in its nature its a byte array, even it is a <B>text</B> <B>file</B>. To get a <B>String</B> from the <B>file</B>--
-- version.

code:java

    public static <B>String</B> readFileString ( <B>String</B> <B>file</B> ) {

        StringBuffer <B>text</B>--
 

The equivalent Java version:

 
 
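    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
    import org.apache.lucene.search.highlight.TextFragment;
    import org.apache.lucene.search.highlight.TokenSources;

    // The wrapper class name is arbitrary; it is only here so the method compiles.
    public class HighlightExample {
        public static void highLightClojure(String text, String query) {
            // Same analyzer as the Clojure version creates
            Analyzer analyzer = new StandardAnalyzer();
            try {
                Query queryToSearch = new QueryParser("", analyzer).parse(query);
                TokenStream tokenStream = TokenSources.getTokenStream("default", text, analyzer);
                Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
                        new QueryScorer(queryToSearch));

                // false: do not merge contiguous fragments; 4: maximum number of fragments
                TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 4);
                for (int j = 0; j < frag.length; j++) {
                    if (frag[j] != null) {
                        System.out.println("score: " + frag[j].getScore() + ", frag: " + frag[j]);
                    }
                }
            } catch (IOException | InvalidTokenOffsetsException | ParseException e) {
                e.printStackTrace();
            }
        }
    }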
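
To see what the markup step amounts to without Lucene on the classpath, here is a naive stand-in in plain Java. It is not Lucene's algorithm: there is no analyzer, no scoring and no fragmenting. It only wraps whole-word, case-insensitive matches of the query terms in <B> tags, the same tags that appear in the output above. The names NaiveHighlight and highlight are made up for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NaiveHighlight {
    // Wraps every whole-word, case-insensitive occurrence of a query term in <B>...</B>.
    // A rough illustration only; Lucene's Highlighter works on analyzed tokens instead.
    static String highlight(String query, String text) {
        // Build an alternation such as \Qread\E|\Qtext\E from the whitespace-separated terms
        String alternation = Arrays.stream(query.trim().split("\\s+"))
                .filter(term -> !term.isEmpty())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (alternation.isEmpty()) {
            return text; // nothing to highlight
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
        // $0 is the matched text itself, so the original casing is preserved
        return pattern.matcher(text).replaceAll("<B>$0</B>");
    }

    public static void main(String[] args) {
        // prints: I was trying to <B>read</B> <B>utf8</B> <B>text</B> from a <B>text</B> <B>file</B>
        System.out.println(highlight("read text file string utf8",
                "I was trying to read utf8 text from a text file"));
    }
}
```

What Lucene adds on top of this is the analyzer (tokenizing and lowercasing the text), the scoring of fragments against the query, and the selection of the best few fragments.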