Lucene highlighter package: deprecated TokenSources methods

To highlight terms, we need a token stream, and the TokenSources class is usually the first choice for getting one. This class is all about getting a token stream from all kinds of inputs.

The tricky part is that the text may or may not have been analyzed and stored as term vectors at indexing time, and each case needs a different way to generate the token stream.

The newest API consists of two methods:

    public static TokenStream getTokenStream(String field, Fields tvFields, String text, Analyzer analyzer, int maxStartOffset) throws IOException
    public static TokenStream getTermVectorTokenStreamOrNull(String field, Fields tvFields, int maxStartOffset) throws IOException

The second method creates a token stream from term vectors. You should only use it when your text was analyzed and its term vectors were stored at index time; it returns null if the term vectors do not exist.

The first method starts by calling the second one to read from term vectors. If that returns null, it falls back to re-analyzing the text you passed in with the analyzer you passed in.
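The fallback order described above can be modelled in plain Java. This is only a toy sketch of the control flow, not Lucene code: the class `TokenFallbackSketch`, the map standing in for term vectors, and the `whitespaceLowercase` "analyzer" are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy model of "term vectors first, re-analysis second". No Lucene types are used. */
public final class TokenFallbackSketch {

    /** Stand-in for the term-vector lookup: returns null when nothing was stored for the field. */
    public static List<String> fromTermVectorsOrNull(String field, Map<String, List<String>> termVectors) {
        if (termVectors == null) {
            return null; // the document has no term vectors at all
        }
        return termVectors.get(field); // null when this field has none
    }

    /** Stand-in for the combined method: prefer stored tokens, otherwise re-analyze the text. */
    public static List<String> tokens(String field, Map<String, List<String>> termVectors,
                                      String text, Function<String, List<String>> analyzer) {
        List<String> stored = fromTermVectorsOrNull(field, termVectors);
        if (stored != null) {
            return stored;
        }
        return analyzer.apply(text);
    }

    /** Hypothetical analyzer: lowercase the text and split it on whitespace. */
    public static List<String> whitespaceLowercase(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> termVectors = new HashMap<>();
        termVectors.put("title", Arrays.asList("lucene", "highlighter"));

        // "title" has stored tokens, so the text argument is ignored.
        System.out.println(tokens("title", termVectors, "Ignored Text",
                TokenFallbackSketch::whitespaceLowercase)); // prints [lucene, highlighter]

        // "body" has none, so the text is re-analyzed.
        System.out.println(tokens("body", termVectors, "Hello Token Stream",
                TokenFallbackSketch::whitespaceLowercase)); // prints [hello, token, stream]
    }
}
```

The point of the model is that the caller has to supply both inputs up front, the stored vectors and the raw text plus analyzer, even though only one of them ends up being used.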

A bunch of older overloaded methods have been deprecated. But I think the old API still has its merits: it is easier to use.

To use the new API, you need to know more details. For example, you need to call IndexReader.getTermVectors to get the tvFields argument, and I don't even know what maxStartOffset means. Why do I need to know so many details if I just want to get a token stream?
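For reference, here is roughly what that extra plumbing looks like when wrapped in a small helper. Treat it as a sketch under assumptions: it needs a Lucene version on the classpath where `IndexReader.getTermVectors(int)` returns `Fields`, the class name `HighlightTokens` and the method name `tokenStreamFor` are my own, and passing -1 for `maxStartOffset` relies on the TokenSources Javadoc, which as far as I can tell describes -1 as "no limit" on token start offsets.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.highlight.TokenSources;

/** Hypothetical helper that hides the tvFields and maxStartOffset arguments. */
public final class HighlightTokens {

    private HighlightTokens() {
    }

    public static TokenStream tokenStreamFor(IndexReader reader, int docId, String field,
                                             String text, Analyzer analyzer) throws IOException {
        // Term vectors of this document; may be null if none were stored.
        Fields tvFields = reader.getTermVectors(docId);
        // -1 is assumed to mean "do not cut off tokens by start offset".
        return TokenSources.getTokenStream(field, tvFields, text, analyzer, -1);
    }
}
```

With a helper like this, the call site only needs the reader, the document id, the field name, the stored text, and the analyzer.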

The old API takes only 3 parameters, and they are easy to get. Is the new one really a simplification, or does it just make things more complex? Read more on JIRA: LUCENE-6445.