I found Mike's blog post about Lucene 4.X from a while ago [0]. In the 'Other Changes' section Mike states: "Analyzers must always provide a reusable token stream, by implementing the Analyzer.createComponents method (reusableTokenStream has been removed and tokenStream is now final, in Analyzer)."

This provides a good bit more context, so I'm going to continue down the createComponents route with the aim of implementing the newer 4.X Lucene API. In the meantime, if you have any updates or a code sample, it would be very much appreciated.

Thanks
Lewis

[0] http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html

On Mon, May 11, 2015 at 2:03 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> Hi Suneel,
>
> On Sat, May 9, 2015 at 11:21 AM, Suneel Marthi <smar...@apache.org> wrote:
>
>> Mahout 0.9 and 0.10.0 are using Lucene 4.6.1. There's been a change in the
>> TokenStream workflow in Lucene post-Lucene 4.5.
>
> Yes I know that after looking into the codebase. Thanks for clarifying!
>
>> What exactly are u trying to do and where is it u r stuck now? It would
>> help if u posted a code snippet or something.
>
> In particular I am working on the following implementation [0] which uses
> the following code
>
> TokenStream stream = analyzer.reusableTokenStream(key.toString(), new StringReader(sContent.toString()));
>
> Of note here is that the analyzer object is instantiated as of type
> DefaultAnalyzer [1]. It is further noted that the analyzer.reusableTokenStream
> API is deprecated as you've noted so I am just wondering what the suggested
> API semantics are in order to achieve the desired upgrade.
> Thanks in advance again for any input.
> Lewis
>
> [0]
> https://github.com/DigitalPebble/behemoth/blob/master/mahout/src/main/java/com/digitalpebble/behemoth/mahout/LuceneTokenizerMapper.java#L52-L53
> [1]
> http://svn.apache.org/repos/asf/mahout/tags/mahout-0.7/core/src/main/java/org/apache/mahout/vectorizer/DefaultAnalyzer.java

--
*Lewis*
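
For reference, here is a minimal sketch of what the createComponents route might look like against Lucene 4.6.x, the version mentioned in the thread. The class name SketchAnalyzer, the tokenize helper, and the StandardTokenizer plus LowerCaseFilter chain are assumptions chosen for illustration; this is not the Mahout or Behemoth code, and it needs the lucene-core and lucene-analyzers-common 4.6.x jars on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical analyzer: in 4.x the only method to implement is
// createComponents; tokenStream() is final and handles the reuse.
public class SketchAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Assumed filter chain: standard tokenization followed by lowercasing.
    Tokenizer source = new StandardTokenizer(Version.LUCENE_46, reader);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_46, source);
    return new TokenStreamComponents(source, result);
  }

  // Hypothetical helper showing the consumer workflow:
  // reset(), incrementToken() loop, end(), close().
  public static List<String> tokenize(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> terms = new ArrayList<String>();
    // tokenStream() replaces the removed reusableTokenStream() call.
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    try {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset(); // must be called before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    } finally {
      stream.close(); // releases the stream so the analyzer can reuse it
    }
    return terms;
  }
}
```

In the mapper quoted above, the reusableTokenStream call would presumably become analyzer.tokenStream(key.toString(), new StringReader(sContent.toString())), followed by the same reset/incrementToken/end/close sequence.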