Hunspell low level interface in Lucene 4.8

2014-06-14 Thread Michal Lopuszynski
Dear all, I am not much into searching; however, I have used Lucene for some text postprocessing (esp. stemming) with the low-level tools generously gathered in Lucene. I was very happy to see the memory footprint improvement in the Hunspell stemmer algorithm (https://issues.apache.org/jira/browse/LU

Re: Hunspell low level interface in Lucene 4.8

2014-06-15 Thread Robert Muir
Can you just use the TokenStream API? That's the one we maintain and support... On Sat, Jun 14, 2014 at 10:42 AM, Michal Lopuszynski wrote: > Dear all, > > I am not much into searching, however, I used Lucene to do some text > postprocessing, (esp. stemming) using low level tools generously > gat

Re: Hunspell low level interface in Lucene 4.8

2014-06-16 Thread Michal Lopuszynski
Hi Robert, thank you for your answer! Hmmm... I need a plain stemmer, i.e. functionality that takes a word and returns a list of stems. Wrapping every word in a TokenStream, which does a lot of things I do not need, seems like overkill and a waste of resources... Is there any problem with keeping

Re: Hunspell low level interface in Lucene 4.8

2014-06-16 Thread Robert Muir
You don't have to wrap every word in a TokenStream; they can be reused! Sorry, but I think this is really the best API if you want to use Lucene's analyzers. You can use the TokenStream API with 4.8 and benchmark it against using that stemmer API with 4.7 :) On Mon, Jun 16, 2014 at 4:16 AM, Micha