
Hunspell low level interface in Lucene 4.8

2014-06-14 Thread Michal Lopuszynski

Dear all, I am not much into search; however, I have used Lucene for some text postprocessing (esp. stemming), using the low-level tools generously gathered in Lucene. I was very happy to see the memory footprint improvement in the Hunspell stemmer algorithm (https://issues.apache.org/jira/browse/LU

Re: Hunspell low level interface in Lucene 4.8

2014-06-15 Thread Robert Muir

Can you just use the TokenStream API? That's the one we maintain and support... On Sat, Jun 14, 2014 at 10:42 AM, Michal Lopuszynski wrote: > Dear all, > > I am not much into searching, however, I used Lucene to do some text > postprocessing, (esp. stemming) using low level tools generously > gat

Re: Hunspell low level interface in Lucene 4.8

2014-06-16 Thread Michal Lopuszynski

Hi Robert, thank you for your answer! Hmmm... I need a plain stemmer, i.e. functionality that takes a word and returns a list of stems. Wrapping every word in a TokenStream, which does a lot of things I do not need, seems like overkill and a waste of resources... Is there any problem with keeping

Re: Hunspell low level interface in Lucene 4.8

2014-06-16 Thread Robert Muir

You don't have to wrap every word in a TokenStream; they can be reused! Sorry, but I think this is really the best API if you want to use Lucene's analyzers. You can use the TokenStream API with 4.8 and benchmark it against using that stemmer API with 4.7 :) On Mon, Jun 16, 2014 at 4:16 AM, Micha
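
For readers of the archive, here is a minimal sketch of the TokenStream approach suggested above. It was not posted in the thread. It assumes Lucene 4.8 (lucene-core and lucene-analyzers-common) on the classpath. The class name HunspellStemSketch, the helper names, the field name "unused", the sample word, and the dictionary file names en_US.aff and en_US.dic are all placeholders chosen for illustration. The Analyzer is built once, and each call to tokenStream() on it reuses the same underlying components rather than constructing a new chain per word.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.hunspell.Dictionary;
import org.apache.lucene.analysis.hunspell.HunspellStemFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative sketch only: class, method, and file names are assumptions.
public class HunspellStemSketch {

    // Build one Analyzer; Lucene caches its TokenStreamComponents per thread,
    // so the tokenizer/filter chain is created once and then reused.
    static Analyzer stemmingAnalyzer(final Dictionary dictionary) {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                // KeywordTokenizer emits the whole input as a single token,
                // which suits stemming one word at a time.
                Tokenizer source = new KeywordTokenizer(reader);
                TokenStream result = new HunspellStemFilter(source, dictionary);
                return new TokenStreamComponents(source, result);
            }
        };
    }

    // Collect every term the filter chain emits for a single input word.
    static List<String> stems(Analyzer analyzer, String word) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("unused", new StringReader(word));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();                       // releases the stream for reuse
        }
        return out;
    }

    public static void main(String[] args) throws IOException, ParseException {
        // Placeholder paths: point these at a real Hunspell dictionary pair.
        try (InputStream affix = new FileInputStream("en_US.aff");
             InputStream words = new FileInputStream("en_US.dic")) {
            Dictionary dictionary = new Dictionary(affix, words);
            Analyzer analyzer = stemmingAnalyzer(dictionary);
            System.out.println(stems(analyzer, "walked"));
            analyzer.close();
        }
    }
}
```

Calling stems(analyzer, word) in a loop over many words keeps the one-time setup cost (dictionary parsing, filter construction) out of the per-word path, which is the setup one would benchmark against the 4.7 stemmer API.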