Dear all,
I am not much into search; however, I have used Lucene for some text
postprocessing (esp. stemming), using the low-level tools generously
gathered in Lucene.
I was very happy to see the memory footprint improvement in the
Hunspell stemmer algorithm
(https://issues.apache.org/jira/browse/LU
Can you just use the TokenStream API? That's the one we maintain and support...
On Sat, Jun 14, 2014 at 10:42 AM, Michal Lopuszynski wrote:
> Dear all,
>
> I am not much into search; however, I have used Lucene for some text
> postprocessing (esp. stemming), using the low-level tools generously
> gat
Hi Robert,
Thank you for your answer!
Hmmm... I need a plain stemmer, i.e. functionality that takes a word and
returns a list of stems.
Wrapping every word in a TokenStream, which does a lot of things I do
not need, seems like overkill and a waste of resources...
Is there any problem with keeping
You don't have to wrap every word in a TokenStream; they can be reused!
Sorry, but I think this is really the best API if you want to use
Lucene's analyzers. You can use the TokenStream API with 4.8 and
benchmark it against using that stemmer API with 4.7 :)
On Mon, Jun 16, 2014 at 4:16 AM, Micha