Alan, I'd also like to comment on this:
The reason we have TokenStreamComponents and ReuseStrategies (as I
understand it) is not because they may have to load large resource
files or dictionaries or whatever, but it’s because building a
TokenStream is itself quite a heavy operation due to
Yes, I'm using the term "Analyzer" in a generic sense; I'm also concerned
about TokenStream init costs, garbage, etc.
There are a ton of uses here other than IndexWriter:
AnalyzingSuggesters building FSTs, etc.
I don't think we need to try to add even more complexity because of
users implementing
Hey Robert,
Analyzers themselves can be heavy and load large data files, etc., I agree, but
I’m really talking about token stream construction. The way things are set up,
we expect the heavy lifting to be done when the Analyzer is constructed, but
these heavy resources should then be shared
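The reuse pattern being discussed can be sketched roughly like this. The class names below are invented for illustration (they stand in for Lucene's TokenStreamComponents and per-thread ReuseStrategy, but this is not the real API): the expensive-to-build components are cached per thread and merely reset for each new input, instead of being rebuilt on every call.

```java
// Minimal sketch of per-thread token stream reuse (invented names, not
// Lucene's real classes): build the components once per thread, then
// reset them cheaply for every subsequent input.
import java.util.concurrent.atomic.AtomicInteger;

public class ReuseSketch {
    // Counts how many times the heavy components get constructed.
    static final AtomicInteger constructions = new AtomicInteger();

    // Stands in for TokenStreamComponents: costly to create, cheap to reset.
    static class Components {
        String reader;
        Components() { constructions.incrementAndGet(); } // heavy init, allocation, etc.
        void setReader(String text) { reader = text; }    // cheap per-use reset
    }

    // Stands in for an Analyzer with a per-thread reuse strategy.
    static class ReusingAnalyzer {
        final ThreadLocal<Components> stored = new ThreadLocal<>();

        Components tokenStream(String text) {
            Components c = stored.get();
            if (c == null) {          // first use on this thread: build once
                c = new Components();
                stored.set(c);
            }
            c.setReader(text);        // every later call just resets
            return c;
        }
    }

    public static void main(String[] args) {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        for (int i = 0; i < 1000; i++) analyzer.tokenStream("doc " + i);
        // Built once on this thread, reused 999 times.
        System.out.println(constructions.get()); // prints 1
    }
}
```

This is why construction cost matters so much: without the cache, the counter above would read 1000.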
Alan: a couple thoughts:
Analyzers are not just used for formulating queries, but may also be
used by highlighters and other things on document results at query
time.
Some analyzers may do too-expensive/garbage-creating stuff on
construction that you wouldn't want to do at query time.
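That split of cost can be sketched as follows. Again the names are made up for illustration: the expensive work (loading a big dictionary) happens once when the analyzer is constructed, so it stays out of the query path, and every token stream built afterwards just shares the already-loaded resource.

```java
// Hypothetical sketch (not Lucene's real classes): heavy resource
// loading happens once at analyzer construction; building each token
// stream afterwards is cheap and shares that resource.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedResourceSketch {
    // Counts how many times the "large data file" gets loaded.
    static final AtomicInteger loads = new AtomicInteger();

    static class MyAnalyzer {
        final Set<String> stopWords; // the heavy, shared resource

        MyAnalyzer() {
            loads.incrementAndGet();        // expensive: done once, up front
            stopWords = Set.of("a", "the"); // stands in for a big dictionary
        }

        // Cheap relative to loading the dictionary: each stream only
        // references the already-loaded resource.
        List<String> tokenStream(String text) {
            List<String> out = new ArrayList<>();
            for (String t : text.toLowerCase().split("\\s+"))
                if (!stopWords.contains(t)) out.add(t);
            return out;
        }
    }

    public static void main(String[] args) {
        MyAnalyzer analyzer = new MyAnalyzer();
        for (int i = 0; i < 1000; i++) analyzer.tokenStream("the quick fox");
        // Dictionary loaded exactly once, no matter how many streams.
        System.out.println(loads.get()); // prints 1
        System.out.println(analyzer.tokenStream("a quick fox")); // [quick, fox]
    }
}
```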
Hi all,
I’ve been on holiday and away from a keyboard for a week, so that means I of
course spent my time thinking about Lucene Analyzers and specifically their
ReuseStrategies…
Building a TokenStream can be quite a heavy operation, and so we try to reuse
already-constructed token streams as