Re: Analyzer lifecycles

2021-06-09 Thread Robert Muir
Alan, I'd also like to comment on this: The reason we have TokenStreamComponents and ReuseStrategies (as I understand it) is not because they may have to load large resource files or dictionaries or whatever, but it’s because building a TokenStream is itself quite a heavy operation due to

Re: Analyzer lifecycles

2021-06-09 Thread Robert Muir
Yes, I'm using the term "Analyzer" in a generic sense; I'm also concerned about TokenStream init costs, garbage, etc. There are a ton of uses here other than IndexWriter: AnalyzingSuggesters building FSTs, etc. I don't think we need to try to add even more complexity because of users implementing

Re: Analyzer lifecycles

2021-06-09 Thread Alan Woodward
Hey Robert, Analyzers themselves can be heavy and load large data files, etc, I agree, but I’m really talking about token stream construction. The way things are set up, we expect the heavy lifting to be done when the Analyzer is constructed, but these heavy resources should then be shared

Re: Analyzer lifecycles

2021-06-08 Thread Robert Muir
Alan, a couple of thoughts: Analyzers are not just used for formulating queries, but may also be used by highlighters and other things on document results at query time. Some analyzers may do overly expensive or garbage-creating work on construction that you wouldn't want to do at query time.

Analyzer lifecycles

2021-06-08 Thread Alan Woodward
Hi all, I’ve been on holiday and away from a keyboard for a week, so that means I of course spent my time thinking about Lucene Analyzers and specifically their ReuseStrategies… Building a TokenStream can be quite a heavy operation, and so we try to reuse already-constructed token streams as