Yes, I think so too. Apparently this can be a RAM hog if you have zillions
of fields?
I plan to post an issue with more details over the weekend when I will
have the time to write up the results I saw.
On Fri, Apr 26, 2019, 10:31 AM David Smiley wrote:
Maybe what you propose would allow FST.cachedRootArcs to be eliminated?
Basically your algorithm could decide that the root would always be direct
access.
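To sketch the idea (hypothetical method, not the actual encoder code): the writer would just short-circuit its per-node decision for the start node, e.g.

    // Hypothetical sketch: if the writer always direct-addresses the start node,
    // readers get O(1) access to the root arcs and the separate cachedRootArcs
    // copy becomes redundant.
    static boolean encodeAsDirectArray(boolean isStartNode, boolean passesDensityHeuristic) {
      // the root arcs are on the hottest path, so always spend the space there;
      // every other node falls back to whatever density rule the new encoding uses
      return isStartNode || passesDensityHeuristic;
    }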
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Thu, Apr 25, 2019 at 5:18 PM Dawid Weiss wrote:
> I'm curious how the hot cache you describe would be maintained and
> accessed.
The only gain I was successful at was with a static precomputed cache,
calculated from a priori data (hot paths) or, in the extreme, by converting
the entire FST into a hash map. :) This allows very fast lookups of any node.
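For the extreme case, roughly this kind of thing (the enumeration uses Lucene's real BytesRefFSTEnum, but the wrapper class and its name are made up):

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.fst.BytesRefFSTEnum;
    import org.apache.lucene.util.fst.FST;

    final class FstAsHashMap {
      // Enumerate every (term, output) pair of the FST into a plain HashMap,
      // trading memory for lookups that skip the arc traversal entirely.
      static Map<BytesRef, Long> explode(FST<Long> fst) throws IOException {
        Map<BytesRef, Long> map = new HashMap<>();
        BytesRefFSTEnum<Long> fstEnum = new BytesRefFSTEnum<>(fst);
        BytesRefFSTEnum.InputOutput<Long> pair;
        while ((pair = fstEnum.next()) != null) {
          map.put(BytesRef.deepCopyOf(pair.input), pair.output);
        }
        return map;
      }
    }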
Hi Dawid,
The heuristic I used was to encode using the direct-array approach
when more than 1/4 of the array indices would be filled (i.e.
(max-label - min-label) / num-labels < 4), and otherwise to use the existing
packed array encoding. I only applied the direct encoding when we
previously would have
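In code the check amounts to something like this (parameter names are just for illustration, not the actual builder fields):

    // Sketch of the density heuristic: use direct addressing when more than
    // 1/4 of the slots in the direct array would actually hold an arc.
    static boolean shouldUseDirectAddressing(int minLabel, int maxLabel, int numArcs) {
      // the direct array needs (maxLabel - minLabel + 1) slots, of which numArcs are filled
      int directArraySize = maxLabel - minLabel + 1;
      // "more than 1/4 filled", written without integer division
      return 4 * numArcs > directArraySize;
    }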
Hi Mike,
My experience tells me that in practice it's really difficult to tell
which nodes should be expanded (where this "cost" of binary lookup
would significantly outweigh a direct offset jump). I had some luck
in speeding up (very intensive) lookups by creating a hash of [node,
arc label] =>
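A minimal sketch of that kind of cache (illustrative only, not the real FST internals; it assumes a node address fits in the upper 56 bits of a long so the node/label pair packs into one key):

    import java.util.HashMap;
    import java.util.Map;

    final class NodeLabelCache {
      private final Map<Long, Long> targetByNodeAndLabel = new HashMap<>();

      // pack a node address and a byte-sized label into a single key
      private static long key(long node, int label) {
        return (node << 8) | (label & 0xFFL);
      }

      // remember that following 'label' out of 'node' leads to 'target'
      void put(long node, int label, long target) {
        targetByNodeAndLabel.put(key(node, label), target);
      }

      // cached target, or -1 if this transition was never precomputed,
      // in which case the caller falls back to the normal arc scan
      long get(long node, int label) {
        return targetByNodeAndLabel.getOrDefault(key(node, label), -1L);
      }
    }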
I've been experimenting with a new FST encoding, and the performance
gains are exciting on FST-intensive benchmarks such as the suggesters
and PKLookup in luceneutil. In our production system we see some
gains in regular search performance as well, although these are modest
since FST lookup is