Re: Improve performance of FST Arc traversal

2019-04-26 Thread Michael Sokolov
Yes I think so too. Apparently this can be a ram hog if you have zillions of fields? I plan to post an issue with more details over the weekend when I will have the time to write up the results I saw. On Fri, Apr 26, 2019, 10:31 AM David Smiley wrote: > Maybe what you propose would allow FST.c

Re: Improve performance of FST Arc traversal

2019-04-26 Thread David Smiley
Maybe what you propose would allow FST.cachedRootArcs to be eliminated? Basically your algorithm could decide that the root would always be direct access. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Thu, Apr 25, 2019 at 5:18 PM Dawid Weiss wrote

Re: Improve performance of FST Arc traversal

2019-04-25 Thread Dawid Weiss
> I'm curious how the hot cache you describe would be maintained and > accessed. The only gain I was successful at was with static precomputed cache. Calculated from a priori data (hot paths) or, in the extreme, the entire FST is converted to a hash map. :) This allows very fast lookups of any nod

Re: Improve performance of FST Arc traversal

2019-04-25 Thread Michael Sokolov
Hi Dawid, The heuristic I used was to encode using the direct-array approach when more than 1/4 of the array indices would be filled (ie max-label - min-label / num-labels < 4), and otherwise to use the existing packed array encoding. I only applied the direct encoding when we previously would hav

Re: Improve performance of FST Arc traversal

2019-04-25 Thread Dawid Weiss
Hi Mike, My experience tells me that in practice it's really difficult to tell which nodes should be expanded (where this "cost" of binary lookup would significantly outweight a direct offset jump). I had some luck in speeding up (very intensive) lookups by creating a hash of [node, arc label] =>

Improve performance of FST Arc traversal

2019-04-25 Thread Michael Sokolov
I've been experimenting with a new FST encoding, and the performance gains are exciting on FST-intensive benchmarks like for the suggesters and for PKLookup in luceneutil. In our production system we see some gains in regular search performance as well, although these are modest since FST lookup is