Re: Trigram-accelerated regex searches

2014-09-29 Thread Nikolas Everett
On Thu, May 22, 2014 at 4:31 PM, Erik Rose grinche...@gmail.com wrote:
> Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase?
> curl -s -XGET

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Martijn took a swing at it just now. He eliminated any scoring-based slowdown, like so (constant_score_filter)… curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: { filtered: { query: { match_all: {}

Re: Trigram-accelerated regex searches

2014-05-22 Thread Robert Muir
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote:
> I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches

Re: Trigram-accelerated regex searches

2014-05-22 Thread Matt Weber
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1]. By doing this you turn those expensive leading wildcards into trailing wildcards which should give you better performance. I think your query
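The reversal idea can be illustrated outside Elasticsearch with a rough Python sketch. It is not how the reverse token filter is implemented; the helper names `build_reversed_terms` and `terms_with_suffix` are made up for this example. It only shows why a suffix search (leading wildcard) over terms becomes a cheap prefix range scan once the terms are stored reversed and sorted.

```python
import bisect


def build_reversed_terms(terms):
    """Return the terms reversed and sorted, so suffix lookups become prefix lookups."""
    return sorted(term[::-1] for term in terms)


def terms_with_suffix(reversed_terms, suffix):
    """Find terms ending in `suffix` via a prefix range scan over reversed terms."""
    prefix = suffix[::-1]
    # Binary search to the first reversed term that could start with the prefix.
    start = bisect.bisect_left(reversed_terms, prefix)
    matches = []
    for rev in reversed_terms[start:]:
        if not rev.startswith(prefix):
            break  # sorted order: no later entry can share the prefix
        matches.append(rev[::-1])
    return sorted(matches)


# Hypothetical term list, for illustration only.
terms = ["search", "research", "regex", "index"]
reversed_terms = build_reversed_terms(terms)
print(terms_with_suffix(reversed_terms, "search"))
```

This helps only when the wildcard sits on one side of the pattern; with wildcards on both sides there is no fixed prefix in either direction to seek to.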

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
> Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1].
Good advice, typically, but notice I have wildcards on either side. Reversing just makes the trailing wildcard expensive. :-)

Re: Trigram-accelerated regex searches

2014-05-22 Thread Itamar Syn-Hershko
Aye, and then you can use edit distance on single words (fuzzy query) to cope with fast typists.
On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com wrote:
> On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote:
>> I'm trying to move Mozilla's source code search engine

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
> This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the data structure is totally different.
Ah, I was afraid of that. I hoped, due to the field being unanalyzed (and the documentation's noted restriction that wildcard

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase? curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: {

Trigram-accelerated regex searches

2014-05-21 Thread Erik Rose
I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches by extracting trigrams from the search pattern and paring down the
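The snippet is cut off here, but the general technique it names (pull trigrams out of the pattern, use them to pare down candidates, then run the real regex on what is left) can be sketched roughly as follows. This is a minimal illustration, not DXR's actual code: the function names are invented, and the required literal is passed in by hand, whereas a real engine would derive it from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(lines):
    """Map each trigram to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for gram in trigrams(line):
            index[gram].add(line_no)
    return index


def search(lines, index, literal, pattern):
    """Pare down candidates with trigrams of `literal`, then confirm with `pattern`.

    `literal` is a substring every match must contain (an assumption of this
    sketch: it is supplied by the caller rather than extracted from the regex).
    """
    grams = trigrams(literal)
    if grams:
        # A matching line must contain every trigram of the required literal.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than 3 characters: no trigram filter, scan everything.
        candidates = set(range(len(lines)))
    regex = re.compile(pattern)
    # The trigram filter can yield false positives, so verify each candidate.
    return sorted(n for n in candidates if regex.search(lines[n]))


# Hypothetical sample data, for illustration only.
lines = ["int main(void)", "static int helper(int x)", "return maintain(x);"]
index = build_index(lines)
print(search(lines, index, "main", r"\bmain\("))
```

The third sample line survives the trigram filter (it contains both `mai` and `ain`) but fails the regex, which is why the verification pass is needed.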