Re: Trigram-accelerated regex searches

2014-09-29 Thread Nikolas Everett
On Thu, May 22, 2014 at 4:31 PM, Erik Rose grinche...@gmail.com wrote: Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase? curl -s -XGET

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Martijn took a swing at it just now. He eliminated any scoring-based slowdown, like so (constant_score_filter)…
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: { filtered: { query: { match_all: {}

Re: Trigram-accelerated regex searches

2014-05-22 Thread Robert Muir
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine (dxr.mozilla.org) from a custom-written SQLite trigram index to ES. In the current production incarnation, we support fast regex (and, by extension, wildcard) searches
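For readers unfamiliar with the trigram technique the post refers to, here is a minimal Python sketch of the general idea: index every 3-character substring of each line, use the trigrams of a literal that any match must contain to narrow the candidates, then confirm with the real regex. This is not DXR's SQLite implementation or anything ES does internally. The names `trigrams`, `build_index`, and `search` are hypothetical, and the required literal is passed in by hand, whereas a real engine would derive it from the regex.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(lines):
    """Map each trigram to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for gram in trigrams(line):
            index[gram].add(line_no)
    return index


def search(lines, index, literal, pattern):
    """Find lines matching pattern, pruning with trigrams of a required literal.

    literal is a substring every match must contain (supplied by hand here;
    this is an assumption of the sketch, not something the thread specifies).
    """
    grams = trigrams(literal)
    if grams:
        # Candidate lines must contain every trigram of the literal.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than 3 characters: no pruning is possible.
        candidates = set(range(len(lines)))
    regex = re.compile(pattern)
    # Trigram overlap can give false positives, so confirm with the regex.
    return sorted(n for n in candidates if regex.search(lines[n]))
```

With an index built over a handful of source lines, `search(lines, index, "main", r"main\w*\(")` only runs the regex against lines that contain both `mai` and `ain`, which is where the speedup comes from.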

Re: Trigram-accelerated regex searches

2014-05-22 Thread Matt Weber
Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1]. By doing this you turn those expensive leading wildcards into trailing wildcards which should give you better performance. I think your query
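To illustrate why reversing helps, here is a small Python sketch, independent of ES: if the terms are stored reversed and sorted, a leading-wildcard query such as `*name` becomes a prefix scan for `eman`, which can stop as soon as the prefix no longer matches. The helper names `build_reversed_terms` and `ends_with` are hypothetical, and a sorted list stands in for the term dictionary.

```python
from bisect import bisect_left


def build_reversed_terms(terms):
    """Return the reversed form of each term, sorted for prefix lookup."""
    return sorted(term[::-1] for term in terms)


def ends_with(reversed_terms, suffix):
    """Find terms ending in suffix via a prefix scan over reversed terms."""
    prefix = suffix[::-1]
    # Jump straight to the first reversed term that could share the prefix.
    start = bisect_left(reversed_terms, prefix)
    matches = []
    for rev in reversed_terms[start:]:
        if not rev.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(rev[::-1])
    return sorted(matches)
```

The scan touches only the terms that actually match (plus one), instead of every term in the dictionary. It does nothing for a pattern with wildcards on both ends, since the reversed pattern still starts with a wildcard.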

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
> Leading wildcards are really expensive. Maybe you can try creating a copy of your content field that reverses the tokens using reverse token filter [1].
Good advice, typically, but notice I have wildcards on either side. Reversing just makes the trailing wildcard expensive. :-)

Re: Trigram-accelerated regex searches

2014-05-22 Thread Itamar Syn-Hershko
Aye, and then you can use edit distance on single words (fuzzy query) to cope with fast typists.
On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com wrote: On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote: I'm trying to move Mozilla's source code search engine
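As a rough illustration of the edit-distance idea mentioned in this reply, here is a plain Levenshtein distance in Python with a simple threshold filter. It is only a sketch of the concept. It does not reproduce how the ES fuzzy query is implemented, and `edit_distance`, `fuzzy_match`, and the `max_edits` parameter are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_edits=1):
    """Return the terms within max_edits edits of query."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]
```

Note that a swapped pair of letters, as in `serach` for `search`, counts as two edits under this plain definition, so it needs `max_edits=2` to match.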

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
> This is definitely a great approach for a database, but it won't work exactly the same way for an inverted index because the data structure is totally different.
Ah, I was afraid of that. I hoped, due to the field being unanalyzed (and the documentation's noted restriction that wildcard

Re: Trigram-accelerated regex searches

2014-05-22 Thread Erik Rose
Alright, try this on for size. :-) Since the built-in regex-ish filters want to be all clever and index-based, why not use the JS script plugin, which is happy to run as a post-processing phase?
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{ query: {