On Thu, May 22, 2014 at 4:31 PM, Erik Rose grinche...@gmail.com wrote:
Alright, try this on for size. :-)
Since the built-in regex-ish filters want to be all clever and
index-based, why not use the JS script plugin, which is happy to run as a
post-processing phase?
curl -s -XGET
Martijn took a swing at it just now. He eliminated any scoring-based
slowdown, like so (constant_score_filter)…
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
query: {
filtered: {
query: {
match_all: {}
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote:
I'm trying to move Mozilla's source code search engine (dxr.mozilla.org)
from a custom-written SQLite trigram index to ES. In the current production
incarnation, we support fast regex (and, by extension, wildcard) searches
Leading wildcards are really expensive. Maybe you can try creating a copy
of your content field that reverses the tokens using reverse token filter
[1]. By doing this you turn those expensive leading wildcards into
trailing wildcards which should give you better performance. I think your
query
Leading wildcards are really expensive. Maybe you can try creating a copy
of your content field that reverses the tokens using reverse token filter
[1].
Good advice, typically, but notice I have wildcards on either side.
Reversing just makes the trailing wildcard expensive. :-)
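To make the point above concrete, here is a minimal sketch of why reversing helps a pattern with one leading wildcard but not a pattern with wildcards on both sides. It uses plain Python string matching, not Elasticsearch itself; the helper names and the sample terms are made up for illustration.

```python
import fnmatch


def reverse_pattern(pattern: str) -> str:
    """Reverse a simple wildcard pattern (literals and '*' only)."""
    return pattern[::-1]


def has_leading_wildcard(pattern: str) -> bool:
    """True if the pattern starts with a wildcard, the expensive case."""
    return pattern.startswith(("*", "?"))


# Hypothetical search patterns: one with a single leading wildcard,
# one with wildcards on either side.
for pattern in ("*Handler", "*Handler*"):
    flipped = reverse_pattern(pattern)
    print(pattern, "->", flipped,
          "| still leading wildcard:", has_leading_wildcard(flipped))

# Matching the reversed pattern against the reversed term gives the same
# answer as matching the original pattern against the original term.
term = "EventHandler"
assert fnmatch.fnmatchcase(term, "*Handler") == \
    fnmatch.fnmatchcase(term[::-1], reverse_pattern("*Handler"))
```

For "*Handler" the reversed pattern is "reldnaH*", which has a fixed prefix to seek on. For "*Handler*" the reversed pattern is "*reldnaH*", which still begins with a wildcard.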
Aye, and then you can use edit distance on single words (fuzzy query) to
cope with fast typists
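As a rough illustration of what "edit distance on single words" measures, here is a minimal sketch of plain Levenshtein distance in Python. It is not the search engine's implementation, which may count transposed letters differently; the function name and sample words are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    # previous[j] holds the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# A swapped pair of letters counts as two edits under plain Levenshtein.
print(edit_distance("fucntion", "function"))
```

A fuzzy match then amounts to accepting indexed terms whose distance from the typed word is within some small bound.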
On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com
wrote:
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote:
I'm trying to move Mozilla's source code search engine
This is definitely a great approach for a database, but it won't work
exactly the same way for an inverted index because the data structure
is totally different.
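For readers unfamiliar with the trigram approach mentioned earlier in the thread, here is a minimal in-memory sketch in Python, assuming that is the approach being referred to here. It is not DXR's SQLite implementation; the class, method names, and sample lines are hypothetical. The idea is to use trigrams to narrow the candidate lines, then confirm each candidate with a real substring check.

```python
from collections import defaultdict


def trigrams(text: str) -> set:
    """All distinct 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of lines containing it
        self.lines = {}                   # line id -> original text

    def add(self, line_id: int, text: str) -> None:
        self.lines[line_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(line_id)

    def search(self, needle: str) -> list:
        grams = trigrams(needle)
        if grams:
            # Only lines containing every trigram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in grams))
        else:
            # Needles shorter than 3 characters cannot be narrowed down.
            candidates = set(self.lines)
        # Trigram overlap is necessary but not sufficient, so verify.
        return sorted(i for i in candidates if needle in self.lines[i])


index = TrigramIndex()
index.add(1, "void EventHandler::run()")
index.add(2, "int main(void)")
index.add(3, "class KeyHandler {")
print(index.search("Handler"))
```

The substring search here behaves like a pattern with wildcards on both sides, which is the case the thread is concerned with.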
Ah, I was afraid of that. I hoped, due to the field being unanalyzed (and
the documentation's noted restriction that wildcard
Alright, try this on for size. :-)
Since the built-in regex-ish filters want to be all clever and index-based,
why not use the JS script plugin, which is happy to run as a
post-processing phase?
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
query: {