Re: performance of multi_match

2014-06-27 Thread Stephane Bastian
Hello Christoph, Sorry, but I won't be able to provide hints on this one. Hope you'll find a solution. Stephane On 06/26/2014 05:43 PM, Christoph Lingg wrote: Other unexpected results arise due to different queryNorms: for the first result I get a query norm: { value: 0.0059806756,

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Hm, I encounter strange scoring results that I do not understand. I tracked down the scoring and it seems like the 'queryWeight' is missing sometimes. That's what explain gives me for one document: { value: 8.252264, description: weight(collector_1.default.raw:salzburg^18.0 in 11412869)

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Other unexpected results arise due to different queryNorms: for the first result I get a query norm: { value: 0.0059806756, description: queryNorm } For some other documents it's: { value: 0.0031318406, description: queryNorm } The queryNorm is multiplied in to create the score, so it
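One possible explanation for the two queryNorm values is that the documents live on different shards, each computing idf from its own term statistics. The following is a minimal sketch, assuming Lucene's classic TF-IDF similarity, where queryNorm = 1 / sqrt(sumOfSquaredWeights) and each term contributes (idf * boost)^2. The idf and boost numbers are made up for illustration and are not taken from the thread.

```python
import math


def query_norm(term_weights):
    """Classic Lucene queryNorm: 1 / sqrt(sum of squared (idf * boost))."""
    sum_of_squared_weights = sum((idf * boost) ** 2 for idf, boost in term_weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Hypothetical (idf, boost) pairs for the SAME query as seen by two shards.
# Only the idf differs, because each shard has its own document frequencies.
shard_a = [(3.0, 18.0), (4.0, 1.0)]
shard_b = [(6.0, 18.0), (8.0, 1.0)]

norm_a = query_norm(shard_a)
norm_b = query_norm(shard_b)

# Doubling every idf halves the queryNorm, so otherwise comparable
# documents on the two shards end up with different score factors.
print(norm_a, norm_b)
```

If per-shard statistics are indeed the cause, running the search with `search_type=dfs_query_then_fetch` should make the queryNorm identical across shards, since idf is then computed from global term statistics.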

Re: performance of multi_match

2014-06-25 Thread Stephane Bastian
Hello Christoph, Just wanted to add that it would be great if you could report back your findings (good or bad) to the group. We're especially interested in this because we're going to install Photon and would love it to work as fast as possible ;) Stéphane Bastian

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Hi Cedric and Stephane, Thanks for your feedback! Following your ideas I removed any filtering and custom scoring from the query. I do get better results, but the efficiency of multi_match is still not as good as edismax (3 or 4 times slower). I do not understand how multi_match is more

Re: performance of multi_match

2014-06-25 Thread Stephane Bastian
I guess you already know this tool, but just in case you don't: I usually use BigDesk (https://github.com/lukas-vlcek/bigdesk) to check whether something is wrong with the heap size or any of the other metrics it provides (cache size, etc.).

Re: performance of multi_match

2014-06-25 Thread Cédric Hourcade
What about your sharding? Is it the same as with Solr? Did you identify some particular queries being slow? Can you compare the number of results returned between Elasticsearch and Solr? Cédric Hourcade c...@wal.fr On Wed, Jun 25, 2014 at 10:12 AM, Christoph Lingg c.li...@gmail.com wrote: Hi

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Hi Cedric, What about your sharding? Is it the same as with Solr? I have 5 shards without replication (one node). Would it be faster if it were only one shard? Did you identify some particular queries being slow? There is a general trend of all queries being slower, not only some

Re: performance of multi_match

2014-06-25 Thread Cédric Hourcade
What about your sharding? Is it the same as with Solr? I have 5 shards without replication (one node). Would it be faster if it were only one shard? Same with Solr? Did you identify some particular queries being slow? There is a general trend of all queries being slower, not only some

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
What about your sharding? Is it the same as with Solr? I have 5 shards without replication (one node). Would it be faster if it were only one shard? Same with Solr? I didn't use sharding with Solr. Does disabling sharding improve the performance significantly, at least if you

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
It may not be because you have more fields but because your Elasticsearch query matches a lot more documents than the Solr one, that's worth checking. Thanks for that tip, but it's not the case here.

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Disabling sharding shouldn't make any significant difference. Thanks! It would demand some work to isolate these queries. However, I managed to find out the reason why the query takes much longer: the number of queried fields increased from 9 (Solr) to 25 (ES). I thought this had

Re: performance of multi_match

2014-06-25 Thread Stephane Bastian
Thanks. I'm starting to get a better idea of the whole picture ;) Could you also share the query you are running? Do you run the cross_field query against the default field or the 'raw' field? Stéphane Bastian On 06/25/2014 05:07 PM, Christoph Lingg wrote: Disabling sharding shouldn't

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Could you also share the query you are running? Do you run the cross_field query against the default field or the 'raw' field? It looks like this: { function_score: {functions: [{script_score: {script: 1. + 50. * doc['importance'].value}}], boost_mode: sum, score_mode: sum,
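To make the effect of the visible part of that query concrete, here is a small sketch of the arithmetic it implies. It assumes the usual function_score semantics: `score_mode: sum` adds up the function results, and `boost_mode: sum` adds that total to the score of the wrapped query. The rest of the query is cut off in the message, so the input values below are hypothetical.

```python
def combined_score(query_score, importance):
    """Score implied by the function_score fragment quoted above."""
    # script_score: 1. + 50. * doc['importance'].value
    script_score = 1.0 + 50.0 * importance
    # score_mode: sum (only one function here, so the sum is that one value)
    functions_total = sum([script_score])
    # boost_mode: sum (function total is added to the query score)
    return query_score + functions_total


# Hypothetical inputs: a text-relevance score of 2.0 and an importance of 0.5.
print(combined_score(2.0, 0.5))
```

With these settings, a large `importance` value can outweigh the text relevance entirely, which is worth keeping in mind when comparing scores against a plain query.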

Re: performance of multi_match

2014-06-25 Thread Stephane Bastian
Out of curiosity, what kind of performance do you get when you only run the search on '.raw' fields and not regular fields (with edgengram)? Obviously the result of the query will not be the same as before, as the whole word must match if the edgengrams are out of the picture. I had some

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Out of curiosity, what kind of performance do you get when you only run the search on '.raw' fields and not regular fields (with edgengram)? Obviously the result of the query will not be the same as before, as the whole word must match if the edgengrams are out of the picture. I had

Re: performance of multi_match

2014-06-24 Thread Cédric Hourcade
Hello, It seems your Elasticsearch query is doing a lot more: there is custom scoring, some filtering with OR on missing fields, sub queries, more fields, etc. Were you doing exactly the same filtering/scoring with Solr? Can you incrementally test and compare your queries' performance, starting

Re: performance of multi_match

2014-06-24 Thread Stephane Bastian
Hello, It seems to me that the cross_field query does more than the Solr dismax query. To compare the same thing in both ES and Solr, you could run the dis_max query with ES and start from there: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
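As a starting point for such a comparison, a dis_max request body could be assembled roughly as follows. This is only a sketch: the helper name `dis_max_query` is made up, the field `collector_1.default.raw` with boost 18 and the term `salzburg` are borrowed from the explain output elsewhere in this thread, and the second field and its boost are hypothetical.

```python
import json


def dis_max_query(text, field_boosts, tie_breaker=0.0):
    """Build a dis_max body with one boosted match clause per field."""
    queries = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"dis_max": {"queries": queries, "tie_breaker": tie_breaker}}}


body = dis_max_query(
    "salzburg",
    {
        "collector_1.default.raw": 18.0,  # field and boost seen in the explain output
        "collector_1.default": 1.0,       # hypothetical second field
    },
)

# The JSON can be sent to the _search endpoint of the index under test.
print(json.dumps(body, indent=2))
```

Starting with two or three fields and adding more step by step should show how much of the slowdown comes from the field count alone.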