Re: surprising scoring when using multi_match's cross_field

2014-06-30 Thread Christoph Lingg
Hi Stephane, yes, I did try it, but the results did not change. However, I reduced the number of shards from 5 to 1, and now the queryNorm is the same for every document. I learned that every shard is an independent Lucene index, and therefore different weights are likely to occur. However, the
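The following is a minimal sketch of why per-shard statistics diverge. It assumes the classic Lucene DefaultSimilarity idf, `1 + ln(numDocs / (docFreq + 1))`, computed by each shard from its own documents only. The shard sizes and document frequencies are made up for illustration.

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index: 5 shards of 1000 docs each. The term happens to be
# much more frequent on shard 3 than on the others.
shard_stats = [
    {"num_docs": 1000, "doc_freq": 10},
    {"num_docs": 1000, "doc_freq": 12},
    {"num_docs": 1000, "doc_freq": 11},
    {"num_docs": 1000, "doc_freq": 150},
    {"num_docs": 1000, "doc_freq": 9},
]

# Shard-local idf: what each shard computes from its own statistics.
local_idfs = [idf(s["doc_freq"], s["num_docs"]) for s in shard_stats]

# Index-wide idf: what a single shard holding all documents would compute.
total_docs = sum(s["num_docs"] for s in shard_stats)
total_df = sum(s["doc_freq"] for s in shard_stats)
global_idf = idf(total_df, total_docs)

for i, value in enumerate(local_idfs):
    print(f"shard {i}: local idf = {value:.4f} (single-shard idf = {global_idf:.4f})")
```

With one shard there is only one set of statistics, so every document is scored with the same idf (and hence the same queryNorm).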

surprising scoring when using multi_match's cross_field

2014-06-27 Thread Christoph Lingg
Hello! I am using the multi_match query of type cross_fields (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-cross-fields). It works very well and is exactly what I need. However, in some rare circumstances the order of the results doesn't
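For readers unfamiliar with this query type, here is a sketch of a request body for a `multi_match` query with `"type": "cross_fields"`. According to the linked reference, this type analyzes the query string into terms and looks for each term in any of the listed fields, as if they were one big field. The field names and boost below are hypothetical, not taken from the poster's mapping.

```python
import json

def cross_fields_query(text: str, fields: list[str], operator: str = "and") -> dict:
    """Build a search request body for a multi_match query of type cross_fields.

    Field names passed in are illustrative; "^3" is the usual per-field boost syntax.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": fields,
                "operator": operator,  # "and": every term must match in some field
            }
        }
    }

body = cross_fields_query("salzburg", ["city^3", "region", "country"])
print(json.dumps(body, indent=2))
```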

Re: surprising scoring when using multi_match's cross_field

2014-06-27 Thread Christoph Lingg
Another effect I do not understand is the queryNorm, which differs between documents; from reading the documentation I assumed it to be constant. From the Lucene documentation (http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/search/Similarity.html): queryNorm(q) is a normalizing factor
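For reference, the queryNorm definition from that Lucene Similarity page is written out below, together with the classic DefaultSimilarity idf. Here boost(q) is the query boost and boost(t) the per-term boost. Because idf(t) depends on numDocs and docFreq(t), and each shard computes these from its own documents, queryNorm is constant within a shard but can differ between shards. That matches the per-shard differences reported in this thread.

```latex
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}
\qquad
\mathrm{sumOfSquaredWeights} = \mathrm{boost}(q)^2 \cdot \sum_{t \in q} \bigl(\mathrm{idf}(t) \cdot \mathrm{boost}(t)\bigr)^2
\qquad
\mathrm{idf}(t) = 1 + \ln\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}\right)
```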

Re: surprising scoring when using multi_match's cross_field

2014-06-27 Thread Christoph Lingg
Hm, after some investigation it turns out that the queryNorm is related to the shard. I observed that only one of the five shards has a different queryNorm; all the others have equal ones. I will retry with only one shard to see if things get better.

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Hm, I am encountering strange scoring results that I do not understand. I tracked down the scoring, and it seems like the 'queryWeight' is sometimes missing. This is what explain gives me for one document: { value: 8.252264, description: weight(collector_1.default.raw:salzburg^18.0 in 11412869)
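Spotting which term weights lack a queryWeight factor is easier with a small helper than by reading the explain JSON by eye. The sketch below assumes the usual explain node shape (`value`, `description`, `details`). The sample explanation is made up, modeled on classic Lucene TF-IDF explain text. The helper names are hypothetical.

```python
def has_descendant(node: dict, prefix: str) -> bool:
    """True if any node below `node` has a description starting with `prefix`."""
    for child in node.get("details", []):
        if child.get("description", "").startswith(prefix) or has_descendant(child, prefix):
            return True
    return False


def find_weights_without_query_weight(node: dict) -> list[str]:
    """Walk an explain tree and collect descriptions of 'weight(...)' nodes
    whose subtree contains no 'queryWeight' node."""
    found = []
    description = node.get("description", "")
    if description.startswith("weight(") and not has_descendant(node, "queryWeight"):
        found.append(description)
    for child in node.get("details", []):
        found.extend(find_weights_without_query_weight(child))
    return found


# Made-up explanation with one weight node that has a queryWeight and one that does not.
explanation = {
    "value": 3.0,
    "description": "sum of:",
    "details": [
        {"value": 2.0,
         "description": "weight(title:salzburg in 1) [DefaultSimilarity], result of:",
         "details": [{"value": 2.0,
                      "description": "score(doc=1,freq=1.0), product of:",
                      "details": [{"value": 0.5, "description": "queryWeight, product of:"},
                                  {"value": 4.0, "description": "fieldWeight in 1, product of:"}]}]},
        {"value": 1.0,
         "description": "weight(city:salzburg in 1) [DefaultSimilarity], result of:",
         "details": [{"value": 1.0, "description": "fieldWeight in 1, product of:"}]},
    ],
}

print(find_weights_without_query_weight(explanation))
```

One case worth ruling out is stated here as an assumption, from memory of Lucene's classic TF-IDF explain code rather than verified against this setup. When the queryWeight factor evaluates to exactly 1.0, the explanation may be returned without the queryWeight node. An example is a single-term query, where queryNorm = 1/(idf · boost).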

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Other unexpected results arise from different queryNorms. For the first result I get a queryNorm of: { value: 0.0059806756, description: queryNorm } For some other documents it's: { value: 0.0031318406, description: queryNorm } The queryNorm is multiplied into the score, so it
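This sketch assumes the classic Lucene TF-IDF formula, in which each term's score is queryWeight × fieldWeight and queryWeight = idf × boost × queryNorm. It holds idf, boost, and fieldWeight fixed at made-up values. The term score then scales directly with queryNorm, so the two queryNorm values in this message alone would make the same match score about 1.9 times higher for the first result than for the other documents. In practice idf usually differs between shards as well, so this isolates only the queryNorm part.

```python
# The two queryNorm values reported in the explain output above.
QUERY_NORM_FIRST = 0.0059806756
QUERY_NORM_OTHER = 0.0031318406

def term_score(idf: float, boost: float, query_norm: float, field_weight: float) -> float:
    """Classic Lucene TF-IDF term score: queryWeight * fieldWeight,
    where queryWeight = idf * boost * queryNorm."""
    query_weight = idf * boost * query_norm
    return query_weight * field_weight

# Hypothetical idf, boost and fieldWeight, identical for both documents,
# so that only queryNorm differs between them.
IDF, BOOST, FIELD_WEIGHT = 5.5, 18.0, 2.0
score_first = term_score(IDF, BOOST, QUERY_NORM_FIRST, FIELD_WEIGHT)
score_other = term_score(IDF, BOOST, QUERY_NORM_OTHER, FIELD_WEIGHT)

ratio = QUERY_NORM_FIRST / QUERY_NORM_OTHER
print(f"queryNorm ratio: {ratio:.3f}")
print(f"score ratio:     {score_first / score_other:.3f}")
```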

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Hi Cedric and Stephane, Thanks for your feedback! Following your ideas, I removed all filtering and custom scoring from the query. I do get better results, but multi_match is still not as fast as edismax (3 or 4 times slower). I do not understand how multi_match is more

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Hi Cedric, What about your sharding? Is it the same as with Solr? I have 5 shards without replication (one node). Would it be faster if it were only one shard? Did you identify some particular queries being slow? There is a general trend of all queries being slower, not only some

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
What about your sharding? Is it the same as with Solr? I have 5 shards without replication (one node). Would it be faster if it were only one shard? Same with Solr? I didn't use sharding with Solr. Does disabling sharding improve the performance significantly, at least if you
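Trying the single-shard setup discussed here means creating a new index with one primary shard. The sketch below builds such a request body. In Elasticsearch versions of that era, `number_of_shards` is fixed when an index is created, so the existing 5-shard index would have to be reindexed into the new one. The body would be sent with something like `PUT /places_single_shard`, where the index name is hypothetical.

```python
import json

def single_shard_index_body(replicas: int = 0) -> dict:
    """Index-creation body with exactly one primary shard.

    replicas defaults to 0, matching the one-node, no-replication setup
    described in the thread.
    """
    return {"settings": {"number_of_shards": 1, "number_of_replicas": replicas}}

print(json.dumps(single_shard_index_body()))
```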

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
It may not be because you have more fields but because your Elasticsearch query matches a lot more documents than the Solr one; that's worth checking. Thanks for that tip, but it's not the case here.

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Disabling sharding shouldn't make any significant difference. Thanks! It would take some work to isolate these queries. However, I managed to find out why the query takes much longer: the number of queried fields increased from 9 (Solr) to 25 (ES). I thought this had
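A rough back-of-the-envelope estimate of why the field count matters follows. It assumes a term-centric expansion in which every query term is looked up in every queried field. It ignores posting-list sizes, analyzers, and caching, so it is only an order-of-magnitude sketch. The 3-term query is hypothetical.

```python
def leaf_term_queries(num_terms: int, num_fields: int) -> int:
    """Rough count of per-field term lookups when every query term
    is searched in every field."""
    return num_terms * num_fields

# Hypothetical 3-term query, at the field counts mentioned in the thread.
for fields in (9, 25):
    print(fields, "fields ->", leaf_term_queries(3, fields), "term lookups")
```

Going from 9 to 25 fields multiplies the number of lookups by 25/9, roughly 2.8, regardless of how many terms the query has.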

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Could you also share the query you are running? Do you run the cross_fields query against the default field or the 'raw' field? It looks like this: { function_score: {functions: [{script_score: {script: 1. + 50. * doc['importance'].value}}], boost_mode: sum, score_mode: sum,
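The sketch below shows how the quoted `function_score` combines scores. With `score_mode: sum` the function results are summed, and with `boost_mode: sum` that total is added to the query score. It mirrors the single `script_score` shown above. It assumes the truncated remainder of the query (function weights, `max_boost`, and so on) does not change this. The helper names are hypothetical.

```python
def script_score(importance: float) -> float:
    """Mirrors the quoted script: 1. + 50. * doc['importance'].value"""
    return 1.0 + 50.0 * importance

def function_score_sum(query_score: float, importance: float) -> float:
    """function_score with score_mode=sum (only one function here, so the
    sum is just its value) and boost_mode=sum (added to the query score)."""
    function_total = script_score(importance)
    return query_score + function_total
```

The additive term grows by 50 per unit of `importance`, independently of the text score. Its size relative to the query score therefore decides how much the text match, including any per-shard queryNorm effect, still influences the ordering.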

Re: performance of multi_match

2014-06-25 Thread Christoph Lingg
Out of curiosity, what kind of performance do you get when you run the search only on the '.raw' fields and not on the regular fields (with edge ngrams)? Obviously the results will not be the same as before, since the whole world should match if the edge ngrams are out of the picture. I had
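Edge-ngram fields cost more than '.raw' fields because an edge n-gram filter turns each token into all of its prefixes within a size range. The sketch below illustrates this. The `min_gram`/`max_gram` values are assumptions, not the poster's analyzer settings. With `min_gram=2`, "salzburg" becomes 7 indexed terms instead of 1. If the query side is analyzed the same way (also an assumption, depending on the search analyzer), short prefixes such as "sa" can match many documents.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Prefixes of `token` from min_gram to max_gram characters long,
    as a front-side edge n-gram token filter would emit them."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("salzburg", min_gram=2))
# ['sa', 'sal', 'salz', 'salzb', 'salzbu', 'salzbur', 'salzburg']
```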