Re: a bug of solr distributed search

2010-10-27 Thread Toke Eskildsen
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote: And a third potential reason - it's arguably a feature instead of a bug for some applications. Depending on how I organize my shards, "give me the most relevant document from each shard for this search" seems like it could be useful. You can

Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer
Andrzej Bialecki wrote: On 2010-10-25 11:22, Toke Eskildsen wrote: On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But it shows a problem of distributed search without common idf. A doc will get a different score in different shards. Bingo. I really don't understand why this fundamental problem

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But it shows a problem of distributed search without common idf. A doc will get a different score in different shards. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice use

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 11:22, Toke Eskildsen wrote: On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But it shows a problem of distributed search without common idf. A doc will get a different score in different shards. Bingo. I really don't understand why this fundamental problem with sharding isn't

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query rewritten with the global IDF-s). This
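The two-call idea quoted above can be sketched roughly as follows. This is a minimal illustration, not Solr code: `ShardTermStats` and `GlobalIdf` are hypothetical names, the first call is assumed to return each shard's document frequency and document count for a term (rather than a finished per-shard IDF), and the formula is assumed to be the classic Lucene-style `1 + ln(numDocs / (docFreq + 1))`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-shard statistics for one query term (not a Solr class).
final class ShardTermStats {
    final long docFreq;  // documents in this shard that contain the term
    final long numDocs;  // total documents in this shard

    ShardTermStats(long docFreq, long numDocs) {
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }
}

final class GlobalIdf {
    // Assumed classic TF-IDF style formula: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // "First call": sum the raw counts from every shard, then derive one IDF.
    // The "second call" would send this value along with the rewritten query.
    static double globalIdf(List<ShardTermStats> perShard) {
        long docFreq = 0;
        long numDocs = 0;
        for (ShardTermStats s : perShard) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        ShardTermStats a = new ShardTermStats(1, 1000);    // term is rare here
        ShardTermStats b = new ShardTermStats(499, 1000);  // term is common here
        System.out.println("shard A local idf: " + idf(a.docFreq, a.numDocs));
        System.out.println("shard B local idf: " + idf(b.docFreq, b.numDocs));
        System.out.println("global idf:        " + globalIdf(Arrays.asList(a, b)));
    }
}
```

With purely local statistics the same term is weighted very differently on the two shards, which is why the same document can score differently depending on where it lives; summing the counts first gives every shard the same weight to score with.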

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 13:37, Toke Eskildsen wrote: On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query

Re: a bug of solr distributed search

2010-07-26 Thread MitchK
-- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p995407.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
Where is the link to this patch? 2010/7/24 Yonik Seeley yo...@lucidimagination.com: On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we not send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
The Solr version I used is 1.4. 2010/7/26 Li Li fancye...@gmail.com: Where is the link to this patch? 2010/7/24 Yonik Seeley yo...@lucidimagination.com: On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we not send the output of TermsComponent of every node in the

Re: a bug of solr distributed search

2010-07-24 Thread MitchK
distributed IDF (like at the mentioned JIRA-issue) to normalize your results' scoring. But the mentioned problem at this mailing-list-posting has nothing to do with that... Regards - Mitch

Re: a bug of solr distributed search

2010-07-23 Thread MitchK

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we not send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we do not even need

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
other suggestions? -Yonik http://www.lucidimagination.com

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they?

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK mitc...@web.de wrote: That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they? Documents aren't supposed to be duplicated across shards... so the presence

Re: a bug of solr distributed search

2010-07-22 Thread Yonik Seeley
As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other
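One way to picture the trade-off described above is to sidestep removal from the queue altogether: collapse duplicates by best score first, then rank once. The sketch below is a swapped-in illustration of that idea, not the rebuild approach mentioned in the message and not Solr's implementation. `ScoredDoc` and `BestScoreMerge` are hypothetical names, and it assumes all shard hits can be held in memory before ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one hit returned by a shard (not a Solr class).
final class ScoredDoc {
    final String id;
    final String shard;
    final float score;

    ScoredDoc(String id, String shard, float score) {
        this.id = id;
        this.shard = shard;
        this.score = score;
    }
}

final class BestScoreMerge {
    // Keeps the highest-scoring hit per id, then returns the top n by score.
    static List<ScoredDoc> topN(List<ScoredDoc> hits, int n) {
        Map<String, ScoredDoc> best = new HashMap<>();
        for (ScoredDoc hit : hits) {
            ScoredDoc prev = best.get(hit.id);
            if (prev == null || hit.score > prev.score) {
                best.put(hit.id, hit);  // first sighting, or a better duplicate
            }
        }
        List<ScoredDoc> merged = new ArrayList<>(best.values());
        merged.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = Arrays.asList(
            new ScoredDoc("doc_X", "shard_A", 0.2f),
            new ScoredDoc("doc_Y", "shard_A", 0.5f),
            new ScoredDoc("doc_X", "shard_B", 0.9f));
        for (ScoredDoc d : topN(hits, 10)) {
            System.out.println(d.id + " from " + d.shard + " score " + d.score);
        }
    }
}
```

The cost is the extra map over every returned hit, which is the kind of overhead the message is wary of, and the comparison is only meaningful if scores from different shards are comparable in the first place.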

Re: a bug of solr distributed search

2010-07-22 Thread Chris Hostetter
: As the comments suggest, it's not a bug, but just the best we can do : for now since our priority queues don't support removal of arbitrary FYI: I updated the DistributedSearch wiki to be more clear about this -- it previously didn't make it explicitly clear that docIds were supposed to be

a bug of solr distributed search

2010-07-21 Thread Li Li
in QueryComponent.mergeIds. It removes any document whose uniqueKey duplicates another's. The current implementation uses the first one encountered. String prevShard = uniqueDoc.put(id, srsp.getShard()); if (prevShard != null) { // duplicate detected
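To make the reported behaviour concrete, here is a minimal, self-contained sketch of a "first encountered wins" merge built around the same `uniqueDoc.put(...)` idiom. It is an illustration only, not the actual `QueryComponent.mergeIds` code: `ShardHit` and `FirstSeenMerge` are hypothetical names, and hits are assumed to arrive as one flat list in shard-response order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one hit returned by a shard (not a Solr class).
final class ShardHit {
    final String id;
    final String shard;
    final float score;

    ShardHit(String id, String shard, float score) {
        this.id = id;
        this.shard = shard;
        this.score = score;
    }
}

final class FirstSeenMerge {
    // Keeps the first hit seen for each id and drops later duplicates,
    // regardless of which copy has the higher score.
    static List<ShardHit> merge(List<ShardHit> hitsInArrivalOrder) {
        Map<String, String> uniqueDoc = new HashMap<>();
        List<ShardHit> kept = new ArrayList<>();
        for (ShardHit hit : hitsInArrivalOrder) {
            String prevShard = uniqueDoc.put(hit.id, hit.shard);
            if (prevShard != null) {
                // duplicate detected: remember the original shard, skip this hit
                uniqueDoc.put(hit.id, prevShard);
                continue;
            }
            kept.add(hit);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ShardHit> hits = Arrays.asList(
            new ShardHit("doc_X", "shard_A", 0.2f),
            new ShardHit("doc_X", "shard_B", 0.9f));
        for (ShardHit h : merge(hits)) {
            System.out.println(h.id + " from " + h.shard + " score " + h.score);
        }
    }
}
```

In this sketch the copy from `shard_A` survives even though the `shard_B` copy scores higher, which is the ranking effect the post complains about.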

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
Li Li, this is the intended behaviour, not a bug. Otherwise you could get back the same record several times in a response, which may not be intended by the user. Kind regards, - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
not be intended by the user. Kind regards, - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
regards, - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
you can't prevent this without custom coding or making a document's occurrence unique. Kind regards, - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread MitchK

Re: a bug of solr distributed search

2010-07-21 Thread Siva Kommuri
How about sorting by score? Would that be possible? On Jul 21, 2010, at 12:13 AM, Li Li wrote: in QueryComponent.mergeIds. It removes any document whose uniqueKey duplicates another's. The current implementation uses the first one encountered. String prevShard =

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
sees doc_X first at shard_A and ignores it at shard_B. That means the doc might appear on page 10 in pagination, although it *should* appear on page 1 or 2. Kind regards, - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
first at shard_A and ignores it at shard_B. That means the doc might appear on page 10 in pagination, although it *should* appear on page 1 or 2. Kind regards, - Mitch