Personally, although I understand the rationale and performance
ramifications of the current approach of including deleted documents, I
would agree that DF and IDF should definitely be accurate, despite
deletions. So, if they aren't, I'd suggest filing a Jira issue. Granted, it
might be rejected as "by design," "won't fix," or "improvement," but it's
worth having the discussion.
Maybe one theory from the old days is that the model of "batch update" would
by definition include an optimize step. But now with Solr considered by some
to be a "NoSQL database" and with (near) real-time updates, that model is
clearly obsolete.
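[Editor's note: for illustration, here is a minimal Python sketch of the classic Lucene idf formula (the DefaultSimilarity used by Solr 4.x computes idf = 1 + ln(numDocs / (docFreq + 1))), showing why a deleted-but-not-yet-merged-away copy of a document, which still counts toward docFreq, depresses the score of its id term. The document counts are made up for the example.]

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF idf (DefaultSimilarity in Solr 4.x):
    idf = 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# With 1,000,000 docs: a truly unique id has docFreq == 1, but if a
# deleted older copy is still counted, docFreq == 2 and the idf
# (and hence the tf-idf score) drops.
unique_idf = idf(1, 1_000_000)
stale_idf = idf(2, 1_000_000)
print(unique_idf > stale_idf)  # → True: stale docFreq yields a lower idf
```

This is why two ids that should score identically (apart from their boosts) can end up out of order: the one whose docFreq is inflated by a deleted duplicate gets a smaller idf.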
-- Jack Krupansky
-----Original Message-----
From: Apoorva Gaurav
Sent: Tuesday, June 17, 2014 11:15 AM
To: solr-user ; Ahmet Arslan
Subject: Re: docFreq coming to be more than 1 for unique id field
Yes, we have updates on these. We haven't tried optimizing; we'll do that.
But isn't the unique field supposed to be unique?
On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:
Hi,
Just a guess, do you have deletions? What happens when you optimize and
re-try?
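[Editor's note: the suggested optimize can be triggered over HTTP; this is a sketch assuming a local Solr 4.x instance at localhost:8983 with a hypothetical core named "mycore" — adjust host and core name to your deployment.]

```shell
# Force-merge the index so deleted documents are purged and
# stop inflating docFreq for the unique id terms.
curl 'http://localhost:8983/solr/mycore/update?optimize=true&waitSearcher=true'
```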
On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav <
apoorva.gau...@myntra.com> wrote:
Hello All,
We are using Solr 4.4.0. We have a uniqueKey field of type solr.StrField.
We need to extract docs in a pre-defined order if they match a certain
condition. Our query is of the format
uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ..... OR idN ^ weightN)
where weight1 > weight2 > ........ > weightN
But the results are not in the desired order. On debugging the query we've
found that for some of the documents docFreq is higher than 1, and hence
their tf-idf based score is lower than the others'. What could be the reason
for a unique id field having a docFreq greater than 1? How can we prevent it?
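[Editor's note: the boosted-query pattern described above can be sketched as a small, hypothetical Python helper; the field name, ids, and helper function are illustrative only. It assigns strictly decreasing integer boosts so that, with accurate idf, results come back in the requested order.]

```python
def ordered_id_query(field: str, ids: list[str]) -> str:
    """Build a Solr boost query like field:(id1^3 OR id2^2 OR id3^1),
    with strictly decreasing weights in the order the ids are given."""
    n = len(ids)
    clauses = [f"{doc_id}^{n - i}" for i, doc_id in enumerate(ids)]
    return f"{field}:({' OR '.join(clauses)})"

print(ordered_id_query("uniqueField", ["id1", "id2", "id3"]))
# → uniqueField:(id1^3 OR id2^2 OR id3^1)
```

Note that, as this thread discusses, the decreasing boosts only guarantee the order if the idf of each id term is the same — which is exactly what inflated docFreq from deleted documents breaks.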
--
Thanks & Regards,
Apoorva