Personally, although I understand the rationale and the performance ramifications of the current approach of including deleted documents, I agree that DF and IDF should be accurate despite deletions. So, if they aren't, I'd suggest filing a Jira bug. Granted, it might be rejected as "by design", "won't fix", or "improvement", but it's worth having the discussion.

Maybe one theory from the old days is that the "batch update" model would, by definition, include an optimize step. But now, with Solr considered by some to be a "NoSQL database" and offering (near) real-time updates, that model is clearly obsolete.

-- Jack Krupansky

-----Original Message----- From: Apoorva Gaurav
Sent: Tuesday, June 17, 2014 11:15 AM
To: solr-user ; Ahmet Arslan
Subject: Re: docFreq coming to be more than 1 for unique id field

Yes, we have updates on these. We didn't try optimizing; we will. But isn't
the unique field supposed to be unique?


On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

Hi,

Just a guess: do you have deletions? What happens when you optimize and
retry?
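For reference, a sketch of triggering an optimize from the command line. The host, port, and core name (collection1) are assumptions for a default single-core setup; adjust to your deployment. Optimize forces a full merge, which physically expunges deleted documents so docFreq reflects only live ones:

```shell
# Hypothetical default host/core; waitSearcher=true blocks until
# a new searcher sees the merged (deletion-free) index.
curl "http://localhost:8983/solr/collection1/update?optimize=true&waitSearcher=true"
```

Note that optimize rewrites the whole index and is expensive; on large indexes it is usually reserved for off-peak or batch windows.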



On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav <
apoorva.gau...@myntra.com> wrote:
Hello All,

We are using Solr 4.4.0. We have a uniqueKey of type solr.StrField. We need
to extract docs in a pre-defined order if they match a certain condition.
Our query is of the format

uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ..... OR idN ^ weightN)
where weight1 > weight2 > ........ > weightN

But the results are not in the desired order. On debugging the query, we've
found that for some of the documents docFreq is higher than 1, and hence their tf-idf-based score is lower than others'. What can be the reason behind
a unique id field having docFreq greater than 1? How can we prevent it?
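To illustrate why an inflated docFreq flips the ordering, here is a simplified sketch (not Solr's exact code path: query norm, tf, coord, and field norms are all omitted). Lucene 4.x's classic similarity computes idf as log(numDocs / (docFreq + 1)) + 1, and the idf enters the boosted term's score roughly twice (once in the query weight, once in the field weight). So an old, deleted-but-unmerged copy of a document that bumps its id's docFreq from 1 to 2 shrinks idf² enough that a lower-boosted id can outscore a higher-boosted one:

```python
import math

def idf(num_docs, doc_freq):
    # Lucene classic (TFIDFSimilarity) idf: log(numDocs / (docFreq + 1)) + 1
    return math.log(num_docs / (doc_freq + 1)) + 1

def score(boost, num_docs, doc_freq):
    # Simplified: idf appears in both query weight and field weight,
    # so the boosted term score is roughly boost * idf^2
    # (tf, query norm, coord, and field norms omitted).
    return boost * idf(num_docs, doc_freq) ** 2

num_docs = 1_000_000
# id1 carries the higher boost, but a stale deleted copy makes its
# docFreq 2 instead of 1; id2 has the expected docFreq of 1.
s1 = score(10.0, num_docs, 2)
s2 = score(9.5, num_docs, 1)

print(s1 < s2)  # True: the lower-boosted id2 outscores id1
```

This is exactly why optimizing (which purges deleted documents and restores docFreq to 1) fixes the ordering; alternatively, a query that bypasses tf-idf scoring for the id field avoids the problem entirely.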

--
Thanks & Regards,
Apoorva




--
Thanks & Regards,
Apoorva
