Thanks Shawn, you are indeed correct: these are NRT replicas! Thanks very much 
for the advice and possible resolutions. I went down the NRT path because in the 
past I've read advice from some of the Solr gurus recommending these replica 
types unless you have a very good reason not to use them. I do have basic auth 
enabled on my SolrCloud configuration and believe I can't use PULL replicas 
until the following JIRA is resolved 
(https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-11904), as 
Solr uses the index replicator for this process. With this being the case, I'll 
attempt your second suggestion and see how I go. Thanks again for taking the 
time to look at this; it really was a confusing one to debug. Have a great 
weekend, fellow Solr users, and happy Solr-ing.

Dwane
________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Friday, 29 November 2019 4:51 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Cursor mark page duplicates

On 11/28/2019 1:30 AM, Dwane Hall wrote:
> I asked a question on the forum a couple of weeks ago regarding cursorMark 
> duplicates.  I initially thought it may be due to HDFSCaching because I was 
> unable to replicate the issue on local indexes, but unfortunately the dreaded 
> duplicates have returned!! As a refresher, I was seeing what I thought were 
> duplicate documents appearing randomly on the last page of one cursor, and 
> the first page of the next.  So if rows=50 the duplicates are document 50 on 
> page 1 and document 1 on page 2.
>
> After further investigation I don't actually believe these documents are 
> duplicates but the same document being returned from a different replica on 
> each page.  After running a diff on the two documents the only difference is 
> the field "Solr_Update_Date" which I insert on each document as it is 
> inserted into the corpus.
>
> This is how the managed-schema mapping for this field looks
>
> <field name="Solr_Update_Date" type="rdate" indexed="true" stored="true" 
> default="NOW" />
This can happen with SolrCloud using NRT replicas.  The default replica
type is NRT.  Based on the core names returned by the [shard] field in
your responses, it looks like you do have NRT replicas.

There are two solutions.  The better solution is to use
TimestampUpdateProcessorFactory for setting your timestamp field instead
of a default of NOW in the schema.  An alternate solution is to use
TLOG/PULL replica types instead of NRT -- that way replicas are
populated by copying exact index contents instead of independently indexing.
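
As a rough illustration of the first suggestion, a solrconfig.xml update
chain along the following lines could set the timestamp once, before the
document is distributed to replicas, so every NRT replica indexes the same
value. This is only a sketch: the chain name "add-timestamp" is a made-up
placeholder, the field name is taken from the schema snippet quoted above,
and it assumes the default="NOW" attribute is also removed from that field
in the managed-schema so the two mechanisms don't overlap.

```xml
<!-- Sketch only: "add-timestamp" is a hypothetical chain name. -->
<updateRequestProcessorChain name="add-timestamp" default="true">
  <!-- Runs before distribution, so the value is assigned once and
       forwarded unchanged to every replica. -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">Solr_Update_Date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The ordering is the point of the sketch: the timestamp processor sits ahead
of DistributedUpdateProcessorFactory, whereas a schema default of NOW is
evaluated separately on each replica as it indexes the document.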

Thanks,
Shawn
