Thanks Shawn, you are indeed correct: these are NRT replicas! Thanks very much for the advice and possible resolutions. I went down the NRT path because in the past I've read advice from some of the Solr gurus recommending this replica type unless you have a very good reason not to use it. I do have basic auth enabled on my SolrCloud configuration, and I believe I can't use PULL replicas until the following JIRA is resolved (https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-11904), as Solr uses the index replicator for this process. That being the case, I'll attempt your second suggestion and see how I go.

Thanks again for taking the time to look at this; it really was a confusing one to debug. Have a great weekend, fellow Solr users, and happy Solr-ing.
Dwane
________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Friday, 29 November 2019 4:51 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Cursor mark page duplicates

On 11/28/2019 1:30 AM, Dwane Hall wrote:
> I asked a question on the forum a couple of weeks ago regarding cursorMark
> duplicates. I initially thought it may be due to HDFSCaching because I was
> unable to replicate the issue on local indexes but unfortunately the dreaded
> duplicates have returned!! For a refresher I was seeing what I thought were
> duplicate documents appearing randomly on the last page of one cursor, and
> the first page of the next. So if rows=50 the duplicates are document 50 on
> page 1 and document 1 on page 2.
>
> After further investigation I don't actually believe these documents are
> duplicates but the same document being returned from a different replica on
> each page. After running a diff on the two documents the only difference is
> the field "Solr_Update_Date" which I insert on each document as it is
> inserted into the corpus.
>
> This is how the managed-schema mapping for this field looks
>
> <field name="Solr_Update_Date" type="rdate" indexed="true" stored="true"
> default="NOW" />

This can happen with SolrCloud using NRT replicas. The default replica type is NRT. Based on the core names returned by the [shard] field in your responses, it looks like you do have NRT replicas.

There are two solutions. The better solution is to use TimestampUpdateProcessorFactory for setting your timestamp field instead of a default of NOW in the schema. An alternate solution is to use TLOG/PULL replica types instead of NRT -- that way replicas are populated by copying exact index contents instead of independently indexing.

Thanks,
Shawn
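
For reference, the first solution mentioned above (TimestampUpdateProcessorFactory instead of default="NOW") might look roughly like the fragment below. This is a sketch under assumptions, not configuration taken from this thread: the chain name "add-timestamp" is a placeholder, and the remaining processors are simply the usual Log/Distributed/Run set. The field name and type are copied from the schema line quoted above. The point of the arrangement is that the timestamp processor sits before DistributedUpdateProcessorFactory, so the value is set once and then forwarded to every replica, rather than each NRT replica evaluating NOW on its own.

```xml
<!-- solrconfig.xml (sketch): "add-timestamp" is a hypothetical chain name -->
<updateRequestProcessorChain name="add-timestamp" default="true">
  <!-- Runs before distribution, so the timestamp is computed a single time -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">Solr_Update_Date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- managed-schema (sketch): same field as quoted above, with default="NOW" removed -->
<field name="Solr_Update_Date" type="rdate" indexed="true" stored="true" />
```

Documents indexed before such a change would still carry the differing per-replica values, so a reindex is likely needed before the cursorMark pages line up.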