Sounds like you've explicitly routed the same document to two different shards. Document replacement only happens locally within a shard, so having documents with the same ID on two different shards is exactly why you're getting duplicate documents.
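A toy sketch of that failure mode (illustrative only, not Solr code — the shard names and helper functions here are made up): each shard replaces by uniqueKey only within its own index, so if the same ID is routed to two shards, a distributed query merges both copies into the result set.

```python
# Toy model of shard-local document replacement (not Solr internals).
shards = {"shard1": {}, "shard2": {}}

def index(shard, doc):
    # Replacement by uniqueKey happens only inside the shard that
    # receives the document.
    shards[shard][doc["id"]] = doc

def distributed_count():
    # A distributed query merges hits from every shard; it does not
    # deduplicate by uniqueKey across shards.
    return sum(len(s) for s in shards.values())

# Explicitly routing the same document to two different shards...
index("shard1", {"id": "doc42", "version": 1})
index("shard2", {"id": "doc42", "version": 2})

print(distributed_count())    # -> 2 : doc42 is counted twice
print(len(shards["shard1"]))  # -> 1 : the distrib=false view of one shard
```

This is consistent with what you saw: with `distrib=false` each shard looks clean on its own, and the duplicates only appear when results are merged.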
Best,
Erick

On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz <mitxin...@gmail.com> wrote:
> We are currently using version 4.2.
> We have made tests with a single document and it gives us a count of 2
> documents. But if we force it to shard onto the first machine, the one
> with a single shard, the count gives us 1 document.
> I've tried using the distrib=false parameter; it gives us no duplicate
> documents, but the same document appears to be in two different shards.
>
> Finally, about the separate directories: we have only one data directory
> per physical machine and collection, and I don't see any subfolder for
> the different shards.
>
> Is it possible that we have something wrong with the dataDir
> configuration to use multiple shards on one machine?
>
> <dataDir>${solr.data.dir:}</dataDir>
> <directoryFactory name="DirectoryFactory"
>     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
>
> 2013/5/3 Erick Erickson <erickerick...@gmail.com>
>
>> What version of Solr? The custom routing stuff is quite new, so
>> I'm guessing 4.x?
>>
>> But this shouldn't be happening. The actual index data for the
>> shards should be in separate directories; they just happen to
>> be on the same physical machine.
>>
>> Try querying each one with &distrib=false to see the counts
>> from single shards; that may shed some light on this. It vaguely
>> sounds like you have indexed the same document to both shards
>> somehow...
>>
>> Best,
>> Erick
>>
>> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
>> <mitxin...@gmail.com> wrote:
>> > Hi,
>> > We currently have a SolrCloud implementation running 5 shards on 3
>> > physical machines, so the first machine has shard number 1, the
>> > second machine shards 2 & 4, and the third shards 3 & 5. We noticed
>> > that while querying, numFound decreased when we increased the start
>> > param. After some investigation we found that the documents in
>> > shards 2 to 5 were being counted twice.
>> > Querying shard 2 will give you back the results for shards 2 & 4,
>> > and the same goes for shards 3 & 5. Our guess is that the physical
>> > index for shards 2 & 4 is shared, so the shards don't know which
>> > part of it belongs to each one.
>> > The uniqueKey is correctly defined, and we have tried using a shard
>> > prefix (shard1!docID).
>> >
>> > Is there any way to solve this problem when a single physical
>> > machine hosts several shards?
>> > Is it a "real" problem, or does it just affect facets & numResults?
>> >
>> > Thanks
>> > Iker
>> >
>> > --
>> > /** @author imartinez */
>> > Person me = *new* Developer();
>> > me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
>> > me.setTwit("@mitxino77 <https://twitter.com/mitxino77>");
>> > me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"});
>> > me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
>> > me.setWebs({*urbasaabentura.com, ikertxef.com*});
>> > *return* me;
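On the dataDir question: with `<dataDir>${solr.data.dir:}</dataDir>` and an empty default, each core normally falls back to `data/` under its own instanceDir, so two cores only collide if they also share an instanceDir or if `solr.data.dir` is set to one fixed path for the whole machine. A minimal sketch of a legacy-style 4.x solr.xml giving each core on the same machine its own dataDir (all names and paths here are illustrative, not taken from the thread):

```xml
<!-- solr.xml sketch: two shards of the same collection on one machine,
     each core with its own instanceDir and dataDir (paths illustrative) -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="collection1_shard2" instanceDir="collection1_shard2"
          dataDir="/var/solr/data/collection1_shard2"/>
    <core name="collection1_shard4" instanceDir="collection1_shard4"
          dataDir="/var/solr/data/collection1_shard4"/>
  </cores>
</solr>
```

If both cores resolve to the same index directory, you would see exactly the symptom described above: one physical index, and every shard-level query over it returning the other shard's documents too.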