Having multiple cores point to the same index is, except for special circumstances where one of the cores is guaranteed to be read only, a Bad Thing.
So it sounds like you've found your issue...

Best
Erick

On Mon, May 6, 2013 at 4:44 AM, Iker Mtnz. Apellaniz <mitxin...@gmail.com> wrote:
> Thanks Erick,
> I think we found the problem. When defining the cores for both shards we
> define both of them in the same instanceDir, like this:
> <core schema="schema.xml" shard="shard2" instanceDir="1_collection/"
> name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> <core schema="schema.xml" shard="shard4" instanceDir="1_collection/"
> name="1_collection" config="solrconfig.xml" collection="1_collection"/>
>
> Each shard should have its own folder, so the final configuration should
> be like this:
> <core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/"
> name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> <core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/"
> name="1_collection" config="solrconfig.xml" collection="1_collection"/>
>
> Can anyone confirm this?
>
> Thanks,
> Iker
>
> 2013/5/4 Erick Erickson <erickerick...@gmail.com>
>
>> Sounds like you've explicitly routed the same document to two
>> different shards. Document replacement only happens locally to a
>> shard, so the fact that you have documents with the same ID on two
>> different shards is why you're getting duplicate documents.
>>
>> Best
>> Erick
>>
>> On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz
>> <mitxin...@gmail.com> wrote:
>> > We are currently using version 4.2.
>> > We have made tests with a single document and it gives us a 2 document
>> > count. But if we force it to shard into the first machine, the one with a
>> > unique shard, the count gives us 1 document.
>> > I've tried using the distrib=false parameter; it gives us no duplicate
>> > documents, but the same document appears to be in two different shards.
>> >
>> > Finally, about the separate directories, we have only one directory for
>> > the data in each physical machine and collection, and I don't see any
>> > subfolder for the different shards.
>> >
>> > Is it possible that we have something wrong with the dataDir
>> > configuration to use multiple shards in one machine?
>> >
>> > <dataDir>${solr.data.dir:}</dataDir>
>> > <directoryFactory name="DirectoryFactory"
>> > class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>> >
>> > 2013/5/3 Erick Erickson <erickerick...@gmail.com>
>> >
>> >> What version of Solr? The custom routing stuff is quite new so
>> >> I'm guessing 4x?
>> >>
>> >> But this shouldn't be happening. The actual index data for the
>> >> shards should be in separate directories, they just happen to
>> >> be on the same physical machine.
>> >>
>> >> Try querying each one with &distrib=false to see the counts
>> >> from single shards, that may shed some light on this. It vaguely
>> >> sounds like you have indexed the same document to both shards
>> >> somehow...
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
>> >> <mitxin...@gmail.com> wrote:
>> >> > Hi,
>> >> > We currently have a SolrCloud implementation running 5 shards on 3
>> >> > physical machines, so the first machine will have shard number 1, the
>> >> > second machine shards 2 & 4, and the third shards 3 & 5. We noticed
>> >> > that while querying, numFoundDocs decreased when we increased the
>> >> > start param.
>> >> > After some investigation we found that the documents in shards 2 to 5
>> >> > were being counted twice. Querying shard 2 will give you back the
>> >> > results for shards 2 & 4, and the same thing for shards 3 & 5. Our
>> >> > guess is that the physical index for both shards 2 & 4 is shared, so
>> >> > the shards don't know which part of it is for each one.
>> >> > The uniqueKey is correctly defined, and we have tried using a shard
>> >> > prefix (shard1!docID).
>> >> >
>> >> > Is there any way to solve this problem when a single physical machine
>> >> > hosts several shards?
>> >> > Is it a "real" problem, or does it just affect facet & numResults?
>> >> >
>> >> > Thanks
>> >> > Iker
>> >> >
>> >> > --
>> >> > /** @author imartinez*/
>> >> > Person me = *new* Developer();
>> >> > me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
>> >> > me.setTwit("@mitxino77 <https://twitter.com/mitxino77>");
>> >> > me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"]});
>> >> > me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
>> >> > me.setWebs({*urbasaabentura.com, ikertxef.com*});
>> >> > *return* me;
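
For anyone who lands on this thread with the same symptom, here is a rough sketch of how the misconfiguration discussed above could be spotted mechanically: it parses a solr.xml-style fragment and reports every instanceDir claimed by more than one core. The `<core>` attributes are the ones quoted in the thread; the `<cores>` wrapper element, the function name `shared_instance_dirs` and the two sample strings are assumptions made for illustration, so adapt them to your actual file.

```python
import posixpath
import xml.etree.ElementTree as ET
from collections import defaultdict


def shared_instance_dirs(solr_xml: str) -> dict:
    """Return {instanceDir: [shards]} for every instanceDir used by 2+ cores."""
    root = ET.fromstring(solr_xml)
    by_dir = defaultdict(list)
    # iter() walks the whole tree, so the nesting depth of <core> does not matter.
    for core in root.iter("core"):
        # Normalize so "1_collection/" and "1_collection" compare equal.
        instance_dir = posixpath.normpath(core.get("instanceDir", ""))
        # Label each core by its shard, falling back to its name.
        by_dir[instance_dir].append(core.get("shard", core.get("name", "?")))
    return {d: shards for d, shards in by_dir.items() if len(shards) > 1}


# Hypothetical wrapper around the two definitions quoted in the thread.
BROKEN = """
<cores>
  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
</cores>
"""

# Same cores, each with its own folder, as proposed in the thread.
FIXED = """
<cores>
  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
</cores>
"""

if __name__ == "__main__":
    print(shared_instance_dirs(BROKEN))  # {'1_collection': ['shard2', 'shard4']}
    print(shared_instance_dirs(FIXED))   # {}
```

An empty result only says that no two cores share an instanceDir; it does not check where dataDir resolves, so the per-core &distrib=false counts suggested earlier in the thread are still worth running.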