Thanks Erick. We cannot recall what might have happened in between.
Yes. We are seeing the same document in 2 shards.
The unique field is set to uuid in the schema and declared as a string. We will go with
reindexing.
schema.xml: <field name="uuid_s" type="string" indexed="true" stored="true"/>
Hmmm, with that setup you should _not_ be getting
duplicate documents.
So, when you see duplicate documents, you're seeing
the exact same UUID on two shards, correct? My best
guess is that you've done something innocent-seeming
(that perhaps you forgot!) that resulted in this. Otherwise
there
Thanks Erick. Now that I understand that the entire cluster goes down if any
one shard is down, my first point of confusion is cleared up.
Following are the other details
We really need to see details since I'm guessing we're talking
past each other. So:
*1 exactly how are you indexing documents?*
bq: What happens if a shard(both leader and replica) goes down. If the
document on the dead shard is updated, will it forward the document to the
new shard. If so, when the dead shard comes up again, will this not be
considered for the same hash key range?
No. The index operation will just fail.
Alessandro,
Thanks.
see some confusion here.
*First of all you need a smart client that will load balance the docs to
index. Let's say the CloudSolrClient.*
All 5 of these shards sit behind a load balancer; requests are sent to the
load balancer, and whichever server is up will accept
Subject: RE: Solr Cloud: Duplicate documents in multiple shards
When are you generating the UUID exactly? If you set the unique ID field
on an update, and it contains a new UUID, you have effectively created a
new document. Just a thought.
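
To illustrate the point about UUID generation, here is a minimal Python sketch. It assumes each source record has some stable natural key (the `natural_key` argument and the `ID_NAMESPACE` value are hypothetical, not taken from this thread). A random UUID generated at update time differs on every call, so each update looks like a new document; a name-based UUID derived from the natural key stays the same across updates.

```python
import uuid

# Hypothetical namespace for this illustration only; any fixed UUID works.
ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def random_id() -> str:
    """A fresh random UUID: different on every call, so re-sending the
    same record with this id creates a second document."""
    return str(uuid.uuid4())


def stable_id(natural_key: str) -> str:
    """A name-based UUID: the same natural key always yields the same id,
    so re-sending the record overwrites the existing document."""
    return str(uuid.uuid5(ID_NAMESPACE, natural_key))


if __name__ == "__main__":
    print(random_id() == random_id())                    # False
    print(stable_id("record-1") == stable_id("record-1"))  # True
```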
-Original Message
I suspect you can delete a document from the wrong shard by using
update?distrib=false.
I also suspect there are people here who would like to help you debug
this, because it has been reported before, but we haven't yet been able
to see whether it occurred due to human or software error.
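
As a rough sketch of the distrib=false suggestion, the Python helpers below only build the request URLs and the JSON delete body for a single core; they do not send anything. The core URL and the helper names are assumptions for illustration, and the uniqueKey field name should be whatever your schema declares. Querying each shard's core with distrib=false shows which shards hold a given id; the delete request can then be posted (with Content-Type: application/json) to the core that should not have it.

```python
import json
from urllib.parse import urlencode


def build_local_query(core_url: str, field: str, value: str) -> str:
    """URL that asks one core for a document, without fanning out to
    the other shards (distrib=false)."""
    params = {
        "q": f'{field}:"{value}"',
        "distrib": "false",
        "fl": field,
        "wt": "json",
    }
    return f"{core_url}/select?{urlencode(params)}"


def build_local_delete(core_url: str, doc_id: str):
    """URL and JSON body for a delete-by-id aimed at one core only."""
    params = {"distrib": "false", "commit": "true"}
    url = f"{core_url}/update?{urlencode(params)}"
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    return url, body


if __name__ == "__main__":
    # Hypothetical core URL; substitute a real core from your cluster.
    core = "http://host1:8983/solr/collection1_shard2_replica1"
    print(build_local_query(core, "uuid_s", "some-id"))
    print(build_local_delete(core, "some-id"))
```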
Unable to delete by passing distrib=false as well. Also it is difficult to
identify those duplicate documents among the 130 million.
Is there a way we can see the generated hash key and mapping them to the
specific shard?
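
One way to approximate this offline is sketched below in Python. It assumes the router hashes the UTF-8 bytes of the id with 32-bit MurmurHash3 and seed 0, that the id contains no "!" separator (composite ids are hashed differently), and that the shard ranges are copied by hand from the cluster state. The two ranges in the example are illustrative values for a 2-shard collection, not the ranges of the 5-shard cluster discussed here.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3, returned as a signed int like a Java int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    rounded = n & ~3  # length of the part made of whole 4-byte blocks

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    for i in range(0, rounded, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[rounded:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Final avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def parse_range(text: str):
    """Turn a hex range such as '80000000-ffffffff' into signed bounds."""
    def signed(hex_str: str) -> int:
        v = int(hex_str, 16)
        return v - (1 << 32) if v & 0x80000000 else v
    lo, hi = text.split("-")
    return signed(lo), signed(hi)


def shard_for(doc_id: str, ranges: dict):
    """Name of the shard whose range contains the id's hash, or None."""
    h = murmur3_x86_32(doc_id.encode("utf-8"))
    for name, text in ranges.items():
        lo, hi = parse_range(text)
        if lo <= h <= hi:
            return name
    return None


if __name__ == "__main__":
    # Illustrative ranges only; copy the real ones from your cluster state.
    example_ranges = {"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"}
    print(shard_for("foo", example_ranges))
```

If the shard reported by such a calculation differs from the shard where a document is actually found, that at least narrows down which copy is the stray one.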
, July 21, 2015 4:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud: Duplicate documents in multiple shards
dominate the distribution
of data.
-Original Message-
From: Reitzel, Charles
Sent: Tuesday, July 21, 2015 9:55 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Cloud: Duplicate documents in multiple shards
When are you generating the UUID exactly? If you set the unique ID
Thanks, Erick, for clarifying.
We are not explicitly setting the compositeId. We are using numShards=5
alone as part of the server start up. We are using uuid as unique field.
One sample id is :
possting.mongo-v2.services.com-intl-staging-c2d2a376-5e4a-11e2-8963-0026b9414f30
Not sure how it
bq: We have 130 million documents in our set up and the routing key is set as
compositeId.
The most likely explanation is that somehow you've sent the same document out
with different routing keys. So what is the ID field (or, more generally, your
uniqueKey field) for a pair of duplicated