Right, but even if that worked, you'd then get docs being assigned
to the wrong shard. The shard assignment would be something
like hash(id) % 3. So a document currently on shard 0 would be
indexed next time, perhaps, on shard 2, leaving two "live" docs
in your system with the same ID. Bad Things would happen
then...
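Here is a minimal sketch of how that duplicate arises. It assumes a naive modulo router with integer IDs standing in for hash(id). The `shard_for` and `index` helpers are hypothetical and are not Solr's actual router. Document 2 is indexed while there are two shards, then updated after a third shard is added, which mirrors the shard 0 to shard 2 case above.

```python
from collections import defaultdict


def shard_for(doc_id: int, num_shards: int) -> int:
    # Hypothetical naive router: identity "hash" of an integer ID, modulo shard count.
    return doc_id % num_shards


# shards[i] holds the set of doc IDs currently stored on shard i
shards = defaultdict(set)


def index(doc_id: int, num_shards: int) -> None:
    # The update only goes to the shard the router picks now; nothing
    # deletes an older copy that may still live on a different shard.
    shards[shard_for(doc_id, num_shards)].add(doc_id)


index(2, num_shards=2)  # first indexed with 2 shards -> shard 0
index(2, num_shards=3)  # re-indexed after a 3rd shard is added -> shard 2

live_copies = sorted(s for s, ids in shards.items() if 2 in ids)
print(live_copies)  # two "live" documents with ID 2
```

Nothing in the update path knows to delete the copy on shard 0, so both copies stay searchable.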

I believe that currently your only real option is to re-index from
scratch when you add more shards.

I was thinking about this at one point. Unless the guys work
some magic, it will be an expensive process. Not as
expensive as re-indexing for sure, but consider 12
documents in 3 shards.

shard1 - 1, 4, 7, 10
shard2 - 2, 5, 8, 11
shard3 - 3, 6, 9, 12

Now you add a shard and the docs are re-distributed:
shard1 - 1, 5, 9
shard2 - 2, 6, 10
shard3 - 3, 7, 11
shard4 - 4, 8, 12

In this simple case, only 3 out of your 12 documents stayed on the
same shard! All the rest had to be moved.
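The count can be checked with a short sketch. It assumes the same placement as the lists above: doc d goes to shard ((d - 1) mod n) + 1, with 1-based shard numbers. The helper names are illustrative only.

```python
def shard_of(doc_id: int, num_shards: int) -> int:
    # Placement used in the example: doc d -> shard ((d - 1) % n) + 1.
    return (doc_id - 1) % num_shards + 1


def layout(doc_ids, num_shards: int) -> dict:
    # Group document IDs by the shard they land on.
    result = {s: [] for s in range(1, num_shards + 1)}
    for d in doc_ids:
        result[shard_of(d, num_shards)].append(d)
    return result


docs = range(1, 13)
before = layout(docs, 3)
after = layout(docs, 4)

# Documents whose shard is the same with 3 shards and with 4 shards.
stayed = [d for d in docs if shard_of(d, 3) == shard_of(d, 4)]

print(after)   # {1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11], 4: [4, 8, 12]}
print(stayed)  # [1, 2, 3]
```

Under this kind of modulo placement, going from n to n + 1 shards keeps only roughly 1 in n + 1 documents in place (here 3 of 12, i.e. 1 in 4). So the bigger the cluster, the larger the fraction that has to move.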

Then the indexes have to be distributed across all replicas, then....

Now, there won't have to be any analysis done. You won't have to
reconstruct all of the documents from your system-of-record. You
won't have to do a _ton_ of work that you originally had to do. This should
be enormously faster than re-indexing. But it still won't be
something to casually do on a live system under load <G>.....

Disclaimer: I really may be talking through my hat here, but this _sounds_
right.

FWIW
Erick

On Mon, Oct 8, 2012 at 4:33 AM, Upayavira <u...@odoko.co.uk> wrote:
> Given that Solr does not support distributed IDF, adding a shard without
> balancing the number of documents could seriously skew your scoring. If
> you are okay with that, then the next question is: if you download
> clusterstate.json from ZooKeeper, add another entry along the lines of
> "shard3":{}, and upload it again, what would happen?
>
> My theory is that the next host you start up would become the first node
> of shard3. Worth a try (unless someone more knowledgeable tells us
> otherwise!)
>
> Upayavira
>
> On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote:
>> I am reading this: http://wiki.apache.org/solr/SolrCloud (section
>> "Re-sizing a Cluster").
>>
>> Is it possible to add a shard to an existing index? I do not need the
>> data redistributed; it can stay where it is. It's enough for me if new
>> entries are distributed across the new number of shards. Restarting
>> Solr is fine.
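Upayavira's point about distributed IDF can be illustrated with Lucene's classic (DefaultSimilarity) idf, 1 + ln(numDocs / (docFreq + 1)). Without distributed IDF, each shard scores using only its own statistics. The sketch below is illustrative only. The per-shard term statistics are made-up numbers for a large established shard and a small, newly added one. The `classic_idf` helper is a hypothetical re-implementation of the formula, not a Solr API.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    # Lucene classic idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical stats for one term: a big, established shard and a small,
# freshly added shard that has only received a handful of new documents.
old_shard = {"num_docs": 1_000_000, "doc_freq": 50_000}
new_shard = {"num_docs": 100, "doc_freq": 1}

idf_old = classic_idf(old_shard["doc_freq"], old_shard["num_docs"])
idf_new = classic_idf(new_shard["doc_freq"], new_shard["num_docs"])

# What a single unsharded index would compute for the same term.
idf_global = classic_idf(
    old_shard["doc_freq"] + new_shard["doc_freq"],
    old_shard["num_docs"] + new_shard["num_docs"],
)

# Each shard uses its own local idf, so a match on the small shard is
# weighted higher than the same match on the big one, even though the
# global idf is almost identical to the big shard's.
print(f"old={idf_old:.3f} new={idf_new:.3f} global={idf_global:.3f}")
```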
