On 2010-01-16 21:11, Yonik Seeley wrote:

Agreed - but it could be as simple as qualifying this with "from shardX on
node2".

Right - it's pretty clear there are both physical and logical
shards... but it's less clear to me at this point if distinguishing
them in the vocabulary helps or hurts.

You _are_ distinguishing them, you just use "physical" and "logical" :) I'm in favor of using "shard" for the logical entity, and "copy" or "replica" for the physical one. Whichever terms we choose, we need to be clear about this distinction, because multiple physical copies (replicas) may be deployed to multiple nodes even though together they make up only one logical shard.
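
To illustrate the distinction, here is a minimal sketch in Python. The names Shard, Replica and add_replica are hypothetical and chosen only for this example; nothing like them exists in Solr today.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a shard, hosted on a specific node (hypothetical type)."""
    shard_id: str
    node: str


@dataclass
class Shard:
    """The logical entity: a set of documents identified by shard_id (hypothetical type)."""
    shard_id: str
    replicas: list = field(default_factory=list)

    def add_replica(self, node):
        # Deploy one more physical copy of this logical shard to the given node.
        replica = Replica(self.shard_id, node)
        self.replicas.append(replica)
        return replica

    @property
    def replication_factor(self):
        # Number of physical copies backing this one logical shard.
        return len(self.replicas)


# One logical shard, two physical copies on two nodes.
shard_x = Shard("shardX")
shard_x.add_replica("node1")
shard_x.add_replica("node2")
```

With this vocabulary, "from shardX on node2" names exactly one Replica, while "shardX" alone names the logical Shard regardless of how many copies exist.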


The opaque model means it's more difficult to support updates. IMHO it makes
sense to start with a set of stricter assumptions

If we were building from scratch perhaps - but it seems like if we can
just model what people do today with Solr (but just make it a lot
easier), that's a good start.  The opaque model is what we have today,
and it's conceptually simple... the complete collection consists of
all the unique shard ids (or slices) you know about.
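
The opaque model described above can be sketched as follows, assuming each node simply reports the shard ids it hosts. The node_reports mapping and both function names are assumptions made for this example, not an existing Solr API.

```python
from collections import defaultdict


def collection_shard_ids(node_reports):
    """Return the complete collection as the set of unique shard ids.

    node_reports maps a node name to the shard ids hosted on that node
    (a hypothetical structure used only for this sketch).
    """
    shard_ids = set()
    for hosted in node_reports.values():
        shard_ids.update(hosted)
    return shard_ids


def replicas_by_shard(node_reports):
    """Group nodes by the shard id they host, i.e. list each shard's physical copies."""
    placement = defaultdict(list)
    for node, hosted in sorted(node_reports.items()):
        for shard_id in hosted:
            placement[shard_id].append(node)
    return dict(placement)


reports = {
    "node1": ["shardX", "shardY"],
    "node2": ["shardX"],
    "node3": ["shardZ"],
}
```

Here the collection is just the three ids shardX, shardY and shardZ. Nothing in this model says which documents belong to which shard, which is why routing an update to the right shard is harder.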

I would argue that the current model was adopted out of necessity, not because users prefer it. Unless you want expert-level, total control over which node runs which part of the index, isn't it much more convenient to delegate all the partitioning and deployment to your "search cluster" instead of managing them yourself? Users have to do it now because Solr has no mechanism for this.


And we don't need to support everything in this model - I think we
should and will also support shards where Solr does all the
partitioning and mapping of the ID space (pluggable of course) and
then we can offer more services based on that knowledge.
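
A pluggable mapping of the ID space could look roughly like the sketch below. The Collection class, the hash_partitioner function and the "shardN" naming scheme are all hypothetical, and MD5 is used here only as a convenient stable hash, not because any particular scheme has been proposed.

```python
import hashlib


def hash_partitioner(doc_id, num_shards):
    """Default partitioner (an assumption): stable hash of the document id, modulo shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Collection:
    """A collection whose partitioning is owned by the cluster (hypothetical)."""

    def __init__(self, num_shards, partitioner=hash_partitioner):
        self.num_shards = num_shards
        # Pluggable: any callable taking (doc_id, num_shards) and returning a shard index.
        self.partitioner = partitioner

    def shard_for(self, doc_id):
        # Because the mapping is known, an add or update can be routed to its logical shard.
        index = self.partitioner(doc_id, self.num_shards)
        return f"shard{index}"


collection = Collection(num_shards=4)
target = collection.shard_for("doc-42")

# Plugging in a different strategy, e.g. partitioning by the first character of the id.
by_prefix = Collection(4, partitioner=lambda doc_id, n: ord(doc_id[0]) % n)
```

Since the same document id always maps to the same logical shard, updates and deletes can be sent to the replicas of that shard only, which is the kind of extra service that knowledge of the mapping makes possible.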

Well, if we don't intend to support updates in this iteration, then perhaps there is no need to change anything in Solr, just extend Katta to run Solr searchers ... :P


You've also used some slightly new terminology... "shard ID" as
opposed to just shard, which reinforces the need for different
terminology for the physical vs the logical.

You got me ;) yes, when I say "shard" I mean the logical entity, as defined
by a set of documents - physical shard I would call a replica.

I originally started off with "replica" too... but there may only be
one copy of a physical shard, it seemed strange to call it a replica.

Yeah .. it's a replica with a replication factor of 1 :)

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
