Hi,

My 0.02 PLN on the subject ...

Terminology
-----------
First the terminology: reading your emails, I have a feeling that my head is about to explode. We have to agree on the vocabulary; otherwise we have no hope of reaching any consensus. I propose the following terms, which have been in use for a while and are generally understood:

* (global) search index: a complete collection of all indexed documents. From a conceptual point of view, this is our complete search space.

* index shard: a non-overlapping part of the search index. Together, all shards in the system form the complete search space of the search index. E.g., starting from one big index, I could divide it into multiple shards using MultiPassIndexSplitter, and if I then combined all the shards again using IndexMerger, I should obtain the original complete search index (modulo changed Lucene docids, which doesn't matter). I strongly believe in micro-sharding, because small shards are much easier to handle and replicate. Also, since we control the shards, we don't have to deal with overlapping shards, which is the curse of P2P search.

* partitioning: a method whereby we can determine the target shard ID based on a doc ID.

* search node: an application that provides search and update to one or more shards.

* search host: a machine that may run 1 or more search nodes.

* Shard Manager: a component that keeps track of allocation of shards to nodes (plus more, see below).

Now, to translate this into Solr-speak: depending on the details of the design and the evolution of Solr, one search node could be one Solr instance that manages one shard per core. Let's forget for now about the current distributed search component and the current replication - they could be useful in this design as a raw transport mechanism, but someone else would be calling the shots (see below).

Architecture
------------
Replication and load balancing is a problem with many existing solutions, and this one in particular reminds me strongly of Hadoop HDFS. In fact, early on during the development of Hadoop [1] I wondered whether we could reuse HDFS to manage Lucene indexes instead of opaque blocks of fixed size. It turned out to be infeasible, but the Namenode/Datanode model still looks useful in our case, too.

I believe there are many useful lessons lurking in Hadoop/HBase/Zookeeper that we could reuse in our design. The following is just a straightforward port of the Namenode/Datanode concept.

Let's imagine a component called ShardManager that is responsible for the following (a rough sketch in code follows the list):

* maintaining the list of shard IDs that together form the complete search index,
* for each shard ID, maintaining the list of search nodes that serve this shard,
* issuing replication requests,
* maintaining the partitioning function (see below), so that updates are directed to the correct shards,
* maintaining a heartbeat to check for dead nodes,
* providing search clients with a list of nodes to query in order to obtain all results from the search index.
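
To make this concrete, here is a minimal Java sketch of such a component. All names are made up for illustration, nodes are identified simply by "host:port" strings, and shard versions are plain longs:

    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Illustrative sketch only -- not an existing Solr/Lucene API.
    public interface ShardManager {

      /** Shard IDs that together form the complete search index. */
      Set<String> listShards();

      /** Nodes currently serving a given shard (its replicas). */
      List<String> nodesForShard(String shardId);

      /** Called by a search node to report its local shards and their versions. */
      void reportShards(String node, Map<String, Long> shardVersions);

      /** Ask 'target' to pull a fresh replica of a shard from 'source'. */
      void requestReplication(String shardId, String source, String target);

      /** The partitioning function: unique document ID -> target shard ID. */
      String shardForDoc(String docId);

      /** A set of nodes that together cover every shard, for a search client. */
      List<String> nodesToQuery();
    }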

Whenever a new search node comes up, it reports its local shard IDs (versioned) to the ShardManager. Based on these reports from the currently active nodes, the ShardManager builds the mapping of shards to nodes, and requests replication if some shards are too old or if their replication count is too low, allocating these shards to selected nodes (based on a policy of some kind).
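
As a rough sketch of that decision logic - assuming a fixed desired replication factor and a naive "pick any other live node" allocation policy (a real policy would consider load, racks, etc.); again, purely illustrative code:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Illustrative only: how the ShardManager could react to a node report,
    // replicating shards that are stale or under-replicated.
    public class ReplicationPlanner {

      private final int replicationFactor;  // desired replica count
      // shard -> newest version seen so far
      private final Map<String, Long> latestVersion = new HashMap<String, Long>();
      // shard -> nodes known to hold that newest version
      private final Map<String, Set<String>> upToDateNodes = new HashMap<String, Set<String>>();

      public ReplicationPlanner(int replicationFactor) {
        this.replicationFactor = replicationFactor;
      }

      /** Called whenever a node reports its local shards and their versions. */
      public void onReport(String node, Map<String, Long> shardVersions, List<String> liveNodes) {
        for (Map.Entry<String, Long> e : shardVersions.entrySet()) {
          String shard = e.getKey();
          long version = e.getValue();
          Long latest = latestVersion.get(shard);
          if (latest == null || version > latest) {
            // Newer version of the shard: everything we knew before is now stale.
            latestVersion.put(shard, version);
            upToDateNodes.put(shard, new HashSet<String>());
          }
          Set<String> replicas = upToDateNodes.get(shard);
          if (version == latestVersion.get(shard).longValue()) {
            replicas.add(node);
          }
          // Too few up-to-date replicas? Ask other live nodes to pull the shard
          // from one of the nodes that do hold the latest version.
          if (!replicas.isEmpty()) {
            String source = replicas.iterator().next();
            for (String candidate : liveNodes) {
              if (replicas.size() >= replicationFactor) break;
              if (!replicas.contains(candidate)) {
                requestReplication(shard, source, candidate);
                replicas.add(candidate);  // optimistic; confirmed by its next report
              }
            }
          }
        }
      }

      /** Placeholder: in a real system this would be an RPC to the target node. */
      protected void requestReplication(String shard, String source, String target) {
        System.out.println("replicate " + shard + " from " + source + " to " + target);
      }
    }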

I believe most of the above functionality could be facilitated by Zookeeper, including the election of the node that runs the ShardManager.
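
For the election part, the standard ZooKeeper recipe based on ephemeral sequential znodes would do. A minimal sketch (the paths are made up, the parent znodes are assumed to exist already, and a complete recipe would also watch the predecessor znode so that a new leader takes over when the current one dies):

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Illustrative only: elect the node that runs the ShardManager.
    public class ShardManagerElection {

      /** Returns true if this node won the election. */
      public static boolean runForLeader(ZooKeeper zk, String myHostPort) throws Exception {
        // Each candidate creates an ephemeral, sequentially numbered znode;
        // it disappears automatically when the candidate's session dies.
        String me = zk.create("/shardmanager/election/candidate-",
            myHostPort.getBytes("UTF-8"),
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate with the lowest sequence number runs the ShardManager.
        List<String> candidates = zk.getChildren("/shardmanager/election", false);
        Collections.sort(candidates);
        return me.endsWith(candidates.get(0));
      }
    }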

Updates
-------
We need a partitioning scheme that splits documents more or less evenly among shards and at the same time allows us to split or merge unbalanced shards. The simplest function that we could imagine is the following:

        hash(docId) % numShards

though this has the disadvantage that any larger update will affect multiple shards, thus creating an avalanche of replication requests ... so a sequential model would probably be better, where ranges of docIds are assigned to shards.
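
Both variants in a nutshell, as illustrative (non-Solr) code; the range partitioner keeps a sorted array of lower bounds, one per shard:

    // Illustrative only -- two simple partitioning functions.
    public class Partitioners {

      /** hash(docId) % numShards: spreads any update across all shards. */
      public static int hashShard(String docId, int numShards) {
        // mask the sign bit so the modulo is never negative
        return (docId.hashCode() & 0x7fffffff) % numShards;
      }

      /** Range partitioning: consecutive doc IDs land in the same shard, so a
          bulk update touches only a few shards. 'lowerBounds' holds the sorted
          lower bound of each shard's ID range, lowerBounds[0] being the minimum. */
      public static int rangeShard(long docId, long[] lowerBounds) {
        int shard = 0;
        for (int i = 1; i < lowerBounds.length; i++) {
          if (docId >= lowerBounds[i]) {
            shard = i;
          } else {
            break;
          }
        }
        return shard;
      }
    }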

Now, if any particular shard becomes too unbalanced, e.g. too large, it could be further split in two halves, and the ShardManager would have to record this exception. This is a process very similar to a region split in HBase, or a page split in b-tree databases. Conversely, shards that are too small could be merged. This is the icing on the cake, so we can leave it for later.
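
On the partitioning side, such a split amounts to recording one extra boundary (the shard itself would of course have to be physically split as well, e.g. with MultiPassIndexSplitter). A tiny sketch, reusing the lowerBounds array from the previous snippet and again purely illustrative:

    import java.util.Arrays;

    // Illustrative only: split an over-sized range shard [low, high) into
    // [low, mid) and [mid, high), like an HBase region split.
    public class RangeSplit {

      /** Inserts the midpoint of shard i's range into the sorted bounds array. */
      public static long[] split(long[] lowerBounds, int i, long upperBound) {
        long low = lowerBounds[i];
        long high = (i + 1 < lowerBounds.length) ? lowerBounds[i + 1] : upperBound;
        long mid = low + (high - low) / 2;

        long[] result = Arrays.copyOf(lowerBounds, lowerBounds.length + 1);
        // shift the bounds after position i one slot to the right
        System.arraycopy(result, i + 1, result, i + 2, lowerBounds.length - (i + 1));
        result[i + 1] = mid;
        return result;
      }
    }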

After a commit, a node contacts the ShardManager to report the new version of the shard. The ShardManager then issues replication requests to the other nodes that hold a replica of this shard.

Search
------
There should be a component, sometimes referred to as a query integrator (or search front-end), that is the entry and exit point for user search requests. On receiving a search request, this component gets from the ShardManager a list of randomly selected nodes to contact (the list covering all shards that form the global index), sends out the query, integrates the partial results (under a configurable policy for timeouts/early termination), and sends the assembled results back to the user.
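
A bare-bones scatter/gather skeleton of that component, with the per-node transport call and the result type left as placeholders, just to show where the timeout/early-termination policy would plug in:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;

    // Illustrative only: fan a query out to one node per shard group, then
    // merge whatever comes back within the time budget.
    public class QueryIntegrator {

      private final ExecutorService pool = Executors.newCachedThreadPool();

      public List<String> search(final String query, List<String> nodesToQuery,
                                 long timeoutMs) throws InterruptedException {
        // One sub-query per node; each node covers one or more shards.
        List<Callable<List<String>>> tasks = new ArrayList<Callable<List<String>>>();
        for (final String node : nodesToQuery) {
          tasks.add(new Callable<List<String>>() {
            public List<String> call() throws Exception {
              return queryNode(node, query);  // placeholder transport call
            }
          });
        }

        // Sub-queries that miss the deadline are cancelled and their results
        // dropped -- i.e. we return partial results rather than wait forever.
        List<String> merged = new ArrayList<String>();
        for (Future<List<String>> f : pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS)) {
          try {
            merged.addAll(f.get());
          } catch (Exception timedOutOrFailed) {
            // skip this node; the policy could also retry another replica
          }
        }
        return merged;  // would normally be re-sorted by score
      }

      /** Placeholder for the actual per-node search call. */
      protected List<String> queryNode(String node, String query) throws Exception {
        return new ArrayList<String>();
      }
    }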

Again, somewhere in the background the knowledge of who to contact should be handled by Zookeeper.

That's it for now, off the top of my head ...

-----------

[1] http://www.mail-archive.com/nutch-develop...@lists.sourceforge.net/msg02273.html

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
