Again, great stuff.  Once distributed update/delete works (sounds like
it's not far off) I'll have to reevaluate our current stack.

You had mentioned storing the shard hash assignments in ZK; is there a
JIRA around this?

I'll keep my eyes on the JIRA tickets.  Right now distributed
update/delete is the big one for me; rebalancing the cluster is a
nice-to-have, but hopefully I won't need that capability anytime in
the near future.

One other question: my current setup does replication via polling. I
didn't notice that in the updated solrconfig, so how does this work
now?

On Sat, Dec 3, 2011 at 9:00 AM, Mark Miller <markrmil...@gmail.com> wrote:
> bq. A few questions: if a master goes down, does a replica get
> promoted?
>
> Right - if the leader goes down there is a leader election and one of the
> replicas takes over.
>
> bq.  If a new shard needs to be added, is it just a matter of
> starting a new solr instance with a higher numShards?
>
> Eventually, that's the plan.
>
> The idea is, you say something like, I want 3 shards. Now if you start up 9
> instances, the first 3 end up as shard leaders - the next 6 evenly come up
> as replicas for each shard.
>
> To change the numShards, we will need some kind of micro shards / splitting
> / rebalancing.
>
> bq. Last question, how do you change numShards?
>
> I think this is somewhat a work in progress, but I think Sami just made it
> so that numShards is stored on the collection node in zk (along with which
> config set to use). So you would change it there presumably. Or perhaps
> just start up a new server with an updated numShards property and then it
> would realize that it needs to be a new leader - of course then you'd want to
> rebalance probably - unless you fired up enough servers to add replicas
> too...
>
> bq. Is that right?
>
> Yup, sounds about right.
>
>
> On Fri, Dec 2, 2011 at 10:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> So I just tried this out, seems like it does the things I asked about.
>>
>> Really really cool stuff, it's progressed quite a bit in the time
>> since I took a snapshot of the branch.
>>
>> Last question, how do you change numShards?  Is there a command you
>> can use to do this now? I understand there will be implications for
>> the hashing algorithm, but once the hash ranges are stored in ZK (is
>> there a separate JIRA for this or does this fall under 2358) I assume
>> that it would be a relatively simple index split (JIRA 2595?) and
>> updating the hash ranges in solr, essentially splitting the range
>> between the new and existing shard.  Is that right?
>>
>> On Fri, Dec 2, 2011 at 10:08 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> > I think I see it.....so if I understand this correctly you specify
>> > numShards as a system property, as new nodes come up they check ZK to
>> > see if they should be a new shard or a replica based on if numShards
>> > is met.  A few questions: if a master goes down, does a replica get
>> > promoted?  If a new shard needs to be added, is it just a matter of
>> > starting a new solr instance with a higher numShards?  (understanding
>> > that index rebalancing does not happen automatically now, but
>> > presumably it could).
>> >
>> > On Fri, Dec 2, 2011 at 9:56 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >> How does it determine the number of shards to create?  How many
>> >> replicas to create?
>> >>
>> >> On Fri, Dec 2, 2011 at 4:30 PM, Mark Miller <markrmil...@gmail.com>
>> wrote:
>> >>> Ah, okay - you are setting the shards in solr.xml - thats still an
>> option
>> >>> to force a node to a particular shard - but if you take that out,
>> shards
>> >>> will be auto assigned.
>> >>>
>> >>> By the way, because of the version code, distrib deletes don't work at
>> the
>> >>> moment - will get to that next week.
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Fri, Dec 2, 2011 at 1:16 PM, Jamie Johnson <jej2...@gmail.com>
>> wrote:
>> >>>
>> >>>> So I'm a fool.  I did set the numShards, the issue was so trivial it's
>> >>>> embarrassing.  I did indeed have it setup as a replica, the shard
>> >>>> names in solr.xml were both shard1.  This worked as I expected now.
>> >>>>
>> >>>> On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmil...@gmail.com>
>> wrote:
>> >>>> >
>> >>>> > They are unused params, so removing them wouldn't help anything.
>> >>>> >
>> >>>> > You might just want to wait till we are further along before playing
>> >>>> with it.
>> >>>> >
>> >>>> > Or if you submit your full self-contained test, I can see what's going
>> >>>> > on (e.g. it's still unclear if you have started setting numShards?).
>> >>>> >
>> >>>> > I can do a similar set of actions in my tests and it works fine. The
>> >>>> only reason I could see things working like this is if it thinks you
>> have
>> >>>> one shard - a leader and a replica.
>> >>>> >
>> >>>> > - Mark
>> >>>> >
>> >>>> > On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote:
>> >>>> >
>> >>>> >> Glad to hear I don't need to set shards/self, but removing them
>> didn't
>> >>>> >> seem to change what I'm seeing.  Doing this still results in 2
>> >>>> >> documents, 1 on 8983 and 1 on 7574.
>> >>>> >>
>> >>>> >> String key = "1";
>> >>>> >>
>> >>>> >> SolrInputDocument solrDoc = new SolrInputDocument();
>> >>>> >> solrDoc.setField("key", key);
>> >>>> >> solrDoc.addField("content_mvtxt", "initial value");
>> >>>> >>
>> >>>> >> SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>> >>>> >>
>> >>>> >> UpdateRequest ureq = new UpdateRequest();
>> >>>> >> ureq.setParam("update.chain", "distrib-update-chain");
>> >>>> >> ureq.add(solrDoc);
>> >>>> >> ureq.setAction(ACTION.COMMIT, true, true);
>> >>>> >> server.request(ureq);
>> >>>> >> server.commit();
>> >>>> >>
>> >>>> >> solrDoc = new SolrInputDocument();
>> >>>> >> solrDoc.addField("key", key);
>> >>>> >> solrDoc.addField("content_mvtxt", "updated value");
>> >>>> >>
>> >>>> >> server = servers.get("http://localhost:7574/solr/collection1");
>> >>>> >>
>> >>>> >> ureq = new UpdateRequest();
>> >>>> >> ureq.setParam("update.chain", "distrib-update-chain");
>> >>>> >> ureq.add(solrDoc);
>> >>>> >> ureq.setAction(ACTION.COMMIT, true, true);
>> >>>> >> server.request(ureq);
>> >>>> >> server.commit();
>> >>>> >>
>> >>>> >> server = servers.get("http://localhost:8983/solr/collection1");
>> >>>> >> server.commit();
>> >>>> >>
>> >>>> >> System.out.println("done");
>> >>>> >>
>> >>>> >> On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller <
>> markrmil...@gmail.com>
>> >>>> wrote:
>> >>>> >>> So I dunno. You are running a zk server and running in zk mode
>> right?
>> >>>> >>>
>> >>>> >>> You don't need to / shouldn't set a shards or self param. The
>> shards
>> >>>> are
>> >>>> >>> figured out from Zookeeper.
>> >>>> >>>
>> >>>> >>> You always want to use the distrib-update-chain. Eventually it will
>> >>>> >>> probably be part of the default chain and turn on automatically in zk mode.
>> >>>> >>>
>> >>>> >>> If you are running in zk mode attached to a zk server, this should
>> >>>> work no
>> >>>> >>> problem. You can add docs to any server and they will be
>> forwarded to
>> >>>> the
>> >>>> >>> correct shard leader and then versioned and forwarded to replicas.
>> >>>> >>>
>> >>>> >>> You can also use the CloudSolrServer solrj client - that way you
>> don't
>> >>>> even
>> >>>> >>> have to choose a server to send docs too - in which case if it
>> went
>> >>>> down
>> >>>> >>> you would have to choose another manually - CloudSolrServer
>> >>>> automatically
>> >>>> >>> finds one that is up through ZooKeeper. Eventually it will also be
>> >>>> smart
>> >>>> >>> and do the hashing itself so that it can send directly to the
>> shard
>> >>>> leader
>> >>>> >>> that the doc would be forwarded to anyway.
>> >>>> >>>
>> >>>> >>> - Mark
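
A rough sketch of how I'd expect the CloudSolrServer usage to look, based on the
description above. The ZooKeeper address, the default-collection call, and the exact
constructor/method names are my guesses rather than something I've verified against
the branch, so treat it as illustrative only:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexSketch {
        public static void main(String[] args) throws Exception {
            // Point the client at ZooKeeper rather than at one Solr URL; it looks up
            // live nodes itself, so there is no single server to pick (or lose).
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            UpdateRequest ureq = new UpdateRequest();
            ureq.setParam("update.chain", "distrib-update-chain");
            ureq.add(doc);
            ureq.setAction(ACTION.COMMIT, true, true);
            ureq.process(server);
        }
    }
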
>> >>>> >>>
>> >>>> >>> On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2...@gmail.com
>> >
>> >>>> wrote:
>> >>>> >>>
>> >>>> >>>> Really just trying to do a simple add and update test, the chain
>> >>>> >>>> missing is just proof of my not understanding exactly how this is
>> >>>> >>>> supposed to work.  I modified the code to this
>> >>>> >>>>
>> >>>> >>>> String key = "1";
>> >>>> >>>>
>> >>>> >>>> SolrInputDocument solrDoc = new SolrInputDocument();
>> >>>> >>>> solrDoc.setField("key", key);
>> >>>> >>>> solrDoc.addField("content_mvtxt", "initial value");
>> >>>> >>>>
>> >>>> >>>> SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>> >>>> >>>>
>> >>>> >>>> UpdateRequest ureq = new UpdateRequest();
>> >>>> >>>> ureq.setParam("update.chain", "distrib-update-chain");
>> >>>> >>>> ureq.add(solrDoc);
>> >>>> >>>> ureq.setParam("shards",
>> >>>> >>>>         "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>> >>>> >>>> ureq.setParam("self", "foo");
>> >>>> >>>> ureq.setAction(ACTION.COMMIT, true, true);
>> >>>> >>>> server.request(ureq);
>> >>>> >>>> server.commit();
>> >>>> >>>>
>> >>>> >>>> solrDoc = new SolrInputDocument();
>> >>>> >>>> solrDoc.addField("key", key);
>> >>>> >>>> solrDoc.addField("content_mvtxt", "updated value");
>> >>>> >>>>
>> >>>> >>>> server = servers.get("http://localhost:7574/solr/collection1");
>> >>>> >>>>
>> >>>> >>>> ureq = new UpdateRequest();
>> >>>> >>>> ureq.setParam("update.chain", "distrib-update-chain");
>> >>>> >>>> // ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285");
>> >>>> >>>> ureq.add(solrDoc);
>> >>>> >>>> ureq.setParam("shards",
>> >>>> >>>>         "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>> >>>> >>>> ureq.setParam("self", "foo");
>> >>>> >>>> ureq.setAction(ACTION.COMMIT, true, true);
>> >>>> >>>> server.request(ureq);
>> >>>> >>>> // server.add(solrDoc);
>> >>>> >>>> server.commit();
>> >>>> >>>>
>> >>>> >>>> server = servers.get("http://localhost:8983/solr/collection1");
>> >>>> >>>> server.commit();
>> >>>> >>>>
>> >>>> >>>> System.out.println("done");
>> >>>> >>>>
>> >>>> >>>> but I'm still seeing the doc appear on both shards.  After the first
>> >>>> >>>> commit I see the doc on 8983 with "initial value".  After the second
>> >>>> >>>> commit I see the updated value on 7574 and the old value on 8983.  After
>> >>>> >>>> the final commit the doc on 8983 gets updated.
>> >>>> >>>>
>> >>>> >>>> Is there something wrong with my test?
>> >>>> >>>>
>> >>>> >>>> On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller <
>> markrmil...@gmail.com>
>> >>>> >>>> wrote:
>> >>>> >>>>> Getting late - didn't really pay attention to your code I guess
>> - why
>> >>>> >>>> are you adding the first doc without specifying the distrib
>> update
>> >>>> chain?
>> >>>> >>>> This is not really supported. It's going to just go to the
>> server you
>> >>>> >>>> specified - even with everything setup right, the update might
>> then
>> >>>> go to
>> >>>> >>>> that same server or the other one depending on how it hashes. You
>> >>>> really
>> >>>> >>>> want to just always use the distrib update chain.  I guess I
>> don't yet
>> >>>> >>>> understand what you are trying to test.
>> >>>> >>>>>
>> >>>> >>>>> Sent from my iPad
>> >>>> >>>>>
>> >>>> >>>>> On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmil...@gmail.com
>> >
>> >>>> wrote:
>> >>>> >>>>>
>> >>>> >>>>>> Not sure offhand - but things will be funky if you don't
>> specify the
>> >>>> >>>> correct numShards.
>> >>>> >>>>>>
>> >>>> >>>>>> The instance to shard assignment should be using numShards to
>> >>>> assign.
>> >>>> >>>> But then the hash to shard mapping actually goes on the number of
>> >>>> shards it
>> >>>> >>>> finds registered in ZK (it doesn't have to, but really these
>> should be
>> >>>> >>>> equal).
>> >>>> >>>>>>
>> >>>> >>>>>> So basically you are saying, I want 3 partitions, but you are
>> only
>> >>>> >>>> starting up 2 nodes, and the code is just not happy about that
>> I'd
>> >>>> guess.
>> >>>> >>>> For the system to work properly, you have to fire up at least as
>> many
>> >>>> >>>> servers as numShards.
>> >>>> >>>>>>
>> >>>> >>>>>> What are you trying to do? 2 partitions with no replicas, or
>> one
>> >>>> >>>> partition with one replica?
>> >>>> >>>>>>
>> >>>> >>>>>> In either case, I think you will have better luck if you fire
>> up at
>> >>>> >>>> least as many servers as the numShards setting. Or lower the
>> numShards
>> >>>> >>>> setting.
>> >>>> >>>>>>
>> >>>> >>>>>> This is all a work in progress by the way - what you are
>> trying to
>> >>>> test
>> >>>> >>>> should work if things are setup right though.
>> >>>> >>>>>>
>> >>>> >>>>>> - Mark
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>> On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote:
>> >>>> >>>>>>
>> >>>> >>>>>>> Thanks for the quick response.  With that change (have not
>> done
>> >>>> >>>>>>> numShards yet) shard1 got updated.  But now when executing the
>> >>>> >>>>>>> following queries I get information back from both, which
>> doesn't
>> >>>> seem
>> >>>> >>>>>>> right
>> >>>> >>>>>>>
>> >>>> >>>>>>> http://localhost:7574/solr/select/?q=*:*
>> >>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
>> >>>> >>>>>>>
>> >>>> >>>>>>> http://localhost:8983/solr/select?q=*:*
>> >>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
>> >>>> >>>>>>>
>> >>>> >>>>>>>
>> >>>> >>>>>>>
>> >>>> >>>>>>> On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <
>> >>>> markrmil...@gmail.com>
>> >>>> >>>> wrote:
>> >>>> >>>>>>>> Hmm...sorry bout that - so my first guess is that right now
>> we are
>> >>>> >>>> not distributing a commit (easy to add, just have not done it).
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> Right now I explicitly commit on each server for tests.
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> Can you try explicitly committing on server1 after updating
>> the
>> >>>> doc
>> >>>> >>>> on server 2?
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> I can start distributing commits tomorrow - been meaning to
>> do it
>> >>>> for
>> >>>> >>>> my own convenience anyhow.
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> Also, you want to pass the sys property numShards=1 on
>> startup. I
>> >>>> >>>> think it defaults to 3. That will give you one leader and one
>> replica.
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> - Mark
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>> So I couldn't resist and attempted this tonight. I used the
>> >>>> >>>>>>>>> solrconfig you mentioned (as is, no modifications), set up a 2-shard
>> >>>> >>>>>>>>> cluster in collection1, sent 1 doc to one of the shards, updated it,
>> >>>> >>>>>>>>> and sent the update to the other.  I don't see the modifications,
>> >>>> >>>>>>>>> though; I only see the original document.  The following is the test:
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>> public void update() throws Exception {
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     String key = "1";
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     SolrInputDocument solrDoc = new SolrInputDocument();
>> >>>> >>>>>>>>>     solrDoc.setField("key", key);
>> >>>> >>>>>>>>>     solrDoc.addField("content", "initial value");
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     SolrServer server = servers.get("http://localhost:8983/solr/collection1");
>> >>>> >>>>>>>>>     server.add(solrDoc);
>> >>>> >>>>>>>>>     server.commit();
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     solrDoc = new SolrInputDocument();
>> >>>> >>>>>>>>>     solrDoc.addField("key", key);
>> >>>> >>>>>>>>>     solrDoc.addField("content", "updated value");
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     server = servers.get("http://localhost:7574/solr/collection1");
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>>     UpdateRequest ureq = new UpdateRequest();
>> >>>> >>>>>>>>>     ureq.setParam("update.chain", "distrib-update-chain");
>> >>>> >>>>>>>>>     ureq.add(solrDoc);
>> >>>> >>>>>>>>>     ureq.setParam("shards",
>> >>>> >>>>>>>>>             "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>> >>>> >>>>>>>>>     ureq.setParam("self", "foo");
>> >>>> >>>>>>>>>     ureq.setAction(ACTION.COMMIT, true, true);
>> >>>> >>>>>>>>>     server.request(ureq);
>> >>>> >>>>>>>>>     System.out.println("done");
>> >>>> >>>>>>>>> }
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>> key is my unique field in schema.xml
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>> What am I doing wrong?
>> >>>> >>>>>>>>>
>> >>>> >>>>>>>>> On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <
>> jej2...@gmail.com
>> >>>> >
>> >>>> >>>> wrote:
>> >>>> >>>>>>>>>> Yes, the ZK method seems much more flexible.  Adding a new
>> shard
>> >>>> >>>> would
>> >>>> >>>>>>>>>> be simply updating the range assignments in ZK.  Where is
>> this
>> >>>> >>>>>>>>>> currently on the list of things to accomplish?  I don't
>> have
>> >>>> time to
>> >>>> >>>>>>>>>> work on this now, but if you (or anyone) could provide
>> >>>> direction I'd
>> >>>> >>>>>>>>>> be willing to work on this when I had spare time.  I guess
>> a
>> >>>> JIRA
>> >>>> >>>>>>>>>> detailing where/how to do this could help.  Not sure if the
>> >>>> design
>> >>>> >>>> has
>> >>>> >>>>>>>>>> been thought out that far though.
>> >>>> >>>>>>>>>>
>> >>>> >>>>>>>>>> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <
>> >>>> markrmil...@gmail.com>
>> >>>> >>>> wrote:
>> >>>> >>>>>>>>>>> Right now lets say you have one shard - everything there
>> >>>> hashes to
>> >>>> >>>> range X.
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> Now you want to split that shard with an Index Splitter.
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> You divide range X in two - giving you two ranges - then
>> you
>> >>>> start
>> >>>> >>>> splitting. This is where the current Splitter needs a little
>> >>>> modification.
>> >>>> >>>> You decide which doc should go into which new index by rehashing
>> each
>> >>>> doc
>> >>>> >>>> id in the index you are splitting - if its hash is greater than
>> X/2,
>> >>>> it
>> >>>> >>>> goes into index1 - if it's less, index2. I think there are a
>> couple
>> >>>> current
>> >>>> >>>> Splitter impls, but one of them does something like, give me an
>> id -
>> >>>> now if
>> >>>> >>>> the id's in the index are above that id, goto index1, if below,
>> >>>> index2. We
>> >>>> >>>> need to instead do a quick hash rather than simple id compare.
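
Just to check my understanding of the above, a toy sketch of that rehash-and-route
decision. The hash function, the range arithmetic, and all the names here are made up
for illustration and are not what the branch actually does:

    // Decide which half of a split an existing doc belongs to by rehashing its id
    // against the midpoint of the shard's old hash range. Illustrative only.
    public class HashSplitSketch {

        // Returns 0 for the lower-half index, 1 for the upper-half index.
        static int targetIndex(String docId, int rangeStart, int rangeEnd) {
            int hash = docId.hashCode() & 0x7fffffff;            // stand-in hash
            int bucket = rangeStart + (hash % (rangeEnd - rangeStart + 1));
            int mid = rangeStart + (rangeEnd - rangeStart) / 2;
            return (bucket <= mid) ? 0 : 1;
        }

        public static void main(String[] args) {
            // e.g. route one of the ids from my earlier test over a 0-1023 range
            System.out.println(targetIndex("8060a9eb-9546-43ee-95bb-d18ea26a6285", 0, 1023));
        }
    }
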
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> Why do you need to do this on every shard?
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> The other part we need that we don't have is to store hash
>> range
>> >>>> >>>> assignments in zookeeper - we don't do that yet because it's not
>> >>>> needed
>> >>>> >>>> yet. Instead we currently just simply calculate that on the fly
>> (too
>> >>>> often
>> >>>> >>>> at the moment - on every request :) I intend to fix that of
>> course).
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> At the start, zk would say, for range X, goto this shard.
>> After
>> >>>> >>>> the split, it would say, for range less than X/2 goto the old
>> node,
>> >>>> for
>> >>>> >>>> range greater than X/2 goto the new node.
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> - Mark
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>> hmmm.....This doesn't sound like the hashing algorithm
>> that's
>> >>>> on
>> >>>> >>>> the
>> >>>> >>>>>>>>>>>> branch, right?  The algorithm you're mentioning sounds
>> like
>> >>>> there
>> >>>> >>>> is
>> >>>> >>>>>>>>>>>> some logic which is able to tell that a particular range
>> >>>> should be
>> >>>> >>>>>>>>>>>> distributed between 2 shards instead of 1.  So it seems like a trade-off
>> >>>> >>>>>>>>>>>> between repartitioning the entire index (on every shard)
>> and
>> >>>> >>>> having a
>> >>>> >>>>>>>>>>>> custom hashing algorithm which is able to handle the
>> situation
>> >>>> >>>> where 2
>> >>>> >>>>>>>>>>>> or more shards map to a particular range.
>> >>>> >>>>>>>>>>>>
>> >>>> >>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <
>> >>>> >>>> markrmil...@gmail.com> wrote:
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>> I am not familiar with the index splitter that is in
>> >>>> contrib,
>> >>>> >>>> but I'll
>> >>>> >>>>>>>>>>>>>> take a look at it soon.  So the process sounds like it
>> >>>> would be
>> >>>> >>>> to run
>> >>>> >>>>>>>>>>>>>> this on all of the current shards' indexes based on the
>> hash
>> >>>> >>>> algorithm.
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> Not something I've thought deeply about myself yet, but
>> I
>> >>>> think
>> >>>> >>>> the idea would be to split as many as you felt you needed to.
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> If you wanted to keep the full balance always, this
>> would
>> >>>> mean
>> >>>> >>>> splitting every shard at once, yes. But this depends on how many
>> boxes
>> >>>> >>>> (partitions) you are willing/able to add at a time.
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> You might just split one index to start - now it's hash
>> range
>> >>>> >>>> would be handled by two shards instead of one (if you have 3
>> replicas
>> >>>> per
>> >>>> >>>> shard, this would mean adding 3 more boxes). When you needed to
>> expand
>> >>>> >>>> again, you would split another index that was still handling its
>> full
>> >>>> >>>> starting range. As you grow, once you split every original index,
>> >>>> you'd
>> >>>> >>>> start again, splitting one of the now half ranges.
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>> Is there also an index merger in contrib which could be
>> >>>> used to
>> >>>> >>>> merge
>> >>>> >>>>>>>>>>>>>> indexes?  I'm assuming this would be the process?
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> You can merge with IndexWriter.addIndexes (Solr also
>> has an
>> >>>> >>>> admin command that can do this). But I'm not sure where this
>> fits in?
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> - Mark
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <
>> >>>> >>>> markrmil...@gmail.com> wrote:
>> >>>> >>>>>>>>>>>>>>> Not yet - we don't plan on working on this until a
>> lot of
>> >>>> >>>> other stuff is
>> >>>> >>>>>>>>>>>>>>> working solid at this point. But someone else could
>> jump
>> >>>> in!
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> There are a couple ways to go about it that I know of:
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> A more long term solution may be to start using micro
>> >>>> shards -
>> >>>> >>>> each index
>> >>>> >>>>>>>>>>>>>>> starts as multiple indexes. This makes it pretty fast
>> to
>> >>>> move
>> >>>> >>>> micro shards
>> >>>> >>>>>>>>>>>>>>> around as you decide to change partitions. It's also
>> less
>> >>>> >>>> flexible as you
>> >>>> >>>>>>>>>>>>>>> are limited by the number of micro shards you start
>> with.
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> A more simple and likely first step is to use an index
>> >>>> >>>> splitter . We
>> >>>> >>>>>>>>>>>>>>> already have one in lucene contrib - we would just
>> need to
>> >>>> >>>> modify it so
>> >>>> >>>>>>>>>>>>>>> that it splits based on the hash of the document id.
>> This
>> >>>> is
>> >>>> >>>> super
>> >>>> >>>>>>>>>>>>>>> flexible, but splitting will obviously take a little
>> while
>> >>>> on
>> >>>> >>>> a huge index.
>> >>>> >>>>>>>>>>>>>>> The current index splitter is a multi pass splitter -
>> good
>> >>>> >>>> enough to start
>> >>>> >>>>>>>>>>>>>>> with, but with most files under codec control these days,
>> we
>> >>>> may be
>> >>>> >>>> able to make
>> >>>> >>>>>>>>>>>>>>> a single pass splitter soon as well.
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> Eventually you could imagine using both options -
>> micro
>> >>>> shards
>> >>>> >>>> that could
>> >>>> >>>>>>>>>>>>>>> also be split as needed. Though I still wonder if
>> micro
>> >>>> shards
>> >>>> >>>> will be
>> >>>> >>>>>>>>>>>>>>> worth the extra complications myself...
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> Right now though, the idea is that you should pick a
>> good
>> >>>> >>>> number of
>> >>>> >>>>>>>>>>>>>>> partitions to start given your expected data ;)
>> Adding more
>> >>>> >>>> replicas is
>> >>>> >>>>>>>>>>>>>>> trivial though.
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> - Mark
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <
>> >>>> >>>> jej2...@gmail.com> wrote:
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>> Another question, is there any support for
>> repartitioning
>> >>>> of
>> >>>> >>>> the index
>> >>>> >>>>>>>>>>>>>>>> if a new shard is added?  What is the recommended
>> >>>> approach for
>> >>>> >>>>>>>>>>>>>>>> handling this?  It seemed that the hashing algorithm
>> (and
>> >>>> >>>> probably
>> >>>> >>>>>>>>>>>>>>>> any) would require the index to be repartitioned
>> should a
>> >>>> new
>> >>>> >>>> shard be
>> >>>> >>>>>>>>>>>>>>>> added.
>> >>>> >>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <
>> >>>> >>>> jej2...@gmail.com> wrote:
>> >>>> >>>>>>>>>>>>>>>>> Thanks I will try this first thing in the morning.
>> >>>> >>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <
>> >>>> >>>> markrmil...@gmail.com>
>> >>>> >>>>>>>>>>>>>>>> wrote:
>> >>>> >>>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <
>> >>>> >>>> jej2...@gmail.com>
>> >>>> >>>>>>>>>>>>>>>> wrote:
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>>> I am currently looking at the latest solrcloud
>> branch
>> >>>> and
>> >>>> >>>> was
>> >>>> >>>>>>>>>>>>>>>>>>> wondering if there was any documentation on
>> >>>> configuring the
>> >>>> >>>>>>>>>>>>>>>>>>> DistributedUpdateProcessor?  What specifically in
>> >>>> >>>> solrconfig.xml needs
>> >>>> >>>>>>>>>>>>>>>>>>> to be added/modified to make distributed indexing
>> work?
>> >>>> >>>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>> Hi Jamie - take a look at
>> solrconfig-distrib-update.xml
>> >>>> in
>> >>>> >>>>>>>>>>>>>>>>>> solr/core/src/test-files
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>> You need to enable the update log, add an empty
>> >>>> replication
>> >>>> >>>> handler def,
>> >>>> >>>>>>>>>>>>>>>>>> and an update chain with
>> >>>> >>>> solr.DistributedUpdateProcessFactory in it.
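
For anyone else following along, those three pieces look roughly like this in
solrconfig.xml. This is a sketch pieced together from the description above and that
test file; the exact element names, class names, and chain ordering may not match the
branch, so double-check against solrconfig-distrib-update.xml:

    <!-- 1. Enable the update log (syntax here is my best guess). -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </updateHandler>

    <!-- 2. An empty replication handler definition. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler" />

    <!-- 3. An update chain containing the distributed update processor
            (factory class name as I understand it). -->
    <updateRequestProcessorChain name="distrib-update-chain">
      <processor class="solr.DistributedUpdateProcessorFactory" />
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
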
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>> --
>> >>>> >>>>>>>>>>>>>>>>>> - Mark
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>> http://www.lucidimagination.com
>> >>>> >>>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> --
>> >>>> >>>>>>>>>>>>>>> - Mark
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>>> http://www.lucidimagination.com
>> >>>> >>>>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>> - Mark Miller
>> >>>> >>>>>>>>>>>>> lucidimagination.com
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>> - Mark Miller
>> >>>> >>>>>>>>>>> lucidimagination.com
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>>
>> >>>> >>>>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>> - Mark Miller
>> >>>> >>>>>>>> lucidimagination.com
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>> - Mark Miller
>> >>>> >>>>>> lucidimagination.com
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>>
>> >>>> >>>>>
>> >>>> >>>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> --
>> >>>> >>> - Mark
>> >>>> >>>
>> >>>> >>> http://www.lucidimagination.com
>> >>>> >>>
>> >>>> >
>> >>>> > - Mark Miller
>> >>>> > lucidimagination.com
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> - Mark
>> >>>
>> >>> http://www.lucidimagination.com
>>
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
