Again great stuff. Once distributed update/delete works (sounds like it's not far off) I'll have to reevaluate our current stack.
You had mentioned storing the shard hash assignments in ZK; is there a JIRA around this? I'll keep my eyes on the JIRA tickets. Right now distributed update/delete is the big one for me; rebalancing of the cluster is a nice-to-have, but hopefully I won't need that capability anytime in the near future. One other question: my current setup handles replication with a polling setup, and I didn't notice that in the updated solrconfig, so how does replication work now?

On Sat, Dec 3, 2011 at 9:00 AM, Mark Miller <markrmil...@gmail.com> wrote:

> bq. A few questions if a master goes down does a replica get promoted?
>
> Right - if the leader goes down there is a leader election and one of the replicas takes over.
>
> bq. If a new shard needs to be added is it just a matter of starting a new solr instance with a higher numShards?
>
> Eventually, that's the plan.
>
> The idea is, you say something like, I want 3 shards. Now if you start up 9 instances, the first 3 end up as shard leaders - the next 6 evenly come up as replicas for each shard.
>
> To change the numShards, we will need some kind of micro shards / splitting / rebalancing.
>
> bq. Last question, how do you change numShards?
>
> I think this is somewhat a work in progress, but I think Sami just made it so that numShards is stored on the collection node in zk (along with which config set to use). So you would change it there presumably. Or perhaps just start up a new server with an update numShards property and then it would realize that needs to be a new leader - of course then you'd want to rebalance probably - unless you fired up enough servers to add replicas too...
>
> bq. Is that right?
>
> Yup, sounds about right.
>
>
> On Fri, Dec 2, 2011 at 10:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> So I just tried this out, seems like it does the things I asked about.
>>
>> Really really cool stuff, it's progressed quite a bit in the time since I took a snapshot of the branch.
>>
>> Last question, how do you change numShards? Is there a command you can use to do this now? I understand there will be implications for the hashing algorithm, but once the hash ranges are stored in ZK (is there a separate JIRA for this or does this fall under 2358) I assume that it would be a relatively simple index split (JIRA 2595?) and updating the hash ranges in solr, essentially splitting the range between the new and existing shard. Is that right?
>>
>> On Fri, Dec 2, 2011 at 10:08 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> > I think I see it.....so if I understand this correctly you specify numShards as a system property, as new nodes come up they check ZK to see if they should be a new shard or a replica based on if numShards is met. A few questions if a master goes down does a replica get promoted? If a new shard needs to be added is it just a matter of starting a new solr instance with a higher numShards? (understanding that index rebalancing does not happen automatically now, but presumably it could).
>> >
>> > On Fri, Dec 2, 2011 at 9:56 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >> How does it determine the number of shards to create? How many replicas to create?
>> >>
>> >> On Fri, Dec 2, 2011 at 4:30 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >>> Ah, okay - you are setting the shards in solr.xml - thats still an option to force a node to a particular shard - but if you take that out, shards will be auto assigned.
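The assignment behavior Mark describes above (with numShards=3 and nine instances, the first three nodes become shard leaders and the rest come up as replicas) is essentially a round-robin over the shard list. A toy sketch of the idea follows; it is only an illustration, not the actual SolrCloud assignment code, and the node and shard counts are made up.

    // Toy illustration of the node-to-shard assignment described above:
    // with numShards=3 and nine nodes starting in order, nodes 1-3 become
    // shard leaders and nodes 4-9 come up as replicas, spread evenly.
    public class ShardAssignmentSketch {
        public static void main(String[] args) {
            int numShards = 3;
            int nodes = 9;
            for (int i = 0; i < nodes; i++) {
                int shard = (i % numShards) + 1;                 // shard1, shard2, shard3, shard1, ...
                String role = (i < numShards) ? "leader" : "replica";
                System.out.println("node" + (i + 1) + " -> shard" + shard + " (" + role + ")");
            }
        }
    }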
>> >>> >> >>> By the way, because of the version code, distrib deletes don't work at >> the >> >>> moment - will get to that next week. >> >>> >> >>> - Mark >> >>> >> >>> On Fri, Dec 2, 2011 at 1:16 PM, Jamie Johnson <jej2...@gmail.com> >> wrote: >> >>> >> >>>> So I'm a fool. I did set the numShards, the issue was so trivial it's >> >>>> embarrassing. I did indeed have it setup as a replica, the shard >> >>>> names in solr.xml were both shard1. This worked as I expected now. >> >>>> >> >>>> On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmil...@gmail.com> >> wrote: >> >>>> > >> >>>> > They are unused params, so removing them wouldn't help anything. >> >>>> > >> >>>> > You might just want to wait till we are further along before playing >> >>>> with it. >> >>>> > >> >>>> > Or if you submit your full self contained test, I can see what's >> going >> >>>> on (eg its still unclear if you have started setting numShards?). >> >>>> > >> >>>> > I can do a similar set of actions in my tests and it works fine. The >> >>>> only reason I could see things working like this is if it thinks you >> have >> >>>> one shard - a leader and a replica. >> >>>> > >> >>>> > - Mark >> >>>> > >> >>>> > On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote: >> >>>> > >> >>>> >> Glad to hear I don't need to set shards/self, but removing them >> didn't >> >>>> >> seem to change what I'm seeing. Doing this still results in 2 >> >>>> >> documents 1 on 8983 and 1 on 7574. >> >>>> >> >> >>>> >> String key = "1"; >> >>>> >> >> >>>> >> SolrInputDocument solrDoc = new SolrInputDocument(); >> >>>> >> solrDoc.setField("key", key); >> >>>> >> >> >>>> >> solrDoc.addField("content_mvtxt", "initial value"); >> >>>> >> >> >>>> >> SolrServer server = servers.get(" >> >>>> http://localhost:8983/solr/collection1"); >> >>>> >> >> >>>> >> UpdateRequest ureq = new UpdateRequest(); >> >>>> >> ureq.setParam("update.chain", >> "distrib-update-chain"); >> >>>> >> ureq.add(solrDoc); >> >>>> >> ureq.setAction(ACTION.COMMIT, true, true); >> >>>> >> server.request(ureq); >> >>>> >> server.commit(); >> >>>> >> >> >>>> >> solrDoc = new SolrInputDocument(); >> >>>> >> solrDoc.addField("key", key); >> >>>> >> solrDoc.addField("content_mvtxt", "updated value"); >> >>>> >> >> >>>> >> server = servers.get(" >> >>>> http://localhost:7574/solr/collection1"); >> >>>> >> >> >>>> >> ureq = new UpdateRequest(); >> >>>> >> ureq.setParam("update.chain", >> "distrib-update-chain"); >> >>>> >> ureq.add(solrDoc); >> >>>> >> ureq.setAction(ACTION.COMMIT, true, true); >> >>>> >> server.request(ureq); >> >>>> >> server.commit(); >> >>>> >> >> >>>> >> server = servers.get(" >> >>>> http://localhost:8983/solr/collection1"); >> >>>> >> >> >>>> >> >> >>>> >> server.commit(); >> >>>> >> System.out.println("done"); >> >>>> >> >> >>>> >> On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller < >> markrmil...@gmail.com> >> >>>> wrote: >> >>>> >>> So I dunno. You are running a zk server and running in zk mode >> right? >> >>>> >>> >> >>>> >>> You don't need to / shouldn't set a shards or self param. The >> shards >> >>>> are >> >>>> >>> figured out from Zookeeper. >> >>>> >>> >> >>>> >>> You always want to use the distrib-update-chain. Eventually it >> will >> >>>> >>> probably be part of the default chain and auto turn in zk mode. >> >>>> >>> >> >>>> >>> If you are running in zk mode attached to a zk server, this should >> >>>> work no >> >>>> >>> problem. 
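For readability, here is an untangled version of the add-then-update snippet quoted above, with the shards/self params dropped as Mark suggests. It is only a sketch: servers is assumed to be a Map of base URL to SolrServer as in the original test, and the explicit commit on the second node is there because commits are not yet distributed on the branch.

    import java.util.Map;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class DistribUpdateSketch {

        // Assumed to be populated elsewhere with a SolrServer per node,
        // keyed by base URL, as in the quoted test.
        private Map<String, SolrServer> servers;

        public void addThenUpdate() throws Exception {
            String key = "1";

            // Add the initial document through node 8983, going through the
            // distributed update chain so it is forwarded to the shard leader.
            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("key", key);
            doc.addField("content_mvtxt", "initial value");

            SolrServer server = servers.get("http://localhost:8983/solr/collection1");
            UpdateRequest req = new UpdateRequest();
            req.setParam("update.chain", "distrib-update-chain");
            req.add(doc);
            req.setAction(ACTION.COMMIT, true, true);
            server.request(req);

            // Send the updated version of the same document through node 7574.
            // With the distrib chain it should hash to the same shard and
            // replace the earlier version rather than leaving two copies.
            doc = new SolrInputDocument();
            doc.addField("key", key);
            doc.addField("content_mvtxt", "updated value");

            server = servers.get("http://localhost:7574/solr/collection1");
            req = new UpdateRequest();
            req.setParam("update.chain", "distrib-update-chain");
            req.add(doc);
            req.setAction(ACTION.COMMIT, true, true);
            server.request(req);

            // Commits are not distributed yet, so commit explicitly on the
            // other node as well before querying.
            servers.get("http://localhost:8983/solr/collection1").commit();
        }
    }

With numShards set and both nodes registered in ZK, both requests should end up on the same shard leader, so the second add simply overwrites the first instead of leaving a copy on each node.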
You can add docs to any server and they will be >> forwarded to >> >>>> the >> >>>> >>> correct shard leader and then versioned and forwarded to replicas. >> >>>> >>> >> >>>> >>> You can also use the CloudSolrServer solrj client - that way you >> don't >> >>>> even >> >>>> >>> have to choose a server to send docs too - in which case if it >> went >> >>>> down >> >>>> >>> you would have to choose another manually - CloudSolrServer >> >>>> automatically >> >>>> >>> finds one that is up through ZooKeeper. Eventually it will also be >> >>>> smart >> >>>> >>> and do the hashing itself so that it can send directly to the >> shard >> >>>> leader >> >>>> >>> that the doc would be forwarded to anyway. >> >>>> >>> >> >>>> >>> - Mark >> >>>> >>> >> >>>> >>> On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2...@gmail.com >> > >> >>>> wrote: >> >>>> >>> >> >>>> >>>> Really just trying to do a simple add and update test, the chain >> >>>> >>>> missing is just proof of my not understanding exactly how this is >> >>>> >>>> supposed to work. I modified the code to this >> >>>> >>>> >> >>>> >>>> String key = "1"; >> >>>> >>>> >> >>>> >>>> SolrInputDocument solrDoc = new >> SolrInputDocument(); >> >>>> >>>> solrDoc.setField("key", key); >> >>>> >>>> >> >>>> >>>> solrDoc.addField("content_mvtxt", "initial >> value"); >> >>>> >>>> >> >>>> >>>> SolrServer server = servers >> >>>> >>>> .get(" >> >>>> >>>> http://localhost:8983/solr/collection1"); >> >>>> >>>> >> >>>> >>>> UpdateRequest ureq = new UpdateRequest(); >> >>>> >>>> ureq.setParam("update.chain", >> "distrib-update-chain"); >> >>>> >>>> ureq.add(solrDoc); >> >>>> >>>> ureq.setParam("shards", >> >>>> >>>> >> >>>> >>>> >> "localhost:8983/solr/collection1,localhost:7574/solr/collection1"); >> >>>> >>>> ureq.setParam("self", "foo"); >> >>>> >>>> ureq.setAction(ACTION.COMMIT, true, true); >> >>>> >>>> server.request(ureq); >> >>>> >>>> server.commit(); >> >>>> >>>> >> >>>> >>>> solrDoc = new SolrInputDocument(); >> >>>> >>>> solrDoc.addField("key", key); >> >>>> >>>> solrDoc.addField("content_mvtxt", "updated >> value"); >> >>>> >>>> >> >>>> >>>> server = servers.get(" >> >>>> >>>> http://localhost:7574/solr/collection1"); >> >>>> >>>> >> >>>> >>>> ureq = new UpdateRequest(); >> >>>> >>>> ureq.setParam("update.chain", >> "distrib-update-chain"); >> >>>> >>>> // >> >>>> ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285"); >> >>>> >>>> ureq.add(solrDoc); >> >>>> >>>> ureq.setParam("shards", >> >>>> >>>> >> >>>> >>>> >> "localhost:8983/solr/collection1,localhost:7574/solr/collection1"); >> >>>> >>>> ureq.setParam("self", "foo"); >> >>>> >>>> ureq.setAction(ACTION.COMMIT, true, true); >> >>>> >>>> server.request(ureq); >> >>>> >>>> // server.add(solrDoc); >> >>>> >>>> server.commit(); >> >>>> >>>> server = servers.get(" >> >>>> >>>> http://localhost:8983/solr/collection1"); >> >>>> >>>> >> >>>> >>>> >> >>>> >>>> server.commit(); >> >>>> >>>> System.out.println("done"); >> >>>> >>>> >> >>>> >>>> but I'm still seeing the doc appear on both shards. After the >> first >> >>>> >>>> commit I see the doc on 8983 with "initial value". after the >> second >> >>>> >>>> commit I see the updated value on 7574 and the old on 8983. >> After the >> >>>> >>>> final commit the doc on 8983 gets updated. >> >>>> >>>> >> >>>> >>>> Is there something wrong with my test? 
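A minimal sketch of the CloudSolrServer client Mark mentions. The ZooKeeper address (an embedded zk on localhost:9983) and the setDefaultCollection call are assumptions here; the exact API on the branch may differ.

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Point the client at ZooKeeper instead of a single Solr node;
            // it picks a live node from the cluster state, so there is no
            // single URL to fail over from by hand.
            CloudSolrServer cloud = new CloudSolrServer("localhost:9983"); // assumed zk address
            cloud.setDefaultCollection("collection1");                     // assumed collection name

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            cloud.add(doc);
            cloud.commit();
        }
    }

If the node it happens to be talking to goes down, it finds another live one through ZooKeeper, which is the failover behavior described above.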
>> >>>> >>>> >> >>>> >>>> On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller < >> markrmil...@gmail.com> >> >>>> >>>> wrote: >> >>>> >>>>> Getting late - didn't really pay attention to your code I guess >> - why >> >>>> >>>> are you adding the first doc without specifying the distrib >> update >> >>>> chain? >> >>>> >>>> This is not really supported. It's going to just go to the >> server you >> >>>> >>>> specified - even with everything setup right, the update might >> then >> >>>> go to >> >>>> >>>> that same server or the other one depending on how it hashes. You >> >>>> really >> >>>> >>>> want to just always use the distrib update chain. I guess I >> don't yet >> >>>> >>>> understand what you are trying to test. >> >>>> >>>>> >> >>>> >>>>> Sent from my iPad >> >>>> >>>>> >> >>>> >>>>> On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmil...@gmail.com >> > >> >>>> wrote: >> >>>> >>>>> >> >>>> >>>>>> Not sure offhand - but things will be funky if you don't >> specify the >> >>>> >>>> correct numShards. >> >>>> >>>>>> >> >>>> >>>>>> The instance to shard assignment should be using numShards to >> >>>> assign. >> >>>> >>>> But then the hash to shard mapping actually goes on the number of >> >>>> shards it >> >>>> >>>> finds registered in ZK (it doesn't have to, but really these >> should be >> >>>> >>>> equal). >> >>>> >>>>>> >> >>>> >>>>>> So basically you are saying, I want 3 partitions, but you are >> only >> >>>> >>>> starting up 2 nodes, and the code is just not happy about that >> I'd >> >>>> guess. >> >>>> >>>> For the system to work properly, you have to fire up at least as >> many >> >>>> >>>> servers as numShards. >> >>>> >>>>>> >> >>>> >>>>>> What are you trying to do? 2 partitions with no replicas, or >> one >> >>>> >>>> partition with one replica? >> >>>> >>>>>> >> >>>> >>>>>> In either case, I think you will have better luck if you fire >> up at >> >>>> >>>> least as many servers as the numShards setting. Or lower the >> numShards >> >>>> >>>> setting. >> >>>> >>>>>> >> >>>> >>>>>> This is all a work in progress by the way - what you are >> trying to >> >>>> test >> >>>> >>>> should work if things are setup right though. >> >>>> >>>>>> >> >>>> >>>>>> - Mark >> >>>> >>>>>> >> >>>> >>>>>> >> >>>> >>>>>> On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote: >> >>>> >>>>>> >> >>>> >>>>>>> Thanks for the quick response. With that change (have not >> done >> >>>> >>>>>>> numShards yet) shard1 got updated. But now when executing the >> >>>> >>>>>>> following queries I get information back from both, which >> doesn't >> >>>> seem >> >>>> >>>>>>> right >> >>>> >>>>>>> >> >>>> >>>>>>> http://localhost:7574/solr/select/?q=*:* >> >>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated >> >>>> >>>> value</str></doc> >> >>>> >>>>>>> >> >>>> >>>>>>> http://localhost:8983/solr/select?q=*:* >> >>>> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated >> >>>> >>>> value</str></doc> >> >>>> >>>>>>> >> >>>> >>>>>>> >> >>>> >>>>>>> >> >>>> >>>>>>> On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller < >> >>>> markrmil...@gmail.com> >> >>>> >>>> wrote: >> >>>> >>>>>>>> Hmm...sorry bout that - so my first guess is that right now >> we are >> >>>> >>>> not distributing a commit (easy to add, just have not done it). >> >>>> >>>>>>>> >> >>>> >>>>>>>> Right now I explicitly commit on each server for tests. >> >>>> >>>>>>>> >> >>>> >>>>>>>> Can you try explicitly committing on server1 after updating >> the >> >>>> doc >> >>>> >>>> on server 2? 
>> >>>> >>>>>>>> >> >>>> >>>>>>>> I can start distributing commits tomorrow - been meaning to >> do it >> >>>> for >> >>>> >>>> my own convenience anyhow. >> >>>> >>>>>>>> >> >>>> >>>>>>>> Also, you want to pass the sys property numShards=1 on >> startup. I >> >>>> >>>> think it defaults to 3. That will give you one leader and one >> replica. >> >>>> >>>>>>>> >> >>>> >>>>>>>> - Mark >> >>>> >>>>>>>> >> >>>> >>>>>>>> On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote: >> >>>> >>>>>>>> >> >>>> >>>>>>>>> So I couldn't resist, I attempted to do this tonight, I >> used the >> >>>> >>>>>>>>> solrconfig you mentioned (as is, no modifications), I setup >> a 2 >> >>>> shard >> >>>> >>>>>>>>> cluster in collection1, I sent 1 doc to 1 of the shards, >> updated >> >>>> it >> >>>> >>>>>>>>> and sent the update to the other. I don't see the >> modifications >> >>>> >>>>>>>>> though I only see the original document. The following is >> the >> >>>> test >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> public void update() throws Exception { >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> String key = "1"; >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> SolrInputDocument solrDoc = new >> SolrInputDocument(); >> >>>> >>>>>>>>> solrDoc.setField("key", key); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> solrDoc.addField("content", "initial value"); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> SolrServer server = servers >> >>>> >>>>>>>>> .get(" >> >>>> >>>> http://localhost:8983/solr/collection1"); >> >>>> >>>>>>>>> server.add(solrDoc); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> server.commit(); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> solrDoc = new SolrInputDocument(); >> >>>> >>>>>>>>> solrDoc.addField("key", key); >> >>>> >>>>>>>>> solrDoc.addField("content", "updated value"); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> server = servers.get(" >> >>>> >>>> http://localhost:7574/solr/collection1"); >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> UpdateRequest ureq = new UpdateRequest(); >> >>>> >>>>>>>>> ureq.setParam("update.chain", >> >>>> "distrib-update-chain"); >> >>>> >>>>>>>>> ureq.add(solrDoc); >> >>>> >>>>>>>>> ureq.setParam("shards", >> >>>> >>>>>>>>> >> >>>> >>>> >> "localhost:8983/solr/collection1,localhost:7574/solr/collection1"); >> >>>> >>>>>>>>> ureq.setParam("self", "foo"); >> >>>> >>>>>>>>> ureq.setAction(ACTION.COMMIT, true, true); >> >>>> >>>>>>>>> server.request(ureq); >> >>>> >>>>>>>>> System.out.println("done"); >> >>>> >>>>>>>>> } >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> key is my unique field in schema.xml >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> What am I doing wrong? >> >>>> >>>>>>>>> >> >>>> >>>>>>>>> On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson < >> jej2...@gmail.com >> >>>> > >> >>>> >>>> wrote: >> >>>> >>>>>>>>>> Yes, the ZK method seems much more flexible. Adding a new >> shard >> >>>> >>>> would >> >>>> >>>>>>>>>> be simply updating the range assignments in ZK. Where is >> this >> >>>> >>>>>>>>>> currently on the list of things to accomplish? I don't >> have >> >>>> time to >> >>>> >>>>>>>>>> work on this now, but if you (or anyone) could provide >> >>>> direction I'd >> >>>> >>>>>>>>>> be willing to work on this when I had spare time. I guess >> a >> >>>> JIRA >> >>>> >>>>>>>>>> detailing where/how to do this could help. Not sure if the >> >>>> design >> >>>> >>>> has >> >>>> >>>>>>>>>> been thought out that far though. >> >>>> >>>>>>>>>> >> >>>> >>>>>>>>>> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller < >> >>>> markrmil...@gmail.com> >> >>>> >>>> wrote: >> >>>> >>>>>>>>>>> Right now lets say you have one shard - everything there >> >>>> hashes to >> >>>> >>>> range X. 
>> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> Now you want to split that shard with an Index Splitter. >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> You divide range X in two - giving you two ranges - then >> you >> >>>> start >> >>>> >>>> splitting. This is where the current Splitter needs a little >> >>>> modification. >> >>>> >>>> You decide which doc should go into which new index by rehashing >> each >> >>>> doc >> >>>> >>>> id in the index you are splitting - if its hash is greater than >> X/2, >> >>>> it >> >>>> >>>> goes into index1 - if its less, index2. I think there are a >> couple >> >>>> current >> >>>> >>>> Splitter impls, but one of them does something like, give me an >> id - >> >>>> now if >> >>>> >>>> the id's in the index are above that id, goto index1, if below, >> >>>> index2. We >> >>>> >>>> need to instead do a quick hash rather than simple id compare. >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> Why do you need to do this on every shard? >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> The other part we need that we dont have is to store hash >> range >> >>>> >>>> assignments in zookeeper - we don't do that yet because it's not >> >>>> needed >> >>>> >>>> yet. Instead we currently just simply calculate that on the fly >> (too >> >>>> often >> >>>> >>>> at the moment - on every request :) I intend to fix that of >> course). >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> At the start, zk would say, for range X, goto this shard. >> After >> >>>> >>>> the split, it would say, for range less than X/2 goto the old >> node, >> >>>> for >> >>>> >>>> range greater than X/2 goto the new node. >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> - Mark >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote: >> >>>> >>>>>>>>>>> >> >>>> >>>>>>>>>>>> hmmm.....This doesn't sound like the hashing algorithm >> that's >> >>>> on >> >>>> >>>> the >> >>>> >>>>>>>>>>>> branch, right? The algorithm you're mentioning sounds >> like >> >>>> there >> >>>> >>>> is >> >>>> >>>>>>>>>>>> some logic which is able to tell that a particular range >> >>>> should be >> >>>> >>>>>>>>>>>> distributed between 2 shards instead of 1. So seems >> like a >> >>>> trade >> >>>> >>>> off >> >>>> >>>>>>>>>>>> between repartitioning the entire index (on every shard) >> and >> >>>> >>>> having a >> >>>> >>>>>>>>>>>> custom hashing algorithm which is able to handle the >> situation >> >>>> >>>> where 2 >> >>>> >>>>>>>>>>>> or more shards map to a particular range. >> >>>> >>>>>>>>>>>> >> >>>> >>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller < >> >>>> >>>> markrmil...@gmail.com> wrote: >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote: >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>> I am not familiar with the index splitter that is in >> >>>> contrib, >> >>>> >>>> but I'll >> >>>> >>>>>>>>>>>>>> take a look at it soon. So the process sounds like it >> >>>> would be >> >>>> >>>> to run >> >>>> >>>>>>>>>>>>>> this on all of the current shards indexes based on the >> hash >> >>>> >>>> algorithm. >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> Not something I've thought deeply about myself yet, but >> I >> >>>> think >> >>>> >>>> the idea would be to split as many as you felt you needed to. >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> If you wanted to keep the full balance always, this >> would >> >>>> mean >> >>>> >>>> splitting every shard at once, yes. But this depends on how many >> boxes >> >>>> >>>> (partitions) you are willing/able to add at a time. 
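The split decision described above (rehash each document id and send it to one of the two new indexes depending on which half of the old shard's range its hash lands in) can be sketched roughly as below. The hash function and range bounds are placeholders, not the ones Solr actually uses.

    // Toy sketch of the hash-based split decision: documents whose id hash
    // falls in the lower half of the old range go to the first new index,
    // the rest go to the second.
    public class HashSplitSketch {

        // Stand-in for whatever hash of the unique key the cluster uses.
        static int hash(String id) {
            return id.hashCode();
        }

        public static void main(String[] args) {
            int rangeStart = Integer.MIN_VALUE;  // old shard covered [rangeStart, rangeEnd]
            int rangeEnd = Integer.MAX_VALUE;
            long mid = ((long) rangeStart + (long) rangeEnd) / 2;

            for (String id : new String[] {"1", "2", "3", "4"}) {
                String target = (hash(id) <= mid) ? "newIndex1" : "newIndex2";
                System.out.println("doc " + id + " -> " + target);
            }
        }
    }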
>> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> You might just split one index to start - now it's hash >> range >> >>>> >>>> would be handled by two shards instead of one (if you have 3 >> replicas >> >>>> per >> >>>> >>>> shard, this would mean adding 3 more boxes). When you needed to >> expand >> >>>> >>>> again, you would split another index that was still handling its >> full >> >>>> >>>> starting range. As you grow, once you split every original index, >> >>>> you'd >> >>>> >>>> start again, splitting one of the now half ranges. >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>> Is there also an index merger in contrib which could be >> >>>> used to >> >>>> >>>> merge >> >>>> >>>>>>>>>>>>>> indexes? I'm assuming this would be the process? >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> You can merge with IndexWriter.addIndexes (Solr also >> has an >> >>>> >>>> admin command that can do this). But I'm not sure where this >> fits in? >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>> - Mark >> >>>> >>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller < >> >>>> >>>> markrmil...@gmail.com> wrote: >> >>>> >>>>>>>>>>>>>>> Not yet - we don't plan on working on this until a >> lot of >> >>>> >>>> other stuff is >> >>>> >>>>>>>>>>>>>>> working solid at this point. But someone else could >> jump >> >>>> in! >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> There are a couple ways to go about it that I know of: >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> A more long term solution may be to start using micro >> >>>> shards - >> >>>> >>>> each index >> >>>> >>>>>>>>>>>>>>> starts as multiple indexes. This makes it pretty fast >> to >> >>>> move >> >>>> >>>> mirco shards >> >>>> >>>>>>>>>>>>>>> around as you decide to change partitions. It's also >> less >> >>>> >>>> flexible as you >> >>>> >>>>>>>>>>>>>>> are limited by the number of micro shards you start >> with. >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> A more simple and likely first step is to use an index >> >>>> >>>> splitter . We >> >>>> >>>>>>>>>>>>>>> already have one in lucene contrib - we would just >> need to >> >>>> >>>> modify it so >> >>>> >>>>>>>>>>>>>>> that it splits based on the hash of the document id. >> This >> >>>> is >> >>>> >>>> super >> >>>> >>>>>>>>>>>>>>> flexible, but splitting will obviously take a little >> while >> >>>> on >> >>>> >>>> a huge index. >> >>>> >>>>>>>>>>>>>>> The current index splitter is a multi pass splitter - >> good >> >>>> >>>> enough to start >> >>>> >>>>>>>>>>>>>>> with, but most files under codec control these days, >> we >> >>>> may be >> >>>> >>>> able to make >> >>>> >>>>>>>>>>>>>>> a single pass splitter soon as well. >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> Eventually you could imagine using both options - >> micro >> >>>> shards >> >>>> >>>> that could >> >>>> >>>>>>>>>>>>>>> also be split as needed. Though I still wonder if >> micro >> >>>> shards >> >>>> >>>> will be >> >>>> >>>>>>>>>>>>>>> worth the extra complications myself... >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> Right now though, the idea is that you should pick a >> good >> >>>> >>>> number of >> >>>> >>>>>>>>>>>>>>> partitions to start given your expected data ;) >> Adding more >> >>>> >>>> replicas is >> >>>> >>>>>>>>>>>>>>> trivial though. 
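On the merge side, a bare-bones Lucene-level sketch of the IndexWriter.addIndexes call Mark refers to. The paths are placeholders, and the version constant and analyzer are just illustrative 3.x-era choices.

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeIndexesSketch {
        public static void main(String[] args) throws Exception {
            // Open (or create) the destination index.
            Directory dest = FSDirectory.open(new File("/path/to/merged"));
            IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_35,
                    new StandardAnalyzer(Version.LUCENE_35));
            IndexWriter writer = new IndexWriter(dest, cfg);

            // Fold the two source indexes into the destination.
            Directory src1 = FSDirectory.open(new File("/path/to/indexA"));
            Directory src2 = FSDirectory.open(new File("/path/to/indexB"));
            writer.addIndexes(src1, src2);

            writer.close();
        }
    }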
>> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> - Mark >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson < >> >>>> >>>> jej2...@gmail.com> wrote: >> >>>> >>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> Another question, is there any support for >> repartitioning >> >>>> of >> >>>> >>>> the index >> >>>> >>>>>>>>>>>>>>>> if a new shard is added? What is the recommended >> >>>> approach for >> >>>> >>>>>>>>>>>>>>>> handling this? It seemed that the hashing algorithm >> (and >> >>>> >>>> probably >> >>>> >>>>>>>>>>>>>>>> any) would require the index to be repartitioned >> should a >> >>>> new >> >>>> >>>> shard be >> >>>> >>>>>>>>>>>>>>>> added. >> >>>> >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson < >> >>>> >>>> jej2...@gmail.com> wrote: >> >>>> >>>>>>>>>>>>>>>>> Thanks I will try this first thing in the morning. >> >>>> >>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller < >> >>>> >>>> markrmil...@gmail.com> >> >>>> >>>>>>>>>>>>>>>> wrote: >> >>>> >>>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson < >> >>>> >>>> jej2...@gmail.com> >> >>>> >>>>>>>>>>>>>>>> wrote: >> >>>> >>>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>>>> I am currently looking at the latest solrcloud >> branch >> >>>> and >> >>>> >>>> was >> >>>> >>>>>>>>>>>>>>>>>>> wondering if there was any documentation on >> >>>> configuring the >> >>>> >>>>>>>>>>>>>>>>>>> DistributedUpdateProcessor? What specifically in >> >>>> >>>> solrconfig.xml needs >> >>>> >>>>>>>>>>>>>>>>>>> to be added/modified to make distributed indexing >> work? >> >>>> >>>>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>>> Hi Jaime - take a look at >> solrconfig-distrib-update.xml >> >>>> in >> >>>> >>>>>>>>>>>>>>>>>> solr/core/src/test-files >> >>>> >>>>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>>>> You need to enable the update log, add an empty >> >>>> replication >> >>>> >>>> handler def, >> >>>> >>>>>>>>>>>>>>>>>> and an update chain with >> >>>> >>>> solr.DistributedUpdateProcessFactory in it. 
>
> --
> - Mark
>
> http://www.lucidimagination.com