Replication's polling technique does not scale to massively multicore environments. What is the official answer to this problem? "Use ZK and cloud?"
On Sat, Jun 11, 2011 at 7:11 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Jan, I feel terrible for leaving you hanging on this - I missed this email
> entirely. Seems some of these should be made JIRA issues if they are not
> already?
>
> bq. j) Question: Is ReplicationHandler ZK-aware yet?
>
> As I think you now know, not yet ;)
>
> - Mark
>
> On Feb 14, 2011, at 4:40 PM, Jan Høydahl wrote:
>
>> Some more comments:
>>
>> f) For consistency, the JAVA OPTIONS should all be prefixed with solr.*,
>> even if they are related to embedded ZK:
>> -Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900
>> -Dsolr.zkBootstrap_confdir=./solr/conf
>>
>> g) I often share parts of my config between cores, e.g. a common schema.xml
>> or synonyms.xml. In the file-based mode I can thus use
>> ../../common_conf/synonyms.xml or similar. I have not tried to bootstrap
>> such a config into ZK, but I assume it will not work. ZK mode should
>> support such a use case, either by supporting notations like ".." or by
>> allowing an explicit ZK namespace:
>> zk://configs/common-cfg/synonyms.xml
>>
>> h) Support for dev / test / prod environments.
>> In real life you want to develop in one environment, test in another, and
>> run production in a third. Thus, the ZK data structure should have a clear
>> separation between logical feature configuration and physical deployment
>> config.
>>
>> Perhaps a new level above /COLLECTIONS could be used to model this, e.g.
>> /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
>> /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
>> /ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
>> /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
>> /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
>> /ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070
>>
>> When starting Solr we may specify the environment: -Dsolr.env=TEST (or
>> configure a default). The main benefit is that we can maintain and store
>> one single ZK config in our SCM, distribute the same configs to all
>> servers, and, if you like, point all envs to the same ZK ensemble.
>>
>> In the future, we can use this for automatic install of a new node as
>> well: by simply adding a ZK entry in the right place, the node can
>> discover "who it is" from ZK.
>>
>> i) Ideally, no config inside conf should contain host names.
>> My DIH config will most likely include server names, which will be
>> different between TEST and PROD. This could be solved as above, by letting
>> the collection in TEST use another configName than PROD, but for some use
>> cases it might be more elegant to swap out a hardcoded string with a ZK
>> node in a generic way, such as jdbcString="my-hardcoded-string" to
>> jdbcString="${zk://ENV/PROD/jdbcstrA}"
>>
>> j) Question: Is ReplicationHandler ZK-aware yet?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 10. feb. 2011, at 16.10, Jan Høydahl wrote:
>>
>>> Hi,
>>>
>>> I have so far just tested the examples and got an N by M cluster
>>> running. My feedback:
>>>
>>> a) First of all, a major update of the SolrCloud wiki is needed, to
>>> clearly state what is in which version and what the current improvement
>>> plans are, and to get rid of outdated stuff. That said, I think there
>>> are many good ideas there.
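The generic `${zk://...}` substitution suggested in point i) above could be sketched roughly as follows. This is a hypothetical illustration, not a Solr API: the `expand` method and the `Map`-backed resolver (standing in for an actual ZooKeeper read) are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZkPlaceholders {
    // Matches placeholders of the form ${zk://some/zk/path}
    private static final Pattern ZK_REF = Pattern.compile("\\$\\{zk://([^}]+)\\}");

    // Replace each ${zk://path} with the value the resolver returns for that
    // path; a real implementation would read the znode from ZooKeeper instead
    // of a Map. Unresolvable references are left untouched.
    static String expand(String config, Map<String, String> resolver) {
        Matcher m = ZK_REF.matcher(config);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String path = m.group(1);          // e.g. ENV/PROD/jdbcstrA
            String value = resolver.get(path); // would be a ZK getData() call
            m.appendReplacement(out, Matcher.quoteReplacement(
                value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> zk = Map.of("ENV/PROD/jdbcstrA",
                                        "jdbc:mysql://prod-db:3306/solr");
        String cfg = "jdbcString=\"${zk://ENV/PROD/jdbcstrA}\"";
        System.out.println(expand(cfg, zk));
        // prints: jdbcString="jdbc:mysql://prod-db:3306/solr"
    }
}
```

With the `/ENV/<name>` layout proposed in point h), the same config file could then be deployed to every environment, with only the znode contents differing per env.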
>>>
>>> b) The "collection" terminology is too easily confused with "core", and
>>> should probably be made more distinct. I just tried to configure two
>>> cores on the same Solr instance into the same collection, and that
>>> worked fine, both as distinct shards and as the same shard (replica).
>>> The wiki examples give the impression that "collection1" in
>>> localhost:8983/solr/collection1/select?distrib=true is some magic
>>> collection identifier, but what it really does is run the query on the
>>> *core* named "collection1", look up which collection that core is part
>>> of, and distribute the query to all shards in that collection.
>>>
>>> c) ZK is not designed to store large files. While the files in conf are
>>> normally well below the 1M limit ZK imposes, we should perhaps consider
>>> using a lightweight distributed object or k/v store for holding
>>> /CONFIGS, and let ZK store a reference only.
>>>
>>> d) How are admins supposed to update configs in ZK? Install their
>>> favourite ZK editor?
>>>
>>> e) We should perhaps not be so afraid to make ZK a requirement for Solr
>>> in v4. Ideally you should interact with a 1-node Solr in the same manner
>>> as you do with a 100-node Solr. An example is the Admin GUI, where the
>>> "schema" and "solrconfig" links assume a local file. This requires
>>> decent tool support to make ZK interaction intuitive, such as "import"
>>> and "export" commands.
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>> On 19. jan. 2011, at 21.07, Mark Miller wrote:
>>>
>>>> Hello Users,
>>>>
>>>> A little over a year ago, a few of us started working on what we
>>>> called SolrCloud.
>>>>
>>>> This initial bit of work was really a combination of laying some
>>>> groundwork - figuring out how to integrate ZooKeeper with Solr in a
>>>> limited way, dealing with some infrastructure - and picking off some
>>>> low-hanging search-side fruit.
>>>>
>>>> The next step is the indexing side.
>>>> And we plan on starting to tackle that sometime soon.
>>>>
>>>> But first - could you help with some feedback? Some people are using
>>>> our SolrCloud start - I have seen evidence of it ;) Some, even in
>>>> production.
>>>>
>>>> I would love to have your help in targeting what we now try and
>>>> improve. Any suggestions or feedback? If you have sent this before,
>>>> I/others likely missed it - send it again!
>>>>
>>>> I know anyone that has used SolrCloud has some feedback. I know it
>>>> because I've used it too ;) It's still too complicated to set up. There
>>>> are still plenty of pain points. We accepted some compromises trying to
>>>> fit into what Solr was, and not wanting to dig in too far before
>>>> feeling things out and letting users try things out a bit. Thinking
>>>> that we might be able to adjust Solr to be more in favor of SolrCloud
>>>> as we go, what is the ideal state of the work we have currently done?
>>>>
>>>> If anyone using SolrCloud helps with the feedback, I'll help with the
>>>> coding effort.
>>>>
>>>> - Mark Miller
>>>> -- lucidimagination.com
>
> - Mark Miller
> lucidimagination.com

--
Lance Norskog
goks...@gmail.com
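The ~1M znode limit raised in point c) above (ZooKeeper's `jute.maxbuffer`, roughly 1 MB by default) suggests that any conf-bootstrap tooling should size-check files before upload. A minimal hypothetical sketch of such a check; the method name and the `Map` of file contents are illustrative, not Solr APIs:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ZnodeSizeCheck {
    // ZooKeeper's jute.maxbuffer defaults to roughly 1 MB per znode;
    // we use 1 MiB here as an approximation of that ceiling.
    static final int ZK_DEFAULT_LIMIT = 1024 * 1024;

    // Return the subset of conf files (name -> bytes) whose payload would
    // exceed the default znode limit, so the bootstrap can refuse or warn.
    static Map<String, Integer> oversized(Map<String, byte[]> confFiles) {
        Map<String, Integer> tooBig = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : confFiles.entrySet()) {
            if (e.getValue().length > ZK_DEFAULT_LIMIT) {
                tooBig.put(e.getKey(), e.getValue().length);
            }
        }
        return tooBig;
    }

    public static void main(String[] args) {
        Map<String, byte[]> conf = new LinkedHashMap<>();
        conf.put("schema.xml", "<schema/>".getBytes(StandardCharsets.UTF_8));
        conf.put("synonyms.txt", new byte[2 * 1024 * 1024]); // 2 MiB: too large
        System.out.println(oversized(conf).keySet());
        // prints: [synonyms.txt]
    }
}
```

Files flagged this way are exactly the ones that motivate Jan's suggestion of keeping large blobs in an external object/k-v store and storing only a reference in ZK.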