Re: SolrCloud Feedback

Mark Miller Sat, 11 Jun 2011 07:12:10 -0700

Jan, I feel terrible for leaving you hanging on this - I missed this email 
entirely. Seems some of these should be made JIRA issues if they are not 
already?


bq. j) Question: Is ReplicationHandler ZK-aware yet?

As I think you now know, not yet ;)

- Mark

On Feb 14, 2011, at 4:40 PM, Jan Høydahl wrote:

> Some more comments:
> 
> f) For consistency, the JAVA OPTIONS should all be prefixed with solr.* even 
> if they are related to embedded ZK
>   -Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900 -Dsolr.zkBootstrap_confdir=./solr/conf
> 
> g) I often share parts of my config between cores, e.g. a common schema.xml 
> or synonyms.xml
>   In the file-based mode I can thus use ../../common_conf/synonyms.xml or 
> similar.
>   I have not tried to bootstrap such a config into ZK, but I assume it will 
> not work.
>   ZK mode should support such a use case, either by supporting notations like 
> ".."
>   or by allowing an explicit ZK namespace: 
> zk://configs/common-cfg/synonyms.xml
> 
> h) Support for dev / test / prod environments
>   In real life you want to develop in one environment, test in another and 
> run production in a third
>   Thus, the ZK data structure should have a clear separation between logical 
> feature configuration and
>   physical deployment config.
> 
>   Perhaps a new level above /COLLECTIONS could be used to model this, e.g.
>   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
>   /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
>   /ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
>   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
>   /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
>   /ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070
> 
>   When starting solr we may specify environment: -Dsolr.env=TEST (or 
> configure a default)
>   The main benefit is that we can maintain and store one single ZK config in 
> our SCM,
>   distribute the same configs to all servers, and if you like, point all envs 
> to the same ZK ensemble.
> 
>   In the future, we can use this for automatic install of a new node as well:
>   By simply adding a ZK entry in the right place, the node can discover "who 
> it is" from ZK.
> 
> i) Ideally, no config inside conf should contain host names.
>   My DIH config will most likely include server names, which will be 
> different between TEST and PROD
>   This could be solved as above, by letting the collection in TEST use 
> another configName than PROD,
>   but for some use cases, it might be more elegant to swap out a hardcoded 
> string with a ZK node 
>   in a generic way, such as jdbcString="my-hardcoded-string" to 
> jdbcString="${zk://ENV/PROD/jdbcstrA}"
> 
> j) Question: Is ReplicationHandler ZK-aware yet?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 10. feb. 2011, at 16.10, Jan Høydahl wrote:
> 
>> Hi,
>> 
>> I have so far just tested the examples and got a N by M cluster running. My 
>> feedback:
>> 
>> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly 
>> state what is in which version, what are current improvement plans and get 
>> rid of outdated stuff. That said I think there are many good ideas there.
>> 
>> b) The "collection" terminology is too easily confused with "core", and should 
>> probably be made more distinct. I just tried to configure two cores on the 
>> same Solr instance into the same collection, and that worked fine, both as 
>> distinct shards and as same shard (replica). The wiki examples give the 
>> impression that "collection1" in 
>> localhost:8983/solr/collection1/select?distrib=true is some magic collection 
>> identifier, but what it really does is run the query on the *core* named 
>> "collection1", looking up what collection that core is part of and 
>> distributing the query to all shards in that collection.
>> 
>> c) ZK is not designed to store large files. While the files in conf are 
>> normally well below the 1M limit ZK imposes, we should perhaps consider 
>> using a lightweight distributed object or k/v store for holding the /CONFIGS 
>> and let ZK store a reference only
>> 
>> d) How are admins supposed to update configs in ZK? Install their favourite 
>> ZK editor?
>> 
>> e) We should perhaps not be so afraid to make ZK a requirement for Solr in 
>> v4. Ideally you should interact with a 1-node Solr in the same manner as you 
>> do with a 100-node Solr. An example is the Admin GUI where the "schema" and 
>> "solrconfig" links assume local file. This requires decent tool support to 
>> make ZK interaction intuitive, such as "import" and "export" commands.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 19. jan. 2011, at 21.07, Mark Miller wrote:
>> 
>>> Hello Users,
>>> 
>>> A little over a year ago, a few of us started working on what we 
>>> called SolrCloud.
>>> 
>>> This initial bit of work was really a combination of laying some base work 
>>> - figuring out how to integrate ZooKeeper with Solr in a limited way, 
>>> dealing with some infrastructure - and picking off some low hanging search 
>>> side fruit.
>>> 
>>> The next step is the indexing side. And we plan on starting to tackle that 
>>> sometime soon.
>>> 
>>> But first - could you help with some feedback? Some people are using our 
>>> SolrCloud start - I have seen evidence of it ;) Some, even in production.
>>> 
>>> I would love to have your help in targeting what we now try and improve. 
>>> Any suggestions or feedback? If you have sent this before, I/others likely 
>>> missed it - send it again!
>>> 
>>> I know anyone that has used SolrCloud has some feedback. I know it because 
>>> I've used it too ;) It's too complicated to set up still. There are still 
>>> plenty of pain points. We accepted some compromise trying to fit into what 
>>> Solr was, and not wanting to dig in too far before feeling things out and 
>>> letting users try things out a bit. Thinking that we might be able to 
>>> adjust Solr to be more in favor of SolrCloud as we go, what is the ideal 
>>> state of the work we have currently done?
>>> 
>>> If anyone using SolrCloud helps with the feedback, I'll help with the 
>>> coding effort.
>>> 
>>> - Mark Miller
>>> -- lucidimagination.com
>> 
> 

- Mark Miller
lucidimagination.com
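
To make Jan's points (h) and (i) a little more concrete, here is a minimal Python sketch of the idea. It is not anything Solr or ZooKeeper provides today: the `FAKE_ZK` dictionary stands in for a real ZooKeeper tree, the JDBC strings and `example.com` hosts are made up, and `shard_path` and `resolve` are hypothetical helper names chosen for illustration. It only shows how a per-environment path could be built and how a `${zk://...}` placeholder in a config string could be swapped for the data stored at that node.

```python
import re

# Hypothetical stand-in for a ZooKeeper tree: node path -> node data.
# A real deployment would read these through a ZooKeeper client instead.
FAKE_ZK = {
    "/ENV/PROD/jdbcstrA": "jdbc:mysql://prod-db.example.com/solr",
    "/ENV/TEST/jdbcstrA": "jdbc:mysql://test-db.example.com/solr",
}

# Matches placeholders of the form ${zk://some/node/path}.
PLACEHOLDER = re.compile(r"\$\{zk://([^}]+)\}")


def shard_path(env, collection, shard, node):
    """Build a node path following the /ENV/... layout proposed in (h)."""
    return "/ENV/{}/COLLECTIONS/{}/SHARDS/{}/{}".format(
        env, collection, shard, node
    )


def resolve(text, tree):
    """Replace each ${zk://path} placeholder with the data stored at /path."""
    def lookup(match):
        path = "/" + match.group(1).lstrip("/")
        if path not in tree:
            # Fail loudly rather than leave an unresolved placeholder behind.
            raise KeyError("no such node: " + path)
        return tree[path]

    return PLACEHOLDER.sub(lookup, text)
```

With this in place, `resolve('jdbcString="${zk://ENV/PROD/jdbcstrA}"', FAKE_ZK)` substitutes the PROD connection string, and pointing the placeholder at `/ENV/TEST/...` picks up the TEST value without touching the rest of the config.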
