Replication's polling technique does not scale to massively multicore environments. What is the official answer to this problem? "Use ZK and cloud?"
On Sat, Jun 11, 2011 at 7:11 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Jan, I feel terrible for leaving you hanging on this - I missed this email
> entirely. Seems some of these should be made JIRA issues if they are not
> already?
>
> bq. j) Question: Is ReplicationHandler ZK-aware yet?
>
> As I think you now know, not yet ;)
>
> - Mark
>
> On Feb 14, 2011, at 4:40 PM, Jan Høydahl wrote:
>
>> Some more comments:
>>
>> f) For consistency, the JAVA OPTIONS should all be prefixed with solr.*,
>> even if they are related to embedded ZK:
>> -Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900
>> -Dsolr.zkBootstrap_confdir=./solr/conf
>>
>> g) I often share parts of my config between cores, e.g. a common schema.xml
>> or synonyms.xml. In the file-based mode I can thus use
>> ../../common_conf/synonyms.xml or similar. I have not tried to bootstrap
>> such a config into ZK, but I assume it will not work. ZK mode should
>> support such a use case, either by supporting notations like ".." or by
>> allowing an explicit ZK namespace:
>> zk://configs/common-cfg/synonyms.xml
>>
>> h) Support for dev / test / prod environments.
>> In real life you want to develop in one environment, test in another, and
>> run production in a third. Thus, the ZK data structure should have a clear
>> separation between logical feature configuration and physical deployment
>> config.
>>
>> Perhaps a new level above /COLLECTIONS could be used to model this, e.g.
>> /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
>> /ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
>> /ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
>> /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
>> /ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
>> /ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070
>>
>> When starting Solr we may specify the environment: -Dsolr.env=TEST (or
>> configure a default). The main benefit is that we can maintain and store
>> one single ZK config in our SCM, distribute the same configs to all
>> servers, and, if you like, point all envs to the same ZK ensemble.
>>
>> In the future, we can use this for automatic install of a new node as
>> well: by simply adding a ZK entry in the right place, the node can
>> discover "who it is" from ZK.
>>
>> i) Ideally, no config inside conf should contain host names.
>> My DIH config will most likely include server names, which will be
>> different between TEST and PROD. This could be solved as above, by letting
>> the collection in TEST use another configName than PROD, but for some use
>> cases it might be more elegant to swap out a hardcoded string with a ZK
>> node in a generic way, such as jdbcString="my-hardcoded-string" to
>> jdbcString="${zk://ENV/PROD/jdbcstrA}"
>>
>> j) Question: Is ReplicationHandler ZK-aware yet?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 10. feb. 2011, at 16.10, Jan Høydahl wrote:
>>
>>> Hi,
>>>
>>> I have so far just tested the examples and got an N by M cluster
>>> running. My feedback:
>>>
>>> a) First of all, a major update of the SolrCloud wiki is needed, to
>>> clearly state what is in which version and what the current improvement
>>> plans are, and to get rid of outdated stuff. That said, I think there
>>> are many good ideas there.
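The generic `${zk://...}` substitution suggested in point i) above could be sketched roughly as follows. This is a hypothetical illustration, not a Solr API: the `expand` method and the `Map`-backed resolver (standing in for an actual ZooKeeper read) are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZkPlaceholders {
    // Matches placeholders of the form ${zk://some/zk/path}
    private static final Pattern ZK_REF = Pattern.compile("\\$\\{zk://([^}]+)\\}");

    // Replace each ${zk://path} with the value the resolver returns for that
    // path; a real implementation would read the znode from ZooKeeper instead
    // of a Map. Unresolvable references are left untouched.
    static String expand(String config, Map<String, String> resolver) {
        Matcher m = ZK_REF.matcher(config);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String path = m.group(1);          // e.g. ENV/PROD/jdbcstrA
            String value = resolver.get(path); // would be a ZK getData() call
            m.appendReplacement(out, Matcher.quoteReplacement(
                value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> zk = Map.of("ENV/PROD/jdbcstrA",
                                        "jdbc:mysql://prod-db:3306/solr");
        String cfg = "jdbcString=\"${zk://ENV/PROD/jdbcstrA}\"";
        System.out.println(expand(cfg, zk));
        // prints: jdbcString="jdbc:mysql://prod-db:3306/solr"
    }
}
```

With the `/ENV/<name>` layout proposed in point h), the same config file could then be deployed to every environment, with only the znode contents differing per env.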
>>>
>>> b) The "collection" terminology is too easily confused with "core", and
>>> should probably be made more distinct. I just tried to configure two
>>> cores on the same Solr instance into the same collection, and that
>>> worked fine, both as distinct shards and as the same shard (replica).
>>> The wiki examples give the impression that "collection1" in
>>> localhost:8983/solr/collection1/select?distrib=true is some magic
>>> collection identifier, but what it really does is run the query on the
>>> *core* named "collection1", look up which collection that core is part
>>> of, and distribute the query to all shards in that collection.
>>>
>>> c) ZK is not designed to store large files. While the files in conf are
>>> normally well below the 1M limit ZK imposes, we should perhaps consider
>>> using a lightweight distributed object or k/v store for holding
>>> /CONFIGS, and let ZK store a reference only.
>>>
>>> d) How are admins supposed to update configs in ZK? Install their
>>> favourite ZK editor?
>>>
>>> e) We should perhaps not be so afraid to make ZK a requirement for Solr
>>> in v4. Ideally you should interact with a 1-node Solr in the same manner
>>> as you do with a 100-node Solr. An example is the Admin GUI, where the
>>> "schema" and "solrconfig" links assume a local file. This requires
>>> decent tool support to make ZK interaction intuitive, such as "import"
>>> and "export" commands.
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>> On 19. jan. 2011, at 21.07, Mark Miller wrote:
>>>
>>>> Hello Users,
>>>>
>>>> A little over a year ago, a few of us started working on what we
>>>> called SolrCloud.
>>>>
>>>> This initial bit of work was really a combination of laying some
>>>> groundwork - figuring out how to integrate ZooKeeper with Solr in a
>>>> limited way, dealing with some infrastructure - and picking off some
>>>> low-hanging search-side fruit.
>>>>
>>>> The next step is the indexing side.
>>>> And we plan on starting to tackle that sometime soon.
>>>>
>>>> But first - could you help with some feedback? Some people are using
>>>> our SolrCloud start - I have seen evidence of it ;) Some, even in
>>>> production.
>>>>
>>>> I would love to have your help in targeting what we now try and
>>>> improve. Any suggestions or feedback? If you have sent this before,
>>>> I/others likely missed it - send it again!
>>>>
>>>> I know anyone that has used SolrCloud has some feedback. I know it
>>>> because I've used it too ;) It's still too complicated to set up. There
>>>> are still plenty of pain points. We accepted some compromises trying to
>>>> fit into what Solr was, and not wanting to dig in too far before
>>>> feeling things out and letting users try things out a bit. Thinking
>>>> that we might be able to adjust Solr to be more in favor of SolrCloud
>>>> as we go, what is the ideal state of the work we have currently done?
>>>>
>>>> If anyone using SolrCloud helps with the feedback, I'll help with the
>>>> coding effort.
>>>>
>>>> - Mark Miller
>>>> -- lucidimagination.com
>
> - Mark Miller
> lucidimagination.com

--
Lance Norskog
goks...@gmail.com
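The ~1M znode limit raised in point c) above (ZooKeeper's `jute.maxbuffer`, roughly 1 MB by default) suggests that any conf-bootstrap tooling should size-check files before upload. A minimal hypothetical sketch of such a check; the method name and the `Map` of file contents are illustrative, not Solr APIs:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ZnodeSizeCheck {
    // ZooKeeper's jute.maxbuffer defaults to roughly 1 MB per znode;
    // we use 1 MiB here as an approximation of that ceiling.
    static final int ZK_DEFAULT_LIMIT = 1024 * 1024;

    // Return the subset of conf files (name -> bytes) whose payload would
    // exceed the default znode limit, so the bootstrap can refuse or warn.
    static Map<String, Integer> oversized(Map<String, byte[]> confFiles) {
        Map<String, Integer> tooBig = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : confFiles.entrySet()) {
            if (e.getValue().length > ZK_DEFAULT_LIMIT) {
                tooBig.put(e.getKey(), e.getValue().length);
            }
        }
        return tooBig;
    }

    public static void main(String[] args) {
        Map<String, byte[]> conf = new LinkedHashMap<>();
        conf.put("schema.xml", "<schema/>".getBytes(StandardCharsets.UTF_8));
        conf.put("synonyms.txt", new byte[2 * 1024 * 1024]); // 2 MiB: too large
        System.out.println(oversized(conf).keySet());
        // prints: [synonyms.txt]
    }
}
```

Files flagged this way are exactly the ones that motivate Jan's suggestion of keeping large blobs in an external object/k-v store and storing only a reference in ZK.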