Re: Regarding document routing

manish tanger Wed, 10 Jan 2018 22:01:31 -0800

Hello Shawn,

First of all, thanks for your response.


>*For redundancy with ZK, you need three hosts minimum.  A two-host ZK
ensemble is actually *less* reliable than using one server.  You aren't
protected against failure until you have at least three.  You would only
need a minimum of two Solr hosts, though.*
Yes, I have read the same elsewhere. This is only my local setup, not
production, but I'll keep your advice in mind when building the prod setup.
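
The arithmetic behind that advice can be sketched as follows. ZooKeeper stays available only while a strict majority of the ensemble is reachable, so the number of tolerated failures is the ensemble size minus the majority size. The class and method names are illustrative only and are not part of any ZooKeeper API.

```java
// Illustrative helper (hypothetical names): majority-quorum arithmetic
// for a ZooKeeper ensemble of a given size.
final class ZkQuorumMath {
    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): quorum=" + quorumSize(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

With two servers the quorum is two, so neither may fail, and there are now two machines that can break instead of one. Three is the smallest size that survives a single failure.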


>*ZooKeeper isn't responsible for distributing documents between shards.
It is Solr that does this, using information in the ZK database.  With the
implicit router, the only routing information in ZK is the shard names.
Solr cannot make decisions about which shard gets the documents, that
information must come from the system doing the indexing.*
Because we connect through ZooKeeper, my understanding was that ZooKeeper
would do the routing. Thanks for the clarification.
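
One way the indexing side can supply that routing information is the `_route_` request parameter, which the Solr reference guide describes as a way to name a specific shard when the implicit router is in use. Below is a minimal sketch that only builds such an update URL. The class name, base URL, and collection name are hypothetical, and the sketch does not send the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper (hypothetical names): builds an update URL that names
// the target shard through the _route_ parameter.
final class RoutedUpdateUrl {
    static String build(String solrBaseUrl, String collection, String shardName) {
        return solrBaseUrl + "/" + collection + "/update?_route_=" + enc(shardName);
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Host and collection name are assumptions for illustration.
        System.out.println(build("http://localhost:8983/solr", "mycollection", "20180111_04"));
    }
}
```

In SolrJ the same parameter can be set on the update request instead of being written into a URL by hand.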


>*What was the precise commands or API calls that you used to create the
collection?  What is the definition of the dateandhour field?*




*Collection creation through the UI:* [image: Inline image 3]
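
The screenshot is not reproduced here, so the exact settings chosen in the UI are unknown. For comparison, here is a minimal sketch of what Collections API requests for an implicit-router collection could look like, using the documented `CREATE` and `CREATESHARD` actions. The class name, base URL, and collection name are hypothetical. Settings such as `collection.configName`, `replicationFactor`, and `maxShardsPerNode` are left out and may be needed on a real cluster.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical names): builds Collections API URLs for
// a collection that uses the implicit router.
final class ImplicitCollectionUrls {
    // CREATE with router.name=implicit needs an explicit list of shard names.
    static String createCollection(String solrBaseUrl, String name,
                                   String routerField, List<String> shards) {
        return solrBaseUrl + "/admin/collections?action=CREATE"
                + "&name=" + enc(name)
                + "&router.name=implicit"
                + "&shards=" + enc(String.join(",", shards))
                + "&router.field=" + enc(routerField);
    }

    // CREATESHARD adds one more named shard to an implicit-router collection.
    static String createShard(String solrBaseUrl, String collection, String shard) {
        return solrBaseUrl + "/admin/collections?action=CREATESHARD"
                + "&collection=" + enc(collection)
                + "&shard=" + enc(shard);
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        System.out.println(createCollection(base, "mycollection", "dateandhour",
                Arrays.asList("20180111_04", "20180111_05")));
        System.out.println(createShard(base, "mycollection", "20180111_06"));
    }
}
```

With `router.field` set this way, each document's `dateandhour` value is expected to be the name of an existing shard.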

*API calls for inserting the docs:*
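import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> inputDocuments = new ArrayList<>();
solrCloudClient = new CloudSolrClient.Builder().withZkHost(ZK_HOST_LIST).build();
solrCloudClient.setDefaultCollection(COLLECTION_NAME);

SolrInputDocument inputDocument = new SolrInputDocument();
inputDocument.addField("id", UUID.randomUUID().toString());
// Routing field: the value is meant to be the name of the target shard.
inputDocument.addField(dateAndHour, "20180111_04");
// Placeholder for the remaining document fields.
inputDocument.addField(__KEY__, __VALUE__);
inputDocuments.add(inputDocument);

solrCloudClient.add(inputDocuments);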



*dateandhour field definition:*
<field name="dateandhour" type="string" indexed="false" stored="true"/>

Here I want to put all the data for that one hour into the *20180111_04* shard.
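
A minimal sketch of deriving such an hourly shard name from a timestamp, so that every document from the same hour carries the same routing value. It assumes the names follow the `yyyyMMdd_HH` pattern seen in "20180111_04" and that hours are counted in UTC. Both are assumptions, as is the class name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative helper (hypothetical name): maps a timestamp to an hourly
// shard name such as "20180111_04".
final class HourlyShardName {
    // Assumed naming pattern and time zone; adjust both to the real convention.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    static String forInstant(Instant timestamp) {
        return FORMAT.format(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(forInstant(Instant.parse("2018-01-11T04:30:00Z")));
    }
}
```

The returned string would be used both as the shard name and as the value of the `dateandhour` field on each document.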

Thanks for your help.


Regards

Manish Kr. Tanger


On Wed, Jan 10, 2018 at 7:41 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/10/2018 12:18 AM, manish tanger wrote:
>
>> I have a question about implicit routing and couldn't find much info
>> about it on the internet, so please help me out with this.
>>
>> *About environment:*
>> M/c 1: Zookeeper 1 and Solr 1
>> M/c 2: Zookeeper 2 and Solr 2
>>
>
> For redundancy with ZK, you need three hosts minimum.  A two-host ZK
> ensemble is actually *less* reliable than using one server.  You aren't
> protected against failure until you have at least three.  You would only
> need a minimum of two Solr hosts, though.
>
>> I am using a clustered ZooKeeper and "CloudSolrClient" from the SolrJ
>> API in Java.
>>
>> *this.solrCloudClient = new
>> CloudSolrClient.Builder().withZkHost(zkHostList).build();*
>>
>> *Requirement:*
>>
>> My requirement is to store lots of data in Solr using a single collection,
>> so my idea is to create a new shard for every hour so that
>> indexing doesn't take much time.
>>
>> I chose implicit document routing, but I am unable to direct the docs
>> to a particular shard. ZooKeeper is still distributing them across all
>> nodes and shards.
>>
>
> ZooKeeper isn't responsible for distributing documents between shards.  It
> is Solr that does this, using information in the ZK database.  With the
> implicit router, the only routing information in ZK is the shard names.
> Solr cannot make decisions about which shard gets the documents, that
> information must come from the system doing the indexing.
>
>> *What I have tried:*
>> 1. I have created a collection with implicit routing, set a custom
>> routing field "*dateandhour*", and added it as a field in my collection.
>>
>>      While adding a Solr input doc I set this field to the shard name.
>>
>
> What was the precise commands or API calls that you used to create the
> collection?  What is the definition of the dateandhour field?
>
>> 2. I have also tried adding the shard name to the id field, like:
>>       id="*shardName!*uniquedocumentId"
>>
>
> If you want to use a prefix in the uniqueId field, you must be using the
> compositeId router, not the implicit router.  The compositeId router will
> not fit your use case, though -- you cannot add shards to a collection if
> it uses compositeId.  Also, the prefix does not specify the shard by name,
> the value of the prefix is hashed to determine which shard(s) are used.
>
> Here's the documentation on document routing:
>
> https://lucene.apache.org/solr/guide/7_2/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting
>
> Thanks,
> Shawn
>
>
