RE: Re: SolrCloud: how to index documents into a specific core and how to search against that core?

Darren Govoni Tue, 22 May 2012 07:09:09 -0700

I'm curious what the SolrCloud experts say, but my suggestion is to try not to
over-engineer the search architecture on SolrCloud. For example, what is
the benefit of managing which cores are indexed and searched? Having to know
those details, in my mind, works against the automation in SolrCloud, but maybe
there's a good reason you want to do it this way.
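
To illustrate the kind of automation meant here, the following is a minimal Python sketch of hash-based document routing. The topology is copied from the clusterstate.json summary in the message quoted below; the hash (CRC32 modulo the shard count), the function names, and the document id are assumptions made for illustration and are not Solr's actual implementation. The point it tries to show is that the shard is chosen from the document id, not from the core that received the update, which would be consistent with the document counts reported in the quoted message.

```python
import zlib

# Topology copied from the clusterstate.json summary quoted in this thread.
TOPOLOGY = {
    "shard1": ["collection1", "CoreForCustomer1", "CoreForCustomer3", "CoreForCustomer5"],
    "shard2": ["collection1", "CoreForCustomer2", "CoreForCustomer4"],
}


def pick_shard(doc_id, topology=TOPOLOGY):
    """Pick a shard from the document id alone.

    Assumption: CRC32 modulo the shard count is only a stand-in;
    Solr's real hash function and range assignment differ.
    """
    names = sorted(topology)
    return names[zlib.crc32(doc_id.encode("utf-8")) % len(names)]


def cores_receiving(doc_id, receiving_core, topology=TOPOLOGY):
    """Return (shard, cores) that end up holding the document.

    receiving_core is deliberately unused: in this sketch the core that
    accepts the update request has no influence on where the document lands.
    """
    shard = pick_shard(doc_id, topology)
    return shard, topology[shard]


if __name__ == "__main__":
    # "doc-1" is a hypothetical document id.
    print(cores_receiving("doc-1", "CoreForCustomer2"))
    print(cores_receiving("doc-1", "CoreForCustomer3"))
```

Both calls print the same shard and core list, because only the id takes part in the routing decision.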


------- Original Message -------
On 5/22/2012  07:35 AM Yandong Yao wrote:

Hi Darren,

Thanks very much for your reply.

The reason I want to control core indexing/searching is that I want to
use one core to store one customer's data (all customers share the same
config): for example, customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.

Is there any better way than using a different core for each customer?

Another way may be to use a different collection for each customer, but I
am not sure how many collections SolrCloud can support. Which way is better
in terms of flexibility/scalability? (Suppose there are tens of thousands
of customers.)

Regards,
Yandong

2012/5/22 Darren Govoni <dar...@ontrenet.com>

> Why do you want to control what gets indexed into a core and then
> have to know which core to search? That's the kind of "knowing" that
> SolrCloud takes care of. SolrCloud handles the distribution of documents
> across shards and retrieves them regardless of which node is searched.
> That is the point of "cloud": you don't know the details of where
> exactly documents are being managed (i.e. they are cloudy). It can
> change and re-balance from time to time. SolrCloud performs the
> distributed search for you, so when you search a node/core
> with no documents, all the results from the "cloud" are retrieved
> regardless. This is considered "A Good Thing".
>
> It requires a change in thinking about indexing and searching....
>
> On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
> > Hi Guys,
> >
> > I use the following commands to start SolrCloud, according to the
> > SolrCloud wiki.
> >
> > yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
> > -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
> > yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983
> > -jar start.jar
> >
> > Then I created several cores using the CoreAdmin API (
> > http://localhost:8983/solr/admin/cores?action=CREATE&name=<coreName>&collection=collection1),
> > and clusterstate.json shows the following topology:
> >
> >
> > collection1:
> >     -- shard1:
> >           -- collection1
> >           -- CoreForCustomer1
> >           -- CoreForCustomer3
> >           -- CoreForCustomer5
> >     -- shard2:
> >           -- collection1
> >           -- CoreForCustomer2
> >           -- CoreForCustomer4
> >
> >
> > 1) Index:
> >
> > I used the following command to index the mem.xml file in the
> > exampledocs directory.
> >
> > yydzero:exampledocs bjcoe$ java
> > -Durl=http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
> > SimplePostTool: version 1.4
> > SimplePostTool: POSTing files to
> > http://localhost:8983/solr/coreForCustomer3/update..
> > SimplePostTool: POSTing file mem.xml
> > SimplePostTool: COMMITting Solr index changes.
> >
> > Now the Solr Admin UI shows that 'coreForCustomer1', 'coreForCustomer3',
> > and 'coreForCustomer5' each have 3 documents (mem.xml has 3 documents)
> > and the other 2 cores have 0 documents.
> >
> > *Question 1*:  Is this expected behavior? How do I index documents into
> > a specific core?
> >
> > *Question 2*:  If SolrCloud doesn't support this yet, how could I extend
> > it to support this feature (indexing documents into a particular core)?
> > Where should I start, the hashing algorithm?
> >
> > *Question 3*:  Why are the documents also indexed into 'coreForCustomer1'
> > and 'coreForCustomer5'?  The default number of replicas is 1, right?
> >
> > Then I tried to index some documents into 'coreForCustomer2':
> >
> > $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
> > post.jar ipod_video.xml
> >
> > But 'coreForCustomer2' still has 0 documents, and the documents in
> > ipod_video.xml are indexed into the cores for customers 1/3/5.
> >
> > *Question 4*:  Why does this happen?
> >
> > 2) Search: I use
> > "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml" to
> > search against 'CoreForCustomer2', but it returns all documents in
> > the whole collection even though this core has no documents at all.
> >
> > Then I use
> > "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2",
> > and it returns 0 documents.
> >
> > *Question 5*: So if I want to search against a particular core, I need to
> > use the 'shards' parameter with that core as the parameter value, right?
> >
> >
> > Thanks very much in advance!
> >
> > Regards,
> > Yandong
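
For readers comparing the two search requests in Question 5, here is a minimal Python sketch that builds both URLs. The host, port, core name, and parameters come from the quoted message; the helper name select_url and its restrict_to_core flag are hypothetical, and the sketch only constructs the URLs without contacting Solr. Whether the 'shards' parameter is the recommended way to limit a query to one core is not settled in this thread.

```python
from urllib.parse import urlencode

# Base URL taken from the quoted message.
BASE = "http://localhost:8983/solr"


def select_url(core, restrict_to_core=False):
    """Build a match-all select URL for the given core.

    With restrict_to_core=True, a 'shards' parameter pointing at the same
    core is appended, as in the second request of the quoted message.
    """
    params = [("q", "*:*"), ("wt", "xml")]
    if restrict_to_core:
        params.append(("shards", "localhost:8983/solr/" + core))
    # safe="*" keeps the asterisks literal, matching q=*%3A* in the thread.
    return BASE + "/" + core + "/select?" + urlencode(params, safe="*")


if __name__ == "__main__":
    # Distributed search across the whole collection.
    print(select_url("coreForCustomer2"))
    # Search limited to the named core through the 'shards' parameter.
    print(select_url("coreForCustomer2", restrict_to_core=True))
```

In the second URL the colon and slashes of the 'shards' value are percent-encoded, which is an equivalent encoding of the value shown in the quoted message.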
