Hi Mark, Darren,

Thanks very much for your help. I will try a collection for each customer, then.
Regards,
Yandong

2012/5/22 Mark Miller <markrmil...@gmail.com>

> I think the key is this: you want to think of a SolrCore on a single-node Solr installation as a collection on a multi-node SolrCloud installation.
>
> So if you would use multiple SolrCores with a standard Solr setup, you should be using multiple collections in SolrCloud. If you were going to try to do everything in one SolrCore, that would be like putting everything in one collection in SolrCloud. I don't think it generally makes sense to try to work at the SolrCore level when working with SolrCloud. This will be made clearer once we add a simple collections API.
>
> So I think your choice should be similar to the one you would make on a single node: do you want to put everything in one 'collection' and use a filter to separate customers (with all its caveats and limitations), or do you want to use a collection per customer? You can always start up more clusters if you reach any limits.
>
> On May 22, 2012, at 10:08 AM, Darren Govoni wrote:
>
> > I'm curious what the SolrCloud experts say, but my suggestion is to try not to over-engineer the search architecture on SolrCloud. For example, what is the benefit of managing which cores are indexed and searched? Having to know those details, in my mind, works against the automation in SolrCloud, but maybe there's a good reason you want to do it this way.
> >
> > ------- Original Message -------
> > On 5/22/2012 07:35 AM Yandong Yao wrote:
> > Hi Darren,
> >
> > Thanks very much for your reply.
> >
> > The reason I want to control core indexing/searching is that I want to use one core to store each customer's data (all customers share the same config). For example, customer 1 uses coreForCustomer1 and customer 2 uses coreForCustomer2.
> >
> > Is there any better way than using a different core for each customer?
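Mark's two options differ in how a query is scoped to one customer. With a collection per customer, the collection name in the URL does the scoping. With one shared collection, every query must carry a filter. The following is a minimal Python sketch that only builds request URLs and does not contact a server. It assumes two hypothetical conventions that are not in the thread: collections named `customer<N>`, and a `customer_id` field stored on every document.

```python
from urllib.parse import urlencode

# Base URL from the thread's example setup.
SOLR_BASE = "http://localhost:8983/solr"


def per_collection_url(customer_id: int, q: str = "*:*") -> str:
    """Collection per customer: the collection name alone scopes the query.

    HYPOTHETICAL: the 'customer<N>' collection naming scheme.
    """
    collection = f"customer{customer_id}"
    return f"{SOLR_BASE}/{collection}/select?" + urlencode({"q": q, "wt": "xml"})


def shared_collection_url(customer_id: int, q: str = "*:*") -> str:
    """One shared collection: a filter query (fq) separates customers.

    HYPOTHETICAL: a 'customer_id' field present on every document.
    """
    params = {"q": q, "fq": f"customer_id:{customer_id}", "wt": "xml"}
    return f"{SOLR_BASE}/collection1/select?" + urlencode(params)


if __name__ == "__main__":
    print(per_collection_url(3))
    print(shared_collection_url(3))
```

The shared-collection variant only isolates customers if every request remembers to add the `fq`. That is one of the caveats Mark alludes to. The per-collection variant gets isolation from the URL itself. The thread leaves open how many collections a cluster can hold, which matters for Yandong's tens of thousands of customers.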
> >
> > Another way might be to use a different collection for each customer, but I'm not sure how many collections SolrCloud can support. Which way is better in terms of flexibility/scalability (supposing there are tens of thousands of customers)?
> >
> > Regards,
> > Yandong
> >
> > 2012/5/22 Darren Govoni <dar...@ontrenet.com>
> >
> > > Why do you want to control what gets indexed into a core and then have to know which core to search? That's the kind of "knowing" that SolrCloud solves. SolrCloud handles the distribution of documents across shards and retrieves them regardless of which node is searched from. That is the point of "cloud": you don't know the details of where exactly documents are being managed (i.e. they are cloudy), and that can change and rebalance from time to time. SolrCloud performs the distributed search for you, so when you search a node/core that holds no documents, all the results from the "cloud" are retrieved regardless. This is considered "A Good Thing".
> > >
> > > It requires a change in thinking about indexing and searching...
> > >
> > > On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
> > > > Hi Guys,
> > > >
> > > > I used the following commands to start SolrCloud, as described in the SolrCloud wiki.
> > <br>> > > > <br>> > yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf > > <br>> > -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar > start.jar > > <br>> > yydzero:example2 bjcoe$ java -Djetty.port=7574 > -DzkHost=localhost:9983 > > <br>> -jar > > <br>> > start.jar > > <br>> > > > <br>> > Then I have created several cores using CoreAdmin API ( > > <br>> > http://localhost:8983/solr/admin/cores?action=CREATE&name= > > <br>> > <coreName>&collection=collection1), and clusterstate.json show > following > > <br>> > topology: > > <br>> > > > <br>> > > > <br>> > collection1: > > <br>> > -- shard1: > > <br>> > -- collection1 > > <br>> > -- CoreForCustomer1 > > <br>> > -- CoreForCustomer3 > > <br>> > -- CoreForCustomer5 > > <br>> > -- shard2: > > <br>> > -- collection1 > > <br>> > -- CoreForCustomer2 > > <br>> > -- CoreForCustomer4 > > <br>> > > > <br>> > > > <br>> > 1) Index: > > <br>> > > > <br>> > Using following command to index mem.xml file in exampledocs > directory. > > <br>> > > > <br>> > yydzero:exampledocs bjcoe$ java -Durl= > > <br>> > http://localhost:8983/solr/coreForCustomer3/update -jar > post.jar mem.xml > > <br>> > SimplePostTool: version 1.4 > > <br>> > SimplePostTool: POSTing files to > > <br>> > http://localhost:8983/solr/coreForCustomer3/update.. > > <br>> > SimplePostTool: POSTing file mem.xml > > <br>> > SimplePostTool: COMMITting Solr index changes. > > <br>> > > > <br>> > And now SolrAdmin UI shows that 'coreForCustomer1', > 'coreForCustomer3', > > <br>> > 'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and > other 2 > > <br>> > core has 0 documents. > > <br>> > > > <br>> > *Question 1:* Is this expected behavior? How do I to index > documents > > <br>> into > > <br>> > a specific core? 
> > <br>> > > > <br>> > *Question 2*: If SolrCloud don't support this yet, how could I > extend it > > <br>> > to support this feature (index document to particular core), > where > > <br>> should i > > <br>> > start, the hashing algorithm? > > <br>> > > > <br>> > *Question 3*: Why the documents are also indexed into > 'coreForCustomer1' > > <br>> > and 'coreForCustomer5'? The default replica for documents are > 1, right? > > <br>> > > > <br>> > Then I try to index some document to 'coreForCustomer2': > > <br>> > > > <br>> > $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update-jar > > <br>> > post.jar ipod_video.xml > > <br>> > > > <br>> > While 'coreForCustomer2' still have 0 documents and documents in > > <br>> ipod_video > > <br>> > are indexed to core for customer 1/3/5. > > <br>> > > > <br>> > *Question 4*: Why this happens? > > <br>> > > > <br>> > 2) Search: I use " > > <br>> > > http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml" to > > <br>> > search against 'CoreForCustomer2', while it will return all > documents in > > <br>> > the whole collection even though this core has no documents at > all. > > <br>> > > > <br>> > Then I use " > > <br>> > > > <br>> > http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2 > > <br>> ", > > <br>> > and it will return 0 documents. > > <br>> > > > <br>> > *Question 5*: So If want to search against a particular core, we > need to > > <br>> > use 'shards' parameter and use solrCore name as parameter value, > right? > > <br>> > > > <br>> > > > <br>> > Thanks very much in advance! > > <br>> > > > <br>> > Regards, > > <br>> > Yandong > > <br>> > > <br>> > > <br>> > > <br> > > - Mark Miller > lucidimagination.com > > > > > > > > > > > >