Hi,

   I've gone through the mailing list archive and have read contradictory
remarks on this issue. Can someone please clear this up, as I'm not
able to run distributed search on multi-cores? Is there any document
on how to search across multiple cores that share the same schema? Here
are the various comments I've read on this mailing list:

1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781
Don't think you can search against multiple cores "automatically" -
i.e. got to make multiple queries, one for each core and combine
results yourself. Yes, this will slow things down.   - Otis

2) http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173
The idea behind multicore is that you will use them if you have completely
different type of documents (basically multiple schemas). - Shalin

3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229
That should work, yes, though it may not be a wise thing to do
performance-wise, if the number of CPU cores that solr server has is
lower than the number of Solr cores. - Otis

My only motivation for using multi-core is to keep the index size
within limits. All my cores use the same schema. My index grows to
over 30G within a day and I need to keep up to a year of data. I
couldn't find any other way of scaling with Solr. I've noticed that once
the index grows above 10G the indexing process starts slowing down, the
commit takes much longer and optimize is hard to finish. So, I'm
trying to create a new core after every 10 million documents (equal
to 10G in my case). I don't want to start a new Solr instance every 10G
- that won't scale over a year. I'm going to use 3-4 servers to
hold all these cores.
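
To make the layout concrete, here is a minimal Python sketch of how the
`shards` parameter for cores spread over a few servers could be assembled.
The host addresses and core names are made up for illustration, and the
helper name `build_distributed_url` is hypothetical. The entry format
(host:port/solr/core, comma separated, no scheme) is simply the one used in
the request quoted further down this thread, which is not yet working for me.

```python
from urllib.parse import urlencode

def build_distributed_url(front, shards, query):
    """Build a select URL that asks one core to fan the query out.

    front  -- 'host:port/solr/core' entry that receives the request
    shards -- list of 'host:port/solr/core' entries, one per core to search
    query  -- the q parameter
    """
    params = urlencode(
        {"shards": ",".join(shards), "q": query, "indent": "true"},
        safe=":/,",  # keep the shard list readable instead of percent-encoded
    )
    return "http://%s/select?%s" % (front, params)

# Hypothetical hosts and core names, for illustration only.
shards = ["10.0.0.1:8080/solr/20090407_2", "10.0.0.2:8080/solr/20090408_3"]
url = build_distributed_url(shards[0], shards, "japan")
print(url)
```

Every new core adds one more entry to the list, so the URL grows with the
number of cores being searched.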

Now if someone could please tell me whether this is the wrong scaling
architecture, I could rethink it. I want fast indexing and, at the same
time, fast enough search. If I have to search each core separately and
merge the results myself, the search performance is going to be awful.
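
For clarity, this is roughly what "merge myself" would mean on the client
side: a minimal Python sketch, assuming each core has already returned its
hits sorted by descending score. The (score, doc_id) tuples and the function
name `merge_top_n` are hypothetical. It also assumes scores from different
cores can be compared directly, which may not hold since each core scores
against its own index.

```python
import heapq
from itertools import islice

def merge_top_n(per_core_hits, n=10):
    """Merge per-core hit lists into one global top-n list.

    per_core_hits -- list of lists of (score, doc_id) tuples, each list
                     already sorted by score, highest first
    """
    merged = heapq.merge(*per_core_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, n))

# Made-up results from two cores, for illustration only.
core_a = [(2.1, "a1"), (1.4, "a2"), (0.3, "a3")]
core_b = [(1.9, "b1"), (0.8, "b2")]
top3 = merge_top_n([core_a, core_b], n=3)
print(top3)
```

The merge itself is cheap; the cost I'm worried about is issuing one query
per core and waiting for the slowest one.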

Is Solr the right tool for managing billions of records (I can get up
to 100 million records every day - with 1KB per record - 100GB of index
a day)? Most of the field values are pretty distinct (like 10 million
email addresses) so the index size would be huge too.
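
As a back-of-the-envelope check on those numbers, here is a small Python
sketch. It assumes the index is about as large as the raw data (1KB per
record), one core per 10GB, a full year of retention and 4 servers; the
variable names are only for illustration.

```python
RECORDS_PER_DAY = 100_000_000      # up to 100 million records a day
BYTES_PER_RECORD = 1_000           # roughly 1KB each (assumed index size per record)
CORE_LIMIT_BYTES = 10 * 10**9      # start a new core every 10GB
RETENTION_DAYS = 365               # keep up to a year of data
SERVERS = 4

daily_bytes = RECORDS_PER_DAY * BYTES_PER_RECORD        # 100GB a day
cores_per_day = daily_bytes // CORE_LIMIT_BYTES         # 10 new cores a day
total_cores = cores_per_day * RETENTION_DAYS            # 3650 cores in a year
total_tb = daily_bytes * RETENTION_DAYS / 10**12        # 36.5TB in a year
cores_per_server = total_cores / SERVERS                # 912.5 cores per box

print(cores_per_day, total_cores, total_tb, cores_per_server)
```

So at the peak rate a single distributed query over the whole year would have
to touch thousands of cores.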

I would think it's a common problem to scale a huge index while keeping
both indexing and search time acceptable. I'm not sure if this can be
managed on just 4 servers - we don't have 100s of boxes for this
project. Is there any other tool that might be more appropriate for this
kind of case - like Katta or Lucene on Hadoop, or simply using Lucene with
Parallel Search and partitioning the indexes by size?

Thanks,
-vivek

On Wed, Apr 8, 2009 at 11:07 AM, vivek sar <vivex...@gmail.com> wrote:
> Any help on this issue? Would distributed search on multi-core on the same
> Solr instance even work? Does it have to be different Solr instances
> altogether (separate shards)?
>
> I'm kind of stuck at this point right now. Keep getting one of the two
> errors (when running distributed search - single searches work fine)
> as mentioned in this thread earlier.
>
> Thanks,
> -vivek
>
> On Wed, Apr 8, 2009 at 1:57 AM, vivek sar <vivex...@gmail.com> wrote:
>> Thanks Fergus. I'm still having problems with multicore search.
>>
>> I tried the following with two cores (they both share the same schema
>> and solrconfig.xml) on the same box, on the same Solr instance:
>>
>> 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
>> cores in admin interface
>> 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in xml
>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
>> gives me top 10 records
>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
>> gives me top 10 records
>> 5) http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
>>  - this FAILS. I've seen two problems with this.
>>
>>    a) When the index is being committed I see,
>>
>> SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
>>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>        at java.lang.Thread.run(Thread.java:637)
>>
>>    b) Other times I see this,
>>
>> SEVERE: java.lang.NullPointerException
>>        at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
>>        at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
>>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>        at java.lang.Thread.run(Thread.java:637)
>>
>>
>> Any tips on how I can search across multiple cores on the same Solr instance?
>>
>> Thanks,
>> -vivek
>>
>> On Mon, Apr 6, 2009 at 2:40 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>> vivek,
>>>
>>> 404 from the URL you provided in the message! Similar URLs work
>>> OK for me.
>>>
>>> hmm try http://localhost:8080/solr/admin/cores?action=status and see
>>> if that gives a 404.
>>>
>>> Also are you running a nightly build or a svn checkout? Using tomcat?
>>> Perhaps it should be
>>>
>>> http://localhost:8080/apache-solr-1.4-dev/admin/cores?action=status
>>>
>>> Fergus.
>>>
>>>>Hi,
>>>>
>>>>  Any help on this. I've looked at DistributedSearch on Wiki, but that
>>>>doesn't seem to be working for me on multi-core and multiple Solr
>>>>instances on the same box.
>>>>
>>>>Scenario,
>>>>
>>>>1) Two boxes (localhost, 10.4.x.x)
>>>>2) Two Solr instances on each box (8080 and 8085 ports)
>>>>3) Two cores on each instance (core0, core1)
>>>>
>>>>I'm not sure how to construct my search on the above setup if I need
>>>>to search across all the cores on all the boxes. Here is what I'm
>>>>trying,
>>>>
>>>>http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan
>>>>
>>>>I get 404 error. Is this the right URL construction for my setup? How
>>>>else can I do this?
>>>>
>>>>Thanks,
>>>>-vivek
>>>>
>>>>On Fri, Apr 3, 2009 at 1:02 PM, vivek sar <vivex...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>>  I've a multi-core system (one core per day), so there would be around
>>>>> 30 cores in a month on a box running one Solr instance. We have two
>>>>> boxes running the Solr instance and input data is fed to them in
>>>>> round-robin fashion. Each box can have up to 30 cores in a month. Here
>>>>> are my questions,
>>>>>
>>>>>  1) How would I search for a term in multiple cores on same box?
>>>>>
>>>>>  Single core I'm able to search like,
>>>>>   http://localhost:8080/solr/20090402/select?q=*:*
>>>>>
>>>>> 2) How would I search for a term in multiple cores on both boxes at
>>>>> the same time?
>>>>>
>>>>> 3) Is it possible to have two Solr instances on one box, with one doing
>>>>> the indexing and the other performing only searches on that index? The
>>>>> idea is to have two JVMs, each doing its own task - I'm not sure whether
>>>>> the indexer process needs to know about the searcher process - like do
>>>>> they need to have the same solr.xml (for multicore etc). We also don't
>>>>> want to replicate the indexes (we get very light search traffic, but
>>>>> very high indexing traffic) so they need to use the same index.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> -vivek
>>>>>
>>>
>>> --
>>>
>>> ===============================================================
>>> Fergus McMenemie               Email:fer...@twig.me.uk
>>> Techmore Ltd                   Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets             Analyst Programmer
>>> ===============================================================
>>>
>>
>
