Increasing the polling interval does help. But the requirement is to get a document indexed and searchable almost instantly (sounds like real-time search); 30 sec is acceptable. I need to look at Solr NRT and SolrCloud.
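For reference, the slave-side polling interval being discussed lives in the replication handler section of solrconfig.xml (Solr 3.x style). A sketch with illustrative values — the host name and interval here are placeholders, not recommendations:

```xml
<!-- Slave-side replication config in solrconfig.xml (Solr 3.x style).
     masterUrl and pollInterval values are illustrative only. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <!-- HH:mm:ss — e.g. poll every 10 minutes instead of every few seconds -->
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```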
I created a new core to accept daily updates and replicate every 10 sec. Two other cores with 234 million documents are configured to replicate only once a day. I am feeding all three cores, but the two big cores are not replicating. While searching I am running group.field on my unique id and taking the most updated one. Right now it looks fine. Every day I am going to delete the last day's records from the daily-update core.

I am planning to use rsync for replication; it will be Fusion-io to Fusion-io, so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST calls. That is really fast; we can feed more than 15 million documents a day to two cores easily. I am using autocommit = 5 sec in solrconfig.

I could not figure out how I was able to achieve those numbers in my test environment; all configuration was the same, except I had a lot less memory in test! I am trying to find out what I am missing in the other configuration. My SLES kernel version is different in production (it's a 3.0.*, test was 2.6.*), but I do not think that can cause a problem.

Thank you again,
Mou

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] <ml-node+s472066n3995861...@n3.nabble.com> wrote:

> Replication will indeed be incremental. But if you commit too often (and
> committing too often is a common mistake) then the merging will
> eventually merge everything into new segments and the whole thing will
> be replicated.
>
> Additionally, optimizing (or forceMerge in 4.x) will make a single segment
> and force the entire index to replicate.
>
> You should emphatically _not_ have to have two cores. Solr is built to
> handle replication etc. I suspect you're committing too often or some
> other mis-configuration and you're creating a problem for yourself.
>
> Here's what I'd do:
> 1> increase the polling interval to, say, 10 minutes (or however long
> you can live with stale data) on the slave.
> 2> decrease the commits you're doing.
> This could involve the autocommit options you might have set in
> solrconfig.xml. It could be your client (don't know how you're
> indexing, SolrJ?) and the commitWithin parameter. Could be you're
> optimizing (if you are, stop it!).
>
> Note that ramBufferSizeMB has no influence on how often things are
> _committed_. When this limit is exceeded, the accumulated indexing
> data is written to the currently-open segment. Multiple flushes can go
> to the _same_ segment. The write-once nature of segments means that
> after a segment is closed (through a commit), it is not changed. But a
> segment that is not closed may be written to multiple times until it's
> closed.
>
> HTH
> Erick
>
> On Wed, Jul 18, 2012 at 1:25 PM, Mou <[hidden email]> wrote:
>
> > Hi Eric,
> >
> > I totally agree. That's what I also figured ultimately. One thing I am
> > not clear on: the replication is supposed to be incremental? But it
> > looks like it is trying to replicate the whole index. Maybe I am
> > changing the index so frequently that it is triggering auto merge and
> > a full replication? Am I thinking in the right direction?
> >
> > I see that when I start the Solr search instance before I start feeding
> > the Solr index, my searches are fine BUT it is using the old searcher,
> > so I am not seeing the updates in the result.
> >
> > So now I am trying to change my architecture. I am going to have a core
> > dedicated to receiving daily updates, which is going to be 5 million
> > docs, and its size is going to be a little less than 5 GB, which is
> > small, so replication will be faster?
> >
> > I will search both cores, i.e. the old data and the daily updates, and
> > do a field collapsing on my unique id so that I do not return duplicate
> > results. I haven't tried grouping results, so I am not sure about the
> > performance. Any suggestion?
> >
> > Eventually I have to use Solr trunk like you suggested.
> > Thank you for your help,
> >
> > On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene]
> > <[hidden email]> wrote:
> >
> >> bq: This index is only used for searching and being replicated every
> >> 7 sec from the master.
> >>
> >> This is a red flag. 7-second replication times are likely forcing your
> >> app to spend all its time opening new searchers. Your cached filter
> >> queries are likely rarely being re-used because they're being thrown
> >> away every 7 seconds. This assumes you're changing your master index
> >> frequently.
> >>
> >> If you need near real time, consider Solr trunk and SolrCloud, but
> >> trying to simulate NRT with very short replication intervals is
> >> usually a bad idea.
> >>
> >> A quick test would be to disable replication for a bit (or lengthen it
> >> to, say, 10 minutes).
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi <[hidden email]> wrote:
> >>
> >> >> FWIW, when asked at what point one would want to split JVMs and
> >> >> shard, on the same machine, Grant Ingersoll mentioned 16GB, and
> >> >> precisely for GC cost reasons. You're way above that.
> >> >
> >> > - his index is 75G, and Grant mentioned RAM heap size; we can use
> >> > terabytes of index with 16GB memory.
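The field-collapsing idea in the thread — search both the big core and the daily-update core, and keep only the most recently updated document per unique id — can be sketched client-side as follows. This is an illustration of the merge logic, not Solr's own grouping implementation; the field names `id` and `last_modified` are assumptions:

```python
# Client-side sketch of the field-collapsing idea: merge results from
# two cores and keep only the newest document per unique id.
# Field names ("id", "last_modified") are assumptions for illustration.

def collapse_by_id(results):
    """Keep the most recently updated doc for each unique id."""
    newest = {}
    for doc in results:
        doc_id = doc["id"]
        current = newest.get(doc_id)
        if current is None or doc["last_modified"] > current["last_modified"]:
            newest[doc_id] = doc
    return list(newest.values())

# Example: a doc re-fed into the daily core shadows its older copy.
main_core = [{"id": "a", "last_modified": 1, "text": "old"},
             {"id": "b", "last_modified": 1, "text": "unchanged"}]
daily_core = [{"id": "a", "last_modified": 2, "text": "new"}]

merged = collapse_by_id(main_core + daily_core)
```

This mirrors what `group.field=<uniqueKey>` with a sort on the update timestamp does server-side, just made explicit for clarity.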
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995873.html
Sent from the Solr - User mailing list archive at Nabble.com.
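For completeness, the "autocommit = 5 sec" setting discussed in the thread corresponds to the autoCommit block in solrconfig.xml (Solr 3.x). A sketch with illustrative values — lengthening maxTime is effectively what Erick's "decrease the commits" advice amounts to:

```xml
<!-- autoCommit in solrconfig.xml (Solr 3.x). Values are illustrative:
     committing every 5 s causes constant segment churn; a longer
     interval reduces merge and replication pressure. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- ms; e.g. once a minute, not every 5 s -->
    <maxDocs>50000</maxDocs>  <!-- or after this many docs, whichever first -->
  </autoCommit>
</updateHandler>
```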