Hi Tomás,

This is just a test environment, meant only to reproduce the issue I am currently investigating. The number of documents is expected to grow substantially (to billions of docs).
On Sun, Nov 17, 2013 at 7:12 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> Hi Yuval, quick question. You say that your core has 750k docs and around
> 400mb? Is this some kind of test dataset and you expect it to grow
> significantly? For an index of this size, I wouldn't use distributed
> search; a single shard should be fine.
>
> Tomás
>
> On Sun, Nov 17, 2013 at 6:50 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
>
> > Hi,
> >
> > I isolated the case:
> >
> > Installed on a new machine (2 x Xeon E5410 @ 2.33GHz).
> > The environment has 12GB of memory.
> > I assigned 6GB of memory to Solr, and I'm not running any other
> > memory-consuming process, so no memory issues should arise.
> >
> > Removed all indexes apart from two:
> > emptyCore – empty – used for routing
> > core1 – holds the stored data – has ~750,000 docs and a size of 400MB
> >
> > Again, this is a single machine that holds both indexes.
> >
> > The distributed query
> > http://localhost:8210/solr/emptyCore/select?rows=5000&q=*:*&shards=127.0.0.1:8210/solr/core1&wt=json
> > takes ~3 seconds (QTime),
> > while the direct query
> > http://localhost:8210/solr/core1/select?rows=5000&q=*:*&wt=json
> > takes ~15 ms (QTime) – a difference of two orders of magnitude.
> >
> > I ran the long query several times and got an improvement of about a
> > second (33%), but that's it.
> >
> > I need to better understand why this is happening.
> >
> > I tried looking at the Solr code and debugging the issue, but with no
> > success.
> >
> > The one thing I did notice is that the getFirstMatch method – which
> > receives the doc id, searches the term dictionary, and returns the
> > internal id – takes most of the time, for some reason.
> >
> > I am pretty stuck and would appreciate any ideas.
> >
> > My only solution for the moment is to bypass the distributed query and
> > implement code in my own app that directly queries the relevant cores
> > and handles the sorting etc.
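The bypass Yuval describes — querying each core directly and handling the sorting in the client — essentially needs a k-way merge of the per-core result lists. A minimal sketch of just that merge step (the per-core fetch is assumed to have happened already, and `ScoredDoc` is a hypothetical holder type, not a SolrJ class):

```java
import java.util.*;

// Sketch of the merge step for a client-side "distributed" query:
// each core was queried directly (not shown) and returned a list of
// hits already sorted by descending score; merge them into one top-N.
public class ShardMerge {

    // Hypothetical holder for one hit returned by a core.
    static final class ScoredDoc {
        final String id;
        final float score;
        ScoredDoc(String id, float score) { this.id = id; this.score = score; }
    }

    // K-way merge by descending score, keeping the top `rows` hits.
    static List<ScoredDoc> merge(List<List<ScoredDoc>> perCore, int rows) {
        // heap entry: {coreIndex, positionWithinThatCore's list}
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            Float.compare(perCore.get(b[0]).get(b[1]).score,
                          perCore.get(a[0]).get(a[1]).score));
        for (int c = 0; c < perCore.size(); c++) {
            if (!perCore.get(c).isEmpty()) heap.add(new int[]{c, 0});
        }
        List<ScoredDoc> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < rows) {
            int[] top = heap.poll();
            out.add(perCore.get(top[0]).get(top[1]));
            if (top[1] + 1 < perCore.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoredDoc> core1 = Arrays.asList(
            new ScoredDoc("a", 0.9f), new ScoredDoc("c", 0.4f));
        List<ScoredDoc> core2 = Arrays.asList(
            new ScoredDoc("b", 0.7f), new ScoredDoc("d", 0.2f));
        List<ScoredDoc> merged = merge(Arrays.asList(core1, core2), 3);
        StringBuilder ids = new StringBuilder();
        for (ScoredDoc d : merged) ids.append(d.id);
        System.out.println(ids); // abc
    }
}
```

Each returned document carries its own fields, so this avoids the second distributed round trip entirely — at the cost of re-implementing sorting (and, here, grouping) in the application.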
> > Thanks
> >
> > On Sat, Nov 16, 2013 at 2:39 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> >
> > > Did you say what the memory profile of your machine is? How much
> > > memory, and how large are the shards? This is just a random guess, but
> > > it might be that if you are memory-constrained, there is a lot of
> > > thrashing caused by paging (swapping?) in and out the sharded indexes,
> > > while a single index can be scanned linearly, even if it does need to
> > > be paged in.
> > >
> > > -Mike
> > >
> > > On 11/14/2013 8:10 AM, Elran Dvir wrote:
> > >
> > >> Hi,
> > >>
> > >> We tried returning just the id field and got exactly the same
> > >> performance.
> > >> Our system is distributed, but all shards are on a single machine, so
> > >> network issues are not a factor.
> > >> The code we found where Solr is spending its time is on the shard and
> > >> not on the routing core; again, all shards are local.
> > >> We investigated the getFirstMatch() method and noticed that
> > >> MultiTermsEnum.reset (inside MultiTerms.iterator) and
> > >> MultiTermsEnum.seekExact take 99% of the time.
> > >> Inside these methods, the call to
> > >> BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock
> > >> takes most of the time.
> > >> Out of the 7-second run these methods take ~5 seconds, and
> > >> BinaryResponseWriter.write takes the rest (~2 seconds).
> > >>
> > >> We tried increasing cache sizes and got hits, but it only improved the
> > >> query time by a second (~6 seconds), so no major effect.
> > >> We are not indexing during our tests; the performance is similar.
> > >> (How do we measure doc size? Is it important, given that the
> > >> performance is the same when returning only the id field?)
> > >>
> > >> We still don't completely understand why the query takes this much
> > >> longer although the cores are on the same machine.
> > >>
> > >> Is there a way to improve the performance (code, configuration, query)?
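The cache sizes mentioned above are set per core in solrconfig.xml, inside the `<query>` section. A minimal illustrative fragment — the class names are standard Solr 4.x, but the sizes are placeholders, not recommendations:

```xml
<!-- solrconfig.xml, inside <query> ... </query> -->
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
```

Note that the documentCache cannot be autowarmed, since internal docids change between searchers; hit ratios for all caches are visible in the admin UI under Plugins/Stats > Cache.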
> > >>
> > >> -----Original Message-----
> > >> From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of Manuel Le Normand
> > >> Sent: Thursday, November 14, 2013 1:30 AM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: distributed search is significantly slower than direct search
> > >>
> > >> It's surprising such a query takes a long time. I would assume that
> > >> after trying q=*:* consistently you should be getting cache hits and
> > >> times should be faster. Try to see in the admin UI how your query/doc
> > >> caches perform.
> > >> Moreover, the query in itself just asks for the first 5000 docs that
> > >> were indexed (returning the first [docid]s), so it seems all this time
> > >> is wasted on transfer. Out of these 7 secs, how much is spent in the
> > >> above method? What do you return by default? How big is every doc you
> > >> display in your results?
> > >> It might also be that both collections are competing for the same
> > >> resources. Try elaborating on your use case.
> > >>
> > >> Anyway, it seems like you just made a test to see what the performance
> > >> hit would be in a distributed environment, so I'll try to explain some
> > >> things we encountered in our benchmarks, with a case that at least
> > >> matches the number of docs fetched.
> > >>
> > >> We reclaim 2000 docs on every query, running over 40 shards. This
> > >> means every shard is actually transferring 2000 docs to our frontend
> > >> on every document-match request (the first phase you were referring
> > >> to). Even if lazily loaded, reading 2000 ids (on 40 servers) and
> > >> lazy-loading the fields is a tough job. Waiting for the slowest shard
> > >> to respond, then sorting the docs and reloading (lazily or not) the
> > >> top 2000 docs can take a long time.
> > >>
> > >> Our times are 4-8 secs, but it's not possible to compare the cases
> > >> directly. We've taken a few steps that improved it along the way,
> > >> steps that led to others.
> > >> These were our starters:
> > >>
> > >> 1. Profile these queries from different servers and Solr instances;
> > >>    try putting your finger on which collection is working hard and
> > >>    why. Check if you're stuck on components that have no added value
> > >>    for you but are used by default.
> > >> 2. Consider eliminating the doc cache. It loads lots of (partly lazy)
> > >>    documents whose probability of secondary usage is low. There's no
> > >>    such thing as "popular docs" when requesting so many docs. You may
> > >>    be using your memory in a better way.
> > >> 3. Bottleneck check – inner-server metrics such as CPU user / iowait,
> > >>    packets transferred over the network, page faults etc. are
> > >>    excellent for understanding whether the disk/network/CPU is slowing
> > >>    you down. Then upgrade the hardware in one of the shards to check
> > >>    if it helps, comparing the upgraded shard's qTime to the others'.
> > >> 4. Warm up the index after committing – benchmark how queries perform
> > >>    before and after some warm-up, say a few hundred queries (from your
> > >>    previous system), in order to warm up the OS cache (assuming you're
> > >>    using NRTCachingDirectoryFactory).
> > >>
> > >> Good luck,
> > >> Manu
> > >>
> > >> On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >>
> > >>> One thing you can try, and this is more diagnostic than a cure, is to
> > >>> return just the id field (and ensure that lazy field loading is true).
> > >>> That'll tell you whether the issue is actually fetching the documents
> > >>> off disk and decompressing, although frankly that's unlikely, since
> > >>> you can get your 5,000 rows from a single machine quickly.
> > >>>
> > >>> The code you found where Solr is spending its time — is that on the
> > >>> "routing" core or on the shards?
> > >>> I actually have a hard time
> > >>> understanding how that code could take a long time; it doesn't seem
> > >>> right.
> > >>>
> > >>> You are transferring 5,000 docs across the network, so it's possible
> > >>> that your network is just slow. That's certainly a difference between
> > >>> the local and remote cases, but that's a stab in the dark.
> > >>>
> > >>> Not much help, I know.
> > >>> Erick
> > >>>
> > >>> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <elr...@checkpoint.com> wrote:
> > >>>
> > >>>> Erick, thanks for your response.
> > >>>>
> > >>>> We are upgrading our system using Solr.
> > >>>> We need to preserve old functionality. Our client displays 5K
> > >>>> documents and groups them.
> > >>>>
> > >>>> Is there a way to refactor code in order to improve distributed
> > >>>> document fetching?
> > >>>>
> > >>>> Thanks.
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
> > >>>> Sent: Wednesday, October 30, 2013 3:17 AM
> > >>>> To: solr-user@lucene.apache.org
> > >>>> Subject: Re: distributed search is significantly slower than direct search
> > >>>>
> > >>>> You can't. There will inevitably be some overhead in the distributed
> > >>>> case. That said, 7 seconds is quite long.
> > >>>>
> > >>>> 5,000 rows is excessive, and probably where your issue is. You're
> > >>>> having to go out and fetch the docs across the wire. Perhaps there
> > >>>> is some batching that could be done there; I don't know whether this
> > >>>> is one document per request or not.
> > >>>>
> > >>>> Why 5K docs?
> > >>>>
> > >>>> Best,
> > >>>> Erick
> > >>>>
> > >>>> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <elr...@checkpoint.com> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I am using Solr 4.4 with multiple cores. One core (called template)
> > >>>>> is my "routing" core.
> > >>>>>
> > >>>>> When I run
> > >>>>> http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
> > >>>>> it consistently takes about 7s.
> > >>>>> When I run
> > >>>>> http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*,
> > >>>>> it consistently takes about 40ms.
> > >>>>>
> > >>>>> I profiled the distributed query.
> > >>>>> This is the distributed query process (I hope the terms are accurate):
> > >>>>> when Solr identifies a distributed query, it sends the query to the
> > >>>>> shard and gets the matching shard docs.
> > >>>>> Then it sends another query to the shard to get the Solr documents.
> > >>>>> Most time is spent in the last stage, in the process method of
> > >>>>> QueryComponent, in:
> > >>>>>
> > >>>>> for (int i = 0; i < idArr.size(); i++) {
> > >>>>>   int id = req.getSearcher().getFirstMatch(
> > >>>>>       new Term(idField.getName(),
> > >>>>>                idField.getType().toInternal(idArr.get(i))));
> > >>>>>
> > >>>>> How can I make my distributed query as fast as the direct one?
> > >>>>>
> > >>>>> Thanks.
> > >>>>
> > >>>> Email secured by Check Point
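To illustrate the shape of that second phase (a simplification, not Solr's actual code): the shard returns a list of unique-key values, and the loop above resolves each one back to an internal docid with a separate seek into the sorted term dictionary, so rows=5000 means 5,000 independent seeks — roughly O(rows · log terms) — which matches the time reported in loadBlock:

```java
import java.util.*;

// Simplified illustration of the second distributed phase: each unique
// key in idArr is resolved to an internal docid with a separate lookup
// into a sorted dictionary (standing in for Lucene's term dictionary).
public class IdResolution {

    static int[] resolve(NavigableMap<String, Integer> termDict, List<String> ids) {
        int[] internal = new int[ids.size()];
        for (int i = 0; i < ids.size(); i++) {
            // one dictionary seek per returned document, as in the
            // getFirstMatch loop quoted above
            Integer docid = termDict.get(ids.get(i));
            internal[i] = (docid == null) ? -1 : docid;
        }
        return internal;
    }

    public static void main(String[] args) {
        // toy "term dictionary": unique key -> internal docid
        NavigableMap<String, Integer> dict = new TreeMap<>();
        for (int d = 0; d < 750_000; d++) {
            dict.put(String.format("doc%07d", d), d);
        }
        List<String> returned = Arrays.asList("doc0000005", "doc0499999", "doc0000042");
        System.out.println(Arrays.toString(resolve(dict, returned))); // [5, 499999, 42]
    }
}
```

In Lucene the per-seek cost is dominated by loading and decoding term-dictionary blocks rather than pure comparisons, which is why cache warming helps only modestly here.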