Hi Tomás,

This is just a test environment, meant only to reproduce the issue I am currently investigating. The number of documents is expected to grow substantially (to billions of docs).
On Sun, Nov 17, 2013 at 7:12 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> Hi Yuval, quick question. You say that your core has 750k docs and around
> 400mb? Is this some kind of test dataset and you expect it to grow
> significantly? For an index of this size, I wouldn't use distributed
> search; a single shard should be fine.
>
> Tomás
>
> On Sun, Nov 17, 2013 at 6:50 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
>
> > Hi,
> >
> > I isolated the case:
> >
> > Installed on a new machine (2 x Xeon E5410 @ 2.33GHz).
> > The environment has 12GB of memory.
> > I assigned 6GB of memory to Solr, and I'm not running any other
> > memory-consuming process, so no memory issues should arise.
> >
> > Removed all indexes apart from two:
> > emptyCore – empty – used for routing
> > core1 – holds the stored data – has ~750,000 docs and a size of 400MB
> >
> > Again, this is a single machine that holds both indexes.
> >
> > The distributed query
> > http://localhost:8210/solr/emptyCore/select?rows=5000&q=*:*&shards=127.0.0.1:8210/solr/core1&wt=json
> > takes ~3 seconds (QTime),
> > while the direct query
> > http://localhost:8210/solr/core1/select?rows=5000&q=*:*&wt=json
> > takes ~15 ms (QTime) – a difference of two orders of magnitude.
> >
> > I ran the long query several times and got an improvement of about a
> > second (33%), but that's it.
> >
> > I need to better understand why this is happening.
> >
> > I tried looking at the Solr code and debugging the issue, but with no
> > success.
> >
> > The one thing I did notice is that the getFirstMatch method – which
> > receives the doc id, searches the term dictionary, and returns the
> > internal id – takes most of the time, for some reason.
> >
> > I am pretty stuck and would appreciate any ideas.
> >
> > My only solution for the moment is to bypass the distributed query and
> > implement code in my own app that directly queries the relevant cores
> > and handles the sorting etc.
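The bypass Yuval describes — querying each core directly and handling the sorting in the client — essentially needs a k-way merge of the per-core result lists. A minimal sketch of just that merge step (the per-core fetch is assumed to have happened already, and `ScoredDoc` is a hypothetical holder type, not a SolrJ class):

```java
import java.util.*;

// Sketch of the merge step for a client-side "distributed" query:
// each core was queried directly (not shown) and returned a list of
// hits already sorted by descending score; merge them into one top-N.
public class ShardMerge {

    // Hypothetical holder for one hit returned by a core.
    static final class ScoredDoc {
        final String id;
        final float score;
        ScoredDoc(String id, float score) { this.id = id; this.score = score; }
    }

    // K-way merge by descending score, keeping the top `rows` hits.
    static List<ScoredDoc> merge(List<List<ScoredDoc>> perCore, int rows) {
        // heap entry: {coreIndex, positionWithinThatCore's list}
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            Float.compare(perCore.get(b[0]).get(b[1]).score,
                          perCore.get(a[0]).get(a[1]).score));
        for (int c = 0; c < perCore.size(); c++) {
            if (!perCore.get(c).isEmpty()) heap.add(new int[]{c, 0});
        }
        List<ScoredDoc> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < rows) {
            int[] top = heap.poll();
            out.add(perCore.get(top[0]).get(top[1]));
            if (top[1] + 1 < perCore.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoredDoc> core1 = Arrays.asList(
            new ScoredDoc("a", 0.9f), new ScoredDoc("c", 0.4f));
        List<ScoredDoc> core2 = Arrays.asList(
            new ScoredDoc("b", 0.7f), new ScoredDoc("d", 0.2f));
        List<ScoredDoc> merged = merge(Arrays.asList(core1, core2), 3);
        StringBuilder ids = new StringBuilder();
        for (ScoredDoc d : merged) ids.append(d.id);
        System.out.println(ids); // abc
    }
}
```

Each returned document carries its own fields, so this avoids the second distributed round trip entirely — at the cost of re-implementing sorting (and, here, grouping) in the application.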
> > Thanks
> >
> > On Sat, Nov 16, 2013 at 2:39 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> >
> > > Did you say what the memory profile of your machine is? How much
> > > memory, and how large are the shards? This is just a random guess, but
> > > it might be that if you are memory-constrained, there is a lot of
> > > thrashing caused by paging (swapping?) in and out the sharded indexes,
> > > while a single index can be scanned linearly, even if it does need to
> > > be paged in.
> > >
> > > -Mike
> > >
> > > On 11/14/2013 8:10 AM, Elran Dvir wrote:
> > >
> > >> Hi,
> > >>
> > >> We tried returning just the id field and got exactly the same
> > >> performance.
> > >> Our system is distributed, but all shards are on a single machine, so
> > >> network issues are not a factor.
> > >> The code we found where Solr is spending its time is on the shard and
> > >> not on the routing core; again, all shards are local.
> > >> We investigated the getFirstMatch() method and noticed that
> > >> MultiTermsEnum.reset (inside MultiTerms.iterator) and
> > >> MultiTermsEnum.seekExact take 99% of the time.
> > >> Inside these methods, the call to
> > >> BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock
> > >> takes most of the time.
> > >> Out of the 7-second run these methods take ~5 seconds, and
> > >> BinaryResponseWriter.write takes the rest (~2 seconds).
> > >>
> > >> We tried increasing cache sizes and got hits, but it only improved the
> > >> query time by a second (~6 seconds), so no major effect.
> > >> We are not indexing during our tests; the performance is similar.
> > >> (How do we measure doc size? Is it important, given that the
> > >> performance is the same when returning only the id field?)
> > >>
> > >> We still don't completely understand why the query takes this much
> > >> longer although the cores are on the same machine.
> > >>
> > >> Is there a way to improve the performance (code, configuration, query)?
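The cache sizes mentioned above are set per core in solrconfig.xml, inside the `<query>` section. A minimal illustrative fragment — the class names are standard Solr 4.x, but the sizes are placeholders, not recommendations:

```xml
<!-- solrconfig.xml, inside <query> ... </query> -->
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
```

Note that the documentCache cannot be autowarmed, since internal docids change between searchers; hit ratios for all caches are visible in the admin UI under Plugins/Stats > Cache.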
> > >>
> > >> -----Original Message-----
> > >> From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of Manuel Le Normand
> > >> Sent: Thursday, November 14, 2013 1:30 AM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: distributed search is significantly slower than direct search
> > >>
> > >> It's surprising such a query takes a long time. I would assume that
> > >> after trying q=*:* consistently you should be getting cache hits and
> > >> times should be faster. Try to see in the admin UI how your query/doc
> > >> caches perform.
> > >> Moreover, the query in itself just asks for the first 5000 docs that
> > >> were indexed (returning the first [docid]s), so it seems all this time
> > >> is wasted on transfer. Out of these 7 secs, how much is spent in the
> > >> above method? What do you return by default? How big is every doc you
> > >> display in your results?
> > >> It might also be that both collections are competing for the same
> > >> resources. Try elaborating on your use case.
> > >>
> > >> Anyway, it seems like you just made a test to see what the performance
> > >> hit would be in a distributed environment, so I'll try to explain some
> > >> things we encountered in our benchmarks, with a case that at least
> > >> matches the number of docs fetched.
> > >>
> > >> We reclaim 2000 docs on every query, running over 40 shards. This
> > >> means every shard is actually transferring 2000 docs to our frontend
> > >> on every document-match request (the first phase you were referring
> > >> to). Even if lazily loaded, reading 2000 ids (on 40 servers) and
> > >> lazy-loading the fields is a tough job. Waiting for the slowest shard
> > >> to respond, then sorting the docs and reloading (lazily or not) the
> > >> top 2000 docs can take a long time.
> > >>
> > >> Our times are 4-8 secs, but it's not possible to compare the cases
> > >> directly. We've taken a few steps that improved it along the way,
> > >> steps that led to others.
> > >> These were our starters:
> > >>
> > >> 1. Profile these queries from different servers and Solr instances;
> > >>    try putting your finger on which collection is working hard and
> > >>    why. Check if you're stuck on components that have no added value
> > >>    for you but are used by default.
> > >> 2. Consider eliminating the doc cache. It loads lots of (partly lazy)
> > >>    documents whose probability of secondary usage is low. There's no
> > >>    such thing as "popular docs" when requesting so many docs. You may
> > >>    be using your memory in a better way.
> > >> 3. Bottleneck check – inner-server metrics such as CPU user / iowait,
> > >>    packets transferred over the network, page faults etc. are
> > >>    excellent for understanding whether the disk/network/CPU is slowing
> > >>    you down. Then upgrade the hardware in one of the shards to check
> > >>    if it helps, comparing the upgraded shard's qTime to the others'.
> > >> 4. Warm up the index after committing – benchmark how queries perform
> > >>    before and after some warm-up, say a few hundred queries (from your
> > >>    previous system), in order to warm up the OS cache (assuming you're
> > >>    using NRTCachingDirectoryFactory).
> > >>
> > >> Good luck,
> > >> Manu
> > >>
> > >> On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >>
> > >>> One thing you can try, and this is more diagnostic than a cure, is to
> > >>> return just the id field (and ensure that lazy field loading is true).
> > >>> That'll tell you whether the issue is actually fetching the documents
> > >>> off disk and decompressing, although frankly that's unlikely, since
> > >>> you can get your 5,000 rows from a single machine quickly.
> > >>>
> > >>> The code you found where Solr is spending its time — is that on the
> > >>> "routing" core or on the shards?
> > >>> I actually have a hard time
> > >>> understanding how that code could take a long time; it doesn't seem
> > >>> right.
> > >>>
> > >>> You are transferring 5,000 docs across the network, so it's possible
> > >>> that your network is just slow. That's certainly a difference between
> > >>> the local and remote cases, but that's a stab in the dark.
> > >>>
> > >>> Not much help, I know.
> > >>> Erick
> > >>>
> > >>> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <elr...@checkpoint.com> wrote:
> > >>>
> > >>>> Erick, thanks for your response.
> > >>>>
> > >>>> We are upgrading our system using Solr.
> > >>>> We need to preserve old functionality. Our client displays 5K
> > >>>> documents and groups them.
> > >>>>
> > >>>> Is there a way to refactor code in order to improve distributed
> > >>>> document fetching?
> > >>>>
> > >>>> Thanks.
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
> > >>>> Sent: Wednesday, October 30, 2013 3:17 AM
> > >>>> To: solr-user@lucene.apache.org
> > >>>> Subject: Re: distributed search is significantly slower than direct search
> > >>>>
> > >>>> You can't. There will inevitably be some overhead in the distributed
> > >>>> case. That said, 7 seconds is quite long.
> > >>>>
> > >>>> 5,000 rows is excessive, and probably where your issue is. You're
> > >>>> having to go out and fetch the docs across the wire. Perhaps there
> > >>>> is some batching that could be done there; I don't know whether this
> > >>>> is one document per request or not.
> > >>>>
> > >>>> Why 5K docs?
> > >>>>
> > >>>> Best,
> > >>>> Erick
> > >>>>
> > >>>> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <elr...@checkpoint.com> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I am using Solr 4.4 with multiple cores. One core (called template)
> > >>>>> is my "routing" core.
> > >>>>>
> > >>>>> When I run
> > >>>>> http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
> > >>>>> it consistently takes about 7s.
> > >>>>> When I run
> > >>>>> http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*,
> > >>>>> it consistently takes about 40ms.
> > >>>>>
> > >>>>> I profiled the distributed query.
> > >>>>> This is the distributed query process (I hope the terms are accurate):
> > >>>>> when Solr identifies a distributed query, it sends the query to the
> > >>>>> shard and gets the matching shard docs.
> > >>>>> Then it sends another query to the shard to get the Solr documents.
> > >>>>> Most time is spent in the last stage, in the process method of
> > >>>>> QueryComponent, in:
> > >>>>>
> > >>>>> for (int i = 0; i < idArr.size(); i++) {
> > >>>>>   int id = req.getSearcher().getFirstMatch(
> > >>>>>       new Term(idField.getName(),
> > >>>>>                idField.getType().toInternal(idArr.get(i))));
> > >>>>>
> > >>>>> How can I make my distributed query as fast as the direct one?
> > >>>>>
> > >>>>> Thanks.
> > >>>>
> > >>>> Email secured by Check Point
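To illustrate the shape of that second phase (a simplification, not Solr's actual code): the shard returns a list of unique-key values, and the loop above resolves each one back to an internal docid with a separate seek into the sorted term dictionary, so rows=5000 means 5,000 independent seeks — roughly O(rows · log terms) — which matches the time reported in loadBlock:

```java
import java.util.*;

// Simplified illustration of the second distributed phase: each unique
// key in idArr is resolved to an internal docid with a separate lookup
// into a sorted dictionary (standing in for Lucene's term dictionary).
public class IdResolution {

    static int[] resolve(NavigableMap<String, Integer> termDict, List<String> ids) {
        int[] internal = new int[ids.size()];
        for (int i = 0; i < ids.size(); i++) {
            // one dictionary seek per returned document, as in the
            // getFirstMatch loop quoted above
            Integer docid = termDict.get(ids.get(i));
            internal[i] = (docid == null) ? -1 : docid;
        }
        return internal;
    }

    public static void main(String[] args) {
        // toy "term dictionary": unique key -> internal docid
        NavigableMap<String, Integer> dict = new TreeMap<>();
        for (int d = 0; d < 750_000; d++) {
            dict.put(String.format("doc%07d", d), d);
        }
        List<String> returned = Arrays.asList("doc0000005", "doc0499999", "doc0000042");
        System.out.println(Arrays.toString(resolve(dict, returned))); // [5, 499999, 42]
    }
}
```

In Lucene the per-seek cost is dominated by loading and decoding term-dictionary blocks rather than pure comparisons, which is why cache warming helps only modestly here.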