Answers inline below.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, November 17, 2012 6:40 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Hmmm, first an aside. If by "commit after every batch of documents" you mean after every call to server.add(doclist), there's no real need to do that unless you're striving for really low latency. The usual recommendation is to use commitWithin when adding and commit only at the very end of the run. This shouldn't actually be germane to your issue, just an FYI.

DB> Good point. The code for committing docs to Solr is fairly old. I will update it since I don't have a latency requirement.

So you're saying that the inconsistency is permanent? By that I mean it keeps coming back inconsistently for minutes/hours/days?

DB> Yes, it is permanent. I have collections that have been up for weeks and are still returning inconsistent results, and I haven't been adding any additional documents.

DB> Related to this, I seem to have a discrepancy between the number of documents I think I am sending to Solr and the number of documents it is reporting. I tried reducing the number of shards for one of my small collections, so I deleted all references to this collection and reloaded it. I think I have 260 documents submitted (counted from a Hadoop job). Solr returns a count of ~430 (it varies), and the first returned document is not consistent.

I guess if I were trying to test this I'd need to know how you added subsequent collections. In particular, what you did re: ZooKeeper as you added each collection.

DB> These are my steps:
DB> 1. Create the collection via the HTTP API:
DB>    http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&%20collection.configName=<collection>
DB> 2. Relaunch one of my JVM processes, bootstrapping the collection:
DB>    java -Xmx16g -Dcollection.configName=<collection> -Djetty.port=<port> -DzkHost=<zkhost> -Dsolr.solr.home=<solr home> -DnumShards=6 -Dbootstrap_confdir=conf -jar start.jar
DB> 3. Load data.
DB> Let me know if something is unclear. I can run through the process again and document it more carefully.
DB>
DB> Thanks for looking at it,
DB> Dave

Best
Erick

On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David <buttl...@llnl.gov> wrote:

> My typical way of adding documents is through SolrJ, where I commit after
> every batch of documents (where the batch size is configurable)
>
> I have now tried committing several times, from the command line (curl)
> with and without openSearcher=true. It does not affect anything.
>
> Dave
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, November 16, 2012 11:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> How did you do the final commit? Can you try a lone commit (with
> openSearcher=true) and see if that affects things?
>
> Trying to determine if this is a known issue or not.
>
> - Mark
>
> On Nov 16, 2012, at 1:34 PM, "Buttler, David" <buttl...@llnl.gov> wrote:
>
> > Hi all,
> > I buried an issue in my last post, so let me pop it up.
> >
> > I have a cluster with 10 collections on it. The first collection I
> > loaded works perfectly. But every subsequent collection returns an
> > inconsistent number of results for each query. The queries can be
> > simply *:*, or more complex facet queries. If I go to individual cores
> > and issue the query, with distrib=false, I get a consistent number of
> > results. I am wondering if there is some delay in returning results
> > from my shards, and the queried node just times out and displays the
> > number of results that it has received so far. If there is such a
> > timeout, it must be very small, as my QTime is around 11 ms.
> >
> > Dave
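
The per-core check described in the thread (querying each core with distrib=false) could be scripted roughly as below. This is only a sketch under assumptions, not something posted in the thread: the base URL, the core names, and the shard-to-core mapping are hypothetical and would have to come from your own cluster state. It uses the standard select handler with rows=0 and wt=json and reads response.numFound. If replicas of the same shard report different counts, a distributed query's total could plausibly vary with whichever replica answers, so the helper reports the lowest and highest totals that the per-core counts allow. It does not establish the cause of the problem discussed above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_query_url(base_url, core, query="*:*"):
    """Build a count-only, non-distributed select URL for one core.

    base_url and core are placeholders, e.g. "http://<host>:<port>/solr"
    and whatever core names your cluster actually uses.
    """
    params = urlencode({"q": query, "rows": 0, "distrib": "false", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def fetch_num_found(url, timeout=10):
    """Fetch the URL and return response.numFound from the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def summarize(counts_by_shard):
    """Compare per-core counts grouped by shard.

    counts_by_shard maps shard name -> {core name: numFound}.
    Returns (disagreeing, low, high):
      disagreeing: sorted shard names whose cores report different counts
      low, high:   smallest and largest totals obtainable by taking one
                   core's count from each shard
    """
    disagreeing = sorted(
        shard
        for shard, cores in counts_by_shard.items()
        if len(set(cores.values())) > 1
    )
    low = sum(min(cores.values()) for cores in counts_by_shard.values())
    high = sum(max(cores.values()) for cores in counts_by_shard.values())
    return disagreeing, low, high
```

To use it, call fetch_num_found(core_query_url(...)) once per core, collect the results into a dict keyed by shard, and pass that dict to summarize. If low and high are equal but the distributed *:* count still varies, the replicas agree with each other and the discrepancy lies elsewhere.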