Answers inline below

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, November 17, 2012 6:40 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Hmmm, first an aside. If by "commit after every batch of documents " you
mean after every call to server.add(doclist), there's no real need to do
that unless you're striving for really low latency. The usual
recommendation is to use commitWithin when adding and commit only at the
very end of the run. This shouldn't actually be germane to your issue, just
an FYI.

DB> Good point.  The code for committing docs to solr is fairly old.  I will 
update it since I don't have a latency requirement.
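
The commitWithin advice above can be sketched at the HTTP level. This is a minimal sketch, not the code used in this thread: the base URL, the collection name, and the helper names update_url and plan_requests are assumptions made for illustration. It only builds the request plan (one commitWithin request per batch, then a single explicit commit) and sends nothing.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, commit_within_ms=None, final_commit=False):
    """Build a Solr /update URL (base_url and collection are placeholders)."""
    params = {}
    if commit_within_ms is not None:
        # Ask Solr to commit on its own within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    if final_commit:
        # Explicit commit, intended only for the end of the run.
        params["commit"] = "true"
    url = f"{base_url}/{collection}/update"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def plan_requests(base_url, collection, batches, commit_within_ms=10000):
    """Return (url, batch) pairs: one per batch, plus a final commit-only request."""
    plan = [(update_url(base_url, collection, commit_within_ms), batch)
            for batch in batches]
    plan.append((update_url(base_url, collection, final_commit=True), None))
    return plan
```

With two batches the plan has three entries; only the last one carries commit=true, so no explicit commit happens per batch.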

So you're saying that the inconsistency is permanent? By that I mean it
keeps coming back inconsistently for minutes/hours/days?

DB> Yes, it is permanent.  I have collections that have been up for weeks, and 
are still returning inconsistent results, and I haven't been adding any 
additional documents.
DB> Related to this, I seem to have a discrepancy between the number of 
documents I think I am sending to solr, and the number of documents it is 
reporting.  I have tried reducing the number of shards for one of my small 
collections, so I deleted all references to this collection, and reloaded it. 
I think I have 260 documents submitted (counted from a hadoop job).  Solr 
returns a count of ~430 (it varies), and the first returned document is not 
consistent.
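
One way to narrow down a discrepancy like this is to compare the per-core counts (distrib=false, which the original post reports as consistent) against the distributed count. The following is a rough sketch under assumptions: the core URLs are placeholders, exactly one core per shard is queried (otherwise replicas would be counted more than once), and the helper names are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url, distrib):
    """Build a match-all query that returns only the hit count."""
    params = {
        "q": "*:*",
        "rows": "0",      # only numFound is needed, not the documents
        "wt": "json",
        "distrib": "true" if distrib else "false",
    }
    return f"{core_url}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_count(core_url, distrib):
    """Query one core (network call; core_url is a placeholder)."""
    with urlopen(count_url(core_url, distrib)) as resp:
        return num_found(resp.read().decode("utf-8"))


def compare(per_core_counts, distributed_count):
    """Compare the sum of per-core counts with the distributed count."""
    total = sum(per_core_counts.values())
    return {
        "sum_of_cores": total,
        "distributed": distributed_count,
        "difference": distributed_count - total,
    }
```

If the sum over one core per shard matches the number of documents submitted while the distributed count does not, that points at how the cores are being combined rather than at the indexed data itself.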

I guess if I were trying to test this I'd need to know how you added
subsequent collections. In particular what you did re: zookeeper as you
added each collection.

DB> These are my steps
DB> 1. Create the collection via the HTTP API: 
http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&%20collection.configName=<collection>
DB> 2. Relaunch one of my JVM processes, bootstrapping the collection: 
DB> java -Xmx16g -Dcollection.configName=<collection> -Djetty.port=<port> 
-DzkHost=<zkhost> -Dsolr.solr.home=<solr home> -DnumShards=6 
-Dbootstrap_confdir=conf -jar start.jar
DB> 3. Load data
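
The CREATE URL in step 1 contains a "%20" in front of collection.configName. Whether that affects the outcome is not established in this thread, but building the query string programmatically keeps stray characters out of parameter names. A minimal sketch, assuming the same parameters as step 1; the function name and the example values are made up for illustration:

```python
from urllib.parse import urlencode


def create_collection_url(host, port, name, num_shards, config_name):
    """Build the Collections API CREATE URL with the parameters from step 1."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": str(num_shards),
        "collection.configName": config_name,
    }
    # urlencode joins key=value pairs with "&" and adds no whitespace
    # around parameter names.
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, port, and collection name.
    print(create_collection_url("localhost", 8983, "mycollection", 6, "mycollection"))
```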

DB> Let me know if something is unclear.  I can run through the process again 
and document it more carefully.
DB>
DB> Thanks for looking at it,
DB> Dave

Best
Erick


On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David <buttl...@llnl.gov> wrote:

> My typical way of adding documents is through SolrJ, where I commit after
> every batch of documents (where the batch size is configurable)
>
> I have now tried committing several times, from the command line (curl)
> with and without openSearcher=true.  It does not affect anything.
>
> Dave
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, November 16, 2012 11:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> How did you do the final commit? Can you try a lone commit (with
> openSearcher=true) and see if that affects things?
>
> Trying to determine if this is a known issue or not.
>
> - Mark
>
> On Nov 16, 2012, at 1:34 PM, "Buttler, David" <buttl...@llnl.gov> wrote:
>
> > Hi all,
> > I buried an issue in my last post, so let me pop it up.
> >
> > I have a cluster with 10 collections on it.  The first collection I
> loaded works perfectly.  But every subsequent collection returns an
> inconsistent number of results for each query.  The queries can be simply
> *:*, or more complex facet queries.  If I go to individual cores and issue
> the query, with distrib=false, I get a consistent number of results.  I am
> wondering if there is some delay in returning results from my shards, and
> the queried node just times out and displays the number of results that it
> has received so far.  If there is such a timeout, it must be very small, as
> my QTime is around 11 ms.
> >
> > Dave
>
>
