Can you do me a favor and try not using the batch add for a run? Just do the add one doc at a time. (solrServer.add(doc) rather than solrServer.add(collection))
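As a rough sketch of what "one doc at a time" could look like in a loader: the `SingleDocSink` interface and `OneAtATime` class below are hypothetical stand-ins, not SolrJ types. In real code the sink would presumably wrap the `solrServer.add(doc)` call, and documents would be `SolrInputDocument` rather than plain maps.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the single-document add call on the client.
// This is NOT the SolrJ API; it only mirrors the shape of add(doc).
interface SingleDocSink {
    void add(Map<String, Object> doc) throws Exception;
}

final class OneAtATime {
    /**
     * Sends every document in the batch with its own add call instead of
     * one bulk call. Documents whose add throws are collected and returned
     * so the caller can log or retry them.
     */
    static List<Map<String, Object>> addIndividually(
            Collection<Map<String, Object>> batch, SingleDocSink sink) {
        List<Map<String, Object>> failed = new ArrayList<>();
        for (Map<String, Object> doc : batch) {
            try {
                sink.add(doc);
            } catch (Exception e) {
                // Keep going: one bad document should not abort the run.
                failed.add(doc);
            }
        }
        return failed;
    }
}
```

A side effect of this shape is that a failure is tied to a specific document, which may help when comparing counts afterwards.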
I just fixed one issue with the batch add this morning on trunk - it may be the cause of this oddity. I'm also working on some performance issues around that method (good performance without starting thousands of threads). Until I get all that straightened out (hopefully very soon), I think you will have better luck not using the bulk, collection add method.

On Aug 2, 2012, at 2:16 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Thanks Mark.
>
> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>
> Collection<SolrInputDocument> batch = ...
> ... build up batch ...
> solrServer.add( batch );
>
> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
> is that I couldn't get it to run in my Hadoop environment. There's
> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
> 4.1.3 but when I run it in my env, I get the following:
>
> Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
>     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>     at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
>     ... 16 more
>
> I spent hours trying to resolve the classpath issue and finally had to
> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
> stage at this point. So it sounds like this could be the cause of my
> problems.
>
> One other thing ... I do have the _version_ field defined in my
> schema.xml but am not setting it on the client side when indexing.
> Should I be doing that?
>
> Cheers,
> Tim
>
>
> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabd...@gmail.com> wrote:
>>
>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>> impressed so far ...
>>>
>>> I have a 12-shard index with ~104M docs with each shard having
>>> 1 replica (so 24 Solr servers running)
>>>
>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>> (*:*) and each time I send the request the value for numFound in the
>>> result is different. It's always close but not exactly the same as I
>>> would expect? Can anyone shed some light on this issue? I also tried a
>>> real query, such as "#olympics lochte" and same thing - different
>>> numFound each time. The first page of actual docs returned is the same
>>> so maybe I should just ignore the numFound issue?
>>>
>>> Note that while experiencing this behavior, I am not adding any docs
>>> to the index and all docs have been committed with waitFlush=true and
>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>> point. In addition, after having committed all 104M docs, I hit the
>>> optimize button on the panel so I have only 1 segment. In other words,
>>> the index is not being updated and has been optimized at this point.
>>
>>
>> How are you adding docs? Eg what client and what method in particular (what
>> is your line of code that actually adds the doc).
>>
>> You can find the numFound result for each node by passing the param
>> distrib=false. What does this tell you? Are your replicas in sync with the
>> leader? What does the count for each shard add up to?
>>
>> I would not ignore the issue - something must be off. It may somehow be user
>> error, it may be a bug that has been fixed since the alpha, or it may be
>> something new.
>>
>> Are you sure every shard you are issuing the query *from* is active and live
>> according to ZooKeeper? Eg when you look at the cloud admin view and look at
>> the cluster visualization, are all the nodes green?
>>
>> - Mark Miller
>> lucidimagination.com
>>

- Mark Miller
lucidimagination.com
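
The per-node check suggested in the quoted message (query each node with distrib=false, compare replicas with their leader, and add up the shard counts) could be scripted roughly as below. This is only a sketch: the `ShardCountCheck` class, the base URLs passed to it, and the `{leaderCount, replicaCount}` array layout are assumptions for illustration. Fetching the numFound values over HTTP is left out; `rows=0` is added so that only the count comes back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ShardCountCheck {
    /**
     * Builds a match-all count query that stays on one node.
     * baseUrl is a hypothetical node URL such as "http://host1:8983/solr".
     */
    static String localCountUrl(String baseUrl) {
        return baseUrl + "/select?q=*:*&rows=0&distrib=false";
    }

    /**
     * Each map value is assumed to be {leaderCount, replicaCount} for a shard.
     * Returns the names of shards whose two counts differ.
     */
    static List<String> outOfSync(Map<String, long[]> countsByShard) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, long[]> e : countsByShard.entrySet()) {
            long[] counts = e.getValue();
            if (counts[0] != counts[1]) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }

    /** Sums the leader counts; this is the total a distributed query should report. */
    static long totalFromLeaders(Map<String, long[]> countsByShard) {
        long total = 0;
        for (long[] counts : countsByShard.values()) {
            total += counts[0];
        }
        return total;
    }
}
```

If any shard shows up in `outOfSync`, a distributed query can return a different numFound depending on which copy of that shard answers, which would match the symptom described.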