Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

Erick Erickson Sun, 20 Jan 2013 07:46:42 -0800

If this was in SolrCloud mode, there was a bug in 4.0 when submitting
batches of documents at once. Can't find it right now, but thought I'd
mention it just in case. Submitting the docs one-at-a-time doesn't
have the same problem.


May not be applicable, and entirely orthogonal to the discussion about
swallowing errors....
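
On the error-swallowing side, here is a minimal sketch of the general
pattern, using only the JDK. It is not SolrJ code: AsyncSenderSketch,
send() and submitAll() are made-up names, and the "bad" prefix is just a
way to simulate rejected docs. The caller's put() never sees a send
failure, so the only way to notice lost docs is to record errors in the
worker and compare counts afterwards. As far as I recall,
ConcurrentUpdateSolrServer has an overridable handleError(Throwable) that
plays the role of the catch block here, but check that against your SolrJ
version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for an asynchronous update client; not SolrJ code. */
public class AsyncSenderSketch {

    /** Outcome of one run: docs delivered, plus the errors the worker recorded. */
    public static final class Result {
        public final int delivered;
        public final List<String> errors;

        Result(int delivered, List<String> errors) {
            this.delivered = delivered;
            this.errors = errors;
        }
    }

    private static final String STOP = "__STOP__"; // sentinel: no more docs

    /** Simulated send: fails for any doc whose id starts with "bad". */
    static void send(String doc) {
        if (doc.startsWith("bad")) {
            throw new IllegalStateException("rejected: " + doc);
        }
    }

    public static Result submitAll(List<String> docs, int queueSize)
            throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(queueSize);
        final List<String> errors =
                Collections.synchronizedList(new ArrayList<String>());
        final int[] delivered = {0}; // written only by the worker, read after join()

        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    for (String doc = queue.take(); !STOP.equals(doc); doc = queue.take()) {
                        try {
                            send(doc);
                            delivered[0]++;
                        } catch (RuntimeException e) {
                            // Only logging here would be the "swallowed error" case;
                            // recording it lets the caller find out after the run.
                            errors.add(e.getMessage());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();

        for (String doc : docs) {
            queue.put(doc); // blocks while the queue is full; never reports a send failure
        }
        queue.put(STOP);
        worker.join(); // wait for the worker to drain the queue before reading results
        return new Result(delivered[0], new ArrayList<String>(errors));
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = submitAll(Arrays.asList("a", "bad-1", "b", "bad-2", "c"), 2);
        System.out.println("submitted=5 delivered=" + r.delivered
                + " errors=" + r.errors);
    }
}
```

With the five docs in main(), three are delivered and two errors are
recorded, even though every put() returned normally. The same check
(submitted vs. delivered, plus a non-empty error list) is what I'd want
from a real async run before trusting the doc count.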

Erick

On Tue, Jan 15, 2013 at 4:10 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> First off, just reporting this:
>
> I wound up with approx 58% fewer documents after submitting via
> ConcurrentUpdateSolrServer.  I went back and changed the code to use
> HttpSolrServer and had 100%.
>
> This was a long running test, approx 12 hours, with gigabytes of data, so
> not conveniently shared / reproducible, but I at least wanted to email
> around, first to get it "on the record", and second to see if anybody else
> has seen this.  I didn't see anything in JIRA.
>
> I realize that Concurrent update is asynchronous and I'm giving up the
> ability to monitor things, but since it works using the old server, there's
> nothing glaringly wrong at least.
>
> Here are a few more details:
> * Approx 2 M docs, submitted 1,000 at a time.
> * Solr 4.0.0 on Windows Server 2008
> * Solr server JVM configured with 4 Gigs of RAM
> * Submitting client JVM (SolrJ) configured with 10 Gigs of RAM
> * Didn't see any OOM (Out Of Memory) errors on the asynchronous /
> ConcurrentUpdateSolrServer run.  However, I didn't capture the entire log.
> Usually with OOM it's just before the run crashes, and the end of the log
> on the screen looked fine.
> * I also didn't think there were OOM issues on the Solr server side, for the
> same reason
> * When submitting the same data synchronously (via HttpSolrServer) it
> didn't have any problems
>
> Questions:
>
> The async client certainly finished faster, and since the underlying Solr
> server presumably didn't do the real work any faster, presumably a backlog
> built up somewhere.  Agreed?
>
> I'm guessing this backlog had something to do with the failure.  Or are
> there other areas to think about?
>
> Which process would get backlogged, the SolrJ client or the Solr server?
> I'd guess the server?
>
> And if async submits are accumulated in the Solr server, is there some
> mechanism to queue them onto disk, or does it try to hold them all in RAM?
>
> And *if* the backlog caused an OOM condition, wouldn't that JVM have mostly
> crashed (if not completely)?
>
> Any guesses on the most likely failure point, and where to look?
>
> Thanks,
> Mark
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
