On 5/20/2015 5:41 PM, Ryan Cutter wrote:
> I have a collection with 1 billion documents and I want to delete 500 of
> them.  The collection has a dozen shards and a couple replicas.  Using Solr
> 4.4.
> 
> Sent the delete query via HTTP:
> 
> http://hostname:8983/solr/my_collection/update?stream.body=
> <delete><query>source:foo</query></delete>
> 
> Took a couple minutes and several replicas got knocked into Recovery mode.
> They eventually came back and the desired docs were deleted but the cluster
> wasn't thrilled (high load, etc).
> 
> Is this expected behavior?  Is there a better way to delete documents that
> I'm missing?

That's the correct way to do the delete.  Before you'll see the change,
a commit must happen in one way or another.  Hopefully you already knew
that.
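
For what it's worth, here is a minimal sketch of sending the same delete as a POST body with an explicit commit, instead of putting the XML in stream.body on a GET.  It assumes Python 3 and reuses the hostname and collection from your URL.  The function name build_delete_request is made up for illustration, and commit=true on the update URL is only one of several ways to get a commit.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_request(base_url, collection, query, commit=True):
    """Build (but do not send) a delete-by-query request for Solr's update handler.

    build_delete_request is a hypothetical helper, not part of any Solr client.
    """
    url = f"{base_url}/{collection}/update"
    if commit:
        # Ask Solr to commit as part of this update so the delete becomes visible.
        url += "?" + urlencode({"commit": "true"})
    # Escape &, < and > so the query cannot break the XML envelope.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_delete_request("http://hostname:8983/solr", "my_collection", "source:foo")
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a production cluster.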

I believe that your setup has some performance issues that are making it
very slow and knocking out your Solr nodes temporarily.

The most common root causes of SolrCloud replicas going into recovery
are:  1) Your heap is enormous but your garbage collection is not tuned,
so long GC pauses make the node look unresponsive.  2) You don't have
enough RAM, separate from your Java heap, for the operating system to
cache the index adequately.  With a billion documents in your
collection, you might even be having problems with both.
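
To make the second point concrete, here is a rough back-of-the-envelope sketch in Python.  The numbers are invented, not taken from your cluster, and the function name cache_coverage is hypothetical.  It also assumes nothing else on the machine needs memory, so treat the result as an upper bound.

```python
def cache_coverage(total_ram_gb, heap_gb, index_on_disk_gb):
    """Estimate the fraction of the on-disk index the OS disk cache could hold.

    Hypothetical helper: whatever RAM is not given to the Java heap is
    assumed to be available to the operating system for caching index files.
    """
    free_for_cache_gb = max(total_ram_gb - heap_gb, 0)
    return min(free_for_cache_gb / index_on_disk_gb, 1.0)


# Invented example: 64 GB machine, 16 GB heap, 200 GB of index on that node.
coverage = cache_coverage(64, 16, 200)
print(f"About {coverage:.0%} of the index could fit in the OS disk cache")
```

A low fraction suggests that a large operation such as a delete-by-query will force a lot of disk reads.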

Here's a wiki page that includes some info on both of these problems,
plus a few others:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
