On 5/20/2015 5:41 PM, Ryan Cutter wrote:
> I have a collection with 1 billion documents and I want to delete 500 of
> them. The collection has a dozen shards and a couple replicas. Using Solr
> 4.4.
>
> Sent the delete query via HTTP:
>
> http://hostname:8983/solr/my_collection/update?stream.body=
> <delete><query>source:foo</query></delete>
>
> Took a couple minutes and several replicas got knocked into Recovery mode.
> They eventually came back and the desired docs were deleted but the cluster
> wasn't thrilled (high load, etc).
>
> Is this expected behavior? Is there a better way to delete documents that
> I'm missing?

That's the correct way to do the delete. Before you'll see the change, a
commit must happen in one way or another. Hopefully you already knew that.

I believe that your setup has some performance issues that are making it
very slow and knocking out your Solr nodes temporarily.

The most common root problems with SolrCloud and indexes going into
recovery are:

1) Your heap is enormous but your garbage collection is not tuned.
2) You don't have enough RAM, separate from your Java heap, for adequate
   index caching.

With a billion documents in your collection, you might even be having
problems with both.

Here's a wiki page that includes some info on both of these problems,
plus a few others:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
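
For reference, the delete-by-query plus commit discussed above can be sketched as an HTTP POST instead of a stream.body URL. This is only a minimal illustration, not a tuning fix: the helper name build_delete_request is hypothetical, the host and collection are the placeholders from the original message, and it assumes the standard XML update handler at /update with the commit=true request parameter.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape


def build_delete_request(base_url, collection, query, commit=True):
    """Build (but do not send) a Solr delete-by-query request.

    Hypothetical helper: assumes the XML update handler is at
    /solr/<collection>/update and that commit=true triggers a commit.
    """
    params = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    url = f"{base_url}/solr/{collection}/update?{params}"
    # Escape XML special characters so the query cannot break the document.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_delete_request("http://hostname:8983", "my_collection", "source:foo")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a live cluster:
    # urllib.request.urlopen(req)
```

Passing commit=False leaves the commit to whatever autoCommit settings the cluster already has, which may be preferable on a heavily loaded cluster.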