As Jan pointed out, unless your client sends Solr some instructions for what to do with those documents specifically, Solr doesn't do anything.

In your example, Nutch crawls 30 documents at first, and 30 documents are sent to Solr and added to the index. On the next crawl, it finds 27 documents, and 27 documents are sent to Solr. If these documents have the same unique keys (IDs) as 27 documents already in the index, the documents in the index will be updated (someone can correct me on this, but I believe the documents get overwritten even if the content itself has not changed).

Unless Nutch (or any other client) specifically tells Solr to do something with the 3 documents that were not sent as part of this second update, Solr does nothing with regard to those documents. That makes sense: you don't want Solr deleting documents just because you didn't happen to update them with every indexing request.

Solr maintains no record of where a document came from, which client sent it, or whether subsequent updates from the same client cover the same set of documents as its previous requests. It is up to the client process itself to keep track of this and to tell Solr explicitly what to do in subsequent update requests.

In this case, what you want is for Nutch to send Solr a delete-by-ID request for those 3 documents so they are removed. I'm not sure if Nutch is capable of doing that, however.

On Thu, Aug 30, 2018 at 7:00 AM kunhu0...@gmail.com <kunhu0...@gmail.com> wrote:

> Thanks for the update
>
> I'm using Nutch 1.14 and Solr 6.6.3 and Zookeeper 3.4.12. We are using two
> Solr and configured as Solr cloud. Please let me know if anything is
> missing
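
For illustration only, here is a rough sketch of the bookkeeping a client would have to do itself: work out which IDs were sent on the previous crawl but not on the current one, and build a delete-by-ID request for Solr's JSON update handler. The host, the collection name (mycollection), the example IDs, and the helper names are all made up for this sketch, and it is not how Nutch does things internally. It only builds the request; sending it is left as a comment.

```python
import json
import urllib.request


def stale_ids(previous_ids, current_ids):
    """Return IDs sent in the previous crawl but missing from the current one."""
    return sorted(set(previous_ids) - set(current_ids))


def build_delete_request(update_url, ids):
    """Build (but do not send) a Solr JSON delete-by-ID request."""
    body = json.dumps({"delete": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical IDs: 30 documents in the first crawl, 27 in the second.
first_crawl = [f"http://example.com/page{i}" for i in range(30)]
second_crawl = first_crawl[:27]

to_delete = stale_ids(first_crawl, second_crawl)

# Hypothetical host and collection name; adjust for your own SolrCloud setup.
request = build_delete_request(
    "http://localhost:8983/solr/mycollection/update?commit=true", to_delete
)
print(request.data.decode("utf-8"))

# To actually send it to a running Solr instance:
# urllib.request.urlopen(request)
```

The point is that the list of "missing" documents has to come from the client's own records, since Solr has nothing to compare against.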