As Jan pointed out, unless your client sends Solr some instructions for
what to do with those documents specifically, Solr doesn't do anything.

In your example, Nutch crawls 30 documents at first, and all 30 are sent to
Solr and added to the index. On the next crawl, it finds 27 documents, and
those 27 are sent to Solr. If they have the same unique keys (IDs) as 27
documents already in the index, the indexed copies will be overwritten
(someone can correct me on this, but I believe a document with a matching
ID gets replaced even if the content itself has not changed).

Unless Nutch (or any other client) specifically tells Solr to do something
with the 3 documents that were not sent as part of this second update, Solr
does nothing with regard to those documents. That makes sense: you would not
want Solr deleting documents just because you didn't happen to update them
with every indexing request.
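
To make the behavior above concrete, here is a toy model in Python. It is
not Solr code: a plain dict keyed by ID stands in for the index, and the
document IDs and the add_documents helper are made up for illustration.

```python
# Toy stand-in for a Solr index: a dict keyed by each document's unique ID.
# This only models "add overwrites by ID"; it is not how Solr is implemented.

def add_documents(index, docs):
    """Add docs to the index, overwriting any existing doc with the same ID."""
    for doc in docs:
        index[doc["id"]] = doc
    return index

# First crawl: 30 documents (hypothetical IDs doc-0 .. doc-29).
first_crawl = [{"id": f"doc-{i}", "crawl": 1} for i in range(30)]
# Second crawl: only 27 of them are found and resent.
second_crawl = [{"id": f"doc-{i}", "crawl": 2} for i in range(27)]

index = {}
add_documents(index, first_crawl)
add_documents(index, second_crawl)

# The 3 documents that were not resent are still in the index, unchanged.
untouched = sorted(doc_id for doc_id, doc in index.items() if doc["crawl"] == 1)
print(len(index))   # 30
print(untouched)    # ['doc-27', 'doc-28', 'doc-29']
```

Nothing in the second batch mentions doc-27 through doc-29, so nothing
happens to them.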

Solr maintains no record of where a document came from, what client sent
it, nor whether subsequent updates from the same client update or do not
update the same set of documents as previous requests from the same client.
It is up to the client process itself to keep track of this and to tell
Solr what to do in subsequent update requests. In this case, what you want
is for Nutch to send Solr a delete-by-ID request for those 3 documents so
they are removed. I'm not sure if Nutch is capable of doing that, however.
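
As a rough sketch of what a client would have to do, the Python below works
out which IDs disappeared between two crawls and builds a delete-by-ID
request body in Solr's JSON update format. The function names, the example
IDs, and the update URL are placeholders of mine, not anything Nutch
provides, and send_delete has not been tried against a real cluster.

```python
import json
import urllib.request

def stale_ids(previous_ids, current_ids):
    """IDs that were indexed before but are missing from the latest crawl."""
    return sorted(set(previous_ids) - set(current_ids))

def build_delete_payload(ids):
    """JSON body for a delete-by-ID update: {"delete": ["id1", "id2", ...]}."""
    return json.dumps({"delete": list(ids)})

def send_delete(update_url, ids):
    """POST the delete to Solr's update handler. update_url is an assumption,
    e.g. http://localhost:8983/solr/mycollection/update?commit=true"""
    request = urllib.request.Request(
        update_url,
        data=build_delete_payload(ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

previous = ["a", "b", "c", "d"]   # hypothetical IDs from the first crawl
current = ["a", "c"]              # hypothetical IDs from the second crawl
to_delete = stale_ids(previous, current)
print(build_delete_payload(to_delete))  # {"delete": ["b", "d"]}
```

The client has to remember the previous ID list on its own, since Solr does
not keep track of which client sent which documents.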

On Thu, Aug 30, 2018 at 7:00 AM kunhu0...@gmail.com <kunhu0...@gmail.com>
wrote:

> Thanks for the update
>
> I'm using Nutch 1.14, Solr 6.6.3, and ZooKeeper 3.4.12. We are running two
> Solr instances configured as SolrCloud. Please let me know if anything is
> missing.
