On 5/19/2015 3:02 AM, Bram Van Dam wrote:
> I'm looking for a way to have Solr reject documents if a certain field
> value is duplicated (reject, not overwrite). There doesn't seem to be
> any kind of unique option in schema fields.
> 
> The de-duplication feature seems to make this (somewhat) possible, but I
> would like to provide the unique value myself, without having the
> deduplicator create a hash of field values.
> 
> Am I missing an obvious (or less obvious) way of accomplishing this?

Write a custom update processor and include it in your update chain.
You will then have the ability to do anything you want with the entire
input document before it hits the code to actually do the indexing.

Solr includes a script update processor that allows you to write your
processor in a language other than Java, such as JavaScript.

https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Here's how to discard a document in an update processor written in Java:

http://stackoverflow.com/questions/27108200/how-to-cancel-indexing-of-a-solr-document-using-update-request-processor

The javadoc that I linked above describes how a script written in
another language can return "false" to discard the document.
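
As a rough sketch of what such a script might look like for your case,
assuming JavaScript and a hypothetical field named "external_id" that holds
the value you want to keep unique: the script looks the value up with the
searcher from the "req" global and returns false when it finds a match.
The field name and the alreadyIndexed() helper are my own inventions for
illustration, and I have not run this against a live index. It also assumes
the field is an untokenized string field, so that the raw value matches the
indexed term.

```javascript
// Hypothetical field whose values must be unique; adjust to your schema.
var UNIQUE_FIELD = "external_id";

// Returns true if a document visible to the current searcher already
// holds this value. `req` is the SolrQueryRequest that Solr exposes to
// the script as a global variable.
function alreadyIndexed(value) {
  var term = new Packages.org.apache.lucene.index.Term(UNIQUE_FIELD, String(value));
  return req.getSearcher().getFirstMatch(term) >= 0; // -1 means no match
}

function processAdd(cmd) {
  var doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
  var value = doc.getFieldValue(UNIQUE_FIELD);
  if (value != null && alreadyIndexed(value)) {
    logger.warn("Discarding document: duplicate " + UNIQUE_FIELD + "=" + value);
    return false; // stops processing, so the document is never indexed
  }
  // Returning nothing lets the document continue down the chain.
}

// The other hooks are left empty here.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

Two caveats with this approach: the searcher only sees documents that have
been committed, so duplicates that arrive before the next commit will not
be caught, and returning false drops the document silently instead of
reporting an error to the client.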

Thanks,
Shawn
