Re: Deduplication

Shalin Shekhar Mangar Wed, 20 May 2015 07:00:13 -0700

On Wed, May 20, 2015 at 12:59 PM, Bram Van Dam <bram.van...@intix.eu> wrote:


> >> Write a custom update processor and include it in your update chain.
> >> You will then have the ability to do anything you want with the entire
> >> input document before it hits the code to actually do the indexing.
>
> This sounded like the perfect option ... until I read Jack's comment:
>
> >
> > My understanding was that the distributed update processor is near the
> > end of the chain, so that running of user update processors occurs
> > before the distribution step, but is that distribution to the leader,
> > or distribution from leader to replicas for a shard?
>
> That would pose some potential problems.
>
> Would a custom update processor make the solution "cloud-safe"?
>

Starting with Solr 5.1, you can specify update processors on the fly as
request parameters, and you can control whether each one is executed
before any distribution happens or just before the doc is actually
indexed on the replica.

For example, you can specify processor=xyz,MyCustomUpdateProc in the
request to have processor xyz run first, then MyCustomUpdateProc, and then
the default update processor chain (which will also distribute the doc to
the leader or from the leader to a replica). This also means that such
processors will not be executed on the replicas at all.

You can also specify post-processor=xyz,MyCustomUpdateProc to have xyz and
MyCustomUpdateProc run on each replica (including the leader) right
before the doc is indexed (i.e. just before RunUpdateProcessor).
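
As a rough illustration of the two parameters described above, here is a
minimal Python sketch that only builds the update URL. The host, the
collection name "mycollection", and the helper build_update_url are
hypothetical; xyz and MyCustomUpdateProc stand for processors you have
already defined in your own configuration. It assumes the parameters are
passed on the query string of the /update handler.

```python
from urllib.parse import urlencode


def build_update_url(base_url, collection, processors=(), post_processors=()):
    """Build an /update URL carrying per-request processor parameters.

    base_url and collection are placeholders for your own setup.
    """
    params = {}
    if processors:
        # Executed before any distribution happens, so not on the replicas.
        params["processor"] = ",".join(processors)
    if post_processors:
        # Executed on each replica (including the leader), just before
        # RunUpdateProcessor.
        params["post-processor"] = ",".join(post_processors)
    # Keep commas literal so the processor list stays readable.
    query = urlencode(params, safe=",")
    url = f"{base_url}/{collection}/update"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_update_url(
        "http://localhost:8983/solr",  # hypothetical host
        "mycollection",                # hypothetical collection
        processors=["xyz", "MyCustomUpdateProc"],
    ))
```

For deduplication in a cloud setup, which of the two parameters fits
depends on whether the check should happen once before distribution or on
every replica right before indexing.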

Unfortunately, due to an oversight, this feature hasn't been documented
well, which is something I'll fix. See
https://issues.apache.org/jira/browse/SOLR-6892 for more details.


>
> Thx,
>
>  - Bram
>
>


-- 
Regards,
Shalin Shekhar Mangar.
