On Wed, May 20, 2015 at 12:59 PM, Bram Van Dam <bram.van...@intix.eu> wrote:
> >> Write a custom update processor and include it in your update chain.
> >> You will then have the ability to do anything you want with the entire
> >> input document before it hits the code to actually do the indexing.
>
> This sounded like the perfect option ... until I read Jack's comment:
>
> > My understanding was that the distributed update processor is near the end
> > of the chain, so that running of user update processors occurs before the
> > distribution step, but is that distribution to the leader, or distribution
> > from leader to replicas for a shard?
>
> That would pose some potential problems.
>
> Would a custom update processor make the solution "cloud-safe"?
>

Starting with Solr 5.1, you can specify update processors on the fly in a
request, and you can control whether they are executed before any
distribution happens or just before the document is actually indexed on the
replica.

For example, you can specify processor=xyz,MyCustomUpdateProc in the request
to have processor xyz run first, then MyCustomUpdateProc, and then the
default update processor chain (which will also distribute the doc to the
leader or from the leader to a replica). This also means that such
processors will not be executed on the replicas at all.

You can also specify post-processor=xyz,MyCustomUpdateProc to have xyz and
MyCustomUpdateProc run on each replica (including the leader) right before
the doc is indexed (i.e. just before RunUpdateProcessor).

Unfortunately, due to an oversight, this feature hasn't been documented
well, which is something I'll fix. See
https://issues.apache.org/jira/browse/SOLR-6892 for more details.

>
> Thx,
>
> - Bram
>

-- 
Regards,
Shalin Shekhar Mangar.
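
For illustration, here is a minimal Python sketch of how the two request parameters described above (processor and post-processor) might be attached to an update URL. The helper name build_update_url, the host http://localhost:8983/solr and the collection name mycollection are assumptions made up for this example; xyz and MyCustomUpdateProc are the processor names used above and would have to be defined in your own configuration. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_update_url(base_url, collection, processors=(), post_processors=()):
    """Build an update URL carrying per-request processor parameters.

    Hypothetical helper: only the parameter names 'processor' and
    'post-processor' come from the message above; everything else
    (host, collection, function name) is an assumption.
    """
    params = {}
    if processors:
        # Per the message: run before the default chain, so before distribution.
        params["processor"] = ",".join(processors)
    if post_processors:
        # Per the message: run on every replica (leader included),
        # just before the document is indexed.
        params["post-processor"] = ",".join(post_processors)

    url = f"{base_url.rstrip('/')}/{collection}/update"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Assumed host and collection name, for illustration only.
    print(build_update_url(
        "http://localhost:8983/solr",
        "mycollection",
        processors=["xyz", "MyCustomUpdateProc"],
    ))
```

Note that urlencode percent-encodes the comma between processor names as %2C, which a server decodes back to a comma when it parses the query string.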