Re: ElasticsearchSystemProducer Crashes Samza Job

2016-02-16 Thread Roger Hoover
The code that you showed below is part of the BulkProcessor.Listener interface, so if that listener were pluggable, you could override the default behavior (which is to ignore only version conflicts). On Tue, Feb 16, 2016 at 12:27 PM, jeremiah adams wrote: > The root of the issue may be in the

Re: ElasticsearchSystemProducer Crashes Samza Job

2016-02-16 Thread jeremiah adams
The root of the issue may be in the HTTP status code handling. This code seems to imply that the only valid error case from Elasticsearch is a conflict. That is too narrow a constraint. In one of my use cases, a mapping/message conflict occurs, resulting in an HTTP 400. In my case, it is perfectly
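
To make the point about status-code handling concrete, here is a minimal sketch of a policy that decides which HTTP statuses are safe to ignore. It uses no Samza or Elasticsearch classes: `BulkFailurePolicy`, `StatusSetPolicy`, and `FailurePolicyDemo` are hypothetical names invented for illustration, and this is not how the producer is actually written. The only facts assumed are that a version conflict is reported as HTTP 409 and that the mapping conflict described above surfaced as HTTP 400.

```java
import java.util.Set;

// Hypothetical policy: decides whether a failed bulk item may be ignored.
// None of these names come from Samza or Elasticsearch.
interface BulkFailurePolicy {
    boolean isIgnorable(int httpStatus);
}

// Treats a fixed set of HTTP status codes as ignorable.
final class StatusSetPolicy implements BulkFailurePolicy {
    private final Set<Integer> ignorable;

    StatusSetPolicy(Set<Integer> ignorable) {
        this.ignorable = Set.copyOf(ignorable);
    }

    @Override
    public boolean isIgnorable(int httpStatus) {
        return ignorable.contains(httpStatus);
    }
}

class FailurePolicyDemo {
    // Behavior described in this thread: only conflicts (409) are ignored.
    static final BulkFailurePolicy CONFLICT_ONLY =
            new StatusSetPolicy(Set.of(409));

    // Wider policy for the use case above: also tolerate 400 responses.
    static final BulkFailurePolicy CONFLICT_OR_BAD_REQUEST =
            new StatusSetPolicy(Set.of(409, 400));

    public static void main(String[] args) {
        // With the narrow policy a 400 is fatal; with the wider one it is not.
        System.out.println(CONFLICT_ONLY.isIgnorable(400));
        System.out.println(CONFLICT_OR_BAD_REQUEST.isIgnorable(400));
    }
}
```

Whether ignoring every 400 is acceptable depends on the job; a stricter policy might also inspect the failure message before deciding.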

Re: ElasticsearchSystemProducer Crashes Samza Job

2016-02-16 Thread Roger Hoover
Hi Jeremiah, There's currently no way to do that. I think the best way to modify the existing ElasticsearchSystemProducer would be to add a config option for a callback to let you customize this behavior. Basically, a pluggable listener ( https://github.com/apache/samza/blob/master/samza-elastic
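
The config-option idea in this message could look roughly like the sketch below: a callback class is named in the job config and instantiated reflectively, with a default that ignores only conflicts. Everything here is an assumption for illustration. `BulkFailureCallback`, `CallbackLoader`, both callback classes, and the config key are hypothetical and do not exist in Samza; a real implementation would plug into the BulkProcessor.Listener mentioned elsewhere in this thread.

```java
import java.util.Map;

// Hypothetical callback invoked for each failed bulk item.
interface BulkFailureCallback {
    // Return true to swallow the failure, false to let it fail the job.
    boolean onFailure(int httpStatus, String message);
}

// Default matching the behavior described in the thread: ignore only conflicts.
class IgnoreConflictsCallback implements BulkFailureCallback {
    @Override
    public boolean onFailure(int httpStatus, String message) {
        return httpStatus == 409;
    }
}

// Example of a user-supplied override: tolerate any 4xx client error.
class IgnoreClientErrorsCallback implements BulkFailureCallback {
    @Override
    public boolean onFailure(int httpStatus, String message) {
        return httpStatus >= 400 && httpStatus < 500;
    }
}

class CallbackLoader {
    // Hypothetical config key; not a real Samza setting.
    static final String KEY = "systems.es.bulk.failure.callback.class";

    // Instantiates the configured callback, falling back to the default.
    static BulkFailureCallback load(Map<String, String> config) {
        String className =
                config.getOrDefault(KEY, IgnoreConflictsCallback.class.getName());
        try {
            return (BulkFailureCallback) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot load callback class: " + className, e);
        }
    }
}
```

A job that wanted to survive mapping errors would set the key to its own callback class, while jobs that set nothing would keep the conflict-only default.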

ElasticsearchSystemProducer Crashes Samza Job

2016-02-15 Thread jeremiah adams
We have a Samza job configured to run in a YARN cluster. This job consumes multiple Kafka topics and routes the messages to Elasticsearch for indexing. When enough batch updates to Elasticsearch fail using the ElasticsearchSystemProducer, the entire Samza job dies. Due to checkpointing + YARN, the