Hi all,

I have a pipeline that is currently reading from Kafka and writing to 
Elasticsearch. I was recently testing how it handles failures and was 
wondering if there's a best practice or recommendation for dealing with them. 
Specifically, if I have a batch of 100 records being sent via a BulkProcessor 
call (internally from the sink), and a single record in the batch is bad (for 
whatever reason), how should I handle that?
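For context, the sink is wired up roughly like this (ES7 connector; the record 
type, index name, and serialize() helper are simplified stand-ins for my actual 
code, and kafkaStream is the DataStream coming off the Kafka source), with the 
bulk flush size being what produces the 100-record batches:

import java.util.Collections;
import java.util.List;
import org.apache.http.HttpHost;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.xcontent.XContentType;

List<HttpHost> httpHosts =
    Collections.singletonList(new HttpHost("localhost", 9200, "http"));

ElasticsearchSink.Builder<MyRecord> esSinkBuilder = new ElasticsearchSink.Builder<MyRecord>(
    httpHosts,
    (record, ctx, indexer) ->
        indexer.add(Requests.indexRequest()
            .index("my-index")
            .source(serialize(record), XContentType.JSON)));

// This is what drives the internal BulkProcessor: flush every 100 actions.
esSinkBuilder.setBulkFlushMaxActions(100);

kafkaStream.addSink(esSinkBuilder.build());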

Ideally, I'd be able to retry only the message(s) in the batch that failed, but 
that may require direct access to the BulkProcessor instance (if that's 
possible at all). I don't see a way to easily discern or control how 
re-indexing should be handled within the onFailure handler, or whether I would 
need access to the afterBulk handler on the processor itself.
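For reference, this is roughly the failure-handler shape I've been poking at 
(the connector's ActionRequestFailureHandler; the 400 branch and the 
dead-letter idea are just placeholders for whatever the right approach turns 
out to be):

import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.util.ExceptionUtils;
import org.elasticsearch.action.ActionRequest;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

esSinkBuilder.setFailureHandler(new ActionRequestFailureHandler() {
    @Override
    public void onFailure(ActionRequest action,
                          Throwable failure,
                          int restStatusCode,
                          RequestIndexer indexer) throws Throwable {
        if (ExceptionUtils.findThrowable(failure, EsRejectedExecutionException.class).isPresent()) {
            // Transient rejection (bulk queue full): re-queue only this action,
            // not the entire batch it was part of.
            indexer.add(action);
        } else if (restStatusCode == 400) {
            // Bad record (e.g. mapping/parse error): drop it, or ideally route it
            // somewhere like a dead-letter topic -- this is the part I'm unsure
            // how to structure cleanly.
        } else {
            // Anything else fails the sink and restarts the job from the Kafka offsets.
            throw failure;
        }
    }
});

What's not clear to me is whether re-adding from here is the intended way to 
retry a single record out of a bulk, or whether that re-add effectively goes 
out as its own follow-up bulk request.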

Just trying to leverage the batching without making a potentially large 
additional bulk request to Elastic due to one bad record in a batch.

Any recommendations on how I might handle this? Disabling batching (i.e. 
sending one record at a time) doesn't seem anywhere near performant enough and 
falls over under large volumes.

Rion

(dev+user for reach)
