Hi Owen,
Great. This is exactly the confirmation I needed. Everything seems good
now: if I encounter a failure, I simply don't increase the size of my
vectorized row batch, and my smoke tests seem to confirm the behavior I want.
Thanks!
On Fri, Sep 11, 2020 at 11:38 AM Owen O'Malley
wrote:
>
What I'd propose is that in addToVector, which I assume is your code, you
catch exceptions and roll back the VectorizedRowBatch.size to the previous
row by subtracting one. That will effectively wipe out the previous partial
row. For complex types, you won't reclaim the values, but they won't be
written.
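Owen's suggestion can be sketched roughly as follows. This is a minimal,
self-contained illustration of the rollback pattern, not the actual ORC
code: the stand-in Batch class mimics only the `size` field of ORC's
VectorizedRowBatch, and addToVector/appendRecord are hypothetical names
for the user-side code being discussed.

```java
public class RollbackSketch {

    // Stand-in for org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    // only the `size` field (count of completed rows) matters here.
    public static class Batch {
        public int size = 0;
    }

    // Hypothetical user code that fills column vectors at row index
    // batch.size - 1 and may throw on a bad record.
    public static void addToVector(Batch batch, Object record) {
        if (record == null) {
            throw new IllegalArgumentException("unserializable record");
        }
        // ... fill column vectors for the claimed row ...
    }

    // Claim the next row slot, and on failure subtract one from size so
    // the partial row is effectively wiped out, as Owen describes.
    public static boolean appendRecord(Batch batch, Object record) {
        batch.size++;
        try {
            addToVector(batch, record);
            return true;
        } catch (RuntimeException e) {
            batch.size--;             // roll back to the previous row
            return false;             // caller can dead-letter the record
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch();
        appendRecord(batch, "good");
        boolean ok = appendRecord(batch, null);
        System.out.println(batch.size + " " + ok);  // prints: 1 false
    }
}
```

As Owen notes, for complex types the column vectors may still hold the
discarded values, but with size rolled back they are never written out.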
Hi Owen,
Thanks for the quick response.
Essentially, I have an Avro -> ORC real-time conversion process. I do the
conversion myself using the Java API. If I (internally, in my code) hit a
serialization failure, etc., then I push the record to a queue to handle
offline.
However, since I write th
Where is the failure happening? If it is happening in the ORC writer code,
there isn't a way to do that. Can I ask what kind of exception you are
hitting? In the column (aka tree) writers, there shouldn't be much that can
go wrong. It doesn't even write to the file handle; it just buffers in
memory.
I'm writing a streaming application that converts incoming data into ORC in
real-time. One thing I'm implementing is a dead-letter queue that still
allows me to continue the batch processing even if a single record fails.
The caveat to this is that I want to remove the data that has been written thus
far.