Yeah, I'd suggest adding to:

OrcFile.ReaderOptions:
  exposeAcidRowId(boolean); -- so that the returned schema includes the
                               ACID row id
Reader.Options:
  setValidTransactions(TransactionList); -- apply transaction filtering
Then it will read a single file (or range using
Reader.Options.range(l
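
To make the two proposed options concrete, here is a rough, self-contained sketch. Everything in it is a hypothetical stand-in rather than the real ORC API: AcidReadSketch, Row, RowOptions, and the read helper are invented for illustration, a Set<Long> stands in for TransactionList, and the row id is simply prepended to the value instead of being added to a schema. It also ignores delete handling entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; none of this is the real ORC API.
final class AcidReadSketch {

    /** One stored row together with the ACID metadata kept next to it. */
    static final class Row {
        final long transactionId; // transaction that wrote the row
        final long rowId;         // position of the row within that transaction
        final String value;

        Row(long transactionId, long rowId, String value) {
            this.transactionId = transactionId;
            this.rowId = rowId;
            this.value = value;
        }
    }

    /** Models the proposed OrcFile.ReaderOptions addition. */
    static final class ReaderOptions {
        boolean exposeAcidRowId = false;

        ReaderOptions exposeAcidRowId(boolean expose) {
            this.exposeAcidRowId = expose;
            return this;
        }
    }

    /** Models the proposed Reader.Options addition. */
    static final class RowOptions {
        Set<Long> validTransactions = null; // null means "do not filter"

        RowOptions setValidTransactions(Set<Long> valid) {
            this.validTransactions = valid;
            return this;
        }
    }

    /** Reads one "file" (a list of rows), applying both options. */
    static List<String> read(List<Row> file, ReaderOptions readerOptions, RowOptions rowOptions) {
        List<String> out = new ArrayList<>();
        for (Row row : file) {
            Set<Long> valid = rowOptions.validTransactions;
            if (valid != null && !valid.contains(row.transactionId)) {
                continue; // written by a transaction the caller must not see
            }
            // Prefix the ACID row id only when the caller asked for it.
            out.add(readerOptions.exposeAcidRowId
                    ? row.transactionId + ":" + row.rowId + " " + row.value
                    : row.value);
        }
        return out;
    }
}
```

The point of the sketch is only that the two switches are independent: one changes what a returned row looks like, the other changes which rows come back at all.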
> For performance reasons, you prefer the second option that I rejected
> where users give a file and the system finds the deletes from there. I can
> buy that.
That's at least simpler to understand and debug; the logs from ORC alone are
enough to find consistency issues.
The rest of the det
For performance reasons, you prefer the second option that I rejected
where users give a file and the system finds the deletes from there. I can
buy that.
As for passing splits rather than files, that makes sense but seems like a
bigger change, since this should work with and without ACID, so I’
> The first thing that strikes me is that createReader takes a file.
> But for acid, you need to pass the directory because it needs to look for any
> relevant delta files.
In the ACID 2.x impl, the InputFormat gets a directory, but a Reader should still
be getting an individual file.
In fact
I’ve been looking at the OrcFile.createReader method and thinking about
what I will need to do to read acid files. The first thing that strikes me
is that createReader takes a file. But for acid, you need to pass the
directory because it needs to look for any relevant delta files. Acid also
requ