Re: Thoughts on Acid reader

2017-09-15 Thread Owen O'Malley
Yeah, I'd suggest adding to:
OrcFile.ReaderOptions: exposeAcidRowId(boolean); -- so that the returned schema includes the ACID row id
Reader.Options: setValidTransactions(TransactionList); -- apply transaction filtering
Then it will read a single file (or range using Reader.Options.range(l
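
To make the two proposed options concrete, here is a minimal, self-contained sketch of what they might mean for a reader. It does not use the ORC library: the class `AcidReadSketch`, the `Row` type, and the `read` method are hypothetical names invented for illustration, the row id is simplified to a single number, and the set of valid transactions stands in for the `TransactionList` mentioned above. Treat it as one possible reading of the proposal, not as the ORC implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AcidReadSketch {
    /** Hypothetical row: the transaction that wrote it, a simplified row id, and a payload. */
    public static final class Row {
        final long txn;
        final long rowId;
        final String value;

        public Row(long txn, long rowId, String value) {
            this.txn = txn;
            this.rowId = rowId;
            this.value = value;
        }
    }

    /**
     * Keeps only rows written by a transaction in validTxns (the assumed effect of
     * transaction filtering). When exposeRowId is true, each output entry is prefixed
     * with "txn:rowId:" (the assumed effect of exposing the ACID row id).
     */
    public static List<String> read(List<Row> rows, Set<Long> validTxns, boolean exposeRowId) {
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            if (!validTxns.contains(r.txn)) {
                continue; // written by a transaction this reader must not see
            }
            out.add(exposeRowId ? r.txn + ":" + r.rowId + ":" + r.value : r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 0, "a"),
                new Row(2, 0, "b"),
                new Row(3, 0, "c"));
        Set<Long> valid = new HashSet<>(Arrays.asList(1L, 3L));
        System.out.println(read(rows, valid, false));
        System.out.println(read(rows, valid, true));
    }
}
```

In this sketch the filtering is a per-row check against the valid set; how a real reader would evaluate a transaction list is not specified in the message above.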

Re: Thoughts on Acid reader

2017-09-14 Thread Gopal Vijayaraghavan
> For performance reasons, you prefer the second option that I rejected
> where users give a file and the system finds the deletes from there. I can
> buy that.

That's simpler at least to understand and debug; the logs from ORC alone are enough to find consistency issues. The rest of the det
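
The "give a file and the system finds the deletes" option can be sketched as a directory lookup relative to the data file. The sketch below assumes the Hive ACID layout in which a data file sits in a `delta_*` (or `base_*`) directory and delete events live in sibling `delete_delta_*` directories. It uses `java.nio.file` on a local filesystem as a stand-in for the Hadoop `FileSystem` API, and `DeleteDeltaFinder` and `findDeleteDeltas` are hypothetical names, not part of ORC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeleteDeltaFinder {
    /**
     * Given a data file such as <partition>/delta_0000005_0000005/bucket_00000,
     * returns the sibling delete_delta_* directories under <partition>, sorted by path.
     * The directory naming is an assumption about the on-disk layout.
     */
    public static List<Path> findDeleteDeltas(Path dataFile) {
        List<Path> result = new ArrayList<>();
        Path deltaDir = dataFile.toAbsolutePath().getParent();
        Path partitionDir = deltaDir == null ? null : deltaDir.getParent();
        if (partitionDir == null || !Files.isDirectory(partitionDir)) {
            return result; // nothing to search
        }
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(partitionDir, "delete_delta_*")) {
            for (Path p : dirs) {
                if (Files.isDirectory(p)) {
                    result.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: DeleteDeltaFinder <path-to-bucket-file>");
            return;
        }
        for (Path p : findDeleteDeltas(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

A real reader would presumably also restrict the result to delete deltas whose transaction range is visible to the reader; that step is left out here.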

Re: Thoughts on Acid reader

2017-09-14 Thread Alan Gates
For performance reasons, you prefer the second option that I rejected where users give a file and the system finds the deletes from there. I can buy that. As for passing splits rather than files, that makes sense but seems like a bigger change, since this should work with and without ACID, so I’

Re: Thoughts on Acid reader

2017-09-13 Thread Gopal Vijayaraghavan
> The first thing that strikes me is that createReader takes a file.
> But for acid, you need to pass the directory because it needs to look for any
> relevant delta files.

In the ACID 2.x impl, the InputFormat gets a directory, but a Reader should still be getting an individual file. In fact

Thoughts on Acid reader

2017-09-13 Thread Alan Gates
I’ve been looking at the OrcFile.createReader method and thinking about what I will need to do to read acid files. The first thing that strikes me is that createReader takes a file. But for acid, you need to pass the directory because it needs to look for any relevant delta files. Acid also requ