From a previous build, I got the data for these columns just fine from sqlline. So I think we can eliminate any display issues unless I am missing something?

- Rahul

On Fri, Nov 6, 2015 at 5:34 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Can you confirm if this is a display bug in sqlline or jdbc to string
> versus an actual data return?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 5:31 PM, rahul challapalli <challapallira...@gmail.com> wrote:
>
> > Jason,
> >
> > You were partly correct. We are not dropping records however we are
> > corrupting dictionary encoded binary columns. I got confused that we are
> > returning different records, but we are trimming (or returning unreadable
> > chars) some columns which are binary. I was able to reproduce with the
> > lineitem data set. I will raise a jira and I think this should be treated
> > critical. Thoughts?
> >
> > - Rahul
> >
> > On Fri, Nov 6, 2015 at 4:30 PM, rahul challapalli <challapallira...@gmail.com> wrote:
> >
> > > Jason,
> > >
> > > I missed that. Let me check whether we are dropping any records. I would
> > > be surprised if our regression tests missed that :)
> > >
> > > - Rahul
> > >
> > > On Fri, Nov 6, 2015 at 4:19 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >
> > >> Rahul,
> > >>
> > >> Thanks for working on a reproduction of the issue. You didn't actually
> > >> answer my first question, are you getting the same data out of the file,
> > >> just in a different order? It seems much more likely that we are dropping
> > >> some records at the beginning than reordering them somehow, although I
> > >> would have expected an error like this to be caught by the unit or
> > >> regression tests.
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >> On Fri, Nov 6, 2015 at 4:13 PM, rahul challapalli <challapallira...@gmail.com> wrote:
> > >>
> > >> > Thanks for your replies. The file is private and I will try to construct a
> > >> > file without sensitive data which can expose this behavior.
> > >> >
> > >> > - Rahul
> > >> >
> > >> > On Fri, Nov 6, 2015 at 3:45 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >> >
> > >> > > Is this a large or private parquet file? Can you share it to allow me to
> > >> > > debug the read path for it?
> > >> > >
> > >> > > On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >> > >
> > >> > > > The changes to parquet were not supposed to be functional at all. We had
> > >> > > > been maintaining our fork of parquet-mr to have a ByteBuffer based read and
> > >> > > > write path to reduce heap memory usage. The work done was just getting
> > >> > > > these changes merged back into parquet-mr and making corresponding changes
> > >> > > > in Drill to accommodate any interface modifications introduced since we
> > >> > > > last rebased (there were mostly just package renames). There were a lot of
> > >> > > > comments on the PR, and a decent amount of refactoring that was done to
> > >> > > > consolidate and otherwise clean up the code, but there shouldn't have been
> > >> > > > any changes to the behavior of the reader or writer.
> > >> > > >
> > >> > > > Are you getting all of the same data out if you read the whole file, just
> > >> > > > in a different order?
> > >> > > >
> > >> > > > On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <challapallira...@gmail.com> wrote:
> > >> > > >
> > >> > > >> parquet-meta command suggests that there is only one row group
> > >> > > >>
> > >> > > >> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > >> > > >>
> > >> > > >> > How many row groups?
> > >> > > >> >
> > >> > > >> > --
> > >> > > >> > Jacques Nadeau
> > >> > > >> > CTO and Co-Founder, Dremio
> > >> > > >> >
> > >> > > >> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <challapallira...@gmail.com> wrote:
> > >> > > >> >
> > >> > > >> > > Drillers,
> > >> > > >> > >
> > >> > > >> > > With the new parquet library update, can someone throw some light on the
> > >> > > >> > > order in which the records are read from a single parquet file?
> > >> > > >> > >
> > >> > > >> > > With the older library, when I run the below query on a single parquet
> > >> > > >> > > file, I used to get a set of records. Now after the parquet library update,
> > >> > > >> > > I am seeing a different set of records. Just wanted to understand what
> > >> > > >> > > specifically has changed.
> > >> > > >> > >
> > >> > > >> > > select * from `file.parquet` limit 5;
> > >> > > >> > >
> > >> > > >> > > - Rahul