Re: Order of records read in a parquet file

Jason Altekruse Fri, 06 Nov 2015 15:38:53 -0800

The changes to parquet were not supposed to alter functionality at all. We
had been maintaining our own fork of parquet-mr with a ByteBuffer-based read
and write path to reduce heap memory usage. The work done was just getting
these changes merged back into parquet-mr and making corresponding changes
in Drill to accommodate any interface modifications introduced since we
last rebased (these were mostly just package renames). There were a lot of
comments on the PR, and a decent amount of refactoring that was done to
consolidate and otherwise clean up the code, but there shouldn't have been
any changes to the behavior of the reader or writer.


Are you getting all of the same data out if you read the whole file, just
in a different order?
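
One way to check this, as a minimal sketch: export the full result of the
query from the old and the new build and compare the two as multisets, so
that row order is ignored but duplicates still count. The function name and
the sample rows below are hypothetical placeholders, not output from Drill.

```python
from collections import Counter


def same_rows_ignoring_order(rows_a, rows_b):
    """Return True if both result sets hold the same rows with the same
    multiplicity, regardless of the order in which they were read."""
    # Each row is turned into a tuple so it can be used as a Counter key;
    # this assumes the column values themselves are hashable.
    return Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b))


# Hypothetical rows standing in for full-file results from the old and
# new parquet library builds.
old_rows = [(1, "a"), (2, "b"), (3, "c")]
new_rows = [(3, "c"), (1, "a"), (2, "b")]

print(same_rows_ignoring_order(old_rows, new_rows))
```

If the comparison holds for the whole file, the difference seen with
`limit 5` is only one of ordering; a query without an ORDER BY clause does
not promise any particular row order.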

On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <
challapallira...@gmail.com> wrote:

> parquet-meta command suggests that there is only one row group
>
> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
>
> > How many row groups?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <
> > challapallira...@gmail.com> wrote:
> >
> > > Drillers,
> > >
> > > With the new parquet library update, can someone throw some light on the
> > > order in which the records are read from a single parquet file?
> > >
> > > With the older library, when I run the below query on a single parquet
> > > file, I used to get a set of records. Now after the parquet library update,
> > > I am seeing a different set of records. Just wanted to understand what
> > > specifically has changed.
> > >
> > > select * from `file.parquet` limit 5;
> > >
> > > - Rahul
> > >
> >
>
