Thanks for the reply, Xu. Gwen gave me information about how to join the parquet-dev mailing list ( http://parquet.incubator.apache.org/community/ ).

There are no issues with the Parquet format itself. It seems there is no ready-made API to read columns directly into memory. I have asked the question here too:
http://stackoverflow.com/questions/25334466/parquet-read-particular-columns-into-memory

I also posted a question to the parquet-dev mailing list about how the different metadata objects relate to each other. I think the metadata relationships diagram on the parquet-format GitHub page ( https://github.com/Parquet/parquet-format ) is out of date with what is in the code. I do not see any BlockMetaData in the diagram, ColumnMetaData seems to have been renamed to ColumnChunkMetaData, and some of the variable names have changed.

Currently I am trying to figure out how to get the total number of rows for a given column, and whether nulls are included in that count. I need this so that I can read entire columns into memory (I have to allocate the exact amount of space) and then index them by the primary key for fast in-memory lookups.

Any help would be appreciated.

Regards,
~Pratik

On Wed, Aug 27, 2014 at 5:46 PM, Xu, Qian A <[email protected]> wrote:

> Hi Pratik,
>
> What are the issues with Parquet format? Could you please tell some
> details?
>
> Best regards
> --Qian Xu (Stanley)
>
> *From:* pratik khadloya [mailto:[email protected]]
> *Sent:* Thursday, August 28, 2014 1:44 AM
> *To:* [email protected]
> *Subject:* Join parquet-dev google group?
>
> I know this is not related, but i am facing some issues with the Parquet
> format and have questions for the parquet-dev community. But i am unable to
> join the parquet-dev googlegroup as i do not see a subscribe link over
> there. Can anyone please show me how to join the parquet-dev google group?
>
> Thanks,
> Pratik
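
P.S. To make the in-memory lookup plan in my message above concrete, here is a rough sketch in plain Java. It does not use any Parquet API: RowGroupInfo and its rowCount field are hypothetical stand-ins for whatever per-row-group count the file footer exposes, and the class and method names are made up for illustration. It also assumes that a null still occupies one slot per row, which is exactly the part I have not confirmed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnIndexSketch {

    // Hypothetical stand-in for per-row-group footer metadata.
    // This is NOT a parquet-mr class.
    public static final class RowGroupInfo {
        public final long rowCount;

        public RowGroupInfo(long rowCount) {
            this.rowCount = rowCount;
        }
    }

    // Sum the row counts of all row groups to get the exact array size.
    // Throws ArithmeticException if the total does not fit in an int.
    public static int totalRows(List<RowGroupInfo> groups) {
        long total = 0;
        for (RowGroupInfo g : groups) {
            total += g.rowCount;
        }
        return Math.toIntExact(total);
    }

    // Map each primary key to its position in the column arrays.
    public static Map<Long, Integer> indexByKey(long[] keys) {
        Map<Long, Integer> index = new HashMap<>(Math.max(16, keys.length * 2));
        for (int i = 0; i < keys.length; i++) {
            index.put(keys[i], i);
        }
        return index;
    }

    public static void main(String[] args) {
        List<RowGroupInfo> groups =
                Arrays.asList(new RowGroupInfo(2), new RowGroupInfo(1));

        // Allocate exactly one slot per row (assumption: nulls take a slot too).
        int n = totalRows(groups);
        long[] keys = new long[n];
        Double[] prices = new Double[n]; // a null entry marks a missing value

        // Made-up rows standing in for values read from the file.
        keys[0] = 42L; prices[0] = 9.5;
        keys[1] = 7L;  prices[1] = null;
        keys[2] = 19L; prices[2] = 3.25;

        // Look up a value by primary key through the index.
        Map<Long, Integer> index = indexByKey(keys);
        System.out.println(prices[index.get(19L)]);
    }
}
```

If it turns out that nulls are not counted, the arrays would have to be sized from the row count rather than from a per-column value count, with the null positions filled in separately.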
