[jira] [Assigned] (PARQUET-334) UT TestSummary failed with "java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null" when Pig >=0.15

2015-12-07 Thread Thomas Friedrich (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich reassigned PARQUET-334: Assignee: Thomas Friedrich > UT TestSummary failed with "java.lang.RuntimeExceptio

Re: Can't read some parquet files after ByteBuffer Patch

2015-12-07 Thread Jason Altekruse
I got the file, I should have time to look at it today. On Mon, Dec 7, 2015 at 3:05 PM, Daniel Weeks wrote: > I sent Jason a file that can reproduce the issue with just 1K lines in it. > > If you want, I can open a JIRA and attach the file. > > 5a45ae3b1deb5117cb9e9a13141eeab1e9ad3d71 Can read t

Re: Can't read some parquet files after ByteBuffer Patch

2015-12-07 Thread Daniel Weeks
I sent Jason a file that can reproduce the issue with just 1K lines in it. If you want, I can open a JIRA and attach the file. 5a45ae3b1deb5117cb9e9a13141eeab1e9ad3d71 Can read the file without issue 6b605a4ea05b66e1a6bf843353abcb4834a4ced8 (bytebuffer) cannot read the file -Dan On Mon, Dec 7,

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Ryan Blue
On 12/07/2015 02:23 PM, Stephen Bly wrote: Thank you all for your help, I think I know what I need to do now. At some point maybe I can contribute to the Parquet project to allow Hive to access columns by looking at the stored ID field instead of the field name. That would be great! Let us kn

Re: Can't read some parquet files after ByteBuffer Patch

2015-12-07 Thread Julien Le Dem
In the meantime if you have the stacktrace for this error that would help too. On Fri, Dec 4, 2015 at 1:59 PM, Jason Altekruse wrote: > I assume that the buffer that we are giving to thrift doesn't have the > header in it at the expected position. We hadn't seen this error in any of > our regres

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Stephen Bly
To follow up, after discussing with more senior engineers at my company: I misread what Julien said with regard to accessing by column index. I thought this was equivalent to Thrift ID, but now I understand what he actually meant, and that solution is unfortunately not viable for our use case. I

Re: parquet file doubts

2015-12-07 Thread Julien Le Dem
Thanks Cheng! Here is a useful blog post: http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/ about 2. On Sun, Dec 6, 2015 at 9:52 PM, Cheng Lian wrote: > cc parquet-dev list (it would be nice to always do so for these general > questions.) > > Cheng > > On 12/6/15 3:10 PM, Shus

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Ryan Blue
On 12/07/2015 11:21 AM, Stephen Bly wrote: Thank you all for your detailed responses. Let me make sure I have this right: I can write the Parquet file in any way I want, including using our own custom Thrift code. Hive does not care, because it will use the schema stored in the Parquet file t

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Stephen Bly
Thank you all for your detailed responses. Let me make sure I have this right: I can write the Parquet file in any way I want, including using our own custom Thrift code. Hive does not care, because it will use the schema stored in the Parquet file together with the schema I specified when crea

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Julien Le Dem
(CC'ing some folks who may have more context) There's a setting to make the column lookup by index instead (ignoring names in the file): https://github.com/apache/parquet-mr/search?utf8=%E2%9C%93&q=PARQUET_COLUMN_INDEX_ACCESS (remember that the serde code has moved to hive itself so parquet-hive is

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Dmitriy Ryaboy
Hi Stephen, I'm not sure I follow your scenario. So you have your own Thrift generator; you then write (outside of Hive, using your own input/output format) the Thrift objects out into HDFS using your own custom Parquet output format. You have two questions: 1. Is there anything special you need to

Re: Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Ryan Blue
Hi Stephen, Good questions. I think there is a slight misunderstanding with some of the components, so I'll go over how they relate to one another first. There are several different object models -- ways of working with data in memory -- including Thrift, Hive, and Avro (to name just one othe

Re: Add support for repeated columns in the filter2 API [PARQUET-34]

2015-12-07 Thread Ryan Blue
I've been using the two interchangeably. It depends on the order. If you have an element e and a set S, then you can test if "e in S" or "S contains e". They are both the same operation, but expressed in terms of the element first or the set first. rb On 12/05/2015 08:25 AM, Flavio Pompermaier

Creating Hive tables on Parquet data with a custom Thrift generator

2015-12-07 Thread Stephen Bly
Greetings Parquet experts. I am in need of a little help. I am a (very) Junior developer at my company, and I have been tasked with adding the Parquet file format to our Hadoop ecosystem. Our main use case is in creating Hive tables on Parquet data and querying them. As you know, Hive can creat

[jira] [Created] (PARQUET-399) website should list mailing list adresses

2015-12-07 Thread JIRA
André Kelpe created PARQUET-399: --- Summary: website should list mailing list adresses Key: PARQUET-399 URL: https://issues.apache.org/jira/browse/PARQUET-399 Project: Parquet Issue Type: Improve