Re: [DISCUSS] Moving Variant to Parquet Details

2024-09-10 Thread Daniel Weeks
I feel like it's reasonable to put the specification in the 'parquet-format' repo and reduce the confusion that would be caused by having specs split across repos. As for the implementations, we already know there will be multiple and some will be in languages where there is no current repo in the

Re: [DISCUSS] Adopt Variant Spec from Spark?

2024-08-23 Thread Daniel Weeks
Julien, I think there's interest in supporting multiple language implementations for variant (java/scala/cpp/rust/etc), so we might want to consider having a 'parquet-variant' repository to house the spec and language implementations. That might also help to keep them aligned, but open to other s

Re: [DISCUSS] Adopt Variant Spec from Spark?

2024-08-23 Thread Daniel Weeks
+1 On Fri, Aug 23, 2024 at 12:54 PM Ryan Blue wrote: > +1 > > On Fri, Aug 23, 2024 at 12:30 PM Jacques Nadeau > wrote: > > > +1 > > > > On Fri, Aug 23, 2024 at 8:51 AM Nong Li wrote: > > > > > +1. > > > > > > On Fri, Aug 23, 2024 at 12:57 PM Jan Finis wrote: > > > > > > > I would also appreci

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-17 Thread Daniel Weeks
+1 agree, much cleaner naming -Dan On Fri, May 17, 2024 at 8:46 AM Chao Sun wrote: > +1 too. The name has been confusing for a very long time. > > On Fri, May 17, 2024 at 8:40 AM Fokko Driesprong wrote: > > > +1 - I think it is much clearer to anyone. > > > > GitHub will handle all the redirec

Re: [VOTE][Format] Add Float16 type to specification

2023-10-06 Thread Daniel Weeks
+1 On Fri, Oct 6, 2023, 8:33 PM Gang Wu wrote: > +1 (non-binding) > > Best, > Gang > > On Sat, Oct 7, 2023 at 11:05 AM Micah Kornfield > wrote: > > > I'm +1 (non-binding) for the proposal in general. > > > > I do have a concern that we should be implementing > > https://issues.apache.org/jira/b

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Daniel Weeks
+1 (binding) Verified sigs, sums, build and test. On Mon, Jan 28, 2019 at 2:08 PM Ryan Blue wrote: > Hi everyone, > > I propose the following RC to be released as official Apache Parquet Java > 1.10.1 release. > > The commit id is a89df8f9932b6ef6633d06069e50c9b7970bebd1 > >- This correspon

Re: [VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-16 Thread Daniel Weeks
+1 (binding) Checked sigs, built and tested. On Thu, Apr 12, 2018 at 2:48 PM, Julien Le Dem wrote: > +1 (binding) > checked signature > ran build and tests > > On Mon, Apr 9, 2018 at 8:44 AM, Ryan Blue > wrote: > > > +1 (binding) > > > > Checked this for the last vote. > > > > On Mon, Apr 9,

Re: [VOTE] Accept donation of Parquet Rust implementation

2018-03-16 Thread Daniel Weeks
+1 On Tue, Mar 6, 2018 at 12:32 PM, Jacques Nadeau wrote: > +1 (non-binding) > > On Tue, Mar 6, 2018 at 12:31 PM, Uwe L. Korn wrote: > > > +1 > > > > On Tue, Mar 6, 2018, at 9:29 PM, Ryan Blue wrote: > > > +1 > > > > > > Thanks for starting a vote, Wes! > > > > > > On Tue, Mar 6, 2018 at 12:24

Re: [VOTE] Release Apache Parquet Format 2.4.0 RC2

2017-10-19 Thread Daniel Weeks
+1 Verified checksum, signature, build On Wed, Oct 18, 2017 at 9:37 AM, Ryan Blue wrote: > +1 > > Verified signature, checksums, tested. Also tested Parquet MR with PR #430 > and everything is passing. > > On Tue, Oct 17, 2017 at 12:33 PM, Ryan Bl

Re: [VOTE] Release Apache Parquet Format 2.4.0 RC1

2017-10-17 Thread Daniel Weeks
+1 Reviewed, Built, Verified Checksums On Tue, Oct 17, 2017 at 9:13 AM, Ryan Blue wrote: > +1 > > Checked build, RAT, checksums. > > On Mon, Oct 16, 2017 at 5:17 PM, Ryan Blue wrote: > > > Hi everyone, > > > > I propose the following RC to be released as official Apache Parquet > > Format 2.4.

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Daniel Weeks
+1 checked sums, built, tested On Mon, Jan 23, 2017 at 9:58 AM, Ryan Blue wrote: > Gabor, that md5 matches what I get. Are you sure you used the right file? > It isn’t the same format that md5sum produces, but if you check the octets > the hash matches. > > [blue@work Downloads]$ md5sum apache-
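Ryan's point is that a published checksum can be laid out differently from `md5sum` output while still describing the same digest, so verification should compare octets rather than text. The sketch below shows one way to do that with Python's standard `hashlib`. The two accepted layouts (`md5sum`-style "<hex>  <filename>" and a grouped, upper-case "<filename>: 1A2B 3C4D ..." style, possibly wrapped across lines) are assumptions for illustration, not a description of the actual Parquet release files, and the helper names are hypothetical.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lower-case hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large release tarball is not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def published_md5(text: str) -> str:
    """Extract the hex digest from a published checksum file's contents.

    Assumed (hypothetical) layouts:
      * md5sum style:  "<hex>  <filename>"
      * grouped style: "<filename>: 1A2B 3C4D ..." (possibly wrapped over lines)
    """
    if ":" in text:
        # Grouped style: everything after the first colon is hex digits plus whitespace.
        octets = text.split(":", 1)[1]
    else:
        # md5sum style: the digest is the first whitespace-separated token.
        octets = text.split()[0]
    # Drop spaces/newlines and normalize case so only the octets are compared.
    return "".join(octets.split()).lower()


def md5_matches(path: str, checksum_text: str) -> bool:
    """True if the file's MD5 equals the published digest, ignoring layout."""
    return file_md5(path) == published_md5(checksum_text)
```

Comparing normalized digests rather than raw lines is what makes a "format mismatch" like the one in this thread harmless: both layouts reduce to the same 32 hex characters.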

Re: [VOTE] Release Apache Parquet MR 1.9.0 RC2

2016-10-24 Thread Daniel Weeks
+1 build and test On Thu, Oct 20, 2016 at 4:14 PM, Julien Le Dem wrote: > +1 > > Checked signature, build and test > > On Thu, Oct 20, 2016 at 3:47 PM, Ryan Blue wrote: > > > +1 > > > > Checked signature, checksums, build, and tests. > > > > On Tue, Oct 18, 2016 at 6:22 PM, Ryan Blue wrote: >

[jira] [Comment Edited] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-07-06 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364563#comment-15364563 ] Daniel Weeks edited comment on PARQUET-400 at 7/6/16 4:3

[jira] [Comment Edited] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-07-06 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364563#comment-15364563 ] Daniel Weeks edited comment on PARQUET-400 at 7/6/16 4:2

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-07-06 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364563#comment-15364563 ] Daniel Weeks commented on PARQUET-400: -- @ferdinand This is the JIRA tracking

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-05-09 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277348#comment-15277348 ] Daniel Weeks commented on PARQUET-400: -- [~pnarang] We uncovered that this i

[jira] [Resolved] (PARQUET-571) Fix potential leak in ParquetFileReader.close()

2016-03-25 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks resolved PARQUET-571. -- Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request 338 [https

[jira] [Comment Edited] (PARQUET-384) Add Dictionary Based Filtering to Filter2 API

2016-03-09 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187577#comment-15187577 ] Daniel Weeks edited comment on PARQUET-384 at 3/9/16 6:2

[jira] [Commented] (PARQUET-384) Add Dictionary Based Filtering to Filter2 API

2016-03-09 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187577#comment-15187577 ] Daniel Weeks commented on PARQUET-384: -- This now includes the api for rea

[jira] [Commented] (PARQUET-427) Push predicates into the whole read path

2016-02-12 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145671#comment-15145671 ] Daniel Weeks commented on PARQUET-427: -- [~proflin] Thanks for putting together

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-02-05 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134909#comment-15134909 ] Daniel Weeks commented on PARQUET-400: -- [~jaltekruse] If you have a chance, c

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-01-27 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120179#comment-15120179 ] Daniel Weeks commented on PARQUET-400: -- [~jaltekruse] I've made a pull r

[jira] [Updated] (PARQUET-397) Pig Predicate Pushdown using Filter2 API

2016-01-27 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks updated PARQUET-397: - Summary: Pig Predicate Pushdown using Filter2 API (was: Add Predicate Pushdown using Filter2

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2016-01-04 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081400#comment-15081400 ] Daniel Weeks commented on PARQUET-400: -- [~jaltekruse] We use EMR for our ha

[jira] [Commented] (PARQUET-409) InternalParquetRecordWriter doesn't use min/max row counts

2015-12-16 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060774#comment-15060774 ] Daniel Weeks commented on PARQUET-409: -- I definitely think it's worth expo

Re: Parquet sync up

2015-12-14 Thread Daniel Weeks
Works for me as well. On Mon, Dec 14, 2015 at 12:40 PM, Reuben Kuhnert < reuben.kuhn...@cloudera.com> wrote: > I can make that. Thanks > > On Mon, Dec 14, 2015 at 12:27 PM, Ryan Blue wrote: > > > Works for me. > > > > > > On 12/12/2015 03:20 PM, Julien Le Dem wrote: > > > >> The next parquet syn

[jira] [Commented] (PARQUET-400) Error reading some files after PARQUET-77 bytebuffer read path

2015-12-08 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047074#comment-15047074 ] Daniel Weeks commented on PARQUET-400: -- [~jaltekruse] Thanks for creating the

Re: Can't read some parquet files after ByteBuffer Patch

2015-12-07 Thread Daniel Weeks
ed our changes into master. > > > > Can you try to generate data similar to the private dataset the produces > > the issue? If you are having trouble reproducing could you share the data > > types and encodings that are being used in the file and I can try to > >

Can't read some parquet files after ByteBuffer Patch

2015-12-04 Thread Daniel Weeks
Jason or Julien, Just wanted to see if you or anyone else has run into problems reading files after the ByteBuffer patch. I've been running into issues and have narrowed it down to the ByteBuffer commit using a small repro file (written with 1.6.0, unfortunately can't share the data). It doesn't

Re: [DISCUSS] Weekly triage rotation

2015-12-01 Thread Daniel Weeks
I'm in as well. On Mon, Nov 23, 2015 at 10:13 PM, Julien Le Dem wrote: > This sounds good to me. > I'd sign up for triaging. > The more the merrier! > > On Mon, Nov 23, 2015 at 12:23 PM, Ryan Blue wrote: > > > Hi everyone, > > > > In the Parquet sync-up today, we were talking about how to keep

[jira] [Created] (PARQUET-397) Add Predicate Pushdown using Filter2 API

2015-12-01 Thread Daniel Weeks (JIRA)
Daniel Weeks created PARQUET-397: Summary: Add Predicate Pushdown using Filter2 API Key: PARQUET-397 URL: https://issues.apache.org/jira/browse/PARQUET-397 Project: Parquet Issue Type

[jira] [Assigned] (PARQUET-397) Add Predicate Pushdown using Filter2 API

2015-12-01 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks reassigned PARQUET-397: Assignee: Daniel Weeks > Add Predicate Pushdown using Filter2

[jira] [Commented] (PARQUET-384) Add Dictionary Based Filtering to Filter2 API

2015-11-04 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990573#comment-14990573 ] Daniel Weeks commented on PARQUET-384: -- https://github.com/apache/parquet-mr/

Re: Hive+Parquet : auto-type promotion

2015-10-09 Thread Daniel Weeks
I believe Pig has this already: https://issues.apache.org/jira/browse/PARQUET-2 -Dan On Fri, Oct 9, 2015 at 9:59 AM, Sergio Pena wrote: > Hey Ryan, Mohammad > > I just left a comment on d...@hive.apache.org regarding this. It may be > possible to do such promotion in Hive. I don't know if Pig d

PARQUET-384 Adding Dictionaries to Filter API

2015-09-24 Thread Daniel Weeks
Alex, At the sync up, you asked about why we didn't add dictionary predicate evaluation into the filter API and for the most part it was driven by how Presto does PPD. They prefer to do it via their TupleDomain implementation which is pretty efficient for normal files and also works well with the
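The idea behind dictionary-based filtering (the subject of PARQUET-384) can be sketched roughly as follows: if every data page in a row group's column chunk is dictionary-encoded, the dictionary lists every distinct value in that chunk, so an equality or IN predicate whose literals are all absent from it proves the row group can be skipped. This is a minimal illustration, not the parquet-mr Filter2 API: `ColumnChunkInfo`, `can_drop_eq`, `can_drop_in`, and `row_groups_to_read` are hypothetical names, and it assumes non-null literals (dictionaries do not store nulls).

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, List, Optional, Set


@dataclass
class ColumnChunkInfo:
    """Hypothetical per-row-group metadata for a single column."""
    dictionary: Optional[Set[Hashable]]  # None if the chunk has no dictionary page
    fully_dictionary_encoded: bool       # False if any data page fell back to plain encoding


def _dictionary_is_complete(chunk: ColumnChunkInfo) -> bool:
    # Only a dictionary covering every page can prove that a value is absent.
    return chunk.dictionary is not None and chunk.fully_dictionary_encoded


def can_drop_eq(chunk: ColumnChunkInfo, value: Hashable) -> bool:
    """True only when the dictionary proves no row equals `value`."""
    return _dictionary_is_complete(chunk) and value not in chunk.dictionary


def can_drop_in(chunk: ColumnChunkInfo, values: Iterable[Hashable]) -> bool:
    """True only when the dictionary proves no row matches any of `values`."""
    return _dictionary_is_complete(chunk) and chunk.dictionary.isdisjoint(values)


def row_groups_to_read(chunks: List[ColumnChunkInfo], value: Hashable) -> List[int]:
    """Indexes of row groups that may contain `value` and therefore must be read."""
    return [i for i, c in enumerate(chunks) if not can_drop_eq(c, value)]
```

The check is conservative by design: when a chunk has no dictionary or mixes in plain-encoded pages, it keeps the row group, so the filter can only skip data, never return wrong results.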

[jira] [Created] (PARQUET-384) Add Dictionary Based Filtering to Filter2 API

2015-09-24 Thread Daniel Weeks (JIRA)
Daniel Weeks created PARQUET-384: Summary: Add Dictionary Based Filtering to Filter2 API Key: PARQUET-384 URL: https://issues.apache.org/jira/browse/PARQUET-384 Project: Parquet Issue Type

[jira] [Commented] (PARQUET-344) Limit the number of rows per block and per split

2015-08-24 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710193#comment-14710193 ] Daniel Weeks commented on PARQUET-344: -- For bullet #1, I could see it being ab

[jira] [Commented] (PARQUET-344) Limit the number of rows per block and per split

2015-08-24 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709669#comment-14709669 ] Daniel Weeks commented on PARQUET-344: -- [~rdblue] I have seen this problem

[jira] [Updated] (PARQUET-99) Large rows cause unnecessary OOM exceptions

2015-07-24 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks updated PARQUET-99: Affects Version/s: 1.8.1 1.7.0 1.8.0 > Large r

[jira] [Commented] (PARQUET-99) Large rows cause unnecessary OOM exceptions

2015-07-24 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641263#comment-14641263 ] Daniel Weeks commented on PARQUET-99: - Pull request: https://github.com/ap

Re: Issue while reading Parquet file in Hive

2015-07-21 Thread Daniel Weeks
bove exception. > > So please help me to solve this problem. > > Currently I am using > Hive 1.1.0-cdh5.4.2. >Cascading 2.5.1 >parquet-format-2.2.0 > > > Thanks, > Santlal Gutpa > > > -Original Message- > From: Daniel Weeks [mailto:dwe

Re: Issue while reading Parquet file in Hive

2015-07-17 Thread Daniel Weeks
Santlal, It might just be as simple as the storage format for your Hive table. I notice you say: hive> create table timestampTest (timestampField timestamp); But this should be: hive> create table timestampTest (timestampField timestamp) stored as parquet; Hive is probably processing the file

[jira] [Commented] (PARQUET-99) Large rows cause unnecessary OOM exceptions

2015-07-16 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630228#comment-14630228 ] Daniel Weeks commented on PARQUET-99: - This has been affecting us pretty regul

[jira] [Assigned] (PARQUET-99) Large rows cause unnecessary OOM exceptions

2015-07-16 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks reassigned PARQUET-99: --- Assignee: Daniel Weeks > Large rows cause unnecessary OOM excepti

[jira] [Resolved] (PARQUET-100) provide an option in parquet-pig to avoid reading footers in client side

2015-07-13 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks resolved PARQUET-100. -- Resolution: Won't Fix Fix Version/s: 1.6.0 This is fixed assuming you provide a s

[jira] [Commented] (PARQUET-100) provide an option in parquet-pig to avoid reading footers in client side

2015-07-13 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625433#comment-14625433 ] Daniel Weeks commented on PARQUET-100: -- PARQUET-139 fixed this. Let's c

[jira] [Updated] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages

2015-06-12 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks updated PARQUET-299: - Assignee: (was: Daniel Weeks) > [Vectorized Reader] ColumnVector length should be in te

[jira] [Assigned] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages

2015-06-12 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks reassigned PARQUET-299: Assignee: Daniel Weeks > [Vectorized Reader] ColumnVector length should be in terms

[jira] [Commented] (PARQUET-222) parquet writer runs into OOM during writing when calling DataFrame.saveAsParquetFile in Spark SQL

2015-06-12 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583752#comment-14583752 ] Daniel Weeks commented on PARQUET-222: -- [~lian cheng] For extremely wide ta

[jira] [Commented] (PARQUET-266) Add support for lists of primitives to Pig schema converter

2015-06-05 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574895#comment-14574895 ] Daniel Weeks commented on PARQUET-266: -- Thanks [~ccrolf] for the contribu

[jira] [Resolved] (PARQUET-266) Add support for lists of primitives to Pig schema converter

2015-06-05 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Weeks resolved PARQUET-266. -- Resolution: Fixed Fix Version/s: 1.8.0 Issue resolved by pull request 209 [https

[jira] [Commented] (PARQUET-222) parquet writer runs into OOM during writing when calling DataFrame.saveAsParquetFile in Spark SQL

2015-06-05 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574880#comment-14574880 ] Daniel Weeks commented on PARQUET-222: -- I believe you are running into the pro

[jira] [Commented] (PARQUET-266) Add support for lists of primitives to Pig schema converter

2015-06-01 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568139#comment-14568139 ] Daniel Weeks commented on PARQUET-266: -- Patch looks good to me (reviewed, bu

[jira] [Commented] (PARQUET-266) Add support for lists of primitives to Pig schema converter

2015-05-27 Thread Daniel Weeks (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561928#comment-14561928 ] Daniel Weeks commented on PARQUET-266: -- I can take a look. > Add support fo