[jira] [Commented] (PARQUET-1199) [C++] Support writing (and test reading) boolean values with RLE encoding

2018-01-19 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333004#comment-16333004 ] Jim Pivarski commented on PARQUET-1199: --- SORRY--- I misread the issue. Thi

[jira] [Commented] (PARQUET-1199) [C++] Support writing (and test reading) boolean values with RLE encoding

2018-01-19 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332992#comment-16332992 ] Jim Pivarski commented on PARQUET-1199: --- I happen to have appropriate test f

Re: Recommended page size controversy

2018-01-17 Thread Jim Pivarski
Optimizing compression ratios is one issue, optimizing page granularity for column indexes is another, and a third issue is that there is per-page metadata in the Parquet footer in Thrift format that has to be interpreted before anything in the file can be accessed. Too many pages could slow down f

Re: [PARQUET-CPP] Writing hierarchical schema to a parquet

2018-01-17 Thread Jim Pivarski
I also have a use-case that requires lists-of-structs and encountered that limitation in pyarrow. Just one level deep would enable a lot of HEP data. I've worked out the logic of converting Parquet definition and repetition levels into Arrow-style arrays: https://github.com/diana-hep/oamap/blob/m

[jira] [Commented] (PARQUET-1084) Parquet-C++ doesn't selectively read columns with mmap'ed files

2018-01-16 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327749#comment-16327749 ] Jim Pivarski commented on PARQUET-1084: --- I just discovered something impor

[jira] [Commented] (PARQUET-1084) Parquet-C++ doesn't selectively read columns

2017-09-28 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184375#comment-16184375 ] Jim Pivarski commented on PARQUET-1084: --- Yes, it's the memory map.

[jira] [Issue Comment Deleted] (PARQUET-1084) Parquet-C++ doesn't selectively read columns

2017-09-28 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Pivarski updated PARQUET-1084: -- Comment: was deleted (was: If the file is opened as a memory map (I don't know t

[jira] [Commented] (PARQUET-1084) Parquet-C++ doesn't selectively read columns

2017-09-28 Thread Jim Pivarski (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184330#comment-16184330 ] Jim Pivarski commented on PARQUET-1084: --- If the file is opened as a memory ma

[jira] [Created] (PARQUET-1084) Parquet-C++ doesn't selectively read columns

2017-08-30 Thread Jim Pivarski (JIRA)
Jim Pivarski created PARQUET-1084: - Summary: Parquet-C++ doesn't selectively read columns Key: PARQUET-1084 URL: https://issues.apache.org/jira/browse/PARQUET-1084 Project: Parquet

Re: Next sync

2016-09-26 Thread Jim Pivarski
On Thu, Sep 22, 2016 at 7:18 PM, Julien Le Dem wrote: > The sync next week collides with strata Conf in NY. > I propose to move it to the following week. Does that mean it would be pushed back to Thursday, October 3 at 10-11am PT?

Re: Parquet for High Energy Physics

2016-08-12 Thread Jim Pivarski
> I'd be happy to join on the 16th as well. I'm not far from the financial > > district. I can book a room there. > > Julien > > > > On Friday, August 5, 2016, Jim Pivarski wrote: > >> > >> Hi Wes (and the Parquet team), > >> > >>

Re: Parquet for High Energy Physics

2016-08-05 Thread Jim Pivarski
16th but I > can meet with you in the later part of the day, and anyone else is > welcome to join to discuss. > > - Wes > > On Wed, Aug 3, 2016 at 10:04 AM, Julien Le Dem wrote: > > Yes that would be another way to do it. > > The Parquet-cpp/parquet-arrow integrat

Re: Parquet for High Energy Physics

2016-08-03 Thread Jim Pivarski
Related question: could I get ROOT's complex events into Parquet files without inventing a Logical Type Definition by converting them to Apache Arrow data structures in memory, and then letting the Arrow-Parquet integration write those data structures to files? Arrow could provide side-benefits, s

Parquet for High Energy Physics

2016-08-03 Thread Jim Pivarski
Hi, I'd like to use parquet-cpp for High Energy Physics (HEP) and possibly contribute to the core to support that use-case, but I'm having trouble determining the status of the C++ project. Most HEP data is stored in the ROOT file format ( https://root.cern.ch/root/InputOutput.html), which repres