[
https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333004#comment-16333004
]
Jim Pivarski commented on PARQUET-1199:
---
Sorry, I misread the issue. Thi
[
https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332992#comment-16332992
]
Jim Pivarski commented on PARQUET-1199:
---
I happen to have appropriate test f
Optimizing compression ratios is one issue, optimizing page granularity for
column indexes is another, and a third issue is that there is per-page
metadata in the Parquet footer in Thrift format that has to be interpreted
before anything in the file can be accessed. Too many pages could slow down
f
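To make the footer point concrete: a Parquet file ends with the Thrift-encoded `FileMetaData`, followed by its length as a 4-byte little-endian integer and the magic bytes `PAR1`. A reader has to fetch and decode that block before it can seek to any column chunk. The sketch below only locates the block; decoding the Thrift compact protocol is out of scope, and `footer_span` is a hypothetical helper name.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"


def footer_span(buf: bytes) -> Tuple[int, int]:
    """Return (start, end) byte offsets of the Thrift-encoded footer.

    Assumes `buf` holds an entire Parquet file laid out as:
        PAR1 <column chunks...> <FileMetaData> <4-byte LE length> PAR1
    """
    if len(buf) < 12 or buf[:4] != MAGIC or buf[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    # The 4 bytes just before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    end = len(buf) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length runs past the start of the file")
    return start, end


# Synthetic file: placeholder bytes stand in for real chunks and metadata.
data = MAGIC + b"<chunks>" + b"<meta>" + struct.pack("<I", 6) + MAGIC
start, end = footer_span(data)
print(data[start:end])  # b'<meta>'
```

Every byte of per-page or per-chunk metadata placed in that footer adds to what must be read and decoded up front, which is the cost the comment is pointing at.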
I also have a use case that requires lists of structs, and I encountered that
limitation in pyarrow. Supporting just one level of nesting would enable a lot of HEP data.
I've worked out the logic of converting Parquet definition and repetition
levels into Arrow-style arrays:
https://github.com/diana-hep/oamap/blob/m
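As an illustration of this kind of conversion (a minimal sketch, not the oamap code linked above), the function below handles one level of nesting, `optional list<optional int32>` in the standard three-level LIST encoding, where the maximum definition level is 3 and the maximum repetition level is 1. It produces Arrow-style list offsets, a list validity vector, and a flattened child array. For a list of structs, each leaf column carries its own levels, but for required struct fields they yield the same offsets. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def levels_to_arrow_list(
    def_levels: List[int],
    rep_levels: List[int],
    values: List[int],
) -> Tuple[List[int], List[bool], List[Optional[int]]]:
    """Convert levels for `optional list<optional int32>` to Arrow-style buffers.

    Assumed level meanings (max_def = 3, max_rep = 1):
      def 0 -> list is null
      def 1 -> list is present but empty
      def 2 -> list slot present, element is null
      def 3 -> list slot present, element non-null (consumes one value)
    """
    offsets: List[int] = []
    list_valid: List[bool] = []
    elements: List[Optional[int]] = []
    value_iter = iter(values)  # only non-null values are stored in the page

    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # Repetition level 0 starts a new record, i.e. a new list.
            offsets.append(len(elements))
            list_valid.append(d >= 1)
        if d >= 2:
            # Levels 2 and 3 both occupy a slot inside the current list.
            elements.append(next(value_iter) if d == 3 else None)
    offsets.append(len(elements))  # closing offset of the last list
    return offsets, list_valid, elements


# Records: [1, None, 2], None, [], [3]
offsets, valid, elems = levels_to_arrow_list(
    def_levels=[3, 2, 3, 0, 1, 3],
    rep_levels=[0, 1, 1, 0, 0, 0],
    values=[1, 2, 3],
)
print(offsets)  # [0, 3, 3, 3, 4]
print(valid)    # [True, False, True, True]
print(elems)    # [1, None, 2, 3]
```

Note that a null list and an empty list both get a zero-length slice in the offsets; only the validity vector distinguishes them.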
[
https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327749#comment-16327749
]
Jim Pivarski commented on PARQUET-1084:
---
I just discovered something impor
[
https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184375#comment-16184375
]
Jim Pivarski commented on PARQUET-1084:
---
Yes, it's the memory map.
[
https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Pivarski updated PARQUET-1084:
--
Comment: was deleted
(was: If the file is opened as a memory map (I don't know t
[
https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184330#comment-16184330
]
Jim Pivarski commented on PARQUET-1084:
---
If the file is opened as a memory ma
Jim Pivarski created PARQUET-1084:
-
Summary: Parquet-C++ doesn't selectively read columns
Key: PARQUET-1084
URL: https://issues.apache.org/jira/browse/PARQUET-1084
Project: Parquet
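The memory-map point can be illustrated with Python's standard `mmap` module: mapping a file does not read it, and slicing the map makes the OS fault in only the pages covering the requested range (plus whatever read-ahead the OS chooses). That is the behavior a reader wants when it fetches one column chunk out of many. The offsets in the usage below are made-up stand-ins for a column chunk's file offset and compressed size, which a real reader would take from the footer; `read_column_chunk` is a hypothetical helper, not part of Parquet-C++.

```python
import mmap
import os
import tempfile


def read_column_chunk(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset`, via a read-only memory map.

    Pages overlapping [offset, offset + length) are faulted in on demand;
    apart from OS read-ahead, the rest of the file is not read.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Fake "file" with three regions; pretend the middle one is a column chunk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * 4096 + b"B" * 100 + b"C" * 4096)
        path = tmp.name
    try:
        print(read_column_chunk(path, 4096, 100) == b"B" * 100)  # True
    finally:
        os.remove(path)
```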
On Thu, Sep 22, 2016 at 7:18 PM, Julien Le Dem wrote:
> The sync next week collides with strata Conf in NY.
> I propose to move it to the following week.
Does that mean it would be pushed back to Thursday, October 3 at 10-11am PT?
> > I'd be happy to join on the 16th as well. I'm not far from the financial
> > district. I can book a room there.
> > Julien
> >
> > On Friday, August 5, 2016, Jim Pivarski wrote:
> >>
> >> Hi Wes (and the Parquet team),
> >>
> >>
16th but I
> can meet with you in the later part of the day, and anyone else is
> welcome to join to discuss.
>
> - Wes
>
> On Wed, Aug 3, 2016 at 10:04 AM, Julien Le Dem wrote:
> > Yes that would be another way to do it.
> > The Parquet-cpp/parquet-arrow integrat
Related question: could I get ROOT's complex events into Parquet files
without inventing a Logical Type Definition by converting them to Apache
Arrow data structures in memory, and then letting the Arrow-Parquet
integration write those data structures to files?
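For a sense of what "Arrow data structures in memory" means for event data, the sketch below flattens a list of events, each holding a variable-length list of particles, into the Arrow layout for `list<struct<px, py>>`: one offsets array for the lists and one flat child array per struct field. The field names `px` and `py` and the function name are hypothetical, and a real pipeline would build these buffers with an Arrow library rather than plain Python lists.

```python
from typing import Dict, List, Tuple

Event = List[Dict[str, float]]  # one event = a list of particles


def events_to_list_of_struct(
    events: List[Event], fields: Tuple[str, ...] = ("px", "py")
) -> Tuple[List[int], Dict[str, List[float]]]:
    """Flatten events into Arrow-style list<struct> buffers.

    Returns (offsets, children): event i's particles occupy indices
    offsets[i]:offsets[i + 1] in every child array.
    """
    offsets = [0]
    children: Dict[str, List[float]] = {name: [] for name in fields}
    for event in events:
        for particle in event:
            for name in fields:
                children[name].append(particle[name])
        offsets.append(offsets[-1] + len(event))
    return offsets, children


events = [
    [{"px": 1.0, "py": 2.0}, {"px": 3.0, "py": 4.0}],
    [],
    [{"px": 5.0, "py": 6.0}],
]
offsets, children = events_to_list_of_struct(events)
print(offsets)         # [0, 2, 2, 3]
print(children["px"])  # [1.0, 3.0, 5.0]
```

Because every struct field shares the same offsets, each field can be written as its own Parquet column without re-encoding the event boundaries per field.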
Arrow could provide side-benefits, s
Hi,
I'd like to use parquet-cpp for High Energy Physics (HEP) and possibly
contribute to the core to support that use-case, but I'm having trouble
determining the status of the C++ project.
Most HEP data is stored in the ROOT file format (
https://root.cern.ch/root/InputOutput.html), which repres