Re: Writing very large rowgroups to Apache Parquet

2020-07-11 Thread Micah Kornfield
This is an interesting idea. For S3 multipart uploads one might run into limitations pretty quickly (only 10k parts appear to be supported, and all but the last are expected to be at least 5 MB, if I read their docs correctly [1]). [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html On
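To make the limits mentioned above concrete, here is a rough arithmetic sketch. It assumes the figures quoted in the message (10,000 parts, at least 5 MB for every part but the last) and treats 5 MB as 5 MiB. The function names are hypothetical and are not part of any AWS SDK.

```python
# Limits as quoted in the message above (assumption: "5 MB" means 5 MiB).
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * 1024 * 1024  # applies to every part except the last


def max_object_size(part_size: int) -> int:
    """Largest object that fits in MAX_PARTS parts of uniform size."""
    return part_size * MAX_PARTS


def min_part_size_for(total_bytes: int) -> int:
    """Smallest uniform part size that keeps total_bytes within MAX_PARTS."""
    needed = -(-total_bytes // MAX_PARTS)  # ceiling division
    return max(needed, MIN_PART_SIZE)


if __name__ == "__main__":
    # With minimum-sized parts, the part-count limit is reached at about 48.8 GiB.
    print(max_object_size(MIN_PART_SIZE) / 1024**3)
    # A 100 GiB upload needs parts of roughly 10.24 MiB each.
    print(min_part_size_for(100 * 1024**3) / 1024**2)
```

If each uploaded part were a single column page, pages well under 5 MB would not qualify as parts on their own, which is presumably the limitation the message is pointing at.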

Re: Writing very large rowgroups to Apache Parquet

2020-07-11 Thread Jacques Nadeau
I'd suggest a new write pattern. Write the columns a page at a time to separate files, then use a second process to concatenate the columns and append the footer. Odds are you would do better than OS swapping and take memory requirements down to page size times field count. In S3 I believe you could
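The two-pass pattern described above might be sketched as follows. This is only an illustration with opaque byte strings standing in for encoded pages. It does not produce a valid Parquet file, the footer is a placeholder callback, and every name here (`spill_pages`, `concatenate`, `demo`) is hypothetical rather than an existing Arrow or Parquet API.

```python
import os
import shutil
import tempfile


def spill_pages(pages, n_columns, spill_dir):
    """First pass: append each column's encoded page to that column's own file.

    `pages` yields lists of n_columns byte strings (one page per column), so
    roughly one page per column is held in memory at any time.
    """
    paths = [os.path.join(spill_dir, f"col{i}.bin") for i in range(n_columns)]
    files = [open(p, "wb") for p in paths]
    try:
        for page in pages:
            for f, chunk in zip(files, page):
                f.write(chunk)
    finally:
        for f in files:
            f.close()
    return paths


def concatenate(paths, out_path, make_footer):
    """Second pass: copy the column files back to back, then append a footer."""
    offsets = []
    with open(out_path, "wb") as out:
        for p in paths:
            offsets.append(out.tell())  # where this column starts in the output
            with open(p, "rb") as src:
                shutil.copyfileobj(src, out)
        out.write(make_footer(offsets))  # placeholder, not real Parquet metadata
    return offsets


def demo():
    """Two columns, two pages each; returns the bytes of the combined file."""
    with tempfile.TemporaryDirectory() as d:
        pages = [[b"aa", b"b"], [b"cc", b"d"]]
        paths = spill_pages(iter(pages), 2, d)
        out_path = os.path.join(d, "combined.bin")
        concatenate(paths, out_path, lambda offsets: b"F")
        with open(out_path, "rb") as f:
            return f.read()
```

The second pass only streams bytes from disk, so its memory use is bounded by the copy buffer rather than by the row group size.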

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
For reference, the doc (from eight years ago) I meant to link in my initial message was: https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote: > On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau > wrote: > > > > On

Re: [DISCUSS] [C++] custom allocator for large objects

2020-07-11 Thread Wes McKinney
On Sat, Jul 11, 2020 at 4:10 AM Rémi Dettai wrote: > > Hi Micah, > > Thanks for the answer ! But it seems your email got split in half in some > way ;-) > > My use case mainly focuses on aggregations (with group by), and after > fighting quite a bit with the allocators I ended up thinking that it

Re: language independent representation of filter expressions

2020-07-11 Thread Wes McKinney
On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau wrote: > > On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote: > > > I would also be interested in having a reusable serialized format for > > filter- and projection-like expressions. I think trying to go so far > > as full logical query plans

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote: > I would also be interested in having a reusable serialized format for > filter- and projection-like expressions. I think trying to go so far > as full logical query plans suitable for building a SQL engine is > perhaps a bit too far but we

Re: Status of Rust Integration Testing

2020-07-11 Thread Neville Dipale
Hi Micah, Yes, those files are read correctly. We test against them. I was trying to generate gold files based on 0.17.1, so I could debug against those; I'll work on that in the coming days. On Sat, 11 Jul 2020, 05:58 Micah Kornfield, wrote: > Hi Neville, > Thanks for the update. One

[NIGHTLY] Arrow Build Report for Job nightly-2020-07-11-0

2020-07-11 Thread Crossbow
Arrow Build Report for Job nightly-2020-07-11-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0 Failed Tasks: - centos-8-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0-travis-centos-8-aarch64 -

Re: [DISCUSS] [C++] custom allocator for large objects

2020-07-11 Thread Rémi Dettai
Hi Micah, Thanks for the answer ! But it seems your email got split in half in some way ;-) My use case mainly focuses on aggregations (with group by), and after fighting quite a bit with the allocators I ended up thinking that it might not be worth it materializing the raw data as arrow tables