Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-24 Thread Daniel Lemire
Here are some realistic tabular data sets... https://github.com/lemire/RealisticTabularDataSets They are small by modern standards but they are also one GitHub clone away. - Daniel On Wed, Jan 24, 2018 at 2:26 PM, Wes McKinney wrote: > Thanks Ted. I will echo these

Re: Linking against parquet-cpp

2017-12-07 Thread Daniel Lemire
You might be missing a "-l" flag or two in addition to the "-I" flag. You might also need a "-L" flag. On Thu, Dec 7, 2017 at 1:34 PM, Renato Marroquín Mogrovejo < renatoj.marroq...@gmail.com> wrote: > Hi devs, > > I have also sent this question to the parquet mailing list, but I guess > this is

Re: Help in reconciling how arrow helps with columnar processing?

2017-12-02 Thread Daniel Lemire
I don't know the answer per se but my understanding is that Arrow enables computational kernels that can be highly optimized. I plan to do some work in this direction myself. - Daniel Hi, > > I wonder if anyone can comment on how does Apache Arrow accomplish, or help > accomplish the following,

[jira] [Comment Edited] (ARROW-273) Lists use unsigned offset vectors instead of signed (as defined in the spec)

2016-08-26 Thread Daniel Lemire (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438920#comment-15438920 ] Daniel Lemire edited comment on ARROW-273 at 8/26/16 1:23 PM: -- If the max