Wes McKinney created ARROW-2032:
---
Summary: [C++] ORC ep installs on each call to ninja build (even
if no work to do)
Key: ARROW-2032
URL: https://issues.apache.org/jira/browse/ARROW-2032
Project: Apache
Jim Crist created ARROW-2031:
Summary: HadoopFileSystem isn't pickleable
Key: ARROW-2031
URL: https://issues.apache.org/jira/browse/ARROW-2031
Project: Apache Arrow
Issue Type: Improvement
Phillip Cloud created ARROW-2030:
Summary: NativeFile's Attributes are not exposed in child classes
without explicit initialization
Key: ARROW-2030
URL: https://issues.apache.org/jira/browse/ARROW-2030
Jim Crist created ARROW-2029:
Summary: [Python] Program crash on `HdfsFile.tell` if file is
closed
Key: ARROW-2029
URL: https://issues.apache.org/jira/browse/ARROW-2029
Project: Apache Arrow
Iss
Thank you Wes for cleaning most of them up!
We are now down to 3. One of them has an active discussion; we will probably
move this to JIRA soon. The next one, about time drifts, is something I think I have also
seen with a turbodbc user (independent of Arrow), so I'll probably look a bit
deeper into that
Uwe L. Korn created ARROW-2028:
--
Summary: [Python] extra_cmake_args needs to be passed through
shlex.split
Key: ARROW-2028
URL: https://issues.apache.org/jira/browse/ARROW-2028
Project: Apache Arrow
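A minimal sketch of why ARROW-2028 asks for shlex.split, using only the Python standard library. The argument string below is a hypothetical value chosen for illustration; the issue summary does not show the actual string or the build code that consumes it.

```python
import shlex

# Hypothetical value for illustration; not taken from the issue.
extra_cmake_args = '-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2 -g"'

# Plain whitespace splitting breaks the quoted value into two arguments
# and leaves the quote characters in place.
naive = extra_cmake_args.split()

# shlex.split follows POSIX shell quoting rules, so the quoted value
# stays together as a single argument.
quoted = shlex.split(extra_cmake_args)

print(naive)
print(quoted)
```

With whitespace splitting, the quoted flag value is cut at the space inside the quotes; shlex.split keeps it as one argument, which is what a command list handed to a subprocess call needs.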
Wes McKinney created ARROW-2027:
---
Summary: [C++] ipc::Message::SerializeTo does not pad the message
body
Key: ARROW-2027
URL: https://issues.apache.org/jira/browse/ARROW-2027
Project: Apache Arrow
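As a rough illustration of what "padding the message body" in ARROW-2027 means, the sketch below rounds a body up to an alignment boundary with zero bytes. The 8-byte alignment and the helper names padded_length and pad_body are assumptions made for this example; this is not the C++ implementation of ipc::Message::SerializeTo.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment (assumed to be 8)."""
    return (nbytes + alignment - 1) // alignment * alignment


def pad_body(body: bytes, alignment: int = 8) -> bytes:
    """Append zero bytes so the body length is a multiple of alignment."""
    return body + b"\x00" * (padded_length(len(body), alignment) - len(body))


if __name__ == "__main__":
    body = b"abcde"                 # 5 bytes of hypothetical body data
    print(len(pad_body(body)))      # length after padding
```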
Here are some realistic tabular data sets...
https://github.com/lemire/RealisticTabularDataSets
They are small by modern standards but they are also one GitHub clone away.
- Daniel
On Wed, Jan 24, 2018 at 2:26 PM, Wes McKinney wrote:
> Thanks Ted. I will echo these comments and recommend to r
Thanks Ted. I will echo these comments and recommend to run tests on
larger and preferably "real" datasets rather than randomly generated
ones. The more repetition and less entropy in a dataset, the better
Parquet performs relative to other storage options. Web-scale datasets
often exhibit these ch
Diego Argueta created ARROW-2026:
Summary: Timestamps saved as int64 even if
use_deprecated_int96_timestamps=True
Key: ARROW-2026
URL: https://issues.apache.org/jira/browse/ARROW-2026
Project: Apache
Simba
Nice summary. I think that there may be some issues with your tests. In
particular, you are storing essentially uniform random values. That might
be a viable test in some situations, but there are many where there is
considerably less entropy in the data being stored. For instance, if you
store
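The point about entropy can be seen with a small standard-library experiment. This sketch uses zlib as a stand-in compressor (not Snappy or Brotli, and not Parquet itself), and the repeated record below is made-up sample data, so the numbers only illustrate the trend, not the benchmark under discussion.

```python
import os
import zlib

N = 100_000

# High-entropy input: uniformly random bytes.
random_bytes = os.urandom(N)

# Low-entropy input: one made-up record repeated until N bytes are filled.
record = b"2018-01-24,NYSE,AAPL,buy\n"
repetitive = (record * (N // len(record) + 1))[:N]


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    print("random:     %.3f" % ratio(random_bytes))
    print("repetitive: %.3f" % ratio(repetitive))
```

Random bytes barely compress at all, while the repetitive input shrinks to a small fraction of its size, which is why a benchmark built on uniform random values tends to understate what a compressed columnar format can do on real data.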
Brief meeting today. Attendees and topics discussed as follows:
Attendees
- Wes (Two Sigma)
- Expand Interval metadata in format spec
- 0.9.0 milestone
- Simba
- Uwe (Blue Yonder)
- C++
- Li (Two Sigma)
- Dwight (Revirda)
- Sidd (Dremio)
- Struct change merge
- Interval
- Phillip Cloud
Jim Crist created ARROW-2025:
Summary: [Python/C++] HDFS Client disconnect closes all open
clients
Key: ARROW-2025
URL: https://issues.apache.org/jira/browse/ARROW-2025
Project: Apache Arrow
Iss
I can't make the sync today, will catch up later.
Bryan
On Jan 24, 2018 6:30 AM, "Wes McKinney" wrote:
> Join us at https://meet.google.com/vtm-teks-phx
>
Join us at https://meet.google.com/vtm-teks-phx
Hi Uwe, thanks.
I've attached a Google Sheet link
https://docs.google.com/spreadsheets/d/1by1vCaO2p24PLq_NAA5Ckh1n3i-SoFYrRcfi1siYKFQ/edit#gid=0
Kind Regards
Simba
On Wed, 24 Jan 2018 at 15:07 Uwe L. Korn wrote:
> Hello Simba,
>
> your plots did not come through. Try uploading them somewhere
Hello Simba,
your plots did not come through. Try uploading them somewhere and link
to them in the mails. Attachments are always stripped on Apache
mailing lists.
Uwe
On Wed, Jan 24, 2018, at 1:48 PM, simba nyatsanga wrote:
> Hi Everyone,
>
> I did some benchmarking to compare the disk size per
Hi Everyone,
I did some benchmarking to compare the disk size performance when writing
Pandas DataFrames to parquet files using Snappy and Brotli compression. I
then compared these numbers with those of my current file storage solution.
In my current (non Arrow+Parquet) solution, every column in
Hi,
I am trying to use Arrow from Go, following the instructions at
https://github.com/apache/arrow/tree/master/c_glib/example/go
Everything goes OK until I try to run
% git clone https://github.com/apache/arrow.git ~/arrow
% cd ~/arrow/c_glib/example/go
% make generate
This retu