[jira] [Created] (ARROW-2571) [C++] Lz4Codec doesn't properly handle empty data

2018-05-10 Thread Dmitry Kalinkin (JIRA)
Dmitry Kalinkin created ARROW-2571: Summary: [C++] Lz4Codec doesn't properly handle empty data Key: ARROW-2571 URL: https://issues.apache.org/jira/browse/ARROW-2571 Project: Apache Arrow Is

Re: [JS] Arrow output from JS library?

2018-05-10 Thread Paul Taylor
Quick update on the Arrow JS ipc buffer writer: I had a chance to revisit this branch on my fork last night, and managed to get a working prototype of the RecordBatchStreamWriter correctly serializing the integration test data to ArrayB

Re: How to model massive nested data

2018-05-10 Thread Wes McKinney
hi Tyler, I am not sure the Arrow Java libraries have yet been used for interacting with larger-than-memory datasets, but this would be a good opportunity to try to get this working. In the C++ libraries, any Arrow data structures can easily reference memory-mapped data on disk; none of the data

Re: [CI] Code coverage reports

2018-05-10 Thread Wes McKinney
hi Antoine, I also prefer codecov.io, but unfortunately Apache Infra does not support it, I believe due to some app hook permissions issue (there are some similar problems preventing CircleCI from being made available to Apache projects). I have asked before; you are welcome to open an INFRA ticket

[jira] [Created] (ARROW-2570) [Python] Add support for writing parquet files with LZ4 compression

2018-05-10 Thread Dmitry Kalinkin (JIRA)
Dmitry Kalinkin created ARROW-2570: Summary: [Python] Add support for writing parquet files with LZ4 compression Key: ARROW-2570 URL: https://issues.apache.org/jira/browse/ARROW-2570 Project: Apache

PyArrow and Parquet DELTA_BINARY_PACKED

2018-05-10 Thread Feras Salim
Hi, I was wondering whether I'm missing something, or whether `DELTA_BINARY_PACKED` is currently only available for reading Parquet files. I can't find a way for the writer to encode timestamp data with `DELTA_BINARY_PACKED`. Furthermore, I seem to get about a 10% increase in final file size wh

Re: How to model massive nested data

2018-05-10 Thread Martin Durant
This is not directly relevant here, but has anyone looked into oamap ( https://github.com/diana-hep/oamap ), which is capable of using numba to compile Python functions that traverse nested data structures down to the basic leaf nodes, without creating intermediate Python objects? Then the person

Re: How to model massive nested data

2018-05-10 Thread Lukasz Cwik
Is it also possible to iterate over the iterator more than once? Can I have multiple iterators at different positions, all working independently? On Thu, May 10, 2018 at 12:22 PM Tyler Akidau wrote: > Hello Arrow folks, > > I've been skimming through the Arrow docs and code trying to

How to model massive nested data

2018-05-10 Thread Tyler Akidau
Hello Arrow folks, I've been skimming through the Arrow docs and code trying to figure out how one might model nested data structures where the nested portions themselves might be massive (i.e., larger than available memory). AFAICT, the nesting constructs in Arrow appear to assume that you can al

[jira] [Created] (ARROW-2569) [C++] Improve thread pool size heuristic

2018-05-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2569: Summary: [C++] Improve thread pool size heuristic Key: ARROW-2569 URL: https://issues.apache.org/jira/browse/ARROW-2569 Project: Apache Arrow Issue Type: I

[jira] [Created] (ARROW-2568) [Python] Expose thread pool size setting to Python, and deprecate "nthreads"

2018-05-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2568: Summary: [Python] Expose thread pool size setting to Python, and deprecate "nthreads" Key: ARROW-2568 URL: https://issues.apache.org/jira/browse/ARROW-2568 Project:

[jira] [Created] (ARROW-2567) [C++/Python] Unit is ignored on comparison of TimestampArrays

2018-05-10 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2567: Summary: [C++/Python] Unit is ignored on comparison of TimestampArrays Key: ARROW-2567 URL: https://issues.apache.org/jira/browse/ARROW-2567 Project: Apache Arrow

[jira] [Created] (ARROW-2566) [CI] Add codecov.io badge to README

2018-05-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2566: Summary: [CI] Add codecov.io badge to README Key: ARROW-2566 URL: https://issues.apache.org/jira/browse/ARROW-2566 Project: Apache Arrow Issue Type: Task

[CI] Code coverage reports

2018-05-10 Thread Antoine Pitrou
Hi, Previous efforts to gather and publish C++ code coverage using the free service provided by coveralls.io have stalled (see ARROW-27). I went ahead and experimented with another free service, codecov.io. I got it to work with our C++ and Rust code bases. An example report can be seen here:

[jira] [Created] (ARROW-2565) [Plasma] new subscriber cannot receive notifications about existing objects

2018-05-10 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2565: Summary: [Plasma] new subscriber cannot receive notifications about existing objects Key: ARROW-2565 URL: https://issues.apache.org/jira/browse/ARROW-2565 Project: Apache Arr