[jira] [Created] (ARROW-6461) [Java] EchoServer can close socket before client has finished reading

2019-09-04 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6461: --- Summary: [Java] EchoServer can close socket before client has finished reading Key: ARROW-6461 URL: https://issues.apache.org/jira/browse/ARROW-6461 Project: Apache

[jira] [Created] (ARROW-6460) [Java] Add unit test for large avro data

2019-09-04 Thread Ji Liu (Jira)
Ji Liu created ARROW-6460: - Summary: [Java] Add unit test for large avro data Key: ARROW-6460 URL: https://issues.apache.org/jira/browse/ARROW-6460 Project: Apache Arrow Issue Type: Sub-task

[jira] [Created] (ARROW-6459) [C++] Remove "python" from conda_env_cpp.yml

2019-09-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6459: --- Summary: [C++] Remove "python" from conda_env_cpp.yml Key: ARROW-6459 URL: https://issues.apache.org/jira/browse/ARROW-6459 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6458) [Java] Improve the performance and code structure for ApproxEqualsVisitor

2019-09-04 Thread Liya Fan (Jira)
Liya Fan created ARROW-6458: --- Summary: [Java] Improve the performance and code structure for ApproxEqualsVisitor Key: ARROW-6458 URL: https://issues.apache.org/jira/browse/ARROW-6458 Project: Apache Arrow

Re: [Discuss][Java] Support conversions between delta vector and partial sum vector

2019-09-04 Thread Fan Liya
Hi Wes, Thanks a lot for the comments. You are right. This can be applied to the data encoding/compression, and I think this is one of the building blocks for encoding/compression. In the short term, it will provide conversions between the two memory layouts of run length vectors. In the mid
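For readers unfamiliar with the two layouts named in this thread, here is a minimal sketch of the conversion in plain Python lists. It does not use any Arrow API. The function names, the `base` parameter, and the convention that the partial sum vector carries one extra leading element are assumptions made for illustration, not details confirmed by the messages above.

```python
from itertools import accumulate

def to_partial_sums(deltas, base=0):
    """Convert run lengths (deltas) to running totals.

    Assumed convention: the result has len(deltas) + 1 entries and
    starts at `base`, so entry i is the start offset of run i.
    """
    return list(accumulate(deltas, initial=base))

def to_deltas(partial_sums):
    """Inverse conversion: differences between adjacent running totals."""
    return [b - a for a, b in zip(partial_sums, partial_sums[1:])]

runs = [3, 1, 4]                      # hypothetical run lengths
offsets = to_partial_sums(runs)       # running totals with a leading base
assert to_deltas(offsets) == runs     # the round trip recovers the deltas
```

Under these assumptions the delta form is the more compact one to transmit, while the partial sum form allows a binary search for the run that contains a given logical index.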

Re: [Discuss][Java] Support conversions between delta vector and partial sum vector

2019-09-04 Thread Micah Kornfield
> > Having utility algorithms to perform data transformations seems fine > if there is a use for them and maintaining the code in the Arrow > libraries makes sense. In principle I agree. The genesis of this discussion was a request I made to Liya Fan on the PR for this algorithm [1].

[jira] [Created] (ARROW-6457) [C++] CMake build locally fails with MSVC 2015 build generator

2019-09-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6457: --- Summary: [C++] CMake build locally fails with MSVC 2015 build generator Key: ARROW-6457 URL: https://issues.apache.org/jira/browse/ARROW-6457 Project: Apache Arrow

[jira] [Created] (ARROW-6456) [C++] Possible to reduce object code generated in compute/kernels/take.cc?

2019-09-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6456: --- Summary: [C++] Possible to reduce object code generated in compute/kernels/take.cc? Key: ARROW-6456 URL: https://issues.apache.org/jira/browse/ARROW-6456 Project:

Re: Size of c++ libraries

2019-09-04 Thread Wes McKinney
Static C++ libraries on Windows are known to be quite large. The Protocol Buffers static libraries on Windows are over 100MB. The size is not caused by thirdparty dependencies AFAIK, it's the result of object code produced by MSVC. See

RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-04 Thread Eric Erhardt
The C# PR is up. https://github.com/apache/arrow/pull/5280 Eric -Original Message- From: Eric Erhardt Sent: Wednesday, September 4, 2019 10:12 AM To: dev@arrow.apache.org; Ji Liu Cc: emkornfield ; Paul Taylor Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte

[jira] [Created] (ARROW-6455) [C++] Implement ExtensionType for non-UTF8 Unicode data

2019-09-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6455: --- Summary: [C++] Implement ExtensionType for non-UTF8 Unicode data Key: ARROW-6455 URL: https://issues.apache.org/jira/browse/ARROW-6455 Project: Apache Arrow

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-04 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-6455. It might make sense to define a common ExtensionType metadata in case multiple implementations decide they need this On Tue, Sep 3, 2019 at 10:35 PM Micah Kornfield wrote: > > This might be bike-shedding but I agree we should attempt to

[jira] [Created] (ARROW-6454) [Developer] Add LLVM license to LICENSE.txt due to binary redistribution in packages

2019-09-04 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6454: --- Summary: [Developer] Add LLVM license to LICENSE.txt due to binary redistribution in packages Key: ARROW-6454 URL: https://issues.apache.org/jira/browse/ARROW-6454

Re: [Discuss][Java] Support conversions between delta vector and partial sum vector

2019-09-04 Thread Wes McKinney
hi, Having utility algorithms to perform data transformations seems fine if there is a use for them and maintaining the code in the Arrow libraries makes sense. I don't understand point #2 "We can transform them to delta vectors before IPC". It sounds like you are proposing a data compression

Re: Parquet to Arrow in Java

2019-09-04 Thread Chao Sun
Thanks Uwe for pointing out the Iceberg effort - will take a look. It is good to have a "standard" Parquet-to-Arrow reader implementation live in the Arrow project though, so that in future different projects can just refer to this instead of implementing their own. Chao On Wed, Sep 4, 2019 at

Re: Parquet to Arrow in Java

2019-09-04 Thread Uwe L. Korn
Hello, You may want to interact with the Apache Iceberg community here. They are currently doing similar things: https://lists.apache.org/thread.html/3bb4f89a0b37f474cf67915f91326fa845afa597bdd2463c98a2c8b9@%3Cdev.iceberg.apache.org%3E I'm not involved in this, just reading both mailing lists and

Re: Parquet to Arrow in Java

2019-09-04 Thread Chao Sun
Bumping this. We may have an upcoming use case for this as well. Is anyone actively working on this? I also heard that Dremio has internally implemented a performant Parquet to Arrow reader. Is there any plan to open source it? That could save us a lot of work. Thanks, Chao On

Re: Arrow sync call tomorrow (September 4) at 12:00 US/Eastern, 16:00 UTC

2019-09-04 Thread Neal Richardson
We're meeting at meet.google.com/nwj-xado-dtu instead today. On Tue, Sep 3, 2019 at 4:24 PM Neal Richardson wrote: > > Hi all, > Reminder that the biweekly Arrow call is tomorrow (or today, depending > on your time zone ;) at https://meet.google.com/vtm-teks-phx. All are > welcome to join. Notes

Re: Size of c++ libraries

2019-09-04 Thread Francois Saint-Jacques
Hello Ivan, There's a tool called `bloaty` [1] that can report the size of a binary object per symbol. Thank you, François [1] https://github.com/google/bloaty On Wed, Sep 4, 2019 at 12:00 PM Ivan Popivanov wrote: > > Have been trying to figure out the binary size of a basic arrow static

Size of c++ libraries

2019-09-04 Thread Ivan Popivanov
Have been trying to figure out the binary size of a basic Arrow static library. A Release build on Windows produced static libraries of ~65 MB. The dynamic library, however, is about 5 MB. Which DLLs are bringing the extras into the static build - can we cut them off? In other words, are there any switches to turn

RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-04 Thread Eric Erhardt
I'm working on a PR for the C# bindings. I hope to have it up in the next day or two. Integration tests for C# would be a great addition at some point - it's been on my backlog. For now I plan on manually testing it. -Original Message- From: Wes McKinney Sent: Tuesday, September 3,

[jira] [Created] (ARROW-6453) [C++] More informative error messages from S3

2019-09-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6453: - Summary: [C++] More informative error messages from S3 Key: ARROW-6453 URL: https://issues.apache.org/jira/browse/ARROW-6453 Project: Apache Arrow Issue