[Discuss] [Rust] Looking to add Wasm32 compile target for rust library

2020-07-13 Thread RJ Atwal
Hi all, I'm looking for guidance on how to submit a design and PR to add WASM32 support to Apache Arrow's Rust libraries. I want to use the Arrow library to pass data in Arrow format between the host Spark environment and UDFs defined in WASM. I created the following JIRA ticket to capture

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Wes McKinney
Thanks Micah. I'll check in the test file that has the V6 metadata and open a PR later today. On Mon, Jul 13, 2020 at 5:53 PM Micah Kornfield wrote: > > To clarify on UBSAN and enums. My understanding is: > > enum A { a = 1, b = 2, c = 3}; > enum class B : int16_t { a = 1, b = 2, c = 3}; > > A a

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Micah Kornfield
To clarify on UBSAN and enums. My understanding is: enum A { a = 1, b = 2, c = 3}; enum class B : int16_t { a = 1, b = 2, c = 3}; A a = static_cast<A>(4); // UB B b = static_cast<B>(4); // Not UB. Declaring the holding type makes this allowable. On Mon, Jul 13, 2020 at 3:44 PM Micah Kornfield
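The distinction drawn in the message above can be checked at compile time. The following is a minimal sketch, not code from the thread: the helpers `RoundTrip` and `IsKnown` are hypothetical names introduced here for illustration. It relies on the C++ rule that an enumeration with a fixed underlying type can hold any value of that type, whereas for `A` the value 4 lies outside the enumeration's range (0 to 3), so the cast to `A` is deliberately not performed.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Unscoped enum without a fixed underlying type. Its valid range is the
// smallest bit-field that can hold 1..3, i.e. 0..3, so static_cast<A>(4)
// is undefined behavior in C++17 and later. It is intentionally not done here.
enum A { a = 1, b = 2, c = 3 };

// Scoped enum with a fixed underlying type: every int16_t value is valid.
enum class B : std::int16_t { a = 1, b = 2, c = 3 };

// Hypothetical helper: cast a raw integer to B and read it back.
constexpr std::int16_t RoundTrip(std::int16_t raw) {
  return static_cast<std::int16_t>(static_cast<B>(raw));
}

// Hypothetical helper: true only for the enumerators declared today.
constexpr bool IsKnown(B value) {
  return value == B::a || value == B::b || value == B::c;
}

static_assert(std::is_same<std::underlying_type<B>::type, std::int16_t>::value,
              "B is stored as int16_t");
static_assert(RoundTrip(4) == 4, "an unlisted value survives the cast intact");
static_assert(!IsKnown(static_cast<B>(4)), "4 is representable but not a known enumerator");
```

Because the cast to `B` is well defined, a reader can hold an unknown value in a `B` and then reject it with an explicit check such as `IsKnown`, rather than relying on a sanitizer to notice it.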

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Micah Kornfield
Please see [1]. I ran this arrow-ipc-read-write-test with UBSAN enabled and it passed (this isn't my normal dev environment so please double check). https://github.com/emkornfield/arrow/commit/7fbd0fb95f7ea164284720428c7974b87b4b2443 On Mon, Jul 13, 2020 at 3:12 PM Micah Kornfield wrote: > I

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Micah Kornfield
I think this might be more complicated, let me see if I can write a test that demonstrates what I'm talking about. On Mon, Jul 13, 2020 at 3:10 PM Wes McKinney wrote: > Here's a patch that does the check > > > https://github.com/wesm/arrow/commit/5bfdb4255a66a4ec62b1c36ba07682fad47df9a7 > >

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Wes McKinney
Here's a patch that does the check https://github.com/wesm/arrow/commit/5bfdb4255a66a4ec62b1c36ba07682fad47df9a7 Here is a serialized schema that uses a V6 version https://drive.google.com/file/d/1GiWh5yKXdMaLRWU5K4cnGW2ilybF0LF_/view?usp=sharing See in action

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Wes McKinney
On Mon, Jul 13, 2020 at 4:43 PM Micah Kornfield wrote: >> >> We don't have any test cases that have a future metadata version. I >> made a branch where I added V6 and wrote an IPC message, then found >> that I was unable to determine that it was out of bounds (presumably >> UBSAN would error,

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Micah Kornfield
> > We don't have any test cases that have a future metadata version. I > made a branch where I added V6 and wrote an IPC message, then found > that I was unable to determine that it was out of bounds (presumably > UBSAN would error, though, but we need a runtime error outside of > ASAN/UBSAN).

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Wes McKinney
On Mon, Jul 13, 2020 at 4:31 PM Micah Kornfield wrote: > > > > > > > That static cast is currently undefined behavior. > > Is ubsan reporting this? When looking into the feature enum I tried to > understand if that was valid. At the time, I read the C++ spec* as saying that if the enum > has an explicitly

Re: [DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Micah Kornfield
> > > That static cast is currently undefined behavior. Is ubsan reporting this? When looking into the feature enum I tried to understand if that was valid. At the time, I read the C++ spec* as saying that if the enum has an explicitly declared type, all values in that type's range are supported. The generated

[DISCUSS] How to provide forward compatibility with MetadataVersion

2020-07-13 Thread Wes McKinney
I've discovered while working on ARROW-9399 that it is very difficult with the Flatbuffers API in C++ to detect a MetadataVersion [1] that is higher than the current version. For example, suppose that 3 or 4 years from now we move from version V5 to version V6. The generated Flatbuffers code
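One way to picture the check being asked for is to validate the raw integer before it is ever converted to the enum type. The sketch below is an illustration under stated assumptions, not the Arrow or Flatbuffers implementation: the `MetadataVersion` enum here is a hypothetical stand-in with assumed values, `ParseMetadataVersion` and `kMaxKnownVersion` are invented names, and how the raw 16-bit value is obtained from a Flatbuffers message is left out entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generated enum. The enumerator values are an
// assumption made for this sketch.
enum class MetadataVersion : std::int16_t { V1 = 0, V2, V3, V4, V5 };

// Highest version this (hypothetical) reader understands.
constexpr std::int16_t kMaxKnownVersion =
    static_cast<std::int16_t>(MetadataVersion::V5);

// Validate the raw integer first, and only cast once it is known to be in
// range. Returns false for negative values and for versions newer than V5.
bool ParseMetadataVersion(std::int16_t raw, MetadataVersion* out) {
  if (raw < 0 || raw > kMaxKnownVersion) {
    return false;
  }
  *out = static_cast<MetadataVersion>(raw);
  return true;
}
```

With this shape, a message written with a future V6 (raw value 5 under the assumed numbering) produces an ordinary runtime error path instead of depending on ASAN or UBSAN to flag it.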

Re: Timeline for next major Arrow release (1.0.0)

2020-07-13 Thread Micah Kornfield
In that case, I will take my time :) On Mon, Jul 13, 2020 at 11:00 AM Antoine Pitrou wrote: > > I don't think we want to introduce last-minute unforeseen issues (such > as security issues) in the IPC layer, so personally I'd rather defer the > feature enum implementation to the next version. >

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Neal Richardson
https://issues.apache.org/jira/browse/ARROW-9443 is the other R build issue. I should get a fix one way or another today, but regardless, it is not release-blocking. Neal On Mon, Jul 13, 2020 at 8:24 AM Neal Richardson wrote: > conda-r is ticketed

Re: Timeline for next major Arrow release (1.0.0)

2020-07-13 Thread Antoine Pitrou
I don't think we want to introduce last-minute unforeseen issues (such as security issues) in the IPC layer, so personally I'd rather defer the feature enum implementation to the next version. Just my two cents :) Regards Antoine. Le 13/07/2020 à 19:42, Micah Kornfield a écrit : > I'll try

Re: Timeline for next major Arrow release (1.0.0)

2020-07-13 Thread Micah Kornfield
I'll try to make PRs for the feature enum tonight. I don't think this is a blocker as there are other mechanisms to detect the current values listed. On Mon, Jul 13, 2020 at 10:37 AM Wes McKinney wrote: > Aside from fixing nightly builds, which of the 25 issues remaining in > the 1.0.0

Re: Timeline for next major Arrow release (1.0.0)

2020-07-13 Thread Wes McKinney
Aside from fixing nightly builds, which of the 25 issues remaining in the 1.0.0 milestone must be resolved in order to release? Speak now or forever hold your peace =) As one problem where I haven't seen activity, we have not implemented the Feature Enum anywhere, do we want to try to add simple

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Neal Richardson
conda-r is ticketed (https://issues.apache.org/jira/browse/ARROW-9409) and has a PR (https://github.com/apache/arrow/pull/7706) but there are remaining issues and I am uncertain that this is a build worth maintaining anyway. If anyone has opinions, please comment on the PR. As for any other R

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-13 Thread Antoine Pitrou
Agreed, but even then, if some Parquet files are generated inside of a well-defined system which only needs to be interoperable with itself, it's not necessarily harmful to allow LZ4 compression when writing new files. Regards Antoine. Le 13/07/2020 à 17:07, Wes McKinney a écrit : > I didn’t

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-13 Thread Krisztián Szűcs
On Mon, Jul 13, 2020 at 11:15 AM Antoine Pitrou wrote: > > > I'm not sure that's a good idea. There are probably Parquet files that > are only ever used with the Arrow implementation (Arrow C++, Arrow > Python, Arrow R...). I tend to agree with Antoine here. As an alternative to disabling the

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-13 Thread Wes McKinney
I didn’t say to disable _reading_ them, only writing them. On Mon, Jul 13, 2020 at 4:15 AM Antoine Pitrou wrote: > > I'm not sure that's a good idea. There are probably Parquet files that > are only ever used with the Arrow implementation (Arrow C++, Arrow > Python, Arrow R...). > > I admit

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-13 Thread Patrick Pai
I'll volunteer to disable writing/reading LZ4. I'll submit a patch in the next few days. On 2020/07/12 22:11:33, Wes McKinney wrote: > Since there hasn't been other movement on this, we need to disable > writing LZ4-compressed files until this can be investigated more > thoroughly. If someone

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Krisztián Szűcs
Failures with patches: - wheel-osx-*: https://github.com/apache/arrow/pull/7728 should fix it - conda-cpp-valgrind: https://github.com/apache/arrow/pull/7727 should fix it Known failures: - conda-python-3.8-jpype: https://issues.apache.org/jira/browse/ARROW-9385 New failures: -

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Krisztián Szűcs
I resubmitted the nightly jobs and the builds are running: https://github.com/ursa-labs/crossbow/branches/all?query=build-834 We'll see tomorrow whether the issue persists or not. On Mon, Jul 13, 2020 at 1:31 PM Krisztián Szűcs wrote: > > This report is misleading because no builds were

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Krisztián Szűcs
This report is misleading because no builds were triggered at all, see https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-13-0 I'm investigating it. On Mon, Jul 13, 2020 at 12:15 PM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-07-13-0 > > All tasks: >

Re: [EXTERNAL] Re: .NET support for Arrow

2020-07-13 Thread Takashi Hashida
Hi, My organization already uses the official C# Arrow library for a product. It seems that the official library is working fine on the product, so I think it has some stability compared to what it used to be. Moreover, I think that if we focus on the official C# implementation, we can test

Re: [EXTERNAL] Re: .NET support for Arrow

2020-07-13 Thread Takashi Hashida
Hi, My organization already uses the official C# Arrow library for a product. It seems that the official library is working fine on the product, so I think it has some stability compared to what it used to be. Moreover, I think that if we focus on the official C# implementation, we can test

[NIGHTLY] Arrow Build Report for Job nightly-2020-07-13-0

2020-07-13 Thread Crossbow
Arrow Build Report for Job nightly-2020-07-13-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-13-0 Succeeded Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-13-0-github-centos-6-amd64 -

Re: language independent representation of filter expressions

2020-07-13 Thread Antoine Pitrou
On Sat, 11 Jul 2020 09:55:16 -0700 Jacques Nadeau wrote: > > I'm against extending use of flatbuf within Arrow. The language support is > too weak. Language support isn't just about having a binding for different > languages, it is about having a high-quality binding. Could you please expand on

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-07-13 Thread Antoine Pitrou
I'm not sure that's a good idea. There are probably Parquet files that are only ever used with the Arrow implementation (Arrow C++, Arrow Python, Arrow R...). I admit I'm also not terribly bothered about this, since the Parquet community itself doesn't seem to care much about the issue (it has