Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Bryan Cutler
I'd rather not see ARROW-9223 reverted, if possible. I will put up my hacked patch to Spark for this so we can test against it if needed, and could share my branch if anyone else wants to test it locally. On Sun, Jul 19, 2020 at 7:35 PM Micah Kornfield wrote: > I'll spend some time tonight on

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Micah Kornfield
I'll spend some time tonight on it and if I can't get round trip working I'll handle reverting On Sunday, July 19, 2020, Wes McKinney wrote: > On Sun, Jul 19, 2020 at 7:33 PM Neal Richardson > wrote: > > > > It sounds like you may have identified a pyarrow bug, which sounds not > > good,

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
On Sun, Jul 19, 2020 at 7:33 PM Neal Richardson wrote: > > It sounds like you may have identified a pyarrow bug, which sounds not > good, though I don't know enough about the broader context to know whether > this is (1) worse than 0.17 or (2) release blocking. I defer to y'all who > know better.

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Neal Richardson
It sounds like you may have identified a pyarrow bug, which sounds not good, though I don't know enough about the broader context to know whether this is (1) worse than 0.17 or (2) release blocking. I defer to y'all who know better. If there are quirks in how Spark handles timezone-naive

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
Honestly I think reverting is the best option. This change evidently needs more time to "season" and perhaps this is motivation to enhance test coverage in a number of places. On Sun, Jul 19, 2020 at 7:11 PM Wes McKinney wrote: > > I am OK with any solution that doesn't delay the production of

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
I am OK with any solution that doesn't delay the production of the next RC by more than 24 hours. On Sun, Jul 19, 2020 at 7:08 PM Micah Kornfield wrote: > > If I read the example right it looks like constructing from python types > isn't keeping timezones in place? I can try to make a patch

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Micah Kornfield
If I read the example right, it looks like constructing from Python types isn't keeping timezones in place? I can try to make a patch that fixes that tonight, or would the preference be to revert my patch (note I think another bug with a prior bug was fixed in my PR as well) -Micah On Sunday,

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
I put up a PR to revert ARROW-9223. If someone cannot resolve the problem another way, then I recommend applying the reversion and cutting RC2: https://github.com/apache/arrow/pull/7802 To state the obvious, we must also verify that this resolves the Spark problem. On Sun, Jul 19, 2020 at 6:55 PM

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
I think I see the problem now:
In [40]: parr
Out[40]:
0 {'f0': 1969-12-31 16:00:00-08:00}
1 {'f0': 1969-12-31 16:00:00.01-08:00}
2 {'f0': 1969-12-31 16:00:00.02-08:00}
dtype: object
In [41]: parr[0]['f0']
Out[41]: datetime.datetime(1969, 12, 31, 16, 0, tzinfo=)
In [42]:

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
Ah I forgot that this is a "feature" of nanosecond timestamps
In [21]: arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))
In [22]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
In [23]: struct_arr.to_pandas()
Out[23]:
0 {'f0': 1969-12-31

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
There seems to be other broken StructArray stuff
In [14]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))
In [15]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
In [16]: struct_arr
Out[16]:
-- is_valid: all not null
-- child 0 type: timestamp[ns,

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
Well, the problem is that time zones are really finicky comparing Spark (which uses a localtime interpretation of timestamps without time zone) and Arrow (which has naive timestamps -- a concept similar but different from the SQL concept TIMESTAMP WITHOUT TIME ZONE -- and tz-aware timestamps). So
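The difference between the naive and tz-aware readings of one stored value can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not pyarrow or Spark code: `render` is a hypothetical helper, the stored integer is taken to be microseconds since the Unix epoch in UTC, and a fixed UTC-8 offset stands in for America/Los_Angeles (which was on Pacific Standard Time at the epoch).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for America/Los_Angeles,
# which observed Pacific Standard Time at the Unix epoch.
PST = timezone(timedelta(hours=-8), "PST")

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render(epoch_us, tz=None):
    """Render microseconds since the Unix epoch (hypothetical helper).

    With tz=None the value is shown as a naive wall-clock time (no offset);
    otherwise the same instant is converted to tz and shown with its offset.
    """
    instant = EPOCH_UTC + timedelta(microseconds=epoch_us)
    if tz is None:
        # Drop the zone: the reader has to guess how to interpret the value.
        return instant.replace(tzinfo=None).isoformat(sep=" ")
    return instant.astimezone(tz).isoformat(sep=" ")


naive_view = render(0)        # same stored integer, naive rendering
aware_view = render(0, PST)   # same stored integer, tz-aware rendering

print(naive_view)
print(aware_view)
```

The stored integer is identical in both calls; only the interpretation differs, which is why a system that reads zone-less timestamps as local time can disagree with one that keeps the zone attached.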

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Neal Richardson
If it’s a display problem, should it block the release? Sent from my iPhone > On Jul 19, 2020, at 3:57 PM, Wes McKinney wrote: > > I opened https://issues.apache.org/jira/browse/ARROW-9525 about the > display problem. My guess is that there are other problems lurking > here > >> On Sun, Jul

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-9525 about the display problem. My guess is that there are other problems lurking here On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney wrote: > > hi Bryan, > > This is a display bug > > In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', >

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
hi Bryan, This is a display bug
In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))
In [7]: arr.view('int64')
Out[7]: [ 0, 1, 2 ]
In [8]: arr
Out[8]: [ 1970-01-01 00:00:00.0, 1970-01-01 00:00:00.1, 1970-01-01 00:00:00.2 ]
In

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
Well this is not good and pretty disappointing given that we had nearly a month to sort through the implications of Micah’s patch. We should try to resolve this ASAP On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler wrote: > +0 (non-binding) > > I ran verification script for binaries and then

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Bryan Cutler
+0 (non-binding) I ran the verification script for binaries and then source, as below, and both look good: ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source 1.0.0

Re: Building pyarrow on macOS

2020-07-19 Thread Sutou Kouhei
Hi, It may be a problem of our CMake configuration system. Could you file this to JIRA? https://issues.apache.org/jira/projects/ARROW/issues And could you provide your Portfile? Thanks, -- kou In <43c2cc28-7792-4baa-bdb7-40600812f...@gmail.com> "Re: Building pyarrow on macOS " on Sun, 19

Re: [VOTE] Release Apache Arrow 1.0.0 - RC1

2020-07-19 Thread Wes McKinney
+1 (binding) I ran the release verification (source and binary) on Ubuntu 18.04 and Windows with MSVC. I experienced a symbol loading issue on macOS [1] but I suspect it's something environment-specific on the machine. Since we have viable macOS packages I'm not concerned about this and can

Arrow C++ maintainer vacations

2020-07-19 Thread Wes McKinney
hi folks, Antoine and I are both on vacation the next two weeks and won't be doing many code reviews. I'll be back doing some code reviews albeit at 50% or less capacity the week of July 27. I encourage other maintainers to step in to help carry the load of getting PRs merged. If there's anything

Re: Building pyarrow on macOS

2020-07-19 Thread Steven Smith
This issue has arisen again on a MacPorts buildbot. The Apache Arrow CMake system locates the installed libraries and headers, but then says it can’t find them. Does anyone know what would cause this build failure? It works on my system, but not on a buildbot: Build command: > Executing: cd

[NIGHTLY] Arrow Build Report for Job nightly-2020-07-19-0

2020-07-19 Thread Crossbow
Arrow Build Report for Job nightly-2020-07-19-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-19-0 Failed Tasks: - conda-linux-gcc-py36-cpu: URL: