Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Bogdan Klichuk
Nice, thank you for the approximate timeline! On Mon, Jun 17, 2019 at 1:15 AM Micah Kornfield wrote: > Hi Bogdan, > >> Alright, so speaking of serialization of pyarrow.Table vs Feather, if >> they are pretty much the same, but arrow alone shouldn't >> be used to long-storage, is this also the cas

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Micah Kornfield
Hi Bogdan, > Alright, so speaking of serialization of pyarrow.Table vs Feather, if they > are pretty much the same, but arrow alone shouldn't > be used to long-storage, is this also the case for Feather or can it be a > valid option for my case? Per Wes's e-mail on similar thread[1], once we rea

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Bogdan Klichuk
Hello. Thanks for the reply! On Sun, Jun 16, 2019 at 8:40 AM Wes McKinney wrote: > hi Micah, > > On Sun, Jun 16, 2019 at 12:16 AM Micah Kornfield > wrote: > > > > Hi Bogdan, > > I'm not an expert here but answers based on my understanding are below: > > > > 1) Is there something I'm missing i

[jira] [Created] (ARROW-5624) [C++] -Duriparser_SOURCE=BUNDLED is broken

2019-06-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5624: --- Summary: [C++] -Duriparser_SOURCE=BUNDLED is broken Key: ARROW-5624 URL: https://issues.apache.org/jira/browse/ARROW-5624 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5623) [CI][GLib] Failed on macOS

2019-06-16 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5623: --- Summary: [CI][GLib] Failed on macOS Key: ARROW-5623 URL: https://issues.apache.org/jira/browse/ARROW-5623 Project: Apache Arrow Issue Type: Test Comp

[jira] [Created] (ARROW-5622) [C++][Dataset] arrow-dataset.pc isn't provided

2019-06-16 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5622: --- Summary: [C++][Dataset] arrow-dataset.pc isn't provided Key: ARROW-5622 URL: https://issues.apache.org/jira/browse/ARROW-5622 Project: Apache Arrow Issue Type:

Re: Arrow as a common open standard for machine learning data

2019-06-16 Thread Sebastien Binet
hi there, On Sun, Jun 16, 2019 at 6:07 AM Micah Kornfield wrote: > > * Can Feather files already be read in Java/Go/C#/...? > > I don't know the status of feather. The arrow file format should be > readable by Java and C++ (I believe all the languages that bind C++ also > support the format,

[jira] [Created] (ARROW-5620) [Go] implement read/write IPC for Map arrays

2019-06-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5620: -- Summary: [Go] implement read/write IPC for Map arrays Key: ARROW-5620 URL: https://issues.apache.org/jira/browse/ARROW-5620 Project: Apache Arrow Issue T

[jira] [Created] (ARROW-5621) [Go] implement read/write IPC for Decimal128 arrays

2019-06-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5621: -- Summary: [Go] implement read/write IPC for Decimal128 arrays Key: ARROW-5621 URL: https://issues.apache.org/jira/browse/ARROW-5621 Project: Apache Arrow

Re: New CI system: Ursabot

2019-06-16 Thread Uwe L. Korn
On Fri, Jun 14, 2019, at 11:23 PM, Krisztián Szűcs wrote: > On Fri, Jun 14, 2019 at 9:04 PM Wes McKinney wrote: > > > hi Krisz, > > > > Thanks for working on this! It already helped me fix a Python 2.7-only > > bug yesterday https://github.com/apache/arrow/pull/4553 > > > > I have a bunch of q

Re: [DISCUSS] Timing of release and making a 1.0.0 release marking Arrow protocol stability

2019-06-16 Thread Wes McKinney
I also agree that time-based releases are the only reasonable thing to do going forward. It also creates a sense of urgency to complete work by a certain date, in the sense of "the ship is leaving the port on X, all aboard!" The Blocker priority on JIRA can be used to communicate if some issue sho

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Wes McKinney
hi Micah, On Sun, Jun 16, 2019 at 12:16 AM Micah Kornfield wrote: > > Hi Bogdan, > I'm not an expert here but answers based on my understanding are below: > > 1) Is there something I'm missing in understanding difference between > > serializing dataframe directly using PyArrow and serializing > >

Re: Arrow as a common open standard for machine learning data

2019-06-16 Thread Wes McKinney
hi Micah and Joaquin, With regards to the Feather format, I have been waiting a _long_ time for the R community to "catch up" with Apache Arrow development and get a release of an Arrow R project out that can be installed by most R users. We are finally approaching that point, and so Feather devel

Re: [DISCUSS] Timing of release and making a 1.0.0 release marking Arrow protocol stability

2019-06-16 Thread Sutou Kouhei
Hi, > If we want to stick to project-wide releases where all languages and > components are released once, then time-based releases are the only > reasonable scheme IMHO. Of course, there may still be release-blocking > issues, but those should only be important regressions, not "this is a > feat