[jira] [Commented] (ARROW-361) Python: Support reading a column-selection from Parquet files

2016-11-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635451#comment-15635451 ] Uwe L. Korn commented on ARROW-361: --- PR: https://github.com/apache/arrow/pull/197 > Pyth

Re: [Reminder] Arrow sync today at 10am

2016-11-03 Thread Julien Le Dem
Next sync in 2 weeks, same time. Let me know if you want to be reminded with a calendar invite. On Thu, Nov 3, 2016 at 10:38 AM, Julien Le Dem wrote: > Notes: > > Wes (Two Sigma): > Status: >- IO layer >- Benchmark >- Python -> arrow with zero-copy. Some overhead in Pandas. discus

Re: Pyarrow getting import error for libarrow.so

2016-11-03 Thread Wes McKinney
Yes, please -- any patch must have an associated JIRA. On Wed, Nov 2, 2016 at 5:15 PM, Bryan Cutler wrote: > Thanks for clearing that up Wes! I couldn't figure out why it was working > before I updated, but makes sense now. I'd be happy to add this to the > Python README, is it worth opening a

Re: [Reminder] Arrow sync today at 10am

2016-11-03 Thread Julien Le Dem
Notes:
Wes (Two Sigma):
Status:
- IO layer
- Benchmark
- Python -> arrow with zero-copy. Some overhead in Pandas. discussed for Pandas 2.0
Priority:
- integration tests:
  - generate some hand coded datasets to validate on each side (Java/CPP)
  - cross validation
  - c

[jira] [Created] (ARROW-364) [Python] Multithreaded conversion between Arrow record batches as NumPy arrays (for pandas)

2016-11-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-364: -- Summary: [Python] Multithreaded conversion between Arrow record batches as NumPy arrays (for pandas) Key: ARROW-364 URL: https://issues.apache.org/jira/browse/ARROW-364 P

[jira] [Commented] (ARROW-362) Python: Calling to_pandas on a table read from Parquet leaks memory

2016-11-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633574#comment-15633574 ] Uwe L. Korn commented on ARROW-362: --- This was due to one too many reference counts in the zero

[jira] [Resolved] (ARROW-323) [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure

2016-11-03 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-323. Resolution: Fixed Issue resolved by pull request 194 [https://github.com/apache/arrow/pull/194] > [P

[jira] [Created] (ARROW-363) Set up Java/C++ integration test harness

2016-11-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-363: -- Summary: Set up Java/C++ integration test harness Key: ARROW-363 URL: https://issues.apache.org/jira/browse/ARROW-363 Project: Apache Arrow Issue Type: New Featu

[jira] [Assigned] (ARROW-230) Python: Do not name modules like native ones (i.e. rename pyarrow.io)

2016-11-03 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-230: -- Assignee: Wes McKinney > Python: Do not name modules like native ones (i.e. rename pyarrow.io) >

[jira] [Resolved] (ARROW-230) Python: Do not name modules like native ones (i.e. rename pyarrow.io)

2016-11-03 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-230. Resolution: Fixed Resolved by https://github.com/apache/arrow/commit/d4148759a266d90dacd1ca2b7b7ff0d

Re: [Reminder] Arrow sync today at 10am

2016-11-03 Thread Julien Le Dem
It is happening now. On Thu, Nov 3, 2016 at 7:44 AM, Julien Le Dem wrote: > Every other week we do an Arrow sync over Google Hangout: > https://plus.google.com/hangouts/_/dremio.com/arrow > Thursday 10am PT > -- > Julien > -- Julien

[jira] [Created] (ARROW-362) Python: Calling to_pandas on a table read from Parquet leaks memory

2016-11-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-362: - Summary: Python: Calling to_pandas on a table read from Parquet leaks memory Key: ARROW-362 URL: https://issues.apache.org/jira/browse/ARROW-362 Project: Apache Arrow

[jira] [Updated] (ARROW-362) Python: Calling to_pandas on a table read from Parquet leaks memory

2016-11-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated ARROW-362: -- Description: Steps to reproduce: * Read a parquet file with {{pyarrow.parquet.read_table}} and convert t

[jira] [Updated] (ARROW-362) Python: Calling to_pandas on a table read from Parquet leaks memory

2016-11-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated ARROW-362: -- Description: Steps to reproduce: * Read a parquet file with {{pyarrow.parquet.read_table}} and convert t

[jira] [Updated] (ARROW-362) Python: Calling to_pandas on a table read from Parquet leaks memory

2016-11-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated ARROW-362: -- Affects Version/s: 0.1.0 > Python: Calling to_pandas on a table read from Parquet leaks memory >

[jira] [Created] (ARROW-361) Python: Support reading a column-selection from Parquet files

2016-11-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-361: - Summary: Python: Support reading a column-selection from Parquet files Key: ARROW-361 URL: https://issues.apache.org/jira/browse/ARROW-361 Project: Apache Arrow I

[Reminder] Arrow sync today at 10am

2016-11-03 Thread Julien Le Dem
Every other week we do an Arrow sync over Google Hangout: https://plus.google.com/hangouts/_/dremio.com/arrow Thursday 10am PT -- Julien

Re: Question

2016-11-03 Thread Donald Foss
Abdulrahman, your schema diagram did not come through, at least not in a way I could view it in Mac Mail. Looking at the message source, I don’t see the specified Content ID [cid] or inline data element for the graphic. Generally speaking, I believe the Arrow project defines data structures, f

[jira] [Created] (ARROW-360) C++: Add method to shrink PoolBuffer using realloc

2016-11-03 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-360: - Summary: C++: Add method to shrink PoolBuffer using realloc Key: ARROW-360 URL: https://issues.apache.org/jira/browse/ARROW-360 Project: Apache Arrow Issue Type: I

Question

2016-11-03 Thread Abdulrahman Kaitoua
Dear all, I would like to get more information from you so that I can use Arrow and contribute in the near future. What I see in Arrow is that I can read and write Arrow files (from the vector test classes), but I did not see tests for sending data over a network. As I understood from t