[jira] [Assigned] (ARROW-3741) [R] Add support for arrow::compute::Cast to convert Arrow arrays from one type to another

2018-11-13 Thread JIRA
[ https://issues.apache.org/jira/browse/ARROW-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain François reassigned ARROW-3741: -- Assignee: Romain François > [R] Add support for arrow::compute::Cast to convert Arrow

[jira] [Updated] (ARROW-3787) Implement From for BinaryArray

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3787: -- Labels: pull-request-available (was: ) > Implement From for BinaryArray >

[jira] [Created] (ARROW-3787) Implement From for BinaryArray

2018-11-13 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-3787: -- Summary: Implement From for BinaryArray Key: ARROW-3787 URL: https://issues.apache.org/jira/browse/ARROW-3787 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-3786) Enable merge_arrow_pr.py script to run in non-English JIRA accounts.

2018-11-13 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-3786: --- Summary: Enable merge_arrow_pr.py script to run in non-English JIRA accounts. Key: ARROW-3786 URL: https://issues.apache.org/jira/browse/ARROW-3786 Project: Apache

[jira] [Updated] (ARROW-2956) [Python] Arrow plasma throws ArrowIOError and process crashed

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2956: Summary: [Python] Arrow plasma throws ArrowIOError and process crashed (was: [Python]Arrow plasma

[jira] [Created] (ARROW-3785) [C++] Use double-conversion conda package in CI toolchain

2018-11-13 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3785: --- Summary: [C++] Use double-conversion conda package in CI toolchain Key: ARROW-3785 URL: https://issues.apache.org/jira/browse/ARROW-3785 Project: Apache Arrow

[jira] [Updated] (ARROW-3780) [R] Failed to fetch data: invalid data when collecting int16

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3780: -- Labels: pull-request-available spark (was: spark) > [R] Failed to fetch data: invalid data

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685945#comment-16685945 ] Wes McKinney commented on ARROW-3781: - It would definitely require some design work. In

[jira] [Created] (ARROW-3784) [R] Array with type fails with x is not a vector

2018-11-13 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3784: -- Summary: [R] Array with type fails with x is not a vector Key: ARROW-3784 URL: https://issues.apache.org/jira/browse/ARROW-3784 Project: Apache Arrow

[jira] [Updated] (ARROW-3784) [R] Array with type fails with x is not a vector

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3784: -- Labels: pull-request-available (was: ) > [R] Array with type fails with x is not a vector >

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685817#comment-16685817 ] Antoine Pitrou commented on ARROW-3781: --- We may want to think about flushing in a separate thread,

[jira] [Comment Edited] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685812#comment-16685812 ] Wes McKinney edited comment on ARROW-3781 at 11/13/18 10:07 PM: What I

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685812#comment-16685812 ] Wes McKinney commented on ARROW-3781: - What I mean is that if I call {{out->Flush()}} it may not be

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685809#comment-16685809 ] Wes McKinney commented on ARROW-3781: - Sorry, I'm using file systems here again proverbially.

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685791#comment-16685791 ] Antoine Pitrou commented on ARROW-3781: --- Are you thinking about the `Flush` method? It's as

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685779#comment-16685779 ] Wes McKinney commented on ARROW-3781: - I'm thinking about the "file systems" HDFS, AWS S3, Google

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685784#comment-16685784 ] Wes McKinney commented on ARROW-3781: - For cloud stores, at some point we might want to consider

[jira] [Updated] (ARROW-3783) [R] Incorrect collection of float type

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3783: -- Labels: pull-request-available (was: ) > [R] Incorrect collection of float type >

[jira] [Created] (ARROW-3783) [R] Incorrect collection of float type

2018-11-13 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3783: -- Summary: [R] Incorrect collection of float type Key: ARROW-3783 URL: https://issues.apache.org/jira/browse/ARROW-3783 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-3782) [C++] Implement BufferedReader for C++

2018-11-13 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3782: --- Summary: [C++] Implement BufferedReader for C++ Key: ARROW-3782 URL: https://issues.apache.org/jira/browse/ARROW-3782 Project: Apache Arrow Issue Type: New

[jira] [Resolved] (ARROW-3306) [R] Objects and support functions different kinds of arrow::Buffer

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3306. - Resolution: Fixed Assignee: Romain François This was resolved in passing. > [R] Objects

[jira] [Commented] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685727#comment-16685727 ] Antoine Pitrou commented on ARROW-3781: --- I don't think it's dependent on filesystem latency. Unless

[jira] [Updated] (ARROW-2237) [Python] [Plasma] Huge pages test failure

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2237: Summary: [Python] [Plasma] Huge pages test failure (was: [Python] Huge tables test failure) >

[jira] [Created] (ARROW-3781) [C++] Configure buffer size in arrow::io::BufferedOutputStream

2018-11-13 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3781: --- Summary: [C++] Configure buffer size in arrow::io::BufferedOutputStream Key: ARROW-3781 URL: https://issues.apache.org/jira/browse/ARROW-3781 Project: Apache Arrow

[jira] [Updated] (ARROW-2807) [Python] Enable memory-mapping to be toggled in get_reader when reading Parquet files

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2807: -- Labels: parquet pull-request-available (was: parquet) > [Python] Enable memory-mapping to be

[jira] [Commented] (ARROW-3344) [Python] test_plasma.py fails (in test_plasma_list)

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685711#comment-16685711 ] Wes McKinney commented on ARROW-3344: - This bug is still present for me on Ubuntu 14.04 {code}

[jira] [Assigned] (ARROW-2807) [Python] Enable memory-mapping to be toggled in get_reader when reading Parquet files

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2807: --- Assignee: Wes McKinney > [Python] Enable memory-mapping to be toggled in get_reader when

[jira] [Commented] (ARROW-3780) [R] Failed to fetch data: invalid data when collecting int16

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685675#comment-16685675 ] Wes McKinney commented on ARROW-3780: - I was pretty sure this non-specific error message was going to

[jira] [Commented] (ARROW-3779) [Python] Validate timezone passed to pa.timestamp

2018-11-13 Thread Krisztian Szucs (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685682#comment-16685682 ] Krisztian Szucs commented on ARROW-3779: Renamed. I created the issue before I saw that... On

[jira] [Updated] (ARROW-3779) [Python] Validate timezone passed to pa.timestamp

2018-11-13 Thread Krisztian Szucs (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-3779: --- Summary: [Python] Validate timezone passed to pa.timestamp (was: [Format] Standardize

[jira] [Updated] (ARROW-3780) [R] Failed to fetch data: invalid data when collecting int16

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3780: Labels: spark (was: ) > [R] Failed to fetch data: invalid data when collecting int16 >

[jira] [Updated] (ARROW-3780) [R] Failed to fetch data: invalid data when collecting int16

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3780: Fix Version/s: 0.12.0 > [R] Failed to fetch data: invalid data when collecting int16 >

[jira] [Created] (ARROW-3780) [R] Failed to fetch data: invalid data when collecting int16

2018-11-13 Thread Javier Luraschi (JIRA)
Javier Luraschi created ARROW-3780: -- Summary: [R] Failed to fetch data: invalid data when collecting int16 Key: ARROW-3780 URL: https://issues.apache.org/jira/browse/ARROW-3780 Project: Apache Arrow

[jira] [Commented] (ARROW-3779) [Format] Standardize timezone specification

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685672#comment-16685672 ] Wes McKinney commented on ARROW-3779: - What do we need to do beyond what's in Schema.fbs?

[jira] [Created] (ARROW-3779) [Format] Standardize timezone specification

2018-11-13 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3779: -- Summary: [Format] Standardize timezone specification Key: ARROW-3779 URL: https://issues.apache.org/jira/browse/ARROW-3779 Project: Apache Arrow Issue

[jira] [Updated] (ARROW-3778) [C++] Don't put implementations in test-util.h

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3778: Fix Version/s: 0.12.0 > [C++] Don't put implementations in test-util.h >

[jira] [Commented] (ARROW-3778) [C++] Don't put implementations in test-util.h

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685666#comment-16685666 ] Wes McKinney commented on ARROW-3778: - Agreed. I had partly done this in

[jira] [Updated] (ARROW-3738) [C++] Add CSV conversion option to parse ISO8601-like timestamp strings

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3738: -- Labels: csv pull-request-available (was: csv) > [C++] Add CSV conversion option to parse

[jira] [Created] (ARROW-3778) [C++] Don't put implementations in test-util.h

2018-11-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3778: - Summary: [C++] Don't put implementations in test-util.h Key: ARROW-3778 URL: https://issues.apache.org/jira/browse/ARROW-3778 Project: Apache Arrow Issue

[jira] [Updated] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1837: Fix Version/s: (was: 0.12.0) 0.13.0 > [Java] Unable to read unsigned

[jira] [Updated] (ARROW-1875) Write 64-bit ints as strings in integration test JSON files

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1875: Fix Version/s: (was: 0.12.0) 0.13.0 > Write 64-bit ints as strings in

[jira] [Assigned] (ARROW-3738) [C++] Add CSV conversion option to parse ISO8601-like timestamp strings

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-3738: - Assignee: Antoine Pitrou > [C++] Add CSV conversion option to parse ISO8601-like

[jira] [Updated] (ARROW-3085) [Rust] Add an adapter for parquet.

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3085: Component/s: Rust > [Rust] Add an adapter for parquet.

[jira] [Updated] (ARROW-3346) [Python] Segfault when reading parquet files if torch is imported before pyarrow

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3346: Summary: [Python] Segfault when reading parquet files if torch is imported before pyarrow (was:

[jira] [Updated] (ARROW-2786) [JS] Read Parquet files in JavaScript

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2786: Labels: parquet (was: ) > [JS] Read Parquet files in JavaScript >

[jira] [Updated] (ARROW-2627) [Python] Add option (or some equivalent) to toggle memory mapping functionality when using parquet.ParquetFile or other read entry points

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2627: Labels: parquet (was: ) > [Python] Add option (or some equivalent) to toggle memory mapping >

[jira] [Updated] (ARROW-2079) [Python] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2079: Labels: parquet (was: ) > [Python] Possibly use `_common_metadata` for schema if `_metadata`

[jira] [Updated] (ARROW-1957) [Python] Handle nanosecond timestamps in parquet serialization

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1957: Labels: parquet (was: ) > [Python] Handle nanosecond timestamps in parquet serialization >

[jira] [Updated] (ARROW-1957) [Python] Handle nanosecond timestamps in parquet serialization

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1957: Component/s: Python > [Python] Handle nanosecond timestamps in parquet serialization >

[jira] [Updated] (ARROW-3248) [C++] Arrow tests should have label "arrow"

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3248: Fix Version/s: 0.12.0 > [C++] Arrow tests should have label "arrow" >

[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3325: Labels: parquet (was: ) > [Python] Support reading Parquet binary/string columns as pandas

[jira] [Updated] (ARROW-1957) [Python] Handle nanosecond timestamps in parquet serialization

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1957: Summary: [Python] Handle nanosecond timestamps in parquet serialization (was: Handle nanosecond

[jira] [Updated] (ARROW-3085) [Rust] Add an adapter for parquet.

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3085: Labels: parquet (was: ) > [Rust] Add an adapter for parquet.

[jira] [Updated] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3762: Labels: parquet (was: ) > [C++] Arrow table reads error when overflowing capacity of BinaryArray

[jira] [Closed] (ARROW-3139) [Python] ArrowIOError: Arrow error: Capacity error during read

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3139. --- Resolution: Duplicate > [Python] ArrowIOError: Arrow error: Capacity error during read >

[jira] [Updated] (ARROW-2360) [C++] Add set_chunksize for RecordBatchReader in arrow/record_batch.h

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2360: Component/s: C++ > [C++] Add set_chunksize for RecordBatchReader in arrow/record_batch.h >

[jira] [Updated] (ARROW-3208) [Python] Segmentation fault when reading a Parquet partitioned dataset to a Parquet file

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3208: Labels: parquet (was: ) > [Python] Segmentation fault when reading a Parquet partitioned dataset

[jira] [Updated] (ARROW-3722) [C++] Allow specifying column types to CSV reader

2018-11-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3722: -- Labels: pull-request-available (was: ) > [C++] Allow specifying column types to CSV reader >

[jira] [Updated] (ARROW-3502) [C++] parquet-column_scanner-test failure building ARROW_PARQUET build 11.

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3502: Labels: parquet (was: ) > [C++] parquet-column_scanner-test failure building ARROW_PARQUET build

[jira] [Updated] (ARROW-3731) [R] R API for reading and writing Parquet files

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3731: Labels: parquet (was: ) > [R] R API for reading and writing Parquet files >

[jira] [Updated] (ARROW-3703) [Python] DataFrame.to_parquet crashes if datetime column has time zones

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3703: Labels: parquet (was: ) > [Python] DataFrame.to_parquet crashes if datetime column has time zones

[jira] [Updated] (ARROW-1988) [Python] Extend flavor=spark in Parquet writing to handle INT types

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1988: Labels: parquet (was: ) > [Python] Extend flavor=spark in Parquet writing to handle INT types >

[jira] [Updated] (ARROW-3166) [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3166: Labels: parquet (was: ) > [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp >

[jira] [Updated] (ARROW-3728) [Python] Merging Parquet Files - Pandas Meta in Schema Mismatch

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3728: Labels: parquet (was: ) > [Python] Merging Parquet Files - Pandas Meta in Schema Mismatch >

[jira] [Commented] (ARROW-3525) [Packaging] Remove arrow/ and parquet-cpp/ dependencies in dev/run_docker_compose.sh

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685329#comment-16685329 ] Wes McKinney commented on ARROW-3525: - [~kszucs] was this completed? > [Packaging] Remove arrow/ and

[jira] [Updated] (ARROW-2624) [Python] Random schema and data generator for Arrow conversion and Parquet testing

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2624: Labels: parquet (was: ) > [Python] Random schema and data generator for Arrow conversion and

[jira] [Updated] (ARROW-1848) [Python] Add documentation examples for reading single Parquet files and datasets from HDFS

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1848: Labels: parquet (was: ) > [Python] Add documentation examples for reading single Parquet files

[jira] [Updated] (ARROW-2728) [Python] Support partitioned Parquet datasets using glob-style file paths

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2728: Labels: parquet (was: newbie) > [Python] Support partitioned Parquet datasets using glob-style

[jira] [Updated] (ARROW-3210) [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3210: Labels: parquet (was: ) > [Python] Creating ParquetDataset creates partitioned ParquetFiles with

[jira] [Commented] (ARROW-3722) [C++] Allow specifying column types to CSV reader

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685325#comment-16685325 ] Antoine Pitrou commented on ARROW-3722: --- > We also need a way to provide column names (or even

[jira] [Updated] (ARROW-2628) [Python] parquet.write_to_dataset is memory-hungry on large DataFrames

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2628: Labels: parquet (was: ) > [Python] parquet.write_to_dataset is memory-hungry on large DataFrames

[jira] [Updated] (ARROW-1012) [C++] Create implementation of StreamReader that reads from Apache Parquet files

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1012: Labels: parquet (was: ) > [C++] Create implementation of StreamReader that reads from Apache

[jira] [Updated] (ARROW-2366) [Python] Support reading Parquet files having a permutation of column order

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2366: Labels: parquet (was: ) > [Python] Support reading Parquet files having a permutation of column

[jira] [Updated] (ARROW-2038) [Python] Follow-up bug fixes for s3fs Parquet support

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2038: Labels: aws parquet (was: aws) > [Python] Follow-up bug fixes for s3fs Parquet support >

[jira] [Updated] (ARROW-2077) [Python] Document on how to use Storefact & Arrow to read Parquet from S3/Azure/...

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2077: Labels: parquet (was: ) > [Python] Document on how to use Storefact & Arrow to read Parquet from

[jira] [Updated] (ARROW-1682) [Python] Add documentation / example for reading a directory of Parquet files on S3

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1682: Labels: parquet (was: ) > [Python] Add documentation / example for reading a directory of Parquet

[jira] [Assigned] (ARROW-3722) [C++] Allow specifying column types to CSV reader

2018-11-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-3722: - Assignee: Antoine Pitrou > [C++] Allow specifying column types to CSV reader >

[jira] [Updated] (ARROW-1925) [Python] Wrapping PyArrow Table with Numpy without copy

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1925: Summary: [Python] Wrapping PyArrow Table with Numpy without copy (was: Wrapping PyArrow Table

[jira] [Updated] (ARROW-976) [Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-976: --- Labels: parquet (was: ) > [Python] Provide API for defining and reading Parquet datasets with more

[jira] [Updated] (ARROW-1925) [Python] Wrapping PyArrow Table with Numpy without copy

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1925: Labels: parquet (was: ) > [Python] Wrapping PyArrow Table with Numpy without copy >

[jira] [Updated] (ARROW-2360) [C++] Add set_chunksize for RecordBatchReader in arrow/record_batch.h

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2360: Summary: [C++] Add set_chunksize for RecordBatchReader in arrow/record_batch.h (was: Add

[jira] [Updated] (ARROW-3762) [C++] Arrow table reads error when overflowing capacity of BinaryArray

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3762: Component/s: Python C++ > [C++] Arrow table reads error when overflowing capacity

[jira] [Updated] (ARROW-2592) [Python] AssertionError in to_pandas()

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2592: Labels: parquet (was: ) > [Python] AssertionError in to_pandas() >

[jira] [Updated] (ARROW-2659) [Python] More graceful reading of empty String columns in ParquetDataset

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2659: Labels: parquet (was: beginner) > [Python] More graceful reading of empty String columns in

[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3538: Labels: features parquet (was: features) > [Python] ability to override the automated assignment

[jira] [Updated] (ARROW-2098) [Python] Implement "errors as null" option when coercing Python object arrays to Arrow format

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2098: Labels: parquet (was: ) > [Python] Implement "errors as null" option when coercing Python object

[jira] [Updated] (ARROW-2598) [Python] table.to_pandas segfault

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2598: Labels: parquet (was: ) > [Python] table.to_pandas segfault > --

[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3538: Summary: [Python] ability to override the automated assignment of uuid for filenames when writing

[jira] [Updated] (ARROW-2026) [Python] µs timestamps saved as int64 even if use_deprecated_int96_timestamps=True

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2026: Labels: parquet redshift timestamps (was: redshift timestamps) > [Python] µs timestamps saved as

[jira] [Updated] (ARROW-2079) [Python] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2079: Summary: [Python] Possibly use `_common_metadata` for schema if `_metadata` isn't available (was:

[jira] [Updated] (ARROW-3139) [Python] ArrowIOError: Arrow error: Capacity error during read

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3139: Labels: parquet (was: ) > [Python] ArrowIOError: Arrow error: Capacity error during read >

[jira] [Reopened] (ARROW-3139) [Python] ArrowIOError: Arrow error: Capacity error during read

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reopened ARROW-3139: - > [Python] ArrowIOError: Arrow error: Capacity error during read >

[jira] [Updated] (ARROW-2710) [Python] pyarrow.lib.ArrowIOError when running PyTorch DataLoader in multiprocessing

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2710: Labels: parquet (was: ) > [Python] pyarrow.lib.ArrowIOError when running PyTorch DataLoader in >

[jira] [Updated] (ARROW-2591) [Python] Segmentationfault issue in pq.write_table

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2591: Labels: parquet (was: ) > [Python] Segmentationfault issue in pq.write_table >

[jira] [Closed] (ARROW-3139) [Python] ArrowIOError: Arrow error: Capacity error during read

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3139. --- Resolution: Duplicate duplicate of ARROW-3762 (formerly PARQUET-1239) > [Python] ArrowIOError:

[jira] [Updated] (ARROW-3585) [Python] Update the documentation about Schema & Metadata usage

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3585: Labels: beginner documentation easyfix newbie parquet (was: beginner documentation easyfix

[jira] [Updated] (ARROW-3139) [Python] ArrowIOError: Arrow error: Capacity error during read

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3139: Summary: [Python] ArrowIOError: Arrow error: Capacity error during read (was:

[jira] [Updated] (ARROW-3652) [Python] CategoricalIndex is lost after reading back

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3652: Labels: parquet (was: ) > [Python] CategoricalIndex is lost after reading back >

[jira] [Updated] (ARROW-3654) [Python] Column with CategoricalIndex fails to be read back

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3654: Labels: parquet (was: ) > [Python] Column with CategoricalIndex fails to be read back >

[jira] [Updated] (ARROW-3650) [Python] Mixed column indexes are read back as strings

2018-11-13 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3650: Labels: parquet (was: ) > [Python] Mixed column indexes are read back as strings >
