Error when converting csv to parquet in chunks, with the first chunk being all nulls

2017-07-07 Thread Alexey Strokach
I am running into a problem converting a csv file into a parquet file in chunks, where one of the string columns is null for the first several million rows. Self-contained dummy example: csv_file = '/tmp/df.csv' parquet_file = '/tmp/df.parquet' df = pd.DataFrame([np.nan] * 3 + ['hello'],

[jira] [Created] (ARROW-1194) Trouble deserializing a pandas DataFrame from a PyArrow buffer.

2017-07-07 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-1194: --- Summary: Trouble deserializing a pandas DataFrame from a PyArrow buffer. Key: ARROW-1194 URL: https://issues.apache.org/jira/browse/ARROW-1194 Project: Apache

[jira] [Created] (ARROW-1193) [C++] Support pkg-config for arrow_python.so

2017-07-07 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1193: --- Summary: [C++] Support pkg-config for arrow_python.so Key: ARROW-1193 URL: https://issues.apache.org/jira/browse/ARROW-1193 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-1192) [JAVA] Improve splitAndTransfer performance for List and Union vectors

2017-07-07 Thread Steven Phillips (JIRA)
Steven Phillips created ARROW-1192: -- Summary: [JAVA] Improve splitAndTransfer performance for List and Union vectors Key: ARROW-1192 URL: https://issues.apache.org/jira/browse/ARROW-1192 Project:

[jira] [Created] (ARROW-1191) [JAVA] Implement getField() method for the complex readers

2017-07-07 Thread Steven Phillips (JIRA)
Steven Phillips created ARROW-1191: -- Summary: [JAVA] Implement getField() method for the complex readers Key: ARROW-1191 URL: https://issues.apache.org/jira/browse/ARROW-1191 Project: Apache Arrow