I am running into a problem converting a csv file into a parquet file in
chunks, where one of the string columns is null for the first several
million rows.
Self-contained dummy example:
csv_file = '/tmp/df.csv'
parquet_file = '/tmp/df.parquet'
df = pd.DataFrame([np.nan] * 3 + ['hello'],
Robert Nishihara created ARROW-1194:
---
Summary: Trouble deserializing a pandas DataFrame from a PyArrow
buffer.
Key: ARROW-1194
URL: https://issues.apache.org/jira/browse/ARROW-1194
Project: Apache Arrow
Kouhei Sutou created ARROW-1193:
---
Summary: [C++] Support pkg-config for arrow_python.so
Key: ARROW-1193
URL: https://issues.apache.org/jira/browse/ARROW-1193
Project: Apache Arrow
Issue Type:
Steven Phillips created ARROW-1192:
---
Summary: [JAVA] Improve splitAndTransfer performance for List and
Union vectors
Key: ARROW-1192
URL: https://issues.apache.org/jira/browse/ARROW-1192
Project: Apache Arrow
Steven Phillips created ARROW-1191:
---
Summary: [JAVA] Implement getField() method for the complex readers
Key: ARROW-1191
URL: https://issues.apache.org/jira/browse/ARROW-1191
Project: Apache Arrow