Young-Jun Ko created ARROW-1925:
-----------------------------------

             Summary: Wrapping PyArrow Table with Numpy without copy
                 Key: ARROW-1925
                 URL: https://issues.apache.org/jira/browse/ARROW-1925
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.7.1
            Reporter: Young-Jun Ko
            Priority: Minor

The scenario is the following: I have a Parquet file with a column that holds a float array of constant size in every row, so the column can be thought of as a matrix. Currently, after reading the file, I convert the table to pandas and extract the column's values, which gives me a list of np.array objects, and then I call np.vstack to get the matrix. This involves a copy that would be nice to avoid.

When a Parquet file (or, more generally, a Parquet dataset) is read, are the values of the array column contiguous in memory, so that a view on the data could be created without copying? That would be neat.

Thanks!

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
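
To make the request concrete, here is a minimal NumPy-only sketch of the difference between the two access patterns. It does not call PyArrow at all: the `rows` list stands in for what the pandas conversion returns, and `flat` is a hypothetical stand-in for a single contiguous values buffer, which is only an assumption about what a reader might expose. The sizes `n_rows` and `width` are made up for illustration.

```python
import numpy as np

# Hypothetical sizes: 4 rows, each holding a float array of length 3.
n_rows, width = 4, 3

# Current approach: one small ndarray per row, stacked into a matrix.
# np.vstack allocates a new buffer and copies every row into it.
rows = [np.arange(i * width, (i + 1) * width, dtype=np.float64)
        for i in range(n_rows)]
stacked = np.vstack(rows)

# Hoped-for approach: if all values already sit in one contiguous buffer,
# reshaping that buffer yields the same matrix as a view, with no copy.
# `flat` is an assumed stand-in for such a buffer.
flat = np.arange(n_rows * width, dtype=np.float64)
matrix = flat.reshape(n_rows, width)

# np.shares_memory reports whether two arrays overlap in memory.
copied = not any(np.shares_memory(stacked, r) for r in rows)
is_view = np.shares_memory(matrix, flat)

print("vstack copied the rows:", copied)
print("reshape returned a view:", is_view)
```

Whether the reshape route is available in practice depends on the column arriving as one contiguous chunk with no nulls, which this sketch simply assumes.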