Young-Jun Ko created ARROW-1925:
-----------------------------------

             Summary: Wrapping PyArrow Table with Numpy without copy
                 Key: ARROW-1925
                 URL: https://issues.apache.org/jira/browse/ARROW-1925
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.7.1
            Reporter: Young-Jun Ko
            Priority: Minor


The scenario is the following:
I have a Parquet file with a column in which every row holds a float array of
the same, constant size.
So it can be thought of as a matrix.
When I read the Parquet file, I currently access this column by converting the
table to pandas and extracting the column's values, which gives me a list of
np.array objects, and then calling np.vstack on them to build the matrix.
This involves a full copy of the data, which would be nice to avoid.
When a Parquet file (or, more generally, a Parquet dataset) is read, are the
values of the array column contiguous in memory, so that a NumPy view on the
data could be created without copying? That would be neat.
Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
