[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin closed ARROW-17590.
---
Resolution: Duplicate
> Lower memory usage with filters
> ---
>
> Key: A
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599489#comment-17599489
]
Yin commented on ARROW-17590:
-
Yep,
pa.total_allocated_bytes 289.74639892578125 MB dt.nbytes
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Attachment: sample-1.py
> Lower memory usage with filters
> ---
>
> Ke
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599104#comment-17599104
]
Yin edited comment on ARROW-17590 at 9/1/22 7:07 PM:
-
Hi Weston, Ju
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599104#comment-17599104
]
Yin commented on ARROW-17590:
-
Hi Weston, Just saw your comment. Will try it in the sample
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599102#comment-17599102
]
Yin commented on ARROW-17590:
-
Hi Will,
I am using pandas.read_parquet, which goes to pyarr
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Attachment: sample.py
> Lower memory usage with filters
> ---
>
> Key:
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23MB with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23MB with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23MB with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23mb with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23mb with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23mb with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a parquet file (about 23mb with 250K rows and 600 object/string
columns with lots of
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a large parquet file with filter for a small number of rows, the
memory usage is pre
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a large parquet file with filter for a small number of rows, the
memory usage is pre
[
https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-17590:
Description:
Hi,
When I read a large parquet file with filter for a small number of rows, the
memory usage is pre
Yin created ARROW-17590:
---
Summary: Lower memory usage with filters
Key: ARROW-17590
URL: https://issues.apache.org/jira/browse/ARROW-17590
Project: Apache Arrow
Issue Type: Improvement
Repo
[
https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494296#comment-17494296
]
Yin commented on ARROW-15724:
-
David, Thanks. Good to know.
Looks like both FileSystemDatas
[
https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-15724:
Summary: [C++] Reduce directory and file IO when reading partition parquet
dataset with partition key filters (wa
[
https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin updated ARROW-15724:
Description:
Hi,
It seems that Arrow accesses all partition directories (and even each parquet
file), including
Yin created ARROW-15724:
---
Summary: reduce directory and file IO when reading partition
parquet dataset
Key: ARROW-15724
URL: https://issues.apache.org/jira/browse/ARROW-15724
Project: Apache Arrow
Iss
22 matches