[jira] [Closed] (ARROW-17590) Lower memory usage with filters

2022-09-02 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin closed ARROW-17590. --- Resolution: Duplicate > Lower memory usage with filters > --- > > Key: A

[jira] [Commented] (ARROW-17590) Lower memory usage with filters

2022-09-02 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599489#comment-17599489 ] Yin commented on ARROW-17590: - Yep, pa.total_allocated_bytes 289.74639892578125 MB dt.nbytes

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Attachment: sample-1.py > Lower memory usage with filters > --- > > Ke

[jira] [Comment Edited] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599104#comment-17599104 ] Yin edited comment on ARROW-17590 at 9/1/22 7:07 PM: - Hi Weston, Ju

[jira] [Commented] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599104#comment-17599104 ] Yin commented on ARROW-17590: - Hi Weston, Just saw your comment. Will try it in the sample

[jira] [Commented] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599102#comment-17599102 ] Yin commented on ARROW-17590: - Hi Will, I am using pandas.read_parquet, which goes to pyarr

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Attachment: sample.py > Lower memory usage with filters > --- > > Key:

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23MB with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-09-01 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23MB with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23MB with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23mb with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23mb with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23mb with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a parquet file (about 23mb with 250K rows and 600 object/string columns with lots of

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a large parquet file with filter for a small number of rows, the memory usage is pre

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a large parquet file with filter for a small number of rows, the memory usage is pre

[jira] [Updated] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-17590: Description: Hi, When I read a large parquet file with filter for a small number of rows, the memory usage is pre

[jira] [Created] (ARROW-17590) Lower memory usage with filters

2022-08-31 Thread Yin (Jira)
Yin created ARROW-17590: --- Summary: Lower memory usage with filters Key: ARROW-17590 URL: https://issues.apache.org/jira/browse/ARROW-17590 Project: Apache Arrow Issue Type: Improvement Repo

[jira] [Commented] (ARROW-15724) [C++] Reduce directory and file IO when reading partition parquet dataset with partition key filters

2022-02-17 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494296#comment-17494296 ] Yin commented on ARROW-15724: - David, Thanks. Good to know. Looks like both FileSystemDatas

[jira] [Updated] (ARROW-15724) [C++] Reduce directory and file IO when reading partition parquet dataset with partition key filters

2022-02-17 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-15724: Summary: [C++] Reduce directory and file IO when reading partition parquet dataset with partition key filters (wa

[jira] [Updated] (ARROW-15724) reduce directory and file IO when reading partition parquet dataset

2022-02-17 Thread Yin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin updated ARROW-15724: Description: Hi, It seems that Arrow accesses all partitions directories (and even each parquet files), including

[jira] [Created] (ARROW-15724) reduce directory and file IO when reading partition parquet dataset

2022-02-17 Thread Yin (Jira)
Yin created ARROW-15724: --- Summary: reduce directory and file IO when reading partition parquet dataset Key: ARROW-15724 URL: https://issues.apache.org/jira/browse/ARROW-15724 Project: Apache Arrow Iss