[jira] [Commented] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-11-03 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628471#comment-17628471 ] Ziheng Wang commented on ARROW-17984: - ``` Thread 42 (Thread 0x7fd1d77fe700 (LWP 10

[jira] [Commented] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-28 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625816#comment-17625816 ] Ziheng Wang commented on ARROW-17984: - I have attached a crash file. You can unpack

[jira] [Updated] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-28 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17984: Attachment: _usr_bin_python3.8.1000.crash > pq.read_table doesn't seem to be thread safe > ---

[jira] [Created] (ARROW-18105) Arrow Flight SegFault

2022-10-19 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-18105: --- Summary: Arrow Flight SegFault Key: ARROW-18105 URL: https://issues.apache.org/jira/browse/ARROW-18105 Project: Apache Arrow Issue Type: Bug Componen

[jira] [Commented] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-11 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616016#comment-17616016 ] Ziheng Wang commented on ARROW-17984: - Unfortunately I cannot figure out how to get

[jira] [Updated] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-10 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17984: Description: Before PR: [https://github.com/apache/arrow/pull/13799] gets merged in master, I am

[jira] [Updated] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-10 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17984: Description: Before PR: [https://github.com/apache/arrow/pull/13799] gets merged in master, I am

[jira] [Created] (ARROW-17984) pq.read_table doesn't seem to be thread safe

2022-10-10 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17984: --- Summary: pq.read_table doesn't seem to be thread safe Key: ARROW-17984 URL: https://issues.apache.org/jira/browse/ARROW-17984 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-17529) Clean up how the CSV reader handles the first buffer

2022-08-25 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17529: --- Summary: Clean up how the CSV reader handles the first buffer Key: ARROW-17529 URL: https://issues.apache.org/jira/browse/ARROW-17529 Project: Apache Arrow Iss

[jira] [Updated] (ARROW-17481) Major performance improvements to CSV reading from S3

2022-08-19 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17481: Description: The current dataset reader for CSV is pretty slow on EC2 reading from S3. EC2 instan

[jira] [Updated] (ARROW-17481) [C++] [Python] Major performance improvements to CSV reading from S3

2022-08-19 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17481: Summary: [C++] [Python] Major performance improvements to CSV reading from S3 (was: Major perform

[jira] [Created] (ARROW-17481) Major performance improvements to CSV reading from S3

2022-08-19 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17481: --- Summary: Major performance improvements to CSV reading from S3 Key: ARROW-17481 URL: https://issues.apache.org/jira/browse/ARROW-17481 Project: Apache Arrow Is

[jira] [Updated] (ARROW-17380) [C++] [Python] Tag record batches with start_byte and end_byte information

2022-08-10 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17380: Summary: [C++] [Python] Tag record batches with start_byte and end_byte information (was: Tag rec

[jira] [Created] (ARROW-17380) Tag record batches with start_byte and end_byte information

2022-08-10 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17380: --- Summary: Tag record batches with start_byte and end_byte information Key: ARROW-17380 URL: https://issues.apache.org/jira/browse/ARROW-17380 Project: Apache Arrow

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576926#comment-17576926 ] Ziheng Wang commented on ARROW-17313: - There is no physical way you can do this with

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576079#comment-17576079 ] Ziheng Wang commented on ARROW-17313: - Ideally we update the Dataset Scanner to be a

[jira] [Updated] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17313: Description: Sometimes it's desirable to just read a portion of a CSV. The best way to do that is

[jira] [Updated] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17313: Description: Sometimes it's desirable to just read a portion of a CSV. The best way to do that is

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575967#comment-17575967 ] Ziheng Wang commented on ARROW-17313: - Also this will not support compressed formats

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575916#comment-17575916 ] Ziheng Wang commented on ARROW-17313: - Ah I meant what we should do about the linbre

[jira] [Commented] (ARROW-17313) Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575862#comment-17575862 ] Ziheng Wang commented on ARROW-17313: - [~apitrou] can you elaborate a bit on your wa

[jira] [Commented] (ARROW-17313) Add Byte Range to CSV Reader ReadOptions

2022-08-04 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575545#comment-17575545 ] Ziheng Wang commented on ARROW-17313: - I think if you support things like this, then

[jira] [Commented] (ARROW-17313) Add Byte Range to CSV Reader ReadOptions

2022-08-04 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575489#comment-17575489 ] Ziheng Wang commented on ARROW-17313: - My proposal is that we will allow additional

[jira] [Comment Edited] (ARROW-17313) Add Byte Range to CSV Reader ReadOptions

2022-08-04 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575489#comment-17575489 ] Ziheng Wang edited comment on ARROW-17313 at 8/5/22 12:02 AM:

[jira] [Created] (ARROW-17313) Add Byte Range to CSV Reader ReadOptions

2022-08-04 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17313: --- Summary: Add Byte Range to CSV Reader ReadOptions Key: ARROW-17313 URL: https://issues.apache.org/jira/browse/ARROW-17313 Project: Apache Arrow Issue Type: Imp

[jira] [Updated] (ARROW-17299) [C++] [Python] Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters

2022-08-03 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-17299: Summary: [C++] [Python] Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead pa

[jira] [Created] (ARROW-17299) Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters

2022-08-03 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17299: --- Summary: Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters Key: ARROW-17299 URL: https://issues.apache.org/jira/browse/ARROW-17299 Proj

[jira] [Assigned] (ARROW-14635) [C++][Dataset] Devise a mechanism to limit the total "system ram" (process + cache) used by dataset writes

2022-07-19 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang reassigned ARROW-14635: --- Assignee: Ziheng Wang > [C++][Dataset] Devise a mechanism to limit the total "system ram" (

[jira] [Closed] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-19 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang closed ARROW-17114. --- Resolution: Duplicate > [Python][C++] O_DIRECT write support >

[jira] [Assigned] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-18 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang reassigned ARROW-17114: --- Assignee: Ziheng Wang > [Python][C++] O_DIRECT write support > ---

[jira] [Created] (ARROW-17114) [Python][C++] O_DIRECT write support

2022-07-18 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17114: --- Summary: [Python][C++] O_DIRECT write support Key: ARROW-17114 URL: https://issues.apache.org/jira/browse/ARROW-17114 Project: Apache Arrow Issue Type: New Fe

[jira] [Assigned] (ARROW-16521) [C++][R] Configure curl timeout policy for S3

2022-06-13 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang reassigned ARROW-16521: --- Assignee: Ziheng Wang > [C++][R] Configure curl timeout policy for S3 > ---

[jira] [Commented] (ARROW-16037) Possible memory leak in compute.take

2022-03-28 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513609#comment-17513609 ] Ziheng Wang commented on ARROW-16037: - I am on ubuntu pa.default_memory_pool().backe

[jira] [Commented] (ARROW-16037) Possible memory leak in compute.take

2022-03-28 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513573#comment-17513573 ] Ziheng Wang commented on ARROW-16037: - Does not help. mem usage 179580928 8000

[jira] [Updated] (ARROW-16037) Possible memory leak in compute.take

2022-03-26 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziheng Wang updated ARROW-16037: Priority: Blocker (was: Major) > Possible memory leak in compute.take > -

[jira] [Created] (ARROW-16037) Possible memory leak in compute.take

2022-03-26 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-16037: --- Summary: Possible memory leak in compute.take Key: ARROW-16037 URL: https://issues.apache.org/jira/browse/ARROW-16037 Project: Apache Arrow Issue Type: Bug