[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-09 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281869#comment-17281869 ] Pac A. He edited comment on ARROW-11456 at 2/9/21, 4:22 PM: We have seen

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-09 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281869#comment-17281869 ] Pac A. He commented on ARROW-11456: --- We have seen that there are one or more pyarrow limits at

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-09 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281869#comment-17281869 ] Pac A. He edited comment on ARROW-11456 at 2/9/21, 4:22 PM: We have seen

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279918#comment-17279918 ] Pac A. He edited comment on ARROW-11456 at 2/5/21, 7:01 PM: I see. I have

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading or writing a large parquet file, I have this error: {noformat} df:

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279918#comment-17279918 ] Pac A. He commented on ARROW-11456: --- I see. I have now added code to reproduce the issue. Basically,

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Environment: pyarrow 3.0.0 / 2.0.0 pandas 1.1.5 / 1.2.1 smart_open 4.1.2 python 3.8.6 was: pyarrow

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-04 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278976#comment-17278976 ] Pac A. He edited comment on ARROW-11456 at 2/4/21, 5:09 PM: Unfortunately I

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-04 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278976#comment-17278976 ] Pac A. He edited comment on ARROW-11456 at 2/4/21, 5:09 PM: Unfortunately I

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-04 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278976#comment-17278976 ] Pac A. He edited comment on ARROW-11456 at 2/4/21, 5:08 PM: Unfortunately I

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-04 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278976#comment-17278976 ] Pac A. He commented on ARROW-11456: --- Unfortunately I have not been able to produce a reproducible

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277234#comment-17277234 ] Pac A. He edited comment on ARROW-11456 at 2/2/21, 4:12 PM: For what it's

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277234#comment-17277234 ] Pac A. He commented on ARROW-11456: --- For what it's worth, {{fastparquet}} v0.5.0 had no trouble at all

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Created] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
Pac A. He created ARROW-11464: - Summary: [Python] pyarrow.parquet.read_pandas doesn't conform to its docs Key: ARROW-11464 URL: https://issues.apache.org/jira/browse/ARROW-11464 Project: Apache Arrow

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11464: -- Description: The {{*pyarrow.parquet.read_pandas*}} 

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276501#comment-17276501 ] Pac A. He edited comment on ARROW-11456 at 2/1/21, 5:21 PM:

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276501#comment-17276501 ] Pac A. He edited comment on ARROW-11456 at 2/1/21, 5:21 PM:

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276501#comment-17276501 ] Pac A. He edited comment on ARROW-11456 at 2/1/21, 5:20 PM:

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276501#comment-17276501 ] Pac A. He commented on ARROW-11456: --- [~jorisvandenbossche] This is very difficult in this case because

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Updated] (ARROW-11456) OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Environment: pyarrow 3.0.0 / 2.0.0 pandas 1.2.1 python 3.8.6 was: pyarrow 3.0.0 / 2.0.0 pandas

[jira] [Created] (ARROW-11456) OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements

2021-02-01 Thread Pac A. He (Jira)
Pac A. He created ARROW-11456: - Summary: OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements Key: ARROW-11456 URL: https://issues.apache.org/jira/browse/ARROW-11456

[jira] [Updated] (ARROW-11456) OSError: Capacity error: BinaryBuilder cannot reserve space for more than 2147483646 child elements

2021-02-01 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: -- Description: When reading a large parquet file, I have this error: {noformat} df: Final =

[jira] [Closed] (ARROW-10152) "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow"

2020-10-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He closed ARROW-10152. - > "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow" >

[jira] [Comment Edited] (ARROW-10152) "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow"

2020-10-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208154#comment-17208154 ] Pac A. He edited comment on ARROW-10152 at 10/5/20, 4:03 PM: - There is

[jira] [Commented] (ARROW-10152) "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow"

2020-10-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208154#comment-17208154 ] Pac A. He commented on ARROW-10152: --- There is nothing wrong with `environment.yml`. The issue was

[jira] [Updated] (ARROW-10152) "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow"

2020-10-05 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-10152: -- Description: I cannot run "{{import pyarrow}}" with {{pyarrow=1.0.1}} in dockerized miniconda. It

[jira] [Created] (ARROW-10152) "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow"

2020-10-01 Thread Pac A. He (Jira)
Pac A. He created ARROW-10152: - Summary: "ImportError: liborc.so" with miniconda pyarrow=1.0.1 when "import pyarrow" Key: ARROW-10152 URL: https://issues.apache.org/jira/browse/ARROW-10152 Project: