[jira] [Comment Edited] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217917#comment-17217917 ] Joris Van den Bossche edited comment on ARROW-10344 at 10/20/20, 8:38 PM:

[jira] [Commented] (ARROW-10056) [Python] PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217904#comment-17217904 ] Gert Hulselmans commented on ARROW-10056: - I just tested the code you posted, and indeed this

[jira] [Created] (ARROW-10359) [R] Don't download linux binary if system requirements not met

2020-10-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-10359: --- Summary: [R] Don't download linux binary if system requirements not met Key: ARROW-10359 URL: https://issues.apache.org/jira/browse/ARROW-10359 Project: Apache

[jira] [Resolved] (ARROW-10320) [Rust] Convert RecordBatchIterator to a Stream

2020-10-20 Thread Jorge Leitão (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão resolved ARROW-10320. -- Resolution: Fixed Issue resolved by pull request 8473

[jira] [Updated] (ARROW-10358) [R] Followups to 2.0.0 release

2020-10-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10358: --- Labels: pull-request-available (was: ) > [R] Followups to 2.0.0 release >

[jira] [Created] (ARROW-10358) [R] Followups to 2.0.0 release

2020-10-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-10358: --- Summary: [R] Followups to 2.0.0 release Key: ARROW-10358 URL: https://issues.apache.org/jira/browse/ARROW-10358 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-10357) [R][CI] Add nightly job that checks reverse dependencies

2020-10-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-10357: --- Summary: [R][CI] Add nightly job that checks reverse dependencies Key: ARROW-10357 URL: https://issues.apache.org/jira/browse/ARROW-10357 Project: Apache Arrow

[jira] [Commented] (ARROW-10309) [Ruby] gem install red-arrow fails

2020-10-20 Thread Bhargav Parsi (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217804#comment-17217804 ] Bhargav Parsi commented on ARROW-10309: --- Sorry about that. We install rvm first and use `rvm

[jira] [Created] (ARROW-10356) [Rust] [DataFusion] Add support for is_in

2020-10-20 Thread Jorge Leitão (Jira)
Jorge Leitão created ARROW-10356: Summary: [Rust] [DataFusion] Add support for is_in Key: ARROW-10356 URL: https://issues.apache.org/jira/browse/ARROW-10356 Project: Apache Arrow Issue Type:

[jira] [Resolved] (ARROW-10318) [C++] Use pimpl idiom in CSV parser

2020-10-20 Thread Ben Kietzman (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-10318. -- Resolution: Fixed Issue resolved by pull request 8493

[jira] [Updated] (ARROW-10354) [Rust] [DataFusion] Add support for regex extract

2020-10-20 Thread Jorge Leitão (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão updated ARROW-10354: - Labels: beginner (was: ) > [Rust] [DataFusion] Add support for regex extract >

[jira] [Created] (ARROW-10354) [Rust] [DataFusion] Add support for regex extract

2020-10-20 Thread Jorge Leitão (Jira)
Jorge Leitão created ARROW-10354: Summary: [Rust] [DataFusion] Add support for regex extract Key: ARROW-10354 URL: https://issues.apache.org/jira/browse/ARROW-10354 Project: Apache Arrow

[jira] [Commented] (ARROW-1614) [C++] Add a Tensor logical value type with constant dimensions, implemented using ExtensionType

2020-10-20 Thread Christian Hudon (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217666#comment-17217666 ] Christian Hudon commented on ARROW-1614: I manifestly haven't had time to work on this yet,

[jira] [Resolved] (ARROW-10338) [Rust]: Use const fn for applicable methods

2020-10-20 Thread Jorge Leitão (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Leitão resolved ARROW-10338. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 8487

[jira] [Commented] (ARROW-10353) [C++] Parquet decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217634#comment-17217634 ] Wes McKinney commented on ARROW-10353: -- Note that DataPageV2 is not recommended for production use

[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217629#comment-17217629 ] Gert Hulselmans commented on ARROW-10344: - Thanks. I can still do the filtering afterwards, so

[jira] [Updated] (ARROW-10353) [C++] Parquet decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-10353: --- Summary: [C++] Parquet decompresses DataPageV2 pages even if is_compressed==0 (was: Arrow

[jira] [Comment Edited] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217628#comment-17217628 ] Antoine Pitrou edited comment on ARROW-10353 at 10/20/20, 2:17 PM: ---

[jira] [Updated] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-10353: --- Fix Version/s: 3.0.0 > Arrow Parquet Cpp decompresses DataPageV2 pages even if

[jira] [Commented] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217628#comment-17217628 ] Antoine Pitrou commented on ARROW-10353: Thanks for the report. Do you want to submit a PR? >

[jira] [Updated] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Finis updated ARROW-10353: -- Description: According to the parquet-format specification, DataPageV2 pages have an is_compressed

[jira] [Updated] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Finis updated ARROW-10353: -- Description: According to the parquet-format specification, DataPageV2 pages have an is_compressed

[jira] [Updated] (ARROW-10328) [C++] Consider using fast-double-parser

2020-10-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10328: --- Labels: pull-request-available (was: ) > [C++] Consider using fast-double-parser >

[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217594#comment-17217594 ] Joris Van den Bossche commented on ARROW-10344: --- bq. For filtering the data, is there an

[jira] [Commented] (ARROW-10056) [Python] PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.

2020-10-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217586#comment-17217586 ] Joris Van den Bossche commented on ARROW-10056: --- The pandas metadata is required for a

[jira] [Updated] (ARROW-10318) [C++] Use pimpl idiom in CSV parser

2020-10-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-10318: --- Labels: pull-request-available (was: ) > [C++] Use pimpl idiom in CSV parser >

[jira] [Updated] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Jan Finis (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Finis updated ARROW-10353: -- Description: According to the parquet-format specification, DataPageV2 pages have an is_compressed

[jira] [Created] (ARROW-10353) Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0

2020-10-20 Thread Jan Finis (Jira)
Jan Finis created ARROW-10353: - Summary: Arrow Parquet Cpp decompresses DataPageV2 pages even if is_compressed==0 Key: ARROW-10353 URL: https://issues.apache.org/jira/browse/ARROW-10353 Project: Apache

[jira] [Comment Edited] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217425#comment-17217425 ] Gert Hulselmans edited comment on ARROW-10344 at 10/20/20, 8:27 AM:

[jira] [Commented] (ARROW-10345) [C++] NaN breaks sorting

2020-10-20 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217426#comment-17217426 ] Antoine Pitrou commented on ARROW-10345: > Maybe it's better to partition NaN to end of array

[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217425#comment-17217425 ] Gert Hulselmans commented on ARROW-10344: - I was using Feather v2, but had to switch back to

[jira] [Comment Edited] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217425#comment-17217425 ] Gert Hulselmans edited comment on ARROW-10344 at 10/20/20, 8:24 AM:

[jira] [Commented] (ARROW-10056) [Python] PyArrow writes invalid Feather v2 file: OSError: Verification of flatbuffer-encoded Footer failed.

2020-10-20 Thread Gert Hulselmans (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217418#comment-17217418 ] Gert Hulselmans commented on ARROW-10056: - Does the pandas dataframe contain anything that is

[jira] [Created] (ARROW-10352) [CI][Gandiva] Travis osx nightly build is failing due to homebrew llvm upgrade

2020-10-20 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-10352: -- Summary: [CI][Gandiva] Travis osx nightly build is failing due to homebrew llvm upgrade Key: ARROW-10352 URL: https://issues.apache.org/jira/browse/ARROW-10352

[jira] [Commented] (ARROW-10344) [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file

2020-10-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217350#comment-17217350 ] Joris Van den Bossche commented on ARROW-10344: --- [~ghuls] good question, this is not

[jira] [Assigned] (ARROW-10345) [C++] NaN breaks sorting

2020-10-20 Thread Yibo Cai (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai reassigned ARROW-10345: Assignee: Yibo Cai > [C++] NaN breaks sorting > > >

[jira] [Commented] (ARROW-10345) [C++] NaN breaks sorting

2020-10-20 Thread Yibo Cai (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217337#comment-17217337 ] Yibo Cai commented on ARROW-10345: -- Numpy uses a special compare function to treat NaN as largest