[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread David Li (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014797#comment-17014797 ] David Li commented on PARQUET-1698: --- Probably there are a few layers: * The file I/O layer should

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014781#comment-17014781 ] Wes McKinney commented on PARQUET-1698: --- Currently in the C++ library, IO calls are issued

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Jacques Nadeau (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014777#comment-17014777 ] Jacques Nadeau commented on PARQUET-1698: - In our internal work we actually separate this out

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014744#comment-17014744 ] Wes McKinney commented on PARQUET-1698: --- I think the pre-buffering should probably be implemented

[jira] [Moved] (PARQUET-1766) [C++] parquet NaN/null double statistics can result in endless loop

2020-01-13 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney moved ARROW-7376 to PARQUET-1766: -- Component/s: (was: C++) parquet-cpp

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Deepak Majeti (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014666#comment-17014666 ] Deepak Majeti commented on PARQUET-1698: How about adding API to the _RowGroupReader_ that will

[jira] [Comment Edited] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Deepak Majeti (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014666#comment-17014666 ] Deepak Majeti edited comment on PARQUET-1698 at 1/13/20 9:36 PM: - How

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread David Li (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014647#comment-17014647 ] David Li commented on PARQUET-1698: --- [~wesm] agreed, this optimization can be generic over the

[jira] [Commented] (PARQUET-1698) [C++] Add reader option to pre-buffer entire serialized row group into memory

2020-01-13 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014637#comment-17014637 ] Wes McKinney commented on PARQUET-1698: --- [~lidavidm] I missed the part about "wide datasets". I

[jira] [Commented] (PARQUET-1765) Invalid filteredRowCount in InternalParquetRecordReader

2020-01-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014460#comment-17014460 ] ASF GitHub Bot commented on PARQUET-1765: - gszadovszky commented on pull request #747:

[jira] [Updated] (PARQUET-1765) Invalid filteredRowCount in InternalParquetRecordReader

2020-01-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1765: Labels: pull-request-available (was: ) > Invalid filteredRowCount in

[jira] [Created] (PARQUET-1765) Invalid filteredRowCount in InternalParquetRecordReader

2020-01-13 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-1765: - Summary: Invalid filteredRowCount in InternalParquetRecordReader Key: PARQUET-1765 URL: https://issues.apache.org/jira/browse/PARQUET-1765 Project: Parquet

[jira] [Commented] (PARQUET-1758) InternalParquetRecordReader Logging it Too Verbose

2020-01-13 Thread David Mollitor (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014416#comment-17014416 ] David Mollitor commented on PARQUET-1758: - I think the general idea is that almost all logging

[jira] [Commented] (PARQUET-1745) No result for partition key included in Parquet file

2020-01-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014309#comment-17014309 ] Gabor Szadovszky commented on PARQUET-1745: --- The problem here is Spark sets a projection to

parquet filtering + projection

2020-01-13 Thread Gabor Szadovszky
Hi All, Currently, parquet filters handle missing columns (those that are not in the file) as if their values were all null. This is completely logical. The question is how parquet filtering should handle columns that are in the file (with real values) but missing from the projection. I've thought

[RESULT] Release Apache Parquet Format 2.8.0 RC0

2020-01-13 Thread Gabor Szadovszky
Dear All, With three +1 binding votes and an additional +1 vote this release vote passes. Thank you all who have tested this RC. I'll finalize the release today. Cheers, Gabor On Thu, Jan 9, 2020 at 6:13 PM Junjie Chen wrote: > +1 (non-binding) > > On Wed, Jan 8, 2020 at 5:24 PM Gabor