Re: Write a parquet file with delta encoding enable

2020-03-24 Thread Gabor Szadovszky
Hi, I am not aware of any Parquet writer implementation that lets you set the exact encoding for your data. I work only on parquet-mr, though (parquet-mr is the one used by Spark). In parquet-mr the encoding is chosen based on the data type and some additional parameters (but none of them
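
For readers unfamiliar with the term in the thread subject, the following is a minimal sketch of the general idea behind delta encoding: store the first value, then only the differences between consecutive values. It is an illustration only, not parquet-mr code and not the Parquet on-disk format (which additionally packs deltas into blocks); the function names `delta_encode` and `delta_decode` are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Return the first value followed by successive differences.

    Illustrative only: the real Parquet delta encoding also bit-packs
    the deltas in blocks, which this sketch does not attempt.
    """
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values with a running sum of the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values
```

Slowly changing integer columns (timestamps, sequence numbers) tend to produce small deltas, which is why this family of encodings can compress them well.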

[jira] [Updated] (PARQUET-1824) [C++] Fix crashes on invalid input (OSS-Fuzz)

2020-03-24 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated PARQUET-1824: Fix Version/s: cpp-1.6.0 > [C++] Fix crashes on invalid input (OSS-Fuzz)

[jira] [Created] (PARQUET-1825) [C++] Fix compilation error in column_io_benchmark.cc

2020-03-24 Thread Uwe Korn (Jira)
Uwe Korn created PARQUET-1825: Summary: [C++] Fix compilation error in column_io_benchmark.cc Key: PARQUET-1825 URL: https://issues.apache.org/jira/browse/PARQUET-1825 Project: Parquet Issue Typ

[jira] [Updated] (PARQUET-1825) [C++] Fix compilation error in column_io_benchmark.cc

2020-03-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1825: Labels: pull-request-available (was: ) > [C++] Fix compilation error in column_io_benchm

Parquet sync meeting notes

2020-03-24 Thread Xinli shang
3/24/2020 Attendees: Walid, Gabor, Xindong, Gidon, Shrikumar, Xinli. Topics: Column Encryption - PR-769 is being reviewed. Gidon/Xinli addressed the feedback this morning and Gabor is going to have another look. - Gidon has

[jira] [Resolved] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT encoding performance

2020-03-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-1786. Fix Version/s: cpp-1.6.0 Resolution: Fixed Issue resolved by pull request 6679 [

[jira] [Updated] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT encoding performance

2020-03-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1786: Component/s: parquet-cpp > [C++] Use simd to improve BYTE_STREAM_SPLIT encoding performan

[jira] [Updated] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1786: Summary: [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance (was: [C++] Us

[jira] [Reopened] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Martin Radev (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Radev reopened PARQUET-1786: I also need to submit the encoder improvement. > [C++] Use simd to improve BYTE_STREAM_SPLIT d

[jira] [Commented] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066070#comment-17066070 ] Antoine Pitrou commented on PARQUET-1786: Can you please open a separate issue

[jira] [Resolved] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Martin Radev (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Radev resolved PARQUET-1786. Resolution: Fixed. Will do. > [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performa

[jira] [Closed] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Martin Radev (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Radev closed PARQUET-1786. Closing. > [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance >

[jira] [Commented] (PARQUET-1786) [C++] Use simd to improve BYTE_STREAM_SPLIT decoding performance

2020-03-24 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066085#comment-17066085 ] Wes McKinney commented on PARQUET-1786: Please leave resolved issues in "Resolve

[jira] [Resolved] (PARQUET-1825) [C++] Fix compilation error in column_io_benchmark.cc

2020-03-24 Thread Kouhei Sutou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved PARQUET-1825. Fix Version/s: cpp-1.6.0 Resolution: Fixed Issue resolved by pull request 6701 [http