Re: Reverting the merge blocks command feature

2019-02-21 Thread Gabor Szadovszky
Yes, it is related to PARQUET-1414. While fixing PARQUET-1531, I added an exception that is thrown whenever an empty page would be written. That is how I discovered that the unit tests of this merge feature throw this exception, so I started to investigate the implementation of

[jira] [Commented] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Valery Meleshkin (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774598#comment-16774598 ] Valery Meleshkin commented on PARQUET-1537: --- I can't attach the original file

[jira] [Comment Edited] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Valery Meleshkin (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774598#comment-16774598 ] Valery Meleshkin edited comment on PARQUET-1537 at 2/21/19 11:45 PM:

[jira] [Comment Edited] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Valery Meleshkin (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774598#comment-16774598 ] Valery Meleshkin edited comment on PARQUET-1537 at 2/21/19 11:43 PM:

[jira] [Comment Edited] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Valery Meleshkin (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774598#comment-16774598 ] Valery Meleshkin edited comment on PARQUET-1537 at 2/21/19 11:41 PM:

[jira] [Moved] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney moved ARROW-4650 to PARQUET-1537: -- Fix Version/s: (was: 0.13.0) Component/s: (was: C++)

[jira] [Updated] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated PARQUET-1537: -- Fix Version/s: cpp-1.6.0 > [C++] The patch for PARQUET-1508 leads to infinite loop and infini

[jira] [Commented] (PARQUET-1537) [C++] The patch for PARQUET-1508 leads to infinite loop and infinite memory allocation when reading very sparse ByteArray columns

2019-02-21 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774511#comment-16774511 ] Wes McKinney commented on PARQUET-1537: --- I moved this issue to PARQUET > [C++] T

Re: Clarification on CRC checksum field

2019-02-21 Thread Ryan Blue
I'm not aware of any readers or writers using the CRC field. I think it would be great to clean up the spec and make it more clear. Want to submit a PR to parquet-format for this? Thanks! On Thu, Feb 21, 2019 at 6:48 AM Boudewijn Braams < boudewijn.bra...@databricks.com> wrote: > Hi all, > > Alt

Re: Reverting the merge blocks command feature

2019-02-21 Thread Ryan Blue
Was the motivation for this the bug that was found with PARQUET-1414? How did we catch this? On Thu, Feb 21, 2019 at 4:56 AM Gabor Szadovszky wrote: > Hi All, > > I'm planning to push the revert tomorrow if there are no objections. > > Cheers, > Gabor > > On Tue, Feb 19, 2019 at 6:02 PM Gabor Sz

Clarification on CRC checksum field

2019-02-21 Thread Boudewijn Braams
Hi all, Although a page-level CRC field is defined in the Thrift specification, currently neither parquet-cpp nor parquet-mr seems to leverage it. Having these checksums would allow us to do localized detection of corruption and would provide a means for reasoning about where in the write/read path a c
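The localized-detection idea above can be illustrated with a minimal sketch. It assumes a standard CRC-32 computed over a page's bytes; the thread does not say which bytes the checksum should cover, and the names `page_crc` and `verify_page` are hypothetical, not part of parquet-cpp or parquet-mr.

```python
import zlib


def page_crc(page_bytes: bytes) -> int:
    """Return an unsigned 32-bit CRC of one page's bytes (assumed coverage)."""
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def verify_page(page_bytes: bytes, stored_crc: int) -> bool:
    """Check a page read back from storage against the CRC stored with it."""
    return page_crc(page_bytes) == stored_crc


# Writer side: compute the checksum when the page is produced.
page = b"example page payload"
crc = page_crc(page)

# Reader side: a single corrupted byte is caught for this page alone,
# without having to validate the rest of the file.
corrupted = b"exbmple page payload"
intact_ok = verify_page(page, crc)
corrupted_ok = verify_page(corrupted, crc)
```

Because each page carries its own checksum, a mismatch points at one page rather than at the whole file, which is what makes the detection localized.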

[jira] [Commented] (PARQUET-1533) TestSnappy() throws OOM exception with Parquet-1485 change

2019-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774127#comment-16774127 ] ASF GitHub Bot commented on PARQUET-1533: - gszadovszky commented on pull reques

[jira] [Updated] (PARQUET-1534) [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows

2019-02-21 Thread Masayuki Takahashi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masayuki Takahashi updated PARQUET-1534: External issue ID: PARQUET-1536 > [parquet-cli] Argument error: Illegal character

[jira] [Commented] (PARQUET-1533) TestSnappy() throws OOM exception with Parquet-1485 change

2019-02-21 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774128#comment-16774128 ] ASF GitHub Bot commented on PARQUET-1533: - gszadovszky commented on pull reques

[jira] [Created] (PARQUET-1536) [parquet-cli] Add simple tests for each command

2019-02-21 Thread Masayuki Takahashi (JIRA)
Masayuki Takahashi created PARQUET-1536: --- Summary: [parquet-cli] Add simple tests for each command Key: PARQUET-1536 URL: https://issues.apache.org/jira/browse/PARQUET-1536 Project: Parquet

Re: Reverting the merge blocks command feature

2019-02-21 Thread Gabor Szadovszky
Hi All, I'm planning to push the revert tomorrow if there are no objections. Cheers, Gabor On Tue, Feb 19, 2019 at 6:02 PM Gabor Szadovszky wrote: > Sorry, wrong PR. So, see PARQUET-1381 > and PR #621 >

[jira] [Commented] (PARQUET-1534) [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows

2019-02-21 Thread Masayuki Takahashi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773974#comment-16773974 ] Masayuki Takahashi commented on PARQUET-1534: - The stack trace is as follows: