[jira] [Commented] (PARQUET-1951) Allow different strategies to combine key values when merging parquet files

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259333#comment-17259333 ] ASF GitHub Bot commented on PARQUET-1951: satishkotha commented on pull reques…

[GitHub] [parquet-mr] satishkotha commented on pull request #847: [PARQUET-1951] Allow different strategies to combine key values when …

2021-01-05 Thread GitBox
satishkotha commented on pull request #847: URL: https://github.com/apache/parquet-mr/pull/847#issuecomment-755018753 > @satishkotha, thanks for explaining your use case. I understand that you need to inject your own implementation to make it work. What I would like to achieve is to keep p…

Re: Query on striping parquet files while maintaining Row group alignment

2021-01-05 Thread Tim Armstrong
Thanks for the explanation, kinda makes more sense. I guess you'd still have to read the parquet footer outside of the storage nodes and then send the relevant info from the footer to the storage nodes, right? I guess the footer doesn't need to be in the same block as the So the following options…

Re: Query on incoherent total_byte_size and offset difference calculation results

2021-01-05 Thread Micah Kornfield
Hi Jayjeet, It isn't clear from your description whether the files being produced are corrupt or can be read but do not match your expectations. Either way some sample code and a more detailed explanation would be helpful in trying to figure out where the problem is. Thanks, Micah On Tue, Jan 5…

Query on incoherent total_byte_size and offset difference calculation results

2021-01-05 Thread Jayjeet Chakraborty
I am using Apache Arrow to write Parquet files. I am writing an uncompressed, non-dictionary-encoded Parquet file using pyarrow.parquet, but the offsets are not well aligned when inspected using parquet tools. For example, when I add up the row group offset with the row group size it does not…
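The check described in the question is a few lines of arithmetic. The sketch below is a hypothetical helper, not part of pyarrow, parquet-mr or parquet-tools: the caller supplies each row group's starting offset and whichever size field is being tested (for example `total_byte_size` as read from the footer), and it reports, for each consecutive pair of row groups, how many bytes lie between the end of one and the start of the next. A nonzero entry is the misalignment being reported; which offset and which size field are the right pair to compare is left as an assumption here, and the numbers in the usage note are made up.

```java
final class RowGroupGapCheck {
  // Hypothetical helper: offsets[i] is where row group i starts in the file,
  // sizes[i] is the size being tested for that row group (caller's choice of field).
  // Returns gaps[i] = offsets[i + 1] - (offsets[i] + sizes[i]) for each consecutive pair.
  // 0 means the next row group starts exactly where this one is claimed to end;
  // positive means unaccounted bytes, negative means the size overshoots the next offset.
  static long[] gaps(long[] offsets, long[] sizes) {
    if (offsets.length != sizes.length) {
      throw new IllegalArgumentException("offsets and sizes must have the same length");
    }
    long[] gaps = new long[Math.max(0, offsets.length - 1)];
    for (int i = 0; i + 1 < offsets.length; i++) {
      gaps[i] = offsets[i + 1] - (offsets[i] + sizes[i]);
    }
    return gaps;
  }
}
```

For example, with made-up offsets {4, 104, 254} (the first row group starting right after the 4-byte "PAR1" magic) and sizes {100, 140, 50}, `gaps` returns {0, 10}: the first two row groups line up, while 10 bytes between the second and third are not covered by the reported size.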

[jira] [Assigned] (PARQUET-1954) TCP connection leak in parquet dump

2021-01-05 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1954: Assignee: xiepengjie > TCP connection leak in parquet dump…

[jira] [Resolved] (PARQUET-1954) TCP connection leak in parquet dump

2021-01-05 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-1954. Resolution: Fixed > TCP connection leak in parquet dump…

[jira] [Commented] (PARQUET-1954) TCP connection leak in parquet dump

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258829#comment-17258829 ] ASF GitHub Bot commented on PARQUET-1954: gszadovszky merged pull request #849…

[GitHub] [parquet-mr] gszadovszky merged pull request #849: PARQUET-1954: TCP connection leak in parquet dump

2021-01-05 Thread GitBox
gszadovszky merged pull request #849: URL: https://github.com/apache/parquet-mr/pull/849

[jira] [Commented] (PARQUET-1951) Allow different strategies to combine key values when merging parquet files

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258826#comment-17258826 ] ASF GitHub Bot commented on PARQUET-1951: gszadovszky commented on pull reques…

[GitHub] [parquet-mr] gszadovszky commented on pull request #847: [PARQUET-1951] Allow different strategies to combine key values when …

2021-01-05 Thread GitBox
gszadovszky commented on pull request #847: URL: https://github.com/apache/parquet-mr/pull/847#issuecomment-754559854 @satishkotha, thanks for explaining your use case. I understand that you need to inject your own implementation to make it work. What I would like to achieve is to keep par…

[jira] [Commented] (PARQUET-1951) Allow different strategies to combine key values when merging parquet files

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258808#comment-17258808 ] ASF GitHub Bot commented on PARQUET-1951: gszadovszky commented on a change in…

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #847: [PARQUET-1951] Allow different strategies to combine key values when …

2021-01-05 Thread GitBox
gszadovszky commented on a change in pull request #847: URL: https://github.com/apache/parquet-mr/pull/847#discussion_r551843042 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/KeyValueMetadataMergeStrategy.java ## @@ -0,0 +1,44 @@ +/* + * License…
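The PR under discussion adds a `KeyValueMetadataMergeStrategy` type (see the file path above) so that callers merging several Parquet files can choose how key-value metadata is combined when the inputs disagree, rather than being tied to one fixed rule. The sketch below is a self-contained illustration of that idea, not the code from PR #847: the `KeyValueMergeSketch`, `MergeStrategy`, `STRICT` and `concatenating` names are hypothetical, and it assumes the input maps each metadata key to the set of distinct values found across all input footers. It shows two plausible strategies: a strict one that rejects any key with more than one value, and a concatenating one that joins the values with a delimiter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class KeyValueMergeSketch {
  // Hypothetical strategy interface: collapses each key's set of distinct values
  // (gathered from every input file's footer) into the single value written out.
  interface MergeStrategy {
    Map<String, String> merge(Map<String, Set<String>> valuesByKey);
  }

  // Strict: any key that does not have exactly one value is treated as a conflict.
  static final MergeStrategy STRICT = valuesByKey -> {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> e : valuesByKey.entrySet()) {
      if (e.getValue().size() != 1) {
        throw new IllegalStateException(
            "could not merge metadata: key " + e.getKey() + " has values " + e.getValue());
      }
      merged.put(e.getKey(), e.getValue().iterator().next());
    }
    return merged;
  };

  // Concatenating: joins all distinct values of a key, in the set's iteration order.
  static MergeStrategy concatenating(String delimiter) {
    return valuesByKey -> {
      Map<String, String> merged = new LinkedHashMap<>();
      valuesByKey.forEach((key, values) -> merged.put(key, String.join(delimiter, values)));
      return merged;
    };
  }
}
```

Keeping the policy behind an interface is what the use case in this thread asks for: the merging code stays generic, and a caller who needs different behaviour injects its own implementation instead of patching the merge logic.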