shangxinli commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1414220467
##
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##
@@ -409,4 +428,14 @@ abstract void writePage(
ValuesWriter definit
shangxinli commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1414218649
##
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##
@@ -389,7 +400,14 @@ void writePage() {
this.rowsWrittenSoFar += pag
shangxinli commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1414218649
##
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##
@@ -389,7 +400,14 @@ void writePage() {
this.rowsWrittenSoFar += pag
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403914499
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403909642
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##
@@ -316,7 +346,14 @@ private int toIntWithCheck(long size) {
retur
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403907724
##
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndexBuilder.java:
##
@@ -80,11 +90,22 @@ public void add(int compressedPageSize,
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403906914
##
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndex.java:
##
@@ -57,4 +57,16 @@ public interface ColumnIndex extends
Visitor {
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403906654
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
wgtmac commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1825101267
> Hi @wgtmac thanks for this great work. Could this influence the rewriter?
How could we rebuild the `SizeStatistics` during rewriting?
Thanks for your review! IMO, the rewriter c
ConeyLiu commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1824649905
Hi @wgtmac thanks for this great work. Could this influence the rewriter?
How could we rebuild the `SizeStatistics` during rewriting?
--
This is an automated message from the Apache
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403548852
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##
@@ -775,6 +848,51 @@ public void writeDataPageV2(
uncompressedDataSize,
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403542243
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##
@@ -316,7 +346,14 @@ private int toIntWithCheck(long size) {
ret
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403537889
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##
@@ -316,7 +346,14 @@ private int toIntWithCheck(long size) {
ret
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403536463
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##
@@ -152,13 +157,26 @@ public void writePage(BytesInput bytesInput, int
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403535802
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##
@@ -87,6 +90,7 @@ private static final class ColumnChunkPageWriter impl
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403534057
##
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndexBuilder.java:
##
@@ -116,11 +137,28 @@ private OffsetIndexBuilder() {
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403533624
##
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndexBuilder.java:
##
@@ -80,11 +90,22 @@ public void add(int compressedPageSiz
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403526389
##
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndex.java:
##
@@ -57,4 +57,16 @@ public interface ColumnIndex extends
Visitor
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403513092
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403509676
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403507869
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403507171
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
ConeyLiu commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1824590527
Thanks @wgtmac for your notification.
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403506285
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
ConeyLiu commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1403505946
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (A
wgtmac commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1823898172
cc @ConeyLiu as I have modified the `mergeColumnStatistics` method, which you've
just refactored.
wgtmac commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1823895468
I have just rebased on the latest master branch and fixed all CI failures. As
this PR gets too large, I will add print cli command and rewriter support for
`SizeStatistics` in follow-up
wgtmac opened a new pull request, #1201:
URL: https://github.com/apache/parquet-mr/pull/1201
Make sure you have checked _all_ steps below.
### Jira
- [ ] My PR addresses the following [Parquet
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references
them in
wgtmac commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1367696949
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
etseidl commented on code in PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#discussion_r1367498685
##
parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java:
##
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
emkornfield commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1773069981
@wgtmac took a scan through and this generally seems like what I expected.
Thank you for doing it. Agree unit tests are needed.
wgtmac commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1771949119
> Thanks @wgtmac, this looks great! I'm not sure if this is in scope for
this PR, but it would be nice if the CLI was aware of the changes.
Specifically, it would be great if the `colum
etseidl commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1771735094
Thanks @wgtmac, this looks great! I'm not sure if this is in scope for this
PR, but it would be nice if the CLI was aware of the changes. Specifically, it
would be great if the `column
wgtmac commented on PR #1177:
URL: https://github.com/apache/parquet-mr/pull/1177#issuecomment-1771189247
I have drafted the POC to read/write SizeStatistics. The feature
implementation should be complete and associated tests will be added
progressively. Please take a look when you have tim
wgtmac opened a new pull request, #1177:
URL: https://github.com/apache/parquet-mr/pull/1177
Make sure you have checked _all_ steps below.
### Jira
- [ ] My PR addresses the following [Parquet
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references
them in
35 matches