[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317565318 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition level

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317578394 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317745824 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition level

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318260892 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318273187 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318275562 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318292827 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318367540 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318737835 ## src/main/thrift/parquet.thrift: ## @@ -529,7 +596,15 @@ struct DataPageHeader { /** Encoding used for repetition levels **/ 4: required Encoding repetition

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323028211 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /** +

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323059489 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: option

[GitHub] [parquet-format] JFinis commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
JFinis commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323069846 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +810,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**
