[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1301901914 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1302095236 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1302246559 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-22 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1302271092 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-23 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1303197041 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-23 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1303204476 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-23 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1303211127 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-23 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1303299069 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-23 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1303360668 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-25 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1306335556 ## src/main/thrift/parquet.thrift: ## @@ -974,6 +1050,13 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-08-31 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1312152587 ## src/main/thrift/parquet.thrift: ## @@ -764,6 +845,14 @@ struct ColumnMetaData { * in a single I/O. */ 15: optional i32 bloom_filter_length; + + /**

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-01 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1313547575 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,74 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition leve

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-06 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317881962 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318190332 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318565796 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,73 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition leve

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318570484 ## src/main/thrift/parquet.thrift: ## @@ -191,6 +191,73 @@ enum FieldRepetitionType { REPEATED = 2; } +/** + * A histogram of repetition and definition leve

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1318770227 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1073,15 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-07 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319212210 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320122860 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320192142 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-08 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1320256768 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio

[GitHub] [parquet-format] etseidl commented on a diff in pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

2023-09-12 Thread via GitHub
etseidl commented on code in PR #197: URL: https://github.com/apache/parquet-format/pull/197#discussion_r1323524900 ## src/main/thrift/parquet.thrift: ## @@ -977,6 +1038,25 @@ struct ColumnIndex { /** A list containing the number of null values for each page **/ 5: optio