[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730385#comment-17730385 ] ASF GitHub Bot commented on PARQUET-675: nevi-me closed pull request #165: PARQU

[GitHub] [parquet-format] nevi-me closed pull request #165: PARQUET-675: Specify Interval LogicalType

2023-06-07 Thread via GitHub
nevi-me closed pull request #165: PARQUET-675: Specify Interval LogicalType URL: https://github.com/apache/parquet-format/pull/165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [parquet-mr] wgtmac commented on pull request #1068: Bump elephant-bird.version from 4.4 to 4.17

2023-06-07 Thread via GitHub
wgtmac commented on PR #1068: URL: https://github.com/apache/parquet-mr/pull/1068#issuecomment-1581769831 > Should we deprecate the dependency? The last release of `elephant-bird` is from March 2018 Sorry for the late reply. I think it should be deprecated as it is not maintained any

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730337#comment-17730337 ] ASF GitHub Bot commented on PARQUET-2305: - wgtmac commented on code in PR #1102

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
wgtmac commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1222352757 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -31,10 +32,12 @@ import org.apache.parquet.io.api.Converter; import org

Re: Bloom filters for full-text search and predicate pushdown

2023-06-07 Thread Micah Kornfield
You probably need to be more specific about which language bindings you are using. I think the C++ community is just starting to work on being able to write out bloom filters (so it isn't supported yet in C++, Python, R, Ruby, etc.). The way I read the specification, yes each single value should be ad

Re: Bloom filters for full-text search and predicate pushdown

2023-06-07 Thread Marco Colli
@Micah Does that mean that columns of type array already get a bloom filter on each single value? I am using Apache Arrow in particular to deal with Parquet files. On Wed, Jun 7, 2023, 16:00 Micah Kornfield wrote: > Hi Marco, > Could you describe how your proposal differs from tokenizing the t

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730253#comment-17730253 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1222036621 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {

[GitHub] [parquet-mr] tddfan closed pull request #1108: Parquet2proto ignore unkown fields

2023-06-07 Thread via GitHub
tddfan closed pull request #1108: Parquet2proto ignore unkown fields URL: https://github.com/apache/parquet-mr/pull/1108

[GitHub] [parquet-mr] tddfan opened a new pull request, #1108: Parquet2proto ignore unkown fields

2023-06-07 Thread via GitHub
tddfan opened a new pull request, #1108: URL: https://github.com/apache/parquet-mr/pull/1108 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730227#comment-17730227 ] ASF GitHub Bot commented on PARQUET-2171: - steveloughran commented on PR #1103:

[GitHub] [parquet-mr] steveloughran commented on pull request #1103: PARQUET-2171. Implement vectored IO in parquet file format

2023-06-07 Thread via GitHub
steveloughran commented on PR #1103: URL: https://github.com/apache/parquet-mr/pull/1103#issuecomment-1581240289 Hadoop API shim; now separate ASF repo there. https://github.com/apache/hadoop-api-shim

[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
wgtmac commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221872560 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoConstants.java: ## @@ -26,6 +26,7 @@ public final class ProtoConstants { public static final String

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730189#comment-17730189 ] ASF GitHub Bot commented on PARQUET-2305: - wgtmac commented on code in PR #1102

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730184#comment-17730184 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730185#comment-17730185 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221865678 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730183#comment-17730183 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730182#comment-17730182 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221864629 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730181#comment-17730181 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730180#comment-17730180 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221863218 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -124,13 +166,15 @@ public void start() { @Override public void en

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730177#comment-17730177 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221862896 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730179#comment-17730179 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221861354 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoParquetReader.java: ## @@ -71,6 +77,13 @@ protected Builder(InputFile file) { super(file);

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730175#comment-17730175 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730174#comment-17730174 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221860150 ## parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaEvolutionTest.java: ## @@ -65,4 +66,68 @@ public void testEnumSchemaWriteV1ReadV2() throws IOExc

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221859356 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoParquetReader.java: ## @@ -37,11 +37,17 @@ public static ParquetReader.Builder builder(Path file)

[jira] [Commented] (PARQUET-2305) Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730170#comment-17730170 ] ASF GitHub Bot commented on PARQUET-2305: - tddfan commented on code in PR #1102

[GitHub] [parquet-mr] tddfan commented on a diff in pull request #1102: PARQUET-2305 Allow Parquet to Proto conversion even though Target Schema has less fields

2023-06-07 Thread via GitHub
tddfan commented on code in PR #1102: URL: https://github.com/apache/parquet-mr/pull/1102#discussion_r1221851162 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -86,32 +89,71 @@ class ProtoMessageConverter extends GroupConverter {

[GitHub] [parquet-mr] tddfan closed pull request #1107: Parquet2proto ignore unkown fields

2023-06-07 Thread via GitHub
tddfan closed pull request #1107: Parquet2proto ignore unkown fields URL: https://github.com/apache/parquet-mr/pull/1107

[GitHub] [parquet-mr] tddfan opened a new pull request, #1107: Parquet2proto ignore unkown fields

2023-06-07 Thread via GitHub
tddfan opened a new pull request, #1107: URL: https://github.com/apache/parquet-mr/pull/1107 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in

Re: Bloom filters for full-text search and predicate pushdown

2023-06-07 Thread Micah Kornfield
Hi Marco, Could you describe how your proposal differs from tokenizing the target string and storing the list of tokens in a column that has a bloom filter attached? I think this should be supportable today by the format at least if not existing libraries. Thanks, Micah On Wednesday, June 7, 202

Re: Bloom filters for full-text search and predicate pushdown

2023-06-07 Thread Gang Wu
Hi Marco, That sounds interesting! However, this requires the parquet implementation to be able to tokenize both strings to write and literals in the filters. The actual efficiency depends on the data distribution. I am also concerned with the possible explosion of distinct values introduced by s
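
The idea discussed in this thread (tokenize the strings on write, tokenize the literal in the filter, and probe a bloom filter with each token) can be sketched in a few lines. This is an illustration only, not the Parquet implementation: the Parquet format specifies a split-block bloom filter hashed with xxHash, whereas the sketch below uses a plain bit array and SHA-256 for simplicity. The names `tokenize`, `TokenBloomFilter`, `build_filter`, and `may_match` are hypothetical, as are the sizing parameters.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenBloomFilter:
    """Toy bloom filter over string tokens; NOT Parquet's split-block layout."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, token):
        # Derive num_hashes bit positions by salting the token with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(token)
        )


def build_filter(rows):
    """Write side: tokenize every string value and add each token to the filter."""
    bloom = TokenBloomFilter()
    for row in rows:
        for token in tokenize(row):
            bloom.add(token)
    return bloom


def may_match(bloom, literal):
    """Read side: tokenize the filter literal the same way and probe every token."""
    return all(bloom.might_contain(token) for token in tokenize(literal))


rows = ["Parquet supports bloom filters", "Arrow reads Parquet files"]
bloom = build_filter(rows)
can_skip = not may_match(bloom, "bloom filters")  # False: the chunk must be read
```

A reader could skip a row group or column chunk only when `may_match` returns `False`. Because every distinct token is inserted, the number of distinct values grows with the vocabulary of the text, which is the explosion of distinct values raised as a concern above.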