[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #832: PARQUET-1926: Add LogicalType support to ThriftType

2021-01-25 Thread GitBox
gszadovszky commented on a change in pull request #832: URL: https://github.com/apache/parquet-mr/pull/832#discussion_r563543489 ## File path: parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java ## @@ -325,7 +323,7 @@ private ConvertedField v

[jira] [Commented] (PARQUET-1926) Add LogicalType support to ThriftType.I64Type

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271166#comment-17271166 ] ASF GitHub Bot commented on PARQUET-1926: - gszadovszky commented on a change in

[GitHub] [parquet-format] gszadovszky commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
gszadovszky commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-766730709 @nevi-me, based on the previous PR it will require some time to agree on it. Starting a discussion in the dev list might also help as a heads up. (I do not have the req

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271247#comment-17271247 ] ASF GitHub Bot commented on PARQUET-675: gszadovszky commented on pull request #

[jira] [Assigned] (PARQUET-1926) Add LogicalType support to ThriftType.I64Type

2021-01-25 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1926: - Assignee: Joshua Martone > Add LogicalType support to ThriftType.I64Type > ---

Bloom filter for apache parquet

2021-01-25 Thread Viviana Elizabeth Romero Noguera
Hi. I am a doctoral student at ICMC - USP in Brazil. I am looking to work with Apache Parquet, programming in Java or Python. Has the Bloom filter already been implemented? At what level: row group, column chunk, or page? In which version of Parquet is it implemented? I'm looking

[GitHub] [parquet-format] nevi-me closed pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
nevi-me closed pull request #165: URL: https://github.com/apache/parquet-format/pull/165

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271573#comment-17271573 ] ASF GitHub Bot commented on PARQUET-675: nevi-me closed pull request #165: URL:

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271574#comment-17271574 ] ASF GitHub Bot commented on PARQUET-675: nevi-me opened a new pull request #165:

[GitHub] [parquet-format] emkornfield commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
emkornfield commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767097727 @nevi-me I don't think we should be imposing Arrow's modeling of the interval type on Parquet. The existing interval type seems reasonable in Parquet. I think there are t

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271666#comment-17271666 ] ASF GitHub Bot commented on PARQUET-675: emkornfield commented on pull request #

[GitHub] [parquet-format] houqp commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
houqp commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767105058 I do agree that if this is something specific to Arrow, then we shouldn't impose it onto Parquet. However, given that #43 was created for a non-Arrow use case, it looks like the

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271673#comment-17271673 ] ASF GitHub Bot commented on PARQUET-675: houqp commented on pull request #165: U

[GitHub] [parquet-format] emkornfield commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
emkornfield commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767114642 If I read the conclusion from #43 it was to potentially store these as separate columns for day_time. So if we really want to introduce these types I think not merging

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271678#comment-17271678 ] ASF GitHub Bot commented on PARQUET-675: emkornfield commented on pull request #

[GitHub] [parquet-format] nevi-me commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
nevi-me commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767132481 Thanks for the responses, I'll look at the interval vs duration support in more detail in the coming weeks, and rather put together a Google doc for discussion in the Parqu

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271692#comment-17271692 ] ASF GitHub Bot commented on PARQUET-675: nevi-me commented on pull request #165:

[GitHub] [parquet-format] nevi-me edited a comment on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
nevi-me edited a comment on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767132481 Thanks for the responses, I'll look at the interval vs duration support in more detail in the coming weeks, and rather put together a Google doc for discussion in th

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271693#comment-17271693 ] ASF GitHub Bot commented on PARQUET-675: nevi-me edited a comment on pull reques

[GitHub] [parquet-mr] wangyum commented on pull request #857: PARQUET-1746: Disable parquet.page.write-checksum.enabled by default

2021-01-25 Thread GitBox
wangyum commented on pull request #857: URL: https://github.com/apache/parquet-mr/pull/857#issuecomment-767180694 @bbraams I read an outdated document: http://mail-archives.apache.org/mod_mbox/parquet-dev/201906.mbox/%3cjira.13233926.1558083819000.446944.1560276180...@atlassian.jira%3E

[jira] [Commented] (PARQUET-1746) Changed the data order after DataFrame reuse

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271729#comment-17271729 ] ASF GitHub Bot commented on PARQUET-1746: - wangyum commented on pull request #8

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271812#comment-17271812 ] ASF GitHub Bot commented on PARQUET-675: houqp commented on pull request #165: U

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271818#comment-17271818 ] ASF GitHub Bot commented on PARQUET-675: nevi-me opened a new pull request #165:

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271819#comment-17271819 ] ASF GitHub Bot commented on PARQUET-675: nevi-me edited a comment on pull reques

[jira] [Commented] (PARQUET-1926) Add LogicalType support to ThriftType.I64Type

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271826#comment-17271826 ] ASF GitHub Bot commented on PARQUET-1926: - gszadovszky commented on a change in

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271831#comment-17271831 ] ASF GitHub Bot commented on PARQUET-675: nevi-me commented on pull request #165:

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271832#comment-17271832 ] ASF GitHub Bot commented on PARQUET-675: nevi-me closed pull request #165: URL:

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271836#comment-17271836 ] ASF GitHub Bot commented on PARQUET-675: gszadovszky commented on pull request #

[jira] [Commented] (PARQUET-1746) Changed the data order after DataFrame reuse

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271852#comment-17271852 ] ASF GitHub Bot commented on PARQUET-1746: - wangyum commented on pull request #8

[GitHub] [parquet-format] emkornfield commented on pull request #165: PARQUET-675: Specify Interval LogicalType

2021-01-25 Thread GitBox
emkornfield commented on pull request #165: URL: https://github.com/apache/parquet-format/pull/165#issuecomment-767097727

[jira] [Commented] (PARQUET-675) Add INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types

2021-01-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271870#comment-17271870 ] ASF GitHub Bot commented on PARQUET-675: emkornfield commented on pull request #

Re: Bloom filter for apache parquet

2021-01-25 Thread Micah Kornfield
Welcome Viviana, I think taking a look at https://issues.apache.org/jira/browse/PARQUET-41 and sub-issues should give you a sense of the current implementation. Java seems to have an implementation. The Python implementation of Parquet is a binding on top of the C++ implementation. Bloom filte