Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-09 Thread TP Boudreau
I'm not a long-time Parquet user, but I assisted in the expansion of the parquet-cpp library's LogicalType facility. My impression is that the original TIMESTAMP converted types were silent on whether the annotated value was UTC-adjusted and that (often arcane) out-of-band information had to be re

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-09 Thread Wes McKinney
Thanks Zoltan. This is definitely a tricky issue. Spark's application of localtime semantics to timestamp data has been a source of issues for many people. Personally I don't find that behavior to be particularly helpful since depending on the session time zone, you will get different results on

Re: Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-09 Thread Zoltan Ivanfi
Hi Wes, The rules for TIMESTAMP forward-compatibility were created based on the assumption that TIMESTAMP_MILLIS and TIMESTAMP_MICROS have so far been used only with instant (a.k.a. UTC-normalized) semantics. This assumption was supported by two sources: 1. The specification: parquet-format defined

Forward compatibility issues with TIMESTAMP_MILLIS/MICROS ConvertedType

2019-07-09 Thread Wes McKinney
hi folks, We have just recently implemented the new LogicalType unions in the Parquet C++ library and we have run into a forward compatibility problem with reader versions prior to this implementation. To recap the issue, prior to the introduction of LogicalType, the Parquet format had no explici

Commons Collection v3 dependency in HadoopCodecs

2019-07-09 Thread Szabolcs Váradi
Dear Developers, Thank you for your hard work on Apache Parquet! I am a software engineer working on a solution for converting large batches of Parquet files into Capacitor files. I have a working prototype which I am trying to finalize but I ran into some issues regarding our dependency ma

[jira] [Commented] (PARQUET-458) [C++] Implement support for DataPageV2

2019-07-09 Thread sravani kalikiri (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881141#comment-16881141 ] sravani kalikiri commented on PARQUET-458: -- Thanks Wes McKinney > [C++] Imple

[jira] [Commented] (PARQUET-1615) getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter

2019-07-09 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881052#comment-16881052 ] ASF GitHub Bot commented on PARQUET-1615: - gszadovszky commented on pull reques

[jira] [Assigned] (PARQUET-1615) getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter

2019-07-09 Thread Gabor Szadovszky (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1615: - Assignee: Lantao Jin > getRecordWriter shouldn't hardcode CREAT mode when new