Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/19250
Yes, that is correct. We introduced the table property to address the 2nd
problem I mentioned above: "The adjustment depends on the local timezone."
(details in my previous comm…
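The problem quoted above, "the adjustment depends on the local timezone", can be illustrated outside Spark. The following is a minimal stdlib-Python sketch (the epoch value and zone names are made up for illustration): the same stored UTC-normalized instant yields different wall-clock timestamps depending on which "local" zone the reading machine happens to be in.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A UTC-normalized Parquet timestamp is stored as an instant
# (here represented as epoch seconds; hypothetical value).
stored_epoch = 1_500_000_000

instant = datetime.fromtimestamp(stored_epoch, tz=timezone.utc)

# A reader that adjusts to its machine's local zone sees different
# wall clocks on different machines:
wall_la = instant.astimezone(ZoneInfo("America/Los_Angeles"))
wall_budapest = instant.astimezone(ZoneInfo("Europe/Budapest"))

print(wall_la.isoformat())        # 2017-07-13T19:40:00-07:00
print(wall_budapest.isoformat())  # 2017-07-14T04:40:00+02:00
```

The two readers do not even agree on the calendar date, which is exactly why an adjustment keyed to the reader's local timezone is fragile.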
Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/19250
Yes, you understand correctly: the table property affects both the read
path and the write path, while the current workaround used by Hive and Impala
only affects the read path. (Both are Parquet…
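Why governing both paths matters can be sketched abstractly; the following Python fragment is an illustration only (the fixed offset stands in for a session timezone, not for any actual Spark code). When the same property drives a symmetric conversion on write and on read, round-trips preserve the wall clock; a read-only adjustment cannot make that guarantee for files the engine itself writes.

```python
from datetime import datetime, timedelta

# Assumed writer-zone offset, for illustration only.
OFFSET = timedelta(hours=-7)

def write_utc_normalized(wall: datetime) -> datetime:
    """Convert a wall-clock value to UTC before storing it."""
    return wall - OFFSET

def read_utc_normalized(stored: datetime) -> datetime:
    """Convert the stored UTC instant back to a wall clock on read."""
    return stored + OFFSET

wall = datetime(2017, 9, 16, 12, 0)

# When one setting controls both paths, the round-trip is the identity:
assert read_utc_normalized(write_utc_normalized(wall)) == wall
```

A workaround applied only on the read path fixes how foreign files are interpreted, but newly written files still follow the old convention, so the asymmetry persists.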
Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/19250
Hive and Impala introduced the following workaround for timestamp
interoperability long ago: the footer of the Parquet file contains metadata
about the library that wrote the file. For Hive and…
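The footer-based workaround can be sketched in Python; this is a hypothetical illustration (the function name, the writer-identifier strings, and the heuristic itself are assumptions, not Hive's actual code) of dispatching the read-time adjustment on the writer recorded in the Parquet footer's `created_by` field.

```python
def needs_timestamp_adjustment(created_by: str) -> bool:
    """Decide from the Parquet footer's created_by string whether the
    reader should convert timestamps on read.

    Hypothetical heuristic: files from writers known to use local-time
    semantics are left alone; values from other writers are adjusted.
    """
    local_time_writers = ("impala", "hive")  # assumed identifiers
    writer = created_by.lower()
    return not any(name in writer for name in local_time_writers)

print(needs_timestamp_adjustment("impala version 2.9"))        # False
print(needs_timestamp_adjustment("parquet-mr version 1.8.2"))  # True
```

The key point is that the decision is made per file at read time, based on who wrote the file, which is what makes this a read-path-only mechanism.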
Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/19250
The interoperability issue is that Impala follows timezone-agnostic
timestamp semantics as mandated by the SQL standard, while SparkSQL follows
UTC-normalized semantics instead (which is not SQL…
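The two semantics can be contrasted in a small stdlib-Python sketch (the session zones and the written value are assumptions for illustration): timezone-agnostic semantics store the wall-clock fields themselves, so every session reads back the same digits, whereas UTC-normalized semantics store an instant that is rendered in each reader's session timezone.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

written = datetime(2017, 9, 16, 12, 0, 0)  # wall clock typed by the writer

# Timezone-agnostic (SQL-standard TIMESTAMP): store the fields,
# read back the same fields in every session.
agnostic_read = written.isoformat()

# UTC-normalized: interpret in the writer's session zone, store the instant.
writer_zone = ZoneInfo("America/Los_Angeles")  # assumed writer session
instant = written.replace(tzinfo=writer_zone).astimezone(timezone.utc)

# A reader in another session zone sees a different wall clock:
reader_zone = ZoneInfo("Asia/Tokyo")  # assumed reader session
normalized_read = instant.astimezone(reader_zone).replace(tzinfo=None)

print(agnostic_read)              # 2017-09-16T12:00:00
print(normalized_read.isoformat())  # 2017-09-17T04:00:00
```

Under UTC-normalized semantics the Tokyo reader sees a different day than the one the writer typed, which is precisely the divergence from the SQL standard's timezone-agnostic behavior.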
Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/19250
@attilajeges has just found a problem with the behavior specified in the
requirements:
* Partitions of a table can use different file formats.
* As a result, a single table can have data…
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143462649
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -266,6 +267,10 @@ final class DataFrameWriter[T] private[sql](ds
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19250#discussion_r143257840
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
---
@@ -1213,6 +1213,71 @@ case class
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16781#discussion_r104673553
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
---
@@ -137,8 +141,190 @@ class
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16781#discussion_r104668320
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
---
@@ -89,11 +92,23
Github user zivanfi commented on the issue:
https://github.com/apache/spark/pull/16781
Please update the pull request description, because the one dated Feb 2
does not correspond to the fix any more.
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16781#discussion_r104660877
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
---
@@ -89,11 +92,23
Github user zivanfi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16781#discussion_r104664187
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -674,6 +674,12 @@ object SQLConf {
.stringConf