(spark) branch master updated: [MINOR][SQL] Add parquet nanosAsLong behavior change to 3.2 migration guide

gurwls223 Tue, 06 Aug 2024 20:37:56 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 46acdf453517 [MINOR][SQL] Add parquet nanosAsLong behavior change to 3.2 migration guide
46acdf453517 is described below

commit 46acdf453517c403a3f212254489c6a82950810d
Author: Amanda Liu <amanda....@databricks.com>
AuthorDate: Wed Aug 7 12:36:52 2024 +0900

    [MINOR][SQL] Add parquet nanosAsLong behavior change to 3.2 migration guide
    
    ### What changes were proposed in this pull request?
    
    Add a Spark 3.2 migration guide entry for the Parquet `nanosAsLong` behavior change.
    
    SPARK-40819 added the legacy config `spark.sql.legacy.parquet.nanosAsLong`. When it is set to `true`, Parquet columns of type `INT64 (TIMESTAMP(NANOS, true))` are read as long values, as in Spark 3.1 and earlier. Otherwise, starting in Spark 3.2, Parquet files with this type are unreadable.
    
    ### Why are the changes needed?
    This documents a behavior change, starting in Spark 3.2, for Parquet files that store timestamps with nanosecond precision.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    `doc build`
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No.
    
    Closes #47638 from asl3/asl3/nanosAsLongMigDoc.
    
    Authored-by: Amanda Liu <amanda....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 3846f7bb24d1..ad678c44657e 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -157,6 +157,8 @@ license: |
 
   - Since Spark 3.2, all the supported JDBC dialects use StringType for ROWID. In Spark 3.1 or earlier, Oracle dialect uses StringType and the other dialects use LongType.
 
+  - Since Spark 3.2, Parquet files with nanosecond precision for timestamp type (`INT64 (TIMESTAMP(NANOS, true))`) are not readable. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.parquet.nanosAsLong` to `true`.
+
   - In Spark 3.2, PostgreSQL JDBC dialect uses StringType for MONEY and MONEY[] is not supported due to the JDBC driver for PostgreSQL can't handle those types properly. In Spark 3.1 or earlier, DoubleType and ArrayType of DoubleType are used respectively.
 
   - In Spark 3.2, `spark.sql.adaptive.enabled` is enabled by default. To restore the behavior before Spark 3.2, you can set `spark.sql.adaptive.enabled` to `false`.
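A minimal Python sketch of what downstream code might do once the legacy behavior is restored, under stated assumptions. In PySpark the flag would presumably be set with `spark.conf.set("spark.sql.legacy.parquet.nanosAsLong", "true")` before calling `spark.read.parquet(...)`. The column is then assumed to come back as a plain long holding nanoseconds since the Unix epoch in UTC, which is how the Parquet format defines `TIMESTAMP(NANOS, true)`. The helper `nanos_to_datetime` is hypothetical and not part of Spark. It converts such a value to a UTC `datetime`, which only has microsecond precision, and returns the dropped sub-microsecond nanoseconds separately.

```python
from datetime import datetime, timedelta, timezone

# Assumption: with spark.sql.legacy.parquet.nanosAsLong=true, an
# INT64 (TIMESTAMP(NANOS, true)) column is read as a long holding
# nanoseconds since the Unix epoch (UTC). Hypothetical helper, not a Spark API.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(nanos: int) -> tuple[datetime, int]:
    """Split epoch nanoseconds into a UTC datetime (microsecond precision)
    and the leftover sub-microsecond nanoseconds (always 0..999)."""
    # Floor division keeps the leftover non-negative, even for pre-1970 values.
    micros, leftover_ns = divmod(nanos, 1000)
    return _EPOCH + timedelta(microseconds=micros), leftover_ns


if __name__ == "__main__":
    dt, rest = nanos_to_datetime(1_700_000_000_123_456_789)
    print(dt.isoformat(), rest)
```

Using `divmod` rather than truncating division means the returned `datetime` plus the leftover nanoseconds always reconstructs the original instant, including for negative (pre-1970) values.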

