Hi, I've got a number of tables that I'm loading in from a SQL Server. The timestamps in SQL Server are stored like 2003-11-24T09:02:32. I get these as Parquet files in our raw storage location and pick them up in Databricks. When I load the data in Databricks, Spark assumes UTC (+0000) on the timestamp, like 2003-11-24T09:02:32+0000. The date and time are the same as what's in SQL Server, but the offset is incorrect.
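To illustrate, this is the reinterpretation I'm after, sketched with plain Python datetimes (assuming Pacific/Auckland, my local zone, as the target):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# The value Spark gives me: correct wall-clock time, wrong (UTC) offset.
ts = datetime(2003, 11, 24, 9, 2, 32, tzinfo=timezone.utc)

# What I want: same wall-clock time, but tagged with my local offset
# (NZDT, +13:00, on that date).
fixed = ts.replace(tzinfo=ZoneInfo("Pacific/Auckland"))

print(fixed.isoformat())  # 2003-11-24T09:02:32+13:00
```

Note that this changes the underlying instant, not just the display: the wall-clock digits are kept and the zone tag is swapped.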
I've tried various methods, like the code below, to set the JVM timezone to my local timezone, but when viewing the data Spark seems to just subtract the offset from the timestamp and add it to the offset part, e.g. 2003-11-24T09:02:32+0000 -> 2003-11-23T20:02:32+1300 (NZ is at +13 during daylight saving time).

```python
spark = pyspark.sql.SparkSession \
    .builder \
    .appName('test') \
    .master('local') \
    .config('spark.driver.extraJavaOptions', '-Duser.timezone=Pacific/Auckland') \
    .config('spark.executor.extraJavaOptions', '-Duser.timezone=Pacific/Auckland') \
    .config('spark.sql.session.timeZone', 'Pacific/Auckland') \
    .getOrCreate()
```

I understand that in Parquet these are stored as UNIX time and aren't timezone-aware. However, is there any setting I can set in Spark that would implicitly convert/assume the timestamp from 2003-11-24T09:02:32+0000 to 2003-11-24T09:02:32+1300? I know this can be done with transformations, but I'm trying to avoid writing transformations for every timestamp column across hundreds of tables.

Any help much appreciated, thanks, Jack