Hi Lucas,
The assessments from Wes and Li are right on. Just to add to that, and
unfortunately make things even more complicated: Spark does not always use
the config "spark.sql.session.timeZone", so it doesn't really help with
your example. It would be used if instead you generated timestamps
Lucas,
Wes' explanation is correct. If you are using Spark 2.2, you can set spark
config "spark.sql.session.timeZone" to "UTC".
I have written some documentation explaining this. I can clean it up for
ARROW-1425.
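For anyone finding this thread later, a hedged sketch of where Li's suggested
setting usually lives (assuming Spark 2.2+; the file path is the conventional
default, not something confirmed in this thread) — in conf/spark-defaults.conf:

```
spark.sql.session.timeZone   UTC
```

It can equivalently be passed per job, e.g. with
`--conf spark.sql.session.timeZone=UTC` on spark-submit.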
On Mon, Aug 28, 2017 at 5:23 PM, Wes McKinney wrote:
Hi Lucas,
Bryan Cutler, Holden Karau, Li Jin, or someone with deeper knowledge
of the Spark timestamp issue (which is known, and not a bug per se)
should be able to give some extra context about this.
My understanding is that when you read timezone-naive data in Spark,
it is treated as
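Wes's point about timezone-naive data can be illustrated without Spark at
all. Below is a small stdlib-only Python sketch (the 2017-08-28 date and the
UTC-7 offset are made-up illustration values, not taken from Lucas's data):
the same zone-less wall-clock value denotes different instants depending on
which zone the reader assumes, which is the kind of discrepancy being
discussed here.

```python
from datetime import datetime, timezone, timedelta

# A timezone-naive timestamp, as stored in a Parquet file with no zone info.
naive = datetime(2017, 8, 28, 12, 0, 0)

# Reader A assumes the value is UTC.
as_utc = naive.replace(tzinfo=timezone.utc)

# Reader B assumes the value is local/session time, e.g. a UTC-7 offset
# (roughly what a Pacific-time session would use in August).
pacific = timezone(timedelta(hours=-7))
as_pacific = naive.replace(tzinfo=pacific)

# The same stored wall-clock value now maps to two different instants:
print(as_utc.timestamp())      # 1503921600.0
print(as_pacific.timestamp())  # 1503946800.0  (7 hours later)
```

The 25200-second (7-hour) gap between the two epoch values is exactly the
kind of shift one reader applies and the other does not.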
Here is the pyspark script I used to see this difference.
On Mon, 28 Aug 2017 at 09:20 Lucas Pickup
wrote:
Hi all,
Very sorry if people already responded to this at:
lucas.pic...@microsoft.com There was an INVALID identifier attached to the
end of the reply address for some reason which may have caused replies to
be lost.
Sent: , 2017 3:23 PM
To: dev@arrow.apache.org
Subject: Reading Parquet datetime column gives different answer in Spark vs PyArrow
Hi all,
I've been messing around with Spark and PyArrow Parquet reading. In my testing
I've found that a Parquet file written by Spark containing a datetime column
results in different datetimes when read by Spark and by PyArrow.
The attached script demonstrates this.
Output:
Spark Reading the parquet