I actually think this is a general problem with the usage of DateFormat and
SimpleDateFormat across the code, in that it relies on the JVM's default
locale. I believe this should, at a minimum, default consistently to
Locale.US so that behavior is consistent; otherwise, parsing and formatting
of dates could work subtly differently across environments.
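To illustrate the kind of inconsistency I mean (a minimal standalone sketch, not code from our tree; the class name is made up), parsing "1989Dec31" with "yyyyMMMdd" succeeds with an explicit Locale.US but can fail when the JVM default locale is non-English:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocaleParseDemo {
    public static void main(String[] args) throws ParseException {
        // Explicit English locale: "Dec" always matches MMM.
        SimpleDateFormat us = new SimpleDateFormat("yyyyMMMdd", Locale.US);
        System.out.println(us.parse("1989Dec31"));

        // Simulate a JVM whose default locale is Italian: MMM now
        // expects "dic", so parsing "Dec" throws ParseException.
        Locale.setDefault(Locale.ITALY);
        SimpleDateFormat byDefault = new SimpleDateFormat("yyyyMMMdd");
        try {
            System.out.println(byDefault.parse("1989Dec31"));
        } catch (ParseException e) {
            System.out.println("Parse failed under it_IT default locale");
        }
    }
}
```

The same pattern string, run on two machines with different default locales, gives different results; pinning the locale in the constructor removes that variable.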

There's a similar question about some code that formats dates for the UI.
It's more reasonable for that to use the platform-default locale, but I'd
still favor standardizing it, I think.

Anyway, let me test it out a bit and possibly open a JIRA with this change
for discussion.
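In the meantime, if the default locale does turn out to be the cause, one workaround worth testing on the pyspark side (an untested sketch, not a confirmed fix; "your_script.py" is a placeholder) is forcing an English default locale on the driver and executor JVMs via Spark conf:

```shell
# user.language / user.country are standard JVM system properties;
# spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
# append options to the respective JVM command lines.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Duser.language=en -Duser.country=US" \
  --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.country=US" \
  your_script.py
```

This doesn't fix the underlying locale-sensitivity in the parsing code, but it should make "MMM" resolve against English month abbreviations regardless of the host's locale.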

On Mon, Oct 24, 2016 at 1:03 PM pietrop <pietro.pu...@gmail.com> wrote:

Hi there,
I opened a question on StackOverflow at this link:
http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972

I didn’t get any useful answer, so I’m writing here hoping that someone can
help me.

In short, I'm trying to read a CSV containing date columns stored using the
pattern "yyyyMMMdd". What doesn't work for me is "MMM". I've done some
testing and discovered that it's a localization issue. As you can read in
the StackOverflow question, I ran a simple Java program to parse the date
"1989Dec31", and it works only if I pass Locale.US to the
SimpleDateFormat() constructor.

I would like this to work in pyspark. I tried setting a different locale
from the console (LANG="en_US"), but it didn't work. I also tried setting
it using the locale package in Python.

So, there’s a way to set locale in Spark when using pyspark? The issue is
Java related and not Python related (the function that parses data is
invoked by spark.read.load(dateFormat=“yyyyMMMdd”, …). I don’t want to use
other solutions in order to encode data because they are slower (from what
I’ve seen so far).

Thank you
Pietro



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-doesn-t-recognize-MMM-dateFormat-pattern-in-spark-read-load-for-dates-like-1989Dec31-and-31D9-tp27951.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
