This is more of an OS-level thing, but I think that if you can manage to pass -Duser.language=en to the JVM, it might do the trick.
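For spark-submit, one place such JVM properties can usually go is the extraJavaOptions configs. A sketch, not tested against every deployment mode (`my_script.py` is a placeholder; in client mode the driver JVM is already running when `spark.driver.extraJavaOptions` is read, so `--driver-java-options` may be needed instead):

```shell
# Sketch: force an English locale on the driver and executor JVMs.
# my_script.py is a placeholder for your application.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Duser.language=en -Duser.country=US" \
  --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.country=US" \
  my_script.py
```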
I summarized what I think I know about this at
https://issues.apache.org/jira/browse/SPARK-18076 so we can decide what to
do, if anything, there.

Sean

On Mon, Oct 24, 2016 at 3:08 PM Pietro Pugni <pietro.pu...@gmail.com> wrote:

> Thank you, I’d appreciate that. I have no experience with Python, Java,
> or Spark, so the question can be translated to: “How can I set the JVM
> locale when using spark-submit and pyspark?” Probably this is possible
> only by changing the system default locale and not within the Spark
> session, right?
>
> Thank you
> Pietro
>
> On 24 Oct 2016, at 14:51, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> I am also interested in this issue. I will try to look into this too
> within the coming few days.
>
> 2016-10-24 21:32 GMT+09:00 Sean Owen <so...@cloudera.com>:
>
> I actually think this is a general problem with the usage of DateFormat
> and SimpleDateFormat across the code, in that it relies on the default
> locale of the JVM. I believe this needs to, at least, default
> consistently to Locale.US so that behavior is consistent; otherwise it’s
> possible that parsing and formatting of dates could work subtly
> differently across environments.
>
> There’s a similar question about some code that formats dates for the
> UI. It’s more reasonable to let that use the platform-default locale,
> but I’d still favor standardizing it, I think.
>
> Anyway, let me test it out a bit and possibly open a JIRA with this
> change for discussion.
>
> On Mon, Oct 24, 2016 at 1:03 PM pietrop <pietro.pu...@gmail.com> wrote:
>
> Hi there,
> I opened a question on StackOverflow at this link:
> http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972
>
> I didn’t get any useful answer, so I’m writing here hoping that someone
> can help me.
>
> In short, I’m trying to read a CSV containing date columns stored using
> the pattern “yyyyMMMdd”. What doesn’t work for me is “MMM”. I’ve done
> some testing and discovered that it’s a localization issue. As you can
> read in the StackOverflow question, I ran a simple Java program to parse
> the date “1989Dec31”, and it works only if I specify Locale.US in the
> SimpleDateFormat() constructor.
>
> I would like pyspark to work too. I tried setting a different locale
> from the console (LANG=“en_US”), but it didn’t work. I also tried
> setting it using the locale package from Python.
>
> So, is there a way to set the locale in Spark when using pyspark? The
> issue is Java-related, not Python-related (the function that parses the
> dates is invoked by spark.read.load(dateFormat=“yyyyMMMdd”, …)). I don’t
> want to use other solutions to parse the dates because they are slower
> (from what I’ve seen so far).
>
> Thank you
> Pietro
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-doesn-t-recognize-MMM-dateFormat-pattern-in-spark-read-load-for-dates-like-1989Dec31-and-31D9-tp27951.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
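The SimpleDateFormat behavior described in the thread can be reproduced with a short standalone program. A sketch; Locale.ITALY is just one example of a non-English default locale (its short name for December is “dic”, so “Dec” cannot match):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;

public class DateLocaleDemo {
    public static void main(String[] args) throws ParseException {
        // With an English locale, "Dec" matches the locale's short month names.
        SimpleDateFormat us = new SimpleDateFormat("yyyyMMMdd", Locale.US);
        Calendar cal = Calendar.getInstance();
        cal.setTime(us.parse("1989Dec31"));
        System.out.printf("%d-%d-%d%n",
                cal.get(Calendar.YEAR),
                cal.get(Calendar.MONTH) + 1,   // Calendar months are 0-based
                cal.get(Calendar.DAY_OF_MONTH)); // prints 1989-12-31

        // Under an Italian locale the short name for December is "dic",
        // so the same input cannot be parsed.
        SimpleDateFormat it = new SimpleDateFormat("yyyyMMMdd", Locale.ITALY);
        try {
            it.parse("1989Dec31");
        } catch (ParseException e) {
            System.out.println("ParseException under Locale.ITALY");
        }
    }
}
```

This is why passing a SimpleDateFormat the pattern alone is not enough: the month names used for “MMM” come from the JVM’s default locale unless a Locale is given explicitly.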