I'd suggest first converting the string containing your date/time to a TimestampType or a DateType. The built-in functions for year, month, day, etc. will then work as expected. If your date is in a "standard" format, you can perform the conversion just by casting the column to a date or timestamp type. The formats it can auto-convert are listed at this link: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L270-L295
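As a rough illustration of the cast-then-extract approach described above, here is a minimal Scala sketch. The column name start_date, the sample values, and the helper name withDateParts are made up for the example and are not taken from your project; it also assumes the strings are in the yyyy-MM-dd HH:mm:ss form, which the cast should accept.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, dayofmonth, hour, month, year}

object DateCastSketch {

  // Hypothetical helper: casts a string column to a timestamp and
  // derives a few date components from the typed column.
  def withDateParts(df: DataFrame, source: String): DataFrame = {
    val typed = df.withColumn("ts", col(source).cast("timestamp"))
    typed
      .withColumn("year", year(col("ts")))
      .withColumn("month", month(col("ts")))
      .withColumn("day", dayofmonth(col("ts")))
      .withColumn("hour", hour(col("ts")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-cast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; "start_date" is an assumed column name.
    val raw = Seq("2017-10-03 10:43:00", "2017-12-25 08:15:30").toDF("start_date")

    withDateParts(raw, "start_date").show(false)

    spark.stop()
  }
}
```

Once the column is a real timestamp, grouping and filtering by year, month, and so on can be done on the derived columns. Strings the cast cannot parse come back as null, so it is worth checking for nulls after the conversion.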
If casting won't work, you can convert it manually by specifying a format string with the built-in unix_timestamp function: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.unix_timestamp

unix_timestamp returns the number of seconds since the epoch, so cast its result to a timestamp afterwards. The format string uses the Java SimpleDateFormat pattern syntax, if I remember correctly (http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html).

Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com

On Tue, Oct 3, 2017 at 10:43 AM, Adaryl Wakefield <adaryl.wakefi...@hotmail.com> wrote:

> I gave myself a project to start actually writing Spark programs. I’m
> using Scala and Spark 2.2.0. In my project, I had to do some grouping and
> filtering by dates. It was awful and took forever. I was trying to use
> dataframes and SQL as much as possible. I see that there are date functions
> in the dataframe API but trying to use them was frustrating. Even following
> code samples was a headache because apparently the code is different
> depending on which version of Spark you are using. I was really hoping for
> a rich set of date functions like you’d find in T-SQL but I never really
> found them.
>
> Is there a best practice for dealing with dates and time in Spark? I feel
> like taking a date/time string and converting it to a date/time object and
> then manipulating data based on the various components of the timestamp
> object (hour, day, year etc.) should be a heck of a lot easier than what
> I’m finding and perhaps I’m just not looking in the right place.
>
> You can see my work here:
> https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.massstreet.net
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData <http://twitter.com/BobLovesData>