On 3 Oct 2017, at 18:43, Adaryl Wakefield
<adaryl.wakefi...@hotmail.com> wrote:

I gave myself a project to start actually writing Spark programs. I’m using
Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering
by dates. It was awful and took forever. I was trying to use DataFrames and SQL
as much as possible. I see that there are date functions in the DataFrame API,
but trying to use them was frustrating. Even following code samples was a
headache, because apparently the code differs depending on which version of
Spark you are using. I was really hoping for a rich set of date functions like
the ones you’d find in T-SQL, but I never found them.
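For what it's worth, the common T-SQL date operations do have counterparts in `org.apache.spark.sql.functions` in Spark 2.2.0; they are just named differently. A minimal sketch, not taken from the linked project: the object name, sample rows and column names are hypothetical, the strings are assumed to be ISO-formatted so a plain cast can parse them, and the session time zone is pinned to UTC (an assumption) so parsing and extraction agree.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DatePartsSketch {
  // Hypothetical sample data: ISO-formatted strings that a plain cast can parse.
  def sampleTrips(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("2013-08-29 14:13:00", "2013-08-29 14:14:00"),
      ("2013-08-31 22:31:00", "2013-09-01 00:50:00")
    ).toDF("start_str", "end_str")
  }

  // Rough T-SQL equivalents are noted next to each column.
  def dateParts(trips: DataFrame): DataFrame = {
    val start = col("start_str").cast("timestamp")
    val end = col("end_str").cast("timestamp")
    trips.select(
      year(start).as("yr"),                        // DATEPART(year, ...)
      month(start).as("mo"),                       // DATEPART(month, ...)
      dayofmonth(start).as("dy"),                  // DATEPART(day, ...)
      hour(start).as("hr"),                        // DATEPART(hour, ...)
      date_add(to_date(start), 7).as("plus_week"), // DATEADD(day, 7, ...)
      datediff(end, start).as("days_between")      // DATEDIFF(day, start, end)
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("date-parts-sketch")
      .master("local[*]")
      // Pin the session time zone so parsing and extraction use the same zone.
      .config("spark.sql.session.timeZone", "UTC")
      .getOrCreate()
    dateParts(sampleTrips(spark)).show(false)
    spark.stop()
  }
}
```

`datediff` compares the calendar dates of its two arguments, so a trip from 22:31 to 00:50 the next day counts as 1 day, much like T-SQL `DATEDIFF(day, ...)`.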

Is there a best practice for dealing with dates and times in Spark? I feel like
taking a date/time string, converting it to a date/time object, and then
manipulating data based on the various components of the timestamp (hour, day,
year, etc.) should be a heck of a lot easier than what I’m finding. Perhaps I’m
just not looking in the right place.
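One pattern that tends to keep this manageable is to parse the string into a real `TimestampType` column once, up front, and then do all the filtering and grouping on that column with the built-in functions, here in plain Spark SQL. A minimal sketch under stated assumptions: the input format `M/d/yyyy H:mm`, the column name `start_str`, the view name `trips`, the date range and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object TripsByHourSketch {
  // Assumed input format, e.g. "8/29/2013 14:13"; adjust to match the real data.
  val InputFormat = "M/d/yyyy H:mm"

  // Parse the string column into a timestamp once, then let Spark SQL
  // filter by date and group by a timestamp component.
  def tripsByHour(spark: SparkSession, raw: DataFrame): DataFrame = {
    raw
      .withColumn("start_ts", to_timestamp(col("start_str"), InputFormat))
      .createOrReplaceTempView("trips")

    spark.sql(
      """SELECT hour(start_ts) AS hr, count(*) AS n
        |FROM trips
        |WHERE to_date(start_ts)
        |      BETWEEN CAST('2013-08-29' AS DATE) AND CAST('2013-08-31' AS DATE)
        |GROUP BY hour(start_ts)
        |ORDER BY hr""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trips-by-hour-sketch")
      .master("local[*]")
      .config("spark.sql.session.timeZone", "UTC")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows; the last one falls outside the date range.
    val raw = Seq("8/29/2013 14:13", "8/29/2013 14:45", "8/30/2013 9:05", "9/2/2013 8:00")
      .toDF("start_str")
    tripsByHour(spark, raw).show()
    spark.stop()
  }
}
```

`to_timestamp(column, format)` and the `spark.sql.session.timeZone` setting were both added in Spark 2.2.0, which may explain some of the differences between code samples written for different versions.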

You can see my work here: 
https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala


Once you've done that one, I have a few hundred MB of London bike stats if you
want them. Their timestamps come in as strings, but "01/01/1970" is by far the
most popular dropoff time, which is 0 in the Unix epoch...

9809600,0,6248,01/01/1970 00:00,0,NA,31/01/2012 19:31,365,City Road: Angel
9806201,0,6422,01/01/1970 00:00,0,NA,31/01/2012 19:32,17,Hatton Wall: Holborn
9802063,0,4096,01/01/1970 00:00,0,NA,31/01/2012 19:34,338,Wellington Street : Strand
9804765,0,5276,01/01/1970 00:00,0,NA,31/01/2012 19:37,93,Cloudesley Road: Angel
9806779,1970,14,31/01/2012 20:11,410,Edgware Road Station: Paddington
9813333,0,5810,01/01/1970 00:00,0,NA,31/01/2012 19:39,114,Park Road (Baker Street): Regent's Park
9803952,0,5682,01/01/1970 00:00,0,NA,31/01/2012 19:41,210,Hinde Street: Marylebone
9818659,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:41,87,Devonshire Square: Liverpool Street
9808144,0,5244,01/01/1970 00:00,0,NA,31/01/2012 19:42,374,Waterloo Station 1: Waterloo
9814365,0,5422,01/01/1970 00:00,0,NA,31/01/2012 19:48,15,Great Russell Street: Bloomsbury
9816863,0,6079,01/01/1970 00:00,0,NA,31/01/2012 19:49,258,Kensington Gore: Knightsbridge
9818469,0,4903,01/01/1970 00:00,0,NA,31/01/2012 19:50,341,Craven Street: Strand
9811512,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:50,298,Curlew Street: Shad Thames
9817931,0,708,01/01/1970 00:00,0,NA,31/01/2012 19:51,341,Craven Street: Strand
9816429,0,3210,01/01/1970 00:00,0,NA,31/01/2012 19:59,388,Southampton Street: Strand
9806284,0,4359,01/01/1970 00:00,0,NA,31/01/2012 20:06,335,Tavistock Street: Covent Garden
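A minimal sketch of one way to handle timestamps like these, assuming the day-first `dd/MM/yyyy HH:mm` layout seen in the sample rows: treat the `01/01/1970 00:00` value as missing (null) rather than as a real time, and parse everything else with `to_timestamp`. The column names `end_str`/`start_str`, the object name and the sample rows are hypothetical, since the data has no header.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object LondonTimesSketch {
  // Day-first format used in the sample rows, e.g. "31/01/2012 19:31".
  val LondonFormat = "dd/MM/yyyy HH:mm"
  // Placeholder that appears instead of a real time.
  val EpochPlaceholder = "01/01/1970 00:00"

  // Hypothetical column names: the sample rows carry no header.
  def parseTimes(raw: DataFrame): DataFrame =
    raw
      .withColumn("end_ts",
        // Compare the raw string so the check does not depend on the session time zone.
        when(col("end_str") === EpochPlaceholder, lit(null).cast("timestamp"))
          .otherwise(to_timestamp(col("end_str"), LondonFormat)))
      .withColumn("start_ts", to_timestamp(col("start_str"), LondonFormat))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("london-times-sketch")
      .master("local[*]")
      .config("spark.sql.session.timeZone", "Europe/London")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows shaped like the sample above.
    val raw = Seq(
      ("01/01/1970 00:00", "31/01/2012 19:31"),
      ("31/01/2012 20:11", "31/01/2012 19:40")
    ).toDF("end_str", "start_str")

    val parsed = parseTimes(raw)
    parsed.show(false)
    // Number of rows whose end time was only the placeholder.
    println(parsed.filter(col("end_ts").isNull).count())
    spark.stop()
  }
}
```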


Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.massstreet.net
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
