On 3 Oct 2017, at 18:43, Adaryl Wakefield <adaryl.wakefi...@hotmail.com> wrote:
I gave myself a project to start actually writing Spark programs. I’m using Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful and took forever. I was trying to use dataframes and SQL as much as possible. I see that there are date functions in the dataframe API but trying to use them was frustrating. Even following code samples was a headache because apparently the code is different depending on which version of Spark you are using. I was really hoping for a rich set of date functions like you’d find in T-SQL but I never really found them.

Is there a best practice for dealing with dates and time in Spark? I feel like taking a date/time string, converting it to a date/time object, and then manipulating data based on the various components of the timestamp object (hour, day, year etc.) should be a heck of a lot easier than what I’m finding, and perhaps I’m just not looking in the right place.

You can see my work here:
https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala

Once you've done that one, I have a few hundred MB of London bike stats if you want them. Their timestamps come in as strings, but "01/01/1970" is by far the most popular dropoff time, which is 0 in the epoch...

9809600,0,6248,01/01/1970 00:00,0,NA,31/01/2012 19:31,365,City Road: Angel
9806201,0,6422,01/01/1970 00:00,0,NA,31/01/2012 19:32,17,Hatton Wall: Holborn
9802063,0,4096,01/01/1970 00:00,0,NA,31/01/2012 19:34,338,Wellington Street : Strand
9804765,0,5276,01/01/1970 00:00,0,NA,31/01/2012 19:37,93,Cloudesley Road: Angel
9806779,1970,14,31/01/2012 20:11,410,Edgware Road Station: Paddington
9813333,0,5810,01/01/1970 00:00,0,NA,31/01/2012 19:39,114,Park Road (Baker Street): Regent's Park
9803952,0,5682,01/01/1970 00:00,0,NA,31/01/2012 19:41,210,Hinde Street: Marylebone
9818659,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:41,87,Devonshire Square: Liverpool Street
9808144,0,5244,01/01/1970 00:00,0,NA,31/01/2012 19:42,374,Waterloo Station 1: Waterloo
9814365,0,5422,01/01/1970 00:00,0,NA,31/01/2012 19:48,15,Great Russell Street: Bloomsbury
9816863,0,6079,01/01/1970 00:00,0,NA,31/01/2012 19:49,258,Kensington Gore: Knightsbridge
9818469,0,4903,01/01/1970 00:00,0,NA,31/01/2012 19:50,341,Craven Street: Strand
9811512,0,5572,01/01/1970 00:00,0,NA,31/01/2012 19:50,298,Curlew Street: Shad Thames
9817931,0,708,01/01/1970 00:00,0,NA,31/01/2012 19:51,341,Craven Street: Strand
9816429,0,3210,01/01/1970 00:00,0,NA,31/01/2012 19:59,388,Southampton Street: Strand
9806284,0,4359,01/01/1970 00:00,0,NA,31/01/2012 20:06,335,Tavistock Street: Covent Garden

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.massstreet.net
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData <http://twitter.com/BobLovesData>
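
For what it's worth, here is a minimal sketch of the string-to-timestamp-to-components workflow asked about above, written against the Spark 2.2 DataFrame API and fed with three of the London rows. It is one possible approach, not an established best practice. The column names (`rental_id`, `end_time`, `start_time`) are assumptions, since the sample has no header; which field is the dropoff and which is the pickup is inferred from the comment about "01/01/1970". Treating that epoch-zero value as "missing" is also an assumption. The object and method names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object BikeDatesSketch {
  // Timestamp layout seen in the sample rows, e.g. "31/01/2012 19:31".
  val Pattern = "dd/MM/yyyy HH:mm"
  // Epoch-zero string from the sample; assumed here to mean "no dropoff recorded".
  val EpochZero = "01/01/1970 00:00"

  /** Adds typed timestamp columns; the epoch-zero string becomes a null end_ts. */
  def parseTimes(raw: DataFrame): DataFrame =
    raw
      .withColumn("start_ts", to_timestamp(col("start_time"), Pattern))
      .withColumn("end_ts",
        when(col("end_time") === EpochZero, lit(null).cast("timestamp"))
          .otherwise(to_timestamp(col("end_time"), Pattern)))

  /** Counts rentals and null dropoffs per (year, hour) of the start time. */
  def rentalsByHour(parsed: DataFrame): DataFrame =
    parsed
      .groupBy(year(col("start_ts")).as("year"), hour(col("start_ts")).as("hour"))
      .agg(
        count(lit(1)).as("rentals"),
        sum(when(col("end_ts").isNull, 1).otherwise(0)).as("missing_dropoffs"))
      .orderBy("year", "hour")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bike-dates-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Three rows from the sample, reduced to the id and the two time fields.
    val raw = Seq(
      ("9809600", "01/01/1970 00:00", "31/01/2012 19:31"),
      ("9806201", "01/01/1970 00:00", "31/01/2012 19:32"),
      ("9806284", "01/01/1970 00:00", "31/01/2012 20:06")
    ).toDF("rental_id", "end_time", "start_time")

    rentalsByHour(parseTimes(raw)).show()
    spark.stop()
  }
}
```

The two-argument `to_timestamp(column, format)` used here was added in Spark 2.2.0, which may be part of why older code samples look different; on earlier versions the usual route was `unix_timestamp(column, format).cast("timestamp")`. Comparing against the raw string before parsing sidesteps any time-zone question about what "0 in the epoch" means once it has become a timestamp.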