Hi, I am loading a JSON file that has a timestamp (a long, in milliseconds) and several other attributes. I would like to group the records into 5-minute buckets and store each bucket as a separate file.
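To make the goal concrete, here is a minimal plain-Java sketch of the 5-minute bucketing, with no Spark involved. The class name `BucketSketch`, its method names, and the sample timestamps are made up for illustration; the only assumption carried over from the post is that timestamps are epoch milliseconds, so a 5-minute bucket is 5 * 60 * 1000 = 300000 ms wide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: not part of Spark or of the original code.
public class BucketSketch {
    // 5 minutes expressed in milliseconds: 5 * 60 * 1000 = 300000.
    public static final long BUCKET_MS = 5L * 60L * 1000L;

    // Start (in epoch ms) of the 5-minute bucket that contains the given timestamp.
    // floorDiv rounds toward negative infinity, which is what "floor" means here.
    public static long bucketStart(long timestampMs) {
        return Math.floorDiv(timestampMs, BUCKET_MS) * BUCKET_MS;
    }

    // Group timestamps by their bucket start; TreeMap keeps buckets in ascending order.
    public static Map<Long, List<Long>> groupByBucket(List<Long> timestamps) {
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : timestamps) {
            buckets.computeIfAbsent(bucketStart(ts), k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample timestamps in milliseconds.
        List<Long> sample = Arrays.asList(0L, 299_999L, 300_000L, 650_000L);
        groupByBucket(sample).forEach((start, members) ->
            System.out.println(start + " -> " + members));
    }
}
```

Each key of the resulting map corresponds to one output file in the desired result. In Spark the same arithmetic would be expressed as a column in the query, which is what the questions below are about.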
I am facing a couple of problems here:

1. Using the floor function in the select clause (to bucket by 5 minutes) gives me an error saying "java.util.NoSuchElementException: key not found: floor". How do I use the floor function in a select clause? I see that a floor method is available in the org.apache.spark.sql.functions class, but I am not sure why it is not working here.
2. Can I use the same function in the GROUP BY clause?
3. How do I store the groups as separate files after grouping them?

    String logPath = "my-json.gz";
    DataFrame logdf = sqlContext.read().json(logPath);
    logdf.registerTempTable("logs");
    DataFrame bucketLogs = sqlContext.sql(
        "Select `user.timestamp` as rawTimeStamp, `user.requestId` as requestId, "
        + "floor(`user.timestamp`/72000) as timeBucket FROM logs");
    bucketLogs.toJSON().saveAsTextFile("target_file");

Regards,
Ashok