Hi -- I'm new to Spark and trying to figure out how to generate unique counts per page by date, given this raw data:
timestamp,page,userId
1405377264,google,user1
1405378589,google,user2
1405380012,yahoo,user1
..

I can group by a field and get a count:

val lines = sc.textFile("data.csv")
val csv = lines.map(_.split(","))
// group by page
csv.groupBy(_(1)).count

But I can't see how to do a count distinct on userId, or how to also group by the timestamp field. Please let me know how to handle such cases.

Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Count-distinct-with-groupBy-usage-tp9781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
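For what it's worth, the distinct-count-per-key idea being asked about can be sketched with plain Scala collections, using the sample rows from the post. The `toDate` helper and the UTC time zone are assumptions introduced here for illustration; in Spark the analogous RDD pipeline would be a `map` to ((date, page), userId) pairs followed by `distinct()` and `countByKey()`.

```scala
import java.time.{Instant, ZoneOffset}

object UniqueCounts extends App {
  // Sample rows in the question's format: timestamp,page,userId
  val raw = Seq(
    "1405377264,google,user1",
    "1405378589,google,user2",
    "1405380012,yahoo,user1"
  )

  // Hypothetical helper: convert a Unix timestamp (seconds) to a
  // calendar-date string, assuming UTC.
  def toDate(epochSec: Long): String =
    Instant.ofEpochSecond(epochSec).atZone(ZoneOffset.UTC).toLocalDate.toString

  // Key each record by (date, page) with userId as the value, drop
  // duplicate (key, userId) pairs, then count the survivors per key.
  val rows = raw.map(_.split(","))
  val uniqueUsersPerPagePerDate: Map[(String, String), Int] =
    rows
      .map(r => ((toDate(r(0).toLong), r(1)), r(2)))
      .distinct
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.size) }

  println(uniqueUsersPerPagePerDate)
}
```

The same shape carries over to the RDD API because `map`, `distinct`, and per-key counting all exist there as transformations; the grouping key just becomes a tuple of (date, page) instead of a single field.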