Hi --

New to Spark and trying to figure out how to generate unique (distinct) counts
per page, per date, given this raw data:

timestamp,page,userId
1405377264,google,user1
1405378589,google,user2
1405380012,yahoo,user1
..

I can group by a field and get a count:

val lines = sc.textFile("data.csv")
val csv = lines.map(_.split(","))
// rows per page (a bare .groupBy(_(1)).count would only count the number of pages)
csv.groupBy(_(1)).mapValues(_.size).collect()

But I can't see how to do a distinct count on userId, and also how to apply
another grouping on a date derived from the timestamp field. Please let me
know how to handle such cases.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Count-distinct-with-groupBy-usage-tp9781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.