We are on CDH 5.0.2, which doesn't include Spark SQL yet. It may only become available in CDH 5.1, which has not been released yet.
If Spark SQL is the only option, then I might need to hack around to add it to the current CDH deployment, if that's possible.