Hey Ron,
It was pretty much exactly as Sean had described. I just needed to pass
count an anonymous function (a predicate) that tells it which elements to
count. Since I wanted to count them all, the predicate simply returns
true for every element.
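    val grouped = rdd.groupByKey().mapValues { mcs =>
      val values = mcs.map(_.foo.toDouble)
      // An always-true predicate counts every element (same as values.size).
      val n = values.count(x => true)
      val sum = values.sum
      val sumSquares = values.map(x => x * x).sum
      // Population standard deviation: sqrt(n * sum(x^2) - (sum x)^2) / n
      val stddev = math.sqrt(n * sumSquares - sum * sum) / n
      println("stddev: " + stddev)
      stddev
    }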
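
In case it helps to check the arithmetic without a cluster, here is a minimal sketch of the same calculation on plain Scala collections. It is not the Spark job itself: the names StddevSketch, populationStddev, and records are made up for illustration, and the local groupBy only stands in for rdd.groupByKey(). The sketch also assumes you want the population standard deviation (dividing by n), guards against an empty group, and clamps tiny negative variances that rounding can produce.

```scala
object StddevSketch {
  // Population standard deviation from the count, sum, and sum of squares.
  // Returns NaN for an empty collection (an assumption made for this sketch).
  def populationStddev(values: Iterable[Double]): Double = {
    val n = values.size
    if (n == 0) Double.NaN
    else {
      val sum = values.sum
      val sumSquares = values.map(x => x * x).sum
      val mean = sum / n
      // E[x^2] - (E[x])^2; clamp at zero in case rounding pushes it negative.
      val variance = sumSquares / n - mean * mean
      math.sqrt(math.max(variance, 0.0))
    }
  }

  // Hypothetical sample data: (key, value) pairs.
  val records: Seq[(String, Double)] = Seq(
    "a" -> 2.0, "a" -> 4.0, "a" -> 4.0, "a" -> 4.0,
    "a" -> 5.0, "a" -> 5.0, "a" -> 7.0, "a" -> 9.0,
    "b" -> 1.0
  )

  // Local stand-in for groupByKey().mapValues(...): one stddev per key.
  val stddevByKey: Map[String, Double] =
    records.groupBy(_._1).map { case (key, pairs) =>
      key -> populationStddev(pairs.map(_._2))
    }
}

// StddevSketch.stddevByKey("a") is 2.0 (mean 5, variance 4)
// StddevSketch.stddevByKey("b") is 0.0 (a single value has no spread)
```

One caveat with this formula: when the values are large and close together, subtracting the two terms can lose precision, so a running-mean approach may be preferable for real data.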
I hope that helps.