*Metrics API is odd in MLLib

Sam Mon, 15 Jun 2015 07:13:57 -0700

sam.sav...@barclays.com
According to
https://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

the constructor takes `RDD[(Double, Double)]`, meaning labels are Doubles.
This seems odd: shouldn't the label be a Boolean? Similarly for
MultilabelMetrics (i.e. it should be `RDD[(Array[Double], Array[Boolean])]`),
and for MulticlassMetrics shouldn't the type of both elements be generic?
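
To illustrate the friction, here is a minimal sketch of the conversion a caller with Boolean labels has to write today. It assumes the usual convention that 1.0 means positive and 0.0 means negative. `BooleanLabels`, `toDouble` and `encode` are hypothetical names, and `Seq` stands in for `RDD` so the snippet runs without Spark.

```scala
object BooleanLabels {
  // Hypothetical helper: encode a Boolean label as a Double,
  // assuming 1.0 = positive and 0.0 = negative.
  def toDouble(label: Boolean): Double = if (label) 1.0 else 0.0

  // Seq is a stand-in for RDD here; the same map call would apply to an RDD.
  def encode(scoreAndLabels: Seq[(Double, Boolean)]): Seq[(Double, Double)] =
    scoreAndLabels.map { case (score, label) => (score, toDouble(label)) }
}
```

With a Boolean-typed constructor this step would not be needed, and a label such as 0.5 could not be passed by mistake.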

Additionally, it would be good if either the ROC output type were changed or
another method were added that returns confusion matrices, so that the raw
integer counts can be obtained before the divisions are applied. E.g.

```
case class Confusion(tp: Int, fp: Int, fn: Int, tn: Int) {
  // A bunch of methods, one for each of the quantities in the table at
  // https://en.wikipedia.org/wiki/Receiver_operating_characteristic
}
...
def confusions(): RDD[Confusion]
```
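
To make the idea a bit more concrete, here is a rough sketch of what a few of those methods might look like. This is only an illustration of the proposal, not anything that exists in MLlib. The method names and the choice to return NaN on an empty denominator are my assumptions.

```scala
// Hypothetical fleshed-out version of the proposed class; not part of MLlib.
final case class Confusion(tp: Int, fp: Int, fn: Int, tn: Int) {
  // Total number of examples counted in this matrix.
  def total: Int = tp + fp + fn + tn

  // Recall / sensitivity: TP / (TP + FN).
  def truePositiveRate: Double = ratio(tp, tp + fn)

  // Fall-out: FP / (FP + TN).
  def falsePositiveRate: Double = ratio(fp, fp + tn)

  // Positive predictive value: TP / (TP + FP).
  def precision: Double = ratio(tp, tp + fp)

  // (TP + TN) / total.
  def accuracy: Double = ratio(tp + tn, total)

  // Assumption: return NaN rather than throw when the denominator is zero.
  private def ratio(num: Int, den: Int): Double =
    if (den == 0) Double.NaN else num.toDouble / den
}
```

Because the integer counts are kept, a caller can derive whichever rate they need, or sum the counts across thresholds, without having to undo a division first.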
