Re: *Metrics API is odd in MLLib

Xiangrui Meng Wed, 17 Jun 2015 17:03:11 -0700

LabeledPoint is used for both classification and regression, so the label
type is Double for simplicity. BinaryClassificationMetrics follows the same
convention and uses Double for labels. We compute the confusion matrix at
each threshold internally, but this is not exposed to users (
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala#L127).
Feel free to submit a PR to make it public. -Xiangrui
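
For illustration, here is a minimal sketch of what per-threshold confusion counts could look like. It is not the Spark implementation: plain Scala collections stand in for an RDD, the names `Confusion`, `ConfusionSketch`, and `confusions` are hypothetical, and treating a label above 0.5 as positive is an assumption of the sketch. Counts are kept as Long rather than Int, on the assumption that large datasets could overflow an Int.

```scala
// Hypothetical names throughout; a Seq stands in for RDD[(Double, Double)].
final case class Confusion(tp: Long, fp: Long, fn: Long, tn: Long)

object ConfusionSketch {
  /** Returns one (threshold, Confusion) per distinct score, highest threshold first.
    * A point is predicted positive when its score >= threshold.
    * Assumption: a label is positive when it is > 0.5.
    */
  def confusions(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Confusion)] = {
    val totalPos = scoreAndLabels.count(_._2 > 0.5).toLong
    val totalNeg = scoreAndLabels.size.toLong - totalPos

    // (score, positives, negatives) per distinct score, sorted by descending score.
    val perScore = scoreAndLabels
      .groupBy(_._1)
      .map { case (score, pairs) =>
        val pos = pairs.count(_._2 > 0.5).toLong
        (score, pos, pairs.size.toLong - pos)
      }
      .toSeq
      .sortWith(_._1 > _._1)

    // Running totals of positives and negatives give TP and FP at each threshold.
    perScore
      .scanLeft((Double.NaN, 0L, 0L)) { case ((_, tp, fp), (score, pos, neg)) =>
        (score, tp + pos, fp + neg)
      }
      .tail // drop the seed element
      .map { case (threshold, tp, fp) =>
        (threshold, Confusion(tp, fp, totalPos - tp, totalNeg - fp))
      }
  }
}
```

Ratios such as the true positive rate, tp / (tp + fn), can then be derived from the raw counts by the caller.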


On Mon, Jun 15, 2015 at 7:13 AM, Sam <[email protected]> wrote:

> According to
> https://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
>
> The constructor takes `RDD[(Double, Double)]`, meaning labels are Doubles.
> This seems odd; shouldn't they be Boolean? Similarly for MultilabelMetrics
> (i.e. it should be RDD[(Array[Double], Array[Boolean])]), and for
> MulticlassMetrics shouldn't the type of both be generic?
>
> Additionally, it would be good if either the ROC output type were changed or
> another method were added that returned confusion matrices, so that the
> raw integer counts can be obtained before the divisions. E.g.
>
> ```
> case class Confusion(tp: Int, fp: Int, fn: Int, tn: Int)
> {
>   // bunch of methods for each of the things in the table here:
>   // https://en.wikipedia.org/wiki/Receiver_operating_characteristic
> }
> ...
> def confusions(): RDD[Confusion]
> ```
>
