[ https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094668#comment-14094668 ]
Kyle Ellrott commented on SPARK-2372:
-------------------------------------

GroupedBinaryClassificationMetrics has been added to the pull request connected to this issue. GroupedBinaryClassificationMetrics is a rewrite of the BinaryClassificationMetrics methods, but it works on an RDD[(KEY, (Double, Double))] structure (rather than the RDD[(Double, Double)] that BinaryClassificationMetrics takes), where KEY is a generic type parameter for the key that identifies each data set. The methods now return Map[KEY, Double], with a separate score for each data set, rather than a single Double.

A unit test is included to validate that these functions work in the same way as the BinaryClassificationMetrics implementations.

https://github.com/kellrott/spark/commit/dcabb2f6a39c0940afc39e809a50601f46e50162

> Grouped Optimization/Learning
> -----------------------------
>
>                 Key: SPARK-2372
>                 URL: https://issues.apache.org/jira/browse/SPARK-2372
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.0.1, 1.1.0, 1.0.2
>            Reporter: Kyle Ellrott
>
> The purpose of this patch is to enable MLlib to better handle scenarios
> where the user wants to do learning on multiple feature/label sets at
> the same time. Rather than scheduling each learning task separately, this patch
> lets the user create a single RDD with an Int key that represents the 'group'
> each set of entries belongs to.
> This patch establishes the GroupedOptimizer trait, for which
> GroupedGradientDescent has been implemented. This system differs from the
> original Optimizer trait in that the original optimize method accepted
> RDD[(Double, Vector)], while the new GroupedOptimizer accepts
> RDD[(Int, (Double, Vector))].
> The difference is that the GroupedOptimizer uses a 'group' ID key in the RDD
> to multiplex multiple optimization operations in the same RDD.
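
To make the multiplexing idea in the paragraph above concrete, here is a minimal sketch of one gradient-descent step carried out per group. It is not the code from the pull request: plain Scala collections stand in for the RDD, the loss is assumed to be least squares, and the names GroupedStepSketch, Example and step are hypothetical.

```scala
// Sketch only: a Seq stands in for RDD[(Int, (Double, Vector))] and
// Array[Double] stands in for Vector. All names here are hypothetical.
object GroupedStepSketch {
  type Example = (Double, Array[Double]) // (label, features)

  // Performs one least-squares gradient-descent step for every group that
  // appears in `data`. `weights` holds one weight vector per group ID.
  def step(
      data: Seq[(Int, Example)],
      weights: Map[Int, Array[Double]],
      stepSize: Double): Map[Int, Array[Double]] = {
    val updated = data
      .groupBy { case (group, _) => group }
      .map { case (group, rows) =>
        val w = weights(group)
        val grad = new Array[Double](w.length)
        // Accumulate the gradient of 0.5 * (w . x - y)^2 over this group's rows.
        rows.foreach { case (_, (label, features)) =>
          val prediction = w.indices.map(i => w(i) * features(i)).sum
          val error = prediction - label
          w.indices.foreach(i => grad(i) += error * features(i))
        }
        val n = rows.size.toDouble
        group -> w.indices.map(i => w(i) - stepSize * grad(i) / n).toArray
      }
    // Groups with no rows in `data` keep their previous weights.
    weights ++ updated
  }
}
```

Each group's weights are updated only from rows that carry its key, so several independent models advance in a single pass over the data. On an RDD the groupBy would presumably become a keyed aggregation, but that mapping is an assumption and is not shown here.
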
> This patch also establishes the GroupedGeneralizedLinearAlgorithm trait, for
> which the 'run' method has had the RDD[LabeledPoint] input replaced with
> RDD[(Int, LabeledPoint)].
> This patch also provides a unit test and a utility that takes the results of
> MLUtils.kFold and turns them into a single grouped RDD, ready for simultaneous
> learning.
> https://github.com/apache/spark/pull/1292

--
This message was sent by Atlassian JIRA
(v6.2#6252)
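
As an illustration of the kFold utility mentioned in the issue description, the following sketch tags the training split of each fold with its fold index and concatenates the results into one keyed collection. It is an assumption about the general shape of such a utility, not the code in the pull request: Seq stands in for RDD, and the names GroupedFoldsSketch and groupTrainingFolds are hypothetical.

```scala
// Sketch only: Seq stands in for RDD, and each fold is a
// (training, validation) pair as returned by MLUtils.kFold.
object GroupedFoldsSketch {
  // Tags every training example of fold i with the group key i, then
  // concatenates all folds into a single keyed collection.
  def groupTrainingFolds[T](folds: Seq[(Seq[T], Seq[T])]): Seq[(Int, T)] =
    folds.zipWithIndex.flatMap { case ((training, _), foldId) =>
      training.map(example => (foldId, example))
    }
}
```

The resulting keys play the role of the 'group' ID, so a grouped learner could train one model per fold from a single input.
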