[ 
https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094668#comment-14094668
 ] 

Kyle Ellrott commented on SPARK-2372:
-------------------------------------

GroupedBinaryClassificationMetrics has been added to the pull request connected
to this issue.
GroupedBinaryClassificationMetrics is a rewrite of the
BinaryClassificationMetrics methods, but it works on an
RDD[(KEY, (Double, Double))] structure (rather than the RDD[(Double, Double)]
that BinaryClassificationMetrics takes), where KEY is a generic type parameter
for the key used to identify each data set. The methods now return
Map[KEY, Double], with a separate score for each data set, rather than a single
Double.

A unit test is included to validate that these functions work the same way as
the BinaryClassificationMetrics implementations.
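
To illustrate the keyed input and per-key output described above, here is a
minimal sketch. It is not the code from the pull request: local Scala
collections stand in for the RDD, the names GroupedAucSketch and
areaUnderROCByKey are hypothetical, and the AUC is computed with a simple
pairwise ranking count rather than whatever the real implementation does.

```scala
// Sketch only: plain Scala collections stand in for an RDD, and all names
// here are hypothetical rather than the pull request's actual API.
object GroupedAucSketch {

  // AUC as the fraction of (positive, negative) pairs in which the positive
  // example gets the higher score; tied scores count as half a pair.
  def areaUnderROC(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (score, label) if label > 0.5 => score }
    val neg = scoreAndLabels.collect { case (score, label) if label <= 0.5 => score }
    if (pos.isEmpty || neg.isEmpty) Double.NaN
    else {
      val wins = for (p <- pos; n <- neg)
        yield (if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
      wins.sum / (pos.size.toDouble * neg.size)
    }
  }

  // One AUC per key, mirroring the Map[KEY, Double] return shape.
  def areaUnderROCByKey[K](data: Seq[(K, (Double, Double))]): Map[K, Double] =
    data.groupBy(_._1).map { case (key, rows) =>
      key -> areaUnderROC(rows.map(_._2))
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      ("a", (0.9, 1.0)), ("a", (0.2, 0.0)), ("a", (0.7, 1.0)), ("a", (0.4, 0.0)),
      ("b", (0.3, 1.0)), ("b", (0.8, 0.0))
    )
    // Data set "a" is ranked perfectly (1.0); data set "b" is fully inverted (0.0).
    println(areaUnderROCByKey(data))
  }
}
```

Each key is scored independently, so a poor score on one data set does not
affect the score reported for another.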

https://github.com/kellrott/spark/commit/dcabb2f6a39c0940afc39e809a50601f46e50162

> Grouped Optimization/Learning
> -----------------------------
>
>                 Key: SPARK-2372
>                 URL: https://issues.apache.org/jira/browse/SPARK-2372
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.0.1, 1.1.0, 1.0.2
>            Reporter: Kyle Ellrott
>
> The purpose of this patch is to enable MLlib to better handle scenarios
> where the user wants to run learning on multiple feature/label sets at
> the same time. Rather than scheduling each learning task separately, this
> patch lets the user create a single RDD with an Int key that identifies the
> 'group' each entry belongs to.
> This patch establishes the GroupedOptimizer trait, for which
> GroupedGradientDescent has been implemented. This system differs from the
> original Optimizer trait in that the original optimize method accepts
> RDD[(Double, Vector)], while the new GroupedOptimizer accepts
> RDD[(Int, (Double, Vector))].
> The difference is that the GroupedOptimizer uses a 'group' ID key in the RDD 
> to multiplex multiple optimization operations in the same RDD.
> This patch also establishes the GroupedGeneralizedLinearAlgorithm trait, for 
> which the 'run' method has had the RDD[LabeledPoint] input replaced with 
> RDD[(Int,LabeledPoint)].
> This patch also provides a unit test and a utility that takes the results of
> MLUtils.kFold and turns them into a single grouped RDD, ready for
> simultaneous learning.
> https://github.com/apache/spark/pull/1292
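
For the kFold utility mentioned in the issue description, the grouping idea
can be sketched roughly as follows. This is an assumption-laden illustration,
not the patch's code: local Seq values stand in for the RDD pairs that
MLUtils.kFold returns, and GroupedFoldsSketch and groupTrainingFolds are
hypothetical names.

```scala
// Sketch only: Seq stands in for RDD, and the helper name is hypothetical.
object GroupedFoldsSketch {

  // Each fold is a (training, validation) pair. Tag every training record
  // with the index of its fold so all folds can share one keyed collection.
  def groupTrainingFolds[T](folds: Seq[(Seq[T], Seq[T])]): Seq[(Int, T)] =
    folds.zipWithIndex.flatMap { case ((training, _), foldId) =>
      training.map(record => (foldId, record))
    }

  def main(args: Array[String]): Unit = {
    // Three hand-written folds over the records r1, r2, r3.
    val folds = Seq(
      (Seq("r2", "r3"), Seq("r1")),
      (Seq("r1", "r3"), Seq("r2")),
      (Seq("r1", "r2"), Seq("r3"))
    )
    groupTrainingFolds(folds).foreach(println)
  }
}
```

The Int key plays the role of the 'group' ID, so a grouped learner can train
one model per fold in a single pass over the combined data.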


