yuhao yang created SPARK-13223:
----------------------------------

             Summary: Add stratified sampling to ML feature engineering
                 Key: SPARK-13223
                 URL: https://issues.apache.org/jira/browse/SPARK-13223
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: yuhao yang
            Priority: Minor


I found it useful to add a sampling transformer while working on a fraud 
detection case. It can be used for resampling or oversampling, which in turn 
are required by ensemble methods and imbalanced-data processing.

Internally, it invokes sampleByKey from the pair RDD operations.
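
To make the idea concrete, here is a minimal plain-Python sketch of per-key
(stratified) Bernoulli sampling, which is roughly what sampleByKey does when
sampling without replacement. It is not Spark code and not the proposed
transformer: the function name sample_by_key, the fraud/legitimate counts, and
the fractions are all assumptions made up for illustration.

```python
import random
from collections import Counter


def sample_by_key(pairs, fractions, seed=None):
    """Keep each (key, value) pair with the probability listed for its key.

    Hypothetical helper: mimics stratified sampling without replacement.
    A key missing from `fractions` raises KeyError.
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


# Made-up data: 1000 legitimate rows (label 0) and 20 fraud rows (label 1).
pairs = [(0, i) for i in range(1000)] + [(1, i) for i in range(20)]

# Downsample the majority class to about 5% and keep every fraud row.
sampled = sample_by_key(pairs, {0: 0.05, 1: 1.0}, seed=42)
counts = Counter(k for k, _ in sampled)
print(counts)
```

This sketch only covers downsampling. Oversampling a minority class needs
sampling with replacement (a fraction above 1.0), which is not shown here.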



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

