yuhao yang created SPARK-13223:
----------------------------------

Summary: Add stratified sampling to ML feature engineering
Key: SPARK-13223
URL: https://issues.apache.org/jira/browse/SPARK-13223
Project: Spark
Issue Type: New Feature
Components: ML
Reporter: yuhao yang
Priority: Minor
I found it useful to add a sampling transformer while working on a fraud-detection case. It can be used for resampling or oversampling, which ensemble methods and imbalanced-data processing in turn require. Internally, it invokes sampleByKey, one of the pair RDD operations.
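To make the per-key (stratified) sampling concrete, here is a plain-Python sketch with no Spark dependency. It roughly mirrors the schemes Spark uses for sampleByKey: without replacement, each record with key k is kept with probability fractions[k] (Bernoulli). With replacement, each record is emitted a Poisson(fractions[k])-distributed number of times, so a fraction above 1 oversamples that stratum. The names `sample_by_key` and `_poisson` and the "fraud"/"normal" labels are hypothetical. This is an illustration, not Spark's implementation.

```python
import math
import random


def _poisson(rng, lam):
    # Knuth's multiplication method: count uniforms multiplied before the
    # running product drops to exp(-lam). Adequate for the small rates used here.
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_by_key(pairs, fractions, with_replacement=False, seed=None):
    """Stratified sample of (key, value) pairs (hypothetical helper).

    fractions maps every key to its sampling rate; a missing key raises
    KeyError. Without replacement, rates must lie in [0, 1]. With
    replacement, rates may exceed 1 to oversample a stratum.
    """
    rng = random.Random(seed)
    out = []
    for key, value in pairs:
        rate = fractions[key]
        if with_replacement:
            copies = _poisson(rng, rate)               # 0, 1, 2, ... copies
        else:
            copies = 1 if rng.random() < rate else 0   # keep or drop
        out.extend([(key, value)] * copies)
    return out


if __name__ == "__main__":
    # Hypothetical imbalanced data: few fraud cases, many normal ones.
    data = [("fraud", i) for i in range(50)] + [("normal", i) for i in range(5000)]
    # Undersample the majority class and keep every minority record.
    under = sample_by_key(data, {"fraud": 1.0, "normal": 0.1}, seed=42)
    # Or oversample the minority class (with replacement) instead.
    over = sample_by_key(data, {"fraud": 10.0, "normal": 1.0},
                         with_replacement=True, seed=42)
    print(len(under), len(over))
```

Because each record is decided independently in a single pass, the per-stratum sample sizes are only approximately `count * rate`. That one-pass, approximate-size behavior is what makes this style of sampling cheap on distributed data.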