You need to convert the DataFrame to an RDD, call sampleByKey, and then apply
the schema back to create a DataFrame.
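import org.apache.spark.sql.DataFrame

val df: DataFrame = ...
val schema = df.schema
// Key each row by its first column (read as an Int), sample per key, then drop the keys.
val sampledRDD = df.rdd.keyBy(r => r.getAs[Int](0)).sampleByKey(...).values
// Re-apply the original schema to turn the sampled rows back into a DataFrame.
val sampled = sqlContext.createDataFrame(sampledRDD, schema)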
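For reference, here is a slightly fuller sketch of the same round trip with the sampleByKey arguments written out. The object name StratifiedSample, the helper sampleByKeyDF, and its parameters (keyCol, fractions, seed) are made up for illustration; it assumes the key column holds non-null Ints and that fractions has an entry for every key that occurs in the data.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

object StratifiedSample {
  // Hypothetical helper: approximate stratified sample of `df`,
  // using the Int in column index `keyCol` as the stratum key.
  def sampleByKeyDF(
      sqlContext: SQLContext,
      df: DataFrame,
      keyCol: Int,
      fractions: Map[Int, Double], // sampling fraction per key (assumed complete)
      seed: Long): DataFrame = {
    val schema = df.schema
    val sampledRDD: RDD[Row] = df.rdd
      .keyBy(r => r.getInt(keyCol))   // RDD[(Int, Row)]
      .sampleByKey(withReplacement = false, fractions = fractions, seed = seed)
      .values                         // drop the keys again
    // Re-attach the original schema to the sampled rows.
    sqlContext.createDataFrame(sampledRDD, schema)
  }
}
```

Swapping sampleByKey for sampleByKeyExact in the same place should give exact per-key sample sizes at the cost of extra passes over the data.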
Hopef
Hi,
I'm on Spark 1.3.0 and my data is in DataFrames.
I need operations like sampleByKey() and sampleByKeyExact().
I saw the JIRA "Add approximate stratified sampling to DataFrame" (
https://issues.apache.org/jira/browse/SPARK-7157).
That's targeted for Spark 1.5. Until that comes through, what's the eas