Hello -

I have a question about parallelizing model training in Spark.

Suppose I have this code fragment for training a model with KMeans:

labeledData.foreachRDD { rdd =>
  val normalizedData: RDD[Vector] = normalize(rdd)
  val trainedModel: KMeansModel = trainModel(normalizedData, noOfClusters)
  //.. compute WCSSE
}

Here labeledData is a DStream that I fetch from Kafka.

Is there any way I can use the above fragment to train multiple models
in parallel, each with a different value of noOfClusters? For example:

(1 to 100).foreach { i =>
  labeledData.foreachRDD { rdd =>
    val normalizedData: RDD[Vector] = normalize(rdd)
    val trainedModel: KMeansModel = trainModel(normalizedData, i)
    //.. compute WCSSE
  }
}

so that all available CPUs are used in parallel for the training.
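A driver-side sketch of the idea, stripped of Spark so it runs on its own: each candidate k is launched in a Future, standing in for a concurrently submitted Spark job (in real code, separate threads submitting jobs plus the FAIR scheduler let those jobs share the cluster). dummyTrain is a hypothetical stand-in for trainModel and is not part of any Spark API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelTraining {
  // Hypothetical stand-in for trainModel: pretends to train with k
  // clusters and returns (k, cost), where cost mimics a WCSSE value.
  def dummyTrain(k: Int): (Int, Double) = (k, 1.0 / k)

  def main(args: Array[String]): Unit = {
    // One Future per candidate k; in real Spark code each Future would
    // submit its own job from the driver, so the jobs can run
    // concurrently if scheduler mode and cluster resources allow it.
    val futures = (1 to 10).map(k => Future(dummyTrain(k)))
    val results = Await.result(Future.sequence(futures), 60.seconds)
    results.sortBy(_._1).foreach { case (k, cost) =>
      println(s"k=$k cost=$cost")
    }
  }
}
```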

regards.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/parallelizing-model-training-tp28118.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
