Hello - I have a question on parallelizing model training in Spark.

Suppose I have this code fragment for training a model with KMeans:

    labeledData.foreachRDD { rdd =>
      val normalizedData: RDD[Vector] = normalize(rdd)
      val trainedModel: KMeansModel = trainModel(normalizedData, noOfClusters)
      // .. compute WCSSE
    }

Here labeledData is a DStream that I fetched from Kafka. Is there any way I can use the above fragment to train multiple models in parallel, with different values of noOfClusters? e.g.

    (1 to 100).foreach { i =>
      labeledData.foreachRDD { rdd =>
        val normalizedData: RDD[Vector] = normalize(rdd)
        val trainedModel: KMeansModel = trainModel(normalizedData, i)
        // .. compute WCSSE
      }
    }

which would use all available CPUs in parallel for the training.

regards.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelizing-model-training-tp28118.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
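To make the idea concrete: what I have in mind is submitting each training from its own thread, since Spark's scheduler can run jobs that are submitted concurrently in parallel. Below is a minimal, Spark-free sketch of that fan-out using plain Scala Futures - `trainModel` and `normalizedData` here are hypothetical stand-ins (not MLlib's `KMeans.train` or a real RDD), just to show the concurrency pattern I mean:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelTrainingSketch {
  // Stand-in for trainModel(normalizedData, noOfClusters); the real version
  // would call KMeans.train on the cached, normalized RDD and return a
  // KMeansModel, from which WCSSE could be computed.
  def trainModel(data: Seq[Double], noOfClusters: Int): (Int, String) =
    (noOfClusters, s"model-k$noOfClusters")

  def main(args: Array[String]): Unit = {
    // Stand-in for normalize(rdd).cache() - normalize once, reuse for every k.
    val normalizedData = Seq(1.0, 2.0, 3.0)

    // One Future per cluster count: each thread would submit its own Spark
    // job, and concurrently submitted jobs can run in parallel on the cluster.
    val futures = (1 to 10).map(k => Future(trainModel(normalizedData, k)))
    val models  = Await.result(Future.sequence(futures), 30.seconds)
    println(models.size) // prints 10
  }
}
```

In the real streaming code this fan-out would presumably live inside a single foreachRDD, so the batch's RDD is normalized and cached once rather than once per value of noOfClusters.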