trying again
> On 13 Jun 2015, at 10:15, Robin East <robin.e...@xense.co.uk> wrote:
> 
> Here’s a typical way to do it:
> 
> 
> import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
> import org.apache.spark.mllib.linalg.Vectors
>  
> // Load and parse the data
> val data = sc.textFile("data/mllib/kmeans_data.txt")
> val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
>  
> // Cluster the data into two classes using KMeans
> val numClusters = 2
> val numIterations = 20
> val model = KMeans.train(parsedData, numClusters, numIterations)
>  
> val parsedDataClusters = model.predict(parsedData)
> val dataWithClusters   = parsedData.zip(parsedDataClusters)
> 
> 
>> On 12 Jun 2015, at 23:44, Minnow Noir <minnown...@gmail.com> wrote:
>> 
>> Greetings.
>> 
>> I have been following some of the tutorials online for Spark k-means 
>> clustering. I would like to be able to just "dump" all the cluster values 
>> and their centroids to a text file so I can explore the data. I have the 
>> clusters as such:
>> 
>> val clusters = KMeans.train(parsedData, numClusters, numIterations)
>> 
>> clusters
>> res2: org.apache.spark.mllib.clustering.KMeansModel = 
>> org.apache.spark.mllib.clustering.KMeansModel@59de440b
>> 
>> Is there a way to build something akin to a key-value RDD that has the 
>> center as the key and the array of values associated with that center as the 
>> value? I don't see anything in the tutorials, API docs, or the "Learning" 
>> book for how to do this.
>> 
>> Thank you
>> 
> 
