trying again
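To get the key-value shape you asked about (centre as key, points as value), one approach is to key each point by its predicted cluster index and group, then look the centre up from the model. A minimal sketch for the spark-shell, assuming the same input file as below; the val names and output path are mine:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Assumes a SparkContext `sc` (as in the spark-shell).
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
val model = KMeans.train(parsedData, 2, 20)

// Key each point by its cluster index, then group. Using the Int index
// as the key avoids relying on Vector equality for grouping.
val byCluster = model.predict(parsedData).zip(parsedData).groupByKey()

// Swap the index for the actual centre and dump everything to text files.
byCluster
  .map { case (i, points) => (model.clusterCenters(i), points) }
  .saveAsTextFile("kmeans_clusters_out")
```

If you only need the centroids themselves, `model.clusterCenters` is a plain `Array[Vector]` you can print or save directly.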
> On 13 Jun 2015, at 10:15, Robin East <robin.e...@xense.co.uk> wrote:
>
> Here’s a typical way to do it:
>
>
> import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
> import org.apache.spark.mllib.linalg.Vectors
>
> // Load and parse the data
> val data = sc.textFile("data/mllib/kmeans_data.txt")
> val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
>
> // Cluster the data into two classes using KMeans
> val numClusters = 2
> val numIterations = 20
> val model = KMeans.train(parsedData, numClusters, numIterations)
>
> val parsedDataClusters = model.predict(parsedData)
> val dataWithClusters = parsedData.zip(parsedDataClusters)
>
>
>> On 12 Jun 2015, at 23:44, Minnow Noir <minnown...@gmail.com> wrote:
>>
>> Greetings.
>>
>> I have been following some of the tutorials online for Spark k-means
>> clustering. I would like to be able to just "dump" all the cluster values
>> and their centroids to a text file so I can explore the data. I have the
>> clusters as such:
>>
>> val clusters = KMeans.train(parsedData, numClusters, numIterations)
>>
>> clusters
>> res2: org.apache.spark.mllib.clustering.KMeansModel =
>> org.apache.spark.mllib.clustering.KMeansModel@59de440b
>>
>> Is there a way to build something akin to a key value RDD that has the
>> center as the key and the array of values associated with that center as the
>> value? I don't see anything in the tutorials, API docs, or the "Learning"
>> book for how to do this.
>>
>> Thank you
>>
>