I knew I was forgetting something, right. Feel free to make an update for the docs.
On Mon, Nov 29, 2021, 4:49 PM Artemis User <arte...@dtechspace.com> wrote:

> Thanks Sean! After a little bit of digging through the source code, it
> seems that the computeCost method has been replaced by the trainingCost
> method in the KMeansSummary class. This is the hidden comment in the
> source code for the trainingCost method (somehow it wasn't propagated to
> the online Spark API doc):
>
> @param trainingCost K-means cost (sum of squared distances to the
> nearest centroid for all points in the training dataset). This is
> equivalent to sklearn's inertia.
>
> Inertia actually means the same as within-cluster sum of squares (WCSS).
> I just wish Spark's documentation could be made better...
>
> -- ND
>
> On 11/29/21 3:57 PM, Sean Owen wrote:
>
> I don't believe there is, directly, though there is ClusteringMetrics to
> evaluate clusterings in .ml. I'm kind of confused that it doesn't expose
> sum of squared distances, though; it computes silhouette only?
> You can compute it directly, pretty easily, in any event, either by
> writing up a few lines of code or by using the .mllib model inside the
> .ml model object anyway.
>
> On Mon, Nov 29, 2021 at 2:50 PM Artemis User <arte...@dtechspace.com>
> wrote:
>
>> The RDD-based org.apache.spark.mllib.clustering.KMeansModel class
>> defines a method called computeCost that is used to calculate the WCSS
>> error of K-Means clusters
>> (https://spark.apache.org/docs/latest/api/scala/org/apache/spark/mllib/clustering/KMeansModel.html).
>>
>> Is there an equivalent method to computeCost in the new ml library for
>> K-Means?
>>
>> Thanks in advance!
>>
>> -- ND
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
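[Editor's note] Sean's suggestion of "just writing up a few lines of code" can be sketched as follows. This is a plain NumPy illustration of the quantity being discussed (sum of squared distances from each point to its nearest centroid, i.e. sklearn's "inertia" / WCSS), not Spark code; the function name `wcss` and the toy data are made up for the example.

```python
import numpy as np

def wcss(points, centroids):
    """Within-cluster sum of squares (sklearn's "inertia"):
    sum of squared distances from each point to its nearest centroid."""
    # diffs[i, j, :] = points[i] - centroids[j]
    diffs = points[:, None, :] - centroids[None, :, :]
    # sq_dists[i, j] = squared Euclidean distance from point i to centroid j
    sq_dists = (diffs ** 2).sum(axis=2)
    # each point contributes its squared distance to the *nearest* centroid
    return sq_dists.min(axis=1).sum()

# tiny example: two obvious clusters, centroids at their means
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.5], [10.0, 10.5]])
print(wcss(points, centroids))  # 4 points x 0.25 each = 1.0
```

Per the hidden source comment quoted above, this same value is what `trainingCost` on `KMeansSummary` reports for the training dataset in the .ml API, so in practice you can read it off the fitted model's summary rather than recomputing it.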