I knew I was forgetting something, right. Feel free to make an update for
the docs.
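For anyone landing on this thread later: what computeCost / trainingCost report is just the sum, over all points, of the squared distance from each point to its nearest cluster center (WCSS, sklearn's "inertia"). A plain-Python sketch of that computation, with no Spark involved; the `points` and `centers` values below are made-up stand-ins for a dataset and a fitted model's cluster centers:

```python
# Hypothetical data: four 2-D points and two cluster centers
# (stand-ins for a fitted KMeans model's clusterCenters).
points = [(0.0, 0.0), (0.1, 0.2), (4.0, 4.0), (4.2, 3.9)]
centers = [(0.05, 0.1), (4.1, 3.95)]

def squared_dist(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

# WCSS / inertia: for each point, take the squared distance to its
# *nearest* center, then sum over all points. This is what the
# RDD-based computeCost (and the DataFrame-based trainingCost) report.
wcss = sum(min(squared_dist(p, c) for c in centers) for p in points)
print(wcss)  # ≈ 0.05 for this toy data
```

The same few lines translate directly to a map-and-sum over a Spark DataFrame of feature vectors, which is the "few lines of code" route mentioned below.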

On Mon, Nov 29, 2021, 4:49 PM Artemis User <arte...@dtechspace.com> wrote:

> Thanks Sean!  After a little bit of digging through the source code, it seems
> that the computeCost method has been replaced by the trainingCost method in
> KMeansSummary class.  This is the hidden comment in the source code for the
> trainingCost method (somehow it wasn't propagated to the online Spark API
> doc):
>
> @param trainingCost K-means cost (sum of squared distances to the nearest
> centroid for all points in the training dataset). This is equivalent to
> sklearn's inertia.
>
> Inertia is just another name for the within-cluster sum of squares (WCSS).
> I just wish Spark's documentation were better...
>
> -- ND
>
> On 11/29/21 3:57 PM, Sean Owen wrote:
>
> I don't believe there is, directly, though there is ClusteringMetrics to
> evaluate clusterings in .ml. I'm kinda confused that it doesn't expose sum
> of squared distances though; it computes silhouette only?
> You can compute it directly pretty easily in any event, either by writing
> a few lines of code or by using the .mllib model inside the .ml model
> object.
>
> On Mon, Nov 29, 2021 at 2:50 PM Artemis User <arte...@dtechspace.com>
> wrote:
>
>> The RDD-based org.apache.spark.mllib.clustering.KMeansModel class
>> defines a method called computeCost that is used to calculate the WCSS
>> error of K-Means clusters
>> (
>> https://spark.apache.org/docs/latest/api/scala/org/apache/spark/mllib/clustering/KMeansModel.html).
>>
>> Is there an equivalent method of computeCost in the new ml library for
>> K-Means?
>>
>> Thanks in advance!
>>
>> -- ND
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>