I see that some algorithms can cluster incrementally when the dataset is
too large to fit in memory, e.g. MiniBatchKMeans and Birch.
But is there any way to evaluate incrementally?
Since I don't know the ground truth labels, I looked at the silhouette
coefficient and the Calinski-Harabasz index.
But neither of them can be evaluated incrementally.
__
You can always predict incrementally by predicting on batches of samples.
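To make "predicting on batches" concrete, here is a minimal sketch. It assumes MiniBatchKMeans (mentioned in the question) and uses synthetic data as a stand-in for a dataset too large to load at once; the batch size and number of clusters are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# Synthetic stand-in for a large dataset (assumption): three 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 2))
               for c in (-5.0, 0.0, 5.0)])
rng.shuffle(X)  # shuffle rows so every batch mixes all blobs

batch_size = 100  # illustrative value, not a recommendation
model = MiniBatchKMeans(n_clusters=3, random_state=0)

# Fit incrementally: each call updates the centroids from one batch only.
for start in range(0, len(X), batch_size):
    model.partial_fit(X[start:start + batch_size])

# Predict incrementally: assign cluster labels one batch at a time.
labels = np.concatenate([
    model.predict(X[start:start + batch_size])
    for start in range(0, len(X), batch_size)
])
```

In a real out-of-core setting the slices of `X` would be replaced by chunks read from disk, so only one batch is in memory at any time.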
On Fri, 3 May 2019 at 10:05, lampahome wrote:
> I see some algo can cluster incrementally if dataset is too huge ex:
> minibatchkmeans and Birch.
>
> But is there any way to evaluate incrementally?
>
> I found silhouette-c
Oh sorry, I see now that you were asking about evaluation.
On Fri, 3 May 2019 at 10:12, Guillaume Lemaître wrote:
> You can always predict incrementally by predicting on batches of samples.
>
> On Fri, 3 May 2019 at 10:05, lampahome wrote:
>
>> I see some algo can cluster incrementally if dataset i
I usually use clustering to save costs on labelling.
I like to apply hierarchical clustering, and then label a small sample and
fine-tune the clustering algorithm.
That way, you can evaluate the effectiveness of the clustering in terms of
cluster purity (how many clusters contain mixed labels).
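A rough sketch of that workflow, under several assumptions: AgglomerativeClustering stands in for "hierarchical clustering", the data comes from make_blobs, and the "manually labelled" sample is simulated by revealing the true labels for 30 random points. The helper `purity_report` is a hypothetical name, not a scikit-learn function.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic data; y_true plays the role of labels a human would assign.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6,
                       random_state=0)

# Hierarchical clustering of the full (unlabelled) dataset.
cluster_ids = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Simulate labelling only a small random sample (30 of 300 points).
rng = np.random.RandomState(0)
labelled_idx = rng.choice(len(X), size=30, replace=False)


def purity_report(cluster_ids, labels):
    """Return (number of clusters with mixed labels, overall purity).

    Purity is the fraction of labelled points that carry the majority
    label of their cluster.
    """
    mixed = 0
    majority_total = 0
    for c in np.unique(cluster_ids):
        counts = np.bincount(labels[cluster_ids == c])
        majority_total += counts.max()
        mixed += int(np.count_nonzero(counts) > 1)
    return mixed, majority_total / len(labels)


n_mixed, purity = purity_report(cluster_ids[labelled_idx],
                                y_true[labelled_idx])
print(f"clusters with mixed labels: {n_mixed}, purity: {purity:.2f}")
```

Rerunning this with a different `n_clusters` or linkage and comparing the two numbers is one way to do the fine-tuning step on the labelled sample alone.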
See example with skl