[scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-03 Thread lampahome
I see that some algorithms, e.g. MiniBatchKMeans and Birch, can cluster incrementally when the dataset is too large. But is there any way to evaluate the clustering incrementally? Since I don't know the ground-truth labels, I found the silhouette coefficient and the Calinski-Harabasz index, but they can't be evaluated incrementally.

Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-03 Thread Guillaume Lemaître
You can always predict incrementally by predicting on batches of samples.
On Fri, 3 May 2019 at 10:05, lampahome wrote:
> I see some algo can cluster incrementally if dataset is too huge ex:
> minibatchkmeans and Birch.
>
> But is there any way to evaluate incrementally?
>
> I found silhouette-c
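A rough sketch of the batch-wise prediction suggested above, not code from the thread. The synthetic data, the `batch_size` value, and the choice of three clusters are assumptions made for illustration. The model is fitted with `partial_fit` and then asked for labels one batch at a time, so the full dataset never has to pass through a single call.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# Synthetic stand-in for a dataset too large to handle at once (assumption).
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(500, 2)) for c in (0.0, 5.0, 10.0)])
rng.shuffle(X)  # shuffle rows so every batch mixes all three groups

batch_size = 100  # hypothetical batch size
model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)

# Fit incrementally: each call updates the centroids using one batch only.
for start in range(0, len(X), batch_size):
    model.partial_fit(X[start:start + batch_size])

# Predict incrementally: assign labels batch by batch, then concatenate.
labels = np.concatenate(
    [model.predict(X[start:start + batch_size]) for start in range(0, len(X), batch_size)]
)
```

Because prediction only assigns each sample to its nearest centroid, the batch-wise labels match what a single `model.predict(X)` call would return. This covers incremental prediction only, not incremental evaluation.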

Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-03 Thread Guillaume Lemaître
Oh sorry, I see now that you asked about evaluating.
On Fri, 3 May 2019 at 10:12, Guillaume Lemaître wrote:
> You can always predict incrementally by predicting on batches of samples.
>
> On Fri, 3 May 2019 at 10:05, lampahome wrote:
>
>> I see some algo can cluster incrementally if dataset i

Re: [scikit-learn] Can I evaluate clustering efficiency incrementally?

2019-05-03 Thread Uri Goren
I usually use clustering to save costs on labelling. I like to apply hierarchical clustering, then label a small sample and fine-tune the clustering algorithm. That way, you can evaluate the effectiveness in terms of cluster purity (how many clusters contain mixed labels). See example with skl
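One possible reading of the purity check described above, as a minimal sketch and not the example the message refers to. The blob data, the sample size of 30, and the helper `purity_report` are hypothetical. In practice the labels for the sampled points would come from manual annotation; here the generated labels stand in for them.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data; y_true plays the role of labels a human annotator would provide.
X, y_true = make_blobs(
    n_samples=300, centers=[[0, 0], [5, 5], [10, 0]], cluster_std=0.6, random_state=0
)

# Hierarchical (agglomerative) clustering over all points.
cluster_ids = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Pretend only a small random sample has been labelled by hand (assumption).
rng = np.random.RandomState(0)
labelled_idx = rng.choice(len(X), size=30, replace=False)


def purity_report(cluster_ids, labels):
    """Return (purity, number of clusters containing mixed labels)."""
    majority_total = 0
    n_mixed = 0
    for c in np.unique(cluster_ids):
        members = labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        majority_total += counts.max()     # points agreeing with the majority label
        n_mixed += int(len(counts) > 1)    # cluster holds more than one label
    return majority_total / len(labels), n_mixed


purity, n_mixed = purity_report(cluster_ids[labelled_idx], y_true[labelled_idx])
print(f"purity={purity:.2f}, clusters with mixed labels={n_mixed}")
```

Only the labelled sample enters the score, so the cost of evaluation depends on how many points were labelled rather than on the size of the full dataset.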