Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread Andreas Mueller
Hi Denis. As far as I know there was no change in MiniBatchKMeans. Was the result you had before robust? I would guess that there is something non-deterministic somewhere. Cheers, Andy On 09/07/2012 03:56 PM, denis wrote: Folks, what changed in MiniBatchKMeans in .12 ? Running it on datasets.

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread Nelle Varoquaux
On 10 September 2012 10:08, Andreas Mueller wrote: > Hi Denis. > As far as I know there was no change in MiniBatchKMeans. > Was the result you had before robust? I would guess > that there is something non-deterministic somewhere. > MiniBatchKMeans uses a random initialisation and can easily fall

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread Andreas Mueller
On 09/10/2012 09:31 AM, Nelle Varoquaux wrote: On 10 September 2012 10:08, Andreas Mueller wrote: Hi Denis. As far as I know there was no change in MiniBatchKMeans. Was the result you had before robust? I would guess that there is something no

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread denis
Hi Nelle, hi Andreas, the runs were with random_state=0; running test-mbkmeans.py (does it make sense ?) with seeds 0 .. 5 changing only sklearn version --> test-mbkmeans.py 10sep MiniBatchKMeans( seed 0 .. 5 ) datasets.load_digits() scikit-learn .11 -- cluster sizes MiniBatchKmeans: [275 195

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread Andreas Müller
Hi Denis. That is weird. Unfortunately I don't have time to investigate ATM. Maybe someone else does? Also, I thought we would reinitialize clusters with zero points? Cheers, Andy - Original Message - From: "denis" To: scikit-learn-general@lists.sourceforge.net Sent: Monday, 10 Sep

[Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, 8 in .12 ?

2012-09-10 Thread denis
Folks, what changed in MiniBatchKMeans in .12 ? Running it on datasets.load_digits() gave 10 classes in .11 but now only 8 in .12 ? test-mbkmeans.py and logs attached. (Sure the size is too small for MiniBatch and for that matter kmeans is I think general

Re: [Scikit-learn-general] problem clustering using PCA and kmeans

2012-09-10 Thread Aliabbas Petiwala
Apart from the above problem, can anyone suggest how to extract cluster information from a dendrogram in scikit? More specifically, I want the clusters to be returned as lists of file names of the documents. Thanks On Sun, Sep 9, 2012 at 4:50 PM, Aliabbas Petiwala wrote: > Thanks Olivier that hel

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread Olivier Grisel
2012/9/10 Andreas Müller: > Hi Denis. > That is weird. Unfortunately I don't have time to investigate ATM. > Maybe someone else does? > Also, I thought we would reinitialize clusters with zero points? The fact that the effectively returned number 8 is not the requested value (10 in this case) sou

Re: [Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-10 Thread denis
Andy, no problem -- repeat, kmeans is imho generally weak (but widely used because it's so simple ? semi-supervised svm with only 10 - 20 random samples per digit is quite a bit better (on optdigits), and I'm sure you guys have better semi-sup methods -- tell me more ?) Fwiw KMeans is exactly

[Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not k= Olivier +1

2012-09-10 Thread denis
Olivier, +1, I had k= instead of n_clusters= -- drew a warning but not the same :( Fwiw, for seed in range(5): mbkm = MiniBatchKMeans( 10, random_state=seed, verbose=1 ).fit(digits.data) --> seed 0: clusters [294 205 194 188 185 184 178 165 117 87] seed 1: clusters [280 203 185 181 177
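The seed loop above can be written out as a runnable sketch. This is not the original test-mbkmeans.py, which is not shown here; the helper name `cluster_sizes` is made up for illustration, and the sizes it prints will depend on the installed scikit-learn version, so no expected output is given.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits


def cluster_sizes(data, n_clusters, seed):
    """Fit MiniBatchKMeans and return the cluster sizes, largest first.

    Hypothetical helper, not part of scikit-learn.
    """
    # Pass n_clusters by keyword so the requested number of clusters is explicit.
    model = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed)
    labels = model.fit_predict(data)
    # minlength keeps empty clusters visible as zero counts.
    counts = np.bincount(labels, minlength=n_clusters)
    return sorted(counts.tolist(), reverse=True)


digits = load_digits()
for seed in range(5):
    sizes = cluster_sizes(digits.data, n_clusters=10, seed=seed)
    nonempty = sum(1 for s in sizes if s > 0)
    print(f"seed {seed}: {nonempty} non-empty clusters {sizes}")
```

Counting the non-empty clusters per seed makes it easy to see whether a run returned fewer clusters than were requested.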

Re: [Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not k= Olivier +1

2012-09-10 Thread Andreas Müller
So using k didn't work?! uh oh. That is a bug! Looks like one more for a 0.12.1 :-( - Original Message - From: "denis" To: scikit-learn-general@lists.sourceforge.net Sent: Monday, 10 September 2012 11:11:34 Subject: [Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not
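For readers unfamiliar with this kind of bug, here is a minimal sketch of how a renamed keyword is usually kept working: the old name warns and its value is forwarded to the new one. `ToyClusterer` is a hypothetical class written for this illustration. It is not scikit-learn's code and says nothing about how the 0.12 bug was actually fixed.

```python
import warnings


class ToyClusterer:
    """Hypothetical estimator; NOT scikit-learn's implementation."""

    def __init__(self, n_clusters=8, k=None):
        if k is not None:
            # The old keyword still works, but tells the caller to migrate.
            warnings.warn(
                "'k' is deprecated; use 'n_clusters' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Forward the old value. Skipping this step would silently
            # fall back to the default number of clusters.
            n_clusters = k
        self.n_clusters = n_clusters


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ToyClusterer(k=10)

new_style = ToyClusterer(n_clusters=10)
print(old_style.n_clusters, new_style.n_clusters, len(caught))  # 10 10 1
```

If the forwarding assignment were missing, `ToyClusterer(k=10)` would warn and still use the default of 8, which is the kind of symptom reported in this thread.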

Re: [Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not k= Olivier +1

2012-09-10 Thread Olivier Grisel
2012/9/10 denis: > Olivier, +1, > I had k= instead of n_clusters= -- > drew a warning but not the same :( Thanks for the bug report. > Fwiw, > for seed in range(5): > mbkm = MiniBatchKMeans( 10, random_state=seed, verbose=1 > ).fit(digits.data) > > --> > seed 0: clusters [294 205 194 18

Re: [Scikit-learn-general] MiniBatchKMeans 10 classes: n_clusters= not k= Olivier +1

2012-09-10 Thread Olivier Grisel
2012/9/10 Olivier Grisel: > 2012/9/10 denis: >> Olivier, +1, >> I had k= instead of n_clusters= -- >> drew a warning but not the same :( > > Thanks for the bug report. > >> Fwiw, >> for seed in range(5): >> mbkm = MiniBatchKMeans( 10, random_state=seed, verbose=1 >> ).fit(digits.data) >>

Re: [Scikit-learn-general] kmeans on optdigits

2012-09-10 Thread Olivier Grisel
Please stay on the mailing list using reply-all if it's not the case by default. 2012/9/10 denis : > Olivier, > ok, agree, but > http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py > suggests that KMeans on optdigits is reasonable, w

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-10 Thread Yaroslav Halchenko
On Sat, 08 Sep 2012, Andreas Mueller wrote: > I didn't know about the deprecation warnings. > For the other warnings: I think using the sklearn.test() > is a bad idea and using ``nosetests sklearn --exe`` eh... I was just using nosetests -sv sklearn ... adding --exe indeed is not a bad idea I gu

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-10 Thread Ark
Olivier Grisel writes: > > 2012/9/6 Ark: > > > >> And how large in bytes? It seems that it should be small enough to be > >> able to use sklearn.linear_model.LogisticRegression despite the data > >> copy in memory. > >> > > > > Right now it's not even 100M, but it will extend to 1G at least.

[Scikit-learn-general] Memory explosion with GridSearchCV

2012-09-10 Thread Christian Jauvin
Hi, I'm working on a text classification problem, and the strategy I'm currently studying is based on this example: http://scikit-learn.org/dev/auto_examples/grid_search_text_feature_extraction.html When I replace the data component by my own, I have found that the memory requirement explodes in