Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Mon, Mar 26, 2012 at 09:48:52AM +1100, Robert Layton wrote: >It's a good description of DBSCAN. I would point out that the outliers are >found as "The points which do not belong to any current cluster and do not >have enough close neighbours to start a new cluster." Thanks, I have a
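The outlier rule quoted above (points that belong to no cluster and do not have enough close neighbours to start one) can be illustrated with a minimal pure-Python sketch of a DBSCAN-style labelling. This is not the scikit-learn implementation; the function names `region_query` and `dbscan`, and the parameter names `eps` and `min_samples`, are chosen here for illustration only.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise (outliers)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            # Not enough close neighbours to start a new cluster.
            # The point may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep growing
    return labels


# Two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_samples=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point ends up with label -1 because it is never reached from a core point and has too few neighbours to start a cluster of its own.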

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Robert Layton
On 26 March 2012 09:38, Gael Varoquaux wrote: > On Mon, Mar 26, 2012 at 12:27:37AM +0200, Andreas wrote: > > Well as you can tell my motivation for working on the examples > > and the data sets was not all altruistic ;) > > The key to success in a shared project is that every actor should get a >

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Mon, Mar 26, 2012 at 12:27:37AM +0200, Andreas wrote: > Well as you can tell my motivation for working on the examples > and the data sets was not all altruistic ;) The key to success in a shared project is that every actor should get a benefit. I don't work on the scikit for the glory of manki

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/26/2012 12:31 AM, Gael Varoquaux wrote: > On Mon, Mar 26, 2012 at 12:22:53AM +0200, Andreas wrote: > >> Thanks for the great work. This is really a step forward for the docs! >> > Thanks guys. I must confess that I had a presentation to give tomorrow > about clustering and I jumped o

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Mon, Mar 26, 2012 at 12:22:53AM +0200, Andreas wrote: > Thanks for the great work. This is really a step forward for the docs! Thanks guys. I must confess that I had a presentation to give tomorrow about clustering and I jumped on the occasion to improve the docs. Gael

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/26/2012 12:19 AM, Gael Varoquaux wrote: > Thanks for all the feedback. I have included it and merged to master, > because I was running out of time, but it can still be improved! Thanks for the great work. This is really a step forward for the docs!

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Mon, Mar 26, 2012 at 09:21:21AM +1100, Robert Layton wrote: >This is great, Thanks, >and I think it would be a good idea to include such a >summary table for classification at some point as well. Yes. Actually I believe that every main use case should have one, at the beginning of

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Robert Layton
On 26 March 2012 09:19, Gael Varoquaux wrote: > Thanks for all the feedback. I have included it and merged to master, > because I was running out of time, but it can still be improved! > > Gael

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
Thanks for all the feedback. I have included it and merged to master, because I was running out of time, but it can still be improved! Gael

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/26/2012 12:06 AM, Gael Varoquaux wrote: > On Sun, Mar 25, 2012 at 11:56:31PM +0200, Andreas wrote: > >> As far as I can see, your groups are "KMeans + Ward" and "rest". >> I don't know how ward works but looking at the lena example, >> the clusters don't seem to be convex. >> > But

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 11:56:31PM +0200, Andreas wrote: > As far as I can see, your groups are "KMeans + Ward" and "rest". > I don't know how ward works but looking at the lena example, > the clusters don't seem to be convex. But you are looking in the wrong space: the physical space, and not the

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/25/2012 11:47 PM, Gael Varoquaux wrote: > On Sun, Mar 25, 2012 at 11:38:50PM +0200, Andreas wrote: > >>> Unlike something like spectral clustering, it is the euclidean distance >>> to the centers that is minimized. Thus K-Means will seek clusters that >>> are regular in the flat euclidean

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 11:38:50PM +0200, Andreas wrote: > > Unlike something like spectral clustering, it is the euclidean distance > > to the centers that is minimized. Thus K-Means will seek clusters that > > are regular in the flat euclidean space. > Ok, that's right. Though I would argue that
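The point about K-Means minimizing Euclidean distance to the centers can be made concrete with a small sketch of Lloyd's iteration in pure Python. The quantity that decreases at each step is the sum of squared Euclidean distances from every point to its assigned center. The helper names below (`assign`, `update`, `inertia`, `kmeans`) are illustrative and are not taken from the scikit-learn code base.

```python
import math


def assign(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of its points; keep it if it has none."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == k]
        if not members:
            new_centers.append(tuple(old))
            continue
        dim = len(members[0])
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dim)))
    return new_centers


def inertia(points, centers, labels):
    """Sum of squared Euclidean distances to the assigned centers."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def kmeans(points, centers, n_iter=10):
    """Run a fixed number of Lloyd iterations from the given initial centers."""
    for _ in range(n_iter):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    labels = assign(points, centers)
    return centers, labels


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 0)])
print(labels)                            # [0, 0, 1, 1]
print(inertia(points, centers, labels))  # 4.0
```

Because the objective only involves straight-line distances to a single center per cluster, each cluster is the set of points closest to its center, which is why the resulting regions are convex in the feature space.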

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
> Unlike something like spectral clustering, it is the euclidean distance > to the centers that is minimized. Thus K-Means will seek clusters that > are regular in the flat euclidean space. > > Ok, that's right. Though I would argue that the distance measure is not the only factor here. MeanSh

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 11:30:36PM +0200, Andreas wrote: > On 03/25/2012 11:32 PM, Gael Varoquaux wrote: > > On Sun, Mar 25, 2012 at 11:23:55PM +0200, Andreas wrote: > >> Without looking at the source, it could be that we initialize GMM > >> with the result of KMeans. > > We do. > Then I would s

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/25/2012 11:32 PM, Gael Varoquaux wrote: > On Sun, Mar 25, 2012 at 11:23:55PM +0200, Andreas wrote: > >> Without looking at the source, it could be that we initialize GMM >> with the result of KMeans. >> > We do. > > Then I would suggest changing that. Although not sure what the

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 11:22:32PM +0200, Andreas wrote: > >> I'm not sure if "flat geometry" is a good way to describe the case that > >> KMeans works in. I would have said "convex clusters". Not sure in how far > >> that applies to hierarchical clustering, though. > > Euclidean distance. > Can

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 11:23:55PM +0200, Andreas wrote: > Without looking at the source, it could be that we initialize GMM > with the result of KMeans. We do. > I read that if you do this, the GMM > solution rarely changes. Not surprising. > Instead, one should only run KMeans for one or two i

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/25/2012 11:20 PM, Gael Varoquaux wrote: > On Sun, Mar 25, 2012 at 10:51:36PM +0200, Gael Varoquaux wrote: > >>> - You should at least refer to GMMs, as this is the most popular >>> clustering framework that comes with a natural probabilistic setting >>> > >> Agreed. >> >

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
>> I'm not sure if "flat geometry" is a good way to describe the case that >> KMeans works in. I would have said "convex clusters". Not sure in how far >> that applies to hierarchical clustering, though. >> > Euclidean distance. > Can you please elaborate? >> Also, I would mention explic

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 10:51:36PM +0200, Gael Varoquaux wrote: > > - You should at least refer to GMMs, as this is the most popular > > clustering framework that comes with a natural probabilistic setting > Agreed. Actually, on our various examples, it is impressive how much GMMs behave similar

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 10:12:59PM +0200, Andreas wrote: > For the input, I would hope we can implement Olivier's proposal soon > so that we don't need to differentiate the different input types. Agreed. It was literally itching me when I was playing with the example. > I'm not sure if "flat geomet

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 10:22:39PM +0200, Andreas wrote: > I might not have the time next week but after that I can give > it a shot if you don't have the time. It would be great, as I am not a specialist in this method. Gael

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
On Sun, Mar 25, 2012 at 10:10:51PM +0200, bthirion wrote: > - "Hierarchical clustering -> Few clusters": I thought it was not the > best use case for these algorithms Yes, this is clearly a typo. > - "Hierarchical clustering -> even cluster size": this is not true if > you consider single linka

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
> - You should at least refer to GMMs, as this is the most popular > clustering framework that comes with a natural probabilistic setting > +1 > - With mean shift, I would refer to 'modes' rather than 'blobs'. > +1 In general the mean shift docs could be improved a lot. There is quite a n

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
On 03/25/2012 10:25 PM, Olivier Grisel wrote: > On 25 March 2012 22:12, Andreas wrote: > >> ps: Maybe I'll find time to do the "fit_distance"/"fit_kernel" API in >> one or two weeks. >> > As discussed earlier, I would prefer `fit_symmetric` or `fit_pairwise` > when working with squared

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Olivier Grisel
On 25 March 2012 22:12, Andreas wrote: > > ps: Maybe I'll find time to do the "fit_distance"/"fit_kernel" API in > one or two weeks. As discussed earlier, I would prefer `fit_symmetric` or `fit_pairwise` when working with squared distance / affinity / kernel matrices as main data input. -- Ol
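For readers unfamiliar with the kind of input this naming discussion is about, here is a minimal sketch of building a pairwise distance matrix and an affinity matrix derived from it, in pure Python. It assumes the matrices in question are n x n and symmetric, with one row and column per sample. The helper names `pairwise_distances` and `rbf_affinity` and the `gamma` parameter are illustrative; the proposed `fit_symmetric` / `fit_pairwise` methods themselves are only names under discussion in this thread and are not shown.

```python
import math


def pairwise_distances(points):
    """Symmetric n x n matrix of Euclidean distances between all points."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves at once so the matrix is symmetric.
            D[i][j] = D[j][i] = math.dist(points[i], points[j])
    return D


def rbf_affinity(D, gamma=1.0):
    """Turn distances into similarities: exp(-gamma * d**2), 1.0 on the diagonal."""
    return [[math.exp(-gamma * d * d) for d in row] for row in D]


points = [(0, 0), (3, 4), (6, 8)]
D = pairwise_distances(points)
A = rbf_affinity(D, gamma=0.01)
print(D[0])  # [0.0, 5.0, 10.0]
```

An estimator that accepts such a matrix as its main input sees only relationships between samples, never the raw feature vectors, which is why a separate entry point (or a way to mark the input type) was being debated.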

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Andreas
> I am working on a summary table on clustering methods. It is not > finished, I need to do a bit more literature review, however, I'd love > some feedback on the current status: > https://github.com/GaelVaroquaux/scikit-learn/blob/master/doc/modules/clustering.rst > > > Thanks for starting on

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread bthirion
Hi Gael, Here are some suggestions regarding details of the page: - "Hierarchical clustering -> Few clusters": I thought it was not the best use case for these algorithms - "Hierarchical clustering -> even cluster size": this is not true if you consider single linkage, or even in general with Wa

Re: [Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Olivier Grisel
On 25 March 2012 20:40, Gael Varoquaux wrote: > Hi list, > > I am working on a summary table on clustering methods. It is not > finished, I need to do a bit more literature review, however, I'd love > some feedback on the current status: > https://github.com/GaelVaroquaux/scikit-learn/blob/maste

[Scikit-learn-general] Summary table on clustering

2012-03-25 Thread Gael Varoquaux
Hi list, I am working on a summary table on clustering methods. It is not finished, I need to do a bit more literature review, however, I'd love some feedback on the current status: https://github.com/GaelVaroquaux/scikit-learn/blob/master/doc/modules/clustering.rst Cheers, Gaël