Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Andreas Mueller
+1 On 07/31/2015 04:50 PM, Sebastian Raschka wrote: Hi, Timo, wow, the code is really short, well organized and commented. But it's probably better to submit a pull request so that people can comment directly on sections of the code and get notifications and updates. Best, Sebastian On Jul 31,

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Sebastian Raschka
Hi, Timo, wow, the code is really short, well organized and commented. But it's probably better to submit a pull request so that people can comment directly on sections of the code and get notifications and updates. Best, Sebastian > On Jul 31, 2015, at 4:35 PM, Timo Erkkilä wrote: > > Good idea

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
Good ideas. I'm fine with integrating the code into Scikit-Learn even though it's a bit of work. :) I've pushed the first version of the code under the feature branch "kmedoids": https://github.com/terkkila/scikit-learn/blob/kmedoids/sklearn/cluster/k_medoids_.py I've added drafts of the "clustering" and "d
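For readers following the thread, the alternating k-medoids loop under discussion can be sketched in a few lines of NumPy. This is a generic textbook illustration operating on a precomputed distance matrix, not the code in the linked branch:

```python
import numpy as np

def k_medoids(D, k, max_iter=100, rng=None):
    """Basic alternating k-medoids on a precomputed distance matrix D."""
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # For each cluster, pick the member minimizing total in-cluster distance.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break  # medoid set stable: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```

Because everything goes through `D`, the same loop works for any metric (Euclidean or otherwise) as long as pairwise distances can be precomputed.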

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Sebastian Raschka
To address the efficiency issue for large datasets (to some extent), we could maybe have a `clustering` argument where `clustering='pam'` or `clustering='clara'`; 'pam' should probably be the default. In a nutshell, CLARA repeatedly draws random samples (k < n_samples), applies PAM to them, and
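The CLARA scheme outlined above can be sketched roughly as follows. This is a hypothetical illustration, not a proposed API: the parameter names are made up, and a plain PAM-style `k_medoids` helper is inlined so the sketch is self-contained:

```python
import numpy as np

def k_medoids(D, k, max_iter=100, rng=None):
    # Plain PAM-style alternation on a precomputed distance matrix.
    rng = np.random.default_rng(rng)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                new[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return medoids

def clara(D, k, sample_size=40, n_draws=5, rng=None):
    # CLARA: run PAM on random subsamples, keep the medoid set whose
    # total assignment cost over the *full* dataset is lowest.
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    best, best_cost = None, np.inf
    for _ in range(n_draws):
        sample = rng.choice(n, size=min(sample_size, n), replace=False)
        sub = k_medoids(D[np.ix_(sample, sample)], k, rng=rng)
        medoids = sample[sub]  # map subsample indices back to full dataset
        cost = D[:, medoids].min(axis=1).sum()
        if cost < best_cost:
            best, best_cost = medoids, cost
    return best, np.argmin(D[:, best], axis=1)
```

The key point is that each PAM run only touches a `sample_size × sample_size` slice of the distance matrix, while candidate medoids are still scored against all points.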

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Andreas Mueller
Cool. Including the code in scikit-learn is often a bit of a process but it might indeed be interesting. You could just start with a pull request - or publish a gist if you don't think you'll have time to work on the inclusion and leave that part to someone else. Cheers, Andy On 07/31/2015 0

Re: [Scikit-learn-general] Implementation of DBCLASD for clustering

2015-07-31 Thread Andreas Mueller
Hi Sebastian. Have you seen this used much recently? How does it compare against DBSCAN, BIRCH, OPTICS or just KMeans? Cheers, Andy On 07/31/2015 10:28 AM, Sebastián Palacio wrote: Hello all, I've been investigating clustering algorithms with special interest in non-parametric methods and,

Re: [Scikit-learn-general] Implementation of DBCLASD for clustering

2015-07-31 Thread Sebastián Palacio
Hello all, I've been investigating clustering algorithms with special interest in non-parametric methods, and one that is mentioned quite often is DBCLASD [1]. I've looked around but I haven't been able to find a single implementation of this algorithm whatsoever, so I decided to implement

[Scikit-learn-general] Conditional Inference Trees?

2015-07-31 Thread Daniel Homola
Hi all, I was checking the archive of the mailing list to see if there were any attempts in the past to incorporate Conditional Inference Trees into the Ensemble module. I found a mail from Theo Strinopoulos (07-07-2013) asking if this would be welcomed as a contribution of his. Gilles Lo

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
That makes sense. The basic implementation is definitely short, just ~20 lines of code if you don't count comments etc. I can make the source code available so that you can judge whether it's worth taking further. I am familiar with the documentation libraries you are using (Sphinx with Numpy style

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Gael Varoquaux
> Is it required that an algorithm, which is implemented in Scikit-Learn, scales well wrt n_samples? The requirement is 'be actually useful', which is something that is a bit hard to judge :). I think that K-medoids is borderline on this requirement, probably on the right side of the border. I

Re: [Scikit-learn-general] KMedoids algorithm in Scikit-Learn

2015-07-31 Thread Timo Erkkilä
I was using a dynamic time warping (DTW) distance with KMedoids, which made more sense than using Euclidean distance since the profiles indeed had warps along the time axis. The DTW implementation was taken from MLPY since it's not in Scikit-Learn either. Is it required that an algorithm, which is imp
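For context, the DTW distance mentioned here can be written directly in a few lines, and the resulting pairwise matrix fed to any clusterer that accepts precomputed distances. This is a textbook sketch using absolute difference as the local cost, not the MLPY implementation:

```python
import numpy as np

def dtw(a, b):
    """Textbook O(len(a)*len(b)) dynamic time warping distance
    between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # advance in a only
                                 cost[i, j - 1],      # advance in b only
                                 cost[i - 1, j - 1])  # advance in both
    return cost[n, m]

# Pairwise DTW matrix for a set of profiles; such a precomputed
# matrix is what a medoid-based clusterer could consume directly.
profiles = [[0, 0, 1, 1], [0, 1, 1], [5, 5, 6]]
D = np.array([[dtw(p, q) for q in profiles] for p in profiles])
```

Note how the first two profiles get distance zero: DTW absorbs the repeated sample as a warp, which is exactly why it suits time-shifted profiles better than a pointwise Euclidean distance.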