Re: [Scikit-learn-general] K means on a sphere

2013-01-23 Thread Mathieu Blondel
On Thu, Jan 24, 2013 at 9:24 AM, Gael Varoquaux wrote: > Yes, there is a massive difference in amount of work and performance when > you try to replace the Euclidean distance. Amongst other problems, the > mean is no longer the sum divided by the number of points, but the > Frechet mean, which re

Re: [Scikit-learn-general] K means on a sphere

2013-01-23 Thread Alexandre Gramfort
hi Ariel, what I would do, if the data are not too big, is reimplement my kmeans in 10 lines and after you update the centers, normalize them to put them back on the sphere. I don't think you can say much about convergence but it might work in practice. HTH Alex On Thu, Jan 24, 2013 at 1:24 AM,
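A minimal sketch of the approach Alex describes: ordinary k-means iterations, with each updated center normalized back onto the unit sphere. This is not the scikit-learn implementation. The function names, the explicit `init_centers` argument, and the dot-product assignment step are assumptions made for illustration, and, as noted in the message, nothing here guarantees convergence.

```python
import math


def normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(points, init_centers, n_iter=20):
    """k-means iterations with centers projected back onto the unit sphere."""
    points = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: for unit vectors, the largest dot product
        # corresponds to the smallest angle.
        labels = [
            max(
                range(len(centers)),
                key=lambda k: sum(a * b for a, b in zip(p, centers[k])),
            )
            for p in points
        ]
        # Update: sum the members of each cluster, then renormalize.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                total = [sum(col) for col in zip(*members)]
                centers[k] = normalize(total)
    return centers, labels


# Toy data: two points near the x axis and two near the y axis.
points = [(1, 0, 0.1), (1, 0.1, 0), (0, 1, 0.1), (0.1, 1, 0)]
centers, labels = spherical_kmeans(points, init_centers=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
```

Normalizing the sum of the members gives the same direction as normalizing their mean, so the division by the cluster size is skipped.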

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread JAGANADH G
Hi Peter, Thanks for sharing the experience and code. I will try the same. @Jaques: Thanks for the link. My plan is to use sklearn only. If I have to use Mahout, the entire project has to be converted to Java. I would like to accomplish it in Python only! Best regards jaganadh On Wed,

Re: [Scikit-learn-general] K means on a sphere

2013-01-23 Thread Gael Varoquaux
On Thu, Jan 24, 2013 at 12:34:31AM +0100, Andreas Mueller wrote: > Sorry, custom metrics for K means are not possible at the moment. Yes, there is a massive difference in amount of work and performance when you try to replace the Euclidean distance. Amongst other problems, the mean is no longer th

Re: [Scikit-learn-general] K means on a sphere

2013-01-23 Thread Andreas Mueller
Hi Ariel. Sorry, custom metrics for K means are not possible at the moment. If you wanted to tweak the sklearn implementation, you would have to look into this file: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/k_means_.py#L413 In particular the function _labels_inert

[Scikit-learn-general] K means on a sphere

2013-01-23 Thread Ariel Rokem
Hi everyone, I am interested in using the sklearn implementation of k means to estimate clusters of unit vectors on the surface of a sphere. This requires that the distance metric be changed from the current Euclidean distance metric to angles. Is there any easy way to achieve that with the curr

Re: [Scikit-learn-general] (no subject)

2013-01-23 Thread Andreas Mueller
On 23.01.2013 20:32, Ronnie Ghose wrote: > How can _best_score in GridSearchCV be negative? R^2 can only be from 0 to -1 ...? R^2 can also be negative afaik. It is somewhat unstable for small sample sizes.
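A small worked example of why R^2 can be negative, using the standard definition R^2 = 1 - SS_res / SS_tot. The helper `r_squared` and the toy numbers are illustrative assumptions, not code from the thread: a model that predicts worse than the constant mean has SS_res > SS_tot, which pushes the score below zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]

r2_mean = r_squared(y_true, [2.0, 2.0, 2.0])      # always predict the mean
r2_reversed = r_squared(y_true, [3.0, 2.0, 1.0])  # predict the trend backwards

print(r2_mean)      # 0.0
print(r2_reversed)  # -3.0
```

The upper bound is 1 (a perfect fit); there is no lower bound.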

[Scikit-learn-general] (no subject)

2013-01-23 Thread Ronnie Ghose
How can _best_score in GridSearchCV be negative? R^2 can only be from 0 to -1 ...?

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Andreas Mueller
On 23.01.2013 18:39, Lars Buitinck wrote: > > if you want more predictions or something... > More in detail: OneVsRestClassifier exports an object called > label_binarizer_, which is used to transform decision function values > D back to class labels. By default, it picks all the classes for whic

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Lars Buitinck
2013/1/23 Andreas Mueller: > On 23.01.2013 16:47, Philipp Singer wrote: >> That's what I originally thought, but then I tried it with just using >> LinearSVC and it magically worked for my sample dataset, really >> interesting. I think it is working now properly. > I'm pretty sure it shouldn't.

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Andreas Mueller
On 23.01.2013 16:47, Philipp Singer wrote: > Hey, > > That's what I originally thought, but then I tried it with just using > LinearSVC and it magically worked for my sample dataset, really > interesting. I think it is working now properly. I'm pretty sure it shouldn't. > What I am asking myself

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Ronnie Ghose
* bug for On Jan 23, 2013 10:48 AM, "Ronnie Ghose" wrote: > File a bugbor inadequate validation also? > On Jan 23, 2013 10:34 AM, "Andreas Mueller" > wrote: > >> Hi Philipp. >> LinearSVC can not cope with multilabel problems. >> It seems it is not doing enough input validation. >> You have to us

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Ronnie Ghose
File a bugbor inadequate validation also? On Jan 23, 2013 10:34 AM, "Andreas Mueller" wrote: > Hi Philipp. > LinearSVC can not cope with multilabel problems. > It seems it is not doing enough input validation. > You have to use OneVsRestClassifier together with LinearSVC > to do that afaik. > Che

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Philipp Singer
Hey, That's what I originally thought, but then I tried it with just using LinearSVC and it magically worked for my sample dataset, really interesting. I think it is working now properly. What I am asking myself is how exactly the decision is made for the multilabel prediction. Is there some w

Re: [Scikit-learn-general] Multilabel questions

2013-01-23 Thread Andreas Mueller
Hi Philipp. LinearSVC cannot cope with multilabel problems. It seems it is not doing enough input validation. You have to use OneVsRestClassifier together with LinearSVC to do that afaik. Cheers, Andy On 23.01.2013 16:27, Philipp Singer wrote: > Hey guys! > > I am currently trying to do multila

[Scikit-learn-general] Multilabel questions

2013-01-23 Thread Philipp Singer
Hey guys! I am currently trying to do multilabel prediction using textual features (e.g., tfidf). The number of labels varies per sample: one sample can have just one label and another can have 10. I simply built a list of tuples for my y vector. So for example: (19, 8,

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread Peter Prettenhofer
Hi Jaganadh, I once implemented grid search / multi-task learning with Hadoop streaming. The setup was fairly simple: I put the serialized dataset (joblib dump) on HDFS and created an input file - one line for each parameter setting for grid search. The map script deserialized the dat
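A rough sketch of what such a map script could look like. It is not Peter's code. Hadoop streaming feeds input lines to the mapper on standard input and collects tab-separated key/value lines from standard output; everything else here is an assumption for illustration: the JSON encoding of each parameter setting, the `run_mapper` helper, and the placeholder `evaluate` function, which stands in for loading the dataset and scoring a model.

```python
import io
import json


def evaluate(params):
    """Hypothetical stand-in for fitting and scoring a model with `params`.

    A real map script would load the serialized dataset here and
    return, for example, a cross-validation score.
    """
    return 1.0 / (1.0 + abs(params["C"] - 1.0))  # toy score, best at C == 1.0


def run_mapper(lines, out):
    """Read one parameter setting per line; emit 'params<TAB>score' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        params = json.loads(line)
        score = evaluate(params)
        out.write("%s\t%r\n" % (json.dumps(params, sort_keys=True), score))


# Demo with in-memory streams standing in for stdin/stdout.
demo_input = io.StringIO('{"C": 0.1}\n{"C": 1.0}\n{"C": 10.0}\n')
demo_output = io.StringIO()
run_mapper(demo_input, demo_output)
print(demo_output.getvalue(), end="")
```

In an actual streaming job the script would call `run_mapper(sys.stdin, sys.stdout)` instead of using the in-memory demo streams, and a reducer (or a final sort) would pick the best-scoring setting.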

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread Jaques Grobler
2013/1/23 JAGANADH G > Hadoop/Dumbo or hadoop This thread may be of some interest : http://news.ycombinator.com/item?id=4968609 Regards J

[Scikit-learn-general] Using sklearn in Hadoop

2013-01-23 Thread JAGANADH G
Hi All, Has anybody tried using sklearn with Hadoop/Dumbo or Hadoop streaming? Please share your thoughts and experience. Best regards -- JAGANADH G http://jaganadhg.in ILUGCBE http://ilugcbe.org.in

Re: [Scikit-learn-general] 2D Gaussian Process Regression?

2013-01-23 Thread Andreas Mueller
Actually it looks like John opened a pull request for the feature today: https://github.com/scikit-learn/scikit-learn/pull/1611

Re: [Scikit-learn-general] 2D Gaussian Process Regression?

2013-01-23 Thread Andreas Mueller
I am confused now, too. Fernando, you want 2 dimensional targets, right? So y is (n_samples, 2)? This is not possible with the current code afaik. It should be possible to extend the code but that hasn't been done yet. hth, Andy