Re: [Scikit-learn-general] Gaussian HMM ndarrays shape

2014-02-04 Thread Andy
Hi Alexey. Sorry about the state of the HMMs. We will probably remove them from sklearn pretty soon, giving them their own repo. Still, adding to the documentation should help make the code more useful. You can simply create a pull request: http://scikit-learn.org/dev/developers/index.html So

[Scikit-learn-general] Gaussian HMM ndarrays shape

2014-02-04 Thread Alexey Morozov
I've run into some difficulties with hmm.GaussianHMM recently, and I assume it's because of poorly documented element shapes. Say, I want an HMM with 2 hidden states that emit 1-dimensional variables. Should means be np.array([300,-300]) or np.array([[300],[-300]])? Should the observation sequence be [
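A minimal sketch of the shape convention in question. It assumes the usual scikit-learn layout: observations are (n_samples, n_features) and per-state means are (n_components, n_features). That is the layout hmmlearn's GaussianHMM documents for means_, but it is not checked here against the sklearn.hmm code of this thread. The NumPy-only helper diag_gaussian_loglik and the diagonal covars layout are hypothetical. The sketch shows why keeping an explicit feature axis, even for 1-dimensional emissions, lets the per-state likelihoods broadcast cleanly.

```python
import numpy as np

# Assumed layout (hmmlearn documents means_ this way; not verified for sklearn.hmm):
#   means -> (n_components, n_features), X -> (n_samples, n_features).
# covars below is this sketch's own diagonal layout, (n_components, n_features).
means = np.array([[300.0], [-300.0]])   # 2 states, 1 feature: shape (2, 1), not (2,)
covars = np.array([[100.0], [100.0]])   # per-state, per-feature variances
X = np.array([290.0, -310.0, 305.0]).reshape(-1, 1)  # 1-D sequence as a (3, 1) column


def diag_gaussian_loglik(X, means, covars):
    """Hypothetical helper: log-density of every sample under every state,
    returned with shape (n_samples, n_components)."""
    # (n_samples, 1, n_features) - (1, n_components, n_features) broadcasts
    # only because every array keeps an explicit n_features axis.
    diff = X[:, np.newaxis, :] - means[np.newaxis, :, :]
    log_norm = np.log(2.0 * np.pi * covars)[np.newaxis, :, :]
    return -0.5 * np.sum(log_norm + diff ** 2 / covars[np.newaxis, :, :], axis=2)


loglik = diag_gaussian_loglik(X, means, covars)
print(loglik.shape)           # (3, 2)
print(loglik.argmax(axis=1))  # [0 1 0]: most likely state for each observation
```

With a flat (2,) means array, the subtraction above would either fail or broadcast along the wrong axis. That is one reason the 2-D form is the safer guess.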

[Scikit-learn-general] GSoC 2014

2014-02-04 Thread Siddharth Agrawal
Hi, I'm Siddharth Agrawal, a final year Computer Science student. I have been an avid follower of the Machine Learning community and have used scikit-learn in the past. I'm very interested in the recently developed field of Unsupervised Feature Learning. I have coded a few basic templates of Spars

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Manoj Kumar
I'm sorry, it should just be $\omega_j \leftarrow \frac{\omega_j \sum_{i=1}^n (X_i^j)^2 - \alpha + \sum_{i=1}^n (y_i - X_i\omega) X_i^j}{\sum_{i=1}^n (X_i^j)^2 + \beta}$ in the first equation, where $X_i$ denotes the $i$-th row of $X$. On Wed, Feb 5, 2014 at 10:27 AM, Manoj Kumar wrote: > Hi, > > I went through the enet_coordinate_de
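The update above is one soft-thresholding step of cyclic coordinate descent. Here is a minimal NumPy sketch, assuming the unscaled objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1 + 0.5 * beta * ||w||^2. By contrast, scikit-learn's public ElasticNet documents 1 / (2 * n_samples) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2. The formula in the message is the branch where the partial correlation exceeds alpha. The soft_threshold call also covers the other two cases: the weight is set to zero, or alpha is added when the correlation is negative. The function names are hypothetical. This is an illustration, not a port of cd_fast.pyx.

```python
import numpy as np


def soft_threshold(z, alpha):
    """sign(z) * max(|z| - alpha, 0)."""
    return np.sign(z) * max(abs(z) - alpha, 0.0)


def enet_coordinate_descent_sketch(X, y, alpha, beta, n_iter=200):
    """Hypothetical cyclic coordinate descent for
    0.5 * ||y - X w||^2 + alpha * ||w||_1 + 0.5 * beta * ||w||^2."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    norm_cols_X = (X ** 2).sum(axis=0)   # sum_i (X_i^j)^2 for every feature j
    R = y - X @ w                        # current residual
    for _ in range(n_iter):
        for j in range(n_features):
            if norm_cols_X[j] == 0.0:
                continue
            R += w[j] * X[:, j]          # residual with feature j's contribution removed
            # tmp = w_j * sum_i (X_i^j)^2 + sum_i (y_i - X_i w) X_i^j
            tmp = X[:, j] @ R
            w[j] = soft_threshold(tmp, alpha) / (norm_cols_X[j] + beta)
            R -= w[j] * X[:, j]          # add the updated contribution back
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 3.0])
print(np.round(enet_coordinate_descent_sketch(X, y, alpha=0.0, beta=0.0), 6))  # approximately [1, -2, 3]
```

With alpha = beta = 0 the update reduces to plain least squares, so the recovered weights should match the generating ones. With alpha at least max |X^T y|, every weight stays at zero.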

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Manoj Kumar
Hi, I went through the enet_coordinate_descent function in cd_fast.pyx. I have some questions which are noobish but I'll go ahead and ask them anyway. It seems in L176 in each cycle, each omega_j is updated as [image: \frac{\omega_{j}\sum_{i = 1}^n(X_{i}^j)^2 - \alpha + \sum_{i = 1}^n (y_{i} - X

Re: [Scikit-learn-general] Weighted logistic regression

2014-02-04 Thread Joel Nothman
Two options: 1. Use SGDClassifier(loss='log').fit(X, y, sample_weight=sample_weight). 2. Use the branch of PR 2784, which ports sample_weight to scikit-learn's liblinear. - Joel On 5 February 2014 11:34, Cory Dolphin wrote: > Hello, > > First, thanks fo

[Scikit-learn-general] Weighted logistic regression

2014-02-04 Thread Cory Dolphin
Hello, First, thanks for this wonderful library; I am an undergraduate engineering student and this tool has opened my mind to ML! I have a problem with repeated samples, on which I wish to perform logistic regression. I expected to be able to pass a vector of weights to associate with the repeat

Re: [Scikit-learn-general] Strange Error Message

2014-02-04 Thread abhishek
I recently had the same problem. Please check the dtype: if it is NumPy's object dtype, convert it to float before using RandomForest. On Wed, Feb 5, 2014 at 12:36 AM, Kyle Kastner wrote: > Sorry - just re-read your earlier mail. If you have already used the > standard scaler then something else is going on...
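A minimal pandas sketch of that check, using made-up data. Columns read in as strings have dtype=object. pd.to_numeric(errors="coerce") converts them to float64 and turns unparseable entries into NaN. Those NaNs then need handling, by imputing or dropping rows, unless your scikit-learn version's forests accept missing values.

```python
import numpy as np
import pandas as pd

# Made-up frame: numbers stored as strings / mixed objects end up with dtype=object.
df = pd.DataFrame({"a": ["1.5", "2.0", "3.25"], "b": [1, 2, "n/a"]}, dtype=object)
print(df.dtypes)  # both columns: object

# Convert column by column; anything that cannot be parsed becomes NaN.
X = df.apply(pd.to_numeric, errors="coerce").to_numpy(dtype=np.float64)
print(X.dtype)            # float64
print(np.isnan(X).sum())  # 1 (the "n/a" entry)
```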

Re: [Scikit-learn-general] Strange Error Message

2014-02-04 Thread Kyle Kastner
Sorry - just re-read your earlier mail. If you have already used the standard scaler then something else is going on... do other regression algorithms at least run? It would be good to figure out if it is particular to the algorithm + data combination, or if many algorithms have the same problem.
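One quick way to run that comparison, sketched here with synthetic data from make_regression (swap in your own X and y). The loop fits several regressors and records which ones raise. If only one estimator fails, the problem is probably specific to that algorithm and data combination. If all of them fail, the data themselves are the more likely culprit. The choice of estimators is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the real data.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

candidates = {
    "Ridge": Ridge(),
    "RandomForest": RandomForestRegressor(n_estimators=10, random_state=0),
    "KNN": KNeighborsRegressor(),
    "SVR": SVR(),
}

status = {}
for name, est in candidates.items():
    try:
        est.fit(X, y)
        status[name] = "ok"
    except Exception as exc:  # record the failure instead of stopping the loop
        status[name] = f"{type(exc).__name__}: {exc}"

for name, result in status.items():
    print(f"{name:>12}: {result}")
```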

Re: [Scikit-learn-general] Strange Error Message

2014-02-04 Thread Kyle Kastner
You may need to scale your features. Look into StandardScaler(): it will subtract the mean and divide by the standard deviation for you. Some algorithms have a harder time than others with features that are not on comparable scales. On Tue, Feb 4, 2014 at 4:37 PM, Lorenzo Isella wrote: > Dear All, > I am back again on
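A quick check of what StandardScaler does, using made-up numbers. It subtracts each column's mean and divides by its standard deviation (the population version, ddof=0). The result therefore has zero mean and unit variance rather than lying in [0, 1]. If a [0, 1] range is really what an algorithm needs, MinMaxScaler is the transformer that does that.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up features on very different scales.
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

scaler = StandardScaler().fit(X)   # in practice: fit on training data only
X_scaled = scaler.transform(X)

manual = (X - X.mean(axis=0)) / X.std(axis=0)  # population std (ddof=0)
print(np.allclose(X_scaled, manual))            # True
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# Rescaling to [0, 1] per column is a different transformer.
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```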

Re: [Scikit-learn-general] Strange Error Message

2014-02-04 Thread Lorenzo Isella
Dear All, I am back again on this topic. When I try to run this small snippet
#
#!/usr/bin/env python
import scipy as s
import numpy as n
import string
import pandas as pd
import pickle
from sklearn.externals import joblib
from sk

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Andy
On 02/04/2014 03:23 PM, Gael Varoquaux wrote: > On Tue, Feb 04, 2014 at 07:51:17PM +0530, Manoj Kumar wrote: >> I tried recompiling from my top level directory using "sudo python setup.py >> build_ext --inplace" , but it doesn't seem to print when I run my script. >> What am I doing wrong? > Usuall

Re: [Scikit-learn-general] LabelEncoder with never seen before values

2014-02-04 Thread Andy
On 02/03/2014 11:01 AM, Lars Buitinck wrote: > 2014-02-02 Andy : >> Now, with respect to sinning: there is really no additional information >> in the labels that could be used during learning. > Actually there is: the presence of classes outside the training set > affects probability distributions.
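A small sketch of the behaviour under discussion, using made-up labels. A LabelEncoder fitted on the training labels raises ValueError for a label it has never seen. One fitted on the full label set, assumed here to be known up front, encodes every label consistently. This only fixes the integer mapping; it does not by itself change what a classifier learns about a class that never occurs in the training data.

```python
from sklearn.preprocessing import LabelEncoder

train_labels = ["cat", "dog", "cat"]
all_labels = ["bird", "cat", "dog"]  # assumption: the full label set is known up front

le = LabelEncoder().fit(train_labels)
unseen_raised = False
try:
    le.transform(["bird"])
except ValueError as exc:  # "bird" was never seen during fit
    unseen_raised = True
    print("unseen label:", exc)

le_full = LabelEncoder().fit(all_labels)
print(list(le_full.classes_))            # ['bird', 'cat', 'dog']
print(le_full.transform(train_labels))   # [1 2 1]
```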

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-02-04 Thread Felipe Eltermann
>> How much code in our current implementation depends on the data representation? > Not much actually. It now basically boils down to simply writing a new splitter object. Everything else remains the same. So basically, I would say that it amounts to ~300 lines of Cython (out of the 2300 lines in o
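To illustrate why the change can stay confined to the splitter, here is a minimal sketch of a split search that reads a CSC column directly. It assumes a regression tree with a squared-error criterion. best_split_csc_column is hypothetical and much simpler than scikit-learn's Cython Splitter. It sorts only the stored nonzeros and treats all zeros as one block, since no threshold can separate equal values. A brute-force dense version is included as a check.

```python
import numpy as np
import scipy.sparse as sp


def _gain(s_left, n_left, s_right, n_right):
    # Maximising this minimises the children's summed squared error.
    return s_left ** 2 / n_left + s_right ** 2 / n_right


def best_split_csc_column(X_csc, j, y):
    """Hypothetical: (threshold, gain) for feature j of a CSC matrix, or None."""
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    vals = X_csc.data[start:end]
    ys = y[X_csc.indices[start:end]]
    keep = vals != 0  # explicit zeros join the implicit-zero block
    vals, ys = vals[keep], ys[keep]
    order = np.argsort(vals, kind="stable")
    vals, ys = vals[order], ys[order]

    n, total = len(y), float(y.sum())
    n_zero = n - len(vals)
    n_neg = int(np.searchsorted(vals, 0.0))
    # (value, count, sum of y) blocks in ascending order of feature value.
    blocks = [(v, 1, t) for v, t in zip(vals[:n_neg], ys[:n_neg])]
    if n_zero:
        blocks.append((0.0, n_zero, total - float(ys.sum())))
    blocks += [(v, 1, t) for v, t in zip(vals[n_neg:], ys[n_neg:])]

    best, n_left, s_left = None, 0, 0.0
    for (v, c, s), (v_next, _, _) in zip(blocks[:-1], blocks[1:]):
        n_left += c
        s_left += s
        if v_next == v:
            continue  # equal feature values cannot be separated
        gain = _gain(s_left, n_left, total - s_left, n - n_left)
        if best is None or gain > best[1]:
            best = ((v + v_next) / 2.0, gain)
    return best


def best_split_dense_column(x, y):
    """Brute-force reference over a dense column, used to check the sparse version."""
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n, total = len(y), float(y.sum())
    best, s_left = None, 0.0
    for i in range(n - 1):
        s_left += ys[i]
        if xs[i + 1] == xs[i]:
            continue
        gain = _gain(s_left, i + 1, total - s_left, n - i - 1)
        if best is None or gain > best[1]:
            best = ((xs[i] + xs[i + 1]) / 2.0, gain)
    return best


rng = np.random.default_rng(0)
dense = rng.uniform(-1.0, 1.0, size=(30, 4))
dense[rng.random((30, 4)) < 0.7] = 0.0  # roughly 70% zeros
X = sp.csc_matrix(dense)
y = rng.normal(size=30)
for j in range(X.shape[1]):
    print(j, best_split_csc_column(X, j, y), best_split_dense_column(dense[:, j], y))
```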

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Gael Varoquaux
On Tue, Feb 04, 2014 at 07:51:17PM +0530, Manoj Kumar wrote: > I tried recompiling from my top level directory using "sudo python setup.py > build_ext --inplace" , but it doesn't seem to print when I run my script. > What am I doing wrong? You need to regenerate the C files from the Cython, by doi

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Manoj Kumar
A really naive question about Cython. Let's say I make some changes in cd_fast.pyx and I want to debug something by printing it out; how do I do it? I tried recompiling from my top level directory using "sudo python setup.py build_ext --inplace", but it doesn't seem to print when I run my script. Wh

Re: [Scikit-learn-general] KMeans distance metrics

2014-02-04 Thread Mathieu Blondel
On Tue, Feb 4, 2014 at 8:51 PM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > > Because the update rules change. For instance, the mean in the sense of > another metric ends up being a Fréchet mean, which can be much more > expensive to compute and, most importantly, requires specific cod

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Gael Varoquaux
On Tue, Feb 04, 2014 at 10:25:43PM +1100, Joel Nothman wrote: > I don't think the multiplication by -1 pertains: it happens before the type > check, so it will have already failed! > I think needing a float is validation required for grid search and > simply not required here; OK

Re: [Scikit-learn-general] KMeans distance metrics

2014-02-04 Thread Gael Varoquaux
> I was looking at the KMeans code and wondering why > euclidean_distances has been hardcoded into the source. Can I not use other > distance metrics (L1, Cosine) with KMeans? Because the update rules change. For instance, the mean in the sense of another metric ends up being a Fréchet mean, which
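A small worked example of this point, assuming the metric is L1 (cityblock). The point that minimizes the summed L1 distance to a cluster is the coordinate-wise median, not the mean. Swapping the distance without also swapping the centre update therefore no longer minimizes a coherent objective. kmedians_sketch is a hypothetical illustration of the matching update rule (k-medians); scikit-learn's KMeans does not offer it.

```python
import numpy as np


def total_cost(points, center, metric):
    """Sum of distances from every point to one candidate centre."""
    diff = points - center
    if metric == "sqeuclidean":
        return float(np.sum(diff ** 2))
    if metric == "cityblock":  # L1
        return float(np.sum(np.abs(diff)))
    raise ValueError(f"unknown metric {metric!r}")


points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.0, 3.0]])
mean = points.mean(axis=0)          # [2.75, 0.75]
median = np.median(points, axis=0)  # [0.5, 0.0]
print(total_cost(points, mean, "sqeuclidean"), total_cost(points, median, "sqeuclidean"))  # 77.5 100.0
print(total_cost(points, mean, "cityblock"), total_cost(points, median, "cityblock"))      # 19.0 14.0


def kmedians_sketch(X, init_centers, n_iter=20):
    """Hypothetical Lloyd-style loop: L1 assignment step, median update step."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # L1 distance from every sample to every centre: (n_samples, n_clusters).
        dist = np.abs(X[:, np.newaxis, :] - centers[np.newaxis, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous centre
                centers[k] = np.median(members, axis=0)
    return centers, labels


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]])
centers, labels = kmedians_sketch(X, X[[0, 3]])
print(labels)  # [0 0 0 1 1 1]
```

For metrics without a closed-form centre, the update becomes an optimisation problem of its own. That is the extra cost and the specific code the reply refers to.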

[Scikit-learn-general] KMeans distance metrics

2014-02-04 Thread Matti Lyra
I was looking at the KMeans code and wondering why euclidean_distances has been hardcoded into the source. Can I not use other distance metrics (L1, Cosine) with KMeans? Matti Lyra DPhil Student Text Analytics Group Chichester 1, R203 School of Engineering and

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Joel Nothman
I don't think the multiplication by -1 pertains: it happens before the type check, so it will have already failed! I think needing a float is validation required for grid search and simply not required here; even in grid search it would be nice if there were a distinction between the single object

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Gael Varoquaux
OK, I have been rereading the scoring codebase (which is becoming frighteningly complicated), and it seems that the reason we really need this check is that the scores can be multiplied by "-1". Maybe we should have a "try/except" where the multiplication happens, and raise a meaningful error onl
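A sketch of the try/except idea, built around a hypothetical wrapper class. SignedScorerSketch is not scikit-learn's scorer API. The score is returned untouched unless it has to be negated, and only a failed negation is turned into a readable error, so no up-front type check is needed. The sketch uses unary minus rather than multiplication by -1 on purpose: in Python, -1 * "abc" silently returns an empty string, whereas -"abc" raises TypeError.

```python
class SignedScorerSketch:
    """Hypothetical scorer wrapper: negate only when asked, and turn a failed
    negation into a readable error instead of type-checking up front."""

    def __init__(self, score_func, greater_is_better=True):
        self.score_func = score_func
        self.greater_is_better = greater_is_better

    def __call__(self, y_true, y_pred):
        score = self.score_func(y_true, y_pred)
        if self.greater_is_better:
            return score  # any object is accepted: nothing needs to "quack"
        try:
            return -score
        except TypeError as exc:
            raise ValueError(
                f"{self.score_func.__name__} returned a {type(score).__name__}, "
                "which cannot be negated; greater_is_better=False needs a number"
            ) from exc


def squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def error_report(y_true, y_pred):
    # Returns a dict, i.e. something that is not a single number.
    return {"mse": squared_error(y_true, y_pred)}


neg_mse = SignedScorerSketch(squared_error, greater_is_better=False)
print(neg_mse([1, 2], [1, 4]))  # -2.0

report = SignedScorerSketch(error_report)  # fine: nothing to negate
print(report([1, 2], [1, 4]))   # {'mse': 2.0}

failed_cleanly = False
try:
    SignedScorerSketch(error_report, greater_is_better=False)([1, 2], [1, 4])
except ValueError as exc:
    failed_cleanly = True
    print(exc)
```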

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Gael Varoquaux
On Tue, Feb 04, 2014 at 08:08:18PM +1100, Joel Nothman wrote: > > not to have him waste hours of computation before crashing > Yet this is precisely what the present code does in this issue... Fail :/

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Joel Nothman
> not to have him waste hours of computation before crashing Yet this is precisely what the present code does in this issue... On 4 February 2014 18:32, Gael Varoquaux wrote: > > In the meantime, I wonder why you would even need to do type checking > here? > > As far as I know, Python strong

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Nick Pentreath
Makes sense. Well, from what I understand, just getting the multi-class logistic and SVM losses and the group lasso penalty into scikit-learn seems like a worthwhile undertaking. On Tue, Feb 4, 2014 at 10:34 AM, Gael Varoquaux wrote: > On Tue, Feb 04, 2014 at 10:32
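Of the pieces listed, the group lasso penalty is the easiest to show in isolation. This minimal sketch assumes a proximal-gradient style solver and is not taken from lightning or scikit-learn. The proximal operator of lam * sum_g ||w_g||_2 shrinks each group's block of coefficients toward zero as a whole. It sets the block exactly to zero when the block's norm is at most lam. The function name and the group layout are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, lam):
    """Hypothetical prox of lam * sum_g ||w_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        block = w[idx]
        norm = np.linalg.norm(block)
        if norm > lam:
            # Shrink the whole block radially; small blocks stay at zero.
            out[idx] = (1.0 - lam / norm) * block
    return out


w = np.array([3.0, 4.0, 0.3, 0.4])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_soft_threshold(w, groups, lam=1.0))  # approximately [2.4, 3.2, 0, 0]
```

Inside a proximal-gradient loop, this would be applied to w - step * grad with lam scaled by the step size. That is how it would pair with a multi-class logistic loss.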

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Gael Varoquaux
On Tue, Feb 04, 2014 at 10:32:12AM +0200, Nick Pentreath wrote: > Are some of the algorithms too cutting edge or not cited enough, Yes > or some other reason? I think that it is good practice to explore new ideas outside of scikit-learn. It usually takes a lot of effort and time to figure out wh

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Nick Pentreath
That does seem like it would be a very worthwhile project - but why was lightning outside scikit-learn initially? Are some of the algorithms too cutting edge or not cited enough, or some other reason? On Tue, Feb 4, 2014 at 10:28 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On T

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Gael Varoquaux
On Tue, Feb 04, 2014 at 09:04:00AM +0100, Alexandre Gramfort wrote: > > Alex had provided me a link to this gist, > > https://gist.github.com/fabianp/3097107 . Sorry for sounding dumb, but is > > this one of the "strong rules"? > yes http://arxiv.org/pdf/1011.2234 > > And one last question, what
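As I understand the linked paper (arXiv 1011.2234), the basic strong rule applies to the lasso objective 0.5 * ||y - Xb||^2 + lam * ||b||_1. It discards feature j when |x_j^T y| < 2 * lam - lam_max, where lam_max = max_j |x_j^T y|. The rule is a heuristic rather than a safe test. A solver using it therefore has to check the KKT condition |x_j^T r| <= lam on the discarded features afterwards and add back any violators. Both helpers below are hypothetical sketches, not taken from the gist.

```python
import numpy as np


def basic_strong_rule(X, y, lam):
    """Hypothetical: indices of features kept by the basic strong rule."""
    corr = np.abs(X.T @ y)
    lam_max = corr.max()  # smallest lam at which b = 0 is optimal
    return np.flatnonzero(corr >= 2.0 * lam - lam_max)


def kkt_violations(X, y, beta, lam, discarded):
    """Hypothetical: discarded features whose KKT condition fails at beta."""
    r = y - X @ beta
    discarded = np.asarray(discarded, dtype=int)
    return discarded[np.abs(X[:, discarded].T @ r) > lam]


# Toy problem with orthonormal columns, so x_j^T y is simply y_j.
X = np.eye(3)
y = np.array([3.0, 1.0, 2.5])
print(basic_strong_rule(X, y, 2.8))  # [0]
print(basic_strong_rule(X, y, 2.7))  # [0 2]
```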

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-02-04 Thread Gael Varoquaux
> In the meantime, I wonder why you would even need to do type checking here? > As far as I know, Python strongly encourages duck typing. Why not remove the > type checking altogether and just let the exception "bubble up" when the duck > cannot quack (i.e., when the score function output canno

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Alexandre Gramfort
> Would you be able to tell me which part of the code-base I should play with > for this project? (I'm assuming it is the cython code in > coordinate_descent.pyx) yes. > Some references to literature would definitely help. you have a few on the wiki page. you can also start from wikipedia: htt