Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Vijay Desai
Skipper, thanks for your input.

On Fri, Mar 7, 2014 at 2:18 PM, Skipper Seabold wrote:
> On Fri, Mar 7, 2014 at 2:01 PM, Vijay Desai wrote:
>>
>> It is actually commodities futures data.
>>
>> Another way to handle missing data could be to estimate covariance
>> matrix by ignoring the missing va

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Skipper Seabold
On Fri, Mar 7, 2014 at 2:01 PM, Vijay Desai wrote:
> It is actually commodities futures data.
>
> Another way to handle missing data could be to estimate covariance
> matrix by ignoring the missing values and then determine eigenvectors
> of the covariance matrix to obtain principal components.
>

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Vijay Desai
It is actually commodities futures data.

Another way to handle missing data could be to estimate covariance matrix by ignoring the missing values and then determine eigenvectors of the covariance matrix to obtain principal components.

On Fri, Mar 7, 2014 at 11:36 AM, Tommy Carstensen wrote:
> Is
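Vijay's alternative, estimating the covariance matrix from the available values and then taking its eigenvectors, could be sketched in NumPy roughly as follows. This is a minimal sketch, assuming missing entries are encoded as `NaN` and that every pair of columns has at least two rows observed in common; the helpers `pairwise_covariance` and `pca_from_covariance` are hypothetical names, not scikit-learn functions.

```python
import numpy as np


def pairwise_covariance(X):
    """Hypothetical helper: covariance where entry (i, j) uses only the rows
    in which both column i and column j are observed (pairwise deletion)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    observed = ~np.isnan(X)
    cov = np.empty((n_features, n_features))
    for i in range(n_features):
        for j in range(i, n_features):
            rows = observed[:, i] & observed[:, j]
            xi, xj = X[rows, i], X[rows, j]
            # Sample covariance (ddof=1) over the jointly observed rows;
            # assumes at least two such rows exist.
            c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (rows.sum() - 1)
            cov[i, j] = cov[j, i] = c
    return cov


def pca_from_covariance(cov, n_components):
    """Hypothetical helper: top eigenpairs of a symmetric covariance matrix."""
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 2 * X[:, 0]                   # make two columns correlated
X[rng.random(X.shape) < 0.1] = np.nan    # knock out roughly 10% of entries

cov = pairwise_covariance(X)
variances, components = pca_from_covariance(cov, n_components=2)
print(variances)
```

One caveat of pairwise deletion: because each entry is estimated from a different subset of rows, the resulting matrix is not guaranteed to be positive semidefinite, so some eigenvalues can come out negative. Projecting rows that themselves contain missing values onto the components still requires a separate choice.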

Re: [Scikit-learn-general] proposal

2014-03-07 Thread vamsi kaushik
Hi Gael, thanks for that. First, I have some issues to clarify before diving into the proposal itself. I have a potential architecture in mind for the sparse implementation. First, I feel a separate baseSplitter (SparseSplitter) must be implemented, which is inherited by the Sparsebestsplitter etc
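The class layout described here, a shared sparse base splitter that concrete splitters inherit from, might look roughly like the following in plain Python. This is a hypothetical sketch only: scikit-learn's actual tree splitters are written in Cython, and the class names, the CSC storage choice, and the squared-error criterion below are illustrative assumptions.

```python
from abc import ABC, abstractmethod

import numpy as np
import scipy.sparse as sp


class SparseSplitter(ABC):
    """Hypothetical shared base: stores X in CSC format so subclasses can
    read one feature column at a time."""

    def __init__(self, X, y):
        self.X = sp.csc_matrix(X)
        self.y = np.asarray(y, dtype=float)

    def column(self, feature):
        # Read the stored non-zeros of one column from the CSC arrays and
        # fill the implicit zeros. (A real sparse splitter would avoid
        # materializing the zeros at all.)
        start, end = self.X.indptr[feature], self.X.indptr[feature + 1]
        values = np.zeros(self.X.shape[0])
        values[self.X.indices[start:end]] = self.X.data[start:end]
        return values

    @abstractmethod
    def find_split(self):
        """Return (feature, threshold, impurity) for the chosen split."""


class SparseBestSplitter(SparseSplitter):
    """Exhaustive search for the (feature, threshold) pair minimizing the
    summed squared error of the two children (a regression criterion)."""

    def find_split(self):
        best = (None, None, np.inf)
        for f in range(self.X.shape[1]):
            x = self.column(f)
            order = np.argsort(x, kind="stable")
            xs, ys = x[order], self.y[order]
            for i in range(1, len(xs)):
                if xs[i] == xs[i - 1]:
                    continue  # no threshold separates equal values
                left, right = ys[:i], ys[i:]
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if sse < best[2]:
                    best = (f, (xs[i - 1] + xs[i]) / 2, sse)
        return best
```

Keeping the CSC access in the base class means each concrete splitter (best, random, and so on) only has to implement its own search strategy.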

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Tommy Carstensen
Is this genotype data, by any chance, Vijay?

Tommy

-----Original Message-----
From: Vijay Desai [mailto:vijay.de...@gmail.com]
Sent: 07 March 2014 15:08
To: scikit-learn-general@lists.sourceforge.net
Subject: [Scikit-learn-general] PCA with missing data

Hi all, I was interested in doing PCA on a

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Gael Varoquaux
On Fri, Mar 07, 2014 at 10:24:33AM -0500, Lee Zamparo wrote:
> You would need to impute your missing values first to use the
> implementation of PCA in scikit-learn. Alternatively, you could roll
> your own (or find a package somewhere) for a Probabilistic PCA that
> *can* handle missing values in
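Short of a full Probabilistic PCA, one simple approach that works directly with missing entries is to alternate between fitting a low-rank (PCA) reconstruction and refilling the missing cells from it, an EM-like scheme often called iterative SVD imputation. To be clear, this is a swapped-in technique, not probabilistic PCA itself and not a scikit-learn API; the function `iterative_pca_impute` and its parameters are hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, n_components, n_iter=200, tol=1e-8):
    """Hypothetical helper: fill NaN entries of X with a rank-`n_components`
    PCA reconstruction, refitted repeatedly (EM-like iterative SVD)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each missing entry with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    k = n_components
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Rank-k reconstruction of the centered data, with the mean added back.
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Observed entries are kept; only missing cells take reconstructed values.
        updated = np.where(missing, low_rank, X)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled


rng = np.random.default_rng(0)
# Synthetic data that is exactly rank 2 plus a constant offset.
truth = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6)) + 3.0
mask = rng.random(truth.shape) < 0.1
X = truth.copy()
X[mask] = np.nan

imputed = iterative_pca_impute(X, n_components=2)
mean_filled = np.where(mask, np.nanmean(X, axis=0), X)
# Mean absolute error on the hidden cells: iterative vs. plain mean imputation.
print(np.abs(imputed - truth)[mask].mean(), np.abs(mean_filled - truth)[mask].mean())
```

The filled matrix can then be passed to an ordinary `PCA`. Unlike probabilistic PCA, this gives no likelihood or noise model, and the number of components has to be fixed up front.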

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Lee Zamparo
Hi Vijay,

You would need to impute your missing values first to use the implementation of PCA in scikit-learn. Alternatively, you could roll your own (or find a package somewhere) for a Probabilistic PCA that *can* handle missing values in the data.

Hope this helps,
Lee.

On Fri, Mar 7, 2014 at
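Lee's first suggestion, imputing before running the standard PCA, can be written as a pipeline. This sketch assumes a recent scikit-learn, where `SimpleImputer` lives in `sklearn.impute` (that module postdates this 2014 thread), and uses mean imputation purely as an example strategy.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[rng.random(X.shape) < 0.1] = np.nan  # knock out roughly 10% of entries

# Replace each missing value with its column mean, then run ordinary PCA.
pipe = make_pipeline(SimpleImputer(strategy="mean"), PCA(n_components=2))
scores = pipe.fit_transform(X)
print(scores.shape)  # (100, 2)
```

Mean imputation shrinks the variance of columns with many missing values, which in turn biases the principal components, so it is best treated as a baseline.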

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Gael Varoquaux
On Fri, Mar 07, 2014 at 10:07:34AM -0500, Vijay Desai wrote:
> I was interested in doing PCA on a matrix with missing data. Can it be
> done using scikit-learn?

No, it cannot.

G

[Scikit-learn-general] PCA with missing data

2014-03-07 Thread Vijay Desai
Hi all, I was interested in doing PCA on a matrix with missing data. Can it be done using scikit-learn? Pointers to examples/documentation would be helpful. Thanks. Regards, Vijay

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Manoj Kumar
Okay. First, I have currently put the coordinate-descent-based solver at the end of my timeline. Do you want me to move it to just after LogisticRegression (and LogisticRegressionCV) is merged? Then we would be able to see whether multinomialLR can be done away with or not. Se

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Lars Buitinck
2014-03-07 11:26 GMT+01:00 Manoj Kumar :
> I'm sorry but I'm not quite able to get you. Do you mean logistic Regression
> itself would handle the multi-output case, instead of the one vs All it does
> now?

No, for multi-output you'd still want OvR, of course. What I'm saying (asking) is that (whet
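Reading "multi-output" as multilabel classification (an assumption about what was meant), one-vs-rest handles it by fitting one independent binary classifier per label column. scikit-learn's `OneVsRestClassifier` accepts a 2-D 0/1 indicator matrix as `y`; the synthetic labels below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Two binary labels per sample, stored as a multilabel indicator matrix.
Y = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0]).astype(int)

# One binary LogisticRegression is fitted per label column.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(clf.predict(X[:3]))  # a 3 x 2 array of 0/1 predictions
```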

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Mathieu Blondel
I think it will depend on the multiclass LR objective used. Depending on the objective, we need to learn n_classes vectors or n_classes - 1 vectors. In the former case, a multiclass LR will do twice as much work as a binary LR. One advantage of OvA is that it is embarrassingly parallel w.r.t. classes
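The choice between `n_classes` and `n_classes - 1` weight vectors comes from the softmax being unchanged when the same vector is subtracted from every class's weights: the probabilities depend only on score differences, so one class's vector can be pinned to zero. The NumPy check below illustrates this for an unregularized softmax; the helper `softmax_probs` is hypothetical, and with a penalty term the two parametrizations generally stop being equivalent.

```python
import numpy as np


def softmax_probs(W, x):
    """Hypothetical helper: class probabilities for one sample x,
    with one weight vector per row of W."""
    scores = W @ x
    scores -= scores.max()  # numerical stability; leaves the result unchanged
    e = np.exp(scores)
    return e / e.sum()


rng = np.random.default_rng(0)
n_classes, n_features = 3, 4
W = rng.normal(size=(n_classes, n_features))
x = rng.normal(size=n_features)

# Subtract the last class's weight vector from every row: the last row
# becomes zero, leaving only n_classes - 1 free vectors.
W_reduced = W - W[-1]
p_full = softmax_probs(W, x)
p_reduced = softmax_probs(W_reduced, x)
print(np.allclose(p_full, p_reduced))  # True
```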

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Manoj Kumar
Hi Lars, I'm sorry but I'm not quite able to get you. Do you mean logistic Regression itself would handle the multi-output case, instead of the one vs All it does now?

On Fri, Mar 7, 2014 at 3:35 PM, Vlad Niculae wrote:
> In some cases it might be preferable to fit an OvA model. In those
> cas

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Vlad Niculae
In some cases it might be preferable to fit an OvA model. In those cases, I think the user code would look nicer and more explicit if it used the sklearn.multiclass.OneVsRest encoder. The downside is that we'll need to go through an ugly deprecation cycle for a major class in the library. With
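The explicit style described here would wrap a binary estimator in scikit-learn's one-vs-rest meta-estimator, which is named `OneVsRestClassifier` in `sklearn.multiclass`. Its `n_jobs` parameter fits the per-class problems in parallel, which is the "embarrassingly parallel" advantage of OvA mentioned in this thread. The iris data and `max_iter=1000` are arbitrary choices for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Explicit one-vs-rest: one independent binary LogisticRegression per class,
# with the per-class fits spread over 2 jobs.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000), n_jobs=2).fit(X, y)
print(len(ovr.estimators_))  # 3

# A single LogisticRegression on the same data; in recent versions the default
# lbfgs solver fits a multinomial model with one weight vector per class.
multi = LogisticRegression(max_iter=1000).fit(X, y)
print(multi.coef_.shape)  # (3, 4)
```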

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Lars Buitinck
2014-03-06 18:41 GMT+01:00 Manoj Kumar :
> I have prepared a wiki page for the first draft of my GSoC proposal after
> several discussions. Please do have a look and provide me feedback.
> https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2014-Application:-Improved-Linear-Models

Technical que