Re: [Scikit-learn-general] Retrieve the coefficients of fitted polynomial using LASSO

2014-06-30 Thread Fernando Paolo
Hi Michael, as you point out, I also think the most straightforward (and good enough) approach is to fit all the polynomials (n = 0,..,3) using OLS and evaluate their predictive capability by cross-validation. I will compare this to the lasso approach. Thanks for your comments! -fernando On Sun, Jun 2
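The approach described in this message (fit each polynomial degree by OLS, then compare held-out error) could be sketched roughly as follows. This is a minimal NumPy-only illustration, not code from the thread: the helper name cv_mse_per_degree, the fold count, and the synthetic quadratic data are all assumptions made for the example.

```python
import numpy as np


def cv_mse_per_degree(x, y, degrees=(0, 1, 2, 3), n_folds=5, seed=0):
    """Return {degree: mean held-out MSE} for OLS polynomial fits.

    Hypothetical helper written for this illustration.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the sample indices once and split them into folds.
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = {}
    for degree in degrees:
        fold_errors = []
        for test_idx in folds:
            train_mask = np.ones(len(x), dtype=bool)
            train_mask[test_idx] = False
            # np.polyfit solves the least-squares (OLS) problem for this degree.
            coefs = np.polyfit(x[train_mask], y[train_mask], degree)
            residuals = np.polyval(coefs, x[test_idx]) - y[test_idx]
            fold_errors.append(np.mean(residuals ** 2))
        scores[degree] = float(np.mean(fold_errors))
    return scores


# Synthetic data (an assumption for illustration): quadratic trend plus noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.1, size=x.size)

scores = cv_mse_per_degree(x, y)
best_degree = min(scores, key=scores.get)
```

The degree with the lowest held-out error is the candidate to keep; its coefficients can then be re-estimated on the full data with np.polyfit.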

Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Abijith Kp
On Tue, Jul 1, 2014 at 3:35 AM, Joel Nothman wrote: > It may be beneficial to use some kind of query expansion or unsupervised > dimensionality reduction, as the vectors from a bag of words encoding will > probably be very sparse. Does that help? > > How can query expansion help?? I don't think I

Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Robert Layton
A bit more concretely, have a look at this class: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html It is a transformer, so you can apply it to any matrix (that doesn't mean it makes sense, just that you can): # Create original matrix X = creat
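Since the snippet above is cut off, here is a small stand-alone sketch of the same idea: TfidfTransformer accepts any non-negative count-like matrix, not only word counts. The matrix values below are made up for illustration and are not the ones from the original message.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical count matrix: 3 samples x 4 features (values are made up).
counts = np.array([[3, 0, 1, 0],
                   [2, 0, 0, 1],
                   [0, 4, 0, 1]])

# Defaults: smoothed idf weighting followed by L2 normalization of each row.
transformer = TfidfTransformer()
weighted = transformer.fit_transform(counts)  # scipy sparse matrix

# With the default norm="l2", every non-empty row ends up with unit length.
row_norms = np.asarray(np.sqrt(weighted.multiply(weighted).sum(axis=1))).ravel()
```

Whether the resulting weights are meaningful depends on what the columns represent; the transformer only assumes non-negative counts.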

Re: [Scikit-learn-general] Difference between sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency

2014-06-30 Thread Christian Jauvin
> If I understand you correctly, one way to reconcile the difference > between the two interpretations (multinomial vs binomial) would be to > binarize first my boolean input variable: Just for the sake of clarity: I meant to add the complement to my input variable (i.e. as a second feature), rath

Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Joel Nothman
It may be beneficial to use some kind of query expansion or unsupervised dimensionality reduction, as the vectors from a bag of words encoding will probably be very sparse. Does that help? On 30 June 2014 03:03, Abijith Kp wrote: > Hi, > > Is it possible to use TfidfVectorizer to cluster very s
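One way to try the dimensionality-reduction suggestion is to project the sparse tf-idf vectors with TruncatedSVD before clustering. The sketch below is only an illustration under assumptions: the six short documents, the two components, and the two clusters are invented for the example and would need tuning on real data.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical short texts (well under 20 words each).
docs = [
    "cheap flights to paris",
    "paris flights and hotel deals",
    "book cheap hotel in paris",
    "python sparse matrix error",
    "scipy sparse matrix python",
    "numpy matrix shape error",
]

# Sparse bag-of-words encoding with tf-idf weights.
X_sparse = TfidfVectorizer().fit_transform(docs)

# Unsupervised reduction to a small dense space, then row normalization so
# that Euclidean k-means behaves like clustering on cosine similarity.
lsa = make_pipeline(TruncatedSVD(n_components=2, random_state=0),
                    Normalizer(copy=False))
X_dense = lsa.fit_transform(X_sparse)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_dense)
```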

Re: [Scikit-learn-general] Strings as features

2014-06-30 Thread Joel Nothman
They are defined in the beta release of version 0.15. On 30 June 2014 02:53, Abijith Kp wrote: > In which version of sklearn are the above-mentioned 'make_pipeline' and > 'make_union' defined? > > When I read through some examples, the idea of using FeatureUnion and > Pipeline is easy, I guess.
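For readers unfamiliar with the two helpers, the sketch below shows how make_union and make_pipeline fit together. The particular transformers, the classifier, and the random data are assumptions picked for illustration; they are not taken from the thread.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union

# Synthetic data (an assumption): 100 samples, 6 features, binary target.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# make_union builds a FeatureUnion: the outputs of both transformers are
# concatenated column-wise (2 PCA components + 3 selected features).
features = make_union(PCA(n_components=2), SelectKBest(k=3))

# make_pipeline builds a Pipeline and names each step automatically after
# the lowercased class name.
model = make_pipeline(features, LogisticRegression())
model.fit(X, y)

step_names = [name for name, _ in model.steps]
```

The helpers differ from the FeatureUnion and Pipeline constructors only in that step names are generated rather than supplied by hand.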

Re: [Scikit-learn-general] Scikit learn's multiprocessing

2014-06-30 Thread Olivier Grisel
Note: it seems to happen only in code that uses the multiprocessing.pool.ThreadPool class. However I still cannot reproduce the failure on toy scripts. Rich, can you reproduce the problem on randomly generated data? If so could you please post such a notebook publicly? -- Olivier --

Re: [Scikit-learn-general] Scikit learn's multiprocessing

2014-06-30 Thread Fernando Perez
Needless to say, if you have a way to reproduce this one with a simpler case, please let us know. We'd love to track down the origin of the problem and fix it if it's possible within ipython... On Mon, Jun 30, 2014 at 12:48 AM, Olivier Grisel wrote: > I could reproduce the "ZMQError: Address al

Re: [Scikit-learn-general] Difference between sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency

2014-06-30 Thread Christian Jauvin
Thanks for your answer. > The difference seems (thinking out loud) to stem from assumptions > about the input. feature_selection.chi2 (implicitly) assumes a > multinomial event model, so each X[i, j] is the frequency with which > event j was observed when drawing X[i].sum() times from a multinomia

Re: [Scikit-learn-general] Difference between sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency

2014-06-30 Thread Lars Buitinck
2014-06-30 0:28 GMT+02:00 Christian Jauvin : > What explains the difference in terms of the Chi-Square value (0.5 vs 2) and > the P-value (0.48 vs 0.157)? Here's the feature_selection.chi2 algorithm: >>> A = numpy.vstack(([[0,0]] * 18, [[0,1]] * 7, [[1,0]] * 42, [[1,1]] * 33)) >>> X = A[:, [0]]
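Because the snippet is truncated, the two statistics quoted in the question can be reproduced with a NumPy-only sketch. It assumes the target is the second column of A (y = A[:, 1]), which the truncated text does not show, and it writes out both computations by hand rather than calling either library: the first treats X as event counts per class, the second is a plain Pearson chi-square on the 2x2 contingency table with no continuity correction.

```python
import math

import numpy as np

A = np.vstack(([[0, 0]] * 18, [[0, 1]] * 7, [[1, 0]] * 42, [[1, 1]] * 33))
X = A[:, [0]]
y = A[:, 1]  # assumption: the truncated snippet does not show how y is defined

# Multinomial-style statistic: compare the per-class sum of the feature with
# what the class frequencies alone would predict.
Y = np.column_stack((1 - y, y))              # one indicator column per class
observed = Y.T @ X                           # shape (n_classes, n_features)
class_prob = Y.mean(axis=0).reshape(-1, 1)
feature_count = X.sum(axis=0).reshape(1, -1)
expected = class_prob @ feature_count
chi2_feature = float(((observed - expected) ** 2 / expected).sum())

# Contingency-table statistic: all four cells of the x-by-y table count.
table = np.array([[np.sum((X[:, 0] == i) & (y == j)) for j in (0, 1)]
                  for i in (0, 1)])
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
expected_table = row_totals @ col_totals / table.sum()
chi2_table = float(((table - expected_table) ** 2 / expected_table).sum())


def p_value_1dof(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

Under that assumption, chi2_feature comes out at 0.5 (p about 0.48) and chi2_table at 2.0 (p about 0.157). The first computation only looks at rows where the feature is 1, while the second also counts the rows where it is 0, which is where the extra 1.5 comes from.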

Re: [Scikit-learn-general] PCA inverse transform

2014-06-30 Thread Gael Varoquaux
> Is this necessary for new PCA methods as well? In other words, should I add an > already deprecated constructor arg to IncrementalPCA as well, or just do the > whitening inverse_transform the way it will be done in 0.16 and on? The latter option, I believe. G --

Re: [Scikit-learn-general] PCA inverse transform

2014-06-30 Thread Kyle Kastner
Is this necessary for new PCA methods as well? In other words, should I add an already deprecated constructor arg to IncrementalPCA as well, or just do the whitening inverse_transform the way it will be done in 0.16 and on? On Mon, Jun 30, 2014 at 3:20 PM, Gael Varoquaux < gael.varoqu...@normales

Re: [Scikit-learn-general] PCA inverse transform

2014-06-30 Thread Gael Varoquaux
On Mon, Jun 30, 2014 at 02:38:41PM +0200, Alexandre Gramfort wrote: > I would be +1 adding an invert_whitening param to PCA that would > default to False in 0.15 and move to True in 0.16 to eventually > disappear later. +1 --

Re: [Scikit-learn-general] PCA inverse transform

2014-06-30 Thread Alexandre Gramfort
hi, I would be +1 adding an invert_whitening param to PCA that would default to False in 0.15 and move to True in 0.16 to eventually disappear later. Alex On Mon, Jun 30, 2014 at 8:53 AM, Michael Eickenberg wrote: > Kyle is facing the same question for his incremental pca pr > https://github.c
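To make the whitening question concrete, the NumPy-only sketch below contrasts an inverse transform that undoes the whitening with one that does not. It does not use scikit-learn's PCA class, and the variable names and random data are assumptions for illustration; invert_whitening is only the parameter name proposed in this thread.

```python
import numpy as np

# Synthetic correlated data (an assumption for illustration).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])
X = rng.normal(size=(50, 3)) @ mixing

# PCA via SVD of the centered data, keeping all components.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt
explained_std = S / np.sqrt(X.shape[0] - 1)

# Whitened projection: each component is rescaled to unit variance.
X_white = (X - mean) @ components.T / explained_std

# Inverse transform that undoes the whitening first: recovers X.
X_back_unwhitened = (X_white * explained_std) @ components + mean

# Inverse transform that skips the rescaling: does not recover X.
X_back_plain = X_white @ components + mean
```

Only the first inverse is an exact round trip when all components are kept, which is the behavior the proposed flag would switch on.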

Re: [Scikit-learn-general] Scikit learn's multiprocessing

2014-06-30 Thread Olivier Grisel
I could reproduce the "ZMQError: Address already in use" under Python 3.4, IPython 2.1.0 and scikit-learn master when using cross validation with n_jobs != 1 in IPython notebook on long running jobs. There might be a problem with IPython notebook and POSIX forks triggered by the use of multiprocess

[Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Abijith Kp
Hi, Is it possible to use TfidfVectorizer to cluster very small texts? By small I mean fewer than 20 words. Or is there a better way to do it? Regards, Abijith -- Abijith KP github.com/abijith-kp kpabijith.wordpress.com -