Hi Michael, as you point out, I also think the most straightforward
approach (good enough) is to fit all the polynomials (n = 0, ..., 3) using
OLS and evaluate the predictive capability by cross-validation. I will
compare this to the lasso approach.
Thanks for your comments!
-fernando
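That plan could be sketched roughly as follows (a hedged sketch with synthetic data and the current scikit-learn module layout; all names and numbers here are illustrative, not from the thread):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic quadratic data with a little noise
rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(scale=0.1, size=200)
X = x[:, np.newaxis]

# Fit polynomials of degree n = 0..3 with OLS, score each by cross-validation
scores = {}
for degree in range(4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores[degree] = cross_val_score(model, X, y, cv=5).mean()

best = max(scores, key=scores.get)  # degree with the best mean CV score
```

The same loop could then be repeated with `Lasso` in place of `LinearRegression` for the comparison mentioned above.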
On Tue, Jul 1, 2014 at 3:35 AM, Joel Nothman wrote:
> It may be beneficial to use some kind of query expansion or unsupervised
> dimensionality reduction, as the vectors from a bag of words encoding will
> probably be very sparse. Does that help?
>
> How can query expansion help?? I don't think I
A bit more concretely, have a look at this class:
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html
It is a transformer, so you can apply it to any matrix (that doesn't mean
it makes sense, just that you can):
# Create original matrix
X = creat
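The snippet above is cut off; it might be completed along these lines (a hedged sketch; the count matrix is made up for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Create original matrix (here: a small term-count matrix)
X = np.array([[3, 0, 1],
              [2, 0, 0],
              [3, 0, 0],
              [4, 0, 0]])

# TfidfTransformer accepts any non-negative matrix, not just text counts
tfidf = TfidfTransformer().fit_transform(X)

# By default (norm='l2') each row of the result is length-normalized
```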
> If I understand you correctly, one way to reconcile the difference
> between the two interpretations (multinomial vs binomial) would be to
> binarize first my boolean input variable:
Just for the sake of clarity: I meant to add the complement to my
input variable (i.e. as a second feature), rath
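That idea can be sketched like this (toy data; under a multinomial event model each row then sums to a constant number of draws):

```python
import numpy as np

# A single boolean input variable, as one column
X = np.array([[1], [0], [1], [1]])

# Append the complement as a second feature
X2 = np.hstack([X, 1 - X])

# Every row now sums to 1, i.e. one draw from a two-outcome multinomial
```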
It may be beneficial to use some kind of query expansion or unsupervised
dimensionality reduction, as the vectors from a bag of words encoding will
probably be very sparse. Does that help?
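A hedged sketch of that suggestion (toy corpus; TruncatedSVD stands in for the unsupervised dimensionality reduction, and the cluster assignments on such tiny data are illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

texts = [
    "the cat sat on the mat",
    "dogs chase cats around the yard",
    "stock prices fell sharply today",
    "markets rallied after strong earnings",
]

# Bag-of-words TF-IDF vectors are sparse ...
X = TfidfVectorizer().fit_transform(texts)

# ... so reduce them to a dense low-dimensional space before clustering
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
```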
On 30 June 2014 03:03, Abijith Kp wrote:
> Hi,
>
> Is it possible to use TfidfVectorizer to cluster very small sized texts?
They are defined in the beta release of version 0.15.
On 30 June 2014 02:53, Abijith Kp wrote:
> In which version of sklearn are the above-mentioned 'make_pipeline' and
> 'make_union' defined?
>
> When I read through some examples, the idea of using FeatureUnion and
> Pipeline is easy, I guess.
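For illustration, a minimal use of both helpers (the data here is random and purely illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(10, 4)

# make_union builds a FeatureUnion: the outputs are concatenated column-wise
union = make_union(PCA(n_components=2), StandardScaler())

# make_pipeline builds a Pipeline: the steps are applied in sequence
pipe = make_pipeline(union, MinMaxScaler())

Xt = pipe.fit_transform(X)  # 2 PCA columns + 4 scaled columns = 6 columns
```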
Note: it seems to happen only in code that uses the
multiprocessing.pool.ThreadPool class. However I still cannot
reproduce the failure on toy scripts.
Rich, can you reproduce the problem on randomly generated data? If so
could you please post such a notebook publicly?
--
Olivier
--
Needless to say, if you have a way to reproduce this one with a simpler
case, please let us know. We'd love to track down the origin of the problem
and fix it if it's possible within ipython...
On Mon, Jun 30, 2014 at 12:48 AM, Olivier Grisel
wrote:
> I could reproduce the "ZMQError: Address already in use" under Python
> 3.4, IPython 2.1.0 and scikit-learn master when using cross validation
> with n_jobs != 1 in IPython notebook on long running jobs.
Thanks for your answer.
> The difference seems (thinking out loud) to stem from assumptions
> about the input. feature_selection.chi2 (implicitly) assumes a
> multinomial event model, so each X[i, j] is the frequency with which
> event j was observed when drawing X[i].sum() times from a multinomial
2014-06-30 0:28 GMT+02:00 Christian Jauvin :
> What explains the difference in terms of the Chi-Square value (0.5 vs 2) and
> the P-value (0.48 vs 0.157)?
Here's the feature_selection.chi2 algorithm:
>>> import numpy
>>> A = numpy.vstack(([[0,0]] * 18, [[0,1]] * 7, [[1,0]] * 42, [[1,1]] * 33))
>>> X = A[:, [0]]
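Completing that sketch, the two numbers from the question can be reproduced side by side (a hedged reconstruction: sklearn's per-class chi2 versus the classic contingency-table test from scipy):

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.feature_selection import chi2

A = np.vstack(([[0, 0]] * 18, [[0, 1]] * 7, [[1, 0]] * 42, [[1, 1]] * 33))
X, y = A[:, [0]], A[:, 1]

# sklearn sums the feature per class and compares to the expected counts
score, p = chi2(X, y)          # score[0] == 0.5, p[0] ~= 0.48

# the textbook test uses the full 2x2 contingency table
table = np.array([[18, 7], [42, 33]])
stat, pval, dof, expected = chi2_contingency(table, correction=False)
# stat == 2.0, pval ~= 0.157
```

The per-class expected counts are [45, 30] against observed [42, 33], giving 9/45 + 9/30 = 0.5, while the contingency test adds two more terms for the feature's zero counts, giving 2.0.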
> Is this necessary for new PCA methods as well? In other words, should I add an
> already deprecated constructor arg to IncrementalPCA as well, or just do the
> whitening inverse_transform the way it will be done in 0.16 and on?
The latter option, I believe.
G
--
Is this necessary for new PCA methods as well? In other words, should I add
an already deprecated constructor arg to IncrementalPCA as well, or just do
the whitening inverse_transform the way it will be done in 0.16 and on?
On Mon, Jun 30, 2014 at 3:20 PM, Gael Varoquaux <
gael.varoqu...@normales
On Mon, Jun 30, 2014 at 02:38:41PM +0200, Alexandre Gramfort wrote:
> I would be +1 adding an invert_whitening param to PCA that would
> default to False in 0.15 and move to True in 0.16 to eventually
> disappear later.
+1
--
hi,
I would be +1 adding an invert_whitening param to PCA that would
default to False in 0.15 and move to True in 0.16 to eventually
disappear later.
Alex
On Mon, Jun 30, 2014 at 8:53 AM, Michael Eickenberg
wrote:
> Kyle is facing the same question for his incremental pca pr
> https://github.c
I could reproduce the "ZMQError: Address already in use" under Python
3.4, IPython 2.1.0 and scikit-learn master when using cross validation
with n_jobs != 1 in IPython notebook on long running jobs.
There might be a problem with IPython notebook and POSIX forks
triggered by the use of multiprocessing.
Hi,
Is it possible to use TfidfVectorizer to cluster very small sized texts?
By small I mean texts with fewer than 20 words.
Or is there any better way to do it?
Regards,
Abijith
--
Abijith KP
github.com/abijith-kp
kpabijith.wordpress.com