Hi Andy, thanks.
From what I can tell, people have not been very interested in Dynamic
Bayesian Networks in recent years; even Murphy's BNT has been inactive for
quite a while. Any idea why interest in this method is so low?
It seems to me a good way to learn and predict regimes in time
s
Dear scikit-learners,
During the last sprint we spotted an efficiency issue with numpy.dot for
numpy versions < 1.8. Apparently, dot allocates additional copies in order
to deliver appropriate input to the underlying BLAS gemm function which expects
Fortran contiguous memory layout f
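
The memory-layout point in the message above can be inspected directly. This is only a minimal sketch, assuming NumPy is installed; the array names are illustrative. It shows how to check whether an array is C- or Fortran-contiguous before passing it to numpy.dot. It does not measure the copies themselves, and whether a copy is made depends on the NumPy version.

```python
import numpy as np

# Arrays created by NumPy are C-contiguous (row-major) by default.
a = np.ones((1000, 50))
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])

# A transpose is a view on the same buffer, so it is Fortran-contiguous
# (column-major) without any data being copied.
at = a.T
print(at.flags["C_CONTIGUOUS"], at.flags["F_CONTIGUOUS"])

# A strided slice is neither C- nor Fortran-contiguous.
strided = a[:, ::2]
print(strided.flags["C_CONTIGUOUS"], strided.flags["F_CONTIGUOUS"])

# Making the layout explicit up front means any copy happens once, here,
# instead of implicitly inside a later call.
strided_c = np.ascontiguousarray(strided)

# dot accepts any of these layouts; only the internal handling differs.
result = np.dot(at, strided_c)
print(result.shape)
```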
Hi,
Helge, this ECML/PKDD paper [1] might be helpful in the case of
semi-supervised learning.
Some time ago, one of the authors of [1] and I talked about implementing
the algorithm in sklearn. I think now is a good time to mention it on the
mailing list. I'm not sure if there is any online semi-s
A poor-man's scikit-learn compatible wrapper around VW would be to call the
command line via popen and feed it data through stdin.
If you do that, create a gist and add it to the third-party snippet list in
https://github.com/scikit-learn/scikit-learn/wiki/Useful-Snippets
Mathieu
On Fri, Aug 23
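
The "call the command line via popen and feed it data through stdin" idea above could look roughly like this. It is a sketch, not a working VW wrapper: run_with_stdin and to_vw_line are hypothetical helper names, and a stand-in command that echoes its input is used so the code runs without VW installed. The line format follows VW's plain-text input ("label | name:value ..."); the actual vw flags are deliberately left out.

```python
import subprocess
import sys


def run_with_stdin(cmd, lines):
    """Run `cmd`, write `lines` to its stdin, and return its stdout lines."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() writes the input, closes stdin and waits for exit,
    # which avoids the deadlocks that manual read/write loops can cause.
    out, _ = proc.communicate("\n".join(lines) + "\n")
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return out.splitlines()


def to_vw_line(label, features):
    """Format one example as 'label | name:value name:value ...'."""
    feats = " ".join(
        "%s:%g" % (name, value) for name, value in sorted(features.items())
    )
    return "%s | %s" % (label, feats)


# Stand-in for the real command: a child Python process that echoes stdin.
# With VW installed, `cmd` would instead start with "vw" plus its options.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

examples = [
    to_vw_line(1, {"height": 1.5, "width": 2}),
    to_vw_line(-1, {"height": 0.3}),
]
echoed = run_with_stdin(echo_cmd, examples)
print(echoed)
```

Note that communicate() holds the whole input in memory, so for very large datasets one would write to proc.stdin in chunks instead.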
Thanks for the details. My main advice is still the same: try on small
subsamples with increasing sizes and check the impact of the size of
the training set on the test score.
For a linear binary classifier I am pretty sure that it's not going to
help you to use all the data (unless you learn non-
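
The advice above (train on subsamples of increasing size and watch the test score) can be sketched as follows. This assumes a recent scikit-learn; the synthetic dataset, the subsample sizes and the choice of LogisticRegression are placeholders for illustration, not taken from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: binary classification).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train on growing prefixes of the (already shuffled) training set and
# score each model on the same held-out test set.
sizes = [100, 300, 1000, 4000]
scores = []
for n in sizes:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[:n], y_train[:n])
    scores.append(clf.score(X_test, y_test))

for n, s in zip(sizes, scores):
    print("n_train=%5d  test accuracy=%.3f" % (n, s))
```

If the test score stops improving well before the largest size, adding more rows is unlikely to help that model.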
Thanks a lot Nick and Oliver. To answer your questions:
> - how many samples?
About 1 billion rows.
> - how many features?
>
It will depend on the nature of the analyses. Many of the categorical
variables have taxonomies that can be used to reduce cardinality. Sometimes
I'll want to use these,
> The kind of thing I would like to do is run vowpal-wabbit from within
> scikit learn.
>
I know VW has a C interface now, so it is theoretically possible to develop
a Python binding (hunch.net seems down as of now, but John Langford wrote
about it on the blog).
However, memory structures possib
Thanks Lars - I would really like to clarify the problems with my
suggestion, in particular if/how a CLI interface would break the
scikit-learn interface. You can obviously identify the problems immediately.
The kind of thing I would like to do is run vowpal-wabbit from within
scikit-learn. There
2013/8/23 helge.reike...@gmail.com :
> Good day,
>
> Can anyone perhaps give me an idea of how large datasets scikit-learn
> algorithms typically can handle?
>
> I have about 4 TB of structured data. I might be able to normalize that down
> to say 1 TB if necessary. The tasks would typically be log
Hey Helge
Funny I just saw this drop into my inbox! Hope you are well.
What does your data look like? Is it sparse? For classification tasks
(read: SGDClassifier), one can stream data one-by-one and thus be
"out-of-core" - though in this case I'd recommend doing it in
"mini-batches". This would u
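
The mini-batch, out-of-core approach mentioned above can be sketched with SGDClassifier.partial_fit. This assumes NumPy and scikit-learn are installed; iter_minibatches is a hypothetical generator that fabricates random batches here, where a real one would read chunks from disk or a database.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_minibatches(n_batches, batch_size, true_w, seed):
    """Yield (X, y) chunks; a stand-in for reading chunks from storage."""
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        X = rng.randn(batch_size, true_w.shape[0])
        y = (X.dot(true_w) > 0).astype(int)
        yield X, y


# Hidden weights that define the synthetic labels (illustration only).
true_w = np.random.RandomState(0).randn(10)

clf = SGDClassifier(random_state=0)
# partial_fit needs the full set of class labels up front, because a
# single batch is not guaranteed to contain every class.
classes = np.array([0, 1])

# Only one batch is held in memory at a time.
for X_batch, y_batch in iter_minibatches(50, 200, true_w, seed=1):
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Evaluate on a fresh batch drawn with a different seed.
X_new, y_new = next(iter_minibatches(1, 1000, true_w, seed=2))
accuracy = clf.score(X_new, y_new)
print("held-out accuracy: %.3f" % accuracy)
```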
Good day,
Can anyone perhaps give me an idea of how large datasets scikit-learn
algorithms typically can handle?
I have about 4 TB of structured data. I might be able to normalize that
down to say 1 TB if necessary. The tasks would typically be logistic
regression, Naive Bayes, k-Means and possib