Hi
My company is considering setting up some infrastructure for ML. Right now
we're using either our own laptops or Google Compute Engine. Has anyone done
this / found good tools for it? I was considering looking into GCE/Amazon and
auto-scaling, maybe having it set up another IPython notebook (probabl
Thanks, Gael,
that's very useful information.
I will do some hyperparameter tuning via GridSearch on the alpha and priors,
using roc_auc as the scoring metric, and see how it goes.
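A minimal sketch of such a grid search (the choice of MultinomialNB and the
parameter values are illustrative assumptions; sklearn.grid_search was the
module path in 2014-era releases):

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.grid_search import GridSearchCV

# Toy count data standing in for the real features.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 20))
y = rng.randint(0, 2, size=200)

param_grid = {"alpha": [0.01, 0.1, 0.5, 1.0, 10.0],  # smoothing strength
              "fit_prior": [True, False]}            # learn class priors or not
grid = GridSearchCV(MultinomialNB(), param_grid, scoring="roc_auc", cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)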
Best,
Sebastian
On Aug 31, 2014, at 5:10 PM, Gael Varoquaux
wrote:
On Sat, Aug 30, 2014 at 03:53:24PM -0400, Sebastian Raschka wrote:
> I was wondering if it is somehow possible to define a loss function for
> the Naive Bayes classifier in scikit-learn.
No.
> For example, let's assume that we are interested in spam vs. ham
> classification. In this context, such a
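The snippet is cut off here. Since the answer above is that the NB loss
itself cannot be changed, a common workaround for asymmetric costs is to
threshold predict_proba instead of calling predict. A minimal sketch (the
toy data and the 0.9 threshold are illustrative assumptions):

import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(200, 20))  # toy word counts
y = rng.randint(0, 2, size=200)        # 1 = spam, 0 = ham

clf = MultinomialNB().fit(X, y)
proba_spam = clf.predict_proba(X)[:, 1]
# Only flag a message as spam when confident, making the costlier
# ham-classified-as-spam error rarer.
y_pred = (proba_spam > 0.9).astype(int)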
I also updated the post on my blog. Thanks, Olivier.
A+
Marco
On 31 Aug 2014, at 16:49, Olivier Grisel wrote:
To be clearer, the original bug reported by Amita has been fixed by
building new wheel packages in this issue:
https://github.com/scikit-learn/scikit-learn/issues/3548
To make sure that you install the latest wheel packages you can do:
pip uninstall -y scikit-learn
rm -rf ~/.pip/cache
pip install scikit-learn
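To verify that the reinstall picked up the new wheel (a quick check, not
part of the original message):

python -c "import sklearn; print(sklearn.__version__)"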
2014-08-29 19:01 GMT+01:00 Amita Misra :
> Thanks, it works now!
>
> I have one more question regarding virtual environments. If I use a virtual
> environment then how can I use scikit-learn, since if I follow the steps as
> mentioned above, it installs the packages in
>
> /usr/local/Cellar/. Even though I
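The message is cut off here. For reference, a typical recipe for using
scikit-learn inside a virtualenv, so that packages land in the environment
rather than under /usr/local/Cellar/ (a sketch; the environment name is
arbitrary):

virtualenv sklearn-env
source sklearn-env/bin/activate
pip install numpy scipy scikit-learn
python -c "import sklearn; print(sklearn.__file__)"  # now inside sklearn-env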
> Is there any other way through which I can
> train GradientBoostingRegressor for this dataset?
No, not yet.
However, our implementation of gradient boosting has a `subsample` option
for using a subset of the data when building each tree (this is called
stochastic gradient boosting in the literature).
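A minimal sketch of that option (the toy data and parameter values are
illustrative assumptions):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = X[:, 0] + 0.1 * rng.randn(1000)

# subsample < 1.0 draws a random fraction of the rows for each tree,
# i.e. stochastic gradient boosting.
est = GradientBoostingRegressor(n_estimators=200, subsample=0.5,
                                random_state=0)
est.fit(X, y)
print(est.score(X, y))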
On Wed, Aug 20, 2014 at 02:31:46PM +0200, Gael Varoquaux wrote:
> > It's been around for so long, but it's also hard to believe that anyone
> > exploited this behaviour intentionally. Shall we be bold and just fix
> > it with a warning?
Joel has implemented this strategy:
https://github.com/scikit
> We should not encourage users to store sparse data in CSV format.
+1
> the technique shown by Lars could be applied to any row-oriented format,
> be it text or data read from the network.
Perhaps, but then they can construct a sparse format, such as a dict that
is passed to DictVectorizer.
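A minimal sketch of that suggestion (the feature names and values are made
up): each row stores only its nonzero entries as a dict, and DictVectorizer
assembles a scipy.sparse matrix from them:

from sklearn.feature_extraction import DictVectorizer

rows = [{"f1": 2.0, "f7": 1.0},  # only nonzero entries are stored
        {"f3": 4.0},
        {}]                      # an all-zero row is an empty dict
vec = DictVectorizer()
X = vec.fit_transform(rows)      # sparse matrix of shape (3, 3) here
print(X.shape, X.nnz)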
I am not convinced we need this, even if only in the docs. We should not
encourage users to store sparse data in CSV format. Storing a large
high-dimensional dataset in CSV format could easily consume an entire disk
(if not compressed). Reading from the network row by row is even worse as
it would
Well yes, CSV is not particularly suited to sparse data, but the technique
shown by Lars could be applied to any row-oriented format, be it text or
data read from the network.
2014-08-31 10:56 GMT+02:00 Mathieu Blondel :
Do you store zero entries explicitly in your CSV format? CSV doesn't strike
me as the best choice for representing sparse data...
M.
On Sun, Aug 31, 2014 at 5:21 PM, Eustache DIEMERT
wrote:
@Lars, shouldn't the last line of the for loop be
indptr.append(indptr[-1]+len(nonzero))
rather than
indptr.append(i)
?
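For context, a minimal sketch of the kind of loop being discussed (Lars's
original snippet is not shown in this digest; the dense input rows are an
illustrative assumption):

import scipy.sparse as sp

rows = [[0.0, 2.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
        [3.0, 0.0, 0.0, 4.0]]

data, indices, indptr = [], [], [0]
for row in rows:
    nonzero = [j for j, v in enumerate(row) if v != 0.0]
    data.extend(row[j] for j in nonzero)
    indices.extend(nonzero)
    indptr.append(indptr[-1] + len(nonzero))  # the correction discussed above

X = sp.csr_matrix((data, indices, indptr), shape=(len(rows), 4))

indptr must hold the cumulative nonzero count per row, so appending the
loop index would only be correct if every row had exactly one nonzero
entry.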
FYI, here is the PR to include your snippet into the doc:
https://github.com/scikit-learn/scikit-learn/pull/3610
Eustache
2014-07-29 11:24 GMT+02:00 Lars Buitinck :