[Scikit-learn-general] Shared scikit/ipython server

2014-08-31 Thread Anders Aagaard
Hi, my company is considering setting up some infrastructure for ML. Right now we're using either our own laptops or Google Compute Engine. Has anyone done this or found good tools for it? I was considering looking into GCE/Amazon and auto scaling, maybe having it set up another IPython notebook (probabl

Re: [Scikit-learn-general] Is it possible to define a loss function for the naive Bayes classifier?

2014-08-31 Thread Sebastian Raschka
Thanks, Gael, that's very useful information. I will do some hyperparameter tuning via GridSearch on the alpha and priors, using roc_auc as the scoring metric, and see how it goes. Best, Sebastian On Aug 31, 2014, at 5:10 PM, Gael Varoquaux wrote: > On Sat, Aug 30, 2014 at 03:53:24PM -04
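The tuning Sebastian describes could look roughly like the sketch below. It is not taken from the thread: the synthetic count data, the choice of MultinomialNB, and the grid values are all assumptions made for illustration. The import path is the current one (`sklearn.model_selection`); in the 2014 releases discussed here, GridSearchCV lived in `sklearn.grid_search`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Synthetic count features standing in for word counts (assumption).
rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=200)
X = rng.poisson(lam=1.0, size=(200, 20))
# Make the first five features more frequent in class 1.
n_pos = int((y == 1).sum())
X[y == 1, :5] += rng.poisson(lam=2.0, size=(n_pos, 5))

# Illustrative grid over the smoothing parameter and the class priors.
param_grid = {
    "alpha": [0.01, 0.1, 1.0, 10.0],
    "class_prior": [None, [0.5, 0.5], [0.8, 0.2]],
}

# scoring="roc_auc" ranks candidates by area under the ROC curve.
search = GridSearchCV(MultinomialNB(), param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Note that changing the class priors only shifts the predicted log-odds by a constant, so it moves the decision threshold more than it changes the ranking that roc_auc measures; alpha is likely to matter more for this metric.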

Re: [Scikit-learn-general] Is it possible to define a loss function for the naive Bayes classifier?

2014-08-31 Thread Gael Varoquaux
On Sat, Aug 30, 2014 at 03:53:24PM -0400, Sebastian Raschka wrote: > I was wondering if it is somehow possible to define a loss function for the Naive > Bayes classifier in scikit-learn. No. > For example, let's assume that we are interested in spam vs. ham > classification. In this context, such a

Re: [Scikit-learn-general] Runtime warning scikit 0.15, warning for numpy

2014-08-31 Thread Giuseppe Marco Randazzo
Updated also the post in my blog. Thanks Olivier. A+ Marco On 31 Aug 2014, at 16:49, Olivier Grisel wrote: > To be clearer the original bug reported by Amita has been fixed by > building new wheel packages in this issue: > > https://github.com/scikit-learn/scikit-learn/issues/3548 > > To mak

Re: [Scikit-learn-general] Runtime warning scikit 0.15, warning for numpy

2014-08-31 Thread Olivier Grisel
To be clearer, the original bug reported by Amita has been fixed by building new wheel packages in this issue: https://github.com/scikit-learn/scikit-learn/issues/3548 To make sure that you install the latest wheel packages you can do: pip uninstall -y scikit-learn rm -rf ~/.pip/cache pip instal

Re: [Scikit-learn-general] Runtime warning scikit 0.15, warning for numpy

2014-08-31 Thread Olivier Grisel
2014-08-29 19:01 GMT+01:00 Amita Misra : > Thanks it works now!. > > I have one more question regarding virtual environment. If I use virtual > environment then how can I use scikit, since if I follow the steps as > mentioned above ,it installs the packages in > > /usr/local/Cellar/. Even though I

Re: [Scikit-learn-general] partial-fit in gradient boosting

2014-08-31 Thread Mathieu Blondel
> Is there any other way through which I can train GradientBoostingRegressor for this dataset? No, not yet. However, our implementation of gradient boosting has a `subsample` option for using a subset of the data when building each tree (this is called stochastic gradient boosting in the literatu
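As a minimal sketch of the `subsample` option Mathieu mentions, the example below fits a GradientBoostingRegressor where each tree sees a random half of the training rows. The synthetic dataset and the specific values (50 trees, subsample=0.5) are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic regression problem (assumption, for illustration only).
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

# subsample < 1.0 draws a random fraction of the rows for each tree,
# which is what the literature calls stochastic gradient boosting.
model = GradientBoostingRegressor(n_estimators=50, subsample=0.5, random_state=0)
model.fit(X, y)

predictions = model.predict(X)
print(predictions[:5])
```

This reduces the work done per tree, but the full training set still has to be available in memory when `fit` is called, so it is not a substitute for `partial_fit`.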

Re: [Scikit-learn-general] Custom Scoring Functions for Grid Search

2014-08-31 Thread Gael Varoquaux
On Wed, Aug 20, 2014 at 02:31:46PM +0200, Gael Varoquaux wrote: > > It's been around for so long, but it's also hard to believe that anyone > > exploited this behaviour intentionally. Shall we be bold and just fix > > it with a warning? Joel has implemented this strategy: https://github.com/scikit

Re: [Scikit-learn-general] sparse datasets loading

2014-08-31 Thread Joel Nothman
> We should not encourage users to store sparse data in CSV format. +1 > the technique shown by Lars could be applied to any row oriented format, be it text or data read from the network. Perhaps, but then they can construct a sparse format, such as a dict that is passed to DictVectorizer. On
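Joel's suggestion can be sketched as follows: each row is a dict holding only its nonzero entries, and DictVectorizer turns the list of dicts into a SciPy sparse matrix. The feature names and values here are made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

# Each row lists only its nonzero features (hypothetical names and values).
rows = [
    {"a": 1.0},
    {"a": 3.0, "b": 2.0},
]

vec = DictVectorizer()  # sparse output is the default
X = vec.fit_transform(rows)

print(vec.feature_names_)  # column order chosen by the vectorizer
print(X.toarray())
```

Because the rows can come from any iterable, the same pattern works for records parsed from a text file or read from the network, without ever materializing the zeros.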

Re: [Scikit-learn-general] sparse datasets loading

2014-08-31 Thread Mathieu Blondel
I am not convinced we need this, even if only in the docs. We should not encourage users to store sparse data in CSV format. Storing a large high-dimensional dataset in CSV format could easily consume an entire disk (if not compressed). Reading from the network row by row is even worse as it would

Re: [Scikit-learn-general] sparse datasets loading

2014-08-31 Thread Eustache DIEMERT
Well yes, CSV is not particularly suited to sparse data but the technique shown by Lars could be applied to any row oriented format, be it text or data read from the network. 2014-08-31 10:56 GMT+02:00 Mathieu Blondel : > Do you store zero entries explicitly in your CSV format? CSV doesn't > st

Re: [Scikit-learn-general] sparse datasets loading

2014-08-31 Thread Mathieu Blondel
Do you store zero entries explicitly in your CSV format? CSV doesn't strike me as the best choice for representing sparse data... M. On Sun, Aug 31, 2014 at 5:21 PM, Eustache DIEMERT wrote: > @Lars, shouldn't the last line of the for loop be > > indptr.append(indptr[-1]+len(nonzero)) > > rat

Re: [Scikit-learn-general] sparse datasets loading

2014-08-31 Thread Eustache DIEMERT
@Lars, shouldn't the last line of the for loop be indptr.append(indptr[-1]+len(nonzero)) rather than indptr.append(i) ? FYI, here is the PR to include your snippet into the doc: https://github.com/scikit-learn/scikit-learn/pull/3610 Eustache 2014-07-29 11:24 GMT+02:00 Lars Buitinck :
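For context on Eustache's question: in the CSR layout, `indptr[i + 1] - indptr[i]` is the number of stored entries in row i, so the row pointer has to grow by the count of nonzeros in each row. The sketch below is not Lars's original snippet, which is not reproduced in this listing; it is a plain-Python illustration of the cumulative update, and the function name `rows_to_csr` and the sample rows are hypothetical.

```python
def rows_to_csr(rows):
    """Build the three CSR arrays from an iterable of dense rows."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        # Column positions of the nonzero entries in this row.
        nonzero = [j for j, value in enumerate(row) if value != 0]
        indices.extend(nonzero)
        data.extend(row[j] for j in nonzero)
        # Cumulative count of stored entries, as Eustache proposes.
        indptr.append(indptr[-1] + len(nonzero))
    return data, indices, indptr


rows = [[0, 2, 0], [0, 0, 0], [1, 0, 3]]
data, indices, indptr = rows_to_csr(rows)
print(data, indices, indptr)
```

The three lists can then be passed to `scipy.sparse.csr_matrix((data, indices, indptr))`. The empty middle row shows why appending a row counter would be wrong in general: the pointer must stay at 1 for that row rather than advance.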