Re: [Scikit-learn-general] __check_build error on sklearn 0.12 import

2012-05-11 Thread W. Bryan Smith
On Fri, May 11, 2012 at 10:47 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > > > $ nosetests --exe sklearn > OK... thanks so much. It now does not error out, though it still shows some of the same AttributeError output: Exception AttributeError: AttributeError("'NoneType' object has no

Re: [Scikit-learn-general] __check_build error on sklearn 0.12 import

2012-05-11 Thread Gael Varoquaux
On Fri, May 11, 2012 at 10:44:45AM -0700, W. Bryan Smith wrote: > Thank you Gaël, I can now install and import. > The tests are failing though, log attached. How did you run the tests? The recommended way is now to use $ nosetests --exe sklearn Using 'import sklearn; sklearn.test()' breaks fo

Re: [Scikit-learn-general] __check_build error on sklearn 0.12 import

2012-05-11 Thread W. Bryan Smith
Thank you Gaël, I can now install and import. The tests are failing though, log attached. Thanks, Bryan On Fri, May 11, 2012 at 10:22 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > I could reproduce using an install rather than a build inplace, and I > pushed a fix. > > Thanks for

Re: [Scikit-learn-general] __check_build error on sklearn 0.12 import

2012-05-11 Thread Gael Varoquaux
I could reproduce using an install rather than an in-place build, and I pushed a fix. Thanks for reporting and sorry for the bug, Gaël On Fri, May 11, 2012 at 10:13:51AM -0700, W. Bryan Smith wrote: > On Fri, May 11, 2012 at 3:03 PM, W. Bryan Smith wrote: > >> the source was pulled from git abou

Re: [Scikit-learn-general] __check_build error on sklearn 0.12 import

2012-05-11 Thread W. Bryan Smith
Apologies for the formatting... I've configured my account to receive the digest and I don't know how to get back to the original message. Response in-line below: Message: 6 Date: Fri, 11 May 2012 15:26:30 +0900 From: Mathieu Blondel Subject: Re: [Scikit-learn-general] __check_build error on skl

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Olivier Grisel
2012/5/11 Mathieu Blondel : > > > On Fri, May 11, 2012 at 11:08 PM, Lars Buitinck wrote: >> >> >> Shouldn't you set the intercept_ as well? >> > Indeed. And there was a typo. Obviously it should be > > clf.coef_ += clf2.coef_ But as demonstrated in my previous message, this won't work as clf.coef

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Mathieu Blondel
On Fri, May 11, 2012 at 11:08 PM, Lars Buitinck wrote: > > Shouldn't you set the intercept_ as well? > > Indeed. And there was a typo. Obviously it should be clf.coef_ += clf2.coef_ Mathieu

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Mathieu Blondel
On Fri, May 11, 2012 at 11:15 PM, Olivier Grisel wrote: > > +1 : the semantics of warm_start is *only to speedup the convergence* > by starting from a solution closer to the optimal solution of the > convex optimization problem (in this case the final solution will be > the solution of fit(X_subse

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Olivier Grisel
2012/5/11 Gael Varoquaux : > On Fri, May 11, 2012 at 10:47:19PM +0900, Mathieu Blondel wrote: >>    All algorithms which supports a warm_start constructor option should also >>    be usable similarly to partial_fit. For example: > >>    from sklearn.linear_model import Lasso > >>    clf = Lasso(war

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Gael Varoquaux
On Fri, May 11, 2012 at 10:47:19PM +0900, Mathieu Blondel wrote: >All algorithms which supports a warm_start constructor option should also >be usable similarly to partial_fit. For example: >from sklearn.linear_model import Lasso >clf = Lasso(warm_start=True) >clf.fit(X_subset

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Lars Buitinck
2012/5/11 Mathieu Blondel : > Another idea is to learn a different classifier on each subset and use a > mixture of the classifiers. As a mixture weight, a simple choice is 1 / > n_mixtures. > > clf = LinearSVC() > clf.fit(X_subset1, y_subset1) > clf2 = LinearSVC() > clf2.fit(X_subset2, y_subset2)
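
To make the idea above concrete, here is a minimal sketch with synthetic placeholder data (the X_subset*/y_subset* arrays are invented for illustration). It folds in the intercept_ correction raised elsewhere in this thread; note that whether the liblinear wrappers allow reassigning coef_ at all is itself debated in the thread.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Two toy subsets standing in for X_subset1 / X_subset2 in the thread.
    rng = np.random.RandomState(0)
    X_subset1, y_subset1 = rng.randn(60, 4), rng.randint(0, 2, 60)
    X_subset2, y_subset2 = rng.randn(60, 4), rng.randint(0, 2, 60)

    clf = LinearSVC().fit(X_subset1, y_subset1)
    clf2 = LinearSVC().fit(X_subset2, y_subset2)

    # Uniform 1 / n_mixtures weighting of both coef_ and intercept_ (the
    # intercept_ point is raised elsewhere in the thread).
    clf.coef_ = (clf.coef_ + clf2.coef_) / 2.0
    clf.intercept_ = (clf.intercept_ + clf2.intercept_) / 2.0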

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Olivier Grisel
2012/5/11 Mathieu Blondel : > > > On Fri, May 11, 2012 at 10:52 PM, Olivier Grisel > wrote: >> >> Unfortunately I don't think you can assign coef_ on liblinear wrapper >> models due to internal memory layout constraints. >> > > Sure you can :) > > https://github.com/scikit-learn/scikit-learn/blob/

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Mathieu Blondel
On Fri, May 11, 2012 at 10:52 PM, Olivier Grisel wrote: > Unfortunately I don't think you can assign coef_ on liblinear wrapper > models due to internal memory layout constraints. > > Sure you can :) https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py#L805 Mathieu --

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Olivier Grisel
2012/5/11 Mathieu Blondel : > All algorithms which supports a warm_start constructor option should also be > usable similarly to partial_fit. For example: > > from sklearn.linear_model import Lasso > > clf = Lasso(warm_start=True) > clf.fit(X_subset1, y_subset1) > clf.fit(X_subset2, y_subset2) > ..

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Mathieu Blondel
All algorithms which support a warm_start constructor option should also be usable similarly to partial_fit. For example: from sklearn.linear_model import Lasso clf = Lasso(warm_start=True) clf.fit(X_subset1, y_subset1) clf.fit(X_subset2, y_subset2) ... Another idea is to learn a different clas
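
For concreteness, a runnable sketch of the snippet above with synthetic stand-in data; as pointed out elsewhere in the thread, warm_start only seeds the solver with the previous coefficients, so the second fit still solves for X_subset2 alone.

    import numpy as np
    from sklearn.linear_model import Lasso

    # Synthetic stand-ins for the two data subsets in the example above.
    rng = np.random.RandomState(0)
    X_subset1, y_subset1 = rng.randn(40, 5), rng.randn(40)
    X_subset2, y_subset2 = rng.randn(40, 5), rng.randn(40)

    clf = Lasso(warm_start=True)
    clf.fit(X_subset1, y_subset1)
    # warm_start=True only seeds the solver with the previous coefficients
    # to speed up convergence; the result below is the Lasso solution for
    # X_subset2 alone, not for the union of both subsets.
    clf.fit(X_subset2, y_subset2)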

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Gael Varoquaux
On Fri, May 11, 2012 at 02:53:26PM +0200, Peter Prettenhofer wrote: > Incremental learning is supported via ``partial_fit``, however, for > supervised learning only ``SGDClassifier`` [1] supports it (it should > be easy to add it to ``MultinomialNB`` too [2]). > For clustering you should have a loo

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Peter Prettenhofer
Hi Rafael, Incremental learning is supported via ``partial_fit``; however, for supervised learning only ``SGDClassifier`` [1] supports it (it should be easy to add it to ``MultinomialNB`` too [2]). For clustering you should have a look at ``MiniBatchKMeans`` [3]; it supports ``partial_fit`` too - a
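
For illustration, a minimal ``partial_fit`` sketch with made-up data chunks; the chunking loop and array shapes are placeholders, and the full set of class labels has to be passed on the first call.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier()
    all_classes = np.array([0, 1])

    # Feed the data chunk by chunk; every class label has to be declared
    # on the first call so the model can allocate its weights up front.
    for _ in range(5):
        X_chunk = rng.randn(200, 10)
        y_chunk = rng.randint(0, 2, 200)
        clf.partial_fit(X_chunk, y_chunk, classes=all_classes)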

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Lars Buitinck
2012/5/11 Rafael Calsaverini : > Any of the algorithms implemented in scikit-learn can be incrementally > trained? All estimators that have a partial_fit method can be trained incrementally. That excludes the text vectorizer, unfortunately, but it includes SGDClassifier (approximate linear SVM/log

Re: [Scikit-learn-general] Online or incremental training

2012-05-11 Thread Olivier Grisel
2012/5/11 Rafael Calsaverini : > Any of the algorithms implemented in scikit-learn can be incrementally > trained? > > Three particular things are interesting to me: classifying texts, > unsupervised clustering analysis of texts and hierarchical clustering > analysis of texts. But my set of texts i

[Scikit-learn-general] Online or incremental training

2012-05-11 Thread Rafael Calsaverini
Can any of the algorithms implemented in scikit-learn be incrementally trained? Three particular things are interesting to me: classifying texts, unsupervised clustering analysis of texts and hierarchical clustering analysis of texts. But my set of texts is just too big to load in memory all at on

Re: [Scikit-learn-general] Renaming clustering parameters

2012-05-11 Thread Bertrand Thirion
> The current idea would be to use n_clusters for all clustering > algorithms and n_components > for GMM. +1 B

Re: [Scikit-learn-general] Renaming clustering parameters

2012-05-11 Thread Gael Varoquaux
On Fri, May 11, 2012 at 11:49:50AM +0200, Andreas Mueller wrote: > The current idea would be to use n_clusters for all clustering > algorithms and n_components > for GMM. > Comments? I agree with this choice. It needs to go through a phase of deprecation, but I find that it is a good policy.

[Scikit-learn-general] Renaming clustering parameters

2012-05-11 Thread Andreas Mueller
Hi everybody. I recently opened an issue on renaming the clustering parameters: https://github.com/scikit-learn/scikit-learn/issues/844 At the moment, the parameter in KMeans and MiniBatchKMeans and SpectralClustering is called k, and n_clusters in ward. The number of cluster centers in GMM is ca
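
For illustration only, a sketch of how estimator construction would read under the proposed naming (not the API as it stood at the time, where KMeans still took k):

    from sklearn.cluster import KMeans, MiniBatchKMeans, SpectralClustering
    from sklearn.mixture import GMM

    # Under the proposal all clustering estimators share the same name ...
    km = KMeans(n_clusters=8)
    mbk = MiniBatchKMeans(n_clusters=8)
    sc = SpectralClustering(n_clusters=8)

    # ... while GMM keeps n_components, since mixture components are not
    # clusters in the strict sense.
    gmm = GMM(n_components=8)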

Re: [Scikit-learn-general] Get TF-IDF mapped with associated word vector

2012-05-11 Thread Olivier Grisel
2012/5/10 JAGANADH G : > Hi all > > Is there any way to get the TF-IDF value mapped with the word vector in > sklearn. > > I would like to get output like > > w1 -> TF-IDF > w2 -> TF-IDF TF is sample-dependent but the IDF weights for each feature index are stored as an array attribute named `idf_`
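
A minimal sketch of that lookup on a toy corpus, assuming the `idf_` attribute described above together with the `vocabulary_` mapping of the count vectorizer:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

    # Toy corpus standing in for the real documents.
    docs = ["the cat sat", "the dog sat", "the cat and the dog barked"]

    counts = CountVectorizer()
    X_counts = counts.fit_transform(docs)

    tfidf = TfidfTransformer()
    X_tfidf = tfidf.fit_transform(X_counts)

    # idf_ holds one IDF weight per feature index; pairing it with the
    # vocabulary gives the "w -> IDF" mapping asked for above.  The TF part
    # is per-document, so it lives in the rows of X_tfidf rather than in a
    # single per-word number.
    for word, idx in sorted(counts.vocabulary_.items()):
        print(word, tfidf.idf_[idx])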

Re: [Scikit-learn-general] Need for Speed liftoff: Linear Regression models

2012-05-11 Thread Olivier Grisel
2012/5/11 Vlad Niculae : > A significant part of this project will consist of the benchmark suite > itself, that will need to be run by the CI we will deploy. > > The question is where to host the benchmark suite. Should I create a new repo > in the scikit-learn project? > > scikit-learn/speed >

[Scikit-learn-general] Multinomial Logistic Regression via SGD

2012-05-11 Thread Peter Prettenhofer
Hi list, I've started a work-in-progress PR on multinomial logistic regression for the SGD module. You can find it here [1]. I would really appreciate your input - especially on issues such as API, learning rate schedule, implementation. thanks, Peter [1] https://github.com/scikit-learn/scikit-

Re: [Scikit-learn-general] Need for Speed liftoff: Linear Regression models

2012-05-11 Thread Peter Prettenhofer
I'm +1 for scikit-learn/scikit-learn-speed or scikit-learn/scikit-learn-vbench (if that's what we intend to use for performance regression tests). BTW: please keep me posted w.r.t. your efforts - I would really like to add some performance regression tests for the SGD module - I've the feeling tha

Re: [Scikit-learn-general] Need for Speed liftoff: Linear Regression models

2012-05-11 Thread Vlad Niculae
A significant part of this project will consist of the benchmark suite itself, that will need to be run by the CI we will deploy. The question is where to host the benchmark suite. Should I create a new repo in the scikit-learn project? scikit-learn/speed scikit-learn/scikit-learn-speed scikit-