[Scikit-learn-general] Coordinated descent in linear models beyond squared loss GSOC

2012-03-27 Thread Immanuel B
Hello all, before attempting a detailed proposal I would like to discuss the big picture with you. I went through the two referenced papers and my feeling is that glmnet as a coordinate descent method could be a good choice, especially since the connection with the strong rule approach is already available

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Dimitrios Pritsos
I think I found it - but I have to test it again with the whole data set and let you know. So when I am using only one tag in the Y, for example [1, 1, 1, 1, 1, 1, 1, 1], it returns the error I mentioned in my first post. But when I have something like this [1, 1, 1, 1, 1, 1, 1, 2]. It
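The observation above (the fit fails when Y holds only one distinct label) suggests checking the labels before calling fit. The following is a minimal sketch, assuming the labels are a plain Python list; the helper name `has_multiple_classes` is hypothetical and is not part of scikit-learn.

```python
def has_multiple_classes(labels):
    """Return True if the label sequence contains at least two distinct values."""
    return len(set(labels)) >= 2


# The two label lists quoted in the message above.
single_class = [1, 1, 1, 1, 1, 1, 1, 1]
two_classes = [1, 1, 1, 1, 1, 1, 1, 2]

print(has_multiple_classes(single_class))  # False
print(has_multiple_classes(two_classes))   # True
```

A check like this only reports the condition described in the message; it does not confirm that a single-class Y is the only way to trigger the error.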

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Gael Varoquaux
On Tue, Mar 27, 2012 at 08:20:11PM +0300, Dimitrios Pritsos wrote: > So should I send the whole thing or the parts that create the matrix? Just save X and y and create a gist that can reproduce the problem without the external dependencies. G

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Dimitrios Pritsos
Hello Peter, Yes I can do that but the code uses a lib I have implemented for raw HTML to Vector conversion. So should I send the whole thing or the parts that create the matrix? Regards, Dimitrios On 03/27/2012 08:08 PM, Peter Prettenhofer wrote: > Dimitrios, > > please provide an

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Dimitrios Pritsos
Hello Vlad, Yes, 18 is just for debugging because I have implemented a Locally Weighted Bag Of Words that requires several Gaussian PDFs to smooth out the data, and it is quite a time-consuming process. So 18 is just enough for debugging. Later I will use about 800 etc. Train_Y is a list si

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Peter Prettenhofer
Dimitrios, please provide an example script so that we can reproduce the error. BTW: gist [1] is a handy tool to distribute scripts. [1] https://gist.github.com/ best, Peter 2012/3/27 Dimitrios Pritsos : > > Hello, > > While I am using svm.sparse.SVC with a scipy.sparse.csr_matrix the following >

Re: [Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Vlad Niculae
Hello Dimitrios, You only have 18 samples? What is the shape of your train_Y? Best, Vlad On Mar 27, 2012, at 19:31 , Dimitrios Pritsos wrote: > > Hello, > > While I am using svm.sparse.SVC with a scipy.sparse.csr_matrix the following error > occurs: > > File > "/home/dimitrios/Development_Wo

[Scikit-learn-general] svm.sparse.SVC(): Unexpected ERROR

2012-03-27 Thread Dimitrios Pritsos
Hello, While I am using svm.sparse.SVC with a scipy.sparse.csr_matrix the following error occurs: File "/home/dimitrios/Development_Workspace/webgenreidentification/src/experiments_lowbow.py", line 115, in evaluate csvm.fit(train_X, train_Y) File "/usr/local/lib/python2.6/dist-packages/s

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Peter Prettenhofer
2012/3/27 Paolo Losi : > Gilles, > > thank you very much for having checked. > > If everyone agrees I'll: > > - uncomment extratrees and randomforest benchmark (@pprett is there >   any valid reason to leave them out?) no, absolutely not - I just forgot to uncomment them - thx > - explicitly conf

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Olivier Grisel
On 27 March 2012 14:50, Paolo Losi wrote: > Gilles, > > thank you very much for having checked. > > If everyone agrees I'll: > > - uncomment extratrees and randomforest benchmark (@pprett is there >   any valid reason to leave them out?) They are far slower to run than the others. Ideally a com

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Paolo Losi
Gilles, thank you very much for having checked. If everyone agrees I'll: - uncomment extratrees and randomforest benchmark (@pprett is there any valid reason to leave them out?) - explicitly config max_features=None for RandomForest and ExtraTrees Thanks again Paolo On Tue, Mar 27, 2012 at

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Gilles Louppe
Hi, Using max_features="auto" (default setting) indeed yields the results that Paolo reports. When setting max_features=None (i.e., using all features as in our earlier code), I got the following on my machine: RandomForest 778.1471s 1.2830s 0.0248 Extra-Trees 1325.2397s 1.3544s 0.01

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Peter Prettenhofer
Interesting - covtype involves a number of categorical attributes which are represented via a one-hot encoding - do you think that such a representation has a significant effect on feature sampling and thus the performance of random forests? 2012/3/27 Gilles Louppe : > Hi, > > I am running the tes

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Gilles Louppe
Hi, I am running the tests again, but indeed I think the difference in the results comes from the fact that max_features=sqrt(n_features) is now the default whereas it was max_features=n_features before. Gilles On 27 March 2012 11:56, Paolo Losi wrote: > Thanks Peter, > > On Tue, Mar 27, 2012 at 1
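To make the size of that change concrete, here is a minimal sketch of how many features are examined per split under the two settings described above. The helper `n_candidate_features` is hypothetical and only mirrors the heuristic as stated in the message; it is not scikit-learn's own code. The feature count of 54 for covertype is an assumption and is not stated in the thread.

```python
import math


def n_candidate_features(n_features, max_features):
    """Number of features considered at each split.

    max_features=None   -> all features (the earlier behaviour)
    max_features="sqrt" -> int(sqrt(n_features)) (the newer default heuristic)
    """
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))


n_features = 54  # assumption: covertype feature count, not given in the thread

print(n_candidate_features(n_features, None))    # 54
print(n_candidate_features(n_features, "sqrt"))  # 7
```

Under these assumptions each split examines roughly an eighth of the features, which fits the shorter training times but would also be expected to change the error rates.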

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Paolo Losi
Thanks Peter, On Tue, Mar 27, 2012 at 11:34 AM, Peter Prettenhofer < peter.prettenho...@gmail.com> wrote: > Paolo, > > I noticed that too - maybe @glouppe can comment on this - I think the > reason was a change in the ``n_features`` heuristic but I might be > mistaken. > Gilles, can you give a q

Re: [Scikit-learn-general] covertype benchmark and unexpected extra trees and random forest results

2012-03-27 Thread Peter Prettenhofer
Paolo, I noticed that too - maybe @glouppe can comment on this - I think the reason was a change in the ``n_features`` heuristic but I might be mistaken. Concerning the GaussianNB - there's a PR [1] addressing a critical bug in the estimator - it should be merged ASAP. Furthermore, test time is qu

Re: [Scikit-learn-general] tf-idf changes

2012-03-27 Thread Jaques Grobler
Thanks a lot. I've let the author know :) On 26 March 2012 14:14, Jaques Grobler wrote: > > > Hi everyone- > > > > > > I stumbled upon this post that offers a quick run-through of > > > text-feature-extraction using > > > sklearn.feature_extraction.text's CountVectorizer: > > > > > > > > > http:

Re: [Scikit-learn-general] To prospective GSOC students

2012-03-27 Thread Andreas
On 03/27/2012 12:41 AM, David Warde-Farley wrote: > On Mon, Mar 26, 2012 at 11:38:51PM +0200, Gael Varoquaux wrote: > > >> Also, for more senior contributors, if you feel like being a mentor, >> don't hesitate to contact me. It would be great to have a fair number of >> prospective mentors with