Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Florian Lindner
On Monday, 25 November 2013, 12:33:25, abhishek wrote: > a simple way of cleaning the html tags is using NLTK's "clean_html" Hey, thx, didn't know about that. Just for information: this is now done by BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text It will so

[Scikit-learn-general] Fwd: Problem with scikit learn kernel PCA

2013-11-25 Thread Vlad Niculae
Dear Mohammadjavad, Questions of this kind are best directed to the scikit-learn mailing list (and I am forwarding it there). In this case, as the preimage is just the inverse transformation between spaces, I don't think it would make much sense to use a different kernel, so I guess it will be the

[Scikit-learn-general] Combine probabilities

2013-11-25 Thread Abhi
How can we combine probabilities from multiple classifiers in sklearn? [The classifiers are trained on datasets of a similar type, the difference being their sizes and the way each result might be used]. I am using SGDClassifier to train the individual classifiers, and need to choose the best amongst them. B
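One common way to approach the question above is a weighted average of the per-class probabilities ("soft voting"). The following is only a minimal sketch under assumptions: the function name `average_probabilities` and the weights are hypothetical, each classifier is assumed to output probabilities for the same classes in the same order, and nothing here is taken from the replies in the thread.

```python
def average_probabilities(prob_lists, weights=None):
    """Weighted average of class probabilities from several classifiers.

    prob_lists: one list of class probabilities per classifier, all using
                the same class order (an assumption of this sketch).
    weights:    optional non-negative weight per classifier, for example
                proportional to the size of its training set.
    """
    if not prob_lists:
        raise ValueError("need probabilities from at least one classifier")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    if len(weights) != len(prob_lists):
        raise ValueError("need exactly one weight per classifier")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(prob_lists, weights):
        if len(probs) != n_classes:
            raise ValueError("all classifiers must report the same classes")
        for i, p in enumerate(probs):
            # Normalising by the total weight keeps the result a distribution.
            combined[i] += weight * p / total
    return combined


# Two classifiers, the first trusted three times as much as the second.
combined = average_probabilities([[0.8, 0.2], [0.4, 0.6]], weights=[3, 1])
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

Averaging only makes sense if each model's probabilities are reasonably calibrated; whether that holds for a given set of classifiers is something to check on held-out data.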

Re: [Scikit-learn-general] Random forest with zero features

2013-11-25 Thread Michal Romaniuk
Hi everyone, I submitted a pull request to enable grid_search with failing classifiers. Did anyone have some time to look at it? Thanks, Michal On 08/11/13 17:56, Michal Romaniuk wrote: > Did anyone work on this problem (exceptions raised by classifiers in > grid search) since? I would be happy

Re: [Scikit-learn-general] MiniBatchKmeans crashes

2013-11-25 Thread Douwe Kiela
On Sun, Nov 24, 2013 at 6:02 PM, Olivier Grisel wrote: > Thanks for the reproduction case. Could you please open a new issue on > github? Just for the sake of completeness, the ticket is here: https://github.com/scikit-learn/scikit-learn/issues/2611 Let me know if there is anything I can do to

Re: [Scikit-learn-general] MiniBatchKmeans crashes

2013-11-25 Thread Jaques Grobler
@ogrisel I can reproduce this but at first glance don't really know what's causing this. You have any thoughts on this crash, Olivier? Regards, J 2013/11/24 Olivier Grisel > Thanks for the reproduction case. Could you please open a new issue on > github? > > -- > Olivier > > >

Re: [Scikit-learn-general] AdaBoostClassifier work with sparse matrix

2013-11-25 Thread Mathieu Blondel
Adaboost seems to always enforce dense arrays, irrespective of the base estimator: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/weight_boosting.py#L93 It should at least be possible to use Adaboost with sparse matrices if the base estimator supports them (which is the

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Tadej Stajner
Hi, Python has the built-in email package which could be useful for you at least for the multipart stuff and the metadata. http://docs.python.org/2/library/email-examples.html http://docs.python.org/3/library/email-examples.html On how to construct features, it depends on what you need to do -
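As a rough illustration of the built-in `email` package mentioned above, the sketch below parses a raw message, reads two headers, and collects the text parts of a multipart message. The helper name `extract_text_parts` and the choice of headers are assumptions made for the example, not something proposed in the thread.

```python
from email import message_from_string


def extract_text_parts(raw_message):
    """Return (headers, bodies) for a raw RFC 822 message string.

    headers: dict with the Subject and From header values.
    bodies:  list of (content_type, decoded_text) for every text/* part.
    """
    msg = message_from_string(raw_message)
    headers = {
        "subject": msg.get("Subject", ""),
        "from": msg.get("From", ""),
    }
    bodies = []
    # walk() visits the message itself and every nested MIME part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no text of their own
        if part.get_content_maintype() != "text":
            continue  # skip attachments, images, and so on
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        charset = part.get_content_charset() or "utf-8"
        bodies.append(
            (part.get_content_type(), payload.decode(charset, errors="replace"))
        )
    return headers, bodies
```

The `text/plain` bodies can be passed to a text vectorizer directly, while `text/html` bodies would still need their markup removed first.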

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Jaques Grobler
@Florian - Abhishek's suggestion is the way to go. Simple and works well. 2013/11/25 abhishek > a simple way of cleaning the html tags is using NLTK's "clean_html" > > > On Mon, Nov 25, 2013 at 12:30 PM, Jaques Grobler > wrote: > >> Hey Florian, >> >> So you need some lexical analyzer to re

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread abhishek
a simple way of cleaning the html tags is using NLTK's "clean_html" On Mon, Nov 25, 2013 at 12:30 PM, Jaques Grobler wrote: > Hey Florian, > > So you need some lexical analyzer to remove all the HTML tags etc before > you start your classification? > I'm not sure about any ready-to-use packages

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Jaques Grobler
Hey Florian, So you need some lexical analyzer to remove all the HTML tags etc before you start your classification? I'm not sure about any ready-to-use packages for this (I'm sure they're out there), but I've played around with Python's `re` module at some point and now found this which might be u
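For reference, stripping tags with the `re` module could look roughly like the sketch below. The patterns and the function name `strip_html` are illustrative assumptions, not the code the message links to, and regular expressions are fragile on malformed markup, so a real HTML parser is usually the safer choice.

```python
import re
from html import unescape

# Drop <script> and <style> elements together with their contents.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Any remaining tag: "<", then everything up to the next ">".
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def strip_html(text):
    """Remove HTML tags and entities, returning whitespace-normalised text."""
    text = SCRIPT_STYLE_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = unescape(text)  # turn entities such as &amp; into characters
    return WHITESPACE_RE.sub(" ", text).strip()
```

Replacing tags with a space rather than an empty string keeps words in adjacent elements from being glued together before tokenisation.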

Re: [Scikit-learn-general] AdaBoostClassifier work with sparse matrix

2013-11-25 Thread Olivier Grisel
2013/11/22 Yi Pan : > Dear scikit-learn persons, > > This is Pan Yi from the University of Washington, US. I am currently working > on a course project, exploring the performance of AdaBoostClassifier when > using the same base classifier, such as DecisionTreeClassifier, Perceptron, > > KNeighborsC

[Scikit-learn-general] AdaBoostClassifier work with sparse matrix

2013-11-25 Thread Yi Pan
Dear scikit-learn persons, This is Pan Yi from the University of Washington, US. I am currently working on a course project, exploring the performance of AdaBoostClassifier when using the same base classifier, such as DecisionTreeClassifier, Perceptron, KNeighborsClassifier, or mixing different c

[Scikit-learn-general] MiniBatchKmeans crashes

2013-11-25 Thread Douwe Kiela
(I am not on this list so please CC.) Hi, The MiniBatchKmeans implementation in sklearn/cluster/k_means_.py crashes rather ungracefully on line 860 with the following Traceback: Init 1/3 with method: k-means++ /usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py:1146: RuntimeWarnin