subject:"[Scikit-learn-general] Cleaning/feature extraction of e-mail messages"

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Florian Lindner

On Monday, 25 November 2013, 12:33:25, abhishek wrote:
> a simple way of cleaning the html tags is using NLTK's "clean_html"
Hey, thx, didn't know about that. Just for information: this is now done by BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text It will so

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Tadej Stajner

Hi, Python has the built-in email package which could be useful for you at least for the multipart stuff and the metadata. http://docs.python.org/2/library/email-examples.html http://docs.python.org/3/library/email-examples.html On how to construct features, it depends on what you need to do -
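A minimal sketch of what the built-in email package can do for the multipart handling and the metadata mentioned above. It assumes Python 3 and a raw message held in a string; the helper name `extract_mail_parts` and the choice of headers are illustrative, not something taken from this thread.

```python
from email import message_from_string
from email.policy import default


def extract_mail_parts(raw_message):
    """Return (metadata, bodies) for a raw RFC 822 message string.

    metadata is a dict of a few header values; bodies is a list of
    (content_type, text) tuples, one per text/* leaf part.
    """
    # policy=default yields an EmailMessage, which provides get_content().
    msg = message_from_string(raw_message, policy=default)

    # Missing headers fall back to an empty string.
    metadata = {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
    }

    bodies = []
    for part in msg.walk():
        # Multipart containers hold no text themselves; only leaves do.
        if part.is_multipart():
            continue
        if part.get_content_maintype() == "text":
            bodies.append((part.get_content_type(), part.get_content()))
    return metadata, bodies
```

The text/plain and text/html bodies come back separately, so the HTML part can be cleaned on its own before any features are built from it.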

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Jaques Grobler

@Florian - Abhishek's suggestion is the way to go. Simple and works well.
2013/11/25 abhishek
> a simple way of cleaning the html tags is using NLTK's "clean_html"
>
> On Mon, Nov 25, 2013 at 12:30 PM, Jaques Grobler wrote:
>> Hey Florian,
>>
>> So you need some lexical analyzer to re

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread abhishek

a simple way of cleaning the html tags is using NLTK's "clean_html"
On Mon, Nov 25, 2013 at 12:30 PM, Jaques Grobler wrote:
> Hey Florian,
>
> So you need some lexical analyzer to remove all the HTML tags etc before
> you start your classification?
> I'm not sure about any ready-to-use packages

Re: [Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-25 Thread Jaques Grobler

Hey Florian, So you need some lexical analyzer to remove all the HTML tags etc before you start your classification? I'm not sure about any ready-to-use packages for this (I'm sure they're out there), but I've played around with Python's `re` module at some point and now found this which might be u
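For illustration only, a rough sketch of tag stripping with the `re` module. This is not the code the message refers to (the snippet is cut off); the function name `strip_html` and the patterns are assumptions. Regular expressions cannot parse arbitrary HTML reliably, so treat it as a crude cleaner for simple mail bodies rather than a general solution.

```python
import re
from html import unescape

# Drop <script>/<style> blocks together with their contents.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Anything else that looks like a tag: "<" up to the next ">".
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def strip_html(html_text):
    """Return html_text with tags removed and entities decoded."""
    text = SCRIPT_STYLE_RE.sub(" ", html_text)
    text = TAG_RE.sub(" ", text)
    # Turn entities such as &amp; back into plain characters.
    text = unescape(text)
    # Collapse the whitespace left behind by removed tags.
    return WHITESPACE_RE.sub(" ", text).strip()
```

Replacing tags with a space instead of an empty string keeps words from adjacent elements from running together, which matters if the result is tokenized afterwards.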

[Scikit-learn-general] Cleaning/feature extraction of e-mail messages

2013-11-24 Thread Florian Lindner

Hello, I want to use scikit-learn for mail classification (not spam detection). I haven't really worked with machine learning software (besides end-user spam filters). What I have done so far: vectorizer = TfidfVectorizer(input='filename', preprocessor=mail_preprocessor, decode_error="ignore") X

6 matches