Re: [Scikit-learn-general] Small bug/inconsistency in sklearn.feature_extraction.text.CountVectorizer

2013-03-27 Thread Mathieu Blondel
On Thu, Mar 28, 2013 at 8:37 AM, Lars Buitinck wrote:
> There's already a pull request that speeds up CountVectorizer and
> returns a csr_matrix. I think we should merge it in soon.

Actually, it returns a CSC matrix, and we were arguing about whether it should be the default output or not. I think that
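Whichever sparse format ends up as the default, scipy.sparse matrices convert between formats with a single call. The sketch below assumes a recent SciPy, and its toy count matrix is illustrative only. It shows that CSC and CSR hold the same values with a different internal layout: CSC favours column slicing, CSR favours row (document) selection.

```python
import numpy as np
import scipy.sparse as sp

# Toy document-term counts (illustrative values only).
dense = np.array([[1, 0, 2],
                  [0, 3, 0],
                  [4, 0, 5]])

X_csc = sp.csc_matrix(dense)  # column-oriented layout: cheap column slicing
X_csr = X_csc.tocsr()         # row-oriented layout: cheap row slicing

print(X_csc.format, X_csr.format)  # csc csr

# Both formats store the same values; only the internal layout differs.
assert (X_csc.toarray() == X_csr.toarray()).all()

# Selecting documents (rows) with an index list is natural on CSR.
rows = X_csr[[0, 2]]
print(rows.toarray())
```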

Re: [Scikit-learn-general] Questions about converting categorical data into input data for an SVM

2013-03-27 Thread Lars Buitinck
2013/3/27 Anne Dwyer :
> Just to clarify, you are saying that there is no procedure in scikit that
> will transform categorical feature values into numerical values like I was
> trying to do here. Correct?

Not that I know of. DictVectorizer comes quite close, though.

--
Lars Buitinck
Scientific
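A minimal sketch of the DictVectorizer route, assuming each row is first turned into a dict. The keys "pclass" and "sex" are hypothetical names chosen for illustration. String values are one-hot encoded as "key=value" columns, while numeric values pass through unchanged, so passing '3' as a string instead of 3 would one-hot encode it too.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: "pclass" is numeric, "sex" is categorical.
records = [
    {"pclass": 3, "sex": "male"},
    {"pclass": 1, "sex": "female"},
]

# sparse=False returns a dense NumPy array, which is easier to inspect here.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# Column names are sorted by default: the numeric key stays a single column,
# the string key becomes one indicator column per observed value.
print(vec.feature_names_)  # ['pclass', 'sex=female', 'sex=male']
print(X)
```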

Re: [Scikit-learn-general] Small bug/inconsistency in sklearn.feature_extraction.text.CountVectorizer

2013-03-27 Thread Lars Buitinck
2013/3/27 Tom Fawcett :
> I’ve identified a bug/inconsistency in sklearn.feature_extraction.text.
> TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix, whereas
> CountVectorizer returns scipy.sparse.coo.coo_matrix, which doesn’t support
> multiple (array) indexing.
>
> Below is a

[Scikit-learn-general] Small bug/inconsistency in sklearn.feature_extraction.text.CountVectorizer

2013-03-27 Thread Tom Fawcett
I’ve identified a bug/inconsistency in sklearn.feature_extraction.text. TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix, whereas CountVectorizer returns scipy.sparse.coo.coo_matrix, which doesn’t support multiple (array) indexing. Below is a short (silly) example that demonstr
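A defensive workaround is to convert the vectorizer output to CSR before fancy indexing. Current scikit-learn releases already return CSR from CountVectorizer, in which case `.tocsr()` is cheap, so the sketch below should behave the same across versions. The documents are made-up examples.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up toy documents.
docs = ["the cat sat", "the dog sat", "a cat and a dog"]

counts = CountVectorizer().fit_transform(docs)
tfidf = TfidfVectorizer().fit_transform(docs)

# Normalise to CSR so array (fancy) row indexing works regardless of
# which sparse format the vectorizer happened to return.
counts = counts.tocsr()

idx = np.array([0, 2])
subset = counts[idx]  # rows 0 and 2 of the count matrix

print(counts.format, tfidf.format)
print(subset.toarray())
```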

Re: [Scikit-learn-general] Questions about converting categorical data into input data for an SVM

2013-03-27 Thread Anne Dwyer
Thanks for your answer. Just to clarify, you are saying that there is no procedure in scikit that will transform categorical feature values into numerical values like I was trying to do here. Correct?

Anne Dwyer

On Wed, Mar 27, 2013 at 4:05 PM, Lars Buitinck wrote:
> 2013/3/27 Anne Dwyer :
> >

[Scikit-learn-general] Sparse Data Format for SVM SCIKIT

2013-03-27 Thread Abdul Wahid Memon
Hello all,

Is there any example of using datasets.load_svmlight_file() already available, or any data set in sparse matrix format that can be used with this function in scikit-learn?

Regards
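A self-contained round trip can serve as an example: write a small sparse matrix in svmlight/libsvm format with `dump_svmlight_file`, read it back with `load_svmlight_file`, and fit an SVM on the result. This is a sketch assuming a recent scikit-learn; the data and labels are toy values, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np
from scipy import sparse
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
from sklearn.svm import SVC

# A tiny toy problem (illustrative values only).
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 0.0, 0.0],
                                [0.0, 5.0, 6.0]]))
y = np.array([0, 1, 0, 1])

# Write in svmlight/libsvm text format to a binary in-memory buffer.
buf = io.BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)
buf.seek(0)

# Read it back; X_loaded is a scipy.sparse CSR matrix, y_loaded a float array.
X_loaded, y_loaded = load_svmlight_file(buf, n_features=X.shape[1],
                                        zero_based=True)

# SVC accepts the sparse matrix directly.
clf = SVC(kernel="linear").fit(X_loaded, y_loaded)
print(X_loaded.format, X_loaded.shape)
```

With a real file, pass its path to `load_svmlight_file` instead of the buffer.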

Re: [Scikit-learn-general] Questions about converting categorical data into input data for an SVM

2013-03-27 Thread Lars Buitinck
2013/3/27 Anne Dwyer :
> I'm trying to convert categorical data to input data for an SVM. I am trying
> to transform the data first to label encoded data, then use the one hot
> encoding procedure.
>
> Can someone please explain what I am doing wrong?

LabelEncoder is for class labels, not features. I
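To make the distinction concrete: LabelEncoder works on a 1-D array of targets, while scikit-learn 0.20 and later provide OrdinalEncoder as the feature-side counterpart that encodes every column of a 2-D matrix at once. The feature rows below are the first three from the data quoted in this thread; the targets `y` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# First three rows of the data quoted in the thread.
X = np.array([['3', 'male'],
              ['1', 'female'],
              ['3', 'female']])

# Hypothetical class labels, for illustration only.
y = np.array(['yes', 'no', 'yes'])

# LabelEncoder: 1-D targets -> integer class codes (classes sorted).
label_codes = LabelEncoder().fit_transform(y)

# OrdinalEncoder (0.20+): 2-D features -> one integer code per column value.
X_codes = OrdinalEncoder().fit_transform(X)

print(label_codes)
print(X_codes)
```

Note that ordinal codes imply an ordering between categories, which is why one-hot encoding is the usual choice for SVM input.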

[Scikit-learn-general] Questions about converting categorical data into input data for an SVM

2013-03-27 Thread Anne Dwyer
I'm trying to convert categorical data to input data for an SVM. I am trying to transform the data first to label encoded data, then use the one hot encoding procedure. Here's a subset of the data that I'm trying to transform:

array_both= [['3' 'male'] ['1' 'female'] ['3' 'female'] ['1' 'female
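In scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, so the intermediate label-encoding step can be dropped. A minimal sketch under that assumption, using the first three rows quoted above and hypothetical binary targets, since none were given:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# First three rows of the data quoted above.
array_both = np.array([['3', 'male'],
                       ['1', 'female'],
                       ['3', 'female']])

# Hypothetical binary targets, for illustration only.
y = np.array([0, 1, 1])

# One indicator column per observed category, per input column:
# ['1', '3'] for the first column, ['female', 'male'] for the second.
enc = OneHotEncoder(handle_unknown="ignore")
X = enc.fit_transform(array_both)  # sparse matrix of 0/1 values

# SVC accepts the sparse one-hot matrix as input.
clf = SVC(kernel="linear").fit(X, y)
print(X.toarray())
```

Wrapping the encoder and the SVC in `sklearn.pipeline.make_pipeline` ensures that new data is encoded with the same categories at prediction time.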