On Thu, Mar 28, 2013 at 8:37 AM, Lars Buitinck wrote:
> There's already a pull request that speeds up CountVectorizer and
> returns a csr_matrix. I think we should merge it in soon.
Actually, it returns a CSC matrix and we were arguing whether it
should be the default output or not.
I think that
2013/3/27 Anne Dwyer :
> Just to clarify, you are saying that there is no procedure in scikit that
> will transform categorical feature values into numerical values like I was
> trying to do here. Correct?
Not that I know of. DictVectorizer comes quite close, though.
--
Lars Buitinck
Scientific
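
As a rough illustration of the DictVectorizer suggestion above, here is a minimal sketch. The feature names "pclass" and "sex" and the three records are assumptions made up for the example; they are not taken from Anne's actual data. DictVectorizer one-hot encodes string-valued entries, so each (name, value) pair becomes its own column.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: the keys "pclass" and "sex" are illustrative names.
# String values are treated as categorical and expanded to one column each.
records = [
    {"pclass": "3", "sex": "male"},
    {"pclass": "1", "sex": "female"},
    {"pclass": "3", "sex": "female"},
]

vec = DictVectorizer(sparse=False)  # dense output is easier to inspect
X = vec.fit_transform(records)

# feature_names_ lists the generated columns in "name=value" form.
print(vec.feature_names_)
print(X)
```

The resulting array can be passed to an SVM directly; with many categories, leaving sparse=True (the default) is likely the better choice.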
2013/3/27 Tom Fawcett :
> I’ve identified a bug/inconsistency in sklearn.feature_extraction.text.
> TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix; whereas
> CountVectorizer returns scipy.sparse.coo.coo_matrix, which doesn’t support
> multiple (array) indexing.
>
> Below is a
I’ve identified a bug/inconsistency in sklearn.feature_extraction.text.
TfidfVectorizer returns a matrix of type scipy.sparse.csr.csr_matrix; whereas
CountVectorizer returns scipy.sparse.coo.coo_matrix, which doesn’t support
multiple (array) indexing.
Below is a short (silly) example that demonstr
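
The original example is cut off above, so what follows is not Tom's code. It is a small sketch of one possible workaround for the indexing issue he describes: convert the COO matrix to CSR with tocsr() before indexing with an array of row numbers. The toy matrix is an assumption for illustration only.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy data standing in for a vectorizer's output (illustrative only).
dense = np.array([[1, 0, 2],
                  [0, 3, 0],
                  [4, 0, 5]])
X_coo = coo_matrix(dense)

# CSR supports row selection with a list or array of indices,
# so convert first and index afterwards.
X_csr = X_coo.tocsr()
rows = X_csr[[0, 2]]

print(rows.toarray())
```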
Thanks for your answer.
Just to clarify, you are saying that there is no procedure in scikit that
will transform categorical feature values into numerical values like I was
trying to do here. Correct?
Anne Dwyer
On Wed, Mar 27, 2013 at 4:05 PM, Lars Buitinck wrote:
> 2013/3/27 Anne Dwyer :
> >
Hello all
Is there any example of using datasets.load_svmlight_file() already
available or any data set in sparse matrix format to be used along
with this function in scikit-learn?
Regards
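
For what it's worth, a minimal sketch of how load_svmlight_file() can be tried out without an external data set: write a tiny file with dump_svmlight_file() and read it back. The toy array, the labels, and the file name "toy.svmlight" are assumptions made up for this example.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Tiny illustrative data set: two samples, three features.
X = np.array([[0.0, 1.5, 0.0],
              [2.0, 0.0, 3.0]])
y = np.array([1, -1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.svmlight")  # hypothetical file name
    # Write the data in svmlight/libsvm text format.
    dump_svmlight_file(X, y, path, zero_based=True)
    # Read it back: X_loaded is a scipy.sparse CSR matrix, y_loaded an array.
    X_loaded, y_loaded = load_svmlight_file(path, n_features=3, zero_based=True)

print(X_loaded.toarray())
print(y_loaded)
```

Passing n_features explicitly is mainly useful when loading separate train and test files that must end up with the same number of columns.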
2013/3/27 Anne Dwyer :
> I'm trying to convert categorical data to input data for an SVM. I am trying
> to transform the data first to label encoded data, then use the one hot
> encoding procedure.
>
> Can someone please explain what I am doing wrong?
LabelEncoder is for class labels, not features. I
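
To make the distinction concrete, a small sketch: LabelEncoder applied to a target vector, and OneHotEncoder applied to the feature columns. This assumes a recent scikit-learn release in which OneHotEncoder accepts string categories directly; releases from the time of this thread expected integer input. The label values are hypothetical, and the feature rows mirror the first three rows of the subset quoted in this thread.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Feature rows as in the quoted subset; both columns are categorical strings.
features = np.array([["3", "male"],
                     ["1", "female"],
                     ["3", "female"]])
# Hypothetical class labels, invented for this illustration.
labels = ["yes", "no", "yes"]

# LabelEncoder maps class labels to integers 0..n_classes-1 (targets only).
y = LabelEncoder().fit_transform(labels)

# OneHotEncoder expands each feature column into one indicator per category.
enc = OneHotEncoder()
X = enc.fit_transform(features).toarray()

print(enc.categories_)
print(X)
print(y)
```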
I'm trying to convert categorical data to input data for an SVM. I am
trying to transform the data first to label encoded data, then use the one
hot encoding procedure.
Here's a subset of the data that I'm trying to transform:
array_both=
[['3' 'male']
['1' 'female']
['3' 'female']
['1' 'female