gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379
On 18 August 2017 at 01:26, Joel Nothman wrote:
> I don't consider LabelBinarizer the best workaround.
>
> Given a Pandas dataframe df, I'd use:
>
> DictVectorizer().fit_transform(df.to_dict(orient='records'))
>
> which wi
I don't consider LabelBinarizer the best workaround.
Given a Pandas dataframe df, I'd use:
DictVectorizer().fit_transform(df.to_dict(orient='records'))
which will one-hot encode string features and pass numerical features
through as single columns. Or:
class PandasVectorizer(DictVectorizer):
def f
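
For reference, a minimal sketch of the DictVectorizer one-liner suggested above (not the truncated PandasVectorizer class). The dataframe, its column names and its values are made up for illustration. The last step shows the behaviour that matters for the "new levels" question: a category not seen during fit is silently ignored at transform time.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data: one string (categorical) column, one numeric column.
df = pd.DataFrame({"city": ["Vienna", "Graz", "Vienna"], "rooms": [2, 3, 1]})

# sparse=False returns a dense array, which is easier to inspect here.
vec = DictVectorizer(sparse=False)

# Each row becomes a dict; string values are one-hot encoded as
# "column=value" features, numeric values are kept as a single column.
X = vec.fit_transform(df.to_dict(orient="records"))
print(vec.feature_names_)
print(X)

# A level not seen during fit ("Linz") is silently ignored, so the
# one-hot columns for "city" are all zero in the transformed row.
X_new = vec.transform([{"city": "Linz", "rooms": 4}])
print(X_new)
```

Whether silently dropping unseen levels is acceptable depends on the model; it avoids an error at predict time but also hides that the level was new.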
Hi Georg.
Unfortunately this is not entirely trivial right now, but will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).
LabelBinarizer is probably the best work-around for now,
Hi,
how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
goals
- scikit-learn style fit/transform methods to encode the labels of
categorical features of X
- should handl