Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379

On 18 August 2017 at 01:26, Joel Nothman wrote:
> I don't consider LabelBinarizer the best workaround.
>
> Given a Pandas dataframe df, I'd use:
>
> DictVectorizer().fit_transform(df.to_dict(orient='records'))
>
> which wi

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
I don't consider LabelBinarizer the best workaround.

Given a Pandas dataframe df, I'd use:

    DictVectorizer().fit_transform(df.to_dict(orient='records'))

which will handle encoding strings with one-hot and numerical features as column vectors. Or:

    class PandasVectorizer(DictVectorizer):
        def f
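A minimal sketch of the DictVectorizer route described above. The DataFrame, its columns `color` (string) and `size` (numeric), and the values are hypothetical, invented for illustration; `sparse=False` is passed only so the result is a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy frame: one string (categorical) column, one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})

# Each row becomes a dict; string values are one-hot encoded as
# "feature=value" columns, numeric values are passed through unchanged.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(df.to_dict(orient="records"))

print(vec.feature_names_)  # ['color=blue', 'color=red', 'size']
print(X)

# A value not seen during fit has no column of its own, so it is dropped.
X_new = vec.transform([{"color": "green", "size": 1.0}])
print(X_new)  # [[0. 0. 1.]]
```

Because DictVectorizer ignores features it did not see during fit, an unseen category level simply produces zeros in the one-hot columns rather than an error.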

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Andreas Mueller
Hi Georg.

Unfortunately this is not entirely trivial right now, but it will be fixed by https://github.com/scikit-learn/scikit-learn/pull/9151 and https://github.com/scikit-learn/scikit-learn/pull/9012, which will be in the next release (0.20). LabelBinarizer is probably the best work-around for now,
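A minimal sketch of the LabelBinarizer work-around, applied to one categorical column and then stacked with the numeric column. The column names and data are hypothetical. LabelBinarizer is designed for 1-D label arrays, so it needs one instance per categorical column, and for a column with exactly two classes it returns a single 0/1 column rather than two.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy frame: one categorical column, one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],
    "size": [1.0, 2.5, 3.0, 0.5],
})

# One LabelBinarizer per categorical column; classes_ are stored sorted.
lb = LabelBinarizer()
color_onehot = lb.fit_transform(df["color"])  # shape (4, 3), int

# Stack the one-hot block next to the numeric column(s).
X = np.hstack([color_onehot, df[["size"]].to_numpy()])

print(lb.classes_)
print(X)
```

The fitted `lb` can then be reused with `lb.transform(...)` on new data so that the column order stays the same as at fit time.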

[scikit-learn] Categorical handling

2017-08-17 Thread Georg Heiler
Hi, how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934

goals
- scikit-learn style fit/transform methods to encode labels of categorical features of X
- should handl
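For readers on scikit-learn 0.20 or later, the goals above can be met with `OneHotEncoder`, which accepts string columns from that release on, combined with `ColumnTransformer`. This is a minimal sketch, assuming the truncated goal concerns category levels that only appear at transform time (as the linked question's title suggests); the data and column names are hypothetical. With `handle_unknown="ignore"`, an unseen level is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "green" never appears in the training frame.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})
test = pd.DataFrame({"color": ["green", "blue"], "size": [0.5, 2.0]})

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",  # keep "size" as-is, appended after the one-hot columns
    sparse_threshold=0,       # always return a dense array
)

X_train = ct.fit_transform(train)  # categories learned in fit: ['blue', 'red']
X_test = ct.transform(test)        # "green" -> all-zero one-hot columns
print(X_test)
```

Since the categories are learned in `fit` and only reused in `transform`, training and test matrices always have the same columns in the same order.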