Re: [scikit-learn] Malformed input for SVC(kernel='precomputed').predict()

2017-08-17 Thread Sam Barnett
Hi Andy,
Please find attached a Jupyter notebook showing exactly where the problem appears.
Best,
Sam
On Thu, Aug 17, 2017 at 4:03 PM, Andreas Mueller wrote:
> Hi Sam.
>
> Can you say which test fails exactly and where (i.e. give traceback)?
> The estimator checks are currently quite strict wi

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
gist at https://gist.github.com/jnothman/a75bac778c1eb9661017555249e50379
On 18 August 2017 at 01:26, Joel Nothman wrote:
> I don't consider LabelBinarizer the best workaround.
>
> Given a Pandas dataframe df, I'd use:
>
> DictVectorizer().fit_transform(df.to_dict(orient='records'))
>
> which wi

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Joel Nothman
I don't consider LabelBinarizer the best workaround. Given a Pandas dataframe df, I'd use:
    DictVectorizer().fit_transform(df.to_dict(orient='records'))
which will handle encoding strings with one-hot and numerical features as column vectors. Or:
    class PandasVectorizer(DictVectorizer):
        def f
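A minimal sketch of the DictVectorizer one-liner on a toy DataFrame (the columns color and size and their values are made up for illustration). It shows that string values become one-hot "feature=value" columns while numeric values pass through, and that a level not seen during fit is silently ignored at transform time.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Toy data (illustrative only): one string column, one numeric column.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})
test = pd.DataFrame({"color": ["green", "blue"], "size": [4.0, 0.5]})

# sparse=False returns a dense NumPy array, which is easier to inspect.
vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train.to_dict(orient="records"))

# String values become one-hot "feature=value" columns; numbers pass through.
print(vec.feature_names_)  # ['color=blue', 'color=red', 'size']
print(X_train)

# A level not seen during fit ("green") is ignored, so both color columns
# are zero for that row.
X_test = vec.transform(test.to_dict(orient="records"))
print(X_test)
```

Because the vocabulary is fixed at fit time, the train and test matrices always have the same columns in the same order.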

Re: [scikit-learn] Categorical handling

2017-08-17 Thread Andreas Mueller
Hi Georg.
Unfortunately this is not entirely trivial right now, but it will be fixed by https://github.com/scikit-learn/scikit-learn/pull/9151 and https://github.com/scikit-learn/scikit-learn/pull/9012, which will be in the next release (0.20). LabelBinarizer is probably the best workaround for now,
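LabelBinarizer is designed for target labels, so used as a workaround it encodes one categorical column at a time. A minimal sketch, assuming a toy column named color; it also shows one caveat worth knowing: with exactly two distinct values, LabelBinarizer returns a single 0/1 column rather than two one-hot columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Toy data (illustrative only).
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})

# One LabelBinarizer per categorical column.
lb = LabelBinarizer()
color_onehot = lb.fit_transform(train["color"])
print(lb.classes_)   # classes are sorted: blue, green, red
print(color_onehot)  # one column per class, in the order of lb.classes_

# Caveat: two distinct values give a single column, not two.
two_levels = LabelBinarizer().fit_transform(["yes", "no", "yes"])
print(two_levels.shape)  # (3, 1)
```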

Re: [scikit-learn] Malformed input for SVC(kernel='precomputed').predict()

2017-08-17 Thread Andreas Mueller
Hi Sam.
Can you say exactly which test fails and where (i.e. give a traceback)? The estimator checks are currently quite strict about raising helpful error messages. That doesn't (necessarily) mean your estimator is broken. With a precomputed Gram matrix, I expect the shape of X in pred
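For SVC(kernel='precomputed'), the scikit-learn documentation specifies that predict expects the kernel between test and training samples, with shape (n_samples_test, n_samples_train). A minimal sketch, using a plain linear kernel as a stand-in for a custom one and random toy data (both assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X_train = rng.randn(20, 3)
y_train = np.array([0, 1] * 10)
X_test = rng.randn(5, 3)

# Linear kernel as a stand-in for a custom kernel.
K_train = X_train @ X_train.T  # (n_train, n_train) = (20, 20)
K_test = X_test @ X_train.T    # (n_test, n_train) = (5, 20)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
y_pred = clf.predict(K_test)   # one label per test sample

# A test-vs-test Gram matrix, shape (5, 5), has the wrong number of columns
# and is rejected with a ValueError.
try:
    clf.predict(X_test @ X_test.T)
except ValueError as exc:
    print("rejected:", exc)
```

The number of columns must match the number of training samples, because each column holds the kernel value against one training sample.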

[scikit-learn] Categorical handling

2017-08-17 Thread Georg Heiler
Hi,
how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
Goals:
- scikit-learn style fit/transform methods to encode labels of categorical features of X
- should handl
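Both goals above (a fit/transform API, and coping with levels not seen during fit) map onto tools that shipped in scikit-learn 0.20, after this thread: OneHotEncoder accepts string categories directly, and ColumnTransformer applies it to selected DataFrame columns. A minimal sketch, assuming scikit-learn 0.20 or later and made-up columns color and size:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data (illustrative only); "green" never appears in train.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})
test = pd.DataFrame({"color": ["green", "blue"], "size": [4.0, 0.5]})

# One-hot encode "color" and pass "size" through unchanged.
# handle_unknown="ignore" turns unseen levels into all-zero one-hot columns
# instead of raising; sparse_threshold=0 forces a dense array as output.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
    sparse_threshold=0,
)
ct.fit(train)
X_test = ct.transform(test)
print(X_test)  # one-hot columns (blue, red) first, then size
```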

[scikit-learn] Malformed input for SVC(kernel='precomputed').predict()

2017-08-17 Thread Sam Barnett
I am rolling my own SVC-based classifier, which computes a custom Gram matrix and passes it to SVC with kernel='precomputed'. While this works fine with the fit method, I face a dilemma with the predict method, shown here:
    def predict(self, X):
        """Run the predict meth
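One way to structure such a wrapper is to keep the training samples from fit and, in predict, build the Gram matrix between the new samples and those stored training samples. The sketch below is hypothetical: GramSVC, rbf_gram and the choice of an RBF kernel stand in for the custom estimator and kernel, which are not shown in the message.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import SVC
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


def rbf_gram(A, B, gamma):
    # Hypothetical custom kernel: Gaussian RBF between rows of A and rows of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class GramSVC(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: custom Gram matrix fed to SVC(kernel='precomputed')."""

    def __init__(self, gamma=0.1, C=1.0):
        self.gamma = gamma
        self.C = C

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.X_fit_ = X  # needed later to build the test-vs-train kernel
        self.svc_ = SVC(kernel="precomputed", C=self.C)
        self.svc_.fit(rbf_gram(X, X, self.gamma), y)  # (n_train, n_train)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        check_is_fitted(self, "X_fit_")
        X = check_array(X)
        K = rbf_gram(X, self.X_fit_, self.gamma)  # (n_test, n_train)
        return self.svc_.predict(K)


# Usage on two well-separated toy clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(10, 2) - 4, rng.randn(10, 2) + 4])
y = np.array([0] * 10 + [1] * 10)
clf = GramSVC(gamma=0.1).fit(X, y)
print(clf.predict([[4.0, 4.0], [-4.0, -4.0]]))
```

Storing X_fit_ costs memory proportional to the training set, but it is what makes the (n_test, n_train) kernel computable inside predict without changing the predict(X) signature.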