Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-21 Thread Olivier Grisel
I have not yet thought about what should be the best behavior, but we should at least fix the regression for 0.15.1. I created an issue here: https://github.com/scikit-learn/scikit-learn/issues/3462 -- Olivier

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-21 Thread Christian Jauvin
> What I had in mind (for the LB) was an option to "reserve" an extra column at the LB creation, which could then be used to map all the unknown values further encountered by "transform". This column would obviously be all zeros in the matrix returned by "fit_transform" (i.e. could only con
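A minimal sketch of that proposal, written here purely for illustration: LabelBinarizerWithUnknown is a hypothetical wrapper, not scikit-learn API, and it assumes three or more classes so LabelBinarizer yields one column per class. Labels unseen during fit get a 1 in the reserved last column; fit_transform necessarily leaves that column all zeros.

    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    class LabelBinarizerWithUnknown:
        """Hypothetical wrapper that reserves one extra column for unseen labels."""

        def fit(self, y):
            self.lb_ = LabelBinarizer()
            self.lb_.fit(y)
            self.known_ = set(self.lb_.classes_)
            return self

        def transform(self, y):
            y = np.asarray(y, dtype=object)
            known = np.array([label in self.known_ for label in y])
            # One column per class seen during fit, plus the reserved last column.
            out = np.zeros((len(y), len(self.lb_.classes_) + 1), dtype=int)
            if known.any():
                out[known, :-1] = self.lb_.transform(y[known])
            out[~known, -1] = 1  # every unseen label maps to the reserved column
            return out

        def fit_transform(self, y):
            # The reserved column is necessarily all zeros here.
            return self.fit(y).transform(y)

    lbu = LabelBinarizerWithUnknown().fit(['a', 'b', 'c'])
    print(lbu.transform(['a', 'd', 'e']))
    # [[1 0 0 0]
    #  [0 0 0 1]
    #  [0 0 0 1]]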

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-17 Thread Christian Jauvin
> I think the encoders should all be able to deal with unknown labels. The thing about the extra single value is that you don't have a column to map it to. How would you use the extra value in LabelBinarizer or OneHotEncoder?
You're right, and this points to a difference between what PR #324

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-17 Thread Andy
I think the encoders should all be able to deal with unknown labels. The thing about the extra single value is that you don't have a column to map it to. How would you use the extra value in LabelBinarizer or OneHotEncoder? For LabelEncoder I think it would make sense. On 07/17/2014 12:59 AM, Ch
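To illustrate why the extra single value is natural for LabelEncoder in particular: every label maps to one integer, so all unknowns can be sent to a single reserved integer, whereas LabelBinarizer and OneHotEncoder would need a whole extra column. The helper below (encode_with_unknown, a hypothetical function written here for illustration, not scikit-learn API) sketches that idea.

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    def encode_with_unknown(le, y):
        """Map labels unseen during fit to the single reserved id len(le.classes_)."""
        y = np.asarray(y, dtype=object)
        known = np.isin(y, le.classes_)
        out = np.full(len(y), len(le.classes_), dtype=int)  # reserved "unknown" id
        out[known] = le.transform(y[known])
        return out

    le = LabelEncoder().fit(['a', 'b', 'c'])
    print(encode_with_unknown(le, ['a', 'd', 'e']))  # [0 3 3]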

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Michael Bommarito
Relevant to this: https://github.com/scikit-learn/scikit-learn/pull/3243 Thanks, Michael J. Bommarito II, CEO Bommarito Consulting, LLC *Web:* http://www.bommaritollc.com *Mobile:* +1 (646) 450-3387 On Wed, Jul 16, 2014 at 6:59 PM, Christian Jauvin wrote: > I can open an issue, but on the othe

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Joel Nothman
cf. https://github.com/scikit-learn/scikit-learn/pull/3243
On 17 July 2014 08:59, Christian Jauvin wrote:
> I can open an issue, but on the other hand, you could argue that the new behaviour is now at least consistent with the other encoder types, e.g.:
>
> >>> le = LabelEncoder()
> >>> le.

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Christian Jauvin
I can open an issue, but on the other hand, you could argue that the new behaviour is now at least consistent with the other encoder types, e.g.:

>>> le = LabelEncoder()
>>> le.fit_transform(['a', 'b', 'c'])
array([0, 1, 2])
>>> le.transform(['a', 'd', 'e'])
[...]
ValueError: y contains new labels

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Arnaud Joly
Hi, this looks like a regression. Can you open an issue on GitHub? I am not sure that it would make sense to add an unknown-label column via an optional parameter, but you could easily add one with some numpy operations: np.hstack([y, y.sum(axis=1, keepdims=True) == 0]). Best regards, Arnaud On
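Arnaud's one-liner in context, shown with a made-up matrix for illustration and assuming (as in 0.14) that labels unseen during fit come back as all-zero rows; the appended column is 1 exactly where a row matched no known label.

    import numpy as np

    Y = np.array([[1, 0, 0],
                  [0, 0, 0],   # a row produced by a label unseen during fit
                  [0, 1, 0]])
    Y_ext = np.hstack([Y, (Y.sum(axis=1, keepdims=True) == 0).astype(int)])
    print(Y_ext)
    # [[1 0 0 0]
    #  [0 0 0 1]
    #  [0 1 0 0]]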

[Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-16 Thread Christian Jauvin
Hi, I have noticed a change in LabelBinarizer between version 0.15 and earlier versions. Prior to 0.15, this worked:

>>> lb = LabelBinarizer()
>>> lb.fit_transform(['a', 'b', 'c'])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
>>> lb.transform(['a', 'd', 'e'])
array([[1, 0, 0],
       [0
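A small workaround sketch for anyone hitting this on 0.15 (transform_ignore_unknown is a hypothetical helper written here, not part of scikit-learn, and not something proposed in the thread): it reproduces the pre-0.15 behaviour of emitting all-zero rows for labels unseen during fit.

    import numpy as np
    from sklearn.preprocessing import LabelBinarizer

    def transform_ignore_unknown(lb, y):
        """Binarize y against lb.classes_; labels unseen during fit give all-zero rows."""
        col = {c: i for i, c in enumerate(lb.classes_)}
        out = np.zeros((len(y), len(lb.classes_)), dtype=int)
        for row, label in enumerate(y):
            if label in col:
                out[row, col[label]] = 1
        return out

    lb = LabelBinarizer().fit(['a', 'b', 'c'])
    print(transform_ignore_unknown(lb, ['a', 'd', 'e']))
    # [[1 0 0]
    #  [0 0 0]
    #  [0 0 0]]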