I have not yet thought about what the best behavior should be, but we
should at least fix the regression for 0.15.1. I created an issue
here:
https://github.com/scikit-learn/scikit-learn/issues/3462
--
Olivier
> What I had in mind (for the LB) was an option to "reserve" an extra
> column at the LB creation, which could then be used to map all the
> unknown values further encountered by "transform". This column would
> obviously be all zeros in the matrix returned by "fit_transform" (i.e.
> could only contain non-zero values in later calls to "transform").
> I think the encoders should all be able to deal with unknown labels.
> The thing about the extra single value is that you don't have a column
> to map it to.
> How would you use the extra value in LabelBinarizer or OneHotEncoder?
You're right, and this points to a difference between what PR #324
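To make the "reserved extra column" idea above concrete, here is a rough
sketch with plain numpy on top of the current LabelBinarizer (the extra
column is stacked on manually; there is no such option today):

>>> import numpy as np
>>> from sklearn.preprocessing import LabelBinarizer
>>> lb = LabelBinarizer()
>>> Y = lb.fit_transform(['a', 'b', 'c'])
>>> np.hstack([Y, np.zeros((Y.shape[0], 1), dtype=Y.dtype)])  # reserved column, all zeros after fit
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0]])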
I think the encoders should all be able to deal with unknown labels.
The thing about the extra single value is that you don't have a column
to map it to.
How would you use the extra value in LabelBinarizer or OneHotEncoder?
For LabelEncoder I think it would make sense.
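A rough sketch of what that could look like for LabelEncoder (the reserved
"unknown" integer below is just my own illustration, not an existing
parameter):

>>> import numpy as np
>>> from sklearn.preprocessing import LabelEncoder
>>> le = LabelEncoder().fit(['a', 'b', 'c'])
>>> mapping = {c: i for i, c in enumerate(le.classes_)}
>>> unknown = len(le.classes_)  # one extra integer reserved for unseen labels
>>> np.array([mapping.get(v, unknown) for v in ['a', 'd', 'e']])
array([0, 3, 3])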
On 07/17/2014 12:59 AM, Christian Jauvin wrote:
Relevant to this:
https://github.com/scikit-learn/scikit-learn/pull/3243
Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387
On Wed, Jul 16, 2014 at 6:59 PM, Christian Jauvin wrote:
> I can open an issue, but on the other hand, you could argue that the
> new behaviour is now at least consistent with the other encoder types.
cf. https://github.com/scikit-learn/scikit-learn/pull/3243
On 17 July 2014 08:59, Christian Jauvin wrote:
> I can open an issue, but on the other hand, you could argue that the
> new behaviour is now at least consistent with the other encoder types,
> e.g.:
>
> >>> le = LabelEncoder()
> >>> le.fit_transform(['a', 'b', 'c'])
> array([0, 1, 2])
> >>> le.transform(['a', 'd', 'e'])
> [...]
> ValueError: y contains new labels
I can open an issue, but on the other hand, you could argue that the
new behaviour is now at least consistent with the other encoder types,
e.g.:
>>> le = LabelEncoder()
>>> le.fit_transform(['a', 'b', 'c'])
array([0, 1, 2])
>>> le.transform(['a', 'd', 'e'])
[...]
ValueError: y contains new labels
Hi
This looks like a regression. Can you open an issue on github?
I am not sure that it would make sense to add an unknown-label
column via an optional parameter, but you could easily add one with
a few numpy operations:
np.hstack([y, y.sum(axis=1, keepdims=True) == 0])
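For instance, assuming the pre-0.15 behaviour where unseen labels come back
as all-zero rows, this turns them into an explicit "unknown" column:

>>> import numpy as np
>>> y = lb.transform(['a', 'd', 'e'])
>>> np.hstack([y, y.sum(axis=1, keepdims=True) == 0])
array([[1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1]])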
Best regards,
Arnaud
On
Hi,
I have noticed a change in the LabelBinarizer between version 0.15
and earlier versions.
Prior to 0.15, this worked:
>>> lb = LabelBinarizer()
>>> lb.fit_transform(['a', 'b', 'c'])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
>>> lb.transform(['a', 'd', 'e'])
array([[1, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])