Hello list...
I'm a huge fan of sklearn and use it daily at work. I was confused by the
results of some recent text classification experiments and started looking
more closely at the vectorization code.
I'm wondering about the logic behind:
1) not doing stopword removal for the char_wb analyzer
Oh, right (duh)... I wasn't thinking clearly about the padding for char_wb.
I'll do some tests with stopword removal for char_wb and submit a PR if it
looks worthwhile.
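
For anyone following along, here is a rough pure-Python sketch of what I mean by the padding. It is not sklearn's actual code: the function name char_wb_ngrams and its stop_words argument are made up for illustration, and I'm assuming lowercasing, whitespace tokenization, and a single n-gram size. The point is only that once each word is padded with a space on either side and cut into character n-grams, a stop word is no longer a single token, so any filtering would have to happen before the padding step.

```python
def char_wb_ngrams(text, n=3, stop_words=frozenset()):
    """Character n-grams within word boundaries (illustrative sketch only).

    Each whitespace-separated word is padded with one space on each side,
    then cut into overlapping n-grams. Words listed in the hypothetical
    `stop_words` argument are dropped before padding.
    """
    ngrams = []
    for word in text.lower().split():
        if word in stop_words:
            continue  # filter whole words before they are split up
        padded = " " + word + " "
        # A padded word shorter than n yields itself as a single n-gram.
        for i in range(max(1, len(padded) - n + 1)):
            ngrams.append(padded[i:i + n])
    return ngrams


# Without filtering, "the" contributes three n-grams of its own.
print(char_wb_ngrams("the cat"))
# With filtering before padding, only the n-grams of "cat" remain.
print(char_wb_ngrams("the cat", stop_words={"the"}))
```

Whether dropping those n-grams actually helps classification is exactly what I still need to test.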
Cheers,
Fred.
On 19 July 2013 13:27, Olivier Grisel wrote:
> 2013/7/19 Fred Mailhot:
> > Hello list...
>
> Hi Fred,
>
> > I'm
2013/7/19 Fred Mailhot:
> Hello list...
Hi Fred,
> I'm a huge fan of sklearn and use it daily at work. I was confused by the
> results of some recent text classification experiments and started looking
> more closely at the vectorization code.
>
> I'm wondering about the logic behind:
>
> 1) not