[Scikit-learn-general] Vectorization/tokenization question...

2013-07-19 Thread Fred Mailhot
Hello list...

I'm a huge fan of sklearn and use it daily at work. I was confused by the results of some recent text classification experiments and started looking more closely at the vectorization code.

I'm wondering about the logic behind:

1) not doing stopword removal for the char_wb analyzer
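
For readers following the question, here is a rough, standard-library-only sketch of what a char_wb-style analyzer does: each whitespace-separated word is padded with a space on both sides, and character n-grams are taken inside that padded word. The function name `char_wb_ngrams` and its `stop_words` argument are hypothetical, chosen for illustration. This is not scikit-learn's code, and the stop word filter shows one way such removal could work rather than what the library does.

```python
def char_wb_ngrams(text, min_n=2, max_n=3, stop_words=frozenset()):
    """Illustrative char_wb-style n-grams (hypothetical helper, not sklearn's)."""
    ngrams = []
    for word in text.split():
        # Assumption: stop words are dropped at the word level, before padding.
        if word.lower() in stop_words:
            continue
        # Pad with one space per side so n-grams record word boundaries.
        padded = " " + word + " "
        for n in range(min_n, max_n + 1):
            last_start = max(len(padded) - n, 0)
            for i in range(last_start + 1):
                ngrams.append(padded[i:i + n])
            if last_start == 0:
                # The padded word is no longer than n: emit it once and stop.
                break
    return ngrams


print(char_wb_ngrams("the cat", 3, 3))
print(char_wb_ngrams("the cat", 3, 3, stop_words={"the"}))
```

Because n-grams never cross the padded word boundary, filtering whole words before padding removes every n-gram that a stop word would have produced.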

Re: [Scikit-learn-general] Vectorization/tokenization question...

2013-07-19 Thread Fred Mailhot
Oh, right (duh)... I wasn't thinking clearly about the padding for char_wb.

I'll do some tests with stopword removal for char_wb and submit a PR if it looks worthwhile.

Cheers,
Fred.

On 19 July 2013 13:27, Olivier Grisel wrote:
> 2013/7/19 Fred Mailhot:
> > Hello list...
>
> Hi Fred,
>
> > I'm

Re: [Scikit-learn-general] Vectorization/tokenization question...

2013-07-19 Thread Olivier Grisel
2013/7/19 Fred Mailhot:
> Hello list...

Hi Fred,

> I'm a huge fan of sklearn and use it daily at work. I was confused by the
> results of some recent text classification experiments and started looking
> more closely at the vectorization code.
>
> I'm wondering about the logic behind:
>
> 1) not