Re: [Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-12-06 Thread Gael Varoquaux
> We don't want generators or lists of functions as parameters though, as
> it would break the ability to do cross-validation and picklability.
Agreed, but this does seem to fit the general use case of on-line learning, so hopefully we should be able to address it in the long run. G

Re: [Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-11-24 Thread Olivier Grisel
I plan to work on this during the sprint to simplify the vectorizer and make it easier to override the default implementation. We don't want generators or lists of functions as parameters, though, as that would break the ability to do cross-validation and picklability. -- Olivier
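The pickling and cross-validation concern above can be illustrated with a minimal sketch using only the standard library. The `can_pickle` helper and the parameter values are hypothetical and are not taken from scikit-learn; they only show why generators and ad-hoc functions are awkward as estimator parameters.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain values (hypothetical step names) pickle without trouble.
string_steps = ["lowercase", "strip_accents"]

# A lambda has no importable name, so pickle cannot store a reference to it.
lambda_steps = [lambda text: text.lower()]

# A generator cannot be pickled at all.
token_stream = (tok for tok in ["order", "of", "processes"])

string_ok = can_pickle(string_steps)
lambda_ok = can_pickle(lambda_steps)
generator_ok = can_pickle(token_stream)

# A generator is also consumed by its first pass, so a second
# cross-validation fold would see no data.
first_pass = list(token_stream)
second_pass = list(token_stream)
```

Here the second pass over `token_stream` is empty, which is the cross-validation half of the problem: every fold after the first would receive nothing.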

Re: [Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-11-24 Thread Robert Layton
On 25 November 2011 08:58, Nelle Varoquaux wrote:
> On 24 November 2011 22:51, Lars Buitinck wrote:
> > 2011/11/22 SK Sn :
> >> I looked into WordNGramAnalyzer in feature_extraction/text.py.
> >>
> >> It occurred to me that in case of nGram n>1, 'handle token n-grams' happens
> >> before 'handl

Re: [Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-11-24 Thread Nelle Varoquaux
On 24 November 2011 22:51, Lars Buitinck wrote:
> 2011/11/22 SK Sn :
>> I looked into WordNGramAnalyzer in feature_extraction/text.py.
>>
>> It occurred to me that in case of nGram n>1, 'handle token n-grams' happens
>> before 'handle stop words', as shown in the following snippet:
>
>> At least

Re: [Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-11-24 Thread Lars Buitinck
2011/11/22 SK Sn :
> I looked into WordNGramAnalyzer in feature_extraction/text.py.
>
> It occurred to me that in case of nGram n>1, 'handle token n-grams' happens
> before 'handle stop words', as shown in the following snippet:
> At least it is strange to me that, especially when I define my own
>

[Scikit-learn-general] Order of processes in WordNGramAnalyzer

2011-11-22 Thread SK Sn
Hi there,
I looked into WordNGramAnalyzer in feature_extraction/text.py.
It occurred to me that in case of nGram n>1, 'handle token n-grams' happens before 'handle stop words', as shown in the following snippet:
    # handle token n-grams
    if self.min_n != 1 or self.max_n != 1:
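
To make the ordering question concrete, here is a small standalone sketch. It is not the WordNGramAnalyzer code: the `ngrams` helper, the stop-word set, and the sample tokens are hypothetical. It also assumes that the stop-word step is a simple membership test on each resulting string.

```python
def ngrams(tokens, min_n, max_n):
    """Build space-joined word n-grams for every n in [min_n, max_n]."""
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


# Hypothetical stop-word list and input, for illustration only.
stop_words = {"the", "of"}
tokens = ["the", "order", "of", "processes"]

# Order A: build n-grams first, then filter. Only strings that are
# themselves stop words are dropped, so bigrams containing one survive.
ngrams_then_stop = [g for g in ngrams(tokens, 1, 2) if g not in stop_words]

# Order B: filter stop words first, then build n-grams from what is left.
stop_then_ngrams = ngrams([t for t in tokens if t not in stop_words], 1, 2)

print(ngrams_then_stop)
print(stop_then_ngrams)
```

Under these assumptions, order A keeps bigrams such as "order of", while order B produces "order processes", a bigram whose words were never adjacent in the input. Which behaviour is preferable depends on the application.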