Re: [Scikit-learn-general] Feature selection != feature elimination?

2016-05-02 Thread Philip Tully
On 14 March 2016 at 08:05, Philip Tully wrote: > >> Hi, >> >> I'm trying to optimize the time it takes to make a prediction with my >> model(s). I realized that when I perform feature selection during the >> model fit(), these features are likely still computed when I go to >> predict() or predict_proba().
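
To make the elimination concrete, here is a minimal sketch (the data, selector, and classifier are illustrative assumptions, not code from the thread): after fitting, the selector's get_support() mask identifies which input columns the final estimator actually needs, so an expensive upstream feature computation could be restricted to those columns and the selector's transform() bypassed at prediction time.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
pipe = Pipeline([("select", SelectKBest(f_classif, k=10)),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

# Indices of the 50 input features that survived selection (10 of them).
needed = np.flatnonzero(pipe.named_steps["select"].get_support())

# If each feature is costly to compute, build only the surviving columns
# and call the classifier directly, skipping the selector's transform().
X_small = X[:5][:, needed]
print(pipe.named_steps["clf"].predict(X_small))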

[Scikit-learn-general] Feature selection != feature elimination?

2016-03-14 Thread Philip Tully
Hi, I'm trying to optimize the time it takes to make a prediction with my model(s). I realized that when I perform feature selection during the model fit(), these features are likely still computed when I go to predict() or predict_proba(). An optimization would then involve actually eliminating the discarded features so they are never computed at prediction time.
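
For context, a short sketch (toy data and estimators assumed) of the behaviour the question describes: a Pipeline that selects features during fit() still expects the full-width feature matrix at predict() time, because the selector's transform() merely drops columns that were already computed by the caller.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
pipe = Pipeline([("select", SelectKBest(f_classif, k=10)),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

# predict() must receive all 50 original features, even though only 10
# are used: the other 40 are computed, then discarded by transform().
print(pipe.predict(X[:5]))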

[Scikit-learn-general] Analyzer and tokenizer in (Count/TfIdf)Vectorizer

2015-11-30 Thread Philip Tully
Hi all, In the documentation ( http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer) it is written that when a callable tokenizer is passed into (Count/TfIdf)Vectorizer, it "Only applies if analyzer == 'word'".
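
As a hedged illustration of that parameter interaction (the corpus is a toy assumption): a callable tokenizer takes effect under the default analyzer='word' and is ignored for the character analyzers.

from sklearn.feature_extraction.text import CountVectorizer

def whitespace_tokenizer(doc):
    # deliberately trivial: split on whitespace only
    return doc.split()

corpus = ["the quick brown fox", "jumped over the lazy dog"]

# The tokenizer is used here because analyzer='word' (the default).
vec = CountVectorizer(tokenizer=whitespace_tokenizer, analyzer="word")
vec.fit(corpus)
print(sorted(vec.vocabulary_))

# With analyzer='char' the same tokenizer is ignored and character
# n-grams are extracted instead (recent versions warn about this).
vec_char = CountVectorizer(tokenizer=whitespace_tokenizer, analyzer="char")
vec_char.fit(corpus)
print(sorted(vec_char.vocabulary_))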

Re: [Scikit-learn-general] The proper way to do nested Cross Validation with (Randomized/)GridSearchCV pipelines

2015-09-28 Thread Philip Tully
different algorithms > you'd like to compare and select the model & algorithm that gives you the > "best" unbiased estimate (average of the outer loop validation scores). > After that, you select this "best" learning algorithm and tune it again via > "regular" (non-nested) cross-validation on the full training set.
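
A sketch of the procedure described above, using the current sklearn.model_selection API (the candidate estimators, grids, and dataset are illustrative assumptions): each algorithm gets its own inner GridSearchCV, and the outer loop's mean score is what you compare across algorithms.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

candidates = {
    "svc": GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3),
    "logreg": GridSearchCV(LogisticRegression(max_iter=1000),
                           {"C": [0.1, 1, 10]}, cv=3),
}
for name, search in candidates.items():
    # Each outer fold re-runs the inner hyperparameter search from
    # scratch, so the outer scores are not biased by the tuning.
    scores = cross_val_score(search, X, y, cv=5)
    print(name, scores.mean())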

[Scikit-learn-general] The proper way to do nested Cross Validation with (Randomized/)GridSearchCV pipelines

2015-09-27 Thread Philip Tully
Hi all, My question is mostly technical, but partly about ML best practice. I am performing (Randomized/)GridSearchCV to 'optimize' the hyperparameters of my estimator. However, if I want to do model selection after this, it would be best to do nested cross-validation to get a more unbiased estimate and a better sense of generalization performance.
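
For reference, a minimal sketch of nested cross-validation around a grid-searched pipeline, using the current sklearn.model_selection API (the dataset and parameter grid are assumptions for illustration): the inner loop tunes the hyperparameters, the outer loop estimates generalization performance.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}

inner = GridSearchCV(pipe, param_grid, cv=3)       # hyperparameter tuning
outer_scores = cross_val_score(inner, X, y, cv=5)  # unbiased estimate
print("nested CV accuracy: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))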