Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-18 Thread Olivier Grisel
Thanks

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-18 Thread JAGANADH G
On Fri, Jan 18, 2013 at 4:08 PM, Olivier Grisel wrote:
> Thanks for the details. So if I understand correctly you can only see
> the problem on a pipeline that is loaded from a joblib pickle on the
> harddrive?
>
> If so you should be able to reproduce the issue with the 20 newsgroups
> dataset.
>

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-18 Thread Olivier Grisel
Thanks for the details. So if I understand correctly you can only see the problem on a pipeline that is loaded from a joblib pickle on the harddrive? If so you should be able to reproduce the issue with the 20 newsgroups dataset. Could you please try to push a self-hosting script that uses the 20
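A minimal sketch of such a reproduction script. It assumes a tiny made-up corpus in place of 20 newsgroups (fetch_20newsgroups would need a download) and a current scikit-learn, where joblib is imported as a standalone package rather than from sklearn.externals.joblib as in 0.12. It times the same predict call on the in-memory pipeline and on a copy reloaded from disk. The pipeline layout, corpus and file name are illustrative, not the reporter's actual setup.

```python
import os
import tempfile
import time

import joblib  # standalone package; scikit-learn 0.12 exposed it as sklearn.externals.joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up stand-in corpus (assumption); a real report would use fetch_20newsgroups.
docs = ["good movie", "great acting and plot", "bad film", "terrible boring plot"] * 50
labels = ["pos", "pos", "neg", "neg"] * 50

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
pipe.fit(docs, labels)


def time_predict(model, data):
    """Return (predictions, elapsed seconds) for a single predict call."""
    start = time.perf_counter()
    pred = model.predict(data)
    return pred, time.perf_counter() - start


# Persist the fitted pipeline and load it back, as in the original report.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
loaded = joblib.load(path)

pred_mem, t_mem = time_predict(pipe, docs)
pred_disk, t_disk = time_predict(loaded, docs)
print(f"in-memory: {t_mem:.4f}s, reloaded: {t_disk:.4f}s")
```

If the two timings differ by orders of magnitude on the real data, that points at something in the persisted object rather than at the estimators themselves.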

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread JAGANADH G
Hi Olivier,

Here is the output as requested.

sklearn version: '0.12.1'
Python 2.7
OS: Ubuntu 11.04

Trace:

In [3]: from sklearn.datasets import load_files
In [4]: categ = ['pos','neg']
In [5]: dataset = load_files('data_n',categories=categ,shuffle=False)
In [6]: from sklearn.feature_extraction
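For readers unfamiliar with load_files: it expects one subfolder per category under the container path and one document per file, and it takes the category names from the folder names. The sketch below builds a throwaway folder in place of 'data_n' (its layout and contents are assumptions) and uses the encoding parameter available in recent scikit-learn releases so that dataset.data holds str rather than bytes.

```python
import os
import tempfile

from sklearn.datasets import load_files

# Throwaway container folder mimicking the assumed layout of 'data_n':
# one subfolder per category, one text file per document.
root = tempfile.mkdtemp()
samples = {"pos": ["a fine day", "really great"], "neg": ["awful service"]}
for category, texts in samples.items():
    os.makedirs(os.path.join(root, category))
    for i, text in enumerate(texts):
        with open(os.path.join(root, category, f"{i}.txt"), "w", encoding="utf-8") as fh:
            fh.write(text)

categ = ["pos", "neg"]
# encoding= decodes the raw file bytes to str; without it, dataset.data holds bytes.
dataset = load_files(root, categories=categ, shuffle=False, encoding="utf-8")
print(dataset.target_names)  # category names, taken from the subfolder names
print(len(dataset.data), list(dataset.target))
```

Note that the label indices follow the sorted folder names, not the order of the categories list passed in.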

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread Olivier Grisel
2013/1/17 Andreas Mueller:
> On 01/17/2013 07:02 PM, Olivier Grisel wrote:
>> This is a bug.
>>
>> Could you run the profiler (cProfile or line_profiler) on
>> TfidfVectorizer on a slice of your data and post the output?
>>
>> http://scikit-learn.org/dev/developers/performance.html#profiling-python

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread Andreas Mueller
On 01/17/2013 07:02 PM, Olivier Grisel wrote:
> This is a bug.
>
> Could you run the profiler (cProfile or line_profiler) on
> TfidfVectorizer on a slice of your data and post the output?
>
> http://scikit-learn.org/dev/developers/performance.html#profiling-python-code
>
Do you think this is specifi

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread Olivier Grisel
This is a bug.

Could you run the profiler (cProfile or line_profiler) on TfidfVectorizer on a slice of your data and post the output?

http://scikit-learn.org/dev/developers/performance.html#profiling-python-code
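A minimal sketch of the requested profile using the standard-library cProfile and pstats modules, assuming a synthetic list of short documents stands in for "a slice of your data". Sorting by cumulative time shows which calls inside TfidfVectorizer.fit_transform dominate; `python -m cProfile -s cumulative script.py` is the command-line equivalent for a whole script.

```python
import cProfile
import io
import pstats

from sklearn.feature_extraction.text import TfidfVectorizer

# Synthetic stand-in for the real corpus (assumption); profile only a slice of it.
documents = ["the quick brown fox", "jumps over the lazy dog", "a single line document"] * 1000
data_slice = documents[:2000]

vectorizer = TfidfVectorizer()
profiler = cProfile.Profile()
profiler.enable()
X = vectorizer.fit_transform(data_slice)
profiler.disable()

# Collect the 15 entries with the largest cumulative time into a string.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(15)
report = out.getvalue()
print(report)
```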

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread JAGANADH G
On Thu, Jan 17, 2013 at 3:38 PM, Olivier Grisel wrote:
> It sounds like a bug. How many tokens do you have in your corpus?
>
> If you have the vectorized corpus in a variable X (e.g. `X =
> CountVectorizer().fit_transform(list_of_documents)`) you can do:
>
> >>> print(repr(X))
>
> to get the d

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread Olivier Grisel
It sounds like a bug. How many tokens do you have in your corpus?

If you have the vectorized corpus in a variable X (e.g. `X = CountVectorizer().fit_transform(list_of_documents)`) you can do:

>>> print(repr(X))

to get the dimensions and number of non-zeros in the sparse matrix.
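A small illustration of this check, with a three-document corpus standing in for list_of_documents (an assumption). Besides repr(X), the shape and nnz attributes of the SciPy sparse matrix give the same numbers directly, and X.sum() gives the total number of tokens the vectorizer kept, which answers the "how many tokens" question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus standing in for the real list_of_documents.
list_of_documents = ["the cat sat", "the dog sat", "the cat ran"]

X = CountVectorizer().fit_transform(list_of_documents)
# repr() of a scipy.sparse matrix reports its shape, dtype and number of stored elements.
print(repr(X))
n_docs, n_features = X.shape  # rows are documents, columns are distinct tokens
print(n_docs, n_features, X.nnz)  # X.nnz: number of stored (non-zero) entries
print(X.sum())  # total token occurrences counted by the vectorizer
```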

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-17 Thread JAGANADH G
On Wed, Jan 16, 2013 at 8:35 PM, Jacob Perkins wrote:
> The tfidf transformer is the slow part - I've done a number of speed tests
> with scikit-learn classifiers, and adding tfidf always slowed things down
> significantly. It also didn't seem to help much with accuracy.
>
Hi Jacob, W

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread JAGANADH G
On Wed, Jan 16, 2013 at 8:35 PM, Jacob Perkins wrote:
> The tfidf transformer is the slow part - I've done a number of speed tests
> with scikit-learn classifiers, and adding tfidf always slowed things down
> significantly. It also didn't seem to help much with accuracy.
>
Hi Jacob, If
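One way to check this claim on a given corpus is to time the two stages separately. TfidfVectorizer is documented as equivalent to CountVectorizer followed by TfidfTransformer, so the sketch below measures the counting step and the tf-idf reweighting step on their own. The synthetic corpus is an assumption; the timings depend entirely on the real data and machine.

```python
import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Synthetic corpus (assumption); substitute the real documents to measure the actual cost.
docs = ["short single line document number %d" % i for i in range(20000)]

start = time.perf_counter()
counts = CountVectorizer().fit_transform(docs)  # tokenization + term counting
t_count = time.perf_counter() - start

start = time.perf_counter()
tfidf = TfidfTransformer().fit_transform(counts)  # reweights the existing count matrix
t_tfidf = time.perf_counter() - start

print(f"CountVectorizer: {t_count:.3f}s, TfidfTransformer: {t_tfidf:.3f}s")
```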

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread Jacob Perkins
On Wed, Jan 16, 2013 at 12:38 AM, <scikit-learn-general-requ...@lists.sourceforge.net> wrote:
>
> Date: Wed, 16 Jan 2013 14:00:12 +0530
> From: JAGANADH G
> Subject: [Scikit-learn-general] Saved Classifier Model slow in
> prediction
> To: sci

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread Andreas Mueller
On 01/16/2013 09:46 AM, JAGANADH G wrote:
Probably. I'm a bit surprised that it took so long. I would imagine it is the vectorizer? Though that should be O(tokens) afaik. Can you find out which of the steps in the pipeline takes so long? Hi, Is there any way to check the sam

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread JAGANADH G
>
> Probably.
> I'm a bit surprised that it took so long. I would imagine it is the
> vectorizer?
> Though that should be O(tokens) afaik.
> Can you find out which of the steps in the pipeline takes so long?
>

Hi,

Is there any way to check this? I saved the entire pipeline to disk and
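A fitted Pipeline keeps its components in the steps attribute as (name, estimator) pairs, so a pipeline loaded from disk can still be taken apart and timed stage by stage. A minimal sketch, assuming every intermediate step is a real transformer (not 'passthrough') and using a small freshly fitted pipeline as a stand-in for the one loaded with joblib; the step names and corpus are illustrative.

```python
import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Stand-in for the pipeline loaded from disk (e.g. pipe = joblib.load(path)).
docs = ["good movie", "great plot", "bad movie", "boring plot"] * 100
labels = [1, 1, 0, 0] * 100
pipe = Pipeline([("vect", CountVectorizer()), ("tfidf", TfidfTransformer()), ("clf", LinearSVC())])
pipe.fit(docs, labels)


def time_pipeline_steps(pipeline, X):
    """Run each step of a fitted Pipeline by hand and record its wall-clock time."""
    timings = {}
    for name, step in pipeline.steps[:-1]:  # every step except the final estimator
        start = time.perf_counter()
        X = step.transform(X)
        timings[name] = time.perf_counter() - start
    final_name, final_step = pipeline.steps[-1]
    start = time.perf_counter()
    predictions = final_step.predict(X)
    timings[final_name] = time.perf_counter() - start
    return predictions, timings


predictions, timings = time_pipeline_steps(pipe, docs)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

Running the steps by hand this way should give the same predictions as pipe.predict, so the per-step timings describe the real prediction path.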

Re: [Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread Andreas Mueller
On 01/16/2013 09:30 AM, JAGANADH G wrote:
> Hi All,
> I trained a LinearSVC() classifier for Text Classification. Total
> training set is about 1 Lakh documents (mostly single line). The saved
> model is about 3.5 MB in size.
> When I used the model in my python script it takes too much time to
>

[Scikit-learn-general] Saved Classifier Model slow in prediction

2013-01-16 Thread JAGANADH G
Hi All,

I trained a LinearSVC() classifier for text classification. The total training set is about 1 lakh (100,000) documents, mostly single lines. The saved model is about 3.5 MB in size. When I use the model in my Python script it takes too much time to perform the prediction (near to one min to predict took
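When a saved model feels slow, it can help to separate the time spent loading it from the time spent predicting. One possible cause (an assumption, not established in this thread) is a script that reloads the model on every call and so pays the loading cost each time. This sketch assumes a pipeline of TfidfVectorizer and LinearSVC, hypothetical single-line training documents, and the standalone joblib package; it reports the file size, the load time and the predict time separately.

```python
import os
import tempfile
import time

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical single-line documents standing in for the ~1 lakh real ones.
train_docs = ["order status query %d" % i for i in range(500)] + ["refund request %d" % i for i in range(500)]
train_labels = ["status"] * 500 + ["refund"] * 500

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
pipe.fit(train_docs, train_labels)

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipe, path)
size_mb = os.path.getsize(path) / 1e6  # size of the saved file in megabytes

# Time loading and predicting separately: a slow "prediction" in a script
# may actually be dominated by reloading the model.
start = time.perf_counter()
model = joblib.load(path)
load_s = time.perf_counter() - start

start = time.perf_counter()
pred = model.predict(["where is my order", "I want a refund"])
predict_s = time.perf_counter() - start

print(f"file: {size_mb:.2f} MB, load: {load_s:.3f}s, predict: {predict_s:.4f}s")
```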