Thanks
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Thanks for the details. So if I understand correctly you can only see
the problem on a pipeline that is loaded from a joblib pickle on the
hard drive?
If so you should be able to reproduce the issue with the 20 newsgroups dataset.
Could you please try to push a self-hosting script that uses the 20
newsgroups dataset?
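For reference, a self-contained script along these lines might work as a starting point (a sketch only, not the original code: the pipeline layout, file name and timing are all assumptions):

# Hypothetical reproduction sketch: train a text classification pipeline on
# the 20 newsgroups dataset, persist it with joblib, reload it from disk and
# time predictions with the reloaded model.
from time import time
from sklearn.datasets import fetch_20newsgroups
from sklearn.externals import joblib  # scikit-learn 0.12-era import path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

pipeline = Pipeline([('vect', TfidfVectorizer()), ('clf', LinearSVC())])
pipeline.fit(train.data, train.target)

joblib.dump(pipeline, 'text_clf.joblib')   # persist the fitted pipeline
loaded = joblib.load('text_clf.joblib')    # reload it from disk

t0 = time()
loaded.predict(test.data[:100])            # predict on a small slice
print("prediction time for 100 docs: %.3fs" % (time() - t0))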
Hi Olivier,
Here is the output as requested.
sklearn version: '0.12.1'
Python 2.7
OS: Ubuntu 11.04
Trace:
In [3]: from sklearn.datasets import load_files
In [4]: categ = ['pos','neg']
In [5]: dataset = load_files('data_n',categories=categ,shuffle=False)
In [6]: from sklearn.feature_extraction
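The transcript is cut off above; a hypothetical continuation (not the original output) that times the vectorization step, reusing the 'data_n' layout from the commands above, might look like:

# Hypothetical continuation: vectorize the loaded documents and time the
# TF-IDF step, which is the suspected bottleneck.
from time import time
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('data_n', categories=['pos', 'neg'], shuffle=False)
vectorizer = TfidfVectorizer()
t0 = time()
X = vectorizer.fit_transform(dataset.data)
print("vectorization took %.3fs" % (time() - t0))
print(repr(X))   # shape and number of non-zero entries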
On 01/17/2013 07:02 PM, Olivier Grisel wrote:
> This is a bug.
>
> Could you run the profiler (cProfile or line_profiler) on
> TfidfVectorizer on a slice of your data and post the output?
>
> http://scikit-learn.org/dev/developers/performance.html#profiling-python-code
>
Do you think this is specific
This is a bug.
Could you run the profiler (cProfile or line_profiler) on
TfidfVectorizer on a slice of your data and post the output?
http://scikit-learn.org/dev/developers/performance.html#profiling-python-code
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
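For reference, a minimal profiling run of the kind being requested could look like this (a sketch; `documents` is a placeholder for a slice of the real corpus):

# Hypothetical sketch: profile TfidfVectorizer.fit_transform with cProfile
# and print the most expensive calls.
import cProfile
import pstats
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["example document one", "another short document"]  # replace with real data

profiler = cProfile.Profile()
profiler.runcall(TfidfVectorizer().fit_transform, documents)

stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(20)  # top 20 calls by cumulative time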
It sounds like a bug. How many tokens do you have in your corpus?
If you have the vectorized corpus in a variable X (e.g. `X =
CountVectorizer().fit_transform(list_of_documents)`) you can do:
>>> print(repr(X))
to get the dimension and number of non-zeros in the sparse matrix.
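A tiny self-contained illustration of that check (toy documents, not the real corpus):

# Toy example: repr(X) reports the matrix shape (n_documents x vocabulary size)
# and the number of stored non-zero entries.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog ate my homework"]
X = CountVectorizer().fit_transform(docs)
print(repr(X))   # reports a 2x9 sparse matrix with 10 stored elements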
On Wed, Jan 16, 2013 at 8:35 PM, Jacob Perkins wrote:
> The tfidf transformer is the slow part - I've done a number of speed tests
> with scikit-learn classifiers, and adding tfidf always slowed things down
> significantly. It also didn't seem to help much with accuracy.
>
Hi Jacob,
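One way to check that observation on a given corpus is to time plain term counts and tf-idf side by side (a rough sketch; `documents` stands in for the real data):

# Rough sketch: compare the cost of plain term counts vs. TF-IDF weighting
# on the same corpus.
from time import time
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = ["example document one", "another short document"]  # replace with real data

for name, vec in [("counts", CountVectorizer()), ("tf-idf", TfidfVectorizer())]:
    t0 = time()
    vec.fit_transform(documents)
    print("%s: %.3fs" % (name, time() - t0))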
http://twitter.com/japerk
On Wed, Jan 16, 2013 at 12:38 AM, <
scikit-learn-general-requ...@lists.sourceforge.net> wrote:
>
> Date: Wed, 16 Jan 2013 14:00:12 +0530
> From: JAGANADH G
> Subject: [Scikit-learn-general] Saved Classifier Model slow in
> prediction
> To: sci
On 01/16/2013 09:46 AM, JAGANADH G wrote:
Probably.
I'm a bit surprised that it took so long. I would imagine it is the
vectorizer?
Though that should be O(tokens) afaik.
Can you find out which of the steps in the pipeline takes so long?
Hi,
Is there any way to check the same? Because I saved the entire pipeline to
disk and
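One way to check is to time each stage of the loaded pipeline separately, for example (a sketch; the file name, step layout and documents are placeholders):

# Sketch: load the pickled pipeline and time each transform step and the final
# predict separately to see which one dominates.
from time import time
from sklearn.externals import joblib  # scikit-learn 0.12-era import path

pipeline = joblib.load('pipeline.joblib')    # assumed file name
docs = ["a short test document"]             # replace with real documents

data = docs
for name, step in pipeline.steps[:-1]:       # all transformer steps
    t0 = time()
    data = step.transform(data)
    print("%s.transform: %.3fs" % (name, time() - t0))

name, clf = pipeline.steps[-1]               # final estimator
t0 = time()
clf.predict(data)
print("%s.predict: %.3fs" % (name, time() - t0))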
Hi All,
I trained a LinearSVC() classifier for text classification. The total training
set is about 1 lakh (100,000) documents (mostly single-line). The saved model is
about 3.5 MB in size.
When I use the model in my python script it takes too much time to perform
the prediction (nearly one minute to predict).
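For what it's worth, a quick way to inspect such a saved model (file name and step names are assumptions) is to look at the fitted vocabulary and coefficient shapes, which is also relevant to the question about the number of tokens in the corpus:

# Sketch: inspect a saved text classification pipeline to see how large the
# fitted vocabulary and the classifier's coefficient matrix are.
from sklearn.externals import joblib  # scikit-learn 0.12-era import path

pipeline = joblib.load('text_clf.joblib')     # assumed file name
vect = pipeline.named_steps['vect']           # assumed step name
clf = pipeline.named_steps['clf']             # assumed step name

print("vocabulary size: %d" % len(vect.vocabulary_))
print("coef_ shape: %s" % (clf.coef_.shape,))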