Re: [Scikit-learn-general] Nosetests error

2013-02-25 Thread hrishikesh911
You do not have the compilers required to build sklearn. Run: sudo apt-get install build-essential
On Tue, Feb 26, 2013 at 2:51 AM, Saagar Takhi wrote:
> Hello,
> I have tried installing versions .12, .12-1, and .13-1 on my Kubuntu, but I
> have always failed in running
>
> "nosetests sklearn --exe"

[Scikit-learn-general] Nosetests error

2013-02-25 Thread Saagar Takhi
Hello, I have tried installing versions .12, .12-1, and .13-1 on my Kubuntu, but I have always failed to run "nosetests sklearn --exe". It never reaches a conclusion: it prints just a few dots of tests, with no 'E' or 'S', and doesn't move ahead. Also while installing all the three versions in every poss

Re: [Scikit-learn-general] Packaging large objects

2013-02-25 Thread Lars Buitinck
2013/2/25 Ark <4rk@gmail.com>:
> Due to a very large number of features (and to reduce the size), I use SelectKBest
> which selects 150k features from the 500k features that I get from
> TfidfVectorizer, which worked fine. When I use HashingVectorizer instead of
> TfidfVectorizer I see following w

Re: [Scikit-learn-general] Packaging large objects

2013-02-25 Thread Ark
> You could also try the HashingVectorizer in sklearn.feature_extraction
> and see if performance is still acceptable with a small number of
> features. That also skips storing the vocabulary, which I imagine will
> be quite large as well.
Due to a very large number of features (and reduce the si
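
To illustrate why a hashing vectorizer does not need to store a vocabulary, here is a minimal sketch of the hashing trick in plain Python. It is not scikit-learn's implementation: the function name `hashed_counts`, the use of md5, and the tiny `n_features` value are assumptions made for the example, and the library uses a different hash function and more options.

```python
import hashlib


def hashed_counts(tokens, n_features=16):
    """Map tokens to a fixed-length count vector without a vocabulary.

    Hypothetical helper for illustration only.
    """
    vec = [0] * n_features
    for tok in tokens:
        # md5 is a stand-in hash chosen because it is deterministic across
        # runs; it is not the hash function scikit-learn uses.
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        # The column index comes from the hash alone, so no token-to-index
        # dictionary has to be kept (or pickled) with the model.
        vec[int(digest, 16) % n_features] += 1
    return vec


doc = "the cat sat on the mat".split()
vec = hashed_counts(doc)
```

The trade-off is that different tokens can collide in the same column, and the mapping cannot be inverted to recover feature names, which matters for feature selection and inspection.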

Re: [Scikit-learn-general] Imbalance in scikit-learn

2013-02-25 Thread Philipp Singer
Hey! One simple solution that often works wonders is to set the class_weight parameter of a classifier (if available) to 'auto' [1]. If you have enough data, it often also makes sense to balance the data beforehand.
[1] http://scikit-learn.org/dev/modules/svm.html#unbalanced-problems
On 25.02
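
The idea behind automatic class weighting is to weight each class inversely to its frequency. The sketch below computes such weights by hand in plain Python; the helper name `inverse_frequency_weights` and the normalization `n_samples / (n_classes * count)` are assumptions for illustration, and the exact formula behind the 'auto' setting may differ between scikit-learn releases.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a {class: weight} dict, weighting rare classes more heavily.

    Hypothetical helper; the normalization is one common choice, not
    necessarily what a given scikit-learn version computes.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# A toy imbalanced label vector: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
```

A dictionary of this shape can be passed as the class_weight parameter of classifiers that accept one, which gives explicit control when the automatic setting is not what you want.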

[Scikit-learn-general] Imbalance in scikit-learn

2013-02-25 Thread Maor Hornstein
I'm using scikit-learn in my Python program in order to perform some machine-learning operations. The problem is that my dataset has severe imbalance issues. Does anyone know a solution for imbalance in scikit-learn or in Python in general? Thanks :)

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-25 Thread Lars Buitinck
2013/2/25 Vlad Niculae:
> This is certainly a case where the default behaviour cannot possibly
> please everybody. I can't think of an application where changing
> tokenization and preprocessing wouldn't help.
>
> For instance you often want to replace all numbers with the same
> token. Possibly y

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-25 Thread Vlad Niculae
This is certainly a case where the default behaviour cannot possibly please everybody. I can't think of an application where changing tokenization and preprocessing wouldn't help. For instance you often want to replace all numbers with the same token. Possibly you want a different token for number
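
As a rough sketch of the number-replacement idea, the following plain-Python function maps every integer or decimal number to a single placeholder. The function name `replace_numbers`, the placeholder `NUM`, and the regular expression are all assumptions for illustration; a real application would likely need to handle signs, thousands separators, and so on.

```python
import re

# Matches integers and simple decimals such as "3" or "2.50".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def replace_numbers(text, token="NUM"):
    """Replace each number in text with one placeholder token (hypothetical helper)."""
    return NUMBER_RE.sub(token, text)


out = replace_numbers("It costs 2.50 or 3 dollars")
# out == "It costs NUM or NUM dollars"
```

A function like this could be supplied as a custom preprocessor to a text vectorizer. Note that overriding the preprocessor generally means taking over the default preprocessing (such as lowercasing) yourself.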

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-25 Thread Olivier Grisel
2013/2/25 :
> the missing 2 in tokenizing 2.50 is indeed a bit weird, though.
There is definitely a compromise between simplicity (and thus understandability / maintainability) of the default regexp and coverage of the most common patterns. If you have a suggestion for a better yet simple and und

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-25 Thread Philipp Singer
I guess the tokenizer starts a new word after the dot, and the token before it (2) is not two characters long.
On 25.02.2013 08:21, amuel...@ais.uni-bonn.de wrote:
> the missing 2 in tokenizing 2.50 is indeed a bit weird, though.
>
> Tom Fawcett wrote:
> > First, thanks for all your grea
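
The behaviour can be reproduced with the `re` module alone. The sketch below assumes the vectorizer's default token pattern is `(?u)\b\w\w+\b` (tokens of two or more word characters), which is my understanding of the default; the alternative pattern is just one possible relaxation, not a recommendation from the thread.

```python
import re

# Assumed default pattern: a token is two or more word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# A relaxed alternative that also keeps single-character tokens.
RELAXED_PATTERN = r"(?u)\b\w+\b"

text = "it costs 2.50 today"

# The "." is not a word character, so "2.50" splits into "2" and "50";
# "2" is only one character long and is therefore dropped.
default_tokens = re.findall(DEFAULT_PATTERN, text)
# default_tokens == ['it', 'costs', '50', 'today']

relaxed_tokens = re.findall(RELAXED_PATTERN, text)
# relaxed_tokens == ['it', 'costs', '2', '50', 'today']
```

Neither pattern keeps "2.50" as one token; that would need a pattern that explicitly allows a decimal point between digits.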