You do not have the required compilers to compile sklearn. Do:
sudo apt-get install build-essential
On Tue, Feb 26, 2013 at 2:51 AM, Saagar Takhi wrote:
> Hello,
> I have tried installing versions .12, .12-1, .13-1 on my Kubuntu, but I
> have always failed in running
>
> "nosetests sklearn --exe"
>
>
Hello,
I have tried installing versions .12, .12-1, .13-1 on my Kubuntu, but I have
always failed in running
"nosetests sklearn --exe"
It doesn't reach any conclusion; it just prints a few dots of tests with no
'E' or 'S' and doesn't move ahead.
Also, while installing all three versions in every poss…
2013/2/25 Ark <4rk@gmail.com>:
> Due to a very large number of features (and to reduce the size), I use
> SelectKBest, which selects 150k features from the 500k features that I get
> from TfidfVectorizer, which worked fine. When I use HashingVectorizer
> instead of TfidfVectorizer I see the following w…
> You could also try the HashingVectorizer in sklearn.feature_extraction
> and see if performance is still acceptable with a small number of
> features. That also skips storing the vocabulary, which I imagine will
> be quite large as well.
>
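For reference, a minimal sketch of the setup under discussion (toy corpus and
labels; n_features and k are illustrative, and the flag that keeps hashed
values non-negative for chi2 is spelled non_negative=True in 0.13-era
releases and alternate_sign=False in newer ones):

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    docs = ["the first document", "and a second document"]  # toy corpus
    y = [0, 1]                                               # toy labels

    # chi2 needs non-negative features, so keep the hashed counts positive
    vec = HashingVectorizer(n_features=2 ** 20, alternate_sign=False)
    X = vec.fit_transform(docs)

    # the poster selects 150k of 500k features; k=2 here for the toy data
    selector = SelectKBest(chi2, k=2)
    X_reduced = selector.fit_transform(X, y)
    print(X_reduced.shape)  # (2, 2)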
Hey!
One simple solution that often works wonders is to set the class_weight
parameter of a classifier (if available) to 'auto' [1].
If you have enough data, it often also makes sense to balance the data
beforehand.
[1] http://scikit-learn.org/dev/modules/svm.html#unbalanced-problems
On 25.02…
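A minimal sketch of that suggestion (toy data; the weighting heuristic is
spelled 'auto' in the 0.13-era docs linked above and 'balanced' in later
scikit-learn releases):

    from sklearn.svm import LinearSVC

    # imbalanced toy set: two samples of class 0, four of class 1
    X = [[0.0], [0.1], [0.9], [1.0], [1.1], [1.2]]
    y = [0, 0, 1, 1, 1, 1]

    # weights inversely proportional to class frequencies;
    # spell it class_weight='auto' on 0.13-era versions
    clf = LinearSVC(class_weight='balanced')
    clf.fit(X, y)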
I'm using scikit-learn in my Python program to perform some
machine-learning operations. The problem is that my dataset has severe
imbalance issues.
Does anyone know a solution for imbalance in scikit-learn, or in Python in
general?
Thanks :)
2013/2/25 Vlad Niculae:
> This is certainly a case where the default behaviour cannot possibly
> please everybody. I can't think of an application where changing
> tokenization and preprocessing wouldn't help.
>
> For instance you often want to replace all numbers with the same
> token. Possibly y
This is certainly a case where the default behaviour cannot possibly
please everybody. I can't think of an application where changing
tokenization and preprocessing wouldn't help.
For instance you often want to replace all numbers with the same
token. Possibly you want a different token for number…
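One possible sketch of the replace-numbers idea (the NUM placeholder and the
regexp are illustrative choices, not anything scikit-learn ships):

    import re
    from sklearn.feature_extraction.text import CountVectorizer

    def replace_numbers(text):
        # a custom preprocessor replaces the built-in lowercasing,
        # so lowercase here as well
        return re.sub(r"\d+(?:\.\d+)?", "NUM", text.lower())

    vec = CountVectorizer(preprocessor=replace_numbers)
    vec.fit(["It costs 2.50 today", "It cost 3 yesterday"])
    # both 2.50 and 3 collapse into the single token 'NUM'
    print(sorted(vec.vocabulary_))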
2013/2/25 :
> the missing 2 in tokenizing 2.50 is indeed a bit weird, though.
There is definitely a compromise between simplicity (and thus
understandability / maintainability) of the default regexp and
coverage of the most common patterns.
If you have a suggestion for a better yet simple and und…
I guess the tokenizer starts a new word after the dot, and the word
before it (the '2') is not two characters long.
On 25.02.2013 08:21, amuel...@ais.uni-bonn.de wrote:
> the missing 2 in tokenizing 2.50 is indeed a bit weird, though.
>
> Tom Fawcett wrote:
>
> First, thanks for all your grea…
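That matches the default behaviour: the default token_pattern,
r"(?u)\b\w\w+\b", only keeps tokens of two or more word characters, and "."
is a token boundary, so "2.50" splits into "2" (too short, dropped) and "50".
A small sketch of this (the alternative pattern is just one possible choice):

    from sklearn.feature_extraction.text import CountVectorizer

    default = CountVectorizer()  # token_pattern=r"(?u)\b\w\w+\b"
    default.fit(["2.50"])
    print(sorted(default.vocabulary_))  # ['50'] -- the lone '2' is dropped

    # accepting single-character tokens keeps the '2' as well
    keep_short = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
    keep_short.fit(["2.50"])
    print(sorted(keep_short.vocabulary_))  # ['2', '50']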