Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Paolo Losi
On Wed, Jan 25, 2012 at 6:00 PM, Olivier Grisel wrote:
>
> > Once you have clustered the unlabeled samples,
> > you can add, as extra features on the labeled samples,
> > the distance from each cluster center (e.g. computed
> > via RBF kernel).
> > Is that what you are suggesting?
>
> They are more
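The idea quoted above (cluster the unlabeled samples, then describe each labeled sample by its closeness to every cluster center) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, `KMeans` is one possible clustering choice, and `n_clusters` and `gamma` are arbitrary values that are not taken from the thread. Note that an RBF kernel yields a similarity in (0, 1] rather than a distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X_unlabeled = rng.rand(200, 5)  # synthetic stand-in for the unlabeled pool
X_labeled = rng.rand(20, 5)     # synthetic stand-in for the labeled set

n_clusters = 8  # assumption: arbitrary number of clusters
gamma = 1.0     # assumption: arbitrary RBF width

# Step 1: cluster the unlabeled samples only.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
kmeans.fit(X_unlabeled)

# Step 2: RBF similarity of each labeled sample to each cluster center,
# i.e. exp(-gamma * ||x - c||^2); one new column per cluster.
cluster_features = rbf_kernel(X_labeled, kmeans.cluster_centers_, gamma=gamma)

# Step 3: append the new columns to the original labeled features.
X_augmented = np.hstack([X_labeled, cluster_features])
print(X_augmented.shape)
```

A classifier would then be trained on `X_augmented` instead of `X_labeled`; whether this helps depends on how well the clusters reflect the class structure.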

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Olivier Grisel
2012/1/25 Paolo Losi:
> Hi Olivier,
>
> your reply is very informative (as always :-) ).
> I've got a couple of questions for you. See below...
>
> On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel wrote:
>>
>> If you can cheaply collect unsupervised data that looks similar to
>> your training set

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-25 Thread Paolo Losi
Hi Olivier,
your reply is very informative (as always :-) ).
I've got a couple of questions for you. See below...
On Tue, Jan 24, 2012 at 1:57 PM, Olivier Grisel wrote:
>
> If you can cheaply collect unsupervised data that looks similar to
> your training set (albeit without the labels and in much

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Olivier Grisel
Which classifier have you tried? Are you sure you selected the best hyper-parameters with GridSearchCV?
Have you tried to normalize the dataset? For instance have a look at:
http://scikit-learn.org/dev/modules/preprocessing.html
For very sparse data with large variance in the feature, you shou
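As a rough illustration of the two checks asked about above (normalizing the data and selecting hyper-parameters with GridSearchCV), here is a minimal sketch. Everything specific in it is an assumption rather than the advice given in the message, which is cut off in this archive: the data is random, `Normalizer` and `LinearSVC` are just one plausible combination for sparse input, the `C` grid is arbitrary, and the import path `sklearn.model_selection` is that of recent scikit-learn releases.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# Synthetic sparse data: 120 samples, 300 features, about 2% nonzeros.
X = sparse.random(120, 300, density=0.02, format="csr", random_state=rng)
y = rng.randint(0, 2, size=120)  # random binary labels, for illustration only

# Normalizer rescales each sample to unit norm and keeps the matrix sparse.
pipeline = Pipeline([
    ("normalize", Normalizer()),
    ("clf", LinearSVC(max_iter=5000, random_state=0)),
])

# Assumption: an arbitrary grid over the regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Putting the normalization step inside the pipeline means it is re-fitted on each cross-validation training fold, so the reported score is not contaminated by the held-out fold.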

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Philipp Singer
On 15.01.2012 19:45, Gael Varoquaux wrote:
> On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote:
>> The problem is that my representation is very sparse so I have a huge
>> amount of zeros.
> That's actually good: some of our estimators are able to use a sparse
> representation to spe

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-15 Thread Gael Varoquaux
On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote:
> The problem is that my representation is very sparse so I have a huge
> amount of zeros.
That's actually good: some of our estimators are able to use a sparse representation to speed up computation.
> Furthermore the dataset is ske
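To make the point about sparse representations concrete, the sketch below stores the same mostly-zero matrix in dense form and in SciPy's CSR format and compares the memory used. The matrix size (100 samples by 5000 features) and the roughly 1% density are assumptions chosen for illustration, not figures from the thread.

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
X_dense = rng.rand(100, 5000)
X_dense[X_dense < 0.99] = 0.0  # keep roughly 1% of the entries nonzero

# CSR stores only the nonzero values plus their column indices and row offsets.
X_sparse = sparse.csr_matrix(X_dense)

dense_bytes = X_dense.nbytes
sparse_bytes = (X_sparse.data.nbytes
                + X_sparse.indices.nbytes
                + X_sparse.indptr.nbytes)
print(X_sparse.nnz, dense_bytes, sparse_bytes)
```

Many scikit-learn estimators accept such a CSR matrix directly in `fit`, so the zeros never need to be materialized.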

[Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-15 Thread Philipp Singer
Hey guys! I am currently trying to find the best possible classifier for my task. In my case I usually have slightly more features than training examples, and overall about 5000 features. The problem is that my representation is very sparse, so I have a huge amount of zeros. The labels range fr