I misunderstood: your "random data" is resampled from the original data.
btw, for transforming to orthogonal features, see:
http://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_vs_pca.html#example-decomposition-plot-ica-vs-pca-py
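For context, a minimal sketch of one such transformation, using PCA whitening
to decorrelate the features (all names and sizes below are illustrative, not
from the thread):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    # build a toy matrix whose columns are strongly correlated
    X = np.dot(rng.randn(500, 10), rng.randn(10, 10))

    pca = PCA(whiten=True)
    X_white = pca.fit_transform(X)

    # after whitening the columns are decorrelated: covariance ~ identity
    print(np.round(np.cov(X_white, rowvar=False), 2))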
2014-03-17 16:40 GMT+08:00 Caleb :
> > Hi, please repeat your experiments several times because both Random
> > Forest Embedding and your resampling are not deterministic algorithms;
> > their results may vary a lot. Please also check the attributes
> > "n_support_" and "support_vectors_" in the object, they're highly
> > related to time complexity.
> I transform my data using different transformations T1 and T2 and then feed
> it into the LinearSVC. What I found is that the classifier is trained
> significantly faster with the data transformed using T2.
>
> Since both transformed datasets have the same number of instances, we are
> looking at factors…
Hi, please repeat your experiments several times because both Random Forest
Embedding and your resampling are not deterministic algorithms; their
results may vary a lot.
Please also check the attributes "n_support_" and "support_vectors_" in the
object; they're highly related to time complexity.
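A minimal sketch of that check, assuming a libsvm-based SVC (n_support_ and
support_vectors_ are attributes of SVC, not LinearSVC) and using bootstrap
resampling as a stand-in for the non-deterministic pipeline:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, random_state=0)

    for seed in range(5):
        # resample with replacement, as in the experiment discussed above
        rng = np.random.RandomState(seed)
        idx = rng.randint(0, len(X), size=len(X))
        clf = SVC(kernel="rbf").fit(X[idx], y[idx])
        # the number of support vectors varies across runs and is one of
        # the main drivers of libsvm's training and prediction cost
        print(seed, clf.n_support_)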
Hi Olivier,
Thanks for your advice. Maybe I should rephrase my question. The basic
situation is shown below.
        -- T1 --> LinearSVC (longer training time, about 30s)
data --|
        -- T2 --> LinearSVC (significantly shorter training time, about 3s)
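A rough sketch of how one could time the two branches; T1 and T2 below are
hypothetical stand-ins (a random trees embedding and PCA), not the actual
transformations from the thread:

    import time
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomTreesEmbedding
    from sklearn.svm import LinearSVC

    X_train, y_train = make_classification(n_samples=5000, n_features=50,
                                           random_state=0)

    # hypothetical stand-ins for the two transformations
    transforms = [("T1", RandomTreesEmbedding(n_estimators=100, random_state=0)),
                  ("T2", PCA(n_components=30))]

    for name, transform in transforms:
        X_t = transform.fit_transform(X_train)
        t0 = time.time()
        LinearSVC().fit(X_t, y_train)
        print("%s: LinearSVC trained in %.2fs" % (name, time.time() - t0))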
Also don't forget to grid search the regularizer hyperparameter for each model.
--
Olivier
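A minimal sketch of such a grid search over LinearSVC's regularizer C (toy
data and illustrative values):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=2000, random_state=0)

    # search the regularization strength C on a log scale
    grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)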
The SVC implementation in scikit-learn is based on the SMO
implementation of libsvm, which has a training complexity that is more
than quadratic in the number of samples.
To train linear SVMs on medium to large datasets (in terms of
n_samples), you'd rather use LinearSVC (based on liblinear) or
SGDClassifier(loss="hinge").
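For illustration, a small sketch contrasting the two suggested alternatives
on toy data; SGDClassifier(loss="hinge") optimizes a linear-SVM objective by
stochastic gradient descent, so its cost grows linearly with n_samples:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=20000, n_features=100, random_state=0)

    # liblinear-based solver for the linear SVM objective
    liblinear_clf = LinearSVC(C=1.0).fit(X, y)
    # SGD on the hinge loss: approximate, but scales to very large n_samples
    sgd_clf = SGDClassifier(loss="hinge", alpha=1e-4).fit(X, y)

    print(liblinear_clf.score(X, y), sgd_clf.score(X, y))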
I am currently playing with the RandomForestEmbedding in scikit-learn using the
MNIST data. I train the RandomForestEmbedding using a random matrix of shape
(1000, 784), use it to transform the MNIST data, and then feed that into a
linear SVM. The training time of the SVM is about 37s and the accuracy…
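A sketch of that pipeline as I understand it; in scikit-learn the class is
RandomTreesEmbedding, and the digits dataset stands in for MNIST here so the
example stays small:

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomTreesEmbedding
    from sklearn.svm import LinearSVC

    digits = load_digits()
    rng = np.random.RandomState(0)
    # analogous to the (1000, 784) random matrix used for MNIST
    random_data = rng.rand(1000, digits.data.shape[1])

    embedder = RandomTreesEmbedding(n_estimators=100, random_state=0)
    embedder.fit(random_data)                     # fit on random data only
    X_embedded = embedder.transform(digits.data)  # transform the real data

    clf = LinearSVC().fit(X_embedded, digits.target)
    print(clf.score(X_embedded, digits.target))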
2014-03-02 16:33 GMT+01:00 Olivier Grisel :
> Inverting 0 and 1 does not change the problem mathematically

Actually I think it does change it a bit, because Liblinear regularizes
the intercept. Imagine a space of two features and two classes,
clustered around (0,0) and (1,1). Unless the decision line…
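A rough numerical check of this point (a sketch, not from the thread): fit
LinearSVC on two clusters around (0,0) and (1,1), then on the inverted
features, and compare the fitted coefficients and intercepts.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) * 0.1,         # class 0 around (0, 0)
                   rng.randn(100, 2) * 0.1 + 1.0])  # class 1 around (1, 1)
    y = np.array([0] * 100 + [1] * 100)

    clf_a = LinearSVC(C=1.0).fit(X, y)
    clf_b = LinearSVC(C=1.0).fit(1.0 - X, y)  # 0/1-inverted features

    # because liblinear penalizes the intercept, inversion is not a pure
    # symmetry of the optimization problem; compare the two solutions
    print(clf_a.coef_, clf_a.intercept_)
    print(clf_b.coef_, clf_b.intercept_)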
Inverting 0 and 1 does not change the problem mathematically, but it does
change the number of multiplication and addition operations performed when
computing the dot products or Euclidean norms involved in the computation
of the columns of the kernel matrix when a sparse representation of the
input data is used.
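A small sketch of that sparsity effect (illustrative density values): with
CSR storage, dot products only touch stored non-zeros, so the mostly-zero
version is far cheaper than its inverted, mostly-one counterpart.

    import numpy as np
    import scipy.sparse as sp

    rng = np.random.RandomState(0)
    # ~10% ones, roughly like thresholded MNIST digits on a black background
    X = (rng.rand(1000, 784) < 0.1).astype(np.float64)

    X_sparse = sp.csr_matrix(X)
    X_inv_sparse = sp.csr_matrix(1.0 - X)

    print(X_sparse.nnz)      # ~78,000 stored values
    print(X_inv_sparse.nnz)  # ~706,000 stored values: ~9x more work per dot product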
> Interesting trick. Unfortunately that does not work anymore on
> non-thresholded gray level pixels that are more common in non-toy
> computer vision datasets.

Sparse coding can do a similar trick for those datasets.

> LinearSVC will convert to Liblinear's custom sparse format
> internally…
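A hypothetical sketch of that sparse-coding idea: encode gray-level inputs
against a learned dictionary so that most coefficients come out zero (the
digits dataset and all parameter values below are just for illustration).

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import MiniBatchDictionaryLearning

    X = load_digits().data  # gray-level pixel intensities

    # learn a dictionary and encode each sample as a sparse combination
    dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                       random_state=0)
    codes = dico.fit_transform(X)

    print(np.mean(codes != 0))  # fraction of non-zero coefficients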
Interesting trick. Unfortunately that does not work anymore on
non-thresholded gray level pixels that are more common in non-toy
computer vision datasets.
--
Olivier
2014-02-28 15:42 GMT+01:00 Caleb :
> I am training my SVM with the raw pixel values of the mnist dataset.
> Just for fun, I round all the pixel values to either 1 or 0, and this I
> will call dataset1. Then I invert the 1 to 0 and 0 to 1 to form dataset2.
> Thus dataset1 will consist of mainly zeros while dataset2 consists of
> mainly ones…
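A sketch of that dataset construction (the digits dataset stands in for
MNIST; its pixel values are scaled to [0, 1] before rounding):

    import numpy as np
    from sklearn.datasets import load_digits

    X = load_digits().data / 16.0   # gray levels in [0, 1]
    dataset1 = np.round(X)          # threshold to {0, 1}: mostly zeros
    dataset2 = 1.0 - dataset1       # inverted: mostly ones

    print(dataset1.mean(), dataset2.mean())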