Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-17 Thread He-chien Tsai
I misunderstood: I thought your "random data" was resampled from the original data. btw, for transforming to orthogonal features: http://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_vs_pca.html#example-decomposition-plot-ica-vs-pca-py 2014-03-17 16:40 GMT+08:00 Caleb : > Hi, please repeat
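The linked example compares PCA and ICA projections. As a minimal sketch of the "orthogonal features" idea (the data shape and mixing matrix below are made up for illustration), PCA with `whiten=True` produces uncorrelated, unit-variance output features:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# correlated 2-D toy data (shapes and mixing matrix are illustrative)
X = rng.randn(500, 2) @ np.array([[2.0, 0.0], [1.0, 0.5]])

# whiten=True rescales the projected components to unit variance,
# so the transformed features are orthogonal (uncorrelated)
X_white = PCA(whiten=True).fit_transform(X)
cov = np.cov(X_white, rowvar=False)
print(np.round(cov, 2))  # close to the identity matrix
```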

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-17 Thread Caleb
> Hi, please repeat your experiments several times because both RandomForestEmbedding and your resampling are not deterministic algorithms; their results may vary a lot. Please also check the attributes "n_support_" and "support_vectors_" in the object; they're highly related to time complexity

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-17 Thread Alexandre Gramfort
> I transform my data using two different transformations, T1 and T2, and then feed it into LinearSVC. What I found is that the classifier is trained significantly faster with the data transformed using T2. Since both transformed datasets have the same number of instances, we are looking at facto

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-16 Thread He-chien Tsai
Hi, please repeat your experiments several times because both RandomForestEmbedding and your resampling are not deterministic algorithms; their results may vary a lot. Please also check the attributes "n_support_" and "support_vectors_" in the object; they're highly related to time complexity
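For reference, `n_support_` and `support_vectors_` are attributes of the kernel `SVC` (libsvm-based) estimator, not of `LinearSVC`. A minimal sketch of inspecting them (the dataset here is synthetic, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

# per-class support-vector counts and the vectors themselves;
# more support vectors generally means higher time complexity
print(clf.n_support_)
print(clf.support_vectors_.shape)
```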

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-16 Thread Caleb
Hi Olivier, Thanks for your advice. Maybe I should rephrase my question. The basic situation is shown below.

         T1 --> LinearSVC (longer training time, about 30s)
data --<
         T2 --> LinearSVC (significantly shorter training time, about 3s)

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-16 Thread Olivier Grisel
Also don't forget to grid search the regularizer hyperparameter for each model. -- Olivier
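For LinearSVC the regularizer hyperparameter is `C` (smaller `C` means stronger regularization). A minimal sketch using the modern scikit-learn import path (in the 2014-era versions of this thread, `GridSearchCV` lived in `sklearn.grid_search`; the dataset and grid values below are made up):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)

# search over the regularization strength C with 3-fold cross-validation
grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```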

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-16 Thread Olivier Grisel
The SVC implementation in scikit-learn is based on the SMO implementation of libsvm that has a complexity more than quadratic with the number of samples. To train linear SVMs on medium to large datasets (in terms of n_samples), you'd rather use LinearSVC (based on liblinear) or SGDClassifier(loss=
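A minimal sketch of the two recommended alternatives for large `n_samples` (the synthetic dataset is just for illustration; `loss="hinge"` is what makes `SGDClassifier` behave like a linear SVM):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)

# liblinear-based solver: complexity roughly linear in n_samples
linear_svm = LinearSVC().fit(X, y)

# hinge loss turns SGDClassifier into an online linear SVM,
# which scales to datasets too large for libsvm's SMO solver
sgd_svm = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

print(linear_svm.score(X, y), sgd_svm.score(X, y))
```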

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-16 Thread Caleb
I am currently playing with the RandomForestEmbedding in scikit-learn using the MNIST data. I train the RandomForestEmbedding using a random matrix of shape (1000, 784) and use it to transform the MNIST data, then feed it into a linear SVM. The training time of the SVM is about 37s and the accurac
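A sketch of the setup described above; the class in current scikit-learn is `RandomTreesEmbedding`, and the random matrices below stand in for the real MNIST pixel data:

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

rng = np.random.RandomState(0)
X_fit = rng.rand(1000, 784)    # random matrix used to fit the embedding
X_mnist = rng.rand(200, 784)   # stand-in for the MNIST pixel matrix

embed = RandomTreesEmbedding(n_estimators=10, random_state=0).fit(X_fit)
X_sparse = embed.transform(X_mnist)  # sparse one-hot leaf indicators

# exactly one leaf fires per tree, so nnz = n_samples * n_estimators
print(X_sparse.shape, X_sparse.nnz)
```

The transformed matrix is high-dimensional but very sparse, which is what makes the downstream LinearSVC fast.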

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-02 Thread Lars Buitinck
2014-03-02 16:33 GMT+01:00 Olivier Grisel : > Inverting 0 and 1 does not change the problem mathematically. Actually I think it does change a bit, because Liblinear regularizes the intercept. Imagine a space of two features and two classes, clustered around (0,0) and (1,1). Unless the decision line
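Lars's point can be poked at with a toy sketch (all shapes and values below are made up): two clusters around (0,0) and (1,1); inverting the features as x -> 1 - x maps one cluster onto the other, and with an unregularized intercept the two fits would be exact mirror images, but liblinear's intercept penalty can make them differ slightly:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
a = rng.randn(100, 2) * 0.1          # cluster near (0, 0)
b = rng.randn(100, 2) * 0.1 + 1.0    # cluster near (1, 1)
X = np.vstack([a, b])
y = np.array([0] * 100 + [1] * 100)

clf = LinearSVC(C=1.0).fit(X, y)          # original encoding
clf_inv = LinearSVC(C=1.0).fit(1 - X, y)  # 0/1-inverted features

# compare weights and intercepts of the two fits
print(clf.coef_, clf.intercept_)
print(clf_inv.coef_, clf_inv.intercept_)
```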

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-02 Thread Olivier Grisel
Inverting 0 and 1 does not change the problem mathematically but does change the number of multiplication and addition operations performed when computing the dot products or Euclidean norms involved in the computation of columns of the kernel matrix when a sparse representation of the input data is used
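A sparse dot product only iterates over the stored nonzeros, so the work per kernel entry is proportional to nnz. A stdlib-only illustration (the ~10% density is made up):

```python
# a mostly-zero binary "image" row and its 0/1-inverted copy
row = [1 if i % 10 == 0 else 0 for i in range(784)]  # ~10% nonzero
row_inv = [1 - v for v in row]                       # ~90% nonzero

# a CSR-style dot product walks only the stored nonzeros, so the
# cost per kernel entry tracks these counts
nnz = sum(row)
nnz_inv = sum(row_inv)
print(nnz, nnz_inv)  # 79 vs 705
```

Flipping the encoding multiplies the stored entries, and hence the arithmetic per dot product, by roughly 9x here.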

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-03-02 Thread Caleb
> Interesting trick. Unfortunately that does not work anymore on non-thresholded gray-level pixels that are more common in non-toy computer vision datasets.

Sparse coding can do a similar trick for those datasets.

> LinearSVC will convert to Liblinear's custom sparse format internally

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-02-28 Thread Olivier Grisel
Interesting trick. Unfortunately that does not work anymore on non-thresholded gray level pixels that are more common in non-toy computer vision datasets. -- Olivier

Re: [Scikit-learn-general] Faster SVM training time with sparse input

2014-02-28 Thread Lars Buitinck
2014-02-28 15:42 GMT+01:00 Caleb : > I am training my SVM with the raw pixel values of the MNIST dataset. Just for fun, I round all the pixel values to either 1 or 0, and this I will call dataset1. Then I invert the 1s to 0s and 0s to 1s to form dataset2. Thus dataset1 will consist of mainly
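The binarize-and-invert trick described above can be sketched as follows (the random matrix stands in for real MNIST gray levels, and the 0.5 threshold is an assumption; on real MNIST, dataset1 is mostly zeros because the background dominates):

```python
import numpy as np

rng = np.random.RandomState(0)
pixels = rng.rand(5, 784)                    # stand-in for raw MNIST gray levels

dataset1 = (pixels >= 0.5).astype(np.int64)  # threshold pixels to {0, 1}
dataset2 = 1 - dataset1                      # invert: swap the 0s and 1s

# the two datasets carry the same information but complementary sparsity
print(int(dataset1.sum()), int(dataset2.sum()))
```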