Re: [scikit-learn] anti-correlated predictions by SVR

2017-09-26 Thread Sebastian Raschka
I'd agree with Gael that a potential explanation could be the distribution shift upon splitting (usually the smaller the dataset, the more of an issue this is). As potential solutions/workarounds, you could try a) stratified sampling for regression, if you'd like to stick with the 2-way holdout
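A common way to approximate stratified sampling for a continuous target is to bin y and stratify the split on the bin labels; a minimal sketch on synthetic data (the bin count and all names are illustrative, not from the thread):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for a small regression dataset
    rng = np.random.RandomState(0)
    X = rng.randn(40, 5)
    y = X[:, 0] + 0.1 * rng.randn(40)

    # Bin the continuous target so the split preserves its distribution;
    # with very few samples, keep the number of bins small.
    bin_edges = np.percentile(y, [25, 50, 75])
    bins = np.digitize(y, bin_edges)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=bins, random_state=0)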

Re: [scikit-learn] anti-correlated predictions by SVR

2017-09-26 Thread Gael Varoquaux
I took my example in classification for didactic purposes. My hypothesis still holds that the splitting of the data creates anti-correlations between train and test (a depletion effect). Basically, you shouldn't work with datasets that small. Gaël
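The same depletion effect is easy to see in a regression setting: under leave-one-out, holding out an above-average y value pulls the training mean down, and vice versa, so even a plain mean predictor tracks the truth negatively. A small illustration on synthetic data (not the dataset from the thread):

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.dummy import DummyRegressor
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    rng = np.random.RandomState(0)
    y = rng.randn(16)                # a tiny sample, as in the thread
    X = np.zeros((16, 1))            # deliberately uninformative features

    # Each leave-one-out fold predicts the mean of the remaining 15 values,
    # so the prediction is (sum(y) - y_i) / 15: a decreasing function of y_i.
    pred = cross_val_predict(DummyRegressor(strategy="mean"), X, y,
                             cv=LeaveOneOut())
    print(pearsonr(y, pred))         # Pearson r is exactly -1 here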

Re: [scikit-learn] anti-correlated predictions by SVR

2017-09-26 Thread Thomas Evangelidis
I have very small training sets (10-50 observations). Currently I am working with 16 observations for training and 25 for validation (an external test set), and I am doing regression, not classification (hence the SVR instead of SVC).

Re: [scikit-learn] anti-correlated predictions by SVR

2017-09-26 Thread Gael Varoquaux
Hypothesis: you have a very small dataset, and when you leave out data you create a distribution shift between the train and the test sets. A simplified example: 20 samples, 10 of class a, 10 of class b. A leave-one-out cross-validation will create a training set with 10 samples of one class and 9 samples of the other, so the left-out sample always belongs to the class that is under-represented in training.
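A runnable version of this simplified example, using a majority-class baseline to make the effect visible (the baseline and data are illustrative):

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    # 20 samples: 10 of class a (0) and 10 of class b (1), no real features
    y = np.array([0] * 10 + [1] * 10)
    X = np.zeros((20, 1))

    # In every leave-one-out fold the held-out class has 9 training samples
    # against 10 of the other class, so a majority-class baseline is always
    # wrong: accuracy is 0.0 instead of the 0.5 you might expect.
    pred = cross_val_predict(DummyClassifier(strategy="most_frequent"), X, y,
                             cv=LeaveOneOut())
    print((pred == y).mean())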

[scikit-learn] anti-correlated predictions by SVR

2017-09-26 Thread Thomas Evangelidis
Greetings, I don't know if anyone has encountered this before, but sometimes I get anti-correlated predictions by the SVR that I am training. Namely, Pearson's R and Kendall's tau are negative when I compare the predictions on the external test set with the true values. However, the SVR prediction…
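For context, a sketch of that kind of evaluation on synthetic data of the same shape as described elsewhere in the thread (16 training and 25 external test observations; names and hyperparameters are illustrative, not the original setup):

    import numpy as np
    from scipy.stats import kendalltau, pearsonr
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    # Synthetic stand-in: 16 training observations, 25 in the external test set
    rng = np.random.RandomState(0)
    X = rng.randn(41, 10)
    y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.randn(41)
    X_train, y_train = X[:16], y[:16]
    X_ext, y_ext = X[16:], y[16:]

    # SVR is sensitive to feature scale, so scale inside a pipeline
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_ext)

    # Negative values here are the "anti-correlated predictions" symptom
    r, _ = pearsonr(y_ext, y_pred)
    tau, _ = kendalltau(y_ext, y_pred)
    print(f"Pearson r = {r:.3f}, Kendall tau = {tau:.3f}")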