I'd agree with Gael that a potential explanation could be the distribution
shift upon splitting (usually, the smaller the dataset, the more of an issue
this is). As potential solutions/workarounds, you could try
a) stratified sampling for regression, if you'd like to stick with the 2-way
holdout (a sketch follows below)
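
Something along these lines could work as a starting point (a minimal
sketch with placeholder data; the bin count and variable names are
assumptions, not taken from your setup):

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and continuous target (swap in your own arrays).
rng = np.random.RandomState(0)
X = rng.randn(41, 5)
y = rng.randn(41)

# Bin the continuous target into quantiles and stratify on the bins,
# so that train and test see a similar spread of target values.
n_bins = 4
edges = np.percentile(y, np.linspace(0, 100, n_bins + 1)[1:-1])
strata = np.digitize(y, edges)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=strata, random_state=0)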
I took my example in classification for didactic purposes. My hypothesis still
holds that the splitting of the data creates anti-correlations between train
and test (a depletion effect).
Basically, you shouldn't work with datasets that small.
Gaël
I have very small training sets (10-50 observations). Currently, I am
working with 16 observations for training and 25 for validation (external
test set). And I am doing Regression, not Classification (hence the SVR
instead of SVC).
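
For concreteness, the setup described amounts to something like this (a
sketch with placeholder data and default SVR parameters, not the actual
pipeline):

import numpy as np
from scipy.stats import kendalltau, pearsonr
from sklearn.svm import SVR

# Placeholder arrays mimicking 16 training and 25 external test observations.
rng = np.random.RandomState(0)
X_train, y_train = rng.randn(16, 5), rng.randn(16)
X_test, y_test = rng.randn(25, 5), rng.randn(25)

pred = SVR().fit(X_train, y_train).predict(X_test)

# Negative values here would be the anti-correlation in question
# (meaningless on this random placeholder data, of course).
print(pearsonr(pred, y_test))
print(kendalltau(pred, y_test))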
On 26 September 2017 at 18:21, Gael Varoquaux wrote:
Hypothesis: you have a very small dataset and when you leave out data,
you create a distribution shift between the train and the test. A
simplified example: 20 samples, 10 class a, 10 class b. A leave-one-out
cross-validation will create a training set of 10 samples of one class, 9
samples of the other: the class of the held-out sample is always slightly
under-represented in training, so the training distribution shifts away from
the test point.
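
To see this depletion effect numerically, a quick sketch (using
scikit-learn's LeaveOneOut purely to illustrate the point):

import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy labels: 10 samples of class 0 and 10 of class 1.
y = np.array([0] * 10 + [1] * 10)

train_frac_class1 = []   # fraction of class 1 in each training set
test_label = []          # label of the corresponding held-out sample

for train_idx, test_idx in LeaveOneOut().split(y):
    train_frac_class1.append(y[train_idx].mean())
    test_label.append(y[test_idx][0])

# The correlation is -1 (up to floating point): whenever the held-out
# sample is class 1, class 1 is under-represented in training
# (9/19 instead of 10/19), and vice versa.
print(np.corrcoef(train_frac_class1, test_label)[0, 1])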
Greetings,
I don't know if anyone encountered this before, but sometimes I get
anti-correlated predictions by the SVR that I am training. Namely, the
Pearson's R and Kendall's tau are negative when I compare the predictions
on the external test set with the true values. However, the SVR prediction