Hi all,

I want to split a dataset into k partitions (folds) for k-fold cross-validation. The folds should be stratified, not according to a single categorical feature but according to several features at once (numeric ones, and ideally a mix of numeric and categorical ones). Does anybody know a way to do this?
So far I have only found a way to do this for a single split (a training/test split), using the package sampling. I paste the example code for that training/test split below to make clear what I am looking for.

With best regards,
Martin

Example code:

    library("sampling")
    # Drop the iris Species column: this method only works for numeric features, which is fine here.
    data <- as.matrix(iris[1:4])
    prob <- 0.3  # probability of being selected into the test set
    samplecube(data, pik = rep(prob, times = nrow(data)), order = 2)

Output (abridged):

    [...]
    QUALITY OF BALANCING
                 TOTALS HorvitzThompson_estimators Relative_deviation
    Sepal.Length  876.5                   874.6667        -0.20916524
    Sepal.Width   458.6                   458.3333        -0.05814799
    Petal.Length  563.7                   563.3333        -0.06504642
    Petal.Width   179.9                   178.6667        -0.68556606
      [1] 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1
     [38] 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0
     [75] 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
    [112] 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1
    [149] 0 0

-- 
Dipl.-Inf. Martin Gütlein
Phone: +49 (0)761 203 7633 (office)
       +49 (0)177 623 9499 (mobile)
Email: guetl...@informatik.uni-freiburg.de
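One possible workaround for the k-fold case, sketched here under stated assumptions: this is not cube sampling and does not use the sampling package. It swaps in a different technique: the rows are first grouped into clusters with base R's `kmeans` on the standardised numeric features, and those clusters then serve as strata. The helper name `stratified_folds` and the default `n_strata = 10` are hypothetical choices for illustration. Rows are ordered by cluster (in random order inside each cluster), and fold labels are dealt cyclically along that ordering. As a result, every cluster is spread as evenly as possible over the k folds, and the overall fold sizes differ by at most one.

```r
# Hypothetical helper (not part of the 'sampling' package): assign rows to k
# folds, stratified on several numeric features by using k-means clusters
# as strata.
stratified_folds <- function(x, k = 5, n_strata = 10) {
  x <- scale(as.matrix(x))   # standardise so no feature dominates distances
  n <- nrow(x)
  # Cluster the rows; each cluster is treated as one stratum.
  strata <- kmeans(x, centers = n_strata, nstart = 10)$cluster
  # Group rows by stratum, with a random order inside each stratum.
  ord <- order(strata, runif(n))
  # Deal fold labels cyclically along that ordering. A random permutation
  # of 1..k decides which folds receive any leftover rows of a stratum.
  labels <- sample.int(k)[rep_len(seq_len(k), n)]
  folds <- integer(n)
  folds[ord] <- labels
  folds
}

set.seed(42)
folds <- stratified_folds(iris[1:4], k = 5, n_strata = 10)
table(folds)   # 150 rows / 5 folds = 30 rows per fold
# Per-fold feature means, to compare against the overall means.
aggregate(iris[1:4], by = list(fold = folds), FUN = mean)
```

This only approximates the balancing that `samplecube` reports on the feature totals, and the number of strata is an arbitrary tuning choice. Categorical features might be included by one-hot encoding them (for example with `model.matrix`) before clustering, but that is an untested assumption.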