Hello, I have written a user-defined split function for the rpart package, and now I want to prune the fitted tree with a pruning function of my own. To do so, I first want to grow a large tree with rpart and then apply my function to prune it. The problem is that growing the tree with the user-defined split function and the complexity parameter set to 0 (cp=0) gives me a smaller tree than when I set the complexity parameter to 0.01 (the default value). My question is: which node values does rpart use in order to 'prune' the tree? I first thought it was only the 'deviance' value returned by the evaluation function, but I am no longer sure about that.
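As a quick check on the 'deviance' idea, here is a small sketch that uses only numbers copied by hand from the two printouts in this post. The variable names (root_dev, leaves_fit1, leaves_fit2, cp_first) are made up for illustration and are not part of rpart. It compares the summed deviance of the terminal nodes, relative to the root deviance, with the last 'rel error' entry of each cptable, and the relative drop in deviance at the first split with the first CP entry.

```r
# Deviances copied by hand from the printouts in this post.
root_dev <- 3043.517

# Terminal nodes of fit1 (cp = 0.01): nodes 8, 18, 38, 39, 5, 6, 14, 30, 62, 63
leaves_fit1 <- c(126.3924, 214.2196, 102.2434, 161.5116, 454.6036,
                 660.9713, 342.9626, 200.8669, 109.3010, 186.8194)

# Terminal nodes of fit2 (cp = 0): nodes 32, 33, 17, 9, 5, 3
leaves_fit2 <- c(4.888716e-10, 46.11536, 56.07449,
                 541.8522, 454.6036, 1698.971)

# Summed terminal-node deviance relative to the root deviance
rel_fit1 <- sum(leaves_fit1) / root_dev
rel_fit2 <- sum(leaves_fit2) / root_dev

# Drop in deviance at the first split (root into nodes 2 and 3),
# again relative to the root deviance
cp_first <- (root_dev - (1231.571 + 1698.971)) / root_dev

print(c(rel_fit1 = rel_fit1, rel_fit2 = rel_fit2, cp_first = cp_first))
```

The three values come out close to 0.8411, 0.9192 and 0.0371, which agree with the final 'rel error' of each cptable and with the first CP entry. So the cptable at least appears to be built from the node deviances, but this alone does not explain why the tree stops growing earlier with cp=0.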
Example output of two trees (same data and split functions, different cp):

load('alist.R')   # user defined split function

> fit1 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6,datTrain,control=list(cp=0.01),
+               method=alist)
n= 3042

node), split, n, deviance, yval
      * denotes terminal node

 1) root 3042 3043.5170 2
   2) x3=2,3 1036 1231.5710 1
     4) x2=2,3,4,5 556 704.0214 1
       8) x6< 23.5 118 126.3924 1 *
       9) x6>=23.5 438 541.8522 1
        18) x1=1 164 214.2196 1 *
        19) x1=0 274 295.5250 2
          38) x5< 30.5 81 102.2434 1 *
          39) x5>=30.5 193 161.5116 3 *
     5) x2=0,1 480 454.6036 3 *
   3) x3=0,1 2006 1698.9710 3
     6) x6< 23.5 596 660.9713 2 *
     7) x6>=23.5 1410 978.9448 3
      14) x5< 19.5 323 342.9626 2 *
      15) x5>=19.5 1087 567.2176 4
        30) x1=0 633 200.8669 5 *
        31) x1=1 454 329.6705 3
          62) x2=0,1,3 254 109.3010 4 *
          63) x2=2,4,5 200 186.8194 3 *

> fit1$cptable
          CP nsplit rel error
1 0.03712005      0 1.0000000
2 0.02396757      1 0.9628800
3 0.02099862      2 0.9389124
4 0.01205192      4 0.8969151
5 0.01175506      5 0.8848632
6 0.01102346      6 0.8731082
7 0.01054950      7 0.8620847
8 0.01043856      8 0.8515352
9 0.01000000      9 0.8410966

> fit2 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6,datTrain,control=list(cp=0),
+               method=alist)
n= 3042

node), split, n, deviance, yval
      * denotes terminal node

 1) root 3042 3.043517e+03 2
   2) x3=2,3 1036 1.231571e+03 1
     4) x2=2,3,4,5 556 7.040214e+02 1
       8) x6< 23.5 118 1.263924e+02 1
        16) x5< 42.5 73 5.778729e+01 1
          32) x1=1 31 4.888716e-10 1 *
          33) x1=0 42 4.611536e+01 1 *
        17) x5>=42.5 45 5.607449e+01 1 *
       9) x6>=23.5 438 5.418522e+02 1 *
     5) x2=0,1 480 4.546036e+02 3 *
   3) x3=0,1 2006 1.698971e+03 3 *

> fit2$cptable
           CP nsplit rel error
1 0.037120046      0 1.0000000
2 0.023967574      1 0.9628800
3 0.011755057      2 0.9389124
4 0.004117156      3 0.9271573
5 0.003835016      4 0.9230402
6 0.000000000      5 0.9192052

Thank you
Peter Mayer