I'd like to train a decision tree on a set of weighted data points. I looked into the rpart package, which builds trees but doesn't seem to offer a way to weight the input points. (There is a weights parameter, but it seems to correspond to output classes rather than to input points.)
I'm making do for now by preprocessing my input data by adding multiple instances of each data point corresponding to its weight before feeding to rpart. But I worry this tricks the cross-validation phase of the rpart building process into thinking a model generalizes better than it really does. This is because a heavily-weighted point can be included in both the training and testing set of a cross validation split. Is there a better way to achieve my goal?
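To make the concern concrete, here is a minimal base-R sketch of the replication workaround and of how copies of one point can end up in different folds. It does not call rpart at all; the names dat, w, id, and k are made up for illustration, and the weights are assumed to be small positive integers. The last part shows one possible way to keep all copies of a point in the same fold, by drawing folds per original point rather than per replicated row.

```r
# Illustration only: dat, w, id and k are hypothetical names, not rpart objects.
set.seed(1)
n <- 20
dat <- data.frame(id = seq_len(n), x = rnorm(n))
w <- sample(1:5, n, replace = TRUE)      # assumed integer weights

# The workaround: repeat each row as many times as its weight.
expanded <- dat[rep(seq_len(n), times = w), ]

k <- 5
# Naive folds: every replicated row gets its own randomly assigned fold.
naive_fold <- sample(rep_len(seq_len(k), nrow(expanded)))

# For each original point, count how many distinct folds its copies landed in.
naive_folds_per_id <- tapply(naive_fold, expanded$id,
                             function(f) length(unique(f)))
# Points counted here have copies in more than one fold, so in some split
# the same point sits on both the training and the testing side.
leaked <- sum(naive_folds_per_id > 1)

# Grouped folds: draw one fold per original point, then copy it to the replicas.
id_fold <- sample(rep_len(seq_len(k), n))
grouped_fold <- id_fold[expanded$id]     # ids are 1..n, so they index id_fold
grouped_folds_per_id <- tapply(grouped_fold, expanded$id,
                               function(f) length(unique(f)))
```

With the grouped assignment every original point maps to exactly one fold, so a held-out fold never contains a copy of a training point. Whether such a fold vector can be passed to rpart's own cross-validation is not shown here and would need checking against the rpart documentation.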