I used the SMOTE (Synthetic Minority Oversampling Technique) algorithm in R for class balancing. My data set has 13000 rows, and 7% of them belong to the minority class. After applying SMOTE, the minority-class share rose to 42% and the sample has 12655 rows. Now I need to fit a logistic regression to this data set, and for that I need to split the sample for cross-validation and testing. I tried two approaches:
a.) Train the model on the sample obtained after SMOTE and test it on the original sample of 13000 rows.
b.) Divide the sample obtained after SMOTE into a training set and a test set, and do both the fitting and the testing on this data set only.

With the first approach my results might be skewed. Which approach should I take, and why?

--
Vijay Goel
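For comparison, a commonly used ordering splits the original rows first and applies SMOTE only to the training part. This keeps every test row real and keeps the test set at the original ~7% minority share, so no synthetic row interpolated from training points ends up being scored. The sketch below is in Python/NumPy rather than R and uses simulated data. The helpers `stratified_split` and `smote_oversample` are hypothetical illustrations of the SMOTE idea, not the API of any R package. The 70/30 split, k = 5 and Euclidean distance on numeric features are also assumptions. Each synthetic row is placed at a random point on the segment between a minority row and one of its k nearest minority-class neighbours.

```python
import numpy as np

# Hypothetical helpers and simulated data, for illustration only (not an R
# package API). Assumes numeric features and Euclidean distance.

def stratified_split(y, test_frac, rng):
    """Return (train_idx, test_idx) so each class keeps its share in both parts."""
    train, test = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(test_frac * idx.size))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(train), np.array(test)

def smote_oversample(X, y, minority, target_share, k, rng):
    """SMOTE-style oversampling: add synthetic minority rows until the minority
    share reaches target_share. Each synthetic row lies on the segment between
    a minority row and one of its k nearest minority-class neighbours."""
    X_min = X[y == minority]
    n_min, n_all = X_min.shape[0], X.shape[0]
    if n_min < 2:
        raise ValueError("need at least two minority rows")
    k = min(k, n_min - 1)
    # Solve (n_min + n_new) / (n_all + n_new) = target_share for n_new
    n_new = max(0, int(np.ceil((target_share * n_all - n_min) / (1 - target_share))))
    # Pairwise distances among minority rows; a row is not its own neighbour
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    a = rng.integers(n_min, size=n_new)        # base minority rows
    b = nn[a, rng.integers(k, size=n_new)]     # one random near neighbour of each
    gap = rng.random((n_new, 1))               # interpolation weight in [0, 1)
    X_new = X_min[a] + gap * (X_min[b] - X_min[a])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])

# Simulated stand-in for a 13000-row data set with about 7% minority class
rng = np.random.default_rng(1)
y = (rng.random(13000) < 0.07).astype(int)
X = rng.normal(size=(13000, 3)) + y[:, None]

# 1) Split the ORIGINAL rows first (70/30 is an arbitrary choice here)
train_idx, test_idx = stratified_split(y, 0.3, rng)
# 2) Oversample the training part only, to about 42% minority
X_tr, y_tr = smote_oversample(X[train_idx], y[train_idx], minority=1,
                              target_share=0.42, k=5, rng=rng)
# 3) The test part keeps only real rows and the original class ratio
X_te, y_te = X[test_idx], y[test_idx]
# Fit the logistic regression on (X_tr, y_tr) and evaluate it on (X_te, y_te).
print(f"train minority share: {y_tr.mean():.3f}, test minority share: {y_te.mean():.3f}")
```

Cross-validation can follow the same rule: oversample inside each training fold only and score the untouched held-out fold. This arrangement differs from both a.) and b.) above, because the test rows are never part of the data SMOTE draws from.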