Re: [Scikit-learn-general] Weighted and Balanced Random Forests

2013-03-19 Thread Manish Amde
I have a follow-up question regarding the usage of sample_weights for fitting the RandomForestClassifier. Does the predict_proba method take the sample weights (used during fitting) into account as well? I spent some time trying to understand the _tree.pyc and tree.py files in the codebase but stil

Re: [Scikit-learn-general] Weighted and Balanced Random Forests

2013-02-08 Thread Jeff Elmore
I've been wrestling with this same issue in the regression case. I realize it's not as straightforward to balance continuous data as it is for discrete classes of output. But I wonder if this list has any thoughts about how it might be approached. The data I'm predicting is distributed normally

[Scikit-learn-general] Weighted and Balanced Random Forests

2013-02-08 Thread Manish Amde
Fellow sklearners, I am working on a classification problem with an unbalanced data set and have been successful using SVM classifiers with the class_weight option. I have also tried Random Forests and am getting a decent ROC performance but I am hoping to get a performance improvement by using

Re: [Scikit-learn-general] Weighted and Balanced Random Forests

2013-02-07 Thread Manish Amde
Thanks Gilles. This definitely helps. I am glad I asked. :-)
-Manish
On Feb 7, 2013, at 11:33 PM, Gilles Louppe wrote:
> Hello,
>
> You might achieve what you want by using sample weights when fitting
> your forest (See the 'sample_weight' parameter). There is also a
> 'balance_weights' method

Re: [Scikit-learn-general] Weighted and Balanced Random Forests

2013-02-07 Thread Gilles Louppe
Hello, You might achieve what you want by using sample weights when fitting your forest (See the 'sample_weight' parameter). There is also a 'balance_weights' method from the preprocessing module that basically generates sample weights for you, such that classes become balanced. https://github.co
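
For readers following the suggestion above, here is a minimal sketch of what fitting a forest with per-sample weights might look like. It does not use the 'balance_weights' helper mentioned in the message, since its exact behavior is not shown in this excerpt. Instead it assumes the common inverse-frequency heuristic (weight = n_samples / (n_classes * class_count)) as a stand-in. The function names balanced_sample_weights and fit_weighted_forest are hypothetical, introduced only for illustration. The 'sample_weight' argument to fit is the parameter named in the message.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per sample so every class carries equal total weight.

    Assumption: inverse-frequency weighting, used here as a stand-in for the
    'balance_weights' helper mentioned in the thread.
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    # Rare classes get large weights, frequent classes get small ones.
    return [n_samples / (n_classes * counts[label]) for label in y]


def fit_weighted_forest(X, y, **forest_kwargs):
    """Hypothetical wrapper: fit a RandomForestClassifier with balanced weights."""
    # Imported lazily so the weighting helper above works without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(**forest_kwargs)
    forest.fit(X, y, sample_weight=balanced_sample_weights(y))
    return forest
```

With labels such as [0, 0, 0, 1], the three majority samples and the single minority sample end up with the same total weight, which is the balancing effect the message describes. Whether this improves ROC performance on a given data set is something to check empirically.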

[Scikit-learn-general] Weighted and Balanced Random Forests

2013-02-07 Thread Manish Amde
Fellow sklearners, I am working on a classification problem with an unbalanced data set and have been successful using SVM classifiers with the class_weight option. I have also tried Random Forests and am getting a decent ROC performance but I am hoping to get a performance improvement by using W