I have a follow-up question regarding the usage of sample_weights for
fitting the RandomForestClassifier. Does the predict_proba method take the
sample weights (used during fitting) into account as well? I spent some
time trying to understand the _tree.pyc and tree.py files in the codebase
but stil
I've been wrestling with this same issue in the regression case.
I realize it's not as straightforward to balance continuous data as it is
for discrete classes of output.
But I wonder if this list has any thoughts about how it might be approached.
The data I'm predicting is distributed normally
Fellow sklearners,
I am working on a classification problem with an unbalanced data set and have
been successful using SVM classifiers with the class_weight option.
I have also tried Random Forests and am getting a decent ROC performance but I
am hoping to get a performance improvement by using
Thanks Gilles. This definitely helps. I am glad I asked. :-)
-Manish
On Feb 7, 2013, at 11:33 PM, Gilles Louppe wrote:
> Hello,
>
> You might achieve what you want by using sample weights when fitting
> your forest (See the 'sample_weight' parameter). There is also a
> 'balance_weights' method
Hello,
You might achieve what you want by using sample weights when fitting
your forest (See the 'sample_weight' parameter). There is also a
'balance_weights' method from the preprocessing module that basically
generates sample weights for you, such that classes become balanced.
https://github.co
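
To make the balancing idea above concrete, here is a minimal pure-Python sketch of one common way to derive such weights: each sample gets n_samples / (n_classes * count_of_its_class), so every class ends up with the same total weight. The helper name balanced_sample_weights is hypothetical, and the formula is an assumption for illustration; it is not necessarily what 'balance_weights' computes internally.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per sample so that all classes carry equal total weight.

    Hypothetical helper for illustration; the formula is an assumption,
    not a copy of any scikit-learn internals.
    """
    counts = Counter(y)          # number of samples per class label
    n_samples = len(y)
    n_classes = len(counts)
    # Rare classes get larger weights, frequent classes smaller ones.
    return [n_samples / (n_classes * counts[label]) for label in y]


# Toy unbalanced labels: eight samples of class 0, two of class 1.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_sample_weights(y)

# Total weight per class, to check that the classes are now balanced.
totals = {}
for label, w in zip(y, weights):
    totals[label] = totals.get(label, 0.0) + w
print(totals)
```

With scikit-learn installed, a list like this would be passed at fit time, e.g. forest.fit(X, y, sample_weight=weights), where forest is a RandomForestClassifier and X is your feature matrix.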
Fellow sklearners,
I am working on a classification problem with an unbalanced data set and
have been successful using SVM classifiers with the class_weight option.
I have also tried Random Forests and am getting a decent ROC performance
but I am hoping to get a performance improvement by using W