Subject: Re: [Scikit-learn-general] Classification accuracy too high
On 08/15/2013 01:08 PM, Jason Williams wrote:
> I follow the sample at
> http://blog.yhathq.com/posts/random-forests-in-python.html where it randomly
> assigns true, false to the dataset
>
> np.random.uniform(0, 1, len(df)) <= .75
>
> then partition dataset into train set and test set. I use
Subject: Re: [Scikit-learn-general] Classification accuracy too high
Hi Jason,
It looks like you are evaluating your error on your training data,
aren't you? It will give you a (very) poor estimate of the
generalization error of your model. Instead, try your model on an
independent part of your dataset (in particular, one which has not
been used to fit to your fo
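
To make the point about held-out data concrete, here is a minimal sketch
of the kind of check meant above. It reuses the np.random.uniform mask
from the quoted message and scores the forest separately on the rows it
was fit on and on the rows it never saw. The iris data, the fixed seeds
and n_estimators=100 are assumptions for illustration only; they are not
from Jason's setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): replace with your own features and labels.
X, y = load_iris(return_X_y=True)

# Random mask in the style of the quoted message: roughly 75% of rows train.
rng = np.random.RandomState(0)
is_train = rng.uniform(0, 1, len(X)) <= .75

X_train, y_train = X[is_train], y[is_train]
X_test, y_test = X[~is_train], y[~is_train]

# Fit on the training rows only.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score on both parts; only the held-out number estimates generalization.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print("accuracy on training rows:", train_acc)
print("accuracy on held-out rows:", test_acc)
```

If the near-perfect number only shows up on the training rows, it says
little about how the model behaves on new data.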
The first thing I'd do is publish the result (just kidding!).
Try it with another data set first, especially one that has an example in
the docs.
If you are still getting top marks, it may be your "framework" around the
code (are you doing proper train/test splits, etc.?).
If it drops, consider that
I ran a few tests with the Random Forest Classifier
(http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
with default settings. The classification (I repeated the
procedure several times) is nearly 100% correct. That seems to be overfitting.
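
One way to check whether a near-100% score reflects overfitting is to
compare the accuracy on the data used for fitting with a cross-validated
accuracy. The sketch below does that on the digits data that ships with
scikit-learn. It assumes a recent scikit-learn where cross_val_score
lives in sklearn.model_selection; the dataset, cv=5 and n_estimators=100
are illustrative choices, not the settings used in the runs described
above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Bundled example data (assumption): swap in your own X and y.
X, y = load_digits(return_X_y=True)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Each of the 5 folds is scored by a forest that never saw that fold.
scores = cross_val_score(clf, X, y, cv=5)

# For comparison: fit on everything and score on the same rows.
train_acc = clf.fit(X, y).score(X, y)

print("cross-validated accuracy per fold:", scores)
print("mean cross-validated accuracy:", scores.mean())
print("accuracy on the data used for fitting:", train_acc)
```

A large gap between the two numbers points to an optimistic training
score; if the cross-validated score is also very high, the data may
simply be easy to classify.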