Re: [Scikit-learn-general] Classification accuracy too high

2013-08-15 Thread Jason Williams
After removing the label from the training set and rerunning the process (fit, predict, etc.), the result looks reasonable. Thank you very much. - Original Message - From: Andreas Mueller To: Jason Williams; scikit-learn-general@lists.sourceforge.net Cc: Sent: Thursday, 15 August 2013
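
The effect described in this message can be reproduced with a minimal sketch. It assumes the label column had been left among the training features, and it uses synthetic data rather than the poster's data; the helper `held_out_accuracy` and all variable names are hypothetical. It also assumes a recent scikit-learn release that provides `sklearn.model_selection`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic data (an assumption, not the poster's data): 5 pure-noise
# features and random binary labels, so an honest model should stay near 50%.
n_samples = 1600
noise = rng.normal(size=(n_samples, 5))
y = rng.randint(0, 2, size=n_samples)

# "Leaky" feature matrix: the label is mistakenly kept as a feature column.
X_leaky = np.column_stack([noise, y])
# Clean feature matrix: the label column is removed before fitting.
X_clean = noise


def held_out_accuracy(X, y):
    """Fit a default-style random forest and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


leaky_acc = held_out_accuracy(X_leaky, y)
clean_acc = held_out_accuracy(X_clean, y)
print("label kept among features:", leaky_acc)
print("label removed from features:", clean_acc)
```

With the label among the features, held-out accuracy should come out close to 100% even though the other columns are noise; with the label removed it should fall back to roughly chance level, which is the kind of change reported above.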

Re: [Scikit-learn-general] Classification accuracy too high

2013-08-15 Thread Jason Williams
tain header
3969
$ cat /tmp/partitioned_data/train_set | wc -l # contain header
12064
$ echo $((12063+3968))
16031

- Original Message - From: Gilles Louppe To: "scikit-learn-general@lists.sourceforge.net" Cc: Jason Williams Sent: Thursday, 15 August 2013, 3:49 S

[Scikit-learn-general] Classification accuracy too high

2013-08-15 Thread Jason Williams
I ran a few tests based on the Random Forest Classifier (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) with default settings. The classification (I repeated the procedure several times) is nearly 100% correct. That seems to be overfitting.

Re: [Scikit-learn-general] Can Random Forest Classifer ignore specific fields?

2013-08-13 Thread Jason Williams
>olumns in your X array before feeding it to your estimator.
>
>(Note however that Random Forests have the advantages of being robust with respect to noise attributes. Training with or without shouldn't change the result by much.)
>
>Best,
>Gilles
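
The advice quoted above (drop the unwanted columns from X before feeding it to the estimator) could look roughly like the sketch below. The data, the column indices in `ignored_columns`, and the variable names are all hypothetical, and `np.delete` is only one of several ways to select columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical data: 200 rows, 4 feature columns.
X = rng.normal(size=(200, 4))
y = (X[:, 1] > 0).astype(int)  # the label depends on column 1 only

# Columns the model should not see (indices chosen for illustration).
ignored_columns = [0, 3]

# np.delete returns a copy of X without the listed columns (axis=1).
X_selected = np.delete(X, ignored_columns, axis=1)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_selected, y)

# The same columns must be dropped from any data passed to predict().
new_rows = rng.normal(size=(5, 4))
predictions = clf.predict(np.delete(new_rows, ignored_columns, axis=1))
print(predictions)
```

The estimator itself has no option for skipping fields in this sketch; the selection happens entirely on the array, so the same selection has to be applied at training and prediction time.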

Re: [Scikit-learn-general] Can Random Forest Classifer ignore specific fields?

2013-08-13 Thread Jason Williams
simpler. Thanks for the help. From: Roland Szabo To: Jason Williams; scikit-learn-general@lists.sourceforge.net Sent: Tuesday, 13 August 2013, 6:11 Subject: Re: [Scikit-learn-general] Can Random Forest Classifer ignore specific fields? Isn't it simpler to

[Scikit-learn-general] Can Random Forest Classifer ignore specific fields?

2013-08-13 Thread Jason Williams
I followed an example found on the internet (http://blog.yhathq.com/posts/random-forests-in-python.html) for using the Random Forest Classifier. The result appears to work. From the sample code, it looks like all attributes are used to train the model. But checking the API (http://scikit-learn.org/stable/modu