Re: [Scikit-learn-general] data preprocessing

2012-11-01 Thread Andreas Mueller
On 11/01/2012 03:43 PM, paul.czodrow...@merckgroup.com wrote: Dear RDKitters, > > However, I found it strange that "X_train.shape" gives (373, 177) - > > shouldn't the second bit be the number of classes, i.e. 2? > > [snip] > > > 177 corresponds, BTW, to the number of features.. > > And that

Re: [Scikit-learn-general] data preprocessing

2012-11-01 Thread Paul . Czodrowski
Dear RDKitters, > > However, I found it strange that "X_train.shape" gives (373, 177) - > > shouldn't the second bit be the number of classes, i.e. 2? > > [snip] > > > 177 corresponds, BTW, to the number of features.. > > And that's exactly what this is supposed to represent. The number of

Re: [Scikit-learn-general] data preprocessing

2012-11-01 Thread Lars Buitinck
2012/11/1 :
> I was trying to do a train/test set split:
> from sklearn.cross_validation import train_test_split
> X_train, X_test, y_train, y_test = train_test_split(dataDescrs,
> data_activities, test_size=.4)
>
> However, I found it strange that "X_train.shape" gives (373, 177) -
> shouldn't be
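The shape question in this message can be checked on toy data. The following is a minimal sketch, not the poster's actual data: the array sizes (10 samples, 3 features) and the `random_state` values are assumptions chosen for illustration. It also imports `train_test_split` from `sklearn.model_selection`, which is where the function lives in later scikit-learn releases; the `sklearn.cross_validation` module quoted above is the 2012 location.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for dataDescrs / data_activities (values are made up).
rng = np.random.RandomState(0)
X = rng.rand(10, 3)            # 10 samples, 3 features
y = np.array([0, 1] * 5)       # binary class labels, one per sample

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Rows are samples, columns are features; the split only changes the row count.
print(X_train.shape)  # (6, 3)
print(X_test.shape)   # (4, 3)

# The number of classes is a property of the labels, not of X's shape.
n_classes = np.unique(y_train).size
print(n_classes)      # 2
```

The second entry of `X_train.shape` is therefore the feature count. The class count has to be read from the label vector instead.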

Re: [Scikit-learn-general] data preprocessing

2012-11-01 Thread Paul . Czodrowski
> > given a list of features - e.g. dataDescrs[0] = (140.0, 2, 0.5 - and a
> > list of experimental observations - e.g. data_activities[0] = 0 - how do I
> > transform these lists to the scikit-learn nomenclature?
>
> Depends on what these things represent, but if all tuples in
> dataDescrs h

Re: [Scikit-learn-general] data preprocessing

2012-11-01 Thread Lars Buitinck
2012/11/1 :
> given a list of features - e.g. dataDescrs[0] = (140.0, 2, 0.5 - and a
> list of experimental observations - e.g. data_activities[0] = 0 - how do I
> transform these lists to the scikit-learn nomenclature?

Depends on what these things represent, but if all tuples in dataDescrs ha
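One common way to get from such lists to the array layout scikit-learn estimators expect is sketched below. It rests on assumptions the thread does not confirm: every tuple in `dataDescrs` has the same length and holds only numeric values, and the example rows are invented (the tuple in the question is cut off after its third value, so the three-element rows here are purely illustrative).

```python
import numpy as np

# Hypothetical data: equal-length numeric tuples, one per sample,
# and one class label per sample. Values are made up for illustration.
dataDescrs = [(140.0, 2, 0.5), (98.5, 1, 0.25), (210.0, 4, 0.75)]
data_activities = [0, 1, 0]

# X: 2-D array of shape (n_samples, n_features)
X = np.asarray(dataDescrs, dtype=float)
# y: 1-D array of shape (n_samples,)
y = np.asarray(data_activities)

# Each row of X must line up with exactly one entry of y.
assert X.shape[0] == y.shape[0]

print(X.shape)  # (3, 3)
print(y.shape)  # (3,)
```

If the tuples differed in length or mixed numbers with strings, this direct conversion would not produce a clean numeric matrix, and the descriptors would need to be encoded first.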

[Scikit-learn-general] data preprocessing

2012-11-01 Thread Paul . Czodrowski
Dear Scikitters,

given a list of features - e.g. dataDescrs[0] = (140.0, 2, 0.5 - and a list of experimental observations - e.g. data_activities[0] = 0 - how do I transform these lists to the scikit-learn nomenclature?

Cheers & Thanks,
Paul