Not sure just offhand. I need to look at this in more detail in a debugger, and I need to find time to do that.

On Thu, Oct 11, 2012 at 1:58 AM, Rajesh Nikam <rajeshni...@gmail.com> wrote:

> what could be the problem with data formatting ?
> Could you please update on the same.
>
> On Thu, Oct 11, 2012 at 11:31 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > My first thought was that we needed several passes, but that is clearly wrong.
> >
> > I think that the problem is in the data formatting and conversion somehow.
> > Haven't had time to dope this out just yet. The iris data should converge trivially.
> >
> > On Wed, Oct 10, 2012 at 9:58 PM, Rajesh Nikam <rajeshni...@gmail.com> wrote:
> >
> > > Thanks for looking into it.
> > >
> > > Actually first I have tried it with big data. Below was model info for it.
> > >
> > > AUC = 0.50
> > > confusion: [[1252978.0, 23003.0], [0.0, 0.0]]
> > > entropy: [[-0.0, -0.0], [-46.1, -0.8]]
> > >
> > > Looking forward for your comments.
> > >
> > > Thanks
> > > Rajesh
> > >
> > >
> > > On Wed, Oct 10, 2012 at 8:08 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > > > Sgd is more suitable for large data. I will take a look later today.
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Oct 9, 2012, at 11:29 PM, Rajesh Nikam <rajeshni...@gmail.com> wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Putting specific question with data for getting problem with SGD.
> > > > >
> > > > > I am using Iris Plants Database from Michael Marshall. PFA iris.arff.
> > > > >
> > > > > Converted this to csv file just by updating header: iris-3-classes.csv
> > > > >
> > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3 --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > > >
> > > > > >> it gave following error.
> > > > > Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
> > > > >
> > > > > Now created csv with only 2 classes. PFA iris-2-classes.csv
> > > > >
> > > > > >> trained iris-2-classes.csv with sgd
> > > > >
> > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepallength sepalwidth petallength petalwidth --types n n
> > > > >
> > > > >
> > > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> > > > >
> > > > > AUC = 0.14
> > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > entropy: [[-0.6, -0.3], [-0.8, -0.4]]
> > > > >
> > > > > >> AUC seems to poor. Now changed --predictors
> > > > >
> > > > > mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepalwidth petallength --types n n
> > > > >
> > > > > mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion --scores
> > > > >
> > > > > AUC = 0.80
> > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > > >
> > > > > AUC is improved, however from confusion matrix seems everything is classified as class a.
> > > > >
> > > > > Below is the output.
> > > > >
> > > > > "target","model-output","log-likelihood"
> > > > > 0,0.492,-0.677017
> > > > > 0,0.493,-0.679192
> > > > > 0,0.493,-0.678355
> > > > > 0,0.493,-0.678724
> > > > > 0,0.492,-0.676583
> > > > > 0,0.491,-0.675182
> > > > > 0,0.492,-0.677452
> > > > > 0,0.492,-0.677419
> > > > > 0,0.493,-0.679628
> > > > > 0,0.493,-0.678724
> > > > > 0,0.491,-0.676116
> > > > > 0,0.492,-0.677386
> > > > > 0,0.493,-0.679192
> > > > > 0,0.493,-0.679291
> > > > > 0,0.491,-0.674912
> > > > > 0,0.490,-0.673081
> > > > > 0,0.491,-0.675313
> > > > > 0,0.492,-0.677017
> > > > > 0,0.491,-0.675616
> > > > > 0,0.491,-0.675682
> > > > > 0,0.492,-0.677353
> > > > > 0,0.491,-0.676116
> > > > > 0,0.492,-0.676714
> > > > > 0,0.492,-0.677788
> > > > > 0,0.492,-0.677287
> > > > > 0,0.493,-0.679126
> > > > > 0,0.492,-0.677386
> > > > > 0,0.492,-0.676984
> > > > > 0,0.492,-0.677452
> > > > > 0,0.492,-0.678256
> > > > > 0,0.493,-0.678691
> > > > > 0,0.492,-0.677419
> > > > > 0,0.491,-0.674381
> > > > > 0,0.490,-0.673980
> > > > > 0,0.493,-0.678724
> > > > > 0,0.493,-0.678387
> > > > > 0,0.492,-0.677050
> > > > > 0,0.493,-0.678724
> > > > > 0,0.493,-0.679225
> > > > > 0,0.492,-0.677419
> > > > > 0,0.492,-0.677050
> > > > > 0,0.495,-0.682279
> > > > > 0,0.493,-0.678355
> > > > > 0,0.492,-0.676951
> > > > > 0,0.491,-0.675550
> > > > > 0,0.493,-0.679192
> > > > > 0,0.491,-0.675649
> > > > > 0,0.493,-0.678322
> > > > > 0,0.491,-0.676116
> > > > > 0,0.492,-0.677887
> > > > > 1,0.492,-0.709316
> > > > > 1,0.492,-0.709248
> > > > > 1,0.492,-0.708935
> > > > > 1,0.494,-0.705048
> > > > > 1,0.493,-0.707488
> > > > > 1,0.493,-0.707454
> > > > > 1,0.492,-0.709765
> > > > > 1,0.494,-0.705258
> > > > > 1,0.493,-0.707936
> > > > > 1,0.493,-0.706803
> > > > > 1,0.495,-0.703539
> > > > > 1,0.493,-0.708249
> > > > > 1,0.494,-0.704601
> > > > > 1,0.493,-0.707970
> > > > > 1,0.493,-0.707597
> > > > > 1,0.492,-0.708765
> > > > > 1,0.492,-0.708351
> > > > > 1,0.493,-0.706871
> > > > > 1,0.494,-0.704770
> > > > > 1,0.494,-0.705908
> > > > > 1,0.492,-0.709350
> > > > > 1,0.493,-0.707285
> > > > > 1,0.493,-0.706247
> > > > > 1,0.493,-0.707522
> > > > > 1,0.493,-0.707835
> > > > > 1,0.492,-0.708317
> > > > > 1,0.493,-0.707556
> > > > > 1,0.492,-0.708520
> > > > > 1,0.493,-0.707902
> > > > > 1,0.494,-0.706220
> > > > > 1,0.494,-0.705427
> > > > > 1,0.494,-0.705393
> > > > > 1,0.493,-0.706803
> > > > > 1,0.493,-0.707210
> > > > > 1,0.492,-0.708351
> > > > > 1,0.492,-0.710146
> > > > > 1,0.492,-0.708867
> > > > > 1,0.494,-0.705183
> > > > > 1,0.493,-0.708215
> > > > > 1,0.494,-0.705942
> > > > > 1,0.493,-0.706525
> > > > > 1,0.492,-0.708385
> > > > > 1,0.493,-0.706389
> > > > > 1,0.494,-0.704811
> > > > > 1,0.493,-0.706905
> > > > > 1,0.493,-0.708249
> > > > > 1,0.493,-0.707801
> > > > > 1,0.493,-0.707835
> > > > > 1,0.494,-0.705604
> > > > > 1,0.493,-0.707319
> > > > >
> > > > > AUC = 0.80
> > > > > confusion: [[50.0, 50.0], [0.0, 0.0]]
> > > > > entropy: [[-0.7, -0.3], [-0.7, -0.4]]
> > > > >
> > > > > SGD is suitable for what kind of data?
> > > > >
> > > > > Thanks,
> > > > > Rajesh
> > > > >
> > > > >
> > > > > <iris-2-classes.csv>
> > > > > <iris-3-classes.csv>
> > > > >
> > > > >
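
For anyone reading the quoted score dump, here is a rough Python sketch (not Mahout code) of how the three columns appear to relate. It assumes the "model-output" column is the estimated probability of class 1, that the "log-likelihood" column is the log of the probability assigned to the true class, and that the confusion matrix is indexed as [predicted][actual] with a 0.5 cutoff. Those are guesses read off the numbers, not something confirmed from the Mahout source. The function names and the four sample rows (taken from the dump) are just for illustration.

```python
import math

def log_likelihood(target, score):
    """Log of the probability the model gave to the true class.

    Assumes `score` is the estimated probability of class 1.
    """
    return math.log(score) if target == 1 else math.log(1.0 - score)

def confusion(rows, threshold=0.5):
    """2x2 count matrix indexed as [predicted][actual] (assumed layout)."""
    matrix = [[0, 0], [0, 0]]
    for target, score in rows:
        predicted = 1 if score >= threshold else 0
        matrix[predicted][target] += 1
    return matrix

# A few (target, model-output) pairs copied from the quoted dump.
rows = [(0, 0.492), (0, 0.493), (1, 0.492), (1, 0.494)]

for target, score in rows:
    print(target, score, round(log_likelihood(target, score), 3))
print(confusion(rows))
```

Under those assumptions, a score of 0.492 gives roughly -0.677 for target 0 and roughly -0.709 for target 1, which agrees with the dump to three decimals. Since every score sits just below 0.5, every row is predicted as class 0, which is the same shape as the reported [[50.0, 50.0], [0.0, 0.0]]. So the evaluation output looks self-consistent, and the weights simply are not moving away from zero.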