Vishnu, thanks for the response. The problem is that I don't actually have the index labels: they are hidden in the DataFrame as metadata, and anyone who wants to use them has to apply an ugly hack.
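Here is a minimal sketch of the reverse map. It assumes the label strings have already been read out of the `vals` field of the `indexedLabel` metadata; the array below is copied from the metadata quoted in this thread. Position i in that array is the label assigned index i, so decoding a `prediction` is a plain array lookup. The names `LabelDecoder`, `decode` and `forwardMap` are hypothetical, and plain Java is used so that no Spark API is assumed. Spark 1.5 and later also provide an `IndexToString` transformer in `org.apache.spark.ml.feature` for this mapping. Check that it exists in the version you run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps StringIndexer-style indices back to label strings.
// Assumes LABELS is the "vals" array from the "indexedLabel" metadata, in order.
public class LabelDecoder {

    // Copied from the metadata quoted in this thread.
    public static final String[] LABELS =
            {"1.0", "7.0", "3.0", "9.0", "2.0", "6.0", "0.0", "4.0", "8.0", "5.0"};

    // index -> original label: element i is the label that was assigned index i.
    public static String decode(String[] labels, double prediction) {
        int index = (int) prediction;
        if (index != prediction || index < 0 || index >= labels.length) {
            throw new IllegalArgumentException("Not a valid label index: " + prediction);
        }
        return labels[index];
    }

    // original label -> index: the forward map, rebuilt from the same array.
    public static Map<String, Double> forwardMap(String[] labels) {
        Map<String, Double> map = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            map.put(labels[i], (double) i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(decode(LABELS, 7.0));            // prints 4.0
        System.out.println(decode(LABELS, 3.0));            // prints 9.0
        System.out.println(decode(LABELS, 4.0));            // prints 2.0
        System.out.println(forwardMap(LABELS).get("4.0"));  // prints 7.0
    }
}
```

Both directions come from one array. If that array is persisted next to a serialized model, both maps can be rebuilt after reloading.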
The issue may be even worse if I serialize my model to a file for later use. When I later read it back from the file, I don't have such a map at all. The only workaround is to store the map alongside the serialized model, which is not great.

--
Be well!
Jean Morozov

On Sat, Dec 5, 2015 at 2:24 AM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:

> Hi,
>
> As I understand it, the probability vector gives the probability
> that a particular item belongs to each class, so the class with the highest
> probability is your predicted class.
>
> Since you have converted your label to an index label, according to the model
> the classes are 0.0 to 9.0, and I see you are getting predictions whose values
> are in [0.0, 1.0, ..., 9.0], which is correct.
>
> So what you want is a reverse map that converts your predicted class
> back to the String. I don't know whether StringIndexer has such an option;
> maybe you can create your own map and reverse map, (label to index) and
> (index to label), and use them to get back your original label.
>
> Maybe there is a better way to do this.
>
> Regards,
> Vishnu
>
> On Fri, Dec 4, 2015 at 4:56 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>
>> Hello,
>>
>> I've got an input dataset of handwritten digits and working Java code
>> that uses a random forest classification algorithm to recognize the digits.
>> My test set is just some lines from the same input dataset, to be
>> sure I'm doing the right thing. My understanding is that a correct
>> classifier in this case would give me the correct prediction.
>> At the moment overfitting is not an issue.
>>
>> After applying StringIndexer to my input DataFrame, I've applied an ugly
>> trick and got the "indexedLabel" metadata:
>>
>> {"ml_attr":{"vals":["1.0","7.0","3.0","9.0","2.0","6.0","0.0","4.0","8.0","5.0"],"type":"nominal","name":"indexedLabel"}}
>>
>> So, my algorithm gives me the following result.
>> The question is that I'm not
>> sure I understand the meaning of "prediction" in the output. It
>> looks like it's just the index of the highest probability value in the
>> "prob" array. Shouldn't "prediction" be the actual class, i.e. one of
>> "0.0", "1.0", ..., "9.0"? If the prediction is just an ordinal number, then
>> I have to map it to my classes manually, but for that I have to
>> either specify the classes manually to know their order or somehow
>> get them out of the classifier. Neither of these seems to be
>> accessible.
>>
>> (4.0 -> prediction=7.0,
>> prob=[0.004708283878223195,0.08478236104777455,0.03594642191080532,0.19286506771018885,0.038304389235523435,0.028413077979999386,0.003334431932056404,0.5685242322346109,0.018564705500837587,0.024557028569980155]
>> (9.0 -> prediction=3.0,
>> prob=[0.018432404716456248,0.16837195846781422,0.05995559403934031,0.32282148259583565,0.018374168600855455,0.04792285114398864,0.018226352623526704,0.1611650363085499,0.11703073969440755,0.06769941180922535]
>> (2.0 -> prediction=4.0,
>> prob=[0.017918245251872154,0.029243677407669404,0.06228050320552064,0.03633295481094746,0.45707974962418885,0.09675606366289394,0.03921437851648226,0.043917057390743426,0.14132883075087405,0.0759285393788078]
>>
>> So, what is the prediction here? How can I specify the classes manually or
>> get proper access to them?
>> --
>> Be well!
>> Jean Morozov
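The guess that `prediction` is the index of the largest entry in `prob` can be checked directly. Here is a small sketch in plain Java. It assumes the three `prob` arrays are copied verbatim from the output above and that the labels array is the `vals` list from the `indexedLabel` metadata. `ArgmaxCheck` and `argmax` are hypothetical names. Ties resolve to the first maximal index, which is a choice of this sketch rather than documented Spark behavior.

```java
// Hypothetical check: is "prediction" the argmax of "prob"? Decode it via the metadata labels.
public class ArgmaxCheck {

    // "vals" from the "indexedLabel" metadata quoted in this thread.
    public static final String[] LABELS =
            {"1.0", "7.0", "3.0", "9.0", "2.0", "6.0", "0.0", "4.0", "8.0", "5.0"};

    // Index of the largest probability; on ties, the first such index wins.
    public static int argmax(double[] prob) {
        int best = 0;
        for (int i = 1; i < prob.length; i++) {
            if (prob[i] > prob[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The three "prob" rows from the output above (true labels 4.0, 9.0, 2.0).
        double[][] rows = {
            {0.004708283878223195, 0.08478236104777455, 0.03594642191080532, 0.19286506771018885,
             0.038304389235523435, 0.028413077979999386, 0.003334431932056404, 0.5685242322346109,
             0.018564705500837587, 0.024557028569980155},
            {0.018432404716456248, 0.16837195846781422, 0.05995559403934031, 0.32282148259583565,
             0.018374168600855455, 0.04792285114398864, 0.018226352623526704, 0.1611650363085499,
             0.11703073969440755, 0.06769941180922535},
            {0.017918245251872154, 0.029243677407669404, 0.06228050320552064, 0.03633295481094746,
             0.45707974962418885, 0.09675606366289394, 0.03921437851648226, 0.043917057390743426,
             0.14132883075087405, 0.0759285393788078}
        };
        for (double[] prob : rows) {
            int index = argmax(prob);
            // prints prediction=7.0 -> label 4.0, then 3.0 -> 9.0, then 4.0 -> 2.0
            System.out.println("prediction=" + (double) index + " -> label " + LABELS[index]);
        }
    }
}
```

For each row, the argmax equals the reported `prediction`, and the decoded label matches the true label on the left of that row (4.0, 9.0, 2.0). This is consistent with Vishnu's explanation: the classifier predicts an index, and the metadata order maps it back to the original class.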