Re: Spark ML Random Forest output.

Eugene Morozov Sat, 05 Dec 2015 01:07:52 -0800

Vishnu, thanks for the response.

The problem is that I do not actually have the index labels: they are hidden
in the DataFrame as metadata. Anyone who would like to use them has to
apply an ugly hack.
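
To make the mapping concrete: once the label order is known, turning a prediction back into a class is only a list lookup. Below is a minimal sketch in plain Java, with the label order copied from the "vals" array of the metadata quoted further down and the first "prob" array from the same output. The names LabelDecoder, argmax and decode are illustrative only and are not part of Spark. Spark also appears to ship an IndexToString transformer in org.apache.spark.ml.feature for this purpose, but I have not checked how it behaves with the version used here.

```java
import java.util.Arrays;
import java.util.List;

final class LabelDecoder {
    // Label order copied from the "vals" array of the indexedLabel metadata in this thread.
    // Position i holds the original label that StringIndexer mapped to index i.
    static final List<String> LABELS = Arrays.asList(
        "1.0", "7.0", "3.0", "9.0", "2.0", "6.0", "0.0", "4.0", "8.0", "5.0");

    // Returns the position of the largest probability (the first one on ties).
    static int argmax(double[] prob) {
        int best = 0;
        for (int i = 1; i < prob.length; i++) {
            if (prob[i] > prob[best]) {
                best = i;
            }
        }
        return best;
    }

    // Converts a predicted index such as 7.0 back to the original label.
    static String decode(double prediction) {
        return LABELS.get((int) prediction);
    }

    public static void main(String[] args) {
        // First "prob" array from the output quoted in this thread (true label 4.0).
        double[] prob = {
            0.004708283878223195, 0.08478236104777455, 0.03594642191080532,
            0.19286506771018885, 0.038304389235523435, 0.028413077979999386,
            0.003334431932056404, 0.5685242322346109, 0.018564705500837587,
            0.024557028569980155};
        int index = argmax(prob);
        System.out.println("index=" + index + " label=" + decode(index));
        // prints: index=7 label=4.0
    }
}
```

With this lookup, the three rows in the quoted output (predictions 7.0, 3.0 and 4.0) decode to 4.0, 9.0 and 2.0, which match their true labels.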


The issue might be even worse if I serialize my model to a file for later
use. When I read it back from the file, I do not have such a map at all.
The only workaround is to store the map along with the serialized model,
which is not really great.
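
For illustration, the workaround of storing the map next to the model could look roughly like the sketch below. It only deals with the label list, written one label per line to a side file, and says nothing about how the model itself is serialized. The class LabelMapStore, its save and load methods, and the temporary file name are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class LabelMapStore {
    // Writes one label per line; the line number (from 0) is the index used by the model.
    static void save(Path file, List<String> labels) {
        try {
            Files.write(file, labels, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the labels back in the same order they were written.
    static List<String> load(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical side file; in practice it would sit next to the serialized model.
        Path file = Files.createTempFile("rf-labels", ".txt");
        save(file, Arrays.asList(
            "1.0", "7.0", "3.0", "9.0", "2.0", "6.0", "0.0", "4.0", "8.0", "5.0"));

        List<String> restored = load(file);
        // A prediction of 3.0 read after a restart maps back to the label at position 3.
        System.out.println(restored.get(3));
        Files.delete(file);
    }
}
```

The drawback described above still applies: the side file and the model have to be kept in sync by hand.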

--
Be well!
Jean Morozov

On Sat, Dec 5, 2015 at 2:24 AM, Vishnu Viswanath <
vishnu.viswanat...@gmail.com> wrote:

> Hi,
>
> As per my understanding, the probability matrix gives the probability
> that a particular item belongs to each class. So the one with the highest
> probability is your predicted class.
>
> Since you have converted your label to an index label, according to the
> model the classes are 0.0 to 9.0, and I see you are getting the prediction
> as a value in [0.0, 1.0, ..., 9.0], which is correct.
>
> So what you want is a reverse map that can convert your predicted class
> back to the String. I don't know if StringIndexer has such an option; maybe
> you can create your own map and reverse map of (label to index) and
> (index to label) and use this to get back your original label.
>
> Maybe there is a better way to do this.
>
> Regards,
> Vishnu
>
> On Fri, Dec 4, 2015 at 4:56 PM, Eugene Morozov <evgeny.a.moro...@gmail.com
> > wrote:
>
>> Hello,
>>
>> I've got an input dataset of handwritten digits and working Java code
>> that uses the random forest classification algorithm to determine the
>> numbers. My test set is just some lines from the same input dataset, just
>> to be sure I'm doing the right thing. My understanding is that a correct
>> classifier would give me the correct prediction in this case.
>> At the moment overfitting is not an issue.
>>
>> After applying StringIndexer to my input DataFrame I've applied an ugly
>> trick and got "indexedLabel" metadata:
>>
>> {"ml_attr":{"vals":["1.0","7.0","3.0","9.0","2.0","6.0","0.0","4.0","8.0","5.0"],"type":"nominal","name":"indexedLabel"}}
>>
>>
>> My algorithm gives me the following result. I'm not sure I understand the
>> meaning of "prediction" in the output. It looks like it's just the index
>> of the highest probability value in the "prob" array. Shouldn't
>> "prediction" be the actual class, i.e. one of "0.0", "1.0", ..., "9.0"?
>> If the prediction is just an ordinal number, then I have to map it to my
>> classes manually, but for that I have to either specify the classes
>> manually to know their order or somehow be able to get them out of the
>> classifier. Neither of these options seems to be available.
>>
>> (4.0 -> prediction=7.0,
>> prob=[0.004708283878223195,0.08478236104777455,0.03594642191080532,0.19286506771018885,0.038304389235523435,0.028413077979999386,0.003334431932056404,0.5685242322346109,0.018564705500837587,0.024557028569980155]
>> (9.0 -> prediction=3.0,
>> prob=[0.018432404716456248,0.16837195846781422,0.05995559403934031,0.32282148259583565,0.018374168600855455,0.04792285114398864,0.018226352623526704,0.1611650363085499,0.11703073969440755,0.06769941180922535]
>> (2.0 -> prediction=4.0,
>> prob=[0.017918245251872154,0.029243677407669404,0.06228050320552064,0.03633295481094746,0.45707974962418885,0.09675606366289394,0.03921437851648226,0.043917057390743426,0.14132883075087405,0.0759285393788078]
>>
>> So, what is the prediction here? How can I specify the classes manually
>> or get proper access to them?
>> --
>> Be well!
>> Jean Morozov
>>
>
>
>
