Hi,

I am trying to run a Naive Bayes model using the Spark ML libraries in Java. 
A sample snippet of the dataset is given below:

Raw Data -


But as the input data needs to be numeric, I am using a one-hot encoder 
on the Gender field [m->0,1][f->1,0]; finally, the 'features' vector 
is fed to the model, and I get the output.

Transformed Data - 


But the model results are not correct, because the 'Gender' field (originally 
categorical) is now treated as a continuous field after the one-hot 
encoding transformation. 

The expectation is that for 'continuous data' the mean and variance are 
calculated, and for 'categorical data' the number of occurrences of each 
category is calculated. [In my case, the mean and variance are calculated 
even for the Gender field.]
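
To make the expectation concrete, here is a minimal plain-Java sketch (not 
Spark code) of the per-feature statistics I have in mind. The class name, 
method names and sample values are made up for illustration only; they are 
not my real data and not part of any Spark API.

```java
import java.util.HashMap;
import java.util.Map;

public class MixedFeatureStats {

    // Continuous feature: arithmetic mean of the observed values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Continuous feature: population variance (divides by n, not n - 1).
    static double variance(double[] values) {
        double m = mean(values);
        double squaredDiffs = 0.0;
        for (double v : values) {
            squaredDiffs += (v - m) * (v - m);
        }
        return squaredDiffs / values.length;
    }

    // Categorical feature: number of occurrences of each category.
    static Map<String, Integer> categoryCounts(String[] values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical values for the rows of a single class label.
        double[] continuousFeature = {25.0, 35.0, 45.0};
        String[] gender = {"m", "f", "m"};

        System.out.println("mean     = " + mean(continuousFeature));
        System.out.println("variance = " + variance(continuousFeature));
        System.out.println("counts   = " + categoryCounts(gender));
    }
}
```

In a full model these statistics would be computed separately for each 
class label; the sketch above shows only one class to keep it short.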

So, is there any way by which I can indicate to the model that a 
particular data field is 'categorical' by nature?

Thanks

Best Regards
Amlan Jyoti



