Hi, I am trying to run a Naive Bayes model using the Spark ML libraries in Java. A sample snippet of the dataset is given below:
Raw Data -

Since the input data needs to be numeric, I am using a one-hot encoder on the Gender field [m->0,1] [f->1,0]. The resulting 'features' vector is then passed to the model, and I do get output.

Transformed Data -

However, the model results are not correct, because the 'Gender' field (originally categorical) is treated as a continuous field after the one-hot encoding transformation. My expectation is that for continuous data the mean and variance are calculated, and for categorical data the number of occurrences of each category is calculated. In my case, the mean and variance are calculated even for the Gender field.

So, is there any way to indicate to the model that a particular data field is categorical by nature?

Thanks,
Best Regards,
Amlan Jyoti
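
To make the expectation above concrete, here is a minimal sketch in plain Java (deliberately not using the Spark ML API) of the per-feature statistics described: mean and variance for a continuous field, and occurrence counts for a categorical field. The class name `MixedFeatureStats`, the `age` column, and all values are hypothetical and only for illustration; they are not taken from the actual dataset. In a real Naive Bayes fit these statistics would be computed separately for each class label.

```java
import java.util.HashMap;
import java.util.Map;

public class MixedFeatureStats {

    /** Mean of a continuous feature. */
    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population variance of a continuous feature. */
    public static double variance(double[] values) {
        double m = mean(values);
        double squaredDiffs = 0.0;
        for (double v : values) {
            squaredDiffs += (v - m) * (v - m);
        }
        return squaredDiffs / values.length;
    }

    /** Number of occurrences of each category of a categorical feature. */
    public static Map<String, Integer> counts(String[] values) {
        Map<String, Integer> result = new HashMap<>();
        for (String v : values) {
            result.merge(v, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows belonging to a single class label.
        double[] age = {20.0, 30.0, 40.0};   // continuous field (assumed)
        String[] gender = {"m", "f", "m"};   // categorical field

        System.out.println("age mean      = " + mean(age));
        System.out.println("age variance  = " + variance(age));
        System.out.println("gender counts = " + counts(gender));
    }
}
```

The point of the sketch is that the Gender field goes through `counts` rather than `mean` and `variance`; that is the behaviour being asked for from the model.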