I have a big dataset of car categories and car descriptions. I want to give the program a description of a car and have it classify that car's category, so I decided to use multinomial naive Bayes. I assigned a unique id to each word and replaced every word in my category,description data with its id, so each line is now a category id, a comma, and the space-separated word ids of the description.
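Multinomial naive Bayes models each description as a bag of word counts. Every example therefore has to become a vector of one fixed length (the vocabulary size) whose i-th entry counts how often word id i occurs. The raw list of ids, whose length changes from description to description, is not that vector. Below is a minimal plain-Python sketch. It assumes each line has the form `category,id id id ...` and that word ids are integers from 0 to vocab_size - 1. `parse_line` and `vocab_size` are hypothetical names used only for illustration, not part of Spark's API. The (size, indices, values) triple it returns is the shape that MLlib's `Vectors.sparse(size, indices, values)` accepts.

```python
from collections import Counter
from typing import List, Tuple


def parse_line(line: str, vocab_size: int) -> Tuple[float, int, List[int], List[float]]:
    """Turn 'category,id id id ...' into (label, size, sorted indices, counts).

    Hypothetical helper: assumes word ids are integers in [0, vocab_size).
    """
    label_part, words_part = line.split(",", 1)
    ids = [int(tok) for tok in words_part.split()]
    for i in ids:
        if not 0 <= i < vocab_size:
            raise ValueError(f"word id {i} outside vocabulary of size {vocab_size}")
    counts = Counter(ids)          # word id -> number of occurrences
    indices = sorted(counts)       # sparse vectors expect ascending indices
    values = [float(counts[i]) for i in indices]
    return float(label_part), vocab_size, indices, values


# vocab_size=40000 is a placeholder: use the real number of distinct word ids.
# Every example gets the same dimension, however long its description is.
label, size, idx, vals = parse_line("52,16861 808 18993 18993 18993", vocab_size=40000)
print(label, size, idx, vals)  # 52.0 40000 [808, 16861, 18993] [1.0, 1.0, 3.0]
```

Repeated words (18993 appears three times) become a single index with count 3.0. Short and long descriptions both yield vectors of size `vocab_size`, which is what a per-category sum of feature vectors needs.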
//My input
2,25187 15095 22608 28756 17862 29523 499 32681 9830 24957 18993 19501 16596 17953 16596
20,1846 29058 16252 20446 9835
52,16861 808 26785 17874 18993 18993 18993 18269 34157 33811 18437 6004 2791 27923 19141
...
...

Why do I get errors like these?

//Errors
3 ERROR Executor: Exception in task 0.0 in stage 211.0 (TID 392)
java.lang.IndexOutOfBoundsException: 13 not in [-13,13)
ERROR Executor: Exception in task 1.0 in stage 211.0 (TID 393)
java.lang.IndexOutOfBoundsException: 17 not in [-17,17)
ERROR TaskSetManager: Task 0 in stage 211.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 211.0 failed 1 times, most recent failure: Lost task 0.0 in stage 211.0 (TID 392, localhost): java.lang.IndexOutOfBoundsException: 13 not in [-13,13)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-Problem-tp22531.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.