Greetings,
Following the Naive Bayes example on the Apache Spark documentation page, which uses Dataset<Row>:
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes
I want to predict the outcome for another set of data. So instead of
splitting one dataset into training and testing sets, I have one training
set and a separate testing set, i.e.:
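import org.apache.spark.ml.classification.NaiveBayes;
import org.apache.spark.ml.classification.NaiveBayesModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Two DataFrames built from separate data that share one schema
Dataset<Row> training = spark.createDataFrame(dataTraining, schemaForFrame);
Dataset<Row> testing = spark.createDataFrame(dataTesting, schemaForFrame);
// Fit on the training set only
NaiveBayes nb = new NaiveBayes();
NaiveBayesModel model = nb.fit(training);
// Score the separate testing set with the fitted model
Dataset<Row> predictions = model.transform(testing);
predictions.show();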
But I get the following error:
17/07/11 13:40:38 INFO DAGScheduler: Job 2 finished: collect at NaiveBayes.scala:171, took 3.942413 s
Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
    at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1075)
    at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:144)
    at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:48)
    at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:30)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
...
...
...
How do I perform predictions on a separate dataset that was not created by
splitting the training data?
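One possible cause, assumed here because the trace above is cut off before the root exception, is that the testing feature vectors have a different length than the training ones (for example, if the two sets were vectorized independently). As I understand it, the default multinomial Naive Bayes model scores a vector as the log class priors plus the log-likelihood matrix multiplied by the feature vector, so the vector length must match the number of features seen during training. The plain-Java sketch below reproduces only that arithmetic and the size check; the class NaiveBayesScoringSketch and method predictRaw are hypothetical illustrations, not Spark's API or source.

```java
import java.util.Arrays;

// Hypothetical sketch of multinomial Naive Bayes scoring (not Spark's source):
// raw[k] = logPrior[k] + sum over j of logTheta[k][j] * x[j]
class NaiveBayesScoringSketch {
    static double[] predictRaw(double[] logPrior, double[][] logTheta, double[] x) {
        int numFeatures = logTheta[0].length;
        // The model only knows numFeatures columns; any other vector length is rejected
        if (x.length != numFeatures) {
            throw new IllegalArgumentException(
                "Model expects " + numFeatures + " features but vector has " + x.length);
        }
        double[] raw = new double[logPrior.length];
        for (int k = 0; k < logPrior.length; k++) {
            double score = logPrior[k];
            for (int j = 0; j < numFeatures; j++) {
                score += logTheta[k][j] * x[j];
            }
            raw[k] = score;
        }
        return raw;
    }

    public static void main(String[] args) {
        double[] logPrior = {Math.log(0.5), Math.log(0.5)};
        double[][] logTheta = {
            {Math.log(0.7), Math.log(0.3)},
            {Math.log(0.2), Math.log(0.8)}
        };
        // Same length as in training (2): per-class scores are computed
        System.out.println(Arrays.toString(predictRaw(logPrior, logTheta, new double[]{2.0, 1.0})));
        // Different length (3 vs 2): scoring fails, analogous to the UDF failure above
        try {
            predictRaw(logPrior, logTheta, new double[]{2.0, 1.0, 0.0});
        } catch (IllegalArgumentException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

If this is the cause, comparing model.numFeatures() with the size of the features vectors in testing should show the mismatch.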
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Testing-another-Dataset-after-ML-training-tp28845.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.