Hi all,
I am trying to use the Spark environment inside the predict function of an ML engine for text analysis. The algorithm extends P2LAlgorithm, and the system runs on a standalone cluster.
The predict function for a new query is as follows:
> override def predict(model: NaiveBayesModel, query: Query): PredictedResult = {
>   val sc_new = SparkContext.getOrCreate()
>   val sqlContext = SQLContext.getOrCreate(sc_new)
>   val phraseDataframe = sqlContext.createDataFrame(Seq(query)).toDF("text")
>   val dpObj = new DataPreparator
>   val tf = dpObj.processPhrase(phraseDataframe)
>   tf.show()
>   val labeledpoints = tf.map(row => row.getAs[Vector]("rowFeatures"))
>   val predictedResult = model.predict(labeledpoints)
>   return predictedResult
> }
It trains properly with pio train, and after deployment it predicts results correctly for a single query.
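As far as I understand, predicting a single Vector with NaiveBayesModel is a purely local computation on the driver (no Spark job is launched), which may be why the deployed single-query path works. A minimal sketch of that local scoring for the multinomial case (the names predictLocal, pi, theta here are mine, not the MLlib API):

```scala
// Hedged sketch of what NaiveBayesModel.predict(v: Vector) computes locally
// for a multinomial model. In MLlib, pi and theta already hold
// log-probabilities, so scoring is log pi(c) + theta(c) . x per class.
def predictLocal(labels: Array[Double],
                 pi: Array[Double],           // log class priors
                 theta: Array[Array[Double]], // log feature likelihoods per class
                 x: Array[Double]): Double = {
  val scores = pi.indices.map { c =>
    pi(c) + theta(c).zip(x).map { case (logP, xi) => logP * xi }.sum
  }
  // return the label of the highest-scoring class
  labels(scores.indices.maxBy(scores))
}
```

No SparkContext is needed for this path, whereas calling model.predict on an RDD (as I do above) launches a distributed job from inside predict.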
But with pio eval, when I try to check the accuracy of the model, it runs fine up to tf.show(); when the statement that forms the labeled points is reached, it gets stuck, and after a long wait it reports that the Spark executor was lost and no heartbeat was received. Here is the error log:
> WARN org.apache.spark.HeartbeatReceiver
> [sparkDriver-akka.actor.default-dispatcher-14] - Removing executor driver
> with no recent heartbeats: 686328 ms exceeds timeout 120000 ms
>
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl
> [sparkDriver-akka.actor.default-dispatcher-14] - Lost executor driver on
> localhost: Executor heartbeat timed out after 686328 ms
>
> WARN org.apache.spark.scheduler.TaskSetManager
> [sparkDriver-akka.actor.default-dispatcher-14] - Lost task 3.0 in stage
> 103.0 (TID 237, localhost): ExecutorLostFailure (executor driver lost)
>
> ERROR org.apache.spark.scheduler.TaskSetManager
> [sparkDriver-akka.actor.default-dispatcher-14] - Task 3 in stage 103.0
> failed 1 times; aborting job
> ......
> org.apache.spark.SparkException: Job cancelled because SparkContext was
> shut down
Please suggest how I can solve this issue.
Thank you.
Regards,
Bansari Shah