I would investigate GC in the JVM of your executors; it is very common to see
that error during large GC pauses.
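A minimal sketch of how you might turn on GC logging for the executors and relax the heartbeat/network timeouts while you investigate (the exact values are assumptions, not recommendations; `spark.executor.extraJavaOptions`, `spark.network.timeout`, and `spark.executor.heartbeatInterval` are the standard Spark configuration keys, and the `-XX` flags are JDK 8 style):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: enable verbose GC logging on executors and widen timeouts
// so long pauses show up in the executor logs instead of killing the job.
val conf = new SparkConf()
  .setAppName("gc-investigation")
  .set("spark.executor.extraJavaOptions",
       "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
  .set("spark.network.timeout", "600s")            // default is 120s
  .set("spark.executor.heartbeatInterval", "60s")  // must stay below the timeout
val sc = SparkContext.getOrCreate(conf)
```

If the executor logs then show full GC pauses approaching the timeout, the fix is usually more executor memory or smaller partitions rather than a larger timeout.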

On Thu, Jan 19, 2017 at 9:44 AM, Donald Szeto <[email protected]> wrote:

> Do you have more detail logs from Spark executors?
>
> Regards,
> Donald
>
> On Wed, Dec 14, 2016 at 3:26 AM Bansari Shah <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I am trying to use spark environment in predict function of ML engine for
>> text analysis. It extends P2LAlgorithm algorithm. System works on
>> standalone cluster.
>>
>> Predict function for new query is as below :
>>
>> override def predict(model: NaiveBayesModel, query: Query):
>> PredictedResult = {
>>   val sc_new = SparkContext.getOrCreate()
>>   val sqlContext = SQLContext.getOrCreate(sc_new)
>>   val phraseDataframe = sqlContext.createDataFrame(Seq(query)).toDF("text")
>>   val dpObj = new DataPreparator
>>   val tf = dpObj.processPhrase(phraseDataframe)
>>
>>   tf.show()
>>
>>   val labeledpoints = tf.map(row => row.getAs[Vector]("rowFeatures"))
>>   val predictedResult = model.predict(labeledpoints)
>>   predictedResult
>> }
>>
>>
>> It trains properly with pio train, and after deployment it predicts
>> results correctly for a single query.
>>
>> But with pio eval, when I try to check the accuracy of the model, it runs
>> up to tf.show() properly; when it reaches the statement that forms the
>> labeled points, it gets stuck, and after a long wait it reports that it
>> lost the Spark executor because no heartbeat was received. Here is the
>> error log:
>>
>> WARN org.apache.spark.HeartbeatReceiver 
>> [sparkDriver-akka.actor.default-dispatcher-14]
>> - Removing executor driver with no recent heartbeats: 686328 ms exceeds
>> timeout 120000 ms
>>
>>
>>
>> ERROR org.apache.spark.scheduler.TaskSchedulerImpl
>> [sparkDriver-akka.actor.default-dispatcher-14] - Lost executor driver on
>> localhost: Executor heartbeat timed out after 686328 ms
>>
>>
>>
>> WARN org.apache.spark.scheduler.TaskSetManager 
>> [sparkDriver-akka.actor.default-dispatcher-14]
>> - Lost task 3.0 in stage 103.0 (TID 237, localhost): ExecutorLostFailure
>> (executor driver lost)
>>
>>
>>
>> ERROR org.apache.spark.scheduler.TaskSetManager 
>> [sparkDriver-akka.actor.default-dispatcher-14]
>> - Task 3 in stage 103.0 failed 1 times; aborting job
>> ......
>> org.apache.spark.SparkException: Job cancelled because SparkContext was
>> shut down
>>
>>
>> Please suggest how I can solve this issue.
>>
>> Thank you.
>> Regards,
>> Bansari Shah
>>
>>
>>
>>


-- 
Thank you,
Felipe

http://geeks.aretotally.in
http://twitter.com/_felipera
