Can you check in the worker logs what exactly is happening? Thanks.

Best Regards
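P.S. A couple of things in the meantime.

For the "Command exited with code 1" failures, the executor's own stderr is usually more telling than the driver output. With the standalone master and the default spark.worker.dir, it lives on the worker machine under SPARK_HOME/work/<app-id>/<executor-id>/stderr (so something like work/app-20141115145328-0025/3/stderr on cloudera01), and the worker's web UI links to the same files.

One other thing that jumps out in the code quoted below: the job builds two contexts, a StreamingContext on local[2] and then a separate SparkContext pointed at the standalone master. Two active SparkContexts in one driver JVM are not supported, and that alone could explain executors dying with exit code 1, which would match your observation that adding the SparkContext construction is what breaks the job.

If the intent is to run the streaming job on the cluster, a single-context setup along these lines is what I would try first. This is only an untested sketch: the app name, master URL, checkpoint directory, Kafka/ZooKeeper parameters and the Event case class are copied from your code; the foreachRDD + collect at the end is just for eyeballing output, not something to keep for real data volumes.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

import org.json4s._
import org.json4s.native.JsonMethods._

case class Event(EventName: String, Payload: org.json4s.JValue)

object App {

  def main(args: Array[String]) {

    // One SparkConf and one context. The StreamingContext creates its own
    // SparkContext from this conf, so there is no separate "new SparkContext".
    val conf = new SparkConf()
      .setAppName("Data")
      .setMaster("spark://cloudera01.local.company.com:7077")
    val ssc = new StreamingContext(conf, Seconds(20))
    ssc.checkpoint("checkpoint")

    // Receiver-based Kafka stream, same parameters as in your code.
    // The receiver itself occupies one core, so the app needs at least two.
    val eventMap = Map("Events" -> 1)
    val pipe = KafkaUtils.createStream(ssc,
      "dockerrepo,dockerrepo,dockerrepo", "Cons1", eventMap).map(_._2)

    val eventStream = pipe.map(parse(_)).map { json =>
      implicit val formats = DefaultFormats
      Event((json \ "event").extractOpt[String].getOrElse("*** NO EVENT NAME ***"), json)
    }

    // Bring each batch back to the driver and print it, just as a sanity check.
    eventStream.foreachRDD { rdd =>
      rdd.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}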
On Sun, Nov 16, 2014 at 2:54 AM, jschindler <john.schind...@utexas.edu> wrote:

> UPDATE
>
> I have removed and added things systematically to the job and have figured
> out that the inclusion of the construction of the SparkContext object is
> what is causing it to fail.
>
> The last run contained the code below.
>
> I keep losing executors, apparently, and I'm not sure why. Some of the
> relevant Spark output is below; I will add more on Monday, as I must go
> participate in weekend activities.
>
> 14/11/15 14:53:43 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141115145328-0025/3 on hostPort cloudera01.local.company.com:7078 with 8 cores, 512.0 MB RAM
> 14/11/15 14:53:43 INFO AppClient$ClientActor: Executor updated: app-20141115145328-0025/3 is now RUNNING
> 14/11/15 14:53:46 INFO MemoryStore: ensureFreeSpace(1063) called with curMem=1063, maxMem=309225062
> 14/11/15 14:53:46 INFO MemoryStore: Block input-0-1416084826000 stored as bytes to memory (size 1063.0 B, free 294.9 MB)
> 14/11/15 14:53:46 INFO BlockManagerInfo: Added input-0-1416084826000 in memory on cloudera01.local.company.com:49902 (size: 1063.0 B, free: 294.9 MB)
> 14/11/15 14:53:46 INFO BlockManagerMaster: Updated info of block input-0-1416084826000
> 14/11/15 14:53:46 WARN BlockManager: Block input-0-1416084826000 already exists on this machine; not re-adding it
> 14/11/15 14:53:46 INFO BlockGenerator: Pushed block input-0-1416084826000
> 14/11/15 14:53:46 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@cloudera01.local.company.com:52715/user/Executor#-1518587721] with ID 3
> 14/11/15 14:53:47 INFO BlockManagerInfo: Registering block manager cloudera01.local.company.com:46926 with 294.9 MB RAM
> 14/11/15 14:53:47 INFO SparkDeploySchedulerBackend: Executor 3 disconnected, so removing it
> 14/11/15 14:53:47 ERROR TaskSchedulerImpl: Lost an executor 3 (already removed): remote Akka client disassociated
> 14/11/15 14:53:47 INFO AppClient$ClientActor: Executor updated: app-20141115145328-0025/3 is now EXITED (Command exited with code 1)
> 14/11/15 14:53:47 INFO SparkDeploySchedulerBackend: Executor app-20141115145328-0025/3 removed: Command exited with code 1
> 14/11/15 14:53:47 INFO AppClient$ClientActor: Executor added: app-20141115145328-0025/4 on worker-20141114114152-cloudera01.local.company.com-7078 (cloudera01.local.company.com:7078) with 8 cores
>
> BLOCK 2 - last block before the app fails:
>
> 14/11/15 14:54:15 INFO BlockManagerInfo: Registering block manager cloudera01.local.uship.com:34335 with 294.9 MB RAM
> 14/11/15 14:54:16 INFO SparkDeploySchedulerBackend: Executor 9 disconnected, so removing it
> 14/11/15 14:54:16 ERROR TaskSchedulerImpl: Lost an executor 9 (already removed): remote Akka client disassociated
> 14/11/15 14:54:16 INFO AppClient$ClientActor: Executor updated: app-20141115145328-0025/9 is now EXITED (Command exited with code 1)
> 14/11/15 14:54:16 INFO SparkDeploySchedulerBackend: Executor app-20141115145328-0025/9 removed: Command exited with code 1
> 14/11/15 14:54:16 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
> 14/11/15 14:54:16 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
> [hdfs@cloudera01 root]$
>
>
> import kafka.producer._
> import org.apache.spark.streaming._
> import org.apache.spark.streaming.StreamingContext._
> import org.apache.spark.streaming.kafka._
> import org.apache.spark.SparkConf
> import org.apache.spark._
>
> import org.json4s._
> import org.json4s.native.JsonMethods._
>
> import scala.collection.mutable.Map
> import scala.collection.mutable.MutableList
>
> case class Event(EventName: String, Payload: org.json4s.JValue)
>
> object App {
>
>   def main(args: Array[String]) {
>
>     val ssc = new StreamingContext("local[2]", "Data", Seconds(20))
>     ssc.checkpoint("checkpoint")
>
>     val conf = new SparkConf().setMaster("spark://cloudera01.local.company.com:7077")
>     val sc = new SparkContext(conf)
>
>     val eventMap = scala.collection.immutable.Map("Events" -> 1)
>     val pipe = KafkaUtils.createStream(ssc, "dockerrepo,dockerrepo,dockerrepo", "Cons1", eventMap).map(_._2)
>
>     val eventStream = pipe.map(data => {
>       parse(data)
>     }).map(json => {
>       implicit val formats = DefaultFormats
>       val eventName = (json \ "event").extractOpt[String]
>       Event(eventName.getOrElse("*** NO EVENT NAME ***"), json)
>     })
>
>     eventStream.foreach(x => {
>       var arr = x.toArray
>       x.foreachPartition(y => {
>         y.foreach(z => { print(z) })
>       })
>     })
>
>     ssc.start()
>     ssc.awaitTermination()
>
>   }
>
> }
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-data-in-RDD-to-specify-HDFS-directory-to-write-to-tp18789p19012.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org