Thanks for the help. I set --executor-cores and it works now. I had been using
--total-executor-cores and didn't realize it had changed.
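
For anyone hitting the same problem, the core accounting described in point 2
of the quoted reply can be sketched roughly as follows. This is only an
illustration, assuming each receiver occupies exactly one core for the
lifetime of the job; the function name cores_free_for_tasks is made up here
and is not part of any Spark API.

```python
def cores_free_for_tasks(num_executors: int, executor_cores: int,
                         num_receivers: int) -> int:
    """Cores left for processing tasks once receivers have taken theirs.

    Hypothetical helper for illustration only (not a Spark API).
    Assumes every receiver occupies one core for as long as it runs.
    """
    total_cores = num_executors * executor_cores
    # A receiver is a long-running task, so its core is never released.
    return max(total_cores - num_receivers, 0)


# One container with the default of one core and one Kafka receiver:
# nothing is left for the map tasks, so the map stage cannot start.
print(cores_free_for_tasks(num_executors=1, executor_cores=1, num_receivers=1))

# Same container with --executor-cores 2: one core is free for the map stage.
print(cores_free_for_tasks(num_executors=1, executor_cores=2, num_receivers=1))
```

If the result is 0, the batches can be received but never processed, which
matches what I was seeing.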

On Fri, Jul 10, 2015 at 3:11 AM, Tathagata Das <t...@databricks.com> wrote:

> 1. There will be a long-running job with the description "start()", as that
> is the job that runs the receivers. It will never end.
>
> 2. You need to set the number of cores given to the Spark executors by the
> YARN container. That is spark.executor.cores in SparkConf, or --executor-cores
> in spark-submit. Since it defaults to 1, your only container has one core,
> which is occupied by the receiver, leaving no cores to run the map tasks.
> So the map stage is blocked.
>
> 3. Note these log lines, especially "15/07/09 18:29:00 INFO
> receiver.ReceiverSupervisorImpl: Received stop signal". I think your
> streaming context is somehow being shut down too early, which causes the
> KafkaReceiver to stop. That is something you should debug.
>
>
> 15/07/09 18:27:13 INFO consumer.ConsumerFetcherThread: 
> [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
>  Starting
> 15/07/09 18:27:13 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1436437633199] Added fetcher for partitions 
> ArrayBuffer([[adhoc_data,0], initOffset 53 to broker 
> id:42,host:szq1.appadhoc.com,port:9092] )
> 15/07/09 18:27:13 INFO storage.MemoryStore: ensureFreeSpace(1680) called with 
> curMem=96628, maxMem=16669841817
> 15/07/09 18:27:13 INFO storage.MemoryStore: Block input-0-1436437633600 
> stored as bytes in memory (estimated size 1680.0 B, free 15.5 GB)
> 15/07/09 18:27:13 WARN storage.BlockManager: Block input-0-1436437633600 
> replicated to only 0 peer(s) instead of 1 peers
> 15/07/09 18:27:14 INFO receiver.BlockGenerator: Pushed block 
> input-0-1436437633600
> 15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: 
> Received stop signal
> 15/07/09 18:29:00 INFO receiver.ReceiverSupervisorImpl: Stopping receiver 
> with message: Stopped by driver:
> 15/07/09 18:29:00 INFO consumer.ZookeeperConsumerConnector: 
> [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201], 
> ZKConsumerConnector shutting down
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1436437633199] Stopping leader finder thread
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: 
> [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
>  Shutting down
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: 
> [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
>  Stopped
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: 
> [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-leader-finder-thread],
>  Shutdown completed
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1436437633199] Stopping all fetchers
> 15/07/09 18:29:00 INFO consumer.ConsumerFetcherThread: 
> [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
>  Shutting down
> 15/07/09 18:29:01 INFO consumer.SimpleConsumer: Reconnect due to socket 
> error: java.nio.channels.ClosedByInterruptException
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread: 
> [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
>  Stopped
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherThread: 
> [ConsumerFetcherThread-adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201-0-42],
>  Shutdown completed
> 15/07/09 18:29:01 INFO consumer.ConsumerFetcherManager: 
> [ConsumerFetcherManager-1436437633199] All connections stopped
> 15/07/09 18:29:01 INFO zkclient.ZkEventThread: Terminate ZkClient event 
> thread.
> 15/07/09 18:29:01 INFO zookeeper.ZooKeeper: Session: 0x14e70eedca00315 closed
> 15/07/09 18:29:01 INFO zookeeper.ClientCnxn: EventThread shut down
> 15/07/09 18:29:01 INFO consumer.ZookeeperConsumerConnector: 
> [adhoc_data_spark_szq1.appadhoc.com-1436437633136-a84a7201], 
> ZKConsumerConnector shutdown completed in 74 ms
> 15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Called receiver onStop
> 15/07/09 18:29:01 INFO receiver.ReceiverSupervisorImpl: Deregistering 
> receiver 0
