Hi all, I'm trying to use Spark and Cassandra.
I have two datacenters in different regions on AWS, and tried running a simple table count program. However, I keep getting *WARN TaskSchedulerImpl: Initial job has not accepted any resources;* and Spark can't finish the processing. The test table only has 571 rows and 2 small columns, so I assume it doesn't require a lot of memory. I also tried increasing cores and RAM in the Spark config files, but the result is still the same.

----------------------------------------------------------------------

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.{SparkContext, SparkConf}

scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "172.17.10.44").set("spark.cassandra.auth.username", "masteruser").set("spark.cassandra.auth.password", "password")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@1cfffdf3

scala> val sc = new SparkContext("spark://172.17.10.182:7077", "test", conf)
15/02/23 21:56:21 INFO SecurityManager: Changing view acls to: root
15/02/23 21:56:21 INFO SecurityManager: Changing modify acls to: root
15/02/23 21:56:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/02/23 21:56:21 INFO Slf4jLogger: Slf4jLogger started
15/02/23 21:56:21 INFO Remoting: Starting remoting
15/02/23 21:56:21 INFO Utils: Successfully started service 'sparkDriver' on port 41709.
15/02/23 21:56:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-172-17-10-182:41709]
15/02/23 21:56:21 INFO SparkEnv: Registering MapOutputTracker
15/02/23 21:56:21 INFO SparkEnv: Registering BlockManagerMaster
15/02/23 21:56:21 INFO DiskBlockManager: Created local directory at /srv/spark/tmp/spark-9f50ea1b-e8eb-4cb8-8f48-d04e3ec525a2/spark-61a2d7fa-697e-4a61-80af-c3d72149f244
15/02/23 21:56:21 INFO MemoryStore: MemoryStore started with capacity 534.5 MB
15/02/23 21:56:21 INFO HttpFileServer: HTTP File server directory is /srv/spark/tmp/spark-1c34ed81-1ea9-45b1-81dd-184f12b975f6/spark-7c001536-1b70-40ea-9013-14551ad05a29
15/02/23 21:56:21 INFO HttpServer: Starting HTTP Server
15/02/23 21:56:21 INFO Utils: Successfully started service 'HTTP file server' on port 51439.
15/02/23 21:56:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/23 21:56:21 INFO SparkUI: Started SparkUI at http://52.10.105.190:4040
15/02/23 21:56:21 INFO SparkContext: Added JAR file:/home/ubuntu/spark-cassandra-connector/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar at http://172.17.10.182:51439/jars/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar with timestamp 1424728581916
15/02/23 21:56:21 INFO AppClient$ClientActor: Connecting to master spark://172.17.10.182:7077...
15/02/23 21:56:21 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150223215621-0010
15/02/23 21:56:21 INFO NettyBlockTransferService: Server created on 45474
15/02/23 21:56:21 INFO BlockManagerMaster: Trying to register BlockManager
15/02/23 21:56:21 INFO BlockManagerMasterActor: Registering block manager ip-172-17-10-182:45474 with 534.5 MB RAM, BlockManagerId(<driver>, ip-172-17-10-182, 45474)
15/02/23 21:56:21 INFO BlockManagerMaster: Registered BlockManager
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/0 on worker-20150223191054-ip-172-17-10-45-9000 (ip-172-17-10-45:9000) with 2 cores
15/02/23 21:56:22 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/0 on hostPort ip-172-17-10-45:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/1 on worker-20150223191054-ip-172-17-10-47-9000 (ip-172-17-10-47:9000) with 2 cores
15/02/23 21:56:22 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/1 on hostPort ip-172-17-10-47:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/2 on worker-20150223191055-ip-172-17-10-46-9000 (ip-172-17-10-46:9000) with 2 cores
15/02/23 21:56:22 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/2 on hostPort ip-172-17-10-46:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/3 on worker-20150223191051-ip-172-17-10-44-9000 (ip-172-17-10-44:9000) with 2 cores
15/02/23 21:56:22 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/3 on hostPort ip-172-17-10-44:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/0 is now LOADING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/2 is now LOADING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/1 is now LOADING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/3 is now LOADING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/0 is now RUNNING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/1 is now RUNNING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/2 is now RUNNING
15/02/23 21:56:22 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/3 is now RUNNING
15/02/23 21:56:22 INFO EventLoggingListener: Logging events to file:/tmp/spark-events//app-20150223215621-0010
15/02/23 21:56:22 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@3649aa92

scala> val rdd= sc.cassandraTable("keyspace", "table")
table: com.datastax.spark.connector.rdd.CassandraRDD[com.datastax.spark.connector.CassandraRow] = CassandraRDD[0] at RDD at CassandraRDD.scala:50

scala> rdd.toArray.foreach(println)
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.44:9042 added
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.62:9042 added
*15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.62 (DC2)* <- Datacenter in a different region.
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.46:9042 added
15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.46 (BACK)
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.71:9042 added
*15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.71 (DC2)*
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.159:9042 added
*15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.159 (DC2)*
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.45:9042 added
15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.45 (BACK)
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.72:9042 added
*15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.72 (DC2)*
15/02/23 21:56:40 INFO Cluster: New Cassandra host /172.17.10.47:9042 added
15/02/23 21:56:40 INFO LocalNodeFirstLoadBalancingPolicy: Added host 172.17.10.47 (BACK)
*15/02/23 21:56:40 INFO CassandraConnector: Connected to Cassandra cluster: TestCassandra*
*15/02/23 21:56:41 INFO CassandraConnector: Disconnected from Cassandra cluster: TestCassandra*
15/02/23 21:56:45 INFO SparkContext: Starting job: toArray at <console>:21
15/02/23 21:56:45 INFO DAGScheduler: Got job 0 (toArray at <console>:21) with 6 output partitions (allowLocal=false)
15/02/23 21:56:45 INFO DAGScheduler: Final stage: Stage 0(toArray at <console>:21)
15/02/23 21:56:45 INFO DAGScheduler: Parents of final stage: List()
15/02/23 21:56:45 INFO DAGScheduler: Missing parents: List()
15/02/23 21:56:45 INFO DAGScheduler: Submitting Stage 0 (CassandraRDD[0] at RDD at CassandraRDD.scala:50), which has no missing parents
15/02/23 21:56:45 INFO MemoryStore: ensureFreeSpace(4536) called with curMem=0, maxMem=560497950
15/02/23 21:56:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.4 KB, free 534.5 MB)
15/02/23 21:56:46 INFO MemoryStore: ensureFreeSpace(2620) called with curMem=4536, maxMem=560497950
15/02/23 21:56:46 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.6 KB, free 534.5 MB)
15/02/23 21:56:46 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-17-10-182:45474 (size: 2.6 KB, free: 534.5 MB)
15/02/23 21:56:46 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/23 21:56:46 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/23 21:56:46 INFO DAGScheduler: Submitting 6 missing tasks from Stage 0 (CassandraRDD[0] at RDD at CassandraRDD.scala:50)
15/02/23 21:56:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/0 is now EXITED (Command exited with code 1)
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Executor app-20150223215621-0010/0 removed: Command exited with code 1
15/02/23 21:56:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/4 on worker-20150223191054-ip-172-17-10-45-9000 (ip-172-17-10-45:9000) with 2 cores
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/4 on hostPort ip-172-17-10-45:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/2 is now EXITED (Command exited with code 1)
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Executor app-20150223215621-0010/2 removed: Command exited with code 1
15/02/23 21:56:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/5 on worker-20150223191055-ip-172-17-10-46-9000 (ip-172-17-10-46:9000) with 2 cores
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/5 on hostPort ip-172-17-10-46:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/4 is now LOADING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/1 is now EXITED (Command exited with code 1)
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Executor app-20150223215621-0010/1 removed: Command exited with code 1
15/02/23 21:56:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/6 on worker-20150223191054-ip-172-17-10-47-9000 (ip-172-17-10-47:9000) with 2 cores
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/6 on hostPort ip-172-17-10-47:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/5 is now LOADING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/3 is now EXITED (Command exited with code 1)
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Executor app-20150223215621-0010/3 removed: Command exited with code 1
15/02/23 21:56:54 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor added: app-20150223215621-0010/7 on worker-20150223191051-ip-172-17-10-44-9000 (ip-172-17-10-44:9000) with 2 cores
15/02/23 21:56:54 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150223215621-0010/7 on hostPort ip-172-17-10-44:9000 with 2 cores, 512.0 MB RAM
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/6 is now LOADING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/7 is now LOADING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/4 is now RUNNING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/5 is now RUNNING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/6 is now RUNNING
15/02/23 21:56:54 INFO AppClient$ClientActor: Executor updated: app-20150223215621-0010/7 is now RUNNING
*15/02/23 21:57:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory*

----------------------------------------------------------------------

The logs show that it connected to Cassandra and then disconnected. Datacenter DC2 is in a different region and can't be reached at those IP addresses. What is the problem here? I would appreciate any help.

Thanks,
Bo
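
For reference, the core and memory settings mentioned above can also be set on the SparkConf itself rather than in the config files. This is only a minimal sketch, assuming the standalone-mode properties spark.executor.memory and spark.cores.max; the values are placeholders for illustration and are not the ones actually tried.

```scala
import org.apache.spark.SparkConf

// Placeholder values for illustration only (assumptions, not the settings used in the session above).
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "172.17.10.44")
  .set("spark.executor.memory", "1g") // memory requested per executor
  .set("spark.cores.max", "4")        // upper bound on total cores for this application
```

Whatever is requested here has to fit within what each worker advertises in the cluster UI; otherwise the job may not be scheduled.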