Guys,
I have a Spark 2.1.1 Streaming job reading from Kinesis that fails to launch
all 50 of its receivers, even though it runs on an oversized EMR YARN cluster.
Sometimes 16 receivers register, sometimes 32, other times 48, but never all
50. Any help would be greatly appreciated.
Kinesis stream shards = 500
YARN EMR cluster:
Master: 1 x m4.4xlarge
Core: 15 x m4.2xlarge
Task: 20 x m4.2xlarge
Spark Submit:
/usr/lib/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--conf,spark.streaming.stopGracefullyOnShutdown=true,--conf,spark.locality.wait=7500ms,--conf,spark.streaming.blockInterval=10000ms,--conf,spark.shuffle.consolidateFiles=true,--conf,spark.serializer=org.apache.spark.serializer.KryoSerializer,--conf,spark.closure.serializer=org.apache.spark.serializer.KryoSerializer,--conf,spark.dynamicAllocation.enabled=true,--conf,spark.scheduler.mode=FIFO,--conf,spark.ui.retainedJobs=50,--conf,spark.ui.retainedStages=50,--conf,spark.ui.retainedTasks=500,--conf,spark.worker.ui.retainedExecutors=50,--conf,spark.worker.ui.retainedDrivers=50,--conf,spark.sql.ui.retainedExecutions=50,--conf,spark.streaming.ui.retainedBatches=50,--conf,'spark.executor.extraJavaOptions=-XX:+AlwaysPreTouch -XX:MaxPermSize=6G',--conf,spark.rdd.compress=true,--conf,spark.yarn.executor.memoryOverhead=5120,--executor-memory,15G,--conf,spark.task.maxFailures=8,--conf,spark.yarn.maxAppAttempts=4,--conf,'spark.yarn.max.executor.failures=200',--conf,spark.yarn.executor.failuresValidityInterval=1h,--conf,spark.yarn.am.attemptFailuresValidityInterval=1h,--conf,spark.speculation=false,--driver-java-options,'-XX:+AlwaysPreTouch -XX:MaxPermSize=6G',--conf,spark.metrics.namespace=$env.$namespace.skynet.stream-concurrency,--class,com.mlbam.emr.StreamingJob,s3://s3jobsbucket/jars/spark-assembly-${VERSION}.jar,--env,$env,--checkpoint-location,"hdfs:///var/log/spark/apps/checkpoints/app-$env",ActionOnFailure=CONTINUE
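For reference, here is roughly what the memory flags in that command work out to per executor. This is only a sketch in Python: the per-node YARN memory figure (ASSUMED_NODE_YARN_MB) is an assumption and not something taken from this job, so check yarn.nodemanager.resource.memory-mb on the actual cluster before trusting the result.

```python
# Values taken from the spark-submit arguments and the cluster description.
EXECUTOR_MEMORY_MB = 15 * 1024        # --executor-memory 15G
EXECUTOR_OVERHEAD_MB = 5120           # spark.yarn.executor.memoryOverhead=5120
WORKER_NODES = 15 + 20                # 15 core + 20 task m4.2xlarge nodes

# ASSUMPTION: memory YARN may hand out on one m4.2xlarge node.
# Hypothetical placeholder, not read from this cluster.
ASSUMED_NODE_YARN_MB = 24576


def executor_container_mb(executor_memory_mb, overhead_mb):
    """Memory requested from YARN for one executor: heap plus overhead."""
    return executor_memory_mb + overhead_mb


def executors_per_node(node_yarn_mb, container_mb):
    """How many executor containers fit on one node, by memory alone."""
    return node_yarn_mb // container_mb


container_mb = executor_container_mb(EXECUTOR_MEMORY_MB, EXECUTOR_OVERHEAD_MB)
per_node = executors_per_node(ASSUMED_NODE_YARN_MB, container_mb)

# Upper bound only: in cluster mode the driver also needs a container,
# so the real number of executors will be somewhat lower.
max_executors = per_node * WORKER_NODES

print("container MB per executor:", container_mb)
print("executors per node (assumed node size):", per_node)
print("upper bound on executors:", max_executors)
```

Under that assumed node size, each worker holds a single executor, so the executor count is bounded by the number of worker nodes.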
My Environment:
Java Home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.32.amzn1.x86_64/jre
Java Version: 1.8.0_141 (Oracle Corporation)
Scala Version: version 2.11.8
Spark Properties (name = value):
spark.app.id = application_1504636247367_0007
spark.app.name = skynet-stream-concurrency-qa
spark.closure.serializer = org.apache.spark.serializer.KryoSerializer
spark.default.parallelism = 800
spark.driver.extraClassPath = jsonevent-layout-1.7.jar:json-smart-1.1.1.jar:/home/hadoop/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
spark.driver.extraJavaOptions = -XX:+AlwaysPreTouch -XX:MaxPermSize=6G
spark.driver.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.driver.host = 10.202.138.242
spark.driver.memory = 22342M
spark.driver.port = 38634
spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.executorIdleTimeout = 10m
spark.eventLog.dir = hdfs:///var/log/spark/apps
spark.eventLog.enabled = true
spark.executor.cores = 16
spark.executor.extraClassPath = jsonevent-layout-1.7.jar:json-smart-1.1.1.jar:/home/hadoop/lib/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
spark.executor.extraJavaOptions = -XX:+AlwaysPreTouch -XX:MaxPermSize=6G
spark.executor.extraLibraryPath = /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.id = driver
spark.executor.memory = 15G
spark.hadoop.yarn.timeline-service.enabled = false
spark.history.fs.logDirectory = hdfs:///var/log/spark/apps
spark.history.ui.port = 18080
spark.kryo.classesToRegister = com.mlbam.emr.UserSessions,com.mlbam.emr.StreamSampleEvent
spark.locality.wait = 7500ms
spark.master = yarn
spark.metrics.namespace = qa.mlbam.skynet.stream-concurrency
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS = ip-10-202-138-87.mlbam.qa.us-east-1.bamgrid.net
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES = http://ip-10-202-138-87.mlbam.qa.us-east-1.bamgrid.net:20888/proxy/application_1504636247367_0007
spark.rdd.compress = true
spark.scheduler.mode = FIFO
spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.shuffle.consolidateFiles = true
spark.shuffle.service.enabled = true
spark.speculation = false
spark.sql.hive.metastore.sharedPrefixes = com.amazonaws.services.dynamodbv2
spark.sql.ui.retainedExecutions = 50
spark.sql.warehouse.dir = hdfs:///user/spark/warehouse
spark.streaming.backpressure.enabled = true
spark.streaming.blockInterval = 10000ms
spark.streaming.stopGracefullyOnShutdown = true
spark.streaming.ui.retainedBatches = 50
spark.submit.deployMode = cluster
spark.task.maxFailures = 8
spark.ui.filters = org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.port = 0
spark.ui.retainedJobs = 50
spark.ui.retainedStages = 50
spark.ui.retainedTasks = 500
spark.worker.ui.retainedDrivers = 50
spark.worker.ui.retainedExecutors = 50
spark.yarn.am.attemptFailuresValidityInterval = 1h
spark.yarn.app.container.log.dir = /var/log/hadoop-yarn/containers/application_1504636247367_0007/container_1504636247367_0007_01_000001
spark.yarn.app.id = application_1504636247367_0007
spark.yarn.executor.failuresValidityInterval = 1h
spark.yarn.executor.memoryOverhead = 5120
spark.yarn.historyServer.address = ip-10-202-138-87.mlbam.qa.us-east-1.bamgrid.net:18080
spark.yarn.max.executor.failures = 200
spark.yarn.maxAppAttempts = 4
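
One thing that stands out in the numbers above: the receiver counts I see (16, 32, 48) are all multiples of spark.executor.cores (16). The small Python sketch below just spells out that arithmetic. It assumes the Spark default of spark.task.cpus=1 (not set explicitly in this job) and that each receiver holds one core for as long as it runs. It is only a reading of the reported numbers, not a confirmed diagnosis.

```python
import math

# Values taken from the post and the Spark properties listing.
SHARDS = 500                     # Kinesis stream shards
RECEIVERS = 50                   # receivers the job tries to start
EXECUTOR_CORES = 16              # spark.executor.cores
OBSERVED_RECEIVERS = [16, 32, 48]

# ASSUMPTION: spark.task.cpus is left at its default of 1.
ASSUMED_TASK_CPUS = 1


def slots_per_executor(executor_cores, task_cpus):
    """Concurrent tasks (and therefore receivers) one executor can hold."""
    return executor_cores // task_cpus


def executors_needed(receivers, slots):
    """Minimum executors so that every receiver gets its own slot."""
    return math.ceil(receivers / slots)


slots = slots_per_executor(EXECUTOR_CORES, ASSUMED_TASK_CPUS)

# Each observed count corresponds to a whole number of fully occupied executors.
implied_executors = [count // slots for count in OBSERVED_RECEIVERS]

print("shards per receiver:", SHARDS // RECEIVERS)
print("receiver slots per executor:", slots)
print("executors implied by observed counts:", implied_executors)
print("executors needed for all receivers:", executors_needed(RECEIVERS, slots))
```

So the observed counts match 1, 2 or 3 executors being completely filled with receivers, while 50 receivers would need at least 4 executors, plus spare cores to process the received data.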