Were there any other creative solutions for this? I am running into the same issue when submitting to YARN from a Docker container, and the solutions provided don't work:

1. Setting spark.driver.host doesn't work, even if I use the hostname of the physical node, because when Spark tries to bind to the hostname of the physical node in bridged mode, it doesn't see that address and errors out. As stated, we need both a bind address and an advertise address for this to work (see the sketches below).
2. Host networking: same restrictions as Ashwin described.
3. Cluster mode doesn't work for the pyspark shell.
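For reference, here's roughly what I tried, based on the reset idea from Ashwin's original mail quoted below (all addresses and ports here are placeholders):

    # Container started with the driver port published on the docker host:
    #   docker run -p 49460:49460 ...
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("pyspark-in-docker")
            # Advertise the docker host's IP (placeholder) so the AM
            # talks back through the published port...
            .set("spark.driver.host", "10.0.0.5")
            .set("spark.driver.port", "49460"))
    # ...but Spark also uses spark.driver.host as the address to bind to,
    # and binding to the host's IP inside a bridged container fails,
    # which is exactly the error described in point 1 above.
    sc = SparkContext(conf=conf)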
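What we'd actually need is something like the sketch below, with separate bind and advertise settings. To be clear, spark.driver.bindAddress is hypothetical here (as far as I can tell no such setting exists today), but this is the shape of the fix:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("pyspark-in-docker")
            # Hypothetical setting: the address to BIND to inside the
            # container (an interface the container can actually see):
            .set("spark.driver.bindAddress", "0.0.0.0")
            # The address/port to ADVERTISE to the AM: the docker host,
            # with the port mapped via `docker run -p 49460:49460`:
            .set("spark.driver.host", "hostmachine.example.com")
            .set("spark.driver.port", "49460"))
    sc = SparkContext(conf=conf)

With that split, the driver could listen on an address the container owns, while the AM connects to the docker host, which maps the published port into the container.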
Any other thoughts?

John

On Thu, Jun 11, 2015 at 12:09 AM, Ashwin Shankar <ashwinshanka...@gmail.com> wrote:

> Hi Eron,
> Thanks for your reply, but none of these options works for us.
>
>> 1. use 'spark.driver.host' and 'spark.driver.port' settings to
>> stabilize the driver-side endpoint. (ref
>> <https://spark.apache.org/docs/latest/configuration.html#networking>)
>
> This unfortunately won't help, since if we set spark.driver.port to
> something, it's going to be used to bind on the client side, and the
> same value will be passed to the AM. We need two variables: a) one to
> bind to on the client side, and b) another port which is opened up on
> the docker host and will be used by the AM to talk back to the driver.
>
>> 2. use host networking for your container, i.e. "docker run --net=host ..."
>
> We run containers in a shared environment, and this option makes the
> host network stack accessible to all containers in it, which could
> lead to security issues.
>
>> 3. use yarn-cluster mode
>
> The pyspark interactive shell (ipython) doesn't have a cluster mode.
> SPARK-5162 <https://issues.apache.org/jira/browse/SPARK-5162> is for
> spark-submit of python apps in cluster mode.
>
> Thanks,
> Ashwin
>
> On Wed, Jun 10, 2015 at 3:55 PM, Eron Wright <ewri...@live.com> wrote:
>
>> Options include:
>>
>> 1. use 'spark.driver.host' and 'spark.driver.port' settings to
>> stabilize the driver-side endpoint. (ref
>> <https://spark.apache.org/docs/latest/configuration.html#networking>)
>> 2. use host networking for your container, i.e. "docker run --net=host ..."
>> 3. use yarn-cluster mode (see SPARK-5162
>> <https://issues.apache.org/jira/browse/SPARK-5162>)
>>
>> Hope this helps,
>> Eron
>>
>> ------------------------------
>> Date: Wed, 10 Jun 2015 13:43:04 -0700
>> Subject: Problem with pyspark on Docker talking to YARN cluster
>> From: ashwinshanka...@gmail.com
>> To: d...@spark.apache.org; user@spark.apache.org
>>
>> All,
>> I was wondering if any of you have solved this problem:
>>
>> I have pyspark (ipython mode) running on Docker, talking to a YARN
>> cluster (the AM/executors are NOT running on Docker).
>>
>> When I start pyspark in the docker container, it binds to port 49460.
>>
>> Once the app is submitted to YARN, the app (AM) on the cluster side
>> fails with the following error message:
>> ERROR yarn.ApplicationMaster: Failed to connect to driver at :49460
>>
>> This makes sense, because the AM is trying to talk to the container
>> directly and it cannot; it should be talking to the docker host
>> instead.
>>
>> Question:
>> How do we make the Spark AM talk to host1:port1 of the docker host
>> (not the container), which would then route it to the container
>> running pyspark on host2:port2?
>>
>> One solution I could think of is: after starting the driver (say on
>> hostA:portA), and before submitting the app to yarn, we could reset
>> the driver's host/port to the host machine's ip/port. The AM could
>> then talk to the host machine's ip/port, which would be mapped to
>> the container.
>>
>> Thoughts?
>> --
>> Thanks,
>> Ashwin
>
> --
> Thanks,
> Ashwin