Karthik Palaniappan created SPARK-24668:
-------------------------------------------

             Summary: PySpark crashes when getting the webui url if the webui 
is disabled
                 Key: SPARK-24668
                 URL: https://issues.apache.org/jira/browse/SPARK-24668
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0
         Environment: * Spark 2.3.0
 * Spark-on-YARN
 * Java 8
 * Python 2
 * Jupyter 
            Reporter: Karthik Palaniappan


Repro:

 

Evaluate `sc` in a Jupyter notebook:

 

 

{{---------------------------------------------------------------------------}}
{{Py4JJavaError                             Traceback (most recent call last)}}
{{/opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in 
__call__(self, obj)}}
{{    343             method = get_real_method(obj, self.print_method)}}
{{    344             if method is not None:}}
{{--> 345                 return method()}}
{{    346             return None}}
{{    347         else:}}

{{/usr/lib/spark/python/pyspark/context.py in _repr_html_(self)}}
{{    261         </div>}}
{{    262         """.format(}}
{{--> 263             sc=self}}
{{    264         )}}
{{    265 }}

{{/usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)}}
{{    373     def uiWebUrl(self):}}
{{    374         """Return the URL of the SparkUI instance started by this 
SparkContext"""}}
{{--> 375         return self._jsc.sc().uiWebUrl().get()}}
{{    376 }}
{{    377     @property}}

{{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in 
__call__(self, *args)}}
{{   1158         answer = self.gateway_client.send_command(command)}}
{{   1159         return_value = get_return_value(}}
{{-> 1160             answer, self.gateway_client, self.target_id, self.name)}}
{{   1161 }}
{{   1162         for temp_arg in temp_args:}}

{{/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)}}
{{     61     def deco(*a, **kw):}}
{{     62         try:}}
{{---> 63             return f(*a, **kw)}}
{{     64         except py4j.protocol.Py4JJavaError as e:}}
{{     65             s = e.java_exception.toString()}}

{{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in 
get_return_value(answer, gateway_client, target_id, name)}}
{{    318                 raise Py4JJavaError(}}
{{    319                     "An error occurred while calling {0}{1}{2}.\n".}}
{{--> 320                     format(target_id, ".", name), value)}}
{{    321             else:}}
{{    322                 raise Py4JError(}}

{{Py4JJavaError: An error occurred while calling o80.get.}}
{{: java.util.NoSuchElementException: None.get}}
{{        at scala.None$.get(Option.scala:347)}}
{{        at scala.None$.get(Option.scala:345)}}
{{        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
{{        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{        at java.lang.reflect.Method.invoke(Method.java:498)}}
{{        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)}}
{{        at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)}}
{{        at py4j.Gateway.invoke(Gateway.java:282)}}
{{        at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)}}
{{        at py4j.commands.CallCommand.execute(CallCommand.java:79)}}
{{        at py4j.GatewayConnection.run(GatewayConnection.java:214)}}
{{        at java.lang.Thread.run(Thread.java:748)}}

 

PySpark only prints the web UI URL in `_repr_html_`, not `__repr__`, so this 
only happens in notebooks that render HTML, not in the pyspark shell. 
[https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d]
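
The difference between the two code paths can be reproduced without a cluster. The following is a minimal sketch, not PySpark's actual code: `FakeContext` is a hypothetical stand-in whose `uiWebUrl` property always fails the way `None.get` does, and its `_repr_html_` template is heavily abbreviated.

```python
class FakeContext:
    """Hypothetical stand-in for a SparkContext with the web UI disabled."""

    master = "yarn"
    appName = "demo"

    @property
    def uiWebUrl(self):
        # Models the JVM-side Option.get() call on an empty Option.
        raise RuntimeError("java.util.NoSuchElementException: None.get")

    def __repr__(self):
        # The plain-text repr never touches uiWebUrl, so the shell is unaffected.
        return "<SparkContext master={0} appName={1}>".format(
            self.master, self.appName)

    def _repr_html_(self):
        # The HTML repr looks up sc.uiWebUrl while formatting, so it raises.
        return '<a href="{sc.uiWebUrl}">Spark UI</a>'.format(sc=self)


sc = FakeContext()
print(repr(sc))  # plain repr works
try:
    sc._repr_html_()
except RuntimeError as err:
    print("HTML repr failed:", err)
```

Jupyter calls `_repr_html_` when it displays the object, which is why evaluating `sc` in a notebook cell is enough to trigger the error.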

 

Disabling Spark's UI with `spark.ui.enabled=false` *is* valuable outside of 
tests. A couple of reasons that come to mind:

1) If you run multiple Spark applications from one machine, Spark irritatingly 
starts every application on the same port (4040), then increments (4041, 4042, 
etc.) until it finds an open port. If you are already running 10 Spark apps, 
the 11th prints out 10 warnings about ports being taken before it finally 
finds one.

2) You can serve the Spark web UI from a dedicated Spark history server instead 
of per-driver. This is documented here, at least for Spark-on-YARN: 
[https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui]
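
The port probing described in reason 1) can be modelled roughly as below. This is a simplified sketch, not Spark's actual retry code: `pick_ui_port` and its warning text are hypothetical, and the default of 16 retries is assumed to match `spark.port.maxRetries`.

```python
def pick_ui_port(occupied, start=4040, max_retries=16):
    """Return (port, warnings) for the first free port at or above `start`.

    `occupied` is the set of ports already in use on this machine.
    """
    warnings = []
    for offset in range(max_retries + 1):
        port = start + offset
        if port not in occupied:
            return port, warnings
        # One warning per port that turned out to be taken.
        warnings.append(
            "port {0} is taken, trying {1}".format(port, port + 1))
    raise RuntimeError("no free port after {0} retries".format(max_retries))


# Ten applications already hold 4040..4049; the eleventh starts probing.
busy = set(range(4040, 4050))
port, warnings = pick_ui_port(busy)
print(port, len(warnings))
```

With the UI disabled, none of this probing happens, so the warnings disappear.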

 

PySpark should not crash if the web ui is disabled. There are a couple of 
options:

1) SparkContext#uiWebUrl() in Scala should return the driver web ui url or the 
history server url, depending on which one is being used.

2) PySpark should call getOrElse(None) rather than get().
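
A Python-side guard along the lines of option 2) might look like the sketch below. It assumes the Py4J proxy for the Scala `Option` exposes `isDefined()` and `get()`; `FakeOption` and `safe_ui_web_url` are hypothetical names used only so the example runs without a JVM.

```python
class FakeOption:
    """Hypothetical stand-in for a Py4J proxy of scala.Option."""

    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        if self._value is None:
            # Mirrors the failure seen in the traceback above.
            raise RuntimeError("java.util.NoSuchElementException: None.get")
        return self._value


def safe_ui_web_url(option):
    """Return the wrapped URL, or None when the Option is empty."""
    return option.get() if option.isDefined() else None


print(safe_ui_web_url(FakeOption("http://driver-host:4040")))
print(safe_ui_web_url(FakeOption()))
```

Callers such as `_repr_html_` would then need to handle a `None` URL, for example by omitting the link.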

 

I strongly prefer option 1), but I can't figure out how to do it in a non-hacky 
way. In SparkContext.scala, uiWebUrl() comes from `_ui.map(_.webUrl)`, where 
`_ui` contains the actual SparkUI only if spark.ui.enabled=true, so it is None 
when the UI is disabled. I see two ways to implement it:

1) I could set `_ui` to SparkUI.createHistoryUI(), and then just avoid calling 
`bind()` on the UI server. I'm not sure what the implications would be for 
classes outside of SparkContext that use SparkContext#ui.

2) I could make `_ui` and `uiWebUrl()` inconsistent. `_ui` only contains the 
in-driver UI and `uiWebUrl()` returns the in-driver or history URL.
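
The fallback behaviour in approach 2) can be illustrated with a small model. It is written in Python only for illustration (the real change would be in SparkContext.scala), and `resolve_ui_web_url` and both parameters are hypothetical. Where the history server URL would come from is also an assumption, for example a setting such as `spark.yarn.historyServer.address`.

```python
def resolve_ui_web_url(driver_ui_url, history_server_url):
    """Prefer the in-driver UI URL, else the history server URL, else None."""
    if driver_ui_url is not None:
        return driver_ui_url
    return history_server_url


# UI enabled: the in-driver URL wins.
print(resolve_ui_web_url("http://driver-host:4040", "http://history-host:18080"))
# UI disabled: fall back to the history server.
print(resolve_ui_web_url(None, "http://history-host:18080"))
# UI disabled and no history server configured: nothing to report.
print(resolve_ui_web_url(None, None))
```

The last case shows that callers still have to cope with a missing URL, so this approach does not remove the need for a guard on the PySpark side.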

 

I would appreciate some help figuring out how to proceed.


