Hi

Really sorry for the repeated posts. Next time I will dig deeper before posting.

I finally got my session created after removing "livy.spark.master":
"yarn-cluster". Does "yarn" mean yarn-client mode? Does that mean the Livy
server should run the driver in yarn-client mode? I imagine the client
(which is Livy) never goes away, so yarn-client mode should be OK? But why
does "yarn-cluster" not work?

Sorry, I am a newbie to Spark.
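
If I understand the Spark docs correctly, newer Spark versions no longer
accept "yarn-cluster" as a master value; the master and the deploy mode are
separate settings. So my current guess, untested on my side, is that the EMR
classification should look something like this (the deploy-mode key is taken
from the livy.conf template, so please correct me if EMR expects something
else):

```
  {
    "Classification": "livy-conf",
    "Properties": {
      "livy.server.session.timeout": "24h",
      "livy.server.session.timeout-check": "false",
      "livy.spark.master": "yarn",
      "livy.spark.deploy-mode": "cluster"
    }
  },
```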




On Mon, Mar 18, 2024 at 9:48 PM Guanyao Huang <gyhu...@ucdavis.edu> wrote:

> I am reading
> src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala.
>
> I see that at line 467, if livyConf.isRunningOnYarn(), it creates a Spark
> app. Also, at line 491, client.get.submit sends a dummy job. So it comes
> down to line 120, how the client was created, and how we initialize
> InteractiveSessionServlet with LivyConf.
>
> This is the config I used to launch the EMR cluster:
> ```
>   {
>     "Classification": "livy-conf",
>     "Properties": {
>       "livy.server.session.timeout": "24h",
>       "livy.server.session.timeout-check": "false",
>       "livy.spark.master": "yarn-cluster"
>     }
>   },
> ```
>
> By the way, where in the EMR cluster can I check those server-side logs?
>
> On Mon, Mar 18, 2024 at 8:38 PM Guanyao Huang <gyhu...@ucdavis.edu> wrote:
>
>> The final log:
>>
>> ```
>> {'appId': None,
>> 'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None},
>> 'id': 0,
>> 'kind': 'spark',
>> 'log': ['\tat '
>>
>> 'org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:257)',
>> '\tat '
>> 'org.apache.spark.deploy.SparkSubmit.org
>> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1010)',
>> '\tat '
>> 'org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)',
>> '\tat '
>> 'org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)',
>> '\tat '
>> 'org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)',
>> '\tat '
>>
>> 'org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1167)',
>> '\tat '
>> 'org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1176)',
>> '\tat org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)',
>> '\nYARN Diagnostics: ', 'spark-submit start failed'],
>> 'name': None,
>> 'owner': None,
>> 'proxyUser': None,
>> 'state': 'dead'}
>> ```
>> Why is spark-submit triggered?
>>
>> On Mon, Mar 18, 2024 at 8:16 PM Guanyao Huang <gyhu...@ucdavis.edu>
>> wrote:
>>
>>> Hello
>>>
>>> Sorry to bother you. This is my first time using Livy with Python on
>>> EMR. My code is as simple as the following:
>>>
>>> ```
>>> import json, pprint, requests, textwrap, time
>>>
>>> def lambda_handler(event, context):
>>>     start_time = time.time()
>>>
>>>     ip = event['ip']
>>>     host = 'http://{0}:8998'.format(ip)
>>>     data = {'kind': 'spark'}
>>>     headers = {'Content-Type': 'application/json'}
>>>     r = requests.post(host + '/sessions', data=json.dumps(data),
>>>                       headers=headers)
>>>     pprint.pprint(r.json(), compact=True)
>>>     pprint.pprint(r.headers, compact=True)
>>>
>>>     session_url = host + r.headers['location']
>>>     ready = False
>>>     while not ready:
>>>         r = requests.get(session_url, headers=headers)
>>>         resp = r.json()
>>>         ready = resp['state'] == 'idle'
>>>         print('checking session creation')
>>>         pprint.pprint(r.json(), compact=True)
>>>         time.sleep(2)
>>>
>>>     session_ready = time.time() - start_time
>>>     print('{0} elapsed to create session'.format(
>>>         time.strftime("%H:%M:%S", time.gmtime(session_ready))))
>>> ```
>>>
>>> I basically POST to create a session and then poll its status every 2
>>> seconds.
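>>>
>>> One thing I am not sure about is whether I should also stop polling when
>>> the session reaches a terminal state instead of only waiting for 'idle'.
>>> Here is a variant of the loop I was considering (just a sketch,
>>> continuing from the handler above; the state names are the ones I found
>>> in the Livy REST docs):
>>>
>>> ```
>>> TERMINAL_STATES = {'shutting_down', 'error', 'dead', 'killed', 'success'}
>>>
>>> while True:
>>>     r = requests.get(session_url, headers=headers)
>>>     state = r.json()['state']
>>>     print('session state: {0}'.format(state))
>>>     if state == 'idle':
>>>         break  # session is ready to accept statements
>>>     if state in TERMINAL_STATES:
>>>         # dump the session log so the failure reason is visible
>>>         pprint.pprint(r.json().get('log'), compact=True)
>>>         raise RuntimeError('session ended in state {0}'.format(state))
>>>     time.sleep(2)
>>> ```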
>>>
>>> I only got it working once. The rest of the time, the session went into
>>> shutting_down very early, as in the logs below:
>>>
>>> ```
>>> 2024-03-19T02:50:51.218Z {'appId': None,
>>> 2024-03-19T02:50:51.218Z 'appInfo': {'driverLogUrl': None, 'sparkUiUrl':
>>> None},
>>> 2024-03-19T02:50:51.218Z 'id': 3,
>>> 2024-03-19T02:50:51.218Z 'kind': 'spark',
>>> 2024-03-19T02:50:51.218Z 'log': ['\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:1093)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:257)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z 'org.apache.spark.deploy.SparkSubmit.org
>>> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1010)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1167)',
>>> 2024-03-19T02:50:51.218Z '\tat '
>>> 2024-03-19T02:50:51.218Z
>>> 'org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1176)',
>>> 2024-03-19T02:50:51.218Z '\tat
>>> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)',
>>> 2024-03-19T02:50:51.218Z '\nYARN Diagnostics: '],
>>> 2024-03-19T02:50:51.218Z 'name': None,
>>> 2024-03-19T02:50:51.218Z 'owner': None,
>>> 2024-03-19T02:50:51.218Z 'proxyUser': None,
>>> 2024-03-19T02:50:51.218Z
>>> 'state': 'shutting_down'}
>>> ```
>>>
>>> Why does it sound like it wants me to provide runnable jars or pyFiles?
>>> How can I get a complete log? From the examples in
>>> https://livy.apache.org/examples/ and
>>> https://github.com/apache/incubator-livy/blob/f615f272e9130d02170024832ea308516b907195/dev/docker/README.md?plain=1#L74,
>>> I thought the real job was submitted via the statements endpoint.
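>>>
>>> For reference, this is roughly how I was planning to submit the actual
>>> code once the session is idle, following the example on the Livy site
>>> (continuing from the handler above, so host, session_url and headers are
>>> already defined; '1 + 1' is just a placeholder statement):
>>>
>>> ```
>>> data = {'code': '1 + 1'}
>>> r = requests.post(session_url + '/statements', data=json.dumps(data),
>>>                   headers=headers)
>>> statement_url = host + r.headers['location']
>>>
>>> # poll the statement until its result is available
>>> while True:
>>>     r = requests.get(statement_url, headers=headers)
>>>     if r.json()['state'] == 'available':
>>>         pprint.pprint(r.json()['output'], compact=True)
>>>         break
>>>     time.sleep(2)
>>> ```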
>>>
>>> Thanks.
>>>
>>
