[ https://issues.apache.org/jira/browse/YARN-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735285#comment-16735285 ]
Wangda Tan edited comment on YARN-9160 at 1/6/19 7:17 PM:
----------------------------------------------------------

Committed to trunk and branch-3.2, thanks [~tangzhankun].

was (Author: leftnoteasy):
Committed to trunk, thanks [~tangzhankun].

> [Submarine] Document "PYTHONPATH" environment variable setting when using
> -localization options
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9160
>                 URL: https://issues.apache.org/jira/browse/YARN-9160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1
>
>         Attachments: YARN-9160-trunk.001.patch
>
>
> An infra platform might want to give the user a Zeppelin notebook and run
> the user's job from a command the user types, such as
> "python entry_script.py ...". This is better for the end user because
> entry_script.py appears to live in the local workbench.
> The platform may translate this into one of the following Submarine
> commands when submitting the job:
>
> {code:java}
> ... job run
> --localization entry_script.py:./
> --localization dependency_script1.py:./
> --localization dependency_script2.py:./
> --worker_launch_cmd "python entry_script.py ..."
> {code}
> Or
>
> {code:java}
> ... job run
> --localization entry_script.py:./
> --localization dependency_scripts_dir:./
> --worker_launch_cmd "python entry_script.py ..."
> {code}
>
> Both commands fail with a module import error raised from entry_script.py.
> This is because YARN only creates symbolic links in the container's work
> dir (the real script files are in different cache folders). Python resolves
> the entry script's symbolic link and searches the script's real directory
> for modules, so it does not find dependencies that sit in other cache
> folders.
> One possible solution is to localize a single directory containing all the
> scripts and change the worker_launch_cmd to
> "cd scripts_dir && python entry_script.py". But this hurts the user
> experience, because it no longer feels like working in a local workbench.
> Another solution is to use the "PYTHONPATH" environment variable. This
> keeps the user experience good and needs no changes to YARN localization
> internals.
> {code:java}
> ... job run
> # the entry point
> --localization entry_script.py:<path>/entry_script.py
> # the dependency Python scripts of the entry point
> --localization dependency_scripts_dir:<path>/dependency_scripts_dir
> # the PYTHONPATH env to make the dependencies available to the entry script
> --env PYTHONPATH="<path>/dependency_scripts_dir"
> --worker_launch_cmd "python <path>/entry_script.py ..."
> {code}
> And we should document this.
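
The failure and the PYTHONPATH fix described in the issue can be reproduced
locally without YARN. The sketch below is an illustration only, not
Submarine or YARN code: the cache folder names, the helper function
run_entry_script and the content of the two scripts are made up for the
demonstration. It uses the flat layout of the first command, symlinks each
script from its own "cache" folder into a simulated container work dir, and
points PYTHONPATH at the work dir as a stand-in for the
<path>/dependency_scripts_dir used in the issue. It assumes a platform where
creating symbolic links is allowed.

```python
import os
import subprocess
import sys
import tempfile


def run_entry_script(with_pythonpath):
    """Run entry_script.py from a simulated container work dir.

    Hypothetical helper for this demonstration. Returns the finished
    subprocess so the caller can inspect returncode, stdout and stderr.
    """
    with tempfile.TemporaryDirectory() as root:
        # Made-up stand-ins for YARN's per-resource cache folders.
        cache_entry = os.path.join(root, "filecache", "10")
        cache_dep = os.path.join(root, "filecache", "11")
        work_dir = os.path.join(root, "container_work_dir")
        for directory in (cache_entry, cache_dep, work_dir):
            os.makedirs(directory)

        # The real files live in two different cache folders.
        with open(os.path.join(cache_entry, "entry_script.py"), "w") as f:
            f.write("import dependency_script1\n"
                    "print(dependency_script1.greet())\n")
        with open(os.path.join(cache_dep, "dependency_script1.py"), "w") as f:
            f.write("def greet():\n"
                    "    return 'hello from dependency_script1'\n")

        # The work dir only contains symbolic links to the cached files.
        os.symlink(os.path.join(cache_entry, "entry_script.py"),
                   os.path.join(work_dir, "entry_script.py"))
        os.symlink(os.path.join(cache_dep, "dependency_script1.py"),
                   os.path.join(work_dir, "dependency_script1.py"))

        env = dict(os.environ)
        env.pop("PYTHONPATH", None)  # start from a clean module search path
        if with_pythonpath:
            # Equivalent in spirit to: --env PYTHONPATH="<path>/..."
            env["PYTHONPATH"] = work_dir

        return subprocess.run(
            [sys.executable, "entry_script.py"],
            cwd=work_dir,
            env=env,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    for flag in (False, True):
        result = run_entry_script(with_pythonpath=flag)
        print("PYTHONPATH set:", flag, "-> exit code", result.returncode)
        print((result.stdout or result.stderr).strip())
```

Without PYTHONPATH the interpreter puts the resolved (real) directory of
entry_script.py at the front of its module search path, so the sibling
symlink in the work dir is never consulted. With PYTHONPATH set, the
directory that holds the dependency links is searched as well and the import
succeeds.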