Wei Zhong created FLINK-31184:
---------------------------------

Summary: Failed to get python udf runner directory via running GET_RUNNER_DIR_SCRIPT
Key: FLINK-31184
URL: https://issues.apache.org/jira/browse/FLINK-31184
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.16.1, 1.15.3, 1.17.0
Reporter: Wei Zhong
The following exception is thrown when a Python UDF is used in a user job:

{code:java}
Caused by: java.io.IOException: Cannot run program "ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored. /mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/pyflink-udf-runner.sh": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:147)
	at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:122)
	at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:106)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
	... 19 more
	Suppressed: java.lang.NullPointerException: Process for id does not exist: 1-1
		at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:895)
		at org.apache.beam.runners.fnexecution.environment.ProcessManager.stopProcess(ProcessManager.java:172)
		at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:126)
		... 29 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 32 more
{code}

This is caused by an environment variable that the SRE team introduced:

{code:java}
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
{code}

The Python process itself still runs normally, but the dynamic loader (ld.so) prints an extra error message because the library cannot be preloaded. The whole output therefore looks like this:

{code:java}
ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored. /mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/
{code}

This entire output is used as the runner directory, and pyflink-udf-runner.sh is appended to it to build the command. The resulting program path does not exist, which causes the exception above.

Relying on the raw output of the script is therefore fragile. We may need to find another way to pass the runner directory back to Flink, or filter the output before using it.
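One possible direction, sketched here only as an illustration: have the script print the directory behind a recognizable prefix and extract only that value on the Java side, so any unrelated text (such as the ld.so warning) is ignored. The marker `PYFLINK_RUNNER_DIR=`, the class `RunnerDirExtractor` and its `extract` method are hypothetical names; GET_RUNNER_DIR_SCRIPT does not print such a marker today, and this is not how Flink currently reads the output.

```java
import java.util.Optional;

/**
 * Hypothetical helper: pulls the runner directory out of the captured output of
 * GET_RUNNER_DIR_SCRIPT while ignoring unrelated messages such as ld.so warnings.
 */
public final class RunnerDirExtractor {

    // Hypothetical marker; it assumes the script would print "PYFLINK_RUNNER_DIR=<path>".
    static final String MARKER = "PYFLINK_RUNNER_DIR=";

    private RunnerDirExtractor() {}

    /** Returns the text after the last marker occurrence, up to the end of that line. */
    static Optional<String> extract(String output) {
        int start = output.lastIndexOf(MARKER);
        if (start < 0) {
            // No marker found: refuse to guess a path from arbitrary output.
            return Optional.empty();
        }
        String rest = output.substring(start + MARKER.length());
        int eol = rest.indexOf('\n');
        // trim() also drops a trailing '\r' from CRLF output.
        String value = (eol >= 0 ? rest.substring(0, eol) : rest).trim();
        return value.isEmpty() ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        String noisy =
                "ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD "
                        + "cannot be preloaded: ignored.\n"
                        + MARKER + "/tmp/venv/lib/python3.7/site-packages/pyflink/bin/\n";
        // Prints the path after the marker, without the ld.so noise.
        System.out.println(extract(noisy).orElse("<not found>"));
    }
}
```

Searching for the last marker occurrence means noise printed before the path, even on the same line, is skipped. A simpler filter that just takes the last non-blank line would need no script change, but it breaks if any warning is printed after the path.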