Hi Biao, Thanks for your help. That solved my issue. It turned out that in setup1 (in EMR), I got apache-flink installed, but the package (and its dependencies) are not in the directory `/usr/lib/python3.7/site-packages` (corresponding to the python binary in `/usr/bin/python3`). For some reason, the packages are in the current user's location (`~/.local/...) which Flink did not look at.
BTW, is there any way to use the pyflink shipped with the Flink binary zip file that I downloaded from Apache's site? On EMR, such package is included, and I feel it's awkward to have to install another version using `pip install`. It will also be confusing about where to add the dependencies jars. Thanks and regards, Levan Huyen On Thu, 20 Oct 2022 at 02:25, Biao Geng <biaoge...@gmail.com> wrote: > Hi Levan, > > For your setup1 & 2, it looks like the python environment is not ready. > Have you tried python -m pip install apache-flink for the first 2 setups? > For your setup3, as you are trying to use `flink run ...` command, it will > try to connect to a launched flink cluster but I guess you did not launch > the flink cluster. You can do `start-cluster.sh` first to launch a > standalone flink cluster and then try the `flink run ...` command. > For your setup4, the reason why it works well is that it will use the > default mini cluster to run the pyflink job. So even you haven't started a > standalone cluster, it can work as well. > > Best, > Biao Geng > > Levan Huyen <lvhu...@gmail.com> 于2022年10月19日周三 17:07写道: > >> Hi, >> >> I'm new to PyFlink, and I couldn't run a basic example that shipped with >> Flink. >> This is the command I tried: >> >> ./bin/flink run -py examples/python/datastream/word_count.py >> >> Here below are the results I got with different setups: >> >> 1. On AWS EMR 6.8.0 (Flink 1.15.1): >> *Error: No module named 'google'*I tried with the Flink shipped with >> EMR, or the binary v1.15.1/v1.15.2 downloaded from Flink site. I got that >> same error message in all cases. >> >> Traceback (most recent call last): >> >> File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main >> >> "__main__", mod_spec) >> >> File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code >> >> exec(code, run_globals) >> >> File >> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py", >> line 134, in <module> >> >> word_count(known_args.input, known_args.output) >> >> File >> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py", >> line 89, in word_count >> >> ds = ds.flat_map(split) \ >> >> File >> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py", >> line 333, in flat_map >> >> File >> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py", >> line 557, in process >> >> File >> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py", >> line 23, in <module> >> >> ModuleNotFoundError: No module named 'google' >> >> org.apache.flink.client.program.ProgramAbortException: >> java.lang.RuntimeException: Python process exits with code: 1 >> >> 2. On my Mac, without a virtual environment (so `*-pyclientexec python3*` >> is included in the run command): got the same error as with EMR, but >> there's a stdout line from `*print()*` in the Python script >> >> File >> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/datastream/data_stream.py", >> line 557, in process >> >> File "<frozen zipimport>", line 259, in load_module >> >> File >> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py", >> line 23, in <module> >> >> ModuleNotFoundError: No module named 'google' >> >> Executing word_count example with default input data set. >> >> Use --input to specify file input. >> >> org.apache.flink.client.program.ProgramAbortException: >> java.lang.RuntimeException: Python process exits with code: 1 >> >> 3. On my Mac, with a virtual environment and Python package ` >> *apache-flink`* installed: Flink tried to connect to localhost:8081 (I >> don't know why), and failed with 'connection refused'. >> >> Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: >> Could not complete the operation. Number of retries has been exhausted. >> >> at >> org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395) >> >> ... 21 more >> >> Caused by: java.util.concurrent.CompletionException: >> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: >> Connection refused: localhost/127.0.0.1:8081 >> >> 4. If I run that same example job using Python: `*python word_count.py*` >> then it runs well. >> >> I tried with both v1.15.2 and 1.15.1, Python 3.7 & 3.8, and got the same >> result. >> >> Could someone please help? >> >> Thanks. >> >