Where are those? Yeah I saw Dataflow but unfortunately we have a use case that requires custom image and higher scale than the dataflow quota allows
On Tue, Jul 13, 2021 at 2:00 PM Kyle Weaver <[email protected]> wrote: > You can check the task manager logs as well to see if there is any > additional information. > > Beam+Flink+Dataproc isn't unheard of, but Java is definitely more common > than Python (and simpler to operate). And overall Dataflow is usually the > preferred way to run Beam on GCP. > > On Tue, Jul 13, 2021 at 7:27 AM Joey Tran <[email protected]> > wrote: > >> I found the jobmanager log from the yarn web interface [jobmanager_log >> (1).txt]. >> >> I didn't see any errors about malconfigured logging drivers this time. >> >> Is running flink/beam on dataproc a rare use case? >> >> On Mon, Jul 12, 2021 at 6:55 PM Kyle Weaver <[email protected]> wrote: >> >>> no_container.log is from the Python driver process, so it may be >>> abbreviated. You should be able to find the unabridged log in either the >>> Beam job server or Flink task manager. When container startup fails, Beam >>> attempts to get the container logs. In your previous email >>> (jobmanager_log.txt) there was a line "Error response from daemon: >>> configured logging driver does not support reading". I'm guessing that is >>> still the case now, but we should confirm. If it is the case, it's >>> a Dataproc-Docker setup issue and unfortunately falls outside of my >>> wheelhouse. It should be possible to configure logging drivers correctly, >>> but I don't know how (or why it doesn't work as expected in the first >>> place). https://docs.docker.com/config/containers/logging/configure/ >>> >>> On Mon, Jul 12, 2021 at 12:36 PM Joey Tran <[email protected]> >>> wrote: >>> >>>> Yeah I saw the 1.9 and tried it initially but the cluster flink version >>>> is actually updated and in the 2.0 image flink==1.12 (I confirmed by >>>> looking at the flink dashboard from the YARN ResourceManager web >>>> interface). >>>> >>>> I checked out beam==2.30.0 and built >>>> :runners:flink:1.12:job-server:shadowJar and we've made some progress... >>>> there is no longer a jackson issue but we're now back to the docker >>>> container issue :/ >>>> >>>> Where would I find the docker logs? I've attached the log with the >>>> stdout from the main workflow invocation, though it looks pretty similar to >>>> the log I sent you a couple emails back mistakenly. (I also included the >>>> invocation at the top of the file). >>>> >>>> Thanks again for helping me debug this, this is far more support than I >>>> expected and I really appreciate it! >>>> >>>> On Mon, Jul 12, 2021 at 2:29 PM Kyle Weaver <[email protected]> >>>> wrote: >>>> >>>>> You weren't getting this error before, so I might recommend building >>>>> the Flink job server jar from a release branch rather than from head. e.g. >>>>> "git checkout upstream/release-2.29.0" and then build using the same >>>>> command as before. >>>>> >>>>> Another thing I just noticed is that the Dataproc example says to set >>>>> pipeline option "--flink_version=1.9", so I'm assuming 1.9 is the Flink >>>>> version Dataproc installs by default. So you may also need to change your >>>>> job server build command to use Flink version 1.9: "./gradlew >>>>> :runners:flink:1.9:job-server:shadowJar". The wrinkle here is that support >>>>> for Flink 1.9 was dropped, so 2.29.0 is the last release that includes >>>>> support. So you will have to build from that branch or earlier. >>>>> >>>>> On Mon, Jul 12, 2021 at 11:13 AM Joey Tran <[email protected]> >>>>> wrote: >>>>> >>>>>> Ah so sorry Kyle, I attached the wrong logs... This is the correct log >>>>>> >>>>>> On Mon, Jul 12, 2021 at 2:10 PM Kyle Weaver <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> But got a new error jackson error having to do with trying to >>>>>>>> convert a null into a float? I've attached the log. Thanks again for >>>>>>>> all >>>>>>>> your help. >>>>>>>> >>>>>>> >>>>>>> I don't see the Jackson error in the logs. Instead there's this: >>>>>>> >>>>>>> IllegalStateException: No container running for id >>>>>>> 5e882860579e1e4b9eac46418e4f21434d242316badb759298a9f510593fba34 >>>>>>> >>>>>>> Which indicates the Python SDK container failed to start for some >>>>>>> reason. Unfortunately when it tries to recover container logs that also >>>>>>> fails with "Error response from daemon: configured logging driver does >>>>>>> not >>>>>>> support reading". I'm not sure how Docker is set up on Dataproc, but it >>>>>>> may >>>>>>> be difficult to debug this further unless we can get those container >>>>>>> logs. >>>>>>> >>>>>>> On Mon, Jul 12, 2021 at 7:25 AM Joey Tran <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Ah I see what you mean now. Okay I just tried that: >>>>>>>> >>>>>>>> python wordcount.py --input kinglear.txt --output my_counts >>>>>>>> --runner FlinkRunner --flink_master $FLINK_MASTER_URL >>>>>>>> --environment_type >>>>>>>> DOCKER --flink_job_server_jar >>>>>>>> ~/beam/runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.32.0-SNAPSHOT.jar >>>>>>>> >>>>>>>> But got a new error jackson error having to do with trying to >>>>>>>> convert a null into a float? I've attached the log. Thanks again for >>>>>>>> all >>>>>>>> your help. >>>>>>>> >>>>>>>> On Fri, Jul 9, 2021 at 6:35 PM Kyle Weaver <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> The important thing is it needs to be the job server jar (has >>>>>>>>> job-server in the path). >>>>>>>>> >>>>>>>>> On Fri, Jul 9, 2021 at 3:31 PM Joey Tran < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hi kyle, >>>>>>>>>> >>>>>>>>>> I noticed the one you linked didnt work when i copied and pasted >>>>>>>>>> it so chose the only one in the directory and that’s how when i got >>>>>>>>>> the >>>>>>>>>> error message >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> Joey >>>>>>>>>> >>>>>>>>>> On Fri, Jul 9, 2021 at 4:50 PM Kyle Weaver <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Joey, my mistake, I picked the wrong jar. The correct jar >>>>>>>>>>> should be >>>>>>>>>>> "runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.31.0-SNAPSHOT.jar" >>>>>>>>>>> (or similar depending on your Beam/Flink version choices). >>>>>>>>>>> >>>>>>>>>>> On Fri, Jul 9, 2021 at 1:43 PM Joey Tran < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> Thank you very much for the responses! >>>>>>>>>>>> >>>>>>>>>>>> I feel a bit better about using dataproc since it's not in beta >>>>>>>>>>>> like flink on GKE. >>>>>>>>>>>> >>>>>>>>>>>> I rebuilt the flinkrunner as you specified but I still get an >>>>>>>>>>>> error. I've attached the stdout from trying to run with the >>>>>>>>>>>> patched flink >>>>>>>>>>>> runner. >>>>>>>>>>>> >>>>>>>>>>>> Here's instructions to get a cluster started and to the state >>>>>>>>>>>> right before I run your patched flink runner instructions: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> gcloud dataproc clusters create my-flink-cluster >>>>>>>>>>>> --optional-components=FLINK,DOCKER --region=us-central1 >>>>>>>>>>>> --image-version=2.0 >>>>>>>>>>>> --enable-component-gateway >>>>>>>>>>>> gcloud compute ssh my-flink-cluster-m >>>>>>>>>>>> curl >>>>>>>>>>>> https://raw.githubusercontent.com/cs109/2015/master/Lectures/Lecture15b/sparklect/shakes/kinglear.txt >>>>>>>>>>>> > kinglear.txt >>>>>>>>>>>> curl >>>>>>>>>>>> https://raw.githubusercontent.com/apache/beam/master/sdks/python/apache_beam/examples/wordcount.py >>>>>>>>>>>> > wordcount.py >>>>>>>>>>>> pip install apache_beam apache_beam[gcp] >>>>>>>>>>>> . /usr/bin/flink-yarn-daemon >>>>>>>>>>>> python wordcount.py --input kinglear.txt --output my_counts >>>>>>>>>>>> --runner FlinkRunner --flink_master $FLINK_MASTER_URL >>>>>>>>>>>> --environment_type >>>>>>>>>>>> DOCKER --flink_job_server_jar >>>>>>>>>>>> beam/runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.32.0-SNAPSHOT.jar >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Let me know if anything stands out to you. Thanks again for the >>>>>>>>>>>> support! Sorry if I'm missing something silly >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 9, 2021 at 3:20 PM Kyle Weaver <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> That's for Java only. Joey was asking about the portable >>>>>>>>>>>>> (Python) example. >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 9, 2021 at 12:18 PM Tianzi Cai <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Kyle so much for forwarding. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was literally just trying this myself and got stuck too (b/ >>>>>>>>>>>>>> <http://b/193180649>193180649 <http://b/193180649>). I >>>>>>>>>>>>>> finally got it all to work. Please feel free to share with the >>>>>>>>>>>>>> customer. I can >>>>>>>>>>>>>> give them repo.reader permission if needed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Run this command to generate the canonical word count >>>>>>>>>>>>>> example. >>>>>>>>>>>>>> mvn archetype:generate \ >>>>>>>>>>>>>> -DarchetypeGroupId=org.apache.beam \ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ >>>>>>>>>>>>>> -DarchetypeVersion=2.30.0 \ >>>>>>>>>>>>>> -DgroupId=org.example \ >>>>>>>>>>>>>> -DartifactId=word-count-beam \ >>>>>>>>>>>>>> -Dversion="0.1" \ >>>>>>>>>>>>>> -Dpackage=org.apache.beam.examples \ >>>>>>>>>>>>>> -DinteractiveMode=false >>>>>>>>>>>>>> 2. Make a few code changes (here >>>>>>>>>>>>>> >>>>>>>>>>>>>> <https://source.cloud.google.com/tz-playground-bigdata/word-count-example/+/gcp:> >>>>>>>>>>>>>> are >>>>>>>>>>>>>> mine) then make sure that the code works with mvn compile >>>>>>>>>>>>>> exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount >>>>>>>>>>>>>> and >>>>>>>>>>>>>> you can see the aggregated results printed out. >>>>>>>>>>>>>> 3. Run mvn package -Pflink-runner to get the packaged >>>>>>>>>>>>>> JARs. >>>>>>>>>>>>>> 4. Upload the uber jar word-count-beam-bundled-0.1.jar to >>>>>>>>>>>>>> a Cloud Storage bucket. SSH into my Dataproc master node. >>>>>>>>>>>>>> Download the uber >>>>>>>>>>>>>> jar. >>>>>>>>>>>>>> 5. flink run -c org.apache.beam.examples.WordCount >>>>>>>>>>>>>> word-count-beam-bundled-0.1.jar --runner=FlinkRunner >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 9, 2021 at 12:11 PM Kyle Weaver < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> If you're not committed to Dataproc, you may also want to >>>>>>>>>>>>>>> try running it on GKE, which AFAIK doesn't have these issues. >>>>>>>>>>>>>>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jul 9, 2021 at 12:08 PM Kyle Weaver < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Joey, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jackson dependency issues are likely >>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-10430. You will >>>>>>>>>>>>>>>> have to manually patch it until a fix is available in an >>>>>>>>>>>>>>>> upcoming Beam >>>>>>>>>>>>>>>> release. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. Download Beam source from Github >>>>>>>>>>>>>>>> 2. Check out a patch for the issue, such as >>>>>>>>>>>>>>>> https://github.com/apache/beam/pull/14953 >>>>>>>>>>>>>>>> 3. Build the Flink runner using command "./gradlew >>>>>>>>>>>>>>>> :runners:flink:1.13:job-server:shadowJar" >>>>>>>>>>>>>>>> 4. Use the outputted Flink runner jar in your Python >>>>>>>>>>>>>>>> pipeline options >>>>>>>>>>>>>>>> "--flink_job_server_jar=runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.31.0-SNAPSHOT.jar" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For the "No container id" issue, can you share the full >>>>>>>>>>>>>>>> logs? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +Tianzi Cai <[email protected]> +Anthony Mancuso >>>>>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Kyle >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jul 9, 2021 at 8:47 AM Ahmet Altay < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /cc @Kyle Weaver <[email protected]> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Jul 9, 2021 at 5:24 AM Joey Tran < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hello! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm trying to just demo Beam/Flink and I tried following >>>>>>>>>>>>>>>>>> the instructions with Google's Dataproc but I get a bunch of >>>>>>>>>>>>>>>>>> errors ranging >>>>>>>>>>>>>>>>>> from jackson dependency issues to some issue about "No >>>>>>>>>>>>>>>>>> container id". >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Does anyone know if these dataproc instructions[1] are >>>>>>>>>>>>>>>>>> complete? I ran through it pretty much word for word and >>>>>>>>>>>>>>>>>> can't get a simple >>>>>>>>>>>>>>>>>> wordcount going, I'm not sure if I'm somehow messing >>>>>>>>>>>>>>>>>> something up or >>>>>>>>>>>>>>>>>> there's more necessary than just this doc instructs? FWIW >>>>>>>>>>>>>>>>>> I've been able to >>>>>>>>>>>>>>>>>> run the java wordcount example fine, it seems like I only >>>>>>>>>>>>>>>>>> run into issues >>>>>>>>>>>>>>>>>> when trying to follow the portable runner instructions >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks so much in advance for your help1 I'm not very >>>>>>>>>>>>>>>>>> experienced with deploying these kinds of things but I >>>>>>>>>>>>>>>>>> wanted to do a demo >>>>>>>>>>>>>>>>>> to show that Beam+Flink is a better solution than writing a >>>>>>>>>>>>>>>>>> framework myself >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>> https://cloud.google.com/dataproc/docs/concepts/components/flink#portable_beam_jobs >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>
