Hi Fabian,

Did you get this working, and are you willing to share the final results? If not, I'll see what I can do, and we can add it to our documentation.
Cheers,
Hans

On Thu, 11 Aug 2022 at 13:14, Matt Casters <[email protected]> wrote:

> When you run class org.apache.hop.beam.run.MainBeam you need to provide 3
> arguments to run:
>
> 1. The filename of the pipeline to run
> 2. The filename which contains Hop metadata
> 3. The name of the pipeline run configuration to use
>
> See also for example:
> https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-flink-pipeline-engine.html#_running_with_flink_run
>
> Good luck,
> Matt
>
>
> On Thu, Aug 11, 2022 at 11:08 AM Fabian Peters <[email protected]> wrote:
>
>> Hello Hans,
>>
>> I went through the flex-template process yesterday, but the generated
>> template does not work. The main piece that's missing for me is how to
>> pass the actual pipeline that should be run. My test boiled down to:
>>
>>   gcloud dataflow flex-template build gs://foo_ag_dataflow/tmp/todays-directories.json \
>>     --image-gcr-path "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest" \
>>     --sdk-language "JAVA" \
>>     --flex-template-base-image JAVA11 \
>>     --metadata-file "/Users/fabian/Documents/src/foo/fooDataEngineering/hop/dataflow/todays-directories.json" \
>>     --jar "/Users/fabian/tmp/fat-hop.jar" \
>>     --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
>>
>>   gcloud dataflow flex-template run "todays-directories-`date +%Y%m%d-%H%M%S`" \
>>     --template-file-gcs-location "gs://foo_ag_dataflow/tmp/todays-directories.json" \
>>     --region "europe-west1"
>>
>> With Dockerfile:
>>
>>   FROM gcr.io/dataflow-templates-base/java11-template-launcher-base
>>
>>   ARG WORKDIR=/dataflow/template
>>   RUN mkdir -p ${WORKDIR}
>>   WORKDIR ${WORKDIR}
>>
>>   ENV FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
>>   ENV FLEX_TEMPLATE_JAVA_CLASSPATH="/dataflow/template/*"
>>
>>   ENTRYPOINT ["/opt/google/dataflow/java_template_launcher"]
>>
>> And "todays-directories.json":
>>
>>   {
>>     "defaultEnvironment": {},
>>     "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
>>     "metadata": {
>>       "description": "Test templates creation with Apache Hop",
>>       "name": "Todays directories"
>>     },
>>     "sdkInfo": {
>>       "language": "JAVA"
>>     }
>>   }
>>
>> Thanks for having a look at it!
>>
>> cheers
>>
>> Fabian
>>
>> On 10 Aug 2022, at 16:03, Hans Van Akelyen <[email protected]> wrote:
>>
>> Hi Fabian,
>>
>> You have indeed found something we have not yet documented, mainly
>> because we have not yet tried it out ourselves.
>> The main class that gets called when running Beam pipelines is
>> "org.apache.hop.beam.run.MainBeam".
>>
>> I was hoping the "Import as pipeline" button on a job would give you
>> everything you need to execute this, but it does not.
>> I'll take a closer look in the following days to see what is needed to
>> use this functionality; it could be that we need to export the template
>> based on a pipeline.
>>
>> Kr,
>> Hans
>>
>> On Wed, 10 Aug 2022 at 15:46, Fabian Peters <[email protected]> wrote:
>>
>>> Hi all!
>>>
>>> Thanks to Hans' work on the REST transform, I can now deploy my jobs to
>>> Dataflow.
>>>
>>> Next, I'd like to schedule a batch job
>>> <https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler>,
>>> but for this I need to create a template
>>> <https://cloud.google.com/dataflow/docs/concepts/dataflow-templates>.
>>> I've searched the Hop documentation but haven't found anything on this.
>>> I'm guessing that flex templates
>>> <https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template>
>>> are the way to go, due to the fat jar, but I'm wondering what to pass as
>>> the FLEX_TEMPLATE_JAVA_MAIN_CLASS.
>>>
>>> cheers
>>>
>>> Fabian
>
> --
> Neo4j Chief Solutions Architect
> ✉ [email protected]
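[Editor's note for the documentation follow-up: the three positional arguments Matt lists can be sketched as a direct invocation of MainBeam against the fat jar. This is a hedged illustration only; the pipeline filename, metadata filename, and run-configuration name below are hypothetical placeholders, and the open question in the thread remains that the Dataflow flex-template launcher passes named `--key=value` parameters rather than positional arguments, so this invocation does not yet carry over to the template as-is.]

```shell
# Sketch: direct MainBeam invocation with its 3 positional arguments
# (file names and run configuration name are placeholders, not from the thread)
java -cp fat-hop.jar org.apache.hop.beam.run.MainBeam \
  todays-directories.hpl \
  hop-metadata.json \
  DataflowRunConfiguration
# argument 1: the pipeline file to run
# argument 2: the exported Hop metadata (JSON)
# argument 3: the name of the pipeline run configuration to use
```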
