Hi Hans,

No, I haven't had another go yet. The hints from Matt (didn't that mail make it to the list?) look quite useful in the context of Dataflow templates. I'll try to see whether I can get a bit further, but if you have time to have a look at it, I'd much appreciate it!
cheers
Fabian

> On 16.08.2022 at 11:09, Hans Van Akelyen <[email protected]> wrote:
>
> Hi Fabian,
>
> Did you get this working and are you willing to share the final results?
> If not I will see what I can do, and we can add it to our documentation.
>
> Cheers,
> Hans
>
> On Thu, 11 Aug 2022 at 13:14, Matt Casters <[email protected]> wrote:
>
> When you run the class org.apache.hop.beam.run.MainBeam you need to provide 3 arguments:
>
> 1. The filename of the pipeline to run
> 2. The filename which contains the Hop metadata
> 3. The name of the pipeline run configuration to use
>
> See also for example:
> https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-flink-pipeline-engine.html#_running_with_flink_run
>
> Good luck,
> Matt
>
> On Thu, Aug 11, 2022 at 11:08 AM Fabian Peters <[email protected]> wrote:
>
> Hello Hans,
>
> I went through the flex-template process yesterday, but the generated template does not work. The main piece that's missing for me is how to pass the actual pipeline that should be run.
> My test boiled down to:
>
> gcloud dataflow flex-template build gs://foo_ag_dataflow/tmp/todays-directories.json \
>     --image-gcr-path "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest" \
>     --sdk-language "JAVA" \
>     --flex-template-base-image JAVA11 \
>     --metadata-file "/Users/fabian/Documents/src/foo/fooDataEngineering/hop/dataflow/todays-directories.json" \
>     --jar "/Users/fabian/tmp/fat-hop.jar" \
>     --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
>
> gcloud dataflow flex-template run "todays-directories-`date +%Y%m%d-%H%M%S`" \
>     --template-file-gcs-location "gs://foo_ag_dataflow/tmp/todays-directories.json" \
>     --region "europe-west1"
>
> With Dockerfile:
>
> FROM gcr.io/dataflow-templates-base/java11-template-launcher-base
>
> ARG WORKDIR=/dataflow/template
> RUN mkdir -p ${WORKDIR}
> WORKDIR ${WORKDIR}
>
> ENV FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
> ENV FLEX_TEMPLATE_JAVA_CLASSPATH="/dataflow/template/*"
>
> ENTRYPOINT ["/opt/google/dataflow/java_template_launcher"]
>
> And "todays-directories.json":
>
> {
>   "defaultEnvironment": {},
>   "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
>   "metadata": {
>     "description": "Test templates creation with Apache Hop",
>     "name": "Todays directories"
>   },
>   "sdkInfo": {
>     "language": "JAVA"
>   }
> }
>
> Thanks for having a look at it!
>
> cheers
>
> Fabian
>
>> On 10.08.2022 at 16:03, Hans Van Akelyen <[email protected]> wrote:
>>
>> Hi Fabian,
>>
>> You have indeed found something we have not yet documented, mainly because we have not yet tried it out ourselves.
>> The main class that gets called when running Beam pipelines is "org.apache.hop.beam.run.MainBeam".
>>
>> I was hoping the "Import as pipeline" button on a job would give you everything you need to execute this, but it does not.
>> I'll take a closer look in the following days to see what is needed to use this functionality; it could be that we need to export the template based on a pipeline.
>>
>> Kr,
>> Hans
>>
>> On Wed, 10 Aug 2022 at 15:46, Fabian Peters <[email protected]> wrote:
>>
>> Hi all!
>>
>> Thanks to Hans' work on the REST transform, I can now deploy my jobs to Dataflow.
>>
>> Next, I'd like to schedule a batch job
>> <https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler>,
>> but for this I need to create a template
>> <https://cloud.google.com/dataflow/docs/concepts/dataflow-templates>.
>> I've searched the Hop documentation but haven't found anything on this. I'm guessing that flex templates
>> <https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template>
>> are the way to go, due to the fat jar, but I'm wondering what to pass as FLEX_TEMPLATE_JAVA_MAIN_CLASS.
>>
>> cheers
>>
>> Fabian
>
> --
> Neo4j Chief Solutions Architect
> ✉ [email protected]
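Matt's three-argument contract above, written out as a direct invocation of the fat jar. This is a sketch only: the jar name, both file paths and the run-configuration name "DataflowRunConfig" are placeholders, not values confirmed anywhere in this thread.

```shell
# Run a Hop pipeline on Beam via MainBeam; the three arguments are positional:
#   1. the pipeline file, 2. the exported Hop metadata, 3. the run configuration name.
# fat-hop.jar, both paths and "DataflowRunConfig" are placeholder names.
java -cp fat-hop.jar org.apache.hop.beam.run.MainBeam \
  pipelines/todays-directories.hpl \
  metadata/hop-metadata.json \
  DataflowRunConfig
```

This is the same invocation shape as the flink-run example linked by Matt, just without the Flink wrapper.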

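One possible lead for the open question above, offered as an assumption rather than verified behaviour: the flex-template launcher forwards values given via "--parameters" to the main class as "--name=value" options, not as the positional arguments MainBeam expects, which could explain why the generated template fails. Declaring such parameters in "todays-directories.json" would look roughly like this; the parameter names are hypothetical and nothing here confirms that MainBeam reads them.

```json
{
  "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
  "metadata": {
    "name": "Todays directories",
    "description": "Test templates creation with Apache Hop",
    "parameters": [
      {
        "name": "hopPipelinePath",
        "label": "Hop pipeline file",
        "helpText": "GCS path of the .hpl pipeline to run (hypothetical parameter)",
        "isOptional": false
      },
      {
        "name": "hopMetadataPath",
        "label": "Hop metadata file",
        "helpText": "GCS path of the exported Hop metadata JSON (hypothetical parameter)",
        "isOptional": false
      }
    ]
  },
  "sdkInfo": {
    "language": "JAVA"
  }
}
```

At run time these would be supplied with something like --parameters hopPipelinePath=gs://my-bucket/pipeline.hpl,hopMetadataPath=gs://my-bucket/hop-metadata.json; whether MainBeam can consume options in that form is exactly what remains to be tested.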