Hi Hans,

No, I didn't have another go yet. The hints from Matt (did that mail not make it to 
the list?) do look quite useful in the context of Dataflow templates. I'll try to 
see whether I can get a bit further, but if you have time to have a look at it, 
I'd much appreciate it!

cheers

Fabian

> On 16.08.2022 at 11:09, Hans Van Akelyen <[email protected]> wrote:
> 
> Hi Fabian,
> 
> Did you get this working, and are you willing to share the final results?
> If not, I'll see what I can do, and we can add it to our documentation.
> 
> Cheers,
> Hans
> 
> On Thu, 11 Aug 2022 at 13:14, Matt Casters <[email protected]> wrote:
> When you run the class org.apache.hop.beam.run.MainBeam you need to provide 
> three arguments:
> 
> 1. The filename of the pipeline to run
> 2. The filename which contains Hop metadata
> 3. The name of the pipeline run configuration to use
> 
> See also for example: 
> https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-flink-pipeline-engine.html#_running_with_flink_run
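> A direct invocation could look like this (the jar path, file names and run 
> configuration name below are just placeholders):
> 
> ```shell
> # Hypothetical example: run MainBeam straight from the Hop fat jar.
> # The three arguments are positional:
> #   1) the pipeline to run
> #   2) the exported Hop metadata file
> #   3) the name of the pipeline run configuration
> java -cp fat-hop.jar org.apache.hop.beam.run.MainBeam \
>   /path/to/pipeline.hpl \
>   /path/to/hop-metadata.json \
>   Dataflow
> ```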
> 
> Good luck,
> Matt
> 
> 
> On Thu, Aug 11, 2022 at 11:08 AM Fabian Peters <[email protected] 
> <mailto:[email protected]>> wrote:
> Hello Hans,
> 
> I went through the flex-template process yesterday but the generated template 
> does not work. The main piece that's missing for me is how to pass the actual 
> pipeline that should be run. My test boiled down to:
> 
> gcloud dataflow flex-template build gs://foo_ag_dataflow/tmp/todays-directories.json \
>       --image-gcr-path "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest" \
>       --sdk-language "JAVA" \
>       --flex-template-base-image JAVA11 \
>       --metadata-file "/Users/fabian/Documents/src/foo/fooDataEngineering/hop/dataflow/todays-directories.json" \
>       --jar "/Users/fabian/tmp/fat-hop.jar" \
>       --env FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
> 
> gcloud dataflow flex-template run "todays-directories-`date +%Y%m%d-%H%M%S`" \
>     --template-file-gcs-location "gs://foo_ag_dataflow/tmp/todays-directories.json" \
>     --region "europe-west1"
> 
> With Dockerfile:
> 
> FROM gcr.io/dataflow-templates-base/java11-template-launcher-base
> 
> ARG WORKDIR=/dataflow/template
> RUN mkdir -p ${WORKDIR}
> WORKDIR ${WORKDIR}
> 
> ENV FLEX_TEMPLATE_JAVA_MAIN_CLASS="org.apache.hop.beam.run.MainBeam"
> ENV FLEX_TEMPLATE_JAVA_CLASSPATH="/dataflow/template/*"
> 
> ENTRYPOINT ["/opt/google/dataflow/java_template_launcher"]
> 
> 
> And "todays-directories.json":
> 
> {
>     "defaultEnvironment": {},
>     "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
>     "metadata": {
>         "description": "Test templates creation with Apache Hop",
>         "name": "Todays directories"
>     },
>     "sdkInfo": {
>         "language": "JAVA"
>     }
> }
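> My guess (untested) is that the metadata file would need a "parameters" 
> section, so that values can be passed via --parameters on flex-template run, 
> along these lines (parameter name "pipeline" is just a guess, and I don't 
> know whether MainBeam would actually pick it up):
> 
> ```json
> {
>     "defaultEnvironment": {},
>     "image": "europe-west1-docker.pkg.dev/dashboard-foo/dataflow/hop:latest",
>     "metadata": {
>         "description": "Test templates creation with Apache Hop",
>         "name": "Todays directories",
>         "parameters": [
>             {
>                 "name": "pipeline",
>                 "label": "Hop pipeline",
>                 "helpText": "Path to the Hop pipeline (.hpl) to run"
>             }
>         ]
>     },
>     "sdkInfo": {
>         "language": "JAVA"
>     }
> }
> ```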
> 
> Thanks for having a look at it!
> 
> cheers
> 
> Fabian
> 
>> On 10.08.2022 at 16:03, Hans Van Akelyen <[email protected]> wrote:
>> 
>> Hi Fabian,
>> 
>> You have indeed found something we have not yet documented, mainly because 
>> we have not yet tried it out ourselves.
>> The main class that gets called when running Beam pipelines is 
>> "org.apache.hop.beam.run.MainBeam".
>> 
>> I was hoping the "Import as pipeline" button on a job would give you 
>> everything you need to execute this, but it does not.
>> I'll take a closer look in the coming days to see what is needed to use this 
>> functionality; it could be that we need to export the template based on a 
>> pipeline.
>> 
>> Kr,
>> Hans
>> 
>> On Wed, 10 Aug 2022 at 15:46, Fabian Peters <[email protected]> wrote:
>> Hi all!
>> 
>> Thanks to Hans' work on the REST transform, I can now deploy my jobs to 
>> Dataflow.
>> 
>> Next, I'd like to schedule a batch job 
>> <https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler>, 
>> but for this I need to create a template 
>> <https://cloud.google.com/dataflow/docs/concepts/dataflow-templates>. 
>> I've searched the Hop documentation but haven't found anything on this. 
>> I'm guessing that flex-templates 
>> <https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template> 
>> are the way to go, due to the fat-jar, but I'm wondering what to pass as 
>> the FLEX_TEMPLATE_JAVA_MAIN_CLASS.
>> 
>> cheers
>> 
>> Fabian
> 
> 
> 
> -- 
> Neo4j Chief Solutions Architect
> ✉ [email protected]