https://github.com/google/dataflow-ml-starter/tree/main?tab=readme-ov-file#run-the-beam-pipeline-with-dataflow-flex-templates has a full example of how to create your own Flex Template. FYI.
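For context, the pipeline code that a Flex Template packages is just an ordinary Beam entrypoint; the template machinery only changes how it is launched. Here is a minimal sketch (the transforms and the assumption that this lives in a main.py are placeholders of mine, not taken from the starter repo):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Options such as --project and --region arrive at launch time
    # (e.g. via `gcloud dataflow flex-template run`), so nothing
    # Dataflow-specific needs to be hard-coded here.
    options = PipelineOptions(save_main_session=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["hello", "flex", "template"])
            | "Print" >> beam.Map(print)
        )

if __name__ == "__main__":
    run()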
On Mon, Dec 18, 2023 at 5:01 AM Bartosz Zabłocki via user <user@beam.apache.org> wrote:

> Hi Sumit,
> could you elaborate a little bit more on what you are trying to achieve with the templates?
>
> As far as I know, these base Docker images serve as base images for your own custom templates.
> If you want to use an existing template, you can use one of these:
> https://cloud.google.com/dataflow/docs/guides/templates/provided-templates.
> To run it, you just need to invoke `gcloud dataflow jobs run...` or an equivalent command
> (https://cloud.google.com/dataflow/docs/guides/templates/provided/pubsub-to-pubsub#gcloud),
> or use the UI to launch it (Cloud Console -> Dataflow -> Jobs -> Create Job From Template).
>
> If you want to create your own template (i.e., a reusable Dataflow pipeline), take a look at this page:
> https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template.
> This will let you package your own pipeline as a template, and you'll be able to launch it with the `gcloud dataflow jobs run...` command.
> If you want a custom container image, which gives you more control over the environment and dependencies, you can create your own custom Docker image. That's where you'll use the base image you mentioned. See this page for an example:
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_a_custom_container_for_dependencies.
>
> I hope this helps, let me know if you have any other questions.
>
> Cheers,
> Bartosz Zablocki
>
> On Mon, Dec 18, 2023 at 8:36 AM Sumit Desai via user <user@beam.apache.org> wrote:
>
>> I am creating an Apache Beam pipeline using the Python SDK. I want to use a standard Dataflow template (this one:
>> https://console.cloud.google.com/gcr/images/dataflow-templates-base/global/python310-template-launcher-base?tab=info).
>> But when I specify it using the 'template_location' key while creating the pipeline_options object, I get the error
>> `FileNotFoundError: [Errno 2] No such file or directory: 'gcr.io/dataflow-templates-base/python310-template-launcher-base'`.
>>
>> I also tried specifying the complete version,
>> `gcr.io/dataflow-templates-base/python310-template-launcher-base:flex_templates_base_image_release_20231127_RC00`,
>> but got the same error. Can someone suggest what I might be doing wrong?
>> The code snippet to create pipeline_options is as follows:
>>
>> def __create_pipeline_options_dataflow(job_name):
>>     # Set up the Dataflow runner options
>>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>>     # TODO: Move to environment variables
>>     pipeline_options = {
>>         'project': gcp_project_id,
>>         'region': "us-east1",
>>         'job_name': job_name,  # Provide a unique job name
>>         'temp_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>         'staging_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>         'runner': 'DataflowRunner',
>>         'save_main_session': True,
>>         'service_account_email': service_account,
>>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>>         # 'subnetwork': f'projects/{gcp_project_id}/regions/us-east1/subnetworks/default'
>>         'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>>     }
>>     logger.debug(f"pipeline_options created as {pipeline_options}")
>>     return pipeline_options
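One note on the snippet above: the FileNotFoundError is expected with that value. 'template_location' is the path (local or gs://) where the runner *writes* a classic template spec when you stage one, so the SDK tries to open 'gcr.io/...' as a file. The base launcher image is never passed there; it belongs in the FROM line of the Dockerfile you build a Flex Template from. A minimal sketch of the corrected options, assuming the goal is to stage a classic template (the bucket and template names below are placeholders of mine):

import os

def create_pipeline_options_dataflow(job_name):
    # Placeholders; substitute your own project, bucket, and names.
    gcp_project_id = os.environ.get("GCP_PROJECT_ID")
    bucket = "my-bucket"
    return {
        'project': gcp_project_id,
        'region': 'us-east1',
        'job_name': job_name,
        'temp_location': f'gs://{bucket}/temp',
        'staging_location': f'gs://{bucket}/staging',
        'runner': 'DataflowRunner',
        'save_main_session': True,
        # 'template_location' must be a writable path, not an image:
        # running the pipeline once with this key set stages the template
        # spec there. Omit the key entirely to just run the job directly.
        'template_location': f'gs://{bucket}/templates/my-template',
    }

If a Flex Template is the goal instead, drop 'template_location' altogether and build the template with `gcloud dataflow flex-template build`, using gcr.io/dataflow-templates-base/python310-template-launcher-base in your Dockerfile's FROM line.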