Hi Sumit,
could you elaborate a little bit more on what you are trying to achieve
with the templates?

As far as I know, these Docker images are not templates themselves; they
serve as base images for your own custom templates. That also explains the
error you're seeing: `template_location` expects a Cloud Storage path (a
`gs://...` URI) where the SDK writes the generated template, not a container
image reference, so the SDK fails to find a file at that location.
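
For illustration, here's a minimal sketch of how `template_location` is
normally used when creating a classic template (the project, bucket, and
template names below are made-up placeholders):

    from apache_beam.options.pipeline_options import PipelineOptions

    # 'template_location' must point at Cloud Storage, not at a Docker image.
    # When a pipeline runs once with these options, the SDK stages the
    # template spec to that gs:// path instead of executing the job.
    options = PipelineOptions.from_dictionary({
        'project': 'my-project',  # placeholder
        'region': 'us-east1',
        'runner': 'DataflowRunner',
        'temp_location': 'gs://my-bucket/temp',
        'staging_location': 'gs://my-bucket/staging',
        'template_location': 'gs://my-bucket/templates/my-template',
    })
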
If you want to use an existing template, you can use one of these:
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates.
To run it, you just need to invoke `gcloud dataflow jobs run ...` or an
equivalent command (
https://cloud.google.com/dataflow/docs/guides/templates/provided/pubsub-to-pubsub#gcloud).
Or just use the UI to launch it (Cloud Console -> Dataflow -> Jobs ->
Create Job From Template).
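
For example, launching the provided Pub/Sub to Pub/Sub template could look
roughly like this (the job name, bucket, and Pub/Sub resources are
placeholders; check the linked page for the exact parameter names):

    gcloud dataflow jobs run my-pubsub-relay \
        --gcs-location gs://dataflow-templates-us-east1/latest/Cloud_PubSub_to_Cloud_PubSub \
        --region us-east1 \
        --staging-location gs://my-bucket/temp \
        --parameters inputSubscription=projects/my-project/subscriptions/my-sub,outputTopic=projects/my-project/topics/my-topic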

If you want to create your own template (i.e. a reusable Dataflow pipeline),
take a look at this page:
https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template.
This will let you package your own pipeline as a template. You'll be able
to launch it with the `gcloud dataflow flex-template run ...` command.
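
The end-to-end flow is roughly the following (all names and paths below are
placeholders, and I'm assuming a single-file Python pipeline; the linked page
lists all the options):

    # Build the template: packages your code into a container image and
    # writes the template spec file to Cloud Storage.
    gcloud dataflow flex-template build gs://my-bucket/templates/my-template.json \
        --image-gcr-path "us-east1-docker.pkg.dev/my-project/my-repo/my-template:latest" \
        --sdk-language "PYTHON" \
        --flex-template-base-image "PYTHON3" \
        --py-path "." \
        --env "FLEX_TEMPLATE_PYTHON_PY_FILE=main.py" \
        --env "FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=requirements.txt"

    # Launch a job from the template spec.
    gcloud dataflow flex-template run "my-job" \
        --template-file-gcs-location gs://my-bucket/templates/my-template.json \
        --region us-east1
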
If you want more control over the environment and dependencies, you can
build a custom container image for your template. That's where you'd use the
base image you mentioned, as the `FROM` line of your Dockerfile. See this
page for an example:
https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_a_custom_container_for_dependencies
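
A minimal Dockerfile along those lines might look like this (the file layout
and entry-point name are assumptions about your project):

    FROM gcr.io/dataflow-templates-base/python310-template-launcher-base

    # Tell the template launcher where the pipeline entry point and
    # dependencies live inside the image.
    ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
    ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=/template/requirements.txt

    COPY . /template
    RUN pip install --no-cache-dir -r /template/requirements.txt

You'd then pass this image to `gcloud dataflow flex-template build` via
`--image` instead of letting it build one for you.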

I hope this helps. Let me know if you have any other questions.

Cheers,
Bartosz Zablocki

On Mon, Dec 18, 2023 at 8:36 AM Sumit Desai via user <user@beam.apache.org>
wrote:

> I am creating an Apache Beam pipeline using the Python SDK. I want to use a
> standard Dataflow template (this one
> <https://console.cloud.google.com/gcr/images/dataflow-templates-base/global/python310-template-launcher-base?tab=info>).
> But when I specify it using the 'template_location' key while creating the
> pipeline_options object, I get an error: `FileNotFoundError: [Errno 2] No
> such file or directory: 'gcr.io/dataflow-templates-base/python310-template-launcher-base'`
>
> I also tried specifying the complete version
> `gcr.io/dataflow-templates-base/python310-template-launcher-base::flex_templates_base_image_release_20231127_RC00`
> but got the same error. Can someone suggest what I might be doing wrong?
> The code snippet that creates pipeline_options is as follows:
>
> def __create_pipeline_options_dataflow(job_name):
>     # Set up the Dataflow runner options
>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>     # TODO: Move to environment variables
>     pipeline_options = {
>         'project': gcp_project_id,
>         'region': "us-east1",
>         'job_name': job_name,  # Provide a unique job name
>         'temp_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>         'staging_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>         'runner': 'DataflowRunner',
>         'save_main_session': True,
>         'service_account_email': service_account,
>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>         # 'subnetwork': f'projects/{gcp_project_id}/regions/us-east1/subnetworks/default',
>         'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>     }
>     logger.debug(f"pipeline_options created as {pipeline_options}")
>     return pipeline_options
>
