https://github.com/google/dataflow-ml-starter/tree/main?tab=readme-ov-file#run-the-beam-pipeline-with-dataflow-flex-templates
has a full example of how to create your own flex template. FYI.

On Mon, Dec 18, 2023 at 5:01 AM Bartosz Zabłocki via user <
user@beam.apache.org> wrote:

> Hi Sumit,
> could you elaborate a little bit more on what you are trying to achieve
> with the templates?
>
> As far as I know, these base Docker images serve as base images for your
> own custom templates.
> If you want to use an existing template, you can use one of these:
> https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
> .
> To run it, you just need to invoke `gcloud dataflow jobs run ...` or an
> equivalent command (
> https://cloud.google.com/dataflow/docs/guides/templates/provided/pubsub-to-pubsub#gcloud).
> Or just use the UI to launch it (Cloud Console -> Dataflow -> Jobs ->
> Create Job From Template).
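>
> For example, launching the provided Pub/Sub-to-Pub/Sub template from the CLI
> looks roughly like this (the bucket, project, and topic names below are just
> placeholders, adjust them to your setup):
>
>     gcloud dataflow jobs run my-pubsub-copy-job \
>         --gcs-location gs://dataflow-templates-us-east1/latest/Cloud_PubSub_to_Cloud_PubSub \
>         --region us-east1 \
>         --staging-location gs://my-bucket/temp \
>         --parameters inputTopic=projects/my-project/topics/in,outputTopic=projects/my-project/topics/out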
>
> If you want to create your own template (i.e. a reusable Dataflow pipeline),
> take a look at this page:
> https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#create_a_flex_template.
> This will let you package your own pipeline as a template, and you'll be able
> to launch it with the `gcloud dataflow flex-template run ...` command.
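> As a rough sketch (assuming you have already built and pushed a launcher
> image for your pipeline; the bucket, image, and parameter names below are
> placeholders), building and launching a flex template looks something like:
>
>     gcloud dataflow flex-template build gs://my-bucket/templates/my-pipeline.json \
>         --image "gcr.io/my-project/my-pipeline-launcher:latest" \
>         --sdk-language "PYTHON" \
>         --metadata-file "metadata.json"
>
>     gcloud dataflow flex-template run "my-pipeline-job" \
>         --template-file-gcs-location gs://my-bucket/templates/my-pipeline.json \
>         --region us-east1 \
>         --parameters my_param=my_value
>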
> If you want more control over the environment and dependencies, you can also
> build a custom container image for your template. That's where you'll use the
> base image you mentioned. See this page for an example:
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_a_custom_container_for_dependencies
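>
> For the Python SDK that typically means starting your Dockerfile from that
> base image and pointing the launcher at your pipeline code via environment
> variables, roughly like this (file names here are just an example, assuming
> main.py is your pipeline entry point and requirements.txt lists its
> dependencies):
>
>     FROM gcr.io/dataflow-templates-base/python310-template-launcher-base
>
>     ARG WORKDIR=/template
>     WORKDIR ${WORKDIR}
>
>     COPY requirements.txt main.py ./
>
>     ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
>     ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
>
>     RUN pip install --no-cache-dir -r requirements.txt
>
> The important point is that this image is something you build your template
> from, not something you pass as 'template_location' in the pipeline options.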
>
> I hope this helps, let me know if you have any other questions.
>
> Cheers,
> Bartosz Zablocki
>
> On Mon, Dec 18, 2023 at 8:36 AM Sumit Desai via user <user@beam.apache.org>
> wrote:
>
>> I am creating an Apache Beam pipeline using the Python SDK. I want to use a
>> standard Dataflow template (this one
>> <https://console.cloud.google.com/gcr/images/dataflow-templates-base/global/python310-template-launcher-base?tab=info>).
>> But when I specify it using the 'template_location' key while creating the
>> pipeline_options object, I get an error `FileNotFoundError: [Errno 2] No such
>> file or directory:
>> 'gcr.io/dataflow-templates-base/python310-template-launcher-base'`
>>
>> I also tried specifying the complete version,
>> `gcr.io/dataflow-templates-base/python310-template-launcher-base::flex_templates_base_image_release_20231127_RC00`,
>> but got the same error. Can someone suggest what I might be doing wrong?
>> The code snippet that creates pipeline_options is as follows:
>>
>> def __create_pipeline_options_dataflow(job_name):
>>     # Set up the Dataflow runner options
>>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>>     # TODO: Move to environmental variables
>>     pipeline_options = {
>>         'project': gcp_project_id,
>>         'region': "us-east1",
>>         'job_name': job_name,  # Provide a unique job name
>>         'temp_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>         'staging_location': f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>         'runner': 'DataflowRunner',
>>         'save_main_session': True,
>>         'service_account_email': service_account,
>>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>>         # 'subnetwork': f'projects/{gcp_project_id}/regions/us-east1/subnetworks/default',
>>         'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>>     }
>>     logger.debug(f"pipeline_options created as {pipeline_options}")
>>     return pipeline_options
>>
>>
>>
