You can use the same Docker image for both the template launcher and the Dataflow job. Here is one example: https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60
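Since one image can serve both the template launcher and the workers, another way to get the variable onto the workers is to bake it into the custom container image with ENV. A minimal sketch, assuming the Beam Python SDK base image; the ENV values and the package filename are placeholders, not values from this thread:

```dockerfile
# Hypothetical custom worker image; ENV values below are placeholders.
FROM apache/beam_python3.10_sdk:2.52.0

# Variables baked into the image exist in every worker process,
# so a package that reads them even at import time will find them.
ENV OTEL_SERVICE_NAME=bill-request-processor
ENV OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev

# Optionally install the private package into the image as well,
# instead of shipping it via extra_packages.
COPY uplight_telemetry.tar.gz /tmp/
RUN pip install /tmp/uplight_telemetry.tar.gz
```

Note this fixes the values at image-build time; per-launch values would still need template parameters or pipeline options.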
On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai <sumit.de...@uplight.com> wrote:

Yes, I will have to try it out.

Regards,
Sumit Desai

On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mmistr...@gmail.com> wrote:

I guess so; I am not an expert on using env variables in Dataflow pipelines, as any config dependencies I need, I pass as job input params.

But perhaps you can configure variables in your Dockerfile (I am not an expert in this either), as flex templates use Docker?

https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates

hth
Marco

On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <sumit.de...@uplight.com> wrote:

We are using an external, non-public package which expects environment variables only. If the environment variables are not found, it will throw an error, and we can't change the source of this package.

Does this mean we will face the same problem with flex templates also?

On Fri, 22 Dec 2023, 3:39 pm Sofia’s World <mmistr...@gmail.com> wrote:

The flex template will allow you to pass input params with dynamic values to your Dataflow job, so you could replace the env variable with that input? That is, unless you have to have env vars... but from your snippets it appears you are just using them to configure one of your components?
Hth

On Fri, 22 Dec 2023, 10:01 Sumit Desai <sumit.de...@uplight.com> wrote:

Hi Sofia and XQ,

The application is failing because I have loggers defined in every file, and the method that creates a logger tries to create an object of UplightTelemetry. If I use flex templates, will the environment variables I supply be loaded before the application gets loaded? If not, it would not serve my purpose.
Thanks & Regards,
Sumit Desai

On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <sumit.de...@uplight.com> wrote:

Thank you XQ. Will take a look at this.

Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <x...@google.com> wrote:

Dataflow VMs cannot know your local env variables. I think you should use a custom container: https://cloud.google.com/dataflow/docs/guides/using-custom-containers. Here is a sample project: https://github.com/google/dataflow-ml-starter

On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mmistr...@gmail.com> wrote:

Hello Sumit,
Thanks. Sorry... I guess if the value of the env variable is always the same you can pass it as a job param? Though it doesn't sound like a viable option...
Hth

On Wed, 20 Dec 2023, 09:49 Sumit Desai <sumit.de...@uplight.com> wrote:

Hi Sofia,

Thanks for the response. For now, we have decided not to use a flex template. Is there a way to pass environment variables without using any template?

Thanks & Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mmistr...@gmail.com> wrote:

Hi,
My 2 cents: have you ever considered using flex templates to run your pipeline? Then you can pass all your parameters at runtime. (Apologies in advance if it does not cover your use case...)

On Wed, 20 Dec 2023, 09:35 Sumit Desai via user <user@beam.apache.org> wrote:

Hi all,

I have a Python application which is using Apache Beam, with Dataflow as the runner.
The application uses a non-public Python package, 'uplight-telemetry', which is configured using 'extra_packages' while creating the pipeline_options object. This package expects an environment variable named 'OTEL_SERVICE_NAME', and since this variable is not present on the Dataflow worker, it results in an error during application startup.

I am passing this variable using custom pipeline options. The code to create the pipeline options is as follows:

    pipeline_options = ProcessBillRequests.CustomOptions(
        project=gcp_project_id,
        region="us-east1",
        job_name=job_name,
        temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
        staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
        runner='DataflowRunner',
        save_main_session=True,
        service_account_email=service_account,
        subnetwork=os.environ.get(SUBNETWORK_URL),
        extra_packages=[uplight_telemetry_tar_file_path],
        setup_file=setup_file_path,
        OTEL_SERVICE_NAME=otel_service_name,
        OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
        # Set values for additional custom variables as needed
    )

And the code that executes the pipeline is as follows:

    result = (
        pipeline
        | "ReadPendingRecordsFromDB" >> read_from_db
        | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
        | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
    )

    pipeline.run().wait_until_finish()

Is there a way I can set the environment variables in the custom options so that they are available in the worker?

Thanks & Regards,
Sumit Desai