You can use the same Docker image for both the template launcher and the
Dataflow job. Here is one example:
https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60
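
For reference, the basic shape of that pattern is sketched below. This is a
minimal sketch, not the linked file verbatim: the image tags and the
main.py/requirements.txt names are placeholders. The idea is to start from the
Beam SDK image, which already works as the worker's custom container, and copy
in the flex-template launcher binary so the same image can also launch the
template:

FROM apache/beam_python3.10_sdk:2.53.0

# Make the image usable as the flex-template launcher by copying in the
# launcher binary from Google's launcher base image.
COPY --from=gcr.io/dataflow-templates-base/python3-template-launcher-base:latest \
     /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

WORKDIR /template
COPY main.py requirements.txt /template/
RUN pip install --no-cache-dir -r requirements.txt

# Tell the launcher which pipeline file to run.
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=/template/requirements.txt

# (Entrypoint wiring differs by setup; see the linked Dockerfile for the
# exact details.)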

On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai <sumit.de...@uplight.com> wrote:

> Yes, I will have to try it out.
>
> Regards
> Sumit Desai
>
> On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mmistr...@gmail.com> wrote:
>
>> I guess so. I am not an expert on using env variables in Dataflow
>> pipelines; any config dependencies I need, I pass as job input params.
>>
>> But perhaps you can configure the variables in your Dockerfile (I am not
>> an expert in this either), since flex templates use Docker?
>>
>>
>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
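>>
>> For example, something roughly like this in the Dockerfile (a sketch; the
>> values are made up):
>>
>> # Baked into the image, so every process in the container, including the
>> # SDK harness and your user code, sees these variables.
>> ENV OTEL_SERVICE_NAME=bill-processing
>> ENV OTEL_RESOURCE_ATTRIBUTES=env=dev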
>>
>> hth
>>   Marco
>>
>>
>>
>>
>> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <sumit.de...@uplight.com>
>> wrote:
>>
>>> We are using an external, non-public package which expects environment
>>> variables only. If the environment variables are not found, it throws an
>>> error. We can't change the source of this package.
>>>
>>> Does this mean we will face the same problem with flex templates also?
>>>
>>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mmistr...@gmail.com> wrote:
>>>
>>>> A flex template will allow you to pass input params with dynamic
>>>> values to your Dataflow job, so you could replace the env variable with
>>>> that input? That is, unless you have to have env vars... but from your
>>>> snippets it appears you are just using them to configure one of your
>>>> components?
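>>>>
>>>> Something along these lines (a sketch; the option name is assumed from
>>>> your snippet, and _add_argparse_args is the usual way to declare custom
>>>> Beam options):
>>>>
>>>> from apache_beam.options.pipeline_options import PipelineOptions
>>>>
>>>> class CustomOptions(PipelineOptions):
>>>>     @classmethod
>>>>     def _add_argparse_args(cls, parser):
>>>>         # A regular job input param; a flex template can set it per
>>>>         # launch, so no env variable is needed.
>>>>         parser.add_argument("--otel_service_name", default=None)
>>>>
>>>> # At runtime, read it from the options instead of os.getenv:
>>>> service_name = pipeline_options.view_as(CustomOptions).otel_service_name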
>>>> Hth
>>>>
>>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <sumit.de...@uplight.com>
>>>> wrote:
>>>>
>>>>> Hi Sofia and XQ,
>>>>>
>>>>> The application is failing because I have loggers defined in every
>>>>> file, and the method that creates a logger tries to create an object of
>>>>> UplightTelemetry. If I use flex templates, will the environment
>>>>> variables I supply be loaded before the application gets loaded? If
>>>>> not, it would not serve my purpose.
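>>>>>
>>>>> (The only workaround I can think of, sketched below with assumed names,
>>>>> is to make that initialization lazy: populate os.environ on the worker
>>>>> inside DoFn.setup(), which runs before any elements are processed, and
>>>>> only import uplight-telemetry after that. This helps only if the
>>>>> module-level loggers can be made lazy as well.)
>>>>>
>>>>> import os
>>>>> import apache_beam as beam
>>>>>
>>>>> class FetchBillInformation(beam.DoFn):
>>>>>     def __init__(self, service_name):
>>>>>         self.service_name = service_name
>>>>>
>>>>>     def setup(self):
>>>>>         # Runs once per DoFn instance on the worker, before processing.
>>>>>         os.environ["OTEL_SERVICE_NAME"] = self.service_name
>>>>>         # Deferred import: the package reads the env variable on import.
>>>>>         import uplight_telemetry  # noqa: F401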
>>>>>
>>>>> Thanks & Regards,
>>>>> Sumit Desai
>>>>>
>>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <sumit.de...@uplight.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you XQ. Will take a look at this.
>>>>>>
>>>>>> Regards,
>>>>>> Sumit Desai
>>>>>>
>>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <x...@google.com> wrote:
>>>>>>
>>>>>>> Dataflow VMs cannot see your local env variables. I think you should
>>>>>>> use a custom container:
>>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>>> Here is a sample project:
>>>>>>> https://github.com/google/dataflow-ml-starter
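>>>>>>>
>>>>>>> With a custom container, you can bake the env variables into the image
>>>>>>> with ENV and point the job at it via the sdk_container_image pipeline
>>>>>>> option. Roughly (the image URL is a placeholder):
>>>>>>>
>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>     # ...your existing options...
>>>>>>>     sdk_container_image='us-east1-docker.pkg.dev/my-proj/repo/worker:latest',
>>>>>>> )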
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mmistr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Sumit
>>>>>>>>  Thanks. Sorry... I guess if the value of the env variable is always
>>>>>>>> the same, you can pass it as a job param? Though it doesn't sound
>>>>>>>> like a viable option...
>>>>>>>> Hth
>>>>>>>>
>>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <sumit.de...@uplight.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sofia,
>>>>>>>>>
>>>>>>>>> Thanks for the response. For now, we have decided not to use a flex
>>>>>>>>> template. Is there a way to pass environment variables without using
>>>>>>>>> any template?
>>>>>>>>>
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Sumit Desai
>>>>>>>>>
>>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mmistr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>  My 2 cents: have you ever considered using flex templates to run
>>>>>>>>>> your pipeline? Then you can pass all your parameters at runtime.
>>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>>
>>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a Python application which uses Apache Beam with Dataflow
>>>>>>>>>>> as the runner. The application uses a non-public Python package
>>>>>>>>>>> 'uplight-telemetry', which is configured using 'extra_packages'
>>>>>>>>>>> while creating the pipeline_options object. This package expects
>>>>>>>>>>> an environment variable named 'OTEL_SERVICE_NAME', and since this
>>>>>>>>>>> variable is not present on the Dataflow workers, it results in an
>>>>>>>>>>> error during application startup.
>>>>>>>>>>>
>>>>>>>>>>> I am passing this variable using custom pipeline options. The
>>>>>>>>>>> code that creates the pipeline options is as follows:
>>>>>>>>>>>
>>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>>     region="us-east1",
>>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>>     service_account_email=service_account,
>>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes,
>>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And the code that executes the pipeline is as follows:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> result = (
>>>>>>>>>>>     pipeline
>>>>>>>>>>>     | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>>     | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>>     | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>>
>>>>>>>>>>> Is there a way I can set the environment variables in custom
>>>>>>>>>>> options so that they are available in the worker?
>>>>>>>>>>>
>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>> Sumit Desai
>>>>>>>>>>>
>>>>>>>>>>