I guess so; I am not an expert on using env variables in Dataflow pipelines,
as any config dependencies I need, I pass as job input params.

But perhaps you can configure variables in your Dockerfile (I am not an
expert in this either), as flex templates use Docker?

https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
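
For example, something along these lines (an untested sketch; the variable
values are placeholders, and the launcher base image should match your
Python version):

FROM gcr.io/dataflow-templates-base/python311-template-launcher-base

# Variables the package expects, baked into the launcher environment
ENV OTEL_SERVICE_NAME=my-service
ENV OTEL_RESOURCE_ATTRIBUTES=env=dev

# Standard flex-template launcher configuration
COPY . /template
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=/template/setup.py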

hth
  Marco




On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <sumit.de...@uplight.com>
wrote:

> We are using an external non-public package which expects environment
> variables only. If the environment variables are not found, it throws an
> error. We can't change the source of this package.
>
> Does this mean we will face the same problem with flex templates also?
>
> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mmistr...@gmail.com> wrote:
>
>> The flex template will allow you to pass input params with dynamic values
>> to your Dataflow job, so you could replace the env variable with that
>> input? That is, unless you have to have env vars... but from your snippets
>> it appears you are just using them to configure one of your components?
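>>
>> Roughly this pattern, as a sketch (the option name and the DoFn are made
>> up; adjust to your package):
>>
>> import os
>> import apache_beam as beam
>> from apache_beam.options.pipeline_options import PipelineOptions
>>
>> class TelemetryOptions(PipelineOptions):
>>     @classmethod
>>     def _add_argparse_args(cls, parser):
>>         # A plain job input param instead of an env variable
>>         parser.add_argument('--otel_service_name', default='')
>>
>> class ExportEnv(beam.DoFn):
>>     def __init__(self, otel_service_name):
>>         self.otel_service_name = otel_service_name
>>
>>     def setup(self):
>>         # Runs on each worker before processing; recreate the env
>>         # variable the package expects from the job param
>>         os.environ.setdefault('OTEL_SERVICE_NAME', self.otel_service_name)
>>
>>     def process(self, element):
>>         yield element
>>
>> (This only helps if the package reads the variable after worker setup,
>> of course.)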
>> Hth
>>
>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <sumit.de...@uplight.com> wrote:
>>
>>> Hi Sofia and XQ,
>>>
>>> The application is failing because I have loggers defined in every file,
>>> and the method that creates a logger tries to create an object of
>>> UplightTelemetry. If I use flex templates, will the environment variables
>>> I supply be loaded before the application gets loaded? If not, it would
>>> not serve my purpose.
>>>
>>> Thanks & Regards,
>>> Sumit Desai
>>>
>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <sumit.de...@uplight.com>
>>> wrote:
>>>
>>>> Thank you XQ. Will take a look at this.
>>>>
>>>> Regards,
>>>> Sumit Desai
>>>>
>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <x...@google.com> wrote:
>>>>
>>>>> Dataflow VMs cannot see your local env variables. I think you should
>>>>> use a custom container:
>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>> Here is a sample project:
>>>>> https://github.com/google/dataflow-ml-starter
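>>>>>
>>>>> A minimal sketch of such a container (assumed values; match the image
>>>>> tag to your Beam and Python versions):
>>>>>
>>>>> FROM apache/beam_python3.10_sdk:2.52.0
>>>>>
>>>>> # Baked into every worker process started from this image
>>>>> ENV OTEL_SERVICE_NAME=my-service
>>>>> ENV OTEL_RESOURCE_ATTRIBUTES=env=dev
>>>>>
>>>>> Then point the job at the image with the sdk_container_image pipeline
>>>>> option.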
>>>>>
>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mmistr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Sumit,
>>>>>>  Thanks. Sorry... I guess if the value of the env variable is always
>>>>>> the same, you can pass it as a job param? ...though it doesn't sound
>>>>>> like a viable option...
>>>>>> Hth
>>>>>>
>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <sumit.de...@uplight.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sofia,
>>>>>>>
>>>>>>> Thanks for the response. For now, we have decided not to use a flex
>>>>>>> template. Is there a way to pass environment variables without using
>>>>>>> any template?
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Sumit Desai
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mmistr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>  My 2 cents: have you ever considered using flex templates to run
>>>>>>>> your pipeline? Then you can pass all your parameters at runtime.
>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>
>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have a Python application which uses Apache Beam with Dataflow as
>>>>>>>>> the runner. The application uses a non-public Python package
>>>>>>>>> 'uplight-telemetry', which is configured using 'extra_packages' while
>>>>>>>>> creating the pipeline_options object. This package expects an
>>>>>>>>> environment variable named 'OTEL_SERVICE_NAME', and since this
>>>>>>>>> variable is not present on the Dataflow workers, it results in an
>>>>>>>>> error during application startup.
>>>>>>>>>
>>>>>>>>> I am passing this variable using custom pipeline options. The code
>>>>>>>>> that creates the pipeline options is as follows:
>>>>>>>>>
>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>     region="us-east1",
>>>>>>>>>     job_name=job_name,
>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>     save_main_session=True,
>>>>>>>>>     service_account_email=service_account,
>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>> )
>>>>>>>>>
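>>>>>>>>> For reference, CustomOptions extends PipelineOptions roughly like
>>>>>>>>> this (a simplified sketch, not the exact definition):
>>>>>>>>>
>>>>>>>>> class CustomOptions(PipelineOptions):
>>>>>>>>>     @classmethod
>>>>>>>>>     def _add_argparse_args(cls, parser):
>>>>>>>>>         # Surfaced as job parameters rather than env variables
>>>>>>>>>         parser.add_argument('--OTEL_SERVICE_NAME', default=None)
>>>>>>>>>         parser.add_argument('--OTEL_RESOURCE_ATTRIBUTES', default=None)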
>>>>>>>>>
>>>>>>>>> And the code that executes the pipeline is as follows:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> result = (
>>>>>>>>>         pipeline
>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>         | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>> )
>>>>>>>>>
>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>
>>>>>>>>> Is there a way I can make the environment variables passed via
>>>>>>>>> custom options available in the workers?
>>>>>>>>>
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Sumit Desai
>>>>>>>>>
>>>>>>>>