Yes, I will have to try it out.

Regards
Sumit Desai

On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <[email protected]> wrote:

> I guess so, I am not an expert on using env variables in Dataflow
> pipelines; any config dependencies I need, I pass as job input
> params.
>
> But perhaps you can configure variables in your Dockerfile (I am not an
> expert in this either), as flex templates use Docker?
>
>
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
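>
> For example, a minimal sketch of what that might look like in the flex
> template's Dockerfile (illustrative only; the base image tag and the
> value are placeholders, and this sets the variable for the template
> launcher environment):
>
> # Hypothetical flex template launcher image
> FROM gcr.io/dataflow-templates-base/python3-template-launcher-base
>
> # Bake the variable into the image so it is present at launch time.
> ENV OTEL_SERVICE_NAME=bill-processing-service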
>
> hth
>   Marco
>
>
>
>
> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <[email protected]>
> wrote:
>
>> We are using an external, non-public package which expects environment
>> variables only. If the environment variables are not found, it throws an
>> error. We can't change the source of this package.
>>
>> Does this mean we will face the same problem with flex templates too?
>>
>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <[email protected]> wrote:
>>
>>> The flex template will allow you to pass input params with dynamic
>>> values to your Dataflow job, so you could replace the env variable with
>>> that input? That is, unless you have to have env vars..but from your
>>> snippets it appears you are just using them to configure one of your
>>> components? A minimal sketch of that idea follows (names here are made
>>> up for illustration; the worker reads the input param and sets the env
>>> variable itself in DoFn.setup, before the telemetry package looks for
>>> it):
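>>>
>>> import os
>>> import apache_beam as beam
>>> from apache_beam.options.pipeline_options import PipelineOptions
>>>
>>> class CustomOptions(PipelineOptions):
>>>     @classmethod
>>>     def _add_argparse_args(cls, parser):
>>>         # Job input param that replaces the env variable.
>>>         parser.add_argument("--otel_service_name", default="unknown")
>>>
>>> class FetchBills(beam.DoFn):
>>>     def __init__(self, otel_service_name):
>>>         self._otel_service_name = otel_service_name
>>>
>>>     def setup(self):
>>>         # Runs once per worker instance, before any elements are
>>>         # processed, so the variable is in place for later lookups.
>>>         os.environ["OTEL_SERVICE_NAME"] = self._otel_service_name
>>>
>>> Note this only helps if the package reads the variable lazily, not at
>>> import time.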
>>> Hth
>>>
>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <[email protected]> wrote:
>>>
>>>> Hi Sofia and XQ,
>>>>
>>>> The application is failing because I have loggers defined in every file
>>>> and the method to create a logger tries to create an object of
>>>> UplightTelemetry. If I use flex templates, will the environment variables
>>>> I supply be loaded before the application gets loaded? If not, it would not
>>>> serve my purpose.
>>>>
>>>> Thanks & Regards,
>>>> Sumit Desai
>>>>
>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <[email protected]>
>>>> wrote:
>>>>
>>>>> Thank you XQ. Will take a look at this.
>>>>>
>>>>> Regards,
>>>>> Sumit Desai
>>>>>
>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <[email protected]> wrote:
>>>>>
>>>>>> Dataflow VMs cannot know your local env variable. I think you should
>>>>>> use custom container:
>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>> Here is a sample project:
>>>>>> https://github.com/google/dataflow-ml-starter
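>>>>>>
>>>>>> A rough sketch of a custom worker image (untested; the SDK version
>>>>>> tag and the value are placeholders):
>>>>>>
>>>>>> FROM apache/beam_python3.10_sdk:2.52.0
>>>>>>
>>>>>> # Env vars baked into the worker image are visible to the SDK worker
>>>>>> # processes, so the package can find them at startup.
>>>>>> ENV OTEL_SERVICE_NAME=bill-processing-service
>>>>>>
>>>>>> You would then point the job at the image with the
>>>>>> --sdk_container_image pipeline option.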
>>>>>>
>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Sumit
>>>>>>> Thanks. Sorry...I guess if the value of the env variable is always
>>>>>>> the same you could pass it as a job param?..though it doesn't sound
>>>>>>> like a viable option...
>>>>>>> Hth
>>>>>>>
>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Sofia,
>>>>>>>>
>>>>>>>> Thanks for the response. For now, we have decided not to use a flex
>>>>>>>> template. Is there a way to pass environment variables without using
>>>>>>>> any template?
>>>>>>>>
>>>>>>>> Thanks & Regards,
>>>>>>>> Sumit Desai
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> My 2 cents: have you ever considered using flex templates to run
>>>>>>>>> your pipeline? Then you can pass all your parameters at runtime.
>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>
>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have a Python application which uses Apache Beam with Dataflow
>>>>>>>>>> as the runner. The application uses a non-public Python package
>>>>>>>>>> 'uplight-telemetry', which is configured via 'extra_packages' while
>>>>>>>>>> creating the pipeline_options object. This package expects an
>>>>>>>>>> environment variable named 'OTEL_SERVICE_NAME', and since this
>>>>>>>>>> variable is not present on the Dataflow workers, it results in an
>>>>>>>>>> error during application startup.
>>>>>>>>>>
>>>>>>>>>> I am passing this variable using custom pipeline options. The code
>>>>>>>>>> that creates the pipeline options is as follows:
>>>>>>>>>>
>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>     region="us-east1",
>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>     service_account_email=service_account,
>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And the code that executes the pipeline is as follows:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> result = (
>>>>>>>>>>         pipeline
>>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>         | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>
>>>>>>>>>> Is there a way I can make the environment variables set in the
>>>>>>>>>> custom options available in the worker?
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards,
>>>>>>>>>> Sumit Desai
>>>>>>>>>>
>>>>>>>>>
