The flex template will allow you to pass input params with dynamic values
to your Dataflow job, so you could replace the env variable with that
input. That is, unless you have to have env vars... but from your snippets it
appears you are just using them to configure one of your components?
HTH
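To illustrate the idea: the value previously read from the env var can be declared as an ordinary job parameter instead. A minimal stdlib-only sketch (option name and values are made up; in Beam the same `add_argument` call would live inside a `PipelineOptions` subclass's `_add_argparse_args`):

```python
import argparse

# Sketch: the value formerly read from the OTEL_SERVICE_NAME env var
# becomes an ordinary command-line parameter of the job.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--otel_service_name",
    default="unknown-service",
    help="Service name previously taken from the OTEL_SERVICE_NAME env var.",
)

# At submission time the caller passes a dynamic value:
args = parser.parse_args(["--otel_service_name", "billing-pipeline"])
print(args.otel_service_name)  # -> billing-pipeline
```

With a flex template, the same parameter can then be supplied per-launch rather than baked into the environment.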

On Fri, 22 Dec 2023, 10:01 Sumit Desai, <sumit.de...@uplight.com> wrote:

> Hi Sofia and XQ,
>
> The application is failing because I have loggers defined in every file,
> and the method that creates a logger tries to create an object of
> UplightTelemetry. If I use a flex template, will the environment variables
> I supply be loaded before the application gets loaded? If not, it would not
> serve my purpose.
>
> Thanks & Regards,
> Sumit Desai
>
> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <sumit.de...@uplight.com>
> wrote:
>
>> Thank you XQ. Will take a look at this.
>>
>> Regards,
>> Sumit Desai
>>
>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <x...@google.com> wrote:
>>
>>> Dataflow VMs cannot know your local env variables. I think you should use
>>> a custom container:
>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>> Here is a sample project: https://github.com/google/dataflow-ml-starter
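As a sketch, baking the variable into a custom SDK container image might look like this (base image tag and value are illustrative; adapt to your Beam/Python version):

```dockerfile
# Illustrative base image; pick the tag matching your Beam SDK and Python version.
FROM apache/beam_python3.10_sdk:2.52.0

# Bake the variable into the worker image so it exists before any user
# code (including module-level logger creation) runs.
ENV OTEL_SERVICE_NAME=billing-pipeline
ENV OTEL_RESOURCE_ATTRIBUTES=env=dev
```

Because the variable is set in the image itself, it is visible to the worker process from the very first import, which a pipeline option alone cannot guarantee.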
>>>
>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mmistr...@gmail.com>
>>> wrote:
>>>
>>>> Hello Sumit
>>>> Thanks. Sorry... I guess if the value of the env variable is always the
>>>> same, you can pass it as a job param? Though it doesn't sound like a
>>>> viable option...
>>>> HTH
>>>>
>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <sumit.de...@uplight.com>
>>>> wrote:
>>>>
>>>>> Hi Sofia,
>>>>>
>>>>> Thanks for the response. For now, we have decided not to use flex
>>>>> template. Is there a way to pass environmental variables without using any
>>>>> template?
>>>>>
>>>>> Thanks & Regards,
>>>>> Sumit Desai
>>>>>
>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mmistr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> My 2 cents: have you ever considered using flex templates to run your
>>>>>> pipeline? Then you can pass all your parameters at runtime.
>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>
>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>> user@beam.apache.org> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a Python application which uses Apache Beam with Dataflow as the
>>>>>>> runner. The application uses a non-public Python package
>>>>>>> 'uplight-telemetry', which is configured via 'extra_packages' while
>>>>>>> creating the pipeline_options object. This package expects an environment
>>>>>>> variable named 'OTEL_SERVICE_NAME', and since this variable is not present
>>>>>>> on the Dataflow worker, it results in an error during application
>>>>>>> startup.
>>>>>>>
>>>>>>> I am passing this variable using custom pipeline options. Code to
>>>>>>> create pipeline options is as follows-
>>>>>>>
>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>     project=gcp_project_id,
>>>>>>>     region="us-east1",
>>>>>>>     job_name=job_name,
>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>     runner='DataflowRunner',
>>>>>>>     save_main_session=True,
>>>>>>>     service_account_email=service_account,
>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>     setup_file=setup_file_path,
>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes,
>>>>>>>     # Set values for additional custom variables as needed
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>
>>>>>>>
>>>>>>> result = (
>>>>>>>     pipeline
>>>>>>>     | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>     | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>     | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>> )
>>>>>>>
>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>
>>>>>>> Is there a way I can make the environment variables passed via custom
>>>>>>> options available on the worker?
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Sumit Desai
>>>>>>>
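One possible workaround, if the telemetry object is only created inside DoFn code rather than at import time: ship the value through the custom options and export it on the worker before first use. A stdlib-only sketch of the pattern (in real code the class would subclass `beam.DoFn`, and the value would come from the pipeline options rather than a hard-coded string):

```python
import os

class FetchBillInformation:
    """Sketch of a DoFn-style class; the real one subclasses beam.DoFn."""

    def __init__(self, otel_service_name):
        # Captured at pipeline-construction time and shipped (pickled)
        # to the worker along with the DoFn instance.
        self.otel_service_name = otel_service_name

    def setup(self):
        # setup() runs on the worker before any element is processed, so the
        # variable is in place before the telemetry package reads it here.
        os.environ["OTEL_SERVICE_NAME"] = self.otel_service_name

fn = FetchBillInformation("billing-pipeline")
fn.setup()  # on Dataflow the runner invokes this, not user code
print(os.environ["OTEL_SERVICE_NAME"])  # -> billing-pipeline
```

Caveat: this does not help with loggers created at module import time, as described earlier in the thread; those still need the variable present in the worker image itself.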
>>>>>>
