Yes, I will have to try it out.

Regards,
Sumit Desai
On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <[email protected]> wrote:

I guess so. I am not an expert on using env variables in Dataflow pipelines; any config dependencies I need, I pass as job input params. But perhaps you can configure variables in your Dockerfile (I am not an expert in this either), since flex templates use Docker?

https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates

hth
Marco

On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <[email protected]> wrote:

We are using an external non-public package which expects environment variables only. If the environment variables are not found, it will throw an error. We can't change the source of this package.

Does this mean we will face the same problem with flex templates also?

On Fri, 22 Dec 2023, 3:39 pm Sofia’s World <[email protected]> wrote:

The flex template will allow you to pass input params with dynamic values to your Dataflow job, so you could replace the env variable with that input? That is, unless you have to have env vars... but from your snippets it appears you are just using them to configure one of your components?
Hth

On Fri, 22 Dec 2023, 10:01 Sumit Desai <[email protected]> wrote:

Hi Sofia and XQ,

The application is failing because I have loggers defined in every file, and the method that creates a logger tries to create an object of UplightTelemetry. If I use flex templates, will the environment variables I supply be loaded before the application gets loaded? If not, it would not serve my purpose.

Thanks & Regards,
Sumit Desai

On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <[email protected]> wrote:

Thank you, XQ. Will take a look at this.

Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <[email protected]> wrote:

Dataflow VMs cannot know your local env variables. I think you should use a custom container:
https://cloud.google.com/dataflow/docs/guides/using-custom-containers
Here is a sample project: https://github.com/google/dataflow-ml-starter
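To make the custom-container suggestion concrete, here is a minimal sketch of a worker image that bakes the variables in. The base image tag, service name, and attribute values are placeholders; the base image should match the Beam SDK and Python version used to submit the job.

# Minimal sketch (assumed tag and values); see the custom-containers guide linked above.
FROM apache/beam_python3.10_sdk:2.52.0

# Bake in the variables that 'uplight-telemetry' reads at import time,
# so they exist in the worker process before any user code is loaded.
ENV OTEL_SERVICE_NAME=my-dataflow-job
ENV OTEL_RESOURCE_ATTRIBUTES=service.namespace=my-namespace

The built image would be pushed to a registry the workers can pull from and selected with the --sdk_container_image pipeline option. Flex templates can take the same Dockerfile-based approach, per the flex-template link Marco posted.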
On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <[email protected]> wrote:

Hello Sumit,
Thanks. Sorry... I guess if the value of the env variable is always the same, you can pass it as job params? Though it doesn't sound like a viable option...
Hth

On Wed, 20 Dec 2023, 09:49 Sumit Desai <[email protected]> wrote:

Hi Sofia,

Thanks for the response. For now, we have decided not to use a flex template. Is there a way to pass environment variables without using any template?

Thanks & Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <[email protected]> wrote:

Hi,
My 2 cents: have you ever considered using flex templates to run your pipeline? Then you can pass all your parameters at runtime. (Apologies in advance if it does not cover your use case...)

On Wed, 20 Dec 2023, 09:35 Sumit Desai via user <[email protected]> wrote:

Hi all,

I have a Python application which uses Apache Beam with Dataflow as the runner. The application uses a non-public Python package 'uplight-telemetry', which is configured using 'extra_packages' while creating the pipeline_options object. This package expects an environment variable named 'OTEL_SERVICE_NAME', and since this variable is not present on the Dataflow worker, it results in an error during application startup.

I am passing this variable using custom pipeline options. The code to create the pipeline options is as follows:

pipeline_options = ProcessBillRequests.CustomOptions(
    project=gcp_project_id,
    region="us-east1",
    job_name=job_name,
    temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
    staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
    runner='DataflowRunner',
    save_main_session=True,
    service_account_email=service_account,
    subnetwork=os.environ.get(SUBNETWORK_URL),
    extra_packages=[uplight_telemetry_tar_file_path],
    setup_file=setup_file_path,
    OTEL_SERVICE_NAME=otel_service_name,
    OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
    # Set values for additional custom variables as needed
)

And the code that executes the pipeline is as follows:

result = (
    pipeline
    | "ReadPendingRecordsFromDB" >> read_from_db
    | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
    | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
)

pipeline.run().wait_until_finish()

Is there a way I can make the environment variables set in custom options available on the worker?

Thanks & Regards,
Sumit Desai
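For reference, a minimal sketch of how the ProcessBillRequests.CustomOptions class referenced above might be defined, using Beam's documented pattern for custom pipeline options. The class itself is not shown in the thread, so the argument names and defaults here are assumptions reconstructed from the options passed in:

from apache_beam.options.pipeline_options import PipelineOptions

class CustomOptions(PipelineOptions):
    # Hypothetical reconstruction; the real class is not shown in the thread.
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument("--OTEL_SERVICE_NAME", default=None,
                            help="Service name read by uplight-telemetry.")
        parser.add_argument("--OTEL_RESOURCE_ATTRIBUTES", default=None,
                            help="Resource attributes read by uplight-telemetry.")

Options defined this way reach the worker as pipeline options, not as process environment variables, which matches the startup error described above: the package looks in os.environ, so something on the worker (for example the custom container suggested earlier in the thread) still has to export the values before uplight-telemetry is imported.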
