I guess so; I am not an expert on using env variables in Dataflow pipelines, as I pass any config dependencies I need as job input params.

But perhaps you can configure variables in your Dockerfile (I am not an expert in this either), since flex templates use Docker? https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates

hth
Marco
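A minimal sketch of the job-input-params approach Marco describes, assuming a custom PipelineOptions subclass; the flag names here are illustrative, not taken from the job in this thread:

from apache_beam.options.pipeline_options import PipelineOptions

class TelemetryOptions(PipelineOptions):
    """Carries telemetry config as job parameters rather than env variables."""

    @classmethod
    def _add_argparse_args(cls, parser):
        # Illustrative flags; align them with whatever the job actually expects.
        parser.add_argument('--otel_service_name', default='')
        parser.add_argument('--otel_resource_attributes', default='')

On the worker, the values are then recoverable with pipeline_options.view_as(TelemetryOptions), so no process-level environment variable is involved.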
On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <[email protected]> wrote:

We are using an external non-public package which expects environment variables only. If the environment variables are not found, it throws an error, and we can't change the source of this package.

Does this mean we will face the same problem with flex templates also?

On Fri, 22 Dec 2023, 3:39 pm Sofia’s World <[email protected]> wrote:

The flex template will allow you to pass input params with dynamic values to your Dataflow job, so you could replace the env variable with that input? That is, unless you have to have env vars... but from your snippets it appears you are just using them to configure one of your components?
Hth

On Fri, 22 Dec 2023, 10:01 Sumit Desai <[email protected]> wrote:

Hi Sofia and XQ,

The application is failing because I have loggers defined in every file, and the method that creates a logger tries to create an object of UplightTelemetry. If I use flex templates, will the environment variables I supply be loaded before the application gets loaded? If not, it would not serve my purpose.

Thanks & Regards,
Sumit Desai

On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <[email protected]> wrote:

Thank you, XQ. Will take a look at this.

Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <[email protected]> wrote:

Dataflow VMs cannot know your local env variables. I think you should use a custom container: https://cloud.google.com/dataflow/docs/guides/using-custom-containers. Here is a sample project: https://github.com/google/dataflow-ml-starter

On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <[email protected]> wrote:

Hello Sumit,
Thanks. Sorry... I guess if the value of the env variable is always the same you can pass it as job params? ...though it doesn't sound like a viable option...
Hth

On Wed, 20 Dec 2023, 09:49 Sumit Desai <[email protected]> wrote:

Hi Sofia,

Thanks for the response. For now, we have decided not to use a flex template. Is there a way to pass environment variables without using any template?

Thanks & Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <[email protected]> wrote:

Hi,
My 2 cents: have you ever considered using flex templates to run your pipeline? Then you can pass all your parameters at runtime. (Apologies in advance if it does not cover your use case...)
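On the import-time failure Sumit describes above (loggers created while modules load), a hedged sketch of one way to defer the failing construction until first use; the UplightTelemetry-backed logger is only hinted at in a comment, since that package's API is not public:

import logging
import os

_LOGGER = None

def get_logger():
    # Deferred creation: nothing telemetry-related runs at import time,
    # so modules can be imported before OTEL_SERVICE_NAME exists.
    global _LOGGER
    if _LOGGER is None:
        # By the first call, the job (or container image) should have set
        # the variable; fail loudly here rather than at import.
        if 'OTEL_SERVICE_NAME' not in os.environ:
            raise RuntimeError('OTEL_SERVICE_NAME is not set')
        # Stand-in: the real code would build its UplightTelemetry-backed
        # logger here instead of a plain logging.Logger.
        _LOGGER = logging.getLogger(__name__)
    return _LOGGER

This only helps if the package's own import does not read the variable; if it does, the variable has to exist before any import, which is what the custom-container suggestion above addresses.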
On Wed, 20 Dec 2023, 09:35 Sumit Desai via user <[email protected]> wrote:

Hi all,

I have a Python application which uses Apache Beam with Dataflow as the runner. The application uses a non-public Python package 'uplight-telemetry', which is configured using 'extra_packages' while creating the pipeline_options object. This package expects an environment variable named 'OTEL_SERVICE_NAME', and since this variable is not present on the Dataflow worker, it results in an error during application startup.

I am passing this variable using custom pipeline options. The code to create the pipeline options is as follows:

pipeline_options = ProcessBillRequests.CustomOptions(
    project=gcp_project_id,
    region="us-east1",
    job_name=job_name,
    temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
    staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
    runner='DataflowRunner',
    save_main_session=True,
    service_account_email=service_account,
    subnetwork=os.environ.get(SUBNETWORK_URL),
    extra_packages=[uplight_telemetry_tar_file_path],
    setup_file=setup_file_path,
    OTEL_SERVICE_NAME=otel_service_name,
    OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
    # Set values for additional custom variables as needed
)

And the code that executes the pipeline is as follows:

result = (
    pipeline
    | "ReadPendingRecordsFromDB" >> read_from_db
    | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
    | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
)

pipeline.run().wait_until_finish()

Is there a way I can set the environment variables passed in custom options so that they are available on the worker?

Thanks & Regards,
Sumit Desai
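Not a definitive answer, but a sketch of the idea the thread circles around: values captured in custom options at pipeline-construction time can be copied into os.environ on the worker from a DoFn's setup() hook, which runs on the worker before any elements are processed. Whether this is early enough for uplight-telemetry's import-time logger depends on when that package is first imported; the class below is illustrative:

import os
import apache_beam as beam

class PropagateEnvDoFn(beam.DoFn):
    """Copies values captured at pipeline-construction time into the
    worker process's environment (illustrative, not the thread's code)."""

    def __init__(self, service_name, resource_attributes):
        # Plain strings resolved on the launcher and pickled to workers.
        self._service_name = service_name
        self._resource_attributes = resource_attributes

    def setup(self):
        # Runs once per DoFn instance on the worker, before process().
        os.environ['OTEL_SERVICE_NAME'] = self._service_name
        os.environ['OTEL_RESOURCE_ATTRIBUTES'] = self._resource_attributes

    def process(self, element):
        yield element

Code that runs at import time on the worker may still execute before setup(), so baking the variable into a custom container image, as suggested above, remains the more robust route.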
