Hey,
Sure... it's a pretty rough script :) ... just an ordinary Dataflow script

https://github.com/mmistroni/GCP_Experiments/tree/master/dataflow/edgar_flow


What I meant to say, for your template question, is that you could write a
basic script which runs on Beam... something as simple as this

https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/beam_test.py

and then you can create a template out of it by just running this

python -m edgar_main --runner=dataflow --project=datascience-projets \
  --template_location=gs://mm_dataflow_bucket/templates/edgar_dataflow_template \
  --temp_location=gs://mm_dataflow_bucket/temp \
  --staging_location=gs://mm_dataflow_bucket/staging

That will create a template named 'edgar_dataflow_template' which you can then
use in the GCP Dataflow console to create your job.
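
If you'd rather kick the job off from the command line instead of the console,
the gcloud CLI can run a job from that template too (the job name and region
here are just examples; adjust for your setup):

```shell
# Launch a Dataflow job from the template created above.
gcloud dataflow jobs run my-edgar-job \
  --gcs-location gs://mm_dataflow_bucket/templates/edgar_dataflow_template \
  --region us-central1
```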

HTH. I'm sort of a noob to Beam, having started writing code just over a
month ago. Feel free to ping me if you get stuck.

Kind regards,
Marco

On Sat, Apr 4, 2020 at 6:01 PM Xander Song <iamuuriw...@gmail.com> wrote:

> Hi Marco,
>
> Thanks for your response. Would you mind sending the edgar_main script so
> I can take a look?
>
> On Sat, Apr 4, 2020 at 2:25 AM Marco Mistroni <mmistr...@gmail.com> wrote:
>
>> Hey
>>  As far as I know you can generate a dataflow template out of your beam
>> code by specifying an option on command line?
>> I am running this CMD and once template is generated I kick off a dflow
>> job via console by pointing at it
>>
>> python -m edgar_main --runner=dataflow --project=datascience-projets
>> --template_location=gs://<your bucket> Hth
>>
>>
>> On Sat, Apr 4, 2020, 9:52 AM Xander Song <iamuuriw...@gmail.com> wrote:
>>
>>> I am attempting to write a custom Dataflow Template using the Apache
>>> Beam Python SDK, but am finding the documentation difficult to follow. Does
>>> anyone have a minimal working example of how to write and deploy such a
>>> template?
>>>
>>> Thanks in advance.
>>>
>>
