Hi Daniel,
Thanks for engaging me on this one.

On 2022/03/04 18:17:58 Daniel Standish wrote:
> Where is the data coming from?  

From a previous Airflow task.

Task A generates arbitrary JSON data.
Task B processes that arbitrary JSON data in a KubernetesPodOperator. Again, we use the
KubernetesPodOperator because we are trying to isolate this legacy feature and
'manage' it separately from the other tasks.
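
To make the shape concrete, something along these lines (just a sketch assuming
Airflow 2.x with the cncf.kubernetes provider; the DAG name, image and
arguments below are illustrative placeholders, not our actual pipeline):

    from airflow.decorators import dag, task
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )
    from pendulum import datetime


    @dag(start_date=datetime(2022, 3, 1), schedule_interval=None, catchup=False)
    def legacy_json_pipeline():
        @task
        def task_a():
            # Task A: produce an arbitrary JSON payload (pushed as the XCom
            # return value).
            return {"records": [{"id": 1, "value": "example"}]}

        payload = task_a()

        # Task B: process the JSON inside an isolated pod; the payload is
        # rendered into the container arguments via a Jinja template.
        pod = KubernetesPodOperator(
            task_id="task_b",
            name="process-json",
            image="my-legacy-processor:latest",  # hypothetical image
            arguments=["--data", "{{ ti.xcom_pull(task_ids='task_a') | tojson }}"],
        )

        # Ensure Task B runs after Task A so the XCom exists when it renders.
        payload >> pod


    legacy_json_pipeline()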

> Can you refactor your task so that it reads
> data from cloud storage and pushes it into ES?  Rather than taking the data
> as an arg to the task.  So instead, your arg is
> `s3://blah-bucket/blah-file.ndjson.gz` or something.

Absolutely. We wrote a custom AWS S3 XCom backend to do exactly that.
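
For reference, a minimal sketch of that kind of backend (assuming Airflow 2.x
and the Amazon provider's S3Hook; the bucket name and key prefix here are
placeholders, not our actual config):

    import json
    import uuid

    from airflow.models.xcom import BaseXCom
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook


    class S3XComBackend(BaseXCom):
        PREFIX = "xcom_s3://"
        BUCKET_NAME = "my-xcom-bucket"  # hypothetical bucket

        @staticmethod
        def serialize_value(value, **kwargs):
            # Offload JSON payloads to S3 and store only a reference in XCom.
            if isinstance(value, (dict, list)):
                hook = S3Hook()
                key = f"xcom/{uuid.uuid4()}.json"
                hook.load_string(
                    string_data=json.dumps(value),
                    key=key,
                    bucket_name=S3XComBackend.BUCKET_NAME,
                    replace=True,
                )
                value = S3XComBackend.PREFIX + key
            return BaseXCom.serialize_value(value)

        @staticmethod
        def deserialize_value(result):
            # Resolve the S3 reference back into the original JSON payload.
            value = BaseXCom.deserialize_value(result)
            if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
                hook = S3Hook()
                key = value.replace(S3XComBackend.PREFIX, "", 1)
                value = json.loads(
                    hook.read_key(key=key, bucket_name=S3XComBackend.BUCKET_NAME)
                )
            return value

It is wired in through the xcom_backend option in the [core] section of
airflow.cfg (or the AIRFLOW__CORE__XCOM_BACKEND environment variable).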

Is that your guidance?

Thank you
lewismc
