Re: Optimal mechanism for loading data into KubernetesPodOperator

2022-03-07 Thread Eric Chiu
So, in order to get around running Airflow inside the KPO, we do the following: we know that the meta-database stores the XCom data, or in this case, because the custom XCom backend is enabled, stores the S3 filename. We know that to get to the XCom S3 file name we can query the xcom table based on the
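A minimal sketch of that lookup, assuming direct SQL access to the Airflow meta-database; the connection string, DAG/task IDs, and the exact xcom schema (which varies across Airflow versions) are illustrative, not from the thread:

    # Fetch the S3 key that the custom XCom backend stored in the
    # meta-database, without running Airflow inside the KPO container.
    import json
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql+psycopg2://airflow:***@meta-db:5432/airflow")

    stmt = text("""
        SELECT value
        FROM xcom
        WHERE dag_id = :dag_id
          AND task_id = :task_id
          AND key = 'return_value'
        ORDER BY timestamp DESC
        LIMIT 1
    """)

    with engine.connect() as conn:
        row = conn.execute(stmt, {"dag_id": "my_dag", "task_id": "task_a"}).fetchone()

    # The value column holds the serialized XCom; with an S3 backend this
    # deserializes to the S3 key/URI rather than the data itself.
    s3_key = json.loads(row[0]) if row else None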

Re: Optimal mechanism for loading data into KubernetesPodOperator

2022-03-04 Thread Daniel Standish
> Absolutely. We wrote a custom AWS S3 XCom backend to do exactly that. Well, if you have it all working then what are we yabbering about :) I think a custom XCom backend requires that your container is running Airflow -- but you are using KPO, so I don't know if that's true? Either that or it forces you
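For context, a custom XCom backend is a BaseXCom subclass that Airflow loads via the [core] xcom_backend setting, which is why it only takes effect in processes actually running Airflow. A minimal sketch of the S3 flavor (the bucket name and key scheme are illustrative):

    import json
    import uuid

    from airflow.models.xcom import BaseXCom
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    class S3XComBackend(BaseXCom):
        PREFIX = "s3-xcom://"
        BUCKET = "my-xcom-bucket"  # illustrative

        @staticmethod
        def serialize_value(value, **kwargs):
            # Offload large payloads to S3; store only the key in the DB.
            if isinstance(value, (dict, list)):
                key = f"xcom/{uuid.uuid4()}.json"
                S3Hook().load_string(json.dumps(value), key=key,
                                     bucket_name=S3XComBackend.BUCKET)
                value = S3XComBackend.PREFIX + key
            return BaseXCom.serialize_value(value)

        @staticmethod
        def deserialize_value(result):
            value = BaseXCom.deserialize_value(result)
            if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
                key = value[len(S3XComBackend.PREFIX):]
                value = json.loads(S3Hook().read_key(key=key,
                                                     bucket_name=S3XComBackend.BUCKET))
            return value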

Re: Optimal mechanism for loading data into KubernetesPodOperator

2022-03-04 Thread Lewis John McGibbney
Hi Daniel, Thanks for engaging me on this one. On 2022/03/04 18:17:58 Daniel Standish wrote: > Where is the data coming from? From a previous Airflow task. Task A generates arbitrary JSON data; Task B processes that JSON data in a KubernetesPodOperator. Again, we use KubernetesPodOperator
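A hypothetical shape for that pairing (task IDs, image, and namespace are illustrative); the templated argument embeds the entire rendered JSON in the pod spec, which is what makes large payloads a problem:

    from airflow.decorators import task
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    @task
    def task_a():
        # Generates arbitrary JSON; the return value goes through XCom.
        return {"docs": ["..."]}

    task_b = KubernetesPodOperator(
        task_id="task_b",
        name="process-json",
        namespace="default",
        image="my-registry/processor:latest",
        # The rendered XCom value -- the full JSON -- ends up inline here:
        arguments=["{{ ti.xcom_pull(task_ids='task_a') }}"],
    )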

Re: Optimal mechanism for loading data into KubernetesPodOperator

2022-03-04 Thread Daniel Standish
Where is the data coming from? Can you refactor your task so that it reads data from cloud storage and pushes it into ES, rather than taking the data as an arg to the task? So instead, your arg is `s3://blah-bucket/blah-file.ndjson.gz` or something. On Fri, Mar 4, 2022 at 10:12 AM Lewis John
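A sketch of that refactor, assuming Task A uploads the payload itself and pushes only the object URI through XCom; the bucket, key, image, and the generate_documents() producer are all hypothetical:

    import json

    from airflow.decorators import task
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    @task
    def task_a():
        data = generate_documents()  # hypothetical producer of the JSON
        S3Hook().load_string(json.dumps(data), key="runs/blah-file.ndjson",
                             bucket_name="blah-bucket")
        # Only this short URI crosses XCom and the pod spec:
        return "s3://blah-bucket/runs/blah-file.ndjson"

    task_b = KubernetesPodOperator(
        task_id="task_b",
        name="load-to-es",
        namespace="default",
        image="my-registry/es-loader:latest",
        # The container fetches the object at this URI and pushes it to ES:
        arguments=["{{ ti.xcom_pull(task_ids='task_a') }}"],
    )

With this shape the pod spec stays a few dozen bytes no matter how large the JSON payload grows.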

Re: Optimal mechanism for loading data into KubernetesPodOperator

2022-03-04 Thread Lewis John McGibbney
For example, this is the message we get (2097152 bytes is 2 MiB, while the rejected message is roughly 11.1 MiB, so the rendered pod spec carrying the inline JSON is more than five times over the limit). HTTP response body: { "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (11652891 vs. 2097152)", "code":
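A quick, illustrative way to check whether a payload is anywhere near that kind of gRPC message-size limit before inlining it into a pod spec:

    import json

    LIMIT = 2097152  # the 2 MiB limit reported in the error above

    payload = {"docs": ["..."]}  # stand-in for Task A's JSON
    size = len(json.dumps(payload).encode("utf-8"))
    print(f"{size} bytes ({size / 2**20:.2f} MiB) "
          f"vs limit {LIMIT} bytes ({LIMIT / 2**20:.0f} MiB)")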