Where is the data coming from? Can you refactor your task so that it reads
data from cloud storage and pushes it into ES, rather than taking the data
itself as an arg to the task? So instead, your arg is
`s3://blah-bucket/blah-file.ndjson.gz` or something.
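
For context, the 2097152 in the error below is 2 MiB -- the default gRPC
message cap in front of etcd, which the whole pod spec (arguments included)
has to squeeze through, so any payload near that size will fail.

Something like this inside the pod image -- a rough sketch, not your actual
code; the bucket, index, and image names are placeholders, and it assumes
boto3 plus the 2.x elasticsearch-py client are installed in the image:

    # loader.py -- runs inside the pod; streams NDJSON from S3, bulk-loads ES
    import gzip
    import json
    import sys

    import boto3
    from elasticsearch import Elasticsearch, helpers

    def main(s3_uri, es_host):
        # "s3://blah-bucket/blah-file.ndjson.gz" -> bucket + key
        bucket, key = s3_uri.replace("s3://", "", 1).split("/", 1)
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
        es = Elasticsearch([es_host])
        with gzip.open(body, mode="rt") as lines:
            # ES 2.x still wants a _type on every document
            actions = (
                {"_index": "blah-index", "_type": "record",
                 "_source": json.loads(line)}
                for line in lines if line.strip()
            )
            helpers.bulk(es, actions, chunk_size=1000)

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])

Then the DAG side only ever ships a URI, never the data:

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    load_es = KubernetesPodOperator(
        task_id="load_es",
        name="load-es",
        image="registry.example.com/es-loader:latest",  # your loader image
        cmds=["python", "loader.py"],
        arguments=["s3://blah-bucket/blah-file.ndjson.gz",
                   "http://es.example.com:9200"],
    )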

On Fri, Mar 4, 2022 at 10:12 AM Lewis John McGibbney <lewi...@apache.org>
wrote:

> For example this is the message we get
>
> HTTP response body:
> {
>     "kind": "Status",
>     "apiVersion": "v1",
>     "metadata": {},
>     "status": "Failure",
>     "message": "rpc error: code = ResourceExhausted desc = grpc: trying to
> send message larger than max (11652891 vs. 2097152)",
>     "code": 500
> }
>
> I know that this indicates we have exceeded a size limit; however, I am
> still curious to hear which architectural approach is 'better'.
>
> Thanks for any assistance.
> lewismc
>
> On 2022/03/04 18:01:30 lewis john mcgibbney wrote:
> > Hi users,
> > We are using the KubernetesPodOperator to isolate some code which writes
> > data into a VERY OLD Elasticsearch 2.X cluster. Please don't make fun of
> > me for this!!!
> > We are wondering, does a recommended practice exist for processing (JSON)
> > data within the KubernetesPodOperator?
> > Currently, we've experimented with passing various volumes of JSON string
> > data to the KubernetesPodOperator 'arguments' parameter. This works for
> > reasonably small record batches, on the order of hundreds, but fails
> > beyond tens of thousands of records.
> > Should we be using a custom XCom backend to pull data into the container
> > rather than push it via 'arguments'?
> > Thank you
> > lewismc
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
>
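
On the custom XCom backend question in the quoted mail: the usual pattern
is to have the backend spill anything large to object storage and keep
only a reference in the Airflow metadata DB. A rough sketch against the
Airflow 2.x BaseXCom API (the exact serialize_value signature varies
across 2.x releases), assuming the Amazon provider is installed, with a
made-up bucket and prefix:

    # s3_xcom_backend.py -- enable via
    # AIRFLOW__CORE__XCOM_BACKEND=s3_xcom_backend.S3XComBackend
    import json
    import uuid

    from airflow.models.xcom import BaseXCom
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    class S3XComBackend(BaseXCom):
        PREFIX = "xcom-s3://"   # marker telling deserialize where to look
        BUCKET = "blah-bucket"  # made-up bucket name

        @staticmethod
        def serialize_value(value, **kwargs):
            # Spill dicts/lists to S3; store only the pointer in the DB.
            if isinstance(value, (dict, list)):
                key = "xcom/{}.json".format(uuid.uuid4())
                S3Hook().load_string(json.dumps(value), key,
                                     bucket_name=S3XComBackend.BUCKET,
                                     replace=True)
                value = S3XComBackend.PREFIX + key
            return BaseXCom.serialize_value(value)

        @staticmethod
        def deserialize_value(result):
            value = BaseXCom.deserialize_value(result)
            if isinstance(value, str) and value.startswith(
                    S3XComBackend.PREFIX):
                key = value[len(S3XComBackend.PREFIX):]
                value = json.loads(
                    S3Hook().read_key(key, bucket_name=S3XComBackend.BUCKET))
            return value

One caveat: if you template an XCom into the pod's 'arguments', the backend
deserializes it first, so the full payload lands right back in the pod spec.
Passing the s3:// URI explicitly, as sketched above, sidesteps the limit
more directly.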
