Hi Robert,
Thanks for the input. I did increase the amount of managed memory, and
confirmed that both SSDs (on each slave) are being used for temp data.
I haven’t been able to figure out why the server CPU usage is low, but I did
notice that it fluctuated from very low (10%) up to 95+%, …

Hi Ken,
Some random ideas that pop up in my head:
- Make sure you use data types that are efficient to serialize and cheap
to compare (ideally use primitive types in TupleN or POJOs).
- Maybe try the Table API batch support (if you have time to experiment).
- Optimize memory usage on the …

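To illustrate the first suggestion: a record type that Flink can treat as a POJO might look like the following sketch. The class and field names (ClickEvent, userId, pageId, seconds, BY_USER) are hypothetical, not from Ken's job. As I understand Flink's POJO rules, the class must be public, have a public no-argument constructor, and expose its fields either publicly or through getters/setters; keeping those fields primitive is what makes them cheap to serialize and compare. The TupleN alternative would be something like Tuple3<Long, Integer, Double>.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type (names are illustrative, not from the thread).
// Laid out to satisfy Flink's POJO rules: public class, public no-arg constructor,
// and public fields (getters/setters would also work).
public class ClickEvent {
    public long userId;   // primitive fields: compact to serialize, cheap to compare
    public int pageId;
    public double seconds;

    // Required so the serializer can create instances before filling in the fields.
    public ClickEvent() {}

    public ClickEvent(long userId, int pageId, double seconds) {
        this.userId = userId;
        this.pageId = pageId;
        this.seconds = seconds;
    }

    // Ordering on a primitive long key is a plain numeric comparison.
    public static final Comparator<ClickEvent> BY_USER =
            Comparator.comparingLong((ClickEvent e) -> e.userId);

    public static void main(String[] args) {
        List<ClickEvent> events = new ArrayList<>();
        events.add(new ClickEvent(42L, 7, 1.5));
        events.add(new ClickEvent(3L, 9, 0.25));
        events.sort(BY_USER);
        System.out.println(events.get(0).userId); // prints 3
    }
}
```

Such a type could then be used as, e.g., a DataSet<ClickEvent> with keys given by field name (for example where("userId") in a CoGroup); treat that wiring as an assumption to check against the Flink version in use.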
Hi all,
I added a CoGroup to my batch job, and it’s now running much slower, primarily
due to back pressure from the CoGroup operator.
I assume this is because the operator has to sort all incoming data and
buffer it to disk. It looks like about 1TB is coming from one side of the
join, and currently very little …
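Ken's guess matches how a sort-based co-group works in general: the group for a key can only be emitted once every record with that key has arrived from both inputs, so unsorted inputs have to be fully buffered and sorted first. Flink's batch CoGroup is sort-based and spills sorted runs to disk when managed memory runs out. The sketch below is a simplified, in-memory illustration of that general strategy, not Flink's implementation; SortMergeCoGroup, Group and coGroup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical names; a simplified in-memory sketch of sort-merge co-grouping, not Flink's operator.
public class SortMergeCoGroup {

    // One co-group: a key plus every record from each input with that key (either side may be empty).
    public static final class Group<L, R> {
        public final long key;
        public final List<L> left = new ArrayList<>();
        public final List<R> right = new ArrayList<>();
        Group(long key) { this.key = key; }
    }

    public static <L, R> List<Group<L, R>> coGroup(
            List<L> leftIn, ToLongFunction<L> leftKey,
            List<R> rightIn, ToLongFunction<R> rightKey) {
        // Step 1: both inputs are fully materialized and sorted before any group is emitted.
        // This is the buffering a real engine spills to disk when the data exceeds memory.
        List<L> left = new ArrayList<>(leftIn);
        List<R> right = new ArrayList<>(rightIn);
        left.sort(Comparator.comparingLong(leftKey));
        right.sort(Comparator.comparingLong(rightKey));

        // Step 2: merge-walk both sorted lists, collecting the run of equal keys on each side.
        List<Group<L, R>> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() || j < right.size()) {
            long next;
            if (i >= left.size()) {
                next = rightKey.applyAsLong(right.get(j));
            } else if (j >= right.size()) {
                next = leftKey.applyAsLong(left.get(i));
            } else {
                next = Math.min(leftKey.applyAsLong(left.get(i)), rightKey.applyAsLong(right.get(j)));
            }
            Group<L, R> g = new Group<>(next);
            while (i < left.size() && leftKey.applyAsLong(left.get(i)) == next) {
                g.left.add(left.get(i++));
            }
            while (j < right.size() && rightKey.applyAsLong(right.get(j)) == next) {
                g.right.add(right.get(j++));
            }
            out.add(g);
        }
        return out;
    }

    public static void main(String[] args) {
        // Records are {key, value} pairs; the "big" side stands in for the ~1TB input.
        List<long[]> big = List.of(new long[]{3, 30}, new long[]{1, 10}, new long[]{3, 31});
        List<long[]> small = List.of(new long[]{3, 300}, new long[]{2, 200});
        for (Group<long[], long[]> g : coGroup(big, r -> r[0], small, r -> r[0])) {
            System.out.println(g.key + ": left=" + g.left.size() + " right=" + g.right.size());
        }
    }
}
```

With the sample data in main it prints one line per key: "1: left=1 right=0", "2: left=0 right=1" and "3: left=2 right=1". With ~1TB on one side, step 1 (the full sort, spilling to disk) would dominate, which would be consistent with the back pressure described above.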