Sharding of Operators

Tripathi,Vikash Wed, 17 Feb 2021 08:09:39 -0800

Hi there,

I would like to understand how keys are re-partitioned across operator 
instances when an operator is scaled up or down, that is, when the job is 
restarted from a savepoint that was taken with a different parallelism for 
that operator.


My main concern is this. If re-distribution maps the same keys to the same 
operator instances as before, there would be no advantage in adding new task 
slots for the operator: if all possible key values have already been seen, the 
new slots would be under-used or not used at all. If, on the other hand, keys 
are distributed evenly (based on the hash function) across the new parallel 
slots as well, won't that cause problems for producing consistent results from 
the operator state restored from the application's previous savepoint?

Does the hash function in the attached snippet guarantee that keys which 
previously landed on a given operator instance will land on the same instance 
again once the job is restarted with a new parallelism configuration?
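
To make the question concrete, here is a minimal sketch of the kind of two-step 
assignment being asked about, assuming keys are first hashed into a fixed 
number of "key groups" (bounded by a maximum parallelism) and key groups are 
then mapped to operator instances in contiguous ranges. The class and method 
names are hypothetical, and the bit-spreading step is only a stand-in for 
whatever hash the runtime really applies; this is not a copy of Flink's 
implementation.

```java
// Hypothetical illustration only; names and the hash step are assumptions.
final class KeyGroupSketch {

    // Step 1: key -> key group. Depends only on the key and maxParallelism,
    // so the result is the same no matter how many instances are running.
    public static int keyGroupFor(Object key, int maxParallelism) {
        int h = key.hashCode();
        h ^= (h >>> 16); // simple bit spreading, stand-in for a stronger hash
        return Math.floorMod(h, maxParallelism);
    }

    // Step 2: key group -> operator instance index. Each instance owns a
    // contiguous range of key groups; the ranges change with parallelism.
    public static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int keyGroup = keyGroupFor(key, maxParallelism);
            System.out.printf(
                "key=%s keyGroup=%d index@parallelism2=%d index@parallelism4=%d%n",
                key,
                keyGroup,
                operatorIndexFor(keyGroup, maxParallelism, 2),
                operatorIndexFor(keyGroup, maxParallelism, 4));
        }
    }
}
```

Under these assumptions a key always falls into the same key group, but the 
instance that owns that key group can change when the parallelism changes. 
Consistent results would then depend on state being stored and restored per 
key group rather than per instance.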

Thanks,
Vikash



