Side Inputs size

augusto . mcc Mon, 08 Apr 2019 10:30:59 -0700

Hi,

In one of my transforms I am using a Map, which is the result of a previous 
transform, as a sideInput. This Map<String, Int> is potentially very large, as 
it holds the count of every word that appeared in all documents.


The step that uses the sideInput is quite slow because it seems to be 
initialising a huge HashMap for every element it processes (I followed this 
example: https://beam.apache.org/documentation/programming-guide/#side-inputs).

Is this the wrong way of using sideInputs? That is, can a sideInput be too big 
to be a sideInput? I also thought about saving the sideInput as a static class 
variable, so that in principle I only have to read it once per "transform" 
initialised in the cluster.
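
To make the static-variable idea concrete, here is a minimal sketch in plain 
Java. It is not Beam API: WordCountCache, LOADS and the loader argument are 
hypothetical names used only for illustration, and the sketch assumes the map 
can be rebuilt by some loader available on each worker. It only shows the 
"build once per JVM, reuse for every element" pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder that builds a large map once per JVM and then reuses it. */
final class WordCountCache {
    // Counts how often the expensive load actually ran (for demonstration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    // volatile so that a fully built map is visible to all threads.
    private static volatile Map<String, Integer> cached;

    private WordCountCache() {}

    /** Returns the cached map, invoking the loader only on the first call. */
    static Map<String, Integer> get(Supplier<Map<String, Integer>> loader) {
        Map<String, Integer> local = cached;
        if (local == null) {
            synchronized (WordCountCache.class) {
                local = cached;
                if (local == null) {
                    local = loader.get();   // expensive step, done once
                    LOADS.incrementAndGet();
                    cached = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Stand-in for the real word counts (assumed data, not from a pipeline).
        Supplier<Map<String, Integer>> loader = () -> {
            Map<String, Integer> m = new HashMap<>();
            m.put("beam", 3);
            m.put("side", 1);
            return m;
        };
        // Simulates per-element processing: every lookup reuses the same map.
        for (String word : new String[] {"beam", "side", "beam"}) {
            System.out.println(word + " -> " + get(loader).get(word));
        }
        System.out.println("loads: " + LOADS.get());
    }
}
```

With this pattern the map is built once per worker JVM rather than once per 
element, but each worker still holds a full copy in memory, and the static 
state lives outside anything the runner manages.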

Am I going about this totally wrong? Should I try other approaches?

Best regards,
Augusto
