Hi,

In one of my transforms I am using a Map, the result of a previous 
transform, as a sideInput. This Map<String, Int> is potentially very large, as 
it holds the counts of all words that appeared in all documents. 

The step that uses the sideInput is quite slow, apparently because it is 
initialising a huge HashMap for every element it processes. (I followed this 
example: https://beam.apache.org/documentation/programming-guide/#side-inputs)

Is this the wrong way of using sideInputs? In other words, can a sideInput 
be too big to be a sideInput? I also thought about saving the sideInput in a 
static class variable, so that in principle I only have to read it once per 
"transform" initialised in the cluster.
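
To make the static-variable idea concrete, here is a minimal sketch of what I 
have in mind. It is plain Java, not Beam API: WordCountCache, get and the 
loader argument are hypothetical names, and I am assuming each worker runs in 
its own JVM, so the map would be built once per worker rather than once per 
element.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: keeps one read-only copy of the word counts per JVM.
final class WordCountCache {
    // volatile so that a fully built map is visible to all threads
    private static volatile Map<String, Integer> cached;
    // Counts how often the expensive load actually ran (for illustration only)
    static final AtomicInteger loads = new AtomicInteger();

    private WordCountCache() {}

    // Returns the cached map, calling the loader only on the first call.
    static Map<String, Integer> get(Supplier<Map<String, Integer>> loader) {
        Map<String, Integer> local = cached;
        if (local == null) {
            synchronized (WordCountCache.class) {
                local = cached;
                if (local == null) {
                    // Copy once and wrap read-only so callers cannot modify it
                    local = Collections.unmodifiableMap(new HashMap<>(loader.get()));
                    loads.incrementAndGet();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

Every element processed after the first would then get the same map instance 
back instead of rebuilding it. The loader is only a placeholder for however 
the map is actually read.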

Am I going about this totally wrong? Should I try other approaches?

Best regards,
Augusto

