What about a StatefulDoFn where you append the value(s) in a bag state as you see them?
If you need to seed the state information, you could do a one time lookup in processElement for each key to HBase if the key hasn't yet been seen (storing the fact that you loaded the data in a boolean) but afterwards you would rely on reading the value(s) from the bag state. processElement(...) { Value newValue = ... Iterable<Value> values; if (!hasSeenKeyBooleanValueState.read()) { values = ... load from HBase ... valuesBagState.append(values); hasSeenKeyBooleanValueState.set(true); } else { values = valuesBagState.read(); } ... perform processing using values ... valuesBagState.append(newValue); } This blog post[1] has a good example. 1: https://beam.apache.org/blog/2017/02/13/stateful-processing.html On Mon, Dec 3, 2018 at 12:48 PM Chandan Biswas <pela.chan...@gmail.com> wrote: > Hello All, > I have a use case where I have PCollection<KV<Key,Value>> data coming from > Kafka source. When processing each record (KV<Key,Value>) I need all old > values for that Key stored in a hbase table. The naive approach is to do > HBase lookup in the DoFn.processElement. I considered sideinput but it' not > going to work because of large dataset. Any suggestion? > > Thanks, > Chandan >