What about a stateful DoFn where you add the value(s) to a bag state as
you see them?

If you need to seed the state, you could do a one-time lookup to HBase in
processElement the first time each key is seen (recording in a boolean
value state that the data has been loaded). After that you would rely on
reading the value(s) from the bag state.

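processElement(...) {
  Value newValue = ...
  Iterable<Value> values;
  // ValueState.read() returns null until the state is first written for this key.
  if (!Boolean.TRUE.equals(hasSeenKeyBooleanValueState.read())) {
    // First element for this key: seed the bag from HBase exactly once.
    values = ... load from HBase ...
    for (Value v : values) {
      valuesBagState.add(v);  // BagState.add takes one element at a time
    }
    hasSeenKeyBooleanValueState.write(true);
  } else {
    values = valuesBagState.read();
  }
  ... perform processing using values ...

  // Store the new value so later elements for this key see it.
  valuesBagState.add(newValue);
}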
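
If it helps to see the control flow end to end, here is a rough plain-Java
simulation of the same seed-once pattern. It is only a sketch and does not
use the Beam API: SeedOnceState, process, lookupCount and the fakeHBase map
are all made-up names, a HashSet stands in for the boolean value state, and
a HashMap of lists stands in for the per-key bag state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Plain-Java simulation of the seed-once pattern; none of these names are Beam API.
class SeedOnceState {
    // Stands in for the per-key boolean value state.
    private final Set<String> seededKeys = new HashSet<>();
    // Stands in for the per-key bag state.
    private final Map<String, List<String>> bagByKey = new HashMap<>();
    // Hypothetical external lookup, e.g. a wrapper around an HBase read.
    private final Function<String, List<String>> externalLookup;
    // Counts external lookups so the once-per-key behavior is observable.
    private int lookupCount = 0;

    SeedOnceState(Function<String, List<String>> externalLookup) {
        this.externalLookup = externalLookup;
    }

    // Returns the values known for the key before newValue, then stores newValue.
    List<String> process(String key, String newValue) {
        List<String> bag = bagByKey.computeIfAbsent(key, k -> new ArrayList<>());
        if (seededKeys.add(key)) {
            // First time this key is seen: seed the bag from the external store.
            lookupCount++;
            bag.addAll(externalLookup.apply(key));
        }
        // Snapshot of the old values; real processing would happen on this.
        List<String> previous = new ArrayList<>(bag);
        bag.add(newValue);
        return previous;
    }

    int lookupCount() {
        return lookupCount;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the HBase table (assumed contents).
        Map<String, List<String>> fakeHBase = new HashMap<>();
        fakeHBase.put("k1", List.of("old1", "old2"));

        SeedOnceState state =
            new SeedOnceState(k -> fakeHBase.getOrDefault(k, List.of()));

        System.out.println(state.process("k1", "new1"));
        System.out.println(state.process("k1", "new2"));
        System.out.println(state.lookupCount());
    }
}
```

The second call for "k1" is served entirely from the in-memory bag, so the
external lookup runs once per key regardless of how many elements arrive.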

This blog post[1] has a good example.

1: https://beam.apache.org/blog/2017/02/13/stateful-processing.html

On Mon, Dec 3, 2018 at 12:48 PM Chandan Biswas <pela.chan...@gmail.com>
wrote:

> Hello All,
> I have a use case where I have PCollection<KV<Key,Value>> data coming from
> Kafka source. When processing each record (KV<Key,Value>) I need all old
> values for that Key stored in an HBase table. The naive approach is to do
> an HBase lookup in DoFn.processElement. I considered a side input, but it's
> not going to work because of the large dataset. Any suggestions?
>
> Thanks,
> Chandan
>
