Hi Carlos, You are quite correct that choosing the keys is important for work to be evenly distributed. The reason you need to have a KvCoder is that state is partitioned per key (to give natural & automatic parallelism) and window (to allow reclaiming expired state so you can process unbounded data with bounded storage, and also more parallelism). To a Beam runner, most data in the pipeline is "just bytes" that it cannot interpret. KvCoder is a special case where a runner knows the binary layout of encoded data so it can pull out the keys in order to shuffle data of the same key to the same place, so that is why it has to be a KvCoder.
Kenn On Mon, Feb 12, 2018 at 5:52 AM, Carlos Alonso <car...@mrcalonso.com> wrote: > I was refactoring my solution a bit and tried to make my stateful > transform to work on simple case classes and I got this exception: > https://pastebin.com/x4xADmvL . I'd like to understand the rationale > behind this as I think carefully choosing the keys would be very important > in order for the work to be properly distributed. > > Thanks! >