To enrich your data, have you looked at the Enrichment transform?
https://cloud.google.com/dataflow/docs/guides/enrichment
This transform is built on top of
https://beam.apache.org/documentation/io/built-in/webapis/

On Fri, Apr 12, 2024 at 4:38 PM Ruben Vargas <ruben.var...@metova.com> wrote:

> On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim <dott...@gmail.com> wrote:
> >
> > Here is an example from a book that I'm reading now and it may be
> > applicable.
> >
> > JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
> > PYTHON - ord(id[0]) % 100
>
> Maybe this is what I'm looking for. I'll give it a try. Thanks!
>
> >
> > On Sat, 13 Apr 2024 at 06:12, George Dekermenjian <ged1...@gmail.com>
> > wrote:
> >>
> >> How about just keeping track of a buffer and flushing it after 100
> >> messages, and also in finish_bundle if anything is still buffered?
> >>
>
> If this is in memory, it could lead to potential loss of data. That is
> why the state is used, or at least that is my understanding. But maybe
> there is a way to do this in the state?
>
> >> On Fri, Apr 12, 2024 at 21.23 Ruben Vargas <ruben.var...@metova.com>
> >> wrote:
> >>>
> >>> Hello guys
> >>>
> >>> Maybe this question was already answered, but I cannot find it and
> >>> want some more input on this topic.
> >>>
> >>> I have some messages that don't have any particular key candidate,
> >>> except the ID, but I don't want to use it because the idea is to
> >>> group multiple IDs in the same batch.
> >>>
> >>> This is my use case:
> >>>
> >>> I have an endpoint where I'm gonna send the message ID, this endpoint
> >>> is gonna return me certain information which I will use to enrich my
> >>> message. In order to avoid fetching the endpoint per message I want to
> >>> batch it in 100 and send the 100 IDs in one request (the endpoint
> >>> supports it). I was thinking of using GroupIntoBatches.
> >>>
> >>> - If I choose the ID as the key, my understanding is that it won't
> >>> work in the way I want (because it will form batches of the same ID).
> >>> - Using a constant will be a problem for parallelism, is that correct?
> >>>
> >>> Then my question is, what should I use as a key? Maybe something
> >>> regarding the timestamp, so I can have groups of messages that arrive
> >>> at a certain second?
> >>>
> >>> Any suggestions would be appreciated
> >>>
> >>> Thanks.
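
For the hashing idea quoted above, here is a rough sketch in plain Python (no Beam dependency) of how a bounded shard key could be derived from the ID and how batches of up to 100 would then form per key. The names `shard_key`, `batch_by_shard`, `NUM_SHARDS` and `BATCH_SIZE` are made up for illustration, and using `zlib.crc32` over the whole ID (rather than the first character, as in the book example) is my own assumption to get a stable, more even spread. The batching function is only a local stand-in for what a keyed batching step would do; it is not how GroupIntoBatches is implemented.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 100  # hypothetical: number of distinct keys, bounds the parallelism
BATCH_SIZE = 100  # from the thread: the endpoint accepts 100 IDs per request


def shard_key(message_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an ID to a stable bucket in [0, num_shards)."""
    # crc32 is deterministic across processes; Python's built-in hash()
    # for str is randomized per process, so it is avoided here.
    return zlib.crc32(message_id.encode("utf-8")) % num_shards


def batch_by_shard(message_ids, num_shards=NUM_SHARDS, batch_size=BATCH_SIZE):
    """Yield (key, [ids]) batches holding at most batch_size IDs each."""
    buffers = defaultdict(list)
    for message_id in message_ids:
        key = shard_key(message_id, num_shards)
        buffers[key].append(message_id)
        if len(buffers[key]) == batch_size:
            # A full batch for this key: emit it and start a fresh buffer.
            yield key, buffers.pop(key)
    # Emit whatever is left so no ID is dropped at the end of the input.
    for key, remaining in buffers.items():
        if remaining:
            yield key, remaining


if __name__ == "__main__":
    ids = [f"msg-{i}" for i in range(1000)]
    for key, batch in batch_by_shard(ids, num_shards=4):
        print(key, len(batch))
```

In a pipeline, the same `shard_key` could be applied in a `beam.Map` that emits `(shard_key(id), message)` pairs ahead of `beam.GroupIntoBatches(100)`. Fewer shards fill batches faster but limit parallelism, so the shard count is something to tune for the actual throughput.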