Hi Jan,
Let me add a few more details to show the full picture. We have live
datastreams (video analysis metadata) and we would like to run both live
and historic pipelines on the metadata (e.g. live alerts, historic video
searches). We planned to use Kafka to store the streaming data and
run both types of queries directly on top of it. Are you suggesting that we
keep Kafka with a short retention to serve the live queries and store the
historic data somewhere else that scales better for historic queries? We
need on-prem options here. Which options should we consider that
scale nicely with Beam in terms of IO parallelization (e.g. HDFS)?
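
To check my understanding of the partition limit you describe, here is a
rough back-of-the-envelope sketch in plain Java (no Beam or Kafka
dependencies). It assumes that each Kafka partition is read by at most one
reader at a time, and that a file-based store can be cut into independent
fixed-size splits. The class name, method names and all numbers (12
partitions, 64 workers, 1 TiB of data, 128 MiB splits) are made up for
illustration and are not taken from our actual setup.

```java
// Hypothetical helper: compares the upper bound on concurrent readers
// for a Kafka-backed source and a file-backed source.
class ReadParallelism {

    // Assumption: one reader per partition at most, so workers beyond
    // the partition count stay idle for this source.
    static int kafkaReaders(int partitions, int workers) {
        return Math.min(partitions, workers);
    }

    // Assumption: the stored data can be cut into independent splits of
    // splitBytes each, and every split can be read by a different worker.
    static int fileReaders(long totalBytes, long splitBytes, int workers) {
        long splits = (totalBytes + splitBytes - 1) / splitBytes; // ceiling division
        return (int) Math.min(splits, (long) workers);
    }

    public static void main(String[] args) {
        int partitions = 12;          // illustrative value
        int workers = 64;             // illustrative value
        long totalBytes = 1L << 40;   // 1 TiB, illustrative value
        long splitBytes = 128L << 20; // 128 MiB, illustrative value

        System.out.println("Kafka readers: " + kafkaReaders(partitions, workers));
        System.out.println("File readers:  " + fileReaders(totalBytes, splitBytes, workers));
    }
}
```

With these made-up numbers the Kafka read is capped by the 12 partitions,
while the file-based read can keep all 64 workers busy. This only models
the reader count, not broker or disk throughput.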
Thank you,
Gyorgy

On Mon, Jul 1, 2024 at 9:21 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi Gyorgy,
>
> I don't think it is possible to co-locate tasks as you describe it. Beam
> has no information about location of 'splits'. On the other hand, if batch
> throughput is the main concern, then reading from Kafka might not be the
> optimal choice. Although Kafka provides tiered storage for offloading
> historical data, it still somewhat limits scalability (and thus
> throughput), because the data have to be read by a broker and only then
> passed to a consumer. The parallelism is therefore limited by the number of
> Kafka partitions and not by the parallelism of the Flink job. A more scalable
> approach could be to persist data from Kafka to a batch storage (e.g. S3 or
> GCS) and reprocess it from there.
>
> Best,
>
>  Jan
> On 6/29/24 09:12, Balogh, György wrote:
>
> Hi,
> I'm planning a distributed system with multiple Kafka brokers co-located
> with Flink workers.
> Data processing throughput for historic queries is a main KPI, so I want
> to make sure all Flink workers read local data and not remote data. I'm
> defining my pipelines in Beam using Java.
> Is this possible? What are the critical config elements to achieve it?
> Thank you,
> Gyorgy
>
> --
>
> György Balogh
> CTO
> E gyorgy.bal...@ultinous.com <zsolt.sala...@ultinous.com>
> M +36 30 270 8342
> A HU, 1117 Budapest, Budafoki út 209.
> W www.ultinous.com
>
>

