Hey Matt,

in general, Flink doesn't put too much work in co-locating sources
(doesn't happen for Kafka, etc. either). I think the only local
assignments happen in the DataSet API for files in HDFS.

Often this is of limited help anyways. Your approach sounds like it
could work, but I would generally not recommend such custom solutions
if you don't really need it. Have you tried running your program with
remote reads? What's the bottleneck for you there?

– Ufuk

On Mon, Apr 24, 2017 at 11:47 AM, Matt <dromitl...@gmail.com> wrote:
> Hi all,
>
> I've been playing around with Apache Ignite and I want to run Flink on top
> of it but there's something I'm not getting.
>
> Ignite has its own support for clustering, and data is distributed on
> different nodes using a partitioned key. Then, we are able to run a closure
> and do some computation on the nodes that owns the data (collocation of
> computation [1]), that way saving time and bandwidth. It all looks good, but
> I'm not sure how it would play with Flink's own clustering capability.
>
> My initial idea -which I haven't tried yet- is to use collocation to run a
> closure where the data resides, and use that closure to execute a Flink
> pipeline locally on that node (running it using a local environment), then
> using a custom made data source I should be able to plug the data from the
> local Ignite cache to the Flink pipeline and back into a cache using an
> Ignite sink.
>
> I'm not sure it's a good idea to disable Flink distribution and running it
> in a local environment so the data is not transferred to another node. I
> think it's the same problem with Kafka, if it partitions the data on
> different nodes, how do you guarantee that Flink jobs are executed where the
> data resides? In case there's no way to guarantee that unless you enable
> local environment, what do you think of that approach (in terms of
> performance)?
>
> Any additional insight regarding stream processing on Ignite or any other
> distributed storage is very welcome!
>
> Best regards,
> Matt
>
> [1] https://apacheignite.readme.io/docs/collocate-compute-and-data

Reply via email to