Hey Matt, in general, Flink doesn't put too much work in co-locating sources (doesn't happen for Kafka, etc. either). I think the only local assignments happen in the DataSet API for files in HDFS.
Often this is of limited help anyways. Your approach sounds like it could work, but I would generally not recommend such custom solutions if you don't really need it. Have you tried running your program with remote reads? What's the bottleneck for you there? – Ufuk On Mon, Apr 24, 2017 at 11:47 AM, Matt <dromitl...@gmail.com> wrote: > Hi all, > > I've been playing around with Apache Ignite and I want to run Flink on top > of it but there's something I'm not getting. > > Ignite has its own support for clustering, and data is distributed on > different nodes using a partitioned key. Then, we are able to run a closure > and do some computation on the nodes that owns the data (collocation of > computation [1]), that way saving time and bandwidth. It all looks good, but > I'm not sure how it would play with Flink's own clustering capability. > > My initial idea -which I haven't tried yet- is to use collocation to run a > closure where the data resides, and use that closure to execute a Flink > pipeline locally on that node (running it using a local environment), then > using a custom made data source I should be able to plug the data from the > local Ignite cache to the Flink pipeline and back into a cache using an > Ignite sink. > > I'm not sure it's a good idea to disable Flink distribution and running it > in a local environment so the data is not transferred to another node. I > think it's the same problem with Kafka, if it partitions the data on > different nodes, how do you guarantee that Flink jobs are executed where the > data resides? In case there's no way to guarantee that unless you enable > local environment, what do you think of that approach (in terms of > performance)? > > Any additional insight regarding stream processing on Ignite or any other > distributed storage is very welcome! > > Best regards, > Matt > > [1] https://apacheignite.readme.io/docs/collocate-compute-and-data