Re: iterative processing

2014-06-18 Thread Stephan Ewen
Yes, I think you can view graphs and sparse matrices very similarly. Flink can do graphs (and sparse matrices) quite well. It has support for "stateful" iterations, which help a lot with graphs. There are also some custom operators that allow you to write code in a similar form as the GAS (gather apply

Re: iterative processing

2014-06-17 Thread Yingjun Wu
> > Till > > > On Tue, Jun 17, 2014 at 3:39 PM, Yingjun Wu > wrote: > > > Hi Stephan, > > > > Thanks for your quick reply. > > > > Let's just consider a simple iterative processing algorithm, say, > pagerank. > > If we wanna compute t

Re: iterative processing

2014-06-17 Thread Yingjun Wu
> you can implement a matrix vector multiplication with Flink. >> >> I guess that the missing vector vector addition is now easy to implement. >> The last thing to do is to use this dataflow as a building block for your >> iteration and then you are done. If I should have miss

Re: iterative processing

2014-06-17 Thread Till Rohrmann
do is to use this dataflow as a building block for your iteration and then you are done. If I should have missed your point, then let me know. Best, Till On Tue, Jun 17, 2014 at 3:39 PM, Yingjun Wu wrote: > Hi Stephan, > > Thanks for your quick reply. > > Let's just conside
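
The dataflow idea in this message (matrix-vector multiplication plus vector-vector addition as building blocks for an iteration) can be sketched in plain Python. This is only an illustration of the join and group-by-sum pattern, not Flink code; the function names `mat_vec` and `vec_add` and the tuple layouts are assumptions made for the example.

```python
from collections import defaultdict


def mat_vec(matrix_entries, vector_entries):
    """Sparse matrix-vector product expressed as join + group-by-sum.

    matrix_entries: iterable of (row, col, value) triples
    vector_entries: iterable of (index, value) pairs
    """
    vec = dict(vector_entries)               # join side, keyed by index
    sums = defaultdict(float)
    for row, col, value in matrix_entries:   # join on col == index
        if col in vec:
            sums[row] += value * vec[col]    # group by row, reduce with sum
    return sorted(sums.items())


def vec_add(a, b):
    """Vector-vector addition: union both inputs, group by index, sum."""
    sums = defaultdict(float)
    for index, value in list(a) + list(b):
        sums[index] += value
    return sorted(sums.items())


# Hypothetical 2x2 example data.
matrix = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0)]
vector = [(0, 1.0), (1, 2.0)]

product = mat_vec(matrix, vector)
total = vec_add(product, [(0, 0.5), (1, 0.5)])
print(product)
print(total)
```

Only the non-zero matrix entries are ever touched, which is why the sparse representation keeps the work proportional to the number of links rather than to the full matrix size.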

Re: iterative processing

2014-06-17 Thread Sebastian Schelter
The internal state of pagerank is just the rank vector, which would be 100k times 1. --sebastian On 17.06.2014 at 17:56, "Yingjun Wu" wrote: > Hi Stephan, > > Thanks for your quick reply. > > Let's just consider a simple iterative processing algorithm, say, pager
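
To illustrate the point about the rank vector, here is a minimal power-iteration sketch in plain Python (not Flink code). The graph is kept as a read-only edge list, and the only state carried from one iteration to the next is a vector with one entry per page. The damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions chosen for the example.

```python
def pagerank(edges, num_pages, damping=0.85, iterations=20):
    """Power-iteration PageRank over an edge list of (src, dst) pairs.

    The edge list is read-only input; the state updated in every
    iteration is only the rank vector of length num_pages.
    """
    out_degree = [0] * num_pages
    for src, _ in edges:
        out_degree[src] += 1

    ranks = [1.0 / num_pages] * num_pages
    for _ in range(iterations):
        new_ranks = [(1.0 - damping) / num_pages] * num_pages
        # Rank held by pages without out-links is spread evenly.
        dangling = sum(r for r, d in zip(ranks, out_degree) if d == 0)
        for i in range(num_pages):
            new_ranks[i] += damping * dangling / num_pages
        # Each page passes its rank along its out-links.
        for src, dst in edges:
            new_ranks[dst] += damping * ranks[src] / out_degree[src]
        ranks = new_ranks
    return ranks


# Hypothetical 3-page graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1.
edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
ranks = pagerank(edges, num_pages=3)
print(ranks)
```

For 100k pages the `ranks` list holds 100k floats, regardless of how many links the graph has.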

Re: iterative processing

2014-06-17 Thread Kostas Tzoumas
> Let's just consider a simple iterative processing algorithm, say, pagerank. > If we wanna compute the pagerank value for 100k webpages, then the internal > state, if represent the graph as matrix, should be at least 100k * 100k * 8 > bytes=74GB, which obviously exceeds the memo

Re: iterative processing

2014-06-17 Thread Yingjun Wu
Hi Stephan, Thanks for your quick reply. Let's just consider a simple iterative processing algorithm, say, pagerank. If we wanna compute the pagerank value for 100k webpages, then the internal state, if we represent the graph as a matrix, should be at least 100k * 100k * 8 bytes=74GB, which obvi
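
The size estimate in this message can be checked with a few lines of Python. The sketch below reproduces the dense-matrix figure and, for comparison, estimates a sparse edge-list representation. The average of 10 links per page and the 24-byte triple layout are purely hypothetical assumptions, not numbers from the thread.

```python
N = 100_000                    # number of web pages, from the message
BYTES_PER_VALUE = 8            # one double-precision value
ASSUMED_LINKS_PER_PAGE = 10    # hypothetical average out-degree

# Dense representation: one value per (page, page) pair.
dense_bytes = N * N * BYTES_PER_VALUE

# Sparse representation: one (src, dst, weight) triple per link,
# assuming two 8-byte indices plus one 8-byte value per triple.
sparse_bytes = N * ASSUMED_LINKS_PER_PAGE * 3 * BYTES_PER_VALUE

GIB = 1024 ** 3
MIB = 1024 ** 2
print(f"dense matrix:     {dense_bytes / GIB:.1f} GiB")
print(f"sparse edge list: {sparse_bytes / MIB:.1f} MiB")
```

The dense figure comes out at about 74.5 GiB, which matches the 74GB quoted in the message when the unit is read as GiB (it is 80 GB in decimal units).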

Re: iterative processing

2014-06-17 Thread Stephan Ewen
Hi Yingjun! Thanks for pointing this out. I would like to learn a bit more about the problem you have, so let me ask you a few questions to make sure I understand the matter in more detail... I assume you are thinking about programs that run a single reduce function in the end, "all reduce", rath

iterative processing

2014-06-17 Thread Yingjun Wu
Dear all, In the general case, iterative processing jobs usually contain one reduce task and multiple parallel processing tasks. In some cases, the state size in the reduce task may exceed the memory size, and it seems that Flink directly goes to out-of-core mode. I am wondering whether it is