Re: broadcast set size

2015-04-09 Thread Stephan Ewen
Hi Martin! You can use a broadcast join for that as well. You use it exactly like the regular join, but you write "joinWithTiny" or "joinWithHuge", depending on whether the data set passed as the argument to the function is the small one or the large one. The broadcast join internally also broadcasts th

Re: broadcast set size

2015-04-09 Thread Alexander Alexandrov
Hi Martin, The answer to your question really depends on the degree of parallelism (DOP) at which you will be running the job and on the expected selectivity (the fraction of lines with that certain ID), in case this does not depend on "the other side" and can be pre-filtered prior to broadcasting. However, since Flink's op

broadcast set size

2015-04-09 Thread Martin Neumann
Hej, Up to what size are broadcast sets a good idea? I have a large dataset (~5 GB) and I'm only interested in lines with a certain ID that I have in a file. The file has ~10k entries. I could either join the dataset with the ID list, or I could broadcast the ID list and do the filtering in a Ma
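
For readers skimming the thread, here is a rough sketch of the second option from the question: keep the small ID list in a hash set, give every parallel task its own copy, and filter each partition of the large data set locally. It uses plain Java collections as a stand-in and does not call any Flink API. The class name, the method names, and the assumption that the ID is the first comma-separated field of a line are all made up for illustration and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a broadcast-set filter; no Flink classes are used.
public class BroadcastFilterSketch {

    // Filters one partition of the large data set against the small ID set.
    public static List<String> filterPartition(List<String> partition, Set<String> ids) {
        List<String> kept = new ArrayList<>();
        for (String line : partition) {
            // Assumption: the ID is the first comma-separated field of the line.
            String id = line.split(",", 2)[0];
            if (ids.contains(id)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Runs the filter over every partition in turn. Each partition gets its own
    // copy of the ID set, mimicking a small set shipped to every parallel task.
    public static List<String> filterAll(List<List<String>> partitions, Set<String> ids) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            Set<String> localCopy = new HashSet<>(ids);
            out.addAll(filterPartition(partition, localCopy));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(List.of("7", "42"));
        List<List<String>> partitions = List.of(
                List.of("7,alpha", "8,beta"),
                List.of("42,gamma", "421,delta"));
        System.out.println(filterAll(partitions, ids));
    }
}
```

In this sketch the small set is copied once per partition, so the total volume shipped grows with the number of parallel tasks, while the large data set never has to be repartitioned. How that compares with a join in a real job depends on the points raised in the replies above.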