A number of the problems I want to work with generate datasets that are too large to hold in memory. This becomes an issue when building a FlatMapFunction, and also when the data used in combineByKey cannot be held in memory.
The following is a simple, if a little silly, example of a FlatMapFunction returning maxMultiples multiples of a long. It works well for maxMultiples = 1000, but what happens if maxMultiples = 10 billion? The issue is that call cannot return a List or any other structure that is held in memory. What can it return, or is there another way to do this?

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.function.FlatMapFunction;

    // Nested inside an enclosing driver class, hence "static".
    public static class GenerateMultiples implements FlatMapFunction<Long, Long> {
        private final long maxMultiples;

        public GenerateMultiples(final long maxMultiples) {
            this.maxMultiples = maxMultiples;
        }

        // Builds the entire result in memory, which is the problem for a large maxMultiples.
        public Iterable<Long> call(Long l) {
            List<Long> holder = new ArrayList<Long>();
            for (long factor = 1; factor <= maxMultiples; factor++) {
                holder.add(l * factor);
            }
            return holder;
        }
    }
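One possible direction, offered only as a sketch: since the `call` in the question is declared to return an `Iterable<Long>`, it may not need to return a materialized collection at all. The class below is a hypothetical helper (`LazyMultiples` is not part of Spark) that computes each multiple on demand, so its own state is just two longs and a counter regardless of maxMultiples. It is plain Java with no Spark dependency, and it emits the factors 1 through maxMultiples inclusive. Whether the rest of a job stays within memory depends on what the downstream stages do with the elements, which this sketch does not address.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not a Spark class): yields base*1, base*2, ..., base*maxMultiples
// one element at a time instead of storing them in a List.
final class LazyMultiples implements Iterable<Long> {
    private final long base;
    private final long maxMultiples;

    LazyMultiples(long base, long maxMultiples) {
        this.base = base;
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long factor = 1; // next factor to emit

            @Override
            public boolean hasNext() {
                return factor <= maxMultiples;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                long value = base * factor; // computed on demand, never stored
                factor++;
                return value;
            }
        };
    }
}
```

Under this assumption, the body of `call` could be reduced to `return new LazyMultiples(l, maxMultiples);`. In Spark versions where `FlatMapFunction.call` is declared to return an `Iterator` rather than an `Iterable`, the equivalent would be `return new LazyMultiples(l, maxMultiples).iterator();`.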