Yes, the problem is that the Java API inadvertently requires an
Iterable return value, not an Iterator:
https://issues.apache.org/jira/browse/SPARK-3369 I think this can't be
fixed until Spark 2.x.

It seems possible to cheat and return a wrapper like the
"IteratorIterable" I posted in the JIRA. You can return an Iterator
instead this way, and as long as Spark happens to consume it only
once, it will work fine. I don't know if this is guaranteed but seems
to be the case anecdotally.
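Roughly, such a wrapper looks like the sketch below (illustrative only; the version posted in the JIRA may differ in details, and the one-shot guard here is my own addition):

```java
import java.util.Arrays;
import java.util.Iterator;

// A minimal one-shot Iterable that wraps an Iterator. This is only safe
// if the consumer (here, Spark) iterates at most once, which seems to be
// the case in practice but is not guaranteed by the API.
public class IteratorIterable<T> implements Iterable<T> {
    private Iterator<T> iterator;

    public IteratorIterable(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    @Override
    public Iterator<T> iterator() {
        if (iterator == null) {
            throw new IllegalStateException("This Iterable can only be consumed once");
        }
        Iterator<T> it = iterator;
        iterator = null;  // hand out the underlying iterator exactly once
        return it;
    }

    public static void main(String[] args) {
        Iterable<Integer> once = new IteratorIterable<>(Arrays.asList(1, 2, 3).iterator());
        long sum = 0;
        for (int i : once) {
            sum += i;
        }
        System.out.println(sum);  // prints 6
    }
}
```

A FlatMapFunction can then build its result as an Iterator and return `new IteratorIterable<>(it)` from call().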

On Thu, Oct 2, 2014 at 2:01 AM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>   A number of the problems I want to work with generate datasets which are
> too large to hold in memory. This becomes an issue when building a
> FlatMapFunction and also when the data used in combineByKey cannot be held
> in memory.
>
>    The following is a simple, if a little silly, example of a
> FlatMapFunction returning maxMultiples multiples of a long. It works well
> for maxMultiples = 1000, but what happens if maxMultiples = 10 billion?
>    The issue is that call cannot return a List or any other structure which
> is held in memory. What can it return, or is there another way to do this?
>
>   public static class GenerateMultiples implements FlatMapFunction<Long, Long> {
>         private final long maxMultiples;
>
>         public GenerateMultiples(final long maxMultiples) {
>             this.maxMultiples = maxMultiples;
>         }
>
>         public Iterable<Long> call(Long l) {
>             List<Long> holder = new ArrayList<Long>();
>             for (long factor = 1; factor < maxMultiples; factor++) {
>                 holder.add(l * factor);
>             }
>             return holder;
>         }
>     }
>
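
For the quoted example above, the same idea means call() can return an Iterable whose iterator() produces multiples on demand instead of filling an ArrayList, so memory use stays constant regardless of maxMultiples. A sketch (class name is mine; untested on an actual cluster):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Lazily generates base * 1, base * 2, ... base * (maxMultiples - 1)
// without ever materializing the sequence in memory.
public class LazyMultiples implements Iterable<Long> {
    private final long base;
    private final long maxMultiples;

    public LazyMultiples(long base, long maxMultiples) {
        this.base = base;
        this.maxMultiples = maxMultiples;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long factor = 1;

            @Override
            public boolean hasNext() {
                return factor < maxMultiples;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return base * factor++;  // compute each multiple on demand
            }
        };
    }

    public static void main(String[] args) {
        long sum = 0;
        for (long v : new LazyMultiples(3, 5)) {
            sum += v;  // 3 + 6 + 9 + 12
        }
        System.out.println(sum);  // prints 30
    }
}
```

Inside the FlatMapFunction, call(Long l) would then simply `return new LazyMultiples(l, maxMultiples);`. This only holds up as long as Spark consumes the Iterable once, as noted above.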

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
