Well, you can do another map to turn Optional<String> into String: in the cases where the Optional is empty, you can store e.g. "NULL" as the value of the RDD element.
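The two-step mapping described above can be sketched Spark-free with plain java.util streams (in Spark the same lambdas would go inside two chained map calls); the "NULL" sentinel and the odd-length test are illustrative choices, not part of any Spark API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class SentinelDemo {
    // First map: wrap each element in an Optional (empty = "dropped").
    // Second map: unwrap, substituting the sentinel string "NULL" for empties,
    // so the result is a plain List<String> with no real nulls in it.
    static List<String> mapWithSentinel(List<String> words) {
        return words.stream()
                .map(s -> s.length() % 2 == 1 ? Optional.<String>empty() : Optional.of(s))
                .map(opt -> opt.orElse("NULL")) // sentinel instead of a real null
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapWithSentinel(Arrays.asList("ab", "abc", "abcd")));
        // [ab, NULL, abcd]
    }
}
```

The obvious downside, as noted, is that downstream code must know to treat the sentinel specially.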
If this is not acceptable (based on the objectives of your architecture), and IF returning plain null instead of Optional does throw a Spark exception, THEN as far as I am concerned: checkmate.

From: Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Sunday, April 19, 2015 8:16 PM
To: Evo Eftimov
Cc: Olivier Girardot; user@spark.apache.org
Subject: Re: Can a map function return null

So you imagine something like this:

    JavaRDD<String> words = ...
    JavaRDD<Optional<String>> wordsFiltered = words.map(new Function<String, Optional<String>>() {
        @Override
        public Optional<String> call(String s) throws Exception {
            if (s.length() % 2 == 1) // drop strings of odd length
                return Optional.empty();
            else
                return Optional.of(s);
        }
    });

That seems to return the wrong type: a JavaRDD<Optional<String>>, which cannot be used as the JavaRDD<String> that the next step expects.

On Sun, Apr 19, 2015 at 12:17 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

I am on the move at the moment, so I can't try it immediately, but from previous memory/experience I think that if you return plain null you will get a Spark exception. Anyway, you can try it, see what happens, and then ask the question. If you do get an exception, try Optional instead of plain null.

Sent from Samsung Mobile

-------- Original message --------
From: Olivier Girardot
Date: 2015/04/18 22:04 (GMT+00:00)
To: Steve Lewis, user@spark.apache.org
Subject: Re: Can a map function return null

You can return an RDD with null values inside, and afterwards filter on "item != null". In Scala (or even in Java 8) you'd rather use Option/Optional, and in Scala they're directly usable from Spark. Example:

    sc.parallelize(1 to 1000).flatMap(item => if (item % 2 == 0) Some(item) else None).collect()
    res0: Array[Int] = Array(2, 4, 6, ...)

Regards,
Olivier.

On Sat, 18 Apr
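Olivier's Scala one-liner has a direct Java 8 analogue. Sketched Spark-free with java.util streams (in Spark's Java API the flatMap function would return an Iterable rather than a Stream, but the 0-or-1-element idea is the same):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FlatMapDemo {
    // Keep even numbers by emitting a 0-or-1 element stream per input element,
    // mirroring flatMap(item => if (item % 2 == 0) Some(item) else None).
    static List<Integer> evens(int n) {
        return IntStream.rangeClosed(1, n).boxed()
                .flatMap(i -> i % 2 == 0 ? Stream.of(i) : Stream.<Integer>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(10)); // [2, 4, 6, 8, 10]
    }
}
```

Because flatMap flattens whatever each element produces, an empty stream (or empty Iterable, in Spark) simply contributes nothing, with no nulls or sentinels involved.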
2015 at 20:44, Steve Lewis <lordjoe2...@gmail.com> wrote:

I find a number of cases where I have a JavaRDD and I wish to transform the data and, depending on a test, return 0 or 1 items (don't suggest a filter; the real case is more complex). So I currently do something like the following: perform a flatMap returning a list with 0 or 1 entries depending on the isUsed function.

    JavaRDD<Foo> original = ...
    JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
        @Override
        public Iterable<Foo> call(final Foo s) throws Exception {
            List<Foo> ret = new ArrayList<Foo>();
            if (isUsed(s))
                ret.add(transform(s));
            return ret; // contains 0 items if isUsed is false
        }
    });

My question is: can I do a map returning the transformed data, and null if nothing is to be returned, as shown below? What does Spark do with a map function returning null?

    JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
        @Override
        public Foo call(final Foo s) throws Exception {
            if (isUsed(s))
                return transform(s);
            return null; // not used - what happens now?
        }
    });

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
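The pattern Olivier suggested earlier in the thread (map to null, then filter on "item != null") can be sketched Spark-free with java.util streams; isUsed and transform here are hypothetical stand-ins for Steve's functions, with illustrative bodies:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class MapNullFilterDemo {
    // Hypothetical stand-ins for the isUsed / transform functions in the question.
    static boolean isUsed(String s) { return s.length() % 2 == 0; }
    static String transform(String s) { return s.toUpperCase(); }

    // Step 1: map each element to its transform, or to null if it is dropped.
    // Step 2: a second pass filters the nulls back out.
    static List<String> mapThenFilter(List<String> input) {
        return input.stream()
                .map(s -> isUsed(s) ? transform(s) : null) // null marks "dropped"
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapThenFilter(Arrays.asList("ab", "abc", "abcd")));
        // [AB, ABCD]
    }
}
```

Note that this needs two passes over the data, whereas the flatMap version in the original question drops elements in a single step, which is why the thread keeps circling back to flatMap.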