[ https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-19287.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22690
[https://github.com/apache/spark/pull/22690]

> JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-19287
>                 URL: https://issues.apache.org/jira/browse/SPARK-19287
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> SPARK-3369 corrected an old oversight in the Java API, wherein
> {{FlatMapFunction}} required an {{Iterable}} rather than {{Iterator}}. As
> reported by [~akrim], it seems that this same type of problem was overlooked
> also in {{JavaPairRDD}}
> (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677):
> {code}
> def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
> {code}
> As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on
> {{TraversableOnce}}, this should really take a function that returns an
> {{Iterator}} -- really, {{FlatMapFunction}}.
> We can easily add an overload and deprecate the existing method.
> {code}
> def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
> {code}
> This is source- and binary-backwards-compatible, in Java 7. It's
> binary-backwards-compatible in Java 8, but not source-compatible. The
> following natural usage with Java 8 lambdas becomes ambiguous and won't
> compile -- Java won't figure out which to implement even based on the return
> type unfortunately:
> {code}
> JavaPairRDD<Integer, String> pairRDD = ...
> JavaPairRDD<Integer, Integer> mappedRDD =
>   pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
> {code}
> It can be resolved by explicitly casting the lambda.
> We can at least document this. One day in Spark 3.x this can just be changed
> outright.
> It's conceivable to resolve this by making the new method called
> "flatMapValues2" or something ugly.
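
The overload ambiguity and the cast workaround described in the issue can be sketched without Spark on the classpath. The following is only an illustration under stated assumptions: `IterableFn`, `IteratorFn`, `FlatMapValuesOverloads`, and `lengths` are hypothetical stand-ins invented here, not Spark types, and the two `flatMapValues` overloads operate on a plain `List` rather than a `JavaPairRDD`. Unlike Spark's functional interfaces, the stand-ins do not declare checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FlatMapValuesOverloads {

    // Hypothetical stand-in for a function returning an Iterable (the old signature).
    interface IterableFn<V, U> {
        Iterable<U> call(V value);
    }

    // Hypothetical stand-in for a function returning an Iterator (the proposed signature).
    interface IteratorFn<V, U> {
        Iterator<U> call(V value);
    }

    // Overload mirroring the existing Iterable-based method.
    static <V, U> List<U> flatMapValues(List<V> values, IterableFn<V, U> f) {
        List<U> out = new ArrayList<>();
        for (V v : values) {
            for (U u : f.call(v)) {
                out.add(u);
            }
        }
        return out;
    }

    // Overload mirroring the proposed Iterator-based method.
    static <V, U> List<U> flatMapValues(List<V> values, IteratorFn<V, U> f) {
        List<U> out = new ArrayList<>();
        for (V v : values) {
            Iterator<U> it = f.call(v);
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // Per the issue, the uncast form is expected to be rejected as ambiguous:
        //   flatMapValues(words, s -> Arrays.asList(s.length()).iterator());
        // Casting the lambda names the target interface, so only one overload applies.
        return flatMapValues(words,
                (IteratorFn<String, Integer>) s -> Arrays.asList(s.length()).iterator());
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("a", "bb", "ccc"))); // prints [1, 2, 3]
    }
}
```

With the real API, the equivalent workaround would presumably cast the lambda to the Spark interface that the intended overload accepts, as the issue text suggests.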