On 5/12/2016 10:01 PM, Holden Karau wrote:
> This is not the expected behavior, can you maybe post the code where you are running into this?

Hello, thanks for replying! Below is the function I took out of the code.

    def converter(rdd: RDD[(String, JsValue)], param: String): RDD[(String, Int)] = {
      // I am breaking this down for future readability and ease of optimization.
      // As a first attempt at solving this problem, I am concerned with accuracy
      // rather than performance or prettiness ;)

      // r1 will be an RDD containing only the "param" method of selection
      val r1: RDD[(String, JsValue)] = rdd.filter(x =>
        (x._2 \ "field1" \ "field2").as[String].replace("\"", "") == param.replace("\"", ""))

      // r2 will be an RDD of lists of fields (A1-Z1) with associated counts;
      // remapFields returns a List[(String, Int)]
      val r2: RDD[List[(String, Int)]] = r1.map(x => remapFields(x._2 \ "extra"))

      // r3 will be flattened to enable grouping
      val r3: RDD[(String, Int)] = r2.flatMap(x => x)

      // now we can group by entity
      val r4: RDD[(String, Iterable[(String, Int)])] = r3.groupBy(x => x._1)

      // and produce a mapping of entity -> count pairs
      val r5: RDD[(String, Int)] = r4.map(x => (x._1, x._2.map(y => y._2).sum))

      // return the result
      r5
    }

If I call the above function and collectAsMap on the returned RDD, I get an empty Map(). If I copy/paste this code into the caller, I get the properly filled-in Map. I am fairly new to Spark and Scala, so excuse any inefficiencies - my priority was to solve the problem in an obvious and correct way and worry about making it pretty later. Thanks!

On Thursday, May 12, 2016, Dood@ODDO <oddodao...@gmail.com> wrote:
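For what it's worth, the flatten/group/sum portion of the function (the r3 -> r5 steps) can be sketched on plain Scala collections, without a SparkContext, which makes it easy to check the logic in isolation. This is just an illustration with made-up entity names, not the actual data:

    // Sketch of the r3 -> r5 steps on plain Scala collections instead of RDDs.
    // The entity names and counts here are invented for illustration.

    // stand-in for r3: flattened (entity, count) pairs
    val r3 = List(("e1", 1), ("e2", 2), ("e1", 3))

    // stand-in for r4: group by entity
    val r4: Map[String, List[(String, Int)]] = r3.groupBy(_._1)

    // stand-in for r5: sum the counts for each entity
    val r5: Map[String, Int] = r4.map { case (k, vs) => (k, vs.map(_._2).sum) }

    println(r5.toList.sorted) // List((e1,4), (e2,2))

On actual RDDs, the groupBy-then-sum pair can also be collapsed into r3.reduceByKey(_ + _), which combines values map-side before shuffling rather than moving whole iterables around, though that is an optimization rather than a fix for the empty-Map problem.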