Hi,
How can I write to multiple outputs, one for each key? I tried creating a
custom partitioner and setting the number of partitions, but neither helps.
Only a few tasks/partitions (as many as the number of distinct key
combinations) receive large datasets; the data is not spread across all
tasks/partitions. The job failed because those few tasks had to handle far
too much data. Below is my code snippet.
val varWFlatRDD = varWRDD
  .map(FlatMapUtilClass().flatKeyFromWrf)   // keys are (zone, z, year, month)
  .groupByKey()
  .foreach { x =>
    val z = x._1._1
    val year = x._1._2
    val month = x._1._3
    // Collect this key's records into a DataFrame and register it
    val df_table_4dim = x._2.toList.toDF()
    df_table_4dim.registerTempTable("table_4Dim")
    // Write the whole group into the matching Hive partition
    hiveContext.sql("INSERT OVERWRITE TABLE 4dim PARTITION " +
      "(zone=" + ZONE + ", z=" + z + ", year=" + year + ", month=" + month + ") " +
      "SELECT date, hh, x, y, height, u, v, w, ph, phb, t, p, pb, " +
      "qvapor, qgraup, qnice, qnrain, tke_pbl, el_pbl FROM table_4Dim")
  }
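For reference, the custom partitioner I tried looks roughly like this (a
simplified sketch; the class name and the partition count are illustrative):

import org.apache.spark.Partitioner

class KeyTuplePartitioner(override val numPartitions: Int) extends Partitioner {
  // Spread the composite keys over numPartitions by hashing the whole tuple
  def getPartition(key: Any): Int = {
    val h = key.hashCode() % numPartitions
    if (h < 0) h + numPartitions else h   // keep the index non-negative
  }
}

// used as: ....groupByKey(new KeyTuplePartitioner(1000))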
In the Spark history UI, the groupByKey stage shows > 1000 tasks (equal to
the parent's number of partitions). The foreach stage also shows > 1000
tasks, but only 50 of them (the same as the number of distinct key
combinations) actually receive data.
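One way to see the skew directly is to count records per key, e.g.
(illustrative check only):

val keyCounts = varWRDD.map(FlatMapUtilClass().flatKeyFromWrf).keys.countByValue()
keyCounts.toSeq.sortBy(-_._2).take(10).foreach(println)   // largest keys first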
How can I fix this problem? Any suggestions are appreciated.
BR,
Patcharee