Hi Angel, how about defining this:
k.filter(k("WT_ID")

as a val? I think you can avoid that repetition. And don't forget to use System.nanoTime to measure the gain...

Alonso Isidoro Roman.

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming must be the process of putting them in..." - Edsger Dijkstra
"If you pay peanuts you get monkeys"

2016-04-19 9:46 GMT+02:00 Angel Angel <areyouange...@gmail.com>:
> Hello,
>
> I am writing a Spark application. It runs correctly but takes a long time to execute. Can anyone help me optimize my query to increase the processing speed?
>
> The application has to build histograms and compare them in order to find the final candidate.
>
> My code reads a text file, matches its first field against the table, subtracts the second field from the matched candidates, and updates the table.
>
> Here is my code; please help me optimize it.
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._ // desc, max
>
> // Tokens alternate: Address, time, Address, time, ... (450 pairs)
> val Array_Ele = sc.textFile("/root/Desktop/database_200/patch_time_All_20_modified_1.txt").flatMap(line => line.split(" ")).take(900)
>
> val df1 = sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database_modified_No_name_400.parquet")
>
> // First pair, shifted and projected like the others so that unionAll lines up
> var k = df1.filter(df1("Address").equalTo(Array_Ele(0))).select(df1("Address"), (df1("Couple_time") - Array_Ele(1)).as("Couple_time"), df1("WT_ID"))
>
> // Remaining pairs: keep rows whose Address matches and subtract the paired time
> for (a <- 2 until 900 by 2) {
>   k = k.unionAll(df1.filter(df1("Address").equalTo(Array_Ele(a))).select(df1("Address"), (df1("Couple_time") - Array_Ele(a + 1)).as("Couple_time"), df1("WT_ID")))
> }
>
> k.cache()
>
> // WT_IDs ordered by number of matches; temp(0) to temp(11) are used below
> val WT_ID_Sort = k.groupBy("WT_ID").count().sort(desc("count"))
> val temp = WT_ID_Sort.select("WT_ID").rdd.map(r => r(0)).take(12)
>
> // Tallest Couple_time histogram bin for each selected WT_ID
> val Table0 = k.filter(k("WT_ID").equalTo(temp(0))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table1 = k.filter(k("WT_ID").equalTo(temp(1))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table2 = k.filter(k("WT_ID").equalTo(temp(2))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table3 = k.filter(k("WT_ID").equalTo(temp(3))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table4 = k.filter(k("WT_ID").equalTo(temp(4))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table5 = k.filter(k("WT_ID").equalTo(temp(5))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table6 = k.filter(k("WT_ID").equalTo(temp(6))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table7 = k.filter(k("WT_ID").equalTo(temp(7))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table8 = k.filter(k("WT_ID").equalTo(temp(8))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table10 = k.filter(k("WT_ID").equalTo(temp(10))).groupBy("Couple_time").count().select(max($"count")).show()
> val Table11 = k.filter(k("WT_ID").equalTo(temp(11))).groupBy("Couple_time").count().select(max($"count")).show()
>
> And one last question: how can I compare all these tables to find the maximum value?
>
> Thanks,
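Alonso's System.nanoTime suggestion can be wrapped in a small helper so that the old and new versions of the query are timed the same way. This is only a sketch: `Timing.timed` is a hypothetical helper, not part of Spark or the Scala library.

```scala
// Sketch of a timing wrapper around System.nanoTime.
// `Timing.timed` is a hypothetical helper, not a Spark or Scala library API.
object Timing {
  /** Evaluates `block` once, prints how long it took, and returns its result. */
  def timed[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    val result = block // the by-name block runs here, exactly once
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label%s took $elapsedMs%.1f ms")
    result
  }
}
```

For example, `Timing.timed("unionAll version") { k.count() }`. DataFrame transformations are lazy, so wrap an action such as `count()` or `show()`; wrapping only `filter` or `groupBy` measures building the plan, not running it. It is also worth running each version more than once, because the first action after `k.cache()` also fills the cache.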
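A different technique from the val/nanoTime suggestion, offered as a hedged sketch: the unionAll loop builds one filter branch per (Address, time) pair, so the plan has about 450 branches, each reading df1. One common alternative is a single join against a small table of the pairs; with the DataFrame API used in the question that would look roughly like `df1.join(sc.parallelize(pairs).toDF("Address", "offset"), "Address")` followed by a select that subtracts `offset` from `Couple_time`. The code below shows the same idea on plain Scala collections so it runs without a cluster. `Record`, `ShiftJoin`, `pairs` and `shifted` are hypothetical names, and it assumes the times are whole numbers.

```scala
// Plain-Scala sketch of replacing the unionAll loop with one hash join.
// `Record`, `ShiftJoin`, `pairs` and `shifted` are hypothetical names;
// times are assumed to be whole numbers (Long).
final case class Record(address: String, coupleTime: Long, wtId: String)

object ShiftJoin {
  /** Turns the token list "addr1 t1 addr2 t2 ..." into (address, time) pairs.
    * A trailing unpaired token is ignored. */
  def pairs(tokens: Seq[String]): Seq[(String, Long)] =
    tokens.grouped(2).collect { case Seq(addr, t) => addr -> t.toLong }.toSeq

  /** Every table row whose address appears in `ps`, with that pair's time
    * subtracted from coupleTime (one output row per matching pair).
    * The pairs are indexed once, so each row costs one map lookup instead
    * of the table being scanned once per pair. */
  def shifted(table: Seq[Record], ps: Seq[(String, Long)]): Seq[Record] = {
    val timesByAddress: Map[String, Seq[Long]] =
      ps.groupBy(_._1).map { case (addr, xs) => addr -> xs.map(_._2) }
    for {
      row  <- table
      time <- timesByAddress.getOrElse(row.address, Seq.empty[Long])
    } yield row.copy(coupleTime = row.coupleTime - time)
  }
}
```

As in the union, an address that appears in several pairs yields one shifted row per pair.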
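On the last question: each `TableN` holds the result of `.show()`, which returns `Unit`, so nothing is left to compare. One way around it is to keep the per-WT_ID maxima as data and take the maximum of those. With the DataFrame API that is roughly `k.groupBy("WT_ID", "Couple_time").count().groupBy("WT_ID").agg(max("count").as("maxCount"))`, which computes every WT_ID's tallest bin in a single query rather than one query per table. The sketch below shows the same logic on plain Scala collections so it can run without Spark; `Match`, `HistogramMax`, `tallestBins` and `bestCandidate` are hypothetical names.

```scala
// Plain-Scala sketch of computing every histogram maximum in one pass and
// then comparing them. `Match`, `HistogramMax`, `tallestBins` and
// `bestCandidate` are hypothetical names.
final case class Match(address: String, coupleTime: Long, wtId: String)

object HistogramMax {
  /** For each of the `topN` WT_IDs with the most matches, the height of the
    * tallest bin in its coupleTime histogram (what each TableN printed).
    * Ties in the top-N ranking are broken arbitrarily. */
  def tallestBins(rows: Seq[Match], topN: Int): Map[String, Int] = {
    val topIds: Set[String] = rows.groupBy(_.wtId).toSeq
      .sortBy { case (_, rs) => -rs.size }
      .take(topN)
      .map(_._1)
      .toSet
    rows.filter(r => topIds.contains(r.wtId))
      .groupBy(_.wtId)
      .map { case (id, rs) => id -> rs.groupBy(_.coupleTime).values.map(_.size).max }
  }

  /** The WT_ID whose tallest bin is highest: the comparison across all tables. */
  def bestCandidate(bins: Map[String, Int]): Option[(String, Int)] =
    if (bins.isEmpty) None else Some(bins.maxBy(_._2))
}
```

Returning an `Option` keeps the empty case (no matches at all) explicit instead of throwing.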