I am new to Scala. I have a dataset with many columns, each with a column name. Given several column names (these are not fixed; they are generated dynamically), I need to sum up the values of those columns. Is there an efficient way to do this?
I worked out a way using a for loop, but I don't think it is efficient:

    val AllLabels = List("ID", "val1", "val2", "val3", "val4")
    val lbla = List("val1", "val3", "val4")
    val index_lbla = lbla.map(x => AllLabels.indexOf(x))

    val dataRDD = sc.textFile("../test.csv").map(_.split(","))

    dataRDD.map(x => {
      var sum = 0.0
      for (i <- index_lbla)        // iterate over the selected column indices;
        sum = sum + x(i).toDouble  // (1 to index_lbla.length) would read the wrong columns
      sum
    }).collect

The test.csv looks like below (without column names):

    "ID", "val1", "val2", "val3", "val4"
    A, 123, 523, 534, 893
    B, 536, 98, 1623, 98472
    C, 537, 89, 83640, 9265
    D, 7297, 98364, 9, 735
    ...

Your help is very much appreciated!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-sum-up-the-values-in-the-columns-of-a-dataset-in-Scala-tp21639.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
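For reference, the per-row summation can be written without a mutable accumulator by mapping over the selected indices and calling `sum`. Below is a minimal plain-Scala sketch of that idea (no SparkContext; the `SumColumns` object name and the sample rows are illustrative, not from any real file). Inside Spark, the same `rowSum` logic would go in the `dataRDD.map(...)` call.

```scala
// Sketch: sum only the dynamically selected columns of each row.
object SumColumns {
  val AllLabels = List("ID", "val1", "val2", "val3", "val4")
  val lbla = List("val1", "val3", "val4")

  // Positions of the requested columns within a split row.
  val indexLbla: List[Int] = lbla.map(AllLabels.indexOf)

  // Sum the selected fields of one split CSV row.
  def rowSum(fields: Array[String]): Double =
    indexLbla.map(i => fields(i).trim.toDouble).sum

  def main(args: Array[String]): Unit = {
    // Illustrative rows matching the shape of the question's test.csv.
    val lines = List("A, 123, 523, 534, 893", "B, 536, 98, 1623, 98472")
    val sums = lines.map(_.split(",")).map(rowSum)
    println(sums) // List(1550.0, 100631.0)
  }
}
```

This avoids the `var sum` loop entirely and makes the selected-column intent explicit; the work per row is the same, so any efficiency gain is in clarity rather than runtime.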