I have two Parquet files, each with the format (name, age, town). I read them and then join them to get all the names that appear in both towns. The resulting dataset is:

res4: Array[org.apache.spark.sql.Row] = Array([name1, age1, town1, name2, age2, town2], ...)

name1 and name2 are the same, since that is what I am joining on. Now I want to reduce this to the format (name, age1, age2), but I can't work out how to manipulate an org.apache.spark.sql.Row. Trying something like

map(_.split(",")).map(a => (a(0), a(1).trim().toInt))

does not work. Can you suggest a way?

Thanks,
-R
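
For reference, here is a minimal sketch of the shape of result being asked for. It assumes the age columns are stored as Int and that the fields sit at the positions shown in res4; the `joined` array and its values are made up for illustration. A Row is not a String, which is likely why `split(",")` fails; its fields can instead be read by position or by pattern matching.

```scala
import org.apache.spark.sql.Row

// Hypothetical stand-in for the joined result (res4); the values are invented.
val joined: Array[Row] = Array(
  Row("alice", 30, "townA", "alice", 31, "townB")
)

// Option 1: read fields by position with the typed getters.
// Assumes column 0 is a String and columns 1 and 4 are Ints.
val byIndex: Array[(String, Int, Int)] =
  joined.map(r => (r.getString(0), r.getInt(1), r.getInt(4)))

// Option 2: pattern match on the Row and ignore the unused columns.
// A row that does not fit this shape would throw a MatchError.
val byPattern: Array[(String, Int, Int)] = joined.map {
  case Row(name: String, age1: Int, _, _, age2: Int, _) => (name, age1, age2)
}
```

If the ages are actually stored as Long or String in the Parquet schema, the getters and type patterns would need to change to match.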