Hi Mich,

A few comments:
When doing .filter(_ > "") you're doing a lexicographic comparison rather than explicitly filtering for empty lines. It happens to be true for every non-empty string, but _.nonEmpty or _.length > 0 states the intent directly. More to the point, filtering with _.contains("ASE15") should be sufficient on its own: a line that contains "ASE15" cannot be empty, so the first filter can be omitted.

As for line => line.split("\t").split(","): your first split() call returns an Array[String], so the second call fails to compile, e.g.

val lines: Array[String] = line.split("\t")
lines.split(",") // Compilation error - no method split() exists for Array

You either have to split each resulting piece again, or (since split() takes a regex as input) split on both delimiters at once with a character class, "[\t,]". Note that .split("\t,") would only match a tab immediately followed by a comma. So either go with

map(_.split("\t").flatMap(_.split(",")))

or

map(_.split("[\t,]"))

Hope that helps.

Eliran Bivas
Data Team | iguaz.io

On 3 Apr 2016, at 13:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi,

I am not sure this is the correct approach.

Read a text file in:

val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt")

Now I want to get rid of empty lines and filter only the lines that contain "ASE15":

f.filter(_ > "").filter(_ contains("ASE15"))

The above works, but I am not sure whether I need the two filter transformations above. Can it be done in one?
Now I want to take the lines from the above filter and split them on tab ("\t"):

f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t")))
res88: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[131] at map at <console>:30

Now I want to split the output by ",":

scala> f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t").split(",")))
<console>:30: error: value split is not a member of Array[String]
f.filter(_ > "").filter(_ contains("ASE15")).map(line => (line.split("\t").split(",")))
^

Any advice will be appreciated.

Thanks

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
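Pulling the suggestions in this thread together, here is a minimal sketch. To keep it self-contained it uses a plain Scala Seq[String] in place of the RDD returned by sc.textFile (filter, map and flatMap are applied per element the same way on an RDD). The object name Ase15Lines, its method names and the sample lines are hypothetical, made up for illustration.

```scala
// Sketch using plain Scala collections in place of an RDD: filter, map and
// flatMap are applied element by element the same way on an RDD[String].
object Ase15Lines {

  // Hypothetical sample input standing in for the lines of the text file.
  val sample: Seq[String] = Seq(
    "",
    "ASE15\tupgrade,step1",
    "no match here",
    "ASE15 notes\tpart a,part b"
  )

  // One filter is enough: a line that contains "ASE15" cannot be empty.
  def matching(lines: Seq[String]): Seq[String] =
    lines.filter(_.contains("ASE15"))

  // Option 1: split on tab, then split each piece on comma and flatten,
  // so each line yields a single Array[String].
  def splitTwoStep(line: String): Array[String] =
    line.split("\t").flatMap(_.split(","))

  // Option 2: one split with a character class matching either delimiter.
  // By contrast, "\t," would only match a tab immediately followed by a comma.
  def splitRegex(line: String): Array[String] =
    line.split("[\t,]")

  def main(args: Array[String]): Unit = {
    val rows = matching(sample).map(splitTwoStep)
    rows.foreach(r => println(r.mkString(" | ")))
  }
}
```

On the RDD itself, the corresponding call would be f.filter(_.contains("ASE15")).map(_.split("\t").flatMap(_.split(","))). The two-step form keeps each delimiter's split visible; the character-class regex splits on either delimiter in a single call.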