Hello, I am using Spark with Scala and I am attempting to understand the different filtering and mapping capabilities available. I haven't found an example of the specific task I would like to do.
I am trying to read in a tab-separated text file and filter specific entries. I would like the filter to be applied to particular "columns" rather than to whole lines. I am using the following to split the data, but my attempts to filter by "column" afterwards are not working.

-----------------------------
val data = sc.textFile("test_data.txt")
var parsedData = data.map( _.split("\t").map(_.toString))
------------------------------

To give a more concrete example of my goal, suppose the data file is:

A1 A2 A3 A4
B1 B2 A3 A4
C1 A2 C2 C3

How would I filter the data on the second column so that only the entries with A2 in column two are returned? The resulting RDD would then be just:

A1 A2 A3 A4
C1 A2 C2 C3

Is there a convenient way to do this? Any suggestions or assistance would be appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Manipulation-in-Scala-tp2285.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
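
For reference, one possible way to filter on a column is sketched below. It is not a confirmed answer from the list: it assumes each line splits into an Array[String] and that the second column is index 1 (zero-based). The object name FilterByColumn, the helper columnEquals, and the local[*] master are hypothetical choices made only for this illustration; inside spark-shell the existing sc could be used instead of creating a new context.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FilterByColumn {
  // Hypothetical helper: true only when the row has a column at `index`
  // and that column equals `wanted`. Short rows are dropped, not an error.
  def columnEquals(row: Array[String], index: Int, wanted: String): Boolean =
    row.length > index && row(index) == wanted

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying the example on one machine.
    val conf = new SparkConf().setAppName("FilterByColumn").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.textFile("test_data.txt")

      // split already returns Array[String], so no extra map(_.toString) is needed.
      val parsedData = data.map(_.split("\t"))

      // Keep only rows whose second column (index 1) is "A2".
      val filtered = parsedData.filter(row => columnEquals(row, 1, "A2"))

      // Join the columns back together with tabs for display.
      filtered.map(_.mkString("\t")).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The filter runs on the RDD of arrays, so every column stays available afterwards. The length check is a defensive choice for lines with fewer columns than expected.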