Re: Splitting columns from a text file

2016-09-05 Thread Gourav Sengupta
Just use Spark CSV; all other ways of splitting and working with the columns are just reinventing the wheel and an enormous waste of time. Regards, Gourav
On Mon, Sep 5, 2016 at 1:48 PM, Ashok Kumar wrote: > Hi, > > I have a text file as below that I read in > >

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
sc.textFile("filename").map(_.split(",")).filter(arr => arr.length == 3 && arr(2).toDouble > 50).collect
This will give you an Array[Array[String]]; do with it as you wish. And please read up on RDDs.
On 5 Sep 2016 8:51 pm, "Ashok Kumar" wrote: > Thanks everyone. > >
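One caveat with the one-liner above: arr(2).toDouble throws on a value that is not a number. A minimal sketch of a more defensive filter follows, assuming a plain local Seq of lines as a stand-in for the RDD (the map and filter calls have the same shape on an RDD). The object name FilterThirdColumn, the helper keepAbove and every sample row except the first are made up for illustration.

```scala
import scala.util.Try

object FilterThirdColumn {
  // Stand-in for the lines of the text file. Only the first row comes from
  // the thread; the remaining rows are hypothetical test data.
  val lines: Seq[String] = Seq(
    "74,20160905-133143,98.11218069128827594148",
    "75,20160905-133144,12.5",          // hypothetical: below the threshold
    "76,20160905-133145,not-a-number",  // hypothetical: malformed value
    "77,20160905-133146"                // hypothetical: missing third column
  )

  // Keep rows that have exactly three fields and whose third field parses
  // to a Double greater than the threshold. Unparseable values are dropped
  // instead of throwing NumberFormatException.
  def keepAbove(rows: Seq[String], threshold: Double): Seq[Array[String]] =
    rows
      .map(_.split(","))
      .filter { arr =>
        arr.length == 3 && Try(arr(2).toDouble).toOption.exists(_ > threshold)
      }
}
```

Calling FilterThirdColumn.keepAbove(FilterThirdColumn.lines, 50.0) keeps only the row whose first field is 74. Whether silently dropping malformed rows is acceptable depends on the data; counting or logging them may be preferable.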

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Thanks everyone. I am not as skilled as you gentlemen. This is what I did:
1) Read the text file: val textFile = sc.textFile("/tmp/myfile.txt")
2) That produces an RDD of String.
3) Create a DF after splitting the file into an Array: val df = textFile.map(line =>

Re: Splitting columns from a text file

2016-09-05 Thread ayan guha
Then you need to refer to the third element in the array, convert it to your desired data type, and then use filter.
On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar wrote: > Hi, > I want to filter them for values. > > This is what is in array > >
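The three steps in this reply (take the third element, convert it, filter) can be sketched as below. This is only an illustration on a local Seq, not code from the thread; the object name TypedColumns, the helpers parse and above, and the reading of the columns as (id, timestamp, value) are assumptions.

```scala
object TypedColumns {
  // Parse one comma-separated line into (id, timestamp, value).
  // Assumes the line is well formed: toInt and toDouble throw otherwise,
  // and cols(2) throws if there are fewer than three fields.
  def parse(line: String): (Int, String, Double) = {
    val cols = line.split(",")
    (cols(0).toInt, cols(1), cols(2).toDouble)
  }

  // Convert every line first, then filter on the typed third column.
  def above(lines: Seq[String], threshold: Double): Seq[(Int, String, Double)] =
    lines.map(parse).filter { case (_, _, value) => value > threshold }
}
```

Converting before filtering means later steps work with a Double rather than re-parsing the string each time.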

Re: Splitting columns from a text file

2016-09-05 Thread Fridtjof Sander
Ask yourself how to access the third element in an array in Scala.
On 05.09.2016 at 16:14, Ashok Kumar wrote: Hi, I want to filter them for values. This is what is in the array: 74,20160905-133143,98.11218069128827594148 I want to filter anything > 50.0 in the third column. Thanks On

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Hi, I want to filter them for values. This is what is in the array: 74,20160905-133143,98.11218069128827594148 I want to filter anything > 50.0 in the third column. Thanks
On Monday, 5 September 2016, 15:07, ayan guha wrote: Hi x.split returns an array. So, after first

Re: Splitting columns from a text file

2016-09-05 Thread ayan guha
Hi, x.split returns an array. So, after the first map, you will get an RDD of arrays. What is your expected outcome of the 2nd map?
On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar wrote: > Thank you sir. > > This is what I get > > scala> textFile.map(x=> x.split(",")) > res52:
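Since the array comes from String.split, one detail is worth knowing before relying on its length: split(",") drops trailing empty fields, while split(",", -1) keeps them. A small sketch, where the object name SplitBehaviour and the rows with an empty third column are made up for illustration:

```scala
object SplitBehaviour {
  // A complete row (taken from the thread) yields three elements.
  val full: Array[String] =
    "74,20160905-133143,98.11218069128827594148".split(",")

  // Hypothetical row with an empty third column: the trailing empty
  // string is discarded, so only two elements remain.
  val trailingDropped: Array[String] = "74,20160905-133143,".split(",")

  // A negative limit keeps trailing empty strings, giving three elements.
  val trailingKept: Array[String] = "74,20160905-133143,".split(",", -1)
}
```

A length check such as arr.length == 3 therefore treats a row with an empty last column as short unless the negative limit is used.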

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
Please have a look at the documentation for information on how to work with RDDs. Start with this: http://spark.apache.org/docs/latest/quick-start.html
On 5 Sep 2016 7:00 pm, "Ashok Kumar" wrote: > Thank you sir. > > This is what I get > > scala> textFile.map(x=>

Re: Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Thank you sir. This is what I get:
scala> textFile.map(x=> x.split(","))
res52: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[27] at map at :27
How can I work on individual columns? I understand they are strings.
scala> textFile.map(x=> x.split(",")).map(x => (x.getString(0))     |

Re: Splitting columns from a text file

2016-09-05 Thread Somasundaram Sekar
Basic error: transformations like map give you back an RDD.
sc.textFile("filename").map(x => x.split(","))
On 5 Sep 2016 6:19 pm, "Ashok Kumar" wrote: > Hi, > > I have a text file as below that I read in > > 74,20160905-133143,98.11218069128827594148 >

Splitting columns from a text file

2016-09-05 Thread Ashok Kumar
Hi, I have a text file as below that I read in