Re: How to flatten a row in PySpark

2017-10-13 Thread Debabrata Ghosh
Thanks, Ayan and Nicholas, for your lightning-fast replies! I appreciate it a lot.

Cheers,
Debu

On Fri, Oct 13, 2017 at 9:27 AM, ayan guha wrote:
> Quick pyspark code:
>
> >>> s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
> >>> base = sc.parallelize([s.split("|")])
> >>> base.take(10)

Re: How to flatten a row in PySpark

2017-10-12 Thread ayan guha
Quick pyspark code:

>>> s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
>>> base = sc.parallelize([s.split("|")])
>>> base.take(10)
[['ABZ', 'ABZ', 'AF', '2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y', '1,2,3,4,5', '730']]
>>> def pv(t):
...     x = t[3].split(",")
...     y = t[4].spli
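
The `pv` function in the snippet above is cut off in the archive, so its remaining lines are not known. As a rough illustration only, the per-row logic could look something like the plain-Python sketch below. The name `flatten_row` and the use of `itertools.product` are assumptions for this example, not the original poster's code. It pairs every value of the 4th field with every value of the 5th field and keeps the other fields unchanged.

```python
from itertools import product


def flatten_row(fields):
    """Expand one split row into one row per (4th value, 5th value) pair.

    `fields` is the list produced by line.split("|"). The name and shape of
    this helper are hypothetical; they are not taken from the original post.
    """
    head, tail = fields[:3], fields[5:]
    fourth_values = fields[3].split(",")
    fifth_values = fields[4].split(",")
    # Cartesian product: every 4th-column value with every 5th-column value.
    return [head + [a, b] + tail for a, b in product(fourth_values, fifth_values)]


s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
rows = flatten_row(s.split("|"))
print(len(rows))          # 90 (18 values x 5 values)
print("|".join(rows[0]))  # ABZ|ABZ|AF|2|1|730
```

Because the helper returns a list of rows, it could presumably be applied to the RDD above with `base.flatMap(flatten_row)`, which would yield one RDD element per output row.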

Re: How to flatten a row in PySpark

2017-10-12 Thread Nicholas Hakobian
Using explode on the 4th column, followed by an explode on the 5th column, would produce what you want (you might need to use split on the columns first if they are not already arrays).

Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com

On Thu,
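
A minimal sketch of the split-then-explode approach described above, using the DataFrame API. The function name `flatten_with_explode` and the column names `raw` and `c1` through `c6` are made up for this example, and it assumes each input line has exactly six pipe-delimited fields. The Spark import sits inside the function so that the definition loads even where Spark is not installed.

```python
SAMPLE = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"


def flatten_with_explode(spark, lines):
    """Return a DataFrame with one row per (4th value, 5th value) pair.

    `spark` is an active SparkSession; `lines` is a list of raw
    pipe-delimited strings. Column names c1..c6 are hypothetical.
    """
    from pyspark.sql import functions as F

    raw = spark.createDataFrame([(line,) for line in lines], ["raw"])
    # split() takes a regular expression, so the pipe must be escaped.
    parts = F.split(F.col("raw"), r"\|")
    df = raw.select(*[parts.getItem(i).alias("c%d" % (i + 1)) for i in range(6)])
    # Turn the comma-separated strings into arrays, then explode each in turn.
    # The second explode runs on every row produced by the first, which
    # gives the full cross product of the two columns.
    return (
        df.withColumn("c4", F.explode(F.split("c4", ",")))
          .withColumn("c5", F.explode(F.split("c5", ",")))
    )
```

With an active `SparkSession` named `spark`, calling `flatten_with_explode(spark, [SAMPLE]).show()` should list rows such as `ABZ, ABZ, AF, 2, 1, 730`. For the sample line, the result has 90 rows (18 values in the 4th column times 5 values in the 5th).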

How to flatten a row in PySpark

2017-10-12 Thread Debabrata Ghosh
Hi,

Greetings! I have data in the format of the following row:

ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730

I want to convert it into several rows in the format below:

ABZ|ABZ|AF|2|1|730
ABZ|ABZ|AF|3+1|730
. . .
ABZ|ABZ|AF|3|1|730
ABZ|ABZ|AF|3|2|730
ABZ|ABZ|AF|3|3|