Thanks Ayan and Nicholas for your very quick replies! Appreciate it a lot.
Cheers,
Debu
On Fri, Oct 13, 2017 at 9:27 AM, ayan guha wrote:
Quick pyspark code:
>>> s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
>>> base = sc.parallelize([s.split("|")])
>>> base.take(10)
[['ABZ', 'ABZ', 'AF', '2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y', '1,2,3,4,5', '730']]
>>> def pv(t):
...     x = t[3].split(",")
...     y = t[4].split(",")
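The snippet above is cut off after splitting the 4th and 5th fields. A minimal sketch of one way to finish the idea, assuming the goal is one output row per (4th value, 5th value) pair: the function name `expand_row` is hypothetical, and `itertools.product` is swapped in here to build the pairs. It is plain Python, so it can be tested locally and then passed to `base.flatMap(expand_row)` on the RDD from the thread.

```python
from itertools import product


def expand_row(fields):
    """Expand one pipe-split record into one record per (col4, col5) pair.

    Hypothetical helper: assumes fields[3] and fields[4] are comma-separated
    lists and every other field is copied through unchanged.
    """
    col4_values = fields[3].split(",")
    col5_values = fields[4].split(",")
    # product() iterates col5 fastest: 2|1, 2|2, ..., 2|5, 3|1, ...
    return [
        fields[:3] + [a, b] + fields[5:]
        for a, b in product(col4_values, col5_values)
    ]


s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
rows = expand_row(s.split("|"))
for r in rows[:3]:
    print("|".join(r))
```

With 18 values in the 4th field and 5 in the 5th, this yields 90 rows. In Spark, `base.flatMap(expand_row)` applies the same function to every record in the RDD.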
Using explode on the 4th column, followed by an explode on the 5th column
would produce what you want (you might need to use split on the columns
first if they are not already an array).
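To show why two successive explodes give the full set of combinations, here is a small plain-Python emulation (the `explode` helper below is hypothetical and only mimics the semantics; it is not Spark's implementation). In the DataFrame API the real functions are `pyspark.sql.functions.split` and `explode`, used along the lines of `df.withColumn("c4", explode(split("c4", ",")))`, where the column name `c4` is an assumption.

```python
def explode(rows, idx):
    """Mimic a DataFrame explode on column `idx`: one output row per element.

    The column at `idx` must already be a list (an "array"), which is why
    the comma-separated strings are split first.
    """
    return [row[:idx] + [v] + row[idx + 1:] for row in rows for v in row[idx]]


s = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
row = s.split("|")
# Equivalent of split(): turn the 4th and 5th columns into arrays
row[3] = row[3].split(",")
row[4] = row[4].split(",")

# Explode the 4th column (18 rows), then the 5th (18 * 5 = 90 rows)
result = explode(explode([row], 3), 4)
print(len(result))          # 90
print("|".join(result[0]))  # ABZ|ABZ|AF|2|1|730
```

One caveat: this emulation preserves row order, but Spark does not guarantee the order of output rows across partitions.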
Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com
On Thu,
Hi,
Greetings!
I have data in the format of the following row:
ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730
I want to convert it into several rows in the format below:
ABZ|ABZ|AF|2|1|730
ABZ|ABZ|AF|2|2|730
.
.
.
ABZ|ABZ|AF|3|1|730
ABZ|ABZ|AF|3|2|730
ABZ|ABZ|AF|3|3|