My use case: I have records of the form prim_id,secondary_id,value
There are around a million ids, but only 5 secondary ids, and any secondary id is optional for a given id.

For example, say the secondary ids are [alpha,beta,gamma,delta,kappa] and the input is:

    1,alpha,20
    1,beta,22
    1,gamma,25
    2,alpha,1
    2,delta,15
    3,kappa,90

What I want is the following output, one row per id with one column per secondary id, and 0 where a secondary id is missing:

    1,20,22,25,0,0   # since delta and kappa are not present
    2,1,0,0,15,0
    3,0,0,0,0,90

So basically I want to flatten (pivot) it. How do I do this in PySpark? Thanks.