Hi,

You can try a FULL OUTER JOIN on rank, provided the number of distinct ids is small.
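A rough sketch of that join in Pig Latin for two of the ids is given here. The input path, the tab delimiter, the output path and the relation names are all assumptions, and the rank column is loaded as rnk to avoid clashing with the RANK operator in newer Pig releases.

```pig
-- Assumed: tab-separated input at a hypothetical path; rank is loaded as rnk.
data = LOAD 'input.tsv' USING PigStorage('\t') AS (id:int, rnk:int, value:int);

-- One relation per id (only practical when there are few distinct ids).
a = FILTER data BY id == 12324;
b = FILTER data BY id == 12325;

-- FULL OUTER keeps ranks that exist on only one side; the other side is null.
j = JOIN a BY rnk FULL OUTER, b BY rnk;

-- Take the rank from whichever side is present so the rows can be ordered.
merged = FOREACH j GENERATE
    (a::rnk IS NOT NULL ? a::rnk : b::rnk) AS rnk,
    a::value AS value_12324,
    b::value AS value_12325;

sorted = ORDER merged BY rnk;

-- Hypothetical output path.
STORE sorted INTO 'output' USING PigStorage('\t');
```

A rank missing for one id comes out as a null on that side, which PigStorage writes as an empty field. The header row with the two ids is not produced by this sketch and would have to be added separately.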
For example, all rows that have an id of 12324 will be in one relation, and all rows that have an id of 12325 will be in another relation. A FULL OUTER JOIN on those two relations would give what you need.

Thanks,
Kannappan

On Nov 7, 2013, at 4:08 PM, Siddhi Borkar <[email protected]> wrote:

> Hi,
>
> Please ignore my previous mail.
> We have the following sample data which has to be transformed into a output
> format using pig script
>
> Id      rank    Value
> 12324   1       1582
> 12324   2       1142
> 12324   4       1292
> 12324   5       1134
> 12325   1       1582
> 12325   2       1142
> 12325   3       1292
> 12325   4       1134
> 12325   5       1183
> 12326   1       1582
> 12326   2       1142
> 12326   3       1292
> 12326   4       1134
> 12326   5       1183
>
> We need to compare the values (of the value column) per rank for each id.
> The output needs to be generated in the following format
>
> Id1            Id2
> value_rank1    value_rank1
> value_rank2    value_rank2
> value_rank3    value_rank3
> ...            ......
> value_rankn    value_rankn
>
> For e.g.
>
> 12324   12325
> 1582    1582
> 1142    1142
>         1292
> 1292    1134
> 1134    1183
>
> There has to be a blank value for any missing rank for a particular id.
>
> Is there any way to achieve this?
>
> Thanks,
> Siddhi
