Hi, I have a question about self-joining a relation in Pig. I have a set of
integer pairs, where each pair describes a connection from the first
integer to the second. For example:

1,2
3,4
5,6
5,7
6,8

I then load my data as follows, and group it:

data = load 'data.csv' using PigStorage(',') as (integer_1:int, integer_2:int);
grouped = group data by integer_1;

grouped_numbers = foreach grouped generate group as node,
data.integer_2 as connection;

This yields a relation that pairs each first integer with a bag of its
first-degree connections:

(1,{(2)})
(3,{(4)})
(5,{(6),(7)})
(6,{(8)})

I would then like to self-join grouped_numbers so that each first integer
is listed with both its first- and second-degree connections. In this case,
that would be:

(1,{(2)})
(3,{(4)})
(5,{(6),(7),(8)})
(6,{(8)})

because 5 is connected to 6, which is connected to 8, so 8 is a
second-degree connection of 5. Is there a way to implement this in Pig?
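
For reference, here is a rough, untested sketch of one approach I can
imagine, in case it helps frame the question. It assumes the file is
comma-separated and loads it under two aliases (data_a and data_b, names I
made up for this example) so that the edge list can be joined to itself.
The two-hop pairs are then unioned back into the original pairs before
grouping. I am not sure this is the idiomatic way to do it:

```pig
-- Load the same edge list twice so it can be joined to itself.
data_a = load 'data.csv' using PigStorage(',') as (integer_1:int, integer_2:int);
data_b = load 'data.csv' using PigStorage(',') as (integer_1:int, integer_2:int);

-- Two-hop pairs: a -> b in data_a and b -> c in data_b give a -> c.
joined = join data_a by integer_2, data_b by integer_1;
second_degree = foreach joined generate
    data_a::integer_1 as integer_1,
    data_b::integer_2 as integer_2;

-- Combine first- and second-degree pairs and drop duplicates.
all_pairs = union data_a, second_degree;
unique_pairs = distinct all_pairs;

-- Group as before: one row per node with a bag of its connections.
grouped_all = group unique_pairs by integer_1;
result = foreach grouped_all generate group as node,
    unique_pairs.integer_2 as connection;

dump result;
```

If this works the way I expect, 5 would end up with 6, 7 and 8 in its bag,
though I would not rely on the order of tuples inside each bag.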


Best,


Rowan
