Hi, I have a question about self-joining a relation in Pig. I have a set of
integer pairs, where each pair describes a connection from the first
integer to the second. For example:
1,2
3,4
5,6
5,7
6,8
I then load my data as follows, and group it:
data = load 'data.csv' using PigStorage(',') as (integer_1:int, integer_2:int);
grouped = group data by integer_1;
grouped_numbers = foreach grouped generate group as node,
data.integer_2 as connection;
This yields a relation with each first integer and a bag of its first-degree
connections:
(1,{(2)})
(3,{(4)})
(5,{(6),(7)})
(6,{(8)})
I would then like to do a self-join of the grouped_numbers relation, so that
each first integer is paired with both its first- and second-degree
connections. In this case, that would be:
(1,{(2)})
(3,{(4)})
(5,{(6),(7),(8)})
(6,{(8)})
because 5 is connected to 6, which is connected to 8, so 8 is a
second-degree connection of 5. Is there a way to implement this in Pig?
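For reference, one possible approach is sketched below. It is untested and
rests on a few assumptions: the file is comma-delimited, it is loaded twice
under two aliases so the self-join has unambiguous field names, and the
join is done on the flat pairs rather than on the grouped bags. The names
edges_a, edges_b, hops, first_degree, second_degree, all_pairs,
unique_pairs, by_node and result are made up for illustration.
-- load the same pairs twice so the self-join has two distinct aliases
edges_a = load 'data.csv' using PigStorage(',') as (src:int, dst:int);
edges_b = load 'data.csv' using PigStorage(',') as (src:int, dst:int);
-- match each pair (a, b) with every pair (b, c) to find two-step paths
hops = join edges_a by dst, edges_b by src;
second_degree = foreach hops generate edges_a::src as node,
edges_b::dst as connection;
-- keep the original pairs as the first-degree connections
first_degree = foreach edges_a generate src as node, dst as connection;
-- combine both sets, drop duplicates, and collect one bag per node
all_pairs = union first_degree, second_degree;
unique_pairs = distinct all_pairs;
by_node = group unique_pairs by node;
result = foreach by_node generate group as node,
unique_pairs.connection as connections;
If this works as intended, result should hold one tuple per first integer
with first- and second-degree connections merged in a single bag, though
the order of tuples inside each bag is not guaranteed.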
Best,
Rowan