Joins in Hadoop

Ravikant Dindokar Tue, 23 Jun 2015 23:36:01 -0700

Hi Hadoop users,

I want to use Hadoop to perform operations on graph data.
I have two files:


1. Edge list file
        This file contains one line for each edge in the graph.
Sample:
1    2 (here 1 is the source and 2 is the sink node of the edge)
1    5
2    3
4    2
4    3
5    6
5    4
5    7
7    8
8    9
8    10

2. Partition file
         This file contains one line for each vertex. Each line has two
values: the first number is the <vertex id> and the second is the <partition id>.
Sample: <vertex id>  <partition id>
2    1
3    1
4    1
5    2
6    2
7    2
8    1
9    1
10    1


The edge list file is 32 GB in size, while the partition file is 10 GB.
(The files are so large that map/reduce can read only the partition file. I
have a 20-node cluster with 24 GB of memory per node.)

My aim is to get all vertices (along with their adjacency lists) that
have the same partition id into one reducer, so that I can perform further
analytics on a given partition in the reducer.
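
To make the desired result concrete, here is a minimal in-memory Python
sketch (not Hadoop code) of the grouping on the sample data above. The names
EDGES, PARTITIONS and group_by_partition are hypothetical, and the sketch
assumes that the adjacency list of a vertex means its outgoing edges. It only
illustrates what each reducer should receive and is not meant to scale to the
real file sizes.

```python
from collections import defaultdict

# Sample data copied from the two files above.
EDGES = [(1, 2), (1, 5), (2, 3), (4, 2), (4, 3), (5, 6),
         (5, 4), (5, 7), (7, 8), (8, 9), (8, 10)]
PARTITIONS = {2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1}


def group_by_partition(edges, partitions):
    """Return {partition id: {vertex id: [sink ids]}}."""
    # Step 1: build the adjacency list (outgoing edges) of every source vertex.
    adjacency = defaultdict(list)
    for source, sink in edges:
        adjacency[source].append(sink)

    # Step 2: join on vertex id and regroup by partition id.
    # Vertices missing from the partition sample (vertex 1 here) are dropped;
    # vertices without outgoing edges get an empty adjacency list.
    grouped = defaultdict(dict)
    for vertex, partition in partitions.items():
        grouped[partition][vertex] = adjacency.get(vertex, [])
    return dict(grouped)


if __name__ == "__main__":
    for partition, vertices in sorted(group_by_partition(EDGES, PARTITIONS).items()):
        print(partition, vertices)
```

For the sample, partition 1 would hold vertices 2, 3, 4, 8, 9 and 10 with
their adjacency lists, and partition 2 would hold vertices 5, 6 and 7.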

Is there any way in Hadoop to join these two files in the mapper, so
that I can map based on the partition id?

Thanks
Ravikant
