Spark clustering question

goi cto Thu, 13 Feb 2014 04:43:16 -0800

Hi,

I have the following input file:


Tx ID, Dest Node ID, Original Tx ID, Amount

For every line that has an Original Tx ID, there is another line whose Tx ID
matches it. If we think of transactions as edges going into nodes, then every
edge going out of a node was preceded by an edge going into that node.

*Sample Data*:
Tx1, node A, null, 100
Tx2, node B, Tx1, 50
Tx3, node C, Tx1, 50
Tx4, node B, null, 100
Tx5, node C, Tx4, 75
Tx6, node B, Tx4, 25
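Before any joining, each line has to be parsed into a record. Here is a minimal sketch that assumes comma-separated fields, the literal string `null` for a missing Original Tx ID, and the `node ` prefix seen in the sample. `Tx` and `parse_line` are hypothetical names. In Spark, the same function could be applied to every line with `sc.textFile(path).map(parse_line)`.

```python
from typing import NamedTuple, Optional

# Hypothetical record type for one input line; field names are illustrative.
class Tx(NamedTuple):
    tx_id: str
    dest: str               # dest node, with the "node " prefix removed
    orig_tx: Optional[str]  # None when the column holds "null"
    amount: int

def parse_line(line: str) -> Tx:
    """Parse a line such as 'Tx2, node B, Tx1, 50' into a Tx record."""
    tx_id, dest, orig, amount = (field.strip() for field in line.split(","))
    # The sample writes nodes as "node A"; keep only the node name.
    if dest.startswith("node "):
        dest = dest[len("node "):]
    return Tx(tx_id, dest, None if orig == "null" else orig, int(amount))

# The sample data from above, as raw text (stray spaces are stripped by parse_line).
SAMPLE = """\
Tx1, node A, null , 100
Tx2, node B, Tx1, 50
Tx3, node C, Tx1, 50
Tx4, node B, null, 100
Tx5, node C, Tx4, 75
Tx6, node B, Tx4, 25
"""

records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
```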

I want to build a Spark program that builds a file with the following
structure:

Source Node, Tx ID (edge), Dest Node, Amount

*Sample Data:*
ROOT, Tx1, A, 100
A, Tx2, B, 50
A, Tx3, C, 50
ROOT, Tx4, B, 100
B, Tx5, C, 75
B, Tx6, B, 25

The logic that needs to be implemented is:
for each node N:
    for each row whose Original Tx ID is a Tx with N as its dest node:
        write: N, Row.TxID, Row.Node, Row.Amount
rows with a null Original Tx ID are written with ROOT as the source node
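At its core this is a self-join on Tx ID: each row is keyed by its Original Tx ID and matched against rows keyed by their own Tx ID, and the parent's dest node becomes the child's source node. The sketch below is a minimal version in plain Python, not Spark, so it runs without a cluster. `join` is a hypothetical local stand-in that returns the same `(key, (left, right))` shape as PySpark's `RDD.join`. On pair RDDs, each list comprehension would roughly become a `map` or `filter`, `roots + edges` would become `union`, and the result could be written with `saveAsTextFile`.

```python
from collections import defaultdict

# (Tx ID, dest node, Original Tx ID or None, amount), copied from the sample data.
ROWS = [
    ("Tx1", "A", None, 100),
    ("Tx2", "B", "Tx1", 50),
    ("Tx3", "C", "Tx1", 50),
    ("Tx4", "B", None, 100),
    ("Tx5", "C", "Tx4", 75),
    ("Tx6", "B", "Tx4", 25),
]

def join(left, right):
    """Inner join of (key, value) pair lists, returning (key, (left_value, right_value)).
    Hypothetical local stand-in that mimics the output shape of RDD.join."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]

# Key every transaction by its own Tx ID -> the node it went into.
dest_by_tx = [(tx, dest) for tx, dest, _, _ in ROWS]
# Key every child transaction by the Tx it came from.
children = [(orig, (tx, dest, amt)) for tx, dest, orig, amt in ROWS if orig is not None]

# The parent's dest node is the child's source node.
edges = [(src, tx, dest, amt) for _, ((tx, dest, amt), src) in join(children, dest_by_tx)]
# Transactions with no Original Tx ID start at ROOT.
roots = [("ROOT", tx, dest, amt) for tx, dest, orig, amt in ROWS if orig is None]

# Sorted by Tx ID only so the printout lines up with the sample output.
result = sorted(roots + edges, key=lambda e: e[1])
for edge in result:
    print(", ".join(str(field) for field in edge))
```

An inner join silently drops a row whose Original Tx ID has no matching Tx ID. If such orphans can occur, `leftOuterJoin` would keep them so they can be flagged.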

*Any idea how I can do this using Spark?*

-- 
Eran | CTO
