Let's say I have transaction data and visit data:

visit
| userId | Visit source | Timestamp |
| A      | google ads   | 1         |
| A      | facebook ads | 2         |

transaction
| userId | total price | timestamp |
| A      | 100         | 248384    |
| B      | 200         | 43298739  |

I want to join the transaction data with the visit data to do sales
attribution, and I want to do it in real time (streaming), whenever a
transaction occurs.

Is it scalable to join a single incoming record against a very large
historical dataset using the join function in Spark? If not, how is this
usually done?

The visit data needs to be historical, since a visit can happen at any
time before the transaction (e.g. a visit one year before the
transaction occurs).
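
To make the intended result concrete, here is a minimal sketch in plain
Python (not Spark) of the lookup I have in mind. It assumes "last-touch"
attribution, i.e. each transaction is credited to the user's most recent
visit before it; that rule, and the helper names build_visit_index and
attribute, are only assumptions for illustration. It says nothing about
how this should scale.

```python
from bisect import bisect_left
from collections import defaultdict

# Sample rows from the tables above: (userId, visit source, timestamp)
visits = [("A", "google ads", 1), ("A", "facebook ads", 2)]
# (userId, total price, timestamp)
transactions = [("A", 100, 248384), ("B", 200, 43298739)]


def build_visit_index(visits):
    """Group visits by user, sorted by timestamp (hypothetical helper)."""
    index = defaultdict(list)
    for user_id, source, ts in visits:
        index[user_id].append((ts, source))
    for user_visits in index.values():
        user_visits.sort()
    return index


def attribute(transaction, index):
    """Attach the latest visit source strictly before the transaction.

    Assumes last-touch attribution; source is None if the user has no
    earlier visit.
    """
    user_id, total_price, ts = transaction
    user_visits = index.get(user_id, [])
    # Position of the first visit whose timestamp is >= the transaction's.
    pos = bisect_left(user_visits, (ts,))
    source = user_visits[pos - 1][1] if pos > 0 else None
    return (user_id, total_price, ts, source)


index = build_visit_index(visits)
attributed = [attribute(t, index) for t in transactions]
```

With the sample data, user A's transaction would be credited to
"facebook ads", and user B's would have no source because B has no
visits. What I am asking is how to do the equivalent of this lookup at
scale, for each transaction as it arrives.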

Rendy
