Since you are doing a self-join, you don't actually need to use the Trident
join or the multireducer on which it is based. You could group the stream
on your join key, then write an aggregator that collects all the tuples in
each group and emits the cross product at the end of each batch (or in a
Hi all,
Taking the Storm distributed processing framework as a reference, we are
facing the following two problems:
(1) Find the optimal batch-processing timeout based on traffic load, with a
provision to set the timeout at runtime in a Storm cluster.
(2) Compute the optimal degree of parallelism for
Is it possible to join a Trident stream with itself?
My particular use case is that I want to take the cross product of all the
incoming tuples for a batch and then only keep the joined tuples containing
a known value.
I believe the SQL for what I am trying to accomplish is:
SELECT * FROM table