Hi,

I'm wondering whether it's possible to continuously merge the RDDs coming
from a stream into a single RDD efficiently.

One thought is to use the union() method, but union() gives me a new RDD
each time I do a merge. I'm not sure how I should keep references to all of
these intermediate RDDs, because I remember Spark discourages users from
holding on to a collection of RDDs.
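
Something like this is what I had in mind (the record type, ssc, and the
stream name below are just placeholders for my actual code):

    import org.apache.spark.rdd.RDD

    case class MyRecord(id: Long, name: String, value: Double)

    // assuming ssc: StreamingContext and stream: DStream[MyRecord] already exist
    var merged: RDD[MyRecord] = ssc.sparkContext.emptyRDD[MyRecord]

    stream.foreachRDD { rdd =>
      // union() hands back a brand-new RDD every batch, so I just keep
      // reassigning the same variable and caching the growing result
      merged = merged.union(rdd).cache()
    }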

Another possible solution is to follow the example of
"StatefulNetworkWordCount", which uses the updateStateByKey() method. But
my RDD type is not key-value pairs (it's a struct with multiple fields). Is
there a workaround?
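
For example, I wondered whether I could just key each record on one of its
fields and accumulate everything seen so far in the state, roughly like the
sketch below (reusing the MyRecord placeholder from above), but I'm not sure
this is a reasonable use of updateStateByKey():

    // needed on older Spark versions for the pair-DStream operations
    import org.apache.spark.streaming.StreamingContext._

    // again assuming stream: DStream[MyRecord], and that I have called
    // ssc.checkpoint(...) since updateStateByKey() requires checkpointing
    val keyed = stream.map(r => (r.id, r))

    val mergedState = keyed.updateStateByKey[Seq[MyRecord]] {
      (newRecords: Seq[MyRecord], state: Option[Seq[MyRecord]]) =>
        Some(state.getOrElse(Seq.empty) ++ newRecords)
    }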

Thanks,
Cui
