Douglas,

>> Does that mean you have a cluster of components that processes requests 
>> of type X, another cluster that processes requests of type Y and so on?
> 
> Not really, I think.  Here's a simplified picture:
> 
>   +--------------+     +-----+
>   | Read from DB |---->| Sum |+
>   +--------------+     +-----+|
>                         +-----+
> 
> So the first component reads some records from a database (or file or
> whatever), and we want to calculate the equivalent of an SQL SUM/GROUP
> BY.  Each record is an individual 0MQ message.  If the Read component
> outputs the following list of records:
> 
>   Key   Value
>  -------------
>    A      1
>    B      3
>    A      1
>    A      2
>    B      4
> 
> then we want the final result to be <(A,4), (B,7)>.
> 
> We've got multiple copies of the Sum component, so that we can process a
> huge number of records in parallel.  If we implement the arrow as an
> UP/DOWNSTREAM connection, with multiple sockets on the downstream end,
> then the records will be round-robin distributed to the different Sums.
> The first Sum will give us an output of <(A,2), (B,4)>, and the second
> Sum will give us <(A,2), (B,3)>.
> 
> To get the right answer, the part that does the partitioning has to
> ensure that all "A" records go to the same Sum instance.  This is
> usually done by hashing the key and using that to choose which instance
> to send each record to.  (That way the partitioner can be stateless, so
> we can scale up to any number of records.)  So if we want 0MQ to handle
> this behavior, we'd have to pass in this hash value when we submit a
> message using zmq_send.

Ok. I see. So basically, you want messages to be load-balanced by 0MQ 
but at the same time ensure that all messages with a particular 
"subject" (whatever that means) go to the same downstream component.

The problem I see with this scenario is that it doesn't comply with 
basic "scalability" requirement that underlies all 0MQ messaging 
patterns (except PAIR socket... ugh).

"Scalability" requirement says that when your distributed system is not 
catching up with load you can add more boxes on the fly to solve it 
(think of buying more processing time in the cloud).

With the above scenario adding a new box wouldn't help. The messages 
have to be deterministically dispatched from the very beginning. Thus no 
message would be ever dispatched to the newly added box.

Can we possibly console the requirement for "grouping" with "scalability"?

Thoughts?
Martin
_______________________________________________
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev

Reply via email to