Here's some background on what SRC is. This is basically slide 6 in Dror's talk, for those that missed the talk.
* * *

SRC is an extension supported by recent Mellanox hardware. It is geared
toward reducing the number of QPs required for all-to-all communication on
systems with a high number of jobs per node.

===================================================================
Motivation:
===================================================================
Given N nodes with J jobs per node, the number of QPs required for
all-to-all communication is:

With RC: O((N * J) ^ 2)
    Since each job out of O(N * J) jobs must create a single QP
    to communicate with each one of O(N * J) other jobs.

With SRC: O(N ^ 2 * J)
    This is achieved by using a single send queue (per job, out of
    O(N * J) jobs) to send data to all J jobs running on a specific
    node (out of O(N) nodes).
    Hardware uses a new "SRQ number" field in the packet header to
    multiplex receive WRs and WCs to the private memory of each job.

This is a similar idea to IB RD.
Q: Why not use RD then?
A: Because no hardware supports it.

Details:
===================================================================
Verbs extension:
===================================================================
- There is a new transport/QP type "SRC".
- There is a new object type "SRC domain".
- Each SRQ gets new (optional) attributes:
    SRC domain
    SRC SRQ number
    SRC CQ
  An SRQ must have either all three of these attributes or none of them.
- QPs of type SRC have all the same attributes as regular RC QPs
  connected to an SRQ, except that:
    A. Each SRC QP has a new required attribute "SRC domain".
    B. SRC QPs do *not* have an "SRQ" attribute
       (they do not have a specific SRQ associated with them).

===================================================================
Protocol extension:
===================================================================
SRC QP behaviour: Requestor
- The post send WR for this QP type is extended with an SRQ number field.
  This number is sent as part of the packet header.
- SRC packets follow the rules for RC packets on the wire, exactly.
  What is different is their handling at the responder side.

SRC QP behaviour: Responder
Each incoming packet passes transport checks with respect to the SRC QP,
following RC rules, exactly.
After this, the SRQ number in the packet header is used to look up a
specific SRQ. The SRC domain of the resulting SRQ must be equal to the
SRC domain of the QP; otherwise a NAK is sent and the QP moves to the
error state.

If the SRC domains match, receive WR and receive WC processing are as
follows:
- RC Send
    - Rather than using the SRQ to which the QP is attached, the SRQ is
      looked up by the SRQ number in the packet. The receive WR is taken
      from this SRQ.
    - Completions are generated on the CQ specified in the SRQ.
- RDMA/Atomic
    - Rather than using the PD to which the QP is attached, the SRQ is
      looked up by the SRQ number in the packet. The PD of this SRQ is
      used for protection checks.
===================================================================

-- 
MST
_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
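
For readers who prefer code to prose, here is a rough Python model of two points from the post above: the QP counts in the Motivation section and the responder's handling of an SRC Send. It is only a sketch and not real verbs code. The names `qp_counts`, `Srq`, `SrcQp` and `on_send`, the "ACK" return value, and the state names are all made up for illustration. Treating an unknown SRQ number the same way as a domain mismatch is an assumption, as the post does not cover that case. The sketch also assumes a receive WR is always posted on the target SRQ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def qp_counts(n_nodes: int, jobs_per_node: int) -> Dict[str, int]:
    """Leading-order QP counts from the Motivation section (not exact totals)."""
    total_jobs = n_nodes * jobs_per_node
    return {
        "rc": total_jobs ** 2,        # one QP per (job, remote job) pair
        "src": total_jobs * n_nodes,  # one send queue per (job, remote node) pair
    }


@dataclass
class Srq:
    number: int                       # "SRC SRQ number"
    pd: str                           # PD used for RDMA/Atomic protection checks
    domain: Optional[str] = None      # "SRC domain"; None models a plain SRQ
    recv_wrs: List[str] = field(default_factory=list)  # posted receive WRs
    cq: List[tuple] = field(default_factory=list)      # stands in for the "SRC CQ"


@dataclass
class SrcQp:
    domain: str                       # required "SRC domain" attribute
    state: str = "RTS"                # hypothetical state name


def on_send(qp: SrcQp, srqs: Dict[int, Srq], srq_number: int, payload: bytes) -> str:
    """Handle an SRC Send that has already passed the RC transport checks."""
    srq = srqs.get(srq_number)
    # Assumption: an unknown SRQ number is handled like a domain mismatch.
    if srq is None or srq.domain != qp.domain:
        qp.state = "ERROR"
        return "NAK"
    wr = srq.recv_wrs.pop(0)          # receive WR is taken from the looked-up SRQ
    srq.cq.append((wr, payload))      # completion lands on the CQ named by that SRQ
    return "ACK"
```

With N = 100 and J = 8, `qp_counts(100, 8)` gives 640000 for RC and 80000 for SRC, so the saving is a factor of J. In `on_send`, the QP itself carries no SRQ. The packet's SRQ number alone selects which job's receive queue and CQ are used, and the domain check is what stops one QP from reaching SRQs outside its own SRC domain.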