RE: [PATCH] librdmacm/rsockets: Support MSG_WAITALL with rsockets recv()

2012-08-16 Thread Hefty, Sean
Support MSG_WAITALL flag with recv() when using rsockets. Signed-off-by: Sridhar Samudrala s...@us.ibm.com The MSG_PEEK description that you pointed me to wasn't in the man page documentation that I was looking at. That simplifies things. I originally expected adding MSG_WAITALL support to
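
For readers unfamiliar with the flag, the behaviour that MSG_WAITALL asks for can be sketched with ordinary sockets. This is not the rsockets patch itself; `recv_all` is a hypothetical helper that only illustrates the "keep receiving until the full length has arrived" semantics.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling recv() until len bytes arrive,
 * the peer closes the connection, or a real error occurs. */
static ssize_t recv_all(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = recv(fd, (char *)buf + done, len - done, 0);

		if (n == 0)
			break;			/* peer closed: return the short count */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return done ? (ssize_t)done : -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller that previously wrote `recv(fd, buf, len, MSG_WAITALL)` would get roughly the same result from `recv_all(fd, buf, len)` on a blocking socket.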

[PATCH 2/2] librdmacm/rstream: Use MSG_WAITALL for blocking test

2012-08-16 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/rstream.c b/examples/rstream.c index befb7c6..1d221d0 100644 --- a/examples/rstream.c +++ b/examples/rstream.c @@ -607,7 +607,7 @@ static int

RE: rsockets and fork

2012-08-16 Thread Hefty, Sean
This test is using Mellanox 10Gb RoCEE with MTU set to 9000 Server is started using # ldr netserver -D 2 clients are started in 2 windows as follows. # ldr netperf -v2 -c -C -H 192.168.0.22 -l10 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22

RE: handling rdma apps using chroot

2012-08-15 Thread Hefty, Sean
There are lots of issues with using a dev/sysfs interface instead of system calls and trying to support chroot... Not sure how rdmacm works, but verbs returns event channel FDs directly 'in-band' which avoids further use of dev.. The rdmacm calls open() in rdma_create_event_channel()...

RE: Setting service level for a QP with ibv_modify_qp and RDMA_CM

2012-08-14 Thread Hefty, Sean
we are trying to set the service level for a QP with ibv_modify_qp, but ibv_modify_qp() returns an error (errno = EINVAL). We are using RDMA CM and use rdma_create_qp() to allocate the queue pair. After some searching in the net we found posts, which indicate that ibv_modify_qp() cannot be

RE: rsockets and fork

2012-08-14 Thread Hefty, Sean
Yes. it is also using rsockets. The second session always hangs after sending a fixed number of bytes (38469632). rsend() blocks waiting for the CQ event. Can you send me the parameters that you use for testing?

RE: [PATCH 2/8] opensm/complib: define if statements with branch prediction hints

2012-08-14 Thread Hefty, Sean
+#define if_PF(cond) if(CL_PREDICT_FALSE(cond)) +#define if_PT(cond) if(CL_PREDICT_TRUE(cond)) If CL_PREDICT_TRUE/FALSE are too long, why not just shorten those, rather than abstract if statements behind a macro? -- To unsubscribe from this list: send the line unsubscribe
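
The alternative suggested in this reply (shorter hint macros used inside a plain if) might look like the sketch below. The macro names and their `__builtin_expect` definitions are assumptions for illustration (GCC/Clang only); complib's actual CL_PREDICT_* definitions may differ.

```c
#include <assert.h>

/* Assumed definitions; complib's actual CL_PREDICT_* macros may differ. */
#define PREDICT_TRUE(x)  __builtin_expect(!!(x), 1)
#define PREDICT_FALSE(x) __builtin_expect(!!(x), 0)

/* The hint sits inside an ordinary if statement, so the control flow
 * stays visible to readers and to source tools. */
static int checked_div(int a, int b, int *out)
{
	if (PREDICT_FALSE(b == 0))
		return -1;	/* rare error path */
	*out = a / b;
	return 0;
}
```

Keeping `if` as a keyword means editors, indenters, and static checkers still recognise the branch, which is the concern raised against `if_PF`/`if_PT`.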

RE: [ANNOUNCE] librdmacm-1.0.16

2012-08-13 Thread Hefty, Sean
Are there any plans to include SOCK_DGRAM support? I could see that being potentially interesting along with mapping broadcast/multicast to IB physical layer multicast. I took a look at linux/net/rds to see if something similar could be done in terms of transparently supporting

RE: rsockets and fork

2012-08-13 Thread Hefty, Sean
I could not get fork enabled netperf to work with rsockets in the latest librdmacm git repository. After some debugging, i found that the child netserver process is blocked at sem_wait() call in fork_passive(). It is not clear to me how this call is supposed to unblock as sem_post() is done

RE: [PATCH] RDMA/ucma.c: Different fix for ucma context uid=0, causing iWarp RDMA applications to fail in connection establishment

2012-08-10 Thread Hefty, Sean
Roland, there's a race here where ucma_set_event_context() copies ctx->uid to the event structure outside of the mutex. Once the mutex is acquired, ctx->uid is checked. However, the uid could have changed between saving it off to the event and checking it. OK. So then this patch,

RE: [PATCH] [trivial] infiniband: Fix typo in infiniband driver

2012-08-09 Thread Hefty, Sean
diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 8c81992..b80867e 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -439,7 +439,7 @@ static int c2_rnic_close(struct c2_dev *c2dev)

RE: [PATCH] RDMA/ucma.c: Different fix for ucma context uid=0, causing iWarp RDMA applications to fail in connection establishment

2012-08-05 Thread Hefty, Sean
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 8002ae6..88c50d2 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -267,6 +267,7 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id, if (!uevent)

RE: [PATCH] RDMA/ucma.c: Fix for ucma context uid=0, causing iWarp RDMA applications to fail in connection establishment

2012-08-02 Thread Hefty, Sean
drivers/infiniband/core/ucma.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 8002ae6..6cc40de 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -803,9

RE: [PATCH 3/5] librspreload: Support server apps that call fork()

2012-08-01 Thread Hefty, Sean
Have you tried this with netperf? Yes, I tested it with: netserver netserver -D netserver -D -f Example: # export RDMAV_FORK_SAFE=1 # LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so netserver Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC # export

[RFC] zero-copy extensions for rsockets

2012-07-31 Thread Hefty, Sean
Before implementing this, I'm looking for feedback. The following proposal defines user-space APIs to support zero-copy. The intent is that the use of these extensions is fully compatible with existing calls, allowing applications to make selective use of them. Although I'm specifically

RE: [RFC] zero-copy extensions for rsockets

2012-07-31 Thread Hefty, Sean
This looks very similar to the libaio interface.. I did look at aio. It may be possible to use aio context in place of ioq, and I'm open to that. I was actually modeling ioq more after epoll than aio. It just seemed simpler to treat an ioq as a standard fd. For the get/put calls, there's

RE: [RFC] zero-copy extensions for rsockets

2012-07-31 Thread Hefty, Sean
libaio is designed to be used along with an eventfd that provides the epoll like semantics you are talking about. Each time you call io_submit you can call io_set_eventfd() on the iocb and the aio engine will trigger that eventfd when the IO completes. poll or epoll on the eventfd fd. A

RE: [RFC] zero-copy extensions for rsockets

2012-07-31 Thread Hefty, Sean
I'm not sure that is so great, one of the benefits of the aio interface is you have just one queue and one eventfd to manage, no matter how many fd's you are AIOing against. Completions can happen out of order. Requiring an app to juggle multiple ioq thingies split on some arbitrary axis (ie

[PATCH 3/3] librdmacm/rsocket: Improve disconnect time

2012-07-30 Thread Hefty, Sean
When both sides of a connection attempt to close at the same time, one of the two sides can easily get an error when sending a disconnect message. This results in that side hanging during close until the send times out. (The time out is caused by the remote side destroying its QP.) We can

[PATCH 2/3] librdmacm/rsockets: Use wr_id to determine completion type

2012-07-30 Thread Hefty, Sean
If a work request has completed in error, the completion type field is undefined. Use the wr_id to determine if the failed completion was a send or receive. This fixes an issue where MPI can hang during finalize. With both sides of a connection shutting down simultaneously, one side may
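
One way to recover the request type from wr_id is to reserve a bit of it when posting. The encoding below is hypothetical and is not taken from the patch; it only illustrates the idea, relying on the fact that wr_id remains valid in a failed completion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: the top bit of wr_id marks a receive request. */
#define WR_ID_RECV_FLAG (UINT64_C(1) << 63)

static uint64_t make_send_wr_id(uint64_t seq) { return seq & ~WR_ID_RECV_FLAG; }
static uint64_t make_recv_wr_id(uint64_t seq) { return seq | WR_ID_RECV_FLAG; }

/* Usable even for failed completions, because wr_id is still valid. */
static int wr_id_is_recv(uint64_t wr_id)
{
	return (wr_id & WR_ID_RECV_FLAG) != 0;
}
```

A completion handler would then test `wr_id_is_recv(wc.wr_id)` instead of the opcode field when the status indicates an error.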

[PATCH 1/3] librdmacm/rsockets: Enable support for privileged ports

2012-07-30 Thread Hefty, Sean
Allow the preload library to use rsockets with privileged ports. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/preload.c | 30 ++ 1 files changed, 6 insertions(+), 24 deletions(-) diff --git a/src/preload.c b/src/preload.c index c8ad747..52eaf1a 100644 ---

RE: [PATCH] RDMA/ucma: Convert open-coded equivalent to memdup_user()

2012-07-27 Thread Hefty, Sean
From: Roland Dreier rol...@purestorage.com Suggested by scripts/coccinelle/api/memdup_user.cocci. Reported-by: Fengguang Wu fengguang...@intel.com Signed-off-by: Roland Dreier rol...@purestorage.com Acked-by: Sean Hefty sean.he...@intel.com --- drivers/infiniband/core/ucma.c | 19

[PATCH 3/5] librspreload: Support server apps that call fork()

2012-07-24 Thread Hefty, Sean
Provide limited support for applications that call fork(). To handle fork(), we establish connections using normal sockets. The socket is later converted to an rsocket when the user makes the first call to a data transfer function (e.g. send, recv, read, write, etc.). Fork support is indicated

[PATCH 2/5] librspreload: Make socket_fallback() call more generic

2012-07-24 Thread Hefty, Sean
socket_fallback is used to switch from an rsocket to a normal socket in the case of failures. Rename the call and make it more generic, so that it can switch between an rsocket and a normal socket in either direction. This will be used to support fork(). As part of this change, we move the list

RE: uverbs message alignment

2012-07-20 Thread Hefty, Sean
struct c4iw_create_raw_qp_req { struct ibv_create_qp ibv_req; __u32 port; __u32 vlan_pri; __u32 nfids; }; struct ibv_create_qp contains a u64, which will force the size of the structure to a multiple of 64 bits. You should add an additional 32 bits of padding.
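
The padding advice can be illustrated with stand-in types. `fake_cmd_hdr` below is an assumption that only models "a structure containing a 64-bit field"; it is not the real ibv_create_qp layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a command header that contains a 64-bit field. */
struct fake_cmd_hdr {
	uint64_t response;
};

/* Explicit padding keeps the size a multiple of 8 bytes regardless of
 * how the compiler aligns 64-bit members. */
struct fake_create_raw_qp_req {
	struct fake_cmd_hdr hdr;
	uint32_t port;
	uint32_t vlan_pri;
	uint32_t nfids;
	uint32_t reserved;	/* the additional 32 bits of padding */
};
```

Without the `reserved` member, an ABI that aligns 64-bit fields to 8 bytes pads the structure implicitly while one that aligns them to 4 bytes does not, so 32-bit user space and a 64-bit kernel could disagree on the message size.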

RE: rdma_connect() timeout

2012-07-18 Thread Hefty, Sean
Is there a way to setup the timeout in rdma_connect() ? For IB, the timeout is based on the packet lifetime in the path record returned by the SA. The rdma_cm will retry a CM REQ the maximum number of times (15). Is there a way to change the CM parameters ? e.g. Service Timeout to wait for

RE: rdma_connect() timeout

2012-07-18 Thread Hefty, Sean
According to the OpenSM default configuration (/usr/sbin/opensm --create-config config) : # The subnet_timeout code that will be set for all the ports # The actual timeout is 4.096usec * 2^subnet_timeout subnet_timeout 18 # The code of maximal time a packet can live in a switch
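
As a worked example of the formula quoted from the OpenSM configuration, a subnet_timeout code of 18 gives 4.096 usec * 2^18 = 1,073,741.824 usec, or roughly 1.07 seconds. The helper name below is hypothetical.

```c
#include <assert.h>

/* Timeout in microseconds for a given subnet_timeout code:
 * 4.096 usec * 2^code. Assumes code < 64. */
static double subnet_timeout_usec(unsigned int code)
{
	return 4.096 * (double)(1ULL << code);
}
```

Each increment of the code doubles the timeout, so a code of 19 would be about 2.15 seconds.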

RE: rsockets and standard socket based TCP benchmarks

2012-07-16 Thread Hefty, Sean
Have you had a chance to look more into the above for fork() support? Actually, I just started working on it last Friday. I'll post a patch once I have at least something working. - Sean

[ANNOUNCE] librdmacm-1.0.16

2012-07-13 Thread Hefty, Sean
librdmacm release 1.0.16 is now available from www.openfabrics.org/downloads/rdmacm This release contains several bug fixes from 1.0.15, plus introduces the rsocket API and protocol.

RE: [PATCH] librdmacm/preload.c: Eliminate some compile warnings

2012-07-12 Thread Hefty, Sean
doh (to me) - thanks! applied

RE: [PATCH librdmacm] rdma_resolve_addr: source address protocol family must be valid

2012-07-11 Thread Hefty, Sean
thanks - applied

[PATCH] librdmacm/rsocket: Build librspreload library as part of build

2012-07-11 Thread Hefty, Sean
Build the rsocket preload library as part of the build. To reduce the risk of the preload library intercepting calls without the user's knowledge, the preload library is installed into {_libdir}/rsocket. Signed-off-by: Sean Hefty sean.he...@intel.com --- diff --git a/Makefile.am

RE: linux-next: build failure after merge of the infiniband tree

2012-07-06 Thread Hefty, Sean
Thanks, fixed with a #if IS_ENABLED(CONFIG_IPV6) around the code that touches ipv6... Sean, let me know if more is required. The fix-up looks to be complete. Thanks

RE: [Q] How to tranfer a file which is over 2GB(2^31) size in RDMA network?

2012-07-03 Thread Hefty, Sean
Hello Parav.Pandit Thank you for your advice. I'll try it. You can also look at rsockets in the latest librdmacm library. You'd need to download and build the library yourself, since rsockets is not yet available in any release. But there's a sample program (rcopy) that will copy a

RE: rsockets with RoCE

2012-06-29 Thread Hefty, Sean
No objection. The rdma_cm shouldn't be considered speed path anyway. Btw, the IB CM exports some counters which can sometimes be helpful in debugging, though, those only report a count of which messages have been sent/received. I have not used this before. How does one read these

[PATCH] librdmacm/rsocket: Handle other shutdown options

2012-06-27 Thread Hefty, Sean
Handle SHUT_RD and SHUT_WR shutdown options. In order to handle shutting down the send and receive sides separately, we break the connection state into multiple sub-states. This allows us to be partially connected (i.e. for either just reads or just writes). Support for SHUT_WR is needed to
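
The idea of splitting the connected state into sub-states can be sketched with two bits, one per direction. The names below are hypothetical; the states actually used in rsocket.c may be named and organised differently.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical state bits; the names used in rsocket.c may differ. */
enum {
	CONN_RD = 1 << 0,		/* receive side still open */
	CONN_WR = 1 << 1,		/* send side still open */
	CONN_RDWR = CONN_RD | CONN_WR	/* fully connected */
};

/* Clear the sub-state(s) selected by a shutdown() "how" argument. */
static int apply_shutdown(int state, int how)
{
	switch (how) {
	case SHUT_RD:   return state & ~CONN_RD;
	case SHUT_WR:   return state & ~CONN_WR;
	case SHUT_RDWR: return state & ~CONN_RDWR;
	default:        return state;	/* unknown option: leave unchanged */
	}
}
```

After `apply_shutdown(CONN_RDWR, SHUT_WR)` only `CONN_RD` remains set, which corresponds to the "partially connected" case where the socket can still be read.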

[PATCH] librdmacm/rsocket: Set readfds event if rsocket has been disconnected

2012-06-27 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- src/rsocket.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/rsocket.c b/src/rsocket.c index 394fed4..c833d46 100644 --- a/src/rsocket.c +++ b/src/rsocket.c @@ -1631,7 +1631,7 @@ rs_poll_to_select(int nfds, struct

RE: RDMA_CM_EVENT_REJECTED and ressources release

2012-06-21 Thread Hefty, Sean
- call rdma_disconnect(): even if the connection is not established, rdma_disconnect() can be called. In this case, all receive WR posted came back in error. On processing a reject event, the librdmacm should transition the QP into the error state. That should flush all posted work

RE: rsockets and other performance

2012-06-14 Thread Hefty, Sean
Traditional sockets based applications wanting high throughput could use rsockets. Since it is layered on top of uverbs we expected to see good throughput numbers. So, we started to run netperf and iperf. We observed that it tops off at about 20Gb/s with QDR adapters. A quick perf top revealed

RE: rsockets and standard socket based TCP benchmarks

2012-06-14 Thread Hefty, Sean
Yes, good point. If the other side does not have rsockets then it is not that straightforward. Some thoughts: 1. The best option might be if we exchanged an option during connection setup. This tells the peers if the other side is capable of RDMA. If it is then one can switch to support

[PATCH 1/3 v2] rdma/cm: Bind to a specific address family

2012-06-14 Thread Hefty, Sean
The rdma cm uses a single port space for all associated (tcp, udp, etc.) port bindings, regardless of the address family that the user binds to. The result is that if a user binds to AF_INET, but does not specify an IP address, the bind will occur for AF_INET6. This causes an attempt to bind to

[PATCH 3/3 v2] rdma/cm: Allow user to restrict listens to bound address family

2012-06-14 Thread Hefty, Sean
Provide an option for the user to specify that listens should only accept connections where the incoming address family matches that of the locally bound address. This is used to support the equivalent of IPV6_V6ONLY socket option, which allows an app to only accept connection requests directed

[PATCH] librdmacm/rsocket: Support IPV6_V6ONLY socket option

2012-06-14 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- Patch is dependent on proposed kernel changes. include/rdma/rdma_cma.h |1 + src/rsocket.c | 23 +++ 2 files changed, 24 insertions(+), 0 deletions(-) diff --git a/include/rdma/rdma_cma.h

[PATCH] rdma/cm: QP type check on received REQs should be AND not OR

2012-06-14 Thread Hefty, Sean
Change || check to && when checking the QP type in a received connection request against the listening endpoint. Signed-off-by: Sean Hefty sean.he...@intel.com --- Found by code inspection. drivers/infiniband/core/cma.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git

RE: ibv_poll_cq() and wc-byte_len

2012-06-13 Thread Hefty, Sean
In a parallel universe, struct ibv_wc would have a bitmap field indicating which others fields are valid. In this part of the multiverse, a more complete documentation would be welcome. If libmlx4/libmthca behavior is the compliant one, I can provide an updated man page. The best

[PATCH 2/3] rdma/cm: Listen on specific address family

2012-06-13 Thread Hefty, Sean
The rdma_cm maps IPv4 and IPv6 addresses to the same service ID. This prevents apps from listening only for IPv4 or IPv6 addresses. It also results in an app binding to an IPv4 address receiving connection requests for an IPv6 address. Match socket behavior. Restrict listens on IPv4 addresses

[PATCH 3/3] rdma/cm: Allow user to restrict listens to bound address family

2012-06-13 Thread Hefty, Sean
Provide an option for the user to specify that listens should only accept connections where the incoming address family matches that of the locally bound address. This is used to support the equivalent of IPV6_V6ONLY socket option, which allows an app to only accept connection requests directed

[PATCH 1/3] rdma/cm: Bind to a specific address family

2012-06-13 Thread Hefty, Sean
The rdma cm uses a single port space for all associated (tcp, udp, etc.) port bindings, regardless of the address family that the user binds to. The result is that if a user binds to AF_INET, but does not specify an IP address, the bind will occur for AF_INET6. This causes an attempt to bind to

RE: [PATCH 2/3] rdma/cm: Listen on specific address family

2012-06-13 Thread Hefty, Sean
Match socket behavior. Restrict listens on IPv4 addresses to only IPv4 addresses. If a listen is on an IPv6 address, allow it to receive either IPv4 or IPv6 addresses. Can you match the IP stack and incorporate /proc/sys/net/ipv6/bindv6only I'll check and socket option IPV6_V6ONLY?

RE: [PATCH for-next V1 0/4] IB/IPoIB TSS and RSS support for datagram mode

2012-06-12 Thread Hefty, Sean
Do you have something more specific re how to actually align (say) the RSS QP group into the framework used by XRC? Not really. That was more of a conceptual statement regarding the design. The operation, particularly on the receive side, just seems similar. I don't think we should force a

RE: rsockets and standard socket based TCP benchmarks

2012-06-11 Thread Hefty, Sean
Though one can consider the fall-back in reverse order i.e. if the rdma connection fails continue with the already established connection (over the normal inet socket). When I consider fallback, one of the issues is handling the case where one of the two sides is not using rsockets. This

RE: [PATCH 1/4] librdamcm/rsocket: Handle SHUT_RD/WR shutdown flags

2012-06-08 Thread Hefty, Sean
Unfortunately this introduces another issue. This causes netperf to hang in recv() after shutdown(SHUT_WR) on the data socket. Bah. Thanks for testing. I was hoping for a simple work-around, rather than expand rsocket states. Oh well. Let me figure out the proper way to handle partial

RE: rsockets and standard socket based TCP benchmarks

2012-06-08 Thread Hefty, Sean
But to map standard networking applications to rsockets we will run into the above problem i.e. fork() will not work.  It would be very useful to allow for the standard networking paradigm of: bind()-listen()-accept()- -fork(), and then the server goes back to accept(). That would allow us to

[PATCH 2/4] librdamcm/rsocket: Handle TCP_MAXSEG socket option

2012-06-06 Thread Hefty, Sean
netperf uses the TCP_MAXSEG socket option. Add support for it. Problem reported by Sridhar Samudrala s...@us.ibm.com getsockopt returns the path MTU as the TCP_MAXSEG. setsockopt currently ignores the value. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/rsocket.c |9 + 1

[PATCH 3/4] librdamcm/rsocket: Spin before blocking on an rsocket

2012-06-06 Thread Hefty, Sean
The latency cost of blocking is significant compared to round trip ping-pong time. Spin briefly on rsockets before calling into the kernel and blocking. The time to spin before blocking is read from an rsocket configuration file %sysconfig%/rdma/rsocket/polling_time. This is user adjustable.

[PATCH 4/4] librdamcm/rsocket: Use configuration files to specify default settings

2012-06-06 Thread Hefty, Sean
Give an administrator control over the default settings used by rsockets. Use files under %sysconfig%/rdma/rsocket as shown: rmem_default - default size of receive buffer(s) wmem_default - default size of send buffer(s) sqsize_default - default size of send queue rqsize_default - default size of

[PATCH 1/4] librdamcm/rsocket: Handle SHUT_RD/WR shutdown flags

2012-06-06 Thread Hefty, Sean
Sridhar Samudrala s...@us.ibm.com reported an error (EOPNOTSUPP) after calling select(). The issue is that rshutdown(SHUT_WR) was called before select(). As part of shutdown, rsockets switches the underlying fd from nonblocking to blocking to ensure that previously sent data has completed.

RE: [PATCH for-next V1 0/4] IB/IPoIB TSS and RSS support for datagram mode

2012-06-04 Thread Hefty, Sean
Still, the first and most important thing to handle here is feedback on the QP groups concept suggested by this patch set to support TSS/RSS over verbs. The plan is for this concept to (with little help from a framework for verbs extension) apply to user space RSS as well, for both UD and RAW QPs.

[PATCH 2/2] ibacm: Automatically select local port if not specified by path record

2012-06-04 Thread Hefty, Sean
If the user specifies a DLID or DGID as part of a path record lookup, automatically select a local port. This allows a user to query an SA without needing to specify the local SLID or SGID. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/acm.c |6 +- 1 files changed, 5

[PATCH 1/2] ibacm/acme: Eliminate segfault when SLID/SGID not given

2012-06-04 Thread Hefty, Sean
Problem and cause reported by Hal Rosenstock h...@mellanox.com Signed-off-by: Sean Hefty sean.he...@intel.com --- src/acme.c | 14 +- 1 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/acme.c b/src/acme.c index e6ae188..0e1d4ed 100644 --- a/src/acme.c +++

RE: [PATCH] librdmacm/man/rdma_getaddrinfo.3: Add RDMA_PS_IB to supported port spaces

2012-06-01 Thread Hefty, Sean
thanks! - applied

RE: HCA concurrency

2012-06-01 Thread Hefty, Sean
In my test case to evaluate fetch-and-add, I spawn multiple threads, each owning its own QP inside the same PD and context and sending fetch-and-add requests without any inner contention (no lock, etc.). I quickly reach a ceiling of about 900KOPS with 5/6 threads, and I have a hard time

RE: [PATCH] ibacm/acm.c: Make sure shift for subnet timeout is not negative

2012-05-31 Thread Hefty, Sean
thanks! - applied

RE: [PATCH] ibacm/acme.c: Eliminate seg fault when source not supplied

2012-05-31 Thread Hefty, Sean
diff --git a/src/acme.c b/src/acme.c index d3f8174..533588c 100644 --- a/src/acme.c +++ b/src/acme.c @@ -618,12 +618,18 @@ static void resolve(char *svc) ret = resolve_name(path); break; case 'l': -

[PATCH 1/16] librdmacm: Check that send and recv CQs are different before destroying

2012-05-30 Thread Hefty, Sean
ucma_destroy_cqs() destroys both the send and recv CQs if they are non-null. If the two CQs are actually the same one, this results in a crash when trying to destroy the second CQ. Check that the CQs are different before destroying the second CQ. This fixes a crash when using rsockets, which
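
The check described here can be sketched with a stand-in CQ type that counts destroy calls. The `fake_*` names are assumptions for illustration; the real code operates on verbs CQs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a completion queue; counts how often it is destroyed. */
struct fake_cq {
	int destroy_calls;
};

static void fake_destroy_cq(struct fake_cq *cq)
{
	cq->destroy_calls++;
}

/* Destroy both CQs, but only once if they are the same object. */
static void destroy_cqs(struct fake_cq *recv_cq, struct fake_cq *send_cq)
{
	if (recv_cq)
		fake_destroy_cq(recv_cq);
	if (send_cq && send_cq != recv_cq)
		fake_destroy_cq(send_cq);
}
```

With a shared CQ passed for both arguments, the destroy routine runs exactly once, which avoids the double destroy that caused the crash.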

[PATCH 0/16] librdmacm: rsocket improvements

2012-05-30 Thread Hefty, Sean
The following patch set provides fixes, adds user configurability, and optimizes rsockets based on the needs and results of performance analysis and wider testing. Additional optimizations will follow, but I wanted to go ahead and push these changes out. Signed-off-by: Sean Hefty

[PATCH 2/16] librdmacm/rstream: Check for connection error on async connect

2012-05-30 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/examples/rstream.c b/examples/rstream.c index 104b318..c440f04 100644 --- a/examples/rstream.c +++ b/examples/rstream.c @@ -448,7 +448,8 @@

[PATCH 16/16] librdmacm/rstream: Use separate connections for latency/bw tests

2012-05-30 Thread Hefty, Sean
Optimize each connection for either latency or bandwidth results. This improves small message latency under 384 bytes by .5 - 1 us, while increasing bandwidth by 1 - 1.5 Gbps. Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c | 146

[PATCH 3/16] librdmacm/rsocket: Fix hang in rrecv/rsend after disconnecting

2012-05-30 Thread Hefty, Sean
If a user calls rrecv() after a blocking rsocket has been disconnected, it will hang. This problem and the cause were reported by Sridhar Samudrala samudr...@us.ibm.com. It can be reproduced by running netserver -f -D using the rs-preload library. A similar issue exists with rsend(). Fix this

[PATCH 5/16] librdmacm: Delay ACM connection until resolving an address

2012-05-30 Thread Hefty, Sean
Avoid creating a connection to the ACM service when it's not needed. For example, if the user of the librdmacm is a server application, it will not use ACM services. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/acm.c | 22 +- src/cma.c |2 -- src/cma.h |2

[PATCH 4/16] librdmacm/acm: Use -1 to indicate an invalid socket rather than 0

2012-05-30 Thread Hefty, Sean
socket() can return 0 as a valid socket. This can happen when using a daemon that closes stdin/out/err. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/acm.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/acm.c b/src/acm.c index bcf11da..9c65919
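
A minimal sketch of the convention follows, using a hypothetical `struct conn` rather than the actual acm.c variables: the descriptor starts at -1, and validity is tested with `>= 0` so that descriptor 0 is accepted.

```c
#include <assert.h>

/* Descriptor 0 is a legal socket, so "no socket" is represented by -1. */
#define INVALID_SOCK (-1)

struct conn {
	int sock;
};

static void conn_init(struct conn *c)
{
	c->sock = INVALID_SOCK;
}

static int conn_is_open(const struct conn *c)
{
	return c->sock >= 0;
}
```

Code that tests `if (sock)` or initialises the descriptor to 0 would treat a daemon's first socket as "not connected" once stdin has been closed.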

[PATCH 7/16] librdmacm/rsockets: Reduce QP size if larger than hardware maximums

2012-05-30 Thread Hefty, Sean
When porting rsockets to iwarp, it was discovered that the default QP size (512) was larger than that supported by the hardware. Decrease the size of the QP if the default size is larger than the maximum supported by the hardware. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/cma.c

[PATCH 6/16] librdmacm/rs-preload: Handle recursive socket() calls

2012-05-30 Thread Hefty, Sean
When ACM support is enabled in the librdmacm, it will attempt to establish a socket connection to the ACM daemon. When the rsocket preload library is in use, this can result in a recursive call to socket() that results in the library hanging. The resulting call stack is: socket() -> rsocket() ->

[PATCH 14/16] librdmacm/rstream: Add option to specify size of send/recv buffers

2012-05-30 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c | 34 +- man/rstream.1 |5 - 2 files changed, 17 insertions(+), 22 deletions(-) diff --git a/examples/rstream.c b/examples/rstream.c index c440f04..df36e34 100644 ---

[PATCH 13/16] librdmacm/rsockets: Change the default QP size from 512 to 384

2012-05-30 Thread Hefty, Sean
Simple latency/bandwidth tests using rstream showed minimal difference in performance between using a QP sized to 384 entries versus 512. Reduce the overhead of a default rsocket by using 384 entries. A user can request a larger size by calling rsetsockopt. Signed-off-by: Sean Hefty

[PATCH 15/16] librdmacm/rstream: Use snprintf in place of sprintf

2012-05-30 Thread Hefty, Sean
Avoid possible buffer overrun. Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c | 38 +- 1 files changed, 21 insertions(+), 17 deletions(-) diff --git a/examples/rstream.c b/examples/rstream.c index df36e34..054d11e 100644 ---
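
The difference can be seen in a small sketch: snprintf() never writes more than the given size and reports how long the full output would have been. `format_bytes` is a hypothetical helper, not code from rstream.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a byte count into buf without ever
 * writing past len bytes. Returns 1 if the text was truncated. */
static int format_bytes(char *buf, size_t len, unsigned long bytes)
{
	int n = snprintf(buf, len, "%lu bytes", bytes);

	return n < 0 || (size_t)n >= len;
}
```

With an 8-byte buffer and the value 1234567, the buffer ends up holding "1234567" plus the terminating NUL and the helper reports truncation, where sprintf() would have written past the end.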

[PATCH 12/16] librdmacm/rsockets: Simplify state checks

2012-05-30 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com --- src/rsocket.c | 26 +++--- 1 files changed, 11 insertions(+), 15 deletions(-) diff --git a/src/rsocket.c b/src/rsocket.c index ef070a8..b89ef42 100644 --- a/src/rsocket.c +++ b/src/rsocket.c @@ -126,6 +126,9 @@ union

[PATCH 11/16] librdmacm/rs-preload: Use environment variable to set QP size

2012-05-30 Thread Hefty, Sean
Allow the user to specify the size of the send/receive queues and inline data size through environment variables: RS_SQ_SIZE, RS_RQ_SIZE, and RS_INLINE. Signed-off-by: Sean Hefty sean.he...@intel.com --- src/preload.c | 39 +++ 1 files changed, 39
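
Reading such a setting from the environment might look like the sketch below. `env_ulong` is a hypothetical helper and the fallback values are placeholders; the parsing actually done in preload.c may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: read an unsigned setting from the environment,
 * falling back to def when the variable is unset or malformed. */
static unsigned long env_ulong(const char *name, unsigned long def)
{
	const char *val = getenv(name);
	char *end;
	unsigned long n;

	if (!val || !*val)
		return def;
	errno = 0;
	n = strtoul(val, &end, 10);
	if (errno || *end != '\0')
		return def;
	return n;
}
```

For example, with `RS_SQ_SIZE=256` exported in the shell, `env_ulong("RS_SQ_SIZE", def)` returns 256; an unset or non-numeric value yields `def`.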

[PATCH 8/16] librdmacm/rsockets: Define options specific to rsockets

2012-05-30 Thread Hefty, Sean
Allow a user to control some of the RDMA related attributes of an rsocket through setsockopt/getsockopt. A user specifies that the rsocket should be modified through SOL_RDMA level. This patch provides the initial framework. Subsequent patches will add the configurable parameters.

[PATCH 9/16] librdmacm/rsockets: Allow user to specify the QP sizes

2012-05-30 Thread Hefty, Sean
Add setsockopt options that allow the user to specify the desired size of the underlying QP. The provided sizes are used as the maximum size when creating the QP. The actual sizes of the QP are the smaller of the user provided maximum and the maximum sizes supported by the underlying hardware.
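
The sizing rule stated here reduces to taking a minimum. The function name is hypothetical, and `hw_max` is assumed to come from the device's reported limits.

```c
#include <assert.h>

/* The effective size is the smaller of what the user asked for and what
 * the device reports it can support. */
static unsigned int effective_qp_size(unsigned int requested,
				      unsigned int hw_max)
{
	return requested < hw_max ? requested : hw_max;
}
```

A request for 512 entries on hardware limited to 256 therefore yields a 256-entry queue, while a request for 128 is honoured as given.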

RE: [PATCH 2/3] IB/mlx4: Fix max_wqe capacity report for query device

2012-05-29 Thread Hefty, Sean
Did you try out the patches? was it helpful to address the problem you're facing? I have not had time to test it yet

[PATCH 3/4] librdmacm/rstream: Set rsocket nonblocking for base tests

2012-05-16 Thread Hefty, Sean
The base set of rstream tests want nonblocking rsockets, but don't actually set the rsocket to nonblocking. It instead relies on the MSG_DONTWAIT flag. Make the code match the expected behavior and set the rsocket to nonblocking and make nonblocking the default. Provide a test option to switch

[PATCH 2/4] librdmacm/rstream: Always set TCP_NODELAY on rsocket

2012-05-16 Thread Hefty, Sean
The NODELAY option is coupled with whether the socket is blocking or nonblocking. Remove this coupling and always set the NODELAY option. NODELAY currently has no effect on rsockets. Signed-off-by: Sean Hefty sean.he...@intel.com --- examples/rstream.c | 11 ++- 1 files changed, 2

[PATCH 4/4] librdmacm/rstream: Set rsocket nonblocking if set to async operation

2012-05-16 Thread Hefty, Sean
If asynchronous use is specified (use of poll/select), set the rsocket to nonblocking. This matches the common usage case for asynchronous sockets. When asynchronous support is enabled, the nonblocking/blocking test option determines whether the poll/select call will block, or if rstream will

[PATCH 1/4] librdmacm/rsocket: Succeed setsockopt REUSEADDR on connected sockets

2012-05-16 Thread Hefty, Sean
The RDMA CM fails calls to set REUSEADDR on an rdma_cm_id if it is not in the idle state. As a result, this causes a failure in NetPipe when run with socket calls intercepted by rsockets. Fix this by returning success when REUSEADDR is set on an rsocket that has already been connected. When

[PATCH] librdmacm/rsocket: Succeed setsockopt REUSEADDR on connected sockets

2012-05-11 Thread Hefty, Sean
The RDMA CM fails calls to set REUSEADDR on an rdma_cm_id if it is not in the idle state. As a result, this causes a failure in NetPipe when run with socket calls intercepted by rsockets. Fix this by returning success when REUSEADDR is set on an rsocket that has already been connected. When

RE: [PATCH] librdmacm/rsockets: Optimize synchronization to improve performance

2012-05-10 Thread Hefty, Sean
A test that acquired and released a lock 2 billion times reported that the custom lock was roughly 20% faster than using the mutex. 26.6 seconds versus 33.0 seconds. I think you are measuring the fact your call is inlined and pthreads has an indirect jump - because internally pthreads
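For readers wondering what a "custom lock" of this kind might look like, here is a minimal sketch of one common construction: an atomic counter for the uncontended path, with a semaphore used only when there is contention. This is an assumption for illustration; the names (sketch_lock_t and friends) are made up, and the lock in the patch under discussion may differ.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical lock: cnt counts the holder plus any waiters. */
typedef struct {
    sem_t sem;
    atomic_int cnt;
} sketch_lock_t;

static inline int sketch_lock_init(sketch_lock_t *l)
{
    atomic_init(&l->cnt, 0);
    return sem_init(&l->sem, 0, 0);
}

static inline void sketch_lock_acquire(sketch_lock_t *l)
{
    /* Uncontended case: one atomic add, no library call. */
    if (atomic_fetch_add(&l->cnt, 1) > 0)
        sem_wait(&l->sem);   /* someone else holds it: sleep */
}

static inline void sketch_lock_release(sketch_lock_t *l)
{
    int old = atomic_fetch_sub(&l->cnt, 1);

    assert(old >= 1);        /* releasing an unheld lock is a bug */
    if (old > 1)
        sem_post(&l->sem);   /* hand the lock to one waiter */
}
```

Because the fast path is a single inlined atomic operation, a tight acquire/release loop mostly measures call overhead, which is the point raised in the reply about inlining versus the indirect jump in pthreads.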

RE: ib_destroy_cm_id() versus cm callback race ?

2012-04-30 Thread Hefty, Sean
That makes me wonder how it is prevented that two CM callbacks for the same CM ID run concurrently on different CPUs? The callback code ends up looking like this: ret = atomic_inc_and_test(&cm_id_priv->work_count); if (!ret) list_add_tail(&work->list,
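The counter pattern quoted above can be modeled in user space as follows. This is a single-threaded sketch under stated assumptions: the counter is assumed to start at -1 when no callback is running, the names model_id, model_submit and model_finish are invented here, and the list handling that the kernel performs under a lock is reduced to a plain integer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one CM ID. */
struct model_id {
    atomic_int work_count;  /* -1 when no callback is running (assumed) */
    int queued;             /* stand-in for the deferred work list */
};

/* New work arrives. Returns true if the caller should run the callback
 * itself; false means another context is already running callbacks for
 * this ID and the work was queued behind it. */
static bool model_submit(struct model_id *id)
{
    /* Equivalent of inc_and_test: true only on the -1 to 0 transition. */
    bool run_now = (atomic_fetch_add(&id->work_count, 1) + 1) == 0;

    if (!run_now)
        id->queued++;       /* real code protects the list with a lock */
    return run_now;
}

/* A callback finished. Returns true if the same context must now run
 * one more queued item, false if the ID went back to idle. */
static bool model_finish(struct model_id *id)
{
    bool idle = (atomic_fetch_sub(&id->work_count, 1) - 1) < 0;

    if (idle)
        return false;
    assert(id->queued > 0);
    id->queued--;
    return true;
}
```

Only the context that moves the counter from -1 to 0 runs callbacks, and it keeps draining queued work until the counter drops below zero again, so callbacks for one ID never overlap in this model.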

RE: ib_destroy_cm_id() versus cm callback race ?

2012-04-30 Thread Hefty, Sean
Are you sure that only one thread at a time will invoke a CM callback ? As far as I can see cm_recv_handler() queues work without checking whether any other work is ongoing. From drivers/infiniband/core/cm.c: All callbacks for a single ID should be serialized. (I think the listen ID is an

RE: ib_destroy_cm_id() versus cm callback race ?

2012-04-27 Thread Hefty, Sean
If I interpret the source code in drivers/infiniband/core/cm.c correctly ib_destroy_cm_id() can return before an ongoing cm_id callback has finished. Is this on purpose ? If not, isn't there a flush_workqueue(cm.wq) call missing in cm_destroy_id() ? ib_destroy_cm_id() will block while there's

RE: ibstat does not recognize iWARP RNIC adapters

2012-04-26 Thread Hefty, Sean
I noticed that doing ibstat, none of the iWARP RNIC adapters were showing up.  I have attached a patch to address the issue (libibumad.patch). SYS_NODE_TYPE for iWARP RNIC is 4 and is_ib_type only checked to 3. I don't think libibumad should support RNICs. Users can use ibv_devinfo

RE: ibstat does not recognize iWARP RNIC adapters

2012-04-26 Thread Hefty, Sean
Users seem to expect ibstat to show all rdma devices... Then why not change ibstat to use ibverbs or have it gather its data directly? What other functionality does umad provide for RNICs?

RE: ibstat does not recognize iWARP RNIC adapters

2012-04-26 Thread Hefty, Sean
Hal/Sean, I defer to you on whether you think we should add this change to ibstat. If you all recommend against it, we'll advise customers to use ibv_devinfo, which is included with libibverbs and is required for iwarp user apps. But it seems a minimal change to add, and previously it did

[PATCH] rdma/cm: Fix false possible recursive locking

2012-04-25 Thread Hefty, Sean
The following lockdep problem was reported by Or Gerlitz ogerl...@mellanox.com. = [ INFO: possible recursive locking detected ] 3.3.0-32035-g1b2649e-dirty #4 Not tainted - kworker/5:1/418 is trying to acquire

RE: The libibverbs with the verbs extension framework uses reserved word in C++

2012-04-23 Thread Hefty, Sean
We evaluated the verbs extension framework and we noticed that two new attributes in the structures: ibv_device ibv_context contain an attribute called private, which is a reserved word in C++. Changing those attributes to private_data seems like a good idea. Do you want me to
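A small illustration of the naming issue, using a made-up structure rather than the real ibv_device or ibv_context definitions. A member named private is legal C but fails to compile when the header is included from C++, while private_data is accepted by both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure for illustration; not the libibverbs layout. */
struct example_ctx {
    int   fd;
    void *private_data;   /* "void *private;" would break C++ users */
};

/* Hypothetical helper that stores provider-specific data. */
static void example_ctx_set(struct example_ctx *ctx, void *data)
{
    assert(ctx != NULL);
    ctx->private_data = data;
}
```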

RE: [PATCH] [TRIVIAL] ibacm: security fix: replace sprintf with snprintf

2012-04-23 Thread Hefty, Sean
thanks - applied

RE: How to use IB netlink infrastructure

2012-04-23 Thread Hefty, Sean
It's been a few days since I posted this question, but I've had no responses so far. Can I use the IB netlink infrastructure to produce actual RDMA data transfers? no If this is not possible, what is the actual entry point in the IB kernel code that results in a call to get_user_pages()? I

rsockets direct data placement

2012-04-18 Thread Hefty, Sean
I've committed the rsocket implementation in my librdmacm git tree, but I'll be fairly open about wire protocol changes until an actual release. I'd like to start a discussion on the best way to support direct placement of data into an application's buffer with rsockets. I spoke with many

[ANNOUNCE] ibacm release 1.0.6

2012-04-13 Thread Hefty, Sean
ibacm release 1.0.6 is now available from: https://www.openfabrics.org/downloads/rdmacm/ibacm-1.0.6.tar.gz The git shortlog from 1.0.5 is: Dotan Barak (1): After allocation of dynamic memory blocks, check the allocation Hal Rosenstock (4): ib_acme: Fix typo ib_acme: Use IPv4

RE: [RFC] [PATCH 1/4] librdmacm: Define streaming over RDMA interface (rsockets)

2012-04-13 Thread Hefty, Sean
I'm a little slow writing this up, but for anyone interested, see below for details on the wire protocol. +#define RS_QP_CTRL_SIZE 4 4 entries on the send queue are reserved for control messages. (At least 1 is needed to avoid deadlock.) User data is only transferred if there is an
