Re: PATCH: opensm enhancements

2013-07-03 Thread Hal Rosenstock
HI Jeff, On 6/26/2013 5:24 PM, Jeff Becker wrote: Hi Hal. At the OFA workshop, I mentioned that I've been working on some modifications to opensm that we use at NASA. Following extensive testing of these applied to opensm 3.3.13 (the version we run here), I have ported these to top of tree

[PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows: - Make the SRP initiator driver better suited for use in a H.A. setup. Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used either to speed up failover or to avoid device removal when e.g. using

[PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion

2013-07-03 Thread Bart Van Assche
From: Dotan Barak dot...@dev.mellanox.co.il If the add_one callback fails during driver load no resources are allocated so there isn't a need to release any resources. Trying to clean the resource may lead to the following kernel panic: BUG: unable to handle kernel NULL pointer dereference at

[PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error

2013-07-03 Thread Bart Van Assche
The SCSI error handler assumes that the transport layer is operational if an eh_abort_handler() returns SUCCESS. Hence let srp_abort() only return SUCCESS if sending the ABORT TASK task management function succeeded. This patch avoids that the SCSI error handler skips the srp_reset_host() call

[PATCH v3 03/13] IB/srp: Fail I/O fast if target offline

2013-07-03 Thread Bart Van Assche
If reconnecting failed we know that no command completion will be received anymore. Hence let the SCSI error handler fail such commands immediately. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: David Dillow dillo...@ornl.gov Acked-by: Sebastian Riemer

[PATCH v3 04/13] IB/srp: Skip host settle delay

2013-07-03 Thread Bart Van Assche
The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: David Dillow dillo...@ornl.gov Cc:

[PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus

2013-07-03 Thread Bart Van Assche
An SRP target is required to maintain a single connection between initiator and target. This means that if the 'add_target' attribute is used to create a second connection to a target that the first connection will be logged out and that the SCSI error handler will kick in. The SCSI error handler

[PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer

2013-07-03 Thread Bart Van Assche
Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a

[PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
Add the necessary functions in the SRP transport module to allow an SRP initiator driver to implement transport layer error handling similar to the functionality already provided by the FC transport layer. This includes: - Support for implementing fast_io_fail_tmo, the time that should elapse

[PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread Bart Van Assche
Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer. Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com Signed-off-by: Bart Van Assche bvanass...@acm.org

[PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs

2013-07-03 Thread Bart Van Assche
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: David Dillow dillo...@ornl.gov Cc: Roland Dreier rol...@kernel.org Cc: Vu Pham v...@mellanox.com Cc: Sebastian Riemer

[PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery

2013-07-03 Thread Bart Van Assche
Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these three parameters. Signed-off-by: Bart Van Assche bvanass...@acm.org Acked-by: David Dillow dillo...@ornl.gov Cc: Roland

[PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-03 Thread Bart Van Assche
Several InfiniBand HCA's allow to configure the completion vector per queue pair. This allows to spread the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows to
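
Editor's illustration (not part of the patch): the snippet above describes spreading completion interrupts over several MSI-X vectors. A minimal sketch of one way to do that, assuming a simple round-robin policy, is shown below. The function name and both parameters are hypothetical; the patch itself makes the vector configurable and may choose it differently.

```c
/*
 * Hypothetical helper: pick a completion vector for the n-th connection
 * by cycling through the vectors the HCA exposes. Falls back to vector 0
 * when the reported vector count is not positive.
 */
int example_pick_comp_vector(unsigned int conn_index, int num_comp_vectors)
{
	if (num_comp_vectors <= 0)
		return 0;
	return (int)(conn_index % (unsigned int)num_comp_vectors);
}
```

With four vectors, connections 0, 1, 2, 3, 4, 5 would land on vectors 0, 1, 2, 3, 0, 1, so completion work is shared across whichever CPU cores service those vectors.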

[PATCH v3 12/13] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread Bart Van Assche
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count helps with reducing path failover time. [bvanassche: Rewrote patch description / changed default retry count from 2 back to 7] Signed-off-by: Vu Pham

[PATCH v3 13/13] IB/srp: Bump driver version and release date

2013-07-03 Thread Bart Van Assche
Signed-off-by: Vu Pham v...@mellanox.com Signed-off-by: Bart Van Assche bvanass...@acm.org Cc: Roland Dreier rol...@purestorage.com Cc: David Dillow dillo...@ornl.gov Cc: Sebastian Riemer sebastian.rie...@profitbricks.com --- drivers/infiniband/ulp/srp/ib_srp.c |4 ++-- 1 file changed, 2

Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Or Gerlitz
On 03/07/2013 15:41, Bart Van Assche wrote: [...] Bart, The individual patches in this series are as follows: 0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch 0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch

Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote: Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer. Reported-by: Sebastian Riemer

Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread David Dillow
On Tue, 2013-07-02 at 13:18 -0600, Jason Gunthorpe wrote: On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote: You assume independent failures, which is suspect -- many times these are data-dependent, or so I tend to think. Jason, do you have any insight on this (overall) topic you

Re: [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:59 +0200, Bart Van Assche wrote: Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count helps with reducing path failover time. [bvanassche: Rewrote patch description / changed default retry

Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Bart Van Assche
On 07/03/13 15:38, Or Gerlitz wrote: Some of these patches were already picked by Roland (SB), I would suggest that you post V4 and drop the ones which were accepted. One of the patches that is already in Roland's tree and that was in v1 of this series has been split into two patches in v2

Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread Bart Van Assche
On 07/03/13 16:08, David Dillow wrote: On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote: Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer. Reported-by:

Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote: Having it in the caller has the advantage that the compiler can optimize the shift operation out because the number that is being shifted left is a constant. srp_finish_req() is likely to be inlined, so the compiler will be able to

Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 10:57 -0400, David Dillow wrote: On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote: Having it in the caller has the advantage that the compiler can optimize the shift operation out because the number that is being shifted left is a constant.

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote: +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo) +{ + return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 || + fast_io_fail_tmo < dev_loss_tmo) && + fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT && +
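
Editor's illustration (not the patch code): the quoted function checks a pair of timeouts for consistency. The sketch below spells out that kind of check as separate steps, assuming that a negative value means the timer is disabled and that, when both timers are enabled, fast_io_fail_tmo must be smaller than dev_loss_tmo. The name example_tmo_valid and the two EXAMPLE_ constants are hypothetical stand-ins, and their values are assumptions rather than the kernel's.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for SCSI_DEVICE_BLOCK_MAX_TIMEOUT; value assumed. */
#define EXAMPLE_BLOCK_MAX_TIMEOUT 600
/* Hypothetical stand-in for the kernel tick rate HZ; value assumed. */
#define EXAMPLE_HZ 1000

/*
 * Return 0 if the timeout pair is acceptable, -EINVAL otherwise.
 * A negative timeout means "timer disabled" in this sketch.
 */
int example_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* The fast I/O fail timeout may not exceed the block limit. */
	if (fast_io_fail_tmo > EXAMPLE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;
	/* dev_loss_tmo must convert to ticks without overflowing a long. */
	if (dev_loss_tmo >= LONG_MAX / EXAMPLE_HZ)
		return -EINVAL;
	/* With both timers enabled, I/O must fail before the device is lost. */
	if (fast_io_fail_tmo >= 0 && dev_loss_tmo >= 0 &&
	    fast_io_fail_tmo >= dev_loss_tmo)
		return -EINVAL;
	return 0;
}
```

Under these assumptions, example_tmo_valid(15, 60) and example_tmo_valid(-1, 60) are accepted, while example_tmo_valid(60, 15) is rejected because I/O would only fail after the device had already been removed.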

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
On 07/03/13 17:14, David Dillow wrote: On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote: +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo) +{ + return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 || + fast_io_fail_tmo < dev_loss_tmo) && +

Re: PATCH: opensm enhancements

2013-07-03 Thread Jeff Becker
Hi Hal, I have some testing info about the second patch below. On 07/03/2013 03:23 AM, Hal Rosenstock wrote: HI Jeff, On 6/26/2013 5:24 PM, Jeff Becker wrote: Hi Hal. At the OFA workshop, I mentioned that I've been working on some modifications to opensm that we use at NASA. Following

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
On 01/07/2013 20:49, Roland Dreier wrote: - I think the active flag for the health check timer is unnecessary. It can just be stopped with del_timer_sync(). Hi Roland Jack looked on this comment/code and he says that the active flag is used to prevent re-scheduling the timer from inside the

[PATCH opensm] Add flags to OSM_EVENT_ID_UCAST_ROUTING_DONE

2013-07-03 Thread Hal Rosenstock
to be able to discern between ucast routing done when rerouting versus heavy sweep. Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/include/opensm/osm_event_plugin.h b/include/opensm/osm_event_plugin.h index 6b060e7..ca5a719 100644 --- a/include/opensm/osm_event_plugin.h +++

[PATCH V2 4/9] IB/core: Add reserved values to enums for low-level drivers use

2013-07-03 Thread Or Gerlitz
From: Jack Morgenstein ja...@dev.mellanox.co.il Continue the approach taken by commit d2b57063e4a IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use and reserved entries to the ib_qp_type and ib_wr_opcode enums. The low-level drivers will then define macros to use these

[PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/ah.c | 95 drivers/infiniband/hw/mlx5/cq.c | 844 + drivers/infiniband/hw/mlx5/doorbell.c | 100 drivers/infiniband/hw/mlx5/mad.c

[PATCH V2 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- MAINTAINERS | 10 ++ drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile |1 + drivers/infiniband/hw/mlx5/Kconfig | 10 ++

[PATCH V2 0/9] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
Hi Roland, all Here's V2 of the driver, with Dave's and Roland's comments addressed, looking forward to see if we have OK from Roland to merge that into 3.11 Jack, Moshe and Or. changes from V1: - Addressed Dave Miller's comments: * Local variables in functions listed from longest to

[PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com Signed-off-by: Eli Cohen e...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 547 ++ drivers/infiniband/hw/mlx5/mr.c | 1021 ++ 2 files changed, 1568 insertions(+), 0 deletions(-) create mode 100644

Re: rtnl_lock deadlock on 3.10

2013-07-03 Thread Shawn Bohrer
On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote: On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote: On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote: On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa han...@stressinduktion.org wrote:

Re: PATCH: opensm enhancements

2013-07-03 Thread Hal Rosenstock
Hi again Jeff, On 7/3/2013 12:20 PM, Jeff Becker wrote: Hi Hal, I have some testing info about the second patch below. On 07/03/2013 03:23 AM, Hal Rosenstock wrote: HI Jeff, On 6/26/2013 5:24 PM, Jeff Becker wrote: Hi Hal. At the OFA workshop, I mentioned that I've been working on some

Re: rtnl_lock deadlock on 3.10

2013-07-03 Thread Or Gerlitz
On 03/07/2013 20:22, Shawn Bohrer wrote: On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote: On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote: On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote: On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: On 07/03/13 17:14, David Dillow wrote: On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote: +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo) +{ + return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 || +

[PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion had...@mellanox.com Signed-off-by: Or Gerlitz ogerl...@mellanox.com --- drivers/infiniband/core/uverbs.h |

[PATCH V3 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-07-03 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com Add Infra-structure to support extended uverbs capabilities in a forward/backward manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater or equal to IB_USER_VERBS_CMD_THRESHOLD. They have new header format and

[PATCH V3 for-next 0/4] Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
Hi Roland, all V3 addresses the comments made by Sean. There are still some concerns/questions posed by Roland on the uverbs extensions element of the series. I have posted replies for them, but so far no further comments were made. V3 changes: - Addressed comments from Sean: - modified

[PATCH V3 for-next 4/4] IB/mlx4: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com Implement ib_create_flow and ib_destroy_flow. Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides 64 bit

[PATCH V3 for-next 1/4] IB/core: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs, for which plain Ethernet packets are used, specifically packets which don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a

[PATCH] IB/qib: fix module level leak

2013-07-03 Thread Mike Marciniszyn
The vzalloc()'ed field physshadow is leaked on module unload. This patch adds vfree after the sibling page shadow is freed. Reported-by: Dean Luick dean.lu...@intel.com Reviewed-by: Dean Luick dean.lu...@intel.com Signed-off-by: Mike Marciniszyn mike.marcinis...@intel.com ---

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Bart Van Assche
On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable to the target was pulled. I/O resumed shortly after the cable

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread David Dillow
On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote: On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was detected shortly after the cable

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Roland Dreier
On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz ogerl...@mellanox.com wrote: Jack looked on this comment/code and he says that the active flag is used to prevent re-scheduling the timer from inside the timer handling routine. In the kernel, the comment header in the source file for del_timer_sync

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
On Wed, Jul 3, 2013 at 10:26 PM, Roland Dreier rol...@kernel.org wrote: On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz ogerl...@mellanox.com wrote: Jack looked on this comment/code and he says that the active flag is used to prevent re-scheduling the timer from inside the timer handling routine.

Re: [PATCH V2 1/9] net/mlx5: Mellanox Connect-IB, core driver part 1/3

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote: From: Eli Cohen e...@mellanox.com trivial comments: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c [] +static const char *deliv_status_to_str(u8 status) +{ + switch

[PATCH v3 05/25] infiniband: Change how dentry's d_lock field is accessed

2013-07-03 Thread Waiman Long
Because of the changes made in dcache.h header file, files that use the d_lock field of the dentry structure need to be changed accordingly. All the d_lock's spin_lock() and spin_unlock() calls are replaced by the corresponding d_lock() and d_unlock() calls. There is no change in logic and

Re: [PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote: From: Eli Cohen e...@mellanox.com more trivia: diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c [] +struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr, +struct mlx5_ib_ah *ah) +{ +

Re: [PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-03 Thread Joe Perches
On Wed, 2013-07-03 at 20:13 +0300, Or Gerlitz wrote: From: Eli Cohen e...@mellanox.com More trivia: diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h [] +#define mlx5_ib_dbg(dev, format, arg...) \ +do {

Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling

2013-07-03 Thread Vu Pham
David Dillow wrote: On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote: On 07/03/13 19:27, David Dillow wrote: On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote: The combination of dev_loss_tmo off and reconnect_delay 0 worked fine in my tests. An I/O failure was

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-03 Thread Jeff Squyres (jsquyres)
Bump. On Jul 2, 2013, at 8:31 AM, Jeff Squyres jsquy...@cisco.com wrote: (Previous patch did not include updates for the man pages) Keep IBV_MTU_* enums values as they are, but pass MTU values around as a struct containing a single int. Per lengthy discussion on the linux-rdma list,