Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Jack Wang
Hi Or, I managed to update the kernel to OFED 3.0 to verify the bug, but I can still reproduce it; maybe some synchronize_irq() call is still missing? Thanks Jack 2015-07-08 16:07 GMT+02:00 Jack Wang xjtu...@gmail.com: Thanks for your time. It looks like the last one is missing in OFED 2.4

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Jack Wang
2015-07-09 13:21 GMT+02:00 Or Gerlitz ogerl...@mellanox.com: On 7/9/2015 2:14 PM, Jack Wang wrote: I managed to update the kernel to OFED 3.0 to verify the bug, but I can still reproduce it; maybe some synchronize_irq() call is still missing? Again, even if you don't use the upstream

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Jack Wang
: Fatal exception in interrupt Best regards, Jack Wang

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Jack Wang
PM, Jack Wang wrote: static void mlx4_ib_cq_comp(struct mlx4_cq *cq) 47 { 48 struct ib_cq *ibcq = &to_mibcq(cq)->ibcq; 49 ibcq->comp_handler(ibcq, ibcq->cq_context); 50 } Looks like a CQ use-after-free? I have no idea where. See if you have in the code base you're using (why not the stock
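Editor's sketch: one common way to rule out this kind of use-after-free is to make the completion path hold a reference on the CQ while the callback runs, so that only the last holder frees it. The following single-threaded userspace model illustrates that ownership rule. All names (`struct cq_obj`, `cq_create`, `cq_put`, `cq_dispatch`) are hypothetical; this is not the mlx4 code, which relies on kernel primitives such as synchronize_irq() mentioned in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical CQ object, not the mlx4 structure. */
struct cq_obj {
    atomic_int refcount;   /* owner's reference + one per running handler */
    void (*comp_handler)(struct cq_obj *cq, void *ctx);
    void *ctx;
};

static int cq_frees;       /* counts actual frees, for illustration only */

static struct cq_obj *cq_create(void (*handler)(struct cq_obj *, void *), void *ctx)
{
    struct cq_obj *cq = malloc(sizeof(*cq));

    if (!cq)
        return NULL;
    atomic_init(&cq->refcount, 1);   /* the owner's reference */
    cq->comp_handler = handler;
    cq->ctx = ctx;
    return cq;
}

/* Drop one reference; whoever drops the last one frees the object. */
static void cq_put(struct cq_obj *cq)
{
    assert(cq != NULL);
    if (atomic_fetch_sub(&cq->refcount, 1) == 1) {
        free(cq);
        cq_frees++;
    }
}

/* Completion path: pin the CQ for the duration of the callback so that the
 * owner's cq_put() cannot free it underneath the handler. */
static void cq_dispatch(struct cq_obj *cq)
{
    atomic_fetch_add(&cq->refcount, 1);
    cq->comp_handler(cq, cq->ctx);
    cq_put(cq);
}

/* Example handler: just counts completions. */
static void count_completions(struct cq_obj *cq, void *ctx)
{
    (void)cq;
    (*(int *)ctx)++;
}

/* Example handler simulating the owner tearing the CQ down mid-callback. */
static void teardown_in_handler(struct cq_obj *cq, void *ctx)
{
    (*(int *)ctx)++;
    cq_put(cq);   /* owner's reference dropped; dispatch's reference keeps cq alive */
}
```

In a real driver the lookup that finds the CQ and takes the reference must itself be serialized against teardown (for example under a table lock); this sketch only shows who is allowed to free.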

Re: mlx4: IB_EVENT_PATH_MIG not generated on path migration

2015-02-10 Thread Jack Wang
2015-02-10 19:00 GMT+01:00 Jason Gunthorpe jguntho...@obsidianresearch.com: On Tue, Feb 10, 2015 at 04:56:43PM +0100, Fabian Holler wrote: Does anybody have an idea what could be wrong? Are the PATH_MIG* notifications with mlx4 drivers working for anybody? IIRC rdmacm does not forward

Re: First byte of private_data transfered with rdma_connect() is always 0

2014-12-16 Thread Jack Wang
Found the bug. commit 56e620c453f2588cfc9898a41b110477f6417a5d Author: Jack Wang jinpu.w...@profitbricks.com Date: Tue Dec 16 15:44:17 2014 +0100 RDMA/cma: fix first byte overwritten for AF_IB If the user attaches private data for AF_IB, the first byte will be overwritten, because we

[PATCH] RDMA/cma: fix first byte overwritten for AF_IB

2014-12-16 Thread Jack Wang
If the user attaches private data for AF_IB, the first byte will be overwritten, because we always set the CMA version regardless of whether the family is AF_IB; so move the version assignment inside the if condition. Reported-by: Fabian Holler fabian.hol...@profitbricks.com Signed-off-by: Jack Wang jinpu.w...@profitbricks.com
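Editor's sketch: the ordering bug described in the commit message can be modelled in userspace. For AF_IB no CMA header precedes the user's private data, so an unconditional write of the version byte lands on the user's first byte. Everything here is hypothetical (`format_hdr_buggy`, `format_hdr_fixed`, `FAKE_HDR_LEN`, and the version value, assumed to be 0 to match the "always 0" symptom in the thread subject); the real code in the kernel's RDMA CM differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum fake_family { FAKE_AF_INET, FAKE_AF_IB };

#define FAKE_CMA_VERSION 0x00  /* assumed value of the version byte */
#define FAKE_HDR_LEN     4     /* hypothetical header size for non-AF_IB */

/* Hypothetical: a header is prepended only for non-AF_IB families. */
static size_t user_data_offset(enum fake_family fam)
{
    return fam == FAKE_AF_IB ? 0 : FAKE_HDR_LEN;
}

/* Buggy ordering: the version byte is written before checking the family. */
static void format_hdr_buggy(uint8_t *buf, enum fake_family fam)
{
    buf[0] = FAKE_CMA_VERSION;
    if (fam != FAKE_AF_IB)
        buf[1] = 0xAA;          /* stand-in for address fields */
}

/* Fixed ordering: header fields, version included, are written only
 * when a header actually exists. */
static void format_hdr_fixed(uint8_t *buf, enum fake_family fam)
{
    if (fam != FAKE_AF_IB) {
        buf[0] = FAKE_CMA_VERSION;
        buf[1] = 0xAA;
    }
}

/* Build the wire buffer: [optional header][user private data]. */
static size_t build_private_data(uint8_t *out, size_t out_len,
                                 const uint8_t *user, size_t user_len,
                                 enum fake_family fam,
                                 void (*format)(uint8_t *, enum fake_family))
{
    size_t off = user_data_offset(fam);

    assert(off + user_len <= out_len);
    memset(out, 0, off + user_len);
    memcpy(out + off, user, user_len);
    format(out, fam);
    return off + user_len;
}
```

With `format_hdr_buggy` and `FAKE_AF_IB`, the first user byte reads back as 0; with `format_hdr_fixed` it is preserved.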

Re: [Users] Linux kernel: Crash of IB peer in RC mode is not detected

2014-10-24 Thread Jack Wang
Thanks, Roland, for clearing up our confusion. So it looks like a ping-pong mechanism is the way to go. Regards, Jack 2014-10-23 20:43 GMT+02:00 Roland Dreier rol...@purestorage.com: On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang xjtu...@gmail.com wrote: I expected that RDMA-Write operations will fail
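Editor's sketch: a ping-pong (heartbeat) scheme declares the peer dead after a number of consecutive unanswered pings. The bookkeeping can be as small as the following, assuming a hypothetical `struct hb_state`, a caller that invokes `hb_tick()` once per interval and sends a ping when it returns true, and `hb_pong()` whenever a reply arrives. Sending the ping itself (e.g. a small RDMA send) is left out.

```c
#include <assert.h>
#include <stdbool.h>

struct hb_state {
    unsigned int outstanding;   /* pings sent since the last pong */
    unsigned int max_missed;    /* threshold before declaring the peer dead */
    bool peer_dead;
};

static void hb_init(struct hb_state *hb, unsigned int max_missed)
{
    assert(max_missed > 0);
    hb->outstanding = 0;
    hb->max_missed = max_missed;
    hb->peer_dead = false;
}

/* Called once per heartbeat interval; returns true if a ping should be sent. */
static bool hb_tick(struct hb_state *hb)
{
    if (hb->peer_dead)
        return false;
    if (hb->outstanding >= hb->max_missed) {
        hb->peer_dead = true;   /* too many unanswered pings */
        return false;
    }
    hb->outstanding++;
    return true;
}

/* Called when a pong arrives from the peer. */
static void hb_pong(struct hb_state *hb)
{
    hb->outstanding = 0;
}
```

The threshold trades detection latency against false positives on a briefly congested fabric.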

Re: [Users] Linux kernel: Crash of IB peer in RC mode is not detected

2014-10-24 Thread Jack Wang
to register for SM traps in a kernel module? Regards, Jack 2014-10-23 20:43 GMT+02:00 Roland Dreier rol...@purestorage.com: On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang xjtu...@gmail.com wrote: I expected that RDMA-Write operations will fail if the other crashes. Also I hoped that an event

Re: [Users] Linux kernel: Crash of IB peer in RC mode is not detected

2014-10-23 Thread Jack Wang
Cc'ing linux-rdma, which is more appropriate for this kind of question. 2014-10-23 15:15 GMT+02:00 Fabian Holler fabian.hol...@profitbricks.com: Hello, we are implementing Linux kernel modules that are transferring data with RDMA-Write operations via an RC-connection between 2 hosts. After the

Re: Unable to compile RDMA-aware examples in C

2014-09-29 Thread Jack Wang
Yes, you need to install some libraries, e.g. on Red Hat (RHEL 6.4 and above):
yum groupinstall "Infiniband Support"
yum install libtool autoconf automake
yum install infiniband-diags perftest libibverbs-utils librdmacm-utils
yum install librdmacm-devel libibverbs-devel numactl numactl-devel libaio-devel

[BUG]GPF in skb_release_data+0xa8/0x100

2014-05-27 Thread Jack Wang
Hi, We hit the GPF in skb_release_data+0xa8/0x100 below in production: (gdb) list *skb_release_data+0xa8 0x81528118 is in skb_release_data (net/core/skbuff.c:399). 394 */ 395 if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) { 396

Re: [PATCH 0/9] SRP initiator patches for kernel 3.16

2014-05-06 Thread Jack Wang
On 05/06/2014 02:49 PM, Bart Van Assche wrote: This patch series consists of one patch that adds fast registration support to the SRP initiator and eight preparation patches: 0001-IB-srp-Fix-kernel-doc-warnings.patch 0002-IB-srp-Introduce-an-additional-local-variable.patch

git.openfabrics.org is down?

2014-03-31 Thread Jack Wang
Hi all, When I try to clone a repo, git errors out with: fatal: unable to connect to git.openfabrics.org: git.openfabrics.org[0: 69.55.231.74]: errno=Connection refused. The git web pages also no longer exist; they redirect to Bugzilla. Does anyone know why? Regards, Jack

Re: git.openfabrics.org is down?

2014-03-31 Thread Jack Wang
On 03/31/2014 03:23 PM, Weiny, Ira wrote: From a post to OFA EWG list: The System Admin is in the process of moving everything from Ubuntu to RHEL 6.5 - it may take a day to complete. Sorry for the inconvenience. Ira Thanks Ira. -----Original Message----- From: Jack Wang [xjtu

Re: SRP initiator driver maintainership

2014-01-21 Thread Jack Wang
On 01/21/2014 11:03 AM, Sagi Grimberg wrote: On 1/20/2014 7:37 PM, Bart Van Assche wrote: On 01/03/14 22:16, David Dillow wrote: Today was my last day at ORNL, and my future endeavors will leave even less time to maintain the SRP initiator. My thanks especially go to Bart, for keeping the

[PATCHv3] IB/srp: add change_queue_depth and change_queue_type support

2013-11-07 Thread Jack Wang
Hi Roland, Could you include this patch? I've updated it with the Tested-by from Bart and the Acked-by from David. From e8d655fec4ac74a6af6a0b2471173c74e7c13f51 Mon Sep 17 00:00:00 2001 From: Jack Wang jinpu.w...@profitbricks.com Date: Mon, 26 Aug 2013 15:50:03 +0200 Subject: [PATCH] IB/srp: add

Re: [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-10-31 Thread Jack Wang
On 10/31/2013 01:24 PM, Sagi Grimberg wrote: Support create_mr and destroy_mr verbs. Creating ib_mr may be done for either ib_mr that will register regular page lists like alloc_fast_reg_mr routine, or indirect ib_mr's that can register other (pre-registered) ib_mr's in an indirect manner.

Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-10-31 Thread Jack Wang
Hi Sagi, I wonder what the performance overhead of this DIF support is. And is there a roadmap for supporting DIF in SRP/iSER and on the target side? Regards, Jack On 10/31/2013 01:24 PM, Sagi Grimberg wrote: This patchset introduces Verbs level support for the signature handover feature. Signature is

Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-10-31 Thread Jack Wang
On 10/31/2013 01:24 PM, Sagi Grimberg wrote: +{ + struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr; + u32 sig_key = sig_mr->rkey; + + memset(seg, 0, sizeof(*seg)); + + seg->status = 0x4; /*set free*/ + seg->flags = get_umr_flags(wr->wr.sig_handover.access_flags) | +

Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-10-31 Thread Jack Wang
On 10/31/2013 02:20 PM, Sagi Grimberg wrote: On 10/31/2013 2:55 PM, Jack Wang wrote: Hi Sagi, I wonder what the performance overhead of this DIF support is. And is there a roadmap for supporting DIF in SRP/iSER and on the target side? Regards, Jack Well, all DIF operations are fully offloaded
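Editor's note: for context, T10 DIF protection information is an 8-byte tuple per logical block: a 2-byte guard tag, a 2-byte application tag and a 4-byte reference tag. The guard is a CRC-16 with polynomial 0x8BB7, which is the computation such an offload takes off the CPU. A bitwise sketch (slow but simple; production code uses tables or hardware), with `crc_t10dif_bitwise` as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection, no final XOR. */
static uint16_t crc_t10dif_bitwise(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);          /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```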

Re: [PATCH 10/10] IB/srp: Make queue size configurable

2013-10-11 Thread Jack Wang
On 10/10/2013 02:19 PM, Bart Van Assche wrote: Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Hello Bart, It's better to mention user

Re: [PATCH 2/3] IB/srp: Avoid offlining operational SCSI devices

2013-10-10 Thread Jack Wang
Dreier rol...@kernel.org Cc: Vu Pham vuhu...@mellanox.com Cc: Sebastian Riemer sebastian.rie...@profitbricks.com Cc: Jack Wang jinpu.w...@profitbricks.com Cc: sta...@vger.kernel.org --- drivers/infiniband/ulp/srp/ib_srp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

Re: [PATCH 1/3] IB/srp: Remove target from list before freeing Scsi_Host structure

2013-10-10 Thread Jack Wang
: Roland Dreier rol...@kernel.org Cc: Sebastian Riemer sebastian.rie...@profitbricks.com Cc: Jack Wang jinpu.w...@profitbricks.com Cc: sta...@vger.kernel.org --- drivers/infiniband/ulp/srp/ib_srp.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers

Re: [PATCH 1/3] IB/srp: Remove target from list before freeing Scsi_Host structure

2013-10-10 Thread Jack Wang
On 10/10/2013 02:48 PM, Bart Van Assche wrote: On 10/10/13 14:45, Jack Wang wrote: On 10/10/2013 01:50 PM, Bart Van Assche wrote: From: Vu Pham vuhu...@mellanox.com Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because

[PATCH]SRP: fix task management handle in srp

2013-09-27 Thread Jack Wang
that returns a non-zero status code in the STATUS field. The STATUS field contains the status of a task that completes. Patch made against v3.12-rc1 From 5f5af6de8dd72e37448841b7d7735d3eea4d3d83 Mon Sep 17 00:00:00 2001 From: Jack Wang jinpu.w...@profitbricks.com Date: Fri, 27 Sep 2013 11:10:05 +0200

Re: [PATCH]SRP: fix task management handle in srp

2013-09-27 Thread Jack Wang
On 09/27/2013 12:30 PM, Bart Van Assche wrote: On 09/27/13 11:20, Jack Wang wrote: Hi all, Currently the handling of srp_rsp for task management is broken. In section 6.9 of T10/1415-D revision 16a: SRP_RSP responses that contain either RESPONSE DATA or SENSE DATA shall be sent as the minimum length

Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-09-13 Thread Jack Wang
On 09/12/2013 06:30 PM, Bart Van Assche wrote: On 09/12/13 18:16, Jack Wang wrote: On 09/12/2013 12:16 AM, David Dillow wrote: On Tue, 2013-09-10 at 19:44 +0200, Bart Van Assche wrote: If this name was not yet in use in any interface that is visible in user space, I would agree that we

Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-09-13 Thread Jack Wang
On 09/13/2013 11:24 AM, Bart Van Assche wrote: On 09/13/13 10:40, Bart Van Assche wrote: On 09/13/13 10:06, Jack Wang wrote: On 09/12/2013 06:30 PM, Bart Van Assche wrote: On 09/12/13 18:16, Jack Wang wrote: On 09/12/2013 12:16 AM, David Dillow wrote: On Tue, 2013-09-10 at 19:44 +0200, Bart

Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-09-13 Thread Jack Wang
On 09/13/2013 03:33 PM, Bart Van Assche wrote: On 09/13/13 14:25, Jack Wang wrote: I tried your srp-ha branch on GitHub; echoing the string SRP2=id_ext=${THCA2_GUID},ioc_guid=${THCA2_GUID},dgid=${TGID_P2},pkey=${PKEY},service_id=${THCA2_GUID},can_queue=512 to add_target failed with ib_srp
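Editor's sketch: the add_target string is a comma-separated list of key=value pairs. The following extracts one numeric option such as can_queue from such a string. It is not the ib_srp parser (which uses the kernel's match_token() helpers); `parse_uint_opt` is a hypothetical name, and the sketch does not strip the trailing newline that a plain `echo` appends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Look up "key=<unsigned>" in a comma-separated option string.
 * Returns true and stores the value on success. */
static bool parse_uint_opt(const char *opts, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *p = opts;

    assert(opts != NULL && key != NULL && out != NULL);
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            char *num_end;
            unsigned long v = strtoul(p + klen + 1, &num_end, 10);

            /* Reject empty values and trailing garbage inside this field. */
            if (num_end == p + klen + 1 || num_end != p + len)
                return false;
            *out = v;
            return true;
        }
        if (!end)
            break;
        p = end + 1;   /* advance to the next key=value field */
    }
    return false;
}
```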

Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-09-12 Thread Jack Wang
On 09/12/2013 12:16 AM, David Dillow wrote: On Tue, 2013-09-10 at 19:44 +0200, Bart Van Assche wrote: If this name was not yet in use in any interface that is visible in user space, I would agree that we should come up with a better name. However, the SCSI mid-layer already uses that name

Re: [PATCHv2] IB/srp: add change_queue_depth and change_queue_type support

2013-09-09 Thread Jack Wang
On 08/28/2013 10:19 AM, Jack Wang wrote: Hi, Below is a new version of the patch, which addresses comments from Bart. Hi Roland, Could you include this in your tree, or do you need me to resend it? Best regards, Jack From 62d87b4d546066b251e44a3cb468b16783df7ee4 Mon Sep 17 00:00:00 2001 From

[PATCHv2] IB/srp: add change_queue_depth and change_queue_type support

2013-08-28 Thread Jack Wang
Hi, Below is a new version of the patch, which addresses comments from Bart. From 62d87b4d546066b251e44a3cb468b16783df7ee4 Mon Sep 17 00:00:00 2001 From: Jack Wang jinpu.w...@profitbricks.com Date: Mon, 26 Aug 2013 15:50:03 +0200 Subject: [PATCH] IB/srp: add change_queue_depth/change_queue_type support

Re: [PATCH] IB/srp: add change_queue_depth and change_queue_type support

2013-08-27 Thread Jack Wang
On 08/27/2013 10:31 AM, Bart Van Assche wrote: On 08/26/13 15:53, Jack Wang wrote: From: Jack Wang jinpu.w...@profitbricks.com Date: Mon, 26 Aug 2013 15:50:03 +0200 Subject: [PATCH] IB/srp: add change_queue_depth/change_queue_type support Signed-off-by: Jack Wang jinpu.w...@profitbricks.com

Re: [PATCH] IB/srp: add change_queue_depth and change_queue_type support

2013-08-27 Thread Jack Wang
snip This code seems incorrect to me for the SRP protocol. In the SRP protocol, although there is no TCQ support, queue depths above one are supported. I also have a more general remark. There is no TCQ support in the SRP protocol, which means that sdev->tagged_supported is always 0 (false).

[PATCH] IB/srp: add change_queue_depth and change_queue_type support

2013-08-26 Thread Jack Wang
The attached patch adds change_queue_depth/change_queue_type support to the SRP driver, as most modern SCSI host drivers do. From 10445c9fd9e24d03269e43680bcd2504c713b622 Mon Sep 17 00:00:00 2001 From: Jack Wang jinpu.w...@profitbricks.com Date: Mon, 26 Aug 2013 15:50:03 +0200 Subject
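Editor's sketch: at its core, a change_queue_depth callback bounds the requested depth before applying it. The following shows only that clamping step, using a hypothetical `clamp_queue_depth()` helper with `max_depth` standing in for the host's limit (e.g. can_queue). Per Bart's remark in the thread, SRP supports depths above one even without TCQ, so the sketch deliberately does not force depth 1 for untagged devices. The real callback then hands the result to the SCSI mid-layer.

```c
#include <assert.h>

/* Hypothetical helper: bound a requested queue depth to [1, max_depth]. */
static int clamp_queue_depth(int requested, int max_depth)
{
    assert(max_depth >= 1);
    if (requested < 1)
        return 1;
    if (requested > max_depth)
        return max_depth;
    return requested;
}
```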

Re: [PATCH] IPoIB: Fix race in deleting ipoib_neigh entries

2013-08-13 Thread Jack Wang
On 08/13/2013 09:54 AM, Or Gerlitz wrote: On 09/08/2013 03:44, Jim Foraker wrote: In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while

RE: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Jack Wang
I'm not sure it's possible to avoid such a race without introducing a new mutex. How about something like the (untested) SCSI core patch below, and invoking scsi_block_eh() and scsi_unblock_eh() around any reconnect activity not initiated from the SCSI EH thread ? [PATCH] Add

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Jack Wang
On 06/24/2013 05:50 PM, Bart Van Assche wrote: On 06/24/13 15:48, Jack Wang wrote: I'm not sure it's possible to avoid such a race without introducing a new mutex. How about something like the (untested) SCSI core patch below, and invoking scsi_block_eh() and scsi_unblock_eh() around any

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-21 Thread Jack Wang
On 06/19/2013 05:27 PM, Bart Van Assche wrote: On 06/19/13 15:44, Jack Wang wrote: +/* + * It can occur that after fast_io_fail_tmo expired and before + * dev_loss_tmo expired that the SCSI error handler has + * offlined one or more devices. doesn't

RE: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-19 Thread Jack Wang
+ /* + * It can occur that after fast_io_fail_tmo expired and before + * dev_loss_tmo expired that the SCSI error handler has + * offlined one or more devices. scsi_target_unblock() doesn't + * change the state of these devices

RE:[PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-12 Thread Jack Wang
An SRP target is required to maintain a single connection between initiator and target. This means that if the 'add_target' attribute is used to create a second connection to a target that the first connection will be logged out and that the SCSI error handler will kick in. The SCSI error

Warning about possible recursive locking detected in IPoIB

2013-05-23 Thread Jack Wang
Hi Or, I saw the warning below when enabling CONFIG_DEBUG_MUTEXES: May 21 08:56:32 ib2 kernel: [ 44.738725] = May 21 08:56:32 ib2 kernel: [ 44.738782] [ INFO: possible recursive locking detected ] May 21 08:56:32 ib2 kernel: [

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Jack Wang
On 05/21/2013 05:19 PM, Jack Wang wrote: On 05/21/2013 02:51 PM, Sebastian Riemer wrote: On 17.05.2013 16:16, Jack Wang wrote: unable to handle kernel paging request Hi Jack, this should be related to the list corruption in IPoIB as list_del() sets the LIST_POISON1 and LIST_POISON2

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Jack Wang
On 2013-05-23 19:41, Doug Ledford wrote: On 05/23/2013 11:38 AM, Jack Wang wrote: Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer ^^^ I would try a newer kernel. There are a couple known issues fixed since this kernel (including a memory

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-21 Thread Jack Wang
On 05/21/2013 02:51 PM, Sebastian Riemer wrote: On 17.05.2013 16:16, Jack Wang wrote: unable to handle kernel paging request Hi Jack, this should be related to the list corruption in IPoIB as list_del() sets the LIST_POISON1 and LIST_POISON2 pointers. Referencing these results in page

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
Hi Jack, I don't understand what the current status is, that is, what do you see now after applying the patches? If you don't get the original bug, why did you give the trace of it? Or is it a new trace? It is not clear from your mail. Please add only the trace of the current issue.

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013-05-20 21:00, Or Gerlitz wrote: On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: Sorry for the confusion. The current list corruption is gone in my preliminary test after I changed list_del to list_del_init as Or suggested. As Or asked for the original bug, so
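Editor's sketch: the difference between the two helpers explains why the change helps. list_del() leaves the removed entry's pointers poisoned, so a later list_del() on the same entry dereferences the poison values, whereas list_del_init() re-links the entry to itself, so list_empty() on it is true and a repeated removal is harmless. The following userspace re-implementation of just those helpers is for illustration only; the poison constants are hypothetical stand-ins for the kernel's LIST_POISON1/LIST_POISON2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Illustrative poison values; the kernel's LIST_POISON1/2 differ. */
#define POISON1 ((struct list_head *)(uintptr_t)0x100)
#define POISON2 ((struct list_head *)(uintptr_t)0x200)

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_unlink(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Like list_del(): unlink, then poison so any later use faults. */
static void list_del(struct list_head *e)
{
    list_unlink(e);
    e->next = POISON1;
    e->prev = POISON2;
}

/* Like list_del_init(): unlink, then make the entry an empty list. */
static void list_del_init(struct list_head *e)
{
    list_unlink(e);
    list_init(e);
}
```

Note that list_del_init() only makes repeated removal safe; concurrent access to the list still needs locking, which is what the netif_tx_lock discussion in this thread is about.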

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013-05-20 21:50, Or Gerlitz wrote: On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote: The bug in our production environment was introduced by our backport of IPoIB fixes from mainline, and when we hit that bug we reverted to the old kernel without

Re: list corruption in IPOIB

2013-05-18 Thread Jack Wang
On 2013-05-18 21:37, Or Gerlitz wrote: On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: We've seen the neigh->list corruption warning below during testing. So, how about a little heads-up on what kernel you are using? What's the way to trigger this warning? Hi

BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-17 Thread Jack Wang
Hi All, I've seen this before; does anyone have a suggestion for how to fix it? May 17 16:09:13 ib2 kernel: [ 528.500381] BUG: unable to handle kernel paging request at 00070a78 May 17 16:09:13 ib2 kernel: [ 528.500529] IP: [a0166810] ipoib_cm_tx_handler+0x30/0x2a0 [ib_ipoib] May 17

list corruption in IPOIB

2013-05-17 Thread Jack Wang
Hi Shlomo, Or, We've seen the neigh->list corruption warning below during testing. In Dongsu's and my opinion, several places also need netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list. I tried adding netif_tx_lock/netif_tx_unlock to ipoib_cm_destroy_tx, and it improved the

RE: [PATCH 4/7] [SCSI] scst: Add SRP target driver

2010-12-20 Thread Jack Wang
This patch adds the kernel module ib_srpt, which is a SCSI RDMA Protocol (SRP) target implementation. This driver uses the InfiniBand stack and the SCST core. It is a high performance driver capable of handling 600K+ 4K random write IOPS by a single target as well as 2.5+ GB/s sequential