Hi Or,
I managed to update the kernel to OFED 3.0 to verify the bug, but I
can still reproduce it. Maybe some synchronize_irq() calls are still
missing?
Thanks
Jack
2015-07-08 16:07 GMT+02:00 Jack Wang xjtu...@gmail.com:
Thanks for your time.
Looks like the last one is missing in OFED 2.4
2015-07-09 13:21 GMT+02:00 Or Gerlitz ogerl...@mellanox.com:
On 7/9/2015 2:14 PM, Jack Wang wrote:
I managed to update the kernel to OFED 3.0 to verify the bug, but I
can still reproduce it. Maybe some synchronize_irq() calls are still
missing?
Again, even if you don't use the upstream
: Fatal exception in
interrupt
Best regards,
Jack Wang
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
PM, Jack Wang wrote:
static void mlx4_ib_cq_comp(struct mlx4_cq *cq)
{
	struct ib_cq *ibcq = &to_mibcq(cq)->ibcq;
	ibcq->comp_handler(ibcq, ibcq->cq_context);
}
Looks like cq use-after-free? I have no idea where.
see if you have in the code base you're using (why not the stock
2015-02-10 19:00 GMT+01:00 Jason Gunthorpe jguntho...@obsidianresearch.com:
On Tue, Feb 10, 2015 at 04:56:43PM +0100, Fabian Holler wrote:
Does anybody have an idea what could be wrong?
Are the PATH_MIG* notifications with mlx4 drivers working for anybody?
IIRC rdmacm does not forward
Found the bug.
commit 56e620c453f2588cfc9898a41b110477f6417a5d
Author: Jack Wang jinpu.w...@profitbricks.com
Date: Tue Dec 16 15:44:17 2014 +0100
RDMA/cma: fix first byte overwritten for AF_IB
If a user attaches private data for AF_IB, the first byte will
be overwritten, because we always set the cma version regardless of
whether the family is AF_IB. Move the version assignment inside the
if condition.
Reported-by: Fabian Holler fabian.hol...@profitbricks.com
Signed-off-by: Jack Wang jinpu.w...@profitbricks.com
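
The commit message above can be illustrated with a small userspace sketch. This is not the kernel code: the struct, the family enum, the version constant and both function names are hypothetical stand-ins, and the only assumption taken from the message is that the version byte used to be written for every address family and is now written only inside the family check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
enum sketch_family { SKETCH_AF_INET, SKETCH_AF_IB };
#define SKETCH_CMA_VERSION 0x00

struct sketch_hdr {
	uint8_t cma_version;	/* occupies the first byte of the buffer */
	uint8_t ip_version;
};

/* Behaviour described as buggy: the version is written for every family,
 * so AF_IB private data loses its first byte. */
static void format_hdr_buggy(uint8_t *buf, enum sketch_family family)
{
	struct sketch_hdr *hdr = (struct sketch_hdr *)buf;

	assert(buf != NULL);
	hdr->cma_version = SKETCH_CMA_VERSION;
	if (family == SKETCH_AF_INET)
		hdr->ip_version = 4 << 4;
}

/* Behaviour described as the fix: the version is written only when the
 * family actually uses the header. */
static void format_hdr_fixed(uint8_t *buf, enum sketch_family family)
{
	struct sketch_hdr *hdr = (struct sketch_hdr *)buf;

	assert(buf != NULL);
	if (family == SKETCH_AF_INET) {
		hdr->cma_version = SKETCH_CMA_VERSION;
		hdr->ip_version = 4 << 4;
	}
}

/* Formats a buffer whose first bytes hold user data and returns byte 0. */
static uint8_t first_byte_after(void (*fmt)(uint8_t *, enum sketch_family),
				enum sketch_family family)
{
	uint8_t buf[8] = { 0xAB, 0xCD };

	fmt(buf, family);
	return buf[0];
}
```

Calling first_byte_after(format_hdr_buggy, SKETCH_AF_IB) shows the user's first byte replaced by the version value, while the fixed variant leaves it untouched for the AF_IB case.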
Thanks, Roland, for clearing up our confusion.
So it looks like a ping-pong mechanism is the way to go.
Regards,
Jack
2014-10-23 20:43 GMT+02:00 Roland Dreier rol...@purestorage.com:
On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang xjtu...@gmail.com wrote:
I expected that RDMA-Write operations will fail
to register to SM traps in kernel module?
Regards,
Jack
2014-10-23 20:43 GMT+02:00 Roland Dreier rol...@purestorage.com:
On Thu, Oct 23, 2014 at 6:50 AM, Jack Wang xjtu...@gmail.com wrote:
I expected that RDMA-Write operations will fail if the other crashes.
Also I hoped that an event
Cc'ing linux-rdma, which is more appropriate for this kind of question.
2014-10-23 15:15 GMT+02:00 Fabian Holler fabian.hol...@profitbricks.com:
Hello,
we are implementing Linux kernel modules that are transferring data
with RDMA-Write operations via an RC-connection between 2 hosts.
After the
Yes, you need to install some libraries, e.g.:
RedHat (RHEL 6.4 and above)
yum groupinstall "Infiniband Support"
yum install libtool autoconf automake
yum install infiniband-diags perftest libibverbs-utils librdmacm-utils
yum install librdmacm-devel libibverbs-devel numactl numactl-devel libaio-devel
Hi,
We hit the GPF below in skb_release_data+0xa8/0x100 in production:
(gdb) list *skb_release_data+0xa8
0x81528118 is in skb_release_data (net/core/skbuff.c:399).
394 */
395 if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
396
On 05/06/2014 02:49 PM, Bart Van Assche wrote:
This patch series consists of one patch that adds fast registration
support to the SRP initiator and eight preparation patches:
0001-IB-srp-Fix-kernel-doc-warnings.patch
0002-IB-srp-Introduce-an-additional-local-variable.patch
Hi all,
When I try to clone the repo, git errors out:
fatal: unable to connect to git.openfabrics.org:
git.openfabrics.org[0: 69.55.231.74]: errno=Connection refused
The git web pages no longer exist either; they redirect to Bugzilla.
Does anyone know why?
Regards,
Jack
On 03/31/2014 03:23 PM, Weiny, Ira wrote:
From a post to OFA EWG list:
The System Admin is in the process of moving everything from Ubuntu to RHEL
6.5 - it may take a day to complete. Sorry for the inconvenience.
Ira
Thanks Ira.
-Original Message-
*From: *Jack Wang [xjtu
On 01/21/2014 11:03 AM, Sagi Grimberg wrote:
On 1/20/2014 7:37 PM, Bart Van Assche wrote:
On 01/03/14 22:16, David Dillow wrote:
Today was my last day at ORNL, and my future endeavors will leave even
less time to maintain the SRP initiator.
My thanks especially go to Bart, for keeping the
Hi Roland,
Could you include this patch? I updated it with the Tested-by from Bart
and the Acked-by from David.
From e8d655fec4ac74a6af6a0b2471173c74e7c13f51 Mon Sep 17 00:00:00 2001
From: Jack Wang jinpu.w...@profitbricks.com
Date: Mon, 26 Aug 2013 15:50:03 +0200
Subject: [PATCH] IB/srp: add
On 10/31/2013 01:24 PM, Sagi Grimberg wrote:
Support create_mr and destroy_mr verbs.
Creating ib_mr may be done for either ib_mr that will
register regular page lists like alloc_fast_reg_mr routine,
or indirect ib_mr's that can register other (pre-registered)
ib_mr's in an indirect manner.
Hi Sagi,
I wonder what the performance overhead of this DIF support is.
And is there a roadmap for supporting DIF in SRP/iSER and on the target side?
Regards,
Jack
On 10/31/2013 01:24 PM, Sagi Grimberg wrote:
This patchset introduces verbs-level support for the signature handover
feature. Signature is
On 10/31/2013 01:24 PM, Sagi Grimberg wrote:
+{
+	struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr;
+	u32 sig_key = sig_mr->rkey;
+
+	memset(seg, 0, sizeof(*seg));
+
+	seg->status = 0x4; /* set free */
+	seg->flags = get_umr_flags(wr->wr.sig_handover.access_flags) |
+
On 10/31/2013 02:20 PM, Sagi Grimberg wrote:
On 10/31/2013 2:55 PM, Jack Wang wrote:
Hi Sagi,
I wonder what the performance overhead of this DIF support is.
And is there a roadmap for supporting DIF in SRP/iSER and on the target side?
Regards,
Jack
Well, all DIF operations are fully offloaded
On 10/10/2013 02:19 PM, Bart Van Assche wrote:
Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.
Hello Bart,
It's better to mention user
Dreier rol...@kernel.org
Cc: Vu Pham vuhu...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Jack Wang jinpu.w...@profitbricks.com
Cc: sta...@vger.kernel.org
---
drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
: Roland Dreier rol...@kernel.org
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Jack Wang jinpu.w...@profitbricks.com
Cc: sta...@vger.kernel.org
---
drivers/infiniband/ulp/srp/ib_srp.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers
On 10/10/2013 02:48 PM, Bart Van Assche wrote:
On 10/10/13 14:45, Jack Wang wrote:
On 10/10/2013 01:50 PM, Bart Van Assche wrote:
From: Vu Pham vuhu...@mellanox.com
Remove an SRP target from the SRP target list before invoking the
last scsi_host_put() call. This change is necessary because
that returns
a non-zero status code in the
STATUS field.
The STATUS field contains the status of a task that completes.
Patch made against v3.12-rc1
From 5f5af6de8dd72e37448841b7d7735d3eea4d3d83 Mon Sep 17 00:00:00 2001
From: Jack Wang jinpu.w...@profitbricks.com
Date: Fri, 27 Sep 2013 11:10:05 +0200
On 09/27/2013 12:30 PM, Bart Van Assche wrote:
On 09/27/13 11:20, Jack Wang wrote:
Hi all,
Currently the handling of srp_rsp for task management is broken.
in 6.9
T10/1415-D revision 16a
SRP_RSP responses that contain either
RESPONSE DATA or SENSE DATA shall be sent as the minimum length
On 09/12/2013 06:30 PM, Bart Van Assche wrote:
On 09/12/13 18:16, Jack Wang wrote:
On 09/12/2013 12:16 AM, David Dillow wrote:
On Tue, 2013-09-10 at 19:44 +0200, Bart Van Assche wrote:
If this name was not yet in use in any interface that is visible in
user
space, I would agree that we
On 09/13/2013 11:24 AM, Bart Van Assche wrote:
On 09/13/13 10:40, Bart Van Assche wrote:
On 09/13/13 10:06, Jack Wang wrote:
On 09/12/2013 06:30 PM, Bart Van Assche wrote:
On 09/12/13 18:16, Jack Wang wrote:
On 09/12/2013 12:16 AM, David Dillow wrote:
On Tue, 2013-09-10 at 19:44 +0200, Bart
On 09/13/2013 03:33 PM, Bart Van Assche wrote:
On 09/13/13 14:25, Jack Wang wrote:
I tried your srp-ha branch on GitHub. Echoing the string
SRP2=id_ext=${THCA2_GUID},ioc_guid=${THCA2_GUID},dgid=${TGID_P2},pkey=${PKEY},service_id=${THCA2_GUID},can_queue=512
to add_target failed with
ib_srp
On 09/12/2013 12:16 AM, David Dillow wrote:
On Tue, 2013-09-10 at 19:44 +0200, Bart Van Assche wrote:
If this name was not yet in use in any interface that is visible in user
space, I would agree that we should come up with a better name. However,
the SCSI mid-layer already uses that name
On 08/28/2013 10:19 AM, Jack Wang wrote:
Hi,
Below is a new version of the patch, which addresses the comments from Bart.
Hi Roland,
Could you include this in your tree, or do you need me to resend it?
Best regards,
Jack
From 62d87b4d546066b251e44a3cb468b16783df7ee4 Mon Sep 17 00:00:00 2001
From
Hi,
Below is a new version of the patch, which addresses the comments from Bart.
From 62d87b4d546066b251e44a3cb468b16783df7ee4 Mon Sep 17 00:00:00 2001
From: Jack Wang jinpu.w...@profitbricks.com
Date: Mon, 26 Aug 2013 15:50:03 +0200
Subject: [PATCH] IB/srp: add change_queue_depth/change_queue_type support
On 08/27/2013 10:31 AM, Bart Van Assche wrote:
On 08/26/13 15:53, Jack Wang wrote:
From: Jack Wang jinpu.w...@profitbricks.com
Date: Mon, 26 Aug 2013 15:50:03 +0200
Subject: [PATCH] IB/srp: add change_queue_depth/change_queue_type support
Signed-off-by: Jack Wang jinpu.w...@profitbricks.com
snip
This code seems incorrect to me for the SRP protocol. In the SRP
protocol, although there is no TCQ support, queue depths above one are
supported.
I also have a more general remark. There is no TCQ support in the SRP
protocol, which means that sdev->tagged_supported is always 0 (false).
The attached patch adds change_queue_depth/change_queue_type support
to the SRP driver, as most modern SCSI host drivers do.
From 10445c9fd9e24d03269e43680bcd2504c713b622 Mon Sep 17 00:00:00 2001
From: Jack Wang jinpu.w...@profitbricks.com
Date: Mon, 26 Aug 2013 15:50:03 +0200
Subject
On 08/13/2013 09:54 AM, Or Gerlitz wrote:
On 09/08/2013 03:44, Jim Foraker wrote:
In several places, this snippet is used when removing neigh entries:
list_del(&neigh->list);
ipoib_neigh_free(neigh);
The list_del() removes neigh from the associated struct ipoib_path, while
I'm not sure it's possible to avoid such a race without introducing
a new mutex. How about something like the (untested) SCSI core patch
below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
reconnect activity not initiated from the SCSI EH thread ?
[PATCH] Add
On 06/24/2013 05:50 PM, Bart Van Assche wrote:
On 06/24/13 15:48, Jack Wang wrote:
I'm not sure it's possible to avoid such a race without introducing
a new mutex. How about something like the (untested) SCSI core patch
below, and invoking scsi_block_eh() and scsi_unblock_eh() around any
On 06/19/2013 05:27 PM, Bart Van Assche wrote:
On 06/19/13 15:44, Jack Wang wrote:
+/*
+ * It can occur that after fast_io_fail_tmo expired and before
+ * dev_loss_tmo expired that the SCSI error handler has
+ * offlined one or more devices. scsi_target_unblock() doesn't
+ /*
+ * It can occur that after fast_io_fail_tmo expired and before
+ * dev_loss_tmo expired that the SCSI error handler has
+ * offlined one or more devices. scsi_target_unblock() doesn't
+ * change the state of these devices
An SRP target is required to maintain a single connection between
initiator and target. This means that if the 'add_target' attribute
is used to create a second connection to a target that the first
connection will be logged out and that the SCSI error handler will
kick in. The SCSI error
Hi Or,
I saw the warning below when enabling CONFIG_DEBUG_MUTEXES:
1893 May 21 08:56:32 ib2 kernel: [ 44.738725]
=
1894 May 21 08:56:32 ib2 kernel: [ 44.738782] [ INFO: possible recursive
locking detected ]
1895 May 21 08:56:32 ib2 kernel: [
On 05/21/2013 05:19 PM, Jack Wang wrote:
On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
On 17.05.2013 16:16, Jack Wang wrote:
unable to handle kernel paging request
Hi Jack,
this should be related to the list corruption in IPoIB as list_del()
sets the LIST_POISON1 and LIST_POISON2
On 2013年05月23日 19:41, Doug Ledford wrote:
On 05/23/2013 11:38 AM, Jack Wang wrote:
Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer
^^^
I would try a newer kernel. There are a couple known issues fixed since
this kernel (including a memory
On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
On 17.05.2013 16:16, Jack Wang wrote:
unable to handle kernel paging request
Hi Jack,
this should be related to the list corruption in IPoIB as list_del()
sets the LIST_POISON1 and LIST_POISON2 pointers.
Referencing these results in page
Hi Jack,
I don't understand what the current status is, that is, what you see
now after applying the patches.
If you no longer get the original bug, why did you give its trace? Or
is it a new trace? It is not clear from your mail.
Please add only the trace of the current issue.
On 2013年05月20日 21:00, Or Gerlitz wrote:
On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com
wrote:
Sorry for the confusion. The list corruption is gone in my preliminary
tests after I changed list_del to list_del_init, as Or suggested.
As Or asked for the original bug, so
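
The difference between list_del and list_del_init mentioned above can be shown with a minimal userspace model. This is only a sketch: the sketch_* helpers are hypothetical re-implementations written for illustration, and the poison values are placeholders rather than the kernel's LIST_POISON1/LIST_POISON2 constants. It shows why a second removal is harmless after list_del_init; it does not replace whatever locking the list needs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Placeholder poison values, chosen only for this sketch. */
#define SKETCH_POISON1 ((struct list_head *)0x100)
#define SKETCH_POISON2 ((struct list_head *)0x200)

static void sketch_init(struct list_head *h) { h->next = h; h->prev = h; }

static int sketch_empty(const struct list_head *h) { return h->next == h; }

/* Insert entry right after head. */
static void sketch_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void sketch_unlink(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Modelled on list_del(): unlink, then poison the stale pointers. */
static void sketch_del(struct list_head *entry)
{
	sketch_unlink(entry);
	entry->next = SKETCH_POISON1;
	entry->prev = SKETCH_POISON2;
}

/* Modelled on list_del_init(): unlink, then leave an empty list. */
static void sketch_del_init(struct list_head *entry)
{
	sketch_unlink(entry);
	sketch_init(entry);
}

/* Returns 1 if the removed entry carries the poison pointers. */
static int sketch_del_leaves_poison(void)
{
	struct list_head head, a;

	sketch_init(&head);
	sketch_add(&a, &head);
	sketch_del(&a);
	return a.next == SKETCH_POISON1 && a.prev == SKETCH_POISON2 &&
	       sketch_empty(&head);
}

/* Returns 1 if removing the same entry twice leaves the list intact. */
static int sketch_double_remove_is_safe(void)
{
	struct list_head head, a, b;

	sketch_init(&head);
	sketch_add(&a, &head);
	sketch_add(&b, &head);
	sketch_del_init(&a);
	sketch_del_init(&a);	/* second removal only touches a itself */
	return sketch_empty(&a) && head.next == &b && head.prev == &b &&
	       b.next == &head && b.prev == &head;
}
```

After sketch_del() any further use of the entry dereferences the poison pointers, which in the kernel shows up as a paging-request fault; after sketch_del_init() the entry points at itself, so a repeated removal does not touch its former neighbours.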
On 2013年05月20日 21:50, Or Gerlitz wrote:
On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com
wrote:
The bug in our production environment was introduced by our backport
of IPoIB fixes from mainline; when we hit that bug we reverted
to the old kernel without
On 2013年05月18日 21:37, Or Gerlitz wrote:
On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com
wrote:
We've seen the neigh->list corruption warning below during testing,
So how about a little heads-up on what kernel you are using? What's the way
to trigger this warning?
Hi
Hi All,
I've seen this before. Does anyone have a suggestion for how to fix it?
May 17 16:09:13 ib2 kernel: [ 528.500381] BUG: unable to handle kernel
paging request at 00070a78
May 17 16:09:13 ib2 kernel: [ 528.500529] IP: [a0166810]
ipoib_cm_tx_handler+0x30/0x2a0 [ib_ipoib]
May 17
Hi Shlomo, Or,
We've seen the neigh->list corruption warning below during testing.
In Dongsu's and my opinion, several places also need
netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list. I
tried adding netif_tx_lock/netif_tx_unlock to ipoib_cm_destroy_tx; it
improved the
This patch adds the kernel module ib_srpt, which is a SCSI RDMA Protocol
(SRP)
target implementation. This driver uses the InfiniBand stack and the SCST
core.
It is a high performance driver capable of handling 600K+ 4K random write
IOPS by a single target as well as 2.5+ GB/s sequential