= -12
[5416523.238381] ib_srpt: ***ERROR***: rejected SRP_LOGIN_REQ because
creating a new RDMA channel failed.
[5416523.238393] ib_srpt: Rejecting login with reason 0x10001
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Ge
On 19.07.2012 22:31, Roland Dreier wrote:
> I have to think about the best way to fix this. We could just
> convert to vmalloc() here but I'm not thrilled about consuming
> vmalloc() space (on modern 64-bit architectures it's a non-issue
> but it's going to cause issues for people on smaller syste
. ;-)
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.rie...@profitbricks.com
Tel.: +49 - 30 - 60 98 56 991 - 915
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht
On 31.07.2012 13:08, Alex Netes wrote:
> Congestion control isn't a credit based mechanism. While InfiniBand flow
> control is defined between two ports of the same link, congestion control is
> working across the fabric between a congestion point (a switch) and a reaction
> point (source node). Re
a4ac921bb1a9d647 ]---
ib0: transmit timeout: latency 1770 msecs
ib0: queue stopped 1, tx_head 39614, tx_tail 39614
Hi Bart,
we've triggered the WARN_ON() in srp_wait_last_send_wqe() by connecting
to a disabled SCST SRP target.
I would remove that one.
Cheers,
Sebastian
On 09.08.2012 17:53, Bart Van Assche wrote:
> Modify srp_disconnect_target() such that it waits until it is
> sure that no new IB completi
Hi Vladimir,
why do you put OFED together for a kernel nobody uses? Perhaps SLES and
Red Hat do it like this but nobody else.
Have a look at http://en.wikipedia.org/wiki/Linux_kernel - 3.0, 3.2 and
3.4 are the long-term stable releases.
This approach is worse than the approach before IMHO. Since
Hi Bart,
thanks for approaching this! We're not the best mainline developers so I
guess we won't be there. But we have the big SRP setups and our
sysadmins really don't like reconnecting SRP hosts manually and then
laboriously adding their devices back to the related dm-multipath devices.
Think abou
On 06.02.2013 10:22, Or Gerlitz wrote:
> On 06/02/2013 11:17, Mathis GAVILLON wrote:
>> Ok. But what is it possible to do with Infiniband VFs if QP0 is not
>> available ?
>
> EVERYTHING, e.g run IPoIB, iSER, RDS, MPI, etc, etc - except for what
> requires QP0, such as running SM or issuing SMPs fo
On 06.02.2013 11:20, Or Gerlitz wrote:
> On 06/02/2013 12:04, Mathis GAVILLON wrote:
>> Just a last question : is that possible VFs lid to be different from
>> PF one ?
>
> NO, we've implemented a "shared port" model, so all functions on the
> same IB port use the same lid, each function has its o
On 08.02.2013 10:24, Sagi Grimberg wrote:
> On 2/8/2013 12:42 AM, Vu Pham wrote:
>> Hello Bart,
>>
>> Thank you for taking the initiative.
>> Mellanox think that this should be discussed. We'd be happy to attend.
>>
>> We also would like to discuss:
>> * How and how fast does SRP detect a path fail
On 26.02.2013 17:55, Roland Dreier wrote:
[...]
> In fact I bet this is why the bug has been there as long as it has
> been: almost no one is using IPv6 on IPoIB seriously, and IPv4 should
> work OK as you point out.
Thanks a lot. Unfortunately, we are using IPoIB with IPv6 in
production for t
troduce qp_retry_cnt module parameter
Cheers,
Sebastian
Btw.: Before, I've hacked MD RAID-1 for high-performance replication as
DRBD is crap for our purposes. But that's worthless without a reliably
working transport.
From c101d00fe529d845192dd6d5930a1b9c16c99b81 Mon Sep 17 00:00:00 20
On 19.03.2013 12:22, Or Gerlitz wrote:
> On 19/03/2013 12:16, Sebastian Riemer wrote:
>> Hi Bart,
>>
>> now I've got my priority on SRP again.
>
> Hi Sebastian,
>
> Are these patches targeted to upstream or backports to some OS/kernel?
> if the former, ca
On 19.03.2013 12:45, Bart Van Assche wrote:
> On 03/19/13 11:16, Sebastian Riemer wrote:
>>
>> What are your thoughts regarding this?
>>
>> Attached patches:
>> ib_srp: register srp_fail_rport_io as terminate_rport_io
>> ib_srp: be quiet when failing SCSI comm
On 09.04.2013 13:12, Vasiliy Tolstov wrote:
> Hello. I have some servers, with mellanox ConnectX-3 and have some questions:
> Why max_mtu differs with active_mtu?
Because 2048 is the default and 4096 is the maximum MTU supported by the
hardware.
> How can i set active mtu?
Something like this:
echo
On 09.04.2013 13:51, Vasiliy Tolstov wrote:
>> Something like this:
>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> After doing this, all SRP connections went down and the port is down. I need to
> restart openibd
Sorry for that! It's much easier to set the IP MTU. Managed switches
su
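The sysfs write quoted above can be wrapped in a small helper. This is a minimal Python sketch and not part of the original thread: the function names, the sysfs_root parameter and the validation step are assumptions made for illustration; only the attribute path comes from the message. Note that the reporter saw the port and all SRP connections go down after writing this attribute, so treat the write as disruptive.

```python
import os

# MTU sizes defined for InfiniBand links, in bytes.
VALID_IB_MTUS = (256, 512, 1024, 2048, 4096)


def port_mtu_path(device="mlx4_0", port=1, sysfs_root="/sys/class/infiniband"):
    """Build the mlx4 per-port MTU attribute path quoted in the thread."""
    return os.path.join(sysfs_root, device, "device", "mlx4_port%d_mtu" % port)


def set_port_mtu(mtu, device="mlx4_0", port=1, sysfs_root="/sys/class/infiniband"):
    """Write an MTU to the attribute, rejecting values that are not IB MTU sizes.

    sysfs_root is a hypothetical parameter so the helper can be tried against
    a scratch directory instead of the real /sys tree.
    """
    if mtu not in VALID_IB_MTUS:
        raise ValueError("unsupported InfiniBand MTU: %r" % (mtu,))
    path = port_mtu_path(device, port, sysfs_root)
    with open(path, "w") as attr:
        attr.write("%d\n" % mtu)
    return path
```

Calling set_port_mtu(4096) as root would be the equivalent of the echo command above; checking the value first avoids writing something the port cannot accept.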
On 09.04.2013 14:49, Hal Rosenstock wrote:
> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>> Hello. I have some servers, with mellanox ConnectX-3 and have some questions:
>> Why max_mtu differs with active_mtu?
>
> What does peer port say for max MTU ?
>
>> How can i set active mtu?
>
> SM sets
On 09.04.2013 15:34, Hal Rosenstock wrote:
> On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
>> On 09.04.2013 14:49, Hal Rosenstock wrote:
>>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>>> Hello. I have some servers, with mellanox ConnectX-3 and have some
>>>
On 09.04.2013 16:23, Hal Rosenstock wrote:
>> So these values are exactly the same as in "ibv_devinfo" and can be set
>> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>>
>> I've found the PortInfo with the command
>> "smpquery portinfo -C mlx4_0 3 1"
>> where I'm using the first HCA to con
e'll have a booth on
LinuxTag in Berlin/Germany. I'll have a technical talk there about SRP:
http://www.linuxtag.org/2013/en/program/thursday-may-23-2013.html?eventid=208
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer - Storage
ProfitBricks GmbH • Greifswalder
Hi Vasiliy,
sorry for the late reply! I was ill last week.
The main difference so far is that my patches are much easier to
understand as I don't provide back-porting and also don't put
performance improvements in between the stability fixes. I provide full
test cases + scripts which makes it sim
Hi Gandalf,
just build up two separate fabrics. This means that you don't
interconnect both switches.
Otherwise, issues on one port also affect the other port.
What do you use for storage? SRP?
This requires dm-multipath and fast IO failing + automatic reconnect
patches from Bart or from me.
All
FYI: I've released version 0.6 of my SRP patches today.
The automatic reconnect is included now. The tests for that will follow
in the next version. But we already did quite intensive testing for that.
Hard reboot and also soft reboot of the target are possible with that
reconnect. It just reconn
On 14.05.2013 12:02, Vasiliy Tolstov wrote:
> Sorry for bumping an old thread, I solved my problems with new firmware.
> I have supermicro servers that rebrand mellanox firmware (recompile
> and change some bits)
> Now everything works fine; I have 40 Gb/s QDR instead of 10 Gb/s
>
Thanks, sharing lesson lea
ter reconnects and ability to close session from
> initiator side under QLogic hardware, is it possible? Or do these
> patches only cover Mellanox cards?
>
> 2013/5/8 Sebastian Riemer :
>> FYI: I've released version 0.6 of my SRP patches today.
>>
>> The automatic reconne
On 15.05.2013 07:12, Vasiliy Tolstov wrote:
> 2013/5/14 Bart Van Assche :
>> The ability to close a session from the initiator side went upstream in
>> kernel 3.8 (/sys/class/srp_remote_ports/port-:/delete). Regarding
>> faster reconnects: please keep in mind that after a cable pull it can easily
>
On 17.05.2013 16:16, Jack Wang wrote:
> unable to handle kernel paging request
Hi Jack,
this should be related to the list corruption in IPoIB as list_del()
sets the LIST_POISON1 and LIST_POISON2 pointers.
Referencing these results in page faults according to the documentation
in the code.
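
The poisoning behaviour can be illustrated outside the kernel. The following is a rough Python analogy, not kernel code: the Poison class and the RuntimeError stand in for the invalid LIST_POISON1/LIST_POISON2 addresses and the resulting page fault, and all names apart from list_del and the two poison constants are assumptions for illustration.

```python
class Poison:
    """Stand-in for a poison pointer: any attribute access on it fails loudly."""

    def __init__(self, name):
        object.__setattr__(self, "_name", name)

    def __getattr__(self, attr):
        raise RuntimeError("dereferenced " + object.__getattribute__(self, "_name"))

    def __setattr__(self, attr, value):
        raise RuntimeError("dereferenced " + object.__getattribute__(self, "_name"))


LIST_POISON1 = Poison("LIST_POISON1")
LIST_POISON2 = Poison("LIST_POISON2")


class ListHead:
    """Minimal circular doubly linked list node, initially pointing at itself."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert new directly after head."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink entry, then poison its pointers so later use is detected."""
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = LIST_POISON1
    entry.prev = LIST_POISON2
```

Deleting the same entry twice, or walking the list through a stale entry, touches one of the poison objects and raises immediately, which mirrors why a corrupted list shows up as a paging-request failure instead of silent misbehaviour.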
Cheer
On 08.06.2013 04:31, Bruce McKenzie wrote:
> Hi Bart.
>
> any advice on using this fix with MD raid 1? a guide or site you know of?
>
> ive compiled ubuntu 13.04 to kernel 3.6.11 with OFED 2 from Mellanox, and it
> works ok, performance is a little better with SRP. Some packages dont seem
> to w
On 10.06.2013 14:44, Bart Van Assche wrote:
> On 06/10/13 14:05, Sebastian Riemer wrote:
>> Perhaps, I should collect all guys who require MD RAID-1 for remote
>> storage replication in order to put some pressure on Neil.
>
> If I remember correctly one of the things Neil is
he 'delete' sysfs attribute of the remote port before connecting.
Note: The function srp_conn_unique() has been taken from Bart Van Assche.
Cc: Bart Van Assche
Cc: David Dillow
Cc: Vu Pham
Cc: Sagi Grimberg
Cc: Oren Duer
Cc: Or Gerlitz
Signed-off-by: Sebastian Riemer
Reviewed-b
t that all users are using the srp-tools.
Please compare with Bart's version and let's discuss this here.
https://github.com/bvanassche/ib_srp-backport/commit/7d8774ff58d489858b1c046b2bf01b4e84e8dd9b
Cheers,
Sebastian
On 12.06.2013 13:29, Sebastian Riemer wrote:
> The sysfs attribut
>> Signed-off-by: Dotan Barak
>> Reviewed-by: Eli Cohen
>> Signed-off-by: Bart Van Assche
>> Cc: Roland Dreier
>> Cc: David Dillow
>> Cc: Vu Pham
>> Cc: Sebastian Riemer
>> ---
>> drivers/infiniband/ulp/srp/ib_srp.c |2 ++
>>
e wrote:
> Avoid that srp_claim_command() can claim a command while
> srp_queuecommand() is still busy queueing the same command.
> Found this via source reading.
>
> Signed-off-by: Bart Van Assche
> Cc: Roland Dreier
> Cc: David Dillow
> Cc: Vu Pham
> Cc: Sebastian
ids that the SCSI
> error handler skips the srp_reset_host() call after a transport
> layer error.
>
> Signed-off-by: Bart Van Assche
> Cc: Roland Dreier
> Cc: David Dillow
> Cc: Vu Pham
> Cc: Sebastian Riemer
> ---
> drivers/infiniband/ulp/srp/ib_srp.c | 11 +
e
> Cc: Roland Dreier
> Cc: David Dillow
> Cc: Vu Pham
> Cc: Sebastian Riemer
> ---
> drivers/infiniband/ulp/srp/ib_srp.c |1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> b/drivers/infiniband/ulp/srp/ib_srp.c
>
via sysfs.
>
> Signed-off-by: Bart Van Assche
> Cc: Roland Dreier
> Cc: David Dillow
> Cc: Vu Pham
> Cc: Sebastian Riemer
> ---
> drivers/infiniband/ulp/srp/ib_srp.c | 38
> +++
> 1 file changed, 38 insertions(+)
>
> diff --gi
Bart's version also has the printing of the connection string if the
double login fails.
So forget about this version here.
On 12.06.2013 13:51, Sebastian Riemer wrote:
> Hi all,
>
> as proposed by Or, let's discuss this on the mailing list.
>
> This is a fundam
On 13.06.2013 17:07, Bart Van Assche wrote:
[...]
> The "%.*s" should only copy the data provided by the user, even if it
> is not '\0' terminated. Stripping the trailing newline is probably
> possible with something like the (untested) code below (will only work
> if there is only one newline in t
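
The idea in the quoted text (use only the bytes the user actually wrote, then drop one trailing newline) can be sketched as follows. This is a loose Python analogue of the C "%.*s" behaviour, not Bart's code, which is cut off above; the function name and the sample input are hypothetical.

```python
def parse_store_buffer(buf, count):
    """Return the first `count` bytes of buf, minus at most one trailing newline.

    Mirrors "%.*s" with precision `count`: nothing past `count` is read, and
    copying also stops at an embedded NUL byte if one appears earlier.
    """
    data = bytes(buf[:count])
    nul = data.find(b"\0")
    if nul != -1:
        data = data[:nul]
    # Strip a single trailing newline, e.g. the one that `echo` appends.
    if data.endswith(b"\n"):
        data = data[:-1]
    return data
```

For example, parse_store_buffer(b"id_ext=1\nGARBAGE", 9) returns b"id_ext=1": the bytes after the first nine are never looked at, and the newline written by echo is removed.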
On 14.06.2013 01:27, Vu Pham wrote:
> Bart Van Assche wrote:
>> On 06/13/13 19:50, Vu Pham wrote:
>>> Hello Bart,
>>>
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct
>>
changed - so that the
iSCSI host number couldn't be found.
After fixing that, it worked for me.
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin, Germany
Tel.: +49 - 30 - 51 64 09 20
Fax: +49 - 30 - 51 64 09 22
2011/12/20 Or Gerlitz :
>
> Beep, I'd like to better/understand the problem before looking on your
> struggle for solution...
> I understand that your Debian system runs kernel 3.0 - however, you didn't
> say what version of the iscsi initiator utils is provided with that distro
> nor what were the
2011/12/20 Or Gerlitz :
>
> Beep(2), so your system has distro which is based on kernel 2.6.32 and iscsi
> initiator tools version 2.0.871 and per your needs, you've booted it with
> kernel 3.0 .
>
> At this point should you have stop and make sure that this combo works,
> iscsi wise (simpler to te
>
>> Would it help, if we provide our patches for open-iscsi and IB/iSER>
>> 2.6.32 to bring that into mainline OFED?
>
> As Or notes, OFED is providing the kernel modules more than the iscsi code
> drop. Would be better for all (cough cough) to push changes back to the
> iscsi initiator maintai
2011/12/20 Or Gerlitz :
>
> horses, please, stay at home, or at least run a little bit slower,
> just for you - from 2 minutes
> ago - iser works well with 3.2.0-rc5 (its say -dirty b/c its a
> development system and the kernel has some patches, but not iser ones)
> and iscsi-initiator-utils of 6.2
> you wrote long emails, I'm asking for one concrete example for that enum
> crunching of adding entries
> not at the end, can you, please?
I meant, e.g., the iscsi tasks in libiscsi.h between 2.6.30 and
2.6.32. But I meant this for OFED and not the mainline kernel.
2.6.30:
enum {
I
2011/12/21 Or Gerlitz :
> I tested the upstream kernel iser against the upstream iscsi tools from
> git://github.com/mikechristie/open-iscsi
> (commit 4323e342d2c9fb8ed7233ce855001c189ec55b23), it works
>
To bring this to an end: I believe you. Most likely I had that much
trouble because of the O
part and the
iscsid log.
This looks interesting:
"iser: iser_drain_tx_cq:tx id 88402391f898 status 4 vend_err 57"
Or, could you please investigate/explain?
It is a pain that we need both: working iSER and IPoIB traffic with good
performance.
Cheers,
Sebastian
On 19/12/11 10:14, Sebastian Riem
On 12/01/12 10:29, Or Gerlitz wrote:
> If you have build the kernel IB user space support (uverbs) and the
> IB libs, do "ibv_devinfo" if not, just ossi "cat
> /sys/class/infiniband/mlx4_0/*" and send the output. To be clear, iser
> does work for you on the productive servers but not on this serve
On 12/01/12 11:16, Sebastian Riemer wrote:
> On 12/01/12 10:29, Or Gerlitz wrote:
>
>> If you have build the kernel IB user space support (uverbs) and the
>> IB libs, do "ibv_devinfo" if not, just ossi "cat
>> /sys/class/infiniband/mlx4_0/*" and sen
On 12/01/12 17:14, Or Gerlitz wrote:
>
> you didn't send the kernel logs from the failure after opening the iser
> (debug_level=2) and libiscsi (debug_libiscsi_session=1
> debug_libiscsi_conn=1) debug prints
OK, I've also set mlx4_core debug_level=2 and have verified in
/sys/module that the para
On 16/01/12 22:16, Or Gerlitz wrote:
> Sebastian, I asked for the **iser** (ib_iser) and not mlx4_core debug_level=2
>
Yes, I did! I've enabled that additionally. And I've checked these
settings in /sys/module/*/parameters. They were set. The libiscsi from
OFED had only the option "debug_libisc
On 17/01/12 15:56, Or Gerlitz wrote:
> could you try and patch your 3.0.15 kernel with commit
> 52439540ea30396982b69662dd21aede6b336288 "IB/iser: DMA unmap TX bufs
> used for iSCSI/iSER headers" from upstream, this could help here.
>
Hi Or,
unfortunately, just cherry-picking that commit didn't do
On 19/01/12 13:18, Or Gerlitz wrote:
>> [...]
>> Or Gerlitz (4):
>> IB/iser: Fix wrong mask when sizeof (dma_addr_t) > sizeof
>> (unsigned long)
>> IB/iser: Support iSCSI PDU padding
>> IB/iser: Use separate buffers for the login request/response
>> IB/iser: DMA unmap TX buf
Hi Chet,
the trick is to check out the latest pkg-ofed source from debian SVN
(svn://svn.debian.org/svn/pkg-ofed/) and to update the upstream source
by merging the stuff by extracting the source RPMs or even better by
importing the source directly from the git repos of the OFED user space.
In the
Hi Chet,
On 22/06/12 21:02, Chet Murthy wrote:
>
> Sebastian,
>
> Thank you for taking the time to explain these things! It's a little
> confusing
>
>> Here a simple list of matching code:
>> OFED-1.5.4 ---> kernel 3.2.x
>> OFED-1.5.4.1 ---> kernel 3.3.x
>
> (1) Is there a more-exhausti
On 14.06.2013 19:07, Vu Pham wrote:
[...]
>> For what do you need the same target with multiple pkeys on the same
>> local SRP port?
>>
> There is no need, it's just a gray area that you can choose to have
> multiple connections to same target using different pkeys (same as dgid)
>> Which other
On 17.06.2013 09:29, Bart Van Assche wrote:
> On 06/17/13 09:14, Hannes Reinecke wrote:
>> On 06/17/2013 09:04 AM, Bart Van Assche wrote:
>>> I agree that the value of fast_io_fail_tmo should be kept small.
>>> Although as you explained changing the SCSI device state into
>>> SDEV_BLOCK doesn't hel
On 28.06.2013 01:45, Roland Dreier wrote:
> On Thu, Jun 27, 2013 at 2:01 PM, David Dillow wrote:
>> On Wed, 2013-06-12 at 15:20 +0200, Bart Van Assche wrote:
>>> If the add_one callback fails during driver load no resources are
>>> allocated so there isn't a need to release any resources. Trying
>
On 28.06.2013 14:48, Bart Van Assche wrote:
> Avoid that srp_claim_command() can claim a command while
> srp_queuecommand() is still busy queueing the same command.
> Found this via source reading.
Nice, that's much less re-acquiring of the target lock in error case in
srp_queuecommand().
But if w
On 28.06.2013 16:51, Bart Van Assche wrote:
>> Nice, that's much less re-acquiring of the target lock in error case in
>> srp_queuecommand().
>> But if we have to change that many locations for srp_put_tx_iu() anyway,
>> wouldn't it make sense to rename it into __srp_put_tx_iu() as well?
>>
>> Then
On 28.06.2013 14:49, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.
>
> Signed-off-by: Bart Van Assche
> Cc: Roland Dreier
> Cc: David Di
On 01.07.2013 13:33, Bart Van Assche wrote:
>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>> if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>>
On 01.07.2013 13:38, Bart Van Assche wrote:
>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>> if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>>
On 28.06.2013 14:49, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.
Acked-by: Sebastian Riemer
rt() return FAST_IO_FAIL instead of SUCCESS.
>
> Signed-off-by: Bart Van Assche
> Reported-by: Sebastian Riemer
> Cc: David Dillow
> Cc: Roland Dreier
> Cc: Vu Pham
> ---
> drivers/infiniband/ulp/srp/ib_srp.c |3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
Hi Hal,
we've encountered an issue with OpenSM 3.3.16 and the config option
"console off".
OpenSM processes are at 100% CPU load.
From strace:
poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}])
read(0, "", 4096) = 0
poll([{fd=0, events=POLLIN}], 1, 1000)
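
The strace output shows poll() reporting fd 0 as readable while read() returns 0 bytes, which is end-of-file; a loop that keeps polling such a descriptor never blocks. The sketch below shows one common way to break that cycle. It is a minimal Python illustration, not OpenSM code: the function name and the max_rounds limit are assumptions, and it uses /dev/null-like input only as an example of a descriptor that is always readable at EOF.

```python
import os
import select


def drain_until_eof(fd, timeout_ms=1000, max_rounds=5):
    """Poll fd for input; return True once a zero-length read signals EOF."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    for _ in range(max_rounds):
        for ready_fd, _events in poller.poll(timeout_ms):
            data = os.read(ready_fd, 4096)
            if not data:
                # EOF: stop polling this descriptor. Otherwise poll() keeps
                # returning immediately and the loop spins at 100% CPU.
                poller.unregister(ready_fd)
                return True
    return False
```

Unregistering (or closing) the descriptor on a zero-length read is what prevents the poll/read pair from repeating without ever sleeping.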
On 09.10.2013 15:30, David Dillow wrote:
> On Wed, 2013-10-09 at 09:28 -0400, Hal Rosenstock wrote:
>>> From strace:
>>> poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}])
>>> read(0, "", 4096) = 0
>>> poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, r
On 09.10.2013 16:00, Hal Rosenstock wrote:
> Do you recall the sequence to get to this ?
>
> Was console option changed to off and then OpenSM SIGHUP'd ? Something
> else ?
>
> Is this reproducible ?
Yes, now I can reproduce it. The opensm has been initially started with
"console off" and I act
On 09.10.2013 17:15, Hal Rosenstock wrote:
> What does service restart do in terms of OpenSM ?
>
> Note that the console parameter is _not_ changeable "on the fly" right
> now so if OpenSM is being SIGHUP'd by service restart then this is a
> current limitation (and is clearly not detected/protect
On 21.01.2014 11:03, Sagi Grimberg wrote:
> On 1/20/2014 7:37 PM, Bart Van Assche wrote:
>> On 01/03/14 22:16, David Dillow wrote:
>>> Today was my last day at ORNL, and my future endeavors will leave even
>>> less time to maintain the SRP initiator.
>>>
>>> My thanks especially go to Bart, for kee
Hi Sagi,
is that "/mswg/git/mlnx_ofed/mlnx-ofed-2.x-kernel.git" tree from the
MLNX_OFED public by any chance?
There are fixes included that are relevant for the mainline. It would be
strange if I sent the patches, since somebody at Mellanox discovered and
fixed the issues.
I've hit a kernel panic today du
ct/union/enum/typedef member 'deleted' description in 'srp_rport'
>
> Signed-off-by: Bart Van Assche
> Reported-by: Masanari Iida
> Cc: Sagi Grimberg
> Cc: Sebastian Riemer
> Cc: James Bottomley
> Cc: Roland Dreier
> ---
> drivers/scsi/scsi_transpor
On 24.02.2014 15:30, Sagi Grimberg wrote:
> When unmapping request data, it is unsafe to automatically
> decrement req->nfmr regardless of its value. This may
> happen since IO and reconnect flow may run concurrently
> resulting in req->nfmr = -1 and falsely call ib_fmr_pool_unmap.
Something is stil