mlx4_ib_create_qp failed - OOM with call trace

2012-07-18 Thread Sebastian Riemer
= -12 [5416523.238381] ib_srpt: ***ERROR***: rejected SRP_LOGIN_REQ because creating a new RDMA channel failed. [5416523.238393] ib_srpt: Rejecting login with reason 0x10001 Cheers, Sebastian -- Sebastian Riemer Linux Kernel Developer ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Ge

Re: mlx4_ib_create_qp failed - OOM with call trace

2012-07-20 Thread Sebastian Riemer
On 19.07.2012 22:31, Roland Dreier wrote: > I have to think about the best way to fix this. We could just > convert to vmalloc() here but I'm not thrilled about consuming > vmalloc() space (on modern 64-bit architectures it's a non-issue > but it's going to cause issues for people on smaller syste

Basics of congestion control?

2012-07-31 Thread Sebastian Riemer
. ;-) Cheers, Sebastian

Re: Basics of congestion control?

2012-07-31 Thread Sebastian Riemer
On 31.07.2012 13:08, Alex Netes wrote: > Congestion control isn't a credit based mechanism. While InfiniBand flow > control is defined between two ports of the same link, congestion control is > working across the fabric between a congestion point (a switch) and a reaction > point (source node). Re

IB softirq race

2012-08-10 Thread Sebastian Riemer
a4ac921bb1a9d647 ]--- ib0: transmit timeout: latency 1770 msecs ib0: queue stopped 1, tx_head 39614, tx_tail 39614

Re: [PATCH 11/20] ib_srp: Make srp_disconnect_target() wait for IB completions

2012-08-23 Thread Sebastian Riemer
Hi Bart, we've triggered the WARN_ON() in srp_wait_last_send_wqe() by connecting to a disabled SCST SRP target. I would remove that one. Cheers, Sebastian On 09.08.2012 17:53, Bart Van Assche wrote: > Modify srp_disconnect_target() such that it waits until it is > sure that no new IB completi

Re: [ANNOUNCE] OFED-3.5-rc2 is available

2012-10-04 Thread Sebastian Riemer
Hi Vladimir, why do you put OFED together for a kernel nobody uses? Perhaps SLES and Red Hat do it like this but nobody else. Have a look at http://en.wikipedia.org/wiki/Linux_kernel - 3.0, 3.2 and 3.4 are the long-term stable releases. This approach is worse than the approach before IMHO. Since

Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time

2013-02-04 Thread Sebastian Riemer
Hi Bart, thanks for approaching this! We're not the best mainline developers so I guess we won't be there. But we have the big SRP setups and our sysadmins really don't like reconnecting SRP hosts manually and laboriously re-adding their devices to the related dm-multipath devices. Think abou

Re: "Virtual" ibnetdiscover command fails

2013-02-06 Thread Sebastian Riemer
On 06.02.2013 10:22, Or Gerlitz wrote: > On 06/02/2013 11:17, Mathis GAVILLON wrote: >> Ok. But what is it possible to do with Infiniband VFs if QP0 is not >> available ? > > EVERYTHING, e.g run IPoIB, iSER, RDS, MPI, etc, etc - except for what > requires QP0, such as running SM or issuing SMPs fo

Re: "Virtual" ibnetdiscover command fails

2013-02-06 Thread Sebastian Riemer
On 06.02.2013 11:20, Or Gerlitz wrote: > On 06/02/2013 12:04, Mathis GAVILLON wrote: >> Just a last question : is that possible VFs lid to be different from >> PF one ? > > NO, we've implemented a "shared port" model, so all functions on the > same IB port use the same lid, each function has its o

Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time

2013-02-08 Thread Sebastian Riemer
On 08.02.2013 10:24, Sagi Grimberg wrote: > On 2/8/2013 12:42 AM, Vu Pham wrote: >> Hello Bart, >> >> Thank you for taking the initiative. >> Mellanox think that this should be discussed. We'd be happy to attend. >> >> We also would like to discuss: >> * How and how fast does SRP detect a path fail

Re: [PATCH/RFC] IPoIB: Free ipoib neigh on path record failure so path rec queries are retried

2013-02-27 Thread Sebastian Riemer
On 26.02.2013 17:55, Roland Dreier wrote: [...] > In fact I bet this is why the bug has been there as long as it has > been: almost no one is using IPv6 on IPoIB seriously, and IPv4 should > work OK as you point out. Thanks a lot, Unfortunately, we are using IPoIB with IPv6 in production for t

[RFC ib_srp-backport] ib_srp: bind fast IO failing to QP timeout

2013-03-19 Thread Sebastian Riemer
troduce qp_retry_cnt module parameter Cheers, Sebastian Btw.: Before, I've hacked MD RAID-1 for high-performance replication as DRBD is crap for our purposes. But that's worthless without a reliably working transport. From c101d00fe529d845192dd6d5930a1b9c16c99b81 Mon Sep 17 00:00:00 20

Re: [RFC ib_srp-backport] ib_srp: bind fast IO failing to QP timeout

2013-03-19 Thread Sebastian Riemer
On 19.03.2013 12:22, Or Gerlitz wrote: > On 19/03/2013 12:16, Sebastian Riemer wrote: >> Hi Bart, >> >> now I've got my priority on SRP again. > > Hi Sebastian, > > Are these patches targeted to upstream or backports to some OS/kernel? > if the former, ca

Re: [RFC ib_srp-backport] ib_srp: bind fast IO failing to QP timeout

2013-03-19 Thread Sebastian Riemer
On 19.03.2013 12:45, Bart Van Assche wrote: > On 03/19/13 11:16, Sebastian Riemer wrote: >> >> What are your thought regarding this? >> >> Attached patches: >> ib_srp: register srp_fail_rport_io as terminate_rport_io >> ib_srp: be quiet when failing SCSI comm

Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 13:12, Vasiliy Tolstov wrote: > Hello. I have some servers, with mellanox ConnectX-3 and have some questions: > Why max_mtu differs with active_mtu? Because 2048 is the default and 4096 is the max. MTU supported by the hardware. > How can i set active mtu? Something like this: echo

Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 13:51, Vasiliy Tolstov wrote: >> Something like this: >> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu > > After doing this all srp connections down and port is down. I need to > restart openibd Sorry for that! It's much easier to set the IP MTU. Managed switches su
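A minimal sketch of the sysfs write discussed in this thread, for readers who want to check a value before trying it. It only builds the command string and never touches sysfs, since the reply above reports that the write took the port and all SRP connections down. The helper names (`port_mtu_attr`, `check_ib_mtu`, `mtu_command`) are hypothetical; the path layout is the one quoted in the thread, and the list of valid sizes is the set of MTUs defined for InfiniBand.

```python
from pathlib import PurePosixPath

# MTU sizes defined for InfiniBand links, in bytes.
VALID_IB_MTUS = (256, 512, 1024, 2048, 4096)


def port_mtu_attr(device: str, port: int) -> PurePosixPath:
    """Build the mlx4 per-port MTU attribute path quoted in the thread."""
    return (PurePosixPath("/sys/class/infiniband") / device
            / "device" / f"mlx4_port{port}_mtu")


def check_ib_mtu(mtu: int) -> int:
    """Return mtu unchanged if it is a valid InfiniBand MTU, else raise."""
    if mtu not in VALID_IB_MTUS:
        raise ValueError(f"{mtu} is not an InfiniBand MTU; use one of {VALID_IB_MTUS}")
    return mtu


def mtu_command(device: str, port: int, mtu: int) -> str:
    """Return the shell command as text; nothing is written to sysfs here."""
    return f"echo {check_ib_mtu(mtu)} > {port_mtu_attr(device, port)}"
```

Calling `mtu_command("mlx4_0", 1, 4096)` reproduces the command from the thread, while an Ethernet-style value such as 9000 is rejected before anything is run.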

Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 14:49, Hal Rosenstock wrote: > On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: >> Hello. I have some servers, with mellanox ConnectX-3 and have some questions: >> Why max_mtu differs with active_mtu? > > What does peer port say for max MTU ? > >> How can i set active mtu? > > SM sets

Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 15:34, Hal Rosenstock wrote: > On 4/9/2013 9:16 AM, Sebastian Riemer wrote: >> On 09.04.2013 14:49, Hal Rosenstock wrote: >>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: >>>> Hello. I have some servers, with mellanox ConnectX-3 and have some >>>

Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 16:23, Hal Rosenstock wrote: >> So these values are exactly the same as in "ibv_devinfo" and can be set >> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu. >> >> I've found the PortInfo with the command >> "smpquery portinfo -C mlx4_0 3 1" >> where I'm using the first HCA to con

[ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-04-12 Thread Sebastian Riemer
e'll have a booth on LinuxTag in Berlin/Germany. I'll have a technical talk there about SRP: http://www.linuxtag.org/2013/en/program/thursday-may-23-2013.html?eventid=208 Cheers, Sebastian

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-06 Thread Sebastian Riemer
Hi Vasiliy, sorry for the late reply! I was ill last week. The main difference so far is that my patches are much easier to understand as I don't provide back-porting and also don't put performance improvements in between the stability fixes. I provide full test cases + scripts which makes it sim

Re: Infiniband HA

2013-05-08 Thread Sebastian Riemer
Hi Gandalf, just build up two separate fabrics. This means that you don't interconnect both switches. Otherwise, issues on one port also affect the other port. What do you use for storage? SRP? This requires dm-multipath and fast IO failing + automatic reconnect patches from Bart or from me. All

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-08 Thread Sebastian Riemer
FYI: I've released version 0.6 of my SRP patches today. The automatic reconnect is included now. The tests for that will follow in the next version. But we already did quite intensive testing for that. Hard reboot and also soft reboot of the target are possible with that reconnect. It just reconn

Re: tune ib stack

2013-05-14 Thread Sebastian Riemer
On 14.05.2013 12:02, Vasiliy Tolstov wrote: > Sorry for bumping old thread, i'm solve my problems with new firmware. > I have supermicro servers that rebrand mellanox firmware (recompile > and change some bits) > Now all works fine i have 40 gb/s QDR instead of 10 Gb/s > Thanks, sharing lesson lea

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-14 Thread Sebastian Riemer
ter reconnects and ability to close session from > initiator side under qlogic hardware, does it possible? Or this > patches only covers Mellanox cards? > > 2013/5/8 Sebastian Riemer : >> FYI: I've released version 0.6 of my SRP patches today. >> >> The automatic reconne

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-15 Thread Sebastian Riemer
On 15.05.2013 07:12, Vasiliy Tolstov wrote: > 2013/5/14 Bart Van Assche : >> The ability to close a session from the initiator side went upstream in >> kernel 3.8 (/sys/class/srp_remote_ports/port-:/delete). Regarding >> faster reconnects: please keep in mind that after a cable pull it can easily >

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-21 Thread Sebastian Riemer
On 17.05.2013 16:16, Jack Wang wrote: > unable to handle kernel paging request Hi Jack, this should be related to the list corruption in IPoIB as list_del() sets the LIST_POISON1 and LIST_POISON2 pointers. Referencing these results in page faults according to the documentation in the code. Cheer

Re: How to do replication right with SRP or remote storage?

2013-06-10 Thread Sebastian Riemer
On 08.06.2013 04:31, Bruce McKenzie wrote: > Hi Bart. > > any advice on using this fix with MD raid 1? a guide or site you know of? > > ive compiled ubuntu 13.04 to kernel 3.6.11 with OFED 2 from Mellanox, and it > works ok, performance is a little better with SRP. Some packages dont seem > to w

Re: How to do replication right with SRP or remote storage?

2013-06-10 Thread Sebastian Riemer
On 10.06.2013 14:44, Bart Van Assche wrote: > On 06/10/13 14:05, Sebastian Riemer wrote: >> Perhaps, I should collect all guys who require MD RAID-1 for remote >> storage replication in order to put some pressure on Neil. > > If I remember correctly one of the things Neil is

[PATCH] IB/srp: Maintain a single connection per I_T nexus

2013-06-12 Thread Sebastian Riemer
he 'delete' sysfs attribute of the remote port before connecting. Note: The function srp_conn_unique() has been taken from Bart Van Assche. Cc: Bart Van Assche Cc: David Dillow Cc: Vu Pham Cc: Sagi Grimberg Cc: Oren Duer Cc: Or Gerlitz Signed-off-by: Sebastian Riemer Reviewed-b

Re: [PATCH] IB/srp: Maintain a single connection per I_T nexus

2013-06-12 Thread Sebastian Riemer
t that all users are using the srp-tools. Please compare with Bart's version and let's discuss this here. https://github.com/bvanassche/ib_srp-backport/commit/7d8774ff58d489858b1c046b2bf01b4e84e8dd9b Cheers, Sebastian On 12.06.2013 13:29, Sebastian Riemer wrote: > The sysfs attribut

Re: [PATCH 01/14] IB/srp: Fix remove_one crash due to resource exhaustion

2013-06-12 Thread Sebastian Riemer
>> Signed-off-by: Dotan Barak >> Reviewed-by: Eli Cohen >> Signed-off-by: Bart Van Assche >> Cc: Roland Dreier >> Cc: David Dillow >> Cc: Vu Pham >> Cc: Sebastian Riemer >> --- >> drivers/infiniband/ulp/srp/ib_srp.c |2 ++ >>

Re: [PATCH 02/14] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()

2013-06-12 Thread Sebastian Riemer
e wrote: > Avoid that srp_claim_command() can claim a command while > srp_queuecommand() is still busy queueing the same command. > Found this via source reading. > > Signed-off-by: Bart Van Assche > Cc: Roland Dreier > Cc: David Dillow > Cc: Vu Pham > Cc: Sebastian

Re: [PATCH 03/14] IB/srp: Avoid that srp_reset_host() is skipped after a TL error

2013-06-13 Thread Sebastian Riemer
ids that the SCSI > error handler skips the srp_reset_host() call after a transport > layer error. > > Signed-off-by: Bart Van Assche > Cc: Roland Dreier > Cc: David Dillow > Cc: Vu Pham > Cc: Sebastian Riemer > --- > drivers/infiniband/ulp/srp/ib_srp.c | 11 +

Re: [PATCH 04/14] IB/srp: Skip host settle delay

2013-06-13 Thread Sebastian Riemer
e > Cc: Roland Dreier > Cc: David Dillow > Cc: Vu Pham > Cc: Sebastian Riemer > --- > drivers/infiniband/ulp/srp/ib_srp.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c > b/drivers/infiniband/ulp/srp/ib_srp.c >

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-13 Thread Sebastian Riemer
via sysfs. > > Signed-off-by: Bart Van Assche > Cc: Roland Dreier > Cc: David Dillow > Cc: Vu Pham > Cc: Sebastian Riemer > --- > drivers/infiniband/ulp/srp/ib_srp.c | 38 > +++ > 1 file changed, 38 insertions(+) > > diff --gi

Re: [PATCH] IB/srp: Maintain a single connection per I_T nexus

2013-06-13 Thread Sebastian Riemer
Bart's version also has the printing of the connection string if the double login fails. So forget about this version here. On 12.06.2013 13:51, Sebastian Riemer wrote: > Hi all, > > as proposed by Or, let's discuss this on the mailing list. > > This is a fundam

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-13 Thread Sebastian Riemer
On 13.06.2013 17:07, Bart Van Assche wrote: [...] > The "%.*s" should only copy the data provided by the user, even if it > is not '\0' terminated. Stripping the trailing newline is probably > possible with something like the (untested) code below (will only work > if there is only one newline in t
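The idea in the quoted reply (print at most the bytes the user supplied, and drop a single trailing newline) can be sketched outside the kernel. This is an illustration in Python, not the untested C code Bart refers to; the function name `login_string` and the sample strings are hypothetical.

```python
def login_string(buf: bytes, count: int) -> str:
    """Return the user-supplied text, bounded by count, minus one trailing newline.

    Mirrors the effect of printf's "%.*s": never look past `count` bytes,
    and stop at the first NUL byte if there is one.
    """
    data = buf[:count]
    nul = data.find(b"\0")
    if nul != -1:
        data = data[:nul]
    # Strip exactly one trailing newline, as in the quoted suggestion;
    # a second newline would be left in place.
    if data.endswith(b"\n"):
        data = data[:-1]
    return data.decode("ascii", errors="replace")
```

With a buffer such as `b"a=1,b=2\nXXXX"` and `count=8`, only `a=1,b=2` is returned: the bytes after the user's data are never read, and the newline that `echo` appends is removed.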

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-14 Thread Sebastian Riemer
On 14.06.2013 01:27, Vu Pham wrote: > Bart Van Assche wrote: >> On 06/13/13 19:50, Vu Pham wrote: >>> Hello Bart, >>> +/** + * srp_conn_unique() - check whether the connection to a target is unique + */ +static bool srp_conn_unique(struct >>

IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-19 Thread Sebastian Riemer
changed - so that the iSCSI host number couldn't be found. After fixing that, it worked for me. Cheers, Sebastian -- Sebastian Riemer Linux Kernel Developer ProfitBricks GmbH Greifswalder Str. 207 10405 Berlin, Germany Tel.:  +49 - 30 - 51 64 09 20 Fax:   +49 - 30 - 51 64 09 22

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-20 Thread Sebastian Riemer
2011/12/20 Or Gerlitz : > > Beep, I'd like to better/understand the problem before looking on your > struggle for solution... > I understand that your Debian system runs kernel 3.0 - however, you didn't > say what version of the iscsi initiator utils is provided with that distro > nor what were the

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-20 Thread Sebastian Riemer
2011/12/20 Or Gerlitz : > > Beep(2), so your system has distro which is based on kernel 2.6.32 and iscsi > initiator tools version 2.0.871 and per your needs, you've booted it with > kernel 3.0 . > > At this point should you have stop and make sure that this combo works, > iscsi wise (simpler to te

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-20 Thread Sebastian Riemer
> >> Would it help, if we provide our patches for open-iscsi and IB/iSER> >> 2.6.32 to bring that into mainline OFED? > > As Or notes, OFED is providing the kernel modules more than the iscsi code > drop.  Would be better for all (cough cough) to push changes back to the > iscsi initiator maintai

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-20 Thread Sebastian Riemer
2011/12/20 Or Gerlitz : > > horses, please, stay at home, or at least run a little bit slower, > just for you - from 2 minutes > ago - iser works well with 3.2.0-rc5 (its say -dirty b/c its a > development system and the kernel has some patches, but not iser ones) > and iscsi-initiator-utils of 6.2

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-21 Thread Sebastian Riemer
> you wrote long emails, I'm asking for one concrete example for that enum > crunching  of adding entries > not at the end, can you, please? I've meant e.g. the iscsi tasks in libiscsi.h between 2.6.30 and 2.6.32. But I've meant this for OFED and not the mainline kernel. 2.6.30: enum { I

Re: IB/iSER with Linux 3.0 and Debian: Lesson learned

2011-12-21 Thread Sebastian Riemer
2011/12/21 Or Gerlitz : > I tested the upstream kernel iser against the upstream iscsi tools  from > git://github.com/mikechristie/open-iscsi > (commit 4323e342d2c9fb8ed7233ce855001c189ec55b23), it works > To bring this to an end: I believe you. Most likely I had that much trouble because of the O

IB/iSER major problems with Linux 3.0 and Solaris targets

2012-01-11 Thread Sebastian Riemer
part and the iscsid log. This looks interesting: "iser: iser_drain_tx_cq:tx id 88402391f898 status 4 vend_err 57" Or, could you please investigate/explain? It is a pain that we need both: working iSER and IPoIB traffic with good performance. Cheers, Sebastian On 19/12/11 10:14, Sebastian Riem

Re: IB/iSER major problems with Linux 3.0 and Solaris targets

2012-01-12 Thread Sebastian Riemer
On 12/01/12 10:29, Or Gerlitz wrote: > If you have build the kernel IB user space support (uverbs) and the > IB libs, do "ibv_devinfo" if not, just ossi "cat > /sys/class/infiniband/mlx4_0/*" and send the output. To be clear, iser > does work for you on the productive servers but not on this serve

Re: IB/iSER major problems with Linux 3.0 and Solaris targets

2012-01-12 Thread Sebastian Riemer
On 12/01/12 11:16, Sebastian Riemer wrote: > On 12/01/12 10:29, Or Gerlitz wrote: > >> If you have build the kernel IB user space support (uverbs) and the >> IB libs, do "ibv_devinfo" if not, just ossi "cat >> /sys/class/infiniband/mlx4_0/*" and sen

Re: IB/iSER major problems with Linux 3.0 and Solaris targets

2012-01-16 Thread Sebastian Riemer
On 12/01/12 17:14, Or Gerlitz wrote: > > you didn't send the kernel logs from the failure after opening the iser > (debug_level=2) and libiscsi (debug_libiscsi_session=1 > debug_libiscsi_conn=1) debug prints OK, I've also set mlx4_core debug_level=2 and have verified in /sys/module that the para

Re: IB/iSER problems with Linux 3.0

2012-01-17 Thread Sebastian Riemer
On 16/01/12 22:16, Or Gerlitz wrote: > Sebastian, I asked for the **iser** (ib_iser) and not mlx4_core debug_level=2 > Yes, I did! I've enabled that additionally. And I've checked these settings in /sys/module/*/parameters. They were set. The libiscsi from OFED had only the option "debug_libisc

Re: IB/iSER problems with Linux 3.0

2012-01-19 Thread Sebastian Riemer
On 17/01/12 15:56, Or Gerlitz wrote: > could you try and patch your 3.0.15 kernel with commit > 52439540ea30396982b69662dd21aede6b336288 "IB/iser: DMA unmap TX bufs > used for iSCSI/iSER headers" from upstream, this could help here. > Hi Or, unfortunately, just cherry-picking that commit didn't do

Solved: IB/iSER problems with Linux 3.0

2012-01-19 Thread Sebastian Riemer
On 19/01/12 13:18, Or Gerlitz wrote: >> [...] >> Or Gerlitz (4): >> IB/iser: Fix wrong mask when sizeof (dma_addr_t) > sizeof >> (unsigned long) >> IB/iser: Support iSCSI PDU padding >> IB/iser: Use separate buffers for the login request/response >> IB/iser: DMA unmap TX buf

Re: OFED 1.5.4.1 on Ubuntu 10.04 with Mellanox cards?

2012-06-22 Thread Sebastian Riemer
Hi Chet, the trick is to check out the latest pkg-ofed source from debian SVN (svn://svn.debian.org/svn/pkg-ofed/) and to update the upstream source by merging the stuff by extracting the source RPMs or even better by importing the source directly from the git repos of the OFED user space. In the

Re: OFED 1.5.4.1 on Ubuntu 10.04 with Mellanox cards?

2012-06-25 Thread Sebastian Riemer
Hi Chet, On 22/06/12 21:02, Chet Murthy wrote: > > Sebastian, > > Thank you for taking the time to explain these things! It's a little > confusing > >> Here a simple list of matching code: >> OFED-1.5.4 ---> kernel 3.2.x >> OFED-1.5.4.1 ---> kernel 3.3.x > > (1) Is there a more-exhausti

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-17 Thread Sebastian Riemer
On 14.06.2013 19:07, Vu Pham wrote: [...] >> For what do you need the same target with multiple pkeys on the same >> local SRP port? >> > There is no need, it's just a gray area that you can choose to have > multiple connections to same target using different pkeys (same as dgid) >> Which other

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-17 Thread Sebastian Riemer
On 17.06.2013 09:29, Bart Van Assche wrote: > On 06/17/13 09:14, Hannes Reinecke wrote: >> On 06/17/2013 09:04 AM, Bart Van Assche wrote: >>> I agree that the value of fast_io_fail_tmo should be kept small. >>> Although as you explained changing the SCSI device state into >>> SDEV_BLOCK doesn't hel

Re: [PATCH 01/14] IB/srp: Fix remove_one crash due to resource exhaustion

2013-06-28 Thread Sebastian Riemer
On 28.06.2013 01:45, Roland Dreier wrote: > On Thu, Jun 27, 2013 at 2:01 PM, David Dillow wrote: >> On Wed, 2013-06-12 at 15:20 +0200, Bart Van Assche wrote: >>> If the add_one callback fails during driver load no resources are >>> allocated so there isn't a need to release any resources. Trying >

Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()

2013-06-28 Thread Sebastian Riemer
On 28.06.2013 14:48, Bart Van Assche wrote: > Avoid that srp_claim_command() can claim a command while > srp_queuecommand() is still busy queueing the same command. > Found this via source reading. Nice, that's much less re-acquiring of the target lock in error case in srp_queuecommand(). But if w

Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()

2013-06-28 Thread Sebastian Riemer
On 28.06.2013 16:51, Bart Van Assche wrote: >> Nice, that's much less re-acquiring of the target lock in error case in >> srp_queuecommand(). >> But if we have to change that many locations for srp_put_tx_iu() anyway, >> wouldn't it make sense to rename it into __srp_put_tx_iu() as well? >> >> Then

Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline

2013-07-01 Thread Sebastian Riemer
On 28.06.2013 14:49, Bart Van Assche wrote: > If reconnecting failed we know that no command completion will > be received anymore. Hence let the SCSI error handler fail such > commands immediately. > > Signed-off-by: Bart Van Assche > Cc: Roland Dreier > Cc: David Di

Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline

2013-07-01 Thread Sebastian Riemer
On 28.06.2013 14:49, Bart Van Assche wrote: > If reconnecting failed we know that no command completion will > be received anymore. Hence let the SCSI error handler fail such > commands immediately. > > Signed-off-by: Bart Van Assche > Cc: Roland Dreier > Cc: David Di

Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline

2013-07-01 Thread Sebastian Riemer
On 01.07.2013 13:33, Bart Van Assche wrote: >>> --- a/drivers/infiniband/ulp/srp/ib_srp.c >>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c >>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd) >>> if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, >>>

Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline

2013-07-01 Thread Sebastian Riemer
On 01.07.2013 13:38, Bart Van Assche wrote: >>> --- a/drivers/infiniband/ulp/srp/ib_srp.c >>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c >>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd) >>> if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, >>>

Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline

2013-07-02 Thread Sebastian Riemer
On 28.06.2013 14:49, Bart Van Assche wrote: > If reconnecting failed we know that no command completion will > be received anymore. Hence let the SCSI error handler fail such > commands immediately. Acked-by: Sebastian Riemer

Re: [PATCH] IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline

2013-07-10 Thread Sebastian Riemer
rt() return FAST_IO_FAIL instead of SUCCESS. > > Signed-off-by: Bart Van Assche > Reported-by: Sebastian Riemer > Cc: David Dillow > Cc: Roland Dreier > Cc: Vu Pham > --- > drivers/infiniband/ulp/srp/ib_srp.c |3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) >

OpenSM 3.3.16 at 100% CPU load, "console off"

2013-10-09 Thread Sebastian Riemer
Hi Hal, we've encountered an issue with OpenSM 3.3.16 and the config option "console off". OpenSM processes are at 100% CPU load. From strace: poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}]) read(0, "", 4096) = 0 poll([{fd=0, events=POLLIN}], 1, 1000)
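The strace pattern above (poll reports fd 0 readable, read returns 0 bytes, repeat) is what any poll loop does when its input is at end-of-file, which would explain the CPU load. The sketch below reproduces that pattern on Linux under the assumption that stdin behaves like /dev/null, as is common for daemons; the thread does not confirm what fd 0 pointed to. The function name `spin_on_eof` is hypothetical and this is not OpenSM code.

```python
import os
import select


def spin_on_eof(path="/dev/null", rounds=3, timeout_ms=1000):
    """Poll a descriptor that is always at EOF and record what each round sees.

    Returns a list of (readable, bytes_read) tuples, one per round.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        results = []
        for _ in range(rounds):
            # Returns immediately instead of waiting timeout_ms:
            # a descriptor at EOF always counts as readable.
            events = poller.poll(timeout_ms)
            readable = any(f == fd and ev & select.POLLIN for f, ev in events)
            data = os.read(fd, 4096) if readable else b""
            results.append((readable, len(data)))
        return results
    finally:
        os.close(fd)
```

Every round comes back readable with zero bytes, so the 1000 ms timeout never applies. A loop that does not treat the zero-length read as end-of-file (for example by unregistering the descriptor) will spin at full speed.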

Re: OpenSM 3.3.16 at 100% CPU load, "console off"

2013-10-09 Thread Sebastian Riemer
On 09.10.2013 15:30, David Dillow wrote: > On Wed, 2013-10-09 at 09:28 -0400, Hal Rosenstock wrote: >>> From strace: >>> poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}]) >>> read(0, "", 4096) = 0 >>> poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, r

Re: OpenSM 3.3.16 at 100% CPU load, "console off"

2013-10-09 Thread Sebastian Riemer
On 09.10.2013 16:00, Hal Rosenstock wrote: > Do you recall the sequence to get to this ? > > Was console option changed to off and then OpenSM SIGHUP'd ? Something > else ? > > Is this reproducible ? Yes, now I can reproduce it. The opensm has been initially started with "console off" and I act

Re: OpenSM 3.3.16 at 100% CPU load, "console off"

2013-10-09 Thread Sebastian Riemer
On 09.10.2013 17:15, Hal Rosenstock wrote: > What does service restart do in terms of OpenSM ? > > Note that the console parameter is _not_ changeable "on the fly" right > now so if OpenSM is being SIGHUP'd by service restart then this is a > current limitation (and is clearly not detected/protect

Re: SRP initiator driver maintainership

2014-01-21 Thread Sebastian Riemer
On 21.01.2014 11:03, Sagi Grimberg wrote: > On 1/20/2014 7:37 PM, Bart Van Assche wrote: >> On 01/03/14 22:16, David Dillow wrote: >>> Today was my last day at ORNL, and my future endeavors will leave even >>> less time to maintain the SRP initiator. >>> >>> My thanks especially go to Bart, for kee

IB/srp: merge fixes from MLNX_OFED

2014-02-18 Thread Sebastian Riemer
Hi Sagi, is that "/mswg/git/mlnx_ofed/mlnx-ofed-2.x-kernel.git" tree from the MLNX_OFED public by any chance? There are fixes included relevant for the mainline. It would be strange if I sent the patches, as somebody at Mellanox discovered and fixed the issues. I've hit a kernel panic today du

Re: [PATCH 1/6] scsi_transport_srp: Fix two kernel-doc warnings

2014-02-20 Thread Sebastian Riemer
ct/union/enum/typedef member 'deleted' description in 'srp_rport' > > Signed-off-by: Bart Van Assche > Reported-by: Masanari Iida > Cc: Sagi Grimberg > Cc: Sebastian Riemer > Cc: James Bottomley > Cc: Roland Dreier > --- > drivers/scsi/scsi_transpor

Re: [PATCH v1 1/3] IB/srp: Fix crash when unmapping data loop

2014-02-24 Thread Sebastian Riemer
On 24.02.2014 15:30, Sagi Grimberg wrote: > When unmapping request data, it is unsafe automatically > decrement req->nfmr regardless of it's value. This may > happen since IO and reconnect flow may run concurrently > resulting in req->nfmr = -1 and falsely call ib_fmr_pool_unmap. Something is stil