Re: [PATCH 1/4] scsi: move initialization of scmd->eh_entry

2013-05-24 Thread Jörn Engel
On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote: > > The 'eh_entry' list might be used even before scsi_softirq_done() > is called. Hence we should rather initialize it together with > the other eh-related variables. > > Signed-off-by: Hannes Reinecke > --- > drivers/scsi/scsi_lib.c |

Re: [PATCH] sg: atomize check and set sdp->exclude in sg_open

2013-06-05 Thread Jörn Engel
On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote: > > Check and set sdp->exclude should be atomic when set in sg_open(). The patch is line-wrapped. More importantly, it doesn't seem to do what your description indicates it should do. And lastly, does this fix a bug, possibly even one you have

Re: [PATCH] sg: atomize check and set sdp->exclude in sg_open

2013-06-05 Thread Jörn Engel
On Thu, 6 June 2013 00:16:45 +0800, vaughan wrote: > 于 2013年06月05日 21:27, Jörn Engel 写道: > >On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote: > >> > >>Check and set sdp->exclude should be atomic when set in sg_open(). > > > >The patch is line-wrapped

Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler

2013-06-05 Thread Jörn Engel
On Sat, 25 May 2013 11:55:58 +0200, Hannes Reinecke wrote: > > > It should be possible to move the code into scsi_lib and just have > small hooks for the individual transports to use this. We have done something a bit tasteless and unconditionally enabled your new error handling for all scsi drive

Re: [PATCH 1/4] scsi: move initialization of scmd->eh_entry

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:52 +0200, Hannes Reinecke wrote: > > The 'eh_entry' list might be used even before scsi_softirq_done() > is called. Hence we should rather initialize it together with > the other eh-related variables. > > Signed-off-by: Hannes Reinecke Tested-by: Joern Engel Jörn --

Re: [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:53 +0200, Hannes Reinecke wrote: > > Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate > that a delayed error recovery has been initiated. > > Signed-off-by: Hannes Reinecke Since the patch I tested diverged from your patches 2-4, this is only Acked-by:

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote: > > When a command runs into a timeout we need to send an 'ABORT TASK' > TMF. This is typically done by the 'eh_abort_handler' LLDD callback. > > Conceptually, however, this function is a normal SCSI command, so > there is no need to ente

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-06 Thread Jörn Engel
On Thu, 6 June 2013 22:39:14 +0200, Hannes Reinecke wrote: > > >>+ spin_unlock_irqrestore(&sdev->list_lock, flags); > >>+ SCSI_LOG_ERROR_RECOVERY(3, > >>+ scmd_printk(KERN_INFO, scmd, > >>+ "aborting command %p\n", scmd)); > >>+

Re: [PATCH 0/4] New SCSI command timeout handler

2013-06-07 Thread Jörn Engel
On Fri, 7 June 2013 14:54:16 +0800, Ren Mingxin wrote: > On 06/06/2013 05:43 PM, Hannes Reinecke wrote: > > This approach may not work for some LLDDs as you said, but I wonder > whether SAS is applicable(whether there will be later patches for > SAS). We have tested a derived patch on SAS and it

Re: [PATCH 3/4] scsi: improved eh timeout handler

2013-06-07 Thread Jörn Engel
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote: > > + spin_lock_irqsave(&sdev->list_lock, flags); > + list_for_each_entry_safe(scmd, tmp, &sdev->eh_abort_list, eh_entry) { > + list_del_init(&scmd->eh_entry); > + spin_unlock_irqrestore(&sdev->list_lock, fl

Re: [PATCH 1/9] target: Add transport_cmd_check_stop write_pending bit

2013-06-07 Thread Jörn Engel
On Fri, 7 June 2013 21:34:16 +, Nicholas A. Bellinger wrote: > > -static int transport_cmd_check_stop(struct se_cmd *cmd, bool > remove_from_lists) > +static int transport_cmd_check_stop(struct se_cmd *cmd, bool > remove_from_lists, > + bool write_pending) ..

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 11:00:49 +0200, Hannes Reinecke wrote: > On 06/10/2013 10:20 AM, Christoph Hellwig wrote: > > > > Why can't we use a work item per command? Linking things into a list > > just to queue it up to workqueues missed half of the point of the > > workqueue infrastructure. > > > Hm

Re: [PATCH 8/9] mpt2sas: Enable new EH timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 09:40:57 +0200, Hannes Reinecke wrote: > > Signed-off-by: Hannes Reinecke > --- > drivers/scsi/mpt2sas/mpt2sas_scsih.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c > b/drivers/scsi/mpt2sas/mpt2sas_scsih.c > inde

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 09:40:52 +0200, Hannes Reinecke wrote: > + > + spin_lock_irqsave(&sdev->list_lock, flags); > + if (list_empty(&sdev->eh_abort_list)) > + kick_worker = 1; > + list_add(&scmd->eh_entry, &sdev->eh_abort_list); > + spin_unlock_irqrestore(&sdev->list_lock

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-10 Thread Jörn Engel
On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote: > > I don't care too much whether we use per-command work items or a > single system-global thread. Actually, I do care. We have to abort the commands in parallel, as a fairly large number of abort can queue up and individ

Re: [PATCH 3/9] scsi: improved eh timeout handler

2013-06-11 Thread Jörn Engel
On Tue, 11 June 2013 08:18:51 +0200, Hannes Reinecke wrote: > On 06/11/2013 01:24 AM, Jörn Engel wrote: > > On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote: > >> > >> I don't care too much whether we use per-command work items or a > >> single syst

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-01 Thread Jörn Engel
On Mon, 1 July 2013 08:50:48 +0200, Hannes Reinecke wrote: > > This patchset implements a new 'eh_deadline' attribute to the > SCSI host. It will limit the overall SCSI EH runtime by a given > timeout. If the timeout is reached all intermediate EH steps > will be skipped and host reset will be sch

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-01 Thread Jörn Engel
On Mon, 1 July 2013 19:23:25 +, James Bottomley wrote: > On Mon, 2013-07-01 at 13:44 -0400, Jörn Engel wrote: > > If a single device is bad, don't ever do a host > > reset. > > This isn't a tenable position. Sometimes a device looks bad because the > host

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: > > I don't understand what you're getting at. In a dual HBA situation, > whether the second HBA is implicated or not depends on configuration and > what the first HBA is doing. If it's just passively lost device state, > then the second

Re: [PATCHv2 0/7] Limit overall SCSI EH runtime

2013-07-02 Thread Jörn Engel
On Tue, 2 July 2013 16:33:40 +, James Bottomley wrote: > On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote: > > On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote: > > > > > > I don't understand what you're getting at. In a dual HBA situa

Re: [PATCH v2 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-05 Thread Jörn Engel
Sorry about replying so late. On Mon, 17 June 2013 21:10:53 +0800, vaughan wrote: > > Rewrite the last patch. > Add a new field 'toopen' in sg_device to count ongoing sg_open's. By checking > both 'toopen' and 'exclude' marks when do exclusive open, old race conditions > can be avoided. > Repla

Re: [PATCH v2 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-15 Thread Jörn Engel
On Sun, 7 July 2013 01:24:44 +0800, vaughan wrote: > > Do you agree that I use a per device spin_lock 'sfd_lock' to protect > sfds list and leave sg_index_lock > only protect the global sg device lookup? I think it's reasonable > for concurrency. That bit looks fine to me. I don't think it matte

Re: [PATCH v3 1/1] [SCSI] sg: fix race condition when do exclusive open

2013-07-15 Thread Jörn Engel
On Mon, 8 July 2013 03:53:49 +0800, vaughan wrote: > > Use rwsem to aid opens. Exclusive open has to get write lock and > non-exclusive open should get read lock. > Replace global sg_open_exclusive_lock with a per device lock - sfd_lock. > Since sfds list is now protected by the lock owned by th

Re: [PATCH v4 1/4] [SCSI] sg: use rwsem to solve race during exclusive open

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:03 +0800, Vaughan Cao wrote: > Date: Wed, 17 Jul 2013 23:34:03 +0800 > From: Vaughan Cao > To: jo...@logfs.org > Cc: dgilb...@interlog.com, jbottom...@parallels.com, > linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org, > vaughan@oracle.com > Subjec

Re: [PATCH v4 4/4] [SCSI] sg: push file descriptor list locking down to per-device locking

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:06 +0800, Vaughan Cao wrote: > Date: Wed, 17 Jul 2013 23:34:06 +0800 > From: Vaughan Cao > To: jo...@logfs.org > Cc: dgilb...@interlog.com, jbottom...@parallels.com, > linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org, > vaughan@oracle.com > Subjec

Re: [PATCH v4 2/4] [SCSI] sg: no need sg_open_exclusive_lock

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:04 +0800, Vaughan Cao wrote: > Date: Wed, 17 Jul 2013 23:34:04 +0800 > From: Vaughan Cao > To: jo...@logfs.org > Cc: dgilb...@interlog.com, jbottom...@parallels.com, > linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org, > vaughan@oracle.com > Subjec

Re: [PATCH v4 3/4] [SCSI] sg: checking sdp->detached isn't protected when open

2013-07-19 Thread Jörn Engel
On Wed, 17 July 2013 23:34:05 +0800, Vaughan Cao wrote: > > -static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev); > +static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev, int * reason); You can use ERR_PTR and friends instead of adding another parameter. > static void sg_remove_sfp(struct kref *); >

[PATCH] mpt2sas: don't handle broadcast primitives

2013-07-19 Thread Jörn Engel
The handling of broadcast primitives involves _scsih_block_io_all_device(), which does what the name implies. I have observed cases with >60s of blocking io on all devices, caused by a single bad device. The downsides of this code are obvious, while the upsides are more elusive. Signed-off-by: J

Re: [PATCH] mpt2sas: don't handle broadcast primitives

2013-07-19 Thread Jörn Engel
On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote: > > The handling of broadcast primitives involves > _scsih_block_io_all_device(), which does what the name implies. I have > observed cases with >60s of blocking io on all devices, caused by a > single bad device. The down

Re: [PATCH v4 3/4] [SCSI] sg: checking sdp->detached isn't protected when open

2013-07-22 Thread Jörn Engel
On Sat, 20 July 2013 15:53:16 +0800, Xitao Cao wrote: > > Thanks for your comment. Do I need to update it and resend? Yes, please. Jörn -- When people work hard for you for a pat on the back, you've got to give them that pat. -- Robert Heinlein -- To unsubscribe from this list: send the line "u

Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open

2013-07-22 Thread Jörn Engel
On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote: > > There is a race when open sg with O_EXCL flag. Also a race may happen between > sg_open and sg_remove. > > Changes from v4: > * [3/4] use ERR_PTR series instead of adding another parameter in sg_add_sfp > * [4/4] fix conflict for cherr

Re: [PATCH] mpt2sas: don't handle broadcast primitives

2013-07-24 Thread Jörn Engel
On Wed, 24 July 2013 23:42:22 +0300, Baruch Even wrote: > On Sat, Jul 20, 2013 at 1:11 AM, Jörn Engel wrote: > > On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote: > >> > >> The handling of broadcast primitives involves > >> _scsih_block_io_all_device(),

[RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-27 Thread Jörn Engel
Here is a fun patch in an early state. Essentially I want to trace scsi commands, which has already been done long ago. The problem I have is that I care about all the scsi commands for one particular device - without knowing in advance which device it will be. Once I know the device in question

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-27 Thread Jörn Engel
On Tue, 27 August 2013 19:54:22 -0400, Steven Rostedt wrote: > On Tue, 27 Aug 2013 18:03:25 -0400 > Jörn Engel wrote: > > > Here is a fun patch in an early state. Essentially I want to trace > > scsi commands, which has already been done long ago. The problem I > >

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Tue, 27 August 2013 20:34:43 -0400, Steven Rostedt wrote: > On Tue, 27 Aug 2013 18:41:47 -0400 > Jörn Engel wrote: > > > > "without knowing in advance which device it will be" > > > > I have to trace all devices, because I don't know the interesti

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Tue, 27 August 2013 22:09:19 -0700, Nicholas A. Bellinger wrote: > On Tue, 2013-08-27 at 18:17 -0400, Jörn Engel wrote: > > Here is a fun patch in an early state. Essentially I want to trace > > scsi commands, which has already been done long ago. The problem I > > have

Re: [RFC][PATCH] Add a flight data recorder for scsi commands

2013-08-28 Thread Jörn Engel
On Wed, 28 August 2013 14:45:14 +0200, Bart Van Assche wrote: > > Isn't this facility something that overlaps with what Wireshark can > already do today ? Wireshark can already capture SCSI traces for > several transport types (iSCSI, USB, ...). For other transports, > e.g. FC, I'd prefer to see W

Re: [PATCHv5 0/9] New EH command timeout handler

2013-09-03 Thread Jörn Engel
On Mon, 2 September 2013 02:31:29 -0700, Christoph Hellwig wrote: > > I would defintively love enabling it globally. Agreed. I carry a private patch based on an older version of Hannes' patchset doing just that. Istr there was some problem exactly once on a VM with a virtualized hba. We hit pr

[PATCH] mpt2sas: prevent double free on error path

2013-01-23 Thread Jörn Engel
I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before that). Both _scsih_probe_boot_devices() and _scsih_sas_device_add() put the sas_device onto a list, thereby giving up control. Next they call mpt2sas_tr

Re: [PATCH] mpt2sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
On Thu, 24 January 2013 08:51:20 +0100, Bjørn Mork wrote: > > How about the copy of this code in drivers/scsi/mpt3sas/mpt3sas_scsih.c? > Is that safe, or does it need fixing as well? Well spotted, that appears to suffer from the same ailment. Will cook up a second patch for that. Are there any

[PATCH v2] mpt2sas/mpt3sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
Changes since v1: - Also fixes mpt3sas, thanks to Bjørn Mork. - Adds a missing put_sas_device to _scsih_probe_boot_devices - Added a newline - Moved a kref_get outside the spinlock I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a

[PATCH v3] mpt2sas/mpt3sas: prevent double free on error path

2013-01-25 Thread Jörn Engel
Changes since v1: - Adds another missing put_sas_device to _scsih_probe_boot_devices, proving the need for more coffee before sending out patches. I noticed this one when list_del was called with poisoned list pointers, but the real problem is a double-free (and a use-after-free just before that

[PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-23 Thread Jörn Engel
At least two slightly different paths can lead to a NULL pointer dereference in scsi_prep_state_check and have been hit in practice. 1. Call Trace: [] scsi_setup_blk_pc_cmnd+0x2b/0x170 [] sd_prep_fn+0x568/0xdd0 [] blk_peek_request+0xb4/0x240 [] scsi_request_fn+0x43e/0x4a0 [] __blk_run_queue+0x

Re: [PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-23 Thread Jörn Engel
On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote: > On Mon, 2012-07-23 at 13:32 -0400, Jörn Engel wrote: > > At least two slightly different paths can lead to a NULL pointer > > dereference in scsi_prep_state_check and have been hit in practice. > > Have you

Re: [PATCH] scsi_lib: add NULL check to scsi_setup_blk_pc_cmnd

2012-07-25 Thread Jörn Engel
On Tue, 24 July 2012 09:01:41 +0400, James Bottomley wrote: > On Mon, 2012-07-23 at 15:24 -0400, Jörn Engel wrote: > > On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote: > > > > > > Have you checked this with the patches in scsi-misc? There's a serie