On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote:
>
> The 'eh_entry' list might be used even before scsi_softirq_done()
> is called. Hence we should rather initialize it together with
> the other eh-related variables.
>
> Signed-off-by: Hannes Reinecke
> ---
> drivers/scsi/scsi_lib.c |
On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote:
>
> Check and set sdp->exclude should be atomic when set in sg_open().
The patch is line-wrapped. More importantly, it doesn't seem to do
what your description indicates it should do. And lastly, does this
fix a bug, possibly even one you have
On Thu, 6 June 2013 00:16:45 +0800, vaughan wrote:
> 于 2013年06月05日 21:27, Jörn Engel 写道:
> >On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote:
> >>
> >>Check and set sdp->exclude should be atomic when set in sg_open().
> >
> >The patch is line-wrapped
On Sat, 25 May 2013 11:55:58 +0200, Hannes Reinecke wrote:
> >
> It should be possible to move the code into scsi_lib and just have
> small hooks for the individual transports to use this.
We have done something a bit tasteless and unconditionally enabled
your new error handling for all scsi drive
On Thu, 6 June 2013 11:43:52 +0200, Hannes Reinecke wrote:
>
> The 'eh_entry' list might be used even before scsi_softirq_done()
> is called. Hence we should rather initialize it together with
> the other eh-related variables.
>
> Signed-off-by: Hannes Reinecke
Tested-by: Joern Engel
Jörn
--
On Thu, 6 June 2013 11:43:53 +0200, Hannes Reinecke wrote:
>
> Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate
> that a delayed error recovery has been initiated.
>
> Signed-off-by: Hannes Reinecke
Since the patch I tested diverged from your patches 2-4, this is only
Acked-by:
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote:
>
> When a command runs into a timeout we need to send an 'ABORT TASK'
> TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
>
> Conceptually, however, this function is a normal SCSI command, so
> there is no need to ente
On Thu, 6 June 2013 22:39:14 +0200, Hannes Reinecke wrote:
>
> >>+ spin_unlock_irqrestore(&sdev->list_lock, flags);
> >>+ SCSI_LOG_ERROR_RECOVERY(3,
> >>+ scmd_printk(KERN_INFO, scmd,
> >>+ "aborting command %p\n", scmd));
> >>+
On Fri, 7 June 2013 14:54:16 +0800, Ren Mingxin wrote:
> On 06/06/2013 05:43 PM, Hannes Reinecke wrote:
>
> This approach may not work for some LLDDs as you said, but I wonder
> whether SAS is applicable(whether there will be later patches for
> SAS).
We have tested a derived patch on SAS and it
On Thu, 6 June 2013 11:43:54 +0200, Hannes Reinecke wrote:
>
> + spin_lock_irqsave(&sdev->list_lock, flags);
> + list_for_each_entry_safe(scmd, tmp, &sdev->eh_abort_list, eh_entry) {
> + list_del_init(&scmd->eh_entry);
> + spin_unlock_irqrestore(&sdev->list_lock, fl
On Fri, 7 June 2013 21:34:16 +, Nicholas A. Bellinger wrote:
>
> -static int transport_cmd_check_stop(struct se_cmd *cmd, bool
> remove_from_lists)
> +static int transport_cmd_check_stop(struct se_cmd *cmd, bool
> remove_from_lists,
> + bool write_pending)
..
On Mon, 10 June 2013 11:00:49 +0200, Hannes Reinecke wrote:
> On 06/10/2013 10:20 AM, Christoph Hellwig wrote:
> >
> > Why can't we use a work item per command? Linking things into a list
> > just to queue it up to workqueues missed half of the point of the
> > workqueue infrastructure.
> >
> Hm
On Mon, 10 June 2013 09:40:57 +0200, Hannes Reinecke wrote:
>
> Signed-off-by: Hannes Reinecke
> ---
> drivers/scsi/mpt2sas/mpt2sas_scsih.c | 11 +++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> inde
On Mon, 10 June 2013 09:40:52 +0200, Hannes Reinecke wrote:
> +
> + spin_lock_irqsave(&sdev->list_lock, flags);
> + if (list_empty(&sdev->eh_abort_list))
> + kick_worker = 1;
> + list_add(&scmd->eh_entry, &sdev->eh_abort_list);
> + spin_unlock_irqrestore(&sdev->list_lock
On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote:
>
> I don't care too much whether we use per-command work items or a
> single system-global thread.
Actually, I do care. We have to abort the commands in parallel, as a
fairly large number of abort can queue up and individ
On Tue, 11 June 2013 08:18:51 +0200, Hannes Reinecke wrote:
> On 06/11/2013 01:24 AM, Jörn Engel wrote:
> > On Mon, 10 June 2013 11:19:16 -0400, Jörn Engel wrote:
> >>
> >> I don't care too much whether we use per-command work items or a
> >> single syst
On Mon, 1 July 2013 08:50:48 +0200, Hannes Reinecke wrote:
>
> This patchset implements a new 'eh_deadline' attribute to the
> SCSI host. It will limit the overall SCSI EH runtime by a given
> timeout. If the timeout is reached all intermediate EH steps
> will be skipped and host reset will be sch
On Mon, 1 July 2013 19:23:25 +, James Bottomley wrote:
> On Mon, 2013-07-01 at 13:44 -0400, Jörn Engel wrote:
> > If a single device is bad, don't ever do a host
> > reset.
>
> This isn't a tenable position. Sometimes a device looks bad because the
> host
On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote:
>
> I don't understand what you're getting at. In a dual HBA situation,
> whether the second HBA is implicated or not depends on configuration and
> what the first HBA is doing. If it's just passively lost device state,
> then the second
On Tue, 2 July 2013 16:33:40 +, James Bottomley wrote:
> On Tue, 2013-07-02 at 10:58 -0400, Jörn Engel wrote:
> > On Tue, 2 July 2013 06:37:05 +, James Bottomley wrote:
> > >
> > > I don't understand what you're getting at. In a dual HBA situa
Sorry about replying so late.
On Mon, 17 June 2013 21:10:53 +0800, vaughan wrote:
>
> Rewrite the last patch.
> Add a new field 'toopen' in sg_device to count ongoing sg_open's. By checking
> both 'toopen' and 'exclude' marks when do exclusive open, old race conditions
> can be avoided.
> Repla
On Sun, 7 July 2013 01:24:44 +0800, vaughan wrote:
>
> Do you agree that I use a per device spin_lock 'sfd_lock' to protect
> sfds list and leave sg_index_lock
> only protect the global sg device lookup? I think it's reasonable
> for concurrency.
That bit looks fine to me. I don't think it matte
On Mon, 8 July 2013 03:53:49 +0800, vaughan wrote:
>
> Use rwsem to aid opens. Exclusive open has to get write lock and
> non-exclusive open should get read lock.
> Replace global sg_open_exclusive_lock with a per device lock - sfd_lock.
> Since sfds list is now protected by the lock owned by th
On Wed, 17 July 2013 23:34:03 +0800, Vaughan Cao wrote:
> Date: Wed, 17 Jul 2013 23:34:03 +0800
> From: Vaughan Cao
> To: jo...@logfs.org
> Cc: dgilb...@interlog.com, jbottom...@parallels.com,
> linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,
> vaughan@oracle.com
> Subjec
On Wed, 17 July 2013 23:34:06 +0800, Vaughan Cao wrote:
> Date: Wed, 17 Jul 2013 23:34:06 +0800
> From: Vaughan Cao
> To: jo...@logfs.org
> Cc: dgilb...@interlog.com, jbottom...@parallels.com,
> linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,
> vaughan@oracle.com
> Subjec
On Wed, 17 July 2013 23:34:04 +0800, Vaughan Cao wrote:
> Date: Wed, 17 Jul 2013 23:34:04 +0800
> From: Vaughan Cao
> To: jo...@logfs.org
> Cc: dgilb...@interlog.com, jbottom...@parallels.com,
> linux-scsi@vger.kernel.org, linux-ker...@vger.kernel.org,
> vaughan@oracle.com
> Subjec
On Wed, 17 July 2013 23:34:05 +0800, Vaughan Cao wrote:
>
> -static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
> +static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev, int * reason);
You can use ERR_PTR and friends instead of adding another parameter.
> static void sg_remove_sfp(struct kref *);
>
The handling of broadcast primitives involves
_scsih_block_io_all_device(), which does what the name implies. I have
observed cases with >60s of blocking io on all devices, caused by a
single bad device. The downsides of this code are obvious, while the
upsides are more elusive.
Signed-off-by: J
On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote:
>
> The handling of broadcast primitives involves
> _scsih_block_io_all_device(), which does what the name implies. I have
> observed cases with >60s of blocking io on all devices, caused by a
> single bad device. The down
On Sat, 20 July 2013 15:53:16 +0800, Xitao Cao wrote:
>
> Thanks for your comment. Do I need to update it and resend?
Yes, please.
Jörn
--
When people work hard for you for a pat on the back, you've got
to give them that pat.
-- Robert Heinlein
--
To unsubscribe from this list: send the line "u
On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote:
>
> There is a race when open sg with O_EXCL flag. Also a race may happen between
> sg_open and sg_remove.
>
> Changes from v4:
> * [3/4] use ERR_PTR series instead of adding another parameter in sg_add_sfp
> * [4/4] fix conflict for cherr
On Wed, 24 July 2013 23:42:22 +0300, Baruch Even wrote:
> On Sat, Jul 20, 2013 at 1:11 AM, Jörn Engel wrote:
> > On Fri, 19 July 2013 18:06:59 -0400, Jörn Engel wrote:
> >>
> >> The handling of broadcast primitives involves
> >> _scsih_block_io_all_device(),
Here is a fun patch in an early state. Essentially I want to trace
scsi commands, which has already been done long ago. The problem I
have is that I care about all the scsi commands for one particular
device - without knowing in advance which device it will be. Once I
know the device in question
On Tue, 27 August 2013 19:54:22 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 18:03:25 -0400
> Jörn Engel wrote:
>
> > Here is a fun patch in an early state. Essentially I want to trace
> > scsi commands, which has already been done long ago. The problem I
> >
On Tue, 27 August 2013 20:34:43 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 18:41:47 -0400
> Jörn Engel wrote:
> >
> > "without knowing in advance which device it will be"
> >
> > I have to trace all devices, because I don't know the interesti
On Tue, 27 August 2013 22:09:19 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2013-08-27 at 18:17 -0400, Jörn Engel wrote:
> > Here is a fun patch in an early state. Essentially I want to trace
> > scsi commands, which has already been done long ago. The problem I
> > have
On Wed, 28 August 2013 14:45:14 +0200, Bart Van Assche wrote:
>
> Isn't this facility something that overlaps with what Wireshark can
> already do today ? Wireshark can already capture SCSI traces for
> several transport types (iSCSI, USB, ...). For other transports,
> e.g. FC, I'd prefer to see W
On Mon, 2 September 2013 02:31:29 -0700, Christoph Hellwig wrote:
>
> I would defintively love enabling it globally.
Agreed. I carry a private patch based on an older version of Hannes'
patchset doing just that. Istr there was some problem exactly once on
a VM with a virtualized hba. We hit pr
I noticed this one when list_del was called with poisoned list
pointers, but the real problem is a double-free (and a use-after-free
just before that).
Both _scsih_probe_boot_devices() and _scsih_sas_device_add() put the
sas_device onto a list, thereby giving up control. Next they call
mpt2sas_tr
On Thu, 24 January 2013 08:51:20 +0100, Bjørn Mork wrote:
>
> How about the copy of this code in drivers/scsi/mpt3sas/mpt3sas_scsih.c?
> Is that safe, or does it need fixing as well?
Well spotted, that appears to suffer from the same ailment. Will cook
up a second patch for that.
Are there any
Changes since v1:
- Also fixes mpt3sas, thanks to Bjørn Mork.
- Adds a missing put_sas_device to _scsih_probe_boot_devices
- Added a newline
- Moved a kref_get outside the spinlock
I noticed this one when list_del was called with poisoned list
pointers, but the real problem is a double-free (and a
Changes since v1:
- Adds another missing put_sas_device to _scsih_probe_boot_devices,
proving the need for more coffee before sending out patches.
I noticed this one when list_del was called with poisoned list
pointers, but the real problem is a double-free (and a use-after-free
just before that
At least two slightly different paths can lead to a NULL pointer
dereference in scsi_prep_state_check and have been hit in practice.
1.
Call Trace:
[] scsi_setup_blk_pc_cmnd+0x2b/0x170
[] sd_prep_fn+0x568/0xdd0
[] blk_peek_request+0xb4/0x240
[] scsi_request_fn+0x43e/0x4a0
[] __blk_run_queue+0x
On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote:
> On Mon, 2012-07-23 at 13:32 -0400, Jörn Engel wrote:
> > At least two slightly different paths can lead to a NULL pointer
> > dereference in scsi_prep_state_check and have been hit in practice.
>
> Have you
On Tue, 24 July 2012 09:01:41 +0400, James Bottomley wrote:
> On Mon, 2012-07-23 at 15:24 -0400, Jörn Engel wrote:
> > On Mon, 23 July 2012 23:45:55 +0400, James Bottomley wrote:
> > >
> > > Have you checked this with the patches in scsi-misc? There's a serie
45 matches
Mail list logo