On Tue, 2018-04-10 at 09:30 +0800, Ming Lei wrote:
> Also, is it possible to see the queue freed here?
I think the caller should keep a reference on the request queue. Otherwise
we have a much bigger problem than a race between submitting a bio and
removing a request queue from the cgroup controller in
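For illustration, "keeping a reference on the request queue" across submission could look like the sketch below; blk_get_queue()/blk_put_queue() are the existing helpers that pin the queue object, but the wrapper function itself is made up for this example:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Illustration only (hypothetical helper): pin the request queue object
 * across submission so the queue memory cannot be freed underneath us,
 * even if device removal starts concurrently.
 */
static blk_qc_t submit_bio_with_queue_ref(struct bio *bio)
{
        struct request_queue *q = bio->bi_disk->queue;
        blk_qc_t ret = BLK_QC_T_NONE;

        if (!blk_get_queue(q)) {        /* fails once the queue is dying */
                bio_io_error(bio);
                return ret;
        }

        ret = generic_make_request(bio);

        blk_put_queue(q);               /* drop the reference again */
        return ret;
}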
On Mon, Apr 09, 2018 at 10:54:57PM +0000, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> > The oops happens during generic_make_request_checks(), in
> > blk_throtl_bio() exactly.
> > So if we want to bypass a dying queue, we have to check this before
> > generic_make_request_checks(), I think.
On Mon, 2018-04-09 at 16:58 -0600, Jens Axboe wrote:
> This ends up being nutty in the generic_make_request() case, where we
> do the exact same enter/exit logic right after. That needs to get unified.
> Maybe move the queue enter into generic_make_request_checks(), and exit
> in the caller?
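Roughly, the shape Jens describes could look like the sketch below. This is a hedged sketch, not an actual patch, and the error handling is an assumption: generic_make_request_checks() would take the q_usage_counter reference itself, and the caller would drop it once with blk_queue_exit() after dispatching the bio.

/*
 * Sketch only: the checks helper enters the queue itself, so
 * generic_make_request() no longer needs a second enter/exit pair
 * around the checks.  On success the reference is held and the caller
 * is responsible for blk_queue_exit(); on failure the bio has already
 * been completed with an error.
 */
static bool generic_make_request_checks(struct bio *bio)
{
        struct request_queue *q = bio->bi_disk->queue;
        blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
                                   BLK_MQ_REQ_NOWAIT : 0;

        if (blk_queue_enter(q, flags) < 0) {
                if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
                        bio_wouldblock_error(bio);
                else
                        bio_io_error(bio);
                return false;
        }

        /*
         * ... the existing validity and blk_throtl_bio() checks run here;
         * any failure path must call blk_queue_exit(q) before returning
         * false ...
         */

        return true;    /* caller finishes with blk_queue_exit(q) */
}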
Hello
On 4/9/18 4:54 PM, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
>> The oops happens during generic_make_request_checks(), in
>> blk_throtl_bio() exactly.
>> So if we want to bypass a dying queue, we have to check this before
>> generic_make_request_checks(), I think.
>
On Mon, 2018-04-09 at 14:54 +0800, Joseph Qi wrote:
> The oops happens during generic_make_request_checks(), in
> blk_throtl_bio() exactly.
> So if we want to bypass a dying queue, we have to check this before
> generic_make_request_checks(), I think.
How about something like the patch below?
Thanks,
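For illustration, the "check before generic_make_request_checks()" ordering Joseph describes might look like the hypothetical helper below. This is not the patch Bart refers to (which is cut off here), and a bare blk_queue_dying() test is still racy unless paired with blk_queue_enter():

/*
 * Hypothetical sketch: bail out before generic_make_request_checks()
 * ever runs against a dying queue, so blk_throtl_bio() never sees a
 * queue whose blkgs have already been destroyed.  Note that this check
 * alone does not close the race; the queue can still die right after
 * the test unless a blk_queue_enter() reference is held.
 */
static inline bool bio_queue_alive(struct bio *bio)
{
        struct request_queue *q = bio->bi_disk->queue;

        if (unlikely(!q || blk_queue_dying(q))) {
                bio_io_error(bio);      /* complete the bio with -EIO */
                return false;
        }
        return true;
}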
Hi Bart,
On 18/4/9 12:47, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
>
> Is the below patch sufficient to fix this?
>
> Thanks,
>
> Bart.
>
>
> Subject: blk-mq: Avoid that submitting a bio concurrently with device removal
> triggers a crash
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
Is the below patch sufficient to fix this?
Thanks,
Bart.
Subject: blk-mq: Avoid that submitting a bio concurrently with device removal
triggers a crash
Because blkcg_exit_queue() is now called from inside blk_cleanup_queue(), it is no
longer safe to access cgroup information during or after that call. Hence protect
the generic_make_request_checks() call with blk_queue_enter() / blk_queue_exit().
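The diff itself is cut off here; going by the discussion in this thread, the idea is to protect the generic_make_request_checks() call with blk_queue_enter()/blk_queue_exit(). A hedged sketch of that shape inside generic_make_request() (not the verbatim patch):

blk_qc_t generic_make_request(struct bio *bio)
{
        struct request_queue *q = bio->bi_disk->queue;
        blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
                                   BLK_MQ_REQ_NOWAIT : 0;

        /*
         * Hold a q_usage_counter reference across the checks:
         * blk_cleanup_queue() drains these references before the queue
         * and its cgroup state are torn down, so the data that
         * blk_throtl_bio() looks at stays valid.
         */
        if (blk_queue_enter(q, flags) < 0) {
                bio_io_error(bio);
                return BLK_QC_T_NONE;
        }

        if (!generic_make_request_checks(bio)) {
                blk_queue_exit(q);
                return BLK_QC_T_NONE;
        }
        blk_queue_exit(q);

        /*
         * ... the existing dispatch loop, with its own blk_queue_enter(),
         * follows here -- the duplicated enter/exit logic that Jens
         * points out above ...
         */
}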
On Mon, Apr 09, 2018 at 09:33:08AM +0800, Joseph Qi wrote:
> Hi Bart,
>
> On 18/4/8 22:50, Bart Van Assche wrote:
> > On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> >> The following kernel oops is triggered by 'removing scsi device' during
> >> heavy IO.
> >
> > How did you trigger this oops?
Hi Bart,
On 18/4/8 22:50, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
>> The following kernel oops is triggered by 'removing scsi device' during
>> heavy IO.
>
> How did you trigger this oops?
>
I can reproduce this oops by the following steps:
1) start a fio job
2) remove the scsi device while the fio job is running
On Sun, 2018-04-08 at 16:11 +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL and triggers a NULL pointer dereference when getting td
> from tg (tg->td).
> It seems that we cannot
On Sun, 2018-04-08 at 12:21 +0800, Ming Lei wrote:
> The following kernel oops is triggered by 'removing scsi device' during
> heavy IO.
How did you trigger this oops?
Bart.
On Sun, Apr 08, 2018 at 05:25:42PM +0800, Ming Lei wrote:
> On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> > This is because scsi_remove_device() will call blk_cleanup_queue(), and
> > then all blkgs have been destroyed and root_blkg is NULL.
> > Thus tg is NULL and triggers a NULL pointer dereference when getting td
> > from tg (tg->td).
On Sun, Apr 08, 2018 at 04:11:51PM +0800, Joseph Qi wrote:
> This is because scsi_remove_device() will call blk_cleanup_queue(), and
> then all blkgs have been destroyed and root_blkg is NULL.
> Thus tg is NULL and triggers a NULL pointer dereference when getting td
> from tg (tg->td).
> It seems that we cannot simply move blkcg_exit_queue() up to
> blk_cleanup_queue().
This is because scsi_remove_device() will call blk_cleanup_queue(), and
then all blkgs have been destroyed and root_blkg is NULL.
Thus tg is NULL and triggers a NULL pointer dereference when getting td
from tg (tg->td).
It seems that we cannot simply move blkcg_exit_queue() up to
blk_cleanup_queue().
Thanks,
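For readers following the diagnosis, a simplified illustration of the dereference chain (not the literal blk-throttle.c code):

/*
 * Simplified illustration of the crash path.  After blk_cleanup_queue()
 * has run blkcg_exit_queue(), every blkg of the queue has been destroyed
 * and q->root_blkg is NULL.
 */
bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
                    struct bio *bio)
{
        /* with blkg == NULL this falls back to the (now NULL) root_blkg */
        struct throtl_grp *tg = blkg_to_tg(blkg ?: q->root_blkg);

        /*
         * blkg_to_tg(NULL) yields a near-NULL pointer, so reading tg->td
         * below is the NULL pointer dereference reported in the oops.
         */
        struct throtl_data *td = tg->td;

        /* ... throttling decisions based on tg and td follow in the
         * real code ... */
        return false;
}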
Hi,
The following kernel oops is triggered by 'removing scsi device' during
heavy IO.
'git bisect' shows that commit a063057d7c731cffa7d10740 ("block: Fix a race
between request queue removal and the block cgroup controller")
introduced this regression:
[ 42.268257] BUG: unable to handle kernel NULL pointer dereference