RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-15 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > Sent: Wednesday, February 15, 2017 00:35 > > I tested today's linux-next (next-20170214) + the 2 patches just now and > got > > a weird result: > > sometimes the VM still hung with a new calltrace (BUG: spinlock bad > > magic), but sometimes the VM did b

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
> I tested today's linux-next (next-20170214) + the 2 patches just now and got > a weird result: > sometimes the VM still hung with a new calltrace (BUG: spinlock bad > magic), but sometimes the VM did boot up despite the new calltrace! > > Attached is the log of a "good" boot. > > It looks we

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
ernel.org; Nick Meier > ; Alex Ng (LIS) ; Long Li > ; Adrian Suhov (Cloudbase Solutions SRL) ads...@microsoft.com>; Chris Valean (Cloudbase Solutions SRL) chv...@microsoft.com> > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workq

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
On Tue, Feb 14, 2017 at 02:46:41PM +, Dexuan Cui wrote: > > From: h...@lst.de [mailto:h...@lst.de] > > Sent: Tuesday, February 14, 2017 22:29 > > To: Dexuan Cui > > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > >

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > Sent: Tuesday, February 14, 2017 22:29 > To: Dexuan Cui > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workqueue elements") > > Ok, thanks for testing. Can you try the p

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
Ok, thanks for testing. Can you try the patch below? It fixes a clear problem which was partially papered over before the commit you bisected to, although it can't explain why blk-mq still works. From e4a66856fa2d92c0298000de658365f31bea60cd Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Dat

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > > Hi Dexuan, > > can you try the hack below for now? I disable the TUR call from > sd_check_events, which I think your VM is hanging on. The checks > it does on the sense data look a bit fishy, but so far I've not > identified a possible root cause. >

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
Hi Dexuan, can you try the hack below for now? I disable the TUR call from sd_check_events, which I think your VM is hanging on. The checks it does on the sense data look a bit fishy, but so far I've not identified a possible root cause. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-09 Thread h...@lst.de
Hi Dexuan, I've spent some time with the logs and looking over the code and couldn't find any smoking gun. I start to wonder if it might just be a timing issue? Can you try one or two things for me: 1) run with the blk-mq I/O path for scsi by either enabling it at boot / module load time wi

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread Dexuan Cui
nel.org > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workqueue elements") > > On Wed, Feb 08, 2017 at 10:43:59AM -0700, Jens Axboe wrote: > > I've changed the subject line, this issue has nothing to do with the >

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread h...@lst.de
On Wed, Feb 08, 2017 at 10:43:59AM -0700, Jens Axboe wrote: > I've changed the subject line, this issue has nothing to do with the > issue that Hannes was attempting to fix. Nothing really useful in the thread. Dexuan, can you throw in some prints to see which command times out?

Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread Jens Axboe
On 02/08/2017 03:48 AM, Dexuan Cui wrote: >> From: Jens Axboe [mailto:ax...@kernel.dk] >> Sent: Wednesday, February 8, 2017 00:09 >> To: Dexuan Cui ; Bart Van Assche >> ; h...@suse.com; h...@suse.de >> Cc: h...@lst.de; linux-ker...@vger.kernel.org; linux-block@vger.kernel.org; >> j...@kernel.org >>