Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-20 Thread Dongsu Park
On 21.04.2015 00:48, Ming Lei wrote: > Thanks for providing that. > The trick is just in CPU number and virtio-scsi hw queue number, > and that is why I asked that, :-) > Now the problem is quite clear, before CPU1 online, suppose > CPU3 is mapped hw queue 6, and CPU 3 will map to hw queue 5 > after

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-20 Thread Ming Lei
On Mon, 20 Apr 2015 17:52:40 +0200 Dongsu Park wrote: > On 20.04.2015 21:12, Ming Lei wrote: > > On Mon, Apr 20, 2015 at 4:07 PM, Dongsu Park > > wrote: > > > Hi Ming, > > > > > > On 18.04.2015 00:23, Ming Lei wrote: > > >> > Does anyone have an idea? > > >> > > >> As far as I can see, at least

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-20 Thread Dongsu Park
On 20.04.2015 21:12, Ming Lei wrote: > On Mon, Apr 20, 2015 at 4:07 PM, Dongsu Park > wrote: > > Hi Ming, > > > > On 18.04.2015 00:23, Ming Lei wrote: > >> > Does anyone have an idea? > >> > >> As far as I can see, at least two problems exist: > >> - race between timeout and CPU hotplug > >> - in

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-20 Thread Ming Lei
On Mon, Apr 20, 2015 at 4:07 PM, Dongsu Park wrote: > Hi Ming, > > On 18.04.2015 00:23, Ming Lei wrote: >> > Does anyone have an idea? >> >> As far as I can see, at least two problems exist: >> - race between timeout and CPU hotplug >> - in case of shared tags, during CPU online handling, about

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-20 Thread Dongsu Park
Hi Ming, On 18.04.2015 00:23, Ming Lei wrote: > > Does anyone have an idea? > > As far as I can see, at least two problems exist: > - race between timeout and CPU hotplug > - in case of shared tags, during CPU online handling, about setting > and checking hctx->tags > > So could you please test

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-19 Thread Ming Lei
On Sat, Apr 18, 2015 at 4:30 PM, Jens Axboe wrote: > On 04/17/2015 10:23 PM, Ming Lei wrote: >> >> Hi Dongsu, >> >> On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park >> wrote: >>> >>> Hi, >>> >>> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq. >>> Every time when a CPU is offlined,

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-18 Thread Jens Axboe
On 04/17/2015 10:23 PM, Ming Lei wrote: Hi Dongsu, On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park wrote: Hi, there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq. Every time when a CPU is offlined, some arbitrary range of kernel memory seems to get corrupted. Then after a while,

Re: panic with CPU hotplug + blk-mq + scsi-mq

2015-04-17 Thread Ming Lei
Hi Dongsu, On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park wrote: > Hi, > > there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq. > Every time when a CPU is offlined, some arbitrary range of kernel memory > seems to get corrupted. Then after a while, kernel panics at random places >

panic with CPU hotplug + blk-mq + scsi-mq

2015-04-17 Thread Dongsu Park
Hi, there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq. Every time when a CPU is offlined, some arbitrary range of kernel memory seems to get corrupted. Then after a while, kernel panics at random places when block IOs are issued. (for example, see the call traces below) This bug
