Re: usercopy whitelist woe in scsi_sense_cache

2018-04-21 Thread Paolo Valente
> On 20 Apr 2018, at 22:23, Kees Cook wrote: > > On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente > wrote: >> I'm missing something here. When the request gets completed in the >> first place, the hook

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Oleksandr Natalenko
Hi. On 20.04.2018 22:23, Kees Cook wrote: I don't know the "how", I only found the "what". :) If you want, grab the reproducer VM linked to earlier in this thread; it'll hit the problem within about 30 seconds of running the reproducer. Just to avoid possible confusion I should note that

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Kees Cook
On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente wrote: > I'm missing something here. When the request gets completed in the > first place, the hook bfq_finish_requeue_request gets called, and that > hook clears both ->elv.priv elements (as the request has a non-null >

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-19 Thread Paolo Valente
> On 18 Apr 2018, at 16:30, Jens Axboe wrote: > > On 4/18/18 3:08 AM, Paolo Valente wrote: >> >> >>> On 18 Apr 2018, at 00:57, Jens Axboe wrote: >>> >>> On 4/17/18 3:48 PM, Jens Axboe wrote: On 4/17/18 3:47

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-18 Thread Jens Axboe
On 4/18/18 3:08 AM, Paolo Valente wrote: > > >> On 18 Apr 2018, at 00:57, Jens Axboe wrote: >> >> On 4/17/18 3:48 PM, Jens Axboe wrote: >>> On 4/17/18 3:47 PM, Kees Cook wrote: On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: >

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-18 Thread Paolo Valente
> On 18 Apr 2018, at 00:57, Jens Axboe wrote: > > On 4/17/18 3:48 PM, Jens Axboe wrote: >> On 4/17/18 3:47 PM, Kees Cook wrote: >>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: On 4/17/18 3:25 PM, Kees Cook wrote: > On Tue,

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 5:06 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote: >> On 4/17/18 3:48 PM, Jens Axboe wrote: >>> On 4/17/18 3:47 PM, Kees Cook wrote: On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: > On 4/17/18 3:25 PM, Kees

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote: > On 4/17/18 3:48 PM, Jens Axboe wrote: >> On 4/17/18 3:47 PM, Kees Cook wrote: >>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: On 4/17/18 3:25 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 1:46

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:48 PM, Jens Axboe wrote: > On 4/17/18 3:47 PM, Kees Cook wrote: >> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: >>> On 4/17/18 3:25 PM, Kees Cook wrote: On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote: > I see elv.priv[1]

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko
Hi. 17.04.2018 23:47, Kees Cook wrote: I sent the patch anyway, since it's kind of a robustness improvement, I'd hope. If you fix BFQ also, please add: Reported-by: Oleksandr Natalenko Root-caused-by: Kees Cook :) I gotta task-switch to other

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:47 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: >> On 4/17/18 3:25 PM, Kees Cook wrote: >>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote: I see elv.priv[1] assignments made in a few places -- is it

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote: > On 4/17/18 3:25 PM, Kees Cook wrote: >> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote: >>> I see elv.priv[1] assignments made in a few places -- is it possible >>> there is some kind of

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:25 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote: >> I see elv.priv[1] assignments made in a few places -- is it possible >> there is some kind of uninitialized-but-not-NULL state that can leak >> in there? > > Got it. This fixes it

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote: > I see elv.priv[1] assignments made in a few places -- is it possible > there is some kind of uninitialized-but-not-NULL state that can leak > in there? Got it. This fixes it for me: diff --git a/block/blk-mq.c

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:28 PM, Jens Axboe wrote: > It has to be the latter bfqq->dispatched increment, as those are > transient (and bfqd is not). Yeah, and I see a lot of comments around the lifetime of rq and bfqq, so I assume something is not being locked correctly.

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 2:25 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote: >> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote: >>> The above bfq_dispatch_request+0x99/0xad0 is still >>> __bfq_dispatch_request at

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote: > On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote: >> The above bfq_dispatch_request+0x99/0xad0 is still >> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN >> removed. 0x99 is 153

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote: > The above bfq_dispatch_request+0x99/0xad0 is still > __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN > removed. 0x99 is 153 decimal: > > (gdb) disass bfq_dispatch_request > Dump of assembler code for

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote: > With a hardware watchpoint, I've isolated the corruption to here: > > bfq_dispatch_request+0x2be/0x1610: > __bfq_dispatch_request at block/bfq-iosched.c:3902 > 3900 if (rq) { > 3901 inc_in_driver_start_rq: >

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 10:42 AM, Kees Cook wrote: > On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote: >> With a hardware watchpoint, I've isolated the corruption to here: >> >> bfq_dispatch_request+0x2be/0x1610: >> __bfq_dispatch_request at block/bfq-iosched.c:3902 >> 3900

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote: > With a hardware watchpoint, I've isolated the corruption to here: > > bfq_dispatch_request+0x2be/0x1610: > __bfq_dispatch_request at block/bfq-iosched.c:3902 > 3900 if (rq) { > 3901 inc_in_driver_start_rq: >

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 3:02 AM, James Bottomley wrote: > On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote: >> I still haven't figured this out, though... anyone have a moment to look >> at this? > > Just to let you know you're not alone ... but I can't make any sense of >

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 2:19 AM, Oleksandr Natalenko wrote: > By any chance, have you tried to simplify the reproducer environment, or does it > still need my complex layout to trigger things even with KASAN? I haven't tried minimizing the reproducer yet, no. Now that I

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread James Bottomley
On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote: > I still haven't figured this out, though... anyone have a moment to look > at this? Just to let you know you're not alone ... but I can't make any sense of this either. The bfqd is the elevator_data, which is initialised when the scheduler is

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko
Hi. 17.04.2018 05:12, Kees Cook wrote: Turning off HARDENED_USERCOPY and turning on KASAN, I see the same report: [ 38.274106] BUG: KASAN: slab-out-of-bounds in _copy_to_user+0x42/0x60 [ 38.274841] Read of size 22 at addr 8800122b8c4b by task smartctl/1064 [ 38.275630] [

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-16 Thread Kees Cook
On Mon, Apr 16, 2018 at 1:44 PM, Kees Cook wrote: > On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote: >> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote: >>> After fixing up some build issues in the middle of the 4.16 cycle,

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-16 Thread Kees Cook
On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote: > On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote: >> After fixing up some build issues in the middle of the 4.16 cycle, I >> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch >>

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Kees Cook
On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote: > After fixing up some build issues in the middle of the 4.16 cycle, I > get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch > 'for-4.16/block'"). Instead of letting the test run longer, I'm going > to

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Kees Cook
On Thu, Apr 12, 2018 at 3:01 PM, Kees Cook wrote: > On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko > wrote: >> Hi. >> >> On Thursday, 12 April 2018 20:44:37 CEST, Kees Cook wrote: >>> My first bisect attempt gave me commit 5448aca41cd5

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Kees Cook
On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko wrote: > Hi. > > On Thursday, 12 April 2018 20:44:37 CEST, Kees Cook wrote: >> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire >> up timeouts"), which seems insane given that null_blk isn't even

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Oleksandr Natalenko
Hi. On Thursday, 12 April 2018 20:44:37 CEST, Kees Cook wrote: > My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire > up timeouts"), which seems insane given that null_blk isn't even built > in the .config. I managed to get the testing automated now for a "git > bisect run ...",

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Kees Cook
On Wed, Apr 11, 2018 at 5:03 PM, Kees Cook wrote: > On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote: >> On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote: >>> I'll see about booting with my own kernels, etc, and try to narrow

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-11 Thread Kees Cook
On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote: > On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote: >> I'll see about booting with my own kernels, etc, and try to narrow this >> down. :) > > If I boot kernels I've built, I no longer hit the bug in

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-11 Thread Kees Cook
On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote: > I'll see about booting with my own kernels, etc, and try to narrow this down. > :) If I boot kernels I've built, I no longer hit the bug in this VM (though I'll keep trying). What compiler are you using? -Kees -- Kees

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Kees Cook
On Tue, Apr 10, 2018 at 10:16 AM, Oleksandr Natalenko wrote: > Hi, Kees, Paolo et al. > > 10.04.2018 08:53, Kees Cook wrote: >> >> Unfortunately I only had a single hang with no dumps. I haven't been >> able to reproduce it since. :( > > > For your convenience I've

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi, Kees, Paolo et al. 10.04.2018 08:53, Kees Cook wrote: Unfortunately I only had a single hang with no dumps. I haven't been able to reproduce it since. :( For your convenience I've prepared a VM that contains a reproducer. It consists of 3 disk images (sda.img is for the system, it is

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi. 10.04.2018 08:35, Oleksandr Natalenko wrote: - does it reproduce _without_ hardened usercopy? (I would assume yes, but you'd just not get any warning until the hangs started.) If it does reproduce without hardened usercopy, then a new bisect run could narrow the search even more. Looks

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Kees Cook
On Mon, Apr 9, 2018 at 11:35 PM, Oleksandr Natalenko wrote: > Did your system hang on smartctl hammering too? Have you got some stack > traces to compare with mine? Unfortunately I only had a single hang with no dumps. I haven't been able to reproduce it since. :(

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi. 09.04.2018 22:30, Kees Cook wrote: echo 1 | tee /sys/block/sd*/queue/nr_requests I can't get this below "4". Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is enforced explicitly in queue_requests_store(). It is the same for me. echo 1 | tee
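
The lower bound mentioned in the message above can be sketched in C. This is a minimal illustration, not the kernel's actual queue_requests_store(): the helper name clamp_nr_requests is hypothetical, and it assumes that a too-small value written to nr_requests is raised to the minimum rather than rejected, which would match the observation that the value cannot be brought below 4.

```c
/* Illustrative sketch only; not the kernel's queue_requests_store(). */

/* Minimum queue depth, per the value quoted in the message above. */
#define BLKDEV_MIN_RQ 4UL

/*
 * Hypothetical helper: return the queue depth that would take effect
 * for a requested nr_requests value, assuming values below the minimum
 * are raised to it instead of being rejected.
 */
unsigned long clamp_nr_requests(unsigned long requested)
{
	if (requested < BLKDEV_MIN_RQ)
		return BLKDEV_MIN_RQ;
	return requested;
}
```

Under that assumption, writing 1 to nr_requests leaves the queue at a depth of 4, while a value such as 128 is kept unchanged.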

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 1:30 PM, Kees Cook wrote: > Ah! dm-crypt too. I'll see if I can get that added easily to my tests. Quick update: I added dm-crypt (with XFS on top) and it hung my system almost immediately. I got no warnings at all, though. -Kees -- Kees Cook

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Mon, Apr 9, 2018 at 12:02 PM, Oleksandr Natalenko wrote: > > Hi. > > (fancy details for linux-block and BFQ people go below) > > 09.04.2018 20:32, Kees Cook wrote: >> >> Ah, this detail I didn't have. I've changed my environment to >> >> build with: >> >>

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. (fancy details for linux-block and BFQ people go below) 09.04.2018 20:32, Kees Cook wrote: Ah, this detail I didn't have. I've changed my environment to build with: CONFIG_BLK_MQ_PCI=y CONFIG_BLK_MQ_VIRTIO=y CONFIG_IOSCHED_BFQ=y boot with scsi_mod.use_blk_mq=1 and select BFQ in the

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Kees Cook
On Sun, Apr 8, 2018 at 12:07 PM, Oleksandr Natalenko wrote: > So far, I wasn't able to trigger this with mq-deadline (or without blk-mq). > Maybe, this has something to do with blk-mq+BFQ re-queuing, or it's just me > not being persistent enough. Ah, this detail I

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. 09.04.2018 11:35, Christoph Hellwig wrote: I really can't make sense of that report. Sorry, I have nothing to add there so far, I just see the symptom of something going wrong in the ioctl code path that is invoked by smartctl, but I have no idea what's the minimal environment to

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Christoph Hellwig
I really can't make sense of that report. And I'm also curious why you think 17cb960f29c2 should change anything for that code path.

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-08 Thread Oleksandr Natalenko
Hi. Cc'ing linux-block people (mainly, Christoph) too because of 17cb960f29c2. Also, duplicating the initial statement for them. With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy whitelist warning and/or bug while running smartctl on a SATA disk with blk-mq and BFQ enabled.