> On 20 Apr 2018, at 22:23, Kees Cook wrote:
>
> On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente
> wrote:
>> I'm missing something here. When the request gets completed in the
>> first place, the hook bfq_finish_requeue_request gets called, and that
>> hook clears both ->
Hi.
On 20.04.2018 22:23, Kees Cook wrote:
I don't know the "how", I only found the "what". :) If you want, grab
the reproducer VM linked to earlier in this thread; it'll hit the
problem within about 30 seconds of running the reproducer.
Just to avoid a possible confusion I should note that I'v
On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente wrote:
> I'm missing something here. When the request gets completed in the
> first place, the hook bfq_finish_requeue_request gets called, and that
> hook clears both ->elv.priv elements (as the request has a non-null
> elv.icq). So, when bfq gets
> On 18 Apr 2018, at 16:30, Jens Axboe wrote:
>
> On 4/18/18 3:08 AM, Paolo Valente wrote:
>>
>>
>>> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>>>
>>> On 4/17/18 3:48 PM, Jens Axboe wrote:
On 4/17/18 3:47 PM, Kees Cook wrote:
> On Tue,
On 4/18/18 3:08 AM, Paolo Valente wrote:
>
>
>> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>>
>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>> On 4/17/18 3:47 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
> On 4/17/18 3:25 PM, Kees Cook wrot
> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook
On 4/17/18 5:06 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote:
>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>> On 4/17/18 3:47 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
> On 4/17/18 3:25 PM, Kees Cook wrote:
>> On Tue, Apr 17, 201
On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote:
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>> I see
On 4/17/18 3:48 PM, Jens Axboe wrote:
> On 4/17/18 3:47 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
>>> On 4/17/18 3:25 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
> I see elv.priv[1] assignments made in a few places -- is it poss
Hi.
17.04.2018 23:47, Kees Cook wrote:
I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:
Reported-by: Oleksandr Natalenko
Root-caused-by: Kees Cook
:) I gotta task-switch to other things!
Thanks for the pointers, and thank you O
On 4/17/18 3:47 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
I see elv.priv[1] assignments made in a few places -- is it possible
there is some kind of uninitialize
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
> On 4/17/18 3:25 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>> in t
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>> I see elv.priv[1] assignments made in a few places -- is it possible
>> there is some kind of uninitialized-but-not-NULL state that can leak
>> in there?
>
> Got it. This fixes it for me:
>
> diff --git a
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
> I see elv.priv[1] assignments made in a few places -- is it possible
> there is some kind of uninitialized-but-not-NULL state that can leak
> in there?
Got it. This fixes it for me:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a
On Tue, Apr 17, 2018 at 1:28 PM, Jens Axboe wrote:
> It has to be the latter bfqq->dispatched increment, as those are
> transient (and bfqd is not).
Yeah, and I see a lot of comments around the lifetime of rq and bfqq,
so I assume something is not being locked correctly.
#define RQ_BFQQ(rq)
On 4/17/18 2:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
>>> The above bfq_dispatch_request+0x99/0xad0 is still
>>> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
>>> removed. 0x99 is 153 dec
On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
>> The above bfq_dispatch_request+0x99/0xad0 is still
>> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
>> removed. 0x99 is 153 decimal:
>>
>> (gdb) disass bfq_dispatch_reque
On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
> The above bfq_dispatch_request+0x99/0xad0 is still
> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
> removed. 0x99 is 153 decimal:
>
> (gdb) disass bfq_dispatch_request
> Dump of assembler code for function bfq_dispatch_requ
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900     if (rq) {
> 3901 inc_in_driver_start_rq:
> 3902
On 4/17/18 10:42 AM, Kees Cook wrote:
> On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
>> With a hardware watchpoint, I've isolated the corruption to here:
>>
>> bfq_dispatch_request+0x2be/0x1610:
>> __bfq_dispatch_request at block/bfq-iosched.c:3902
>> 3900     if (rq) {
>> 3901 inc_
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900     if (rq) {
> 3901 inc_in_driver_start_rq:
> 3902
On Tue, Apr 17, 2018 at 3:02 AM, James Bottomley
wrote:
> On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
>> I still haven't figured this out, though... anyone have a moment to look
>> at this?
>
> Just to let you know you're not alone ... but I can't make any sense of
> this either. The bfqd is
On Tue, Apr 17, 2018 at 2:19 AM, Oleksandr Natalenko
wrote:
> By any chance, have you tried to simplify the reproducer environment, or does it
> still need my complex layout to trigger things even with KASAN?
I haven't tried minimizing the reproducer yet, no. Now that I have a
specific place to watch
On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
> I still haven't figured this out, though... anyone have a moment to look
> at this?
Just to let you know you're not alone ... but I can't make any sense of
this either. The bfqd is the elevator_data, which is initialised when
the scheduler is att
Hi.
17.04.2018 05:12, Kees Cook wrote:
Turning off HARDENED_USERCOPY and turning on KASAN, I see the same
report:
[ 38.274106] BUG: KASAN: slab-out-of-bounds in
_copy_to_user+0x42/0x60
[ 38.274841] Read of size 22 at addr 8800122b8c4b by task
smartctl/1064
[ 38.275630]
[ 38.2758
On Mon, Apr 16, 2018 at 1:44 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote:
>> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
>>> After fixing up some build issues in the middle of the 4.16 cycle, I
>>> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge br
On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
>> After fixing up some build issues in the middle of the 4.16 cycle, I
>> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
>> 'for-4.16/block'"). Instead of letting the test
On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
> After fixing up some build issues in the middle of the 4.16 cycle, I
> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
> 'for-4.16/block'"). Instead of letting the test run longer, I'm going
> to switch to doing several short
On Thu, Apr 12, 2018 at 3:01 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko
> wrote:
>> Hi.
>>
>> On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
>>> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
>>> up timeouts"), which seems insane g
On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko
wrote:
> Hi.
>
> On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
>> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
>> up timeouts"), which seems insane given that null_blk isn't even built
>> in the .config. I man
Hi.
On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
> up timeouts"), which seems insane given that null_blk isn't even built
> in the .config. I managed to get the testing automated now for a "git
> bisect run ...", so
On Wed, Apr 11, 2018 at 5:03 PM, Kees Cook wrote:
> On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote:
>> On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
>>> I'll see about booting with my own kernels, etc, and try to narrow this
>>> down. :)
>>
>> If I boot kernels I've built, I no longer hit
On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote:
> On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
>> I'll see about booting with my own kernels, etc, and try to narrow this
>> down. :)
>
> If I boot kernels I've built, I no longer hit the bug in this VM
> (though I'll keep trying). What comp
On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
> I'll see about booting with my own kernels, etc, and try to narrow this down.
> :)
If I boot kernels I've built, I no longer hit the bug in this VM
(though I'll keep trying). What compiler are you using?
-Kees
--
Kees Cook
Pixel Security
On Tue, Apr 10, 2018 at 10:16 AM, Oleksandr Natalenko
wrote:
> Hi, Kees, Paolo et al.
>
> 10.04.2018 08:53, Kees Cook wrote:
>>
>> Unfortunately I only had a single hang with no dumps. I haven't been
>> able to reproduce it since. :(
>
>
> For your convenience I've prepared a VM that contains a re
Hi, Kees, Paolo et al.
10.04.2018 08:53, Kees Cook wrote:
Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(
For your convenience I've prepared a VM that contains a reproducer.
It consists of 3 disk images (sda.img is for the system, it is
Arc
Hi.
10.04.2018 08:35, Oleksandr Natalenko wrote:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.
Looks lik
On Mon, Apr 9, 2018 at 11:35 PM, Oleksandr Natalenko
wrote:
> Did your system hang on smartctl hammering too? Have you got some stack
> traces to compare with mine ones?
Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(
-Kees
Hi.
09.04.2018 22:30, Kees Cook wrote:
echo 1 | tee /sys/block/sd*/queue/nr_requests
I can't get this below "4".
Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is
enforced explicitly in queue_requests_store(). It is the same for me.
echo 1 | tee /sys/block/sd*/devic
On Mon, Apr 9, 2018 at 1:30 PM, Kees Cook wrote:
> Ah! dm-crypt too. I'll see if I can get that added easily to my tests.
Quick update: I added dm-crypt (with XFS on top) and it hung my system
almost immediately. I got no warnings at all, though.
-Kees
On Mon, Apr 9, 2018 at 12:02 PM, Oleksandr Natalenko
wrote:
>
> Hi.
>
> (fancy details for linux-block and BFQ people go below)
>
> 09.04.2018 20:32, Kees Cook wrote:
>>
>> Ah, this detail I didn't have. I've changed my environment to
>>
>> build with:
>>
>> CONFIG_BLK_MQ_PCI=y
>> CONFIG_BLK_MQ_VI
Hi.
(fancy details for linux-block and BFQ people go below)
09.04.2018 20:32, Kees Cook wrote:
Ah, this detail I didn't have. I've changed my environment to
build with:
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_IOSCHED_BFQ=y
boot with scsi_mod.use_blk_mq=1
and select BFQ in the sche
On Sun, Apr 8, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
> So far, I wasn't able to trigger this with mq-deadline (or without blk-mq).
> Maybe this has something to do with blk-mq+BFQ re-queuing, or it's just me
> not being persistent enough.
Ah, this detail I didn't have. I've changed my env
Hi.
09.04.2018 11:35, Christoph Hellwig wrote:
I really can't make sense of that report.
Sorry, I have nothing to add there so far; I just see the symptom of
something going wrong in the ioctl code path that is invoked by
smartctl, but I have no idea what's the minimal environment to reprodu
I really can't make sense of that report. And I'm also curious why
you think 17cb960f29c2 should change anything for that code path.
Hi.
Cc'ing linux-block people (mainly, Christoph) too because of 17cb960f29c2.
Also, duplicating the initial statement for them.
With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy whitelist
warning and/or BUG while running smartctl on a SATA disk with blk-mq and BFQ
enabled.
Hi.
05.04.2018 20:52, Kees Cook wrote:
Okay. My qemu gets mad about that and wants the format=raw argument,
so I'm using:
-drive file=sda.img,format=raw \
-drive file=sdb.img,format=raw \
How are you running your smartctl? I'm doing this now:
[1] Running while :; do
( sm
[forcing non-HTML and resending...]
On Thu, Apr 5, 2018 at 7:33 AM, Oleksandr Natalenko
wrote:
>
> 05.04.2018 16:32, Oleksandr Natalenko wrote:
>>
>> "-hda sda.img -hdb sda.img"
>
>
> "-hda sda.img -hdb sdb.img", of course, I don't pass the same disk twice
Okay. My qemu gets mad about that and w
05.04.2018 16:32, Oleksandr Natalenko wrote:
"-hda sda.img -hdb sda.img"
"-hda sda.img -hdb sdb.img", of course, I don't pass the same disk twice
☺
Hi.
05.04.2018 16:21, Kees Cook wrote:
I had a VM running over night with:
[1] Running while :; do
smartctl -a /dev/sda > /dev/null;
done &
[2]- Running while :; do
ls --color=auto -lR / > /dev/null 2> /dev/null;
done &
[3]+ Running wh
On Thu, Apr 5, 2018 at 2:56 AM, Oleksandr Natalenko
wrote:
> Hi.
>
> 04.04.2018 23:25, Kees Cook wrote:
>>
>> Thanks for the report! I hope someone more familiar with sg_io() can
>> help explain the changing buffer offset... :P
>
>
> Also, FYI, I kept the server running with smartctl periodically
Hi.
04.04.2018 23:25, Kees Cook wrote:
Thanks for the report! I hope someone more familiar with sg_io() can
help explain the changing buffer offset... :P
Also, FYI, I kept the server running with smartctl periodically invoked,
and it was still triggering BUGs; however, I consider them to be m
Hi.
04.04.2018 23:25, Kees Cook wrote:
Actually, I can trigger a BUG too:
[ 129.259213] usercopy: Kernel memory exposure attempt detected from SLUB
object 'scsi_sense_cache' (offset 119, size 22)!
Wow, yeah, that's totally outside the slub object_size. How did you
trigger this? Just luck o
On Wed, Apr 4, 2018 at 1:49 PM, Oleksandr Natalenko
wrote:
> Hi.
>
> On Wednesday, 4 April 2018 22:21:53 CEST Kees Cook wrote:
>>
>> ...
>> That means scsi_sense_cache should be 96 bytes in size? But a 22 byte
>> read starting at offset 94 happened? That seems like a 20 byte read
>> beyond the end of
Hi.
On Wednesday, 4 April 2018 22:21:53 CEST Kees Cook wrote:
...
That means scsi_sense_cache should be 96 bytes in size? But a 22 byte
read starting at offset 94 happened? That seems like a 20 byte read
beyond the end of the SLUB object? Though if it were reading past the
actual end of the object,
On 2018-04-04 04:32 PM, Kees Cook wrote:
On Wed, Apr 4, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
[ 261.262135] Bad or missing usercopy whitelist? Kernel memory exposure
attempt detected from SLUB object 'scsi_sense_cache' (offset 94, size 22)!
I can easily reproduce it with a qemu VM and 2
On 2018-04-04 04:21 PM, Kees Cook wrote:
On Wed, Apr 4, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
With v4.16 I get the following dump while using smartctl:
[...]
[ 261.262135] Bad or missing usercopy whitelist? Kernel memory exposure
attempt detected from SLUB object 'scsi_sense_cache' (off
On Wed, Apr 4, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
> [ 261.262135] Bad or missing usercopy whitelist? Kernel memory exposure
> attempt detected from SLUB object 'scsi_sense_cache' (offset 94, size 22)!
> I can easily reproduce it with a qemu VM and 2 virtual SCSI disks by calling
> smart
On Wed, Apr 4, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
> With v4.16 I get the following dump while using smartctl:
> [...]
> [ 261.262135] Bad or missing usercopy whitelist? Kernel memory exposure
> attempt detected from SLUB object 'scsi_sense_cache' (offset 94, size 22)!
> [...]
> [ 261.3
Hi, Kees, David et al.
With v4.16 I get the following dump while using smartctl:
===
[ 261.260617] [ cut here ]
[ 261.262135] Bad or missing usercopy whitelist? Kernel memory exposure
attempt detected from SLUB object 'scsi_sense_cache' (offset 94, size 22)!
[ 261.2676