> On 20 Apr 2018, at 22:23, Kees Cook wrote:
>
> On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente
> wrote:
>> I'm missing something here. When the request gets completed in the
>> first place, the hook
Hi.
On 20.04.2018 22:23, Kees Cook wrote:
I don't know the "how", I only found the "what". :) If you want, grab
the reproducer VM linked to earlier in this thread; it'll hit the
problem within about 30 seconds of running the reproducer.
Just to avoid a possible confusion I should note that
On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente wrote:
> I'm missing something here. When the request gets completed in the
> first place, the hook bfq_finish_requeue_request gets called, and that
> hook clears both ->elv.priv elements (as the request has a non-null
>
> On 18 Apr 2018, at 16:30, Jens Axboe wrote:
>
> On 4/18/18 3:08 AM, Paolo Valente wrote:
>>
>>
>>> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>>>
>>> On 4/17/18 3:48 PM, Jens Axboe wrote:
On 4/17/18 3:47
On 4/18/18 3:08 AM, Paolo Valente wrote:
>
>
>> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>>
>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>> On 4/17/18 3:47 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
>
> On 18 Apr 2018, at 00:57, Jens Axboe wrote:
>
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue,
On 4/17/18 5:06 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote:
>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>> On 4/17/18 3:47 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
> On 4/17/18 3:25 PM, Kees
On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe wrote:
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46
On 4/17/18 3:48 PM, Jens Axboe wrote:
> On 4/17/18 3:47 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
>>> On 4/17/18 3:25 PM, Kees Cook wrote:
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
> I see elv.priv[1]
Hi.
17.04.2018 23:47, Kees Cook wrote:
I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:
Reported-by: Oleksandr Natalenko
Root-caused-by: Kees Cook
:) I gotta task-switch to other
On 4/17/18 3:47 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
I see elv.priv[1] assignments made in a few places -- is it
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe wrote:
> On 4/17/18 3:25 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>> there is some kind of
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
>> I see elv.priv[1] assignments made in a few places -- is it possible
>> there is some kind of uninitialized-but-not-NULL state that can leak
>> in there?
>
> Got it. This fixes it
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook wrote:
> I see elv.priv[1] assignments made in a few places -- is it possible
> there is some kind of uninitialized-but-not-NULL state that can leak
> in there?
Got it. This fixes it for me:
diff --git a/block/blk-mq.c
On Tue, Apr 17, 2018 at 1:28 PM, Jens Axboe wrote:
> It has to be the latter bfqq->dispatched increment, as those are
> transient (and bfqd is not).
Yeah, and I see a lot of comments around the lifetime of rq and bfqq,
so I assume something is not being locked correctly.
On 4/17/18 2:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
>>> The above bfq_dispatch_request+0x99/0xad0 is still
>>> __bfq_dispatch_request at
On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
>> The above bfq_dispatch_request+0x99/0xad0 is still
>> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
>> removed. 0x99 is 153
On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook wrote:
> The above bfq_dispatch_request+0x99/0xad0 is still
> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
> removed. 0x99 is 153 decimal:
>
> (gdb) disass bfq_dispatch_request
> Dump of assembler code for
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900 if (rq) {
> 3901 inc_in_driver_start_rq:
>
On 4/17/18 10:42 AM, Kees Cook wrote:
> On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
>> With a hardware watchpoint, I've isolated the corruption to here:
>>
>> bfq_dispatch_request+0x2be/0x1610:
>> __bfq_dispatch_request at block/bfq-iosched.c:3902
>> 3900
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900 if (rq) {
> 3901 inc_in_driver_start_rq:
>
On Tue, Apr 17, 2018 at 3:02 AM, James Bottomley
wrote:
> On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
>> I still haven't figured this out, though... anyone have a moment to look
>> at this?
>
> Just to let you know you're not alone ... but I can't make any sense of
>
On Tue, Apr 17, 2018 at 2:19 AM, Oleksandr Natalenko
wrote:
> By any chance, have you tried to simplify the reproducer environment, or does
> it still need my complex layout to trigger things even with KASAN?
I haven't tried minimizing the reproducer yet, no. Now that I
On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
> I still haven't figured this out, though... anyone have a moment to look
> at this?
Just to let you know you're not alone ... but I can't make any sense of
this either. The bfqd is the elevator_data, which is initialised when
the scheduler is
Hi.
17.04.2018 05:12, Kees Cook wrote:
Turning off HARDENED_USERCOPY and turning on KASAN, I see the same
report:
[ 38.274106] BUG: KASAN: slab-out-of-bounds in
_copy_to_user+0x42/0x60
[ 38.274841] Read of size 22 at addr 8800122b8c4b by task
smartctl/1064
[ 38.275630]
[
On Mon, Apr 16, 2018 at 1:44 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote:
>> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
>>> After fixing up some build issues in the middle of the 4.16 cycle,
On Thu, Apr 12, 2018 at 8:02 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
>> After fixing up some build issues in the middle of the 4.16 cycle, I
>> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
>>
On Thu, Apr 12, 2018 at 3:47 PM, Kees Cook wrote:
> After fixing up some build issues in the middle of the 4.16 cycle, I
> get an unhelpful bisect result of commit 0a4b6e2f80aa ("Merge branch
> 'for-4.16/block'"). Instead of letting the test run longer, I'm going
> to
On Thu, Apr 12, 2018 at 3:01 PM, Kees Cook wrote:
> On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko
> wrote:
>> Hi.
>>
>> On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
>>> My first bisect attempt gave me commit 5448aca41cd5
On Thu, Apr 12, 2018 at 12:04 PM, Oleksandr Natalenko
wrote:
> Hi.
>
> On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
>> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
>> up timeouts"), which seems insane given that null_blk isn't even
Hi.
On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote:
> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
> up timeouts"), which seems insane given that null_blk isn't even built
> in the .config. I managed to get the testing automated now for a "git
> bisect run ...",
On Wed, Apr 11, 2018 at 5:03 PM, Kees Cook wrote:
> On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote:
>> On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
>>> I'll see about booting with my own kernels, etc, and try to narrow
On Wed, Apr 11, 2018 at 3:47 PM, Kees Cook wrote:
> On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
>> I'll see about booting with my own kernels, etc, and try to narrow this
>> down. :)
>
> If I boot kernels I've built, I no longer hit the bug in
On Tue, Apr 10, 2018 at 8:13 PM, Kees Cook wrote:
> I'll see about booting with my own kernels, etc, and try to narrow this down.
> :)
If I boot kernels I've built, I no longer hit the bug in this VM
(though I'll keep trying). What compiler are you using?
-Kees
--
Kees
On Tue, Apr 10, 2018 at 10:16 AM, Oleksandr Natalenko
wrote:
> Hi, Kees, Paolo et al.
>
> 10.04.2018 08:53, Kees Cook wrote:
>>
>> Unfortunately I only had a single hang with no dumps. I haven't been
>> able to reproduce it since. :(
>
>
> For your convenience I've
Hi, Kees, Paolo et al.
10.04.2018 08:53, Kees Cook wrote:
Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(
For your convenience I've prepared a VM that contains a reproducer.
It consists of 3 disk images (sda.img is for the system, it is
Hi.
10.04.2018 08:35, Oleksandr Natalenko wrote:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.
Looks
On Mon, Apr 9, 2018 at 11:35 PM, Oleksandr Natalenko
wrote:
> Did your system hang on smartctl hammering too? Have you got some stack
> traces to compare with mine?
Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(
Hi.
09.04.2018 22:30, Kees Cook wrote:
echo 1 | tee /sys/block/sd*/queue/nr_requests
I can't get this below "4".
Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is
enforced explicitly in queue_requests_store(). It is the same for me.
echo 1 | tee
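The lower bound described above can be pictured with a small sketch. This is not the kernel's queue_requests_store() itself; it is a standalone illustration, assuming only what the message states (a minimum of BLKDEV_MIN_RQ, which is 4). The helper names clamp_nr_requests() and clamp_self_check() are made up for this example.

```c
#include <assert.h>

/* Value quoted in the message above; redefined here so the sketch stands alone. */
#define BLKDEV_MIN_RQ 4

/*
 * Hypothetical helper (not a kernel function): raise a requested queue
 * depth to the minimum, which is why writing "1" to nr_requests still
 * reads back as 4.
 */
static unsigned long clamp_nr_requests(unsigned long requested)
{
    if (requested < BLKDEV_MIN_RQ)
        return BLKDEV_MIN_RQ;
    return requested;
}

/* Hypothetical self-check covering a value below, at, and above the minimum. */
static void clamp_self_check(void)
{
    assert(clamp_nr_requests(1) == 4);
    assert(clamp_nr_requests(4) == 4);
    assert(clamp_nr_requests(128) == 128);
}
```

Under this assumption, any attempt to go below 4 is silently raised to 4 rather than rejected, which matches the behaviour both participants report.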
On Mon, Apr 9, 2018 at 1:30 PM, Kees Cook wrote:
> Ah! dm-crypt too. I'll see if I can get that added easily to my tests.
Quick update: I added dm-crypt (with XFS on top) and it hung my system
almost immediately. I got no warnings at all, though.
-Kees
--
Kees Cook
On Mon, Apr 9, 2018 at 12:02 PM, Oleksandr Natalenko
wrote:
>
> Hi.
>
> (fancy details for linux-block and BFQ people go below)
>
> 09.04.2018 20:32, Kees Cook wrote:
>>
>> Ah, this detail I didn't have. I've changed my environment to
>>
>> build with:
>>
>>
Hi.
(fancy details for linux-block and BFQ people go below)
09.04.2018 20:32, Kees Cook wrote:
Ah, this detail I didn't have. I've changed my environment to
build with:
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_IOSCHED_BFQ=y
boot with scsi_mod.use_blk_mq=1
and select BFQ in the
On Sun, Apr 8, 2018 at 12:07 PM, Oleksandr Natalenko
wrote:
> So far, I wasn't able to trigger this with mq-deadline (or without blk-mq).
> Maybe, this has something to do with blk-mq+BFQ re-queuing, or it's just me
> not being persistent enough.
Ah, this detail I
Hi.
09.04.2018 11:35, Christoph Hellwig wrote:
I really can't make sense of that report.
Sorry, I have nothing to add there so far, I just see the symptom of
something going wrong in the ioctl code path that is invoked by
smartctl, but I have no idea what's the minimal environment to
I really can't make sense of that report. And I'm also curious why
you think 17cb960f29c2 should change anything for that code path.
Hi.
Cc'ing linux-block people (mainly, Christoph) too because of 17cb960f29c2.
Also, duplicating the initial statement for them.
With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy whitelist
warning and/or bug while running smartctl on a SATA disk with blk-mq and BFQ
enabled.