[3.10-rc6] WARNING: at fs/btrfs/inode.c:7961 btrfs_destroy_inode+0x265/0x2e0 [btrfs]()

2013-06-17 Thread Dave Jones
Hit this while running this script in a loop.. https://github.com/kernelslacker/io-tests/blob/master/setup.sh [34385.251507] [ cut here ] [34385.254068] WARNING: at fs/btrfs/inode.c:7961 btrfs_destroy_inode+0x265/0x2e0 [btrfs]() [34385.257275] Modules linked in:

Re: [3.10-rc6] WARNING: at fs/btrfs/inode.c:7961 btrfs_destroy_inode+0x265/0x2e0 [btrfs]()

2013-06-17 Thread Dave Jones
On Mon, Jun 17, 2013 at 02:42:27PM -0400, Chris Mason wrote: Quoting Dave Jones (2013-06-17 14:20:06) On Mon, Jun 17, 2013 at 01:39:42PM -0400, Chris Mason wrote: Quoting Dave Jones (2013-06-17 09:49:55) Hit this while running this script in a loop.. https://github.com

btrfs triggered 'MAX_LOCKDEP_CHAINS too low'

2013-06-17 Thread Dave Jones
Something else I've seen a few times from my io script (Always during btrfs runs)... BUG: MAX_LOCKDEP_CHAINS too low! turning off the locking correctness validator. Please attach the output of /proc/lock_stat to the bug report CPU: 1 PID: 492255 Comm: kworker/u8:0 Not tainted 3.10.0-rc6+ #6

Re: [3.10-rc6] WARNING: at fs/btrfs/inode.c:7961 btrfs_destroy_inode+0x265/0x2e0 [btrfs]()

2013-06-19 Thread Dave Jones
On Wed, Jun 19, 2013 at 02:02:33PM -0400, Chris Mason wrote: Quoting Dave Jones (2013-06-17 14:58:10) On Mon, Jun 17, 2013 at 02:42:27PM -0400, Chris Mason wrote: Quoting Dave Jones (2013-06-17 14:20:06) On Mon, Jun 17, 2013 at 01:39:42PM -0400, Chris Mason wrote: Quoting

btrfs triggered lockdep WARN.

2013-06-27 Thread Dave Jones
Another bug caused by this script. https://github.com/kernelslacker/io-tests/blob/master/setup.sh WARNING: at kernel/lockdep.c:708 __lock_acquire+0x183b/0x1b70() Modules linked in: sctp lec bridge 8021q garp stp mrp fuse dlci tun bnep hidp rfcomm l2tp_ppp l2tp_netlink l2tp_core

Re: btrfs triggered lockdep WARN.

2013-06-27 Thread Dave Jones
On Thu, Jun 27, 2013 at 11:01:30AM -0400, Chris Mason wrote: Quoting Dave Jones (2013-06-27 10:58:24) Another bug caused by this script. https://github.com/kernelslacker/io-tests/blob/master/setup.sh I'm still struggling to reproduce that one here. I've tried every variation I can

Re: btrfs triggered lockdep WARN.

2013-06-27 Thread Dave Jones
On Thu, Jun 27, 2013 at 11:38:57AM -0400, Chris Mason wrote: I really hope you don't already have CONFIG_DEBUG_PAGE_ALLOC turned on, maybe it will catch this? I do. Though given this is lockdep complaining about what looks like memory corruption, it's probably not related.

Fix leak in __btrfs_map_block error path

2013-07-30 Thread Dave Jones
If we bail out when the stripe alloc fails, we need to undo the earlier allocation of raid_map. Signed-off-by: Dave Jones da...@redhat.com diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 78b8717..6a0f52f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4671,6 +4671,7
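
The pattern this patch description names, undoing an earlier allocation when a later one fails, can be sketched in user-space C. This is not the kernel patch itself (the diff above is truncated); `struct map_ctx`, `map_ctx_alloc`, `alloc_fn`, and `failing_alloc` are hypothetical names standing in for the raid_map and stripe allocations.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two buffers the description mentions. */
struct map_ctx {
    unsigned long long *raid_map;
    int *stripes;
};

/* Allocator signature, injectable so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, for testing the error path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Allocate both buffers. Returns 0 on success and -1 on failure.
 * On failure nothing is left allocated: if the second allocation
 * fails, the first one is freed before returning.
 */
static int map_ctx_alloc(struct map_ctx *ctx, size_t n, alloc_fn stripe_alloc)
{
    ctx->stripes = NULL;
    ctx->raid_map = malloc(n * sizeof(*ctx->raid_map));
    if (!ctx->raid_map)
        return -1;

    ctx->stripes = stripe_alloc(n * sizeof(*ctx->stripes));
    if (!ctx->stripes) {
        /* Undo the earlier allocation so the error path does not leak. */
        free(ctx->raid_map);
        ctx->raid_map = NULL;
        return -1;
    }
    return 0;
}
```

Passing `failing_alloc` as the third argument forces the second allocation to fail, which makes the cleanup branch easy to check.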

3.4rc1 btrfs IO problem ?

2012-04-02 Thread Dave Jones
I noticed something odd when I booted rc1 on my btrfs test box.. First I tried updating my git tree.. $ gp error: cannot open .git/FETCH_HEAD: Input/output error $ gp remote: Counting objects: 5101, done. remote: Compressing objects: 100% (546/546), done. remote: Total 3764 (delta 3233), reused

btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
Updated to rc1 this morning, and my machines with btrfs are all freaking out.. I got this from my cron email .. /etc/cron.daily/prelink: line 41: /var/lib/prelink/full: Input/output error cp: cannot create regular file `/var/lib/prelink/quick': Input/output error /etc/cron.daily/prelink: line

Re: btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
On Mon, Apr 02, 2012 at 03:48:14PM -0400, Chris Mason wrote: On Mon, Apr 02, 2012 at 02:02:14PM -0400, Dave Jones wrote: Updated to rc1 this morning, and my machines with btrfs are all freaking out.. I got this from my cron email .. /etc/cron.daily/prelink: line 41: /var/lib

Re: btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
On Mon, Apr 02, 2012 at 05:26:08PM -0400, Chris Mason wrote: On Mon, Apr 02, 2012 at 05:16:22PM -0400, Dave Jones wrote: On Mon, Apr 02, 2012 at 03:48:14PM -0400, Chris Mason wrote: On Mon, Apr 02, 2012 at 02:02:14PM -0400, Dave Jones wrote: Updated to rc1 this morning, and my

Re: btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
On Mon, Apr 02, 2012 at 06:28:02PM -0400, Chris Mason wrote: x86-64. dmesg below. (ignore the rpc oops, reported elsewhere, it's unrelated) Well, there really are no btrfs messages in there at all. Do you have free space for a clean copy of the btrfs partition? Trying to

Re: btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
On Mon, Apr 02, 2012 at 06:39:19PM -0400, Chris Mason wrote: On Mon, Apr 02, 2012 at 06:33:50PM -0400, Dave Jones wrote: On Mon, Apr 02, 2012 at 06:28:02PM -0400, Chris Mason wrote: x86-64. dmesg below. (ignore the rpc oops, reported elsewhere, it's unrelated

Re: btrfs io errors on 3.4rc1

2012-04-02 Thread Dave Jones
On Mon, Apr 02, 2012 at 07:50:50PM -0400, Chris Mason wrote: I'll start a bisect later to see if I can narrow it down at least. Ok, a directed bisect of the major suspects. Josef changed the extent buffer eio code in this commit (jump to the commit before it): I had already started

Re: btrfs io errors on 3.4rc1

2012-04-03 Thread Dave Jones
On Tue, Apr 03, 2012 at 04:26:07PM +0200, David Sterba wrote: On Mon, Apr 02, 2012 at 09:47:22PM -0400, Dave Jones wrote: 49b25e0540904be0bf558b84475c69d72e4de66e is the first bad commit btrfs: enhance transaction abort infrastructure Attached patch adds several debugging printks

Re: btrfs io errors on 3.4rc1

2012-04-03 Thread Dave Jones
On Tue, Apr 03, 2012 at 06:33:43PM +0200, David Sterba wrote: On Tue, Apr 03, 2012 at 12:20:23PM -0400, Dave Jones wrote: I see a lot of these .. btrfs: __btrfs_end_transaction -EIO abored=1802201963 (no super error) 1802201963 == 0x6b6b6b6b #define POISON_FREE 0x6b
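
The arithmetic in this snippet can be checked with a short sketch: the decimal value printed after `abored=` is the byte 0x6b repeated four times, matching the quoted `#define POISON_FREE 0x6b`. The helper name `is_repeated_byte` is hypothetical, and reading the value as poison from freed memory is an inference from the quoted define, not something this code proves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the snippet above. */
#define POISON_FREE 0x6b

/* Hypothetical helper: true if all four bytes of 'value' equal 'byte'. */
static bool is_repeated_byte(uint32_t value, uint8_t byte)
{
    /* Multiplying by 0x01010101 copies the byte into each of the four positions. */
    return value == (uint32_t)byte * 0x01010101u;
}
```

Calling `is_repeated_byte(1802201963u, POISON_FREE)` returns true, since 107 * 16843009 = 1802201963 = 0x6b6b6b6b.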

Re: btrfs io errors on 3.4rc1

2012-04-03 Thread Dave Jones
On Tue, Apr 03, 2012 at 12:50:55PM -0400, Dave Jones wrote: On Tue, Apr 03, 2012 at 06:33:43PM +0200, David Sterba wrote: On Tue, Apr 03, 2012 at 12:20:23PM -0400, Dave Jones wrote: I see a lot of these .. btrfs: __btrfs_end_transaction -EIO abored=1802201963 (no super error

Re: btrfs io errors on 3.4rc1

2012-04-03 Thread Dave Jones
On Tue, Apr 03, 2012 at 01:07:50PM -0400, Dave Jones wrote: wait, what... 535 memset(trans, 0, sizeof(*trans)); 536 kmem_cache_free(btrfs_trans_handle_cachep, trans); 537 538 if (throttle) 539 btrfs_run_delayed_iputs(root

[3.18rc1] btrfs triggering vm bug_on

2014-10-21 Thread Dave Jones
page:ea00088aa1c0 count:4 mapcount:0 mapping:88009901e2d8 index:0x0 flags: 0x2ffc000806(error|referenced|private) page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) [ cut here ] kernel BUG at mm/filemap.c:747! invalid opcode: [#1] PREEMPT SMP

Re: [3.18rc1] btrfs triggering vm bug_on

2014-10-21 Thread Dave Jones
On Wed, Oct 22, 2014 at 08:50:57AM +0800, Qu Wenruo wrote: Any reproducer? Thanks, Qu Original Message Subject: [3.18rc1] btrfs triggering vm bug_on From: Dave Jones da...@redhat.com To: Linux Kernel linux-ker...@vger.kernel.org Date: 2014-10-22 05:57

btrfs: WARN_ON(data_sinfo->bytes_may_use < bytes);

2014-10-22 Thread Dave Jones
Just hit this while running trinity. WARNING: CPU: 3 PID: 9612 at fs/btrfs/extent-tree.c:3799 btrfs_free_reserved_data_space+0x1d1/0x280 [btrfs]() Modules linked in: rfcomm hidp bnep af_key llc2 scsi_transport_iscsi nfnetlink sctp libcrc32c can_raw can_bcm nfc caif_socket caif af_802154

Re: btrfs: WARN_ON(data_sinfo->bytes_may_use < bytes);

2014-10-22 Thread Dave Jones
On Wed, Oct 22, 2014 at 09:07:31PM -0400, Dave Jones wrote: Just hit this while running trinity. WARNING: CPU: 3 PID: 9612 at fs/btrfs/extent-tree.c:3799 btrfs_free_reserved_data_space+0x1d1/0x280 [btrfs]() Modules linked in: rfcomm hidp bnep af_key llc2 scsi_transport_iscsi

Re: btrfs: WARN_ON(data_sinfo->bytes_may_use < bytes);

2014-10-24 Thread Dave Jones
On Fri, Oct 24, 2014 at 11:52:47PM +0800, Liu Bo wrote: I also see this WARN_ON being hit from the sync path.. WARNING: CPU: 2 PID: 11166 at fs/btrfs/extent-tree.c:3799 btrfs_free_reserved_data_space+0x1d1/0x280 [btrfs]() CPU: 2 PID: 11166 Comm: trinity-c61 Tainted: GW

btrfs crash when low on memory.

2013-02-26 Thread Dave Jones
Something I've yet to repeat managed to leak a whole bunch of memory while I was travelling, and locked up my workstation. When I got home, this was the last thing printed out before it locked up (it did make it into the logs thankfully) after a bunch of instances of the oom-killer's handiwork.

!PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-08 Thread Dave Jones
Not sure if I've already reported this one, but I've been seeing this a lot this last couple days. kernel BUG at mm/page-writeback.c:2654! invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN CPU: 1 PID: 2566 Comm: trinity-c1 Tainted: GW 4.4.0-rc4-think+ #14 task:

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote: > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote: > > Not sure if I've already reported this one, but I've been seeing this > > a lot this last couple days. > > > > kernel BUG at mm/page-wri

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote: > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote: > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote: > > > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote: > > >

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote: > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote: > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote: > > > On Tue, Dec 08, 2015 at 11:25:28PM -0500, Dave Jones wrote: > > >

Re: !PageLocked BUG_ON hit in clear_page_dirty_for_io

2015-12-10 Thread Dave Jones
On Thu, Dec 10, 2015 at 05:57:20PM -0500, Dave Jones wrote: > On Thu, Dec 10, 2015 at 04:30:24PM -0500, Chris Mason wrote: > > On Thu, Dec 10, 2015 at 02:35:55PM -0500, Dave Jones wrote: > > > On Thu, Dec 10, 2015 at 02:02:20PM -0500, Chris Mason wrote: > > > >

Re: btrfs_destroy_inode WARN_ON.

2016-03-27 Thread Dave Jones
On Thu, Mar 24, 2016 at 06:54:11PM -0400, Dave Jones wrote: > Just hit this on a tree from earlier this morning, v4.5-11140 or so. > > WARNING: CPU: 2 PID: 32570 at fs/btrfs/inode.c:9261 > btrfs_destroy_inode+0x389/0x3f0 [btrfs] > CPU: 2 PID: 32570 Comm: rm Not tainted 4

btrfs_destroy_inode WARN_ON.

2016-03-24 Thread Dave Jones
Just hit this on a tree from earlier this morning, v4.5-11140 or so. WARNING: CPU: 2 PID: 32570 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x389/0x3f0 [btrfs] CPU: 2 PID: 32570 Comm: rm Not tainted 4.5.0-think+ #14 c039baf9 ef721ef0 88025966fc08 8957bcdb

Re: btrfs_destroy_inode WARN_ON.

2016-04-01 Thread Dave Jones
On Sun, Mar 27, 2016 at 09:14:00PM -0400, Dave Jones wrote: > > WARNING: CPU: 2 PID: 32570 at fs/btrfs/inode.c:9261 > btrfs_destroy_inode+0x389/0x3f0 [btrfs] > > CPU: 2 PID: 32570 Comm: rm Not tainted 4.5.0-think+ #14 > > c039baf9 ef72

Re: btrfs_destroy_inode WARN_ON.

2016-04-01 Thread Dave Jones
On Fri, Apr 01, 2016 at 02:12:27PM -0400, Dave Jones wrote: > BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s! > Showing busy workqueues and worker pools: > workqueue events: flags=0x0 > pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 >

BTRFS: assertion failed: num_extents, file: fs/btrfs/extent-tree.c, line: 5584

2016-04-20 Thread Dave Jones
Don't think I've reported this one before. It's on the same box I've been seeing the btrfs_destroy_inode WARN_ON's on though. Dave BTRFS: assertion failed: num_extents, file: fs/btrfs/extent-tree.c, line: 5584 [ cut here ] kernel BUG at fs/btrfs/ctree.h:4320!

Re: assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-02-27 Thread Dave Jones
On Mon, Feb 27, 2017 at 07:53:48AM -0800, Liu Bo wrote: > On Sun, Feb 26, 2017 at 07:18:42PM -0500, Dave Jones wrote: > > Hitting this fairly frequently.. I'm not sure if this is the same bug I've > > been hitting occasionally since 4.9. The assertion looks new to me

assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-02-26 Thread Dave Jones
Hitting this fairly frequently.. I'm not sure if this is the same bug I've been hitting occasionally since 4.9. The assertion looks new to me at least. Dave assertion failed: last_size == new_size, file: fs/btrfs/inode.c, line: 4619 [ cut here ] kernel BUG at

Re: lockdep warning in btrfs in 4.8-rc3

2016-09-08 Thread Dave Jones
On Thu, Sep 08, 2016 at 08:58:48AM -0400, Chris Mason wrote: > On 09/08/2016 07:50 AM, Christian Borntraeger wrote: > > On 09/08/2016 01:48 PM, Christian Borntraeger wrote: > >> Chris, > >> > >> with 4.8-rc3 I get the following on an s390 box: > > > > Sorry for the noise, just saw the fix

btrfs_direct_IO oops

2016-10-08 Thread Dave Jones
Found this in logs this morning. First time I've seen this one. Might be related to some direct IO related changes I made in Trinity that is tickling some new path. Oops: [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 2 PID: 25313 Comm: trinity-c18 Not tainted 4.8.0-think+ #7 task: 88040f7b1c00

Re: btrfs_direct_IO oops

2016-10-08 Thread Dave Jones
On Sat, Oct 08, 2016 at 07:29:03PM +0100, Al Viro wrote: > On Sat, Oct 08, 2016 at 02:08:06PM -0400, Dave Jones wrote: > > That code: matches this disassembly: > > > > for (i = seg + 1; i < iter->nr_segs; i++) { > > *whoa* > > OK

Re: btrfs_direct_IO oops

2016-10-10 Thread Dave Jones
On Mon, Oct 10, 2016 at 04:43:57AM +0100, Al Viro wrote: > Very interesting. Could you slap something like > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 0ce3411..1ef00e7 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -682,8 +682,9 @@ static void pipe_advance(struct

Re: btrfs_direct_IO oops

2016-10-09 Thread Dave Jones
On Sat, Oct 08, 2016 at 07:20:08PM -0400, Dave Jones wrote: > On Sat, Oct 08, 2016 at 07:29:03PM +0100, Al Viro wrote: > > On Sat, Oct 08, 2016 at 02:08:06PM -0400, Dave Jones wrote: > > > That code: matches this disassembly: > > > > > >

Re: btrfs_direct_IO oops

2016-10-10 Thread Dave Jones
On Mon, Oct 10, 2016 at 04:55:03PM +0100, Al Viro wrote: > > WARNING: CPU: 1 PID: 13581 at lib/iov_iter.c:327 sanity+0x102/0x150 > > CPU: 1 PID: 13581 Comm: trinity-c17 Not tainted 4.8.0-think+ #9 > > c9963ae8 > > b93e22d1 > > > > > >

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Tue, Oct 25, 2016 at 06:39:03PM -0700, Linus Torvalds wrote: > On Tue, Oct 25, 2016 at 6:33 PM, Linus Torvalds > wrote: > > > > Completely untested. Maybe there's some reason we can't write to the > > whole thing like that? > > That hack boots and seems

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 05:03:45PM -0600, Jens Axboe wrote: > On 10/26/2016 04:58 PM, Linus Torvalds wrote: > > On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds > > wrote: > >> > >> Dave: it might be a good idea to split that "WARN_ON_ONCE()" in > >>

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote: > >- hctx->queued++; > >- data->hctx = hctx; > >- data->ctx = ctx; > >+ data->hctx = alloc_data.hctx; > >+ data->ctx = alloc_data.ctx; > >+ data->hctx->queued++; > >return rq; > > } > > This made it through

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 03:51:01PM -0700, Linus Torvalds wrote: > Dave: it might be a good idea to split that "WARN_ON_ONCE()" in > blk_mq_merge_queue_io() into two, since right now it can trigger both > for the > > blk_mq_bio_to_request(rq, bio); > > path _and_ for the >

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 09:48:39AM -0700, Linus Torvalds wrote: > I know you already had this in some email, but I lost it. I think you > narrowed it down to a specific set of system calls that seems to > trigger this best. fallocate and xattrs or something? So I was about to give that a shot

Re: btrfs btree_ctree_super fault

2016-11-08 Thread Dave Jones
On Sun, Nov 06, 2016 at 11:55:39AM -0500, Dave Jones wrote: > > > On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote: > > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote: > > >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <da...@c

Re: btrfs btree_ctree_super fault

2016-11-06 Thread Dave Jones
On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote: > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote: > >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <da...@codemonkey.org.uk> > >wrote: > >> > >> BUG: Bad page sta

Re: btrfs btree_ctree_super fault

2016-11-10 Thread Dave Jones
On Tue, Nov 08, 2016 at 10:08:04AM -0500, Chris Mason wrote: > > And another new one: > > > > kernel BUG at fs/btrfs/ctree.c:3172! > > > > Call Trace: > > [] __btrfs_drop_extents+0xb00/0xe30 [btrfs] > > We've been hunting this one for at least two years. It's the white > whale of

Re: bio linked list corruption.

2016-10-22 Thread Dave Jones
On Fri, Oct 21, 2016 at 04:02:45PM -0400, Dave Jones wrote: > > It could be worth trying this, too: > > > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=174531fef4e8 > > > > It occurred to me that the cur

Re: bio linked list corruption.

2016-10-23 Thread Dave Jones
On Sun, Oct 23, 2016 at 05:32:21PM -0400, Chris Mason wrote: > > > On 10/22/2016 11:20 AM, Dave Jones wrote: > > On Fri, Oct 21, 2016 at 04:02:45PM -0400, Dave Jones wrote: > > > > > > It could be worth trying this, too: > > > > >

Re: bio linked list corruption.

2016-10-21 Thread Dave Jones
On Fri, Oct 21, 2016 at 04:17:48PM -0400, Chris Mason wrote: > > BTRFS warning (device sda3): csum failed ino 130654 off 0 csum 2566472073 > > expected csum 3008371513 > > BTRFS warning (device sda3): csum failed ino 131057 off 4096 csum > > 3563910319 expected csum 738595262 > > BTRFS

Re: bio linked list corruption.

2016-10-21 Thread Dave Jones
On Thu, Oct 20, 2016 at 04:23:32PM -0700, Andy Lutomirski wrote: > On Thu, Oct 20, 2016 at 4:03 PM, Dave Jones <da...@codemonkey.org.uk> wrote: > > On Thu, Oct 20, 2016 at 04:01:12PM -0700, Andy Lutomirski wrote: > > > On Thu, Oct 20, 2016 at 3:50 PM, Dave Jones

Re: bio linked list corruption.

2016-10-21 Thread Dave Jones
On Fri, Oct 21, 2016 at 04:41:09PM -0400, Josef Bacik wrote: > >> > > >> > btrfs inspect inode 130654 mntpoint > >> > >> Interesting, they all return > >> > >> ERROR: ino paths ioctl: No such file or directory > >> > >> So these files got deleted perhaps ? > >> > > Yeah, they must

Re: bio linked list corruption.

2016-10-20 Thread Dave Jones
On Tue, Oct 18, 2016 at 06:05:57PM -0700, Andy Lutomirski wrote: > One possible debugging approach would be to change: > > #define NR_CACHED_STACKS 2 > > to > > #define NR_CACHED_STACKS 0 > > in kernel/fork.c and to set CONFIG_DEBUG_PAGEALLOC=y. The latter will > force an

Re: bio linked list corruption.

2016-10-20 Thread Dave Jones
On Thu, Oct 20, 2016 at 04:01:12PM -0700, Andy Lutomirski wrote: > On Thu, Oct 20, 2016 at 3:50 PM, Dave Jones <da...@codemonkey.org.uk> wrote: > > On Tue, Oct 18, 2016 at 06:05:57PM -0700, Andy Lutomirski wrote: > > > > > One possible debuggi

Re: bio linked list corruption.

2016-10-20 Thread Dave Jones
On Tue, Oct 18, 2016 at 05:28:44PM -0700, Linus Torvalds wrote: > On Tue, Oct 18, 2016 at 5:10 PM, Linus Torvalds > wrote: > > > > Adding Andy to the cc, because this *might* be triggered by the > > vmalloc stack code itself. Maybe the re-use of stacks showing

Re: bio linked list corruption.

2016-10-18 Thread Dave Jones
On Tue, Oct 11, 2016 at 10:45:07AM -0400, Dave Jones wrote: > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0 > list_add corruption. prev->next should be next (e8806648), but was > c967fcd8. (prev=880503878b80). > CPU: 1 PID: 3673 C

Re: bio linked list corruption.

2016-11-23 Thread Dave Jones
On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote: > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4 > trace from just before this happened. Does this shed any light ? > > https://codemonkey.org.uk/junk/trace.txt crap, I just noticed th

Re: bio linked list corruption.

2016-11-23 Thread Dave Jones
On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote: > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote: > >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <da...@codemonkey.org.uk> > >wrote: > >> > >> BUG: Bad page sta

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote: > Could you try the attached patch? It adds a couple of sanity tests: > > - a number of tests to verify that 'rq->queuelist' isn't already on > some queue when it is added to a queue > > - one test to verify that rq->mq_ctx

Re: bio linked list corruption.

2016-10-26 Thread Dave Jones
On Wed, Oct 26, 2016 at 09:48:39AM -0700, Linus Torvalds wrote: > On Wed, Oct 26, 2016 at 9:30 AM, Dave Jones <da...@codemonkey.org.uk> wrote: > > > > I gave this a go last thing last night. It crashed within 5 minutes, > > but it was one we've already s

Re: bio linked list corruption.

2016-10-27 Thread Dave Jones
On Thu, Oct 27, 2016 at 04:41:33PM +1100, Dave Chinner wrote: > And that's indicative of a delalloc metadata reservation being > being too small and so we're allocating unreserved blocks. > > Different symptoms, same underlying cause, I think. > > I see the latter assert from time to

Re: bio linked list corruption.

2016-10-31 Thread Dave Jones
On Wed, Oct 26, 2016 at 07:47:51PM -0400, Dave Jones wrote: > On Wed, Oct 26, 2016 at 07:38:08PM -0400, Chris Mason wrote: > > > >-hctx->queued++; > > >-data->hctx = hctx; > > >-data->ctx = ctx; > > >+

Re: btrfs bio linked list corruption.

2016-10-11 Thread Dave Jones
On Tue, Oct 11, 2016 at 11:20:41AM -0400, Chris Mason wrote: > > > On 10/11/2016 11:19 AM, Dave Jones wrote: > > On Tue, Oct 11, 2016 at 04:11:39PM +0100, Al Viro wrote: > > > On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote: > > > > This is

Re: btrfs bio linked list corruption.

2016-10-11 Thread Dave Jones
On Tue, Oct 11, 2016 at 04:11:39PM +0100, Al Viro wrote: > On Tue, Oct 11, 2016 at 10:45:08AM -0400, Dave Jones wrote: > > This is from Linus' current tree, with Al's iovec fixups on top. > > Those iovec fixups are in the current tree... ah yeah, git quietly dropped my lo

btrfs bio linked list corruption.

2016-10-11 Thread Dave Jones
This is from Linus' current tree, with Al's iovec fixups on top. [ cut here ] WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0 list_add corruption. prev->next should be next (e8806648), but was c967fcd8. (prev=880503878b80). CPU: 1

Re: btrfs bio linked list corruption.

2016-10-11 Thread Dave Jones
On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote: > > > On 10/11/2016 10:45 AM, Dave Jones wrote: > > This is from Linus' current tree, with Al's iovec fixups on top. > > > > [ cut here ] > > WARNING: CPU: 1 PID: 3673

Re: btrfs bio linked list corruption.

2016-10-13 Thread Dave Jones
On Wed, Oct 12, 2016 at 10:42:46AM -0400, Chris Mason wrote: > On 10/12/2016 10:40 AM, Dave Jones wrote: > > On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote: > > > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote: > > > > > > >

Re: btrfs bio linked list corruption.

2016-10-13 Thread Dave Jones
On Thu, Oct 13, 2016 at 05:18:46PM -0400, Chris Mason wrote: > > > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 > > start_transaction+0x40a/0x440 [btrfs] > > > > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14 > > > > c900019076a8 b731ff3c

Re: btrfs bio linked list corruption.

2016-10-12 Thread Dave Jones
On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote: > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote: > > > > > > On 10/11/2016 10:45 AM, Dave Jones wrote: > > > This is from Linus' current tree, with Al's iovec fixups on top. > >

Re: btrfs bio linked list corruption.

2016-10-12 Thread Dave Jones
On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote: > > > On 10/11/2016 10:45 AM, Dave Jones wrote: > > This is from Linus' current tree, with Al's iovec fixups on top. > > > > [ cut here ] > > WARNING: CPU: 1 PID: 3673

Re: btrfs bio linked list corruption.

2016-10-15 Thread Dave Jones
On Thu, Oct 13, 2016 at 05:18:46PM -0400, Chris Mason wrote: > > > > .. and of course the first thing that happens is a completely > > different > > > > btrfs trace.. > > > > > > > > > > > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 > > start_transaction+0x40a/0x440

Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-07 Thread Dave Jones
On Sat, Dec 03, 2016 at 11:48:33AM -0500, Dave Jones wrote: > The interesting process here seems to be kworker/u8:17, and the trace > captures some of what that was doing before that bad page was hit. I'm travelling next week, so I'm trying to braindump the stuff I've found

4.10-rc btrfs gets 'stuck'.

2017-01-13 Thread Dave Jones
I've seen this happen 3 times during 4.10rc. When running trinity, it gets 'stuck', with all but one process stuck on a lock. I've left this running for days, and it never makes progress. The process holding the lock seems to be stuck somewhere. When this happens it's pretty apparent in ps axf

Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-03 Thread Dave Jones
On Thu, Dec 01, 2016 at 10:32:09AM -0500, Dave Jones wrote: > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents.txt > > Also same bug, different run, but a different traceview > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents-functi

Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-01 Thread Dave Jones
On Wed, Nov 23, 2016 at 02:58:45PM -0500, Dave Jones wrote: > On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote: > > > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4 > > trace from just before this happened. Does this shed any light ?

Re: bio linked list corruption.

2016-12-05 Thread Dave Jones
On Mon, Dec 05, 2016 at 06:09:29PM +0100, Vegard Nossum wrote: > On 5 December 2016 at 12:10, Vegard Nossum wrote: > > On 5 December 2016 at 00:04, Vegard Nossum wrote: > >> FWIW I hit this as well: > >> > >> BUG: unable to handle kernel

Re: assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-03-03 Thread Dave Jones
On Thu, Mar 02, 2017 at 06:04:33PM -0800, Liu Bo wrote: > On Thu, Mar 02, 2017 at 07:58:01AM -0800, Liu Bo wrote: > > On Wed, Mar 01, 2017 at 03:03:19PM -0500, Dave Jones wrote: > > > On Tue, Feb 28, 2017 at 05:12:01PM -0800, Liu Bo wrote: > > > > On Mon, Fe

assertion failed: page_ops & PAGE_LOCK

2017-03-05 Thread Dave Jones
After commenting out the assertion that Liu bo pointed out was bogus, my trinity runs last a little longer.. This is a new one I think.. assertion failed: page_ops & PAGE_LOCK, file: fs/btrfs/extent_io.c, line: 1716 [ cut here ] kernel BUG at fs/btrfs/ctree.h:3423! invalid

Re: assertion failed: last_size == new_size, file: fs/btrfs/inode.c

2017-03-01 Thread Dave Jones
On Tue, Feb 28, 2017 at 05:12:01PM -0800, Liu Bo wrote: > On Mon, Feb 27, 2017 at 11:23:42AM -0500, Dave Jones wrote: > > On Mon, Feb 27, 2017 at 07:53:48AM -0800, Liu Bo wrote: > > > On Sun, Feb 26, 2017 at 07:18:42PM -0500, Dave Jones wrote: > > > > Hittin

btrfs_wait_ordered_roots warning triggered

2017-06-21 Thread Dave Jones
WARNING: CPU: 2 PID: 7153 at fs/btrfs/ordered-data.c:753 btrfs_wait_ordered_roots+0x1a3/0x220 CPU: 2 PID: 7153 Comm: kworker/u8:7 Not tainted 4.12.0-rc6-think+ #4 Workqueue: events_unbound btrfs_async_reclaim_metadata_space task: 8804f08d5380 task.stack: c9000895c000 RIP:

Re: btrfs_wait_ordered_roots warning triggered

2017-06-21 Thread Dave Jones
On Wed, Jun 21, 2017 at 11:52:36AM -0400, Chris Mason wrote: > On 06/21/2017 11:16 AM, Dave Jones wrote: > > WARNING: CPU: 2 PID: 7153 at fs/btrfs/ordered-data.c:753 > > btrfs_wait_ordered_roots+0x1a3/0x220 > > CPU: 2 PID: 7153 Comm: kworker/u8:7 Not tainted