[PATCH v4] btrfs: fix fsfreeze hang caused by delayed iputs deal
When running fstests generic/068, we sometimes hit the deadlock below:

xfs_io          D 8800331dbb20     0  6697   6693 0x0080
 8800331dbb20 88007acfc140 880034d895c0 8800331dc000
 880032d243e8 fffe 880032d24400 0001
 8800331dbb38 816a9045 880034d895c0 8800331dbba8
Call Trace:
 [] schedule+0x35/0x80
 [] rwsem_down_read_failed+0xf2/0x140
 [] ? __filemap_fdatawrite_range+0xd1/0x100
 [] call_rwsem_down_read_failed+0x18/0x30
 [] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
 [] percpu_down_read+0x35/0x50
 [] __sb_start_write+0x2c/0x40
 [] start_transaction+0x2a5/0x4d0 [btrfs]
 [] btrfs_join_transaction+0x17/0x20 [btrfs]
 [] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
 [] evict+0xba/0x1a0
 [] iput+0x196/0x200
 [] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
 [] btrfs_commit_transaction+0x928/0xa80 [btrfs]
 [] btrfs_freeze+0x30/0x40 [btrfs]
 [] freeze_super+0xf0/0x190
 [] do_vfs_ioctl+0x4a5/0x5c0
 [] ? do_audit_syscall_entry+0x66/0x70
 [] ? syscall_trace_enter_phase1+0x11f/0x140
 [] SyS_ioctl+0x79/0x90
 [] do_syscall_64+0x62/0x110
 [] entry_SYSCALL64_slow_path+0x25/0x25

The trace shows that freeze_super() already holds SB_FREEZE_FS when
btrfs_freeze() calls btrfs_commit_transaction(). If
btrfs_commit_transaction() finds delayed iputs to handle, it calls
start_transaction(), which tries to take the SB_FREEZE_FS lock again,
and the deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not guarantee
that all metadata has been updated. Other code may still add delayed
iputs afterwards; see the sample race window below:

           CPU1                                  |         CPU2
|-> freeze_super()                               |
    |-> sync_filesystem(sb);                     |
    |                                            |-> cleaner_kthread()
    |                                            |   |-> btrfs_delete_unused_bgs()
    |                                            |       |-> btrfs_remove_chunk()
    |                                            |           |-> btrfs_remove_block_group()
    |                                            |               |-> btrfs_add_delayed_iput()
    |                                            |
    |-> sb->s_writers.frozen = SB_FREEZE_FS;     |
    |-> sb_wait_write(sb, SB_FREEZE_FS);         |
    |   acquire SB_FREEZE_FS lock.               |
    |                                            |
    |-> btrfs_freeze()                           |
        |-> btrfs_commit_transaction()           |
            |-> btrfs_run_delayed_iputs()        |
            |   will handle delayed iputs,       |
            |   that means start_transaction()   |
            |   will be called, which will try   |
            |   to get SB_FREEZE_FS lock.        |

To fix this issue, introduce an "int fs_frozen" to record internally
whether the fs has been frozen. If the fs has been frozen, we must not
handle delayed iputs.

Signed-off-by: Wang Xiaoguang
---
v3: introduce an atomic_t fs_frozen; if fs_frozen is 1, we must not
    handle delayed iputs.
v4: change atomic_t fs_frozen to int.
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/super.c       | 10 ++
 fs/btrfs/transaction.c |  7 ++-
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 90041a2..3f241d5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1091,6 +1091,8 @@ struct btrfs_fs_info {
 	struct list_head pinned_chunks;
 
 	int creating_free_space_tree;
+	/* Used to record internally whether fs has been frozen */
+	int fs_frozen;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 86cad9a..1d26a51 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2621,6 +2621,7 @@ int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->qgroup_op_seq, 0);
 	atomic_set(&fs_info->reada_works_cnt, 0);
 	atomic64_set(&fs_info->tree_mod_seq, 0);
+	fs_info->fs_frozen = 0;
 	fs_info->sb = sb;
 	fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
 	fs_info->metadata_ratio = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60e7179..6b73cad 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2216,6 +2216,7 @@ static int btrfs_freeze(struct super_block *sb)
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = btrfs_sb(sb)->tree_root;
 
+	root->fs_info->fs_frozen = 1;
 	trans = btrfs_attach_transaction_barrier(root);
 	if (IS_ERR(trans)) {
 		/* no transaction, don't bother */
@@ -2226,6 +2227,14 @@ static int btrfs_freeze(struct super_block *sb)
 	return btrfs_commit_transaction(trans, root);
 }
 
+static int btrfs_unfreeze(struct super_block *sb)
+{
+	struct btrfs_root *root = btrfs_sb(sb)->tree_root;
+
+	root->fs_info->fs_frozen = 0;
+	return 0;
+}
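
The fs/btrfs/transaction.c hunk is not shown above, so the following is only a userspace model of the rule stated in the commit message ("if the fs has been frozen, do not handle delayed iputs"). It is not the kernel code: the struct and every `model_*` name are hypothetical, and only the `fs_frozen` field name comes from the patch.

```c
/* Hypothetical stand-in for the relevant part of struct btrfs_fs_info. */
struct model_fs_info {
	int fs_frozen;      /* 1 between freeze and unfreeze, else 0 */
	int pending_iputs;  /* number of queued delayed iputs (model only) */
};

/* Mirrors what the patch does at the top of btrfs_freeze(). */
void model_freeze(struct model_fs_info *fs)
{
	fs->fs_frozen = 1;
}

/* Mirrors the new btrfs_unfreeze(). */
void model_unfreeze(struct model_fs_info *fs)
{
	fs->fs_frozen = 0;
}

/*
 * Model of the commit path: delayed iputs are run only when the fs is
 * not frozen, because running them would start a new transaction and
 * block on the freeze lock that the freezer already holds.
 * Returns the number of iputs processed by this call.
 */
int model_commit_transaction(struct model_fs_info *fs)
{
	int done = 0;

	if (!fs->fs_frozen) {
		done = fs->pending_iputs;
		fs->pending_iputs = 0;
	}
	return done;
}
```

In this model the iputs queued during the race window simply stay queued while the fs is frozen and are picked up by the first commit after unfreeze.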
btrfs fi usage bug during shrink
Hi, all.

I was resizing (shrinking) a btrfs partition, and figured I'd check in on
how it was going with "btrfs fi usage." It was quite startling:

$ sudo btrfs fi usage /mnt/
Overall:
    Device size:                 370.00GiB
    Device allocated:            372.03GiB
    Device unallocated:           16.00EiB
    Device missing:                  0.00B
    Used:                        360.56GiB
    Free (estimated):                0.00B      (min: 8.00EiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              224.00MiB      (used: 0.00B)

Data,single: Size:370.02GiB, Used:359.31GiB
   /dev/mapper/c1        370.02GiB

Metadata,DUP: Size:1.00GiB, Used:639.22MiB
   /dev/mapper/c1          2.00GiB

System,DUP: Size:8.00MiB, Used:64.00KiB
   /dev/mapper/c1         16.00MiB

Unallocated:
   /dev/mapper/c1         16.00EiB

It's reasonably obvious what's going on here. The overall size has been
set to the final size, and now the worker is going through balancing all
the chunks that are now out of bounds. I feel like "fi usage" should
probably have some logic to detect this situation and report something
more sensible.

Thankfully, it's only transient, and returns to normal once the resize
completes.

Thanks,

--Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
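
The 16.00EiB figure is what an unsigned 64-bit subtraction produces when the allocated bytes exceed the (already reduced) device size, since 2^64 bytes is 16 EiB. The sketch below only illustrates that arithmetic; I have not checked how btrfs-progs actually computes the value, and both function names are made up for the example.

```c
#include <stdint.h>

/*
 * Plain unsigned subtraction. If allocated > size, the result wraps
 * modulo 2^64 and lands just below 16 EiB.
 */
uint64_t unallocated_wrapping(uint64_t size, uint64_t allocated)
{
	return size - allocated;
}

/*
 * One possible "more sensible" report: clamp to zero while the
 * allocated chunks still extend past the new device size.
 */
uint64_t unallocated_clamped(uint64_t size, uint64_t allocated)
{
	return allocated > size ? 0 : size - allocated;
}
```

With a 370 GiB size and roughly 372 GiB allocated, the wrapping version yields 2^64 minus about 2 GiB, which a human-readable formatter rounds to 16.00EiB.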
Re: systemd KillUserProcesses=yes and btrfs scrub
Chris Murphy posted on Sat, 30 Jul 2016 14:02:17 -0600 as excerpted:

> Short version: When systemd-logind login.conf KillUserProcesses=yes, and
> the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and then
> logs out of the shell, the user space operation is killed, and btrfs
> scrub status reports that the scrub was aborted. [1]

What does btrfs scrub resume do? Resume, or error?

If it resumes, I'd say RESOLVED/NOTABUG as both that systemd option and
btrfs scrub appear to be working as intended.

If it doesn't, then there's definitely a btrfs bug, even if you argue
it's only in the documentation, because the manpage (tho still 4.6.1,
here) says it resumes an interrupted scrub but won't start a new one if
the scrub finished successfully, and an abort is definitely an
interruption, not a successful finish.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Btrfs send to send out metadata and data separately
At 07/31/2016 02:49 AM, g.bt...@cobb.uk.net wrote:
> On 29/07/16 13:40, Qu Wenruo wrote:
>> Cons:
>> 1) Not full fs clone detection
>>    Such clone detection is only inside the send snapshot.
>>
>>    For case that one extent is referred only once in the send snapshot,
>>    but also referred by source subvolume, then in the received
>>    subvolume, it will be a new extent, but not a clone.
>>
>>    Only extent that is referred twice by send snapshot, that extent
>>    will be shared.
>>
>>    (Although much better than disabling the whole clone detection)
>
> Qu,
>
> Does that mean that the following, extremely common, use of send would
> be impacted?
>
> Create many snapshots of a large and fairly busy sub-volume (say,
> hourly) with few changes between each one. Send all the snapshots as
> incremental sends to a second (backup) disk either as soon as they are
> created, or maybe in bunches later.
>
> With this change, would each of the snapshots require separate space
> usage on the backup disk, with duplicates of unchanged files?
>
> If so, that would completely destroy the concept of keeping frequent
> snapshots on a backup disk (and force us to keep the snapshots on the
> original disk, causing **many** more problems with backref walks on the
> data disk).

This new behavior won't impact this use case.

The kernel send part compares tree blocks and sends out only the
difference, so incremental sends are not impacted at all.

The impacted behavior is reflinking from an old snapshot.

One example is:
1) There is a read-only snapshot A
   Already sent and recovered
2) A new snapshot B is snapshotted from A
   With the new modification:
   Reflink one extent (X) which lies in A

In that case, if we send out snapshot B based on A, then extent X will
be sent out as a new extent, not a reflink, since it's only used once
inside snapshot B. The original send would detect such a reflink and
would not send out the whole extent.

But if snapshot B has the following modifications compared to A:
1) Reflink one extent (X), which originally lies in A, to inode Z
2) Reflink one extent (X), which originally lies in A, to inode W

Then although extent X will still be sent out as a new extent, Z and W
will share the extent, as it's referred twice inside snapshot B.

I assume the most common impact will be reflinking a whole file from
the original subvolume. In that case, the whole file will be sent out
as new data. For reflinks inside the subvolume, the clone detection is
faster, and I consider that the more common case.

It's a trade-off that favors heavily deduped files (both in-band and
out-of-band) or a heavily snapshotted subvolume layout, as it completely
avoids the time-consuming backref walk.

Personally I consider it worth it.

Thanks,
Qu

> (Does the answer change if we do non-incremental sends?)
>
> I moved to this approach after the problems I had running balance on my
> (very busy, and also large) data disk because of the number of snapshots
> I was keeping on it. My data disk has about 4TB in use, and I have just
> bought a 10TB backup disk but I would need about 50 more of them if the
> hourly snapshots were no longer sharing space!
>
> If that is the case, the cure seems much worse than the disease.
>
> Apologies if I have misunderstood the proposal.
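
As a toy model of the rule in the reply above (an extent is written out the first time it appears inside the sent snapshot, and only later references inside the same snapshot become clones), something like the following could be used. It is an illustration of the described behaviour only, not code from the proposal; the struct, enum, and function names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum send_action { SEND_WRITE, SEND_CLONE };

/* Hypothetical model of one file extent reference in the sent snapshot. */
struct extent_ref {
	uint64_t inode;      /* inode that references the extent */
	uint64_t extent_id;  /* identifies the on-disk extent */
};

/*
 * Decide how refs[idx] would be sent. Only earlier references inside
 * the same snapshot are consulted; references held by other subvolumes
 * or older snapshots are deliberately ignored, which is the "not full
 * fs clone detection" limitation.
 */
enum send_action classify_ref(const struct extent_ref *refs, size_t idx)
{
	for (size_t i = 0; i < idx; i++) {
		if (refs[i].extent_id == refs[idx].extent_id)
			return SEND_CLONE;
	}
	return SEND_WRITE;
}
```

For the Z/W example, the reference from inode Z is classified as a write (X goes out as new data even though A already has it) and the reference from inode W as a clone of it.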
Re: Btrfs send to send out metadata and data separately
At 07/29/2016 09:14 PM, Libor Klepáč wrote:
> Hello,
> just a little question on receiver point 0), see below
>
> On Friday, 29 July 2016 20:40:38 CEST, Qu Wenruo wrote:
>> Hi Filipe, and maintainers,
>>
>> Receive will do the following thing first before recovering the
>> subvolume/snapshot:
>>
>> 0) Create temporary dir for data extents
>>    Create a new dir with temporary name($data_extent), to put data
>>    extents into it.
>
> These are the directories in format "o4095916-21925-0" on receiver side?

That's the temporary dir/file receive creates.

If you use send-test (not compiled anymore), you will see how receive
recovers files/dirs:
1) mkfile/mkdir with a temporary name
2) rename the temporary file/dir to its final name

> I'm in middle of send/receive and I have over 300 thousand of them
> already. I was always told that a directory with lots of items suffers
> on performance (but I never used btrfs before :), is that true?

Not completely true.

The fact is even worse: no matter whether it is a dir or a file, if
there are too many inodes/extents in a *subvolume/snapshot*, it will be
slowed down.

Unlike a normal fs (ext*/xfs), btrfs puts all dir/file info into one
subvolume tree.

That said, if you design the subvolume layout carefully, which means
never putting too many things into one subvolume, it should not be a big
problem.

> Should it be little structured (subdirs based on extent number, for
> example) ?

Dirs and files share the same subvolume tree, so subdirs won't help.

But I can create a new subvolume for data extents, and reflink can work
across subvolumes, so that won't cause a big problem.

Thanks,
Qu

> Libor
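
The two-step pattern described in the reply (create under a temporary name, then rename to the final name) can be sketched in userspace with POSIX calls. This is a generic illustration, not btrfs receive's code: receive derives names such as "o4095916-21925-0" itself, whereas this sketch uses mkstemp() for the temporary name, and `create_then_rename` is a hypothetical helper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create final_path in two steps:
 *   1) create the file under a temporary name,
 *   2) rename it to its final name.
 * tmp_template must end in "XXXXXX" (a mkstemp requirement), must be on
 * the same filesystem as final_path, and is modified in place.
 * Returns 0 on success, -1 on error.
 */
int create_then_rename(char *tmp_template, const char *final_path,
		       const char *data)
{
	size_t len = strlen(data);
	int fd = mkstemp(tmp_template);	/* step 1: temporary name */

	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp_template);
		return -1;
	}
	if (close(fd) != 0) {
		unlink(tmp_template);
		return -1;
	}

	/* step 2: the file appears under its final name in one operation */
	if (rename(tmp_template, final_path) != 0) {
		unlink(tmp_template);
		return -1;
	}
	return 0;
}
```

The temporary names a user sees mid-receive are therefore entries that have been created but not yet renamed.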
Re: systemd KillUserProcesses=yes and btrfs scrub
On Sun, Jul 31, 2016 at 4:56 AM, Gabriel C wrote:
>
>
> On 30.07.2016 22:02, Chris Murphy wrote:
>> Short version: When systemd-logind login.conf KillUserProcesses=yes,
>> and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and
>> then logs out of the shell, the user space operation is killed, and
>> btrfs scrub status reports that the scrub was aborted. [1]
>>
>
> How this is a bug ?

If the privilege-escalated operation (kernel threads included) is
clobbered, then it's a bug, because there's every reason for a user to
issue a command that could take hours or days and not have to stay
logged in to their GUI shell session while it runs, for example over the
weekend. Yes, of course they could schedule it, but saying they could do
it another way doesn't fix the use case of doing it manually.

If the operation continues and just the user space command is killed
off, it's a bug because the statistics and status of the scrub are lost
to future status checks; that is, "interrupted" is sufficiently
misleading that it's false. The operation did continue, we've just lost
the conclusion.

With balance and replace, when the user process is killed the kernel
operation continues, and it's still possible for a user to get current
(and correct) status information for both.

Further, it's arguably a regression compared to the equivalent mdadm and
LVM RAID behaviors.

>
> It is exactly what 'KillUserProcesses=yes' is expected to do..
>

No, it basically breaks scrub initiated within a user's GUI session, and
that is in no way intended by anyone; it's a side effect. The question
is how to fix it, not debate whether it's a bug, that's ridiculous.

--
Chris Murphy
Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine
On 2016.07.31 at 17:10 +0200, Michal Hocko wrote: > [CC Mel and linux-mm] > > On Sun 31-07-16 07:11:21, Markus Trippelsdorf wrote: > > Tonight the OOM killer got invoked during backup of /: > > > > [Jul31 01:56] kthreadd invoked oom-killer: > > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, > > oom_score_adj=0 > > This a kernel stack allocation. > > > [ +0.04] CPU: 3 PID: 2 Comm: kthreadd Not tainted > > 4.7.0-06816-g797cee982eef-dirty #37 > > [ +0.00] Hardware name: System manufacturer System Product > > Name/M4A78T-E, BIOS 350304/13/2011 > > [ +0.02] 813c2d58 8802168e7d48 > > 002ec4ea > > [ +0.02] 8118eb9d 01b8 0440 > > 03b0 > > [ +0.02] 8802133fe400 002ec4ea 81b8ac9c > > 0006 > > [ +0.01] Call Trace: > > [ +0.04] [] ? dump_stack+0x46/0x6e > > [ +0.03] [] ? dump_header.isra.11+0x4c/0x1a7 > > [ +0.02] [] ? oom_kill_process+0x2ab/0x460 > > [ +0.01] [] ? out_of_memory+0x2e3/0x380 > > [ +0.02] [] ? > > __alloc_pages_slowpath.constprop.124+0x1d32/0x1e40 > > [ +0.01] [] ? __alloc_pages_nodemask+0x10c/0x120 > > [ +0.02] [] ? copy_process.part.72+0xea/0x17a0 > > [ +0.02] [] ? pick_next_task_fair+0x915/0x1520 > > [ +0.01] [] ? kthread_flush_work_fn+0x20/0x20 > > [ +0.01] [] ? kernel_thread+0x7a/0x1c0 > > [ +0.01] [] ? kthreadd+0xd2/0x120 > > [ +0.02] [] ? ret_from_fork+0x1f/0x40 > > [ +0.01] [] ? 
kthread_stop+0x100/0x100 > > [ +0.01] Mem-Info: > > [ +0.03] active_anon:5882 inactive_anon:60307 isolated_anon:0 > >active_file:1523729 inactive_file:223965 isolated_file:0 > >unevictable:1970 dirty:130014 writeback:40735 unstable:0 > >slab_reclaimable:179690 slab_unreclaimable:8041 > >mapped:6771 shmem:3 pagetables:592 bounce:0 > >free:11374 free_pcp:54 free_cma:0 > > [ +0.04] Node 0 active_anon:23528kB inactive_anon:241228kB > > active_file:6094916kB inactive_file:895860kB unevictable:7880kB > > isolated(anon):0kB isolated(file):0kB mapped:27084kB dirty:520056kB > > writeback:162940kB shmem:12kB writeback_tmp:0kB unstable:0kB > > pages_scanned:32 all_unreclaimable? no > > [ +0.02] DMA free:15908kB min:20kB low:32kB high:44kB active_anon:0kB > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB > > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > [ +0.01] lowmem_reserve[]: 0 3486 7953 7953 > > [ +0.04] DMA32 free:23456kB min:4996kB low:8564kB high:12132kB > > active_anon:2480kB inactive_anon:10564kB active_file:2559792kB > > inactive_file:478680kB unevictable:0kB writepending:365292kB > > present:3652160kB managed:3574264kB mlocked:0kB slab_reclaimable:437456kB > > slab_unreclaimable:12304kB kernel_stack:144kB pagetables:28kB bounce:0kB > > free_pcp:212kB local_pcp:0kB free_cma:0kB > > [ +0.01] lowmem_reserve[]: 0 0 4466 4466 > > [ +0.03] Normal free:6132kB min:6400kB low:10972kB high:15544kB > > active_anon:21048kB inactive_anon:230664kB active_file:3535124kB > > inactive_file:417312kB unevictable:7880kB writepending:318020kB > > present:4718592kB managed:4574096kB mlocked:7880kB > > slab_reclaimable:281304kB slab_unreclaimable:19860kB kernel_stack:2944kB > > pagetables:2340kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > [ +0.00] lowmem_reserve[]: 0 0 0 0 > > [ +0.02] DMA: 1*4kB (U) 
0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U)
> > 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15908kB
> > [ +0.05] DMA32: 4215*4kB (UMEH) 319*8kB (UMH) 5*16kB (H) 2*32kB (H)
> > 2*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 1*2048kB (H)
> > 0*4096kB = 23396kB
> > [ +0.06] Normal: 650*4kB (UMH) 4*8kB (UH) 27*16kB (H) 23*32kB (H)
> > 17*64kB (H) 11*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6296kB
>
> The memory is quite fragmented but there are order-2+ free blocks. They
> seem to be in the high atomic reserves but we should release them.
> Is this reproducible? If yes, could you try with the 4.7 kernel please?

It never happened before, and it has only happened once so far. I will
continue to run the latest git kernel and let you know if it happens
again.

(I did copy several git trees to my root partition yesterday, so the
incremental btrfs stream was larger than usual.)

--
Markus
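
The per-zone free-list lines quoted above can be checked by hand: each entry is "count*block size", and a block of order o is 4kB << o. The helper below is a small sketch for summing such a line from a chosen minimum order; the function name is made up and the 4 kB base page size is taken from the log.

```c
#include <stddef.h>

#define PAGE_KB 4UL	/* base page size in kB, as in the quoted log */

/*
 * free_blocks[o] is the number of free blocks of order o, i.e. blocks
 * of (PAGE_KB << o) kB. Returns the free memory in kB held in blocks
 * of order min_order or larger.
 */
unsigned long free_kb_from_order(const unsigned long *free_blocks,
				 size_t norders, size_t min_order)
{
	unsigned long kb = 0;

	for (size_t o = min_order; o < norders; o++)
		kb += free_blocks[o] * (PAGE_KB << o);
	return kb;
}
```

For the Normal zone (650, 4, 27, 23, 17, 11, then zeros) this gives 6296 kB in total, matching the log, of which 3664 kB sits in order-2 or larger blocks. That is the "order-2+ free blocks" observation: the space exists, but per the (H) markers it is all in the high atomic reserve.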
Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine
[CC Mel and linux-mm] On Sun 31-07-16 07:11:21, Markus Trippelsdorf wrote: > Tonight the OOM killer got invoked during backup of /: > > [Jul31 01:56] kthreadd invoked oom-killer: > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0 This a kernel stack allocation. > [ +0.04] CPU: 3 PID: 2 Comm: kthreadd Not tainted > 4.7.0-06816-g797cee982eef-dirty #37 > [ +0.00] Hardware name: System manufacturer System Product > Name/M4A78T-E, BIOS 350304/13/2011 > [ +0.02] 813c2d58 8802168e7d48 > 002ec4ea > [ +0.02] 8118eb9d 01b8 0440 > 03b0 > [ +0.02] 8802133fe400 002ec4ea 81b8ac9c > 0006 > [ +0.01] Call Trace: > [ +0.04] [] ? dump_stack+0x46/0x6e > [ +0.03] [] ? dump_header.isra.11+0x4c/0x1a7 > [ +0.02] [] ? oom_kill_process+0x2ab/0x460 > [ +0.01] [] ? out_of_memory+0x2e3/0x380 > [ +0.02] [] ? > __alloc_pages_slowpath.constprop.124+0x1d32/0x1e40 > [ +0.01] [] ? __alloc_pages_nodemask+0x10c/0x120 > [ +0.02] [] ? copy_process.part.72+0xea/0x17a0 > [ +0.02] [] ? pick_next_task_fair+0x915/0x1520 > [ +0.01] [] ? kthread_flush_work_fn+0x20/0x20 > [ +0.01] [] ? kernel_thread+0x7a/0x1c0 > [ +0.01] [] ? kthreadd+0xd2/0x120 > [ +0.02] [] ? ret_from_fork+0x1f/0x40 > [ +0.01] [] ? kthread_stop+0x100/0x100 > [ +0.01] Mem-Info: > [ +0.03] active_anon:5882 inactive_anon:60307 isolated_anon:0 >active_file:1523729 inactive_file:223965 isolated_file:0 >unevictable:1970 dirty:130014 writeback:40735 unstable:0 >slab_reclaimable:179690 slab_unreclaimable:8041 >mapped:6771 shmem:3 pagetables:592 bounce:0 >free:11374 free_pcp:54 free_cma:0 > [ +0.04] Node 0 active_anon:23528kB inactive_anon:241228kB > active_file:6094916kB inactive_file:895860kB unevictable:7880kB > isolated(anon):0kB isolated(file):0kB mapped:27084kB dirty:520056kB > writeback:162940kB shmem:12kB writeback_tmp:0kB unstable:0kB pages_scanned:32 > all_unreclaimable? 
no > [ +0.02] DMA free:15908kB min:20kB low:32kB high:44kB active_anon:0kB > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [ +0.01] lowmem_reserve[]: 0 3486 7953 7953 > [ +0.04] DMA32 free:23456kB min:4996kB low:8564kB high:12132kB > active_anon:2480kB inactive_anon:10564kB active_file:2559792kB > inactive_file:478680kB unevictable:0kB writepending:365292kB > present:3652160kB managed:3574264kB mlocked:0kB slab_reclaimable:437456kB > slab_unreclaimable:12304kB kernel_stack:144kB pagetables:28kB bounce:0kB > free_pcp:212kB local_pcp:0kB free_cma:0kB > [ +0.01] lowmem_reserve[]: 0 0 4466 4466 > [ +0.03] Normal free:6132kB min:6400kB low:10972kB high:15544kB > active_anon:21048kB inactive_anon:230664kB active_file:3535124kB > inactive_file:417312kB unevictable:7880kB writepending:318020kB > present:4718592kB managed:4574096kB mlocked:7880kB slab_reclaimable:281304kB > slab_unreclaimable:19860kB kernel_stack:2944kB pagetables:2340kB bounce:0kB > free_pcp:0kB local_pcp:0kB free_cma:0kB > [ +0.00] lowmem_reserve[]: 0 0 0 0 > [ +0.02] DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) > 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15908kB > [ +0.05] DMA32: 4215*4kB (UMEH) 319*8kB (UMH) 5*16kB (H) 2*32kB (H) > 2*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 1*2048kB (H) 0*4096kB > = 23396kB > [ +0.06] Normal: 650*4kB (UMH) 4*8kB (UH) 27*16kB (H) 23*32kB (H) > 17*64kB (H) 11*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6296kB The memory is quite fragmented but there are order-2+ free blocks. They seem to be in the high atomic reserves but we should release them. Is this reproducible? If yes, could you try with the 4.7 kernel please? Keeping the rest of the emil for reference. 
> [ +0.05] 1749526 total pagecache pages > [ +0.01] 150 pages in swap cache > [ +0.01] Swap cache stats: add 1222, delete 1072, find 2366/2401 > [ +0.00] Free swap = 4091520kB > [ +0.01] Total swap = 4095996kB > [ +0.00] 2096686 pages RAM > [ +0.01] 0 pages HighMem/MovableOnly > [ +0.00] 55619 pages reserved > [ +0.01] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents > oom_score_adj name > [ +0.04] [ 153] 0 153 4087 406 9 3 104 >-1000 udevd > [ +0.01] [ 181] 0 181 5718 1169 15 3 143 >0 syslog-ng > [ +0.01] [ 187] 102 18788789 5137 53 3 663 >0 mpd > [ +0.02] [
[GIT PULL] Btrfs
Hi Linus,

This is part one of my btrfs pull, and you can find it in my
for-linus-4.8 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.8

This pull is dedicated to Josef's enospc rework, which we've been
testing for a few releases now. It fixes some early enospc problems and
is dramatically faster.

The pull also includes an updated fix for the delalloc accounting that
happens after a fault in copy_from_user. My patch in v4.7 was almost
but not quite enough.

Dave Sterba has a branch prepped with a cleanup series from Jeff Mahoney
as well as other fixes. My plan is to send that after wading through
vacation backlog on Monday.

Josef Bacik (19) commits (+679/-344):
    Btrfs: avoid deadlocks during reservations in btrfs_truncate_block (+5/-0)
    Btrfs: use FLUSH_LIMIT for relocation in reserve_metadata_bytes (+22/-17)
    Btrfs: don't bother kicking async if there's nothing to reclaim (+3/-0)
    Btrfs: change delayed reservation fallback behavior (+23/-41)
    Btrfs: always reserve metadata for delalloc extents (+13/-22)
    Btrfs: change how we calculate the global block rsv (+9/-36)
    Btrfs: add bytes_readonly to the spaceinfo at once (+11/-18)
    Btrfs: introduce ticketed enospc infrastructure (+380/-151)
    Btrfs: fill relocation block rsv after allocation (+6/-0)
    Btrfs: fix delalloc reservation amount tracepoint (+3/-1)
    Btrfs: fix release reserved extents trace points (+1/-5)
    Btrfs: fix callers of btrfs_block_rsv_migrate (+18/-25)
    Btrfs: always use trans->block_rsv for orphans (+7/-1)
    Btrfs: use root when checking need_async_flush (+6/-5)
    Btrfs: add tracepoint for adding block groups (+42/-0)
    Btrfs: add tracepoints for flush events (+103/-10)
    Btrfs: warn_on for unaccounted spaces (+8/-6)
    Btrfs: add fsid to some tracepoints (+11/-6)
    Btrfs: trace pinned extents (+8/-0)

Chris Mason (1) commits (+5/-7):
    Btrfs: fix delalloc accounting after copy_from_user faults

Total: (20) commits (+684/-351)

 fs/btrfs/ctree.h             |  15 +-
 fs/btrfs/delayed-inode.c     |  68 ++--
 fs/btrfs/extent-tree.c       | 731 +++
 fs/btrfs/file.c              |  16 +-
 fs/btrfs/inode.c             |   7 +-
 fs/btrfs/relocation.c        |  45 +--
 include/trace/events/btrfs.h | 139 +++-
 7 files changed, 677 insertions(+), 344 deletions(-)
Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems
On Fri, Jul 15, 2016 at 2:31 AM, Omar Sandoval wrote:
> From: Omar Sandoval
>
> So it turns out that the free space tree bitmap handling has always been
> broken on big-endian systems. Totally my bad.
>
> Patch 1 fixes this. Technically, it's a disk format change for
> big-endian systems, but it never could have worked before, so I won't go
> through the trouble of any incompat bits. If you've somehow been using
> space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> to want to mount with nospace_cache to clear it and wait for this to go
> in.
>
> Patch 2 fixes a similar error in the sanity tests (it's the same as the
> v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> oversight that patch 1 fixes.
>
> Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> pass on x86_64 and MIPS.

Omar, could you please get this series upstream, or update it for the
current git kernel?

Thanks.
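
The patch itself is not quoted in this thread, so the following is only a sketch of the general class of bug: a bitmap manipulated as native unsigned long words has a host-dependent byte layout, while a bitmap manipulated byte by byte does not. All function names here are illustrative and are not taken from the series.

```c
#include <stdint.h>

/*
 * Byte-wise bitmap: bit 0 is the least significant bit of byte 0.
 * The bytes written are identical on little- and big-endian hosts,
 * which is what an on-disk format needs.
 */
void bitmap_set_bit_bytes(uint8_t *map, unsigned int nr)
{
	map[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

int bitmap_test_bit_bytes(const uint8_t *map, unsigned int nr)
{
	return (map[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Word-wise bitmap on native unsigned long. Setting bit 0 touches the
 * first byte in memory on a little-endian host but the last byte of
 * the word on a big-endian host, so the raw bytes must not be copied
 * to disk as-is.
 */
void bitmap_set_bit_words(unsigned long *map, unsigned int nr)
{
	const unsigned int bits = 8 * sizeof(unsigned long);

	map[nr / bits] |= 1UL << (nr % bits);
}
```

Setting bit 9 with the byte-wise helper always produces 0x02 in byte 1; with the word-wise helper the byte that changes depends on the host.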
Re: systemd KillUserProcesses=yes and btrfs scrub
On 30.07.2016 22:02, Chris Murphy wrote:
> Short version: When systemd-logind login.conf KillUserProcesses=yes,
> and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and
> then logs out of the shell, the user space operation is killed, and
> btrfs scrub status reports that the scrub was aborted. [1]
>

How is this a bug?

It is exactly what 'KillUserProcesses=yes' is expected to do.