[PATCH v4] btrfs: fix fsfreeze hang caused by delayed iputs deal

2016-07-31 Thread Wang Xiaoguang
When running fstests generic/068, sometimes we got below deadlock:
  xfs_io  D 8800331dbb20 0  6697   6693 0x0080
  8800331dbb20 88007acfc140 880034d895c0 8800331dc000
  880032d243e8 fffe 880032d24400 0001
  8800331dbb38 816a9045 880034d895c0 8800331dbba8
  Call Trace:
  [] schedule+0x35/0x80
  [] rwsem_down_read_failed+0xf2/0x140
  [] ? __filemap_fdatawrite_range+0xd1/0x100
  [] call_rwsem_down_read_failed+0x18/0x30
  [] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
  [] percpu_down_read+0x35/0x50
  [] __sb_start_write+0x2c/0x40
  [] start_transaction+0x2a5/0x4d0 [btrfs]
  [] btrfs_join_transaction+0x17/0x20 [btrfs]
  [] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
  [] evict+0xba/0x1a0
  [] iput+0x196/0x200
  [] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
  [] btrfs_commit_transaction+0x928/0xa80 [btrfs]
  [] btrfs_freeze+0x30/0x40 [btrfs]
  [] freeze_super+0xf0/0x190
  [] do_vfs_ioctl+0x4a5/0x5c0
  [] ? do_audit_syscall_entry+0x66/0x70
  [] ? syscall_trace_enter_phase1+0x11f/0x140
  [] SyS_ioctl+0x79/0x90
  [] do_syscall_64+0x62/0x110
  [] entry_SYSCALL64_slow_path+0x25/0x25

From this trace, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() calls btrfs_commit_transaction() again. If
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it calls start_transaction(), which tries to acquire the SB_FREEZE_FS
lock again, and the deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not
guarantee that all metadata is updated. There may still be code paths
adding delayed iputs; see the sample race window below:

 CPU1                                      | CPU2
|-> freeze_super()                         |
    |-> sync_filesystem(sb);               |
                                           |-> cleaner_kthread()
                                           |    |-> btrfs_delete_unused_bgs()
                                           |         |-> btrfs_remove_chunk()
                                           |              |-> btrfs_remove_block_group()
                                           |                   |-> btrfs_add_delayed_iput()
                                           |
|-> sb->s_writers.frozen = SB_FREEZE_FS;   |
|-> sb_wait_write(sb, SB_FREEZE_FS);       |
|   acquire SB_FREEZE_FS lock.             |
                                           |
|-> btrfs_freeze()                         |
    |-> btrfs_commit_transaction()         |
        |-> btrfs_run_delayed_iputs()      |
        |   will handle delayed iputs,     |
        |   which means start_transaction()|
        |   will be called, which will try |
        |   to acquire the SB_FREEZE_FS    |
        |   lock.                          |

To fix this issue, introduce an int fs_frozen to record internally whether
the fs has been frozen. If the fs has been frozen, we do not handle delayed
iputs.

Signed-off-by: Wang Xiaoguang 
---
v3: introduce an atomic_t fs_frozen; if fs_frozen is 1, do not handle
delayed iputs.
v4: change the atomic_t fs_frozen to a plain int.
---
 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/super.c   | 10 ++
 fs/btrfs/transaction.c |  7 ++-
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 90041a2..3f241d5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1091,6 +1091,8 @@ struct btrfs_fs_info {
struct list_head pinned_chunks;
 
int creating_free_space_tree;
+   /* Used to record internally whether fs has been frozen */
+   int fs_frozen;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 86cad9a..1d26a51 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2621,6 +2621,7 @@ int open_ctree(struct super_block *sb,
atomic_set(&fs_info->qgroup_op_seq, 0);
atomic_set(&fs_info->reada_works_cnt, 0);
atomic64_set(&fs_info->tree_mod_seq, 0);
+   fs_info->fs_frozen = 0;
fs_info->sb = sb;
fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
fs_info->metadata_ratio = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60e7179..6b73cad 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2216,6 +2216,7 @@ static int btrfs_freeze(struct super_block *sb)
struct btrfs_trans_handle *trans;
struct btrfs_root *root = btrfs_sb(sb)->tree_root;
 
+   root->fs_info->fs_frozen = 1;
trans = btrfs_attach_transaction_barrier(root);
if (IS_ERR(trans)) {
/* no transaction, don't bother */
@@ -2226,6 +2227,14 @@ static int btrfs_freeze(struct super_block *sb)
return btrfs_commit_transaction(trans, root);
 }
 
+static int btrfs_unfreeze(struct super_block *sb)
+{
+   struct btrfs_root *root = btrfs_sb(sb)->tree_root;
+
+   root->fs_info->fs_frozen = 0;
+   return 0;
+}

btrfs fi usage bug during shrink

2016-07-31 Thread Sean Greenslade
Hi, all. I was resizing (shrinking) a btrfs partition, and figured I'd
check in on how it was going with "btrfs fi usage." It was quite
startling:

$ sudo btrfs fi usage /mnt/

Overall:
Device size: 370.00GiB
Device allocated:372.03GiB
Device unallocated:   16.00EiB
Device missing:  0.00B
Used:360.56GiB
Free (estimated):0.00B  (min: 8.00EiB)
Data ratio:   1.00
Metadata ratio:   2.00
Global reserve:  224.00MiB  (used: 0.00B)

Data,single: Size:370.02GiB, Used:359.31GiB
   /dev/mapper/c1  370.02GiB

Metadata,DUP: Size:1.00GiB, Used:639.22MiB
   /dev/mapper/c1    2.00GiB

System,DUP: Size:8.00MiB, Used:64.00KiB
   /dev/mapper/c1   16.00MiB

Unallocated:
   /dev/mapper/c1   16.00EiB


It's reasonably obvious what's going on here. The overall size has been
set to the final size, and now the worker is going through balancing all
the chunks that are now out of bounds.

I feel like "fi usage" should probably have some logic to detect this
situation and report something more sensible. Thankfully, it's only
transient, and returns to normal once the resize completes.

Thanks,

--Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: systemd KillUserProcesses=yes and btrfs scrub

2016-07-31 Thread Duncan
Chris Murphy posted on Sat, 30 Jul 2016 14:02:17 -0600 as excerpted:

> Short version: When systemd-logind login.conf KillUserProcesses=yes, and
> the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and then
> logs out of the shell, the user space operation is killed, and btrfs
> scrub status reports that the scrub was aborted. [1]

What does btrfs scrub resume do?  Resume, or error?

If it resumes, I'd say RESOLVED/NOTABUG as both that systemd option and 
btrfs scrub appear to be working as intended.  If it doesn't, then 
there's definitely a btrfs bug, even if you argue it's only in the 
documentation, because the manpage (tho still 4.6.1, here) says it 
resumes an interrupted scrub but won't start a new one if the scrub 
finished successfully, and an abort is definitely an interruption, not a 
successful finish.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Btrfs send to send out metadata and data separately

2016-07-31 Thread Qu Wenruo



At 07/31/2016 02:49 AM, g.bt...@cobb.uk.net wrote:

On 29/07/16 13:40, Qu Wenruo wrote:

Cons:
1) No full-fs clone detection
   Clone detection only works inside the snapshot being sent.

   If an extent is referred to only once in the send snapshot but is
   also referred to by the source subvolume, then in the received
   subvolume it will be a new extent, not a clone.

   Only an extent that is referred to twice by the send snapshot will
   be shared.

   (Although this is still much better than disabling clone detection
   entirely)


Qu,

Does that mean that the following, extremely common, use of send would
be impacted?

Create many snapshots of a large and fairly busy sub-volume (say,
hourly) with few changes between each one. Send all the snapshots as
incremental sends to a second (backup) disk either as soon as they are
created, or maybe in bunches later.

With this change, would each of the snapshots require separate space
usage on the backup disk, with duplicates of unchanged files?  If so,
that would completely destroy the concept of keeping frequent snapshots
on a backup disk (and force us to keep the snapshots on the original
disk, causing **many** more problems with backref walks on the data disk).


This new behavior won't impact this use case.

The kernel send code compares tree blocks and sends out only the
difference, so incremental sends are not impacted at all.



The impacted behavior is reflinking from an old snapshot.
One example is:

1) There is a read-only snapshot A
   Already sent and received

2) A new snapshot B is snapshotted from A
   With one new modification:
 Reflink one extent (X) which lies in A

In that case, if we send out snapshot B based on A, extent X will be
sent out as a new extent, not a reflink, since it's only used once
inside snapshot B.

The original send would detect such a reflink and would not send out
the whole extent.



But if snapshot B has the following modifications compared to A:

  1) Reflink one extent (X) which originally lies in A, to inode Z

  2) Reflink the same extent (X) to inode W

then although extent X will still be sent out as a new extent, Z and W
will share the extent, as it's referred to twice inside snapshot B.



I assume the most common impact will be reflinking a whole file from
the original subvolume.

In that case, the whole file will be sent out as new data.

For reflinks inside the subvolume, clone detection is faster, and I
consider that the more common case anyway.

It's a trade-off that favors heavily deduped files (both in-band and
out-of-band) and heavily snapshotted subvolume layouts, as it
completely avoids the time-consuming backref walk.

Personally I consider it worth it.

Thanks,
Qu


(Does the answer change if we do non-incremental sends?)

I moved to this approach after the problems I had running balance on my
(very busy, and also large) data disk because of the number of snapshots
I was keeping on it.  My data disk has about 4TB in use, and I have just
bought a 10TB backup disk but I would need about 50 more of them if the
hourly snapshots were no longer sharing space! If that is the case, the
cure seems much worse than the disease.

Apologies if I have misunderstood the proposal.








Re: Btrfs send to send out metadata and data separately

2016-07-31 Thread Qu Wenruo



At 07/29/2016 09:14 PM, Libor Klepáč wrote:

Hello,

just a little question on receiver point 0), see below

On Friday, 29 July 2016 20:40:38 CEST, Qu Wenruo wrote:

Hi Filipe, and maintainers,

Receive will do the following thing first before recovering the
subvolume/snapshot:

0) Create temporary dir for data extents
Create a new dir with a temporary name ($data_extent) to put data
extents into.


These are the directories in the format "o4095916-21925-0" on the receiver side?


That's the temporary dir/file receive creates.

If you use send-test (not compiled anymore), you will see that this is
how receive recovers files/dirs:

1) mkfile/mkdir with a temporary name
2) rename the temporary file/dir to its final name



I'm in the middle of a send/receive and I have over 300 thousand of them
already. I was always told that a directory with lots of items suffers in
performance (but I never used btrfs before :). Is that true?


Not completely true.

The fact is even worse: no matter whether it's a dir or a file, if there
are too many inodes/extents in a *subvolume/snapshot*, it will be slowed
down.

Unlike a normal fs (ext*/xfs), btrfs puts all dir/file info into one
subvolume tree.

That said, if you design the subvolume layout carefully, which means
never putting too many things into one subvolume, it should not be a
big problem.




Should it be a little more structured (subdirs based on extent number,
for example)?

Dirs and files share the same subvolume tree, so subdirs won't help.

But I can create a new subvolume for the data extents, and reflink
works across subvolumes.

So that won't cause a big problem.

Thanks,
Qu



Libor






Re: systemd KillUserProcesses=yes and btrfs scrub

2016-07-31 Thread Chris Murphy
On Sun, Jul 31, 2016 at 4:56 AM, Gabriel C  wrote:
>
>
> On 30.07.2016 22:02, Chris Murphy wrote:
>> Short version: When systemd-logind login.conf KillUserProcesses=yes,
>> and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and
>> then logs out of the shell, the user space operation is killed, and
>> btrfs scrub status reports that the scrub was aborted. [1]
>>
>
> How is this a bug?

If the privilege-escalated operation (kernel threads included) is
clobbered, then it's a bug, because there's every reason for a user to
issue this command, which could take hours or days, and not have to stay
logged in to their GUI shell session while it runs, for example over
the weekend. Yes, of course they could schedule it, but saying they
could do it another way doesn't fix the use case of doing it manually.

If the operation continues and just the user-space command is killed
off, it's a bug because the statistics and status of the scrub are
lost to future status checks; that is, "interrupted" is sufficiently
misleading that it's false. The operation did continue; we've just
lost the conclusion.

For balance and replace, while the user process is killed, the kernel
process continues, and it's still possible for a user to get current
(and correct) status information for both.

Further, it's arguably a regression compared to equivalent mdadm and
LVM RAID behaviors.


>
> It's exactly what 'KillUserProcesses=yes' is expected to do..
>

No, it basically breaks scrub initiated within a user's GUI session,
and that is in no way intended by anyone; it's a side effect. The
question is how to fix it, not to debate whether it's a bug; that's
ridiculous.

-- 
Chris Murphy


Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine

2016-07-31 Thread Markus Trippelsdorf
On 2016.07.31 at 17:10 +0200, Michal Hocko wrote:
> [CC Mel and linux-mm]
> 
> On Sun 31-07-16 07:11:21, Markus Trippelsdorf wrote:
> > Tonight the OOM killer got invoked during backup of /:
> > 
> > [Jul31 01:56] kthreadd invoked oom-killer: 
> > gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, 
> > oom_score_adj=0
> 
> This is a kernel stack allocation.
> 
> > [  +0.04] CPU: 3 PID: 2 Comm: kthreadd Not tainted 
> > 4.7.0-06816-g797cee982eef-dirty #37
> > [  +0.00] Hardware name: System manufacturer System Product 
> > Name/M4A78T-E, BIOS 350304/13/2011
> > [  +0.02]   813c2d58 8802168e7d48 
> > 002ec4ea
> > [  +0.02]  8118eb9d 01b8 0440 
> > 03b0
> > [  +0.02]  8802133fe400 002ec4ea 81b8ac9c 
> > 0006
> > [  +0.01] Call Trace:
> > [  +0.04]  [] ? dump_stack+0x46/0x6e
> > [  +0.03]  [] ? dump_header.isra.11+0x4c/0x1a7
> > [  +0.02]  [] ? oom_kill_process+0x2ab/0x460
> > [  +0.01]  [] ? out_of_memory+0x2e3/0x380
> > [  +0.02]  [] ? 
> > __alloc_pages_slowpath.constprop.124+0x1d32/0x1e40
> > [  +0.01]  [] ? __alloc_pages_nodemask+0x10c/0x120
> > [  +0.02]  [] ? copy_process.part.72+0xea/0x17a0
> > [  +0.02]  [] ? pick_next_task_fair+0x915/0x1520
> > [  +0.01]  [] ? kthread_flush_work_fn+0x20/0x20
> > [  +0.01]  [] ? kernel_thread+0x7a/0x1c0
> > [  +0.01]  [] ? kthreadd+0xd2/0x120
> > [  +0.02]  [] ? ret_from_fork+0x1f/0x40
> > [  +0.01]  [] ? kthread_stop+0x100/0x100
> > [  +0.01] Mem-Info:
> > [  +0.03] active_anon:5882 inactive_anon:60307 isolated_anon:0
> >active_file:1523729 inactive_file:223965 isolated_file:0
> >unevictable:1970 dirty:130014 writeback:40735 unstable:0
> >slab_reclaimable:179690 slab_unreclaimable:8041
> >mapped:6771 shmem:3 pagetables:592 bounce:0
> >free:11374 free_pcp:54 free_cma:0
> > [  +0.04] Node 0 active_anon:23528kB inactive_anon:241228kB 
> > active_file:6094916kB inactive_file:895860kB unevictable:7880kB 
> > isolated(anon):0kB isolated(file):0kB mapped:27084kB dirty:520056kB 
> > writeback:162940kB shmem:12kB writeback_tmp:0kB unstable:0kB 
> > pages_scanned:32 all_unreclaimable? no
> > [  +0.02] DMA free:15908kB min:20kB low:32kB high:44kB active_anon:0kB 
> > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
> > writepending:0kB present:15992kB managed:15908kB mlocked:0kB 
> > slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB 
> > bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [  +0.01] lowmem_reserve[]: 0 3486 7953 7953
> > [  +0.04] DMA32 free:23456kB min:4996kB low:8564kB high:12132kB 
> > active_anon:2480kB inactive_anon:10564kB active_file:2559792kB 
> > inactive_file:478680kB unevictable:0kB writepending:365292kB 
> > present:3652160kB managed:3574264kB mlocked:0kB slab_reclaimable:437456kB 
> > slab_unreclaimable:12304kB kernel_stack:144kB pagetables:28kB bounce:0kB 
> > free_pcp:212kB local_pcp:0kB free_cma:0kB
> > [  +0.01] lowmem_reserve[]: 0 0 4466 4466
> > [  +0.03] Normal free:6132kB min:6400kB low:10972kB high:15544kB 
> > active_anon:21048kB inactive_anon:230664kB active_file:3535124kB 
> > inactive_file:417312kB unevictable:7880kB writepending:318020kB 
> > present:4718592kB managed:4574096kB mlocked:7880kB 
> > slab_reclaimable:281304kB slab_unreclaimable:19860kB kernel_stack:2944kB 
> > pagetables:2340kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [  +0.00] lowmem_reserve[]: 0 0 0 0
> > [  +0.02] DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 
> > 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15908kB
> > [  +0.05] DMA32: 4215*4kB (UMEH) 319*8kB (UMH) 5*16kB (H) 2*32kB (H) 
> > 2*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 1*2048kB (H) 
> > 0*4096kB = 23396kB
> > [  +0.06] Normal: 650*4kB (UMH) 4*8kB (UH) 27*16kB (H) 23*32kB (H) 
> > 17*64kB (H) 11*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6296kB
> 
> The memory is quite fragmented but there are order-2+ free blocks. They
> seem to be in the high atomic reserves but we should release them.
> Is this reproducible? If yes, could you try with the 4.7 kernel please?

It never happened before, and it has only happened once so far. I will
continue to run the latest git kernel and let you know if it happens
again.

(I did copy several git trees to my root partition yesterday, so the
incremental btrfs stream was larger than usual.)

-- 
Markus


Re: OOM killer invoked during btrfs send/recieve on otherwise idle machine

2016-07-31 Thread Michal Hocko
[CC Mel and linux-mm]

On Sun 31-07-16 07:11:21, Markus Trippelsdorf wrote:
> Tonight the OOM killer got invoked during backup of /:
> 
> [Jul31 01:56] kthreadd invoked oom-killer: 
> gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0

This is a kernel stack allocation.

> [  +0.04] CPU: 3 PID: 2 Comm: kthreadd Not tainted 
> 4.7.0-06816-g797cee982eef-dirty #37
> [  +0.00] Hardware name: System manufacturer System Product 
> Name/M4A78T-E, BIOS 350304/13/2011
> [  +0.02]   813c2d58 8802168e7d48 
> 002ec4ea
> [  +0.02]  8118eb9d 01b8 0440 
> 03b0
> [  +0.02]  8802133fe400 002ec4ea 81b8ac9c 
> 0006
> [  +0.01] Call Trace:
> [  +0.04]  [] ? dump_stack+0x46/0x6e
> [  +0.03]  [] ? dump_header.isra.11+0x4c/0x1a7
> [  +0.02]  [] ? oom_kill_process+0x2ab/0x460
> [  +0.01]  [] ? out_of_memory+0x2e3/0x380
> [  +0.02]  [] ? 
> __alloc_pages_slowpath.constprop.124+0x1d32/0x1e40
> [  +0.01]  [] ? __alloc_pages_nodemask+0x10c/0x120
> [  +0.02]  [] ? copy_process.part.72+0xea/0x17a0
> [  +0.02]  [] ? pick_next_task_fair+0x915/0x1520
> [  +0.01]  [] ? kthread_flush_work_fn+0x20/0x20
> [  +0.01]  [] ? kernel_thread+0x7a/0x1c0
> [  +0.01]  [] ? kthreadd+0xd2/0x120
> [  +0.02]  [] ? ret_from_fork+0x1f/0x40
> [  +0.01]  [] ? kthread_stop+0x100/0x100
> [  +0.01] Mem-Info:
> [  +0.03] active_anon:5882 inactive_anon:60307 isolated_anon:0
>active_file:1523729 inactive_file:223965 isolated_file:0
>unevictable:1970 dirty:130014 writeback:40735 unstable:0
>slab_reclaimable:179690 slab_unreclaimable:8041
>mapped:6771 shmem:3 pagetables:592 bounce:0
>free:11374 free_pcp:54 free_cma:0
> [  +0.04] Node 0 active_anon:23528kB inactive_anon:241228kB 
> active_file:6094916kB inactive_file:895860kB unevictable:7880kB 
> isolated(anon):0kB isolated(file):0kB mapped:27084kB dirty:520056kB 
> writeback:162940kB shmem:12kB writeback_tmp:0kB unstable:0kB pages_scanned:32 
> all_unreclaimable? no
> [  +0.02] DMA free:15908kB min:20kB low:32kB high:44kB active_anon:0kB 
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
> writepending:0kB present:15992kB managed:15908kB mlocked:0kB 
> slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB 
> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  +0.01] lowmem_reserve[]: 0 3486 7953 7953
> [  +0.04] DMA32 free:23456kB min:4996kB low:8564kB high:12132kB 
> active_anon:2480kB inactive_anon:10564kB active_file:2559792kB 
> inactive_file:478680kB unevictable:0kB writepending:365292kB 
> present:3652160kB managed:3574264kB mlocked:0kB slab_reclaimable:437456kB 
> slab_unreclaimable:12304kB kernel_stack:144kB pagetables:28kB bounce:0kB 
> free_pcp:212kB local_pcp:0kB free_cma:0kB
> [  +0.01] lowmem_reserve[]: 0 0 4466 4466
> [  +0.03] Normal free:6132kB min:6400kB low:10972kB high:15544kB 
> active_anon:21048kB inactive_anon:230664kB active_file:3535124kB 
> inactive_file:417312kB unevictable:7880kB writepending:318020kB 
> present:4718592kB managed:4574096kB mlocked:7880kB slab_reclaimable:281304kB 
> slab_unreclaimable:19860kB kernel_stack:2944kB pagetables:2340kB bounce:0kB 
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [  +0.00] lowmem_reserve[]: 0 0 0 0
> [  +0.02] DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 
> 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15908kB
> [  +0.05] DMA32: 4215*4kB (UMEH) 319*8kB (UMH) 5*16kB (H) 2*32kB (H) 
> 2*64kB (H) 1*128kB (H) 0*256kB 1*512kB (H) 1*1024kB (H) 1*2048kB (H) 0*4096kB 
> = 23396kB
> [  +0.06] Normal: 650*4kB (UMH) 4*8kB (UH) 27*16kB (H) 23*32kB (H) 
> 17*64kB (H) 11*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6296kB

The memory is quite fragmented but there are order-2+ free blocks. They
seem to be in the high atomic reserves but we should release them.
Is this reproducible? If yes, could you try with the 4.7 kernel please?

Keeping the rest of the email for reference.

> [  +0.05] 1749526 total pagecache pages
> [  +0.01] 150 pages in swap cache
> [  +0.01] Swap cache stats: add 1222, delete 1072, find 2366/2401
> [  +0.00] Free swap  = 4091520kB
> [  +0.01] Total swap = 4095996kB
> [  +0.00] 2096686 pages RAM
> [  +0.01] 0 pages HighMem/MovableOnly
> [  +0.00] 55619 pages reserved
> [  +0.01] [ pid ]   uid  tgid total_vm  rss nr_ptes nr_pmds swapents 
> oom_score_adj name
> [  +0.04] [  153] 0   153 4087  406   9   3  104  
>-1000 udevd
> [  +0.01] [  181] 0   181 5718 1169  15   3  143  
>0 syslog-ng
> [  +0.01] [  187]   102   18788789 5137  53   3  663  
>0 mpd
> [  +0.02] [  

[GIT PULL] Btrfs

2016-07-31 Thread Chris Mason
Hi Linus,

This is part one of my btrfs pull, and you can find it in my
for-linus-4.8 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
for-linus-4.8

This pull is dedicated to Josef's enospc rework, which we've been
testing for a few releases now.  It fixes some early enospc problems and
is dramatically faster.

The pull also includes an updated fix for the delalloc accounting that
happens after a fault in copy_from_user.   My patch in v4.7 was almost but
not quite enough.

Dave Sterba has a branch prepped with a cleanup series from Jeff Mahoney
as well as other fixes.  My plan is to send that after wading
through vacation backlog on Monday.

Josef Bacik (19) commits (+679/-344):
Btrfs: avoid deadlocks during reservations in btrfs_truncate_block (+5/-0)
Btrfs: use FLUSH_LIMIT for relocation in reserve_metadata_bytes (+22/-17)
Btrfs: don't bother kicking async if there's nothing to reclaim (+3/-0)
Btrfs: change delayed reservation fallback behavior (+23/-41)
Btrfs: always reserve metadata for delalloc extents (+13/-22)
Btrfs: change how we calculate the global block rsv (+9/-36)
Btrfs: add bytes_readonly to the spaceinfo at once (+11/-18)
Btrfs: introduce ticketed enospc infrastructure (+380/-151)
Btrfs: fill relocation block rsv after allocation (+6/-0)
Btrfs: fix delalloc reservation amount tracepoint (+3/-1)
Btrfs: fix release reserved extents trace points (+1/-5)
Btrfs: fix callers of btrfs_block_rsv_migrate (+18/-25)
Btrfs: always use trans->block_rsv for orphans (+7/-1)
Btrfs: use root when checking need_async_flush (+6/-5)
Btrfs: add tracepoint for adding block groups (+42/-0)
Btrfs: add tracepoints for flush events (+103/-10)
Btrfs: warn_on for unaccounted spaces (+8/-6)
Btrfs: add fsid to some tracepoints (+11/-6)
Btrfs: trace pinned extents (+8/-0)

Chris Mason (1) commits (+5/-7):
Btrfs: fix delalloc accounting after copy_from_user faults

Total: (20) commits (+684/-351)

 fs/btrfs/ctree.h |  15 +-
 fs/btrfs/delayed-inode.c |  68 ++--
 fs/btrfs/extent-tree.c   | 731 +++
 fs/btrfs/file.c  |  16 +-
 fs/btrfs/inode.c |   7 +-
 fs/btrfs/relocation.c|  45 +--
 include/trace/events/btrfs.h | 139 +++-
 7 files changed, 677 insertions(+), 344 deletions(-)


Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-31 Thread Anatoly Pugachev
On Fri, Jul 15, 2016 at 2:31 AM, Omar Sandoval  wrote:
> From: Omar Sandoval 
>
> So it turns out that the free space tree bitmap handling has always been
> broken on big-endian systems. Totally my bad.
>
> Patch 1 fixes this. Technically, it's a disk format change for
> big-endian systems, but it never could have worked before, so I won't go
> through the trouble of any incompat bits. If you've somehow been using
> space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> to want to mount with nospace_cache to clear it and wait for this to go
> in.
>
> Patch 2 fixes a similar error in the sanity tests (it's the same as the
> v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> oversight that patch 1 fixes.
>
> Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> pass on x86_64 and MIPS.

Omar,

can you please upstream this patch, or update it for the current git
kernel? Thanks.


Re: systemd KillUserProcesses=yes and btrfs scrub

2016-07-31 Thread Gabriel C


On 30.07.2016 22:02, Chris Murphy wrote:
> Short version: When systemd-logind login.conf KillUserProcesses=yes,
> and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and
> then logs out of the shell, the user space operation is killed, and
> btrfs scrub status reports that the scrub was aborted. [1]
> 

How is this a bug?

It's exactly what 'KillUserProcesses=yes' is expected to do..
