Re: Btrfs send to send out metadata and data separately
At 08/02/2016 02:00 AM, Filipe Manana wrote: On Fri, Jul 29, 2016 at 1:40 PM, Qu Wenruowrote: Hi Filipe, and maintainers, I'm recently working on the root fix to free send from calling backref walk. My current idea is to send data and metadata separately, and only do clone detection inside the send subvolume. This method needs two new send commands: (And new send attribute, A_DATA_BYTENR) 1) SEND_C_DATA much like SEND_C_WRITE, with a little change in the 1st TLV. TLVs: A_DATA_BYTENR:bytenr of the data extent A_FILE_OFFSET:offset inside the data extent A_DATA: real data 2) SEND_C_CLONE_DATA A little like SEND_C_CLONE, with unneeded parameters striped TLVs: A_PATH: filename A_DATA_BYTENR:disk_bytenr of the EXTENT_DATA A_FILE_OFFSET:file offset A_FILE_OFFSET:offset inside the EXTENT_DATA A_CLONE_LEN: num_bytes of the EXTENT_DATA The send part is different in how to sending out a EXTENT_DATA. The send work follow is: 1) Found a EXTENT_DATA to send. Check rb_tree of "disk_bytenr". if "disk_bytenr" in rb_tree goto 2) Reflink data /* Initiate a SEND_C_DATA */ Send out the *whole* *uncompressed* extent of "disk_bytenr". Adds "disk_bytenr" into rb_tree 2) Reflink data /* Initiate a SEND_C_CLONE_DATA */ Filling disk_bytenr, offset and num_bytes, and send out the command. That's to say, send will send out extent data and referencer separately. So for kernel part, it's quite easy and *NO* time consuming backref walk ever. And no other part is modified. The main trick happens in the receive part. Receive will do the following thing first before recovering the subvolume/snapshot: 0) Create temporary dir for data extents Create a new dir with temporary name($data_extent), to put data extents into it. Then for SEND_C_DATA command: 1) Create file with file name $filename under $data_extent dir filename = $(printf "0x%x" $disk_bytenr) $disk_bytenr is the first u64 TLV of SEND_A_DATA command. 2) Write data into $data_extent/$filename Then handle the SEND_C_CLONE_DATA command It would be like xfs_io -f -c "reflink $data_extent/$disk_bytenr $extent_offset $file_offset $num_bytes" $filename disk_bytenr=2nd TLV (string converted to u64, with "0x%x") extent_offset=3rd TLV, u64 file_offset=4th TLV, u64 num_bytes=5th TLV, u64 filename=1th TLV, string Finally, after the snapshot/subvolume is recovered, remove the $data_extent directory. The whole idea is to completely remove the time consuming backref walk in send. So pros: 1) No backref walk, no soft lockup, no super long execution time Under worst case O(N^2), best case O(N) Memory usage worst case O(N), best case O(1) Where N is the number of reference to extents. 2) Almost the same metadata layout Including the overlap extents Cons: 1) Not full fs clone detection Such clone detection is only inside the send snapshot. For case that one extent is referred only once in the send snapshot, but also referred by source subvolume, then in the received subvolume, it will be a new extent, but not a clone. Only extent that is referred twice by send snapshot, that extent will be shared. (Although much better than disabling the whole clone detection) 2) Extra space usage Since it completely recovers the overlap extents 3) As many fragments as source subvolume 4) Possible slow recovery due to reflink speed. I am still concerned about the following problems: 1) Is it OK to add not only 1, but 2 new send commands? 2) Is such clone detection range change OK? Any ideas and suggestion is welcomed. Hi Filipe, Thanks for your comment, it helps to refine the idea to fix the problem. New idea is stated at the ending of the mail, hopes it will address all your conern. Qu, I don't like the idea at all, for several reasons: 1) Too complex to implement. We should really avoid making things more complex than they are already. Yes, new command new TLV new receiver behavior, the whole idea itself is complex. But the core function for clone detection is simple. Rb_tree for sent extent bytenr, and avoid sending sent extents. At least it avoids doing expensive backref walk at all. My new idea will keep the core function, while stripe all the new command/TLV/receiver behavior. Your earlier suggestion to cache backref lookups is much simpler and solves the problem for the vast majority of cases (assuming a bounded cache of course). In fact, my earlier suggestion is not to cache backref walk result, but just like this one, implement a internal, simpler backref mapper. The biggest problem of backref cache is, it conflicts with snapshot. Any snapshot will easily trash backrefs of a tree. It means either we do a full tree walk to trash all backref cache, making snapshot much slower, or a broken cache. (And it adds more
Re: Btrfs send to send out metadata and data separately
On Fri, Jul 29, 2016 at 1:40 PM, Qu Wenruowrote: > Hi Filipe, and maintainers, > > I'm recently working on the root fix to free send from calling backref walk. > > My current idea is to send data and metadata separately, and only do clone > detection inside the send subvolume. > > This method needs two new send commands: > (And new send attribute, A_DATA_BYTENR) > 1) SEND_C_DATA >much like SEND_C_WRITE, with a little change in the 1st TLV. > >TLVs: >A_DATA_BYTENR:bytenr of the data extent >A_FILE_OFFSET:offset inside the data extent >A_DATA: real data > > 2) SEND_C_CLONE_DATA >A little like SEND_C_CLONE, with unneeded parameters striped > >TLVs: >A_PATH: filename >A_DATA_BYTENR:disk_bytenr of the EXTENT_DATA >A_FILE_OFFSET:file offset >A_FILE_OFFSET:offset inside the EXTENT_DATA >A_CLONE_LEN: num_bytes of the EXTENT_DATA > > > The send part is different in how to sending out a EXTENT_DATA. > The send work follow is: > > 1) Found a EXTENT_DATA to send. >Check rb_tree of "disk_bytenr". >if "disk_bytenr" in rb_tree > goto 2) Reflink data >/* Initiate a SEND_C_DATA */ >Send out the *whole* *uncompressed* extent of "disk_bytenr". >Adds "disk_bytenr" into rb_tree > > > 2) Reflink data >/* Initiate a SEND_C_CLONE_DATA */ >Filling disk_bytenr, offset and num_bytes, and send out the command. > > That's to say, send will send out extent data and referencer separately. > > So for kernel part, it's quite easy and *NO* time consuming backref walk > ever. > And no other part is modified. > > > The main trick happens in the receive part. > > Receive will do the following thing first before recovering the > subvolume/snapshot: > > 0) Create temporary dir for data extents >Create a new dir with temporary name($data_extent), to put data extents > into it. > > Then for SEND_C_DATA command: > 1) Create file with file name $filename under $data_extent dir >filename = $(printf "0x%x" $disk_bytenr) >$disk_bytenr is the first u64 TLV of SEND_A_DATA command. > 2) Write data into $data_extent/$filename > > Then handle the SEND_C_CLONE_DATA command > It would be like > xfs_io -f -c "reflink $data_extent/$disk_bytenr $extent_offset > $file_offset $num_bytes" $filename > disk_bytenr=2nd TLV (string converted to u64, with "0x%x") > extent_offset=3rd TLV, u64 > file_offset=4th TLV, u64 > num_bytes=5th TLV, u64 > filename=1th TLV, string > > Finally, after the snapshot/subvolume is recovered, remove the $data_extent > directory. > > > The whole idea is to completely remove the time consuming backref walk in > send. > > So pros: > 1) No backref walk, no soft lockup, no super long execution time >Under worst case O(N^2), best case O(N) >Memory usage worst case O(N), best case O(1) >Where N is the number of reference to extents. > > 2) Almost the same metadata layout >Including the overlap extents > > Cons: > 1) Not full fs clone detection >Such clone detection is only inside the send snapshot. > >For case that one extent is referred only once in the send snapshot, >but also referred by source subvolume, then in the received >subvolume, it will be a new extent, but not a clone. > >Only extent that is referred twice by send snapshot, that extent >will be shared. > >(Although much better than disabling the whole clone detection) > 2) Extra space usage >Since it completely recovers the overlap extents > 3) As many fragments as source subvolume > 4) Possible slow recovery due to reflink speed. > > > I am still concerned about the following problems: > > 1) Is it OK to add not only 1, but 2 new send commands? > 2) Is such clone detection range change OK? > > Any ideas and suggestion is welcomed. Qu, I don't like the idea at all, for several reasons: 1) Too complex to implement. We should really avoid making things more complex than they are already. Your earlier suggestion to cache backref lookups is much simpler and solves the problem for the vast majority of cases (assuming a bounded cache of course). There's really no need for such high complexity. 2) By adding new commands to the stream, you break backwards compatibility. Think about all the tools out there that interpret send streams and not just the receive command (for example snapper). 3) By requiring a new different behaviour for the receiver, suddenly older versions of it will no longer be able to receive from new kernels. 4) By keeping temporary files on the receiver end that contains whole extents, you're creating periods of time where stale data is exposed. Thanks. > > Thanks, > Qu > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David
Re: Fixup direct bi_rw modifiers
On 08/01/2016 09:17 AM, Jens Axboe wrote: On 08/01/2016 05:47 AM, Christoph Hellwig wrote: On Sat, Jul 30, 2016 at 04:45:48PM -0500, Shaun Tancheff wrote: bi_rw should be using bio_set_op_attrs to set bi_rw. Looks fine, Reviewed-by: Christoph HellwigAdded, thanks Shaun. Jens, what do you think about renaming bi_rw? There aren't too many users left, and any old code that would keep using it is alsmost guranteed to be broken, so sending a post-rc1 patch to rename it might make everyone else life easier. Especially as it's also grossly misnamed now. I was planning on doing that, after -rc1. Much better to get build breakage, than potentially much worse breakage. Set of three patches, where the target one is an actual bug fix... Temporary branch, I'll rebase it once -rc1 is out, if more changes/fixups need to be made in the next week until that happens. http://git.kernel.dk/cgit/linux-block/log/?h=for-4.8/bi_rwf -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FIDEDUPERANGE with src_length == 0
On Thu, Jul 14, 2016 at 11:16:47AM -0700, Omar Sandoval wrote: > On Thu, Jul 14, 2016 at 02:12:58PM -0400, Chris Mason wrote: > > > > > > On 07/14/2016 02:06 PM, Darrick J. Wong wrote: > > > On Wed, Jul 13, 2016 at 03:19:38PM +0200, David Sterba wrote: > > > > On Tue, Jul 12, 2016 at 10:26:43PM -0700, Darrick J. Wong wrote: > > > > > On Mon, Jul 11, 2016 at 05:35:37PM -0700, Omar Sandoval wrote: > > > > > > Hey, Darrick, > > > > > > > > > > > > generic/182 is failing on Btrfs for me with the following output: > > > > > > > > > > > > --- tests/generic/182.out 2016-07-07 19:51:54.0 -0700 > > > > > > +++ /tmp/fixxfstests/xfstests/results//generic/182.out.bad > > > > > > 2016-07-11 17:28:28.230039216 -0700 > > > > > > @@ -1,12 +1,10 @@ > > > > > > QA output created by 182 > > > > > > Create the original files > > > > > > -dedupe: Extents did not match. > > > > > > f4820540fc0ac02750739896fe028d56 TEST_DIR/test-182/file1 > > > > > > 69ad53078a16243d98e21d9f8704a071 TEST_DIR/test-182/file2 > > > > > > 69ad53078a16243d98e21d9f8704a071 TEST_DIR/test-182/file2.chk > > > > > > Compare against check files > > > > > > Make the original file almost dedup-able > > > > > > -dedupe: Extents did not match. > > > > > > f4820540fc0ac02750739896fe028d56 TEST_DIR/test-182/file1 > > > > > > 158d4e3578b94b89cbb44493a2110fb9 TEST_DIR/test-182/file2 > > > > > > 158d4e3578b94b89cbb44493a2110fb9 TEST_DIR/test-182/file2.chk > > > > > > > > > > > > It looks like that test is checking that a dedupe with length == 0 > > > > > > is > > > > > > treated as a dedupe to EOF, but Btrfs doesn't do that [1]. As far > > > > > > as I > > > > > > can tell, it never did, but maybe I'm just confused. What was the > > > > > > behavior when you introduced that test? That seems like a reasonable > > > > > > thing to do, but I wanted to clear this up before changing/fixing > > > > > > Btrfs. > > > > > > > > > > It's a shortcut that we're introducing in the upcoming XFS > > > > > implementation, > > > > > since it shares the same back end as clone/clonerange, which both have > > > > > this behavior. > > > > > > > > The support for zero length does not seem to be mentioned anywhere with > > > > the dedupe range ioctl [1], so the current implemetnation is "up to > > > > spec". That it should be valid is hidden in clone_verify_area where a > > > > zero length is substituted with OFFSET_MAX > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__lxr.free-2Delectrons.com_source_fs_read-5Fwrite.c-23L1607=CwIBAg=5VD0RTtNlTh3ycd41b3MUw=9QPtTAxcitoznaWRKKHoEQ=CKo3CgE8Up_NBDdC9t7fCuwHwsdf6nZG2nKcl5-NqnI=ZymMvbZ2mZOYBKya3guibggSaaqOHZUqedhz0pT5PPc= > > > > > > > > So it looks like it's up to the implementation in the filesystem to > > > > handle that. As the btrfs ioctl was extent-based, a zero length extent > > > > does not make sense, so this case was not handled. But in your patch > > > > > > > > 2b3909f8a7fe94e0234850aa9d120cca15b6e1f7 > > > > btrfs: use new dedupe data function pointer > > > > > > > > it was suddenly expected to work. So the missing bits are either 'not > > > > supported' for zero length or actually implement iteration over the > > > > whole file. > > > > > > > > [1] > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mankier.com_2_ioctl-5Ffideduperange=CwIBAg=5VD0RTtNlTh3ycd41b3MUw=9QPtTAxcitoznaWRKKHoEQ=CKo3CgE8Up_NBDdC9t7fCuwHwsdf6nZG2nKcl5-NqnI=NYdHr9JyZZNKPLsOf_VmtZ-3X2B1azTYfyE4Lf1Fa5w= > > > > > > Well, we can't change the semantics now because there could be programs > > > that > > > aren't expecting a nonzero return from a length == 0 dedupe, so like > > > Christoph > > > said, I'll just change generic/182 and make the VFS wrapper emulate the > > > btrfs > > > behavior so that any subsequent implementation won't hit this. > > > > > > I'll update the clone/clonerange manpages to mention the 0 -> EOF > > > behavior. > > > > Its fine with me if we change btrfs to do the 0->EOF. It's a corner case > > I'm happy to include. > > > > -chris > > Yeah, I think it's a nice shortcut. Are there any programs which > wouldn't want this, though? It's a milder sort of correctness problem > since dedupe is "safe", but maybe there's some tool which is being dumb > and trying to dedupe nothing. The only users of extent-same that I know of are duperemove and xfs_io. duperemove doesn't seem to send zero-length dedupe requests. xfs_io will if you tell it to, but the xfs_io feature is there primarily to run xfstests. Christoph has a valid point that we don't know the full set of users, so we could be breaking them by changing this aspect. OTOH this was an undocumented ioctl for quite a while... --D > > -- > Omar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
All of these have status R and D for their duration, and while all get a SIGKILL from systemd on logout, none of the processes change status or die until their kernel task is done. And each of these operations complete successfully with no worse for the wear. btrfs balance & btrfs dev rem & btrfs replace start Only 'btrfs scrub' has status S, and once it gets SIGKILL, it goes Z and all of its accounting is wrong. But the kernel tasks continue and appear to complete. I did all of this with a btrfs raid5, 3 and 4 disks, in a libvirt VM. --- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
Aug 01 09:29:59 localhost.localdomain sudo[1875]:chris : TTY=pts/1 ; PWD=/home/chris ; USER=root ; COMMAND=/sbin/btrfs scrub start /mnt/x Aug 01 09:30:16 localhost.localdomain systemd[1]: user@1000.service: Killing process 1883 (btrfs) with signal SIGKILL. Aug 01 09:43:34 localhost.localdomain sudo[2574]:chris : TTY=pts/1 ; PWD=/home/chris ; USER=root ; COMMAND=/sbin/btrfs scrub start /mnt/x Aug 01 09:43:53 localhost.localdomain systemd[1]: user@1000.service: Killing process 2579 (btrfs) with signal SIGKILL. Aug 01 11:41:00 localhost.localdomain sudo[3479]:chris : TTY=pts/1 ; PWD=/home/chris ; USER=root ; COMMAND=/sbin/btrfs fi show Aug 01 11:41:04 localhost.localdomain sudo[3492]:chris : TTY=pts/1 ; PWD=/home/chris ; USER=root ; COMMAND=/sbin/btrfs balance start /mnt/x Aug 01 11:41:14 localhost.localdomain kernel: BTRFS info (device vdb): relocating block group 24800919552 flags 130 Aug 01 11:41:14 localhost.localdomain kernel: BTRFS info (device vdb): relocating block group 23727177728 flags 132 Aug 01 11:41:14 localhost.localdomain kernel: BTRFS info (device vdb): found 6 extents Aug 01 11:41:14 localhost.localdomain kernel: BTRFS info (device vdb): relocating block group 21512585216 flags 129 Aug 01 11:41:24 localhost.localdomain kernel: BTRFS info (device vdb): found 27447 extents Aug 01 11:41:26 localhost.localdomain kernel: BTRFS info (device vdb): found 27446 extents Aug 01 11:41:26 localhost.localdomain kernel: BTRFS info (device vdb): relocating block group 19365101568 flags 129 Aug 01 11:41:36 localhost.localdomain systemd[1]: user@1000.service: Killing process 3499 (btrfs) with signal SIGKILL. It's using SIGKILL. The process goes Z for scrub but nothing happens for balance. Weird. It's definitely not exempt though. Hmm, when I don't filter the journal for btrfs... Aug 01 11:54:38 localhost.localdomain systemd[3623]: Starting Exit the Session... Aug 01 11:54:38 localhost.localdomain systemd[3623]: Received SIGRTMIN+24 from PID 4269 (kill). Aug 01 11:54:38 localhost.localdomain systemd[1]: user@1000.service: Killing process 4206 (sudo) with signal SIGKILL. Aug 01 11:54:38 localhost.localdomain systemd[1]: user@1000.service: Killing process 4213 (btrfs) with signal SIGKILL. Aug 01 11:54:38 localhost.localdomain systemd[1]: Stopped User Manager for UID 1000. Aug 01 11:54:38 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1000 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Aug 01 11:54:38 localhost.localdomain systemd[1]: Removed slice User Slice of chris. In this case PID 4213 is the process that's still flipping between status R and D. Kill is sent, but ignored. But not for scrubbing... Aug 01 11:58:21 localhost.localdomain systemd[4294]: Starting Exit the Session... Aug 01 11:58:21 localhost.localdomain systemd[4294]: Received SIGRTMIN+24 from PID 4933 (kill). Aug 01 11:58:21 localhost.localdomain systemd[4301]: pam_unix(systemd-user:session): session closed for user chris Aug 01 11:58:21 localhost.localdomain systemd[1]: user@1000.service: Killing process 4866 (btrfs) with signal SIGKILL. Aug 01 11:58:21 localhost.localdomain systemd[1]: Stopped User Manager for UID 1000. Aug 01 11:58:21 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1000 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Aug 01 11:58:21 localhost.localdomain systemd[1]: Removed slice User Slice of chris. Must have something to do with the use of & with balance, which scrub doesn't need to go to the background. Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fixup direct bi_rw modifiers
On 08/01/2016 05:47 AM, Christoph Hellwig wrote: On Sat, Jul 30, 2016 at 04:45:48PM -0500, Shaun Tancheff wrote: bi_rw should be using bio_set_op_attrs to set bi_rw. Looks fine, Reviewed-by: Christoph HellwigAdded, thanks Shaun. Jens, what do you think about renaming bi_rw? There aren't too many users left, and any old code that would keep using it is alsmost guranteed to be broken, so sending a post-rc1 patch to rename it might make everyone else life easier. Especially as it's also grossly misnamed now. I was planning on doing that, after -rc1. Much better to get build breakage, than potentially much worse breakage. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On Mon, Aug 1, 2016 at 11:19 AM, Austin S. Hemmelgarnwrote: > On 2016-08-01 13:15, Chris Murphy wrote: >> I've been using balance with &, and when I logout, the btrfs command >> continues to flip between status D and R, just like before logout and >> it appears to complete. I still get status messages of the balance >> after logout, in kernel messages. >> > Interesting, maybe balance is explicitly white-listed? Either that, or it > just ignores whatever signal systemd uses to kill stuff in this context (I > initially thought SIGTERM, but SIGHUP would make more sense in this > context), which wouldn't surprise me either. I'm not aware of any program specific white listing method with KillUserProcesses=yes. However, there is KillExcludeUsers which by default is KillExcludeUsers=root. Everything I run as sudo appears in top and ps as use root. So are these processes exempt? And if so, why is btrfs scrub becoming a zombie process? I don't know if it's appropriate, but I asked about it (no response yet), whether all things sudo should just be moved out of the user session. In my own head I don't associate sudo commands with my user or my user session, and at least top and ps agree with the former, so why not have sudo'd processes put in a different scope from the outset? -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On 2016-08-01 13:15, Chris Murphy wrote: On Mon, Aug 1, 2016 at 10:58 AM, Austin S. Hemmelgarnwrote: On 2016-08-01 12:19, Chris Murphy wrote: On Mon, Aug 1, 2016 at 10:08 AM, Austin S. Hemmelgarn wrote: MD and DM RAID handle this by starting kernel threads to do the scrub. They then store the info about the scrub in the array itself, so you can query it externally. If you watch, neither of those commands runs longer than it takes to start the operation, so there's nothing for systemd to kill. pvmove continues to run and report progress so it can be killed off, but it only polls for statistics, it's not actually recording them. So even though it gets killed, subsequent pvmove command shows correct statistics. Because all that the pvmove command is doing is polling for statistics. It actually works kind of like a scrub, all the actual work is done in the kernel, the userspace component just handles reporting. The difference is that the move operation is accounted and mutexed in the kernel itself, instead of userspace like scrub does. This model is actually essentially what I think scrub (and balance for that matter) should look like, and if implemented right, we could actually store scrub results in the FS itself (that is, in the metadata, not in special files or anything like that). So that makes me wonder how btrfs device add and remove will behave, if issued in a DE which is then logged out of. Those commands do not return to prompt until they complete. They work via balance, so they should behave the same as a balance command, which means it will likely run part way then get cancelled because of the SIGTERM to the userspace component (assuming of course that it is still running when you log out). I've been using balance with &, and when I logout, the btrfs command continues to flip between status D and R, just like before logout and it appears to complete. I still get status messages of the balance after logout, in kernel messages. Interesting, maybe balance is explicitly white-listed? Either that, or it just ignores whatever signal systemd uses to kill stuff in this context (I initially thought SIGTERM, but SIGHUP would make more sense in this context), which wouldn't surprise me either. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On Mon, Aug 1, 2016 at 10:58 AM, Austin S. Hemmelgarnwrote: > On 2016-08-01 12:19, Chris Murphy wrote: >> >> On Mon, Aug 1, 2016 at 10:08 AM, Austin S. Hemmelgarn >> wrote: >> >>> >>> MD and DM RAID handle this by starting kernel threads to do the scrub. >>> They >>> then store the info about the scrub in the array itself, so you can query >>> it >>> externally. If you watch, neither of those commands runs longer than it >>> takes to start the operation, so there's nothing for systemd to kill. >> >> >> pvmove continues to run and report progress so it can be killed off, >> but it only polls for statistics, it's not actually recording them. So >> even though it gets killed, subsequent pvmove command shows correct >> statistics. > > Because all that the pvmove command is doing is polling for statistics. It > actually works kind of like a scrub, all the actual work is done in the > kernel, the userspace component just handles reporting. The difference is > that the move operation is accounted and mutexed in the kernel itself, > instead of userspace like scrub does. This model is actually essentially > what I think scrub (and balance for that matter) should look like, and if > implemented right, we could actually store scrub results in the FS itself > (that is, in the metadata, not in special files or anything like that). >> >> >> So that makes me wonder how btrfs device add and remove will behave, >> if issued in a DE which is then logged out of. Those commands do not >> return to prompt until they complete. > > They work via balance, so they should behave the same as a balance command, > which means it will likely run part way then get cancelled because of the > SIGTERM to the userspace component (assuming of course that it is still > running when you log out). I've been using balance with &, and when I logout, the btrfs command continues to flip between status D and R, just like before logout and it appears to complete. I still get status messages of the balance after logout, in kernel messages. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/17] xfs: run xfs_repair at the end of each test
On Sun, Jul 31, 2016 at 11:27:19PM -0700, Christoph Hellwig wrote: > On Thu, Jul 21, 2016 at 04:46:54PM -0700, Darrick J. Wong wrote: > > Run xfs_repair twice at the end of each test -- once to rebuild > > the btree indices, and again with -n to check the rebuild work. > > This looks fine to me in general, but shouldn't we have specific > tests that test the rebuilding in a normal auto run? We do have specific tests that examine the outputs of rebuilding the indices (all the fuzzer group tests do this too); this patch enables a test runner to expand that coverage to all tests. Running a rebuilding xfs_repair for all the tests shook out some bugs in the xfs_repair rmap handling code that only triggered under some of the non-rmap non-reflink stressor tests. --D -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On 2016-08-01 12:19, Chris Murphy wrote: On Mon, Aug 1, 2016 at 10:08 AM, Austin S. Hemmelgarnwrote: MD and DM RAID handle this by starting kernel threads to do the scrub. They then store the info about the scrub in the array itself, so you can query it externally. If you watch, neither of those commands runs longer than it takes to start the operation, so there's nothing for systemd to kill. pvmove continues to run and report progress so it can be killed off, but it only polls for statistics, it's not actually recording them. So even though it gets killed, subsequent pvmove command shows correct statistics. Because all that the pvmove command is doing is polling for statistics. It actually works kind of like a scrub, all the actual work is done in the kernel, the userspace component just handles reporting. The difference is that the move operation is accounted and mutexed in the kernel itself, instead of userspace like scrub does. This model is actually essentially what I think scrub (and balance for that matter) should look like, and if implemented right, we could actually store scrub results in the FS itself (that is, in the metadata, not in special files or anything like that). So that makes me wonder how btrfs device add and remove will behave, if issued in a DE which is then logged out of. Those commands do not return to prompt until they complete. They work via balance, so they should behave the same as a balance command, which means it will likely run part way then get cancelled because of the SIGTERM to the userspace component (assuming of course that it is still running when you log out). Replace was implemented the way scrub should have been. It's done entirely in the kernel, and the userspace tools just start, stop and check status. We should just get rid of the whole scrub state file crap and have a way to query the last scrub status directly from the FS. That would fix this particular issue, and make scrub more consistent with everything else (and solve the stale scrub status bug too). OK, I'll update the bug report. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qcow2 becomes 37P in size while qemu crashes
On Mon, Aug 1, 2016 at 9:26 AM, Chris Masonwrote: > > > On 07/23/2016 04:05 PM, Chris Murphy wrote: >> >> Using btrfs-debug-tree, I'm finding something a bit odd about some of >> the items in this 39P file. >> >> Seems normal >> >> item 71 key (994163 EXTENT_DATA 43689967616) itemoff 12467 itemsize 53 >> extent data disk byte 617349906432 nr 80805888 >> extent data offset 0 nr 80805888 ram 80805888 >> extent compression(none) >> >> Seems not normal >> >> item 58 key (994163 EXTENT_DATA 38345535488) itemoff 13156 itemsize 53 >> extent data disk byte 0 nr 0 >> extent data offset 394752000 nr 61440 ram 34626881742770176 >> extent compression(none) >> >> There are quite a large number of items that take the 2nd form, and in >> each case the ram value is the same as above. >> >> > > Can't really blame a bit flip for that one. Looks like our hole truncation > math has gone crazy there. I'll see what I can find, but please yell if you > can reproduce. I've been trying, but no joy. Several bugs are required before hitting this. User bug: I inadvertently edited the machine using virsh to use the same backing qcow2 file for vda and vdb. virsh bug: No warning for this double usage. virt-manager bug: Starts the VM without warning when two virtio devices are pointed to the same backing file. Corruption of the file is inevitable. But exactly when it grew to 37P I'm not sure, so far reproducing this results in a file limited to the create time specified file size. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: send, don't bug on inconsistent snapshots
From: Filipe MananaWhen doing an incremental send, if we find a new/modified/deleted extent, reference or xattr without having previously processed the corresponding inode item we end up exexuting a BUG_ON(). This is because whenever an extent, xattr or reference is added, modified or deleted, we always expect to have the corresponding inode item updated. However there are situations where this will not happen due to transient -ENOMEM or -ENOSPC errors when doing delayed inode updates. For example, when punching holes we can succeed in deleting and modifying (shrinking) extents but later fail to do the delayed inode update. So after such failure we close our transaction handle and right after a snapshot of the fs/subvol tree can be made and used later for a send operation. The same thing can happen during truncate, link, unlink, and xattr related operations. So instead of executing a BUG_ON, make send return an -EIO error and print an informative error message do dmesg/syslog. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 48 +--- 1 file changed, 45 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 2db8dc8..efe129f 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -273,6 +273,39 @@ struct name_cache_entry { char name[]; }; +static void inconsistent_snapshot_error(struct send_ctx *sctx, + enum btrfs_compare_tree_result result, + const char *what) +{ + const char *result_string; + + switch (result) { + case BTRFS_COMPARE_TREE_NEW: + result_string = "new"; + break; + case BTRFS_COMPARE_TREE_DELETED: + result_string = "deleted"; + break; + case BTRFS_COMPARE_TREE_CHANGED: + result_string = "updated"; + break; + case BTRFS_COMPARE_TREE_SAME: + ASSERT(0); + result_string = "unchanged"; + break; + default: + ASSERT(0); + result_string = "unexpected"; + } + + btrfs_err(sctx->send_root->fs_info, + "Send: inconsistent snapshot, found %s %s for inode %llu without updated inode item, send root is %llu, parent root is %llu", + result_string, what, sctx->cmp_key->objectid, + sctx->send_root->root_key.objectid, + (sctx->parent_root ? + sctx->parent_root->root_key.objectid : 0)); +} + static int is_waiting_for_move(struct send_ctx *sctx, u64 ino); static struct waiting_dir_move * @@ -5711,7 +5744,10 @@ static int changed_ref(struct send_ctx *sctx, { int ret = 0; - BUG_ON(sctx->cur_ino != sctx->cmp_key->objectid); + if (sctx->cur_ino != sctx->cmp_key->objectid) { + inconsistent_snapshot_error(sctx, result, "reference"); + return -EIO; + } if (!sctx->cur_inode_new_gen && sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID) { @@ -5736,7 +5772,10 @@ static int changed_xattr(struct send_ctx *sctx, { int ret = 0; - BUG_ON(sctx->cur_ino != sctx->cmp_key->objectid); + if (sctx->cur_ino != sctx->cmp_key->objectid) { + inconsistent_snapshot_error(sctx, result, "xattr"); + return -EIO; + } if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) { if (result == BTRFS_COMPARE_TREE_NEW) @@ -5760,7 +5799,10 @@ static int changed_extent(struct send_ctx *sctx, { int ret = 0; - BUG_ON(sctx->cur_ino != sctx->cmp_key->objectid); + if (sctx->cur_ino != sctx->cmp_key->objectid) { + inconsistent_snapshot_error(sctx, result, "extent"); + return -EIO; + } if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) { if (result != BTRFS_COMPARE_TREE_DELETED) -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Corruption through subsequent compression?
Dear list, I've got into some trouble with my 40GB BTRFS formatted root partition in a 2015 Macbook 15": I was running low on free space so I remembered, that I had forgotten to enable compression while installing Antergos (a flavour of Arch Linux). So I looked into the Arch Wiki at https://wiki.archlinux.org/index.php/Btrfs and executed "btrfs filesystem defragment -r -v -clzo /" as normal user. I noticed, that it would only compress files inside my home folder (which is on the same partition), so I run it again with "sudo". After some time I was surprised by windows closing and thought, that this might release file handles or something like that (I'm not experienced with the inner structure of BTRFS and optimization like defragment). I feared what might happen, when the console session, in which I was running the btrfs command, would get closed, but Ctrl-C-ing it would probably do damage too - so I waited till the window vanished. (KDE did already crash previously, so there already weren't any window decorations etc.). After a reboot I tried running the same command again in a login shell (by pressing CTRL+ALT+F2) (so no desktop environment which could interfere somehow) and at some time it said "no space left on device". I started searching and read about metadata being full resulting in this error. A quick check resulted in about 75% usage, so I tried a rebalance with increasing "dlimit" till it tried moving blocks - this was the case at dlimit=12. Again I received the annoying "no space left on device" message and deleted some files (pacman cache etc.) and tried it again some more times. After some tries the btrfs executable crashed with a weird error saying: "BTRFS: error (device dm-0) in merge_reloc_roots:2421: errno=-28 No space left --[ cut here ]-- kernel BUG at fs/btrfs/volumes.c:3421! invalid opcode: [#1] PREEMPT SMP Modules linked in: [...]" I took a picture of the screen in case the complete error message might be needed. Now I knew I was in trouble but didn't want to give up. I thought, that the compression might have somehow overwritten part of the btrfs executable with zeros and tried reinstalling it from the pacman cache, but it wasn't possible because of the filesystem being switched to read only. After a reboot things were looking even worser: Something like "...fixing, but needing a reboot" scrolled to the top and afterwards it only kept saying something like "CPU #... is hanging for [something about 20s]" and named a process (mount, kworker and some btrfs subprocess). At this time it was past 2 am and all I wanted was getting some sleep, so I just powered it of by holding the power key. Now, some hours later, I still don't know, whether my filesystem still can be fixed or whether I'm better left of nuking it and starting over... My most important files are placed on a EXFAT formated partition and partially synced to some cloud-service, so I would probably only loose some time to redownload all applications (and my browser history etc.). I was really constrained on space so I didn't had much options for backing up the system. Next time I probably should get an external hard drive... Now I'm constrained to use OS X but at least my Macbook is still usable (so the SSD still seams to be working). As I'm not experienced with this kind of situation I am not sure what to do - probably doing some investigation out of a live system? Could somebody please help me in trying to fix my filesystem? Greetings, Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On Mon, Aug 1, 2016 at 10:19 AM, Chris Murphywrote: > So that makes me wonder how btrfs device add and remove will behave, > if issued in a DE which is then logged out of. Those commands do not > return to prompt until they complete. Strike add. That's fast. I'm concerned about dev delete/remove and also resize. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On Mon, Aug 1, 2016 at 10:08 AM, Austin S. Hemmelgarnwrote: > > MD and DM RAID handle this by starting kernel threads to do the scrub. They > then store the info about the scrub in the array itself, so you can query it > externally. If you watch, neither of those commands runs longer than it > takes to start the operation, so there's nothing for systemd to kill. pvmove continues to run and report progress so it can be killed off, but it only polls for statistics, it's not actually recording them. So even though it gets killed, subsequent pvmove command shows correct statistics. So that makes me wonder how btrfs device add and remove will behave, if issued in a DE which is then logged out of. Those commands do not return to prompt until they complete. > Replace was implemented the way scrub should have been. It's done entirely > in the kernel, and the userspace tools just start, stop and check status. > We should just get rid of the whole scrub state file crap and have a way to > query the last scrub status directly from the FS. That would fix this > particular issue, and make scrub more consistent with everything else (and > solve the stale scrub status bug too). OK, I'll update the bug report. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On 2016-08-01 11:46, Chris Murphy wrote: OK I've created a new volume that's sufficiently large I can tell if the kernel workers doing the scrub are also being killed off. First, I do a scrub without logging out to get a time for an uninterrupted scrub. And then initiate a scrub which I start timing, but then logout of the DE and watch for the kernel workers to stop. - The kernel workers are killed off within ~5 seconds of an uninterrupted scrub. Conclusion is the scrub is still happening by the kernel. This makes sense, systemd is killing based on session ID, and the kernel workers have an sid of 0 (I think, it should be whatever the sid of kthreadd (PID 2) has). - The btrfs process for the scrub isn't killed either, it's just status Z for the entire length of the scrub. Z means the process is dead, but nothing has called wait() or similar to get status info from it, so it was killed, it's just that nothing took the body to the morgue yet. - While this scrubbing is happening, issuing a 'btrfs scrub status' gets me consistently stale information. It's the same information from the moment the DE was logged out. This makes sense, because the userspace component updates this info (and that's _all_ it does). [root@localhost ~]# btrfs scrub status /mnt/x scrub status for 9f9e5e1f-8d5a-44a0-8f69-8a393fb7ff3c scrub started at Mon Aug 1 09:29:59 2016, running for 00:00:15 total bytes scrubbed: 3.06GiB with 0 errors Even a minute later this information is the same. Once the zombie btrfs process dies off, and the kernel workers stop working, I get this bogus status information: [root@localhost ~]# btrfs scrub status /mnt/x scrub status for 9f9e5e1f-8d5a-44a0-8f69-8a393fb7ff3c scrub started at Mon Aug 1 09:29:59 2016, interrupted after 00:00:15, not running total bytes scrubbed: 3.06GiB with 0 errors Only the user process was interrupted. Not the scrub. Looks like only the user process is writing out the statistics and status, so once it goes zombie, there's no accounting, rather than accounting being done independently via sysfs. Can I resume this scrub? Yes. But that's also bogus because there really isn't anything to resume. All that work was done already, it just hasn't been accounted for. So whether you want to call this a bug, or deeply suboptimal behavior, I think that's splitting hairs. Neither mdadm nor LVM scrubs are affected by this logout behavior and systemd killing off user processes. I always get reliable scrub status information from either 'echo check md/sync_action' or 'lvchange --syncaction check' before and after logging out of the DE from which the command was issued. MD and DM RAID handle this by starting kernel threads to do the scrub. They then store the info about the scrub in the array itself, so you can query it externally. If you watch, neither of those commands runs longer than it takes to start the operation, so there's nothing for systemd to kill. And it's even inconsistent with btrfs replace where it continues to give me correct status information from a tty shell even though the replace command was issued in a DE, subsequently logged out of. So 'btrfs scrub' is inconsistent no matter how you look at it. It's a bug. Replace was implemented the way scrub should have been. It's done entirely in the kernel, and the userspace tools just start, stop and check status. We should just get rid of the whole scrub state file crap and have a way to query the last scrub status directly from the FS. That would fix this particular issue, and make scrub more consistent with everything else (and solve the stale scrub status bug too). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On Mon, Aug 1, 2016 at 9:46 AM, Chris Murphywrote: > - The kernel workers are killed off within ~5 seconds of an > uninterrupted scrub. i.e. the kernel workers are doing the same work. They aren't being killed sooner as a result of logging out from the DE. The only apparent change from logging out from the DE from which the scrub was issued, is the btrfs process becomes status Z. It is in fact not being killed, which itself is kinda interesting/unexpected. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
OK I've created a new volume that's sufficiently large I can tell if the kernel workers doing the scrub are also being killed off. First, I do a scrub without logging out to get a time for an uninterrupted scrub. And then initiate a scrub which I start timing, but then logout of the DE and watch for the kernel workers to stop. - The kernel workers are killed off within ~5 seconds of an uninterrupted scrub. Conclusion is the scrub is still happening by the kernel. - The btrfs process for the scrub isn't killed either, it's just status Z for the entire length of the scrub. - While this scrubbing is happening, issuing a 'btrfs scrub status' gets me consistently stale information. It's the same information from the moment the DE was logged out. [root@localhost ~]# btrfs scrub status /mnt/x scrub status for 9f9e5e1f-8d5a-44a0-8f69-8a393fb7ff3c scrub started at Mon Aug 1 09:29:59 2016, running for 00:00:15 total bytes scrubbed: 3.06GiB with 0 errors Even a minute later this information is the same. Once the zombie btrfs process dies off, and the kernel workers stop working, I get this bogus status information: [root@localhost ~]# btrfs scrub status /mnt/x scrub status for 9f9e5e1f-8d5a-44a0-8f69-8a393fb7ff3c scrub started at Mon Aug 1 09:29:59 2016, interrupted after 00:00:15, not running total bytes scrubbed: 3.06GiB with 0 errors Only the user process was interrupted. Not the scrub. Looks like only the user process is writing out the statistics and status, so once it goes zombie, there's no accounting, rather than accounting being done independently via sysfs. Can I resume this scrub? Yes. But that's also bogus because there really isn't anything to resume. All that work was done already, it just hasn't been accounted for. So whether you want to call this a bug, or deeply suboptimal behavior, I think that's splitting hairs. Neither mdadm nor LVM scrubs are affected by this logout behavior and systemd killing off user processes. I always get reliable scrub status information from either 'echo check md/sync_action' or 'lvchange --syncaction check' before and after logging out of the DE from which the command was issued. And it's even inconsistent with btrfs replace where it continues to give me correct status information from a tty shell even though the replace command was issued in a DE, subsequently logged out of. So 'btrfs scrub' is inconsistent no matter how you look at it. It's a bug. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qcow2 becomes 37P in size while qemu crashes
On 07/23/2016 04:05 PM, Chris Murphy wrote: Using btrfs-debug-tree, I'm finding something a bit odd about some of the items in this 39P file. Seems normal item 71 key (994163 EXTENT_DATA 43689967616) itemoff 12467 itemsize 53 extent data disk byte 617349906432 nr 80805888 extent data offset 0 nr 80805888 ram 80805888 extent compression(none) Seems not normal item 58 key (994163 EXTENT_DATA 38345535488) itemoff 13156 itemsize 53 extent data disk byte 0 nr 0 extent data offset 394752000 nr 61440 ram 34626881742770176 extent compression(none) There are quite a large number of items that take the 2nd form, and in each case the ram value is the same as above. Can't really blame a bit flip for that one. Looks like our hole truncation math has gone crazy there. I'll see what I can find, but please yell if you can reproduce. Thanks! -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: A lot warnings in dmesg while running thunderbird
On 07/24/2016 08:36 PM, Dave Chinner wrote: On Fri, Jul 08, 2016 at 12:02:35PM -0400, Chris Mason wrote: Can you please run the attached test program: gcc -o short-write short-write.c -lpthread ./short-write some-new-file-on-btrfs Hi Chris, this seems like a useful thing to be testing on a regular basis - can you turn this into an xfstests regression test and submit it? [ vacation backlog delay apologies ] Hi Dave, The test is just a thread constantly madvising away the memory being used for IO. It would be interesting to add a --evil option to either xfs_io or fsx so the evil is compounded across all of the existing xfstests. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: systemd KillUserProcesses=yes and btrfs scrub
On 2016-07-30 20:29, Chris Murphy wrote: On Sat, Jul 30, 2016 at 2:02 PM, Chris Murphywrote: Short version: When systemd-logind login.conf KillUserProcesses=yes, and the user does "sudo btrfs scrub start" in e.g. GNOME Terminal, and Same thing with Xfce, so it's not DE specific. (Unsuprising.) I inflated the size of the test volume, and it seems pretty clear that the scrub is not completing, as the kernel threads stop sooner when logging out vs not logging out. So the status reporting an interruption appears to be valid for the net operation, not merely the user space tool being interrupted. You have your terminals set to start the shell as a login shell I'm guessing. That's probably why closing the terminal window is triggering systemd's process killing. It will of course still trigger when you close the graphical session though. Personally, this is yet another reason for me to not like systemd. This setting breaks traditional UNIX userspace semantics. Personally, I'm with Duncan on this one though, if resume works correctly, then it's not a bug, just a bad interaction between an administrative tool designed for a server and an init system designed for a desktop. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fixup direct bi_rw modifiers
On Sat, Jul 30, 2016 at 04:45:48PM -0500, Shaun Tancheff wrote: > bi_rw should be using bio_set_op_attrs to set bi_rw. Looks fine, Reviewed-by: Christoph HellwigJens, what do you think about renaming bi_rw? There aren't too many users left, and any old code that would keep using it is alsmost guranteed to be broken, so sending a post-rc1 patch to rename it might make everyone else life easier. Especially as it's also grossly misnamed now. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/17] xfs/122: add the realtime rmapbt inode and btree fields
On Thu, Jul 21, 2016 at 04:48:00PM -0700, Darrick J. Wong wrote: > Add the on-disk structures added by the realtime rmapbt. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs-progs: Introduce new send-dump object
Introduce send-dump.[ch] which implements a new btrfs_send_ops to exam and output all operations inside a send stream. It has a better output format than the old and no longer compilable send-test tool, but still tries to be script friendly. Provides the basis for later "inspect-internal dump-send" command. Signed-off-by: Qu Wenruo--- Makefile.in | 2 +- send-dump.c | 367 send-dump.h | 24 3 files changed, 392 insertions(+), 1 deletion(-) create mode 100644 send-dump.c create mode 100644 send-dump.h diff --git a/Makefile.in b/Makefile.in index ac6b353..97caf95 100644 --- a/Makefile.in +++ b/Makefile.in @@ -80,7 +80,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ extent-cache.o extent_io.o volumes.o utils.o repair.o \ qgroup.o raid6.o free-space-cache.o list_sort.o props.o \ ulist.o qgroup-verify.o backref.o string-table.o task-utils.o \ - inode.o file.o find-root.o free-space-tree.o help.o + inode.o file.o find-root.o free-space-tree.o help.o send-dump.o cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \ diff --git a/send-dump.c b/send-dump.c new file mode 100644 index 000..bf451c7 --- /dev/null +++ b/send-dump.c @@ -0,0 +1,367 @@ +/* + * Copyright (C) 2016 Fujitsu. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "utils.h" +#include "commands.h" +#include "send-utils.h" +#include "send-stream.h" +#include "send-dump.h" + +#define path_cat_out_with_error(function_name, out_path, path1, path2, ret) \ +ret = path_cat_out(out_path, path1, path2);\ +if (ret < 0) { \ + error("%s: path invalid: %s\n", function_name, path2); \ + return ret; \ +} + +#define TITLE_WIDTH16 +#define PATH_WIDTH 32 + +static void print_dump(const char *title, const char *path, + const char *fmt, ...) +{ + va_list args; + char real_title[TITLE_WIDTH + 1]; + + real_title[0]='\0'; + /* Append ':' to title*/ + strncpy(real_title, title, TITLE_WIDTH - 1); + strncat(real_title, ":", TITLE_WIDTH); + + /* Unified output */ + printf("%-*s%-*s", TITLE_WIDTH, real_title, PATH_WIDTH, path); + va_start(args, fmt); + /* Command specified output */ + vprintf(fmt, args); + va_end(args); + printf("\n"); +} + +static int print_subvol(const char *path, const u8 *uuid, u64 ctransid, + void *user) +{ + struct btrfs_dump_send_args *r = user; + char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; + int ret; + + path_cat_out_with_error("subvol", r->full_subvol_path, r->root_path, + path, ret); + uuid_unparse(uuid, uuid_str); + + print_dump("subvol", r->full_subvol_path, "uuid: %s, transid: %llu", + uuid_str, ctransid); + return 0; +} + +static int print_snapshot(const char *path, const u8 *uuid, u64 ctransid, + const u8 *parent_uuid, u64 parent_ctransid, + void *user) +{ + struct btrfs_dump_send_args *r = user; + char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; + char parent_uuid_str[BTRFS_UUID_UNPARSED_SIZE]; + int ret; + + path_cat_out_with_error("snapshot", r->full_subvol_path, r->root_path, + path, ret); + uuid_unparse(uuid, uuid_str); + uuid_unparse(parent_uuid, parent_uuid_str); + + print_dump("snapshot", r->full_subvol_path, + "uuid: %s, transid: %llu, parent_uuid: %s, parent_transid: %llu", + uuid_str, ctransid, parent_uuid_str, parent_ctransid); + return 0; +} + +static int print_mkfile(const char *path, void *user) +{ + struct btrfs_dump_send_args *r = user; + char full_path[PATH_MAX]; + int ret; + + path_cat_out_with_error("mkfile", full_path, r->full_subvol_path, path, + ret);
[PATCH 2/3] btrfs-progs: inspect: Introduce dump-send-stream subcommand
Introduce a new subcommand "dump-send-stream" for "inspect-internal" command group. It will exam and output all operations inside a send stream. It's quite a useful tool to learn and dig kernel send stream. Signed-off-by: Qu Wenruo--- Documentation/btrfs-inspect-internal.asciidoc | 8 +++ Makefile.in | 3 +- cmds-inspect-dump-send.c | 84 +++ cmds-inspect-dump-send.h | 24 cmds-inspect.c| 3 + 5 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 cmds-inspect-dump-send.c create mode 100644 cmds-inspect-dump-send.h diff --git a/Documentation/btrfs-inspect-internal.asciidoc b/Documentation/btrfs-inspect-internal.asciidoc index 74f6dea..8ba768d 100644 --- a/Documentation/btrfs-inspect-internal.asciidoc +++ b/Documentation/btrfs-inspect-internal.asciidoc @@ -146,6 +146,14 @@ Print sizes and statistics of trees. -b Print raw numbers in bytes. +*dump-send-stream [options]*:: +exam and output all operations inside a send stream. Read from 'stdin' by default. ++ +`Options` ++ +-f|--file +Read from file instead of 'stdin'. + EXIT STATUS --- *btrfs inspect-internal* returns a zero exit status if it succeeds. Non zero is diff --git a/Makefile.in b/Makefile.in index 97caf95..00bf639 100644 --- a/Makefile.in +++ b/Makefile.in @@ -86,7 +86,8 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \ cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \ cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \ cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \ - cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o + cmds-inspect-dump-super.o cmds-inspect-tree-stats.o \ + cmds-fi-du.o cmds-inspect-dump-send.o libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \ uuid-tree.o utils-lib.o rbtree-utils.o libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \ diff --git a/cmds-inspect-dump-send.c b/cmds-inspect-dump-send.c new file mode 100644 index 000..46a3cd6 --- /dev/null +++ b/cmds-inspect-dump-send.c @@ -0,0 +1,84 @@ +/* + * Copyright (C) 2016 Fujitsu. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program. + */ + +#include +#include +#include +#include +#include +#include +#include +#include "kerncompat.h" +#include "ctree.h" +#include "send-dump.h" +#include "send-stream.h" +#include "utils.h" +#include "commands.h" + +const char * const cmd_inspect_dump_send_usage[] = { + "btrfs inspect-internal dump-send-stream [options]", + "Dump the operations of a send stream from stdin", + "-f|--file read send stream from file", + "-h|--help print this message", + NULL +}; + +int cmd_inspect_dump_send(int argc, char **argv) +{ + struct btrfs_dump_send_args dump_args; + int fd = 0; + int ret; + + while (1) { + int c; + static const struct option long_options[] = { + { "file", required_argument, NULL, 'f' }, + { "help", no_argument, NULL, 'h' }, + { NULL, 0, NULL, 0 } + }; + + c = getopt_long(argc, argv, "f:h", long_options, NULL); + if (c < 0) + break; + switch(c) { + case 'f': + fd = open(optarg, O_RDONLY, 0666); + if (fd < 0) { + error("cannot open %s: %s", optarg, + strerror(errno)); + exit(1); + } + break; + case 'h': + default: + usage(cmd_inspect_dump_send_usage); + break; + } + } + dump_args.root_path = malloc(PATH_MAX); + dump_args.root_path[0] = '.'; + dump_args.root_path[1] = '\0'; + dump_args.full_subvol_path = malloc(PATH_MAX); + dump_args.full_subvol_path[0] = '.'; + dump_args.full_subvol_path[1] = '\0'; + + ret = btrfs_read_and_process_send_stream(fd, _print_send_ops, + _args, 0, 0); + if (ret < 0) +
[PATCH 3/3] btrfs-progs: Remove send-test tool
Since new "inspect dump-send" has better output and structure, it's time to remove old and function-weak send-test tool. New "inspect dump-send" can handle them all better. Signed-off-by: Qu Wenruo--- Makefile.in | 6 +- send-test.c | 447 2 files changed, 1 insertion(+), 452 deletions(-) delete mode 100644 send-test.c diff --git a/Makefile.in b/Makefile.in index 00bf639..5e4dd57 100644 --- a/Makefile.in +++ b/Makefile.in @@ -347,10 +347,6 @@ ioctl-test: $(objects) $(libs) ioctl-test.o @echo "[LD] $@" $(Q)$(CC) $(CFLAGS) -o ioctl-test $(objects) $(libs) ioctl-test.o $(LDFLAGS) $(LIBS) -send-test: $(objects) $(libs) send-test.o - @echo "[LD] $@" - $(Q)$(CC) $(CFLAGS) -o send-test $(objects) $(libs) send-test.o $(LDFLAGS) $(LIBS) - library-test: $(libs_shared) library-test.o @echo "[LD] $@" $(Q)$(CC) $(CFLAGS) -o library-test library-test.o $(LDFLAGS) -lbtrfs @@ -382,7 +378,7 @@ clean-all: clean clean-doc clean-gen clean: $(CLEANDIRS) @echo "Cleaning" $(Q)$(RM) -f $(progs) cscope.out *.o *.o.d \ - dir-test ioctl-test quick-test send-test library-test library-test-static \ + dir-test ioctl-test quick-test library-test library-test-static \ btrfs.static mkfs.btrfs.static \ $(check_defs) \ $(libs) $(lib_links) \ diff --git a/send-test.c b/send-test.c deleted file mode 100644 index 4645b89..000 --- a/send-test.c +++ /dev/null @@ -1,447 +0,0 @@ -/* - * Copyright (C) 2013 SUSE. All rights reserved. - * - * This code is adapted from cmds-send.c and cmds-receive.c, - * Both of which are: - * - * Copyright (C) 2012 Alexander Block. All rights reserved. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, - * Boston, MA 021110-1307, USA. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * This should be compilable without the rest of the btrfs-progs - * source distribution. - */ -#if BTRFS_FLAT_INCLUDES -#include "send-utils.h" -#include "send-stream.h" -#else -#include -#include -#endif /* BTRFS_FLAT_INCLUDES */ - -static int pipefd[2]; -struct btrfs_ioctl_send_args io_send = {0, }; -static char *subvol_path; -static char *root_path; - -struct recv_args { - char *full_subvol_path; - char *root_path; -}; - -void usage(int error) -{ - printf("send-test \n"); - if (error) - exit(error); -} - -static int print_subvol(const char *path, const u8 *uuid, u64 ctransid, - void *user) -{ - struct recv_args *r = user; - char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; - - r->full_subvol_path = path_cat(r->root_path, path); - uuid_unparse(uuid, uuid_str); - - printf("subvol\t%s\t%llu\t%s\n", uuid_str, - (unsigned long long)ctransid, r->full_subvol_path); - - return 0; -} - -static int print_snapshot(const char *path, const u8 *uuid, u64 ctransid, - const u8 *parent_uuid, u64 parent_ctransid, - void *user) -{ - struct recv_args *r = user; - char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; - char parent_uuid_str[BTRFS_UUID_UNPARSED_SIZE]; - - r->full_subvol_path = path_cat(r->root_path, path); - uuid_unparse(uuid, uuid_str); - uuid_unparse(parent_uuid, parent_uuid_str); - - printf("snapshot\t%s\t%llu\t%s\t%llu\t%s\n", uuid_str, - (unsigned long long)ctransid, parent_uuid_str, - (unsigned long long)parent_ctransid, r->full_subvol_path); - - return 0; -} - -static int print_mkfile(const char *path, void *user) -{ - struct recv_args *r = user; - char *full_path = path_cat(r->full_subvol_path, path); - - printf("mkfile\t%s\n", full_path); - - free(full_path); - return 0; -} - -static int print_mkdir(const char *path, void *user) -{ - struct recv_args *r = user; - char *full_path = path_cat(r->full_subvol_path, path); - - printf("mkdir\t%s\n", full_path); - - free(full_path); - return 0; -} - -static int print_mknod(const char *path, u64 mode, u64 dev, void *user) -{ - struct recv_args *r = user; - char
Re: [PATCH 12/17] reflink: test cross-mountpoint reflink and dedupe
On Thu, Jul 21, 2016 at 04:47:32PM -0700, Darrick J. Wong wrote: > Test sharing blocks via reflink and dedupe between two different > mountpoints of the same filesystem. This shouldn't work, since > we don't allow cross-mountpoint functions. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/17] xfs/234: use scratch device helpers
On Thu, Jul 21, 2016 at 04:47:26PM -0700, Darrick J. Wong wrote: > Use the helper functions for scratch devices. This fixes a problem > where xfs/234 fails when there's a realtime device. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/17] common/dmerror: fix mount option issues
On Thu, Jul 21, 2016 at 04:47:13PM -0700, Darrick J. Wong wrote: > Calling _mount doesn't work when we want to add mount options > such as realtime devices. Since it's just a normal scratch device > mount except for the source device, just call _scratch_mount with > SCRATCH_DEV set to the dmerror device. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/17] xfs/179: use scratch device helpers
On Thu, Jul 21, 2016 at 04:47:19PM -0700, Darrick J. Wong wrote: > Use the helper functions for scratch devices. This fixes a problem > where xfs/179 fails when there's a realtime device. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/17] xfs/129: fix post-metadump remounting idiocy
On Thu, Jul 21, 2016 at 04:47:07PM -0700, Darrick J. Wong wrote: > Use the standard _scratch_mount to mount the filesystem from the restored > image, instead of trying to call mount directly. This is needed in case > we had custom mount options (like rtdev). > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/17] common/reflink: actually test dedupe on scratch device
On Thu, Jul 21, 2016 at 04:46:48PM -0700, Darrick J. Wong wrote: > In _require_scratch_dedupe, test the scratch device, not the testdev. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/17] xfs: run xfs_repair at the end of each test
On Thu, Jul 21, 2016 at 04:46:54PM -0700, Darrick J. Wong wrote: > Run xfs_repair twice at the end of each test -- once to rebuild > the btree indices, and again with -n to check the rebuild work. This looks fine to me in general, but shouldn't we have specific tests that test the rebuilding in a normal auto run? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/17] xfs/128: cycle_mount the scratch device, not the test device
On Thu, Jul 21, 2016 at 04:47:01PM -0700, Darrick J. Wong wrote: > This test uses the scratch device, so cycle that, not the test dev. > This is also a xfs_fsr test, so put it in the fsr group. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/17] xfs/122: list the new log redo items
On Thu, Jul 21, 2016 at 04:46:42PM -0700, Darrick J. Wong wrote: > List the new log redo items. These should have stable sizes. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/17] xfs/26[34]: remove duplicate tests
On Thu, Jul 21, 2016 at 04:46:21PM -0700, Darrick J. Wong wrote: > These two tests were accidentally double-added as xfs/30[78], but the > newer versions have fixed up helper usage and fewer whitespace > problems, so nuke the old tests. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/17] xfs: use rmapbt-checking helper
On Thu, Jul 21, 2016 at 04:46:29PM -0700, Darrick J. Wong wrote: > Don't open-code _notrun checks for the rmapbt, just use the helper. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/17] xfs/310: fix the size calculation for the huge device
On Thu, Jul 21, 2016 at 04:46:35PM -0700, Darrick J. Wong wrote: > Fix the calculation of the dmhuge size. The previous calculation > tried to calculate the size correctly, but got it wrong for 1k > block sizes. Therefore, clean the whole mess up. > > Signed-off-by: Darrick J. WongLooks fine, Reviewed-by: Christoph Hellwig -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html