Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-23 Thread Jeff King
On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote: > What I was thinking about instead is that in cases where we know we > are likely to be creating a large number of loose objects (whether > they referenced or not), in a world where we will be calling fsync(2) > after every single

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote: > > I think Ævar is talking about the case of: > > 1. You make 100 objects that aren't referenced. They're loose. > > 2. You run git-gc. They're still too recent to be deleted. > > Right now those recent loose objects sit loose,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Jeff King
On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote: > > Wouldn't it also make gc pruning more expensive? Now you can repack > > regularly and loose objects will be left out of the pack, and then just > > rm'd, whereas now it would entail creating new packs (unless the whole > > pack

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Jeff King
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > Yes, a "cruft pack" that holds unreachable object has been discussed > > a few times recently on list, and I do agree that it is a desirable > > thing to have in the longer run. > > > > What's tricky is to devise a way

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Theodore Ts'o
On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > What's tricky is to devise a way to allow us to salvage objects that > > are placed in a cruft pack because they are accessed recently, > > proving themselves to be no longer crufts. It could be that a good > > way to

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-22 Thread Ævar Arnfjörð Bjarmason
On Sat, Jan 20 2018, Junio C. Hamano jotted: > Theodore Ts'o writes: > >> I've never been fond of the "git repack -A" behavior >> where it can generate huge numbers of loose files. I'd much prefer it >> if the other objects ended up in a separate pack file, and then some

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-21 Thread Chris Mason
On 01/18/2018 11:27 AM, Christoph Hellwig wrote: [adding Chris to the Cc list - this is about the awful ext3 data=ordered behavior of syncing the whole file system data and metadata on each fsync] On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: On Wed, Jan 17, 2018 at 3:52 PM,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-20 Thread Junio C Hamano
Theodore Ts'o writes: > I've never been fond of the "git repack -A" behavior > where it can generate huge numbers of loose files. I'd much prefer it > if the other objects ended up in a separate pack file, and then some > other provision made for nuking that pack file some

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-20 Thread Theodore Ts'o
On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote: > So..., is it fair to say that the one you sent in > > https://public-inbox.org/git/20180117193510.ga30...@lst.de/ > > is the best variant we have seen in this thread so far? I'll keep > that in my inbox so that I do not forget,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-19 Thread Junio C Hamano
Christoph Hellwig writes: > [adding Chris to the Cc list - this is about the awful ext3 data=ordered > behavior of syncing the whole file system data and metadata on each > fsync] > > On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: >> On Wed, Jan 17, 2018 at 3:52 PM,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-18 Thread Christoph Hellwig
[adding Chris to the Cc list - this is about the awful ext3 data=ordered behavior of syncing the whole file system data and metadata on each fsync] On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: > On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote: > > > > Well,

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote: > > Well, let's be fair; this is something *ext3* got wrong, and it was > the default file system back them. I'm pretty sure reiserfs and btrfs did too.. Linus

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Theodore Ts'o
On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote: > > Now re-do the test while another process writes to a totally unrelated > a huge file (say, do a ISO file copy or something). > > That was the thing that several filesystems get completely and > horribly wrong. Generally

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason wrote: > > Or does overall FS activity and raw throughput (e.g. with an ISO copy) > matter more than general FS contention? Traditionally, yes. Also note that none of this is about "throughput". It's about waiting for a

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 17 2018, Linus Torvalds jotted: > On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason > wrote: >> >> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered >> on the tests I thought might do a lot of loose object writes: >> >> $

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds wrote: > > The original git design was very much to write each object file > without any syncing, because they don't matter since a new object file > - by definition - isn't really reachable. Then sync before writing

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason wrote: > > I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered > on the tests I thought might do a lot of loose object writes: > > $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux >

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 17 2018, Junio C. Hamano jotted: > Christoph Hellwig writes: > >> fsync is required for data integrity as there is no gurantee that >> data makes it to disk at any specified time without it. Even for >> ext3 with data=ordered mode the file system will only commit all

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote: > I'm definitely sympathetic, and I've contemplated a patch like this a > few times. But I'm not sure we're "safe by default" here after this > patch. In particular: > > 1. This covers only loose objects. We generally sync pack writes >

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Jeff King
On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote: > fsync is required for data integrity as there is no gurantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Andreas Schwab
On Jan 17 2018, Christoph Hellwig wrote: > I've lost data on development machines with various times countless > times due to the lack of this option, and now lost trees on a Too many times. :-) Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote: > How about this instead? > > This option is enabled by default and ensures data integrity by calling > fsync after writing object files. It is not necessary on filesystems > which journal data writes, but is still necessary on

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Matthew Wilcox
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > Christoph Hellwig writes: > > @@ -866,10 +866,8 @@ core.whitespace:: > > core.fsyncObjectFiles:: > > This boolean will enable 'fsync()' when writing object files. > > + > > -This is a total waste of time and

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But

Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Junio C Hamano
Christoph Hellwig writes: > fsync is required for data integrity as there is no gurantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in time that is not guaranteed.

[PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
fsync is required for data integrity as there is no gurantee that data makes it to disk at any specified time without it. Even for ext3 with data=ordered mode the file system will only commit all data at some point in time that is not guaranteed. I've lost data on development machines with

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-24 Thread Theodore Ts'o
On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote: Regarding loose object files, given that we write to a temporary, optionally fsync, close and then move to the final name, would we still see partially written file if we omit the fsync, or would the corruption be limited to

[PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Stefan Beller
Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: So git is safe in the sense that you won't really lose any data, but you may well be inconvenienced. The fsync each object config option is there in

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Jeff King
On Tue, Jun 23, 2015 at 02:57:23PM -0700, Stefan Beller wrote: Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: So git is safe in the sense that you won't really lose any data, but you may well

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Jeff King p...@peff.net writes: I had always assumed this was fine on ext4 with data=ordered (i.e., either the rename and its pointed-to content will go through, or not; so you either get your update or the old state, but not a garbage or empty file). But it sounds from what Ted wrote in:

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Theodore Ts'o ty...@mit.edu writes: The main issue is that non-expert users might not realize that they really need to run git fsck after a crash; otherwise, what might happen is that although git is only appending, that you might have some zero-length (or partially written) git object or

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Duy Nguyen
On Wed, Jun 24, 2015 at 4:57 AM, Stefan Beller sbel...@google.com wrote: Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: So git is safe in the sense that you won't really lose any data, but you may

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Junio C Hamano
Stefan Beller sbel...@google.com writes: Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: So git is safe in the sense that you won't really lose any data, but you may well be inconvenienced. The

Re: [PATCH] Enable core.fsyncObjectFiles by default

2015-06-23 Thread Theodore Ts'o
The main issue is that non-expert users might not realize that they really need to run git fsck after a crash; otherwise, what might happen is that although git is only appending, that you might have some zero-length (or partially written) git object or pack files, and while you might not notice