Nice little performance improvement

2009-10-15 Thread Mike Connell
Hi, In my situation I'm using rsync to back up a server with (currently) about 570,000 files. These are all little files, and maybe 0.1% of them change or new ones are added in any 15-minute period. I've split the main tree up so rsync can run on sub-subdirectories of the main tree. It does each

Re: Nice little performance improvement

2009-10-15 Thread Darryl Dixon - Winterhouse Consulting
> Hi,
>
> In my situation I'm using rsync to backup a server with (currently) about
> 570,000 files.
> These are all little files and maybe .1% of them change or new ones are
> added in any 15 minute period.
>

Hi Mike,

We have three filesystems that between them have approx 22 million files, an

Re: Nice little performance improvement

2009-10-15 Thread Matt McCutchen
On Thu, 2009-10-15 at 19:07 -0700, Mike Connell wrote:
> Today I tried the following:
>
> For all subsub directories
> a) Fork a "du -s subsubdirectory" on the destination subsubdirectory
> b) Run rsync on the subsubdirectory
> c) repeat until done
>
> Seems to have improved the
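
The quoted steps a) to c) can be sketched roughly as below. This is only an illustration, not the script from the thread: it assumes source and destination are both local paths, that "subsub" means two levels below the root, and that plain `rsync -a` is acceptable. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def warm_and_sync_commands(src: Path, dest: Path) -> Tuple[List[str], List[str]]:
    """Build the du and rsync commands for one subsubdirectory."""
    # du walks the destination subtree, pulling its metadata into the cache.
    du_cmd = ["du", "-s", str(dest)]
    # Trailing slashes: copy the contents of src into dest.
    # The "-a" flag is an assumption; the thread does not show the options used.
    rsync_cmd = ["rsync", "-a", f"{src}/", f"{dest}/"]
    return du_cmd, rsync_cmd


def sync_subsubdirs(src_root: Path, dest_root: Path) -> None:
    """For each directory two levels down, fork du, then run rsync."""
    for src in sorted(p for p in src_root.glob("*/*") if p.is_dir()):
        dest = dest_root / src.relative_to(src_root)
        du_cmd, rsync_cmd = warm_and_sync_commands(src, dest)
        # a) Fork du in the background; its output is not needed.
        warm = subprocess.Popen(
            du_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # b) Run rsync on the same subsubdirectory while du is still walking.
        subprocess.run(rsync_cmd, check=True)
        # Reap the du process before c) repeating for the next directory.
        warm.wait()
```

Calling `sync_subsubdirs(Path("/srv/data"), Path("/backup/data"))` would process one subsubdirectory at a time; both paths are placeholders.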

Re: Nice little performance improvement

2009-10-15 Thread Mike Connell
Hi, In order to expeditiously move these new files offsite, we use a modified version of pyinotify to log all added/altered files across the entire filesystem(s) and then every five minutes feed the list to rsync with the --files-from option. This works very effectively and quickly. Interestin
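
The batching half of that approach (handing a list of changed files to rsync via `--files-from`) might look something like the sketch below. It assumes the changed paths have already been collected by some watcher and are absolute; the pyinotify side is not shown, and the helper names, the `-a` flag, and the example paths are all hypothetical.

```python
import os
import tempfile
from typing import Iterable, List


def relative_file_list(root: str, changed: Iterable[str]) -> List[str]:
    """Deduplicate changed paths and make them relative to root.

    rsync reads --files-from entries relative to the source directory.
    """
    rels = {os.path.relpath(path, root) for path in changed}
    # Drop anything that resolves outside the source tree.
    return sorted(
        r for r in rels if r != ".." and not r.startswith(".." + os.sep)
    )


def write_batch(root: str, changed: Iterable[str]) -> str:
    """Write one batch of relative paths to a temporary list file."""
    fd, list_path = tempfile.mkstemp(suffix=".list")
    with os.fdopen(fd, "w") as fh:
        fh.writelines(f"{rel}\n" for rel in relative_file_list(root, changed))
    return list_path


def files_from_command(root: str, dest: str, list_path: str) -> List[str]:
    """Build the rsync invocation for one batch (flags are an assumption)."""
    return ["rsync", "-a", f"--files-from={list_path}", root, dest]
```

A scheduler would then call `write_batch` every five minutes with the paths logged since the previous run and pass the resulting command to `subprocess.run`.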

Re: Nice little performance improvement

2009-10-17 Thread Mike Connell
Hi, Interesting. If you're not using incremental recursion (the default in rsync >= 3.0.0), I can see that the "du" would help by forcing the destination I/O to overlap the file-list building in time. But with incremental recursion, the "du" shouldn't be necessary because rsync actually overl

Re: Nice little performance improvement

2009-10-17 Thread Darryl Dixon - Winterhouse Consulting
> Hi,
>
>> In order to expeditiously move these new files offsite, we use a modified
>> version of pyinotify to log all added/altered files across the entire
>> filesystem(s) and then every five minutes feed the list to rsync with the
>> --files-from option. This works very effectively and qu

Re: Nice little performance improvement

2009-10-17 Thread Mike Connell
No, not if the file cache isn't large enough for the number of files. E.g. if you have 20 million files and only 256MB RAM, it's likely a bad idea. Splitting down to the subsub (2-levels down) directory level allows a single subsub rsync to fit for me. Warming the cache is beneficial here, I d

Re: Nice little performance improvement

2009-10-17 Thread Jamie Lokier
Mike Connell wrote:
>
> Hi,
>
> >Interesting. If you're not using incremental recursion (the default in
> >rsync >= 3.0.0), I can see that the "du" would help by forcing the
> >destination I/O to overlap the file-list building in time. But with
> >incremental recursion, the "du" shouldn't be ne

Re: Nice little performance improvement

2009-10-20 Thread Matt McCutchen
On Sat, 2009-10-17 at 12:13 -0700, Mike Connell wrote:
> > Interesting. If you're not using incremental recursion (the default in
> > rsync >= 3.0.0), I can see that the "du" would help by forcing the
> > destination I/O to overlap the file-list building in time. But with
> > incremental recursio