On Fri, Aug 20, 2010 at 12:36 AM, Bernie Innocenti <ber...@codewiz.org> wrote:
>  # du -sh  --exclude datastore-200* /library/backup
>  92G    /library/backup
>
> So, backing up the last versions of all journals would take "just" 92GB,
> which would take more than 4 days on a 2mbit link for the initial
> backup.

Why do you have to pass --exclude datastore-200*? du should be smart
enough to recognise the hardlinking strategy.
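
For what it's worth, du counts each (device, inode) pair only once within
a single invocation, so hardlinked Journal entries shouldn't be
double-counted. Here's a throwaway Python snippet (mine, nothing that
ships on the XS) doing the same bookkeeping, in case you want to
cross-check the 92G figure:

import os
import sys

# Sum file sizes, counting each (device, inode) pair only once --
# the same de-duplication du applies to hardlinks in one run.
seen = set()
total = 0
for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
    for name in filenames:
        st = os.lstat(os.path.join(dirpath, name))
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue        # another hardlink to something already counted
        seen.add(key)
        total += st.st_size
print(total)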

And I think we can make it much better, if it's true that many large
files are present in many users' Journals.

There are a few "find identical files and hardlink them" scripts out
there. The "best" ones I've spotted are memory-bound and stupidly
hash every damn file... and will just make a mess on a busy XS.

[ Still, maybe you can test it overnight with one of these
un-optimised scripts... ]

Maybe we can take one and rework it so that it works in several passes:

1 - Run find -type f to list every regular file and store the results in
'buckets' keyed by size in bytes, keeping only the full path and inode.
Find a smart way to avoid being memory-bound...

2 - Discard any bucket with only one member.

3 - In each bucket:
     - group by inode -- entries sharing an inode are already hardlinked together
     - hash each distinct inode once
     - coalesce inodes with identical hashes into hardlinks to the
       lowest-numbered one...

Each bucket fits in memory, so step 3 can be done entirely in memory --
see the sketch below...
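
To make that concrete, here is a rough, untested Python sketch of the
three passes. It assumes everything under /library/backup sits on one
filesystem; the SHA-1 choice and the link-then-rename trick are mine, and
pass 1 still keeps its buckets in RAM -- that is exactly the memory-bound
part that would need a smarter on-disk treatment (sqlite, sorted temp
files, whatever):

import os
import stat
import hashlib
from collections import defaultdict

BACKUP_ROOT = '/library/backup'        # assumed root, as in the du run above

def hash_inode(path, blocksize=1 << 20):
    """Hash a file in chunks so big Journal entries never sit in RAM whole."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(blocksize), b''):
            h.update(chunk)
    return h.hexdigest()

# Pass 1: bucket every regular file by size, keeping only (inode, path).
buckets = defaultdict(list)            # size -> [(inode, path), ...]
for dirpath, dirnames, filenames in os.walk(BACKUP_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        st = os.lstat(path)
        if stat.S_ISREG(st.st_mode):
            buckets[st.st_size].append((st.st_ino, path))

# Passes 2 and 3: one bucket at a time, so each fits in memory.
for size, entries in buckets.items():
    if size == 0 or len(entries) < 2:
        continue                       # pass 2: drop singletons (and empty files)

    by_inode = defaultdict(list)       # inode -> paths already hardlinked together
    for ino, path in entries:
        by_inode[ino].append(path)

    by_hash = defaultdict(list)        # digest -> inodes with identical content
    for ino, paths in by_inode.items():
        by_hash[hash_inode(paths[0])].append(ino)

    for digest, inodes in by_hash.items():
        if len(inodes) < 2:
            continue
        keep = min(inodes)             # coalesce onto the lowest-numbered inode
        keep_path = by_inode[keep][0]
        for ino in inodes:
            if ino == keep:
                continue
            for path in by_inode[ino]:
                tmp = path + '.dedup-tmp'
                os.link(keep_path, tmp)    # create the new hardlink first...
                os.rename(tmp, path)       # ...then atomically replace the duplicate

The link-then-rename step means anything reading the backup tree never
sees a path go missing mid-swap; it just flips from one inode to the other.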

cheers,



m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
_______________________________________________
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel
