I use debmirror to maintain a local mirror of the Ubuntu repositories.
 A cron job runs a script I wrote every couple of hours, and it seems
to work really well, with one exception.  Ubuntu moves its
out-of-support releases to a new repository so that only the current
releases of Ubuntu are in the main repo.  The problem is that we still
have a few people using these older releases, and I don't want to
break their systems by dropping all these older files.
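
For context, the script amounts to little more than a debmirror
invocation.  A minimal sketch, with hypothetical host, releases,
sections, and paths (not my actual values):

```shell
#!/bin/sh
# Hypothetical mirror-update script run from cron.  Host, releases,
# sections, architectures, and the mirror path are all illustrative.

update_mirror() {
    mirror_dir=$1
    debmirror \
        --host=archive.ubuntu.com \
        --root=ubuntu \
        --method=http \
        --dist=hardy,hardy-updates,hardy-security \
        --section=main,restricted,universe,multiverse \
        --arch=i386 \
        --nocleanup \
        "$mirror_dir"
    # --nocleanup keeps files the upstream mirror has dropped;
    # more on that below.
}
```

Cron then runs it on an every-other-hour schedule, with an entry
something like `0 */2 * * * /usr/local/sbin/update-mirror`.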

When you run debmirror it normally deletes all the files that no
longer exist on the source mirror.  This is good because it keeps your
local mirror pruned, but bad in my case because it would delete all
the files for the feisty and gutsy Ubuntu releases we still want.  Our
solution has so far been to run debmirror with the --nocleanup option
so that it doesn't do this pruning.  I'd like to improve this if
possible.  Right now I have three ideas:

1) Use a script to move all the old release files out of the main
mirror directory before each update, run debmirror, and then move them
all back.

2) Move all the old files to a new location on the filesystem.  Run
debmirror like normal, but after each run create symbolic links in the
mirror directory for all the old files.  These links would need to be
re-created after every debmirror run.

3) apt retrieves files from a Debian mirror over HTTP.  This means I
could move all the old files to a new location and then perform some
tricks with Apache so that requests for the old files get shunted to
the new location while requests for current files go to the normal
mirror location.
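
On the Apache side, it may not take 100,000 Alias directives at all:
mod_rewrite can test whether the requested file exists on disk and
fall back to the parked tree only when it doesn't, which sidesteps
the grouping problem.  An untested sketch, assuming the mirror is
served under /ubuntu from the document root and the parked files live
in /srv/parked (both hypothetical):

```apache
# Untested sketch: if a requested file is missing from the live
# mirror, serve it from the parked old-release tree instead.
Alias /parked/ /srv/parked/

RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/ubuntu/(.*)$ /parked/$1 [PT]
```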
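
To make idea 1 concrete, here's an untested sketch.  It assumes a file
listing the mirror-relative paths of the old releases' files, one per
line (the Packages indexes can supply that); every path and name here
is hypothetical:

```shell
#!/bin/sh
# Untested sketch of idea 1: park the old-release files outside the
# mirror tree while debmirror runs, then put them back.  Assumes
# $LIST holds one mirror-relative path per line.
set -e

MIRROR=/srv/mirror/ubuntu
PARKED=/srv/parked
LIST=/srv/parked/old-files.list

park_old_files() {
    while IFS= read -r path; do
        mkdir -p "$PARKED/$(dirname "$path")"
        mv "$MIRROR/$path" "$PARKED/$path"
    done < "$LIST"
}

unpark_old_files() {
    while IFS= read -r path; do
        mkdir -p "$MIRROR/$(dirname "$path")"
        mv "$PARKED/$path" "$MIRROR/$path"
    done < "$LIST"
}
```

The update cycle would then be park_old_files, a normal debmirror run
(no --nocleanup needed), then unpark_old_files.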
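
Idea 2's relinking step might look like this untested sketch; the
parked path must be absolute so the links resolve from inside the
mirror tree (paths again hypothetical):

```shell
#!/bin/sh
# Untested sketch of idea 2: after each debmirror run, rebuild the
# symlink farm in the mirror pointing at the parked old-release files.
set -e

relink_old_files() {
    parked=$1    # e.g. /srv/parked (must be absolute)
    mirror=$2    # e.g. /srv/mirror/ubuntu
    find "$parked" -type f | while IFS= read -r f; do
        rel=${f#$parked/}                    # mirror-relative path
        mkdir -p "$mirror/$(dirname "$rel")"
        ln -sf "$f" "$mirror/$rel"
    done
}
```

GNU cp's -R and -s flags combined might build the same symlink tree in
one command, but I'd want to test how that behaves when the target
directories already exist.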

There are about 102,000 files associated with the two out-of-support
Ubuntu releases we currently care about.  There may eventually be
more.  The server running this uses the JFS filesystem.

I'm not a big fan of the first option, but it would be the most
straightforward.  The second option is better but still involves
re-creating over a hundred thousand symlinks over and over.  The
third option certainly seems the cleanest, but I don't know how well
Apache will handle having over 100,000 Alias directives in the site
configuration (you can't group them and do something with mod_rewrite,
since all the files are stored in a central location according to
their names; see http://archive.ubuntu.com/ubuntu/pool/ if you're not
sure what a Debian repository looks like).

There might be a better way to go about this, but I haven't found one.
debmirror does have an --ignore=regex option that prevents it from
deleting files that match the regex, but building a regex that covers
a hundred thousand pool paths scattered by package name seems
unworkable.

Anyone have any thoughts?  I've already written a little Python script
that can load and parse the Packages files that list all the packages
in a specific repository, so that's not a big deal.  My biggest
concern with the first two methods is possible hidden gotchas that
might exist when doing that many recurring filesystem operations
(almost a million per day).  For the third, I don't know how well
Apache's mod_alias (or something better?) scales.
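
For the curious, the part of that Packages parsing needed here --
pulling out the mirror-relative paths -- is small enough to sketch in
shell, since the Filename: field of each stanza holds the path:

```shell
#!/bin/sh
# Sketch: print the mirror-relative path of every package listed in
# one or more (uncompressed) Packages files.
list_release_files() {
    awk '/^Filename: / { print $2 }' "$@" | sort -u
}
```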

Thanks,

Nick
--------------------
BYU Unix Users Group 
http://uug.byu.edu/ 

The opinions expressed in this message are the responsibility of their
author.  They are not endorsed by BYU, the BYU CS Department or BYU-UUG. 
___________________________________________________________________
List Info (unsubscribe here): http://uug.byu.edu/mailman/listinfo/uug-list
