I use debmirror to maintain a local mirror of the Ubuntu repositories. A cron job runs a script I wrote every couple of hours, and it seems to work really well, with one exception: Ubuntu moves its out-of-support releases to a separate repository so that only the current releases remain in the main one. The problem is that we still have a few people using these older releases, and I don't want to break their systems by dropping all of those older files.
When you run debmirror it normally deletes all the files that no longer exist on the source mirror. This is good because it keeps your local mirror pruned, but bad in my case because it would delete all the files for the feisty and gutsy Ubuntu releases we still want. Our solution so far has been to run debmirror with the --nocleanup option so that it skips this pruning. I'd like to improve on that if possible. Right now I have three ideas:

1) Use a script to move all the old-release files out of the main mirror directory before each update, run debmirror, and then move them all back.

2) Move all the old files to a new location on the filesystem. Run debmirror like normal, but after each run create symbolic links in the mirror directory for all the old files. These links would need to be re-created after every debmirror run.

3) apt retrieves files from a Debian mirror over HTTP. This means I could move all the old files to a new location and then perform some tricks with Apache so that requests for the old files get shunted to the new location while requests for current stuff go to the normal mirror location.

There are about 102,000 files associated with the two out-of-support Ubuntu releases we currently care about, and there may eventually be more. The server running this uses the JFS filesystem.

I'm not a big fan of the first option, but it would be the most straightforward. The second option is better but still involves re-creating hundreds of thousands of symlinks over and over. The third option certainly seems the cleanest, but I don't know how well Apache will handle having over 100,000 Alias directives in the site configuration (you can't group them and do something with mod_rewrite, since all the files are stored in a central pool according to their name; see http://archive.ubuntu.com/ubuntu/pool/ if you're not sure what a Debian repository looks like). There might be a better way to go about this, but I haven't found one.
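For the third option, one Apache trick that sidesteps per-file Alias directives entirely: since the requests are plain HTTP GETs, mod_rewrite can test whether the requested path exists under the live mirror and fall back to the old tree only when it doesn't, so no per-file grouping by name is needed. A sketch only; the paths and hostname here are assumptions:

```apache
# Sketch -- assumes the live mirror is served from /srv/mirror and the
# relocated old-release files sit in /srv/mirror-old with the same layout.
<VirtualHost *:80>
    ServerName mirror.example.com
    DocumentRoot /srv/mirror

    # One alias for the whole old tree instead of 100,000 of them.
    Alias /old /srv/mirror-old

    RewriteEngine On
    # If the file is missing from the live mirror but present in the
    # old tree, serve it from there transparently.
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteCond /srv/mirror-old%{REQUEST_URI} -f
    RewriteRule ^ /old%{REQUEST_URI} [PT,L]
</VirtualHost>
```

The -f existence test runs per request rather than per configured alias, so the config stays two rules long no matter how many old files there are.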
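For the second option, here's a minimal sketch of what the re-linking step could look like. The paths are hypothetical; it assumes the old files were moved into an archive tree that keeps the same pool layout as the mirror:

```python
import os

def relink_old_releases(archive_root, mirror_root):
    """Recreate symlinks in the mirror for every file kept in the
    archive tree. archive_root is assumed to mirror the pool layout,
    e.g. pool/main/h/hello/hello_2.2-2_i386.deb."""
    created = 0
    for dirpath, _dirnames, filenames in os.walk(archive_root):
        rel = os.path.relpath(dirpath, archive_root)
        target_dir = os.path.join(mirror_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link = os.path.join(target_dir, name)
            # Skip links debmirror left alone; only recreate pruned ones.
            if not os.path.lexists(link):
                os.symlink(os.path.join(dirpath, name), link)
                created += 1
    return created
```

Run right after each debmirror pass; because existing links are skipped, a run where debmirror pruned nothing does no filesystem writes at all.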
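Either of the first two options needs the list of old-release files, and the Packages index of each old release gives exactly that: blank-line-separated stanzas whose Filename field holds the pool-relative path. A minimal sketch of the parsing involved (assuming an already-decompressed Packages file):

```python
def pool_paths(packages_text):
    """Yield the pool-relative path (Filename field) of every package
    stanza in the text of a Packages file. Stanzas are separated by
    blank lines; each field is "Name: value"."""
    for stanza in packages_text.split("\n\n"):
        for line in stanza.splitlines():
            if line.startswith("Filename: "):
                yield line[len("Filename: "):].strip()
                break
```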
debmirror does have an --ignore=regex option that prevents it from deleting files matching the regex, but with the number of files I'm dealing with that seems completely useless. Anyone have any thoughts? I've already written a little Python script that can load and parse the Packages file listing all the packages in a specific repository, so that part isn't a big deal. My biggest concern with the first two methods is possible hidden gotchas in doing that many recurring filesystem operations (almost a million per day). For the third, I don't know about the scalability of Apache's mod_alias (or of something better?).

Thanks,
Nick

--------------------
BYU Unix Users Group
http://uug.byu.edu/
