[Vserver] [PATCH] Add support for FAI build method
The attached patch adds support for a FAI build method to util-vserver.
It is of proof-of-concept quality; it works, but could do with a bit of
refactoring.

This patch requires the vserver branch of FAI, browsable online at
http://svn.debian.org/wsvn/fai/people/mugwump/vserver/

Detection of this branch is still TO-DO.
---
 Makefile.in               |    1
 scripts/Makefile-files    |    1
 scripts/vserver-build     |    6 +
 scripts/vserver-build.fai |  182 ++
 4 files changed, 189 insertions(+), 1 deletion(-)

#! /bin/sh /usr/share/dpatch/dpatch-run
## 22_fai.dpatch by Sam Vilain [EMAIL PROTECTED]
##
## All lines beginning with `## DP:' are a description of the patch.
## DP: No description.

@DPATCH@
diff -urN util-vserver-0.30.210.orig/Makefile.in util-vserver-0.30.210/Makefile.in
--- util-vserver-0.30.210.orig/Makefile.in	2006-06-20 17:31:04.0 +1200
+++ util-vserver-0.30.210/Makefile.in	2006-06-20 17:30:02.0 +1200
@@ -1916,6 +1916,7 @@
 		scripts/vserver-build.apt-rpm \
 		scripts/vserver-build.skeleton \
 		scripts/vserver-build.debootstrap \
+		scripts/vserver-build.fai \
 		scripts/vserver-build.rpm \
 		scripts/vserver-build.yum \
 		scripts/vserver-build.functions \
--- util-vserver-0.30.210.orig/scripts/Makefile-files	2006-06-20 17:31:04.0 +1200
+++ util-vserver-0.30.210/scripts/Makefile-files	2006-06-20 17:30:02.0 +1200
@@ -41,6 +41,7 @@
 		scripts/vserver-build.apt-rpm \
 		scripts/vserver-build.skeleton \
 		scripts/vserver-build.debootstrap \
+		scripts/vserver-build.fai \
 		scripts/vserver-build.rpm \
 		scripts/vserver-build.yum \
 		scripts/vserver-build.functions \
diff -urN util-vserver-0.30.210.orig/scripts/vserver-build util-vserver-0.30.210/scripts/vserver-build
--- util-vserver-0.30.210.orig/scripts/vserver-build	2006-06-20 17:31:04.0 +1200
+++ util-vserver-0.30.210/scripts/vserver-build	2006-06-20 17:30:02.0 +1200
@@ -66,6 +66,10 @@
     configuration file and calls an optional command then
 debootstrap ... -- -d distribution [-m mirror] [-s script ] [-- debootstrap-options*]
     bootstraps the vserver with Debian's 'debootstrap' package
+fai ... -- -f fai_vserver [-n nfsroot] [-d fai_dir ] [ -a ]
+    bootstraps the vserver with Debian Fully Automatic Installation
+    -a means skip confirmation prompt and final shell.
+

 Please report bugs to $PACKAGE_BUGREPORT
     exit 0
@@ -126,7 +130,7 @@
 case x$method in
     (xlegacy)	exec $_VSERVER_LEGACY $VSERVER_NAME build "$@" ;;
-    (xapt-rpm|xcopy|xskeleton|xdebootstrap|xyum|xrpm)
+    (xapt-rpm|xcopy|xskeleton|xdebootstrap|xyum|xrpm|xfai)
 	. $__PKGLIBDIR/vserver-build.$method
 	;;
     (x)	panic $"No build-method specified";;
diff -urN util-vserver-0.30.210.orig/scripts/vserver-build.fai util-vserver-0.30.210/scripts/vserver-build.fai
--- util-vserver-0.30.210.orig/scripts/vserver-build.fai	1970-01-01 12:00:00.0 +1200
+++ util-vserver-0.30.210/scripts/vserver-build.fai	2006-06-20 17:30:18.0 +1200
@@ -0,0 +1,182 @@
+#
+# Copyright (C) 2006 Sam Vilain [EMAIL PROTECTED]
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+tmp=$(getopt -o '+d:+f:+n:+a' --long debug,pkgmgmt -n "$0" -- "$@") || exit 1
+eval set -- "$tmp"
+
+. $_LIB_VSERVER_BUILD_FUNCTIONS_PKGMGMT
+
+DISTRIBUTION=:
+
+FAI_VSERVER=
+FAI_NFSROOT=/usr/lib/fai/nfsroot
+FAI_DIR=/usr/local/share/fai
+
+use_pkgmgmt=
+while true; do
+    case $1 in
+	-f)		FAI_VSERVER=$2; shift; ;;
+	-n)		FAI_NFSROOT=$2; shift; ;;
+	-d)		FAI_DIR=$2; shift; ;;
+	-a)		AUTO=1; ;;
+	--debug)	DEBUG=1; SH_DEBUG=-x; set -x;;
+	--)		shift; break ;;
+	*)		echo vserver-build.fai: internal error: unrecognized option
[Vserver] [PATCH] scripts/vserver-build: fix documentation for fai build method
The documentation was a little out of date with respect to what the
command supports. This makes the entry for this build method a little
long, but at least it's there.
---
 vserver-build |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- scripts/vserver-build.orig	2006-09-27 11:41:48.0 +1200
+++ scripts/vserver-build	2006-09-27 11:44:50.0 +1200
@@ -66,8 +66,12 @@
     configuration file and calls an optional command then
 debootstrap ... -- -d distribution [-m mirror] [-s script ] [-- debootstrap-options*]
     bootstraps the vserver with Debian's 'debootstrap' package
-fai ... -- -f fai_vserver [-n nfsroot] [-d fai_dir ] [ -a ]
+fai ... -- [ -f fai_vserver ] [-n nfsroot] [-d fai_dir ] [ -a ]
     bootstraps the vserver with Debian Fully Automatic Installation
+    -f means use the nfsroot and profile in the vserver fai_vserver
+    -n nfsroot specifies the 'NFS' root explicitly
+    -d fai_dir specifies the location of the FAI profile
+    the -f option implies -n and -d are relative to the fai_vserver
     -a means skip confirmation prompt and final shell.

_______________________________________________
Vserver mailing list
Vserver@list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver
Re: [Vserver] localhost oddity on vserver host
Herbert Poetzl wrote:
> basically I do not see a good reason for assigning 127.x.x.x to a
> guest, but if you have to, then try to choose different ones, e.g.
> 127.0.0.2, 127.0.0.3 ...

Does that work with ssh port forwarding? I ran into this problem when I
tried that:

  http://sources.redhat.com/ml/libc-alpha/2002-10/msg00045.html

Sam.
Re: [Vserver] whole subnet on vServer / performance
Herbert Poetzl wrote:
> On Tue, May 30, 2006 at 04:26:10PM +0200, Oliver Welter wrote:
>> Hi Folks,
>>
>> there was a discussion some time ago here on multiple IPs assigned
>> to one vServer. I now have the need to assign a 32-address net to
>> one guest - has anybody here done some tests on this, or can anyone
>> give me a go/no-go for this?
>
> well, it should not be _that_ bad, but it will be roughly 16 times
> slower than with two IPs

ITYM, the average speed decrease expected is approximately 16 times the
speed decrease seen with two IPs vs. one IP. Which is still probably
very little - has anyone been able / inclined to measure it yet?

> btw, this is a limitation which will fall in the _very_ near
> future ...
Re: [Vserver] /vservers as an nfs mount?
Chuck wrote:
> we are completely restructuring our entire physical network around
> the vserver concept. it has proven itself in stability and
> performance in production to the point we no longer see the need for
> dedicated servers except in the most demanding instances (mostly our
> email server which cannot be run as a guest until there is no slow
> down using 130 ip addresses).
>
> in our network restructuring, we wish to use our large storage nfs
> system and place all the vserver guests on that sharing those
> directories to be mounted on the proper dual opteron machine front
> end as /vservers.
>
> i am seriously thinking of also making /etc/vservers an nfs mount so
> that each host configuration and guests live in a particular area on
> the nfs to make switching machines a breeze if so needed.
>
> does anyone see a problem with this idea?
>
> we will be using dual GB nics into this nfs system in a pvtnet from
> each machine to facilitate large amounts of data flow. public ip
> space will still use 100mb nics.
>
> if this can work efficiently (most of our guests are not disk i/o
> bound.. those with ultra heavy disk i/o will live on each front end
> machine), we can consolidate more than 100 machines into 2 front end
> machines and one SAN system. This would free enough rack space that
> if we don't need any dedicated machines in the future we could easily
> add more than 1500 servers in host/guest config in the same space 100
> took up. it would also hugely simplify backups and drop our electric
> bill in half or more.

Nice idea. NFS is certainly right for /etc/vservers, but consider using
a network block device, like iSCSI or ATA over Ethernet, for the
filesystems used by the vservers themselves. You'll save yourself a lot
of headaches and the thing will probably run a *lot* faster.

Unification would be impractical on top of all of this, but this is
probably not a huge problem.

Sam.
Re: [Vserver] Issues of Security
Oliver Welter wrote:
> Hi Manish,
>
>> Has anybody done any work or study on security of vserver. What are
>> the possible security downsides and possible areas of attack on
>> vserver both from other vservers on the same host and from external
>> agent. Any pointers on this would be very helpful. Thanks,
>
> I haven't done a study, but from the basic idea behind vserver the
> following issues are relevant:
>
> * if we assume the context isolation works without errors, the risk
>   of guest-to-guest attacks is equal to that between physically
>   independent servers

That's quite a foolish assumption. It's probably safe to assume by
default that local exploit vulnerabilities now affect your cross-vserver
cluster, unless proven otherwise.

Let's not be cavalier about this. No-one has done the full auditing yet
to prove that any of the single-kernel virtualisation systems are truly
isolated. As far as we know, all the holes have been closed - certainly
since the /proc filtering the level of uncertainty has dropped sharply -
but there are still an awful lot of entry points into the kernel, each
of which might have a security bug.

Sam.

> * for non-root users it is impossible to attack a guest from the host
>   side
> * it IS possible - and with a faulty setup very likely - that a rising
>   need for resources (IO, mem, network) of a guest affects the other
>   guests, as they share the same physical machine. The scheduler
>   concept might help here
> * If there is a flaw in the isolation code of vserver OR someone
>   manages to exploit a kernel bug to load some modules from inside a
>   guest, all of the above is no longer true. I don't know if anybody
>   here has practical results on this
>
> As I don't know what you mean by external agents I can't help you on
> this. If you simply mean attacks from outside, vserver is no more
> vulnerable than any other system. A bad setup of some services might
> enable an attacker to take over the guest with root privs, but even in
> this case he will not have that much fun, as a lot of things are not
> allowed inside a guest. E.g. he can't spawn new IPs, compromise your
> kernel, etc. This behaviour can be improved by tailoring the
> capabilities of the guest.
>
> HTH
>
> Oliver
Re: [Vserver] [PATCH 00/10] Honour per-vfsmount mount options
Sam Vilain wrote:
> This patchset allows per-VFS mount options, such as noatime,
> nodiratime, and in particular, read-only. ie, `mount -o ro --bind`
> can work with this patch.
>
> This is the invention of Herbert Pötzl.

So, here's what's new;

 1. more parts

    Even more fine grained :-)

 2. longer descriptions

    Following the TPP document to the letter, and I hope my descriptive
    re-expressions of concept improve chances of acceptance and do not
    seem too long.

 3. patch 8 is a new piece, though I see now from http://xrl.us/j8dk
    that this was not an accidental omission.

Still outstanding changes before next LKML submission;

 1. add file_readonly() helper - Christoph Hellwig
    http://xrl.us/j8ck

 2. audit uses of permission() - Trond Myklebust
    (second audit requirement now not relevant)
    http://xrl.us/j8ck

    a simple

      find . \( -name \*.h -o -name \*.c \) -print | xargs grep 'permission *(' | grep -w permission | grep NULL

    happens to catch all of these. They happen in:

      fs/hpfs/namei.c
      fs/nfsd/vfs.c (twice)
      fs/nfsd/nfsfh.c
      fs/namei.c
      fs/xattr.c
      ipc/mqueue.c

 3. split out "am I allowed to write to FS" from permission() -
    Christoph Hellwig
    http://xrl.us/j8cu

    This seems to be similar to what Trond is getting at: nameidata is
    optional to permission(), so it seems like the wrong place to put
    the check. This might be seen to apply to all of the changes in
    part 7 of this series.

Anyway, I'm putting this one down for a few days so as not to step on
your toes - I hope this helps your efforts, Herbert. If you want to
download them, you can get them from:

  http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc4-BME/
  (.tar.gz one level up)

Having said that, it's very easy for me to apply small changes and
regenerate the set, so let me know if you want me to help in this way.

In any case, this effort has been thoroughly enlightening for me as to
how the submission process works, and I hope it will help me as I now
focus my work on features in the main patch.

Sam.
Re: [Vserver] [PATCH 0/5] Bind Mount Extensions
Herbert Poetzl wrote:
>> This is the invention of Herbert Pötzl.
>>
>> (sent to the Linux-VServer list as a 'dry run', and to give Herbert
>> a chance to veto/comment)
>
> are there any changes to this one?
>
>   http://lkml.org/lkml/2006/1/21/19
>
> (except for possible updates to newer kernels?)

Heh, well, that's a pretty good reason to veto it :). I should have
searched the archives to look for an earlier submission of yours to
compare it to.

Looking through the archives, I see you've also submitted the Quota
Hash abstraction, any more?

In terms of actual changes, I did not split up the individual changes
to the VFS API; that's one combined patch in this set.

I'll import your submission to my repository and see if I can prepare a
patch that contains the best of our parallel effort.

Sam.
Re: [Vserver] Unifying Gentoo Guests
Oliver Welter wrote:
>>> eergh - it seems that vunify does not support gentoo guest. Anyone
>>> here can help me out ?
>>
>> Implementing the 'get-conffiles' operation for the 'gentoo' case in
>> 'scripts/vpkg' should help. I do not know gentoo enough to develop
>> it myself.
>
> As gentoo has no binary packages and the result of compilation
> depends on LOTS of flags I see no way to make this...

My unify-dirs script is completely indifferent to the packaging system
in use by the installed systems. So long as the files have the same
contents, permissions, ownership and relative location, they will be
unified.

However it does currently rely on a 'legacy' ioctl.

  http://vserver.utsl.gen.nz/scripts/unify-dirs

(to be included with non-legacy support in Linux::VServer)

Sam.
Re: [Vserver] Unifying Gentoo Guests
Herbert Poetzl wrote:
>> My unify-dirs script is completely indifferent to the packaging
>> system in use by the installed systems. So long as the files have
>> the same contents, permissions, ownership and relative location,
>> they will be unified.
>>
>> However it does currently rely on a 'legacy' ioctl.
>
> hmm .. so it uses legacy ioctls, but is intended for development
> releases (with CoW LB) I presume, as you would not want to unify
> everything without that ...

No, it's the classic immulink approach only, no CoW knowledge. You
would normally only use it to unify the directories where there are
shared libraries and binaries, eg with

  unify-dirs -il /vservers/*/usr
  unify-dirs -il /vservers/*/lib
  unify-dirs -il /vservers/*/bin
  unify-dirs -il /vservers/*/sbin

Otherwise you will end up with /etc all immutable, which is a bit of a
PITA (although most editors will deal with it OK).

>> http://vserver.utsl.gen.nz/scripts/unify-dirs
>>
>> (to be included with non-legacy support in Linux::VServer)
>
> I guess vhashify is doing similar and probably much better, as it
> doesn't care about the file location (i.e. the hash value is
> sufficient)

Right. Well, I thought they posted problems with that script needing
package database awareness, too, so I thought I'd mention my script.

Sam.
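The unification criterion described above (same contents, permissions
and ownership, with the caller pairing up files at the same relative
location) could be sketched roughly as follows. This is only an
illustration and not the unify-dirs script itself, which is not shown
in this thread; the names same_contents() and can_unify() are
hypothetical, and the sketch only decides eligibility without doing any
hard-linking or setting of immutable flags.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if both files hold identical bytes, else 0. */
static int same_contents(const char *a, const char *b)
{
	FILE *fa = fopen(a, "rb");
	FILE *fb = fopen(b, "rb");
	int same = (fa != NULL && fb != NULL);

	while (same) {
		char ba[4096], bb[4096];
		size_t na = fread(ba, 1, sizeof ba, fa);
		size_t nb = fread(bb, 1, sizeof bb, fb);

		if (na != nb || memcmp(ba, bb, na) != 0)
			same = 0;	/* lengths or bytes differ */
		else if (na == 0)
			break;		/* both files hit EOF together */
	}
	if (fa)
		fclose(fa);
	if (fb)
		fclose(fb);
	return same;
}

/*
 * Hypothetical eligibility check: two regular files qualify when mode
 * (type and permission bits), owner, group, size and contents match.
 * The "same relative location" rule is assumed to be enforced by the
 * caller, which passes the same relative path under two guest roots.
 */
static int can_unify(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return 0;
	if (!S_ISREG(sa.st_mode) || !S_ISREG(sb.st_mode))
		return 0;
	if (sa.st_mode != sb.st_mode || sa.st_uid != sb.st_uid ||
	    sa.st_gid != sb.st_gid || sa.st_size != sb.st_size)
		return 0;
	return same_contents(a, b);
}
```

A caller would walk one guest's tree and call can_unify() on the
matching path under each other guest before linking the two files.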
[Vserver] [PATCH 04/10] vfs: make show_vfsmnt show per-vfsmount options correctly
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, some mount options were per-mount, and some per-vfsmount,
and never the twain did meet. As now there is a mix of options that are
now both per-vfsmount and per-mount, we refactor this function to deal
with it correctly.

This is implemented by combining the two previous source-embedded
relations to a single one. With one column for the mount flag (MS_*),
and one for the VFS mount flag (MNT_*), we end up with an extra column.

Additionally, to avoid ro vs rw being a special case, another column is
added to the relation to represent strings that are printed in the
options output column when the bit is unset.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 fs/namespace.c |   49 ++++++++++++++++++++++++++-----------------------
 1 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b12ea35..f3191d8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -354,37 +354,40 @@ static int show_vfsmnt(struct seq_file *
 	struct vfsmount *mnt = v;
 	int err = 0;
 	static struct proc_fs_info {
-		int flag;
-		char *str;
+		int s_flag;
+		int mnt_flag;
+		char *set_str;
+		char *unset_str;
 	} fs_info[] = {
-		{ MS_SYNCHRONOUS, ",sync" },
-		{ MS_DIRSYNC, ",dirsync" },
-		{ MS_MANDLOCK, ",mand" },
-		{ 0, NULL }
+		{ MS_RDONLY, MNT_RDONLY, "ro", "rw" },
+		{ MS_SYNCHRONOUS, 0, ",sync", NULL },
+		{ MS_DIRSYNC, 0, ",dirsync", NULL },
+		{ MS_MANDLOCK, 0, ",mand", NULL },
+		{ MS_NOATIME, MNT_NOATIME, ",noatime", NULL },
+		{ MS_NODIRATIME, MNT_NODIRATIME, ",nodiratime", NULL },
+		{ 0, MNT_NOSUID, ",nosuid", NULL },
+		{ 0, MNT_NODEV, ",nodev", NULL },
+		{ 0, MNT_NOEXEC, ",noexec", NULL },
+		{ 0, 0, NULL, NULL }
 	};
-	static struct proc_fs_info mnt_info[] = {
-		{ MNT_NOSUID, ",nosuid" },
-		{ MNT_NODEV, ",nodev" },
-		{ MNT_NOEXEC, ",noexec" },
-		{ MNT_NOATIME, ",noatime" },
-		{ MNT_NODIRATIME, ",nodiratime" },
-		{ 0, NULL }
-	};
-	struct proc_fs_info *fs_infop;
+	struct proc_fs_info *p;
+	unsigned long s_flags = mnt->mnt_sb->s_flags;
+	int mnt_flags = mnt->mnt_flags;

 	mangle(m, mnt->mnt_devname ? mnt->mnt_devname : "none");
 	seq_putc(m, ' ');
 	seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
 	seq_putc(m, ' ');
 	mangle(m, mnt->mnt_sb->s_type->name);
-	seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw");
-	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
-		if (mnt->mnt_sb->s_flags & fs_infop->flag)
-			seq_puts(m, fs_infop->str);
-	}
-	for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
-		if (mnt->mnt_flags & fs_infop->flag)
-			seq_puts(m, fs_infop->str);
+	seq_putc(m, ' ');
+	for (p = fs_info; (p->s_flag | p->mnt_flag) ; p++) {
+		if ((s_flags & p->s_flag) || (mnt_flags & p->mnt_flag)) {
+			if (p->set_str)
+				seq_puts(m, p->set_str);
+		} else {
+			if (p->unset_str)
+				seq_puts(m, p->unset_str);
+		}
 	}
 	if (mnt->mnt_sb->s_op->show_options)
 		err = mnt->mnt_sb->s_op->show_options(m, mnt);
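The single-table approach described in this patch can be tried outside
the kernel with a small userspace sketch. It assumes a trimmed table
(only four of the patch's rows) and writes into a caller-supplied
buffer instead of a seq_file; struct opt_info and format_options() are
hypothetical names used for illustration, while the flag values mirror
the kernel headers of that era.

```c
#include <stddef.h>
#include <string.h>

/* Flag values as in the kernel headers of that era. */
#define MS_RDONLY	1
#define MS_SYNCHRONOUS	16
#define MS_NOATIME	1024
#define MNT_NOSUID	0x01
#define MNT_NOATIME	0x08
#define MNT_RDONLY	0x20	/* added by patch 02 of this series */

/* One row per option: superblock flag, vfsmount flag, and the text to
 * print when the option is set or unset (NULL prints nothing). */
struct opt_info {
	unsigned long s_flag;
	int mnt_flag;
	const char *set_str;
	const char *unset_str;
};

static const struct opt_info opt_table[] = {
	{ MS_RDONLY,      MNT_RDONLY,  "ro",       "rw" },
	{ MS_SYNCHRONOUS, 0,           ",sync",    NULL },
	{ MS_NOATIME,     MNT_NOATIME, ",noatime", NULL },
	{ 0,              MNT_NOSUID,  ",nosuid",  NULL },
	{ 0,              0,           NULL,       NULL }	/* terminator */
};

/* Build the options column: an option counts as set if either the
 * superblock flags or the per-vfsmount flags carry its bit. */
static void format_options(unsigned long s_flags, int mnt_flags,
			   char *buf, size_t len)
{
	const struct opt_info *p;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (p = opt_table; p->s_flag | p->mnt_flag; p++) {
		const char *s;

		if ((s_flags & p->s_flag) || (mnt_flags & p->mnt_flag))
			s = p->set_str;
		else
			s = p->unset_str;
		if (s)
			strncat(buf, s, len - strlen(buf) - 1);
	}
}
```

Because "ro" and "rw" are just the set and unset strings of the first
row, a read-only bind mount on top of a read-write superblock prints
"ro" without any special-casing.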
[Vserver] [PATCH 01/10] vfs: propagate mnt_flags into do_loopback/vfsmount
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, the do_loopback function took a single bit only out of the
mnt_flags in do_mount - the recurse bit. We want to allow per-vfsmount
flags, so to enable this we pass the mnt_flags into do_loopback().

Add an extra parameter to the do_loopback function that sets up a bind
mount, that allows the value of vfsmount.mnt_flags to be set.

Signed-off-by: Herbert Pötzl [EMAIL PROTECTED]
Acked-by: Christoph Hellwig [EMAIL PROTECTED]
Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 fs/namespace.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 058a448..1094e54 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -861,11 +861,13 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name,
+	unsigned long flags, int mnt_flags)
 {
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err = mount_is_safe(nd);
+	int recurse = flags & MS_REC;
 	if (err)
 		return err;
 	if (!old_name || !*old_name)
@@ -891,6 +893,8 @@ static int do_loopback(struct nameidata
 	if (!mnt)
 		goto out;

+	mnt->mnt_flags = mnt_flags;
+
 	err = graft_tree(mnt, nd);
 	if (err) {
 		LIST_HEAD(umount_list);
@@ -1312,7 +1316,7 @@ long do_mount(char *dev_name, char *dir_
 		retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+		retval = do_loopback(&nd, dev_name, flags, mnt_flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
[Vserver] [PATCH 05/10] vfs: propagate vfsmount into chown_common()
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, chown_common just deals in dentry structures, which are not
connected to the mount whence they came. As the permissions for the
dentry may now vary depending on which VFS point the dentry came from,
the vfsmount struct is propagated into chown_common() to allow for
vfsmount based checks there.

An accompanying MNT_IS_RDONLY() check inside chown_common rejects
changes to the dentry.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 fs/open.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 70e0230..8632721 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -686,7 +686,8 @@ asmlinkage long sys_chmod(const char __u
 	return sys_fchmodat(AT_FDCWD, filename, mode);
 }

-static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
+static int chown_common(struct dentry *dentry, struct vfsmount *mnt,
+	uid_t user, gid_t group)
 {
 	struct inode * inode;
 	int error;
@@ -698,7 +699,7 @@ static int chown_common(struct dentry *
 		goto out;
 	}
 	error = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt))
 		goto out;
 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
@@ -728,7 +729,7 @@ asmlinkage long sys_chown(const char __u

 	error = user_path_walk(filename, &nd);
 	if (!error) {
-		error = chown_common(nd.dentry, user, group);
+		error = chown_common(nd.dentry, nd.mnt, user, group);
 		path_release(&nd);
 	}
 	return error;
@@ -761,7 +762,7 @@ asmlinkage long sys_lchown(const char __

 	error = user_path_walk_link(filename, &nd);
 	if (!error) {
-		error = chown_common(nd.dentry, user, group);
+		error = chown_common(nd.dentry, nd.mnt, user, group);
 		path_release(&nd);
 	}
 	return error;
@@ -775,7 +776,7 @@ asmlinkage long sys_fchown(unsigned int

 	file = fget(fd);
 	if (file) {
-		error = chown_common(file->f_dentry, user, group);
+		error = chown_common(file->f_dentry, file->f_vfsmnt, user, group);
 		fput(file);
 	}
 	return error;
[Vserver] [PATCH 08/10] vfs: make touch/atime functions correctly regard vfsmount RO flag
Previously, only the inode's idea of a read-only flag was considered for
touch_atime() and file_update_time(). Add a call to the MNT_IS_RDONLY
macro to correctly exclude this update on read-only vfsmounts.
---
 fs/inode.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index d0be615..2d26a4e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1190,7 +1190,7 @@ void touch_atime(struct vfsmount *mnt, s
 	struct inode *inode = dentry->d_inode;
 	struct timespec now;

-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt))
 		return;

 	if ((inode->i_flags & S_NOATIME) ||
@@ -1235,7 +1235,7 @@ void file_update_time(struct file *file)
 	if (IS_NOCMTIME(inode))
 		return;

-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(file->f_vfsmnt))
 		return;

 	now = current_fs_time(inode->i_sb);
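The combined check used here (superblock read-only OR vfsmount
read-only) can be exercised in userspace with a sketch like the one
below. The struct definitions are cut-down stand-ins for the kernel
structures, and may_update_time() is a hypothetical helper named for
illustration; only the two macros follow the patches in this series.

```c
#include <stddef.h>

#define MS_RDONLY	1	/* superblock-level read-only flag */
#define MNT_RDONLY	0x20	/* per-vfsmount flag from patch 02 */

/* Cut-down stand-ins for the kernel structures (assumed layout). */
struct sb_sketch { unsigned long s_flags; };
struct inode_sketch { struct sb_sketch *i_sb; };
struct vfsmount_sketch { int mnt_flags; };

#define IS_RDONLY(inode)  ((inode)->i_sb->s_flags & MS_RDONLY)
#define MNT_IS_RDONLY(m)  ((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Hypothetical helper: timestamps may only be touched when neither the
 * filesystem nor the mount the file was reached through is read-only.
 * A NULL mount is treated as "no per-mount restriction". */
static int may_update_time(const struct inode_sketch *inode,
			   const struct vfsmount_sketch *mnt)
{
	if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt))
		return 0;
	return 1;
}
```

The same inode can therefore be writable through one mount and frozen
through a read-only bind mount of the same tree.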
[Vserver] [PATCH 02/10] vfs: add MNT_RDONLY flag and accompanying MNT_IS_RDONLY macro
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, it was down to the filesystem to know whether it is mounted
read-only, and was of no interest to the VFS. We want per-mount
read-only.

Add an extra flag to the list of per-vfsmount flags for read-only, and a
macro to test it.

Also, make sure the new flag is put in the right place to make it to
do_loopback. As a side effect it is also passed to do_remount and
do_new_mount, where it has little discernible effect as the underlying
mount will be read-only.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 fs/namespace.c        |    2 ++
 include/linux/mount.h |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1094e54..b12ea35 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1289,6 +1289,8 @@ long do_mount(char *dev_name, char *dir_
 		((char *)data_page)[PAGE_SIZE - 1] = 0;

 	/* Separate the per-mountpoint flags */
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_RDONLY;
 	if (flags & MS_NOSUID)
 		mnt_flags |= MNT_NOSUID;
 	if (flags & MS_NODEV)
diff --git a/include/linux/mount.h b/include/linux/mount.h
index b7472ae..c485c2b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -22,6 +22,9 @@
 #define MNT_NOEXEC	0x04
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
+#define MNT_RDONLY	0x20
+
+#define MNT_IS_RDONLY(m)	((m) && ((m)->mnt_flags & MNT_RDONLY))

 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
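As a rough userspace illustration of what this patch adds, the sketch
below translates mount(2)-style MS_* flags into per-vfsmount MNT_*
flags, now including read-only, and tests the result with a
NULL-tolerant macro. struct vfsmount_sketch and ms_to_mnt_flags() are
hypothetical names; the numeric values follow the kernel headers of
that era.

```c
#include <stddef.h>

/* mount(2) flags */
#define MS_RDONLY	1
#define MS_NOSUID	2
#define MS_NODEV	4
#define MS_NOEXEC	8

/* per-vfsmount flags; MNT_RDONLY is the one added by this patch */
#define MNT_NOSUID	0x01
#define MNT_NODEV	0x02
#define MNT_NOEXEC	0x04
#define MNT_RDONLY	0x20

/* Cut-down stand-in for struct vfsmount (assumed layout). */
struct vfsmount_sketch { int mnt_flags; };

/* Evaluates to false for a NULL mount, so callers that have no
 * vfsmount at hand can still use it safely. */
#define MNT_IS_RDONLY(m) ((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Hypothetical helper mirroring the "separate the per-mountpoint
 * flags" step in do_mount(). */
static int ms_to_mnt_flags(unsigned long flags)
{
	int mnt_flags = 0;

	if (flags & MS_RDONLY)
		mnt_flags |= MNT_RDONLY;
	if (flags & MS_NOSUID)
		mnt_flags |= MNT_NOSUID;
	if (flags & MS_NODEV)
		mnt_flags |= MNT_NODEV;
	if (flags & MS_NOEXEC)
		mnt_flags |= MNT_NOEXEC;
	return mnt_flags;
}
```

With the flag carried per vfsmount, `mount -o ro --bind` can mark only
the bind mount read-only while the underlying filesystem stays
writable elsewhere.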
[Vserver] [PATCH 03/10] vfs: fix consistency of IS_RDONLY macro
From: Herbert Pötzl [EMAIL PROTECTED]

The IS_RDONLY macro is defined inconsistently. Tidy up.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 include/linux/fs.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e059da9..2a0866a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -150,7 +150,7 @@ extern int dir_notify_enable;
  */
 #define __IS_FLG(inode,flg) ((inode)->i_sb->s_flags & (flg))

-#define IS_RDONLY(inode) ((inode)->i_sb->s_flags & MS_RDONLY)
+#define IS_RDONLY(inode)	__IS_FLG(inode, MS_RDONLY)
 #define IS_SYNC(inode)		(__IS_FLG(inode, MS_SYNCHRONOUS) || \
 					((inode)->i_flags & S_SYNC))
 #define IS_DIRSYNC(inode)	(__IS_FLG(inode, MS_SYNCHRONOUS|MS_DIRSYNC) || \
[Vserver] [PATCH 00/10] Honour per-vfsmount mount options
This patchset allows per-VFS mount options, such as noatime,
nodiratime, and in particular, read-only. ie, `mount -o ro --bind` can
work with this patch.

This is the invention of Herbert Pötzl.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 10/10] fs: check mount permissions with MNT_IS_RDONLY() when using IS_RDONLY()
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, permissions were only per-inode. As permissions may now be
per-vfsmount, any previous checks for inodes being read-only via the
IS_RDONLY() macro and not via VFS functions like permission() must now
be extended to also check the mount permissions with MNT_IS_RDONLY().

Acked-by: Sam Vilain [EMAIL PROTECTED]
---
 arch/sparc64/solaris/fs.c |    4 ++--
 fs/ext2/ioctl.c           |    7 +++++--
 fs/ext3/ioctl.c           |   16 +++++++++++-----
 fs/hfsplus/ioctl.c        |    3 ++-
 fs/nfs/dir.c              |    3 ++-
 fs/nfsd/vfs.c             |    6 ++++--
 fs/reiserfs/ioctl.c       |    7 +++++--
 7 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/sparc64/solaris/fs.c b/arch/sparc64/solaris/fs.c
index 4885ca6..612477d 100644
--- a/arch/sparc64/solaris/fs.c
+++ b/arch/sparc64/solaris/fs.c
@@ -363,7 +363,7 @@ static int report_statvfs(struct vfsmoun
 		int j = strlen (p);

 		if (j > 15) j = 15;
-		if (IS_RDONLY(inode)) i = 1;
+		if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt)) i = 1;
 		if (mnt->mnt_flags & MNT_NOSUID) i |= 2;
 		if (!sysv_valid_dev(inode->i_sb->s_dev))
 			return -EOVERFLOW;
@@ -399,7 +399,7 @@ static int report_statvfs64(struct vfsmo
 		int j = strlen (p);

 		if (j > 15) j = 15;
-		if (IS_RDONLY(inode)) i = 1;
+		if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt)) i = 1;
 		if (mnt->mnt_flags & MNT_NOSUID) i |= 2;
 		if (!sysv_valid_dev(inode->i_sb->s_dev))
 			return -EOVERFLOW;
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index 3ca9afd..8f1c64a 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -11,6 +11,7 @@
 #include <linux/capability.h>
 #include <linux/time.h>
 #include <linux/sched.h>
+#include <linux/mount.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>

@@ -30,7 +31,8 @@ int ext2_ioctl (struct inode * inode, st
 	case EXT2_IOC_SETFLAGS: {
 		unsigned int oldflags;

-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;

 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -69,7 +71,8 @@ int ext2_ioctl (struct inode * inode, st
 	case EXT2_IOC_SETVERSION:
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
 			return -EPERM;
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 		if (get_user(inode->i_generation, (int __user *) arg))
 			return -EFAULT;
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 556cd55..2b83f7e 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -8,6 +8,7 @@
  */

 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/jbd.h>
 #include <linux/capability.h>
 #include <linux/ext3_fs.h>
@@ -36,7 +37,8 @@ int ext3_ioctl (struct inode * inode, st
 		unsigned int oldflags;
 		unsigned int jflag;

-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;

 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -112,7 +114,8 @@ flags_err:

 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
 			return -EPERM;
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 		if (get_user(generation, (int __user *) arg))
 			return -EFAULT;
@@ -166,7 +169,8 @@ flags_err:
 		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
 			return -ENOTTY;

-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;

 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -201,7 +205,8 @@ flags_err:
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;

-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;

 		if (get_user(n_blocks_count, (__u32 __user *)arg))
@@ -222,7 +227,8 @@ flags_err:
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;

-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt
[Vserver] [PATCH 09/10] vfs: propagate the vfsmount into *xattr()
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, the xattr VFS interface did not take into account the VFS
context from which a dentry was acquired. As the per-vfsmount flags may
now affect permissions, the vfsmount structure must be passed into the
*xattr interface. Checks for the MNT_RDONLY flag (via MNT_IS_RDONLY) are
also added to setxattr() and removexattr(), which return EROFS
accordingly.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---

 fs/xattr.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 80eca7d..ad83a51 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -17,6 +17,7 @@
 #include <linux/syscalls.h>
 #include <linux/module.h>
 #include <linux/fsnotify.h>
+#include <linux/mount.h>
 #include <asm/uaccess.h>
 
 
@@ -167,7 +168,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
  */
 static long
 setxattr(struct dentry *d, char __user *name, void __user *value,
-	 size_t size, int flags)
+	 size_t size, int flags, struct vfsmount *mnt)
 {
 	int error;
 	void *kvalue = NULL;
@@ -194,6 +195,9 @@ setxattr(struct dentry *d, char __user *
 		}
 	}
 
+	if (MNT_IS_RDONLY(mnt))
+		return -EROFS;
+
 	error = vfs_setxattr(d, kname, kvalue, size, flags);
 	kfree(kvalue);
 	return error;
@@ -209,7 +213,7 @@ sys_setxattr(char __user *path, char __u
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
-	error = setxattr(nd.dentry, name, value, size, flags);
+	error = setxattr(nd.dentry, name, value, size, flags, nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -224,7 +228,7 @@ sys_lsetxattr(char __user *path, char __
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
-	error = setxattr(nd.dentry, name, value, size, flags);
+	error = setxattr(nd.dentry, name, value, size, flags, nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -239,7 +243,7 @@ sys_fsetxattr(int fd, char __user *name,
 	f = fget(fd);
 	if (!f)
 		return error;
-	error = setxattr(f->f_dentry, name, value, size, flags);
+	error = setxattr(f->f_dentry, name, value, size, flags, f->f_vfsmnt);
 	fput(f);
 	return error;
 }
@@ -412,7 +416,7 @@ sys_flistxattr(int fd, char __user *list
  * Extended attribute REMOVE operations
  */
 static long
-removexattr(struct dentry *d, char __user *name)
+removexattr(struct dentry *d, char __user *name, struct vfsmount *mnt)
 {
 	int error;
 	char kname[XATTR_NAME_MAX + 1];
@@ -423,6 +427,9 @@ removexattr(struct dentry *d, char __use
 	if (error < 0)
 		return error;
 
+	if (MNT_IS_RDONLY(mnt))
+		return -EROFS;
+
 	return vfs_removexattr(d, kname);
 }
 
@@ -435,7 +442,7 @@ sys_removexattr(char __user *path, char
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
-	error = removexattr(nd.dentry, name);
+	error = removexattr(nd.dentry, name, nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -449,7 +456,7 @@ sys_lremovexattr(char __user *path, char
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
-	error = removexattr(nd.dentry, name);
+	error = removexattr(nd.dentry, name, nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -463,7 +470,7 @@ sys_fremovexattr(int fd, char __user *na
 	f = fget(fd);
 	if (!f)
 		return error;
-	error = removexattr(f->f_dentry, name);
+	error = removexattr(f->f_dentry, name, f->f_vfsmnt);
 	fput(f);
 	return error;
 }
_______________________________________________
Vserver mailing list
Vserver@list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver
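In the setxattr() hunk above, the MNT_IS_RDONLY check appears to sit after the value buffer has already been allocated and copied, so the early return skips the kfree(). A minimal user-space sketch of one possible ordering, with the read-only check done before any allocation, follows. Everything suffixed _sketch is hypothetical and stands in for kernel code; malloc/free replace kmalloc/kfree, and this is not what the patch itself does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MNT_RDONLY	0x20

/* Hypothetical stand-in for the kernel's struct vfsmount. */
struct mnt_sketch {
	int mnt_flags;
};

/* Same shape as the patch's macro: a NULL mount is never read-only. */
#define MNT_IS_RDONLY(m)	((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Hypothetical stand-in for vfs_setxattr(): pretends the store succeeded. */
static int store_xattr_sketch(const char *name, const void *value, size_t size)
{
	(void)name;
	(void)value;
	(void)size;
	return 0;
}

static long setxattr_sketch(const char *name, const void *value, size_t size,
			    const struct mnt_sketch *mnt)
{
	void *kvalue = NULL;
	long error;

	/* Reject read-only mounts first, so there is nothing to free yet. */
	if (MNT_IS_RDONLY(mnt))
		return -EROFS;

	if (size) {
		kvalue = malloc(size);
		if (!kvalue)
			return -ENOMEM;
		memcpy(kvalue, value, size);
	}

	error = store_xattr_sketch(name, kvalue, size);
	free(kvalue);
	return error;
}

/* Usage: read-write, read-only and NULL-mount cases. */
static void setxattr_selftest(void)
{
	struct mnt_sketch rw = { 0 };
	struct mnt_sketch ro = { MNT_RDONLY };

	assert(setxattr_sketch("user.demo", "x", 1, &rw) == 0);
	assert(setxattr_sketch("user.demo", "x", 1, &ro) == -EROFS);
	assert(setxattr_sketch("user.demo", NULL, 0, NULL) == 0);
}
```

An alternative that keeps the patch's ordering would be to free the buffer before returning -EROFS; either way the point is only that the early return should not leave the copied value allocated.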
[Vserver] [PATCH 06/10] vfs: pass nameidata into vfs_* to deal with per-mount permissions
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, the vfs_* functions did not take into account the VFS
context that the inodes came from when performing their checks. In order
to honour per-vfsmount permission options, VFS functions must be passed
the nameidata structure, similar to the existing vfs_create(). This
allows for proper checks in may_create(), may_delete() and permission().

Acked-by: Sam Vilain [EMAIL PROTECTED]
---

 fs/namei.c          |   59 ++-
 fs/nfsd/vfs.c       |   16 --
 fs/reiserfs/xattr.c |    3 ++-
 include/linux/fs.h  |   12 +-
 ipc/mqueue.c        |    2 +-
 net/unix/af_unix.c  |    2 +-
 6 files changed, 54 insertions(+), 40 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e28de84..89cccf5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1345,7 +1345,8 @@ static inline int check_sticky(struct in
  * 10. We don't allow removal of NFS sillyrenamed files; it's handled by
  *     nfs_async_unlink().
  */
-static int may_delete(struct inode *dir,struct dentry *victim,int isdir)
+static int may_delete(struct inode *dir, struct dentry *victim,
+	int isdir, struct nameidata *nd)
 {
 	int error;
 
@@ -1354,7 +1355,7 @@ static int may_delete(struct inode *dir,
 
 	BUG_ON(victim->d_parent->d_inode != dir);
 
-	error = permission(dir,MAY_WRITE | MAY_EXEC, NULL);
+	error = permission(dir,MAY_WRITE | MAY_EXEC, nd);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
@@ -1773,9 +1774,10 @@ fail:
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
-int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+int vfs_mknod(struct inode *dir, struct dentry *dentry,
+	int mode, dev_t dev, struct nameidata *nd)
 {
-	int error = may_create(dir, dentry, NULL);
+	int error = may_create(dir, dentry, nd);
 
 	if (error)
 		return error;
@@ -1825,11 +1827,12 @@ asmlinkage long sys_mknodat(int dfd, con
 		error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
 		break;
 	case S_IFCHR: case S_IFBLK:
-		error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
-				new_decode_dev(dev));
+		error = vfs_mknod(nd.dentry->d_inode, dentry, mode,
+				new_decode_dev(dev), &nd);
 		break;
 	case S_IFIFO: case S_IFSOCK:
-		error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
+		error = vfs_mknod(nd.dentry->d_inode, dentry, mode,
+				0, &nd);
 		break;
 	case S_IFDIR:
 		error = -EPERM;
@@ -1852,9 +1855,10 @@ asmlinkage long sys_mknod(const char __u
 	return sys_mknodat(AT_FDCWD, filename, mode, dev);
 }
 
-int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+int vfs_mkdir(struct inode *dir, struct dentry *dentry,
+	int mode, struct nameidata *nd)
 {
-	int error = may_create(dir, dentry, NULL);
+	int error = may_create(dir, dentry, nd);
 
 	if (error)
 		return error;
@@ -1893,7 +1897,8 @@ asmlinkage long sys_mkdirat(int dfd, con
 	if (!IS_ERR(dentry)) {
 		if (!IS_POSIXACL(nd.dentry->d_inode))
 			mode &= ~current->fs->umask;
-		error = vfs_mkdir(nd.dentry->d_inode, dentry, mode);
+		error = vfs_mkdir(nd.dentry->d_inode, dentry,
+			mode, &nd);
 		dput(dentry);
 	}
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -1938,9 +1943,10 @@ void dentry_unhash(struct dentry *dentry
 	spin_unlock(&dcache_lock);
 }
 
-int vfs_rmdir(struct inode *dir, struct dentry *dentry)
+int vfs_rmdir(struct inode *dir, struct dentry *dentry,
+	struct nameidata *nd)
 {
-	int error = may_delete(dir, dentry, 1);
+	int error = may_delete(dir, dentry, 1, nd);
 
 	if (error)
 		return error;
@@ -2001,7 +2007,7 @@ static long do_rmdir(int dfd, const char
 	dentry = lookup_hash(&nd);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
-		error = vfs_rmdir(nd.dentry->d_inode, dentry);
+		error = vfs_rmdir(nd.dentry->d_inode, dentry, &nd);
 		dput(dentry);
 	}
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -2017,9 +2023,10 @@ asmlinkage long sys_rmdir(const char __u
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+int vfs_unlink(struct inode *dir, struct dentry *dentry,
+	struct nameidata *nd)
 {
-	int error = may_delete(dir, dentry, 0);
+	int error = may_delete(dir, dentry, 0, nd);
 
 	if (error)
 		return error;
@@ -2081,7 +2088,7 @@ static long do_unlinkat(int dfd
[Vserver] [PATCH 07/10] vfs: honour per-vfsmount flags inside VFS functions
From: Herbert Pötzl [EMAIL PROTECTED]

Previously, the VFS functions did not enforce per-vfsmount flags such as
read-only. Wherever we use the macro IS_RDONLY, we also need to use
MNT_IS_RDONLY on the corresponding vfsmount structure.

Acked-by: Sam Vilain [EMAIL PROTECTED]
---

 fs/namei.c |    8 ++++++--
 fs/open.c  |   13 +++++++------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 89cccf5..6af1461 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -235,7 +235,7 @@ int permission(struct inode *inode, int
 		/*
 		 * Nobody gets write access to a read-only fs.
 		 */
-		if (IS_RDONLY(inode) &&
+		if ((IS_RDONLY(inode) || (nd && MNT_IS_RDONLY(nd->mnt))) &&
 		    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
 			return -EROFS;
 
@@ -1508,7 +1508,8 @@ int may_open(struct nameidata *nd, int a
 			return -EACCES;
 
 		flag &= ~O_TRUNC;
-	} else if (IS_RDONLY(inode) && (flag & FMODE_WRITE))
+	} else if ((IS_RDONLY(inode) || MNT_IS_RDONLY(nd->mnt)) &&
+			(flag & FMODE_WRITE))
 		return -EROFS;
 	/*
 	 * An append-only file must be opened in append mode for writing.
@@ -2475,6 +2476,9 @@ static int do_rename(int olddfd, const c
 	error = -EINVAL;
 	if (old_dentry == trap)
 		goto exit4;
+	error = -EROFS;
+	if (MNT_IS_RDONLY(newnd.mnt))
+		goto exit4;
 	new_dentry = lookup_hash(&newnd);
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
diff --git a/fs/open.c b/fs/open.c
index 8632721..3e7aea4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -248,7 +248,7 @@ static long do_sys_truncate(const char _
 		goto dput_and_out;
 
 	error = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(nd.mnt))
 		goto dput_and_out;
 
 	error = -EPERM;
@@ -372,7 +372,7 @@ asmlinkage long sys_utime(char __user *
 	inode = nd.dentry->d_inode;
 
 	error = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(nd.mnt))
 		goto dput_and_out;
 
 	/* Don't worry, the checks are done in inode_change_ok() */
@@ -429,7 +429,7 @@ long do_utimes(int dfd, char __user *fil
 	inode = nd.dentry->d_inode;
 
 	error = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(nd.mnt))
 		goto dput_and_out;
 
 	/* Don't worry, the checks are done in inode_change_ok() */
@@ -516,7 +516,8 @@ asmlinkage long sys_faccessat(int dfd, c
 	if (!res) {
 		res = vfs_permission(&nd, mode);
 		/* SuS v2 requires we report a read only fs too */
-		if(!res && (mode & S_IWOTH) && IS_RDONLY(nd.dentry->d_inode)
+		if(!res && (mode & S_IWOTH)
+		   && (IS_RDONLY(nd.dentry->d_inode) || MNT_IS_RDONLY(nd.mnt))
 		   && !special_file(nd.dentry->d_inode->i_mode))
 			res = -EROFS;
 		path_release(&nd);
@@ -627,7 +628,7 @@ asmlinkage long sys_fchmod(unsigned int
 	inode = dentry->d_inode;
 
 	err = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(file->f_vfsmnt))
 		goto out_putf;
 	err = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
@@ -660,7 +661,7 @@ asmlinkage long sys_fchmodat(int dfd, co
 	inode = nd.dentry->d_inode;
 
 	error = -EROFS;
-	if (IS_RDONLY(inode))
+	if (IS_RDONLY(inode) || MNT_IS_RDONLY(nd.mnt))
 		goto dput_and_out;
 
 	error = -EPERM;
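The pattern this patch repeats (superblock read-only OR mount read-only) can be tried out in user space. The following is only a sketch: the *_sketch types and may_write_sketch() are hypothetical stand-ins for the kernel structures, and the two macros are copied in shape from the patches in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MS_RDONLY	1	/* superblock flag, as in mount(2) */
#define MNT_RDONLY	0x20	/* per-mount flag added by this series */

/* Hypothetical stand-ins for super_block, inode and vfsmount. */
struct sb_sketch    { unsigned long s_flags; };
struct inode_sketch { struct sb_sketch *i_sb; };
struct mnt_sketch   { int mnt_flags; };

#define IS_RDONLY(inode)	((inode)->i_sb->s_flags & MS_RDONLY)
#define MNT_IS_RDONLY(m)	((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Returns 0 if a write may proceed, -EROFS otherwise. */
static int may_write_sketch(const struct inode_sketch *inode,
			    const struct mnt_sketch *mnt)
{
	if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt))
		return -EROFS;
	return 0;
}

/* Usage: one inode reached through two different mounts. */
static void may_write_selftest(void)
{
	struct sb_sketch sb = { 0 };
	struct inode_sketch inode = { &sb };
	struct mnt_sketch rw = { 0 };
	struct mnt_sketch ro = { MNT_RDONLY };

	assert(may_write_sketch(&inode, &rw) == 0);
	assert(may_write_sketch(&inode, &ro) == -EROFS);
	assert(may_write_sketch(&inode, NULL) == 0);

	/* A read-only superblock wins regardless of the mount flags. */
	sb.s_flags |= MS_RDONLY;
	assert(may_write_sketch(&inode, &rw) == -EROFS);
}
```

The same inode yields different answers depending on which mount it was reached through, which is the behaviour `mount -o ro --bind` relies on.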
[Vserver] [PATCH 1/5] [fs] BME: allow mount flags on loopback mounts
Add an extra parameter to do_loopback(), the function that sets up a
bind mount, so that the value of vfsmount.mnt_flags can be set.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 0/5] Bind Mount Extensions
This patchset allows per-vfsmount mount options, such as noatime,
nodiratime and, in particular, read-only; i.e. `mount -o ro --bind` can
work with this patch. This is the invention of Herbert Pötzl.

(Sent to the Linux-VServer list as a 'dry run', and to give Herbert a
chance to veto/comment.)

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 2/5] [fs] BME: add MNT_RDONLY flag
Add an extra flag to the mount flags (along with noatime, noexec, etc.)
for a read-only mount. Note that this only affects bind mounts, as the
read-only check is normally performed through the inode.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 3/5] [fs] BME: refactor show_vfsmnt
The new flag possibilities require this function to change slightly. The
opportunity is taken to simplify it.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 4/5] [fs] BME: vfs functions must pass mount as well as inode
For a file reached through the same inode to have different permissions
depending on the mount it was accessed through, VFS functions must pass
the mount structure as well as the inode.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] [PATCH 5/5] [fs] BME: fix consistency of IS_RDONLY
The IS_RDONLY macro is defined inconsistently.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd. http://www.catalyst.net.nz/
phone: +64 4 499 2267        PGP ID: 0x66B25843
Re: [Vserver] [PATCH 1/5] [fs] BME: allow mount flags on loopback mounts
Sam Vilain wrote:
> Add an extra parameter to the do_loopback function that sets up a
> bind mount, that allows the value of vfsmount.mnt_flags to be set.

Wohoo! no patch ... go stgit :)
[Vserver] [PATCH 2/5] [fs] BME: add MNT_RDONLY flag
Add an extra flag to the mount flags (along with noatime, noexec, etc.)
for a read-only mount. Note that this only affects bind mounts, as the
read-only check is normally performed through the inode.

Signed-off-by: Sam Vilain [EMAIL PROTECTED]
---

 fs/namespace.c        |    2 ++
 include/linux/mount.h |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1094e54..b12ea35 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1289,6 +1289,8 @@ long do_mount(char *dev_name, char *dir_
 		((char *)data_page)[PAGE_SIZE - 1] = 0;
 
 	/* Separate the per-mountpoint flags */
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_RDONLY;
 	if (flags & MS_NOSUID)
 		mnt_flags |= MNT_NOSUID;
 	if (flags & MS_NODEV)
diff --git a/include/linux/mount.h b/include/linux/mount.h
index b7472ae..c485c2b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -22,6 +22,9 @@
 #define MNT_NOEXEC	0x04
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
+#define MNT_RDONLY	0x20
+
+#define MNT_IS_RDONLY(m)	((m) && ((m)->mnt_flags & MNT_RDONLY))
 
 #define MNT_SHARED	0x1000	/* if the vfsmount is a shared mount */
 #define MNT_UNBINDABLE	0x2000	/* if the vfsmount is a unbindable mount */
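As a rough illustration of the new flag and the NULL guard in MNT_IS_RDONLY, here is a small user-space sketch. The flag values are the ones shown in the hunk above; struct vfsmount_sketch and the two helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as they appear in the include/linux/mount.h hunk. */
#define MNT_NOEXEC	0x04
#define MNT_NOATIME	0x08
#define MNT_NODIRATIME	0x10
#define MNT_RDONLY	0x20

/* Hypothetical user-space stand-in for the kernel's struct vfsmount. */
struct vfsmount_sketch {
	int mnt_flags;
};

/* Same shape as the patch's macro: a NULL mount is never read-only. */
#define MNT_IS_RDONLY(m)	((m) && ((m)->mnt_flags & MNT_RDONLY))

/* Hypothetical helper: returns 1 if the mount is read-only, else 0. */
static int mnt_is_rdonly_sketch(const struct vfsmount_sketch *mnt)
{
	return MNT_IS_RDONLY(mnt) ? 1 : 0;
}

/* Usage: exercises the NULL, read-write and read-only cases. */
static void mnt_rdonly_selftest(void)
{
	struct vfsmount_sketch rw = { MNT_NOATIME };
	struct vfsmount_sketch ro = { MNT_NOATIME | MNT_RDONLY };

	assert(mnt_is_rdonly_sketch(NULL) == 0);
	assert(mnt_is_rdonly_sketch(&rw) == 0);
	assert(mnt_is_rdonly_sketch(&ro) == 1);
}
```

The `(m) &&` guard matters because some callers have no mount to hand; with it, those callers fall back to the existing inode-based check alone.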
[Vserver] [PATCH 1/5] [fs] BME: allow mount flags on loopback mounts
Add an extra parameter to do_loopback(), the function that sets up a
bind mount, so that the value of vfsmount.mnt_flags can be set.

Signed-off-by: Sam Vilain [EMAIL PROTECTED]
---

 fs/namespace.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 058a448..1094e54 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -861,11 +861,13 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name,
+	unsigned long flags, int mnt_flags)
 {
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err = mount_is_safe(nd);
+	int recurse = flags & MS_REC;
 	if (err)
 		return err;
 	if (!old_name || !*old_name)
@@ -891,6 +893,8 @@ static int do_loopback(struct nameidata
 	if (!mnt)
 		goto out;
 
+	mnt->mnt_flags = mnt_flags;
+
 	err = graft_tree(mnt, nd);
 	if (err) {
 		LIST_HEAD(umount_list);
@@ -1312,7 +1316,7 @@ long do_mount(char *dev_name, char *dir_
 		retval = do_remount(&nd, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
 	else if (flags & MS_BIND)
-		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+		retval = do_loopback(&nd, dev_name, flags, mnt_flags);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
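The flow this patch sets up (do_mount() separates the per-mount flags, then hands both the raw flags and mnt_flags to do_loopback()) can be sketched in user space as below. The MS_* values are the usual mount(2) constants and MNT_NOSUID is the kernel's 0x01; struct bind_mount_sketch and the *_sketch functions are hypothetical, and the real functions take a nameidata and a path rather than this toy structure.

```c
#include <assert.h>
#include <errno.h>

#define MS_NOSUID	2	/* mount(2) flag values */
#define MS_BIND		4096
#define MS_REC		16384
#define MNT_NOSUID	0x01	/* per-mount flag */

/* Hypothetical record of what a bind mount ended up with. */
struct bind_mount_sketch {
	int mnt_flags;
	int recurse;
};

/* After the patch: receives the raw flags plus the separated mnt_flags. */
static int do_loopback_sketch(struct bind_mount_sketch *mnt,
			      unsigned long flags, int mnt_flags)
{
	mnt->recurse = (flags & MS_REC) != 0;	/* was a parameter before */
	mnt->mnt_flags = mnt_flags;		/* the new assignment */
	return 0;
}

static int do_mount_sketch(struct bind_mount_sketch *mnt, unsigned long flags)
{
	int mnt_flags = 0;

	/* Separate the per-mountpoint flags. */
	if (flags & MS_NOSUID)
		mnt_flags |= MNT_NOSUID;

	if (flags & MS_BIND)
		return do_loopback_sketch(mnt, flags, mnt_flags);
	return -EINVAL;	/* other mount types are out of scope here */
}

/* Usage: a nosuid bind mount, then a recursive one. */
static void bind_mount_selftest(void)
{
	struct bind_mount_sketch m = { 0, 0 };

	assert(do_mount_sketch(&m, MS_BIND | MS_NOSUID) == 0);
	assert(m.mnt_flags == MNT_NOSUID && m.recurse == 0);

	assert(do_mount_sketch(&m, MS_BIND | MS_REC) == 0);
	assert(m.mnt_flags == 0 && m.recurse == 1);

	assert(do_mount_sketch(&m, MS_NOSUID) == -EINVAL);
}
```

Passing the whole flags word instead of a pre-masked `recurse` keeps the call site simple and leaves room for do_loopback() to look at further bits later.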
[Vserver] [PATCH 5/5] [fs] BME: fix consistency of IS_RDONLY
The IS_RDONLY macro is defined inconsistently.

Signed-off-by: Sam Vilain [EMAIL PROTECTED]
---

 include/linux/fs.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 250b002..3000655 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -150,7 +150,7 @@ extern int dir_notify_enable;
  */
 #define __IS_FLG(inode,flg) ((inode)->i_sb->s_flags & (flg))
 
-#define IS_RDONLY(inode) ((inode)->i_sb->s_flags & MS_RDONLY)
+#define IS_RDONLY(inode)	__IS_FLG(inode, MS_RDONLY)
 #define IS_SYNC(inode)		(__IS_FLG(inode, MS_SYNCHRONOUS) || \
 					((inode)->i_flags & S_SYNC))
 #define IS_DIRSYNC(inode)	(__IS_FLG(inode, MS_SYNCHRONOUS|MS_DIRSYNC) || \
[Vserver] [PATCH 4/5] [fs] BME: vfs functions must pass mount as well as inode
For a file reached through the same inode to have different permissions
depending on the mount it was accessed through, VFS functions must pass
the mount structure as well as the inode.

Signed-off-by: Sam Vilain [EMAIL PROTECTED]
---

 arch/sparc64/solaris/fs.c |    4 +--
 fs/ext2/ioctl.c           |    7 +++--
 fs/ext3/ioctl.c           |   16 +++---
 fs/hfsplus/ioctl.c        |    3 +-
 fs/namei.c                |   69 +++--
 fs/nfs/dir.c              |    3 +-
 fs/nfsd/nfs4recover.c     |    6 ++--
 fs/nfsd/vfs.c             |   19 +++-
 fs/open.c                 |   25
 fs/reiserfs/ioctl.c       |    7 +++--
 fs/reiserfs/xattr.c       |    3 +-
 fs/xattr.c                |   23 ++-
 include/linux/fs.h        |   12
 ipc/mqueue.c              |    2 +
 net/unix/af_unix.c        |    2 +
 15 files changed, 121 insertions(+), 80 deletions(-)

diff --git a/arch/sparc64/solaris/fs.c b/arch/sparc64/solaris/fs.c
index 4885ca6..612477d 100644
--- a/arch/sparc64/solaris/fs.c
+++ b/arch/sparc64/solaris/fs.c
@@ -363,7 +363,7 @@ static int report_statvfs(struct vfsmoun
 		int j = strlen (p);
 
 		if (j > 15) j = 15;
-		if (IS_RDONLY(inode)) i = 1;
+		if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt)) i = 1;
 		if (mnt->mnt_flags & MNT_NOSUID) i |= 2;
 		if (!sysv_valid_dev(inode->i_sb->s_dev))
 			return -EOVERFLOW;
@@ -399,7 +399,7 @@ static int report_statvfs64(struct vfsmo
 		int j = strlen (p);
 
 		if (j > 15) j = 15;
-		if (IS_RDONLY(inode)) i = 1;
+		if (IS_RDONLY(inode) || MNT_IS_RDONLY(mnt)) i = 1;
 		if (mnt->mnt_flags & MNT_NOSUID) i |= 2;
 		if (!sysv_valid_dev(inode->i_sb->s_dev))
 			return -EOVERFLOW;
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index 3ca9afd..8f1c64a 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -11,6 +11,7 @@
 #include <linux/capability.h>
 #include <linux/time.h>
 #include <linux/sched.h>
+#include <linux/mount.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>
 
@@ -30,7 +31,8 @@ int ext2_ioctl (struct inode * inode, st
 	case EXT2_IOC_SETFLAGS: {
 		unsigned int oldflags;
 
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -69,7 +71,8 @@ int ext2_ioctl (struct inode * inode, st
 	case EXT2_IOC_SETVERSION:
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
 			return -EPERM;
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 		if (get_user(inode->i_generation, (int __user *) arg))
 			return -EFAULT;
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 556cd55..2b83f7e 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/jbd.h>
 #include <linux/capability.h>
 #include <linux/ext3_fs.h>
@@ -36,7 +37,8 @@ int ext3_ioctl (struct inode * inode, st
 		unsigned int oldflags;
 		unsigned int jflag;
 
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -112,7 +114,8 @@ flags_err:
 
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
 			return -EPERM;
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 		if (get_user(generation, (int __user *) arg))
 			return -EFAULT;
@@ -166,7 +169,8 @@ flags_err:
 		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
 			return -ENOTTY;
 
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 
 		if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
@@ -201,7 +205,8 @@ flags_err:
 		if (!capable(CAP_SYS_RESOURCE))
 			return -EPERM;
 
-		if (IS_RDONLY(inode))
+		if (IS_RDONLY(inode) ||
+			(filp && MNT_IS_RDONLY(filp->f_vfsmnt)))
 			return -EROFS;
 
 		if (get_user(n_blocks_count, (__u32 __user *)arg))
@@ -222,7 +227,8 @@ flags_err:
 		if (!capable
[Vserver] Patch spam
Sorry about that, guys.

OK, other than the minor issue with encoding, I think the first
optimised-for-inclusion patch looks good to go. I haven't tested it yet
;), nor double-checked that all the places BME should touch are
touched, but feedback on the patch submission style and on the patch
itself is welcome.

Sam.
[Vserver] Teaser output from Linux::VServer
Hi all,

Just a quick one. Here is the output of a script that comes with
Linux::VServer. See if you can work out what's going on :)

wilber:~/vserver/Linux-VServer# perl scripts/vserver-qemu-test --image=../qemu/testing.img --kernel-tree=../linux-2.6 install
vserver-qemu-test: mounting partitions in ../qemu/testing.img to /mnt/qemu
vserver-qemu-test: Setting up ../qemu/testing.img = /dev/loop0
vserver-qemu-test: fsck /dev/loop1
vserver-qemu-test: warning: fsck corrected errors on /dev/loop1
vserver-qemu-test: mounting /dev/loop1 on /mnt/qemu/p1
vserver-qemu-test: fsck /dev/loop2
vserver-qemu-test: warning: failed to fsck/mount partition 2, skipping
vserver-qemu-test: 1 partition(s) in ../qemu/testing.img now mounted.
vserver-qemu-test: Installing . to /mnt/qemu/p1/var/testing/Linux-VServer
vserver-qemu-test: installed 1 module(s).
vserver-qemu-test: installing bootstrap code to /mnt/qemu/p1/var/testing/runtests.pl
vserver-qemu-test: Installing modules to /mnt/qemu/p1
vserver-qemu-test: unmounting /mnt/qemu/p1
vserver-qemu-test: unlooping /dev/loop1
vserver-qemu-test: unlooping /dev/loop0
wilber:~/vserver/Linux-VServer# perl scripts/vserver-qemu-test --image=../qemu/testing.img --kernel-tree=../linux-2.6 test enter verbose
vserver-qemu-test: running `qemu-system-x86_64 -kernel ../linux-2.6/arch/x86_64/boot/bzImage -nographic -append quiet console=ttyS0 root=/dev/hda1 _test=enter,verbose init=/var/testing/runtests.pl ../qemu/testing.img'
Connected to host network interface: tun0
(qemu) PCI: PIIX3: Enabling Passive Release on :00:01.0
Command line is: quiet console=ttyS0 root=/dev/hda1 _test=enter,verbose init=/var/testing/runtests.pl , $1 is: enter,verbose
%args is: verbose 1 enter 1
**
 Running tests in Linux-VServer
**
t/00-nonprivileged1..4
ok 1 - use Linux::VServer;
ok 2 - vs_get_version() = 0 (got 3.1)
ok 3 - vs_version_ok()
ok 4 - vx_get_task_xid(680) = 0 (got )
ok
t/01-create...1..3
ok 1 - vx_create returns a non-zero context
ok 2 - vx_create returns a *new* context ID
ok 3 - vx_create migrates current task
ok
t/02-legacy...1..13
chattr() didn't work on t/tmp/test, and no setattr binary - cannot contine at blib/lib/Linux/VServer/Legacy.pm line 125.
# Looks like you planned 13 tests but only ran 1.
# Looks like your test died just after 1.
ok 1 - use Linux::VServer::Legacy;
dubious
 Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 2-13
 Failed 12/13 tests, 7.69% okay
t/03-pureperl.1..5
ok 1 - use Linux::VServer;
ok 2 - :VCI import spec
ok 3 - vs_get_version() = 0 (got 3.1)
ok 4 - vs_version_ok()
ok 5 - vx_get_task_xid(686) = 0 (got )
ok
Failed Test   Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------
t/02-legacy.t  255 65280    13   24 184.62%  2-13
Failed 1/4 test scripts, 75.00% okay. 12/25 subtests failed, 52.00% okay.
Press ctrl+d to shutdown
root@(none):/var/testing# exit
'shutting down'
vserver-qemu-test: command completed in 31.3s

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] Humble beginnings of Linux::VServer
Hi all,

I have started work on a Linux::VServer Perl module, and it can already
make a syscall or two.

The main purpose of the module is to make it easy to prototype changes
people suggest to the API as the vserver-inclusion branch gets ready,
and to provide a home for an extremely fine-grained test suite that is
useful for cross-checking and debugging kernel-side development.

For the VServer 2.1 API used by VServer 2.0+, it just wraps libvserver.
The 3.1 API, currently used only by the vserver-inclusion branch I am
working on, is 'pure' Perl.

As a side project, I will be bundling up the patchy bits and pieces of
scripts I have made available over time (such as immucp, unify-dirs, the
LVM-integrated vserver creation scripts, etc.) into this module, which
may end up being split off into a separate distribution. Or, if people
are interested, it may end up being a more comprehensive set of userland
utilities.

Currently you can browse the tree at

  http://vserver.utsl.gen.nz/gitweb/?p=Linux-VServer.git;a=summary

or pull via git (or your favourite git porcelain) from

  git://vserver.utsl.gen.nz/git/Linux-VServer.git

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843
Re: [Vserver] Vservers and RAID (5 hard)
Ehab Heikal wrote:
> I have had bad experiences with LVM and RAID. It is near impossible to
> fix LVM if you have problems mounting the volumes after a kernel
> change. LVM is not as well supported in the rescue mode of most
> distros' CDs.

Which distro was that? Almost every LiveCD I've tried has good LVM
support. Knoppix from about 2 years ago certainly did, as does Ubuntu
Breezy and perhaps even Warty, too.

Having said that, you're probably still safer with at least the root
filesystem on a non-LVM partition.

Sam.
Re: [Vserver] Managing the VServer patch with Git and stgit
This tutorial has now concluded, and you can get the unannotated IRC log
for it at:

  http://irc.13thfloor.at/LOG/2006-02/LOG_2006-02-16.txt

Sam Vilain wrote:
| Hi all,
|
| Some people have requested an IRC Git tutorial. I will be running this
| on Thursday from 2-4pm NZDT (UTC+13). That's Thursday 1-3am in
| London, Wednesday 5-7pm in California, Wednesday 8-10pm in New York and
| Thursday 9-11am in Taiwan/PRC.
|
| Details: [...]

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843
Re: [Vserver] Vservers and RAID (5 hard)
Odile Bénassy wrote:
> Hello, Jacques, hello all!
>
> I have set up a few vservers for hosting different web sites on the
> same machine while keeping separate control of each, and so far I'm
> happy, thanks!
>
> The time has come to get it running for the public, and as the server
> has a RAID controller (it is Dell's Megaraid), I'm about to order a
> few disks.
>
> The normal setup would be: 2 * 36 GB for the system, 3 * 73 or 4 * 73
> for the data. Meaning the contents of /var, with the vservers in it.

If you've got the option of buying 4 disks, have you considered just
using software RAID 1? You only end up with one disk's worth less space
than a 4-disk RAID 5 set, but you gain an awful lot in terms of
stability and performance.

IMHO RAID controllers were a nice idea, but I've now seen two of them
simply shaft a RAID set, and they were name brand, battery backed, etc.
One of them was only a RAID 1 set, yet the controller still seemed to be
capable of getting its on-disk state so confused that it would crash
(blocking all SCSI I/O after that) during a resync, even after replacing
the motherboard, backplanes, etc.

My advice for Dell systems is to pull the RAID key out or, at the very
least, disable write caching when you create the volumes in the BIOS.

> So I would welcome any advice about what to purchase and how to use
> the disks, especially if there are some vserver peculiarities.

You might like to consider using LVM for partitioning such a system.
It's generally more manageable than a single massive filesystem.

With a system like that I'd normally set up a couple of 1-2GB partitions
on the root set, one for a root FS and the other for swap (or emergency
reinstall space), then just throw the rest of it into a single LVM
Volume Group. Even if some of it is RAID 5 and some of it RAID 1, you
can easily control through LVM which physical volumes the partitions you
create live on, or move them later (without unmounting).

Sam.
Re: [Vserver] Vservers and RAID (5 hard)
Christian Heim wrote:
>> well, only on debian the vservers dir goes to /var
>> on all other distros, it's /vservers :)
>
> Also on Gentoo ;)

I hate that! Such a deep directory... Besides, the Unix conventions of
/var, /usr, etc. were made before this use case was considered (/com,
anyone?). I think it deserves its own TLD (top-level directory).

Perhaps something more in the Unix spirit: get /chld added to the FHS?

Sam.
[Vserver] [vserver-inclusion] progress report
vserver-inclusion continues to make progress. I've got through the first
section of the plan, so now all that remains is the test suite, and of
course fixing the mistakes :)

There is now a new URL:

  http://vserver.utsl.gen.nz/

The git repo is now also in two places side by side:

  http://vserver.utsl.gen.nz/git/vserver.git
  git://vserver.utsl.gen.nz/git/vserver.git

And gitweb is at:

  http://vserver.utsl.gen.nz/gitweb

An interesting branch to look at is

  http://vserver.utsl.gen.nz/gitweb/?p=vserver.git;a=shortlog;h=2.6.16-rc2-vsi

exported to

  http://vserver.utsl.gen.nz/patches/utsl/2.6.16-rc2-vsi/

The six commits since Linux v2.6.16-rc2 represent the refactoring work
I'm doing, and all the commits after that are what's left to do; i.e.
as work progresses these will get smaller and smaller. For instance, if
you select the commitdiff for the commit labelled "from
2.6.16-rc1-vs2.1.0.9/04_syscall.diff", it currently only has one hunk
left.

The very top commit takes it to exactly what you get from a plain
2.6.16-rc2 + Herbert's vs2.1.0.11 patch.

Enjoy!

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843
[Vserver] Managing the VServer patch with Git and stgit
Hi all,

Some people have requested an IRC Git tutorial. I will be running this
on Thursday from 2-4pm NZDT (UTC+13). That's Thursday 1-3am in London,
Wednesday 5-7pm in California, Wednesday 8-10pm in New York and Thursday
9-11am in Taiwan/PRC.

Details:

  server: irc.oftc.net
  channel: #vserver

Bring along:

  - a working installation of git-core
    http://www.kernel.org/pub/software/scm/git-core/git-1.2.0.tar.gz
    (debian/ubuntu: git-core)

  - a working installation of stacked git
    http://www.procode.org/stgit/

  - a fresh directory created by:
      git-clone \
        rsync://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    (note: this will download 110MB of git pack file)

  - (recommended) a working installation of gitk
    http://www.kernel.org/pub/software/scm/git-core/git-1.1.6.tar.gz

During the tutorial we will:

  - track some changes from Linus' development tree, without downloading
    the whole tree again

  - track another couple of people's trees using the same repository

  - use Stacked Git (stgit) to import an arbitrary set of split
    patches from Herbert against a non-current kernel version into a
    branch

  - re-base a series of patches against a new upstream release

  - move some changes between the patches in our patchset

  - export the patches we're working on to a set of .diffs

If requested, I can draw comparisons with other version control systems
you may be familiar with, such as Subversion, CVS, SVK or darcs.

-- 
Sam Vilain, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843
Re: [Vserver] HOWTO deal with NAT'ing firewalls and source-based routing with vservers
I always like to recommend fwbuilder for building firewalls for vserver hosts. It takes some of the headaches out of building firewall chains, plus you know that you can manage non-Linux firewalls with it too (having varied network stack implementations through your firewall is a common security best practice).

Have a look at the screenshots on this page for many hints on how this is approached. Feel free to ask questions.

  http://utsl.gen.nz/talks/vserver/slide23a.html

Sam.

Jairo Enrique Serrano Castañeda wrote:
> What about jabberd servers inside a vserver? Has anyone tried to work with that? I NATed port 5222 on the main server to port 5222 on the virtual one... and nothing.. ;(
>
> On 2/12/06, Herbert Poetzl [EMAIL PROTECTED] wrote:
>> On Sun, Feb 12, 2006 at 01:16:34AM +0100, Valdemar Lemche wrote:
>>> Hi all
>>>
>>> I've had quite a headache with NAT and vservers on my home server/firewall. And as far as I can read on the vserver mailing list, a lot of other people have had the same problem. So now that I have cracked the nut, I decided to write this little howto.
>>>
>>> HOWTO deal with NAT'ing firewalls and source-based routing with vservers
>>>
>>> Some *challenges* with source-based routing, MASQUERADE and SNAT:
>>>
>>> SNAT and MASQUERADE supposedly do NAT the same way, except that MASQUERADE purges any connection when the link of the '-o' interface goes down. However, that is *NOT* the case.
>>> Consider the following setup:
>>>
>>> +----------------------------------+
>>> | Some gateway at my ISP:          |
>>> +----------------+-----------------+
>>>                  |
>>> +----------------+-----------------+
>>> | Homeserver and firewall:         |      +----------+
>>> | eth0: dhcp (public ip)           |   +--+ vserver1 |
>>> | dummy0: 192.168.2.1/24           |   |  +----------+
>>> | dummy0:vserver1 192.168.2.2/24 --+---+
>>> | dummy0:vserver2 192.168.2.3/24 --+---+
>>> | eth1: 192.168.1.1/24             |   |  +----------+
>>> | default gateway is the dhcp addr |   +--+ vserver2 |
>>> +----------------+-----------------+      +----------+
>>>                  |
>>> +----------------+-----------------+
>>> | Workstation:                     |
>>> | eth0:192.168.1.20/24             |
>>> | default gateway: 192.168.1.1     |
>>> +----------------------------------+
>>>
>>> ( I love ASCII drawings; Visio eat your heart out! ;) )
>>>
>>> I'm doing source-based routing with iproute2:
>>>
>>> # echo 100 vserver >> /etc/iproute2/rt_tables
>>> # ip rule add from 192.168.2.0/24 table vserver
>>> # ip route add default dev eth0 table vserver
>>> # ip route add 192.168.1.0/24 dev eth1 table vserver
>>>
>>> I'm using MASQUERADE with iptables:
>>>
>>> # iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
>>>
>>> From vserver1 and vserver2:
>>>
>>> * I can ping 192.168.2.1
>>> * I can ping the other vserver
>>> * I can ping 192.168.1.1
>>> * I can ping 192.168.1.20
>>> * I can ping the dhcp address of eth0 on my homeserver.
>>>
>>> But I can *NOT* ping any address outside eth0.
>>>
>>> Packets originating from vservers *should* exit through eth0 and thus get masqueraded as the public dhcp address? But in fact they aren't; they still have their original ip address when hitting any address outside eth0.
>>
>> this is already somewhat fuzzy, as there are no addresses 'inside' eth0, the ips are 'related' or 'assigned' to an interface, but they are not limited to this interface at all ...
>>
>> but the main reason why this fails is that masquerading is a technique which allows a host to 'remap' certain ports/addresses in a connection aware way when _forwarding_ them.
>> now, first, the forwarding is not required in this case, as the packet is originating from the host, and second there is no need to map to a free port (because of the same reason).
>>
>> now some of you might ask: if that is so, why doesn't it work out of the box?, and the answer is simple. usually (as I said, the ips are not bound to the interfaces) the appropriate ip is chosen according to the routing decisions (i.e. if a packet leaves eth0, it will just get an ip from the eth0 pool, if it is leaving eth1, the stack _usually_ chooses one from there ... now with a restriction on the set of available ips, this leads to a sometimes wrong IP being used on the right interface, which has to be corrected
>>
>>> But if I use SNAT
[Vserver] [vserver-inclusion] Update: 1c milestone reached, and a changed summary
Greetings all,

I am happy to announce that the vserver-inclusion project is making progress. Today I have reached stage 1c) on the patch reshaping plan (see attached). The patch plan (likely to be used for the LKML announcement) has had some LKML flintstones removed, too ;-).

  [EMAIL PROTECTED]:~# cat /proc/virtual/info
  VCIVersion: 0002:0001
  VCISyscall: 236
  VCIKernel: 0700

Currently the kernel boots, and basic information is available in /proc/virtual and /proc/XXX/vinfo. There are no features yet, just infrastructure.

The current patches, against Linus's linux-2.6 tree (they probably apply mostly cleanly to 2.6.16-rc2), are at:

  http://utsl.gen.nz/vserver/patches-split/mine/2.6.16-rc2%2bgit-vsi/

The next stage will be to bring in the debugging and history support to the code base, and to make sure that nothing important is missing.

Finally, we will need a test suite demonstrating each feature working, using util-vserver or libvserver or even a pure[*] Perl implementation. I hope to have this established in a very basic form before submission to LKML.

Contributions at this point are probably only useful from experienced kernel/vserver developers who know how the vserver patch works, and are able to check that the code there matches their expectations. Experienced vserver/userspace developers could help out with the test suite.

--
Sam Vilain, Catalyst IT (NZ) Ltd.

* - the term 'pure' applied to a Perl built-in like syscall() just seems wrong ;)

The mighty Linux-VServer inclusion branch
=========================================

The Goal
--------

To reshape the Linux-VServer kernel patch into a series of patches that incrementally add features, for inclusion into the mainstream Linux 2.6 tree.

This will be done on a feature-by-feature basis, and debate is encouraged on the exact form of the plan and implementation.

The name of this feature is potentially up for debate (Jails? Containers? Domains? Contexts?); but the term vserver is already in the kernel, so 'changing' that would probably need a good reason and be approved by the core team.

Some features might not be dependent on vserver per se, of course. This will likely be uncovered as individual features are merged.

Minor details such as naming conventions are taken from the Linux-VServer.org branch, and constructive suggestions on that front are most welcome. Preferably in the form of patches or a git source I can pull :)

The Plan
--------

Patches, by general category, with a rough expected order; very hazy dependencies between the general categories follow.

  0. features that don't need vserver, but are in the patch anyway
     a. Bind Mount Extensions (mount --bind -o ro)
     b. Kernel split (already included upstream! and with incorrect acknowledgement ;))

  1. core vserver patch - no features
     a. struct and ps addition; internal API and refcounting
     b. syscall, and switch
     c. /proc visibility
     ** UP TO HERE **
     d. debugging
     e. history

  2. isolation features
     a. IPC, semaphore, and signal restrictions
     b. proc/array filtering
     c. IPv4 chbind
     d. FS chroot() barrier
     e. general /proc filtering
     f. ptrace
     g. process admin: alloc_uid, find_user, sys_setpriority
     h. printk
     i. kthread

  3. virtualisation features
     a. uts information
     b. initpid virtualisation
     c. uptime
     d. load average
     e. ksyslog
     f. vshelper (reboot support)
     g. vroot (quota, fs IOCTL, etc)
     i. general PID virtualisation (eric's patch)
     j. ngnet (network stack virtualisation)

  4. resource tracking features
     a. scheduler tracking hook
     b. FS xid counting
     c. FS xid tagging
     d. ulimit
     e. RSS usage
     f. IO - async tracking

  5. resource sharing features
     a. scheduling v1 - TBF and vavavoom
     b. FS - immutable linkage invert (immulink)
     c. disk scheduler integration
     d. RSS limits
     e. FS - mad cow

  6. resource limit features
     a. scheduler
     b. rlimits
     c. disklimits

Locations
---------

The GIT repository for this project is at:

  http://utsl.gen.nz/vserver/vserver.git

The patch stack for this project will be on the vserver-inclusion branch; it is exported to:

  http://utsl.gen.nz/vserver/patches-split/mine/2.6.N+git-vsi/

Where 2.6.N was the last release (or release candidate) of Linus' tree. This patch is NOT against any release you can download as a tarball :).

Upstream (13thfloor.at) patches will be on the vs2.1.x.y branch, corresponding to their version number.

The upstream patch that was used as a source will be under:

  http://utsl.gen.nz/vserver/patches-split/13thfloor/2.6.N-vs2.1.x.y/

And, for sanity checking, the result of my importing of the upstream quilt patch into stgit and re
Re: [Vserver] BME and CoW as split patches available?
On Thu, 2006-02-02 at 15:20 +0100, Wilhelm Meier wrote:
> is the argument good enough for you to supply the split bme and cow patches for 2.6.15?

2.6.15? That's like ancient history, man.

There's a historic release here; it's for a much older release, but maybe it will apply without much modification needed:

  http://www.13thfloor.at/vserver/s_rel26/v2.01/split-2.6.14.3-vs2.01.tar.gz

There's also one against the ageing 2.6.16-rc1:

  http://vserver.13thfloor.at/Experimental/del-2.6.16-rc1-vs2.1.0.9/
  (see 36-bme and 37-cow)

(note: Experimental/ URIs not guaranteed to be around next month or week)

Try them, see if they work.

Sam.
Re: [Vserver] packet shaping with vservers
On Mon, 2005-11-07 at 13:58 +0100, Grzegorz Nosek wrote:
> The test cases I've found are:
>  - no QoS at all: everything runs smoothly
>  - root qdisc on eth0 set to htb (or pfifo, or sfq): still everything nice and smooth
>  - default class on eth0 set to htb (rate 1Gbit, so it's not true bandwidth choking): mysql connections seem to take forever to establish (lots of "unauthenticated user" connections as they are too throttled to even log in and time out eventually)

Strange. Certainly in terms of firewalling, packets that don't leave the host always travel via the loopback interface.

I'd be curious if the same thing happens without vserver (e.g., by just 'entering' the servers with chroot and starting the one or two services you are testing, suitably configured to bind as appropriate).

> Our production boxes (w/o vservers) are happily shaping packets at the rates we encounter so CPU power is not an issue either.
>
> One thing that bugs me is why do these packets even go through the eth0 queue? "ip route show table local" shows the local ip addresses as
>
>   local X.X.X.X dev eth0 proto kernel scope host src X.X.X.Y
>
> (X.X.X.X being guest address and X.X.X.Y being host address)
>
> Have you got any ideas/suggestions? Is this a known issue?
>
> Best regards,
> Grzegorz Nosek
Re: [Vserver] many ip addys
On Fri, 2005-10-21 at 15:40 -0400, Chuck wrote:
>>> so then just for clarity, using the current vserver software (2.1.0-r4) with up to 16 ip addys does not give any performance hits at all compared to 1 but beyond that is when it gets nasty.
>>
>> no, that is a linear search, so every new ip address will actually add overhead on every socket bind and/or ip related operation inside the guest ...
>
> ok.. so how many ips in a guest do you estimate could be used before a perceptible change in performance happens? and by this are we talking about per interface or gross totals among all interfaces?

That really is a hard question to answer, and a research topic.

Suck it and see, or build yourself a test system so you can definitively answer the question.

It's quite possible that on today's typically CPU-heavy systems it might not be a perceivable bottleneck at all until you start getting to the kind of connection levels where NAT starts not being able to track connections any more.

If you can measure any performance difference at all as the number of IPs in a vserver goes up, I'd be highly impressed.

Sam.
Re: [Vserver] many ip addys
On Mon, 2005-10-24 at 19:29 -0400, Chuck wrote:
>>> ok.. so how many ips in a guest do you estimate could be used before a perceptible change in performance happens? and by this are we talking about per interface or gross totals among all interfaces?
>>
>> That really is a hard question to answer, and a research topic.
>>
>> Suck it and see, or build yourself a test system so you can definitively answer the question.
>>
>> It's quite possible that on today's typically CPU-heavy systems it might not be a perceivable bottleneck at all until you start getting to the kind of connection levels where NAT starts not being able to track connections any more.
>>
>> If you can measure any performance difference at all as the number of IPs in a vserver goes up, I'd be highly impressed.
>
> ok I was just being concerned that if I set up a guest with 10 or 15 ip addys, as each gets heavily used by virtual sites, the virtual machine may take more time than it would under a stand-alone hardware situation.

Apples and Oranges.

A more relevant question is: if you had a single machine without vserver, and with several Apache instances, each set to listen on 10 to 15 addresses, is that faster or the same speed as a system using vserver, with each block of 10 to 15 addresses assigned to a vserver, and Apache running inside that vserver?

Assuming you're using filesystem unification, there is virtually nothing different between those two setups, just the vserver overhead.

> we dont use nat, every ip is public.

yes, sure. I was just drawing attention to the fact that TCP itself has performance limits, and that the current implementation of virtual IPs might possibly be fast enough that you'll run into those other limits first.

Sam.
Re: [Vserver] Talking about limits...
On Tue, 2005-10-18 at 15:46 -0400, Michel Belleau (malaiwah.com) wrote:
> I tried limiting a vserver's memory this weekend but had no luck doing it. I tried setting the AS and RSS files in the configuration directory, but it doesn't seem to work the way I wanted. It kills processes (like apache in my tests) which need more memory than what I allowed in the limits.
>
> What I would like to do is limit the amount of RAM the vserver has. It seems that the AS and RSS limits are the total resources of a vserver. I want to give a vserver 128 megabytes of RAM and 1 gig of swap space. I don't want the OOM killer to restrict applications that ask for more than 128 megs of RAM. I know I can do it, since how do VPSes using vservers work then?

Did you try putting "virt_mem" in /etc/vservers/XXX/flags ?

Sam.
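The suggestion above can be sketched as a small shell helper. This is only an illustration, assuming the util-vserver configuration layout named in the mail (/etc/vservers/XXX/flags, one flag per line); the function name add_vserver_flag and the CFGROOT variable are made up for this example, and the guest presumably has to be restarted before a new flag takes effect.

```shell
#!/bin/sh
# Hypothetical helper (not part of util-vserver): add a flag to a
# guest's flags file, but only if it is not already listed.
# CFGROOT defaults to /etc/vservers; point it elsewhere to experiment.
CFGROOT="${CFGROOT:-/etc/vservers}"

add_vserver_flag() {
    name="$1"
    flag="$2"
    dir="$CFGROOT/$name"
    file="$dir/flags"

    # Refuse to invent a configuration directory for an unknown guest.
    if [ ! -d "$dir" ]; then
        echo "no such vserver configuration: $dir" >&2
        return 1
    fi

    touch "$file" || return 1

    # -x: whole-line match, -F: literal string, -q: no output
    if ! grep -qxF -- "$flag" "$file"; then
        printf '%s\n' "$flag" >> "$file"
    fi
}
```

For the case discussed here the call would be `add_vserver_flag XXX virt_mem`, with XXX replaced by the guest's name. Calling it twice leaves a single virt_mem line in the file.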
Re: [Vserver] patched Debian kernels?
On Tue, 2005-10-18 at 14:24 +0200, Herbert Poetzl wrote:
> for some time (maybe still? and at least in sarge) the tools are broken, so better get them as source tarball or from unstable/testing ...

The ones on apt.utsl.gen.nz/debian include all the fixes I could find.

If there are any new bugfixes, ping me on irc or via e-mail and I can easily build new packages.

Sam.
Re: [Vserver] mount a NFS filesystem into a vserver
On Tue, 2005-09-13 at 09:49 -0600, [EMAIL PROTECTED] wrote:
> I am looking to mount an NFS filesystem into a vserver. I have searched the archives, the site, and the web but cannot find a straight answer to this. What is the proper/preferred method of doing this? Does anyone have the steps? I would like to mount an NFS filesystem from a non-vserver machine (NFS server) into a vserver (NFS client).

You should be able to put the mount entry in /etc/vservers/XXX/fstab, and it will be mounted at the appropriate time on vserver startup.

The alternative is to turn on secure mounts, whereby the vserver can do mount operations, with certain restrictions. I haven't needed to use this myself.

Sam.
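As a rough sketch of the first option, the helper below prints an ordinary fstab(5)-style line for an NFS mount. It assumes the per-guest fstab uses the usual six-field syntax; the function name, the server name, the export path and the mount options are placeholders, and whether the mount point is interpreted relative to the guest's root directory is something to verify on your own system.

```shell
#!/bin/sh
# Hypothetical helper: print one fstab(5)-style line for an NFS mount.
# All values are supplied by the caller; nothing is written to disk.
nfs_fstab_line() {
    server="$1"        # NFS server host name (placeholder)
    export_path="$2"   # directory exported by the server
    mount_point="$3"   # where the guest should see it
    printf '%s:%s\t%s\tnfs\trw,hard\t0 0\n' \
        "$server" "$export_path" "$mount_point"
}
```

After checking the output, something like `nfs_fstab_line nfs.example.org /export/data /mnt/data >> /etc/vservers/XXX/fstab` would add the entry, with XXX standing for the guest's name as in the mail above.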
Re: [Vserver] Partial Xfree86 freeze at vserver ... stop
On Mon, 2005-09-12 at 10:37 +0200, Gilles wrote:
> When stopping the vserver, Xfree86 (on the host) starts to use ~99% of the CPU and the keyboard is dead; but not the
> [...]
> Linux 2.6.12.4-vs2.0+g1+g1 x86_64/0.30.208/0.30.208 [Ea] (0)
> VCI: 0002:0001 236 0374

Make sure you've got the previous patch I posted applied, if you're using x86_64 and the hard CPU scheduler. Right now there's a bug whereby a vserver might get stuck on maximum vavavoom (like a nice -5 bonus).

Possibly unrelated, but it would mean that the vserver would pre-empt CPU-heavy processes running on the main system with a normal priority, which might explain what you're seeing.

Sam.
[Vserver] [patch] fix sched on amd64
The attached patch fixes a bug when you use vsched on a running vserver on amd64, in which the high 32 bits of various scheduling parameters could be filled with garbage, causing tokens to be allocated at vastly incorrect rates.

Patch is against -vs2.0

diff -ur linux-2.6.12.5-vs2.0.orig/include/linux/vserver/sched_def.h linux-2.6.12.5-vs2.0/include/linux/vserver/sched_def.h
--- linux-2.6.12.5-vs2.0.orig/include/linux/vserver/sched_def.h 2005-09-05 19:05:11.0 +1200
+++ linux-2.6.12.5-vs2.0/include/linux/vserver/sched_def.h 2005-09-05 19:03:11.0 +1200
@@ -21,10 +21,10 @@
  atomic_t tokens; /* number of CPU tokens */
  spinlock_t tokens_lock; /* lock for token bucket */
 
- int fill_rate; /* Fill rate: add X tokens... */
- int interval; /* Divisor: per Y jiffies */
- int tokens_min; /* Limit: minimum for unhold */
- int tokens_max; /* Limit: no more than N tokens */
+ uint32_t fill_rate; /* Fill rate: add X tokens... */
+ uint32_t interval; /* Divisor: per Y jiffies */
+ uint32_t tokens_min; /* Limit: minimum for unhold */
+ uint32_t tokens_max; /* Limit: no more than N tokens */
  uint32_t jiffies; /* last time accounted */
 
  int priority_bias; /* bias offset for priority */
diff -ur linux-2.6.12.5-vs2.0.orig/kernel/vserver/sched.c linux-2.6.12.5-vs2.0/kernel/vserver/sched.c
--- linux-2.6.12.5-vs2.0.orig/kernel/vserver/sched.c 2005-09-05 19:05:11.0 +1200
+++ linux-2.6.12.5-vs2.0/kernel/vserver/sched.c 2005-09-05 18:03:45.0 +1200
@@ -30,7 +30,7 @@
  */
 int vx_tokens_recalc(struct vx_info *vxi)
 {
- long delta, tokens = 0;
+ uint32_t delta, tokens = 0;
 
  if (vx_info_flags(vxi, VXF_SCHED_PAUSE, 0))
  /* we are paused */
Re: [Vserver] [patch] fix sched on amd64
On Mon, 2005-09-05 at 17:15 +0200, Herbert Poetzl wrote:
> On Mon, Sep 05, 2005 at 08:53:55PM +1200, Sam Vilain wrote:
>> The attached patch fixes a bug when you use vsched on a running vserver on amd64, in which the high 32 bits of various scheduling parameters could be filled with garbage, causing tokens to be allocated at vastly incorrect rates.
>
> how and why? any details? sounds like a gcc bug to me (at first glance, maybe I'm wrong)

Yes, there is a lot of guesswork in the diagnosis.

Here's what I was seeing. If I increased the size of the token bucket via vsched to, say, 5, and set its size to zero, then the bucket would still be instantly filled, even with a 1:100 fill rate.

At various times it would count up or down for a short while, and then reset to full again, but always less than 100 from the maximum (with a 1:100 fill rate).

This behaviour is consistent with what you'd expect if values were wrapping around because of unwanted bits in the upper half of the word when calculations happened, which were later truncated to 32 bits again. That was an early guess, and my change stopped the problem from re-appearing.

I agree with you that this diagnosis would make it a compiler bug (I'm on gcc 3.3.5), but I think it is a good idea to avoid such bugs by avoiding mixed long / int calculations where possible.

> so I consider this a bandaid at most ... will look into the code soon ...

Sure. If you have any better idea about how to really get to the bottom of this, let me know how I can help. All the ways I can see are in a distant field of pain :).

Sam.
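The wrap-around hypothesis can be illustrated with plain shell arithmetic. This is not the kernel code path, only a toy sketch of what stray upper bits do to a value that is meant to be 32 bits wide; it assumes a shell whose arithmetic is 64 bits wide (bash and dash on current systems), and the "garbage" value is invented for the example.

```shell
#!/bin/sh
# Toy illustration only: a 32-bit quantity stored in a 64-bit word
# whose upper half contains junk.

to_u32() {
    # Keep only the low 32 bits, as an assignment to uint32_t would.
    echo $(( $1 & 0xFFFFFFFF ))
}

fill_rate=100                      # the value the admin intended
garbage=$(( 7 << 32 ))             # made-up junk in the upper 32 bits
dirty=$(( garbage | fill_rate ))   # what a 64-bit 'long' would then hold

echo "seen as a 64-bit long: $dirty"
echo "seen as a uint32_t:    $(to_u32 "$dirty")"
```

Any calculation done on the 64-bit view is wildly off, while the truncated view recovers the intended 100, which matches the symptom of token counts jumping around and then snapping back.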
Re: [Vserver] Firewall between two vserver
Oliver Dietz wrote:
>>> {...}
>>> # Block everything between 2 vserver
>>> iptables -A INPUT -d 192.168.0.155 -s 192.168.0.157 -j DROP
>>> iptables -A INPUT -d 192.168.0.157 -s 192.168.0.155 -j DROP
>>> {...}
>>
>> The INPUT chain is for packets entering the box, but with vservers packets don't enter the box, all traffic is flowing inside the box. Try using the PREROUTING chain instead.
>
> And which table? Am I right with the mangle table? I played around a bit, but haven't found the solution so far ... maybe I will try it again in the next weeks ...

[... elsewhere ...]

> I didn't try, but I guess that packets between the servers don't pass the INCOMING chain as they are not entering the kernel from outside... I think they will walk through the FORWARD chain.

Well, using FWBuilder, which generally Just Works™, the commands it generates look like this:

  $IPTABLES -A INPUT -i lo -p tcp \
    -s 192.168.255.49 -d 192.168.255.0/24 --dport 22 \
    -m state --state NEW -j ACCEPT

I think the packets end up going through PREROUTING, INPUT and OUTPUT, but not FORWARD. However, note that they are going via the lo interface, even though in this case the servers are all set up on interface dummy0.

Sam.
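Following that observation, a blocking rule for two guests would have to match on the lo interface in the INPUT chain. The sketch below only prints candidate rules so they can be reviewed first; it rests on the assumption stated above that guest-to-guest packets arrive via lo, and the function name block_pair_rules is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) iptables rules that drop
# traffic between two guest addresses on the same host.
# Assumption: guest-to-guest packets traverse INPUT with -i lo.
IPTABLES="${IPTABLES:-iptables}"

block_pair_rules() {
    a="$1"
    b="$2"
    echo "$IPTABLES -A INPUT -i lo -s $a -d $b -j DROP"
    echo "$IPTABLES -A INPUT -i lo -s $b -d $a -j DROP"
}
```

For the addresses in the question, `block_pair_rules 192.168.0.155 192.168.0.157` prints two rules. Since -A appends, they only help if no earlier rule already accepts everything arriving on lo, so they may need to be inserted with -I instead.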
Re: [Vserver] loopback device in a vserver
Tilo Kaltenecker wrote:
> how can I use an independent loopback device (127.0.0.1/8) in a vserver (patch vs2.0-rc5)?

You'll need to use NGN to do that.

After extensive testing and experimentation, I concluded that the GNU C libraries had it hardcoded in too many places that localhost == 127.0.0.1. It's like the current situation where you have only a single interface; except more things than NTP fail, including things like SSH port forwarding.

This means turning on the formidable "disable legacy network API" option. Make sure you play around with it extensively on a development system first.

Sam.
Re: [Vserver] Summary of recent improvement discussion
Gilles wrote:
>>> Bootstrapping Images
>>
>> I haven't seen this being discussed on the list, I hope I'm not about to say anything sacrilegious, but am I in the minority to think that the build tools do not belong in util-vserver at all?
>
> Although not knowing much about it, I would think so too.

I would like to see the tools as being inclusive, rather than minimal. You wouldn't expect to buy a car, then have to go somewhere else to get the seats and panelling.

It is easily possible to have the source build into more than one package, make parts of it optional, etc., to avoid forcing everyone to install tons of what they might perceive as rubbish just to use vserver.

Sure, projects like OpenVPS or StrongBox that have a different goal (providing a package for business-level objectives rather than system-level ones) still need to be separate projects (but then, maybe parts of OpenVPS do belong in util-vserver, maybe all of it, who knows!).

There may be a time when this really does deserve to be taken out and put in another package, but I don't think that time has come yet...

Sam.
Re: [Vserver] OpenFoundry project for Vserver Utilities
Ola Lundqvist wrote:
> As I now have the possibility to apply a number of fixes directly to this upstream version I will do so. I need your advice on a number of things though. If you do not want me to modify some parts here please tell me as I may do it quite soon. :)

This is wiki-style development; don't even worry too much about the code compiling. Just make sure that your commits have clear intent. Have trust in the Version Control System!

That being said, commits which are untested and don't compile make a bit of work for others around release time. So be sure to at least hang out in IRC to receive the flames.

> * Is a dependency on perl for building acceptable? I need it as one of the manpages (vserver-build) is written in .pod format and needs pod2man (provided by perl) to convert to manpage format. I'm not

In subversion you should use a conveniently editable format. If the perl dependency is an issue to anyone, it can always happen at the logical equivalent of `make dist' time, being a maintainer build dependency instead.

Heck, tools could be written in Haskell and compiled to C during maintainer release production, if it made development easier!

> * Do you want me to provide the tools available in the vserver-debiantools package? It is a couple of wrappers around debootstrap or 'vserver ... build' that make the build process slightly easier.

Sure, throw them in. Ideally they should work with a non-debian host, too, but beggars can't be choosers, so we can start with what works for some first.

Your commits look great so far! Thanks for your hard work,

Sam.
Re: [Vserver] OpenFoundry project for Vserver Utilities
Christian Heim wrote:
> Well, if you like, I have holidays now and would like to do something for vserver ;)

Excellent! Well, you should have received your invitation, so feel free to muck in! If you're stuck for ideas, even exploring open issues and adding notes to the code where you think it might need changing can help.

But no doubt you'll be adding lots of gentoo related enhancements?

Sam.
Re: [Vserver] OpenFoundry project for Vserver Utilities
Michal Ludvig wrote:
> Don't take it personally as I just wonder - is your effort blessed by the current project maintainers? (Herbert, Enrico, ...) Or is it a fork attempt?

No-one's trying to fork anything. If anything, we're trying to help people pull their forks into the one repository. "We" so far includes Herbert, Björn, Ola and me. Enrico and Jacques have also been invited.

> I'm not sure if you have experience with development management, but allowing wiki style development seems counterproductive to me.

You need a certain critical mass of eyeballs watching the changes go in before it works. Another important thing is automated smoke testing to track down where one fixed bug causes four new ones. Which of course we don't have yet, but the theory is good! :)

> I have participated in many opensource projects and I know how naive the patches are that people (incl. me) try to commit at the beginning. They could fix the particular problem they address, but without knowing the broader context they'd often break something else.

Yes, that is fine. If you change something and break the tests, or someone else sees the change and realises that there are conditions that it would break, then it raises an issue that must be solved. Ideally by first adding a failing test to demonstrate the problem and then fixing that.

If the change is decided to be a larger piece of work, then a branch can be made easily.

The worst case is to reverse an individual change, in which case it is still not a complete loss: the person who made the change finds out why what they were doing was wrong. If they don't learn, they lose their commit bit.

The important point is to encourage exchange of ideas through code, and to connect eager hands with yearning code.

> The amount of crap getting in this way would be a real pain in the ass to clean up once you're about to make a release.

Changes to parts considered production will generally be reviewed as they are made.
> Eh, and one more proven thing - set up a mailing list where all diffs from commits would be automatically posted by svn. If you really allow everyone to commit everything you should at least post-examine the changes.

I'm working on this now!

Thanks for your input,

Sam.
[Vserver] OpenFoundry project for Vserver Utilities
Hi all,

I have set up a new project on OpenFoundry.org for util-vserver. OpenFoundry is like SourceForge, except it doesn't suck.

  http://utilvserver.openfoundry.org/

(no hyphens allowed in project names! bummer)

For now the important thing it has is a public read-only Subversion server, and it is trivial for project Admins to invite other people to be committers. So, you can grab the latest version of util-vserver from:

  http://svn.openfoundry.org/utilvserver/trunk/

Then use "svn update" and friends to pull down new versions! :-D

The older versions (produced by Enrico) have been imported to the repository as well.

Ola, Jacques, Enrico, you should all have received invitations to join OpenFoundry. If anyone else thinks they might have contributions to make, please drop by #vserver on irc.oftc.net to discuss your ideas!

Become a util-vserver committer today! :)

Sam.
[Vserver] Summary of recent improvement discussion
Herbert made an open call for suggestions for improvements today in IRC.

It was noted that a list of enhancements and feature requests already exists:

  http://linux-vserver.org/ToDo+List+Tools

As well as a list of bugs for util-vserver (anyone care to add these to the OpenFoundry issue Tracker?):

  http://savannah.nongnu.org/bugs/?group=util-vserver

Automated Test Suites
~~~~~~~~~~~~~~~~~~~~~

The idea of automated test suites was suggested. The use of `qemu' (a virtual machine-ish package), in concert with passing "init=testme.sh" with a `standard' image, can already be used to achieve some level of automated testing.

Also, Herbert and others have systems with remote reboot switches and serial consoles, which can be semi-automatically used for regression testing vserver.

With a little polish, it may be possible to make it trivial to run a full test of all vserver features with a single Makefile target, so you could run something like:

  make qemu_test

or

  make harness_test console=/dev/ttyS1 reboot=/dev/ttyS2

to 'smoke test' the utilities (and the kernel patch too) on virtual or real hardware when playing with changes.

Bootstrapping Images
~~~~~~~~~~~~~~~~~~~~

The status of debootstrap and `rpmstrap' in the current utilities was briefly discussed, so that vservers of lots of different types could easily be built without installing extra utilities manually.

Björn pointed out scripts/vserver-build.debootstrap in the util-vserver distribution.

There are also conflicts with some combinations of debian and rpm host vs guest building. The basic problem was agreed to be the way the tools try to install all the packages from the outside of the vserver, rather than the inside.

Obviously each solution has its own benefits and disadvantages, but only bootstrapping the package utilities should need a packaging tool installed on the outer vserver, and that should be easily circumvented via guest images.
Dummy functionality patch
~~~~~~~~~~~~~~~~~~~~~~~~~

Herbert also posted a link to a patch he had prepared for the new functionality of:

  * cloning vservers
  * re-configuring vservers
  * destroying vservers

  http://vserver.13thfloor.at/Experimental/delta-vserver-bertl01.diff

I hope this is useful to someone!

Sam.
Re: [Vserver] FW: Oracle 10g... any Production Environments on VServer?
Matthew Nuzum wrote:

I'm a big postgres fan and closely follow the performance mailing list. These features sound intriguing so I'm going to enquire about their status.

Ah, my plan is falling into place... rubs hands together

features are available to me. BTW, one interesting feature that Oracle has is the ability to store hierarchical data in a flat db table and pull it out in one query. For example: [...] This takes several queries in PostgreSQL.

It sounds great in theory, doesn't it? Then I found out that you can't use it for anything 'useful', for instance by passing in a table column alias to the START WITH from an outer query, which seemed to me the most natural way to use it;

  select t1.id, t1.name, tn.id as child_id
  from mytable t1
  left join (select t2.id
             from mytable t2
             start with t2.id = t1.id
             connect by prior id = parent_id
            ) tn on top_id = t1.id
  where t1.name like '%foo%';

That's not a valid query; in fact I couldn't really see a way I could use it to generically do 'recursive' joins, to pretend that a hierarchical relationship is a mapping table or something like that, even using views and such trickery. However, it sure is useful for indenting hierarchical results for a single hierarchy in display, like its EXPLAIN PLAN statement.

I've seen this a lot with Oracle. Some feature sounds great, then you try to use it and find it's not as useful as you thought, for a very trivial yet seemingly insurmountable reason (and I refuse to learn any DB-specific 4GLs ;)).

I was dumbfounded when a bug in functional indexes gave me bogus results for a query (if some silly conditions held), and there was simply no patch available for a supposedly stable database. So we had to upgrade to a new major version (and of course we found other bugs there too).

Tangram has ways to work around this problem in a DB-independent way, so I'm not particularly worried :).
The information you found about these features is interesting; it sure would be great if Pg is maturing enough to be a viable replacement! Thanks for the off-topic banter :) Sam. ___ Vserver mailing list Vserver@list.linux-vserver.org http://list.linux-vserver.org/mailman/listinfo/vserver
Re: [Vserver] Oracle 10g... any Production Environments on VServer?
Herbert Poetzl wrote:

yeah, well, that's the beauty of proprietary services ... btw, postgresql is a very fine alternative to oracle, and this is not just hogwash told by folks who never used oracle before ... but of course YMMV

Sadly, Postgres is missing these important features;

 - bitmap indexes
 - OLAP query re-writing

Without those, our database would run like cold treacle.

Sam.
Re: [Vserver] Oracle 10g... any Production Environments on VServer?
Herbert Poetzl wrote: Sadly, Postgres is missing these important features; - bitmap indexes - OLAP query re-writing I have absolutely no idea what you are talking about ... but: New Enterprise Features in 7.4 * Hash aggregation in memory to make data warehousing and OLAP queries up to 20 times faster; (they are now at 8.0.1 or later ...) Well, given you asked, and NO THIS ISN'T A FLAMEWAR INVITATION LURKERS :) Bitmap indexes are a simple concept, and last time I checked there were Pg patches for them (using Pg's pluggable index system), but they weren't standard. Looking now, all I see is the occasional question on their mailing list followed by a few clueless replies ('do you mean this...?'). All they are is a B-Tree on the distinct *values* of a column, and then a very long bitmap for each value, one bit for each row in the table, with 1's in the rows where that value is held. A low CPU overhead compression scheme makes these fairly efficient. It means that if you're joining together lots of query conditions on columns with discrete values, it can be reduced to bitwise operations on these very long bitmaps; on a modern CPU the actual expanded bitmap might only actually end up in L1, and the CPU can crank through them at 1.6GHz * 64 * 4 (or however many ALUs your CPU has); still usually limited by IO capability of course. For data mining applications, this saves a *lot* of time, sometimes multiple orders of magnitude. And that's still simple. The OLAP query re-writing is even funkier. OLAP is a generic term for a large range of technologies, so there seems to be some things in there labelled for OLAP. There are lots of tricks that solve the goal of OLAP, no doubt most of which I am ignorant of. But in particular, one thing that Oracle does really nicely is the way you can make a view materialized - ie, the computed view is kept around, rather than being generated as needed. 
Then, when you perform queries on the original table that Oracle figures out could use the computed view to avoid looking at the original table (or improve speed by using an index, perhaps), then it transparently re-writes the query to instead use the materialized view (assuming you know how to flick all the relevant switches that only advanced Oracle DBAs can reach). The upshot of that is that you can take virtually any regularly repeated query, or hopefully a wide range of common queries, and manually help the database along by telling it what to pre-calculate. And you don't even have to 're-run' the queries when the source data changes - it has support for minimally updating just the bits that changed. Oracle certainly has a significant feature lead on Pg for data mining. Without those, our database would run like cold treacle. well, there are a bunch of SQL 'features' not present in Oracle either ... so it really depends on the requirements Absolutely. I think Oracle stinks as a general purpose application server back-end. It's buggy as a VW convention, heavy as a lead elephant and as snappy as old celery for small transactions. Sam. ___ Vserver mailing list Vserver@list.linux-vserver.org http://list.linux-vserver.org/mailman/listinfo/vserver
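The bitmap idea described above can be sketched in a few lines of shell. This is only a toy, assuming a made-up five-row table with two columns (the rows, the column names and the `bitmap` helper are all hypothetical); it keeps one integer per distinct value, with bit N set when row N holds that value, and leaves out the B-Tree and the compression entirely.

```shell
#!/bin/sh
# Toy table, one row per line: "colour size". Row N owns bit N (from 0).
# The data is invented purely for illustration.
ROWS='red small
blue large
red large
green small
red large'

# bitmap COLUMN VALUE
# Print an integer whose bit N is set when row N has VALUE in COLUMN.
bitmap() {
    printf '%s\n' "$ROWS" | awk -v col="$1" -v val="$2" '
        $col == val { bits += 2 ^ (NR - 1) }
        END { printf "%d\n", bits }'
}

red=$(bitmap 1 red)        # rows 0, 2, 4 -> 1 + 4 + 16 = 21
large=$(bitmap 2 large)    # rows 1, 2, 4 -> 2 + 4 + 16 = 22

# "colour = red AND size = large" becomes a single bitwise AND.
both=$(( red & large ))    # 20 -> rows 2 and 4
echo "red=$red large=$large both=$both"
```

A real implementation works on bitmaps far longer than a machine word, which is where the compression and the raw ALU throughput mentioned above come in.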
Re: [Vserver] Linux-VServer Community Fund?!
Gregory (Grisha) Trubetskoy wrote: What I've seen work great in the past is if you establish a target amount, i.e. this much will keep us going for the next 6 months, then run a capmain to reach it with a running total on a webpage. Don't be timid, post to /. :-) . Usually you'll get more than what you asked for. This is a great example: http://people.freebsd.org/~phk/funding.html Interesting. How about making hosting company's ranking on: http://linux-vserver.org/VServer+Hosting depend on how much coin they've put forward! -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list Vserver@list.linux-vserver.org http://list.linux-vserver.org/mailman/listinfo/vserver
Re: [Vserver] stopping a context with zombie
Gregory (Grisha) Trubetskoy wrote: I'm not aware of a way to eliminate a zombie. kill the PPID. Use vps j or vps -o ppid to see what they are. If they're zombies owned by init or the fakeinit you've got a worse problem :). -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list Vserver@list.linux-vserver.org http://list.linux-vserver.org/mailman/listinfo/vserver
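As a minimal sketch of the advice above: the filter below reads `pid ppid stat command` lines and prints the parent PID of every zombie, once each. The `zombie_parents` name is made up for this example, and it assumes a ps that accepts `-o pid=,ppid=,stat=,comm=` (procps does); on a vserver host `vps` would take the place of `ps`.

```shell
#!/bin/sh
# zombie_parents: read "pid ppid stat command" lines on stdin and print
# the parent PID of every process whose state starts with Z (zombie).
zombie_parents() {
    awk '$3 ~ /^Z/ { print $2 }' | sort -un
}

# Typical use (not run here):
#   ps -eo pid=,ppid=,stat=,comm= | zombie_parents
# Each PID printed is a candidate for kill; if it is 1 (init), killing
# the parent is not an option.
```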
Re: [Vserver] Re: vsched segfault (and workaround ;)
Adrian Reyer wrote:

I attach an strace of the failed call:

# strace vsched --xid 49161 --fill-rate 2 --interval 100 --tokens 499 --tokens-min 1 --tokens-max 999 --prio-bias 0
[...]
vserver(0xe010003, 0xc009, 0x7fbac0, 0x2, 0) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

strace is no good for getting useful information about segfaults. However, it is easy to find out which function caused the segfault and this information can be invaluable for developers:

 - compile libraries and binaries with gcc -g (usually setting CCOPTS=-g during `make' phase is enough for this)

 - run program with gdb, and use bt to get a stack trace;

     $ gdb /path/to/binary
     (gdb) run --xid 49161 ...
     ...
     SEGV
     (gdb) bt

Posting the entire gdb session output is usually worthwhile.

In this case, it's pretty obvious where the fault is happening, but for next time ;-).

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
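The gdb steps above can also be run non-interactively, which makes the output easy to paste into a mail. A small sketch, assuming gdb is installed and the binary was built with -g; the `gdb_backtrace` wrapper name is made up, while `-batch`, `-ex` and `--args` are standard gdb options.

```shell
#!/bin/sh
# gdb_backtrace PROGRAM [ARGS...]
# Run PROGRAM under gdb without an interactive session; if it crashes,
# gdb prints the stack trace (bt) and then exits.
gdb_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Example (path and arguments are placeholders):
#   gdb_backtrace /path/to/binary --xid 49161 --fill-rate 2 --interval 100
```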
Re: [Vserver] apt-get and vserver problem
Vincenzo Agosto wrote:

Do you have a firewall rule in place to NAT traffic from the vserver IP address to the real IP address?

nope, nothing rule

Better set one up then! :-)

  http://www.paul.sladen.org/vserver/archives/200410/0256.html
  http://www.paul.sladen.org/vserver/archives/200305/0151.html

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] Next Generation Networking ...
Matt Nuzum wrote:

Some NICs (all?) have the ability to set or change the MAC address, or at least somehow affect what their mac address appears to be.

That's right, some do. I've tried this on many chipsets and found support varies even between boards based on the same chipset. But the general rule is that most cards can do it.

This is how you set it:

  ifconfig eth0 hw ether 12:34:56:78:9a:bc

The interface can't be up at the time, or you'll get SIOCSIFHWADDR: Device or resource busy. And boy, am I lucky that I've got a serial console attached to the system whose root shell I just mistook for my laptop's for testing.

In fact, with Debian you can set this in /etc/network/interfaces; other distros probably have their little place for it too:

  iface eth0 inet dhcp
      hwaddress ether 12:34:56:78:9a:bc

Also, there's the concept of network bridging, which I've never used in linux but know works in Windows and OpenBSD. In Windows, the newly created Bridge gets a mac address and dishes the data to the right network card somehow.

With bridging, your host becomes a switching hub. It maintains an ARP table remembering which hardware address can be reached on which interface. Then, when it sends packets out - it masquerades the hardware address with the source address of the original packet. All bridging interfaces run in promiscuous mode, and if packets are seen which are not known to be on the same side as they originated, they are transmitted across the bridge.

There's a little bit more complexity to it than that, for when you've got lots of switches, spanning protocols etc that I don't know a lot about myself. The bible in the area is Andrew S. Tanenbaum's _Computer Networks_, now in its third edition.

Sam.
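To avoid the "Device or resource busy" trap mentioned above, the down/set/up sequence can be wrapped up. The sketch below only prints the commands rather than running them, so it is safe to try on the wrong root shell; `is_mac` and `set_mac_cmds` are hypothetical helper names, and the ifconfig invocations are the ones from the message plus the usual `down` and `up`.

```shell
#!/bin/sh
# is_mac ADDR: succeed if ADDR looks like six colon-separated hex octets.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# set_mac_cmds IFACE ADDR
# Print (not run) the commands needed to change the hardware address:
# the interface has to be down while the address is set.
set_mac_cmds() {
    is_mac "$2" || { echo "invalid MAC address: $2" >&2; return 1; }
    echo "ifconfig $1 down"
    echo "ifconfig $1 hw ether $2"
    echo "ifconfig $1 up"
}

# Example: review the output, then pipe it to sh as root if it is right.
#   set_mac_cmds eth0 12:34:56:78:9a:bc
```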
Re: [Vserver] cpu limits clone vservers
Jörn Engel wrote: ...and the big challenge is - how do you apply this to memory usage? Oh, you could. But the general concept behind my unquoted list is a renewing resource. Network throughput is renewing. Network bandwidth usually isn't. With swapping, you can turn memory into cache and locality to the cpu is a renewable resource. Yep, that was my thought too. Memory seems like a static resource, so consider RSS used per second the renewable resource. Then you could charge tokens as normal. However, there are some tricky questions: 1) who do you charge shared memory (binaries etc) to ? 2) do you count mmap()'d regions in the buffercache? 3) if a process is sitting idle, but there is no VM contention, then they are using that memory more, so maybe they are using more fast memory tokens - but they might not really be occupying it, because it is not active. Maybe the thing with memory is that it's not important about how much is used per second, but more about how much active memory you are *displacing* per second into other places. We can find out from the VM subsystem how much RAM is displaced into swap by a context / process. It might also be possible for the MMU to report how much L2/L3 cache is displaced during a given slice. I have a hunch that the best solution to the memory usage problem will have to take into account the multi-tiered nature of memory. So, I think it would be excellent to be able to penalise contexts that thrash the L3 cache. Systems with megabytes of L3 cache were designed to keep the most essential parts of most of the run queue hot - programs that thwart this by being bulky and excessively using pointers waste that cache. And then, it needs to all be done with no more than a few hundred cycles every reschedule. Hmm. Here's a thought about an algorithm that might work. This is all speculation without much regard to the existing implementations out there, of course. Season with grains of salt to taste. 
Each context is assigned a target RSS and VM size. Usage is counted a la disklimits (Herbert - is this already done?), but all complex recalculation happens when something tries to swap something else out.

As well as memory totals, each context also has a score that tracks how good or bad they've been with memory. Let's call that the Jabba value.

When swap displacement occurs, it is first taken from disproportionately fat jabbas that are running on nearby CPUs (for NUMA). Displacing others' memory makes your context a fatter jabba too, but taking from jabbas that are already fat is not as bad as taking it from a hungry jabba. When someone takes your memory, that makes you a thinner jabba.

This is not the same as simply a ratio of your context's memory usage to the allocated amount. Depending on the functions used to alter the jabba value, it should hopefully end up measuring something more akin to the amount of system memory turnover a context is inducing. It might also need something to act as a damper to pull a context's jabba nearer towards the zero point during lulls of VM activity.

Then, if you are a fat jabba, maybe you might end up getting rescheduled instead of getting more memory whenever you want it!

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] Casual, naïve implementation of namespace cleanup
Björn Steinbrink wrote: And here it finally is... It's a pretty hackish patch, but who cares? ;) Did you mean to include this link you posted on IRC? http://doener.homeip.net/doener/vserver/util-vserver-0.30.196-clean-namespace-test5.diff -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list [EMAIL PROTECTED] http://list.linux-vserver.org/mailman/listinfo/vserver
[Vserver] Namespaces wish list
Herbert Poetzl wrote:

I guess we should move away from what we have now, get some distance, and think about what we want to have in let's say half a year (or maybe a year) then start to work in that direction ...

Alright! Well, Christmas is coming, so from context 0 (or 1) I'd like to be able to do this (unless the context has a flag set):

  # ls /proc/virtual/61823/namespace
  bin   dev  home    lib         media  opt   root  srv  tmp  var
  boot  etc  initrd  lost+found  mnt    proc  sbin  sys  usr

And I want it to give a different device number inside there so I can use `find -xdev' on /proc to just search proc, not the filesystems of all the vservers too.

And I want to be able to do this:

  # mount --rbind /proc/virtual/61823/namespace /mnt/foo

And I want this to opportunistically create a new namespace and automatically get rid of unreachable mounts:

  # chroot /mnt/foo/. sh -c "cat /proc/mounts"
  /dev/root / ext3 rw 0 0
  procns /proc proc rw,nodiratime 0 0
  shmns /tmp tmpfs rw 0 0
  ptsns /dev/pts devpts rw 0 0

I'd like /proc/PID/mounts to be:

 - a symlink to /proc/mounts if the namespace hasn't been changed from the last pivot_root (or boot) in the host system, or if it is the same as the ``system'' namespace, if such a thing exists

 - a symlink to /proc/virtual/XID/mounts, if the process is `in' the context in namespace terms

 - a normal file, straight after a CLONE_NS, which has the same inode number on the /proc filesystem as any process with that namespace, and the number of links on it corresponds to the number of processes in that namespace.

Maybe also, similarly with /proc/PID/namespace, a symlink to /, /proc/virtual/XID/namespace, or a real directory.

And I want them all to virtualise magically so that you can create a vserver that can have vservers within it and not be able to tell the difference just by looking at /proc/mounts or /proc/PID/namespace. And I don't want to have to give contexts full mount ability to do that.
And I want the implementation to think of a day when we can have vservers within vservers, maybe an s_context has a parent s_context. (Zombie contexts! yay!)

And I want to be able to kill off all processes on my context 0 system, get init to chroot(2) into a new filesystem I made, then all the other mounted filesystems just fall off the bottom of the mounts table and get umounted (as no namespaces are referring to them).

And I'd like a Pony.

TIA,
-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] Namespaces wish list
Sam Vilain wrote: I'd like /proc/PID/mounts to be: - a symlink to /proc/mounts if the namespace hasn't been changed from the last pivot_root (or boot) in the host system, or if it is the same as the ``system'' namespace, if such a thing exists - a symlink to /proc/virtual/XID/mounts, if the process is `in' the context in namespace terms - a normal file, straight after a CLONE_NS, which has the same inode number on the /proc filesystem as any process with that namespace, and the number of links on it corresponds to the number of processes in that namespace. As has been pointed out to me, /proc/mounts is actually the symlink in the current kernel. Maybe underneath that glaring mistake, there's the kernel of a good idea :) -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list [EMAIL PROTECTED] http://list.linux-vserver.org/mailman/listinfo/vserver
[Vserver] Casual, naïve implementation of namespace cleanup
Hi all,

The following patch, to vserver.functions in the util-vserver distribution, will do something of a `namespace cleanup' in lieu of the rework to the vserver startup and mount cleanup process that Enrico has planned (I'm told).

That is, with this patch, any filesystems which are NOT within the vserver vdir, or one of its parents, will be unmounted before the vserver's fstab is processed, which certainly isn't as tidy as can be done outside of a shell hack, but will probably work for many.

This is necessary, so that running vservers don't hold a filesystem which is outside their chroot open due to namespaces.

If you are not using namespaces, it will try and unmount virtually every filesystem on your system when you start a vserver. Be warned. In case it is not clear THIS IS A HACK NOT AN ENDORSED PATCH! :-)

clunker:/usr/local/lib/util-vserver# diff -u vserver.functions{.orig,}
--- vserver.functions.orig 2004-11-02 12:47:33.000000000 +1300
+++ vserver.functions 2004-11-02 12:48:27.000000000 +1300
@@ -667,6 +667,29 @@
 test -z $NAMESPACE_CLEANUP || isAvoidNamespace $cfgdir || \
     $_VNAMESPACE --cleanup
 
+real_vdir=`cd "$vdir" && pwd -P`
+avoid="$real_vdir(/[^ ]*)?"
+while [ -n "$real_vdir" ]
+do
+    real_vdir=`expr "$real_vdir" : '\(/.*\)/[^/]*'`
+    if [ -n "$real_vdir" ]
+    then
+        avoid="$avoid|$real_vdir"
+    fi
+done
+
+pattern="^[^ ]* ($avoid|/) "
+
+#echo "IGNORED MOUNTS ($pattern):"
+#cat /proc/mounts | tac | egrep "$pattern"
+#echo "REMOVED MOUNTS:"
+cat /proc/mounts | tac | egrep -v "$pattern" |
+    while read dev mntpoint junk
+    do
+        #echo "unmounting $mntpoint"
+        umount "$mntpoint"
+    done
+
 _mountVserverInternal $cfgdir/fstab $_CHBIND [EMAIL PROTECTED]
 _mountVserverInternal $cfgdir/fstab.local

An alternative, if you are not comfortable changing distributed files (and who is?) is to use something akin to this in /etc/vservers/.defaults/pre-start:

/etc/vservers/.defaults/scripts/pre-start:
#!/bin/sh
# NOTE: this script will not work in the default configuration

VS=`pwd | sed -e 's/\/vdir//;s/.*\///'`

cat /proc/mounts | tac | perl -nlaF/\\s+/ -e 'BEGIN{$VS=shift};
    ($dev, $loc) = @F;
    if ($loc =~ m{^/(vservers(/$VS(/.*)?)?)?$}) {
        print "not unmounting $loc ($dev)"
    } else {
        print "unmounting $loc ($dev)";
        system("umount", "-nv", $loc)
    }
' $VS
---

However, this does not work, because (for example) `/proc' will appear in /proc/mounts three times - once for the root server on /, once for the vserver on /vservers/foo/proc, and then the same mount again which has been overlaid in the VFS table with the recursive bind mount that binds /vservers/foo to /. That is, there are at pre-start time, two mounts on /proc according to /proc/mounts.

A simple workaround, to keep with the above approach, assumes that all mounts that fit into the above category don't have a device that has `/dev' in their name, and you don't care about those that are in the above category appearing an extra time in /proc/mounts.

/etc/vservers/.defaults/scripts/pre-start:
#!/bin/sh

VS=`pwd | sed -e 's/\/vdir//;s/.*\///'`

cat /proc/mounts | tac | perl -nlaF/\\s+/ -e 'BEGIN{$VS=shift};
    ($dev, $loc) = @F;
    if ($loc =~ m{^/(vservers(/$VS(/.*)?)?)?$} or $dev !~ /dev/) {
        print "not unmounting $loc ($dev)"
    } else {
        print "unmounting $loc ($dev)";
        system("umount", "-nv", $loc)
    }
' $VS
---

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
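The keep-or-unmount rule the patch above encodes as a regular expression can be tried out without touching any mounts. The sketch below is a dry run under the same assumptions as the message (a mount survives if it is /, the vdir itself, something below the vdir, or one of the vdir's parent directories); `keep_mount` and `plan_cleanup` are made-up names, and nothing is actually unmounted.

```shell
#!/bin/sh
# keep_mount VDIR MOUNTPOINT
# Succeed if MOUNTPOINT must stay mounted for a vserver rooted at VDIR.
keep_mount() {
    km_vdir=$1
    km_mnt=$2
    case "$km_mnt" in
        /) return 0 ;;                            # the root filesystem
        "$km_vdir"|"$km_vdir"/*) return 0 ;;      # vdir and below
    esac
    # Walk up through the parents of vdir: /a/b/c -> /a/b -> /a
    km_dir=$km_vdir
    while [ -n "$km_dir" ]; do
        km_dir=${km_dir%/*}
        if [ -n "$km_dir" ] && [ "$km_mnt" = "$km_dir" ]; then
            return 0
        fi
    done
    return 1
}

# plan_cleanup VDIR
# Read /proc/mounts-style lines on stdin and print the umount commands
# that a cleanup would issue. Nothing is executed.
plan_cleanup() {
    while read -r pc_dev pc_mnt pc_rest; do
        keep_mount "$1" "$pc_mnt" || echo "umount $pc_mnt"
    done
}

# Example (tac reverses the order so that nested mounts come first):
#   tac /proc/mounts | plan_cleanup /vservers/foo
```

Comparing whole path components, rather than matching a pattern, also means /vservers/foobar is not mistaken for something inside /vservers/foo.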
Re: [Vserver] Plesk 7
Gregory (Grisha) Trubetskoy wrote: and rising energy costs. As servers get faster, they will consume more power (9W for a 90MHz Pentium vs 75W for a 2GHz), and at some point this FWIW the VIA C3 Nehemiah running at 1GHz draws only 11.25W. Drop it down to a 600MHz and you're down to 4.5W. -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list [EMAIL PROTECTED] http://list.linux-vserver.org/mailman/listinfo/vserver
[Vserver] [Applause] Development 1.9.3 'Samhain'
Herbert Poetzl wrote:

Here goes the fourth Linux 2.6 development version of Linux-VServer aka vs1.9.3 'Samhain'

Well done Herbert! I find it outstanding how a complete port to 2.6 has happened with so much attention to detail in what seems like just a few months. It was January 4th 2004 that the first experimental Linux 2.6 vserver port was released, against linux 2.6.0. The first stable release, vs1.9.0, was dated May 11th.

Herbert's incredible level of compassion towards administrators and integrators with difficulties asking for help in the #vserver channel has been the catalyst for getting the community thriving to test and assist his no less than legendary coding performance.

Enrico's corresponding reforming and redevelopment of the userland utilities was no mean feat. From its complete tracking of the myriad of vserver API changes through the 1.3.x transitional period to the current day, I am constantly surprised by the comprehensiveness of these tools.

The result, a stable, manageable release, well in time for the community-wide consensus that Linux 2.6 itself can possibly be considered resembling something that might just happen to look like it's stable. If you're lucky. I look forward to that time.

Apologies for not mentioning anyone else who deserves an honourable mention, interested readers should definitely get a feeling for the size of the community by browsing:

  http://www.linux-vserver.org/Hall+of+Fame

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
[Vserver] EBUSY on rmdir of a previous mount point with namespaces
Found that tricky can't-remove-the-mount-point bug.

clunker:/vservers# mkdir compileit
clunker:/vservers# grep compileit /etc/fstab
/dev/clunker/compileit /vservers/compileit ext3 defaults 1 2
clunker:/vservers# mount compileit/
kclunker:/vservers# journald starting. Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
vserver bind start
Mounting shadow filesystems for bind
Starting system log daemon: syslogd.
Starting kernel log daemon: klogd.
Starting domain name service: named.
clunker:/vservers# vserver bind exec grep comp /proc/mounts
clunker:/vservers# grep comp /proc/mounts
/dev/clunker/compileit /vservers/compileit ext3 rw 0 0
clunker:/vservers# vserver bind exec cat /proc/mounts
clunker:/vservers# umount compileit
clunker:/vservers# rmdir compileit
rmdir: `compileit': Device or resource busy
clunker:/vservers# vserver bind stop
Sending all processes the TERM signal...done.
Sending all processes the KILL signal...done.
clunker:/vservers# rmdir compileit/
clunker:/vservers#

Look! It works with tmpfs, too!

clunker:/vservers# mkdir foo
clunker:/vservers# mount -t tmpfs none foo
clunker:/vservers# vserver bind start
Mounting shadow filesystems for bind
Starting system log daemon: syslogd.
Starting kernel log daemon: klogd.
Starting domain name service: named.
clunker:/vservers# umount foo
clunker:/vservers# rmdir foo
rmdir: `foo': Device or resource busy
clunker:/vservers# vserver bind stop
Sending all processes the TERM signal...done.
Sending all processes the KILL signal...done.
clunker:/vservers# rmdir foo
clunker:/vservers#

This really shouldn't happen for mount points which are entirely outside the chroot of the new namespace, but I think this may be another point of our `chroot/pivot_root/vnamespace/mount --rbind/chcontext' chicken, egg, rooster, barn and farmer problem.

For mount points which are *inside* the chroot, is this a bug or a feature?
Is it possible to have a filesystem mounted on a path in one namespace, then remove the underlying directory?

Versions:
  Kernel: 2.6.9-final-vs1.9.3-rc3
  VS-API: 0x00010022
  util-vserver: 0.30.195; Oct 10 2004, 16:55:15

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] Cannot set defaultroute under vserver
TK Lew wrote:

I have read the LARTC but until now I still cannot ping or get vapt-get working.

You need to use SNAT;

I started a vserver using eth1 and have a ppp0 connection to the internet. Vserver IP : 192.168.100.200 This what i did using ip route ::

  203.106.129.217 dev ppp0 proto kernel scope link src 210.195.72.82

Assuming your world IP address is 210.195.72.82, then you need to run a command like this after your network is up:

  iptables -t nat -A POSTROUTING -o ppp0 -j SNAT --to-source 210.195.72.82

Others have mentioned tools like fwbuilder and shorewall, which boil down to issuing commands like the above as well as providing nicer interfaces for doing all other things iptables.

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] quick vsched howto
Gregory (Grisha) Trubetskoy wrote:

vsched takes the following arguments:

--fill-rate
  The number of tokens that will be placed in the bucket.

--interval
  How often (the above specified) number of tokens will be placed. This is in jiffies. Through some googling I've found references that a jiffy is about 10ms, but it seems to me it's less than that. Not sure if the CPU speed has bearing on it. (Anyone know?)

The important factor is the ratio;

  fill-rate
  --------- * 100 = % CPU allocation
  interval

Note that this is the proportion of a *single* CPU in the system. So, if you have four CPUs and you want one context to get an average of one whole CPU to itself, then you'd set fill-rate to 1 and interval to 4.

It is advantageous to smooth operation of the algorithm to make the interval as small as possible (or much smaller than the bucket size). You can in most cases simplify the fraction, such as changing --fill-rate=30 and --interval=100 to --fill-rate=3 and --interval=10.

For simple cases, like evenly distributing cpu time between vservers, you probably just want to set the ratio to somewhere between 1/N (where N is the number of servers) and 1/P (where P is the maximum expected peak load per CPU), and not bother with hard scheduling. Process count ulimits will put an upper bound on possible abuse by a context.

When trying to come up with a good setting in my environment (basically hosting), I was looking for values that would not cripple the snappiness of the server, but prevent people from being stupid (e.g. cat /dev/zero | bzip2 | bzip2 | bzip2 > /dev/null).

To achieve this, it is important that contexts that are being CPU hogs are penalised fairly quickly...

As the tokens in the bucket deplete, the nice value of the contexts is adjusted - they lose their vavavoom. As this happens, the processes get shorter and shorter timeslices. Other, more deserving processes will get longer timeslices and hence more CPU time.
Additionally, bear in mind that individual processes also get a minor nice boost or penalty, depending on whether those processes have been CPU hogs recently or not. This is diminished in vserver kernels compared to standard kernels, but should still have sufficient effect to counter extreme conditions.

The fill interval should be short enough to not be noticeable, so something like 100 jiffies. The fill rate should be relatively small, something like 30 tokens. Tokens_min seems like it should simply equal to the fill rate. The tokens_max should be generous so that people can do short cpu-intensive things when they need them, so something like 10000 tokens.

From the experimentation I did, I'd say 10,000 tokens is quite large - 10 seconds of real CPU time. Compare this with the default value of 500. If you've given a context 30% of the CPU as described above, then that actually means about 10-15 wall clock seconds of CPU hogging before the context gets appreciably penalised. For the algorithm to work best, I think you would want to reduce this to about 1-2 seconds' worth of jiffies.

You are right in saying that tokens_max is the burst CPU rate, so setting it to a large value like 10000, while setting the interval to a large value like 100, would indicate that you are optimising your system for batch scheduling (long time slices, higher overall throughput), not interactive use (short time slices, reduced throughput).

My guess is that min_tokens (not in my original implementation) is a batch optimisation as well, but perhaps small values (~10) are useful to avoid excessive context switching.

But then, I didn't really experiment with the hard scheduling side of things, so maybe if you are hard scheduling it is more important to make sure that the buckets don't normally run out.

Of course just because I wrote the original algorithm does not by any means lend much extra weight to my opinion on how to use it, and I invite others to respond with their experience.
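The arithmetic in the paragraphs above can be checked with a couple of helper functions. This is a back-of-the-envelope sketch, not how the kernel computes anything: `cpu_share` and `burst_seconds` are made-up names, and the burst estimate assumes a running context pays one token per jiffy while being refilled at fill-rate/interval, with HZ passed in by hand (1000 gives the 1 ms jiffy that the "10,000 tokens is 10 seconds" figure implies).

```shell
#!/bin/sh
# cpu_share FILL_RATE INTERVAL
# Long-run share of one CPU, as a percentage.
cpu_share() {
    awk -v f="$1" -v i="$2" 'BEGIN { printf "%.1f\n", f / i * 100 }'
}

# burst_seconds TOKENS FILL_RATE INTERVAL HZ
# Seconds a context can hog a CPU before a bucket holding TOKENS runs
# dry. Assumption: one token is spent per jiffy of CPU time.
burst_seconds() {
    awk -v t="$1" -v f="$2" -v i="$3" -v hz="$4" 'BEGIN {
        drain = 1 - f / i          # net tokens lost per jiffy
        if (drain <= 0) { print "never"; exit }
        printf "%.1f\n", t / drain / hz
    }'
}

cpu_share 30 100                  # prints 30.0
cpu_share 3 10                    # prints 30.0 (same ratio, simplified)
burst_seconds 10000 30 100 1000   # prints 14.3
burst_seconds 500 30 100 1000     # prints 0.7
```

With these assumptions a full bucket of 10000 tokens at a 30% share lasts a little over 14 seconds, which sits inside the 10-15 second range quoted above, while the default of 500 lasts well under a second.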
While playing with this stuff I've run into situations where a context has no tokens left, at which point you cannot even kill the processes in it. Don't panic - you can always reenter the context and call vsched with new parameters. Heh. I don't know if this is current behaviour or not, but I think the signals should really queue and the context will close as soon as the processes wake up and receive enough cycles to process them and exit. Sending -KILL signals would clean it up pretty quickly (as soon as enough tokens are allocated for the processes to run), as chances are they won't consume any tokens to receive a KILL signal. Though, it would be nice if they didn't need tokens allocated to be stopped via KILL. -- Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering) ___ Vserver mailing list [EMAIL PROTECTED] http
Re: [Vserver] [PATCH] immulink ioctl is not available on vs1.9.3-rc2, even with CONFIG_VSERVER_LEGACY
Here's the missing link. This is tested as working, but needs to be ported to ext2, reiserfs, etc.

--- linux-2.6.9-final-vs1.9.3-rc3/fs/ext3/ioctl.c.orig 2004-10-19 16:15:58.000000000 +1300
+++ linux-2.6.9-final-vs1.9.3-rc3/fs/ext3/ioctl.c 2004-10-19 16:33:43.000000000 +1300
@@ -58,11 +58,11 @@
                  *
                  * This test looks nicer. Thanks to Pauline Middelink
                  */
-                if ((oldflags & EXT3_IMMUTABLE_FL) ||
+                if (((oldflags & EXT3_IMMUTABLE_FL) ||
                         ((flags ^ oldflags) &
-                        (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL))) {
-                        if (!capable(CAP_LINUX_IMMUTABLE))
-                                return -EPERM;
+                        (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL | EXT3_IUNLINK_FL)))
+                                && !capable(CAP_LINUX_IMMUTABLE)) {
+                        return -EPERM;
                 }
 
                 /*

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
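The condition in the patched hunk can be restated outside the kernel as a plain predicate. This is a sketch of the logic only: the function name is hypothetical, the immutable and append bit values are the usual ext2/ext3 ones, and the IUNLINK value is an assumption based on the Linux-VServer headers of the time.

```python
EXT3_IMMUTABLE_FL = 0x00000010
EXT3_APPEND_FL    = 0x00000020
EXT3_IUNLINK_FL   = 0x08000000  # assumed vserver-specific value

PROTECTED = EXT3_APPEND_FL | EXT3_IMMUTABLE_FL | EXT3_IUNLINK_FL


def setflags_permitted(oldflags, flags, has_cap_linux_immutable):
    """Mirror the patched test: return False where the ioctl gives -EPERM.

    The capability is needed when the inode is already immutable, or when
    any of the protected bits differs between the old and new flag words.
    """
    needs_cap = bool(oldflags & EXT3_IMMUTABLE_FL) or \
                bool((flags ^ oldflags) & PROTECTED)
    return has_cap_linux_immutable or not needs_cap


# Toggling the immutable-unlink bit now needs CAP_LINUX_IMMUTABLE.
print(setflags_permitted(0, EXT3_IUNLINK_FL, has_cap_linux_immutable=False))
print(setflags_permitted(0, EXT3_IUNLINK_FL, has_cap_linux_immutable=True))
```

The point of the patch is the third member of the mask: without it, an unprivileged caller could flip the immutable-unlink bit on a file that is not already immutable.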
Re: [Vserver] Bringing down vsever brings down _all_ interfaces
Björn Steinbrink wrote:
> Did you build your kernel with CONFIG_SECURITY enabled? If so, make
> sure that you also enabled CONFIG_SECURITY_CAPABILITIES and that the
> module is loaded if it was built as a module. Otherwise the default
> capability handling is disabled and your vserver is therefore allowed
> to remove the interfaces.

Enrico,

Could you please make `chcontext --secure' confirm that the capabilities mask was changed, or that the `capability' module is loaded - this is extremely nasty behaviour that we should work around at all costs!

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
[Vserver] immulink ioctl is not available on vs1.9.3-rc2, even with CONFIG_VSERVER_LEGACY
Herbert,

Could you please add this (and the equivalent to ext3_fs.h, reiserfs_fs.h, etc) to the head branch? I'd like to keep using my ioctl() based scripts, until vunify works as well as unify-dirs, and there is a vserver-build.immucp

--- ext2_fs.h.orig 2004-10-17 20:29:10.000000000 +1300
+++ ext2_fs.h 2004-10-17 20:26:47.000000000 +1300
@@ -196,8 +196,13 @@
 #define EXT2_IUNLINK_FL                 0x08000000 /* Immutable unlink */
 #define EXT2_RESERVED_FL                0x80000000 /* reserved for ext2 lib */
 
+#ifdef CONFIG_VSERVER_LEGACY
+#define EXT2_FL_USER_VISIBLE            0x0C03DFFF /* User visible flags */
+#define EXT2_FL_USER_MODIFIABLE         0x0C0380FF /* User modifiable flags */
+#else
 #define EXT2_FL_USER_VISIBLE            0x0003DFFF /* User visible flags */
 #define EXT2_FL_USER_MODIFIABLE         0x000380FF /* User modifiable flags */
+#endif
 
 /*
  * ioctl commands

I've attached the immucp script that I find smashing for building new vservers in seconds. Shortly (once I get the vs1.9.3-rcX server I'm building up the way I like it ;-)), it will be online.

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)

#!/usr/bin/perl -w

use strict;
use Fcntl ':mode';
use Carp qw(verbose);

BEGIN {
    use vars qw(@POD_HOOKS $be_random $do_suid);
    no strict 'refs';

=head1 NAME

immucp - Duplicate structures, with immutability/immulink support

=cut

    push @POD_HOOKS, NAME => sub {
        my @m;
        ( @m = m/(\S+) - (.*)/ ) && do {
            *{PROGNAME} = sub { $m[0] };
            *{SHORT_DESC} = sub { $m[1] };
        }
    };

=head1 SYNOPSIS

immucp [options] dir1 dir2

=cut

    push @POD_HOOKS, SYNOPSIS => sub {
        my $a = $_;
        *{SYNOPSIS} = sub { $a }
    };

=head1 DESCRIPTION

immucp will copy the first location to the second location, but will
only ever make links, rather than copy. It is very similar to using
`cp -al', but has support for setting Linux inode attributes along the
way.

=cut

    push @POD_HOOKS, DESCRIPTION => sub {
        my $a = $_;
        *{DESCRIPTION} = sub { $a }
    };

=head1 COMMAND LINE OPTIONS

The following command line options are available:

=cut

    # Extract the command line options for the usage screen from the
    # POD ;-)
    use vars qw(@options);
    push @POD_HOOKS, 'COMMAND LINE OPTIONS' => sub {
        # This hook is deleted below under RELEASE
        Pod::Constants::add_hook
            (#-debug => 1,
             '*item' => sub {
                 my ($switches, $description) = m/^(.*?)\n\n(.*)/s;
                 my (@switches, $longest);
                 $longest = "";
                 for my $switch
                     ($switches =~ m/\G ((?:-\w|--\w+)) (?:,\s*)? /gx) {
                     push @switches, $switch;
                     if ( length $switch > length $longest) {
                         $longest = $switch;
                     }
                 }
                 $longest =~ s/^-*//;
                 push @options, $longest, {
                     options => [EMAIL PROTECTED],
                     description => $description,
                 };
             }
            );
    };

=over 4

=item -h, --help

Display program usage

=item -v, --verbose

Verbose program execution

=item -d, --debug

Even more verbose program execution

=item -V, --version

Print the program version

=item -i, --immutable

Sets the immutable inode attribute.

=item -l, --linkage

Sets the immutable linkage invert inode attribute.

=item -S, --suid

Link files that have set user/group ID bits set

=item -r, --random

Turns on randomising of directory scanning and tree traversal. This
option tries to prevent against racing symlink attacks. A better
solution is planned.

=back

=head1 INODE ATTRIBUTES AND IMMUTABILITY

Hard linking identical files between directories has a drawback: if
one is changed, then the other one changes too.

To counter this, you can set the immutable inode attribute on combined
files (see L<chattr>). Setting inode attribute requires root
privileges, C<CAP_SYS_ATTR>, and a filesystem that supports it.
Currently this includes default ext2 and ext3 in any recent kernel, or
reiserfs with the inode attributes patch applied (available from
C<ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.18.pending/>).

The problem with setting immutable is that then the file can not be
unlinked or renamed. In the case where you have a user without
CAP_SYS_ATTR, but otherwise with write permission to a file, they
cannot then change it.

In comes the immutable linkage invert flag. This flag will toggle immutability
Re: [Vserver] Re: [patch 1/3] lsm: add bsdjail module
Herbert Poetzl wrote:
> up to 16 addresses are currently allowed in this set; in the future
> the limit will go away (network code is actually the oldest piece)
> by using 'markings' (network is virtualized to allow binding to
> 0.0.0.0)

Interesting. Is it the intention that these changes be entirely transparent to host-based firewall implementations (is that possible?), or entirely administered by them? Or some mixture?

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] test and cowlinks
Jörn Engel wrote:
> > There is vunify which is part of util-vserver package. What is
> > better for general usage is Sam's unify-dirs script. It is located
> > at http://mirrors.paul.sladen.org/sam.vilain.net/vserver/unify-dirs.
> > Just use this without the -l or -i options and it will just do hard
> > links, which is what you want for the current implementation of cow
> > links.
>
> Darn! Doesn't work for me yet. One personal problem is a slow (USB1)
> 300GB hard-drive that contains some identical files. I was thinking
> about hashing only the first 4k or so of each file and then do a
> direct comparison in case of hash collision. Even with sha1 over the
> complete file, there is no guarantee that a hash collision means two
> identical files.

The chances of bits on your hard drive platter randomly losing their magnetism or capacitors in your RAM losing charge and changing are probably higher than two different files having an SHA1 collision :-). Hey, maybe *that's* why I get those random reiserfs corruptions!

Hashing only the first block of the file as an optimisation is a sensible idea. The script could be easily modified to do this as a separate step; however, bear in mind that it will only even consider checking the file's contents if the files already have the same owner/group/permissions, relative path and file size. My assumption was that if these all match, the files are probably going to be the same anyway.

> Also, I want a database with all already known files. Ultimately
> this could be turned into a daemon that watches the complete fs tree
> for changes and turns new files into cowlinks shortly after creation.
> With such a daemon, cp -r will temporarily flush part of the page
> cache, have the same result as cowcopy -r.

Nice idea, but I think on UNIX that's pretty much a can of worms with no easy answer. You'd need something in the kernel that notifies userland when any inode on a filesystem changes. Have a look at the intermezzo module if you want to go down that path. If you can provide the kernel half, I'll be more than happy to extend unify-dirs to work with it :).

Failing active monitoring, as a simple compromise there's no reason that unify-dirs couldn't optionally store its internal inode/stat/SHA1 hash cache in a Berkeley database, and run the script every hour or so via cron. It would certainly prevent the copious stat()'ing that the script does, at the expense of not noticing unlikely unification situations until the DB cache entries expire.

Of course, it would still absolutely hammer the VFS every time it runs with readdir() calls and find all those glorious reiserfs corner case bugs, but in my experience with a handful (say, 30) of vservers that are already mostly unified the script completes in under a minute when unifying just the OS (eg, /usr, /lib, /sbin and /bin).

Who knows, maybe there are other optimizations possible - like only stat()'ing the leaf directories in the hierarchy, to see if any files have been added or removed before actually using readdir() to read them. Again this will not catch some unlikely unification situations until full stat()'ing happens.

Sam.
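The selection order discussed in this thread (metadata first, then a cheap hash of the first block, then a full hash) can be sketched as follows. This is not the unify-dirs code, which is Perl and is not shown here; the function names and the 4096-byte first block are assumptions for illustration.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha1_of(path, limit=None):
    """SHA1 of the whole file, or of only the first `limit` bytes."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        if limit is not None:
            digest.update(handle.read(limit))
        else:
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()


def groups_of_two_or_more(paths, key):
    """Bucket paths by key(path) and keep only buckets with company."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[key(path)].append(path)
    return [group for group in buckets.values() if len(group) > 1]


def unify_candidates(roots, first_block=4096):
    """Return lists of regular files that look safe to hard link together.

    Files are only compared when relative path, owner, group, mode and
    size all match; contents are then checked in two steps.
    """
    by_meta = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, devices, sockets
                key = (os.path.relpath(path, root), st.st_uid,
                       st.st_gid, st.st_mode, st.st_size)
                by_meta[key].append(path)

    unifiable = []
    for paths in by_meta.values():
        quick = lambda p: sha1_of(p, first_block)
        for group in groups_of_two_or_more(paths, quick):
            unifiable.extend(groups_of_two_or_more(group, sha1_of))
    return unifiable
```

Something like `unify_candidates(["/vservers/vs1", "/vservers/vs2"])` would list the groups; the paths are examples only. The sketch also reports files that are already hard links of one another, which a real tool would filter out by comparing inode numbers.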
[Vserver] Re: unify-dirs on debian
Sergey Goldgaber wrote:
> > > From a look at the unify-dirs code I do not understand why it is
> > > the t attribute that is being set instead of the I attribute. But
> > > that seems to be the core of the problem.
> >
> > There are two problems;
> >
> > a) The e2fsprogs and kernel team ignored my original request to
> >    allocate that bit for that purpose, and it has since been also
> >    grabbed by the no tail merge attribute. Currently, setting that
> >    bit makes *both* features happen at the same time, though
> >    normally this is harmless
> >
> > b) You need -i as well. The immutable linkage invert flag (hey,
> >    couldn't think of a more succinct name), if set on its own, will
> >    make a file that is not linkable but is writable.
>
> Thank you for your answer. I tried unify-dirs -li and that worked...
> until recently, when I ran in to some directories that were making
> unify-dirs segfault. I traced this down to a call to readdir_inode on
> directories with more than 125 members. The following program will
> segfault with perl 5.6.1 and ReadDir 0.02:
>
>     #!/usr/bin/perl
>     #
>     use ReadDir qw(readdir_inode);
>     $dir = "." ;
>     for ( $i=0 ; $i<126 ; $i++ ) {
>         system("touch $i") ;
>     }
>     my @dirents = readdir_inode $dir;
>
> Thanks again for writing a very useful utility. Apart from this it
> really works great!

Glad to hear. Thanks for the test script, with it the bug stood out like a sore thumb. I've just uploaded ReadDir 0.03 to CPAN.

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
[Vserver] ssh -X login without x11uselocalhost no; Re:
Christian,

Actually, my thoughts are that it would be nice to be able to disable the SSH X11 forwarding from listening on any IPv4 address at all.

The /unix xauth entry is for connection over the unix socket, rather than TCP/IP. Unix sockets are very much like a TCP/IP socket, but the addresses refer to locations on the filesystem, typically within /tmp - rather than IPv4 network addresses. So an application listens on a filesystem path, and applications connect to that filename. `ls -l /tmp/.X11-unix' should show you examples of these types of sockets.

Maybe you are hitting against a problem I encountered once. I was experimenting with using addresses on the 127.* network for vserver `localhost' interfaces, much as you seem to be with routable addresses.

I can't remember the details, save that the problems I discovered started with ssh X11 forwarding not working - and following the function calls in the source one by one, I tracked the problem down to there being some combination of flags you can pass to the bind()/socket()/etc combination that specifies a local address rather than first resolving localhost in your application. Rather than resolving localhost for you, the libraries simply use a hardcoded 127.0.0.1 - which, of course, doesn't work unless it's in your IPROOT, and the SSH X11 forwarding fails. As would port forwarding.

This was with an old version of vserver (ctx17), but I decided that requiring a patched libc to use vservers was a bit out of the question, as this bug would affect more than just SSH. Though it is not strictly required by the RFCs for localhost to be 127.0.0.1, the fact that the C library has it hardcoded would not seem to lend itself well to the approach working in general. Besides, IPv6 *does* specify ::1 as being the loopback interface. I was stone-walled by the libc maintainers, and eventually decided this approach was not future-proof enough.

So, to implement per-vserver localhost, I realised I would have had to implement some kind of transparent masquerading of each vserver's `localhost' interface to per-vserver addresses. Unfortunately I wimped out at this point, but I think Alex Lyashkov has a similar feature in his fork of vserver.

In any case, it is highly recommended if you are concerned about network security to use iptables firewalling in the root server to only permit inward connections to your well known services - I highly recommend fwbuilder as it has an excellent network object model, a great GUI and works well with Linux vservers. Be sure to get the 1.1+ release; earlier releases have this annoying habit of automatically `up'ing vserver alias interfaces for you, but with the wrong name.

HTH,
Sam.

Christian Jaeger wrote:
> Hello
>
> There's a problem in the interaction between (ssh and xauth and
> (linux-vserver or non-127.0.0.1 localhost ip's)):
>
> If you use another ip than 127.0.0.1 (like say, 10.0.1.1) for
> localhost purposes like me (and use a /etc/hosts file like this:
>
>     10.0.1.1 localhost
>     192.186.1.1 ourvirtualhostname
>
> ), then ssh -X into this vserver won't work anymore except when you
> put x11uselocalhost no into the serverside sshd_config - but this
> opens up your X socket to non-local clients, which is not a good idea
> when considering the possibility of the presence of security holes in
> the X server code.
>
> The script here solves the problem:
> http://pflanze.mine.nu/~chris/vserver/xauth
>
> It changes the hostname for the setup of the authentication cookie
> from unix to localhost - this is all it takes to make the X
> authentication work again.
>
> I'm cross-posting this to the openssh mailing list. Should openssh be
> changed, or should xauth be changed? What is the reason for the unix
> argument and why doesn't it work for the vserver/'strange-localhost'
> case?
>
> Cheers
> Christian.
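The description of unix sockets in the reply above (a server listens on a filesystem path, clients connect to that filename) can be demonstrated with a short sketch. It uses only Python's standard library; the socket name "X0" merely imitates the entries under /tmp/.X11-unix and has nothing to do with a real X server.

```python
import os
import socket
import tempfile
import threading


def echo_once_over_unix_socket(message):
    """Send `message` to a one-shot echo server bound to a filesystem path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "X0")  # the address is just a filename

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)               # creates the socket file
        server.listen(1)

        def serve():
            conn, _addr = server.accept()
            with conn:
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=serve)
        worker.start()

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)            # no IP address or port involved
        client.sendall(message)
        reply = client.recv(1024)

        client.close()
        worker.join()
        server.close()
        return reply


print(echo_once_over_unix_socket(b"hello"))
```

Because the address is a path, no IP address is involved at all, so a socket like this does not depend on which addresses a vserver is allowed to bind.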
Re: [Vserver] vs 1.27 debian 2.4.26 source package
The Debian patch against the 2.4.26 kernel is at:

http://ftp.debian.org/debian/pool/main/k/kernel-source-2.4.26/kernel-source-2.4.26_2.4.26-1.diff.gz

Ola Lundqvist and Ron Lee are the current maintainers of the Debian package that includes patches against various versions of the Debian source trees. Maintainer information about the Debian versions of the util-vserver utilities package and patch are at:

http://packages.qa.debian.org/u/util-vserver.html
http://packages.qa.debian.org/k/kernel-patch-ctx.html

Sam.

Herbert Poetzl wrote:
> On Wed, Apr 21, 2004 at 12:05:30PM +0300, Berk Ülsoy wrote:
> > Hi!,
> >
> > I just read Herbert's note about vs 1.27 on 2.4.26 and grabbed the
> > debian source package
> > (http://packages.debian.org/unstable/devel/kernel-source-2.4.26) to
> > test it. however patch fails on 5 places. I tried to alter the
> > patch file but difference was not only line numbers but also
> > function declarations, return values etc so i think its better to
> > leave this more professional hands.
> >
> > Debian kernel source packages almost always include some fixes and
> > patches and crash the vserver patch file. For those who do not want
> > to switch from debian maintained kernel source to bare source,
> > shall we expect a special debian patch file for every vserver
> > release ? (at least stable ones ?)
>
> well, is there a stable debian kernel in debian unstable?
>
> anyway, provide the patches (from vanilla to debian kernel) and
> promise to test the kernel, and I'll update the debian patches to
> 2.4.26-X (which should be done by the debian vserver maintainers,
> btw ...)
>
> best,
> Herbert
>
> > good day !
> > Berk

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] util-vserver -- future directions
Enrico Scholz wrote:
> * it has new vserver-build methods; currently the apt-rpm,
>   debootstrap and a simple skeleton methods are implemented. New
>   methods are in preparation (copy) or are waiting for community
>   input (gentoo, slackware). For RPM based distributions, 'vapt-get'
>   and 'vrpm' tools were written which are allowing a secure external
>   packagemanagement.

Allow me to throw mine into the fold, then; these additions let you have each vserver on a separate filesystem, whilst still having the benefits of unification; all changes are in /usr/sbin/vserver:

    STATIC_DIRS="usr lib sbin bin"
    UNIQUE_DIRS="etc var"

    mountproc()
    {
        mkdir -p $VROOTDIR/$1/proc $VROOTDIR/$1/dev/pts
        if [ ! -d $VROOTDIR/$1/proc/1 ] ; then
            mount -n -t proc none $VROOTDIR/$1/proc
            mount -n -t devpts -o gid=5,mode=0620 none $VROOTDIR/$1/dev/pts
        fi
        if [ -d $VROOTDIR/shadow/$1/usr -a ! -d $VROOTDIR/$1/usr/bin ]
        then
            for dir in $STATIC_DIRS
            do
                [ -d $VROOTDIR/$1/$dir ] || mkdir $VROOTDIR/$1/$dir
                mount -n --bind $VROOTDIR/shadow/$1/$dir $VROOTDIR/$1/$dir
            done
        fi
    }

    umountproc()
    {
        umount $VROOTDIR/$1/proc 2>/dev/null
        umount $VROOTDIR/$1/dev/pts 2>/dev/null
        if [ -d $VROOTDIR/shadow/$1/usr ]
        then
            for dir in $STATIC_DIRS
            do
                umount $VROOTDIR/$1/$dir 2>/dev/null
            done
        fi
    }

    # ... later on, during `vserver XXX build' code:

    if test "$UTIL_VSERVER_AVOID_COPY"; then
        mkdir -p $VROOTDIR/$1/{etc/rc.d/init.d,sbin,var/run,var/log}
    else
        MASTER=/
        [ -d $VROOTDIR/master ] && MASTER=$VROOTDIR/master
        echo "Copying files from $MASTER"
        if [ -d $VROOTDIR/shadow/master ]
        then
            ( cd $VROOTDIR/master; cp -ax $UNIQUE_DIRS $VROOTDIR/$1/. ) || exit 1
            echo "Linking files from $VROOTDIR/shadow/master"
            mkdir $VROOTDIR/shadow/$1
            ( cd $VROOTDIR/shadow/master; cp -a $STATIC_DIRS $VROOTDIR/shadow/$1/.
              cd $VROOTDIR/shadow
              $USR_LIB_VSERVER/unify-dirs -il master $1
            ) || exit 1
            mountproc $1
            TMP_MOUNT=1
        else
            ( cd $MASTER
              cp -ax $UNIQUE_DIRS $STATIC_DIRS $VROOTDIR/$1/.
            ) || exit 1
        fi
    fi

This all stems from a vague, possibly irrational urge that each vserver should have its own filesystem, rather than letting many vservers share the same filesystem and using quotas or a similar mechanism to restrict them.

This is convenient for me, as I use reiserfs (the masochism of which pales in comparison to the bugs in the ext3 online resizing patches) on LVM managed space, so I can allocate vservers more space as and when required, and have protection against possible fragmentation between servers (of course, the widely touted "fact" that Unix filesystems don't suffer from fragmentation may be true, but they're not immune to it).

To explain the above in excruciating detail:

 * It is assumed that the `master' vserver, in /vservers/master, has its /usr, /lib, /sbin and /bin moved to /vservers/shadow/master. This filesystem will contain the operating system files (ie, the four directories mentioned) for all vservers which are `shadowed'.

 * during build time, the new server has /{usr,sbin,bin,lib} copied via a `cd /vservers/shadow; cp -al master/* $vserver/; chattr -R +iI $vserver' analog, if those directories have been moved out of /vservers/master to /vservers/shadow/master in the skeleton. I'm using a straight copy, followed by a call to my unify-dirs script (which, hopefully, your new vunify is powerful enough to emulate the behaviour of without all the segfaults) - which is sub-optimal - a `vcp-al' would be useful - but works for me. The other directories (/var and /etc) are simply copied into the vserver's filesystem.

 * during `vserver start' time, if the shadow operating system directories are detected on /vservers/shadow/$1/*, then mount them into place with mount --bind.

Maintaining the unification is as simple as

    (cd /vservers/shadow; unify-dirs -il *)

This is quite effective; even with a lot of software installed in the master image, you only need about 30MB of space on the filesystems you create as a minimal starting point for Debian woody vservers. And most of that is the `apt' and `dpkg' databases.

This is all extremely groovy if you have an automatic script that runs the other associated stuff;

    lvcreate -L 100M -n myVserver /dev/myVG
    mkreiserfs /dev/myVG/myVserver
    echo "/dev/myVG/myVserver /vservers/myVserver reiserfs defaults 1 69" >> /etc/fstab
    mkdir /vservers/myVserver
    mount /vservers/myVserver

I've had `vserver build' using the above technique building new vservers in *3 seconds* (not counting the mkreiserfs time) in the past.

Would this be a welcome enhancement if brushed up for the current util-vserver release?

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
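The `cp -al' step mentioned above (replicating the shadow tree with hard links rather than copies) can be sketched as below. This is an illustration only, not part of util-vserver: the function name is made up, it does not set any inode attributes (the `chattr +iI' half), and like any hard link it only works when source and destination are on the same filesystem.

```python
import os


def link_tree(src, dst):
    """Recreate the directory structure of src under dst, hard linking files.

    Symlinks are recreated as symlinks; everything else is linked, so the
    two trees share inodes and therefore disk space.
    """
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)

        # os.walk reports symlinks to directories in dirnames but does not
        # descend into them, so they are recreated here.
        for name in dirnames:
            source = os.path.join(dirpath, name)
            if os.path.islink(source):
                os.symlink(os.readlink(source), os.path.join(target_dir, name))

        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            if os.path.islink(source):
                os.symlink(os.readlink(source), target)
            else:
                os.link(source, target)
```

A call such as `link_tree("/vservers/shadow/master/usr", "/vservers/shadow/myVserver/usr")` would correspond to one of the four static directories; the paths are taken from the layout described in the message, everything else is assumed.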
Re: [Vserver] util-vserver -- future directions
Gregory (Grisha) Trubetskoy wrote:
> Am I missing something - you're mounting things that are in the
> shadow server via --bind - but doesn't this mean that if one of the
> vservers unlinks the file in a directory mounted this way, it will be
> gone for all other vservers?

No, the structure looks like this:

    /vservers/master/usr            (empty)
    /vservers/shadow/master/usr     (contents)
    /vservers/myVserver/usr         (empty)
    /vservers/shadow/myVserver/usr  (linked contents)

All of the /vservers/shadow/* directories are hard linked replicas of each other, using immutable file contents flags.

> BTW, I really wish Linux had something like the FreeBSD unionfs. Each
> approach has its own drawbacks and limitations I guess.

I assume that unionfs is a union mount - mounting a filesystem on top of another one so that you can see through to the contents of the filesystem underneath, but new files get created on the filesystem on top.

But what about removing files? How does unionfs handle that? ie, if there is a file present on the master, can you remove it from the union mounted version? Or only replace it?

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)
Re: [Vserver] SSH login inside vserver not working
On Mon, 15 Mar 2004 20:10, Sam Vilain wrote;
> Use sshd -t -p NNN (where NNN is a port number) inside the vserver.

Gah, correction - sshd -d -p NNN

d for debug, not t for test :-}

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)

Time is a great teacher, but it kills all its pupils. - anon.
[Vserver] Re: Vserver O(1)
On Thu, 04 Mar 2004 22:39, wrote;
> One year ago you made a patch for vserver to use the O(1) scheduler
> (http://www.paul.sladen.org/vserver/archives/200302/0155.html) -
> didn't you have same patch for 2.4.25 and O(1) patch, possibly from
> lck patchset ?

That patchmonster Herbert has included it in his -ck1 patch stream. The most recent I could find that includes it is v1.3.5; pretty

http://www.13thfloor.at/vserver/d_release/v1.3.5/

You'll need a vanilla 2.4.24 Linux kernel, with this patch applied:

http://www.plumlocosoft.com/kernel/patches/2.4/2.4.24/2.4.24-ck1/patch-2.4.24-ck1.bz2

Make sure you are running the right version of the util-vserver package too.

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)

A Project Manager is like the madam in a brothel. His job is to see that everything comes off right. - anon.
vserver@list.linux-vserver.org
On Tue, 10 Feb 2004 11:28, [EMAIL PROTECTED] wrote;
> I'm using rsync (-vazPx --numeric-ids) to backup vservers and then
> trying to unify them using Sam Vilain's unify-dirs to save space.
> Unify-dirs runs for a while, unifying quite a number of files, and
> then segfaults.

This problem came up before and turned out to be faulty RAM in the server. unify-dirs seems to share a property with compiling the Linux kernel in really thrashing a system and bringing out these problems.

The only thing I can suggest would be making sure that you have the most recent versions of everything from CPAN, particularly the ReadDir module.

If you can reliably reproduce the problem, then run perl inside gdb and use `bt' to show a back-trace once you get the segfault. This will at least reveal the section of the program that is generating the fault. ie

    $ gdb /usr/bin/debugperl
    Copyright 2002 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-linux".
    (gdb) run unify-dirs -vil /vservers/vserver1 /vservers/vserver2
    ...
    Program received SIGSEGV, Segmentation fault
    (gdb) bt

If running it once fails, then doing some other server activity (in particular working the dircache - eg `find /'), and running it again fails in a different place, then it is more probably a fault in the platform than the script...

-- 
Sam Vilain, sam /\T vilain |T net, PGP key ID: 0x05B52F13
(include my PGP key ID in personal replies to avoid spam filtering)

It is a rather pleasant experience to be alone in a bank at night. WILLIE SUTTON
Re: [Vserver] unify-dirs buglet?
On Tue, 09 Dec 2003 11:28, Roderick A. Anderson wrote;
> {/vservers.new}# unify-dirs -d ref tc
> unify-dirs: Unifying: ref tc
> unify-dirs: Processing ref...
> unify-dirs: Readdir OK
> unify-dirs: Processing tc...
> unify-dirs: Readdir OK
> unify-dirs: Processing ref/sbin...
> unify-dirs: Readdir OK
> unify-dirs: Processing tc/sbin...
> unify-dirs: Readdir OK
> unify-dirs: COMPARE: tc/sbin/install-info vs ref/sbin/install-info
> unify-dirs: digesting tc/sbin/install-info
> Error digesting tc/sbin/install-info; Input/output error at
> /root/bin/unify-dirs line 287.

Input/output error means that the OS returned EIO when a file was read. As compared with EACCES (Permission denied), this typically means there was a hard error on the disk, or a bug in the VFS layer or filesystem ( you're not running reiserfs perchance are you? ;) ). This may manifest itself as a segfault if it was in the VFS layer. If it's completely random, it might even mean that you have faulty RAM!

Does:

    cat tc/sbin/install-info ref/sbin/install-info > /dev/null

Work? If not, then you have a more serious problem. I'd look at the output of `dmesg' to look for the IDE or SCSI layers complaining, or otherwise take the system down to single and fsck the partition.

If it *does* work, but `unify-dirs' doesn't, then there is something weirder going on - in which case, mail me /tmp/trace (off-list) from:

    strace -fae -o /tmp/trace unify-dirs -d ref tc

-- 
Sam Vilain, [EMAIL PROTECTED]

The pursuit of truth and beauty is a sphere of activity in which we are permitted to remain children all our lives. ALBERT EINSTEIN
Re: [Vserver] Copy a vserver to different partition?
On Fri, 07 Nov 2003 09:35, Tor Rune Skoglund wrote;
> Hmmm, just a sidenote... */proc/* will also exclude any other proc
> directory that might be around. E.g. if a user for some reason has
> named a directory as proc under his home directory. So it might be
> better to specify the absolute path to proc here, to be sure you are
> getting the whole server.

Passing:

    `-x' to rsync or cp
    `l' to tar
    `-xdev' to find

Will stop the process from crossing filesystem boundaries. This is another quite clean approach.

-- 
Sam Vilain, [EMAIL PROTECTED]

'Martyrdom' is the only way a person can become famous without ability GEORGE BERNARD SHAW
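What "not crossing filesystem boundaries" means in practice can be sketched by comparing device numbers while walking the tree, roughly in the spirit of `find -xdev'. This is an illustration under that assumption, not a description of how rsync, cp, tar or find implement their options; the function name is made up.

```python
import os


def walk_one_filesystem(root):
    """Yield paths under root without descending into other filesystems.

    A directory whose st_dev differs from the root's (for example a
    mounted /proc inside a vserver) is reported but not entered.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        same_device = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            yield path
            if os.lstat(path).st_dev == root_dev:
                same_device.append(name)
        dirnames[:] = same_device       # prune mount points in place
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Unlike a `*/proc/*` exclude pattern, this does not care what a directory is called, so a user's own directory named proc is still copied while the real proc mount is skipped.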
[Vserver] Re: [ckrm-tech] [1 of 6] patch for core module
On Fri, 07 Nov 2003 03:17, Chandra Seetharaman wrote;

... in the middle of the CKRM core patch...

>  #define __NR_tgkill            270
>  #define __NR_utimes            271
>  #define __NR_fadvise64_64      272
> +#define __NR_set_tag           273
> +#define __NR_res_ctrl          274

My, you've picked the same SysCall number as the one that the Linux Vserver project has reserved! What a coincidence, because we should be working together as we're approaching facets of the same problem :-). We do virtualisation, you do resource management.

We have 273 set up as a Virtual Server `syscall switch'. Our current stable development stream that includes the switch[1] is against 2.4.x. Take a look at how we've set up the syscall switch - it's extensible and forward thinking, and could easily accommodate CKRM's calls.

We currently use this syscall for virtualisation purposes; things like hiding processes, blocking inter-context signals, limiting bind(), etc. We've also got a CPU scheduler that works for the O(1) kernel, that allows you to set the desired CPU usage level for a context with soft limits, along with a syscall to adjust the priority [2].

Is there any chance of co-operation here? If you will let us extend CKRM for virtualisation, we can share the syscall and end up with a super-duper virtualisation/resource management system for Linux! And SysAdmins will appreciate coherence between the numbers used for each.

1. http://www.13thfloor.at/vserver/d_release/v1.1.0/patch-2.4.23-pre9-vs1.1.0.diff
2. http://www.vilain.net/linux/ctx/split-2.4.22-ac4-c17g2/

-- 
Sam Vilain, [EMAIL PROTECTED]

If you've seen one redwood, you've seen them all. RONALD REAGAN
[Vserver] Re: [ckrm-tech] [1 of 6] patch for core module
On Fri, 07 Nov 2003 19:46, Chandra Seetharaman wrote;
> But, integrating CKRM and vserver might not add more value and
> provide unnecessary burden for somebody who sees value in only one
> of them.

OK, I see your point.

> With CKRM design this can be easily done. RBCE can be extended to
> allow rules that can be defined to classify tasks based on their
> security context. And by limiting the resource shares used by each
> class, one can control how much resource a particular security
> context uses.

OK then. In that case, it probably makes more sense if we (vserver) don't bother with doing any of the resource management stuff for our 2.6.0 port, if CKRM is going to provide a greater feature set, but instead focus on integrating documentation, examples and performing testing.

We currently have implementations for resource control over:

1. cpu - as referred to before:

   2. http://www.vilain.net/linux/ctx/split-2.4.22-ac4-c17g2/

   have you compared this CPU scheduling with your own implementation? I didn't see any patch to sched.c in your core patch, I assume that it is packaged separately. Do you have a pre-release patch that I can look at to see how the `pros' :) do it?

2. disk limits per vserver:

   http://vserver.13thfloor.at/Linux2.6/index.php?page=Per+Context+Disk+Limits

   we've also got a quota-per-context patch, but I don't know whether or not you care about that.

3. memory limits:

   Herbert, correct me if I'm wrong, but the most up to date version for the plain kernel is:

   http://vserver.13thfloor.at/Experimental/patch-2.4.23-pre7-O1-c17g2-ml0.04.diff.bz2

   There's also a version for Rik van Riel's VM subsystem (rmap), and I'm the wrong person to ask what the situation is re: which VM is in 2.6.0, where the memory limits patch is for rmap. All I know is that with the rmap patch, we can limit the RSS per vserver. Perhaps Rik can `take it away' here and explain what the situation is :-)

4. network limits:

   not part of vserver - but I've used iptables in the past for this.

Have you got implementations for all these parts too?

-- 
Sam Vilain, [EMAIL PROTECTED]

A complex system that works is invariably found to have evolved from a simple system that worked. - anon.