Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
> I would greatly appreciate it if you could open the bug, I don't have an
> opensolaris bugzilla account yet and you'd probably put better technical
> details in it anyway :). If you do, could you please let me know the bug#
> so I can refer to it once S10U6 is out and I confirm it has the same
> behavior?

6763592 creating zfs filesystems gets slower as the number of zfs
filesystems increase

Pramod
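To confirm the same behavior on U6 once it ships, something like the
following should expose the same ioctl storm truss revealed earlier in the
thread (a minimal sketch; the dataset name is hypothetical):

# trace only ioctl calls and print per-syscall counts on exit
truss -c -t ioctl zfs create export/user/test

If the bug is present, the ioctl count should grow with the number of
existing filesystems rather than staying roughly constant.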
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Thu, 23 Oct 2008, Pramod Batni wrote:

> On 10/23/08 08:19, Paul B. Henson wrote:
> >
> > Ok, that leads to another question: why does creating a new ZFS
> > filesystem require determining if any of the existing filesystems in
> > the dataset are mounted :)?
>
> I am not sure. All the checking is done as part of the libshare's sa_init
> which is calling into sa_get_zfs_shares().

It does make a big difference whether or not sharenfs is enabled. I
haven't finished my testing, but at 5000 filesystems it takes about 30
seconds to create a new filesystem and over 30 minutes to reboot if they
are shared, but only 7 seconds to create a filesystem and about 15 minutes
to reboot if they are not.

> You could do that, or else I can open a bug for you citing the Nevada
> build [b97] you are using.

I would greatly appreciate it if you could open the bug; I don't have an
opensolaris bugzilla account yet, and you'd probably put better technical
details in it anyway :). If you do, could you please let me know the bug#
so I can refer to it once S10U6 is out and I confirm it has the same
behavior?

Thanks much...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
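For anyone reproducing the shared/unshared comparison, a minimal sketch
(the dataset names are hypothetical; sharenfs is inherited by child
filesystems, so setting it on the parent covers the new child):

# with NFS sharing enabled on the parent
zfs set sharenfs=on export/user
time zfs create export/user/shared_test

# and again with sharing disabled
zfs set sharenfs=off export/user
time zfs create export/user/unshared_test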
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On 10/23/08 08:19, Paul B. Henson wrote:
> On Tue, 21 Oct 2008, Pramod Batni wrote:
>
> > > Why does creating a new ZFS filesystem require enumerating all
> > > existing ones?
> >
> > This is to determine if any of the filesystems in the dataset are
> > mounted.
>
> Ok, that leads to another question: why does creating a new ZFS
> filesystem require determining if any of the existing filesystems in the
> dataset are mounted :)? I could see checking the parent filesystems, but
> why the siblings?

I am not sure. All the checking is done as part of the libshare's sa_init
which is calling into sa_get_zfs_shares(). In any case a bug can be filed
on this.

> Should I open a sun support call to request such a bug? I guess I should
> wait until U6 is released; I don't have support for SXCE...

You could do that, or else I can open a bug for you citing the Nevada
build [b97] you are using.

Pramod

> Thanks...
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Tue, 21 Oct 2008, Pramod Batni wrote:

> > Why does creating a new ZFS filesystem require enumerating all existing
> > ones?
>
> This is to determine if any of the filesystems in the dataset are
> mounted.

Ok, that leads to another question: why does creating a new ZFS filesystem
require determining if any of the existing filesystems in the dataset are
mounted :)? I could see checking the parent filesystems, but why the
siblings?

> In any case a bug can be filed on this.

Should I open a sun support call to request such a bug? I guess I should
wait until U6 is released; I don't have support for SXCE...

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On 10/21/08 04:52, Paul B. Henson wrote:
> On Mon, 20 Oct 2008, Pramod Batni wrote:
>
> > Yes, the implementation of the above ioctl walks the list of mounted
> > filesystems 'vfslist' [in this case it walks 5000 nodes of a linked
> > list before the ioctl returns]. This in-kernel traversal of the
> > filesystems is taking time.
>
> Hmm, O(n) :(... I guess that is the implementation of getmntent(3C)?

In fact the problem is that 'zfs create' calls the ioctl way too many
times. getmntent(3C) issues a single ioctl(MNTIOC_GETMNTENT).

> Why does creating a new ZFS filesystem require enumerating all existing
> ones?

This is to determine if any of the filesystems in the dataset are mounted.
The ioctl calls are coming from:

libc.so.1`ioctl+0x8
libc.so.1`getmntany+0x200
libzfs.so.1`is_mounted+0x60
libshare.so.1`sa_get_zfs_shares+0x118
libshare.so.1`sa_init+0x330
libzfs.so.1`zfs_init_libshare+0xac
libzfs.so.1`zfs_share_proto+0x4c
zfs`zfs_do_create+0x608
zfs`main+0x2b0
zfs`_start+0x108

zfs_init_libshare is walking through a list of filesystems and determining
if each of them is mounted. I think there can be a better way to do this
than doing an is_mounted() check on each of the filesystems. In any case a
bug can be filed on this.

Pramod

> > You could set 'zfs set mountpoint=none <pool>' and then create the
> > filesystems under the <pool>. [In my experiments the number of ioctl's
> > went down drastically.] You could then set a mountpoint for the pool
> > and then issue a 'zfs mount -a'.
>
> That would work for an initial mass creation, but we are going to need
> to create and delete fairly large numbers of file systems over time;
> this workaround would not help for that.
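A minimal sketch of the workaround Pramod describes, assuming a
hypothetical pool named tank:

# leave everything unmounted while mass-creating filesystems
zfs set mountpoint=none tank
for u in user0001 user0002 user0003; do
    zfs create tank/$u
done
# then set the mountpoint once and mount everything in a single pass
zfs set mountpoint=/export tank
zfs mount -a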
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Mon, 20 Oct 2008, Pramod Batni wrote:

> Yes, the implementation of the above ioctl walks the list of mounted
> filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list
> before the ioctl returns]. This in-kernel traversal of the filesystems
> is taking time.

Hmm, O(n) :(... I guess that is the implementation of getmntent(3C)?

Why does creating a new ZFS filesystem require enumerating all existing
ones?

> You could set 'zfs set mountpoint=none <pool>' and then create the
> filesystems under the <pool>. [In my experiments the number of ioctl's
> went down drastically.] You could then set a mountpoint for the pool and
> then issue a 'zfs mount -a'.

That would work for an initial mass creation, but we are going to need to
create and delete fairly large numbers of file systems over time; this
workaround would not help for that.

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Mon, 20 Oct 2008, Paul B. Henson wrote:

> I haven't rebooted it yet; I somewhat naively assumed performance would
> be much better and just started a script to create test file systems for
> about 10,000 people. I'm going to delete the pool and re-create it, then
> create 1000 filesystems at a time and gather some performance statistics.

It would be useful to know if there is a performance difference between
many filesystems in one directory, or the same number of filesystems in
multiple directories. For example, you could have upper directories 'a',
'b', 'c', etc., and put the filesystems under these upper directories so
there are fewer filesystems per directory.

Bob

======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
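A sketch of the bucketing Bob suggests, using a hypothetical pool named
tank with one parent dataset per leading letter:

# one-time setup: a parent dataset per bucket (a ... z); shown for 'h'
zfs create tank/home/h
# each user then lands under the bucket for their first letter
u=henson
zfs create tank/home/$(echo $u | cut -c1)/$u    # -> tank/home/h/henson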
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Sun, 19 Oct 2008, Ed Plese wrote:

> The biggest problem I ran into was the boot time, specifically when "zfs
> volinit" is executing. With ~3500 filesystems on S10U3 the boot time for
> our X4500 was around 40 minutes. Any idea what your boot time is like
> with that many filesystems on the newer releases?

I haven't rebooted it yet; I somewhat naively assumed performance would be
much better and just started a script to create test file systems for
about 10,000 people. I'm going to delete the pool and re-create it, then
create 1000 filesystems at a time and gather some performance statistics.

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
Paul B. Henson wrote:
>
> At about 5000 filesystems, it starts taking over 30 seconds to
> create/delete additional filesystems.
>
> At 7848, over a minute:
>
> # time zfs create export/user/test
>
> real    1m22.950s
> user    1m12.268s
> sys     0m10.184s
>
> I did a little experiment with truss:
>
> # truss -c zfs create export/user/test2
>
> syscall               seconds    calls  errors
> _exit                    .000        1
> read                     .004      892
> open                     .023       67       2
> close                    .001       80
> brk                      .006      653
> getpid                   .037     8598
> mount                    .006        1
> sysi86                   .000        1
> ioctl                 115.534 31303678    7920
> execve                   .000        1
> fcntl                    .000       18
> openat                   .000        2
> mkdir                    .000        1
> getppriv                 .000        1
> getprivimplinfo          .000        1
> issetugid                .000        4
> sigaction                .000        1
> sigfillset               .000        1
> getcontext               .000        1
> setustack                .000        1
> mmap                     .000       78
> munmap                   .000       28
> xstat                    .000       65      21
> lxstat                   .000        1       1
> getrlimit                .000        1
> memcntl                  .000       16
> sysconfig                .000        5
> lwp_sigmask              .000        2
> lwp_private              .000        1
> llseek                   .084    15819
> door_info                .000       13
> door_call                .103     8391
> schedctl                 .000        1
> resolvepath              .000       19
> getdents64               .000        4
> stat64                   .000        3
> fstat64                  .000       98
> zone_getattr             .000        1
> zone_lookup              .000        2
>                       -------   ------    ----
> sys totals:           115.804 31338551    7944
> usr time:             107.174
> elapsed:              897.670
>
> and it seems the majority of time is spent in ioctl calls, specifically:
>
> ioctl(16, MNTIOC_GETMNTENT, 0x08045A60) = 0

Yes, the implementation of the above ioctl walks the list of mounted
filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list
before the ioctl returns]. This in-kernel traversal of the filesystems is
taking time.

> Interestingly, I tested creating 6 filesystems simultaneously, which
> took a total of only three minutes, rather than 9 minutes had they been
> created sequentially. I'm not sure how parallelizable I can make an
> identity management provisioning system though.
>
> Was I mistaken about the increased scalability that was going to be
> available? Is there anything I could configure differently to improve
> this performance? We are going to need about 30,000 filesystems to cover
> our

You could set 'zfs set mountpoint=none <pool>' and then create the
filesystems under the <pool>. [In my experiments the number of ioctl's
went down drastically.] You could then set a mountpoint for the pool and
then issue a 'zfs mount -a'.

Pramod

> faculty, staff, students, and group project directories. We do have 5
> x4500's which will be allocated to the task, so about 6000 filesystems
> per. Depending on what time of the quarter it is, our identity
> management system can create hundreds up to thousands of accounts, and
> when we purge accounts quarterly we typically delete 10,000 or so.
> Currently those jobs only take 2-6 hours; with this level of performance
> from ZFS they would take days if not over a week :(.
>
> Thanks for any suggestions. What is the internal recommendation on
> maximum number of file systems per server?
Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
On Sun, Oct 19, 2008 at 4:08 PM, Paul B. Henson <[EMAIL PROTECTED]> wrote:
> At about 5000 filesystems, it starts taking over 30 seconds to
> create/delete additional filesystems.

The biggest problem I ran into was the boot time, specifically when "zfs
volinit" is executing. With ~3500 filesystems on S10U3 the boot time for
our X4500 was around 40 minutes. Any idea what your boot time is like with
that many filesystems on the newer releases?

Ed Plese
[zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6
I originally started testing a prototype for an enterprise file service
implementation on our campus using S10U4. Scalability in terms of file
system count was pretty bad; anything over a couple of thousand and
operations started taking way too long.

I had thought there were a number of improvements/enhancements that had
been made since then to improve performance and scalability when a large
number of file systems exist. I've been testing with SXCE (b97), which
presumably has all of the enhancements (and potentially then some) that
will be available in U6, and I'm still seeing very poor scalability once
more than a few thousand filesystems are created.

I have a test install on an x4500 with two TB disks as a ZFS root pool, 44
TB disks configured as mirror pairs belonging to one zpool, and the last
two TB disks as hot spares.

At about 5000 filesystems, it starts taking over 30 seconds to
create/delete additional filesystems. At 7848, over a minute:

# time zfs create export/user/test

real    1m22.950s
user    1m12.268s
sys     0m10.184s

I did a little experiment with truss:

# truss -c zfs create export/user/test2

syscall               seconds    calls  errors
_exit                    .000        1
read                     .004      892
open                     .023       67       2
close                    .001       80
brk                      .006      653
getpid                   .037     8598
mount                    .006        1
sysi86                   .000        1
ioctl                 115.534 31303678    7920
execve                   .000        1
fcntl                    .000       18
openat                   .000        2
mkdir                    .000        1
getppriv                 .000        1
getprivimplinfo          .000        1
issetugid                .000        4
sigaction                .000        1
sigfillset               .000        1
getcontext               .000        1
setustack                .000        1
mmap                     .000       78
munmap                   .000       28
xstat                    .000       65      21
lxstat                   .000        1       1
getrlimit                .000        1
memcntl                  .000       16
sysconfig                .000        5
lwp_sigmask              .000        2
lwp_private              .000        1
llseek                   .084    15819
door_info                .000       13
door_call                .103     8391
schedctl                 .000        1
resolvepath              .000       19
getdents64               .000        4
stat64                   .000        3
fstat64                  .000       98
zone_getattr             .000        1
zone_lookup              .000        2
                      -------   ------    ----
sys totals:           115.804 31338551    7944
usr time:             107.174
elapsed:              897.670

and it seems the majority of time is spent in ioctl calls, specifically:

ioctl(16, MNTIOC_GETMNTENT, 0x08045A60) = 0

Interestingly, I tested creating 6 filesystems simultaneously, which took
a total of only three minutes, rather than 9 minutes had they been created
sequentially. I'm not sure how parallelizable I can make an identity
management provisioning system though.

Was I mistaken about the increased scalability that was going to be
available? Is there anything I could configure differently to improve this
performance? We are going to need about 30,000 filesystems to cover our
faculty, staff, students, and group project directories. We do have 5
x4500's which will be allocated to the task, so about 6000 filesystems
per. Depending on what time of the quarter it is, our identity management
system can create hundreds up to thousands of accounts, and when we purge
accounts quarterly we typically delete 10,000 or so. Currently those jobs
only take 2-6 hours; with this level of performance from ZFS they would
take days if not over a week :(.

Thanks for any suggestions. What is the internal recommendation on maximum
number of file systems per server?

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
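For reference, the parallel-creation experiment mentioned above can be
scripted along these lines (a minimal sketch; the dataset names are
hypothetical):

# fire off six creates concurrently and wait for all of them to finish
for i in 1 2 3 4 5 6; do
    zfs create export/user/ptest$i &
done
wait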