Re: [zfs-discuss] million files in single directory
On Sat, October 3, 2009 20:50, Jeff Haferman wrote:
> And why does an rsync take so much longer on these directories when
> directories that contain hundreds of gigabytes transfer much faster?

The rsync protocol has to exchange information about each file between
client and server, as part of the process of deciding whether to send
that file. Clearly there will be many more such exchanges in the
directories containing many more files. It therefore appears quite
natural to me that, given two directories with the same amount of
actual data, the one with more files will take longer to rsync. (The
time will ALSO depend on the amount of actual data, of course; given
two directories with the same number of files, but one having 10x the
data, the one with more data will at least sometimes take longer,
particularly if the files differ and the data actually has to be
transmitted.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
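A rough back-of-the-envelope model of the effect David describes: total
rsync time is roughly (per-file overhead x file count) plus (bytes /
effective throughput). The overhead and bandwidth figures below are
illustrative assumptions, not measurements from this thread.

```python
# Toy model: per-file protocol/stat overhead dominates rsync wall time
# when there are many small files. Both constants are assumptions.

def rsync_time_estimate(n_files, total_bytes,
                        per_file_overhead_s=0.005,    # stat + list exchange per file (assumed)
                        bandwidth_bytes_per_s=50e6):  # effective throughput (assumed)
    return n_files * per_file_overhead_s + total_bytes / bandwidth_bytes_per_s

# Same 10 GB of data, two very different file counts:
few_big    = rsync_time_estimate(5_000, 10e9)       # ~225 s, mostly transfer
many_small = rsync_time_estimate(1_000_000, 10e9)   # ~5200 s, mostly overhead

print(f"5K large files: ~{few_big:.0f} s")
print(f"1M small files: ~{many_small:.0f} s")
```

With a million files, the per-file bookkeeping swamps the actual data
transfer, which matches the observation that byte-for-byte, many-file
trees rsync far more slowly than few-file trees.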
Re: [zfs-discuss] million files in single directory
That section doesn't actually prescribe one size, so what size did you
choose and how exactly did you set it? You haven't told us, and nobody
has asked you about the basic system config either. For starters: what
CPU, memory and storage? What other work is this machine doing? We also
really need to know what version of Solaris (including relevant
patches) you are using. What other changes have you made, if any?

Thanks,
Phil

Sent from my iPhone

On 5 Oct 2009, at 00:24, Jeff Haferman wrote:

> Rob Logan wrote:
>>>> Directory "1" takes between 5-10 minutes for the same command to
>>>> return (it has about 50,000 files).
>>>
>>> That said, directories with 50K files list quite quickly here.
>>
>> a directory with 52,705 files lists in half a second here
>>
>> 36 % time \ls -1 > /dev/null
>> 0.41u 0.07s 0:00.50 96.0%
>>
>> perhaps your ARC is too small?
>
> I set it according to Section 1.1 of the ZFS Evil Tuning Guide:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
Re: [zfs-discuss] million files in single directory
Rob Logan wrote:
>>> Directory "1" takes between 5-10 minutes for the same command to
>>> return (it has about 50,000 files).
>>
>> That said, directories with 50K files list quite quickly here.
>
> a directory with 52,705 files lists in half a second here
>
> 36 % time \ls -1 > /dev/null
> 0.41u 0.07s 0:00.50 96.0%
>
> perhaps your ARC is too small?

I set it according to Section 1.1 of the ZFS Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
Re: [zfs-discuss] million files in single directory
>> Directory "1" takes between 5-10 minutes for the same command to
>> return (it has about 50,000 files).
>
> That said, directories with 50K files list quite quickly here.

a directory with 52,705 files lists in half a second here:

36 % time \ls -1 > /dev/null
0.41u 0.07s 0:00.50 96.0%

perhaps your ARC is too small?

Rob
Re: [zfs-discuss] million files in single directory
On Sat, 3 Oct 2009, Jeff Haferman wrote:

> When I go into directory "0", it takes about a minute for an
> "ls -1 | grep wc" to return (it has about 12,000 files). Directory
> "1" takes between 5-10 minutes for the same command to return (it
> has about 50,000 files).

This seems kind of slow. In the directory with a million files that I
keep around for testing, this is the time for the first access:

% time \ls -1 | grep wc
\ls -1   4.70s user 1.20s system 32% cpu 17.994 total
grep wc  0.11s user 0.02s system  0% cpu 17.862 total

and for the second access:

% time \ls -1 | grep wc
\ls -1   4.66s user 1.17s system 69% cpu 8.366 total
grep wc  0.11s user 0.02s system  1% cpu 8.234 total

However, my directory was created as quickly as possible rather than
incrementally over a long period of time, so it lacks the longer/
increased disk seeks caused by fragmentation and block allocations.
That said, directories with 50K files list quite quickly here.

> I did an rsync of this directory structure to another filesystem
> [lustre-based, FWIW] and it took about 24 hours to complete.

Rsync is very slow in such situations. What version of Solaris are you
using? The Solaris version (including patch version if using Solaris
10) can make a big difference.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] million files in single directory
On Sat, Oct 3, 2009 at 6:50 PM, Jeff Haferman wrote:
>
> A user has 5 directories, each has tens of thousands of files, the
> largest directory has over a million files. The files themselves are
> not very large, here is an "ls -lh" on the directories:
> [these are all ZFS-based]
>
> [r...@cluster]# ls -lh
> total 341M
> drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
> drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
> drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
> drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
> drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/
>
> When I go into directory "0", it takes about a minute for an
> "ls -1 | grep wc" to return (it has about 12,000 files). Directory
> "1" takes between 5-10 minutes for the same command to return (it
> has about 50,000 files).
>
> I did an rsync of this directory structure to another filesystem
> [lustre-based, FWIW] and it took about 24 hours to complete. We have
> done rsyncs on other directories that are much larger in terms of
> file sizes, but have thousands of files rather than tens, hundreds,
> and millions of files.
>
> Is there some way to speed up "simple" things like determining the
> contents of these directories? And why does an rsync take so much
> longer on these directories when directories that contain hundreds
> of gigabytes transfer much faster?
>
> Jeff

Be happy you don't have Windows + NTFS with hundreds of thousands, or
millions of files. Explorer will crash, run your system out of memory
and slow it down, or outright hard-lock Windows for hours on end. This
is on brand-new hardware: 64-bit, 32GB RAM, and 15k SAS disks.

Regardless of filesystem, I'd suggest splitting your directory
structure into a hierarchy. It makes sense even just for cleanliness.

-- 
Brent Jones
br...@servuhome.net
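A common way to implement the hierarchy Brent suggests is to shard
files into subdirectories keyed on a hash prefix of the filename. This
is a sketch only; the function name, root path, and two-level/two-hex
layout are my own illustrative choices, not anything from the thread.

```python
# Sketch: map a flat filename into a two-level sharded hierarchy,
# e.g. "foo.dat" -> /data/ab/cd/foo.dat, so no single directory
# accumulates millions of entries.

import hashlib
import os

def sharded_path(root, filename, levels=2, width=2):
    """Return root/xx/yy/filename using the MD5 prefix of filename."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)

p = sharded_path("/data", "sample-000001.dat")
print(p)
```

With two levels of two hex digits, the million files land in 65,536
buckets of roughly 15 files each, which keeps both `ls` and rsync's
per-directory work small.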
Re: [zfs-discuss] million files in single directory
+--
| On 2009-10-03 18:50:58, Jeff Haferman wrote:
|
| I did an rsync of this directory structure to another filesystem
| [lustre-based, FWIW] and it took about 24 hours to complete. We have
| done rsyncs on other directories that are much larger in terms of
| file sizes, but have thousands of files rather than tens, hundreds,
| and millions of files.
|
| Is there some way to speed up "simple" things like determining the
| contents of these directories?

Use zfs snapshots. See zfs(1M) and review the incremental send syntax.

| And why does an rsync take so much
| longer on these directories when directories that contain hundreds of

rsync has to build its file list (stat is slow) on both sides of the
sync, then compare them, and then send each one. (d)truss it sometime.
It's a lot of syscalls.

The initial zfs send may be slow, depending on the total size, but the
incrementals will be pretty fast. Certainly faster than rsync (by
orders of magnitude), as ZFS already knows which blocks it needs to
send, and is only sending blocks.

If the target host doesn't support ZFS in some form, you could dump
the snapshots to disk and use those for backups. Or restructure your
storage hierarchy (which, uh, you might want to do anyway).

-- 
bda
cyberpunk is dead. long live cyberpunk.
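For reference, the snapshot-plus-incremental-send workflow bda
describes looks something like the following command sketch. The pool,
dataset, and host names are hypothetical; check zfs(1M) on your release
for the exact send/receive options available to you.

```
# One-time full send (slow, proportional to total data size):
zfs snapshot tank/data@monday
zfs send tank/data@monday | ssh backuphost zfs receive backup/data

# Thereafter, send only the blocks changed between two snapshots (fast):
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | \
    ssh backuphost zfs receive backup/data
```

Because the incremental stream is computed from ZFS's own block-level
knowledge of what changed, it never has to stat the million files the
way rsync does.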
Re: [zfs-discuss] million files in single directory
Jeff Haferman wrote:
> A user has 5 directories, each has tens of thousands of files, the
> largest directory has over a million files. The files themselves are
> not very large, here is an "ls -lh" on the directories:
> [these are all ZFS-based]
>
> [r...@cluster]# ls -lh
> total 341M
> drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
> drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
> drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
> drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
> drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/
>
> When I go into directory "0", it takes about a minute for an
> "ls -1 | grep wc" to return (it has about 12,000 files). Directory
> "1" takes between 5-10 minutes for the same command to return (it
> has about 50,000 files).

"ls" sorts its output before printing, unless you use the option to
turn this off (-f, IIRC, but check the man page). "echo * | wc" is
also a way to find out what's in a directory, but you'll miss
"."-files, and the shell you're using may have an influence ..

HTH
Michael
-- 
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
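Michael's point is easy to check in a scratch directory. This is a
small demonstration with an arbitrary path and file count of mine, not
anything from the thread; note that -f disables sorting and (at least
with GNU ls) also implies -a, so "." and ".." are counted too.

```shell
# Create a throwaway directory with 1000 empty files.
mkdir -p /tmp/ls_demo
cd /tmp/ls_demo
i=0
while [ "$i" -lt 1000 ]; do : > "f$i"; i=$((i + 1)); done

ls -1  | wc -l   # sorted listing, then count
ls -1f | wc -l   # no sort; may also include "." and ".."
```

On a huge directory, skipping the sort (and the column formatting) is
where the time savings come from; for a pure count, "ls -1f | wc -l"
avoids holding and sorting the whole name list.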
[zfs-discuss] million files in single directory
A user has 5 directories, each with tens of thousands of files; the
largest directory has over a million files. The files themselves are
not very large. Here is an "ls -lh" on the directories [these are all
ZFS-based]:

[r...@cluster]# ls -lh
total 341M
drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/

When I go into directory "0", it takes about a minute for an
"ls -1 | grep wc" to return (it has about 12,000 files). Directory "1"
takes between 5-10 minutes for the same command to return (it has
about 50,000 files).

I did an rsync of this directory structure to another filesystem
[lustre-based, FWIW] and it took about 24 hours to complete. We have
done rsyncs on other directories that are much larger in terms of file
sizes, but have thousands of files rather than tens of thousands,
hundreds of thousands, and millions of files.

Is there some way to speed up "simple" things like determining the
contents of these directories? And why does an rsync take so much
longer on these directories when directories that contain hundreds of
gigabytes transfer much faster?

Jeff