Re: [zfs-discuss] million files in single directory

2009-10-05 Thread David Dyer-Bennet

On Sat, October 3, 2009 20:50, Jeff Haferman wrote:
> And why does an rsync take so much
> longer on these directories when directories that contain hundreds of
> gigabytes transfer much faster?

The rsync protocol has to exchange information about each file between
client and server as part of deciding whether to send that file.
Clearly there will be many more such exchanges in the directories
containing many more files.  It therefore seems quite natural to me
that, given two directories with the same amount of actual data, the one
with more files will take longer to rsync.

(The time will ALSO depend on the amount of actual data, of course; given
two directories with the same number of files, but one having 10x the
data, the one with more data will at least sometimes take longer,
particularly if the files differ and the data actually has to be
transmitted.)
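
(If you want to see where the time actually goes, something like the
following gives a rough breakdown; this assumes GNU rsync and made-up paths:

  rsync -a -n --stats /tank/bigdir/ backuphost:/backup/bigdir/

-n (--dry-run) skips the data transfer, and --stats reports the number of
files plus the time spent building and transferring the file list, which is
usually where a million-file directory hurts.)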
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] million files in single directory

2009-10-04 Thread Phil Harman
That section doesn't actually prescribe one size, so what size did you  
choose and how exactly did you set it?
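
(For reference, that section caps the ARC with a line in /etc/system; a
minimal sketch, where the 4 GB value below is only an example, not a
recommendation:

  set zfs:zfs_arc_max = 0x100000000    # cap the ARC at 4 GB; takes effect after a reboot

and afterwards the current and maximum ARC sizes can be checked with, e.g.:

  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
)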


You haven't told us about the basic system config, and nobody has asked.
For starters, what CPU, memory and storage? What other
stuff is this machine doing?


We also really need to know which version of Solaris (including relevant
patches) you are using.


What other changes have you made, if any?

Thanks,
Phil

Sent from my iPhone

On 5 Oct 2009, at 00:24, Jeff Haferman  wrote:


> Rob Logan wrote:
>>
>> >> Directory "1" takes between 5-10 minutes for the same command to
>> >> return (it has about 50,000 files).
>>
>> > That said, directories with 50K files list quite quickly here.
>>
>> a directory with 52,705 files lists in half a second here
>>
>> 36 % time \ls -1 > /dev/null
>> 0.41u 0.07s 0:00.50 96.0%
>>
>> perhaps your ARC is too small?
>
> I set it according to Section 1.1 of the ZFS Evil Tuning Guide:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide




Re: [zfs-discuss] million files in single directory

2009-10-04 Thread Jeff Haferman
Rob Logan wrote:
> 
> >> Directory "1" takes between 5-10 minutes for the same command to
> >> return (it has about 50,000 files).
> 
> > That said, directories with 50K files list quite quickly here.
> 
> a directory with 52,705 files lists in half a second here
> 
> 36 % time \ls -1 > /dev/null
> 0.41u 0.07s 0:00.50 96.0%
> 
> perhaps your ARC is too small?
> 


I set it according to Section 1.1 of the ZFS Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide




Re: [zfs-discuss] million files in single directory

2009-10-04 Thread Rob Logan


>> Directory "1" takes between 5-10 minutes for the same command to
>> return (it has about 50,000 files).

> That said, directories with 50K files list quite quickly here.

a directory with 52,705 files lists in half a second here

36 % time \ls -1 > /dev/null
0.41u 0.07s 0:00.50 96.0%

perhaps your ARC is too small?

Rob


Re: [zfs-discuss] million files in single directory

2009-10-04 Thread Bob Friesenhahn

On Sat, 3 Oct 2009, Jeff Haferman wrote:


> When I go into directory "0", it takes about a minute for an "ls -1 |
> grep wc" to return (it has about 12,000 files).  Directory "1" takes
> between 5-10 minutes for the same command to return (it has about 50,000
> files).


This seems kind of slow.  In the directory with a million files that I
keep around for testing, this is the time for the first access:


% time \ls -1 | grep wc
\ls -1  4.70s user 1.20s system 32% cpu 17.994 total
grep wc  0.11s user 0.02s system 0% cpu 17.862 total

and for the second access:

% time \ls -1 | grep wc
\ls -1  4.66s user 1.17s system 69% cpu 8.366 total
grep wc  0.11s user 0.02s system 1% cpu 8.234 total

However, my directory was created as quickly as possible rather than
incrementally over a long period of time, so it lacks the longer disk
seeks caused by fragmentation and block allocation.


That said, directories with 50K files list quite quickly here.


> I did an rsync of this directory structure to another filesystem
> [lustre-based, FWIW] and it took about 24 hours to complete.  We have


Rsync is very slow in such situations.

What version of Solaris are you using?  The Solaris version (including 
patch version if using Solaris 10) can make a big difference.
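
(Something like the output of these two commands would tell us:

  cat /etc/release
  uname -a
)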


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] million files in single directory

2009-10-03 Thread Brent Jones
On Sat, Oct 3, 2009 at 6:50 PM, Jeff Haferman  wrote:
>
> A user has 5 directories, each has tens of thousands of files, the
> largest directory has over a million files.  The files themselves are
> not very large, here is an "ls -lh" on the directories:
> [these are all ZFS-based]
>
> [r...@cluster]# ls -lh
> total 341M
> drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
> drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
> drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
> drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
> drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/
>
> When I go into directory "0", it takes about a minute for an "ls -1 |
> grep wc" to return (it has about 12,000 files).  Directory "1" takes
> between 5-10 minutes for the same command to return (it has about 50,000
> files).
>
> I did an rsync of this directory structure to another filesystem
> [lustre-based, FWIW] and it took about 24 hours to complete.  We have
> done rsyncs on other directories that are much larger in terms of
> file-sizes, but have thousands of files rather than tens, hundreds, and
> millions of files.
>
> Is there some way to speed up "simple" things like determining the
> contents of these directories?  And why does an rsync take so much
> longer on these directories when directories that contain hundreds of
> gigabytes transfer much faster?
>
> Jeff
>

Be happy you don't have Windows + NTFS with hundreds of thousands, or
millions of files.
Explorer will crash, run your system out of memory and slow it down,
or outright hard-lock Windows for hours on end.
This is on brand-new hardware: 64-bit, 32GB RAM, and 15k SAS disks.

Regardless of filesystem, I'd suggest splitting your directory
structure into a hierarchy. It makes sense even just for cleanliness.
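
A minimal sketch of what that might look like (made-up paths, plain Bourne
shell; it forks a couple of times per file, so it is slow but simple):

  cd /tank/bigdir                          # hypothetical flat directory
  for f in *; do
      d=`echo "$f" | cut -c1-2`            # bucket on the first two characters of the name
      mkdir -p "/tank/bigdir-split/$d"
      mv "./$f" "/tank/bigdir-split/$d/"
  done

Hashing the name instead of taking a prefix spreads the buckets more evenly,
but the idea is the same.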


-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] million files in single directory

2009-10-03 Thread Bryan Allen
+--
| On 2009-10-03 18:50:58, Jeff Haferman wrote:
| 
| I did an rsync of this directory structure to another filesystem
| [lustre-based, FWIW] and it took about 24 hours to complete.  We have
| done rsyncs on other directories that are much larger in terms of
| file-sizes, but have thousands of files rather than tens, hundreds, and
| millions of files.
| 
| Is there some way to speed up "simple" things like determining the
| contents of these directories?

Use zfs snapshots. See zfs(1M) and review the incremental send syntax.

| And why does an rsync take so much
| longer on these directories when directories that contain hundreds of

rsync has to build its file list (stat is slow) on both sides of the sync, then
compare the lists, and then send each changed file. (d)truss it sometime. It's a
lot of syscalls.
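
For example, on Solaris, something like:

  truss -c -p 12345    # attach to the rsync pid (12345 is made up); -c prints
                       # a per-syscall count summary when you interrupt it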

The initial zfs send may be slow, depending on the total size, but the
incrementals will be pretty fast. Certainly faster than rsync (by orders of
magnitude), as ZFS already knows which blocks it needs to send, and only
sends those blocks.

If the target host doesn't support ZFS in some form, you could dump the
snapshots to disk and use those for backups.
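
A minimal sketch, with made-up dataset and host names:

  zfs snapshot tank/data@mon
  zfs send tank/data@mon | ssh backuphost zfs receive backup/data

  # later: send only the blocks changed since the previous snapshot
  zfs snapshot tank/data@tue
  zfs send -i tank/data@mon tank/data@tue | ssh backuphost zfs receive backup/data

  # no ZFS on the far end? dump the stream to a file instead
  zfs send -i tank/data@mon tank/data@tue > /backup/data-mon-tue.zfs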

Or restructure your storage hierarchy (which uh, you might want to do anyway).
-- 
bda
cyberpunk is dead. long live cyberpunk.


Re: [zfs-discuss] million files in single directory

2009-10-03 Thread michael schuster

Jeff Haferman wrote:

> A user has 5 directories, each has tens of thousands of files, the
> largest directory has over a million files.  The files themselves are
> not very large, here is an "ls -lh" on the directories:
> [these are all ZFS-based]
>
> [r...@cluster]# ls -lh
> total 341M
> drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
> drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
> drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
> drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
> drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/
>
> When I go into directory "0", it takes about a minute for an "ls -1 |
> grep wc" to return (it has about 12,000 files).  Directory "1" takes
> between 5-10 minutes for the same command to return (it has about 50,000
> files).


"ls" sorts its output before printing, unless you use the option to turn 
this off (-f, IIRC, but check the man-page).


"echo * | wc" is also a way to find out what's in a directory, but you'll 
miss "."files, and the shell you're using may have an influence ..


HTH
Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'


[zfs-discuss] million files in single directory

2009-10-03 Thread Jeff Haferman

A user has 5 directories, each has tens of thousands of files, the
largest directory has over a million files.  The files themselves are
not very large, here is an "ls -lh" on the directories:
[these are all ZFS-based]

[r...@cluster]# ls -lh
total 341M
drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/

When I go into directory "0", it takes about a minute for an "ls -1 |
grep wc" to return (it has about 12,000 files).  Directory "1" takes
between 5-10 minutes for the same command to return (it has about 50,000
files).

I did an rsync of this directory structure to another filesystem
[lustre-based, FWIW] and it took about 24 hours to complete.  We have
done rsyncs on other directories that are much larger in terms of
file-sizes, but have thousands of files rather than tens, hundreds, and
millions of files.

Is there some way to speed up "simple" things like determining the
contents of these directories?  And why does an rsync take so much
longer on these directories when directories that contain hundreds of
gigabytes transfer much faster?

Jeff
