Hi,
We have approximately 3 million active users and a storage capacity of
300 TB in ZFS zpools.
The ZFS file systems are mounted on a Sun Cluster of three T2000 servers,
connected via FC to SAN storage.
Each zpool is a single LUN on the SAN, which already provides RAID, so we're
not doing RAIDZ on top of it.

We started using ZFS about 2.5 years ago on Solaris 10 U3 or U4 (I can't
recall which).
Our storage growth is roughly 4TB a month.

The zpool sizes range from 2 TB up to 32 TB.
We're using ZFS to store mail headers (less than 4K) and attachments (1K to
12 MB).

Currently the Sun cluster handles approx. 20K NFS OPS.
File sizing:

- 10 million files less than 4K a day.
- In addition to those 10 million, another 10 million of various sizes:
  - 20% less than 4K
  - 25% between 4K and 8K
  - 50% between 9K and 100K
  - 5% above 100K, up to 12M

Total 20 million new files a day.

We're using two file hierarchies for storing files:

For the mail headers (less than 4K):

/FF/YYMM/DDHH/SS/ABCDEFGH

Explanation:

First directory is the mount point, from 00..FF (up to 256 directories);
Second directory is year and month;
Third directory is day and hour;
Fourth is seconds;
At the end we have a gzipped file of up to 1K.
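For illustration, roughly how such a path can be derived (a sketch in
Python; deriving the 00..FF directory and the 8-character file name from a
hash of the message ID is my assumption here, the application may pick them
differently):

    import hashlib
    import time

    def header_path(message_id, ts):
        # Assumption: the 00..FF top-level directory and the final
        # file name come from a hash of the message ID.
        digest = hashlib.md5(message_id.encode()).hexdigest()
        top = digest[:2].upper()            # mount-point dir, 00..FF
        t = time.gmtime(ts)
        yymm = time.strftime("%y%m", t)     # year and month
        ddhh = time.strftime("%d%H", t)     # day and hour
        ss = time.strftime("%S", t)         # seconds
        fname = digest[2:10].upper()        # 8 hex chars, e.g. ABCDEFGH
        return "/%s/%s/%s/%s/%s" % (top, yymm, ddhh, ss, fname)

    print(header_path("<12345@example.com>", time.time()))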



For the mail objects:
We do single-instancing/de-dup in our application (meaning no maildir or
mbox).

Mail objects can be 1K up to 12MB.
Directory structure is as follows.

 /FF/FF/FF/FF/FF/FF/FF/FF/FF/file

Explanation:

The first directory holds 256 directories, 00 to FF, and each subsequent
level holds up to 256 directories, with the lower branches holding fewer
directories than the higher ones. At the end of the hierarchy there's a
single file.
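A sketch of how that layout pairs with the de-dup (assuming the nine
directory levels come from a content hash, which is what makes identical
attachments land on the same path; which hash the application actually uses
isn't the point here):

    import hashlib

    def object_path(data):
        # Content hash: identical attachments produce identical paths,
        # which is the single-instancing property.
        digest = hashlib.sha1(data).hexdigest()
        levels = [digest[i:i + 2].upper() for i in range(0, 18, 2)]
        return "/" + "/".join(levels) + "/" + digest  # nine 00..FF levels

    print(object_path(b"example attachment bytes"))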


Mail operation:
When a new mail arrives we split it into objects: a header, plus one object
per attachment (even text within the body is an object).
The header files are stored by "timestamp" (/FF/YYMM/DDHH/SS/file), which
may be an advantage for reads: when users read their mail the same day, the
directory and file metadata can be in cache.



This is not the case for the attachments, though, since they are stored in
directories derived from their hex value.
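For concreteness, a rough sketch of the splitting step (using Python's
standard email module; the function and its rules are illustrative, not our
actual code):

    import email
    from email import policy

    def split_mail(raw_bytes):
        msg = email.message_from_bytes(raw_bytes, policy=policy.default)
        # Header object: the message's header block on its own.
        header_obj = "".join(
            "%s: %s\n" % (k, v) for k, v in msg.items()).encode()
        # One object per leaf part: body text and attachments alike.
        parts = [part.get_payload(decode=True)
                 for part in msg.walk() if not part.is_multipart()]
        return header_obj, [p for p in parts if p is not None]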

Our main issue over the last 2.5 years of using ZFS:
when a zpool becomes full, write operations become significantly slower.
At first it happened around 90% zpool capacity; now, after 30-40 zpools, it
happens around 80%.
For us this means that if we define a 4 TB zpool, we can effectively use
only 3.2 TB (80%).
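A sketch of how one can watch for this before it bites (Python, parsing
zpool list's scripted mode; the 80% threshold is from our experience above):

    import subprocess

    THRESHOLD = 80  # writes slow down past this, in our experience

    def nearly_full_pools(threshold=THRESHOLD):
        # zpool list -H -o name,capacity prints one tab-separated
        # line per pool, with capacity as e.g. "80%".
        out = subprocess.check_output(
            ["zpool", "list", "-H", "-o", "name,capacity"]).decode()
        flagged = []
        for line in out.strip().splitlines():
            name, cap = line.split("\t")
            if int(cap.rstrip("%")) >= threshold:
                flagged.append(name)
        return flagged

    print(nearly_full_pools())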


Is there a "best practice" from Sun/ZFS for building directory hierarchies
that hold a huge number of files (20M new a day)? Also, how can we avoid the
performance degradation when we reach 80% of zpool capacity?





Regards



Yariv