A couple of points/additions with regard to oracle in particular:

When talking about large database installations, copy-on-write may or may not apply. The files are never completely rewritten, only changed internally via mmap(). When you lay down your database, you will generally allocate the storage for the anticipated capacity required. That will result in sparse files in the actual filesystems.
        This brings up the question:
How does ZFS allocate sparse files, and, how does the allocation occur as the sparse files have data added?

Regarding the separation of data files, you *really* want your logs to be in a different place (spindles-wise) than your DB. After all, should you have a catastrophic failure (crash, disk hiccup, etc.), your redo and transaction logs are your recovery system.
        With this in mind, I'd envisioned using zfs as such:

- Allocate a number of database filesystems using a 'db' or standard pool. Generally 1 per CPU, as oracle will use parallel queries if the tables are spread across multiple filesystems. - Allocate another pool from another storage system (internal, another disk array, etc.) the log areas. Name the pool something like 'dblog'.

        That would guarantee that you don't mix your data types.

I'm interested to see how a zfs pool using multiple LUNs on a large storage array will behave when using a database. I think the performance spreading across multiple luns will result in increased db performance.

On a side note, does anybody know of a way to track hot areas on the disk? One concern I've got with the zfs pool layout is multiple tables being allocated on the same LUN and both tables being used heavily. The need to move the data around within the pool may be required to address a single LUN bottleneck. Has anybody thought of this situation?

On May 10, 2006, at 4:48 PM, Boyd Adamson wrote:

One question that has come up a number of times when I've been speaking with people (read: evangelizing :) ) about ZFS is about database storage. In conventional use storage has separated redo logs from table space, on a spindle basis.

I'm not a database expert but I believe the reasons boil down to a combination of:
- Separation for redundancy

- Separation for reduction of bottlenecks (most write ops touch both the logs and the table)

- Separation of usage patterns (logs are mostly sequential writes, tables are random).

The question then comes up about whether in a ZFS world this separation is still needed. It seems to me that each of the above reasons is to some extent ameliorated by ZFS: - Redundancy is performed at the filesystem level, probably on all disks in the pool.

- Dynamic striping and copy-on-write mean that all write ops can be striped across vdevs and the log writes can go right next to the table writes

- Copy-on-write also turns almost all writes into sequential writes anyway.

So it seems that the old reasoning may no longer apply. Is my thinking correct here? Have I missed something? Do we have any information to support either the use of a single pool or of separate pools for database usage?

Boyd
Melbourne, Australia

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382                 [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to