A couple of points/additions with regard to oracle in particular:
When talking about large database installations, copy-on-write may
or may not apply. The files are never completely rewritten, only
changed internally via mmap(). When you lay down your database, you
will generally allocate the storage for the anticipated capacity
required. That will result in sparse files in the actual filesystems.
This brings up the question:
How does ZFS allocate sparse files, and, how does the allocation
occur as the sparse files have data added?
Regarding the separation of data files, you *really* want your logs
to be in a different place (spindles-wise) than your DB. After all,
should you have a catastrophic failure (crash, disk hiccup, etc.),
your redo and transaction logs are your recovery system.
With this in mind, I'd envisioned using zfs as such:
- Allocate a number of database filesystems using a 'db' or standard
pool. Generally 1 per CPU, as oracle will use parallel queries if
the tables are spread across multiple filesystems.
- Allocate another pool from another storage system (internal,
another disk array, etc.) the log areas. Name the pool something
like 'dblog'.
That would guarantee that you don't mix your data types.
I'm interested to see how a zfs pool using multiple LUNs on a large
storage array will behave when using a database. I think the
performance spreading across multiple luns will result in increased
db performance.
On a side note, does anybody know of a way to track hot areas on the
disk? One concern I've got with the zfs pool layout is multiple
tables being allocated on the same LUN and both tables being used
heavily. The need to move the data around within the pool may be
required to address a single LUN bottleneck. Has anybody thought of
this situation?
On May 10, 2006, at 4:48 PM, Boyd Adamson wrote:
One question that has come up a number of times when I've been
speaking with people (read: evangelizing :) ) about ZFS is about
database storage. In conventional use storage has separated redo
logs from table space, on a spindle basis.
I'm not a database expert but I believe the reasons boil down to a
combination of:
- Separation for redundancy
- Separation for reduction of bottlenecks (most write ops touch
both the logs and the table)
- Separation of usage patterns (logs are mostly sequential writes,
tables are random).
The question then comes up about whether in a ZFS world this
separation is still needed. It seems to me that each of the above
reasons is to some extent ameliorated by ZFS:
- Redundancy is performed at the filesystem level, probably on all
disks in the pool.
- Dynamic striping and copy-on-write mean that all write ops can be
striped across vdevs and the log writes can go right next to the
table writes
- Copy-on-write also turns almost all writes into sequential writes
anyway.
So it seems that the old reasoning may no longer apply. Is my
thinking correct here? Have I missed something? Do we have any
information to support either the use of a single pool or of
separate pools for database usage?
Boyd
Melbourne, Australia
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273 Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382 [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus
Torvalds
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss