Christiaan Willemsen wrote:
> I'm new to OpenSolaris and very new to ZFS. In the past we have always used 
> Linux for our database backends.
>
> So now we are looking for a new database server to give us a big performance 
> boost, and also the possibility for scalability.
>
> Our current database consists mainly of a huge table containing about 230 
> million records and a few (relatively) smaller tables (something like 13 
> million records and less). The main table is growing by about 800k records 
> every day, and the prognosis is that this number will increase significantly 
> in the near future.
>
> All of this is currently held in a PostgreSQL database with the largest 
> tables divided into segments to speed up performance. This all runs on a 
> Linux machine with 4 GB of RAM and 4 10K SCSI disks in HW RAID 10. The 
> complete database is about 70 GB in size, and growing every day.
>
> We will soon need new hardware, and are also reviewing our software needs.
>
> Besides a lot more RAM (16 or 32 GB), the new machine will also get a much 
> larger disk array. We don't need the size, but we do need the IO it can 
> generate. And we also need it to be able to scale: when needs grow, it 
> should be possible to add more disks to handle the extra IO.
>   

Why not go to 128-256 GBytes of RAM?  It isn't that expensive and would
significantly help give you a "big performance boost" ;-)

> And that is exactly where ZFS comes in, at least as far as I have read.
>
> The question is: how can we maximize IO by using the best possible 
> combination of hardware and ZFS RAID?
>   

Adding lots of RAM would allow you to minimize I/O -- generally
a good thing -- do less of those things that hurt.

Any modern machine should be able to generate decent I/O demand.
I don't think that the decision to choose RAID array redundancy
should be based purely on performance.  You might find more
differentiation in the RAS side of such designs.

> I'll probably have 16 Seagate 15K5 SAS disks, 150 GB each.  Two in HW 
> RAID 1 for the OS, two in HW RAID 1 or 10 for the transaction log. The OS 
> does not need to be on ZFS, but could be. 
>   

The database transaction log should be relatively small, so I would
look for two LUNs (disks), mirrored.  Similarly, the ZIL should be
relatively small -- two LUNs (disks), mirrored.  You will want ZFS to
manage the redundancy here, so think about mirroring at the
ZFS level.  The actual size needed will be based on the transaction
load which causes writes.  For ZIL sizing, we like to see something
like 20 seconds worth of write workload.  In most cases, this will
fit into the write cache of a decent array, so you may not have to
burn an actual pair of disks in the backing store.  But since I don't
know the array you're using, it will be difficult to be specific.
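
As a sketch of the ZFS-managed mirrored log, something like the
following (the pool and device names are made up; use whatever your
array presents):

  # add a mirrored log device (slog) to an existing pool;
  # "dbpool", c3t0d0 and c3t1d0 are example names only
  zpool add dbpool log mirror c3t0d0 c3t1d0

  # rough ZIL sizing: ~20 seconds of synchronous write load,
  # e.g. 50 MB/s of sync writes works out to about 1 GB of log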

> So that leaves 10 or 12 disks to configure for the database. The question is 
> how to divide them to get the best IO performance by mixing the best of both 
> worlds.
>
> From what I have read, mirroring and striping should get me better 
> performance than raidz or RAID 5. But I guess you might give me some 
> pointers on how to distribute the disks. My biggest question is what I 
> should leave to the HW RAID, and what to ZFS?
>   

Array-based RAID-5 implementations perform fairly well.
If the data is important and difficult to reload, then you should
consider using some sort of ZFS redundancy: mirror or copies.
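
For example (pool and device names are placeholders), ZFS-level
mirroring over the data disks would look something like this, with
room to grow the stripe later:

  # five 2-way mirrors striped together
  zpool create dbpool \
      mirror c2t0d0 c2t1d0 \
      mirror c2t2d0 c2t3d0 \
      mirror c2t4d0 c2t5d0 \
      mirror c2t6d0 c2t7d0 \
      mirror c2t8d0 c2t9d0

  # when the I/O load grows, add another mirror pair to the stripe
  zpool add dbpool mirror c2t10d0 c2t11d0

  # or, if the pool sits on array-based RAID-5 LUNs, the copies
  # property gives some ZFS-level redundancy without ZFS mirroring
  zfs create -o copies=2 dbpool/pgdata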

David Collier-Brown wrote:
>   This is a bit of a sidebar to the discussion about getting the 
> best performance for PostgreSQL from ZFS, but may affect
> you if you're doing sequential scans through the 70GB table
> or its segments.
>
>   ZFS copy-on-write results in tables' contents being spread across
> the full width of their stripe, which is arguably a good thing
> for transaction processing performance (or at least can be), but
> makes sequential table-scan speed degrade.
>  
>   If you're doing sequential scans over large amounts of data
> which isn't changing very rapidly, such as older segments, you
> may want to re-sequentialize that data.

There is a general feeling that COW, as used by ZFS, will cause
all sorts of badness for database scans.  Alas, there is a dearth of
real-world data on any impacts (I'm anxiously awaiting...)
There are cases where this won't be a problem at all, but it will
depend on how you use the data.

In this particular case, it would be cost effective to just buy a
bunch of RAM and not worry too much about disk I/O during
scans.  In the future, if you significantly outgrow the RAM, then
there might be a case for a ZFS (L2ARC) cache LUN to smooth
out the bumps.  You can probably defer that call until later.
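
If you do get there, adding a cache device later is a one-liner
(device name is an example):

  # add an SSD or other fast spare LUN as an L2ARC cache device
  zpool add dbpool cache c4t0d0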

Backups might be challenging, so ZFS snapshots might help
reduce that complexity.
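
For instance (dataset and host names are made up, and the database
data and its WAL should share the snapshot, or the database be
quiesced, for the copy to be consistent):

  # point-in-time copy of the database filesystem
  zfs snapshot dbpool/pgdata@nightly
  # stream it off-host for the actual backup
  zfs send dbpool/pgdata@nightly | ssh backuphost zfs receive backuppool/pgdata
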
 -- richard
