[zfs-discuss] Pure SSD Pool

2011-07-09 Thread Karl Rossing

Hi,

I have a dual-Xeon, 64GB, 1U server with two free 3.5" drive slots. I also 
have a free PCI-E slot.


I'm going to run a Postgres database with a business intelligence 
application.


The database size is not really set; it will be between 250 and 500GB, 
running on Solaris 10 or b134.


My storage choices are:

  1. OCZ Z-Drive R2, which fits in a 1U PCI slot. The docs say it has a
     RAID controller built in; I don't know if that can be disabled.
  2. Mirrored OCZ Talos C Series 3.5" SAS drives:
     http://www.ocztechnology.com/products/solid-state-drives/sas.html
  3. Mirrored OCZ SATA II 3.5" drives:
     http://www.ocztechnology.com/products/solid_state_drives/sata_3_5_solid_state_drives

I'm looking for comments on the above drives, or recommendations for other 
affordable drives to run in a pure SSD pool.


Also, what drives do you run in a pure SSD pool?

Thanks
Karl







Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-09 Thread Edward Ned Harvey
Given the abysmal performance, I have to assume there are a significant
number of overhead reads or writes to maintain the DDT for each actual block
write operation.  Something I didn't mention in the other email is that I
also tracked iostat throughout the whole operation.  It's all writes (or at
least 99.9% writes).  So I am forced to conclude it's a bunch of small DDT
maintenance writes taking place and incurring access-time penalties on top
of each intended single-block access-time penalty.

The nature of the DDT is that it's a bunch of small blocks that tend to be
scattered randomly and require maintenance in order to do anything else.
This sounds like precisely the usage pattern that benefits from low-latency
devices such as SSDs.

I understand the argument: the DDT must be stored in the primary storage
pool so you can increase the size of the pool without running out of space
to hold the DDT...  But it's a fatal design flaw as long as you care about
performance.  If you don't care about performance, you might as well use a
NetApp and do offline dedup.  The point of online dedup is to gain
performance, so in ZFS you have to care about performance.

There are only two possible ways to fix the problem.  
Either ...
The DDT must be changed so it can be stored entirely in a designated
sequential area of disk, and maintained entirely in RAM, so all DDT
reads/writes can be infrequent and serial in nature...  This would solve the
case of async writes and large sync writes, but would still perform poorly
for small sync writes.  And it would be memory intensive.  But it should
perform very nicely given those limitations.  ;-)
Or ...
The DDT stays as it is now, highly scattered small blocks, and there needs
to be an option to store it entirely on low-latency devices such as
dedicated SSDs.  Eliminate the need for the DDT to reside on the slow
primary storage pool disks.  I understand you must consider what happens
when the dedicated SSD gets full.  The obvious choices would be either (a)
dedup turns off whenever the metadata device is full, or (b) it defaults to
writing DDT blocks to the main storage pool.  Maybe that could even be a
configurable behavior.  Either way, there's a very realistic use case here.
For some people in some situations, it may be acceptable to say: "I have a
32G mirrored metadata device; divided by 137 bytes per entry, I can dedup up
to a maximum of 218M unique blocks in the pool, and if I estimate a 100K
average block size, that means up to 20T of primary pool storage.  If I
reach that limit, I'll add more metadata device."
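
For concreteness, here is the arithmetic behind that estimate as a little
Python sketch.  The entry size, device size, and average block size are just
the assumed figures above (real DDT entry sizes vary), so treat the output
as a ballpark:

    # Back-of-the-envelope dedup capacity estimate.  All inputs are the
    # assumed figures from the paragraph above, not measured values.
    metadata_device_bytes = 32 * 10**9    # 32G mirrored metadata device
    ddt_entry_bytes       = 137           # assumed bytes per DDT entry
    avg_block_bytes       = 100 * 10**3   # estimated 100K average block size

    max_unique_blocks = metadata_device_bytes // ddt_entry_bytes
    max_pool_bytes    = max_unique_blocks * avg_block_bytes

    # Roughly 230M unique blocks and ~23T of dedup'd pool storage with these
    # inputs -- the same ballpark as the 218M / 20T figures above.
    print("max unique blocks: %dM" % (max_unique_blocks // 10**6))
    print("max pool storage : %.0fT" % (max_pool_bytes / 10**12))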

Both of those options would also go a long way toward eliminating the
surprise delete performance black hole.



Re: [zfs-discuss] Pure SSD Pool

2011-07-09 Thread Roy Sigurd Karlsbakk
 I have a dual-Xeon, 64GB, 1U server with two free 3.5" drive slots. I
 also have a free PCI-E slot.

 I'm going to run a Postgres database with a business intelligence
 application.

 The database size is not really set; it will be between 250 and 500GB,
 running on Solaris 10 or b134.

Running business-critical stuff on b134 isn't what I'd recommend - there are 
no updates anymore. Either use S10, S11 Express, or perhaps OpenIndiana.

 My storage choices are
 
 1. OCZ Z-Drive R2 which fits in a 1U PCI slot. The docs say it has a
 raid controller built in. I don't know if that can be disabled.
 2. Mirrored OCZ Talos C Series 3.5 SAS drives.
 3. Mirrored OCZ SATA II 3.5 drives.
 
 I'm looking for comments on the above drives or recommendations on
 other affordable drives running in a pure SSD pool.
 
 Also what drives do you run as a pure SSD pool?

Most drives should work well for a pure SSD pool. I have a PostgreSQL database 
on a Linux box on a mirrored set of C300s. AFAIK ZFS doesn't yet support TRIM, 
so that can be an issue. Apart from that, it should work well.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-09 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 When you read back duplicate data that was previously written with
 dedup, then you get a lot more cache hits, and as a result, the reads go
 faster.  Unfortunately these gains are diminished...  I don't know by
 what...  But you only have about 2x to 4x performance gain reading
 previously dedup'd data, as compared to reading the same data which was
 never dedup'd.  Even when repeatedly reading the same file which is 100%
 duplicate data (created by dd from /dev/zero) so all the data is 100% in
 cache...   I still see only 2x to 4x performance gain with dedup.

For what it's worth:

I also repeated this without dedup.  Created a large file (17G, just big
enough that it will fit entirely in my ARC).  Rebooted.  Timed reading it.
Now it's entirely in cache.  Timed reading it again.

When it's not cached, of course the read time was equal to the original
write time.  When it's cached, it goes 4x faster.  Perhaps this is only
because I'm testing on a machine that has super-fast storage...  11 striped
SAS disks yielding 8Gbit/sec, as compared to all-RAM, which yielded
31.2Gbit/sec.  It seems that in this case, RAM is only 4x faster than the
storage itself...  But I would have expected a couple of orders of
magnitude...  So perhaps my expectations are off, or the ARC itself simply
incurs overhead.  Either way, dedup is not to blame for obtaining merely a
2x or 4x performance gain over the non-dedup equivalent.
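
In case anyone wants to repeat the comparison, the timing loop is
essentially this (a minimal Python sketch; the path and chunk size are
placeholders, and the cold-cache number is only meaningful on the first run
after a reboot):

    # Time a sequential read of a large test file and report throughput.
    # Run once right after a reboot (cold: data comes from disk) and again
    # immediately afterwards (warm: data comes from the ARC).
    import time

    path = "/tank/testfile"    # placeholder for the 17G file created earlier
    chunk = 1 << 20            # read 1 MiB at a time

    start = time.time()
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    elapsed = time.time() - start

    print("read %d bytes in %.1f s = %.2f Gbit/s"
          % (total, elapsed, total * 8 / elapsed / 1e9))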



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-09 Thread Roy Sigurd Karlsbakk
 When it's not cached, of course the read time was equal to the original
 write time. When it's cached, it goes 4x faster. Perhaps this is only
 because I'm testing on a machine that has super fast storage... 11 striped
 SAS disks yielding 8Gbit/sec as compared to all-RAM which yielded
 31.2Gbit/sec. It seems in this case, RAM is only 4x faster than the storage
 itself... But I would have expected a couple orders of magnitude... So
 perhaps my expectations are off, or the ARC itself simply incurs overhead.
 Either way, dedup is not to blame for obtaining merely 2x or 4x performance
 gain over the non-dedup equivalent.

Could you test with some SSD SLOGs and see how well or badly the system performs?

Vennlige hilsener / Best regards

roy


Re: [zfs-discuss] Replacement disks for Sun X4500

2011-07-09 Thread Roy Sigurd Karlsbakk
 Oh - and as a final point - if you are planning to run Solaris on this
 box, make sure they are not the 4KB sector disks, as at least in my
 experience, their performance with ZFS is profoundly bad. Particularly
 with all the metadata update stuff...

The Hitachi Deskstar uses 512-byte sectors.

Vennlige hilsener / Best regards

roy


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-09 Thread Edward Ned Harvey
 From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
 Sent: Saturday, July 09, 2011 2:33 PM
 
 Could you test with some SSD SLOGs and see how well or bad the system
 performs?

These are all async writes, so the slog won't be used.  They're async writes 
with a single fflush() and fsync() at the end, to ensure system buffering is 
not skewing the results.
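
Roughly, the write side of the test looks like this (a minimal Python
sketch, not the actual harness; the path, block size, and count are
placeholders, and you'd want unique data rather than zeros if every block
should be a new DDT entry):

    # Buffered (async) writes with a single flush + fsync at the end, so the
    # OS buffer cache can't flatter the numbers.  Placeholder parameters.
    import os, time

    path = "/tank/dduptest/outfile"
    block = b"\0" * (128 * 1024)     # one 128K block; use os.urandom() for
                                     # unique, non-dedupable data
    count = 8192                     # ~1G total

    start = time.time()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)           # buffered, asynchronous writes
        f.flush()                    # drain the userland buffer...
        os.fsync(f.fileno())         # ...and force it out to stable storage
    elapsed = time.time() - start

    print("wrote %.0f MB in %.2f s" % (len(block) * count / 1e6, elapsed))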



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-09 Thread Roy Sigurd Karlsbakk
  From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
  Sent: Saturday, July 09, 2011 2:33 PM
 
  Could you test with some SSD SLOGs and see how well or bad the
  system
  performs?
 
 These are all async writes, so slog won't be used. Async writes that
 have a single fflush() and fsync() at the end to ensure system
 buffering is not skewing the results.

Sorry, my bad - I meant an L2ARC, to help buffer the DDT.

Vennlige hilsener / Best regards

roy