[zfs-discuss] Pure SSD Pool
Hi, I have a dual-Xeon 64GB 1U server with two free 3.5" drive slots. I also have a free PCI-E slot. I'm going to run a PostgreSQL database with a business intelligence application. The database size is not really set; it will be between 250-500GB, running on Solaris 10 or b134.

My storage choices are:

1. OCZ Z-Drive R2, which fits in a 1U PCI-E slot. The docs say it has a RAID controller built in; I don't know if that can be disabled.
2. Mirrored OCZ Talos C Series 3.5" SAS drives: http://www.ocztechnology.com/products/solid-state-drives/sas.html
3. Mirrored OCZ SATA II 3.5" drives: http://www.ocztechnology.com/products/solid_state_drives/sata_3_5_solid_state_drives

I'm looking for comments on the above drives, or recommendations on other affordable drives for running in a pure SSD pool. Also, what drives do you run as a pure SSD pool?

Thanks,
Karl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
Given the abysmal performance, I have to assume there is a significant number of overhead reads or writes to maintain the DDT for each actual block write operation. Something I didn't mention in the other email: I also tracked iostat throughout the whole operation. It's all writes (or at least 99.9% writes). So I am forced to conclude it's a bunch of small DDT maintenance writes taking place, incurring access-time penalties in addition to each intended single-block access-time penalty.

The nature of the DDT is that it's a bunch of small blocks that tend to be scattered randomly and require maintenance before anything else can happen. This sounds like precisely the usage pattern that benefits from low-latency devices such as SSDs.

I understand the argument: the DDT must be stored in the primary storage pool so you can increase the size of the pool without running out of space to hold the DDT. But that's a fatal design flaw as long as you care about performance. If you don't care about performance, you might as well use the NetApp and do offline dedup; the point of online dedup is to gain performance, so in ZFS you have to care about performance.

There are only two possible ways to fix the problem. Either...

The DDT is changed so it can be stored in a designated sequential area of disk and maintained entirely in RAM, so all DDT reads/writes can be infrequent and serial in nature. This would solve the case of async writes and large sync writes, but would still perform poorly for small sync writes, and it would be memory intensive. Given those limitations, though, it should perform very nicely. ;-)

Or...

The DDT stays as it is now, highly scattered small blocks, and there needs to be an option to store it entirely on dedicated low-latency devices such as SSDs, eliminating the need for the DDT to reside on the slow primary-pool disks. I understand you must consider what happens when the dedicated SSD gets full.
The obvious choices would be either (a) dedup turns off whenever the metadata device is full, or (b) it falls back to writing DDT blocks in the main storage pool. Maybe that could even be configurable behavior. Either way, there's a very realistic use case here. For some people in some situations, it may be acceptable to say: I have a 32G mirrored metadata device; divided by 137 bytes per entry, I can dedup up to a maximum of 218M unique blocks in the pool, and if I estimate a 100K average block size, that means up to 20T of primary pool storage. If I reach that limit, I'll add more metadata device. Both of those options would also go a long way toward eliminating the surprise delete-performance black hole.
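The sizing arithmetic above can be sketched directly. This uses the post's own assumptions (~137 bytes per DDT entry, 100K average block size); exact rounding comes out slightly higher than the ~218M/20T quoted, so treat the results as rough estimates, not precise ZFS internals:

```python
# Rough dedup capacity estimate for a dedicated metadata device.
# Both constants are the post's assumptions, not exact ZFS figures.
ENTRY_SIZE = 137           # approximate bytes per DDT entry
AVG_BLOCK = 100 * 1000     # assumed average block size (100K)

def dedup_capacity(metadata_device_bytes):
    """Return (max unique blocks, max addressable pool bytes)."""
    entries = metadata_device_bytes // ENTRY_SIZE
    return entries, entries * AVG_BLOCK

entries, pool_bytes = dedup_capacity(32 * 2**30)   # 32G mirrored device
print(f"{entries / 1e6:.0f}M unique blocks, ~{pool_bytes / 1e12:.0f}T of pool")
```

If the pool outgrows that limit, the numbers tell you how much metadata device to add: capacity scales linearly with device size.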
Re: [zfs-discuss] Pure SSD Pool
> I have a dual xeon 64GB 1U server with two free 3.5 drive slots. I also
> have a free PCI-E slot. I'm going to run a postgress database with a
> business intelligence application. The database size is not really set.
> It will be between 250-500GB running on Solaris 10 or b134.

Running business-critical stuff on b134 isn't what I'd recommend - no updates anymore. Either use S10, or S11 Express, or perhaps OpenIndiana.

> My storage choices are 1. OCZ Z-Drive R2 which fits in a 1U PCI slot.
> The docs say it has a raid controller built in. I don't know if that
> can be disabled. 2. Mirrored OCZ Talos C Series 3.5 SAS drives.
> 3. Mirrored OCZ SATA II 3.5 drives. I'm looking for comments on the
> above drives or recommendations on other affordable drives running in
> a pure SSD pool. Also what drives do you run as a pure SSD pool?

Most drives should work well for a pure SSD pool. I have a PostgreSQL database on a Linux box on a mirrored set of C300s. AFAIK ZFS doesn't yet support TRIM, so that can be an issue; apart from that, it should work well.

Best regards,
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> When you read back duplicate data that was previously written with
> dedup, then you get a lot more cache hits, and as a result, the reads
> go faster. Unfortunately these gains are diminished... I don't know by
> what... But you only have about 2x to 4x performance gain reading
> previously dedup'd data, as compared to reading the same data which
> was never dedup'd. Even when repeatedly reading the same file which is
> 100% duplicate data (created by dd from /dev/zero) so all the data is
> 100% in cache... I still see only 2x to 4x performance gain with dedup.

For what it's worth: I also repeated this without dedup. I created a large file (17G, just big enough that it will fit entirely in my ARC), rebooted, and timed reading it. Now it's entirely in cache, so I timed reading it again. When it's not cached, of course, the read time was equal to the original write time. When it's cached, it goes 4x faster.

Perhaps this is only because I'm testing on a machine that has super fast storage: 11 striped SAS disks yielding 8Gbit/sec, as compared to all-RAM, which yielded 31.2Gbit/sec. It seems in this case RAM is only 4x faster than the storage itself... I would have expected a couple of orders of magnitude, so perhaps my expectations are off, or the ARC itself simply incurs overhead. Either way, dedup is not to blame for obtaining merely a 2x to 4x performance gain over the non-dedup equivalent.
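The "only 4x" observation follows directly from the two measured throughput figures quoted above; a quick check of the arithmetic:

```python
# Throughput figures as measured in the post (Gbit/sec):
disk_gbit = 8.0    # 11 striped SAS disks, uncached read
ram_gbit = 31.2    # same file read entirely from the ARC

# The cached read can only be as much faster as the memory path allows,
# so the best-case speedup over fast storage is the ratio of the two:
speedup = ram_gbit / disk_gbit
print(f"cached read is {speedup:.1f}x faster")   # ~3.9x, i.e. "4x"
```

With storage this fast, a 2x-4x gain from caching (dedup'd or not) is simply the ceiling the memory path imposes, not a dedup-specific loss.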
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
> When it's not cached, of course the read time was equal to the original
> write time. When it's cached, it goes 4x faster. Perhaps this is only
> because I'm testing on a machine that has super fast storage... 11
> striped SAS disks yielding 8Gbit/sec as compared to all-RAM which
> yielded 31.2Gbit/sec. [...] Either way, dedup is not to blame for
> obtaining merely 2x or 4x performance gain over the non-dedup
> equivalent.

Could you test with some SSD SLOGs and see how well or badly the system performs?

Best regards,
roy
Re: [zfs-discuss] Replacement disks for Sun X4500
> Oh - and as a final point - if you are planning to run Solaris on this
> box, make sure they are not the 4KB-sector disks, as at least in my
> experience, their performance with ZFS is profoundly bad. Particularly
> with all the metadata update stuff...

The Hitachi Deskstar uses 512-byte sectors.

Best regards,
roy
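As context for the warning above: "Advanced Format" drives have 4KB physical sectors but usually emulate 512-byte logical sectors, so any write that isn't aligned to a full physical sector forces the drive into a read-modify-write cycle - exactly what a metadata-heavy ZFS workload triggers constantly. A minimal, purely illustrative alignment check:

```python
SECTOR = 4096  # physical sector size of a 4KB "Advanced Format" disk

def rmw_needed(offset, length, sector=SECTOR):
    """True if a write at (offset, length) is not aligned to whole
    physical sectors, forcing the drive to read-modify-write."""
    return offset % sector != 0 or length % sector != 0

print(rmw_needed(0, 512))    # True:  512B metadata write on a 4K sector
print(rmw_needed(0, 4096))   # False: aligned full-sector write
```

On true 512-byte-sector drives like the Deskstar mentioned, every 512B write maps to a whole sector, so the penalty never arises.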
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
> From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
> Sent: Saturday, July 09, 2011 2:33 PM
>
> Could you test with some SSD SLOGs and see how well or bad the system
> performs?

These are all async writes, so the slog won't be used. They are async writes with a single fflush() and fsync() at the end, to ensure system buffering is not skewing the results.
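The pattern described here (buffered async writes, with one flush/fsync pair at the end so nothing is still sitting in OS buffers when the clock stops) looks roughly like the following. This is a hypothetical Python stand-in for the actual test, not the benchmark itself:

```python
import os
import time

def timed_async_write(path, block=b"\0" * 128 * 1024, count=8):
    """Write blocks through the normal (async) buffered path, then
    flush and fsync once at the end so buffered-but-unwritten data
    can't make the measured time look faster than the storage."""
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)        # async: lands in stdio/OS buffers
        f.flush()                 # push user-space buffers to the kernel
        os.fsync(f.fileno())      # force everything to stable storage
    return time.time() - start

elapsed = timed_async_write("/tmp/zfs_write_test.bin")
```

Because the sync happens only once at the very end, none of the individual writes are synchronous, which is why a SLOG (which accelerates only synchronous writes) would sit idle for this workload.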
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
> > From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
> > Sent: Saturday, July 09, 2011 2:33 PM
> >
> > Could you test with some SSD SLOGs and see how well or bad the
> > system performs?
>
> These are all async writes, so slog won't be used. Async writes that
> have a single fflush() and fsync() at the end to ensure system
> buffering is not skewing the results.

Sorry, my bad - I meant L2ARC, to help buffer the DDT.

Best regards,
roy