Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-09 Thread Keith Bierman

On Oct 8, 2008, at 4:27 PM, Jim Dunham wrote:
> ... a single Solaris node cannot be both
> the primary and secondary node.
>
> If one wants this type of mirror functionality on a single node, use
> host based or controller based mirroring software.


If one is running multiple zones, couldn't you fool AVS into thinking  
that one zone was the primary and the other the secondary?
-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
 Copyright 2008






Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Brian Hechinger
On Wed, Oct 08, 2008 at 06:27:51PM -0400, Jim Dunham wrote:
> 
> If one wants this type of mirror functionality on a single node, use  
> host based or controller based mirroring software.

Is there mirroring software that can do async copies to a mirror?

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Jim Dunham
Joe,

> Brian Hechinger
>> On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
>>>
>>> I wonder if an AVS-replicated storage device on the
>> backends would be appropriate?
>>>
>>> write -> ZFS-mirrored slog -> ramdisk -AVS-> physical disk
>>>   \
>>>+-iscsi-> ramdisk -AVS-> physical disk
>>>
>>> You'd get the continuous replication of the ramdisk to
>> physical drive (and perhaps automagic recovery on reboot) but
>> not pay the synchronous write to remote physical disk penalty
>>
>> It looks like the answer is no.
>>
>> [EMAIL PROTECTED] sudo sndradm -e localhost /dev/rramdisk/avstest1 \
>>   /dev/zvol/rdsk/SYS0/bitmap1 wintermute /dev/zvol/dsk/SYS0/avstest2 \
>>   /dev/zvol/rdsk/SYS0/bitmap2 ip async
>> Enable Remote Mirror? (Y/N) [N]: y
>> sndradm: Error: both localhost and wintermute are local
>
> I've not worked with AVS other than looking at the basic concepts,  
> but to me this looks like a dont-shoot-yourself-in-the-foot critical  
> warning rather than an actual functionality restriction.  Is there a  
> -force option to override this normally quite reasonable sanity check?

This is a hard restriction, with no override.  AVS, or more specifically
the remote replication component called SNDR, needs to know which end of
the replica is the SNDR primary node and which end is the SNDR secondary
node.  Since SNDR needs this information to know which direction to
replicate data, a single Solaris node cannot be both the primary and
secondary node.

If one wants this type of mirror functionality on a single node, use  
host based or controller based mirroring software.
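
For example, with Solaris Volume Manager the host-based route would look
roughly like this (a sketch only; device names are placeholders, and I have
not verified here whether SVM will accept a ramdisk as a submirror component):

  metadb -a -f -c 3 c1t0d0s7    # one-time: create state database replicas
  metainit d21 1 1 c1t0d0s0     # submirror on the fast device
  metainit d22 1 1 c1t1d0s0     # submirror on the physical disk
  metainit d20 -m d21           # create the mirror from the first submirror
  metattach d20 d22             # attach the second submirror (it will resync)

Note that once attached, an SVM mirror writes to both submirrors
synchronously.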

>
>
> --Joe

Jim Dunham

Storage Platform Software Group
Sun Microsystems, Inc.


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Chris Greer
I was using EMC's iorate for the comparison.

ftp://ftp.emc.com/pub/symm3000/iorate/

I had 4 processes running on the pool in parallel doing 4K sequential writes.

I've also been playing around with a few other benchmark tools (I just had
results from other storage tests using this same iorate setup).


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Brian Hechinger
On Wed, Oct 08, 2008 at 08:50:57AM -0400, Moore, Joe wrote:
> 
> I've not worked with AVS other than looking at the basic concepts, but to me 
> this looks like a dont-shoot-yourself-in-the-foot critical warning rather 
> than an actual functionality restriction.  Is there a -force option to 
> override this normally quite reasonable sanity check?

There is no force option that I can see, but then I've also never worked with AVS.

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Wilkinson, Alex

On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote: 

>The big thing here is I ended up getting a MASSIVE boost in
>performance even with the overhead of the 1GB link, and iSCSI.
>The iorate test I was using went from 3073 IOPS on 90% sequential
>writes to 23953 IOPS with the RAM slog added.  The service time 
>was also significantly better than the physical disk.

Curious, what tool did you use to benchmark your IOPS?

 -aW





Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Moore, Joe
Brian Hechinger
> On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
> >
> > I wonder if an AVS-replicated storage device on the
> backends would be appropriate?
> >
> > write -> ZFS-mirrored slog -> ramdisk -AVS-> physical disk
> >\
> > +-iscsi-> ramdisk -AVS-> physical disk
> >
> > You'd get the continuous replication of the ramdisk to
> physical drive (and perhaps automagic recovery on reboot) but
> not pay the synchronous write to remote physical disk penalty
>
> It looks like the answer is no.
>
> [EMAIL PROTECTED] sudo sndradm -e localhost /dev/rramdisk/avstest1 \
>   /dev/zvol/rdsk/SYS0/bitmap1 wintermute /dev/zvol/dsk/SYS0/avstest2 \
>   /dev/zvol/rdsk/SYS0/bitmap2 ip async
> Enable Remote Mirror? (Y/N) [N]: y
> sndradm: Error: both localhost and wintermute are local

I've not worked with AVS other than looking at the basic concepts, but to me 
this looks like a dont-shoot-yourself-in-the-foot critical warning rather than 
an actual functionality restriction.  Is there a -force option to override this 
normally quite reasonable sanity check?

--Joe


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-07 Thread Robert Milkowski
Hello Nicolas,

Monday, October 6, 2008, 10:51:58 PM, you wrote:

NW> I'm pretty sure that local RAM beats remote-anything, no matter what the
NW> "anything" (as long as it isn't RAM) and what the protocol to get to it
NW> (as long as it isn't a normal backplane).  (You could claim with NUMA
NW> memory can be remote, so let's say that for a reasonable value of
NW> "remote.")

IIRC the total throughput to remote memory over Sun Fire Link could be
faster than to local memory... just a funny thing I remembered.

Not that it is relevant here.



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Ross
> Or would they?  A box dedicated to being a RAM based
> slog is going to be
> faster than any SSD would be.  Especially if you make
> the expensive jump
> to 8Gb FC.

Not necessarily.  While this has some advantages in terms of price & 
performance, at ~$2400 the 80GB ioDrive would give it a run for its money.  
600MB/s and enough capacity to (hopefully) use it as an L2ARC as well.

When you consider that you need at least two machines, UPSes, and the supporting 
infrastructure for this idea, the ioDrive really isn't far off for cost.


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 01:13:40AM -0700, Ross wrote:
> 
> It's also worth bearing in mind that you can have multiple mirrors.  I don't 
> know what effect that will have on the performance, but it's an easy way to 
> boost the reliability even further.  I think this idea configured on a set of 
> 2-3 servers, with separate UPS' for each, and a script that can export the 
> pool and save the ramdrive when the power fails, is potentially a very neat 
> little system.

The more slog devices, the better. :)

If the host using the slogs could trigger the shutdown, that would be even
better I think.  Once we know the zpool is exported, the slogs have just
entered a nicely consistent state at which point the copies could be made.

It would also be nice if the host using these slogs could wait until enough of
them are online before attempting to mount its pool.  That shouldn't be too
hard, nothing more than some startup script modifications.
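
Something along these lines in a startup script would probably do it (just a
sketch; the pool name and device paths are made up):

  #!/bin/sh
  # Wait for both iSCSI slog devices to show up before importing the pool.
  POOL=tank
  SLOGS="/dev/dsk/c2t0d0s0 /dev/dsk/c3t0d0s0"

  while :; do
          ok=1
          for d in $SLOGS; do
                  [ -r "$d" ] || ok=0
          done
          [ $ok -eq 1 ] && break
          sleep 5
  done

  zpool import $POOL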

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
> 
> I wonder if an AVS-replicated storage device on the backends would be 
> appropriate?
> 
> write -> ZFS-mirrored slog -> ramdisk -AVS-> physical disk
>\
> +-iscsi-> ramdisk -AVS-> physical disk
> 
> You'd get the continuous replication of the ramdisk to physical drive (and 
> perhaps automagic recovery on reboot) but not pay the synchronous write to 
> remote physical disk penalty

It looks like the answer is no.

[EMAIL PROTECTED] sudo sndradm -e localhost /dev/rramdisk/avstest1 \
    /dev/zvol/rdsk/SYS0/bitmap1 wintermute /dev/zvol/dsk/SYS0/avstest2 \
    /dev/zvol/rdsk/SYS0/bitmap2 ip async
Enable Remote Mirror? (Y/N) [N]: y
sndradm: Error: both localhost and wintermute are local

In order to use AVS, it looks like you'd have to replicate between two (or more)
"ZIL Boxes".  Not the worst thing in the world to have to do, but it certainly
complicates things.  Also, you don't get that super fast RAM->Disk sync anymore
as you now have to traverse an IP network to get there.  Still might be an
acceptable way to achieve the goals we are looking at here.

I guess at this point falling back to 'zfs send' run in a continuous loop might
be an alternative.
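
A crude version of that loop might look like this (pool, host, and target
dataset names are made up, and there's no error handling at all):

  #!/bin/sh
  # Continuous incremental replication to a second box via zfs send/receive.
  POOL=tank
  REMOTE=otherbox

  PREV=repl-base
  zfs snapshot $POOL@$PREV
  zfs send $POOL@$PREV | ssh $REMOTE zfs receive -F backup/$POOL

  while :; do
          NEXT=repl-`date +%s`
          zfs snapshot $POOL@$NEXT
          zfs send -i $POOL@$PREV $POOL@$NEXT | \
              ssh $REMOTE zfs receive -F backup/$POOL
          zfs destroy $POOL@$PREV
          PREV=$NEXT
          sleep 10
  done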

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Nicolas Williams
On Mon, Oct 06, 2008 at 05:38:33PM -0400, Brian Hechinger wrote:
> On Sun, Oct 05, 2008 at 11:30:54PM -0500, Nicolas Williams wrote:
> > There have been threads about adding a feature to support slow mirror
> > devices that don't stay synced synchronously.  At least IIRC.  That
> > would help.  But then, if the pool is busy writing then your slow ZIL
> 
> That would definitely be a great help.
> 
> > mirrors would generally be out of sync, thus being of no help in the
> event of a power failure given fast slog devices that don't survive power
> > failure.
> 
> Maybe not, but it would at least save *something* as opposed to not saving
> anything at all.  Still, with enough UPS power, there should be at least
> enough run time left to get the rest of the ZIL to the disk mirror.

Yes.  But again, you get somewhat more protection from writing to a
write-biased SSD: once the ZIL bits are committed you are also protected
against OS panics, not just power failure.

> > Also, using remote devices for a ZIL may defeat the purpose of fast
> > ZILs, even if the actual devices are fast, because what really matters
> > here is latency, and the farther the device, the higher the latency.
> 
> 4Gb FC is slow and high latency?  Tell that to all my local fast disks that
> are attached via FC. :)

The comparison was to RAM, not "local fast disks."

I'm pretty sure that local RAM beats remote-anything, no matter what the
"anything" (as long as it isn't RAM) and what the protocol to get to it
(as long as it isn't a normal backplane).  (You could claim that with NUMA,
memory can be remote, so let's say this holds for a reasonable value of
"remote.")

> > Yes, it's pretty smart.  Add UPS and it's sort of like battery-backed
> > RAM.  You can probably get a good enough reliability rate out of this
> > for your purposes, though actual slog devices would be better if you can
> > afford them.
> 
> Or would they?  A box dedicated to being a RAM based slog is going to be
> faster than any SSD would be.  Especially if you make the expensive jump
> to 8Gb FC.

Unless the SSD had a battery-backed RAM cache, or were based entirely on
battery-backed RAM (but then you have to worry about battery upkeep).

To me this is a performance/reliability trade-off.  RAM slogs mirrored
in cluster + UPS -> very fast, works as well as the UPS.  Write-biased
flash slogs -> fast, no UPS to worry about.

Nico
-- 


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
> 
> I wonder if an AVS-replicated storage device on the backends would be 
> appropriate?
> 
> write -> ZFS-mirrored slog -> ramdisk -AVS-> physical disk
>\
> +-iscsi-> ramdisk -AVS-> physical disk
> 
> You'd get the continuous replication of the ramdisk to physical drive (and 
> perhaps automagic recovery on reboot) but not pay the synchronous write to 
> remote physical disk penalty

Hmmm, AVS *might* just be the ticket here.  Will have to look at that.

> A .5-ms RTT on an ethernet link to the iSCSI disk may be faster than a 9-ms 
> latency on physical media.

Or, if you're looking into what I'm thinking with 4Gb/8Gb FC, it gets even 
better.

> There was a time when it was better to place workstations' swap files on the 
> far side of a 100Mbps ethernet link rather than using the local spinning 
> rust.  Ah, the good old days...

I remember those days.  My SPARCstation LX ran that way.  Not due to speed,
however, but due to a lack of disk space in the LX. ;)

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Sun, Oct 05, 2008 at 11:30:54PM -0500, Nicolas Williams wrote:
> 
> There have been threads about adding a feature to support slow mirror
> devices that don't stay synced synchronously.  At least IIRC.  That
> would help.  But then, if the pool is busy writing then your slow ZIL

That would definitely be a great help.

> mirrors would generally be out of sync, thus being of no help in the
> event of a power failure given fast slog devices that don't survive power
> failure.

Maybe not, but it would at least save *something* as opposed to not saving
anything at all.  Still, with enough UPS power, there should be at least
enough run time left to get the rest of the ZIL to the disk mirror.

> Also, using remote devices for a ZIL may defeat the purpose of fast
> ZILs, even if the actual devices are fast, because what really matters
> here is latency, and the farther the device, the higher the latency.

4Gb FC is slow and high latency?  Tell that to all my local fast disks that
are attached via FC. :)

> Yes, it's pretty smart.  Add UPS and it's sort of like battery-backed
> RAM.  You can probably get a good enough reliability rate out of this
> for your purposes, though actual slog devices would be better if you can
> afford them.

Or would they?  A box dedicated to being a RAM based slog is going to be
faster than any SSD would be.  Especially if you make the expensive jump
to 8Gb FC.

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Moore, Joe
Nicolas Williams wrote
> There have been threads about adding a feature to support slow mirror
> devices that don't stay synced synchronously.  At least IIRC.  That
> would help.  But then, if the pool is busy writing then your slow ZIL
> mirrors would generally be out of sync, thus being of no help in the
> event of a power failure given fast slog devices that don't
> survive power
> failure.

I wonder if an AVS-replicated storage device on the backends would be 
appropriate?

write -> ZFS-mirrored slog -> ramdisk -AVS-> physical disk
                          \
                           +-iscsi-> ramdisk -AVS-> physical disk

You'd get the continuous replication of the ramdisk to physical drive (and 
perhaps automagic recovery on reboot) but not pay the synchronous write to 
remote physical disk penalty.

>
> Also, using remote devices for a ZIL may defeat the purpose of fast
> ZILs, even if the actual devices are fast, because what really matters
> here is latency, and the farther the device, the higher the latency.

A .5-ms RTT on an ethernet link to the iSCSI disk may be faster than a 9-ms 
latency on physical media.

There was a time when it was better to place workstations' swap files on the 
far side of a 100Mbps ethernet link rather than using the local spinning rust.  
Ah, the good old days...

--Joe


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Ross
Very interesting idea, thanks for sharing it.

InfiniBand would definitely be worth looking at for performance, although I 
think you'd need iSER to get the benefits and that might still be a little new: 
 http://www.opensolaris.org/os/project/iser/Release-notes/.  

It's also worth bearing in mind that you can have multiple mirrors.  I don't 
know what effect that will have on the performance, but it's an easy way to 
boost the reliability even further.  I think this idea configured on a set of 
2-3 servers, with separate UPSes for each, and a script that can export the pool 
and save the ramdrive when the power fails, is potentially a very neat little 
system.
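
The power-fail script could be as simple as something like this, hooked to the
UPS "on battery" event (names and paths below are invented):

  #!/bin/sh
  # Flush and export the pool, then save an image of the ramdisk slog
  # to stable storage.
  POOL=tank
  RAMDISK=/dev/rramdisk/slog0
  IMAGE=/var/tmp/slog0.img

  zpool export $POOL && dd if=$RAMDISK of=$IMAGE bs=1024k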


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Nicolas Williams
On Sun, Oct 05, 2008 at 09:07:31PM -0400, Brian Hechinger wrote:
> On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote:
> > I'm not sure I could survive a crash of both nodes, going to try and
> > test some more.
> 
> Ok, so taking my idea above, maybe a pair of 15K SAS disks in those
> boxes so that you could create a backing store.  I wonder what the best
> way to setup realtime sync would be (without making the backing store
> responsible for slowing down the ramdisk, so no zfs mirroring between
> ramdisk and SAS disk, in other words).

There have been threads about adding a feature to support slow mirror
devices that don't stay synced synchronously.  At least IIRC.  That
would help.  But then, if the pool is busy writing then your slow ZIL
mirrors would generally be out of sync, thus being of no help in the
event of a power failure, given fast slog devices that don't survive power
failure.

Also, using remote devices for a ZIL may defeat the purpose of fast
ZILs, even if the actual devices are fast, because what really matters
here is latency, and the farther the device, the higher the latency.

> > So is this idea completely crazy?
> 
> I don't think so, no. ;)

Yes, it's pretty smart.  Add UPS and it's sort of like battery-backed
RAM.  You can probably get a good enough reliability rate out of this
for your purposes, though actual slog devices would be better if you can
afford them.

Nico
-- 


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Adam Leventhal
> So what are the downsides to this?  If both nodes were to crash and  
> I used the same technique to recreate the ramdisk I would lose any  
> transactions in the slog at the time of the crash, but the physical  
> disk image is still in a consistent state right (just not from my  
> apps point of view)?

You would lose transactions, but the pool would still reflect a consistent
state.

> So is this idea completely crazy?


On the contrary; it's very clever.

Adam

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl



Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Brian Hechinger
On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote:
> 
> So I tried this experiment this week...
> On each host (OpenSolaris 2008.05), I created an 8GB ramdisk with ramdiskadm. 
>  I shared this ramdisk on each host via the iscsi target and initiator over a 
> 1GB crossconnect cable (jumbo frames enabled).  I added these as mirrored 
> slog devices in a zpool.

Very interesting.  This also gives me an idea.  Using COMSTAR you could
build any number of RAM based slog devices.  They wouldn't need to be
anything amazing, just a bunch of RAM and a supported FC card (or two).
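
Roughly what I have in mind on such a box (a sketch only; it assumes a
COMSTAR-capable build and an HBA that can run in target mode, and the names
are placeholders):

  svcadm enable stmf                      # start the COMSTAR framework
  ramdiskadm -a slog0 8g                  # the RAM-backed device
  sbdadm create-lu /dev/rramdisk/slog0    # turn it into a SCSI logical unit
  stmfadm add-view 600144f0...            # expose it, using the GUID sbdadm printed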

> I'm not sure I could survive a crash of both nodes, going to try and test 
> some more.

Ok, so taking my idea above, maybe a pair of 15K SAS disks in those
boxes so that you could create a backing store.  I wonder what the best
way to set up realtime sync would be (without making the backing store
responsible for slowing down the ramdisk, so no ZFS mirroring between
ramdisk and SAS disk, in other words).

> So is this idea completely crazy?

I don't think so, no. ;)

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)


[zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-04 Thread Chris Greer
I currently have a traditional NFS cluster hardware setup in the lab (2 hosts 
with FC-attached JBOD storage) but no cluster software yet.  I've been wanting 
to try out the separate ZIL to see what it might do to boost performance.  My 
problem is that I don't have any cool SSD devices, much less ones that could 
be shared between two hosts.  Commercial arrays have custom hardware with 
mirrored cache, which got me thinking about a way to do this with regular 
hardware.

So I tried this experiment this week...
On each host (OpenSolaris 2008.05), I created an 8GB ramdisk with ramdiskadm.  
I shared this ramdisk on each host via the iSCSI target and initiator over a 
1Gb crossconnect cable (jumbo frames enabled).  I added these as mirrored slog 
devices in a zpool.
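
For reference, the setup on each node was roughly along these lines (the
commands below are illustrative rather than a cut-and-paste; names, sizes, and
addresses are placeholders):

  ramdiskadm -a slog0 8g                               # 8GB ramdisk
  iscsitadm create target -b /dev/ramdisk/slog0 slog0  # export it via iSCSI

  # On the node that owns the pool, discover the peer's ramdisk...
  iscsiadm add discovery-address 192.168.0.2:3260
  iscsiadm modify discovery --sendtargets enable
  devfsadm -i iscsi

  # ...then add the local and the remote ramdisk as a mirrored slog
  # (the remote one shows up as an ordinary cXtYdZ disk).
  zpool add tank log mirror /dev/ramdisk/slog0 c2t1d0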

The end result was a pool that I could import and export between hosts, and it 
can survive one of the hosts dying.  I also copied a dd image of my ramdisk 
device to stable storage with the pool exported (thus flushed), which allowed 
me to shut the entire cluster down, power one node up, recreate the ramdisk, 
dd the image back, and re-import the pool.
I'm not sure I could survive a crash of both nodes; going to try and test some 
more.
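
The save/restore part amounted to something like this (again, illustrative
paths and names):

  # With the pool exported, save the slog image to stable storage:
  dd if=/dev/rramdisk/slog0 of=/var/tmp/slog0.img bs=1024k

  # After a full shutdown: recreate the ramdisk, put the image back,
  # and re-import the pool:
  ramdiskadm -a slog0 8g
  dd if=/var/tmp/slog0.img of=/dev/rramdisk/slog0 bs=1024k
  zpool import tank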

The big thing here is I ended up getting a MASSIVE boost in performance even 
with the overhead of the 1Gb link and iSCSI.  The iorate test I was using 
went from 3073 IOPS on 90% sequential writes to 23953 IOPS with the RAM slog 
added.  The service time was also significantly better than with the physical 
disk.  It also boosted the reads significantly, and I'm guessing this is 
because updating the access times on the files was completely cached.

So what are the downsides to this?  If both nodes were to crash and I used the 
same technique to recreate the ramdisk, I would lose any transactions in the 
slog at the time of the crash, but the physical disk image would still be in a 
consistent state, right (just not from my app's point of view)?  Anyone have 
any idea what difference InfiniBand might make for the crossconnect?  In some 
tests, I did completely saturate the 1Gb link between the boxes.

So is this idea completely crazy?  It also brings up questions of correctly 
sizing your slog in relation to the physical disks on the backend.  It looks 
like if the ZIL can handle significantly more I/O than the physical disks, the 
effect will be short-lived, as the system has to slow things down while it 
spends more time flushing from the slog to physical disk.  The 8GB looked like 
overkill in my case, because in a lot of the tests it drove the individual 
disks in the system to 100% and was causing service times on the physical 
disks in the 900-1000ms range (although my app never saw that because of the 
slog).