Re: [zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver

2011-08-07 Thread Roy Sigurd Karlsbakk
>> 1) is this a good idea?
>>
>> 2) any of you are running vserver guests on iSCSI targets? Happy
>> with it?
>
> Yes, we have been using iSCSI to hold vserver guests for a couple of
> years now and are generally unhappy with it. Besides our general
> distress at Nexenta, there is the constraint of the Linux file system.
>
> Someone please correct me if I'm wrong because this is a big problem
> for us. As far as I know, Linux file system block size cannot exceed
> the maximum memory page size and is limited to no more than 4KB. iSCSI
> appears to acknowledge every individual block that is sent. That means
> the most data one can stream without an ACK is 4KB. That means the
> throughput is limited by the latency of the network rather than the
> bandwidth.

Even if Linux filesystems generally stick to a block size of 4kB, that doesn't
mean all transfers are at most 4kB. If that were the case, Linux would be
quite useless as a server. I/O operations are queued, and if, for instance, a
read() call requests 8MB, that is issued as a single large request.
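
To illustrate (a sketch; the path is made up, any file on an iSCSI-backed
mount would do), a single read() hands the whole request to the kernel at
once:

    import os

    path = "/mnt/iscsi/bigfile"   # hypothetical iSCSI-backed mount

    fd = os.open(path, os.O_RDONLY)
    try:
        # One read() call for 8MB; the kernel queues this as large I/O
        # requests rather than 2048 individually acknowledged 4kB blocks.
        # (read() may return fewer bytes; good enough for illustration.)
        buf = os.read(fd, 8 * 1024 * 1024)
    finally:
        os.close(fd)

    print("read %d bytes with a single call" % len(buf))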

> Nexenta is built on OpenSolaris and has a significantly higher
> internal network latency than Linux. It is not unusual for us to see
> round trip times from host to Nexenta well upwards of 100us
> (micro-seconds). Let's say it was even as good as 100us. One could
> send up to 10,000 packets per second * 4KB = 40MBps maximum
> throughput for any one iSCSI conversation. That's pretty lousy disk
> throughput.

That's why, back in 1992, the sliding window protocol was created 
(http://tools.ietf.org/html/rfc1323), so that a peer won't wait for a TCP ACK 
before resuming operation. 

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver

2011-08-07 Thread Carson Gaspar

On 8/7/11 6:36 AM, Roy Sigurd Karlsbakk wrote:


> That's why, back in 1992, the sliding window protocol was created
> (http://tools.ietf.org/html/rfc1323), so that a peer won't wait for a
> TCP ACK before resuming operation.


It was part of TCP _long_ before that (it was never as stupid as XMODEM
;-) ). That RFC specifies window scaling to support window sizes larger
than 2^16 bytes, useful for large bandwidth*delay product networks.
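
To put numbers on that: the window needed to keep a link busy is bandwidth
times RTT, and once that exceeds the unscaled 64kB TCP window, RFC 1323
scaling starts to matter. A quick sketch (the 1Gbit/s link and the RTTs
are illustrative):

    # Bandwidth*delay product: bytes that must be in flight (unACKed)
    # to keep the pipe full. Above 65535 bytes the classic 16-bit TCP
    # window is too small and RFC 1323 window scaling is required.
    LINK_BPS = 1e9  # illustrative 1 Gbit/s link

    def bdp_bytes(bandwidth_bps, rtt_s):
        return bandwidth_bps / 8 * rtt_s

    for rtt_us in (100, 1000, 10000):
        bdp = bdp_bytes(LINK_BPS, rtt_us / 1e6)
        need = "yes" if bdp > 65535 else "no"
        print("RTT %5d us: BDP %7.1f kB, scaling needed: %s"
              % (rtt_us, bdp / 1024, need))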


--
Carson



Re: [zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver

2011-08-07 Thread Carson Gaspar



> As far as I know, Linux file system block size cannot exceed the
> maximum memory page size and is limited to no more than 4KB. iSCSI
> appears to acknowledge every individual block that is sent. That means
> the most data one can stream without an ACK is 4KB. That means the
> throughput is limited by the latency of the network rather than the
> bandwidth.


I am _far_ from an iSCSI expert, but the above should not be true, as it 
isn't true for other SCSI flavours. If your initiator supports command 
queuing, it should happily write multiple blocks before stalling on a 
response.
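
A toy model of why queue depth matters, reusing the 4KB blocks and 100us
round trip from this thread (purely illustrative figures):

    # Latency-bound throughput: with N commands outstanding per round
    # trip, throughput scales with N until link bandwidth takes over.
    BLOCK = 4096    # bytes per command
    RTT = 100e-6    # seconds

    for depth in (1, 8, 32, 128):
        mb_s = BLOCK * depth / RTT / 1e6
        print("queue depth %3d: ~%6.1f MB/s" % (depth, mb_s))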


You can also enable write cache support, but I don't recall if it's 
necessary to do so on the initiator, the target, or both.


--
Carson



Re: [zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver

2011-08-06 Thread Eugen Leitl
----- Forwarded message from John A. Sullivan III <jsulli...@opensourcedevel.com> -----

From: John A. Sullivan III <jsulli...@opensourcedevel.com>
Date: Sat, 06 Aug 2011 16:30:04 -0400
To: vser...@list.linux-vserver.org
Subject: Re: [vserver] hybrid zfs pools as iSCSI targets for vserver
Reply-To: vser...@list.linux-vserver.org
X-Mailer: Evolution 2.30.3 

On Sat, 2011-08-06 at 21:40 +0200, Eugen Leitl wrote:
> I've recently figured out how to make low-end hardware (e.g. HP N36L)
> work well as zfs hybrid pools. The system (Nexenta Core + napp-it)
> exports the zfs pools as CIFS, NFS or iSCSI (Comstar).
>
> 1) is this a good idea?
>
> 2) any of you are running vserver guests on iSCSI targets? Happy with it?
>
Yes, we have been using iSCSI to hold vserver guests for a couple of
years now and are generally unhappy with it.  Besides our general
distress at Nexenta, there is the constraint of the Linux file system.

Someone please correct me if I'm wrong because this is a big problem for
us.  As far as I know, Linux file system block size cannot exceed the
maximum memory page size and is limited to no more than 4KB.  iSCSI
appears to acknowledge every individual block that is sent. That means
the most data one can stream without an ACK is 4KB. That means the
throughput is limited by the latency of the network rather than the
bandwidth.

Nexenta is built on OpenSolaris and has a significantly higher internal
network latency than Linux.  It is not unusual for us to see round trip
times from host to Nexenta well upwards of 100us (micro-seconds).  Let's
say it was even as good as 100us.  One could send up to 10,000 packets
per second * 4KB = 40MBps maximum throughput for any one iSCSI
conversation.  That's pretty lousy disk throughput.
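
(Spelling that arithmetic out, as a sketch that ignores TCP windowing and
SCSI command queuing:)

    # One 4kB block per 100us round trip, if each block really did
    # wait for its own ACK before the next could be sent.
    block = 4 * 1024   # bytes
    rtt = 100e-6       # seconds
    print("%.0f MB/s" % (block / rtt / 1e6))   # ~41 MB/s, the figure above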

Other than that, iSCSI is fabulous because it appears as a local block
device.  We typically mount a large data volume into the VServer host
and then mount --rbind it into the guest file systems.  A magically
well-working file server without an actual file server or the hassles
of a network file system.  Our single complaint, other than about
Nexenta themselves, is the latency-constrained throughput.

Anyone have a way around that? Thanks - John

----- End forwarded message -----
-- 
Eugen* Leitl <leitl> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Re: [zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver

2011-08-06 Thread Eugen Leitl
----- Forwarded message from Gordan Bobic <gor...@bobich.net> -----

From: Gordan Bobic <gor...@bobich.net>
Date: Sat, 06 Aug 2011 21:37:30 +0100
To: vser...@list.linux-vserver.org
Subject: Re: [vserver] hybrid zfs pools as iSCSI targets for vserver
Reply-To: vser...@list.linux-vserver.org
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) 
Gecko/20110621 Red Hat/3.1.11-2.el6_1 Lightning/1.0b2 Thunderbird/3.1.11

On 08/06/2011 09:30 PM, John A. Sullivan III wrote:
> On Sat, 2011-08-06 at 21:40 +0200, Eugen Leitl wrote:
>> I've recently figured out how to make low-end hardware (e.g. HP N36L)
>> work well as zfs hybrid pools. The system (Nexenta Core + napp-it)
>> exports the zfs pools as CIFS, NFS or iSCSI (Comstar).
>>
>> 1) is this a good idea?
>>
>> 2) any of you are running vserver guests on iSCSI targets? Happy with it?
>
> Yes, we have been using iSCSI to hold vserver guests for a couple of
> years now and are generally unhappy with it.  Besides our general
> distress at Nexenta, there is the constraint of the Linux file system.
>
> Someone please correct me if I'm wrong because this is a big problem for
> us.  As far as I know, Linux file system block size cannot exceed the
> maximum memory page size and is limited to no more than 4KB.

I'm pretty sure it is _only_ limited by the memory page size; as I recall,
8KB blocks were available on SPARC (which uses 8KB pages).
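
A quick way to check both numbers on a given box (any mounted path works):

    import os

    # Page size vs. filesystem block size. On x86 Linux the page is
    # 4096 bytes; SPARC uses 8kB pages, hence the 8kB blocks.
    page = os.sysconf("SC_PAGE_SIZE")
    fs = os.statvfs("/")
    print("page size:     %d bytes" % page)
    print("fs block size: %d bytes" % fs.f_bsize)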

> iSCSI
> appears to acknowledge every individual block that is sent. That means
> the most data one can stream without an ACK is 4KB. That means the
> throughput is limited by the latency of the network rather than the
> bandwidth.

Hmm, buffering in the FS shouldn't be dependent on the block layer
immediately acknowledging unless you are issuing fsync()/barriers. What FS
are you using on top of the iSCSI block device, and is your application
fsync()-heavy?
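
A crude way to probe that (the path is a placeholder for an iSCSI-backed
mount; timings are only indicative):

    import os, time

    def write_4k(path, count, sync_each):
        buf = b"\0" * 4096
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        t0 = time.monotonic()
        for _ in range(count):
            os.write(fd, buf)
            if sync_each:
                os.fsync(fd)   # force the block layer to acknowledge now
        os.close(fd)
        return time.monotonic() - t0

    for sync in (False, True):
        secs = write_4k("/mnt/iscsi/probe.dat", 1000, sync)
        print("fsync each write=%s: %.1f MB/s"
              % (sync, 4096 * 1000 / secs / 1e6))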

Gordan

----- End forwarded message -----
-- 
Eugen* Leitl <leitl> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE