Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-26 Thread Avi Kivity
On 02/26/2010 04:36 PM, Anthony Liguori wrote: On 02/26/2010 02:47 AM, Avi Kivity wrote: qcow2 is still not fully asynchronous. All the other format drivers (except raw) are fully synchronous. If we had a threaded infrastructure, we could convert them all in a day. As it is, you can only us

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-26 Thread Anthony Liguori
On 02/26/2010 02:47 AM, Avi Kivity wrote: qcow2 is still not fully asynchronous. All the other format drivers (except raw) are fully synchronous. If we had a threaded infrastructure, we could convert them all in a day. As it is, you can only use the other block format drivers in 'qemu-img co

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-26 Thread Avi Kivity
On 02/25/2010 09:55 PM, Anthony Liguori wrote: On 02/25/2010 11:33 AM, Avi Kivity wrote: On 02/25/2010 07:15 PM, Anthony Liguori wrote: I agree. Further, once we fine-grain device threading, the iothread essentially disappears and is replaced by device-specific threads. There's no "idle" any

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Anthony Liguori
On 02/25/2010 11:33 AM, Avi Kivity wrote: On 02/25/2010 07:15 PM, Anthony Liguori wrote: I agree. Further, once we fine-grain device threading, the iothread essentially disappears and is replaced by device-specific threads. There's no "idle" anymore. That's a nice idea, but how is io dispa

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread malc
On Thu, 25 Feb 2010, Avi Kivity wrote: > On 02/25/2010 07:15 PM, Anthony Liguori wrote: > > > I agree. Further, once we fine-grain device threading, the iothread > > > essentially disappears and is replaced by device-specific threads. > > > There's no "idle" anymore. > > > > > > That's a nice i

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Avi Kivity
On 02/25/2010 07:15 PM, Anthony Liguori wrote: I agree. Further, once we fine-grain device threading, the iothread essentially disappears and is replaced by device-specific threads. There's no "idle" anymore. That's a nice idea, but how is io dispatch handled? Is everything synchronous or

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Anthony Liguori
On 02/25/2010 11:11 AM, Avi Kivity wrote: On 02/25/2010 05:06 PM, Paul Brook wrote: Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to happen, and should never be used for anything. Idle bottom halves make considerably more sense than the normal bottom halves. The fact t

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Avi Kivity
On 02/25/2010 05:06 PM, Paul Brook wrote: Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to happen, and should never be used for anything. Idle bottom halves make considerably more sense than the normal bottom halves. The fact that rescheduling a bottom half withi

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Paul Brook
> Very simply, without idle bottom halves, there's no way to implement > polling with the main loop. If we dropped idle bottom halves, we would > have to add explicit polling back to the main loop. > > How would you implement polling? AFAICS any sort of polling is by definition time based so use

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Anthony Liguori
On 02/25/2010 09:06 AM, Paul Brook wrote: Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to happen, and should never be used for anything. Idle bottom halves make considerably more sense than the normal bottom halves. The fact that rescheduling a bottom half withi

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-25 Thread Paul Brook
> > Idle bottom halves (i.e. qemu_bh_schedule_idle) are just bugs waiting to > > happen, and should never be used for anything. > > Idle bottom halves make considerable more sense than the normal bottom > halves. > > The fact that rescheduling a bottom half within a bottom half results in > an in

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-24 Thread Anthony Liguori
On 02/23/2010 08:58 PM, Paul Brook wrote: Bottom halves are run at the very end of the event loop which means that they're guaranteed to be the last thing run. idle bottom halves can be rescheduled without causing an infinite loop and do not affect the select timeout (which normal bottom halves

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-02-23 Thread Paul Brook
> Bottom halves are run at the very end of the event loop which means that > they're guaranteed to be the last thing run. idle bottom halves can be > rescheduled without causing an infinite loop and do not affect the > select timeout (which normal bottom halves do). Idle bottom halves (i.e. qemu_

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 09:35 AM, Avi Kivity wrote: On 01/11/2010 05:32 PM, Anthony Liguori wrote: On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 05:32 PM, Anthony Liguori wrote: On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations,

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 09:31 AM, Avi Kivity wrote: On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 05:22 PM, Anthony Liguori wrote: Based on our experiences with virtio-net, what I'd suggest is to make a lot of tunable options (ring size, various tx mitigation schemes, timeout durations, etc) and then we can do some deep performance studies to see how things interact with eac

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 05:13 PM, Anthony Liguori wrote: On 01/11/2010 08:46 AM, Avi Kivity wrote: On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specificall

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 09:19 AM, Avi Kivity wrote: OTOH, if we aggressively poll the ring when we have an opportunity to, there's very little down side to that and it addresses the serialization problem. But we can't guarantee that we'll get those opportunities, so it doesn't address the problem in a

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 08:46 AM, Avi Kivity wrote: On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 04:37 PM, Anthony Liguori wrote: That has the downside of bouncing a cache line on unrelated exits. The read and write sides of the ring are widely separated in physical memory specifically to avoid cache line bouncing. I meant, exits on random vcpus will cause the cacheline c

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 08:29 AM, Avi Kivity wrote: On 01/11/2010 03:49 PM, Anthony Liguori wrote: So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch: I'd suggest tha

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 03:49 PM, Anthony Liguori wrote: So instead of disabling notify while requests are active we might want to only disable it while we are inside virtio_blk_handle_output. Something like the following minimally tested patch: I'd suggest that we get even more aggressive and install a

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 03:42 PM, Christoph Hellwig wrote: On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote: The patch has potential to reduce performance on volumes with multiple spindles. Consider two processes issuing sequential reads into a RAID array. With this patch, the reads will b

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 07:47 AM, Christoph Hellwig wrote: On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote: As Dor points out, the call to virtio_blk_handle_output() wants to be before the test for pending, so we scan the ring as early as possible It could cause a race window where w

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Anthony Liguori
On 01/11/2010 07:42 AM, Christoph Hellwig wrote: On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote: The patch has potential to reduce performance on volumes with multiple spindles. Consider two processes issuing sequential reads into a RAID array. With this patch, the reads will b

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote: > As Dor points out, the call to virtio_blk_handle_output() wants to be > before the test for pending, so we scan the ring as early as possible It could cause a race window where we add an entry to the ring after we run virtio_blk_handle

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 10:30:53AM +0200, Avi Kivity wrote: > The patch has potential to reduce performance on volumes with multiple > spindles. Consider two processes issuing sequential reads into a RAID > array. With this patch, the reads will be executed sequentially rather > than in parall

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 03:13:53PM +0200, Avi Kivity wrote: > As Dor points out, the call to virtio_blk_handle_output() wants to be > before the test for pending, so we scan the ring as early as possible I just reposted the patch in a way that it applies to share the work I did when starting to r

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 03:11 PM, Christoph Hellwig wrote: FYI below is the manually applied patch without all the wrapping: static void virtio_blk_req_complete(VirtIOBlockReq *req, int status) { VirtIOBlock *s = req->dev; @@ -95,6 +98,12 @@ static void virtio_blk_req_complete(Virt virtqueu

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Christoph Hellwig
On Mon, Jan 11, 2010 at 11:19:21AM +0200, Dor Laor wrote: > >Attached results with rhel5.4 (qemu0.11) for win2k8 32bit guest. Note > >the drastic reduction in cpu consumption. > > Attachment did not survive the email server, so you'll have to trust me > saying that cpu consumption was down from 6

Re: [Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Dor Laor
On 01/11/2010 11:03 AM, Dor Laor wrote: On 01/11/2010 10:30 AM, Avi Kivity wrote: On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote: The following patch improves Windows virtio block driver performance on small requests. Additionally, it reduces CPU usage on write I/Os

[Qemu-devel] Re: [RFC][PATCH] performance improvement for windows guests, running on top of virtio block device

2010-01-11 Thread Avi Kivity
On 01/11/2010 09:40 AM, Vadim Rozenfeld wrote: The following patch improves Windows virtio block driver performance on small requests. Additionally, it reduces CPU usage on write I/Os. Note, this is not an improvement for Windows specifically. diff --git a/hw/