Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Rik van Riel
On Thu, 2019-01-10 at 12:26 +1100, Dave Chinner wrote: > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-11-21 Thread Rik van Riel
On Tue, 2017-11-21 at 10:26 -0800, Dan Williams wrote: > On Tue, Nov 21, 2017 at 10:19 AM, Rik van Riel <r...@redhat.com> > wrote: > > On Fri, 2017-11-03 at 14:21 +0800, Xiao Guangrong wrote: > > > On 11/03/2017 12:30 AM, Dan Williams wrote:

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-11-21 Thread Rik van Riel
On Fri, 2017-11-03 at 14:21 +0800, Xiao Guangrong wrote: > On 11/03/2017 12:30 AM, Dan Williams wrote: > > > > Good point, I was assuming that the mmio flush interface would be > > discovered separately from the NFIT-defined memory range. Perhaps > > via > > PCI in the guest? This piece of the

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Rik van Riel
On Thu, 2017-10-12 at 18:18 -0400, Pankaj Gupta wrote: > > > > On Thu, Oct 12, 2017 at 2:25 PM, Pankaj Gupta > > wrote: > > > > > > > >   This patch adds virtio-pmem driver for KVM guest. > > > > >   Guest reads the persistent memory range information > > > > >   over virtio

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-26 Thread Rik van Riel
On Wed, 2017-07-26 at 14:40 -0700, Dan Williams wrote: > On Wed, Jul 26, 2017 at 2:27 PM, Rik van Riel <r...@redhat.com> > wrote: > > On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote: > > > > > > > > > > Just want to summarize here(high lev

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-26 Thread Rik van Riel
On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote: > > > Just want to summarize here(high level): > > This will require implementing new 'virtio-pmem' device which > presents  > a DAX address range(like pmem) to guest with read/write(direct > access) > & device flush functionality. Also,

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-25 Thread Rik van Riel
On Tue, 2017-07-25 at 07:46 -0700, Dan Williams wrote: > On Tue, Jul 25, 2017 at 7:27 AM, Pankaj Gupta > wrote: > > > > Looks like only way to send flush(blk dev) from guest to host with > > nvdimm > > is using flush hint addresses. Is this the correct interface I am > >

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-23 Thread Rik van Riel
On Sun, 2017-07-23 at 09:01 -0700, Dan Williams wrote: > [ adding Ross and Jan ] > > On Sun, Jul 23, 2017 at 7:04 AM, Rik van Riel <r...@redhat.com> > wrote: > > > > The goal is to increase density of guests, by moving page > > cache into the ho

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-23 Thread Rik van Riel
On Sat, 2017-07-22 at 12:34 -0700, Dan Williams wrote: > On Fri, Jul 21, 2017 at 8:58 AM, Stefan Hajnoczi > wrote: > > > > Maybe the NVDIMM folks can comment on this idea. > > I think it's unworkable to use the flush hints as a guest-to-host > fsync mechanism. That

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-21 Thread Rik van Riel
On Fri, 2017-07-21 at 09:29 -0400, Pankaj Gupta wrote: > > > > > >  - Flush hint address traps from guest to host and do an > > > entire fsync > > >    on backing file which itself is costly. > > > > > >  - Can be used to flush specific pages on host backing disk. > > > We can > > >  

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Tue, 2017-06-20 at 21:26 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 20, 2017 at 01:29:00PM -0400, Rik van Riel wrote: > > I agree with that.  Let me go into some more detail of > > what Nitesh is implementing: > > > > 1) In arch_free_page, the being-freed page i

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Tue, 2017-06-20 at 18:49 +0200, David Hildenbrand wrote: > On 20.06.2017 18:44, Rik van Riel wrote: > > Nitesh Lal (on the CC list) is working on a way > > to efficiently batch recently freed pages for > > free page hinting to the hypervisor. > > > > If th

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote: > The hypervisor is going to throw away the contents of these pages, > right?  As soon as the spinlock is released, someone can allocate a > page, and put good data in it.  What keeps the hypervisor from > throwing > away good data? That

Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
On Thu, 2017-05-11 at 14:17 -0400, Stefan Hajnoczi wrote: > On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote: > > * For live migration use case, if host side backing file is  > >   shared storage, we need to flush the page cache for the disk  > >   image at the destination (new fadvise

Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
> 'KVM "fake DAX" device flushing' project for feedback. > > > Got the idea during discussion with 'Rik van Riel'. > > > > CCing NVDIMM folks. > > > > > > > > Also, request answers to 'Questions' section. > > > > > >

Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

2017-02-27 Thread Rik van Riel
On Mon, 2017-02-27 at 11:10 +, Stefan Hajnoczi wrote: > On Thu, Feb 23, 2017 at 10:59:22AM +, Daniel P. Berrange wrote: > > When using a memory-backend object with prealloc turned on, QEMU > > will memset() the first byte in every memory page to zero. While > > this might have been
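The snippet states the problem (with prealloc on, QEMU memset()s the first byte of every page to zero, which clobbers data already present in the backend) but is cut off before the fix. A minimal sketch of a non-destructive way to fault pages in, assuming the only goal is to touch each page: read the first byte of each page and store the same value back through a volatile pointer, so the write fault still happens but existing contents survive. The helper name prealloc_touch is hypothetical, and this is not presented as the code that was merged into QEMU.

```c
#include <stddef.h>

/* Hypothetical helper: fault in every page of [area, area + len) without
 * changing its contents. Writing zero to the first byte of each page would
 * destroy pre-existing data in a file-backed or shared memory backend;
 * reading the byte and storing the same value back still forces a write
 * fault on the page but leaves the data intact. */
void prealloc_touch(char *area, size_t len, size_t pagesize)
{
    for (size_t off = 0; off < len; off += pagesize) {
        volatile char *p = area + off; /* volatile: the store must not be optimised away */
        *p = *p;
    }
}
```

In a real backend the area would come from mmap() of the backing file and pagesize from the backend's page size (which may be a huge page size); both are left to the caller here.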

Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster

2016-04-20 Thread Rik van Riel
On Wed, 2016-04-20 at 13:46 +0200, Kevin Wolf wrote: > Am 20.04.2016 um 12:40 hat Ric Wheeler geschrieben: > > > > On 04/20/2016 05:24 AM, Kevin Wolf wrote: > > > > > > Am 20.04.2016 um 03:56 hat Ric Wheeler geschrieben: > > > > > > > > On 04/19/2016 10:09 AM, Jeff Cody wrote: > > > > > > > >

Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster

2016-04-20 Thread Rik van Riel
> > > here at LSF (the kernel summit for file and storage people) > > > > > and got > > > > > a non-public confirmation that individual storage devices (s- > > > > > ata > > > > > drives or scsi) can also dump cache state w

Re: [Qemu-devel] [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 15:02 +, Li, Liang Z wrote: > > > > On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > > > > > > The free page bitmap will be sent to QEMU through virtio > > > interface and > > > used for live migration optimization. > > > Drop the cache before building the free page

Re: [Qemu-devel] [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > The free page bitmap will be sent to QEMU through virtio interface > and used for live migration optimization. > Drop the cache before building the free page bitmap can get more > free pages. Whether dropping the cache is decided by user. >

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-09 Thread Rik van Riel
On Wed, 2016-03-09 at 20:04 +0300, Roman Kagan wrote: > On Wed, Mar 09, 2016 at 05:41:39PM +0200, Michael S. Tsirkin wrote: > > On Wed, Mar 09, 2016 at 05:28:54PM +0300, Roman Kagan wrote: > > > For (1) I've been trying to make a point that skipping clean > > > pages is > > > much more likely to

Re: [Qemu-devel] [PATCH v2] util/mmap-alloc: fix hugetlb support on ppc64

2015-12-02 Thread Rik van Riel
same fd. > > Naturally, this makes the guard page bigger with hugetlbfs. > > Based on patch by Greg Kurz. > > Cc: Rik van Riel <r...@redhat.com> > CC: Greg Kurz <gk...@linux.vnet.ibm.com> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> Acked-by: Rik van Riel <r...@redhat.com> -- All rights reversed

Re: [Qemu-devel] [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages

2013-10-10 Thread Rik van Riel
in 11feeb498086a3a5907b8148bdf1786a9b18fc55. The cacheline was already modified in order to set PG_tail so this won't affect the boot time of large memory systems. Reported-by: andy123 <ajs124.ajs...@gmail.com> Signed-off-by: Andrea Arcangeli <aarca...@redhat.com> Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH] mm: compaction: Correct the nr_strict_isolated check for CMA

2012-10-16 Thread Rik van Riel
...@prisktech.co.nz Acked-by: Rik van Riel <r...@redhat.com> -- All rights reversed

Re: [Qemu-devel] [PATCH 0/9] Reduce compaction scanning and lock contention

2012-09-21 Thread Rik van Riel
On 09/21/2012 06:46 AM, Mel Gorman wrote: Hi Andrew, Richard Davies and Shaohua Li have both reported lock contention problems in compaction on the zone and LRU locks as well as significant amounts of time being spent in compaction. This series aims to reduce lock contention and scanning rates

Re: [Qemu-devel] [PATCH 1/6] mm: compaction: Abort compaction loop if lock is contended or run too long

2012-09-20 Thread Rik van Riel
...@suse.de Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH 2/6] mm: compaction: Acquire the zone->lru_lock as late as possible

2012-09-20 Thread Rik van Riel
then the LRU lock will not be acquired at all which reduces contention on zone->lru_lock. Signed-off-by: Mel Gorman <mgor...@suse.de> Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH 3/6] mm: compaction: Acquire the zone->lock as late as possible

2012-09-20 Thread Rik van Riel
there are no free pages in the pageblock then the lock will not be acquired at all which reduces contention on zone->lock. Signed-off-by: Mel Gorman <mgor...@suse.de> Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH 4/6] Revert "mm: have order 0 compaction start off where it left"

2012-09-20 Thread Rik van Riel
, it makes your next patches easier... Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH 5/6] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-20 Thread Rik van Riel
the cached information? If it's ignored too often, the scanning rates will still be excessive. If the information is too stale then allocations will fail that might have otherwise succeeded. In this patch Big hammer, but I guess it is effective... Acked-by: Rik van Riel <r...@redhat.com>
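The patch title describes caching, per pageblock, the fact that a scan isolated nothing, and the quoted text names the trade-off: ignore the cache too often and scanning stays expensive, trust it too long and allocations can fail. A simplified model of that idea follows. struct zone_model, scan_with_skip_hints and reset_skip_hints are hypothetical names, and the flat array of per-block free-page counts is an assumption for illustration, not the kernel's mm/compaction.c implementation.

```c
#include <stdbool.h>

#define NR_BLOCKS 8

/* Simplified model, not the kernel's data structures: free_pages[i] is the
 * number of free pages in pageblock i, skip[i] is the cached
 * "scanned, nothing isolated" hint for that block. */
struct zone_model {
    int free_pages[NR_BLOCKS];
    bool skip[NR_BLOCKS];
};

/* Scan every pageblock not marked skip, isolating all of its free pages.
 * Blocks that yield nothing are marked so later scans pass over them.
 * Returns the number of pages isolated; *scanned counts blocks visited. */
int scan_with_skip_hints(struct zone_model *z, int *scanned)
{
    int isolated = 0;
    *scanned = 0;
    for (int i = 0; i < NR_BLOCKS; i++) {
        if (z->skip[i])
            continue;              /* cached hint: nothing found here last time */
        (*scanned)++;
        int got = z->free_pages[i];
        z->free_pages[i] = 0;      /* isolated pages leave the block */
        isolated += got;
        if (got == 0)
            z->skip[i] = true;     /* remember the empty pageblock */
    }
    return isolated;
}

/* Drop all hints so that pages freed into previously skipped blocks
 * become visible to the scanner again. */
void reset_skip_hints(struct zone_model *z)
{
    for (int i = 0; i < NR_BLOCKS; i++)
        z->skip[i] = false;
}
```

In this model, pages freed into a block after it was marked are invisible to later scans until reset_skip_hints() runs, which is the staleness risk the quoted text describes; resetting too often brings back the full scanning cost.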

Re: [Qemu-devel] [PATCH 6/6] mm: compaction: Restart compaction from near where it left off

2012-09-20 Thread Rik van Riel
cycle through more slowly can continue, even when this particular zone is experiencing problems, so I guess this is desired behaviour... Acked-by: Rik van Riel <r...@redhat.com>

Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction skip ahead logic robust

2012-09-17 Thread Rik van Riel
On 09/15/2012 11:55 AM, Richard Davies wrote: Hi Rik, Mel and Shaohua, Thank you for your latest patches. I attach my latest perf report for a slow boot with all of these applied. Mel asked for timings of the slow boots. It's very hard to give anything useful here! A normal boot would be a

[Qemu-devel] [PATCH 2/2] make the compaction skip ahead logic robust

2012-09-13 Thread Rik van Riel
do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel <r...@redhat.com> Reported-by: Richard Davies <rich...@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 771775d..0656759 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,6 +431,24 @@ static bool

[Qemu-devel] [PATCH 1/2] Revert "mm: have order 0 compaction start near a pageblock with free pages"

2012-09-13 Thread Rik van Riel
-compact_cached_free_pfn is never advanced until the compaction run wraps around the start of the zone. This merely moved the starting point for the quadratic behaviour further into the zone, but the behaviour has still been observed. It looks like another fix is required. Signed-off-by: Rik van

[Qemu-devel] [PATCH -v2 2/2] make the compaction skip ahead logic robust

2012-09-13 Thread Rik van Riel
to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel r...@redhat.com Reported-by: Richard Davies rich

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-25 Thread Rik van Riel
On 08/25/2012 01:45 PM, Richard Davies wrote: Are you talking about these patches? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84 http://marc.info/?l=linux-mm&m=134521289221259 If so, I believe those are in 3.6.0-rc3, so I

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-22 Thread Rik van Riel
On 08/22/2012 10:41 AM, Richard Davies wrote: Avi Kivity wrote: Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x

Re: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest

2012-03-19 Thread Rik van Riel
On 03/09/2012 01:27 PM, Liu, Jinsong wrote: As for 'tsc deadline' feature exposing, my patch (as attached) just obey qemu general cpuid exposing method, and also satisfied your target I think. One question. Why is TSC_DEADLINE not exposed in the cpuid allowed feature bits in do_cpuid_ent()

Re: [Qemu-devel] [PATCH] qemu_timedate_diff() shouldn't modify its argument.

2011-11-07 Thread Rik van Riel
may be outdated. Ohhh, nice catch. Signed-off-by: Gleb Natapov <g...@redhat.com> Acked-by: Rik van Riel <r...@redhat.com> -- All rights reversed

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 12:22 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 09:17 -0800, Chris Wright wrote: Directed yield and fairness don't mix well either. You can end up feeding the other tasks more time than you'll ever get back. If the directed yield is always to another task in your cgroup

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 02:07 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 12:26 -0500, Rik van Riel wrote: On 12/01/2010 12:22 PM, Peter Zijlstra wrote: The pause loop exiting directed yield patches I am working on preserve inter-vcpu fairness by round robining among the vcpus inside one KVM
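Rik describes the pause loop exiting directed yield patches as preserving inter-VCPU fairness by round robining among the VCPUs inside one KVM guest. A minimal sketch of that selection policy, assuming a plain array of per-VCPU runnable flags and a remembered "last boosted" index (both hypothetical, not KVM's actual data structures):

```c
#include <stdbool.h>

/* Hypothetical sketch: choose which VCPU a spinning VCPU should yield to.
 * Starting just after the VCPU that was boosted last time and wrapping
 * around, pick the first other VCPU that is runnable. Rotating the start
 * point spreads the donated time across the guest's VCPUs instead of
 * always favouring the lowest-numbered one. Returns -1 if there is no
 * candidate. */
int pick_yield_target(int self, int last_boosted, int nr_vcpus,
                      const bool *runnable)
{
    for (int i = 1; i <= nr_vcpus; i++) {
        int cand = (last_boosted + i) % nr_vcpus;
        if (cand != self && runnable[cand])
            return cand;
    }
    return -1;
}
```

The caller would store the returned index as the new last_boosted, so the next yield from any VCPU in the same guest continues the rotation.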

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 02:35 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 14:24 -0500, Rik van Riel wrote: Even if we equalized the amount of CPU time each VCPU ends up getting across some time interval, that is no guarantee they get useful work done, or that the time gets fairly divided to _user

Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-29 Thread Rik van Riel
Fabrice Bellard wrote: Hi, Using O_SYNC for disk image access is not acceptable: QEMU relies on the host OS to ensure that the data is written correctly. This means that write ordering is not preserved, and on a power failure any data written by qemu (or Xen fully virt) guests may not be

[Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Rik van Riel wrote: This is the simple approach to making sure that disk writes actually hit disk before we tell the guest OS that IO has completed. Thanks to DMA_MULTI_THREAD the performance still seems to be adequate. Hah, and of course that bit is only found in Xen's qemu-dm. Doh! I knew
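The RFC is about not telling the guest an I/O has completed until the data has actually reached the disk. A minimal POSIX sketch of the two usual ways to get that guarantee on the host: open the image with O_SYNC, as discussed in the replies, or issue fdatasync() after the write and only then report completion. The helper names open_image_sync and write_then_flush are hypothetical and not QEMU's block-layer API; the sketch also assumes a synchronous, single-request path, whereas the replies point towards an asynchronous (AIO) block API in the long run.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a disk image so that every write returns only
 * after the data has reached stable storage (the O_SYNC approach). */
int open_image_sync(const char *path)
{
    return open(path, O_RDWR | O_SYNC);
}

/* Hypothetical helper: write len bytes at offset off and return 0 only once
 * the data has been flushed, so the caller may then report I/O completion
 * to the guest. Returns -1 on error with errno set. */
int write_then_flush(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same chunk */
            return -1;
        }
        p += n;                    /* handle short writes */
        off += n;
        len -= (size_t)n;
    }
    /* Without O_SYNC the data may still sit in the host page cache;
     * fdatasync() forces it out. On an O_SYNC descriptor it is redundant
     * but harmless. */
    return fdatasync(fd);
}
```

The O_SYNC variant flushes on every write with no extra call, while the fdatasync() variant lets several writes be batched before a single flush, at the cost of the caller having to remember to flush before signalling completion.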

Re: [Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Anthony Liguori wrote: Right now Fabrice is working on rewriting the block API to be asynchronous. There's been quite a lot of discussion about why using threads isn't a good idea for this Agreed, AIO is the way to go in the long run. With a proper async API, is there any reason why we

Re: [Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Paul Brook wrote: With a proper async API, is there any reason why we would want this to be tunable? I don't think there's much of a benefit of prematurely claiming a write is complete especially once the SCSI emulation can support multiple simultaneous requests. You're right. This O_SYNC