Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Rik van Riel
On Thu, 2019-01-10 at 12:26 +1100, Dave Chinner wrote: > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-11-21 Thread Rik van Riel
On Tue, 2017-11-21 at 10:26 -0800, Dan Williams wrote: > On Tue, Nov 21, 2017 at 10:19 AM, Rik van Riel > wrote: > > On Fri, 2017-11-03 at 14:21 +0800, Xiao Guangrong wrote: > > > On 11/03/2017 12:30 AM, Dan Williams wrote: > > > > > > > > Good point,

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-11-21 Thread Rik van Riel
On Fri, 2017-11-03 at 14:21 +0800, Xiao Guangrong wrote: > On 11/03/2017 12:30 AM, Dan Williams wrote: > > > > Good point, I was assuming that the mmio flush interface would be > > discovered separately from the NFIT-defined memory range. Perhaps > > via > > PCI in the guest? This piece of the pro

Re: [Qemu-devel] [RFC 2/2] KVM: add virtio-pmem driver

2017-10-12 Thread Rik van Riel
On Thu, 2017-10-12 at 18:18 -0400, Pankaj Gupta wrote: > > > > On Thu, Oct 12, 2017 at 2:25 PM, Pankaj Gupta > > wrote: > > > > > > > >   This patch adds virtio-pmem driver for KVM guest. > > > > >   Guest reads the persistent memory range information > > > > >   over virtio bus from Qemu and re

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-26 Thread Rik van Riel
On Wed, 2017-07-26 at 14:40 -0700, Dan Williams wrote: > On Wed, Jul 26, 2017 at 2:27 PM, Rik van Riel > wrote: > > On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote: > > > > > > > > > > Just want to summarize here (high level): > > >

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-26 Thread Rik van Riel
On Wed, 2017-07-26 at 09:47 -0400, Pankaj Gupta wrote: > > > Just want to summarize here (high level): > > This will require implementing a new 'virtio-pmem' device which > presents > a DAX address range (like pmem) to the guest with read/write (direct > access) > & device flush functionality. Also, qemu

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-25 Thread Rik van Riel
On Tue, 2017-07-25 at 07:46 -0700, Dan Williams wrote: > On Tue, Jul 25, 2017 at 7:27 AM, Pankaj Gupta > wrote: > > > > Looks like only way to send flush(blk dev) from guest to host with > > nvdimm > > is using flush hint addresses. Is this the correct interface I am > > looking? > > > > blkdev_

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-23 Thread Rik van Riel
On Sun, 2017-07-23 at 09:01 -0700, Dan Williams wrote: > [ adding Ross and Jan ] > > On Sun, Jul 23, 2017 at 7:04 AM, Rik van Riel > wrote: > > > > The goal is to increase density of guests, by moving page > > cache into the host (where it can be easily reclaim

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-23 Thread Rik van Riel
On Sat, 2017-07-22 at 12:34 -0700, Dan Williams wrote: > On Fri, Jul 21, 2017 at 8:58 AM, Stefan Hajnoczi > wrote: > > > > Maybe the NVDIMM folks can comment on this idea. > > I think it's unworkable to use the flush hints as a guest-to-host > fsync mechanism. That mechanism was designed to flush

Re: [Qemu-devel] KVM "fake DAX" flushing interface - discussion

2017-07-21 Thread Rik van Riel
On Fri, 2017-07-21 at 09:29 -0400, Pankaj Gupta wrote: > > > > > >  - Flush hint address traps from guest to host and do an > > > entire fsync > > >    on backing file which itself is costly. > > > > > >  - Can be used to flush specific pages on host backing disk. > > > We can > > >  

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Tue, 2017-06-20 at 21:26 +0300, Michael S. Tsirkin wrote: > On Tue, Jun 20, 2017 at 01:29:00PM -0400, Rik van Riel wrote: > > I agree with that.  Let me go into some more detail of > > what Nitesh is implementing: > > > > 1) In arch_free_page, the being-freed page i

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Tue, 2017-06-20 at 18:49 +0200, David Hildenbrand wrote: > On 20.06.2017 18:44, Rik van Riel wrote: > > Nitesh Lal (on the CC list) is working on a way > > to efficiently batch recently freed pages for > > free page hinting to the hypervisor. > > > > If th

Re: [Qemu-devel] [PATCH v11 4/6] mm: function to offer a page block on the free list

2017-06-20 Thread Rik van Riel
On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote: > The hypervisor is going to throw away the contents of these pages, > right?  As soon as the spinlock is released, someone can allocate a > page, and put good data in it.  What keeps the hypervisor from > throwing > away good data? That looks

Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
On Thu, 2017-05-11 at 14:17 -0400, Stefan Hajnoczi wrote: > On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote: > > * For live migration use case, if host side backing file is  > >   shared storage, we need to flush the page cache for the disk  > >   image at the destination (new fadvise

Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
'KVM "fake DAX" device flushing' project for feedback. > > > Got the idea during discussion with 'Rik van Riel'. > > > > CCing NVDIMM folks. > > > > > > > > Also, request answers to 'Questions' section. > > > > > > Abstr

Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

2017-02-27 Thread Rik van Riel
On Mon, 2017-02-27 at 11:10 +, Stefan Hajnoczi wrote: > On Thu, Feb 23, 2017 at 10:59:22AM +, Daniel P. Berrange wrote: > > When using a memory-backend object with prealloc turned on, QEMU > > will memset() the first byte in every memory page to zero. While > > this might have been acceptab

Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster

2016-04-20 Thread Rik van Riel
On Wed, 2016-04-20 at 13:46 +0200, Kevin Wolf wrote: > On 2016-04-20 at 12:40, Ric Wheeler wrote: > > > > On 04/20/2016 05:24 AM, Kevin Wolf wrote: > > > > > > On 2016-04-20 at 03:56, Ric Wheeler wrote: > > > > > > > > On 04/19/2016 10:09 AM, Jeff Cody wrote: > > > > > > > > >

Re: [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster

2016-04-20 Thread Rik van Riel
> here at LSF (the kernel summit for file and storage people) and got a non-public confirmation that individual storage devices (s-ata drives or scsi) can also dump cache state w

Re: [Qemu-devel] [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 15:02 +, Li, Liang Z wrote: > > > > On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > > > > > > The free page bitmap will be sent to QEMU through virtio > > > interface and > > > used for live migration optimization. > > > Drop the cache before building the free page

Re: [Qemu-devel] [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > The free page bitmap will be sent to QEMU through virtio interface > and used for live migration optimization. > Drop the cache before building the free page bitmap can get more > free pages. Whether dropping the cache is decided by user. > How

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-09 Thread Rik van Riel
On Wed, 2016-03-09 at 20:04 +0300, Roman Kagan wrote: > On Wed, Mar 09, 2016 at 05:41:39PM +0200, Michael S. Tsirkin wrote: > > On Wed, Mar 09, 2016 at 05:28:54PM +0300, Roman Kagan wrote: > > > For (1) I've been trying to make a point that skipping clean > > > pages is > > > much more likely to re

Re: [Qemu-devel] [PATCH v2] util/mmap-alloc: fix hugetlb support on ppc64

2015-12-02 Thread Rik van Riel
the same fd. > > Naturally, this makes the guard page bigger with hugetlbfs. > > Based on patch by Greg Kurz. > > Cc: Rik van Riel > CC: Greg Kurz > Signed-off-by: Michael S. Tsirkin Acked-by: Rik van Riel -- All rights reversed

Re: [Qemu-devel] [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages

2013-10-10 Thread Rik van Riel
ization in 11feeb498086a3a5907b8148bdf1786a9b18fc55. The cacheline was already modified in order to set PG_tail so this won't affect the boot time of large memory systems. Reported-by: andy123 Signed-off-by: Andrea Arcangeli Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH] mm: compaction: Correct the nr_strict_isolated check for CMA

2012-10-16 Thread Rik van Riel
possible that more pages than necessary are isolated but the check still fails and I missed that this fix was not picked up before RC1. This same problem has been identified in 3.7-RC1 by Tony Prisk and should be addressed by the following patch. Signed-off-by: Mel Gorman Tested-by: Tony Prisk

Re: [Qemu-devel] [PATCH 0/9] Reduce compaction scanning and lock contention

2012-09-21 Thread Rik van Riel
On 09/21/2012 06:46 AM, Mel Gorman wrote: Hi Andrew, Richard Davies and Shaohua Li have both reported lock contention problems in compaction on the zone and LRU locks as well as significant amounts of time being spent in compaction. This series aims to reduce lock contention and scanning rates t

Re: [Qemu-devel] [PATCH 6/6] mm: compaction: Restart compaction from near where it left off

2012-09-20 Thread Rik van Riel
e we cycle through more slowly can continue, even when this particular zone is experiencing problems, so I guess this is desired behaviour... Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH 5/6] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-20 Thread Rik van Riel
en to ignore the cached information?" If it's ignored too often, the scanning rates will still be excessive. If the information is too stale then allocations will fail that might have otherwise succeeded. In this patch Big hammer, but I guess it is effective... Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH 4/6] Revert "mm: have order > 0 compaction start off where it left"

2012-09-20 Thread Rik van Riel
r next patches easier... Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH 3/6] mm: compaction: Acquire the zone->lock as late as possible

2012-09-20 Thread Rik van Riel
ere are no free pages in the pageblock then the lock will not be acquired at all which reduces contention on zone->lock. Signed-off-by: Mel Gorman Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH 2/6] mm: compaction: Acquire the zone->lru_lock as late as possible

2012-09-20 Thread Rik van Riel
uge then the LRU lock will not be acquired at all which reduces contention on zone->lru_lock. Signed-off-by: Mel Gorman Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH 1/6] mm: compaction: Abort compaction loop if lock is contended or run too long

2012-09-20 Thread Rik van Riel
ock is even contended. [minc...@kernel.org: Putback pages isolated for migration if aborting] [a...@linux-foundation.org: compact_zone_order requires non-NULL arg contended] Signed-off-by: Andrea Arcangeli Signed-off-by: Shaohua Li Signed-off-by: Mel Gorman Acked-by: Rik van Riel

Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust

2012-09-17 Thread Rik van Riel
On 09/15/2012 11:55 AM, Richard Davies wrote: Hi Rik, Mel and Shaohua, Thank you for your latest patches. I attach my latest perf report for a slow boot with all of these applied. Mel asked for timings of the slow boots. It's very hard to give anything useful here! A normal boot would be a minu

[Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust

2012-09-13 Thread Rik van Riel
ne. This can lead to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel Reported-by: Richard Davies di

[Qemu-devel] [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages"

2012-09-13 Thread Rik van Riel
ppears to have re-introduced quadratic behaviour in that the value of zone->compact_cached_free_pfn is never advanced until the compaction run wraps around the start of the zone. This merely moved the starting point for the quadratic behaviour further into the zone, but the behaviour has still been o

[Qemu-devel] [PATCH 2/2] make the compaction "skip ahead" logic robust

2012-09-13 Thread Rik van Riel
re that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel Reported-by: Richard Davies diff --git a/mm/compaction.c b/mm/compaction.c index 771775d..0656759 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,6 +431,24 @@ static bool suitable_migration_target(s

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-25 Thread Rik van Riel
On 08/25/2012 01:45 PM, Richard Davies wrote: Are you talking about these patches? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84 http://marc.info/?l=linux-mm&m=134521289221259 If so, I believe those are in 3.6.0-rc3, so I teste

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-22 Thread Rik van Riel
On 08/22/2012 10:41 AM, Richard Davies wrote: Avi Kivity wrote: Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40G

Re: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest

2012-03-19 Thread Rik van Riel
On 03/09/2012 01:27 PM, Liu, Jinsong wrote: As for 'tsc deadline' feature exposing, my patch (as attached) just obey qemu general cpuid exposing method, and also satisfied your target I think. One question. Why is TSC_DEADLINE not exposed in the cpuid allowed feature bits in do_cpuid_ent() i

Re: [Qemu-devel] [PATCH] qemu_timedate_diff() shouldn't modify its argument.

2011-11-07 Thread Rik van Riel
may be outdated. Ohhh, nice catch. Signed-off-by: Gleb Natapov Acked-by: Rik van Riel -- All rights reversed

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 02:35 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 14:24 -0500, Rik van Riel wrote: Even if we equalized the amount of CPU time each VCPU ends up getting across some time interval, that is no guarantee they get useful work done, or that the time gets fairly divided to _user

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 02:07 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 12:26 -0500, Rik van Riel wrote: On 12/01/2010 12:22 PM, Peter Zijlstra wrote: The pause loop exiting & directed yield patches I am working on preserve inter-vcpu fairness by round robining among the vcpus inside one

[Qemu-devel] Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)

2010-12-01 Thread Rik van Riel
On 12/01/2010 12:22 PM, Peter Zijlstra wrote: On Wed, 2010-12-01 at 09:17 -0800, Chris Wright wrote: Directed yield and fairness don't mix well either. You can end up feeding the other tasks more time than you'll ever get back. If the directed yield is always to another task in your cgroup the

Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-29 Thread Rik van Riel
Paul Brook wrote: On Saturday 29 July 2006 15:59, Rik van Riel wrote: Fabrice Bellard wrote: Hi, Using O_SYNC for disk image access is not acceptable: QEMU relies on the host OS to ensure that the data is written correctly. This means that write ordering is not preserved, and on a power

Re: [Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-29 Thread Rik van Riel
Fabrice Bellard wrote: Hi, Using O_SYNC for disk image access is not acceptable: QEMU relies on the host OS to ensure that the data is written correctly. This means that write ordering is not preserved, and on a power failure any data written by qemu (or Xen fully virt) guests may not be pres

Re: [Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Paul Brook wrote: With a proper async API, is there any reason why we would want this to be tunable? I don't think there's much of a benefit of prematurely claiming a write is complete especially once the SCSI emulation can support multiple simultaneous requests. You're right. This O_SYNC band

Re: [Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Anthony Liguori wrote: Right now Fabrice is working on rewriting the block API to be asynchronous. There's been quite a lot of discussion about why using threads isn't a good idea for this Agreed, AIO is the way to go in the long run. With a proper async API, is there any reason why we woul

[Qemu-devel] Re: [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
Rik van Riel wrote: This is the simple approach to making sure that disk writes actually hit disk before we tell the guest OS that IO has completed. Thanks to DMA_MULTI_THREAD the performance still seems to be adequate. Hah, and of course that bit is only found in Xen's qemu-dm. Doh! I

[Qemu-devel] [RFC][PATCH] make sure disk writes actually hit disk

2006-07-28 Thread Rik van Riel
on should make the performance overhead of synchronous writes bearable, or at least comparable to native hardware. Signed-off-by: Rik van Riel <[EMAIL PROTECTED]> --- xen-unstable-10712/tools/ioemu/block-bochs.c.osync 2006-07-28 02:15:56.0 -0400 +++ xen-unstable-10712/tools/ioemu/blo