Christian Borntraeger wrote:
On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than host cpus. e.g. 32 guest cpus on 1 host cpu triggered the
hang quite often. I could also reproduce the problem on a 4 way z/VM host
with a 64 way guest.
I think that's one
Christian Borntraeger wrote:
I really like 64 guest cpus as a good testcase for all kinds of things.
Sure, I do the same kind of thing.
I think x86 (at least) is now using ticket locks, which is fair. Which
kernel are you seeing this problem on?
Sorry, forgot to mention. Its
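For readers unfamiliar with the term, a ticket lock hands out numbered tickets and serves waiters strictly in arrival order. The following is a minimal user-space sketch using C11 atomics; the `ticket_lock` type and the function names are hypothetical, and this is not the kernel's x86 implementation.

```c
#include <stdatomic.h>

/* Hypothetical user-space ticket lock, for illustration only. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket; fetch_add returns the value before the increment. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* Spin until our ticket is the one being served. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Pass the lock to the next ticket in line. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Strict FIFO ordering is what makes the lock fair. It may also mean that, when guest cpus outnumber host cpus, waiters queue behind a ticket holder whose vcpu is not currently scheduled.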
Gerd Hoffmann wrote:
* Host: make kvm pv clock really compatible with xen pv clock.
* Guest/xen: factor out some xen clock code into a separate
source file (pvclock.[ch]), so kvm can reuse it.
* Guest/kvm: make kvm clock compatible with xen clock by using
Gerd Hoffmann wrote:
Hmm, I somehow fail to see a case where it could be non-atomic ...
get_time_values() copies a consistent snapshot, thus
xen_clocksource_read() doesn't race against xen updating the fields.
The snapshot is in a per-cpu variable, thus it doesn't race against
other guest
Gerd Hoffmann wrote:
Jeremy Fitzhardinge wrote:
Xen could change the parameters in the instant after get_time_values().
That change could be as a result of suspend-resume, so the parameters
and the tsc could be wildly different.
Ah, ok, forgot the rdtsc in the picture
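The concern above can be illustrated with a version-checked read in which the counter read sits inside the retry loop, so that a parameter change between the snapshot and the counter read forces a retry. This is a simplified sketch, not the Xen or KVM code: the struct layout, `clock_read`, and `fake_counter` are assumptions for illustration, the counter is assumed to tick once per nanosecond, and a real SMP implementation would need read memory barriers rather than only a compiler barrier.

```c
#include <stdint.h>

/* Hypothetical layout; field names are illustrative, not a real ABI. */
struct time_info {
    volatile uint32_t version;  /* odd while the writer is mid-update */
    uint64_t tsc_timestamp;     /* counter value at the last update */
    uint64_t system_time_ns;    /* nanoseconds at the last update */
};

/* Compiler-only barrier (GCC-compatible compilers). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for rdtsc so the sketch runs anywhere. */
static uint64_t fake_counter(void)
{
    return 1500;
}

static uint64_t clock_read(const struct time_info *src,
                           uint64_t (*read_counter)(void))
{
    uint32_t version;
    uint64_t ns;

    do {
        version = src->version;
        compiler_barrier();
        /* Parameters and counter are read inside the same retry window. */
        ns = src->system_time_ns + (read_counter() - src->tsc_timestamp);
        compiler_barrier();
        /* Retry if an update was in progress or completed meanwhile. */
    } while ((version & 1) || version != src->version);

    return ns;
}
```

Keeping the counter read inside the loop is the design point: a snapshot taken in one loop and a counter read done afterwards would leave a window in which the parameters no longer match the counter.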
Gerd Hoffmann wrote:
+cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src)
+{
+ struct pvclock_shadow_time *shadow;
+ cycle_t ret;
+ unsigned version;
+
+ shadow = &get_cpu_var(shadow_time);
+ do {
+ version = pvclock_get_time_values(shadow,
Gerd Hoffmann wrote:
I'm looking at the guest side of the issue right now, trying to identify
common code, and while doing so noticed that xen does the
version-check-loop in both get_time_values_from_xen(void) and
xen_clocksource_read(void), and I can't see any obvious reason for that.
The
Gerd Hoffmann wrote:
Wall clock is off a few hours though. Oops.
I think the way wall clock and system clock work together in xen (Jeremy
correct me if I'm wrong) is that the wall clock specifies the point in
time where the system clock started going. As kvm fills in host system
time into
Andrea Arcangeli wrote:
On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
I think you can break this if() down a bit:
if (!(vma->vm_file && vma->vm_file->f_mapping))
continue;
It makes no difference at runtime, coding
Christoph Lameter wrote:
Provide a way to lock an mm_struct against reclaim (try_to_unmap
etc). This is necessary for the invalidate notifier approaches so
that they can reliably add and remove a notifier.
Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter
Jes Sorensen wrote:
Jeremy Fitzhardinge wrote:
Jes Sorensen wrote:
This change has been on the x86 side for ages, and not even Ingo made a
peep about it ;)
Mmmm, last time I looked, x86 didn't scale to any interesting number
of CPUs :-)
Well, I guess you need all those CPUs
Jes Sorensen wrote:
I'm a little wary of the performance impact of this change. Doing a
cpumask compare on all smp_call_function calls seems a little expensive.
Maybe it's just noise in the big picture compared to the actual cost of
the IPIs, but I thought I'd bring it up.
Keep in mind that
Carsten Otte wrote:
+struct mm_struct *dup_mm(struct task_struct *tsk);
No prototypes in .c files. Put this in an appropriate header.
J
Marcelo Tosatti wrote:
Forgot to copy you... Ideally all pte updates should be done via the
paravirt interface.
Hm, are you sure?
+static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
+{
+ pte_t pte = *ptep;
+ clear_bit(bit, (unsigned long *)&pte.pte);
+
Marcelo Tosatti wrote:
On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
Marcelo Tosatti wrote:
Forgot to copy you... Ideally all pte updates should be done via the
paravirt interface.
Hm, are you sure?
+static inline void pte_clear_bit(unsigned
Avi Kivity wrote:
I find it non-descriptive, and it reminds me of another hypervisor.
I suggest 'tlp' for two-level paging.
That has its own ambiguity; without other context it reads like
two-level pagetable. Anyway, using the same term for the same thing
is not a bad idea.
J
Gerd Hoffmann wrote:
Another maybe workable approach for Xen is to go through pv_ops
(although pte_clear doesn't go through pv_ops right now, so this would
be an additional hook too ...).
It does for 32-bit PAE. Making pte_clear uniform across all pagetable
modes would be a nice cleanup.
Glauber de Oliveira Costa wrote:
The ifdef only exists because, as I said, the code itself will always be
compiled in, to avoid an ifdef in setup_64.c. So it's just a matter of taking
it from here and putting it there. Kiran seems to prefer this way, but I
don't really have a preference.
It would be
Amit Shah wrote:
Glauber, are you planning on consolidating the dma_ops structure for 32- and
64-bit? 32-bit doesn't currently have a dma_mapping_ops structure, which
makes paravirtualizing DMA access difficult on 32-bit.
I think it's a good idea. While I haven't worked out the details
Glauber de Oliveira Costa wrote:
mm/sparse-vmemmap.c uses init_mm in some places. However, it is not
present in any of the headers currently included in the file.
init_mm is defined as extern in sched.h, so we add it to the headers list.
Up to now, this problem was masked by the fact that
Glauber de Oliveira Costa wrote:
This patch consolidates some of the smp pieces for both architectures
(i386 and x86_64). It makes part of the calls go through smp_ops, and shares
code for those functions in smpcommon.c.
There's more room for code sharing here, but it is left as an exercise
Glauber de Oliveira Costa wrote:
arch/x86/kernel/built-in.o: In function `native_smp_send_reschedule':
/home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/smpcommon.c:262:
undefined reference to `genapic'
arch/x86/kernel/built-in.o: In function `native_smp_call_function_mask':
Glauber de Oliveira Costa wrote:
This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.
Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
include/asm-x86/kvm_para.h | 23 +++
include/linux/kvm.h|
Avi Kivity wrote:
Glauber de Oliveira Costa wrote:
+union kvm_hv_clock {
+ struct {
+ u64 tsc_mult;
+ u64 now_ns;
+ /* That's the wall clock, not the water closet */
+ u64 wc_sec;
+ u64 wc_nsec;
Glauber de Oliveira Costa wrote:
I have in fact seen bugs with mixed reads and writes to the same cr
(cr4), but adding the volatile flag to the read function seemed to fix it.
Well, volatile will make a read be repeated rather than caching the
previous value, but it has no effect on ordering.
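One generic way to constrain ordering, independent of whatever the patch under discussion ended up doing, is a compiler barrier with a "memory" clobber. The sketch below is an assumption for illustration: `fake_cr4` is a plain variable standing in for a control register (which cannot be touched from user space), and `write_reg`/`read_reg` are hypothetical names.

```c
#include <stdint.h>

/*
 * Compiler-only barrier for GCC-compatible compilers: the "memory"
 * clobber tells the compiler not to move memory accesses across this
 * point. It emits no instruction and does not order the CPU itself.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a control register. */
static uint32_t fake_cr4;

static void write_reg(uint32_t value)
{
    fake_cr4 = value;
    compiler_barrier();   /* later accesses stay after the write */
}

static uint32_t read_reg(void)
{
    compiler_barrier();   /* earlier accesses stay before the read */
    return fake_cr4;
}
```

The point of the sketch is that ordering comes from the clobber, not from the `volatile` qualifier on the asm statement.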
Keir Fraser wrote:
volatile prevents the asm from being 'moved significantly', according to the
gcc manual. I take that to mean that reordering is not allowed.
That phrase doesn't appear in the gcc manual; in fact, it specifically
says that reordering can happen:
The `volatile' keyword
Glauber de Oliveira Costa wrote:
This patch introduces native versions of the read/write_crX functions,
clts and wbinvd, and patches callers when needed.
Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
Acked-by: Jeremy Fitzhardinge
Acked-by: Jeremy Fitzhardinge [EMAIL PROTECTED]
---
arch/x86/kernel/head_64.S |9 -
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b6167fe..c31b1c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch
Ingo Molnar wrote:
* Zachary Amsden [EMAIL PROTECTED] wrote:
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
From: Glauber de Oliveira Costa [EMAIL PROTECTED]
tsc is a very good time source (when it does not have drifts, does not
change its frequency, i.e. when
Zachary Amsden wrote:
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
From: Glauber de Oliveira Costa [EMAIL PROTECTED]
tsc is a very good time source (when it does not have drifts, does not
change its frequency, i.e. when it works), so it should have its rating
raised
Ingo Molnar wrote:
that's totally broken then. You cannot create an SMP-safe monotonic
clocksource via interpolation - native does not do it either. Good thing
this problem got exposed, it needs to be fixed.
Sigh, I don't really want to have this fight again.
I don't really see what
Ingo Molnar wrote:
i don't remember us having discussed this before, ever. If there's any
fight about monotonicity and SMP then it would be a pretty one-sided
affair, with you being beaten up seriously ;-)
This is part of Xen's ABI, so it isn't easily changed. You're right
that getting
Carsten Otte wrote:
On s390, we've had an instruction...
Yes, quite ;)
J
Glauber de Oliveira Costa wrote:
My next TODOs with it are:
* Get SMP working
* Try something for stolen time, as in Jeremy's last suggestion for Anthony's
patch
* Measure the time it takes for a hypercall, and subtract this time
for calculating the expiry time for the timer event.
I
Anthony Liguori wrote:
Nakajima, Jun wrote:
I don't understand the purpose of returning the max leaf. Who is that
information useful for?
Well, this is the key info to the user of CPUID. It tells which leaves
are valid to use. Otherwise, the user cannot tell whether the results of
Anthony Liguori wrote:
This patch refactors the current hypercall infrastructure to better support
live migration and SMP. It eliminates the hypercall page by trapping the UD
exception that would occur if you used the wrong hypercall instruction for the
underlying architecture and replacing
Anthony Liguori wrote:
The whole point of using the instruction is to allow hypercalls to be
used in many locations. This has the nice side effect of not
requiring a central hypercall initialization routine in the guest to
fetch the hypercall page. A PV driver can be completely independent
Anthony Liguori wrote:
Yeah, see, the initial goal was to make it possible to use the KVM
paravirtualizations on other hypervisors. However, I don't think this
is really going to be possible in general so maybe it's better to just
use leaf 0. I'll let others chime in before sending a new
Nakajima, Jun wrote:
Today, 3 CPUID leaves starting from 0x4000_ are defined in a generic
fashion (hypervisor detection, version, and hypercall page), and those
are the ones used by Xen today. We should extend those leaves (e.g.
starting from 0x4000_0003) for the vmm-independent features
Nakajima, Jun wrote:
The hypervisor detection mechanism is generic, and the signature
returned is implementation specific. Having a list of all hypervisor
signatures sounds fine to me as we are detecting vendor-specific
processor(s) in the native. And I don't expect the list is large.
I'm
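To make the signature idea concrete: the hypervisor detection leaf returns a 12-byte vendor string packed into EBX, ECX and EDX, which a guest can compare against a short list. The sketch below only decodes and matches register values passed in by the caller; `decode_signature` and `is_known_hypervisor` are hypothetical names, and the list of signatures is an illustrative subset, not a complete registry.

```c
#include <stdint.h>
#include <string.h>

/* Unpack three 32-bit registers into a NUL-terminated 12-byte string. */
static void decode_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                             char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)  /* low byte first, as on x86 */
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Returns 1 if the signature is in the (illustrative) known list. */
static int is_known_hypervisor(const char *sig)
{
    static const char *const known[] = {
        "XenVMMXenVMM",
        "KVMKVMKVM",      /* padded with NUL bytes in the registers */
        "VMwareVMware",
    };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (strcmp(sig, known[i]) == 0)
            return 1;
    return 0;
}
```

Extracting bytes with shifts rather than `memcpy` keeps the decoding independent of the endianness of the machine running the sketch.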
Avi Kivity wrote:
It is, but the hooks are in much the same places. It could be argued
that you'd embed pte notifiers in paravirt_ops for a host kernel, but
that's not doable because pte notifiers use higher-level data
structures (like vmas).
Also, I wouldn't like to preclude the possibility
Laurent Vivier wrote:
functionalities:
- allow measuring the time spent by a CPU in a virtual CPU.
- allow displaying this value per CPU in /proc/state
- allow displaying this value per process in /proc/pid/state
- allow KVM to use these 3 previous functionalities
So, currently time spent
Anthony Liguori wrote:
I don't agree that having paravirt_ops within a normal module is all
that useful. By the time modules can be loaded, the kernel has
completely booted. There should only be a handful of paravirt_ops
implementations and they aren't large so I don't think there's a big
Zachary Amsden wrote:
For a VMM which supports both full emulation and para-virtualization,
testing CPUID leaves is not sufficient to determine applicability of a
paravirt device driver. This only indicates the presence of the
functionality, not the fact that the functionality has been
Zachary Amsden wrote:
Basically, it just makes it easier on distributors and allows any old
kernel with paravirt-ops module support to run on any modern, new
hypervisor - that might not have even existed at the time the distro
was created.
Hey, isn't that what VMI's for? ;)
I'd been
Anthony Liguori wrote:
I've been thinking about this wrt the hypercall page in KVM. The
problem is that in a model like KVM (or presumably VMI), migration
gets really difficult if you have anything but a trivial hypercall
page since the hypercall page will change after migration.
If you
Zachary Amsden wrote:
Unless you also migrate the hypercall page itself and impose migration
restrictions on compatible hypercall pages.
Seems unreasonable, especially if you support migration between VT and
SVM machines. The whole point of a hypercall page is to give you a
point of
Zachary Amsden wrote:
You only need to quiesce if you have guest-visible data-structures
that have details about the underlying hardware. So Xen needs to
quiesce, but I don't know of any other VMM that would.
VMI, KVM and lhype should be capable of transparent migration without
guest
Zachary Amsden wrote:
Yes, but if we want to stay with that forward compatibility story, we
need a way to allow paravirt device probing to be completely
orthogonal to paravirt-ops probing. Either the VMware hypervisor
needs to NOT implement a CPUID leaf, keeping the same ROM based
Zachary Amsden wrote:
If I had a gentoo install,
Yes, but then you'd be a gentoo user. ;)
I would probably go so far as to want to recompile everything after
migration across CPU vendors; things like NMIs, MSRs, thermal controls
and sleep states are also vendor dependent and either need to
Anthony Liguori wrote:
The real trick is doing it without the guest being involved at all.
Right now, it won't be a problem in KVM since the hypercall page only
differs by a single instruction across platforms. In the future,
we'll have to be smarter and wait for all VCPUs to leave the
Anthony Liguori wrote:
I've updated this patch and switched to using a scale/shift like Xen
is doing, but I must admit, I don't understand how it helps adjtime.
I poked around a bit and it wasn't obvious.
Why is having {mult=122, shift=22} better for adjtime than {mult=1,
shift=0}?
I don't
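For reference, a scale/shift pair converts a counter delta to nanoseconds as `(delta * mult) >> shift`. The sketch below is a plain C illustration with a hypothetical `scale_delta` helper; it ignores the overflow handling that real code needs for large deltas.

```c
#include <stdint.h>

/*
 * Convert a counter delta using a fixed-point multiplier:
 * result = delta * mult / 2^shift.
 * Note: delta * mult can overflow 64 bits for large inputs; real
 * implementations use a wider intermediate or pre-shift the delta.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mult, uint32_t shift)
{
    return (delta * mult) >> shift;
}
```

One commonly given reason for preferring a larger shift is granularity: the rate can only be tuned by changing `mult` in whole steps, and with `mult=1, shift=0` the nearest other rates are zero or double, whereas a larger `mult` with a matching `shift` allows much smaller adjustments.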
Anthony Liguori wrote:
Perhaps my grep'ing skills are weak, but I don't seem to see any.
Were you thinking of something in particular?
__pte(), of course. Sheesh. ;)
J
Anthony Liguori wrote:
Perhaps we can just print the banner before batching occurs? Then
it's being printed at the last possible moment.
s/batching/patching/? Yes, that would work.
J
Anthony Liguori wrote:
+static cycle_t read_hyper(void)
+{
+ struct timespec now;
+ int ret;
+
+ ret = kvm_hypercall(KVM_HYPERCALL_GET_KTIME, (u32)&now, 0, 0, 0);
+ WARN_ON(ret);
+
+ return now.tv_nsec + now.tv_sec * (cycles_t)1e9;
Hm, use of FP looks pretty odd.
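The `1e9` literal is a floating-point constant, which is what the remark refers to. A sketch of the same conversion using only integer arithmetic follows; the `_SKETCH` names are hypothetical and chosen to avoid clashing with any existing kernel helpers.

```c
#include <stdint.h>
#include <time.h>

/* Integer constant: no floating-point literal involved. */
#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Convert a timespec to nanoseconds with integer-only arithmetic. */
static uint64_t timespec_to_ns_sketch(const struct timespec *ts)
{
    return (uint64_t)ts->tv_sec * NSEC_PER_SEC_SKETCH
         + (uint64_t)ts->tv_nsec;
}
```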
Anthony Liguori wrote:
Jeremy Fitzhardinge wrote:
Anthony Liguori wrote:
Okay. I may remove this patch from the patch series and attempt to
sit down next week and work out something more complete that also
implements stolen time accounting.
Well, that's a separate problem
Anthony Liguori wrote:
1) Not really sure what is needed for CONFIG_PREEMPT support. I'm not
sure which paravirt_ops calls are actually re-entrant.
I'm not sure that has specifically come up. The main issue is whether a
particular call can be preempted and whether that matters. I guess the
Anthony Liguori wrote:
Regards,
Anthony Liguori
Subject: [PATCH] KVM: Add hypercall queue for paravirt_ops implementation
Author: Anthony Liguori [EMAIL PROTECTED]
Implemented a hypercall queue that can be used when
Jeremy Fitzhardinge wrote:
+static int kvm_hypercall_flush(struct kvm_vcpu *vcpu)
+{
+struct kvm_hypercall_entry *queue;
+struct kvm_vmca *vmca;
+int ret = 0;
+int i;
+
+queue = kmap(vcpu->queue_page);
+vmca = kmap(vcpu->para_state_page);
kmap_atomic
Anthony Liguori wrote:
Hi Jeremy,
Jeremy Fitzhardinge wrote:
Anthony Liguori wrote:
1) Not really sure what is needed for CONFIG_PREEMPT support. I'm not
sure which paravirt_ops calls are actually re-entrant.
I'm not sure that has specifically come up. The main issue is whether
Santos, Jose Renato G wrote:
It seems that we will still need specific devices drivers
for each different virtualization flavor. For example,
we will still need to have a specific Xen netfront
device that talks to a backend device in dom0, using
page grants, and other Xen specific
Rusty Russell wrote:
It was actually Jeremy's paravirt cleanup patch which changed the
calling convention of rdmsr_safe() to match rdmsr().
Oops, my little mind hobgoblin is getting out of control...
J
Andrew Morton wrote:
Which tree are you patching??
-
It looks like it's against the previously posted Cleanup: rationalize
paravirt wrappers patch.
J