Hi, I was extending perf counters to sample the stack of a KVM guest from a module[0].

The current KVM profiling architecture keeps a CPU-local variable, current_vcpu, that points at the vcpu currently running: it is set before vm_enter and cleared after vm_exit. When an NMI occurs, the handler can check current_vcpu and, if the NMI arrived while the VM was running, get guest statistics from it.

What I needed was to sample the guest stack every time an NMI occurs. For that I needed two things:

1. A way to add code that runs when a PMI occurs. This is possible with the public register_nmi_handler API.
2. A way to access the CPU-local variable current_vcpu. This is problematic, since current_vcpu is static.

What I eventually did: since KVM exposes a "setter" for current_vcpu, I scanned the assembly code of the setter, looking for a direct move from a register to gs (where CPU-local variables are stored) plus an offset. I then take this offset and use it to access the current_vcpu variable.

What can fail?

1. The KVM performance implementation is completely changed.
2. The compiler uses different instructions to set CPU-local variables (e.g., accessing the variable by "mov $offset, %r2; mov $value, (%r2)").

I think both cases are unlikely. This mechanism was written in 2010 and had a cosmetic change in 2011 (an access function for CPU-local variables). I think it will be a few years before this approach could fail.

Obviously, the correct approach is to fix perf counters in the kernel to support stack sampling (not trivial). But sometimes you need a solution now, without patching all your host kernels.

I would be grateful for feedback on this approach, and especially on possible pitfalls I haven't considered.
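
To make the byte pattern concrete, here is a minimal userspace sketch of the same kind of scan. It is not the module's code: the function name find_gs_store_disp32 and the macro MOV_STORE_OPCODE are made up for illustration, and it assumes the setter stores a 64-bit register with the absolute form "mov %reg, %gs:disp32". That form encodes as 65 (GS override), a REX prefix with W set, 89 (MOV r/m64, r64), a ModRM byte with mod=00 and rm=100 (SIB follows), a SIB byte with index=100 (no index) and base=101 (disp32 only), and then the four displacement bytes. Unlike the gist, the sketch also rejects REX.X, since with REX.X set an index field of 100 would select r12 instead of "no index".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GS_SEG_OVERRIDE  0x65
#define MOV_STORE_OPCODE 0x89   /* MOV r/m64, r64 (register to memory) */

/*
 * Hypothetical helper: scan code[0..len) for "mov %reg, %gs:disp32" and
 * store the 32-bit displacement in *disp.
 * Returns 0 on success, -1 if no such instruction is found.
 */
static int find_gs_store_disp32(const uint8_t *code, size_t len, uint32_t *disp)
{
    const uint8_t *end = code + len;
    const uint8_t *c = code;

    while (c < end) {
        const uint8_t *p = memchr(c, GS_SEG_OVERRIDE, (size_t)(end - c));

        if (p == NULL)
            return -1;
        c = p + 1;                      /* on mismatch, resume after this 0x65 */

        /* prefix + REX + opcode + ModRM + SIB + disp32 = 9 bytes */
        if ((size_t)(end - p) < 9)
            return -1;                  /* any later match would not fit either */

        /* REX prefix 0100WRXB with W=1 and X=0; R and B may be anything */
        if ((p[1] & 0xfa) != 0x48)
            continue;
        if (p[2] != MOV_STORE_OPCODE)
            continue;
        /* ModRM: mod=00, rm=100 means a SIB byte follows */
        if ((p[3] >> 6) != 0 || (p[3] & 7) != 4)
            continue;
        /* SIB: base=101 with mod=00 means disp32 only; index=100 means none */
        if ((p[4] & 7) != 5 || ((p[4] >> 3) & 7) != 4)
            continue;

        /* The displacement is stored little-endian right after the SIB byte */
        *disp = (uint32_t)p[5] | ((uint32_t)p[6] << 8) |
                ((uint32_t)p[7] << 16) | ((uint32_t)p[8] << 24);
        return 0;
    }
    return -1;
}
```

For example, on the made-up function body 55 48 89 e5 65 48 89 3c 25 40 e3 00 00 5d c3, which contains "mov %rdi, %gs:0xe340", this returns 0 and stores 0xe340. Like the original scan, it matches raw bytes without decoding instruction boundaries, so a 0x65 inside an immediate or displacement could in principle produce a false match.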

The gist of the code is[1]:

    for (;;) {
        u8 *p;

        c = memchr(c, GS_SEG_OVERRIDE, end - c);
        if (c == NULL)
            return -1;
        c++;
        p = c;
        if (!IS_RX_W(*p))
            continue;
        p++;
        if (*p != MOV_M_TO_R_OPCODE)
            continue;
        /* We need direct access to memory with displacement */
        /* Don't care which registers are used */
        p++;
        if (MOD(*p) != 0 || RM(*p) != 0b100)
            continue;
        p++;
        if (BASE(*p) != 0b101 || INDEX(*p) != 0b100)
            continue;
        p++;
        /* grab displacement32 value */
        return *(u32 *)p;
    }

[0] https://github.com/elazarl/gueststack
[1] https://github.com/elazarl/gueststack/blob/master/module.c#L114