[Xen-devel] [linux-arm-xen test] 58875: tolerable FAIL - PUSHED
flight 58875 linux-arm-xen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58875/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-cubietruck 11 guest-start fail pass in 58889-bisect

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail in 58889 never pass
 test-armhf-armhf-xl-sedf-pin 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-sedf 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass

version targeted for testing:
 linux 64972ceb0b0cafc91a09764bc731e1b7f0503b5c
baseline version:
 linux 9f51b5de8c3fdd01a9d692da5633449cc6936688

People who touched revisions under test:
 David S. Miller da...@davemloft.net
 Ian Campbell ian.campb...@citrix.com
 Luis Henriques luis.henriq...@canonical.com
 Wei Liu wei.l...@citrix.com

jobs:
 build-armhf-xsm                 pass
 build-armhf                     pass
 build-armhf-libvirt             pass
 build-armhf-pvops               pass
 test-armhf-armhf-xl             pass
 test-armhf-armhf-libvirt-xsm    pass
 test-armhf-armhf-xl-xsm         pass
 test-armhf-armhf-xl-arndale     pass
 test-armhf-armhf-xl-credit2     pass
 test-armhf-armhf-xl-cubietruck  fail
 test-armhf-armhf-libvirt        pass
 test-armhf-armhf-xl-multivcpu   pass
 test-armhf-armhf-xl-sedf-pin    pass
 test-armhf-armhf-xl-sedf        pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc.
are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

+ branch=linux-arm-xen
+ revision=64972ceb0b0cafc91a09764bc731e1b7f0503b5c
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
        use Osstest;
        readglobalconfig();
        print $c{Repos} or die $!;
    '
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push linux-arm-xen 64972ceb0b0cafc91a09764bc731e1b7f0503b5c
+ branch=linux-arm-xen
+ revision=64972ceb0b0cafc91a09764bc731e1b7f0503b5c
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
        use Osstest;
        readglobalconfig();
        print $c{Repos} or die $!;
    '
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case $branch in
+ tree=linux
+ xenbranch=xen-unstable
+ '[' xlinux = xlinux ']'
+ linuxbranch=linux-arm-xen
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
        use Osstest;
        readglobalconfig();
        print $c{OsstestUpstream} or die $!;
    '
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git
+++
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On Wednesday 17 June 2015 07:59 PM, Ian Campbell wrote:
On Wed, 2015-06-17 at 07:14 -0700, Manish Jaggi wrote:
On Wednesday 17 June 2015 06:43 AM, Ian Campbell wrote:
On Wed, 2015-06-17 at 13:58 +0100, Stefano Stabellini wrote:

Yes, pciback is already capable of doing that, see drivers/xen/xen-pciback/conf_space.c

I am not sure if the pci-back driver can query the guest memory map. Is there an existing hypercall?

No, that is missing. I think it would be OK for the virtual BAR to be initialized to the same value as the physical BAR. But I would let the guest change the virtual BAR address and map the MMIO region wherever it wants in the guest physical address space with XENMEM_add_to_physmap_range.

I disagree. Given that we've apparently survived for years with x86 PV guests not being able to write to the BARs, I think it would be far simpler to extend this to ARM and x86 PVH too than to allow guests to start writing BARs, which has various complex questions around it. All that's needed is for the toolstack to set everything up and write some new xenstore nodes in the per-device directory with the BAR address/size.

Also, most guests apparently don't reassign the PCI bus by default, so using a 1:1 by default and allowing it to be changed would require modifying the guests to reassign. Easy on Linux, but I don't know about others, and I imagine some OSes (especially simpler/embedded ones) are assuming the firmware sets up something sane by default.

Does the flow below capture all points?

a) When assigning a device to domU, toolstack creates a node in the per-device directory with virtual BAR address/size

Option 1:
b) toolstack, using some hypercall, asks xen to create the p2m mapping { virtual BAR : physical BAR } for domU

While implementing, I think that rather than the toolstack, the pciback driver in dom0 can send the hypercall to map the physical BAR to the virtual BAR. Thus no xenstore entry is required for BARs. Moreover, a pci driver would read BARs only once.

c) domU will not update the BARs at any time; if it does then it is a fault, till we decide how to handle it

As Julien has noted, pciback already deals with this correctly: because sizing a BAR involves a write, it implements a scheme which allows either the hardcoded virtual BAR to be written or all 1s (needed for size detection).

d) when domU queries the BAR address from pci-back, the virtual BAR address is provided.

Option 2:
b) domU will not update the BARs at any time; if it does then it is a fault, till we decide how to handle it
c) when domU queries the BAR address from pci-back, the virtual BAR address is provided.
d) domU sends a hypercall to map virtual BARs,
e) xen pci code reads the BAR and maps { virtual BAR : physical BAR } for domU

Which option is better? I think Ian is for (2) and Stefano may be for (1).

In fact I'm now (after Julien pointed out the current behaviour of pciback) in favour of (1), although I'm not sure if Stefano is too.

(I was never in favour of (2), FWIW. I previously was in favour of (3), which is like (2) except pciback makes the hypercall to map the virtual BARs to the guest. I'd still favour that over (2), but (1) is now my preference.)

Ian.

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
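The BAR write policy described in this thread (a guest write is accepted only if it is all 1s, for size probing, or the already-assigned virtual BAR value) can be sketched roughly as below. This is an illustration under stated assumptions, not the xen-pciback code: `struct vbar`, `vbar_write` and `vbar_read` are hypothetical names, only a 32-bit memory BAR is modelled, and returning an error for any other value is just one possible way to treat the "fault" case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit memory BAR as seen by the guest. */
struct vbar {
    uint32_t base;    /* virtual BAR address fixed by toolstack/backend */
    uint32_t size;    /* region size in bytes, assumed a power of two */
    bool     sizing;  /* true after the guest has written all 1s */
};

/* Returns 0 if the write is accepted, -1 if it is refused. */
static int vbar_write(struct vbar *b, uint32_t val)
{
    if (val == 0xFFFFFFFFu) {   /* size probe */
        b->sizing = true;
        return 0;
    }
    if (val == b->base) {       /* restoring the fixed value */
        b->sizing = false;
        return 0;
    }
    return -1;                  /* guest tried to move the BAR */
}

/* While sizing, report the size mask; the low flag bits are left as 0 here. */
static uint32_t vbar_read(const struct vbar *b)
{
    return b->sizing ? ~(b->size - 1) : b->base;
}
```

With this shape the guest can still run the usual probe sequence (write all 1s, read back the mask, restore the original value) without ever being able to relocate the region.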
Re: [Xen-devel] [PATCH v4 RFC 6/6] x86/MSI: properly track guest masking requests
On 24.06.15 at 19:24, andrew.coop...@citrix.com wrote:
On 22/06/15 15:51, Jan Beulich wrote:

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1308,6 +1308,39 @@ printk("%04x:%02x:%02x.%u: MSI-X %03x:%u
         return 1;
     }

+    entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);
+    if ( entry && entry->msi_attrib.maskbit )
+    {
+        uint16_t cntl;
+        uint32_t unused;
+
+        pos = entry->msi_attrib.pos;
+        if ( reg < pos || reg >= entry->msi.mpos + 8 )
+            return 0;
+printk("%04x:%02x:%02x.%u: MSI %03x:%u-%04x\n", seg, bus, slot, func, reg, size, *data);//temp
+
+        if ( reg == msi_control_reg(pos) )
+            return size == 2 ? 1 : -EACCES;
+        if ( reg < entry->msi.mpos || reg >= entry->msi.mpos + 4 || size != 4 )
+            return -EACCES;

Can we avoid using EACCES to avoid confusing it with a mismatched tools version?

What other suitable error code would you see here? I'm not sure we want this error code to be reserved for exactly one purpose, all the more so as here we're on a path that will never have this error code returned to the guest (and even less so via a domctl/sysctl, which would be the primary mismatched-tools-version candidates).

It's also odd that you ask for this here, when patch 2 has a use of this error code too.

Jan
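The return convention used by the quoted hunk (0 when the register lies outside the MSI capability, 1 when the write is recognised, a negative errno when it is refused) can be looked at in isolation. The following is only a sketch of that convention: `struct msi_cap` and `msi_write_filter` are hypothetical, and the control register is assumed to sit at offset 2 within the capability, as in the PCI MSI capability layout, rather than being obtained from the helper used in the patch.

```c
#include <errno.h>

/* Hypothetical, simplified view of an MSI capability with a mask register. */
struct msi_cap {
    unsigned int pos;   /* config-space offset of the capability */
    unsigned int mpos;  /* config-space offset of the mask register */
};

/*
 * Returns 0 if the register is not covered (caller carries on as usual),
 * 1 if the write is recognised, -EACCES if the write is refused.
 */
static int msi_write_filter(const struct msi_cap *c, unsigned int reg,
                            unsigned int size)
{
    if (reg < c->pos || reg >= c->mpos + 8)
        return 0;
    if (reg == c->pos + 2)                  /* message control, 16 bits */
        return size == 2 ? 1 : -EACCES;
    if (reg < c->mpos || reg >= c->mpos + 4 || size != 4)
        return -EACCES;                     /* only 32-bit mask writes pass */
    return 1;
}
```

Seen this way, the error value never leaves the hypervisor's config-space intercept path, which is the point made in the reply about it not reaching the guest.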
Re: [Xen-devel] Hyper and Xen Project
On 25 Jun 2015, at 02:46, Wang Xu gna...@gmail.com wrote:

Agree, but I think the document is a bit confusing:

"It is important that channel names are globally unique."
https://github.com/mirage/xen/blob/master/docs/misc/channel.txt#L94

I agree, that wording is definitely confusing. Perhaps the docs should compare the channel names to TCP/UDP port numbers? We could say that
- the IANA port registry ~=~ the channel registry in the docs/ directory
- a single IP can only have one binding for a particular port at a time ~=~ a domain can only have one binding for a particular name at a time
- lots of IPs can bind the same port ~=~ lots of domains can bind the same name

What do you think?

Thanks,
Dave

On Thu, Jun 25, 2015 at 2:29 AM Dave Scott dave.sc...@citrix.com wrote:

Hi Xu,

On 24 Jun 2015, at 14:44, Wang Xu gna...@gmail.com wrote:

Thank you Dave, I think I can also work around that. By the way, the document says the name should be globally unique, but I can start 2 domains that have channels with the same name. Are there any potential problems?

The name needs to be unique within a domain. It’s ok to have
1. domid 10, channel name ‘agent’
2. domid 11, channel name ‘agent’
This will be common, as multiple domains will have the same ‘agent’ software installed.

But it will cause problems if the name is used twice within a domain. It’s a bad idea to have
1: domid 10, channel name ‘agent’
2: domid 10, channel name ‘agent’
Although this will create 2 distinct /dev/hvc devices, it will be difficult to tell which is which.

If libxl allows the name to be duplicated within a domain, then this is my fault. We should add validation code to check uniqueness.

Thanks,
Dave

Cheers
Xu

On Wed, Jun 24, 2015 at 9:03 PM Dave Scott dave.sc...@citrix.com wrote:

I don’t think the frontend driver in Linux knows about the name key. In my testing I wrote a udev script which looks up the ‘name’ key directly in xenstore and created a named device node using that. For reference my script is here:

https://github.com/mirage/mirage-console/blob/master/udev/xenconsole-setup-tty

Cheers,
Dave

and I directly test `/dev/hvc1`, and it could communicate with the outside socket. Is there some mistake in my channel name configuration?

| static void hyper_config_channel(libxl_device_channel* ch, const char* name, const char* sock, int devid) {
|     libxl_device_channel_init(ch);
|     ch->backend_domid = 0;
|     ch->name = strdup(name);
|     ch->devid = devid;
|     ch->connection = LIBXL_CHANNEL_CONNECTION_SOCKET;
|     ch->u.socket.path = strdup(sock);
| }

I tried to look at the oVirt code as it is mentioned in the doc, but I did not find the xen console in its guest agent code.

So the issue is that the name you assign here to the channel doesn't come up anywhere in the guest. Is that correct?

Thank you!

On Tue, Jun 23, 2015 at 7:30 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:
On Tue, 23 Jun 2015, Wang Xu wrote:
On Sat, Jun 20, 2015 at 1:10 AM Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:

Integrating hyper with Xen using libxl was the right decision and it looks like you did a good job. I think that you can go ahead with the PR!

But I did have a few issues building hyper. I am getting:

hyperd.go:11:2: cannot find package hyper/daemon in any of: [...]

I tried with a clean 0.2-dev branch:

./autogen.sh
./configure
make

It looks ok. Are you working on the 0.2-dev branch? I did not write the branch name in the instructions in the Readme, sorry for that.

No worries, the most important part at this stage is the code, and that looks OK :-)

Yes, I was using 0.2-dev and followed those steps. As I usually don't program in go, it is likely that my go working environment is missing something, or my go paths are wrong.
This is the full error message:

CGO_LDFLAGS="-Lhypervisor/xen -lxenlight -lxenctrl -lhyperxl" godep go build hyperd.go
hyperd.go:11:2: cannot find package hyper/daemon in any of:
        /local/scratch/sstabellini/go/src/hyper/daemon (from $GOROOT)
        /local/scratch/sstabellini/hyper/Godeps/_workspace/src/hyper/daemon (from $GOPATH)
hyperd.go:10:2: cannot find package hyper/engine in any of:
        /local/scratch/sstabellini/go/src/hyper/engine (from $GOROOT)
        /local/scratch/sstabellini/hyper/Godeps/_workspace/src/hyper/engine (from $GOPATH)
hyperd.go:12:2: cannot find package hyper/lib/glog in any of:
        /local/scratch/sstabellini/go/src/hyper/lib/glog (from $GOROOT)
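On the channel naming question raised earlier in this thread: the rule Dave describes (a name may repeat across domains but must be unique within one domain) and the validation he suggests adding could look roughly like the check below. This is a minimal sketch; `channel_names_unique` is a hypothetical helper, not an existing libxl function, and it takes the list of channel names configured for a single domain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if no two channel names configured for ONE domain are equal. */
static bool channel_names_unique(const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(names[i], names[j]) == 0)
                return false;   /* same name used twice in this domain */
    return true;
}
```

The quadratic scan is deliberate: a domain is expected to have only a handful of channels, so nothing cleverer seems warranted.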
Re: [Xen-devel] [PATCH 5/9] x86/pvh: Set PVH guest's mode in XEN_DOMCTL_set_address_size
On 24.06.15 at 18:21, boris.ostrov...@oracle.com wrote:
On 06/24/2015 08:10 AM, Jan Beulich wrote:
On 24.06.15 at 13:42, boris.ostrov...@oracle.com wrote:
On 06/24/2015 03:57 AM, Jan Beulich wrote:
On 24.06.15 at 04:53, boris.ostrov...@oracle.com wrote:
On 06/23/2015 09:22 AM, Jan Beulich wrote:

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2320,12 +2320,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
     v->arch.hvm_vcpu.inject_trap.vector = -1;

     if ( is_pvh_domain(d) )
-    {
-        v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
-        /* This is for hvm_long_mode_enabled(v). */
-        v->arch.hvm_vcpu.guest_efer = EFER_LMA | EFER_LME;
         return 0;
-    }

With this removed, is there any guarantee that hvm_set_mode() will be called for each vCPU?

IIUIC, the toolstack is required to call XEN_DOMCTL_set_address_size, which results in a call to switch_compat/native(), which loops over all VCPUs, calling set_mode.

I don't recall this being a strict requirement. I think a PV 64-bit guest would start fine without.

We do call it via libxl__build_pv() -> xc_dom_boot_mem_init() -> arch_setup_mem_init() -> x86_compat().

Right, that's in our tool stack. The question though was whether it's a requirement to be called.

Since this change will assume that this domctl is called for both 32- and 64-bit --- yes, this becomes a requirement for 64-bit PVH guests.

But that's the whole point of my question - it isn't right now, and hence I don't think it should become a requirement. Instead I think state should start out to be ready for a 64-bit guest just like it does for PV.

Jan
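The position taken at the end of this exchange (vCPU state starts out ready for a 64-bit guest, and the address-size domctl is only needed to switch away from that) can be modelled in a few lines. This is a toy sketch, not Xen code: `struct vcpu_state`, `struct dom_state`, `dom_init` and `set_address_size` are all hypothetical stand-ins for the real vCPU initialisation and the XEN_DOMCTL_set_address_size handling.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCPUS 4

/* Hypothetical, heavily simplified per-vCPU and per-domain state. */
struct vcpu_state { bool long_mode; };
struct dom_state {
    struct vcpu_state vcpu[MAX_VCPUS];
    size_t nr_vcpus;
};

/* Every vCPU starts out ready for a 64-bit guest. */
static void dom_init(struct dom_state *d, size_t nr_vcpus)
{
    d->nr_vcpus = nr_vcpus > MAX_VCPUS ? MAX_VCPUS : nr_vcpus;
    for (size_t i = 0; i < d->nr_vcpus; i++)
        d->vcpu[i].long_mode = true;
}

/* Optional later switch; loops over all vCPUs. Returns 0, or -1 on bad input. */
static int set_address_size(struct dom_state *d, unsigned int bits)
{
    if (bits != 32 && bits != 64)
        return -1;
    for (size_t i = 0; i < d->nr_vcpus; i++)
        d->vcpu[i].long_mode = (bits == 64);
    return 0;
}
```

In this model a toolstack that never calls `set_address_size` still ends up with a consistent 64-bit domain, which is the property being asked for.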
[Xen-devel] [linux-linus test] 58873: regressions - FAIL
flight 58873 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58873/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58793
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 58793
 test-amd64-i386-xl-qemuu-debianhvm-amd64 9 debian-hvm-install fail REGR. vs. 58793
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 18 guest-start/debianhvm.repeat fail REGR. vs. 58793
 build-armhf-pvops 5 kernel-build fail REGR. vs. 58793

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 58793
 test-amd64-amd64-libvirt 11 guest-start fail like 58793
 test-amd64-i386-freebsd10-amd64 9 freebsd-install fail like 58793
 test-amd64-i386-freebsd10-i386 9 freebsd-install fail like 58793
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58793
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58793

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-xsm 1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-xsm 1 build-check(1) blocked n/a
 test-armhf-armhf-xl 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-credit2 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-cubietruck 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-sedf-pin 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-multivcpu 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-sedf 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvh-intel 13 guest-saverestore fail never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass

version targeted for testing:
 linux 6eae81a5e2d6646a61146501fd3032a340863c1d
baseline version:
 linux d2228e4310612a1289c343bcf819831a74ae0366

551 people touched revisions under test, not listing them all

jobs:
 build-amd64-xsm                                        pass
 build-armhf-xsm                                        pass
 build-i386-xsm                                         pass
 build-amd64                                            pass
 build-armhf                                            pass
 build-i386                                             pass
 build-amd64-libvirt                                    pass
 build-armhf-libvirt                                    pass
 build-i386-libvirt                                     pass
 build-amd64-pvops                                      pass
 build-armhf-pvops                                      fail
 build-i386-pvops                                       pass
 build-amd64-rumpuserxen                                pass
 build-i386-rumpuserxen                                 pass
 test-amd64-amd64-xl                                    pass
 test-armhf-armhf-xl                                    blocked
 test-amd64-i386-xl                                     pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm          pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm           fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm          pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm           pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm  pass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm                           pass
 test-armhf-armhf-libvirt-xsm                           blocked
 test-amd64-i386-libvirt-xsm                            pass
 test-amd64-amd64-xl-xsm                                pass
 test-armhf-armhf-xl-xsm                                blocked
 test-amd64-i386-xl-xsm                                 pass
 test-amd64-amd64-xl-pvh-amd                            fail
Re: [Xen-devel] [PATCH] xen: new maintainer for the RTDS scheduler
2015-06-25 5:44 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com:

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
---
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Meng Xu xumengpa...@gmail.com
---
 MAINTAINERS |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b1068e..e6616d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -282,6 +282,11 @@ F: tools/libxl/libxl_nonetbuffer.c
 F: tools/hotplug/Linux/remus-netbuf-setup
 F: tools/hotplug/Linux/block-drbd-probe

+RTDS SCHEDULER
+M: Dario Faggioli dario.faggi...@citrix.com
+S: Supported
+F: xen/common/sched_rt.c

I'm not sure if the following response is correct and proper, just in case it is correct. :-)

Reviewed-and-Acked-by: Meng Xu men...@cis.upenn.edu

Thanks,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Re: [Xen-devel] [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
On 2015/6/25 19:23, Wei Liu wrote:
On Tue, Jun 23, 2015 at 05:57:25PM +0800, Tiejun Chen wrote:

While building a VM, HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields to construct guest e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should check them to fix any conflict with RAM.

RAM -> RDM?

Fixed.

RMRR can reside in address space beyond 4G theoretically, but we never

[snip]

+static struct xen_reserved_device_memory
+*xc_device_get_rdm(libxl__gc *gc,
+                   uint32_t flag,
+                   uint16_t seg,
+                   uint8_t bus,
+                   uint8_t devfn,
+                   unsigned int *nr_entries)

I just noticed this function lives in libxl_dm.c. The function should be renamed to libxl__xc_device_get_rdm. This function should return a proper libxl error code (ERROR_FAIL or something more appropriate). The allocated RDM entries should be

ERROR_FAIL is better. So I refactored this function after addressing all your comments:

static int
libxl__xc_device_get_rdm(libxl__gc *gc,
                         uint32_t flag,
                         uint16_t seg,
                         uint8_t bus,
                         uint8_t devfn,
                         unsigned int *nr_entries,
                         struct xen_reserved_device_memory *xrdm)
{
    int rc;

    /*
     * We really can't presume how many entries we can get in advance.
     */
    *nr_entries = 0;
    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                       NULL, nr_entries);
    assert(rc <= 0);
    /* 0 means we have no any rdm entry. */
    if (!rc)
        goto out;

    if (errno == ENOBUFS) {
        xrdm = libxl__malloc(gc,
                             *nr_entries * sizeof(xen_reserved_device_memory_t));
        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
                                           xrdm, nr_entries);
        if (rc) {
            LOG(ERROR, "Could not get reserved device memory maps.\n");
            rc = ERROR_FAIL;
        }
    } else {
        LOG(ERROR, "Could not get reserved device memory maps.\n");
        rc = ERROR_FAIL;
    }

 out:
    if (rc) {
        *nr_entries = 0;
        xrdm = NULL;
    }
    return rc;
}

Thanks
Tiejun
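The function posted above follows a common two-call pattern: ask once with no buffer to learn the entry count, then allocate and ask again. A self-contained sketch of that pattern follows, with the caveat that everything in it is illustrative: `fake_query` merely stands in for the real libxc call (its ENOBUFS behaviour is inferred from the code in the mail), and `struct rdm_entry` and `get_rdm` are hypothetical. One deliberate difference from the posted code is that the buffer is returned through a pointer-to-pointer, since a pointer passed by value cannot hand the allocation back to the caller.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct rdm_entry { unsigned long start_pfn, nr_pages; };

/* Stand-in data source: two fixed entries. */
static const struct rdm_entry fake_map[] = { {0xab000, 0x10}, {0xcd000, 0x20} };
#define FAKE_NR ((unsigned int)(sizeof(fake_map) / sizeof(fake_map[0])))

/* Fails with ENOBUFS and reports the needed count if the buffer is too small. */
static int fake_query(struct rdm_entry *buf, unsigned int *nr)
{
    if (*nr < FAKE_NR) {
        *nr = FAKE_NR;
        errno = ENOBUFS;
        return -1;
    }
    memcpy(buf, fake_map, sizeof(fake_map));
    *nr = FAKE_NR;
    return 0;
}

/* Returns 0 on success (possibly with *nr == 0), -1 on failure. */
static int get_rdm(struct rdm_entry **out, unsigned int *nr)
{
    int rc;

    *out = NULL;
    *nr = 0;
    rc = fake_query(NULL, nr);          /* first call: learn the count */
    if (rc == 0)
        return 0;                       /* no entries at all */
    if (errno != ENOBUFS)
        return -1;

    *out = malloc(*nr * sizeof(**out));
    if (*out == NULL) {
        *nr = 0;
        return -1;
    }
    rc = fake_query(*out, nr);          /* second call: fetch the entries */
    if (rc) {
        free(*out);
        *out = NULL;
        *nr = 0;
        return -1;
    }
    return 0;
}
```

In libxl proper the allocation would come from the gc rather than `malloc`, so the explicit `free` on the error path would not be needed there.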
Re: [Xen-devel] [PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file
On 6/25/2015 at 07:09 PM, in message 21899.57676.368102.982...@mariner.uk.xensource.com, Ian Jackson ian.jack...@eu.citrix.com wrote:

Chunyan Liu writes ("[PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file"):

A sysfs file has size=4096 but the actual file content is less than that. The current libxl_read_file_contents treats it as an error when the file size and the actual file content differ, so reading sysfs file content with this function always fails. Add a new entry libxl_read_sysfs_file_contents to handle sysfs files specially. It would be used in later pvusb work.

I think this still fails to detect a situation where the file is unexpectedly longer than the requested size?

+        } else if (feof(f)) {
+            if (rs < datalen && tolerate_shrinking_file) {
+                datalen = rs;
+            } else {

If the file is bigger than the requested size, it will fall to this branch and report an error. Do you mean I should report another error message separately?

- Chunyan

+                LOG(ERROR, "%s changed size while we were reading it",
+                    filename);
+                goto xe;
+            }
+        } else {

As we wrote earlier:

Is there any risk that the file is actually bigger than advertised, rather than smaller?

For sysfs file, couldn't be bigger.

Then you should detect the condition that the file is bigger, and call it an error.

Thanks,
Ian.
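The two conditions being discussed (content shorter than the advertised size, which sysfs makes normal, versus content longer than it, which Ian asks to be treated as an error) can be separated explicitly. The sketch below is not the libxl implementation: `read_advertised` is a hypothetical helper built only on standard stdio calls, and it detects the "longer" case by trying to read one byte past the advertised size.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Reads up to `advertised` bytes from f into buf.
 * Returns the number of bytes read, or -1 on error:
 *  - shorter than advertised: accepted only if tolerate_shrinking is set;
 *  - longer than advertised: always an error.
 */
static long read_advertised(FILE *f, char *buf, size_t advertised,
                            bool tolerate_shrinking)
{
    size_t rs = fread(buf, 1, advertised, f);

    if (ferror(f))
        return -1;
    if (rs < advertised)                    /* hit EOF early */
        return tolerate_shrinking ? (long)rs : -1;
    if (fgetc(f) != EOF || ferror(f))       /* more data than advertised */
        return -1;
    return (long)rs;
}
```

For a sysfs attribute the caller would pass the size reported by stat (4096 in the case described) with `tolerate_shrinking` set, and would get back the real content length.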
Re: [Xen-devel] [PATCH] Stepping up for being the maintainer of sched_rt.c
2015-06-25 5:44 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com:

I've been involved with this scheduler from the very beginning of the upstreaming process (from the RT-Xen project to here).

Right! Thank you, Dario, for your help and advice! :-)

I've been working with Meng and his group closely since then, and I now feel comfortable to be the one that will (N)Ack their patches! :-)

I'm not sure what I should reply, but I'm raising my hands and feet to vote for it. :-)

Thanks,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Re: [Xen-devel] vTPM issues
It worked straight away on Ubuntu 15.04. Thanks a lot for your advice.

On 25 Jun 2015, at 11:52, Emil Condrea emilcond...@gmail.com wrote:

Timeouts have the standard values. Good luck with installing 15.04.

On Thu, Jun 25, 2015 at 12:34 PM, Marcos Simó Picó marco...@kth.se wrote:

Okay, /etc/tpm0 is present. The timeout values are:

752000 200 752000 752000 [adjusted]

I have no problem actually upgrading to Ubuntu 15.04 if that might solve the problem.

Thanks a lot for your reply again.

From: Emil Condrea emilcond...@gmail.com
Sent: Thursday, 25 June 2015 11:22
To: Marcos Simó Picó
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] vTPM issues

Sorry, I misspelled, I meant /dev/tpm0 not /etc/tpm0.

I remember that once I had this problem, when almost all trousers commands were returning internal software error in domU. Can you check what the timeout values are?

cat /sys/devices/vtpm-0/timeouts

I remember that there was a bug in Ubuntu 14.04 regarding the tpm driver. You could try 14.04.2. I am using Ubuntu 15.04 as domU guest and tpm commands run successfully.

On Thu, Jun 25, 2015 at 12:10 PM, Marcos Simó Picó marco...@kth.se wrote:

Yes, I'm indeed using pv guests. After running #tcsd -f I get:

TCSD TDDL ioctl: (25) Inappropriate ioctl for device
TCSD TDDL Falling back to Read/Write device support.
TCSD trousers 0.3.5git: TCSD up and running.

I don't know if the problem might be there. When I invoke tpm_takeownership -z -y -l debug it returns exactly the same messages I sent in my previous email.

On the other hand, /sys/devices/vtpm-0 is present, but /etc/tpm0 is not.

Thanks for your reply.

From: Emil Condrea emilcond...@gmail.com
Sent: Thursday, 25 June 2015 10:21
To: Marcos Simó Picó
Cc: xen-devel@lists.xen.org; Xu, Quan
Subject: Re: [Xen-devel] vTPM issues

I guess you are using pv guests; I don't know exactly if Quan finished development for hvm. I suggest taking a look at the tcsd log:

pkill tcsd
tcsd -f
tpm_takeownership -z -y -l debug

Also, can you see if /sys/devices/vtpm-0 and /dev/tpm0 are present?

On Wed, Jun 24, 2015 at 6:16 PM, Marcos Simó Picó marco...@kth.se wrote:

Hello everyone,

I would like to try the vTPM feature, but I'm having some issues. Basically, I followed the steps explained in https://mhsamsal.wordpress.com/2013/12/05/configuring-virtual-tpm-vtpm-for-xen-4-3-guest-virtual-machines/

I'm running Ubuntu 14.04 as Dom0 on a Dell optiplex-9020. I compiled Xen 4.5.0 from source. After creating vtpmmgr and vtpm stubdoms, and DomU, I can invoke tpm_version from DomU:

root@DomU:/home/xen# tpm_version
TPM 1.2 Version Info:
Chip Version: 1.2.0.7
Spec Level: 2
Errata Revision: 1
TPM Vendor ID: ETHZ
TPM Version: 0101
Manufacturer Info: 4554485a

I can also see the PCRs status by invoking cat /sys/class/misc/tpm0/device/pcrs. However, most of the commands return an error. When I invoke takeownership I get the following error:

root@DomU:/home/xen# tpm_takeownership -y -z -l debug
Tspi_Context_Create success
Tspi_Context_Connect success
Tspi_Context_GetTpmObject success
Tspi_GetPolicyObject success
Tspi_Policy_SetSecret success
Tspi_Context_CreateObject success
Tspi_GetPolicyObject success
Tspi_Policy_SetSecret success
Tspi_TPM_TakeOwnership failed: 0x2004 - layer=tcs, code=0004 (4), Internal software error
Tspi_Context_CloseObject success
Tspi_Context_FreeMemory success
Tspi_Context_Close success

The same error is given when invoking tpm_getpubkey.

I have already tried after clearing the TPM from BIOS, after having taken ownership, and with ownership not taken, with the same result when using the vTPM. I have also installed Xen 4.3.4, with the same result too.

In the end, I would like to use the vTPM to generate and use RSA keys for TLS session establishment (using the API provided with GnuTLS). Since I cannot take ownership of the vTPM, GnuTLS's tpmtool complains that it doesn't find any SRK.

I really appreciate any help you can provide.

Best regards,
Marcos
Re: [Xen-devel] [PATCH v2 09/12] x86/altp2m: add remaining support routines.
On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
On Wed, Jun 24, 2015 at 2:06 PM, Ed White edmund.h.wh...@intel.com wrote:
On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:

+bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
+                                 unsigned long pfn, xenmem_access_t access)
+{

This function IMHO should be merged with p2m_set_mem_access and should be triggerable with the same memop (XENMEM_access_op) hypercall instead of introducing a new hvmop one.

I think we should vote on this. My view is that it makes XENMEM_access_op too complicated to use.

The two functions are not very long and share enough code that it would justify merging. The only big change added is the copy from host->alt when the entry doesn't exist in alt, and that itself is pretty self contained. Let's see if we can get a third opinion on it..

At first sight (I admit I'm rather late in the game and haven't had a chance to follow the series closely from the beginning), the two functions do seem to be mergeable (or at least the common code could be factored out into static helper functions).

Also, if Ed's concern is that the libxc API would look unnatural if xc_set_mem_access() is used for both purposes, as far as I can tell the only difference could be a non-zero last altp2m parameter, so I agree with you that the fewer functions doing almost the same thing the better (I have been guilty of this in the past too, for example with my xc_enable_introspection() function ;) ).

So I'd say, yes, if possible merge them.

So here are my reasons why I don't think we should merge the hypercalls, in more detail:

Although the two hypercalls are similar, they are not identical. For one thing, the existing hypercall can only be used cross-domain whereas the altp2m one can be used cross-domain or intra-domain. Also, the existing hypercall can be used to modify a range of pages and the new one can only modify a single page, and that is intentional.

As I see it, the implementation in hvm.c would become a lot less clean, and every direct user of the existing hypercall would have to change for no good reason.

Razvan's suggestion to merge the functions that implement the p2m changes I'm more ambivalent about. Personally, I prefer not to have code that contains lots of conditional logic, which would be the result, but I don't feel that strongly about it.

Ed

Ravi: This also has implications for the XSM hooks used for these hypercalls - the altp2m default policy is to allow intra-domain use, which is not the case for XENMEM_access_op. Any thoughts on how to manage this difference if we merge them?

Ravi
[Xen-devel] [linux-3.4 test] 58878: regressions - FAIL
flight 58878 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58878/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64 6 xen-boot fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail in 58831 pass in 58878
 test-amd64-amd64-pair 10 xen-boot/dst_host fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host fail pass in 58798
 test-amd64-amd64-xl-sedf-pin 6 xen-boot fail pass in 58798
 test-amd64-i386-pair 10 xen-boot/dst_host fail pass in 58831
 test-amd64-i386-pair 9 xen-boot/src_host fail pass in 58831

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-libvirt-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-multivcpu 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-libvirt-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-sedf 6 xen-boot fail like 30406
 test-amd64-i386-libvirt 11 guest-start fail like 30511
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64 6 xen-boot fail like 53709-bisect
 test-amd64-i386-freebsd10-amd64 6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3 6 xen-boot fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel 6 xen-boot fail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386 6 xen-boot fail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 6 xen-boot fail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 6 xen-boot fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3 6 xen-boot fail like 58804-bisect
 test-amd64-i386-freebsd10-i386 6 xen-boot fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64 6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3 6 xen-boot fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3 6 xen-boot fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 6 xen-boot fail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64 6 xen-boot fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64 6 xen-boot fail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64 6 xen-boot fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel 6 xen-boot fail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64 6 xen-boot fail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt 12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass

version targeted for testing:
 linux cf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linux bb4a05a0400ed6d2f1e13d1f82f289ff74300a70

500 people touched revisions under test, not listing them all

jobs:
 build-amd64-xsm          pass
 build-i386-xsm           pass
 build-amd64              pass
 build-i386               pass
 build-amd64-libvirt      pass
 build-i386-libvirt       pass
 build-amd64-pvops        pass
 build-i386-pvops         pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl      pass
 test-amd64-i386-xl
Re: [Xen-devel] [PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file
On 6/25/2015 at 07:09 PM, in message 21899.57676.368102.982...@mariner.uk.xensource.com, Ian Jackson ian.jack...@eu.citrix.com wrote: Chunyan Liu writes ([PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file): Sysfs file has size=4096 but actual file content is less than that. Current libxl_read_file_contents will treat it as error when file size and actual file content differs, so reading sysfs file content with this function always fails. Add a new entry libxl_read_sysfs_file_contents to handle sysfs file specially. It would be used in later pvusb work. I think this still fails to detect a situation where the file is unexpectedly longer than the requested size ? +} else if (feof(f)) { +if (rs < datalen && tolerate_shrinking_file) { +datalen = rs; +} else { If the file is bigger than the requested size, it will fall to this branch and report error. Do you mean I should report another error message separately? - Chunyan +LOG(ERROR, "%s changed size while we were reading it", +filename); +goto xe; +} +} else { As we wrote earlier: Is there any risk that the file is actually bigger than advertised, rather than smaller ? For sysfs file, couldn't be bigger. Then you should detect the condition that the file is bigger, and call it an error. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
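The behaviour asked for in the review above (accept a file that is shorter than its advertised size, reject one that is longer) could be sketched roughly as below. This is not libxl code: read_bounded() and its parameters are hypothetical names made up for illustration, and the "one extra byte" probe is just one possible way to detect an over-long file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of libxl: read at most `advertised`
 * bytes from `f` into a newly allocated buffer.
 *
 *  - fewer bytes than advertised: accepted only if `tolerate_short`
 *  - more bytes than advertised:  always an error
 *
 * Returns 0 on success (caller frees *out), -1 on any error.
 */
static int read_bounded(FILE *f, size_t advertised, int tolerate_short,
                        char **out, size_t *outlen)
{
    char *buf;
    size_t got;

    assert(f && out && outlen);

    buf = malloc(advertised ? advertised : 1);
    if (!buf)
        return -1;

    got = fread(buf, 1, advertised, f);
    if (ferror(f))
        goto fail;

    if (got < advertised) {
        /* File shrank (or, as with sysfs, over-reported its size). */
        if (!tolerate_short)
            goto fail;
    } else if (fgetc(f) != EOF || ferror(f)) {
        /* At least one byte beyond the advertised size: report it. */
        goto fail;
    }

    *out = buf;
    *outlen = got;
    return 0;

fail:
    free(buf);
    return -1;
}
```

A sysfs-style caller would pass the stat() size as `advertised` with `tolerate_short` set; a regular-file caller would leave `tolerate_short` clear and so keep the strict size check.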
Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job
Pang, LongtaoX writes (RE: [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job): -Original Message- From: Ian Campbell [mailto:ian.campb...@citrix.com] ... I think you are correct, the logs capture will fail too. I'll leave it to Ian to suggest a solution since it will no doubt involve some tcl plumbing (I'd be inclined to record 'hosts which are actually guests' somewhere and have the infra clean them up automatically after doing leak check and log collection). Sorry I haven't done this yet, it's still on my radar. I was thinking more along the lines of creating Osstest/PDU/guest.pm with the appropriate methods calling out to toolstack($l0)->foo, setting $ho->{Power} = 'guest $l1guestname' somewhere and allowing power_cycle_host_setup to do its thing. I have reviewed the power_cycle_host_setup function. Inside, it calls get_host_method_object, so we get a $mo which will be assigned to $ho->{PowerMethobjs}, right? Inside the power_state function, it will call pdu_power_state, which is defined in guest.pm, right? Yes. So, I need to define how to power off/on L1 inside the pdu_power_state function? I think we need to use the 'xl destroy' and 'xl create' commands to implement the power method. Indeed. You'll need to use the appropriate toolstack object, in case it's libvirt or something. toolstack($ho) where $ho is the L0. Ian.
Re: [Xen-devel] [PATCH] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access
On Tue, 2015-06-23 at 18:25 +0200, Vitaly Kuznetsov wrote: Jan Beulich jbeul...@suse.com writes: On 26.05.15 at 15:32, vkuzn...@redhat.com wrote: --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1709,9 +1709,9 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec) /* * Set access type for a region of pfns. - * If start_pfn == -1ul, sets the default access type. + * If start_gfn == -1ul, sets the default access type. */ -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, +long p2m_set_mem_access(struct domain *d, unsigned long start_gfn, uint32_t nr, uint32_t start, uint32_t mask, xenmem_access_t access) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1752,14 +1752,15 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr, p2m-mem_access_enabled = true; /* If request to set default access. */ -if ( pfn == ~0ul ) +if ( start_gfn == ~0ul ) { p2m-default_access = a; return 0; } rc = apply_p2m_changes(d, MEMACCESS, - pfn_to_paddr(pfn+start), pfn_to_paddr(pfn+nr), + pfn_to_paddr(start_gfn + start), Particularly due to this expression I'm not really happy about the start_ prefix that you're adding here, but I'll let the maintainers of the respective pieces of code decide if they're happy with it. Sorry for the ping but it has been almost one month... Sorry, I must have missed this one, pinging was absolutely the right thing to do (after a week or two would have been fine, no need to wait a month). I'm not super keen on the start_ prefix either, but I would prefer consistency between arm and x86 here more than I object to the prefix. IOW my preference would be to drop it everywhere, but if x86 folks prefer to keep it then I don't mind but ARM should keep it too. I've also copied the (new) mem access maintainers in case they have an opinion. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
On Tue, Jun 23, 2015 at 05:57:20PM +0800, Tiejun Chen wrote: We will introduce the hypercall xc_reserved_device_memory_map approach to libxc. This helps us get rdm entry info according to different parameters. If flag == PCI_DEV_RDM_ALL, all entries should be exposed. Or we just expose that rdm entry specific to a SBDF. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Acked-by: Wei Liu wei.l...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] pvUSB backend performance
On 06/25/2015 10:53 AM, Dario Faggioli wrote: On Wed, 2015-06-24 at 14:06 +0200, Juergen Gross wrote: Hi, my qemu integrated pvUSB backend is now running stable enough to do some basic performance measurements. I've passed a memory-stick with about 90MB of data on it to a pv-domU. Then I read all the data on it with tar and looked how long this would take (elapsed time): in dom0: 5.2s in domU with kernel backend: 6.1s in domU with qemu backend: 8.2s So the qemu backend is about 30% slower than the kernel backend. Is this acceptable? If I can ask (I know nothing about USB, let alone pvUSB! :-O), and if you happen to know, what's the situation of other hypervisors, in term both of support and performance? No specific knowledge, sorry. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
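For reference, the relative slowdowns implied by the three timings quoted above (5.2s, 6.1s, 8.2s) can be worked out with a small helper; slowdown_pct() is a name made up for this illustration. With these inputs the qemu backend comes out at roughly 34% slower than the kernel backend (consistent with the "about 30%" in the message), and roughly 58% slower than dom0.

```c
#include <assert.h>

/* Relative slowdown of elapsed time `t` against a baseline, in percent. */
static double slowdown_pct(double base, double t)
{
    assert(base > 0.0);
    return (t - base) / base * 100.0;
}
```

For example, slowdown_pct(6.1, 8.2) compares the qemu backend with the kernel backend, and slowdown_pct(5.2, 6.1) compares the kernel backend with dom0.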
Re: [Xen-devel] [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset
On 25.06.15 at 12:51, paul.durr...@citrix.com wrote: -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 25 June 2015 11:47 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset On 24.06.15 at 13:24, paul.durr...@citrix.com wrote: @@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access( for ( ;; ) { -rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, -*buffer); -if ( rc != X86EMUL_OKAY ) -break; +/* Have we already done this chunk? */ +if ( (*off + chunk) = vio-mmio_cache[dir].size ) I can see why you would like to get rid of the address check, but I'm afraid you can't: You have to avoid getting mixed up multiple same kind (reads or writes) memory accesses that a single instruction can do. While generally I would assume that secondary accesses (like the I/O bitmap read associated with an OUTS) wouldn't go to MMIO, CMPS with both operands being in MMIO would break even if neither crosses a page boundary (not to think of when the emulator starts supporting the scatter/gather instructions, albeit supporting them will require further changes, or we could choose to do them one element at a time). Ok. Can I assume at most two distinct set of addresses for read or write? If so then I can just keep two sets of caches in the hvm_io struct. If we can leave out implicit accesses (like the one mentioned) as well as stack ones, then there shouldn't be more than two (disjoint) reads and one write per instruction, but each possibly crossing a page boundary. If we want to support stacks in MMIO, enter and leave would extend that set, as would said implicit accesses. Of course we should take into consideration what currently works, and I think both stack and implicit accesses would currently work as long as they're aligned (as misalignment would be the only reason for them to get split up - they're never wider than a long). I.e. 
you may want to consider avoiding any ASSERT()s or other conditionals potentially breaking these special cases. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 07/11] x86/intel_pstate: the main body of the intel_pstate driver
The intel_pstate driver is ported following its kernel code logic (commit: 93f0822d). In order to port the Linux source file with minimal modifications, some of the variable types are kept intact (e.g. int current_pstate, which would otherwise be changed to unsigned int). In the kernel, a user can adjust the limits via sysfs (limits.min_sysfs_pct/max_sysfs_pct). In Xen, the policy->limits.min_perf_pct/max_perf_pct acts as the transit station. A user interacts with it via xenpm. The new xen/include/asm-x86/cpufreq.h header file is added. v4 changes: 1) changed the indentation to be a Tab (same as Linux intel_pstate), instead of 4 +$; 2) added a new header file, xen/include/asm-x86/cpufreq.h. Signed-off-by: Wei Wang wei.w.w...@intel.com --- xen/arch/x86/acpi/cpufreq/Makefile | 1 + xen/arch/x86/acpi/cpufreq/intel_pstate.c | 870 +++ xen/include/asm-x86/cpufreq.h| 34 ++ xen/include/asm-x86/msr-index.h | 3 + 4 files changed, 908 insertions(+) create mode 100644 xen/arch/x86/acpi/cpufreq/intel_pstate.c create mode 100644 xen/include/asm-x86/cpufreq.h diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile index f75da9b..99fa9f4 100644 --- a/xen/arch/x86/acpi/cpufreq/Makefile +++ b/xen/arch/x86/acpi/cpufreq/Makefile @@ -1,2 +1,3 @@ obj-y += cpufreq.o +obj-y += intel_pstate.o obj-y += powernow.o diff --git a/xen/arch/x86/acpi/cpufreq/intel_pstate.c b/xen/arch/x86/acpi/cpufreq/intel_pstate.c new file mode 100644 index 000..19c74cc --- /dev/null +++ b/xen/arch/x86/acpi/cpufreq/intel_pstate.c @@ -0,0 +1,870 @@ +#include <xen/kernel.h> +#include <xen/types.h> +#include <xen/init.h> +#include <xen/bitmap.h> +#include <xen/cpumask.h> +#include <xen/timer.h> +#include <asm/msr.h> +#include <asm/msr-index.h> +#include <asm/processor.h> +#include <asm/div64.h> +#include <asm/cpufreq.h> +#include <acpi/cpufreq/cpufreq.h> + +#define BYT_RATIOS 0x66a +#define BYT_VIDS 0x66b +#define BYT_TURBO_RATIOS 0x66c +#define BYT_TURBO_VIDS 0x66d + +#define FRAC_BITS 8 +#define int_tofp(X) 
((int64_t)(X) << FRAC_BITS) +#define fp_toint(X) ((X) >> FRAC_BITS) + +static inline int32_t mul_fp(int32_t x, int32_t y) +{ + return ((int64_t)x * (int64_t)y) >> FRAC_BITS; +} + +static inline int32_t div_fp(int32_t x, int32_t y) +{ + return div_s64((int64_t)x << FRAC_BITS, y); +} + +static inline int ceiling_fp(int32_t x) +{ + int mask, ret; + + ret = fp_toint(x); + mask = (1 << FRAC_BITS) - 1; + if (x & mask) + ret += 1; + return ret; +} + +struct sample { + int32_t core_pct_busy; + u64 aperf; + u64 mperf; + int freq; + s_time_t time; +}; + +struct pstate_data { + int current_pstate; + int min_pstate; + int max_pstate; + int scaling; + int turbo_pstate; +}; + +struct vid_data { + int min; + int max; + int turbo; + int32_t ratio; +}; + +struct _pid { + int setpoint; + int32_t integral; + int32_t p_gain; + int32_t i_gain; + int32_t d_gain; + int deadband; + int32_t last_err; +}; + +struct cpudata { + int cpu; + + struct timer timer; + + struct pstate_data pstate; + struct vid_data vid; + struct _pid pid; + + s_time_t last_sample_time; + u64 prev_aperf; + u64 prev_mperf; + struct sample sample; +}; + +static struct cpudata **all_cpu_data; + +struct pstate_adjust_policy { + int sample_rate_ms; + int deadband; + int setpoint; + int p_gain_pct; + int d_gain_pct; + int i_gain_pct; +}; + +struct pstate_funcs { + int (*get_max)(void); + int (*get_min)(void); + int (*get_turbo)(void); + int (*get_scaling)(void); + void (*set)(struct perf_limits *, struct cpudata *, int pstate); + void (*get_vid)(struct cpudata *); +}; + +struct cpu_defaults { + struct pstate_adjust_policy pid_policy; + struct pstate_funcs funcs; +}; + +static struct pstate_adjust_policy pid_params; +static struct pstate_funcs pstate_funcs; + +static inline void pid_reset(struct _pid *pid, int setpoint, int busy, +int deadband, int integral) { + pid->setpoint = setpoint; + pid->deadband = deadband; + pid->integral = int_tofp(integral); + pid->last_err = int_tofp(setpoint) - int_tofp(busy); +} + +static inline void 
pid_p_gain_set(struct _pid *pid, int percent) +{ + pid->p_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static inline void pid_i_gain_set(struct _pid *pid, int percent) +{ + pid->i_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static inline void pid_d_gain_set(struct _pid *pid, int percent) +{ + pid->d_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static signed int pid_calc(struct _pid *pid, int32_t busy) +{ + signed int result; + int32_t pterm, dterm,
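The fixed-point helpers in the patch above use 8 fractional bits (FRAC_BITS 8), so a value v is stored as v * 256. The standalone sketch below mirrors those helpers so the arithmetic can be tried outside Xen. It makes two assumptions that are not in the patch: plain 64-bit division stands in for div_s64(), and multiplication by (1 << FRAC_BITS) replaces the left shifts so that the sketch stays well defined for negative inputs in portable C.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8
/* Multiplication instead of "<<" avoids shifting negative values. */
#define int_tofp(X) ((int64_t)(X) * (1 << FRAC_BITS))
#define fp_toint(X) ((X) >> FRAC_BITS)

/* Fixed-point multiply: the raw product carries 16 fractional bits. */
static inline int32_t mul_fp(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

/* Fixed-point divide: pre-scale the dividend to keep 8 fractional bits. */
static inline int32_t div_fp(int32_t x, int32_t y)
{
    assert(y != 0);
    return (int32_t)(((int64_t)x * (1 << FRAC_BITS)) / y);
}

/* Round a fixed-point value up to the next integer. */
static inline int ceiling_fp(int32_t x)
{
    int ret = fp_toint(x);

    if (x & ((1 << FRAC_BITS) - 1))
        ret += 1;
    return ret;
}
```

As a usage example, the gain setters in the patch compute div_fp(int_tofp(percent), int_tofp(100)); for percent = 20 this yields 51, i.e. 51/256, which is about 0.199.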
[Xen-devel] [PATCH v4 09/11] x86/intel_pstate: add a booting param to select the driver to load
By default, the old P-state driver (acpi-freq) is used. Add intel_pstate to the Xen boot parameter list to enable the use of intel_pstate. However, if intel_pstate is enabled on a machine which does not support the driver (e.g. Nehalem), the old P-state driver will be loaded because intel_pstate fails to load. Also, add the intel_pstate boot parameter to xen-command-line.markdown. v4 changes: 1) moved the definition of load_intel_pstate right ahead of intel_pstate_init(); 2) merged the previous patch, adding the boot param to xen-command-line.markdown, into this one. Signed-off-by: Wei Wang wei.w.w...@intel.com --- docs/misc/xen-command-line.markdown | 7 +++ xen/arch/x86/acpi/cpufreq/cpufreq.c | 9 ++--- xen/arch/x86/acpi/cpufreq/intel_pstate.c | 6 ++ 3 files changed, 19 insertions(+), 3 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 4889e27..249bf65 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -830,6 +830,13 @@ debug hypervisor only). ### idle\_latency\_factor `= integer` +### intel\_pstate + `= boolean` + + Default: `false` + +Enable the loading of the intel pstate driver. 
+ ### ioapic\_ack `= old | new` diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c index 643c405..e737437 100644 --- a/xen/arch/x86/acpi/cpufreq/cpufreq.c +++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c @@ -41,6 +41,7 @@ #include <asm/processor.h> #include <asm/percpu.h> #include <asm/cpufeature.h> +#include <asm/cpufreq.h> #include <acpi/acpi.h> #include <acpi/cpufreq/cpufreq.h> @@ -648,9 +649,11 @@ static int __init cpufreq_driver_init(void) int ret = 0; if ((cpufreq_controller == FREQCTL_xen) && -(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)) -ret = cpufreq_register_driver(&acpi_cpufreq_driver); -else if ((cpufreq_controller == FREQCTL_xen) && +(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)) { +ret = intel_pstate_init(); +if (ret) +ret = cpufreq_register_driver(&acpi_cpufreq_driver); +} else if ((cpufreq_controller == FREQCTL_xen) && (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)) ret = powernow_register_driver(); diff --git a/xen/arch/x86/acpi/cpufreq/intel_pstate.c b/xen/arch/x86/acpi/cpufreq/intel_pstate.c index 19c74cc..5e03625 100644 --- a/xen/arch/x86/acpi/cpufreq/intel_pstate.c +++ b/xen/arch/x86/acpi/cpufreq/intel_pstate.c @@ -831,12 +831,18 @@ static void __init copy_cpu_funcs(struct pstate_funcs *funcs) pstate_funcs.get_vid = funcs->get_vid; } +static bool_t __initdata load_intel_pstate; +boolean_param("intel_pstate", load_intel_pstate); + int __init intel_pstate_init(void) { int cpu, rc = 0; const struct x86_cpu_id *id; struct cpu_defaults *cpu_info; + if (!load_intel_pstate) + return -ENODEV; + id = x86_match_cpu(intel_pstate_cpu_ids); if (!id) return -ENODEV; -- 1.9.1
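The selection logic in this patch (try intel_pstate first, fall back to the ACPI cpufreq driver if it declines to load) can be modelled in isolation as below. Everything here is a simplified stand-in rather than Xen code: struct platform, try_intel_pstate() and select_driver() are hypothetical names, and the two booleans only model the boot flag and the result of the CPU-ID match.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum driver { DRV_ACPI_CPUFREQ, DRV_INTEL_PSTATE };

/* Hypothetical model of the inputs the patch looks at on an Intel host. */
struct platform {
    bool intel_pstate_param;  /* models the intel_pstate boot parameter  */
    bool cpu_supported;       /* models a successful CPU-ID table match  */
};

/* Models intel_pstate_init(): refuse unless requested and supported. */
static int try_intel_pstate(const struct platform *p)
{
    if (!p->intel_pstate_param)
        return -ENODEV;
    if (!p->cpu_supported)
        return -ENODEV;
    return 0;
}

/* Models the Intel branch of cpufreq_driver_init() after the patch. */
static enum driver select_driver(const struct platform *p)
{
    assert(p);
    if (try_intel_pstate(p) == 0)
        return DRV_INTEL_PSTATE;
    return DRV_ACPI_CPUFREQ;   /* fallback: the old P-state driver */
}
```

The point of the design is that enabling the parameter on unsupported hardware (the Nehalem case mentioned in the commit message) is harmless: the machine simply ends up with the old driver.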
[Xen-devel] [PATCH v4 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c
Add support in the pmstat.c so that the xenpm tool can request to access the intel_pstate driver. v4 changes: 1) changed to use the internal_governor struct; 2) coding style change (indentation of gov_num++). Signed-off-by: Wei Wang wei.w.w...@intel.com --- tools/libxc/xc_pm.c | 4 +- xen/drivers/acpi/pmstat.c | 148 xen/include/public/sysctl.h | 16 - 3 files changed, 138 insertions(+), 30 deletions(-) diff --git a/tools/libxc/xc_pm.c b/tools/libxc/xc_pm.c index 5a7148e..823bab6 100644 --- a/tools/libxc/xc_pm.c +++ b/tools/libxc/xc_pm.c @@ -265,8 +265,8 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid, user_para-cpuinfo_max_freq = sys_para-cpuinfo_max_freq; user_para-cpuinfo_min_freq = sys_para-cpuinfo_min_freq; user_para-scaling_cur_freq = sys_para-scaling_cur_freq; -user_para-scaling_max_freq = sys_para-scaling_max_freq; -user_para-scaling_min_freq = sys_para-scaling_min_freq; +user_para-scaling_max_freq = sys_para-scaling_max.freq; +user_para-scaling_min_freq = sys_para-scaling_min.freq; user_para-turbo_enabled= sys_para-turbo_enabled; memcpy(user_para-scaling_driver, diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c index daac2da..89628aa 100644 --- a/xen/drivers/acpi/pmstat.c +++ b/xen/drivers/acpi/pmstat.c @@ -192,22 +192,33 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op) uint32_t ret = 0; const struct processor_pminfo *pmpt; struct cpufreq_policy *policy; +struct perf_limits *limits; +struct internal_governor *internal_gov; uint32_t gov_num = 0; uint32_t *affected_cpus; uint32_t *scaling_available_frequencies; char *scaling_available_governors; struct list_head *pos; uint32_t cpu, i, j = 0; +uint32_t cur_gov; pmpt = processor_pminfo[op-cpuid]; policy = per_cpu(cpufreq_cpu_policy, op-cpuid); +limits = policy-limits; +internal_gov = policy-internal_gov; +cur_gov = internal_gov ? 
internal_gov-cur_gov : 0; if ( !pmpt || !pmpt-perf.states || - !policy || !policy-governor ) + !policy || (!policy-governor !policy-internal_gov) ) return -EINVAL; -list_for_each(pos, cpufreq_governor_list) -gov_num++; +if (internal_gov) +gov_num = internal_gov-gov_num; +else +{ +list_for_each(pos, cpufreq_governor_list) +gov_num++; +} if ( (op-u.get_para.cpu_num != cpumask_weight(policy-cpus)) || (op-u.get_para.freq_num != pmpt-perf.state_count)|| @@ -241,28 +252,47 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op) if ( ret ) return ret; -if ( !(scaling_available_governors = - xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) ) -return -ENOMEM; -if ( (ret = read_scaling_available_governors(scaling_available_governors, -gov_num * CPUFREQ_NAME_LEN * sizeof(char))) ) +if (internal_gov) { +scaling_available_governors = internal_gov-avail_gov; +ret = copy_to_guest(op-u.get_para.scaling_available_governors, +scaling_available_governors, gov_num * CPUFREQ_NAME_LEN); +if ( ret ) +return ret; +} +else +{ +if ( !(scaling_available_governors = + xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) ) +return -ENOMEM; +if ( (ret = read_scaling_available_governors(scaling_available_governors, +gov_num * CPUFREQ_NAME_LEN * sizeof(char))) ) +{ +xfree(scaling_available_governors); +return ret; +} +ret = copy_to_guest(op-u.get_para.scaling_available_governors, +scaling_available_governors, gov_num * CPUFREQ_NAME_LEN); xfree(scaling_available_governors); -return ret; +if ( ret ) +return ret; } -ret = copy_to_guest(op-u.get_para.scaling_available_governors, -scaling_available_governors, gov_num * CPUFREQ_NAME_LEN); -xfree(scaling_available_governors); -if ( ret ) -return ret; - op-u.get_para.cpuinfo_cur_freq = cpufreq_driver-get ? 
cpufreq_driver-get(op-cpuid) : policy-cur; op-u.get_para.cpuinfo_max_freq = policy-cpuinfo.max_freq; op-u.get_para.cpuinfo_min_freq = policy-cpuinfo.min_freq; op-u.get_para.scaling_cur_freq = policy-cur; -op-u.get_para.scaling_max_freq = policy-max; -op-u.get_para.scaling_min_freq = policy-min; +if (internal_gov) +{ +op-u.get_para.scaling_max.pct = limits-max_perf_pct; +op-u.get_para.scaling_min.pct = limits-min_perf_pct; +op-u.get_para.scaling_turbo_pct = limits-turbo_pct; +} +else +{ +op-u.get_para.scaling_max.freq = policy-max; +op-u.get_para.scaling_min.freq = policy-min; +} if (
Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests
On 24/06/15 17:31, Andrew Cooper wrote: Experimentally, older Linux guests perform construction of `init` with user pagetable mappings. This is fine for native systems as such a guest would not set CR4.SMAP itself. However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1) are also affected. Older Linux guests end up spinning in a loop assuming that the SMAP violation pagefaults are spurious, and make no further progress. One option is to disable SMAP completely, but this is unreasonable. A better alternative is to disable SMAP only in the context of 32bit PV guests, but reduces the effectiveness SMAP security. A 3rd option is for Xen to fix up behind a 32bit guest if it were SMAP-aware. It is a heuristic, and does result in a guest-visible state change, but allows Xen to keep CR4.SMAP unconditionally enabled. [...] --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size. Flag to enable Supervisor Mode Execution Protection ### smap - `= boolean` + `= boolean | compat | fixup` Default: `true` -Flag to enable Supervisor Mode Access Prevention +Handling of Supervisor Mode Access Prevention. + +32bit PV guest kernels qualify as supervisor code, as they execute in ring 1. +If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may +suffer unexpected pagefaults which it cannot handle. (Experimentally, there +are 32bit PV guests which fall foul of SMAP enforcement and spin in an +infinite loop taking pagefaults early on boot.) + +Two further SMAP modes are introduced to work around buggy 32bit PV guests to +prevent functional regressions of VMs on newer hardware. At any point if the +guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease +to apply. Guests that is not aware of SMAP or do not support it are not buggy. + +A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of +an unaware 32bit PV guest. 
This prevents the guest from being subject to SMAP +enforcement, but also prevents Xen from benefiting from the added security +checks. + +A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a SMAP +pagefault in the context of an unaware 32bit PV guest. This allows Xen to +retain the added security from SMAP checks, but results in a guest-visible +state change which it might object to. What does the PV ABI say about the use of EFLAGS.AC? Have guests historically been allowed to use this bit? If so, does Xen fiddling with it potentially break some guests? David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
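To make the three documented behaviours easier to compare, here is a rough model of the `smap` option as described in the quoted documentation. It is only a sketch: the enum, parse_smap() and smap_enforced() are hypothetical names, not the functions in the patch, and the boolean spellings accepted here are a small assumed subset.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum smap_mode { SMAP_OFF, SMAP_ON, SMAP_COMPAT, SMAP_FIXUP, SMAP_INVALID };

/* Hypothetical parser for "smap=<boolean> | compat | fixup". */
static enum smap_mode parse_smap(const char *s)
{
    assert(s);
    if (!strcmp(s, "compat"))
        return SMAP_COMPAT;
    if (!strcmp(s, "fixup"))
        return SMAP_FIXUP;
    if (!strcmp(s, "true") || !strcmp(s, "1") || !strcmp(s, "on"))
        return SMAP_ON;
    if (!strcmp(s, "false") || !strcmp(s, "0") || !strcmp(s, "off"))
        return SMAP_OFF;
    return SMAP_INVALID;
}

/*
 * Is CR4.SMAP left enabled while running in a given guest's context?
 * Per the documentation text: compat turns it off only for 32bit PV
 * guests that have not set CR4.SMAP themselves; fixup keeps it on and
 * instead sets EFLAGS.AC when such a guest takes a SMAP pagefault.
 */
static bool smap_enforced(enum smap_mode mode, bool is_32bit_pv,
                          bool guest_smap_aware)
{
    if (mode == SMAP_OFF)
        return false;
    if (mode == SMAP_COMPAT && is_32bit_pv && !guest_smap_aware)
        return false;
    return true;
}
```

Note that in this model fixup never changes whether SMAP is enforced; its cost is the guest-visible EFLAGS.AC change that the question above is about.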
[Xen-devel] [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
Ideally, the pCPUs that are 'free', i.e., not assigned to any cpupool, should not be considered by the scheduler for load balancing or anything. In Credit1, we fail at this, because of how we use cpupool_scheduler_cpumask(). In fact, for a free pCPU, cpupool_scheduler_cpumask() returns a pointer to cpupool_free_cpus, and hence, near the top of csched_load_balance(): if ( unlikely(!cpumask_test_cpu(cpu, online)) ) goto out; is false (the pCPU _is_ free!), and we therefore do not jump to the end right away, as we should. This causes the following splat when resuming from ACPI S3 with pCPUs not assigned to any pool: (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] (XEN) ... ... ... (XEN) Xen call trace: (XEN)[82d080122eaa] csched_load_balance+0x213/0x794 (XEN)[82d08012374c] csched_schedule+0x321/0x452 (XEN)[82d08012c85e] schedule+0x12a/0x63c (XEN)[82d08012fa09] __do_softirq+0x82/0x8d (XEN)[82d08012fa61] do_softirq+0x13/0x15 (XEN)[82d080164780] idle_loop+0x5b/0x6b (XEN) (XEN) (XEN) (XEN) Panic on CPU 8: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=] (XEN) The cure is: * use cpupool_online_cpumask(), as a better guard to the case when the cpu is being offlined; * explicitly check whether the cpu is free. SEDF is in a similar situation, so fix it too. Still in Credit1, we must make sure that free (or offline) CPUs are not considered ticklable. Not doing so would impair the load balancing algorithm, making the scheduler think that it is possible to 'ask' the pCPU to pick up some work, while in reality, that will never happen! 
Evidence of such behavior is shown in this trace: Name CPU list Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 0.112998198 | ||.|| -|x||-|- d0v0 runstate_change d0v4 offline-runnable ] 0.112998198 | ||.|| -|x||-|- d0v0 22006(2:2:6) 1 [ f ] ] 0.112999612 | ||.|| -|x||-|- d0v0 28004(2:8:4) 2 [ 0 4 ] 0.113003387 | ||.|| --|x d32767v15 runstate_continue d32767v15 running-running where 22006(2:2:6) 1 [ f ] means that pCPU 15, which is free from any pool, is tickled. The cure, in this case, is to filter out the free pCPUs, within __runq_tickle(). Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- Cc: George Dunlap george.dun...@eu.citrix.com Cc: Juergen Gross jgr...@suse.com --- xen/common/sched_credit.c | 23 --- xen/common/sched_sedf.c |3 ++- 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 953ecb0..a1945ac 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -366,12 +366,17 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) { struct csched_vcpu * const cur = CSCHED_VCPU(curr_on_cpu(cpu)); struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); -cpumask_t mask, idle_mask; +cpumask_t mask, idle_mask, *online; int balance_step, idlers_empty; ASSERT(cur); cpumask_clear(mask); -idlers_empty = cpumask_empty(prv-idlers); + +/* cpu is vc-processor, so it must be in a cpupool. */ +ASSERT(per_cpu(cpupool, cpu) != NULL); +online = cpupool_online_cpumask(per_cpu(cpupool, cpu)); +cpumask_and(idle_mask, prv-idlers, online); +idlers_empty = cpumask_empty(idle_mask); /* @@ -408,8 +413,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new) /* Are there idlers suitable for new (for this balance step)? 
*/ csched_balance_cpumask(new-vcpu, balance_step, csched_balance_mask); -cpumask_and(idle_mask, prv-idlers, csched_balance_mask); -new_idlers_empty = cpumask_empty(idle_mask); +cpumask_and(csched_balance_mask, csched_balance_mask, idle_mask); +new_idlers_empty = cpumask_empty(csched_balance_mask); /* * Let's not be too harsh! If there aren't idlers suitable @@ -1510,6 +1515,7 @@ static struct csched_vcpu * csched_load_balance(struct csched_private *prv, int cpu, struct csched_vcpu *snext, bool_t *stolen) { +struct cpupool *c = per_cpu(cpupool, cpu); struct csched_vcpu *speer; cpumask_t workers; cpumask_t *online; @@ -1517,10 +1523,13 @@ csched_load_balance(struct csched_private *prv, int cpu, int node = cpu_to_node(cpu); BUG_ON( cpu != snext-vcpu-processor ); -online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu)); +online = cpupool_online_cpumask(c); -/* If this CPU is going offline we shouldn't steal work. */ -if ( unlikely(!cpumask_test_cpu(cpu, online)) ) +/* + * If this CPU is going offline, or is not (yet) part of any cpupool + * (as it happens, e.g., during cpu bringup), we shouldn't steal work. + */ +if ( unlikely(!cpumask_test_cpu(cpu, online) ||
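The mask filtering described in this patch can be illustrated with plain 32-bit bitmasks instead of Xen's cpumask_t. The sketch below is a simplified model under that assumption: cpumask32, tickle_candidates() and may_steal_work() are made-up names, and the explicit "is the CPU free" flag only models the second check the commit message asks for.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t cpumask32;   /* bit n set means pCPU n is in the mask */

/*
 * Idle pCPUs that may be tickled for a vCPU: they must be idle, online
 * in the cpupool, and allowed by the current balance step. Free pCPUs
 * are dropped by the AND with pool_online.
 */
static cpumask32 tickle_candidates(cpumask32 idlers, cpumask32 pool_online,
                                   cpumask32 balance_mask)
{
    cpumask32 idle_in_pool = idlers & pool_online;

    return idle_in_pool & balance_mask;
}

/* A pCPU should only try to steal work if it is in a pool and online. */
static int may_steal_work(unsigned int cpu, cpumask32 pool_online,
                          int cpu_is_free)
{
    assert(cpu < 32);
    if (cpu_is_free)
        return 0;
    return (pool_online >> cpu) & 1u;
}
```

Using the layout from the trace above (Pool-0 holds pCPUs 0 to 14, pCPU 15 is free): with pool_online = 0x7fff and pCPUs 3 and 15 both idle, only pCPU 3 survives the filter, so the free pCPU 15 is no longer tickled.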
[Xen-devel] [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved
This is mostly about fixing bugs showing up during suspend/resume, with non-default configurations such as pCPUs free from any cpupool, more than one cpupool in the system, etc. I tried a few different approaches for dealing with these cases. For instance, I tried creating an 'idle cpupool', and then putting the free pCPUs there, instead of sort-of parking them in cpupool0 (although in a special condition), like we're doing now, but that introduces other issues. I think this series is the least invasive, and yet correct, way of dealing with the situation. In some more detail: * patch 1 is just refactoring/beautifying dump output; * patch 2 is the fix for a bug showing up during resume, when two or more cpupools exist; * patch 3 fixes a bug (in the suspend/resume path again) and also improves Credit1 behavior, i.e., stops it from considering pCPUs that are outside of any pool as potential candidates where to execute vCPUs; * patch 4 is refactoring again, with the intent of making what made patch 3 necessary less likely to happen! :-) Thanks and Regards, Dario --- Dario Faggioli (4): xen: sched: avoid dumping duplicate information xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown xen: credit1: properly deal with pCPUs not in any cpupool xen: sched: get rid of cpupool_scheduler_cpumask() xen/arch/x86/smpboot.c |1 - xen/common/cpupool.c|8 +--- xen/common/domain.c |5 +++-- xen/common/domctl.c |4 ++-- xen/common/sched_arinc653.c |2 +- xen/common/sched_credit.c | 27 ++- xen/common/sched_rt.c | 12 ++-- xen/common/sched_sedf.c |5 +++-- xen/common/schedule.c | 20 ++-- xen/include/xen/sched-if.h | 12 ++-- 10 files changed, 62 insertions(+), 34 deletions(-) -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask()
and of (almost every) direct use of cpupool_online_cpumask().

In fact, what we really want, most of the time, is the set of valid
pCPUs of the cpupool a certain domain is part of. Furthermore, in case
it's called with a NULL pool as argument, cpupool_scheduler_cpumask()
does more harm than good, by returning the bitmask of free pCPUs!

This commit, therefore:
 * gets rid of cpupool_scheduler_cpumask(), in favour of
   cpupool_domain_cpumask(), which makes it more evident what we are
   after, and accommodates some sanity checking;
 * replaces some of the calls to cpupool_online_cpumask() with calls to
   the new function too.

Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
---
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Juergen Gross jgr...@suse.com
Cc: Robert VanVossen robert.vanvos...@dornerworks.com
Cc: Josh Whitehead josh.whiteh...@dornerworks.com
Cc: Meng Xu men...@cis.upenn.edu
Cc: Sisu Xi xis...@gmail.com
---
 xen/common/domain.c         |    5 +++--
 xen/common/domctl.c         |    4 ++--
 xen/common/sched_arinc653.c |    2 +-
 xen/common/sched_credit.c   |    6 +++---
 xen/common/sched_rt.c       |   12 ++--
 xen/common/sched_sedf.c     |    2 +-
 xen/common/schedule.c       |    2 +-
 xen/include/xen/sched-if.h  |   12 ++--
 8 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3bc52e6..c20accb 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -184,7 +184,8 @@ struct vcpu *alloc_vcpu(
     /* Must be called after making new vcpu visible to for_each_vcpu(). */
     vcpu_check_shutdown(v);
 
-    domain_update_node_affinity(d);
+    if ( !is_idle_domain(d) )
+        domain_update_node_affinity(d);
 
     return v;
 }
@@ -437,7 +438,7 @@ void domain_update_node_affinity(struct domain *d)
         return;
     }
 
-    online = cpupool_online_cpumask(d->cpupool);
+    online = cpupool_domain_cpumask(d);
 
     spin_lock(&d->node_affinity_lock);
 
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 2a2d203..a399aa6 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -664,7 +664,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             goto maxvcpu_out;
 
         ret = -ENOMEM;
-        online = cpupool_online_cpumask(d->cpupool);
+        online = cpupool_domain_cpumask(d);
         if ( max > d->max_vcpus )
         {
             struct vcpu **vcpus;
@@ -748,7 +748,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( op->cmd == XEN_DOMCTL_setvcpuaffinity )
         {
             cpumask_var_t new_affinity, old_affinity;
-            cpumask_t *online = cpupool_online_cpumask(v->domain->cpupool);;
+            cpumask_t *online = cpupool_domain_cpumask(v->domain);;
 
             /*
              * We want to be able to restore hard affinity if we are trying
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index cff5da9..dbe02ed 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -667,7 +667,7 @@ a653sched_pick_cpu(const struct scheduler *ops, struct vcpu *vc)
      * If present, prefer vc's current processor, else
      * just find the first valid vcpu .
      */
-    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    online = cpupool_domain_cpumask(vc->domain);
 
     cpu = cpumask_first(online);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index a1945ac..8c36635 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -309,7 +309,7 @@ __runq_remove(struct csched_vcpu *svc)
 static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
                                            const cpumask_t *mask)
 {
-    return !cpumask_subset(cpupool_online_cpumask(vc->domain->cpupool),
+    return !cpumask_subset(cpupool_domain_cpumask(vc->domain),
                            vc->cpu_soft_affinity) &&
            !cpumask_subset(vc->cpu_hard_affinity, vc->cpu_soft_affinity) &&
            cpumask_intersects(vc->cpu_soft_affinity, mask);
@@ -374,7 +374,7 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
     /* cpu is vc->processor, so it must be in a cpupool. */
     ASSERT(per_cpu(cpupool, cpu) != NULL);
-    online = cpupool_online_cpumask(per_cpu(cpupool, cpu));
+    online = cpupool_domain_cpumask(new->sdom->dom);
     cpumask_and(&idle_mask, prv->idlers, online);
     idlers_empty = cpumask_empty(&idle_mask);
 
@@ -641,7 +641,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     int balance_step;
 
     /* Store in cpus the mask of online cpus on which the domain can run */
-    online = cpupool_scheduler_cpumask(vc->domain->cpupool);
+    online = cpupool_domain_cpumask(vc->domain);
     cpumask_and(&cpus, vc->cpu_hard_affinity, online);
 
     for_each_csched_balance_step( balance_step )
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 4372486..08611c8 100644
--- a/xen/common/sched_rt.c
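
As a rough illustration of why the commit message prefers a per-domain accessor, here is a minimal sketch of the idea. It uses toy stand-ins rather than Xen's real types: the `toy_` names, the `unsigned long` bitmask and the use of plain `assert()` are all assumptions made for this example, and the actual helper (in xen/include/xen/sched-if.h) is not visible in the truncated diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in: one bit per pCPU instead of Xen's cpumask_t. */
struct toy_cpupool {
    unsigned long cpu_valid;      /* pCPUs that belong to this pool */
};

struct toy_domain {
    struct toy_cpupool *cpupool;  /* NULL would mean "not in any pool" */
};

/*
 * Return the valid pCPUs of the pool the domain lives in. Taking the
 * domain (rather than a possibly-NULL pool) leaves room for a sanity
 * check, instead of silently falling back to the free pCPUs.
 */
static const unsigned long *
toy_cpupool_domain_cpumask(const struct toy_domain *d)
{
    assert(d->cpupool != NULL);
    return &d->cpupool->cpu_valid;
}

/* Hard affinity restricted to the domain's pool, as a pick-cpu path would. */
static unsigned long toy_runnable_cpus(const struct toy_domain *d,
                                       unsigned long hard_affinity)
{
    return hard_affinity & *toy_cpupool_domain_cpumask(d);
}
```

With a pool holding pCPUs 2 and 3 and a hard affinity of pCPUs 1 and 2, `toy_runnable_cpus()` yields only pCPU 2; a domain with no pool trips the assertion rather than being scheduled on free pCPUs.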
Re: [Xen-devel] [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
On Tue, 2015-06-23 at 17:57 +0800, Tiejun Chen wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
>
> CC: Ian Jackson ian.jack...@eu.citrix.com
> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
> CC: Ian Campbell ian.campb...@citrix.com
> CC: Wei Liu wei.l...@citrix.com
> Signed-off-by: Tiejun Chen tiejun.c...@intel.com
> ---
> v4:
>
> * Fix one typo, s/unkwon/unknown
> * In command description, we should use [] to indicate it's optional
>   for that extended xl command, pci-attach.
>
>  docs/man/xl.pod.1         |  7 ++-
>  tools/libxl/libxl_pci.c   | 10 +-
>  tools/libxl/xl_cmdimpl.c  | 23 +++
>  tools/libxl/xl_cmdtable.c |  2 +-
>  4 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..c5c4809 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>  usable by Domain 0 again. If the device is not bound to pciback, it will
>  return success.
>
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>
>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> +B<rdm> policy is about how to handle conflict between reserving reserved device

s/is about/specifies/ and I think s/between/while/

> +memory and guest address space. strict means an unsolved conflict leads to

I think you mean "in" rather than "and"?

> +immediate VM crash, while relaxed allows VM moving forward with a warning
> +message thrown out. Here strict is default.

The default is "strict".

You've repeated the list of allowed values for this two or three times
now in the various docs, perhaps try and centralise on one definition
and cross reference instead?
Re: [Xen-devel] [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2
Ian Campbell writes ("Re: [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2"):
> On Thu, 2015-06-25 at 13:36 +0100, Ian Jackson wrote:
> > "I think people are working on a better way" is what I was looking
> > for. When that change comes along, we can remove 20_linux_xen ?
>
> OK.

By `OK' do you mean `yes' ?

Ian.
Re: [Xen-devel] [PATCH v2 09/12] x86/altp2m: add remaining support routines.
On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
> On Wed, Jun 24, 2015 at 2:06 PM, Ed White edmund.h.wh...@intel.com wrote:
>> On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>>> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>>> +                                 unsigned long pfn, xenmem_access_t access)
>>>> +{
>>>
>>> This function IMHO should be merged with p2m_set_mem_access and should
>>> be triggerable with the same memop (XENMEM_access_op) hypercall instead
>>> of introducing a new hvmop one.
>>
>> I think we should vote on this. My view is that it makes
>> XENMEM_access_op too complicated to use.
>
> The two functions are not very long and share enough code that it would
> justify merging. The only big change added is the copy from host->alt
> when the entry doesn't exist in alt, and that itself is pretty self
> contained. Let's see if we can get a third opinion on it..

At first sight (I admit I'm rather late in the game and haven't had a
chance to follow the series closely from the beginning), the two
functions do seem to be mergeable (or at least the common code could be
factored out into static helper functions).

Also, if Ed's concern is that the libxc API would look unnatural if
xc_set_mem_access() is used for both purposes, as far as I can tell the
only difference could be a non-zero last altp2m parameter, so I agree
with you that the fewer functions doing almost the same thing the better
(I have been guilty of this in the past too, for example with my
xc_enable_introspection() function ;) ).

So I'd say, yes, if possible merge them.

Regards,
Razvan
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: 25 June 2015 14:38
> To: Paul Durrant; Jan Beulich
> Cc: xen-de...@lists.xenproject.org; Keir (Xen.org)
> Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
>
> On 25/06/15 14:36, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> Sent: 25 June 2015 14:34
>> To: Jan Beulich
>> Cc: Paul Durrant; xen-de...@lists.xenproject.org; Keir (Xen.org)
>> Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op
>>
>> On 25/06/15 13:46, Jan Beulich wrote:
>>>> On 25.06.15 at 14:21, andrew.coop...@citrix.com wrote:
>>>>> On 24/06/15 12:24, Paul Durrant wrote:
>>>>>> When memory mapped I/O is range checked by internal handlers, the
>>>>>> length of the access should be taken into account.
>>>>>>
>>>>>> Signed-off-by: Paul Durrant paul.durr...@citrix.com
>>>>>> Cc: Keir Fraser k...@xen.org
>>>>>> Cc: Jan Beulich jbeul...@suse.com
>>>>>> Cc: Andrew Cooper andrew.coop...@citrix.com
>>>>>
>>>>> For what purpose? The length of the access doesn't affect which
>>>>> handler should accept the IO.
>>>>>
>>>>> This length check now causes an MMIO handler to not claim an access
>>>>> which straddles the upper boundary.
>>>>>
>>>>> It is probably fine to terminate such an access early, but it isn't
>>>>> fine to pass such a straddled access to the default ioreq server.
>>>>
>>>> No, without involving the length in the check we can end up with
>>>> check() saying "Yes, mine" but read() or write() saying "Not me".
>>>> What I would agree with is for the generic handler to split the
>>>> access if the first byte fits, but the final byte doesn't.
>>>
>>> I discussed this with Paul over lunch. I had not considered how IO
>>> gets forwarded to the device model for shared implementations.
>>>
>>> Is it reasonable to split a straddled access and direct the halves at
>>> different handlers? This is not in line with how other hardware
>>> behaves (PCIe will reject any straddled access). Furthermore, given
>>> small MMIO regions and larger registers, there is no guarantee that a
>>> single split will suffice.
>>
>> I see in the other thread going on that a domain_crash() is deemed ok
>> for now, which is fine by me. I think that also allows me to simplify
>> the patch since I don't have to modify the mmio_check op any more. I
>> simply call it once for the first byte of the access and, if it
>> accepts, verify that it also accepts the last byte of the access.
>
> At that point, I would say it would be easier to modify the claim check
> to return yes/straddled/no rather than calling it twice.

That's excessive code churn, I think. The check functions are generally
cheap and the second call is only made if the first accepts.

  Paul

> ~Andrew
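
The approach Paul describes in this thread, calling the existing check once for the first byte and again for the last byte, can be sketched roughly as below. The `toy_region` type, `toy_check()` and the three-way result are illustrative assumptions, not the actual Xen handler interface; the sketch also assumes `len` is at least 1 and that `addr + len - 1` does not wrap.

```c
#include <stdint.h>

/* Hypothetical MMIO window covering [base, base + size). */
struct toy_region {
    uint64_t base;
    uint64_t size;
};

enum toy_claim { TOY_NOT_MINE, TOY_MINE, TOY_STRADDLED };

/* Per-address check, standing in for a handler's existing check op. */
static int toy_check(const struct toy_region *r, uint64_t addr)
{
    return addr >= r->base && addr - r->base < r->size;
}

/*
 * Check the first byte; only if it is accepted, check the last byte too.
 * A handler that accepts the first byte but not the last has been handed
 * a straddling access, which the caller may choose to treat as fatal.
 */
static enum toy_claim toy_claim_access(const struct toy_region *r,
                                       uint64_t addr, unsigned int len)
{
    if ( !toy_check(r, addr) )
        return TOY_NOT_MINE;

    if ( !toy_check(r, addr + len - 1) )
        return TOY_STRADDLED;

    return TOY_MINE;
}
```

The second call is only reached when the first one accepts, which is the cost argument made above; the per-handler check itself stays unchanged.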
Re: [Xen-devel] [PATCH OSSTEST 1/2] mg-debian-installer-update: Print the correct value for TftpDiVersion
Ian Campbell writes ("[PATCH OSSTEST 1/2] mg-debian-installer-update: Print the correct value for TftpDiVersion"):
> That is, the date without the suite suffix.
...
> -echo $date
> -echo >&2 downloaded $dstroot/$arch/$date
> +echo New TftpDiVersion: $date
> +echo >&2 downloaded $dstroot/$dst

You could make the output suitable for c&p ?

+echo TftpDiVersion $date

Ian.
Re: [Xen-devel] [PATCH OSSTEST 2/2] mg-debian-installer-update: Update current symlink, if appropriate
Ian Campbell writes ("[PATCH OSSTEST 2/2] mg-debian-installer-update: Update current symlink, if appropriate"):
> Where "appropriate" means if TftpDiVersion is set to "current", which
> is the default in standalone mode. The assumption is that if someone
> with that configuration runs mg-debian-installer-update then they
> would expect the update to be immediately effective.
>
> There was some existing, but commented, code to do this update,
> reinstate it with the correct condition and adjusting for the addition
> of -$suite to the path many moons ago.
>
> There is no impact on any production configuration, since they always
> set TftpDiVersion.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Re: [Xen-devel] [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset
On 24.06.15 at 13:24, paul.durr...@citrix.com wrote:
> @@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access(
>
>      for ( ;; )
>      {
> -        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> -                                    *buffer);
> -        if ( rc != X86EMUL_OKAY )
> -            break;
> +        /* Have we already done this chunk? */
> +        if ( (*off + chunk) <= vio->mmio_cache[dir].size )

I can see why you would like to get rid of the address check, but
I'm afraid you can't: You have to avoid getting mixed up multiple
same kind (reads or writes) memory accesses that a single
instruction can do. While generally I would assume that
secondary accesses (like the I/O bitmap read associated with an
OUTS) wouldn't go to MMIO, CMPS with both operands being
in MMIO would break even if neither crosses a page boundary
(not to think of when the emulator starts supporting the
scatter/gather instructions, albeit supporting them will require
further changes, or we could choose to do them one element at
a time).

> +        {
> +            ASSERT(*off + chunk <= vio->mmio_cache[dir].size);

I don't see any difference to the if() expression just above.

> +            if ( dir == IOREQ_READ )
> +                memcpy(&buffer[*off],
> +                       &vio->mmio_cache[IOREQ_READ].buffer[*off],
> +                       chunk);
> +            else
> +            {
> +                if ( memcmp(&buffer[*off],

"else if" please.

> +                            &vio->mmio_cache[IOREQ_WRITE].buffer[*off],
> +                            chunk) != 0 )
> +                    domain_crash(curr->domain);
> +            }
> +        }
> +        else
> +        {
> +            ASSERT(*off == vio->mmio_cache[dir].size);
> +
> +            rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> +                                        &buffer[*off]);
> +            if ( rc != X86EMUL_OKAY )
> +                break;
> +
> +            /* Note that we have now done this chunk */

Missing stop.

Jan
Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately
Thursday, June 25, 2015, 10:48:40 AM, you wrote: On 24.06.15 at 21:38, li...@eikelenboom.it wrote: I'm having some trouble with a xhci controller passed through with pci-passthrough to one of my HVM guests. It uses MSI-X for interrupts, a bisection turned up the following commit: x86/MSI: track host and guest masking separately Although from a first glance it looks as if the controller is correctly initialize during the boot of the HVM guest (no worrying messages in dmesg yet). It utterly fails a simple lsusb this results in the hang pasted below. Other devices i passthrough which use legacy or MSI interrupts seem to be unaffected. Odd enough, since I'm having a hard time testing MSI (no suitable devices), but did a lot of testing with MSI-X. Please say so if you need any specific output from Xen debug keys or anything else ! M and i debug key output would be the first thing. I'd suspect host masking to be wrongly active for some reason. Jan Hi Jan, Attached is the xl-dmesg output of: - debug-keys M and i before guest boot - guest boot - debug-keys M and i after lsusb in the guest that hangs. The not working controller is :08:00.0. -- Sander77] traps.c:2655:d0v1 Domain attempted WRMSR c084 from 0x00074700 to 0x00047700. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR c082 from 0x82d0bfffd100 to 0x81b2f010. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR c083 from 0x82d0bfffd120 to 0x81b30f10. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR 0174 from 0x to 0x0010. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR 0176 from 0x to 0x81b30dc0. (XEN) [2015-06-25 10:38:46.277] traps.c:2655:d0v2 Domain attempted WRMSR c084 from 0x00074700 to 0x00047700. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. 
(XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR c082 from 0x82d0bfffc180 to 0x81b2f010. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR c083 from 0x82d0bfffc1a0 to 0x81b30f10. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR 0174 from 0x to 0x0010. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR 0176 from 0x to 0x81b30dc0. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v3 Domain attempted WRMSR c084 from 0x00074700 to 0x00047700. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR c082 from 0x82d0bfffb200 to 0x81b2f010. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR c083 from 0x82d0bfffb220 to 0x81b30f10. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR 0174 from 0x to 0x0010. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR 0176 from 0x to 0x81b30dc0. (XEN) [2015-06-25 10:38:46.278] traps.c:2655:d0v4 Domain attempted WRMSR c084 from 0x00074700 to 0x00047700. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR c082 from 0x82d0bfffa280 to 0x81b2f010. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR c083 from 0x82d0bfffa2a0 to 0x81b30f10. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR 0174 from 0x to 0x0010. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR 0176 from 0x to 0x81b30dc0. (XEN) [2015-06-25 10:38:46.279] traps.c:2655:d0v5 Domain attempted WRMSR c084 from 0x00074700 to 0x00047700. 
(XEN) [2015-06-25 10:38:46.739] PCI add device :00:00.0 (XEN) [2015-06-25 10:38:46.740] PCI add device :00:00.2 (XEN) [2015-06-25 10:38:46.740] PCI add device :00:02.0 (XEN) [2015-06-25 10:38:46.740] PCI add device :00:03.0 (XEN) [2015-06-25 10:38:46.740] PCI add device :00:05.0 (XEN) [2015-06-25 10:38:46.740] PCI add device :00:06.0 (XEN)
Re: [Xen-devel] [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 25 June 2015 11:47 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset On 24.06.15 at 13:24, paul.durr...@citrix.com wrote: @@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access( for ( ;; ) { -rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, -*buffer); -if ( rc != X86EMUL_OKAY ) -break; +/* Have we already done this chunk? */ +if ( (*off + chunk) = vio-mmio_cache[dir].size ) I can see why you would like to get rid of the address check, but I'm afraid you can't: You have to avoid getting mixed up multiple same kind (reads or writes) memory accesses that a single instruction can do. While generally I would assume that secondary accesses (like the I/O bitmap read associated with an OUTS) wouldn't go to MMIO, CMPS with both operands being in MMIO would break even if neither crosses a page boundary (not to think of when the emulator starts supporting the scatter/gather instructions, albeit supporting them will require further changes, or we could choose to do them one element at a time). Ok. Can I assume at most two distinct set of addresses for read or write? If so then I can just keep two sets of caches in the hvm_io struct. +{ +ASSERT(*off + chunk = vio-mmio_cache[dir].size); I don't see any difference to the if() expression just above. That's possible - this has been through a few re-bases. +if ( dir == IOREQ_READ ) +memcpy(buffer[*off], + vio-mmio_cache[IOREQ_READ].buffer[*off], + chunk); +else +{ +if ( memcmp(buffer[*off], else if please. Ok. +vio-mmio_cache[IOREQ_WRITE].buffer[*off], +chunk) != 0 ) +domain_crash(curr-domain); +} +} +else +{ +ASSERT(*off == vio-mmio_cache[dir].size); + +rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, +buffer[*off]); +if ( rc != X86EMUL_OKAY ) +break; + +/* Note that we have now done this chunk */ Missing stop. Ok. 
Paul Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 25 June 2015 11:50
> To: Paul Durrant
> Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
> Subject: RE: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
>
> On 25.06.15 at 12:32, paul.durr...@citrix.com wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: 25 June 2015 10:58
>> To: Paul Durrant
>> Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
>> Subject: Re: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
>>
>> On 24.06.15 at 13:24, paul.durr...@citrix.com wrote:
>>>> If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
>>>> otherwise only the first chunk will be processed. This patch makes
>>>> sure all I/O from a buffer is re-emulated regardless of whether it
>>>> is a read or a write.
>>>
>>> I'm not sure I understand this: Isn't the reason for treating reads
>>> and writes differently due to the fact that MMIO reads may have side
>>> effects, and hence can't be re-issued (whereas writes are always the
>>> last thing an instruction does, and hence can't hold up retiring of
>>> it, and hence don't need retrying)?
>>
>> Reads were always re-issued, which is why handle_mmio() is called in
>> hvm_io_assist(). If the underlying MMIO is deferred to QEMU then this
>> is the only way for Xen to pick up the result. This patch adds
>> completion for writes. If the I/O has been broken down in the
>> underlying hvmemul_write() and a 'chunk' deferred to QEMU then there
>> is actually a need to re-emulate, otherwise any remaining chunks will
>> not be handled.
>>
>>> Furthermore, doesn't only the first chunk get represented correctly
>>> already by informing the caller that only a single iteration of a
>>> repeated instruction was done, such that further repeats will get
>>> carried out anyway (resulting in another, fresh cycle through the
>>> emulator)?
>>
>> No, because we're talking about 'chunks' here and not 'reps'. If a
>> single non-rep I/O is broken down into, say, two chunks then we:
>>
>> - Issue the I/O for the first chunk to QEMU
>> - Say we did nothing by returning RETRY
>> - Re-issue the emulation from hvm_io_assist()
>> - Pick up the result of the first chunk from the ioreq, add it to the
>>   cache, and issue the second chunk to QEMU
>> - Say we did nothing by returning RETRY
>> - Re-issue the emulation from hvm_io_assist()
>> - Pick up the result of the first chunk from the cache and pick up
>>   the result of the second chunk from the ioreq
>> - Say we completed the I/O by returning OKAY
>>
>> I agree it's not nice, and bouncing would have been preferable, but
>> that's the way 'wide I/O' works.
>
> I see. Which means
> Acked-by: Jan Beulich jbeul...@suse.com

Thanks.

  Paul

> Jan
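
The issue/RETRY/replay sequence listed in this message can be modelled with a small state machine. This is only a sketch under simplifying assumptions: `toy_state`, `toy_emulate_read()` and the fixed 8-byte chunk are invented for illustration, the "device model" is whoever fills `resp` between passes, and requests are limited to the 64-byte toy cache. It is not the Xen emulator code.

```c
#include <stdint.h>
#include <string.h>

enum toy_rc { TOY_OKAY, TOY_RETRY };

#define TOY_CHUNK 8u

struct toy_state {
    uint8_t cache[64];        /* results of chunks completed so far */
    unsigned int cached;      /* number of valid bytes in cache */
    int pending;              /* a chunk is out with the device model */
    uint8_t resp[TOY_CHUNK];  /* device model's answer for that chunk */
    unsigned int issued;      /* how many requests were sent in total */
};

/*
 * One emulation pass over a read of len bytes (len <= sizeof cache).
 * Chunks already in the cache are replayed; a pending chunk has its
 * response recorded; the first chunk with neither is issued and the
 * whole emulation is retried later.
 */
static enum toy_rc toy_emulate_read(struct toy_state *s, uint8_t *buf,
                                    unsigned int len)
{
    unsigned int off = 0;

    while ( off < len )
    {
        unsigned int chunk = len - off < TOY_CHUNK ? len - off : TOY_CHUNK;

        if ( off + chunk <= s->cached )
        {
            /* Done on an earlier pass: replay from the cache. */
            memcpy(&buf[off], &s->cache[off], chunk);
        }
        else if ( s->pending )
        {
            /* The response has arrived: add it to the cache, carry on. */
            memcpy(&s->cache[off], s->resp, chunk);
            s->cached = off + chunk;
            s->pending = 0;
            memcpy(&buf[off], &s->cache[off], chunk);
        }
        else
        {
            /* Hand this chunk to the device model and ask to be re-run. */
            s->pending = 1;
            s->issued++;
            return TOY_RETRY;
        }

        off += chunk;
    }

    return TOY_OKAY;
}
```

For a 16-byte read this takes three passes: RETRY after issuing chunk 0, RETRY after recording chunk 0 and issuing chunk 1, then OKAY once chunk 0 is replayed from the cache and chunk 1 is picked up from the response.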
Re: [Xen-devel] [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build
On Tue, Jun 23, 2015 at 05:57:24PM +0800, Tiejun Chen wrote:
> After commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
> libxl", was introduced, we no longer check and set args.mmio_size
> inside xc_hvm_build as before. So instead, we need to do this before
> calling it.
>
> CC: Ian Jackson ian.jack...@eu.citrix.com
> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
> CC: Ian Campbell ian.campb...@citrix.com
> CC: Wei Liu wei.l...@citrix.com
> Signed-off-by: Tiejun Chen tiejun.c...@intel.com

Acked-by: Wei Liu wei.l...@citrix.com

Sigh. I missed this because libxl doesn't use this function and there is
no in tree xend anymore.

I think you should move this earlier in this series. Presumably your RDM
changes depend on this.

Wei.

> ---
> v4:
>
> * Separate this from current patch #14 since this is specific to xc.
>
>  tools/libxc/xc_hvm_build_x86.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 003ea06..7343e87 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -754,6 +754,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>      args.mem_size = (uint64_t)memsize << 20;
>      args.mem_target = (uint64_t)target << 20;
>      args.image_file_name = image_name;
> +    if ( args.mmio_size == 0 )
> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>
>      return xc_hvm_build(xch, domid, &args);
>  }
> --
> 1.9.1
Re: [Xen-devel] [PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file
Chunyan Liu writes ("[PATCH V5 2/7] libxl_read_file_contents: add new entry to read sysfs file"):
> Sysfs file has size=4096 but actual file content is less than that.
> Current libxl_read_file_contents will treat it as error when file size
> and actual file content differs, so reading sysfs file content with
> this function always fails.
>
> Add a new entry libxl_read_sysfs_file_contents to handle sysfs file
> specially. It would be used in later pvusb work.

I think this still fails to detect a situation where the file is
unexpectedly longer than the requested size ?

As we wrote earlier:

> > Is there any risk that the file is actually bigger than
> > advertised, rather than smaller ?
>
> For sysfs file, couldn't be bigger.

Then you should detect the condition that the file is bigger, and call
it an error.

Thanks,
Ian.
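
The check being asked for here (a short read is fine, an oversized file is an error) could look something like the sketch below. `toy_read_bounded()` is a hypothetical helper written with plain stdio for illustration; it is not libxl's `libxl_read_file_contents` nor the proposed sysfs variant, and its name, signature and return convention are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read at most max bytes from f into a freshly allocated buffer.
 * Returns 0 and fills *data_out / *len_out on success; fewer than max
 * bytes is fine, since the advertised size is only nominal. Returns -1
 * on I/O error, on allocation failure, or if the file turns out to
 * hold more than max bytes.
 */
static int toy_read_bounded(FILE *f, size_t max,
                            char **data_out, size_t *len_out)
{
    char *data = malloc(max + 1);
    size_t len;

    if (!data)
        return -1;

    /* Ask for one byte more than allowed so an oversized file shows up. */
    len = fread(data, 1, max + 1, f);
    if (ferror(f) || len > max) {
        free(data);
        return -1;
    }

    *data_out = data;
    *len_out = len;
    return 0;
}
```

Requesting `max + 1` bytes is what turns "the file is bigger than advertised" into a detectable condition instead of a silent truncation. The caller owns the returned buffer and frees it.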
Re: [Xen-devel] [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset
On 25.06.15 at 12:55, paul.durr...@citrix.com wrote: From: Paul Durrant Sent: 25 June 2015 11:52 From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 25 June 2015 11:47 On 24.06.15 at 13:24, paul.durr...@citrix.com wrote: @@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access( for ( ;; ) { -rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, -*buffer); -if ( rc != X86EMUL_OKAY ) -break; +/* Have we already done this chunk? */ +if ( (*off + chunk) = vio-mmio_cache[dir].size ) I can see why you would like to get rid of the address check, but I'm afraid you can't: You have to avoid getting mixed up multiple same kind (reads or writes) memory accesses that a single instruction can do. While generally I would assume that secondary accesses (like the I/O bitmap read associated with an OUTS) wouldn't go to MMIO, CMPS with both operands being in MMIO would break even if neither crosses a page boundary (not to think of when the emulator starts supporting the scatter/gather instructions, albeit supporting them will require further changes, or we could choose to do them one element at a time). Ok. Can I assume at most two distinct set of addresses for read or write? If so then I can just keep two sets of caches in the hvm_io struct. Oh, I mean linear addresses here BTW. Yes, that's what I implied - afaics switching to using linear addresses shouldn't result in any problem (but then again I wonder whether physical addresses really were chosen originally for no real reason). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM
On Tue, Jun 23, 2015 at 05:57:25PM +0800, Tiejun Chen wrote: While building a VM, HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields to construct guest e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should check them to fix any conflict with RAM. RAM - RDM? RMRR can reside in address space beyond 4G theoretically, but we never see this in real world. So in order to avoid breaking highmem layout we don't solve highmem conflict. Note this means highmem rmrr could still be supported if no conflict. But in the case of lowmem, RMRR probably scatter the whole RAM space. Especially multiple RMRR entries would worsen this to lead a complicated memory layout. And then its hard to extend hvm_info_table{} to work hvmloader out. So here we're trying to figure out a simple solution to avoid breaking existing layout. So when a conflict occurs, #1. Above a predefined boundary (2G) - move lowmem_end below reserved region to solve conflict; #2. Below a predefined boundary (2G) - Check strict/relaxed policy. strict policy leads to fail libxl. Note when both policies are specified on a given region, 'strict' is always preferred. relaxed policy issue a warning message and also mask this entry INVALID to indicate we shouldn't expose this entry to hvmloader. Note later we need to provide a parameter to set that predefined boundary dynamically. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevint.t...@intel.com --- v4: * Consistent to use term RDM. 
* Unconditionally set *nr_entries to 0 * Grab to all sutffs to provide a parameter to set our predefined boundary dynamically to as a separated patch later tools/libxl/libxl_create.c | 2 +- tools/libxl/libxl_dm.c | 259 +++ tools/libxl/libxl_dom.c | 17 ++- tools/libxl/libxl_internal.h | 11 +- tools/libxl/libxl_types.idl | 7 ++ 5 files changed, 293 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 6c8ec63..30e6593 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -460,7 +460,7 @@ int libxl__domain_build(libxl__gc *gc, switch (info-type) { case LIBXL_DOMAIN_TYPE_HVM: -ret = libxl__build_hvm(gc, domid, info, state); +ret = libxl__build_hvm(gc, domid, d_config, state); if (ret) goto out; diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 33f9ce6..5436bcf 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -90,6 +90,265 @@ const char *libxl__domain_device_model(libxl__gc *gc, return dm; } +static struct xen_reserved_device_memory +*xc_device_get_rdm(libxl__gc *gc, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + unsigned int *nr_entries) I just notice this function lives in libxl_dm.c. The function should be renamed to libxl__xc_device_get_rdm. This function should return proper libxl error code (ERROR_FAIL or something more appropriate). The allocated RDM entries should be returned with an out parameter. I had always thought this lived in libxc. Sorry for not having noticed this earlier. +{ +struct xen_reserved_device_memory *xrdm; +int rc; + +/* + * We really can't presume how many entries we can get in advance. + */ +*nr_entries = 0; +rc = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + NULL, nr_entries); +assert(rc = 0); +/* 0 means we have no any rdm entry. 
*/ +if (!rc) +goto out; + +if (errno == ENOBUFS) { +xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t)); libxl__malloc(gc, ...); +if (!xrdm) { +LOG(ERROR, Could not allocate RDM buffer!\n); +goto out; +} Get rid of this. +rc = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + xrdm, nr_entries); +if (rc) { +LOG(ERROR, Could not get reserved device memory maps.\n); +*nr_entries = 0; +free(xrdm); +xrdm = NULL; Get rid of free. +} +} else +LOG(ERROR, Could not get reserved device memory maps.\n); + + out: +return xrdm; +} The reset of this patch looks good to me. It does what we've discussed. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
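
The conflict handling described in the commit message of this patch (move lowmem_end when the conflict is above the 2G boundary, otherwise apply the strict/relaxed policy) can be sketched as below. Everything named `toy_` is invented for illustration; in particular, comparing the boundary against the region's start address and treating lowmem as `[0, lowmem_end)` are assumptions of this sketch, and any highmem adjustment the real libxl code may perform is left out.

```c
#include <stdint.h>

#define TOY_RDM_BOUNDARY (2ULL << 30)   /* the predefined 2G boundary */

enum toy_policy { TOY_STRICT, TOY_RELAXED };

enum toy_result {
    TOY_NO_CONFLICT,     /* region does not overlap lowmem */
    TOY_LOWMEM_MOVED,    /* case #1: lowmem_end pulled below the region */
    TOY_ENTRY_INVALID,   /* case #2, relaxed: warn, hide entry from hvmloader */
    TOY_FAIL             /* case #2, strict: domain build fails */
};

struct toy_rdm {
    uint64_t start;          /* guest-physical start of the reserved region */
    enum toy_policy policy;
    int invalid;             /* set when the entry must not be exposed */
};

static enum toy_result toy_resolve_rdm(struct toy_rdm *rdm,
                                       uint64_t *lowmem_end)
{
    /* lowmem is [0, *lowmem_end): a region starting below its end conflicts. */
    if (rdm->start >= *lowmem_end)
        return TOY_NO_CONFLICT;

    /* #1: above the boundary, shrink lowmem so it ends where the region starts. */
    if (rdm->start >= TOY_RDM_BOUNDARY) {
        *lowmem_end = rdm->start;
        return TOY_LOWMEM_MOVED;
    }

    /* #2: below the boundary, the per-region policy decides. */
    if (rdm->policy == TOY_STRICT)
        return TOY_FAIL;

    rdm->invalid = 1;
    return TOY_ENTRY_INVALID;
}
```

A caller would run this over every RDM entry before filling in the low/high memory fields for hvmloader, failing the build on `TOY_FAIL` and logging a warning on `TOY_ENTRY_INVALID`.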
Re: [Xen-devel] [PATCH 0/2] xen: Allow xen tools to run in guest using 64K page granularity
On Thu, 2015-06-25 at 11:21 +0100, Wei Liu wrote:
On Mon, May 11, 2015 at 12:55:34PM +0100, Julien Grall wrote:

Hi all,

This small series are the only changes required in Xen in order to run a guest using 64K page granularity on top of an unmodified Xen.

I'd like feedback from maintainers tools to know if it might be worth to introduce a function xc_pagesize() replicating the behavior of getpagesize() for Xen.

Can we start with documenting the ABI (?) for communicating between guests with different page sizes?

We should certainly make it clearer what things are in terms of Xen ABI page size vs the guest's page size and other things. I think we can commit these two without that though?

Or at least mention the ring mfn always has the size of XC_PAGE_SIZE (if that's the case).

Wei.

Sincerely yours,

Julien Grall (2):
  tools/xenstored: Use XC_PAGE_SIZE rather than getpagesize()
  tools/xenconsoled: Use XC_PAGE_SIZE rather than getpagesize()

 tools/console/daemon/io.c         | 4 ++--
 tools/xenstore/xenstored_domain.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

--
2.1.4
Re: [Xen-devel] [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map
The subject line should be changed. You're not extending that hypercall.

  libxl: construct e820 map with RDM information for HVM guest

On Tue, Jun 23, 2015 at 05:57:27PM +0800, Tiejun Chen wrote:

Here we'll construct a basic guest e820 table via XENMEM_set_memory_map. This table includes lowmem, highmem and RDMs if they exist. And hvmloader would need this info later.

I have one question. When RDM is disabled, the generated e820 map should look exactly the same as before (i.e. without this patch), right?

Whatever the answer is, please say that in your commit log.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v4:
* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c      |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c      | 83
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0987991..bc8fd5b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }

+    if (libxl__domain_construct_e820(gc, d_config, domid, args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                state->store_mfn, state->console_port,
                                state->console_mfn, state->store_domid,

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c0acf11..ae2f5e0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3714,6 +3714,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
                                     const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+int libxl__domain_construct_e820(libxl__gc *gc,

hidden

Wei.
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On Thursday 25 June 2015 02:41 PM, Ian Campbell wrote:
On Thu, 2015-06-25 at 13:14 +0530, Manish Jaggi wrote:
On Wednesday 17 June 2015 07:59 PM, Ian Campbell wrote:
On Wed, 2015-06-17 at 07:14 -0700, Manish Jaggi wrote:
On Wednesday 17 June 2015 06:43 AM, Ian Campbell wrote:
On Wed, 2015-06-17 at 13:58 +0100, Stefano Stabellini wrote:

Yes, pciback is already capable of doing that, see drivers/xen/xen-pciback/conf_space.c

I am not sure if the pci-back driver can query the guest memory map. Is there an existing hypercall ?

No, that is missing. I think it would be OK for the virtual BAR to be initialized to the same value as the physical BAR. But I would let the guest change the virtual BAR address and map the MMIO region wherever it wants in the guest physical address space with XENMEM_add_to_physmap_range.

I disagree, given that we've apparently survived for years with x86 PV guests not being able to write to the BARs I think it would be far simpler to extend this to ARM and x86 PVH too than to allow guests to start writing BARs which has various complex questions around it.

All that's needed is for the toolstack to set everything up and write some new xenstore nodes in the per-device directory with the BAR address/size.

Also most guests apparently don't reassign the PCI bus by default, so using a 1:1 by default and allowing it to be changed would require modifying the guests to reassign. Easy on Linux, but I don't know about others and I imagine some OSes (especially simpler/embedded ones) are assuming the firmware sets up something sane by default.

Does the flow below capture all points?

a) When assigning a device to domU, toolstack creates a node in per device directory with virtual BAR address/size

Option1:
b) toolstack using some hypercall ask xen to create p2m mapping { virtual BAR : physical BAR } for domU

While implementing I think rather than the toolstack, pciback driver in dom0 can send the hypercall to map the physical bar to virtual bar. Thus no xenstore entry is required for BARs.

pciback doesn't (and shouldn't) have sufficient knowledge of the guest address space layout to determine what the virtual BAR should be. The toolstack is the right place for that decision to be made.

Yes, the point is the pciback driver reads the physical BAR regions on request from domU. So it sends a hypercall to map the physical bars into stage2 translation for the domU through xen. Xen would use the holes left in IPA for MMIO. Xen would return the IPA for pci-back to return to the request to domU.

Moreover a pci driver would read BARs only once.

You can't assume that though, a driver can do whatever it likes, or the module might be unloaded and reloaded in the guest etc etc.

Are you going to send out a second draft based on the discussion so far?

yes, I was working on that only. I was traveling this week 24 hour flights jetlag...

Ian.
Re: [Xen-devel] [PATCH v2] xen/arm: Propagate clock-frequency to DOMU if present in the DT timer node
On Fri, 2015-06-19 at 13:41 +0100, Julien Grall wrote:

When the property clock-frequency is present in the DT timer node, it means that the bootloader/firmware didn't correctly configure the CNTFRQ/CNTFRQ_EL0 on each processor.

The best solution would be to fix the offending firmware/bootloader, although it may not always be possible to modify and re-flash it.

As it's not possible to trap the register CNTFRQ/CNTFRQ_EL0, we have to extend xen_arch_domainconfig to provide the timer frequency to the toolstack when the property clock-frequency is present to the host DT timer node. Then, a property clock-frequency will be created in the guest DT timer node if the value is not 0.

We could have set the property in the guest DT no matter if the property is present in the host DT. Although, we still want to let the guest using CNTFRQ in normal case. After all, the property clock-frequency is just a workaround for buggy firmware.

Also add a stub for fdt_property_u32 which is not present in libfdt 1.4.0 used by distribution such as Debian Wheezy.

Signed-off-by: Julien Grall julien.gr...@citrix.com
Tested-by: Chris Brand chris.br...@broadcom.com

Acked + applied, thanks

This patch requires to regenerate tools/configure.

Done.
Re: [Xen-devel] [PATCH v4 13/17] x86/hvm: remove HVMIO_dispatched I/O state
On 24/06/15 12:24, Paul Durrant wrote:

+#define HVMIO_NEED_COMPLETION(_vio) \
+    ( ((_vio)->io_state == HVMIO_awaiting_completion) && \
+      !(_vio)->io_data_is_addr && \
+      ((_vio)->io_dir == IOREQ_READ) )

Please can this be a static inline which takes a const pointer.

~Andrew
Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately
On 25.06.15 at 14:02, li...@eikelenboom.it wrote:
Thursday, June 25, 2015, 1:29:39 PM, you wrote:

I'd be curious what the guest view of the MSI-X table entries is at that point. Can you still use the console inside the guest? If so, sufficiently verbose lspci of the device should be able to tell us (hoping that this isn't a Windows guest), or a dd of /dev/mem at the right offset. Perhaps there is also a way to get at that from qemu, but I do not know how.

The guest (linux) keeps running, only that terminal with the lsusb command hangs, so no problem to gather the lspci output. Guest lspci -vvvknn attached.

Hmm, no, this

    Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Vector table: BAR=0 offset=1000
        PBA: BAR=0 offset=1080

isn't enough. I was sure I saw lspci capable of listing the individual table entries...

Btw., are

(XEN) [2015-06-25 10:44:26.550] traps.c:3227: GPF (): 82d0801d8282 - 82d080239eec
(XEN) [2015-06-25 10:44:26.550] traps.c:3227: GPF (): 82d0801d8282 - 82d080239eec
(XEN) [2015-06-25 10:44:26.550] traps.c:3227: GPF (): 82d0801d8282 - 82d080239eec
(XEN) [2015-06-25 10:44:26.550] traps.c:3227: GPF (): 82d0801d8282 - 82d080239eec

new? Did you ever try to figure out what they're being caused by?

No those aren't new (they are present for at least some months now), something in a booting guest kernel triggers those, not only for HVM's but also for PV guests (and so they also appear for dom0).

No, the Dom0 ones were different from what I recall.

Jan
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
On 25/06/15 13:46, Jan Beulich wrote:
On 25.06.15 at 14:21, andrew.coop...@citrix.com wrote:
On 24/06/15 12:24, Paul Durrant wrote:

When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Cc: Andrew Cooper andrew.coop...@citrix.com

For what purpose? The length of the access doesn't affect which handler should accept the IO.

This length check now causes an MMIO handler to not claim an access which straddles the upper boundary. It is probably fine to terminate such an access early, but it isn't fine to pass such a straddled access to the default ioreq server.

No, without involving the length in the check we can end up with check() saying "Yes, mine" but read() or write() saying "Not me". What I would agree with is for the generic handler to split the access if the first byte fits, but the final byte doesn't.

I discussed this with Paul over lunch. I had not considered how IO gets forwarded to the device model for shared implementations.

Is it reasonable to split a straddled access and direct the halves at different handlers? This is not in line with how other hardware behaves (PCIe will reject any straddled access). Furthermore, given small MMIO regions and larger registers, there is no guarantee that a single split will suffice.

I see in the other thread going on that a domain_crash() is deemed ok for now, which is fine by me.

~Andrew
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 25 June 2015 14:48
To: Paul Durrant
Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
Subject: RE: [PATCH v4 07/17] x86/hvm: add length to mmio check op

On 25.06.15 at 15:36, paul.durr...@citrix.com wrote:

I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

That's actually not (generally) okay: There could be a hole in the middle. But as long as instructions don't do accesses wider than a page, we're fine with that in practice I think.

Or wait, no, in the MSI-X this could not be okay: A 64-byte read to the 16 bytes 32 bytes away from a page boundary (and being the last entry on one device's MSI-X table) would extend into another device's MSI-X table on the next page. I.e. first and last bytes would be okay to be accessed, but bytes 16...31 of the access wouldn't. Of course the MSI-X read/write handlers don't currently permit such wide accesses, but anyway...

We could also verify that, for a rep op, all reads/writes come back with OKAY.

I think that would be ok.

Paul

Jan
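The "hole in the middle" case from the message above can be made concrete with a small sketch. It assumes a handler that claims a set of half-open address ranges; `struct range`, `check_ends()` and `check_all()` are illustrative names only and do not correspond to Xen functions. The test values mirror the MSI-X example: a 64-byte access starting 32 bytes below a page boundary, where the handler claims the 16 bytes at the start and the next page, but not the 16 bytes in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* Does any claimed range contain this single byte address? */
static bool byte_claimed(const struct range *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return true;
    return false;
}

/* The shortcut under discussion: look at the first and last byte only. */
static bool check_ends(const struct range *r, size_t n,
                       uint64_t addr, uint32_t len)
{
    assert(len > 0);
    return byte_claimed(r, n, addr) && byte_claimed(r, n, addr + len - 1);
}

/* The strict version: every byte of the access must be claimed. */
static bool check_all(const struct range *r, size_t n,
                      uint64_t addr, uint32_t len)
{
    assert(len > 0);
    for (uint32_t off = 0; off < len; off++)
        if (!byte_claimed(r, n, addr + off))
            return false;
    return true;
}
```

For accesses that stay inside one claimed range the two checks agree; they only differ when the access spans a gap between two ranges claimed by the same handler.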
Re: [Xen-devel] [PATCH v8 09/11] libxc: support XEN_DOMCTL_soft_reset operation
On Tue, Jun 23, 2015 at 06:11:51PM +0200, Vitaly Kuznetsov wrote:

Introduce xc_domain_soft_reset() function supporting XEN_DOMCTL_soft_reset.

Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com

Acked-by: Wei Liu wei.l...@citrix.com

---
 tools/libxc/include/xenctrl.h | 3 +++
 tools/libxc/xc_domain.c       | 9 +
 2 files changed, 12 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..7aa0e81 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1301,6 +1301,9 @@ int xc_domain_setvnuma(xc_interface *xch,
                        unsigned int *vcpu_to_vnode,
                        unsigned int *vnode_to_pnode);

+int xc_domain_soft_reset(xc_interface *xch,
+                         uint32_t domid);
+
 #if defined(__i386__) || defined(__x86_64__)
 /*
  * PC BIOS standard E820 types and structure.

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..a59d0b0 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2452,6 +2452,15 @@ int xc_domain_setvnuma(xc_interface *xch,
     return rc;
 }
+
+int xc_domain_soft_reset(xc_interface *xch,
+                         uint32_t domid)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_soft_reset;
+    domctl.domain = (domid_t)domid;
+    return do_domctl(xch, &domctl);
+}
 /*
  * Local variables:
  * mode: C
--
2.4.2
[Xen-devel] [PATCH v4 02/11] x86/intel_pstate: add some calculation related support
The added calculation related functions will be used in the intel_pstate.c. They are copied from the Linux kernel (commit 2418f4f2, f3002134, eb18cba7).

v4 changes:
1) in commit message, kernel changed to Linux kernel
2) if-else coding style change.

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 xen/include/asm-x86/div64.h | 78 +
 xen/include/xen/kernel.h    | 12 +++
 2 files changed, 90 insertions(+)

diff --git a/xen/include/asm-x86/div64.h b/xen/include/asm-x86/div64.h
index dd49f64..1f171ba 100644
--- a/xen/include/asm-x86/div64.h
+++ b/xen/include/asm-x86/div64.h
@@ -11,4 +11,82 @@
     __rem; \
 })

+static inline uint64_t div_u64_rem(uint64_t dividend, uint32_t divisor,
+                                   uint32_t *remainder)
+{
+    *remainder = do_div(dividend, divisor);
+    return dividend;
+}
+
+static inline uint64_t div_u64(uint64_t dividend, uint32_t divisor)
+{
+    uint32_t remainder;
+
+    return div_u64_rem(dividend, divisor, &remainder);
+}
+
+/*
+ * div64_u64 - unsigned 64bit divide with 64bit divisor
+ * @dividend:64bit dividend
+ * @divisor:64bit divisor
+ *
+ * This implementation is a modified version of the algorithm proposed
+ * by the book 'Hacker's Delight'. The original source and full proof
+ * can be found here and is available for use without restriction.
+ *
+ * 'http://www.hackersdelight.org/HDcode/newCode/divDouble.c.txt'
+ */
+static inline uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
+{
+    uint32_t high = divisor >> 32;
+    uint64_t quot;
+
+    if (high == 0)
+        quot = div_u64(dividend, divisor);
+    else
+    {
+        int n = 1 + fls(high);
+        quot = div_u64(dividend >> n, divisor >> n);
+
+        if (quot != 0)
+            quot--;
+        if ((dividend - quot * divisor) >= divisor)
+            quot++;
+    }
+    return quot;
+}
+
+static inline int64_t div_s64_rem(int64_t dividend, int32_t divisor,
+                                  int32_t *remainder)
+{
+    int64_t quotient;
+
+    if (dividend < 0)
+    {
+        quotient = div_u64_rem(-dividend, ABS(divisor),
+                               (uint32_t *)remainder);
+        *remainder = -*remainder;
+        if (divisor > 0)
+            quotient = -quotient;
+    }
+    else
+    {
+        quotient = div_u64_rem(dividend, ABS(divisor),
+                               (uint32_t *)remainder);
+        if (divisor < 0)
+            quotient = -quotient;
+    }
+    return quotient;
+}
+
+/*
+ * div_s64 - signed 64bit divide with 32bit divisor
+ */
+static inline int64_t div_s64(int64_t dividend, int32_t divisor)
+{
+    int32_t remainder;
+
+    return div_s64_rem(dividend, divisor, &remainder);
+}
+
 #endif

diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
index 548b64d..bfdcdb6 100644
--- a/xen/include/xen/kernel.h
+++ b/xen/include/xen/kernel.h
@@ -42,6 +42,18 @@
 #define MIN(x,y) ((x) < (y) ? (x) : (y))
 #define MAX(x,y) ((x) > (y) ? (x) : (y))

+/*
+ * clamp_t - return a value clamped to a given range using a given type
+ * @type: the type of variable to use
+ * @val: current value
+ * @lo: minimum allowable value
+ * @hi: maximum allowable value
+ *
+ * This macro does no typechecking and uses temporary variables of type
+ * 'type' to make all the comparisons.
+ */
+#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
+
 /**
  * container_of - cast a member of a structure out to the containing structure
  *
--
1.9.1
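To see how the 64-by-64 divide in the patch works, here is a standalone sketch of the same shift-and-correct approach. It is an illustration under assumptions, not the Xen header: `fls32()` is a hypothetical portable replacement for `fls()`, and `div_u64()` uses the plain C division operator where the patch relies on `do_div()`.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int fls32(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* 64-by-32 divide; plain C here, do_div() in the patch. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

static uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
{
    uint32_t high = (uint32_t)(divisor >> 32);
    uint64_t quot;

    assert(divisor != 0);

    if (high == 0) {
        /* Divisor already fits in 32 bits. */
        quot = div_u64(dividend, (uint32_t)divisor);
    } else {
        /* Shift both operands so the divisor fits in 32 bits. */
        int n = 1 + fls32(high);

        quot = div_u64(dividend >> n, (uint32_t)(divisor >> n));

        /* The estimate may be off by one; step down, then correct upwards. */
        if (quot != 0)
            quot--;
        if ((dividend - quot * divisor) >= divisor)
            quot++;
    }
    return quot;
}
```

The two correction steps are what make the truncated estimate exact: dropping the low bits of the divisor can only make the estimate too large, so it is first lowered by one and then raised again if the remainder still holds a full divisor.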
[Xen-devel] [PATCH v4 05/11] x86/intel_pstate: relocate the driver register function
Register the CPU hotplug notifier when the driver is registered, and move the driver register function to the cpufreq.c.

v4 changes:
1) Coding style change (the position of ||).

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 xen/drivers/cpufreq/cpufreq.c      | 14 +++---
 xen/include/acpi/cpufreq/cpufreq.h | 27 +--
 2 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 91b6c25..acc4bb5 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -630,10 +630,18 @@ static struct notifier_block cpu_nfb = {
     .notifier_call = cpu_callback
 };

-static int __init cpufreq_presmp_init(void)
+int cpufreq_register_driver(struct cpufreq_driver *driver_data)
 {
+    if (!driver_data || !driver_data->init ||
+        !driver_data->verify || !driver_data->exit ||
+        (!driver_data->target == !driver_data->setpolicy))
+        return -EINVAL;
+
+    if (cpufreq_driver)
+        return -EBUSY;
+
+    cpufreq_driver = driver_data;
+
     register_cpu_notifier(&cpu_nfb);
     return 0;
 }
-presmp_initcall(cpufreq_presmp_init);
-

diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index af37e90..502774f 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -183,32 +183,7 @@ struct cpufreq_driver {

 extern struct cpufreq_driver *cpufreq_driver;

-static __inline__
-int cpufreq_register_driver(struct cpufreq_driver *driver_data)
-{
-    if (!driver_data ||
-        !driver_data->init ||
-        !driver_data->exit ||
-        !driver_data->verify ||
-        !driver_data->target)
-        return -EINVAL;
-
-    if (cpufreq_driver)
-        return -EBUSY;
-
-    cpufreq_driver = driver_data;
-    return 0;
-}
-
-static __inline__
-int cpufreq_unregister_driver(struct cpufreq_driver *driver)
-{
-    if (!cpufreq_driver || (driver != cpufreq_driver))
-        return -EINVAL;
-
-    cpufreq_driver = NULL;
-    return 0;
-}
+extern int cpufreq_register_driver(struct cpufreq_driver *driver_data);

 static __inline__
 void cpufreq_verify_within_limits(struct cpufreq_policy *policy,
--
1.9.1
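The new validation in `cpufreq_register_driver()` accepts a driver only if exactly one of `target` and `setpolicy` is provided: `!a == !b` is true when both are set or both are missing. The sketch below isolates that check with a cut-down, hypothetical `struct cpufreq_driver` and stub callbacks; the notifier registration from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cpufreq_policy;              /* opaque in this sketch */

/* Cut-down stand-in: only the callbacks the check looks at. */
struct cpufreq_driver {
    int (*init)(struct cpufreq_policy *);
    int (*verify)(struct cpufreq_policy *);
    int (*exit)(struct cpufreq_policy *);
    int (*target)(struct cpufreq_policy *, unsigned int, unsigned int);
    int (*setpolicy)(struct cpufreq_policy *);
};

static struct cpufreq_driver *cpufreq_driver;

/* Stub callbacks so that test drivers can be built. */
static int stub_cb(struct cpufreq_policy *p) { (void)p; return 0; }
static int stub_target(struct cpufreq_policy *p, unsigned int freq,
                       unsigned int relation)
{
    (void)p; (void)freq; (void)relation;
    return 0;
}

static int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
    /* Reject drivers with both or neither of target/setpolicy. */
    if (!driver_data || !driver_data->init ||
        !driver_data->verify || !driver_data->exit ||
        (!driver_data->target == !driver_data->setpolicy))
        return -EINVAL;

    /* Only one driver may be registered at a time. */
    if (cpufreq_driver)
        return -EBUSY;

    cpufreq_driver = driver_data;
    return 0;
}
```

This differs from the removed inline version, which required `target` unconditionally and therefore could not accept a driver that implements `setpolicy` instead.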
[Xen-devel] [PATCH v4 06/11] x86/intel_pstate: APERF/MPERF feature detect
Add support to detect the APERF/MPERF feature. Also, remove the identical code in cpufreq.c and powernow.c.

v4 changes:
1) this is a new consolidated patch dealing with the APERF/MPERF feature detection.

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c  | 6 ++
 xen/arch/x86/acpi/cpufreq/powernow.c | 6 ++
 xen/arch/x86/cpu/common.c            | 3 +++
 xen/include/asm-x86/cpufeature.h     | 1 +
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index fa3678d..643c405 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -51,7 +51,6 @@ enum {
 };

 #define INTEL_MSR_RANGE (0xull)
-#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)

 struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];

@@ -352,10 +351,9 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
 static void feature_detect(void *info)
 {
     struct cpufreq_policy *policy = info;
-    unsigned int eax, ecx;
+    unsigned int eax;

-    ecx = cpuid_ecx(6);
-    if (ecx & CPUID_6_ECX_APERFMPERF_CAPABILITY) {
+    if (boot_cpu_has(X86_FEATURE_APERFMPERF)) {
         policy->aperf_mperf = 1;
         acpi_cpufreq_driver.getavg = get_measured_perf;
     }

diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
index 2c9fea2..b5b752c 100644
--- a/xen/arch/x86/acpi/cpufreq/powernow.c
+++ b/xen/arch/x86/acpi/cpufreq/powernow.c
@@ -38,7 +38,6 @@
 #include <acpi/acpi.h>
 #include <acpi/cpufreq/cpufreq.h>

-#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)
 #define CPUID_FREQ_VOLT_CAPABILITIES 0x8007
 #define CPB_CAPABLE 0x0200
 #define USE_HW_PSTATE 0x0080

@@ -212,10 +211,9 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
 static void feature_detect(void *info)
 {
     struct cpufreq_policy *policy = info;
-    unsigned int ecx, edx;
+    unsigned int edx;

-    ecx = cpuid_ecx(6);
-    if (ecx & CPUID_6_ECX_APERFMPERF_CAPABILITY) {
+    if (boot_cpu_has(X86_FEATURE_APERFMPERF)) {
         policy->aperf_mperf = 1;
         powernow_cpufreq_driver.getavg = get_measured_perf;
     }

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e105aeb..dba29c0 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -238,6 +238,9 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
     if ( cpu_has(c, X86_FEATURE_CLFLSH) )
         c->x86_clflush_size = ((ebx >> 8) & 0xff) * 8;

+    if (cpuid_ecx(6) & 0x1)
+        set_bit(X86_FEATURE_APERFMPERF, c->x86_capability);
+
     /* AMD-defined flags: level 0x8001 */
     c->extended_cpuid_level = cpuid_eax(0x8000);
     if ( (c->extended_cpuid_level & 0x) == 0x8000 ) {

diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 7963a3a..efc9711 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -69,6 +69,7 @@
 #define X86_FEATURE_XTOPOLOGY (3*32+13) /* cpu topology enum extensions */
 #define X86_FEATURE_CPUID_FAULTING (3*32+14) /* cpuid faulting */
 #define X86_FEATURE_CLFLUSH_MONITOR (3*32+15) /* clflush reqd with monitor */
+#define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */

 /* Intel-defined CPU features, CPUID level 0x0001 (ecx), word 4 */
 #define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
--
1.9.1
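The patch replaces two per-driver CPUID probes with a single synthetic feature bit. A rough model of that mechanism is sketched below: bit 0 of ECX from CPUID leaf 6 is tested once, and the result is recorded in a capability bitmap indexed as `word * 32 + bit`. The helper names (`set_cap`, `has_cap`, `detect_aperfmperf`) and the array size `SKETCH_NCAPINTS` are assumptions for illustration; the ECX value is passed in as a parameter so the sketch does not depend on running on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NCAPINTS        8            /* hypothetical bitmap size in words */
#define CPUID_6_ECX_APERFMPERF (1u << 0)    /* CPUID leaf 6, ECX bit 0 */
#define FEATURE_APERFMPERF     (3*32 + 28)  /* word 3, bit 28, as in the patch */

/* Record a feature in the capability bitmap. */
static void set_cap(uint32_t *caps, unsigned int feature)
{
    assert(feature < SKETCH_NCAPINTS * 32);
    caps[feature / 32] |= 1u << (feature % 32);
}

/* Query a feature from the capability bitmap. */
static bool has_cap(const uint32_t *caps, unsigned int feature)
{
    assert(feature < SKETCH_NCAPINTS * 32);
    return (caps[feature / 32] >> (feature % 32)) & 1u;
}

/* Probe once at identification time; callers later use has_cap(). */
static void detect_aperfmperf(uint32_t *caps, uint32_t cpuid6_ecx)
{
    if (cpuid6_ecx & CPUID_6_ECX_APERFMPERF)
        set_cap(caps, FEATURE_APERFMPERF);
}
```

Centralising the probe means the ACPI and PowerNow! drivers share one definition of the bit instead of each carrying its own copy of the constant.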
[Xen-devel] [PATCH v4 08/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline
We change to NULL the cpufreq_cpu_policy pointer after the call of cpufreq_driver->exit, because the pointer is still needed in intel_pstate_set_pstate().

v4 changes: None.

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 xen/drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index acc4bb5..d1b423f 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -335,12 +335,11 @@ int cpufreq_del_cpu(unsigned int cpu)
     /* for HW_ALL, stop gov for each core of the _PSD domain */
     /* for SW_ALL & SW_ANY, stop gov for the 1st core of the _PSD domain */
-    if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
-                   perf->domain_info.num_processors))
+    if (!policy->internal_gov && (hw_all || (cpumask_weight(cpufreq_dom->map) ==
+                   perf->domain_info.num_processors)))
         __cpufreq_governor(policy, CPUFREQ_GOV_STOP);

     cpufreq_statistic_exit(cpu);
-    per_cpu(cpufreq_cpu_policy, cpu) = NULL;
     cpumask_clear_cpu(cpu, policy->cpus);
     cpumask_clear_cpu(cpu, cpufreq_dom->map);

@@ -349,6 +348,7 @@ int cpufreq_del_cpu(unsigned int cpu)
         free_cpumask_var(policy->cpus);
         xfree(policy);
     }
+    per_cpu(cpufreq_cpu_policy, cpu) = NULL;

     /* for the last cpu of the domain, clean room */
     /* It's safe here to free freq_table, drv_data and policy */
--
1.9.1
[Xen-devel] [PATCH v4 04/11] x86/intel_pstate: avoid calling cpufreq_add_cpu() twice
cpufreq_add_cpu() is already called in the hypercall code path (the bottom of set_px_pminfo() and inside cpufreq_cpu_init()). So, we remove the redundant call here.

v4 changes: None.

Signed-off-by: Wei Wang wei.w.w...@intel.com
---
 xen/drivers/cpufreq/cpufreq.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index ab66884..91b6c25 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -632,8 +632,6 @@ static struct notifier_block cpu_nfb = {
 static int __init cpufreq_presmp_init(void)
 {
-    void *cpu = (void *)(long)smp_processor_id();
-    cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
     register_cpu_notifier(&cpu_nfb);
     return 0;
 }
--
1.9.1
Re: [Xen-devel] [PATCH OSSTEST 1/2] mg-debian-installer-update: Print the correct value for TftpDiVersion
On Thu, 2015-06-25 at 11:34 +0100, Ian Jackson wrote:
Ian Campbell writes ([PATCH OSSTEST 1/2] mg-debian-installer-update: Print the correct value for TftpDiVersion):

That is, the date without the suite suffix.
...
-echo $date
-echo 2 downloaded $dstroot/$arch/$date
+echo New TftpDiVersion: $date
+echo 2 downloaded $dstroot/$dst

You could make the output suitable for cp ?

+echo TftpDiVersion $date

Good idea. Shall I resend or just do it on commit?
[Xen-devel] vif-bridge: ip link set failed, name too long
Hi,

When one tries to start an HVM guest via OpenStack, which is set up with Neutron for networking, the guest creation always fails. Here are a few relevant logs:

/var/log/libvirt/libxl/libxl-driver.log:
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge add [-1] exited with error status 1
libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: ip link set vif188.0-emu name tap695cf459-b0-emu failed
libxl: debug: libxl_event.c:618:libxl__ev_xswatch_deregister: watch w=0x7f5a9c05ddd0: deregister unregistered
libxl: error: libxl_create.c:1226:domcreate_attach_vtpms: unable to add nic devices

/var/log/xen/xen-hotplug.log:
Error: argument tap695cf459-b0-emu is wrong: name too long

The libvirt config, from Nova:

<interface type='bridge'>
  <mac address='fa:16:3e:b0:cd:2a'/>
  <source bridge='qbr695cf459-b0'/>
  <target dev='tap695cf459-b0'/>
</interface>

Thanks,

--
Anthony PERARD
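The "name too long" error is consistent with the Linux limit on network interface names, which is IFNAMSIZ (16) bytes including the terminating NUL, so at most 15 visible characters. The sketch below checks the names from the log against that limit; `SKETCH_IFNAMSIZ` and `ifname_fits()` are illustrative names, and the constant is hard-coded here rather than taken from `<net/if.h>` so the snippet does not depend on the platform headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed to match Linux IFNAMSIZ: 16 bytes including the NUL. */
#define SKETCH_IFNAMSIZ 16

/* True if the name is non-empty and fits in an interface name buffer. */
static bool ifname_fits(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    return len > 0 && len < SKETCH_IFNAMSIZ;
}
```

With these names, `tap695cf459-b0` is 14 characters and fits, while appending the `-emu` suffix gives 18 characters, which is over the limit.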
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
-Original Message-
From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: 25 June 2015 14:34
To: Jan Beulich
Cc: Paul Durrant; xen-de...@lists.xenproject.org; Keir (Xen.org)
Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op

On 25/06/15 13:46, Jan Beulich wrote:
On 25.06.15 at 14:21, andrew.coop...@citrix.com wrote:
On 24/06/15 12:24, Paul Durrant wrote:

When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Cc: Andrew Cooper andrew.coop...@citrix.com

For what purpose? The length of the access doesn't affect which handler should accept the IO.

This length check now causes an MMIO handler to not claim an access which straddles the upper boundary. It is probably fine to terminate such an access early, but it isn't fine to pass such a straddled access to the default ioreq server.

No, without involving the length in the check we can end up with check() saying "Yes, mine" but read() or write() saying "Not me". What I would agree with is for the generic handler to split the access if the first byte fits, but the final byte doesn't.

I discussed this with Paul over lunch. I had not considered how IO gets forwarded to the device model for shared implementations.

Is it reasonable to split a straddled access and direct the halves at different handlers? This is not in line with how other hardware behaves (PCIe will reject any straddled access). Furthermore, given small MMIO regions and larger registers, there is no guarantee that a single split will suffice.

I see in the other thread going on that a domain_crash() is deemed ok for now, which is fine by me.

I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

Paul

~Andrew
Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately
On 2015-06-25 15:37, Jan Beulich wrote:
On 25.06.15 at 15:16, li...@eikelenboom.it wrote:
Thursday, June 25, 2015, 2:40:18 PM, you wrote:

Hmm, no, this

    Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Vector table: BAR=0 offset=1000
        PBA: BAR=0 offset=1080

isn't enough. I was sure I saw lspci capable of listing the individual table entries...

It seems to be the most verbose option for my lspci of debian Jessie. So probably a debug-patch would be best ?

Yes, but I'm not sure when I'd get to it (being on vacation all next week).

Jan

Ok no problem no hurry, reverting the commit and the following cleanup to get a clean revert, fixes it for me. It can wait (or Andrew should beat you to it ;) )

Have a good vacation !

--
Sander
Re: [Xen-devel] [PATCH OSSTEST v] Add some sanity checks for presence of Repos configuration
Ian Campbell writes ([PATCH OSSTEST v] Add some sanity checks for presence of Repos configuration):

By providing an explicit fetch method in cri-getconfig which checks things.

Without this then anything which uses cr-daily-branch produces the rather cryptic:

+ test -f daily.xsettings
++ ./ap-print-url xen-unstable
with-lock-ex ./ap-print-url: /lock: Permission denied
+ treeurl=
FAILED rc=255

Which has caught out one or two people using standalone mode.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Re: [Xen-devel] [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer
On 25.06.15 at 12:32, paul.durr...@citrix.com wrote:

-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 25 June 2015 10:58
To: Paul Durrant
Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
Subject: Re: [PATCH v4 16/17] x86/hvm: always re-emulate I/O from a buffer

On 24.06.15 at 13:24, paul.durr...@citrix.com wrote:

If memory mapped I/O is 'chunked' then the I/O must be re-emulated, otherwise only the first chunk will be processed. This patch makes sure all I/O from a buffer is re-emulated regardless of whether it is a read or a write.

I'm not sure I understand this: Isn't the reason for treating reads and writes differently due to the fact that MMIO reads may have side effects, and hence can't be re-issued (whereas writes are always the last thing an instruction does, and hence can't hold up retiring of it, and hence don't need retrying)?

Reads were always re-issued, which is why handle_mmio() is called in hvm_io_assist(). If the underlying MMIO is deferred to QEMU then this is the only way for Xen to pick up the result. This patch adds completion for writes. If the I/O has been broken down in the underlying hvmemul_write() and a 'chunk' deferred to QEMU then there is actually a need to re-emulate, otherwise any remaining chunks will not be handled.

Furthermore, doesn't only the first chunk get represented correctly already by informing the caller that only a single iteration of a repeated instruction was done, such that further repeats will get carried out anyway (resulting in another, fresh cycle through the emulator)?

No, because we're talking about 'chunks' here and not 'reps'. If a single non-rep I/O is broken down into, say, two chunks then we:

- Issue the I/O for the first chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the ioreq, add it to the cache, and issue the second chunk to QEMU
- Say we did nothing by returning RETRY
- Re-issue the emulation from hvm_io_assist()
- Pick up the result of the first chunk from the cache and pick up the result of the second chunk from the ioreq
- Say we completed the I/O by returning OKAY

I agree it's not nice, and bouncing would have been preferable, but that's the way 'wide I/O' works.

I see. Which means
Acked-by: Jan Beulich jbeul...@suse.com

Jan
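The retry sequence described for a two-chunk access can be modelled in a few lines of C. This is only a sketch of the control flow under stated assumptions: `io_state`, `emulate_read()`, `device_model_complete()` and `run_read()` are hypothetical names invented for illustration, the device model is a stub that returns made-up values, and none of this is Xen's actual emulator code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the flow; none of these names exist in Xen. */
enum rc { RC_OKAY, RC_RETRY };

#define MAX_CHUNKS 4

struct io_state {
    uint32_t cache[MAX_CHUNKS]; /* results of chunks completed so far */
    unsigned int cached;        /* number of valid entries in cache[] */
    int pending;                /* one chunk is outstanding at the device model */
    uint32_t ioreq_data;        /* result of the outstanding chunk, once completed */
    unsigned int issued;        /* total requests sent to the device model */
};

/* Stub for the device model finishing the outstanding request. */
static void device_model_complete(struct io_state *s)
{
    s->ioreq_data = 0x1000u + s->cached; /* made-up value for chunk s->cached */
}

/* One pass of the emulator over a read split into nr_chunks chunks. */
static enum rc emulate_read(struct io_state *s, unsigned int nr_chunks,
                            uint32_t *out)
{
    assert(nr_chunks <= MAX_CHUNKS);

    for (unsigned int i = 0; i < nr_chunks; i++) {
        if (i < s->cached) {
            out[i] = s->cache[i];          /* completed on an earlier pass */
            continue;
        }
        if (s->pending) {
            /* Pick up the result from the request and add it to the cache. */
            s->cache[s->cached++] = s->ioreq_data;
            s->pending = 0;
            out[i] = s->cache[i];
            continue;
        }
        /* Nothing cached and nothing outstanding: issue this chunk. */
        s->pending = 1;
        s->issued++;
        return RC_RETRY;                   /* "say we did nothing" */
    }
    return RC_OKAY;
}

/* Re-issue the emulation after every completion; returns the pass count. */
static unsigned int run_read(unsigned int nr_chunks, uint32_t *out,
                             struct io_state *s)
{
    unsigned int passes = 1;

    memset(s, 0, sizeof(*s));
    while (emulate_read(s, nr_chunks, out) == RC_RETRY) {
        device_model_complete(s);
        passes++;
    }
    return passes;
}
```

With two chunks the model makes three passes (RETRY, RETRY, OKAY) and issues two requests, which matches the eight steps listed in the message.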
Re: [Xen-devel] [PATCH 0/2] docs: Build ARM documentation
On Sat, 2015-06-20 at 12:37 +0100, Julien Grall wrote:

Julien Grall (2):
  docs: Look for documentation in sub-directories
  docs: Update INDEX to give a title for each ARM docs

Acked + Applied.

 docs/INDEX    |  5 +
 docs/Makefile | 12 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)
Re: [Xen-devel] [PATCH v2] Revert libxl_set_memory_target: retain the same maxmem offset on top of the current target
On Tue, 2015-06-23 at 17:07 +0100, Wei Liu wrote:

This reverts commit 0c029c4da2169159064568ef4fea862a5d2cd84a.

A new memory model that allows QEMU to bump memory behind libxl's back was merged a few months ago. We didn't fully understand the repercussions back then. Now it breaks migration and becomes blocker of 4.6 release.

It's better to restore to original behaviour at this stage of the release cycle, that would put us in a position no worse than before, so the release is unblocked.

The said function is still racy after reverting these two patches. Making domain memory state consistent requires a bit more work. Separate patch(es) will be sent out to deal with that problem.

Fix up conflicts with f5b43e95 (libxl: fix xl mem-set regression from 0c029c4da2).

Signed-off-by: Wei Liu wei.l...@citrix.com

Acked + applied.
Re: [Xen-devel] [PATCH 0/2] xen{trace/analyze}: fix build on FreeBSD
On Fri, 2015-06-19 at 10:58 +0200, Roger Pau Monne wrote:

Fix the build of xentrace/xenalyze on FreeBSD, and possibly other libcs not having argp. Also fix the usage of fstat64 and O_LARGEFILE.

Both applied, thanks.
Re: [Xen-devel] [PATCH 0/2] Build libxc on rump kernel
On Wed, 2015-06-24 at 11:10 +0100, Wei Liu wrote:

I have upstreamed a privcmd driver for rump kernel. That driver has the same semantics as the NetBSD one so we can just use xc_netbsd for rump kernel.

Wei.

Wei Liu (2):
  NetBSDRump: provide evtchn.h and privcmd.h
  libxc: use xc_netbsd.c for rump kernel

Acked + applied. At some point I may need to pick your brains regarding the refactoring I'm doing to all this stuff..

 tools/include/xen-sys/NetBSDRump/evtchn.h  | 86 ++
 tools/include/xen-sys/NetBSDRump/privcmd.h | 81 ++--
 tools/libxc/Makefile                       |  1 +
 3 files changed, 165 insertions(+), 3 deletions(-)
 create mode 100644 tools/include/xen-sys/NetBSDRump/evtchn.h
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On Thu, 2015-06-25 at 17:29 +0530, Manish Jaggi wrote:
On Thursday 25 June 2015 02:41 PM, Ian Campbell wrote:
On Thu, 2015-06-25 at 13:14 +0530, Manish Jaggi wrote:
On Wednesday 17 June 2015 07:59 PM, Ian Campbell wrote:
On Wed, 2015-06-17 at 07:14 -0700, Manish Jaggi wrote:
On Wednesday 17 June 2015 06:43 AM, Ian Campbell wrote:
On Wed, 2015-06-17 at 13:58 +0100, Stefano Stabellini wrote:

Yes, pciback is already capable of doing that, see drivers/xen/xen-pciback/conf_space.c

I am not sure if the pci-back driver can query the guest memory map. Is there an existing hypercall ?

No, that is missing. I think it would be OK for the virtual BAR to be initialized to the same value as the physical BAR. But I would let the guest change the virtual BAR address and map the MMIO region wherever it wants in the guest physical address space with XENMEM_add_to_physmap_range.

I disagree, given that we've apparently survived for years with x86 PV guests not being able to write to the BARs I think it would be far simpler to extend this to ARM and x86 PVH too than to allow guests to start writing BARs which has various complex questions around it. All that's needed is for the toolstack to set everything up and write some new xenstore nodes in the per-device directory with the BAR address/size.

Also most guests apparently don't reassign the PCI bus by default, so using a 1:1 by default and allowing it to be changed would require modifying the guests to reassign. Easy on Linux, but I don't know about others and I imagine some OSes (especially simpler/embedded ones) are assuming the firmware sets up something sane by default.

Does the flow below capture all points?
a) When assigning a device to domU, toolstack creates a node in per device directory with virtual BAR address/size
Option1:
b) toolstack using some hypercall ask xen to create p2m mapping { virtual BAR : physical BAR } for domU

While implementing I think rather than the toolstack, pciback driver in dom0 can send the hypercall to map the physical bar to virtual bar. Thus no xenstore entry is required for BARs.

pciback doesn't (and shouldn't) have sufficient knowledge of the guest address space layout to determine what the virtual BAR should be. The toolstack is the right place for that decision to be made.

Yes, the point is the pciback driver reads the physical BAR regions on request from domU. So it sends a hypercall to map the physical bars into stage2 translation for the domU through xen. Xen would use the holes left in IPA for MMIO.

I still think it is the toolstack which should do this, that's where these sorts of layout decisions belong.

Xen would return the IPA for pci-back to return to the request to domU. Moreover a pci driver would read BARs only once.

You can't assume that though, a driver can do whatever it likes, or the module might be unloaded and reloaded in the guest etc etc.

Are you going to send out a second draft based on the discussion so far?

yes, I was working on that only. I was traveling this week 24 hour flights jetlag...

Ian.
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
On 24/06/15 12:24, Paul Durrant wrote:

When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Cc: Andrew Cooper andrew.coop...@citrix.com

For what purpose? The length of the access doesn't affect which handler should accept the IO.

This length check now causes an MMIO handler to not claim an access which straddles the upper boundary. It is probably fine to terminate such an access early, but it isn't fine to pass such a straddled access to the default ioreq server.

~Andrew
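The distinction drawn in the reply, between which handler owns an access and whether the access fits entirely inside that handler's range, can be sketched as follows. This is a minimal illustration, assuming a hypothetical `mmio_range` structure and `classify_access()` helper; it is not the check used by the patch or by Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one internal handler's MMIO window. */
struct mmio_range {
    uint64_t base;
    uint64_t size;
};

enum claim {
    CLAIM_NONE,      /* first byte is outside the window: not ours */
    CLAIM_FULL,      /* whole access lies inside the window */
    CLAIM_STRADDLE   /* starts inside, runs past the upper boundary */
};

/* Ownership is decided by the start address only; the length is used
 * afterwards to detect an access that straddles the upper boundary. */
static enum claim classify_access(const struct mmio_range *r,
                                  uint64_t addr, uint64_t len)
{
    assert(len > 0 && r->size > 0);

    if (addr < r->base || addr - r->base >= r->size)
        return CLAIM_NONE;

    /* Written as a subtraction so that addr + len cannot overflow. */
    if (len > r->size - (addr - r->base))
        return CLAIM_STRADDLE;

    return CLAIM_FULL;
}
```

Under this scheme a straddling access is still routed to the handler that owns its first byte, which can then terminate it early, rather than falling through to the default ioreq server.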
Re: [Xen-devel] [PATCH] libxc: delete sent_last_iter
On Thu, 2015-06-18 at 17:37 +0100, Wei Liu wrote:

It's set in code but never used. Detected by -Wunused-but-set-variable.

Signed-off-by: Wei Liu wei.l...@citrix.com

Applied thanks (I figured there was no harm even if it is just about to be deleted)

---
 tools/libxc/xc_domain_save.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 301e770..3222473 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -811,7 +811,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
     int live = (flags & XCFLAGS_LIVE);
     int debug = (flags & XCFLAGS_DEBUG);
     int superpages = !!hvm;
-    int race = 0, sent_last_iter, skip_this_iter = 0;
+    int race = 0, skip_this_iter = 0;
     unsigned int sent_this_iter = 0;
     int tmem_saved = 0;
 
@@ -1014,9 +1014,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 
     last_iter = !live;
 
-    /* pretend we sent all the pages last iteration */
-    sent_last_iter = dinfo->p2m_size;
-
     /* Setup to_send / to_fix and to_skip bitmaps */
     to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, NRPAGES(bitmap_size(dinfo->p2m_size)));
     to_skip = xc_hypercall_buffer_alloc_pages(xch, to_skip, NRPAGES(bitmap_size(dinfo->p2m_size)));
@@ -1586,8 +1583,6 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             goto out;
         }
 
-        sent_last_iter = sent_this_iter;
-
         print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
 
     }
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
Tiejun Chen writes ([v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy):

This patch introduces user configurable parameters to specify RDM resource and according policies, ...

Global RDM parameter, type, allows user to specify reserved regions explicitly, e.g. using 'host' to include all reserved regions reported on this platform which is good to handle hotplug scenario. In the future this parameter may be further extended to allow specifying random regions, e.g. even those belonging to another platform as a preparation for live migration with passthrough devices. Instead, 'none' means we have nothing to do all reserved regions and ignore all policies, so guest work as before.

I think the description in the documentation needs to have more user-focused information. It's not quite clear to me what the tradeoffs are of the different options.

(Your use of "random" here is rather informal. You should say "arbitrary".)

Ian.
Re: [Xen-devel] [RFC PATCH v3 03/18] xen: console: Add ratelimit support for error message
On Mon, Jun 22, 2015 at 6:51 PM, Jan Beulich jbeul...@suse.com wrote:

On 22.06.15 at 14:01, vijay.kil...@gmail.com wrote:

From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com

XENLOG_ERR_RATE_LIMIT and XENLOG_G_ERR_RATE_LIMIT log levels are added to support rate limit for error messages

If you mean to say that rate limiting currently doesn't work for XENLOG_ERR messages, then that's a problem to be fixed by adjusting existing code, not by adding yet another log level.

For GUEST messages, ERR and WARN are rate limited by setting the lower threshold to 0 and the upper threshold to 2, as below:

#define XENLOG_GUEST_UPPER_THRESHOLD 2 /* Do not print INFO and DEBUG */
#define XENLOG_GUEST_LOWER_THRESHOLD 0 /* Rate-limit ERR and WARNING */

So do you recommend setting the same threshold levels for Xen messages, so that ERR and WARN are rate limited?

Regards
Vijay
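The effect of the two thresholds quoted above can be illustrated with a small decision function. This is a sketch under stated assumptions: the numeric level values (ERR = 0 up to DEBUG = 3) and the names `classify()`, `ACT_PRINT`, `ACT_RATE_LIMIT` and `ACT_DROP` are chosen here to be consistent with the comments on the two `#define` lines; they are not taken from Xen's console code.

```c
#include <assert.h>

/* Assumed numbering, lower value meaning more severe. */
enum loglvl { LVL_ERR = 0, LVL_WARNING = 1, LVL_INFO = 2, LVL_DEBUG = 3 };

enum action { ACT_PRINT, ACT_RATE_LIMIT, ACT_DROP };

/* Messages below the lower threshold are always printed, messages below
 * the upper threshold are printed subject to rate limiting, and anything
 * at or above the upper threshold is not printed at all. */
static enum action classify(int lvl, int lower, int upper)
{
    assert(lower <= upper);

    if (lvl < lower)
        return ACT_PRINT;
    if (lvl < upper)
        return ACT_RATE_LIMIT;
    return ACT_DROP;
}
```

With a lower threshold of 0 and an upper threshold of 2, ERR and WARNING are rate limited while INFO and DEBUG are dropped, which is the behaviour the quoted comments describe for guest messages.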
Re: [Xen-devel] [RFC PATCH v3 03/18] xen: console: Add ratelimit support for error message
On 25.06.15 at 15:14, vijay.kil...@gmail.com wrote:

On Mon, Jun 22, 2015 at 6:51 PM, Jan Beulich jbeul...@suse.com wrote:

On 22.06.15 at 14:01, vijay.kil...@gmail.com wrote:

From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com

XENLOG_ERR_RATE_LIMIT and XENLOG_G_ERR_RATE_LIMIT log levels are added to support rate limit for error messages

If you mean to say that rate limiting currently doesn't work for XENLOG_ERR messages, then that's a problem to be fixed by adjusting existing code, not by adding yet another log level.

For GUEST messages, ERR and WARN are rate limited by setting the lower threshold to 0 and the upper threshold to 2, as below:

#define XENLOG_GUEST_UPPER_THRESHOLD 2 /* Do not print INFO and DEBUG */
#define XENLOG_GUEST_LOWER_THRESHOLD 0 /* Rate-limit ERR and WARNING */

So do you recommend setting the same threshold levels for Xen messages, so that ERR and WARN are rate limited?

I'm not sure I understand what you're asking: I recommend no change at all, unless you see something broken (in which case that's what I want clearly described).

Jan
Re: [Xen-devel] [PATCH 6/8] xen/x86: Calculate PV CR4 masks at boot
On 25/06/15 14:08, Jan Beulich wrote:

On 24.06.15 at 18:31, andrew.coop...@citrix.com wrote:

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -682,24 +682,47 @@ void arch_domain_unpause(struct domain *d)
         viridian_time_ref_count_thaw(d);
 }
 
-unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
+/*
+ * These are the masks of CR4 bits (subject to hardware availability) which a
+ * PV guest may not legitimately attempt to modify.
+ */
+static unsigned long __read_mostly pv_cr4_mask, compat_pv_cr4_mask;

The patch generally being fine, I still wonder why you chose to use pv in the names instead of the previous hv: To me, the latter makes more sense: the bits the hypervisor controls instead of the bits pv guests do not control.

It is the set of bits Xen doesn't mind the guest attempting to modify, which is specifically different from the bits Xen actually controls, and different from the set of bits shadowed in a guest's CR4. The masks do represent a superset of the shadowed bits (clamped by hardware support). Bits such as PGE and FSGSBASE are deemed ok for a guest to attempt to modify, but are not shadowed and the guest's interests are completely ignored.

~Andrew
Re: [Xen-devel] [PATCH 6/8] xen/x86: Calculate PV CR4 masks at boot
On 25.06.15 at 15:31, andrew.coop...@citrix.com wrote:

On 25/06/15 14:08, Jan Beulich wrote:

On 24.06.15 at 18:31, andrew.coop...@citrix.com wrote:

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -682,24 +682,47 @@ void arch_domain_unpause(struct domain *d)
         viridian_time_ref_count_thaw(d);
 }
 
-unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
+/*
+ * These are the masks of CR4 bits (subject to hardware availability) which a
+ * PV guest may not legitimately attempt to modify.
+ */
+static unsigned long __read_mostly pv_cr4_mask, compat_pv_cr4_mask;

The patch generally being fine, I still wonder why you chose to use pv in the names instead of the previous hv: To me, the latter makes more sense: the bits the hypervisor controls instead of the bits pv guests do not control.

It is the set of bits Xen doesn't mind the guest attempting to modify,

It's the inverse of that set of bits really, isn't it?

Jan

which is specifically different from the bits Xen actually controls, and different from the set of bits shadowed in a guest's CR4. The masks do represent a superset of the shadowed bits (clamped by hardware support). Bits such as PGE and FSGSBASE are deemed ok for a guest to attempt to modify, but are not shadowed and the guest's interests are completely ignored.

~Andrew
Re: [Xen-devel] [PATCH v4 01/11] x86/acpi: add a common interface for x86 cpu matching
On 25.06.15 at 13:14, wei.w.w...@intel.com wrote:

Add a common interface for matching the current cpu against an array of x86_cpu_ids. Also change mwait-idle.c to use it.

v4 changes: None.

Signed-off-by: Wei Wang wei.w.w...@intel.com

Please avoid re-sending patches that got applied already.

Jan
Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests
On 25/06/15 12:18, David Vrabel wrote:

On 24/06/15 17:31, Andrew Cooper wrote:

Experimentally, older Linux guests perform construction of `init` with user pagetable mappings. This is fine for native systems as such a guest would not set CR4.SMAP itself.

However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1) are also affected. Older Linux guests end up spinning in a loop assuming that the SMAP violation pagefaults are spurious, and make no further progress.

One option is to disable SMAP completely, but this is unreasonable. A better alternative is to disable SMAP only in the context of 32bit PV guests, but this reduces the effectiveness of SMAP security. A 3rd option is for Xen to fix up behind a 32bit guest if it were SMAP-aware. It is a heuristic, and does result in a guest-visible state change, but allows Xen to keep CR4.SMAP unconditionally enabled.

[...]

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
 Flag to enable Supervisor Mode Execution Protection

 ### smap
- `= boolean`
+ `= boolean | compat | fixup`

 Default: `true`

-Flag to enable Supervisor Mode Access Prevention
+Handling of Supervisor Mode Access Prevention.
+
+32bit PV guest kernels qualify as supervisor code, as they execute in ring 1.
+If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may
+suffer unexpected pagefaults which it cannot handle. (Experimentally, there
+are 32bit PV guests which fall foul of SMAP enforcement and spin in an
+infinite loop taking pagefaults early on boot.)
+
+Two further SMAP modes are introduced to work around buggy 32bit PV guests to
+prevent functional regressions of VMs on newer hardware. At any point if the
+guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease
+to apply.

Guests that are not aware of SMAP or do not support it are not buggy. Taking and not understanding a SMAP #PF is understandable.

The way it spins in an infinite loop is unquestionably buggy.

+
+A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of
+an unaware 32bit PV guest. This prevents the guest from being subject to SMAP
+enforcement, but also prevents Xen from benefiting from the added security
+checks.
+
+A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a SMAP
+pagefault in the context of an unaware 32bit PV guest. This allows Xen to
+retain the added security from SMAP checks, but results in a guest-visible
+state change which it might object to.

What does the PV ABI say about the use of EFLAGS.AC? Have guests historically been allowed to use this bit? If so, does Xen fiddling with it potentially break some guests?

If there were an ABI written down anywhere, I might be able to answer that question.

32bit PV guest kernels cannot make use of AC themselves; alignment checking is only available in cpl3. AC is however able to be changed by a popf instruction even in cpl3 (which makes it very curious as to why stac/clac are strictly cpl0 instructions).

Fundamentally, smap=fixup might indeed break a PV guest, but testing shows that RHEL/CentOS 5/6, SLES 11/12 and Debian 6/7 PV guests are all fine with it.

~Andrew
Re: [Xen-devel] vif-bridge: ip link set failed, name too long
On Thu, 2015-06-25 at 12:36 +0100, Anthony PERARD wrote:

Error: argument tap695cf459-b0-emu is wrong: name too long

Under Linux IFNAMSIZ is 16, whereas this is 18 characters. Since our suffix is -emu we are adding 4 to the original 14, so we could/should pick a 2 character suffix to distinguish PV from emulated interfaces. -e perhaps?

It looks like the suffix is in both tools/hotplug/Linux/vif-common.sh and tools/libxl/libxl_internal.h:TAP_DEVICE_SUFFIX. We could perhaps arrange somehow that only the hotplug scripts needed to know this, allowing this to be a more localised decision but it would no doubt involve a bunch of faff.

I'm inclined to suggest we just change the suffix globally.

Ian.
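The length arithmetic in this message can be checked with a short helper. This is a sketch only: `IFNAME_BUF` and `build_ifname()` are hypothetical names, and the constant is assumed to mirror Linux's IFNAMSIZ of 16, which is a buffer size that includes the terminating NUL byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match Linux IFNAMSIZ: a buffer size, NUL terminator included. */
#define IFNAME_BUF 16

/* Concatenate base and suffix into out; fail if the result, plus its
 * terminating NUL, would not fit in an IFNAME_BUF-sized buffer. */
static bool build_ifname(char out[IFNAME_BUF], const char *base,
                         const char *suffix)
{
    size_t blen = strlen(base), slen = strlen(suffix);

    assert(out != NULL);
    if (blen + slen >= IFNAME_BUF)
        return false;

    memcpy(out, base, blen);
    memcpy(out + blen, suffix, slen + 1); /* copies the NUL as well */
    return true;
}
```

If the NUL byte really is counted in the limit, a 14-character base such as tap695cf459-b0 leaves room for only one more character, so a two-character suffix like -e would also be rejected; that is worth confirming before settling on a replacement suffix.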
Re: [Xen-devel] [PATCH v2] libxl: Add AHCI support for upstream qemu
On 25/06/15 12:15, Fabio Fantoni wrote:

On 25/06/2015 12:21, Ian Campbell wrote:

On Tue, 2015-06-23 at 11:15 +0200, Fabio Fantoni wrote:

Usage: ahci=0|1 (default=0)

I think a global rather than per disk option is OK (I can't think why a user would want to mix and match) but maybe we should consider using an enum (with values ide and ahci, defaulting to ide in libxl) so that we can add support for whatever fancy new disk controller everyone is using in 5 years time?

ahci was added 4 years ago in qemu and I don't know of any newer similar technology. In the case of an enum, it should probably be more generic so that it can cover more future possibilities, or am I wrong? In that case, what could the name be? @stabellini and other developers: any advice about this?

You may want to support nvme device interface as well. This would be the newer similar technology you are referring to :)
[Xen-devel] [PATCH 1/4] xen: sched: avoid dumping duplicate information
When dumping scheduling information (debug key 'r'), what we print as 'Idle cpupool' is pretty much the same of what we print immediately after as 'Cpupool0'. In fact, if there are no pCPUs outside of any cpupools, it is exactly the same. If there are free pCPUs, there is some valuable information, but still a lot of duplication: (XEN) Online Cpus: 0-15 (XEN) Free Cpus: 8 (XEN) Idle cpupool: (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[08] sort=5791, sibling=,0300, core=,ff00 (XEN) run: [32767.8] pri=-64 flags=0 cpu=8 (XEN) Cpupool 0: (XEN) Cpus: 0-5,10-15 (XEN) Scheduler: SMP Credit Scheduler (credit) (XEN) info: (XEN) ncpus = 13 (XEN) master = 0 (XEN) credit = 3900 (XEN) credit balance = 45 (XEN) weight = 1280 (XEN) runq_sort = 11820 (XEN) default-weight = 256 (XEN) tslice = 30ms (XEN) ratelimit = 1000us (XEN) credits per msec = 10 (XEN) ticks per tslice = 3 (XEN) migration delay= 0us (XEN) idlers: ,6d29 (XEN) active vcpus: (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) {a/i=37/36 m=11+544 (k=0)} (XEN) 3: [0.15] pri=-1 
flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) {a/i=127/126 m=108+9 (k=0)} (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] (-84+300) {a/i=163/162 m=36+568 (k=0)} (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) {a/i=129/128 m=16+50 (k=0)} (XEN) CPU[00] sort=11801, sibling=,0003, core=,00ff (XEN) run: [32767.0] pri=-64 flags=0 cpu=0 ... ... ... (XEN) CPU[15] sort=11820, sibling=,c000, core=,ff00 (XEN) run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) {a/i=22/21 m=18+5 (k=0)} (XEN) 1: [32767.15] pri=-64 flags=0 cpu=15 (XEN) Cpupool 1: (XEN) Cpus: 6-7,9 (XEN) Scheduler: SMP RTDS Scheduler (rtds) (XEN) CPU[06] (XEN) CPU[07] (XEN) CPU[09] With this change, we get rid of the redundancy, and retain only the information about the free pCPUs. (While there, turn a loop index variable from `int' to `unsigned int' in schedule_dump().) Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- Cc: Juergen Gross jgr...@suse.com Cc: George Dunlap george.dun...@eu.citrix.com --- xen/common/cpupool.c |6 +++--- xen/common/schedule.c | 18 +- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 563864d..5471f93 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -728,10 +728,10 @@ void dump_runq(unsigned char key) print_cpumap(Online Cpus, cpu_online_map); if ( !cpumask_empty(cpupool_free_cpus) ) +{ print_cpumap(Free Cpus, cpupool_free_cpus); - -printk(Idle cpupool:\n); -schedule_dump(NULL); +schedule_dump(NULL); +} for_each_cpupool(c) { diff --git a/xen/common/schedule.c b/xen/common/schedule.c index ecf1545..4ffcd98 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched) void schedule_dump(struct cpupool *c) { -int i; +unsigned int i; struct scheduler *sched; cpumask_t*cpus; /* Locking, if necessary, must be handled withing each scheduler */ -sched = (c == NULL) ? 
ops : c-sched; -cpus = cpupool_scheduler_cpumask(c); -printk(Scheduler: %s (%s)\n, sched-name, sched-opt_name); -SCHED_OP(sched, dump_settings); +if ( c != NULL ) +{ +sched = c-sched; +cpus = c-cpu_valid; +printk(Scheduler: %s (%s)\n, sched-name, sched-opt_name); +SCHED_OP(sched, dump_settings); +} +else +{ +sched = ops; +cpus
[Xen-devel] [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
In fact, if a pCPU belonging to some other pool than cpupool0 goes down, we want to clear the relevant bit from its actual pool, rather than always from cpupool0. Before this commit, all the pCPUs in the non-default pool(s) will be considered immediately valid, during system resume, even the one that have not been brought up yet. As a result, the (Credit1) scheduler will attempt to run its load balancing logic on them, causing the following Oops: # xl cpupool-cpu-remove Pool-0 8-15 # xl cpupool-create name=\Pool-1\ # xl cpupool-cpu-add Pool-1 8-15 -- suspend -- resume (XEN) [ Xen-4.6-unstable x86_64 debug=y Tainted:C ] (XEN) CPU:8 (XEN) RIP:e008:[82d080123078] csched_schedule+0x4be/0xb97 (XEN) RFLAGS: 00010087 CONTEXT: hypervisor (XEN) rax: 80007d2f7fccb780 rbx: 0009 rcx: (XEN) rdx: 82d08031ed40 rsi: 82d080334980 rdi: (XEN) rbp: 8301fe20 rsp: 8301fd40 r8: 0004 (XEN) r9: r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f (XEN) r12: 8303191ea870 r13: 8303226aadf0 r14: 0009 (XEN) r15: 0008 cr0: 8005003b cr4: 26f0 (XEN) cr3: dba9d000 cr2: (XEN) ds: es: fs: gs: ss: cs: e008 (XEN) ... ... ... 
(XEN) Xen call trace: (XEN)[82d080123078] csched_schedule+0x4be/0xb97 (XEN)[82d08012c732] schedule+0x12a/0x63c (XEN)[82d08012f8c8] __do_softirq+0x82/0x8d (XEN)[82d08012f920] do_softirq+0x13/0x15 (XEN)[82d080164791] idle_loop+0x5b/0x6b (XEN) (XEN) (XEN) Panic on CPU 8: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=] (XEN) Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- Cc: Juergen Gross jgr...@suse.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/smpboot.c |1 - xen/common/cpupool.c |2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index 2289284..a4ec396 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -887,7 +887,6 @@ void __cpu_disable(void) remove_siblinginfo(cpu); /* It's now safe to remove this processor from the online map */ -cpumask_clear_cpu(cpu, cpupool0-cpu_valid); cpumask_clear_cpu(cpu, cpu_online_map); fixup_irqs(); diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 5471f93..b48ae17 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -530,6 +530,7 @@ static int cpupool_cpu_remove(unsigned int cpu) if ( cpumask_test_cpu(cpu, (*c)-cpu_valid ) ) { cpumask_set_cpu(cpu, (*c)-cpu_suspended); +cpumask_clear_cpu(cpu, (*c)-cpu_valid); break; } } @@ -552,6 +553,7 @@ static int cpupool_cpu_remove(unsigned int cpu) * If we are not suspending, we are hot-unplugging cpu, and that is * allowed only for CPUs in pool0. */ +cpumask_clear_cpu(cpu, cpupool0-cpu_valid); ret = 0; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
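The idea behind the fix, clearing a CPU's valid bit in the pool that actually owns it instead of always in pool 0, can be shown with plain bitmasks. This is a simplified sketch: `struct pool` and `pool_cpu_down()` are hypothetical, 32-bit masks stand in for Xen's cpumask type, and the hot-unplug restriction to pool 0 from the patch is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpupool, using 32-bit masks. */
struct pool {
    uint32_t cpu_valid;     /* CPUs the pool's scheduler may use */
    uint32_t cpu_suspended; /* CPUs to hand back to the pool on resume */
};

/* Take a CPU down: find the pool that owns it and clear the valid bit
 * there. When suspending, also remember the CPU so that it can rejoin
 * the same pool on resume. Returns the owning pool's index, or -1. */
static int pool_cpu_down(struct pool *pools, size_t nr_pools,
                         unsigned int cpu, int suspending)
{
    assert(cpu < 32);
    uint32_t bit = UINT32_C(1) << cpu;

    for (size_t i = 0; i < nr_pools; i++) {
        if (pools[i].cpu_valid & bit) {
            if (suspending)
                pools[i].cpu_suspended |= bit;
            pools[i].cpu_valid &= ~bit;
            return (int)i;
        }
    }
    return -1;
}
```

In the scenario from the commit message (CPUs 8-15 moved to a second pool), taking CPU 8 down this way leaves pool 0's mask untouched and removes CPU 8 from the second pool, so a scheduler in that pool no longer sees it as valid before it has been brought back up.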
[Xen-devel] [xen-4.5-testing test] 58867: regressions - FAIL
flight 58867 xen-4.5-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/58867/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 58776 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-localmigrate.2 fail REGR. vs. 58776 test-amd64-i386-xl-qemuu-winxpsp3 15 guest-localmigrate/x10 fail REGR. vs. 58776 Regressions which are regarded as allowable (not blocking): test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 58776 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58776 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58776 test-amd64-amd64-xl-qemuu-winxpsp3 15 guest-localmigrate/x10 fail like 58776 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 12 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf-pin 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass version targeted for testing: xen e3bd3cefba5f11062523701bd07051c92a47ef34 baseline version: xen a24672752214b07661db594921ba70c0ee3066c5 People who touched revisions under test: Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com jobs: build-amd64 pass build-armhf pass build-i386 pass 
build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen pass build-i386-rumpuserxen pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-qemut-rhel6hvm-amd fail test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-rumpuserxen-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-armhf-armhf-xl-arndale pass test-amd64-amd64-xl-credit2 pass test-armhf-armhf-xl-credit2 pass test-armhf-armhf-xl-cubietruck pass test-amd64-i386-freebsd10-i386 pass test-amd64-i386-rumpuserxen-i386 pass test-amd64-amd64-xl-pvh-intelfail test-amd64-i386-qemut-rhel6hvm-intel pass
Re: [Xen-devel] [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2
Ian Campbell writes (Re: [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2):

On Thu, 2015-06-25 at 11:33 +0100, Ian Jackson wrote:

Is there some upstream-friendly way of achieving the same thing ?

Not AFAIK. I could try upstreaming this but given that a) the user still needs to manually copy things to the ESP and create a suitable xen.cfg and b) people are working on a better way which will just work with the existing non-UEFI grub.cfg file entries, I'm not sure how much point there is.

I think "people are working on a better way" is what I was looking for. When that change comes along, we can remove 20_linux_xen ?

I'm not really sure what is `specific to us' (or what `us' here means - osstest, or Xen on arm64, or ...?)

All the paths are basically specific to us, just the general shape of the entry is more generically applicable.

`us' = osstest ? Xen ?

Ian.
[Xen-devel] [PATCH] xen: new maintainer for the RTDS scheduler
Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
---
Cc: George Dunlap george.dun...@eu.citrix.com
Cc: Meng Xu xumengpa...@gmail.com
---
 MAINTAINERS |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b1068e..e6616d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -282,6 +282,11 @@ F: tools/libxl/libxl_nonetbuffer.c
 F: tools/hotplug/Linux/remus-netbuf-setup
 F: tools/hotplug/Linux/block-drbd-probe
 
+RTDS SCHEDULER
+M: Dario Faggioli dario.faggi...@citrix.com
+S: Supported
+F: xen/common/sched_rt.c
+
 SCHEDULING
 M: George Dunlap george.dun...@eu.citrix.com
 S: Supported
Re: [Xen-devel] [PATCH v2 09/12] x86/altp2m: add remaining support routines.
On Wed, Jun 24, 2015 at 2:06 PM, Ed White edmund.h.wh...@intel.com wrote:
> On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
> > > +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> > > +                                 unsigned long pfn, xenmem_access_t access)
> > > +{
> >
> > This function IMHO should be merged with p2m_set_mem_access and should be triggerable with the same memop (XENMEM_access_op) hypercall instead of introducing a new hvmop one.
>
> I think we should vote on this. My view is that it makes XENMEM_access_op too complicated to use.

The two functions are not very long and share enough code that it would justify merging. The only big change added is the copy from host->alt when the entry doesn't exist in alt, and that itself is pretty self contained. Let's see if we can get a third opinion on it.

> It also makes using this one specific altp2m capability different to using any of the others

That argument goes both ways - a new mem_access function being introduced that is different from the others.

Tamas
[Xen-devel] [PATCH] Stepping up for being the maintainer of sched_rt.c
I've been involved with this scheduler from the very beginning of the upstreaming process (from the RT-Xen project to here). I've been working with Meng and his group closely since then, and I now feel comfortable being the one that will (N)Ack their patches! :-)

Regards,
Dario

---
Dario Faggioli (1):
      xen: new maintainer for the RTDS scheduler

 MAINTAINERS |    5 +++++
 1 file changed, 5 insertions(+)

--
This happens because I choose it to happen! (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2
On Thu, 2015-06-25 at 13:36 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2"):
> > On Thu, 2015-06-25 at 11:33 +0100, Ian Jackson wrote:
> > > Is there some upstream-friendly way of achieving the same thing ?
> >
> > Not AFAIK. I could try upstreaming this but given that a) the user still needs to manually copy things to the ESP and create a suitable xen.cfg and b) people are working on a better way which will just work with the existing non-UEFI grub.cfg file entries, I'm not sure how much point there is.
>
> I think "people are working on a better way" is what I was looking for. When that change comes along, we can remove 20_linux_xen ?

OK.

> > > I'm not really sure what is `specific to us' (or what `us' here means - osstest, or Xen on arm64, or ...?)
> >
> > All the paths are basically specific to us, just the general shape of the entry is more generically applicable.
>
> `us' = osstest ? Xen ?

Mostly osstest.

Ian.
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
-Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 25 June 2015 14:47 To: Paul Durrant; Jan Beulich Cc: xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op On 25/06/15 14:38, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 25 June 2015 14:38 To: Paul Durrant; Jan Beulich Cc: xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op On 25/06/15 14:36, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 25 June 2015 14:34 To: Jan Beulich Cc: Paul Durrant; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op On 25/06/15 13:46, Jan Beulich wrote: On 25.06.15 at 14:21, andrew.coop...@citrix.com wrote: On 24/06/15 12:24, Paul Durrant wrote: When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com For what purpose? The length of the access doesn't affect which handler should accept the IO. This length check now causes an MMIO handler to not claim an access which straddles the upper boundary. It is probably fine to terminate such an access early, but it isn't fine to pass such a straddled access to the default ioreq server. No, without involving the length in the check we can end up with check() saying Yes, mine but read() or write() saying Not me. What I would agree with is for the generic handler to split the access if the first byte fits, but the final byte doesn't. I discussed this with Paul over lunch. I had not considered how IO gets forwarded to the device model for shared implementations. 
Is it reasonable to split a straddled access and direct the halves at different handlers? This is not in line with how other hardware behaves (PCIe will reject any straddled access). Furthermore, given small MMIO regions and larger registers, there is no guarantee that a single split will suffice.

I see in the other thread going on that a domain_crash() is deemed ok for now, which is fine by me. I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

At that point, I would say it would be easier to modify the claim check to return yes/straddled/no rather than calling it twice.

That's excessive code churn, I think. The check functions are generally cheap and the second call is only made if the first accepts.

You are already churning everything anyway by inserting an extra parameter. I do think it would make the logic cleaner and easier to follow (which IMO takes precedence over churn).

No, my point was that by making the second call I don't need to add the extra parameter. Wait for the revised patch... it's about 6 lines long now ;-)

Paul

~Andrew
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
On 25.06.15 at 15:36, paul.durr...@citrix.com wrote:
> I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

That's actually not (generally) okay: There could be a hole in the middle. But as long as instructions don't do accesses wider than a page, we're fine with that in practice I think.

Or wait, no, in the MSI-X this could not be okay: A 64-byte read to the 16 bytes 32 bytes away from a page boundary (and being the last entry on one device's MSI-X table) would extend into another device's MSI-X table on the next page. I.e. first and last bytes would be okay to be accessed, but bytes 16...31 of the access wouldn't. Of course the MSI-X read/write handlers don't currently permit such wide accesses, but anyway...

Jan
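The hole-in-the-middle case Jan describes can be illustrated with a small stand-alone sketch. The names `struct mmio_range`, `range_accepts`, `check_first_last` and `check_every_byte` are hypothetical and are not Xen's handler interface; the claimed regions are simply modelled as an array of ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a handler claims a set of [start, start + len) ranges. */
struct mmio_range {
    uint64_t start;
    uint64_t len;
};

/* True if the single byte at addr lies inside any claimed range. */
static bool range_accepts(const struct mmio_range *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr - r[i].start < r[i].len)
            return true;
    return false;
}

/* The shortcut from the thread: test only the first and the last byte. */
static bool check_first_last(const struct mmio_range *r, size_t n,
                             uint64_t addr, uint64_t size)
{
    return size != 0 &&
           range_accepts(r, n, addr) &&
           range_accepts(r, n, addr + size - 1);
}

/* The strict variant: every byte of the access must be claimed. */
static bool check_every_byte(const struct mmio_range *r, size_t n,
                             uint64_t addr, uint64_t size)
{
    if (size == 0)
        return false;
    for (uint64_t off = 0; off < size; off++)
        if (!range_accepts(r, n, addr + off))
            return false;
    return true;
}
```

With one 16-byte entry ending 16 bytes short of a page boundary and a second table starting on the next page, a 64-byte access passes the first/last test while bytes 16...31 fall into the unclaimed gap, which is the situation described in the message above.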
Re: [Xen-devel] [PATCH v4 07/17] x86/hvm: add length to mmio check op
On 25/06/15 14:38, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 25 June 2015 14:38 To: Paul Durrant; Jan Beulich Cc: xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op On 25/06/15 14:36, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 25 June 2015 14:34 To: Jan Beulich Cc: Paul Durrant; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 07/17] x86/hvm: add length to mmio check op On 25/06/15 13:46, Jan Beulich wrote: On 25.06.15 at 14:21, andrew.coop...@citrix.com wrote: On 24/06/15 12:24, Paul Durrant wrote: When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com For what purpose? The length of the access doesn't affect which handler should accept the IO. This length check now causes an MMIO handler to not claim an access which straddles the upper boundary. It is probably fine to terminate such an access early, but it isn't fine to pass such a straddled access to the default ioreq server. No, without involving the length in the check we can end up with check() saying Yes, mine but read() or write() saying Not me. What I would agree with is for the generic handler to split the access if the first byte fits, but the final byte doesn't. I discussed this with Paul over lunch. I had not considered how IO gets forwarded to the device model for shared implementations. Is it reasonable to split a straddled access and direct the halves at different handlers? This is not in line with how other hardware behaves (PCIe will reject any straddled access). Furthermore, given small MMIO regions and larger registers, there is no guarantee that a single split will suffice. 
I see in the other thread going on that a domain_crash() is deemed ok for now, which is fine by me. I think that also allows me to simplify the patch since I don't have to modify the mmio_check op any more. I simply call it once for the first byte of the access and, if it accepts, verify that it also accepts the last byte of the access.

At that point, I would say it would be easier to modify the claim check to return yes/straddled/no rather than calling it twice.

That's excessive code churn, I think. The check functions are generally cheap and the second call is only made if the first accepts.

You are already churning everything anyway by inserting an extra parameter. I do think it would make the logic cleaner and easier to follow (which IMO takes precedence over churn).

~Andrew
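A minimal sketch of the yes/straddled/no idea Andrew suggests, assuming a handler that owns one contiguous region. `enum claim_result` and `claim_check` are invented for illustration and do not correspond to any function in the patch series.

```c
#include <stdint.h>

/* Hypothetical result type for a single-call claim check. */
enum claim_result {
    CLAIM_NO,        /* first byte is outside the handler's region */
    CLAIM_YES,       /* the whole access lies inside the region */
    CLAIM_STRADDLED  /* starts inside but runs past the upper boundary */
};

/* Classify an access of size bytes at addr against [base, base + len). */
static enum claim_result claim_check(uint64_t base, uint64_t len,
                                     uint64_t addr, uint64_t size)
{
    if (size == 0 || addr < base || addr - base >= len)
        return CLAIM_NO;

    /* addr is inside; compare the room left in the region with the size. */
    return (len - (addr - base) >= size) ? CLAIM_YES : CLAIM_STRADDLED;
}
```

A caller could then forward `CLAIM_YES` to the handler, pass `CLAIM_NO` on to the next candidate, and treat `CLAIM_STRADDLED` as the error case (the thread mentions domain_crash() as acceptable for now).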
Re: [Xen-devel] [PATCH v2] libxl: Add AHCI support for upstream qemu
On Tue, 2015-06-23 at 11:15 +0200, Fabio Fantoni wrote: Usage: ahci=0|1 (default=0) I think a global rather than per disk option is OK (I can't think why a user would want to mix and match) but maybe we should consider using an enum (with values ide and ahci, defaulting to ide in libxl) so that we can add support for whatever fancy new disk controller everyone is using in 5 years time? If enabled adds ich9 disk controller in ahci mode and uses it with upstream qemu to emulate disks instead of ide. It doesn't support cdroms which still using ide (cdroms will use -device ide-cd as new qemu parameter) I don't follow this reference to will use and a new qemu parameter, there seems to be nothing corresponding in this patch AFAICT. Ahci requires new qemu parameter but for now other emulated disks cases remains with old ones (I did it in other patch, not needed by this one) You can drop the reference to the other patch I think. I did it as libxl parameter disabled by default to avoid possible problems: - with save/restore/migration (restoring with ahci a domU that was with ide instead) - windows 8 without pv drivers (a registry key change is needed for AHCI-IDE change FWIK to avoid possible blue screen) What is FWIK? - windows XP or older that many not support ahci by default. Setting AHCI with libxl parameter and default to disabled seems the best solution. AHCI increase hvm domUs boot performance. On linux hvm domU I saw up to only 20% of the previous total boot time, whereas boot time decrease a lot on W7 domUs for most of boots I have done. 
Small difference in boot time compared to ide mode on W8 and newer (probably other xen improvements or fixes are needed not ahci related) Signed-off-by: Fabio Fantoni fabio.fant...@m2r.biz --- Changes in v2: - libxl_dm.c: small code style fix - added vbd-interface.txt changes --- docs/man/xl.cfg.pod.5 | 9 + docs/misc/vbd-interface.txt | 5 +++-- tools/libxl/libxl.h | 10 ++ tools/libxl/libxl_create.c | 1 + tools/libxl/libxl_dm.c | 10 +- tools/libxl/libxl_types.idl | 1 + tools/libxl/xl_cmdimpl.c| 1 + 7 files changed, 34 insertions(+), 3 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index a3e0e2e..7e16123 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -904,6 +904,15 @@ default is Bcd. =back +=item Bahci=[0|1] =item Bahci=BOOLEAN please. +If enabled adds ich9 disk controller in ahci mode and uses it with +upstream qemu to emulate disks instead of ide. It decrease boot time but decreases +may be not supported by default in windows xp and older windows. +The default is disabled (0). may not be supported. I think AHCI and IDE should be capitalised in the text (not the option name). As should Windows XP and Windows + +=back + =head3 Paging The following options control the mechanisms used to virtualise guest diff --git a/docs/misc/vbd-interface.txt b/docs/misc/vbd-interface.txt index f873db0..afb6846 100644 --- a/docs/misc/vbd-interface.txt +++ b/docs/misc/vbd-interface.txt @@ -3,18 +3,19 @@ Xen guest interface A Xen guest can be provided with block devices. These are always provided as Xen VBDs; for HVM guests they may also be provided as -emulated IDE or SCSI disks. +emulated IDE, AHCI or SCSI disks. The abstract interface involves specifying, for each block device: * Nominal disk type: Xen virtual disk (aka xvd*, the default); SCSI - (sd*); IDE (hd*). + (sd*); IDE or AHCI (hd*). For HVM guests, each whole-disk hd* and and sd* device is made available _both_ via emulated IDE resp. SCSI controller, _and_ as a Xen VBD. 
The HVM guest is entitled to assume that the IDE or SCSI disks available via the emulated IDE controller target the same underlying devices as the corresponding Xen VBD (ie, multipath). + In hd* case with ahci=1, disk will be AHCI via emulated ich9 controller. For PV guests every device is made available to the guest only as a Xen VBD. For these domains the type is advisory, for use by the diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 0a7913b..6a3677d 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -596,6 +596,16 @@ typedef struct libxl__ctx libxl_ctx; #define LIBXL_HAVE_SPICE_STREAMINGVIDEO 1 /* + * LIBXL_HAVE_AHCI + * + * If defined, then the u.hvm structure will contain a boolean type: + * ahci. This value defines if ahci support is present. + * + * If this is not defined, the ahci support is ignored. + */ +#define LIBXL_HAVE_AHCI 1 + +/* * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1 * * If this is defined, libxl_domain_create_restore()'s API has changed to diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 86384d2..8ca2481 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -331,6
Re: [Xen-devel] [PATCH OSSTEST v3 02/22] mg-*: Make package fetching common in new mgi-debian
Ian Campbell writes ("Re: [PATCH OSSTEST v3 02/22] mg-*: Make package fetching common in new mgi-debian"):
> On Wed, 2015-06-24 at 17:00 +0100, Ian Jackson wrote:
> ...
> Although, another option would be to put this in mgi-common and call it fetch_debian_package. I'm happy either way, which would you prefer?

I'd marginally prefer it all in mgi-common.

Ian.
Re: [Xen-devel] [PATCH 0/2] xen: Allow xen tools to run in guest using 64K page granularity
On Mon, May 11, 2015 at 12:55:34PM +0100, Julien Grall wrote:
> Hi all,
>
> This small series are the only changes required in Xen in order to run a guest using 64K page granularity on top of an unmodified Xen.
>
> I'd like feedback from maintainers tools to know if it might be worth to introduce a function xc_pagesize() replicating the behavior of getpagesize() for Xen.

Can we start with documenting the ABI (?) for communicating between guests with different page sizes? Or at least mention the ring mfn always has the size of XC_PAGE_SIZE (if that's the case).

Wei.

> Sincerely yours,
>
> Julien Grall (2):
>   tools/xenstored: Use XC_PAGE_SIZE rather than getpagesize()
>   tools/xenconsoled: Use XC_PAGE_SIZE rather than getpagesize()
>
>  tools/console/daemon/io.c         | 4 ++--
>  tools/xenstore/xenstored_domain.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> --
> 2.1.4
Re: [Xen-devel] [PATCH RFC v1 10/13] lib{xc/xl}: allow the creation of HVM domains with a kernel
I think the subject line should be changed a bit. We already support HVM direct kernel boot with QEMU. Now you're implementing that without QEMU.

Wei.
Re: [Xen-devel] [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy
On Tue, Jun 23, 2015 at 05:57:21PM +0800, Tiejun Chen wrote:
> This patch passes rdm reservation policy to xc_assign_device() so the policy is checked when assigning devices to a VM.
>
> Note this also brings some fallout to python usage of xc_assign_device().
>
> CC: Ian Jackson ian.jack...@eu.citrix.com
> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
> CC: Ian Campbell ian.campb...@citrix.com
> CC: Wei Liu wei.l...@citrix.com
> CC: David Scott dave.sc...@eu.citrix.com
> Signed-off-by: Tiejun Chen tiejun.c...@intel.com

Acked-by: Wei Liu wei.l...@citrix.com
Re: [Xen-devel] [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset
-Original Message- From: Paul Durrant Sent: 25 June 2015 11:52 To: 'Jan Beulich' Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: RE: [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 25 June 2015 11:47 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: Re: [PATCH v4 17/17] x86/hvm: track large memory mapped accesses by buffer offset On 24.06.15 at 13:24, paul.durr...@citrix.com wrote: @@ -621,14 +574,41 @@ static int hvmemul_phys_mmio_access( for ( ;; ) { -rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, -*buffer); -if ( rc != X86EMUL_OKAY ) -break; +/* Have we already done this chunk? */ +if ( (*off + chunk) = vio-mmio_cache[dir].size ) I can see why you would like to get rid of the address check, but I'm afraid you can't: You have to avoid getting mixed up multiple same kind (reads or writes) memory accesses that a single instruction can do. While generally I would assume that secondary accesses (like the I/O bitmap read associated with an OUTS) wouldn't go to MMIO, CMPS with both operands being in MMIO would break even if neither crosses a page boundary (not to think of when the emulator starts supporting the scatter/gather instructions, albeit supporting them will require further changes, or we could choose to do them one element at a time). Ok. Can I assume at most two distinct set of addresses for read or write? If so then I can just keep two sets of caches in the hvm_io struct. Oh, I mean linear addresses here BTW. Paul +{ +ASSERT(*off + chunk = vio-mmio_cache[dir].size); I don't see any difference to the if() expression just above. That's possible - this has been through a few re-bases. +if ( dir == IOREQ_READ ) +memcpy(buffer[*off], + vio-mmio_cache[IOREQ_READ].buffer[*off], + chunk); +else +{ +if ( memcmp(buffer[*off], else if please. Ok. 
+vio-mmio_cache[IOREQ_WRITE].buffer[*off], +chunk) != 0 ) +domain_crash(curr-domain); +} +} +else +{ +ASSERT(*off == vio-mmio_cache[dir].size); + +rc = hvmemul_do_mmio_buffer(gpa, one_rep, chunk, dir, 0, +buffer[*off]); +if ( rc != X86EMUL_OKAY ) +break; + +/* Note that we have now done this chunk */ Missing stop. Ok. Paul Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 11/11] tools: enable xenpm to control the intel_pstate driver
The intel_pstate driver receives percentage values to set the performance limits. This patch adds interfaces to support the input of percentage values to control the intel_pstate driver. Also, the get-cpufreq-para is modified to show percentage based feedback info. v4 changes: None. Signed-off-by: Wei Wang wei.w.w...@intel.com --- tools/libxc/include/xenctrl.h | 14 - tools/libxc/xc_pm.c | 17 --- tools/misc/xenpm.c| 116 +- 3 files changed, 115 insertions(+), 32 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 100b89c..a79494a 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2266,8 +2266,18 @@ struct xc_get_cpufreq_para { uint32_t scaling_cur_freq; char scaling_governor[CPUFREQ_NAME_LEN]; -uint32_t scaling_max_freq; -uint32_t scaling_min_freq; + +union { +uint32_t freq; +uint32_t pct; +} scaling_max; + +union { +uint32_t freq; +uint32_t pct; +} scaling_min; + +uint32_t scaling_turbo_pct; /* for specific governor */ union { diff --git a/tools/libxc/xc_pm.c b/tools/libxc/xc_pm.c index 823bab6..300de33 100644 --- a/tools/libxc/xc_pm.c +++ b/tools/libxc/xc_pm.c @@ -261,13 +261,16 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid, } else { -user_para-cpuinfo_cur_freq = sys_para-cpuinfo_cur_freq; -user_para-cpuinfo_max_freq = sys_para-cpuinfo_max_freq; -user_para-cpuinfo_min_freq = sys_para-cpuinfo_min_freq; -user_para-scaling_cur_freq = sys_para-scaling_cur_freq; -user_para-scaling_max_freq = sys_para-scaling_max.freq; -user_para-scaling_min_freq = sys_para-scaling_min.freq; -user_para-turbo_enabled= sys_para-turbo_enabled; +user_para-cpuinfo_cur_freq = sys_para-cpuinfo_cur_freq; +user_para-cpuinfo_max_freq = sys_para-cpuinfo_max_freq; +user_para-cpuinfo_min_freq = sys_para-cpuinfo_min_freq; +user_para-scaling_cur_freq = sys_para-scaling_cur_freq; +user_para-scaling_max.freq = sys_para-scaling_max.freq; +user_para-scaling_min.freq = sys_para-scaling_min.freq; +user_para-scaling_max.pct = 
sys_para-scaling_max.pct; +user_para-scaling_min.pct = sys_para-scaling_min.pct; +user_para-scaling_turbo_pct= sys_para-scaling_turbo_pct; +user_para-turbo_enabled= sys_para-turbo_enabled; memcpy(user_para-scaling_driver, sys_para-scaling_driver, CPUFREQ_NAME_LEN); diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c index 2f9bd8e..ea6a32f 100644 --- a/tools/misc/xenpm.c +++ b/tools/misc/xenpm.c @@ -33,6 +33,11 @@ #define MAX_CORE_RESIDENCIES 8 #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0])) +#define min_t(type,x,y) \ +({ type __x = (x); type __y = (y); __x __y ? __x: __y; }) +#define max_t(type,x,y) \ +({ type __x = (x); type __y = (y); __x __y ? __x: __y; }) +#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) static xc_interface *xc_handle; static unsigned int max_cpu_nr; @@ -47,6 +52,9 @@ void show_help(void) get-cpuidle-states[cpuid] list cpu idle info of CPU cpuid or all\n get-cpufreq-states[cpuid] list cpu freq info of CPU cpuid or all\n get-cpufreq-para [cpuid] list cpu freq parameter of CPU cpuid or all\n + set-scaling-max-pct [cpuid] num set max performance limit in percentage\n + or as scaling speed in percentage in userspace governor\n + set-scaling-min-pct [cpuid] num set min performance limit in percentage\n set-scaling-maxfreq [cpuid] HZ set max cpu frequency HZ on CPU cpuid\n or all CPUs\n set-scaling-minfreq [cpuid] HZ set min cpu frequency HZ on CPU cpuid\n @@ -60,10 +68,10 @@ void show_help(void) set-up-threshold [cpuid] num set up threshold on CPU cpuid or all\n it is used in ondemand governor.\n get-cpu-topologyget thread/core/socket topology info\n - set-sched-smt enable|disable enable/disable scheduler smt power saving\n + set-sched-smt enable|disable enable/disable scheduler smt power saving\n set-vcpu-migration-delay num set scheduler vcpu migration delay in us\n get-vcpu-migration-delayget scheduler vcpu migration delay\n - set-max-cstatenum set the C-State limitation (num = 0)\n + set-max-cstatenum set
Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy
On Tue, Jun 23, 2015 at 05:57:22PM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM resources and the corresponding policies:
>
> Global RDM parameter:
>     rdm = type=none/host,reserve=strict/relaxed
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
>
> The global RDM parameter, type, allows the user to specify reserved regions explicitly, e.g. using 'host' to include all reserved regions reported on this platform, which is good for handling the hotplug scenario. In the future this parameter may be further extended to allow specifying random regions, e.g. even those belonging to another platform as a preparation for live migration with passthrough devices. 'none', by contrast, means we do nothing with any reserved regions and ignore all policies, so the guest works as before.
>
> The 'strict/relaxed' policy decides how to handle a conflict when reserving RDM regions in pfn space. If a conflict exists, 'strict' means an immediate error so the VM will be killed, while 'relaxed' allows moving forward with a warning message thrown out.
>
> The default per-device RDM policy is 'strict', while the default global RDM policy is 'relaxed'. When both policies are specified on a given region, 'strict' is always preferred.
>
> CC: Ian Jackson ian.jack...@eu.citrix.com
> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
> CC: Ian Campbell ian.campb...@citrix.com
> CC: Wei Liu wei.l...@citrix.com
> Signed-off-by: Tiejun Chen tiejun.c...@intel.com

The code looks good to me. I will wait for native English speakers to have a look at the docs.

Wei.
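The policy rules in the quoted commit message can be sketched as two small functions. This is only a model of the stated behaviour ('strict' wins when both are given; 'strict' fails on conflict, 'relaxed' warns and continues); `enum rdm_policy`, `rdm_effective_policy` and `rdm_handle_conflict` are hypothetical names, not libxl's.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the reserve=strict/relaxed setting. */
enum rdm_policy { RDM_RELAXED, RDM_STRICT };

/* When both a global and a per-device policy apply, strict is preferred. */
static enum rdm_policy rdm_effective_policy(enum rdm_policy global,
                                            enum rdm_policy per_device)
{
    return (global == RDM_STRICT || per_device == RDM_STRICT)
               ? RDM_STRICT : RDM_RELAXED;
}

/* Returns 0 if domain setup may continue, -1 if it must fail. */
static int rdm_handle_conflict(enum rdm_policy policy, bool conflict)
{
    if (!conflict)
        return 0;
    if (policy == RDM_STRICT)
        return -1;                      /* immediate error */
    fprintf(stderr, "warning: RDM conflict ignored (relaxed policy)\n");
    return 0;                           /* relaxed: warn and move on */
}
```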
Re: [Xen-devel] [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy
On Tue, Jun 23, 2015 at 05:57:23PM +0800, Tiejun Chen wrote:
> This patch passes our rdm reservation policy inside libxl when we assign a device or attach a device.
>
> CC: Ian Jackson ian.jack...@eu.citrix.com
> CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
> CC: Ian Campbell ian.campb...@citrix.com
> CC: Wei Liu wei.l...@citrix.com
> Signed-off-by: Tiejun Chen tiejun.c...@intel.com

The code looks good to me. I will wait for native English speakers to have a look at the docs.

Wei.
Re: [Xen-devel] [PATCH 0/2] xen: Allow xen tools to run in guest using 64K page granularity
On Thu, Jun 25, 2015 at 12:23:26PM +0100, Ian Campbell wrote:
> On Thu, 2015-06-25 at 11:21 +0100, Wei Liu wrote:
> > On Mon, May 11, 2015 at 12:55:34PM +0100, Julien Grall wrote:
> > > Hi all,
> > >
> > > This small series are the only changes required in Xen in order to run a guest using 64K page granularity on top of an unmodified Xen.
> > >
> > > I'd like feedback from maintainers tools to know if it might be worth to introduce a function xc_pagesize() replicating the behavior of getpagesize() for Xen.
> >
> > Can we start with documenting the ABI (?) for communicating between guests with different page sizes?
>
> We should certainly make it clearer what things are in terms of Xen ABI page size vs the guest's page size and other things. I think we can commit these two without that though?

It worries me a bit due to the lack of documentation, though I have a hunch these patches are correct. Saying that Xen always uses an XC_PAGE_SIZE page for the store and console mfn is good enough.

Wei.

> > Or at least mention the ring mfn always has the size of XC_PAGE_SIZE (if that's the case).
> >
> > Wei.
> >
> > > Sincerely yours,
> > >
> > > Julien Grall (2):
> > >   tools/xenstored: Use XC_PAGE_SIZE rather than getpagesize()
> > >   tools/xenconsoled: Use XC_PAGE_SIZE rather than getpagesize()
> > >
> > >  tools/console/daemon/io.c         | 4 ++--
> > >  tools/xenstore/xenstored_domain.c | 4 ++--
> > >  2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> > > --
> > > 2.1.4
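The distinction under discussion (ABI page size versus the page size of the OS running the tools) can be sketched as follows. `ABI_PAGE_SIZE` is a local stand-in assumed here to be 4 KiB, mirroring what XC_PAGE_SIZE is understood to be; `ring_size` and `abi_frames_per_os_page` are hypothetical helpers, not libxc functions.

```c
#include <stddef.h>

/* Assumption: the Xen ABI page is 4 KiB regardless of the OS page size. */
#define ABI_PAGE_SHIFT 12
#define ABI_PAGE_SIZE  ((size_t)1 << ABI_PAGE_SHIFT)

/* A shared ring occupies one ABI page, whatever getpagesize() reports. */
static size_t ring_size(void)
{
    return ABI_PAGE_SIZE;
}

/*
 * Number of ABI-sized frames contained in one OS page.
 * Returns 0 if the OS page size is not a multiple of the ABI page size.
 */
static size_t abi_frames_per_os_page(size_t os_page_size)
{
    if (os_page_size < ABI_PAGE_SIZE || os_page_size % ABI_PAGE_SIZE != 0)
        return 0;
    return os_page_size / ABI_PAGE_SIZE;
}
```

On a kernel with 64K pages, code that sized the ring with getpagesize() would assume 65536 bytes, sixteen times the one-ABI-page ring this sketch models, which is why the two patches switch to the fixed constant.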
Re: [Xen-devel] [PATCH v4 06/11] x86/intel_pstate: APERF/MPERF feature detect
On 25.06.15 at 13:16, wei.w.w...@intel.com wrote:
> Add support to detect the APERF/MPERF feature. Also, remove the identical code in cpufreq.c and powernow.c.
>
> v4 changes:
> 1) this is a new consolidated patch dealing with the APERF/MPERF feature detection.
>
> Signed-off-by: Wei Wang wei.w.w...@intel.com

I would have taken this right away, if only it had been at the beginning of the series (or stated that it's independent of the earlier patches) and, more importantly, ...

> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -238,6 +238,9 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
>      if ( cpu_has(c, X86_FEATURE_CLFLSH) )
>          c->x86_clflush_size = ((ebx >> 8) & 0xff) * 8;
>
> +    if (cpuid_ecx(6) & 0x1)
> +        set_bit(X86_FEATURE_APERFMPERF, c->x86_capability);

... if you hadn't used this plain 0x1 here when _both_ of the old code pieces nicely used CPUID_6_ECX_APERFMPERF_CAPABILITY. Bonus points for also giving a sensible name to leaf 6 and naming its other bits code in the tree already uses (see CPUID_MWAIT_LEAF). And of course you should check ->cpuid_level first.

Jan
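The three review points (named bit, named leaf, cpuid_level check) could look roughly like this. It is a stand-alone model, not the Xen code: `struct cpu_model`, `CPUID_PM_LEAF`, `detect_aperfmperf` and `fake_cpuid_ecx` are invented for the sketch, and only the constant name CPUID_6_ECX_APERFMPERF_CAPABILITY comes from the review itself (its value is taken to be the 0x1 used in the patch).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_PM_LEAF                      6    /* hypothetical name for leaf 6 */
#define CPUID_6_ECX_APERFMPERF_CAPABILITY  0x1  /* bit tested in the patch */

/* Minimal stand-in for the fields of cpuinfo_x86 that matter here. */
struct cpu_model {
    uint32_t cpuid_level;    /* highest supported basic CPUID leaf */
    bool has_aperfmperf;
};

/* CPUID access is injected so the sketch runs on any host. */
typedef uint32_t (*cpuid_ecx_fn)(uint32_t leaf);

static void detect_aperfmperf(struct cpu_model *c, cpuid_ecx_fn cpuid_ecx)
{
    c->has_aperfmperf = false;

    /* Do not query a leaf the CPU does not report as supported. */
    if (c->cpuid_level < CPUID_PM_LEAF)
        return;

    if (cpuid_ecx(CPUID_PM_LEAF) & CPUID_6_ECX_APERFMPERF_CAPABILITY)
        c->has_aperfmperf = true;
}

/* Fake CPUID for demonstration: leaf 6 reports the APERF/MPERF bit. */
static uint32_t fake_cpuid_ecx(uint32_t leaf)
{
    return leaf == CPUID_PM_LEAF ? CPUID_6_ECX_APERFMPERF_CAPABILITY : 0;
}
```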
Re: [Xen-devel] [PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2
On Thu, 2015-06-25 at 11:33 +0100, Ian Jackson wrote:
> Ian Campbell writes ("[PATCH OSSTEST v3 21/22] Debian: Arrange to be able to chainload a xen.efi from grub2"):
> > Note that the 20_linux_xen change here is a bit specific to us and not really generic enough to go upstream IMHO, hence I haven't.
>
> So if we accept this patch, we are committing to always having 20_linux_xen (and perhaps updating it to cope with new versions of grub). Originally having this file in osstest was intended as a stopgap, pending inclusion of a suitable file upstream.

I originally considered writing NN_osstest_uefi, but it looked like it was going to involve copying a fair bit of boilerplate from 20_linux_xen. However, I've changed the approach I was using since then and now I suspect there wouldn't actually be much duplication. So unless you think otherwise I'll try that for next time around.

> Is there some upstream-friendly way of achieving the same thing ?

Not AFAIK. I could try upstreaming this but given that a) the user still needs to manually copy things to the ESP and create a suitable xen.cfg and b) people are working on a better way which will just work with the existing non-UEFI grub.cfg file entries, I'm not sure how much point there is.

> I'm not really sure what is `specific to us' (or what `us' here means - osstest, or Xen on arm64, or ...?)

All the paths are basically specific to us, just the general shape of the entry is more generically applicable. http://wiki.xen.org/wiki/Xen_EFI already documents how to do things FWIW.

Ian.
Re: [Xen-devel] [PATCH v2] libxl: Add AHCI support for upstream qemu
On Thu, 25 Jun 2015, Fabio Fantoni wrote:
> On 25/06/2015 12:21, Ian Campbell wrote:
> > On Tue, 2015-06-23 at 11:15 +0200, Fabio Fantoni wrote:
> > > Usage: ahci=0|1 (default=0)
> >
> > I think a global rather than per disk option is OK (I can't think why a user would want to mix and match) but maybe we should consider using an enum (with values ide and ahci, defaulting to ide in libxl) so that we can add support for whatever fancy new disk controller everyone is using in 5 years time?
>
> ahci was added 4 years ago in qemu and I don't know of any newer similar technology. If we use an enum, it should probably be more generic to allow for more future possibilities, or am I wrong? In that case, what should the name be? @stabellini and other developers: any advice about this?

I don't know of any other block technologies that would use hd as block device names. Virtio-blk uses vd, so it couldn't be confused. However, for the sake of being future proof, it might make sense to introduce an enum, maybe something like hdtype?

enum hdtype {
    ide,
    ahci,
}

then in the config file:

hdtype=ahci
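How such an option might be consumed can be sketched as a tiny parser. This is an illustration only: the enumerator names `HDTYPE_IDE`/`HDTYPE_AHCI` and the function `parse_hdtype` are hypothetical, and libxl's real types are generated from its IDL rather than written by hand like this. The default of ide when the option is absent follows Ian Campbell's suggestion quoted above.

```c
#include <string.h>

/* Hypothetical C-side view of the proposed hdtype option. */
enum hdtype {
    HDTYPE_IDE,   /* default, keeps existing guests unchanged */
    HDTYPE_AHCI
};

/*
 * Parse the value of "hdtype=..." from a config file.
 * A missing option (s == NULL) selects the IDE default.
 * Returns 0 on success, -1 for an unknown controller name.
 */
static int parse_hdtype(const char *s, enum hdtype *out)
{
    if (s == NULL || strcmp(s, "ide") == 0) {
        *out = HDTYPE_IDE;
        return 0;
    }
    if (strcmp(s, "ahci") == 0) {
        *out = HDTYPE_AHCI;
        return 0;
    }
    return -1;
}
```

Adding a future controller type would then mean one new enumerator and one new string comparison, rather than another boolean option alongside ahci=0|1.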