[Xen-devel] [PATCH v2] libxl/libxl_qmp.c: fix qmp_open

2015-03-18 Thread PRAMOD DEVENDRA
From: Pramod Devendra 

Signed-off-by: Pramod Devendra 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
---
Changed since v1:
1. Make sure sun_path does not overflow.
2. Close qmp_fd on error.
---
 tools/libxl/libxl_qmp.c |   26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index c7324e6..316a93f 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -357,22 +357,32 @@ static libxl__qmp_handler *qmp_init_handler(libxl__gc 
*gc, uint32_t domid)
 static int qmp_open(libxl__qmp_handler *qmp, const char *qmp_socket_path,
 int timeout)
 {
-int ret;
+int ret = -1;
 int i = 0;
 
 qmp->qmp_fd = socket(AF_UNIX, SOCK_STREAM, 0);
 if (qmp->qmp_fd < 0) {
-return -1;
+goto out;
 }
 ret = libxl_fd_set_nonblock(qmp->ctx, qmp->qmp_fd, 1);
-if (ret) return -1;
+if (ret) {
+ret = -1;
+goto out;
+}
 ret = libxl_fd_set_cloexec(qmp->ctx, qmp->qmp_fd, 1);
-if (ret) return -1;
+if (ret) {
+ret = -1;
+goto out;
+}
 
+if (sizeof (qmp->addr.sun_path) <= strlen(qmp_socket_path)) {
+ret = -1;
+goto out;
+}
 memset(&qmp->addr, 0, sizeof (qmp->addr));
 qmp->addr.sun_family = AF_UNIX;
 strncpy(qmp->addr.sun_path, qmp_socket_path,
-sizeof (qmp->addr.sun_path));
+sizeof (qmp->addr.sun_path)-1);
 
 do {
 ret = connect(qmp->qmp_fd, (struct sockaddr *) &qmp->addr,
@@ -384,9 +394,13 @@ static int qmp_open(libxl__qmp_handler *qmp, const char 
*qmp_socket_path,
  * ECONNREFUSED : Leftover socket hasn't been removed yet */
 continue;
 }
-return -1;
+ret = -1;
+goto out;
 } while ((++i / 5 <= timeout) && (usleep(200 * 1000) <= 0));
 
+out:
+if (ret == -1 && qmp->qmp_fd > -1) close(qmp->qmp_fd);
+
 return ret;
 }
 
-- 
1.7.10.4



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] tools/libxl: close the logfile_w and null file descriptors in libxl__spawn_qdisk_backend() error path

2015-03-18 Thread Koushik Chakravarty
Signed-off-by: Koushik Chakravarty 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
---
 tools/libxl/libxl_dm.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index cb006df..2678be2 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1508,7 +1508,7 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, 
libxl__dm_spawn_state *dmss)
 flexarray_t *dm_args;
 char **args;
 const char *dm;
-int logfile_w, null, rc;
+int logfile_w, null = -1, rc;
 uint32_t domid = dmss->guest_domid;
 
 /* Always use qemu-xen as device model */
@@ -1534,6 +1534,10 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, 
libxl__dm_spawn_state *dmss)
 goto error;
 }
 null = open("/dev/null", O_RDONLY);
+if (null < 0) {
+   rc = ERROR_FAIL;
+   goto error;
+}
 
 dmss->guest_config = NULL;
 /*
@@ -1568,6 +1572,8 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, 
libxl__dm_spawn_state *dmss)
 
 error:
 assert(rc);
+if (logfile_w >= 0) close(logfile_w);
+if (null >= 0) close(null);
 dmss->callback(egc, dmss, rc);
 return;
 }
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain

2015-03-18 Thread openlui
Hi, all:

I am trying to use a HVM with PCI pass-through NIC as network driver domain. 
However, when I send packets whose size are larger than 128 bytes from DomU 
using pkt-gen tools, after several seconds, the network between driver domain 
and destination host will be blocked. 

The networking structure when testing is shown below:
Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <---> VIF (in Driver Domain) <--> 
OVS (in Driver Domain) <--> pNIC (passthrough nic in Driver Domain) <---> 
Another Host

The summarized results are as follows:
1. When we just ping from DomU to another host, the network seems ok.
2. When sending 64 or 128 bytes UDP packets from DomU, the network will not be 
blocked
3. When sending 256, 1024 or 1400 bytes UDP packets from DomU, and if the 
scatter-gather feature of passthrough NIC in driver domain is on, the network 
will be blocked
4. When sending 256, 1024 or 1400 bytes UDP packets from DomU, and only if the 
scatter-gather feature of passthrough NIC in driver domain is off, the network 
will not be blocked

As shown in detailed syslog below, when network is blocked, it seems that the 
passthrough NIC's driver entry an exception state and the tx queue is hung.
As far as I know, when sending 64 or 128 bytes package, the skb generated by 
netback only has the linearized data, and the data is stored in the PAGE 
allocated from the driver domain's memory. But for packets whose size is larger 
than 128 bytes, the skb will also has a frag page which is grant mapped from 
DomU's memory. And if we disable the scatter-gather feature of NIC, the skb 
sent from netback will be linearized firstly, and it will make the skb's data 
is stored in the PAGE allocated from the driver domain other than the DomU's 
memory.

I am wondering if it is the problem caused by PCI-passthrough and DMA 
operations, or if there is some wrong configuration in our environment. How can 
I continue to debug this problem? I am looking forward to your replay and 
advice, Thanks.


The environment we used are as follows:
a. Dom0: SUSE 12 (kernel: 3.12.28)
b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
c. DomU: kernel 3.17.4
d. Driver Domain: kernel 3.17.8
e. OVS: 2.1.2
f. Host: Huawei RH2288, CPU Intel Xenon E5645@2.40GHz, disabled HyperThread, 
enabled VT-d
g. pNIC: we tried Intel 82599 10GE NIC (ixgbe v3.23.2), Intel 82576 1GE NIC 
(igb) and Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
h. para-virtulization driver: netfront/netback
i. MTU: 1500

The detailed Logs in Driver Domain after the network is blocked are as follows:
1. When using 82599 10GE NIC, syslog and dmesg includes infos below. The log 
shows that the Tx unit Hang is detected and driver will try to reset the 
adapter repeatly, however, the network is still blocked.


ixgbe: :00:04.0 eth10: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe: :00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
ixgbe: :00:04.0 eth10: Reset adapter
ixgbe: :00:04.0 eth10: PCIe transaction pending bit also did not clear
ixgbe: :00:04.0 master disable timed out
ixgbe: :00:04.0 eth10: detected SFP+: 3
ixgbe: :00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
...


I have tried to remove the "reset adpater" call in ixgbe driver's 
ndo_tx_timeout function, and the logs are shown below. The log shows that when 
network is blocked, the "TDH" and the nic cannot be incremented any more.


ixgbe :00:04.0 eth3: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe :00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp   <1075b74ca>
jiffies  <1075b791c>
ixgbe :00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
ixgbe :00:04.0 eth3: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe :00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp   <1075b74ca>
jiffies  <1075b7b11>
...


I have also compared the nic's corresponding pci status before and after the 
network is hung, and found that the "DevSta" filed changed from "TransPend-" to 
"TransPend+" after the network is blocked:


DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+


The network can only be recovered after we reload the ixgbe module in driver 
domain.

2. When using BCM5709 NIC, the results is smiliar. After the network is 
blocked, the syslog has info below:


bnx2 :00:04.0 eth14: <--- start FTQ dump --->
bnx2 :00:04.0 eth14: RV2P_PFTQ_CTL 0001
bnx2 :00:04.0 eth14: RV2P_TFTQ_CTL 0002
...
bnx2 :00:04.0 eth14: CP_CPQ_FTQ_CTL 4000
bnx2 :00:04.0 eth14: CPU states:
bnx2 :00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 
8001280 pc 8001288 instr 8e03
...
bnx2 :00:0

Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN

2015-03-18 Thread Wu, Feng
Thanks for the comments!

> -Original Message-
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
> Sent: Thursday, March 19, 2015 12:10 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; Keir Fraser
> (k...@xen.org); Jan Beulich (jbeul...@suse.com)
> Subject: Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN
> 
> On Wed, Mar 18, 2015 at 12:44:21PM +, Wu, Feng wrote:
> > VT-d Posted-interrupt (PI) design for XEN
> >
> > Background
> > ==
> > With the development of virtualization, there are more and more device
> > assignment requirements. However, today when a VM is running with
> > assigned devices (such as, NIC), external interrupt handling for the 
> > assigned
> > devices always needs VMM intervention.
> >
> > VT-d Posted-interrupt is a more enhanced method to handle interrupts
> > in the virtualization environment. Interrupt posting is the process by
> > which an interrupt request is recorded in a memory-resident
> > posted-interrupt-descriptor structure by the root-complex, followed by
> > an optional notification event issued to the CPU complex.
> >
> > With VT-d Posted-interrupt we can get the following advantages:
> > - Direct delivery of external interrupts to running vCPUs without VMM
> > intervention
> 
> 
> I hadn't digged deep in what Xen has currently - but I would assume that
> this is exactly what we have now in Xen?

Here is what Xen currently does for external interrupts from assigned devices:

When a VM is running and an external interrupts from an assigned devices occurs
for it. VM-EXIT happens, then:

vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() --> 
raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)

softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()

dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() 
-->
vmsi_inj_irq() --> vlapic_set_irq()

vlapic_set_irq() does the following things:
1. If CPU-side posted-interrupt is supported (I think it is supported from Xen 
4.3, or Xen 4.4,
sorry, not quite remember the exact version), call vmx_deliver_posted_intr() to 
deliver
the virtual interrupt via posted-interrupt infrastructure.
2. Else If CPU-side posted-interrupt is not supported, set the related vIRR in 
vLAPIC
page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, 
vmx_intr_assist()
will help to inject the interrupt to guests.

However, after VT-d PI is supported, when a guest is running in non-root and an
external interrupt from an assigned device occurs for it. _no_ VM-Exit is 
needed,
the guest can handle this totally in non-root mode, thus avoiding all the above
code flow.

> 
> Hm, actually we seem to be still invoking the hypervisor on the
> interrupts  -except that if we need to dispatch it to another CPU
> using an normal vector to do so - which would still cause the
> hypervisor to be invoked? Or does it actually go straight in the
> guest?
> 

Like what I mentioned above, If the guest is running, we don't need invoke 
hypervisor.

> So what kind of support do we currently have in Xen from posted
> interrupt? Could you add a bit about this in the background please?

Good suggestion.

Currently, Xen only supports the CPU-side posted-interrupt. Like what I 
mentioned above,
function vlapic_set_irq() can use this to deliver virtual interrupts, basically 
there are several
methods to deliver virtual interrupts to guests:
- Event delivery before VM-Entry via __vmx_inject_exception(), this is the 
oldest way.
- After APICv was enabled, we had hardware support for virtual interrupt 
delivery, virtual
interrupts are stored in virtual LAPIC page, after VM-Entry, guests can 
evaluate these
virtual interrupt and handle them in non-root mode.
- As an enhancement to APICv, CPU-side posted-interrupt was introduced, like 
above comments,
with this new feature, we don't need to kick the vCPU and deliver the virtual 
interrupts
direct to it.

About APICv and CPU-side Posted-interrupt, please refer to Chapter 29, and 
Section 29.6 in the Intel SDM: 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

> 
> > - Decrease the interrupt migration complexity. On vCPU migration, software
> > can atomically co-migrate all interrupts targeting the migrating vCPU. For
> > virtual machines with assigned devices, migrating a vCPU across pCPUs
> > either incur the overhead of forwarding interrupts in software (e.g. via VMM
> > generated IPIS), or complexity to independently migrate each interrupt
> targeting
> > the vCPU to the new pCPU. However, after enabling VT-d PI, the destination
> vCPU
> > of an external interrupt from assigned devices is stored in the IRTE (i.e.
> > Posted-interrupt Descriptor Address), when vCPU is migrated to another
> pCPU,
> > we will set this new pCPU in the 'NDST' filed of Posted-interrupt 
> > descriptor,
> this
> > make the interrupt migration

[Xen-devel] for test - please ignore

2015-03-18 Thread Wu, Feng
For test, please ignore.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 9/9] qspinlock, x86, kvm: Implement KVM support for paravirt qspinlock

2015-03-18 Thread Waiman Long

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:

Implement the paravirt qspinlock for x86-kvm.

We use the regular paravirt call patching to switch between:

   native_queue_spin_lock_slowpath()__pv_queue_spin_lock_slowpath()
   native_queue_spin_unlock()   __pv_queue_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
"movb $0,%arg1" if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.


Signed-off-by: Peter Zijlstra (Intel)


I do have some concern about this call site patching mechanism as the 
modification is not atomic. The spin_unlock() calls are in many places 
in the kernel. There is a possibility that a thread is calling a certain 
spin_unlock call site while it is being patched by another one with the 
alternative() function call.


So far, I don't see any problem with bare metal where 
paravirt_patch_insns() is used to patch it to the move instruction. 
However, in a virtual guest enivornment where paravirt_patch_call() was 
used, there were situations where the system panic because of page fault 
on some invalid memory in the kthread. If you look at the 
paravirt_patch_call(), you will see:


:
b->opcode = 0xe8; /* call */
b->delta = delta;

If another CPU reads the instruction at the call site at the right 
moment, it will get the modified call instruction, but not the new delta 
value. It will then jump to a random location. I believe that was 
causing the system panic that I saw.


So I think it is kind of risky to use it here unless we can guarantee 
that call site patching is atomic wrt other CPUs.


-Longman

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN

2015-03-18 Thread Zhang, Yang Z
Konrad Rzeszutek Wilk wrote on 2015-03-19:
> On Wed, Mar 18, 2015 at 12:44:21PM +, Wu, Feng wrote:
>> VT-d Posted-interrupt (PI) design for XEN
>> 
>> Background
>> ==
>> With the development of virtualization, there are more and more
>> device assignment requirements. However, today when a VM is running
>> with assigned devices (such as, NIC), external interrupt handling
>> for the assigned devices always needs VMM intervention.
>> 
>> VT-d Posted-interrupt is a more enhanced method to handle interrupts
>> in the virtualization environment. Interrupt posting is the process
>> by which an interrupt request is recorded in a memory-resident
>> posted-interrupt-descriptor structure by the root-complex, followed
>> by an optional notification event issued to the CPU complex.
>> 
>> With VT-d Posted-interrupt we can get the following advantages:
>> - Direct delivery of external interrupts to running vCPUs without
>> VMM intervention
> 
> 
> I hadn't digged deep in what Xen has currently - but I would assume
> that this is exactly what we have now in Xen?
> 
> Hm, actually we seem to be still invoking the hypervisor on the
> interrupts -except that if we need to dispatch it to another CPU using
> an normal vector to do so - which would still cause the hypervisor to
> be invoked? Or does it actually go straight in the guest?
> 
> So what kind of support do we currently have in Xen from posted interrupt?
> Could you add a bit about this in the background please?

All virtual interrupts are delivered through CPU side posted interrupt 
regardless the VT-d side PI supporting. The difference is: W/o VT-side PI 
supporting, for the interrupt of assigned device, we deliver it to another 
CPU(different from the CPU which target vcpu is running) and then use PI to 
deliver it to eliminate the vmexit. With VT-d side PI supporting, the interrupt 
is able to be delivered to guest directly without any other CPU's involvement 
and vmexit. 

Is it clear?

> 


Best regards,
Yang


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [qemu-upstream-4.3-testing test] 36518: tolerable FAIL - PUSHED

2015-03-18 Thread xen . org
flight 36518 qemu-upstream-4.3-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36518/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install  fail never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 qemuuab689a89ec47b2e1c964c57bea7da68f8ddf89fd
baseline version:
 qemuu580b1d06aa3eed3ae9c12b4225a1ea1c192ab119


People who touched revisions under test:
  Andreas Färber 
  Anthony Liguori 
  Asias He 
  Benoit Canet 
  Benoît Canet 
  Gerd Hoffmann 
  Juan Quintela 
  Kevin Wolf 
  Michael Roth 
  Michael S. Tsirkin 
  Paolo Bonzini 
  Peter Maydell 
  Petr Matousek 
  Stefan Hajnoczi 
  Stefano Stabellini 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-amd64-xl-credit2  pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-libvirt pass
 test-amd64-i386-libvirt  fail
 test-amd64-amd64-xl-multivcpupass
 test-amd64-amd64-pairpass
 test-amd64-i386-pair pass
 test-amd64-amd64-xl-sedf-pin pass
 test-amd64-amd64-pv  pass
 test-amd64-i386-pv   pass
 test-amd64-amd64-xl-sedf pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail

Re: [Xen-devel] [v2][PATCH 2/2] libxl: introduce gfx_passthru_kind

2015-03-18 Thread Chen, Tiejun

This duplicates the code from above. I think this would be best done as:

static int libxl__detect_gfx_passthru_kind(libxl__gc *gc, guest_config)
{
 if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT)
 return 0;

 if (libxl__is_igd_vga_passthru(gc, guest_config)) {
 b_info->u.hvm.gfx_passthru_kind = LIBXL_GFX_PASSTHRU_KIND_IGD;
 return 0;
 }

 LOG(ERROR, "Unable to detect graphics passthru kind");
 return 1;
}

Then for the code in libxl__build_device_model_args_new:

  if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
  if (!libxl__detect_gfx_passthru_kind(gc, guest_config))
   return NULL
  switch (b_info->u.hvm.gfx_passthru_kind) {
  case LIBXL_GFX_PASSTHRU_KIND_IGD:
  machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
  break;
  default:
  LOG(ERROR, "unknown gfx_passthru_kind\n");
 return NULL;
  }
 }

That is, a helper to try and autodetect kind if it is default and then a
single switch entry for each kind.


+default:
+LOG(WARN, "gfx_passthru_kind is invalid so ignored.\n");


Please return an error here, as I've shown above.


Looks good and thanks, but here 'guest_config' is a const so we 
shouldn't/can't reset b_info->u.hvm.gfx_passthru_kind like this,


b_info->u.hvm.gfx_passthru_kind = LIBXL_GFX_PASSTHRU_KIND_IGD;

So I tried to refactor a little bit to follow up yours,

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..605b17c 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -409,6 +409,23 @@ static char *dm_spice_options(libxl__gc *gc,
 return opt;
 }

+static int
+libxl__detect_gfx_passthru_kind(libxl__gc *gc,
+const libxl_domain_config *guest_config)
+{
+const libxl_domain_build_info *b_info = &guest_config->b_info;
+
+if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT)
+return b_info->u.hvm.gfx_passthru_kind;
+
+if (libxl__is_igd_vga_passthru(gc, guest_config)) {
+return LIBXL_GFX_PASSTHRU_KIND_IGD;
+}
+
+LOG(ERROR, "Unable to detect graphics passthru kind");
+return -1;
+}
+
 static char ** libxl__build_device_model_args_new(libxl__gc *gc,
 const char *dm, int guest_domid,
 const libxl_domain_config 
*guest_config,
@@ -427,7 +444,7 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,

 const char *keymap = dm_keymap(guest_config);
 char *machinearg;
 flexarray_t *dm_args;
-int i, connection, devid;
+int i, connection, devid, gfx_passthru_kind;
 uint64_t ram_size;
 const char *path, *chardev;

@@ -710,9 +727,6 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,

 flexarray_append(dm_args, "-net");
 flexarray_append(dm_args, "none");
 }
-if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
-flexarray_append(dm_args, "-gfx_passthru");
-}
 } else {
 if (!sdl && !vnc) {
 flexarray_append(dm_args, "-nographic");
@@ -757,6 +771,20 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,

 machinearg, max_ram_below_4g);
 }
 }
+
+if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
+gfx_passthru_kind = libxl__detect_gfx_passthru_kind(gc,
+ 
guest_config);

+switch (gfx_passthru_kind) {
+case LIBXL_GFX_PASSTHRU_KIND_IGD:
+machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
+break;
+default:
+LOG(ERROR, "unknown gfx_passthru_kind\n");
+return NULL;
+}
+}
+
 flexarray_append(dm_args, machinearg);
 for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; 
i++)

 flexarray_append(dm_args, b_info->extra_hvm[i]);




diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c97c62d..8912421 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3632,6 +3632,11 @@ static inline void
libxl__update_config_vtpm(libxl__gc *gc,
*/
   void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
   const libxl_bitmap *sptr);
+
+#ifdef LIBXL_HAVE_GFX_PASSTHRU_KIND


No need for this ifdef.


Removed.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms backen.

2015-03-18 Thread Xu, Quan


> -Original Message-
> From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
> Sent: Thursday, March 19, 2015 3:17 AM
> To: Xu, Quan; stefano.stabell...@eu.citrix.com; qemu-de...@nongnu.org;
> arm...@redhat.com; lcapitul...@redhat.com; aligu...@amazon.com;
> pbonz...@redhat.com; ebl...@redhat.com; kra...@redhat.com;
> meyer...@redhat.com; m...@tls.msk.ru; s...@weilnetz.de; wei.l...@citrix.com
> Cc: xen-devel@lists.xen.org
> Subject: Re: [PATCH v4 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms
> backen.
> 
> On 03/10/2015 08:14 AM, Quan Xu wrote:
> > This Patch provides the glue for the TPM_TIS(Qemu frontend) to Xen
> > stubdom vTPM domain that provides the actual TPM functionality. It
> > sends data and TPM commends with xen_vtpm_frontend. It is similar as
> > another two vTPM backens:
> >*vTPM passthrough backen Since QEMU 1.5.
> >*vTPM libtpms-based backen.
> >
> > Some details:
> > This part of the patch provides support for the spawning of a thread
> > that will interact with stubdom vTPM domain by the xen_vtpm_frontend.
> > It expects a signal from the frontend to wake and pick up the TPM
> > command that is supposed to be processed and delivers the response
> > packet using a callback function provided by the frontend.
> >
> > The backend connects itself to the frontend by filling out an
> > interface structure with pointers to the function implementing support
> > for various operations.
> >
> > (QEMU) vTPM XenStubdoms backen is initialized by Qemu command line
> options,
> >"-tpmdev xenstubdoms,id=xenvtpm0 -device
> tpm-tis,tpmdev=xenvtpm0"
> >
> > --Changes in v3:
> > -Call vtpm_send() and vtpm_recv() directly
> >
> > --Changes in v4:
> > -Fix the comment style
> >
> > Signed-off-by: Quan Xu 
> > ---
> >   hw/tpm/Makefile.objs |   2 +-
> >   hw/tpm/tpm_xenstubdoms.c | 247
> +++
> >   2 files changed, 248 insertions(+), 1 deletion(-)
> >   create mode 100644 hw/tpm/tpm_xenstubdoms.c
> >
> > diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs index
> > 57919fa..190e776 100644
> > --- a/hw/tpm/Makefile.objs
> > +++ b/hw/tpm/Makefile.objs
> > @@ -1,3 +1,3 @@
> >   common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
> >   common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
> > -common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o
> > +common-obj-$(CONFIG_TPM_XENSTUBDOMS) += tpm_xenstubdoms.o
> > +xen_vtpm_frontend.o
> > diff --git a/hw/tpm/tpm_xenstubdoms.c b/hw/tpm/tpm_xenstubdoms.c new
> > file mode 100644 index 000..6d0dc32
> > --- /dev/null
> > +++ b/hw/tpm/tpm_xenstubdoms.c
> > @@ -0,0 +1,247 @@
> > +/*
> > + * Xen Stubdom vTPM driver
> > + *
> > + *  Copyright (c) 2015 Intel Corporation
> > + *  Authors:
> > + *Quan Xu 
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see
> > +  */
> > +
> > +#include 
> > +#include "qemu-common.h"
> > +#include "qapi/error.h"
> > +#include "qemu/sockets.h"
> > +#include "qemu/log.h"
> > +#include "sysemu/tpm_backend.h"
> > +#include "tpm_int.h"
> > +#include "hw/hw.h"
> > +#include "hw/i386/pc.h"
> > +#include "hw/xen/xen_backend.h"
> > +#include "sysemu/tpm_backend_int.h"
> > +#include "tpm_tis.h"
> > +
> > +#ifdef DEBUG_TPM
> > +#define DPRINTF(fmt, ...) \
> > +do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) #else
> > +#define DPRINTF(fmt, ...) \
> > +do { } while (0)
> > +#endif
> > +
> > +#define TYPE_TPM_XENSTUBDOMS "tpm-xenstubdoms"
> > +#define TPM_XENSTUBDOMS(obj) \
> > +OBJECT_CHECK(TPMXenstubdomsState, (obj),
> TYPE_TPM_XENSTUBDOMS)
> > +
> > +static const TPMDriverOps tpm_xenstubdoms_driver;
> > +
> > +/* Data structures */
> > +typedef struct TPMXenstubdomsThreadParams {
> > +TPMState *tpm_state;
> > +TPMRecvDataCB *recv_data_callback;
> > +TPMBackend *tb;
> > +} TPMXenstubdomsThreadParams;
> > +
> > +struct TPMXenstubdomsState {
> > +TPMBackend parent;
> > +TPMBackendThread tbt;
> > +TPMXenstubdomsThreadParams tpm_thread_params;
> > +bool had_startup_error;
> > +};
> > +
> > +typedef struct TPMXenstubdomsState TPMXenstubdomsState;
> > +
> > +/* Functions */
> > +static void tpm_xenstubdoms_cancel_cmd(TPMBackend *tb);
> > +
> > +static int tpm_xenstubdoms_unix_transfer(const TPMLocality
> > +*locty_data) {
> > +size_t rlen;
> > +struct XenDevice

[Xen-devel] Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)

2015-03-18 Thread Marek Marczykowski-Górecki
Hi,

I've hit some deadlock in kernel xenstore client exposed via
/proc/xen/xenbus. Steps to reproduce are simple:
int main() {
struct xs_handle *xs;
xs = xs_open(0);
xs_watch(xs, "domid", "token");
xs_read(xs, 0, "name", NULL);
return 0;
}

xs_watch internally creates new thread, which uses read to wait for the
watch. And in the same time, the program tries to read some value,
but actually it hangs at sending the command (before even sending a path to be
read). Strace gives this (simplified for readability):
[pid  2494] write(3, "\4\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0", 160 = 16
[pid  2494] write(3, "domid\0", 6)  = 6
[pid  2494] write(3, "token\0", 6)  = 6
[pid  2495] read(3,  
[pid  2494] futex(0x71c0d4, FUTEX_WAIT_PRIVATE, 1, NULL 
[pid  2495] <... read resumed>
"\17\0\0\0\377\377\377\377\220~\255\27\f\0\0\0", 16) = 16
[pid  2495] read(3, "domid\0token\0", 12) = 12
[pid  2495] read(3, "\4\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0", 16) = 16
[pid  2495] read(3, "OK\0", 3)  = 3
[pid  2495] futex(0x71c0d4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x71c0d0,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} 
[pid  2494] <... futex resumed> )   = 0
[pid  2495] <... futex resumed> )   = 1
[pid  2494] futex(0x71c0a8, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid  2495] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 
[pid  2494] <... futex resumed> )   = -1 EAGAIN (Resource
temporarily unavailable)
[pid  2495] <... futex resumed> )   = 0
[pid  2494] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 
[pid  2495] read(3,  
[pid  2494] <... futex resumed> )   = 0
[pid  2494] rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER,
0x7fc78c1488f0}, NULL, 8) = 0
[pid  2494] rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER,
0x7fc78c1488f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc78c1488f0}, 8) = 0
[pid  2494] write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16

And thats all - 2494 is waiting on write, 2495 is waiting on read.

On 3.12.x it is working. On 3.17.0 and 3.18.7 it is broken. I haven't
checked versions in the middle.

Any ideas?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


pgpTtYH7GsIrQ.pgp
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxlu: avoid having two definitions of XLU_ConfigList

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 09:33:58PM +, Wei Liu wrote:
> There is already a typedef in libxlutil.h. Remove the one in
> libxlu_internal.h.
> 
> Signed-off-by: Wei Liu 

Missing the 'Reported-by: Konrad Rzeszutek Wilk '
> ---
>  tools/libxl/libxlu_internal.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
> index 3451cfe..0acdde3 100644
> --- a/tools/libxl/libxlu_internal.h
> +++ b/tools/libxl/libxlu_internal.h
> @@ -25,11 +25,11 @@
>  
>  #include "libxlutil.h"
>  
> -typedef struct XLU_ConfigList {
> +struct XLU_ConfigList {
>  int avalues; /* available slots */
>  int nvalues; /* actual occupied slots */
>  XLU_ConfigValue **values;
> -} XLU_ConfigList;
> +};
>  
>  typedef struct YYLTYPE
>  {
> -- 
> 1.9.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 10/13] libxc: If xc_domain_add_to_physmap fails, include errno value

2015-03-18 Thread Konrad Rzeszutek Wilk
Instead of just the return value.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by Ian Campbell 
---
 tools/libxc/xc_dom_x86.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index af0c9f4..3301f53 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -946,7 +946,8 @@ static int map_grant_table_frames(struct xc_dom_image *dom)
 }
 xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
  "%s: mapping grant tables failed " "(pfn=0x%" PRIpfn
- ", rc=%d)", __FUNCTION__, dom->p2m_size + i, rc);
+ ", rc=%d, errno=%d)", __FUNCTION__, dom->p2m_size + i,
+ rc, errno);
 return rc;
 }
 }
@@ -999,8 +1000,8 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
 if ( rc != 0 )
 {
 xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, "%s: mapping"
- " shared_info failed (pfn=0x%" PRIpfn ", rc=%d)",
- __FUNCTION__, dom->shared_info_pfn, rc);
+ " shared_info failed (pfn=0x%" PRIpfn ", rc=%d, 
errno: %d)",
+ __FUNCTION__, dom->shared_info_pfn, rc, errno);
 return rc;
 }
 
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 01/13] libxc: Replaces tabs with spaces in xc_cpupool_freeinfo

2015-03-18 Thread Konrad Rzeszutek Wilk
The goto looks very wrong when the rest of the code
has spaces.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_cpupool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_cpupool.c b/tools/libxc/xc_cpupool.c
index 6393cfb..828f234 100644
--- a/tools/libxc/xc_cpupool.c
+++ b/tools/libxc/xc_cpupool.c
@@ -190,11 +190,11 @@ xc_cpumap_t xc_cpupool_freeinfo(xc_interface *xch)
 err = do_sysctl_save(xch, &sysctl);
 
 if ( err < 0 )
-   goto out;
+goto out;
 
 cpumap = xc_cpumap_alloc(xch);
 if (cpumap == NULL)
-   goto out;
+goto out;
 
 memcpy(cpumap, local, mapsize);
 
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 06/13] libxc: Fix xc_pm API calls to return negative error and stash error in errno.

2015-03-18 Thread Konrad Rzeszutek Wilk
Oddly enough the user of this API did the right thing -
check for return being negative and used 'errno' for the
real error.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_pm.c | 54 +++--
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/tools/libxc/xc_pm.c b/tools/libxc/xc_pm.c
index e4e0fb9..5a7148e 100644
--- a/tools/libxc/xc_pm.c
+++ b/tools/libxc/xc_pm.c
@@ -51,8 +51,10 @@ int xc_pm_get_pxstat(xc_interface *xch, int cpuid, struct 
xc_px_stat *pxpt)
 int max_px, ret;
 
 if ( !pxpt->trans_pt || !pxpt->pt )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 if ( (ret = xc_pm_get_max_px(xch, cpuid, &max_px)) != 0)
 return ret;
 
@@ -219,8 +221,10 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 if ( (!user_para->affected_cpus)||
  (!user_para->scaling_available_frequencies)||
  (!user_para->scaling_available_governors) )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, affected_cpus) )
 goto unlock_1;
 if ( xc_hypercall_bounce_pre(xch, scaling_available_frequencies) )
@@ -293,8 +297,10 @@ int xc_set_cpufreq_gov(xc_interface *xch, int cpuid, char 
*govname)
 char *scaling_governor = sysctl.u.pm_op.u.set_gov.scaling_governor;
 
 if ( !xch || !govname )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = SET_CPUFREQ_GOV;
 sysctl.u.pm_op.cpuid = cpuid;
@@ -310,8 +316,10 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
 DECLARE_SYSCTL;
 
 if ( !xch )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = SET_CPUFREQ_PARA;
 sysctl.u.pm_op.cpuid = cpuid;
@@ -327,8 +335,10 @@ int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, 
int *avg_freq)
 DECLARE_SYSCTL;
 
 if ( !xch || !avg_freq )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = GET_CPUFREQ_AVGFREQ;
 sysctl.u.pm_op.cpuid = cpuid;
@@ -392,8 +402,10 @@ int xc_get_cpuidle_max_cstate(xc_interface *xch, uint32_t 
*value)
 DECLARE_SYSCTL;
 
 if ( !xch || !value )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = XEN_SYSCTL_pm_op_get_max_cstate;
 sysctl.u.pm_op.cpuid = 0;
@@ -409,8 +421,10 @@ int xc_set_cpuidle_max_cstate(xc_interface *xch, uint32_t 
value)
 DECLARE_SYSCTL;
 
 if ( !xch )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = XEN_SYSCTL_pm_op_set_max_cstate;
 sysctl.u.pm_op.cpuid = 0;
@@ -424,8 +438,10 @@ int xc_enable_turbo(xc_interface *xch, int cpuid)
 DECLARE_SYSCTL;
 
 if ( !xch )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = XEN_SYSCTL_pm_op_enable_turbo;
 sysctl.u.pm_op.cpuid = cpuid;
@@ -437,8 +453,10 @@ int xc_disable_turbo(xc_interface *xch, int cpuid)
 DECLARE_SYSCTL;
 
 if ( !xch )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 sysctl.cmd = XEN_SYSCTL_pm_op;
 sysctl.u.pm_op.cmd = XEN_SYSCTL_pm_op_disable_turbo;
 sysctl.u.pm_op.cpuid = cpuid;
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 03/13] libxc: Fix xc_domain_get_tsc_info to return -1 instead of -Exx.

2015-03-18 Thread Konrad Rzeszutek Wilk
We don't need to put fill errno because xc_hypercall_buffer_alloc
fills the errno with the appropriate errno values and we just
need to pass them up the stack.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 845d1d7..2fed727 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -771,7 +771,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
 
 info = xc_hypercall_buffer_alloc(xch, info, sizeof(*info));
 if ( info == NULL )
-return -ENOMEM;
+return -1;
 
 domctl.cmd = XEN_DOMCTL_gettscinfo;
 domctl.domain = (domid_t)domid;
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative return values

2015-03-18 Thread Konrad Rzeszutek Wilk
Instead of assuming everything is always OK. We stash
the gpfns value as an parameter.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxc/xc_core_arm.c| 17 ++---
 tools/libxc/xc_core_x86.c| 24 
 tools/libxc/xc_domain_save.c |  8 +++-
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
index 16508e7..26cec04 100644
--- a/tools/libxc/xc_core_arm.c
+++ b/tools/libxc/xc_core_arm.c
@@ -31,9 +31,16 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
*arch_ctxt,
 }
 
 
-static int nr_gpfns(xc_interface *xch, domid_t domid)
+static int nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)
 {
-return xc_domain_maximum_gpfn(xch, domid) + 1;
+int rc = xc_domain_maximum_gpfn(xch, domid);
+
+if ( rc >= 0 )
+{
+*gpfns = rc + 1;
+rc = 0;
+}
+return rc;
 }
 
 int
@@ -48,9 +55,13 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
xc_core_arch_context *unus
 xc_core_memory_map_t **mapp,
 unsigned int *nr_entries)
 {
-unsigned long p2m_size = nr_gpfns(xch, info->domid);
+unsigned long p2m_size = 0;
+int rc = nr_gpfns(xch, info->domid, &p2m_size);
 xc_core_memory_map_t *map;
 
+if ( rc < 0 )
+return -1;
+
 map = malloc(sizeof(*map));
 if ( map == NULL )
 {
diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
index d8846f1..02377e8 100644
--- a/tools/libxc/xc_core_x86.c
+++ b/tools/libxc/xc_core_x86.c
@@ -36,9 +36,16 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
*arch_ctxt,
 }
 
 
-static int nr_gpfns(xc_interface *xch, domid_t domid)
+static int nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)
 {
-return xc_domain_maximum_gpfn(xch, domid) + 1;
+int rc = xc_domain_maximum_gpfn(xch, domid);
+
+if ( rc >= 0 )
+{
+*gpfns = rc + 1;
+rc = 0;
+}
+return rc;
 }
 
 int
@@ -53,9 +60,13 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
xc_core_arch_context *unus
 xc_core_memory_map_t **mapp,
 unsigned int *nr_entries)
 {
-unsigned long p2m_size = nr_gpfns(xch, info->domid);
+unsigned long p2m_size = 0;
+int rc = nr_gpfns(xch, info->domid, &p2m_size);
 xc_core_memory_map_t *map;
 
+if ( rc < 0 )
+return -1;
+
 map = malloc(sizeof(*map));
 if ( map == NULL )
 {
@@ -88,7 +99,12 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct 
domain_info_context *dinfo, xc
 int err;
 int i;
 
-dinfo->p2m_size = nr_gpfns(xch, info->domid);
+err = nr_gpfns(xch, info->domid, &dinfo->p2m_size);
+if ( err < 0 )
+{
+ERROR("nr_gpfns returns errno: %d.", errno);
+goto out;
+}
 if ( dinfo->p2m_size < info->nr_pages  )
 {
 ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, 
info->nr_pages - 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..6346c12 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -939,7 +939,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom, uint32_t max_iter
 }
 
 /* Get the size of the P2M table */
-dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
+rc = xc_domain_maximum_gpfn(xch, dom);
+if ( rc < 0 )
+{
+ERROR("Could not get maximum GPFN!");
+goto out;
+}
+dinfo->p2m_size = rc + 1;
 
 if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
 {
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] Fix libxc return -E misusage.

2015-03-18 Thread Konrad Rzeszutek Wilk
Hey

Please see the following patches which fix an subset of
some of the various usages of return -Exx instead of
using -1 (and stashing in errno the error value).

We also clean up some cases where the errno is over-writen
- we want to bubble up the errnor that the underlaying
hypercall had done.

Lastly it also wraps an invisibility layer on xc_hypercall_bounce_*
so that any errors in those won't over-write the hypercall
ones (as we usually do hypercall and then xc_hypercall_bounce_post).

Compared to v1, the patches that have changed the most are:

 [PATCH v2 07/13] libxc: Fix xc_tmem_control to return proper error.
 [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative
 [PATCH v2 09/13] libxc: Check xc_maximum_ram_page for negative return
 [PATCH v2 11/13] libxc: Check xc_sharing_* for proper return values.

which have changed per Ian's review.

 tools/libxc/include/xenctrl.h  |  2 +-
 tools/libxc/xc_core_arm.c  | 17 +--
 tools/libxc/xc_core_x86.c  | 24 ---
 tools/libxc/xc_cpupool.c   |  4 +--
 tools/libxc/xc_dom_x86.c   |  7 +++--
 tools/libxc/xc_domain.c|  2 +-
 tools/libxc/xc_domain_save.c   |  8 -
 tools/libxc/xc_freebsd_osdep.c |  3 ++
 tools/libxc/xc_hcall_buf.c |  6 
 tools/libxc/xc_linux_osdep.c   |  3 ++
 tools/libxc/xc_offline_page.c  | 42 ++
 tools/libxc/xc_physdev.c   | 12 +---
 tools/libxc/xc_pm.c| 54 ++
 tools/libxc/xc_private.c   | 15 +++---
 tools/libxc/xc_tmem.c  | 14 ++---
 tools/libxc/xg_save_restore.h  |  3 +-
 tools/libxl/libxl.c|  4 +--
 tools/libxl/libxl_x86.c|  9 ++
 tools/misc/xen-hptool.c|  6 ++--
 tools/misc/xen-mfndump.c   |  9 +++---
 tools/tests/mem-sharing/memshrtool.c   | 12 ++--
 tools/xenstat/libxenstat/src/xenstat.c |  7 +++--
 22 files changed, 178 insertions(+), 85 deletions(-)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 02/13] libxc: Propagate errno from hypercall instead of anything else.

2015-03-18 Thread Konrad Rzeszutek Wilk
After we have done the hypercall - the errno has the failure
code. However our usage of pthread and munmap can trigger them
to manipulate the errno with their failure values. That would
be bad as what we care about is just the hypercall error value.

Another solution to this would be to save the 'errno' from
pthread/munmap/madvise as an extra parameter to be analyzed
later. However the call-sites above us do not care about it.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_freebsd_osdep.c | 3 +++
 tools/libxc/xc_hcall_buf.c | 6 ++
 tools/libxc/xc_linux_osdep.c   | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/tools/libxc/xc_freebsd_osdep.c b/tools/libxc/xc_freebsd_osdep.c
index 151d3bf..e6613ef 100644
--- a/tools/libxc/xc_freebsd_osdep.c
+++ b/tools/libxc/xc_freebsd_osdep.c
@@ -125,10 +125,13 @@ static void 
freebsd_privcmd_free_hypercall_buffer(xc_interface *xch,
   int npages)
 {
 
+int saved_errno = errno;
 /* Unlock pages */
 munlock(ptr, npages * XC_PAGE_SIZE);
 
 munmap(ptr, npages * XC_PAGE_SIZE);
+/* We MUST propagate the hypercall errno, not unmap call's. */
+errno = saved_errno;
 }
 
 static int freebsd_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h,
diff --git a/tools/libxc/xc_hcall_buf.c b/tools/libxc/xc_hcall_buf.c
index e762a93..932b47c 100644
--- a/tools/libxc/xc_hcall_buf.c
+++ b/tools/libxc/xc_hcall_buf.c
@@ -33,16 +33,22 @@ pthread_mutex_t hypercall_buffer_cache_mutex = 
PTHREAD_MUTEX_INITIALIZER;
 
 static void hypercall_buffer_cache_lock(xc_interface *xch)
 {
+int saved_errno = errno;
 if ( xch->flags & XC_OPENFLAG_NON_REENTRANT )
 return;
 pthread_mutex_lock(&hypercall_buffer_cache_mutex);
+/* Ignore pthread errors. */
+errno = saved_errno;
 }
 
 static void hypercall_buffer_cache_unlock(xc_interface *xch)
 {
+int saved_errno = errno;
 if ( xch->flags & XC_OPENFLAG_NON_REENTRANT )
 return;
 pthread_mutex_unlock(&hypercall_buffer_cache_mutex);
+/* Ignore pthread errors. */
+errno = saved_errno;
 }
 
 static void *hypercall_buffer_cache_alloc(xc_interface *xch, int nr_pages)
diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
index 92f7cac..2687424 100644
--- a/tools/libxc/xc_linux_osdep.c
+++ b/tools/libxc/xc_linux_osdep.c
@@ -122,10 +122,13 @@ out:
 
 static void linux_privcmd_free_hypercall_buffer(xc_interface *xch, 
xc_osdep_handle h, void *ptr, int npages)
 {
+int saved_errno = errno;
 /* Recover the VMA flags. Maybe it's not necessary */
 madvise(ptr, npages * XC_PAGE_SIZE, MADV_DOFORK);
 
 munmap(ptr, npages * XC_PAGE_SIZE);
+/* We MUST propagate the hypercall errno, not unmap call's. */
+errno = saved_errno;
 }
 
 static int linux_privcmd_hypercall(xc_interface *xch, xc_osdep_handle h, 
privcmd_hypercall_t *hypercall)
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 07/13] libxc: Fix xc_tmem_control to return proper error.

2015-03-18 Thread Konrad Rzeszutek Wilk
The API returns now negative values on error and stashes
the error in errno. Fix the user of this API.

The 'xc_hypercall_bounce_pre' can fail - and if so it will
stash its errno values - no need to over-write it.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxc/xc_tmem.c  | 14 ++
 tools/xenstat/libxenstat/src/xenstat.c |  7 ---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
index 3261e10..02797bf 100644
--- a/tools/libxc/xc_tmem.c
+++ b/tools/libxc/xc_tmem.c
@@ -73,11 +73,14 @@ int xc_tmem_control(xc_interface *xch,
 if ( subop == TMEMC_LIST && arg1 != 0 )
 {
 if ( buf == NULL )
-return -EINVAL;
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, buf) )
 {
 PERROR("Could not bounce buffer for tmem control hypercall");
-return -ENOMEM;
+return -1;
 }
 }
 
@@ -118,11 +121,14 @@ int xc_tmem_control_oid(xc_interface *xch,
 if ( subop == TMEMC_LIST && arg1 != 0 )
 {
 if ( buf == NULL )
-return -EINVAL;
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, buf) )
 {
 PERROR("Could not bounce buffer for tmem control (OID) hypercall");
-return -ENOMEM;
+return -1;
 }
 }
 
diff --git a/tools/xenstat/libxenstat/src/xenstat.c 
b/tools/xenstat/libxenstat/src/xenstat.c
index 8072a90..c1f6511 100644
--- a/tools/xenstat/libxenstat/src/xenstat.c
+++ b/tools/xenstat/libxenstat/src/xenstat.c
@@ -166,6 +166,7 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
unsigned int flags)
xc_domaininfo_t domaininfo[DOMAIN_CHUNK_SIZE];
int new_domains;
unsigned int i;
+   int rc;
 
/* Create the node */
node = (xenstat_node *) calloc(1, sizeof(xenstat_node));
@@ -189,9 +190,9 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
unsigned int flags)
node->free_mem = ((unsigned long long)physinfo.free_pages)
* handle->page_size;
 
-   node->freeable_mb = (long)xc_tmem_control(handle->xc_handle, -1,
-   TMEMC_QUERY_FREEABLE_MB, -1, 0, 0, 0, NULL);
-
+   rc = xc_tmem_control(handle->xc_handle, -1,
+ TMEMC_QUERY_FREEABLE_MB, -1, 0, 0, 0, NULL);
+   node->freeable_mb = (rc < 0) ? 0 : rc;
/* malloc(0) is not portable, so allocate a single domain.  This will
 * be resized below. */
node->domains = malloc(sizeof(xenstat_domain));
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 13/13] libxc: Fix do_memory_op to return negative value on errors

2015-03-18 Thread Konrad Rzeszutek Wilk
instead of the -Exx values (which should go in errno).

This patch has HUGE implications. There is a lot of APIs
that are using do_memory_op. Fortunately most of them
check for 'if (do_memory_op(..) < 0)' so will function
properly. However there were some which printed the return
value to the user. They have been fixed in:

 libxc: Don't assign return value to errno for E820 get/set xc_ calls.
 libxc: Check xc_sharing_* for proper return values.
 libxc: If xc_domain_add_to_physmap fails, include errno value
 libxc: Check xc_maximum_ram_page for negative return values.
 libxc: Check xc_domain_maximum_gpfn for negative return values

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxc/xc_private.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index f4f2748..2eb44b6 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -516,7 +516,7 @@ int do_memory_op(xc_interface *xch, int cmd, void *arg, 
size_t len)
 {
 DECLARE_HYPERCALL;
 DECLARE_HYPERCALL_BOUNCE(arg, len, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-long ret = -EINVAL;
+long ret = -1;
 
 if ( xc_hypercall_bounce_pre(xch, arg) )
 {
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 12/13] libxl: Don't assign return value to errno for E820 get/set xc_ calls

2015-03-18 Thread Konrad Rzeszutek Wilk
We should be using the errno that the hypercall left
instead of overwriting it with the return value.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxl/libxl_x86.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index fd45ead..376b477 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -215,10 +215,8 @@ static int e820_host_sanitize(libxl__gc *gc,
 int rc;
 
 rc = xc_get_machine_memory_map(CTX->xch, map, *nr);
-if (rc < 0) {
-errno = rc;
+if (rc < 0)
 return ERROR_FAIL;
-}
 
 *nr = rc;
 
@@ -251,10 +249,9 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
 
 rc = xc_domain_set_memory_map(ctx->xch, domid, map, nr);
 
-if (rc < 0) {
-errno  = rc;
+if (rc < 0)
 return ERROR_FAIL;
-}
+
 return 0;
 }
 
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 05/13] libxc: Return negative value and propagate errno for xc_offline_page API

2015-03-18 Thread Konrad Rzeszutek Wilk
Instead of returning -Exx we now return -1 for error.
We could stash the -Exx values in errno values but why - the
underlaying functions we call all stash the proper errno
value. Hence we just propagate it up wherver it is needed.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_offline_page.c | 36 +---
 tools/libxc/xc_private.c  |  2 +-
 tools/misc/xen-hptool.c   |  6 +++---
 3 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/tools/libxc/xc_offline_page.c b/tools/libxc/xc_offline_page.c
index 3147203..d46cc5b 100644
--- a/tools/libxc/xc_offline_page.c
+++ b/tools/libxc/xc_offline_page.c
@@ -58,12 +58,14 @@ int xc_mark_page_online(xc_interface *xch, unsigned long 
start,
 int ret = -1;
 
 if ( !status || (end < start) )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, status) )
 {
 ERROR("Could not bounce memory for xc_mark_page_online\n");
-return -EINVAL;
+return -1;
 }
 
 sysctl.cmd = XEN_SYSCTL_page_offline_op;
@@ -86,12 +88,14 @@ int xc_mark_page_offline(xc_interface *xch, unsigned long 
start,
 int ret = -1;
 
 if ( !status || (end < start) )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, status) )
 {
 ERROR("Could not bounce memory for xc_mark_page_offline");
-return -EINVAL;
+return -1;
 }
 
 sysctl.cmd = XEN_SYSCTL_page_offline_op;
@@ -114,12 +118,14 @@ int xc_query_page_offline_status(xc_interface *xch, 
unsigned long start,
 int ret = -1;
 
 if ( !status || (end < start) )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 if ( xc_hypercall_bounce_pre(xch, status) )
 {
 ERROR("Could not bounce memory for xc_query_page_offline_status\n");
-return -EINVAL;
+return -1;
 }
 
 sysctl.cmd = XEN_SYSCTL_page_offline_op;
@@ -411,19 +417,19 @@ int xc_exchange_page(xc_interface *xch, int domid, 
xen_pfn_t mfn)
 if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
 {
 ERROR("Could not get domain info");
-return -EFAULT;
+return -1;
 }
 
 if (!info.shutdown || info.shutdown_reason != SHUTDOWN_suspend)
 {
+errno = EINVAL;
 ERROR("Can't exchange page unless domain is suspended\n");
-return -EINVAL;
+return -1;
 }
-
 if (!is_page_exchangable(xch, domid, mfn, &info))
 {
 ERROR("Could not exchange page\n");
-return -EINVAL;
+return -1;
 }
 
 /* Map M2P and obtain gpfn */
@@ -431,7 +437,7 @@ int xc_exchange_page(xc_interface *xch, int domid, 
xen_pfn_t mfn)
 if ( !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, NULL)) )
 {
 PERROR("Failed to map live M2P table");
-return -EFAULT;
+return -1;
 }
 gpfn = m2p_table[mfn];
 
@@ -440,7 +446,7 @@ int xc_exchange_page(xc_interface *xch, int domid, 
xen_pfn_t mfn)
 if ( xc_map_domain_meminfo(xch, domid, &minfo) )
 {
 PERROR("Could not map domain's memory information\n");
-return -EFAULT;
+return -1;
 }
 
 /* For translation macros */
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index df6cd9b..0735e23 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -427,7 +427,7 @@ int xc_mmuext_op(
 {
 DECLARE_HYPERCALL;
 DECLARE_HYPERCALL_BOUNCE(op, nr_ops*sizeof(*op), 
XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
-long ret = -EINVAL;
+long ret = -1;
 
 if ( xc_hypercall_bounce_pre(xch, op) )
 {
diff --git a/tools/misc/xen-hptool.c b/tools/misc/xen-hptool.c
index 1134603..c7561a9 100644
--- a/tools/misc/xen-hptool.c
+++ b/tools/misc/xen-hptool.c
@@ -49,7 +49,7 @@ static int hp_mem_online_func(int argc, char *argv[])
 ret = xc_mark_page_online(xch, mfn, mfn, &status);
 
 if (ret < 0)
-fprintf(stderr, "Onlining page mfn %lx failed, error %x", mfn, ret);
+fprintf(stderr, "Onlining page mfn %lx failed, error %x", mfn, errno);
 else if (status & (PG_ONLINE_FAILED |PG_ONLINE_BROKEN)) {
 fprintf(stderr, "Onlining page mfn %lx is broken, "
 "Memory online failed\n", mfn);
@@ -80,7 +80,7 @@ static int hp_mem_query_func(int argc, char *argv[])
 ret = xc_query_page_offline_status(xch, mfn, mfn, &status);
 
 if (ret < 0)
-fprintf(stderr, "Querying page mfn %lx failed, error %x", mfn, ret);
+fprintf(stderr, "Querying page mfn %lx failed, error %x", mfn, errno);
 else
 {
printf("Memory Status %x: [", status);
@@ -160,7 +160,7 @@ static int hp_mem_offline_func(int argc, char *argv[])
 printf("Prepare to offline MEMORY mfn %lx\n", mfn);
 ret = xc_mark_page_offline(xch, mfn, mfn, &status);
 if (ret < 0) {
-fprintf(stderr,

[Xen-devel] [PATCH v2 11/13] libxc: Check xc_sharing_* for proper return values.

2015-03-18 Thread Konrad Rzeszutek Wilk
If there is a negative return value - check for that and
also use errno for the proper error value.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxl/libxl.c  |  4 ++--
 tools/tests/mem-sharing/memshrtool.c | 12 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ad69a8a..55d2340 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5024,7 +5024,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 physinfo->scrub_pages = xcphysinfo.scrub_pages;
 physinfo->outstanding_pages = xcphysinfo.outstanding_pages;
 l = xc_sharing_freed_pages(ctx->xch);
-if (l == -ENOSYS) {
+if (l < 0 && errno == ENOSYS) {
 l = 0;
 } else if (l < 0) {
 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
@@ -5033,7 +5033,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 }
 physinfo->sharing_freed_pages = l;
 l = xc_sharing_used_frames(ctx->xch);
-if (l == -ENOSYS) {
+if (l < 0 && errno == ENOSYS) {
 l = 0;
 } else if (l < 0) {
 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
diff --git a/tools/tests/mem-sharing/memshrtool.c 
b/tools/tests/mem-sharing/memshrtool.c
index db44294..6454bc3 100644
--- a/tools/tests/mem-sharing/memshrtool.c
+++ b/tools/tests/mem-sharing/memshrtool.c
@@ -55,11 +55,19 @@ int main(int argc, const char** argv)
 
 if( !strcasecmp(cmd, "info") )
 {
+long rc;
 if( argc != 2 )
 return usage(argv[0]);
 
-printf("used = %ld\n", xc_sharing_used_frames(xch));
-printf("freed = %ld\n", xc_sharing_freed_pages(xch));
+rc = xc_sharing_freed_pages(xch);
+if ( rc < 0 )
+return 1;
+
+printf("used = %ld\n", rc);
+rc = xc_sharing_used_frames(xch);
+if ( rc < 0 )
+return 1;
+printf("freed = %ld\n", rc);
 }
 else if( !strcasecmp(cmd, "enable") )
 {
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 09/13] libxc: Check xc_maximum_ram_page for negative return values.

2015-03-18 Thread Konrad Rzeszutek Wilk
Instead of assuming everything is always OK. As such
we return now the return value (or zero for success).
The max_mfn is now passed in as the parameter.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxc/include/xenctrl.h |  2 +-
 tools/libxc/xc_offline_page.c |  6 +++---
 tools/libxc/xc_private.c  | 11 +--
 tools/libxc/xg_save_restore.h |  3 ++-
 tools/misc/xen-mfndump.c  |  9 -
 5 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index df18292..14aae84 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1509,7 +1509,7 @@ int xc_mmuext_op(xc_interface *xch, struct mmuext_op *op, 
unsigned int nr_ops,
  domid_t dom);
 
 /* System wide memory properties */
-long xc_maximum_ram_page(xc_interface *xch);
+int xc_maximum_ram_page(xc_interface *xch, unsigned long *max_mfn);
 
 /* Get current total pages allocated to a domain. */
 long xc_get_tot_pages(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_offline_page.c b/tools/libxc/xc_offline_page.c
index d46cc5b..b1d169c 100644
--- a/tools/libxc/xc_offline_page.c
+++ b/tools/libxc/xc_offline_page.c
@@ -412,7 +412,7 @@ int xc_exchange_page(xc_interface *xch, int domid, 
xen_pfn_t mfn)
 uint32_t status;
 xen_pfn_t new_mfn, gpfn;
 xen_pfn_t *m2p_table;
-int max_mfn;
+unsigned long max_mfn;
 
 if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
 {
@@ -433,8 +433,8 @@ int xc_exchange_page(xc_interface *xch, int domid, 
xen_pfn_t mfn)
 }
 
 /* Map M2P and obtain gpfn */
-max_mfn = xc_maximum_ram_page(xch);
-if ( !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, NULL)) )
+rc = xc_maximum_ram_page(xch, &max_mfn);
+if ( rc || !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, NULL)) )
 {
 PERROR("Failed to map live M2P table");
 return -1;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 0735e23..f4f2748 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -535,9 +535,16 @@ int do_memory_op(xc_interface *xch, int cmd, void *arg, 
size_t len)
 return ret;
 }
 
-long xc_maximum_ram_page(xc_interface *xch)
+int xc_maximum_ram_page(xc_interface *xch, unsigned long *max_mfn)
 {
-return do_memory_op(xch, XENMEM_maximum_ram_page, NULL, 0);
+long rc = do_memory_op(xch, XENMEM_maximum_ram_page, NULL, 0);
+
+if ( rc >= 0 )
+{
+*max_mfn = rc;
+rc = 0;
+}
+return rc;
 }
 
 long long xc_domain_get_cpu_usage( xc_interface *xch, domid_t domid, int vcpu )
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index bdd9009..832c329 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -311,7 +311,8 @@ static inline int get_platform_info(xc_interface *xch, 
uint32_t dom,
 if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
 return 0;
 
-*max_mfn = xc_maximum_ram_page(xch);
+if (xc_maximum_ram_page(xch, max_mfn))
+return 0;
 
 *hvirt_start = xen_params.virt_start;
 
diff --git a/tools/misc/xen-mfndump.c b/tools/misc/xen-mfndump.c
index 0761f6e..0c018e0 100644
--- a/tools/misc/xen-mfndump.c
+++ b/tools/misc/xen-mfndump.c
@@ -31,7 +31,7 @@ int help_func(int argc, char *argv[])
 int dump_m2p_func(int argc, char *argv[])
 {
 unsigned long i;
-long max_mfn;
+unsigned long max_mfn;
 xen_pfn_t *m2p_table;
 
 if ( argc > 0 )
@@ -41,8 +41,7 @@ int dump_m2p_func(int argc, char *argv[])
 }
 
 /* Map M2P and obtain gpfn */
-max_mfn = xc_maximum_ram_page(xch);
-if ( max_mfn < 0 )
+if ( xc_maximum_ram_page(xch, &max_mfn) < 0 );
 {
 ERROR("Failed to get the maximum mfn");
 return -1;
@@ -183,8 +182,8 @@ int dump_ptes_func(int argc, char *argv[])
 }
 
 /* Map M2P and obtain gpfn */
-max_mfn = xc_maximum_ram_page(xch);
-if ( (mfn > max_mfn) ||
+rc = xc_maximum_ram_page(xch, &max_mfn);
+if ( rc || (mfn > max_mfn) ||
  !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, NULL)) )
 {
 xc_unmap_domain_meminfo(xch, &minfo);
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 04/13] libxc: xc_physdev_map return -1 and populate errno.

2015-03-18 Thread Konrad Rzeszutek Wilk
The users of these (qemu) check for a negative value
so we are safe in regards to that. However they
also use the return value to inform the user of the
error.

Signed-off-by: Konrad Rzeszutek Wilk 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_physdev.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c
index cf02d85..9b064b8 100644
--- a/tools/libxc/xc_physdev.c
+++ b/tools/libxc/xc_physdev.c
@@ -43,8 +43,10 @@ int xc_physdev_map_pirq(xc_interface *xch,
 struct physdev_map_pirq map;
 
 if ( !pirq )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 memset(&map, 0, sizeof(struct physdev_map_pirq));
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_GSI;
@@ -72,8 +74,10 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
 struct physdev_map_pirq map;
 
 if ( !pirq )
-return -EINVAL;
-
+{
+errno = EINVAL;
+return -1;
+}
 memset(&map, 0, sizeof(struct physdev_map_pirq));
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_MSI;
-- 
2.1.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Samuel Thibault
Hello,

Wei Liu, le Tue 17 Mar 2015 14:29:07 +, a écrit :
> One of my lessons learned from the existing stubdom stuffs is that I
> should work with upstream and produce maintainable code.

Not only maintainable, but really make sure to have the time to stick
with upstream on the long run, first until it gets integrated in the
upstream QEMU release process and then still to maintain it there on the
long run.  The old work on mini-os qemu stubdomain wasn't too bad, but
without actual integration in the QEMU process, and nobody to update the
fork, it was deemed to fall behind.

Samuel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.4-testing test] 36516: tolerable FAIL - PUSHED

2015-03-18 Thread xen . org
flight 36516 xen-4.4-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36516/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 35919

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-armhf-armhf-xl-midway   10 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass

version targeted for testing:
 xen  04ac29f0c38236840ed852b4fc0933d388a0e776
baseline version:
 xen  24ecb0be82825e366edd559af29562bca0e07d95


People who touched revisions under test:
  Aaron Adams 
  Ian Campbell 
  Ian Jackson 
  Jan Beulich 


jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64 

[Xen-devel] [xen-4.5-testing test] 36511: tolerable FAIL - PUSHED

2015-03-18 Thread xen . org
flight 36511 xen-4.5-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36511/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-amd64-rumpuserxen-amd64 13 
rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail  never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl-midway   10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass

version targeted for testing:
 xen  e76209d69c406ecc3518dbd0e2efa5705273fa20
baseline version:
 xen  25c6ee85a88b42ab6e63a418008448f1935d3312


People who touched revisions under test:
  Ian Campbell 
  Ian Jackson 
  Jan Beulich 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   fail
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qem

[Xen-devel] [qemu-upstream-4.5-testing test] 36517: tolerable FAIL - PUSHED

2015-03-18 Thread xen . org
flight 36517 qemu-upstream-4.5-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36517/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-midway   10 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install  fail never pass

version targeted for testing:
 qemuu0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
baseline version:
 qemuu1ebb75b1fee779621b63e84fefa7b07354c43a99


People who touched revisions under test:
  Gerd Hoffmann 
  Gonglei 
  Juan Quintela 
  Michael S. Tsirkin 
  Paolo Bonzini 
  Peter Maydell 
  Petr Matousek 
  Stefano Stabellini 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-amd64-xl-credit2  pass
 test-armhf-armhf-xl-credit2  

Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Antti Kantee

On 18/03/15 21:21, Anil Madhavapeddy wrote:

This is not an argument for or against; if you want to expose AF_WHATEVER to 
applications running on a rump kernel, you need to sell AF_WHATEVER to NetBSD, 
not to rumpkernel-users.  Well, preferably you need to sell it to everyone 
implementing sockets and running on some sort of hypervisor, but of course 
gotta start from somewhere.


Given that most of the uses of this will be in userspace code, just
faking out AF_UNIX in Rump does seem a lot easier.  It doesn't matter
to MirageOS either way -- we just need a well-defined XenStore/ring
protocol to obey to do connection setup on the other side.


Where do you propose to inject that faking out (and what does it even 
mean)?  Someone at Berkeley decided that socket drivers should be 
globally enumerated, and PF_UNIX leads to exactly one handler.  Just 
hacking hooks as local patches into the PF_UNIX driver is against the 
whole point of having unmodified, tested drivers from upstream.


So, if you want your bus to appear as a socket to userspace, I don't see 
any shortcut to not going via NetBSD.  If you're happy with something 
else than a socket, that's another story.


Especially if the interface doesn't matter too much for whatever purpose 
you plan to use it for, it's silly to specify the interface so that the 
implementation process is as convoluted as possible ;)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] libxlu: avoid having two definitions of XLU_ConfigList

2015-03-18 Thread Wei Liu
There is already a typedef in libxlutil.h. Remove the one in
libxlu_internal.h.

Signed-off-by: Wei Liu 
---
 tools/libxl/libxlu_internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 3451cfe..0acdde3 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -25,11 +25,11 @@
 
 #include "libxlutil.h"
 
-typedef struct XLU_ConfigList {
+struct XLU_ConfigList {
 int avalues; /* available slots */
 int nvalues; /* actual occupied slots */
 XLU_ConfigValue **values;
-} XLU_ConfigList;
+};
 
 typedef struct YYLTYPE
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression introduced by libxlu: rework internal representation of setting

2015-03-18 Thread Wei Liu
On Wed, Mar 18, 2015 at 04:18:32PM -0400, Konrad Rzeszutek Wilk wrote:
> Hey
> 
> The git commit 1a09c5113a38dcf1fb6582d77285d727defeea6c
> "libxlu: rework internal representation of setting" breaks build:
> 
> ~/xtt-x86_64/xen/tools/libxl> make
> rm -f _paths.h.tmp;  echo "#define SBINDIR \"/usr/sbin\"" >>_paths.h.tmp;  
> echo "#define BINDIR \"/usr/bin\"" >>_paths.h.tmp;  echo "#define LIBEXEC 
> \"/usr/lib/xen\"" >>_paths.h.tmp;  echo "#define LIBEXEC_BIN 
> \"/usr/lib/xen/bin\"" >>_paths.h.tmp;  echo "#define LIBDIR \"/usr/lib\"" 
> >>_paths.h.tmp;  echo "#define SHAREDIR \"/usr/share\"" >>_paths.h.tmp;  echo 
> "#define XENFIRMWAREDIR \"/usr/lib/xen/boot\"" >>_paths.h.tmp;  echo "#define 
> XEN_CONFIG_DIR \"/etc/xen\"" >>_paths.h.tmp;  echo "#define XEN_SCRIPT_DIR 
> \"/etc/xen/scripts\"" >>_paths.h.tmp;  echo "#define XEN_LOCK_DIR 
> \"/var/lock\"" >>_paths.h.tmp;  echo "#define XEN_RUN_DIR \"/var/run/xen\"" 
> >>_paths.h.tmp;  echo "#define XEN_PAGING_DIR \"/var/lib/xen/xenpaging\"" 
> >>_paths.h.tmp;  if ! cmp -s _paths.h.tmp _paths.h; then mv -f _paths.h.tmp 
> _paths.h; else rm -f _paths.h.tmp; fi
> gcc  -O1 -fno-omit-frame-pointer -m64 -g -fno-strict-aliasing -std=gnu99 
> -Wall -Wstrict-prototypes -Wdeclaration-after-statement 
> -Wno-unused-but-set-variable   -O0 -g3 -D__XEN_TOOLS__ -MMD -MF 
> .libxlu_cfg_y.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE 
> -fno-optimize-sibling-calls  -Werror -Wno-format-zero-length 
> -Wmissing-declarations -Wno-declaration-after-statement -Wformat-nonliteral 
> -I. -fPIC -pthread 
> -I/home/konrad/ssd/konrad/xtt-x86_64/bootstrap/xen-unstable/tools/libxl/../../tools/libxc/include
>  
> -I/home/konrad/ssd/konrad/xtt-x86_64/bootstrap/xen-unstable/tools/libxl/../../tools/include
>-c -o libxlu_cfg_y.o libxlu_cfg_y.c 
> In file included from libxlu_cfg_i.h:23,
>  from libxlu_cfg_y.y:22:
> libxlu_internal.h:39: error: redefinition of typedef ‘XLU_ConfigList’
> libxlutil.h:25: note: previous declaration of ‘XLU_ConfigList’ was here
> make: *** [libxlu_cfg_y.o] Error 1
> FC-64  
> FC-64  git log --oneline 
> | head -1
> 1a09c51 libxlu: rework internal representation of setting
> FC-64  git checkout HEAD^
> Previous HEAD position was 1a09c51... libxlu: rework internal representation 
> of setting
> HEAD is now at 711dec6... libxl: define LIBXL_HAVE_VNUMA
> FC-64  git clean -f -x 
> 1>/dev/null
> FC-64  make 1>/dev/null
> Parsing libxl_types.idl
> Parsing libxl_types_internal.idl
> Parsing libxl_types.idl
> 
> This is under Fedora Core 13.

Wow, this is really old release.

I think some older gccs (<= 4.5 IIRC) don't like multiple definitions. I
will send a patch to fix it.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Anil Madhavapeddy
On 18 Mar 2015, at 20:23, Antti Kantee  wrote:
> 
> On 18/03/15 19:05, Anil Madhavapeddy wrote:
>>> This fits in with a couple of things I hope to make time to work on in the
>>> next couple of months:
>>> 
>>> 1. Introspection of Rump Kernel domUs for ops purposes, i.e. get some
>>>basic "ps", "top", "vmstat"-like information about what the domU is
>>>doing from the dom0.
>>> 
>>> 2. Connecting up multiple Rump Kernel domUs and/or Mirage domUs. The
>>>general idea here is that you can have e.g. a Mirage domU running a
>>>HTTP+TLS frontend, communicating with a Rump Kernel domU running PHP +
>>>FastCGI.
>>> 
>>>The Mirage folks are already doing something similar in their
>>>Jitsu work, using a protocol called Conduit which runs over vchan.
>> 
>> Yeah, this is currently requiring a couple of things:
>> 
>> - kicking the tires with Vchan and its associated machinery, which has
>>   taken some time.  Dave Scott has built a complementary system for
>>   the xentropyd which simply sets up a console ring instead of vchan.
>>   This has the drawback of being a single fixed page, but far simpler.
>> 
>> - A XenStore protocol for setting up stream connections.  This could
>>   indeed quite easily turn into a AF_VCHAN that could be transparently
>>   used by rump/Mirage/HaLVM and normal domUs for VM<->VM comms.
> 
> This is not an argument for or against; if you want to expose AF_WHATEVER to 
> applications running on a rump kernel, you need to sell AF_WHATEVER to 
> NetBSD, not to rumpkernel-users.  Well, preferably you need to sell it to 
> everyone implementing sockets and running on some sort of hypervisor, but of 
> course gotta start from somewhere.

Given that most of the uses of this will be in userspace code, just
faking out AF_UNIX in Rump does seem a lot easier.  It doesn't matter
to MirageOS either way -- we just need a well-defined XenStore/ring
protocol to obey to do connection setup on the other side.

-anil


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-18 Thread Waiman Long

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:

Implement simple paravirt support for the qspinlock.

Provide a separate (second) version of the spin_lock_slowpath for
paravirt along with a special unlock path.

The second slowpath is generated by adding a few pv hooks to the
normal slowpath, but where those will compile away for the native
case, they expand into special wait/wake code for the pv version.

The actual MCS queue can use extra storage in the mcs_nodes[] array to
keep track of state and therefore uses directed wakeups.

The head contender has no such storage available and reverts to the
per-cpu lock entry similar to the current kvm code. We can do a single
enrty because any nesting will wake the vcpu and cause the lower loop
to retry.

Signed-off-by: Peter Zijlstra (Intel)
---
  include/asm-generic/qspinlock.h |3
  kernel/locking/qspinlock.c  |   69 +-
  kernel/locking/qspinlock_paravirt.h |  177 

  3 files changed, 248 insertions(+), 1 deletion(-)

--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -118,6 +118,9 @@ static __always_inline bool virt_queue_s
  }
  #endif

+extern void __pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queue_spin_unlock(struct qspinlock *lock);
+
  /*
   * Initializier
   */
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -18,6 +18,9 @@
   * Authors: Waiman Long
   *  Peter Zijlstra
   */
+
+#ifndef _GEN_PV_LOCK_SLOWPATH
+
  #include
  #include
  #include
@@ -65,13 +68,21 @@

  #include "mcs_spinlock.h"

+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define MAX_NODES  8
+#else
+#define MAX_NODES  4
+#endif
+
  /*
   * Per-CPU queue node structures; we can never have more than 4 nested
   * contexts: task, softirq, hardirq, nmi.
   *
   * Exactly fits one 64-byte cacheline on a 64-bit architecture.
+ *
+ * PV doubles the storage and uses the second cacheline for PV state.
   */
-static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]);
+static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[MAX_NODES]);

  /*
   * We must be able to distinguish between no-tail and the tail at 0:0,
@@ -230,6 +241,32 @@ static __always_inline void set_locked(s
WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
  }

+
+/*
+ * Generate the native code for queue_spin_unlock_slowpath(); provide NOPs for
+ * all the PV callbacks.
+ */
+
+static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_kick_node(struct mcs_spinlock *node) { }
+
+static __always_inline void __pv_wait_head(struct qspinlock *lock) { }
+
+#define pv_enabled()   false
+
+#define pv_init_node   __pv_init_node
+#define pv_wait_node   __pv_wait_node
+#define pv_kick_node   __pv_kick_node
+
+#define pv_wait_head   __pv_wait_head
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define queue_spin_lock_slowpath   native_queue_spin_lock_slowpath
+#endif
+
+#endif /* _GEN_PV_LOCK_SLOWPATH */
+
  /**
   * queue_spin_lock_slowpath - acquire the queue spinlock
   * @lock: Pointer to queue spinlock structure
@@ -259,6 +296,9 @@ void queue_spin_lock_slowpath(struct qsp

BUILD_BUG_ON(CONFIG_NR_CPUS>= (1U<<  _Q_TAIL_CPU_BITS));

+   if (pv_enabled())
+   goto queue;
+
if (virt_queue_spin_lock(lock))
return;

@@ -335,6 +375,7 @@ void queue_spin_lock_slowpath(struct qsp
node += idx;
node->locked = 0;
node->next = NULL;
+   pv_init_node(node);

/*
 * We touched a (possibly) cold cacheline in the per-cpu queue node;
@@ -360,6 +401,7 @@ void queue_spin_lock_slowpath(struct qsp
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);

+   pv_wait_node(node);
arch_mcs_spin_lock_contended(&node->locked);
}

@@ -374,6 +416,7 @@ void queue_spin_lock_slowpath(struct qsp
 * sequentiality; this is because the set_locked() function below
 * does not imply a full barrier.
 */
+   pv_wait_head(lock);
while ((val = smp_load_acquire(&lock->val.counter))&  
_Q_LOCKED_PENDING_MASK)
cpu_relax();

@@ -406,6 +449,7 @@ void queue_spin_lock_slowpath(struct qsp
cpu_relax();

arch_mcs_spin_unlock_contended(&next->locked);
+   pv_kick_node(next);

  release:
/*
@@ -414,3 +458,26 @@ void queue_spin_lock_slowpath(struct qsp
this_cpu_dec(mcs_nodes[0].count);
  }
  EXPORT_SYMBOL(queue_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for queue_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH)&&  defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef pv_enabled
+#define pv_enabled()   true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+
+#und

Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-18 Thread Waiman Long

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:

Hi Waiman,

As promised; here is the paravirt stuff I did during the trip to BOS last week.

All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).

The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more
convoluted and I've no real way to test that but it should be stright fwd to
make work.

I ran this using the virtme tool (thanks Andy) on my laptop with a 4x
overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and
it both booted and survived a hackbench run (perf bench sched messaging -g 20
-l 5000).

So while the paravirt code isn't the most optimal code ever conceived it does 
work.

Also, the paravirt patching includes replacing the call with "movb $0, %arg1"
for the native case, which should greatly reduce the cost of having
CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware.

I feel that if someone were to do a Xen patch we can go ahead and merge this
stuff (finally!).

These patches do not implement the paravirt spinlock debug stats currently
implemented (separately) by KVM and Xen, but that should not be too hard to do
on top and in the 'generic' code -- no reason to duplicate all that.

Of course; once this lands people can look at improving the paravirt nonsense.



Thanks for sending this out. I have no problem with the !paravirt patch. 
I do have some comments on the paravirt one which I will reply individually.


Cheers,
Longman

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Antti Kantee

On 18/03/15 19:05, Anil Madhavapeddy wrote:

This fits in with a couple of things I hope to make time to work on in the
next couple of months:

1. Introspection of Rump Kernel domUs for ops purposes, i.e. get some
basic "ps", "top", "vmstat"-like information about what the domU is
doing from the dom0.

2. Connecting up multiple Rump Kernel domUs and/or Mirage domUs. The
general idea here is that you can have e.g. a Mirage domU running a
HTTP+TLS frontend, communicating with a Rump Kernel domU running PHP +
FastCGI.

The Mirage folks are already doing something similar in their
Jitsu work, using a protocol called Conduit which runs over vchan.


Yeah, this is currently requiring a couple of things:

- kicking the tires with Vchan and its associated machinery, which has
   taken some time.  Dave Scott has built a complementary system for
   the xentropyd which simply sets up a console ring instead of vchan.
   This has the drawback of being a single fixed page, but far simpler.

- A XenStore protocol for setting up stream connections.  This could
   indeed quite easily turn into a AF_VCHAN that could be transparently
   used by rump/Mirage/HaLVM and normal domUs for VM<->VM comms.


This is not an argument for or against; if you want to expose 
AF_WHATEVER to applications running on a rump kernel, you need to sell 
AF_WHATEVER to NetBSD, not to rumpkernel-users.  Well, preferably you 
need to sell it to everyone implementing sockets and running on some 
sort of hypervisor, but of course gotta start from somewhere.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Regression introduced by libxlu: rework internal representation of setting

2015-03-18 Thread Konrad Rzeszutek Wilk
Hey

The git commit 1a09c5113a38dcf1fb6582d77285d727defeea6c
"libxlu: rework internal representation of setting" breaks build:

~/xtt-x86_64/xen/tools/libxl> make
rm -f _paths.h.tmp;  echo "#define SBINDIR \"/usr/sbin\"" >>_paths.h.tmp;  echo 
"#define BINDIR \"/usr/bin\"" >>_paths.h.tmp;  echo "#define LIBEXEC 
\"/usr/lib/xen\"" >>_paths.h.tmp;  echo "#define LIBEXEC_BIN 
\"/usr/lib/xen/bin\"" >>_paths.h.tmp;  echo "#define LIBDIR \"/usr/lib\"" 
>>_paths.h.tmp;  echo "#define SHAREDIR \"/usr/share\"" >>_paths.h.tmp;  echo 
"#define XENFIRMWAREDIR \"/usr/lib/xen/boot\"" >>_paths.h.tmp;  echo "#define 
XEN_CONFIG_DIR \"/etc/xen\"" >>_paths.h.tmp;  echo "#define XEN_SCRIPT_DIR 
\"/etc/xen/scripts\"" >>_paths.h.tmp;  echo "#define XEN_LOCK_DIR 
\"/var/lock\"" >>_paths.h.tmp;  echo "#define XEN_RUN_DIR \"/var/run/xen\"" 
>>_paths.h.tmp;  echo "#define XEN_PAGING_DIR \"/var/lib/xen/xenpaging\"" 
>>_paths.h.tmp;  if ! cmp -s _paths.h.tmp _paths.h; then mv -f _paths.h.tmp 
_paths.h; else rm -f _paths.h.tmp; fi
gcc  -O1 -fno-omit-frame-pointer -m64 -g -fno-strict-aliasing -std=gnu99 -Wall 
-Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable  
 -O0 -g3 -D__XEN_TOOLS__ -MMD -MF .libxlu_cfg_y.o.d -D_LARGEFILE_SOURCE 
-D_LARGEFILE64_SOURCE -fno-optimize-sibling-calls  -Werror 
-Wno-format-zero-length -Wmissing-declarations -Wno-declaration-after-statement 
-Wformat-nonliteral -I. -fPIC -pthread 
-I/home/konrad/ssd/konrad/xtt-x86_64/bootstrap/xen-unstable/tools/libxl/../../tools/libxc/include
 
-I/home/konrad/ssd/konrad/xtt-x86_64/bootstrap/xen-unstable/tools/libxl/../../tools/include
   -c -o libxlu_cfg_y.o libxlu_cfg_y.c 
In file included from libxlu_cfg_i.h:23,
 from libxlu_cfg_y.y:22:
libxlu_internal.h:39: error: redefinition of typedef ‘XLU_ConfigList’
libxlutil.h:25: note: previous declaration of ‘XLU_ConfigList’ was here
make: *** [libxlu_cfg_y.o] Error 1
FC-64  
FC-64  git log --oneline | 
head -1
1a09c51 libxlu: rework internal representation of setting
FC-64  git checkout HEAD^
Previous HEAD position was 1a09c51... libxlu: rework internal representation of 
setting
HEAD is now at 711dec6... libxl: define LIBXL_HAVE_VNUMA
FC-64  git clean -f -x 
1>/dev/null
FC-64  make 1>/dev/null
Parsing libxl_types.idl
Parsing libxl_types_internal.idl
Parsing libxl_types.idl

This is under Fedora Core 13.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms backen.

2015-03-18 Thread Stefan Berger

On 03/10/2015 08:14 AM, Quan Xu wrote:

This Patch provides the glue for the TPM_TIS(Qemu frontend) to Xen
stubdom vTPM domain that provides the actual TPM functionality. It
sends data and TPM commends with xen_vtpm_frontend. It is similar as
another two vTPM backens:
   *vTPM passthrough backen Since QEMU 1.5.
   *vTPM libtpms-based backen.

Some details:
This part of the patch provides support for the spawning of a thread
that will interact with stubdom vTPM domain by the xen_vtpm_frontend.
It expects a signal from the frontend to wake and pick up the TPM
command that is supposed to be processed and delivers the response
packet using a callback function provided by the frontend.

The backend connects itself to the frontend by filling out an interface
structure with pointers to the function implementing support for various
operations.

(QEMU) vTPM XenStubdoms backen is initialized by Qemu command line options,
   "-tpmdev xenstubdoms,id=xenvtpm0 -device tpm-tis,tpmdev=xenvtpm0"

--Changes in v3:
-Call vtpm_send() and vtpm_recv() directly

--Changes in v4:
-Fix the comment style

Signed-off-by: Quan Xu 
---
  hw/tpm/Makefile.objs |   2 +-
  hw/tpm/tpm_xenstubdoms.c | 247 +++
  2 files changed, 248 insertions(+), 1 deletion(-)
  create mode 100644 hw/tpm/tpm_xenstubdoms.c

diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs
index 57919fa..190e776 100644
--- a/hw/tpm/Makefile.objs
+++ b/hw/tpm/Makefile.objs
@@ -1,3 +1,3 @@
  common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
  common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
-common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o
+common-obj-$(CONFIG_TPM_XENSTUBDOMS) += tpm_xenstubdoms.o xen_vtpm_frontend.o
diff --git a/hw/tpm/tpm_xenstubdoms.c b/hw/tpm/tpm_xenstubdoms.c
new file mode 100644
index 000..6d0dc32
--- /dev/null
+++ b/hw/tpm/tpm_xenstubdoms.c
@@ -0,0 +1,247 @@
+/*
+ * Xen Stubdom vTPM driver
+ *
+ *  Copyright (c) 2015 Intel Corporation
+ *  Authors:
+ *Quan Xu 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include 
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qemu/sockets.h"
+#include "qemu/log.h"
+#include "sysemu/tpm_backend.h"
+#include "tpm_int.h"
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "hw/xen/xen_backend.h"
+#include "sysemu/tpm_backend_int.h"
+#include "tpm_tis.h"
+
+#ifdef DEBUG_TPM
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+#define TYPE_TPM_XENSTUBDOMS "tpm-xenstubdoms"
+#define TPM_XENSTUBDOMS(obj) \
+OBJECT_CHECK(TPMXenstubdomsState, (obj), TYPE_TPM_XENSTUBDOMS)
+
+static const TPMDriverOps tpm_xenstubdoms_driver;
+
+/* Data structures */
+typedef struct TPMXenstubdomsThreadParams {
+TPMState *tpm_state;
+TPMRecvDataCB *recv_data_callback;
+TPMBackend *tb;
+} TPMXenstubdomsThreadParams;
+
+struct TPMXenstubdomsState {
+TPMBackend parent;
+TPMBackendThread tbt;
+TPMXenstubdomsThreadParams tpm_thread_params;
+bool had_startup_error;
+};
+
+typedef struct TPMXenstubdomsState TPMXenstubdomsState;
+
+/* Functions */
+static void tpm_xenstubdoms_cancel_cmd(TPMBackend *tb);
+
+static int tpm_xenstubdoms_unix_transfer(const TPMLocality *locty_data)
+{
+size_t rlen;
+struct XenDevice *xendev;
+
+xendev = xen_find_xendev("vtpm", xen_domid, xenstore_dev);
+if (xendev == NULL) {
+xen_be_printf(xendev, 0, "Con not find vtpm device\n");

Con not -> Cannot

+return -1;
+}
+vtpm_send(xendev, locty_data->w_buffer.buffer, locty_data->w_offset);
+vtpm_recv(xendev, locty_data->r_buffer.buffer, &rlen);
+return 0;
+}
+
+static void tpm_xenstubdoms_worker_thread(gpointer data,
+  gpointer user_data)
+{
+TPMXenstubdomsThreadParams *thr_parms = user_data;
+TPMBackendCmd cmd = (TPMBackendCmd)data;
+
+switch (cmd) {
+case TPM_BACKEND_CMD_PROCESS_CMD:
+
+/* here need a the cmd process function */
+tpm_xenstubdoms_unix_transfer(thr_parms->tpm_state->locty_data);
+thr_parms->recv_data_callback(thr_parms->tpm_state,
+  thr_parms->tpm_state->locty_number);


By now you'll need another parameter here indicating whether the comman

Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Martin Lucina
a...@recoil.org said:
> > Point 2. will further require implementing support in the Rump Kernel,
> > either for a shim which would proxy AF_UNIX / AF_INET transparently using
> > vchan, or possibly later implementing a separate socket family (AF_VCHAN /
> > AF_HYPER?). Once that is done you should be able to just drop it in to
> > QEMU on Rump.
> 
> I'm a little wary of point 2) asking for filesystem access to dom0.  What
> exactly is the qemu state API?  Does it need arbitrary file access, or is
> there a slightly higher level set of operations that could be marshalled
> along the socket?  In fact, why doesn't qemu privilege separate and use
> a QMP socket for its host filesystem operations as well?

Email thread context confusion here. I meant point 2 that I wrote
(connecting up Rump<->Mirage domUs where the Rump application is unmodified
and listens on what it believes is AF_INET/AF_UNIX).

Regarding the qemu state API, as I understood from others replies to the
full thread
(http://www.freelists.org/post/rumpkernel-users/Upstream-QEMU-based-stubdom-and-rump-kernel)
the state API does not require filesystem access and can be made to work
over a socket.

Martin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Anil Madhavapeddy
On 18 Mar 2015, at 11:20, Martin Lucina  wrote:
>> 
>> A bit background information. A stubdom is a service domain.  With QEMU
>> stubdom we are able to run QEMU device emulation code in a separate
>> domain so that bugs in QEMU don't affect Dom0 (the controlling domain).
>> Xen currently has a QEMU stubdom, but it's based on our fork of ancient
>> QEMU (plus some other libraries and mini-os). Eventually we would like
>> to use upstream QEMU in stubdom.
>> 
>> I've now successfully built QEMU upstream with rump kernel. However to
>> make it fully functional as a stubdom, there are some missing pieces to
>> be added in.
>> 
>> 1. The ability to access QMP socket (a unix socket) from Dom0. That
>>   will be used to issue command to QEMU.
>> 2. The ability to access files in Dom0. That will be used to write to /
>>   read from QEMU state file.
> 
> As I understand from Stefano's and Anthony's replies in this thread, both
> of the above can be implemented using an AF_UNIX or AF_INET socket on the
> QEMU end. Such an implementation would not require anything special done in
> QEMU, just telling it which socket to use using existing mechanisms.
> 
> So, let's step back a bit: What we need is a trusted communication channel
> from a Rump Kernel domU to dom0, using existing socket or socket-like[*]
> APIs at both the domU and dom0 ends.
> 
> This fits in with a couple of things I hope to make time to work on in the
> next couple of months:
> 
> 1. Introspection of Rump Kernel domUs for ops purposes, i.e. get some
>basic "ps", "top", "vmstat"-like information about what the domU is
>doing from the dom0.
> 
> 2. Connecting up multiple Rump Kernel domUs and/or Mirage domUs. The
>general idea here is that you can have e.g. a Mirage domU running a
>HTTP+TLS frontend, communicating with a Rump Kernel domU running PHP +
>FastCGI.
> 
>The Mirage folks are already doing something similar in their 
>Jitsu work, using a protocol called Conduit which runs over vchan.

Yeah, this is currently requiring a couple of things:

- kicking the tires with Vchan and its associated machinery, which has
  taken some time.  Dave Scott has built a complementary system for
  the xentropyd which simply sets up a console ring instead of vchan.
  This has the drawback of being a single fixed page, but far simpler.

- A XenStore protocol for setting up stream connections.  This could
  indeed quite easily turn into a AF_VCHAN that could be transparently
  used by rump/Mirage/HaLVM and normal domUs for VM<->VM comms.

> Now, both of the above require exactly the same underlying mechanism.
> 
> Point 2. will further require implementing support in the Rump Kernel,
> either for a shim which would proxy AF_UNIX / AF_INET transparently using
> vchan, or possibly later implementing a separate socket family (AF_VCHAN /
> AF_HYPER?). Once that is done you should be able to just drop it in to
> QEMU on Rump.

I'm a little wary of point 2) asking for filesystem access to dom0.  What
exactly is the qemu state API?  Does it need arbitrary file access, or is
there a slightly higher level set of operations that could be marshalled
along the socket?  In fact, why doesn't qemu privilege separate and use
a QMP socket for its host filesystem operations as well?

> 
> [*] Aside: What I mean by socket-like is that the implementation does not
> need to be in the dom0 kernel, it can just be a user-space library. For
> example, see the nanomsg or ZeroMQ APIs, which I have worked on extensively
> in the past.
> 
>> 3. The building process requires mini-os headers. That will be used
>>   to build libxc (the controlling library).
> 
> As Antti already suggested, if you can use POSIX interfaces rather than
> mini-os ones in QEMU, then that would be a better approach.
> 
>> One of my lessons learned from the existing stubdom stuffs is that I
>> should work with upstream and produce maintainable code. So before I do
>> anything for real I'd better consult the community. My gut feeling is
>> that the first two requirements are not really Xen specific. Let me know
>> what you guys plan and think.
> 
> Thanks for getting in touch. I think this is an important discussion!

Very much so -- the time is definitely right to establish some unikernel
interop standards.  I'm also looking forward to better rump<->MirageOS
comms in particular (shiny new protocol stacks working alongside existing
applications in separate VM containers).

-anil

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Implementing working set tracker in xen

2015-03-18 Thread HANNAS YAYA Issa

Hello
Today I'm working on virtual machines working set size tracking in xen 
hypervisor.

I follow the vmware approch I read in this article:

https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&sqi=2&ved=0CCEQFjAA&url=https%3A%2F%2Fwww.usenix.org%2Fevents%2Fosdi02%2Ftech%2Fwaldspurger%2Fwaldspurger.pdf&ei=18cJVcqNH5HuaM-agIgK&usg=AFQjCNHbfSzjSGUcjHtYGij7Yz6xmQIm0w&sig2=D3GOzsjJVlO7Pp5-RXTeSw&cad=rja


ESX Server uses a statistical sampling approach to ob-
tain aggregate VM working set estimates directly, with-
out any guest involvement. Each VM is sampled inde-
pendently, using a configurable sampling period defined
in units of VM execution time. At the start of each sam-
pling period, a small number of the virtual machine’s
“physical” pages are selected randomly using a uniform
distribution. Each sampled page is tracked by invalidat-
ing any cached mappings associated with its PPN, such
as hardware TLB entries and virtualized MMU state.
The next guest access to a sampled page will be inter-
cepted to re-establish these mappings, at which time a
touched page count is incremented. At the end of the
sampling period, a statistical estimate of the fraction
of memory actively accessed by the VM is f = t/n.

The algorithm is easy to understand. but I have a real problem while 
implementing it.
I don't how I can invalidate all cache mapping with the select physical 
number. it's simple to invalidate tlb mapping by flushing the tlb
but I cannot invalidate page table entries that map the selected 
physical number because I cannot get the virtual address of the physical 
number.
I have the virtual addresse of page table entries only during the 
hypercall HYPERVISOR_mmu_update. but this give only the address of the 
pages to be updated.


Please can someone help me to resole this issue.

Thank you for your help


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 3/5] Qemu-Xen-vTPM: Register Xen stubdom vTPM frontend driver

2015-03-18 Thread Stefan Berger

On 03/10/2015 08:14 AM, Quan Xu wrote:

This drvier transfers any request/repond between TPM xenstubdoms
driver and Xen vTPM stubdom, and facilitates communications between
Xen vTPM stubdom domain and vTPM xenstubdoms driver. It is a glue for
the TPM xenstubdoms driver and Xen stubdom vTPM domain that provides
the actual TPM functionality.

(Xen) Xen backend driver should run before running this frontend, and
initialize XenStore as the following for communication.

[XenStore]

for example:

Domain 0: runs QEMU for guest A
Domain 1: vtpmmgr
Domain 2: vTPM for guest A
Domain 3: HVM guest A

[...]
  local = ""
domain = ""
 0 = ""
  frontend = ""
   vtpm = ""
2 = ""
 0 = ""
  backend = "/local/domain/2/backend/vtpm/0/0"
  backend-id = "2"
  state = "*"
  handle = "0"
  domain = "Domain3's name"
  ring-ref = "*"
  event-channel = "*"
  feature-protocol-v2 = "1"
  backend = ""
   qdisk = ""
[...]
   console = ""
   vif = ""
[...]
 2 = ""
  [...]
  backend = ""
   vtpm = ""
0 = ""
 0 = ""
  frontend = "/local/domain/0/frontend/vtpm/2/0"
  frontend-id = "0" ('0', frontend is running in Domain-0)
  [...]
 3 = ""
  [...]
  device = "" (frontend device, the backend is running in QEMU/.etc)
   vkbd = ""
[...]
   vif = ""
[...]

  ..

(QEMU) xen_vtpmdev_ops is initialized with the following process:
   xen_hvm_init()
 [...]
 -->xen_fe_register("vtpm", ...)
   -->xenstore_fe_scan()
 -->xen_fe_try_init(ops)
   --> XenDevOps.init()
 -->xen_fe_get_xendev()
   --> XenDevOps.alloc()
 -->xen_fe_check()
   -->xen_fe_try_initialise()
 --> XenDevOps.initialise()
   -->xen_fe_try_connected()
 --> XenDevOps.connected()
 -->xs_watch()
 [...]

--Changes in v3:
-Move xen_stubdom_vtpm.c to xen_vtpm_frontend.c
-Read Xen vTPM status via XenStore

--Changes in v4:
-Redesign vTPM xenstore architecture for HVM virtual machine.
-Remove unnecessary busy loop.
-Call xen_fe_register(vtpm ...) directly and move some initialzation
  chunk in the xen_vtpmdev_ops .init function.

Signed-off-by: Quan Xu 
---
  hw/tpm/Makefile.objs |   1 +
  hw/tpm/xen_vtpm_frontend.c   | 278 +++
  hw/xen/xen_frontend.c|  20 
  include/hw/xen/xen_backend.h |   5 +
  include/hw/xen/xen_common.h  |   6 +
  xen-hvm.c|   5 +
  6 files changed, 315 insertions(+)
  create mode 100644 hw/tpm/xen_vtpm_frontend.c

diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs
index 99f5983..57919fa 100644
--- a/hw/tpm/Makefile.objs
+++ b/hw/tpm/Makefile.objs
@@ -1,2 +1,3 @@
  common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
  common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
+common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o
diff --git a/hw/tpm/xen_vtpm_frontend.c b/hw/tpm/xen_vtpm_frontend.c
new file mode 100644
index 000..4ef0a26
--- /dev/null
+++ b/hw/tpm/xen_vtpm_frontend.c
@@ -0,0 +1,278 @@
+/*
+ * Connect to Xen vTPM stubdom domain
+ *
+ *  Copyright (c) 2015 Intel Corporation
+ *  Authors:
+ *Quan Xu 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hw/hw.h"
+#include "block/aio.h"
+#include "hw/xen/xen_backend.h"
+
+#define XS_STUBDOM_VTPM_ENABLE"1"
+
+enum tpmif_state {
+TPMIF_STATE_IDLE,/* no contents / vTPM idle / cancel complete */
+TPMIF_STATE_SUBMIT,  /* request ready / vTPM working */
+TPMIF_STATE_FINISH,  /* response ready / vTPM idle */
+TPMIF_STATE_CANCEL,  /* cancel requested / vTPM working */
+};
+
+static AioContext *vtpm_aio_ctx;
+
+enum status_bits {
+VTPM_STATUS_RUNNING  = 0x1,
+VTPM_STATUS_IDLE = 0x2,
+VTPM_STATUS_RESULT   = 0x4,
+VTPM_STATUS_CANCELED = 0x8,
+};
+
+struct tpmif_shared_page {
+uint32_t length; /* request/response length in bytes */
+
+uint8_t  state;   /* enum tpmif_state */
+uint8_t  locality;/* for the current request */
+u

Re: [Xen-devel] Fwd: difference between physical address, machine address, physical frame number and machine frame number

2015-03-18 Thread Razvan Cojocaru
> the source codes I read maddr, paddr, pfn and mdfn.
> please can someone explain me the differences between these concepts in
> xen hypervisor.

I think Tim did a very good job here:

http://www.slideshare.net/xen_com_mgr/xen-memory-management


HTH,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1] Fix libxc return -E misusage.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:43:07PM +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> > Hey,
> > 
> > Please see the following set of patches which fix the various
> > usage of return -Exx instead of return -1 for errors (and
> > stashing the error value in errno).
> 
> All of them, or just a subset of the callpaths?

Not all of them. I did do a careful analysis of the
ones that the patches have to make sure I didn't cause
any regressions. But I hadn't looked at some of
the other ones because I got fatigued.
> 
> If it's all of them then I'm amazed it was only 14 patches!

I am sure there are others. When I get another day when to do
spring cleaning I will crank out some other patches.
> 
> Either way, well done and thanks for draining this swamp even a bit!

Hehe
> 
> > It also cleans up some of the invalid usage of errno (as the
> > underlaying calls already stash errno values) - and we just
> > need to bubble them up.
> > 
> > Lastly it also wraps an errno-invisibility shield against
> > xc_hypercall_bounce_* calls so that any errors in those
> > won't over-write the hypercall ones.
> > 
> > 
> >  tools/libxc/xc_core_arm.c  | 15 --
> >  tools/libxc/xc_core_x86.c  | 22 +++---
> >  tools/libxc/xc_cpupool.c   |  4 +--
> >  tools/libxc/xc_dom_x86.c   |  7 +++--
> >  tools/libxc/xc_domain.c|  2 +-
> >  tools/libxc/xc_domain_save.c   |  8 -
> >  tools/libxc/xc_freebsd_osdep.c |  3 ++
> >  tools/libxc/xc_hcall_buf.c |  6 
> >  tools/libxc/xc_linux_osdep.c   |  3 ++
> >  tools/libxc/xc_offline_page.c  | 36 +--
> >  tools/libxc/xc_physdev.c   | 12 +---
> >  tools/libxc/xc_pm.c| 54 
> > ++
> >  tools/libxc/xc_private.c   |  4 +--
> >  tools/libxc/xc_tmem.c  | 14 ++---
> >  tools/libxc/xg_save_restore.h  |  3 ++
> >  tools/libxl/libxl.c|  4 +--
> >  tools/libxl/libxl_x86.c|  9 ++
> >  tools/misc/xen-hptool.c|  6 ++--
> >  tools/misc/xen-mfndump.c   |  2 +-
> >  tools/tests/mem-sharing/memshrtool.c   | 12 ++--
> >  tools/xenstat/libxenstat/src/xenstat.c |  5 ++--
> >  21 files changed, 158 insertions(+), 73 deletions(-)
> > 
> > 
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Fwd: difference between physical address, machine address, physical frame number and machine frame number

2015-03-18 Thread HANNAS YAYA Issa

Hello guys.
I start to develop in xen hypervisor no so long. I work on memory. in 
the source codes I read maddr, paddr, pfn and mdfn.
please can someone explain me the differences between these concepts in 
xen hypervisor.

Thank you


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/14] libxl: Check xc_sharing_* for proper return values.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:36:40PM +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> > If there is a negative return value - check for that and
> > also use errno for the proper error value.
> 
> Was xc_sharing_freed_pages fixed earlier in the series (which is
> strictly speaking a bisection hazard, but nevermind) or was the existing
> check for l == -E... just buggy?

The existing check for ENOSYS was buggy.
> 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  tools/libxl/libxl.c  |  4 ++--
> >  tools/tests/mem-sharing/memshrtool.c | 12 ++--
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 94b4d59..99a99e9 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -5024,7 +5024,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
> > *physinfo)
> >  physinfo->scrub_pages = xcphysinfo.scrub_pages;
> >  physinfo->outstanding_pages = xcphysinfo.outstanding_pages;
> >  l = xc_sharing_freed_pages(ctx->xch);
> > -if (l == -ENOSYS) {
> > +if (l < 0 && errno == ENOSYS) {
> >  l = 0;
> >  } else if (l < 0) {
> >  LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
> > @@ -5033,7 +5033,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
> > *physinfo)
> >  }
> >  physinfo->sharing_freed_pages = l;
> >  l = xc_sharing_used_frames(ctx->xch);
> > -if (l == -ENOSYS) {
> > +if (l < 0 && errno == ENOSYS) {
> >  l = 0;
> >  } else if (l < 0) {
> >  LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
> > diff --git a/tools/tests/mem-sharing/memshrtool.c 
> > b/tools/tests/mem-sharing/memshrtool.c
> > index db44294..d1dd8a2 100644
> > --- a/tools/tests/mem-sharing/memshrtool.c
> > +++ b/tools/tests/mem-sharing/memshrtool.c
> > @@ -55,11 +55,19 @@ int main(int argc, const char** argv)
> >  
> >  if( !strcasecmp(cmd, "info") )
> >  {
> > +long rc;
> >  if( argc != 2 )
> >  return usage(argv[0]);
> >  
> > -printf("used = %ld\n", xc_sharing_used_frames(xch));
> > -printf("freed = %ld\n", xc_sharing_freed_pages(xch));
> > +rc = xc_sharing_freed_pages(xch);
> > +if ( rc < 0 )
> > +return errno;
> 
> Just return 1 please.
> 
> > +
> > +printf("used = %ld\n", rc);
> > +rc = xc_sharing_used_frames(xch);
> > +if ( rc < 0 )
> > +return errno;
> 
> Likewise.
> 
> > +printf("freed = %ld\n", rc);
> >  }
> >  else if( !strcasecmp(cmd, "enable") )
> >  {
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] libxl: Domain destroy: unlock userdata earlier

2015-03-18 Thread Jim Fehlig
Ian Campbell wrote:
> On Tue, 2015-03-17 at 17:34 +, Wei Liu wrote:
>   
>> On Tue, Mar 17, 2015 at 09:30:58AM -0600, Jim Fehlig wrote:
>> 
>>> From: Ian Jackson 
>>>
>>> Unlock the userdata before we actually call xc_domain_destroy.  This
>>> leaves open the possibility that other libxl callers will see the
>>> half-destroyed domain (with no devices, paused), but this is fine.
>>>
>>> Signed-off-by: Ian Jackson 
>>> CC: Wei Liu 
>>> Reviewed-by: Jim Fehlig 
>>> Tested-by: Jim Fehlig 
>>>   
>> Acked-by: Wei Liu 
>> 
>
> Acked-by: Ian Campbell 
>
> I'm not sure if this is safe/sensible to apply without the preceding
> patch which I had a comment on.
>   

It is not so sensible without the subsequent patch 3/3.  This patch is
not related to the preceding patch 1/3.

Regards,
Jim

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] libxl: In domain death search, start search at first domid we want

2015-03-18 Thread Jim Fehlig
Ian Campbell wrote:
> On Tue, 2015-03-17 at 09:30 -0600, Jim Fehlig wrote:
>   
>> From: Ian Jackson 
>>
>> From: Ian Jackson 
>>
>> When domain_death_xswatch_callback needed a further call to
>> xc_domain_getinfolist it would restart it with the last domain it
>> found rather than the first one it wants.
>>
>> If it only wants one it will also only ask for one domain.  The result
>> would then be that it gets the previous domain again (ie, the previous
>> one to the one it wants), which still doesn't reveal the answer to the
>> question, and it would therefore loop again.
>>
>> It's completely unclear to me why I thought it was a good idea to
>> start the xc_domain_getinfolist with the last domain previously found
>> rather than the first one left un-confirmed.  The code has been that
>> way since it was introduced.
>> 
>
> Is it because the xc_domain_getinfolist will fetch at most:
> int nentries = LIBXL_TAILQ_NEXT(evg, entry) ? 200 : 1;
> entries?
>
> After your change then if the domid we are looking for is the 201st
> domain then won't we just keep going round looking at the first 200
> (undying) domains?
>   

Yes, that theoretically looks to be the case. When all 200 domains have
been examined the inner loop is terminated

if (got == gotend) {
LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, " got==gotend");
break;
}

which will repeat the outer loop with same conditions. But
xc_domain_getinfolist() is called with first_domain set to the domain ID
we are looking for. Is it possible for xc_domain_getinfolist() to return
200 domains with IDs less than the ID we asked for?

Regards,
Jim

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/14] libxl: Fix xc_tmem_control to return proper error.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:26:37PM +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> > The API returns now negative values on error and stashes
> > the error in errno. Fix the user of this API.
> > 
> > The 'xc_hypercall_bounce_pre' can fail - and if so it will
> > stash its errno values - no need to over-write it.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  tools/libxc/xc_tmem.c  | 14 ++
> >  tools/xenstat/libxenstat/src/xenstat.c |  5 +++--
> >  2 files changed, 13 insertions(+), 6 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
> > index 3261e10..02797bf 100644
> > --- a/tools/libxc/xc_tmem.c
> > +++ b/tools/libxc/xc_tmem.c
> > @@ -73,11 +73,14 @@ int xc_tmem_control(xc_interface *xch,
> >  if ( subop == TMEMC_LIST && arg1 != 0 )
> >  {
> >  if ( buf == NULL )
> > -return -EINVAL;
> > +{
> > +errno = EINVAL;
> > +return -1;
> > +}
> >  if ( xc_hypercall_bounce_pre(xch, buf) )
> >  {
> >  PERROR("Could not bounce buffer for tmem control hypercall");
> > -return -ENOMEM;
> > +return -1;
> >  }
> >  }
> >  
> > @@ -118,11 +121,14 @@ int xc_tmem_control_oid(xc_interface *xch,
> >  if ( subop == TMEMC_LIST && arg1 != 0 )
> >  {
> >  if ( buf == NULL )
> > -return -EINVAL;
> > +{
> > +errno = EINVAL;
> > +return -1;
> > +}
> >  if ( xc_hypercall_bounce_pre(xch, buf) )
> >  {
> >  PERROR("Could not bounce buffer for tmem control (OID) 
> > hypercall");
> > -return -ENOMEM;
> > +return -1;
> >  }
> >  }
> >  
> > diff --git a/tools/xenstat/libxenstat/src/xenstat.c 
> > b/tools/xenstat/libxenstat/src/xenstat.c
> > index 8072a90..bf257ef 100644
> > --- a/tools/xenstat/libxenstat/src/xenstat.c
> > +++ b/tools/xenstat/libxenstat/src/xenstat.c
> > @@ -166,6 +166,7 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
> > unsigned int flags)
> > xc_domaininfo_t domaininfo[DOMAIN_CHUNK_SIZE];
> > int new_domains;
> > unsigned int i;
> > +   long rc;
> >  
> > /* Create the node */
> > node = (xenstat_node *) calloc(1, sizeof(xenstat_node));
> > @@ -189,9 +190,9 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
> > unsigned int flags)
> > node->free_mem = ((unsigned long long)physinfo.free_pages)
> > * handle->page_size;
> >  
> > -   node->freeable_mb = (long)xc_tmem_control(handle->xc_handle, -1,
> > +   rc = (long)xc_tmem_control(handle->xc_handle, -1,
> > TMEMC_QUERY_FREEABLE_MB, -1, 0, 0, 0, NULL);
> 
> Why the cast, why not make rc an int since that is what xc_tmem_control
> takes and you don't seem to use the full width anyway?

Right. int should be enough.
> 
> Or alternatively fix the return type of xc_tmem_control.
> 
> > -
> > +   node->freeable_mb = (rc < 0) ? 0 : rc;
> 
> Should rc not get propagated into an error for the caller?

Nope. If tmem is not enabled (so xc_tmem_control returns -ENOSYS)
freeable_mb should be zero. In this case we would have returned negative
values which is certainly not right.

> 
> > /* malloc(0) is not portable, so allocate a single domain.  This will
> >  * be resized below. */
> > node->domains = malloc(sizeof(xenstat_domain));
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Userspace PV backend hangs

2015-03-18 Thread Iurii Konovalenko
Hi, guys!

We continue bringing up System based on Xen hypervisor on Renesas Lager board.
We started 4 domains: Dom0 (Linux 3.14), Driver Domain (Linux 3.14),
Cluster Domain (Linux 3.14), and DomU (Tizen with 3.14 kernel).
Each domain has 2 VCPUs (4 physical CPUs are up).
Currently, we use Xen version 4.6-unstable.
We've successfully brought up PVBLK and PVUSB - both backends and
frontends of these PV drivers live in Kernel-space.
But we have problems with PV drivers with backends, that live in user-space.
For example, Audio backend application hangs in the middle of
initialization. Same for other PV drivers.
Investigation shows, that we have deadlock. It is example of source:

if (!xs_watch(xsh, sndbk_params.reconnectpath, ""))
return -1;

xs_read(xsh, XBT_NULL, buf, NULL);

xs_read() hangs because dev/xenbus is locked by watch thread, created
in xs_watch().
xs_watch() creates thread, which locks dev/xenbus, and goes to sleep
in xenbus frontend (ret =
wait_event_interruptible(u->read_waitq,!list_empty(&u->read_buffers));)
until someone wake up it. But nobody can wake it up. We tried to
change value using utils (xenstore-write), but it helps only
sometimes: sometimes it helps thread to wake up, sometimes that thread
sleeps and does not wake up.
Sometimes backend passes this place successfully, but hangs on next
similar xs_watch() + xs_read() combination.
One more important note is, that if we set watch from via tools
(xenstore-watch) to the exactly same path, as in backend, and then
change value with xenstore-write, we see prints from xenstore-watch,
but xs_watch thread is still sleeping.

Could you please suggest what can be reason of such behavior? May be
you saw such behavior earlier?

Best regards.

Iurii Konovalenko | Senior Software Engineer
GlobalLogic

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread George Dunlap
On 03/18/2015 04:49 PM, Dario Faggioli wrote:
>> scheduler_init() is called before smp_prepare_cpus() in part because of
>> a dependency chain elsewhere: we cannot set up the idle domain until
>> scheduler_init() is called; and other places further on in the
>> initialization but before setting up cpu topology information assume
>> that the idle domain has been set up.
>>
> However, as you say yourself below, there is boot_cpu_data, which
> stashes the info we need for the boot cpu. It gets filled after
> init_idle_domain()->scheduler_init() right now, but I don't see a reason
> why it can't be pulled up a bit (and from my tests, it seems to work).

Well yes, my goal was first to primarily talk about the state of the
world and get all the facts and constraints out there; and then after
that to talk about what we can / could do to fix it. :-)

I do think we want to minimize the special casing as much as possible;
and I think it makes sense for setup to be the one trying to sort out
constraints, rather than the callees: i.e., for setup to say, "I know
scheduler_init may need the topology information for this cpu to be in
cpu_data[], so I'm going to make sure that's in place before I call it".

So if it's possible to arrange for cpu_data[] to be populated -- at
least with the topology information -- for each cpu before alloc_pdata
happens, I think that's ideal.

>> If this would work, I think it would be a lot better.  It would remove
>> the credit2-specific callback, and it would mean we wouldn't need an
>> additional test to filter out
>>
> I see. Ok, I guess I can give this a try. If it does not explode,
> because some weird dependency we can't see right now, we can either try
> harder to track it, or use the other solution outlined above as a
> fallback.

If you could look into this, that would be awesome. :-)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:43:40PM +, Jan Beulich wrote:
> >>> On 18.03.15 at 15:06,  wrote:
> > On Wed, Mar 18, 2015 at 07:41:55AM +, Jan Beulich wrote:
> >> >>> On 17.03.15 at 18:44,  wrote:
> >> > As you can see to preserve the existing functionality such as
> >> > being able to schedule N amount of interrupt injections
> >> > for the N interrupts we might get - I modified '->masked'
> >> > to be an atomic counter.
> >> 
> >> Why would that be? When an earlier interrupt wasn't fully handled,
> >> real hardware wouldn't latch more than one further instance either.
> > 
> > We acknowledge the interrupt in the hypervisor - as in we call
> > ->ack on the handler (which for MSI is an nop anyhow).
> 
> The case where ->ack is a nop (for the purposes here) is specifically
> not a problem, as that means we defer ack-ing the LAPIC (hence
> further instances can't show up).
> 
> > If the device is misconfigured and keeps on sending burst of
> > interrupts every 10 msec for 1msec we can dead-lock.
> 
> How is this different from the hypervisor itself not being fast
> enough to handle one instance before the next one shows up?

If by 'handle' you mean process it to the guest (so update guest vAPIC
and so on), then yes - this is exactly the case I am describing.

> I've been trying to reconstruct the rationale for our current
> treatment of maskable MSI sources (in that we ack them at the
> LAPIC right away), but so far wasn't really successful (sadly
> commit 5f4c1bb65e lacks any word of description other than
> its title).
> 
> (Ill behaved devices shouldn't be handed to guests anyway.)

They might become ill-behaved if the guest OS becomes
compromised.
> 
> > Either way we should tell the guest about those interrupts.
> >> 
> >> > The end result is that we can still live-lock. Unless we:
> >> >  - Drop on the floor the injection of N interrupts and
> >> >just deliever at max one per VMX_EXIT (and not bother
> >> >with interrupts arriving when we are in the VMX handler).
> >> 
> >> I'm afraid I again don't see the point here.
> > 
> > I am basing all of this on the assumption that we have
> > many interrupts for the same device coming it - and we have
> > not been able to tell the guest about it (the guest could
> > be descheduled, too slow, etc) so that it can do what it
> > needs to silence the device.
> 
> But that's the same as with the native hardware case: When there
> are new interrupt instances before the earlier one was acked, at
> most one will be seen at the point the interrupt becomes unmasked
> again.

Correct. However we split the 'handling' of an interrupt in two
stages. First stage is Acking it and activating an softirq to
process this dpci.

The second stage is running the softirq handler (processing)- and
right then we can get interrupted by the same interrupt (we have
Acked it - so the device is OK to send another one). The interrupt
handler (do_IRQ) will try to tell the softirq to process it.
And in here - depending on which flavour of RFC patches I've
posted - we could deadlock.

The deadlocks arise if we explicitly wait for the softirq to finish
in raise_softirq_for - as in we spin in raise_softirq_for for the
dpci to be out of running - while we have just stomped over the
softirq that was processing the dpci!

The live-lock scenario is also possible - if the device sends an
interrupt right as dpci_softirq is in hvm_dirq_assist - and it
does at such regular intervals that dpci_softirq ends up
rescheduling its dpci every time.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel

2015-03-18 Thread Ian Campbell
On Wed, 2015-03-18 at 12:45 +, Stefano Stabellini wrote:
> On Wed, 18 Mar 2015, Ian Campbell wrote:
> > On Wed, 2015-03-18 at 12:24 +0100, Martin Lucina wrote:
> > > ian.campb...@citrix.com said:
> > > > On Tue, 2015-03-17 at 15:27 +, Wei Liu wrote:
> > > > > This looks most interesting as it implies we can easily pipe a console
> > > > > to it.
> > > > 
> > > > BTW, rather than rawe consoles we should probably consider using the
> > > > channel extension: http://xenbits.xen.org/docs/unstable/misc/channel.txt
> > > 
> > > What would be the advantage/rationale for using channels rather than 
> > > vchan?
> > > (See my other reply to this thread)
> > 
> > Not much really.
> > 
> > About the only relevant difference between vchan and channels(/consoles)
> > is that there is an existing backend running on most xen systems
> > (xenconsoled) which can be leveraged in some cases for channels, whereas
> > vchan would need a specific backend writing for each case.
> > 
> > Apart from that implementation convenience vchan is probably going to be
> > better in terms of proper integration for the other end.
> > 
> > But iff the decision goes the way of consoles then using channels in
> > preference to raw consoles makes sense.
> 
> I think that for simplicity's sake and to limit dependencies on the
> system, using consoles for low bandwidth channels, such as QMP, is
> preferable.

s/consoles/channels/, please ;-)

That said, a having libxl be a user of libvchan to slurp the data in/out
of qemu directly (perhaps using the datacopier infrastructure) might be
nicer from a design point of view, since it would mean libxl could
read/write things directly instead of via a temp file and it takes
xenconsoled out of that path, which might be nice.

Ian,



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 14/14] libxl: Fix do_memory_op to return negative value on errors

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> instead of the -Exx values (which should go in errno).
> 
> This patch has HUGE implications.

For such a small patch!

>  There is a lot of APIs
> that are using do_memory_op. Fortunatly most of them

"Fortunately".

> check for 'if (do_memory_op(..) < 0)' so will function
> properly. However there were some which printed the return
> value to the user. They have been fixed in:
> 
>  libxl: Don't assign return value to errno for E820 get/set xc_ calls.
>  libxl: Check xc_sharing_* for proper return values.
>  libxl: Print xc_domain_decrease_reservation proper errno value.
>  libxl: If xc_domain_add_to_physmap fails, include errno value
>  libxl: Check xc_maximum_ram_page for negative return values.
>  libxl: Check xc_domain_maximum_gpfn for negative return values

s/libxl/libxc/ in most of them.

> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 

> ---
>  tools/libxc/xc_private.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
> index 0735e23..1222d05 100644
> --- a/tools/libxc/xc_private.c
> +++ b/tools/libxc/xc_private.c
> @@ -516,7 +516,7 @@ int do_memory_op(xc_interface *xch, int cmd, void *arg, 
> size_t len)
>  {
>  DECLARE_HYPERCALL;
>  DECLARE_HYPERCALL_BOUNCE(arg, len, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> -long ret = -EINVAL;
> +long ret = -1;
>  
>  if ( xc_hypercall_bounce_pre(xch, arg) )
>  {



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/14] libxl: Check xc_sharing_* for proper return values.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> If there is a negative return value - check for that and
> also use errno for the proper error value.

Was xc_sharing_freed_pages fixed earlier in the series (which is
strictly speaking a bisection hazard, but nevermind) or was the existing
check for l == -E... just buggy?

> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxl/libxl.c  |  4 ++--
>  tools/tests/mem-sharing/memshrtool.c | 12 ++--
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 94b4d59..99a99e9 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5024,7 +5024,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
> *physinfo)
>  physinfo->scrub_pages = xcphysinfo.scrub_pages;
>  physinfo->outstanding_pages = xcphysinfo.outstanding_pages;
>  l = xc_sharing_freed_pages(ctx->xch);
> -if (l == -ENOSYS) {
> +if (l < 0 && errno == ENOSYS) {
>  l = 0;
>  } else if (l < 0) {
>  LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
> @@ -5033,7 +5033,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
> *physinfo)
>  }
>  physinfo->sharing_freed_pages = l;
>  l = xc_sharing_used_frames(ctx->xch);
> -if (l == -ENOSYS) {
> +if (l < 0 && errno == ENOSYS) {
>  l = 0;
>  } else if (l < 0) {
>  LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, l,
> diff --git a/tools/tests/mem-sharing/memshrtool.c 
> b/tools/tests/mem-sharing/memshrtool.c
> index db44294..d1dd8a2 100644
> --- a/tools/tests/mem-sharing/memshrtool.c
> +++ b/tools/tests/mem-sharing/memshrtool.c
> @@ -55,11 +55,19 @@ int main(int argc, const char** argv)
>  
>  if( !strcasecmp(cmd, "info") )
>  {
> +long rc;
>  if( argc != 2 )
>  return usage(argv[0]);
>  
> -printf("used = %ld\n", xc_sharing_used_frames(xch));
> -printf("freed = %ld\n", xc_sharing_freed_pages(xch));
> +rc = xc_sharing_freed_pages(xch);
> +if ( rc < 0 )
> +return errno;

Just return 1 please.

> +
> +printf("used = %ld\n", rc);
> +rc = xc_sharing_used_frames(xch);
> +if ( rc < 0 )
> +return errno;

Likewise.

> +printf("freed = %ld\n", rc);
>  }
>  else if( !strcasecmp(cmd, "enable") )
>  {



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread Dario Faggioli
On Wed, 2015-03-18 at 15:26 +, George Dunlap wrote:
> On 03/18/2015 08:53 AM, Dario Faggioli wrote:
> >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, 
> >>> int cpu)
> >>>  static void *
> >>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> >>>  {
> >>> -/* Check to see if the cpu is online yet */
> >>> -/* Note: cpu 0 doesn't get a STARTING callback */
> >>> -if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> >>> +/*
> >>> + * Actual initialization is deferred to when the pCPU will be
> >>> + * online, via a STARTING callback. The only exception is
> >>> + * the boot cpu, which does not get such a notification, and
> >>> + * hence needs to be taken care of here.
> >>> + */
> >>> +if ( system_state == SYS_STATE_boot )
> >>>  init_pcpu(ops, cpu);
> 
> I think this will break adding a cpu to a cpupool after boot, won't it?
> 
Oh, yeah, that. Yes, I knew it, and probably should have mentioned it...
I assumed it was clear enough that I was asking an opinion about the
shown use of system_state, as opposed to using a particular init value
of cpu_data.phys_proc_id to the same effect, during the boot process.

> Don't we want effectively, "if ( is_boot_cpu() ||
> is_cpu_topology_set_up() )"?
> 
> is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
> suppose?
> 
That is what I was thinking. Alternatively, I thought we can keep the
'<= SYS_STATE_boot' part for recognizing that the call comes from within
the boot process, and actually do tweak the initialization/assignment
logic of phys_proc_id, as it was also mentioned previously in the thread
(and as you also mention below, IIUIC), to make it better represent
whether topology info are valid or not.

> Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?
> 
It is a valid test for that, right now, AFAICT, but Jan would like for
it to stop being one. :-D

> So let's step back for a minute, and let me see if I can describe the
> situation.
> 
Yeah... thanks for this, it's been extremely helpful, IMO!

> Unlike the other schedulers, credit2 needs to know the topology of a
> cpus when setting it up, so that we can place it in the proper runqueue.
> 
> Setting up the per-cpu structures would normally happen in alloc_pdata.
>  This is called in three different ways:
> 
> * During boot, on the boot processer, alloc_pdata is called from
> scheduler_init().
> 
> * During boot, on the secondary processors, alloc_pdata is called from
> a CPU_UP_PREPARE callback, which happens just before the cpu is actually
> brought up.
> 
> * When switching a cpu to a new cpupool after boot, alloc_pdata is also
> called from cpu_schedule_switch().
> 
> The "normal" place the cpu topology information can be found is in
> global the cpu_data[] array, typically accessed by the cpu_to_socket()
> or cpu_to_core() macros.
> 
> This topology information is written to cpu_data[] by
> smp_store_cpu_info().  For the boot cpu, it happens in
> smp_prepare_cpus();  For secondary cpus, it's happens in
> start_secondary(), which is the code run on the cpu itself as it's being
> brought up.
> 
> Unfortunately, at the moment, both of these places are after the
> respective places where the alloc pdata is called for those cpus.
> Flattening the entire x86 setup at the moment you'd see:
>   alloc_pdata for boot cpu
>   smp_store_cpu_info for boot cpu
>   for each secondary cpu
> alloc_pdata for secondary cpu
> smp_store_cpu_info for secondary cpu
> 
> scheduler_init() is called before smp_prepare_cpus() in part because of
> a dependency chain elsewhere: we cannot set up the idle domain until
> scheduler_init() is called; and other places further on in the
> initialization but before setting up cpu topology information assume
> that the idle domain has been set up.
> 
However, as you say yourself below, there is boot_cpu_data, which
stashes the info we need for the boot cpu. It gets filled after
init_idle_domain()->scheduler_init() right now, but I don't see a reason
why it can't be pulled up a bit (and from my tests, it seems to work).

That happens during system_state==SYS_STATE_boot, which also means
system_state I haven't actually tracked down this dependency yet; nor have I explored
> the possibility of populating cpu_data[] for the boot cpu earier than
> it's done now.
> 
That looks like what boot_cpu_data is for, AFAICT.

> I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
> than CPU_STARTING -- whether that's just what seemed most sensible when
> cpupools were created, or whether there's a dependency somewhere between
> those two points.
> 
Me neither, this may be worth a try.

> At the moment we deal with the fact that the topology information for a
> CPU is not available on CPU_UP_PREPARE by setting a callback that will
> call init_pcpu() on the CPU_STARTING action.
> 
> The boot cpu will not get such a callback; but information about the
> topo

Re: [Xen-devel] [PATCH 11/14] libxl: If xc_domain_add_to_physmap fails, include errno value

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:

$subject says libxl but means libxc.

Actually, now I notice it so do almost all of the patches! Once fixed
any acks I gave (or will give shortly) can still be applied.

> Instead of just the return value.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xc_dom_x86.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index bf06fe4..20e379c 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -865,7 +865,8 @@ static int map_grant_table_frames(struct xc_dom_image 
> *dom)
>  }
>  xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>   "%s: mapping grant tables failed " "(pfn=0x%" PRIpfn
> - ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc);
> + ", rc=%d, errno=%d)", __FUNCTION__,
> + dom->total_pages + i, rc ,errno);

s/ ,/, /

(hey, I drew a pretty picture)

With that fixed: Acked-by Ian Campbell 

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 10/14] libxl: Check xc_maximum_ram_page for negative return values.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> Instead of assuming everything is always OK.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xg_save_restore.h | 3 +++
>  tools/misc/xen-mfndump.c  | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> index bdd9009..6d26a0a 100644
> --- a/tools/libxc/xg_save_restore.h
> +++ b/tools/libxc/xg_save_restore.h
> @@ -313,6 +313,9 @@ static inline int get_platform_info(xc_interface *xch, 
> uint32_t dom,
>  
>  *max_mfn = xc_maximum_ram_page(xch);
>  
> +if (*max_mfn < 0)
> +return 0;

*max_mfn is unsigned, so the negative would have been lost already.
Perhaps do the same sort of thing with a helper as the last patch?

> +
>  *hvirt_start = xen_params.virt_start;
>  
>  if ( xc_domain_get_guest_width(xch, dom, guest_width) != 0)
> diff --git a/tools/misc/xen-mfndump.c b/tools/misc/xen-mfndump.c
> index 0761f6e..81ef448 100644
> --- a/tools/misc/xen-mfndump.c
> +++ b/tools/misc/xen-mfndump.c
> @@ -184,7 +184,7 @@ int dump_ptes_func(int argc, char *argv[])
>  
>  /* Map M2P and obtain gpfn */
>  max_mfn = xc_maximum_ram_page(xch);
> -if ( (mfn > max_mfn) ||
> +if ( (max_mfn < 0 ) || (mfn > max_mfn) ||

Same here.

>   !(m2p_table = xc_map_m2p(xch, max_mfn, PROT_READ, NULL)) )
>  {
>  xc_unmap_domain_meminfo(xch, &minfo);



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 09/14] libxl: Check xc_domain_maximum_gpfn for negative return values

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> Instead of assuming everything is always OK.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xc_core_arm.c| 14 +++---
>  tools/libxc/xc_core_x86.c| 21 +
>  tools/libxc/xc_domain_save.c |  8 +++-
>  3 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> index 34185cf..654692a 100644
> --- a/tools/libxc/xc_core_arm.c
> +++ b/tools/libxc/xc_core_arm.c
> @@ -31,9 +31,13 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
> *arch_ctxt,
>  }
>  
> 
> -static int nr_gpfns(xc_interface *xch, domid_t domid)
> +static int nr_gpfns(xc_interface *xch, domid_t domid, int *rc)

It would be more natural to return the rc directly and update a pointer
argument for the max gpfn on success.

> diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> index b5d442d..426b90d 100644
> --- a/tools/libxc/xc_core_x86.c
> +++ b/tools/libxc/xc_core_x86.c
> @@ -36,9 +36,13 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
> *arch_ctxt,
>  }
>  
> 
> -static int nr_gpfns(xc_interface *xch, domid_t domid)
> +static int nr_gpfns(xc_interface *xch, domid_t domid, int *rc)

Likewise, although perhaps you could consolidate those two into one
common wrapper? And then use it everywhere (there were some open-coded
ones later on).

Or even fix the prototype of xc_domain_maximum_gpfn turning it into said
helper, it doesn't look like it has too many callers.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.

2015-03-18 Thread Jan Beulich
>>> On 18.03.15 at 15:06,  wrote:
> On Wed, Mar 18, 2015 at 07:41:55AM +, Jan Beulich wrote:
>> >>> On 17.03.15 at 18:44,  wrote:
>> > As you can see to preserve the existing functionality such as
>> > being able to schedule N amount of interrupt injections
>> > for the N interrupts we might get - I modified '->masked'
>> > to be an atomic counter.
>> 
>> Why would that be? When an earlier interrupt wasn't fully handled,
>> real hardware wouldn't latch more than one further instance either.
> 
> We acknowledge the interrupt in the hypervisor - as in we call
> ->ack on the handler (which for MSI is an nop anyhow).

The case where ->ack is a nop (for the purposes here) is specifically
not a problem, as that means we defer ack-ing the LAPIC (hence
further instances can't show up).

> If the device is misconfigured and keeps on sending burst of
> interrupts every 10 msec for 1msec we can dead-lock.

How is this different from the hypervisor itself not being fast
enough to handle one instance before the next one shows up?
I've been trying to reconstruct the rationale for our current
treatment of maskable MSI sources (in that we ack them at the
LAPIC right away), but so far wasn't really successful (sadly
commit 5f4c1bb65e lacks any word of description other than
its title).

(Ill behaved devices shouldn't be handed to guests anyway.)

> Either way we should tell the guest about those interrupts.
>> 
>> > The end result is that we can still live-lock. Unless we:
>> >  - Drop on the floor the injection of N interrupts and
>> >just deliever at max one per VMX_EXIT (and not bother
>> >with interrupts arriving when we are in the VMX handler).
>> 
>> I'm afraid I again don't see the point here.
> 
> I am basing all of this on the assumption that we have
> many interrupts for the same device coming it - and we have
> not been able to tell the guest about it (the guest could
> be descheduled, too slow, etc) so that it can do what it
> needs to silence the device.

But that's the same as with the native hardware case: When there
are new interrupt instances before the earlier one was acked, at
most one will be seen at the point the interrupt becomes unmasked
again.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] xen-netback: making the bandwidth limiter runtime settable

2015-03-18 Thread Wei Liu
On Wed, Mar 18, 2015 at 05:21:08PM +0100, Imre Palik wrote:
> On 03/17/15 12:17, Wei Liu wrote:
> > On Fri, Mar 13, 2015 at 01:51:05PM +0100, Imre Palik wrote:
> >> From: "Palik, Imre" 
> >>
> >> With the current netback, the bandwidth limiter's parameters are only
> >> settable during vif setup time.  This patch register a watch on them, and
> >> thus makes them runtime changeable.
> >>
> >> When the watch fires, the timer is reset.  The timer's mutex is used for
> >> fencing the change.
> >>
> > 
> > I think this is a valid idea.  Just that this commit message is not
> > complete. It doesn't describe everything this patch does.
> > 
> >> Cc: Anthony Liguori 
> >> Signed-off-by: Imre Palik 
> >> ---
> > [...]
> >>queue->rx_queue_max = XENVIF_RX_QUEUE_BYTES;
> >> diff --git a/drivers/net/xen-netback/netback.c 
> >> b/drivers/net/xen-netback/netback.c
> >> index cab9f52..bcc1880 100644
> >> --- a/drivers/net/xen-netback/netback.c
> >> +++ b/drivers/net/xen-netback/netback.c
> >> @@ -641,7 +641,7 @@ static void tx_add_credit(struct xenvif_queue *queue)
> >>queue->remaining_credit = min(max_credit, max_burst);
> >>  }
> >>  
> >> -static void tx_credit_callback(unsigned long data)
> >> +void xenvif_tx_credit_callback(unsigned long data)
> > 
> > Please keep this function static.
> 
> The trouble with that, is that now I am initialising credit_timeout.function 
> in
> drivers/net/xen-netback/interface.c .
> 

Oh, yes. I misread the hunk of common.h. Sorry about the noise.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1] Fix libxc return -E misusage.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> Hey,
> 
> Please see the following set of patches which fix the various
> usage of return -Exx instead of return -1 for errors (and
> stashing the error value in errno).

All of them, or just a subset of the callpaths?

If it's all of them then I'm amazed it was only 14 patches!

Either way, well done and thanks for draining this swamp even a bit!

> It also cleans up some of the invalid usage of errno (as the
> underlaying calls already stash errno values) - and we just
> need to bubble them up.
> 
> Lastly it also wraps an errno-invisibility shield against
> xc_hypercall_bounce_* calls so that any errors in those
> won't over-write the hypercall ones.
> 
> 
>  tools/libxc/xc_core_arm.c  | 15 --
>  tools/libxc/xc_core_x86.c  | 22 +++---
>  tools/libxc/xc_cpupool.c   |  4 +--
>  tools/libxc/xc_dom_x86.c   |  7 +++--
>  tools/libxc/xc_domain.c|  2 +-
>  tools/libxc/xc_domain_save.c   |  8 -
>  tools/libxc/xc_freebsd_osdep.c |  3 ++
>  tools/libxc/xc_hcall_buf.c |  6 
>  tools/libxc/xc_linux_osdep.c   |  3 ++
>  tools/libxc/xc_offline_page.c  | 36 +--
>  tools/libxc/xc_physdev.c   | 12 +---
>  tools/libxc/xc_pm.c| 54 
> ++
>  tools/libxc/xc_private.c   |  4 +--
>  tools/libxc/xc_tmem.c  | 14 ++---
>  tools/libxc/xg_save_restore.h  |  3 ++
>  tools/libxl/libxl.c|  4 +--
>  tools/libxl/libxl_x86.c|  9 ++
>  tools/misc/xen-hptool.c|  6 ++--
>  tools/misc/xen-mfndump.c   |  2 +-
>  tools/tests/mem-sharing/memshrtool.c   | 12 ++--
>  tools/xenstat/libxenstat/src/xenstat.c |  5 ++--
>  21 files changed, 158 insertions(+), 73 deletions(-)
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Any work on sharing of large multi-page segments?

2015-03-18 Thread George Dunlap
On Tue, Mar 17, 2015 at 11:45 PM, Andrew Warkentin  wrote:
> On 3/17/15, George Dunlap  wrote:
>> Any deduplication code would run in as a process probably in domain 0,
>> and may be somewhat slow; but the actual mechanism of sharing is a
>> generic mechanism in the hypervisor which any client can use.  Jan is
>> suggesting that you might be able to use that interface to
>> pro-actively tell Xen about the memory pages shared between your
>> various domains.
>>
>
> I wasn't quite sure if it's generic enough to use to implement shared
> segments, or if it were specific to deduplication at the hypervisor
> level.

I haven't used the shared memory interface myself, but we designed it
I'm pretty sure that was the intention.  An idea was discussed, for
instance, was that scanning random pages for duplicates has turned out
to actually be not much benefit for VMWare; but that a more promising
avenue might be hooking into the block layer, so that if VM A and VM B
have disks that share an image base, and block X is shared, that if VM
A reads block X and then VM B reads block X, instead of issuing a
second DMA into VM B's memory, the disk layer can just tell Xen,
"please share this page copy-on-write between VM A and VM B".

Remember also that Jan was recommending this method for sharing
read-only shared data and executable segments, not for areas for
communication (i.e.., where both are going to want to write to the
memory and have the other side see the updates).  For that you've got
to go with grant tables.

Anyway, I think it's worth looking into.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 13/14] libxl: Don't assign return value to errno for E820 get/set xc_ calls.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:

> We should be using the errno that the hypercall left
> instead of overwritting it with the return value.

"overwriting"

> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/14] libxc: Fix xc_domain_get_tsc_info to return -1 instead of -Exx.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> We don't need to put fill errno because xc_hypercall_buffer_alloc
> fills the errno with the appropiate errno values and we just
> need to pass them up the stack.

"appropriate"

> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/14] libxc: xc_core_arch_memory_map_get populate errno

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> with proper value (ENOMEM) when reporting failures.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
> [v1: errno before using PERROR]
> ---
>  tools/libxc/xc_core_arm.c | 1 +
>  tools/libxc/xc_core_x86.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> index 16508e7..34185cf 100644
> --- a/tools/libxc/xc_core_arm.c
> +++ b/tools/libxc/xc_core_arm.c
> @@ -54,6 +54,7 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
> xc_core_arch_context *unus
>  map = malloc(sizeof(*map));
>  if ( map == NULL )
>  {
> +errno = ENOMEM;

http://pubs.opengroup.org/onlinepubs/9699919799/functions/malloc.html#tag_16_311
 says that malloc will set errno itself:
"Otherwise, it shall return a null pointer [CX] [Option Start]
and set errno to indicate the error"



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 02/14] libxl: Propagate errno from hypercall instead of anything else.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> After we have done the hypercall - the errno has the failure
> code. However our usage of pthread and munmap can trigger them
> to manipulate the errno with their failure values. That would
> be bad as what we care about is just the hypercall error value.
> 
> Another solution to this would be to save the 'errno' from
> pthread/munmap/madvise as an extra parameter to be analyzed
> later. However the call-sites above us do not care about it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 

> ---
>  tools/libxc/xc_freebsd_osdep.c | 3 +++
>  tools/libxc/xc_hcall_buf.c | 6 ++
>  tools/libxc/xc_linux_osdep.c   | 3 +++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/tools/libxc/xc_freebsd_osdep.c b/tools/libxc/xc_freebsd_osdep.c
> index 151d3bf..bb6d240 100644
> --- a/tools/libxc/xc_freebsd_osdep.c
> +++ b/tools/libxc/xc_freebsd_osdep.c
> @@ -125,10 +125,13 @@ static void 
> freebsd_privcmd_free_hypercall_buffer(xc_interface *xch,
>int npages)
>  {
>  
> +int saved_errno = errno;
>  /* Unlock pages */
>  munlock(ptr, npages * XC_PAGE_SIZE);
>  
>  munmap(ptr, npages * XC_PAGE_SIZE);
> +/* We MUST propagate the hypercall errno, not unmap calls. */

If you have cause to resend then:

 /* We MUST Propagate the hypercall's ernno, not the unmap call's. */.

> +/* Ignore the pthread errors. */

s/the //

(both throughout).

Not a big deal but if it needs resending anyway

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread Dario Faggioli
On Wed, 2015-03-18 at 16:08 +, George Dunlap wrote:
> On 03/18/2015 03:59 PM, Jan Beulich wrote:
>  On 18.03.15 at 16:26,  wrote:
> >> In both cases there's a slight risk in using system_state to determine
> >> whether to rely on cpu_data[] or not, because there's actually a window
> >> for each processor after system_state == SYS_STATE_smp_boot where
> >> cpu_data[] is *not* initialized, but it's not obvious from looking at
> >> the data itself.  If the callback mechanisms ever change order with the
> >> cpu_data[] initializations in the future, we risk a situation where
> >> credit2 silently regresses to using a single massive runqueue.
> > 
> > But such an ordering change is extremely unlikely, since the
> > CPU_STARTING notification specifically tells you that the given
> > CPU is now ready for normal use, which includes it having its
> > topology related data set up.

> Even if they look at init_pcpu() in credit2, they're
> unlikely to know that cpu_to_socket() isn't valid until after
> boot_secondary() has happened; and if they try it and boot it,
> everything will seem to work perfectly for credit2 -- except that there
> will be a single global runqueue.
> 
Agreed... and I'm still replying to your (George's) email, but just
wanted to say that, for this, having system_state referenced from
Credit2's code would help making it clear the fact that code has
dependencies with the boot process, wouldn't it?

> If you, Dario and I don't happen to catch it -- or don't happen to
> remember the exact details of the constraints -- nobody may notice until
> a year later.
> 
:-)

> This is the point of having ASSERTs -- so that we don't have to worry
> about remembering and catching all these crazy constraints.
> 
I'm all for finding a way for ASSERT()ing something meaningful to this
effect.

Regards,
Dario


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 05/14] libxl: xc_physdev_map return -1 and populate errno.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:21:52PM +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> > The users of these (qemu) check for a negative value
> > so we are safe in regards to that. However they
> > also use the return value to inform the user of the
> > error.
> 
> IIRC I saw a qemu patch go past to fix the callers?
> 

Yes and Stefano has already Acked them.
> > Signed-off-by: Konrad Rzeszutek Wilk 
> 
> Acked-by: Ian Campbell 

Yeey!
> 
> > ---
> >  tools/libxc/xc_physdev.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c
> > index cf02d85..9b064b8 100644
> > --- a/tools/libxc/xc_physdev.c
> > +++ b/tools/libxc/xc_physdev.c
> > @@ -43,8 +43,10 @@ int xc_physdev_map_pirq(xc_interface *xch,
> >  struct physdev_map_pirq map;
> >  
> >  if ( !pirq )
> > -return -EINVAL;
> > -
> > +{
> > +errno = EINVAL;
> > +return -1;
> > +}
> >  memset(&map, 0, sizeof(struct physdev_map_pirq));
> >  map.domid = domid;
> >  map.type = MAP_PIRQ_TYPE_GSI;
> > @@ -72,8 +74,10 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
> >  struct physdev_map_pirq map;
> >  
> >  if ( !pirq )
> > -return -EINVAL;
> > -
> > +{
> > +errno = EINVAL;
> > +return -1;
> > +}
> >  memset(&map, 0, sizeof(struct physdev_map_pirq));
> >  map.domid = domid;
> >  map.type = MAP_PIRQ_TYPE_MSI;
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/14] libxl: Fix xc_tmem_control to return proper error.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> The API returns now negative values on error and stashes
> the error in errno. Fix the user of this API.
> 
> The 'xc_hypercall_bounce_pre' can fail - and if so it will
> stash its errno values - no need to over-write it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xc_tmem.c  | 14 ++
>  tools/xenstat/libxenstat/src/xenstat.c |  5 +++--
>  2 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxc/xc_tmem.c b/tools/libxc/xc_tmem.c
> index 3261e10..02797bf 100644
> --- a/tools/libxc/xc_tmem.c
> +++ b/tools/libxc/xc_tmem.c
> @@ -73,11 +73,14 @@ int xc_tmem_control(xc_interface *xch,
>  if ( subop == TMEMC_LIST && arg1 != 0 )
>  {
>  if ( buf == NULL )
> -return -EINVAL;
> +{
> +errno = EINVAL;
> +return -1;
> +}
>  if ( xc_hypercall_bounce_pre(xch, buf) )
>  {
>  PERROR("Could not bounce buffer for tmem control hypercall");
> -return -ENOMEM;
> +return -1;
>  }
>  }
>  
> @@ -118,11 +121,14 @@ int xc_tmem_control_oid(xc_interface *xch,
>  if ( subop == TMEMC_LIST && arg1 != 0 )
>  {
>  if ( buf == NULL )
> -return -EINVAL;
> +{
> +errno = EINVAL;
> +return -1;
> +}
>  if ( xc_hypercall_bounce_pre(xch, buf) )
>  {
>  PERROR("Could not bounce buffer for tmem control (OID) 
> hypercall");
> -return -ENOMEM;
> +return -1;
>  }
>  }
>  
> diff --git a/tools/xenstat/libxenstat/src/xenstat.c 
> b/tools/xenstat/libxenstat/src/xenstat.c
> index 8072a90..bf257ef 100644
> --- a/tools/xenstat/libxenstat/src/xenstat.c
> +++ b/tools/xenstat/libxenstat/src/xenstat.c
> @@ -166,6 +166,7 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
> unsigned int flags)
>   xc_domaininfo_t domaininfo[DOMAIN_CHUNK_SIZE];
>   int new_domains;
>   unsigned int i;
> + long rc;
>  
>   /* Create the node */
>   node = (xenstat_node *) calloc(1, sizeof(xenstat_node));
> @@ -189,9 +190,9 @@ xenstat_node *xenstat_get_node(xenstat_handle * handle, 
> unsigned int flags)
>   node->free_mem = ((unsigned long long)physinfo.free_pages)
>   * handle->page_size;
>  
> - node->freeable_mb = (long)xc_tmem_control(handle->xc_handle, -1,
> + rc = (long)xc_tmem_control(handle->xc_handle, -1,
>   TMEMC_QUERY_FREEABLE_MB, -1, 0, 0, 0, NULL);

Why the cast, why not make rc an int since that is what xc_tmem_control
takes and you don't seem to use the full width anyway?

Or alternatively fix the return type of xc_tmem_control.

> -
> + node->freeable_mb = (rc < 0) ? 0 : rc;

Should rc not get propagated into an error for the caller?

>   /* malloc(0) is not portable, so allocate a single domain.  This will
>* be resized below. */
>   node->domains = malloc(sizeof(xenstat_domain));



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] xentop: add support for qdisks

2015-03-18 Thread Ian Campbell
On Wed, 2015-03-11 at 11:51 -0600, Charles Arnold wrote:
> Now that Xen uses qdisks by default and qemu does not write out
> statistics to sysfs this patch queries the QMP for disk statistics.
> 
> This patch depends on libyajl for parsing statistics returned from
> QMP. The runtime requires libyajl 2.0.3 or newer for required bug
> fixes in yajl_tree_parse().

Elsewhere we currently support libyajl1 even, which means that this is
all configure tests for.

You say bug fixes here, but the code comment says:
 /* Use libyajl version 2.1.x or newer for the tree parser feature with bug 
fixes */

which suggests it perhaps didn't even exist in earlier versions. Also I
note the quoted versions differ, FWIW.

Whether the interface exists (even in buggy form) or not in older
versions is important because if it doesn't exist then that would be a
build failure, which we would want to avoid.

Whereas a functional failure would perhaps be tolerable. However, given
the existing HAVE_YAJL_YAJL_VERSION_H define I think the code could
easily check if the YAJL library is good enough at compile time and stub
itself out -- i.e. not report qdisk stats if the yajl doesn't do the
job.

My second concern here is with the use of /var/run/xen/qmp-libxl-%i from
outside of libxl. I can't remember if qemu is safe against multiple
users of the socket. ISTR asking Anthony this before, but I don't recall
the answer, sorry :-(

Even if it is strictly speaking ok it seems a bit warty to do it, but
perhaps for an in-tree user like libxenstat it is tolerable.
Alternatively we could (relatively) easily arrange for their to be a
second qemp-libxenstat-%i socket, assuming the qemu overhead of a second
one is sane.

Would it be possible to include somewhere, either in a code comment or
in the changelog, an example of the JSON response to the QMP commands.

(I'm also consistently surprised by the lack of a qmp client library,
but that's not your fault!)

> diff --git a/tools/xenstat/libxenstat/src/xenstat.c 
> b/tools/xenstat/libxenstat/src/xenstat.c
> index 8072a90..f3847be 100644
> --- a/tools/xenstat/libxenstat/src/xenstat.c
> +++ b/tools/xenstat/libxenstat/src/xenstat.c
> @@ -657,6 +657,24 @@ static void xenstat_uninit_xen_version(xenstat_handle * 
> handle)
>   * VBD functions
>   */
>  
> +/* Save VBD information */
> +xenstat_vbd *xenstat_save_vbd(xenstat_domain *domain, xenstat_vbd *vbd)
> +{
> +if (domain->vbds == NULL) {
> +domain->num_vbds = 1;
> +domain->vbds = malloc(sizeof(xenstat_vbd));
> +} else {
> +domain->num_vbds++;
> +domain->vbds = realloc(domain->vbds,
> +   domain->num_vbds *
> +   sizeof(xenstat_vbd));
> +}

FYI realloc handles the old pointer being NULL just fine, so you don't
need to special case that so long as num_vbds starts out initialised to
0.

Also, if realloc returns NULL then you need to have remembered the old
value to free it, else it gets leaked.

> @@ -477,18 +480,10 @@ int xenstat_collect_vbds(xenstat_node * node)
>   continue;
>   }
>  
> - if (domain->vbds == NULL) {
> - domain->num_vbds = 1;
> - domain->vbds = malloc(sizeof(xenstat_vbd));
> - } else {
> - domain->num_vbds++;
> - domain->vbds = realloc(domain->vbds,
> -domain->num_vbds *
> -sizeof(xenstat_vbd));
> - }

Oh, I see my comments above were actually on the old code you were
moving.


> + /* Use libyajl version 2.1.x or newer for the tree parser feature with 
> bug fixes */
> + if ((info = yajl_tree_parse((char *)qmp_stats, errbuf, sizeof(errbuf))) 
> == NULL) {

You don't want to log something using errbuf? If not then it may as well
be as small as possible.

> + /* Use libyajl version 2.0.3 or newer for the tree parser feature */
> + if ((info = yajl_tree_parse((char *)stats_buf, errbuf, sizeof(errbuf))) 
> == NULL)

Likewise.

> +/* Get all the active domains */

actually only up to 1024 of them ;-)

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 07/14] libxl: Fix xc_pm API calls to return negative error and stash error in errno.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> Oddly enough the user of this API did the right thing -
> check for return being negative and used 'errno' for the
> real error.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/14] libxl: Return negative value and propagate errno for xc_offline_page API

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> Instead of returning -Exx we now return -1 for error.
> We could stash the -Exx values in errno values but why - the
> underlaying functions we call all stash the proper errno
> value. Hence we just propagate it up wherver it is needed.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 05/14] libxl: xc_physdev_map return -1 and populate errno.

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> The users of these (qemu) check for a negative value
> so we are safe in regards to that. However they
> also use the return value to inform the user of the
> error.

IIRC I saw a qemu patch go past to fix the callers?

> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 

> ---
>  tools/libxc/xc_physdev.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_physdev.c b/tools/libxc/xc_physdev.c
> index cf02d85..9b064b8 100644
> --- a/tools/libxc/xc_physdev.c
> +++ b/tools/libxc/xc_physdev.c
> @@ -43,8 +43,10 @@ int xc_physdev_map_pirq(xc_interface *xch,
>  struct physdev_map_pirq map;
>  
>  if ( !pirq )
> -return -EINVAL;
> -
> +{
> +errno = EINVAL;
> +return -1;
> +}
>  memset(&map, 0, sizeof(struct physdev_map_pirq));
>  map.domid = domid;
>  map.type = MAP_PIRQ_TYPE_GSI;
> @@ -72,8 +74,10 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
>  struct physdev_map_pirq map;
>  
>  if ( !pirq )
> -return -EINVAL;
> -
> +{
> +errno = EINVAL;
> +return -1;
> +}
>  memset(&map, 0, sizeof(struct physdev_map_pirq));
>  map.domid = domid;
>  map.type = MAP_PIRQ_TYPE_MSI;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] xen-netback: making the bandwidth limiter runtime settable

2015-03-18 Thread Imre Palik
On 03/17/15 12:17, Wei Liu wrote:
> On Fri, Mar 13, 2015 at 01:51:05PM +0100, Imre Palik wrote:
>> From: "Palik, Imre" 
>>
>> With the current netback, the bandwidth limiter's parameters are only
>> settable during vif setup time.  This patch register a watch on them, and
>> thus makes them runtime changeable.
>>
>> When the watch fires, the timer is reset.  The timer's mutex is used for
>> fencing the change.
>>
> 
> I think this is a valid idea.  Just that this commit message is not
> complete. It doesn't describe everything this patch does.
> 
>> Cc: Anthony Liguori 
>> Signed-off-by: Imre Palik 
>> ---
> [...]
>>  queue->rx_queue_max = XENVIF_RX_QUEUE_BYTES;
>> diff --git a/drivers/net/xen-netback/netback.c 
>> b/drivers/net/xen-netback/netback.c
>> index cab9f52..bcc1880 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -641,7 +641,7 @@ static void tx_add_credit(struct xenvif_queue *queue)
>>  queue->remaining_credit = min(max_credit, max_burst);
>>  }
>>  
>> -static void tx_credit_callback(unsigned long data)
>> +void xenvif_tx_credit_callback(unsigned long data)
> 
> Please keep this function static.

The trouble with that, is that now I am initialising credit_timeout.function in
drivers/net/xen-netback/interface.c .

The reason for the change is that mod_timer_pending() hits a BUG() if
the timeout function is not initialised.

> 
> And say in the commit message you change tx_credit_callback to a better
> name.
> 
>>  {
>>  struct xenvif_queue *queue = (struct xenvif_queue *)data;
>>  tx_add_credit(queue);
>> @@ -1170,7 +1170,7 @@ static bool tx_credit_exceeded(struct xenvif_queue 
>> *queue, unsigned size)
>>  queue->credit_timeout.data =
>>  (unsigned long)queue;
>>  queue->credit_timeout.function =
>> -tx_credit_callback;
>> +xenvif_tx_credit_callback;
>>  mod_timer(&queue->credit_timeout,
>>next_credit);
> [...]
>> @@ -594,13 +597,9 @@ static void xen_net_read_rate(struct xenbus_device *dev,
>>  unsigned long b, u;
>>  char *ratestr;
>>  
>> -/* Default to unlimited bandwidth. */
>> -*bytes = ~0UL;
>> -*usec = 0;
>> -
>>  ratestr = xenbus_read(XBT_NIL, dev->nodename, "rate", NULL);
>>  if (IS_ERR(ratestr))
>> -return;
>> +goto reset;
>>  
>>  s = ratestr;
>>  b = simple_strtoul(s, &e, 10);
>> @@ -612,15 +611,21 @@ static void xen_net_read_rate(struct xenbus_device 
>> *dev,
>>  if ((s == e) || (*e != '\0'))
>>  goto fail;
>>  
>> +kfree(ratestr);
>> +
>>  *bytes = b;
>>  *usec = u;
>>  
>> -kfree(ratestr);
>>  return;
>>  
>> - fail:
>> +fail:
>>  pr_warn("Failed to parse network rate limit. Traffic unlimited.\n");
>>  kfree(ratestr);
>> +
>> +reset:
>> +/* Default to unlimited bandwidth. */
>> +*bytes = ~0UL;
>> +*usec = 0;
>>  }
>>  
> 
> Any reason you modify this function? It is still doing the exact same
> thing, right?

These changes made sense before the channel support, but not anymore.
I will roll them back.

> 
>>  static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>> @@ -645,6 +650,59 @@ static int xen_net_read_mac(struct xenbus_device *dev, 
>> u8 mac[])
>>  return 0;
>>  }
>>  
>> +static void xen_net_rate_changed(struct xenbus_watch *watch,
>> +const char **vec, unsigned int len)
>> +{
>> +struct xenvif *vif = container_of(watch, struct xenvif, credit_watch);
>> +struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
>> +unsigned long   credit_bytes;
>> +unsigned long   credit_usec;
>> +unsigned int queue_index;
>> +
>> +xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>> +for (queue_index = 0; queue_index < vif->num_queues; queue_index++) {
>> +struct xenvif_queue *queue = &vif->queues[queue_index];
>> +
>> +queue->credit_bytes = credit_bytes;
>> +queue->credit_usec = credit_usec;
>> +if (!mod_timer_pending(&queue->credit_timeout, jiffies) &&
>> +queue->remaining_credit > queue->credit_bytes) {
>> +queue->remaining_credit = queue->credit_bytes;
>> +}
>> +}
>> +}
>> +
>> +static int xen_register_watchers(struct xenbus_device *dev, struct xenvif 
>> *vif)
>> +{
>> +int err = 0;
>> +char *node;
>> +unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");
>> +
>> +node = kmalloc(maxlen, GFP_KERNEL);
>> +if (!node)
>> +return -ENOMEM;
>> +sprintf(node, "%s/rate", dev->nodename);
> 
> Please use snprintf. (Though I can see using sprintf is fine here but I
> want the code to be a bit future proof.)
> 
>> +vif->credit_watch.node = node;
>> +vif->credit_watch.callback = xen_net_rate_changed;
>> +err = register_xenbus_watch(&vif->credit_watch);
>> +if (err

Re: [Xen-devel] [PATCH 01/14] libxc: Replaces tabs with spaces in xc_cpupool_freeinfo

2015-03-18 Thread Ian Campbell
On Mon, 2015-03-16 at 11:39 -0400, Konrad Rzeszutek Wilk wrote:
> The goto looks very wrong when the rest of the code
> has spaces.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 04:06:37PM +, Jan Beulich wrote:
> >>> On 18.03.15 at 14:58,  wrote:
> > On Wed, Mar 18, 2015 at 07:38:12AM +, Jan Beulich wrote:
> >> >>> On 17.03.15 at 18:15,  wrote:
> >> > The latest one (please see attached) would cause an dead-lock iff
> >> > on the CPU we are running the softirq and an do_IRQ comes for the
> >> > exact dpci we are in process of executing.
> >> 
> >> I'm afraid I'm not seeing it - please explain.
> > 
> > Lets assume that the device is an PCIe with MSI. We have only one
> > VCPU in the guest.
> > 
> > We receive the first interrupt, end up going:
> > vmx_vmexit_handler
> >  - case EXIT_REASON_EXTERNAL_INTERRUPT
> >\- vmx_do_extint
> > \- do_IRQ
> >   \- __do_IRQ_guest
> >   \- hvm_do_IRQ_dpci
> >  \- raise_softirq_for
> > [DPCI_SOFTIRQ bit set]
> > vmx_process_softirq
> >  sti
> >  do_softirq
> >-\ __do_sofitq_
> > \- dpci_softirq
> > -\ hvm_dirq_assist
> >[state is 'running']
> > 
> > [Same vector comes in]
> 
> Is that indeed possible? Afaict nothing in the code sequence above
> ack-ed the interrupt, and hence another one can't come in.

The do_IRQ acks it:

 854 spin_lock(&desc->lock);
 855 desc->handler->ack(desc);

which is:
 424 static void ack_maskable_msi_irq(struct irq_desc *desc)
 425 {
 426 ack_nonmaskable_msi_irq(desc);
 427 ack_APIC_irq(); /* ACKTYPE_NONE */
 428 }


.. snip..
 857 if ( likely(desc->status & IRQ_GUEST) )

..
 886 __do_IRQ_guest(irq);


> 
> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 12:44:21PM +, Wu, Feng wrote:
> VT-d Posted-interrupt (PI) design for XEN
> 
> Background
> ==
> With the development of virtualization, there are more and more device
> assignment requirements. However, today when a VM is running with
> assigned devices (such as, NIC), external interrupt handling for the assigned
> devices always needs VMM intervention.
> 
> VT-d Posted-interrupt is a more enhanced method to handle interrupts
> in the virtualization environment. Interrupt posting is the process by
> which an interrupt request is recorded in a memory-resident
> posted-interrupt-descriptor structure by the root-complex, followed by
> an optional notification event issued to the CPU complex.
> 
> With VT-d Posted-interrupt we can get the following advantages:
> - Direct delivery of external interrupts to running vCPUs without VMM
> intervention


I hadn't digged deep in what Xen has currently - but I would assume that
this is exactly what we have now in Xen?

Hm, actually we seem to be still invoking the hypervisor on the
interrupts  -except that if we need to dispatch it to another CPU
using an normal vector to do so - which would still cause the
hypervisor to be invoked? Or does it actually go straight in the
guest?

So what kind of support do we currently have in Xen from posted
interrupt? Could you add a bit about this in the background please?

> - Decrease the interrupt migration complexity. On vCPU migration, software
> can atomically co-migrate all interrupts targeting the migrating vCPU. For
> virtual machines with assigned devices, migrating a vCPU across pCPUs
> either incur the overhead of forwarding interrupts in software (e.g. via VMM
> generated IPIS), or complexity to independently migrate each interrupt 
> targeting
> the vCPU to the new pCPU. However, after enabling VT-d PI, the destination 
> vCPU
> of an external interrupt from assigned devices is stored in the IRTE (i.e.
> Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
> we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, 
> this
> make the interrupt migration automatic.
> 
> 
> Posted-interrupt Introduction
> 
> There are two components to the Posted-interrupt architecture:
> Processor Support and Root-Complex Support
> 
> - Processor Support
> Posted-interrupt processing is a feature by which a processor processes
> the virtual interrupts by recording them as pending on the virtual-APIC
> page.
> 
> Posted-interrupt processing is enabled by setting the "process posted
> interrupts" VM-execution control. The processing is performed in response
> to the arrival of an interrupt with the posted-interrupt notification vector.
> In response to such an interrupt, the processor processes virtual interrupts
> recorded in a data structure called a posted-interrupt descriptor.
> 
> More information about APICv and CPU-side Posted-interrupt, please refer
> to Chapter 29, and Section 29.6 in the Intel SDM:
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> 
> - Root-Complex Support
> Interrupt posting is the process by which an interrupt request (from IOAPIC
> or MSI/MSIx capable sources) is recorded in a memory-resident
> posted-interrupt-descriptor structure by the root-complex, followed by
> an optional notification event issued to the CPU complex. The interrupt
> request arriving at the root-complex carry the identity of the interrupt
> request source and a 'remapping-index'. The remapping-index is used to
> look-up an entry from the memory-resident interrupt-remap-table. Unlike
> with interrupt-remapping, the interrupt-remap-table-entry for a posted-
> interrupt, specifies a virtual-vector and a pointer to the posted-interrupt
> descriptor. The virtual-vector specifies the vector of the interrupt to be
> recorded in the posted-interrupt descriptor. The posted-interrupt descriptor
> hosts storage for the virtual-vectors and contains the attributes of the
> notification event (interrupt) to be issued to the CPU complex to inform
> CPU/software about pending interrupts recorded in the posted-interrupt
> descriptor.
> 
> More information about VT-d PI, please refer to
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> Important Definitions
> ==
> There are some changes to IRTE and posted-interrupt descriptor after
> VT-d PI is introduced:

s/is/was/

> IRTE:
> Posted-interrupt Descriptor Address: the address of the posted-interrupt 
> descriptor
> Virtual Vector: the guest vector of the interrupt
> URG: indicates if the interrupt is urgent
> 
> Posted-interrupt descriptor:
> The Posted Interrupt Descriptor hosts the following fields:
> Posted Interrupt Request (PIR): Provide storage for posting (recording) 
> interrupts (one bit
> per vector, for up to 256 vectors).
> 
> Outstanding

Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread George Dunlap
On 03/18/2015 03:59 PM, Jan Beulich wrote:
 On 18.03.15 at 16:26,  wrote:
>> In both cases there's a slight risk in using system_state to determine
>> whether to rely on cpu_data[] or not, because there's actually a window
>> for each processor after system_state == SYS_STATE_smp_boot where
>> cpu_data[] is *not* initialized, but it's not obvious from looking at
>> the data itself.  If the callback mechanisms ever change order with the
>> cpu_data[] initializations in the future, we risk a situation where
>> credit2 silently regresses to using a single massive runqueue.
> 
> But such an ordering change is extremely unlikely, since the
> CPU_STARTING notification specifically tells you that the given
> CPU is now ready for normal use, which includes it having its
> topology related data set up.

I didn't mean so much that CPU_STARTING might change meaning, but that
someone looking only at schedule.c might think that it would make sense
to call alloc_pdata() somewhere else, before cpu_data[] had been
populated.  Even if they look at init_pcpu() in credit2, they're
unlikely to know that cpu_to_socket() isn't valid until after
boot_secondary() has happened; and if they try it and boot it,
everything will seem to work perfectly for credit2 -- except that there
will be a single global runqueue.

If you, Dario and I don't happen to catch it -- or don't happen to
remember the exact details of the constraints -- nobody may notice until
a year later.

This is the point of having ASSERTs -- so that we don't have to worry
about remembering and catching all these crazy constraints.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.

2015-03-18 Thread Jan Beulich
>>> On 18.03.15 at 14:58,  wrote:
> On Wed, Mar 18, 2015 at 07:38:12AM +, Jan Beulich wrote:
>> >>> On 17.03.15 at 18:15,  wrote:
>> > The latest one (please see attached) would cause an dead-lock iff
>> > on the CPU we are running the softirq and an do_IRQ comes for the
>> > exact dpci we are in process of executing.
>> 
>> I'm afraid I'm not seeing it - please explain.
> 
> Lets assume that the device is an PCIe with MSI. We have only one
> VCPU in the guest.
> 
> We receive the first interrupt, end up going:
> vmx_vmexit_handler
>  - case EXIT_REASON_EXTERNAL_INTERRUPT
>\- vmx_do_extint
>   \- do_IRQ
>   \- __do_IRQ_guest
>   \- hvm_do_IRQ_dpci
>  \- raise_softirq_for
>   [DPCI_SOFTIRQ bit set]
> vmx_process_softirq
>  sti
>  do_softirq
>-\ __do_sofitq_
> \- dpci_softirq
> -\ hvm_dirq_assist
>[state is 'running']
> 
> [Same vector comes in]

Is that indeed possible? Afaict nothing in the code sequence above
ack-ed the interrupt, and hence another one can't come in.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread Jan Beulich
>>> On 18.03.15 at 16:26,  wrote:
> In both cases there's a slight risk in using system_state to determine
> whether to rely on cpu_data[] or not, because there's actually a window
> for each processor after system_state == SYS_STATE_smp_boot where
> cpu_data[] is *not* initialized, but it's not obvious from looking at
> the data itself.  If the callback mechanisms ever change order with the
> cpu_data[] initializations in the future, we risk a situation where
> credit2 silently regresses to using a single massive runqueue.

But such an ordering change is extremely unlikely, since the
CPU_STARTING notification specifically tells you that the given
CPU is now ready for normal use, which includes it having its
topology related data set up.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/7] Some (not only) cpupool related fixes and improvements

2015-03-18 Thread Ian Campbell
On Fri, 2015-03-13 at 12:08 +0100, Dario Faggioli wrote:
> Dario Faggioli (7):
>   xl: turn some int local variable into bool
>   xl: add -c/--cpupool option to `xl list'
>   libxl: introduce libxl_cpupool_cpu{add,remove}_cpumap()
>   xl: enable using ranges of pCPUs when manipulating cpupools
>   xl: enable using ranges of pCPUs when creating cpupools
>   xl: make error reporting of cpupool subcommands consistent
>   xl: use libxl_cpupoolinfo_list_free() in main_cpupoolnumasplit

Applied with Wei's acks.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V4] Add flag to start info regarding virtual mapped p2m list

2015-03-18 Thread Juergen Gross
Xen pv domains are using a domain private p2m list to convert guest pfns
to mfns. This p2m list has to be updated by the Xen tools during domain
restore and migration, as the mfns will most likely change. In order to
locate the p2m list the Xen tools need an interface provided by the
guest. Up to now this interface has been the shared info page where the
guest would store the mfn of the top level page of a 3-level p2m tree.

This p2m tree is fixed in it's layout and due to the limitation of
entries it can hold at each level it is limiting the maximum size of the
p2m list which can be reported to the Xen tools. The maximum memory the
p2m tree can support for 64 bit domains is 512 GB (32 bit domains don't
have a problem, as the p2m tree limit is much higher than the supported
domain size of 64 GB).

In order to be able to support pv domains with more than 512 GB an
additional way to specify the p2m list for the Xen tools has been added:
instead of a tree structure linked via mfns, the virtual address of a
linear p2m list, the cr3 value of the related address space and the size
of the p2m list can be specified by the guest (added by commit
50bd1f0825339dfacde471df7664729216fc46e3).

Guests implementing this new interface need to know, of course, whether
the Xen tools are capable to use the new interface instead of the old
p2m tree interface. Otherwise a guest using only the new interface with
the virtual mapped linear p2m list on a machine with old Xen tools not
supporting this interface could not be restored or migrated.

The added flag in the start info indicates the Xen tool's capability to
use the new interface enabling the guest to omit the p2m tree and thus
to support more than 512 GB of RAM.

Signed-off-by: Juergen Gross 
---
 xen/include/public/xen.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 3703c39..66c09a3 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -773,10 +773,12 @@ typedef struct start_info start_info_t;
 #endif /* XEN_HAVE_PV_GUEST_ENTRY */
 
 /* These flags are passed in the 'flags' field of start_info_t. */
-#define SIF_PRIVILEGED(1<<0)  /* Is the domain privileged? */
-#define SIF_INITDOMAIN(1<<1)  /* Is this the initial control domain? */
-#define SIF_MULTIBOOT_MOD (1<<2)  /* Is mod_start a multiboot module? */
-#define SIF_MOD_START_PFN (1<<3)  /* Is mod_start a PFN? */
+#define SIF_PRIVILEGED  (1<<0)  /* Is the domain privileged? */
+#define SIF_INITDOMAIN  (1<<1)  /* Is this the initial control domain? */
+#define SIF_MULTIBOOT_MOD   (1<<2)  /* Is mod_start a multiboot module? */
+#define SIF_MOD_START_PFN   (1<<3)  /* Is mod_start a PFN? */
+#define SIF_VIRT_P2M_4TOOLS (1<<4)  /* Do Xen tools understand a virt. mapped 
*/
+/* P->M making the 3 level tree obsolete? 
*/
 #define SIF_PM_MASK   (0xFF<<8) /* reserve 1 byte for xen-pm options */
 
 /*
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code

2015-03-18 Thread George Dunlap
On 03/18/2015 08:53 AM, Dario Faggioli wrote:
>>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, 
>>> int cpu)
>>>  static void *
>>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>>>  {
>>> -/* Check to see if the cpu is online yet */
>>> -/* Note: cpu 0 doesn't get a STARTING callback */
>>> -if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
>>> +/*
>>> + * Actual initialization is deferred to when the pCPU will be
>>> + * online, via a STARTING callback. The only exception is
>>> + * the boot cpu, which does not get such a notification, and
>>> + * hence needs to be taken care of here.
>>> + */
>>> +if ( system_state == SYS_STATE_boot )
>>>  init_pcpu(ops, cpu);

I think this will break adding a cpu to a cpupool after boot, won't it?

Don't we want effectively, "if ( is_boot_cpu() ||
is_cpu_topology_set_up() )"?

is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
suppose?

Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?

>>
>> Same here, plus the new condition at the first glance isn't matching
>> the old one, but perhaps that's intentional.
>>
> It is intentional, and that is why we're changing it! :-) Let me try to
> explain:
> 
> The idea, in both old and new code, is to call init_pcpu() either:
>  a) on the boot cpu, during boot
>  b) on non-boot cpu, *only* if they are online.
> 
> The issue is that, for assessing b), it checks whether
> cpu_to_socket(cpu) returns >=0 and, if it does, it assumes the cpu is
> online. That is not true, as cpu_data.phys_proc_id (which is what
> cpu_to_socket() go reading) is 0 even before any onlining and topolog
> identification has happened (it's a global).
> 
> Therefore, all the pcpus are initialized right away, and all are
> assigned to runqueue 0, as returned by cpu_to_socket() at this stage,
> which is clearly wrong.
> 
> In fact, this is the reason why, previous versions of this took the
> approach of initializing things such as cpu_to_socket() returned -1
> before topology identification.
> 
> In the new version, as you suggested, I'm using system_state to figure
> out whether we are dealing with the boot cpu, and that's it. :-)
> 
> Hope this clarifies things... If yes, I'll make sure to put a similar
> explanation in the changelog, when sending the patch officially.

So let's step back for a minute, and let me see if I can describe the
situation.

Unlike the other schedulers, credit2 needs to know the topology of a
cpus when setting it up, so that we can place it in the proper runqueue.

Setting up the per-cpu structures would normally happen in alloc_pdata.
 This is called in three different ways:

* During boot, on the boot processer, alloc_pdata is called from
scheduler_init().

* During boot, on the secondary processors, alloc_pdata is called from
a CPU_UP_PREPARE callback, which happens just before the cpu is actually
brought up.

* When switching a cpu to a new cpupool after boot, alloc_pdata is also
called from cpu_schedule_switch().

The "normal" place the cpu topology information can be found is in
global the cpu_data[] array, typically accessed by the cpu_to_socket()
or cpu_to_core() macros.

This topology information is written to cpu_data[] by
smp_store_cpu_info().  For the boot cpu, it happens in
smp_prepare_cpus();  For secondary cpus, it's happens in
start_secondary(), which is the code run on the cpu itself as it's being
brought up.

Unfortunately, at the moment, both of these places are after the
respective places where the alloc pdata is called for those cpus.
Flattening the entire x86 setup at the moment you'd see:
  alloc_pdata for boot cpu
  smp_store_cpu_info for boot cpu
  for each secondary cpu
alloc_pdata for secondary cpu
smp_store_cpu_info for secondary cpu

scheduler_init() is called before smp_prepare_cpus() in part because of
a dependency chain elsewhere: we cannot set up the idle domain until
scheduler_init() is called; and other places further on in the
initialization but before setting up cpu topology information assume
that the idle domain has been set up.

I haven't actually tracked down this dependency yet; nor have I explored
the possibility of populating cpu_data[] for the boot cpu earier than
it's done now.

I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
than CPU_STARTING -- whether that's just what seemed most sensible when
cpupools were created, or whether there's a dependency somewhere between
those two points.

I *think* that there probably cannot be a dependency: no cpu boot code
can (or should) depend on the cpu having been initialized, because it
doesn't happen when you're booting credit2; and no scheduler code in
alloc_pdata can (or should) depend on being in a pre-initialized state,
because it also happens when assigning an already-initialized cpu to a
cpupool.

At the moment we deal with the fact that the topology information for a
CPU is not availabl

Re: [Xen-devel] [PATCHv1] xen/balloon: disable memory hotplug in PV guests

2015-03-18 Thread Daniel Kiper
On Wed, Mar 18, 2015 at 01:59:58PM +, David Vrabel wrote:
> On 18/03/15 13:57, Juergen Gross wrote:
> > On 03/18/2015 11:36 AM, David Vrabel wrote:
> >> On 16/03/15 10:31, Juergen Gross wrote:
> >>> On 03/16/2015 11:03 AM, Daniel Kiper wrote:
>  On Mon, Mar 16, 2015 at 06:35:04AM +0100, Juergen Gross wrote:
> > On 03/11/2015 04:40 PM, Boris Ostrovsky wrote:
> >> On 03/11/2015 10:42 AM, David Vrabel wrote:
> >>> On 10/03/15 13:35, Boris Ostrovsky wrote:
>  On 03/10/2015 07:40 AM, David Vrabel wrote:
> > On 09/03/15 14:10, David Vrabel wrote:
> >> Memory hotplug doesn't work with PV guests because:
> >>
> >>  a) The p2m cannot be expanded to cover the new sections.
> > Broken by 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to
> > linear virtual mapped sparse p2m list).
> >
> > This one would be non-trivial to fix.  We'd need a sparse set of
> > vm_area's for the p2m or similar.
> >
> >>  b) add_memory() builds page tables for the new sections
> >> which
> >> means
> >> the new pages must have valid p2m entries (or a BUG
> >> occurs).
> > After some more testing this appears to be broken by:
> >
> > 25b884a83d487fd62c3de7ac1ab5549979188482 (x86/xen: set regions
> > above
> > the
> > end of RAM as 1:1) included 3.16.
> >
> > This one can be trivially fixed by setting the new sections in
> > the p2m
> > to INVALID_P2M_ENTRY before calling add_memory().
>  Have you tried 3.17? As I said yesterday, it worked for me (with
>  4.4
>  Xen).
> >>> No.  But there are three bugs that prevent it from working in
> >>> 3.16+ so
> >>> I'm really not sure how you had a working in a 3.17 PV guest.
> >>
> >> This is what I have:
> >>
> >> [build@build-mk2 linux-boris]$ ssh root@tst008 cat
> >> /mnt/lab/bootstrap-x86_64/test_small.xm
> >> extra="console=hvc0 debug earlyprintk=xen "
> >> kernel="/mnt/lab/bootstrap-x86_64/vmlinuz"
> >> ramdisk="/mnt/lab/bootstrap-x86_64/initramfs.cpio.gz"
> >> memory=1024
> >> maxmem = 4096
> >> vcpus=1
> >> maxvcpus=3
> >> name="bootstrap-x86_64"
> >> on_crash="preserve"
> >> vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
> >> vnc=1
> >> vnclisten="0.0.0.0"
> >> disk=['phy:/dev/guests/bootstrap-x86_64,xvda,w']
> >> [build@build-mk2 linux-boris]$ ssh root@tst008 xl create
> >> /mnt/lab/bootstrap-x86_64/test_small.xm
> >> Parsing config from /mnt/lab/bootstrap-x86_64/test_small.xm
> >> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
> >> bootstrap-x86_64
> >> bootstrap-x86_64 2  1024 1
> >> -b   5.4
> >> [build@build-mk2 linux-boris]$ ssh root@g-pvops uname -r
> >> 3.17.0upstream
> >> [build@build-mk2 linux-boris]$ ssh root@g-pvops dmesg|grep
> >> paravirtualized
> >> [0.00] Booting paravirtualized kernel on Xen
> >> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
> >> /proc/meminfo
> >> MemTotal: 968036 kB
> >> [build@build-mk2 linux-boris]$ ssh root@tst008 xl mem-set
> >> bootstrap-x86_64 2048
> >> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
> >> bootstrap-x86_64
> >> bootstrap-x86_64 2  2048 1
> >> -b   5.7
> >> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
> >> /proc/meminfo
> >> MemTotal:2016612 kB
> >> [build@build-mk2 linux-boris]$
> >>
> >>
> >>>
> >>> Regardless, it definitely doesn't work now because of the linear p2m
> >>> change.  What do you want to do about this?
> >>
> >> Since backing out p2m changes is not an option I guess your patch is
> >> the
> >> only short-term alternative.
> >>
> >> But this still looks like a regression so perhaps Juergen can take a
> >> look to see how it can be fixed.
> >
> > Hmm, the p2m list is allocated for the maximum memory size of the
> > domain
> > which is obtained from the hypervisor. In case of Dom0 it is read via
> > XENMEM_maximum_reservation, for a domU it is based on the E820 memory
> > map read via XENMEM_memory_map.
> >
> > I just tested it with a 4.0-rc1 domU kernel with 512MB initial memory
> > and 4GB of maxmem. The E820 map looked like this:
> >
> > [0.00] Xen: [mem 0x-0x0009] usable
> > [0.00] Xen: [mem 0x000a-0x000f]
> > reserved
> > [0.00] Xen: [mem 0x0010-0x] usable
> >
> > So the complete 4GB were included, like they should. The resulting p2m
> > list is allocated in the needed size:
> >
> >

Re: [Xen-devel] OpenStack - Libvirt+Xen CI overview : anyone interested in a walk/through or presentation on how it works

2015-03-18 Thread Wei Liu
On Wed, Mar 18, 2015 at 01:12:50PM +, Lars Kurth wrote:
> Hi,
> 
> I added all the people who raised their hands so far to the TO list. As far 
> as I can tell we have people from the following timezones: GMT, GMT+1, MTZ, 
> ETZ - if I got this wrong, please let me know your timezone. Depending on 
> when we have the presentation, we will need to consider daylight savings 
> offsets.
> 
> I will give it another few days for more people to raise their hands. We can 
> then try and figure out a slot which fits everyone.
> 

I would be interested in such meeting. My time zone is the same as
George's and Anthony's.

Wei.

> Best Regards
> Lars
> 
> Begin forwarded message:
> 
> From: Bob Ball 
> To: "xen-devel@lists.xen.org" 
> Date: 10 March 2015 12:03:13 GMT
> Cc: Anthony Perard 
> Subject: [Xen-devel] OpenStack - Libvirt+Xen CI overview
> 
> For the last few weeks Anthony and I have been working on creating a CI 
> environment to run against all OpenStack jobs.  We're now in a position where 
> we can share the current status, overview of how it works and next steps.  We 
> actively want to support involvement in this effort from others with an 
> interest in libvirt+Xen's openstack integration.
> 
> The CI we have set up is follow the recommendations made by the OpenStack 
> official infrastructure maintainers, and reproduces a notable portion of the 
> official OpenStack CI environment to run these tests.  Namely this setup is 
> using:
> - Puppet to deploy the master node
> - Zuul to watch for code changes uploaded to review.openstack.org
> - Jenkins job builder to create Jenkins job definitions from a YAML file
> - Nodepool to automatically create single-use virtual machines in the 
> Rackspace public cloud 
> - Devstack-gate to run Tempest tests in serial
> 
> More information on Zuul, JJB, Nodepool and devstack-gate is available 
> through http://ci.openstack.org
> 
> The current status is that we have a zuul instance monitoring for jobs and 
> adding them to the queue of jobs to be run at 
> http://zuul.openstack.xenproject.org/
> 
> In the background Nodepool provisions virtual machines into a pool of nodes 
> ready to be used.  All ready nodes are automatically added to Jenkins 
> (https://jenkins.openstack.xenproject.org/), and then Zuul+Jenkins will 
> trigger a particular job on a node when one is available.
> 
> Logs are then uploaded to Rackspace's Cloud Files with sample logs for a 
> passing job at 
> http://logs.openstack.xenproject.org/52/162352/3/silent/dsvm-tempest-xen/da3ff30/index.html
> 
> I'd like to organise a meeting to walk through the various components of the 
> CI with those who are interested, so this is an initial call to find out who 
> is interested in finding out more!
> 
> Thanks,
> 
> Bob
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/5] libxl: vcpuset: Return error values.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 02:09:09PM +, Dario Faggioli wrote:
> On Wed, 2015-03-18 at 13:08 +, Ian Campbell wrote:
> > On Wed, 2015-03-18 at 13:06 +, Ian Campbell wrote:
> > > On Fri, 2015-03-13 at 16:26 -0400, Konrad Rzeszutek Wilk wrote:
> > > > The function does not return any values at all. Convert the
> > > > internal libxl ones (ERROR_FAIL, ..., etc) to positive values
> > > > and for the other cases just return standard libxl values.
> > > 
> > > It's not clear why you want to do this, in particular returning
> > > -ERROR_INVAL and inverting libxl error codes seems like a very strange
> > > thing to be doing.
> > 
> > BTW I know the xl error handling is horribly confused, and there are
> > even a small number of instances of -ERROR_* already, but I think those
> > are wrong and we shouldn't introduce more.
> > 
> Indeed. I did some xl error code refactoring for a series of mine a few
> days back, and as far as I could see, the most common pattern in xl is
> returning 0 or 1.

Gah, I seem to have looked at the wrong examples and thought that was
the proper way!
> 
> FWIW, I think we should not diverge any further from that and, at some
> point, convert 0/1 to EXIT_SUCCESS/EXIT_FAILURE.
> 
> > > I think you should either use ERROR_INVAL (not inverted) and propagate
> > > libxl rc's directly or convert them into something which suits xl, i.e.
> > > 0 and 1.
> > > 
> Again, +1 for 0 or 1.
> 
> Regards,
> Dario



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 22/24] tools/libxl: arm: Use an higher value for the GIC phandle

2015-03-18 Thread Julien Grall

Hi Ian,

On 24/02/2015 10:08, Ian Campbell wrote:

On Mon, 2015-02-23 at 22:02 +, Julien Grall wrote:


On 23/02/2015 14:36, Ian Campbell wrote:

On Thu, 2015-01-29 at 13:48 +, Julien Grall wrote:

On 29/01/15 12:28, Stefano Stabellini wrote:

On Thu, 29 Jan 2015, Julien Grall wrote:

On 29/01/15 11:07, Stefano Stabellini wrote:

On Tue, 13 Jan 2015, Julien Grall wrote:

The partial device tree may contains phandle. The Device Tree Compiler
tends to allocate the phandle from 1.

Reserve the ID 65000 for the GIC phandle. I think we can safely assume
that the partial device tree will never contain a such ID.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 



Shouldn't we at least check that the partial device tree doesn't contain
a conflicting phandle?


I don't think so. This will unlikely happen, and if it happens the guest
will crash with an obvious error.


It is good that the error is obvious.

But how expensive is to check for it?


I would have to check the validity of the properties (name + value
size). At least the properties "linux,phandle" and "phandle" should be
checked.

Though I could do in copy_properties but I find it hackish.


Can't you just track the largest phandle ever seen during
copy_properties and then use N+1 for the GIC?


Now the we decided to trust the input device tree, it would be easier to
write the code.

I will give a look.




Think about the poor user that ends up in this situation: the fact that
is unlikely only makes it harder for a user to figure out what to do to
fix it.


The poor use will have to write his device tree by hand to hit this
error ;).


Or use a new version of dtc which does things differently for some
reason.


And you would not be able to get a phandle for the GIC if largest
phandle is too high.

So the guest won't work correctly.


Indeed, filling in a bitmap as you go might be another option, although
you'd either need an 8k bitmap (not so bad in userspace) or to realloc
as it grows.


I though about different implementation for tracking the phandle. All 
require to parse the partial device tree twice: one for tracking the 
phandle and the second for copy the nodes.


This is because we need to have the GIC phandle at the time we create 
the root node (properties has to be created before the child nodes).


So I plan to keep this patch as it is and documenting the value of the 
GIC. Would it be fine for you?


Though I could put an higher value too.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/5] libxl: vcpuset: Return error values.

2015-03-18 Thread Dario Faggioli
On Wed, 2015-03-18 at 13:08 +, Ian Campbell wrote:
> On Wed, 2015-03-18 at 13:06 +, Ian Campbell wrote:
> > On Fri, 2015-03-13 at 16:26 -0400, Konrad Rzeszutek Wilk wrote:
> > > The function does not return any values at all. Convert the
> > > internal libxl ones (ERROR_FAIL, ..., etc) to positive values
> > > and for the other cases just return standard libxl values.
> > 
> > It's not clear why you want to do this, in particular returning
> > -ERROR_INVAL and inverting libxl error codes seems like a very strange
> > thing to be doing.
> 
> BTW I know the xl error handling is horribly confused, and there are
> even a small number of instances of -ERROR_* already, but I think those
> are wrong and we shouldn't introduce more.
> 
Indeed. I did some xl error code refactoring for a series of mine a few
days back, and as far as I could see, the most common pattern in xl is
returning 0 or 1.

FWIW, I think we should not diverge any further from that and, at some
point, convert 0/1 to EXIT_SUCCESS/EXIT_FAILURE.

> > I think you should either use ERROR_INVAL (not inverted) and propagate
> > libxl rc's directly or convert them into something which suits xl, i.e.
> > 0 and 1.
> > 
Again, +1 for 0 or 1.

Regards,
Dario


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 07:41:55AM +, Jan Beulich wrote:
> >>> On 17.03.15 at 18:44,  wrote:
> > As you can see to preserve the existing functionality such as
> > being able to schedule N amount of interrupt injections
> > for the N interrupts we might get - I modified '->masked'
> > to be an atomic counter.
> 
> Why would that be? When an earlier interrupt wasn't fully handled,
> real hardware wouldn't latch more than one further instance either.

We acknowledge the interrupt in the hypervisor - as in we call
->ack on the handler (which for MSI is an nop anyhow).

If the device is misconfigured and keeps on sending burst of
interrupts every 10 msec for 1msec we can dead-lock.

Either way we should tell the guest about those interrupts.
> 
> > The end result is that we can still live-lock. Unless we:
> >  - Drop on the floor the injection of N interrupts and
> >just deliever at max one per VMX_EXIT (and not bother
> >with interrupts arriving when we are in the VMX handler).
> 
> I'm afraid I again don't see the point here.

I am basing all of this on the assumption that we have
many interrupts for the same device coming it - and we have
not been able to tell the guest about it (the guest could
be descheduled, too slow, etc) so that it can do what it
needs to silence the device.

It might be very well an busted device - and if that is
the case be that - but we must not dead-lock due to it.
> 
> >  - Alter the softirq code slightly - to have an variant
> >which will only iterate once over the pending softirq
> >bits per call. (so save an copy of the bitmap on the
> >stack when entering the softirq handler - and use that.
> >We could also xor it against the current to catch any
> >non-duplicate bits being set that we should deal with).
> 
> That's clearly not an option: The solution should be isolated
> to DPCI code, i.e. without altering existing behavior in other
> (more generic) components.
> 
> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1] xen/balloon: disable memory hotplug in PV guests

2015-03-18 Thread David Vrabel
On 18/03/15 13:57, Juergen Gross wrote:
> On 03/18/2015 11:36 AM, David Vrabel wrote:
>> On 16/03/15 10:31, Juergen Gross wrote:
>>> On 03/16/2015 11:03 AM, Daniel Kiper wrote:
 On Mon, Mar 16, 2015 at 06:35:04AM +0100, Juergen Gross wrote:
> On 03/11/2015 04:40 PM, Boris Ostrovsky wrote:
>> On 03/11/2015 10:42 AM, David Vrabel wrote:
>>> On 10/03/15 13:35, Boris Ostrovsky wrote:
 On 03/10/2015 07:40 AM, David Vrabel wrote:
> On 09/03/15 14:10, David Vrabel wrote:
>> Memory hotplug doesn't work with PV guests because:
>>
>>  a) The p2m cannot be expanded to cover the new sections.
> Broken by 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to
> linear virtual mapped sparse p2m list).
>
> This one would be non-trivial to fix.  We'd need a sparse set of
> vm_area's for the p2m or similar.
>
>>  b) add_memory() builds page tables for the new sections
>> which
>> means
>> the new pages must have valid p2m entries (or a BUG
>> occurs).
> After some more testing this appears to be broken by:
>
> 25b884a83d487fd62c3de7ac1ab5549979188482 (x86/xen: set regions
> above
> the
> end of RAM as 1:1) included 3.16.
>
> This one can be trivially fixed by setting the new sections in
> the p2m
> to INVALID_P2M_ENTRY before calling add_memory().
 Have you tried 3.17? As I said yesterday, it worked for me (with
 4.4
 Xen).
>>> No.  But there are three bugs that prevent it from working in
>>> 3.16+ so
>>> I'm really not sure how you had a working in a 3.17 PV guest.
>>
>> This is what I have:
>>
>> [build@build-mk2 linux-boris]$ ssh root@tst008 cat
>> /mnt/lab/bootstrap-x86_64/test_small.xm
>> extra="console=hvc0 debug earlyprintk=xen "
>> kernel="/mnt/lab/bootstrap-x86_64/vmlinuz"
>> ramdisk="/mnt/lab/bootstrap-x86_64/initramfs.cpio.gz"
>> memory=1024
>> maxmem = 4096
>> vcpus=1
>> maxvcpus=3
>> name="bootstrap-x86_64"
>> on_crash="preserve"
>> vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
>> vnc=1
>> vnclisten="0.0.0.0"
>> disk=['phy:/dev/guests/bootstrap-x86_64,xvda,w']
>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl create
>> /mnt/lab/bootstrap-x86_64/test_small.xm
>> Parsing config from /mnt/lab/bootstrap-x86_64/test_small.xm
>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
>> bootstrap-x86_64
>> bootstrap-x86_64 2  1024 1
>> -b   5.4
>> [build@build-mk2 linux-boris]$ ssh root@g-pvops uname -r
>> 3.17.0upstream
>> [build@build-mk2 linux-boris]$ ssh root@g-pvops dmesg|grep
>> paravirtualized
>> [0.00] Booting paravirtualized kernel on Xen
>> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
>> /proc/meminfo
>> MemTotal: 968036 kB
>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl mem-set
>> bootstrap-x86_64 2048
>> [build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
>> bootstrap-x86_64
>> bootstrap-x86_64 2  2048 1
>> -b   5.7
>> [build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
>> /proc/meminfo
>> MemTotal:2016612 kB
>> [build@build-mk2 linux-boris]$
>>
>>
>>>
>>> Regardless, it definitely doesn't work now because of the linear p2m
>>> change.  What do you want to do about this?
>>
>> Since backing out p2m changes is not an option I guess your patch is
>> the
>> only short-term alternative.
>>
>> But this still looks like a regression so perhaps Juergen can take a
>> look to see how it can be fixed.
>
> Hmm, the p2m list is allocated for the maximum memory size of the
> domain
> which is obtained from the hypervisor. In case of Dom0 it is read via
> XENMEM_maximum_reservation, for a domU it is based on the E820 memory
> map read via XENMEM_memory_map.
>
> I just tested it with a 4.0-rc1 domU kernel with 512MB initial memory
> and 4GB of maxmem. The E820 map looked like this:
>
> [0.00] Xen: [mem 0x-0x0009] usable
> [0.00] Xen: [mem 0x000a-0x000f]
> reserved
> [0.00] Xen: [mem 0x0010-0x] usable
>
> So the complete 4GB were included, like they should. The resulting p2m
> list is allocated in the needed size:
>
> [0.00] p2m virtual area at c900, size is 80
>
> So what is your problem here? Can you post the E820 map and the p2m
> map
> info for your failing domain, please?

 If you use memory hotplug then maxmem is not a limit from guest kernel
 p

Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 07:38:12AM +, Jan Beulich wrote:
> >>> On 17.03.15 at 18:15,  wrote:
> > On Tue, Mar 17, 2015 at 04:06:14PM +, Jan Beulich wrote:
> >> >>> On 17.03.15 at 16:38,  wrote:
> >> > --- a/xen/drivers/passthrough/io.c
> >> > +++ b/xen/drivers/passthrough/io.c
> >> > @@ -804,7 +804,17 @@ static void dpci_softirq(void)
> >> >  d = pirq_dpci->dom;
> >> >  smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> >> >  if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >> > -BUG();
> >> > +{
> >> > +unsigned long flags;
> >> > +
> >> > +/* Put back on the list and retry. */
> >> > +local_irq_save(flags);
> >> > +list_add_tail(&pirq_dpci->softirq_list, 
> >> > &this_cpu(dpci_list));
> >> > +local_irq_restore(flags);
> >> > +
> >> > +raise_softirq(HVM_DPCI_SOFTIRQ);
> >> > +continue;
> >> > +}
> >> 
> >> As just said in another mail - unless there are convincing new
> >> arguments in favor of this (more of a hack than a real fix), I'm
> >> not going to accept it and instead consider reverting the
> >> offending commit. Iirc the latest we had come to looked quite a
> >> bit better than this one.
> > 
> > The latest one (please see attached) would cause an dead-lock iff
> > on the CPU we are running the softirq and an do_IRQ comes for the
> > exact dpci we are in process of executing.
> 
> I'm afraid I'm not seeing it - please explain.

Lets assume that the device is an PCIe with MSI. We have only one
VCPU in the guest.

We receive the first interrupt, end up going:
vmx_vmexit_handler
 - case EXIT_REASON_EXTERNAL_INTERRUPT
   \- vmx_do_extint
\- do_IRQ
  \- __do_IRQ_guest
  \- hvm_do_IRQ_dpci
 \- raise_softirq_for
[DPCI_SOFTIRQ bit set]
vmx_process_softirq
 sti
 do_softirq
   -\ __do_sofitq_
\- dpci_softirq
-\ hvm_dirq_assist
   [state is 'running']

[Same vector comes in]
do_IRQ
 \- __do_IRQ_guest
\- hvm_do_IRQ_dpci
   \- raise_softirq_for
   [here we either
a) spin waiting 'running' to be done or -- dead-lock
b) we just exit out and drop this interrupt, 
c) increment 'masked' to tell 'dpci_softirq' to reschedule
   -- live lock if this keeps on going]

Now c) is protected - the do_IRQ has anti-storm code so eventually
it will stop.
> 
> As to the code - I think switch() is rather hiding the intentions
> here, i.e. the code would be better readable if using two if()s:
> 
> +for ( ;; )
> +{
> +old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
> +/* If the 'state' is 0 (not in use) we can schedule it. */
> +if ( !old )
> +break;
> +if ( !(old & (1 << STATE_RUN)) )
> +{
> +/* Whenever STATE_SCHED is set we MUST not schedule it. */
> +assert(old & (1 << STATE_SCHED));
> +return;
> +}
> +}
> 
> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1] xen/balloon: disable memory hotplug in PV guests

2015-03-18 Thread Juergen Gross

On 03/18/2015 11:36 AM, David Vrabel wrote:

On 16/03/15 10:31, Juergen Gross wrote:

On 03/16/2015 11:03 AM, Daniel Kiper wrote:

On Mon, Mar 16, 2015 at 06:35:04AM +0100, Juergen Gross wrote:

On 03/11/2015 04:40 PM, Boris Ostrovsky wrote:

On 03/11/2015 10:42 AM, David Vrabel wrote:

On 10/03/15 13:35, Boris Ostrovsky wrote:

On 03/10/2015 07:40 AM, David Vrabel wrote:

On 09/03/15 14:10, David Vrabel wrote:

Memory hotplug doesn't work with PV guests because:

 a) The p2m cannot be expanded to cover the new sections.

Broken by 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to
linear virtual mapped sparse p2m list).

This one would be non-trivial to fix.  We'd need a sparse set of
vm_area's for the p2m or similar.


 b) add_memory() builds page tables for the new sections which
means
the new pages must have valid p2m entries (or a BUG occurs).

After some more testing this appears to be broken by:

25b884a83d487fd62c3de7ac1ab5549979188482 (x86/xen: set regions above
the
end of RAM as 1:1) included 3.16.

This one can be trivially fixed by setting the new sections in
the p2m
to INVALID_P2M_ENTRY before calling add_memory().

Have you tried 3.17? As I said yesterday, it worked for me (with 4.4
Xen).

No.  But there are three bugs that prevent it from working in 3.16+ so
I'm really not sure how you had a working in a 3.17 PV guest.


This is what I have:

[build@build-mk2 linux-boris]$ ssh root@tst008 cat
/mnt/lab/bootstrap-x86_64/test_small.xm
extra="console=hvc0 debug earlyprintk=xen "
kernel="/mnt/lab/bootstrap-x86_64/vmlinuz"
ramdisk="/mnt/lab/bootstrap-x86_64/initramfs.cpio.gz"
memory=1024
maxmem = 4096
vcpus=1
maxvcpus=3
name="bootstrap-x86_64"
on_crash="preserve"
vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
vnc=1
vnclisten="0.0.0.0"
disk=['phy:/dev/guests/bootstrap-x86_64,xvda,w']
[build@build-mk2 linux-boris]$ ssh root@tst008 xl create
/mnt/lab/bootstrap-x86_64/test_small.xm
Parsing config from /mnt/lab/bootstrap-x86_64/test_small.xm
[build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
bootstrap-x86_64
bootstrap-x86_64 2  1024 1
-b   5.4
[build@build-mk2 linux-boris]$ ssh root@g-pvops uname -r
3.17.0upstream
[build@build-mk2 linux-boris]$ ssh root@g-pvops dmesg|grep
paravirtualized
[0.00] Booting paravirtualized kernel on Xen
[build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
/proc/meminfo
MemTotal: 968036 kB
[build@build-mk2 linux-boris]$ ssh root@tst008 xl mem-set
bootstrap-x86_64 2048
[build@build-mk2 linux-boris]$ ssh root@tst008 xl list |grep
bootstrap-x86_64
bootstrap-x86_64 2  2048 1
-b   5.7
[build@build-mk2 linux-boris]$ ssh root@g-pvops grep MemTotal
/proc/meminfo
MemTotal:2016612 kB
[build@build-mk2 linux-boris]$




Regardless, it definitely doesn't work now because of the linear p2m
change.  What do you want to do about this?


Since backing out p2m changes is not an option I guess your patch is
the
only short-term alternative.

But this still looks like a regression so perhaps Juergen can take a
look to see how it can be fixed.


Hmm, the p2m list is allocated for the maximum memory size of the domain
which is obtained from the hypervisor. In case of Dom0 it is read via
XENMEM_maximum_reservation, for a domU it is based on the E820 memory
map read via XENMEM_memory_map.

I just tested it with a 4.0-rc1 domU kernel with 512MB initial memory
and 4GB of maxmem. The E820 map looked like this:

[0.00] Xen: [mem 0x-0x0009] usable
[0.00] Xen: [mem 0x000a-0x000f] reserved
[0.00] Xen: [mem 0x0010-0x] usable

So the complete 4GB were included, like they should. The resulting p2m
list is allocated in the needed size:

[0.00] p2m virtual area at c900, size is 80

So what is your problem here? Can you post the E820 map and the p2m map
info for your failing domain, please?


If you use memory hotplug then maxmem is not a limit from guest kernel
point of view (host still must allow that operation but it is another
not related issue). The problem is that p2m must be dynamically
expendable
to support it. Earlier implementation supported that thing and memory
hotplug worked without any issue.


Okay, now I get it.

The problem with the earlier p2m implementation was that it was
expendable to support only up to 512GB of RAM. So we need some way to
tell the kernel how much virtual memory it should reserve for the p2m
list if memory hotplug is enabled. We could:

a) use a configurable maximum (e.g. for 512GB RAM as today)


I would set the p2m virtual area to cover up to 512 GB (needs 1 GB of
virt space) for a 64-bit guest and up to 64 GB (needs 64 MB of virt
space) for a 32-bit guest.


Are 64 GB for 32 bit guests a sensible default? This will need more than
10% of the available virtual kernel space (taking fixmap etc. into
accou

Re: [Xen-devel] [PATCH v2 2/5] libxl: vcpuset: Return error values.

2015-03-18 Thread Konrad Rzeszutek Wilk
On Wed, Mar 18, 2015 at 01:06:18PM +, Ian Campbell wrote:
> On Fri, 2015-03-13 at 16:26 -0400, Konrad Rzeszutek Wilk wrote:
> > The function does not return any values at all. Convert the
> > internal libxl ones (ERROR_FAIL, ..., etc) to positive values
> > and for the other cases just return standard libxl values.
> 
> It's not clear why you want to do this, in particular returning
> -ERROR_INVAL and inverting libxl error codes seems like a very strange
> thing to be doing.

The ERROR_INVAL are negative values. Inverting them makes them
positive - which is what the rest of xl does for error vales?
> 
> I think you should either use ERROR_INVAL (not inverted) and propagate
> libxl rc's directly or convert them into something which suits xl, i.e.
> 0 and 1.

Oh, there is a lot of other code (xl <>) which return -ERROR_XYZ so that
the error values are positive.

How should 'xl' return errors? Just as 1 for failure or actually
use an -ERROR_XYZ so that folks can map them to -ERROR_XYZ?

> 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  tools/libxl/xl_cmdimpl.c | 23 +--
> >  1 file changed, 13 insertions(+), 10 deletions(-)
> > 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 2d7145f..454a895 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -5013,17 +5013,18 @@ int main_vcpupin(int argc, char **argv)
> >  return rc;
> >  }
> >  
> > -static void vcpuset(uint32_t domid, const char* nr_vcpus, int check_host)
> > +static int vcpuset(uint32_t domid, const char* nr_vcpus, int check_host)
> >  {
> >  char *endptr;
> >  unsigned int max_vcpus, i;
> >  libxl_bitmap cpumap;
> > +int rc;
> >  
> >  libxl_bitmap_init(&cpumap);
> >  max_vcpus = strtoul(nr_vcpus, &endptr, 10);
> >  if (nr_vcpus == endptr) {
> >  fprintf(stderr, "Error: Invalid argument.\n");
> > -return;
> > +return -ERROR_INVAL;
> >  }
> >  
> >  /*
> > @@ -5036,22 +5037,25 @@ static void vcpuset(uint32_t domid, const char* 
> > nr_vcpus, int check_host)
> >  fprintf(stderr, "You are overcommmitting! You have %d physical 
> > " \
> >  " CPUs and want %d vCPUs! Aborting, use --ignore-host 
> > to " \
> >  " continue\n", host_cpu, max_vcpus);
> > -return;
> > +return -ERROR_INVAL;
> >  }
> >  /* NB: This also limits how many are set in the bitmap */
> >  max_vcpus = (max_vcpus > host_cpu ? host_cpu : max_vcpus);
> >  }
> > -if (libxl_cpu_bitmap_alloc(ctx, &cpumap, max_vcpus)) {
> > -fprintf(stderr, "libxl_cpu_bitmap_alloc failed\n");
> > -return;
> > +rc = libxl_cpu_bitmap_alloc(ctx, &cpumap, max_vcpus);
> > +if (rc) {
> > +fprintf(stderr, "libxl_cpu_bitmap_alloc failed, rc: %d\n", rc);
> > +return -rc;
> >  }
> >  for (i = 0; i < max_vcpus; i++)
> >  libxl_bitmap_set(&cpumap, i);
> >  
> > -if (libxl_set_vcpuonline(ctx, domid, &cpumap) < 0)
> > -fprintf(stderr, "libxl_set_vcpuonline failed domid=%d 
> > max_vcpus=%d\n", domid, max_vcpus);
> > +rc = libxl_set_vcpuonline(ctx, domid, &cpumap);
> > +if (rc)
> > +fprintf(stderr, "libxl_set_vcpuonline failed domid=%d 
> > max_vcpus=%d, rc: %d\n", domid, max_vcpus, rc);
> >  
> >  libxl_bitmap_dispose(&cpumap);
> > +return rc ? -rc : 0;
> >  }
> >  
> >  int main_vcpuset(int argc, char **argv)
> > @@ -5070,8 +5074,7 @@ int main_vcpuset(int argc, char **argv)
> >  break;
> >  }
> >  
> > -vcpuset(find_domain(argv[optind]), argv[optind + 1], check_host);
> > -return 0;
> > +return vcpuset(find_domain(argv[optind]), argv[optind + 1], 
> > check_host);
> >  }
> >  
> >  static void output_xeninfo(void)
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/5] libxl: wait for stubdom to be ready

2015-03-18 Thread Ian Campbell
On Fri, 2015-03-13 at 10:34 +, Wei Liu wrote:
> Watch /local/domain/$dm_domid/device-model/$domid/state, wait until
> state turns "running" then unpause guest.
> 
> LIBXL_STUBDOM_START_TIMEOUT is the timeout used wait for stubdom to be
> ready. My test on a very old machine (Core 2 6400) showed that it might
> need more than 20s before the stubdom is ready to serve DomU.
> 
> Signed-off-by: Wei Liu 
> ---
>  tools/libxl/libxl_dm.c   | 39 ++-
>  tools/libxl/libxl_internal.h |  2 ++
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 4a38455..ad2ef41 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -984,6 +984,8 @@ static void stubdom_pvqemu_cb(libxl__egc *egc,
>  static void spawn_stubdom_pvqemu_destroy_cb(libxl__egc *egc,
>  libxl__destroy_domid_state *dis,
>  int rc);
> +static void stubdom_xswait_cb(libxl__egc *egc, libxl__xswait_state *xswait,
> +  int rc, const char *p);
>  
>  char *libxl__stub_dm_name(libxl__gc *gc, const char *guest_name)
>  {
> @@ -1273,16 +1275,51 @@ static void stubdom_pvqemu_cb(libxl__egc *egc,
>  rc = libxl_domain_unpause(CTX, dm_domid);
>  if (rc) goto out;
>  
> +libxl__xswait_init(&sdss->xswait);
> +sdss->xswait.ao = ao;
> +sdss->xswait.what = GCSPRINTF("Stubdom %d startup", dm_domid);

Maybe include the domid too, e.g. "Stubdom %d for dom %d startup"?

(And probably %u throughout?)

> +static void stubdom_xswait_cb(libxl__egc *egc, libxl__xswait_state *xswait,
> +  int rc, const char *p)
> +{
> +EGC_GC;
> +libxl__stub_dm_spawn_state *sdss = CONTAINER_OF(xswait, *sdss, xswait);
> +uint32_t dm_domid = sdss->pvqemu.guest_domid;
> +
> +if (rc) {
> +if (rc == ERROR_TIMEDOUT)
> +LOG(ERROR, "%s: startup timed out", xswait->what);
> +goto out;
> +}
> +
> +if (!p) return;
> +
> +if (strcmp(p, "running"))
> +return;
>   out:
>  if (rc) {
>  if (dm_domid) {
> -sdss->dis.ao = ao;
> +sdss->dis.ao = sdss->dm.spawn.ao;
>  sdss->dis.domid = dm_domid;
>  sdss->dis.callback = spawn_stubdom_pvqemu_destroy_cb;
>  libxl__destroy_domid(egc, &sdss->dis);
> +libxl__xswait_stop(gc, xswait);

Can you do this once befire the if (rc)?

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/5] libxl: use new QEMU xenstore protocol

2015-03-18 Thread Ian Campbell
On Fri, 2015-03-13 at 10:34 +, Wei Liu wrote:
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 3dedad4..4a38455 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -998,7 +998,7 @@ void libxl__spawn_stub_dm(libxl__egc *egc, 
> libxl__stub_dm_spawn_state *sdss)
>  libxl_device_vfb *vfb;
>  libxl_device_vkb *vkb;
>  char **args;
> -struct xs_permissions perm[2];
> +struct xs_permissions perm[1];
>  xs_transaction_t t;
>  
>  /* convenience aliases */
> @@ -1106,16 +1106,16 @@ void libxl__spawn_stub_dm(libxl__egc *egc, 
> libxl__stub_dm_spawn_state *sdss)
>  }
>  xs_set_target(ctx->xsh, dm_domid, guest_domid);
>  
> -perm[0].id = dm_domid;
> -perm[0].perms = XS_PERM_NONE;
> -perm[1].id = guest_domid;
> -perm[1].perms = XS_PERM_READ;
> +perm[0].id = guest_domid;
> +perm[0].perms = XS_PERM_READ;

This will make the node owned by the guest (and hence r/w to the guest)
and set the perms for everyone else to r/o. I don't think this is what
you meant?

(The way perms are specified is a bit confusing, check out
http://wiki.xen.org/wiki/XenBus#Permissions )


> -return GCSPRINTF("/local/domain/0/device-model/%d/physmap/%s/%s",
> -domid, phys_offset, node);
> +return GCSPRINTF("/local/domain/%d/device-model/%d/physmap/%s/%s",
> + dm_domid, domid, phys_offset, node);

This sort of thing might imply that the helper takes the tail of the
path?
> SPRINTF(
> -"/local/domain/0/device-model/%d/physmap", domid), &num);
> +"/local/domain/%d/device-model/%d/physmap",

I only just noticed, but I think you want %u not %d (since dm_domid is
unsigned), although given the existing domid one is wrong too maybe you
don't want to bother...

> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index f3ae132..4aeb2bb 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -850,11 +850,14 @@ static int qemu_pci_add_xenstore(libxl__gc *gc, 
> uint32_t domid,
>  int rc = 0;
>  char *path;
>  char *state, *vdevfn;
> +uint32_t dm_domid;
>  
> -path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", 
> domid);
> +dm_domid = libxl_get_stubdom_id(CTX, domid);
> +path = libxl__sprintf(gc, "/local/domain/%d/device-model/%d/state",
> +  dm_domid, domid);

Maybe (up to you) switch thing to GCSPRINTF as you touch them?

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/5] libxl: use LIBXL_TOOLSTACK_DOMID

2015-03-18 Thread Ian Campbell
On Fri, 2015-03-13 at 10:34 +, Wei Liu wrote:
> The function in question is libxl__spawn_local_dm. We should use
> LIBXL_TOOLSTACK_DOMID when constructing xenstore path.
> 
> Currently LIBXL_TOOLSTACK_DOMID is 0, so this patch introduces no
> functional change.

As I mentioned on the previous patch, path should come from a helper
rather than repeated everywhere.

> 
> Signed-off-by: Wei Liu 
> ---
>  tools/libxl/libxl_dm.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 0fd5ffa..3dedad4 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -1375,7 +1375,8 @@ void libxl__spawn_local_dm(libxl__egc *egc, 
> libxl__dm_spawn_state *dmss)
>  free(path);
>  }
>  
> -path = libxl__sprintf(gc, "/local/domain/0/device-model/%d", domid);
> +path = libxl__sprintf(gc, "/local/domain/%d/device-model/%d",
> +  LIBXL_TOOLSTACK_DOMID, domid);
>  xs_mkdir(ctx->xsh, XBT_NULL, path);
>  
>  if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
> @@ -1424,7 +1425,8 @@ retry_transaction:
>  LIBXL__LOG(CTX, XTL_DEBUG, "  %s", *arg);
>  
>  spawn->what = GCSPRINTF("domain %d device model", domid);
> -spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", 
> domid);
> +spawn->xspath = GCSPRINTF("/local/domain/%d/device-model/%d/state",
> +  LIBXL_TOOLSTACK_DOMID, domid);
>  spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
>  spawn->pidpath = GCSPRINTF("%s/image/device-model-pid", dom_path);
>  spawn->midproc_cb = libxl__spawn_record_pid;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 4/5] libxl: vcpuset: Remove useless limit on max_vcpus.

2015-03-18 Thread Ian Campbell
On Fri, 2015-03-13 at 16:26 -0400, Konrad Rzeszutek Wilk wrote:
> The check is superflous. If the 'max_vcpus' (argument
> value) is greater than  pCPU and --ignore-host has not
> been supplied we would print an warning and return
> and not call this code.
> 
> If the --ignore-host parameter had been used we would
> never end up in this condition and enforce 'max_vcpus'.
> 
> The only time it would be invoked is if max_vcpus < host_cpu
> in which case it would set max_vcpus to max_vcpus.
> 
> In short - it is dead code.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/5] libxl: remove DM path in libxl__device_model_destroy

2015-03-18 Thread Ian Campbell
On Wed, 2015-03-18 at 13:21 +, Ian Campbell wrote:
> On Fri, 2015-03-13 at 10:34 +, Wei Liu wrote:
> > ... because it is the right place to clean up device model stuffs.
> 
> ... and not devices_destroy_cb because it is the right ...
> 
> (also "stuff").
> 
> > And the path should use LIBXL_TOOLSTACK_DOMID instead of hardcoded 0.
> 
> Between this and what is in the next patch I think you probably should
> refactor into a helper function to get the correct path for a given
> domid.
> 
> FWIW I think you might also be able to do:
> GCSPRINTF("/local/domain/" #LIBXL_TOOLSTACK_DOMID "/device-model/%d",
>   domid);
> 
> If you want, although perhaps the intention is for it to eventually not
> be a hard-define of 0 and become e.g. a function call or a global
> variable reference, in which case best not?

Having reread patch #0 more carefully I suppose the helper would need to
take the dm_domid too, which would then rule out that cpp trick.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >