Re: [Xen-devel] [PATCH v5 4/4] MAINTAINERS: support for xen-access

2015-07-09 Thread Lengyel, Tamas
On Thu, Jul 9, 2015 at 9:17 AM, Razvan Cojocaru rcojoc...@bitdefender.com
wrote:

 On 07/09/2015 04:14 PM, Tamas K Lengyel wrote:
  Add tools/tests/xen-access to the supported list under VM EVENT/MEM
 ACCESS.
 
  Signed-off-by: Tamas K Lengyel tleng...@novetta.com
  ---
   MAINTAINERS | 1 +
   1 file changed, 1 insertion(+)
 
  diff --git a/MAINTAINERS b/MAINTAINERS
  index 1e74688..dae0aa3 100644
  --- a/MAINTAINERS
  +++ b/MAINTAINERS
  @@ -389,6 +389,7 @@ F:xen/common/mem_access.c
   F:   xen/arch/x86/hvm/event.c
   F:   xen/arch/x86/monitor.c
   F:   xen/arch/x86/vm_event.c
  +F:  tools/tests/xen-access

 This seems to use the wrong indentation type, it doesn't align with the
 rest.

 With that fixed,
 Acked-by: Razvan Cojocaru rcojoc...@bitdefender.com


 Cheers,
 Razvan


Ah what, it's the only spot where it's not using whitespace for
indentation? =) Nice catch.

Tamas
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PCI Pass-through in Xen ARM - Draft 2.

2015-07-09 Thread Julien Grall



On 09/07/2015 12:30, Manish Jaggi wrote:

On Thursday 09 July 2015 01:38 PM, Julien Grall wrote:

On 09/07/2015 08:13, Manish Jaggi wrote:


If this was a domctl there might be scope for accepting an
implementation which made assumptions such as sbdf == deviceid. However
I'd still like to see this topic given proper treatment in the design
and not just glossed over with "this is how ThunderX does things".

I got your point.

Or maybe the solution is simple and we should just do it now -- i.e.
can
we add a new field to the PHYSDEVOP_pci_host_bridge_add argument struct
which contains the base deviceid for that bridge

deviceID would be the same as the sbdf, as we don't have a way to
translate an sbdf to a deviceID.


I think we have to be clear in this design document about the
different meaning.

When the Device Tree is used, it's assumed that the deviceID will be
equal to the requester ID and not the sbdf.

Does SMMUv2 have a concept of requester ID?
I see the requester ID term in SMMUv3.


The requester ID is part of the PCI spec and not the SMMU.

The version of the SMMUv2 spec doesn't mention anything about PCI. I 
suspect this is because the spec was written before PCI support was 
introduced, and therefore this is implementation defined.




Linux provides a function (pci_for_each_dma_alias) which will return a
requester ID for a given PCI device. It appears that the BDF (the 's'
of sBDF is only internal to Linux and not part of the hardware) is
equal to the requester ID on your platform but we can't assume it for
anyone else.

so you mean requesterID = pci_for_each_dma_alias(sbdf)


Yes.



When we have a PCI in hand, we have to find the requester ID for this
device.

That is the question. How to map requesterID to sbdf


See above.


On

Once ?


Yes.


we have it we can deduce the streamID and the deviceID. The way to do
it will depend on whether we use device tree or ACPI:
- For device tree, the streamID, and deviceID will be equal to the
requester ID

What do you think the streamID should be when a device is a PCI EP and is
enumerated? Also, per the ARM SMMU 2.0 spec, StreamID is implementation
specific.
As per the SMMUv3 spec: "For PCI, it is intended that StreamID is
generated from the PCI RequesterID. The generation function may be 1:1
where one Root Complex is hosted by one SMMU."


I think my sentence "For device tree, the streamID, and deviceID will be 
equal to the requester ID" is pretty clear. FWIW, this is the solution 
chosen for Linux:


"Assume Stream ID == Requester ID for now. We need a way to describe the 
ID mappings in FDT" (see arm_smmu_add_pci_device in 
drivers/iommu/arm-smmu.c).


You can refer to my point below about ACPI tables. The solution would be 
exactly the same. If we have a requester ID in hand we can do pretty 
much everything.


The whole point of my previous email is to give insight about what we 
need and what we can deduce based on firmware tables. I didn't cover 
any implementation details.


Regards,

--
Julien Grall



Re: [Xen-devel] [PATCH v3 07/13] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.

2015-07-09 Thread Jan Beulich
 On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote:
 @@ -1830,6 +1831,20 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
  vmx_vmcs_exit(v);
  }
  
 +static int vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
 +{
 +int rc = X86EMUL_EXCEPTION;
 +struct vcpu *v = current;

curr

 +if ( !cpu_has_vmx_vmfunc && altp2m_active(v->domain) &&
 + regs->eax == 0 &&
 + p2m_switch_vcpu_altp2m_by_id(v, (uint16_t)regs->ecx) )
 +{
 +rc = X86EMUL_OKAY;
 +}

Pointless braces.

 @@ -2095,6 +2112,12 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
  vpid_sync_vcpu_gva(curr, vaddr);
  }
  
 +static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
 +{
 +gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
 +return X86EMUL_EXCEPTION;
 +}
 +
  static int vmx_cr_access(unsigned long exit_qualification)
  {
  struct vcpu *curr = current;
 @@ -3245,6 +3268,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
  update_guest_eip();
  break;
  
 +case EXIT_REASON_VMFUNC:
 +if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )
 +update_guest_eip();
 +else
 +hvm_inject_hw_exception(TRAP_invalid_op, 
 HVM_DELIVER_NO_ERROR_CODE);
 +break;

The two changes don't fit together (and continue to look pointless
considering that the helper unconditionally returns
X86EMUL_EXCEPTION): != X86EMUL_OKAY doesn't mean
== X86EMUL_EXCEPTION.

 --- a/xen/arch/x86/x86_emulate/x86_emulate.c
 +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
 @@ -3815,28 +3815,40 @@ x86_emulate(
  case 0x01: /* Grp7 */ {
  struct segment_register reg;
  unsigned long base, limit, cr0, cr0w;
 +uint64_t tsc_aux;
  
 -if ( modrm == 0xdf ) /* invlpga */
 +switch( modrm )
  {
 -generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
 -generate_exception_if(!mode_ring0(), EXC_GP, 0);
 -fail_if(ops->invlpg == NULL);
 -if ( (rc = ops->invlpg(x86_seg_none, truncate_ea(_regs.eax),
 -   ctxt)) )
 -goto done;
 -break;
 -}
 -
 -if ( modrm == 0xf9 ) /* rdtscp */
 -{
 -uint64_t tsc_aux;
 -fail_if(ops->read_msr == NULL);
 -if ( (rc = ops->read_msr(MSR_TSC_AUX, tsc_aux, ctxt)) != 0 )
 -goto done;
 -_regs.ecx = (uint32_t)tsc_aux;
 -goto rdtsc;
 +case 0xdf: /* invlpga AMD */

Case labels indented the same as the containing switch() please.

 +case 0xd4: /* vmfunc */
 +generate_exception_if(
 +(lock_prefix |
 +rep_prefix() |
 +(vex.pfx == vex_66)),
 +EXC_UD, -1);

FWIW, while Andrew pointed out that this doesn't match the doc, I
suppose it's the doc that's wrong here, so I would be inclined to
suggest keeping it as is. What I don't like though is the formatting -
why does this need to span across 5 lines?

 +fail_if(ops->vmfunc == NULL);
 +if ( (rc = ops->vmfunc(ctxt) != X86EMUL_OKAY) )
 +goto done;
 +break;
 +default:
 +goto continue_grp7;
  }
 +break;
  
 +continue_grp7:

Labels indented by at least one space please.

Jan




Re: [Xen-devel] [xen-unstable test] 59207: regressions - FAIL

2015-07-09 Thread Ian Jackson
osstest service owner writes ([xen-unstable test] 59207: regressions - FAIL):
 flight 59207 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59207/
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
...
  test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  6 xen-boot \
fail REGR. vs. 58965

Xen crash in set_cpu_sibling_map AFAICT.  osstest reports failures in
PV jobs too.  ARM seems unaffected.

Ian.

Jul  9 08:38:43.197031 (XEN) If possible, disable SATA Combined mode in BIOS or 
contact your vendor for BIOS update.
Jul  9 08:38:43.205026 (XEN) AMD-Vi: Error initialization
Jul  9 08:38:43.213017 (XEN) I/O virtualisation disabled
Jul  9 08:38:43.221011 (XEN) [ Xen-4.6-unstable  x86_64  debug=y  Not 
tainted ]
Jul  9 08:38:43.221048 (XEN) CPU:0
Jul  9 08:38:43.229024 (XEN) RIP:e008:[82d08019c044] 
set_cpu_sibling_map+0x50/0x391
Jul  9 08:38:43.237034 (XEN) RFLAGS: 00010293   CONTEXT: hypervisor
Jul  9 08:38:43.245023 (XEN) rax:    rbx: 82d08034cf40   
rcx: 
Jul  9 08:38:43.253022 (XEN) rdx: 0001   rsi:    
rdi: 
Jul  9 08:38:43.261021 (XEN) rbp: 82d080307e28   rsp: 82d080307dd8   
r8:  830219028930
Jul  9 08:38:43.269024 (XEN) r9:     r10: 0004   
r11: 82d0803140d0
Jul  9 08:38:43.277033 (XEN) r12: 83021903afe0   r13: 82d08034cec0   
r14: 
Jul  9 08:38:43.285020 (XEN) r15: 0005   cr0: 8005003b   
cr4: 000406e0
Jul  9 08:38:43.293007 (XEN) cr3: dfcb4000   cr2: 
Jul  9 08:38:43.293040 (XEN) ds:    es:    fs:    gs:    ss: 
   cs: e008
Jul  9 08:38:43.301037 (XEN) Xen stack trace from rsp=82d080307dd8:
Jul  9 08:38:43.309032 (XEN)82d080307e28 0020 
83021903afe0 82d08034cec0
Jul  9 08:38:43.317031 (XEN)82d08034cec0 82d08034cf40 
83021903afe0 82d08034cec0
Jul  9 08:38:43.325031 (XEN)82d08034cec0 0005 
82d080307e48 82d0802dae3e
Jul  9 08:38:43.333029 (XEN)82d08030 82d08030 
82d080307f08 82d0802da680
Jul  9 08:38:43.341040 (XEN)0003  
00ae2000 0021f000
Jul  9 08:38:43.349033 (XEN)83085d70 0011 
00021d02e000 
Jul  9 08:38:43.357026 (XEN)0015024d 83085f00 
83085fb0 82d5
Jul  9 08:38:43.365037 (XEN)0008 0001006e 
0003 02f8
Jul  9 08:38:43.373026 (XEN)  
 
Jul  9 08:38:43.381037 (XEN)  
 82d080100073
Jul  9 08:38:43.389028 (XEN)  
 
Jul  9 08:38:43.397035 (XEN)  
 
Jul  9 08:38:43.405039 (XEN)  
 
Jul  9 08:38:43.413035 (XEN)  
 
Jul  9 08:38:43.421040 (XEN)  
 
Jul  9 08:38:43.429040 (XEN)  
 
Jul  9 08:38:43.437028 (XEN)  
8300dfd4d000 
Jul  9 08:38:43.445033 (XEN)
Jul  9 08:38:43.453008 (XEN) Xen call trace:
Jul  9 08:38:43.453038 (XEN)[82d08019c044] 
set_cpu_sibling_map+0x50/0x391
Jul  9 08:38:43.461024 (XEN)[82d0802dae3e] 
smp_prepare_cpus+0x12e/0x254
Jul  9 08:38:43.469122 (XEN)[82d0802da680] __start_xen+0x20b4/0x265c
Jul  9 08:38:43.477198 (XEN)[82d080100073] __high_start+0x53/0x58
Jul  9 08:38:43.485020 (XEN) 
Jul  9 08:38:43.485048 (XEN) Pagetable walk from :
Jul  9 08:38:43.493009 (XEN)  L4[0x000] = 000219036063 
Jul  9 08:38:43.493043 (XEN)  L3[0x000] = 000219035063 
Jul  9 08:38:43.501045 (XEN)  L2[0x000] = 000219034063  
Jul  9 08:38:43.509034 (XEN)  L1[0x000] =  
Jul  9 08:38:43.517005 (XEN) 
Jul  9 08:38:43.517030 (XEN) 
Jul  9 08:38:43.525030 (XEN) Panic on CPU 0:
Jul  9 08:38:43.525060 (XEN) FATAL PAGE FAULT
Jul  9 08:38:43.525086 (XEN) [error_code=0002]
Jul  9 08:38:43.533038 (XEN) Faulting linear address: 
Jul  9 08:38:43.541011 (XEN) 
Jul  9 08:38:43.541043 (XEN) 
Jul  9 08:38:43.548990 (XEN) Reboot in five seconds...



Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Dario Faggioli
On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
  On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote:

  Consider the following scenario:
  - v1 blocks on pcpu 0.
   - vcpu_runstate_change() will do everything necessary for v1 on p0.
  - The scheduler does load balancing and moves v1 to p1, calling
  vcpu_migrate().  Because the vcpu is still blocked,
  vcpu_runstate_change() is not called.
  - A device interrupt is generated.
  
  What happens to the interrupt?  Does everything still work properly, or
  will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)?
 
 I think much of this was discussed before, since I also disliked the
 hooking into vcpu_runstate_change(). What I remember having
 been told is that it really only matters which pCPU's list a vCPU is
on, not what v->processor says.

Right.

But, as far as I could understand from the patches I've seen, a vcpu
ends up in a list when it blocks, and when it blocks there will be a
context switch, and hence we can deal with the queueing during the
context switch itself (which is, in part, an arch specific operation
already).

What am I missing?

Maybe (looking at vmx_pi_desc_update() in patch 14), something that is
right now dealt with by means of RUNSTATE_offline? What (if yes)? Can
(if yes) it be explained in some non-runstate-based way, so we can
judge whether that could also be achieved differently than by actually
hooking in vcpu_runstate_change()?

Regards,
Dario
-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 16:00, paul.durr...@citrix.com wrote:
 So, the only significant difference I can see is that using a signed value 
 causes an extra cltq instruction to be used. Or maybe I need some specific 
 level of optimization for there to be a big difference?

No, this is the difference I'm trying to point out (and if the index
register wasn't %rax, it would be a movslq or whatever its ATT
name is).

Jan




Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread George Dunlap
On 07/09/2015 03:18 PM, Dario Faggioli wrote:
 On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
 On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote:
 
 Consider the following scenario:
 - v1 blocks on pcpu 0.
  - vcpu_runstate_change() will do everything necessary for v1 on p0.
 - The scheduler does load balancing and moves v1 to p1, calling
 vcpu_migrate().  Because the vcpu is still blocked,
 vcpu_runstate_change() is not called.
 - A device interrupt is generated.

 What happens to the interrupt?  Does everything still work properly, or
 will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)?

 I think much of this was discussed before, since I also disliked the
 hooking into vcpu_runstate_change(). What I remember having
 been told is that it really only matters which pCPU's list a vCPU is
on, not what v->processor says.

 Right.
 
 But, as far as I could understand from the patches I've seen, a vcpu
 ends up in a list when it blocks, and when it blocks there will be a
 context switch, and hence we can deal with the queueing during the
 context switch itself (which is, in part, an arch specific operation
 already).
 
 What am I missing?

I think what you're missing is that Jan is answering my question about
migrating a blocked vcpu, not arguing that vcpu_runstate_change() is the
right way to go.  At least that's how I understood him. :-)

But regarding context_switch: I think the reason we need more hooks than
that is that context_switch only changes into and out of running state.
 There are also changes that need to happen when you change from blocked
to offline, offline to blocked, blocked to runnable, &c.; these don't go
through context_switch.  That's why I was suggesting some architectural
equivalents to the SCHED_OP() callbacks to be added to vcpu_wake &c.

vcpu_runstate_change() is at the moment a nice quiet cul-de-sac that
just does a little bit of accounting; I'd rather not have it suddenly
become a major thoroughfare for runstate change hooks, if we can avoid
it. :-)

 -George



Re: [Xen-devel] [PATCH v3 11/13] x86/altp2m: define and implement alternate p2m HVMOP types.

2015-07-09 Thread Jan Beulich
 On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote:
 +case HVMOP_altp2m_set_domain_state:
 +{
 +struct xen_hvm_altp2m_domain_state a;
 +struct domain *d;
 +struct vcpu *v;
 +bool_t ostate;
 +
 +if ( copy_from_guest(a, arg, 1) )
 +return -EFAULT;
 +
 +d = rcu_lock_domain_by_any_id(a.domid);
 +if ( d == NULL )
 +return -ESRCH;

Is it indeed intended for a guest to enable this on itself?

 +rc = -EINVAL;
 +if ( is_hvm_domain(d) && hvm_altp2m_supported() &&
 + !nestedhvm_enabled(d) )
 +{
 +ostate = d->arch.altp2m_active;
 +d->arch.altp2m_active = !!a.state;
 +
 +rc = 0;
 +
 +/* If the alternate p2m state has changed, handle appropriately */
 +if ( d->arch.altp2m_active != ostate )
 +{
 +if ( ostate || !(rc = p2m_init_altp2m_by_id(d, 0)) )
 +{

Two if()-s like these should be combined into one.

 +case HVMOP_altp2m_vcpu_enable_notify:
 +{
 +struct domain *curr_d = current->domain;
 +struct vcpu *curr = current;

The other way around please.

 +case HVMOP_altp2m_set_mem_access:
 +{
 +struct xen_hvm_altp2m_set_mem_access a;
 +struct domain *d;
 +
 +if ( copy_from_guest(a, arg, 1) )
 +return -EFAULT;
 +
 +d = rcu_lock_domain_by_any_id(a.domid);
 +if ( d == NULL )
 +return -ESRCH;

Again - is it intended for this to be invokable by the guest for itself?
If so, is it being made certain that it can't increase access permissions
beyond what its controlling domain dictates?

 --- a/xen/include/public/hvm/hvm_op.h
 +++ b/xen/include/public/hvm/hvm_op.h
 @@ -396,6 +396,75 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
  
  #endif /* defined(__i386__) || defined(__x86_64__) */
  
 +/* Set/get the altp2m state for a domain */
 +#define HVMOP_altp2m_set_domain_state 24
 +#define HVMOP_altp2m_get_domain_state 25
 +struct xen_hvm_altp2m_domain_state {
 +/* Domain to be updated or queried */
 +domid_t domid;
 +/* IN or OUT variable on/off */
 +uint8_t state;
 +};

Not sure if it was said before - explicit padding please (with padding
fields verified to be zero for future extensibility).

 +struct xen_hvm_altp2m_view {
 +/* Domain to be updated */
 +domid_t domid;
 +/* IN/OUT variable */
 +uint16_t view;

Is it certain that the number of views will never exceed 64k?

Jan




Re: [Xen-devel] [PATCH v2] x86: correct socket_cpumask allocation

2015-07-09 Thread Chao Peng
On Thu, Jul 09, 2015 at 10:41:55AM +0100, Jan Beulich wrote:
  On 09.07.15 at 10:26, chao.p.p...@linux.intel.com wrote:
  @@ -748,8 +758,9 @@ static int cpu_smpboot_alloc(unsigned int cpu)
   goto oom;
   per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
   
  -if ( !socket_cpumask[socket] &&
  - !zalloc_cpumask_var(socket_cpumask + socket) )
  +if ( secondary_socket_cpumask == NULL &&
  + (secondary_socket_cpumask = _xzalloc(nr_cpumask_bits / 8,
  +  sizeof(long))) == NULL )
 
 This is horrible since completely type-unsafe, and correct only
 because _xmalloc() happens to allocate more space than requested
 if the size isn't a multiple of MEM_ALIGN. And it makes me realize why
 on IRC I first suggested xzalloc_array(): That would at least have
 taken care of that latent bug. And remember that I did _not_
 suggest _xzalloc(), but xzalloc().
 
 Taken together I think we should stay with using zalloc_cpumask_var(),
 and introduce zap_cpumask_var() (storing NULL in the big NR_CPUS
 case and doing nothing in the small one).

Apart from zap_cpumask_var() there is a need to check whether the
cpumask_var_t is NULL as well, which is awkward to express in a way that
satisfies the compiler in the small NR_CPUS case.

 Should I be overlooking
 something that still prevents this from building in both cases, the
 above allocation should be changed to at least be type safe (and I
 guess I'd rather waste a few bytes here than see you add fragile
 casts or some such).

So this solution is finally adopted. The new version is already sent out.

Chao



[Xen-devel] [ovmf test] 59220: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59220 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59220/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 17 guest-start.2 fail REGR. vs. 59179

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 59179
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 59179

version targeted for testing:
 ovmf 51dc120608129b652a1fdbb3f0d9c9a7f8a4b748
baseline version:
 ovmf e109e3fec9245e3fa777802d17e00431e349a6bd

Last test of basis59179  2015-07-08 01:15:55 Z1 days
Testing same since59220  2015-07-08 19:32:31 Z0 days1 attempts


People who touched revisions under test:
  Bruce Cran br...@cran.org.uk
  Feng Tian feng.t...@intel.com
  Hess Chen hesheng.c...@intel.com
  Jeff Fan jeff@intel.com
  jiaxinwu jiaxin...@intel.com
  Olivier Martin olivier.mar...@arm.com
  Qiu Shumin shumin@intel.com
  Star Zeng star.z...@intel.com
  Zhang Lubo lubo.zh...@intel.com

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
 test-amd64-amd64-xl-qemuu-winxpsp3   pass
 test-amd64-i386-xl-qemuu-winxpsp3pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 355 lines long.)



Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
 @@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t 
 val)
  }
  }
  
 -static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
 +static int stdvga_mem_write(const struct hvm_io_handler *handler,
 +uint64_t addr, uint32_t size,
 +uint64_t data)
  {
 +struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
 +ioreq_t p = { .type = IOREQ_TYPE_COPY,
 +  .addr = addr,
 +  .size = size,
 +  .count = 1,
 +  .dir = IOREQ_WRITE,
 +  .data = data,
 +};

Indentation (still - I know I pointed this out on v6, just perhaps at
another example). See e.g. the context of the earlier change to the
beginning of hvm_mmio_internal() in this patch for how this should
look like.

 -int stdvga_intercept_mmio(ioreq_t *p)
 +static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler,
 +const ioreq_t *p)
  {
 -struct domain *d = current->domain;
 -struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
 -uint64_t start, end, addr = p->addr, count = p->count, size = p->size;
 -int buf = 0, rc;
 -
 -if ( unlikely(p->df) )
 -{
 -start = (addr - (count - 1) * size);
 -end = addr + size;
 -}
 -else
 -{
 -start = addr;
 -end = addr + count * size;
 -}
 -
 -if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
 -return X86EMUL_UNHANDLEABLE;
 -
 -if ( size > 8 )
 -{
 -gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
 -return X86EMUL_UNHANDLEABLE;
 -}
 +struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
  
   spin_lock(&s->lock);
  
 -if ( s->stdvga && s->cache )
 +if ( !s->stdvga ||
 + (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
 + (hvm_mmio_last_byte(p) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )

If "last" means what it says, you need >= here.

Jan




[Xen-devel] [xen-unstable test] 59207: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59207 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59207/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-amd   6 xen-boot  fail REGR. vs. 58965
 test-amd64-amd64-xl-xsm   6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-xl-xsm6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-rumpuserxen-i386  6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-qemut-rhel6hvm-amd  6 xen-bootfail REGR. vs. 58965
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail REGR. 
vs. 58965
 test-amd64-i386-qemuu-rhel6hvm-amd  6 xen-bootfail REGR. vs. 58965
 test-amd64-i386-xl-qemut-win7-amd64  6 xen-boot   fail REGR. vs. 58965
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
REGR. vs. 58965
 test-amd64-i386-freebsd10-amd64  6 xen-boot   fail REGR. vs. 58965
 test-armhf-armhf-xl-multivcpu 15 guest-start/debian.repeat fail REGR. vs. 58965
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot fail REGR. vs. 58965
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot   fail REGR. vs. 58965
 test-amd64-amd64-xl-qemuu-win7-amd64  9 windows-install   fail REGR. vs. 58965
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot fail REGR. vs. 58965
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-bootfail REGR. vs. 58965

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt-xsm   6 xen-boot  fail REGR. vs. 58965
 test-amd64-i386-libvirt   6 xen-boot  fail REGR. vs. 58965
 test-amd64-amd64-libvirt  6 xen-boot  fail REGR. vs. 58965
 test-amd64-amd64-libvirt-xsm  6 xen-boot  fail REGR. vs. 58965
 test-amd64-amd64-xl-rtds  6 xen-boot  fail REGR. vs. 58965
 test-armhf-armhf-xl-rtds 11 guest-start  fail   like 58965
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58965
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 58965

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 xen  89483dc2b3eca29b9b104019bda1883c08150dc9
baseline version:
 xen  c40317f11b3f05e7c06a2213560c8471081f2662

Last test of basis58965  2015-06-29 02:08:30 Z   10 days
Failing since 58974  2015-06-29 15:11:59 Z9 days   10 attempts
Testing same since59207  2015-07-08 13:02:00 Z1 days1 attempts


People who touched revisions under test:
  Andrew Cooper andrew.coop...@citrix.com
  Anthony PERARD anthony.per...@citrix.com
  Ard Biesheuvel a...@linaro.org
  Boris Ostrovsky boris.ostrov...@oracle.com
  Chao Peng chao.p.p...@linux.intel.com
  Chen Baozi baoz...@gmail.com
  Daniel De Graaf dgde...@tycho.nsa.gov
  Dario Faggioli dario.faggi...@citrix.com
  Euan Harris euan.har...@citrix.com
  Feng Wu feng...@intel.com
  George Dunlap george.dun...@eu.citrix.com
  Ian Campbell ian,campb...@citrix.com
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Jennifer Herbert jennifer.herb...@citrix.com
  Juergen Gross jgr...@suse.com
  Julien Grall julien.gr...@citrix.com
  Julien Grall julien.gr...@linaro.org
  Liang Li liang.z...@intel.com
  Paul Durrant paul.durr...@citrix.com
  Roger Pau Monné roger@citrix.com
  Samuel Thibault samuel.thiba...@ens-lyon.org
  Thomas Leonard tal...@gmail.com
  Tiejun Chen tiejun.c...@intel.com
  Wei Liu wei.l...@citrix.com
  Wen Congyang we...@cn.fujitsu.com
  Yang Zhang yang.z.zh...@intel.com

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64 

Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 13:49, wei.l...@citrix.com wrote:
 As far as I can tell, there are several issues here with regard to this
 patch series on hypervisor side:
 
 1. Patch #1 is neither acked nor reviewed.

The patch is obvious and straightforward - the only reason I didn't
commit it yet is that it would be dead code right now.

 3. Patch #7 has issue with regard to instruction emulator.

Those would seem to be easily fixed.

 4. Patch #8 in this version (#7 in previous version, v2) is reviewed,
but I'm not sure if the concern raised in v2 went away. Judging
from the date of v3 submission and date of issue raised on v2, the
issue is still there.

No, the concern did not go away, but George actually had provided
a patch to address the concern.

 5. Patch #11 is not acked, and because it's ABI matter we need to be
very careful about it.

Indeed. But directions were given, so it should be mostly a
mechanical thing.

 Question for hypervisor maintainers, how much risk do you think this
 series impose?  If this feature is off, is the code path more or less
 the same of before?

It looks like it mostly is, so I don't view the risk as too high.

Jan




[Xen-devel] [PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how up to date each branch is

2015-07-09 Thread Ian Campbell
Using report_find_push_age_info allows us to provide counts of
attempts since the last baseline on current tip as well as the first
attempt of each of those.

Since everything serialises on the repo lock I didn't bother trying to
parallelise anything.

It's a little terse to keep it in 80 chars.

cron infrastructure is added to run this in a new
$HOME/for-maintjobs.git on a daily basis and push the output to the
location given in the Publish config variable, if it is set (it is set
in production but not for cambridge).

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v2: Use new report_find_push_age_info functionality, output condensed.

v3: - Correctly quote parameters to printf, so empty ones don't misalign the
  rest.
- Drop dates in favour of number of days, and n/a in those cases
- Print Error! if no tip is available and UpToDate if tip==basis.

v4: - Simplify $onevar, dropping $dflt, dropping export, subsuming
  $oneflightvar.
- Right justify some stuff
- s/UpToDate/same/
- Simplify some pointless or complex shell conditionals

v5: - Add cron infrastructure

v6: - Propagate cron output to logs (untested apart from bash -n due
  to lack of local infra).

v7: - Use cr-publish-flight-logs.

v8: - Output to $c{Results}

time with a recently updated set of Repos says:

57.34user 28.28system 5:34.16elapsed 25%CPU (0avgtext+0avgdata 
47256maxresident)k
100216inputs+600outputs (673major+2332436minor)pagefaults 0swaps

So it's not quick...

Example output:
Branch   BasisTip  #Tip #Tot 1stTip 1stNew
libvirt  d10a5f58 285407940   11   n/a  9 days
linux-3.0e1c63f9f 5dba9ddd29 3 days   870 days
linux-3.10   b3d78448 same
linux-3.14   762167f9 same
linux-3.16   26749e75 same
linux-3.18   d048c068 ea5dd38e55 3 days 3 days
linux-3.4bb4a05a0 cf1b3dad   17  16312 days   213 days
linux-4.1b953c0d2 6a010c0a00   n/an/a
linux-arm-xen64972ceb same
linux-linus  6aaf0da8 9bdc771f01   n/a  1 day
linux-mingo-tip-master   d935d0f7 21316d690   16   n/a   1175 days
linux-nexte58c24350  219   n/a448 days
osstest  15d2dd50 Error!  0-   n/an/a
ovmf 3735017c 0bb5d7a501   n/a  0 days
qemu-mainlined2966f80 3536064201   n/a  0 days
qemu-upstream-4.2-testingd2382550 same
qemu-upstream-4.3-testingefae5e0f same
qemu-upstream-4.4-testing32226f42 same
qemu-upstream-4.5-testingd9552b0a same
qemu-upstream-unstable   c4a962ec same
rumpuserxen  30d72f3f 3b91e449   62  11478 days   150 days
seabios  f24eb2f8 6cfebb4e00   n/an/a
xen-4.0-testing  2692df2a same
xen-4.1-testing  40feff87 same
xen-4.2-testing  38fcda22 same
xen-4.3-testing  e7c02297 same
xen-4.4-testing  6c1cb3db same
xen-4.5-testing  e3bd3cef same
xen-unstable c40317f1 9379af0805   n/a  3 days
---
 cr-all-branch-statuses |  32 +
 crontab|   1 +
 crontab-cambridge  |   1 +
 mg-all-branch-statuses | 120 +
 4 files changed, 154 insertions(+)
 create mode 100755 cr-all-branch-statuses
 create mode 100755 mg-all-branch-statuses

diff --git a/cr-all-branch-statuses b/cr-all-branch-statuses
new file mode 100755
index 000..f9885db
--- /dev/null
+++ b/cr-all-branch-statuses
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+# This is part of osstest, an automated testing framework for Xen.
+# Copyright (C) 2015 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+
+set -ex
+
+. cri-args-hostlists
+branch=$1; shift
+
+check_stop all-branch-statuses.
+
+results=`getconfig Results`
+
+abs=$results/all-branch-statuses.txt
+( date ; echo ; bash -x ./mg-all-branch-statuses ) > $abs.new
+mv $abs.new $abs
+
+./cr-publish-flight-logs
diff --git a/crontab b/crontab
index 5dc8675..436645b 100644
--- a/crontab
+++ b/crontab
@@ -7,4 +7,5 @@ 

Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain

2015-07-09 Thread Boris Ostrovsky

On 07/09/2015 10:17 AM, Jan Beulich wrote:

On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote:

On 07/09/2015 03:02 AM, Jan Beulich wrote:

On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote:

As I started to update the patches I realized that in some cases
(especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't
have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the
domain. d-vcpu[0] should work. Otherwise I'll either need a new field
in struct domain or wrap has_32bit_shinfo into something PVH-specific,
like is_32bit_pvh_vcpu().

Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like
for PVH, especially if you also intend the tools to use the 64-bit
guest context variant even for 32-bit PVH? Once again - are you
intending to prohibit 32-bit PVH switching to 64-bit mode (which
would seem both wrong and possibly cumbersome to me)?

With current PVH implementation I don't think we can switch. We are
starting the guest in very much PV-like fashion. That's why we are
getting into switch_compat() --- via XEN_DOMCTL_set_address_size.

For XEN_DOMCTL_get_address_size specifically we need to be able to
figure out the mode *before* the guest is running because we use it to
set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't
change the mode.

Okay - but is there code (being put) in place to refuse switch
attempts?


No, I should add code to deal with this.


-boris




Re: [Xen-devel] [xen-unstable test] 59207: regressions - FAIL

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 16:07, ian.jack...@eu.citrix.com wrote:
 osstest service owner writes ([xen-unstable test] 59207: regressions - 
 FAIL):
 flight 59207 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/59207/ 
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
 ...
  test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  6 xen-boot \
fail REGR. vs. 58965
 
 Xen crash in set_cpu_sibling_map AFAICT.

I'm expecting a v3 of the patch any minute.

Jan




[Xen-devel] [PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow publishing only the Results

2015-07-09 Thread Ian Campbell
Removing the requirement that a flight be provided allows this to be
used to publish the results directory even in contexts which have no
flight.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v7: New patch

v8: No need for a list of files, just allow to omit publishing flight.
Therefore flight becomes optional.

Renamed from : cr-publish-flight-logs: support publishing files from 
$HOME/public_html
---
 cr-publish-flight-logs | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/cr-publish-flight-logs b/cr-publish-flight-logs
index 94c74c8..8a19d88 100755
--- a/cr-publish-flight-logs
+++ b/cr-publish-flight-logs
 if (@ARGV && $ARGV[0] eq '--push-harness') {
 $push_harness = 1;
 }
 
-die "usage: ./cr-publish-flight-logs <flight>" unless @ARGV==1;
+my $flight= shift @ARGV // '';
+die unless $flight =~ m/^\d*$/;
 
-my $flight= shift @ARGV;
-die unless $flight =~ m/^\d+$/;
+die "usage: ./cr-publish-flight-logs [flight]" unless @ARGV==0;
 
open LOCK, "> $c{GlobalLockDir}/publish-lock" or die $!;
 flock LOCK, LOCK_EX or die $!;
@@ -59,5 +59,6 @@ sub copydir ($$) {
 $!=0; $?=0; system @cmd; die "rsync $? $!" if $? or $!;
 }
 
-copydir("$c{Logs}/$flight/", "$c{LogsPublish}/$flight") if $c{LogsPublish};
+copydir("$c{Logs}/$flight/", "$c{LogsPublish}/$flight")
+if $flight && $c{LogsPublish};
 copydir("$c{Results}/", "$c{ResultsPublish}") if $c{ResultsPublish};
-- 
2.1.4




Re: [Xen-devel] [PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow publishing only the Results

2015-07-09 Thread Ian Jackson
Ian Campbell writes ([PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow 
publishing only the Results):
 Removing the requirement that a flight be provided allows this to be
 used to publish the results directory even in contexts which have no
 flight.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com



Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-09 Thread Dario Faggioli
On Thu, 2015-07-09 at 15:27 +0100, George Dunlap wrote:
 On 07/09/2015 03:18 PM, Dario Faggioli wrote:
  On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
  On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote:
  
  Consider the following scenario:
  - v1 blocks on pcpu 0.
   - vcpu_runstate_change() will do everything necessary for v1 on p0.
  - The scheduler does load balancing and moves v1 to p1, calling
  vcpu_migrate().  Because the vcpu is still blocked,
  vcpu_runstate_change() is not called.
  - A device interrupt is generated.
 
  What happens to the interrupt?  Does everything still work properly, or
  will the device wake-up interrupt go to the wrong pcpu (p0 rather than 
  p1)?
 
  I think much of this was discussed before, since I also disliked the
  hooking into vcpu_runstate_change(). What I remember having
  been told is that it really only matters which pCPU's list a vCPU is
 on, not what v->processor says.
 
  Right.
  
  But, as far as I could understand from the patches I've seen, a vcpu
  ends up in a list when it blocks, and when it blocks there will be a
  context switch, and hence we can deal with the queueing during the the
  context switch itself (which is, in part, an arch specific operation
  already).
  
  What am I missing?
 
 I think what you're missing is that Jan is answering my question about
 migrating a blocked vcpu, not arguing that vcpu_runstate_change() is the
 right way to go.  At least that's how I understood him. :-)
 
Eheh, no, I got that... I just wanted to take one more chance to try to
get an answer (from Feng, mainly, but from anyone who knows or has an
idea, in reality! :-D) on why another approach is not possible.

Sorry for abusing the reply for my own mean purposes. :-D

 But regarding context_switch: I think the reason we need more hooks than
 that is that context_switch only changes into and out of running state.

Sure.

 There are also changes that need to happen when you change from blocked
 to offline, offline to blocked, blocked to runnable, etc.; these don't go
 through context_switch.  

Right, those ones. And about them, I could say for the third time that
having a non-runstate biased description of what we need would help to
understand where to do things, whether there are suitable spots already
or we need to add new ones, etc But I guess I don't want to be so
repetitive! :-PP

 That's why I was suggesting some architectural
 equivalents to the SCHED_OP() callbacks to be added to vcpu_wake etc.
 
Yep. If (or at least for those of which) really there is no other place
already, adding these would be TheRightThing for me as well.

 vcpu_runstate_change() is at the moment a nice quiet cul-de-sac that
 just does a little bit of accounting; 

Indeed!

 I'd rather not have it suddenly
 become a major thoroughfare for runstate change hooks, if we can avoid
 it. :-)
 
Yep, that was my understanding of it, and my idea too.

Regards,
Dario
-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems RD Ltd., Cambridge (UK)




[Xen-devel] [PATCH v3 3/4] Convert map_domain_page() to use the new mfn_t type

2015-07-09 Thread Ben Catterall
Reworked the internals and declaration, applying (un)boxing
where needed. Converted calls to map_domain_page() to
provide mfn_t types, boxing where needed.

Signed-off-by: Ben Catterall ben.catter...@citrix.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

---
Changed since v1:
   * Created paddr_to_mfn() and mfn_to_paddr() for both x86 and ARM
   * Converted code to use the new paddr_to_mfn() rather than e.g.
  paddr>>PAGE_SHIFT

Changed since v2:
   * Switch to using paddr_to_pfn() and pfn_to_addr().
   * Removed paddr_to_mfn() and mfn_to_paddr()
   * Added missing blank line
---
 xen/arch/arm/domain_build.c   |  2 +-
 xen/arch/arm/kernel.c |  2 +-
 xen/arch/arm/mm.c | 12 +-
 xen/arch/arm/p2m.c|  4 ++--
 xen/arch/arm/traps.c  |  4 ++--
 xen/arch/x86/debug.c  | 10 
 xen/arch/x86/domain.c |  4 ++--
 xen/arch/x86/domain_build.c   | 10 
 xen/arch/x86/domain_page.c| 22 -
 xen/arch/x86/domctl.c |  2 +-
 xen/arch/x86/mm.c | 40 +++
 xen/arch/x86/mm/guest_walk.c  |  2 +-
 xen/arch/x86/mm/hap/guest_walk.c  |  2 +-
 xen/arch/x86/mm/mem_sharing.c |  4 ++--
 xen/arch/x86/mm/p2m-ept.c | 22 -
 xen/arch/x86/mm/p2m-pod.c |  8 +++
 xen/arch/x86/mm/p2m-pt.c  | 28 +++---
 xen/arch/x86/mm/p2m.c |  2 +-
 xen/arch/x86/mm/paging.c  | 32 -
 xen/arch/x86/mm/shadow/common.c   |  2 +-
 xen/arch/x86/mm/shadow/multi.c|  4 ++--
 xen/arch/x86/mm/shadow/private.h  |  2 +-
 xen/arch/x86/smpboot.c|  2 +-
 xen/arch/x86/tboot.c  |  5 ++--
 xen/arch/x86/traps.c  | 12 +-
 xen/arch/x86/x86_64/mm.c  | 14 +--
 xen/arch/x86/x86_64/traps.c   | 10 
 xen/arch/x86/x86_emulate.c| 10 
 xen/common/grant_table.c  |  4 ++--
 xen/common/kexec.c|  4 ++--
 xen/common/kimage.c   | 10 
 xen/common/memory.c   |  6 ++---
 xen/common/tmem_xen.c |  6 ++---
 xen/drivers/passthrough/amd/iommu_guest.c | 10 
 xen/drivers/passthrough/amd/iommu_map.c   | 14 +--
 xen/drivers/passthrough/vtd/x86/vtd.c |  2 +-
 xen/include/asm-x86/hap.h |  2 +-
 xen/include/asm-x86/page.h|  7 +++---
 xen/include/asm-x86/paging.h  |  2 +-
 xen/include/xen/domain_page.h |  8 +++
 40 files changed, 175 insertions(+), 173 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 8556afd..a059de6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1408,7 +1408,7 @@ static void initrd_load(struct kernel_info *kinfo)
 return;
 }
 
-dst = map_domain_page(ma>>PAGE_SHIFT);
+dst = map_domain_page(_mfn(paddr_to_pfn(ma)));
 
 copy_from_paddr(dst + s, paddr + offs, l);
 
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 209c3dd..f641b12 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -182,7 +182,7 @@ static void kernel_zimage_load(struct kernel_info *info)
 return;
 }
 
-dst = map_domain_page(ma>>PAGE_SHIFT);
+dst = map_domain_page(_mfn(paddr_to_pfn(ma)));
 
 copy_from_paddr(dst + s, paddr + offs, l);
 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index d479048..ae0f34c 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -213,7 +213,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr,
 else
 root_table = 0;
 
-mapping = map_domain_page(root_pfn + root_table);
+mapping = map_domain_page(_mfn(root_pfn + root_table));
 
 for ( level = root_level; ; level++ )
 {
@@ -230,7 +230,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr,
 
 /* For next iteration */
 unmap_domain_page(mapping);
-mapping = map_domain_page(pte.walk.base);
+mapping = map_domain_page(_mfn(pte.walk.base));
 }
 
 unmap_domain_page(mapping);
@@ -282,11 +282,11 @@ void unmap_domain_page_global(const void *va)
 }
 
 /* Map a page of domheap memory */
-void *map_domain_page(unsigned long mfn)
+void *map_domain_page(mfn_t mfn)
 {
 unsigned long flags;
 lpae_t *map = this_cpu(xen_dommap);
-unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
+unsigned long slot_mfn = mfn_x(mfn) & ~LPAE_ENTRY_MASK;
 vaddr_t va;
 lpae_t pte;
 int i, slot;
@@ -339,7 +339,7 @@ void *map_domain_page(unsigned long mfn)
 
 va = (DOMHEAP_VIRT_START
  + (slot << SECOND_SHIFT)
-  + 

[Xen-devel] [PATCH v3 1/4] xen/domain_page: Convert map_domain_page_global() to using mfn_t

2015-07-09 Thread Ben Catterall
From: Andrew Cooper andrew.coop...@citrix.com

The sh_map/unmap wrappers can be dropped, and take the opportunity to turn
some #define's into static inlines, for added type safety.

As part of adding the type safety, GCC highlights a problematic include cycle
with arm/mm.h including domain_page.h which includes xen/mm.h and falls over
__page_to_mfn being used before being declared.  Simply dropping the inclusion
of domain_page.h fixes the compilation issue.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Jan Beulich jbeul...@suse.com
CC: Tim Deegan t...@xen.org
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com

---
Changed since v2
* (un)map_domain_page_global() now take ptr-to-const

Reviewed-by: Jan Beulich jbeul...@suse.com
For ARM: Acked-by: Ian Campbell ian.campb...@citrix.com
---
 xen/arch/arm/mm.c|  6 ++
 xen/arch/x86/domain_page.c   |  9 -
 xen/arch/x86/mm/shadow/multi.c   | 10 +-
 xen/arch/x86/mm/shadow/private.h | 12 
 xen/include/asm-arm/mm.h |  1 -
 xen/include/xen/domain_page.h| 22 +-
 6 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index ff1b330..d479048 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -271,11 +271,9 @@ void clear_fixmap(unsigned map)
 }
 
 #ifdef CONFIG_DOMAIN_PAGE
-void *map_domain_page_global(unsigned long mfn)
+void *map_domain_page_global(mfn_t mfn)
 {
-mfn_t m = _mfn(mfn);
-
-return vmap(m, 1);
+return vmap(mfn, 1);
 }
 
 void unmap_domain_page_global(const void *va)
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index d684b2f..0f7548b 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -302,17 +302,16 @@ int mapcache_vcpu_init(struct vcpu *v)
 return 0;
 }
 
-void *map_domain_page_global(unsigned long mfn)
+void *map_domain_page_global(mfn_t mfn)
 {
-mfn_t m = _mfn(mfn);
 ASSERT(!in_irq() && local_irq_is_enabled());
 
 #ifdef NDEBUG
-if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
-return mfn_to_virt(mfn);
+if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+return mfn_to_virt(mfn_x(mfn));
 #endif
 
-return vmap(m, 1);
+return vmap(mfn, 1);
 }
 
 void unmap_domain_page_global(const void *ptr)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4058a9c..19644d2 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3806,7 +3806,7 @@ sh_detach_old_tables(struct vcpu *v)
 if ( v->arch.paging.shadow.guest_vtable )
 {
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
 v->arch.paging.shadow.guest_vtable = NULL;
 }
 #endif // !NDEBUG
@@ -3977,8 +3977,8 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
 {
 if ( v->arch.paging.shadow.guest_vtable )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
-v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn);
 /* PAGING_LEVELS==4 implies 64-bit, which means that
  * map_domain_page_global can't fail */
 BUG_ON(v->arch.paging.shadow.guest_vtable == NULL);
@@ -4010,8 +4010,8 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
 {
 if ( v->arch.paging.shadow.guest_vtable )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
-v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn);
 /* Does this really need map_domain_page_global?  Handle the
  * error properly if so. */
 BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); /* XXX */
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index f72ea9f..eff39dc 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -517,18 +517,6 @@ sh_unmap_domain_page(void *p)
 unmap_domain_page(p);
 }
 
-static inline void *
-sh_map_domain_page_global(mfn_t mfn)
-{
-return map_domain_page_global(mfn_x(mfn));
-}
-
-static inline void
-sh_unmap_domain_page_global(void *p)
-{
-unmap_domain_page_global(p);
-}
-
 /**/
 /* Shadow-page refcounting. */
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 

[Xen-devel] [PATCH v3 4/4] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()

2015-07-09 Thread Ben Catterall
Removed as they were wrappers around map_domain_page() to
make it appear to take an mfn_t type.

Signed-off-by: Ben Catterall ben.catter...@citrix.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Reviewed-by: Tim Deegan t...@xen.org
---
 xen/arch/x86/mm/hap/hap.c|   4 +-
 xen/arch/x86/mm/shadow/common.c  |  22 +++---
 xen/arch/x86/mm/shadow/multi.c   | 152 +++
 xen/arch/x86/mm/shadow/private.h |  13 
 xen/include/asm-x86/hap.h|  15 
 5 files changed, 89 insertions(+), 117 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index d0d3f1e..63980af 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -395,7 +395,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, 
mfn_t l4mfn)
 struct domain *d = v->domain;
 l4_pgentry_t *l4e;
 
-l4e = hap_map_domain_page(l4mfn);
+l4e = map_domain_page(l4mfn);
 
 /* Copy the common Xen mappings from the idle domain */
 memcpy(l4e[ROOT_PAGETABLE_FIRST_XEN_SLOT],
@@ -411,7 +411,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, 
mfn_t l4mfn)
 l4e[l4_table_offset(LINEAR_PT_VIRT_START)] =
 l4e_from_pfn(mfn_x(l4mfn), __PAGE_HYPERVISOR);
 
-hap_unmap_domain_page(l4e);
+unmap_domain_page(l4e);
 }
 
 static mfn_t hap_make_monitor_table(struct vcpu *v)
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index c36ffeb..6574206 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -781,11 +781,11 @@ static void oos_hash_add(struct vcpu *v, mfn_t gmfn)
 if ( swap )
 SWAP(oos_snapshot[idx], oos_snapshot[oidx]);
 
-gptr = sh_map_domain_page(oos[oidx]);
-gsnpptr = sh_map_domain_page(oos_snapshot[oidx]);
+gptr = map_domain_page(oos[oidx]);
+gsnpptr = map_domain_page(oos_snapshot[oidx]);
 memcpy(gsnpptr, gptr, PAGE_SIZE);
-sh_unmap_domain_page(gptr);
-sh_unmap_domain_page(gsnpptr);
+unmap_domain_page(gptr);
+unmap_domain_page(gsnpptr);
 }
 
 /* Remove an MFN from the list of out-of-sync guest pagetables */
@@ -1498,7 +1498,7 @@ mfn_t shadow_alloc(struct domain *d,
 p = __map_domain_page(sp);
 ASSERT(p != NULL);
 clear_page(p);
-sh_unmap_domain_page(p);
+unmap_domain_page(p);
 INIT_PAGE_LIST_ENTRY(&sp->list);
 page_list_add(sp, &tmp_list);
 sp->u.sh.type = shadow_type;
@@ -2524,7 +2524,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, 
mfn_t smfn)
 if (sp->up == 0) return 0;
 pmfn = _mfn(sp->up >> PAGE_SHIFT);
 ASSERT(mfn_valid(pmfn));
-vaddr = sh_map_domain_page(pmfn);
+vaddr = map_domain_page(pmfn);
 ASSERT(vaddr);
 vaddr += sp->up & (PAGE_SIZE-1);
 ASSERT(l1e_get_pfn(*(l1_pgentry_t *)vaddr) == mfn_x(smfn));
@@ -2554,7 +2554,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, 
mfn_t smfn)
 default: BUG(); /* Some wierd unknown shadow type */
 }
 
-sh_unmap_domain_page(vaddr);
+unmap_domain_page(vaddr);
 if ( rc )
 perfc_incr(shadow_up_pointer);
 else
@@ -3028,7 +3028,7 @@ int shadow_enable(struct domain *d, u32 mode)
 e[i] = ((0x40U * i)
 | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER
 | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-sh_unmap_domain_page(e);
+unmap_domain_page(e);
 pg->u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated;
 }
 
@@ -3631,8 +3631,8 @@ int shadow_track_dirty_vram(struct domain *d,
 if ( sl1mfn != map_mfn )
 {
 if ( map_sl1p )
-sh_unmap_domain_page(map_sl1p);
-map_sl1p = sh_map_domain_page(_mfn(sl1mfn));
+unmap_domain_page(map_sl1p);
+map_sl1p = map_domain_page(_mfn(sl1mfn));
 map_mfn = sl1mfn;
 }
 sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
@@ -3663,7 +3663,7 @@ int shadow_track_dirty_vram(struct domain *d,
 }
 
 if ( map_sl1p )
-sh_unmap_domain_page(map_sl1p);
+unmap_domain_page(map_sl1p);
 
 memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
 memset(dirty_vram->dirty_bitmap, 0, dirty_size);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 0a942f8..00e8f1f 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -221,16 +221,16 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, 
walk_t *gw, int version)
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 l4p = (guest_l4e_t *)v->arch.paging.shadow.guest_vtable;
 mismatch |= (gw->l4e.l4 != l4p[guest_l4_table_offset(va)].l4);
-l3p = sh_map_domain_page(gw->l3mfn);
+l3p = map_domain_page(gw->l3mfn);
 mismatch 

[Xen-devel] [PATCH v3 2/4] xen/domain_page: Convert copy/clear_domain_page() to using mfn_t

2015-07-09 Thread Ben Catterall
From: Andrew Cooper andrew.coop...@citrix.com

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
[Convert grant_table.c to pass mfn_t types and fix ARM compiling]

Signed-off-by: Ben Catterall ben.catter...@citrix.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/arch/x86/mm.c |  7 ---
 xen/common/grant_table.c  |  2 +-
 xen/common/kimage.c   | 12 ++--
 xen/common/memory.c   | 12 +---
 xen/include/xen/domain_page.h | 15 ++-
 5 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b011c95..df9c190 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3293,7 +3293,7 @@ long do_mmuext_op(
 /* A page is dirtied when it's being cleared. */
 paging_mark_dirty(pg_owner, page_to_mfn(page));
 
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 
 put_page_and_type(page);
 break;
@@ -3327,7 +3327,8 @@ long do_mmuext_op(
 /* A page is dirtied when it's being copied to. */
 paging_mark_dirty(pg_owner, page_to_mfn(dst_page));
 
-copy_domain_page(page_to_mfn(dst_page), page_to_mfn(src_page));
+copy_domain_page(_mfn(page_to_mfn(dst_page)),
+ _mfn(page_to_mfn(src_page)));
 
 put_page_and_type(dst_page);
 put_page(src_page);
@@ -6003,7 +6004,7 @@ int create_perdomain_mapping(struct domain *d, unsigned 
long va,
 pg = alloc_domheap_page(d, MEMF_no_owner);
 if ( pg )
 {
-clear_domain_page(page_to_mfn(pg));
+clear_domain_page(_mfn(page_to_mfn(pg)));
 if ( !IS_NIL(ppg) )
 *ppg++ = pg;
 l1tab[l1_table_offset(va)] =
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index dc3487d..681a553 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1848,7 +1848,7 @@ gnttab_transfer(
 goto unlock_and_copyback;
 }
 
-copy_domain_page(page_to_mfn(new_page), mfn);
+copy_domain_page(_mfn(page_to_mfn(new_page)), _mfn(mfn));
 
 page->count_info &= ~(PGC_count_mask|PGC_allocated);
 free_domheap_page(page);
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
index 8c4854d..742e4e8 100644
--- a/xen/common/kimage.c
+++ b/xen/common/kimage.c
@@ -77,7 +77,7 @@ static struct page_info *kimage_alloc_zeroed_page(unsigned 
memflags)
 if ( !page )
 return NULL;
 
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 
 return page;
 }
@@ -409,7 +409,7 @@ static struct page_info 
*kimage_alloc_crash_control_page(struct kexec_image *ima
 if ( page )
 {
 image-next_crash_page = hole_end;
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 }
 
 return page;
@@ -637,15 +637,15 @@ static struct page_info *kimage_alloc_page(struct 
kexec_image *image,
 if ( old )
 {
 /* If so move it. */
-unsigned long old_mfn = *old >> PAGE_SHIFT;
-unsigned long mfn = addr >> PAGE_SHIFT;
+mfn_t old_mfn = _mfn(*old >> PAGE_SHIFT);
+mfn_t mfn = _mfn(addr >> PAGE_SHIFT);
 
 copy_domain_page(mfn, old_mfn);
 clear_domain_page(old_mfn);
 *old = (addr & ~PAGE_MASK) | IND_SOURCE;
 unmap_domain_page(old);
 
-page = mfn_to_page(old_mfn);
+page = mfn_to_page(mfn_x(old_mfn));
 break;
 }
 else
@@ -917,7 +917,7 @@ int kimage_build_ind(struct kexec_image *image, unsigned 
long ind_mfn,
 goto done;
 }
 
-copy_domain_page(page_to_mfn(xen_page), mfn);
+copy_domain_page(_mfn(page_to_mfn(xen_page)), _mfn(mfn));
 put_page(guest_page);
 
 ret = kimage_add_page(image, page_to_maddr(xen_page));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..ae4c32e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1170,25 +1170,23 @@ long do_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-#ifdef CONFIG_DOMAIN_PAGE
-void clear_domain_page(unsigned long mfn)
+void clear_domain_page(mfn_t mfn)
 {
-void *ptr = map_domain_page(mfn);
+void *ptr = map_domain_page(mfn_x(mfn));
 
 clear_page(ptr);
 unmap_domain_page(ptr);
 }
 
-void copy_domain_page(unsigned long dmfn, unsigned long smfn)
+void copy_domain_page(mfn_t dest, mfn_t source)
 {
-const void *src = map_domain_page(smfn);
-void *dst = map_domain_page(dmfn);
+const void *src = map_domain_page(mfn_x(source));
+void *dst = map_domain_page(mfn_x(dest));
 
 copy_page(dst, src);
 

Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
 +static int hvmemul_linear_mmio_access(
 +unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
 +uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
 +{
 +struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
 +unsigned long offset = gla & ~PAGE_MASK;
 +unsigned int chunk;
 +paddr_t gpa;
 +unsigned long one_rep = 1;
 +int rc;
 +
 +chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
 +
 +if ( known_gpfn )
 +gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
 +else
 +{
 +rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
 +hvmemul_ctxt);
 +if ( rc != X86EMUL_OKAY )
 +return rc;
 +}
 +
 +for ( ;; )
 +{
 +rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
 +if ( rc != X86EMUL_OKAY )
 +break;
 +
 +gla += chunk;
 +buffer += chunk;
 +size -= chunk;
 +
 +if ( size == 0 )
 +break;
 +
 +ASSERT((gla & ~PAGE_MASK) == 0);

While I don't mean you to re-submit another time, I'd still like to
get my question answered: Does this really matter for the code
below?

Jan




Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test

2015-07-09 Thread Ian Campbell
On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote:
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

In general I'm very suspicious of a 200 line patch with _no_ commit
message at all.

What about the details of the stuff around overriding the versions iff
tree and revision are set etc and why that matters and how the two
projects therefore interact wrt selecting what to build? I remember
discussing this at length either IRL or in older versions of the thread
and we both know it wasn't trivial to arrive at how this should be done,
but I _already_ can't remember how we reached this point or why and that
is information which really needs to be recorded so it can be referred
to later when we wonder why it does this.

The scope is also missing, i.e. this is currently just a straight raisin
build test, and the results are not consumed by anything. This is
important because of all the stuff about splitting up the dist dir and
incremental builds which we discussed means that as it stands this test
step is not producing build artefacts suitable as an input for actual
tests.

I expect there's also stuff in the intra-version changelog which
actually belongs in the main changelog, such as why you are refactoring
certain things into BuildSupport.pm, which deserves some sort of brief
mention IMHO.

[...]
 Changes in v3:
 - use $raisindir throughout ts-raisin-build
 - do not specify ENABLED_COMPONENTS so that empty REVISION variables can
 be used to disable building a raisin component

I see we do again in the code below, which I suspect was deliberate and
based on some discussion, but it's not mentioned in the Changes in v4+
and since the changelog doesn't explain anything I can't even begin to
guess why or how we arrived at this point.

[...]
 +sub build () {
 +target_cmd_root($ho, END);
 +cd $raisindir
 +./raise -y install-builddep

If two of these happen to run in parallel (build machines can be shared)
then one or the other risks timing out on the underlying dpkg lock
waiting for the other, since the wait might be arbitrarily long
depending on what is going on.

It also risks other builds happening in an environment which differs
from one build to the next (if this happened to run in the middle) or
even things changing while a build is happening.

This is why all build dependencies are installed from ts-xen-build-prep,
that step is run once for each build host as it is installed. This does
unfortunately mean that I think we can't take advantage of raisin's
tracking of necessary build dependencies, but at least it will be
checking things for us.

Ian.




[Xen-devel] [PATCH v3] x86: correct socket_cpumask allocation

2015-07-09 Thread Chao Peng
For the booting CPU, the socket number is not necessarily 0, so
it needs to be computed from the CPU number.

For secondary CPUs, phys_proc_id is not valid in the CPU_PREPARE
notifier (cpu_smpboot_alloc), so cpu_to_socket(cpu) can't be used.
Instead, pre-allocate secondary_socket_cpumask in cpu_smpboot_alloc()
and later consume it in smp_store_cpu_info().

This patch also changes the socket_cpumask type from 'cpumask_var_t *'
to 'cpumask_t **' so that builds with smaller NR_CPUS work.

Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com
Tested-by: Dario Faggioli dario.faggi...@citrix.com
Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
Changes in v3:
* use type safe xzalloc(cpumask_t)
Changes in v2:
* Fix case that booting cpu is on the socket other than socket0.
* cpumask_var_t -> cpumask_t * to make smaller NR_CPUS builds work.
---
 xen/arch/x86/smpboot.c| 25 ++---
 xen/include/asm-x86/smp.h |  2 +-
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c73aa1b..0f03364 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -61,7 +61,8 @@ cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
 unsigned int __read_mostly nr_sockets;
-cpumask_var_t *__read_mostly socket_cpumask;
+cpumask_t **__read_mostly socket_cpumask;
+static cpumask_t *secondary_socket_cpumask;
 
 struct cpuinfo_x86 cpu_data[NR_CPUS];
 
@@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
 static void smp_store_cpu_info(int id)
 {
 struct cpuinfo_x86 *c = cpu_data + id;
+unsigned int socket;
 
 *c = boot_cpu_data;
 if ( id != 0 )
+{
 identify_cpu(c);
 
+socket = cpu_to_socket(id);
+if ( !socket_cpumask[socket] )
+{
+socket_cpumask[socket] = secondary_socket_cpumask;
+secondary_socket_cpumask = NULL;
+}
+}
+
 /*
  * Certain Athlons might work (for various values of 'work') in SMP
  * but they are not certified as MP capable.
@@ -658,7 +669,7 @@ static void cpu_smpboot_free(unsigned int cpu)
 
 if ( cpumask_empty(socket_cpumask[socket]) )
 {
-free_cpumask_var(socket_cpumask[socket]);
+xfree(socket_cpumask[socket]);
 socket_cpumask[socket] = NULL;
 }
 
@@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 nodeid_t node = cpu_to_node(cpu);
 struct desc_struct *gdt;
 unsigned long stub_page;
-unsigned int socket = cpu_to_socket(cpu);
 
 if ( node != NUMA_NO_NODE )
 memflags = MEMF_node(node);
@@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 goto oom;
 per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
 
-if ( !socket_cpumask[socket] &&
- !zalloc_cpumask_var(socket_cpumask + socket) )
+if ( secondary_socket_cpumask == NULL &&
+ (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
 goto oom;
 
 if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
@@ -804,8 +814,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
 set_nr_sockets();
 
-socket_cpumask = xzalloc_array(cpumask_var_t, nr_sockets);
-if ( !socket_cpumask || !zalloc_cpumask_var(socket_cpumask) )
+socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
+if ( socket_cpumask == NULL ||
+ (socket_cpumask[cpu_to_socket(0)] = xzalloc(cpumask_t)) == NULL )
 panic("No memory for socket CPU siblings map");
 
 if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, 0)) ||
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index e594062..ea07888 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -67,7 +67,7 @@ extern unsigned int nr_sockets;
 void set_nr_sockets(void);
 
 /* Representing HT and core siblings in each socket. */
-extern cpumask_var_t *socket_cpumask;
+extern cpumask_t **socket_cpumask;
 
 #endif /* !__ASSEMBLY__ */
 
-- 
1.9.1




Re: [Xen-devel] [PATCH v3] x86: correct socket_cpumask allocation

2015-07-09 Thread Boris Ostrovsky

On 07/09/2015 11:16 AM, Andrew Cooper wrote:

On 09/07/15 15:23, Chao Peng wrote:

For the booting CPU, the socket number is not necessarily 0, so
it needs to be computed from the CPU number.

For secondary CPUs, phys_proc_id is not valid in the CPU_PREPARE
notifier (cpu_smpboot_alloc), so cpu_to_socket(cpu) can't be used.
Instead, pre-allocate secondary_socket_cpumask in cpu_smpboot_alloc()
and later consume it in smp_store_cpu_info().

This patch also changes the socket_cpumask type from 'cpumask_var_t *'
to 'cpumask_t **' so that builds with smaller NR_CPUS work.

Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com
Tested-by: Dario Faggioli dario.faggi...@citrix.com
Signed-off-by: Chao Peng chao.p.p...@linux.intel.com

The v2 of this patch seems to be holding up fine against the XenServer
test pool.  No failures to boot encountered so far.

Tested-by: Andrew Cooper andrew.coop...@citrix.com

As the differences between v2 and v3 appear largely mechanical and not
functional in nature, this Tested-by can probably still stand.


Works for me as well. (You can add 'Tested-by: Boris Ostrovsky 
boris.ostrov...@oracle.com' although I think by now there are enough 
of those)


-boris



Re: [Xen-devel] [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html

2015-07-09 Thread Ian Campbell
On Thu, 2015-07-09 at 13:48 +0100, Ian Jackson wrote:
 Ian Campbell writes ([PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support 
 publishing files from $HOME/public_html):
  Any arguments given after the flight are assumed to be files or
  directories relative to $HOME/public_html which should be published to
  $c{Publish} at the same relative path.
  
  Removing the requirement that a flight be provided, by accepting '',
  allows this to be used even in contexts which have no flight.
  
  The results subtree is always published if $c{ResultsPublish} is set,
  since it doesn't depend on any particular flight.
 
 Uh.  Looking at this, I hadn't spotted that in your other patch you
 generate the output in ~/public_html.  Shouldn't it be in results/
 (which is nowadays a general place for publicly visible output) ?
 
 At the very least, if you do mean ~/public_html, you should be using
 WebspaceFile.

Results would be ok too, if you prefer (which you seem to so I shall
make that change).

 
 Also:
 
  -sub copydir ($$) {
  +sub copylogs ($$) {
 
 Why did you rename this function ?

I meant to mention this in the commit message and forgot, it was just
because the thing I was now copying wasn't a dir...

But if I generate the file into results this isn't needed, the existing
publication of Results will do the job for free.

Ian.




Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Paul Durrant
 Sent: 09 July 2015 14:43
 To: 'Jan Beulich'
 Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
 Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
 handle retry
 
  -Original Message-
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: 09 July 2015 14:38
  To: Paul Durrant
  Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org)
  Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
  handle retry
 
   On 09.07.15 at 14:50, paul.durr...@citrix.com wrote:
   From: Jan Beulich [mailto:jbeul...@suse.com]
   Sent: 09 July 2015 13:04
On 09.07.15 at 13:11, paul.durr...@citrix.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 11:05
 On 03.07.15 at 18:25, paul.durr...@citrix.com wrote:
 @@ -287,17 +271,56 @@ static int hvmemul_do_io_addr(
  bool_t is_mmio, paddr_t addr, unsigned long *reps,
  unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
  {
 -struct page_info *ram_page;
 +struct vcpu *v = current;
 +unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
 +unsigned int page_off = ram_gpa & (PAGE_SIZE - 1);
 +struct page_info *ram_page[2];
 +int nr_pages = 0;
   
unsigned int
   
No, it's intentionally signed because the unwind code at the end of
 the
function is:
   
 while ( --nr_pages >= 0 )
hvmemul_release_page(ram_page[nr_pages]);
   
I.e. the loop terminates when nr_pages gets to -1.
  
while ( nr_pages-- )
hvmemul_release_page(ram_page[nr_pages]);
  
  
   But that wouldn't be correct, since the loop would try to release
   ram_page[1] and ram_page[0] in the case where only ram_page[0] had
  been
   acquired. It would have to be:
  
 while ( nr_pages )
 hvmemul_release_page(ram_page[--nr_pages]);
 
  Unless you need the value of nr_pages after the loop, I can't see
  the difference between your and my proposal - the decrement in
  both cases happens between determining whether to continue the
  loop and accessing the intended array element.
 
 D'oh. Of course it does.
 
 
   which I'll do, since you so dislike signed ints.
 
  Sounds like you're not understanding why. Just look at the
  generated code when using signed vs unsigned int as array
  index.
 
 Ok, I'll take a look because I would not have thought it would make a huge
 difference.
 

So, two test programs:

test1.c:
---

#include <stdio.h>

int main(void)
{
  int i = 10;
  int a[10];

  while ( --i >= 0 ) {
    a[i] = i;
    printf("%d\n", i);
  }

  return 0;
}

test2.c:
---

#include <stdio.h>

int main(void)
{
  unsigned int i = 10;
  int a[10];

  while ( i ) {
    a[--i] = i;
    printf("%d\n", i);
  }

  return 0;
}

I compiled them just using 'make test1' and 'make test2', so using whatever the 
default options are for gcc (on a debian wheezy box). The relevant bits of the 
objdump are:

test1
-

0040050c <main>:
  40050c:   55  push   %rbp
  40050d:   48 89 e5mov%rsp,%rbp
  400510:   48 83 ec 30 sub$0x30,%rsp
  400514:   c7 45 fc 0a 00 00 00movl   $0xa,-0x4(%rbp)
  40051b:   eb 20                   jmp    40053d <main+0x31>
  40051d:   8b 45 fcmov-0x4(%rbp),%eax
  400520:   48 98   cltq   
  400522:   8b 55 fcmov-0x4(%rbp),%edx
  400525:   89 54 85 d0 mov%edx,-0x30(%rbp,%rax,4)
  400529:   8b 45 fcmov-0x4(%rbp),%eax
  40052c:   89 c6   mov%eax,%esi
  40052e:   bf fc 05 40 00  mov$0x4005fc,%edi
  400533:   b8 00 00 00 00  mov$0x0,%eax
  400538:   e8 a3 fe ff ff          callq  4003e0 <printf@plt>
  40053d:   83 6d fc 01 subl   $0x1,-0x4(%rbp)
  400541:   83 7d fc 00 cmpl   $0x0,-0x4(%rbp)
  400545:   79 d6                   jns    40051d <main+0x11>
  400547:   b8 00 00 00 00  mov$0x0,%eax
  40054c:   c9  leaveq 
  40054d:   c3  retq   
  40054e:   90  nop
  40054f:   90  nop

test2
-

0040050c <main>:
  40050c:   55  push   %rbp
  40050d:   48 89 e5mov%rsp,%rbp
  400510:   48 83 ec 30 sub$0x30,%rsp
  400514:   c7 45 fc 0a 00 00 00movl   $0xa,-0x4(%rbp)
  40051b:   eb 22                   jmp    40053f <main+0x33>
  40051d:   83 6d fc 01 subl   $0x1,-0x4(%rbp)
  400521:   8b 55 fcmov-0x4(%rbp),%edx
  400524:   8b 45 fcmov-0x4(%rbp),%eax
  400527:   89 54 85 d0 mov%edx,-0x30(%rbp,%rax,4)
  40052b:   8b 45 fcmov-0x4(%rbp),%eax
  40052e:

Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
 Building on the previous patch, this patch restricts portio port numbers
 to uint16_t in registration/relocate calls. In portio_action_t the port
 number is change to unsigned int though to avoid the compiler generating
 16-bit operations unnecessarily. The patch also changes I/O sizes to
 unsigned int which then allows the io_handler size field to reduce to
 an unsigned int.
 
 Signed-off-by: Paul Durrant paul.durr...@citrix.com
 Cc: Keir Fraser k...@xen.org
 Cc: Jan Beulich jbeul...@suse.com
 Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
 ---
 
 v7:
 - Change port type in portio_action_t to unsigned int as requested
   by Jan

Yet title and description were left in places, and ...

 @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p);
  int hvm_buffered_io_send(ioreq_t *p);
  
  static inline void register_portio_handler(
 -struct domain *d, unsigned long addr,
 -unsigned long size, portio_action_t action)
 +struct domain *d, uint16_t port, unsigned int size,
 +portio_action_t action)
  {
 -register_io_handler(d, addr, size, action, HVM_PORTIO);
 +register_io_handler(d, port, size, action, HVM_PORTIO);
  }
  
  static inline void relocate_portio_handler(
 -struct domain *d, unsigned long old_addr, unsigned long new_addr,
 -unsigned long size)
 +struct domain *d, uint16_t old_port, uint16_t new_port,
 +unsigned int size)
  {
 -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
 +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO);
  }

... these still use uint16_t. I'm pretty sure I gave my comment in a
way indicating that this should generally change, perhaps just at
the example of portio_action_t.

Jan




Re: [Xen-devel] [PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how up to date each branch is

2015-07-09 Thread Ian Jackson
Ian Campbell writes ([PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how 
up to date each branch is):
 Using report_find_push_age_info allows us to provide counts of
 attempts since the last baseline on current tip as well as the first
 attempt of each of those.
 
 Since everything serialises on the repo lock I didn't bother trying to
 parallelise anything.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com



Re: [Xen-devel] [PATCH v7 2/2] OSSTest: push successful raisin builds

2015-07-09 Thread Ian Campbell
On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote:
 Determine the most recent raisin revision that needs to be tested, by 
 comparing
 with the already tested xen-tested-master branch.  Push to
 raisin.git:xen-tested-master when the build is successful.

The mechanics here look right to me, but do you perhaps want to adopt
the staging-master push gate scheme used in xen.git etc? This would
make master (the default thing which users use) be something which has
undergone at least some amount of testing. We typically use
xen-tested-master for 3rd party things where the output of the test gate
is mainly just for osstest's bookkeeping and not really aimed at
actual users.

Oh, one thing you missed is adding raisin to the default $BRANCHES at
the top of cr-for-branches, without that there will be no raisin flights
run and therefore no push.

You might also want to consider removing all non-raisin build and test
jobs from the raisin flight, since they are a waste of time until the
raisin build results are actually used for something. You can filter
tests in make-flight:job_create_test_filter_callback, and for jobs you
will need my "mfi-common: Allow make-*flight to filter the set of build
jobs to include" patch, which adds job_create_build_filter_callback.

Ian.




[Xen-devel] MSR_PLATFORM_INFO register can't be emulated in vmx_msr_read_intercept (xen latest version)

2015-07-09 Thread Fanhenglong
Latest Xen version, CPU: Ivy Bridge.


SiSoftware Sandra fails to run on a Windows VM; the guest shows a BSOD
with stop code 0x3B.

After debugging I found the problem is in the function
vmx_msr_read_intercept.

The Windows VM first reads from register 0xce, and then a read of 0x620
causes a gp_fault via the following path:
vmx_msr_read_intercept
-> rdmsr_safe()


0xce is MSR_PLATFORM_INFO, as described in the Intel 64 and IA-32
Architectures Software Developer's Manual.
MSR_PLATFORM_INFO is not emulated in Xen.


So can anyone give me some advice to solve the problem and make
SiSoftware Sandra work well?


Thanks

Henglong Fan


Re: [Xen-devel] [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html

2015-07-09 Thread Ian Jackson
Ian Campbell writes (Re: [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: 
support publishing files from $HOME/public_html):
 On Thu, 2015-07-09 at 13:48 +0100, Ian Jackson wrote:
  At the very least, if you do mean ~/public_html, you should be using
  WebspaceFile.
 
 Results would be ok too, if you prefer (which you seem to so I shall
 make that change).

Thanks.

  Also:
  
   -sub copydir ($$) {
   +sub copylogs ($$) {
  
  Why did you rename this function ?
 
 I meant to mention this in the commit message and forgot, it was just
 because the thing I was now copying wasn't a dir...
 
 But if I generate the file into results this isn't needed, the existing
 publication of Results will do the job for free.

Indeed.

Ian.



Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain

2015-07-09 Thread Boris Ostrovsky

On 07/09/2015 03:02 AM, Jan Beulich wrote:

On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote:

As I started to update the patches I realized that in some cases
(especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't
have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the
domain. d->vcpu[0] should work. Otherwise I'll either need a new field
in struct domain or wrap has_32bit_shinfo into something PVH-specific,
like is_32bit_pvh_vcpu().

Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like
for PVH, especially if you also intend the tools to use the 64-bit
guest context variant even for 32-bit PVH? Once again - are you
intending to prohibit 32-bit PVH switching to 64-bit mode (which
would seem both wrong and possibly cumbersome to me)?


With the current PVH implementation I don't think we can switch. We are 
starting the guest in very much PV-like fashion. That's why we are 
getting into switch_compat() --- via XEN_DOMCTL_set_address_size.


For XEN_DOMCTL_get_address_size specifically we need to be able to 
figure out the mode *before* the guest is running because we use it to 
set cpuid bits in xc_cpuid_pv_policy(). That alone means we can't 
change the mode.


-boris



Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote:
 On 07/09/2015 03:02 AM, Jan Beulich wrote:
 On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote:
 As I started to update the patches I realized that in some cases
 (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't
 have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the
 domain. d->vcpu[0] should work. Otherwise I'll either need a new field
 in struct domain or wrap has_32bit_shinfo into something PVH-specific,
 like is_32bit_pvh_vcpu().
 Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like
 for PVH, especially if you also intend the tools to use the 64-bit
 guest context variant even for 32-bit PVH? Once again - are you
 intending to prohibit 32-bit PVH switching to 64-bit mode (which
 would seem both wrong and possibly cumbersome to me)?
 
 With current PVH implementation I don't think we can switch. We are 
 starting the guest in very much PV-like fashion. That's why we are 
 getting into switch_compat() --- via XEN_DOMCTL_set_address_size.
 
 For XEN_DOMCTL_get_address_size specifically we need to be able to 
 figure out the mode *before* the guest is running because we use it to 
 set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't 
 change the mode.

Okay - but is there code (being put) in place to refuse switch
attempts?

Jan




[Xen-devel] [PATCH v8 01/13] x86/hvm: change portio port numbers and sizes to unsigned int

2015-07-09 Thread Paul Durrant
Building on the previous patch, this patch changes portio port numbers
and sizes to unsigned int, which then allows the io_handler size field
to be reduced to an unsigned int.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- Get rid of remaining uint16_t port numbers as requested by Jan

v7:
- Change port type in portio_action_t to unsigned int as requested
  by Jan

v6:
- Added Andrew's reviewed-by

v5:
- New patch to tidy up more types
---
 xen/arch/x86/hvm/hvm.c   |6 +++---
 xen/arch/x86/hvm/i8254.c |8 
 xen/arch/x86/hvm/intercept.c |4 ++--
 xen/arch/x86/hvm/pmtimer.c   |4 ++--
 xen/arch/x86/hvm/rtc.c   |2 +-
 xen/arch/x86/hvm/stdvga.c|2 +-
 xen/arch/x86/hvm/vpic.c  |4 ++--
 xen/include/asm-x86/hvm/io.h |   20 ++--
 8 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 554e239..126882d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -559,7 +559,7 @@ static int hvm_add_ioreq_gmfn(
 }
 
 static int hvm_print_line(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct domain *cd = current->domain;
 char c = *val;
@@ -585,7 +585,7 @@ static int hvm_print_line(
 }
 
 static int hvm_access_cf8(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct domain *d = current->domain;
 
@@ -597,7 +597,7 @@ static int hvm_access_cf8(
 }
 
 static int handle_pvh_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct domain *currd = current->domain;
 
diff --git a/xen/arch/x86/hvm/i8254.c b/xen/arch/x86/hvm/i8254.c
index 36a0a53..8a93c88 100644
--- a/xen/arch/x86/hvm/i8254.c
+++ b/xen/arch/x86/hvm/i8254.c
@@ -50,9 +50,9 @@
 #define RW_STATE_WORD1 4
 
 static int handle_pit_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 static int handle_speaker_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 
 #define get_guest_time(v) \
(is_hvm_vcpu(v) ? hvm_get_guest_time(v) : (u64)get_s_time())
@@ -479,7 +479,7 @@ void pit_deinit(struct domain *d)
 
 /* the intercept action for PIT DM retval:0--not handled; 1--handled */  
 static int handle_pit_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct PITState *vpit = vcpu_vpit(current);
 
@@ -522,7 +522,7 @@ static uint32_t speaker_ioport_read(
 }
 
 static int handle_speaker_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct PITState *vpit = vcpu_vpit(current);
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cc44733..52879ff 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -369,7 +369,7 @@ int hvm_io_intercept(ioreq_t *p, int type)
 }
 
 void register_io_handler(
-struct domain *d, unsigned long addr, unsigned long size,
+struct domain *d, unsigned long addr, unsigned int size,
 void *action, int type)
 {
 struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
@@ -386,7 +386,7 @@ void register_io_handler(
 
 void relocate_io_handler(
 struct domain *d, unsigned long old_addr, unsigned long new_addr,
-unsigned long size, int type)
+unsigned int size, int type)
 {
 struct hvm_io_handler *handler = d->arch.hvm_domain.io_handler;
 int i;
diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 6ad2797..34fc062 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -142,7 +142,7 @@ static void pmt_timer_callback(void *opaque)
 
 /* Handle port I/O to the PM1a_STS and PM1a_EN registers */
 static int handle_evt_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct vcpu *v = current;
 PMTState *s = v->domain->arch.hvm_domain.pl_time.vpmt;
@@ -205,7 +205,7 @@ static int handle_evt_io(
 
 /* Handle port I/O to the TMR_VAL register */
 static int handle_pmt_io(
-int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
 struct vcpu *v = current;
 PMTState *s = v->domain->arch.hvm_domain.pl_time.vpmt;
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 3448971..a9efeaf 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -697,7 +697,7 @@ static uint32_t 

Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

- jbeul...@suse.com wrote:

  On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote:
  You are right, it needs to be rebased. I can post later a version
 rebased on the memory leak fix, if you think it's the way to go.
 
 I didn't look at v9 yet, and can't predict when I will be able to.
 
 Jan

Jan 

Would you like me to post v10 with the memory leak patch included in the
patchset before you start looking at v9?

Elena



Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain

2015-07-09 Thread Boris Ostrovsky

On 07/09/2015 10:30 AM, Boris Ostrovsky wrote:

On 07/09/2015 10:17 AM, Jan Beulich wrote:

On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote:

On 07/09/2015 03:02 AM, Jan Beulich wrote:

On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote:

As I started to update the patches I realized that in some cases
(especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't
have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the
domain. d->vcpu[0] should work. Otherwise I'll either need a new field
in struct domain or wrap has_32bit_shinfo into something PVH-specific,
like is_32bit_pvh_vcpu().

Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like
for PVH, especially if you also intend the tools to use the 64-bit
guest context variant even for 32-bit PVH? Once again - are you
intending to prohibit 32-bit PVH switching to 64-bit mode (which
would seem both wrong and possibly cumbersome to me)?

With current PVH implementation I don't think we can switch. We are
starting the guest in very much PV-like fashion. That's why we are
getting into switch_compat() --- via XEN_DOMCTL_set_address_size.

For XEN_DOMCTL_get_address_size specifically we need to be able to
figure out the mode *before* the guest is running because we use it to
set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't
change the mode.

Okay - but is there code (being put) in place to refuse switch
attempts?


No, I should add code to deal with this.


Forgot to include here --- so what is your preference wrt what I am 
asking in the first paragraph? d->vcpu[0], new field (or maybe a flag 
with bits per 32bit-pv and 32bit-pvh), or a PVH-wrapper for 
has_32bit_shinfo?


-boris

-boris




Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m

2015-07-09 Thread Sahita, Ravi
From: Wei Liu [mailto:wei.l...@citrix.com]
Sent: Thursday, July 09, 2015 4:49 AM

Question for you and Ed, how much testing have you done to this series? I
assume you've done testing with it turned on and off, to the point you get a
functioning guest. Have you run any test suites and got positive results?

Yes, we have tested with Windows HVM guests and got positive results, and 
tests based on Tamas' tool patches have passed as well.


[0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com

This ref link was messed up

Thanks,
Ravi



Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 18:10, paul.durr...@citrix.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:24
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  Building on the previous patch, this patch restricts portio port numbers
  to uint16_t in registration/relocate calls. In portio_action_t the port
  number is changed to unsigned int though to avoid the compiler generating
  16-bit operations unnecessarily. The patch also changes I/O sizes to
  unsigned int which then allows the io_handler size field to reduce to
  an unsigned int.
 
  Signed-off-by: Paul Durrant paul.durr...@citrix.com
  Cc: Keir Fraser k...@xen.org
  Cc: Jan Beulich jbeul...@suse.com
  Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
  ---
 
  v7:
  - Change port type in portio_action_t to unsigned int as requested
by Jan
 
 Yet title and description were left in places, and ...
 
 The title remains. The description was modified:
 
  In portio_action_t the port number is changed to unsigned int though to 
 avoid the compiler generating 16-bit operations unnecessarily.

Maybe I'm just being confused by the two "and"-s in the title,
which I assumed can't both be meant to be "and", or else you'd
have written "port numbers, uint16_t, and sizes". Plus it seems
unclear what "restrict a uint16_t to unsigned int" actually means.

  @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p);
   int hvm_buffered_io_send(ioreq_t *p);
 
   static inline void register_portio_handler(
  -struct domain *d, unsigned long addr,
  -unsigned long size, portio_action_t action)
  +struct domain *d, uint16_t port, unsigned int size,
  +portio_action_t action)
   {
  -register_io_handler(d, addr, size, action, HVM_PORTIO);
  +register_io_handler(d, port, size, action, HVM_PORTIO);
   }
 
   static inline void relocate_portio_handler(
  -struct domain *d, unsigned long old_addr, unsigned long new_addr,
  -unsigned long size)
  +struct domain *d, uint16_t old_port, uint16_t new_port,
  +unsigned int size)
   {
  -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
  +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO);
   }
 
 ... these still use uint16_t. I'm pretty sure I gave my comment in a
 way indicating that this should generally change, perhaps just at
 the example of portio_action_t.
 
 Why? Do we not want to restrict to uint16_t at the interface level?

Quite the inverse - why would we want to? This just makes the
calling sequence less efficient (due to the needed operand size
overrides), and hides mistakes in callers properly truncating
when reading guest registers.

Jan




Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m

2015-07-09 Thread Ian Campbell
On Thu, 2015-07-09 at 16:13 +, Sahita, Ravi wrote:
 [0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com
 
 This ref link was messed up

It was a message id, you can paste it into the URL of your favourite ML
archive which supports such things. e.g.
http://mid.gmane.org/e1z8rcu-0003v6...@ukmail1.uk.xensource.com

Ian.




Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-07-09 Thread Dagaen Golomb
 I think you might assume that the first M VCPUs  in the runq are the
 current running VCPUs on the M pCPUs. Am I correct? (From what you
 described in the following example, I think I'm correct. ;-) )

 Mmm... Interesting. Yes, I was. I was basing this assumption on this
 chunk on Dagaen's patch:

 // If we become one of top [# CPUs] in the runq, tickle it
 // TODO: make this work when multiple tickles are required
 if ( new_position > 0 && new_position <= prv->NUM_CPUS )
 runq_tickle(ops, svc);

 And forgot (and did not go check) about the __q_remove() in
 rt_schedule(). My bad again.

 But then, since we don't have the running vCPUs in the runq, how the
 code above is supposed to be correct?

This was my bad. I need to change so running vcpus are in runq, and
this would then be correct.

 I tell that you make the above assumption from here.

 However, in the current implementation, runq does not hold the current
 running VCPUs on the pCPUs. We remove the vcpu from runq in
 rt_schedule() function. What you described above make perfect sense
 if we decide to make runq hold the current running VCPUs.

 Yep. And it indeed seems to me that we may well think about doing so. It
 will make it possible to base on the position for making/optimizing
 scheduling decisions, and at the same time I don't think I see much
 downsides in that, do you?

 Actually, after thinking about the example you described, I think we
 can hold the current running VCPUs *and* the current idle pCPUs in the
 scheduler-wide structure;

 What do you mean with 'current idle pCPUs'? I said something similar as
 well, and what I meant was a cpumask with bit i set if the i-th pCPU is
 idle, do you also mean this?

 About the running vCPUs, why just not leave them in the actual runq?

This seemed like the straightforward way to me, too.

 In other words, we can have another runningq
 (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are
 stored in three queues: runningq, runq, and depletedq, in increasing
 priority order.

 Perhaps, but I'm not sure I see the need for another list. Again, why
 just not leave them in runq? I appreciate this is a rather big  change
 (although, perhaps it looks bigger said than done), but I think it could
 be worth pursuing.

 For double checking, asserting, and making sure that we are able to
 identify the running svc-s, we have the __RTDS_scheduled flag.

I also don't feel we need another list.

Regards,
~Dagaen Golomb



[Xen-devel] [PATCH v8 07/13] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io()

2015-07-09 Thread Paul Durrant
By removing the calls in hvmemul_do_io() (which is replaced by a single
assignment) and hvm_complete_assist_request() (which is replaced by a
call to hvm_process_portio_intercept() with a suitable set of ops) then
hvm_io_assist() can be moved into hvm.c and made static (and hence be a
candidate for inlining).

The calls to msix_write_completion() and vcpu_end_shutdown_deferral()
are also made unconditionally because the ioreq state will always be
STATE_IOREQ_NONE at the end of hvm_io_assist() so the test was
pointless. These calls are also only relevant when the emulation has
been handled externally which is now always the case.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Acked-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- No change

v6:
- Added Andrew's reviewed-by

v5:
- Added Jan's acked-by
---
 xen/arch/x86/hvm/emulate.c|   34 ++---
 xen/arch/x86/hvm/hvm.c|   67 ++---
 xen/arch/x86/hvm/io.c |   39 
 xen/include/asm-x86/hvm/hvm.h |1 -
 xen/include/asm-x86/hvm/io.h  |1 -
 5 files changed, 66 insertions(+), 76 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 02bde26..c26be54 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -50,6 +50,32 @@ static void hvmtrace_io_assist(const ioreq_t *p)
 trace_var(event, 0/*!cycles*/, size, buffer);
 }
 
+static int null_read(const struct hvm_io_handler *io_handler,
+                     uint64_t addr,
+                     uint32_t size,
+                     uint64_t *data)
+{
+    *data = ~0ul;
+    return X86EMUL_OKAY;
+}
+
+static int null_write(const struct hvm_io_handler *handler,
+                      uint64_t addr,
+                      uint32_t size,
+                      uint64_t data)
+{
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops null_ops = {
+    .read = null_read,
+    .write = null_write
+};
+
+static const struct hvm_io_handler null_handler = {
+    .ops = &null_ops
+};
+
 static int hvmemul_do_io(
     bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -139,8 +165,7 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        p.state = STATE_IORESP_READY;
-        hvm_io_assist(&p);
+        vio->io_data = p.data;
         vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
@@ -151,8 +176,9 @@ static int hvmemul_do_io(
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            hvm_complete_assist_req(&p);
-            rc = X86EMUL_OKAY;
+            rc = hvm_process_io_intercept(&null_handler, &p);
+            if ( rc == X86EMUL_OKAY )
+                vio->io_data = p.data;
             vio->io_state = HVMIO_none;
         }
         else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 9bdc1d6..7358acf 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -411,6 +411,42 @@ bool_t hvm_io_pending(struct vcpu *v)
 return 0;
 }
 
+static void hvm_io_assist(ioreq_t *p)
+{
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+    enum hvm_io_state io_state;
+
+    p->state = STATE_IOREQ_NONE;
+
+    io_state = vio->io_state;
+    vio->io_state = HVMIO_none;
+
+    switch ( io_state )
+    {
+    case HVMIO_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        break;
+    case HVMIO_handle_mmio_awaiting_completion:
+        vio->io_state = HVMIO_completed;
+        vio->io_data = p->data;
+        (void)handle_mmio();
+        break;
+    case HVMIO_handle_pio_awaiting_completion:
+        if ( vio->io_size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)p->data;
+        else
+            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
+        break;
+    default:
+        break;
+    }
+
+    msix_write_completion(curr);
+    vcpu_end_shutdown_deferral(curr);
+}
+
 static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
 {
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
@@ -2663,37 +2699,6 @@ int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
     return X86EMUL_UNHANDLEABLE;
 }
 
-void hvm_complete_assist_req(ioreq_t *p)
-{
-    switch ( p->type )
-    {
-    case IOREQ_TYPE_PCI_CONFIG:
-        ASSERT_UNREACHABLE();
-        break;
-    case IOREQ_TYPE_COPY:
-    case IOREQ_TYPE_PIO:
-        if ( p->dir == IOREQ_READ )
-        {
-            if ( !p->data_is_ptr )
-                p->data = ~0ul;
-            else
-            {
-                int i, step = p->df ? -p->size : p->size;
-                uint32_t data = ~0;
-
-                for ( i = 0; i < 
[Xen-devel] [PATCH v8 08/13] x86/hvm: split I/O completion handling from state model

2015-07-09 Thread Paul Durrant
The state of in-flight I/O and how its completion will be handled are
logically separate and conflating the two makes the code unnecessarily
confusing.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- Modified struct field placement as requested by Jan

v6:
- Added Andrew's reviewed-by

v5:
- Confirmed call to msix_write_completion() is in the correct place.
---
 xen/arch/x86/hvm/hvm.c|   44 +++--
 xen/arch/x86/hvm/io.c |6 ++---
 xen/arch/x86/hvm/vmx/realmode.c   |   27 +--
 xen/include/asm-x86/hvm/vcpu.h|   16 +-
 xen/include/asm-x86/hvm/vmx/vmx.h |1 +
 5 files changed, 67 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7358acf..093a710 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -59,6 +59,7 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/event.h>
+#include <asm/hvm/vmx/vmx.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -428,17 +429,6 @@ static void hvm_io_assist(ioreq_t *p)
         vio->io_state = HVMIO_completed;
         vio->io_data = p->data;
         break;
-    case HVMIO_handle_mmio_awaiting_completion:
-        vio->io_state = HVMIO_completed;
-        vio->io_data = p->data;
-        (void)handle_mmio();
-        break;
-    case HVMIO_handle_pio_awaiting_completion:
-        if ( vio->io_size == 4 ) /* Needs zero extension. */
-            guest_cpu_user_regs()->rax = (uint32_t)p->data;
-        else
-            memcpy(&guest_cpu_user_regs()->rax, &p->data, vio->io_size);
-        break;
     default:
         break;
     }
@@ -479,6 +469,7 @@ void hvm_do_resume(struct vcpu *v)
     struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s;
+    enum hvm_io_completion io_completion;
 
     check_wakeup_from_wait();
 
@@ -505,8 +496,37 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
-    if ( vio->mmio_retry )
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+    case HVMIO_mmio_completion:
         handle_mmio();
+        break;
+    case HVMIO_pio_completion:
+        if ( vio->io_size == 4 ) /* Needs zero extension. */
+            guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
+        else
+            memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
+        vio->io_state = HVMIO_none;
+        break;
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
+
+        hvm_emulate_prepare(&ctxt, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
+
+        break;
+    }
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
 
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index fe099d8..221d05e 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -92,8 +92,8 @@ int handle_mmio(void)
 
     if ( rc != X86EMUL_RETRY )
         vio->io_state = HVMIO_none;
-    if ( vio->io_state == HVMIO_awaiting_completion )
-        vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+        vio->io_completion = HVMIO_mmio_completion;
     else
         vio->mmio_access = (struct npfec){};
 
@@ -158,7 +158,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
             return 0;
         /* Completion in hvm_io_assist() with no re-emulation required. */
         ASSERT(dir == IOREQ_READ);
-        vio->io_state = HVMIO_handle_pio_awaiting_completion;
+        vio->io_completion = HVMIO_pio_completion;
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index fe8b4a0..76ff9a5 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -101,15 +101,19 @@ static void realmode_deliver_exception(
 }
 }
 
-static void realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
+void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     int rc;
 
     perfc_incr(realmode_emulations);
 
     rc = hvm_emulate_one(hvmemul_ctxt);
 
+    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+        vio->io_completion = HVMIO_realmode_completion;
+
     if ( rc == X86EMUL_UNHANDLEABLE )
     {
         gdprintk(XENLOG_ERR, "Failed to emulate insn.\n");
@@ -177,9 +181,6 @@ void vmx_realmode(struct 

[Xen-devel] [PATCH v8 09/13] x86/hvm: remove HVMIO_dispatched I/O state

2015-07-09 Thread Paul Durrant
By removing the HVMIO_dispatched state and making all pending emulations
(i.e. all those not handled by the hypervisor) use HVMIO_awaiting_completion,
various code-paths can be simplified.

The completion case for HVMIO_dispatched can also be trivially removed
from hvmemul_do_io() as it was already unreachable. This is because that
state was only ever used for writes or I/O to/from a guest page and
hvmemul_do_io() is never called to complete such I/O.

NOTE: There is one subtlety in handle_pio()... The only case when
  handle_pio() got a return code of X86EMUL_RETRY back from
  hvmemul_do_pio_buffer() and found io_state was not
  HVMIO_awaiting_completion was in the case where the domain is
  shutting down. This is because all writes normally yield a return
  of HVMEMUL_OKAY and all reads put io_state into
  HVMIO_awaiting_completion. Hence the io_state check there is
  replaced with a check of the is_shutting_down flag on the domain.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- Modified struct field placement as knock-on from previous patch

v6:
- Added Andrew's reviewed-by

v5:
- Added some extra comments to the commit
---
 xen/arch/x86/hvm/emulate.c  |   12 +++-
 xen/arch/x86/hvm/hvm.c  |   12 +++-
 xen/arch/x86/hvm/io.c   |   14 +++---
 xen/arch/x86/hvm/vmx/realmode.c |2 +-
 xen/include/asm-x86/hvm/vcpu.h  |   10 +-
 5 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c26be54..933a605 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -137,20 +137,14 @@ static int hvmemul_do_io(
         if ( data_is_addr || dir == IOREQ_WRITE )
             return X86EMUL_UNHANDLEABLE;
         goto finish_access;
-    case HVMIO_dispatched:
-        /* May have to wait for previous cycle of a multi-write to complete. */
-        if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
-             (addr == (vio->mmio_large_write_pa +
-                       vio->mmio_large_write_bytes)) )
-            return X86EMUL_RETRY;
-        /* fallthrough */
     default:
         return X86EMUL_UNHANDLEABLE;
     }
 
-    vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
-        HVMIO_dispatched : HVMIO_awaiting_completion;
+    vio->io_state = HVMIO_awaiting_completion;
     vio->io_size = size;
+    vio->io_dir = dir;
+    vio->io_data_is_addr = data_is_addr;
 
     if ( dir == IOREQ_WRITE )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 093a710..8e487d4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -416,22 +416,16 @@ static void hvm_io_assist(ioreq_t *p)
 {
 struct vcpu *curr = current;
 struct hvm_vcpu_io *vio = curr-arch.hvm_vcpu.hvm_io;
-enum hvm_io_state io_state;
 
 p-state = STATE_IOREQ_NONE;
 
-io_state = vio-io_state;
-vio-io_state = HVMIO_none;
-
-switch ( io_state )
+if ( hvm_vcpu_io_need_completion(vio) )
 {
-case HVMIO_awaiting_completion:
 vio-io_state = HVMIO_completed;
 vio-io_data = p-data;
-break;
-default:
-break;
 }
+else
+vio-io_state = HVMIO_none;
 
 msix_write_completion(curr);
 vcpu_end_shutdown_deferral(curr);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 221d05e..3b51d59 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -90,9 +90,7 @@ int handle_mmio(void)
 
     rc = hvm_emulate_one(&ctxt);
 
-    if ( rc != X86EMUL_RETRY )
-        vio->io_state = HVMIO_none;
-    if ( vio->io_state == HVMIO_awaiting_completion || vio->mmio_retry )
+    if ( hvm_vcpu_io_need_completion(vio) || vio->mmio_retry )
         vio->io_completion = HVMIO_mmio_completion;
     else
         vio->mmio_access = (struct npfec){};
@@ -142,6 +140,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
 
     rc = hvmemul_do_pio_buffer(port, size, dir, &data);
 
+    if ( hvm_vcpu_io_need_completion(vio) )
+        vio->io_completion = HVMIO_pio_completion;
+
     switch ( rc )
     {
     case X86EMUL_OKAY:
@@ -154,11 +155,10 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
         }
         break;
     case X86EMUL_RETRY:
-        if ( vio->io_state != HVMIO_awaiting_completion )
+        /* We should not advance RIP/EIP if the domain is shutting down */
+        if ( curr->domain->is_shutting_down )
             return 0;
-        /* Completion in hvm_io_assist() with no re-emulation required. */
-        ASSERT(dir == IOREQ_READ);
-        vio->io_completion = HVMIO_pio_completion;
+
         break;
     default:
         gdprintk(XENLOG_ERR, "Weird HVM ioemulation status %d.\n", rc);
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 76ff9a5..deb53ae 100644
--- 

[Xen-devel] [PATCH v8 05/13] x86/hvm: unify stdvga mmio intercept with standard mmio intercept

2015-07-09 Thread Paul Durrant
It's clear from the following check in hvmemul_rep_movs:

 if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct ||
      (sp2mt == p2m_mmio_dm && dp2mt == p2m_mmio_dm) )
     return X86EMUL_UNHANDLEABLE;

that mmio <-> mmio copy is not handled. This means the code in the
stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
hvm_copy_to/from_guest_phys() fails is never going to be executed.

This patch therefore adds a check in hvmemul_do_io_addr() to make sure
mmio <-> mmio is disallowed and then registers standard mmio intercept ops
in stdvga_init().

With this patch all mmio and portio handled within Xen now goes through
process_io_intercept().

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- Addressed further comments from Jan

v7:
- Use hvm_mmio_{first,last}_byte in stdvga_mem_accept for correctness
- Add comments requested by Jan

v6:
- Added Andrew's reviewed-by

v5:
- Fixed semantic problems pointed out by Jan
---
 xen/arch/x86/hvm/emulate.c   |9 ++
 xen/arch/x86/hvm/intercept.c |   30 --
 xen/arch/x86/hvm/stdvga.c|  212 +++---
 xen/include/asm-x86/hvm/io.h |   10 +-
 4 files changed, 113 insertions(+), 148 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7eeaaea..195de00 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -266,6 +266,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page)
         return X86EMUL_RETRY;
     }
 
+    /* This code should not be reached if the gmfn is not RAM */
+    if ( p2m_is_mmio(p2mt) )
+    {
+        domain_crash(curr_d);
+
+        put_page(*page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
     return X86EMUL_OKAY;
 }
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 71c4a0f..e36189e 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -263,20 +263,21 @@ const struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p)
 int hvm_io_intercept(ioreq_t *p)
 {
     const struct hvm_io_handler *handler;
-
-    if ( p->type == IOREQ_TYPE_COPY )
-    {
-        int rc = stdvga_intercept_mmio(p);
-        if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-            return rc;
-    }
+    const struct hvm_io_ops *ops;
+    int rc;
 
     handler = hvm_find_io_handler(p);
 
     if ( handler == NULL )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvm_process_io_intercept(handler, p);
+    rc = hvm_process_io_intercept(handler, p);
+
+    ops = handler->ops;
+    if ( ops->complete != NULL )
+        ops->complete(handler);
+
+    return rc;
 }
 
 struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
@@ -338,6 +339,8 @@ void relocate_portio_handler(struct domain *d, unsigned int old_port,
 
 bool_t hvm_mmio_internal(paddr_t gpa)
 {
+    const struct hvm_io_handler *handler;
+    const struct hvm_io_ops *ops;
     ioreq_t p = {
         .type = IOREQ_TYPE_COPY,
         .addr = gpa,
@@ -345,7 +348,16 @@ bool_t hvm_mmio_internal(paddr_t gpa)
         .size = 1,
     };
 
-    return hvm_find_io_handler(&p) != NULL;
+    handler = hvm_find_io_handler(&p);
+
+    if ( handler == NULL )
+        return 0;
+
+    ops = handler->ops;
+    if ( ops->complete != NULL )
+        ops->complete(handler);
+
+    return 1;
 }
 
 /*
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e6dfdb7..4a7593d 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -148,7 +148,7 @@ static int stdvga_outb(uint64_t addr, uint8_t val)
     }
     else if ( prev_stdvga && !s->stdvga )
     {
-        gdprintk(XENLOG_INFO, "leaving stdvga\n");
+        gdprintk(XENLOG_INFO, "leaving stdvga mode\n");
     }
 
     return rc;
@@ -275,9 +275,10 @@ static uint8_t stdvga_mem_readb(uint64_t addr)
     return ret;
 }
 
-static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
+static int stdvga_mem_read(const struct hvm_io_handler *handler,
+                           uint64_t addr, uint32_t size, uint64_t *p_data)
 {
-    uint64_t data = 0;
+    uint64_t data = ~0ul;
 
     switch ( size )
     {
@@ -309,11 +310,12 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size)
         break;
 
     default:
-        gdprintk(XENLOG_WARNING, "invalid io size: %"PRId64"\n", size);
+        gdprintk(XENLOG_WARNING, "invalid io size: %u\n", size);
         break;
     }
 
-    return data;
+    *p_data = data;
+    return X86EMUL_OKAY;
 }
 
 static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
@@ -424,8 +426,23 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }
 
-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size,
+

[Xen-devel] [PATCH v8 06/13] x86/hvm: limit reps to avoid the need to handle retry

2015-07-09 Thread Paul Durrant
By limiting hvmemul_do_io_addr() to reps falling within the page on which
a reference has already been taken, we can guarantee that calls to
hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out
or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic (added
by c/s 82ed8716b "fix direct PCI port I/O emulation retry and error
handling") from the intercept code and simplify it significantly.

Normally hvmemul_do_io_addr() will only reference a single page at a time.
It will, however, take an extra page reference for I/O spanning a page
boundary.

It is still important to know, upon returning from x86_emulate(), whether
the number of reps was reduced so the mmio_retry flag is retained for that
purpose.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- Fixed flaw in calculation pointed out by Jan
- Make nr_pages unsigned as requested by Jan
- Added Andrew's reviewed-by

v6:
- Added comment requested by Andrew

v5:
- Addressed further comments from Jan
---
 xen/arch/x86/hvm/emulate.c |   76 ++--
 xen/arch/x86/hvm/hvm.c |4 +++
 xen/arch/x86/hvm/intercept.c   |   57 +++---
 xen/include/asm-x86/hvm/vcpu.h |2 +-
 4 files changed, 66 insertions(+), 73 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 195de00..02bde26 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,7 +51,7 @@ static void hvmtrace_io_assist(const ioreq_t *p)
 }
 
 static int hvmemul_do_io(
-    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+    bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
@@ -60,6 +60,7 @@ static int hvmemul_do_io(
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
+        .count = reps,
         .dir = dir,
         .df = df,
         .data = data,
@@ -125,15 +126,6 @@ static int hvmemul_do_io(
                     HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
-    /*
-     * When retrying a repeated string instruction, force exit to guest after
-     * completion of the retried iteration to allow handling of interrupts.
-     */
-    if ( vio->mmio_retrying )
-        *reps = 1;
-
-    p.count = *reps;
-
     if ( dir == IOREQ_WRITE )
     {
         if ( !data_is_addr )
@@ -147,17 +139,9 @@ static int hvmemul_do_io(
     switch ( rc )
     {
     case X86EMUL_OKAY:
-    case X86EMUL_RETRY:
-        *reps = p.count;
         p.state = STATE_IORESP_READY;
-        if ( !vio->mmio_retry )
-        {
-            hvm_io_assist(&p);
-            vio->io_state = HVMIO_none;
-        }
-        else
-            /* Defer hvm_io_assist() invocation to hvm_do_resume(). */
-            vio->io_state = HVMIO_handle_mmio_awaiting_completion;
+        hvm_io_assist(&p);
+        vio->io_state = HVMIO_none;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -235,7 +219,7 @@ static int hvmemul_do_io_buffer(
 
     BUG_ON(buffer == NULL);
 
-    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
+    rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0,
                        (uintptr_t)buffer);
     if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
         memset(buffer, 0xff, size);
@@ -287,17 +271,56 @@ static int hvmemul_do_io_addr(
     bool_t is_mmio, paddr_t addr, unsigned long *reps,
     unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
 {
-    struct page_info *ram_page;
+    struct vcpu *v = current;
+    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
+    unsigned int page_off = ram_gpa & (PAGE_SIZE - 1);
+    struct page_info *ram_page[2];
+    unsigned int nr_pages = 0;
+    unsigned long count;
     int rc;
 
-    rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), &ram_page);
+    rc = hvmemul_acquire_page(ram_gmfn, &ram_page[nr_pages]);
     if ( rc != X86EMUL_OKAY )
-        return rc;
+        goto out;
 
-    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1,
+    nr_pages++;
+
+    /* Determine how many reps will fit within this page */
+    count = min_t(unsigned long,
+                  *reps,
+                  df ?
+                  ((page_off + size - 1) & ~PAGE_MASK) / size :
+                  (PAGE_SIZE - page_off) / size);
+
+    if ( count == 0 )
+    {
+        /*
+         * This access must span two pages, so grab a reference to
+         * the next page and do a single rep.
+         * It is safe to assume multiple pages are physically
+         * contiguous at this point as hvmemul_linear_to_phys() will
+         * ensure this is the case.
+         */
+        rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1,
+  

[Xen-devel] [PATCH v8 02/13] x86/hvm: unify internal portio and mmio intercepts

2015-07-09 Thread Paul Durrant
The implementation of mmio and portio intercepts is unnecessarily different.
This leads to much code duplication. This patch unifies much of the
intercept handling, leaving only distinct handlers for stdvga mmio and dpci
portio. Subsequent patches will unify those handlers.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- Re-base

v7:
- Change start/end fields in portio handler to port/size so that
  relocation only has to modify a single field
- Addressed further comments from Jan

v6:
- Added Andrew's reviewed-by and made the modification requested
  by Roger

v5:
- Addressed further comments from Jan and simplified implementation
  by passing ioreq_t to accept() function
---
 xen/arch/x86/hvm/emulate.c|   11 +-
 xen/arch/x86/hvm/hpet.c   |4 +-
 xen/arch/x86/hvm/hvm.c|9 +-
 xen/arch/x86/hvm/intercept.c  |  475 +
 xen/arch/x86/hvm/stdvga.c |   32 +-
 xen/arch/x86/hvm/vioapic.c|4 +-
 xen/arch/x86/hvm/vlapic.c |5 +-
 xen/arch/x86/hvm/vmsi.c   |   10 +-
 xen/drivers/passthrough/amd/iommu_guest.c |   30 +-
 xen/include/asm-x86/hvm/domain.h  |1 +
 xen/include/asm-x86/hvm/io.h  |  118 ---
 11 files changed, 331 insertions(+), 368 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7d2e3c1..7eeaaea 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -142,16 +142,7 @@ static int hvmemul_do_io(
         hvmtrace_io_assist(&p);
     }
 
-    if ( is_mmio )
-    {
-        rc = hvm_mmio_intercept(&p);
-        if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(&p);
-    }
-    else
-    {
-        rc = hvm_portio_intercept(&p);
-    }
+    rc = hvm_io_intercept(&p);
 
 switch ( rc )
 {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 30ac5dd..732504a 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,7 +504,7 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
              (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_ops hpet_mmio_ops = {
+static const struct hvm_mmio_ops hpet_mmio_ops = {
     .check = hpet_range,
     .read  = hpet_read,
     .write = hpet_write
@@ -659,6 +659,8 @@ void hpet_init(struct domain *d)
         h->hpet.comparator64[i] = ~0ULL;
         h->pt[i].source = PTSRC_isa;
     }
+
+    register_mmio_handler(d, &hpet_mmio_ops);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 126882d..1fd5efc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1455,9 +1455,6 @@ int hvm_domain_initialise(struct domain *d)
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
-    INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
-    spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
-
     hvm_init_cacheattr_region_list(d);
 
     rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
@@ -1465,11 +1462,11 @@ int hvm_domain_initialise(struct domain *d)
         goto fail0;
 
     d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
-    d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
+    d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
+                                                  NR_IO_HANDLERS);
     rc = -ENOMEM;
     if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
         goto fail1;
-    d->arch.hvm_domain.io_handler->num_slot = 0;
 
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
@@ -1506,6 +1503,8 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
+    msixtbl_init(d);
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 52879ff..f97ee52 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,205 +32,86 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 
-static const struct hvm_mmio_ops *const
-hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
+static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler,
+                              const ioreq_t *p)
 {
-    &hpet_mmio_ops,
-    &vlapic_mmio_ops,
-    &vioapic_mmio_ops,
-    &msixtbl_mmio_ops,
-    &iommu_mmio_ops
-};
+    BUG_ON(handler->type != IOREQ_TYPE_COPY);
+
+    return handler->mmio.ops->check(current, p->addr);
+}
 
-static int hvm_mmio_access(struct vcpu *v,
-                           ioreq_t *p,
-                           hvm_mmio_read_t read,
-                           hvm_mmio_write_t write)
+static int hvm_mmio_read(const struct hvm_io_handler 

[Xen-devel] [PATCH v8 00/13] x86/hvm: I/O emulation cleanup and fix

2015-07-09 Thread Paul Durrant
This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.

The series is broken down into 13 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation36 branch).

Previous changelog
--

v4:
 - Removed previous patch (make sure translated MMIO reads or
   writes fall within a page) and rebased rest of series.
 - Address Jan's comments on patch #1

v3:
 - Addressed comments from Jan
 - Re-ordered series to bring a couple of more trivial patches to the
   front
 - Backport to XenServer (4.5) now passing automated tests
 - Tested on unstable with QEMU upstream and trad, with and without
   HAP (to force shadow emulation)

v2:
 - Removed bogus assertion from patch #15
 - Re-worked patch #17 after basic testing of back-port onto XenServer

Subsequent changes are logged in the individual patch files (thanks
to David Vrabel for that).

Testing
---

v6 of the series has been back-ported to staging-4.5 and then dropped
onto the XenServer (Dundee) patch queue. All automated branch-safety tests
pass.

v8 has just been compile tested since changes since v6 are largely
cosmetic. It will be back-ported in the near future.



[Xen-devel] [PATCH v8 03/13] x86/hvm: add length to mmio check op

2015-07-09 Thread Paul Durrant
When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- No change

v6:
- Added Andrew's reviewed-by

v5:
- Simplified by leaving mmio_check() implementation alone and
  calling to check last byte if first-byte check passes
---
 xen/arch/x86/hvm/intercept.c |   23 ---
 xen/include/asm-x86/hvm/io.h |   16 
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index f97ee52..f4dbf17 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -35,9 +35,20 @@
 static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler,
   const ioreq_t *p)
 {
+paddr_t first = hvm_mmio_first_byte(p);
+paddr_t last = hvm_mmio_last_byte(p);
+
 BUG_ON(handler-type != IOREQ_TYPE_COPY);
 
-return handler-mmio.ops-check(current, p-addr);
+if ( !handler-mmio.ops-check(current, first) )
+return 0;
+
+/* Make sure the handler will accept the whole access */
+if ( p-size  1 
+ !handler-mmio.ops-check(current, last) )
+domain_crash(current-domain);
+
+return 1;
 }
 
 static int hvm_mmio_read(const struct hvm_io_handler *handler,
@@ -106,7 +117,8 @@ static const struct hvm_io_ops portio_ops = {
 int hvm_process_io_intercept(const struct hvm_io_handler *handler,
  ioreq_t *p)
 {
-    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    struct vcpu *curr = current;
+    struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     const struct hvm_io_ops *ops = (p->type == IOREQ_TYPE_COPY) ?
                                    &mmio_ops : &portio_ops;
     int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
@@ -215,6 +227,9 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler,
 
 if ( i != 0 )
 {
+        if ( rc == X86EMUL_UNHANDLEABLE )
+            domain_crash(curr->domain);
+
         p->count = i;
 rc = X86EMUL_OKAY;
 }
@@ -331,7 +346,9 @@ bool_t hvm_mmio_internal(paddr_t gpa)
 {
 ioreq_t p = {
 .type = IOREQ_TYPE_COPY,
-.addr = gpa
+.addr = gpa,
+.count = 1,
+.size = 1,
 };
 
 return hvm_find_io_handler(p) != NULL;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 4594e91..2b22b50 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -43,6 +43,22 @@ struct hvm_mmio_ops {
 hvm_mmio_write_t write;
 };
 
+static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
+{
+    return p->df ?
+           p->addr - (p->count - 1ul) * p->size :
+           p->addr;
+}
+
+static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
+{
+    unsigned long count = p->count;
+
+    return p->df ?
+           p->addr + p->size - 1 :
+           p->addr + (count * p->size) - 1;
+}
+
 typedef int (*portio_action_t)(
 int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 
-- 
1.7.10.4
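The two helpers added above compute the physical span of a (possibly repeated) MMIO access, taking the direction flag into account. The arithmetic can be checked in isolation; the following stand-alone sketch uses a simplified `ioreq_t` holding only the fields the helpers read, not Xen's full definition:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Simplified stand-in for Xen's ioreq_t: only the fields the helpers read. */
typedef struct {
    paddr_t addr;    /* address of the first access issued */
    uint32_t size;   /* bytes per element */
    uint32_t count;  /* rep count */
    uint8_t df;      /* direction flag: 1 = descending addresses */
} ioreq_t;

paddr_t mmio_first_byte(const ioreq_t *p)
{
    /* With df set, addr names the highest element: step back count-1 elements. */
    return p->df ? p->addr - (p->count - 1ul) * p->size : p->addr;
}

paddr_t mmio_last_byte(const ioreq_t *p)
{
    unsigned long count = p->count;

    /* With df set, the last byte accessed is the end of the element at addr. */
    return p->df ? p->addr + p->size - 1 : p->addr + (count * p->size) - 1;
}
```

With `df` clear the access ascends from `addr`; with `df` set, `addr` is the highest element, so first/last still bracket the same byte range either way.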




Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Wei Liu
On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote:
  On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote:
  - jbeul...@suse.com wrote:
   On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote:
   You are right, it needs to be rebased. I can post later rebased on
  memory 
   leak fix version, if you thin its a way to go.
  
  I didn't look at v9 yet, and can't predict when I will be able to.
  
  Would you like me to post v10 with memory leak patch included in the 
  patchset before you start looking at v9?
 
 If there is a dependency on the changes in the leak fix v6, then
 this would be a good idea. If not, you can keep things as they are
 now. I view the entire set more as a bug fix than a feature anyway,
 and hence see no reason not to get this in after the freeze. But I'm
 adding Wei just in case...
 

I just looked at v9. The first three patches are quite mechanical. The
fourth patch is relatively bigger but it's also quite straightforward
(mostly parsing input). All in all, this series itself is
self-contained.

I don't think OSSTest is able to test that, so it would not cause
visible regression on our side.

I also agree it's a bug fix. Preferably this series should be applied
before first RC.

Wei.

 Jan



[Xen-devel] [PATCH] bsd-sys-queue-h-seddery: Massage `offsetof'

2015-07-09 Thread Ian Jackson
For some reason BSD's queue.h uses `__offsetof'.  It expects it to
work just like offsetof.  So use offsetof.

Reported-by: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
---
 tools/include/xen-external/bsd-sys-queue-h-seddery |2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xen-external/bsd-sys-queue-h-seddery 
b/tools/include/xen-external/bsd-sys-queue-h-seddery
index 7a957e3..3f8716d 100755
--- a/tools/include/xen-external/bsd-sys-queue-h-seddery
+++ b/tools/include/xen-external/bsd-sys-queue-h-seddery
@@ -69,4 +69,6 @@ s/\b struct \s+ type \b/type/xg;
 
 s,^\#include.*sys/cdefs.*,/* $ */,xg;
 
+s,\b __offsetof \b ,offsetof,xg;
+
 s/\b( NULL )/0/xg;
-- 
1.7.10.4
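BSD's queue.h relies on `__offsetof` only to recover a container pointer from a pointer to one of its members, which is exactly what the standard `offsetof` from `<stddef.h>` provides. A small self-contained check of that equivalence (the struct and helper are illustrative, not from queue.h itself):

```c
#include <assert.h>
#include <stddef.h>

/* A node in the style of a BSD <sys/queue.h> singly-linked list
 * (illustrative, not the real queue.h macros). */
struct entry {
    int value;
    struct entry *next;
};

/* queue.h uses the member offset to recover the container address from a
 * pointer to one of its members; standard offsetof does what __offsetof did. */
struct entry *entry_from_next_field(struct entry **nextp)
{
    return (struct entry *)((char *)nextp - offsetof(struct entry, next));
}
```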




[Xen-devel] [qemu-mainline test] 59214: regressions - FAIL

2015-07-09 Thread osstest service owner
flight 59214 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59214/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-xsm  15 guest-start/debian.repeat fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-ovmf-amd64 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-freebsd10-amd64 12 guest-saverestore  fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-ovmf-amd64 11 guest-saverestore  fail REGR. vs. 59059
 test-amd64-i386-freebsd10-i386 12 guest-saverestore   fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-debianhvm-amd64 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-win7-amd64 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-winxpsp3 11 guest-saverestore   fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-win7-amd64 11 guest-saverestore  fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-winxpsp3 11 guest-saverestorefail REGR. vs. 59059

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 11 guest-start   fail REGR. vs. 59059
 test-amd64-i386-libvirt  11 guest-start  fail   like 59059
 test-amd64-amd64-libvirt 11 guest-start  fail   like 59059
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   like 59059

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm 11 guest-start  fail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass

version targeted for testing:
 qemuud09952ee8caeea928695d5a3dc3ec50d8afb98c6
baseline version:
 qemuu35360642d043c2a5366e8a04a10e5545e7353bd5

Last test of basis59059  2015-07-05 10:39:20 Z4 days
Failing since 59109  2015-07-06 14:58:21 Z3 days3 attempts
Testing same since59214  2015-07-08 17:07:56 Z0 days1 attempts


People who touched revisions under test:
  Alberto Garcia be...@igalia.com
  Alex Williamson alex.william...@redhat.com
  Alexander Graf ag...@suse.de
  Alexey Kardashevskiy a...@ozlabs.ru
  Alvise Rigo a.r...@virtualopensystems.com
  Andrew Jones drjo...@redhat.com
  Artyom Tarasenko atar4q...@gmail.com
  Aurelien Jarno aurel...@aurel32.net
  Benjamin Herrenschmidt b...@kernel.crashing.org
  Bharata B Rao bhar...@linux.vnet.ibm.com
  Brian Kress kre...@moose.net
  Cormac O'Brien i.am.cormac.obr...@gmail.com
  Cornelia Huck cornelia.h...@de.ibm.com
  David Gibson da...@gibson.dropbear.id.au
  Denis V. Lunev d...@openvz.org
  Dmitry Osipenko dig...@gmail.com
  Dr. David Alan Gilbert dgilb...@redhat.com
  Eduardo Habkost ehabk...@redhat.com
  Eric Auger eric.au...@linaro.org
  Fam Zheng f...@redhat.com
  Gabriel Laupre glau...@chelsio.com
  Gavin Shan gws...@linux.vnet.ibm.com
  Gerd Hoffmann kra...@redhat.com
  Gonglei arei.gong...@huawei.com
  Greg Kurz gk...@linux.vnet.ibm.com
  Igor Mammedov imamm...@redhat.com
  Jan Kiszka jan.kis...@siemens.com
  Johannes Schlatow schla...@ida.ing.tu-bs.de
  John Snow js...@redhat.com
  Juan Quintela quint...@redhat.com
  Justin Ossevoort jus...@quarantainenet.nl
  Kirk Allan kal...@suse.com
  Laszlo Ersek ler...@redhat.com
  Laurent Vivier laur...@vivier.eu
  Laurent Vivier lviv...@redhat.com
  Li Zhijian lizhij...@cn.fujitsu.com
  Marc-André Lureau marcandre.lur...@redhat.com
  Markus Armbruster arm...@redhat.com
  Max Filippov jcmvb...@gmail.com
  Michael Roth mdr...@linux.vnet.ibm.com
  Michael S. Tsirkin m...@redhat.com
  Nikunj A Dadhania nik...@linux.vnet.ibm.com
  Olga Krishtal okrish...@virtuozzo.com
  Paolo Bonzini pbonz...@redhat.com
  Paul Durrant paul.durr...@citrix.com
  Paulo Alcantara pca...@gmail.com
  Paulo Alcantara pca...@zytor.com
  Peter Crosthwaite crosthwaite.pe...@gmail.com
  Peter 

Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:13
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: Re: [PATCH v7 01/15] x86/hvm: remove multiple open coded
 'chunking' loops
 
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  +static int hvmemul_linear_mmio_access(
  +unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
  +uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t
 known_gpfn)
  +{
   +struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
   +unsigned long offset = gla & ~PAGE_MASK;
  +unsigned int chunk;
  +paddr_t gpa;
  +unsigned long one_rep = 1;
  +int rc;
  +
  +chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
  +
  +if ( known_gpfn )
   +gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
  +else
  +{
   +rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
   +hvmemul_ctxt);
  +if ( rc != X86EMUL_OKAY )
  +return rc;
  +}
  +
  +for ( ;; )
  +{
  +rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
  +if ( rc != X86EMUL_OKAY )
  +break;
  +
  +gla += chunk;
  +buffer += chunk;
  +size -= chunk;
  +
  +if ( size == 0 )
  +break;
  +
   +ASSERT((gla & ~PAGE_MASK) == 0);
 
 While I don't mean you to re-submit another time, I'd still like to
 get my question answered: Does this really matter for the code
 below?
 

No, it doesn't, but if it's not true then an incorrect chunk size was chosen 
above.

  Paul

 Jan
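The invariant being discussed — after the first (possibly partial) chunk, every subsequent chunk starts on a page boundary — follows directly from how the chunk size is computed. A stand-alone sketch of just that arithmetic (`count_mmio_chunks` is a made-up name, and PAGE_SIZE is fixed at 4 KiB for illustration):

```c
#include <assert.h>

#define PAGE_SIZE 4096ul
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Counts how many page-bounded chunks an access of `size` bytes at linear
 * address `gla` splits into. The first chunk runs only to the end of the
 * current page, so every later chunk starts page-aligned -- which is what
 * the ASSERT under discussion checks. */
unsigned int count_mmio_chunks(unsigned long gla, unsigned int size)
{
    unsigned int chunks = 0;

    while (size) {
        unsigned int chunk = PAGE_SIZE - (gla & ~PAGE_MASK); /* to page end */

        if (chunk > size)
            chunk = size;
        chunks++;
        gla += chunk;
        size -= chunk;
        if (size)
            assert((gla & ~PAGE_MASK) == 0); /* next chunk is page-aligned */
    }
    return chunks;
}
```

An 8-byte access at offset 0xFFC in a page therefore splits into two chunks of 4 bytes each, while the same access at offset 0x10 is a single chunk.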




Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 18:16, paul.durr...@citrix.com wrote:
  -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:13
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: Re: [PATCH v7 01/15] x86/hvm: remove multiple open coded
 'chunking' loops
 
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  +static int hvmemul_linear_mmio_access(
  +unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
  +uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t
 known_gpfn)
  +{
   +struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
   +unsigned long offset = gla & ~PAGE_MASK;
  +unsigned int chunk;
  +paddr_t gpa;
  +unsigned long one_rep = 1;
  +int rc;
  +
  +chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
  +
  +if ( known_gpfn )
   +gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
  +else
  +{
   +rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
   +hvmemul_ctxt);
  +if ( rc != X86EMUL_OKAY )
  +return rc;
  +}
  +
  +for ( ;; )
  +{
  +rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
  +if ( rc != X86EMUL_OKAY )
  +break;
  +
  +gla += chunk;
  +buffer += chunk;
  +size -= chunk;
  +
  +if ( size == 0 )
  +break;
  +
   +ASSERT((gla & ~PAGE_MASK) == 0);
 
 While I don't mean you to re-submit another time, I'd still like to
 get my question answered: Does this really matter for the code
 below?
 
 No, it doesn't, but if it's not true then an incorrect chunk size was chosen 
 above.

I suspected as much. It's then just slightly less odd than having

x = 0;
ASSERT(x == 0);

I guess I'll strip the ASSERT() when applying.

Jan




Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 17:22
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: RE: [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with
 standard mmio intercept
 
  On 09.07.15 at 18:12, paul.durr...@citrix.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: 09 July 2015 16:33
   On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
   @@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr,
  uint32_t val)
}
}
  
   -static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t
 size)
   +static int stdvga_mem_write(const struct hvm_io_handler *handler,
   +uint64_t addr, uint32_t size,
   +uint64_t data)
{
   +struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
   +ioreq_t p = { .type = IOREQ_TYPE_COPY,
   +  .addr = addr,
   +  .size = size,
   +  .count = 1,
   +  .dir = IOREQ_WRITE,
   +  .data = data,
   +};
 
  Indentation (still - I know I pointed this out on v6, just perhaps at
  another example). See e.g. the context of the earlier change to the
  beginning of hvm_mmio_internal() in this patch for how this should
  look like.
 
  It has to be something my emacs is doing then; I can't see any brokenness.
 
 Indentation simply is too deep. See the good example I pointed
 you at.
 

Yes, I'll look... and maybe see what it looks like with vi too.

  Paul

 Jan




Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 18:23, paul.durr...@citrix.com wrote:
 I would have thought that the compiler would point out the overflow if a 
 caller tried to pass 64k in the port number,

That would at best happen when passing literal numbers.

Jan





Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-07-09 Thread Dagaen Golomb
 I think you might assume that the first M VCPUs  in the runq are the
 current running VCPUs on the M pCPUs. Am I correct? (From what you
 described in the following example, I think I'm correct. ;-) )

This is an assumption he gathered from my implementation, because I
wrote code that uses this fact. Right now the code is wrong since
current vcpus are not kept in the queue.

 I tell that you make the above assumption from here.

 However, in the current implementation, runq does not hold the current
 running VCPUs on the pCPUs. We remove the vcpu from runq in
 rt_schedule() function. What you described above make perfect sense
 if we decide to make runq hold the current running VCPUs.

 Actually, after thinking about the example you described, I think we
 can hold the current running VCPUs *and* the current idle pCPUs in the
 scheduler-wide structure; In other words, we can have another runningq
 (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are
 stored in three queues: runningq, runq, and depletedq, in increasing
 priority order.

 When we make the tickle decision, we only need to scan the idle_pcpu
 and then runningq to figure out which pCPU to tickle. All of other
 design you describe still hold here, except that the position where a
 VCPU is inserted into runq cannot directly give us which pCPU to
 tickle. What do you think?

This is a good idea too. I think having the running vcpus simply at the
beginning of the runq could work, too.

 In case (b) there may be idle pCPUs (and, if that's the case, we
 should tickle one of them, of course) or not. If not, we need to go
 figure out which pCPU to tickle, which is exactly what runq_tickle()
 does, but we at least know for sure that we want to tickle the pCPU
 where vCPU k runs, or others where vCPUs with deadline greater than vCPU
 k run.

 Does this make sense?

 Yes, if we decide to hold the currently running VCPUs in
 scheduler-wide structure: it can be runq or runningq.

Right. I think for now I'll just keep them in runq to keep most of the
selection logic the same.

 Thank you again! It is very clear and I'm clear which part is unclear now. :-D


 Dagaen, Meng, any question?

 I really think that, if we manage to implement all this, code quality
 and performance would improve a lot. Oh, considering all the various and
 (logically) different changes that I've hinted at, the patch needs to
 become a patch series! :-D

 Sure! Dagaen, what do you think?

Yes, this may become a series with several changes like this. For now
I am going to get it working with the running vcpus in the runq.
I thought returning the inserted index was a good way of checking if
we need to tickle, and moving the running vcpus to the runq will make
the scheduler almost done as far as functional correctness goes.
Various other features hinted at would be a series on top of this.

Regards,
~Dagaen Golomb
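As a rough illustration of the tickle decision discussed above — prefer an idle pCPU, otherwise preempt the running vCPU with the latest deadline under EDF — here is a deliberately simplified sketch. The data layout and function name are hypothetical, not the actual sched_rt.c code:

```c
#include <assert.h>
#include <stdint.h>

#define NR_PCPUS  4
#define NO_TICKLE -1
#define IDLE      UINT64_MAX  /* marker: no vCPU running on this pCPU */

/* running_deadline[cpu] is the deadline of the vCPU currently running on
 * each pCPU. For a newly runnable vCPU with deadline new_deadline: any
 * idle pCPU wins outright; otherwise pick the pCPU whose running vCPU has
 * the latest deadline, but only if it is later than the new vCPU's (EDF:
 * later deadline = lower priority). NO_TICKLE means no preemption needed. */
int pick_pcpu_to_tickle(const uint64_t running_deadline[NR_PCPUS],
                        uint64_t new_deadline)
{
    int victim = NO_TICKLE;
    uint64_t latest = new_deadline;
    int cpu;

    for (cpu = 0; cpu < NR_PCPUS; cpu++) {
        if (running_deadline[cpu] == IDLE)
            return cpu;                 /* idle pCPU: always preferred */
        if (running_deadline[cpu] > latest) {
            latest = running_deadline[cpu];
            victim = cpu;
        }
    }
    return victim;
}
```

Keeping the running vCPUs (or a separate runningq) in deadline order is what lets this scan terminate early in the real scheduler.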



Re: [Xen-devel] [PATCH v7 15/15] x86/hvm: track large memory mapped accesses by buffer offset

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
 @@ -635,13 +605,49 @@ static int hvmemul_phys_mmio_access(
  return rc;
  }
  
 +/*
 + * Multi-cycle MMIO handling is based upon the assumption that emulation
 + * of the same instruction will not access the same MMIO region more
 + * than once. Hence we can deal with re-emulation (for secondary or
 + * subsequent cycles) by looking up the result or previous I/O in a
 + * cache indexed by linear MMIO address.
 + */
 +static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
 +struct hvm_vcpu_io *vio, unsigned long gla, uint8_t dir)
 +{
 +unsigned int i;
 +struct hvm_mmio_cache *cache;
 +
+    for ( i = 0; i < vio->mmio_cache_count; i++ )
+    {
+        cache = &vio->mmio_cache[i];
+
+        if ( gla == cache->gla &&
+             dir == cache->dir )
+            return cache;
+    }
+
+    i = vio->mmio_cache_count++;
+    if( i == ARRAY_SIZE(vio->mmio_cache) )
+        domain_crash(current->domain);

But you mustn't continue here, or at least force i into range so you
don't corrupt other data.

And while doing that please also add the missing space on the
if() line.

Jan
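The failure mode Jan points out — incrementing `mmio_cache_count` past the array and then indexing with it — can be avoided by bailing out before touching the array at all. A simplified stand-alone sketch of such a bounded lookup (struct names and sizes are illustrative, not Xen's):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MMIO_CACHE 3

/* Illustrative structures, much simplified from Xen's. */
struct mmio_cache {
    unsigned long gla;
    unsigned char dir;
};

struct vcpu_io {
    unsigned int cache_count;
    struct mmio_cache cache[MAX_MMIO_CACHE];
};

/* Returns the slot for (gla, dir), allocating one if absent. When the
 * fixed-size array is exhausted it returns NULL *before* bumping the count
 * or writing to the array -- the caller must then bail out (in Xen, after
 * domain_crash()) rather than index past the end. */
struct mmio_cache *find_mmio_cache(struct vcpu_io *vio,
                                   unsigned long gla, unsigned char dir)
{
    unsigned int i;
    struct mmio_cache *c;

    for (i = 0; i < vio->cache_count; i++) {
        c = &vio->cache[i];
        if (c->gla == gla && c->dir == dir)
            return c;
    }

    if (vio->cache_count == MAX_MMIO_CACHE)
        return NULL; /* continuing here would corrupt adjacent state */

    c = &vio->cache[vio->cache_count++];
    c->gla = gla;
    c->dir = dir;
    return c;
}
```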




Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 18:05, boris.ostrov...@oracle.com wrote:
 On 07/09/2015 10:30 AM, Boris Ostrovsky wrote:
 On 07/09/2015 10:17 AM, Jan Beulich wrote:
 On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote:
 On 07/09/2015 03:02 AM, Jan Beulich wrote:
 On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote:
 As I started to update the patches I realized that in some cases
 (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't
 have VCPU (which is what hvm_guest_x86_mode() wants) but rather 
 only the
 domain. d-vcpu[0] should work. Otherwise I'll either need a new 
 field
 in struct domain or wrap has_32bit_shinfo into something 
 PVH-specific,
 like is_32bit_pvh_vcpu().
 Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like
 for PVH, especially if you also intend the tools to use the 64-bit
 guest context variant even for 32-bit PVH? Once again - are you
 intending to prohibit 32-bit PVH switching to 64-bit mode (which
 would seem both wrong and possibly cumbersome to me)?
 With current PVH implementation I don't think we can switch. We are
 starting the guest in very much PV-like fashion. That's why we are
 getting into switch_compat() --- via XEN_DOMCTL_set_address_size.

 For XEN_DOMCTL_get_address_size specifically we need to be able to
 figure out the mode *before* the guest is running because we use it to
 set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't
 change the mode.
 Okay - but is there code (being put) in place to refuse switch
 attempts?

 No, I should add code to deal with this.
 
 Forgot to include here --- so what is your preference wrt what I am 
 asking in the first paragraph? d-vcpu[0], new field (or maybe a flag 
 with bits per 32bit-pv and 32bit-pvh), or a PVH-wrapper for 
 has_32bit_shinfo?

To be honest, my preference would be to have none of those, and
have the whole thing more HVM-like from the beginning. Considering
that this isn't realistic, a PVH alias for has_32bit_shinfo() would seem
least ugly to me.

Jan




Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m

2015-07-09 Thread Wei Liu
On Thu, Jul 09, 2015 at 04:13:20PM +, Sahita, Ravi wrote:
 From: Wei Liu [mailto:wei.l...@citrix.com]
 Sent: Thursday, July 09, 2015 4:49 AM
 
 Question for you and Ed, how much testing have you done to this series? I
 assume you've done testing with it turned on and off, to the point you get a
 functioning guest. Have you run any test suites and got positive results?
 
 Yes, we have tested with Windows HVM guests and got positive results,
 and the tests based on Tamas' tool patches have passed as well.
 
 
 [0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com
 
 This ref link was messed up
 

That's the message-id of my email.

Here is a link to the archive:

http://www.gossamer-threads.com/lists/xen/devel/386386

It's just my 2 weeks' reminder email, which has information on how to
ask for freeze exception. I believed you saw that already.

Wei.

 Thanks,
 Ravi



Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept

2015-07-09 Thread Jan Beulich
 On 09.07.15 at 18:12, paul.durr...@citrix.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:33
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  @@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr,
 uint32_t val)
   }
   }
 
  -static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
  +static int stdvga_mem_write(const struct hvm_io_handler *handler,
  +uint64_t addr, uint32_t size,
  +uint64_t data)
   {
   +struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
  +ioreq_t p = { .type = IOREQ_TYPE_COPY,
  +  .addr = addr,
  +  .size = size,
  +  .count = 1,
  +  .dir = IOREQ_WRITE,
  +  .data = data,
  +};
 
 Indentation (still - I know I pointed this out on v6, just perhaps at
 another example). See e.g. the context of the earlier change to the
 beginning of hvm_mmio_internal() in this patch for how this should
 look like.
 
 It has to be something my emacs is doing then; I can't see any brokenness.

Indentation simply is too deep. See the good example I pointed
you at.

Jan




[Xen-devel] [xen-unstable test] 59295: trouble: pass/preparing/queued/running

2015-07-09 Thread osstest service owner
flight 59295 xen-unstable running [real]
http://logs.test-lab.xenproject.org/osstest/logs/59295/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen  none executed  queued
 build-amd64-rumpuserxen none executed  queued
 test-amd64-amd64-xl-pvh-intelnone executed  queued
 test-amd64-amd64-xl-pvh-amd none executed  queued
 test-amd64-i386-libvirt-xsm none executed  queued
 test-amd64-i386-libvirt none executed  queued
 build-i386-libvirt  none executed  queued
 test-amd64-amd64-libvirtnone executed  queued
 test-amd64-amd64-libvirt-xsmnone executed  queued
 test-amd64-amd64-xl-credit2 none executed  queued
 test-amd64-amd64-xl-xsm none executed  queued
 build-amd64-libvirt none executed  queued
 test-armhf-armhf-xl-rtdsnone executed  queued
 test-amd64-amd64-xl none executed  queued
 test-amd64-amd64-rumpuserxen-amd64none executed  queued
 test-amd64-i386-xl-xsm  none executed  queued
 test-amd64-amd64-xl-multivcpunone executed  queued
 test-amd64-i386-rumpuserxen-i386none executed  queued
 test-amd64-i386-qemuu-rhel6hvm-intelnone executed  queued
 test-amd64-i386-qemut-rhel6hvm-intelnone executed  queued
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmnone executedqueued
 test-amd64-i386-qemut-rhel6hvm-amdnone executed  queued
 test-armhf-armhf-xl-arndale none executed  queued
 test-amd64-amd64-pair   none executed  queued
 test-armhf-armhf-xl-credit2 none executed  queued
 test-amd64-i386-xl  none executed  queued
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm   none executed queued
 test-amd64-i386-qemuu-rhel6hvm-amdnone executed  queued
 test-amd64-amd64-xl-qemuu-debianhvm-amd64none executedqueued
 test-amd64-i386-xl-qemut-win7-amd64none executed  queued
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsmnone executed queued
 test-amd64-amd64-xl-rtdsnone executed  queued
 test-amd64-i386-pairnone executed  queued
 test-amd64-i386-freebsd10-amd64none executed  queued
 test-amd64-amd64-xl-qemut-win7-amd64none executed  queued
 test-amd64-i386-freebsd10-i386none executed  queued
 test-armhf-armhf-xl-cubietrucknone executed  queued
 test-armhf-armhf-xl-xsm none executed  queued
 test-armhf-armhf-xl none executed  queued
 test-armhf-armhf-xl-multivcpunone executed  queued
 test-armhf-armhf-libvirt-xsmnone executed  queued
 test-amd64-amd64-xl-qemut-debianhvm-amd64none executedqueued
 test-armhf-armhf-libvirtnone executed  queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsmnone executed queued
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmnone executedqueued
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsmnone executed queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64none executed queued
 test-amd64-i386-xl-qemut-debianhvm-amd64none executed queued
 test-amd64-amd64-xl-qemuu-ovmf-amd64none executed  queued
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1none executed queued
 test-amd64-i386-xl-qemuu-ovmf-amd64none executed  queued
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1none executed queued
 test-amd64-amd64-xl-qemuu-win7-amd64none executed  queued
 test-amd64-i386-xl-qemuu-win7-amd64none executed  queued
 test-amd64-i386-xl-qemut-winxpsp3none executed  queued
 test-amd64-amd64-xl-qemuu-winxpsp3none executed  queued
 test-amd64-i386-xl-qemuu-winxpsp3none executed  queued
 test-amd64-amd64-xl-qemut-winxpsp3none executed  queued
 build-amd64-oldkern   2 hosts-allocate   running
 build-i386-oldkern2 hosts-allocate   running
 build-amd64-xsm   2 hosts-allocate   running
 build-i386-xsm2 hosts-allocate   running
 build-amd64   2 hosts-allocate   running
 build-i3862 hosts-allocate   running
 build-amd64-pvops 2 hosts-allocate   running
 build-i386-pvops  2 hosts-allocate   running
 build-armhf-libvirt   5 libvirt-buildrunning
 

[Xen-devel] [PATCH v8 13/13] x86/hvm: track large memory mapped accesses by buffer offset

2015-07-09 Thread Paul Durrant
The code in hvmemul_do_io() that tracks large reads or writes, to avoid
re-issue of component I/O, is defeated by accesses across a page boundary
because it uses physical address. The code is also only relevant to memory
mapped I/O to or from a buffer.

This patch re-factors the code and moves it into hvmemul_phys_mmio_access()
where it is relevant and tracks using buffer offset rather than address.
Separate I/O emulations (of which there may be up to three per instruction)
are distinguished by linear address.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- Make sure emulation does not continue after domain_crash()

v7:
- Added comment requested by Jan
- Changed BUG_ON() to domain_crash()

v6:
- Added Andrew's reviewed-by

v5:
- Fixed to cache up three distict I/O emulations per instruction
---
 xen/arch/x86/hvm/emulate.c |  136 ++--
 xen/include/asm-x86/hvm/vcpu.h |   25 +---
 2 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 89b1616..01ee972 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -106,29 +106,6 @@ static int hvmemul_do_io(
 return X86EMUL_UNHANDLEABLE;
 }
 
-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-                return X86EMUL_OKAY;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                memcpy(p_data, &vio->mmio_large_read[addr - pa],
-                       size);
-                return X86EMUL_OKAY;
-            }
-        }
-    }
-
     switch ( vio->io_req.state )
 {
 case STATE_IOREQ_NONE:
@@ -208,33 +185,6 @@ static int hvmemul_do_io(
         memcpy(p_data, &p.data, size);
 }
 
-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_write_pa = addr;
-            if ( addr == (pa + bytes) )
-                vio->mmio_large_write_bytes += size;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_read_pa = addr;
-            if ( (addr == (pa + bytes)) &&
-                 ((bytes + size) <= sizeof(vio->mmio_large_read)) )
-            {
-                memcpy(&vio->mmio_large_read[bytes], p_data, size);
-                vio->mmio_large_read_bytes += size;
-            }
-        }
-    }
-
 return X86EMUL_OKAY;
 }
 
@@ -590,11 +540,12 @@ static int hvmemul_virtual_to_linear(
 }
 
 static int hvmemul_phys_mmio_access(
-paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer)
+struct hvm_mmio_cache *cache, paddr_t gpa, unsigned int size, uint8_t dir,
+uint8_t *buffer, unsigned int offset)
 {
 unsigned long one_rep = 1;
 unsigned int chunk;
-int rc;
+int rc = X86EMUL_OKAY;
 
 /* Accesses must fall within a page. */
     BUG_ON((gpa & ~PAGE_MASK) + size > PAGE_SIZE);
@@ -611,14 +562,33 @@ static int hvmemul_phys_mmio_access(
 
 for ( ;; )
 {
-        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
-                                    buffer);
-        if ( rc != X86EMUL_OKAY )
-            break;
+        /* Have we already done this chunk? */
+        if ( offset < cache->size )
+        {
+            ASSERT((offset + chunk) <= cache->size);
+
+            if ( dir == IOREQ_READ )
+                memcpy(&buffer[offset], &cache->buffer[offset], chunk);
+            else if ( memcmp(&buffer[offset], &cache->buffer[offset], chunk) != 0 )
+                domain_crash(current->domain);
+        }
+        else
+        {
+            ASSERT(offset == cache->size);
+
+            rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                        &buffer[offset]);
+            if ( rc != X86EMUL_OKAY )
+                break;
+
+            /* Note that we have now done this chunk. */
+            memcpy(&cache->buffer[offset], &buffer[offset], chunk);
+            cache->size += chunk;
+        }
 
 /* Advance to the next chunk. */
 gpa += chunk;
-buffer += chunk;
+offset += chunk;
 size -= chunk;
 
 if ( 

Re: [Xen-devel] [PATCH v7 15/15] x86/hvm: track large memory mapped accesses by buffer offset

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:46
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: Re: [PATCH v7 15/15] x86/hvm: track large memory mapped
 accesses by buffer offset
 
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  @@ -635,13 +605,49 @@ static int hvmemul_phys_mmio_access(
   return rc;
   }
 
  +/*
  + * Multi-cycle MMIO handling is based upon the assumption that
 emulation
  + * of the same instruction will not access the same MMIO region more
  + * than once. Hence we can deal with re-emulation (for secondary or
  + * subsequent cycles) by looking up the result or previous I/O in a
  + * cache indexed by linear MMIO address.
  + */
  +static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
  +struct hvm_vcpu_io *vio, unsigned long gla, uint8_t dir)
  +{
  +unsigned int i;
  +struct hvm_mmio_cache *cache;
  +
  +for ( i = 0; i < vio->mmio_cache_count; i ++ )
  +{
  +cache = &vio->mmio_cache[i];
  +
  +if ( gla == cache->gla &&
  + dir == cache->dir )
  +return cache;
  +}
  +
  +i = vio->mmio_cache_count++;
  +if( i == ARRAY_SIZE(vio->mmio_cache) )
  +domain_crash(current->domain);
 
 But you mustn't continue here, or at least force i into range so you
 don't corrupt other data.
 
 And while doing that please also add the missing space on the
 if() line.
 

Sorry, I was forgetting domain_crash() is not immediate. I'll send v8.

  Paul

 Jan
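Since domain_crash() only marks the domain for destruction and returns, the lookup must not go on to use an out-of-range index. A standalone sketch of the corrected pattern (hypothetical names and a flag standing in for domain_crash()):

```c
#include <assert.h>

#define NCACHE 2

struct cache_ent { unsigned long gla; int dir; };

struct vio {
    unsigned int count;
    struct cache_ent slots[NCACHE];
};

static int crashed;                             /* records the "crash" */
static void fake_domain_crash(void) { crashed = 1; }  /* returns! */

/* Find or allocate the entry for (gla, dir) without overrunning slots[]. */
static struct cache_ent *find_cache(struct vio *v, unsigned long gla, int dir)
{
    unsigned int i;

    for (i = 0; i < v->count; i++)
        if (v->slots[i].gla == gla && v->slots[i].dir == dir)
            return &v->slots[i];

    i = v->count++;
    if (i == NCACHE) {
        fake_domain_crash();   /* not immediate... */
        --v->count;            /* ...so force i back into range */
        i = NCACHE - 1;
    }
    v->slots[i].gla = gla;
    v->slots[i].dir = dir;
    return &v->slots[i];
}
```

The point Jan raises is exactly this: after the non-fatal crash call, execution continues, so the index must be clamped before it is used to write the array.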


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 17:20
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: RE: [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t
 and sizes to unsigned int
 
  On 09.07.15 at 18:10, paul.durr...@citrix.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: 09 July 2015 16:24
   On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
   Building on the previous patch, this patch restricts portio port numbers
   to uint16_t in registration/relocate calls. In portio_action_t the port
 number is changed to unsigned int though to avoid the compiler
 generating
   16-bit operations unnecessarily. The patch also changes I/O sizes to
   unsigned int which then allows the io_handler size field to reduce to
   an unsigned int.
  
   Signed-off-by: Paul Durrant paul.durr...@citrix.com
   Cc: Keir Fraser k...@xen.org
   Cc: Jan Beulich jbeul...@suse.com
   Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
   ---
  
   v7:
   - Change port type in portio_action_t to unsigned int as requested
 by Jan
 
  Yet title and description were left in places, and ...
 
  The title remains. The description was modified:
 
  In portio_action_t the port number is changed to unsigned int though to
  avoid the compiler generating 16-bit operations unnecessarily.
 
 Maybe I'm just being confused by the two and-s in the title,
 which I assumed can't both be meant to be "and", or else you'd
 have written "port numbers, uint16_t, and sizes". Plus it seems
 unclear what "restrict a uint16_t to unsigned int" actually means.
 
   @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p);
int hvm_buffered_io_send(ioreq_t *p);
  
static inline void register_portio_handler(
   -struct domain *d, unsigned long addr,
   -unsigned long size, portio_action_t action)
   +struct domain *d, uint16_t port, unsigned int size,
   +portio_action_t action)
{
   -register_io_handler(d, addr, size, action, HVM_PORTIO);
   +register_io_handler(d, port, size, action, HVM_PORTIO);
}
  
static inline void relocate_portio_handler(
   -struct domain *d, unsigned long old_addr, unsigned long new_addr,
   -unsigned long size)
   +struct domain *d, uint16_t old_port, uint16_t new_port,
   +unsigned int size)
{
   -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
   +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO);
}
 
  ... these still use uint16_t. I'm pretty sure I gave my comment in a
  way indicating that this should generally change, perhaps just at
  the example of portio_action_t.
 
  Why? Do we not want to restrict to uint16_t at the interface level?
 
 Quite the inverse - why would we want to? This just makes the
 calling sequence less efficient (due to the needed operand size
 overrides), and hides mistakes in callers properly truncating
 when reading guest registers.
 

I would have thought that the compiler would point out the overflow if a caller 
tried to pass 64k in the port number, but I'll get rid of the remaining 
uint16_ts.

  Paul

 Jan
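Jan's argument is that a narrow parameter type does not catch caller mistakes; it silently truncates them, whereas a wide parameter keeps the bogus value visible so it can be range-checked. A small illustration (hypothetical handlers, not the Xen interface):

```c
#include <assert.h>
#include <stdint.h>

/* Narrow parameter: a caller passing 0x10000 is silently truncated
 * to 0 by the implicit conversion -- no compile-time error. */
static unsigned int handler_u16(uint16_t port)
{
    return port;
}

/* Wide parameter: the bogus value survives the call, so the callee
 * can detect that the caller forgot to truncate a guest register. */
static int handler_uint(unsigned int port)
{
    if (port > 0xffff)
        return -1;
    return (int)port;
}
```

This is why, as a sketch of the review's conclusion, keeping `unsigned int` at the interface also avoids operand-size override prefixes in the generated code.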




Re: [Xen-devel] [PATCH v3 05/13] x86/altp2m: basic data structures and support routines.

2015-07-09 Thread George Dunlap
On Wed, Jul 1, 2015 at 7:09 PM, Ed White edmund.h.wh...@intel.com wrote:
 diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
 index b4f035e..301ca59 100644
 --- a/xen/arch/x86/mm/mm-locks.h
 +++ b/xen/arch/x86/mm/mm-locks.h
 @@ -217,7 +217,7 @@ declare_mm_lock(nestedp2m)
  #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
  #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)

 -/* P2M lock (per-p2m-table)
 +/* P2M lock (per-non-alt-p2m-table)
   *
   * This protects all queries and updates to the p2m table.
   * Queries may be made under the read lock but all modifications
 @@ -225,10 +225,44 @@ declare_mm_lock(nestedp2m)
   *
   * The write lock is recursive as it is common for a code path to look
   * up a gfn and later mutate it.
 + *
 + * Note that this lock shares its implementation with the altp2m
 + * lock (not the altp2m list lock), so the implementation
 + * is found there.
   */

  declare_mm_rwlock(p2m);
 -#define p2m_lock(p)   mm_write_lock(p2m, &(p)->lock);
 +
 +/* Alternate P2M list lock (per-domain)
 + *
 + * A per-domain lock that protects the list of alternate p2m's.
 + * Any operation that walks the list needs to acquire this lock.
 + * Additionally, before destroying an alternate p2m all VCPU's
 + * in the target domain must be paused.
 + */
 +
 +declare_mm_lock(altp2mlist)
 +#define altp2m_lock(d)   mm_lock(altp2mlist, &(d)->arch.altp2m_lock)
 +#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)

This is unlocking an altp2mlist lock type; and the lock above is
protecting an altp2m list, not an altp2m.  Please use
altp2m_list_lock, both for the lock itself and the macro that locks
it.

Also, even after reading Tim's comment and going through the whole
series, I still don't have a clear idea in what circumstances the
various locks (p2m, altp2mlist, and altp2m) will be acquired after one
another, such that this sandwiched lock structure is required.

Please include a comment brief description of what codepaths might
cause different pairs of locks to be acquired, so that someone coming
and looking at this code doesn't have to try to figure it out
themselves.
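For context, the mm-locks machinery the patch extends enforces an ordering discipline: every lock has a position in the declaration order, and taking a lock at or below the level currently held is a bug. A simplified single-threaded sketch of that discipline (hypothetical level names; the real code tracks this per-CPU and BUG()s on violation):

```c
#include <assert.h>

/* Declaration order defines the permitted acquisition order:
 * e.g. the altp2m list lock must be taken before any p2m lock. */
enum lock_level { LVL_NONE = 0, LVL_NESTEDP2M, LVL_ALTP2MLIST, LVL_P2M };

static int cur_level = LVL_NONE;   /* per-CPU in the real implementation */

/* Returns 0 on success, -1 on an ordering violation (would BUG()). */
static int mm_lock(int level)
{
    if (level <= cur_level)
        return -1;
    cur_level = level;
    return 0;
}

static void mm_unlock(int prev_level)
{
    cur_level = prev_level;        /* restore the previous level */
}
```

Under this scheme the sandwiched structure means a path may take the list lock, then a p2m lock, but never the reverse, which is exactly the ordering information the requested comment should spell out.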

Everything else looks OK to me.  Thanks.

 -George



Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int

2015-07-09 Thread Paul Durrant
 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: 09 July 2015 16:24
 To: Paul Durrant
 Cc: xen-devel@lists.xen.org; Keir (Xen.org)
 Subject: Re: [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t
 and sizes to unsigned int
 
  On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:
  Building on the previous patch, this patch restricts portio port numbers
  to uint16_t in registration/relocate calls. In portio_action_t the port
  number is changed to unsigned int though to avoid the compiler generating
  16-bit operations unnecessarily. The patch also changes I/O sizes to
  unsigned int which then allows the io_handler size field to reduce to
  an unsigned int.
 
  Signed-off-by: Paul Durrant paul.durr...@citrix.com
  Cc: Keir Fraser k...@xen.org
  Cc: Jan Beulich jbeul...@suse.com
  Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
  ---
 
  v7:
  - Change port type in portio_action_t to unsigned int as requested
by Jan
 
 Yet title and description were left in places, and ...

The title remains. The description was modified:

 In portio_action_t the port number is changed to unsigned int though to avoid 
the compiler generating 16-bit operations unnecessarily.

 
  @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p);
   int hvm_buffered_io_send(ioreq_t *p);
 
   static inline void register_portio_handler(
  -struct domain *d, unsigned long addr,
  -unsigned long size, portio_action_t action)
  +struct domain *d, uint16_t port, unsigned int size,
  +portio_action_t action)
   {
  -register_io_handler(d, addr, size, action, HVM_PORTIO);
  +register_io_handler(d, port, size, action, HVM_PORTIO);
   }
 
   static inline void relocate_portio_handler(
  -struct domain *d, unsigned long old_addr, unsigned long new_addr,
  -unsigned long size)
  +struct domain *d, uint16_t old_port, uint16_t new_port,
  +unsigned int size)
   {
  -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO);
  +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO);
   }
 
 ... these still use uint16_t. I'm pretty sure I gave my comment in a
 way indicating that this should generally change, perhaps just at
 the example of portio_action_t.

Why? Do we not want to restrict to uint16_t at the interface level?

  Paul

 
 Jan




Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-07-09 Thread Dagaen Golomb
 Can you summarize the design discussion with Dario on the first
 version of the patch? The above information is ok, but it does not say
 much about the design. As you can see in our discussion with Dario,
 there exist several design choices to achieve this. Explicitly state
 it here can help people understand what this patch is doing.

 Yes, that would be helpful. It's not required to summarize all the
 various ML thread in the actual changelog(s), but, really, a summary and
 maybe a few links and pointers in the cover letter, would be useful, to
 give interested people that may join late the chance to get some
 context.

 Note that, even for single patch and not only for series, it is ok to
 include a cover letter (i.e., a [PATCH 0/x]), e.g., right for this
 purpose.

 So, as I was saying, I like the architecture now. There are issues, or
 at least I have some questions about specific decisions or code chunks,
 but I think this is on the right track now.

I'll strengthen the summary/cover letter on future patches.

  +struct timer replenishment_timer;
  +unsigned NUM_CPUS;

 First, it does not follow the convention of variable names. UPPER CASE
 should be the macro definition and a constant, which is not the case.
 You can use a more proper name, such as ncpus, here and add what it is
 used for.

 Agreed.

Will change.

   cpumask_t tickled;  /* cpus been tickled */
   };
 
  @@ -178,6 +181,8 @@ struct rt_vcpu {
   unsigned flags; /* mark __RTDS_scheduled, etc.. */
   };
 
  +static void runq_tickle(const struct scheduler *, struct rt_vcpu *);

 Not sure if declaring the function is the best approach or having
 another patch to move the function forward. But anyway, we don't have
 to worry about this yet right now, considering there are more
 important issues to tackle for this patch.

 Sure.

 Personally, I would like moving the function better (in a separate
 patch). Forward declarations make it (slightly, yes, but still...) more
 difficult to find (and switch, like in editors) between call sites and
 where the actual code is.

I'd like to move it, but as mentioned that can be a separate patch.

  @@ -411,15 +422,77 @@ __runq_insert(const struct scheduler *ops, struct 
  rt_vcpu *svc)
   struct rt_vcpu * iter_svc = __q_elem(iter);
   if ( svc->cur_deadline <= iter_svc->cur_deadline )
   break;
  +else
  +++inserted_index;
}
   list_add_tail(&svc->q_elem, iter);
  +return inserted_index;
   }
   else
   {
  -list_add(&svc->q_elem, &prv->depletedq);
  +// If we are inserting into previously empty depletedq, rearm
  +//   rearm replenish timer. It will handle further scheduling 
  until
  +//   the depletedq is empty again (if ever)
  +if( list_empty(depletedq) )
  +set_timer( &prv->replenishment_timer, svc->cur_deadline );

 Hmm, can you handle the set_timer outside of this function? It should
 be possible and make this runq_insert() function serve for only one
 purpose.

 TBH, I think I like it being here. I mean, __runq_insert is called in
 many places (at least right now), and I wouldn't like to have the check
 and the arming repeated everywhere.

 Then, that being said, I'm still hoping that some of these calls could
 go away, and maybe there are call sites where we know that the timer
 won't be armed in any case... So, we'll have to see about this.

 So, just take this comment as I don't dislike the timer machinery to be
 in here. :-)

I thought it was an acceptable way of accomplishing this. We can focus on
trimming more later if this doesn't seem ideal.
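The index returned by the patch's __runq_insert() is what lets the caller decide whether a tickle is needed. The EDF-ordered insertion itself can be sketched independently (a plain sorted array instead of Xen's list_head; hypothetical names):

```c
#include <assert.h>

#define RUNQ_MAX 8

/*
 * Insert 'deadline' keeping the queue sorted, earliest deadline first;
 * return the position it landed at, so the caller can tickle a pCPU
 * when the new vCPU enters the top NUM_CPUS entries. A new entry goes
 * ahead of entries with an equal deadline, mirroring the patch's
 * "<= -> break" comparison.
 */
static int runq_insert(unsigned long runq[], int *len, unsigned long deadline)
{
    int i, pos = 0;

    assert(*len < RUNQ_MAX);
    while (pos < *len && runq[pos] < deadline)
        pos++;
    for (i = *len; i > pos; i--)       /* shift later entries down */
        runq[i] = runq[i - 1];
    runq[pos] = deadline;
    (*len)++;
    return pos;
}
```

With this shape, "position < number of pCPUs" is the natural trigger condition for tickling, which is the multiple-tickles question raised in the review.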


  +// If we become one of top [# CPUs] in the runq, tickle it
  +// TODO: make this work when multiple tickles are required

 Why do you need multiple tickles?

 Because the loop above may have been done more than one replenishment,
 which makes sense.

 While we are here. I've said that I don't like the fact that we have one
 big spinlock for the whole scheduler. However, we do have it right now,
 so let's make good use of it.

 So (but please double check what I'm about to say, and provide your
 view), it should be safe to access the various svc-vc-processor from
 inside here, since we're holding the big log. If that's the case, then
 it should be fairly easy to figure out which are the pCPUs that needs to
 reschedule (playing with the index, etc), and then use their
 vc-processor to decide what pCPU to actually tickle, isn't this the
 case?

 (let me know if I've explained myself...)

  +// Replenishment timer, for now always on CPU 0
  +init_timer(&prv->replenishment_timer, replenishment_handler, ops, 0);

 Oh, yes, I wanted to discuss about this! It is true that in Xen, timers
 are bound to cpus (while, e.g., in Linux, but both the interface and the
 implementation are different).

 So, for know, you're doing the right thing in just putting 

[Xen-devel] [PATCH v8 12/13] x86/hvm: always re-emulate I/O from a buffer

2015-07-09 Thread Paul Durrant
If memory mapped I/O is 'chunked' then the I/O must be re-emulated,
otherwise only the first chunk will be processed. This patch makes
sure all I/O from a buffer is re-emulated regardless of whether it
is a read or a write.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Acked-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v7:
- No change

v6:
- Added Andrew's reviewed-by

v5:
- Added Jan's acked-by
---
 xen/arch/x86/hvm/emulate.c |4 ++--
 xen/include/asm-x86/hvm/vcpu.h |3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 1444b56..89b1616 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -147,7 +147,7 @@ static int hvmemul_do_io(
  (p.data_is_ptr != data_is_addr) )
 domain_crash(curr-domain);
 
-if ( data_is_addr || dir == IOREQ_WRITE )
+if ( data_is_addr )
 return X86EMUL_UNHANDLEABLE;
 goto finish_access;
 default:
@@ -187,7 +187,7 @@ static int hvmemul_do_io(
 rc = hvm_send_assist_req(s, &p);
 if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
 vio-io_req.state = STATE_IOREQ_NONE;
-else if ( data_is_addr || dir == IOREQ_WRITE )
+else if ( data_is_addr )
 rc = X86EMUL_OKAY;
 }
 break;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 01cbfe5..13ff54f 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -81,8 +81,7 @@ struct hvm_vcpu_io {
 static inline bool_t hvm_vcpu_io_need_completion(const struct hvm_vcpu_io *vio)
 {
 return (vio->io_req.state == STATE_IOREQ_READY) &&
-   !vio->io_req.data_is_ptr &&
-   (vio->io_req.dir == IOREQ_READ);
+   !vio->io_req.data_is_ptr;
 }
 
 #define VMCX_EADDR(~0ULL)
-- 
1.7.10.4




[Xen-devel] [PATCH v8 11/13] x86/hvm: use ioreq_t to track in-flight state

2015-07-09 Thread Paul Durrant
Use an ioreq_t rather than open coded state, size, dir and data fields
in struct hvm_vcpu_io. This also allows PIO completion to be handled
similarly to MMIO completion by re-issuing the handle_pio() call.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- Modified struct field placement as knock-on from previous patch

v6:
- Added Andrew's reviewed-by

v5:
- Added missing hunk with call to handle_pio()
---
 xen/arch/x86/hvm/emulate.c   |   35 +--
 xen/arch/x86/hvm/hvm.c   |   13 +
 xen/arch/x86/hvm/svm/nestedsvm.c |2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |4 ++--
 xen/include/asm-x86/hvm/vcpu.h   |   12 
 5 files changed, 33 insertions(+), 33 deletions(-)
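The "verify the emulation request has been correctly re-issued" check this patch adds compares the freshly decoded request against the in-flight ioreq_t field by field. A standalone sketch of that pattern (reduced stand-in struct, hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for Xen's ioreq_t: just the fields the check compares. */
struct ioreq {
    uint64_t addr;
    uint32_t size, count;
    uint8_t  dir, df, data_is_ptr, type;
};

/*
 * On re-emulation, the freshly decoded request must match the one that
 * was in flight; any divergence means guest state changed under our
 * feet, and the safe response is to crash the domain rather than
 * complete the wrong I/O.
 */
static int reissue_matches(const struct ioreq *inflight,
                           const struct ioreq *fresh)
{
    return inflight->type == fresh->type &&
           inflight->addr == fresh->addr &&
           inflight->size == fresh->size &&
           inflight->count == fresh->count &&
           inflight->dir == fresh->dir &&
           inflight->df == fresh->df &&
           inflight->data_is_ptr == fresh->data_is_ptr;
}
```

Tracking the whole ioreq_t (instead of open-coded state/size/dir fields) is what makes this whole-request comparison possible, and it is also what lets PIO completion simply re-issue handle_pio().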

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 010e2fd..1444b56 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -91,6 +91,7 @@ static int hvmemul_do_io(
 .df = df,
 .data = data,
 .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
+.state = STATE_IOREQ_READY,
 };
 void *p_data = (void *)data;
 int rc;
@@ -128,12 +129,24 @@ static int hvmemul_do_io(
 }
 }
 
-switch ( vio->io_state )
+switch ( vio->io_req.state )
 {
 case STATE_IOREQ_NONE:
 break;
 case STATE_IORESP_READY:
-vio->io_state = STATE_IOREQ_NONE;
+vio->io_req.state = STATE_IOREQ_NONE;
+p = vio->io_req;
+
+/* Verify the emulation request has been correctly re-issued */
+if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
+ (p.addr != addr) ||
+ (p.size != size) ||
+ (p.count != reps) ||
+ (p.dir != dir) ||
+ (p.df != df) ||
+ (p.data_is_ptr != data_is_addr) )
+domain_crash(curr->domain);
+
 if ( data_is_addr || dir == IOREQ_WRITE )
 return X86EMUL_UNHANDLEABLE;
 goto finish_access;
@@ -141,11 +154,6 @@ static int hvmemul_do_io(
 return X86EMUL_UNHANDLEABLE;
 }
 
-vio->io_state = STATE_IOREQ_READY;
-vio->io_size = size;
-vio->io_dir = dir;
-vio->io_data_is_addr = data_is_addr;
-
 if ( dir == IOREQ_WRITE )
 {
 if ( !data_is_addr )
@@ -154,13 +162,14 @@ static int hvmemul_do_io(
 hvmtrace_io_assist(&p);
 }
 
+vio->io_req = p;
+
 rc = hvm_io_intercept(&p);
 
 switch ( rc )
 {
 case X86EMUL_OKAY:
-vio->io_data = p.data;
-vio->io_state = STATE_IOREQ_NONE;
+vio->io_req.state = STATE_IOREQ_NONE;
 break;
 case X86EMUL_UNHANDLEABLE:
 {
@@ -171,15 +180,13 @@ static int hvmemul_do_io(
 if ( !s )
 {
 rc = hvm_process_io_intercept(&null_handler, &p);
-if ( rc == X86EMUL_OKAY )
-vio->io_data = p.data;
-vio->io_state = STATE_IOREQ_NONE;
+vio->io_req.state = STATE_IOREQ_NONE;
 }
 else
 {
 rc = hvm_send_assist_req(s, &p);
 if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
-vio->io_state = STATE_IOREQ_NONE;
+vio->io_req.state = STATE_IOREQ_NONE;
 else if ( data_is_addr || dir == IOREQ_WRITE )
 rc = X86EMUL_OKAY;
 }
@@ -198,7 +205,7 @@ static int hvmemul_do_io(
 hvmtrace_io_assist(&p);
 
 if ( !data_is_addr )
-memcpy(p_data, &vio->io_data, size);
+memcpy(p_data, &p.data, size);
 }
 
 if ( is_mmio && !data_is_addr )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3be17f9..31ae4d4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)
 
 if ( hvm_vcpu_io_need_completion(vio) )
 {
-vio->io_state = STATE_IORESP_READY;
-vio->io_data = p->data;
+vio->io_req.state = STATE_IORESP_READY;
+vio->io_req.data = p->data;
 }
 else
-vio->io_state = STATE_IOREQ_NONE;
+vio->io_req.state = STATE_IOREQ_NONE;
 
 msix_write_completion(curr);
 vcpu_end_shutdown_deferral(curr);
@@ -501,11 +501,8 @@ void hvm_do_resume(struct vcpu *v)
 handle_mmio();
 break;
 case HVMIO_pio_completion:
-if ( vio->io_size == 4 ) /* Needs zero extension. */
-guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
-else
-memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
-vio->io_state = STATE_IOREQ_NONE;
+(void)handle_pio(vio->io_req.addr, vio->io_req.size,
+ vio->io_req.dir);
 break;
 case HVMIO_realmode_completion:
 {
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 2243873..2653bc1 100644
--- 

[Xen-devel] [PATCH v8 10/13] x86/hvm: remove hvm_io_state enumeration

2015-07-09 Thread Paul Durrant
Emulation request status is already covered by STATE_IOREQ_XXX values so
just use those. The mapping is:

HVMIO_none- STATE_IOREQ_NONE
HVMIO_awaiting_completion - STATE_IOREQ_READY
HVMIO_completed   - STATE_IORESP_READY

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Acked-by: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---

v8:
- No change

v7:
- Modified struct field placement as knock-on from previous patch

v6:
- Added Andrew's reviewed-by

v5:
- Added Jan's acked-by
---
 xen/arch/x86/hvm/emulate.c   |   14 +++---
 xen/arch/x86/hvm/hvm.c   |6 +++---
 xen/arch/x86/hvm/svm/nestedsvm.c |2 +-
 xen/arch/x86/hvm/vmx/realmode.c  |4 ++--
 xen/include/asm-x86/hvm/vcpu.h   |   10 ++
 5 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 933a605..010e2fd 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -130,10 +130,10 @@ static int hvmemul_do_io(
 
 switch ( vio->io_state )
 {
-case HVMIO_none:
+case STATE_IOREQ_NONE:
 break;
-case HVMIO_completed:
-vio->io_state = HVMIO_none;
+case STATE_IORESP_READY:
+vio->io_state = STATE_IOREQ_NONE;
 if ( data_is_addr || dir == IOREQ_WRITE )
 return X86EMUL_UNHANDLEABLE;
 goto finish_access;
@@ -141,7 +141,7 @@ static int hvmemul_do_io(
 return X86EMUL_UNHANDLEABLE;
 }
 
-vio->io_state = HVMIO_awaiting_completion;
+vio->io_state = STATE_IOREQ_READY;
 vio->io_size = size;
 vio->io_dir = dir;
 vio->io_data_is_addr = data_is_addr;
@@ -160,7 +160,7 @@ static int hvmemul_do_io(
 {
 case X86EMUL_OKAY:
 vio->io_data = p.data;
-vio->io_state = HVMIO_none;
+vio->io_state = STATE_IOREQ_NONE;
 break;
 case X86EMUL_UNHANDLEABLE:
 {
@@ -173,13 +173,13 @@ static int hvmemul_do_io(
 rc = hvm_process_io_intercept(&null_handler, &p);
 if ( rc == X86EMUL_OKAY )
 vio->io_data = p.data;
-vio->io_state = HVMIO_none;
+vio->io_state = STATE_IOREQ_NONE;
 }
 else
 {
 rc = hvm_send_assist_req(s, &p);
 if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
-vio->io_state = HVMIO_none;
+vio->io_state = STATE_IOREQ_NONE;
 else if ( data_is_addr || dir == IOREQ_WRITE )
 rc = X86EMUL_OKAY;
 }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8e487d4..3be17f9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p)
 
 if ( hvm_vcpu_io_need_completion(vio) )
 {
-vio->io_state = HVMIO_completed;
+vio->io_state = STATE_IORESP_READY;
 vio->io_data = p->data;
 }
 else
-vio->io_state = HVMIO_none;
+vio->io_state = STATE_IOREQ_NONE;
 
 msix_write_completion(curr);
 vcpu_end_shutdown_deferral(curr);
@@ -505,7 +505,7 @@ void hvm_do_resume(struct vcpu *v)
 guest_cpu_user_regs()->rax = (uint32_t)vio->io_data;
 else
 memcpy(&guest_cpu_user_regs()->rax, &vio->io_data, vio->io_size);
-vio->io_state = HVMIO_none;
+vio->io_state = STATE_IOREQ_NONE;
 break;
 case HVMIO_realmode_completion:
 {
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ad8afb4..2243873 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1221,7 +1221,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
  * Delay the injection because this would result in delivering
  * an interrupt *within* the execution of an instruction.
  */
-if ( v->arch.hvm_vcpu.hvm_io.io_state != HVMIO_none )
+if ( v->arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE )
 return hvm_intblk_shadow;
 
 if ( !nv->nv_vmexit_pending && n2vmcb->exitintinfo.bytes != 0 ) {
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index deb53ae..25533dc 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
 vmx_realmode_emulate_one(hvmemul_ctxt);
 
-if ( vio->io_state != HVMIO_none || vio->mmio_retry )
+if ( vio->io_state != STATE_IOREQ_NONE || vio->mmio_retry )
 break;
 
 /* Stop emulating unless our segment state is not safe */
@@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 }
 
 /* Need to emulate next time if we've started an IO operation */
-if ( vio->io_state != HVMIO_none )
+if ( vio->io_state != STATE_IOREQ_NONE )
 curr->arch.hvm_vmx.vmx_emulate = 1;
 
 if ( !curr->arch.hvm_vmx.vmx_emulate && 

Re: [Xen-devel] [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ([v7][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy):
 This patch introduces user configurable parameters to specify RDM
 resource and according policies,
...

  int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
  {
 +/* We'd like to force reserve rdm specific to a device by default.*/
 +if ( pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
   ^

I have just spotted that spurious whitespace.  However I won't block
this for that.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com

(actually).

I would appreciate it if you could ensure that this is fixed in any
repost.  You may retain my ack if you do that.  Committers should feel
free to fix it on commit.

Thanks,
Ian.



Re: [Xen-devel] [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ([v7][PATCH 13/16] libxl: construct e820 map with RDM 
information for HVM guest):
 Here we'll construct a basic guest e820 table via
 XENMEM_set_memory_map. This table includes lowmem, highmem
 and RDMs if they exist, and hvmloader would need this info
 later.
 
 Note this guest e820 table would be same as before if the
 platform has no any RDM or we disable RDM (by default).
...
  tools/libxl/libxl_dom.c  |  5 +++
  tools/libxl/libxl_internal.h | 24 +
  tools/libxl/libxl_x86.c  | 83 
 
...
 diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
 index 62ef120..41da479 100644
 --- a/tools/libxl/libxl_dom.c
 +++ b/tools/libxl/libxl_dom.c
 @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
  goto out;
  }
  
 +if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
 +LOG(ERROR, "setting domain memory map failed");
 +goto out;
 +}

This is platform-independent code, isn't it ?  In which case this will
break the build on ARM, I think.

Would an ARM maintainer please confirm.

Aside from that I have no issues with this patch.

Ian.



Re: [Xen-devel] [PATCH v9 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-09 Thread Konrad Rzeszutek Wilk
On Wed, Jul 08, 2015 at 09:50:26PM -0400, elena.ufimts...@oracle.com wrote:
 From: Elena Ufimtseva elena.ufimts...@oracle.com
 
 On some platforms RMRR regions may be not specified
 in ACPI and thus will not be mapped 1:1 in dom0. This
 causes IO Page Faults and prevents dom0 from booting
 in PVH mode.
 New Xen command line option rmrr allows to specify
 such devices and memory regions. These regions are added
 to the list of RMRR defined in ACPI if the device
 is present in system. As a result, additional RMRRs will
 be mapped 1:1 in dom0 with correct permissions.
 
 Mentioned above problems were discovered during PVH work with
 ThinkCentre M and Dell 5600T. No official documentation
 was found so far in regards to what devices and why cause this.
 Experiments show that ThinkCentre M USB devices with enabled
 debug port generate DMA read transactions to the regions of
 memory marked reserved in host e820 map.
 For Dell 5600T the device and faulting addresses are not found yet.
 
 For detailed history of the discussion please check following threads:
 http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
 http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html
 
 Format for rmrr Xen command line option:
 rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
 If grub2 used and multiple ranges are specified, ';' should be
 quoted/escaped, refer to grub2 manual for more information.
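The "start-end" range syntax described above can be parsed with standard C. A hypothetical standalone sketch (the actual Xen parser lives in dmar.c and differs in detail; values are treated here as hexadecimal page frame numbers for simplicity):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse "start[-end]" as hexadecimal page frame numbers. If "-end" is
 * omitted, the range covers a single page, matching the documented
 * behaviour. Returns 0 on success, -1 on malformed or inverted ranges.
 */
static int parse_rmrr_range(const char *s, unsigned long *base_pfn,
                            unsigned long *end_pfn)
{
    char *stop;

    *base_pfn = strtoul(s, &stop, 16);
    if (stop == s)
        return -1;                     /* no digits at all */

    if (*stop == '-') {
        const char *e = stop + 1;
        *end_pfn = strtoul(e, &stop, 16);
        if (stop == e || *stop != '\0')
            return -1;                 /* missing or trailing junk */
    } else if (*stop == '\0') {
        *end_pfn = *base_pfn;          /* one page */
    } else {
        return -1;
    }

    return *base_pfn <= *end_pfn ? 0 : -1;
}
```

Rejecting inverted ranges at parse time mirrors the "Invalid RMRR Range" check the patch performs in add_extra_rmrr().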
 

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Thanks!
 Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
 ---
  docs/misc/xen-command-line.markdown |  13 +++
  xen/drivers/passthrough/vtd/dmar.c  | 209 
 +++-
  2 files changed, 221 insertions(+), 1 deletion(-)
 
 diff --git a/docs/misc/xen-command-line.markdown 
 b/docs/misc/xen-command-line.markdown
 index aa684c0..f307f3d 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -1197,6 +1197,19 @@ Specify the host reboot method.
  'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
   default it will use that method first).
  
 +### rmrr
 +> `= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]`
 +
 +Define RMRR units that are missing from ACPI table along with device they
 +belong to and use them for 1:1 mapping. End addresses can be omitted and one
 +page will be mapped. The ranges are inclusive when start and end are 
 specified.
 +If segment of the first device is not specified, segment zero will be used.
 +If other segments are not specified, first device segment will be used.
 +If a segment is specified for other than the first device and it does not 
 match
 +the one specified for the first one, an error will be reported.
 +Note: grub2 requires to escape or use quotations if special characters are 
 used,
 +namely ';', refer to the grub2 documentation if multiple ranges are 
 specified.
 +
  ### ro-hpet
  > `= boolean`
  
 diff --git a/xen/drivers/passthrough/vtd/dmar.c 
 b/xen/drivers/passthrough/vtd/dmar.c
 index a8e1e5d..f62fb02 100644
 --- a/xen/drivers/passthrough/vtd/dmar.c
 +++ b/xen/drivers/passthrough/vtd/dmar.c
 @@ -869,6 +869,145 @@ out:
  return ret;
  }
  
 +#define MAX_EXTRA_RMRR_PAGES 16
 +#define MAX_EXTRA_RMRR 10
 +
 +/* RMRR units derived from command line rmrr option. */
 +#define MAX_EXTRA_RMRR_DEV 20
 +struct extra_rmrr_unit {
 +struct list_head list;
 +unsigned long base_pfn, end_pfn;
 +unsigned int dev_count;
 +u32 sbdf[MAX_EXTRA_RMRR_DEV];
 +};
 +static __initdata unsigned int nr_rmrr;
 +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
 +
 +/* Macro for RMRR inclusive range formatting. */
 +#define PRI_RMRR(s,e) "[%lx-%lx]"
 +
 +static void __init add_extra_rmrr(void)
 +{
 +struct acpi_rmrr_unit *acpi_rmrr;
 +struct acpi_rmrr_unit *rmrru;
 +unsigned int dev, seg, i, j;
 +unsigned long pfn;
 +bool_t overlap;
 +
 +for ( i = 0; i < nr_rmrr; i++ )
 +{
 +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
 +{
 +printk(XENLOG_ERR VTDPREFIX
 +   "Invalid RMRR Range "PRI_RMRR(s,e)"\n",
 +   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
 +continue;
 +}
 +
 +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
 + MAX_EXTRA_RMRR_PAGES )
 +{
 +printk(XENLOG_ERR VTDPREFIX
 +   "RMRR range "PRI_RMRR(s,e)" exceeds "
 +   __stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
 +   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
 +continue;
 +}
 +
 +for ( j = 0; j < nr_rmrr; j++ )
 +{
 +if ( i != j &&
 + extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn &&
 + extra_rmrr_units[j].base_pfn <= extra_rmrr_units[i].end_pfn )
 +{
 + 

Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

- wei.l...@citrix.com wrote:

 On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote:
   On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote:
   - jbeul...@suse.com wrote:
On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote:
You are right, it needs to be rebased. I can post later rebased
 on
   memory 
leak fix version, if you think it's a way to go.
   
   I didn't look at v9 yet, and can't predict when I will be able
 to.
   
   Would you like me to post v10 with memory leak patch included in
 the 
   patchset before you start looking at v9?
  
  If there is a dependency on the changes in the leak fix v6, then
  this would be a good idea. If not, you can keep things as they are
  now. I view the entire set more as a bug fix than a feature anyway,
  and hence see no reason not to get this in after the freeze. But
 I'm
  adding Wei just in case...
  
 

Thanks Jan.
The dependency exists on memory leak patch, so I will add it to this series and 
squash the first patch from v9.
 
 I just looked at v9. The first three patches are quite mechanical.
 The
 fourth patch is relatively bigger but it's also quite straightforward
 (mostly parsing input). All in all, this series itself is
 self-contained.
 
 I don't think OSSTest is able to test that, so it would not cause
 visible regression on our side.
 
 I also agree it's a bug fix. Preferably this series should be applied
 before first RC.
 
 Wei.

Thank you Wei.
 
  Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ([v7][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):
 While building a VM, HVM domain builder provides struct hvm_info_table{}
 to help hvmloader. Currently it includes two fields to construct guest
 e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
 check them to fix any conflict with RDM.
...
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Wei Liu wei.l...@citrix.com
 Acked-by: Wei Liu wei.l...@citrix.com
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 Reviewed-by: Kevin Tian kevint.t...@intel.com

I have found a few things in this patch which I would like to see
improved.  See below.

Given how late I am with this review, I do not feel that I should be
nacking it at this time.  You have a tools ack from Wei, so my
comments are not a blocker for this series.

But if you need to respin, please take these comments into account,
and consider which are feasible to fix in the time available.  If you
are respinning this series targeting Xen 4.7 or later, please address
all of the points I make below.

Thanks.


 +int libxl__domain_device_construct_rdm(libxl__gc *gc,
 +   libxl_domain_config *d_config,
 +   uint64_t rdm_mem_boundary,
 +   struct xc_hvm_build_args *args)
...
 +uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<\
32);
...
 +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_\
pcidevs)

There are quite a few of these long lines, which should be wrapped.
See tools/libxl/CODING_STYLE.

 +d_config->num_rdms = nr_entries;
 +d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
 +d_config->num_rdms * sizeof(libxl_device_rdm));

This code is remarkably similar to a function later on which adds an
rdm.  Please can you factor it out.

 +} else
 +d_config->num_rdms = 0;

Please can you put { } around the else block too.  I don't think this
mixed style is good.

 +for (j = 0; j < d_config->num_rdms; j++) {
 +if (d_config->rdms[j].start ==
 + (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)

This construct
   (uint64_t)some_pfn << XC_PAGE_SHIFT
appears an awful lot.

I would prefer it if it were done in an inline function (or maybe a
macro).
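
[Editor's note: one possible shape for such a helper — names, placement, and the page shift are illustrative, not taken from the patch:]

```c
#include <assert.h>
#include <stdint.h>

#define XC_PAGE_SHIFT 12  /* assumed x86 page shift, for illustration */

/* Widen the pfn to 64 bits before shifting so the result cannot be
 * truncated on a 32-bit build. */
static inline uint64_t pfn_to_paddr(unsigned long pfn)
{
    return (uint64_t)pfn << XC_PAGE_SHIFT;
}
```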


 +libxl_domain_build_info *const info = &d_config->b_info;
 +/*
 + * Currently we fix this as 2G to guarantte how to handle
 ^

Should read guarantee.

 +ret = libxl__domain_device_construct_rdm(gc, d_config,
 + rdm_mem_boundary,
 + args);
 +if (ret) {
 +LOG(ERROR, checking reserved device memory failed);
 +goto out;
 +}

`rc' should be used here rather than `ret'.  (It is unfortunate that
this function has poor style already, but it would be best not to make
it worse.)


Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 07/27] tools/libxl: Extra management APIs for the save helper

2015-07-09 Thread Andrew Cooper
With migration v2, there are several moving parts needing to be juggled at
once.  This requires the error handling logic to be able to query the state of
each moving part, possibly before they have been started, and be able to
cancel them.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
Since v1:
 * Add an _init() function which allows _inuse() to be safe to call even
   before the save helper has started.
---
 tools/libxl/libxl_internal.h |9 +
 tools/libxl/libxl_save_callout.c |   17 ++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 588cfb8..4bd6ea1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3272,6 +3272,15 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
 _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
int rc, int retval, int errnoval);
 
+_hidden void libxl__save_helper_init(libxl__save_helper_state *shs);
+_hidden void libxl__save_helper_abort(libxl__egc *egc,
+  libxl__save_helper_state *shs);
+
+static inline bool libxl__save_helper_inuse(const libxl__save_helper_state *shs)
+{
+    return libxl__ev_child_inuse(&shs->child);
+}
+
 /* Each time the dm needs to be saved, we must call suspend and then save */
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
libxl__domain_suspend_state *dss);
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1136b79..cd18cd2 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -146,6 +146,13 @@ void 
libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
     shs->egc = 0;
 }
 
+void libxl__save_helper_init(libxl__save_helper_state *shs)
+{
+    libxl__ao_abortable_init(&shs->abrt);
+    libxl__ev_fd_init(&shs->readable);
+    libxl__ev_child_init(&shs->child);
+}
+
 /*- helper execution -*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
@@ -167,9 +174,7 @@ static void run_helper(libxl__egc *egc, 
libxl__save_helper_state *shs,
     shs->rc = 0;
     shs->completed = 0;
     shs->pipes[0] = shs->pipes[1] = 0;
-    libxl__ao_abortable_init(&shs->abrt);
-    libxl__ev_fd_init(&shs->readable);
-    libxl__ev_child_init(&shs->child);
+    libxl__save_helper_init(shs);
 
     shs->abrt.ao = shs->ao;
     shs->abrt.callback = helper_stop;
@@ -279,6 +284,12 @@ static void helper_stop(libxl__egc *egc, 
libxl__ao_abortable *abrt, int rc)
     libxl__kill(gc, shs->child.pid, SIGTERM, "save/restore helper");
 }
 
+void libxl__save_helper_abort(libxl__egc *egc,
+  libxl__save_helper_state *shs)
+{
+    helper_stop(egc, &shs->abrt, ERROR_FAIL);
+}
+
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
int fd, short events, short revents)
 {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 04/27] tools/libxl: Introduce libxl__kill()

2015-07-09 Thread Andrew Cooper
as a wrapper around kill(2), and use it in preference to sendsig in
libxl_save_callout.c.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
Logically new in v2 - split out from a v1 change which was itself a
cherrypick-and-modify from the AO Abort series
---
 tools/libxl/libxl_aoutils.c  |   15 +++
 tools/libxl/libxl_internal.h |2 ++
 tools/libxl/libxl_save_callout.c |   10 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 0931eee..274ef39 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -621,3 +621,18 @@ bool libxl__async_exec_inuse(const libxl__async_exec_state 
*aes)
 assert(time_inuse == child_inuse);
 return child_inuse;
 }
+
+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what)
+{
+    int r = kill(pid, sig);
+    if (r) LOGE(WARN, "failed to kill() %s [%lu] (signal %d)",
+                what, (unsigned long)pid, sig);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 19fc425..9147de1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2244,6 +2244,8 @@ struct libxl__async_exec_state {
 int libxl__async_exec_start(libxl__async_exec_state *aes);
 bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
 
+void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what);
+
 /*- device addition/removal -*/
 
 typedef struct libxl__ao_device libxl__ao_device;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 087c2d5..b82a5c1 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -244,12 +244,6 @@ static void run_helper(libxl__egc *egc, 
libxl__save_helper_state *shs,
 libxl__carefd_close(childs_pipes[1]);
 helper_failed(egc, shs, rc);;
 }
-static void sendsig(libxl__gc *gc, libxl__save_helper_state *shs, int sig)
-{
-    int r = kill(shs->child.pid, sig);
-    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu] (signal %d)",
-                (unsigned long)shs->child.pid, sig);
-}
 
 static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
   int rc)
@@ -266,7 +260,7 @@ static void helper_failed(libxl__egc *egc, 
libxl__save_helper_state *shs,
 return;
 }
 
-sendsig(gc, shs, SIGKILL);
+    libxl__kill(gc, shs->child.pid, SIGKILL, "save/restore helper");
 }
 
 static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc)
@@ -282,7 +276,7 @@ static void helper_stop(libxl__egc *egc, 
libxl__ao_abortable *abrt, int rc)
     if (!shs->rc)
         shs->rc = rc;
 
-sendsig(gc, shs, SIGTERM);
+    libxl__kill(gc, shs->child.pid, SIGTERM, "save/restore helper");
 }
 
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 05/27] tools/libxl: Stash all restore parameters in domain_create_state

2015-07-09 Thread Andrew Cooper
Shortly more parameters will appear, and this saves unboxing each one.
libxl_domain_restore_params is mandatory for restore streams, and ignored for
plain creation.  The old 'checkpointed_stream' was incorrectly identified as a
private parameter when it was in fact public.

No functional change.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
Reviewed-by: Yang Hongyang yan...@cn.fujitsu.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
Since v1:
 * Gate validity on restore_fd being valid.
---
 tools/libxl/libxl_create.c   |   13 +++--
 tools/libxl/libxl_internal.h |2 +-
 tools/libxl/libxl_save_callout.c |2 +-
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b785ddd..61515da 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1512,8 +1512,8 @@ static void domain_create_cb(libxl__egc *egc,
  int rc, uint32_t domid);
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
-uint32_t *domid,
-int restore_fd, int checkpointed_stream,
+uint32_t *domid, int restore_fd,
+const libxl_domain_restore_params *params,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
@@ -1526,8 +1526,9 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = restore_fd;
+    if (restore_fd > -1)
+        cdcs->dcs.restore_params = *params;
     cdcs->dcs.callback = domain_create_cb;
-    cdcs->dcs.checkpointed_stream = checkpointed_stream;
     libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
     cdcs->domid_out = domid;
 
@@ -1553,7 +1554,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
-return do_domain_create(ctx, d_config, domid, -1, 0,
+return do_domain_create(ctx, d_config, domid, -1, NULL,
 ao_how, aop_console_how);
 }
 
@@ -1563,8 +1564,8 @@ int libxl_domain_create_restore(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 const libxl_asyncop_how *ao_how,
 const libxl_asyncprogress_how *aop_console_how)
 {
-return do_domain_create(ctx, d_config, domid, restore_fd,
-                            params->checkpointed_stream, ao_how, aop_console_how);
+return do_domain_create(ctx, d_config, domid, restore_fd, params,
+ao_how, aop_console_how);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9147de1..5ab945a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3217,11 +3217,11 @@ struct libxl__domain_create_state {
 libxl_domain_config *guest_config;
 libxl_domain_config guest_config_saved; /* vanilla config */
 int restore_fd;
+libxl_domain_restore_params restore_params;
 libxl__domain_create_cb *callback;
 libxl_asyncprogress_how aop_console_how;
 /* private to domain_create */
 int guest_domid;
-int checkpointed_stream;
 libxl__domain_build_state build_state;
 libxl__bootloader_state bl;
 libxl__stub_dm_spawn_state dmss;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index b82a5c1..80aae1b 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -60,7 +60,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 state-store_domid, state-console_port,
 state-console_domid,
 hvm, pae, superpages,
-        cbflags, dcs->checkpointed_stream,
+        cbflags, dcs->restore_params.checkpointed_stream,
 };
 
 dcs-shs.ao = ao;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 03/27] tools/libxl: Introduce ROUNDUP()

2015-07-09 Thread Andrew Cooper
This is the same as is used by libxc.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl_internal.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5235d25..19fc425 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -110,6 +110,9 @@
 
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
+#define ROUNDUP(_val, _order)   \
+    (((unsigned long)(_val)+(1UL<<(_order))-1) & ~((1UL<<(_order))-1))
+
 #define min(X, Y) ({ \
 const typeof (X) _x = (X);   \
 const typeof (Y) _y = (Y);   \
-- 
1.7.10.4
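
[Editor's note: for readers unfamiliar with the idiom, the ROUNDUP macro added above rounds a value up to the next multiple of 2^_order. A standalone sketch of its behaviour, not part of the patch:]

```c
#include <assert.h>

/* Same definition as the patch introduces: round _val up to the next
 * multiple of 2^_order by adding (2^_order - 1) and masking off the
 * low bits. */
#define ROUNDUP(_val, _order)                                           \
    (((unsigned long)(_val)+(1UL<<(_order))-1) & ~((1UL<<(_order))-1))
```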


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 09/27] docs: Libxl migration v2 stream specification

2015-07-09 Thread Andrew Cooper
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 docs/specs/libxl-migration-stream.pandoc |  205 ++
 1 file changed, 205 insertions(+)
 create mode 100644 docs/specs/libxl-migration-stream.pandoc

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
new file mode 100644
index 000..7235317
--- /dev/null
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -0,0 +1,205 @@
+% LibXenLight Domain Image Format
+% Andrew Cooper andrew.coop...@citrix.com
+% Draft B
+
+Introduction
+
+
+For the purposes of this document, `xl` is used as a representation of any
+implementer of the `libxl` API.  `xl` should be considered completely
+interchangeable with alternates, such as `libvirt` or `xenopsd-xl`.
+
+Purpose
+---
+
+The _domain image format_ is the context of a running domain used for
+snapshots of a domain or for transferring domains between hosts during
+migration.
+
+There are a number of problems with the domain image format used in Xen 4.5
+and earlier (the _legacy format_)
+
+* There is no `libxl` context information.  `xl` is required to send certain
+  pieces of `libxl` context itself.
+
+* The contents of the stream is passed directly through `libxl` to `libxc`.
+  The legacy `libxc` format contained some information which belonged at the
+  `libxl` level, resulting in awkward layer violation to return the
+  information back to `libxl`.
+
+* The legacy `libxc` format was inextensible, causing inextensibility in the
+  legacy `libxl` handling.
+
+This design addresses the above points, allowing for a completely
+self-contained, extensible stream with each layer responsible for its own
+appropriate information.
+
+
+Not Yet Included
+
+
+The following features are not yet fully specified and will be
+included in a future draft.
+
+* Remus
+
+* ARM
+
+
+Overview
+
+
+The image format consists of a _Header_, followed by 1 or more _Records_.
+Each record consists of a type and length field, followed by any type-specific
+data.
+
+\clearpage
+
+Header
+==
+
+The header identifies the stream as a `libxl` stream, including the version of
+this specification that it complies with.
+
+All fields in this header shall be in _big-endian_ byte order, regardless of
+the setting of the endianness bit.
+
+ 0 1 2 3 4 5 6 7 octet
++-+
+| ident   |
++---+-+
+| version   | options |
++---+-+
+
+
+Field   Description
+--- 
+ident   0x4c6962786c466d74 ("LibxlFmt" in ASCII).
+
+version 0x00000002.  The version of this specification.
+
+options bit 0: Endianness.  0 = little-endian, 1 = big-endian.
+
+bit 1: Legacy Format. If set, this stream was created by
+  the legacy conversion tool.
+
+bits 2-31: Reserved.
+
+
+The endianness shall be 0 (little-endian) for images generated on an
+i386, x86_64, or arm host.
+
+\clearpage
+
+
+Records
+===
+
+A record has a record header, type specific data and a trailing footer.  If
+`length` is not a multiple of 8, the body is padded with zeroes to align the
+end of the record on an 8 octet boundary.
+
+ 0 1 2 3 4 5 6 7 octet
++---+-+
+| type  | body_length |
++---+---+-+
+| body... |
+...
+|   | padding (0 to 7 octets) |
++---+-+
+
+
+FieldDescription
+---  ---
+type 0x00000000: END
+
+ 0x00000001: LIBXC_CONTEXT
+
+ 0x00000002: XENSTORE_DATA
+
+ 0x00000003: EMULATOR_CONTEXT
+
+ 0x00000004 - 0x7FFFFFFF: Reserved for future _mandatory_
+ records.
+
+ 0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
+ records.
+
+body_length  Length in octets of the record body.
+
+body Content of the record.
+
+padding  0 to 7 octets of zeros to pad the whole record to a multiple
+ of 8 octets.
+
+
+\clearpage
+
+END
+
+
+An end record marks the end of 
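
[Editor's note: two details of the spec above lend themselves to a quick sanity check — the ident constant decodes to "LibxlFmt" as eight big-endian octets, and record bodies are zero-padded to the next 8-octet boundary. A small C sketch, an illustration rather than part of the patch:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode the spec's ident constant as eight big-endian octets. */
static void ident_bytes(uint64_t ident, char out[9])
{
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (char)(ident >> (8 * (7 - i)));
    out[8] = '\0';
}

/* Number of zero padding octets needed to align a record body to an
 * 8-octet boundary, per the "padding (0 to 7 octets)" rule. */
static unsigned int record_padding(unsigned int body_length)
{
    return (8 - (body_length & 7)) & 7;
}
```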

Re: [Xen-devel] [RFC Patch V1 06/12] PCI: Use for_pci_msi_entry() to access MSI device list

2015-07-09 Thread Konrad Rzeszutek Wilk
On Thu, Jul 09, 2015 at 04:00:41PM +0800, Jiang Liu wrote:
 Use accessor for_pci_msi_entry() to access MSI device list, so we could easily
 move msi_list from struct pci_dev into struct device later.
 
 Signed-off-by: Jiang Liu jiang@linux.intel.com
 ---
  drivers/pci/msi.c  |   39 ---
  drivers/pci/xen-pcifront.c |2 +-

Acked-by on the Xen bits.

  2 files changed, 21 insertions(+), 20 deletions(-)
 
 diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
 index 7b4c20c9f9ca..d09afa78d7a1 100644
 --- a/drivers/pci/msi.c
 +++ b/drivers/pci/msi.c
 @@ -131,7 +131,7 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int 
 nvec, int type)
 	if (type == PCI_CAP_ID_MSI && nvec > 1)
 		return 1;
 
-	list_for_each_entry(entry, &dev->msi_list, list) {
+	for_each_pci_msi_entry(entry, dev) {
 		ret = arch_setup_msi_irq(dev, entry);
 		if (ret < 0)
   return ret;
 @@ -151,7 +151,7 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
   int i;
   struct msi_desc *entry;
  
-	list_for_each_entry(entry, &dev->msi_list, list)
+	for_each_pci_msi_entry(entry, dev)
 		if (entry->irq)
 			for (i = 0; i < entry->nvec_used; i++)
 				arch_teardown_msi_irq(entry->irq + i);
 @@ -168,7 +168,7 @@ static void default_restore_msi_irq(struct pci_dev *dev, 
 int irq)
  
   entry = NULL;
 	if (dev->msix_enabled) {
-		list_for_each_entry(entry, &dev->msi_list, list) {
+		for_each_pci_msi_entry(entry, dev) {
 			if (irq == entry->irq)
 				break;
   }
 @@ -282,7 +282,7 @@ void default_restore_msi_irqs(struct pci_dev *dev)
  {
   struct msi_desc *entry;
  
-	list_for_each_entry(entry, &dev->msi_list, list)
+	for_each_pci_msi_entry(entry, dev)
 		default_restore_msi_irq(dev, entry->irq);
  }
  
 @@ -363,21 +363,22 @@ EXPORT_SYMBOL_GPL(pci_write_msi_msg);
  
  static void free_msi_irqs(struct pci_dev *dev)
  {
+	struct list_head *msi_list = dev_to_msi_list(&dev->dev);
   struct msi_desc *entry, *tmp;
   struct attribute **msi_attrs;
   struct device_attribute *dev_attr;
   int i, count = 0;
  
-	list_for_each_entry(entry, &dev->msi_list, list)
+	for_each_pci_msi_entry(entry, dev)
 		if (entry->irq)
 			for (i = 0; i < entry->nvec_used; i++)
 				BUG_ON(irq_has_action(entry->irq + i));
  
   pci_msi_teardown_msi_irqs(dev);
  
-	list_for_each_entry_safe(entry, tmp, &dev->msi_list, list) {
+	list_for_each_entry_safe(entry, tmp, msi_list, list) {
 		if (entry->msi_attrib.is_msix) {
-			if (list_is_last(&entry->list, &dev->msi_list))
+			if (list_is_last(&entry->list, msi_list))
 				iounmap(entry->mask_base);
   }
  
 @@ -448,7 +449,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
  
 	if (!dev->msix_enabled)
 		return;
-	BUG_ON(list_empty(&dev->msi_list));
+	BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
  
   /* route the table */
   pci_intx_for_msi(dev, 0);
 @@ -456,7 +457,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
   PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
  
   arch_restore_msi_irqs(dev);
-	list_for_each_entry(entry, &dev->msi_list, list)
+	for_each_pci_msi_entry(entry, dev)
 		msix_mask_irq(entry, entry->masked);
  
   pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 @@ -501,7 +502,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
   int count = 0;
  
   /* Determine how many msi entries we have */
-	list_for_each_entry(entry, &pdev->msi_list, list)
+	for_each_pci_msi_entry(entry, pdev)
   ++num_msi;
   if (!num_msi)
   return 0;
 @@ -510,7 +511,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
   msi_attrs = kzalloc(sizeof(void *) * (num_msi + 1), GFP_KERNEL);
   if (!msi_attrs)
   return -ENOMEM;
-	list_for_each_entry(entry, &pdev->msi_list, list) {
+	for_each_pci_msi_entry(entry, pdev) {
   msi_dev_attr = kzalloc(sizeof(*msi_dev_attr), GFP_KERNEL);
   if (!msi_dev_attr)
   goto error_attrs;
 @@ -599,7 +600,7 @@ static int msi_verify_entries(struct pci_dev *dev)
  {
   struct msi_desc *entry;
  
-	list_for_each_entry(entry, &dev->msi_list, list) {
+	for_each_pci_msi_entry(entry, dev) {
 		if (!dev->no_64bit_msi || !entry->msg.address_hi)
 			continue;
 		dev_err(&dev->dev, "Device has broken 64-bit MSI but arch
 @@ -636,7 +637,7 @@ static int msi_capability_init(struct pci_dev *dev, int 
 nvec)
 	mask = msi_mask(entry->msi_attrib.multi_cap);
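
[Editor's note: the point of the `for_each_pci_msi_entry()` accessor introduced above is that callers stop naming the struct that holds the list, so `msi_list` can later move from `struct pci_dev` into `struct device` without touching them. A simplified, self-contained sketch of the pattern — not the kernel's actual definitions:]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive circular list, standing in for the kernel's
 * struct list_head. */
struct list_head { struct list_head *next; };

struct msi_desc {
    int irq;
    struct list_head list;
};

struct device  { struct list_head msi_list; };
struct pci_dev { struct device dev; };

/* Recover the containing structure from a pointer to its list member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The accessor hides where the list lives. */
#define dev_to_msi_list(d) (&(d)->msi_list)

/* Iterate over all MSI descriptors of a pci_dev via the accessor. */
#define for_each_pci_msi_entry(e, pdev)                                 \
    for ((e) = container_of(dev_to_msi_list(&(pdev)->dev)->next,        \
                            struct msi_desc, list);                     \
         &(e)->list != dev_to_msi_list(&(pdev)->dev);                   \
         (e) = container_of((e)->list.next, struct msi_desc, list))
```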
   

[Xen-devel] [PATCH 7/9] libxl: event tests: Provide miniature poll machinery

2015-07-09 Thread Ian Jackson
This allows a test case to have a poll-based event loop with exact
control of the sequence of operations.

Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
---
 tools/libxl/test_common.c |   44 +++-
 tools/libxl/test_common.h |   15 +++
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/test_common.c b/tools/libxl/test_common.c
index 83b94eb..c6bbbab 100644
--- a/tools/libxl/test_common.c
+++ b/tools/libxl/test_common.c
@@ -12,4 +12,46 @@ void test_common_setup(int level)
 
 int rc = libxl_ctx_alloc(ctx, LIBXL_VERSION, 0, logger);
 assert(!rc);
-}
+}
+
+struct timeval now;
+
+void test_common_get_now(void)
+{
+    int r = gettimeofday(&now, 0);  assert(!r);
+}
+
+int poll_nfds, poll_nfds_allocd;
+struct pollfd *poll_fds;
+int poll_timeout;
+
+void test_common_beforepoll(void)
+{
+for (;;) {
+test_common_get_now();
+
+poll_timeout = -1;
+poll_nfds = poll_nfds_allocd;
+        int rc = libxl_osevent_beforepoll(ctx, &poll_nfds, poll_fds,
+                                          &poll_timeout, now);
+if (!rc) return;
+assert(rc == ERROR_BUFFERFULL);
+
+        assert(poll_nfds > poll_nfds_allocd);
+poll_fds = realloc(poll_fds, poll_nfds * sizeof(poll_fds[0]));
+assert(poll_fds);
+poll_nfds_allocd = poll_nfds;
+}
+}
+
+void test_common_dopoll(void) {
+errno = 0;
+int r = poll(poll_fds, poll_nfds, poll_timeout);
+    fprintf(stderr, "poll: r=%d errno=%s\n", r, strerror(errno));
+}
+
+void test_common_afterpoll(void)
+{
+test_common_get_now();
+libxl_osevent_afterpoll(ctx, poll_nfds, poll_fds, now);
+}
diff --git a/tools/libxl/test_common.h b/tools/libxl/test_common.h
index 8b2471e..10c7166 100644
--- a/tools/libxl/test_common.h
+++ b/tools/libxl/test_common.h
@@ -6,9 +6,24 @@
 #include <assert.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
 
 void test_common_setup(int level);
 
 extern libxl_ctx *ctx;
 
+void test_common_get_now(void);
+
+extern struct timeval now;
+
+void test_common_beforepoll(void);
+void test_common_dopoll(void);
+void test_common_afterpoll(void);
+
+extern int poll_nfds, poll_nfds_allocd;
+extern struct pollfd *poll_fds;
+extern int poll_timeout;
+
 #endif /*TEST_COMMON_H*/
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/9] libxl: poll: Make libxl__poller_get have only one success return path

2015-07-09 Thread Ian Jackson
In preparation for doing some more work on successful exit.

No functional change.

Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
CC: Jim Fehlig jfeh...@suse.com
---
 tools/libxl/libxl_event.c |   15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 9072df4..b332dd7 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1627,15 +1627,14 @@ libxl__poller *libxl__poller_get(libxl__gc *gc)
     libxl__poller *p = LIBXL_LIST_FIRST(&CTX->pollers_idle);
 if (p) {
 LIBXL_LIST_REMOVE(p, entry);
-return p;
-}
-
-p = libxl__zalloc(NOGC, sizeof(*p));
+} else {
+p = libxl__zalloc(NOGC, sizeof(*p));
 
-rc = libxl__poller_init(gc, p);
-if (rc) {
-free(p);
-return NULL;
+rc = libxl__poller_init(gc, p);
+if (rc) {
+free(p);
+return NULL;
+}
 }
 
 return p;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/9] libxl: poll: Avoid fd deregistration race POLLNVAL

2015-07-09 Thread Ian Jackson
The first three patches in this series fix a race which I think is the
cause of a bug reported by Jim Fehlig (thanks, Jim).  They are IMO
backport candidates.

The next 5 provide a test case which reproduces the bug.  I actually
developed these patches in the opposite order, and have verified that
the without the fix, the test case fails the POLLNVAL assertion as
expected.

The final patch is a fix to another libxl event test.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 17/27] tools/libxl: Support converting a legacy stream to a v2 stream

2015-07-09 Thread Andrew Cooper
When a legacy stream is found, it needs to be converted to a v2 stream for the
reading logic.  This is done by exec()ing the python conversion utility.

One complication is that the caller of this interface needs to assume
ownership of the output fd, to prevent it being closed while still in use in a
datacopier.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
No major differences since v1.  It has gained a new ao_abrt as a result
of rebasing over the AO ABORT series, and has been made x86-specific.
---
 tools/libxl/Makefile|1 +
 tools/libxl/libxl_convert_callout.c |  172 +++
 tools/libxl/libxl_internal.h|   48 ++
 3 files changed, 221 insertions(+)
 create mode 100644 tools/libxl/libxl_convert_callout.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index c71c5fe..0ebc35a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -59,6 +59,7 @@ endif
 LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
+LIBXL_OBJS-$(CONFIG_X86) += libxl_convert_callout.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
 
 ifeq ($(CONFIG_NetBSD),y)
diff --git a/tools/libxl/libxl_convert_callout.c b/tools/libxl/libxl_convert_callout.c
new file mode 100644
index 000..02f3b82
--- /dev/null
+++ b/tools/libxl/libxl_convert_callout.c
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2014  Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include libxl_osdeps.h
+
+#include libxl_internal.h
+
+/*
+ * Infrastructure for converting a legacy migration stream into a libxl v2
+ * stream.
+ *
+ * This is done by fork()ing the python conversion script, which takes in a
+ * legacy stream, and puts out a suitably-formatted v2 stream.
+ */
+
+static void helper_failed(libxl__egc *egc,
+  libxl__conversion_helper_state *chs, int rc);
+static void helper_stop(libxl__egc *egc, libxl__ao_abortable*, int rc);
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+  pid_t pid, int status);
+static void helper_done(libxl__egc *egc,
+libxl__conversion_helper_state *chs);
+
+void libxl__conversion_helper_init(libxl__conversion_helper_state *chs)
+{
+    libxl__ao_abortable_init(&chs->abrt);
+    libxl__ev_child_init(&chs->child);
+}
+
+void libxl__convert_legacy_stream(libxl__egc *egc,
+  libxl__conversion_helper_state *chs)
+{
+    STATE_AO_GC(chs->ao);
+libxl__carefd *child_in = NULL, *child_out = NULL;
+int ret = 0;
+
+    chs->rc = 0;
+    libxl__conversion_helper_init(chs);
+
+    chs->abrt.ao = chs->ao;
+    chs->abrt.callback = helper_stop;
+    ret = libxl__ao_abortable_register(&chs->abrt);
+    if (ret) goto err;
+
+libxl__carefd_begin();
+int fds[2];
+if (libxl_pipe(CTX, fds)) {
+ret = ERROR_FAIL;
+libxl__carefd_unlock();
+goto err;
+}
+child_out = libxl__carefd_record(CTX, fds[0]);
+child_in  = libxl__carefd_record(CTX, fds[1]);
+libxl__carefd_unlock();
+
+    pid_t pid = libxl__ev_child_fork(gc, &chs->child, helper_exited);
+if (!pid) {
+char * const args[] =
+{
+            getenv("LIBXL_CONVERT_HELPER") ?:
+                LIBEXEC_BIN "/convert-legacy-stream",
+            "--in",     GCSPRINTF("%d", chs->legacy_fd),
+            "--out",    GCSPRINTF("%d", fds[1]),
+            "--width",
+#ifdef __i386__
+            "32",
+#else
+            "64",
+#endif
+            "--guest",  chs->hvm ? "hvm" : "pv",
+            "--format", "libxl",
+/* --verbose, */
+NULL,
+};
+
+        libxl_fd_set_cloexec(CTX, chs->legacy_fd, 0);
+libxl_fd_set_cloexec(CTX, libxl__carefd_fd(child_in), 0);
+
+libxl__exec(gc,
+-1, -1, -1,
+args[0], args, NULL);
+}
+
+libxl__carefd_close(child_in);
+    chs->v2_carefd = child_out;
+
+assert(!ret);
+return;
+
+ err:
+assert(ret);
+helper_failed(egc, chs, ret);
+}
+
+void libxl__convert_legacy_stream_abort(libxl__egc *egc,
+                                        libxl__conversion_helper_state *chs,
+                                        int rc)
+{
+    helper_stop(egc, &chs->abrt, rc);
+}
+
+static void helper_failed(libxl__egc *egc,
+   

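The helper is exec'd with the fd numbers passed on the command line; the fds themselves are simply inherited by the child. A minimal sketch of that argv construction, in Python for illustration only (the binary path is a hypothetical install location, and the fd numbers are placeholders, not libxl's real values):

```python
import sys

def converter_args(legacy_fd, out_fd, hvm):
    # Mirrors the argv built in libxl__convert_legacy_stream(); the
    # path below is a hypothetical LIBEXEC_BIN location.
    return [
        "/usr/lib/xen/bin/convert-legacy-stream",
        "--in",     str(legacy_fd),
        "--out",    str(out_fd),
        "--width",  "64" if sys.maxsize > 2**32 else "32",
        "--guest",  "hvm" if hvm else "pv",
        "--format", "libxl",
    ]

args = converter_args(3, 4, hvm=False)
assert args[args.index("--guest") + 1] == "pv"
```

The real helper additionally needs the two fds marked non-cloexec before the exec, which is what the `libxl_fd_set_cloexec()` calls above do.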
Re: [Xen-devel] [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-09 Thread Ian Jackson
Tiejun Chen writes ("[v7][PATCH 12/16] tools: introduce a new parameter to set
a predefined rdm boundary"):
 Previously we always fixed that predefined boundary at 2G to handle
 conflicts between memory and rdm, but now this predefined boundary
 can be changed with the parameter rdm_mem_boundary in the .cfg file.
 
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Wei Liu wei.l...@citrix.com
 Acked-by: Wei Liu wei.l...@citrix.com
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com

Acked-by: Ian Jackson ian.jack...@eu.citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 00/27] Libxl migration v2

2015-07-09 Thread Andrew Cooper
This series adds support for the libxl migration v2 stream, and untangles the
existing layering violations of the toolstack and qemu records.

It can be found on the branch libxl-migv2-v2
  git://xenbits.xen.org/people/andrewcoop/xen.git
  http://xenbits.xen.org/git-http/people/andrewcoop/xen.git

Major changes in v2 are being rebased over the libxl AO-abort series, and a
redesign of the internal logic to support Remus/COLO buffering and failover.

At the end of the series, legacy migration is no longer used.

The Remus code is untested by me.  All other combinations of
suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and
qemu-upstream), including 32 -> 64 bit migration (which was the underlying bug
causing us to write migration v2 in the first place).

Anyway, thoughts/comments welcome.  Please test!

~Andrew

Summary of Acks/Modified/New from v1

  N bsd-sys-queue-h-seddery: Massage `offsetof'
A   tools/libxc: Always compile the compat qemu variables into xc_sr_context
A   tools/libxl: Introduce ROUNDUP()
  N tools/libxl: Introduce libxl__kill()
AM  tools/libxl: Stash all restore parameters in domain_create_state
  N tools/libxl: Split libxl__domain_create_state.restore_fd in two
 M  tools/libxl: Extra management APIs for the save helper
AM  tools/xl: Mandatory flag indicating the format of the migration stream
docs: Libxl migration v2 stream specification
AM  tools/python: Libxc migration v2 infrastructure
AM  tools/python: Libxl migration v2 infrastructure
  N tools/python: Other migration infrastructure
AM  tools/python: Verification utility for v2 stream spec compliance
AM  tools/python: Conversion utility for legacy migration streams
 M  tools/libxl: Migration v2 stream format
 M  tools/libxl: Infrastructure for reading a libxl migration v2 stream
 M  tools/libxl: Support converting a legacy stream to a v2 stream
 M  tools/libxl: Convert a legacy stream if needed
 M  tools/libxc+libxl+xl: Restore v2 streams
 M  tools/libxl: Infrastructure for writing a v2 stream
 M  tools/libxc+libxl+xl: Save v2 streams
AM  docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
 M  tools/libxl: Write checkpoint records into the stream
 M  tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
 M  tools/libxl: Handle checkpoint records in a libxl migration v2 stream
A   tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
A   tools/libxl: Drop all knowledge of toolstack callbacks



Andrew Cooper (23):
  tools/libxc: Always compile the compat qemu variables into xc_sr_context
  tools/libxl: Introduce ROUNDUP()
  tools/libxl: Introduce libxl__kill()
  tools/libxl: Stash all restore parameters in domain_create_state
  tools/libxl: Split libxl__domain_create_state.restore_fd in two
  tools/libxl: Extra management APIs for the save helper
  tools/xl: Mandatory flag indicating the format of the migration stream
  docs: Libxl migration v2 stream specification
  tools/python: Libxc migration v2 infrastructure
  tools/python: Libxl migration v2 infrastructure
  tools/python: Other migration infrastructure
  tools/python: Verification utility for v2 stream spec compliance
  tools/python: Conversion utility for legacy migration streams
  tools/libxl: Support converting a legacy stream to a v2 stream
  tools/libxl: Convert a legacy stream if needed
  tools/libxc+libxl+xl: Restore v2 streams
  tools/libxc+libxl+xl: Save v2 streams
  docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams
  tools/libxl: Write checkpoint records into the stream
  tools/libx{c,l}: Introduce restore_callbacks.checkpoint()
  tools/libxl: Handle checkpoint records in a libxl migration v2 stream
  tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
  tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (1):
  bsd-sys-queue-h-seddery: Massage `offsetof'

Ross Lagerwall (3):
  tools/libxl: Migration v2 stream format
  tools/libxl: Infrastructure for reading a libxl migration v2 stream
  tools/libxl: Infrastructure for writing a v2 stream

 docs/specs/libxl-migration-stream.pandoc   |  217 ++
 tools/include/xen-external/bsd-sys-queue-h-seddery |2 +
 tools/libxc/Makefile   |2 -
 tools/libxc/include/xenguest.h |9 +
 tools/libxc/xc_sr_common.h |   12 +-
 tools/libxc/xc_sr_restore.c|   71 +-
 tools/libxc/xc_sr_restore_x86_hvm.c|  124 
 tools/libxc/xc_sr_save_x86_hvm.c   |   36 -
 tools/libxl/Makefile   |2 +
 tools/libxl/libxl.h|   19 +
 tools/libxl/libxl_aoutils.c|   15 +
 tools/libxl/libxl_convert_callout.c|  172 +
 tools/libxl/libxl_create.c |   86 ++-
 tools/libxl/libxl_dom.c|   65 +-
 tools/libxl/libxl_internal.h   |  192 -
 

[Xen-devel] [PATCH v2 01/27] bsd-sys-queue-h-seddery: Massage `offsetof'

2015-07-09 Thread Andrew Cooper
From: Ian Jackson ian.jack...@eu.citrix.com

For some reason BSD's queue.h uses `__offsetof'.  It expects it to
work just like offsetof.  So use offsetof.

Reported-by: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
---
 tools/include/xen-external/bsd-sys-queue-h-seddery |2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xen-external/bsd-sys-queue-h-seddery 
b/tools/include/xen-external/bsd-sys-queue-h-seddery
index 7a957e3..3f8716d 100755
--- a/tools/include/xen-external/bsd-sys-queue-h-seddery
+++ b/tools/include/xen-external/bsd-sys-queue-h-seddery
@@ -69,4 +69,6 @@ s/\b struct \s+ type \b/type/xg;
 
 s,^\#include.*sys/cdefs.*,/* $ */,xg;
 
+s,\b __offsetof \b ,offsetof,xg;
+
 s/\b( NULL )/0/xg;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
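The substitution above is purely textual: both `__offsetof` and `offsetof` compute the byte offset of a member within a struct. For readers who want to see what that value is, here is an illustrative check using Python's ctypes (not part of the patch; the struct is an arbitrary example):

```python
import ctypes

class Entry(ctypes.Structure):
    # Equivalent of: struct entry { void *next; int value; };
    _fields_ = [("next",  ctypes.c_void_p),
                ("value", ctypes.c_int)]

# offsetof(struct entry, next) is 0; offsetof(struct entry, value)
# equals the size of the preceding pointer member (given natural
# alignment, which ctypes follows like C does).
assert Entry.next.offset == 0
assert Entry.value.offset == ctypes.sizeof(ctypes.c_void_p)
```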


[Xen-devel] [PATCH v2 02/27] tools/libxc: Always compile the compat qemu variables into xc_sr_context

2015-07-09 Thread Andrew Cooper
This is safe (as the variables will simply be unused), and is required for
correct compilation when midway through untangling the libxc/libxl
interaction.

The #define is left in place to highlight that the variables can be removed
once the untangling is complete.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/xc_sr_common.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 565c5da..08c66db 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -307,10 +307,10 @@ struct xc_sr_context
 void *context;
 size_t contextsz;
 
-#ifdef XG_LIBXL_HVM_COMPAT
+/* #ifdef XG_LIBXL_HVM_COMPAT */
 uint32_t qlen;
 void *qbuf;
-#endif
+/* #endif */
 } restore;
 };
 } x86_hvm;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 08/27] tools/xl: Mandatory flag indicating the format of the migration stream

2015-07-09 Thread Andrew Cooper
Introduced at this point so the python stream conversion code has a concrete
ABI to use.  Later when libxl itself starts supporting a v2 stream, it will be
added to XL_MANDATORY_FLAG_ALL.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
v2: Expand commit message
---
 tools/libxl/xl_cmdimpl.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 971209c..26b1e7d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -109,6 +109,7 @@
*/
 
 #define XL_MANDATORY_FLAG_JSON (1U << 0) /* config data is in JSON format */
+#define XL_MANDATORY_FLAG_STREAMv2 (1U << 1) /* stream is v2 */
 #define XL_MANDATORY_FLAG_ALL  (XL_MANDATORY_FLAG_JSON)
 struct save_file_header {
 char magic[32]; /* savefileheader_magic */
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
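The point of a *mandatory* flag is that a consumer must refuse a stream whose header sets a flag it does not understand, rather than silently misparse it. A sketch of that check (hypothetical reader code, not taken from xl; only the flag values come from the patch):

```python
XL_MANDATORY_FLAG_JSON     = 1 << 0  # config data is in JSON format
XL_MANDATORY_FLAG_STREAMv2 = 1 << 1  # stream is v2

# Flags this (hypothetical, pre-v2) reader understands:
KNOWN_MANDATORY = XL_MANDATORY_FLAG_JSON

def check_header_flags(mandatory_flags):
    # Reject any mandatory flag outside the known set.
    unknown = mandatory_flags & ~KNOWN_MANDATORY
    if unknown:
        raise ValueError("unknown mandatory flags: %#x" % unknown)

check_header_flags(XL_MANDATORY_FLAG_JSON)          # accepted
try:
    check_header_flags(XL_MANDATORY_FLAG_STREAMv2)  # old reader must refuse
except ValueError:
    pass
```

This is why the flag is introduced before libxl produces v2 streams: the conversion script needs a concrete bit to set, and older xl binaries will then correctly reject streams they cannot parse.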


[Xen-devel] [PATCH v2 06/27] tools/libxl: Split libxl__domain_create_state.restore_fd in two

2015-07-09 Thread Andrew Cooper
In a future patch, we shall support automatically converting a legacy stream
to a v2 stream, in which case libxc needs to read from a different fd.

Simply overwriting restore_fd does not work; the two fds have different
lifecycles.  The restore_fd needs to be returned to its original state
before libxl_domain_create_restore() returns, while in the converted case, the
fd needs allocating and deallocating appropriately.

No functional change.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
New in v2
---
 tools/libxl/libxl_create.c   |2 +-
 tools/libxl/libxl_internal.h |2 +-
 tools/libxl/libxl_save_callout.c |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 61515da..be13204 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1525,7 +1525,7 @@ static int do_domain_create(libxl_ctx *ctx, 
libxl_domain_config *d_config,
     cdcs->dcs.guest_config = d_config;
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
-    cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     if (restore_fd > -1)
         cdcs->dcs.restore_params = *params;
     cdcs->dcs.callback = domain_create_cb;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5ab945a..588cfb8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3216,7 +3216,7 @@ struct libxl__domain_create_state {
 libxl__ao *ao;
 libxl_domain_config *guest_config;
 libxl_domain_config guest_config_saved; /* vanilla config */
-int restore_fd;
+int restore_fd, libxc_fd;
 libxl_domain_restore_params restore_params;
 libxl__domain_create_cb *callback;
 libxl_asyncprogress_how aop_console_how;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 80aae1b..1136b79 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -48,7 +48,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 
 /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
-    const int restore_fd = dcs->restore_fd;
+    const int restore_fd = dcs->libxc_fd;
     libxl__domain_build_state *const state = dcs->build_state;
 
 unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 26/27] tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc

2015-07-09 Thread Andrew Cooper
Libxl has now been fully adjusted not to need it.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/xc_sr_common.h  |5 --
 tools/libxc/xc_sr_restore.c |   18 -
 tools/libxc/xc_sr_restore_x86_hvm.c |  124 ---
 tools/libxc/xc_sr_save_x86_hvm.c|   36 --
 4 files changed, 183 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 1f4d4e4..64f6082 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -309,11 +309,6 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
-
-/* #ifdef XG_LIBXL_HVM_COMPAT */
-uint32_t qlen;
-void *qbuf;
-/* #endif */
 } restore;
 };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index c9b5213..2f6a763 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -627,9 +627,6 @@ static void cleanup(struct xc_sr_context *ctx)
         PERROR("Failed to clean up");
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-extern int read_qemu(struct xc_sr_context *ctx);
-#endif
 /*
  * Restore a domain.
  */
@@ -656,21 +653,6 @@ static int restore(struct xc_sr_context *ctx)
 goto err;
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-        if ( ctx->dominfo.hvm &&
-             (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) )
-        {
-            rc = read_qemu(ctx);
-            if ( rc )
-            {
-                if ( ctx->restore.buffer_all_records )
-                    goto remus_failover;
-                else
-                    goto err;
-            }
-        }
-#endif
-
         if ( ctx->restore.buffer_all_records &&
              rec.type != REC_TYPE_END &&
              rec.type != REC_TYPE_CHECKPOINT )
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c 
b/tools/libxc/xc_sr_restore_x86_hvm.c
index 6f5af0e..49d22c7 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -3,24 +3,6 @@
 
 #include xc_sr_common_x86.h
 
-#ifdef XG_LIBXL_HVM_COMPAT
-static int handle_toolstack(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
-    xc_interface *xch = ctx->xch;
-    int rc;
-
-    if ( !ctx->restore.callbacks || !ctx->restore.callbacks->toolstack_restore )
-        return 0;
-
-    rc = ctx->restore.callbacks->toolstack_restore(
-        ctx->domid, rec->data, rec->length, ctx->restore.callbacks->data);
-
-    if ( rc < 0 )
-        PERROR("restoring toolstack");
-    return rc;
-}
-#endif
-
 /*
  * Process an HVM_CONTEXT record from the stream.
  */
@@ -93,98 +75,6 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
 return 0;
 }
 
-#ifdef XG_LIBXL_HVM_COMPAT
-int read_qemu(struct xc_sr_context *ctx);
-int read_qemu(struct xc_sr_context *ctx)
-{
-    xc_interface *xch = ctx->xch;
-    char qemusig[21];
-    uint32_t qlen;
-    void *qbuf = NULL;
-    int rc = -1;
-
-    if ( read_exact(ctx->fd, qemusig, sizeof(qemusig)) )
-    {
-        PERROR("Error reading QEMU signature");
-        goto out;
-    }
-
-    if ( !memcmp(qemusig, "DeviceModelRecord0002", sizeof(qemusig)) )
-    {
-        if ( read_exact(ctx->fd, &qlen, sizeof(qlen)) )
-        {
-            PERROR("Error reading QEMU record length");
-            goto out;
-        }
-
-        qbuf = malloc(qlen);
-        if ( !qbuf )
-        {
-            PERROR("no memory for device model state");
-            goto out;
-        }
-
-        if ( read_exact(ctx->fd, qbuf, qlen) )
-        {
-            PERROR("Error reading device model state");
-            goto out;
-        }
-    }
-    else
-    {
-        ERROR("Invalid device model state signature '%*.*s'",
-              (int)sizeof(qemusig), (int)sizeof(qemusig), qemusig);
-        goto out;
-    }
-
-    /* With Remus, this could be read many times */
-    if ( ctx->x86_hvm.restore.qbuf )
-        free(ctx->x86_hvm.restore.qbuf);
-    ctx->x86_hvm.restore.qbuf = qbuf;
-    ctx->x86_hvm.restore.qlen = qlen;
-    rc = 0;
-
-out:
-    if (rc)
-        free(qbuf);
-    return rc;
-}
-
-static int handle_qemu(struct xc_sr_context *ctx)
-{
-    xc_interface *xch = ctx->xch;
-    char path[256];
-    uint32_t qlen = ctx->x86_hvm.restore.qlen;
-    void *qbuf = ctx->x86_hvm.restore.qbuf;
-    int rc = -1;
-    FILE *fp = NULL;
-
-    sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE".%u", ctx->domid);
-    fp = fopen(path, "wb");
-    if ( !fp )
-    {
-        PERROR("Failed to open '%s' for writing", path);
-        goto out;
-    }
-
-    DPRINTF("Writing %u bytes of QEMU data", qlen);
-    if ( fwrite(qbuf, 1, qlen, fp) != qlen )
-    {
-        PERROR("Failed to write %u bytes of QEMU data", qlen);
-        goto out;
-    }
-
-    rc = 0;
-
- out:
-    if ( fp )
-        fclose(fp);

[Xen-devel] [PATCH v2 14/27] tools/python: Conversion utility for legacy migration streams

2015-07-09 Thread Andrew Cooper
This utility takes a legacy stream as input and produces a v2 stream as
output.  It is exec()'d by libxl to provide backwards compatibility.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/python/Makefile  |4 +
 tools/python/scripts/convert-legacy-stream |  678 
 2 files changed, 682 insertions(+)
 create mode 100755 tools/python/scripts/convert-legacy-stream

diff --git a/tools/python/Makefile b/tools/python/Makefile
index e933be8..df942a7 100644
--- a/tools/python/Makefile
+++ b/tools/python/Makefile
@@ -17,9 +17,13 @@ build: genwrap.py $(XEN_ROOT)/tools/libxl/libxl_types.idl \
 
 .PHONY: install
 install:
+   $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
+
CC=$(CC) CFLAGS=$(PY_CFLAGS) $(PYTHON) setup.py install \
$(PYTHON_PREFIX_ARG) --root=$(DESTDIR) --force
 
+   $(INSTALL_PROG) scripts/convert-legacy-stream 
$(DESTDIR)$(PRIVATE_BINDIR)
+
 .PHONY: test
 test:
export LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f 
../xenstore); $(PYTHON) test.py -b -u
diff --git a/tools/python/scripts/convert-legacy-stream 
b/tools/python/scripts/convert-legacy-stream
new file mode 100755
index 000..d54fa22
--- /dev/null
+++ b/tools/python/scripts/convert-legacy-stream
@@ -0,0 +1,678 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Convert a legacy migration stream to a v2 stream.
+"""
+
+import sys
+import os, os.path
+import syslog
+import traceback
+
+from struct import calcsize, unpack, pack
+
+from xen.migration import legacy, public, libxc, libxl, xl
+
+__version__ = 1
+
+fin = None # Input file/fd
+fout = None# Output file/fd
+twidth = 0 # Legacy toolstack bitness (32 or 64)
+pv = None  # Boolean (pv or hvm)
+qemu = True# Boolean - process qemu record?
+log_to_syslog = False  # Boolean - Log to syslog instead of stdout/err?
+verbose = False# Boolean - Summarise stream contents
+
+def stream_read(_ = None):
+    """Read from the input"""
+    return fin.read(_)
+
+def stream_write(_):
+    """Write to the output"""
+    return fout.write(_)
+
+def info(msg):
+    """Info message, routed to appropriate destination"""
+    if verbose:
+        if log_to_syslog:
+            for line in msg.split("\n"):
+                syslog.syslog(syslog.LOG_INFO, line)
+        else:
+            print msg
+
+def err(msg):
+    """Error message, routed to appropriate destination"""
+    if log_to_syslog:
+        for line in msg.split("\n"):
+            syslog.syslog(syslog.LOG_ERR, line)
+    print >> sys.stderr, msg
+
+class StreamError(StandardError):
+    """Error with the incoming migration stream"""
+    pass
+
+class VM(object):
+    """Container of VM parameters"""
+
+    def __init__(self, fmt):
+        # Common
+        self.p2m_size = 0
+
+        # PV
+        self.max_vcpu_id = 0
+        self.online_vcpu_map = []
+        self.width = 0
+        self.levels = 0
+        self.basic_len = 0
+        self.extd = False
+        self.xsave_len = 0
+
+        # libxl
+        self.libxl = fmt == "libxl"
+        self.xenstore = [] # Deferred toolstack records
+
+def write_libxc_ihdr():
+    stream_write(pack(libxc.IHDR_FORMAT,
+                      libxc.IHDR_MARKER,  # Marker
+                      libxc.IHDR_IDENT,   # Ident
+                      libxc.IHDR_VERSION, # Version
+                      libxc.IHDR_OPT_LE,  # Options
+                      0, 0))              # Reserved
+
+def write_libxc_dhdr():
+    if pv:
+        dtype = libxc.DHDR_TYPE_x86_pv
+    else:
+        dtype = libxc.DHDR_TYPE_x86_hvm
+
+    stream_write(pack(libxc.DHDR_FORMAT,
+                      dtype,        # Type
+                      12,           # Page size
+                      0,            # Reserved
+                      0,            # Xen major (converted)
+                      __version__)) # Xen minor (converted)
+
+def write_libxl_hdr():
+    stream_write(pack(libxl.HDR_FORMAT,
+                      libxl.HDR_IDENT,     # Ident
+                      libxl.HDR_VERSION,   # Version 2
+                      libxl.HDR_OPT_LE |   # Options
+                      libxl.HDR_OPT_LEGACY # Little Endian and Legacy
+                      ))
+
+def write_record(rt, *argl):
+    alldata = ''.join(argl)
+    length = len(alldata)
+
+    record = pack(libxc.RH_FORMAT, rt, length) + alldata
+    plen = (8 - (length & 7)) & 7
+    record += '\x00' * plen
+
+    stream_write(record)
+
+def write_libxc_pv_info(vm):
+    write_record(libxc.REC_TYPE_x86_pv_info,
+                 pack(libxc.X86_PV_INFO_FORMAT,
+                      vm.width, vm.levels, 0, 0))
+
+def write_libxc_pv_p2m_frames(vm, pfns):
+    write_record(libxc.REC_TYPE_x86_pv_p2m_frames,
+                 pack(libxc.X86_PV_P2M_FRAMES_FORMAT,
+                      0, vm.p2m_size - 1),
+  

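The `write_record()` helper in the script above pads every record body to an 8-byte boundary; `(8 - (length & 7)) & 7` yields the 0..7 NUL bytes required. A standalone sketch of the same framing, simplified to a hypothetical little-endian `uint32 type, uint32 length` header in place of `libxc.RH_FORMAT`:

```python
from struct import pack

def write_record(rt, body):
    # 8-byte record header (type, length) followed by the body,
    # zero-padded so the next record starts on an 8-byte boundary.
    length = len(body)
    record = pack("<II", rt, length) + body
    plen = (8 - (length & 7)) & 7   # 0..7 bytes of padding
    return record + b"\x00" * plen

rec = write_record(1, b"abc")   # 8 header + 3 body + 5 pad = 16 bytes
assert len(rec) % 8 == 0
assert rec.endswith(b"\x00" * 5)
```

Note the `& 7` on the outside: a body that is already a multiple of 8 bytes gets zero padding, not a spurious extra 8 bytes.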
[Xen-devel] [PATCH v2 27/27] tools/libxl: Drop all knowledge of toolstack callbacks

2015-07-09 Thread Andrew Cooper
Libxl has now been fully adjusted not to need them.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl_dom.c|1 -
 tools/libxl/libxl_internal.h   |2 --
 tools/libxl/libxl_save_callout.c   |   39 +---
 tools/libxl/libxl_save_helper.c|   29 ---
 tools/libxl/libxl_save_msgs_gen.pl |7 ++-
 5 files changed, 3 insertions(+), 75 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9f5ddc9..3c765f4 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -2115,7 +2115,6 @@ void libxl__domain_suspend(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
     callbacks->suspend = libxl__domain_suspend_callback;
 
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
-    dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
     dss->sws.fd = dss->fd;
     dss->sws.ao = dss->ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1b62f25..13e2493 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2699,8 +2699,6 @@ _hidden void libxl__datacopier_prefixdata(libxl__egc*, 
libxl__datacopier_state*,
 
 typedef struct libxl__srm_save_callbacks {
 libxl__srm_save_autogen_callbacks a;
-    int (*toolstack_save)(uint32_t domid, uint8_t **buf,
-                          uint32_t *len, void *data);
 } libxl__srm_save_callbacks;
 
 typedef struct libxl__srm_restore_callbacks {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index cd18cd2..2a6662f 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -78,41 +78,12 @@ void libxl__xc_domain_restore(libxl__egc *egc, 
libxl__domain_create_state *dcs,
 void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss)
 {
     STATE_AO_GC(dss->ao);
-    int r, rc, toolstack_data_fd = -1;
-    uint32_t toolstack_data_len = 0;
-
-    /* Resources we need to free */
-    uint8_t *toolstack_data_buf = 0;
 
     unsigned cbflags = libxl__srm_callout_enumcallbacks_save
         (&dss->shs.callbacks.save.a);
 
-    if (dss->shs.callbacks.save.toolstack_save) {
-        r = dss->shs.callbacks.save.toolstack_save
-            (dss->domid, &toolstack_data_buf, &toolstack_data_len, dss);
-        if (r) { rc = ERROR_FAIL; goto out; }
-
-        dss->shs.toolstack_data_file = tmpfile();
-        if (!dss->shs.toolstack_data_file) {
-            LOGE(ERROR, "cannot create toolstack data tmpfile");
-            rc = ERROR_FAIL;
-            goto out;
-        }
-        toolstack_data_fd = fileno(dss->shs.toolstack_data_file);
-
-        r = libxl_write_exactly(CTX, toolstack_data_fd,
-                                toolstack_data_buf, toolstack_data_len,
-                                "toolstack data tmpfile", 0);
-        if (r) { rc = ERROR_FAIL; goto out; }
-
-        /* file position must be reset before passing to libxl-save-helper. */
-        r = lseek(toolstack_data_fd, 0, SEEK_SET);
-        if (r) { rc = ERROR_FAIL; goto out; }
-    }
-
     const unsigned long argnums[] = {
         dss->domid, 0, 0, dss->xcflags, dss->hvm,
-        toolstack_data_fd, toolstack_data_len,
         cbflags,
     };
 
@@ -123,18 +94,10 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
     dss->shs.caller_state = dss;
     dss->shs.need_results = 0;
 
-    free(toolstack_data_buf);
-
     run_helper(egc, &dss->shs, "--save-domain", dss->fd,
-               toolstack_data_fd, 1,
+               NULL, 0,
               argnums, ARRAY_SIZE(argnums));
     return;
-
- out:
-    free(toolstack_data_buf);
-    if (dss->shs.toolstack_data_file) fclose(dss->shs.toolstack_data_file);
-
-    libxl__xc_domain_save_done(egc, dss, rc, 0, 0);
 }
 
 
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index f196786..1622bb7 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -213,32 +213,8 @@ int helper_getreply(void *user)
 
 /*- other callbacks -*/
 
-static int toolstack_save_fd;
-static uint32_t toolstack_save_len;
 static struct save_callbacks helper_save_callbacks;
 
-static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
-                             uint32_t *len, void *data)
-{
-    int r;
-
-    assert(toolstack_save_fd > 0);
-
-    /* This is a hack for remus */
-    if (helper_save_callbacks.checkpoint) {
-        r = lseek(toolstack_save_fd, 0, SEEK_SET);
-        if (r) fail(errno,"rewind toolstack data tmpfile");
-    }
-
-    *buf = xmalloc(toolstack_save_len);
-    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
-    if (r<0) fail(errno,"read toolstack data");
-    if (r==0) fail(0,"read toolstack data eof");
-
-    *len = toolstack_save_len;
-    return 0;

[Xen-devel] [PATCH v2 21/27] tools/libxc+libxl+xl: Save v2 streams

2015-07-09 Thread Andrew Cooper
This is a complicated set of changes which must be done together for
bisectability.

 * libxl-save-helper is updated to unconditionally use libxc migration v2.
 * libxl compatibility workarounds in libxc are disabled for save operations.
 * libxl__stream_write_start() is logically spliced into the event location
   where libxl__xc_domain_save() used to reside.
 * xl is updated to indicate that the stream is now v2

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/Makefile |2 --
 tools/libxl/libxl.h  |2 ++
 tools/libxl/libxl_dom.c  |   46 +-
 tools/libxl/libxl_save_helper.c  |2 +-
 tools/libxl/libxl_stream_write.c |   35 +++--
 tools/libxl/xl_cmdimpl.c |1 +
 6 files changed, 47 insertions(+), 41 deletions(-)

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 2cd0b1a..1aec848 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -64,8 +64,6 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c
 GUEST_SRCS-y += xc_sr_restore.c
 GUEST_SRCS-y += xc_sr_save.c
 GUEST_SRCS-y += xc_offline_page.c xc_compression.c
-xc_sr_save_x86_hvm.o: CFLAGS += -DXG_LIBXL_HVM_COMPAT
-xc_sr_save_x86_hvm.opic: CFLAGS += -DXG_LIBXL_HVM_COMPAT
 else
 GUEST_SRCS-y += xc_nomigrate.c
 endif
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index e64a606..4f24e5f 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -812,6 +812,8 @@
  *
  * If this is defined, then the libxl_domain_create_restore() interface takes
  * a stream_version parameter and supports a value of 2.
+ *
+ * libxl_domain_suspend() will produce a v2 stream.
  */
 #define LIBXL_HAVE_STREAM_V2 1
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 8642192..de05124 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1153,6 +1153,8 @@ int libxl__toolstack_restore(uint32_t domid, const 
uint8_t *ptr,
 
 /* Domain suspend (save) */
 
+static void stream_done(libxl__egc *egc,
+libxl__stream_write_state *sws, int rc);
 static void domain_suspend_done(libxl__egc *egc,
 libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
@@ -2117,50 +2119,22 @@ void libxl__domain_suspend(libxl__egc *egc, 
libxl__domain_suspend_state *dss)
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
     dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
-    libxl__xc_domain_save(egc, dss);
+    dss->sws.fd = dss->fd;
+    dss->sws.ao = dss->ao;
+    dss->sws.completion_callback = stream_done;
+
+    libxl__stream_write_start(egc, &dss->sws);
 return;
 
  out:
 domain_suspend_done(egc, dss, rc);
 }
 
-void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
-                                int rc, int retval, int errnoval)
+static void stream_done(libxl__egc *egc,
+                        libxl__stream_write_state *sws, int rc)
 {
-    libxl__domain_suspend_state *dss = dss_void;
-    STATE_AO_GC(dss->ao);
-
-    /* Convenience aliases */
-    const libxl_domain_type type = dss->type;
-
-    if (rc)
-        goto out;
-
-    if (retval) {
-        LOGEV(ERROR, errnoval, "saving domain: %s",
-                         dss->guest_responded ?
-                         "domain responded to suspend request" :
-                         "domain did not respond to suspend request");
-        if ( !dss->guest_responded )
-            rc = ERROR_GUEST_TIMEDOUT;
-        else if (dss->rc)
-            rc = dss->rc;
-        else
-            rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_suspend_device_model(gc, dss);
-        if (rc) goto out;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws);
 
-        libxl__domain_save_device_model(egc, dss, domain_suspend_done);
-        return;
-    }
-
-    rc = 0;
-
-out:
     domain_suspend_done(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index efbe2eb..f196786 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -286,7 +286,7 @@ int main(int argc, char **argv)
 startup(save);
 setup_signals(save_signal_handler);
 
-    r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
+    r = xc_domain_save2(xch, io_fd, dom, max_iters, max_factor, flags,
                        &helper_save_callbacks, hvm);
 complete(r);
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index bf568ad..331173f 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -276,8 +276,39 @@ static void 

[Xen-devel] [PATCH v2 12/27] tools/python: Other migration infrastructure

2015-07-09 Thread Andrew Cooper
Contains:
 * Reverse-engineered notes of the legacy format from xg_save_restore.h
 * Python implementation of the legacy format
 * Public HVM Params used in the legacy stream
 * XL header format

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
New in v2 - removes various many magic numbers from subsequent scripts
---
 tools/python/xen/migration/legacy.py |  279 ++
 tools/python/xen/migration/public.py |   21 +++
 tools/python/xen/migration/xl.py |   12 ++
 3 files changed, 312 insertions(+)
 create mode 100644 tools/python/xen/migration/legacy.py
 create mode 100644 tools/python/xen/migration/public.py
 create mode 100644 tools/python/xen/migration/xl.py

diff --git a/tools/python/xen/migration/legacy.py 
b/tools/python/xen/migration/legacy.py
new file mode 100644
index 000..2f2240a
--- /dev/null
+++ b/tools/python/xen/migration/legacy.py
@@ -0,0 +1,279 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxc legacy migration streams
+
+Documentation and record structures for legacy migration
+
+
+
+SAVE/RESTORE/MIGRATE PROTOCOL
+=
+
+The general form of a stream of chunks is a header followed by a
+body consisting of a variable number of chunks (terminated by a
+chunk with type 0) followed by a trailer.
+
+For a rolling/checkpoint (e.g. remus) migration then the body and
+trailer phases can be repeated until an external event
+(e.g. failure) causes the process to terminate and commit to the
+most recent complete checkpoint.
+
+HEADER
+--
+
+unsigned long: p2m_size
+
+extended-info (PV-only, optional):
+
+  If first unsigned long == ~0UL then extended info is present,
+  otherwise unsigned long is part of p2m. Note that p2m_size above
+  does not include the length of the extended info.
+
+  extended-info:
+
+unsigned long: signature == ~0UL
+uint32_t   : number of bytes remaining in extended-info
+
+1 or more extended-info blocks of form:
+char[4]  : block identifier
+uint32_t : block data size
+bytes: block data
+
+defined extended-info blocks:
+vcpu : VCPU context info containing vcpu_guest_context_t.
+   The precise variant of the context structure
+   (e.g. 32 vs 64 bit) is distinguished by
+   the block size.
+extv   : Presence indicates use of extended VCPU context in
+   tail, data size is 0.
+
+p2m (PV-only):
+
+  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
+
+BODY PHASE - Format A (for live migration or Remus without compression)
+--
+
+A series of chunks with a common header:
+  int  : chunk type
+
+If the chunk type is +ve then chunk contains guest memory data, and the
+type contains the number of pages in the batch:
+
+unsigned long[]  : PFN array, length == number of pages in batch
+   Each entry consists of XEN_DOMCTL_PFINFO_*
+   in bits 31-28 and the PFN number in bits 27-0.
+page data: PAGE_SIZE bytes for each page marked present in PFN
+   array
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If chunk type is 0 then body phase is complete.
+
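(Not part of the patch: the PFN-array entry encoding above — pfinfo type in bits 31-28, frame number in bits 27-0 — can be decoded as in this sketch; the helper name is ours.)

```python
def decode_pfn_entry(entry):
    """Split a PFN-array entry into (pfinfo, pfn).

    XEN_DOMCTL_PFINFO_* lives in bits 31-28, the frame number in
    bits 27-0, as described in the Format A section above.
    """
    pfinfo = (entry >> 28) & 0xF
    pfn = entry & 0x0FFFFFFF
    return pfinfo, pfn

print(decode_pfn_entry(0x50000123))  # (5, 291)
```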
+
+BODY PHASE - Format B (for Remus with compression)
+--------------------------------------------------
+
+A series of chunks with a common header:
+  int  : chunk type
+
+If the chunk type is +ve then chunk contains array of PFNs corresponding
+to guest memory and type contains the number of PFNs in the batch:
+
+unsigned long[]  : PFN array, length == number of pages in batch
+   Each entry consists of XEN_DOMCTL_PFINFO_*
+   in bits 31-28 and the PFN number in bits 27-0.
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
+chunk consists of compressed page data, in the following format:
+
+unsigned long: Size of the compressed chunk to follow
+compressed data :  variable length data of size indicated above.
+   This chunk consists of compressed page data.
+   The number of pages in one chunk depends on
+   the amount of space available in the sender's
+   output buffer.
+
+Format of compressed data:
+  compressed_data = <deltas>*
+  delta   = <marker, run*>
+  marker  = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
+  RUNFLAG = 0
+  SKIPFLAG= 1 << 7
+  RUNLEN  = 7-bit unsigned value indicating number of WORDS in the run
+  run = 
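(Not part of the patch: the 1-byte marker format above can be decoded as follows; the function name is ours, and only the RUNFLAG/SKIPFLAG/RUNLEN split described above is assumed.)

```python
SKIPFLAG = 1 << 7  # high bit set means a skip-run rather than a data run

def decode_marker(marker):
    """Split a marker byte into (is_skip, runlen_in_words)."""
    is_skip = bool(marker & SKIPFLAG)
    runlen = marker & 0x7F   # low 7 bits: number of WORDs in the run
    return is_skip, runlen

# 0x85 has the skip bit set and a run length of 5 words:
print(decode_marker(0x85))  # (True, 5)
```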

[Xen-devel] [PATCH v2 20/27] tools/libxl: Infrastructure for writing a v2 stream

2015-07-09 Thread Andrew Cooper
From: Ross Lagerwall ross.lagerw...@citrix.com

This contains the event machinery and state machines to write a non-checkpointed
migration v2 stream (with the exception of the xc_domain_save() handling, which
is spliced in later in a bisectable way).

Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com

---
As with the read side of things, this has undergone substantial changes in v2.
---
 tools/libxl/Makefile |2 +-
 tools/libxl/libxl_internal.h |   47 
 tools/libxl/libxl_stream_write.c |  451 ++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_stream_write.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 0ebc35a..7d44483 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -95,7 +95,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
libxl_internal.o libxl_utils.o libxl_uuid.o \
 libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
-   libxl_stream_read.o \
+   libxl_stream_read.o libxl_stream_write.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1cf1884..2beb534 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2973,6 +2973,52 @@ typedef void libxl__domain_suspend_cb(libxl__egc*,
 typedef void libxl__save_device_model_cb(libxl__egc*,
  libxl__domain_suspend_state*, int rc);
 
+/* State for writing a libxl migration v2 stream */
+typedef struct libxl__stream_write_state libxl__stream_write_state;
+
+typedef void (*sws_record_done_cb)(libxl__egc *egc,
+   libxl__stream_write_state *sws);
+
+struct libxl__stream_write_state {
+/* filled by the user */
+libxl__ao *ao;
+int fd;
+uint32_t domid;
+void (*completion_callback)(libxl__egc *egc,
+libxl__stream_write_state *sws,
+int rc);
+/* Private */
+int rc;
+bool running;
+
+/* Active-stuff handling */
+int joined_rc;
+
+/* Main stream-writing data */
+size_t padding;
+libxl__datacopier_state dc;
+sws_record_done_cb record_done_callback;
+
+/* Emulator blob handling */
+libxl__datacopier_state emu_dc;
+libxl__carefd *emu_carefd;
+libxl__sr_rec_hdr emu_rec_hdr;
+void *emu_body;
+};
+
+_hidden void libxl__stream_write_start(libxl__egc *egc,
+   libxl__stream_write_state *stream);
+
+_hidden void libxl__stream_write_abort(libxl__egc *egc,
+   libxl__stream_write_state *stream,
+   int rc);
+
+static inline bool libxl__stream_write_inuse(
+const libxl__stream_write_state *stream)
+{
+return stream->running;
+}
+
 typedef struct libxl__logdirty_switch {
 const char *cmd;
 const char *cmd_path;
@@ -3013,6 +3059,7 @@ struct libxl__domain_suspend_state {
 /* private for libxl__domain_save_device_model */
 libxl__save_device_model_cb *save_dm_callback;
 libxl__datacopier_state save_dm_datacopier;
+libxl__stream_write_state sws;
 };
 
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
new file mode 100644
index 000..bf568ad
--- /dev/null
+++ b/tools/libxl/libxl_stream_write.c
@@ -0,0 +1,451 @@
+/*
+ * Copyright (C) 2015  Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include libxl_osdeps.h /* must come before any other headers */
+
+#include libxl_internal.h
+
+/*
+ * Infrastructure for writing a domain to a libxl migration v2 stream.
+ *
+ * Entry points from outside:
+ *  - libxl__stream_write_start()
+ * - Start writing a stream from the start.
+ *
+ * In normal operation, there are two tasks running at once; this stream
+ * processing, and the libxl-save-helper.  check_stream_finished() is used to
+ * join all the tasks in both success and error 
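(Not part of the patch: the join pattern the comment above describes — two concurrent tasks, the stream writer and the save helper, completing independently — can be modelled with this toy sketch. The class and its fields are ours, loosely mirroring the `joined_rc` field in the state structure; it is not libxl code.)

```python
class TaskJoiner:
    """Toy model of joining concurrent tasks in success and error paths.

    Each task reports completion with an rc; the first non-zero rc is
    remembered, and the completion callback fires exactly once, after
    every task has reported in.
    """

    def __init__(self, ntasks, completion_callback):
        self.remaining = ntasks
        self.joined_rc = 0
        self.completion_callback = completion_callback

    def task_done(self, rc):
        if self.joined_rc == 0:
            self.joined_rc = rc   # remember the first failure, if any
        self.remaining -= 1
        if self.remaining == 0:
            self.completion_callback(self.joined_rc)

results = []
joiner = TaskJoiner(2, results.append)
joiner.task_done(0)    # stream writer finished OK
joiner.task_done(-5)   # save helper failed
print(results)         # [-5]
```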
