On Sat, Nov 02, 2019 at 06:40:52AM +0900, Iori YONEJI wrote:
> On Tue, Oct 29, 2019 at 02:17:28AM -0700, Mike Larkin wrote:
> > On Thu, Oct 24, 2019 at 08:54:58AM +0900, Iori YONEJI wrote:
> > > Hello tech@,
> > > 
> > > Here is the patch discussed in the previous email. This part mainly
> > > covers changes in the declaration part and fault handlers.
> > > 
> > 
> > Hello,
> > 
> >  I read through the three diffs and have some feedback.
> > 
> > First, please reformat all 3 diffs using style(9) guidelines. There are many
> > spaces vs tabs issues in the diffs.
> Thank you for your review. I'll fix the issues this weekend.
> 
> First of all, the spaces vs. tabs issues were due to mangling by the
> mailer. I have fixed that now. I'm not aware of other types of style
> issues, but I will go through the patches for other style mismatches
> once I start working on them.
> 
> > Second, there seems to be quite a bit of important code missing here:
> Most of them convinced me, but let me leave a few comments on them.
> 

Iori Yoneji and I are discussing this in another thread, but for the benefit
of searchability/posterity, see below for a summary.

> 1. Segment base and limit check
> > 1. There appears to be no handling of segment base and limits for 32 bit
> >    instructions. These need to be read from the gdt and strictly adhered
> >    to. For 64 bit mode, it's not as important, unless the guest has enabled
> >    LMSLE (long mode segment limit enable) on AMD CPUs, in which case the
> >    limits need to be checked also. If I recall correctly, there are also
> >    some different rules that need to be followed for 32 bit segment use
> >    relating to permission checks and segment types.
> I wasn't aware of the limit check, but I think the vcpu would take a
> #GP if the segment was out of range, instead of an EPT violation,
> wouldn't it?
>

The SDM seems to be a bit vague here about the rules for checking segment
limits in VMX non-root mode (eg, inside a VM), and about whether a limit
violation causes an exit or not under various configurations. This is likely
an issue only for 32 bit guests (we can block LMSLE).
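
For the archive, here is a rough standalone sketch of the kind of limit check
a 32 bit emulation path would need. This is not vmm(4) code; "struct seg_desc"
and its fields are hypothetical stand-ins for values fetched from the guest's
GDT/LDT, and it assumes the B bit is set for expand-down segments:

#include <stdint.h>

struct seg_desc {
        uint32_t base;
        uint32_t limit;         /* raw 20-bit limit field */
        int      gran;          /* G bit: limit counts 4KB units */
        int      expand_down;   /* data segment type bit 2 */
};

/* Return 0 if [addr, addr + size) lies within the limit (size >= 1). */
int
check_seg_limit(const struct seg_desc *sd, uint32_t addr, uint32_t size)
{
        uint32_t limit;

        limit = sd->gran ? (sd->limit << 12) | 0xfff : sd->limit;
        if (sd->expand_down) {
                /* valid offsets are (limit, 0xffffffff] when B=1 */
                if (addr <= limit || addr > 0xffffffffU - (size - 1))
                        return (-1);
        } else {
                /* valid offsets are [0, limit]; avoid wraparound */
                if (size - 1 > limit || addr > limit - (size - 1))
                        return (-1);
        }
        return (0);
}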

> 2. Privilege level check
> > 2. For that matter, there appears to be no handling of any permission or
> >    privilege checks in the instruction emulator. This means any privilege
> >    level can read or write any memory in the VM.
> Sorry, that is a big fault on my part. I will check the CPU mode in
> the next post.
> 

In the generic EPT case (not a segment limit issue), this is likely handled
properly already. We are discussing whether it makes sense to add the
permission checks anyway, in case this code ends up being used in other
places.
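
Again for the archive, a rough sketch of the check in question, standalone C
using the architectural PTE bit values (the PTE_* names are local defines, not
the pmap macros; CR0.WP and SMEP/SMAP handling is omitted for brevity):

#include <stdint.h>

#define PTE_W   0x02ULL         /* writable */
#define PTE_US  0x04ULL         /* user-accessible */

/*
 * Decide whether an access at the given CPL is permitted by the bits
 * accumulated over a guest page table walk (the AND of the U/S and R/W
 * bits along the walk). Return 0 if allowed, -1 if it should raise #PF.
 */
int
check_pte_access(uint64_t walk_bits, int cpl, int is_write)
{
        if (cpl == 3) {
                if ((walk_bits & PTE_US) == 0)
                        return (-1);    /* user access to supervisor page */
                if (is_write && (walk_bits & PTE_W) == 0)
                        return (-1);    /* user write to read-only page */
        }
        /* supervisor writes to read-only pages also fault if CR0.WP=1 */
        return (0);
}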

> 3-4. 'A'ccessed and 'D'irty bit management
> > 3. There appears to be no handling of the updating of the 'A'ccessed or
> >    'D'irty bits on successful page table walk and writing to a page
> >    computed by that translation. Granted, this probably needs to go into
> >    the existing translate_gva function for the 'A' bit, but the 'D' bit
> >    needs to also be handled here.
> > 
> > 4. There appears to be no handling of the updating of the 'A' bit in the
> >    GDT for the segment descriptor in use.
> Yes, but I'm not sure what the expected use of A/D bit management
> would be on an MMIO region, since it wouldn't be subject to swapping
> in/out. I think I will understand the expected behavior after working
> on it; however, I don't see how it compromises the virtual machine's
> features.
> 

There are certain degenerate cases that might require this, and it is unclear
whether the CPU does this for us under those circumstances or whether the
emulator needs to. As pointed out above, the need for A/D bits in this
particular case is likely minimal, but in the same theme as above, we may
want to make the code work generically instead of assuming the emulator will
only ever be used for accesses in the MMIO region.
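
For the archive, a sketch of where the A/D updates would slot into a software
walker. read_gpa()/write_gpa() are hypothetical guest-physical accessors, not
existing vmm(4) functions, and a real implementation would need the update to
be atomic (the CPU performs it as a locked operation):

#include <stdint.h>

#define PTE_A   0x20ULL         /* accessed */
#define PTE_D   0x40ULL         /* dirty */

int read_gpa(uint64_t gpa, uint64_t *pte);
int write_gpa(uint64_t gpa, uint64_t pte);

/* Set 'A' at every level of the walk; set 'D' only at the leaf, on writes. */
int
update_ad_bits(uint64_t pte_gpa, int is_write, int is_leaf)
{
        uint64_t pte, want;

        if (read_gpa(pte_gpa, &pte))
                return (-1);
        want = pte | PTE_A;
        if (is_write && is_leaf)
                want |= PTE_D;
        if (want != pte && write_gpa(pte_gpa, want))
                return (-1);
        return (0);
}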

-ml


> 5-7. Prefixes
> > 5. The code seems to ignore any segment prefix override bytes.
> > 
> > 6. The code seems to ignore forbidden instruction prefixes (like rex.w
> >    or lock prefixes being placed before instructions where they don't
> >    make sense or are prohibited by the SDM)
> > 
> > 7. For that matter, operand size encoding prefixes don't seem to be
> >    honoured at all
> Yes, there are some prefixes I missed in the patches. I will enhance
> them to handle those prefixes.
> One thing to confirm: is it OK not to factor the segment registers
> (whether overridden or not) into the calculation, and just count the
> bytes, since we already know the address the instruction intended to
> access?
> 
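
For the archive: whether or not the segment base calculation can be skipped,
the prefix bytes still have to be consumed and validated. Here is a rough
sketch of a legacy-prefix scan, standalone C rather than the proposed parser
itself; REX handling for 64 bit mode is omitted:

#include <stdint.h>

struct insn_prefixes {
        int     seg;            /* -1 = none, else segment register index */
        int     opsize;         /* 0x66 seen */
        int     addrsize;       /* 0x67 seen */
        int     rep;            /* 0xf2/0xf3 seen */
        int     lock;           /* 0xf0 seen */
};

/* Return the offset of the first non-prefix byte, or -1 if none found. */
int
scan_prefixes(const uint8_t *insn, int len, struct insn_prefixes *p)
{
        int i;

        p->seg = -1;
        p->opsize = p->addrsize = p->rep = p->lock = 0;
        for (i = 0; i < len; i++) {
                switch (insn[i]) {
                case 0x26: p->seg = 0; break;           /* ES */
                case 0x2e: p->seg = 1; break;           /* CS */
                case 0x36: p->seg = 2; break;           /* SS */
                case 0x3e: p->seg = 3; break;           /* DS */
                case 0x64: p->seg = 4; break;           /* FS */
                case 0x65: p->seg = 5; break;           /* GS */
                case 0x66: p->opsize = 1; break;
                case 0x67: p->addrsize = 1; break;
                case 0xf0: p->lock = 1; break;
                case 0xf2:
                case 0xf3: p->rep = 1; break;
                default:
                        return (i);
                }
        }
        return (-1);
}
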
> 8. %rflags
> > 8. %rflags is not being updated in any scenario. It looks like there were
> >    a few #defines for things like VMM_EMUL_ADD/SUB, which would require
> >    setting various flags in %rflags, but I can't seem to find the code that
> >    actually does these instructions, so maybe these are just old #defines?
> MOV is the only instruction on my 'top-priority' to-do list, but I
> will handle these instructions, as well as %rflags, if needed.
> For now, the #defines have no role beyond suggesting how the parser
> indicates the different instruction types, so I will delete them
> until they are needed.
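
For the archive, this is the sort of %rflags maintenance an emulated ADD
would need. A standalone sketch with only CF/ZF/SF shown (OF/AF/PF omitted);
the PSL_* values below are the architectural %rflags bit positions:

#include <stdint.h>

#define PSL_C   0x01ULL         /* carry */
#define PSL_Z   0x40ULL         /* zero */
#define PSL_N   0x80ULL         /* sign */

/* Compute a + b, store the result in *res, and return the updated rflags. */
uint64_t
emul_add64_rflags(uint64_t a, uint64_t b, uint64_t rflags, uint64_t *res)
{
        uint64_t r;

        r = a + b;
        rflags &= ~(PSL_C | PSL_Z | PSL_N);
        if (r < a)
                rflags |= PSL_C;        /* unsigned overflow */
        if (r == 0)
                rflags |= PSL_Z;
        if (r >> 63)
                rflags |= PSL_N;
        *res = r;
        return (rflags);
}
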
> > 
> > While this appears to be a step in the right direction, I'm afraid there
> > is quite a bit that needs to get done before we can commit this. The
> > items above are not nit-pick items; they are critical for security and
> > without implementing these we will end up opening huge security holes
> > inside the VM.
> > 
> > Please let me know if any of these are actually implemented somewhere
> > in the three diffs; it's certainly possible I missed something.
> > 
> > How would you like to proceed? If you would like to clean up the style
> > and start implementing the items above, I'm happy to take another look
> > once you make some progress. Let me know.
> > 
> > Thanks again for trying to make progress here. I hope we can get things
> > moving forward.
> > 
> > -ml
> > 
> 
> Thanks again for your precise review and feedback. I will start
> working on those issues this weekend (it might take a couple of
> weeks).
> 
> In my mind, 2 is the most urgent, and 5-7 are also bugs rather than
> 'unimplemented' features, so I will start with those. I will skip
> support for non-MOV instructions in the next try, so 8 will also
> potentially be solved (I will inject an exception if those are used
> by the guest). 1 and 3-4 remain uncertain for me, though.
> 
> 
> Iori Yoneji
> 
> > > 
> > > Index: sys/arch/amd64/amd64/vmm.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
> > > retrieving revision 1.254
> > > diff -u -p -r1.254 vmm.c
> > > --- sys/arch/amd64/amd64/vmm.c    22 Sep 2019 08:47:54 -0000    1.254
> > > +++ sys/arch/amd64/amd64/vmm.c    23 Oct 2019 23:04:30 -0000
> > > @@ -107,6 +107,27 @@ struct vmm_softc {
> > >      uint8_t            vpids[512];    /* bitmap of used VPID/ASIDs */
> > >  };
> > > 
> > > +/* Instruction type collection for MMIO emulation */
> > > +#define VMM_EMUL_MOV 1
> > > +#define VMM_EMUL_ADD 2
> > > +#define VMM_EMUL_SUB 3
> > > +#define VMM_EMUL_XCHG 4
> > > +
> > > +/* vmm_emul_data_desc indicates a location used for MMIO emulation */
> > > +struct vmm_emul_data_desc {
> > > +    vaddr_t    gpa;
> > > +    int    reg_id;        /* -1: use gpa as the data location */
> > > +};
> > > +
> > > +struct vmm_emul_instruction {
> > > +    struct vmm_emul_data_desc src;
> > > +    struct vmm_emul_data_desc dest;
> > > +    int    len;
> > > +    int    insn_type;
> > > +    int    width;     /* effective width of data to be written */
> > > +    int    zero_sign; /* zero or sign extend? */
> > > +};
> > > +
> > >  void vmx_dump_vmcs_field(uint16_t, const char *);
> > >  int vmm_enabled(void);
> > >  int vmm_probe(struct device *, void *, void *);
> > > @@ -167,7 +188,9 @@ int vmx_handle_cr0_write(struct vcpu *,
> > >  int vmx_handle_cr4_write(struct vcpu *, uint64_t);
> > >  int vmx_handle_cr(struct vcpu *);
> > >  int svm_handle_inout(struct vcpu *);
> > > +int svm_handle_mmio(struct vcpu *, paddr_t);
> > >  int vmx_handle_inout(struct vcpu *);
> > > +int vmx_handle_mmio(struct vcpu *, paddr_t);
> > >  int svm_handle_hlt(struct vcpu *);
> > >  int vmx_handle_hlt(struct vcpu *);
> > >  int vmm_inject_ud(struct vcpu *);
> > > @@ -183,6 +206,9 @@ int svm_get_guest_faulttype(struct vmcb
> > >  int vmx_get_exit_qualification(uint64_t *);
> > >  int vmm_get_guest_cpu_cpl(struct vcpu *);
> > >  int vmm_get_guest_cpu_mode(struct vcpu *);
> > > +int parse_instruction(struct vcpu *vcpu, uint8_t insn[], paddr_t iomem,
> > > +        struct vmm_emul_instruction *info);
> > > +int emul_mmio(struct vcpu *, paddr_t, vaddr_t, struct vmm_emul_instruction *);
> > >  int svm_fault_page(struct vcpu *, paddr_t);
> > >  int vmx_fault_page(struct vcpu *, paddr_t);
> > >  int vmx_handle_np_fault(struct vcpu *);
> > > @@ -1602,8 +1628,6 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin
> > >  errout:
> > >      ret = EINVAL;
> > >  out:
> > > -    if (vmclear(&vcpu->vc_control_pa))
> > > -        ret = EINVAL;
> > >      return (ret);
> > >  }
> > > 
> > > @@ -4079,7 +4103,7 @@ vmm_translate_gva(struct vcpu *vcpu, uin
> > > 
> > >      if (vmm_softc->mode == VMM_MODE_EPT ||
> > >          vmm_softc->mode == VMM_MODE_VMX) {
> > > -        if (vcpu_readregs_vmx(vcpu, VM_RWREGS_ALL, &vrs))
> > > +        if (vcpu_readregs_vmx(vcpu, VM_RWREGS_CRS | VM_RWREGS_MSRS, &vrs))
> > >              return (EINVAL);
> > >      } else if (vmm_softc->mode == VMM_MODE_RVI ||
> > >          vmm_softc->mode == VMM_MODE_SVM) {
> > > @@ -4793,6 +4817,12 @@ svm_handle_exit(struct vcpu *vcpu)
> > >          break;
> > >      case SVM_VMEXIT_NPF:
> > >          ret = svm_handle_np_fault(vcpu);
> > > +        if (ret == EAGAIN) { /* MMIO is handled */
> > > +            update_rip = 1;
> > > +        } else if (ret == ERANGE) {
> > > +            update_rip = 1;
> > > +            ret = 0;
> > > +        }
> > >          break;
> > >      case SVM_VMEXIT_CPUID:
> > >          ret = vmm_handle_cpuid(vcpu);
> > > @@ -4885,6 +4915,12 @@ vmx_handle_exit(struct vcpu *vcpu)
> > >          break;
> > >      case VMX_EXIT_EPT_VIOLATION:
> > >          ret = vmx_handle_np_fault(vcpu);
> > > +        if (ret == EAGAIN) { /* MMIO is handled */
> > > +            update_rip = 1;
> > > +        } else if (ret == ERANGE) {
> > > +            update_rip = 1;
> > > +            ret = 0;
> > > +        }
> > >          break;
> > >      case VMX_EXIT_CPUID:
> > >          ret = vmm_handle_cpuid(vcpu);
> > > @@ -5063,10 +5099,8 @@ vmm_get_guest_memtype(struct vm *vm, pad
> > >      int i;
> > >      struct vm_mem_range *vmr;
> > > 
> > > -    if (gpa >= VMM_PCI_MMIO_BAR_BASE && gpa <= VMM_PCI_MMIO_BAR_END) {
> > > -        DPRINTF("guest mmio access @ 0x%llx\n", (uint64_t)gpa);
> > > -        return (VMM_MEM_TYPE_REGULAR);
> > > -    }
> > > +    if (gpa >= VMM_PCI_MMIO_BAR_BASE && gpa <= VMM_PCI_MMIO_BAR_END)
> > > +        return (VMM_MEM_TYPE_MMIO);
> > > 
> > >      /* XXX Use binary search? */
> > >      for (i = 0; i < vm->vm_nmemranges; i++) {
> > > @@ -5187,6 +5221,12 @@ svm_fault_page(struct vcpu *vcpu, paddr_
> > >   *
> > >   * High level nested paging handler for SVM. Verifies that a fault is for a
> > >   * valid memory region, then faults a page, or aborts otherwise.
> > > + *
> > > + * Return values:
> > > + * 0: the fault was handled and the corresponding page was assigned
> > > + * ERANGE: the range is not memory but MMIO, and it was handled here
> > > + * EAGAIN: the range is MMIO, and the device must be handled by vmd(8)
> > > + * Other errors are produced by svm_fault_page and svm_handle_mmio.
> > >   */
> > >  int
> > >  svm_handle_np_fault(struct vcpu *vcpu)
> > > @@ -5204,6 +5244,15 @@ svm_handle_np_fault(struct vcpu *vcpu)
> > >      case VMM_MEM_TYPE_REGULAR:
> > >          ret = svm_fault_page(vcpu, gpa);
> > >          break;
> > > +    case VMM_MEM_TYPE_MMIO:
> > > +#ifdef VMM_DEBUG
> > > +        printf("%s: mmio %d for GPA 0x%llx\n",
> > > +            __func__, gpa_memtype, gpa);
> > > +#endif /* VMM_DEBUG */
> > > +        ret = svm_handle_mmio(vcpu, gpa);
> > > +        if (ret == 0)
> > > +            ret = ERANGE;
> > > +        break;
> > >      default:
> > >          printf("unknown memory type %d for GPA 0x%llx\n",
> > >              gpa_memtype, gpa);
> > > @@ -5245,6 +5294,11 @@ vmx_fault_page(struct vcpu *vcpu, paddr_
> > >   *
> > >   * High level nested paging handler for VMX. Verifies that a fault is for a
> > >   * valid memory region, then faults a page, or aborts otherwise.
> > > + *
> > > + * Return values:
> > > + * 0: the fault was handled and the corresponding page was assigned
> > > + * ERANGE: the range is not memory but MMIO, and it was handled here
> > > + * EAGAIN: the range is MMIO, and the device must be handled by vmd(8)
> > > + * Other errors are produced by vmx_fault_page and vmx_handle_mmio.
> > >   */
> > >  int
> > >  vmx_handle_np_fault(struct vcpu *vcpu)
> > > @@ -5263,6 +5317,15 @@ vmx_handle_np_fault(struct vcpu *vcpu)
> > >      case VMM_MEM_TYPE_REGULAR:
> > >          ret = vmx_fault_page(vcpu, gpa);
> > >          break;
> > > +    case VMM_MEM_TYPE_MMIO:
> > > +#ifdef VMM_DEBUG
> > > +        printf("%s: mmio %d for GPA 0x%llx\n",
> > > +            __func__, gpa_memtype, gpa);
> > > +#endif /* VMM_DEBUG */
> > > +        ret = vmx_handle_mmio(vcpu, gpa);
> > > +        if (ret == 0)
> > > +            ret = ERANGE;
> > > +        break;
> > >      default:
> > >          printf("unknown memory type %d for GPA 0x%llx\n",
> > >              gpa_memtype, gpa);
> > > 
