Re: Is xl vcpu-set broken

2023-02-28 Thread Joe Jin
On 2/28/23 2:24 AM, Anthony PERARD wrote:
> On Tue, Feb 28, 2023 at 10:37:00AM +0100, Jan Beulich wrote:
>> On 28.02.2023 07:44, Joe Jin wrote:
>>> We encountered a vcpu-set issue on an old Xen. When I tried to confirm
>>> whether upstream Xen has the same issue, I found that neither my upstream
>>> build nor Ubuntu 22.04's xen-hypervisor-4.16 works.
>>>
>>> I can add vcpus (8->16) to my guest, but I cannot reduce the vcpu number:
>>>
>>> root@ubuntu2204:~/vm# xl list
>>> Name      ID    Mem VCPUs  State  Time(s)
>>> Domain-0   0 255424    32  r-       991.9
>>> testvm     1   4088    16  -b        94.6
>>> root@ubuntu2204:~/vm# xl vcpu-set testvm 8
>>> root@ubuntu2204:~/vm# xl list
>>> Name      ID    Mem VCPUs  State  Time(s)
>>> Domain-0   0 255424    32  r-       998.5
>>> testvm     1   4088    16  -b        97.3
>>>
>>> After xl vcpu-set, xenstore showed online cpu number reduced to 8:
> [...]
>>> But the guest did not receive any offline events at all.
>>>
>>> From the source code, my understanding is that for cpu online, libxl
>>> sends device_add to qemu to trigger the vcpu add, while for cpu offline
>>> it still relies on xenstore. Is this correct?
>> Judging from the DSDT we provide, offlining looks to also be intended to
>> go the ACPI way. Whereas libxl only ever sends "device_add" commands to
>> qemu, afaics (or "cpu-add" for older qemu). Anthony - do you have any
>> insight what the intentions here are?
> The intention is to one day implement cpu offline in QEMU upstream for
> HVM guests; I don't think that's ever been done so far.
>
> As we use device_add for cpu hotplug, we would probably do device_del
> for hot-unplug, so qemu would still have to send the appropriate event to
> the guest.

Sounds like this is still on the TODO list rather than a build or test environment issue.

>
> Someone will have to figure out if "device_del" works with a Xen guest,
> doc here:
> https://www.qemu.org/docs/master/system/cpu-hotplug.html#vcpu-hot-unplug

I tried QMP, and it looks like it does not work:

(QEMU) device_del id=cpu-5
{"error": {"class": "GenericError", "desc": "acpi: device unplug request for not supported device type: qemu32-i386-cpu"}}
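
For anyone who wants to script the same check, here is a minimal sketch of
what that exchange looks like as raw QMP JSON. The helper names are made up
for illustration (they are not part of libxl or QEMU), and the code only
builds the command and inspects a reply; it does not open the QMP socket.

```python
import json


def device_del_command(device_id):
    """Build the QMP wire message for device_del on one device id."""
    return json.dumps({"execute": "device_del", "arguments": {"id": device_id}})


def qmp_error(reply_text):
    """Return (class, desc) if the QMP reply is an error, else None."""
    reply = json.loads(reply_text)
    err = reply.get("error")
    if err is None:
        return None
    return err.get("class"), err.get("desc")


# Reply text copied from the session above.
REPLY = ('{"error": {"class": "GenericError", "desc": "acpi: device unplug '
         'request for not supported device type: qemu32-i386-cpu"}}')

print(device_del_command("cpu-5"))
print(qmp_error(REPLY))
```

A successful QMP command answers with a "return" member instead of "error",
so qmp_error() gives None in that case.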


Thanks so much for your input!

Cheers,
Joe




Re: Is xl vcpu-set broken

2023-02-28 Thread Joe Jin
On 2/28/23 12:49 AM, Andrew Cooper wrote:
> On 28/02/2023 6:44 am, Joe Jin wrote:
>> Hi,
>>
>> We encountered a vcpu-set issue on an old Xen. When I tried to confirm
>> whether upstream Xen has the same issue, I found that neither my upstream
>> build nor Ubuntu 22.04's xen-hypervisor-4.16 works.
>>
>> I can add vcpus (8->16) to my guest, but I cannot reduce the vcpu number:
>>
>> root@ubuntu2204:~/vm# xl list
>> Name      ID    Mem VCPUs  State  Time(s)
>> Domain-0   0 255424    32  r-       991.9
>> testvm     1   4088    16  -b        94.6
>> root@ubuntu2204:~/vm# xl vcpu-set testvm 8
>> root@ubuntu2204:~/vm# xl list
>> Name      ID    Mem VCPUs  State  Time(s)
>> Domain-0   0 255424    32  r-       998.5
>> testvm     1   4088    16  -b        97.3
>>
>> After xl vcpu-set, xenstore showed online cpu number reduced to 8:
>>
>> /local/domain/1/vm = "/vm/aa109ea0-2fde-4712-81d8-75f73bd73715"
>> /local/domain/1/name = "testvm"
>> /local/domain/1/cpu = ""
>> /local/domain/1/cpu/0 = ""
>> /local/domain/1/cpu/0/availability = "online"
>> /local/domain/1/cpu/1 = ""
>> /local/domain/1/cpu/1/availability = "online"
>> /local/domain/1/cpu/2 = ""
>> /local/domain/1/cpu/2/availability = "online"
>> /local/domain/1/cpu/3 = ""
>> /local/domain/1/cpu/3/availability = "online"
>> /local/domain/1/cpu/4 = ""
>> /local/domain/1/cpu/4/availability = "online"
>> /local/domain/1/cpu/5 = ""
>> /local/domain/1/cpu/5/availability = "online"
>> /local/domain/1/cpu/6 = ""
>> /local/domain/1/cpu/6/availability = "online"
>> /local/domain/1/cpu/7 = ""
>> /local/domain/1/cpu/7/availability = "online"
>> /local/domain/1/cpu/8 = ""
>> /local/domain/1/cpu/8/availability = "offline"
>> /local/domain/1/cpu/9 = ""
>> /local/domain/1/cpu/9/availability = "offline"
>> /local/domain/1/cpu/10 = ""
>> /local/domain/1/cpu/10/availability = "offline"
>> /local/domain/1/cpu/11 = ""
>> /local/domain/1/cpu/11/availability = "offline"
>> /local/domain/1/cpu/12 = ""
>> /local/domain/1/cpu/12/availability = "offline"
>> /local/domain/1/cpu/13 = ""
>> /local/domain/1/cpu/13/availability = "offline"
>> /local/domain/1/cpu/14 = ""
>> /local/domain/1/cpu/14/availability = "offline"
>> /local/domain/1/cpu/15 = ""
>> /local/domain/1/cpu/15/availability = "offline"
>> /local/domain/1/cpu/16 = ""
>> /local/domain/1/cpu/16/availability = "offline"
>>
>> But the guest did not receive any offline events at all.
>>
>> From the source code, my understanding is that for cpu online, libxl
>> sends device_add to qemu to trigger the vcpu add, while for cpu offline
>> it still relies on xenstore. Is this correct? Can anything block
>> cpu scale-down?
>>
>> I'd appreciate any suggestions!
> Your mention of Qemu suggests this is an HVM guest.  Can you confirm?
Yes, it's an HVM guest.

Thanks,
Joe




Re: Is xl vcpu-set broken

2023-02-28 Thread Anthony PERARD
On Tue, Feb 28, 2023 at 10:37:00AM +0100, Jan Beulich wrote:
> On 28.02.2023 07:44, Joe Jin wrote:
> > We encountered a vcpu-set issue on an old Xen. When I tried to confirm
> > whether upstream Xen has the same issue, I found that neither my upstream
> > build nor Ubuntu 22.04's xen-hypervisor-4.16 works.
> >
> > I can add vcpus (8->16) to my guest, but I cannot reduce the vcpu number:
> > 
> > root@ubuntu2204:~/vm# xl list
> > Name      ID    Mem VCPUs  State  Time(s)
> > Domain-0   0 255424    32  r-       991.9
> > testvm     1   4088    16  -b        94.6
> > root@ubuntu2204:~/vm# xl vcpu-set testvm 8
> > root@ubuntu2204:~/vm# xl list
> > Name      ID    Mem VCPUs  State  Time(s)
> > Domain-0   0 255424    32  r-       998.5
> > testvm     1   4088    16  -b        97.3
> > 
> > After xl vcpu-set, xenstore showed online cpu number reduced to 8:
[...]
> > 
> > But the guest did not receive any offline events at all.
> >
> > From the source code, my understanding is that for cpu online, libxl
> > sends device_add to qemu to trigger the vcpu add, while for cpu offline
> > it still relies on xenstore. Is this correct?
> 
> Judging from the DSDT we provide, offlining looks to also be intended to
> go the ACPI way. Whereas libxl only ever sends "device_add" commands to
> qemu, afaics (or "cpu-add" for older qemu). Anthony - do you have any
> insight what the intentions here are?

The intention is to one day implement cpu offline in QEMU upstream for
HVM guests; I don't think that's ever been done so far.

As we use device_add for cpu hotplug, we would probably do device_del
for hot-unplug, so qemu would still have to send the appropriate event to
the guest.

Someone will have to figure out if "device_del" works with a Xen guest,
doc here:
https://www.qemu.org/docs/master/system/cpu-hotplug.html#vcpu-hot-unplug

Cheers,

-- 
Anthony PERARD



Re: Is xl vcpu-set broken

2023-02-28 Thread Jan Beulich
On 28.02.2023 07:44, Joe Jin wrote:
> Hi,
> 
> We encountered a vcpu-set issue on an old Xen. When I tried to confirm
> whether upstream Xen has the same issue, I found that neither my upstream
> build nor Ubuntu 22.04's xen-hypervisor-4.16 works.
>
> I can add vcpus (8->16) to my guest, but I cannot reduce the vcpu number:
> 
> root@ubuntu2204:~/vm# xl list
> Name      ID    Mem VCPUs  State  Time(s)
> Domain-0   0 255424    32  r-       991.9
> testvm     1   4088    16  -b        94.6
> root@ubuntu2204:~/vm# xl vcpu-set testvm 8
> root@ubuntu2204:~/vm# xl list
> Name      ID    Mem VCPUs  State  Time(s)
> Domain-0   0 255424    32  r-       998.5
> testvm     1   4088    16  -b        97.3
> 
> After xl vcpu-set, xenstore showed online cpu number reduced to 8:
> 
> /local/domain/1/vm = "/vm/aa109ea0-2fde-4712-81d8-75f73bd73715"
> /local/domain/1/name = "testvm"
> /local/domain/1/cpu = ""
> /local/domain/1/cpu/0 = ""
> /local/domain/1/cpu/0/availability = "online"
> /local/domain/1/cpu/1 = ""
> /local/domain/1/cpu/1/availability = "online"
> /local/domain/1/cpu/2 = ""
> /local/domain/1/cpu/2/availability = "online"
> /local/domain/1/cpu/3 = ""
> /local/domain/1/cpu/3/availability = "online"
> /local/domain/1/cpu/4 = ""
> /local/domain/1/cpu/4/availability = "online"
> /local/domain/1/cpu/5 = ""
> /local/domain/1/cpu/5/availability = "online"
> /local/domain/1/cpu/6 = ""
> /local/domain/1/cpu/6/availability = "online"
> /local/domain/1/cpu/7 = ""
> /local/domain/1/cpu/7/availability = "online"
> /local/domain/1/cpu/8 = ""
> /local/domain/1/cpu/8/availability = "offline"
> /local/domain/1/cpu/9 = ""
> /local/domain/1/cpu/9/availability = "offline"
> /local/domain/1/cpu/10 = ""
> /local/domain/1/cpu/10/availability = "offline"
> /local/domain/1/cpu/11 = ""
> /local/domain/1/cpu/11/availability = "offline"
> /local/domain/1/cpu/12 = ""
> /local/domain/1/cpu/12/availability = "offline"
> /local/domain/1/cpu/13 = ""
> /local/domain/1/cpu/13/availability = "offline"
> /local/domain/1/cpu/14 = ""
> /local/domain/1/cpu/14/availability = "offline"
> /local/domain/1/cpu/15 = ""
> /local/domain/1/cpu/15/availability = "offline"
> /local/domain/1/cpu/16 = ""
> /local/domain/1/cpu/16/availability = "offline"
> 
> But the guest did not receive any offline events at all.
>
> From the source code, my understanding is that for cpu online, libxl
> sends device_add to qemu to trigger the vcpu add, while for cpu offline
> it still relies on xenstore. Is this correct?

Judging from the DSDT we provide, offlining looks to also be intended to
go the ACPI way. Whereas libxl only ever sends "device_add" commands to
qemu, afaics (or "cpu-add" for older qemu). Anthony - do you have any
insight what the intentions here are?

> Can anything block cpu scale-down?

Nothing as far as I'm aware. But if done the xenbus way, the guest needs
to watch the respective node. Yet on x86 that's done in upstream Linux
only for PV and PVH (see setup_vcpu_hotplug_event()); as per Andrew's
question the assumption here looks to be that you're asking about an HVM
guest, though. And of course it can't be blindly enabled there, as the
"online" operation should not be handled there in that case (or else
there'll be colliding onlining requests from ACPI and xenbus).
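
To double-check what the toolstack wrote, the availability nodes can be
tallied with a short script. This is only a sketch: it parses lines in the
path = "value" form of the dump quoted above, the function name is made up
for illustration, and the sample is generated to mirror that dump rather
than read from a live xenstore.

```python
import re

# Matches lines such as: /local/domain/1/cpu/8/availability = "offline"
_AVAIL_RE = re.compile(r'^/local/domain/(\d+)/cpu/(\d+)/availability = "(\w+)"$')


def vcpu_availability(dump, domid):
    """Return {vcpu: state} for one domain from a path = "value" dump."""
    states = {}
    for line in dump.splitlines():
        # Drop mail quote markers ("> ", ">> ") and surrounding whitespace.
        m = _AVAIL_RE.match(line.lstrip("> ").strip())
        if m and int(m.group(1)) == domid:
            states[int(m.group(2))] = m.group(3)
    return states


# Sample mirroring the quoted dump: vcpus 0-7 online, 8-16 offline.
SAMPLE = "\n".join(
    '/local/domain/1/cpu/%d/availability = "%s"'
    % (n, "online" if n < 8 else "offline")
    for n in range(17)
)

states = vcpu_availability(SAMPLE, 1)
online = sorted(n for n, s in states.items() if s == "online")
print("%d online, %d offline" % (len(online), len(states) - len(online)))
```

This only confirms the xenstore side; whether the guest acts on those nodes
is a separate matter, as described above.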

Jan



Re: Is xl vcpu-set broken

2023-02-28 Thread Andrew Cooper
On 28/02/2023 6:44 am, Joe Jin wrote:
> Hi,
>
> We encountered a vcpu-set issue on an old Xen. When I tried to confirm
> whether upstream Xen has the same issue, I found that neither my upstream
> build nor Ubuntu 22.04's xen-hypervisor-4.16 works.
>
> I can add vcpus (8->16) to my guest, but I cannot reduce the vcpu number:
>
> root@ubuntu2204:~/vm# xl list
> Name      ID    Mem VCPUs  State  Time(s)
> Domain-0   0 255424    32  r-       991.9
> testvm     1   4088    16  -b        94.6
> root@ubuntu2204:~/vm# xl vcpu-set testvm 8
> root@ubuntu2204:~/vm# xl list
> Name      ID    Mem VCPUs  State  Time(s)
> Domain-0   0 255424    32  r-       998.5
> testvm     1   4088    16  -b        97.3
>
> After xl vcpu-set, xenstore showed online cpu number reduced to 8:
>
> /local/domain/1/vm = "/vm/aa109ea0-2fde-4712-81d8-75f73bd73715"
> /local/domain/1/name = "testvm"
> /local/domain/1/cpu = ""
> /local/domain/1/cpu/0 = ""
> /local/domain/1/cpu/0/availability = "online"
> /local/domain/1/cpu/1 = ""
> /local/domain/1/cpu/1/availability = "online"
> /local/domain/1/cpu/2 = ""
> /local/domain/1/cpu/2/availability = "online"
> /local/domain/1/cpu/3 = ""
> /local/domain/1/cpu/3/availability = "online"
> /local/domain/1/cpu/4 = ""
> /local/domain/1/cpu/4/availability = "online"
> /local/domain/1/cpu/5 = ""
> /local/domain/1/cpu/5/availability = "online"
> /local/domain/1/cpu/6 = ""
> /local/domain/1/cpu/6/availability = "online"
> /local/domain/1/cpu/7 = ""
> /local/domain/1/cpu/7/availability = "online"
> /local/domain/1/cpu/8 = ""
> /local/domain/1/cpu/8/availability = "offline"
> /local/domain/1/cpu/9 = ""
> /local/domain/1/cpu/9/availability = "offline"
> /local/domain/1/cpu/10 = ""
> /local/domain/1/cpu/10/availability = "offline"
> /local/domain/1/cpu/11 = ""
> /local/domain/1/cpu/11/availability = "offline"
> /local/domain/1/cpu/12 = ""
> /local/domain/1/cpu/12/availability = "offline"
> /local/domain/1/cpu/13 = ""
> /local/domain/1/cpu/13/availability = "offline"
> /local/domain/1/cpu/14 = ""
> /local/domain/1/cpu/14/availability = "offline"
> /local/domain/1/cpu/15 = ""
> /local/domain/1/cpu/15/availability = "offline"
> /local/domain/1/cpu/16 = ""
> /local/domain/1/cpu/16/availability = "offline"
>
> But the guest did not receive any offline events at all.
>
> From the source code, my understanding is that for cpu online, libxl
> sends device_add to qemu to trigger the vcpu add, while for cpu offline
> it still relies on xenstore. Is this correct? Can anything block
> cpu scale-down?
>
> I'd appreciate any suggestions!

Your mention of Qemu suggests this is an HVM guest.  Can you confirm?

~Andrew