Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
I tested on VMware Fusion with 3, 4 and 8 CPUs, and it works in all cases.

(XEN) Xen version 4.6.1-pre ( 4.6.1~pre-1skyport1) (eswi...@skyportsystems.com) (gcc (Debian 5.2.1-19.1skyport1) 5.2.1 20150930) debug=n Wed Dec 2 07:22:20 PST 2015
(XEN) Bootloader: SYSLINUX 4.05 20140113
(XEN) Command line: xen console=com1,vga com1=115200 no-bootscrub dom0_mem=2048M,max:2048M loglvl=all cpuinfo=1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN) Found 1 MBR signatures
(XEN) Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) - 0009f800 (usable)
(XEN) 0009f800 - 000a (reserved)
(XEN) 000dc000 - 0010 (reserved)
(XEN) 0010 - bfef (usable)
(XEN) bfef - bfeff000 (ACPI data)
(XEN) bfeff000 - bff0 (ACPI NVS)
(XEN) bff0 - c000 (usable)
(XEN) f000 - f800 (reserved)
(XEN) fec0 - fec1 (reserved)
(XEN) fee0 - fee01000 (reserved)
(XEN) fffe - 0001 (reserved)
(XEN) 0001 - 0001c000 (usable)
(XEN) ACPI: RSDP 000F6A10, 0024 (r2 PTLTD )
(XEN) ACPI: XSDT BFEF030B, 0054 (r1 INTEL 440BX 604 VMW 1324272)
(XEN) ACPI: FACP BFEFEE73, 00F4 (r4 INTEL 440BX 604 PTL F4240)
(XEN) ACPI: DSDT BFEF05B1, E8C2 (r1 PTLTD Custom604 MSFT 301)
(XEN) ACPI: FACS BFEFFFC0, 0040
(XEN) ACPI: BOOT BFEF0589, 0028 (r1 PTLTD $SBFTBL$ 604 LTP1)
(XEN) ACPI: APIC BFEF050F, 007A (r1 PTLTD APIC604 LTP0)
(XEN) ACPI: MCFG BFEF04D3, 003C (r1 PTLTD $PCITBL$ 604 LTP1)
(XEN) ACPI: SRAT BFEF03C3, 0110 (r2 VMWARE MEMPLUG 604 VMW 1)
(XEN) ACPI: WAET BFEF039B, 0028 (r1 VMWARE VMW WAET 604 VMW 1)
(XEN) System RAM: 6143MB (6291004kB)
(XEN) SRAT: PXM 0 -> APIC 00 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 02 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 04 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 06 -> Node 0
(XEN) SRAT: Node 0 PXM 0 0-a
(XEN) SRAT: Node 0 PXM 0 10-1000
(XEN) SRAT: Node 0 PXM 0 1000-c000
(XEN) SRAT: Node 0 PXM 0 1-1c000
(XEN) NUMA: Allocated memnodemap from 1bdc5 - 1bdc52000
(XEN) NUMA: Using 8 for the hash shift.
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f6a80
(XEN) DMI present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x1008
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1004,1:0], pm1x_evt[1:1000,1:0]
(XEN) ACPI: wakeup_vec[bfefffcc], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:6 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
(XEN) Processor #2 6:6 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
(XEN) Processor #4 6:6 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
(XEN) Processor #6 6:6 APIC version 21
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 256K
(XEN) CPU: L3 cache: 6144K
(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
(XEN) CPU0: No MCE banks present. Machine check support disabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 2592.669 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table 82d0802bd090 -> 82d0802be2c0
(XEN) PCI: MCFG configuration 0: base f000 segment buses 00 - 7f
(XEN) PCI: MCFG area at f000 reserved in E820
(XEN) PCI: Using MCFG for segment bus 00-7f
(XEN) I/O virtualisation disabled
(XEN) CPU0: Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz stepping 01
(XEN) nr_sockets: 7
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 3.579MHz ACPI PM Timer
(XEN) Allocated console ring of 32 KiB.
(XEN) mwait-idle: MWAIT substates: 0x10
(XEN) mwait-idle: v0.4 model 0x46
(XEN) mwait-idle: lapic_timer_reliable_states 0x
(XEN) VMX: Supported advanced features:
(XEN) - APIC TPR shadow
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 24.11.15 at 21:28, wrote:
> RFC. Boot tested on VMware Fusion, and on a 2-socket Xeon server.

Mind trying this one instead?

Jan

--- unstable.orig/xen/arch/x86/mpparse.c
+++ unstable/xen/arch/x86/mpparse.c
@@ -89,19 +89,14 @@ void __init set_nr_cpu_ids(unsigned int
 
 void __init set_nr_sockets(void)
 {
-    /*
-     * Count the actual cpus in the socket 0 and use it to calculate nr_sockets
-     * so that the latter will be always >= the actual socket number in the
-     * system even when APIC IDs from MP table are too sparse.
-     */
-    unsigned int cpus = bitmap_weight(phys_cpu_present_map.mask,
-                                      boot_cpu_data.x86_max_cores *
-                                      boot_cpu_data.x86_num_siblings);
-
-    if ( cpus == 0 )
-        cpus = 1;
-
-    nr_sockets = DIV_ROUND_UP(num_processors + disabled_cpus, cpus);
+    nr_sockets = last_physid(phys_cpu_present_map)
+                 / boot_cpu_data.x86_max_cores
+                 / boot_cpu_data.x86_num_siblings + 1;
+    if (disabled_cpus)
+        nr_sockets += (disabled_cpus - 1)
+                      / boot_cpu_data.x86_max_cores
+                      / boot_cpu_data.x86_num_siblings + 1;
+    printk(XENLOG_DEBUG "nr_sockets: %u\n", nr_sockets);
 }
 
 /*
--- unstable.orig/xen/include/asm-x86/mpspec.h
+++ unstable/xen/include/asm-x86/mpspec.h
@@ -43,6 +43,19 @@ typedef struct physid_mask physid_mask_t
 #define physid_isset(physid, map)        test_bit(physid, (map).mask)
 #define physid_test_and_set(physid, map) test_and_set_bit(physid, (map).mask)
 
+#define first_physid(map)                find_first_bit((map).mask, \
+                                                        MAX_APICS)
+#define next_physid(id, map)             find_next_bit((map).mask, \
+                                                       MAX_APICS, (id) + 1)
+#define last_physid(map) ({ \
+    const unsigned long *mask = (map).mask; \
+    unsigned int id, last = MAX_APICS; \
+    for (id = find_first_bit(mask, MAX_APICS); id < MAX_APICS; \
+         id = find_next_bit(mask, MAX_APICS, (id) + 1)) \
+        last = id; \
+    last; \
+})
+
 #define physids_and(dst, src1, src2)     bitmap_and((dst).mask, (src1).mask, (src2).mask, MAX_APICS)
 #define physids_or(dst, src1, src2)      bitmap_or((dst).mask, (src1).mask, (src2).mask, MAX_APICS)
 #define physids_clear(map)               bitmap_zero((map).mask, MAX_APICS)
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
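The sizing rule in Jan's patch can be checked outside Xen. The following is a minimal standalone sketch, not Xen code: `last_present_id()` is a hypothetical stand-in for the `last_physid()` macro, a plain `unsigned long` array plays the role of `phys_cpu_present_map`, and the small `MAX_APICS` value is an assumption chosen for the sketch. It only reproduces the arithmetic of the proposed `set_nr_sockets()`.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: a small MAX_APICS for the sketch; Xen's real value differs. */
#define MAX_APICS      256u
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS     ((MAX_APICS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Mark an APIC ID as present in a plain bitmap (stand-in for physid_set()). */
static void set_present(unsigned long *mask, unsigned int id)
{
    assert(id < MAX_APICS);
    mask[id / BITS_PER_ULONG] |= 1UL << (id % BITS_PER_ULONG);
}

/* Hypothetical stand-in for last_physid(): highest set ID, or MAX_APICS if none. */
static unsigned int last_present_id(const unsigned long *mask)
{
    unsigned int id, last = MAX_APICS;

    for ( id = 0; id < MAX_APICS; id++ )
        if ( mask[id / BITS_PER_ULONG] & (1UL << (id % BITS_PER_ULONG)) )
            last = id;
    return last;
}

/* Same arithmetic as the proposed set_nr_sockets(). */
static unsigned int compute_nr_sockets(const unsigned long *present,
                                       unsigned int max_cores,
                                       unsigned int num_siblings,
                                       unsigned int disabled_cpus)
{
    unsigned int nr;

    assert(max_cores > 0 && num_siblings > 0);
    /* Socket of the highest present APIC ID, plus one. */
    nr = last_present_id(present) / max_cores / num_siblings + 1;
    /* Round disabled (hotplug) CPUs up to whole extra sockets. */
    if ( disabled_cpus )
        nr += (disabled_cpus - 1) / max_cores / num_siblings + 1;
    return nr;
}
```

For the Fusion layout (APIC IDs 0, 2, 4, 6, one core per socket, no hyperthreading) this gives 6 / 1 / 1 + 1 = 7, which matches the `nr_sockets: 7` line in the 4.6.1-pre boot log; a dense 2-socket system with 4 cores and 2 threads per socket (APIC IDs 0-15) gives 15 / 4 / 2 + 1 = 2. Because the result is derived from the highest ID rather than from a CPU count, sparse socket numbering over-allocates slightly instead of under-allocating.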
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
BTW, we use "package ID" rather than "socket ID" in the SDM. Assignment of package IDs on a system is a BIOS matter. Basically, the BIOS needs to assign package IDs to resolve APIC ID collisions at early boot time, and the convention is up to the vendor or the specific system configuration agents. "Contiguous package IDs" are not required there, to keep it flexible. In fact, if you look at "Example 8-22. Compute the Number of Packages, Cores, and Processor Relationships in a MP System", the sample code doesn't assume that PACKAGE_IDs are contiguous.

On Thu, Nov 26, 2015 at 6:11 PM, Chao Peng wrote:
> On Thu, Nov 26, 2015 at 12:49:42AM -0700, Jan Beulich wrote:
>> >>> On 26.11.15 at 00:27, wrote:
>> > A few more data points: I also tested Xen 4.6 on VMware ESXi 5.5, and
>> > it yields similar results. Not surprising, since Fusion uses basically
>> > the same virtualization engine.
>> >
>> > However, ESXi offers many more choices of number of processors, number
>> > of cores, hyperthreading, etc. The weird processor ID assignment (0,
>> > 2, 4, 6, ...) occurs only with 4 or 8 processors, 1 core per socket,
>> > and no hyperthreading. If I change any of these parameters, the
>> > processor IDs become sequential.
>> >
>> > It appears in the 4- and 8-processor cases, VMware is emulating
>> > something like a Xeon E7340:
>> > https://github.com/deater/test_proc/blob/master/x86_64/x86_64.intel.6.15.11.xeon_e7340
>> >
>> > In fact someone asked a question about running Xen on this platform
>> > way back when:
>> > http://lists.xenproject.org/archives/html/xen-users/2008-05/msg00691.html
>> >
>> > Others of similar vintage assign processor IDs 0 and 3 on a
>> > 2-processor system:
>> > https://www.centos.org/forums/viewtopic.php?t=30255
>> >
>> > or even 0 and 6:
>> > http://serverfault.com/questions/302429/interpreting-cpuinfo
>> >
>> > So there are real hardware platforms with non-sequential processor
>> > IDs. They are quite ancient and don't support CAT, but that doesn't
>> > rule out the possibility of a newer or future platform behaving
>> > similarly.
>>
>> Not supporting CAT is not a criterion, since the socket data setup
>> happens unconditionally. However (and as said before),
>> non-sequential processor IDs are fine. Non-sequential socket IDs are
>> what is problematic.
>
> I have asked about the non-sequential socket ID problem internally, but I
> don't know whether I will get a clear answer in the end; please stay tuned.
>
> Thanks,
> Chao

--
Jun
Intel Open Source Technology Center
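The point about the SDM example can be illustrated with a small sketch that derives package IDs by shifting APIC IDs right and then counts them without assuming they are contiguous. This is not the SDM's sample code: `summarize_packages()`, `struct pkg_summary` and the fixed `MAX_PKG_IDS` bound are hypothetical, and the shift is passed in rather than read from CPUID. It reports both the number of populated packages and the ID span (highest ID + 1), which is what a dense per-socket array would have to be sized to.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PKG_IDS 256  /* assumption: upper bound on package IDs for the sketch */

struct pkg_summary {
    unsigned int distinct;  /* packages that actually contain a CPU */
    unsigned int id_span;   /* highest package ID + 1: size a dense array needs */
};

/*
 * Package ID = APIC ID >> pkg_shift (pkg_shift covers the SMT and core
 * fields). No assumption is made that the resulting IDs are contiguous.
 */
static struct pkg_summary summarize_packages(const unsigned int *apic_ids,
                                             unsigned int n,
                                             unsigned int pkg_shift)
{
    bool seen[MAX_PKG_IDS] = { false };
    struct pkg_summary s = { 0, 0 };
    unsigned int i;

    for ( i = 0; i < n; i++ )
    {
        unsigned int pkg = apic_ids[i] >> pkg_shift;

        assert(pkg < MAX_PKG_IDS);
        if ( !seen[pkg] )
        {
            seen[pkg] = true;
            s.distinct++;
        }
        if ( pkg + 1 > s.id_span )
            s.id_span = pkg + 1;
    }
    return s;
}
```

For the Fusion APIC IDs 0, 2, 4, 6 with a shift of 0, this reports 4 populated packages but an ID span of 7; for a conventional 2-socket layout such as APIC IDs 0-3 and 16-19 with a shift of 4, both values are 2.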
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Thu, Nov 26, 2015 at 12:49:42AM -0700, Jan Beulich wrote:
> >>> On 26.11.15 at 00:27, wrote:
> > A few more data points: I also tested Xen 4.6 on VMware ESXi 5.5, and
> > it yields similar results. Not surprising, since Fusion uses basically
> > the same virtualization engine.
> >
> > However, ESXi offers many more choices of number of processors, number
> > of cores, hyperthreading, etc. The weird processor ID assignment (0,
> > 2, 4, 6, ...) occurs only with 4 or 8 processors, 1 core per socket,
> > and no hyperthreading. If I change any of these parameters, the
> > processor IDs become sequential.
> >
> > It appears in the 4- and 8-processor cases, VMware is emulating
> > something like a Xeon E7340:
> > https://github.com/deater/test_proc/blob/master/x86_64/x86_64.intel.6.15.11.xeon_e7340
> >
> > In fact someone asked a question about running Xen on this platform
> > way back when:
> > http://lists.xenproject.org/archives/html/xen-users/2008-05/msg00691.html
> >
> > Others of similar vintage assign processor IDs 0 and 3 on a
> > 2-processor system:
> > https://www.centos.org/forums/viewtopic.php?t=30255
> >
> > or even 0 and 6:
> > http://serverfault.com/questions/302429/interpreting-cpuinfo
> >
> > So there are real hardware platforms with non-sequential processor
> > IDs. They are quite ancient and don't support CAT, but that doesn't
> > rule out the possibility of a newer or future platform behaving
> > similarly.
>
> Not supporting CAT is not a criterion, since the socket data setup
> happens unconditionally. However (and as said before),
> non-sequential processor IDs are fine. Non-sequential socket IDs are
> what is problematic.

I have asked about the non-sequential socket ID problem internally, but I don't know whether I will get a clear answer in the end; please stay tuned.

Thanks,
Chao
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
A few more data points: I also tested Xen 4.6 on VMware ESXi 5.5, and it yields similar results. Not surprising, since Fusion uses basically the same virtualization engine.

However, ESXi offers many more choices of number of processors, number of cores, hyperthreading, etc. The weird processor ID assignment (0, 2, 4, 6, ...) occurs only with 4 or 8 processors, 1 core per socket, and no hyperthreading. If I change any of these parameters, the processor IDs become sequential.

It appears in the 4- and 8-processor cases, VMware is emulating something like a Xeon E7340:
https://github.com/deater/test_proc/blob/master/x86_64/x86_64.intel.6.15.11.xeon_e7340

In fact someone asked a question about running Xen on this platform way back when:
http://lists.xenproject.org/archives/html/xen-users/2008-05/msg00691.html

Others of similar vintage assign processor IDs 0 and 3 on a 2-processor system:
https://www.centos.org/forums/viewtopic.php?t=30255

or even 0 and 6:
http://serverfault.com/questions/302429/interpreting-cpuinfo

So there are real hardware platforms with non-sequential processor IDs. They are quite ancient and don't support CAT, but that doesn't rule out the possibility of a newer or future platform behaving similarly.

So far there is no evidence of a platform assigning extremely large processor IDs; until there is, we are safe using arrays and bitmaps. The issue is sizing these data structures appropriately.
--Ed

On Wed, Nov 25, 2015 at 1:04 AM, Jan Beulich wrote:
> On 25.11.15 at 08:48, wrote:
>> On Tue, Nov 24, 2015 at 03:34:45AM -0700, Jan Beulich wrote:
>>> Chao, could you - inside Intel - please check whether there are
>>> any assumptions on the respective CPUID leaf output that aren't
>>> explicitly stated in the SDM right now (like resulting in contiguous
>>> socket numbers), and ask for them to be made explicit (if there
>>> are any), or for it to be made explicit that no assumptions at all
>>> are to be made on the presented values
>>
>> Actually there is already such a statement in the SDM (ch8.9.1, vol3):
>>
>> "The value of valid APIC_IDs need not be contiguous across package
>> boundary or core boundaries".
>
> That's a statement on APIC ID space (which necessarily can't be
> contiguous on systems with a non-power-of-2 core count), but I
> was asking about the socket ID space.
>
>>> (in which case we'd
>>> have to consume MADT parsing data in set_nr_sockets(), e.g.
>>> by replacing num_processors there with one more than the
>>> maximum APIC ID of any non-disabled CPU)?
>>
>> Even with this, we still have a problem in the hotplug case: the inserted
>> CPU may have an APIC_ID bigger than the maximum APIC_ID here.
>>
>> But let's get back to the real world. Most machines that support CAT should
>> have contiguous SOCKET_IDs, so it's not a problem. Given that CAT is the
>> only feature that uses this, I guess this suggestion might be better than
>> other solutions in practice.
>
> And we could actually cater for that by extrapolating the value
> added to cover disabled_cpus.
>
> Jan
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 26.11.15 at 00:27, wrote:
> A few more data points: I also tested Xen 4.6 on VMware ESXi 5.5, and
> it yields similar results. Not surprising, since Fusion uses basically
> the same virtualization engine.
>
> However, ESXi offers many more choices of number of processors, number
> of cores, hyperthreading, etc. The weird processor ID assignment (0,
> 2, 4, 6, ...) occurs only with 4 or 8 processors, 1 core per socket,
> and no hyperthreading. If I change any of these parameters, the
> processor IDs become sequential.
>
> It appears in the 4- and 8-processor cases, VMware is emulating
> something like a Xeon E7340:
> https://github.com/deater/test_proc/blob/master/x86_64/x86_64.intel.6.15.11.xeon_e7340
>
> In fact someone asked a question about running Xen on this platform
> way back when:
> http://lists.xenproject.org/archives/html/xen-users/2008-05/msg00691.html
>
> Others of similar vintage assign processor IDs 0 and 3 on a
> 2-processor system:
> https://www.centos.org/forums/viewtopic.php?t=30255
>
> or even 0 and 6:
> http://serverfault.com/questions/302429/interpreting-cpuinfo
>
> So there are real hardware platforms with non-sequential processor
> IDs. They are quite ancient and don't support CAT, but that doesn't
> rule out the possibility of a newer or future platform behaving
> similarly.

Not supporting CAT is not a criterion, since the socket data setup happens unconditionally. However (and as said before), non-sequential processor IDs are fine. Non-sequential socket IDs are what is problematic.

Jan
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 25.11.15 at 08:48, wrote:
> On Tue, Nov 24, 2015 at 03:34:45AM -0700, Jan Beulich wrote:
>> Chao, could you - inside Intel - please check whether there are
>> any assumptions on the respective CPUID leaf output that aren't
>> explicitly stated in the SDM right now (like resulting in contiguous
>> socket numbers), and ask for them to be made explicit (if there
>> are any), or for it to be made explicit that no assumptions at all
>> are to be made on the presented values
>
> Actually there is already such a statement in the SDM (ch8.9.1, vol3):
>
> "The value of valid APIC_IDs need not be contiguous across package
> boundary or core boundaries".

That's a statement on APIC ID space (which necessarily can't be contiguous on systems with a non-power-of-2 core count), but I was asking about the socket ID space.

>> (in which case we'd
>> have to consume MADT parsing data in set_nr_sockets(), e.g.
>> by replacing num_processors there with one more than the
>> maximum APIC ID of any non-disabled CPU)?
>
> Even with this, we still have a problem in the hotplug case: the inserted
> CPU may have an APIC_ID bigger than the maximum APIC_ID here.
>
> But let's get back to the real world. Most machines that support CAT should
> have contiguous SOCKET_IDs, so it's not a problem. Given that CAT is the
> only feature that uses this, I guess this suggestion might be better than
> other solutions in practice.

And we could actually cater for that by extrapolating the value added to cover disabled_cpus.

Jan
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 24.11.15 at 21:28, wrote:
> RFC. Boot tested on VMware Fusion, and on a 2-socket Xeon server.

Well, thanks, but as I said, I view this as overkill (and I'm also not sure what you have is completely race-free). Hence I'd prefer a more lightweight solution if at all possible.

Jan
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
RFC. Boot tested on VMware Fusion, and on a 2-socket Xeon server.

diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index ea07888..a41ce2d 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -67,7 +67,7 @@ extern unsigned int nr_sockets;
 void set_nr_sockets(void);
 
 /* Representing HT and core siblings in each socket. */
-extern cpumask_t **socket_cpumask;
+cpumask_t *socket_cpumask(unsigned int socket);
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 0946992..6aadaac 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -60,7 +60,7 @@ cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
 unsigned int __read_mostly nr_sockets;
-cpumask_t **__read_mostly socket_cpumask;
+static struct radix_tree_root socket_cpumask_tree;
 static cpumask_t *secondary_socket_cpumask;
 
 struct cpuinfo_x86 cpu_data[NR_CPUS];
@@ -81,6 +81,11 @@ static enum cpu_state {
 
 void *stack_base[NR_CPUS];
 
+cpumask_t *socket_cpumask(unsigned int socket)
+{
+    return radix_tree_lookup(&socket_cpumask_tree, socket);
+}
+
 static void smp_store_cpu_info(int id)
 {
     struct cpuinfo_x86 *c = cpu_data + id;
@@ -92,9 +97,9 @@ static void smp_store_cpu_info(int id)
     identify_cpu(c);
 
     socket = cpu_to_socket(id);
-    if ( !socket_cpumask[socket] )
+    if ( radix_tree_insert(&socket_cpumask_tree, socket,
+                           secondary_socket_cpumask) == 0 )
     {
-        socket_cpumask[socket] = secondary_socket_cpumask;
         secondary_socket_cpumask = NULL;
     }
 }
@@ -258,7 +263,7 @@ static void set_cpu_sibling_map(int cpu)
 
     cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
 
-    cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
+    cpumask_set_cpu(cpu, socket_cpumask(cpu_to_socket(cpu)));
 
     if ( c[cpu].x86_num_siblings > 1 )
     {
@@ -666,11 +671,12 @@ static void cpu_smpboot_free(unsigned int cpu)
 {
     unsigned int order, socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
+    cpumask_t *m = socket_cpumask(socket);
 
-    if ( cpumask_empty(socket_cpumask[socket]) )
+    if ( m && cpumask_empty(m) )
     {
-        xfree(socket_cpumask[socket]);
-        socket_cpumask[socket] = NULL;
+        radix_tree_delete(&socket_cpumask_tree, socket);
+        xfree(m);
     }
 
     c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
@@ -804,6 +810,8 @@ static struct notifier_block cpu_smpboot_nfb = {
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
+    cpumask_t *m;
+
     register_cpu_notifier(&cpu_smpboot_nfb);
 
     mtrr_aps_sync_begin();
@@ -819,9 +827,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
     set_nr_sockets();
 
-    socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
-    if ( socket_cpumask == NULL ||
-         (socket_cpumask[cpu_to_socket(0)] = xzalloc(cpumask_t)) == NULL )
+    radix_tree_init(&socket_cpumask_tree);
+    if ( (m = xzalloc(cpumask_t)) == NULL ||
+         radix_tree_insert(&socket_cpumask_tree, cpu_to_socket(0), m) != 0 )
         panic("No memory for socket CPU siblings map");
 
     if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, 0)) ||
@@ -888,7 +896,7 @@ remove_siblinginfo(int cpu)
 {
     int sibling;
 
-    cpumask_clear_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
+    cpumask_clear_cpu(cpu, socket_cpumask(cpu_to_socket(cpu)));
 
     for_each_cpu ( sibling, per_cpu(cpu_core_mask, cpu) )
     {
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index c0daa2e..7acb3d9 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -52,14 +52,6 @@ static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
 static struct psr_cat_cbm *temp_cos_to_cbm;
 
-static unsigned int get_socket_cpu(unsigned int socket)
-{
-    if ( likely(socket < nr_sockets) )
-        return cpumask_any(socket_cpumask[socket]);
-
-    return nr_cpu_ids;
-}
-
 static void __init parse_psr_bool(char *s, char *value, char *feature,
                                   unsigned int mask)
 {
@@ -331,7 +323,8 @@ static int write_l3_cbm(unsigned int socket, unsigned int cos, uint64_t cbm)
         do_write_l3_cbm(&info);
     else
     {
-        unsigned int cpu = get_socket_cpu(socket);
+        cpumask_t *m = socket_cpumask(socket);
+        unsigned int cpu = m ? cpumask_any(m) : nr_cpu_ids;
 
         if ( cpu >= nr_cpu_ids )
             return -ENOTSOCK;
@@ -503,8 +496,9 @@ static void cat_cpu_init(void)
 static void cat_cpu_fini(unsigned int cpu)
 {
     unsigned int socket = cpu_to_socket(cpu);
+    cpumask_t *m = socket_cpumask(socket);
 
-    if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )
+    if ( !m || cpumask_empty(m) )
     {
         struct psr_cat_socket_info *info = cat_socket_info + socket;

On Tue, Nov 24, 2015 at 7:20 AM, Jan Beulich wrote:
> On 24.11.15 at 15:13, wrote:
>> On Tue, Nov 24, 2015 at 2:34 AM, Jan Beulich wrote:
>>> Bottom line - for
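The patch relies on a simple contract: a lookup returns NULL for a socket that has no mask yet, and an insert fails if the socket already has one (which is why `smp_store_cpu_info()` only hands over `secondary_socket_cpumask` when the insert returns 0). The sketch below shows that contract in portable user-space C. It is a hypothetical stand-in, not Xen's radix tree API: it uses a singly linked list (one of the alternatives Ed raises in the thread), and a `uint64_t` in place of `cpumask_t`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_sketch_t;

struct socket_node {
    unsigned int socket;        /* key: socket (package) ID, may be sparse */
    cpumask_sketch_t *mask;     /* value: CPUs seen on that socket */
    struct socket_node *next;
};

static struct socket_node *socket_list;

/* Like a radix tree lookup: NULL when the socket has no entry. */
static cpumask_sketch_t *socket_mask_lookup(unsigned int socket)
{
    struct socket_node *n;

    for ( n = socket_list; n != NULL; n = n->next )
        if ( n->socket == socket )
            return n->mask;
    return NULL;
}

/* 0 on success, -EEXIST if the socket already has a mask, -ENOMEM on OOM. */
static int socket_mask_insert(unsigned int socket, cpumask_sketch_t *mask)
{
    struct socket_node *n;

    assert(mask != NULL);
    if ( socket_mask_lookup(socket) != NULL )
        return -EEXIST;
    n = malloc(sizeof(*n));
    if ( n == NULL )
        return -ENOMEM;
    n->socket = socket;
    n->mask = mask;
    n->next = socket_list;
    socket_list = n;
    return 0;
}

/* Unlink the entry and return its mask (the caller frees it), or NULL. */
static cpumask_sketch_t *socket_mask_delete(unsigned int socket)
{
    struct socket_node **pp, *n;
    cpumask_sketch_t *mask;

    for ( pp = &socket_list; (n = *pp) != NULL; pp = &n->next )
        if ( n->socket == socket )
        {
            *pp = n->next;
            mask = n->mask;
            free(n);
            return mask;
        }
    return NULL;
}
```

Sparse keys such as 0, 2, 4 and 6 cost one node each regardless of how large the IDs are. A list lookup is linear in the number of populated sockets, which is harmless for a handful of packages; a radix tree keeps the same semantics while scaling better to many sparse keys.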
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Tue, Nov 24, 2015 at 03:34:45AM -0700, Jan Beulich wrote:
> >>> On 23.11.15 at 17:36, wrote:
> > I instrumented detect_extended_topology() and ran again with 4 CPUs.
> > [...]
> > (XEN) smp_store_cpu_info id=3
> > (XEN) detect_extended_topology cpuid_count op=0xb count=0 eax=0x0 ebx=0x1
> > ecx=0x100 edx=0x6
> > (XEN) detect_extended_topology initial_apicid=6 core_plus_mask_width=0
> > core_level_siblings=1
> > (XEN) detect_extended_topology cpuid_count op=0xb count=1 eax=0x0 ebx=0x1
> > ecx=0x201 edx=0x6
> > (XEN) detect_extended_topology ht_mask_width=0 core_plus_mask_width=0
> > core_select_mask=0x0 core_level_siblings=1
> > [...]
> > If cpuid 0xb returned 1 rather than 0 in eax[4:0], we would get
> > consecutively-numbered physical processor IDs.
> >
> > But the only requirement I see in the IA SDM (vol 2A, table 3-17) is that
> > the eax[4:0] value yield unique IDs, not necessarily consecutive. Likewise
> > while the examples in vol 3A sec 8.9 show physical IDs numbered
> > consecutively, the algorithms do not assume this is the case.
>
> Indeed, and I think I had said so. The algorithm does, however, tell
> us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both
> shifts being zero), which for the whole system results in sockets 1,
> 3, and 5 unused. While not explicitly excluded, I'm not sure how far
> we should go in expecting all kinds of odd configurations (along those
> lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS /
> MAX_LOCAL_APIC, which for big systems is 4 times the number of
> CPUs we support).
>
> Taking it to set_nr_sockets(), a pretty basic assumption is broken by
> the above way of presenting topology: We would have to have more
> sockets than there are CPUs. I would have wanted to check what
> e.g. Linux does here, but there doesn't seem to be any support for
> CAT (and hence any need for per-socket data) there.

Actually, I checked the Linux code when I was implementing this, but there is nothing like it there. The current Linux CAT patch supports only system-level rather than per-socket allocation, so it doesn't need this either. People are requesting per-socket support, so Linux will need to solve this problem eventually. But for now, we don't have any reference.

>
> (I am, btw, now also confused by you saying that e.g. for a 3-CPU
> config things work. If the topology data gets presented in similar
> ways in that case, I can't see why you wouldn't run into the same
> problem. Unless memory corruption occurs silently in one case, but
> "loudly" in the other.)
>
> Bottom line - for the moment I do not see a reasonable way of
> dealing with that situation. The closest I could see would be what
> we iirc had temporarily during the review cycles of the initial CAT
> series: A command line option to specify the number of sockets. Or
> make all accesses to socket_cpumask[] conditional upon PSR being
> enabled (which would have the bad side effect of making future
> uses for other purposes more cumbersome), or go through and
> range check the socket number on all of those accesses.
>
> Chao, could you - inside Intel - please check whether there are
> any assumptions on the respective CPUID leaf output that aren't
> explicitly stated in the SDM right now (like resulting in contiguous
> socket numbers), and ask for them to be made explicit (if there
> are any), or for it to be made explicit that no assumptions at all
> are to be made on the presented values

Actually there is already such a statement in the SDM (ch8.9.1, vol3): "The value of valid APIC_IDs need not be contiguous across package boundary or core boundaries".

> (in which case we'd
> have to consume MADT parsing data in set_nr_sockets(), e.g.
> by replacing num_processors there with one more than the
> maximum APIC ID of any non-disabled CPU)?

Even with this, we still have a problem in the hotplug case: the inserted CPU may have an APIC_ID bigger than the maximum APIC_ID here.

But let's get back to the real world. Most machines that support CAT should have contiguous SOCKET_IDs, so it's not a problem. Given that CAT is the only feature that uses this, I guess this suggestion might be better than other solutions in practice.

Chao
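The instrumented CPUID output can be decoded by hand to see where socket 6 comes from. The sketch below is a simplified reading of CPUID leaf 0xB (EAX[4:0] is the shift width to the next level, ECX[15:8] the level type with 1 = SMT and 2 = Core, EDX the x2APIC ID). It is not Xen's `detect_extended_topology()`; `struct leaf_0b` and `package_id()` are hypothetical names, and the subleaves are passed in as data instead of being read with a real CPUID instruction.

```c
#include <assert.h>
#include <stdint.h>

/* One CPUID.(EAX=0xB, ECX=n) result, in the register order the log prints. */
struct leaf_0b {
    uint32_t eax, ebx, ecx, edx;
};

#define LEAF_0B_SHIFT(r)      ((r).eax & 0x1fu)         /* EAX[4:0] */
#define LEAF_0B_LEVEL_TYPE(r) (((r).ecx >> 8) & 0xffu)  /* ECX[15:8]: 0 invalid, 1 SMT, 2 Core */

/*
 * Walk the subleaves in enumeration order; the shift of the last valid
 * SMT/Core level separates the package field from the lower fields.
 * EDX of any subleaf holds the x2APIC ID of the executing CPU.
 */
static uint32_t package_id(const struct leaf_0b *levels, unsigned int n)
{
    unsigned int i;
    uint32_t shift = 0;

    assert(n > 0);
    for ( i = 0; i < n; i++ )
    {
        uint32_t type = LEAF_0B_LEVEL_TYPE(levels[i]);

        if ( type == 0 )
            break;              /* end of the topology enumeration */
        if ( type == 1 || type == 2 )
            shift = LEAF_0B_SHIFT(levels[i]);
    }
    return levels[0].edx >> shift;
}
```

Fed the two subleaves logged for CPU 3 (`eax=0x0 ... ecx=0x100 edx=0x6` and `eax=0x0 ... ecx=0x201 edx=0x6`), this yields package 6, since both shifts are zero. With a core-level shift of 1, as Ed suggests, the same APIC ID maps to package 3, and APIC IDs 0, 2, 4, 6 would map to packages 0 through 3.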
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 23.11.15 at 17:36, wrote:
> I instrumented detect_extended_topology() and ran again with 4 CPUs.
> [...]
> (XEN) smp_store_cpu_info id=3
> (XEN) detect_extended_topology cpuid_count op=0xb count=0 eax=0x0 ebx=0x1
> ecx=0x100 edx=0x6
> (XEN) detect_extended_topology initial_apicid=6 core_plus_mask_width=0
> core_level_siblings=1
> (XEN) detect_extended_topology cpuid_count op=0xb count=1 eax=0x0 ebx=0x1
> ecx=0x201 edx=0x6
> (XEN) detect_extended_topology ht_mask_width=0 core_plus_mask_width=0
> core_select_mask=0x0 core_level_siblings=1
> [...]
> If cpuid 0xb returned 1 rather than 0 in eax[4:0], we would get
> consecutively-numbered physical processor IDs.
>
> But the only requirement I see in the IA SDM (vol 2A, table 3-17) is that
> the eax[4:0] value yield unique IDs, not necessarily consecutive. Likewise
> while the examples in vol 3A sec 8.9 show physical IDs numbered
> consecutively, the algorithms do not assume this is the case.

Indeed, and I think I had said so. The algorithm does, however, tell us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both shifts being zero), which for the whole system results in sockets 1, 3, and 5 unused. While not explicitly excluded, I'm not sure how far we should go in expecting all kinds of odd configurations (along those lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS / MAX_LOCAL_APIC, which for big systems is 4 times the number of CPUs we support).

Taking it to set_nr_sockets(), a pretty basic assumption is broken by the above way of presenting topology: We would have to have more sockets than there are CPUs. I would have wanted to check what e.g. Linux does here, but there doesn't seem to be any support for CAT (and hence any need for per-socket data) there.

(I am, btw, now also confused by you saying that e.g. for a 3-CPU config things work. If the topology data gets presented in similar ways in that case, I can't see why you wouldn't run into the same problem. Unless memory corruption occurs silently in one case, but "loudly" in the other.)

Bottom line - for the moment I do not see a reasonable way of dealing with that situation. The closest I could see would be what we iirc had temporarily during the review cycles of the initial CAT series: A command line option to specify the number of sockets. Or make all accesses to socket_cpumask[] conditional upon PSR being enabled (which would have the bad side effect of making future uses for other purposes more cumbersome), or go through and range check the socket number on all of those accesses.

Chao, could you - inside Intel - please check whether there are any assumptions on the respective CPUID leaf output that aren't explicitly stated in the SDM right now (like resulting in contiguous socket numbers), and ask for them to be made explicit (if there are any), or for it to be made explicit that no assumptions at all are to be made on the presented values (in which case we'd have to consume MADT parsing data in set_nr_sockets(), e.g. by replacing num_processors there with one more than the maximum APIC ID of any non-disabled CPU)?

Jan
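Jan's analysis of the pre-patch sizing can be reproduced numerically. The sketch below restates the old `set_nr_sockets()` arithmetic (count the present APIC IDs below `x86_max_cores * x86_num_siblings`, i.e. in socket 0's ID range, then divide the CPU count by that) for a single-word present map, and adds a range check of the kind Jan lists as one mitigation. The helper names are hypothetical and this is not Xen code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Count set bits among the low `nbits` bits of a one-word present map. */
static unsigned int weight_low_bits(unsigned long map, unsigned int nbits)
{
    unsigned int i, w = 0;

    for ( i = 0; i < nbits && i < sizeof(map) * CHAR_BIT; i++ )
        w += (map >> i) & 1UL;
    return w;
}

/* Pre-patch set_nr_sockets() arithmetic, restated for a single-word map. */
static unsigned int old_nr_sockets(unsigned long present,
                                   unsigned int max_cores,
                                   unsigned int num_siblings,
                                   unsigned int num_processors,
                                   unsigned int disabled_cpus)
{
    /* CPUs whose APIC ID falls in socket 0's range. */
    unsigned int cpus = weight_low_bits(present, max_cores * num_siblings);

    if ( cpus == 0 )
        cpus = 1;
    /* DIV_ROUND_UP(num_processors + disabled_cpus, cpus) */
    return (num_processors + disabled_cpus + cpus - 1) / cpus;
}

/* One mitigation from the thread: range-check a socket before indexing. */
static bool socket_index_ok(unsigned int socket, unsigned int nr_sockets)
{
    return socket < nr_sockets;
}
```

For APIC IDs 0, 2, 4, 6 (present map 0x55) with one core and one thread per socket, only ID 0 lies in socket 0's range, so `cpus` is 1 and `nr_sockets` becomes 4, while CPU 3 reports socket 6. An unchecked `socket_cpumask[6]` would therefore index past a 4-entry array; the range check rejects it instead.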
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Tue, Nov 24, 2015 at 2:34 AM, Jan Beulichwrote: > Indeed, and I think I had said so. The algorithm does, however, tell > us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both > shifts being zero), which for the whole system results in sockets 1, > 3, and 5 unused. While not explicitly excluded, I'm not sure how far > we should go in expecting all kinds of odd configurations (along those > lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS / > MAX_LOCAL_APIC, which for big systems is 4 times the number of > CPUs we support). That's why I thought it reasonable to substitute MAX_APICS for nr_sockets in sizing the socket_cpumask array. > Taking it to set_nr_sockets(), a pretty basic assumption is broken by > the above way of presenting topology: We would have to have more > sockets than there are CPUs. I would have wanted to check what > e.g. Linux does here, but there doesn't seem to be any support of > CAT (and hence any need for per-socket data) there. I looked at Linux, and there is no per-socket bookkeeping, AFAICT. > (I am, btw, now also confused by you saying that e.g. for a 3-CPU > config things work. If the topology data gets presented in similar > ways in that case, I can't see why you wouldn't run into the same > problem. Unless memory corruption occurs silently in one case, but > "loudly" in the other.) For 3, 6 and 12 CPUs, Fusion presents a completely different topology, with 3-core sockets numbered consecutively starting with 0. > Bottom line - for the moment I do not see a reasonable way of > dealing with that situation. The closest I could see would be what > we iirc had temporarily during the review cycles of the initial CAT > series: A command line option to specify the number of sockets. 
Or > make all accesses to socket_cpumask[] conditional upon PSR being > enabled (which would have the bad side effect of making future > uses for other purposes more cumbersome), or go through and > range check the socket number on all of those accesses. Could we avoid the issue by replacing socket_cpumask array with a list or hashtable, indexed by socket ID? > Chao, could you - inside Intel - please check whether there are > any assumptions on the respective CPUID leaf output that aren't > explicitly stated in the SDM right now (like resulting in contiguous > socket numbers), and ask for them getting made explicit (if there > are any), or it being made explicit that no assumptions at all are > to be made at all on the presented values (in which case we'd > have to consume MADT parsing data in set_nr_sockets(), e.g. > by replacing num_processors there with one more than the > maximum APIC ID of any non-disabled CPU)? I suppose the key is whether Intel has encoded such assumptions in the BIOS reference code, or has otherwise communicated them to AMI et al. --Ed
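A list keyed by socket ID, as suggested above, might look like the code below. It is only a sketch under stated assumptions. struct socket_node, socket_lookup() and socket_add_cpu() are hypothetical, and a uint64_t bitmap stands in for cpumask_t, which limits it to CPUs 0 to 63.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node: one CPU bitmap per socket ID actually seen. */
struct socket_node {
    unsigned int socket;
    uint64_t cpus;            /* bit n set => CPU n is in this socket */
    struct socket_node *next;
};

/* Return the node for @socket, creating it on first use; NULL on OOM. */
static struct socket_node *socket_lookup(struct socket_node **head,
                                         unsigned int socket)
{
    struct socket_node *n;

    for ( n = *head; n; n = n->next )
        if ( n->socket == socket )
            return n;

    n = calloc(1, sizeof(*n));
    if ( !n )
        return NULL;
    n->socket = socket;
    n->next = *head;
    *head = n;
    return n;
}

/* Record @cpu as belonging to @socket; returns 0 on success, -1 on error. */
static int socket_add_cpu(struct socket_node **head, unsigned int socket,
                          unsigned int cpu)
{
    struct socket_node *n;

    if ( cpu >= 64 )          /* outside what the uint64_t bitmap can hold */
        return -1;
    n = socket_lookup(head, socket);
    if ( !n )
        return -1;
    n->cpus |= UINT64_C(1) << cpu;
    return 0;
}
```

With this layout, sparse socket IDs such as 0, 2, 4 and 6 cost one node each, and there is no fixed-size array to overrun. The price is a linear search on every lookup.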
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 24.11.15 at 15:13,wrote: > On Tue, Nov 24, 2015 at 2:34 AM, Jan Beulich wrote: >> Bottom line - for the moment I do not see a reasonable way of >> dealing with that situation. The closest I could see would be what >> we iirc had temporarily during the review cycles of the initial CAT >> series: A command line option to specify the number of sockets. Or >> make all accesses to socket_cpumask[] conditional upon PSR being >> enabled (which would have the bad side effect of making future >> uses for other purposes more cumbersome), or go through and >> range check the socket number on all of those accesses. > > Could we avoid the issue by replacing socket_cpumask array with a list > or hashtable, indexed by socket ID? Yes, a radix tree would work. But it would also seem like overkill if all we need it for is some strange virtualization of CPUID. The more I think about it, the better I like the option below. Jan >> Chao, could you - inside Intel - please check whether there are >> any assumptions on the respective CPUID leaf output that aren't >> explicitly stated in the SDM right now (like resulting in contiguous >> socket numbers), and ask for them getting made explicit (if there >> are any), or it being made explicit that no assumptions at all are >> to be made at all on the presented values (in which case we'd >> have to consume MADT parsing data in set_nr_sockets(), e.g. >> by replacing num_processors there with one more than the >> maximum APIC ID of any non-disabled CPU)? > > I suppose the key is whether Intel has encoded such assumptions in the > BIOS reference code, or has otherwise communicated them to AMI et al. > > --Ed
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 21.11.15 at 02:21,wrote: > The problem is that the index of the socket_cpumask array is derived via > cpu_to_socket() from the APIC ID of the processor in a given socket, but > the size of the array is computed based on nr_sockets, which is not > necessarily equal to the maximum APIC ID. > > Sizing the socket_cpumask to MAX_APICS rather than nr_sockets seems safer, > though a bit wasteful. I verified that this change fixes the boot crash > with 4 or 8 CPUs on VMware Fusion. But that raises the question of the sanity of the CPUID output presented to Xen: with > (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) > (XEN) Processor #0 6:6 APIC version 21 > (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled) > (XEN) Processor #2 6:6 APIC version 21 > (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled) > (XEN) Processor #4 6:6 APIC version 21 > (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled) > (XEN) Processor #6 6:6 APIC version 21 and taking the output you added, I can only suspect that the value used for determining the socket shift (CPUID leaf 0xb) is unexpected. Could you supply the observed values? (See detect_extended_topology() and set_nr_sockets().) As you can see, the core IDs ("CPU: Physical Processor ID: ...") aren't sequential, although we expect them to be (with holes left only where non-power-of-2 values need to be taken care of). Jan
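Consider how a non-power-of-2 core count affects the expected holes. Assume, as the instrumented output in this thread suggests, that the socket number is the APIC ID shifted right by the width reserved for cores and threads. Under that assumption, a non-power-of-2 core count leaves gaps in the APIC IDs but not in the socket numbers. mask_width() and apic_to_socket() are hypothetical helpers, not the Xen functions.

```c
/* Smallest w with (1u << w) >= n: APIC ID bits reserved per socket. */
static unsigned int mask_width(unsigned int n)
{
    unsigned int w = 0;

    while ( (1u << w) < n )
        w++;
    return w;
}

/* Socket number: the APIC ID with the core/thread bits shifted out. */
static unsigned int apic_to_socket(unsigned int apic_id, unsigned int width)
{
    return apic_id >> width;
}
```

With 2 sockets of 3 cores, mask_width(3) is 2, so APIC IDs 0, 1, 2, 4, 5, 6 map to sockets 0, 0, 0, 1, 1, 1. IDs 3 and 7 are unused, but the sockets stay contiguous. With a width of 0, as on Fusion with 4 CPUs, APIC IDs 0, 2, 4, 6 map directly to sockets 0, 2, 4, 6.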
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
I instrumented detect_extended_topology() and ran again with 4 CPUs. Loading xen-4.6-amd64.gz... ok Loading vmlinuz-3.14.51-grsec-dock... ok Loading initrd.img-3.14.51-grsec-dock... ok (XEN) Xen version 4.6.1-pre (Debian 4.6.1~pre-1skyport1) ( eswi...@skyportsystems.com) (gcc (Debian 4.9.3-4) 4.9.3) debug=y Mon Nov 23 07:18:36 PST 2015 (XEN) Bootloader: SYSLINUX 4.05 20140113 (XEN) Command line: console=com1,vga com1=115200 no-bootscrub dom0_mem=2048M,max:2048M loglvl=all cpuinfo=1 apic_verbosity=debug (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009f800 (usable) (XEN) 0009f800 - 000a (reserved) (XEN) 000dc000 - 0010 (reserved) (XEN) 0010 - bfef (usable) (XEN) bfef - bfeff000 (ACPI data) (XEN) bfeff000 - bff0 (ACPI NVS) (XEN) bff0 - c000 (usable) (XEN) f000 - f800 (reserved) (XEN) fec0 - fec1 (reserved) (XEN) fee0 - fee01000 (reserved) (XEN) fffe - 0001 (reserved) (XEN) 0001 - 0001c000 (usable) (XEN) ACPI: RSDP 000F6A10, 0024 (r2 PTLTD ) (XEN) ACPI: XSDT BFEF030B, 0054 (r1 INTEL 440BX 604 VMW 1324272) (XEN) ACPI: FACP BFEFEE73, 00F4 (r4 INTEL 440BX 604 PTL F4240) (XEN) ACPI: DSDT BFEF05B1, E8C2 (r1 PTLTD Custom604 MSFT 301) (XEN) ACPI: FACS BFEFFFC0, 0040 (XEN) ACPI: BOOT BFEF0589, 0028 (r1 PTLTD $SBFTBL$ 604 LTP1) (XEN) ACPI: APIC BFEF050F, 007A (r1 PTLTD APIC604 LTP0) (XEN) ACPI: MCFG BFEF04D3, 003C (r1 PTLTD $PCITBL$ 604 LTP1) (XEN) ACPI: SRAT BFEF03C3, 0110 (r2 VMWARE MEMPLUG 604 VMW 1) (XEN) ACPI: WAET BFEF039B, 0028 (r1 VMWARE VMW WAET 604 VMW 1) (XEN) System RAM: 6143MB (6291004kB) (XEN) SRAT: PXM 0 -> APIC 00 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 02 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 04 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 06 -> Node 0 (XEN) SRAT: Node 0 PXM 0 0-a (XEN) SRAT: Node 0 PXM 0 10-1000 (XEN) SRAT: Node 0 PXM 0 1000-c000 (XEN) SRAT: Node 0 PXM 0 1-1c000 (XEN) NUMA: Allocated memnodemap from 1bd8f8000 - 
1bd8fa000 (XEN) NUMA: Using 8 for the hash shift. (XEN) Domain heap initialised (XEN) found SMP MP-table at 000f6a80 (XEN) DMI present. (XEN) APIC boot state is 'xapic' (XEN) Using APIC driver default (XEN) ACPI: PM-Timer IO Port: 0x1008 (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1004,1:0], pm1x_evt[1:1000,1:0] (XEN) ACPI: wakeup_vec[bfefffcc], vec_size[20] (XEN) ACPI: Local APIC address 0xfee0 (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) (XEN) Processor #0 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled) (XEN) Processor #2 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled) (XEN) Processor #4 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled) (XEN) Processor #6 6:6 APIC version 21 (XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) (XEN) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) (XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23 (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) (XEN) ACPI: IRQ0 used by override. (XEN) ACPI: IRQ2 used by override. (XEN) Enabling APIC mode: Flat. Using 1 I/O APICs (XEN) ERST table was not found (XEN) Using ACPI (MADT) for SMP configuration information (XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs) (XEN) mapped APIC to 82cfffdfb000 (fee0) (XEN) mapped IOAPIC to 82cfffdfa000 (fec0) (XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X (XEN) Not enabling x2APIC: depends on iommu_supports_eim. 
(XEN) detect_extended_topology cpuid_count op=0xb count=0 eax=0x0 ebx=0x1 ecx=0x100 edx=0x0 (XEN) detect_extended_topology initial_apicid=0 core_plus_mask_width=0 core_level_siblings=1 (XEN) detect_extended_topology cpuid_count op=0xb count=1 eax=0x0 ebx=0x1 ecx=0x201 edx=0x0 (XEN) detect_extended_topology ht_mask_width=0 core_plus_mask_width=0 core_select_mask=0x0 core_level_siblings=1 (XEN) CPU: Physical Processor ID: 0 (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K (XEN) CPU: L2 cache: 256K (XEN) CPU: L3 cache: 6144K (XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7 (XEN) CPU0: No MCE banks present. Machine check support disabled (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Initializing CPU#0 (XEN) Detected 2592.632 MHz processor. (XEN) Initing memory sharing. (XEN) alt table 82d0802e7f90 -> 82d0802e9244 (XEN) PCI: MCFG configuration 0: base f000 segment
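The two sub-leaf values logged above can be decoded using the CPUID leaf 0xB register layout described in the Intel SDM:

- EAX[4:0] is the shift width.
- EBX[15:0] is the logical processor count at this level.
- ECX[7:0] is the level number.
- ECX[15:8] is the level type.

struct topo_level and decode_leaf_b() are hypothetical names for this sketch.

```c
#include <stdint.h>

/* Fields of one CPUID leaf 0xB sub-leaf, following the SDM layout. */
struct topo_level {
    unsigned int shift;      /* EAX[4:0]: APIC ID bits below the next level */
    unsigned int nr_logical; /* EBX[15:0]: logical processors at this level */
    unsigned int level;      /* ECX[7:0]: sub-leaf number echoed back */
    unsigned int type;       /* ECX[15:8]: 0 = invalid, 1 = SMT, 2 = Core */
};

static struct topo_level decode_leaf_b(uint32_t eax, uint32_t ebx,
                                       uint32_t ecx)
{
    struct topo_level t = {
        .shift = eax & 0x1f,
        .nr_logical = ebx & 0xffff,
        .level = ecx & 0xff,
        .type = (ecx >> 8) & 0xff,
    };

    return t;
}
```

decode_leaf_b(0x0, 0x1, 0x100) gives an SMT level, and decode_leaf_b(0x0, 0x1, 0x201) gives a Core level. Each has a shift of 0 and one logical processor. A Core-level shift of 0 means no APIC ID bits are reserved below the package level. This is consistent with the logged core_plus_mask_width=0: each APIC ID (0, 2, 4, 6) is used unchanged as the socket number.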
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Mon, Nov 23, 2015 at 09:10:08AM +0800, Chao Peng wrote: > On Fri, Nov 20, 2015 at 05:21:11PM -0800, Ed Swierk wrote: > > The problem is that the index of the socket_cpumask array is derived via > > cpu_to_socket() from the APIC ID of the processor in a given socket, but > > the size of the array is computed based on nr_sockets, which is not > > necessarily equal to the maximum APIC ID. > > > > Sizing the socket_cpumask to MAX_APICS rather than nr_sockets seems safer, > > though a bit wasteful. I verified that this change fixes the boot crash > > with 4 or 8 CPUs on VMware Fusion. > > > > --- a/xen/arch/x86/smpboot.c > > +++ b/xen/arch/x86/smpboot.c > > @@ -819,7 +819,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) > > > > set_nr_sockets(); > > > > -socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets); > > +socket_cpumask = xzalloc_array(cpumask_t *, MAX_APICS); > > Just replacing nr_sockets with MAX_APICS can not really solve problem. > socket_cpumask should always be synchronized with nr_sockets, otherwise > at least some function will be missing, if not cause panic in another > place. > > If possible, I'd suggest you can debug set_nr_sockets(), especially you > can inspect the following two values for panic case: > boot_cpu_data.x86_max_cores > boot_cpu_data.x86_num_siblings After carefully checking the log, it looks like nr_sockets is computed correctly in your case; it is phys_proc_id that is wrong. Again, this could be caused by bad CPUID information, so you need to debug the CPU detection code that sets phys_proc_id. Thanks, Chao
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Fri, Nov 20, 2015 at 05:21:11PM -0800, Ed Swierk wrote: > The problem is that the index of the socket_cpumask array is derived via > cpu_to_socket() from the APIC ID of the processor in a given socket, but > the size of the array is computed based on nr_sockets, which is not > necessarily equal to the maximum APIC ID. > > Sizing the socket_cpumask to MAX_APICS rather than nr_sockets seems safer, > though a bit wasteful. I verified that this change fixes the boot crash > with 4 or 8 CPUs on VMware Fusion. > > --- a/xen/arch/x86/smpboot.c > +++ b/xen/arch/x86/smpboot.c > @@ -819,7 +819,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) > > set_nr_sockets(); > > -socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets); > +socket_cpumask = xzalloc_array(cpumask_t *, MAX_APICS); Just replacing nr_sockets with MAX_APICS cannot really solve the problem. socket_cpumask should always be kept in sync with nr_sockets; otherwise some functionality will at least be missing, if it does not cause a panic elsewhere. If possible, I'd suggest debugging set_nr_sockets(); in particular, you can inspect the following two values in the panic case: boot_cpu_data.x86_max_cores boot_cpu_data.x86_num_siblings Thanks, Chao
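The two values asked for above matter because nr_sockets depends on how many CPUs a socket is taken to hold. The sketch below is a simplified model. It assumes nr_sockets is the processor count divided by x86_max_cores * x86_num_siblings, rounded up. model_nr_sockets() is hypothetical and is not the actual body of set_nr_sockets().

```c
/* Simplified model: sockets = ceil(CPUs / CPUs-per-socket). */
static unsigned int model_nr_sockets(unsigned int num_processors,
                                     unsigned int max_cores,
                                     unsigned int num_siblings)
{
    unsigned int per_socket = max_cores * num_siblings;

    if ( per_socket == 0 )   /* treat missing topology data as 1 CPU/socket */
        per_socket = 1;

    return (num_processors + per_socket - 1) / per_socket;
}
```

With 4 CPUs and both values equal to 1, the model gives 4. This matches the nr_sockets=4 reported elsewhere in this thread for 4 CPUs. The result depends only on how many CPUs there are, not on their APIC IDs, which is why sparse IDs can still overrun an array of that size.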
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 20.11.15 at 02:22,wrote: > (XEN) [ Xen-4.6.1-pre x86_64 debug=n Not tainted ] > (XEN) CPU:3 > (XEN) RIP:e008:[] set_cpu_sibling_map+0x3f/0x330 > (XEN) RFLAGS: 00010006 CONTEXT: hypervisor > (XEN) rax: 0001 rbx: rcx: 00313d5b4080 > (XEN) rdx: 0006 rsi: rdi: 0003 > (XEN) rbp: 0300 rsp: 8301bd87fe90 r8: 8301bd878000 > (XEN) r9: 00313d5b4080 r10: 0001 r11: 0001 > (XEN) r12: 82d0802fd500 r13: r14: > (XEN) r15: 0003 cr0: 8005003b cr4: 001526a0 > (XEN) cr3: bfc75000 cr2: 0001 > (XEN) ds: es: fs: gs: ss: cs: e008 > (XEN) Xen stack trace from rsp=8301bd87fe90: > (XEN)0003802fd800 0018 0100 > (XEN)82d0802fd800 00c8 0003 > (XEN) 82d0801834dc > (XEN) 0001 > (XEN) > (XEN) > (XEN) > (XEN) > (XEN) > (XEN) > (XEN) 0003 8300bfafc000 > (XEN)00313d5b4080 > (XEN) Xen call trace: > (XEN)[] set_cpu_sibling_map+0x3f/0x330 > (XEN)[] start_secondary+0x1bc/0x250 > (XEN) > (XEN) Pagetable walk from 0001: > (XEN) L4[0x000] = 0001bd8f0063 > (XEN) L3[0x000] = 0001bd8ef063 > (XEN) L2[0x000] = 0001bd8ee063 > (XEN) L1[0x000] = > (XEN) > (XEN) > (XEN) Panic on CPU 3: > (XEN) FATAL PAGE FAULT > (XEN) [error_code=0002] > (XEN) Faulting linear address: 0001 > (XEN) > (XEN) > (XEN) Reboot in five seconds... > > set_cpu_sibling_map+0x3f is the second cpumask_set_cpu() call in > set_cpu_sibling_map(): > http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/smpboot.c;h=0 > 94699286f4f6962942024ec8b2b24c7b7996cc0;hb=78833c04250416f1870c458309d3ac0e5c > f915fd#l261 I suppose cpu_to_socket(cpu) returns a value for which the socket_cpumask[] entry didn't get set up yet. But to prove that, we'd need to see the disassembly around the code location above, to be able to associate register values with variables. If that's the case, then I'd further guess that the CPUID information provided by Fusion isn't exactly as one would expect on real hardware. Whether we need to fix something, or can work around a quirk of theirs depends on the exact nature of the issue. 
Instrumenting the code that populates socket_cpumask[] would be a good first step. Jan
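If that hypothesis holds, cpumask_set_cpu() is handed a pointer read from an unallocated or out-of-range socket_cpumask[] slot. A guarded lookup makes that case detectable instead of faulting. This is a sketch only: mask_t stands in for cpumask_t, and socket_mask_get() is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for cpumask_t; the real type is a fixed-size bitmap. */
typedef struct { unsigned long bits; } mask_t;

/*
 * Hypothetical guarded lookup: return NULL rather than reading past the
 * end of the array; callers must also handle a NULL (never set up) entry.
 */
static mask_t *socket_mask_get(mask_t **masks, unsigned int nr,
                               unsigned int socket)
{
    if ( socket >= nr )
        return NULL;          /* index beyond the allocated array */
    return masks[socket];     /* may still be NULL if not yet allocated */
}
```

A caller can then report the offending socket number (for example 4 or 6 with nr_sockets=4) instead of writing through a bad pointer.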
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On 20/11/15 10:25, Jan Beulich wrote: On 20.11.15 at 02:22,wrote: >> (XEN) [ Xen-4.6.1-pre x86_64 debug=n Not tainted ] >> (XEN) CPU:3 >> (XEN) RIP:e008:[] set_cpu_sibling_map+0x3f/0x330 >> (XEN) RFLAGS: 00010006 CONTEXT: hypervisor >> (XEN) rax: 0001 rbx: rcx: 00313d5b4080 >> (XEN) rdx: 0006 rsi: rdi: 0003 >> (XEN) rbp: 0300 rsp: 8301bd87fe90 r8: 8301bd878000 >> (XEN) r9: 00313d5b4080 r10: 0001 r11: 0001 >> (XEN) r12: 82d0802fd500 r13: r14: >> (XEN) r15: 0003 cr0: 8005003b cr4: 001526a0 >> (XEN) cr3: bfc75000 cr2: 0001 >> (XEN) ds: es: fs: gs: ss: cs: e008 >> (XEN) Xen stack trace from rsp=8301bd87fe90: >> (XEN)0003802fd800 0018 0100 >> (XEN)82d0802fd800 00c8 0003 >> (XEN) 82d0801834dc >> (XEN) 0001 >> (XEN) >> (XEN) >> (XEN) >> (XEN) >> (XEN) >> (XEN) >> (XEN) 0003 8300bfafc000 >> (XEN)00313d5b4080 >> (XEN) Xen call trace: >> (XEN)[] set_cpu_sibling_map+0x3f/0x330 >> (XEN)[] start_secondary+0x1bc/0x250 >> (XEN) >> (XEN) Pagetable walk from 0001: >> (XEN) L4[0x000] = 0001bd8f0063 >> (XEN) L3[0x000] = 0001bd8ef063 >> (XEN) L2[0x000] = 0001bd8ee063 >> (XEN) L1[0x000] = >> (XEN) >> (XEN) >> (XEN) Panic on CPU 3: >> (XEN) FATAL PAGE FAULT >> (XEN) [error_code=0002] >> (XEN) Faulting linear address: 0001 >> (XEN) >> (XEN) >> (XEN) Reboot in five seconds... >> >> set_cpu_sibling_map+0x3f is the second cpumask_set_cpu() call in >> set_cpu_sibling_map(): >> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/smpboot.c;h=0 >> 94699286f4f6962942024ec8b2b24c7b7996cc0;hb=78833c04250416f1870c458309d3ac0e5c >> f915fd#l261 > I suppose cpu_to_socket(cpu) returns a value for which the > socket_cpumask[] entry didn't get set up yet. But to prove that, > we'd need to see the disassembly around the code location > above, to be able to associate register values with variables. > > If that's the case, then I'd further guess that the CPUID > information provided by Fusion isn't exactly as one would expect > on real hardware. 
Whether we need to fix something, or can > work around a quirk of theirs depends on the exact nature of > the issue. Instrumenting code populating socket_cpumask[] > would be a good first step. It might also be interesting to see the logs with "apic_verbosity=debug" and a debug hypervisor. ~Andrew
Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
I instrumented set_nr_sockets() and smp_store_cpu_info(), and re-ran with varying numbers of CPUs. With 4 CPUs, nr_sockets=4, so smp_store_cpu_info() exceeds the bounds of the socket_cpumask array when socket=4 or 6. Loading xen-4.6-amd64.gz... ok Loading vmlinuz-3.14.51-grsec-dock... ok Loading initrd.img-3.14.51-grsec-dock... ok (XEN) Xen version 4.6.1-pre (Debian 4.6.1~pre-1skyport1) ( eswi...@skyportsystems.com) (gcc (Debian 4.9.3-4) 4.9.3) debug=y Fri Nov 20 10:07:47 PST 2015 (XEN) Bootloader: SYSLINUX 4.05 20140113 (XEN) Command line: console=com1,vga com1=115200 no-bootscrub dom0_mem=2048M,max:2048M loglvl=all cpuinfo=1 apic_verbosity=debug (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009f800 (usable) (XEN) 0009f800 - 000a (reserved) (XEN) 000dc000 - 0010 (reserved) (XEN) 0010 - bfef (usable) (XEN) bfef - bfeff000 (ACPI data) (XEN) bfeff000 - bff0 (ACPI NVS) (XEN) bff0 - c000 (usable) (XEN) f000 - f800 (reserved) (XEN) fec0 - fec1 (reserved) (XEN) fee0 - fee01000 (reserved) (XEN) fffe - 0001 (reserved) (XEN) 0001 - 0001c000 (usable) (XEN) ACPI: RSDP 000F6A10, 0024 (r2 PTLTD ) (XEN) ACPI: XSDT BFEF030B, 0054 (r1 INTEL 440BX 604 VMW 1324272) (XEN) ACPI: FACP BFEFEE73, 00F4 (r4 INTEL 440BX 604 PTL F4240) (XEN) ACPI: DSDT BFEF05B1, E8C2 (r1 PTLTD Custom604 MSFT 301) (XEN) ACPI: FACS BFEFFFC0, 0040 (XEN) ACPI: BOOT BFEF0589, 0028 (r1 PTLTD $SBFTBL$ 604 LTP1) (XEN) ACPI: APIC BFEF050F, 007A (r1 PTLTD APIC604 LTP0) (XEN) ACPI: MCFG BFEF04D3, 003C (r1 PTLTD $PCITBL$ 604 LTP1) (XEN) ACPI: SRAT BFEF03C3, 0110 (r2 VMWARE MEMPLUG 604 VMW 1) (XEN) ACPI: WAET BFEF039B, 0028 (r1 VMWARE VMW WAET 604 VMW 1) (XEN) System RAM: 6143MB (6291004kB) (XEN) SRAT: PXM 0 -> APIC 00 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 02 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 04 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 06 -> Node 0 (XEN) SRAT: Node 0 PXM 0 0-a (XEN) 
SRAT: Node 0 PXM 0 10-1000 (XEN) SRAT: Node 0 PXM 0 1000-c000 (XEN) SRAT: Node 0 PXM 0 1-1c000 (XEN) NUMA: Allocated memnodemap from 1bd8f8000 - 1bd8fa000 (XEN) NUMA: Using 8 for the hash shift. (XEN) Domain heap initialised (XEN) found SMP MP-table at 000f6a80 (XEN) DMI present. (XEN) APIC boot state is 'xapic' (XEN) Using APIC driver default (XEN) ACPI: PM-Timer IO Port: 0x1008 (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1004,1:0], pm1x_evt[1:1000,1:0] (XEN) ACPI: wakeup_vec[bfefffcc], vec_size[20] (XEN) ACPI: Local APIC address 0xfee0 (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) (XEN) Processor #0 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled) (XEN) Processor #2 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled) (XEN) Processor #4 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled) (XEN) Processor #6 6:6 APIC version 21 (XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) (XEN) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) (XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23 (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) (XEN) ACPI: IRQ0 used by override. (XEN) ACPI: IRQ2 used by override. (XEN) Enabling APIC mode: Flat. Using 1 I/O APICs (XEN) ERST table was not found (XEN) Using ACPI (MADT) for SMP configuration information (XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs) (XEN) mapped APIC to 82cfffdfb000 (fee0) (XEN) mapped IOAPIC to 82cfffdfa000 (fec0) (XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X (XEN) Not enabling x2APIC: depends on iommu_supports_eim. (XEN) CPU: Physical Processor ID: 0 (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K (XEN) CPU: L2 cache: 256K (XEN) CPU: L3 cache: 6144K (XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7 (XEN) CPU0: No MCE banks present. 
Machine check support disabled (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Initializing CPU#0 (XEN) Detected 2592.620 MHz processor. (XEN) Initing memory sharing. (XEN) alt table 82d0802e6f10 -> 82d0802e81c4 (XEN) PCI: MCFG configuration 0: base f000 segment buses 00 - 7f (XEN) PCI: MCFG area at f000 reserved in E820 (XEN) PCI: Using MCFG for segment bus 00-7f (XEN) I/O virtualisation disabled (XEN) smp_store_cpu_info id=0 (XEN) CPU0: Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
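The bounds problem described above reduces to simple arithmetic. With a shift of 0, the four CPUs map to sockets 0, 2, 4 and 6, while an array sized by nr_sockets=4 only has valid indices 0 to 3. count_out_of_bounds() is a hypothetical helper that makes the check explicit.

```c
#include <stddef.h>

/* Count socket numbers that would index past an array of @nr entries. */
static size_t count_out_of_bounds(const unsigned int *sockets, size_t n,
                                  unsigned int nr)
{
    size_t bad = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( sockets[i] >= nr )
            bad++;

    return bad;
}
```

For sockets {0, 2, 4, 6} and nr=4, it returns 2 (sockets 4 and 6). The array would need at least 7 entries for the count to drop to 0.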
[Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
Xen staging-4.6 crashes when booting on VMware Fusion 8.0.2 (with VT-x/EPT enabled), with 4 virtual CPUs: Loading xen-4.6-amd64.gz... ok Loading vmlinuz-3.14.51-grsec-dock... ok Loading initrd.img-3.14.51-grsec-dock... ok (XEN) Xen version 4.6.1-pre (Debian 4.6.1~pre-1skyport1) ( eswi...@skyportsystems.com) (gcc (Debian 4.9.3-4) 4.9.3) debug=n Thu Nov 19 17:05:27 PST 2015 (XEN) Bootloader: SYSLINUX 4.05 20140113 (XEN) Command line: console=com1,vga com1=115200 no-bootscrub dom0_mem=2048M,max:2048M maxcpus=4 loglvl=all cpuinfo=1 (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009f800 (usable) (XEN) 0009f800 - 000a (reserved) (XEN) 000dc000 - 0010 (reserved) (XEN) 0010 - bfef (usable) (XEN) bfef - bfeff000 (ACPI data) (XEN) bfeff000 - bff0 (ACPI NVS) (XEN) bff0 - c000 (usable) (XEN) f000 - f800 (reserved) (XEN) fec0 - fec1 (reserved) (XEN) fee0 - fee01000 (reserved) (XEN) fffe - 0001 (reserved) (XEN) 0001 - 0001c000 (usable) (XEN) ACPI: RSDP 000F6A10, 0024 (r2 PTLTD ) (XEN) ACPI: XSDT BFEF030B, 0054 (r1 INTEL 440BX 604 VMW 1324272) (XEN) ACPI: FACP BFEFEE73, 00F4 (r4 INTEL 440BX 604 PTL F4240) (XEN) ACPI: DSDT BFEF05B1, E8C2 (r1 PTLTD Custom604 MSFT 301) (XEN) ACPI: FACS BFEFFFC0, 0040 (XEN) ACPI: BOOT BFEF0589, 0028 (r1 PTLTD $SBFTBL$ 604 LTP1) (XEN) ACPI: APIC BFEF050F, 007A (r1 PTLTD APIC604 LTP0) (XEN) ACPI: MCFG BFEF04D3, 003C (r1 PTLTD $PCITBL$ 604 LTP1) (XEN) ACPI: SRAT BFEF03C3, 0110 (r2 VMWARE MEMPLUG 604 VMW 1) (XEN) ACPI: WAET BFEF039B, 0028 (r1 VMWARE VMW WAET 604 VMW 1) (XEN) System RAM: 6143MB (6291004kB) (XEN) SRAT: PXM 0 -> APIC 00 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 02 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 04 -> Node 0 (XEN) SRAT: PXM 0 -> APIC 06 -> Node 0 (XEN) SRAT: Node 0 PXM 0 0-a (XEN) SRAT: Node 0 PXM 0 10-1000 (XEN) SRAT: Node 0 PXM 0 1000-c000 (XEN) SRAT: Node 0 PXM 0 1-1c000 (XEN) NUMA: Allocated 
memnodemap from 1bd8fc000 - 1bd8fe000 (XEN) NUMA: Using 8 for the hash shift. (XEN) Domain heap initialised (XEN) found SMP MP-table at 000f6a80 (XEN) DMI present. (XEN) Using APIC driver default (XEN) ACPI: PM-Timer IO Port: 0x1008 (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1004,1:0], pm1x_evt[1:1000,1:0] (XEN) ACPI: wakeup_vec[bfefffcc], vec_size[20] (XEN) ACPI: Local APIC address 0xfee0 (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) (XEN) Processor #0 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled) (XEN) Processor #2 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled) (XEN) Processor #4 6:6 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled) (XEN) Processor #6 6:6 APIC version 21 (XEN) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) (XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) (XEN) ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0]) (XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23 (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) (XEN) ACPI: IRQ0 used by override. (XEN) ACPI: IRQ2 used by override. (XEN) Enabling APIC mode: Flat. Using 1 I/O APICs (XEN) ERST table was not found (XEN) Using ACPI (MADT) for SMP configuration information (XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs) (XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X (XEN) Not enabling x2APIC: depends on iommu_supports_eim. (XEN) CPU: Physical Processor ID: 0 (XEN) CPU: L1 I cache: 32K, L1 D cache: 32K (XEN) CPU: L2 cache: 256K (XEN) CPU: L3 cache: 6144K (XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7 (XEN) CPU0: No MCE banks present. Machine check support disabled (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Initializing CPU#0 (XEN) Detected 2592.811 MHz processor. (XEN) Initing memory sharing. 
(XEN) alt table 82d0802be010 -> 82d0802bf384 (XEN) PCI: MCFG configuration 0: base f000 segment buses 00 - 7f (XEN) PCI: MCFG area at f000 reserved in E820 (XEN) PCI: Using MCFG for segment bus 00-7f (XEN) I/O virtualisation disabled (XEN) CPU0: Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz stepping 01 (XEN) ENABLING IO-APIC IRQs (XEN) -> Using new ACK method (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1 (XEN) Platform timer is 3.579MHz ACPI PM Timer (XEN) Allocated console ring of 32 KiB. (XEN) mwait-idle: MWAIT substates: 0x10 (XEN) mwait-idle: v0.4 model 0x46