Re: [Xen-devel] [PATCH V2] acpi: enlarge NUM_FIXMAP_ACPI_PAGES to support larger scale boards

2017-05-15 Thread Zhangbo (Oscar)
>
>>
>> Thus, we make NUM_FIXMAP_ACPI_PAGES much larger: 64 pages (256KB).
>> This is calculated from the theoretical largest CPU count on the main
>> Linux distros, about 8092, and a memory slot count within 1000:
>> 24B*8092 + 40B*1000 = 234208B. Meanwhile, because the IOREMAP_VIRT_*
>> region is 16GB, extending it to 256KB is safe enough.
>>
>> Of course, there's much more work to do to support large-scale boards
>> with that many (8092) CPUs and 1000 memory slots. This just makes life
>> easier for boards with several hundred CPUs and several TBs of memory.
>>
>> Signed-off-by: Zhang Bo 
>
>Much better, but how did you arrive at 8092? Did you mean 8192
>(2**13)? Also I don't think the table entry fields should be listed,
>stating their size is good enough (if anyone cares to check the
>sizes are correct, (s)he'd need to go look at the spec or some
>header anyway). I'd be fine with adjusting the commit message
>accordingly while committing. With such adjustments
>

Thanks, Jan!
Yes, that's 8192 :) 
The sites I referenced are:
https://www.suse.com/products/server/technical-information/#Kernel (8192)
https://access.redhat.com/articles/rhel-limits (5120)

>Reviewed-by: Jan Beulich 
>
>You didn't Cc Julien, so I assume (and agree) that this is rather
>meant for post-4.9. Whether to backport we can decide later.
>
>Jan




[Xen-devel] [PATCH V2] acpi: enlarge NUM_FIXMAP_ACPI_PAGES to support larger scale boards

2017-05-15 Thread Zhangbo (Oscar)
In acpi_tb_verify_table()->__acpi_map_table(), it is assumed that no
ACPI table exceeds 4 pages; the tables include SRAT/APIC/ERST etc.
Please note that the DSDT is not mapped through acpi_tb_verify_table(),
so we don't care about its size, although it's usually the largest of
all the ACPI tables. The biggest table we are concerned with is
therefore SRAT.
As we know, the size of the SRAT is affected by both the CPU count and
the memory slot count: each CPU costs 24B and each memory slot costs 40B.

[030h 0048   1]    Subtable Type : 02 [Processor Local x2APIC Affinity]
[031h 0049   1]   Length : 18

[032h 0050   2]Reserved1 : 
[034h 0052   4] Proximity Domain : 
[038h 0056   4]  Apic ID : 
[03Ch 0060   4]Flags (decoded below) : 0001
 Enabled : 1
[040h 0064   4] Clock Domain : 
[044h 0068   4]Reserved2 : 

[7090h 28816   1]    Subtable Type : 01 [Memory Affinity]
[7091h 28817   1]   Length : 28

[7092h 28818   4] Proximity Domain : 001A
[7096h 28822   2]Reserved1 : 
[7098h 28824   8] Base Address : 0E00
[70A0h 28832   8]   Address Length : 
[70A8h 28840   4]Reserved2 : 
[70ACh 28844   4]Flags (decoded below) : 0001
   Enabled : 1
 Hot Pluggable : 0
  Non-Volatile : 0
[70B0h 28848   8]Reserved3 : 
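
For reference, the two subtable layouts in the dump above correspond to
the following C structures (a sketch using the ACPI spec's field names;
illustrative definitions, not Xen's; the asserts confirm the 24B/40B
figures):

#include <stdint.h>

struct srat_x2apic_affinity {      /* Subtable Type 02 */
    uint8_t  type;
    uint8_t  length;               /* 0x18 = 24 */
    uint16_t reserved1;
    uint32_t proximity_domain;
    uint32_t x2apic_id;
    uint32_t flags;
    uint32_t clock_domain;
    uint32_t reserved2;
} __attribute__((packed));

struct srat_memory_affinity {      /* Subtable Type 01 */
    uint8_t  type;
    uint8_t  length;               /* 0x28 = 40 */
    uint32_t proximity_domain;
    uint16_t reserved1;
    uint64_t base_address;
    uint64_t address_length;
    uint32_t reserved2;
    uint32_t flags;
    uint64_t reserved3;
} __attribute__((packed));

_Static_assert(sizeof(struct srat_x2apic_affinity) == 24, "24B per CPU");
_Static_assert(sizeof(struct srat_memory_affinity) == 40, "40B per memory slot");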


Please note: even when the SRAT is within 4 pages, e.g. 14128B,
__acpi_map_table() maps pages starting from the table's (possibly
unaligned) address. Suppose the start address is near the end of the
first page:

   1000B   4096B    4096B    4096B    840B
   |_____|________|________|________|_____|

Although the total size fits within 4 pages, the table may in fact span
5 pages, as shown above. Thus NUM_FIXMAP_ACPI_PAGES should be much
larger nowadays. If not, Xen would wrongly think no NUMA configuration
could be found, because it could not map the SRAT.
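
To make the crossing concrete, here is a minimal standalone sketch (not
Xen's code; PAGE_SIZE and the sample numbers are taken from this
message):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Pages touched by a mapping of `size` bytes starting at physical
 * address `phys`: the offset into the first page can push the span
 * one page beyond what the size alone suggests. */
static unsigned long pages_needed(unsigned long phys, unsigned long size)
{
    unsigned long offset = phys & (PAGE_SIZE - 1);
    return (offset + size + PAGE_SIZE - 1) / PAGE_SIZE;
}

int main(void)
{
    /* A 14128B SRAT starting 1000B before the end of its first page:
     * within 4 pages by size, yet spanning 5 page mappings. */
    printf("%lu\n", pages_needed(10 * PAGE_SIZE - 1000, 14128)); /* prints 5 */
    return 0;
}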

Thus, we make NUM_FIXMAP_ACPI_PAGES much larger: 64 pages (256KB). This
is calculated from the theoretical largest CPU count on the main Linux
distros, about 8092, and a memory slot count within 1000:
24B*8092 + 40B*1000 = 234208B. Meanwhile, because the IOREMAP_VIRT_*
region is 16GB, extending it to 256KB is safe enough.
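
As a sanity check of that headroom (a sketch, not part of the patch;
using Jan's corrected 8192 rather than 8092):

#define PAGE_SIZE              4096
#define NUM_FIXMAP_ACPI_PAGES  64

/* Worst-case estimate: 8192 CPUs at 24B plus 1000 memory slots at 40B,
 * plus one extra page of slack for an unaligned table start. */
_Static_assert(24 * 8192 + 40 * 1000 + PAGE_SIZE
               <= NUM_FIXMAP_ACPI_PAGES * PAGE_SIZE,
               "64 fixmap pages cover the estimated worst case");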

Of course, there's much more work to do to support large-scale boards
with that many (8092) CPUs and 1000 memory slots. This just makes life
easier for boards with several hundred CPUs and several TBs of memory.

Signed-off-by: Zhang Bo 
---
 xen/include/xen/acpi.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 30ec0ee..9409350 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -41,9 +41,9 @@

 /*
  * Fixmap pages to reserve for ACPI boot-time tables (see asm-x86/fixmap.h or
- * asm-arm/config.h)
+ * asm-arm/config.h); 64 pages (256KB) is large enough for most cases.
  */
-#define NUM_FIXMAP_ACPI_PAGES  4
+#define NUM_FIXMAP_ACPI_PAGES  64

 #define BAD_MADT_ENTRY(entry, end) (\
         (!(entry)) || (unsigned long)(entry) + sizeof(*(entry)) > (end) ||  \



[Xen-devel] [PATCH] acpi: enlarge NUM_FIXMAP_ACPI_PAGES to support larger scale boards

2017-05-14 Thread Zhangbo (Oscar)
In acpi_tb_verify_table()->__acpi_map_table(), it is assumed that no
ACPI table exceeds 4 pages; the tables include SRAT/APIC/ERST etc.
Please note that the DSDT is not mapped through acpi_tb_verify_table(),
so we don't care about its size, although it's usually the largest of
all the ACPI tables. The biggest table we are concerned with is
therefore SRAT. From experience, its size is mostly affected by the CPU
count, costing about 100B per CPU. For example, on my BIOS board it is
30336B for 288 CPUs and 14128B for 144 CPUs (memory size is around 2TB).
Please note: even on the board with 144 CPUs, whose SRAT is 14128B and
thus within 4 pages, __acpi_map_table() maps pages starting from the
table's (possibly unaligned) address. Suppose the start address is near
the end of the first page:

   1000B   4096B    4096B    4096B    840B
   |_____|________|________|________|_____|

Although the total size fits within 4 pages, the table may in fact span
5 pages, as shown above. Thus NUM_FIXMAP_ACPI_PAGES should be much
larger nowadays. If not, Xen would wrongly think no NUMA configuration
could be found, because it could not map the SRAT.

Thus, we make NUM_FIXMAP_ACPI_PAGES much larger: 256 pages (1MB). This
is calculated from the theoretical largest CPU count on the main Linux
distros, about 8000; at 100B each that sums to 800KB. Meanwhile,
because the IOREMAP_VIRT_* region is 16GB, extending it to 1MB is safe
enough.

Of course, there's much more work to do to support large-scale boards
with that many (8000) CPUs. This just makes life easier for boards with
several hundred CPUs.

Signed-off-by: Zhang Bo 
---
 xen/include/xen/acpi.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 30ec0ee..88ea2f5 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -41,9 +41,9 @@

 /*
  * Fixmap pages to reserve for ACPI boot-time tables (see asm-x86/fixmap.h or
- * asm-arm/config.h)
+ * asm-arm/config.h); 256 pages (1MB) is large enough for most cases.
  */
-#define NUM_FIXMAP_ACPI_PAGES  4
+#define NUM_FIXMAP_ACPI_PAGES  256

 #define BAD_MADT_ENTRY(entry, end) (\
         (!(entry)) || (unsigned long)(entry) + sizeof(*(entry)) > (end) ||  \



Re: [Xen-devel] [PATCH]acpi: enlarge NUM_FIXMAP_ACPI_PAGES from 4 to 5

2017-05-11 Thread Zhangbo (Oscar)

>> --- a/xen/include/xen/acpi.h
>> +++ b/xen/include/xen/acpi.h
>> @@ -43,7 +43,7 @@
>>   * Fixmap pages to reserve for ACPI boot-time tables (see asm-x86/fixmap.h or
>>   * asm-arm/config.h)
>>   */
>> -#define NUM_FIXMAP_ACPI_PAGES  4
>> +#define NUM_FIXMAP_ACPI_PAGES  5
>
>Well, this is the kind of fix I don't really like: You make things work for
>you without thinking about others. If you found 4 pages aren't
>enough, how likely is it that soon someone will find 5 aren't enough
>either? IOW, short of eliminating the fixed upper bound altogether,
>you should at least add some slack for the foreseeable future. For
>this it may also help to estimate the theoretical upper limit of SRAT
>(and perhaps other affected tables) for systems currently around
>plus, again, some slack. As you may have concluded already, it
>would therefore also have helped if you had indicated what size of
>system you see this relatively large SRAT on.
>
>Jan

Thanks, Jan, I'm checking the proper number, and patch v2 is on its way.



Re: [Xen-devel] [PATCH]acpi: enlarge NUM_FIXMAP_ACPI_PAGES from 4 to 5

2017-05-09 Thread Zhangbo (Oscar)
Ping

>In __acpi_map_table(), it is assumed that no ACPI table exceeds 4
>pages. But nowadays ACPI tables, such as the SRAT, are large enough:
>on my BIOS board, the SRAT reaches a size of 14428B. Although that is
>above 3 pages and within 4 pages, __acpi_map_table() maps pages
>starting from the table's (possibly unaligned) address. Suppose the
>start address is near the end of the first page:
>
>   1000B   4096B    4096B    4096B    1140B
>   |_____|________|________|________|______|
>
>Although the total size fits within 4 pages, the table may in fact
>span 5 pages, thus NUM_FIXMAP_ACPI_PAGES should be at least 5
>nowadays. If not, Xen would wrongly think no NUMA configuration could
>be found, because it could not map the SRAT.
>
>diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
>index 30ec0ee..bd616a1 100644
>--- a/xen/include/xen/acpi.h
>+++ b/xen/include/xen/acpi.h
>@@ -43,7 +43,7 @@
>  * Fixmap pages to reserve for ACPI boot-time tables (see asm-x86/fixmap.h or
>  * asm-arm/config.h)
>  */
>-#define NUM_FIXMAP_ACPI_PAGES  4
>+#define NUM_FIXMAP_ACPI_PAGES  5
>
> #define BAD_MADT_ENTRY(entry, end) (\
>         (!(entry)) || (unsigned long)(entry) + sizeof(*(entry)) > (end) ||  \



[Xen-devel] [PATCH]acpi: enlarge NUM_FIXMAP_ACPI_PAGES from 4 to 5

2017-05-03 Thread Zhangbo (Oscar)
In __acpi_map_table(), it is assumed that no ACPI table exceeds 4
pages. But nowadays ACPI tables, such as the SRAT, are large enough:
on my BIOS board, the SRAT reaches a size of 14428B. Although that is
above 3 pages and within 4 pages, __acpi_map_table() maps pages
starting from the table's (possibly unaligned) address. Suppose the
start address is near the end of the first page:

   1000B   4096B    4096B    4096B    1140B
   |_____|________|________|________|______|

Although the total size fits within 4 pages, the table may in fact span
5 pages, thus NUM_FIXMAP_ACPI_PAGES should be at least 5 nowadays. If
not, Xen would wrongly think no NUMA configuration could be found,
because it could not map the SRAT.

diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 30ec0ee..bd616a1 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -43,7 +43,7 @@
  * Fixmap pages to reserve for ACPI boot-time tables (see asm-x86/fixmap.h or
  * asm-arm/config.h)
  */
-#define NUM_FIXMAP_ACPI_PAGES  4
+#define NUM_FIXMAP_ACPI_PAGES  5

 #define BAD_MADT_ENTRY(entry, end) (\
         (!(entry)) || (unsigned long)(entry) + sizeof(*(entry)) > (end) ||  \



[Xen-devel] xen: migration: guest kernel gets stuck because of too-early swappiness

2016-07-12 Thread Zhangbo (Oscar)
Hi all:
  We found that guests such as RHEL6 occasionally get stuck after
migration.
  The stack of the stuck guest kernel is as follows:
PID: 18 TASK: 88007de61500 CPU: 1 COMMAND: "xenwatch"
#0 [88007de62e40] schedule at 8150d692
#1 [88007de62f08] io_schedule at 8150de73
#2 [88007de62f28] get_request_wait at 8125e4c8
#3 [88007de62fb8] blk_queue_bio at 8125e60d
#4 [88007de63038] generic_make_request at 8125ccce
#5 [88007de63108] submit_bio at 8125d02d
#6 [88007de63158] swap_writepage at 81154374
#7 [88007de63188] pageout.clone.2 at 8113205b
#8 [88007de63238] shrink_page_list.clone.3 at 811326e5
#9 [88007de63388] shrink_inactive_list at 81133263
#10 [88007de63538] shrink_mem_cgroup_zone at 81133afe
#11 [88007de63608] shrink_zone at 81133dc3
#12 [88007de63678] do_try_to_free_pages at 81133f25
#13 [88007de63718] try_to_free_pages at 811345f2
#14 [88007de637b8] __alloc_pages_nodemask at 8112be48
#15 [88007de638f8] kmem_getpages at 811669d2
#16 [88007de63928] fallback_alloc at 811675ea
#17 [88007de639a8] cache_alloc_node at 81167369
#18 [88007de63a08] kmem_cache_alloc at 811682eb
#19 [88007de63a48] idr_pre_get at 812786c0
#20 [88007de63a78] ida_pre_get at 8127870c
#21 [88007de63a98] proc_register at 811efc71
#22 [88007de63ae8] proc_mkdir_mode at 811f0082
#23 [88007de63b18] proc_mkdir at 811f00b6
#24 [88007de63b28] register_handler_proc at 810e54fb
#25 [88007de63bf8] __setup_irq at 810e2594
#26 [88007de63c48] request_threaded_irq at 810e2e43
#27 [88007de63ca8] serial8250_startup at 81356fac
#28 [88007de63cf8] uart_resume_port at 813547be
#29 [88007de63d78] serial8250_resume_port at 813567b6
#30 [88007de63d98] serial_pnp_resume at 81358a58
#31 [88007de63da8] pnp_bus_resume at 81311853
#32 [88007de63dc8] dpm_resume_end at 813648a8
#33 [88007de63e28] shutdown_handler at 81319351
#34 [88007de63e68] xenwatch_thread at 8131ab1a
#35 [88007de63ee8] kthread at 81096916
#36 [88007de63f48] kernel_thread at 8100c0ca

  The reason, we guess, is that:
  1. Guests with 3.* kernels, such as RHEL6, when not configured with
CONFIG_PREEMPT, do NOT call FREEZE/THAW before resuming disks; thus
kernel threads may be active before the disks are available. We know
that kernel threads may allocate memory, which can occasionally trigger
swapping, and swapping before the disks are ready may hang the kernel.
This problem is fixed at:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.16.y&id=2edbf3c6af0f5f1f9d2ef00a15339c10beaff405
  2. However, even the xenwatch kernel thread itself needs to allocate
memory; any attempt to acquire memory before the disk is resumed may
cause the deadlock shown above.

  So, how can we fix the kernel hang caused by too-early swappiness?
Thanks in advance.


ZhangBo(Oscar)



Re: [Xen-devel] failed to get gcov result for libelf_* files

2016-05-02 Thread Zhangbo (Oscar)

>Hi all:
>I'm backporting the gcov-related patches (68ca0bc4ba -> 922153cd37) to
>xen 4.1.2, but encountered several problems. One problem is as follows:
>1) Although all the files are gathered into the gcov_info list
>correctly by __gcov_init(), when we call write_gcov(), 3 gcov_info
>entries seem to have been broken:
>   xen/common/libelf/libelf-tools.c
>   xen/common/libelf/libelf-loader.c
>   xen/common/libelf/libelf-dominfo.c
>
>info->filename of these is NULL, so when we call write_gcov() ->
>write_string() -> strlen(), a segfault and panic occur.
>
>2) Even after working around the above problem with:
>    if ( !info->filename )
>        continue;
>   inside write_gcov(), I found that the next pointer of the broken
>entry is NULL, so all files after it get skipped (we only get 200 gcda
>files although we have 262 source files in total).
>
>3) I 'fixed' this by modifying __gcov_init():
>   void __gcov_init(struct gcov_info *info)
>   {
>       /* add new profiling data structure to list */
>  +    n_info_list++;
>  +    if ( n_info_list >= 61 && n_info_list <= 63 ) {
>  +        printk("skip:%d\n", n_info_list);
>  +        return;
>  +    }
>       info->next = info_list;
>       info_list = info;
>   }
>   Then the files after these 3 files are not affected: we got 259
>gcda files.
>

I found that OBJCOPY is related to the problem! If I remove the OBJCOPY
step in the libelf Makefile, the problem gets 'fixed':

diff --git a/open-source/xen/xen-4.1.2/xen/common/libelf/Makefile b/open-source/xen/xen-4.1.2/xen/common/libelf/Makefile
index 854e738..f522ae0 100755
--- a/open-source/xen/xen-4.1.2/xen/common/libelf/Makefile
+++ b/open-source/xen/xen-4.1.2/xen/common/libelf/Makefile
@@ -1,9 +1,9 @@
 obj-y := libelf.o
 
-SECTIONS := text data rodata $(foreach n,1 2 4 8,rodata.str1.$(n)) $(foreach r,rel rel.ro,data.$(r) data.$(r).local)
+#SECTIONS := text data rodata $(foreach n,1 2 4 8,rodata.str1.$(n)) $(foreach r,rel rel.ro,data.$(r) data.$(r).local)
 
-libelf.o: libelf-temp.o Makefile
-	$(OBJCOPY) $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s)) $< $@
+#libelf.o: libelf-temp.o Makefile
+#	$(OBJCOPY) $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s)) $< $@
 
-libelf-temp.o: libelf-tools.o libelf-loader.o libelf-dominfo.o #libelf-relocate.o
+libelf.o: libelf-tools.o libelf-loader.o libelf-dominfo.o #libelf-relocate.o
 	$(LD) $(LDFLAGS) -r -o $@ $^

This is still not the final solution. What is objcopy used for here, and
what bad results would removing it cause? Thanks.


>But my solution is obviously not the right one. What's special about
>these 3 files? Why does their gcov_info get modified (filename changed
>to NULL) between __gcov_init() and write_gcov()? What's your suggestion
>for debugging the problem?
>
>Thanks in advance.
>
>   Oscar.



[Xen-devel] failed to get gcov result for libelf_* files

2016-05-02 Thread Zhangbo (Oscar)
Hi all:
I'm backporting the gcov-related patches (68ca0bc4ba -> 922153cd37) to
xen 4.1.2, but encountered several problems. One problem is as follows:
1) Although all the files are gathered into the gcov_info list
correctly by __gcov_init(), when we call write_gcov(), 3 gcov_info
entries seem to have been broken:
   xen/common/libelf/libelf-tools.c
   xen/common/libelf/libelf-loader.c
   xen/common/libelf/libelf-dominfo.c

info->filename of these is NULL, so when we call write_gcov() ->
write_string() -> strlen(), a segfault and panic occur.

2) Even after working around the above problem with:
    if ( !info->filename )
        continue;
   inside write_gcov() (see the sketch after this list), I found that
the next pointer of the broken entry is NULL, so all files after it get
skipped (we only get 200 gcda files although we have 262 source files
in total).

3) I 'fixed' this by modifying __gcov_init():
   void __gcov_init(struct gcov_info *info)
   {
       /* add new profiling data structure to list */
  +    n_info_list++;
  +    if ( n_info_list >= 61 && n_info_list <= 63 ) {
  +        printk("skip:%d\n", n_info_list);
  +        return;
  +    }
       info->next = info_list;
       info_list = info;
   }
   Then the files after these 3 files are not affected: we got 259
gcda files.
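
For clarity, here is how the guard from step 2 sits in context (a
sketch with names assumed from this thread, not the actual Xen code):

/* Singly-linked list built up by __gcov_init() at boot. */
static struct gcov_info *info_list;

static void write_gcov(void)
{
    struct gcov_info *info;

    for ( info = info_list; info; info = info->next )
    {
        /* Skip clobbered entries: a NULL filename would crash in
         * write_string() -> strlen(). Note this cannot recover the
         * entries lost when a broken node's next pointer is also NULL. */
        if ( !info->filename )
            continue;
        /* ... write one .gcda record for this object file ... */
    }
}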

But my solution is obviously not the right one. What's special about
these 3 files? Why does their gcov_info get modified (filename changed
to NULL) between __gcov_init() and write_gcov()? What's your suggestion
for debugging the problem?

Thanks in advance.

   Oscar.



[Xen-devel] can xenrt be used to test libxenlight/libvirt APIs directly?

2016-03-14 Thread Zhangbo (Oscar)
Hi all:
   I'm not sure, but I suspect that xenRT is used just to test
xenserver/xapi and is not appropriate for testing libxenlight; am I
right?

   Although it uses libvirt for guest lifecycle jobs (see the file
"./exec/xenrt/lib/libvirt/guest.py"), it seems these functions are all
used by test cases for xenserver, which means it doesn't test the
libvirt APIs either.

   So the question is:
 Is xenRT just used to test xenserver/xenapi, rather than the
libxenlight or libvirt APIs?


   Thanks in advance.

Oscar.



[Xen-devel] gfn_lock() seems useless.

2016-02-01 Thread Zhangbo (Oscar)
Hi all:
In patch e1e40bccee7490a01ac7d1f759ec2bbafd3c7185, it says that "many
routines can logically assert holding the p2m *FOR A SPECIFIC GFN*".
But I find that it does nothing to lock a single gfn; in fact, it still
locks the entire p2m.

-#define p2m_lock_recursive(p) mm_lock_recursive(p2m, &(p)->lock)
+#define gfn_lock(p,g,o)   mm_lock_recursive(p2m, &(p)->lock)  // 'g' is not used: the entire p2m is locked.

Do we have any plan to lock a specific gfn? If so, how? If not, shall
we redefine the macro?

Thanks.

Oscar.



[Xen-devel] what's the equivalent function to "schedule_timeout" in xen kernel?

2015-11-15 Thread Zhangbo (Oscar)
Hi all:
  I'd like to sleep for a while in the Xen kernel during a VMEXIT; the
easiest way is to call "udelay" or "mdelay" there. However, these two
functions busy-wait, which is a waste.
  In the Linux kernel, there's a function named 'schedule_timeout' that
allows the CPU to run other tasks while sleeping.
  So, is there any function equivalent to "schedule_timeout" in the Xen
kernel?
  Thanks in advance.

Oscar.
