Re: [osol-code] workaround proposal for bug 6745357

2009-03-24 Thread Liu, Jiang
Hi Guy, Since you mentioned there's no SLIT on your machine, I realized I made a mistake last night. The following latency table should be generated by the function lgrp_plat_2level_setup() in i86pc/os/lgrpplat.c instead of being read from the SLIT. According to ACPI, entry[n][n] in the SLIT table should have value 1

Re: [osol-code] workaround proposal for bug 6745357

2009-03-24 Thread Guy
Hi Gerry, On my system, there is only an SRAT table, no SLIT. So the latencies are always calculated, and this part is done in lgrpplat.c. So having values of 1 or 2 instead of 105 and 139 doesn't sound quite normal to me. But maybe Jonathan or Kit could tell us what they think about that?

Re: [osol-code] workaround proposal for bug 6745357

2009-03-24 Thread Liu, Jiang
Hi Guy, The latency statistics will be different with SRAT enabled or disabled. When SRAT is enabled, all latency information will be read from the SLIT table, which is generated by the BIOS/firmware. When SRAT is disabled (or, more precisely, SLIT is disabled), lgrp will try to probe the latency amo

Re: [osol-code] workaround proposal for bug 6745357

2009-03-24 Thread Guy
Hello Gerry, hello Kit, I tested your fix proposal on my Opteron system (before Kit answered). With this fix, the system boots, but the lgrp framework doesn't seem to be initialized properly. The latency stats are weird. > lgrp_plat_node_domain::print [ { exists = 0x1 prox_dom

Re: [osol-code] workaround proposal for bug 6745357

2009-03-19 Thread Liu, Jiang
Hi Kit, Thanks for your comments. I missed the other places that depend on the order of mem_node_config, so I will read the related code later. My feeling is that bug 6745357 is caused by the platform-dependent vm subsystem and that the lgrp subsystem performs correctly according to the design, so I w

Re: [osol-code] workaround proposal for bug 6745357

2009-03-19 Thread Kit M. Chow
Hi Gerry, Please be aware that there are other places in Solaris besides mnode_range_setup() that make assumptions about the memory ordering in mem_node_config[]. I think the current plan of attack, to have Jonathan make sure mem_node_config[] is well behaved, is the least risky solution for now.

Re: [osol-code] workaround proposal for bug 6745357

2009-03-19 Thread Liu, Jiang
Hi Guy, After reading more code related to bug 6745357, I found there may be another, better way to fix it. In file uts/i86pc/vm/vm_machdep.c, all "mnoderanges"-related logic assumes that entries in the mnoderanges array are arranged in ascending order with memory physical
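
To illustrate the ordering assumption described in this message, here is a minimal userland sketch in C. It is not the code from vm_machdep.c: the struct mnode_range_sketch type, its fields, and both function names are hypothetical, and qsort() is used only because this is a standalone example (kernel boot code would need its own sort). It shows how an array of ranges could be put into ascending physical-address order regardless of node id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of a memory-node range array. */
struct mnode_range_sketch {
	int      mnode;      /* memory node id */
	uint64_t start_pfn;  /* first page frame number of the range */
	uint64_t end_pfn;    /* last page frame number of the range */
};

/* Comparator: order two ranges by their starting page frame number. */
static int
cmp_by_start_pfn(const void *a, const void *b)
{
	const struct mnode_range_sketch *ra = a;
	const struct mnode_range_sketch *rb = b;

	if (ra->start_pfn < rb->start_pfn)
		return (-1);
	if (ra->start_pfn > rb->start_pfn)
		return (1);
	return (0);
}

/* Sort the array in place so that physical addresses ascend with index. */
static void
sort_ranges_by_address(struct mnode_range_sketch *r, size_t n)
{
	assert(r != NULL || n == 0);
	if (n < 2)
		return;	/* nothing to reorder */
	qsort(r, n, sizeof (*r), cmp_by_start_pfn);
}
```

With such a step, code that walks the array from low to high addresses would not depend on node ids following the physical layout.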

Re: [osol-code] workaround proposal for bug 6745357

2009-03-17 Thread Guy
Hello Gerry, About your earlier post: > The patch is still based on the assumption that a memory node with a > bigger node id will have a higher memory address. That assumption is > true for most current platforms, but things change fast and that assumption > may become broken with future platf

Re: [osol-code] workaround proposal for bug 6745357

2009-03-16 Thread Liu, Jiang
Hi Guy, You are right! I hadn't realized that "lgrp_plat_srat_enable" and "lgrp_plat_slit_enable" are accessed at a very early stage during boot, in lgrp_init(), which is called from ml_setup() even before main(). So the mechanism I mentioned is not suitable for this case. Thanks!

Re: [osol-code] workaround proposal for bug 6745357

2009-03-16 Thread Guy
Hello Gerry, Can you tell me more about the implied mechanism? I already tried adding such bootargs without modifying the code (without success). /etc/system is not suitable for such initialization, because this problem arises when we boot from the install DVD, so you cannot edit the /etc/system

Re: [osol-code] workaround proposal for bug 6745357

2009-03-14 Thread Liu, Jiang
Hi Guy, There's no need to add the boot options "lgrp-srat-enable" and "lgrp-slit-enable"; there is already a mechanism supporting this. You could add the boot option "-B lgrp_plat_srat_enable=0" to disable SRAT and "-B lgrp_plat_slit_enable=0" to disable SLIT, or you could set it in /etc/sy

Re: [osol-code] workaround proposal for bug 6745357

2009-03-14 Thread Liu, Jiang
Hi Guy, I have reviewed your patch for bug 6745357 and feel that it's just a workaround instead of a real fix for 6745357. The patch is still based on the assumption that a memory node with a bigger node id will have a higher memory address. That assumption is true for most curr
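
The assumption discussed in this message (a bigger node id implies a higher memory address) can be expressed as a small check. The following is a standalone C sketch under stated assumptions: struct node_range_sketch and nodes_ascend_with_id() are hypothetical names for illustration, not kernel interfaces, and each node is simplified to a single contiguous range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory node, indexed by node id. */
struct node_range_sketch {
	bool     exists;     /* node is populated */
	uint64_t start_pfn;  /* first page frame number */
	uint64_t end_pfn;    /* last page frame number */
};

/*
 * Return true if every existing node starts above the end of the
 * previous existing node, i.e. addresses rise with node id.
 */
static bool
nodes_ascend_with_id(const struct node_range_sketch *nodes, size_t n)
{
	bool have_prev = false;
	uint64_t prev_end = 0;

	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].exists)
			continue;	/* skip unpopulated node ids */
		if (have_prev && nodes[i].start_pfn <= prev_end)
			return (false);
		prev_end = nodes[i].end_pfn;
		have_prev = true;
	}
	return (true);
}
```

A platform on which this check fails is one where code relying on the assumption could misbehave.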

Re: [osol-code] workaround proposal for bug 6745357

2009-03-13 Thread Guy
Hello, After working with Jonathan Chew on the subject (by email), I discovered that the problem came from the wrong initialization of the lgrp kernel internal objects when the ACPI SRAT table is used to populate these objects (this applies to my system, an HP ProLiant with at least 2 Opteron proces

Re: [osol-code] workaround proposal for bug 6745357

2009-02-24 Thread Guy
Hello Gavin, Yes, I tested the fix suggested by Dimitri, and it didn't work. The system hung during boot. After doing this test, I did more testing in other directions. - First, I searched for the build that introduced the regression. I tested builds 55 and 87, which worked fine. And all builds after snv_

Re: [osol-code] workaround proposal for bug 6745357

2009-02-23 Thread Gavin Maltby
Hi, Guy wrote: Hello, I recently worked on bug 6745357 (kernel crash during startup at page_ctr_add_internal). I have observed the following: The bug was introduced in nv_88. The changeset that breaks the boot is this one: Issues Resolved: BUG/RFE: 6594519 Need support for ACPI System R

[osol-code] workaround proposal for bug 6745357

2009-02-23 Thread Guy
Hello, I recently worked on bug 6745357 (kernel crash during startup at page_ctr_add_internal). I have observed the following: The bug was introduced in nv_88. The changeset that breaks the boot is this one: Issues Resolved: BUG/RFE: 6594519 Need support for ACPI System Resource Affinity T