On 17-1-2022 18:14, Tomoaki AOKI wrote:
On Mon, 17 Jan 2022 15:04:16 +0100
Willem Jan Withagen <[email protected]> wrote:
On 17-1-2022 14:46, Eugene Grosbein wrote:
17.01.2022 20:24, Willem Jan Withagen wrote:
Well, perform independent hardware (memory) testing with something like
memtest86+
and if it is all right, you should ask someone more knowledgeable. Maybe CC:
[email protected]
Perhaps I should have done that when I started, but the supplier assured me that
they had just retired the boards without any issues.
Memtest86 found the faulty DIMM in 30 secs...
Not sure if we could/want to educate vm_mem_init() to actually detect this.
It is still in the part where everything is running on the first CPU,
which makes it a bit easier to understand what is going on.
Let's see if the box will run on 3 DIMMs for the time being.
Then figure out with dmidecode what we need to expand again.
Is it ECC memory or non-ECC?
The kernel already performs a full memory test at boot time
unless disabled with another loader knob:
hw.memtest.tests=0
Try booting it with memory testing disabled and without hw.physmem limitation.
Maybe it will boot.
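
A minimal sketch of the suggestion above, assuming the settings are placed in /boot/loader.conf. The file location and the quoting style are assumptions; only the knob names hw.memtest.tests and hw.physmem, and the 8G value, come from this thread.

```
# /boot/loader.conf (assumed location)

# Disable the boot-time memory test.
hw.memtest.tests="0"

# Leave the physical-memory cap out for this test boot.
# 8G is the limit mentioned elsewhere in this thread.
#hw.physmem="8G"
```

For a single boot, the same knob can probably be set at the loader prompt instead, with "set hw.memtest.tests=0" followed by "boot", which avoids editing the file at all.
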
With ECC, it could be a hardware interrupt raised while the kernel runs that test,
and wrong in-kernel processing of that interrupt.
Swapped the DIMM with 3 others, but still the same errors.
Then I changed DIMM slot, and the errors went away.
So definitely a hardware issue.
When booted, FreeBSD already reported only 12 GB in the system (there are 4
4 GB DIMMs).
Using 8 GB. DIMMs are ECC.
But even then it would only boot with mem set to 8G.
Waiting for memtest to finish at least one pass.
Usually that will take quite some time.
--WjW
Not sure this is the case, but some motherboards have severe limitations
on DIMM slot usage when not all slots are populated.
For example, assuming the slots are numbered B0-0, 1, 2, 3 and B1-0, 1, 2, 3,
*Must use "interleaved". If 4 of 8 slots are to be used,
B0-0, B0-2, B1-0, B1-2 shall be used.
(Some forced B0-1, B0-3, B1-1, B1-3, IIRC)
*Must NOT use "interleaved".
B0-0, B0-1, B1-0, B1-1 shall be used.
*Must NOT use B1 unless B0 is full of DIMMs.
B0-0, B0-1, B0-2, B0-3 shall be used.
and so on, depending on the motherboard vendor (at worst, per model).
Yup, I know... I used the board in the configuration I got it.
And it's a DUAL processor board with 2 Opterons.
The config works correctly for the first Opteron (called CPU1)
using slots: CPU1/DIMM1A and CPU1/DIMM1B
But on the second CPU I have to use the third slot....
so using slots: CPU2/DIMM1B and CPU2/DIMM2B
And my memtest86 has completed 1 full pass over 16G without errors.
So I'm guessing that the order is not majorly picky.
But you are correct in noting this, so I will read up on this in the
manual.
Thanx,
--WjW