Greetings,

Is there a way to lower the log rate of error messages in slurmctld for nodes
with hardware errors?


For example, we see this for a node that has DIMM errors:



[2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:35.760] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:36.763] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:37.766] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:38.769] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:39.773] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:40.776] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:41.779] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:42.781] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:45.143] error: Node node37 has low real_memory size (257642 < 257660)


The log message is correct, since the node does have DIMM errors, but that's
one log entry per second. Such a high log rate doesn't seem right.
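
In case it helps to quantify the rate, here is a minimal sketch of how the
spacing between these messages could be measured. It assumes the log lines
have exactly the format shown in the excerpt above; the function name
message_intervals is only illustrative and not part of Slurm.

```python
import re
from datetime import datetime

# Matches the slurmctld line format shown in the excerpt above (assumed format).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: Node (?P<node>\S+) has low real_memory size"
)


def message_intervals(lines):
    """Return {node: [seconds between consecutive matching messages]}."""
    last_seen = {}
    intervals = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip unrelated log lines
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        node = m.group("node")
        if node in last_seen:
            delta = (ts - last_seen[node]).total_seconds()
            intervals.setdefault(node, []).append(delta)
        last_seen[node] = ts
    return intervals
```

Feeding it the lines of slurmctld.log would give, per node, the list of gaps
in seconds between successive "low real_memory size" errors.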


Thanks,
/ Per Lonnborg