Dear All,

This might be useful for anyone who is building a Linux PC system.

I have some more insight into the speed of the i7-13700K, which is the current 13th gen Intel CPU. I have an Asus Z690-P D4 board and either 128 GB (4x32) or 64 GB (2x32) of Kingston FURY DDR4-3600 CL18-22-22 RAM (I can just physically add/remove 2 DIMMs).

With 64 GB RAM the system is seemingly a couple of percent faster than with 128 GB. The reason is probably that the i7 has only 2 memory channels (like any other consumer CPU), so having 4 DIMMs probably needs extra effort from the memory controller.

Disabling HT and/or VMX in BIOS didn't make a difference. Disabling all efficient cores in BIOS didn't make a difference.

My current conclusion is that the bottleneck of this system is the memory speed (RAM and probably CPU cache). My previous benchmarks were made with DDR4 RAM running at 2400, which is the default top speed for the DDR4 RAM. To get the RAM running at 3600 one needs to go into the BIOS and enable XMP there. My board has two default XMP settings in the BIOS called XMP-I and XMP-II (one can also manipulate things manually but I didn't try). XMP is some protocol which allows the DIMM to tell the BIOS at which speed it should run (I think something like this is default in DDR5, but for DDR4 it has been added at some point, so older DDR4 boards might have a problem with this).
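As a rough cross-check of the 2400 vs. 3600 difference, the theoretical peak RAM bandwidth can be estimated from the transfer rate. This is only a sketch: it assumes the standard 64-bit (8-byte) width of a DDR4 channel and the 2 memory channels mentioned above, and the function name is my own. Sustained bandwidth in a real calculation will be lower.

```python
def peak_bandwidth_gb_s(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    mt_per_s is the DDR rating in megatransfers per second (e.g. 3600).
    channels=2 and bytes_per_transfer=8 are assumptions for a
    dual-channel DDR4 consumer system.
    """
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


for rate in (2400, 3600):
    print(f"DDR4-{rate}: {peak_bandwidth_gb_s(rate):.1f} GB/s peak")
```

So going from 2400 to 3600 raises the theoretical peak by a factor of 1.5; how much of that shows up in the timings depends on how memory-bound the run is.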

Our IT experts also compiled the MPI version. Their tests found MPI 10% slower than OMP, maybe because of problems with the compilation. I tried with a 20-layer Fe slab and also found MPI clearly slower than OMP. So for now I decided not to invest time in MPI; I think very big cases are not suitable for this system anyway, because of the memory speed bottleneck.

Perhaps the same CPU will run faster in parallel execution with DDR5. Also, perhaps CPUs with more cache will run faster. But these things are expensive, and e.g. the premium AMD CPUs are much more expensive than the i7 that I have. Also, the cache structure seems to be quite complex nowadays, so I am not sure if AMD CPUs would be better. Quite obviously, at this point the efficient cores are useless due to the memory bottleneck.

Some OMP and k-parallel results for my current setup are below. I think in general 4x localhost and OMP=2 is the winner.

Best,
Lukasz


With XMP-I the system has been up for nearly 2 weeks now (so I call it stable). The serial benchmark is:

XMP-I, 128 GB DDR4 RAM at 3600, system stable
OMP=1 11.65 seconds
OMP=2 6.93
OMP=3 5.49
OMP=4 4.92
OMP=6 4.09
OMP=8 3.68
OMP=9 4.53
OMP=12 4.41 - 4.85 (results vary within this range more or less)
OMP=16 4.54

In general, results vary by maybe 1% from run to run; I have a feeling they are quite stable. I think the OMP=12 variation might be related to whether or not the efficient cores are used.
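To make the scaling in the XMP-I table easier to read, one can convert the times into speedup and parallel efficiency relative to OMP=1. A minimal sketch, using only the numbers from the table above; for OMP=12 I take the lower end of the quoted range, and the helper name is my own.

```python
# Wall times in seconds, copied from the XMP-I table above.
# OMP=12 varied between 4.41 and 4.85; the lower bound is used here.
xmp1 = {1: 11.65, 2: 6.93, 3: 5.49, 4: 4.92, 6: 4.09,
        8: 3.68, 9: 4.53, 12: 4.41, 16: 4.54}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in times.items()}


for n, (speedup, efficiency) in scaling(xmp1).items():
    print(f"OMP={n:2d}  speedup {speedup:4.2f}  efficiency {efficiency:4.0%}")
```

With these numbers the speedup peaks at OMP=8 (about 3.2x, i.e. roughly 40% efficiency) and gets worse beyond that.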

With XMP-II the system is fastest but unstable (the PC froze after 2 hours and needed a hard reboot). The serial benchmark is:

XMP-II, 64 GB DDR4 RAM at 3600, system unstable
OMP=1 12.08
OMP=2 6.87
OMP=3 5.21
OMP=4 4.48
OMP=6 3.92
OMP=8 3.51
OMP=9 4.53
OMP=12 5.07

Previous results (email Feb 22, 2023) with 64GB DDR4 RAM at 2400:
OMP=1 12.82
OMP=2 7.65
OMP=4 5.51
OMP=6 4.87
OMP=8 4.52
OMP=12 5.54
OMP=16 5.55
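
The two 64 GB tables (2400 vs. 3600 with XMP-II) can be compared directly for the thread counts they have in common. A small sketch using only the times listed above; the function name is my own, and since the 2400 numbers come from an earlier email, other differences between the two runs cannot be excluded.

```python
# Wall times in seconds from the two 64 GB tables above.
ddr4_2400 = {1: 12.82, 2: 7.65, 4: 5.51, 6: 4.87, 8: 4.52, 12: 5.54, 16: 5.55}
ddr4_3600_xmp2 = {1: 12.08, 2: 6.87, 3: 5.21, 4: 4.48, 6: 3.92,
                  8: 3.51, 9: 4.53, 12: 5.07}


def gain(old, new):
    """Return {threads: old_time / new_time} for thread counts in both tables."""
    return {n: old[n] / new[n] for n in sorted(old.keys() & new.keys())}


for n, g in gain(ddr4_2400, ddr4_3600_xmp2).items():
    print(f"OMP={n:2d}  {g:.2f}x faster at 3600")
```

The gain grows from about 6% at OMP=1 to about 29% at OMP=8, which fits with the idea that the runs with more threads are limited by memory speed.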


k-parallel results with 16 k-points (16x Gamma point)
XMP-I, 128 GB DDR4 RAM at 3600, system stable
1x localhost OMP=1 3.05.30 min.sec.
2x localhost OMP=1 1.48.28
2x localhost OMP=2 1.18.23
4x localhost OMP=2 1.03.30
8x localhost OMP=1 1.04.53
8x localhost OMP=2 1.07.19

The best I ever got for this k-parallel test was 0.58.60 with XMP-II (system unstable) and 4x localhost OMP=2.
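
For comparing the k-parallel runs, the times can be converted to seconds and expressed as speedup over the 1x localhost OMP=1 run. A sketch only: it assumes the three fields are minutes, seconds and hundredths of a second, and the function name is my own.

```python
def to_seconds(stamp):
    """Convert 'M.SS.hh' to seconds.

    Assumption: the fields are minutes.seconds.hundredths, as suggested
    by the 'min.sec.' label in the table above.
    """
    minutes, seconds, hundredths = (int(part) for part in stamp.split("."))
    return minutes * 60 + seconds + hundredths / 100


# XMP-I, 128 GB results copied from the table above.
runs = {
    "1x localhost OMP=1": "3.05.30",
    "2x localhost OMP=1": "1.48.28",
    "2x localhost OMP=2": "1.18.23",
    "4x localhost OMP=2": "1.03.30",
    "8x localhost OMP=1": "1.04.53",
    "8x localhost OMP=2": "1.07.19",
}

base = to_seconds(runs["1x localhost OMP=1"])
for label, stamp in runs.items():
    t = to_seconds(stamp)
    print(f"{label}: {t:6.2f} s, {base / t:.2f}x vs. 1x OMP=1")
```

Under that reading of the time format, 4x localhost with OMP=2 is about 2.9x faster than the single serial job.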



