Linux 4.10.1 (has SMT scheduler fix)
GCC 5.4 - so no Ryzen optimizations

pachi-git-13115394

Intel Haswell
t=8   13325 g/s   t=1 1665 g/s   @3.6GHz
t=4    9352 g/s   t=1 2338 g/s   @3.6GHz
t=1    2542 g/s                  @3.8GHz

AMD Ryzen
t=16  26589 g/s   t=1 1661 g/s   @3.7GHz
t=8   15464 g/s   t=1 1933 g/s   @3.7GHz
t=4    8141 g/s   t=1 2035 g/s   @3.7GHz
t=1    2221 g/s                  @3.7GHz

Leela 0.9.4

Intel Haswell @3.8GHz (OpenBLAS Haswell BLAS)
1 thread benchmark   = 5685 g/s (mostly INT)
netbench predictions = 33 p/s (DCNN AVX2 FPU)
netbench evaluations = 238 p/s (DCNN AVX2 FPU)

AMD Ryzen @3.7GHz (OpenBLAS Haswell BLAS)
1 thread benchmark   = 5099 g/s (mostly INT)
netbench predictions = 27 p/s (DCNN AVX2 FPU)
netbench evaluations = 239 p/s (DCNN AVX2 FPU)

Observations:

- SMT performance of Ryzen appears to be extremely good (+72% on pachi
  vs +42% on Intel).
- Single core IPC is 8.5% to 11.5% behind Haswell.
- Ryzen's AVX2 performance is too good. Ryzen has 2 x 128 bit FPU vs.
  2 x 256 bit FPU for Haswell, and the majority of the time in Leela 0.9
  is spent in SGEMM, which is an ideal case for AVX2 code. I would have
  predicted AVX2 results to be about half as fast on Ryzen, but its
  results are extremely competitive or even better. I have no real
  explanation for this; my best guess is a win due to a better-fitting
  cache subsystem.
- By default OpenBLAS selects the Barcelona kernel for Ryzen (ugh!).
  Overriding with the Haswell kernel gives much better results due to
  AVX2 usage.

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
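
The SMT and per-clock percentages in the observations can be re-derived from the raw numbers in the listing. The following is only a minimal Python sketch of that arithmetic, not the script used for the post. It rests on two assumptions that the post does not state explicitly: the Intel t=4 and Ryzen t=8 runs are the one-thread-per-core baselines for the SMT comparison, and the "IPC" gap is Haswell's lead over Ryzen in single-thread games/s per GHz. The function and dictionary names are made up for illustration.

```python
# Raw numbers copied from the benchmark listing (games/s and clock in GHz).
# Assumption: "no_smt" is the run with one thread per physical core
# (t=4 on Haswell, t=8 on Ryzen) and "smt" is the run with two per core.
PACHI = {
    "Intel Haswell": {"smt": 13325, "no_smt": 9352, "single": 2542, "ghz": 3.8},
    "AMD Ryzen": {"smt": 26589, "no_smt": 15464, "single": 2221, "ghz": 3.7},
}
LEELA_SINGLE = {
    "Intel Haswell": {"single": 5685, "ghz": 3.8},
    "AMD Ryzen": {"single": 5099, "ghz": 3.7},
}


def smt_gain_pct(smt_gps, no_smt_gps):
    """Throughput gain in percent from two threads per core versus one."""
    return (smt_gps / no_smt_gps - 1.0) * 100.0


def per_clock_lead_pct(a_gps, a_ghz, b_gps, b_ghz):
    """Lead of A over B in percent, measured in games/s per GHz.

    Games/s per GHz is used here as a rough stand-in for IPC (assumption).
    """
    return ((a_gps / a_ghz) / (b_gps / b_ghz) - 1.0) * 100.0


def haswell_lead(results):
    """Haswell's per-clock single-thread lead over Ryzen for one benchmark."""
    h, r = results["Intel Haswell"], results["AMD Ryzen"]
    return per_clock_lead_pct(h["single"], h["ghz"], r["single"], r["ghz"])


if __name__ == "__main__":
    for cpu, res in PACHI.items():
        gain = smt_gain_pct(res["smt"], res["no_smt"])
        print(f"{cpu}: SMT gain on pachi = {gain:+.1f}%")
    print(f"pachi: Haswell per-clock lead = {haswell_lead(PACHI):.1f}%")
    print(f"Leela: Haswell per-clock lead = {haswell_lead(LEELA_SINGLE):.1f}%")
```

Under these assumptions the SMT gains round to the +72% and +42% quoted in the post, and the per-clock leads come out at roughly 11.4% (pachi) and 8.6% (Leela), close to the quoted 8.5% to 11.5% range.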