Re: How hardware implements CAS

2017-01-06 Thread Yunpeng Li
Thanks a lot, I will dig into it later. -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. For more options,

Re: How hardware implements CAS

2017-01-05 Thread Rajiv Kurian
For folks who know Verilog (and even for folks who don't) here are some resources re Chisel and HDL in general that might be a good place to start: Standalone Chisel intro: http://inst.eecs.berkeley.edu/~cs250/sp16/lectures/lec02-sp16-rev2.pdf Chisel for folks who know Verilog:

Re: How hardware implements CAS

2017-01-05 Thread Rajiv Kurian
If you are really interested in the low-level details and can read, or are willing to learn to read, an HDL, I'd say take a look at the RISC-V project. You can look at either the Rocket (in-order) or BOOM (out-of-order) cores. My recommendation would be to start with Rocket since it is simpler.

Re: How hardware implements CAS

2017-01-04 Thread Yunpeng Li
Thanks a lot for the explanations.

Re: How hardware implements CAS

2017-01-04 Thread Vitaly Davidovich
Oh, and forgot to mention the LL/SC style of CAS that's offered by some architectures with weak (by default) memory models. The Load-Linked/Store-Conditional becomes a non-atomic operation underneath, but the CPU ensures that the store is only done if the underlying cacheline wasn't taken away
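To make the LL/SC point concrete, here is a minimal C++ sketch of the retry loop that software usually wraps around such a CAS. The helper name `add_with_cas` is hypothetical and not from the thread. On LL/SC architectures `compare_exchange_weak` is typically compiled to a load-linked/store-conditional pair, so it is allowed to fail spuriously (for example, if the cacheline was taken away between the two halves), which is why it sits in a loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically adds `delta` to `counter` with a CAS
// retry loop and returns the value that this call stored.
inline int add_with_cas(std::atomic<int>& counter, int delta) {
    int observed = counter.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail even when `counter == observed`
    // (a spurious failure, e.g. a lost LL/SC reservation). On any failure
    // it writes the current value of `counter` back into `observed`,
    // so the next iteration retries against fresh data.
    while (!counter.compare_exchange_weak(observed, observed + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // Nothing to do here: `observed` has already been refreshed.
    }
    // On success `observed` still holds the old value that was replaced.
    return observed + delta;
}
```

On x86 the same source would typically compile to a `lock cmpxchg` loop instead; the retry structure is identical, only the failure modes differ.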

Re: How hardware implements CAS

2017-01-04 Thread Avi Kivity
Gil covered the implementation details; as to overhead, it can be quite low if there is no cacheline contention. Agner's tables list Skylake lock cmpxchg as having a throughput of 1 insn per 18 cycles, which is fairly amazing. However, as soon as you have contention, this tanks completely due
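One rough way to see the contention effect for yourself is to count how many CAS attempts lose the race when several threads hit the same cacheline. The sketch below is an illustration under assumptions, not a benchmark from the thread: `CasStats` and `hammer_counter` are hypothetical names, and the failure count only shows wasted attempts, not cycle costs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct CasStats {
    std::uint64_t final_value;  // counter value after all threads finish
    std::uint64_t failed_cas;   // CAS attempts that lost the race and retried
};

// Each thread increments one shared counter `increments_per_thread` times
// using a CAS loop. All threads contend on the same cacheline.
CasStats hammer_counter(int threads, std::uint64_t increments_per_thread) {
    assert(threads > 0);
    std::atomic<std::uint64_t> counter{0};
    std::atomic<std::uint64_t> failures{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(threads));
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            std::uint64_t local_failures = 0;
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                std::uint64_t seen = counter.load(std::memory_order_relaxed);
                // The strong form fails only if another thread changed the
                // counter first; `seen` is then refreshed and we retry.
                while (!counter.compare_exchange_strong(
                        seen, seen + 1, std::memory_order_relaxed)) {
                    ++local_failures;
                }
            }
            failures.fetch_add(local_failures, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return {counter.load(), failures.load()};
}
```

With a single thread `failed_cas` stays at zero; with several threads on a multi-core machine it usually grows, and timing the two cases (for example with `std::chrono::steady_clock`) gives a feel for how throughput drops once the cacheline is shared.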

How hardware implements CAS

2017-01-04 Thread Yunpeng Li
Hi there, Could someone help shed some light on how hardware actually implements atomic operations such as CAS? In particular, what are the differences and overheads across the spectrum from single-thread, single-core, single-socket to hyper-threaded, multi-core, multi-socket architectures? The