Hi, I'm working on that issue as an exercise/playground while studding the GHC internals: http://hackage.haskell.org/trac/ghc/ticket/693
First I tried just to replace "ccall lockClosure(mvar "ptr")" with GET_INFO(mvar) in stg_takeMVarzh and stg_putMVarzh and got 60% speedup (see the test case at the end.) Then I changed lockClosure to read header info directly when enabled_capabilities == 1. The speedup was significantly lower, <20% I tried to hack stg_putMVarzh directly: if (enabled_capabilities == 1) { info = GET_INFO(mvar); } else { ("ptr" info) = ccall lockClosure(mvar "ptr"); } But got no speedup at all. The generated asm (amd64): movl $enabled_capabilities,%eax cmpq $1,%rax je .Lcgq .Lcgp: movq %rbx,%rdi subq $8,%rsp movl $0,%eax call lockClosure addq $8,%rsp .Lcgr: cmpq $stg_MVAR_CLEAN_info,%rax jne .Lcgu {...} .Lcgq: movq (%rbx),%rax jmp .Lcgr It moves enabled_capabilities into %eax and then compares 1 with %rax. It looks wrong for me: the highest part of %rax remains uninitialized. (And actually it ignores the highest 32 bits of enabled_capabilities. That is ok here, but generally wrong.) The compiler knows that enabled_capabilities is 64bit (it complains when trying to assign it to bits32 register.) What do I miss? Any help? Thanks, Yuras -- The test case import Control.Concurrent import Control.Monad main :: IO () main = do var <- newMVar () replicateM_ 100000000 $ do takeMVar var putMVar var () _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users