Status: Accepted
Owner: [email protected]
CC: [email protected]
Labels: Type-Bug Priority-Medium

New issue 2424 by [email protected]: Test case runs slower on V8 than on JSC
http://code.google.com/p/v8/issues/detail?id=2424

Reported by Dmitry Chestnykh:

"""
Chrome runs at ~1 sec with any array type, Safari 248 ms (with Int32Array, slower with Uint32Array, heh), Firefox - 670 ms.
"""
https://twitter.com/dchest/status/271974857124749312

Code: https://gist.github.com/4135716, hottest function salsa20_8_xor.

Results of the standalone test (attached) on my machine:

∮ out/ia32.release/d8 test.js scrypt: 686 ms 66,19,172,174,118,47,129,147,213,139,191,0,100,201
scrypt: 551 ms 66,19,172,174,118,47,129,147,213,139,191,0,100,201

Snippet from the function code (in the loop):

                u = x0 + x12;
                x4 ^= (u << 7) | (u >>> (32 - 7));
                u = x4 + x0;
                ...

Here x0 and x12 are known to be int32 (they are result of ^). Thus I would expect that u will be a truncating addition, however it is not:

      0 2 d1238 Change i359 i to d range[-2147483648,2147483647,m0=0] <|@
      0 1 d1258 Change i371 i to d range[-2147483648,2147483647,m0=0] <|@
      0 2 d391 Add d1238 d1258 ! <|@
0 2 i1265 Change d391 d to i truncating-int32 range[-2147483648,2147483647,m0=0] <|@
      0 1 i394 Shl i1265 i393 range[-2147483648,2147483647,m0=0] <|@
      0 1 i397 Shr i1265 i396 range[0,127,m0=0] <|@
0 1 i399 Bitwise BIT_OR i394 i397 range[-2147483648,2147483647,m0=0] <|@ 0 3 i401 Bitwise BIT_XOR i363 i399 range[-2147483648,2147483647,m0=0] <|@

It seems that we failed to infer that d391 should be actually i391 because it has only truncating uses.

Furthermore even if I rewrite code like:

                u = (x0 + x12) | 0;
                x4 ^= (u << 7) | (u >>> (32 - 7));
                u = (x4 + x0) | 0;

The addition remains double. I think now we should be able to revert both case with and case without |0 to integer addition (previously there were I think some issues with interfering simulates etc-etc). Adding the following case to HAdd::Canonicalize

    if (representation().IsDouble() &&
        left()->IsChange() && HChange::cast(left())->from().IsInteger32() &&
right()->IsChange() && HChange::cast(right())->from().IsInteger32() &&
        CheckUsesForFlag(kTruncatingToInt32)) {
      HAdd* instr = new(block()->zone()) HAdd(context(),
          HChange::cast(left())->value(),
          HChange::cast(right())->value());
      instr->set_representation(Representation::Integer32());
      instr->SetFlag(kTruncatingToInt32);
      instr->ClearFlag(kCanOverflow);
      instr->InsertBefore(this);
      return instr;
    }

Drastically improves both code and our results:

∮ out/ia32.release/d8 test.js
scrypt: 223 ms 66,19,172,174,118,47,129,147,213,139,191,0,100,201
scrypt: 197 ms 66,19,172,174,118,47,129,147,213,139,191,0,100,201

This is a factor of 3. I can upload my patch as a CL if you don't think that we should approach this problem in some other more generic way.

[there are also seem to be some representation inference problems in cold function SHA256.blocks which deopts a lot on non-truncating double-to-i for some phi that either has truncating uses and a use at int32-to-double, but I did not investigate that].

Attachments:
        test.js  21.6 KB

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

Reply via email to