Re: [v8-dev] performance considerations for well-typed Wasm binaries

Wolfgang Fri, 05 Apr 2024 13:27:42 -0700

Hi Andreas,

yes, the binaries are small. (Independent note: they are strange in some
way it seems: wasmtime takes 2 seconds to compile/instantiate color, and is
in general much slower for all our other benchmarks as well. Investigating
this as well currently.)


Thanks for the explanation, interesting!
No, we don't use ARM, we measured the improvement on two standard x64 linux
laptops.

Some more background, not sure if relevant:
Our memory model is quite simplistic: memory is never freed, we have a
global that points to the next free segment in the linear mem, the pointer
is increased after every allocation and never decreased. We grow the linear
memory regularly if needed.
In this model, color uses just 16MB of linear mem, sha_fast 25MB.

In the end I'm happy it's a bit faster :) Sure, it would be cool to know
why...
Best,
Wolfgang

The setup of the following two is exactly the same, except in the second
one we call node with "--no-wasm-bounds-checks".

Without --no-wasm-bounds-checks:
Running avg. of 20 runs in node (v20.10.0).
       demo1-opt_coalesce-locals : startup:    3, main:   2, pp: 20, sum:
25
       demo2-opt_coalesce-locals : startup:    3, main:   0, pp:  9, sum:
12
    list_sum-opt_coalesce-locals : startup:    3, main:   0, pp:  2, sum:
 5
     vs_easy-opt_coalesce-locals : startup:    4, main:  22, pp:  3, sum:
29
     vs_hard-opt_coalesce-locals : startup:    4, main:  91, pp:  3, sum:
98
       binom-opt_coalesce-locals : startup:    3, main:  15, pp: 18, sum:
36
       color-opt_coalesce-locals : startup:    7, main: 164, pp:  2, sum:
 173
    sha_fast-opt_coalesce-locals : startup:    4, main: 112, pp:  7, sum:
 123
  even_10000-opt_coalesce-locals : startup:    3, main:   1, pp:  1, sum:
 5
     ack_3_9-opt_coalesce-locals : startup:    3, main: 112, pp: 28, sum:
 143
sm_gauss_nat-opt_coalesce-locals : startup:    3, main:  58, pp:  1, sum:
62
  sm_gauss_N-opt_coalesce-locals : startup:    3, main:  22, pp:  1, sum:
26

With --no-wasm-bounds-checks:
Running avg. of 20 runs in node (v20.10.0).
       demo1-opt_coalesce-locals : startup:    3, main:   2, pp: 19, sum:
24
       demo2-opt_coalesce-locals : startup:    3, main:   0, pp:  9, sum:
12
    list_sum-opt_coalesce-locals : startup:    3, main:   0, pp:  3, sum:
 6
     vs_easy-opt_coalesce-locals : startup:    4, main:  21, pp:  3, sum:
28
     vs_hard-opt_coalesce-locals : startup:    4, main:  91, pp:  2, sum:
97
       binom-opt_coalesce-locals : startup:    3, main:  14, pp: 18, sum:
35
       color-opt_coalesce-locals : startup:    7, main: 157, pp:  2, sum:
 166
    sha_fast-opt_coalesce-locals : startup:    4, main: 109, pp:  7, sum:
 120
  even_10000-opt_coalesce-locals : startup:    3, main:   1, pp:  1, sum:
 5
     ack_3_9-opt_coalesce-locals : startup:    3, main: 112, pp: 28, sum:
 143
sm_gauss_nat-opt_coalesce-locals : startup:    3, main:  58, pp:  1, sum:
62
  sm_gauss_N-opt_coalesce-locals : startup:    3, main:  22, pp:  1, sum:
26

On Tue, Apr 2, 2024 at 10:26 AM 'Andreas Haas' via v8-dev <
[email protected]> wrote:

> Hi Wolfgang,
>
> Your module seems to be quite small, so I guess disabling validation saves
> you less than 1ms. Do you run your experiments on a MacBook?
>
> It is a bit surprising that you can save even 5% by disabling bounds
> checks. The reason is that we don't have explicit bounds checks, but
> instead use a signal handler. This works as follows: The WebAssembly memory
> is surrounded by guard pages which are marked as not readable and not
> writable. The WebAssembly memory and the guard pages together cover 8GB.
> which means that any 32 bit memory address + 32 bit offset will always
> either hit the WebAssembly memory or a guard page. WebAssembly code then
> executes memory accesses unconditionally. If the memory access is inside
> the WebAssembly memory, then everything works just fine. If it hits a guard
> page, then a segfault is triggered. The signal handler catches the
> segfault, creates a JavaScript exception, and then continues execution at a
> JavaScript exception handler. Naturally this only works on 64-bit
> architecture, because otherwise you cannot reserve 8GB of address space.
> Also, on arm64 we cannot do this optimization for writes. Assume a write
> that is partially out of memory, e.g. the first two bytes are in-bounds,
> but the second two bytes are out of bounds. On x64, a segfault is triggered
> before any of the four bytes are written, but on some arm64 CPUs, the two
> in-bounds bytes get written before the segfault gets triggered. Therefore,
> I assume you did your experiment on a MacBook or some other arm64 device,
> because on x64 I would assume that you cannot measure any performance
> improvements by skipping bounds checks.
>
> Cheers, Andreas
>
> On Sat, Mar 30, 2024 at 11:15 PM Wolfgang <[email protected]> wrote:
>
>> Hi Andreas,
>> thanks a lot for the quick response!
>> This is precisely what I was looking for.
>>
>> We indeed know, that memory accesses are always in bounds, so we can
>> safely use the flag `--no-wasm-bounds-checks`. This speeds up one of our
>> benchmarks by ~5% thanks! (for the others, the difference is not really
>> measurable)
>>
>> Here is the Coq definition for instantiation [0] which we proved, this
>> includes that the module is well-typed [1], so omitting the function
>> validation check you suggested (ValidateFunctionBody) should be safe.
>> However, doing that didn't affect the performance in a measurable way: It
>> seems, function validation is quite cheap.
>> (I compared the current master of Node.js (v8 version: 11.9.169.7) with
>> and without the function validation check, by commenting out lines 87-101,
>> 103 [2].
>>  Our modules have <150 functions with around 10-1000 instructions and a
>> main function with typically >50k instructions.)
>>
>> We also know that all programs terminate, but I guess we can still
>> overflow the call stack, e.g. when the source program is big enough (and
>> non-tail-recursive).
>>
>> Best,
>> Wolfgang
>>
>> [0]:
>> https://github.com/WasmCert/WasmCert-Coq/blob/1fda06ea0d52caae1ece83af109b48a9838b3869/theories/instantiation_spec.v#L530
>> [1]:
>> https://github.com/WasmCert/WasmCert-Coq/blob/1fda06ea0d52caae1ece83af109b48a9838b3869/theories/instantiation_spec.v#L389
>> [2]:
>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/wasm/function-compiler.cc;l=87;drc=d176526630b2049486a8be93fd89482d0aba6cfc;bpv=1;bpt=1
>>
>> On Monday, March 25, 2024 at 11:16:00 PM UTC+1 [email protected] wrote:
>>
>>> Hi Wolfgang,
>>>
>>> The command "nodejs --help --v8-options" seems to print all V8 command
>>> line options. It's not clear what runtime checks you would like to adjust.
>>> Here are some flags I can think of:
>>>
>>> You could use --wasm-lazy-validation to avoid function validation during
>>> module compilation and module instantiation, but function validation would
>>> still happen lazily when a function gets executed for the first time. If
>>> you want to prevent even the lazy function validation, then you have to
>>> adjust the code at [1].
>>>
>>> You can use `--no-wasm-stack-checks` if you can guarantee that there
>>> will not be a stack overflow, e.g. because of an unbounded recursion.
>>>
>>> You can use `--no-wasm-bounds-checks` if you can guarantee that memory
>>> accesses are always in-bounds.
>>>
>>> Cheers, Andreas
>>>
>>> [1]
>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/wasm/function-compiler.cc;l=97;drc=d176526630b2049486a8be93fd89482d0aba6cfc
>>>
>>> On Sun, Mar 24, 2024 at 7:41 PM Wolfgang <[email protected]> wrote:
>>>
>>>> Hello,
>>>> I'm a masters student at Aarhus university.
>>>>
>>>> We built a provably correct compiler (using Coq) targeting Wasm: Link
>>>> <https://womeier.de/files/certicoqwasm-coqpl24-abstract.pdf>
>>>> This includes a proof that the modules our compiler generates
>>>> instantiate
>>>> according to the spec, i.e. they are well-typed.
>>>>
>>>> We are evaluating the performance with Node.js, is there a way to
>>>> improve our performance
>>>> by e.g. disabling some runtime-checks, given that we have the stronger
>>>> correctness guarantees?
>>>>
>>>> Node.js doesn't seem to have (publicly documented) flags to allow that.
>>>>
>>>> I included some of our numbers below.
>>>>
>>>> Best,
>>>> Wolfgang
>>>>
>>>> All times in ms, avarage of 10 runs:
>>>> startup=load binary+instantiate
>>>> main=main function
>>>> pp=pretty printing the result as an S-expression, we call an imported
>>>> function
>>>>       for every character, thus somewhat slow, but not that relevant
>>>> for my question
>>>>
>>>> For a description of the benchmarks, see: Chapter 8.2,
>>>> https://zoep.github.io/thesis_final.pdf
>>>>
>>>> Node.js: v18.19.0:
>>>>    demo1-opt_coalesce-locals : startup:     6,  main:   0,   pp: 28,
>>>> sum:   34
>>>>    demo2-opt_coalesce-locals : startup:     3,  main:   0,   pp:  8,
>>>> sum:   11
>>>> list_sum-opt_coalesce-locals : startup:     3,   main:   0,   pp:  2,
>>>> sum:    5
>>>>  vs_easy-opt_coalesce-locals : startup:    10,  main:  33,  pp:  1,
>>>> sum:   44
>>>>  vs_hard-opt_coalesce-locals : startup:    10,  main: 101, pp:  1,
>>>> sum:  112
>>>>     binom-opt_coalesce-locals : startup:    18,  main:  10,  pp: 24,
>>>> sum:   52
>>>> sha_fast-opt_coalesce-locals : startup:    65,  main:  70,  pp:  7,
>>>> sum:  142
>>>>        color-opt_coalesce-locals : startup:  132, main:  44,  pp:  2,
>>>> sum:  178
>>>>
>>>> Node.js: v20.11.1
>>>>    demo1-opt_coalesce-locals : startup:    1, main:     3, pp:  24,
>>>> sum:   28
>>>>    demo2-opt_coalesce-locals : startup:    2, main:     0, pp:  12,
>>>> sum:   14
>>>> list_sum-opt_coalesce-locals : startup:    2,  main:     0, pp:   2,
>>>> sum:    4
>>>>  vs_easy-opt_coalesce-locals : startup:    4, main:    38, pp:   4,
>>>> sum:   46
>>>>  vs_hard-opt_coalesce-locals : startup:    3,  main:  110, pp:   3,
>>>> sum:  116
>>>>     binom-opt_coalesce-locals : startup:     3, main:    26, pp: 23,
>>>> sum:   52
>>>> sha_fast-opt_coalesce-locals : startup:    4,  main:  228, pp: 10,
>>>> sum:  242
>>>>        color-opt_coalesce-locals : startup:   12, main: 332, pp:  2,
>>>> sum:  346
>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "v8-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/v8-dev/e7e5b6c8-0f7b-4787-b66f-592f02873950n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/v8-dev/e7e5b6c8-0f7b-4787-b66f-592f02873950n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>>
>>> Andreas Haas
>>>
>>> Software Engineer
>>>
>>> [email protected]
>>>
>>>
>>> Google Germany GmbH
>>>
>>> Erika-Mann-Straße 33
>>>
>>> 80636 München
>>>
>>>
>>> Geschäftsführer: Paul Manicle, Liana Sebastian
>>>
>>> Registergericht und -nummer: Hamburg, HRB 86891
>>>
>>> Sitz der Gesellschaft: Hamburg
>>>
>>>
>>> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten
>>> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
>>> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
>>> dass die E-Mail an die falsche Person gesendet wurde.
>>>
>>>
>>>
>>> This e-mail is confidential. If you received this communication by
>>> mistake, please don't forward it to anyone else, please erase all copies
>>> and attachments, and please let me know that it has gone to the wrong
>>> person.
>>>
>>> --
>> --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/v8-dev/5aa215cc-ed00-4577-bc54-d322161d44a3n%40googlegroups.com
>> <https://groups.google.com/d/msgid/v8-dev/5aa215cc-ed00-4577-bc54-d322161d44a3n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> --
>
> Andreas Haas
>
> Software Engineer
>
> [email protected]
>
>
> Google Germany GmbH
>
> Erika-Mann-Straße 33
>
> 80636 München
>
>
> Geschäftsführer: Paul Manicle, Liana Sebastian
>
> Registergericht und -nummer: Hamburg, HRB 86891
>
> Sitz der Gesellschaft: Hamburg
>
>
> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten
> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
> dass die E-Mail an die falsche Person gesendet wurde.
>
>
>
> This e-mail is confidential. If you received this communication by
> mistake, please don't forward it to anyone else, please erase all copies
> and attachments, and please let me know that it has gone to the wrong
> person.
>
> --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/v8-dev/CAELSTvfyxsUiF7RZfw3tAY%2B3f7veg3PVqWUUEjJd8KxxOBM5vA%40mail.gmail.com
> <https://groups.google.com/d/msgid/v8-dev/CAELSTvfyxsUiF7RZfw3tAY%2B3f7veg3PVqWUUEjJd8KxxOBM5vA%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CAGU0z--x6pVB02j2hQV5ADq070tN3XaOKwkEpB5tSsSZhnQ9LA%40mail.gmail.com.

Re: [v8-dev] performance considerations for well-typed Wasm binaries

Reply via email to