Erik: Shared GC is still only partially implemented and definitely not
shipped (or usable), so that document is surely unrelated to whatever is
going on here. All existing ways to share data between isolates (such as
the NativeModule cache) use other mechanisms.

Kenton: I can't rule out anything. We admittedly don't have much test
coverage for thousands-of-isolates scenarios. Perhaps the
--print-wasm-offheap-memory-size flag can help narrow it down a bit. It's
currently only hooked up to the memory measurement API, so you'll either
have to use that, or hack some more triggers into convenient places
(perhaps isolate shutdown or creation?); see occurrences of
v8_flags.print_wasm_offheap_memory_size for inspiration.
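
For reference, a rough (untested) sketch of what such a trigger could look
like from the embedder side, using the public memory measurement API. The
names are made up, and the exact MeasureMemoryDelegate::Result fields below
are what recent V8 headers expose, so double-check against your version:

// Untested sketch against the public API of recent V8 versions.
#include <cstdio>
#include <memory>
#include "v8.h"

class PrintingMeasureDelegate : public v8::MeasureMemoryDelegate {
 public:
  // Measure every context in the isolate.
  bool ShouldMeasure(v8::Local<v8::Context>) override { return true; }

  // Called once the measurement finishes; the wasm_* fields cover generated
  // Wasm code and Wasm metadata, which are shared across contexts.
  void MeasurementComplete(Result result) override {
    std::printf("unattributed: %zu, wasm code: %zu, wasm metadata: %zu\n",
                result.unattributed_size_in_bytes,
                result.wasm_code_size_in_bytes,
                result.wasm_metadata_size_in_bytes);
  }
};

// Hypothetical trigger; call it wherever you want a data point, e.g. at
// isolate creation or shortly before isolate shutdown.
void DumpWasmOffheapMemory(v8::Isolate* isolate) {
  isolate->MeasureMemory(std::make_unique<PrintingMeasureDelegate>(),
                         v8::MeasureMemoryExecution::kEager);
}

Note that the measurement completes asynchronously (after a GC), so the
isolate has to stay alive and keep pumping tasks until MeasurementComplete
has run.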

A few more ideas:
- from what you describe, perhaps it would be feasible to craft a
reproducer. It'd probably have to be a custom V8 embedder that, in a loop,
creates many fresh isolates and instantiates/runs the same (or several?)
demo Wasm module in them; there's a rough sketch of such a loop below this
list.
- it could make sense to verify (with printfs in their destructors) that
both Isolates and NativeModules get destroyed as expected. It's conceivable
that the memory growth you're observing is intentional caching (of
generated code, or something?) because the WasmEngine thinks that the
cached data is still needed/useful.
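
To make the reproducer idea concrete, here is roughly the skeleton I have
in mind (untested; the script body, the flag, the iteration count and the
inlined module bytes are placeholders you'd replace with your real setup):

// Untested reproducer skeleton: create a fresh isolate per iteration,
// compile + instantiate the same Wasm module via JS, then tear it down.
#include <cstdio>
#include <memory>
#include "libplatform/libplatform.h"
#include "v8.h"

int main(int argc, char* argv[]) {
  v8::V8::InitializeICUDefaultLocation(argv[0]);
  v8::V8::InitializeExternalStartupData(argv[0]);
  std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
  v8::V8::InitializePlatform(platform.get());
  v8::V8::SetFlagsFromString("--liftoff-only");  // match the reported setup
  v8::V8::Initialize();

  // Identical source in every isolate, so the NativeModule cache kicks in.
  // Inline real module bytes here; the empty array is only a placeholder.
  const char* script_src =
      "const bytes = new Uint8Array([/* ...wasm bytes... */]);"
      "const instance ="
      "    new WebAssembly.Instance(new WebAssembly.Module(bytes));"
      "if (instance.exports.main) instance.exports.main();";

  for (int i = 0; i < 10000; ++i) {
    v8::Isolate::CreateParams create_params;
    create_params.array_buffer_allocator =
        v8::ArrayBuffer::Allocator::NewDefaultAllocator();
    v8::Isolate* isolate = v8::Isolate::New(create_params);
    {
      v8::Isolate::Scope isolate_scope(isolate);
      v8::HandleScope handle_scope(isolate);
      v8::Local<v8::Context> context = v8::Context::New(isolate);
      v8::Context::Scope context_scope(context);
      v8::Local<v8::String> source =
          v8::String::NewFromUtf8(isolate, script_src).ToLocalChecked();
      v8::Local<v8::Script> script =
          v8::Script::Compile(context, source).ToLocalChecked();
      v8::MaybeLocal<v8::Value> result = script->Run(context);
      if (result.IsEmpty()) std::fprintf(stderr, "script threw\n");
      // Sample memory consumption here each iteration (heap statistics,
      // the measurement API, or OS-level RSS) and watch for a trend.
    }
    isolate->Dispose();
    delete create_params.array_buffer_allocator;
  }

  v8::V8::Dispose();
  v8::V8::DisposePlatform();
  return 0;
}

If the numbers only drift when the module bytes are identical across
iterations (i.e. when the NativeModule cache actually shares them), that
would point at the shared/cached side rather than at per-isolate state.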

How/where exactly are you seeing this increased "external memory"?
I.e. what reporting system are you using to get memory consumption numbers?
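
For example, if it's the per-isolate counter that V8 itself maintains (what
v8::HeapStatistics calls external memory), a quick, untested way to dump it
for comparison:

// Sketch: read the external-memory counter V8 tracks per isolate; it
// reflects, among other things, ArrayBuffer backing stores and anything
// the embedder registers via Isolate::AdjustAmountOfExternalAllocatedMemory.
#include <cstdio>
#include "v8.h"

void ReportExternalMemory(v8::Isolate* isolate) {
  v8::HeapStatistics stats;
  isolate->GetHeapStatistics(&stats);
  std::printf("used heap: %zu bytes, external: %zu bytes\n",
              stats.used_heap_size(), stats.external_memory());
}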


On Tue, Jan 14, 2025 at 1:09 AM Kenton Varda <[email protected]> wrote:

> To add context here:
>
> The problem appears to show up only after running in production for an
> hour or two. During that time we will have created thousands of isolates to
> handle millions of requests.
>
> But the problem seems to affect *new* isolates, even when those isolates
> are loaded with applications that had been loaded into previous isolates
> without problems. Startup of an application should be 100% deterministic
> since we disallow any I/O during startup, but we're seeing that after the
> host has been running a while, new isolates are showing much higher
> "external memory" on startup. (E.g. 400MB external memory, but we enforce a
> 128MB limit on the whole isolate.)
>
> We observed that the wasm native module cache causes identical wasm
> modules to be shared across isolates, and that wasm lazy compilation causes
> memory usage of a wasm module -- as accounted by all isolates that have
> loaded it -- to change.
>
> Could it be that there is a memory leak in lazy compilation, such that
> these shared cached modules are gradually growing over time, to the point
> where new isolates that try to load these modules are being hit with
> extremely high "external memory" numbers right off the bat?
>
> -Kenton
>
> On Mon, Jan 13, 2025 at 5:31 PM Erik Corry <[email protected]> wrote:
>
>> It looks like it's related to shared objects between isolates. Is there a
>> newer document than
>> https://docs.google.com/document/d/18lYuaEsDSudzl2TDu-nc-0sVXW7WTGAs14k64GEhnFg/edit?usp=drivesdk
>> that describes how this works today? In particular cross-isolate GCs?
>>
>> On Mon, 13 Jan 2025, 15:25 Jakob Kummerow, <[email protected]>
>> wrote:
>>
>>> Sounds like a bug, but without more details (or a repro) I don't have a
>>> more specific guess than that.
>>>
>>> If you're desperate, you could try to bisect it (even with a flaky
>>> repro). Or review the ~500 changes between those branches:
>>> https://chromium.googlesource.com/v8/v8/+log/branch-heads/13.1..branch-heads/13.2?n=10000
>>>
>>>
>>> On Mon, Jan 13, 2025 at 2:48 PM 'Dan Lapid' via v8-dev <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>> In V8 13.2 and 13.3 we see Wasm isolates' external memory usage blowing
>>>> up sometimes (up to gigabytes).
>>>> Under V8 13.1 the same code would never use more than 80-100MB.
>>>> The issue doesn't happen every time for the same Wasm bytecode. It
>>>> doesn't even reproduce locally.
>>>> But some significant percentage of the time it does happen.
>>>> This only started happening in 13.2; what are we missing? Should we
>>>> be enabling/disabling some flags?
>>>> It also seems that 13.3 is significantly worse in terms of error rate.
>>>> The problem happens under "--liftoff-only".
>>>> We use pointer compression but not the sandbox.
>>>> We've tried enabling --turboshaft-wasm in 13.1 and the problem did not
>>>> reproduce.
>>>> Would really appreciate your help!
