On Tue, Jan 14, 2025 at 2:00 PM Erik Corry <[email protected]> wrote:
> The external memory is the one the internal heap knows about:
>
> uint64_t Heap::external_memory() const { return external_memory_.total(); }
>
> The following code in wasm-engine.cc:1015 attributes external memory to
> the isolate, in the From() call on the second-to-last line.
>
> Is the native_module likely to be shared between isolates here, and long
> lived?
>
Yes, NativeModules are shared per process. They are primarily keyed on Wasm
wire bytes, so if multiple Isolates instantiate the same Wasm module,
they'll share a lot of memory (including generated code, and other
engine-internal metadata) via the NativeModule. NativeModules are freed
when no Wasm instance is keeping them alive any more.
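
As a rough illustration of that sharing scheme, here is a toy model. It is not V8's actual code: `ToyNativeModule`, `ToyModuleCache`, `GetOrCreate` and `LiveModules` are hypothetical names, and the real cache key contains more than the wire bytes. The cache only holds weak references, so a module is freed once the last user drops it:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for wasm::NativeModule; not V8's real class.
struct ToyNativeModule {
  std::vector<uint8_t> wire_bytes;
};

// Hypothetical process-wide cache, keyed only on the wire bytes.
class ToyModuleCache {
 public:
  // Returns the live module for these bytes, or creates a new one.
  std::shared_ptr<ToyNativeModule> GetOrCreate(
      const std::vector<uint8_t>& wire_bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(wire_bytes);
    if (it != cache_.end()) {
      if (auto existing = it->second.lock()) return existing;
    }
    auto created =
        std::make_shared<ToyNativeModule>(ToyNativeModule{wire_bytes});
    cache_[wire_bytes] = created;  // weak reference: does not keep it alive
    return created;
  }

  // Drops expired entries and returns how many modules are still alive.
  size_t LiveModules() {
    std::lock_guard<std::mutex> lock(mutex_);
    size_t live = 0;
    for (auto it = cache_.begin(); it != cache_.end();) {
      if (it->second.expired()) {
        it = cache_.erase(it);
      } else {
        ++live;
        ++it;
      }
    }
    return live;
  }

 private:
  std::mutex mutex_;
  std::map<std::vector<uint8_t>, std::weak_ptr<ToyNativeModule>> cache_;
};
```

In this model, two "isolates" calling `GetOrCreate` with identical bytes get the same pointer, and `LiveModules()` drops back to zero once both release it.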
> Could it be that it is gradually committing more code space, causing later
> isolates to get a higher external
> memory size?
>
Yes, absolutely: with lazy compilation, committed code space should at
first be near-zero, and will grow over time as functions are called (for
the first time, triggering lazy compilation) and eventually optimized (when
they're sufficiently hot). That should be quite deterministic, and
upper-bounded by a module-specific maximum (once everything is optimized
and all inlining budgets are exhausted).

Also, none of this has changed recently, as far as I'm aware. I don't know
how to explain the regression you're observing.
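
To make the accounting effect concrete, here is a minimal sketch under stated assumptions. `ToySharedModule` and `ToyIsolate` are hypothetical names, and the model assumes the estimate is sampled once, when an isolate attaches to the shared module, mirroring the `committed_code_space()` plus metadata sum in the quoted snippet. Whether V8 revises the number afterwards is not covered here:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical shared module; committed code grows with lazy compilation.
struct ToySharedModule {
  size_t committed_code_space = 0;
  size_t metadata_estimate = 0;

  // Models a function being compiled on first call (or later optimized).
  void CompileFunction(size_t code_bytes) {
    committed_code_space += code_bytes;
  }
};

// Hypothetical isolate that records an estimate when it attaches.
struct ToyIsolate {
  size_t external_memory = 0;
  std::shared_ptr<ToySharedModule> module;

  // Assumption: the estimate is taken once, at attach time.
  void Attach(std::shared_ptr<ToySharedModule> m) {
    external_memory += m->committed_code_space + m->metadata_estimate;
    module = std::move(m);
  }
};
```

An isolate that attaches before any function has been compiled is charged only the metadata estimate, while one that attaches after the module has warmed up is charged for all the code committed so far, even though both share the same module.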
> (does this backquoting work in email for fixed formatting? Probably not).
>
It does not. But this specific snippet is sufficiently readable either way.
> ```
> // Use the given shared {NativeModule}, but increase its reference count by
> // allocating a new {Managed<T>} that the {Script} references.
> size_t code_size_estimate = native_module->committed_code_space();
> size_t memory_estimate =
>     code_size_estimate +
>     wasm::WasmCodeManager::EstimateNativeModuleMetaDataSize(module);
> DirectHandle<Managed<wasm::NativeModule>> managed_native_module =
>     Managed<wasm::NativeModule>::From(isolate, memory_estimate,
>                                       std::move(native_module));
> ```
>
>
> On Tue, Jan 14, 2025 at 12:59 PM Jakob Kummerow <[email protected]>
> wrote:
>
>> Erik: Shared GC is still only partially implemented and definitely not
>> shipped (or usable), so that document is surely unrelated to whatever is
>> going on here. All existing ways to share data between isolates (such as
>> the NativeModule cache) use other mechanisms.
>>
>> Kenton: I can't rule out anything. We admittedly don't have much test
>> coverage for thousands-of-isolates scenarios. Perhaps the
>> --trace-wasm-offheap-memory flag can help narrow it down a bit. It's
>> currently only hooked up with the memory measurement API, so you'll either
>> have to use that, or hack some more triggers into convenient places
>> (perhaps isolate shutdown or creation?), see occurrences of
>> v8_flags.print_wasm_offheap_memory_size for inspiration.
>>
>> A few more ideas:
>> - from what you describe, perhaps it would be feasible to craft a
>> reproducer. It'd probably have to be a custom V8 embedder that, in a loop,
>> creates many fresh isolates and instantiates/runs the same (or several?)
>> demo Wasm module in them.
>> - it could make sense to verify (with printfs in their destructors) that
>> both Isolates and NativeModules get destroyed as expected. It's
>> conceivable that the memory growth you're observing is intentional caching
>> (of generated code, or something?) because the WasmEngine thinks that
>> the cached data is still needed/useful.
>>
>> How/where exactly are you seeing this increased "external memory"?
>> I.e. what reporting system are you using to get memory consumption numbers?
>>
>>
>> On Tue, Jan 14, 2025 at 1:09 AM Kenton Varda <[email protected]>
>> wrote:
>>
>>> To add context here:
>>>
>>> The problem appears to show up only after running in production for an
>>> hour or two. During that time we will have created thousands of isolates to
>>> handle millions of requests.
>>>
>>> But the problem seems to affect *new* isolates, even when those isolates
>>> are loaded with applications that had been loaded into previous isolates
>>> without problems. Startup of an application should be 100% deterministic
>>> since we disallow any I/O during startup, but we're seeing that after the
>>> host has been running a while, new isolates are showing much higher
>>> "external memory" on startup. (E.g. 400MB external memory, but we enforce a
>>> 128MB limit on the whole isolate.)
>>>
>>> We observed that the wasm native module cache causes identical wasm
>>> modules to be shared across isolates, and that wasm lazy compilation causes
>>> memory usage of a wasm module -- as accounted by all isolates that have
>>> loaded it -- to change.
>>>
>>> Could it be that there is a memory leak in lazy compilation, such that
>>> these shared cached modules are gradually growing over time, to the point
>>> where new isolates that try to load these modules are being hit with
>>> extremely high "external memory" numbers right off the bat?
>>>
>>> -Kenton
>>>
>>> On Mon, Jan 13, 2025 at 5:31 PM Erik Corry <[email protected]>
>>> wrote:
>>>
>>>> It looks like it's related to shared objects between isolates. Is there
>>>> a newer document than
>>>> https://docs.google.com/document/d/18lYuaEsDSudzl2TDu-nc-0sVXW7WTGAs14k64GEhnFg/edit?usp=drivesdk
>>>> that describes how this works today? In particular cross-isolate GCs?
>>>>
>>>> On Mon, 13 Jan 2025, 15:25 Jakob Kummerow, <[email protected]>
>>>> wrote:
>>>>
>>>>> Sounds like a bug, but without more details (or a repro) I don't have
>>>>> a more specific guess than that.
>>>>>
>>>>> If you're desperate, you could try to bisect it (even with a flaky
>>>>> repro). Or review the ~500 changes between those branches:
>>>>> https://chromium.googlesource.com/v8/v8/+log/branch-heads/13.1..branch-heads/13.2?n=10000
>>>>>
>>>>>
>>>>> On Mon, Jan 13, 2025 at 2:48 PM 'Dan Lapid' via v8-dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> In V8 13.2 and 13.3 we sometimes see the external memory usage of
>>>>>> wasm isolates blow up (up to gigabytes).
>>>>>> Under V8 13.1 the same code would never use more than 80-100MB.
>>>>>> The issue doesn't happen every time for the same wasm bytecode. It
>>>>>> doesn't even reproduce locally.
>>>>>> But some significant percentage of the time it does happen.
>>>>>> This only started happening in 13.2. What are we missing? Should
>>>>>> we be enabling/disabling some flags?
>>>>>> It also seems that 13.3 is significantly worse in terms of error rate.
>>>>>> The problem happens under "--liftoff-only".
>>>>>> We use pointer compression but not sandbox.
>>>>>> We've tried enabling --turboshaft-wasm in 13.1 and the problem did
>>>>>> not reproduce.
>>>>>> Has anything changed that we need to adapt to?
>>>>>> Would really appreciate your help!
>>>>>>