Hi Leszek,

Apologies for the delayed reply - I've been a bit swamped at work the past
couple of days. Thank you for the excellent details; we'll align our plans
accordingly. Some replies inline.

I've replied privately to Jacob's concern as I don't want to derail this
conversation.

On Tue, Jul 20, 2021 at 3:19 AM Leszek Swirski <lesz...@chromium.org> wrote:

> Hi Vitali,
>
> Stabilising the cached data format as-is is pretty challenging; the cache
> as written is pretty much a direct field-by-field serialisation of the
> internal data structures, so freezing the cache would mean freezing the
> shapes of those internal objects, effectively making the internal fields an
> API-level guarantee. Furthermore, it's a backdoor to a stable bytecode
> format, which is something we've also pushed back on as it severely limits
> our ability to work on the interpreter; if we wanted to have a slightly
> weaker constraint of at least guaranteeing backwards compatibility with old
> bytecode, we'd have to vastly expand our test suite with old bytecodes in
> order to try to maintain this backwards compatibility, and even then I'm
> not sure we could fully guarantee it if there's some edge case not covered in
> the test suite. Same story with porting code caches from older to newer
> versions; such a port would require a mapping from old to new, which would
> require a) some sort of log of what old fields/bytecodes translate to what
> new ones, and b) heavy testing to make sure that this mapping is valid.
> This is a big security problem; the deserialisation is pretty dumb (for
> performance reasons), and just spits out data onto the V8 heap without e.g.
> checking if the number of fields match. Having bugs in the old->new
> mapping, or in the backwards compatibility, would open up a whole pandora's
> box of security issues, where one deleted field in an edge case that tests
> don't cover would become an out-of-bounds write widget.
>
> Given that this would greatly increase our development complexity
> (maintaining a stable API is already a lot of trouble for us), would be a
> big source of security issues, and I don't expect it to provide much
> benefit for Chrome (since we expect websites to change more often than
> Chrome versions), I don't see us either working on (or accepting patches
> for) a stable or even upgradeable cache.
>
> I'd be curious to know if you've actually observed/measured script parse
> time being a big problem, or whether you're more seeing issues due to lazy
> function compilation time. We've done a lot of work on parse time in recent
> years, so it's not as slow as (some) people assume.
>
What's the best way to measure script parse time vs lazy function
compilation time? It's been a few months since I last looked at this, so my
memory is hazy on whether I was timing the construction of
v8::ScriptCompiler::Source, the call to
v8::ScriptCompiler::CompileUnboundScript, or the combined time of both
(although I suspect both count as script parse time?). I do recall that on
my laptop, using the code cache basically halved whatever I was measuring
for larger scripts, and I suspect I was looking at the overall time to
instantiate an isolate with a script (the cache made no difference on
smaller scripts, so I suspect we're talking about script parse time).
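
To make that concrete, here's a minimal sketch of roughly what I believe I
was timing (reconstructed from memory, so the helper name and setup are mine
rather than our actual code):

#include <chrono>
#include "v8.h"

// Times the top-level parse/compile of a script, optionally consuming a code
// cache. Note this lumps parsing, top-level compilation and (when a cache is
// passed) cache deserialization into one number; it does NOT capture lazy
// compilation of inner functions, which only happens when they're first
// called.
v8::MaybeLocal<v8::UnboundScript> CompileTimed(
    v8::Isolate* isolate, v8::Local<v8::String> code,
    v8::ScriptCompiler::CachedData* cache /* may be nullptr */,
    double* out_ms) {
  auto t0 = std::chrono::steady_clock::now();
  // Source takes ownership of the CachedData when one is provided.
  v8::ScriptCompiler::Source source(code, cache);
  auto options = cache ? v8::ScriptCompiler::kConsumeCodeCache
                       : v8::ScriptCompiler::kNoCompileOptions;
  v8::MaybeLocal<v8::UnboundScript> script =
      v8::ScriptCompiler::CompileUnboundScript(isolate, &source, options);
  auto t1 = std::chrono::steady_clock::now();
  *out_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
  return script;
}

If that's the wrong boundary to be measuring, I'd appreciate a pointer to a
better one.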

FWIW, when I profiled a stress test of isolate construction on my machine
with a release build, I saw V8 spending a lot of time deserializing the
snapshot (seemingly once for the isolate & then again for the context).
Breakdown of the flamegraph:
* ~22% of total runtime was spent in NewContextFromSnapshot. Within that,
~5% of total runtime was just decompressing the snapshot and the remaining
~17% was deserializing it. I thought there was only one snapshot - couldn't
the decompression happen once in V8System instead?
* ~9% of total runtime was spent decompressing the snapshot for the isolate
(so ~14% of total runtime overall went to decompressing the snapshot).
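
For reference, the stress test was roughly of this shape (heavily
simplified; platform setup and error handling elided, and the helper name is
just for illustration), so those percentages are essentially the cost of
Isolate::New plus Context::New in a loop:

#include "v8.h"

// Repeatedly constructs an isolate + context and tears them down again.
// Isolate::New deserializes the isolate snapshot, and Context::New
// deserializes the context snapshot (which is where NewContextFromSnapshot
// showed up in the flamegraph).
void StressIsolateConstruction(int iterations,
                               v8::ArrayBuffer::Allocator* allocator) {
  for (int i = 0; i < iterations; ++i) {
    v8::Isolate::CreateParams params;
    params.array_buffer_allocator = allocator;
    v8::Isolate* isolate = v8::Isolate::New(params);
    {
      v8::Isolate::Scope isolate_scope(isolate);
      v8::HandleScope handle_scope(isolate);
      v8::Local<v8::Context> context = v8::Context::New(isolate);
      (void)context;  // We never actually run a script in this test.
    }
    isolate->Dispose();
  }
}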

In our use-case we construct a lot of isolates in the same process. I'm
curious whether there are opportunities to extend V8 to use copy-on-write to
reduce the memory & CPU impact of deserializing the snapshot multiple times.
Is my guess correct that deserialization is actually doing non-trivial work
like relocating objects, or do you think a zero-copy approach could be taken
with serializing/deserializing the snapshot so that it's prebuilt in the
right format (perhaps even without any compression)?

With respect to compression, do you think the snapshot could instead be
provided when V8System is constructed, so that all isolates deserialize out
of the same decompressed copy?
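
To make that concrete, this is roughly what I'm imagining in raw V8 terms
(V8System is just our process-wide wrapper). I'm assuming a build with an
external snapshot blob that has already been decompressed up front; if the
decompression actually happens deeper inside the deserializer regardless of
how the blob is provided, then this buys nothing, which is really what I'm
asking:

#include "v8.h"

// Done once per process, e.g. where we construct V8System today. The blob
// is the raw snapshot bytes, decompressed (or never compressed) up front.
// V8 keeps a pointer to it, so it must outlive all isolates.
void ProvideSnapshotOnce(const char* blob_bytes, int blob_size) {
  static v8::StartupData snapshot;
  snapshot.data = blob_bytes;
  snapshot.raw_size = blob_size;
  v8::V8::SetSnapshotDataBlob(&snapshot);
}

// Every isolate then deserializes out of that same shared, read-only blob.
// (Alternatively the same StartupData could be passed per isolate via
// v8::Isolate::CreateParams::snapshot_blob.)
v8::Isolate* MakeIsolate(v8::ArrayBuffer::Allocator* allocator) {
  v8::Isolate::CreateParams params;
  params.array_buffer_allocator = allocator;
  return v8::Isolate::New(params);
}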

Apologies if these questions are nonsensical - I'm still learning how the
internals of V8 fit together.


> We're also prototyping a potential stable & standardisable snapshot format
> for the results of partial script execution, which could help you if you're
> seeing large script "setup" code being an issue, but it wouldn't store
> compiled bytecode (for the above reasons).
>
> I appreciate that this might be a disappointing answer for you, but having
> flexibility with internal objects and bytecode is one of the things that
> allows us to stay performant and secure.
>
I fully understand. I'm definitely interested in the snapshot format, since
presumably anything that helps the web here will also help us. Is there a
paper I can read to learn more about the proposal? I've seen a few in the
wild from the broader JS community, but nothing about V8's plans here. I
have no idea whether it will help our workload, but it's certainly something
we're open to exploring.
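
In case it helps frame what we'd want out of it: the closest embedder-level
mechanism I'm aware of today is v8::SnapshotCreator, i.e. run the "setup"
portion of a script once and snapshot the resulting context, along the lines
of the sketch below. The blob it produces is of course version-locked just
like the code cache, which is exactly the limitation a stable format would
fix.

#include "v8.h"

// Runs a script's expensive "setup" code once and captures the resulting
// context as a custom startup snapshot. Assumes V8 and the platform are
// already initialized; error handling elided.
v8::StartupData SnapshotAfterSetup(const char* setup_js) {
  v8::SnapshotCreator creator;
  v8::Isolate* isolate = creator.GetIsolate();
  {
    v8::HandleScope handle_scope(isolate);
    v8::Local<v8::Context> context = v8::Context::New(isolate);
    v8::Context::Scope context_scope(context);

    v8::Local<v8::String> src =
        v8::String::NewFromUtf8(isolate, setup_js).ToLocalChecked();
    v8::Script::Compile(context, src).ToLocalChecked()
        ->Run(context).ToLocalChecked();

    creator.SetDefaultContext(context);
  }
  // kClear drops compiled code, so no bytecode ends up baked into the blob.
  return creator.CreateBlob(
      v8::SnapshotCreator::FunctionCodeHandling::kClear);
}

Presumably the proposal would standardize something at roughly that level,
but with a stable, engine-independent encoding?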

Thanks,
Vitali

> - Leszek
>
> On Monday, July 19, 2021 at 9:00:52 PM UTC+2 lewis....@gmail.com wrote:
>
>> Hi Vitali,
>>
>> I’m neither from the v8 team, nor an expert in this subject matter. Just
>> wanted to drop an interesting project: Hermes - https://hermesengine.dev
>> , a javascript engine by Facebook that is tailored for fast startup times.
>> It does this by precompiling javascript into bytecode at build time.
>>
>> So something like this should be possible maybe.
>>
>> Best,
>> Joe
>>
>> On Mon, Jul 19, 2021 at 9:32 PM Vitali Lovich <vlo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I wanted to kick off a discussion and solicit some thoughts on whether
>>> it would be operationally feasible to try to stabilize the cached data
>>> format of the compiler.
>>>
>>> The context is that I work on Cloudflare Workers. We'd like to increase
>>> the script size we allow our customers to upload, but we have concerns
>>> about the performance impact that will have (specifically script parse
>>> time). One mitigation for this would be to leverage the script compiler's
>>> cached data & generate the cache whenever the user uploads a script. This
>>> way we can precompute the cached data on upload & deliver it alongside the
>>> script.
>>>
>>> Unfortunately, this approach has a major stumbling block which is that
>>> we track V8 releases as they're published. That means our V8 version
>>> changes roughly every week which would (at best) necessitate us
>>> regenerating the cache for all the scripts on a weekly basis. This adds
>>> scalability & implementation complexity concerns (especially since we may
>>> have multiple versions of V8 running at one time).
>>>
>>> I'm not looking to discuss implementation specific details, but more
>>> trying to get an overview of the opinions from the talented V8 team.
>>>
>>>    - I haven't actually examined yet what the structure of the code
>>>    cache actually looks like. Are there prohibitive technical blockers that
>>>    can't really be resolved that make this a non-starter?
>>>    - Are there meaningful maintenance/security/implementation concerns?
>>>    I'm assuming there are very good reasons why the data is version locked.
>>>    - It's not necessarily a requirement to freeze it for all time
>>>    (although that would of course be ideal). What is the cadence for this
>>>    format actually changing (vs no-op version bumps for safety)? Would it be
>>>    possible to stabilize within a major V8 release (8->9, 9->10, etc) or
>>>    for 6 month periods?
>>>    - If stabilizing is truly impossible (as I suspect it probably is),
>>>    would it be technically feasible to implement a cheaper "upgrade" that
>>>    converts the previous code cache to the current one? It's not ideal, but
>>>    it could significantly reduce the costs needed to upgrade many scripts at
>>>    once
>>>
>>> I suspect that any improvement here would also apply to Chrome in the
>>> form of a more consistent performance experience after an upgrade.
>>>
>>> We do have a fallback plan that's workable within the current
>>> architecture, but it's got some downsides that would be neat to bypass by
>>> stabilizing the format. Appreciate any feedback/insights anyone can offer.
>>>
>>> Thanks,
>>> Vitali
>>>
