Re: [v8-dev] Utility to check if a given stream can parse as Javascript (ORB)

Marja Hölttä Wed, 01 Sep 2021 08:46:28 -0700

Wait, no, we do handle running out of stack in a robust way, and the "does
this parse" check should just return false then (even though the code might
be valid JS). Please ignore that part of my comment :)
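
For illustration only, here is a minimal sketch of what such a "does this
parse" check could look like from plain JavaScript. The helper names are
hypothetical, and `new Function` is an assumption standing in for a real
parser entry point: it parses the text as a function body rather than as a
script, so it is slightly more permissive (e.g., a top-level `return` is
accepted). A sufficiently deep array literal is expected to exhaust the
parser's stack and be reported as "does not parse" instead of crashing.

```javascript
// Hypothetical helper: true if `source` parses as a function body.
function doesThisParse(source) {
  try {
    new Function(source); // parses and compiles the body, but never runs it
    return true;
  } catch (err) {
    // SyntaxError for invalid code; a stack-exhaustion error for input that
    // is nested too deeply for the recursive descent parser.
    return false;
  }
}

// Builds `let a = [[[...[0]...]]];` with `depth` levels of nesting.
function nestedArrayLiteral(depth) {
  return 'let a = ' + '['.repeat(depth) + '0' + ']'.repeat(depth) + ';';
}

console.log(doesThisParse(nestedArrayLiteral(10)));     // shallow nesting
console.log(doesThisParse(nestedArrayLiteral(500000))); // very deep nesting
```

The exact depth at which the second call starts failing depends on the
stack size of the embedding process, so 500000 is just a value chosen to be
far beyond any default limit.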


On Wed, 1 Sep 2021, 16:38 Marja Hölttä, <ma...@chromium.org> wrote:

> A random side note: it's also possible to make V8's recursive descent
> parser run out of stack using valid JS, e.g., let a = [[[[[..[ 0 ]]]]]..]
> or other similar constructs (deep enough). Meaning you probably don't want
> to call into the parser in a process where you don't want this to happen.
>
> Re: encodings, when I worked on script streaming I noticed it's pretty
> common that scripts advertised as UTF-8 are not valid UTF-8 (e.g., have
> invalid chars inside comments), and Chrome is currently pretty lenient
> about those.
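
A small sketch of the difference between strict and lenient UTF-8 decoding,
using the standard `TextDecoder` API. The helper names and the sample bytes
are made up for this example; this is not how Chrome's script streaming is
implemented. The sample is a Latin-1 encoded script whose only non-ASCII
byte sits inside a comment.

```javascript
// Strict check: a fatal decoder throws on the first invalid byte sequence.
function isStrictlyValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Lenient decode: invalid sequences become U+FFFD instead of failing.
function decodeUtf8Leniently(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

// "// café\nvar x = 1;" encoded as Latin-1: the byte 0xE9 is not valid UTF-8 here.
const latin1Script = Uint8Array.from('// caf\u00e9\nvar x = 1;', (c) => c.charCodeAt(0));

console.log(isStrictlyValidUtf8(latin1Script));
console.log(JSON.stringify(decodeUtf8Leniently(latin1Script)));
```

With lenient decoding the stray byte only damages the comment, so the
script itself still parses; a strict check would reject the whole response.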
>
>
> On Wed, Aug 18, 2021 at 3:18 PM Toon Verwaest <verwa...@chromium.org>
> wrote:
>
>>
>>
>> On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <
>> v8-dev@googlegroups.com> wrote:
>>
>>>
>>>
>>> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <verwa...@chromium.org>
>>> wrote:
>>>
>>>> Thinking out loud: One idea could be to have a separate sandboxed
>>>> compiler process in which we compile incoming JS code. That could reject
>>>> the source if it doesn't compile; or compile it to a script that just
>>>> throws with no additional info about the actual source.
>>>>
>>>> That process could implement streaming compilation; so we don't block
>>>> streaming until later, we don't double parse, we still have a sandbox (not
>>>> in the network process). There might even be benefits for caching as a
>>>> compromised renderer cannot look at the compilation artefacts until it
>>>> receives them.
>>>>
>>>> If we fully compile and create a code cache from the compilation result
>>>> we don't need a new API on the V8 side, but do additional
>>>> serialization/deserialization work. That should be faster than reparsing
>>>> though. The upper limit of the cost would essentially be the cost of
>>>> serializing / deserializing a code cache for each script.
>>>>
>>>
>>> This seems like an interesting idea.  I wonder if compilation (no
>>> evaluation / running of scripts) would be considered safe enough to handle
>>> in a single (not origin/site-bound/locked) process.
>>>
>>
>> The parser/compiler aren't tiny, so it's not unlikely there's a bug. Such
>> bugs are certainly much harder for an attacker to control than full-blown
>> JS OOB access, though. I could imagine a security bug replacing scripts in
>> another site (assuming it's sandboxed so well that it can't do much else),
>> which would be terrible; and it's unclear to me how easy that would be.
>>
>>
>>>
>>> One thing that I don't fully understand (for both full-JS-parsing and
>>> partial/hackish-non-JS-detection approaches) is if the encoding (e.g. UTF8
>>> vs UTF16-LE vs Win-1250) has to be known and communicated upfront to the
>>> parser/sniffer?  Or maybe the input to the decoder needs to be already in
>>> UTF8?  Or maybe something in //net or //network layers can already handle
>>> this aspect of the problem (e.g. ensuring UTF8 in URLLoader::DidRead)?
>>>
>>
>> There's some encoding guessing happening before we stream-compile (
>> https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1)
>> and some afterwards; and if we initially compiled with the wrong encoding,
>> we discard and redo, iirc. Compilation presumably fails anyway if the
>> encoding was wrong, but that probably doesn't happen too often either.
>>
>>
>>>
>>> Also - when trying to explore the partial/hackish-non-JS-detection idea,
>>> I wondered if the very first character in a script may only come from a
>>> relatively limited set of characters?  Let's assume that the sniffer can
>>> skip whitespace (space, tab, CR, LF, LS, PS) and html/xml comments (e.g.
>>> <!-- ... -->) - AFAICT the very next character has to be either:
>>>
>>>    - The start of a reserved keyword like "if", "let", etc. (all
>>>    lowercase ASCII)
>>>    - The start of an identifier (any Unicode code point with the
>>>    Unicode property “ID_Start”)
>>>    - The start of a unary expression: + - ~ !
>>>    - The start of a string literal, string template, or a regexp
>>>    literal (or non-HTML comment): " ' ` /
>>>    - The start of a numeric literal: 0-9
>>>    - An opening paren, bracket or brace: ( [ {
>>>    - Not quite sure if a dot or an equal sign can appear as the very
>>>    first character: . =
>>>
>>> This would reject PDFs (starts with %) and HTML/XML (starts with <), but
>>> still would accept ZIP files (first character is a 0x50 - capital P) and
>>> MSOffice files (first character is a 0xD0 which according to Unicode has
>>> ID_Start property set to true).  Rejecting ZIP and MSOffice files would
>>> require going beyond the first character - maybe rejecting control
>>> characters like 0x11 or 0x03 outside of comments (not sure if at this point
>>> the sniffer's heuristics are starting to get too complex).
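
The first-character heuristic described above can be sketched roughly as
follows. All names are hypothetical, and the sketch makes two choices the
list leaves open: a leading dot is accepted (a numeric literal may be
written as .5) and a leading equal sign is rejected. It also accepts $ and
_ as identifier starts and is not meant to be exhaustive.

```javascript
const SNIFF_WHITESPACE = new Set([' ', '\t', '\r', '\n', '\u2028', '\u2029']);
// Unary operators, string/template/regexp/comment starts, brackets, and dot.
const SNIFF_PUNCTUATION = new Set([...'+-~!"\'`/([{.']);

// Returns the index of the first character that is neither whitespace nor
// part of an HTML/XML comment of the form <!-- ... -->.
function skipWhitespaceAndHtmlComments(text) {
  let i = 0;
  for (;;) {
    while (i < text.length && SNIFF_WHITESPACE.has(text[i])) i++;
    if (!text.startsWith('<!--', i)) return i;
    const end = text.indexOf('-->', i + 4);
    if (end === -1) return text.length; // unterminated comment: nothing left
    i = end + 3;
  }
}

function mightStartJavaScript(text) {
  const i = skipWhitespaceAndHtmlComments(text);
  if (i >= text.length) return true; // an empty script is valid
  const ch = String.fromCodePoint(text.codePointAt(i));
  if (SNIFF_PUNCTUATION.has(ch)) return true;
  if (ch >= '0' && ch <= '9') return true;   // numeric literal
  if (ch === '$' || ch === '_') return true; // identifier start
  return /^\p{ID_Start}$/u.test(ch);         // keywords and identifiers
}

console.log(mightStartJavaScript('%PDF-1.7'));        // PDF
console.log(mightStartJavaScript('<html>'));          // HTML
console.log(mightStartJavaScript('PK\u0003\u0004'));  // ZIP
console.log(mightStartJavaScript('\u00D0\u00CF'));    // MSOffice
```

As noted above, the PDF and HTML prefixes are rejected, while the ZIP and
MSOffice prefixes still pass because P and U+00D0 both have ID_Start.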
>>>
>>
>> That was my initial thought too for e.g., PDF. You'd be blacklisting
>> files you don't want to leak vs whitelisting JS though, which isn't
>> entirely ideal security-wise. It might be better than the alternatives
>> though, if we would otherwise either slow down the web (repeated parsing,
>> interference with streaming) or potentially introduce new security issues
>> through a shared compiler process.
>>
>>
>>>
>>>
>>>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <
>>>> v8-dev@googlegroups.com> wrote:
>>>>
>>>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <luka...@google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <
>>>>>> jkumme...@chromium.org> wrote:
>>>>>>
>>>>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits of
>>>>>>>> ORB may be realized without full-fidelity JS sniffing/parsing.
>>>>>>>>
>>>>>>>>
>>>>>>> You may call it a security benefit to block "obvious" parser
>>>>>>> breakers like )]}', but in general, any "when in doubt, don't block
>>>>>>> it" strategy won't be much of an obstacle to intentional attacks. For
>>>>>>> instance, once Mr. Bad Guy has learned that the sniffer only looks at
>>>>>>> the first 1024 characters, they can send a response whose first 1024
>>>>>>> characters lead to a "well, it *might* be valid JS" judgement (such as
>>>>>>> a JS comment, or long string, or whatever). OTOH any "when in doubt,
>>>>>>> block it" strategy runs the risk of breaking existing websites in
>>>>>>> those doubtful cases.
>>>>>>>
>>>>>>
>>>>>> In the CORB threat model, the attacker does *not* control the responses -
>>>>>> CORB tries to prevent https://attacker.com (with either Spectre or a
>>>>>> compromised renderer) from being able to read no-cors responses from
>>>>>> https://victim.com.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>  (Although the JSON object syntax is exactly Javascript's
>>>>>>>> object-initializer syntax, a Javascript object-initializer expression
>>>>>>>> is not valid as a standalone Javascript statement.)
>>>>>>>
>>>>>>>
>>>>>>> There is (at least) one subtlety here: JS is more permissive than
>>>>>>> the official JSON spec. The latter requires quotes around property
>>>>>>> names, the former doesn't. I.e. {"foo": is indeed never valid JS, but
>>>>>>> {foo: is (the brace opens a code block, and foo is a label). Also, the
>>>>>>> colon is essential for rejecting the former snippet, because {"foo";
>>>>>>> is valid JS (code block plus ignored string à la "use strict";), so
>>>>>>> this is a concrete example where the 1024-char prefix issue is relevant.
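
The three cases above can be checked directly. This is only a sketch: the
helper name is hypothetical, and `new Function` is used as a convenient
stand-in for a script parser (it parses the text as a function body, which
behaves the same way for these snippets). The prefixes from the text are
completed into whole snippets so that they can be parsed on their own.

```javascript
// Hypothetical helper: true if `source` parses when used as a function body.
function parsesAsStatementList(source) {
  try {
    new Function(source); // parse only; the code is never executed
    return true;
  } catch (err) {
    return false;
  }
}

// Quoted property name followed by a colon: a block containing the
// expression statement "foo", then an unexpected ':'.
console.log(parsesAsStatementList('{"foo": 1}'));  // false

// Unquoted name: a block containing the labelled statement `foo: 1`.
console.log(parsesAsStatementList('{foo: 1}'));    // true

// Quoted string followed by a semicolon: a block with an ignored string.
console.log(parsesAsStatementList('{"foo"; }'));   // true
```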
>>>>>>>
>>>>>>>
>>>>>>>> When the sniffer sees:
>>>>>>>>      [ 123, 456, "long string taking X bytes",
>>>>>>>> then it should block the response when the Content-Type is a JSON
>>>>>>>> MIME type
>>>>>>>
>>>>>>>
>>>>>>> I don't follow. When the Content-Type is JSON, and the actual
>>>>>>> contents are valid JSON, why should that be blocked?
>>>>>>>
>>>>>>
>>>>>> Correct.  There is no way to read cross-origin JSON via a "no-cors"
>>>>>> fetch.  The only way to read cross-origin JSON is via CORS-mediated fetch
>>>>>> (where the victim has to opt-in by responding with
>>>>>> "Access-Control-Allow-Origin: ...").
>>>>>>
>>>>>
>>>>> Maybe another way to look at it is:
>>>>>
>>>>>    - Only Javascript (and images/audio/video/stylesheets) can be sent
>>>>>    in no-cors mode (i.e. without CORS).  Non-Javascript (and
>>>>>    non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>>>>    - If the response sniffs as JSON (Content-Type=JSON and
>>>>>    First1024bytes=JSON) then it is *not* Javascript.  Therefore we can
>>>>>    block the response (and prevent disclosing
>>>>>    https://victim.com/secret.json to a no-cors fetch from
>>>>>    https://attacker.com).
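
A rough sketch of the second rule, under several assumptions: the function
names, the MIME-type pattern and the prefix pattern are all made up for this
example and are not taken from Chromium's sniffer. The body check only
recognises the object form `{"name":`, which per the discussion earlier in
this thread is never valid JS; JSON arrays are deliberately not covered
because they are also valid JS.

```javascript
// Assumed notion of a "JSON MIME type": */json and */*+json under
// application/ or text/.
const JSON_MIME_PATTERN = /^(?:application|text)\/(?:[\w.-]+\+)?json$/i;

// Whitespace, '{', whitespace, a double-quoted string, whitespace, ':'.
const JSON_OBJECT_PREFIX = /^[ \t\r\n]*\{[ \t\r\n]*"(?:[^"\\]|\\.)*"[ \t\r\n]*:/;

function hasJsonContentType(contentType) {
  const mime = contentType.split(';')[0].trim(); // drop parameters like charset
  return JSON_MIME_PATTERN.test(mime);
}

function bodySniffsAsJson(bodyPrefix) {
  // Only the first 1024 characters are inspected.
  return JSON_OBJECT_PREFIX.test(bodyPrefix.slice(0, 1024));
}

// Block a no-cors, cross-origin response only when both signals agree.
function shouldBlockNoCorsResponse(contentType, bodyPrefix) {
  return hasJsonContentType(contentType) && bodySniffsAsJson(bodyPrefix);
}

console.log(shouldBlockNoCorsResponse('application/json', '{"secret": 42}'));
console.log(shouldBlockNoCorsResponse('application/json', '{"use strict"; f()}'));
console.log(shouldBlockNoCorsResponse('text/javascript', '{"secret": 42}'));
```

Requiring both the Content-Type and the body prefix to look like JSON keeps
this on the "when in doubt, don't block it" side.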
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> v8-dev mailing list
>>>>>>> v8-dev@googlegroups.com
>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to a topic in
>>>>>>> the Google Groups "v8-dev" group.
>>>>>>> To unsubscribe from this topic, visit
>>>>>>> https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>>>> v8-dev+unsubscr...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com
>>>>>>> <https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>>
>>>>> Lukasz
>>>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Lukasz
>>>
>
>

-- 
-- 
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to v8-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CAED6dUCPtJQc%3DRqu_uCCV8oYpbX%3D9xhDJaxwbNYA0unaqrWaZA%40mail.gmail.com.
