Wait, no: we do handle running out of stack robustly, so the "does this parse" check should just return false in that case (even though the code might be valid JS). Please ignore that part of my comment :)
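To illustrate the behaviour described above, here is a minimal sketch (not V8's code) of a recursive descent check that reports failure instead of crashing once a depth budget is used up. The toy grammar (an integer wrapped in any number of single-element brackets), the function name and the max_depth parameter are all assumptions made for illustration.

```python
def parses_within_budget(source: str, max_depth: int = 200) -> bool:
    """Toy "does this parse" check for inputs such as [[[0]]].

    Returns False for malformed input and also when the nesting is deeper
    than max_depth, even though such input is valid in the toy grammar.
    """
    pos = 0

    def parse_value(depth: int) -> bool:
        nonlocal pos
        if depth > max_depth:
            # Budget exhausted: give up and report "does not parse".
            return False
        if pos < len(source) and source[pos] == "[":
            pos += 1
            if not parse_value(depth + 1):
                return False
            if pos < len(source) and source[pos] == "]":
                pos += 1
                return True
            return False
        if pos < len(source) and source[pos] in "0123456789":
            while pos < len(source) and source[pos] in "0123456789":
                pos += 1
            return True
        return False

    # The whole input must be consumed for the check to succeed.
    return parse_value(0) and pos == len(source)
```

With the default budget, 50 levels of nesting are accepted, while 500 levels are rejected although the input is well formed.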
On Wed, 1 Sep 2021, 16:38 Marja Hölttä, <ma...@chromium.org> wrote:

> A random side note: it's also possible to make V8's recursive descent parser run out of stack using valid JS, e.g., let a = [[[[[..[ 0 ]]]]]..] or other similar constructs (deep enough). Meaning you prob don't want to call into the parser in a process where you don't want this to happen.
>
> Re: encodings, when I worked on script streaming I noticed it's pretty common that scripts advertised as UTF-8 are not valid UTF-8 (e.g., have invalid chars inside comments), and Chrome is currently pretty lenient about those.
>
> On Wed, Aug 18, 2021 at 3:18 PM Toon Verwaest <verwa...@chromium.org> wrote:
>
>> On Wed, Aug 18, 2021 at 2:29 AM 'Łukasz Anforowicz' via v8-dev <v8-dev@googlegroups.com> wrote:
>>
>>> On Tue, Aug 17, 2021 at 6:59 AM Toon Verwaest <verwa...@chromium.org> wrote:
>>>
>>>> Thinking out loud: One idea could be to have a separate sandboxed compiler process in which we compile incoming JS code. That could reject the source if it doesn't compile; or compile it to a script that just throws with no additional info about the actual source.
>>>>
>>>> That process could implement streaming compilation; so we don't block streaming until later, we don't double parse, we still have a sandbox (not in the network process). There might even be benefits for caching as a compromised renderer cannot look at the compilation artefacts until it receives them.
>>>>
>>>> If we fully compile and create a code cache from the compilation result we don't need a new API on the V8 side, but do additional serialization/deserialization work. That should be faster than reparsing though. The upper limit of the cost would essentially be the cost of serializing / deserializing a code cache for each script.
>>>
>>> This seems like an interesting idea.
>>> I wonder if compilation (no evaluation / running of scripts) would be considered safe enough to handle in a single (not origin/site-bound/locked) process.
>>
>> The parser/compiler aren't tiny, so it's not unlikely there's a bug. It's certainly much less easy to control such bugs than full-blown JS OOB access though. I could imagine a security bug replacing scripts in another site (assuming it's sandboxed so well that it can't do much else), which would be terrible; and it's unclear to me how easy that would be.
>>
>>> One thing that I don't fully understand (for both full-JS-parsing and partial/hackish-non-JS-detection approaches) is if the encoding (e.g. UTF8 vs UTF16-LE vs Win-1250) has to be known and communicated upfront to the parser/sniffer? Or maybe the input to the decoder needs to be already in UTF8? Or maybe something in //net or //network layers can already handle this aspect of the problem (e.g. ensuring UTF8 in URLLoader::DidRead)?
>>
>> There's some encoding guessing happening before we streaming compile (https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/core/v8/script_streamer.cc;l=584;drc=f0b502c3c977f47c58b49506629b2dd8353e4c59;bpv=1;bpt=1) and some afterwards; and if we initially compiled with the wrong encoding we discard and redo iirc. Presumably compilation failed anyway if the encoding was wrong; but this presumably also doesn't happen too often.
>>
>>> Also - when trying to explore the partial/hackish-non-JS-detection idea, I wondered if the very first character in a script may only come from a relatively limited set of characters? Let's assume that the sniffer can skip whitespace (space, tab, CR, LF, LS, PS) and html/xml comments (e.g. <!-- ... -->) - AFAICT the very next character has to be either:
>>>
>>> - The start of a reserved keyword like "if", "let", etc. (all lowercase ASCII)
>>> - The start of an identifier (any Unicode code point with the Unicode property “ID_Start”)
>>> - The start of a unary expression: + - ~ !
>>> - The start of a string literal, string template, or a regexp literal (or non-HTML comment): " ' ` /
>>> - The start of a numeric literal: 0-9
>>> - An opening paren, bracket or brace: ( [ {
>>> - Not quite sure if a dot or an equal sign can appear as the very first character: . =
>>>
>>> This would reject PDFs (starts with %) and HTML/XML (starts with <), but still would accept ZIP files (first character is a 0x50 - capital P) and MSOffice files (first character is a 0xD0 which according to Unicode has ID_Start property set to true). Rejecting ZIP and MSOffice files would require going beyond the first character - maybe rejecting control characters like 0x11 or 0x03 outside of comments (not sure if at this point the sniffer's heuristics are starting to get too complex).
>>
>> That was my initial thought too for e.g., PDF. You'd be blacklisting files you don't want to leak vs whitelisting JS though, which isn't entirely ideal security-wise. It might be better than the alternative though, if we would otherwise either slow down the web (repeat parsing, interfere with streaming) or potentially introduce new security issues through a shared compiler process.
>>
>>>> On Fri, Aug 13, 2021 at 12:26 AM 'Łukasz Anforowicz' via v8-dev <v8-dev@googlegroups.com> wrote:
>>>>
>>>>> On Thu, Aug 12, 2021 at 3:18 PM Łukasz Anforowicz <luka...@google.com> wrote:
>>>>>
>>>>>> On Thu, Aug 12, 2021 at 3:11 PM Jakob Kummerow <jkumme...@chromium.org> wrote:
>>>>>>
>>>>>>>> ORB-with-html/json/xml-sniffing shows that some security benefits of ORB may be realized without full-fidelity JS sniffing/parsing.
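The first-character heuristic listed earlier in this thread can be sketched roughly as follows. This is only an illustration under several assumptions: the input is already decoded text (the encoding question raised above is ignored), Python's str.isidentifier() stands in for the Unicode ID_Start check, and the function names are made up. The thread leaves "." and "=" open; this sketch accepts "." (since .5 is a numeric literal) and rejects "=".

```python
WHITESPACE = " \t\r\n\u2028\u2029"           # space, tab, CR, LF, LS, PS
PUNCTUATION_STARTS = set("+-~!\"'`/([{.")    # unary ops, literals, brackets, dot


def skip_whitespace_and_html_comments(text: str) -> int:
    """Return the index of the first character after leading whitespace
    and <!-- ... --> comments."""
    i = 0
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text.startswith("<!--", i):
            end = text.find("-->", i + 4)
            if end == -1:
                return len(text)  # unterminated comment: nothing left to inspect
            i = end + 3
        else:
            break
    return i


def might_be_javascript(text: str) -> bool:
    """Return False only when the first significant character cannot start a script."""
    i = skip_whitespace_and_html_comments(text)
    if i >= len(text):
        return True  # nothing but whitespace/comments: when in doubt, don't block
    ch = text[i]
    if ch in "0123456789" or ch in PUNCTUATION_STARTS:
        return True
    # Approximation of "keyword or ID_Start": isidentifier() covers letters
    # and "_"; "$" is added because JS allows it at the start of an identifier.
    return ch.isidentifier() or ch == "$"
```

As noted in the quoted message, this rejects input starting with % or <, but still lets through anything whose first character is a letter, such as a ZIP header ("PK...") or U+00D0.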
>>>>>>> You may call it a security benefit to block "obvious" parser breakers like )]}', but in general, any "when in doubt, don't block it" strategy won't be much of an obstacle to intentional attacks. For instance, once Mr. Bad Guy has learned that the sniffer only looks at the first 1024 characters, they can send a response whose first 1024 characters lead to a "well, it *might* be valid JS" judgement (such as a JS comment, or long string, or whatever). OTOH any "when in doubt, block it" strategy runs the risk of breaking existing websites in those doubtful cases.
>>>>>>
>>>>>> In the CORB threat model the attacker does *not* control the responses - CORB tries to prevent https://attacker.com (with either Spectre or a compromised renderer) from being able to read no-cors responses from https://victim.com.
>>>>>>
>>>>>>>> (Although the JSON object syntax is exactly Javascript's object-initializer syntax, a Javascript object-initializer expression is not valid as a standalone Javascript statement.)
>>>>>>>
>>>>>>> There is (at least) one subtlety here: JS is more permissive than the official JSON spec. The latter requires quotes around property names, the former doesn't. I.e. {"foo": is indeed never valid JS, but {foo: is (the brace opens a code block, and foo is a label). Also, the colon is essential for rejecting the former snippet, because {"foo"; is valid JS (code block plus ignored string à la "use strict";), so this is a concrete example where the 1024-char prefix issue is relevant.
>>>>>>>
>>>>>>>> When the sniffer sees:
>>>>>>>> [ 123, 456, “long string taking X bytes”,
>>>>>>>> then it should block the response when the Content-Type is a JSON MIME type
>>>>>>>
>>>>>>> I don't follow.
>>>>>>> When the Content-Type is JSON, and the actual contents are valid JSON, why should that be blocked?
>>>>>>
>>>>>> Correct. There is no way to read cross-origin JSON via a "no-cors" fetch. The only way to read cross-origin JSON is via CORS-mediated fetch (where the victim has to opt-in by responding with "Access-Control-Allow-Origin: ...").
>>>>>
>>>>> Maybe another way to look at it is:
>>>>>
>>>>> - Only Javascript (and images/audio/video/stylesheets) can be sent in no-cors mode (i.e. without CORS). Non-Javascript (and non-image/video/etc), no-cors, cross-origin responses can be blocked.
>>>>> - If the response sniffs as JSON (Content-Type=JSON and First1024bytes=JSON) then it is *not* Javascript. Therefore we can block the response (and prevent disclosing https://victim.com/secret.json to a no-cors fetch from https://attacker.com).
>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> v8-dev mailing list
>>>>>>> v8-dev@googlegroups.com
>>>>>>> http://groups.google.com/group/v8-dev
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to a topic in the Google Groups "v8-dev" group.
>>>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/v8-dev/NGGCw9OjatI/unsubscribe.
>>>>>>> To unsubscribe from this group and all its topics, send an email to v8-dev+unsubscr...@googlegroups.com.
>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/CAKSzg3TNvd1jd3yH8xyD767ZhbCqhEZJMFmm7nQ%2BtcQcXfjt_g%40mail.gmail.com.
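The "Content-Type=JSON and First1024bytes=JSON" rule quoted above, together with the {"foo": subtlety, could look roughly like this. It is a sketch under stated assumptions, not how CORB/ORB is implemented: only the JSON-object form is handled (not arrays), the body is assumed to be decoded text, and every name below is hypothetical.

```python
import re

SNIFF_LIMIT = 1024  # prefix length mentioned in the thread

# "{", then a double-quoted property name, then ":". The colon matters:
# {"foo"; is valid JS, whereas {"foo": is not.
JSON_OBJECT_PREFIX = re.compile(r'\s*\{\s*"(?:[^"\\]|\\.)*"\s*:')


def is_json_mime_type(content_type: str) -> bool:
    """True for application/json, text/json and any type ending in +json."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in ("application/json", "text/json") or essence.endswith("+json")


def prefix_is_json_not_js(body: str) -> bool:
    """True when the sniffed prefix is a JSON object start that cannot be JS."""
    return JSON_OBJECT_PREFIX.match(body[:SNIFF_LIMIT]) is not None


def should_block_no_cors_response(content_type: str, body: str) -> bool:
    """Block only when both the declared type and the sniffed prefix say JSON."""
    return is_json_mime_type(content_type) and prefix_is_json_not_js(body)
```

A property name longer than the sniffed prefix hides the colon, so the response is not blocked: that is the 1024-char prefix issue, combined with the "when in doubt, don't block it" strategy.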
--
--
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/CAED6dUCPtJQc%3DRqu_uCCV8oYpbX%3D9xhDJaxwbNYA0unaqrWaZA%40mail.gmail.com.