Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 46e6b3f97425a4a7bb16fc175288903a5f74d5f2
    https://github.com/WebKit/WebKit/commit/46e6b3f97425a4a7bb16fc175288903a5f74d5f2
Author: Michael Saboff <msab...@apple.com>
Date: 2022-12-13 (Tue, 13 Dec 2022)

Changed paths:
    A JSTests/stress/regexp-lookbehind.js
    M JSTests/test262/config.yaml
    M Source/JavaScriptCore/runtime/RegExp.cpp
    M Source/JavaScriptCore/yarr/YarrInterpreter.cpp
    M Source/JavaScriptCore/yarr/YarrInterpreter.h
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrJIT.h
    M Source/JavaScriptCore/yarr/YarrParser.h
    M Source/JavaScriptCore/yarr/YarrPattern.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.h
    M Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp
    M Source/WTF/wtf/PrintStream.cpp
    M Source/WTF/wtf/PrintStream.h
    M Source/WebCore/contentextensions/URLFilterParser.cpp

Log Message:
-----------
Add support for RegExp lookbehind assertions
https://bugs.webkit.org/show_bug.cgi?id=174931
rdar://33183185

This change implements RegExp lookbehind in the Yarr interpreter.

This change introduces the notion of match direction, either forward or backward. The forward match direction is the way the current code works, matching disjunction terms and the subject string left to right. Lookbehind assertions, as defined in the ECMAScript spec, process disjunction terms right to left, matching the corresponding subject string right to left as well. Except for the Yarr JIT, almost all of the Yarr code has been touched to account for this backward matching.

An additional ByteTerm has been added, HaveCheckedInput, which checks that enough characters are available in the input stream but doesn't move the input stream position. It is basically a CheckInput that leaves the input position unchanged.

For variable counted terms, we still need to check that we won't try to access characters beyond the first character of the subject string. For functions like readSurrogatePairChecked(), we check for input before calling the function. For new input functions with a try prefix, like tryReadBackward, the function itself checks for available input.
After these checks prove that it is safe to access an offset to the left of the current input position, the actual matching can be performed.

The Yarr parser parses regular expressions in left-to-right order. It also computes character offsets in forward order. When we ByteTerm compile, we process backward matching disjunctions right to left. The parser also has special handling of forward references within a backward matching parenthetical group. All such forward references are saved for that parenthetical group and are processed at the end of the group. Each of these forward references is checked to see whether a capture to the right of the forward reference was found; if so, the forward reference is converted to a back reference.

As part of this work, the ByteTerm dumping code was significantly updated to allow not only dumping the ByteCode after it has been generated, but also dumping ByteCode while it is being interpreted. This ByteTerm dumping while interpreting is enabled with the Interpreter::verbose compile-time constant.

Reviewed by Yusuke Suzuki.

* JSTests/stress/regexp-lookbehind.js: New tests.
(arrayToString):
(dumpValue):
(compareArray):
(testRegExp):
* JSTests/test262/config.yaml:
* Source/JavaScriptCore/runtime/RegExp.cpp:
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::ByteTermDumper::ByteTermDumper):
(JSC::Yarr::ByteTermDumper::unicode):
(JSC::Yarr::Interpreter::InputStream::readForCharacterDump):
(JSC::Yarr::Interpreter::InputStream::tryReadBackward):
(JSC::Yarr::Interpreter::InputStream::tryUncheckInput):
(JSC::Yarr::Interpreter::InputStream::isValidNegativeInputOffset):
(JSC::Yarr::Interpreter::InputStream::dump const):
(JSC::Yarr::Interpreter::checkCharacter):
(JSC::Yarr::Interpreter::checkSurrogatePair):
(JSC::Yarr::Interpreter::checkCasedCharacter):
(JSC::Yarr::Interpreter::checkCharacterClass):
(JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP):
(JSC::Yarr::Interpreter::tryConsumeBackReference):
(JSC::Yarr::Interpreter::matchAssertionWordBoundary):
(JSC::Yarr::Interpreter::backtrackPatternCharacter):
(JSC::Yarr::Interpreter::backtrackPatternCasedCharacter):
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::backtrackCharacterClass):
(JSC::Yarr::Interpreter::matchBackReference):
(JSC::Yarr::Interpreter::backtrackBackReference):
(JSC::Yarr::Interpreter::recordParenthesesMatch):
(JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::Interpreter::matchParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::backtrackParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::haveCheckedInput):
(JSC::Yarr::ByteCompiler::assertionWordBoundary):
(JSC::Yarr::ByteCompiler::atomPatternCharacter):
(JSC::Yarr::ByteCompiler::atomCharacterClass):
(JSC::Yarr::ByteCompiler::atomBackReference):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
(JSC::Yarr::ByteCompiler::isSafeToRecurse):
(JSC::Yarr::ByteTermDumper::dumpTerm):
(JSC::Yarr::ByteTermDumper::dumpDisjunction):
(JSC::Yarr::Interpreter::InputStream::readPair): Deleted.
(JSC::Yarr::ByteCompiler::dumpDisjunction): Deleted.
* Source/JavaScriptCore/yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm):
(JSC::Yarr::ByteTerm::HaveCheckedInput):
(JSC::Yarr::ByteTerm::WordBoundary):
(JSC::Yarr::ByteTerm::BackReference):
(JSC::Yarr::ByteTerm::isCharacterType):
(JSC::Yarr::ByteTerm::isCasedCharacterType):
(JSC::Yarr::ByteTerm::isCharacterClass):
(JSC::Yarr::ByteTerm::matchDirection):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
(JSC::Yarr::dumpCompileFailure):
* Source/JavaScriptCore/yarr/YarrJIT.h:
* Source/JavaScriptCore/yarr/YarrParser.h:
(JSC::Yarr::Parser::parseParenthesesBegin):
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::resetForReparsing):
(JSC::Yarr::YarrPatternConstructor::assertionBOL):
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
(JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesSubpatternBegin):
(JSC::Yarr::YarrPatternConstructor::atomParentheticalAssertionBegin):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::atomBackReference):
(JSC::Yarr::YarrPatternConstructor::copyDisjunction):
(JSC::Yarr::YarrPatternConstructor::quantifyAtom):
(JSC::Yarr::YarrPatternConstructor::disjunction):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::SavedContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::restore):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::ParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::push):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::pop):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setInvert):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::invert const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setMatchDirection):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::matchDirection const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::reset):
(JSC::Yarr::YarrPatternConstructor::pushParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::popParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::setParenthesisInvert):
(JSC::Yarr::YarrPatternConstructor::parenthesisInvert const):
(JSC::Yarr::YarrPatternConstructor::setParenthesisMatchDirection):
(JSC::Yarr::YarrPatternConstructor::parenthesisMatchDirection const):
(JSC::Yarr::YarrPattern::YarrPattern):
(JSC::Yarr::dumpCharacterClass):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternTerm::convertToBackreference):
(JSC::Yarr::PatternTerm::setMatchDirection):
(JSC::Yarr::PatternTerm::matchDirection const):
(JSC::Yarr::PatternAlternative::PatternAlternative):
(JSC::Yarr::PatternAlternative::matchDirection const):
(JSC::Yarr::PatternDisjunction::addNewAlternative):
(JSC::Yarr::YarrPattern::resetForReparsing):
* Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomParentheticalAssertionBegin):
* Source/WTF/wtf/PrintStream.cpp:
(WTF::printInternal):
* Source/WTF/wtf/PrintStream.h:
* Source/WebCore/contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomParentheticalAssertionBegin):

Canonical link:
https://commits.webkit.org/257823@main
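
For readers unfamiliar with the feature, the following is a small illustrative sketch of the script-visible behavior that the log message describes: lookbehind assertions, right-to-left matching inside them, and references to captures that sit to the right of the reference. It is not taken from JSTests/stress/regexp-lookbehind.js; the variable names and sample strings are made up for illustration, and the results in the comments follow the ECMAScript lookbehind semantics rather than anything specific to Yarr.

```javascript
// Positive lookbehind: match digits only when they are preceded by "$".
const dollars = "Price: $42".match(/(?<=\$)\d+/)[0];        // "42"

// Negative lookbehind: match a digit that is not preceded by "a".
const notAfterA = "a1 b2".match(/(?<!a)\d/)[0];             // "2"

// Forward matching: the leftmost greedy group is matched first,
// so it takes as many digits as it can.
const forward = /^(\d+)(\d+)/.exec("1053").slice(1);        // ["105", "3"]

// Backward matching inside a lookbehind: terms are processed right to
// left, so the rightmost group is matched first and is the greedy one.
const backward = /(?<=(\d+)(\d+))$/.exec("1053").slice(1);  // ["1", "053"]

// Inside a lookbehind, \1 can use a capture that appears to its right,
// because that capture has already been matched when \1 is reached.
const refToRight = /(?<=\1d(o))r/.exec("hodor");            // ["r", "o"]

// With the capture to the left of \1, the group is still unset when \1
// is processed, so \1 matches empty and "d" then fails against "o".
const refToLeft = /(?<=(o)d\1)r/.exec("hodor");             // null
```

The pair of "1053" examples shows why a match direction is needed at all: the same two groups split the digits differently depending on which end matching starts from.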