Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 46e6b3f97425a4a7bb16fc175288903a5f74d5f2
      
https://github.com/WebKit/WebKit/commit/46e6b3f97425a4a7bb16fc175288903a5f74d5f2
  Author: Michael Saboff <msab...@apple.com>
  Date:   2022-12-13 (Tue, 13 Dec 2022)

  Changed paths:
    A JSTests/stress/regexp-lookbehind.js
    M JSTests/test262/config.yaml
    M Source/JavaScriptCore/runtime/RegExp.cpp
    M Source/JavaScriptCore/yarr/YarrInterpreter.cpp
    M Source/JavaScriptCore/yarr/YarrInterpreter.h
    M Source/JavaScriptCore/yarr/YarrJIT.cpp
    M Source/JavaScriptCore/yarr/YarrJIT.h
    M Source/JavaScriptCore/yarr/YarrParser.h
    M Source/JavaScriptCore/yarr/YarrPattern.cpp
    M Source/JavaScriptCore/yarr/YarrPattern.h
    M Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp
    M Source/WTF/wtf/PrintStream.cpp
    M Source/WTF/wtf/PrintStream.h
    M Source/WebCore/contentextensions/URLFilterParser.cpp

  Log Message:
  -----------
  Add support for RegExp lookbehind assertions
https://bugs.webkit.org/show_bug.cgi?id=174931
rdar://33183185

This change implements RegExp lookbehind in the Yarr interpreter.

This change introduces the notion of match direction, either forward or
backward. The forward match direction is the way the current code works,
matching disjunction terms and the subject string in a left to right manner.
Lookbehind assertions, as defined in the ECMAScript spec, process disjunction
terms right to left, matching the corresponding subject string right to left
as well.
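
The difference in match direction is observable from script through greedy
captures. The snippet below is a small illustration, not part of this change;
the variable names are made up for the example. It compares two adjacent
greedy groups matched forward and matched inside a lookbehind.

```javascript
// Forward matching: the leftmost greedy group consumes first.
const forward = /^(\d+)(\d+)/.exec("1053");

// Backward matching inside a lookbehind: the rightmost group consumes first,
// walking the subject string right to left.
const backward = /(?<=(\d+)(\d+))$/.exec("1053");

const forwardCaptures = [forward[1], forward[2]];    // ["105", "3"]
const backwardCaptures = [backward[1], backward[2]]; // ["1", "053"]
console.log(forwardCaptures, backwardCaptures);
```

In the forward case the first group takes as much as it can; in the lookbehind
the second group does, because it is the first one visited.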

Except for the Yarr JIT, almost all of the Yarr code has been touched to
account for this backward matching. A new ByteTerm, HaveCheckedInput, has been
added. It checks that at least the required number of characters is available
in the input stream, but it doesn't move the input stream position; it is
basically a CheckInput that leaves the input position alone. For variable
counted terms, we still need to check that we won't try to access characters
before the first character of the subject string. For functions like
readSurrogatePairChecked(), we check for input before calling the function.
For new input functions with a try prefix, like tryReadBackward, the function
itself checks for available input. After these checks prove that it is safe to
access an offset to the left of the current input position, the actual
matching can be performed.
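
The distinction between a check that advances and one that does not can be
sketched as follows. This is a hypothetical model written in JavaScript for
illustration only; it is not the Yarr implementation. The class InputCursor
and the method bodies are assumptions, and the sketch assumes that
"available" means characters to the left of the current position.

```javascript
// Hypothetical input cursor; method names echo the log message, but the
// bodies are illustrative assumptions, not the actual C++ code.
class InputCursor {
  constructor(text) {
    this.text = text;
    this.pos = 0;
  }

  // CheckInput-style: succeed only if `count` more characters exist,
  // and advance the position past them.
  checkInput(count) {
    if (this.pos + count > this.text.length) return false;
    this.pos += count;
    return true;
  }

  // HaveCheckedInput-style: a bounds test that leaves the position alone.
  // Assumption: it asks whether `count` characters lie to the left.
  haveCheckedInput(count) {
    return this.pos >= count;
  }

  // try-prefixed read: performs its own bounds check and returns -1
  // instead of reading before the start of the subject string.
  tryReadBackward(offset) {
    if (offset < 1 || offset > this.pos) return -1;
    return this.text.charCodeAt(this.pos - offset);
  }
}

const cursor = new InputCursor("abc");
cursor.checkInput(2);                        // position moves to 2
const ok = cursor.haveCheckedInput(2);       // position stays at 2
const tooFar = cursor.haveCheckedInput(3);   // only 2 characters to the left
const ch = cursor.tryReadBackward(1);        // char code of "b"
const none = cursor.tryReadBackward(3);      // out of range
```

Keeping the bounds test separate from the position update lets a backward
matching term verify that a leftward offset is safe before reading it.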

The Yarr parser parses regular expressions in left to right order. It also
computes character offsets in forward order. When we ByteTerm compile, we
process backward matching disjunctions right to left. The parser also has
special handling of forward references within a backward matching
parenthetical group. All such forward references are saved for that
parenthetical group and are processed at the end of the group. Each of these
forward references is checked to see whether a capture to the right of the
forward reference was found; if so, the forward reference is converted to a
back reference.
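
The script-visible effect of that conversion can be illustrated with a small
example. This is a sketch for illustration and is not taken from the new
tests; the names re, hit and miss are made up.

```javascript
// Inside the lookbehind, \1 appears to the left of group 1, so in source
// order it looks like a forward reference. Backward matching visits (a)
// first, so \1 must then match the captured text further to the left.
const re = /(?<=\1(a))b/;

// (a) captures the "a" at index 1, then \1 matches the "a" at index 0.
const hit = re.exec("aab");

// (a) captures the "a" at index 0, leaving nothing to the left for \1.
const miss = re.exec("ab");

console.log(hit, miss);
```

If the reference were left as a forward reference to an unset group, it would
match the empty string and the second case would succeed as well.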

As part of this work, the ByteTerm dumping code was significantly updated to
allow not only dumping the ByteCode after it has been generated, but also
dumping ByteCode while it is being interpreted. ByteTerm dumping while
interpreting is enabled with the Interpreter::verbose compile-time constant.

Reviewed by Yusuke Suzuki.

* JSTests/stress/regexp-lookbehind.js: New tests.
(arrayToString):
(dumpValue):
(compareArray):
(testRegExp):
* JSTests/test262/config.yaml:
* Source/JavaScriptCore/runtime/RegExp.cpp:
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::ByteTermDumper::ByteTermDumper):
(JSC::Yarr::ByteTermDumper::unicode):
(JSC::Yarr::Interpreter::InputStream::readForCharacterDump):
(JSC::Yarr::Interpreter::InputStream::tryReadBackward):
(JSC::Yarr::Interpreter::InputStream::tryUncheckInput):
(JSC::Yarr::Interpreter::InputStream::isValidNegativeInputOffset):
(JSC::Yarr::Interpreter::InputStream::dump const):
(JSC::Yarr::Interpreter::checkCharacter):
(JSC::Yarr::Interpreter::checkSurrogatePair):
(JSC::Yarr::Interpreter::checkCasedCharacter):
(JSC::Yarr::Interpreter::checkCharacterClass):
(JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP):
(JSC::Yarr::Interpreter::tryConsumeBackReference):
(JSC::Yarr::Interpreter::matchAssertionWordBoundary):
(JSC::Yarr::Interpreter::backtrackPatternCharacter):
(JSC::Yarr::Interpreter::backtrackPatternCasedCharacter):
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::backtrackCharacterClass):
(JSC::Yarr::Interpreter::matchBackReference):
(JSC::Yarr::Interpreter::backtrackBackReference):
(JSC::Yarr::Interpreter::recordParenthesesMatch):
(JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::Interpreter::matchParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::backtrackParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::haveCheckedInput):
(JSC::Yarr::ByteCompiler::assertionWordBoundary):
(JSC::Yarr::ByteCompiler::atomPatternCharacter):
(JSC::Yarr::ByteCompiler::atomCharacterClass):
(JSC::Yarr::ByteCompiler::atomBackReference):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
(JSC::Yarr::ByteCompiler::isSafeToRecurse):
(JSC::Yarr::ByteTermDumper::dumpTerm):
(JSC::Yarr::ByteTermDumper::dumpDisjunction):
(JSC::Yarr::Interpreter::InputStream::readPair): Deleted.
(JSC::Yarr::ByteCompiler::dumpDisjunction): Deleted.
* Source/JavaScriptCore/yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm):
(JSC::Yarr::ByteTerm::HaveCheckedInput):
(JSC::Yarr::ByteTerm::WordBoundary):
(JSC::Yarr::ByteTerm::BackReference):
(JSC::Yarr::ByteTerm::isCharacterType):
(JSC::Yarr::ByteTerm::isCasedCharacterType):
(JSC::Yarr::ByteTerm::isCharacterClass):
(JSC::Yarr::ByteTerm::matchDirection):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
(JSC::Yarr::dumpCompileFailure):
* Source/JavaScriptCore/yarr/YarrJIT.h:
* Source/JavaScriptCore/yarr/YarrParser.h:
(JSC::Yarr::Parser::parseParenthesesBegin):
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::resetForReparsing):
(JSC::Yarr::YarrPatternConstructor::assertionBOL):
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
(JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesSubpatternBegin):
(JSC::Yarr::YarrPatternConstructor::atomParentheticalAssertionBegin):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::atomBackReference):
(JSC::Yarr::YarrPatternConstructor::copyDisjunction):
(JSC::Yarr::YarrPatternConstructor::quantifyAtom):
(JSC::Yarr::YarrPatternConstructor::disjunction):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::SavedContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::restore):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::ParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::push):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::pop):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setInvert):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::invert const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setMatchDirection):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::matchDirection const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::reset):
(JSC::Yarr::YarrPatternConstructor::pushParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::popParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::setParenthesisInvert):
(JSC::Yarr::YarrPatternConstructor::parenthesisInvert const):
(JSC::Yarr::YarrPatternConstructor::setParenthesisMatchDirection):
(JSC::Yarr::YarrPatternConstructor::parenthesisMatchDirection const):
(JSC::Yarr::YarrPattern::YarrPattern):
(JSC::Yarr::dumpCharacterClass):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternTerm::convertToBackreference):
(JSC::Yarr::PatternTerm::setMatchDirection):
(JSC::Yarr::PatternTerm::matchDirection const):
(JSC::Yarr::PatternAlternative::PatternAlternative):
(JSC::Yarr::PatternAlternative::matchDirection const):
(JSC::Yarr::PatternDisjunction::addNewAlternative):
(JSC::Yarr::YarrPattern::resetForReparsing):
* Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomParentheticalAssertionBegin):
* Source/WTF/wtf/PrintStream.cpp:
(WTF::printInternal):
* Source/WTF/wtf/PrintStream.h:
* Source/WebCore/contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomParentheticalAssertionBegin):

Canonical link: https://commits.webkit.org/257823@main
