Title: [197715] trunk/Source/_javascript_Core
Revision
197715
Author
fpi...@apple.com
Date
2016-03-07 16:34:44 -0800 (Mon, 07 Mar 2016)

Log Message

RegExp.prototype.exec() should call into Yarr at most once
https://bugs.webkit.org/show_bug.cgi?id=155139

Reviewed by Saam Barati.

For apparently no good reason, RegExp.prototype.match() was calling into Yarr twice, almost
as if it was hoping that the non-matching case was so common that it was best to have the
matching case do the work all over again.

This is a 4% speed-up on Octane/regexp. It's also a matter of common sense: we should not be
in the business of presuming whether someone's match will succeed or fail. The increased
cost of running Yarr twice is so much larger than whatever savings we were getting from
running a match-only regexp that this is just not a good overall deal for the engine.

Also, it's interesting that we are seeing a 4% speed-up on regexp despite the fact that a
majority (almost a supermajority, I think) of calls into RegExp.prototype.match() are failed
matches. So, this change is a 4% speed-up despite being a slow down on the common case. That
tells you just how bad the old behavior was on the uncommon case.

* runtime/MatchResult.h:
(MatchResult::MatchResult):
(MatchResult::failed):
(MatchResult::operator bool):
* runtime/RegExpCachedResult.cpp:
(JSC::RegExpCachedResult::lastResult):
* runtime/RegExpConstructor.h:
(JSC::RegExpConstructor::setMultiline):
(JSC::RegExpConstructor::multiline):
(JSC::RegExpConstructor::performMatch):
(JSC::RegExpConstructor::recordMatch):
* runtime/RegExpMatchesArray.cpp:
(JSC::createRegExpMatchesArray):
(JSC::createEmptyRegExpMatchesArray):
(JSC::createStructureImpl):
* runtime/RegExpMatchesArray.h:
(JSC::createRegExpMatchesArray):
* runtime/RegExpObject.cpp:
(JSC::RegExpObject::put):
(JSC::getLastIndexAsUnsigned):
(JSC::RegExpObject::exec):
(JSC::RegExpObject::match):
* runtime/RegExpObject.h:
(JSC::RegExpObject::getLastIndex):
(JSC::RegExpObject::test):
* runtime/StringPrototype.cpp:
(JSC::stringProtoFuncMatch):

Modified Paths

Diff

Modified: trunk/Source/_javascript_Core/ChangeLog (197714 => 197715)


--- trunk/Source/_javascript_Core/ChangeLog	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/ChangeLog	2016-03-08 00:34:44 UTC (rev 197715)
@@ -1,3 +1,52 @@
+2016-03-07  Filip Pizlo  <fpi...@apple.com>
+
+        RegExp.prototype.exec() should call into Yarr at most once
+        https://bugs.webkit.org/show_bug.cgi?id=155139
+
+        Reviewed by Saam Barati.
+
+        For apparently no good reason, RegExp.prototype.match() was calling into Yarr twice, almost
+        as if it was hoping that the non-matching case was so common that it was best to have the
+        matching case do the work all over again.
+
+        This is a 4% speed-up on Octane/regexp. It's also a matter of common sense: we should not be
+        in the business of presuming whether someone's match will succeed or fail. The increased
+        cost of running Yarr twice is so much larger than whatever savings we were getting from
+        running a match-only regexp that this is just not a good overall deal for the engine.
+
+        Also, it's interesting that we are seeing a 4% speed-up on regexp despite the fact that a
+        majority (almost a supermajority, I think) of calls into RegExp.prototype.match() are failed
+        matches. So, this change is a 4% speed-up despite being a slow down on the common case. That
+        tells you just how bad the old behavior was on the uncommon case.
+
+        * runtime/MatchResult.h:
+        (MatchResult::MatchResult):
+        (MatchResult::failed):
+        (MatchResult::operator bool):
+        * runtime/RegExpCachedResult.cpp:
+        (JSC::RegExpCachedResult::lastResult):
+        * runtime/RegExpConstructor.h:
+        (JSC::RegExpConstructor::setMultiline):
+        (JSC::RegExpConstructor::multiline):
+        (JSC::RegExpConstructor::performMatch):
+        (JSC::RegExpConstructor::recordMatch):
+        * runtime/RegExpMatchesArray.cpp:
+        (JSC::createRegExpMatchesArray):
+        (JSC::createEmptyRegExpMatchesArray):
+        (JSC::createStructureImpl):
+        * runtime/RegExpMatchesArray.h:
+        (JSC::createRegExpMatchesArray):
+        * runtime/RegExpObject.cpp:
+        (JSC::RegExpObject::put):
+        (JSC::getLastIndexAsUnsigned):
+        (JSC::RegExpObject::exec):
+        (JSC::RegExpObject::match):
+        * runtime/RegExpObject.h:
+        (JSC::RegExpObject::getLastIndex):
+        (JSC::RegExpObject::test):
+        * runtime/StringPrototype.cpp:
+        (JSC::stringProtoFuncMatch):
+
 2016-03-07  Joseph Pecoraro  <pecor...@apple.com>
 
         Heap Snapshot should include different Edge types and data (Property, Index, Variable)

Modified: trunk/Source/_javascript_Core/runtime/MatchResult.h (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/MatchResult.h	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/MatchResult.h	2016-03-08 00:34:44 UTC (rev 197715)
@@ -29,6 +29,12 @@
 typedef uint64_t EncodedMatchResult;
 
 struct MatchResult {
+    MatchResult()
+        : start(WTF::notFound)
+        , end(0)
+    {
+    }
+    
     ALWAYS_INLINE MatchResult(size_t start, size_t end)
         : start(start)
         , end(end)
@@ -51,10 +57,10 @@
 
     ALWAYS_INLINE static MatchResult failed()
     {
-        return MatchResult(WTF::notFound, 0);
+        return MatchResult();
     }
 
-    ALWAYS_INLINE operator bool()
+    ALWAYS_INLINE explicit operator bool() const
     {
         return start != WTF::notFound;
     }

Modified: trunk/Source/_javascript_Core/runtime/RegExpCachedResult.cpp (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpCachedResult.cpp	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpCachedResult.cpp	2016-03-08 00:34:44 UTC (rev 197715)
@@ -45,7 +45,10 @@
 {
     if (!m_reified) {
         m_reifiedInput.set(exec->vm(), owner, m_lastInput.get());
-        m_reifiedResult.set(exec->vm(), owner, createRegExpMatchesArray(exec, exec->lexicalGlobalObject(), m_lastInput.get(), m_lastRegExp.get(), m_result));
+        if (m_result)
+            m_reifiedResult.set(exec->vm(), owner, createRegExpMatchesArray(exec, exec->lexicalGlobalObject(), m_lastInput.get(), m_lastRegExp.get(), m_result.start));
+        else
+            m_reifiedResult.set(exec->vm(), owner, createEmptyRegExpMatchesArray(exec->lexicalGlobalObject(), m_lastInput.get(), m_lastRegExp.get()));
         m_reified = true;
     }
     return m_reifiedResult.get();

Modified: trunk/Source/_javascript_Core/runtime/RegExpConstructor.h (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpConstructor.h	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpConstructor.h	2016-03-08 00:34:44 UTC (rev 197715)
@@ -54,6 +54,7 @@
 
     MatchResult performMatch(VM&, RegExp*, JSString*, const String&, int startOffset, int** ovector);
     MatchResult performMatch(VM&, RegExp*, JSString*, const String&, int startOffset);
+    void recordMatch(VM&, RegExp*, JSString*, const MatchResult&);
 
     void setMultiline(bool multiline) { m_multiline = multiline; }
     bool multiline() const { return m_multiline; }
@@ -124,6 +125,12 @@
     return result;
 }
 
+ALWAYS_INLINE void RegExpConstructor::recordMatch(VM& vm, RegExp* regExp, JSString* string, const MatchResult& result)
+{
+    ASSERT(result);
+    m_cachedResult.record(vm, this, regExp, string, result);
+}
+
 } // namespace JSC
 
 #endif // RegExpConstructor_h

Modified: trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.cpp (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.cpp	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.cpp	2016-03-08 00:34:44 UTC (rev 197715)
@@ -52,26 +52,33 @@
 
 JSArray* createRegExpMatchesArray(
     ExecState* exec, JSGlobalObject* globalObject, JSString* input, RegExp* regExp,
-    MatchResult result)
+    unsigned startOffset, MatchResult& result)
 {
     SamplingRegion samplingRegion("createRegExpMatchesArray");
     
-    ASSERT(result);
     VM& vm = globalObject->vm();
+    
+    Vector<int, 32> subpatternResults;
+    int position = regExp->match(vm, input->value(exec), startOffset, subpatternResults);
+    if (position == -1) {
+        result = MatchResult::failed();
+        return nullptr;
+    }
 
+    result.start = position;
+    result.end = subpatternResults[1];
+    
     JSArray* array;
+
+    // FIXME: This should handle array allocation errors gracefully.
+    // https://bugs.webkit.org/show_bug.cgi?id=155144
+    
     if (UNLIKELY(globalObject->isHavingABadTime())) {
         array = JSArray::tryCreateUninitialized(vm, globalObject->regExpMatchesArrayStructure(), regExp->numSubpatterns() + 1);
         
         array->initializeIndex(vm, 0, jsSubstring(vm, exec, input, result.start, result.end - result.start));
         
         if (unsigned numSubpatterns = regExp->numSubpatterns()) {
-            Vector<int, 32> subpatternResults;
-            int position = regExp->match(vm, input->value(exec), result.start, subpatternResults);
-            ASSERT_UNUSED(position, position >= 0 && static_cast<size_t>(position) == result.start);
-            ASSERT(result.start == static_cast<size_t>(subpatternResults[0]));
-            ASSERT(result.end == static_cast<size_t>(subpatternResults[1]));
-            
             for (unsigned i = 1; i <= numSubpatterns; ++i) {
                 int start = subpatternResults[2 * i];
                 if (start >= 0)
@@ -87,12 +94,6 @@
         array->initializeIndex(vm, 0, jsSubstring(vm, exec, input, result.start, result.end - result.start), ArrayWithContiguous);
         
         if (unsigned numSubpatterns = regExp->numSubpatterns()) {
-            Vector<int, 32> subpatternResults;
-            int position = regExp->match(vm, input->value(exec), result.start, subpatternResults);
-            ASSERT_UNUSED(position, position >= 0 && static_cast<size_t>(position) == result.start);
-            ASSERT(result.start == static_cast<size_t>(subpatternResults[0]));
-            ASSERT(result.end == static_cast<size_t>(subpatternResults[1]));
-            
             for (unsigned i = 1; i <= numSubpatterns; ++i) {
                 int start = subpatternResults[2 * i];
                 if (start >= 0)
@@ -109,6 +110,40 @@
     return array;
 }
 
+JSArray* createEmptyRegExpMatchesArray(JSGlobalObject* globalObject, JSString* input, RegExp* regExp)
+{
+    VM& vm = globalObject->vm();
+    JSArray* array;
+
+    // FIXME: This should handle array allocation errors gracefully.
+    // https://bugs.webkit.org/show_bug.cgi?id=155144
+    
+    if (UNLIKELY(globalObject->isHavingABadTime())) {
+        array = JSArray::tryCreateUninitialized(vm, globalObject->regExpMatchesArrayStructure(), regExp->numSubpatterns() + 1);
+        
+        array->initializeIndex(vm, 0, jsEmptyString(&vm));
+        
+        if (unsigned numSubpatterns = regExp->numSubpatterns()) {
+            for (unsigned i = 1; i <= numSubpatterns; ++i)
+                array->initializeIndex(vm, i, jsUndefined());
+        }
+    } else {
+        array = tryCreateUninitializedRegExpMatchesArray(vm, globalObject->regExpMatchesArrayStructure(), regExp->numSubpatterns() + 1);
+        RELEASE_ASSERT(array);
+        
+        array->initializeIndex(vm, 0, jsEmptyString(&vm), ArrayWithContiguous);
+        
+        if (unsigned numSubpatterns = regExp->numSubpatterns()) {
+            for (unsigned i = 1; i <= numSubpatterns; ++i)
+                array->initializeIndex(vm, i, jsUndefined(), ArrayWithContiguous);
+        }
+    }
+
+    array->putDirect(vm, indexPropertyOffset, jsNumber(-1));
+    array->putDirect(vm, inputPropertyOffset, input);
+    return array;
+}
+
 static Structure* createStructureImpl(VM& vm, JSGlobalObject* globalObject, IndexingType indexingType)
 {
     Structure* structure = globalObject->arrayStructureForIndexingTypeDuringAllocation(indexingType);

Modified: trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.h (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.h	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpMatchesArray.h	2016-03-08 00:34:44 UTC (rev 197715)
@@ -26,7 +26,13 @@
 
 namespace JSC {
 
-JSArray* createRegExpMatchesArray(ExecState*, JSGlobalObject*, JSString*, RegExp*, MatchResult);
+JSArray* createRegExpMatchesArray(ExecState*, JSGlobalObject*, JSString*, RegExp*, unsigned startOffset, MatchResult&);
+inline JSArray* createRegExpMatchesArray(ExecState* exec, JSGlobalObject* globalObject, JSString* string, RegExp* regExp, unsigned startOffset)
+{
+    MatchResult ignoredResult;
+    return createRegExpMatchesArray(exec, globalObject, string, regExp, startOffset, ignoredResult);
+}
+JSArray* createEmptyRegExpMatchesArray(JSGlobalObject*, JSString*, RegExp*);
 Structure* createRegExpMatchesArrayStructure(VM&, JSGlobalObject*);
 Structure* createRegExpMatchesArraySlowPutStructure(VM&, JSGlobalObject*);
 

Modified: trunk/Source/_javascript_Core/runtime/RegExpObject.cpp (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpObject.cpp	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpObject.cpp	2016-03-08 00:34:44 UTC (rev 197715)
@@ -159,11 +159,58 @@
     Base::put(cell, exec, propertyName, value, slot);
 }
 
+ALWAYS_INLINE unsigned getLastIndexAsUnsigned(
+    ExecState* exec, RegExpObject* regExpObject, const String& input)
+{
+    JSValue jsLastIndex = regExpObject->getLastIndex();
+    unsigned lastIndex;
+    if (LIKELY(jsLastIndex.isUInt32())) {
+        lastIndex = jsLastIndex.asUInt32();
+        if (lastIndex > input.length()) {
+            regExpObject->setLastIndex(exec, 0);
+            return UINT_MAX;
+        }
+    } else {
+        double doubleLastIndex = jsLastIndex.toInteger(exec);
+        if (doubleLastIndex < 0 || doubleLastIndex > input.length()) {
+            regExpObject->setLastIndex(exec, 0);
+            return UINT_MAX;
+        }
+        lastIndex = static_cast<unsigned>(doubleLastIndex);
+    }
+    return lastIndex;
+}
+
 JSValue RegExpObject::exec(ExecState* exec, JSGlobalObject* globalObject, JSString* string)
 {
-    if (MatchResult result = match(exec, globalObject, string))
-        return createRegExpMatchesArray(exec, globalObject, string, regExp(), result);
-    return jsNull();
+    RegExp* regExp = this->regExp();
+    RegExpConstructor* regExpConstructor = globalObject->regExpConstructor();
+    String input = string->value(exec); // FIXME: Handle errors. https://bugs.webkit.org/show_bug.cgi?id=155145
+    VM& vm = globalObject->vm();
+
+    if (!regExp->global()) {
+        MatchResult result;
+        JSArray* array = createRegExpMatchesArray(exec, globalObject, string, regExp, 0, result);
+        if (!array)
+            return jsNull();
+        regExpConstructor->recordMatch(vm, regExp, string, result);
+        return array;
+    }
+
+    unsigned lastIndex = getLastIndexAsUnsigned(exec, this, input);
+    if (lastIndex == UINT_MAX)
+        return jsNull();
+    
+    MatchResult result;
+    JSArray* array =
+        createRegExpMatchesArray(exec, globalObject, string, regExp, lastIndex, result);
+    if (!array) {
+        setLastIndex(exec, 0);
+        return jsNull();
+    }
+    setLastIndex(exec, result.end);
+    regExpConstructor->recordMatch(vm, regExp, string, result);
+    return array;
 }
 
 // Shared implementation used by test and exec.
@@ -171,28 +218,15 @@
 {
     RegExp* regExp = this->regExp();
     RegExpConstructor* regExpConstructor = globalObject->regExpConstructor();
-    String input = string->value(exec);
+    String input = string->value(exec); // FIXME: Handle errors. https://bugs.webkit.org/show_bug.cgi?id=155145
     VM& vm = globalObject->vm();
     if (!regExp->global())
         return regExpConstructor->performMatch(vm, regExp, string, input, 0);
 
-    JSValue jsLastIndex = getLastIndex();
-    unsigned lastIndex;
-    if (LIKELY(jsLastIndex.isUInt32())) {
-        lastIndex = jsLastIndex.asUInt32();
-        if (lastIndex > input.length()) {
-            setLastIndex(exec, 0);
-            return MatchResult::failed();
-        }
-    } else {
-        double doubleLastIndex = jsLastIndex.toInteger(exec);
-        if (doubleLastIndex < 0 || doubleLastIndex > input.length()) {
-            setLastIndex(exec, 0);
-            return MatchResult::failed();
-        }
-        lastIndex = static_cast<unsigned>(doubleLastIndex);
-    }
-
+    unsigned lastIndex = getLastIndexAsUnsigned(exec, this, input);
+    if (lastIndex == UINT_MAX)
+        return MatchResult::failed();
+    
     MatchResult result = regExpConstructor->performMatch(vm, regExp, string, input, lastIndex);
     setLastIndex(exec, result.end);
     return result;

Modified: trunk/Source/_javascript_Core/runtime/RegExpObject.h (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/RegExpObject.h	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/RegExpObject.h	2016-03-08 00:34:44 UTC (rev 197715)
@@ -66,7 +66,7 @@
         return m_lastIndex.get();
     }
 
-    bool test(ExecState* exec, JSGlobalObject* globalObject, JSString* string) { return match(exec, globalObject, string); }
+    bool test(ExecState* exec, JSGlobalObject* globalObject, JSString* string) { return !!match(exec, globalObject, string); }
     JSValue exec(ExecState*, JSGlobalObject*, JSString*);
 
     static bool getOwnPropertySlot(JSObject*, ExecState*, PropertyName, PropertySlot&);

Modified: trunk/Source/_javascript_Core/runtime/StringPrototype.cpp (197714 => 197715)


--- trunk/Source/_javascript_Core/runtime/StringPrototype.cpp	2016-03-08 00:14:34 UTC (rev 197714)
+++ trunk/Source/_javascript_Core/runtime/StringPrototype.cpp	2016-03-08 00:34:44 UTC (rev 197715)
@@ -1072,7 +1072,7 @@
     MatchResult result = regExpConstructor->performMatch(*vm, regExp, string, s, 0);
     // case without 'g' flag is handled like RegExp.prototype.exec
     if (!global)
-        return JSValue::encode(result ? createRegExpMatchesArray(exec, globalObject, string, regExp, result) : jsNull());
+        return JSValue::encode(result ? createRegExpMatchesArray(exec, globalObject, string, regExp, result.start) : jsNull());
 
     // return array of matches
     MarkedArgumentBuffer list;
_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to