Title: [182808] trunk
Revision
182808
Author
benja...@webkit.org
Date
2015-04-14 14:08:02 -0700 (Tue, 14 Apr 2015)

Log Message

Add a conservative DFA minimizer for the content extension matcher
https://bugs.webkit.org/show_bug.cgi?id=143501

Reviewed by Alex Christensen.

Source/WebCore:

This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
some indistinguishable are not merged, but no two distinguishable are merged.

The general idea of the algorithm is to put all the state into a single set
and partition iteratively until it is impossible to split any subset by using
a transition to distinguish two states.

Let's ignore fallback transition for now, and I'll explain later how they fit in
the big picture.


The first thing we do is create a partition of the transition by grouping every
transition by the same character in the same subset. This partition of transitions
is the base by which we will partition the states.

Each subset in the transition partition is a "distinguisher" by which we can
separate the state partition.

We also create a second partition, the state partition. This is where we keep
all the subsets of states that have been split so far.

Let say we have the following graph.

    1 --a--> 2
    1 --b--> 3
    2 --c--> 4 (final)
    3 --c--> 4 (final)

The partition of transition would start with:
Set 0:
    1 --a--> 2
Set 1:
    1 --b--> 3
Set 2:
    2 --c--> 4
    3 --c--> 4

The state partition would have a single set with { 1, 2, 3, 4 }.


Next, we split the state partition by distinguishable final states. In this case,
we would split it into { 1, 2, 3 }, { 4 }.

We then refine the transition partition by splitting it by the states that have
been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
so the transition partition remains the same.


We can now execute the main loop of the algorithm:
1) Split the states by the transitions.
2) Split the transitions that are now reaching two different sets of the state partition.
3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
   to process.

In this case, we just iterate over the partition set in order, and add newly split transitions
to the end of the list.

In the example, we would first visit set 0. We have that state 1 is distinguishable
by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.

We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.

Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
set -> nothing to do.

There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.

---

Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
approach: we split everything assuming fallback transition do not exist, then we refine
by the fallback transitions.

Let's take the following example:
    1 --a--> 3
    2 --a--> 3
    1 -[f]-> 4
    2 -[f]-> 5

and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
The states 1 and 2 are together because they cannot be distinguished by 'a', but
the fallback transition distinguishes them.

Since we have done every other split, we have one useful property: we know that every
state in every set transition with the exact set of characters within that set.
If that was not true, there would be one "distinguisher" 'x' that could spit the set
into two subsets: the one with the transition 'x' and the ones without.

Since all the transitions are the same, there is no overlap between the defined transition
and the fallback transition. Consequently, we can use the fallback transition as a whole
transition and use it to distinguish the states.

The fallback transitions are handled like any other transition, we have a partition of such
transitions and split by each of them. BUT, we can only use them after every unique transition
has been covered.

This trick is also what makes the minimization imperfect: it should be possible to merge
states with overlap in their fallback transitions but we would split them.

---

Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
work on this patch. Thanks for your wonderful papers about DFA minimization.

* WebCore.xcodeproj/project.pbxproj:
* contentextensions/ContentExtensionCompiler.cpp:
(WebCore::ContentExtensions::compileRuleList):
* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::minimize):
(WebCore::ContentExtensions::DFA::debugPrintDot):
* contentextensions/DFA.h:
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
* contentextensions/DFAMinimizer.cpp: Added.
(WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
(WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
(WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
(WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
(WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
(WebCore::ContentExtensions::DFAMinimizer::Partition::size):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
(WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
(WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
(WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
(WebCore::ContentExtensions::DFAMinimizer::minimize):
* contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
* contentextensions/DFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::NFAToDFA::convert):
(WebCore::ContentExtensions::simplifyTransitions): Deleted.

Tools:

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
* TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:

Modified Paths

Added Paths

Diff

Modified: trunk/Source/WebCore/ChangeLog (182807 => 182808)


--- trunk/Source/WebCore/ChangeLog	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/ChangeLog	2015-04-14 21:08:02 UTC (rev 182808)
@@ -1,3 +1,152 @@
+2015-04-14  Benjamin Poulain  <benja...@webkit.org>
+
+        Add a conservative DFA minimizer for the content extension matcher
+        https://bugs.webkit.org/show_bug.cgi?id=143501
+
+        Reviewed by Alex Christensen.
+
+        This patch adds a simple minimizer for DFA graphs. It is not a perfect minimizer:
+        some indistinguishable are not merged, but no two distinguishable are merged.
+
+        The general idea of the algorithm is to put all the state into a single set
+        and partition iteratively until it is impossible to split any subset by using
+        a transition to distinguish two states.
+
+        Let's ignore fallback transition for now, and I'll explain later how they fit in
+        the big picture.
+
+
+        The first thing we do is create a partition of the transition by grouping every
+        transition by the same character in the same subset. This partition of transitions
+        is the base by which we will partition the states.
+
+        Each subset in the transition partition is a "distinguisher" by which we can
+        separate the state partition.
+
+        We also create a second partition, the state partition. This is where we keep
+        all the subsets of states that have been split so far.
+
+        Let say we have the following graph.
+
+            1 --a--> 2
+            1 --b--> 3
+            2 --c--> 4 (final)
+            3 --c--> 4 (final)
+
+        The partition of transition would start with:
+        Set 0:
+            1 --a--> 2
+        Set 1:
+            1 --b--> 3
+        Set 2:
+            2 --c--> 4
+            3 --c--> 4
+
+        The state partition would have a single set with { 1, 2, 3, 4 }.
+
+
+        Next, we split the state partition by distinguishable final states. In this case,
+        we would split it into { 1, 2, 3 }, { 4 }.
+
+        We then refine the transition partition by splitting it by the states that have
+        been distinguished. Here, the only transitions to 4 are both is the same set (set 2),
+        so the transition partition remains the same.
+
+
+        We can now execute the main loop of the algorithm:
+        1) Split the states by the transitions.
+        2) Split the transitions that are now reaching two different sets of the state partition.
+        3) Add any newly discovered "distinguisher" (the ones we split) to the list of "distinguisher"
+           to process.
+
+        In this case, we just iterate over the partition set in order, and add newly split transitions
+        to the end of the list.
+
+        In the example, we would first visit set 0. We have that state 1 is distinguishable
+        by "a", and the state partition would become { 1 }, { 2, 3 }, { 4 }.
+
+        We then visit transition set 1, it distinguishes state 1 which is already alone -> nothing to do.
+
+        Finally, we process the transition set 2, it distinguishes 2 and 3, they are already in the same
+        set -> nothing to do.
+
+        There is no more transition to process, we have 3 unique subsets and we should merge 2 and 3.
+
+        ---
+
+        Okay, now how to we fit fallback transition in this model. In this patch, I take the conservative
+        approach: we split everything assuming fallback transition do not exist, then we refine
+        by the fallback transitions.
+
+        Let's take the following example:
+            1 --a--> 3
+            2 --a--> 3
+            1 -[f]-> 4
+            2 -[f]-> 5
+
+        and at this stage in the algorithm, we have the sets { 1, 2 }, { 3 }, { 4 }, { 5 }.
+        The states 1 and 2 are together because they cannot be distinguished by 'a', but
+        the fallback transition distinguishes them.
+
+        Since we have done every other split, we have one useful property: we know that every
+        state in every set transition with the exact set of characters within that set.
+        If that was not true, there would be one "distinguisher" 'x' that could spit the set
+        into two subsets: the one with the transition 'x' and the ones without.
+
+        Since all the transitions are the same, there is no overlap between the defined transition
+        and the fallback transition. Consequently, we can use the fallback transition as a whole
+        transition and use it to distinguish the states.
+
+        The fallback transitions are handled like any other transition, we have a partition of such
+        transitions and split by each of them. BUT, we can only use them after every unique transition
+        has been covered.
+
+        This trick is also what makes the minimization imperfect: it should be possible to merge
+        states with overlap in their fallback transitions but we would split them.
+
+        ---
+
+        Antti Valmari, Petri Lehtinen, Marie-Pierre Béal and Maxime Crochemore deserve credit for their indirect
+        work on this patch. Thanks for your wonderful papers about DFA minimization.
+
+        * WebCore.xcodeproj/project.pbxproj:
+        * contentextensions/ContentExtensionCompiler.cpp:
+        (WebCore::ContentExtensions::compileRuleList):
+        * contentextensions/DFA.cpp:
+        (WebCore::ContentExtensions::DFA::minimize):
+        (WebCore::ContentExtensions::DFA::debugPrintDot):
+        * contentextensions/DFA.h:
+        * contentextensions/DFABytecodeCompiler.cpp:
+        (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
+        * contentextensions/DFAMinimizer.cpp: Added.
+        (WebCore::ContentExtensions::DFAMinimizer::simplifyTransitions):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::initialize):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::markElementInCurrentGeneration):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::refineGeneration):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::iterateSet):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::setIndex):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::firstElementInSet):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::size):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::indexAfterMarkedElements):
+        (WebCore::ContentExtensions::DFAMinimizer::Partition::SetDescriptor::end):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::FullGraphPartition):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::markNode):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::refinePartitions):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByUniqueTransitions):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::splitByFallbackTransitions):
+        (WebCore::ContentExtensions::DFAMinimizer::FullGraphPartition::nodeReplacement):
+        (WebCore::ContentExtensions::DFAMinimizer::ActionKey::ActionKey):
+        (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isEmptyValue):
+        (WebCore::ContentExtensions::DFAMinimizer::ActionKey::isDeletedValue):
+        (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::hash):
+        (WebCore::ContentExtensions::DFAMinimizer::ActionKeyHash::equal):
+        (WebCore::ContentExtensions::DFAMinimizer::minimize):
+        * contentextensions/DFAMinimizer.h: Copied from Source/WebCore/contentextensions/DFA.h.
+        * contentextensions/DFANode.h:
+        * contentextensions/NFAToDFA.cpp:
+        (WebCore::ContentExtensions::NFAToDFA::convert):
+        (WebCore::ContentExtensions::simplifyTransitions): Deleted.
+
 2015-04-14  Chris Dumez  <cdu...@apple.com>
 
         ASSERT(frame().view() == this) assertion hit in FrameView::windowClipRect() on Windows bots

Modified: trunk/Source/WebCore/WebCore.xcodeproj/project.pbxproj (182807 => 182808)


--- trunk/Source/WebCore/WebCore.xcodeproj/project.pbxproj	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/WebCore.xcodeproj/project.pbxproj	2015-04-14 21:08:02 UTC (rev 182808)
@@ -1003,16 +1003,18 @@
 		26601EBF14B3B9AD0012C0FE /* PlatformEventFactoryIOS.h in Headers */ = {isa = PBXBuildFile; fileRef = 26601EBD14B3B9AD0012C0FE /* PlatformEventFactoryIOS.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		26601EC014B3B9AD0012C0FE /* PlatformEventFactoryIOS.mm in Sources */ = {isa = PBXBuildFile; fileRef = 26601EBE14B3B9AD0012C0FE /* PlatformEventFactoryIOS.mm */; };
 		267725FC1A5B3AD9003C24DD /* DFA.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 267725F61A5B3AD9003C24DD /* DFA.cpp */; };
-		267725FD1A5B3AD9003C24DD /* DFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725F71A5B3AD9003C24DD /* DFA.h */; };
-		267725FF1A5B3AD9003C24DD /* DFANode.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725F91A5B3AD9003C24DD /* DFANode.h */; };
+		267725FD1A5B3AD9003C24DD /* DFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725F71A5B3AD9003C24DD /* DFA.h */; settings = {ATTRIBUTES = (Private, ); }; };
+		267725FF1A5B3AD9003C24DD /* DFANode.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725F91A5B3AD9003C24DD /* DFANode.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		267726001A5B3AD9003C24DD /* NFAToDFA.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 267725FA1A5B3AD9003C24DD /* NFAToDFA.cpp */; };
-		267726011A5B3AD9003C24DD /* NFAToDFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725FB1A5B3AD9003C24DD /* NFAToDFA.h */; };
+		267726011A5B3AD9003C24DD /* NFAToDFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 267725FB1A5B3AD9003C24DD /* NFAToDFA.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		267726041A5DF6F2003C24DD /* URLFilterParser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 267726021A5DF6F2003C24DD /* URLFilterParser.cpp */; };
 		267726051A5DF6F2003C24DD /* URLFilterParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 267726031A5DF6F2003C24DD /* URLFilterParser.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		269239961505E1AA009E57FC /* JSIDBVersionChangeEvent.h in Headers */ = {isa = PBXBuildFile; fileRef = 269239921505E1AA009E57FC /* JSIDBVersionChangeEvent.h */; };
 		269397221A4A412F00E8349D /* NFANode.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397201A4A412F00E8349D /* NFANode.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		269397241A4A5B6400E8349D /* NFA.h in Headers */ = {isa = PBXBuildFile; fileRef = 269397231A4A5B6400E8349D /* NFA.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		269397261A4A5FBD00E8349D /* NFA.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 269397251A4A5FBD00E8349D /* NFA.cpp */; };
+		26A517FD1AB92238006335DF /* DFAMinimizer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 26A517FB1AB92238006335DF /* DFAMinimizer.cpp */; };
+		26A517FE1AB92238006335DF /* DFAMinimizer.h in Headers */ = {isa = PBXBuildFile; fileRef = 26A517FC1AB92238006335DF /* DFAMinimizer.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		26AA0F9E18D2A18B00419381 /* SelectorPseudoElementTypeMap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 26AA0F9D18D2A18B00419381 /* SelectorPseudoElementTypeMap.cpp */; };
 		26B9998F1803AE7200D01121 /* RegisterAllocator.h in Headers */ = {isa = PBXBuildFile; fileRef = 26B9998E1803AE7200D01121 /* RegisterAllocator.h */; };
 		26B999911803B3C900D01121 /* StackAllocator.h in Headers */ = {isa = PBXBuildFile; fileRef = 26B999901803B3C900D01121 /* StackAllocator.h */; };
@@ -8102,6 +8104,8 @@
 		269397201A4A412F00E8349D /* NFANode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = NFANode.h; sourceTree = "<group>"; };
 		269397231A4A5B6400E8349D /* NFA.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = NFA.h; sourceTree = "<group>"; };
 		269397251A4A5FBD00E8349D /* NFA.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = NFA.cpp; sourceTree = "<group>"; };
+		26A517FB1AB92238006335DF /* DFAMinimizer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DFAMinimizer.cpp; sourceTree = "<group>"; };
+		26A517FC1AB92238006335DF /* DFAMinimizer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DFAMinimizer.h; sourceTree = "<group>"; };
 		26AA0F9918D2973D00419381 /* makeSelectorPseudoElementsMap.py */ = {isa = PBXFileReference; lastKnownFileType = text.script.python; path = makeSelectorPseudoElementsMap.py; sourceTree = "<group>"; };
 		26AA0F9A18D2973D00419381 /* SelectorPseudoElementTypeMap.in */ = {isa = PBXFileReference; lastKnownFileType = text; path = SelectorPseudoElementTypeMap.in; sourceTree = "<group>"; };
 		26AA0F9D18D2A18B00419381 /* SelectorPseudoElementTypeMap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SelectorPseudoElementTypeMap.cpp; sourceTree = "<group>"; };
@@ -15535,6 +15539,8 @@
 				5C39305F1AA0F6A90029C816 /* DFABytecodeCompiler.h */,
 				5C3930601AA0F6A90029C816 /* DFABytecodeInterpreter.cpp */,
 				5C3930611AA0F6A90029C816 /* DFABytecodeInterpreter.h */,
+				26A517FB1AB92238006335DF /* DFAMinimizer.cpp */,
+				26A517FC1AB92238006335DF /* DFAMinimizer.h */,
 				267725F91A5B3AD9003C24DD /* DFANode.h */,
 				269397251A4A5FBD00E8349D /* NFA.cpp */,
 				269397231A4A5B6400E8349D /* NFA.h */,
@@ -23659,6 +23665,7 @@
 				316FE1120E6E1DA700BF6088 /* AnimationBase.h in Headers */,
 				316FE1140E6E1DA700BF6088 /* AnimationController.h in Headers */,
 				0F15DA8A0F3AAEE70000CE47 /* AnimationControllerPrivate.h in Headers */,
+				26A517FE1AB92238006335DF /* DFAMinimizer.h in Headers */,
 				5CBC8DAD1AAA302200E1C803 /* MediaAccessibilitySoftLink.h in Headers */,
 				49E912AD0EFAC906009D0CAF /* AnimationList.h in Headers */,
 				6C568CB119DAFEA000430CA2 /* MaskImageOperation.h in Headers */,
@@ -30363,6 +30370,7 @@
 				A833C7CC0A2CF07400D57664 /* XLinkNames.cpp in Sources */,
 				00B9318713BA8DB30035A948 /* XMLDocumentParser.cpp in Sources */,
 				00B9318913BA8DBC0035A948 /* XMLDocumentParserLibxml2.cpp in Sources */,
+				26A517FD1AB92238006335DF /* DFAMinimizer.cpp in Sources */,
 				00B9318B13BA8DC90035A948 /* XMLDocumentParserScope.cpp in Sources */,
 				59C28045138DC2410079B7E2 /* XMLErrors.cpp in Sources */,
 				BC772C460C4EB2C60083285F /* XMLHttpRequest.cpp in Sources */,

Modified: trunk/Source/WebCore/contentextensions/ContentExtensionCompiler.cpp (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/ContentExtensionCompiler.cpp	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/ContentExtensionCompiler.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -189,12 +189,29 @@
 #endif
 
 #if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
-    double dfaBuildTimeStart = monotonicallyIncreasingTime();
+    double totalNFAToByteCodeBuildTimeStart = monotonicallyIncreasingTime();
 #endif
+
     Vector<DFABytecode> bytecode;
     for (size_t i = 0; i < nfas.size(); ++i) {
+#if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
+        double dfaBuildTimeStart = monotonicallyIncreasingTime();
+#endif
         DFA dfa = NFAToDFA::convert(nfas[i]);
+#if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
+        double dfaBuildTimeEnd = monotonicallyIncreasingTime();
+        dataLogF("    Time spent building the DFA %zu: %f\n", i, (dfaBuildTimeEnd - dfaBuildTimeStart));
+#endif
 
+#if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
+        double dfaMinimizationTimeStart = monotonicallyIncreasingTime();
+#endif
+        dfa.minimize();
+#if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
+        double dfaMinimizationTimeEnd = monotonicallyIncreasingTime();
+        dataLogF("    Time spent miniminizing the DFA %zu: %f\n", i, (dfaMinimizationTimeEnd - dfaMinimizationTimeStart));
+#endif
+
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
         WTFLogAlways("DFA %zu", i);
         dfa.debugPrintDot();
@@ -211,8 +228,8 @@
     }
 
 #if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
-    double dfaBuildTimeEnd = monotonicallyIncreasingTime();
-    dataLogF("    Time spent building and compiling the DFAs: %f\n", (dfaBuildTimeEnd - dfaBuildTimeStart));
+    double totalNFAToByteCodeBuildTimeEnd = monotonicallyIncreasingTime();
+    dataLogF("    Time spent building and compiling the DFAs: %f\n", (totalNFAToByteCodeBuildTimeEnd - totalNFAToByteCodeBuildTimeStart));
     dataLogF("    Bytecode size %zu\n", bytecode.size());
     dataLogF("    DFA count %zu\n", nfas.size());
 #endif

Modified: trunk/Source/WebCore/contentextensions/DFA.cpp (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/DFA.cpp	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/DFA.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -28,6 +28,7 @@
 
 #if ENABLE(CONTENT_EXTENSIONS)
 
+#include "DFAMinimizer.h"
 #include <wtf/DataLog.h>
 
 namespace WebCore {
@@ -59,6 +60,11 @@
     return *this;
 }
 
+void DFA::minimize()
+{
+    m_root = DFAMinimizer::minimize(m_nodes, m_root);
+}
+
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
 static void printRange(bool firstRange, char rangeStart, char rangeEnd)
 {
@@ -129,6 +135,9 @@
     dataLogF("    node [shape=circle];\n");
     dataLogF("    {\n");
     for (unsigned i = 0; i < m_nodes.size(); ++i) {
+        if (m_nodes[i].isKilled)
+            continue;
+
         dataLogF("         %d [label=<Node %d", i, i);
         const Vector<uint64_t>& actions = m_nodes[i].actions;
         if (!actions.isEmpty()) {

Modified: trunk/Source/WebCore/contentextensions/DFA.h (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/DFA.h	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/DFA.h	2015-04-14 21:08:02 UTC (rev 182808)
@@ -37,7 +37,7 @@
 namespace ContentExtensions {
 
 // The DFA abstract a partial DFA graph in a compact form.
-class DFA {
+class WEBCORE_EXPORT DFA {
 public:
     DFA();
     DFA(Vector<DFANode>&& nodes, unsigned rootIndex);
@@ -50,6 +50,8 @@
     const DFANode& nodeAt(unsigned i) const { return m_nodes[i]; }
     DFANode& nodeAt(unsigned i) { return m_nodes[i]; }
 
+    void minimize();
+
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING
     void debugPrintDot() const;
 #endif

Modified: trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -96,6 +96,10 @@
 void DFABytecodeCompiler::compileNode(unsigned index, bool root)
 {
     const DFANode& node = m_dfa.nodeAt(index);
+    if (node.isKilled) {
+        m_nodeStartOffsets[index] = std::numeric_limits<unsigned>::max();
+        return;
+    }
 
     // Record starting index for linking.
     if (!root)
@@ -214,8 +218,14 @@
     }
 
     // Link.
-    for (const auto& linkRecord : m_linkRecords)
-        set32Bits(m_bytecode, linkRecord.first, m_nodeStartOffsets[linkRecord.second]);
+    for (const auto& linkRecord : m_linkRecords) {
+        unsigned offset = linkRecord.first;
+        ASSERT(!(*reinterpret_cast<unsigned*>(&m_bytecode[offset])));
+
+        unsigned target = m_nodeStartOffsets[linkRecord.second];
+        RELEASE_ASSERT(target != std::numeric_limits<unsigned>::max());
+        set32Bits(m_bytecode, offset, m_nodeStartOffsets[linkRecord.second]);
+    }
     
     // Set size header.
     set32Bits(m_bytecode, startLocation, m_bytecode.size() - startLocation);

Added: trunk/Source/WebCore/contentextensions/DFAMinimizer.cpp (0 => 182808)


--- trunk/Source/WebCore/contentextensions/DFAMinimizer.cpp	                        (rev 0)
+++ trunk/Source/WebCore/contentextensions/DFAMinimizer.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -0,0 +1,519 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "DFAMinimizer.h"
+
+#if ENABLE(CONTENT_EXTENSIONS)
+
+#include <wtf/HashSet.h>
+#include <wtf/StringHasher.h>
+
+namespace WebCore {
+namespace ContentExtensions {
+
+namespace {
+
+// simplifyTransitions() tries to collapse individual transitions into fallback transitions.
+// After simplifyTransitions(), you can also make the assumption that a fallback transition target will never be
+// also the target of an individual transition.
+static void simplifyTransitions(Vector<DFANode>& dfaGraph)
+{
+    for (DFANode& dfaNode : dfaGraph) {
+        if (!dfaNode.hasFallbackTransition
+            && ((dfaNode.transitions.size() == 126 && !dfaNode.transitions.contains(0))
+                || (dfaNode.transitions.size() == 127 && dfaNode.transitions.contains(0)))) {
+            unsigned bestTarget = std::numeric_limits<unsigned>::max();
+            unsigned bestTargetScore = 0;
+            HashMap<unsigned, unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> targetHistogram;
+            for (const auto& transition : dfaNode.transitions) {
+                if (!transition.key)
+                    continue;
+
+                unsigned transitionTarget = transition.value;
+                auto addResult = targetHistogram.add(transitionTarget, 1);
+                if (!addResult.isNewEntry)
+                    addResult.iterator->value++;
+
+                if (addResult.iterator->value > bestTargetScore) {
+                    bestTargetScore = addResult.iterator->value;
+                    bestTarget = transitionTarget;
+                }
+            }
+            ASSERT_WITH_MESSAGE(bestTargetScore, "There should be at least one valid target since having transitions is a precondition to enter this path.");
+
+            dfaNode.hasFallbackTransition = true;
+            dfaNode.fallbackTransition = bestTarget;
+        }
+
+        if (dfaNode.hasFallbackTransition) {
+            Vector<uint16_t, 128> keys;
+            DFANodeTransitions& transitions = dfaNode.transitions;
+            copyKeysToVector(transitions, keys);
+
+            for (uint16_t key : keys) {
+                auto transitionIterator = transitions.find(key);
+                if (transitionIterator->value == dfaNode.fallbackTransition)
+                    transitions.remove(transitionIterator);
+            }
+        }
+    }
+}
+
+class Partition {
+public:
+    void initialize(unsigned size)
+    {
+        if (!size)
+            return;
+
+        m_sets.reserveInitialCapacity(size);
+        m_partitionedElements.resize(size);
+        m_elementPositionInPartitionedNodes.resize(size);
+        m_elementToSetMap.resize(size);
+
+        for (unsigned i = 0; i < size; ++i) {
+            m_partitionedElements[i] = i;
+            m_elementPositionInPartitionedNodes[i] = i;
+            m_elementToSetMap[i] = 0;
+        }
+        m_sets.append(SetDescriptor({ 0, size, 0 }));
+    }
+
+    void markElementInCurrentGeneration(unsigned elementIndex)
+    {
+        // Swap the node with the first unmarked node.
+        unsigned setIndex = m_elementToSetMap[elementIndex];
+        SetDescriptor& setDescriptor = m_sets[setIndex];
+
+        unsigned elementPositionInPartition = m_elementPositionInPartitionedNodes[elementIndex];
+        ASSERT(elementPositionInPartition >= setDescriptor.start);
+        ASSERT(elementPositionInPartition < setDescriptor.end());
+
+        unsigned firstUnmarkedElementPositionInPartition = setDescriptor.indexAfterMarkedElements();
+        ASSERT(firstUnmarkedElementPositionInPartition >= setDescriptor.start && firstUnmarkedElementPositionInPartition < setDescriptor.end());
+        ASSERT(firstUnmarkedElementPositionInPartition >= firstUnmarkedElementPositionInPartition);
+
+        // Swap the nodes in the set.
+        unsigned firstUnmarkedElement = m_partitionedElements[firstUnmarkedElementPositionInPartition];
+        m_partitionedElements[firstUnmarkedElementPositionInPartition] = elementIndex;
+        m_partitionedElements[elementPositionInPartition] = firstUnmarkedElement;
+
+        // Update their index.
+        m_elementPositionInPartitionedNodes[elementIndex] = firstUnmarkedElementPositionInPartition;
+        m_elementPositionInPartitionedNodes[firstUnmarkedElement] = elementPositionInPartition;
+
+        if (!setDescriptor.markedCount) {
+            ASSERT(!m_setsMarkedInCurrentGeneration.contains(setIndex));
+            m_setsMarkedInCurrentGeneration.append(setIndex);
+        }
+        ++setDescriptor.markedCount;
+    }
+
+    // The function passed as argument MUST not modify the partition.
+    template<typename Function>
+    void refineGeneration(const Function& function)
+    {
+        for (unsigned setIndex : m_setsMarkedInCurrentGeneration) {
+            SetDescriptor& setDescriptor = m_sets[setIndex];
+            if (setDescriptor.markedCount == setDescriptor.size) {
+                // Everything is marked, there is nothing to refine.
+                setDescriptor.markedCount = 0;
+                continue;
+            }
+
+            SetDescriptor newSet;
+            bool newSetIsMarkedSet = setDescriptor.markedCount * 2 <= setDescriptor.size;
+            if (newSetIsMarkedSet) {
+                // Less than half of the nodes have been marked.
+                newSet = { setDescriptor.start, setDescriptor.markedCount, 0 };
+                setDescriptor.start = setDescriptor.start + setDescriptor.markedCount;
+            } else
+                newSet = { setDescriptor.start + setDescriptor.markedCount, setDescriptor.size - setDescriptor.markedCount, 0 };
+            setDescriptor.size -= newSet.size;
+            setDescriptor.markedCount = 0;
+
+            unsigned newSetIndex = m_sets.size();
+            m_sets.append(newSet);
+
+            for (unsigned i = newSet.start; i < newSet.end(); ++i)
+                m_elementToSetMap[m_partitionedElements[i]] = newSetIndex;
+
+            function(newSetIndex);
+        }
+        m_setsMarkedInCurrentGeneration.clear();
+    }
+
+    // Call Function() on every node of a given subset.
+    template<typename Function>
+    void iterateSet(unsigned setIndex, const Function& function)
+    {
+        SetDescriptor& setDescriptor = m_sets[setIndex];
+        for (unsigned i = setDescriptor.start; i < setDescriptor.end(); ++i)
+            function(m_partitionedElements[i]);
+    }
+
+    // Index of the set containing the Node.
+    unsigned setIndex(unsigned elementIndex) const
+    {
+        return m_elementToSetMap[elementIndex];
+    }
+
+    // NodeIndex of the first element in the set.
+    unsigned firstElementInSet(unsigned setIndex) const
+    {
+        return m_partitionedElements[m_sets[setIndex].start];
+    }
+
+    unsigned size() const
+    {
+        return m_sets.size();
+    }
+
+private:
+    struct SetDescriptor {
+        unsigned start;
+        unsigned size;
+        unsigned markedCount;
+
+        unsigned indexAfterMarkedElements() const { return start + markedCount; }
+        unsigned end() const { return start + size; }
+    };
+
+    // List of sets.
+    Vector<SetDescriptor> m_sets;
+
+    // All the element indices such that two elements of the same set never have a element of a different set between them.
+    Vector<unsigned> m_partitionedElements;
+
+    // Map elementIndex->position in the partitionedElements.
+    Vector<unsigned> m_elementPositionInPartitionedNodes;
+
+    // Map elementIndex->SetIndex.
+    Vector<unsigned> m_elementToSetMap;
+
+    // List of sets with any marked node. Each set can appear at most once.
+    // FIXME: find a good inline size for this.
+    Vector<unsigned, 128> m_setsMarkedInCurrentGeneration;
+};
+
+class FullGraphPartition {
+public:
+    FullGraphPartition(const Vector<DFANode>& dfaGraph)
+    {
+        m_nodePartition.initialize(dfaGraph.size());
+
+        m_flattenedTransitionsStartOffsetPerNode.resize(dfaGraph.size());
+        for (unsigned& counter : m_flattenedTransitionsStartOffsetPerNode)
+            counter = 0;
+
+        m_flattenedFallbackTransitionsStartOffsetPerNode.resize(dfaGraph.size());
+        for (unsigned& counter : m_flattenedFallbackTransitionsStartOffsetPerNode)
+            counter = 0;
+
+        // Count the number of incoming transitions per node.
+        for (const DFANode& dfaNode : dfaGraph) {
+            for (const auto& transition : dfaNode.transitions)
+                ++m_flattenedTransitionsStartOffsetPerNode[transition.value];
+            if (dfaNode.hasFallbackTransition)
+                ++m_flattenedFallbackTransitionsStartOffsetPerNode[dfaNode.fallbackTransition];
+        }
+
+        // Accumulate the offsets.
+        unsigned transitionAccumulator = 0;
+        for (unsigned i = 0; i < m_flattenedTransitionsStartOffsetPerNode.size(); ++i) {
+            unsigned transitionsCountForNode = m_flattenedTransitionsStartOffsetPerNode[i];
+            m_flattenedTransitionsStartOffsetPerNode[i] = transitionAccumulator;
+            transitionAccumulator += transitionsCountForNode;
+        }
+        unsigned flattenedTransitionsSize = transitionAccumulator;
+
+        unsigned fallbackTransitionAccumulator = 0;
+        for (unsigned i = 0; i < m_flattenedFallbackTransitionsStartOffsetPerNode.size(); ++i) {
+            unsigned fallbackTransitionsCountForNode = m_flattenedFallbackTransitionsStartOffsetPerNode[i];
+            m_flattenedFallbackTransitionsStartOffsetPerNode[i] = fallbackTransitionAccumulator;
+            fallbackTransitionAccumulator += fallbackTransitionsCountForNode;
+        }
+        unsigned flattenedFallbackTransitionsSize = fallbackTransitionAccumulator;
+
+        // Next, let's fill the transition table and set up the size of each group at the same time.
+        m_flattenedTransitionsSizePerNode.resize(dfaGraph.size());
+        for (unsigned& counter : m_flattenedTransitionsSizePerNode)
+            counter = 0;
+
+        m_flattenedFallbackTransitionsSizePerNode.resize(dfaGraph.size());
+        for (unsigned& counter : m_flattenedFallbackTransitionsSizePerNode)
+            counter = 0;
+
+        m_flattenedTransitions.resize(flattenedTransitionsSize);
+
+        m_flattenedFallbackTransitions.resize(flattenedFallbackTransitionsSize);
+
+        for (unsigned i = 0; i < dfaGraph.size(); ++i) {
+            const DFANode& dfaNode = dfaGraph[i];
+            for (const auto& transition : dfaNode.transitions) {
+                unsigned targetNodeIndex = transition.value;
+
+                unsigned start = m_flattenedTransitionsStartOffsetPerNode[targetNodeIndex];
+                unsigned offset = m_flattenedTransitionsSizePerNode[targetNodeIndex];
+
+                m_flattenedTransitions[start + offset] = Transition({ i, targetNodeIndex, transition.key });
+
+                ++m_flattenedTransitionsSizePerNode[targetNodeIndex];
+            }
+            if (dfaNode.hasFallbackTransition) {
+                unsigned targetNodeIndex = dfaNode.fallbackTransition;
+
+                unsigned start = m_flattenedFallbackTransitionsStartOffsetPerNode[targetNodeIndex];
+                unsigned offset = m_flattenedFallbackTransitionsSizePerNode[targetNodeIndex];
+
+                m_flattenedFallbackTransitions[start + offset] = i;
+                ++m_flattenedFallbackTransitionsSizePerNode[targetNodeIndex];
+            }
+        }
+
+        // Create the initial partition of transition. Each character differentiating a transiton
+        // from an other gets its own set in the partition.
+        m_transitionPartition.initialize(m_flattenedTransitions.size());
+        for (uint16_t i = 0; i < 128; ++i) {
+            for (unsigned transitionIndex = 0; transitionIndex < m_flattenedTransitions.size(); ++transitionIndex) {
+                const Transition& transition = m_flattenedTransitions[transitionIndex];
+                if (transition.character == i)
+                    m_transitionPartition.markElementInCurrentGeneration(transitionIndex);
+            }
+            m_transitionPartition.refineGeneration([](unsigned) { });
+        }
+
+        // Fallback partitions are considered as a special type of differentiator, we don't split them initially.
+        m_fallbackTransitionPartition.initialize(m_flattenedFallbackTransitions.size());
+    }
+
+    void markNode(unsigned nodeIndex)
+    {
+        m_nodePartition.markElementInCurrentGeneration(nodeIndex);
+    }
+
+    void refinePartitions()
+    {
+        m_nodePartition.refineGeneration([&](unsigned smallestSetIndex) {
+            m_nodePartition.iterateSet(smallestSetIndex, [&](unsigned nodeIndex) {
+                unsigned incomingTransitionsStartForNode = m_flattenedTransitionsStartOffsetPerNode[nodeIndex];
+                unsigned incomingTransitionsSizeForNode = m_flattenedTransitionsSizePerNode[nodeIndex];
+
+                for (unsigned i = 0; i < incomingTransitionsSizeForNode; ++i)
+                    m_transitionPartition.markElementInCurrentGeneration(incomingTransitionsStartForNode + i);
+
+                unsigned incomingFallbackTransitionsStartForNode = m_flattenedFallbackTransitionsStartOffsetPerNode[nodeIndex];
+                unsigned incomingFallbackTransitionsSizeForNode = m_flattenedFallbackTransitionsSizePerNode[nodeIndex];
+
+                for (unsigned i = 0; i < incomingFallbackTransitionsSizeForNode; ++i)
+                    m_fallbackTransitionPartition.markElementInCurrentGeneration(incomingFallbackTransitionsStartForNode + i);
+            });
+
+            // We only need to split the transitions, we handle the new sets through the main loop.
+            m_transitionPartition.refineGeneration([](unsigned) { });
+            m_fallbackTransitionPartition.refineGeneration([](unsigned) { });
+        });
+    }
+
+    void splitByUniqueTransitions()
+    {
+        for (; m_nextTransitionSetToProcess < m_transitionPartition.size(); ++m_nextTransitionSetToProcess) {
+            // We use the known splitters to refine the set.
+            m_transitionPartition.iterateSet(m_nextTransitionSetToProcess, [&](unsigned transitionIndex) {
+                unsigned sourceNodeIndex = m_flattenedTransitions[transitionIndex].source;
+                m_nodePartition.markElementInCurrentGeneration(sourceNodeIndex);
+            });
+
+            refinePartitions();
+        }
+    }
+
+    void splitByFallbackTransitions()
+    {
+        ASSERT_WITH_MESSAGE(m_nextTransitionSetToProcess || !m_transitionPartition.size(), "We can only distinguish nodes by fallback transition *after* all other transitions are covered. Doing otherwise would be incorrect since the unique transitions from 2 nodes could cover all possible transitions.");
+
+        for (unsigned fallbackTransitionSetIndex = 0; fallbackTransitionSetIndex < m_fallbackTransitionPartition.size(); ++fallbackTransitionSetIndex) {
+
+            m_fallbackTransitionPartition.iterateSet(fallbackTransitionSetIndex, [&](unsigned transitionIndex) {
+                unsigned sourceNodeIndex = m_flattenedFallbackTransitions[transitionIndex];
+                m_nodePartition.markElementInCurrentGeneration(sourceNodeIndex);
+            });
+            refinePartitions();
+
+            // Any new split need to spread to all the unique transition that can reach the two new sets.
+            splitByUniqueTransitions();
+        }
+    }
+
+    unsigned nodeReplacement(unsigned nodeIndex)
+    {
+        unsigned setIndex = m_nodePartition.setIndex(nodeIndex);
+        return m_nodePartition.firstElementInSet(setIndex);
+    }
+
+private:
+    struct Transition {
+        unsigned source;
+        unsigned target;
+        uint16_t character;
+    };
+
+    Vector<unsigned> m_flattenedTransitionsStartOffsetPerNode;
+    Vector<unsigned> m_flattenedTransitionsSizePerNode;
+    Vector<unsigned> m_flattenedFallbackTransitionsStartOffsetPerNode;
+    Vector<unsigned> m_flattenedFallbackTransitionsSizePerNode;
+
+    Vector<Transition> m_flattenedTransitions;
+    Vector<unsigned> m_flattenedFallbackTransitions;
+
+    Partition m_nodePartition;
+    Partition m_transitionPartition;
+    Partition m_fallbackTransitionPartition;
+
+    unsigned m_nextTransitionSetToProcess { 0 };
+};
+
+struct ActionKey {
+    enum DeletedValueTag { DeletedValue };
+    explicit ActionKey(DeletedValueTag) { state = Deleted; }
+
+    enum EmptyValueTag { EmptyValue };
+    explicit ActionKey(EmptyValueTag) { state = Empty; }
+
+    explicit ActionKey(const Vector<uint64_t>& actions)
+        : actions(&actions)
+        , state(Valid)
+    {
+        StringHasher hasher;
+        hasher.addCharactersAssumingAligned(reinterpret_cast<const UChar*>(actions.data()), actions.size() * sizeof(uint64_t) / sizeof(UChar));
+        hash = hasher.hash();
+    }
+
+    bool isEmptyValue() const { return state == Empty; }
+    bool isDeletedValue() const { return state == Deleted; }
+
+    const Vector<uint64_t>* actions;
+    unsigned hash;
+    enum {
+        Valid,
+        Empty,
+        Deleted
+    } state;
+};
+
+struct ActionKeyHash {
+    static unsigned hash(const ActionKey& actionKey)
+    {
+        return actionKey.hash;
+    }
+
+    static bool equal(const ActionKey& a, const ActionKey& b)
+    {
+        return a.state == b.state && *a.actions == *b.actions;
+    }
+    static const bool safeToCompareToEmptyOrDeleted = false;
+};
+
+struct ActionKeyHashTraits : public WTF::CustomHashTraits<ActionKey> {
+    static const bool emptyValueIsZero = true;
+};
+
+} // anonymous namespace.
+
+unsigned DFAMinimizer::minimize(Vector<DFANode>& dfaGraph, unsigned root)
+{
+    simplifyTransitions(dfaGraph);
+
+    FullGraphPartition fullGraphPartition(dfaGraph);
+
+    // Unlike traditional minimization final states can be differentiated by their action.
+    // Instead of creating a single set for the final state, we partition by actions from
+    // the start.
+    HashMap<ActionKey, Vector<unsigned>, ActionKeyHash, ActionKeyHashTraits> finalStates;
+    for (unsigned i = 0; i < dfaGraph.size(); ++i) {
+        Vector<uint64_t>& actions = dfaGraph[i].actions;
+        if (actions.size()) {
+            std::sort(actions.begin(), actions.end());
+            auto addResult = finalStates.add(ActionKey(actions), Vector<unsigned>());
+            addResult.iterator->value.append(i);
+        }
+    }
+
+    for (const auto& slot : finalStates) {
+        for (unsigned finalStateIndex : slot.value)
+            fullGraphPartition.markNode(finalStateIndex);
+        fullGraphPartition.refinePartitions();
+    }
+
+    // Use every splitter to refine the node partitions.
+    fullGraphPartition.splitByUniqueTransitions();
+
+    // At this stage, we know that no pair of state can be distinguished by the individual transitions. They can still
+    // be distinguished by their fallback transitions.
+    //
+    // For example, two states [1, 2] can both have a transition on 'x' to a state [3], but each have fallback transition
+    // to different states [4] and [5].
+    //
+    // Here, we distinguish such cases and at each stage refine the new discovered sets by their individual transitions.
+    fullGraphPartition.splitByFallbackTransitions();
+
+    Vector<unsigned> relocationVector;
+    relocationVector.reserveInitialCapacity(dfaGraph.size());
+    for (unsigned i = 0; i < dfaGraph.size(); ++i)
+        relocationVector.append(i);
+
+    // Kill the useless nodes and keep track of the new node transitions should point to.
+    for (unsigned i = 0; i < dfaGraph.size(); ++i) {
+        unsigned replacement = fullGraphPartition.nodeReplacement(i);
+        if (i != replacement) {
+            relocationVector[i] = replacement;
+
+            DFANode& node = dfaGraph[i];
+            node.actions.clear();
+            node.transitions.clear();
+            node.hasFallbackTransition = false;
+            node.isKilled = true;
+        }
+    }
+
+    for (DFANode& node : dfaGraph) {
+        for (auto& transition : node.transitions)
+            transition.value = relocationVector[transition.value];
+        if (node.hasFallbackTransition)
+            node.fallbackTransition = relocationVector[node.fallbackTransition];
+    }
+
+    // After minimizing, there is no guarantee individual transition are still poiting to different states.
+    // The state pointed by one individual transition and the fallback states may have been merged.
+    simplifyTransitions(dfaGraph);
+
+    return relocationVector[root];
+}
+
+} // namespace ContentExtensions
+} // namespace WebCore
+
+#endif // ENABLE(CONTENT_EXTENSIONS)

Added: trunk/Source/WebCore/contentextensions/DFAMinimizer.h (0 => 182808)


--- trunk/Source/WebCore/contentextensions/DFAMinimizer.h	                        (rev 0)
+++ trunk/Source/WebCore/contentextensions/DFAMinimizer.h	2015-04-14 21:08:02 UTC (rev 182808)
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef DFAMinimizer_h
+#define DFAMinimizer_h
+
+#if ENABLE(CONTENT_EXTENSIONS)
+
+#include "DFANode.h"
+#include <wtf/Vector.h>
+
+namespace WebCore {
+namespace ContentExtensions {
+
+class DFAMinimizer {
+public:
+    WEBCORE_EXPORT static unsigned minimize(Vector<DFANode>& dfaGraph, unsigned rootNode);
+};
+
+} // namespace ContentExtensions
+} // namespace WebCore
+
+#endif // ENABLE(CONTENT_EXTENSIONS)
+
+#endif // DFAMinimizer_h

Modified: trunk/Source/WebCore/contentextensions/DFANode.h (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/DFANode.h	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/DFANode.h	2015-04-14 21:08:02 UTC (rev 182808)
@@ -44,7 +44,8 @@
 class DFANode {
 public:
     DFANodeTransitions transitions;
-    bool hasFallbackTransition = false;
+    bool hasFallbackTransition { false };
+    bool isKilled { false };
     unsigned fallbackTransition;
     Vector<uint64_t> actions;
 

Modified: trunk/Source/WebCore/contentextensions/NFAToDFA.cpp (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -338,49 +338,6 @@
     return uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
 }
 
-static void simplifyTransitions(Vector<DFANode>& dfaGraph)
-{
-    for (DFANode& dfaNode : dfaGraph) {
-        if (!dfaNode.hasFallbackTransition
-            && ((dfaNode.transitions.size() == 126 && !dfaNode.transitions.contains(0))
-                || (dfaNode.transitions.size() == 127 && dfaNode.transitions.contains(0)))) {
-            unsigned bestTarget = std::numeric_limits<unsigned>::max();
-            unsigned bestTargetScore = 0;
-            HashMap<unsigned, unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> targetHistogram;
-            for (const auto transition : dfaNode.transitions) {
-                if (!transition.key)
-                    continue;
-
-                unsigned transitionTarget = transition.value;
-                auto addResult = targetHistogram.add(transitionTarget, 1);
-                if (!addResult.isNewEntry)
-                    addResult.iterator->value++;
-
-                if (addResult.iterator->value > bestTargetScore) {
-                    bestTargetScore = addResult.iterator->value;
-                    bestTarget = transitionTarget;
-                }
-            }
-            ASSERT_WITH_MESSAGE(bestTargetScore, "There should be at least one valid target since having transitions is a precondition to enter this path.");
-
-            dfaNode.hasFallbackTransition = true;
-            dfaNode.fallbackTransition = bestTarget;
-        }
-
-        if (dfaNode.hasFallbackTransition) {
-            Vector<uint16_t, 128> keys;
-            DFANodeTransitions& transitions = dfaNode.transitions;
-            copyKeysToVector(transitions, keys);
-
-            for (uint16_t key : keys) {
-                auto transitionIterator = transitions.find(key);
-                if (transitionIterator->value == dfaNode.fallbackTransition)
-                    transitions.remove(transitionIterator);
-            }
-        }
-    }
-}
-
 DFA NFAToDFA::convert(NFA& nfa)
 {
     Vector<NFANode>& nfaGraph = nfa.m_nodes;
@@ -430,7 +387,6 @@
         }
     } while (!unprocessedNodes.isEmpty());
 
-    simplifyTransitions(dfaGraph);
     return DFA(WTF::move(dfaGraph), 0);
 }
 

Modified: trunk/Source/WebCore/contentextensions/NFAToDFA.h (182807 => 182808)


--- trunk/Source/WebCore/contentextensions/NFAToDFA.h	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Source/WebCore/contentextensions/NFAToDFA.h	2015-04-14 21:08:02 UTC (rev 182808)
@@ -39,7 +39,7 @@
 // NFAToDFA provides a way to build a DFA corresponding to a NFA.
 class NFAToDFA {
 public:
-    static DFA convert(NFA&);
+    WEBCORE_EXPORT static DFA convert(NFA&);
 };
 
 }

Modified: trunk/Tools/ChangeLog (182807 => 182808)


--- trunk/Tools/ChangeLog	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Tools/ChangeLog	2015-04-14 21:08:02 UTC (rev 182808)
@@ -1,3 +1,13 @@
+2015-04-14  Benjamin Poulain  <benja...@webkit.org>
+
+        Add a conservative DFA minimizer for the content extension matcher
+        https://bugs.webkit.org/show_bug.cgi?id=143501
+
+        Reviewed by Alex Christensen.
+
+        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+        * TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp:
+
 2015-04-14  Daniel Bates  <daba...@apple.com>
 
         Skip failing test Tests/WebKit2Cocoa/FixedLayoutSize.mm on iOS

Modified: trunk/Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj (182807 => 182808)


--- trunk/Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj	2015-04-14 21:08:02 UTC (rev 182808)
@@ -24,6 +24,7 @@
 		26F52EAF18288C230023D412 /* geolocationGetCurrentPositionWithHighAccuracy.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 26F52EAE18288C040023D412 /* geolocationGetCurrentPositionWithHighAccuracy.html */; };
 		26F52EB218288F240023D412 /* geolocationWatchPosition.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 26F52EB018288F0F0023D412 /* geolocationWatchPosition.html */; };
 		26F52EB318288F240023D412 /* geolocationWatchPositionWithHighAccuracy.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 26F52EB118288F0F0023D412 /* geolocationWatchPositionWithHighAccuracy.html */; };
+		26F6E1F01ADC749B00DE696B /* DFAMinimizer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 26F6E1EF1ADC749B00DE696B /* DFAMinimizer.cpp */; };
 		290A9BB91735F63800D71BBC /* OpenNewWindow.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 290A9BB81735F42300D71BBC /* OpenNewWindow.html */; };
 		290F4275172A221C00939FF0 /* custom-protocol-sync-xhr.html in Copy Resources */ = {isa = PBXBuildFile; fileRef = 290F4274172A1FDE00939FF0 /* custom-protocol-sync-xhr.html */; };
 		297234B7173AFAC700983601 /* CustomProtocolsInvalidScheme_Bundle.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 297234B5173AFAC700983601 /* CustomProtocolsInvalidScheme_Bundle.cpp */; };
@@ -441,6 +442,7 @@
 		26F52EAE18288C040023D412 /* geolocationGetCurrentPositionWithHighAccuracy.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = geolocationGetCurrentPositionWithHighAccuracy.html; sourceTree = "<group>"; };
 		26F52EB018288F0F0023D412 /* geolocationWatchPosition.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = geolocationWatchPosition.html; sourceTree = "<group>"; };
 		26F52EB118288F0F0023D412 /* geolocationWatchPositionWithHighAccuracy.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = geolocationWatchPositionWithHighAccuracy.html; sourceTree = "<group>"; };
+		26F6E1EF1ADC749B00DE696B /* DFAMinimizer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = DFAMinimizer.cpp; sourceTree = "<group>"; };
 		290A9BB51735DE8A00D71BBC /* CloseNewWindowInNavigationPolicyDelegate.mm */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.objcpp; path = CloseNewWindowInNavigationPolicyDelegate.mm; sourceTree = "<group>"; };
 		290A9BB81735F42300D71BBC /* OpenNewWindow.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = OpenNewWindow.html; sourceTree = "<group>"; };
 		290F4274172A1FDE00939FF0 /* custom-protocol-sync-xhr.html */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = text.html; path = "custom-protocol-sync-xhr.html"; sourceTree = "<group>"; };
@@ -845,6 +847,7 @@
 				93A720E518F1A0E800A848E1 /* CalculationValue.cpp */,
 				7CB184C41AA3F2100066EDFD /* ContentExtensions.cpp */,
 				CD5451E919E41F9D0016936F /* CSSParser.cpp */,
+				26F6E1EF1ADC749B00DE696B /* DFAMinimizer.cpp */,
 				14464012167A8305000BD218 /* LayoutUnit.cpp */,
 				CDC2C7141797089D00E627FB /* TimeRanges.cpp */,
 				440A1D3814A0103A008A66F2 /* URL.cpp */,
@@ -1581,6 +1584,7 @@
 				7AA6A1521AAC0B31002B2ED3 /* WorkQueue.cpp in Sources */,
 				2E7765CF16C4D81100BA2BB1 /* mainMac.mm in Sources */,
 				7A5623111AD5AF3E0096B920 /* MenuTypesForMouseEvents.cpp in Sources */,
+				26F6E1F01ADC749B00DE696B /* DFAMinimizer.cpp in Sources */,
 				7A99D9941AD4A29D00373141 /* MenuTypesForMouseEvents.mm in Sources */,
 			);
 			runOnlyForDeploymentPostprocessing = 0;

Modified: trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp (182807 => 182808)


--- trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp	2015-04-14 20:54:36 UTC (rev 182807)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -730,4 +730,190 @@
     testPatternStatus("((foo)?((.)*)(bar)*)", ContentExtensions::URLFilterParser::ParseStatus::MatchesEverything);
 }
 
+TEST_F(ContentExtensionTest, MinimizingWithMoreFinalStatesThanNonFinalStates)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^h[a-z://]+\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^http://foo.com/\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^http://bar.com/\"}}]");
+
+    testRequest(backend, mainDocumentRequest("http://foo.com/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("http://bar.com/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("attp://foo.com/"), { });
+    testRequest(backend, mainDocumentRequest("attp://bar.com/"), { });
+
+    testRequest(backend, mainDocumentRequest("http://webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("https://webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bttp://webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bttps://webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("http://webkit.org/b"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("https://webkit.org/b"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("cttp://webkit.org/B"), { });
+    testRequest(backend, mainDocumentRequest("cttps://webkit.org/B"), { });
+}
+
+TEST_F(ContentExtensionTest, StatesWithDifferentActionsAreNotUnified1)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^http://www.webkit.org/\"}},"
+        "{\"action\":{\"type\":\"block-cookies\"},\"trigger\":{\"url-filter\":\"^https://www.webkit.org/\"}},"
+        "{\"action\":{\"type\":\"block-cookies\"},\"trigger\":{\"url-filter\":\"^attps://www.webkit.org/\"}}]");
+
+    testRequest(backend, mainDocumentRequest("http://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("https://www.webkit.org/"), { ContentExtensions::ActionType::BlockCookies });
+    testRequest(backend, mainDocumentRequest("attps://www.webkit.org/"), { ContentExtensions::ActionType::BlockCookies });
+    testRequest(backend, mainDocumentRequest("http://www.webkit.org/a"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("https://www.webkit.org/B"), { ContentExtensions::ActionType::BlockCookies });
+    testRequest(backend, mainDocumentRequest("attps://www.webkit.org/c"), { ContentExtensions::ActionType::BlockCookies });
+    testRequest(backend, mainDocumentRequest("http://www.whatwg.org/"), { });
+    testRequest(backend, mainDocumentRequest("https://www.whatwg.org/"), { });
+    testRequest(backend, mainDocumentRequest("attps://www.whatwg.org/"), { });
+}
+
+TEST_F(ContentExtensionTest, StatesWithDifferentActionsAreNotUnified2)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^http://www.webkit.org/\"}},"
+        "{\"action\":{\"type\":\"block-cookies\"},\"trigger\":{\"url-filter\":\"^https://www.webkit.org/\"}},"
+        "{\"action\":{\"type\":\"css-display-none\", \"selector\":\"#foo\"},\"trigger\":{\"url-filter\":\"^https://www.webkit.org/\"}}]");
+
+    testRequest(backend, mainDocumentRequest("http://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("https://www.webkit.org/"), { ContentExtensions::ActionType::CSSDisplayNoneSelector, ContentExtensions::ActionType::BlockCookies });
+    testRequest(backend, mainDocumentRequest("https://www.whatwg.org/"), { });
+    testRequest(backend, mainDocumentRequest("attps://www.whatwg.org/"), { });
+}
+
+// The order in which transitions from the root will be processed is unpredictable.
+// To exercises the various options, this test exists in various version exchanging the transition to the final state.
+TEST_F(ContentExtensionTest, FallbackTransitionsWithDifferentiatorDoNotMerge1)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.a\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^b.a\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bac\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bbc\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^BCC\"}}]");
+
+    testRequest(backend, mainDocumentRequest("aza://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bac://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bbc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bcc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("aac://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("abc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("acc://www.webkit.org/"), { });
+
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+}
+TEST_F(ContentExtensionTest, FallbackTransitionsWithDifferentiatorDoNotMerge2)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bac\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bbc\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^BCC\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.a\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^b.a\"}}]");
+
+    testRequest(backend, mainDocumentRequest("aza://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bac://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bbc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bcc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("aac://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("abc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("acc://www.webkit.org/"), { });
+
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { });
+}
+TEST_F(ContentExtensionTest, FallbackTransitionsWithDifferentiatorDoNotMerge3)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.c\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^b.c\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^baa\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bba\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^BCA\"}}]");
+
+    testRequest(backend, mainDocumentRequest("azc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("baa://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bba://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bca://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("aaa://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("aba://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("aca://www.webkit.org/"), { });
+
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+}
+TEST_F(ContentExtensionTest, FallbackTransitionsWithDifferentiatorDoNotMerge4)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^baa\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^bba\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^BCA\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.c\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^b.c\"}}]");
+
+    testRequest(backend, mainDocumentRequest("azc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bzc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("baa://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bba://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bca://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("aaa://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("aba://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("aca://www.webkit.org/"), { });
+
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bza://www.webkit.org/"), { });
+}
+
+TEST_F(ContentExtensionTest, FallbackTransitionsToOtherNodeInSameGroupDoesNotDifferentiateGroup)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^aac\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.c\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^b.c\"}}]");
+
+    testRequest(backend, mainDocumentRequest("aac://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("abc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("bac://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("abc://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("aaa://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("aca://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("baa://www.webkit.org/"), { });
+}
+
+TEST_F(ContentExtensionTest, SimpleFallBackTransitionDifferentiator1)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.bc.de\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^a.bd.ef\"}}]");
+
+    testRequest(backend, mainDocumentRequest("abbccde://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("aabcdde://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("aabddef://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("aabddef://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("abcde://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("abdef://www.webkit.org/"), { });
+}
+
+TEST_F(ContentExtensionTest, SimpleFallBackTransitionDifferentiator2)
+{
+    auto backend = makeBackend("[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^cb.\"}},"
+        "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^db.b\"}}]");
+
+    testRequest(backend, mainDocumentRequest("cba://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("cbb://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("dbab://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testRequest(backend, mainDocumentRequest("dbxb://www.webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+
+    testRequest(backend, mainDocumentRequest("cca://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("dddd://www.webkit.org/"), { });
+    testRequest(backend, mainDocumentRequest("bbbb://www.webkit.org/"), { });
+}
+
 } // namespace TestWebKitAPI

Added: trunk/Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp (0 => 182808)


--- trunk/Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp	                        (rev 0)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/DFAMinimizer.cpp	2015-04-14 21:08:02 UTC (rev 182808)
@@ -0,0 +1,131 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+
+#include <WebCore/CombinedURLFilters.h>
+#include <WebCore/NFA.h>
+#include <WebCore/NFAToDFA.h>
+#include <WebCore/URLFilterParser.h>
+#include <wtf/MainThread.h>
+
+using namespace WebCore;
+
+namespace TestWebKitAPI {
+
+class DFAMinimizerTest : public testing::Test {
+public:
+    virtual void SetUp()
+    {
+        WTF::initializeMainThread();
+    }
+};
+
+unsigned countLiveNodes(const ContentExtensions::DFA& dfa)
+{
+    unsigned counter = 0;
+    for (unsigned i = 0; i < dfa.size(); ++i) {
+        if (!dfa.nodeAt(i).isKilled)
+            ++counter;
+    }
+    return counter;
+}
+
+ContentExtensions::DFA buildDFAFromPatterns(Vector<const char*> patterns)
+{
+    ContentExtensions::CombinedURLFilters combinedURLFilters;
+    ContentExtensions::URLFilterParser parser(combinedURLFilters);
+
+    for (const char* pattern : patterns)
+        parser.addPattern(pattern, false, 0);
+    Vector<ContentExtensions::NFA> nfas = combinedURLFilters.createNFAs();
+    return ContentExtensions::NFAToDFA::convert(nfas[0]);
+}
+
+TEST_F(DFAMinimizerTest, BasicSearch)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ ".*foo", ".*bar", ".*bang"});
+    EXPECT_EQ(static_cast<size_t>(10), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(7), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, FallbackTransitionsWithDifferentiatorDoNotMerge1)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^a.a", "^b.a", "^bac", "^bbc", "^BCC"});
+    EXPECT_EQ(static_cast<size_t>(13), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(6), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, FallbackTransitionsWithDifferentiatorDoNotMerge2)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^bbc", "^BCC", "^a.a", "^b.a"});
+    EXPECT_EQ(static_cast<size_t>(11), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(6), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, FallbackTransitionsWithDifferentiatorDoNotMerge3)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^a.c", "^b.c", "^baa", "^bba", "^BCA"});
+    EXPECT_EQ(static_cast<size_t>(13), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(6), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, FallbackTransitionsWithDifferentiatorDoNotMerge4)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^baa", "^bba", "^BCA", "^a.c", "^b.c"});
+    EXPECT_EQ(static_cast<size_t>(13), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(6), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, FallbackTransitionsToOtherNodeInSameGroupDoesNotDifferentiateGroup)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^aac", "^a.c", "^b.c"});
+    EXPECT_EQ(static_cast<size_t>(9), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(5), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, SimpleFallBackTransitionDifferentiator1)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^a.bc.de", "^a.bd.ef"});
+    EXPECT_EQ(static_cast<size_t>(12), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(11), countLiveNodes(dfa));
+}
+
+TEST_F(DFAMinimizerTest, SimpleFallBackTransitionDifferentiator2)
+{
+    ContentExtensions::DFA dfa = buildDFAFromPatterns({ "^cb.", "^db.b"});
+    EXPECT_EQ(static_cast<size_t>(8), countLiveNodes(dfa));
+    dfa.minimize();
+    EXPECT_EQ(static_cast<size_t>(7), countLiveNodes(dfa));
+}
+
+} // namespace TestWebKitAPI
_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to