This is an automated email from the ASF dual-hosted git repository.

twalthr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 63a6a0e1411e25be50a48302422c0bde0dd3ac78
Author: Dawid Wysakowicz <dwysakow...@apache.org>
AuthorDate: Wed Nov 28 14:29:14 2018 +0100

    [FLINK-7603] [docs] Update documentation with WITHIN clause for 
MATCH_RECOGNIZE
    
    This closes #7187.
---
 docs/dev/table/sql.md                       |  1 +
 docs/dev/table/streaming/match_recognize.md | 76 ++++++++++++++++++++++++++---
 2 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/docs/dev/table/sql.md b/docs/dev/table/sql.md
index 90e2006..55c144a 100644
--- a/docs/dev/table/sql.md
+++ b/docs/dev/table/sql.md
@@ -211,6 +211,7 @@ matchRecognize:
             | SKIP TO variable )
       ]
       PATTERN '(' pattern ')'
+      [ WITHIN intervalLiteral ]
       DEFINE variable AS condition [, variable AS condition ]*
       ')'
 
diff --git a/docs/dev/table/streaming/match_recognize.md 
b/docs/dev/table/streaming/match_recognize.md
index e61d384..01799cc 100644
--- a/docs/dev/table/streaming/match_recognize.md
+++ b/docs/dev/table/streaming/match_recognize.md
@@ -93,7 +93,7 @@ Every `MATCH_RECOGNIZE` query consists of the following 
clauses:
 * [MEASURES](#define--measures) - defines output of the clause; similar to a 
`SELECT` clause.
 * [ONE ROW PER MATCH](#output-mode) - output mode which defines how many rows 
per match should be produced.
 * [AFTER MATCH SKIP](#after-match-strategy) - specifies where the next match 
should start; this is also a way to control how many distinct matches a single 
event can belong to.
-* [PATTERN](#defining-pattern) - allows constructing patterns that will be 
searched for using a _regular expression_-like syntax.
+* [PATTERN](#defining-a-pattern) - allows constructing patterns that will be 
searched for using a _regular expression_-like syntax.
 * [DEFINE](#define--measures) - this section defines the conditions that the 
pattern variables must satisfy.
 
 <span class="label label-danger">Attention</span> Currently, the 
`MATCH_RECOGNIZE` clause can only be applied to an [append 
table](dynamic_tables.html#update-and-append-queries). Furthermore, it always 
produces
@@ -206,7 +206,7 @@ The `DEFINE` and `MEASURES` keywords have similar meanings 
to the `WHERE` and `S
 The `MEASURES` clause defines what will be included in the output of a 
matching pattern. It can project columns and define expressions for evaluation.
 The number of produced rows depends on the [output mode](#output-mode) setting.
 
-The `DEFINE` clause specifies conditions that rows have to fulfill in order to 
be classified to a corresponding [pattern variable](#defining-pattern).
+The `DEFINE` clause specifies conditions that rows have to fulfill in order to 
be classified to a corresponding [pattern variable](#defining-a-pattern).
 If a condition is not defined for a pattern variable, a default condition will 
be used which evaluates to `true` for every row.
 
 For a more detailed explanation about expressions that can be used in those 
clauses, please have a look at the [event stream 
navigation](#pattern-navigation) section.
@@ -311,6 +311,71 @@ DEFINE
 
 <span class="label label-danger">Attention</span> The optional reluctant 
quantifier (`A??` or `A{0,1}?`) is not supported right now.
 
+### Time constraint
+
+Especially for streaming use cases, it is often required that a pattern 
finishes within a given period of time.
+This allows for limiting the overall state size that Flink has to maintain 
internally, even in case of greedy quantifiers.
+
+Therefore, Flink SQL supports the additional (non-standard SQL) `WITHIN` 
clause for defining a time constraint for a pattern. The clause can be defined 
after the `PATTERN` clause and takes an interval of millisecond resolution.
+
+If the time between the first and last event of a potential match is longer 
than the given value, such a match will not be appended to the result table.
+
+<span class="label label-info">Note</span> It is generally encouraged to use 
the `WITHIN` clause as it helps Flink with efficient memory management. 
Underlying state can be pruned once the threshold is reached.
+
+<span class="label label-danger">Attention</span> However, the `WITHIN` clause 
is not part of the SQL standard. The recommended way of dealing with time 
constraints might change in the future.
+
+The use of the `WITHIN` clause is illustrated in the following example query:
+
+{% highlight sql %}
+SELECT *
+FROM Ticker
+    MATCH_RECOGNIZE(
+        PARTITION BY symbol
+        ORDER BY rowtime
+        MEASURES
+            C.rowtime AS dropTime,
+            A.price - C.price AS dropDiff
+        PATTERN (A B* C) WITHIN INTERVAL '1' HOUR
+        ONE ROW PER MATCH
+        AFTER MATCH SKIP PAST LAST ROW
+        DEFINE
+            B AS B.price > A.price - 10
+            C AS C.price < A.price - 10
+    )
+{% endhighlight %}
+
+The query detects a price drop of `10` that happens within an interval of 1 
hour.
+
+Let's assume the query is used to analyze the following ticker data:
+
+{% highlight text %}
+symbol         rowtime         price    tax
+======  ====================  ======= =======
+'ACME'  '01-Apr-11 10:00:00'   20      1
+'ACME'  '01-Apr-11 10:20:00'   17      2
+'ACME'  '01-Apr-11 10:40:00'   18      1
+'ACME'  '01-Apr-11 11:00:00'   11      3
+'ACME'  '01-Apr-11 11:20:00'   14      2
+'ACME'  '01-Apr-11 11:40:00'   9       1
+'ACME'  '01-Apr-11 12:00:00'   15      1
+'ACME'  '01-Apr-11 12:20:00'   14      2
+'ACME'  '01-Apr-11 12:40:00'   24      2
+'ACME'  '01-Apr-11 13:00:00'   1       2
+'ACME'  '01-Apr-11 13:20:00'   19      1
+{% endhighlight %}
+
+The query will produce the following results:
+
+{% highlight text %}
+symbol         dropTime         dropDiff
+======  ====================  =============
+'ACME'  '01-Apr-11 13:00:00'      14
+{% endhighlight %}
+
+The resulting row represents a price drop from `15` (at `01-Apr-11 12:00:00`) 
to `1` (at `01-Apr-11 13:00:00`). The `dropDiff` column contains the price 
difference.
+
+Notice that even though prices also drop by higher values, for example, by 
`11` (between `01-Apr-11 10:00:00` and `01-Apr-11 11:40:00`), the time 
difference between those two events is larger than 1 hour. Thus, they don't 
produce a match.
+
 Output Mode
 -----------
 
@@ -781,8 +846,8 @@ One has to keep in mind that in case of the `SKIP TO 
FIRST/LAST variable`strateg
 variable (e.g. for pattern `A*`). In such cases, a runtime exception will be 
thrown as the standard requires a valid row to continue the
 matching.
 
-
-### Controlling Memory Consumption
+Controlling Memory Consumption
+------------------------------
 
 Memory consumption is an important consideration when writing 
`MATCH_RECOGNIZE` queries, as the space of potential matches is built in a 
breadth-first-like manner.
 Having that in mind, one must make sure that the pattern can finish. 
Preferably with a reasonable number of rows mapped to the match as they have to 
fit into memory.
@@ -815,8 +880,7 @@ DEFINE
   C as C.price > 20
 {% endhighlight %}
 
-<span class="label label-danger">Attention</span> Please note that the 
`MATCH_RECOGNIZE` clause does not use a configured [state retention 
time](query_configuration.html#idle-state-retention-time). As of now, there is 
also no possibility to define a time restriction on the pattern to finish 
because there is no such possibility in the SQL standard. The community is in 
the process of designing a proper syntax for that
-feature right now.
+<span class="label label-danger">Attention</span> Please note that the 
`MATCH_RECOGNIZE` clause does not use a configured [state retention 
time](query_configuration.html#idle-state-retention-time). One may want to use 
the `WITHIN` [clause](#time-constraint) for this purpose.
 
 Known Limitations
 -----------------

Reply via email to