Hi Nicolaus,
I'm sending the records as an attachment.

Regards,
Maciek

On Wed, 7 Jul 2021 at 11:47, Nicolaus Weidner
<nicolaus.weid...@data-artisans.com> wrote:
>
> Hi Maciek,
>
> is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears 
> twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there at 
> all. I find it hard to correlate the logs with the input...
>
> Best regards,
> Nico
>
> On Wed, Jul 7, 2021 at 11:16 AM Arvid Heise <ar...@apache.org> wrote:
>>
>> Hi Maciek,
>>
>> could you bypass the MATCH_RECOGNIZE (i.e. comment it out) and check whether 
>> the records appear in the short-circuited output?
>>
>> I suspect that they may be filtered out earlier (for example because of 
>> number conversion issues with 0E-18).
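
On the 0E-18 point, a minimal sketch in plain Java, outside Flink. `0E-18` is the string form that `java.math.BigDecimal` produces for a zero with scale 18, so it parses and compares like any other decimal. The class name `ZeroScaleCheck` and the helper `exceedsThreshold` are hypothetical, and the check says nothing about conversions that may happen elsewhere in the pipeline.

```java
import java.math.BigDecimal;

class ZeroScaleCheck {
    // Threshold taken from the TRX condition in the view (TRX.v > 1000).
    static final BigDecimal THRESHOLD = new BigDecimal("1000");

    // Hypothetical helper: parse the raw column text and apply the TRX.v > 1000 test.
    static boolean exceedsThreshold(String raw) {
        return new BigDecimal(raw).compareTo(THRESHOLD) > 0;
    }

    public static void main(String[] args) {
        BigDecimal zero = new BigDecimal("0E-18");
        System.out.println(zero.scale());                          // 18
        System.out.println(zero.toPlainString());                  // 0.000000000000000000
        System.out.println(zero.compareTo(BigDecimal.ZERO) == 0);  // true: numerically zero
        System.out.println(zero.equals(BigDecimal.ZERO));          // false: equals() also compares scale
        System.out.println(exceedsThreshold("0E-18"));                    // false
        System.out.println(exceedsThreshold("4294.380000000000000000"));  // true
    }
}
```

Note the difference between `compareTo` and `equals`: code that tests a decimal for zero with `equals` would treat `0E-18` as non-zero.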
>>
>> On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński <mac...@brynski.pl> wrote:
>>>
>>> Hi,
>>> I have a very strange bug when using MATCH_RECOGNIZE.
>>>
>>> I'm using some joins and unions to create an event stream. A sample event 
>>> stream (for one user) looks like this:
>>>
>>> uuid                                  cif         event_type   v                        balance                ts
>>> 621456e9-389b-409b-aaca-bca99eeb43b3  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>>> 7b2bc022-b069-41ca-8bbf-e93e3f0e85a7  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:29:10
>>> 942cd3ce-fb3d-43d3-a69a-aaeeec5ee90e  0004091386  application  0E-18                    74.524950000000000000  2021-05-01 10:39:02
>>> 433ac9bc-d395-457n-986c-19e30e375f2e  0004091386  trx          4294.380000000000000000  74.524950000000000000  2021-05-01 04:42:57
>>>
>>> Then I'm using the following MATCH_RECOGNIZE definition (the trace function 
>>> is explained below):
>>>
>>> CREATE VIEW scenario_1 AS (
>>> SELECT * FROM events
>>>     MATCH_RECOGNIZE(
>>>         PARTITION BY cif
>>>         ORDER BY ts
>>>         MEASURES
>>>             TRX.v as trx_amount,
>>>             TRX.ts as trx_ts,
>>>             APP_1.ts as app_1_ts,
>>>             APP_2.ts as app_2_ts,
>>>             APP_2.balance as app_2_balance
>>>         ONE ROW PER MATCH
>>>         PATTERN (TRX ANY_EVENT*? APP_1 NOT_LOAN*? APP_2) WITHIN INTERVAL '10' DAY
>>>         DEFINE
>>>         TRX AS trace(TRX.event_type = 'trx' AND TRX.v > 1000,
>>>                   'TRX', TRX.uuid, TRX.cif, TRX.event_type, TRX.ts),
>>>         ANY_EVENT AS trace(true,
>>>                   'ANY_EVENT', TRX.uuid, ANY_EVENT.cif, ANY_EVENT.event_type, ANY_EVENT.ts),
>>>         APP_1 AS trace(APP_1.event_type = 'application'
>>>                    AND APP_1.ts < TRX.ts + INTERVAL '3' DAY,
>>>                   'APP_1', TRX.uuid, APP_1.cif, APP_1.event_type, APP_1.ts),
>>>         APP_2 AS trace(APP_2.event_type = 'application' AND APP_2.ts > APP_1.ts
>>>                    AND APP_2.ts < APP_1.ts + INTERVAL '7' DAY AND APP_2.balance < 100,
>>>                   'APP_2', TRX.uuid, APP_2.cif, APP_2.event_type, APP_2.ts),
>>>         NOT_LOAN AS trace(NOT_LOAN.event_type <> 'loan',
>>>                   'NOT_LOAN', TRX.uuid, NOT_LOAN.cif, NOT_LOAN.event_type, NOT_LOAN.ts)
>>>     ))
>>>
>>>
>>> This scenario should be matched by the sample events because:
>>> - TRX is matched by the event with ts 2021-05-01 04:42:57
>>> - APP_1 is matched by the event with ts 2021-05-01 10:29:10
>>> - APP_2 is matched by the event with ts 2021-05-01 10:39:02
>>> Unfortunately, I'm not getting any data, and it's not the watermarks' fault.
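
The expectation above can be sanity-checked outside Flink. What follows is a minimal sketch in plain Java that re-implements the DEFINE predicates and applies them to the three sample rows. The class `PatternCheck`, the `Event` holder and all helper names are hypothetical. The `withinWindow` helper only approximates `WITHIN INTERVAL '10' DAY`, and nothing here models how Flink orders, buffers or watermarks the rows.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;

class PatternCheck {
    // Hypothetical holder for the columns that the DEFINE clause reads.
    static final class Event {
        final String eventType;
        final BigDecimal v;
        final BigDecimal balance;
        final LocalDateTime ts;

        Event(String eventType, String v, String balance, String ts) {
            this.eventType = eventType;
            this.v = new BigDecimal(v);
            this.balance = new BigDecimal(balance);
            this.ts = LocalDateTime.parse(ts); // ISO form, e.g. 2021-05-01T04:42:57
        }
    }

    // TRX: event_type = 'trx' AND v > 1000
    static boolean isTrx(Event e) {
        return e.eventType.equals("trx") && e.v.compareTo(new BigDecimal("1000")) > 0;
    }

    // APP_1: event_type = 'application' AND ts < TRX.ts + 3 days
    static boolean isApp1(Event e, Event trx) {
        return e.eventType.equals("application") && e.ts.isBefore(trx.ts.plusDays(3));
    }

    // APP_2: event_type = 'application' AND APP_1.ts < ts < APP_1.ts + 7 days AND balance < 100
    static boolean isApp2(Event e, Event app1) {
        return e.eventType.equals("application")
                && e.ts.isAfter(app1.ts)
                && e.ts.isBefore(app1.ts.plusDays(7))
                && e.balance.compareTo(new BigDecimal("100")) < 0;
    }

    // Approximation of WITHIN INTERVAL '10' DAY: first-to-last span under 10 days.
    static boolean withinWindow(Event first, Event last) {
        return last.ts.isBefore(first.ts.plusDays(10));
    }

    public static void main(String[] args) {
        // Values copied from the sample event stream, timestamps rewritten in ISO form.
        Event trx = new Event("trx", "4294.380000000000000000", "74.524950000000000000", "2021-05-01T04:42:57");
        Event app1 = new Event("application", "0E-18", "74.524950000000000000", "2021-05-01T10:29:10");
        Event app2 = new Event("application", "0E-18", "74.524950000000000000", "2021-05-01T10:39:02");

        System.out.println("TRX matches:   " + isTrx(trx));
        System.out.println("APP_1 matches: " + isApp1(app1, trx));
        System.out.println("APP_2 matches: " + isApp2(app2, app1));
        System.out.println("Within window: " + withinWindow(trx, app2));
    }
}
```

If all four checks print `true`, the predicates themselves are satisfiable by the sample rows, which points the investigation towards which rows actually reach the pattern and in what order.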
>>>
>>> The trace function has the following code and gives me some logs:
>>>
>>> public class TraceUDF extends ScalarFunction {
>>>
>>>     public Boolean eval(Boolean condition,
>>>             @DataTypeHint(inputGroup = InputGroup.ANY) Object... message) {
>>>         log.info((condition ? "Condition true: " : "Condition false: ")
>>>                 + Arrays.stream(message).map(Object::toString).collect(Collectors.joining(" ")));
>>>         return condition;
>>>     }
>>> }
>>>
>>> The log from this trace function is the following:
>>>
>>> 2021-07-06 13:09:43,762 INFO TraceUDF                             [] - Condition true: TRX 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T04:42:57
>>> 2021-07-06 13:12:28,914 INFO  TraceUDF                             [] - Condition true: ANY_EVENT 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>>> 2021-07-06 13:12:28,915 INFO  TraceUDF                             [] - Condition false: APP_1 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>>> 2021-07-06 13:12:28,915 INFO  TraceUDF                             [] - Condition false: TRX 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx 2021-05-01T15:28:34
>>>
>>> As you can see, 2 events are missing.
>>> What can I do?
>>> I failed to create a minimal example of this bug. Any other ideas?



-- 
Maciek Bryński
