Hi Nicolaus,

I'm sending records as an attachment.

Regards,
Maciek

Wed, 7 Jul 2021 at 11:47 Nicolaus Weidner <nicolaus.weid...@data-artisans.com> wrote:
>
> Hi Maciek,
>
> is there a typo in the input data? Timestamp 2021-05-01 04:42:57 appears
> twice, but timestamp 2021-05-01T15:28:34 (from the log lines) is not there at
> all. I find it hard to correlate the logs with the input...
>
> Best regards,
> Nico
>
> On Wed, Jul 7, 2021 at 11:16 AM Arvid Heise <ar...@apache.org> wrote:
>>
>> Hi Maciek,
>>
>> could you bypass the MATCH_RECOGNIZE (=comment out) and check if the records
>> appear in a shortcutted output?
>>
>> I suspect that they may be filtered out before (for example because of
>> number conversion issues with 0E-18)
>>
>> On Tue, Jul 6, 2021 at 3:26 PM Maciek Bryński <mac...@brynski.pl> wrote:
>>>
>>> Hi,
>>> I have a very strange bug when using MATCH_RECOGNIZE.
>>>
>>> I'm using some joins and unions to create an event stream. A sample event
>>> stream (for one user) looks like this:
>>>
>>> uuid                                 cif        event_type  v                       balance               ts
>>> 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx         4294.380000000000000000 74.524950000000000000 2021-05-01 04:42:57
>>> 7b2bc022-b069-41ca-8bbf-e93e3f0e85a7 0004091386 application 0E-18                   74.524950000000000000 2021-05-01 10:29:10
>>> 942cd3ce-fb3d-43d3-a69a-aaeeec5ee90e 0004091386 application 0E-18                   74.524950000000000000 2021-05-01 10:39:02
>>> 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx         4294.380000000000000000 74.524950000000000000 2021-05-01 04:42:57
>>>
>>> Then I'm using the following MATCH_RECOGNIZE definition (the trace function
>>> is explained below):
>>>
>>> CREATE VIEW scenario_1 AS (
>>>     SELECT * FROM events
>>>     MATCH_RECOGNIZE(
>>>         PARTITION BY cif
>>>         ORDER BY ts
>>>         MEASURES
>>>             TRX.v as trx_amount,
>>>             TRX.ts as trx_ts,
>>>             APP_1.ts as app_1_ts,
>>>             APP_2.ts as app_2_ts,
>>>             APP_2.balance as app_2_balance
>>>         ONE ROW PER MATCH
>>>         PATTERN (TRX ANY_EVENT*? APP_1 NOT_LOAN*? APP_2) WITHIN INTERVAL '10' DAY
>>>         DEFINE
>>>             TRX AS trace(TRX.event_type = 'trx' AND TRX.v > 1000,
>>>                 'TRX', TRX.uuid, TRX.cif, TRX.event_type, TRX.ts),
>>>             ANY_EVENT AS trace(true,
>>>                 'ANY_EVENT', TRX.uuid, ANY_EVENT.cif,
>>>                 ANY_EVENT.event_type, ANY_EVENT.ts),
>>>             APP_1 AS trace(APP_1.event_type = 'application'
>>>                 AND APP_1.ts < TRX.ts + INTERVAL '3' DAY,
>>>                 'APP_1', TRX.uuid, APP_1.cif, APP_1.event_type, APP_1.ts),
>>>             APP_2 AS trace(APP_2.event_type = 'application'
>>>                 AND APP_2.ts > APP_1.ts
>>>                 AND APP_2.ts < APP_1.ts + INTERVAL '7' DAY
>>>                 AND APP_2.balance < 100,
>>>                 'APP_2', TRX.uuid, APP_2.cif, APP_2.event_type, APP_2.ts),
>>>             NOT_LOAN AS trace(NOT_LOAN.event_type <> 'loan',
>>>                 'NOT_LOAN', TRX.uuid, NOT_LOAN.cif, NOT_LOAN.event_type,
>>>                 NOT_LOAN.ts)
>>> ))
>>>
>>> This scenario should be matched by the sample events because:
>>> - TRX is matched by the event with ts 2021-05-01 04:42:57
>>> - APP_1 by ts 2021-05-01 10:29:10
>>> - APP_2 by ts 2021-05-01 10:39:02
>>> Unfortunately, I'm not getting any data, and it's not the watermarks' fault.
>>>
>>> The trace function has the following code and gives me some logs:
>>>
>>> public class TraceUDF extends ScalarFunction {
>>>
>>>     public Boolean eval(Boolean condition,
>>>             @DataTypeHint(inputGroup = InputGroup.ANY) Object ... message) {
>>>         log.info((condition ? "Condition true: " : "Condition false: ") +
>>>             Arrays.stream(message).map(Object::toString).collect(Collectors.joining(" ")));
>>>         return condition;
>>>     }
>>> }
>>>
>>> The log from this trace function is as follows:
>>>
>>> 2021-07-06 13:09:43,762 INFO TraceUDF [] - Condition true: TRX 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T04:42:57
>>> 2021-07-06 13:12:28,914 INFO TraceUDF [] - Condition true: ANY_EVENT 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>>> 2021-07-06 13:12:28,915 INFO TraceUDF [] - Condition false: APP_1 621456e9-389b-409b-aaca-bca99eeb43b3 0004091386 trx 2021-05-01T15:28:34
>>> 2021-07-06 13:12:28,915 INFO TraceUDF [] - Condition false: TRX 433ac9bc-d395-457n-986c-19e30e375f2e 0004091386 trx 2021-05-01T15:28:34
>>>
>>> As you can see, 2 events are missing.
>>> What can I do?
>>> I failed to create a minimal example of this bug. Any other ideas?

--
Maciek Bryński
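
Regarding the suspicion in the thread that the `0E-18` values might be lost in a number conversion: the sketch below is one way to see how plain Java reads that literal. It is not taken from the job in question and says nothing about what Flink itself does with the column. The class name `ZeroDecimalCheck` and its helper methods are hypothetical, introduced only for illustration. It shows that `java.math.BigDecimal` accepts `0E-18` as a zero with scale 18, and that a scale-sensitive comparison (`equals`) treats it differently from a plain `0`.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of the original job: inspects how the
// literal "0E-18" from the sample events behaves as a java.math.BigDecimal.
class ZeroDecimalCheck {

    // Parses the textual form exactly as it appears in the sample events.
    static BigDecimal parse(String raw) {
        return new BigDecimal(raw);
    }

    // compareTo ignores scale, so 0E-18 compares as equal to 0.
    // equals() would not, because the scales differ (18 vs 0).
    static boolean isZero(String raw) {
        return parse(raw).compareTo(BigDecimal.ZERO) == 0;
    }

    public static void main(String[] args) {
        BigDecimal v = parse("0E-18");
        System.out.println("scale        = " + v.scale());
        System.out.println("plain string = " + v.toPlainString());
        System.out.println("isZero       = " + isZero("0E-18"));
        System.out.println("equals(ZERO) = " + v.equals(BigDecimal.ZERO));
    }
}
```

If a step upstream of MATCH_RECOGNIZE compares or converts `v` in a scale-sensitive way, the `application` rows could plausibly behave differently from the `trx` rows. That remains an assumption to verify against the actual pipeline, for example by selecting from `events` directly as suggested in the thread.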