How to implement Multi-tenancy in Flink

2018-07-03 Thread Ahmad Hassan
Hi Folks, We are using Flink to capture various interactions of a customer with an e-commerce store, i.e. product views and orders created. We run a 24-hour sliding window with a 5-minute slide, which makes 288 parallel windows for a single tenant. We implement a fold method that has various hashmaps to update the s
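
The figure of 288 windows follows from dividing the window size by the slide. A minimal sketch of that arithmetic in plain Java (no Flink dependency; the class and method names are hypothetical and only illustrate the calculation):

```java
import java.time.Duration;

class SlidingWindowMath {
    // Number of overlapping sliding windows that contain any single event.
    // Assumes the window size is an exact multiple of the slide.
    static long windowsPerEvent(Duration size, Duration slide) {
        if (slide.isZero() || slide.isNegative()) {
            throw new IllegalArgumentException("slide must be positive");
        }
        if (size.toMillis() % slide.toMillis() != 0) {
            throw new IllegalArgumentException("size must be a multiple of slide");
        }
        return size.toMillis() / slide.toMillis();
    }

    public static void main(String[] args) {
        // 24 hours = 1440 minutes; 1440 / 5 = 288
        System.out.println(windowsPerEvent(Duration.ofHours(24), Duration.ofMinutes(5)));
    }
}
```

Every incoming event is therefore added to 288 window states per key, which is the cost the replies in this thread try to reduce.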

Re: How to implement Multi-tenancy in Flink

2018-07-04 Thread Fabian Hueske
Hi Ahmad, Some tricks that might help to bring down the effort per tenant if you run one job per tenant (or key per tenant): - Pre-aggregate records in a 5 minute Tumbling window. However, pre-aggregation does not work for FoldFunctions. - Implement the window as a custom ProcessFunction that mai

Re: How to implement Multi-tenancy in Flink

2018-07-04 Thread Ahmad Hassan
Hi Fabian, The one-job-per-tenant model soon becomes hard to maintain. For example, 1000 tenants would require 1000 Flink jobs, and providing HA and resilience for 1000 jobs is not trivial. This is why we are hoping to have a single Flink job handle all the tenants through keyBy on tenant. However t

Re: How to implement Multi-tenancy in Flink

2018-07-04 Thread Chesnay Schepler
Would it be feasible for you to partition your tenants across jobs, for example 100 customers per job? On 04.07.2018 12:53, Ahmad Hassan wrote: Hi Fabian, One job per tenant model soon becomes hard to maintain. For example 1000 tenants would require 1000 Flink and providing HA and resili

Re: How to implement Multi-tenancy in Flink

2018-07-05 Thread Ahmad Hassan
Hi Chesnay, Yes, this is something we would eventually be doing, and then maintaining the configuration of which tenants are mapped to which Flink jobs. This would reduce the number of Flink jobs to maintain in order to support 1000s of tenants in our use case. Thanks. On Wed, 4 Jul 2018 at 12:
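
The tenant-to-job configuration discussed here could be as simple as a fixed-size grouping. A minimal sketch in plain Java, assuming tenants are identified by strings and that jobs are numbered from 0; the class name, method name, and numbering scheme are hypothetical and not part of Flink:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TenantJobAssignment {
    // Splits the tenant list into consecutive groups of at most tenantsPerJob
    // and returns, for each tenant, the index of the (hypothetical) job that owns it.
    static Map<String, Integer> assign(List<String> tenantIds, int tenantsPerJob) {
        if (tenantsPerJob <= 0) {
            throw new IllegalArgumentException("tenantsPerJob must be positive");
        }
        Map<String, Integer> jobByTenant = new LinkedHashMap<>();
        for (int i = 0; i < tenantIds.size(); i++) {
            jobByTenant.put(tenantIds.get(i), i / tenantsPerJob);
        }
        return jobByTenant;
    }
}
```

With 1000 tenants and 100 tenants per job this yields 10 jobs. A positional grouping like this reshuffles tenants if the list order changes, so a real deployment would likely persist the mapping rather than recompute it.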

Re: How to implement Multi-tenancy in Flink

2019-08-01 Thread Ahmad Hassan
Hi Fabian, > On 4 Jul 2018, at 11:39, Fabian Hueske wrote: > > - Pre-aggregate records in a 5 minute Tumbling window. However, > pre-aggregation does not work for FoldFunctions. > - Implement the window as a custom ProcessFunction that maintains a state of > 288 events and aggregates and ret

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
Hi Ahmad, First of all, you need to preaggregate the data in a 5 minute tumbling window. For example, if your aggregation function is count or sum, this is simple. You have a 5 min tumbling window that just emits a count or sum every 5 minutes. The ProcessFunction then has a MapState (called buff
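
The pre-aggregation idea for a count or sum can be sketched without any Flink API. The following plain-Java class is an illustration only: it uses an in-memory array where the message describes Flink state, and its names (`RingBufferSlidingSum`, `addPane`) are hypothetical. It keeps one value per 5-minute pane and maintains the 24-hour total incrementally:

```java
class RingBufferSlidingSum {
    private final long[] slots; // one pre-aggregated value per 5-minute pane
    private int next = 0;       // slot to overwrite next, i.e. the oldest pane
    private long total = 0;     // running aggregate over all retained panes

    RingBufferSlidingSum(int slotCount) {
        if (slotCount <= 0) {
            throw new IllegalArgumentException("slotCount must be positive");
        }
        slots = new long[slotCount];
    }

    // Called once per completed 5-minute tumbling window with its count or sum.
    // Replaces the oldest pane and returns the aggregate over the retained panes.
    long addPane(long paneValue) {
        total += paneValue - slots[next];
        slots[next] = paneValue;
        next = (next + 1) % slots.length;
        return total;
    }
}
```

For the use case in this thread the buffer would be created with 288 slots. Each update touches a single slot, instead of 288 overlapping window states.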

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Ahmad Hassan
Hi Fabian, Thanks for this detail. However, our pipeline keeps track of the list of products seen in 24 hours with a 5-minute slide (288 windows). inStream .filter(Objects::nonNull) .keyBy(TENANT) .window(SlidingProcessingTimeWindows.of(Time.minutes(24), Time.minutes(5))) .trigger(TimeT

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
Ok, I won't go into the implementation detail. The idea is to track all products that were observed in the last five minutes (i.e., unique product ids) in a five minute tumbling window. Every five minutes, the observed products are sent to a process function that collects the data of the last 24 h
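
A rough sketch of the idea for unique product ids, again in plain Java with no Flink classes. It assumes each 5-minute pane arrives as a set of product id strings; the class name, the per-product pane counter, and the sorted return value are illustrative choices, not something stated in the thread:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SlidingDistinctProducts {
    private final int maxPanes;                               // e.g. 288 for 24 h of 5-minute panes
    private final Deque<Set<String>> panes = new ArrayDeque<>();
    private final Map<String, Integer> paneCountByProduct = new HashMap<>();

    SlidingDistinctProducts(int maxPanes) {
        if (maxPanes <= 0) {
            throw new IllegalArgumentException("maxPanes must be positive");
        }
        this.maxPanes = maxPanes;
    }

    // Adds the unique product ids of one completed pane, evicts the oldest pane
    // if the buffer is full, and returns the ids seen in the retained panes.
    Set<String> addPane(Set<String> productIds) {
        Set<String> pane = new HashSet<>(productIds);
        panes.addLast(pane);
        for (String id : pane) {
            paneCountByProduct.merge(id, 1, Integer::sum);
        }
        if (panes.size() > maxPanes) {
            for (String id : panes.removeFirst()) {
                // Returning null from the remapping function removes the entry.
                paneCountByProduct.computeIfPresent(id, (k, n) -> n == 1 ? null : n - 1);
            }
        }
        return new TreeSet<>(paneCountByProduct.keySet());
    }
}
```

Counting how many panes contain each product avoids recomputing the union of all panes every five minutes.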

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Ahmad Hassan
Hi Fabian, Thank you. We will look into it now. Best, On Fri, 2 Aug 2019 at 12:50, Fabian Hueske wrote: > Ok, I won't go into the implementation detail. > > The idea is to track all products that were observed in the last five > minutes (i.e., unique product ids) in a five minute tumbling wind

Re: How to implement Multi-tenancy in Flink

2019-08-15 Thread Ahmad Hassan
Hi Fabian, In this case, how do we emit the tumbling window when there are no events? Otherwise it’s not possible to emulate a sliding window in the process function and advance the ring buffer every 5 mins when there are no events. Yes, I can create a periodic source function, but how can it be associated

Re: How to implement Multi-tenancy in Flink

2019-08-16 Thread Fabian Hueske
Hi Ahmad, The ProcessFunction should not rely on new records arriving (i.e., do the processing in the processElement() method) but rather register a timer every 5 minutes and perform the processing when the timer fires in onTimer(). Essentially, you'd only collect the data in `processElement()` an
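
The split between collecting and emitting can be simulated in plain Java. This sketch borrows the callback names `processElement` and `onTimer` from the message but is not a Flink ProcessFunction: the class is hypothetical, the timer is driven by the caller, and timestamps are assumed to be non-negative epoch milliseconds:

```java
class TimerDrivenSlidingCount {
    static final long PANE_MILLIS = 5 * 60 * 1000L; // 5-minute pane length

    private final long[] ring;    // one count per completed pane
    private int next = 0;         // slot holding the oldest pane
    private long total = 0;       // count over all retained panes
    private long currentPane = 0; // events collected since the last timer

    TimerDrivenSlidingCount(int panes) {
        if (panes <= 0) {
            throw new IllegalArgumentException("panes must be positive");
        }
        ring = new long[panes];
    }

    // First pane boundary strictly after the given timestamp; this is where
    // the next timer would be registered.
    static long nextTimer(long timestampMillis) {
        return timestampMillis - (timestampMillis % PANE_MILLIS) + PANE_MILLIS;
    }

    // Only collects: no output is produced when an event arrives.
    void processElement() {
        currentPane++;
    }

    // Fires every 5 minutes whether or not events arrived: closes the current
    // pane (possibly empty), drops the oldest one, and returns the new total.
    long onTimer() {
        total += currentPane - ring[next];
        ring[next] = currentPane;
        next = (next + 1) % ring.length;
        currentPane = 0;
        return total;
    }
}
```

Because the ring advances in `onTimer()`, old panes expire on schedule even during periods with no events.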

Re: How to implement Multi-tenancy in Flink

2019-08-19 Thread Ahmad Hassan
Thank you, Fabian. This works really well. Best Regards, On Fri, 16 Aug 2019 at 09:22, Fabian Hueske wrote: > Hi Ahmad, > > The ProcessFunction should not rely on new records arriving (i.e., do the > processing in the processElement() method) but rather register a timer every 5 > minutes and perform

Re: How to implement Multi-tenancy in Flink

2019-08-19 Thread Fabian Hueske
Great! Thanks for the feedback. Cheers, Fabian On Mon, 19 Aug 2019 at 17:12, Ahmad Hassan < ahmad.has...@gmail.com> wrote: > > Thank you Fabian. This works really well. > > Best Regards, > > On Fri, 16 Aug 2019 at 09:22, Fabian Hueske wrote: > >> Hi Ahmad, >> >> The ProcessFunction shou

Re: How to implement Multi-tenancy in Flink

2019-08-19 Thread Ahmad Hassan
Flink's sliding window didn't work well for our use case at SAP, as checkpointing freezes with 288 sliding windows per tenant. Implementing the sliding window through a tumbling window plus a process function reduces the checkpointing time to a few seconds. We will see how that scales with 1000 or more tenan