Hi Ahmad,

Some tricks that might help to bring down the effort per tenant if you run
one job per tenant (or key per tenant):

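- Pre-aggregate records in a 5-minute tumbling window. However,
pre-aggregation does not work for FoldFunctions, because two partial fold
results cannot be merged. A ReduceFunction or AggregateFunction is needed.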
- Implement the window as a custom ProcessFunction that keeps the last 288
pre-aggregated records in state, adds each new record to the result, and
retracts the record that drops out of the 24-hour range.
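
To make the second idea more concrete, here is a minimal sketch of the
add-and-retract logic in plain Java, without any Flink dependency. The class
name SlidingPreAggregate and its methods are made up for illustration. In a
real job the buffer would presumably live in Flink keyed state inside the
ProcessFunction, and each input would be the result of one 5-minute tumbling
window. It also assumes the aggregate is invertible (a count or a sum);
something like min or max cannot be retracted this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: a sliding total over the last N pre-aggregated
 * buckets (N = 288 for 24 hours of 5-minute buckets).
 */
public class SlidingPreAggregate {
    private final int capacity;                      // max buckets kept
    private final Deque<Long> buckets = new ArrayDeque<>();
    private long total = 0;                          // running aggregate

    public SlidingPreAggregate(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Adds one pre-aggregated bucket, retracts the oldest bucket once more
     * than `capacity` buckets are held, and returns the current total.
     */
    public long add(long bucketCount) {
        buckets.addLast(bucketCount);
        total += bucketCount;                        // aggregate the new bucket
        if (buckets.size() > capacity) {
            total -= buckets.removeFirst();          // retract the expired bucket
        }
        return total;
    }

    public long total() {
        return total;
    }

    public static void main(String[] args) {
        // Small capacity so the retraction is visible after a few inputs.
        SlidingPreAggregate window = new SlidingPreAggregate(3);
        for (long count : new long[] {5, 2, 7, 1}) {
            System.out.println(window.add(count));
        }
    }
}
```

The point of this layout is that each 5-minute result is touched twice (once
when added, once when retracted) instead of being folded into 288 overlapping
windows.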

Best, Fabian


2018-07-03 15:22 GMT+02:00 Ahmad Hassan <ahmad.has...@gmail.com>:

> Hi Folks,
>
> We are using Flink to capture various interactions of a customer with an
> e-commerce store, e.g. product views and orders created. We run a 24-hour
> sliding window with a 5-minute slide, which makes 288 parallel windows for
> a single tenant. We implement a fold function that keeps various hash maps
> and updates the customer statistics in them from each incoming e-commerce
> event, one by one, as soon as the event arrives.
>
> Considering 1000 Tenants, we have two solutions in mind:
>
> 1) Implement a Flink job per tenant. So 1000 tenants would create 1000
> Flink jobs.
>
> 2) Implement a single Flink job with keyBy 'tenant' so that each tenant
> gets a separate window. But this will end up creating 1000 * 288 windows
> in a 24-hour period. This would put extra load on the single Flink job.
>
> What is the recommended approach to handle multitenancy in Flink at such
> a big scale, with over 1000 tenants, while storing the fold state for each
> event? Solution 1 would require significant effort to keep track of 1000
> Flink jobs and provide resilience.
>
> Thanks.
>
> Best Regards,
>
