Re: Regarding Stateful Functions

Jessy Ping Thu, 13 May 2021 00:59:48 -0700

Hi Austin,


Thanks for your insights.


We currently use a microservice architecture to meet our data processing
requirements, and we plan to adopt Flink as our unified platform for all
data processing tasks. Most of our use cases are a good fit for Flink, but
one of them requires a deeper look at Flink's capabilities.


As I mentioned in my previous email, the processing flow of the use case
under discussion is as follows:


*ingress(>=10k/s)--> First transformation based on certain static rules -->
second transformation based on certain dynamic rules --> Third and final
transformation based on certain dynamic and static rules --> egress*
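
To make the flow concrete, here is a minimal plain-Python sketch of the three stages. It does not use Flink or the Stateful Functions SDK, and every name in it (`RulePipeline`, `update_dynamic_rules`, the sample rules and keys) is hypothetical and only for illustration. It assumes dynamic rules are stored per event key and can be replaced at runtime, while static rules are fixed when the pipeline is built.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]
Rule = Callable[[Event], Event]


def apply_rules(event: Event, rules: Iterable[Rule]) -> Event:
    """Apply each rule in order; every rule returns a new event."""
    for rule in rules:
        event = rule(event)
    return event


class RulePipeline:
    """Hypothetical three-stage flow: static -> dynamic -> dynamic + static."""

    def __init__(self, first_static: List[Rule], third_static: List[Rule]) -> None:
        self.first_static = list(first_static)
        self.third_static = list(third_static)
        # Dynamic rules, kept per stage and per event key.
        self.dynamic: Dict[str, Dict[str, List[Rule]]] = {"second": {}, "third": {}}

    def update_dynamic_rules(self, stage: str, key: str, rules: List[Rule]) -> None:
        """Replace the dynamic rules of one stage for one key at runtime."""
        self.dynamic[stage][key] = list(rules)

    def process(self, key: str, event: Event) -> Event:
        # Stage 1: static rules only.
        event = apply_rules(dict(event), self.first_static)
        # Stage 2: dynamic rules registered for this key (none if unknown key).
        event = apply_rules(event, self.dynamic["second"].get(key, []))
        # Stage 3: dynamic rules for this key, then the static rules.
        event = apply_rules(event, self.dynamic["third"].get(key, []))
        return apply_rules(event, self.third_static)


pipeline = RulePipeline(
    first_static=[lambda e: {**e, "amount": float(e["amount"])}],
    third_static=[lambda e: {**e, "processed": True}],
)
pipeline.update_dynamic_rules(
    "second", "customer-1", [lambda e: {**e, "amount": e["amount"] * 2}]
)
pipeline.update_dynamic_rules(
    "third",
    "customer-1",
    [lambda e: {**e, "tier": "high" if e["amount"] > 100 else "low"}],
)

result = pipeline.process("customer-1", {"amount": "60"})
other = pipeline.process("customer-2", {"amount": "60"})
```

In this sketch the per-key dictionaries stand in for whatever state store holds the dynamic rules; a key with no registered rules simply passes through stages two and three with only the static rules applied.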


Our current design uses a Hazelcast cluster embedded in our microservices.
It is a complex system with several stability issues. We are looking for an
open-source alternative, and Stateful Functions, powered by Flink, seems
like an ideal candidate. The following features of Stateful Functions
attracted us:

1. Consistent state

2. No database required

3. Exactly-once semantics

4. Logical addressing

5. Multi-language support


Any additional insights on the questions I raised earlier would be helpful.

Thanks

Jessy

On Thu, 13 May 2021 at 04:25, Austin Cawley-Edwards <austin.caw...@gmail.com>
wrote:

> Hey Jessy,
>
> I'm not a Statefun expert but, hopefully, I can point you in the right
> direction for some of your questions. I'll also cc Gordon, who helps to
> maintain Statefun.
>
> *1. Is the stateful function a good candidate for a system (as above) that
>> should process incoming requests at the rate of 10K/s depending on various
>> dynamic rules and static rules?*
>>
>
> The scale is definitely manageable in a Statefun cluster, and could
> possibly be a good fit for dynamic and static rules. Hopefully Gordon can
> comment more there. For the general Flink solution to this problem, I
> always turn to this great series of blog posts around fraud detection with
> dynamic rules[1].
>
> *2. Is Flink capable of accommodating the above-mentioned dynamic rules
>> in its state (about 1500 rules per keyed event) for faster
>> transformation of incoming streams?*
>>
>
> This may be manageable as well, depending on how you are applying these
> rules and what they look like (size, etc.). Can you give any more
> information there?
>
>
> *3. If we are not interested in using AWS Lambda or Azure Functions,
>> what are the other options? What about using co-located functions and
>> embedded functions? Is there any benefit in using one over the other
>> for my data processing flow?*
>>
>
> Yes, you can embed JVM functions via Embedded Modules[2], which in your
> case might benefit from the Flink DataStream integration[3]. You can also
> host remote functions anywhere, i.e. Kubernetes, behind an NGINX server,
> etc. The Module Configuration section[4] will likely shed more light on
> what is available. I think the main tradeoffs here are availability,
> scalability, and network latency for external functions.
>
> *4. If we are going with embedded functions/co-located functions, is it
>> possible to autoscale the application using the recently released reactive
>> mode in Flink 1.13?*
>>
>
> Statefun 3.0 uses Flink 1.12 but is expected to upgrade to Flink 1.13 in
> the next release cycle. There are a few other changes that are necessary to
> be compatible with Reactive Mode (i.e., making the Statefun cluster a regular
> Flink application, tracked in FLINK-16930 [5]), but it's coming!
>
>
> On a higher-level note, what made you interested in Statefun for this use case?
> The community is currently trying to expand our understanding of potential
> users, so it would be great to hear a bit more!
>
> Best,
> Austin
>
> [1]: https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html
> [2]:
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-3.0/docs/deployment/embedded/#embedded-module-configuration
> [3]:
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-3.0/docs/sdk/flink-datastream/
> [4]:
> https://ci.apache.org/projects/flink/flink-statefun-docs-release-3.0/docs/deployment/module/#module-configuration
> [5]: https://issues.apache.org/jira/browse/FLINK-16930
>
> On Wed, May 12, 2021 at 11:53 AM Jessy Ping <tech.user.str...@gmail.com>
> wrote:
>
>> Hi all,
>>
>>
>> I have gone through the Stateful Functions documentation and would like
>> some expert advice or clarification regarding the following points.
>>
>>
>> *Note: My data processing flow is as follows:*
>>
>>
>> *ingress(10k/s)--> First transformation based on certain static rules -->
>> second transformation based on certain dynamic rules --> Third and final
>> transformation based on certain dynamic and static rules --> egress*
>>
>>
>> *Questions*
>>
>> *1. Is the stateful function a good candidate for a system (as above) that
>> should process incoming requests at the rate of 10K/s depending on various
>> dynamic rules and static rules?*
>>
>>
>> *2. Is Flink capable of accommodating the above-mentioned dynamic rules
>> in its state (about 1500 rules per keyed event) for faster
>> transformation of incoming streams?*
>>
>>
>> *3. If we are not interested in using AWS Lambda or Azure Functions,
>> what are the other options? What about using co-located functions and
>> embedded functions? Is there any benefit in using one over the other
>> for my data processing flow?*
>>
>>
>> *4. If we are going with embedded functions/co-located functions, is it
>> possible to autoscale the application using the recently released reactive
>> mode in Flink 1.13?*
>>
>>
>> *Thanks*
>>
>> *Jessy*
>>
>>
>>
