Hi Ken,

You can certainly have partitioned sources and sinks. You can control the
parallelism of each operator by calling its setParallelism() method, and if
you need a partitioned sink, you can call keyBy() in front of the sink to
hash-partition the stream.
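For example, a minimal sketch (MySource, MySink, and MyEvent are
placeholders for your own types; MyEvent is assumed to be a POJO with a
public "key" field):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitionedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Four parallel source instances, e.g. one per partition.
        DataStream<MyEvent> stream = env
            .addSource(new MySource())
            .setParallelism(4);

        // Hash-partition by the "key" field so that each of the four
        // parallel sink instances sees a disjoint set of keys.
        stream
            .keyBy("key")
            .addSink(new MySink())
            .setParallelism(4);

        env.execute("partitioned source and sink");
    }
}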

I did not completely understand the requirements of your program. Could you
maybe provide pseudo code for what the program should look like?

Some general comments:
- Flink's fault tolerance mechanism does not work with iterative data flows
yet. This is work in progress, see FLINK-3257 [1].
- Flink's fault tolerance mechanism only works if you expose *all* internal
operator state to Flink. So you would need to put your Java DB into Flink
state to have a recoverable job.
- Is the DB essential in your application? Could you use Flink's
key-partitioned state interface [2] instead? That would help to make your
job fault-tolerant (see the sketch below).
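
To illustrate the key-partitioned state interface [2], here is a minimal
sketch. It is hypothetical placeholder code: input and output are
(key, delta) / (key, total) Tuple2 pairs, and the per-key running total
lives in Flink state instead of an external DB, so it is checkpointed and
restored on failure:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class StatefulUpdater
        extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    private transient ValueState<Long> total;

    @Override
    public void open(Configuration parameters) {
        // State registered this way is checkpointed and restored by Flink.
        total = getRuntimeContext().getState(
            new ValueStateDescriptor<>("total", Long.class, 0L));
    }

    @Override
    public void flatMap(Tuple2<String, Long> update,
                        Collector<Tuple2<String, Long>> out) throws Exception {
        // value() and update() are automatically scoped to the current key.
        long newTotal = total.value() + update.f1;
        total.update(newTotal);
        out.collect(Tuple2.of(update.f0, newTotal));
    }
}

You would apply it on a keyed stream, e.g.
updates.keyBy(0).flatMap(new StatefulUpdater()).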

Best,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-3257
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/state.html#using-the-keyvalue-state-interface

2016-09-29 1:15 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>:

> Hi all,
>
> I’ve got a very specialized DB (runs in the JVM) that I need to use to
> both keep track of state and generate new records to be processed by my
> Flink streaming workflow. Some of the workflow results are updates to be
> applied to the DB.
>
> And the DB needs to be partitioned.
>
> My initial approach is to wrap it in a regular operator, and have
> subsequent streams be inputs for updating state. So now I’ve got an
> IterativeDataStream, which should work.
>
> But I imagine I could also wrap this DB in a source and a sink, yes?
> Though I’m not sure how I could partition it as a source, in that case.
>
> If it is feasible to have a partitioned source/sink, are there general
> pros/cons to either approach?
>
> Thanks,
>
> — Ken
>
>
