Hi,

(This question is more appropriate for the user mailing list, not dev - when 
responding to my e-mail please remove dev mailing list from the recipients, 
I’ve kept it just FYI that discussion has moved to user mailing list).

Could it be, that the problem is caused by changes in chaining strategy of the 
AsyncWaitOperator in 1.9, as explained in the release notes [1]?

> AsyncIO
> Due to a bug in the AsyncWaitOperator, in 1.9.0 the default chaining 
> behaviour of the operator is now changed so that it 
> is never chained after another operator. This should not be problematic for 
> migrating from older version snapshots as 
> long as an uid was assigned to the operator. If an uid was not assigned to 
> the operator, please see the instructions here [2]
> for a possible workaround.
>
> Related issues:
>
>       • FLINK-13063: AsyncWaitOperator shouldn’t be releasing 
> checkpointingLock [3]

Piotrek

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.9.html#asyncio
 
<https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.9.html#asyncio>
[2] 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/upgrading.html#matching-operator-state
 
<https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/upgrading.html#matching-operator-state>
[3] https://issues.apache.org/jira/browse/FLINK-13063 
<https://issues.apache.org/jira/browse/FLINK-13063>

> On 30 Oct 2019, at 16:52, Bekir Oguz <bekir.o...@dpgmedia.nl> wrote:
> 
> Hi guys,
> during our upgrade from 1.8.1 to 1.9.1, one of our jobs fail to start with
> the following exception. We deploy the job with 'allow-non-restored-state'
> option and from the latest checkpoint dir of the 1.8.1 version.
> 
> org.apache.flink.util.StateMigrationException: The new state typeSerializer
> for operator state must not be incompatible.
>    at org.apache.flink.runtime.state.DefaultOperatorStateBackend
> .getListState(DefaultOperatorStateBackend.java:323)
>    at org.apache.flink.runtime.state.DefaultOperatorStateBackend
> .getListState(DefaultOperatorStateBackend.java:214)
>    at org.apache.flink.streaming.api.operators.async.AsyncWaitOperator
> .initializeState(AsyncWaitOperator.java:272)
>    at org.apache.flink.streaming.api.operators.AbstractStreamOperator
> .initializeState(AbstractStreamOperator.java:281)
>    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(
> StreamTask.java:881)
>    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask
> .java:395)
>    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>    at java.lang.Thread.run(Thread.java:748)
> 
> We see from the Web UI that the 'async wait operator' is causing this,
> which is not changed at all during this upgrade.
> 
> All other jobs are migrated without problems, only this one is failing. Has
> anyone else experienced this during migration?
> 
> Regards,
> Bekir Oguz

Reply via email to