Hi JM,

Yes, `InvalidPidMappingException` occurs because the transaction is
lost in most cases.

For short-term, " transaction.timeout.ms" >
"transactional.id.expiration.ms" can ignore the
`InvalidPidMappingException`[1].
For long-term, FLIP-319[2] provides a solution.

[1] 
https://speakerdeck.com/rmetzger/3-flink-mistakes-we-made-so-you-wont-have-to?slide=13
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071710

Jean-Marc Paulin <j...@uk.ibm.com> 于2024年4月20日周六 02:30写道:
>
> Hi,
>
> we use Flink 1.18 with Kafka Sink, and we enabled `EXACTLY_ONCE` on one of 
> our kafka sink. We set the transation timeout to 15 minutes. When we try to 
> restore from a savepoint, way after that 15 minutes window, Flink enter in a 
> RESTARTING loop. We see the error:
>
> ```
> {
>   "exception": {
>     "exception_class": 
> "org.apache.kafka.common.errors.InvalidPidMappingException",
>     "exception_message": "The producer attempted to use a producer id which 
> is not currently assigned to its transactional id.",
>     "stacktrace": "org.apache.kafka.common.errors.InvalidPidMappingException: 
> The producer attempted to use a producer id which is not currently assigned 
> to its transactional id.\n"
>   },
>   "@version": 1,
>   "source_host": "aiops-ir-lifecycle-eventprocessor-ep-jobmanager-0",
>   "message": "policy-exec::schedule-policy-execution -> 
> (policy-exec::select-kafka-async-policy-stages, 
> policy-exec::select-async-policy-stages -> 
> policy-exec::execute-async-policy-stages, 
> policy-exec::select-non-async-policy-stages, Sink: stories-input, Sink: 
> policy-completion-results, Sink: stories-changes, Sink: alerts-input, Sink: 
> story-notifications-output, Sink: alerts-output, Sink: alerts-changes, Sink: 
> connector-alerts, Sink: updated-events-output, Sink: stories-output, Sink: 
> runbook-execution-requests) (6/6) 
> (3f8cb042c1aa628891c444466a8b52d1_593c33b9decafa4ad6ae85c185860bef_5_0) 
> switched from INITIALIZING to FAILED on 
> aiops-ir-lifecycle-eventprocessor-ep-taskmanager-1.aiops-ir-lifecycle-eventprocessor-ep-taskmanager.cp4aiops.svc:6122-d2828c
>  @ 
> aiops-ir-lifecycle-eventprocessor-ep-taskmanager-1.aiops-ir-lifecycle-eventprocessor-ep-taskmanager.cp4aiops.svc.cluster.local
>  (dataPort=6121).",
>   "thread_name": "flink-pekko.actor.default-dispatcher-18",
>   "@timestamp": "2024-04-19T11:11:05.169+0000",
>   "level": "INFO",
>   "logger_name": "org.apache.flink.runtime.executiongraph.ExecutionGraph"
> }
> ```
> As much as I understanding the transaction is lost, would it be possible to 
> ignore this particular error and resume the job anyway?
>
> Thanks for any suggestions
>
> JM
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU



-- 
Best,
Yanfei

Reply via email to