If the logs are not helping, I think the remaining option is to attach a debugger [1]. I'd start by setting a breakpoint in LegacySourceFunctionThread#run and see what happens. If the issue occurs during recovery, set a breakpoint in StreamTask#beforeInvoke instead.
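For reference, here is a minimal sketch of what the debugger setup could look like. It assumes a standalone cluster configured through flink-conf.yaml. The port and JDWP options are my own assumptions; the wiki page in [1] is the actual guide. After restarting the TaskManagers, you would attach your IDE's remote JVM debugger to that port and then set the breakpoints.

```yaml
# flink-conf.yaml (sketch): open a JDWP debug port on the TaskManager JVM.
# Port 5005 is an arbitrary choice. suspend=n lets the TaskManager start
# without waiting for a debugger to attach.
# On Java 9+, address=5005 binds to localhost only; use address=*:5005 there
# if the debugger connects from another machine.
# If several TaskManagers run on the same host, each needs its own port.
env.java.opts.taskmanager: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```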
[1] https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters

On Fri, May 28, 2021 at 1:11 PM Igal Shilman <i...@ververica.com> wrote:

> Hi Tim,
> Any additional logs from before are highly appreciated, this would help us
> to trace this issue.
> By the way, do you see something in the JobManager's UI?
>
> On Fri, May 28, 2021 at 9:06 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
>> Hi Timothy,
>>
>> It would indeed be hard to figure this out without any stack traces.
>>
>> Have you tried changing to debug level logs? Maybe you can also try using
>> the StateFun Harness to restore and run your job in the IDE - in that case
>> you should be able to see which code exactly is throwing this exception.
>>
>> Cheers,
>> Gordon
>>
>> On Fri, May 28, 2021 at 12:39 PM Timothy Bess <tdbga...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Just checking to see if anyone has experienced this error. Might just be
>>> a Flink thing that's irrelevant to statefun, but my job keeps failing over
>>> and over with this message:
>>>
>>> 2021-05-28 03:51:13,001 INFO org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer [] - Starting FlinkKafkaInternalProducer (10/10) to produce into default topic __stateful_functions_random_topic_lNVlkW9SkYrtZ1oK
>>> 2021-05-28 03:51:13,001 INFO org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer [] - Attempting to resume transaction feedback-union -> functions -> Sink: bluesteel-kafka_egress-egress-dd0a6f77c8b5eccd4b7254cdfd577ff9-45 with producerId 31 and epoch 3088
>>> 2021-05-28 03:51:13,017 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: lead-leads-ingress -> router (leads) (10/10) (ff51aacdb850c6196c61425b82718862) switched from RUNNING to FAILED.
>>> java.lang.NullPointerException: null
>>>
>>> The null pointer doesn't come with any stack traces or anything. It's
>>> really mystifying. Seems to just fail while restoring continuously.
>>>
>>> Thanks,
>>>
>>> Tim