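Sometimes a job like this runs for more than 2 hours and all I can do is cancel it. The cancel never completes cleanly, though, and it always brings down the TaskManager. The error is as follows: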

2021-01-31 23:04:23,677 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Task did not exit gracefully within 180 + seconds.
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
        at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1685) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021-01-31 23:04:23,685 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Fatal error occurred while executing the TaskManager. Shutting it down...
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
        at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1685) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021-01-31 23:04:23,686 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Stopping TaskExecutor akka.tcp://flink@10.13.69.52:45901/user/rpc/taskmanager_0.
2021-01-31 23:04:23,686 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close ResourceManager connection 1bd159f361d86e77d17e261ab44b5128.
2021-01-31 23:04:23,689 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Task 'Source: HiveSource-snmpprobe.p_port_traffic_5m ->
Calc(select=[binaryid AS id, ver, CAST(2021-01-31 21:45:00:TIMESTAMP(6)) AS
coltime, CAST(in_octets) AS in_octets, CAST(out_octets) AS out_octets,
CAST(bi_octets) AS bi_octets, CAST(unimax_octets) AS unimax_octets,
in_speed, out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util,
unimax_util, inout_ratio, bandwidth, origin, CAST((() DATE_FORMAT
_UTF-16LE'yyyy-MM-dd HH:mm:ss')) AS crtime], where=[(coltime = 2021-01-31
21:45:00:TIMESTAMP(9))]) -> Sink:
Sink(table=[myhive.prod_mysql_zqzynetdb.p_port_traffic_5m], fields=[id, ver,
coltime, in_octets, out_octets, bi_octets, unimax_octets, in_speed,
out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util, unimax_util,
inout_ratio, bandwidth, origin, crtime]) (1/1)#0' did not react to
cancelling signal for 30 seconds, but is stuck in method:
 java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
com.mysql.cj.protocol.ReadAheadInputStream.fill(ReadAheadInputStream.java:107)
com.mysql.cj.protocol.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:150)
com.mysql.cj.protocol.ReadAheadInputStream.read(ReadAheadInputStream.java:180)
java.io.FilterInputStream.read(FilterInputStream.java:133)
com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:64)
com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:63)
com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:45)
com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:52)
com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:41)
com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:54)
com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:44)
com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:538)
com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:708)
com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:647)
com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:946)
com.mysql.cj.NativeSession.execSQL(NativeSession.java:1075)
com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1092)
com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:832)
com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:796)
org.apache.flink.connector.jdbc.statement.FieldNamedPreparedStatementImpl.executeBatch(FieldNamedPreparedStatementImpl.java:65)
org.apache.flink.connector.jdbc.internal.executor.TableSimpleStatementExecutor.executeBatch(TableSimpleStatementExecutor.java:64)
org.apache.flink.connector.jdbc.internal.executor.TableBufferReducedStatementExecutor.executeBatch(TableBufferReducedStatementExecutor.java:101)
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:216)
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:184)
org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:167)
org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.invoke(OutputFormatSinkFunction.java:87)
org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:49)
org.apache.flink.table.runtime.operators.sink.SinkOperator.processElement(SinkOperator.java:72)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:112)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:93)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
BatchCalc$597.processElement(Unknown Source)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:112)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:93)
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:160)
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:101)
org.apache.flink.connector.file.src.impl.FileSourceRecordEmitter.emitRecord(FileSourceRecordEmitter.java:45)
org.apache.flink.connector.file.src.impl.FileSourceRecordEmitter.emitRecord(FileSourceRecordEmitter.java:35)
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:128)
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:275)
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:395)
org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$267/2023964146.runDefaultAction(Unknown Source)
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:609)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
java.lang.Thread.run(Thread.java:748)

2021-01-31 23:04:23,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Close JobManager connection for job 13bc5fa5addba5772e9425161b48a2e3.
2021-01-31 23:04:23,691 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to fail task externally Source:
HiveSource-snmpprobe.p_port_traffic_5m -> Calc(select=[binaryid AS id, ver,
CAST(2021-01-31 21:45:00:TIMESTAMP(6)) AS coltime, CAST(in_octets) AS
in_octets, CAST(out_octets) AS out_octets, CAST(bi_octets) AS bi_octets,
CAST(unimax_octets) AS unimax_octets, in_speed, out_speed, bi_speed,
unimax_speed, in_util, out_util, bi_util, unimax_util, inout_ratio,
bandwidth, origin, CAST((() DATE_FORMAT _UTF-16LE'yyyy-MM-dd HH:mm:ss')) AS
crtime], where=[(coltime = 2021-01-31 21:45:00:TIMESTAMP(9))]) -> Sink:
Sink(table=[myhive.prod_mysql_zqzynetdb.p_port_traffic_5m], fields=[id, ver,
coltime, in_octets, out_octets, bi_octets, unimax_octets, in_speed,
out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util, unimax_util,
inout_ratio, bandwidth, origin, crtime]) (1/1)#0
(7f4ca2467b4b31e2476bb7f5b93f6d33).
2021-01-31 23:04:23,691 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Task Source: HiveSource-snmpprobe.p_port_traffic_5m ->
Calc(select=[binaryid AS id, ver, CAST(2021-01-31 21:45:00:TIMESTAMP(6)) AS
coltime, CAST(in_octets) AS in_octets, CAST(out_octets) AS out_octets,
CAST(bi_octets) AS bi_octets, CAST(unimax_octets) AS unimax_octets,
in_speed, out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util,
unimax_util, inout_ratio, bandwidth, origin, CAST((() DATE_FORMAT
_UTF-16LE'yyyy-MM-dd HH:mm:ss')) AS crtime], where=[(coltime = 2021-01-31
21:45:00:TIMESTAMP(9))]) -> Sink:
Sink(table=[myhive.prod_mysql_zqzynetdb.p_port_traffic_5m], fields=[id, ver,
coltime, in_octets, out_octets, bi_octets, unimax_octets, in_speed,
out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util, unimax_util,
inout_ratio, bandwidth, origin, crtime]) (1/1)#0 is already in state
CANCELING
2021-01-31 23:04:23,693 INFO  org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:22, state:ALLOCATED, resource profile: ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=172.000mb (180355070 bytes), taskOffHeapMemory=0 bytes, managedMemory=128.000mb (134217730 bytes), networkMemory=16.000mb (16777216 bytes)}, allocationId: b70921707d488586bce0319feb054ebc, jobId: 13bc5fa5addba5772e9425161b48a2e3).
2021-01-31 23:04:23,693 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to fail task externally Source:
HiveSource-snmpprobe.p_port_traffic_5m -> Calc(select=[binaryid AS id, ver,
CAST(2021-01-31 21:45:00:TIMESTAMP(6)) AS coltime, CAST(in_octets) AS
in_octets, CAST(out_octets) AS out_octets, CAST(bi_octets) AS bi_octets,
CAST(unimax_octets) AS unimax_octets, in_speed, out_speed, bi_speed,
unimax_speed, in_util, out_util, bi_util, unimax_util, inout_ratio,
bandwidth, origin, CAST((() DATE_FORMAT _UTF-16LE'yyyy-MM-dd HH:mm:ss')) AS
crtime], where=[(coltime = 2021-01-31 21:45:00:TIMESTAMP(9))]) -> Sink:
Sink(table=[myhive.prod_mysql_zqzynetdb.p_port_traffic_5m], fields=[id, ver,
coltime, in_octets, out_octets, bi_octets, unimax_octets, in_speed,
out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util, unimax_util,
inout_ratio, bandwidth, origin, crtime]) (1/1)#0
(7f4ca2467b4b31e2476bb7f5b93f6d33).
2021-01-31 23:04:23,693 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Task Source: HiveSource-snmpprobe.p_port_traffic_5m ->
Calc(select=[binaryid AS id, ver, CAST(2021-01-31 21:45:00:TIMESTAMP(6)) AS
coltime, CAST(in_octets) AS in_octets, CAST(out_octets) AS out_octets,
CAST(bi_octets) AS bi_octets, CAST(unimax_octets) AS unimax_octets,
in_speed, out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util,
unimax_util, inout_ratio, bandwidth, origin, CAST((() DATE_FORMAT
_UTF-16LE'yyyy-MM-dd HH:mm:ss')) AS crtime], where=[(coltime = 2021-01-31
21:45:00:TIMESTAMP(9))]) -> Sink:
Sink(table=[myhive.prod_mysql_zqzynetdb.p_port_traffic_5m], fields=[id, ver,
coltime, in_octets, out_octets, bi_octets, unimax_octets, in_speed,
out_speed, bi_speed, unimax_speed, in_util, out_util, bi_util, unimax_util,
inout_ratio, bandwidth, origin, crtime]) (1/1)#0 is already in state
CANCELING
2021-01-31 23:04:23,695 INFO  org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl [] - Free slot TaskSlot(index:22, state:RELEASING, resource profile: ResourceProfile{cpuCores=1.0000000000000000, taskHeapMemory=172.000mb (180355070 bytes), taskOffHeapMemory=0 bytes, managedMemory=128.000mb (134217730 bytes), networkMemory=16.000mb (16777216 bytes)}, allocationId: b70921707d488586bce0319feb054ebc, jobId: 13bc5fa5addba5772e9425161b48a2e3).
2021-01-31 23:04:33,693 INFO  org.apache.flink.runtime.blob.PermanentBlobCache             [] - Shutting down BLOB cache
2021-01-31 23:04:33,693 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Shutting down BLOB cache
2021-01-31 23:04:33,693 INFO  org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager [] - Shutting down TaskExecutorLocalStateStoresManager.
2021-01-31 23:04:33,695 INFO  org.apache.flink.runtime.filecache.FileCache                 [] - removed file cache directory /tmp/flink-dist-cache-b62b4fe6-a247-48c2-b1c2-f813ac4d2a78
2021-01-31 23:04:33,697 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager removed spill file directory /tmp/flink-io-a8733545-1457-42f4-892a-779141bc4ce5
2021-01-31 23:04:33,697 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager removed spill file directory /tmp/flink-netty-shuffle-3889e83c-89e2-436e-8e30-2997eaf8cd21
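
For reference, the "30 seconds" and "180 + seconds" figures in the log above match the default values of Flink's task cancellation settings. A minimal sketch of the corresponding flink-conf.yaml entries follows; the values are Flink's documented defaults and are only assumed to apply to this cluster, since the actual configuration is not shown here.

```yaml
# flink-conf.yaml (sketch; documented default values, assumed rather than read from this cluster)

# Interval in milliseconds between consecutive attempts to cancel a running task.
# Assumed to correspond to "did not react to cancelling signal for 30 seconds".
task.cancellation.interval: 30000

# Time in milliseconds after which a pending cancellation is treated as a fatal
# TaskManager error ("Task did not exit gracefully within 180 + seconds").
# A value of 0 deactivates the watchdog.
task.cancellation.timeout: 180000
```

Raising or disabling the timeout would probably only hide the symptom, because the task thread is blocked in java.net.SocketInputStream.socketRead0 under com.mysql.cj. MySQL Connector/J has a socketTimeout connection property (in milliseconds) that bounds such reads. Whether setting it on the JDBC URL of this sink is appropriate is an assumption that would need testing against this job.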