Hi Yidan,

One more thing to confirm: are you creating the savepoint and stopping the job 
in one step with the following command?

 bin/flink cancel -s [:targetDirectory] :jobId

Best,
 Yun



 ------------------Original Mail ------------------
Sender:赵一旦 <hinobl...@gmail.com>
Send Date:Sun Feb 7 16:13:57 2021
Recipients:Till Rohrmann <trohrm...@apache.org>
CC:Robert Metzger <rmetz...@apache.org>, user <user@flink.apache.org>
Subject:Re: flink kryo exception

It may also have something to do with my job's first tasks. The second task 
has two inputs: one is the Kafka source stream (A), the other is a self-defined 
MySQL source used as a broadcast stream (B). 
In A: I have a 'WatermarkReAssigner', a self-defined operator which adds an 
offset to its input watermark and then forwards it downstream.
In B: The parallelism is 30, but in my rich function's implementation only 
subtask 0 runs the MySQL query and sends out records; the other subtasks do 
nothing. All subtasks send max_watermark - 86400_000 as the watermark.
Since both of the first tasks have some self-defined source or implementation, 
I do not know whether the problem has something to do with them.
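
For illustration only, the offset arithmetic described for A can be sketched in plain Java as below. This is not the author's WatermarkReAssigner: the class and method names are hypothetical, and the saturation guard is an assumption added here, since the thread does not say how the operator treats a watermark at or near Long.MAX_VALUE (the timestamp Flink uses for its final watermark).

```java
/** Hypothetical helper, not the WatermarkReAssigner from the thread. */
final class WatermarkShift {
    private WatermarkShift() {}

    /**
     * Adds offsetMs to watermarkMs without wrapping around.
     * A watermark of Long.MAX_VALUE is passed through unchanged so that an
     * "end of stream" marker stays recognizable downstream (assumed behavior).
     */
    static long shift(long watermarkMs, long offsetMs) {
        if (watermarkMs == Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        try {
            return Math.addExact(watermarkMs, offsetMs);
        } catch (ArithmeticException overflow) {
            // Saturate instead of silently wrapping to the other end of the range.
            return offsetMs > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
    }
}
```

With this guard, shifting max_watermark - 86400_000 by +86400_000 yields exactly Long.MAX_VALUE rather than a wrapped negative value.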
On Sun, Feb 7, 2021 at 4:05 PM 赵一旦 <hinobl...@gmail.com> wrote:

The first problem is critical, since the savepoint does not work.
For the second problem, I changed the solution and removed the 'Map'-based 
implementation before the data is passed to the second task; in this case the 
savepoint works. The only issue is that I expected to stop the job, note the 
savepoint path, and then restart the job with that path. What happens now is: 
I stop the job, then the job fails and restarts automatically from the 
generated savepoint. So I do not need to restart the job myself anymore, since 
what it does automatically is what I wanted to do.

I suspect it may also be related to the data, so I am not sure that I can 
provide an example that reproduces the problem.
On Sat, Feb 6, 2021 at 12:13 AM Till Rohrmann <trohrm...@apache.org> wrote:

Could you provide us with a minimal working example which reproduces the 
problem for you? This would be super helpful in figuring out the problem you 
are experiencing. Thanks a lot for your help.

Cheers,
Till
On Fri, Feb 5, 2021 at 1:03 PM 赵一旦 <hinobl...@gmail.com> wrote:

Yes, but if they were different, why would my job run normally? The problem 
only occurs when I stop it.
On Fri, Feb 5, 2021 at 7:08 PM Robert Metzger <rmetz...@apache.org> wrote:

Are you 100% sure that the jar files in the classpath (/lib folder) are exactly 
the same on all machines? (It can happen quite easily in a distributed 
standalone setup that some files are different)
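
One way to check this is to print a checksum for every jar in the lib folder on each machine and compare the listings. The sketch below is a minimal example using only the JDK; the class name LibChecksums and the default directory "lib" are assumptions for illustration, not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: lists SHA-256 checksums of the jars in one directory. */
final class LibChecksums {
    private LibChecksums() {}

    /** Returns the SHA-256 digest of data as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Maps jar file name to checksum, sorted by name so listings diff cleanly. */
    static Map<String, String> checksums(Path libDir) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> files = Files.list(libDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                if (Files.isRegularFile(p) && name.endsWith(".jar")) {
                    result.put(name, sha256Hex(Files.readAllBytes(p)));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        checksums(libDir).forEach((name, hash) -> System.out.println(hash + "  " + name));
    }
}
```

Running it on every node and diffing the output shows any jar that is missing, extra, or different in content.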


On Fri, Feb 5, 2021 at 12:00 PM 赵一旦 <hinobl...@gmail.com> wrote:

Flink 1.12.0; only using aligned checkpoints; standalone cluster.



On Fri, Feb 5, 2021 at 6:52 PM Robert Metzger <rmetz...@apache.org> wrote:

Are you using unaligned checkpoints? (there's a known bug in 1.12.0 which can 
lead to corrupted data when using UC)
Can you tell us a little bit about your environment? (How are you deploying 
Flink, which state backend are you using, what kind of job (I guess DataStream 
API))

Somehow the process receiving the data is unable to deserialize it, most likely 
because they are configured differently (different classpath, dependency 
versions etc.)
On Fri, Feb 5, 2021 at 10:36 AM 赵一旦 <hinobl...@gmail.com> wrote:

I do not think this is a problem in my code anymore; maybe it is a bug?
On Fri, Feb 5, 2021 at 4:30 PM 赵一旦 <hinobl...@gmail.com> wrote:

Hi all, I found that the failure always occurred in the second task, the one 
after the source task. So I changed the first chained task to transform the 
'Map'-based class object into a normal class object, and the problem 
disappeared.
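
A rough sketch of that kind of conversion is shown below. The AccessEvent class and its uid and url fields are hypothetical and only illustrate the shape; the "timestamp" and "server_time" keys are the ones MapRecord uses elsewhere in this thread. In a real job the typed class would normally be a public top-level class with a public no-argument constructor.

```java
import java.util.Map;

final class MapToPojoSketch {
    private MapToPojoSketch() {}

    /** Hypothetical typed record; the field names are assumptions. */
    public static class AccessEvent {
        public long timestamp;
        public String uid;
        public String url;

        public AccessEvent() {}
    }

    /** Copies the known keys of a loosely typed map into explicit fields. */
    static AccessEvent fromMap(Map<?, ?> raw) {
        AccessEvent event = new AccessEvent();
        Object ts = raw.get("timestamp");
        if (!(ts instanceof Number)) {
            ts = raw.get("server_time"); // fall back to the second key
        }
        event.timestamp = ts instanceof Number ? ((Number) ts).longValue() : 0L;
        event.uid = asString(raw.get("uid"));
        event.url = asString(raw.get("url"));
        return event;
    }

    private static String asString(Object value) {
        return value == null ? null : value.toString();
    }
}
```

The point of the design is that every field sent to the next task has a fixed, declared type, so nothing of type Object has to be written with a class tag on the wire.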

With the new solution, I also tried stopping the job and restoring it from a 
savepoint (all successful).

But I also hit another problem. It also occurs when I stop the job, and also 
in the second task after the source task. The log is below:
2021-02-05 16:21:26
java.io.EOFException
    at org.apache.flink.core.memory.DataInputDeserializer.readUnsignedByte(DataInputDeserializer.java:321)
    at org.apache.flink.types.StringValue.readString(StringValue.java:783)
    at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:75)
    at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:33)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:411)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:411)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:202)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
    at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:92)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:145)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
    at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:92)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:575)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:539)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
    at java.lang.Thread.run(Thread.java:748)
It is also about serialization and deserialization, but not related to Kryo 
this time.
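
As a generic illustration of what an EOFException during deserialization means, the sketch below uses plain java.io streams, not Flink's serializers or network stack: a reader that is handed fewer bytes than the writer produced runs out of input in the middle of a record. The class name is hypothetical, and this does not claim to show the cause of the failure in this job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical demo: decoding a length-prefixed string from too few bytes. */
final class TruncatedRecordSketch {
    private TruncatedRecordSketch() {}

    /** Writes a 2-byte length followed by the string bytes. */
    static byte[] encode(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value);
        }
        return bytes.toByteArray();
    }

    /** Reads one length-prefixed string back. */
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    /** Returns true if decoding hits the end of the input before the record is complete. */
    static boolean failsWithEof(byte[] data) {
        try {
            decode(data);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Decoding the full output of encode succeeds, while decoding the same bytes with the last few cut off fails with EOFException, because the length prefix promises more bytes than are present.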


On Wed, Feb 3, 2021 at 9:22 PM Till Rohrmann <trohrm...@apache.org> wrote:

From these snippets it is hard to tell what's going wrong. Could you maybe give 
us a minimal example with which to reproduce the problem? Alternatively, have 
you read through Flink's serializer documentation [1]? Have you tried to use a 
simple POJO instead of inheriting from a HashMap?

The stack trace looks as if the job fails deserializing some key of your 
MapRecord map.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/types_serialization.html#most-frequent-issues

Cheers,
Till
On Wed, Feb 3, 2021 at 11:49 AM 赵一旦 <hinobl...@gmail.com> wrote:

Some facts that are possibly related, since another job does not run into 
these exceptions.
The problem job uses a class which contains a field of class MapRecord, and 
MapRecord is defined to extend HashMap so as to accept variable JSON data.

Class MapRecord:
@NoArgsConstructor
@Slf4j
public class MapRecord extends HashMap<Object, Object> implements Serializable {
    @Override
    public void setTimestamp(Long timestamp) {
        put("timestamp", timestamp);
        put("server_time", timestamp);
    }

    @Override
    public Long getTimestamp() {
        try {
            Object ts = getOrDefault("timestamp", getOrDefault("server_time", 0L));
            return ((Number) Optional.ofNullable(ts).orElse(0L)).longValue();
        } catch (Exception e) {
            log.error("Error, MapRecord's timestamp invalid.", e);
            return 0L;
        }
    }
}
Class UserAccessLog:
public class UserAccessLog extends AbstractRecord<UserAccessLog> {
    private MapRecord d;  // I think this is related to the problem...
    ... ...
}
On Wed, Feb 3, 2021 at 6:43 PM 赵一旦 <hinobl...@gmail.com> wrote:

Actually, the exception is different every time I stop the job.
For example:
(1) com.esotericsoftware.kryo.KryoException: Unable to find class: g^XT
The stack trace is the one I gave earlier.

(2) java.lang.IndexOutOfBoundsException: Index: 46, Size: 17
2021-02-03 18:37:24
java.lang.IndexOutOfBoundsException: Index: 46, Size: 17
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
    at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:135)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:346)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:411)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:202)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
    at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:92)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:145)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
    at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:92)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:575)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:539)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
    at java.lang.Thread.run(Thread.java:748)
(3) com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 96
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 96
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:135)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:346)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:411)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:202)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
    at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:92)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:145)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
    at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:92)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:575)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:539)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
    at java.lang.Thread.run(Thread.java:748)
...
On Wed, Feb 3, 2021 at 6:28 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi,

could you show us the job you are trying to resume? Is it a SQL job or a 
DataStream job, for example?

From the stack trace, it looks as if the class g^XT is not on the class path.

Cheers,
Till
On Wed, Feb 3, 2021 at 10:30 AM 赵一旦 <hinobl...@gmail.com> wrote:

I have a job whose checkpoints and savepoints all work fine.
But if I stop the job using 'stop -p', the job fails after the savepoint has 
been generated. Here is the log:

2021-02-03 16:53:55,179 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - ual_ft_uid_subid_SidIncludeFilter -> ual_ft_uid_subid_DefaultPassThroughFilter[null, null) -> ual_ft_uid_subid_UalUidFtExtractor -> ual_ft_uid_subid_EmptyUidFilter (17/30)#0 (46abce5d1148b56094726d442df2fd9c) switched from RUNNING to FAILED.

com.esotericsoftware.kryo.KryoException: Unable to find class: g^XT
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:346) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:411) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:202) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:92) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:145) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:92) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:372) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:575) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:539) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) [flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) [flink-dist_2.11-1.12.0.jar:1.12.0]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_251]
Caused by: java.lang.ClassNotFoundException: g^XT
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_251]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_251]
        at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:63) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:72) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:49) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_251]
        at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:168) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at java.lang.Class.forName0(Native Method) ~[?:1.8.0_251]
        at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_251]
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 22 more
