Hi,
We use S3 as our datastore for checkpoints/savepoints, and following an S3 error
we saw this exception:
```
java.io.IOException: GET operation failed: Could not transfer error message
at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:231)
at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:139)
at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:177)
at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:269)
at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.createUserCodeClassLoader(BlobLibraryCacheManager.java:268)
at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:243)
at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1200(BlobLibraryCacheManager.java:210)
at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:350)
at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1042)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:624)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Thread.java:839)
Caused by: java.io.IOException: Could not transfer error message
at org.apache.flink.runtime.blob.BlobUtils.readExceptionFromStream(BlobUtils.java:348)
at org.apache.flink.runtime.blob.BlobClient.receiveAndCheckGetResponse(BlobClient.java:276)
at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:226)
... 11 more
Caused by: java.lang.ClassNotFoundException: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException
at java.base/java.lang.Class.forNameImpl(Native Method)
at java.base/java.lang.Class.forName(Class.java:418)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2124)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1991)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2322)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1808)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:573)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:483)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
at org.apache.flink.runtime.blob.BlobUtils.readExceptionFromStream(BlobUtils.java:345)
```
I think it comes from here:
https://github.com/apache/flink/blob/4fe66e0697471105e0f0a3f8519bb0c0ac559709/flink-runtime/src/main/java/org/apache/flink/runtime/blob/BlobUtils.java#L338-L350.
Sadly, I do not have a good/easy way to reproduce this scenario.
My guess is that ClassLoader.getSystemClassLoader() is not the right classloader
here when the exception comes from a plugin (the S3 plugin here), as plugins have
their own dedicated classloader, if I understand correctly.
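
To illustrate that guess, here is a minimal, self-contained sketch (not Flink code). It assumes the failure mode is simply "the classloader used for deserialization cannot see the exception class". All names in it (PluginClassLoaderSketch, PluginOnlyException, HidingClassLoader, LoaderAwareInputStream) are hypothetical stand-ins; the hiding classloader only simulates a system classloader that cannot see plugin classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

class PluginClassLoaderSketch {

    /** Hypothetical stand-in for an exception class that only a plugin classloader can see. */
    static class PluginOnlyException extends IOException {
        private static final long serialVersionUID = 1L;

        PluginOnlyException(String message) {
            super(message);
        }
    }

    /** Simulates a classloader without access to plugin classes by refusing one class name. */
    static class HidingClassLoader extends ClassLoader {
        private final String hiddenName;

        HidingClassLoader(ClassLoader parent, String hiddenName) {
            super(parent);
            this.hiddenName = hiddenName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(hiddenName)) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    /** ObjectInputStream that resolves classes through an explicitly supplied classloader. */
    static class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            return Class.forName(desc.getName(), false, loader);
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new LoaderAwareInputStream(new ByteArrayInputStream(bytes), loader)) {
            return in.readObject();
        }
    }

    /** Returns true if deserializing with the given loader fails with ClassNotFoundException. */
    static boolean failsWithClassNotFound(byte[] bytes, ClassLoader loader) throws IOException {
        try {
            deserialize(bytes, loader);
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new PluginOnlyException("simulated S3 failure"));
        ClassLoader seesPlugin = PluginClassLoaderSketch.class.getClassLoader();
        ClassLoader hidesPlugin =
                new HidingClassLoader(seesPlugin, PluginOnlyException.class.getName());

        System.out.println("loader that sees the class fails: "
                + failsWithClassNotFound(bytes, seesPlugin));
        System.out.println("loader that hides the class fails: "
                + failsWithClassNotFound(bytes, hidesPlugin));
    }
}
```

If this is indeed what happens, the sender serializes the plugin's exception fine, but the receiving side can only deserialize it with a classloader that knows the plugin classes.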
It seems harmless (far more things are broken for us at that point), but as the
code comment states this "should never occur", I thought I should flag
it.
Kind regards
JM