Hi,

We use S3 as our datastore for checkpoints/savepoints, and following an S3
error we saw this exception:

```
java.io.IOException: GET operation failed: Could not transfer error message
        at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:231)
        at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:139)
        at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:177)
        at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:269)
        at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.createUserCodeClassLoader(BlobLibraryCacheManager.java:268)
        at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:243)
        at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1200(BlobLibraryCacheManager.java:210)
        at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:350)
        at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1042)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:624)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
        at java.base/java.lang.Thread.run(Thread.java:839)
Caused by: java.io.IOException: Could not transfer error message
        at org.apache.flink.runtime.blob.BlobUtils.readExceptionFromStream(BlobUtils.java:348)
        at org.apache.flink.runtime.blob.BlobClient.receiveAndCheckGetResponse(BlobClient.java:276)
        at org.apache.flink.runtime.blob.BlobClient.getInternal(BlobClient.java:226)
        ... 11 more
Caused by: java.lang.ClassNotFoundException: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException
        at java.base/java.lang.Class.forNameImpl(Native Method)
        at java.base/java.lang.Class.forName(Class.java:418)
        at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
        at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2124)
        at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1991)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2322)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1808)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:573)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:483)
        at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
        at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
        at org.apache.flink.runtime.blob.BlobUtils.readExceptionFromStream(BlobUtils.java:345)
```

I think this comes from here:
https://github.com/apache/flink/blob/4fe66e0697471105e0f0a3f8519bb0c0ac559709/flink-runtime/src/main/java/org/apache/flink/runtime/blob/BlobUtils.java#L338-L350
Sadly, I do not have a good/easy way to reproduce this scenario.

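My guess is that ClassLoader.getSystemClassLoader() is not valid here when the
exception comes from a plugin (the S3 plugin here), as plugins have their own
dedicated classloader, if I understand correctly. The exception class would
then not be visible to the system classloader on the receiving side.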
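
To illustrate what I mean, here is a minimal standalone sketch (not Flink code). All the names in it (PluginExceptionDemo, PluginOnlyException, RestrictedClassLoader, LoaderAwareInputStream) are made up for the example, and the "plugin" isolation is only simulated with a classloader that refuses to load one class. It assumes the deserialization resolves classes through a given classloader, as the ClassLoaderObjectInputStream.resolveClass frame in the trace suggests:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;

class PluginExceptionDemo {

    // Hypothetical stand-in for an exception class that only exists in a plugin jar.
    static class PluginOnlyException extends IOException {
        PluginOnlyException(String message) {
            super(message);
        }
    }

    // Simulates a classloader that cannot see plugin classes:
    // it refuses PluginOnlyException and delegates everything else to its parent.
    static class RestrictedClassLoader extends ClassLoader {
        RestrictedClassLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(PluginOnlyException.class.getName())) {
                throw new ClassNotFoundException(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    // ObjectInputStream that resolves classes through an explicit classloader.
    static class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader classLoader;

        LoaderAwareInputStream(InputStream in, ClassLoader classLoader) throws IOException {
            super(in);
            this.classLoader = classLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            return Class.forName(desc.getName(), false, classLoader);
        }
    }

    // Serializes an object to a byte array (the "sender" side).
    static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns true if the bytes can be deserialized with the given classloader,
    // false if a class in the stream cannot be resolved (the "receiver" side).
    static boolean canDeserialize(byte[] bytes, ClassLoader classLoader) {
        try (ObjectInputStream in =
                new LoaderAwareInputStream(new ByteArrayInputStream(bytes), classLoader)) {
            in.readObject();
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new PluginOnlyException("simulated S3 failure"));
        ClassLoader full = PluginExceptionDemo.class.getClassLoader();
        System.out.println("loader that sees the class: " + canDeserialize(bytes, full));
        System.out.println("loader that does not:       "
                + canDeserialize(bytes, new RestrictedClassLoader(full)));
    }
}
```

If my reading is right, the second case is what happens to us: the serialized exception names a class that the classloader used for deserialization cannot resolve, so reading the error message itself fails with ClassNotFoundException.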

It seems harmless (far more things are broken for us at that point), but as the
code comment states this "should never occur", I thought I should flag it.

Kind regards

JM

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
