Ufuk Celebi created FLINK-11402:
-----------------------------------

             Summary: User code can fail with an UnsatisfiedLinkError in the 
presence of multiple classloaders
                 Key: FLINK-11402
                 URL: https://issues.apache.org/jira/browse/FLINK-11402
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.7.0
            Reporter: Ufuk Celebi
         Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz

As reported on the user mailing list thread "[`env.java.opts` not persisting after job canceled or failed and then restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E]", there can be issues with using native libraries and user code class loading.

h2. Steps to reproduce

I was able to reproduce the issue reported on the mailing list using 
[snappy-java|https://github.com/xerial/snappy-java] in a user program. Running 
the attached user program works fine on initial submission, but results in a 
failure when re-executed.

I'm using Flink 1.7.0 with a standalone cluster started via 
{{bin/start-cluster.sh}}.

0. Unpack attached Maven project and build using {{mvn clean package}} *or* 
directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}}
1. Download [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] and unpack libsnappyjava for your system:
{code}
jar tf snappy-java-1.1.7.2.jar | grep libsnappy
...
org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so
...
org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib
...
{code}
2. Point the system library path at the directory containing {{libsnappyjava}} 
in {{flink-conf.yaml}} (path needs to be adjusted for your system):
{code}
env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64
{code}
3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}}
{code}
bin/flink run hello-snappy-1.0-SNAPSHOT.jar
Starting execution of program
Program execution finished
Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished.
Job Runtime: 359 ms
{code}
4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}}
{code}
bin/flink run hello-snappy-1.0-SNAPSHOT.jar
Starting execution of program

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7d69baca58f33180cb9251449ddcd396)
  at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
  at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
  at com.github.uce.HelloSnappy.main(HelloSnappy.java:18)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
  at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
  at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
  at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
  at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
  at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
  at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
  at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
  at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
  at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
  at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
  ... 17 more
Caused by: java.lang.UnsatisfiedLinkError: Native Library /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded in another classloader
  at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1907)
  at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1861)
  at java.lang.Runtime.loadLibrary0(Runtime.java:870)
  at java.lang.System.loadLibrary(System.java:1122)
  at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:182)
  at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)
  at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
  at com.github.uce.HelloSnappy.lambda$main$95f17bfa$1(HelloSnappy.java:13)
  at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
  at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
  at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
  at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
  at org.apache.flink.streaming.api.functions.source.FromElementsFunction.run(FromElementsFunction.java:164)
  at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
  at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
  at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
  at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
  at java.lang.Thread.run(Thread.java:748)
{code}

*Note*: The attached user code configures Snappy to use {{libsnappyjava}} in 
the path specified by {{java.library.path}} (see 
{{org-xerial-snappy.properties}}). When bundling the native code in the user 
JAR, repeated execution works fine.
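
To illustrate the failure mode, here is a toy model of the rule the error message in the stack trace points at: a given native library can be bound to only one class loader at a time. This is a sketch, not the JVM's implementation and not the attached program. The class {{NativeLibraryRegistryModel}}, its methods, and the library path are hypothetical names made up for illustration. It assumes that each job submission gets its own user code class loader and that the first one is still alive when the second job runs.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (NOT the real JVM code) of the "one class loader per native
 * library" rule that surfaces as the UnsatisfiedLinkError above.
 */
public class NativeLibraryRegistryModel {

    // Library path -> class loader that loaded it first.
    private final Map<String, ClassLoader> owners = new HashMap<>();

    /** Stand-in for the check a System.loadLibrary call eventually hits. */
    public synchronized void load(String libraryPath, ClassLoader caller) {
        if (!owners.containsKey(libraryPath)) {
            owners.put(libraryPath, caller); // first load: caller becomes the owner
            return;
        }
        if (owners.get(libraryPath) != caller) {
            throw new UnsatisfiedLinkError(
                "Native Library " + libraryPath + " already loaded in another classloader");
        }
        // Same class loader asking again: nothing to do.
    }

    /** Simulates two submissions of the same job against one long-running JVM. */
    public static String simulateTwoSubmissions() {
        NativeLibraryRegistryModel jvm = new NativeLibraryRegistryModel();
        String lib = "/hypothetical/path/libsnappyjava.jnilib"; // assumed path

        // Assumption: every submission gets a fresh user code class loader.
        ClassLoader firstJob = new URLClassLoader(new URL[0]);
        ClassLoader secondJob = new URLClassLoader(new URL[0]);

        jvm.load(lib, firstJob); // initial submission succeeds
        try {
            jvm.load(lib, secondJob); // rerun uses a different class loader
            return "loaded";
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateTwoSubmissions());
    }
}
```

In this model the library path is the key, so a second class loader asking for the same file under {{java.library.path}} is rejected. That may also be why bundling the native code in the user JAR avoids the problem, if each run then loads its own extracted copy, but the report does not confirm this.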




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
