Christopher Tubbs created ACCUMULO-2764:
-------------------------------------------

             Summary: Stopping MAC before its processes have fully started causes an indefinite hang
                 Key: ACCUMULO-2764
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2764
             Project: Accumulo
          Issue Type: Bug
          Components: mini
    Affects Versions: 1.6.0
          Environment: OpenJDK 1.6.0, CentOS 6.5, 2CPU, 6GB RAM (virtual hardware)
            Reporter: Christopher Tubbs
             Fix For: 1.6.1, 1.7.0


I saw this while testing 1.6.0-RC5.

Calling process.destroy() and then process.waitFor() before the process has
fully started, as MiniAccumuloCluster does in its stop method, appears to
cause an indefinite hang.
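For reference, the shutdown pattern described above reduces to the following simplified sketch. It is not MiniAccumuloClusterImpl's actual stop() code: the class and method names are hypothetical, and `sleep 60` merely stands in for a MAC child process. The point it illustrates is that waitFor() has no timeout, so if destroy() fails to terminate the child, the caller blocks forever.

```java
import java.io.IOException;

// Hypothetical illustration of the destroy()-then-waitFor() shutdown pattern.
public class DestroyThenWait {

    // Ask the child to terminate, then block until it exits.
    // On UNIX, OpenJDK's destroy() sends SIGTERM; waitFor() has no timeout,
    // so this call never returns if the child does not actually die.
    static int stop(Process p) throws InterruptedException {
        p.destroy();
        return p.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Stand-in for a MAC child process that is stopped right after start().
        Process p = new ProcessBuilder("sleep", "60").start();
        System.out.println("exit code: " + stop(p));
    }
}
```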

I saw this most recently in
MiniAccumuloClusterGCTest.testAccurateProcessListReturned, which gets a
ProcessReference and then immediately shuts down MAC. The same problem was
also the root cause of ACCUMULO-2756. In this instance, the test hung in the
MAC teardown.

{code:java}
"main" prio=10 tid=0x00007f3cf4008800 nid=0x2b19 in Object.wait() [0x00007f3cf8f9c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.UNIXProcess.waitFor(UNIXProcess.java:181)
        - locked <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
        at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:607)
        at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterGCTest.tearDownMiniCluster(MiniAccumuloClusterGCTest.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}

It appears that destroy() does not actually terminate a process that is still
starting, so the subsequent waitFor() blocks indefinitely. I haven't debugged
further. It may be a JVM bug, a limitation of the Java Process API, or some
UNIX signal-handling quirk during process creation that destroy() cannot
detect.

One fix could be to make start() wait until the metadata table can be scanned
before it returns, ensuring that all processes are actually running and ready.
Another would be to have the teardown code call destroy() again if waitFor()
doesn't return within a reasonable amount of time.
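Both proposed fixes can be sketched in plain Java. This is a hedged illustration, not Accumulo code: `waitUntilReady` and `stopWithRetry` are hypothetical helpers, the readiness check is an arbitrary `Callable` (for MAC it might attempt a metadata table scan), and escalating to `destroyForcibly()` on retries is an added assumption beyond simply calling destroy() again. `Process.waitFor(long, TimeUnit)` and `destroyForcibly()` were added in Java 8, so this would not compile on the OpenJDK 1.6 environment above; a Java 6 version would need to run waitFor() on a separate thread and bound the wait, e.g. with `Future.get(timeout, unit)`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class BoundedStop {

    // Fix 1 (sketch): poll a readiness check until it succeeds or the deadline passes.
    // Nothing Accumulo-specific is assumed; the check is any Callable<Boolean>.
    static boolean waitUntilReady(Callable<Boolean> check, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                if (Boolean.TRUE.equals(check.call())) {
                    return true;
                }
            } catch (InterruptedException e) {
                throw e; // don't swallow interrupts
            } catch (Exception e) {
                // treat other failures (e.g. service not reachable yet) as "not ready"
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            Thread.sleep(pollMs);
        }
    }

    // Fix 2 (sketch): destroy, wait a bounded time, and retry instead of blocking forever.
    // Requires Java 8+ for waitFor(long, TimeUnit) and destroyForcibly().
    static boolean stopWithRetry(Process p, long waitMs, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (i == 0) {
                p.destroy();         // polite request first (SIGTERM on UNIX)
            } else {
                p.destroyForcibly(); // assumption: escalate on later attempts
            }
            if (p.waitFor(waitMs, TimeUnit.MILLISECONDS)) {
                return true;         // process has exited
            }
        }
        return false; // still alive after all attempts
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("sleep", "60").start();
        System.out.println("stopped=" + stopWithRetry(p, 5000, 3));
    }
}
```

Returning `false` rather than throwing leaves it to the caller (for example, a test teardown) to decide whether to fail loudly or just log the leaked process.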



--
This message was sent by Atlassian JIRA
(v6.2#6252)
