Hi Póka ,

nice approach!
Please report what you find out,

thanks
Marcel

Póka Balázs schrieb:
Hi Marcel,

I have modified the TopicContainer class as follows (in order to help me track down the problem):

  private final class TopicContainer {
     private volatile TopicHandler topicHandler;

     private final boolean fairness = false;

     private final ReentrantLock lock = new DebugReentrantLock(fairness);

     public TopicContainer(TopicHandler topicHandler) {
        this.topicHandler = topicHandler;
     }

     public TopicHandler getTopicHandler() {
        return this.topicHandler;
     }

     public void erase() {
        if (!this.lock.isHeldByCurrentThread())
throw new IllegalStateException("must be locked before erased");
        if (this.topicHandler == null)
           return;
        synchronized (this.lock) {
           this.topicHandler = null;
           int c = this.lock.getHoldCount();
           for (int i = 0; i < c; i++)
              this.lock.unlock();
           if (this.lock.isLocked()) log.severe("Must be unlocked!");
        }
     }

     public TopicHandler lock() {
        // topicHandler already erased
        if (this.topicHandler == null)
           return null;
        try {
           this.lock.tryLock(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
           throw new RuntimeException("Interrupted while locking", e);
        }
        // topicHandler might have been erased while locking
        TopicHandler th = this.topicHandler;
        if (th == null) {
           this.lock.unlock();
           return null;
        }
        return th;
     }

     public void unlock() {
        synchronized (this.lock) {
if (this.lock.getHoldCount() > 0) // returns 0 if we are not the
              // holder
              this.lock.unlock(); // IllegalMonitorStateException if our
           // thread is not the holder of the lock,
           // never happens because of above if()
        }
     }
  } // class TopicContainer

There are some additional checks as you can see. Additionally, I'm using tryLock with a timeout of one minute. I also extended ReentrantLock as DebugReentrantLock which saves the last stacktraces of the thread which aquired the lock:

public class DebugReentrantLock extends ReentrantLock {
private static final Logger logger=Logger.getLogger("DebugReentrantLock"); private ArrayList holderTrace=new ArrayList();

   public DebugReentrantLock() {
   }

   public DebugReentrantLock(boolean fair) {
      super(fair);
   }
public boolean tryLock() {
      boolean ret=super.tryLock();
      synchronized(this) {
         if (ret) {
            if (getHoldCount()==1) holderTrace.clear();
            holderTrace.add(new Exception());
         }
         return ret;
      }
   }
public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
      boolean ret=super.tryLock(timeout, unit);
      synchronized(this) {
         if (ret) {
            if (getHoldCount()==1) holderTrace.clear();
            holderTrace.add(new Exception());
         }
         else {
            Thread owner=getOwner();
            Exception ownerE=new Exception();
            ownerE.setStackTrace(owner.getStackTrace());
            logger.log(Level.SEVERE, "trylock failed, locked by", ownerE);
            Iterator i=holderTrace.iterator();
            while (i.hasNext()) {
logger.log(Level.SEVERE, "lock() called at", (Exception)i.next());
            }
logger.log(Level.SEVERE, "failed to lock() at", new Exception());
         }
         return ret;
      }
   }

   public void lock() {
      super.lock();
      synchronized(this) {
         if (getHoldCount()==1) holderTrace.clear();
         holderTrace.add(new Exception());
      }
   }
public void lockInterruptibly() throws InterruptedException {
      super.lockInterruptibly();
      synchronized(this) {
         if (getHoldCount()==1) holderTrace.clear();
         holderTrace.add(new Exception());
      }
   }
}

I hope this will yield some results, I'm trying to stress test the server now. Unfortunately this bug only occurred on the production server so far.
Feel free to tell me any idea you might have. :)

regards,
Balázs Póka


--
Marcel Ruff
http://www.xmlBlaster.org
http://watchee.net
Phone: +49 7551 309371

Reply via email to