Hi Leszek,

2.10.10 is the latest official stable version.
This is the first time I have heard of a problem like yours. Since the RMI thread still seems to be running, I would think that this is an issue with the OS/JVM stack. One last thing: do you have iptables enabled? Do you see the port being used in netstat?

Best regards,
Emmanuel

I have only one IP address and one network adapter. There are no eth
interface related messages in /var/log/messages. None of the telnets
worked. The strange thing is that I was able to connect to the
controller at the very beginning, and then at some point it just
stopped responding. It is not a problem for me as long as the system is
under construction, but as soon as it becomes production ready it will
be unacceptable. That is why I am looking for an explanation of what
happened.

I guess that 2.10.10 is the latest stable version I can use?

Anyway, thanks for your help!


On Thu, Dec 11, 2008 at 7:02 PM, Emmanuel Cecchet <[email protected]> wrote:
Hi Leszek,

The stacktraces look good. The console uses the standard RMI connector to
connect to the controller and RMI still seems to be running on the machine.
To rule out a name resolution problem, could you try the following from
the controller machine:
telnet 127.0.0.1 1090
telnet public_ip 1090
telnet localhost 1090
telnet public_name 1090
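
The same four checks can also be scripted. Below is a minimal Java sketch, assuming the console port is 1090 as mentioned in this thread; the class name `PortProbe` and the method `canConnect` are hypothetical names chosen for illustration and are not part of Sequoia. It only reports whether a TCP connection can be opened, which is roughly what a successful telnet shows.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolvable host name:
            // all are reported as "not reachable" in this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        int port = 1090; // console port discussed in this thread
        // Pass the public IP and public name as arguments to cover all four cases.
        String[] hosts = args.length > 0 ? args : new String[] {"127.0.0.1", "localhost"};
        for (String host : hosts) {
            String result = canConnect(host, port, 2000) ? "open" : "unreachable";
            System.out.println(host + ":" + port + " -> " + result);
        }
    }
}
```

If the loopback address connects but the public name does not, that would point towards name resolution or firewall rules rather than the controller itself.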

Also does your machine have multiple IP addresses and network adapters?

To see if your network went down, you should have messages in
/var/log/messages saying that eth0 went down and then up.

Thanks for the detailed feedback,
Emmanuel

Hi Emmanuel,

Thank you for your response and for redirecting me to the appropriate
list. My answers to your questions are below.

On Thu, Dec 11, 2008 at 4:31 PM, Emmanuel Cecchet <[email protected]> wrote:

Hi Leszek,


Please use sequoia@ rather than community@ for Sequoia questions. The
community mailing list is used for community announcements (we should
probably rename it to announcements to avoid the confusion).

I've experienced a serious problem with Sequoia Controller (version
2.10.10).
Yesterday, I left it running with full debug on and there were almost no
requests to the virtual database after that (or just a few per hour). What
happened this morning really surprised me. The Sequoia Controller process
is still running in the system, but there is nothing listening on port 1090
(which is the default port the console connects to). I cannot connect with
the console, and (now it gets even worse) the virtual database is no longer
reachable by the Sequoia JDBC driver.
I've checked the logs but there is nothing there. There is only
"controller.core.PingResponder", which stopped pinging anything during the
night... It looks like the Controller died, but the java process is still
there.

Does anyone have any idea what could have happened there? Is it a known
issue, or could this be a misconfiguration on my side?


The JVM might have crashed. Do you have the output of the console where you
started Sequoia?

- The JVM still seems to be working. I can still see the process in the
system, and even when I list all java processes with the "jps" JDK
tool, the Controller is there (I still have this Controller running,
although it is not working),
- Console output ends with "20:47:50,416 DEBUG
controller.core.PingResponder Response to ping sent to /192.168.2.120"
and many other ping logs before,
- I can even use the JDK "jstack" tool to dump current thread traces.
Thread dumps do not change at all and are always the same (please find
the jstack output pasted below this message). Thread Dump Analyzer
(TDA) says "65% of all threads are sleeping on a monitor". I guess
this is because there are no requests received by the Controller. Please
note that this one: "RMI TCP Accept-0" daemon prio=10
tid=0xb53d7000 nid=0x2bbe runnable is RUNNABLE, which makes it even
more confusing...
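
A figure like the "65% of all threads are sleeping on a monitor" above can be cross-checked by counting the `java.lang.Thread.State:` lines in a saved jstack dump. The following is a minimal Java sketch; the class name `ThreadStateTally` and the file name in the usage comment are hypothetical, and it assumes the dump keeps each state line on a line of its own, as in the output pasted below.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadStateTally {
    // Matches lines such as "java.lang.Thread.State: WAITING (on object monitor)"
    // and captures only the state name (RUNNABLE, WAITING, TIMED_WAITING, ...).
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State:\\s+(\\w+)");

    /** Counts how many threads in a jstack text dump are in each state. */
    public static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: jstack <pid> > dump.txt
        //                     java ThreadStateTally dump.txt
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        tally(dump).forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Comparing the tallies of two dumps taken a few minutes apart is a quick way to confirm that nothing is moving, although it says nothing about why the threads are idle.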


What OS and JVM are you using?

This is Debian, Linux myhost 2.6.26-1-686 #1 SMP Sat Nov 8 19:00:26
UTC 2008 i686 GNU/Linux


If there is nothing in the Sequoia log, did you find anything in your system
logs?

Nothing useful at all can be found there.


Did the network go down on the machine?

I don't think so, but I don't know how to be 100% sure. There
should be some exceptions logged by the PingResponder if that
happened.


Does the machine use DHCP, and could its IP address have changed?

There is no DHCP so IP never changes.

I guess this could be an operating system and/or JDK error. It seems
extremely strange.

I attach a bunch of stacktraces below,
Thank you for your involvement,
Leszek.

*** JSTACK output ***

2008-12-11 15:12:12
Full thread dump Java HotSpot(TM) Client VM (10.0-b23 mixed mode,
sharing):

"Attach Listener" daemon prio=10 tid=0x08a4e400 nid=0x3fab waiting on
condition [0x00000000..0x00000000]
  java.lang.Thread.State: RUNNABLE

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08a87c00 nid=0x25c0 in Object.wait()
[0xb48d5000..0xb48d5e30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08a87800 nid=0x25bf in Object.wait()
[0xb455a000..0xb455aeb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08885c00 nid=0x25be in Object.wait()
[0xb4a7e000..0xb4a7ef30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08885800 nid=0x25bd in Object.wait()
[0xb4509000..0xb4509fb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08a83400 nid=0x25bc in Object.wait()
[0xb45ab000..0xb45ac030]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextCommitRollbackToExecute(BackendTaskQueues.java:1796)
       - locked <0x8ca52888> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:182)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08ab0400 nid=0x7213 in Object.wait()
[0xb46ef000..0xb46efdb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08889400 nid=0x7212 in Object.wait()
[0xb4833000..0xb4833e30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08aaf400 nid=0x7211 in Object.wait()
[0xb4740000..0xb4740eb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x08888800 nid=0x7210 in Object.wait()
[0xb5055000..0xb5055f30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
daemon prio=10 tid=0x0888a800 nid=0x720f in Object.wait()
[0xb4791000..0xb4791fb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextCommitRollbackToExecute(BackendTaskQueues.java:1796)
       - locked <0x8ca26958> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:182)

"DestroyJavaVM" prio=10 tid=0xb516c400 nid=0x2bb5 waiting on condition
[0x00000000..0xb7d32080]
  java.lang.Thread.State: RUNNABLE

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
prio=10 tid=0x08b6d000 nid=0x2bed in Object.wait()
[0xb4acf000..0xb4acfeb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
prio=10 tid=0x08b6ac00 nid=0x2bec in Object.wait()
[0xb4bc2000..0xb4bc2f30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
prio=10 tid=0x08b69800 nid=0x2beb in Object.wait()
[0xb4b71000..0xb4b71fb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
prio=10 tid=0x08b67000 nid=0x2bea in Object.wait()
[0xb4b20000..0xb4b21030]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextEntryToExecute(BackendTaskQueues.java:1722)
       - locked <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:185)

"dating - BackendWorkerThread for backend 'pg1' with RAIDb level:1"
prio=10 tid=0x08b68800 nid=0x2be9 in Object.wait()
[0xb4c13000..0xb4c140b0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at java.lang.Object.wait(Object.java:485)
       at
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues.getNextCommitRollbackToExecute(BackendTaskQueues.java:1796)
       - locked <0x8c8bfca0> (a
org.continuent.sequoia.controller.loadbalancer.BackendTaskQueues)
       at
org.continuent.sequoia.controller.loadbalancer.BackendWorkerThread.run(BackendWorkerThread.java:182)

"derby.rawStoreDaemon" daemon prio=10 tid=0x08a77000 nid=0x2bca in
Object.wait() [0xb4d79000..0xb4d7a130]
  java.lang.Thread.State: TIMED_WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       at org.apache.derby.impl.services.daemon.BasicDaemon.rest(Unknown
Source)
       - locked <0x8c68d8e8> (a
org.apache.derby.impl.services.daemon.BasicDaemon)
       at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown
Source)
       at java.lang.Thread.run(Thread.java:619)

"Timer-1" daemon prio=10 tid=0x08a64000 nid=0x2bc9 in Object.wait()
[0xb4dca000..0xb4dcadb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c650b68> (a java.util.TaskQueue)
       at java.lang.Object.wait(Object.java:485)
       at java.util.TimerThread.mainLoop(Timer.java:483)
       - locked <0x8c650b68> (a java.util.TaskQueue)
       at java.util.TimerThread.run(Timer.java:462)

"derby.antiGC" daemon prio=10 tid=0xb512e800 nid=0x2bc8 in
Object.wait() [0xb4e1e000..0xb4e1ee30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c64adc8> (a
org.apache.derby.impl.services.monitor.AntiGC)
       at java.lang.Object.wait(Object.java:485)
       at org.apache.derby.impl.services.monitor.AntiGC.run(Unknown
Source)
       - locked <0x8c64adc8> (a
org.apache.derby.impl.services.monitor.AntiGC)
       at java.lang.Thread.run(Thread.java:619)

"RMI Scheduler(0)" daemon prio=10 tid=0xb53db400 nid=0x2bc2 waiting on
condition [0xb5004000..0xb5005130]
  java.lang.Thread.State: WAITING (parking)
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for  <0x8c617e20> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
       at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
       at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
       at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
       at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:582)
       at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:575)
       at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:946)
       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:906)
       at java.lang.Thread.run(Thread.java:619)

"RMI TCP Accept-0" daemon prio=10 tid=0xb53d7000 nid=0x2bbe runnable
[0xb5252000..0xb5252f30]
  java.lang.Thread.State: RUNNABLE
       at java.net.PlainSocketImpl.socketAccept(Native Method)
       at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
       - locked <0x8c61a5e0> (a java.net.SocksSocketImpl)
       at java.net.ServerSocket.implAccept(ServerSocket.java:453)
       at java.net.ServerSocket.accept(ServerSocket.java:421)
       at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
       at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
       at java.lang.Thread.run(Thread.java:619)

"Low Memory Detector" daemon prio=10 tid=0xb590a000 nid=0x2bbb
runnable [0x00000000..0x00000000]
  java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0xb5908800 nid=0x2bba waiting on
condition [0x00000000..0xb5773be8]
  java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0xb5907400 nid=0x2bb9 runnable
[0x00000000..0xb57c4a90]
  java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0xb5901c00 nid=0x2bb8 in Object.wait()
[0xb5a64000..0xb5a64e30]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c57af60> (a java.lang.ref.ReferenceQueue$Lock)
       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
       - locked <0x8c57af60> (a java.lang.ref.ReferenceQueue$Lock)
       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0xb5900c00 nid=0x2bb7 in
Object.wait() [0xb5ab5000..0xb5ab5eb0]
  java.lang.Thread.State: WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x8c57afe8> (a java.lang.ref.Reference$Lock)
       at java.lang.Object.wait(Object.java:485)
       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
       - locked <0x8c57afe8> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x08856800 nid=0x2bb6 runnable

"VM Periodic Task Thread" prio=10 tid=0xb5913800 nid=0x2bbc waiting on
condition

JNI global references: 1076




I would greatly appreciate any help!!

BTW: When visiting http://sequoia.continuent.org/HomePage I receive this error:
"Query failed: insert into sequoia_referrers set page_tag = 'HomePage',
referrer =
'http://www.google.com/search?client=opera&rls=pl&q=sequoia+jdbc&sourceid=opera&ie=utf-8&oe=utf-8',
time = now() (Can't open file: 'sequoia_referrers.MYI' (errno: 145))"
- it looks like the referrer cannot be saved (quite unexpected on the web page
of database professionals, but it is easy to work around ;))


It looks like the MySQL database behind the site has crashed. Unfortunately
I don't think that Continuent is using its own product to back up its
site...

Thanks for your feedback,
Emmanuel


--
Emmanuel Cecchet
FTO @ Frog Thinker Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: [email protected]
Skype: emmanuel_cecchet

_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
