On Mon, Jun 8, 2009 at 12:04 PM, Neil Ellis <neil.el...@mangala.co.uk> wrote:
> So after a weekend away from the problem it occurred to me that I had
> neglected to nohup the java process. Therefore I suspect that was
> the problem; running now with nohup ;-)
>

Let's hope this works. I am not 100% sure about this, but normally a
kill signal sent to a Java process shouldn't affect any of the Java
threads' interrupt status. If that is the case (for example when using
java.nio stuff), the behavior is new to me. One way to achieve this would
be to register a signal handler (via sun.misc.Signal) and manually go
through each thread and interrupt it (so check whether any code like
that is running). What JVM are you running?
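To make that scenario concrete, here is a minimal Java sketch of the kind of code that would interrupt every thread when a signal arrives. The JVM does not do this on its own, so this is the pattern to look for in application or library code. It uses sun.misc.Signal and sun.misc.SignalHandler (unsupported JDK-internal APIs); the class name, the helper interruptAll and the choice of HUP are illustrative assumptions, not anything taken from Neo4j.

```java
// Hypothetical illustration of "interrupt every thread on a signal"; not Neo4j code.
import java.util.Arrays;
import sun.misc.Signal;        // unsupported JDK-internal API (module jdk.unsupported)
import sun.misc.SignalHandler;

public class InterruptOnSignal {

    /** Interrupts every live thread in threads except the given one; returns how many were interrupted. */
    static int interruptAll(Iterable<Thread> threads, Thread except) {
        int count = 0;
        for (Thread t : threads) {
            if (t != except && t.isAlive()) {
                t.interrupt();
                count++;
            }
        }
        return count;
    }

    /**
     * Hypothetical handler: when the named signal arrives, interrupt every live
     * thread in the JVM. A thread inside FileChannel.write at that moment would
     * get ClosedByInterruptException.
     */
    static void install(String signalName) {
        SignalHandler handler = sig ->
                interruptAll(Thread.getAllStackTraces().keySet(), Thread.currentThread());
        Signal.handle(new Signal(signalName), handler);
    }

    /** Exercises only the interrupting part, on one sleeping worker, without a real signal. */
    static boolean demo() {
        final boolean[] sawInterrupt = {false};
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                sawInterrupt[0] = true; // the worker observed the interrupt
            }
        });
        worker.start();
        interruptAll(Arrays.asList(worker), Thread.currentThread());
        try {
            worker.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return sawInterrupt[0];
    }

    public static void main(String[] args) {
        // install("HUP") would make "kill -HUP <pid>" interrupt every thread in this JVM.
        System.out.println("worker saw interrupt: " + demo());
    }
}
```

The demo deliberately interrupts only its own worker thread, so it can run without sending a real signal to the process.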

Another thing: I just went through all the places where we catch
InterruptedException in Neo4j and actually found one place that calls
java.nio code that didn't reset the interrupt properly after a wait.
I fixed this in trunk (b9-SNAPSHOT); if you want, you could try to
apply the same patch to b8 (see https://trac.neo4j.org/changeset/2890)
or I could provide a jar for you. If you want to try out b9-SNAPSHOT,
just remember that you can't switch back to b8 since the store layout
has changed.
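For readers wondering what handling the interrupt "properly" around a wait can look like: one common idiom (a generic sketch, not the contents of changeset 2890) is to keep waiting when interrupted, remember the interrupt, and restore the thread's interrupt status only after the critical work is done. That way an interrupt arriving during the wait is not turned into a ClosedByInterruptException by a FileChannel write that follows it. The class and method names are hypothetical.

```java
// Hypothetical illustration of deferring an interrupt past a wait; not Neo4j code.
public class DeferredInterrupt {

    private final Object lock = new Object();
    private boolean ready = false; // guarded by lock

    /** Makes the condition true and wakes any waiters. */
    public void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits until markReady() has been called, then runs work. An interrupt that
     * arrives during the wait does not abort it; it is remembered and the thread's
     * interrupt status is restored in the finally block, after work has run, so
     * code further up the stack can still see it.
     */
    public void awaitReadyThen(Runnable work) {
        boolean interrupted = false;
        try {
            synchronized (lock) {
                while (!ready) {            // re-check: also guards against spurious wakeups
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        interrupted = true; // throwing InterruptedException cleared the status
                    }
                }
            }
            work.run(); // e.g. a FileChannel write; the interrupt caught above is not pending here
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore, never swallow
            }
        }
    }

    /** Interrupts a waiting thread; checks that work ran and the status is set afterwards. */
    static boolean demo() {
        DeferredInterrupt d = new DeferredInterrupt();
        boolean[] seen = new boolean[2]; // [0] work ran, [1] interrupted afterwards
        Thread waiter = new Thread(() -> {
            d.awaitReadyThen(() -> seen[0] = true);
            seen[1] = Thread.currentThread().isInterrupted();
        });
        waiter.start();
        waiter.interrupt();
        d.markReady();
        try {
            waiter.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return seen[0] && seen[1];
    }
}
```

Note the limit of this idiom: it only protects against interrupts that land during the wait. An interrupt delivered while the write itself is in progress still closes the channel.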

> It also occurred to me that on receipt of ClosedByInterruptException
> Neo4j should probably trigger a shutdown (i.e. neo.shutdown()) and log
> that it was interrupted, since an interrupt has been triggered on the
> write thread. The later stack traces show that Neo doesn't recover
> from such an interrupt, so this is probably the best course of action
> for now, and logically it is what should happen when an interrupt is
> triggered.
>

Agreed, performing a shutdown and presenting a message to the user
about the cause would be much better than the current behavior. I will
start thinking about how this can be implemented.
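A minimal sketch of the behaviour agreed above, assuming a plain FileChannel and a caller-supplied shutdown action standing in for neo.shutdown(). The wiring, class and method names are hypothetical, not Neo4j's actual code. It also demonstrates the java.nio behaviour discussed in this thread: a write on a thread whose interrupt status is set closes the channel and throws ClosedByInterruptException, and the thread's interrupt status stays set.

```java
// Hypothetical illustration of "shut down on ClosedByInterruptException"; not Neo4j code.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ShutdownOnInterrupt {

    /**
     * Writes buf fully to channel. If the write is torn by an interrupt, java.nio
     * has already closed the channel, so log the cause, run the shutdown action
     * (a stand-in for something like neo.shutdown()) and rethrow.
     */
    static void writeOrShutdown(FileChannel channel, ByteBuffer buf, Runnable shutdown)
            throws IOException {
        try {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        } catch (ClosedByInterruptException e) {
            System.err.println("Thread " + Thread.currentThread().getName()
                    + " was interrupted during a write; channel closed, shutting down");
            shutdown.run();
            throw e;
        }
    }

    /** Writes from a thread that is already interrupted and checks the outcome. */
    static boolean demo() {
        try {
            Path file = Files.createTempFile("txlog", ".tmp");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                boolean[] result = new boolean[3]; // [0] threw, [1] shutdown ran, [2] still interrupted
                Thread writer = new Thread(() -> {
                    Thread.currentThread().interrupt(); // simulate an interrupt from elsewhere
                    try {
                        writeOrShutdown(channel, ByteBuffer.wrap(new byte[] {1, 2, 3}),
                                () -> result[1] = true);
                    } catch (ClosedByInterruptException e) {
                        result[0] = true;
                    } catch (IOException e) {
                        // any other I/O failure leaves result[0] false
                    }
                    result[2] = Thread.currentThread().isInterrupted();
                }, "tx-writer");
                writer.start();
                writer.join();
                return result[0] && result[1] && result[2] && !channel.isOpen();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException | InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupt handled by shutdown: " + demo());
    }
}
```

Rethrowing after the shutdown action keeps the failure visible to the caller (here, the rollback path seen in the stack trace below) instead of letting it continue with a closed channel.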

Regards,
Johan

> Let me know what you think please Johan, meanwhile I'm running with
> nohup and will let you know if that was indeed the cause.
>
> All the best
> Neil
>
> On 5 Jun 2009, at 19:05, Johan Svensson wrote:
>
>> These problems are hard to find.
>>
>> I do not like the way java.nio behaves on interrupts, since I don't
>> know how much data got written/read and the underlying file channel
>> just gets closed. At the moment the only thing we can do is throw an
>> exception and do a full recovery process...
>>
>> Also, I think the Thread.interrupt idiom/usage is broken. The only time
>> you can use it is when you have total control and "own" the full stack,
>> knowing exactly where the specific thread is executing. I have seen some
>> web servers/containers use it to try to control threads that should
>> time out, and that doesn't work very well.
>>
>> How many CPUs do you have on the machine running this (it could be a
>> spurious wakeup somewhere)? Also, do you run concurrent transactions or
>> not? I have been trying to reproduce the other problem with nested
>> transactions, but nothing so far.
>>
>> -Johan
>>
>> On Fri, Jun 5, 2009 at 7:42 PM, Neil Ellis
>> <neil.el...@mangala.co.uk> wrote:
>>> Nope, don't think it's a user exception, digging deeper.
>>>
>>>
>>> On 5 Jun 2009, at 14:55, Neil Ellis wrote:
>>>
>>>> Hi Johan
>>>>
>>>> Took a little longer to fail (circa 31 GB) with
>>>>
>>>> java.nio.channels.ClosedByInterruptException
>>>> Received http://www.myspace.com/nuski
>>>>        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>>>>        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>>>>        at org.neo4j.impl.transaction.TxLog.txDone(TxLog.java:221)
>>>>        at org.neo4j.impl.transaction.TxManager.rollback(TxManager.java:732)
>>>>        at org.neo4j.impl.transaction.TransactionImpl.rollback(TransactionImpl.java:108)
>>>>        at org.neo4j.api.core.EmbeddedNeo$TransactionImpl.finish(EmbeddedNeo.java:377)
>>>>
>>>> So I'm going to have to check through my code to see how the thread
>>>> got interrupted. That smells like a user error; however, it would of
>>>> course be good if we could track the mistake down and allow Neo4j to
>>>> realise it is a user error and fail more gracefully.
>>>> Anyway, my turn to investigate a little more :)
>>>>
>>>> ATB
>>>> Neil
>>>> On 4 Jun 2009, at 19:19, Johan Svensson wrote:
>>>>
>>>>> Thanks, I'll have a look at this and run some tests with nested
>>>>> transactions.
>>>>>
>>>>> -Johan
>>>>>
>>>>> On Thu, Jun 4, 2009 at 7:25 PM, Neil Ellis
>>>>> <neil.el...@mangala.co.uk> wrote:
>>>>>> Hi, these are from CentOS and 1.0-b8. I have changed my code to
>>>>>> avoid nested transactions and now I'm not getting this, so (at the
>>>>>> moment) it looks like a nested transaction quirk ... and it would
>>>>>> only occur after a large number of successful writes (gigs).
>>>>>>
>>>>>> Thx
>>>>>> Neil
>>>>>>
>>>>>> On 4 Jun 2009, at 18:18, Johan Svensson wrote:
>>>>>>
>>>>>>> Hi Neil,
>>>>>>>
>>>>>>> What version of Neo4j are you running (b8, b9-SNAPSHOT), and on
>>>>>>> what OS/file system?
>>>>>>>
>>>>>>> -Johan
>>>>>>>
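Regarding Johan's question about spurious wakeups: they only cause trouble when a wait does not re-check its condition after waking. A minimal sketch of a timed wait that tolerates them, using java.util.concurrent.locks; the class and method names are hypothetical, and the thread does not say whether Neo4j's own waits are written this way.

```java
// Hypothetical illustration of a spurious-wakeup-safe timed wait; not Neo4j code.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedWait {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean ready = false; // guarded by lock

    /** Makes the condition true and wakes all waiters. */
    public void markReady() {
        lock.lock();
        try {
            ready = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits up to the timeout for ready to become true. awaitNanos may return
     * early with no signal (a spurious wakeup), so the condition is re-checked
     * in a loop and the remaining time is carried over rather than restarted.
     */
    public boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!ready) {
                if (nanos <= 0L) {
                    return false; // timed out with the condition still false
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Times out once while not ready, then succeeds after another thread signals. */
    static boolean demo() {
        GuardedWait g = new GuardedWait();
        try {
            boolean timedOut = !g.awaitReady(20, TimeUnit.MILLISECONDS);
            Thread signaller = new Thread(g::markReady);
            signaller.start();
            boolean becameReady = g.awaitReady(10, TimeUnit.SECONDS);
            signaller.join();
            return timedOut && becameReady;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the while loop, a spurious wakeup just costs one more trip around the loop; an "if" in its place would let the caller proceed while the condition is still false.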
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
