:47 PM
To: Rick Grubin
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] I/O errors with NAMD
On 07/23/2010 06:39 PM, Rick Grubin wrote:
>
>> On 2010-07-23, at 11:53, Richard Lefebvre wrote:
>>
>>> If I had some Lustre error, it would give me a clue, but the o
On 07/23/2010 06:39 PM, Rick Grubin wrote:
>
>> On 2010-07-23, at 11:53, Richard Lefebvre wrote:
>>
>>> If I had some Lustre error, it would give me a clue, but the only
>>> errors the users get is the following traceback on the
>>> application:
>>>
>>> -
> On 2010-07-23, at 11:53, Richard Lefebvre wrote:
>
>> If I had some Lustre error, it would give me a clue, but the only errors the
>> users get is the following traceback on the application:
>>
>> ---
>> Reason: FATAL ERROR: Er
On 2010-07-23, at 11:53, Richard Lefebvre wrote:
> If I had some Lustre error, it would give me a clue, but the only errors the
> users get is the following traceback on the application:
>
> ---
> Reason: FATAL ERROR: Error on write
On Fri, Jul 23, 2010 at 10:53:45AM -0700, Richard Lefebvre wrote:
> If I had some Lustre error, it would give me a clue, but the only errors
> the users get is the following traceback on the application:
>
> ---
> Reason: FATAL ERROR
Hi Larry,
From my experience, if the application is doing I/O and the server evicts
the node the application is running on, this will definitely result in an EIO
error being sent to the application, hence the input/output error message in
the application's standard output.
In the case of my clu
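To illustrate the point above about an eviction surfacing as EIO in the application, here is a minimal Python sketch of how a program could tell an EIO on write apart from other failures. It is not taken from NAMD or from Lustre; the helper names (write_all, describe_write_error, checked_write) are hypothetical, and the hint about checking client logs is just the advice given in this thread.

```python
import errno
import os


def write_all(fd, data):
    """Write every byte of data to fd; any OSError propagates to the caller."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write may write fewer bytes than requested, so loop until done.
        written += os.write(fd, view[written:])
    return written


def describe_write_error(exc):
    """Turn an OSError from a failed write into a short diagnostic string."""
    if exc.errno == errno.EIO:
        # Assumption from the thread: EIO may follow a client eviction.
        return "EIO: input/output error; check the client logs for an eviction"
    return "errno %d: %s" % (exc.errno, os.strerror(exc.errno))


def checked_write(path, data):
    """Hypothetical helper: write data to path, return None or a diagnostic."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return describe_write_error(exc)
    try:
        write_all(fd, data)
        # fsync so that a deferred write error is reported here, not lost.
        os.fsync(fd)
        return None
    except OSError as exc:
        return describe_write_error(exc)
    finally:
        os.close(fd)
```

Calling checked_write("restart.bin", b"...") returns None on success and a one-line description otherwise, which makes it easier to match the time of an application failure against the client and server logs.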
If I had some Lustre error, it would give me a clue, but the only error
the users get is the following traceback from the application:
---
Reason: FATAL ERROR: Error on write to binary file
restart/ABCD_les4.95.vel: Interrupted sy
There are many reasons a server might evict a client, such as a
network error or a ptlrpcd bug. In my experience, though, the
only time I see the I/O error is when running NAMD on a Lustre filesystem.
I sometimes see other "evict" events, but none of them
results in an I/O error. So
There is a similar thread on this mailing list:
http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/afe24159554cd3ff/8b37bababf848123?lnk=gst&q=I%2FO+error+on+clients#
There is also an open bug that reports a similar problem:
https://bugzilla.lustre.org/show_bug.cgi?id=23190
On
We sometimes have the same problem when running NAMD on Lustre. The
console log suggests a file lock expired, but I don't know why.
On Fri, Jul 23, 2010 at 8:12 AM, Wojciech Turek wrote:
> Hi Richard,
>
> If the cause of the I/O errors is Lustre there will be some message in the
> logs. I am seeing
Hi Richard,
If Lustre is the cause of the I/O errors, there will be some message in the
logs. I am seeing a similar problem with some applications that run on our
cluster. The symptoms are always the same: just before the application crashes
with an I/O error, the node gets evicted with a message like this:
Lus
On 2010-07-22, at 14:59, Richard Lefebvre wrote:
> I have a problem with the scalable molecular dynamics software NAMD. It
> writes restart files once in a while, but sometimes the binary write
> crashes. When it crashes is not consistent. The only constant is that
> it happens when it writes o
Hi,
I have a problem with the scalable molecular dynamics software NAMD. It
writes restart files once in a while, but sometimes the binary write
crashes. When it crashes is not consistent. The only constant is that
it happens when it writes to our Lustre file system. When it writes to
somethin