Re: [uClinux-dev] waitqueues unreliable on coldfire/2.6.23-uc0?

David McCullough Mon, 07 Apr 2008 16:50:41 -0700

Jivin Wolfgang Wegner lays it down ...
> Hi,
> 
> On Mon, Apr 07, 2008 at 11:52:55AM +1000, David McCullough wrote:
> > 
> > Jivin Wolfgang Wegner lays it down ...
> > > 
> > > you are right, this is a problem that could happen in theory - but on
> > > the other hand there should be one and exactly one interrupt while waiting
> > > for the transfer to complete, so I can not see where the other execution
> > > thread triggering the second interrupt should come from.
> > 
> > 
> > Trace and make sure it isn't.  I don't know the HW,  are you acking the
> > interrupt or stopping further interrupts ?
> 
> I am just acking the interrupt, but as this driver is only capable of
> I2C master anyways, there is no reason for a new interrupt to be generated
> without driver intervention on the HW.



Some hardware will continue to interrupt until the condition causing the
interrupt is removed (ie., you pull the data from the chip or fix an
error condition).

Whatever is happening in this case,  it won't hurt to add trace to the
code and make 100% sure it is behaving as you would expect.

> > > Well, maybe I got it wrong, but to me this looks like a busy wait,
> > 
> > It is,  but you aren't doing the work in your interrupt rouinte,
> > and normally (ie., good practice) you would be limiting the number of
> > times it can loop with a work counter or similar.  Check out some network
> > dirvers for examples.
> 
> coldfire_wait_transfer is called during transfers, not in interrupt
> context

I realised that :-)

> - so the processor would be doing busy wait almost during the
> whole transfer. The driver would have to be completely re-structured

Then the condition for more work when servicing the device is the wrong
check.  I don't know the HW,  I am just making suggestions based on
experience.

> to be comparable to a network driver as far as I can see. (queueing
> transfer requests and processing them in an interrupt-controlled
> tasklet or something.)
> 
> > > which is clearly not what I want a 200+ MHz processor to do while
> > > waiting for a slow bus like I2C.
> > 
> > It's only busy while there are interrupts pending on the device.  If the
> > bus is slow you will easily pull everything out of the chip and exit for
> > further processing before the next interrupt comes along and wakes it up.
> 
> See above - this is not true.
> 
> I agree that the possible race condition is a bit ugly and should be fixed,
> but as far as I can see it is currently only a theoretical problem that can
> not be triggered by current hardware.
> 
> However, as I could solve the problem by adding a second check in
> coldfire_wait_transfer like this:
> 
> static int coldfire_wait_transfer(void) {
> 
>       int timeout;
> 
>       /* wait for data transfer to complete */
>       timeout = wait_event_interruptible_timeout(coldfire_i2c_queue,
>                                                  1 == 
> atomic_read(&coldfire_i2c_irq_happened),
>                                                  COLDFIRE_I2C_TIMEOUT);
> 
>       if ((timeout <= 0 ) || (*MCF_I2C_I2SR & MCF_I2C_I2SR_IAL)) {
>               if((*MCF_I2C_I2SR & MCF_I2C_I2SR_ICF) && (!(*MCF_I2C_I2SR & 
> MCF_I2C_I2SR_IAL))) {
>                       atomic_set(&coldfire_i2c_irq_happened, 0);
>                       return 0;
>               }
>               printk("wt: timeout = %d, I2SR = %x\n, ih = %d",timeout, 
> *MCF_I2C_I2SR, atomic_read(&coldfire_i2c_irq_happened));
>               return -1;
>       }
>       else {
>               atomic_set(&coldfire_i2c_irq_happened, 0);
>               return 0;
>       }
> };
> 
> and I can not see a timeout on I2C hardware level it is obvious to me that
> the original problem is related to the waitqueue failing from time to time.

Then you need to prove this, go looking for why it is happening, try and
find a way to describe in more detail what is going wrong so that others
may be able to spot what is causing your problems.

Mostly, problems are in the drivers,  generic stuff like waitqueue is much
less likely to be broken, but it is possible.  If it is broken you need
to focus on the arch portion of it,  it would be quite unlikely for any
generic waitqueue code to be causing the problem.

Cheers,
Davidm

-- 
David McCullough,  [EMAIL PROTECTED],   Ph:+61 734352815
Secure Computing - SnapGear  http://www.uCdot.org http://www.cyberguard.com
_______________________________________________
uClinux-dev mailing list
[email protected]
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by [email protected]
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

Re: [uClinux-dev] waitqueues unreliable on coldfire/2.6.23-uc0?

Reply via email to