DragonFly BSD
DragonFly bugs List (threaded) for 2011-11

[issue2092] Panic: Bad link elm 0x... next->prev != elm


From: "Matthew Dillon (via DragonFly issue tracker)" <sinknull@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 08 Nov 2011 08:33:01 +0000

Matthew Dillon <dillon@apollo.backplane.com> added the comment:

Ok, a couple of things.

0003 patch - There is no race condition here that I can see in camisr().  It
moves cam_simq to a local on-stack queue with the lock held.  Once on the local
queue it can safely remove the sims from the queue without locking, but of
course must still obtain the CAM_SIM_LOCK() to process the sim.
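To illustrate the pattern described above, here is a minimal userspace sketch,
not the actual camisr() code.  It assumes pthread mutexes in place of the
kernel locks, and struct sim, shared_simq, simq_lock, enqueue_sim() and
drain_simq() are hypothetical names made up for the example.  The shared queue
is spliced onto an on-stack queue with the queue lock held; after that the
local queue is private to the caller, so entries can be unlinked without the
queue lock, while each sim's own lock is still taken to process it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a CAM sim; not the kernel's struct cam_sim. */
struct sim {
    pthread_mutex_t lock;       /* plays the role of CAM_SIM_LOCK() */
    int processed;              /* number of times this sim was handled */
    TAILQ_ENTRY(sim) links;
};
TAILQ_HEAD(simq, sim);

/* Shared queue and its lock (assumed names, for illustration only). */
static struct simq shared_simq = TAILQ_HEAD_INITIALIZER(shared_simq);
static pthread_mutex_t simq_lock = PTHREAD_MUTEX_INITIALIZER;

static void sim_init(struct sim *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->processed = 0;
}

/* Producer side: every access to the shared queue takes the queue lock. */
static void enqueue_sim(struct sim *s)
{
    pthread_mutex_lock(&simq_lock);
    TAILQ_INSERT_TAIL(&shared_simq, s, links);
    pthread_mutex_unlock(&simq_lock);
}

/* Consumer side: returns the number of sims processed. */
static int drain_simq(void)
{
    struct simq local;
    struct sim *s;
    int n = 0;

    TAILQ_INIT(&local);

    /* Move everything to the on-stack queue with the lock held. */
    pthread_mutex_lock(&simq_lock);
    TAILQ_CONCAT(&local, &shared_simq, links);
    assert(TAILQ_EMPTY(&shared_simq));
    pthread_mutex_unlock(&simq_lock);

    /* The local queue is private now: unlinking needs no queue lock. */
    while ((s = TAILQ_FIRST(&local)) != NULL) {
        TAILQ_REMOVE(&local, s, links);

        /* Processing the sim itself still requires the per-sim lock. */
        pthread_mutex_lock(&s->lock);
        s->processed++;
        pthread_mutex_unlock(&s->lock);
        n++;
    }
    return n;
}
```

A sim that is enqueued again while drain_simq() is running lands on the shared
queue, not the local one, so it is simply picked up by the next drain.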

0002 patch - This primarily exists to handle synchronous cases and the
TAILQ_EMPTY check does not require locking.

0001 patch - This is a very interesting patch. I don't think it fixes any bugs,
though; it's just an optimization.  Is that correct or does it fix bugs?  I can
commit it for its optimization I guess.

Now on to the bugs.  I've gone through the code and I see one possible case in
the timeout handling.  If a callout has a high latency it is possible it can
wind up running long after it's been callout_stop()'d by the driver.  That is,
the callout function itself can be in-progress and stuck waiting for a lock at
the time the frontend (which owns the lock) calls callout_stop().  I think state
can get confused if this happens.
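One common way to defend against a late-running timeout of this kind is to
have the handler re-check, after it finally gets the lock, whether the timeout
is still armed.  The sketch below is a userspace model under that assumption
and is not the committed fix: struct ccb, timeout_armed, ccb_start(),
ccb_complete() and ccb_timeout() are hypothetical names, a pthread mutex
stands in for the driver lock, and clearing timeout_armed stands in for
callout_stop().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum ccb_state { CCB_ACTIVE, CCB_DONE };

/* Hypothetical command block; not the kernel's CCB layout. */
struct ccb {
    pthread_mutex_t lock;   /* stands in for the lock the frontend owns */
    bool timeout_armed;     /* cleared where the driver would callout_stop() */
    enum ccb_state state;
    int completions;        /* must never exceed 1 */
};

static void ccb_start(struct ccb *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->timeout_armed = true;
    c->state = CCB_ACTIVE;
    c->completions = 0;
}

/* Frontend path: the command finished normally. */
static void ccb_complete(struct ccb *c)
{
    pthread_mutex_lock(&c->lock);
    c->timeout_armed = false;           /* analogous to callout_stop() */
    if (c->state == CCB_ACTIVE) {
        c->state = CCB_DONE;
        c->completions++;
    }
    assert(c->completions <= 1);        /* double completion would trip this */
    pthread_mutex_unlock(&c->lock);
}

/* Timeout path: may run long after the timeout was stopped. */
static void ccb_timeout(struct ccb *c)
{
    /* In the scenario above the handler would be stuck waiting here. */
    pthread_mutex_lock(&c->lock);

    /* Re-check under the lock; a stopped timeout must not touch the CCB. */
    if (c->timeout_armed && c->state == CCB_ACTIVE) {
        c->timeout_armed = false;
        c->state = CCB_DONE;
        c->completions++;
    }
    assert(c->completions <= 1);
    pthread_mutex_unlock(&c->lock);
}
```

Calling ccb_complete() followed by ccb_timeout() models the late callout: the
handler sees timeout_armed cleared and leaves the already-completed CCB alone,
so the command is completed exactly once in either order.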

I have made some commits to try to address this potential issue.  I also changed
the 'unknown state' kprintf into a panic... see if you can get a backtrace when
it occurs.  State 6 is a CCB that is on the free list, meaning that all command
processing should already have been done.  Whatever corruption is causing this
double-completion is also probably causing the list corruption.

I have noticed an occasional AHCI timeout while testing high latencies on our
48-core box.  fsstress might introduce similar issues, though they might disappear
if you update to the latest master (which you need to get the committed patches
anyway).  The issues were related to heavy MP contention on a global pmap
spinlock which has since been removed.

-Matt

----------
assignedto:  -> dillon
nosy: +dillon
priority:  -> bug

_____________________________________________________
DragonFly issue tracker <bugs@lists.dragonflybsd.org>
<http://bugs.dragonflybsd.org/issue2092>
_____________________________________________________



