DragonFly BSD
DragonFly kernel List (threaded) for 2004-01

Re: Background fsck


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 19 Jan 2004 11:04:38 -0800 (PST)

:On Mon, 19 Jan 2004 08:45:30 -0800 (PST)
:Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
:
:>     I really dislike the concept of a background fsck.  I don't trust
:>     it.
:
:On Mon, 19 Jan 2004 09:28:30 -0800 (PST)
:Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx> wrote:
:
:>     I really dislike a R/W mount on anything dirty.
:
:Matt, I can appreciate that you feel a certain way, but, but, but,
:you're not saying *why* and it's driving me bonkers.  :)
:
:-Chris

    The problem is that while it is possible to make softupdates
    algorithmically robust with regard to recovery, softupdates itself
    is such a complex beast that bugs have been found, and will continue
    to be found, which create unexpected corruption on the disk during a
    failure.

    In addition, it is very difficult to tell when and in what order
    data actually winds up on a disk, even with the dependency information
    available.  For example, let's say you are doing an atomic 64KB
    write transaction to a hard drive and a power failure occurs right smack
    in the middle of the transaction.  You might think that you would be able
    to assume that a sequential portion of your 64KB block might have made
    it to the disk, but in reality it is possible for *RANDOM* portions
    of that 64KB block to wind up on the disk because:

	* Most modern HD's do whole-track writes now, and may start writing
	  the track at any relative sector.  So it could start writing the
	  track in the middle of the 64K block, getting the last half of it
	  to disk before getting the first of it to disk.

	* If part of your block happens to reside in a spare sector (all 
	  modern disks keep a number of spare sectors on each physical track
	  to handle media errors), then the actual update on the disk could
	  be entirely random.

	* Modern HD's number sectors either backwards or forwards on the
	  actual media.  Most do it backwards now, so you can't make
	  assumptions about which portion of your larger write
	  might have gotten to the disk before other portions of your
	  larger write.

    So what does this all mean?  This means that if a power failure occurs
    right smack in the middle of a disk I/O, all of softupdates' careful
    block ordering could wind up for naught, which means that unexpected
    corruption can creep in no matter what you do.
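
    To make the point above concrete, here is a toy model of a torn 64KB
    write.  It is only a sketch: the 512-byte sector size, the function
    names, and the idea of treating the block as one wrap-around run of
    sectors are all assumptions made for illustration, not a description
    of any real drive's firmware.  It shows that once the drive may start
    anywhere in the block, or walk the sectors backwards, the sectors that
    survive a power failure need not be a leading portion of the block.

```python
SECTOR = 512                    # assumed sector size
BLOCK = 64 * 1024               # the 64KB write from the example
NSECTORS = BLOCK // SECTOR      # 128 sectors per write


def torn_write(start_sector, sectors_done, backwards=False):
    """Return the set of sector indexes (within the 64KB block) that
    reached the media before power was lost.

    start_sector: where in the block the drive happened to begin writing
    sectors_done: how many sectors were written before the failure
    backwards:    True if the media is walked in descending sector order
    """
    step = -1 if backwards else 1
    return {(start_sector + step * i) % NSECTORS for i in range(sectors_done)}


def is_prefix(written):
    """True only if the surviving sectors are exactly 0..n-1 of the block."""
    return written == set(range(len(written)))


if __name__ == "__main__":
    # The drive started in the middle and got the last half out first.
    survivors = torn_write(start_sector=64, sectors_done=64)
    print(min(survivors), max(survivors), is_prefix(survivors))
```

    Only the first case, starting at sector 0 and writing forwards, leaves
    the sequential leading portion that software tends to assume.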

    At least with a journal you can (A) replay the log starting much farther
    back than you really need to start in order to add slop and (B) you
    can serialize (add serial numbers to) individual log blocks to detect
    hardware-related out-of-order failure conditions on reboot.  If you are
    missing part of your log that's where you stop the replay.  And, (C),
    since meta-data log flushes do not have to occur at a high rate you
    can afford to force the HD to physically flush its caches between the
    journal meta-data write and the random meta-data writes.
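
    Point (B) can be sketched roughly as follows.  The log layout here
    (fixed 512-byte log blocks, each starting with a 64-bit serial number)
    and the names make_block and replayable are hypothetical, chosen only
    for illustration.  The scan walks the log from a known starting serial
    number and stops at the first block whose serial number is not the one
    expected, which is how a missing or out-of-order block would show up
    on reboot.

```python
import struct

LOG_BLOCK = 512                     # assumed log block size
HEADER = struct.Struct("<Q")        # assumed layout: 64-bit serial number first


def make_block(serial, payload):
    """Build one log block: serial number, payload, zero padding."""
    body = HEADER.pack(serial) + payload
    return body.ljust(LOG_BLOCK, b"\0")


def replayable(log, first_serial):
    """Return the serial numbers of the log blocks that are safe to replay.

    Replay stops at the first block whose serial number is not the next
    one in sequence; everything after that point is ignored.
    """
    serials = []
    expected = first_serial
    for off in range(0, len(log) - LOG_BLOCK + 1, LOG_BLOCK):
        (serial,) = HEADER.unpack_from(log, off)
        if serial != expected:
            break                   # missing or stale block: stop here
        serials.append(serial)
        expected += 1
    return serials


if __name__ == "__main__":
    # Block 9 never made it to disk; a stale block (serial 3) sits there.
    log = (make_block(7, b"a") + make_block(8, b"b") +
           make_block(3, b"stale") + make_block(10, b"d"))
    print(replayable(log, first_serial=7))
```

    Note that block 10 is discarded even though it is intact, since the
    block before it is missing.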

					-Matt
					Matthew Dillon 
					<dillon@xxxxxxxxxxxxx>


