DragonFly BSD
DragonFly kernel List (threaded) for 2004-12

Re: Description of the Journaling topology


To: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
From: Maxim Sobolev <sobomax@xxxxxxxxxxx>
Date: Fri, 31 Dec 2004 02:10:38 +0200

Matthew Dillon wrote:
:All this work on the VFS layer looks very exciting and I agree with most
:of what you've said, specially the "Solaris did it that way" comment. To
:get to the point, I have a couple of questions about this
:implementation.
:
:Where will the log reside? As a special file in /, at the end of the
:partition, in another section of the disk? I take a transparent
:migration from normal UFS to journaled UFS will be provided, at least I
:hope so :)

    The log is just a file descriptor, which means that it could represent
    a special journaling device, a pipe to a process, a regular file, and
    in particular it could represent a socket piping the journaled data
    to an off-site machine.
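
To illustrate the point that the journal target is just a descriptor, here is a minimal sketch in Python. The record format (a 4-byte length prefix followed by the payload) and the names write_journal_record and demo are assumptions made for illustration only; they are not the DragonFly journal format or API. The sketch assumes a POSIX system, where a socket descriptor can be written with plain write().

```python
import os
import socket
import tempfile


def write_journal_record(fd, payload):
    """Write one length-prefixed record to any writable descriptor.

    The framing used here is hypothetical, chosen only for the example.
    """
    record = len(payload).to_bytes(4, "big") + payload
    view = memoryview(record)
    while view:
        # os.write may write fewer bytes than requested, so loop.
        written = os.write(fd, view)
        view = view[written:]
    return len(record)


def demo():
    """Send the same record to a regular file, a pipe, and a socket."""
    payload = b'unlink "foo"'
    results = {}

    # Regular file as the journal target.
    with tempfile.TemporaryFile() as f:
        write_journal_record(f.fileno(), payload)
        f.seek(0)
        results["file"] = f.read()

    # Pipe to a (here, imaginary) consumer process.
    read_end, write_end = os.pipe()
    write_journal_record(write_end, payload)
    os.close(write_end)
    results["pipe"] = os.read(read_end, 4096)
    os.close(read_end)

    # Socket, standing in for a connection to an off-site machine.
    near, far = socket.socketpair()
    write_journal_record(near.fileno(), payload)
    near.close()
    results["socket"] = far.recv(4096)
    far.close()

    return results
```

The writer never needs to know what kind of object sits behind the descriptor; all three targets receive identical bytes.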


    The plan is to evolve this basic mechanism into a more sophisticated
    one as time passes, introducing a stream in the reverse direction to
    allow the journaling target to tell the journaling system when a
    piece of data has been physically committed to hard storage.  This
    information could in turn be fed back to a journal-aware filesystem
    but I would stress that awareness of the journal by the filesystem
    is not a requirement.  One can reap huge benefits from the journaling
    mechanism whether the filesystem is aware of it or not.
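
The reverse stream described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class JournalSender, the use of sequence numbers, and the cumulative acknowledgement scheme are all hypothetical and are not taken from the actual implementation.

```python
class JournalSender:
    """Tracks which journal records the target has confirmed as committed."""

    def __init__(self):
        self.next_seq = 1
        self.pending = {}            # seq -> record sent but not yet acknowledged
        self.committed_through = 0   # highest seq known to be on hard storage

    def send(self, record):
        """Forward direction: hand a record to the journaling target."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = record
        return seq

    def acknowledge(self, seq):
        """Reverse direction: the target reports that every record up to
        and including seq has been physically committed."""
        for done in [s for s in self.pending if s <= seq]:
            del self.pending[done]
        self.committed_through = max(self.committed_through, seq)
```

A journal-aware filesystem could consult committed_through before relying on a record, while an unaware filesystem would simply ignore it.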

I think that there is a basic synchronisation issue in such a topology. Due to buffering, delays, etc., it is possible that in some cases the filesystem will commit changes to permanent storage before the corresponding journal entry is created, i.e.:


1. App executes unlink("foo").
2. Kernel sends appropriate VOP to the filesystem and to the journal.
3. Filesystem commits metadata update, journal entry still sits somewhere in the buffer.
4. App executes open("foo", O_CREAT).
5. Kernel sends appropriate VOP to the filesystem and to the journal.
6. Journaling system commits unlink() entry to the storage.
7. Filesystem commits metadata update, machine crashes before journal entry for open() is committed.


On reboot, the kernel tries to replay the journal, and as a result the already created file foo is lost. The same situation may happen for subsequent writes and other operations: because the journal lags behind the storage, it is possible that after a failure some data already written to the storage is lost.
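
The sequence above can be walked through with a small simulation. It is a sketch under assumptions: the filesystem is modelled as a set of names, the journal as a buffered list plus a committed list, and replay as a naive re-application of committed entries to the disk state. None of these names or behaviours come from the actual implementation.

```python
def simulate_crash():
    """Replay the seven steps and return the names on disk after recovery."""
    disk = {"foo"}         # names on permanent storage; foo exists at the start
    journal_buffer = []    # journal entries still sitting in a buffer
    journal_stable = []    # journal entries committed to storage

    # Steps 1-2: unlink("foo") is sent to the filesystem and the journal.
    journal_buffer.append(("unlink", "foo"))
    # Step 3: the filesystem commits; the journal entry is still buffered.
    disk.discard("foo")

    # Steps 4-5: open("foo", O_CREAT) is sent to both.
    journal_buffer.append(("create", "foo"))
    # Step 6: the journal commits only the unlink entry.
    journal_stable.append(journal_buffer.pop(0))
    # Step 7: the filesystem commits the create, then the machine crashes
    # and the buffered create entry is lost.
    disk.add("foo")
    journal_buffer.clear()

    # Reboot: replay whatever the journal managed to commit.
    for op, name in journal_stable:
        if op == "unlink":
            disk.discard(name)
        elif op == "create":
            disk.add(name)
    return disk
```

Under these assumptions the function returns an empty set: foo was on disk at the moment of the crash, but replaying the lagging journal removes it again.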

How are you going to address this issue?

-Maxim


