DragonFly kernel List (threaded) for 2005-07
Re: VFS journaling... similar technology
:I was reading the following thread.
:I think it makes sense to point out that there are other systems that
:are linux/BSD like that have VFS messaging. Andrew Morton has adopted
:v9fs which is an implementation of the 9P filesystem protocol, done
:through messages, and hooks into the linux VFS.
:In fact, it should be possible to now use the Plan9Ports "venti"
:system which does something similar to the idea of a realtime-backup
:system. Venti is traditionally used as the block storage backend for a
:"Fossil" filesystem on Plan 9 systems. It effectively implements
:something like a "write once" repository [yes, you can dump old
:blocks] and for some people this eliminates the need for CVS, as
:people can pull old files from the log, per se, from previous snapshots
:that are taken.
:It's pretty fascinating stuff. Even if DragonFly never does 9P all of
:these things could be done through this messaging interface that you
:now have. It'd be really cool for us Inferno/Plan9 geeks to have a
:translation layer to 9P and back on DragonFly. We could instantly tie
:DragonFly into our grids. And 9P is pretty danged reliable... I can
:mount my files on a Japanese server from Seattle and the connection
:never seems to break [of course they have stability algorithms for
:Lookin good guys! It might be helpful for ideas to poke around in
:these more esoteric OSes... some of this kind of work has, in fact,
:been done before [just not exactly the same way].
I am not familiar with 9P, but Hiten and I were just talking
yesterday about implementing userland VFS.
It turns out that we are a lot closer to being able to do it than I
thought we were. The journaling code's FIFO infrastructure is already
fully capable of a generic two-way transaction-based stream between
userland and the kernel, and already solves the issue of large I/O's
(i.e. someone does a read() or write() of a gigabyte in a single call).
It is also capable of handling stream restarts (i.e. you kill and restart
the userland process), though there is one synchronization issue there
related to large transactions that I haven't solved yet.
Implementing a userland VFS based on a two-way stream is thus a very
easily reachable goal. We basically just create a VFS layer that uses
the same journaling FIFO mechanism that the journaling code currently
uses and then instead of encapsulating only the modifying ops in the
stream, we would encapsulate ALL the ops and process the return stream
to get the results.
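
As a rough illustration of "encapsulate ALL the ops and process the return
stream", here is a minimal sketch in Python. The record layout (transaction
id, opcode, payload length) and the names put_record, get_record,
userland_vfs and collect_results are all hypothetical, invented for this
example; this is not the DragonFly journal format or its API, and in-memory
buffers stand in for the FIFO.

```python
import io
import struct

# Hypothetical record header: transaction id, opcode, payload length.
# This is NOT the DragonFly journal format, just an illustration.
HEADER = struct.Struct("!IHI")
OP_WRITE, OP_READ = 1, 2  # one modifying op and one non-modifying op


def put_record(stream, txid, opcode, payload=b""):
    """Append one framed record to a byte stream."""
    stream.write(HEADER.pack(txid, opcode, len(payload)) + payload)


def get_record(stream):
    """Read one framed record, or return None at end of stream."""
    head = stream.read(HEADER.size)
    if len(head) < HEADER.size:
        return None
    txid, opcode, length = HEADER.unpack(head)
    return txid, opcode, stream.read(length)


def userland_vfs(requests, replies, files):
    """Toy userland filesystem: consume every op, answer on the return stream."""
    while (rec := get_record(requests)) is not None:
        txid, opcode, payload = rec
        # Payload convention (assumed): file name, NUL byte, then data.
        name, _, data = payload.partition(b"\0")
        if opcode == OP_WRITE:
            files[name] = data
            put_record(replies, txid, opcode, b"")
        elif opcode == OP_READ:
            put_record(replies, txid, opcode, files.get(name, b""))


def collect_results(replies):
    """Kernel side: match replies to transactions by id."""
    results = {}
    while (rec := get_record(replies)) is not None:
        txid, _, payload = rec
        results[txid] = payload
    return results


def demo():
    """Send a write and a read through the stream and return the results."""
    requests, replies = io.BytesIO(), io.BytesIO()
    put_record(requests, 1, OP_WRITE, b"motd\0hello")
    put_record(requests, 2, OP_READ, b"motd")
    requests.seek(0)
    userland_vfs(requests, replies, {})
    replies.seek(0)
    return collect_results(replies)
```

The point of the sketch is only that the read (a non-modifying op) travels
over the same framed stream as the write, and its result comes back tagged
with the transaction id so the sender can pair it with the request.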
Insofar as robustness goes, I think that is a reachable goal as well
once I solve this last little issue with restarting large transactions
(the basic problem is that a large transaction, e.g. a 1GB read or write,
is far larger than the memory FIFO the kernel uses to buffer the stream,
so the userland process must acknowledge portions of the transaction
before actually completing the transaction, which means it must store
the data somewhere and fsync it so it can transparently reconnect to
the journaling stream if it is killed and restarted).
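
To make the restart requirement concrete, here is a small Python sketch of
the userland side under some loud assumptions: CHUNK stands in for the
kernel's memory FIFO size, the spool file and the receive function are
hypothetical, and the "acknowledgement" is just a counter. It shows one
possible way to store and fsync each portion before acknowledging it, and is
not the solution to the synchronization issue mentioned above.

```python
import os
import tempfile

CHUNK = 4  # stands in for the kernel's memory FIFO size (hypothetical value)


def receive(source, spool_path, crash_after=None):
    """Consume a large transaction chunk by chunk.

    Each chunk is appended to a spool file and fsync'd before it is
    acknowledged, so the spool size can serve as the restart offset.
    Returns the number of bytes acknowledged so far.
    """
    # On (re)start, resume from whatever was already made durable.
    acked = os.path.getsize(spool_path) if os.path.exists(spool_path) else 0
    chunks_done = 0
    with open(spool_path, "ab") as spool:
        while acked < len(source):
            if crash_after is not None and chunks_done == crash_after:
                return acked  # simulate the process being killed
            chunk = source[acked:acked + CHUNK]
            spool.write(chunk)
            spool.flush()
            os.fsync(spool.fileno())  # data is on disk before we ack
            acked += len(chunk)       # "ack": the sender may drop this part
            chunks_done += 1
    return acked


def demo():
    """Kill the receiver after one chunk, restart it, and check the result."""
    path = os.path.join(tempfile.mkdtemp(), "spool")
    data = b"0123456789"          # pretend this is the 1GB write
    first = receive(data, path, crash_after=1)
    second = receive(data, path)  # restarted process picks up where it left off
    with open(path, "rb") as f:
        return first, second, f.read() == data
```

The ordering is what matters: write, fsync, then acknowledge. If the
acknowledgement went out first, a kill between the ack and the fsync would
lose data that the kernel had already dropped from its FIFO.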