DragonFly kernel List (threaded) for 2007-02

Re: Initial filesystem design synopsis.

From: "Thomas E. Spanjaard" <tgen@xxxxxxxxxxxxx>
Date: Wed, 21 Feb 2007 23:47:41 +0000

Matthew Dillon wrote:

    The physical storage backing a filesystem is broken up into large
    1MB-4GB segments (64MB is a typical value).  Each segment is
    self-identifying and contains its own header, data table, and record
    table.  The operating system glues together filesystems and determines
    availability based on the segments it finds.

I think the more common term for this kind of thing is 'allocation group'.

    - The data table consists of pure data, laid out linearly in the forward
      direction within the segment.   Data blocks are variable-sized entities
      containing pure data, with no other identifying information, suitable
      for direct DMA.  The segment header has a simple append index for
      the data table.

And 'extent' for the variable-sized entities :).

    - The record table consists of fixed-sized records and a reference to
      data in the data table.  The record table is built backwards from
      the end of the segment.

Doesn't this prepending incur a significant performance penalty for operations that walk the record table in chronological or otherwise 'fifo' order?
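To make the layout in question concrete, here is a minimal sketch of one segment, assuming a toy in-memory model. The class name `Segment`, the slot sizes, and the dict used to stand in for on-disk records are all illustrative assumptions, not anything from the proposal. It shows that the oldest record ends up at the highest offset, so a chronological walk reads the record table in descending address order.

```python
class Segment:
    """Toy model of one segment; names and sizes are assumptions."""

    def __init__(self, size=64, record_size=4):
        self.size = size                # total slots in the segment (toy value)
        self.record_size = record_size  # fixed record size in slots (assumed)
        self.data_append = 0            # data table append index, grows forward
        self.record_prepend = size      # record table prepend index, grows backward
        self.records = {}               # record offset -> (data_off, data_len)

    def add(self, data_len):
        """Append data forward and prepend its record; return the record offset."""
        new_prepend = self.record_prepend - self.record_size
        if self.data_append + data_len > new_prepend:
            raise MemoryError("segment full")
        data_off = self.data_append
        self.data_append += data_len
        self.record_prepend = new_prepend
        self.records[new_prepend] = (data_off, data_len)
        return new_prepend

    def chronological(self):
        """Yield records oldest first, walking from the segment end downward."""
        off = self.size - self.record_size
        while off >= self.record_prepend:
            if off in self.records:
                yield off, self.records[off]
            off -= self.record_size


seg = Segment()
first = seg.add(10)
second = seg.add(5)
order = list(seg.chronological())
```

In this model the first record lands at offset 60 and the second at 56, while their data sits at offsets 0 and 10. Whether the descending walk actually costs anything on real hardware is exactly the open question; the sketch only shows the access pattern.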

    Record destruction creates holes in both the data table and the record
    table.  Any holes adjacent to the data table append point or the record
    table prepend point are immediately recovered by adjusting the
    appropriate indices in the segment header.  The operating system may
    cache a record of non-adjacent holes (in memory) and reuse the space,
    and can also generate an in-memory index of available holes on the
    fly when space is very tight (which requires scanning the record table),
    but otherwise the recovery of any space not adjacent to the data table
    append point requires a performance reorganization of the segment.

I think these lists/trees should be kept sorted, at least on-disk, for performance reasons (random reads/writes on rotational media are a bummer given current seek times).
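As a rough illustration of the hole handling quoted above, here is a sketch for the data table side only. The `HoleTracker` name and the sorted `(offset, length)` list are assumptions made for the example; it is the in-memory cache, not an on-disk structure. A hole that touches the append point is recovered by moving the index back; anything else goes into a list kept sorted with `bisect`. The sketch also assumes that a cached hole which becomes adjacent to the append point is recovered in the same step.

```python
import bisect


class HoleTracker:
    """Toy free-space tracker for one forward-growing table (assumed model)."""

    def __init__(self, append_point):
        self.append_point = append_point  # first unused offset past live data
        self.holes = []                   # sorted (offset, length), non-adjacent holes

    def free(self, offset, length):
        """Record that [offset, offset + length) no longer holds live data."""
        if offset + length == self.append_point:
            # Hole touches the append point: recover it by adjusting the index.
            self.append_point = offset
            # Assumption: cached holes that now touch the append point
            # are recovered as well, highest offset first.
            while self.holes and sum(self.holes[-1]) == self.append_point:
                self.append_point = self.holes.pop()[0]
        else:
            # Non-adjacent hole: cache it, keeping the list sorted by offset.
            bisect.insort(self.holes, (offset, length))


tracker = HoleTracker(append_point=100)
tracker.free(40, 10)   # non-adjacent, cached
tracker.free(80, 10)   # non-adjacent, cached
tracker.free(90, 10)   # adjacent, recovered, and pulls in the hole at 80
```

The record table would be the mirror image, with the prepend point moving upward instead. The sketch does not merge neighbouring holes with each other or reuse them for new allocations.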

Generally, I can't help but feel that the clustering/replication stuff needs to be separate from the 'actual on-disk' filesystem.

        Thomas E. Spanjaard

