DragonFly kernel List (threaded) for 2008-06
DragonFly BSD
DragonFly kernel List (threaded) for 2008-06
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]

Re: HAMMER Update 13-June-2008 - HEADS UP: MEDIA CHANGED AGAIN


From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Jun 2008 14:37:36 -0700 (PDT)

:<dillon@apollo.backplane.com> wrote:
:>    I am not too familiar with Reiser so I can't really come to that
:>    conclusion, but it doesn't seem likely that the reasons are similar.
:
:They used B+ in earlier versions. In Reiser4 they decided to use
:
:http://en.wikipedia.org/wiki/B*-tree
:
:and
:
:http://en.wikipedia.org/wiki/Dancing_trees
:
:-- 
:Dan

    I can never keep all the variations straight, I need a Wiki reference
    every time to get the name right :-).

    These are all minor variations of B+Trees.  The B* variation keeps
    nodes 2/3rds full instead of half full.  It is an optimization which
    is intended to maintain higher fill ratios to make better use of system
    caches.

    A Dancing tree ... hehehehe.  Looks like Hans invented that one.  I'm
    not sure it even counts as a variation of a B+Tree.  It doesn't apply
    to HAMMER at all because HAMMER makes no modifications to the B-Tree
    whatsoever on the front-end.  When you create, delete, rename, write,
    etc...  when you do those operations HAMMER caches them in a
    virtualization layer in memory and doesn't make any modifications to
    its on-media data structures (or their in-memory representations) at
    all until the meta-data is synced to disk.

    HAMMER doesn't have to make real-time on-media modifications for any
    meta-data change for ANY operation.  Not a single one.  Not rename, not
    hard or soft links, not truncations, nothing.

    HAMMER uses a modified B+Tree for its on-disk representation, which is
    a B-Tree with only keys at internal nodes and only records at the leafs.
    This was done to reduce structural bloat, allow for a leaf->leaf
    linking optimization in the future, and for other reasons.  The Wiki
    definition isn't quite right.  The leaf->leaf links are optional (and
    HAMMER doesn't implement them, though I've left room to do so).  It's
    still a B+Tree if you don't have them.

    HAMMER's custom modification is to formalize both a left AND a right
    bounding element.  So a HAMMER B+Tree looks like this:

	B B B B B B	<---- internal node  (e.g. 6 elements w/ 5 leaf ptrs)
	 L L L L L	<---- leaf nodes

    Whereas a normal B+Tree looks like this (note the missing right bound):

	B B B B B  	<---- internal node
	 L L L L L	<---- leaf nodes

    Note that HAMMER uses a radix of 64, so e.g. 64 internal elements but one
    is a right-bound so only 63 pointers to the next level.  The leafs then
    have 64 elements.  So HAMMER's B+Tree recursion is 63x63x63....x64.

    HAMMER's internal nodes have a left and right bounding element.  A 
    standard B+Tree only has a left bounding element.  By adding a right
    bounding element HAMMER can cache pointers into its B+Tree and 'pick up'
    searches, insertions, and deletions relative to the cached pointers instead
    of having to start at the root of the tree.  More importantly, it can 
    pickup searches, insertions, and deletions at internal nodes, not just
    leaf nodes.  So I can cache a proximity pointer and if I do a good job
    I never have to traverse the B+Tree above that point.

    Without the right bound a search cannot be picked up from mid-tree
    without potentially going down the wrong branch and then having to 
    backtrack.

    --

    Many of the textbook optimizations listed on e.g. Wiki have serious
    issues... you can't make the blanket statement "method A is better
    then method B".  Balancing optimizations can cause massive issues with
    deadlocks, for example, or a need to lock extra nodes, or a need to
    traverse a greater portion of the tree.  In fact, for that reason, a
    lot of people don't even try to balance B-Tree fill ratios in the
    critical path, and neither does HAMMER.  Putting in better balancing and
    fill-ratio algorithms are something I'll probably do in the pruner or
    reblocker at some point, but never in the critical path.

    I've left the implementation open to additional optimizations, such
    as the leaf-linking optimization and better balancing optimizations
    such as you get with B*.  There are plenty of others as well, and in fact
    WHEN you do the optimizations is often more important then WHAT
    optimizations you choose to do.

    HAMMER does a bunch of other stuff as well.  But HAMMER's main
    optimization is the addition of the right bounding element and its
    ability to cache pointers into the middle of the B+Tree to short-cut
    searches.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>



[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]