DragonFly BSD
From: Matthew Dillon <dillon@apollo.backplane.com>
Subject: Re: VKernel progress update - 9 Jan 2006
Date: Thu, 11 Jan 2007 17:40:44 -0800 (PST)
:On Thu, Jan 11, 2007 at 04:53:52PM -0800, Bill Huey wrote:
:> This is more recent.
:> 	http://user-mode-linux.sourceforge.net/slides/lwe2005/img0.html
:Jeff Dike's modifications for address space handling in the host kernel.
:	http://user-mode-linux.sourceforge.net/skas.html

    This is more interesting.  It looks like SKAS is a lot closer to
    what I have been doing, though it is still unclear to me what they
    are using to manage the multiple emulated user VM spaces.  In DragonFly
    I built direct VM space support into the (real) kernel and the virtual
    kernel can simply manipulate them however it wishes, and just tells
    the real kernel to switch into one.

    The original UVM used one real-kernel process for each UVM emulated
    process and had some fairly serious security issues related to the
    visibility of the UVM kernel to said processes.

    The new SKAS stuff is using two primary processes under the real kernel,
    one running the UVM kernel, and one running whichever emulated user
    process the UVM kernel wants to run.

    In DragonFly there is just one user process and N VM spaces and the
    virtual kernel simply tells the real kernel which VM space to run in
    that process.  Some of the Linux slides and comments imply that there
    is less overhead splitting that into two separate processes, but I
    don't see how that can be the case since VM space swapping is exactly
    the same as context switching.  Signals within the virtual kernel
    and emulated virtual user process are entirely handled by the virtual
    kernel.  Any real-kernel signal simply causes the real kernel to swap
    the virtual kernel's VM space back in and then delivers the signal.
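
    The "one process, N VM spaces" arrangement described above can be
    pictured with a small toy model.  This is only a sketch of the
    bookkeeping, not the DragonFly interface: every name here
    (host_process, host_vmspace_create, host_vmspace_run, host_signal,
    VKERNEL_SPACE, MAX_VMSPACES) is hypothetical and nothing in it
    touches a real MMU or a real system call.

```c
#define MAX_VMSPACES  8   /* arbitrary cap for this sketch */
#define VKERNEL_SPACE 0   /* hypothetical id for the virtual kernel's own VM space */

/* Toy stand-in for the single real-kernel process hosting the virtual kernel. */
struct host_process {
    int nspaces;         /* emulated user VM spaces created so far (ids 1..nspaces) */
    int current;         /* VM space the process is running in right now */
    int pending_signal;  /* last signal handed to the virtual kernel, 0 if none */
};

void host_init(struct host_process *p)
{
    p->nspaces = 0;
    p->current = VKERNEL_SPACE;
    p->pending_signal = 0;
}

/* Create a new emulated user VM space; returns its id, or -1 if the table is full. */
int host_vmspace_create(struct host_process *p)
{
    if (p->nspaces >= MAX_VMSPACES)
        return -1;
    return ++p->nspaces;
}

/* The virtual kernel tells the real kernel which VM space to run; 0 on success. */
int host_vmspace_run(struct host_process *p, int id)
{
    if (id < 1 || id > p->nspaces)
        return -1;
    p->current = id;
    return 0;
}

/* A real-kernel signal swaps the virtual kernel's VM space back in, then delivers. */
void host_signal(struct host_process *p, int signo)
{
    p->current = VKERNEL_SPACE;
    p->pending_signal = signo;
}
```

    In this model a signal arriving while an emulated user space is
    running always leaves the process back in VKERNEL_SPACE, which is
    the property the paragraph above relies on: the emulated user
    process never sees a real-kernel signal directly.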

    Dealing with VM/CPU contexts is nasty as hell, no matter how you twist
    it.  Switching VM spaces requires reloading the MMU page directory
    (%cr3), swapping out the register frame, swapping out the TLS (three
    descriptors in the CPU's GDT), and the LDT table.  If floating point
    is involved we can at least lazy swap the FP registers, but it is still
    mad expensive.

    The memory mapping is another big issue.  The real kernel is managing
    the VM spaces which means that it is also managing the PMAPs.  On BSD
    systems (including ours), PMAPs are always throw-away entities so the
    real kernel overhead is almost nonexistent.  Since the DragonFly
    virtual kernel manipulates the entire emulated user VM space with a
    single real-kernel mmap() using MAP_VPAGETABLE to slave the
    emulated user VM space to the virtual kernel's 'memory', the vm_map
    overhead in the real kernel is almost nonexistent.

    Oooh, my buildworld finished!

    vkernel# /usr/bin/time make -j 8 buildworld
	 3748.21 real         0.00 user      3337.10 sys

	 2071.23 real      2621.60 user       331.64 sys

    (the user time is zero in the virtual kernel only because I haven't
    fixed up the real/user/sys context test in the virtual kernel build)

    That isn't bad considering that all the I/O is synchronous and I haven't
    made any effort to optimize anything yet.
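
    To put the two wall-clock figures above in proportion, the
    arithmetic can be written out as a small sketch.  It assumes the
    second, unlabelled timing line (2071.23 real) is the same
    buildworld run directly on the real kernel; the function and
    parameter names are made up for illustration.

```c
/* Ratio of virtual-kernel to real-kernel wall-clock time (> 1.0 means slower). */
double slowdown_ratio(double vkernel_real_s, double host_real_s)
{
    return vkernel_real_s / host_real_s;
}

/* Extra wall-clock time as a percentage of the real-kernel run. */
double overhead_percent(double vkernel_real_s, double host_real_s)
{
    return (slowdown_ratio(vkernel_real_s, host_real_s) - 1.0) * 100.0;
}
```

    With the figures from the post, slowdown_ratio(3748.21, 2071.23)
    comes out at roughly 1.81, so under that assumption the build in
    the virtual kernel took about 81% more wall-clock time.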

					Matthew Dillon 

