VKERNEL(7)        DragonFly Miscellaneous Information Manual        VKERNEL(7)

NAME
     vkernel, vcd, vkd, vke - virtual kernel architecture

SYNOPSIS
     platform vkernel64 # for 64 bit vkernels
     device vcd
     device vkd
     device vke

     /var/vkernel/boot/kernel/kernel [-hstUvz] [-c file] [-e
     name=value:name=value:...] [-i file] [-I
     interface[:address1[:address2][/netmask][=mac]]] [-l cpulock] [-m size]
     [-n numcpus[:lbits[:cbits]]] [-p pidfile] [-r file[:serno]] [-R
     file[:serno]]

DESCRIPTION
     The vkernel architecture allows for running DragonFly kernels in
     userland.

     The following options are available:

     -c file        Specify a readonly CD-ROM image file to be used by the
                    kernel, with the first -c option defining vcd0, the second
                    one vcd1, and so on.  The first -r, -R, or -c option
                    specified on the command line will be the boot disk.  The
                    CD9660 filesystem is assumed when booting from this media.

     -e name=value:name=value:...
                    Specify an environment to be used by the kernel.  This
                    option can be specified more than once.

     -h             Shows a list of available options, each with a short
                    description.

     -i file        Specify a memory image file to be used by the virtual
                    kernel.  If no -i option is given, the kernel will
                    generate a name of the form /var/vkernel/memimg.XXXXXX,
                    with the trailing `Xs' being replaced by a sequential
                    number, e.g. memimg.000001.

     -I interface[:address1[:address2][/netmask][=MAC]]
                    Create a virtual network device, with the first -I option
                    defining vke0, the second one vke1, and so on.

                    The interface argument is the name of a tap(4) device node
                    or the path to a vknetd(8) socket.  The /dev/ path prefix
                    does not have to be specified and will be automatically
                    prepended for a device node.  Specifying auto will pick
                    the first unused tap(4) device.

                    The address1 and address2 arguments are the IP addresses
                    of the tap(4) and vke interfaces.  Optionally, address1
                    may be of the form bridgeX in which case the tap(4)
                    interface is added to the specified bridge(4) interface.
                    The vke address is not assigned until the interface is
                    brought up in the guest.

                    The netmask argument applies to all interfaces for which
                    an address is specified.

                    The MAC argument is the MAC address of the vke(4)
                    interface.  If not specified, a pseudo-random one will be
                    generated.

                    When running multiple vkernels it is often more convenient
                    to simply connect to a vknetd(8) socket and let vknetd
                    deal with the tap and/or bridge.  An example of this would
                    be /var/run/vknet:0.0.0.0:10.2.0.2/16.

     -l cpulock     Specify which, if any, real CPUs to lock virtual CPUs to.
                    cpulock is one of any, map[,startCPU], or CPU.

                    any does not map virtual CPUs to real CPUs.  This is the
                    default.

                    map[,startCPU] maps each virtual CPU to a real CPU
                    starting with real CPU 0 or startCPU if specified.

                    CPU locks all virtual CPUs to the real CPU specified by
                    CPU.

                    Locking the vkernel to a set of CPUs is recommended on
                    multi-socket systems to improve NUMA locality of
                    reference.

     -m size        Specify the amount of memory to be used by the kernel in
                    bytes, K (kilobytes), M (megabytes) or G (gigabytes).
                    Lowercase versions of K, M, and G are allowed.

     -n numcpus[:lbits[:cbits]]
                    numcpus specifies the number of CPUs you wish to emulate.
                    Up to 16 CPUs are supported; the default is 2.

                    lbits specifies the number of bits within APICID(=CPUID)
                    needed for representing the logical ID.  Controls the
                    number of threads/core (0 bits - 1 thread, 1 bit - 2
                    threads).  This parameter is optional (mandatory only if
                    cbits is specified).

                    cbits specifies the number of bits within APICID(=CPUID)
                    needed for representing the core ID.  Controls the number
                    of core/package (0 bits - 1 core, 1 bit - 2 cores).  This
                    parameter is optional.

     -p pidfile     Specify a pidfile in which to store the process ID.
                    Scripts can use this file to locate the vkernel pid for
                    the purpose of shutting down or killing it.

                    The vkernel will hold a lock on the pidfile while running.
                    Scripts may test for the lock to determine if the pidfile
                    is valid or stale so as to avoid accidentally killing a
                    random process.  Something like '/usr/bin/lockf -ks -t 0
                    pidfile echo -n' may be used to test the lock.  A non-zero
                    exit code indicates that the pidfile represents a running
                    vkernel.

                    An error is issued and the vkernel exits if this file
                    cannot be opened for writing or if it is already locked by
                    an active vkernel process.

     -r file[:serno]
                    Specify a R/W disk image file to be used by the kernel,
                    with the first -r option defining vkd0, the second one
                    vkd1, and so on.  A serial number for the virtual disk can
                    be specified in serno.

                    The first -r, -R, or -c option specified on the command
                    line will be the boot disk.

     -R file[:serno]
                    Works like -r but treats the disk image as copy-on-write.
                    This allows a private copy of the image to be modified but
                    does not modify the image file.  The image file will not
                    be locked in this situation and multiple vkernels can run
                    off the same image file if desired.

                    Since modifications are thrown away, any data you wish to
                    retain across invocations needs to be exported over the
                    network prior to shutdown.  This gives you the flexibility
                    to mount the disk image either read-only or read-write
                    depending on what is convenient.  However, keep in mind
                    that when mounting a COW image read-write, modifications
                    will eat system memory and swap space until the vkernel is
                    shut down.

     -s             Boot into single-user mode.

     -t             Tell the vkernel to use a precise host timer when
                    calculating clock values.  If the TSC isn't used, this
                    will impose higher overhead on the vkernel as it will have
                    to make a system call to the real host every time it wants
                    to get the time.  However, the more precise timer might be
                    necessary for your application.

                    By default, the vkernel uses the TSC cpu timer if
                    possible, or an imprecise (host-tick-resolution) timer
                    which uses a user-mapped kernel page and does not have any
                    syscall overhead.  To disable the TSC cpu timer, use the
                    -e hw.tsc_cputimer_enable=0 flag.

     -U             Enable writing to kernel memory and module loading.  By
                    default, those are disabled for security reasons.

     -v             Turn on verbose booting.

     -z             Force the vkernel's RAM to be pre-zeroed.  Useful for
                    benchmarking on single-socket systems where the memory
                    allocation does not have to be NUMA-friendly.  This
                    option is not recommended on multi-socket systems or when
                    the -l option is used.
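
     As a rough illustration of the argument syntax described above, the
     following sh(1) sketch expands a -m size, derives the topology implied
     by the lbits and cbits fields of -n, and assembles a -I argument.  The
     function names are hypothetical helpers, not part of the vkernel, and
     the 1024-based multipliers are an assumption about how the size
     suffixes are interpreted.

```shell
#!/bin/sh
# Hypothetical helpers; none of these functions ship with DragonFly.

# vk_size_to_bytes SIZE
# Expand a -m style size: plain bytes, or a K/M/G suffix in either case.
# Assumption: the suffixes are binary multiples (1K = 1024 bytes).
vk_size_to_bytes() {
    num=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo $((num)) ;;
    esac
}

# vk_topology LBITS CBITS
# Print "threads-per-core cores-per-package" for the bit widths given to -n.
# Each count is 2 raised to the number of bits (0 bits - 1, 1 bit - 2).
vk_topology() {
    echo "$((1 << $1)) $((1 << $2))"
}

# vk_ifspec INTERFACE [ADDRESS1 [ADDRESS2 [NETMASK [MAC]]]]
# Assemble interface[:address1[:address2][/netmask][=MAC]] for -I.
# Pass "" to skip an optional field; nothing is appended without ADDRESS1.
vk_ifspec() {
    spec=$1
    if [ -n "${2:-}" ]; then
        spec="$spec:$2"
        if [ -n "${3:-}" ]; then spec="$spec:$3"; fi
        if [ -n "${4:-}" ]; then spec="$spec/$4"; fi
        if [ -n "${5:-}" ]; then spec="$spec=$5"; fi
    fi
    echo "$spec"
}
```

     For instance, `vk_ifspec /var/run/vknet 0.0.0.0 10.2.0.2 16' reproduces
     the vknetd(8) argument shown under the -I option.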

DEVICES
     A number of virtual device drivers exist to supplement the virtual
     kernel.

   Disk device
     The vkd driver allows for up to 16 vn(4) based disk devices.  The root
     device will be vkd0 (see EXAMPLES for further information on how to
     prepare a root image).

   CD-ROM device
     The vcd driver allows for up to 16 virtual CD-ROM devices.  Each one is
     essentially a read-only vkd device with a block size of 2048 bytes.

   Network interface
     The vke driver supports up to 16 virtual network interfaces which are
     associated with tap(4) devices on the host.  For each vke device, the
     per-interface read only sysctl(3) variable hw.vkeX.tap_unit holds the
     unit number of the associated tap(4) device.

     By default, half of the available mbuf clusters are distributed equally
     among all the vke devices, up to 256.  This can be overridden with the
     tunable hw.vke.max_ringsize.  Note that the value passed is rounded down
     to a power of two.
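
     The rounding applied to hw.vke.max_ringsize can be pictured with the
     small sh(1) sketch below.  The function name is hypothetical and the
     code only models the arithmetic, on the assumption that the value is
     rounded down to the largest power of two that does not exceed it; it is
     not the driver's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper illustrating rounding down to a power of two.
# pow2_floor N: print the largest power of two that is <= N (N >= 1).
pow2_floor() {
    n=$1
    p=1
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

pow2_floor 300    # prints 256
pow2_floor 256    # prints 256
```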

SIGNALS
     The virtual kernel only enables SIGQUIT and SIGTERM while operating in
     regular console mode.  Sending `^\' (SIGQUIT) to the virtual kernel
     causes the virtual kernel to enter its internal ddb(4) debugger and re-
     enable all other terminal signals.  Sending SIGTERM to the virtual kernel
     triggers a clean shutdown by passing a SIGUSR2 to the virtual kernel's
     init(8) process.
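
     Combining the SIGTERM behaviour with the pidfile lock described under
     the -p option, a shutdown script might look like the following sh(1)
     sketch.  The function names and the LOCKF override variable are
     hypothetical; only the lockf(1) invocation itself is taken from this
     manual page.

```shell
#!/bin/sh
# Hypothetical helpers for stopping a vkernel that was started with -p.
# LOCKF may be overridden (e.g. for testing); the default path is the one
# quoted in the description of the -p option.
LOCKF="${LOCKF:-/usr/bin/lockf}"

# vkernel_running PIDFILE
# Returns 0 if the pidfile is locked (a vkernel is running), 1 if the
# pidfile is missing or stale, 2 if lockf itself cannot be executed.
vkernel_running() {
    pidfile=$1
    [ -x "$LOCKF" ] || { echo "cannot execute $LOCKF" >&2; return 2; }
    [ -f "$pidfile" ] || return 1
    # lockf exits 0 only if it could take the lock, meaning nobody holds it.
    if "$LOCKF" -ks -t 0 "$pidfile" echo -n >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# vkernel_stop PIDFILE
# Send SIGTERM (clean shutdown) only when the pidfile is known to be valid.
vkernel_stop() {
    pidfile=$1
    if vkernel_running "$pidfile"; then
        kill -TERM "$(cat "$pidfile")"
    else
        echo "not signalling: $pidfile is missing, stale, or untestable" >&2
        return 1
    fi
}
```

     Checking the lock before signalling avoids killing an unrelated process
     that happens to have reused the PID recorded in a stale pidfile.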

DEBUGGING
     It is possible to attach gdb(1) directly to the virtual kernel's process.
     It is recommended that you do a `handle SIGSEGV noprint' to ignore page
     faults processed by the virtual kernel itself and `handle SIGUSR1
     noprint' to ignore signals used for simulating inter-processor
     interrupts.
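
     One way to avoid retyping these commands is to keep them in a gdb
     command file.  The sh(1) sketch below writes such a file; the function
     name and the file paths in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the two gdb commands recommended above.
vk_gdb_commands() {
    cat <<'EOF'
handle SIGSEGV noprint
handle SIGUSR1 noprint
EOF
}

# Example use, assuming the vkernel was started with
# -p /var/run/vkernel.pid (a hypothetical path):
#   vk_gdb_commands > /tmp/vkernel.gdb
#   gdb -p "$(cat /var/run/vkernel.pid)" -x /tmp/vkernel.gdb
```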

FILES
     /dev/vcdX                     vcd device nodes
     /dev/vkdX                     vkd device nodes
     /sys/config/VKERNEL64         vkernel configuration file, for config(8)

CONFIGURATION FILES
     Your virtual kernel is a complete DragonFly system, but you might not
     want to run all the services a normal kernel runs.  Here is what a
     typical virtual kernel's /etc/rc.conf file looks like, with some
     additional possibilities commented out.

     hostname="vkernel"
     network_interfaces="lo0 vke0"
     ifconfig_vke0="DHCP"
     sendmail_enable="NO"
     #syslog_enable="NO"
     blanktime="NO"

BOOT DRIVE SELECTION
     You can override the default boot drive selection and filesystem using a
     kernel environment variable.  Note that the filesystem selected must be
     compiled into the vkernel and not loaded as a module.  Because the -e
     option uses colons to separate name=value pairs, the variable data must
     be enclosed in escaped quotes so that a colon inside it is not
     misinterpreted.  For example:

     -e vfs.root.mountfrom=\"hammer:vkd0s1d\"
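
     The backslashes are consumed by the invoking shell, so the vkernel
     receives the double quotes as part of the value.  The sh(1) sketch
     below shows that backslash-escaping and single-quoting produce the same
     argument; the variable names are only for illustration.

```shell
#!/bin/sh
# Two spellings of the same argument.  Both hand the vkernel the literal
# text vfs.root.mountfrom="hammer:vkd0s1d", double quotes included.
escaped=vfs.root.mountfrom=\"hammer:vkd0s1d\"
quoted='vfs.root.mountfrom="hammer:vkd0s1d"'

printf '%s\n' "$escaped"
if [ "$escaped" = "$quoted" ]; then
    echo "identical"
fi
```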

DISKLESS OPERATION
     To boot a vkernel from an NFS root, a number of tunables need to be set:

     boot.netif.ip
             IP address to be set in the vkernel interface.

     boot.netif.netmask
             Netmask for the IP to be set.

     boot.netif.name
             Network interface name inside the vkernel.

     boot.nfsroot.server
             Host running nfsd(8).

     boot.nfsroot.path
             Host path where a world and distribution targets are properly
             installed.

     See the EXAMPLES section for an example of booting a diskless vkernel.

EXAMPLES
     A couple of steps are necessary in order to prepare the system to build
     and run a virtual kernel.

   Setting up the filesystem
     The vkernel architecture needs a number of files which reside in
     /var/vkernel.  Since these files tend to get rather big and the /var
     partition is usually of limited size, we recommend creating the
     directory in the /home partition with a link to it in /var:

     mkdir -p /home/var.vkernel/boot
     ln -s /home/var.vkernel /var/vkernel

     Next, a filesystem image to be used by the virtual kernel has to be
     created and populated (assuming world has been built previously).  If the
     image is created on a UFS filesystem you might want to pre-zero it.  On a
     HAMMER filesystem you should just truncate-extend to the image size as
     HAMMER does not re-use data blocks already present in the file.

     vnconfig -c -S 2g -T vn0 /var/vkernel/rootimg.01
     disklabel -r -w vn0s0 auto
     disklabel -e vn0s0      # add `a' partition with fstype `4.2BSD'
     newfs /dev/vn0s0a
     mount /dev/vn0s0a /mnt
     cd /usr/src
     make installworld DESTDIR=/mnt
     cd etc
     make distribution DESTDIR=/mnt
     echo '/dev/vkd0s0a      /       ufs     rw      1  1' >/mnt/etc/fstab
     echo 'proc              /proc   procfs  rw      0  0' >>/mnt/etc/fstab

     Edit /mnt/etc/ttys and replace the console entry with the following line
     and turn off all other gettys.

     console "/usr/libexec/getty Pc"         cons25  on  secure

     Replace Pc with al.Pc if you would like to automatically log in as root.

     Then, unmount the disk.

     umount /mnt
     vnconfig -u vn0

   Compiling the virtual kernel
     In order to compile a virtual kernel use the VKERNEL64 kernel
     configuration file residing in /sys/config (or a configuration file
     derived from it):

     cd /usr/src
     make -DNO_MODULES buildkernel KERNCONF=VKERNEL64
     make -DNO_MODULES installkernel KERNCONF=VKERNEL64 DESTDIR=/var/vkernel

   Enabling virtual kernel operation
     A special sysctl(8), vm.vkernel_enable, must be set to enable vkernel
     operation:

     sysctl vm.vkernel_enable=1
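
     A value set with sysctl(8) lasts only until the next reboot.  To keep
     it, the setting can be recorded in sysctl.conf(5).  The sh(1) sketch
     below appends the line only if it is not already present; the function
     name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add "vm.vkernel_enable=1" to a sysctl.conf(5)-style
# file unless that exact line is already there.
# The file defaults to /etc/sysctl.conf; pass another path to try it safely.
persist_vkernel_enable() {
    conf=${1:-/etc/sysctl.conf}
    line='vm.vkernel_enable=1'
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "$line" >> "$conf"
    fi
}
```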

   Configuring the network on the host system
     In order to access a network interface of the host system from the
     vkernel, you must add the interface to a bridge(4) device which will then
     be passed to the -I option:

     kldload if_bridge.ko
     kldload if_tap.ko
     ifconfig bridge0 create
     ifconfig bridge0 addm re0       # assuming re0 is the host's interface
     ifconfig bridge0 up

   Running the kernel
     Finally, the virtual kernel can be run:

     cd /var/vkernel
     ./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0

     You can issue the reboot(8), halt(8), or shutdown(8) commands from inside
     a virtual kernel.  After doing a clean shutdown the reboot(8) command
     will re-exec the virtual kernel binary while the other two will cause the
     virtual kernel to exit.

   Diskless operation (vkernel as an NFS client)
     The following boots a vkernel with a vknetd(8) network configuration.
     For convenience and to reduce confusion, it is recommended to mount the
     server's remote vkernel root onto the host running the vkernel binary
     using the same path as the NFS mount.  It is assumed that a full system
     install has been made to /var/vkernel/root, with the kernel built using
     KERNCONF=VKERNEL64.

     /var/vkernel/root/boot/kernel/kernel \
             -m 1g -n 4 -I /var/run/vknet \
             -e boot.netif.ip=10.100.0.2 \
             -e boot.netif.netmask=255.255.0.0 \
             -e boot.netif.gateway=10.100.0.1 \
             -e boot.netif.name=vke0 \
             -e boot.nfsroot.server=10.0.0.55 \
             -e boot.nfsroot.path=/var/vkernel/root

     In this example vknetd is assumed to have been started as shown below,
     before running the vkernel, using an unbridged TAP configuration routed
     through the host.  IP forwarding must be turned on, and in this example
     the server resides on a different network accessible to the host
     executing the vkernel but not directly on the vkernel's subnet.

     kldload if_tap
     sysctl net.inet.ip.forwarding=1
     vknetd -t tap0 10.100.0.1/16

     You can run multiple vkernels trivially with the same NFS root as long as
     you assign each one a different IP on the subnet (.2, .3, .4, etc.).  You
     should also be careful with certain directories, particularly /var/run
     and possibly also /var/db depending on what your vkernels are going to be
     doing.  Sharing /var/db can complicate matters with /var/db/pkg.
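
     As a sketch of the per-instance addressing, the sh(1) fragment below
     prints one command line per vkernel, reusing the addresses from the
     example above and varying only the last octet of boot.netif.ip.  The
     function name is hypothetical, and the commands are printed rather than
     executed.

```shell
#!/bin/sh
# Hypothetical helper: print the diskless command line for instance N.
# All values other than the last octet of boot.netif.ip are copied from
# the diskless example in this manual page.
diskless_cmd() {
    n=$1
    printf '%s' /var/vkernel/root/boot/kernel/kernel
    printf ' %s' -m 1g -n 4 -I /var/run/vknet \
        -e "boot.netif.ip=10.100.0.$n" \
        -e boot.netif.netmask=255.255.0.0 \
        -e boot.netif.gateway=10.100.0.1 \
        -e boot.netif.name=vke0 \
        -e boot.nfsroot.server=10.0.0.55 \
        -e boot.nfsroot.path=/var/vkernel/root
    printf '\n'
}

# Print the command lines for three instances (.2, .3 and .4).
for n in 2 3 4; do
    diskless_cmd "$n"
done
```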

BUILDING THE WORLD UNDER A VKERNEL
     The virtual kernel platform does not have all the header files expected
     by a world build, so the easiest thing to do for now is to specify the
     pc64 platform (in a 64-bit vkernel) when building the world under a
     virtual kernel, like this:

     vkernel# make MACHINE_PLATFORM=pc64 buildworld
     vkernel# make MACHINE_PLATFORM=pc64 installworld

SEE ALSO
     vknet(1), bridge(4), ifmedia(4), tap(4), vn(4), sysctl.conf(5), build(7),
     config(8), disklabel(8), ifconfig(8), vknetd(8), vnconfig(8)

     Aggelos Economopoulos, A Peek at the DragonFly Virtual Kernel, March
     2007.

HISTORY
     Virtual kernels were introduced in DragonFly 1.7.

AUTHORS
     Matt Dillon thought up and implemented the vkernel architecture and wrote
     the vkd device driver.  Sepherosa Ziehau wrote the vke device driver.
     This manual page was written by Sascha Wildner.

DragonFly 6.3-DEVELOPMENT      September 7, 2021     DragonFly 6.3-DEVELOPMENT