VKERNEL(7) DragonFly Miscellaneous Information Manual VKERNEL(7)
NAME
vkernel, vcd, vkd, vke -- virtual kernel architecture
SYNOPSIS
platform vkernel64 # for 64 bit vkernels
device vcd
device vkd
device vke
/var/vkernel/boot/kernel/kernel [-hdstUvz] [-c file]
[-e name=value:name=value:...] [-i file]
[-I interface[:address1[:address2][/netmask][=mac]]] [-l cpulock]
[-m size] [-n numcpus[:lbits[:cbits]]] [-p pidfile] [-r file[:serno]]
[-R file[:serno]]
DESCRIPTION
The vkernel architecture allows for running DragonFly kernels in
userland.
The following options are available:
-c file Specify a readonly CD-ROM image file to be used by the kernel,
with the first -c option defining vcd0, the second one vcd1, and
so on. The first -r, -R, or -c option specified on the command
line will be the boot disk. The CD9660 filesystem is assumed
when booting from this media.
-d Disables hardware pagetable for vkernel.
-e name=value:name=value:...
Specify an environment to be used by the kernel. This option
can be specified more than once.
-h Shows a list of available options, each with a short
description.
-i file Specify a memory image file to be used by the virtual kernel.
If no -i option is given, the kernel will generate a name of the
form /var/vkernel/memimg.XXXXXX, with the trailing `Xs' being
replaced by a sequential number, e.g. memimg.000001.
-I interface[:address1[:address2][/netmask][=MAC]]
Create a virtual network device, with the first -I option
defining vke0, the second one vke1, and so on.
The interface argument is the name of a tap(4) device node or
the path to a vknetd(8) socket. The /dev/ path prefix does not
have to be specified and will be automatically prepended for a
device node. Specifying auto will pick the first unused tap(4)
device.
The address1 and address2 arguments are the IP addresses of the
tap(4) and vke interfaces. Optionally, address1 may be of the
form bridgeX in which case the tap(4) interface is added to the
specified bridge(4) interface. The vke address is not assigned
until the interface is brought up in the guest.
The netmask argument applies to all interfaces for which an
address is specified.
The MAC argument is the MAC address of the vke(4) interface. If
not specified, a pseudo-random one will be generated.
When running multiple vkernels it is often more convenient to
simply connect to a vknetd(8) socket and let vknetd deal with
the tap and/or bridge. An example of this would be
/var/run/vknet:0.0.0.0:10.2.0.2/16.
-l cpulock
Specify which, if any, real CPUs to lock virtual CPUs to.
cpulock is one of any, map[,startCPU], or CPU.
any does not map virtual CPUs to real CPUs. This is the
default.
map[,startCPU] maps each virtual CPU to a real CPU starting with
real CPU 0 or startCPU if specified.
CPU locks all virtual CPUs to the real CPU specified by CPU.
Locking the vkernel to a set of CPUs is recommended on
multi-socket systems to improve NUMA locality of reference.
-m size Specify the amount of memory to be used by the kernel in bytes,
K (kilobytes), M (megabytes) or G (gigabytes). Lowercase
versions of K, M, and G are allowed.
-n numcpus[:lbits[:cbits]]
numcpus specifies the number of CPUs to emulate. Up to 16 CPUs
are supported; the default is 2.
lbits specifies the number of bits within APICID(=CPUID) needed
for representing the logical ID. Controls the number of
threads/core (0 bits - 1 thread, 1 bit - 2 threads). This
parameter is optional (mandatory only if cbits is specified).
cbits specifies the number of bits within APICID(=CPUID) needed
for representing the core ID. Controls the number of
cores/package (0 bits - 1 core, 1 bit - 2 cores). This parameter
is optional.
-p pidfile
Specify a pidfile in which to store the process ID. Scripts can
use this file to locate the vkernel pid for the purpose of
shutting down or killing it.
The vkernel will hold a lock on the pidfile while running.
Scripts may test for the lock to determine if the pidfile is
valid or stale so as to avoid accidentally killing a random
process. Something like '/usr/bin/lockf -ks -t 0 pidfile echo
-n' may be used to test the lock. A non-zero exit code
indicates that the pidfile represents a running vkernel.
An error is issued and the vkernel exits if this file cannot be
opened for writing or if it is already locked by an active
vkernel process.
-r file[:serno]
Specify a R/W disk image file to be used by the kernel, with the
first -r option defining vkd0, the second one vkd1, and so on.
A serial number for the virtual disk can be specified in serno.
The first -r, -R, or -c option specified on the command line
will be the boot disk.
-R file[:serno]
Works like -r but treats the disk image as copy-on-write. This
allows a private copy of the image to be modified but does not
modify the image file. The image file will not be locked in
this situation and multiple vkernels can run off the same image
file if desired.
Since modifications are thrown away, any data you wish to retain
across invocations needs to be exported over the network prior
to shutdown. This gives you the flexibility to mount the disk
image either read-only or read-write depending on what is
convenient. However, keep in mind that when mounting a COW
image read-write, modifications will eat system memory and swap
space until the vkernel is shut down.
-s Boot into single-user mode.
-t Tell the vkernel to use a precise host timer when calculating
clock values. If the TSC isn't used, this will impose higher
overhead on the vkernel as it will have to make a system call to
the real host every time it wants to get the time. However, the
more precise timer might be necessary for your application.
By default, the vkernel uses the TSC cpu timer if possible, or
an imprecise (host-tick-resolution) timer which uses a user-
mapped kernel page and does not have any syscall overhead. To
disable the TSC cpu timer, use the -e hw.tsc_cputimer_enable=0
flag.
-U Enable writing to kernel memory and module loading. By default,
those are disabled for security reasons.
-v Turn on verbose booting.
-z Force the vkernel's RAM to be pre-zeroed. Useful for
benchmarking on single-socket systems where the memory
allocation does not have to be NUMA-friendly. This option is
not recommended on multi-socket systems or when the -l option is
used.
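As a rough illustration of the lbits and cbits fields of the -n option: the two cases listed for each field (0 bits for 1, 1 bit for 2) suggest that each count is two raised to the number of bits. The helper below is only a sketch under that assumption; vk_topology is a hypothetical name and not part of the vkernel.

```shell
# Sketch: derive the topology implied by -n numcpus:lbits:cbits.
# Assumes each count is 2^bits, generalizing the 0-bit and 1-bit cases
# given in the option description.
vk_topology() {
    lbits=$1
    cbits=$2
    threads=$((1 << lbits))   # threads per core
    cores=$((1 << cbits))     # cores per package
    echo "threads/core=$threads cores/package=$cores"
}

vk_topology 1 2   # prints threads/core=2 cores/package=4
```

Under the same assumption, -n 8:1:2 would describe 2 threads per core and 4 cores per package.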
DEVICES
A number of virtual device drivers exist to supplement the virtual
kernel.
Disk device
The vkd driver allows for up to 16 vn(4) based disk devices. The root
device will be vkd0 (see EXAMPLES for further information on how to
prepare a root image).
CD-ROM device
The vcd driver allows for up to 16 virtual CD-ROM devices. Basically
this is a read only vkd device with a block size of 2048.
Network interface
The vke driver supports up to 16 virtual network interfaces which are
associated with tap(4) devices on the host. For each vke device, the
per-interface read only sysctl(3) variable hw.vkeX.tap_unit holds the
unit number of the associated tap(4) device.
By default, half of the total mbuf clusters available is distributed
equally among all the vke devices up to 256. This can be overridden with
the tunable hw.vke.max_ringsize. Note that the value passed is rounded
down to a power of two.
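The rounding applied to hw.vke.max_ringsize can be illustrated with a small calculation. This sketch assumes the alignment means rounding down to the nearest power of two; floor_pow2 is a hypothetical helper name used only for illustration.

```shell
# Sketch: largest power of two that does not exceed n (n >= 1).
floor_pow2() {
    n=$1
    p=1
    # Keep doubling while the next power of two still fits.
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

floor_pow2 300   # prints 256
floor_pow2 256   # prints 256
```

A requested ring size of 300 would therefore be expected to behave like 256.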
SIGNALS
The virtual kernel only enables SIGQUIT and SIGTERM while operating in
regular console mode. Sending `^\' (SIGQUIT) to the virtual kernel
causes the virtual kernel to enter its internal ddb(4) debugger and re-
enable all other terminal signals. Sending SIGTERM to the virtual kernel
triggers a clean shutdown by passing a SIGUSR2 to the virtual kernel's
init(8) process.
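A script that wants to trigger the clean SIGTERM shutdown can combine it with the pidfile lock test described under the -p option. The following is a minimal sketch, assuming the pidfile contains just the process ID; vk_running_pid and the LOCKF variable are hypothetical names, and the pidfile path in the usage comment is a placeholder.

```shell
# Path to lockf(1); overridable so the logic can be exercised elsewhere.
LOCKF=${LOCKF:-/usr/bin/lockf}

# Sketch: print the PID stored in pidfile, but only if the file is still
# locked, i.e. a vkernel appears to be running.
# Returns 1 for a stale pidfile and 2 for an unreadable one.
vk_running_pid() {
    pidfile=$1
    [ -r "$pidfile" ] || return 2
    # lockf exits 0 only if it could take the lock itself, which means
    # no vkernel holds it and the pidfile is stale.
    if "$LOCKF" -ks -t 0 "$pidfile" echo -n >/dev/null 2>&1; then
        return 1
    fi
    cat "$pidfile"
}

# Hypothetical usage:
#   pid=$(vk_running_pid /var/run/vkernel.pid) && kill -TERM "$pid"
```

Note that any lockf failure, including a missing lockf binary, is treated as "locked" by this sketch, so a production script would want to check for that case separately.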
DEBUGGING
It is possible to attach gdb directly to the virtual kernel's process. It is
recommended that you do a `handle SIGSEGV noprint' to ignore page faults
processed by the virtual kernel itself and `handle SIGUSR1 noprint' to
ignore signals used for simulating inter-processor interrupts.
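One way to avoid retyping the two handle commands is to keep them in a gdb command file. The sketch below writes such a file; vk_gdb_cmds is a hypothetical helper name and both paths in the usage comment are placeholders. It relies only on the standard gdb -p (attach to PID) and -x (read commands from file) options.

```shell
# Sketch: write a gdb command file with the two recommended settings.
vk_gdb_cmds() {
    out=$1
    cat > "$out" <<'EOF'
handle SIGSEGV noprint
handle SIGUSR1 noprint
EOF
}

# Hypothetical usage; both paths are placeholders:
#   vk_gdb_cmds /tmp/vkernel.gdb
#   gdb -p "$(cat /var/run/vkernel.pid)" -x /tmp/vkernel.gdb
```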
FILES
/dev/vcdX vcd device nodes
/dev/vkdX vkd device nodes
/sys/config/VKERNEL64
vkernel configuration file, for config(8).
CONFIGURATION FILES
Your virtual kernel is a complete DragonFly system, but you might not
want to run all the services a normal kernel runs. Here is what a
typical virtual kernel's /etc/rc.conf file looks like, with some
additional possibilities commented out.
hostname="vkernel"
network_interfaces="lo0 vke0"
ifconfig_vke0="DHCP"
sendmail_enable="NO"
#syslog_enable="NO"
blanktime="NO"
BOOT DRIVE SELECTION
You can override the default boot drive selection and filesystem using a
kernel environment variable. Note that the filesystem selected must be
compiled into the vkernel and not loaded as a module. Because the -e
option uses colons to separate name=value pairs, the value must be wrapped
in escaped quotes so that its colon is not misinterpreted. For example:
-e vfs.root.mountfrom=\"hammer:vkd0s1d\"
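In a script, the same argument can be produced without backslash escapes by letting printf emit the literal double quotes. This is only a sketch; vk_mountfrom is a hypothetical helper name.

```shell
# Sketch: build the -e argument for a root filesystem override.
# The double quotes are part of the value, so they must survive shell
# parsing and reach the vkernel literally.
vk_mountfrom() {
    fstype=$1
    device=$2
    printf 'vfs.root.mountfrom="%s:%s"\n' "$fstype" "$device"
}

vk_mountfrom hammer vkd0s1d   # prints vfs.root.mountfrom="hammer:vkd0s1d"
```

The result would then be passed as a single word, for example -e "$(vk_mountfrom hammer vkd0s1d)".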
DISKLESS OPERATION
To boot a vkernel from an NFS root, a number of tunables need to be set:
boot.netif.ip
IP address to be set in the vkernel interface.
boot.netif.netmask
Netmask for the IP to be set.
boot.netif.name
Network interface name inside the vkernel.
boot.nfsroot.server
Host running nfsd(8).
boot.nfsroot.path
Host path where a world and distribution targets are properly
installed.
See the EXAMPLES section for an example of booting a diskless vkernel.
EXAMPLES
A couple of steps are necessary in order to prepare the system to build
and run a virtual kernel.
Setting up the filesystem
The vkernel architecture needs a number of files which reside in
/var/vkernel. Since these files tend to get rather big and the /var
partition is usually of limited size, we recommend the directory to be
created in the /home partition with a link to it in /var:
mkdir -p /home/var.vkernel/boot
ln -s /home/var.vkernel /var/vkernel
Next, a filesystem image to be used by the virtual kernel has to be
created and populated (assuming world has been built previously). If the
image is created on a UFS filesystem you might want to pre-zero it. On a
HAMMER filesystem you should just truncate-extend to the image size as
HAMMER does not re-use data blocks already present in the file.
vnconfig -c -S 2g -T vn0 /var/vkernel/rootimg.01
disklabel -r -w vn0s0 auto
disklabel -e vn0s0 # add `a' partition with fstype `4.2BSD'
newfs /dev/vn0s0a
mount /dev/vn0s0a /mnt
cd /usr/src
make installworld DESTDIR=/mnt
cd etc
make distribution DESTDIR=/mnt
echo '/dev/vkd0s0a / ufs rw 1 1' >/mnt/etc/fstab
echo 'proc /proc procfs rw 0 0' >>/mnt/etc/fstab
Edit /mnt/etc/ttys and replace the console entry with the following line
and turn off all other gettys.
console "/usr/libexec/getty Pc" cons25 on secure
Replace Pc with al.Pc if you would like to automatically log in as root.
Then, unmount the disk.
umount /mnt
vnconfig -u vn0
Compiling the virtual kernel
In order to compile a virtual kernel use the VKERNEL64 kernel
configuration file residing in /sys/config (or a configuration file
derived thereof):
cd /usr/src
make -DNO_MODULES buildkernel KERNCONF=VKERNEL64
make -DNO_MODULES installkernel KERNCONF=VKERNEL64 DESTDIR=/var/vkernel
Enabling virtual kernel operation
A special sysctl(8), vm.vkernel_enable, must be set to enable vkernel
operation:
sysctl vm.vkernel_enable=1
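The sysctl command above only affects the running host. To keep the setting across host reboots, one option is to record it in sysctl.conf(5). The sketch below appends a line to a file only if it is not already there; ensure_line is a hypothetical helper name, and using it on /etc/sysctl.conf is an assumption about how the host is administered.

```shell
# Sketch: append a setting to a sysctl.conf-style file unless an
# identical line is already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines only, -F treats the setting as a fixed string.
    grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}

# Hypothetical usage (as root):
#   ensure_line /etc/sysctl.conf vm.vkernel_enable=1
```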
Configuring the network on the host system
In order to access a network interface of the host system from the
vkernel, you must add the interface to a bridge(4) device which will then
be passed to the -I option:
kldload if_bridge.ko
kldload if_tap.ko
ifconfig bridge0 create
ifconfig bridge0 addm re0 # assuming re0 is the host's interface
ifconfig bridge0 up
Running the kernel
Finally, the virtual kernel can be run:
cd /var/vkernel
./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0
You can issue the reboot(8), halt(8), or shutdown(8) commands from inside
a virtual kernel. After doing a clean shutdown the reboot(8) command
will re-exec the virtual kernel binary while the other two will cause the
virtual kernel to exit.
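The -I argument in the run command follows the interface[:address1[:address2][/netmask][=mac]] syntax from the option list. When such arguments are generated by a script, a small helper can assemble them. This is a sketch only; vke_spec is a hypothetical name, and it performs no validation of the individual parts.

```shell
# Sketch: compose a -I argument from its parts; empty parts are left out.
# Syntax: interface[:address1[:address2][/netmask][=mac]]
vke_spec() {
    iface=$1
    addr1=${2:-}
    addr2=${3:-}
    mask=${4:-}
    mac=${5:-}
    spec=$iface
    # Everything after the interface name is only valid once address1 is given.
    if [ -n "$addr1" ]; then
        spec="$spec:$addr1"
        if [ -n "$addr2" ]; then spec="$spec:$addr2"; fi
        if [ -n "$mask" ]; then spec="$spec/$mask"; fi
        if [ -n "$mac" ]; then spec="$spec=$mac"; fi
    fi
    echo "$spec"
}

vke_spec auto bridge0                         # prints auto:bridge0
vke_spec /var/run/vknet 0.0.0.0 10.2.0.2 16   # prints /var/run/vknet:0.0.0.0:10.2.0.2/16
```

The two calls reproduce the bridged form used in the run command and the vknetd(8) form shown in the description of -I.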
Diskless operation (vkernel as an NFS client)
The following boots a vkernel with a vknetd(8) network configuration. For
convenience and to reduce confusion, it is recommended to mount the
server's remote vkernel root onto the host running the vkernel binary
using the same path as the NFS mount. It is assumed that a full system
install has been made to /var/vkernel/root with KERNCONF=VKERNEL64 used
for the kernel build.
/var/vkernel/root/boot/kernel/kernel \
-m 1g -n 4 -I /var/run/vknet \
-e boot.netif.ip=10.100.0.2 \
-e boot.netif.netmask=255.255.0.0 \
-e boot.netif.gateway=10.100.0.1 \
-e boot.netif.name=vke0 \
-e boot.nfsroot.server=10.0.0.55 \
-e boot.nfsroot.path=/var/vkernel/root
In this example vknetd is assumed to have been started as shown below,
before running the vkernel, using an unbridged TAP configuration routed
through the host. IP forwarding must be turned on, and in this example
the server resides on a different network accessible to the host
executing the vkernel but not directly on the vkernel's subnet.
kldload if_tap
sysctl net.inet.ip.forwarding=1
vknetd -t tap0 10.100.0.1/16
You can run multiple vkernels trivially with the same NFS root as long as
you assign each one a different IP on the subnet (2, 3, 4, etc). You
should also be careful with certain directories, particularly /var/run
and possibly also /var/db depending on what your vkernels are going to be
doing. This can complicate matters with /var/db/pkg.
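To illustrate the point about giving each vkernel its own address, the sketch below prints one command line per instance, reusing the values from the diskless example. vk_diskless_cmd is a hypothetical helper name, and the numbering scheme (instance n gets 10.100.0.n+1) is an assumption chosen to match the "2, 3, 4" sequence mentioned above.

```shell
# Sketch: print the command line for the n-th diskless vkernel (n = 1, 2, ...).
# Instance n gets host address n+1 on the 10.100.0.0/16 subnet, leaving .1
# for the vknetd side. Only sensible for n up to 253.
vk_diskless_cmd() {
    n=$1
    ip="10.100.0.$((n + 1))"
    echo "/var/vkernel/root/boot/kernel/kernel" \
        "-m 1g -n 4 -I /var/run/vknet" \
        "-e boot.netif.ip=$ip" \
        "-e boot.netif.netmask=255.255.0.0" \
        "-e boot.netif.gateway=10.100.0.1" \
        "-e boot.netif.name=vke0" \
        "-e boot.nfsroot.server=10.0.0.55" \
        "-e boot.nfsroot.path=/var/vkernel/root"
}

# Show the commands for three instances without running them.
for i in 1 2 3; do
    vk_diskless_cmd "$i"
done
```

The function only prints the commands; directories such as /var/run still need per-instance handling as noted above.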
BUILDING THE WORLD UNDER A VKERNEL
The virtual kernel platform does not have all the header files expected
by a world build, so the easiest thing to do right now is to specify a
pc64 (in a 64 bit vkernel) target when building the world under a virtual
kernel, like this:
vkernel# make MACHINE_PLATFORM=pc64 buildworld
vkernel# make MACHINE_PLATFORM=pc64 installworld
SEE ALSO
vknet(1), bridge(4), ifmedia(4), tap(4), vn(4), sysctl.conf(5), build(7),
config(8), disklabel(8), ifconfig(8), vknetd(8), vnconfig(8)
Aggelos Economopoulos, A Peek at the DragonFly Virtual Kernel, March
2007.
HISTORY
Virtual kernels were introduced in DragonFly 1.7.
AUTHORS
Matt Dillon thought up and implemented the vkernel architecture and wrote
the vkd device driver. Sepherosa Ziehau wrote the vke device driver.
This manual page was written by Sascha Wildner.
DragonFly 5.5 January 5, 2019 DragonFly 5.5