NetBSD/src
branch: trunk
commit ebae899d8263b70b17fbd1c1ff13f46419b659df (HEAD -> trunk, origin/trunk, origin/HEAD)
Author: thorpej <thorpej@NetBSD.org>
Date:   Fri Jun 25 03:52:41 2021 +0000

$ git log --reverse -- sys/dev/nvmm lib/libnvmm

commit 5c184ba9b5764e690485213d54f1c92aade4a23a
Author: maxv <maxv@NetBSD.org>
Date:   Wed Nov 7 07:43:07 2018 +0000

    Add NVMM - for NetBSD Virtual Machine Monitor -, a kernel driver that
    provides support for hardware-accelerated virtualization on NetBSD.
    
    It is made of an MI frontend, to which MD backends can be plugged. One
    MD backend is implemented, x86-SVM, for x86 AMD CPUs.
    
    We install
    
            /usr/include/dev/nvmm/nvmm.h
            /usr/include/dev/nvmm/nvmm_ioctl.h
            /usr/include/dev/nvmm/{arch}/nvmm_{arch}.h
    
    And the kernel module. For now, the only architecture where we do that
    is amd64 (arch=x86).
    
    NVMM is not enabled by default in amd64-GENERIC, but is instead easily
    modloadable.
    
    Sent to tech-kern@ a month ago. Validated with kASan, and optimized
    with tprof.

commit e00c4d7eb507db478a9a082f0db453d10159e0b9
Author: maxv <maxv@NetBSD.org>
Date:   Sat Nov 10 09:28:56 2018 +0000

    Add libnvmm, NetBSD's new virtualization API. It provides a way for VMM
    software to effortlessly create and manage virtual machines via NVMM.
    
    It is mostly complete, only nvmm_assist_mem needs to be filled -- I have
    a draft for that, but it needs some more care. This Mem Assist should
    not be needed when emulating a system in x2apic mode, so theoretically
    the current form of libnvmm is sufficient to emulate a whole class of
    systems.
    
    Generally speaking, there are so many modes in x86 that it is difficult
    to handle each corner case without introducing a ton of checks that just
    slow down the common-case execution. Currently we check a limited number
    of things; we may add more checks in the future if they turn out to be
    needed, but that's rather low priority.
    
    Libnvmm is compiled and installed only on amd64. A man page (reviewed by
    wiz@) is provided.

commit 33c33657c05873631724acbc68f17cb79eb93b2b
Author: maxv <maxv@NetBSD.org>
Date:   Sat Nov 10 09:42:42 2018 +0000

    Remove unused cpu_msr.h includes.

commit 38a06a522d6f5adfb9ba8037988859f476b512a7
Author: maxv <maxv@NetBSD.org>
Date:   Sat Nov 10 10:57:06 2018 +0000

    Add copyright and RCSID, from wiz@.

commit 7cf5622063423def53a97e70e8f2cc4aa22ce8b9
Author: maya <maya@NetBSD.org>
Date:   Sun Nov 11 00:06:48 2018 +0000

    Add missing include for struct nvmm_x64_state
    (Pointed out by the clang build)

commit 07fe401d4f470069f52053322129188bf6629ce5
Author: nakayama <nakayama@NetBSD.org>
Date:   Mon Nov 12 17:46:53 2018 +0000

    No need to install shared libraries to /lib.

commit 791fac88452f8910ed09c323800b1e23619d4f48
Author: maya <maya@NetBSD.org>
Date:   Tue Nov 13 06:57:14 2018 +0000

    Revert my own rev 1.2, the missing include was only when building the 32-bit
    compat library, we no longer do this.

commit 1ccf0491dec2869fca0feaa815aaaf715f1c27bd
Author: martin <martin@NetBSD.org>
Date:   Tue Nov 13 09:00:08 2018 +0000

    Move conditionals for libnvmm to subdir makefile, requested by mrg.

commit d2cb6db9f3a97f343d3cb2d8d66d4fcc3d6af5ef
Author: martin <martin@NetBSD.org>
Date:   Tue Nov 13 09:14:14 2018 +0000

    Need some minimalistic support for additional things that ../Makefile
    requires, even if we do nothing here

commit 9db01b7e2c13b3ed9227fba4df27be9d2a7647a8
Author: martin <martin@NetBSD.org>
Date:   Tue Nov 13 09:24:37 2018 +0000

    Too much magic involved - revert previous.

commit 86a74f4fcdb0da86ef214b0af2bb28e833dba5e7
Author: maxv <maxv@NetBSD.org>
Date:   Wed Nov 14 19:14:40 2018 +0000

    Take RAX from the VMCB and not the VCPU state, the latter is not
    synchronized and contains old values.

commit 327344f16ac2ab65ec73fff66073bfd69412bbe8
Author: maxv <maxv@NetBSD.org>
Date:   Sat Nov 17 16:11:33 2018 +0000

    Don't forget to set 'prot' when the guest has paging disabled.

commit e0463a397f0f81bd1549fdb2002e8de92e90f0df
Author: maxv <maxv@NetBSD.org>
Date:   Sun Nov 18 07:42:24 2018 +0000

    Ah, should be UVM_ADV_RANDOM.

commit 9d4e3adbeef2598e7eeb4e0141d9ad587e3266ce
Author: maxv <maxv@NetBSD.org>
Date:   Mon Nov 19 17:35:12 2018 +0000

    Rename one constant, for clarity.

commit ed114ff4dc2467302e81153e3b3a0fd3090ee411
Author: maxv <maxv@NetBSD.org>
Date:   Mon Nov 19 21:45:37 2018 +0000

    Fix error handling of realloc, and use memmove because the areas overlap;
    noted by agc@. These _nvmm_area_add/delete functions don't make a lot of
    sense right now and will likely be rewritten to match the behavior
    expected by Qemu; but still fix for the time being.
    
    Also fix a collision check while here.
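
The two fixes named in this commit (realloc error handling, memmove for overlapping areas) follow a common C pattern. A minimal sketch, assuming a hypothetical `struct area` and `struct area_list`; these are not the libnvmm `_nvmm_area_add/delete` functions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical area descriptor, for illustration only. */
struct area {
	unsigned long gpa;
	unsigned long size;
};

struct area_list {
	struct area *areas;
	size_t count;
};

/*
 * Insert 'a' at position 'idx'. Returns 0 on success, -1 on failure.
 * The realloc result goes into a temporary, so the old array is neither
 * lost nor leaked when the allocation fails.
 */
int
area_list_insert(struct area_list *l, size_t idx, struct area a)
{
	struct area *tmp;

	assert(idx <= l->count);

	tmp = realloc(l->areas, (l->count + 1) * sizeof(*tmp));
	if (tmp == NULL)
		return -1;
	l->areas = tmp;

	/* Source and destination overlap, so memmove rather than memcpy. */
	memmove(&l->areas[idx + 1], &l->areas[idx],
	    (l->count - idx) * sizeof(*tmp));
	l->areas[idx] = a;
	l->count++;
	return 0;
}

/* Remove the entry at 'idx' by shifting the tail down by one slot. */
void
area_list_remove(struct area_list *l, size_t idx)
{
	assert(idx < l->count);

	memmove(&l->areas[idx], &l->areas[idx + 1],
	    (l->count - idx - 1) * sizeof(*l->areas));
	l->count--;
}
```

Assigning the result of realloc straight back to `l->areas` would overwrite the only pointer to the old array on failure, which is the usual form of this bug.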

commit 5aea1ef46fc591f95cb4a13e4df7ac90c503813c
Author: maxv <maxv@NetBSD.org>
Date:   Thu Nov 22 07:37:12 2018 +0000

    Add missing pmap_update after pmap_kenter_pa, noted by Kamil.

commit aa9fffd9559d03f72c332e6e70a86dbe6c583e2a
Author: maxv <maxv@NetBSD.org>
Date:   Sun Nov 25 14:09:57 2018 +0000

    Add RFLAGS in the exitstate.

commit 2affef406444a1316ae1662d2702c910145a788a
Author: maxv <maxv@NetBSD.org>
Date:   Sun Nov 25 14:11:24 2018 +0000

    Appease the check: allow NVMM_MAX_RAM bytes of memory, and not just
    NVMM_MAX_RAM-1.

commit 6882fb8eca9b71cad2c720fb94d273333dee0eca
Author: maxv <maxv@NetBSD.org>
Date:   Thu Nov 29 19:55:20 2018 +0000

    Rewrite the gpa map/unmap functions. Dig holes in the mapped areas when
    there is an overlap. Close to what Qemu expects.

commit e785e4bc51e75f6bd156006b6aee07ecf3769f99
Author: maxv <maxv@NetBSD.org>
Date:   Wed Dec 12 09:09:08 2018 +0000

    Change the "FILES" section, in the end I don't want to commit toyvirt
    and smallkern, there is little interest installing them by default,
    rather they can be downloaded from www. It's better this way.
    
    While here add NVMM(4) in "SEE ALSO".

commit e3002faf2d915ce3ca295b1fb2b7ad042d25fa55
Author: maxv <maxv@NetBSD.org>
Date:   Wed Dec 12 10:42:34 2018 +0000

    Change the map/unmap functions, again.

commit 4cf43b42f438b161db7d8d117b77b123724c6119
Author: wiz <wiz@NetBSD.org>
Date:   Wed Dec 12 11:40:08 2018 +0000

    Remove superfluous dot.

commit 65f495e8aa2116bef51801809f26a7ad61d3472b
Author: maxv <maxv@NetBSD.org>
Date:   Thu Dec 13 16:28:10 2018 +0000

    Don't forget to advance the RIP after an XSETBV emulation.

commit e88dc7e8bf7b918df1c85a6c72dacd5e18dc7468
Author: maxv <maxv@NetBSD.org>
Date:   Sat Dec 15 13:09:02 2018 +0000

    Two changes:
    
     - Fix the I/O Assist, for INS* it is RDI and not RSI, and the register
       gets updated regardless of the REP prefix.
    
     - Fill in the Mem Assist. We decode and emulate certain instructions,
       and pass a mem descriptor to the callback to handle the transaction.
       The disassembler could use some polishing, and there are still a
       few instructions missing; but basically it works.

commit a7d1548b50e74991bfacdee2a9f8bf4d6c154100
Author: maxv <maxv@NetBSD.org>
Date:   Sat Dec 15 13:39:43 2018 +0000

    Invert the mapping logic.
    
    Until now, the "owner" of the memory was the guest, and by calling
    nvmm_gpa_map(), the virtualizer was creating a view towards the guest
    memory.
    
    Qemu expects the contrary: it wants the owner to be the virtualizer, and
    nvmm_gpa_map should just create a view from the guest towards the
    virtualizer's address space. Under this scheme, it is legal to have two
    GPAs that point to the same HVA.
    
    Introduce nvmm_hva_map() and nvmm_hva_unmap(), that map/unmap the HVA into
    a dedicated UOBJ. Change nvmm_gpa_map() and nvmm_gpa_unmap() to just
    perform an enter into the desired UOBJ.
    
    With this change in place, all the mapping-related problems in Qemu+NVMM
    are fixed.

commit 6d0107130349e7ccc613d85a3866e5fc6356acd2
Author: maxv <maxv@NetBSD.org>
Date:   Thu Dec 27 07:22:31 2018 +0000

    Several improvements and fixes:
    
     * Change the Assist API. Rather than passing callbacks in each call, the
       callbacks are now registered beforehand. Then change the I/O Assist to
       fetch MMIO data via the Mem callback. This allows a guest to perform an
       I/O string operation on a memory that is itself an MMIO.
    
     * Introduce two new functions internal to libnvmm, read_guest_memory and
       write_guest_memory. They can handle mapped memory, MMIO memory and
       cross-page transactions.
    
     * Allow nvmm_gva_to_gpa and nvmm_gpa_to_hva to take non-page-aligned
       addresses. This simplifies a lot of things.
    
     * Support the MOVS instruction, and add a test for it. This instruction
       is special, in that it takes two implicit memory operands. In
       particular, it means that the two buffers can both be in MMIO memory,
       and we handle this case.
    
     * Fix gross copy-pasto in nvmm_hva_unmap. Also fix a few things here and
       there.

commit 6b8576f31ede26be7f6e2e9b69db5cafaffe8129
Author: maxv <maxv@NetBSD.org>
Date:   Sat Dec 29 17:54:54 2018 +0000

    Fix the segmentation check, the limit is relative, not absolute.
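
A relative limit check can be sketched as follows. This is an illustration only: `struct seg` and the function names are hypothetical, and the sketch assumes a byte-granular, expand-up segment, which is not necessarily what the libnvmm code handles:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical segment descriptor: base address and byte-granular limit. */
struct seg {
	uint64_t base;
	uint32_t limit;	/* last valid offset, relative to 'base' */
};

/*
 * Check an access of 'size' bytes at offset 'off' within the segment.
 * The comparison is done on the offset, not on base + offset.
 */
bool
seg_access_ok(const struct seg *s, uint64_t off, uint64_t size)
{
	assert(size > 0);

	if (off > s->limit)
		return false;
	/* The last byte touched is off + size - 1; written to avoid overflow. */
	return size - 1 <= s->limit - off;
}

/* The linear address is only formed once the offset has been validated. */
uint64_t
seg_linear(const struct seg *s, uint64_t off)
{
	return s->base + off;
}
```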

commit 1cb762c12044df197ad606bb031b92ee27edb4c8
Author: maxv <maxv@NetBSD.org>
Date:   Wed Jan 2 12:18:08 2019 +0000

    When there's no DecodeAssist in hardware, decode manually in software. This
    is needed on certain AMD CPUs (like mine): the segment base of OUTS can be
    overridden, and it is wrong to just assume DS.
    
    We fetch the instruction and look at the prefixes if any to determine the
    correct segment.

commit fac5e55a24367828649f075d3fd77fc9f88afd01
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jan 3 08:02:49 2019 +0000

    Fix another gross copy-pasto.

commit b6c2e14c6bfa17ce0f7cecb911e286412a73a262
Author: maxv <maxv@NetBSD.org>
Date:   Fri Jan 4 10:25:39 2019 +0000

    In !64bit mode RIP-relative is null+disp32, handle that correctly.

commit 02a074bc61ed811a75d5377ceb3325cc09f3a9cf
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jan 6 16:10:51 2019 +0000

    Improvements and fixes in NVMM.
    
    Kernel driver:
    
     * Don't take an extra (unneeded) reference to the UAO.
    
     * Provide npc for HLT. I'm not really happy with it right now, will
       likely be revisited.
    
     * Add the INT_SHADOW, INT_WINDOW_EXIT and NMI_WINDOW_EXIT states. Provide
       them in the exitstate too.
    
     * Don't take the TPR into account when processing INTs. The virtualizer
       can do that itself (Qemu already does).
    
     * Provide a hypervisor signature in CPUID, and hide SVM.
    
     * Ignore certain MSRs. One special case is MSR_NB_CFG in which we set
       NB_CFG_INITAPICCPUIDLO. Allow reads of MSR_TSC.
    
     * If the LWP has pending signals or softints, leave, rather than waiting
       for a rescheduling to happen later. This reduces interrupt processing
       time in the guest (Qemu sends a signal to the thread, and now we leave
       right away). This could be improved even more by sending an actual IPI
       to the CPU, but I'll see later.
    
    Libnvmm:
    
     * Fix the MMU translation of large pages, we need to add the lower bits
       too.
    
     * Change the IO and Mem structures to take a pointer rather than a
       static array. This provides more flexibility.
    
     * Batch together the str+rep IO transactions. We do one big memory
       read/write, and then send the IO commands to the hypervisor all at
       once. This considerably increases performance.
    
     * Decode MOVZX.
    
    With these changes in place, Qemu+NVMM works. I can install NetBSD 8.0
    in a VM with multiple VCPUs, connect to the network, etc.
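
The large-page fix listed under Libnvmm above can be illustrated for a 2MB page in long mode. A minimal sketch with hypothetical macro and function names (not the libnvmm MMU code): the page-directory entry supplies the frame, and the low 21 bits of the virtual address must be added back in.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PS		UINT64_C(0x80)			/* page-size bit */
#define PDE_FRAME_2MB	UINT64_C(0x000FFFFFFFE00000)	/* bits 51:21 */
#define OFFSET_2MB	UINT64_C(0x00000000001FFFFF)	/* bits 20:0 */

/*
 * Compute the guest-physical address for a GVA mapped by a 2MB page.
 * Returning only the frame would drop the offset within the large page.
 */
uint64_t
large_page_gpa(uint64_t pde, uint64_t gva)
{
	assert((pde & PDE_PS) != 0);

	return (pde & PDE_FRAME_2MB) | (gva & OFFSET_2MB);
}
```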

commit f66528d8cdad2beb8654e43fbb5e26dad3d27dc8
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jan 6 18:32:54 2019 +0000

    Add more VMCB fields. Also remove debugging code I mistakenly committed
    in the previous revision. No functional change.

commit 15992ed9f793d45abc86f105b6f2051936ac850a
Author: maxv <maxv@NetBSD.org>
Date:   Mon Jan 7 13:47:33 2019 +0000

    Improvements and fixes:
    
     * Decode AND/OR/XOR from Group1.
    
     * Sign-extend the immediates and displacements in 64bit mode.
    
     * Fix the storage of {read,write}_guest_memory, now that we batch certain
       IO operations we can copy more than 8 bytes, and shit hits the fan.
    
     * Remove the CR4_PSE check in the 64bit MMU. This bit is actually ignored
       in long mode, and some systems (like FreeBSD) don't set it.
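
Sign extension of an immediate or displacement, as mentioned in the list above, can be sketched like this. The function name and the byte-size parameter are assumptions for illustration, not the libnvmm decoder interface:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low 'size' bytes of 'val' to 64 bits.
 * Uses the xor/subtract idiom, which avoids implementation-defined
 * conversions to signed types.
 */
uint64_t
sign_extend(uint64_t val, unsigned size)
{
	unsigned bits;
	uint64_t mask, sign;

	assert(size == 1 || size == 2 || size == 4 || size == 8);

	if (size == 8)
		return val;

	bits = size * 8;
	mask = (UINT64_C(1) << bits) - 1;
	sign = UINT64_C(1) << (bits - 1);

	val &= mask;			/* discard anything above the field */
	return (val ^ sign) - sign;	/* replicate the sign bit upwards */
}
```

For example, a 32-bit displacement of 0xFFFFFFF0 (that is, -16) becomes 0xFFFFFFFFFFFFFFF0 rather than a large positive offset.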

commit def9083f44681cdf683fb584498e6256ff19a17f
Author: maxv <maxv@NetBSD.org>
Date:   Mon Jan 7 14:08:02 2019 +0000

    Optimize: cache the guest state entirely in the VMCB-cache, flush it on a
    state-by-state basis when needed.

commit 9edc324e9b99e233a44399f04ff26eb5caf4bbc1
Author: maxv <maxv@NetBSD.org>
Date:   Mon Jan 7 16:30:25 2019 +0000

    Optimize: on single memory operand instructions, take the GPA directly from
    the exit structure provided by the kernel. This saves an MMU translation,
    and sometimes complex address computation (eg SIB).
    
    Drop the GVA field, it is not useful to virtualizers.

commit f7bb016339701f15208edb198c280d8baef8e050
Author: maxv <maxv@NetBSD.org>
Date:   Mon Jan 7 18:13:34 2019 +0000

    Optimize the legpref node: omit BRN (we don't care and it's the same as
    OVR_CS), inline the loops, sort the checks from most to least likely
    prefix, and use a compact structure.

commit 54bdfea1ce230e6c0aa2e785c456d27bd08c22fc
Author: wiz <wiz@NetBSD.org>
Date:   Mon Jan 7 22:17:02 2019 +0000

    Remove leading zero from date.

commit 22f74dfea3707feb5529f4127650ed27a9c52aa3
Author: maxv <maxv@NetBSD.org>
Date:   Tue Jan 8 07:29:46 2019 +0000

    _IOWR -> _IOW

commit c31f2b7a7f314e565fdf7bdf6e143bf974fc47af
Author: maxv <maxv@NetBSD.org>
Date:   Tue Jan 8 07:34:22 2019 +0000

    Handle REPN. FreeBSD has a "repn movs", which is a bit unusual, but doesn't
    seem illegal as far as I can tell from the AMD SDM.
    
    With that, I can boot FreeBSD on Qemu+NVMM.

commit c73cb7aefbfc51da7ee63133ea164a8c908bc21a
Author: maxv <maxv@NetBSD.org>
Date:   Tue Jan 8 14:43:18 2019 +0000

    Optimize: don't keep a full copy of the guest state, rather take only what
    is needed. This avoids expensive memcpy's.
    
    Also flush the V_TPR as part of the CR-state, because there is CR8 in it.

commit fe65b1082f76612238471131fc9c6c66a0a04c8c
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jan 10 06:58:36 2019 +0000

    Optimize:
    
     * Don't save/restore the host CR2, we don't care because we're not in a
       #PF context (and preemption switches already handle CR2 safely).
    
     * Don't save/restore the host FS and GS, just reset them to zero after
       VMRUN. Note: DS and ES must be reset _before_ VMRUN, but that doesn't
       apply to FS and GS.
    
     * Handle FSBASE and KGSBASE outside of the VCPU loop, to avoid the cost
       of saving/restoring them when there's no reason to leave the loop.

commit cf712e1021547cd92b0e6ce990bf11a6abc57259
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jan 13 10:07:50 2019 +0000

    Reset DR7 before loading DR0-3, to prevent a fault if the host process
    has dbregs enabled.

commit 8cb399318926d8aee03f910c1af3c094621de84e
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jan 13 10:43:22 2019 +0000

    Handle more corner cases, clean up a little, and add a set of instructions
    in Group1.

commit e0ff52afb5d1fcb09eaa6f0fb6ecea4b7d8b81ee
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jan 20 16:55:21 2019 +0000

    Improvements in NVMM
    
     * Handle the FPU differently, limit the states via the given mask rather
       than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
       that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
       the virtualizer, to force a reload from memory.
    
     * Hide RDTSCP.
    
     * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.
    
     * Take ECX and not RCX on MSR instructions.

commit 4edf9e08c2d5da7cf40982a7d6a4b75c2bfc4dbb
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jan 24 13:05:59 2019 +0000

    Optimize: change the behavior of the HLT vmexit, make it a "change in vcpu
    state" which occurs after the instruction executed, rather than an
    instruction intercept which occurs before. Disable the shadow and the intr
    window in kernel mode, and advance the RIP, so that the virtualizer doesn't
    have to do it itself. This saves two syscalls and one VMCB cache flush.
    
    Provide npc for other instruction intercepts, in case someone is
    interested.

commit 90b0ce8a8beb73f6c4aec9738ef1f69d2a2195a6
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jan 26 14:44:54 2019 +0000

    Ah, fix bug: when the opcode has an immediate, we fill the src with a
    register storage, but then we overwrite it without zeroing out the highest
    bits of the resulting immediate (which may contain garbage from the union).

commit c034519d8c971b80e248b55eb183b1b47b539dcb
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jan 26 15:12:20 2019 +0000

    Remove nvmm_exit_memory.npc, useless.

commit 7d511c12ce179812b31745508a6bc6ef085ad22b
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jan 26 15:25:51 2019 +0000

    Optimize: keep a per-VCPU buffer for the state, and copy in and out
    directly on it. The VCPUs are protected by mutexes, so nothing to worry
    about.
    
    This saves two kmem_allocs in {get,set}state.

commit 6b34ef66c5bc0e04cdfce2b6f36cedc083cb803b
Author: pgoyette <pgoyette@NetBSD.org>
Date:   Sun Jan 27 02:08:33 2019 +0000

    Merge the [pgoyette-compat] branch

commit cdb31846f88aa75d95305ec0042b31178809f208
Author: maxv <maxv@NetBSD.org>
Date:   Fri Feb 1 06:49:58 2019 +0000

    Fix two issues:
    
     * Uh I put the wrong masks in some GPRs, fuck.
    
     * When the opsize of MOVZX is 4, we need to combine the zero-extend from
       the instruction with the natural zero-extend of long mode.
    
    Add two associated tests.

commit ef67fd308549ea0e74a403dfbf03c5c34647c610
Author: maxv <maxv@NetBSD.org>
Date:   Mon Feb 4 12:11:18 2019 +0000

    Improvements:
    
     - Guest reads/writes to PAT land in gPAT, so no need to emulate them.
    
     - When emulating EFER, don't advance the RIP if a fault occurs, and don't
       forget to flush the VMCB cache accordingly.

commit a1a558cbc8b8801712536119e62333865a1325cb
Author: maxv <maxv@NetBSD.org>
Date:   Tue Feb 5 13:56:32 2019 +0000

    Sync with reality, and improve.

commit 1a85186570917f36ecef73c076a66a49f6745c1e
Author: wiz <wiz@NetBSD.org>
Date:   Tue Feb 5 15:03:35 2019 +0000

    Mark up NULL with Dv. Remove empty line.

commit 1f9a1ec47be5f51d6cb858092d628f7d7071a438
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 7 10:58:45 2019 +0000

    Improvements:
    
     - Emulate the instructions by executing them directly on the host CPU.
       This is easier and probably faster than doing it in software
       manually.
    
     - Decode SUB from Primary, CMP from Group1, TEST from Group3, and add
       associated tests.
    
     - Handle correctly the cases where an instruction that always implicitly
       reads the register operand is executed with the mem operand as source
       (eg: "orq (%rbx),%rax").
    
     - Fix the MMU handling of 32bit-PAE. Under PAE CR3 is not page-aligned,
       so there are extra bits that are valid.
    
    With these changes in place I can boot Windows XP on Qemu+NVMM.

commit 00dbf654738e05140e158c4145e06cc50cf47ac9
Author: christos <christos@NetBSD.org>
Date:   Sun Feb 10 19:30:28 2019 +0000

    #### is not legal.

commit 0c66f932a362a9c62828a8dab4ea30e3f0cefc98
Author: maxv <maxv@NetBSD.org>
Date:   Mon Feb 11 07:07:37 2019 +0000

    Increase the max guest ram from 4GB to 128GB.

commit 43dbe2c4ed79a2aaa0b47688b2b078404598b463
Author: maxv <maxv@NetBSD.org>
Date:   Tue Feb 12 14:50:21 2019 +0000

    Optimize: fetch only 5 bytes instead of 15, the instruction can have only
    up to five prefixes.

commit 0196af49ab249be76c52d1ae4e6dd63fd820c186
Author: maxv <maxv@NetBSD.org>
Date:   Tue Feb 12 14:54:59 2019 +0000

    Optimize: the hardware does not clear the TLB flush command after a
    VMENTRY, so clear it ourselves, to avoid uselessly flushing the guest
    TLB. While here also fix the processing of EFER-induced flushes, they
    shouldn't be delayed.

commit 008a833faf902dcb502f8ae4c6bf368fa3c54654
Author: maxv <maxv@NetBSD.org>
Date:   Wed Feb 13 06:32:45 2019 +0000

    Reorder the GPRs to match the CPU encoding, simplifies things on Intel.

commit e7b70c162b837efb5f76fd989aa58ffafbc36868
Author: maxv <maxv@NetBSD.org>
Date:   Wed Feb 13 07:04:12 2019 +0000

    Micro optimization: the STAR/LSTAR/CSTAR/SFMASK MSRs are static, so rather
    than saving them on each VMENTRY, save them only once, at VCPU creation
    time.

commit 098836b92dccbd757fe736cc3c8e5bfc67c8d3a1
Author: maxv <maxv@NetBSD.org>
Date:   Wed Feb 13 10:55:13 2019 +0000

    Drop support for software interrupts. I had initially added that to cover
    the three event types available on AMD, but Intel has seven of them, all
    with weird and twisted meanings, and they require extra parameters.
    
    Software interrupts should not be used anyway.

commit 2829a4737b63754a5375b25d9d3f4dd56448f388
Author: maxv <maxv@NetBSD.org>
Date:   Wed Feb 13 16:03:16 2019 +0000

    Add Intel-VMX support in NVMM. This allows us to run hardware-accelerated
    VMs on Intel CPUs. Overall this implementation is fast and reliable, I am
    able to run NetBSD VMs with many VCPUs on a quad-core Intel i5.
    
    NVMM-Intel applies several optimizations already present in NVMM-AMD, and
    has a code structure similar to it. No change was needed in the NVMM MI
    frontend, or in libnvmm.
    
    Some differences exist against AMD:
    
     - On Intel the ASID space is big, so we don't fall back to a shared ASID
       when there are more VCPUs executing than available ASIDs in the host,
       contrary to AMD. There are enough ASIDs for the maximum number of VCPUs
       supported by NVMM.
    
     - On Intel there are two TLBs we need to take care of, one for the host
       (EPT) and one for the guest (VPID). Changes in EPT paging flush the
       host TLB, changes to the guest mode flush the guest TLB.
    
     - On Intel there is no easy way to set/fetch the VTPR, so we intercept
       reads/writes to CR8 and maintain a software TPR, that we give to the
       virtualizer as if it was the effective TPR in the guest.
    
     - On Intel, because of SVS, the host CR4 and LSTAR are not static, so
       we're forced to save them on each VMENTRY.
    
     - There is extra Intel weirdness we need to take care of, for example the
       reserved bits in CR0 and CR4 when accesses trap.
    
    While this implementation is functional and can already run many OSes, we
    likely have a problem on 32bit-PAE guests, because they require special
    care on Intel CPUs, and currently we don't handle that correctly; such
    guests may misbehave for now (without altering the host stability). I
    expect to fix that soon.

commit 9eb6ddd38e3562dec16e1271f08147ff965482c4
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 14 09:37:31 2019 +0000

    On AMD, the segments have a simple "present" bit. On Intel however there
    is an extra "unusable" bit, which has a twisted meaning. We can't just
    ignore this bit, because when unset, the CPU performs extra checks on the
    other attributes, which may cause VMENTRY to fail and the guest to be
    killed.
    
    Typically, on Qemu, some guests like Windows XP trigger two consecutive
    getstate+setstate calls, and while processing them, we end up wrongfully
    removing the "unusable" bits that were previously set.
    
    Fix that by forcing "unusable = !present". Each hypervisor I could check
    does something different, but this seems to be the least problematic
    solution for now.
    
    While here, the fields of vmx_guest_segs are VMX indexes, so they should
    be uint64_t (no functional change).

commit 83a61044252c6b126e7d3210ea15dc56dae6b2e8
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 14 14:30:20 2019 +0000

    Harmonize the handling of the CPL between AMD and Intel.
    
    AMD has a separate guest CPL field, because on AMD, the SYSCALL/SYSRET
    instructions do not force SS.DPL to predefined values. On Intel they do,
    so the CPL on Intel is just the guest's SS.DPL value.
    
    Even though technically possible on AMD, there is no sane reason for a
    guest kernel to set a non-three SS.DPL, doing that would mess up several
    common segmentation practices and wouldn't be compatible with Intel.
    
    So, force the Intel behavior on AMD, by always setting SS.DPL<=>CPL.
    Remove the now unused CPL field from nvmm_x64_state::misc[]. This actually
    increases performance on AMD: to detect interrupt windows the virtualizer
    has to modify some fields of misc[], and because CPL was there, we had to
    flush the SEG set of the VMCB cache. Now there is no flush necessary.
    
    While here remove the CPL check for XSETBV on Intel, contrary to AMD
    Intel checks the CPL before the intercept, so if we receive an XSETBV
    VMEXIT, we are certain that it was executed at CPL=0 in the guest. By the
    way my check was wrong in the first place, it was reading SS.RPL instead
    of SS.DPL.

commit 5f77213fd2f4c5b42649f18aa9f0c4c0a99870e6
Author: maxv <maxv@NetBSD.org>
Date:   Fri Feb 15 13:17:05 2019 +0000

    Initialize the guest TSC to zero at VCPU creation time, and handle guest
    writes to MSR_TSC at run time.
    
    This is imprecise, because the hardware does not provide a way to preserve
    the TSC during #VMEXITs, but that's fine enough.

commit 40c6faeb0023754bcb8de68dfad4e9aaf7e9786a
Author: maxv <maxv@NetBSD.org>
Date:   Fri Feb 15 16:42:27 2019 +0000

    Remove the PSE check in the 32bit-PAE MMU. Setting CR4.PAE automatically
    enables PSE regardless of whether CR4.PSE is set or not, so we should just
    ignore it.
    
    With this in place I can boot Windows 8.1 on NVMM.

commit 60a2d4b8fb52e73be5f5c69b46c2d323ab33a930
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 16 12:05:30 2019 +0000

    Handle MSR_MISC_ENABLE on NVMM-Intel (Intel-specific).

commit 613287c8d80e0ae608c972974949f8297a15a1c5
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 16 12:40:31 2019 +0000

    Improve the FPU detection: hide XSAVES because we're not allowing it, and
    don't set CPUID2_OSXSAVE if the guest didn't first set CR4_OSXSAVE.
    
    With these changes in place, I can boot Windows 10 on NVMM.

commit bdf95c00c81f94ebe2b7638cfbd3378cadcb0062
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 16 12:58:13 2019 +0000

    Ah no, adapt previous, on AMD RAX is in the VMCB.

commit b360319c237c85e717ab9e13d6b74c1ea9074404
Author: maxv <maxv@NetBSD.org>
Date:   Sun Feb 17 20:25:46 2019 +0000

    Fix handling of SIB instructions. We were jumping to the SIB node _before_
    fetching the displacement, so the node would always think there was no
    displacement.
    
    This didn't alter the final GPA we would be touching - because it is
    fetched from the kernel directly and not from the computation -, but it
    altered the instruction length, and on some guests (like Fedora 64bit),
    the VCPU would resume execution at the wrong RIP and crash.
    
    Now these guests work.

commit 042f8e32f3f132fe4cd182325724da3b7ca055dc
Author: maxv <maxv@NetBSD.org>
Date:   Mon Feb 18 12:17:45 2019 +0000

    Ah, finally found you. Fix scheduling bug in NVMM.
    
    When processing guest page faults, we were calling uvm_fault with
    preemption disabled. The thing is, uvm_fault may block, and if it does,
    we land in sleepq_block which calls mi_switch; so we get switched away
    while we explicitly asked not to be. From then on things could go really
    wrong.
    
    Fix that by processing such faults in MI, where we have preemption enabled
    and are allowed to block.
    
    A KASSERT in sleepq_block (or before) would have helped.

commit 409f23287dab9681e2948875bcbb1061a2009467
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 21 11:58:04 2019 +0000

    Clarify the gTLB code a little.

commit d6f8093fcd631fbae0ce8ae04447ae1492d92992
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 21 12:17:52 2019 +0000

    Another locking issue in NVMM: the {svm,vmx}_tlb_flush functions take VCPU
    mutexes which can sleep, but their context does not allow it.
    
    Rewrite the TLB handling code to fix that. It becomes a bit complex. In
    short, we use a per-VM generation number, which we increase on each TLB
    flush, before sending a broadcast IPI to everybody. The IPIs cause a
    #VMEXIT of each VCPU, and each VCPU Loop will synchronize the per-VM gen
    with a per-VCPU copy, and apply the flushes as needed, lazily.
    
    The behavior differs between AMD and Intel; in short, on Intel we don't
    flush the hTLB (EPT cache) if a context switch of a VCPU occurs, so now,
    we need to maintain a kcpuset to know which VCPU's hTLBs are active on
    which hCPU. This creates some redundancy on Intel, ie there are cases
    where we flush the hTLB several times unnecessarily; but hTLB flushes are
    very rare, so there is no real performance regression.
    
    The thing is lock-less and non-blocking, so it solves our problem.
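
The generation-number scheme described above can be sketched in user-space C11. This is an illustration under stated assumptions, not the NVMM code: `struct vm`, `struct vcpu` and both functions are hypothetical, the IPI is only indicated by a comment, and a counter stands in for the actual hardware flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VM state: one generation number, bumped on each flush. */
struct vm {
	_Atomic uint64_t tlb_gen;
};

/* Hypothetical per-VCPU state: the last generation this VCPU has applied. */
struct vcpu {
	struct vm *vm;
	uint64_t seen_gen;
	unsigned flushes;	/* stand-in for real TLB flushes */
};

/*
 * Request a flush for every VCPU of the VM. Takes no lock and never
 * sleeps. A real implementation would then force the VCPUs out of guest
 * mode (the commit uses a broadcast IPI) so that they resynchronize.
 */
void
vm_request_tlb_flush(struct vm *vm)
{
	atomic_fetch_add(&vm->tlb_gen, 1);
}

/*
 * Called from the VCPU loop before re-entering the guest. Returns true
 * if a flush was applied. Several requests coalesce into one flush.
 */
bool
vcpu_sync_tlb(struct vcpu *v)
{
	uint64_t gen;

	assert(v->vm != NULL);

	gen = atomic_load(&v->vm->tlb_gen);
	if (gen == v->seen_gen)
		return false;

	v->seen_gen = gen;
	v->flushes++;
	return true;
}
```

The requester only increments a counter, so it can run in a context that is not allowed to sleep; each VCPU pays for the flush lazily, on its own next pass through the loop.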

commit 875518e2abe1e083fc8caacc74a0731d7bdd944d
Author: maxv <maxv@NetBSD.org>
Date:   Thu Feb 21 13:25:44 2019 +0000

    Reorder the detection in vmx_ident(), to fix panic on old CPUs. We must
    read MSR_IA32_VMX_EPT_VPID_CAP _after_ ensuring EPT is there, because if
    it's not, the rdmsr faults.

commit d3528af0854366fe84a0e7a20921a87bbd0e725d
Author: maxv <maxv@NetBSD.org>
Date:   Fri Feb 22 12:24:34 2019 +0000

    Fix omission: if we receive a guest trap on CR0, and if the original
    instruction would have resulted in Long Mode being enabled, we need to
    manually enable Long Mode ourselves. We were already doing that correctly
    in setstate, but not in the CR0 trap handler.
    
    Problem initially reported by Aymeric Vincent; ArchLinux wouldn't boot,
    now it does and works correctly.
    
    While here, add CR0_ET in the CR0 mask, for the associated shadow to
    be taken into account. Normally this shadow bit shouldn't be necessary,
    but for now I keep it regardless.

commit 1712eeb2645b4a221705e694b0a61ec3f9fc8830
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 23 08:19:16 2019 +0000

    Reorder the functions, and constify setstate. No functional change.

commit bce11997382ba886f8576eb6df01582b9747bd89
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 23 10:43:36 2019 +0000

    Add support for CPUs that don't have the EPT_{A,D} bits.
    
    On such CPUs, these bits are ignored by the hardware. We don't care about
    setting them, however, we must always assume they are set. Modify the pmap
    code to do that.
    
    While here, in pmap_ept_remove_pte, don't flush the TLB when it's not
    needed.
    
    Tested on an old Intel Celeron.

commit 3f7ba5d607892caff67ede7857c13f6b4659d0fc
Author: maxv <maxv@NetBSD.org>
Date:   Sat Feb 23 12:27:00 2019 +0000

    Install the x86 RESET state at VCPU creation time, for convenience, so
    that the libnvmm users can expect a functional VCPU right away.

commit f35533a8094ee9d3fc61401beaf9028ab5d6f1f6
Author: maxv <maxv@NetBSD.org>
Date:   Tue Feb 26 10:18:39 2019 +0000

    Set hardseg to -1 rather than 0, because 0 can be a valid segment.

commit a7cdcccb8494989cb2e0a70cb06e5905b8a0ce28
Author: maxv <maxv@NetBSD.org>
Date:   Tue Feb 26 12:23:12 2019 +0000

    Change the layout of the SEG state:
    
     - Reorder it, to match the CPU encoding. This is the universal order,
       also used by Qemu. Drop the seg_to_nvmm[] tables.
    
     - Compress it. This divides its size by two.
    
     - Rename some of its fields, to better match the x86 spec. Also, take S
       out of Type, this was a NetBSD-ism that was likely confusing to other
       people.

commit ad584c20185c433b9a7d3abb49a27a6a145f2f40
Author: maxv <maxv@NetBSD.org>
Date:   Sun Mar 3 07:01:09 2019 +0000

    Choose which CPUID bits to allow, rather than which bits to disallow. This
    is clearer, and also forward compatible with future CPUs.
    
    While here be more consistent when allowing the bits, and sync between
    nvmm-amd and nvmm-intel. Also make sure to disallow AVX, because the guest
    state we provide is only x86+SSE. Fixes a CentOS panic when booting on
    NVMM, reported by Jared McNeill, thanks.
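
    A minimal sketch of the allow-list idea, not the NVMM code itself: the
    host value is ANDed with a mask of explicitly allowed bits, so AVX and
    any bit a future CPU introduces stay hidden unless someone opts in. The
    EX_* names, the function name and the choice of allowed bits are
    assumptions made for illustration; the bit positions are the
    architectural ones for CPUID leaf 0x01, ECX.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* CPUID leaf 0x01, ECX feature bits (architectural positions). */
    #define EX_ECX_SSE3   (UINT32_C(1) << 0)
    #define EX_ECX_SSSE3  (UINT32_C(1) << 9)
    #define EX_ECX_SSE41  (UINT32_C(1) << 19)
    #define EX_ECX_SSE42  (UINT32_C(1) << 20)
    #define EX_ECX_AVX    (UINT32_C(1) << 28)

    /* Hypothetical allow-list: only these bits may reach the guest. */
    #define EX_GUEST_ALLOWED_ECX \
        (EX_ECX_SSE3 | EX_ECX_SSSE3 | EX_ECX_SSE41 | EX_ECX_SSE42)

    /* The guest state is x86+SSE only, so AVX must never be allowed. */
    static_assert((EX_GUEST_ALLOWED_ECX & EX_ECX_AVX) == 0,
        "AVX must stay hidden from the guest");

    /* Keep only the allowed bits; everything else is dropped. */
    static uint32_t
    ex_guest_cpuid_ecx(uint32_t host_ecx)
    {
        return host_ecx & EX_GUEST_ALLOWED_ECX;
    }
    ```

    With a deny-list the same function would be host_ecx & ~denied, which
    silently exposes every bit nobody thought to deny.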

commit 4b725333a19d4295e7116a66e64600bf8c430d0d
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 7 15:06:37 2019 +0000

    Parse EXC_NMI on nvmm-intel, and don't return NVMM_EXIT_INVALID if we
    received a host NMI, otherwise the guest could get killed if an NMI comes
    in, typically when the host runs tprof at the same time.
    
    Already handled on nvmm-amd.

commit 29f0099ec30e94009fcc2cbf75cfb803e8b04cef
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 7 15:22:21 2019 +0000

    Rename the internal NVMM HVA table entries from "segment" to "hmapping",
    less confusing. Also fix the error handling in nvmm_hva_unmap().

commit 824d33230d413d9a96d0392a69131c56377a8954
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 7 15:47:34 2019 +0000

    Micro optimizations:
    
     - Compress x86_rexpref, x86_regmodrm, x86_opcode and x86_instr.
     - Cache-align the register, opcode and group tables.
     - Modify the opcode tables to have 256 entries, and avoid a lookup.

commit 3c093f6fb3e30cb7eaf4a43797824a8d4702895f
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 14 19:10:27 2019 +0000

    Fail early if we're beyond the guest max ram.

commit 78aa41a956f0454eb0b8003fab03bfd139f9f84a
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 14 19:15:26 2019 +0000

    Reduce the mask of the VTPR, only the first four bits matter.

commit 7c63f593d73c2584a06f794e05bff5c594be25a4
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 14 19:26:44 2019 +0000

    Move a KASSERT, applies to all branches.

commit b93efd52e70760b40fadce6293a47d26b3a560c0
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 14 20:29:53 2019 +0000

    Optimize NVMM-Intel: keep the VMCS active on the host CPU, and lazy-switch
    it on demand only when needed. This allows the CPU to use the cached
    version of the guest state, rather than the in-memory copy of it. This is
    much more performant.
    
    A VMCS must be active on only one CPU, but one CPU can have several active
    VMCSs at the same time.
    
    We keep track of which CPU each VMCS is active on. When we want to execute
    a VCPU, we determine whether its VMCS is loaded on another CPU, and if so
    send an IPI to ask it to unbusy that VMCS. In most cases the VMCS is
    already active on the current CPU, so we don't have to do anything and can
    proceed with a fast VMRESUME.
    
    We send IPIs with kpreemption enabled but with a bound LWP, because we
    don't want to get context-switched to the CPU we just sent an IPI to.
    
    Overall, with this in place, I see a ~15% performance increase in the
    guests on NVMM-Intel.
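
    The bookkeeping described above can be sketched as follows. This is an
    illustration under stated assumptions, not the nvmm-intel code: the
    structure, the function names and the IPI stand-in are hypothetical,
    and the real VMPTRLD/VMCLEAR/VMLAUNCH/VMRESUME instructions are only
    modelled. The one hardware fact relied on is that VMCLEAR resets the
    launch state, so the next entry must be a VMLAUNCH.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define EX_CPU_NONE (-1)

    /* Hypothetical per-VCPU tracking for one VMCS. */
    struct ex_vmcs_track {
        int  active_cpu;  /* host CPU the VMCS is active on, or EX_CPU_NONE */
        bool launched;    /* true once launched since the last VMCLEAR */
    };

    enum ex_vmentry { EX_VMENTRY_LAUNCH, EX_VMENTRY_RESUME };

    static unsigned int ex_nipis;  /* counts simulated IPIs */

    /* Stand-in for the IPI asking the remote CPU to VMCLEAR this VMCS. */
    static void
    ex_ipi_vmclear(struct ex_vmcs_track *v)
    {
        ex_nipis++;
        v->active_cpu = EX_CPU_NONE;
        v->launched = false;  /* VMCLEAR resets the launch state */
    }

    /* Make the VMCS active on curcpu and pick the entry instruction. */
    static enum ex_vmentry
    ex_vmcs_prepare(struct ex_vmcs_track *v, int curcpu)
    {
        assert(curcpu >= 0);
        if (v->active_cpu != curcpu) {
            if (v->active_cpu != EX_CPU_NONE)
                ex_ipi_vmclear(v);    /* slow path: migrate the VMCS */
            v->active_cpu = curcpu;   /* models VMPTRLD on this CPU */
        }
        if (v->launched)
            return EX_VMENTRY_RESUME; /* common case: fast VMRESUME */
        v->launched = true;
        return EX_VMENTRY_LAUNCH;
    }
    ```

    Repeated calls on the same CPU never send an IPI; only a migration to
    another CPU pays for one, followed by a single VMLAUNCH.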

commit df7220ac04759dff216e0cbdb1d74429f7ae392e
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 21 20:21:40 2019 +0000

    Make it possible for an emulator to set the protection of the guest pages.
    For some reason I had initially concluded that it wasn't doable; verily it
    is, so let's do it.
    
    The reserved 'flags' argument of nvmm_gpa_map() becomes 'prot' and takes
    mmap-like protection codes.

commit 0895cc3a398fca7be974f1ea435e45ad6b992145
Author: maxv <maxv@NetBSD.org>
Date:   Thu Mar 28 19:00:40 2019 +0000

    Move NVMM in the "any" class, so that it can be enabled in GENERIC. Add
    missing files in files.nvmm, and add NVMM (commented out) in the amd64
    GENERIC. Remove the "caveats" section in the man page.

commit fe5d0e57b74a2dc1dccbd4262cbf1b3ef26b14c3
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 3 17:32:58 2019 +0000

    Add MSR_TSC.

commit 142dd7189d5502fae113cded5d2b5952b714fbc3
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 3 18:05:55 2019 +0000

    Add new VMCS bits.

commit d4ea4f24d0f171e44ac50f48f055daade6a2d6e0
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 3 19:10:58 2019 +0000

    VMX: if PAT is not valid, #GP on WRMSR, rather than crashing the guest.

commit cee0a1fca1498ab9c3de45bd069b06989d6be71b
Author: maxv <maxv@NetBSD.org>
Date:   Thu Apr 4 17:33:47 2019 +0000

    Check the GPA permissions too in the Assists, because it is possible that
    the guest traps on a page the virtualizer marked as read-only (even if it
    appears as read-write in the HVA).
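
    A minimal sketch of the GPA half of such a check, assuming the guest
    page protection is stored as the standard PROT_* bits from
    <sys/mman.h>. The function name is hypothetical and this is not the
    libnvmm implementation.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

    /*
     * Return true if the access the Assist is about to emulate is
     * permitted by the protection the emulator set on the guest page.
     */
    static bool
    ex_gpa_access_ok(int gpa_prot, bool is_write)
    {
        int need = is_write ? PROT_WRITE : PROT_READ;

        /* Only the three mmap-like bits are expected here. */
        assert((gpa_prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
        return (gpa_prot & need) != 0;
    }
    ```

    A page mapped read-write in the HVA but PROT_READ in the GPA would
    fail this check for a write, which is the case the commit describes.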

commit 2d7d00f32f8289cc7ae72e1790add5e61c84b17a
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 6 11:49:53 2019 +0000

    Replace the misc[] state by a new compressed nvmm_x64_state_intr structure,
    which describes the interruptibility state of the guest.
    
    Add evt_pending, read-only, that allows the virtualizer to know if an event
    is pending.

commit dbae0ceee4ed07f63929343fe58b428d04b94563
Author: maxv <maxv@NetBSD.org>
Date:   Sun Apr 7 14:05:15 2019 +0000

    Don't allow unloading when there are still VMs registered, and don't allow
    auto-unloading at all. Not a big problem actually, because since I changed
    the module class it's not auto-loadable anymore.

commit 8f258a3edcb2d2d522f3090bba8c022fb41eb37a
Author: maxv <maxv@NetBSD.org>
Date:   Sun Apr 7 14:13:03 2019 +0000

    Sync, and fix grammar.

commit c5ddaabe68e9be361363faf215be0ef87108ed81
Author: maxv <maxv@NetBSD.org>
Date:   Sun Apr 7 14:28:50 2019 +0000

    Invert the filtering priority: now the kernel-managed cpuid leaves are
    overwritable by the virtualizer. This is useful to virtualizers that want
    to 100% control every leaf.

commit 4077ef59599375eeed48ac452d63d440583af5a9
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 8 18:21:42 2019 +0000

    Use the fd_clone approach, to avoid losing references to the registered
    VMs during fork(). We attach an nvmm_owner struct to the fd, reference it
    in each VM, and identify the process' VMs by just comparing the pointer.

commit 03a3a4384fc5891c12b53dce75f0493e6a093b84
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 8 18:23:46 2019 +0000

    Don't forget to call (*machine_destroy) when killing VMs.

commit 80bc33b376c379a8434db4527fdb5d1bd7b803e0
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 8 18:30:54 2019 +0000

    Switch to MODULE_CLASS_MISC, from pgoyette@.

commit 71d0393c414b5bd8f6682493422c2eb34d0a4d57
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 10 18:49:04 2019 +0000

    Add the NVMM_CTL ioctl, always privileged regardless of the permissions of
    /dev/nvmm. We'll use it to provide a way for an admin to control the
    registered VMs in the kernel.
    
    Add an associated wrapper in libnvmm.

commit 2b64ba44dd6880f19807eee36a4dbd534a8d49b9
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 20 08:45:30 2019 +0000

    Ah, take XSAVE into account in ECX too, not just in EBX. Otherwise if the
    guest relies only on ECX to initialize/copy the FPU state (like NetBSD
    does), spurious #GPs can be encountered because the bitmap is clobbered.

commit 5752ed64e4a0572edcb1573eff2de53bcd5e7413
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 24 18:19:28 2019 +0000

    Provide the hardware error code for NVMM_EXIT_INVALID, useful when
    debugging.

commit 6e083df1253a316ead4b31c755b9be92dc1b8ad9
Author: maxv <maxv@NetBSD.org>
Date:   Wed Apr 24 18:45:15 2019 +0000

    Match the structure order, for better cache utilization.

commit 005f3c8b3a55c052dd3915faab8b403f46f4858f
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 27 08:16:19 2019 +0000

    Optimize nvmm-intel, use inlined GCC assembly rather than function calls.

commit 5e829d46acdcee0321e4e572bb68f62c664bb94b
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 27 09:06:18 2019 +0000

    If guest events were being processed when a #VMEXIT occurred, reschedule
    the events rather than dismissing them. This can happen for instance when a
    guest wants to process an exception and an #NPF occurs on the guest IDT. In
    practice it occurs only when the host swapped out specific guest pages.

commit 476039c9961b8b37a4d97b94c56da14e713693f9
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 27 15:45:21 2019 +0000

    Reorder the NVMM headers, to make a clear(er) distinction between MI and
    MD. Also use #defines for the exit reasons rather than a union. No ABI
    change, and no API change except 'cap->u.{}' renamed to 'cap->arch'.

commit 7fd08024e558cd8d5f30167e7232aefb8588350c
Author: maxv <maxv@NetBSD.org>
Date:   Sat Apr 27 17:30:38 2019 +0000

    Mmh, fix nvmm_vcpu_create(), the cpuid is given, and must not be chosen
    from the free map. Looks like I forgot this after all my design rounds.
    While here reorder the initialization.

commit 45e2a27ec0c25d5be79280294f4d27e2ab9e02d3
Author: maxv <maxv@NetBSD.org>
Date:   Sun Apr 28 14:22:13 2019 +0000

    Modify the communication layer between the kernel NVMM driver and libnvmm:
    introduce a bidirectional "comm page", a page of memory shared between
    the kernel and userland, and used to transfer data in and out in a more
    performant manner than ioctls.
    
    The comm page contains the VCPU state, plus three flags:
    
     - "wanted": the states the kernel must get/set when requested via ioctls
     - "cached": the states that are in the comm page
     - "commit": the states the kernel must set in vcpu_run
    
    The idea is to avoid performing expensive syscalls, by using the VCPU
    state cached, either explicitly or speculatively, in the comm page. For
    example, if the state is cached we do a direct 1->5 with no syscall:
    
              +---------------------------------------------+
              |                    Qemu                     |
              +---------------------------------------------+
                   |                                   ^
                   | (0) nvmm_vcpu_getstate            | (6) Done
                   |                                   |
                   V                                   |
                 +---------------------------------------+
                 |                libnvmm                |
                 +---------------------------------------+
                      |   ^          |               ^
            (1) State |   | (2) No   | (3) Ioctl:    | (5) Ok, state
            cached?   |   |          | "please cache | fetched
                      |   |          |  the state"   |
                      V   |          |               |
                  +-----------+      |               |
                  | Comm Page |------+---------------+
                  +-----------+      |
                           ^         |
              (4) "Alright |         V
                   babe"   |     +--------+
                           +-----| Kernel |
                                 +--------+
    
    The main changes in behavior are:
    
     - nvmm_vcpu_getstate(): won't emit a syscall if the state is already
       cached in the comm page, will just fetch from the comm page directly
     - nvmm_vcpu_setstate(): won't emit a syscall at all, will just cache
       the wanted state in the comm page
     - nvmm_vcpu_run(): will commit the to-be-set state in the comm page,
       as previously requested by nvmm_vcpu_setstate()
    
    In addition to this, the kernel NVMM driver is changed to speculatively
    cache certain states known to be of interest, so that future
    nvmm_vcpu_getstate() calls made by libnvmm or the emulator use
    the comm page rather than expensive syscalls. For example, if an I/O
    VMEXIT occurs, the I/O Assist in libnvmm will want GPRS+SEGS+CRS+MSRS,
    and now the kernel caches all of that in the comm page before returning
    to userland.
    
    Overall, in a normal run of Windows 10, this saves several million
    syscalls. E.g., on a 4-CPU Intel host with 4 VCPUs, booting the Win10
    install ISO goes from taking 1min35 to taking 1min16.
    
    The libnvmm API is not changed, but the ABI is. If we changed the API it
    would be possible to save expensive memcpys on libnvmm's side; these will
    be avoided in a future version. The comm page can also be extended to
    implement future services.
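
    The three flags and the changed behaviors can be sketched as follows.
    This is a simplified model, not the libnvmm or kernel code: the
    structure layout, the EX_ST_* state bits and all function names are
    hypothetical, the ioctl is replaced by a counter, and the assumption
    that running the guest invalidates the cached states is mine.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical state-group bits. */
    #define EX_ST_GPRS  (UINT64_C(1) << 0)
    #define EX_ST_SEGS  (UINT64_C(1) << 1)
    #define EX_ST_ALL   (EX_ST_GPRS | EX_ST_SEGS)

    /* Hypothetical comm page holding the three flags named above. */
    struct ex_comm_page {
        uint64_t wanted;  /* states the kernel must get/set on ioctl */
        uint64_t cached;  /* states currently valid in the comm page */
        uint64_t commit;  /* states the kernel must set in vcpu_run */
    };

    static unsigned int ex_nsyscalls;  /* counts simulated ioctls */

    /* Stand-in for the "please cache the state" ioctl (steps 3 and 4). */
    static void
    ex_kernel_getstate(struct ex_comm_page *cp)
    {
        ex_nsyscalls++;
        cp->cached |= cp->wanted;
    }

    /* getstate: enter the kernel only for states missing from the page. */
    static void
    ex_getstate(struct ex_comm_page *cp, uint64_t flags)
    {
        assert((flags & ~EX_ST_ALL) == 0);
        if ((cp->cached & flags) == flags)
            return;  /* direct 1->5, no syscall */
        cp->wanted = flags & ~cp->cached;
        ex_kernel_getstate(cp);
    }

    /* setstate: no syscall, the new state is cached and marked to-commit. */
    static void
    ex_setstate(struct ex_comm_page *cp, uint64_t flags)
    {
        assert((flags & ~EX_ST_ALL) == 0);
        cp->cached |= flags;
        cp->commit |= flags;
    }

    /* vcpu_run: the kernel consumes the to-be-set states, then runs. */
    static void
    ex_vcpu_run(struct ex_comm_page *cp)
    {
        cp->commit = 0;
        cp->cached = 0;  /* assumed: guest execution invalidates the cache */
    }
    ```

    In this model a getstate that follows a setstate of the same states
    costs nothing, and only the first getstate after a run enters the
    kernel.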

commit 604a261057e604c2616f4404e5b9f06f3e18a618
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 29 17:27:57 2019 +0000

    Remove useless calls to nvmm_init().

commit ed9fe4d15fc1177ea499bd2708f7dc6965686646
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 29 18:54:25 2019 +0000

    Stop taking care of the INT/NMI windows in the kernel, the emulator is
    supposed to do that itself.

commit 9b9b8400e517955de0a7298d2d980b618e657f42
Author: maxv <maxv@NetBSD.org>
Date:   Mon Apr 29 19:03:17 2019 +0000

    sync with reality

commit 6087bfac2af9811302540232697de9be85ac714c
Author: maxv <maxv@NetBSD.org>
Date:   Wed May 1 09:20:21 2019 +0000

    Use the comm page to inject events, rather than ioctls, and commit them in
    vcpu_run. This saves a few syscalls and copyins.
    
    For example on Windows 10, moving the mouse from the left to right sides of
    the screen generates ~500 events, which now don't result in syscalls.
    
    The error handling is done in vcpu_run and it is less precise, but this
    doesn't matter a lot, and will be solved with future NVMM error codes.

commit 08b6253b720d3b8f53c9ab0c3ed78e48dfb117d8
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 11 07:31:56 2019 +0000

    Rework the machine configuration interface.
    
    Provide three ranges in the conf space: <libnvmm:0-100>, <MI:100-200> and
    <MD:200-...>. Remove nvmm_callbacks_register(), and replace it by the conf
    op NVMM_MACH_CONF_CALLBACKS, handled by libnvmm. The callbacks are now
    per-machine, and the emulators should now do:
    
    -       nvmm_callbacks_register(&cbs);
    +       nvmm_machine_configure(&mach, NVMM_MACH_CONF_CALLBACKS, &cbs);
    
    This provides more granularity, for example if the process runs two VMs
    and wants different callbacks for each.

commit b0c3d26d9a3ac415670cce95549ae9a3d868a46d
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 11 07:40:38 2019 +0000

    Sync with reality.

commit 8d06476a4af78e72e2840470daac171f98e30abc
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 11 07:44:00 2019 +0000

    Replace "VMM" by "emulator", clearer.

commit e66fce09b8b94eb5624b1f46bd6a75f17c24c076
Author: maxv <maxv@NetBSD.org>
Date:   Wed May 15 04:39:52 2019 +0000

    NVMM: Expose MD_CLEAR to the guests.

commit 54ebea61e129501586cb6328d90159ccc3b04162
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 18 08:55:59 2019 +0000

    Now that SVS cannot be disabled at run time, MSR_LSTAR is static, so no
    need to save it on each VM enter.

commit 2d39ebf166acd9d66525b7758ae0391f7a5e5ea4
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jun 8 07:27:44 2019 +0000

    Change the NVMM API to reduce data movements. Sent to tech-kern@.

commit 4febafc81e1fa9d1ed55d506625dd3b40e893987
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jun 16 18:30:31 2019 +0000

    Make sure VMX-outside-SMX is allowed. It may not be if the BIOS decided to
    disable VMX. Seen on an HP laptop, where NVMM would panic because of that.

commit 8c134110d67792ba7deb0a095c19b8d04dac6c34
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jul 6 05:13:10 2019 +0000

    Localify two functions that are no longer used outside. Also return the
    error from the *_vcpu_run() functions, now that we commit the states in
    them (which can fail).

commit 6eb4ead5e100ccf72831d9ebaa5257144d0c2b71
Author: maxv <maxv@NetBSD.org>
Date:   Fri Sep 13 14:19:13 2019 +0000

    Always set hwcode on error. Useful for debugging.

commit 46c51c30a5127b69cb986c302da8f65b13f6db5a
Author: maxv <maxv@NetBSD.org>
Date:   Fri Oct 4 12:11:38 2019 +0000

    Add definitions for RDPRU, MCOMMIT, GMET and VTE.

commit 102c55913f4ee6686b9c88b6072e52196d0034fb
Author: maxv <maxv@NetBSD.org>
Date:   Fri Oct 4 12:15:21 2019 +0000

    Fix definition for MWAIT. It should be bit 11, not 12; 12 is the armed
    version.

commit 039e81cf6c8cc3dbbb6afc150097da7c6ec96807
Author: maxv <maxv@NetBSD.org>
Date:   Fri Oct 4 12:17:05 2019 +0000

    Switch to the new PTE naming.

commit 384b2d5c83a21b1987b997b0e0c8b6494c603a3b
Author: maxv <maxv@NetBSD.org>
Date:   Sat Oct 12 06:31:03 2019 +0000

    Rewrite the FPU code on x86. This greatly simplifies the logic and removes
    the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
    port-amd64 a week ago.
    
    Bump the kernel version to 9.99.16.

commit c0ad51c10cea066a0cf5814bb0bc4b30fb9e0a4f
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 13 17:32:15 2019 +0000

    Fix incorrect parsing: the R/M field uses a special GPR map when the
    address size is 16 bits, regardless of the actual operating mode. With
    this special map there can be two registers referenced at once, and
    also disp16-only.
    
    Implement this special behavior, and add associated tests. While here
    simplify a few things.
    
    With this in place, the Windows 95 installer initializes correctly.
    
    Part of PR/54611.

commit c5376a643d326adbd88085db34a271aab978874c
Author: maxv <maxv@NetBSD.org>
Date:   Mon Oct 14 10:39:24 2019 +0000

    Implement XCHG, add associated tests, and add comments to explain. With
    this in place the Windows 95 installer completes successfully.
    
    Part of PR/54611.

commit 64285dbda9d9601f2516b944a1470d1d537d8c2e
Author: maxv <maxv@NetBSD.org>
Date:   Mon Oct 14 10:43:40 2019 +0000

    Improve nvmm_vcpu_dump().

commit 8480c50537edd93bd6d5a2c8013741ded8b60267
Author: maxv <maxv@NetBSD.org>
Date:   Sat Oct 19 19:45:10 2019 +0000

    Put back 'default', because llvm apparently doesn't realize that all cases
    are covered in the switch.

commit 09694d91da1cdf17e6c0a05c186cd7efcfedb16e
Author: maxv <maxv@NetBSD.org>
Date:   Wed Oct 23 07:01:11 2019 +0000

    Miscellaneous changes in NVMM, to address several inconsistencies and
    issues in the libnvmm API.
    
     - Rename NVMM_CAPABILITY_VERSION to NVMM_KERN_VERSION, and check it in
       libnvmm. Introduce NVMM_USER_VERSION, for future use.
    
     - In libnvmm, open "/dev/nvmm" as read-only and with O_CLOEXEC. This is to
       avoid sharing the VMs with the children if the process forks. In the
       NVMM driver, force O_CLOEXEC on open().
    
     - Rename the following things for consistency:
           nvmm_exit*              -> nvmm_vcpu_exit*
           nvmm_event*             -> nvmm_vcpu_event*
           NVMM_EXIT_*             -> NVMM_VCPU_EXIT_*
           NVMM_EVENT_INTERRUPT_HW -> NVMM_VCPU_EVENT_INTR
           NVMM_EVENT_EXCEPTION    -> NVMM_VCPU_EVENT_EXCP
       Delete NVMM_EVENT_INTERRUPT_SW, unused already.
    
     - Slightly reorganize the MI/MD definitions, for internal clarity.
    
     - Split NVMM_VCPU_EXIT_MSR in two: NVMM_VCPU_EXIT_{RD,WR}MSR. Also provide
       separate u.rdmsr and u.wrmsr fields. This is more consistent with the
       other exit reasons.
    
     - Change the types of several variables:
           event.type                  enum -> u_int
           event.vector                uint64_t -> uint8_t
           exit.u.*msr.msr:            uint64_t -> uint32_t
           exit.u.io.type:             enum -> bool
           exit.u.io.seg:              int -> int8_t
           cap.arch.mxcsr_mask:        uint64_t -> uint32_t
           cap.arch.conf_cpuid_maxops: uint64_t -> uint32_t
    
     - Delete NVMM_VCPU_EXIT_MWAIT_COND, it is AMD-only and confusing, and we
       already intercept 'monitor' so it is never armed.
    
     - Introduce vmx_exit_insn() for NVMM-Intel, similar to svm_exit_insn().
       The 'npc' field wasn't getting filled properly during certain VMEXITs.
    
     - Introduce nvmm_vcpu_configure(). Similar to nvmm_machine_configure(),
       but as its name indicates, the configuration is per-VCPU and not per-VM.
       Migrate and rename NVMM_MACH_CONF_X86_CPUID to NVMM_VCPU_CONF_CPUID.
       This becomes per-VCPU, which makes more sense than per-VM.
    
     - Extend the NVMM_VCPU_CONF_CPUID conf to allow triggering VMEXITs on
       specific leaves. Until now we could only mask the leaves. A uint32_t
       is added in the structure:
            uint32_t mask:1;
            uint32_t exit:1;
            uint32_t rsvd:30;
       The first two bits select the desired behavior on the leaf. Specifying
       zero on both resets the leaf to the default behavior. The new
       NVMM_VCPU_EXIT_CPUID exit reason is added.
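
    A minimal sketch of how the two bits could be decoded. Only the
    three-member bitfield comes from the text above; the structure name,
    the enum and the function are hypothetical, and treating "both bits
    set" as invalid is an assumption of this sketch.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* The bitfield from the commit message, in a hypothetical wrapper. */
    struct ex_cpuid_conf_bits {
        uint32_t mask:1;
        uint32_t exit:1;
        uint32_t rsvd:30;
    };

    enum ex_leaf_behavior {
        EX_LEAF_DEFAULT,  /* both bits zero: reset to the default */
        EX_LEAF_MASK,     /* mask the leaf */
        EX_LEAF_EXIT,     /* trigger a VMEXIT on the leaf */
        EX_LEAF_INVALID   /* assumed: the two bits are mutually exclusive */
    };

    static enum ex_leaf_behavior
    ex_leaf_behavior(struct ex_cpuid_conf_bits b)
    {
        assert(b.rsvd == 0);  /* assumed: reserved bits must be zero */
        if (b.mask && b.exit)
            return EX_LEAF_INVALID;
        if (b.mask)
            return EX_LEAF_MASK;
        if (b.exit)
            return EX_LEAF_EXIT;
        return EX_LEAF_DEFAULT;
    }
    ```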

commit 542ebda393d9adc284250bfc57b070e1ea327616
Author: maxv <maxv@NetBSD.org>
Date:   Wed Oct 23 12:02:55 2019 +0000

    Three changes in libnvmm:
    
     - Add 'mach' and 'vcpu' backpointers in the nvmm_io and nvmm_mem
       structures.
    
     - Rename 'nvmm_callbacks' to 'nvmm_assist_callbacks'.
    
     - Rename and migrate NVMM_MACH_CONF_CALLBACKS to NVMM_VCPU_CONF_CALLBACKS,
       it now becomes per-VCPU.

commit 780a287d23b27aeced6ddf1a5f341f254866d86f
Author: maxv <maxv@NetBSD.org>
Date:   Fri Oct 25 09:09:24 2019 +0000

    Update the libnvmm man page:
    
     - Sync the naming with reality.
    
     - Replace "relevant" by "desired" and "virtualizer" by "emulator", closer
       to what I meant.
    
     - Add a "VCPU Configuration" section.
    
     - Add a "Machine Ownership" section.

commit 5a3637508ece1a726e410c1d7f1b08b892875161
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 07:08:15 2019 +0000

    Add the "nvmm" group, and make nvmm_init() public. Sent to tech-kern@ a few
    days ago.

commit fc80e0cce26c2da124035ec9fbad52a9e9826fee
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 08:30:05 2019 +0000

    Use the new PTE naming, and define CR3_FRAME_* separately. No functional
    change.

commit 0b241ea6128f97cd03ba49c6a93ce3ee44424ea6
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 10:28:55 2019 +0000

    Add a new VCPU conf option, that allows userland to request VMEXITs after a
    TPR change. This is supported on all Intel CPUs, and not-too-old AMD CPUs.
    
    The reason for wanting this option is that certain OSes (like Win10 64bit)
    manage interrupt priority in hardware via CR8 directly, and for these OSes,
    the emulator may want to sync its internal TPR state on each change.
    
    Add two new fields in cap.arch, to report the conf capabilities. Report TPR
    only on Intel for now, not AMD, because I don't have a recent AMD CPU on
    which to test.

commit cdd813f0b7d494c4c5b85c7bf45fbc41e498ee2e
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 11:11:09 2019 +0000

    Mask CPUID leaf 0x0A on Intel, because we don't want the guest to try (and
    fail) to probe the PMC MSRs. This avoids "Unexpected WRMSR" warnings in
    qemu-nvmm.

commit 1e2f864ebd6551a533e95db18e49fe52eab4c068
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 18:26:54 2019 +0000

    Add PCID support in the guests. This speeds up most 64bit guests, because
    since Meltdown, everybody uses PCID (including NetBSD).

commit 2b9ed29212620c750d54ef1bd7e660ff8461f2da
Author: maxv <maxv@NetBSD.org>
Date:   Sun Oct 27 20:17:36 2019 +0000

    Change the way root_owner works: consider the calling process as root_owner
    not if it has root privileges, but if the /dev/nvmm device was opened with
    write permissions. Introduce the undocumented nvmm_root_init() function to
    achieve that.
    
    The goal is to simplify the logic and have more granularity, e.g. if we want
    a monitoring agent to access VMs but don't want to give this agent real
    root access on the system.

commit cb9488e548b038f08bae0eec68a312c8ec88a155
Author: maxv <maxv@NetBSD.org>
Date:   Mon Oct 28 08:30:49 2019 +0000

    A few changes:
    
     - Use smaller types in struct nvmm_capability.
     - Use smaller type for nvmm_io.port.
     - Switch exitstate to a compacted structure.

commit b68235391efdb6dc8b21df51c97c4d1c856468ce
Author: maxv <maxv@NetBSD.org>
Date:   Mon Oct 28 09:00:08 2019 +0000

    Add nram in struct nvmm_ctl_mach_info.

commit e7196dfa79dd717397c6838b193c70f29271c78d
Author: wiz <wiz@NetBSD.org>
Date:   Mon Oct 28 13:43:42 2019 +0000

    Macro tidiness.

commit d8120ee2964a92fbe226dba2ea89f9a533e2c730
Author: maxv <maxv@NetBSD.org>
Date:   Mon Oct 28 14:20:28 2019 +0000

    should be fork(2), noticed by wiz

commit 886668d28dbd5508e6f8a425955f75e60f9c72bc
Author: joerg <joerg@NetBSD.org>
Date:   Mon Oct 28 18:12:17 2019 +0000

    Annotate a covering switch as such to avoid warnings about missing
    returns.

commit d885bfddbcf92add5d655500ee2caf5969558b56
Author: maxv <maxv@NetBSD.org>
Date:   Sat Nov 16 17:53:46 2019 +0000

    Don't report MWAITX by default.

commit 1aa7028ed57dfb21ec9b26b76f29f84edb22fc51
Author: maxv <maxv@NetBSD.org>
Date:   Wed Nov 20 10:26:56 2019 +0000

    Hide XSAVES-specific stuff and the masked extended states.

commit c4ad5b81265adc3f726fb48b584a7de88fcf6be4
Author: ad <ad@NetBSD.org>
Date:   Tue Dec 10 18:06:50 2019 +0000

    pg->phys_addr > VM_PAGE_TO_PHYS(pg)

commit 27e27f10d0149b896a359d095c24443932ce13f6
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jan 9 16:20:12 2020 +0000

    Mmh, as noted in PR/54847, this should be uint64_t, not uint16_t. Harmless
    because we use only the two lowest bits anyway.
    
    I believe this could be caught by KUBSAN; time to do another round of
    NVMM+K_SAN testing.

commit 6bee680fa9877a273e637cda752421643c468931
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jan 9 16:27:57 2020 +0000

    Registering the host's CR0 is done outside of the VCPU loop, so it must be
    cleared because it is also cleared inside the loop.
    
    Not clearing it could trigger DNAs on VMEXITs, because STTS/CLTS are still
    here as part of debugging since my FPU overhaul.

commit 3f1abe1b72bbc3498fb7f1dd33fec62645a02f34
Author: maxv <maxv@NetBSD.org>
Date:   Sun Feb 9 12:19:01 2020 +0000

    Reference nvmmctl(8).

commit a5b1ac72e322329e0a3ffdcc0e98eae95f2e07d9
Author: joerg <joerg@NetBSD.org>
Date:   Fri Feb 21 00:26:21 2020 +0000

    Explicitly cast pointers to uintptr_t before casting to enums. They are
    not necessarily the same size. Don't cast pointers to bool, check for
    NULL instead.

commit cea4bb6570934d927e9c5610c254477fba679b8a
Author: tnn <tnn@NetBSD.org>
Date:   Thu Mar 12 13:01:59 2020 +0000

    vmx_vmptrst(): only used when DIAGNOSTIC

commit 65d1dc513f0f2625e76fb79286f2a00c2a5c5598
Author: ad <ad@NetBSD.org>
Date:   Sat Mar 14 18:08:38 2020 +0000

    - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
      functions: preempt_point() and preempt_needed().
    
    - preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
      any priority boost gained earlier from blocking.

commit b02390f73f38a8fd9c913578c28086d16aef3372
Author: ad <ad@NetBSD.org>
Date:   Sun Mar 22 00:16:16 2020 +0000

    x86 pmap:
    
    - Give pmap_remove_all() its own version of pmap_remove_ptes() that on native
      x86 does the bare minimum needed to clear out PTPs.  Cuts ~4% sys time on
      'build.sh release' for me.
    
    - pmap_sync_pv(): there's no need to issue a redundant TLB shootdown.  The
      caller waits for the competing operation to finish.
    
    - Bring 'options TLBSTATS' up to date.

commit 1ceb9fde91690774a19f960c963aaeb32e9e4dbe
Author: maxv <maxv@NetBSD.org>
Date:   Sun Apr 26 19:31:36 2020 +0000

    In nvmm_open(), make sure an implementation was found. This fixes an
    initialization bug triggerable in certain conditions.
    
    If you build nvmm inside the kernel, AND have a cpu that is not supported,
    AND run nvmmctl (or qemu-nvmm, both being the only binaries in the "nvmm"
    group), you get a page fault.
    
    This is because when nvmm is built inside the kernel, the kernel registers
    nvmm_cdevsw behind nvmm's back. The ioctl is therefore always accessible,
    and will hit NULL pointers if nvmm_init() failed.
    
    Problem reported by Andrei M. on netbsd-users@, thanks.

commit 89b3b8cd2752484f5d43ca4743de43ca1b0d4608
Author: maxv <maxv@NetBSD.org>
Date:   Thu Apr 30 16:50:17 2020 +0000

    When the identification fails, print the reason.

commit d5aadcc9ac991f9dd5a5f76850163a5395b38fec
Author: maxv <maxv@NetBSD.org>
Date:   Thu Apr 30 16:56:23 2020 +0000

    If we were processing a software int/excp, and got a VMEXIT in the middle,
    we must also reflect the instruction length, otherwise the next VMENTER
    fails and Qemu shuts the guest down.

commit 90c0d3d161dcf3ebe12319b39c940fc9b58e3bdb
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 9 08:39:07 2020 +0000

    On Intel CPUs, CPUID leaf 0xB, too, provides topology information, so
    filter it correctly, to avoid inconsistencies if the host has SMT.
    
    This fixes HaikuOS which fetches SMT information from there and would
    panic because of the inconsistencies.

commit 557a4ab541753196ae7989ae145494fa05ec400a
Author: maxv <maxv@NetBSD.org>
Date:   Sat May 9 16:18:57 2020 +0000

    Improve the CPUID emulation of basic leaves:
     - Hide DCA and PQM, they cannot be used in guests.
     - On Intel, explicitly handle each basic leaf until 0x16.
     - On AMD, explicitly handle each basic leaf until 0x0D.

commit 02636961efc2af09ccf06c37dc1bf49a9d16fdd4
Author: maxv <maxv@NetBSD.org>
Date:   Sun May 10 06:24:16 2020 +0000

    Respect the convention for the hypervisor information: return the highest
    hypervisor leaf in 0x40000000.EAX.

commit 31dceb1a51243a1849d5afe6f672f8daf8918515
Author: maxv <maxv@NetBSD.org>
Date:   Thu May 21 07:36:16 2020 +0000

    Improve the CPUID emulation on nvmm-intel: limit the highest basic and
    hypervisor leaves.

commit c96146a49e0af77fbbb1f4ea7421cda08ec8ce24
Author: maxv <maxv@NetBSD.org>
Date:   Thu May 21 07:43:23 2020 +0000

    Complete rev1.26: reset nvmm_impl to NULL in nvmm_fini().

commit 6b6041894cd84f2c1e8fc605751f4b84c74c031a
Author: maxv <maxv@NetBSD.org>
Date:   Sun May 24 08:08:49 2020 +0000

    Gather the conditions to return from the VCPU loops in nvmm_return_needed(),
    and use it in nvmm_do_vcpu_run() as well. This fixes two undesired behaviors:
    
     - When a VM initializes, the many nested page faults that need processing
       could cause the calling thread to occupy the CPU too much if we're unlucky
       and are only getting repeated nested page faults thousands of times in a
       row.
    
     - When the emulator calls nvmm_vcpu_run() and immediately sends a signal to
       stop the VCPU, it's better to check signals earlier and leave right away,
       rather than doing a round of VCPU run that could increase the time spent
       by the emulator waiting for the return.

commit 9f300452e9d939531a438e96b8c792fe27822073
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jun 18 16:31:15 2020 +0000

    style

commit f47abc1d1d995477b0cb0e01428ccd4b19fed392
Author: maxv <maxv@NetBSD.org>
Date:   Thu Jun 25 17:01:19 2020 +0000

    Register NVMM as an actual pseudo-device, without a PMF handler, to
    explicitly disallow ACPI suspend if NVMM is running.
    
    Should fix PR/55406.

commit 96c02688068ccc856c249b436e3d7a27ed72c0c5
Author: maxv <maxv@NetBSD.org>
Date:   Fri Jul 3 16:09:54 2020 +0000

    Print the backend name when attaching.

commit dd40e9a30d1234c07a9e933708f38a594a60a3d5
Author: yamaguchi <yamaguchi@NetBSD.org>
Date:   Tue Jul 14 00:45:52 2020 +0000

    Introduce per-cpu IDTs
    
    This is realized by the following modifications:
    - Add IDT pages and their allocation maps for each cpu in "struct cpu_info"
    - Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
    - Copy the IDT entries for cpu0 to other CPUs at attach
       - These are, for example, exceptions, db, system calls, etc.
    
    Also add a kernel option named PCPU_IDT to enable the feature.

commit b5eb68f5a3452bfd2f4ce2276b16c40ddb9ba4de
Author: maxv <maxv@NetBSD.org>
Date:   Sat Jul 18 20:56:53 2020 +0000

    Now that the IDT is per-CPU, it must be saved/restored on each CPU
    independently.

commit 1e4dd926af83264b8f72adf0c73668400839cddb
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jul 19 06:36:37 2020 +0000

    The TLB flush IPIs do not respect the IPL, so enforcing IPL_HIGH has no
    effect. Disable interrupts earlier instead. This prevents a possible race
    against such IPIs.

commit 8ffd72f910cc76f62275b471f4312c8824e6443a
Author: maxv <maxv@NetBSD.org>
Date:   Sun Jul 19 06:56:09 2020 +0000

    Switch to fpu_kern_enter/leave, to prevent clobbering, now that the kernel
    itself uses the fpu.

commit 799dec2dd9552dd10529592e5bfa73a9d63354a3
Author: maxv <maxv@NetBSD.org>
Date:   Sat Aug 1 08:18:36 2020 +0000

    Put the few x86-specific structures under #ifdef __x86_64__, for clarity.

commit 2f80dd1987a9f351952d3e2ee72971ce21f19830
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 10:20:50 2020 +0000

    Simplify, remove unnecessary #ifdef DIAGNOSTIC around KASSERTs.

commit 6e2006e1622cec346cb4c61198e272c2cd685976
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 10:31:37 2020 +0000

    Use ULL, to make it clear we are unsigned.

commit 83007be7f7fcf89a0e7660901cfcd0a863ea68ea
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 15:16:50 2020 +0000

    Make it easier to understand what's going on, no functional change.

commit 097a99600fc948d8ce07fd1b4cd620eb3204e533
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 15:20:09 2020 +0000

    Add new field definitions.

commit 9a3ed323e4e5a6c6e5bcde4162c527cc229fa170
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 15:22:25 2020 +0000

    Add new field definitions, and intercept everything, for future-proofness.

commit 199f8cf2d6962e6407f685e0fc5b66b74d7271da
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 15:38:28 2020 +0000

    Improve the CPUID emulation:
    
     - Hide SGX*, PKU, WAITPKG, and SKINIT, because they are not supported.
     - Hide HLE and RTM, part of TSX. Because TSX is just too buggy and we
       cannot guarantee that it remains enabled in the guest (if for example
       the host disables TSX while the guest is running). Nobody wants this
       crap anyway, so bye-bye.
     - Advertise FSREP_MOV, because there is no reason to hide it.

commit e54f5a405293b999ada410e7e4591c73bd5bfe09
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 5 16:36:33 2020 +0000

    Add CTASSERT.

commit 2bbda022ac6dceda15b114586c24cf2ce9e0c3f8
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 11 15:23:10 2020 +0000

    Hide OSPKE. NFC since the host never uses PKU, but still.

commit aae889a992e1b4352ea470c8e82609dd39d67abc
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 11 15:27:46 2020 +0000

    Improve emulation of MSR_IA32_ARCH_CAPABILITIES: publish only the *_NO
    bits. Initially they were the only ones there, but Intel then added other
    bits we aren't interested in, and they must be filtered out.
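
A minimal sketch of this kind of filtering, assuming a simple allow-list mask. The bit positions follow the Intel SDM layout of IA32_ARCH_CAPABILITIES; the macro and function names are illustrative, and the exact set of *_NO bits NVMM publishes is not stated in the commit message.

```c
#include <stdint.h>

/* Bit positions per the Intel SDM; macro names are illustrative. */
#define ARCH_CAP_RDCL_NO		(1ULL << 0)
#define ARCH_CAP_IBRS_ALL		(1ULL << 1)
#define ARCH_CAP_SSB_NO			(1ULL << 4)
#define ARCH_CAP_MDS_NO			(1ULL << 5)
#define ARCH_CAP_IF_PSCHANGE_MC_NO	(1ULL << 6)
#define ARCH_CAP_TSX_CTRL		(1ULL << 7)
#define ARCH_CAP_TAA_NO			(1ULL << 8)

/* Allow-list: only the "not affected" (*_NO) bits reach the guest. */
#define ARCH_CAP_GUEST_MASK \
	(ARCH_CAP_RDCL_NO | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
	 ARCH_CAP_IF_PSCHANGE_MC_NO | ARCH_CAP_TAA_NO)

/* Value the guest sees when it reads the MSR (hypothetical helper). */
uint64_t
guest_arch_capabilities(uint64_t host_val)
{
	return host_val & ARCH_CAP_GUEST_MASK;
}
```

An allow-list has the advantage that bits Intel defines later are hidden by default until someone decides to expose them.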

commit edd13e30ac879948c35503bb64d6dde25a350b5a
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 11 15:31:51 2020 +0000

    Improve the CPUID emulation on nvmm-intel:
    
     - Limit the highest extended leaf.
     - Limit 0x00000007 to ECX=0, for future-proofness.

commit fb838398250600aa351210790bb3e466dd378fc9
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 11 15:48:42 2020 +0000

    Micro-optimize: use pushq instead of pushw, to avoid LCP stalls and
    unaligned stack accesses.

commit e7bd6d9a8d5077de045d511a7bcd6dd2b2b007a3
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 18 17:03:10 2020 +0000

    nvmm-x86: also flush the guest TLB when CR4.{PCIDE,SMEP} changes

commit cad669b3c98801e03d224d41d73234313ec1b8a5
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 18 17:03:58 2020 +0000

    nvmm: localify a variable that doesn't need to be global

commit 4308459a1f3562d7cc61a39b42f8d48a22ec66b0
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 18 17:04:37 2020 +0000

    nvmm: use relaxed atomics to read nmachines

commit ab518bc257596919a45c99a0d55770ffac41bea0
Author: maxv <maxv@NetBSD.org>
Date:   Tue Aug 18 17:08:05 2020 +0000

    nvmm-x86-svm: improve the CPUID emulation
    
    Limit the hypervisor range, and properly handle each basic leaf until 0xD.

commit 29c4d4a2e1d5099161a6a1e9ed649acd4d612369
Author: maxv <maxv@NetBSD.org>
Date:   Thu Aug 20 11:07:43 2020 +0000

    nvmm-x86: advertise the SERIALIZE instruction, available on future CPUs

commit b29d3889a2524b11261307a7d23a33c2575f3dfa
Author: maxv <maxv@NetBSD.org>
Date:   Thu Aug 20 11:09:56 2020 +0000

    nvmm-x86: improve the CPUID emulation
    
     - x86-svm: explicitly handle 0x80000007 and 0x80000008. The latter
       contains extended features we must filter out. Apply the same in
       x86-vmx for symmetry.
     - x86-svm: explicitly handle extended leaves until 0x8000001F, and
       truncate to it.

commit d94f8dcced0a10cd654fdf1498a97de7a5f613ea
Author: maxv <maxv@NetBSD.org>
Date:   Sat Aug 22 10:59:05 2020 +0000

    nvmm-x86-svm: dedup code

commit 709ded95090a151848e54eb29b7a3e95b892093f
Author: maxv <maxv@NetBSD.org>
Date:   Sat Aug 22 11:00:00 2020 +0000

    nvmm-x86: hide more CPUID flags, mostly related to perf monitors

commit 83ba6bd9d6a235c1885332116132050d1d2031a5
Author: maxv <maxv@NetBSD.org>
Date:   Sat Aug 22 11:01:10 2020 +0000

    nvmm-x86-vmx: fix detection of the BIOS lock
    
    If it's locked, ensure it's locked with VMX enabled. If it's not locked,
    then lock it ourselves with VMX enabled.
    
    Should fix NetBSD PR/55596.

commit c7c1b20f858ec80897e08555c699b48baf17f33c
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:28:17 2020 +0000

    nvmm: misc improvements
    
     - use mach->ncpus to get the number of vcpus, now that we have it
     - don't forget to decrement mach->ncpus when a machine gets killed
     - add more __predict_false()

commit 228e6c158bba4cead6bc53398c50fb21da522c12
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:29:19 2020 +0000

    nvmm-x86-svm: don't forget to intercept INVD
    
    INVD executed in the guest can be dangerous for the host, due to CPU
    caches being flushed without write-back.

commit 80f4f7a3b82d3b0661a2e00dd4baca10075dbb39
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:29:49 2020 +0000

    nvmm: slightly clarify

commit 71f388f820fdf4071cc06015872690fc97732318
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:30:50 2020 +0000

    nvmm-x86-vmx: improve the handling of CR4
    
     - Filter out certain features we don't want the guest to enable. This is
       for general correctness, and future-proofness.
     - Flush the guest TLB when certain flags change.

commit 4d6065c1735d37cfcd0806e610c55231acf96f73
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:32:02 2020 +0000

    nvmm-x86: improve the handling of RFLAGS.RF
    
     - When injecting certain exceptions, set RF. For us to have an up-to-date
       view of RFLAGS, we commit the state before the event.
     - When advancing RIP, clear RF.

commit 3b6ca11d0ae4797779c00f1a5c6bdca3ded618b4
Author: maxv <maxv@NetBSD.org>
Date:   Wed Aug 26 16:33:03 2020 +0000

    nvmm-x86-svm: improve the handling of MSR_EFER
    
    Intercept reads of it as well, just to mask EFER_SVME, which the guest
    doesn't need to see.

commit d298740ae2aaabb9c97987b337b7a8fa0e936929
Author: maxv <maxv@NetBSD.org>
Date:   Sat Aug 29 07:14:17 2020 +0000

    nvmm: explicitly include atomic.h

commit 177603665b3c344a894a8df3cbcf126f66e607ef
Author: maxv <maxv@NetBSD.org>
Date:   Fri Sep 4 17:06:23 2020 +0000

    nvmm-x86-svm: check the SVM revision
    
    Only revision 1 exists, but check it, for future-proofness.

commit 657ab4a5acc2d781d28d0615d8d41662a887da99
Author: maxv <maxv@NetBSD.org>
Date:   Fri Sep 4 17:07:33 2020 +0000

    nvmm-x86-vmx: improve the handling of CR0
    
     - Flush the guest TLB when certain CR0 bits change.
     - If the guest updates a static bit in CR0, then reflect the change in
       VMCS_CR0_SHADOW, for the guest to get the illusion that the change was
       applied. The "real" CR0 static bits remain unchanged.
     - In vmx_vcpu_{g,s}et_state(), take VMCS_CR0_SHADOW into account.
     - Slightly modify the CR4 handling code, just for more symmetry with CR0.
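
A small sketch of the shadow mechanism, assuming the usual VMX rule that bits owned by the host are read by the guest from the read shadow rather than from the real register. The struct and function names are hypothetical, and CR0_NE is used as the static bit only as an example.

```c
#include <stdint.h>

#define CR0_TS	(1ULL << 3)
#define CR0_NE	(1ULL << 5)

/* Hypothetical container for the two values kept per VCPU. */
struct cr0_state {
	uint64_t real;		/* value the CPU actually runs with */
	uint64_t shadow;	/* value the guest believes it wrote */
};

/* Guest write: static bits go to the shadow only, the rest is applied. */
void
guest_write_cr0(struct cr0_state *s, uint64_t val, uint64_t static_mask)
{
	s->shadow = (s->shadow & ~static_mask) | (val & static_mask);
	s->real = (s->real & static_mask) | (val & ~static_mask);
}

/* Guest read: static bits come from the shadow, the rest from real. */
uint64_t
guest_visible_cr0(const struct cr0_state *s, uint64_t static_mask)
{
	return (s->real & ~static_mask) | (s->shadow & static_mask);
}
```

If the guest clears a static bit, the real CR0 keeps it set while later guest reads show it cleared, which is the illusion the commit message refers to.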

commit ddad819285b52c1f42114413dc01ca1f1baa97ce
Author: maxv <maxv@NetBSD.org>
Date:   Fri Sep 4 17:08:01 2020 +0000

    nvmm: more __read_mostly

commit 64e229ffdde22a67903b83fe5ee92fda6f8a8452
Author: maxv <maxv@NetBSD.org>
Date:   Fri Sep 4 17:09:03 2020 +0000

    nvmm-x86: improve the CPUID emulation
    
     - Mask DTES64, DS_CPL, CID, SDBG, xTPR, PN.
     - B10, B20 and IA64 do not exist, so just remove them.

commit 0d7915df0905c4786bd175aa189aed623a54b372
Author: maxv <maxv@NetBSD.org>
Date:   Sat Sep 5 07:22:25 2020 +0000

    nvmm: update copyright headers

commit 1323febd1adbcc920d113d00eae93aebb944d711
Author: maxv <maxv@NetBSD.org>
Date:   Sat Sep 5 07:26:37 2020 +0000

    x86: rename PGEX_X -> PGEX_I
    
    To match the x86 specification and the other OSes.

commit ebf0d4b209d02634bbfb3f9a6d81dcdb8229cbd3
Author: maxv <maxv@NetBSD.org>
Date:   Sat Sep 5 07:45:44 2020 +0000

    x86: fix several CPUID flags
    
     - Rename: CPUID_PN      -> CPUID_PSN
               CPUID_CFLUSH  -> CPUID_CLFSH
               CPUID_SBF     -> CPUID_PBE
               CPUID_LZCNT   -> CPUID_ABM
               CPUID_P1GB    -> CPUID_PAGE1GB
               CPUID2_PCLMUL -> CPUID2_PCLMULQDQ
               CPUID2_CID    -> CPUID2_CNXTID
               CPUID2_xTPR   -> CPUID2_XTPR
               CPUID2_AES    -> CPUID2_AESNI
       To match the x86 specification and the other OSes.
    
     - Remove: CPUID_B10, CPUID_B20, CPUID_IA64. They do not exist.

commit 7c232c9eaa0c46d44ca2882b76a266a766a40983
Author: riastradh <riastradh@NetBSD.org>
Date:   Sat Sep 5 16:30:10 2020 +0000

    Round of uvm.h cleanup.
    
    The poorly named uvm.h is generally supposed to be for uvm-internal
    users only.
    
    - Narrow it to files that actually need it -- mostly files that need
      to query whether curlwp is the pagedaemon, which should maybe be
      exposed by an external header.
    
    - Use uvm_extern.h where feasible and uvm_*.h for things not exposed
      by it.  We should split up uvm_extern.h but this will serve for now
      to reduce the uvm.h dependencies.
    
    - Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
      UVMHIST(ubchist), since ubchist is declared in uvm.h but the
      reference evaporates if UVMHIST is not defined, so we reduce header
      file dependencies.
    
    - Make uvm_device.h and uvm_swap.h independently includable while
      here.
    
    ok chs@

commit 4911e12740006ccd4c9b69d8ffe550344ebd3a2c
Author: riastradh <riastradh@NetBSD.org>
Date:   Sun Sep 6 02:18:53 2020 +0000

    Fix fallout from previous uvm.h cleanup.
    
    - pmap(9) needs uvm/uvm_extern.h.
    
    - x86/pmap.h is not usable on its own; it is only usable if included
      via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).
    
    - Make nvmm.h and nvmm_internal.h standalone.

commit 98b282ab067b39cfe0ff3de5e686f33307a60478
Author: maxv <maxv@NetBSD.org>
Date:   Tue Sep 8 16:58:38 2020 +0000

    nvmm: cosmetic changes
    
     - Style.
     - Explicitly include ioccom.h.

commit 71086e4f94355507cf868b098f844261dbfa678b
Author: maxv <maxv@NetBSD.org>
Date:   Tue Sep 8 17:00:07 2020 +0000

    nvmm-x86-vmx: improve the handling of CR0
    
     - CR0_ET is hard-wired to 1 in the cpu, so force CR0_ET to 1 in the
       shadow.
     - Clarify.

commit 3c8d9b6d6ccc61ff6b9fd9151d12911bec084ab2
Author: maxv <maxv@NetBSD.org>
Date:   Tue Sep 8 17:02:03 2020 +0000

    nvmm-x86: avoid hogging behavior observed recently
    
    When the FPU code got rewritten in NetBSD, the dependency on IPL_HIGH was
    eliminated, and I took _vcpu_guest_fpu_enter() out of the VCPU loop since
    there was no need to be in the splhigh window.
    
    Later, the code was switched to use the kernel FPU API, an API that works at
    IPL_VM, not at IPL_NONE.
    
    These two changes mean that the whole VCPU loop is now executing at IPL_VM,
    which is not desired, because it introduces a delay in interrupt processing
    on the host in certain cases.
    
    Fix this by putting _vcpu_guest_fpu_enter() back inside the VCPU loop.

commit df2a32627dcc4d633067b130fc32a09e1778f5d8
Author: mgorny <mgorny@NetBSD.org>
Date:   Sat Oct 24 07:14:29 2020 +0000

    Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs
    
    When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
    use the 64-suffixed variant in order to include the complete FIP/FDP
    registers in the x87 area.
    
    The difference between the two variants is that the FXSAVE64 (new)
    variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
    while the legacy FXSAVE variant uses split fields: 32-bit offset,
    16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
    The latter implies that the actual addresses are truncated to 32 bits
    which is insufficient in modern programs.
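
To make the truncation concrete, here is a hedged sketch of the two views of the same 8 bytes. The union mirrors the fa_64/fa_32 split named above, but the type name, the inner field names and the helper are illustrative, not copied from the NetBSD headers.

```c
#include <stdint.h>

/* Illustrative layout of one x87 pointer slot (FIP or FDP). */
union fp_addr_sketch {
	uint64_t fa_64;			/* FXSAVE64 view: full 64-bit address */
	struct {
		uint32_t fa_off;	/* legacy view: 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved field */
	} fa_32;
};

/* What a consumer of the legacy layout can recover from a 64-bit FIP. */
uint64_t
legacy_offset(uint64_t fip)
{
	union fp_addr_sketch a;

	a.fa_32.fa_off = (uint32_t)fip;	/* upper 32 bits have no slot */
	a.fa_32.fa_seg = 0;
	a.fa_32.fa_rsvd = 0;
	return a.fa_32.fa_off;
}
```

For a typical amd64 userland address such as 0x00007f0012345678, the legacy view keeps only 0x12345678.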
    
    The change is applied only to 64-bit programs on amd64.  Plain i386
    and compat32 continue using plain FXSAVE.  Similarly, NVMM is not
    changed as I am not familiar with that code.
    
    This is a potentially breaking change.  However, I don't think it likely
    to actually break anything because the data provided by the old variant
    were not meaningful (because of the truncated pointer).

commit b5e2c38d48a2920baff9640c9cce28443b479ad0
Author: reinoud <reinoud@NetBSD.org>
Date:   Fri Oct 30 21:06:13 2020 +0000

    Implement missing (REPE) CMPS instruction support in NVMM's x86_decode().
    
    In apparently rare cases the (REPE) CMPS instruction can trigger a memory
    assist. NVMM wouldn't recognize the instruction and thus couldn't assist, and
    Qemu would abort.

commit 96e3f391c5149ea313afbd572156de01b399594c
Author: reinoud <reinoud@NetBSD.org>
Date:   Sat Oct 31 15:44:01 2020 +0000

    Revert (REPE) CMPS support per request of Maxime, it is incorrect.

commit 37a5fdbb7dca9549e0c5d0a335a2b13e54c563bd
Author: reinoud <reinoud@NetBSD.org>
Date:   Sun Dec 27 20:56:14 2020 +0000

    Implement support for trapping REP CMPS instructions in NVMM.
    
    Qemu would abort hard when NVMM got a memory trap on the instruction,
    since NVMM did not recognize it.

commit 3bd11f1298baa52e684e3dd8fcf4d8c209d4cd3b
Author: reinoud <reinoud@NetBSD.org>
Date:   Fri Mar 26 15:59:53 2021 +0000

    Implement nvmm_vcpu::stop, a race-free exit from nvmm_vcpu_run() without
    signals. This introduces a new kernel and userland NVMM version indicating
    this support.
    
    Patch by Kamil Rytarowski <kamil@netbsd.org> and committed on his request.

commit 13dd0447d7455541259359dbed575110868009a0
Author: reinoud <reinoud@NetBSD.org>
Date:   Tue Apr 6 08:40:17 2021 +0000

    Implement nvmm_vcpu::stop, a race-free exit from nvmm_vcpu_run() without
    signals. This introduces a new kernel and userland NVMM version indicating
    this support.
    
    Patch by Kamil Rytarowski <kamil@netbsd.org> and committed on his request.
    
    This is the missing libnvmm part I forgot to include in the original commit.

commit 46196ce544fdf340da7304aef5f2f8e8993a18e0
Author: mrg <mrg@NetBSD.org>
Date:   Mon Apr 12 09:22:58 2021 +0000

    be sure to only access vcpu if it was initialised.