commit 9ce1f978a4fdbc74d51786cc9fbc3eb175c04880
Author: maxv
Date:   Thu Sep 10 14:10:46 2020 +0000

    kasan: fix the copyright notices

commit 6e3ca747b74a9bd431b5e8288c9ba37b754f3be2
Author: maxv
Date:   Thu Sep 10 14:04:45 2020 +0000

    kcsan: fix the copyright notices

commit b86ad532344224d42a1fe715c729415ed8da95b0
Author: maxv
Date:   Wed Sep 9 16:29:59 2020 +0000

    kmsan: update the copyright notices

commit 8b40b349574b0378decb2adb56aa5681cdba9837
Author: maxv
Date:   Tue Sep 8 17:02:03 2020 +0000

    nvmm-x86: avoid hogging behavior observed recently

    When the FPU code got rewritten in NetBSD, the dependency on IPL_HIGH
    was eliminated, and I took _vcpu_guest_fpu_enter() out of the VCPU loop
    since there was no need to be in the splhigh window.

    Later, the code was switched to use the kernel FPU API, API that works
    at IPL_VM, not at IPL_NONE.

    These two changes mean that the whole VCPU loop is now executing at
    IPL_VM, which is not desired, because it introduces a delay in
    interrupt processing on the host in certain cases.

    Fix this by putting _vcpu_guest_fpu_enter() back inside the VCPU loop.

commit ffe54af021badd5ba3e79e4ec72af76e655db24a
Author: maxv
Date:   Tue Sep 8 17:00:07 2020 +0000

    nvmm-x86-vmx: improve the handling of CR0

     - CR0_ET is hard-wired to 1 in the cpu, so force CR0_ET to 1 in the
       shadow.
     - Clarify.

commit 3775d81fbdc7704248d9123446a4e6ef0ef7486d
Author: maxv
Date:   Tue Sep 8 16:58:38 2020 +0000

    nvmm: cosmetic changes

     - Style.
     - Explicitly include ioccom.h.

commit 1fd28129bc8b07e5edebfb9b9c4d8367a83468a9
Author: maxv
Date:   Sat Sep 5 07:45:44 2020 +0000

    x86: fix several CPUID flags

     - Rename:
           CPUID_PN      -> CPUID_PSN
           CPUID_CFLUSH  -> CPUID_CLFSH
           CPUID_SBF     -> CPUID_PBE
           CPUID_LZCNT   -> CPUID_ABM
           CPUID_P1GB    -> CPUID_PAGE1GB
           CPUID2_PCLMUL -> CPUID2_PCLMULQDQ
           CPUID2_CID    -> CPUID2_CNXTID
           CPUID2_xTPR   -> CPUID2_XTPR
           CPUID2_AES    -> CPUID2_AESNI
       To match the x86 specification and the other OSes.
     - Remove:
           CPUID_B10, CPUID_B20, CPUID_IA64.
       They do not exist.

commit 8ac915a1e2efdf7e548f7f72c60e62caf58ce007
Author: maxv
Date:   Sat Sep 5 07:26:37 2020 +0000

    x86: rename PGEX_X -> PGEX_I

    To match the x86 specification and the other OSes.

commit 7be9cc91f1a965cb456a9e3eacbb9578969b5a30
Author: maxv
Date:   Sat Sep 5 07:22:25 2020 +0000

    nvmm: update copyright headers

commit 7375b983f36bbc8d0e99eceeb2ab15bfa4bf4c7a
Author: maxv
Date:   Fri Sep 4 17:09:03 2020 +0000

    nvmm-x86: improve the CPUID emulation

     - Mask DTES64, DS_CPL, CID, SDBG, xTPR, PN.
     - B10, B20 and IA64 do not exist, so just remove them.

commit f3dbba79fbce634505d638ee5a60a825489693f6
Author: maxv
Date:   Fri Sep 4 17:08:01 2020 +0000

    nvmm: more __read_mostly

commit 1f045156127a1a619d13c1d4d0ce28f4160e6917
Author: maxv
Date:   Fri Sep 4 17:07:33 2020 +0000

    nvmm-x86-vmx: improve the handling of CR0

     - Flush the guest TLB when certain CR0 bits change.
     - If the guest updates a static bit in CR0, then reflect the change in
       VMCS_CR0_SHADOW, for the guest to get the illusion that the change
       was applied. The "real" CR0 static bits remain unchanged.
     - In vmx_vcpu_{g,s}et_state(), take VMCS_CR0_SHADOW into account.
     - Slightly modify the CR4 handling code, just for more symmetry with
       CR0.

commit f5830a1f8149b7781d68910ba1216da60a96851f
Author: maxv
Date:   Fri Sep 4 17:06:23 2020 +0000

    nvmm-x86-svm: check the SVM revision

    Only revision 1 exists, but check it, for future-proofness.

commit 30c0b3b1246f7be3182368f92a49d659f24b9608
Author: maxv
Date:   Fri Sep 4 17:05:09 2020 +0000

    Add a few more CPUID flags.

commit 4a45bc6b52d13c402a2b92c9a00b5564cbe2737e
Author: maxv
Date:   Sat Aug 29 07:17:23 2020 +0000

    Slightly clarify, and style.

commit 4229a9030b25723a2ece928181ea07eae5bea0e9
Author: maxv
Date:   Sat Aug 29 07:16:03 2020 +0000

    'doreti_checkast' isn't global anymore, localify.

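Editor's illustration for the CR0 shadow commits above (1f045156 and ffe54af0). This is a minimal user-space model, not the NVMM code: the struct, the function names, and the choice of which bits count as "static" are assumptions made for the example. Only the CR0 bit positions are architectural. It shows the idea that a guest write to a static bit lands in the shadow, so the guest reads back what it wrote, while the real register keeps its value.

```c
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_PE 0x00000001ULL	/* bit 0: protection enable */
#define CR0_ET 0x00000010ULL	/* bit 4: hard-wired to 1 */
#define CR0_NW 0x20000000ULL	/* bit 29: not write-through */
#define CR0_CD 0x40000000ULL	/* bit 30: cache disable */

/* Assumption for this sketch: the bits the hypervisor keeps static. */
#define MODEL_CR0_STATIC (CR0_ET | CR0_NW | CR0_CD)

/* Hypothetical model of the two values involved. */
struct model_cr0 {
	uint64_t real;		/* what the CPU really runs the guest with */
	uint64_t shadow;	/* what the guest reads back for static bits */
};

/* Guest writes CR0: static bits go to the shadow only. */
static void
model_cr0_guest_write(struct model_cr0 *c, uint64_t val)
{
	c->real = (c->real & MODEL_CR0_STATIC) | (val & ~MODEL_CR0_STATIC);
	/* CR0_ET is always forced to 1 in the shadow. */
	c->shadow = (val & MODEL_CR0_STATIC) | CR0_ET;
}

/* Guest reads CR0: static bits come from the shadow, the rest is real. */
static uint64_t
model_cr0_guest_read(const struct model_cr0 *c)
{
	return (c->real & ~MODEL_CR0_STATIC) | (c->shadow & MODEL_CR0_STATIC);
}
```

In this model, a guest that sets CR0_CD sees CD=1 on its next read, yet `real` never has CD set.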
commit 66feb5fd67f8b0b8510ef61d900e2bbe8a75ce9d
Author: maxv
Date:   Sat Aug 29 07:14:50 2020 +0000

    Correct my rev1.159, it was incomplete, the check must be done later
    because the value can change in the meantime (and get set to zero).

commit c82a3b641a6a7bf11e74ea2bdb14af23ca49446f
Author: maxv
Date:   Sat Aug 29 07:14:17 2020 +0000

    nvmm: explicitly include atomic.h

commit be6d71080f521a1d8388b7b5c79740f895d1b60d
Author: maxv
Date:   Wed Aug 26 16:36:32 2020 +0000

    Add a check to prevent shift by -1. Not really important in this case,
    but to appease KUBSAN.

    Reported-by: syzbot+4026e8201b6b484b8cb4@syzkaller.appspotmail.com

commit b38b0db924ad2e45fc80b06be2e0b82e9d34f114
Author: maxv
Date:   Wed Aug 26 16:33:03 2020 +0000

    nvmm-x86-svm: improve the handling of MSR_EFER

    Intercept reads of it as well, just to mask EFER_SVME, which the guest
    doesn't need to see.

commit 7a3717a67bd217bcc0accc60d95c76f3d1db2f77
Author: maxv
Date:   Wed Aug 26 16:32:02 2020 +0000

    nvmm-x86: improve the handling of RFLAGS.RF

     - When injecting certain exceptions, set RF. For us to have an
       up-to-date view of RFLAGS, we commit the state before the event.
     - When advancing RIP, clear RF.

commit 18493bcb6e2949d034014193a9a5678ff5b96b77
Author: maxv
Date:   Wed Aug 26 16:30:50 2020 +0000

    nvmm-x86-vmx: improve the handling of CR4

     - Filter out certain features we don't want the guest to enable.
       This is for general correctness, and future-proofness.
     - Flush the guest TLB when certain flags change.

commit 88dc3ef52b19892bf62e9292f22164e2127d2b55
Author: maxv
Date:   Wed Aug 26 16:29:49 2020 +0000

    nvmm: slightly clarify

commit ba2f16b3ecaa1a452c823b36d035870ffe1d979d
Author: maxv
Date:   Wed Aug 26 16:29:19 2020 +0000

    nvmm-x86-svm: don't forget to intercept INVD

    INVD executed in the guest can be dangerous for the host, due to CPU
    caches being flushed without write-back.

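Editor's illustration for the RFLAGS.RF commit above (7a3717a6). This is a small sketch under stated assumptions, not the NVMM implementation: the struct and function names are hypothetical, and the caller decides which events count as the "certain exceptions" that set RF. The only architectural fact used is that RF is bit 16 of RFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PSL_RF 0x00010000ULL	/* RFLAGS.RF, bit 16 */

/* Hypothetical, reduced VCPU state. */
struct model_vcpu {
	uint64_t rip;
	uint64_t rflags;
};

/*
 * Event injection: set RF when the event is one of those that require
 * it. 'sets_rf' stands in for that classification.
 */
static void
model_inject_event(struct model_vcpu *v, bool sets_rf)
{
	if (sets_rf)
		v->rflags |= MODEL_PSL_RF;
}

/* Skipping over an emulated instruction: advance RIP and clear RF. */
static void
model_advance_rip(struct model_vcpu *v, uint64_t inst_len)
{
	v->rip += inst_len;
	v->rflags &= ~MODEL_PSL_RF;
}
```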
commit 502b2dcc8f62dd884ec65ca4f450d76e7e387f2e
Author: maxv
Date:   Wed Aug 26 16:28:17 2020 +0000

    nvmm: misc improvements

     - use mach->ncpus to get the number of vcpus, now that we have it
     - don't forget to decrement mach->ncpus when a machine gets killed
     - add more __predict_false()

commit 3af8843c9a152760232ab20ee9433cbb041aa3cb
Author: maxv
Date:   Sat Aug 22 11:01:10 2020 +0000

    nvmm-x86-vmx: fix detection of the BIOS lock

    If it's locked, ensure it's locked with VMX enabled. If it's not
    locked, then lock it ourselves with VMX enabled.

    Should fix NetBSD PR/55596.

commit b828a37ccc1afc08c900d06956fd00f514aaddfc
Author: maxv
Date:   Sat Aug 22 11:00:00 2020 +0000

    nvmm-x86: hide more CPUID flags, mostly related to perf monitors

commit ec2be18f15eb4f04032d3ecbc895f6d437fa9a63
Author: maxv
Date:   Sat Aug 22 10:59:05 2020 +0000

    nvmm-x86-svm: dedup code

commit 125f3b826967a1bbdfef6d60a41c5fafb5fd5c19
Author: maxv
Date:   Thu Aug 20 11:09:56 2020 +0000

    nvmm-x86: improve the CPUID emulation

     - x86-svm: explicitly handle 0x80000007 and 0x80000008. The latter
       contains extended features we must filter out. Apply the same in
       x86-vmx for symmetry.
     - x86-svm: explicitly handle extended leaves until 0x8000001F, and
       truncate to it.

commit 3798821acce322584c329669934c8b040b241b4b
Author: maxv
Date:   Thu Aug 20 11:07:43 2020 +0000

    nvmm-x86: advertise the SERIALIZE instruction, available on future CPUs

commit c98962e3bda9ce5ffb8ebd728cb2dc0d30ec629c
Author: maxv
Date:   Tue Aug 18 17:08:05 2020 +0000

    nvmm-x86-svm: improve the CPUID emulation

    Limit the hypervisor range, and properly handle each basic leaf until
    0xD.

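Editor's illustration for the CPUID range-limiting commits above (c98962e3 and 125f3b82). This sketch is not taken from NVMM: the names are hypothetical, and the hypervisor maximum is an assumed value picked for the example. The basic limit 0xD and the extended limit 0x8000001F are the ones quoted in the commit messages. The idea is that the maximum leaf reported to the guest is clamped, and only leaves inside the three limited ranges are treated as visible.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BASIC_MAX	0x0000000DU	/* from the commit message */
#define MODEL_HV_BASE	0x40000000U
#define MODEL_HV_MAX	0x40000000U	/* assumption for this sketch */
#define MODEL_EXT_BASE	0x80000000U
#define MODEL_EXT_MAX	0x8000001FU	/* from the commit message */

/* Clamp the host's highest leaf to what the emulation handles. */
static uint32_t
model_clamp_max_leaf(uint32_t host_max, uint32_t limit)
{
	return (host_max < limit) ? host_max : limit;
}

/* Is this leaf inside one of the ranges exposed to the guest? */
static bool
model_leaf_visible(uint32_t leaf)
{
	if (leaf <= MODEL_BASIC_MAX)
		return true;
	if (leaf >= MODEL_HV_BASE && leaf <= MODEL_HV_MAX)
		return true;
	if (leaf >= MODEL_EXT_BASE && leaf <= MODEL_EXT_MAX)
		return true;
	return false;
}
```

What a real CPU or hypervisor returns for a leaf outside these ranges is vendor-specific and deliberately left out of the sketch.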
commit 890a86f4e4636b0d83529344086fbaa1779f14ad
Author: maxv
Date:   Tue Aug 18 17:04:37 2020 +0000

    nvmm: use relaxed atomics to read nmachines

commit f4870ec0a97923f0710833cba66321c63faa9528
Author: maxv
Date:   Tue Aug 18 17:03:58 2020 +0000

    nvmm: localify a variable that doesn't need to be global

commit 69c09203f4e3e9f4fccc5571041b7405f60a1993
Author: maxv
Date:   Tue Aug 18 17:03:10 2020 +0000

    nvmm-x86: also flush the guest TLB when CR4.{PCIDE,SMEP} changes

commit 92218c36478a84f36b878d9d73ba94dc05e22bb2
Author: maxv
Date:   Tue Aug 18 07:53:24 2020 +0000

    Add missing cases, to prevent memory corruption.

    Reported-by: syzbot+f8b8a689a3560dda27f7@syzkaller.appspotmail.com

commit 37170b235feb313a299590a2162b5a3242b470e6
Author: maxv
Date:   Tue Aug 11 15:48:42 2020 +0000

    Micro-optimize: use pushq instead of pushw. To avoid LCP stalls and
    unaligned stack accesses.

commit a3197877851ef98a51183b9b1e3a37620873c0e7
Author: maxv
Date:   Tue Aug 11 15:35:17 2020 +0000

    sync

commit e7a695c04f828ae5fd38c6dc7f64116293171e0c
Author: maxv
Date:   Tue Aug 11 15:31:51 2020 +0000

    Improve the CPUID emulation on nvmm-intel:

     - Limit the highest extended leaf.
     - Limit 0x00000007 to ECX=0, for future-proofness.

commit 3a2a5e40e0bc7d418ffe8a85c89606d8d0b81838
Author: maxv
Date:   Tue Aug 11 15:27:46 2020 +0000

    Improve emulation of MSR_IA32_ARCH_CAPABILITIES: publish only the *_NO
    bits. Initially they were the only ones there, but Intel then added
    other bits we aren't interested in, and they must be filtered out.

commit 354916cd9199a4ecad64525a4e5cb7c109b2d3f5
Author: maxv
Date:   Tue Aug 11 15:23:10 2020 +0000

    Hide OSPKE. NFC since the host never uses PKU, but still.

commit 606d80361c5859d41ae471b7f6919659f04a9c22
Author: maxv
Date:   Wed Aug 5 16:36:33 2020 +0000

    Add CTASSERT.

commit 7153309d876e620159d729a807a19bf799badb9a
Author: maxv
Date:   Wed Aug 5 15:40:46 2020 +0000

    Add new fields here and there.

commit b83fdbad346c469b0c85e00eca989ca0cd102f20
Author: maxv
Date:   Wed Aug 5 15:38:28 2020 +0000

    Improve the CPUID emulation:

     - Hide SGX*, PKU, WAITPKG, and SKINIT, because they are not supported.
     - Hide HLE and RTM, part of TSX. Because TSX is just too buggy and we
       cannot guarantee that it remains enabled in the guest (if for
       example the host disables TSX while the guest is running). Nobody
       wants this crap anyway, so bye-bye.
     - Advertise FSREP_MOV, because no reason to hide it.

commit 4ee94dfc5216247b29fcb36d1fa6a79ea3834bd8
Author: maxv
Date:   Wed Aug 5 15:22:25 2020 +0000

    Add new field definitions, and intercept everything, for
    future-proofness.

commit aed3066decbb21d1098bf106e72e0c3fbee61e1e
Author: maxv
Date:   Wed Aug 5 15:20:09 2020 +0000

    Add new field definitions.

commit a12f276dcb0bc8fe57594de8f08c9b843e39b988
Author: maxv
Date:   Wed Aug 5 15:16:50 2020 +0000

    Make it easier to understand what's going on, no functional change.

commit bef1bb6f4deff7f4e5e5f163facc58c3cbee5bb4
Author: maxv
Date:   Wed Aug 5 10:33:01 2020 +0000

    Upgrade NVMM to WARNS=5.

commit 027accf11a4caa576f82171451170b0915efe5ee
Author: maxv
Date:   Wed Aug 5 10:31:37 2020 +0000

    Use ULL, to make it clear we are unsigned.

commit 0392e1491677c9fd39e35f2366bae2aa730d3455
Author: maxv
Date:   Wed Aug 5 10:20:50 2020 +0000

    Simplify, remove unnecessary #ifdef DIAGNOSTIC around KASSERTs.

commit d56156e290b2dbbe411ab1683997f06950bc600b
Author: maxv
Date:   Sun Aug 2 07:19:39 2020 +0000

    Use a more informative panic message.

commit 54c43e3cfc4f385fbef189147bc69158b876f712
Author: maxv
Date:   Sun Aug 2 07:15:05 2020 +0000

    Note PAN.

commit 461521856056d0cb318823bede9355ee0c0310dd
Author: maxv
Date:   Sun Aug 2 06:58:16 2020 +0000

    Add support for Privileged Access Never (ARMv8.1-PAN).

    PAN provides the same functionality as SMAP on x86: it forbids kernel
    access to userland pages when PSTATE.PAN=1, and allows such accesses
    when PSTATE.PAN=0.

    We clear SCTLR_SPAN, to guarantee that PAN=1 each time the kernel is
    entered.
    We catch PAN faults and panic right away without further processing.
    In copyin, copyout, etc, we temporarily authorize access to userland
    pages.

    PAN is a very useful exploit mitigation.

    Reviewed by ryo@, thanks. Tested on Qemu. Enabled by default.

commit 0dc42cce2b3299ba9a6b8ba870aa6c3534a31bdd
Author: maxv
Date:   Sat Aug 1 08:47:05 2020 +0000

    The system registers we modify can have an impact on memory accesses,
    and we don't want the compiler to randomly re-order the instructions,
    so add barriers. Same as WRMSR on x86.

commit bf99d9c5af4cc067fcb3636e22fdbcbfc015c772
Author: maxv
Date:   Sat Aug 1 08:22:37 2020 +0000

    Note BRIDGE_IPF removal.

commit e288ec3be8ae2c456cd85f8cf47ff21377c1ffb2
Author: maxv
Date:   Sat Aug 1 08:20:47 2020 +0000

    Remove references to BRIDGE_IPF, it is now compiled in by default.

commit 357ed385853326b082e365ada9c1885ad5ce9880
Author: maxv
Date:   Sat Aug 1 08:18:36 2020 +0000

    Put the few x86-specific structures under #ifdef __x86_64__, for
    clarity.

commit 59599b55ee8a43f93396ad06d3cec0e0dbceda08
Author: maxv
Date:   Sat Aug 1 06:50:42 2020 +0000

    Remove #ifdef BRIDGE_IPF, compile in the code by default. Sent to
    tech-net@.

commit b14572bec766da90a5b3e01fca273f13ab754f87
Author: maxv
Date:   Sat Aug 1 06:34:59 2020 +0000

    Use large pages for the KASAN shadow, same as amd64, discussed with
    ryo@.

commit 2d38a48347b5c7047675c29e746c3069a6f19dae
Author: maxv
Date:   Fri Jul 31 16:59:04 2020 +0000

    BRIDGE_IPF is MP-safe, discussed with ozaki-r@

commit 3f56dc0c68588d22763aced327a8430c903d827a
Author: maxv
Date:   Mon Jul 20 05:50:55 2020 +0000

    Revert previous, to unbreak the build (NVMM declares the macro too).
    There are hundreds of MSRs, we're not going to list them all,
    especially when the majority are unused.

commit 2460276bf13636e26e2d559513cd15ad4d356dbb
Author: maxv
Date:   Sun Jul 19 14:39:42 2020 +0000

    sync with reality

commit 45856efc5329cc8d81829ee612a8eded6916d007
Author: maxv
Date:   Sun Jul 19 14:31:31 2020 +0000

    Compile USER_LDT by default, but, put it behind a privileged sysctl
    that defaults to disabled. To enable:

        # sysctl -w machdep.user_ldt=1

commit 14d3695e1a9837f3a29b385c76adc2397a61ac54
Author: maxv
Date:   Sun Jul 19 13:58:26 2020 +0000

    we're already in an #ifdef USER_LDT block, so no need to #ifdef again

commit 1770fc8c0a4e4e347acedc03d28035a51b0ab190
Author: maxv
Date:   Sun Jul 19 13:55:08 2020 +0000

    don't include opt_user_ldt.h when it is not needed

commit 49c2358bd4128b3f3e8df1e440d99fec34aa079a
Author: maxv
Date:   Sun Jul 19 07:35:08 2020 +0000

    Revert most of ad's movs/stos change. Instead do a lot simpler: declare
    svs_quad_copy() used by SVS only, with no need for instrumentation,
    because SVS is disabled when sanitizers are on.

commit 547e5f2139292a0b68439be62f9411f7c558f93e
Author: maxv
Date:   Sun Jul 19 06:56:09 2020 +0000

    Switch to fpu_kern_enter/leave, to prevent clobbering, now that the
    kernel itself uses the fpu.

commit a1714894de33cab0986491aa18123bade04a0f97
Author: maxv
Date:   Sun Jul 19 06:36:37 2020 +0000

    The TLB flush IPIs do not respect the IPL, so enforcing IPL_HIGH has no
    effect. Disable interrupts earlier instead. This prevents a possible
    race against such IPIs.

commit da692ee876cb2b04f1fd8e0edf45900d7c5020e6
Author: maxv
Date:   Sat Jul 18 20:56:53 2020 +0000

    Now that the IDT is per-CPU, it must be saved/restored on each CPU
    independently.

commit 1974940a28dfc25fafcf463255d810994171d560
Author: maxv
Date:   Sun Jul 12 10:10:53 2020 +0000

    fix inaccuracy about kmsan

commit 94bf8c4127bfec9e238c854bd1a54afa0703968a
Author: maxv
Date:   Sat Jul 11 07:14:53 2020 +0000

    Remove support for '%n' in the kernel printf functions.

    It makes vulnerabilities too easily exploitable, is unused and as a
    sanity rule should not be used in the kernel to begin with.
    Now, "printf(unfiltered_string);" is much less of a problem.

commit e8f3e972b2b76a2744ca2fd9fbddd4d581ca8b80
Author: maxv
Date:   Fri Jul 3 16:23:02 2020 +0000

    hardclock_ticks -> getticks()

commit 49be30d22a6573d09354fdf2c3a5a8124e615568
Author: maxv
Date:   Fri Jul 3 16:17:24 2020 +0000

    In cpu_uarea_{alloc,free}:

     - My previous change in this file was not correct, kremove does not
       free the underlying PA, which caused a very slow leak under memory
       pressure. Rework to correctly free the PA.
     - Add a second redzone, this time after the stack, to catch several
       stack overflows. The main concern is read overflows which leak the
       heap that follows the stack.
     - UVM_KMF_WAITVA doesn't fail, so remove error check.
     - Add KASSERTs.

commit ec0a60ff66074cc6fa6c495a03b397662dbc9ab5
Author: maxv
Date:   Fri Jul 3 16:12:16 2020 +0000

    Enable trace-cmp.

commit ed64edcd373681d43f53e4aeb1eac2dba743b0a6
Author: maxv
Date:   Fri Jul 3 16:11:11 2020 +0000

    Sync trace-pc and trace-cmp.

commit a5baa0d310080f32f631b6acc921045c49fe9402
Author: maxv
Date:   Fri Jul 3 16:09:54 2020 +0000

    Print the backend name when attaching.

commit c2f290a0c2cf307fc9897784da379146cf251d6d
Author: maxv
Date:   Fri Jul 3 16:07:52 2020 +0000

    more

commit c0a752c8daa88655d4be596e8c4c5dc341b94e0a
Author: maxv
Date:   Tue Jun 30 16:28:17 2020 +0000

    be one-shot by default, with room for circular

commit e64a99c86dc644cfba8c7a8e3329e2f7318954e6
Author: maxv
Date:   Tue Jun 30 16:22:55 2020 +0000

    fix file path

commit 6540f7fa9dcda34a8467bff0b7c251d99ea63598
Author: maxv
Date:   Tue Jun 30 16:20:00 2020 +0000

    Make copystr() a MI C function, part of libkern and shared on all
    architectures.

    Notes:

     - On alpha and ia64 the function is kept but gets renamed locally to
       avoid symbol collision. This is because on these two arches, I am
       not sure whether the ASM callers do not rely on fixed registers, so
       I prefer to keep the ASM body for now.
     - On Vax, only the symbol is removed, because the body is used from
       other functions.
     - On RISC-V, this change fixes a bug: copystr() was just a wrapper
       around strlcpy(), but strlcpy() makes the operation less safe
       (strlen on the source beyond its size).
     - The kASan, kCSan and kMSan wrappers are removed, because now that
       copystr() is in C, the compiler transformations are applied to it,
       without the need for manual wrappers.

    Could test on amd64 only, but should be fine.

commit c5f300dca8a73b9ea21e58c193d147becd5ac17d
Author: maxv
Date:   Sat Jun 27 07:29:11 2020 +0000

    Fix NULL deref on attach failure. Found via vHCI fuzzing.

    Reported-by: syzbot+9fdcdc21799e5d6d75ee@syzkaller.appspotmail.com

commit d5aa2ca52b51bf58767fd50cf3f361ec49c644ae
Author: maxv
Date:   Sat Jun 27 07:00:43 2020 +0000

    Yet another idiotic compat syscall that was developed with literally
    zero test made. Simply invoking this syscall with _valid parameters_
    triggers a fatal fault, because the kernel tries to write to userland
    addresses. With specially-crafted parameters it is easy to completely
    escalate privileges into the kernel.

    Also the size of the allocation is just obviously wrong, but it looks
    like the callers are even more wrong, so not gonna fix it for now.

    Reported-by: syzbot+b05096f3114b2820d81c@syzkaller.appspotmail.com

commit ec222be60e24f0ea2afa66218e464c2eb0eb0168
Author: maxv
Date:   Thu Jun 25 17:01:19 2020 +0000

    Register NVMM as an actual pseudo-device. Without PMF handler, to
    explicitly disallow ACPI suspend if NVMM is running.

    Should fix PR/55406.

commit a7353fbe1d67041bbdbdaf089615194f9e7bcb3e
Author: maxv
Date:   Thu Jun 25 16:19:07 2020 +0000

    Fix NULL deref. The original code before Jaromir's cleanup had an
    #ifndef block that wrongly contained the 'else' statement, causing the
    NULL check to have no effect.

    Reported-by: syzbot+c41bbfe5a7ff07bf0f99@syzkaller.appspotmail.com

commit dcaf2d35f751071fabd75ad35e2c81c1ecc874dd
Author: maxv
Date:   Wed Jun 24 18:09:37 2020 +0000

    remove unused x86_stos

commit 0ba25371b7fa5a8c4ad85c1dc11f0875367f0703
Author: maxv
Date:   Tue Jun 23 18:30:17 2020 +0000

    Hum. Fix NULL deref triggerable with just write(0).

    Reported-by: syzbot+45b31355bf880e175b73@syzkaller.appspotmail.com

commit dff33c8dcd61b560008d6af05f55bec2ae8a4020
Author: maxv
Date:   Tue Jun 23 17:21:55 2020 +0000

    Rename __MD_CANONICAL_BASE -> __MD_KERNMEM_BASE for clarity.

commit fba53df273316f5c81a140149d8be73d44064335
Author: maxv
Date:   Tue Jun 23 16:08:46 2020 +0000

    kernel_sanitizers.7

commit 9da414f619411cf2dd2a71bcb1d3250c66e290de
Author: maxv
Date:   Mon Jun 22 16:39:56 2020 +0000

    pfil_psz gets dropped by the compiler because it is unused if
    !NET_MPSAFE, so add an #ifdef around it, not to leak memory.

    Found by kLSan.

commit 0c60c87016dbbe462fa7d541e79979e5de5185d3
Author: maxv
Date:   Mon Jun 22 16:29:24 2020 +0000

    Don't leak an unused sysctl log. Found by kLSan.

commit be6e60b5b999faa7c0801e6bf3e18bf3fcfe85f8
Author: maxv
Date:   Mon Jun 22 16:21:29 2020 +0000

    Permanent node doesn't need a log, plus the log gets leaked anyway.
    Found by kLSan.

commit 18c5747ae9deb08dbe3458bf71f666cb91ef6514
Author: maxv
Date:   Mon Jun 22 16:14:18 2020 +0000

    Fix memory leak. Found by kLSan.

commit 7dc0af42570a9839c3a5d23797bd2800c4c34622
Author: maxv
Date:   Fri Jun 19 16:20:22 2020 +0000

    localify

commit 247f5da84da61add3d219825eef73d5665eae7e2
Author: maxv
Date:   Fri Jun 19 16:08:06 2020 +0000

    localify

commit 7de5fead2cefa51d4d7b127d82fae4e3d5ed2195
Author: maxv
Date:   Thu Jun 18 16:56:31 2020 +0000

    style

commit 071d8310c780807c3fe2559266a35c457d08b3ed
Author: maxv
Date:   Thu Jun 18 16:31:15 2020 +0000

    style

commit 7a6c4855d13b94b7a59bbbf4fd5663ecaa1d4d36
Author: maxv
Date:   Thu Jun 18 16:27:24 2020 +0000

    style and fix typo

commit 48cf12a8029aa01ac2cb90d993170f27cedea6a3
Author: maxv
Date:   Thu Jun 18 16:23:43 2020 +0000

    style

commit 8cdb2c30ffb47e1717b11088083c604db1421e65
Author: maxv
Date:   Tue Jun 16 17:25:56 2020 +0000

    remove unused

commit 1af8005ca483d55126dc3fca4a21b771ca401101
Author: maxv
Date:   Tue Jun 16 17:12:18 2020 +0000

    remove unused

commit ece5308207c2e2defe499c2bc3912d9193a57408
Author: maxv
Date:   Mon Jun 8 16:36:18 2020 +0000

    install fault.h

commit f8794815b52e1e1e717b19339f33e03d8b84269e
Author: maxv
Date:   Sun Jun 7 15:19:05 2020 +0000

    Fix bohr bug triggered only once by syzkaller 2,5 months ago.

    In sockopt_alloc(), 'sopt' may already have been initialized with
    'sopt->sopt_data = sopt->sopt_buf'. If the allocation fails, we end up
    with 'sopt->sopt_data = NULL', and later try to free this NULL pointer
    in sockopt_destroy().

    Fix that by not modifying 'sopt_data' if the allocation failed.

    Difficult to reproduce in normal times, but fault(4) makes it easy.

    Reported-by: syzbot+380cb5d518742f063ad2@syzkaller.appspotmail.com

commit 1651e0d7f52760ca21dc9403ef5076abafb3790c
Author: maxv
Date:   Sun Jun 7 09:45:19 2020 +0000

    Add fault(4).

commit b98e3482bd7adad1bf6c5932d741b4f1d98d2bd1
Author: maxv
Date:   Sat Jun 6 07:03:21 2020 +0000

    If the frame is not aligned, leave right away. This place probably
    needs to be revisited, because %rbp could easily contain garbage.

    Reported-by: syzbot+ecb40cf7f8acc102c29b@syzkaller.appspotmail.com

commit 500c33eaba0754341b510a69dae03e19e34e6f08
Author: maxv
Date:   Sat Jun 6 06:42:54 2020 +0000

    kMSan: re-set the orig after pool_cache_get_slow(), using the address
    of the caller of pool_cache_get_paddr(). Otherwise the orig is just
    pool_cache_get_paddr(), and that's not really useful for debugging.

commit b88bb9e2018620d3129278d5970faec3c7ef3c3a
Author: maxv
Date:   Fri Jun 5 17:20:56 2020 +0000

    Register eight vHCI buses, and use separate KCOV mailboxes for them.

commit 42029ecc3374a7a5b640372e64fc1d27b2571fd5
Author: maxv
Date:   Sun May 31 18:33:08 2020 +0000

    Reset ud_ifaces and ud_cdesc to NULL, to prevent use-after-free in
    usb_free_device().

    Reported-by: syzbot+c7e74d0ae89e9f08f863@syzkaller.appspotmail.com

commit b9ddffb1df4d76a3127af16cc2cd3c668d9e5e9d
Author: maxv
Date:   Sun May 31 17:52:58 2020 +0000

    If we failed because we didn't encounter an endpoint, do not attempt to
    read 'ed', because its value is past the end of the buffer, and we thus
    perform out-of-bounds accesses.

    Detected thanks to vHCI+KASAN. First bug found by USB fuzzing.

    Reported-by: syzbot+59e7f6b3f353584ac810@syzkaller.appspotmail.com

commit f0512fcb6b403827ab45294acfb3b282fe82ae1a
Author: maxv
Date:   Sun May 31 08:05:30 2020 +0000

    sc_statuspend is allocated with kmem_zalloc, so no need to memset it.

commit 0eed82068b00f480f523556f89c74bfd675f75b3
Author: maxv
Date:   Sun May 31 07:53:38 2020 +0000

    Add comments.

commit 3f19939b6bc9a3628fe4eb49c01a7443e308c8e0
Author: maxv
Date:   Sat May 30 08:50:31 2020 +0000

    Avoid passing file paths in panic strings, this results in extra long
    output that is annoying and that syzbot classifies as independent
    reports due to the instances having different build paths.

commit 28c8bd54c840cb7926727d24586dac9d774154ba
Author: maxv
Date:   Sat May 30 08:41:22 2020 +0000

    Introduce PTRACE_REGS_ALIGN, and on x86, enforce a 16-byte alignment,
    due to fpregs having fxsave which requires 16-byte alignment.

    Reported-by: syzbot+f44d47e617ebf7fda081@syzkaller.appspotmail.com

commit ab282b49d49b8d84715890732bdb854a282381a6
Author: maxv
Date:   Sun May 24 08:08:49 2020 +0000

    Gather the conditions to return from the VCPU loops in
    nvmm_return_needed(), and use it in nvmm_do_vcpu_run() as well.

    This fixes two undesired behaviors:

     - When a VM initializes, the many nested page faults that need
       processing could cause the calling thread to occupy the CPU too much
       if we're unlucky and are only getting repeated nested page faults
       thousands of times in a row.
     - When the emulator calls nvmm_vcpu_run() and immediately sends a
       signal to stop the VCPU, it's better to check signals earlier and
       leave right away, rather than doing a round of VCPU run that could
       increase the time spent by the emulator waiting for the return.

commit 6cc0bb067f505b05c3547766cc9d53e545d8fa2a
Author: maxv
Date:   Sat May 23 08:25:32 2020 +0000

    Bump copyrights.

commit cb09a061cc5e088c49d99fbbe7e1aefacc6208ff
Author: maxv
Date:   Sat May 23 08:23:28 2020 +0000

    Extract putc().

commit 009be47cbc8af7dbce5469d3ccbb6b0400b85b7b
Author: maxv
Date:   Sat May 23 08:10:50 2020 +0000

    Hum, forgot to include this file in my "Clarify." commit on
    mm.c:rev1.27 and elf.c:rev1.21.

commit 22199ca3e0158c05dc43ee6d62c6965850624cb3
Author: maxv
Date:   Thu May 21 08:20:25 2020 +0000

    Mmh, should check cpuid_level first.

commit ef0698b19c13f2846ae389ef6a4ba43d8bd9d4e2
Author: maxv
Date:   Thu May 21 07:43:23 2020 +0000

    Complete rev1.26: reset nvmm_impl to NULL in nvmm_fini().

commit 471fc45916df38031c3a3fc9c0d4ef7a31147eb2
Author: maxv
Date:   Thu May 21 07:36:16 2020 +0000

    Improve the CPUID emulation on nvmm-intel: limit the highest basic and
    hypervisor leaves.

commit 4334bbd722214700038fb3801561956dabed96bf
Author: maxv
Date:   Thu May 21 05:58:00 2020 +0000

    Increase the number of ports to 8.

commit d3deb06c8043b8e9088c615574b446a1d7368fdb
Author: maxv
Date:   Wed May 20 21:05:21 2020 +0000

    sync with reality

commit 1a1305c46316499741611291a6f8de0d44efa9fc
Author: maxv
Date:   Wed May 20 20:59:31 2020 +0000

    future-proof-ness

commit 8b08d8559ca8bf2b64471cf903022a38ccfd2eda
Author: maxv
Date:   Wed May 20 18:52:48 2020 +0000

    this is kmsan

commit 01f4788ce77f7c8d6646e603a5ac36df5047e3d4
Author: maxv
Date:   Fri May 15 19:28:09 2020 +0000

    hardclock_ticks -> getticks()

commit 76aabb623a3a6c6cf59970d77089cee092058ab5
Author: maxv
Date:   Fri May 15 19:07:01 2020 +0000

    Don't add KCOV instrumentation on top of the KUBSAN instrumentation,
    this is useless and too bloated.

commit 468fc2736f9b412b53f9098d33084072d3c43d4a
Author: maxv
Date:   Fri May 15 13:09:02 2020 +0000

    Introduce kcov_silence_enter() and kcov_silence_leave(), to allow
    temporarily disabling KCOV on the current lwp. Should be used in the
    rare but problematic cases where extreme noise is introduced by an
    uninteresting subsystem.

    Use this capability to silence KCOV during the LOCKDEBUG lookups. This
    divides the size of the KCOV output by more than two in my KCOV+vHCI
    tests.

commit c0fcd3a7b730021e94fc36a99cd15785b347c735
Author: maxv
Date:   Fri May 15 12:34:52 2020 +0000

    Introduce KCOV remote support. This allows collecting KCOV coverage on
    threads other than curlwp, which is useful when fuzzing components
    that defer processing, such as the network stack (partially runs in
    softints) and the USB stack (partially runs in uhub kthreads).

    A subsystem that wishes to provide coverage for its threads creates a
    "mailbox" via kcov_remote_register() and gives it a (subsystem, id)
    identifier. There is one mailbox per "target lwp". The target lwp(s)
    must then call kcov_remote_enter() and kcov_remote_leave() with the
    identifier, to respectively enable and disable coverage within the
    thread.

    On the userland side, the fuzzer has access to the mailboxes on the
    system with the KCOV_IOC_REMOTE_ATTACH and KCOV_IOC_REMOTE_DETACH
    ioctls. When attached to a mailbox with a given identifier, the
    KCOV_IOC_ENABLE, KCOV_IOC_DISABLE and mmap() operations will affect
    the mailbox.

    As a demonstrator, the vHCI subsystem is changed to use KCOV mailboxes.
    When the vHCI bus attaches it creates as many mailboxes as it has USB
    ports, each mailbox being associated with a distinct port. Uhub is
    changed to enable KCOV coverage in usbd_new_device(). With that in
    place, all of the USB enumeration procedure can be traced with KCOV.

commit 4dc57c404f91fab0a4e3adc5ed6e8977e1ca1963
Author: maxv
Date:   Fri May 15 07:51:49 2020 +0000

    It should be allowed to have exactly a usb_descriptor_t.

commit c8deba4f2a8de25d996f6185d3284ceae9861d44
Author: maxv
Date:   Fri May 15 07:47:53 2020 +0000

    Use a generic description when scanning mbufs.

commit 8e07a871e169f46d7a59a6ddd4630f1758a0774a
Author: maxv
Date:   Fri May 15 06:34:34 2020 +0000

    igmp_sendpkt() expects ip_output() to set 'imo.imo_multicast_ttl' into
    'ip->ip_ttl'; but ip_output() won't if the target is not a multicast
    address, meaning that the uninitialized 'ip->ip_ttl' byte gets sent to
    the network. This leaks one byte of kernel heap.

    Fix this by filling 'ip->ip_ttl' with a TTL of one.

    Found by KMSAN.

    Reported-by: syzbot+e49f7b8a8fec5a477c9a@syzkaller.appspotmail.com

commit 14e736dacb7a9dc3fa5703106f8031a7831194f6
Author: maxv
Date:   Thu May 14 18:18:24 2020 +0000

    Fix uninitialized memory access. Found by KMSAN.

    Reported-by: syzbot+9f2a173d29d66c88f9ac@syzkaller.appspotmail.com

commit f02b4a3adf1a4ff91c07b8c335edd770adbc905a
Author: maxv
Date:   Thu May 14 17:01:34 2020 +0000

    KASSERT -> panic

commit 74b41b596ff170deaf768db2b0cdc3979058afb2
Author: maxv
Date:   Thu May 14 16:57:53 2020 +0000

    Don't even try to go past a syscall. Fixes severe panic recursions in
    KUBSAN.

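Editor's illustration for the igmp_sendpkt() fix above (8e07a871). This is a user-space sketch of the bug class, not the kernel code: the header struct is a reduced, hypothetical stand-in for 'struct ip', and the builder function is invented for the example. The point is that a header built in freshly allocated (uninitialized) memory must have every field written by the builder itself, including the TTL, instead of relying on a later layer that may skip it.

```c
#include <stdint.h>
#include <stdlib.h>

#define MODEL_IPPROTO_IGMP	2	/* IANA protocol number for IGMP */
#define MODEL_IGMP_TTL		1	/* the fix uses a TTL of one */

/* Hypothetical reduced header; fields ordered to avoid padding bytes. */
struct model_ip_hdr {
	uint16_t len;
	uint8_t ttl;
	uint8_t proto;
};

/*
 * Build a header in memory returned by malloc(), which is not zeroed.
 * Every field is set here, so no stale heap byte can reach the wire
 * even if the output path does not overwrite the TTL.
 */
static struct model_ip_hdr *
model_build_igmp_hdr(uint16_t len)
{
	struct model_ip_hdr *ip = malloc(sizeof(*ip));

	if (ip == NULL)
		return NULL;
	ip->len = len;
	ip->proto = MODEL_IPPROTO_IGMP;
	ip->ttl = MODEL_IGMP_TTL;	/* the line the fix adds, in spirit */
	return ip;
}
```

The caller owns the returned header and releases it with free().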
commit c2662c76e38c429adb0429897bba287c42a40e72
Author: maxv
Date:   Sun May 10 06:38:24 2020 +0000

    Pass -Wno-unused-command-line-argument for LLVM, discussed on
    tech-toolchain@.

commit 1827e8c4722309a1a2f13a1bb3b1e4096f562824
Author: maxv
Date:   Sun May 10 06:30:57 2020 +0000

    Reintroduce cpu_rng_early_sample(), but this time with embedded
    detection for RDRAND/RDSEED, because TSC is not very strong.

commit d2243eba21d6fc0d8e0bad7da8fce755ffe10037
Author: maxv
Date:   Sun May 10 06:24:16 2020 +0000

    Respect the convention for the hypervisor information: return the
    highest hypervisor leaf in 0x40000000.EAX.

commit 3e034e38c96c3274a81e07f45be3af409f1ec92b
Author: maxv
Date:   Sat May 9 16:18:57 2020 +0000

    Improve the CPUID emulation of basic leaves:

     - Hide DCA and PQM, they cannot be used in guests.
     - On Intel, explicitly handle each basic leaf until 0x16.
     - On AMD, explicitly handle each basic leaf until 0x0D.

commit b88542ff514a50a7228a982a76961fb46f28c9e9
Author: maxv
Date:   Sat May 9 09:08:41 2020 +0000

    A kernel without USER_LDT returns ENOSYS, not ENOTSUP.

commit 8cf8a955993673a8027f28786bdc60694d21cb81
Author: maxv
Date:   Sat May 9 08:39:07 2020 +0000

    On Intel CPUs, CPUID leaf 0xB, too, provides topology information, so
    filter it correctly, to avoid inconsistencies if the host has SMT.

    This fixes HaikuOS which fetches SMT information from there and would
    panic because of the inconsistencies.

commit c6ad9531be7ac53f80b2224d104ed2c29f8cd976
Author: maxv
Date:   Thu May 7 21:05:37 2020 +0000

    Forgot to commit this file as part of elf.c::rev1.21 mm.c::rev1.27.

commit e80a9a237ee561e8566f2c3dd9b9f8a09bae95c1
Author: maxv
Date:   Thu May 7 19:25:57 2020 +0000

    Localify.

commit bf7b1eb7687a0b5b494f6482fa766216e4cebac2
Author: maxv
Date:   Thu May 7 18:13:05 2020 +0000

    Fix LOCKDEBUG compilation on i386.

commit 0cca6bed7eb2a59955c7f93f401f271c95f07a03
Author: maxv
Date:   Thu May 7 18:02:48 2020 +0000

    Update the comments.

commit 04cc92fd41c57b2a80200087f4ec73263fd60e5c
Author: maxv
Date:   Thu May 7 17:58:26 2020 +0000

    Clarify.

commit f8b1241e1aeadf598f0910b938f61a793ecfed0a
Author: maxv
Date:   Thu May 7 17:10:02 2020 +0000

    Explain more.

commit 661a70914997b5fe822e50079f90556965a7e087
Author: maxv
Date:   Thu May 7 16:49:59 2020 +0000

    If we encounter relocations from a section that the bootloader dropped,
    AND if the section is a note, then skip the relocations.

    Considering a note that the bootloader dropped, there are two possible
    sides for the relocations: (1) the relocations from the note towards
    the rest of the binary, and (2) the relocations from the rest of the
    binary towards the note.

    We skip (1), which is correct, because the notes do not play any role
    at run time. If we encounter (2) however then there is a bug in the
    kernel, so add a sanity check against that.

    This fixes KASLR since the latest Xen changes (which introduced
    .note.Xen).

commit 4b89f9c718d029af20813d8c733927875f607c0e
Author: maxv
Date:   Tue May 5 19:26:47 2020 +0000

    Gather the section filtering in a single function, and add a sanity
    check when relocating, to make sure the section we're accessing is
    mappable.

    Currently this check fails, because of the Xen section, which has RELAs
    but is an unmappable unallocated note.

    Also improve the prekern ASSERTs while here.

commit 5ee43fa31810e23c7ce8a9521cf4557d2d491b68
Author: maxv
Date:   Tue May 5 06:32:43 2020 +0000

    Fix KASAN, init_xen_early must be called after kasan_early_init.

commit bc03756eb517c3cc50bb76a48521eea4637d62bb
Author: maxv
Date:   Sat May 2 16:28:37 2020 +0000

    Call kasan_early_init earlier, to unbreak KASAN after the recent RNG
    changes. Will also prevent further trouble.

commit 4bbfc45ae042eb8cb560a9464c9865fc1121f7d2
Author: maxv
Date:   Sat May 2 16:25:47 2020 +0000

    Remove the D bit as part of the hotpatch cleanup procedure.

commit 1b0e82f3cd93d40580ba77340532b13b10d77cbc Author: maxv Date: Sat May 2 11:37:17 2020 +0000 Modify the hotpatch mechanism, in order to make it much less ROP-friendly. Currently x86_patch_window_open is a big problem, because it is a perfect function to inject/modify executable code with ROP. - Remove x86_patch_window_open(), along with its x86_patch_window_close() counterpart. - Introduce a read-only link-set of hotpatch descriptor structures, which reference a maximum of two read-only hotpatch sources. - Modify x86_hotpatch() to open a window and call the new x86_hotpatch_apply() function in a hard-coded manner. - Modify x86_hotpatch() to take a name and a selector, and have x86_hotpatch_apply() resolve the descriptor from the name and the source from the selector, before hotpatching. - Move the error handling in a separate x86_hotpatch_cleanup() function, that gets called after we closed the window. The resulting implementation is a bit complex and non-obvious. But it gains the following properties: the code executed in the hotpatch window is strictly hard-coded (no callback and no possibility to execute your own code in the window) and the pointers this code accesses are strictly read-only (no possibility to forge pointers to hotpatch an area that was not designated as hotpatchable at compile-time, and no possibility to choose what bytes to write other than the maximum of two read-only templates that were designated as valid for the given destination at compile-time). With current CPUs this slightly improves a situation that is already pretty bad by definition on x86. Assuming CET however, this change closes a big hole and is kinda great. The only ~problem there is, is that dtrace-fbt tries to hotpatch random places with random bytes, and there is just no way to make it safe. 
However dtrace is only in a module, that is rarely used and never compiled into the kernel, so it's not a big problem; add a shitty & vulnerable independent hotpatch window in it, and leave big XXXs. It looks like fbt is going to collapse soon anyway. commit a4fdbd40f17e730664c722b38775178e7749493e Author: maxv Date: Sat May 2 11:12:49 2020 +0000 Remove unused. commit a79320ea3f0a5b85e3f5b233b279d5854ba14b00 Author: maxv Date: Fri May 1 09:40:47 2020 +0000 Switch the rest of i386 to the x86_hotpatch mechanism. commit 039330cc45e9e60d672b036ecc0c6d551746e74f Author: maxv Date: Fri May 1 09:23:43 2020 +0000 Remove dead code, we are in an #ifndef XENPV block here. commit a389d921e35abe5788c15287cb02ce13b811d049 Author: maxv Date: Fri May 1 09:17:58 2020 +0000 Use absolute jumps, and drop the PC-relative patching. We want exact templates. commit f8a035b51f57ad1b7be323a7964b31c4a5bf97d9 Author: maxv Date: Fri May 1 08:32:50 2020 +0000 Use the hotpatch framework when patching _atomic_cas_64. commit 83bf34c8da54539c9797708bac6df96d5a4cc8f7 Author: maxv Date: Fri May 1 07:03:02 2020 +0000 Explicitly align to 8 bytes, found by kUBSan. Reported-by: syzbot+f1e1561ed739db869d44@syzkaller.appspotmail.com commit 40011a39b92d4028859e73c45794638bbe508447 Author: maxv Date: Thu Apr 30 17:21:12 2020 +0000 The labels are already global, drop unused. commit cdc420795765695a392b2ccb58ffbd537e04298d Author: maxv Date: Thu Apr 30 17:17:33 2020 +0000 Switch to templates. commit 2f9600b3f9fb1ed0d7b72800ad9eabb4f631afb7 Author: maxv Date: Thu Apr 30 16:56:23 2020 +0000 If we were processing a software int/excp, and got a VMEXIT in the middle, we must also reflect the instruction length, otherwise the next VMENTER fails and Qemu shuts the guest down. commit 94f77039ec57dcb27999f5ee45faa4528beaf4ae Author: maxv Date: Thu Apr 30 16:50:17 2020 +0000 When the identification fails, print the reason. 
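Illustration for commit 1b0e82f3 above (hotpatch rework). What follows is only a user-space model of the idea, not the kernel code: a read-only descriptor table maps a name to at most two read-only templates, and the caller passes only a name and a selector, never a pointer or bytes. Every identifier below (`example_hp_desc`, `example_descs`, `example_site`, `example_hotpatch`) is hypothetical, and the real implementation additionally opens and closes a write window around the copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read-only templates; the byte values are the usual x86 encodings. */
static const uint8_t example_tmpl_nop[3] = { 0x0f, 0x1f, 0x00 };    /* 3-byte NOP */
static const uint8_t example_tmpl_lfence[3] = { 0x0f, 0xae, 0xe8 }; /* LFENCE */

/* Stand-in for a patchable code site; a real kernel would patch .text. */
static uint8_t example_site[3] = { 0x0f, 0x1f, 0x00 };

/* Hypothetical descriptor: one destination, at most two valid sources. */
struct example_hp_desc {
	uint8_t name;		/* identifies the patch point */
	uint8_t nsrc;		/* number of valid templates (<= 2) */
	const uint8_t *srcs[2];	/* read-only templates */
	size_t len;		/* size of the patch point and templates */
	uint8_t *dst;		/* destination fixed at compile time */
};

/* Read-only table, playing the role of the link-set in the commit. */
static const struct example_hp_desc example_descs[] = {
	{ 1, 2, { example_tmpl_nop, example_tmpl_lfence }, 3, example_site },
};

/*
 * Resolve the descriptor from the name and the template from the
 * selector, then copy. Returns 0 on success, -1 if either is unknown.
 */
int
example_hotpatch(uint8_t name, uint8_t sel)
{
	size_t i;

	for (i = 0; i < sizeof(example_descs) / sizeof(example_descs[0]); i++) {
		const struct example_hp_desc *d = &example_descs[i];

		if (d->name != name)
			continue;
		if (sel >= d->nsrc)
			return -1;
		memcpy(d->dst, d->srcs[sel], d->len);
		return 0;
	}
	return -1;
}
```

Calling `example_hotpatch(1, 1)` installs the LFENCE template at the site; an unknown name or an out-of-range selector is rejected, so the caller cannot choose arbitrary bytes or an arbitrary destination.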
commit 0132044a58abc943d8c98fa0c9fd960730847568 Author: maxv Date: Sun Apr 26 19:31:36 2020 +0000 In nvmm_open(), make sure an implementation was found. This fixes an initialization bug triggerable in certain conditions. If you build nvmm inside the kernel, AND have a cpu that is not supported, AND run nvmmctl (or qemu-nvmm, both being the only binaries in the "nvmm" group), you get a page fault. This is because when nvmm is built inside the kernel, the kernel registers nvmm_cdevsw behind nvmm's back. The ioctl is therefore always accessible, and will hit NULL pointers if nvmm_init() failed. Problem reported by Andrei M. on netbsd-users@, thanks. commit 82b887c98cb96198732c72fd9aeb8a152dadc6ca Author: maxv Date: Sun Apr 26 14:49:17 2020 +0000 Use the hotpatch framework for LFENCE/MFENCE. commit 00e8704ff706faff3d3a729e449dd5cd6828927c Author: maxv Date: Sun Apr 26 14:07:43 2020 +0000 Put the template functions in the rodata section; they get hotpatched into other places, but never execute directly. commit 059bf1c9f61baa4d3c132aae529de9ecf332c01b Author: maxv Date: Sun Apr 26 13:59:44 2020 +0000 Remove unused argument in macro. commit bdf3e2b952a2770d52263c87f344164c5eed46f2 Author: maxv Date: Sun Apr 26 13:54:02 2020 +0000 Remove unused. commit f97e49b1f49c24661f066e3d2cf8d85b8f728ef2 Author: maxv Date: Sun Apr 26 13:37:14 2020 +0000 Drop the hardcoded array, use the hotpatch section. commit c6eb88ce8deb99663f0df373416764ae21ba5445 Author: maxv Date: Sun Apr 26 12:13:10 2020 +0000 Add a test on the maximum number of slots. commit 9b3792f469fc6b5d844b345cec4a9cd3c4b7d411 Author: maxv Date: Sun Apr 26 11:56:38 2020 +0000 Split in sub-tests for clarity, and add a new test, marked as expected failure for now. commit 46abf8266804975350e40dcfcf193996038c8d42 Author: maxv Date: Sun Apr 26 09:08:40 2020 +0000 Add tests on the x86 PTEs. We scan the MMU page tables directly and verify certain properties. 
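Illustration for commit 0132044a above (nvmm_open). A minimal user-space sketch of the guard, assuming a global backend pointer that stays NULL when initialization fails. The names `example_impl`, `example_impl_ptr`, `example_open` and `example_set_impl` are hypothetical, and the choice of ENXIO as the error code is an assumption, not taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend descriptor, set only if init succeeded. */
struct example_impl {
	int (*init)(void);
};

/* Stays NULL when no supported CPU backend was found. */
static const struct example_impl *example_impl_ptr = NULL;

/* Called by the (hypothetical) init path once a backend is chosen. */
void
example_set_impl(const struct example_impl *impl)
{
	example_impl_ptr = impl;
}

/*
 * The device node can exist even if init failed, so the open path has
 * to refuse early instead of letting later ioctls hit NULL pointers.
 */
int
example_open(void)
{
	if (example_impl_ptr == NULL)
		return ENXIO;	/* assumed error code */
	return 0;
}
```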
commit 6b101561069063aabdae3f9a7d43d9b6fa80e3ba Author: maxv Date: Sat Apr 25 05:17:16 2020 +0000 Switch to the new PTE naming. The old naming is now unused, remove it. commit a515a5baff49e3c6270f2f1e0807e87584fa8475 Author: maxv Date: Fri Apr 24 16:27:27 2020 +0000 Give the ldt a fixed size of one page (512 slots), and drop the variable- sized mechanism that was too complex. This fixes a race between USER_LDT and SVS: during context switches, the way SVS installs the new ldt relies on the ldt pointer AND the ldt size, but both cannot be accessed atomically at the same time. commit cdab2503d3f06b1639fded578e0946cb381c9919 Author: maxv Date: Wed Apr 22 16:24:15 2020 +0000 We have USER_LDT tests in ATF, remove the ones from regress. commit 8f5c5690fbc99b268b5d8ad08528913dc6d2cf15 Author: maxv Date: Mon Apr 20 16:32:03 2020 +0000 Add three KASSERTs, to detect refcount bugs. This narrows down an unknown bug in some place near, that has manifested itself in various forms (use-after-frees, uninit accesses, page faults, segmentation faults), all pointed out by syzbot. The first KASSERT in fixjobc() fires when the bug is encountered. commit 5460d8d2097c9c0db7cd7125253337d78601b60d Author: maxv Date: Sun Apr 19 13:22:58 2020 +0000 Add tests for USER_LDT. commit 37081eff049098d860835e0d121c1ca08f04038e Author: maxv Date: Fri Apr 17 17:24:46 2020 +0000 Slightly reorder for clarity, and add header. commit 2786a5b3224bb3f90760dbe8cc271fa7daaa0ab7 Author: maxv Date: Wed Apr 15 17:28:26 2020 +0000 Drop the todo and qualify the accesses. commit 6dba72453e7b781428843371e7bce4199abb14e3 Author: maxv Date: Wed Apr 15 17:16:22 2020 +0000 Introduce POOL_NOCACHE, simple option to cancel pool_caches and go directly to the pool layer. It is taken out of POOL_QUARANTINE. Advertise POOL_NOCACHE for kMSan rather than POOL_QUARANTINE. With kMSan we are only interested in the no-caching effect, not the quarantine. This reduces memory pressure on kMSan kernels. 
commit d2f48fc91430733c0f29f3c26771aeae52aa487b Author: maxv Date: Wed Apr 15 17:00:07 2020 +0000 Use large pages for the kASan shadow, same as kMSan. commit 716f38f39bfad40dc5b664dba5a4cee97674a674 Author: maxv Date: Wed Apr 15 16:28:28 2020 +0000 Use large pages for the kMSan shadows. This greatly improves performance, and slightly reduces memory consumption. commit 4657dcbe0a5b625248ac0d10346274a7a3363bbe Author: maxv Date: Mon Apr 13 16:09:21 2020 +0000 Use relaxed atomics on spc_mcount. commit 6dd0184a42d1b6c811885db88939f34bee40c740 Author: maxv Date: Mon Apr 13 15:54:45 2020 +0000 hardclock_ticks -> getticks() commit 0ca5e7929e77cdee2c3f3247da81bc668fcc85e1 Author: maxv Date: Mon Apr 13 11:44:20 2020 +0000 Add KUBSAN. commit d2ae2316d44457ad066487d4b85b002124280f45 Author: maxv Date: Mon Apr 13 09:34:02 2020 +0000 Make KASAN compatible with LLVM. Same as GCC, except that LLVM aggressively inlines the shadow checks, and this causes problems at boot time; so we pass -asan-instrumentation-with-call-threshold=0 to force callbacks instead of inlines. commit 705efdeb4051865ff50308dcd055dd739c532a3f Author: maxv Date: Mon Apr 13 08:05:02 2020 +0000 constify commit 5028663c0099ea3aef54f0da7ccba11d4cc64679 Author: maxv Date: Mon Apr 13 07:32:36 2020 +0000 Add KASAN instrumentation on on-stack VLAs, same as amd64. commit 050ada8597ecf04420951ca938adc51412f1d37d Author: maxv Date: Mon Apr 13 07:09:50 2020 +0000 Add KASAN-DMA support on aarch64, same as amd64. Discussed with skrll@. commit 1fe341f28aceb8ad401ec8eaa072ee03964f534e Author: maxv Date: Mon Apr 13 06:24:52 2020 +0000 Note PAC and BTI. commit 00456e138cb08e903d9c871e239c1ff9c0808fec Author: maxv Date: Mon Apr 13 06:02:03 2020 +0000 Meant to do a store here, not a load. Ie we want to replace the initial weak key by the stronger one we just generated. Rototilled this place too many times. 
commit d6f9a6f3972c8988ae125d734a6c75e57181f2b2 Author: maxv Date: Mon Apr 13 05:40:25 2020 +0000 Add support for Branch Target Identification (BTI). On the executable pages that have the GP (Guarded Page) bit, the semantic of the "br" and "blr" instructions is changed: the CPU expects the first instruction of the jump/call target to be "bti", and faults if it isn't. We add the GP bit on the kernel .text pages (and incidentally the .rodata pages, but we don't care). The compiler adds a "bti c" instruction at the beginning of each C function. We modify the ENTRY() macros to manually add "bti c" in the asm functions. cpuswitch.S needs a specific change: with "br x27" the CPU expects "bti j", which is bad because the functions begin with "bti c"; switch to "br x16", for the CPU to accept "bti c". BTI helps defend against JOP/COP. Tested on Qemu. commit 8dfafde81682f709a5f3e00faf3c6b4763bd74fd Author: maxv Date: Sun Apr 12 07:49:58 2020 +0000 Add support for Pointer Authentication (PAC). We use the "pac-ret" option, to sign the return instruction pointer on function entry, and authenticate it on function exit. This acts as a mitigation against ROP. The authentication uses a per-lwp (secret) I-A key stored in the 128bit APIAKey register and part of the lwp context. During lwp creation, the kernel generates a random key, and during context switches, it installs the key of the target lwp on the CPU. Userland cannot read the APIAKey register directly. However, it can sign its pointers with it, because the register is architecturally shared between userland and the kernel. Although part of the CPU design, it is a bit of an undesired behavior, because it allows to forge valid kernel pointers from userland. To avoid that, we don't share the key with userland, and rather switch it in EL0<->EL1 transitions. This means that when userland executes, a different key is loaded in APIAKey than the one the kernel uses. For now the userland key is a fixed 128bit zero value. 
The DDB stack unwinder is changed to strip the authentication code from the pointers in lr. Two problems are known: * Currently the idlelwps' keys are not really secret. This is because the RNG is not yet available when we spawn these lwps. Not overly important, but would be nice to fix with UEFI RNG. * The key switching in EL0<->EL1 transitions is not the most optimized code on the planet. Instead of checking aarch64_pac_enabled, it would be better to hot-patch the code at boot time, but there currently is no hot-patch support on aarch64. Tested on Qemu. commit 9e4d1f670208c499235021c98e3078e5b73e8fa2 Author: maxv Date: Sun Apr 12 07:16:09 2020 +0000 Don't inline cprng_strong{32,64}(), so they can be called from asm. commit 33da2672ceb38c2151b300787da0dbdc2b057874 Author: maxv Date: Sat Apr 11 09:02:04 2020 +0000 The vectors allow for up to 0x80 bytes of instructions, but we've reached this limit already, so implement the handler functions outside, and jump to them. This allows to add instructions in the future. Sent to ryo@ and skrll@. commit ad578b079ca4a8235e87f7ba49d50c7140eac444 Author: maxv Date: Sat Apr 4 07:03:57 2020 +0000 KCOV doesn't depend on specificdata and cpu_intr_p() anymore, so drop references. commit e3cabdacb5b18ff0f16f2d0525dddae3b9e81fb3 Author: maxv Date: Sat Apr 4 06:51:46 2020 +0000 Drop specificdata from KCOV, kMSan doesn't interact well with it. Also reduces the overhead. commit 3ca1607dd1a5d10653b652a032bac96a956e7424 Author: maxv Date: Fri Apr 3 19:09:43 2020 +0000 Avoid overflows when reading strings. commit fee1bdd93038e64c64aa651fc7873f5f978f1a5d Author: maxv Date: Fri Apr 3 18:44:50 2020 +0000 Add KASAN instrumentation on strcat/strchr/strrchr. commit 92f6f58f77a070323349f753bb346921484769f1 Author: maxv Date: Fri Apr 3 18:26:14 2020 +0000 Verify that the terminating '\0', too, is initialized. commit 360e421a98db74ec5653094fa87af253a3136ffe Author: maxv Date: Fri Apr 3 18:12:39 2020 +0000 Add KASAN instrumentation on on-stack VLAs. 
commit 7ea28430bd08603b8dcf79841500c1d2be986a88 Author: maxv Date: Thu Apr 2 16:31:37 2020 +0000 Add a comment. commit f5394bfafda0faffea8adeff269fe22ae11ccad5 Author: maxv Date: Thu Apr 2 16:29:30 2020 +0000 Hide 'hardclock_ticks' behind a new getticks() function, and use relaxed atomics internally. Only one caller is converted for now. Discussed with riastradh@ and ad@. commit 7b406fe3bc19de3956456a6e42faede85681a876 Author: maxv Date: Tue Mar 31 16:34:25 2020 +0000 Publish the request/response structures too. commit 0191dafe2a2a990cd365ca4f0945801593c653b5 Author: maxv Date: Tue Mar 31 16:28:28 2020 +0000 Put the ioctl definitions in a header, and install it. commit 030d268ece156e29ea1fdd2a697b75380a1f5076 Author: maxv Date: Tue Mar 31 16:17:32 2020 +0000 Allow short transfers. We introduce a third packet, in the U->H list, that contains a vhci_response_t, which indicates the size. commit 957aebfb8af312577373b10aa6be772684dc7960 Author: maxv Date: Sun Mar 29 09:46:14 2020 +0000 store the request buffer in the vxfer instead of the packet, clearer commit 50f51d7e9cdb6ca47ca08029c2b079c099f219ea Author: maxv Date: Tue Mar 24 17:20:55 2020 +0000 Remove the argument from USB_{ATTACH,DETACH}, for consistency. commit bb791dfaecfd037796bb2c07d389c77890eb4c7e Author: maxv Date: Tue Mar 24 07:12:16 2020 +0000 Fix type confusion. Found by kASan when doing a normal attach+detach over vHCI. commit b63295fc4053cc61e2b9c4f8a1099d0321f5acb7 Author: maxv Date: Tue Mar 24 07:11:07 2020 +0000 Use a vhci_request_t, will be required for future changes. commit 7abf38f693f45f351be51a5b861fae10ecd48823 Author: maxv Date: Sun Mar 22 17:15:15 2020 +0000 Add internal support for multiple endpoints. commit f7a214b194094a6e248e0af57b23c5135058e2b6 Author: maxv Date: Sun Mar 22 15:14:03 2020 +0000 clarify and explain commit cd4e7970c90c2420783512144491099fbc4ea3a1 Author: maxv Date: Tue Mar 17 17:18:49 2020 +0000 Add a redzone between the pcb and the stack. Sent to port-amd64@. 
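Illustration for commit f5394bfa above (getticks). A sketch of the accessor pattern using C11 `<stdatomic.h>` as a stand-in for the kernel's own relaxed atomic primitives. The names `example_ticks`, `example_getticks` and `example_hardclock` are hypothetical, and the `int` counter type is an assumption.

```c
#include <stdatomic.h>

/* The counter is private; readers go through the accessor. */
static _Atomic int example_ticks;

/* Reader side: a relaxed load, no ordering with other memory. */
int
example_getticks(void)
{
	return atomic_load_explicit(&example_ticks, memory_order_relaxed);
}

/* Single-writer side, as a clock interrupt handler would do it. */
void
example_hardclock(void)
{
	int t = atomic_load_explicit(&example_ticks, memory_order_relaxed);

	atomic_store_explicit(&example_ticks, t + 1, memory_order_relaxed);
}
```

Hiding the variable behind a function means callers cannot perform plain, unannotated reads of it, which is what lets the relaxed-atomic discipline be enforced in one place.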
commit d78235cacba67bde996b04aba107657fd03cd400 Author: maxv Date: Sat Mar 14 05:19:50 2020 +0000 On amd64, mark the whole tree as NX. No real functional change, just to prevent possible future surprises, and to make it a little harder to map executable pages in ROP chains. commit 83674352d89315f59d60b81e00c0069da68adce0 Author: maxv Date: Sat Mar 14 04:55:14 2020 +0000 style commit 36ae7bd433408ce02a0b42bc6d04935ec52da3e3 Author: maxv Date: Sat Mar 14 04:49:33 2020 +0000 fix memory leaks commit 12e61f97eb006672bff8108f0f19296d38fedb13 Author: maxv Date: Sat Mar 14 04:39:15 2020 +0000 wrong size passed to copyout commit 5cc84c31f1fc5d95ecd9072eda462ecf1318adc7 Author: maxv Date: Sat Feb 29 11:40:06 2020 +0000 pass the address of the field, instead of relying on it being the first field of the structure/union, no functional change, discussed with plunky@ commit fbf43bbcc504597ba3a021f5a3b5ccb767dcb529 Author: maxv Date: Wed Feb 26 18:00:12 2020 +0000 Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as ufs_makedirentry(). Found by kMSan: the unzeroed bytes of the pool_cache were getting copied to the disk via a DMA write operation, and there kMSan was noticing uninitialized memory leaving the system. Reported-by: syzbot+382c9dffc06a9683abb5@syzkaller.appspotmail.com commit 9f5c5bd58a8dbca27434303dc0c2c63e8b384ff3 Author: maxv Date: Sat Feb 22 20:12:40 2020 +0000 add relaxed atomics, ok ad@ riastradh@ commit a11bfc8e8f998ee19bbeca4adbe70e981f6a1036 Author: maxv Date: Sat Feb 22 20:08:39 2020 +0000 Be less strict: when copyinstr() returns ENAMETOOLONG, it does initialize the buffer, so mark it as such. 
commit 30a965fd9d07511d61523caadb54899f0a754527 Author: maxv Date: Sat Feb 22 09:42:20 2020 +0000 pass the address of the field, instead of relying on it being the first field of the structure, no functional change commit 35e310c50811c3a169121373d42726e17be427bf Author: maxv Date: Sat Feb 22 09:30:42 2020 +0000 pass the address of the field, instead of relying on it being the first field of the structure, no functional change commit cc2c9b2c945d653a35841ce7c587df4d893c1f3c Author: maxv Date: Sat Feb 22 09:24:05 2020 +0000 pass the address of the field, instead of relying on it being the first field of the structure, no functional change, ok kamil commit 909d1af87a6dac2c7787e62b2306cf117eb5a6ad Author: maxv Date: Sat Feb 22 08:58:39 2020 +0000 Inline the block in the parent block, for clarity, and also to prevent a false positive with kMSan. Here, LLVM reorders the conditions and checks 'vattr' before 'error'. But if 'error' is non-zero then 'vattr' is not initialized, and kMSan notices the uninitialized memory read. commit 9a5d6291028b2e4e160f9553e2c4b1cf5d937740 Author: maxv Date: Sat Feb 22 08:39:33 2020 +0000 Zero out 'tv', to prevent uninitialized bytes in its padding from leaking to userland. Found by kMSan. Reported-by: syzbot+8134380511a82c8f5fd7@syzkaller.appspotmail.com commit 52c48dc1a8688827e6ada414c5492b7d813e296c Author: maxv Date: Fri Feb 21 18:34:37 2020 +0000 In pmap_changeprot_local(), drop the dirty bit along with the write bit. commit 8d92e249083fa7a5a3226a9d2609e4adb81927a3 Author: maxv Date: Fri Feb 21 18:31:55 2020 +0000 Add comments. commit bbf976964cd1c7860e0a724a35c1855902c0ee4e Author: maxv Date: Sun Feb 16 09:53:54 2020 +0000 Improve the check, to prevent more surprises. commit 344d09ff3c36507f5e4226954d3ed9ecbe7a2069 Author: maxv Date: Sun Feb 16 09:40:35 2020 +0000 Move usb_desc_* into usbdi_util.c, no functional change. commit c866bfa4d82bc76f7d5622611f25ddf909f43a97 Author: maxv Date: Sat Feb 15 10:41:25 2020 +0000 Explain more. 
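Illustration for commit 9a5d6291 above (zeroing 'tv'); the d_namlen fix follows the same logic. A sketch of the pattern: zero the whole structure before filling its fields, so that padding bytes never carry stale stack or pool contents out of the kernel. The structure and function names are hypothetical; whether the structure actually has padding depends on the ABI (on common LP64 ABIs there are 4 bytes between the two fields).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply structure copied out to userland. */
struct example_reply {
	int32_t code;
	int64_t stamp;	/* usually preceded by 4 bytes of padding */
};

/* Zero everything first, padding included, then set the fields. */
void
example_fill_reply(struct example_reply *r, int32_t code, int64_t stamp)
{
	memset(r, 0, sizeof(*r));
	r->code = code;
	r->stamp = stamp;
}

/* Helper for inspection: count the bytes in a buffer equal to v. */
size_t
example_count_byte(const void *p, size_t len, uint8_t v)
{
	const uint8_t *b = p;
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (b[i] == v)
			n++;
	}
	return n;
}
```

Filling the structure with a recognizable pattern before calling `example_fill_reply()` and then counting the surviving pattern bytes is a cheap way to check that nothing stale is left, which is roughly what kMSan automates.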
commit 6db0083a918cdf1acbb816067a9cd87f43cb6ecf Author: maxv Date: Sun Feb 9 12:19:01 2020 +0000 Reference nvmmctl(8). commit cbb1c4572cfc63d6950b30b512123e33020139a5 Author: maxv Date: Sat Feb 8 09:05:08 2020 +0000 Sync the codes with reality: partial replaced by mid, and use-after-ret added. commit b237b3e3b142847f3c31c876c384a1d0de92f8bb Author: maxv Date: Sat Feb 8 08:47:27 2020 +0000 Move three functions into usbdi_util.c, where they belong. No functional change. commit e1596387386ca6ac9a9508d7997660fac76a3b39 Author: maxv Date: Sat Feb 8 08:18:06 2020 +0000 Reorder usbdi_util.{c,h}, for clarity. No functional change. commit 3bc47d0d3af0c59073b380b86462988a8b057cec Author: maxv Date: Sat Feb 8 07:57:16 2020 +0000 Dedup usb_desc_iter_next with usb_desc_iter_peek. commit 0c1d0449d5bf5929586d39642129543639c1f2b2 Author: maxv Date: Sat Feb 8 07:53:23 2020 +0000 Introduce usbd_clear_endpoint_feature(), and dedup. commit e44657ff4aee8d429871150d82b9f04603159e01 Author: maxv Date: Sat Feb 8 07:38:17 2020 +0000 Move uvideo's parsers into usbdi.c, to make them global. Rename usb_desc_iter_peek_next -> usb_desc_iter_peek for consistency. commit 90289361a08b9af7df687d5a76bfecd0455d3268 Author: maxv Date: Sat Feb 8 07:24:46 2020 +0000 constify commit 0869427b16d4c158531fbd7cc97e6e66affd6b92 Author: maxv Date: Sat Feb 8 07:20:41 2020 +0000 localify commit 09e647526dec1dad5ef5d4069ebb49c6aeb98bc7 Author: maxv Date: Sat Feb 8 07:19:09 2020 +0000 constify commit bfdaf9be63ffbddb8f8ebb755445341d67ef702a Author: maxv Date: Sat Feb 8 07:07:06 2020 +0000 Retire KLEAK. KLEAK was a nice feature and served its purpose; it allowed us to detect dozens of info leaks on the kernel->userland boundary, and thanks to it we tackled a good part of the infoleak problem 1.5 years ago. Nowadays however, we have kMSan, which can detect uninitialized memory in the kernel. 
kMSan supersedes KLEAK: it can detect what KLEAK was able to detect, but in addition, (1) it operates in all of the kernel and not just the kernel->userland boundary, (2) it requires no user interaction, and (3) it is deterministic and not statistical. That makes kMSan the feature of choice to detect info leaks nowadays; people interested in detecting info leaks should boot a kMSan kernel and just wait for the magic to happen. KLEAK was a good ride, and a fun project, but now is time for it to go. Discussed with several people, including Thomas Barabosch. commit 7a2c69b6da82e808818ae7be10a27983f39a79c7 Author: maxv Date: Fri Jan 31 09:23:58 2020 +0000 BTI definitions. commit 5a077e575792ffeb416265eeea5acdfc821333fa Author: maxv Date: Fri Jan 31 09:08:57 2020 +0000 D means E here (aarch32), so don't check it. A-I-F are checked below already, so drop the whole line. commit 8541437a8d4c16c77277598f0dc57e44b47957da Author: maxv Date: Fri Jan 31 09:01:23 2020 +0000 Fix copyout overflows in fhstat, found by the LGTM bot. Not a big problem since this syscall is privileged. commit b88c43f3457d2e226b9178be3ddd42264ed5c10e Author: maxv Date: Fri Jan 31 08:55:38 2020 +0000 'oldlwp' is never NULL now, so remove the NULL checks. commit 029c958d1f2da86196da15fe26aeb74fff262cfd Author: maxv Date: Fri Jan 31 08:26:10 2020 +0000 Be more informative. commit ec7ed85d524a6b7cd408c50a294c0a021f89bd36 Author: maxv Date: Fri Jan 31 08:21:11 2020 +0000 constify commit f09c8573104a811466d197cff492d00c74c4f2c3 Author: maxv Date: Tue Jan 28 18:02:30 2020 +0000 More SCTLR. commit c4b014883bf1d84519c2f286c3115ffa4fa47f05 Author: maxv Date: Tue Jan 28 17:47:50 2020 +0000 Fetch ID_AA64MMFR2_EL1. Okayed by Nick the other day. commit 6598ba45f8d04fe3fc2fa7910cfdb5f037b58631 Author: maxv Date: Tue Jan 28 17:36:42 2020 +0000 More identification. 
commit ec85fca96eeb19497bd584d6bd1b063ace35976f Author: maxv Date: Tue Jan 28 17:33:07 2020 +0000 Jazelle and T32EE are not part of ARMv8, fix the bits to their real meanings. No functional change. commit e3735426cdd572fe80a36e76944e15184b2b2ab0 Author: maxv Date: Tue Jan 28 17:23:30 2020 +0000 More definitions. commit f8a7284b8356bbad8a78291006c93c8d0bb20351 Author: maxv Date: Sat Jan 25 15:59:11 2020 +0000 Fix uninitialized variable. There may not be a TYPE_ASCII block. Found by kMSan with nouveau. commit d62ec35051406bd31d36dadda1e71546a643d646 Author: maxv Date: Sat Jan 25 15:55:33 2020 +0000 Actually, uio_vmspace is never NULL, the check should be against pmap_kernel. commit 9b2b5dc3cf0aa2217ac65345c17cb38a0bb4cd0f Author: maxv Date: Thu Jan 9 16:27:57 2020 +0000 Registering the host's CR0 is done outside of the VCPU loop, so it must be cleared because it is also cleared inside the loop. Not clearing it could trigger DNAs on VMEXITs, because STTS/CLTS are still here as part of debugging since my FPU overhaul. commit 89261c10455ee770e9920d4d8a797a7c1274af48 Author: maxv Date: Thu Jan 9 16:20:12 2020 +0000 Mmh, as noted in PR/54847, this should be uint64_t, not uint16_t. Harmless because we use only the two lowest bits anyway. I believe this could be caught by KUBSAN; time to do another round of NVMM+K_SAN testing. commit 14c7394c5f4e9317f15f17bb781f8b9306717478 Author: maxv Date: Tue Jan 7 06:42:26 2020 +0000 Localify, constify. commit 7fbf46de7da00ed8e985245df66d5ec22169bf2e Author: maxv Date: Tue Jan 7 06:14:42 2020 +0000 Set 'ntencpass' to NULL as part of 'again', to prevent use-after-free. commit b5dcdba757c9ab9fd131078d62e697b9bdc435a2 Author: maxv Date: Tue Jan 7 06:12:09 2020 +0000 Set 'ld_sync' to NULL as part of 'again', to prevent use-after-free. commit a7ccaaae07445f1925b776957cdc193c0216266d Author: maxv Date: Tue Jan 7 06:10:18 2020 +0000 Fix big bugs. 
commit 9be4a629f3c35db4454e0bbc93897b787194f529 Author: maxv Date: Fri Jan 3 08:53:14 2020 +0000 Don't forget to initialize 'sin6_len'. With kASan, from time to time the value will be bigger than the size of the source, and we get a read overflow. With kMSan the uninitialized access is detected immediately. Reported-by: syzbot+841ca14baccec37b4f8f@syzkaller.appspotmail.com commit 4950ff50d183c106941b9f48e68e41bddd6b18b4 Author: maxv Date: Thu Jan 2 08:08:30 2020 +0000 Remove the call to KERNEL_UNLOCK_ONE(), it was forgotten when the biglock was dropped in rev1.63. Found via vHCI. commit b3dbe39e0da07f277673452bdbf9648afbaacd37 Author: maxv Date: Wed Jan 1 14:52:38 2020 +0000 Fix three stack info leaks, found by kMSan when just invoking all syscalls with a zero page as argument. MSan: Uninitialized Stack Memory In copyout() At Offset 0, Variable 'sb32' From compat_20_netbsd32_getfsstat() MSan: Uninitialized Stack Memory In copyout() At Offset 12, Variable 'oss' From compat_43_sys_sigstack() MSan: Uninitialized Stack Memory In copyout() At Offset 0, Variable 'sb' From compat_50_netbsd32___fhstat40() commit b5b94b025a74020b433166bae5985ffb307872e5 Author: maxv Date: Wed Jan 1 09:40:17 2020 +0000 Fix small read overflows when parsing HID tables. Noticed by kASan the other day while I was playing with vHCI. commit f163e7fe8a573e4f3c8fecefa9808a57e8a86032 Author: maxv Date: Wed Jan 1 09:17:45 2020 +0000 Fix sizeof mismatch in copyin. This leads to a user-triggerable stack overflow. On my test build at least, by luck, the compiler orders the variables in a way that the overflow hits only local structures which haven't yet been initialized and used, so the overflow is harmless. Very easily seeable with kASan - just invoke the syscall from a 32bit binary. commit fff965eec9e7ddf4191957660891277553b67a33 Author: maxv Date: Wed Jan 1 09:08:28 2020 +0000 Fix buffer overflows: validate the lengths at attach time, given that they are apparently not supposed to be variable. 
Drop sc_ilen since it is unused. commit c60e6a598f162e1dd13e297c1076417d4b243c86 Author: maxv Date: Wed Jan 1 09:05:03 2020 +0000 Fix buffer overflows. Also add missing mutex_exit. commit ac1549a38f4271d463b56f3422b80179cea1b56e Author: maxv Date: Wed Jan 1 09:03:00 2020 +0000 Fix buffer overflows. sc_{o,f}len are controlled by the USB device. By crafting the former the device can leak stack data. By crafting the latter the device can overwrite the stack. The combination of the two means the device can ROP the kernel and obtain code execution (demonstrated with an actual exploit over vHCI). Truncate the lengths to the size of the buffers, and also drop sc_ilen since it is unused. Patch tested with vHCI+kASan. commit 7bebcdc45c1ac1333e2b569435013f4e890ac4f3 Author: maxv Date: Fri Dec 27 15:49:20 2019 +0000 Switch to panic, and make the message more useful. commit 986c49e07896566c7946bbf21e459a65879150b9 Author: maxv Date: Mon Dec 23 06:45:36 2019 +0000 Revert the removal of filemon. commit 3b1ff0c790e715027a73e37101e7a201027e4136 Author: maxv Date: Thu Dec 19 07:14:07 2019 +0000 Revert the filemon removal in bmake, as pointed out by maya we do care about not introducing divergence with FreeBSD, and the cost of unused is acceptable here. commit 39b8f21d5879b760f80b29f6324b3285d430cf13 Author: maxv Date: Wed Dec 18 07:37:17 2019 +0000 Retire filemon, discussed on tech-kern@. commit 276bd3ed8eb8fa72e7b2e1f87c9740c656991d23 Author: maxv Date: Sat Dec 14 07:45:20 2019 +0000 Disable multiboot for now, too much breakage. commit 93a6b305cdf07dceff694ac1ed71f1dfa0d02180 Author: maxv Date: Fri Dec 13 14:13:55 2019 +0000 Read the len before pushing the packet, otherwise possible use-after-free. Found by a custom query on LGTM. commit ab94523c22d40e6d58449dbb4ff8cda443e2d83e Author: maxv Date: Fri Dec 13 14:10:32 2019 +0000 Fix gross use-after-free. Found by a custom query on LGTM. 
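Illustration for commit ac1549a3 above (sc_{o,f}len). A sketch of the fix's shape: lengths reported by the device are truncated to the size of the buffer once, at attach time, so later copies cannot overrun it. Everything here is hypothetical (`example_softc`, `EXAMPLE_BUFSZ`, the function names); the real driver uses on-stack buffers, while this model keeps the buffer in the softc for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_BUFSZ 64	/* hypothetical fixed buffer size */

/* Hypothetical softc; the lengths originate from the device. */
struct example_softc {
	size_t sc_olen;
	size_t sc_flen;
	uint8_t sc_fbuf[EXAMPLE_BUFSZ];
};

static size_t
example_clamp(size_t devlen, size_t bufsize)
{
	return devlen < bufsize ? devlen : bufsize;
}

/* Truncate the untrusted lengths before they are ever used. */
void
example_attach(struct example_softc *sc, size_t dev_olen, size_t dev_flen)
{
	sc->sc_olen = example_clamp(dev_olen, EXAMPLE_BUFSZ);
	sc->sc_flen = example_clamp(dev_flen, EXAMPLE_BUFSZ);
}

/* Safe because sc_flen can no longer exceed the buffer size. */
void
example_set_feature(struct example_softc *sc, const uint8_t *data)
{
	memcpy(sc->sc_fbuf, data, sc->sc_flen);
}
```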
commit c49707cdb80cae8b075f836247bcfad453a3a7a0 Author: maxv Date: Thu Dec 12 16:49:20 2019 +0000 Check CPUID.IBRS in addition to ARCH_CAP.IBRS_ALL. For clarity, and also because VirtualBox clears the former but forgets to clear the latter (which makes us hit a #GP on RDMSR). commit 748becb771e9a70d6bb3de1d683ce772d68b6285 Author: maxv Date: Sun Dec 8 11:53:54 2019 +0000 Use the inlines; it is actually fine, since the compiler drops the inlines if the caller is kmsan-instrumented, forcing a white-listing of the memory access. commit f92a8afcaf8631a61131eb8f9b50131b8b2285ee Author: maxv Date: Sun Dec 8 11:48:15 2019 +0000 Fix __nomsan: missing opt_kmsan.h, and the attribute should be kernel-memory. commit a14bb496bb8d1c83043efd272f966671bfe5fc1f Author: maxv Date: Sat Dec 7 10:19:35 2019 +0000 Panic instead of printf, same as syscall. commit 9336fdf6479ab35b6940c337a51bd0c657801c47 Author: maxv Date: Fri Dec 6 16:54:47 2019 +0000 cast to proper type commit e3e09bab60698c58499faafbb29c8a0d60a1b683 Author: maxv Date: Fri Dec 6 08:35:21 2019 +0000 Fix a bunch of unimportant "Local variable hides global variable" warnings from the LGTM bot. commit 8f77eb0caaa5db3f9a72ac0d35366a109e42ec6b Author: maxv Date: Fri Dec 6 07:27:06 2019 +0000 Minor changes, reported by the LGTM bot. commit a7d6704469b7cd8abf04588054cb62a7526f9187 Author: maxv Date: Fri Dec 6 07:12:38 2019 +0000 localify commit 3362a4411c6b055abd93183edfd379dab262de62 Author: maxv Date: Sun Dec 1 12:47:10 2019 +0000 minor adjustments, to avoid warnings on debug builds commit a89955c02b27737a067b5802cc735685dae785da Author: maxv Date: Sun Dec 1 08:23:09 2019 +0000 localify commit 4f0ab62baf0368beaaf2ef6eaad908c00b04e6e2 Author: maxv Date: Sun Dec 1 08:19:09 2019 +0000 Use atomic_{load,store}_relaxed() on global counters. commit 51ad8d0623d0353631dea0cbeea716c9a8b31062 Author: maxv Date: Sun Dec 1 08:15:58 2019 +0000 Add KCSAN instrumentation for atomic_{load,store}_*. 
commit 76e3263c4edbd59931d988ae3e1cda6eb2fdf748 Author: maxv Date: Fri Nov 29 17:40:16 2019 +0000 Add sanity check, only sat_len bytes got copied in, the rest is uninitialized. Found by KMSAN. commit ddc5f0156698aab2affd64f658c609dd85449831 Author: maxv Date: Thu Nov 28 17:09:10 2019 +0000 localify commit 00f333c4a64b0f00f117399291236f10ddb022a3 Author: maxv Date: Wed Nov 27 19:21:36 2019 +0000 localify commit 1f06f3b18fcd141a59e7be81f904669fb78d8e89 Author: maxv Date: Wed Nov 27 06:24:33 2019 +0000 Add a small API for in-kernel FPU operations. fpu_kern_enter(); /* do FPU stuff */ fpu_kern_leave(); commit 077dec59e95da281f909116d427a3b21d436a1ca Author: maxv Date: Fri Nov 22 14:28:46 2019 +0000 Ah, strcat/strchr/strrchr are ASM functions, so instrument them. commit 561050702aed8c9bbc1783de26f0b6440ad526db Author: maxv Date: Fri Nov 22 10:26:32 2019 +0000 Several improvements. In particular, reduce CS.limit, because Intel CPUs perform strict sanity checks, and the previous (too high) limit caused the VM entry to fail. commit a789c37d5f358726e4f931c31ffba9d37906be72 Author: maxv Date: Wed Nov 20 10:26:56 2019 +0000 Hide XSAVES-specific stuff and the masked extended states. commit 3a9491ad228bb4ea060dcd14585a01d61f9075d8 Author: maxv Date: Sun Nov 17 14:07:00 2019 +0000 Disable KCOV - by raising the interrupt level - in the TLB IPI handler, because this is only noise. commit 7375b374c5e80c2225d2a8976ed4c95dfa9d105a Author: maxv Date: Sun Nov 17 11:28:48 2019 +0000 Not a bug strictly speaking, but compute the address only after the length checks, for clarity and to appease kUBSan. commit db479982a8cece48118a69cc4a89ad6a071957fb Author: maxv Date: Sat Nov 16 17:53:46 2019 +0000 Don't report MWAITX by default. commit 8b183b955b322596104f19fdf0685580c51571cf Author: maxv Date: Sat Nov 16 10:19:29 2019 +0000 Add a NULL check on the structure pointer, not to retrieve its first field if it is NULL. The previous code was not buggy strictly speaking. 
This change probably doesn't change anything, except removing assumptions in the compiler optimization passes, which probably doesn't change anything either in this case. Reported-by: syzbot+110b29c1973f38a38026@syzkaller.appspotmail.com commit 4208052cd7c4690a3bb52adcec1533aa5e94128f Author: maxv Date: Sat Nov 16 10:15:10 2019 +0000 Call rtcache_unref() only when the checks succeed, instead of relying on another NULL check in rtcache_unref(), because in order to resolve the address of the second argument we dereference 'tp', which is theoretically allowed to be NULL. The five callers of nd6_hint() never pass a NULL argument however, so by luck the actual NULL deref never happens. Maybe the NULL check on 'tp' should be replaced with a KASSERT ensuring it isn't NULL, for clarity. Reported by kUBSan. commit 94b66dc41530fd99d3d5339198138d993e99149f Author: maxv Date: Sat Nov 16 10:07:53 2019 +0000 NULL-check the structure pointer, not the address of its first field. Also add KASSERT. For clarity, and to appease kUBSan. commit 442a5894eb127e552bf96a7a71233ea37aaef53d Author: maxv Date: Sat Nov 16 10:05:44 2019 +0000 Add a NULL check on the structure (same logic as my previous change in this file). For clarity, and to appease kUBSan. commit 7735fa75d0016c53197c702100f56801a03cee35 Author: maxv Date: Fri Nov 15 15:51:57 2019 +0000 NULL-check the structure pointer, not the address of its first field. This is clearer and also appeases syzbot. Reported-by: syzbot+d27bc1be926b3641c0ad@syzkaller.appspotmail.com commit 9a7f89350eaea269016799b404241a0bc1c46914 Author: maxv Date: Fri Nov 15 12:18:46 2019 +0000 Instrument ufetch/ustore in kMSan, these were the last remaining functions. commit 65543e42fb5f2b67276c4a64028e3bc632a4e3fd Author: maxv Date: Fri Nov 15 09:50:01 2019 +0000 Since cpu_in_cksum.S can be built outside of the kernel, add an ugly #ifdef _KERNEL for kMSan.
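Several commits above replace a NULL check on the address of a structure's first field with a NULL check on the structure pointer itself. A minimal sketch of the two patterns follows; the struct and function names are hypothetical and not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures, for illustration only. */
struct inner { int value; };
struct outer {
	struct inner first;	/* first field, at offset 0 */
	int other;
};

/*
 * Fragile pattern (shown as a comment, not compiled):
 *
 *	struct inner *in = &o->first;
 *	if (in == NULL)
 *		return -1;
 *
 * Because 'first' sits at offset 0 the result happens to compare equal
 * to NULL, but forming &o->first from a NULL 'o' is undefined behavior,
 * which is the kind of thing kUBSan reports.
 */

/* Clearer pattern: test the structure pointer, then take the field. */
static int outer_get_value(struct outer *o, int *out)
{
	struct inner *in;

	assert(out != NULL);
	if (o == NULL)
		return -1;
	in = &o->first;
	*out = in->value;
	return 0;
}
```

Calling outer_get_value(NULL, &v) returns -1 without ever computing a field address from the NULL pointer.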
commit 777f995c14a63bfedf0abd3ccc913d3d41fad903 Author: maxv Date: Fri Nov 15 09:44:44 2019 +0000 Make kMSan compatible with KCOV. With kMSan we are forced to stay with the fsanitize flag on subr_kcov.c, which means that kMSan will instrument KCOV. We add a bunch of __nomsan attributes to reduce this instrumentation, but it does not remove it completely. That's fine. commit 445b671f06a4515f8faea5f9d8cc91bac9d3b640 Author: maxv Date: Fri Nov 15 09:03:26 2019 +0000 Remove the ins* and outs* functions. Not sanitizer-friendly, and unused anyway. commit 34a3f98781c541cf7bfe11902f580819d6846cbe Author: maxv Date: Fri Nov 15 08:11:36 2019 +0000 Instrument copyout() in kCSan, for parity with kMSan. commit 75d9f645a8cff09ddb6843077b222a3e7bae1852 Author: maxv Date: Thu Nov 14 17:09:22 2019 +0000 Mark several kASan functions with __nothing, to avoid annoying #ifdefs. Same as kCSan and kMSan. commit 6b934f670d84815e10cd80ad69e970913c443e84 Author: maxv Date: Thu Nov 14 16:56:13 2019 +0000 Don't include "opt_kcsan.h" since it's already included. commit 434309ff6f227bbdf356062759480f2199d74ea5 Author: maxv Date: Thu Nov 14 16:48:51 2019 +0000 Don't include "opt_kasan.h" when it's already included. commit d00db216d2a9329457d22f79273a44a981d1bc7e Author: maxv Date: Thu Nov 14 16:27:26 2019 +0000 Note kMSan. commit 805abc06e390e7015651dc1efcde7f3acd380006 Author: maxv Date: Thu Nov 14 16:23:52 2019 +0000 Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized memory used by the kernel at run time, and just like kASan and kCSan, it is an excellent feature. It has already detected 38 uninitialized variables in the kernel during my testing, which I have since discreetly fixed. We use two shadows: - "shad", to track uninitialized memory with a bit granularity (1:1). Each bit set to 1 in the shad corresponds to one uninitialized bit of real kernel memory. - "orig", to track the origin of the memory with a 4-byte granularity (1:1).
Each uint32_t cell in the orig indicates the origin of the associated uint32_t of real kernel memory. The memory consumption of these shadows is substantial, so at least 4GB of RAM is recommended to run kMSan. The compiler inserts calls to specific __msan_* functions on each memory access, to manage both the shad and the orig and detect uninitialized memory accesses that change the execution flow (like an "if" on an uninitialized variable). We mark as uninit several types of memory buffers (stack, pools, kmem, malloc, uvm_km), and check each buffer passed to copyout, copyoutstr, bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory that leaves the system. This allows us to detect kernel info leaks in a way that is more efficient and also more user-friendly than KLEAK. Contrary to kASan, kMSan requires comprehensive coverage, i.e. we cannot tolerate having one non-instrumented function, because this could cause false positives. kMSan cannot instrument ASM functions, so I converted most of them to __asm__ inlines, which kMSan is able to instrument. Those that remain receive special treatment. Contrary to kASan again, kMSan uses a TLS, so we must context-switch this TLS during interrupts. We use different contexts depending on the interrupt level. The orig tracks precisely the origin of a buffer. We use a special encoding for the orig values, and pack together in each uint32_t cell of the orig: - a code designating the type of memory (Stack, Pool, etc), and - a compressed pointer, which points either (1) to a string containing the name of the variable associated with the cell, or (2) to an area in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating information with each cell, and produces a precise output, that can tell for example the name of an uninitialized variable on the stack, the function in which it was pushed on the stack, and the function where we accessed this uninitialized variable. kMSan is available with LLVM, but not with GCC. The code is organized in a way that is similar to kASan and kCSan, so it means that other architectures than amd64 can be supported. commit 940a11a41718ceac4fd4bbc45325bda8de2a3b3d Author: maxv Date: Wed Nov 13 12:55:10 2019 +0000 Rename: PP_ATTRS_M -> PP_ATTRS_D PP_ATTRS_U -> PP_ATTRS_A For consistency. commit fdeb76829de777915e7f4306f75d2af23a2289af Author: maxv Date: Wed Nov 13 10:13:41 2019 +0000 Use x86_patch_window_{open,close}. This also fixes a bug: the CR0/PSL reloads were inverted. commit 5c2def182250c42269bd6cb857c60e41c0ccf71f Author: maxv Date: Wed Nov 13 09:47:37 2019 +0000 Switch to the new PTE naming. commit e1c14bd4a8eb1bb39f3c7798da99eea278837f97 Author: maxv Date: Tue Nov 12 18:00:13 2019 +0000 Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA). Two sysctls are added: machdep.taa.mitigated = {0/1} user-settable machdep.taa.method = {string} constructed by the kernel There are two cases: (1) If the CPU is affected by MDS, then the MDS mitigation will also mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf read-only, and force: machdep.taa.mitigated = machdep.mds.mitigated machdep.taa.method = [MDS] The kernel already enables the MDS mitigation by default. (2) If the CPU is not affected by MDS but is affected by TAA, then we use the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode update, now available on the Intel website. The kernel will automatically enable the TAA mitigation if the updated microcode is present. If the new microcode is not present, the user can load it via cpuctl, and set machdep.taa.mitigated=1. 
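The two cases in the TAA commit above can be read as a small decision procedure. The sketch below is only an illustration of that description, not the kernel code: the struct and function names are hypothetical, the "[TSX_CTRL]" and "(none)" strings are placeholders (only "[MDS]" appears in the commit message), and the behavior on unaffected CPUs is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; in a kernel these would come from CPU probing. */
struct taa_cpu {
	bool mds_affected;	/* CPU is affected by MDS */
	bool taa_affected;	/* CPU is affected by TAA */
	bool has_tsx_ctrl;	/* updated microcode exposes TSX_CTRL */
	bool mds_mitigated;	/* current value of machdep.mds.mitigated */
};

/* Hypothetical view of the two machdep.taa.* leaves. */
struct taa_state {
	bool mitigated;		/* machdep.taa.mitigated */
	bool user_settable;	/* whether 'mitigated' is writable */
	const char *method;	/* machdep.taa.method */
};

static struct taa_state taa_decide(const struct taa_cpu *c)
{
	/* Assumption: an unaffected CPU reports no mitigation. */
	struct taa_state s = { false, false, "(none)" };

	assert(c != NULL);
	if (c->mds_affected) {
		/* Case 1: the MDS mitigation also covers TAA; leaf is read-only. */
		s.mitigated = c->mds_mitigated;
		s.method = "[MDS]";
	} else if (c->taa_affected) {
		/* Case 2: disable RTM via TSX_CTRL when the microcode has it. */
		s.user_settable = true;
		if (c->has_tsx_ctrl) {
			s.mitigated = true;
			s.method = "[TSX_CTRL]";	/* placeholder string */
		}
	}
	return s;
}
```

In case 2 without the new microcode, the state stays unmitigated but user-settable, matching the commit's note that the user can load the microcode via cpuctl and then set machdep.taa.mitigated=1.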
commit 898d990986180060785b6d676caa10bc9278a3c2 Author: maxv Date: Tue Nov 12 08:11:55 2019 +0000 Add more checks in ip6_pullexthdr, to prevent a panic in m_copydata. The Rip6 entry point could see a garbage Hop6 option. Not a big issue, since it's a clean panic only triggerable if the socket has the IN6P_DSTOPTS/IN6P_RTHDR option. Reported-by: syzbot+3b07b3511b4ceb8bf1e2@syzkaller.appspotmail.com commit 49a6cd2fcf22b202106e7462fd4f14227a09c949 Author: maxv Date: Mon Nov 11 09:50:11 2019 +0000 Remove lockless reads of 'xc_donep'. This is an uint64_t, and we cannot expect the accesses to be MP-safe on 32bit arches. Found by KCSAN. commit c8b99c73fd95e036505aa535e644932732470a95 Author: maxv Date: Fri Nov 8 12:36:10 2019 +0000 Exclude the PTE space from KCSAN, since there the same VA can point to different PAs. commit 7001ab8ede3b4d94002406dadef8dbc185e63d8a Author: maxv Date: Wed Nov 6 06:57:22 2019 +0000 Change kcsan_md_is_avail() to always return true; I was testing with interrupts disabled as debugging. Change the delay/sample parameters to have better fluidity. commit 358c4cbb7fcf649cca7b796452471d8e06f2620a Author: maxv Date: Tue Nov 5 20:23:44 2019 +0000 Note kCSan. commit 202e7274cc6c262b2601aaec415b5f3e0fdd9baf Author: maxv Date: Tue Nov 5 20:21:34 2019 +0000 Add the __nocsan attribute on this function. Races on ci_want_resched are accepted (part of the design). commit fb9e4a5499a890ceca44125ad474a60644fc3685 Author: maxv Date: Tue Nov 5 20:19:17 2019 +0000 Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us to detect race conditions at runtime. It is a variation of TSan that is easy to implement and more suited to kernel internals, albeit theoretically less precise than TSan's happens-before. We do basically two things: - On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell describing the access, and delay the calling CPU (10ms). 
- On all memory accesses, we verify if the memory we're reading/writing is referenced in a cell already. The combination of the two means that, if for example cpu0 does a read that is selected and cpu1 does a write at the same address, kCSan will fire, because cpu1's write collides with cpu0's read cell. The coverage of the instrumentation is the same as that of kASan. Also, the code is organized in a way similar to kASan, so it is easy to add support for more architectures than amd64. kCSan is compatible with KCOV. Reviewed by Kamil. commit e23e61a0b70146cc683ea420a6df957cee5db9e9 Author: maxv Date: Fri Nov 1 15:11:43 2019 +0000 Fix KUBSAN: the kernel size now exceeds the mapping limit, so bump the limit. commit 14b80cc0fbcff603272c07d6692f040394c8d829 Author: maxv Date: Wed Oct 30 17:06:57 2019 +0000 More inlined ASM. commit 7684a77b5bd62dd4d70a3318a46c9e84fa52dc3f Author: maxv Date: Wed Oct 30 16:32:04 2019 +0000 Style. commit 9fe4c0945d565d2295fd4d78b674743f68bf79e9 Author: maxv Date: Wed Oct 30 07:59:44 2019 +0000 Get &rsc->sc_dksc only when we know 'rsc' is not NULL. This was actually harmless because we didn't use the pointer then. Reported-by: syzbot+77097fae0e3aad6de088@syzkaller.appspotmail.com commit 9e0a10506aefd08713b1ee634d52260125941253 Author: maxv Date: Wed Oct 30 07:40:05 2019 +0000 Switch to new PTE bits. commit 476eb0a812366eea8bc82299a7536754e3351c01 Author: maxv Date: Tue Oct 29 12:39:46 2019 +0000 Enable XSAVEOPT. commit 232627de3bf47ca14cf428962c16bd5aef3fc200 Author: maxv Date: Tue Oct 29 08:13:16 2019 +0000 Forgot to put nvmmctl in the "nvmm" group. commit 0f96d1c5a23b1189775fef424020440e112f4926 Author: maxv Date: Mon Oct 28 14:20:28 2019 +0000 should be fork(2), noticed by wiz commit 8da1a202cb075e238507f7ff3fab65e1a06d455c Author: maxv Date: Mon Oct 28 13:04:18 2019 +0000 Add nvmmctl, with two commands for now. 
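The kCSan commit above (fb9e4a54) describes a sample-and-check scheme. A single-threaded toy simulation of that idea is sketched below. It is not the kernel implementation: the names are hypothetical, the 10ms delay is replaced by an explicit access_done() call, and treating read/read overlaps as harmless is an assumption based on the usual definition of a data race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU		2	/* hypothetical: two simulated CPUs */
#define NACCESSES	2000	/* sampling period quoted in the commit */

/* One published "cell" per CPU, describing a sampled access. */
struct cell {
	bool	  valid;
	uintptr_t addr;
	size_t	  size;
	bool	  write;
};

static struct cell cells[NCPU];
static unsigned naccesses[NCPU];

/* True if [a, a+as) and [b, b+bs) overlap. */
static bool overlaps(uintptr_t a, size_t as, uintptr_t b, size_t bs)
{
	return a < b + bs && b < a + as;
}

/*
 * Record one access by 'cpu'. Returns true if it collides with a cell
 * published by another CPU and at least one side is a write.
 */
static bool access_check(unsigned cpu, uintptr_t addr, size_t size, bool write)
{
	assert(cpu < NCPU);

	for (unsigned i = 0; i < NCPU; i++) {
		const struct cell *c = &cells[i];

		if (i == cpu || !c->valid)
			continue;
		if (overlaps(addr, size, c->addr, c->size) &&
		    (write || c->write))
			return true;
	}

	/* Every NACCESSES-th access is sampled: publish a cell for it. */
	if (++naccesses[cpu] % NACCESSES == 0)
		cells[cpu] = (struct cell){ true, addr, size, write };
	return false;
}

/* Stands in for the end of the sampled access's delay window. */
static void access_done(unsigned cpu)
{
	assert(cpu < NCPU);
	cells[cpu].valid = false;
}
```

After cpu 0 has had a read sampled, a write from cpu 1 to an overlapping address makes access_check() return true, which mirrors the cpu0/cpu1 example in the commit message.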
commit fd6ab7369dc14b8aa2595b2d6729f5ec6b50f0c2 Author: maxv Date: Mon Oct 28 09:00:08 2019 +0000 Add nram in struct nvmm_ctl_mach_info. commit 41d9410dd76543442c4a036a5ee582ada57fc89d Author: maxv Date: Mon Oct 28 08:30:49 2019 +0000 A few changes: - Use smaller types in struct nvmm_capability. - Use smaller type for nvmm_io.port. - Switch exitstate to a compacted structure. commit 653e53ec41c93baef57c0088271aad6cb9c4a87f Author: maxv Date: Sun Oct 27 20:17:36 2019 +0000 Change the way root_owner works: consider the calling process as root_owner not if it has root privileges, but if the /dev/nvmm device was opened with write permissions. Introduce the undocumented nvmm_root_init() function to achieve that. The goal is to simplify the logic and have more granularity, eg if we want a monitoring agent to access VMs but don't want to give this agent real root access on the system. commit 6b3b2637750d5037816dbfcf4b664c481421f21e Author: maxv Date: Sun Oct 27 18:26:54 2019 +0000 Add PCID support in the guests. This speeds up most 64bit guests, because since Meltdown, everybody uses PCID (including NetBSD). commit 5e8721915ae9af2b68da65ef6c86565796e3a7bd Author: maxv Date: Sun Oct 27 11:11:09 2019 +0000 Mask CPUID leaf 0x0A on Intel, because we don't want the guest to try (and fail) to probe the PMC MSRs. This avoids "Unexpected WRMSR" warnings in qemu-nvmm. commit ceac903fc713d882031c3a0cc4e048e03d297edb Author: maxv Date: Sun Oct 27 10:28:55 2019 +0000 Add a new VCPU conf option, that allows userland to request VMEXITs after a TPR change. This is supported on all Intel CPUs, and not-too-old AMD CPUs. The reason for wanting this option is that certain OSes (like Win10 64bit) manage interrupt priority in hardware via CR8 directly, and for these OSes, the emulator may want to sync its internal TPR state on each change. Add two new fields in cap.arch, to report the conf capabilities. 
Report TPR only on Intel for now, not AMD, because I don't have a recent AMD CPU on which to test. commit d04cabf5e8edc0118b6f3ea92c3d1df87865dafc Author: maxv Date: Sun Oct 27 08:30:05 2019 +0000 Use the new PTE naming, and define CR3_FRAME_* separately. No functional change. commit 7ae6ddb09ce8a9437bb11d1a31b580e0d8d608c1 Author: maxv Date: Sun Oct 27 07:08:15 2019 +0000 Add the "nvmm" group, and make nvmm_init() public. Sent to tech-kern@ a few days ago. commit cc801ef223697b879b01f73432600e78c1fd8b13 Author: maxv Date: Fri Oct 25 09:09:24 2019 +0000 Update the libnvmm man page: - Sync the naming with reality. - Replace "relevant" by "desired" and "virtualizer" by "emulator", closer to what I meant. - Add a "VCPU Configuration" section. - Add a "Machine Ownership" section. commit 9565ceef01920858d4ac7a34ddbfdf3f112e9195 Author: maxv Date: Wed Oct 23 12:02:55 2019 +0000 Three changes in libnvmm: - Add 'mach' and 'vcpu' backpointers in the nvmm_io and nvmm_mem structures. - Rename 'nvmm_callbacks' to 'nvmm_assist_callbacks'. - Rename and migrate NVMM_MACH_CONF_CALLBACKS to NVMM_VCPU_CONF_CALLBACKS, it now becomes per-VCPU. commit 94e1ec69b357df998062327058d5935fb0743b54 Author: maxv Date: Wed Oct 23 07:01:11 2019 +0000 Miscellaneous changes in NVMM, to address several inconsistencies and issues in the libnvmm API. - Rename NVMM_CAPABILITY_VERSION to NVMM_KERN_VERSION, and check it in libnvmm. Introduce NVMM_USER_VERSION, for future use. - In libnvmm, open "/dev/nvmm" as read-only and with O_CLOEXEC. This is to avoid sharing the VMs with the children if the process forks. In the NVMM driver, force O_CLOEXEC on open(). - Rename the following things for consistency: nvmm_exit* -> nvmm_vcpu_exit* nvmm_event* -> nvmm_vcpu_event* NVMM_EXIT_* -> NVMM_VCPU_EXIT_* NVMM_EVENT_INTERRUPT_HW -> NVMM_VCPU_EVENT_INTR NVMM_EVENT_EXCEPTION -> NVMM_VCPU_EVENT_EXCP Delete NVMM_EVENT_INTERRUPT_SW, unused already. - Slightly reorganize the MI/MD definitions, for internal clarity. 
- Split NVMM_VCPU_EXIT_MSR in two: NVMM_VCPU_EXIT_{RD,WR}MSR. Also provide separate u.rdmsr and u.wrmsr fields. This is more consistent with the other exit reasons. - Change the types of several variables: event.type enum -> u_int event.vector uint64_t -> uint8_t exit.u.*msr.msr: uint64_t -> uint32_t exit.u.io.type: enum -> bool exit.u.io.seg: int -> int8_t cap.arch.mxcsr_mask: uint64_t -> uint32_t cap.arch.conf_cpuid_maxops: uint64_t -> uint32_t - Delete NVMM_VCPU_EXIT_MWAIT_COND, it is AMD-only and confusing, and we already intercept 'monitor' so it is never armed. - Introduce vmx_exit_insn() for NVMM-Intel, similar to svm_exit_insn(). The 'npc' field wasn't getting filled properly during certain VMEXITs. - Introduce nvmm_vcpu_configure(). Similar to nvmm_machine_configure(), but as its name indicates, the configuration is per-VCPU and not per-VM. Migrate and rename NVMM_MACH_CONF_X86_CPUID to NVMM_VCPU_CONF_CPUID. This becomes per-VCPU, which makes more sense than per-VM. - Extend the NVMM_VCPU_CONF_CPUID conf to allow triggering VMEXITs on specific leaves. Until now we could only mask the leaves. An uint32_t is added in the structure: uint32_t mask:1; uint32_t exit:1; uint32_t rsvd:30; The two first bits select the desired behavior on the leaf. Specifying zero on both resets the leaf to the default behavior. The new NVMM_VCPU_EXIT_CPUID exit reason is added. commit 9153cfaafd05aec0dee8f7be04bb9cd69ab0e7e1 Author: maxv Date: Mon Oct 21 10:09:24 2019 +0000 Call cpu_probe_fpu() only once (from cpu0), and style. commit 82ccf39326da756507ec6c1e54e442017231bcfd Author: maxv Date: Sat Oct 19 19:45:10 2019 +0000 Put back 'default', because llvm apparently doesn't realize that all cases are covered in the switch. commit 9a5ad40a74dd2778ecc12a8dde43158098a3318d Author: maxv Date: Fri Oct 18 16:26:38 2019 +0000 Remove unused call to savectx(). 
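The NVMM_VCPU_CONF_CPUID extension described in commit 94e1ec69 above adds three bit-fields that select the per-leaf behavior. The sketch below shows one way to interpret them. The struct name, enum and function are hypothetical, and rejecting the case where both bits are set (or where reserved bits are nonzero) is an assumption; the commit message does not say what NVMM does there.

```c
#include <assert.h>
#include <stdint.h>

/* The three bit-fields quoted in the commit; the struct name is made up. */
struct cpuid_conf_flags {
	uint32_t mask:1;
	uint32_t exit:1;
	uint32_t rsvd:30;
};

/* Hypothetical classification of the requested behavior. */
enum leaf_behavior { LEAF_DEFAULT, LEAF_MASK, LEAF_EXIT, LEAF_INVALID };

static enum leaf_behavior leaf_behavior(struct cpuid_conf_flags f)
{
	/* On common ABIs the three fields pack into one 32-bit word. */
	assert(sizeof(f) == sizeof(uint32_t));

	if (f.rsvd != 0)
		return LEAF_INVALID;	/* assumption: reserved must be zero */
	if (f.mask && f.exit)
		return LEAF_INVALID;	/* assumption: not stated in the commit */
	if (f.mask)
		return LEAF_MASK;	/* mask the leaf, as before */
	if (f.exit)
		return LEAF_EXIT;	/* trigger a VMEXIT on the leaf */
	return LEAF_DEFAULT;		/* both zero: reset to default */
}
```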
commit c98fe0220bb36bc2956dda44105bb1e81f848a02 Author: maxv Date: Thu Oct 17 14:00:28 2019 +0000 Make sure we're dealing with a static binary. Otherwise we could crash if the user mistakenly tries to boot a KASLR kernel with 'boot' instead of 'pkboot'. Now we fail cleanly. Reported by cryo@. commit e0540d8707c908b9ac10b86b740d69bcc71214bf Author: maxv Date: Thu Oct 17 08:54:50 2019 +0000 Sentence begins with capital letter ("yes or no?"). Also add a few French sentences, to make it less awful, but not complete. Not tested. commit fb3a2c182f296ef8ded47c6f3d218aae7c28d0f5 Author: maxv Date: Mon Oct 14 16:43:04 2019 +0000 Error out if the type is beyond the storage size. No functional change, since the shift would otherwise 'and' against zero, returning EEXIST. Reported-by: syzbot+cb68ccdc1ef3aca2d679@syzkaller.appspotmail.com commit fd011c824130cbb11c6db32b8fe10d80cd38c68b Author: maxv Date: Mon Oct 14 16:27:03 2019 +0000 Add a check before the memcpy. memcpy is defined to never take NULL as second argument, and the compiler is free to perform optimizations knowing that this argument is never NULL. In this particular case, it was harmless. But still good to fix. Reported-by: syzbot+6f504255accb795eb6b7@syzkaller.appspotmail.com commit b56f6c6c54bfeb61baaec58cf0624cc45cf1c0e2 Author: maxv Date: Mon Oct 14 10:43:40 2019 +0000 Improve nvmm_vcpu_dump(). commit 319ea4968def506d372c5a791eb5d8e423157479 Author: maxv Date: Mon Oct 14 10:39:24 2019 +0000 Implement XCHG, add associated tests, and add comments to explain. With this in place the Windows 95 installer completes successfully. Part of PR/54611. commit d85a5250b88e989d7f82130c5dd595503563c3fc Author: maxv Date: Sun Oct 13 17:32:15 2019 +0000 Fix incorrect parsing: the R/M field uses a special GPR map when the address size is 16 bits, regardless of the actual operating mode. With this special map there can be two registers referenced at once, and also disp16-only.
Implement this special behavior, and add associated tests. While here simplify a few things. With this in place, the Windows 95 installer initializes correctly. Part of PR/54611. commit 14447043eb58d75f41bbc95c55711ac1626c9920 Author: maxv Date: Sat Oct 12 06:31:03 2019 +0000 Rewrite the FPU code on x86. This greatly simplifies the logic and removes the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on port-amd64 a week ago. Bump the kernel version to 9.99.16. commit 8b23a95a22e0a53d38b09a9092a005d3223050d8 Author: maxv Date: Thu Oct 10 13:45:14 2019 +0000 Add KASAN instrumentation on ucas and ufetch. commit f1f294fd447e2443312229408487267ad6023511 Author: maxv Date: Wed Oct 9 17:28:46 2019 +0000 Add new bits. commit 40cd93c2ddf09924cab4d27154a103c8655f83dc Author: maxv Date: Wed Oct 9 14:15:40 2019 +0000 Memset to prevent stack info leak. commit fc83762bc464be0bf351901b2c387a8cfedff7c4 Author: maxv Date: Wed Oct 9 14:03:57 2019 +0000 Provide a better abstraction for the TPM interface. Report it in the ioctl. commit 7339ab0a4705263929fc051c07ede47dcdcdccfe Author: maxv Date: Wed Oct 9 07:30:58 2019 +0000 Add suspend support for TPM 2.0 chips. Check the TPM response also for 1.2 chips. Unfortunately I cannot really test this change since ACPI suspend does not work on any of my laptops. commit 7be462a3028a8765c2f8b886e5e4c005c106b32f Author: maxv Date: Tue Oct 8 18:50:44 2019 +0000 No I/O ports for TPM-ISA, only MMIO, so remove commented-out options. commit fc8f55b842e893784de575c47e28c464c32bd93e Author: maxv Date: Tue Oct 8 18:43:02 2019 +0000 Improvements in tpm(4): - Remove interrupt support, do polling only, avoids unnecessary trouble. - Simplify a few things. - Fix the suspend function, the SaveState command is 0x98, not 0x9C. - Make the driver MP-safe. - Sync the man page with reality. commit a781e454b7f0eb01ae0802d955dfd7034ff28428 Author: maxv Date: Sat Oct 5 07:30:03 2019 +0000 Switch to the new PTE naming. No binary diff (tested with MKREPRO). 
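Commit d85a5250 above concerns the special R/M map used with 16-bit address size. The sketch below decodes that map as documented for x86 (two-register forms for R/M 0 to 3, and a displacement-only form for mod=0, R/M=6). It is an illustration only, not the libnvmm code; the enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register identifiers for the 16-bit addressing forms. */
enum reg16 { R_NONE, R_BX, R_BP, R_SI, R_DI };

struct ea16 {
	enum reg16 base;
	enum reg16 index;
	int disp_size;		/* displacement bytes after ModRM: 0, 1 or 2 */
};

/* Decode the memory operand selected by a ModRM byte, 16-bit addressing. */
static struct ea16 decode_rm16(uint8_t modrm)
{
	static const struct { enum reg16 base, index; } map[8] = {
		{ R_BX, R_SI }, { R_BX, R_DI }, { R_BP, R_SI }, { R_BP, R_DI },
		{ R_SI, R_NONE }, { R_DI, R_NONE }, { R_BP, R_NONE },
		{ R_BX, R_NONE },
	};
	uint8_t mod = modrm >> 6;
	uint8_t rm = modrm & 7;
	struct ea16 ea;

	assert(mod != 3);	/* mod=3 selects a register, not memory */
	ea.base = map[rm].base;
	ea.index = map[rm].index;
	ea.disp_size = (mod == 1) ? 1 : (mod == 2) ? 2 : 0;
	if (mod == 0 && rm == 6) {
		/* Special case: no registers, only a 16-bit displacement. */
		ea.base = R_NONE;
		ea.disp_size = 2;
	}
	return ea;
}
```

For example, ModRM 0x00 decodes to [BX+SI] with no displacement, while 0x06 decodes to a bare disp16.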
commit f5c966066a1d67c58d0bfc10b97ded0e9ab095fb Author: maxv Date: Sat Oct 5 07:19:49 2019 +0000 Switch to the new PTE naming: PG_PVLIST -> PTE_PVLIST PG_W -> PTE_WIRED PG_FRAME -> PTE_FRAME No functional change. commit 5d80d1e0217d9109ea06f8d22820cea06e44dec2 Author: maxv Date: Fri Oct 4 15:28:00 2019 +0000 Misc reordering, to clarify and reduce the diff against amd64. commit 24f763a7d4485a323a1891a092c93c424231b597 Author: maxv Date: Fri Oct 4 12:17:05 2019 +0000 Switch to the new PTE naming. commit 473af18372d26e4ed3ed4d8086f758426df31f79 Author: maxv Date: Fri Oct 4 12:15:21 2019 +0000 Fix definition for MWAIT. It should be bit 11, not 12; 12 is the armed version. commit 6b31ba90c77f0dfab22ee453e0b2ea1387a29d11 Author: maxv Date: Fri Oct 4 12:11:38 2019 +0000 Add definitions for RDPRU, MCOMMIT, GMET and VTE. commit 00b446ab70d49fe4f4e6d5273b8025f233b18cee Author: maxv Date: Fri Oct 4 11:47:07 2019 +0000 Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to simplify. commit 6a79d4e76c441439418327e6c6bc5bddb1703dd1 Author: maxv Date: Fri Oct 4 06:27:42 2019 +0000 Add DMA instrumentation in KASAN. We note the original buffer and length in the map, and check the buffer on each bus_dmamap_sync. This allows us to find DMA buffer overflows and UAFs, which couldn't be found before because the device accesses to memory are outside of KASAN's control. commit 82166dc4c58b4042a28cac475d55973ad97cd57c Author: maxv Date: Thu Oct 3 05:20:31 2019 +0000 Fix memory leaks. Was wondering where memory had gone after several hours of attach/detach with vHCI. commit c8e8fa9f4a242bd8d3daa88c5d5254ecf1eecd1a Author: maxv Date: Thu Oct 3 05:16:16 2019 +0000 More less kmem_zalloc(0). commit 0b2c7c32211d40c8736700b074ec37d0c809e553 Author: maxv Date: Thu Oct 3 05:13:23 2019 +0000 Improvements: - Don't process packets if the USB device is detached. Contrary to the other HCIs, vHCI has no timeout, so we never collect the pending packets, and must drop them synchronously. 
- Fix refcounting bug in vhci_device_ctrl_abort. - Implement vhci_activate. - Add a few KASSERTs. commit 455db353c39e49fd23f524d7a1243f91740bc9a1 Author: maxv Date: Thu Oct 3 05:06:29 2019 +0000 Remove the LazyFPU code, as posted 5 months ago on port-amd64@. commit a4e74656b6e734b0e985dc57ad1449a4d4940f25 Author: maxv Date: Mon Sep 23 17:37:04 2019 +0000 Move the timeout check out of the loop, otherwise it is never reached. Found by the lgtm bot. commit c64b0da34a79b28a5366d57f1d12f2be868ca511 Author: maxv Date: Mon Sep 23 08:04:35 2019 +0000 Use M_BUFADDR to dedup code in M_LEADINGSPACE. commit 34b779382d1e615579fed20d8b468f39420bf2ea Author: maxv Date: Mon Sep 23 07:47:45 2019 +0000 Remove (unused) reference to m_pktdat. commit f7a46c53f78bde002831621d35581aa405c58f41 Author: maxv Date: Mon Sep 23 06:53:09 2019 +0000 Remove unused assignment. Found by the lgtm bot. commit e39c8aabbf518af50e8342058231ffcc9092d837 Author: maxv Date: Mon Sep 23 06:50:04 2019 +0000 A * is missing here. This could cause a use-after-free. Found by the lgtm bot. commit c0c3dbe17459b852fb9a0d0f03da941137da70e9 Author: maxv Date: Sun Sep 22 10:35:12 2019 +0000 Fix KASAN on aarch64: the bus_space_* functions are macros, so we can't redefine them. Introduce __HAVE_KASAN_INSTR_BUS, which indicates whether to instrument the bus functions. Defined on amd64 only. commit 903bc49829a9bc584561ab5cb79336c1b8d157de Author: maxv Date: Sat Sep 21 07:31:56 2019 +0000 Remove unused function prototype. Reported by the lgtm bot. commit 8d55581c6a485aa0e43cce8a35ddf888b0ae21aa Author: maxv Date: Sat Sep 21 07:08:27 2019 +0000 Add __printflike, and fix two incorrect fmts. Reported by the lgtm bot. 
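The __printflike commit above relies on compile-time format checking. As far as I know, NetBSD's __printflike(fmtarg, firstvararg) wraps the GCC/Clang format attribute; the sketch below uses that attribute directly so that it builds in userland. The log_fmt helper is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical logging helper. The attribute tells GCC and Clang that
 * argument 3 is a printf-style format and that the variadic arguments
 * start at argument 4, so mismatched formats produce warnings.
 */
static int log_fmt(char *buf, size_t len, const char *fmt, ...)
    __attribute__((__format__(__printf__, 3, 4)));

static int log_fmt(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call such as log_fmt(buf, sizeof(buf), "%s", 42) is flagged by -Wformat, which is how incorrect format strings like the two fixed in that commit tend to surface.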
commit b2d8392ef4caeb0a1222a8988d39d4f25054ccf4 Author: maxv Date: Sat Sep 21 06:56:51 2019 +0000 Fix netbsd32___mount50(): - zero out fs_args32 to prevent info leaks - remove unused and non-functional copyin in NFS (lgtm bot) - declare udata, and don't pass kernel pointers to copyout (lgtm bot) - make sure data_len is just big enough, to mimic the native behavior - don't forget to update *retval with the 32bit value - add an XXX for NFS commit c77cd19f55383516f81eee5e8190d1075ec3f0db Author: maxv Date: Fri Sep 20 13:38:00 2019 +0000 Add ifdefs to eliminate false positives on lgtm, same as coverity. commit 88534fd7d641ee140f689a4871764af96e6a31ff Author: maxv Date: Fri Sep 20 11:29:47 2019 +0000 Don't use the same iterator in a nested loop. (How could this work?) Found by the lgtm bot. commit 28afb74a3788a5cd3309820958f6b19f1ab6e1fd Author: maxv Date: Fri Sep 20 11:09:43 2019 +0000 Fix programming mistake: 'paddrp' is a pointer given as argument, setting it to NULL in the called function does not set it to NULL in the caller. Actually, the callers of these functions do not do anything with the special error handling, so drop the unused checks and the NULL assignments altogether. Found by the lgtm bot. commit b2cb8e6419c8d5e2b679810791bfb140c170cffa Author: maxv Date: Fri Sep 20 09:07:35 2019 +0000 Fix argument. Found by the lgtm bot. commit c90bc1349494f0fbe4cbd440cc153916c7c19dd1 Author: maxv Date: Fri Sep 20 08:58:25 2019 +0000 Fix direction of the loop. Found by the lgtm bot. commit 4500fb3ebc6f10ece7366c1eb91bfb5104effab0 Author: maxv Date: Fri Sep 20 08:48:55 2019 +0000 Use M_BUFADDR. commit 700fd6a70ff0b1fe7c488d7d67150152ddf23b9b Author: maxv Date: Fri Sep 20 08:45:29 2019 +0000 dedup commit dd038f8fb74eba9a0b2e745afc2f3a8eff06a5e9 Author: maxv Date: Wed Sep 18 16:18:12 2019 +0000 Handle M_EXT with M_BUFADDR, and introduce M_BUFSIZE. Use them to dedup code. 
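The 'paddrp' commit above (28afb74a) describes a classic C mistake: assigning NULL to a pointer parameter only changes the callee's local copy. The sketch below demonstrates it with hypothetical function names. Note that the actual commit dropped the unused checks and assignments rather than switching to a pointer-to-pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy pattern: only the callee's copy of the pointer is changed. */
static int lookup_buggy(int *paddrp)
{
	paddrp = NULL;		/* no effect on the caller's pointer */
	(void)paddrp;
	return -1;
}

/* If the caller must observe the reset, pass a pointer to the pointer. */
static int lookup_reset(int **paddrpp)
{
	assert(paddrpp != NULL);
	*paddrpp = NULL;	/* visible to the caller */
	return -1;
}
```

After lookup_buggy(p) the caller's p is unchanged, whereas lookup_reset(&p) leaves it NULL.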
commit fbff8b5a49978a1a66c0253fda875229b88a22db Author: maxv Date: Sun Sep 15 15:19:49 2019 +0000 Note vHCI. commit 04a5b41d0aeda43e8e95835afac9d198628500e3 Author: maxv Date: Sun Sep 15 15:15:02 2019 +0000 Regen for vHCI, IPMI (was forgotten it seems), and srt (for which a man page is now available). commit af2756c73deebee910e979d9e22a736540b47b37 Author: maxv Date: Sun Sep 15 11:45:47 2019 +0000 Wrong major. commit 95c3b5fa717d54ed2ad26d62a0d49bb7b2500a89 Author: maxv Date: Sun Sep 15 09:24:38 2019 +0000 Reset ud_pipe0 to NULL before calling usbd_setup_pipe_flags(). If the call fails we call usbd_remove_device(), which tries to free ud_pipe0, but it was already freed. While here, add two sanity checks, to prevent possible surprises. commit 6b2b45b8c0f581d6394f17a1b25953ed959c5564 Author: maxv Date: Sun Sep 15 09:21:36 2019 +0000 Add missing length checks on descriptors, to prevent buffer overflows. Found via KASAN+vHCI. Some remain however, but it looks like the code needs to be re-thought along the way, so it will be fixed later. commit 7bb0c316c7719ed74bb1d7b1bddb8f3d38fc3233 Author: maxv Date: Sun Sep 15 09:18:17 2019 +0000 Don't kmem_alloc(0) if there are no endpoints, otherwise panic. Found via vHCI. commit e67b91df42d82e90d8815ae534124cd070d22f95 Author: maxv Date: Sat Sep 14 15:24:23 2019 +0000 Fix error handling, to prevent kernel crashes at detach time. The code is slightly reorganized. Found via vHCI. commit 8462b24abeb1865b6910530f50363b23af17af93 Author: maxv Date: Sat Sep 14 15:22:31 2019 +0000 Fix NULL deref. Found by vHCI. commit 972e12e86c1ac4ec683556b56ed5809648ec0900 Author: maxv Date: Sat Sep 14 15:21:19 2019 +0000 Fix error handling, to prevent kernel crashes at detach time. Found by vHCI. commit 1ad7412673766a6ff30bb3d0a0e9b67d3a68947d Author: maxv Date: Sat Sep 14 15:19:52 2019 +0000 Fix possible NULL deref. Found by vHCI. 
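Commit 6b2b45b8 above adds missing length checks on USB descriptors. A minimal sketch of such a check is given below, relying only on the fact that every USB descriptor starts with bLength and bDescriptorType. The function name is hypothetical and this is not the NetBSD USB stack code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the descriptors packed in buf[0..len). Each descriptor begins
 * with bLength (its total size in bytes) followed by bDescriptorType.
 * Returns the count, or -1 if the buffer is malformed.
 */
static int count_descriptors(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		uint8_t blen;

		if (len - off < 2)
			return -1;	/* truncated two-byte header */
		blen = buf[off];
		if (blen < 2)
			return -1;	/* too small; would loop forever at 0 */
		if (blen > len - off)
			return -1;	/* claims to run past the buffer */
		off += blen;
		n++;
	}
	return n;
}
```

The checks are done before advancing, so a device-supplied bLength can never move the cursor outside the buffer.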
commit a66520abce9900bc6c94cf5ccff642e50bcae99a Author: maxv Date: Sat Sep 14 12:53:24 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an upgt0 device. Found with vHCI. commit e155fa2957500594409a5d11956a5a80684a73e5 Author: maxv Date: Sat Sep 14 12:50:16 2019 +0000 Fix NULL deref, to prevent kernel crashes when detaching an udsbr0 device. Found with vHCI. commit bb0ab05a2e8e3c1e8d6e08befa83c940cf22275c Author: maxv Date: Sat Sep 14 12:48:51 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an ugensa0 device. Also move usbd_add_drv_event() down, after we are sure the attach didn't fail. Found with vHCI. commit dac7d5d0580fd59ea2cdc010ef0c838c3f2c3691 Author: maxv Date: Sat Sep 14 12:46:00 2019 +0000 Fix NULL deref, to prevent kernel crashes when detaching an uipaq0 device. Found with vHCI. commit b8900e7b68800f2ed8b4389f5f93283e86d1ad34 Author: maxv Date: Sat Sep 14 12:42:36 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an ural0 device. Found with vHCI. commit 251e1c375b9a38417708bc307774a5ff6088821c Author: maxv Date: Sat Sep 14 12:41:32 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an urio0 device. Found with vHCI. commit da4d3fb8d6a8f7304d5664871b8528a0584cd844 Author: maxv Date: Sat Sep 14 12:40:31 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an urtw0 device. Also, fail safely if we didn't recognize the RF chip, to prevent kernel crashes at attach time. Note that other panics are there, maybe they also should be removed. Found with vHCI. commit 72a6e4a4c45526f28b3df8a34c4f5a168d40dfbd Author: maxv Date: Sat Sep 14 12:38:40 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an umcs0 device. Found with vHCI. commit 9821c626f59089bf519836e32ae93b0b6c8f583b Author: maxv Date: Sat Sep 14 12:37:34 2019 +0000 Fix NULL derefs, to prevent kernel crashes when detaching an otus0 device. Found with vHCI. 
commit 43c859697c8b15ce842fbad6de20c018aafc5216 Author: maxv Date: Sat Sep 14 12:36:35 2019 +0000 Fix error handling, to prevent kernel crashes when detaching an athn0 device. Found with vHCI. commit 5f75d834b8f2e4cf7b2d53425f35d8c38fd790dc Author: maxv Date: Sat Sep 14 12:32:08 2019 +0000 Fixes: - Insert at the tail and not the head. I just noticed that the packets were in inverted order in the fifos when attaching a virtual urtw0. - Remove VHCI_DEBUG, which I mistakenly left enabled in rev1. commit b516205283ae73d967940534911b41c00dad34f1 Author: maxv Date: Sat Sep 14 06:57:51 2019 +0000 Add vHCI, a driver that allows sending and receiving USB packets directly from userland via /dev/vhci. Using this, it becomes possible to test and fuzz the USB stack and all the USB drivers without having the associated hardware. The vHCI device has four independently addressable ports. For each xfer on each port, we create two packets: a setup packet (which indicates mostly the type of request) and a data packet (which contains the raw data). These packets are processed by read and write operations on /dev/vhci: userland poll-reads it to fetch usb_device_request_t structures, and dispatches the requests depending on bRequest and bmRequestType. A few ioctls are available: VHCI_IOC_GET_INFO - Get the current status VHCI_IOC_SET_PORT - Choose a vHCI port VHCI_IOC_USB_ATTACH - Attach a USB device on the current port VHCI_IOC_USB_DETACH - Detach the USB device on the current port vHCI has already allowed me to automatically find several bugs in the USB stack and its drivers. commit ec4573dac862b166b74e9262b78dbbcd77a4ad94 Author: maxv Date: Fri Sep 13 14:19:13 2019 +0000 Always set hwcode on error. Useful for debugging. commit 5864b8bda1fe57e456bd410330d37672cfc1ca6d Author: maxv Date: Fri Sep 13 06:39:29 2019 +0000 As I suspected, the KASSERT I added yesterday can fire if we try to process zero-sized packets.
Skip them to prevent a type confusion that can trigger random page faults later. Reported-by: syzbot+3e447ebdcb2bcfa402ac@syzkaller.appspotmail.com commit 95af5f86324417b72ff3c257c52742c972e48d40 Author: maxv Date: Thu Sep 12 07:38:19 2019 +0000 Add KASSERT to catch bugs. Something tells me it could easily fire. commit 6aa3dfe518e8aa38ebd9bd424237cf3811ae9c0b Author: maxv Date: Thu Sep 12 06:39:47 2019 +0000 Fix a normally harmless race: initialize several global variables only on cpu0, so we don't get eg cpu1 re-initializing them while cpu0 is using them. commit 65bce5562ab3bb33d8d07686bc4e807da8328095 Author: maxv Date: Sun Sep 8 18:46:32 2019 +0000 Hum, remove incorrect assignment. Userland could have passed a smaller namelen, and the uninitialized bytes from sb_data were being used later in the network stack. commit 258d0e01f7e40715fce6f116dde21511bcab6a29 Author: maxv Date: Sun Sep 8 07:00:20 2019 +0000 Introduce sigaction_copy(), to copy sigaction structures without padding, and use it in sigaction1(). This is to fix info leaks all at once in the signal functions. commit e7a495dcf8fa94f2f165120aa48d26b81612cedf Author: maxv Date: Sat Sep 7 18:56:01 2019 +0000 Merge amd64func.S into cpufunc.S, and clean up. commit f37e237057c8023491bc2114c7d2b3ed16eae664 Author: maxv Date: Sat Sep 7 18:33:16 2019 +0000 Convert rdmsr_locked and wrmsr_locked to inlines. commit d7b94fb3175d5ba89cebf65b1fd1caf5752a5e6d Author: maxv Date: Sat Sep 7 11:09:03 2019 +0000 Add a memory barrier on wrmsr, because some MSRs control memory access rights (we don't use them though). Also add barriers on fninit and clts for safety. commit de218cadd4f1655a7536008ccd71f282a5572c62 Author: maxv Date: Sat Sep 7 10:24:01 2019 +0000 Add KASAN instrumentation on the bus_space functions that handle buffers. commit ddb74eb69ec89d3ceea953d64f29a6b80617cbd5 Author: maxv Date: Sat Sep 7 09:46:07 2019 +0000 Add KASAN instrumentation for memmove. 
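Note on the sigaction_copy() entry above: the idea of copying a structure "without padding" can be sketched as follows. This is a minimal illustration, not the kernel's code; `struct demo_act` and the `demo_*` helpers are hypothetical names. It also assumes, as compilers do in practice, that storing to a member does not disturb the padding bytes that were zeroed beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure: on LP64 ABIs there is padding after 'flags'. */
struct demo_act {
	int flags;
	void *handler;
};

/*
 * Zero the destination, then copy field by field, so the padding bytes
 * of 'dst' never inherit stale bytes from 'src'.
 */
static void
demo_act_copy(struct demo_act *dst, const struct demo_act *src)
{
	assert(dst != NULL && src != NULL);
	memset(dst, 0, sizeof(*dst));
	dst->flags = src->flags;
	dst->handler = src->handler;
}

/* Return 1 if the bytes between 'flags' and 'handler' are all zero. */
static int
demo_act_padding_is_zero(const struct demo_act *a)
{
	const unsigned char *p = (const unsigned char *)a;
	size_t i;

	for (i = offsetof(struct demo_act, flags) + sizeof(a->flags);
	    i < offsetof(struct demo_act, handler); i++) {
		if (p[i] != 0)
			return 0;
	}
	return 1;
}
```

A plain structure assignment or memcpy() would carry the source padding along, which is how such bytes can later reach userland.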
commit 2185b3bcb613ceac577510ef77a718bc95005f52 Author: maxv Date: Fri Sep 6 09:19:06 2019 +0000 Reorder for clarity, and localify pool_allocator_big[], should not be used outside. commit 6669fda4ef52f4ddb6be8e219815d281aa6e7a97 Author: maxv Date: Thu Sep 5 16:19:16 2019 +0000 Add KASAN instrumentation on the atomic functions. Use macros to simplify. These macros are prerequisites for future changes. commit 39dafe1bd80feac7425849a6cce86feb3491c0b5 Author: maxv Date: Thu Sep 5 12:57:30 2019 +0000 Remove unused, and style. commit acdc5a40ea8166bde601c56572b2b3a2d4cfd9c0 Author: maxv Date: Tue Aug 27 17:24:51 2019 +0000 Fix bug, remove {0,0} because we switched to usb_lookup(). commit c4855322554f4c78100769ce1d2693fadb12c96e Author: maxv Date: Mon Aug 26 10:35:35 2019 +0000 Revert r1.254, put back || for KASAN, some destructors like lwp_dtor() caused false positives. Needs more work. commit fa55dee2eb123be3f218411aa9c809fcbe15ea73 Author: maxv Date: Mon Aug 26 10:19:08 2019 +0000 Reject negative offsets, to prevent panics later in genfs_getpages(). commit 127cb8cbb0df7afd85d32514080aa7adcdcb51df Author: maxv Date: Sun Aug 25 07:10:30 2019 +0000 Fix the size passed to memcpy, we only want 8 bytes. Found by KASAN. commit 7d126bf7d638a5b13001e0123a12b54ef8e4e478 Author: maxv Date: Sat Aug 24 14:21:13 2019 +0000 I don't see the point in having this useless printf, but add a '\n' to it, so that it at least displays useless stuff correctly. commit d5c2d209a0aad2382c3362cebdf250a742010e8e Author: maxv Date: Sat Aug 24 14:18:43 2019 +0000 Fix memory leak. commit d353d5d53f21a14aa00ede56a5591f3d601fbb03 Author: maxv Date: Sat Aug 24 14:08:35 2019 +0000 Hum, don't pass an mbuf to realloc(). Inspired from copyin32_msg_control(). commit e22743f7f2b944bde9f4d571cdbfe90c62fadbe4 Author: maxv Date: Sat Aug 24 12:33:25 2019 +0000 Don't read data from userland directly. 
This simply does not work on any recent x86 CPU (thanks to SMAP) and all architectures that forbid direct access to userland from the kernel. But I guess no one noticed because no one ever uses compat_linux, right? commit 0bd13b595136e04ecf60a88cb45cd3110e3e9c6c Author: maxv Date: Fri Aug 23 14:12:39 2019 +0000 Fix info leaks. commit 8da7c4fb7507fc10e6645aae94de89421e42dfb0 Author: maxv Date: Fri Aug 23 13:59:45 2019 +0000 Fix info leak. commit f79c6d34954790ac0233704b15b96885c478353f Author: maxv Date: Fri Aug 23 13:49:12 2019 +0000 Hum, don't forget the 'pid' argument, otherwise we're not gonna go very far. commit 791ca3625d1ccbce39f37ec6a4e1124758b69683 Author: maxv Date: Fri Aug 23 13:36:45 2019 +0000 Fix info leaks. commit aeaf1f2e1d44b4212762f644857a0b00f7df9472 Author: maxv Date: Fri Aug 23 12:49:59 2019 +0000 Put the printf under DEBUG_LINUX. commit f2284345a294573b3fd15bd550d831ea7b8fe2c5 Author: maxv Date: Fri Aug 23 12:42:14 2019 +0000 Fix error handling, returns an errno, not -1. commit c5899e7ec9cc65e45eaaec4fed5c51cda5876b5f Author: maxv Date: Fri Aug 23 12:09:17 2019 +0000 Add a default case, don't call sys_ioctl() with an uninitialized 'com' argument. commit 6a99472c6a4ed57ac575c40ac29f76a0f183d78a Author: maxv Date: Fri Aug 23 11:19:39 2019 +0000 When dealing with an unknown value, set -1, to prevent (harmless) uninitialized accesses later. commit 9221d329a9d5e573acae96c53577242788332360 Author: maxv Date: Fri Aug 23 10:31:14 2019 +0000 Remove printf. commit b68e52b0a88e27d113ebf19a3dd3dd688e7749ae Author: maxv Date: Fri Aug 23 10:22:14 2019 +0000 Fix stupid bugs in linux_sys_shmctl(): the index could be out of bound (page fault) and there was no proper locking. Maybe we should just remove LINUX_SHM_STAT, like compat_linux32. commit 515ffd83c7d90d01ea8c2348f01859bf3149a293 Author: maxv Date: Fri Aug 23 09:41:26 2019 +0000 Add missing mutex, we were hitting a KASSERT. 
commit 63f4f5892b285f179e06ea18b23ed8fb22fe189d Author: maxv Date: Fri Aug 23 08:31:11 2019 +0000 Fix info leaks in sigaltstack. commit 2af6c2421af10e459ddb7eee11b95f49f1f678ac Author: maxv Date: Fri Aug 23 08:01:42 2019 +0000 Fix info leaks in sysinfo(). commit 5e23d5fcc74c2932501c1d50f4b610541a5715f1 Author: maxv Date: Fri Aug 23 07:53:36 2019 +0000 Fix info leaks. commit 9e1e9ebad757f17a7f5ac3d95671b0100e30d25f Author: maxv Date: Fri Aug 23 06:59:52 2019 +0000 Fix info leaks. commit 71ca9c8f0ca7c690c3d6d9042a7a71d6c0b23ace Author: maxv Date: Fri Aug 23 06:54:54 2019 +0000 Fix info leak. commit c50162c13b75f2cedd6b1a8a88735417561abead Author: maxv Date: Fri Aug 23 06:47:58 2019 +0000 Fix info leaks. commit 38e4b6c542d306bc21db8e71b3a09a6b7f5557ca Author: maxv Date: Wed Aug 21 17:14:05 2019 +0000 Style and cleanup. commit 56efcbd93cbc37971cd7c20c0fe0f57fde5f4fe0 Author: maxv Date: Wed Aug 21 17:06:36 2019 +0000 Remove the single-step check, it is wrong. There is no way we could single-step on these entry points. If there were, we would be running with the wrong GS.base, and we would have died long before. commit d5948df6d5d6b81bd9d6c588352d1928e8d1b532 Author: maxv Date: Wed Aug 21 16:35:10 2019 +0000 Switch from printf to panic. These messages were notorious for being unreadable, and at least a clean panic allows the user to inspect the system via DDB. Also simplify the output, EAX gets overwritten with the error code so it indicates nothing meaningful. commit aa7ad3cb6544af8faefc81f60449e0cdee611b59 Author: maxv Date: Wed Aug 21 12:46:56 2019 +0000 Style and remove dead stuff. commit 798eb8739c05438b9397ba704dd30ae29132d008 Author: maxv Date: Wed Aug 21 12:33:12 2019 +0000 Don't depend on #ifdef USER_LDT in cpu_mcontext32_validate(), but rather on whether the proc uses a user-set LDT. Same as check_sigcontext32(). commit 116c234c833bcd8ea2ad261158821f5a913d7446 Author: maxv Date: Wed Aug 21 12:16:07 2019 +0000 No USER_LDT on Xen. 
commit 25e9e0fd714ded5525a372f8fac8fcd80ee0982e Author: maxv Date: Tue Aug 20 18:43:57 2019 +0000 Fix info leak, not all of 'pev' is initialized. commit d58bd629357795a9f5cdee85133510373fe9354a Author: maxv Date: Tue Aug 20 12:25:41 2019 +0000 Disable netbsd32_drm.c until it receives proper review. commit 8298e53fae2ed1e6fd58620e983202ea938b7264 Author: maxv Date: Sat Aug 17 12:37:49 2019 +0000 Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29 MI pools on amd64 from linked lists to bitmaps, which have higher security properties. Then, change the computation of the size of the PH pools: take into account the bitmap area available by default in the ph_u2 union, and don't go with &phpool[>0] if &phpool[0] already has enough space to embed a bitmap. The pools that are migrated in this change all use bitmaps small enough to fit in &phpool[0], therefore there is no increase in memory consumption. commit 59cc283b98392776ac84d1665e87dfca5356685f Author: maxv Date: Fri Aug 16 10:41:35 2019 +0000 Initialize pp->pr_redzone to false. For some reason with KUBSAN GCC does not eliminate the unused branch in pr_item_linkedlist_put(), and this leads to an unused uninitialized access which triggers KUBSAN messages. commit 074877b015dfb5bb5187d69235138b8b8ef3a29d Author: maxv Date: Thu Aug 15 12:24:08 2019 +0000 Unlink KMEM_GUARD leftovers. commit d028ec13d27411e5a9784cc75dfefe2455139df7 Author: maxv Date: Thu Aug 15 12:06:42 2019 +0000 Retire KMEM_GUARD. It has been superseded by kASan, which is much more powerful, has much more coverage - far beyond just kmem(9) -, and also consumes less memory. KMEM_GUARD was a debug-only option that required special DDB tweaking, and had no use in releases or even diagnostic kernels. As a general rule, the policy now is to harden the pool layer by default in GENERIC, and use kASan as a diagnostic/debug/fuzzing feature to verify each memory allocation & access in the system.
commit 56c649b68ff7b9674f999a504c865d046dbb0405 Author: maxv Date: Tue Aug 13 09:48:24 2019 +0000 sync with reality commit ed019e76fea7e63e2643238826072525d1fbcf85 Author: maxv Date: Wed Aug 7 10:36:19 2019 +0000 Check fc_type before fc_cluster, because the latter may not be initialized. This is harmless because fc_type is always initialized properly, so the next branch wouldn't have been taken. commit 3356cb9f72f07c5b72b348765cd1f27bb7214a59 Author: maxv Date: Wed Aug 7 08:47:09 2019 +0000 Introduce USB_DESCRIPTOR_SIZE (3), and fix two bugs: 1) In usbd_find_idesc(), make sure the tables we're reading fit in the allocated buffer, otherwise small overflow (seen on KASAN, with bLength=1). 2) Modify usbd_find_edesc(), to fix the same issues as 1). ok mrg@ commit 549cdeb397d94c841f39b5f18b209735df25cb66 Author: maxv Date: Wed Aug 7 06:28:03 2019 +0000 Sync with reality. commit 2d05417d37378e1c54c1da61a49021946201287e Author: maxv Date: Wed Aug 7 06:23:48 2019 +0000 Add support for USER_LDT in SVS. This allows us to have both enabled at the same time. We allocate an LDT for each CPU in the GDT and map an area for it, in addition to the default LDT already present. In context switches between different processes, we choose between the default or the per-cpu LDT selector: if the user set specific LDT entries, we memcpy them to the per-cpu LDT and load the per-cpu selector. Tested by Naveen Narayanan (with Wine on amd64). commit a685756b3abd42b1caea2d3050efeee29ed2ab2d Author: maxv Date: Tue Aug 6 08:10:27 2019 +0000 Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion could lead to npgs=0, which is not expected. It later triggers a panic in uvm_vsunlock(). Found by TriforceAFL (Akul Pillai). commit 22934201a61106bb55d231172d44d58fa8eabcbd Author: maxv Date: Sun Aug 4 14:30:36 2019 +0000 Fix info leaks. 
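Note on the 'npgs' entry above: the 64bit->32bit conversion it describes can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: 4 KiB pages, a 64-bit size_t, and hypothetical `demo_*` names. A uint32_t stands in for the low 32 bits that an int would keep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Old shape: the page count is squeezed into a 32-bit variable. */
static uint32_t
demo_npgs_32(uint64_t len)
{
	return (uint32_t)(len >> DEMO_PAGE_SHIFT);
}

/* New shape: the page count keeps its full width. */
static size_t
demo_npgs_size(uint64_t len)
{
	/* This sketch only makes sense where size_t is 64 bits wide. */
	assert(sizeof(size_t) >= sizeof(uint64_t));
	return (size_t)(len >> DEMO_PAGE_SHIFT);
}
```

With a length of 2^44 bytes the 32-bit count wraps to zero, which is the unexpected npgs=0 case, while the size_t count stays at 2^32.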
commit bba16da44362a4d7b15424ae0d3a6420f713aba4 Author: maxv Date: Sat Aug 3 09:31:07 2019 +0000 Replace || by && in KASAN, to increase the pool coverage. Strictly speaking, what we want to avoid is poisoning buffers that were referenced in a global list as part of the ctor. But, if a buffer indeed got referenced as part of the ctor, it necessarily has to be unreferenced in the dtor; which implies it has to have a dtor. So we want both a ctor and a dtor, and not just one of them. Note that POOL_QUARANTINE already implicitly provides this increased coverage. commit a8ba46d5e423a796cefa11ed1c436556b9de636d Author: maxv Date: Fri Aug 2 05:22:14 2019 +0000 Kernel Heap Hardening: perform certain sanity checks on the pool caches directly, to immediately detect certain bugs that would otherwise have been detected only later on the pool layer, if the buffer ever reached the pool layer. commit 7133310556a1c96b6a27feee3141a353e337c00d Author: maxv Date: Wed Jul 31 19:40:59 2019 +0000 1) Make sure we have a complete endpoint descriptor header, otherwise small overflow. 2) Make sure the total length of the bos descriptor did not change in the meantime, otherwise severe memory corruption. 3) Make sure we have a complete hid descriptor header, otherwise small overflow. 4) Error out if the report descriptor is zero-sized, otherwise panic. ok skrll@ mrg@ commit 668bae59d437f33f83a6dbe365b51ce7f6ba4ad6 Author: maxv Date: Mon Jul 29 09:42:17 2019 +0000 Fix info leak: the padding after the header causes uninitialized heap memory to be copied to userland in sys_recvmsg(). commit 96b5928ca658fc8eecfcbe4b227093dbc05a5624 Author: maxv Date: Tue Jul 23 17:21:33 2019 +0000 1) If the descriptor length is bigger than the USB string descriptor itself, error out. Otherwise there is a small overflow (seen on KASAN, with bLength=255). 2) Make sure we have a config descriptor header, otherwise there are small overflows (seen on KASAN, with wTotalLength=1). 
3) Once we have the complete config descriptor, make sure its size didn't change in the meantime. Otherwise there could be severe overflows. 4) Make sure we have a bos descriptor header, otherwise overflow, same as 2). ok mrg@ skrll@ commit 64bb0438705e62088d17903d579a9abf2a848b5e Author: maxv Date: Sun Jul 14 05:58:44 2019 +0000 Fix uninitialized variable: if 'tvp' is NULL, '*tdep' is not initialized. This could have caused the KASSERT to wrongfully fire. ok riastradh@ commit 0160edcbaadd9ce263a35b43a969236ebfd81893 Author: maxv Date: Sat Jul 13 14:24:37 2019 +0000 Remove the roundups, they are incorrect and cause memcmp to wrongfully fail because of uninitialized bytes at the end of the buffers. ok rmind@ commit cda5ded6afc840f01424695cd188d02de74af2e1 Author: maxv Date: Fri Jul 12 17:18:30 2019 +0000 Fix info leak: zero out the buffer, because it is not entirely filled, and the uninitialized bytes get copied to userland in sys___getdents30(). Remove unneeded cast while here. commit 0614378fa8c73e02895cef022fa8e06a4f7e4abd Author: maxv Date: Thu Jul 11 17:30:44 2019 +0000 Fix info leaks: the alignment of the structures causes uninitialized heap memory to be copied to userland in sys_recvmsg(). commit db9c91c2aeec8532c4cc7186d52caad33bcabfd0 Author: maxv Date: Thu Jul 11 17:07:10 2019 +0000 Fix info leak: 'map_attrib' is not used in UVM, and contains uninitialized heap garbage. Return zero. Maybe we should remove the field completely. commit 1e7d7de9cacb56131a9b8eb0df60e1ba0c45e652 Author: maxv Date: Thu Jul 11 16:59:14 2019 +0000 Fix (harmless) uninitialized variable: 'pg' could be 'endm', in which case 'pg->uobject' would not be initialized. Just invert the two last conditions of the KASSERT. ok hannken@ commit cc6e90f4682b0d55a8aa7ff52ac3d7866dc8e7a2 Author: maxv Date: Wed Jul 10 17:55:33 2019 +0000 Fix info leak: use kmem_zalloc, because we align the buffers, and the otherwise uninitialized padding bytes get copied to userland in bpf_read().
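Note on the USB descriptor entries above (bLength=1, bLength=255, truncated headers): the kind of check they describe can be sketched as a bounded walk over a descriptor buffer. This is not the code of usbd_find_idesc() or usbd_find_edesc(); `demo_find_desc` and `DEMO_DESC_HDR` are hypothetical, and only the standard two-byte header (bLength, bDescriptorType) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DESC_HDR	2	/* bLength + bDescriptorType */

/*
 * Return the offset of the first descriptor of the given type in
 * buf[0..len), or -1 if none is found or the buffer is malformed.
 */
static long
demo_find_desc(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;

	assert(buf != NULL || len == 0);
	/* 'off' never exceeds 'len', so 'len - off' cannot wrap. */
	while (len - off >= DEMO_DESC_HDR) {
		uint8_t blen = buf[off];

		if (blen < DEMO_DESC_HDR)
			return -1;	/* too short to hold its own header */
		if (blen > len - off)
			return -1;	/* claims more bytes than remain */
		if (buf[off + 1] == type)
			return (long)off;
		off += blen;
	}
	return -1;
}
```

Both failure cases are rejected before any byte past the end of the buffer is read.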
commit da12ffa63224a1261964fc1bd59c1f990805c019 Author: maxv Date: Wed Jul 10 17:52:22 2019 +0000 Fix info leak: instead of using SS_INIT as a literal compound, use a global variable from rodata. The compound gets pushed on the stack, the padding of the structure was therefore not initialized, and was getting leaked to userland in sys___sigaltstack14(). commit 13331b3fac1ddbe1d4c205ac41acd47a0c5fac37 Author: maxv Date: Wed Jul 10 17:32:37 2019 +0000 Zero out 'cprng->cs_name' entirely. Otherwise the RND pool gets polluted by uninitialized bits from the end of the string. commit 03685f6b39538b5a6857fa01b21efbed5410dc49 Author: maxv Date: Tue Jul 9 17:06:46 2019 +0000 Fix info leak: always clear 'dkw', because some of its (otherwise uninitialized) fields can be copied to userland, typically in the DIOCGWEDGEINFO ioctl. commit a331918d668be429059de915dff1bfa7e68d5489 Author: maxv Date: Tue Jul 9 16:56:24 2019 +0000 Fix uninitialized variable: in ipsec_checkpcbcache(), spidx.dir is not initialized, and the padding of the spidx structure is not initialized either. This causes the memcmp() to wrongfully fail. Change ipsec_setspidx() to always initialize spidx.dir and zero out the padding. ok ozaki-r@ commit d7f07f98ec3886c168e0fb4781592fdfe1c52058 Author: maxv Date: Sun Jul 7 15:12:59 2019 +0000 The whole 'tv' structure gets added to the RND pool, so clear it first, otherwise each random buffer gets tainted by uninitialized bytes from the padding. commit 5f9dd21e12bc5ef11198d7b33791c727639ccfe3 Author: maxv Date: Sat Jul 6 14:37:24 2019 +0000 Fix bug: if seg == UIO_SYSSPACE, tv[] is not initialized. The branches should depend on tptr[] instead. commit 11071d41c382946e9c8092f2f1ce6efe80f0ff35 Author: maxv Date: Sat Jul 6 14:27:38 2019 +0000 Fix (harmless) uninitialized variable. In the path namei_tryemulroot -> namei_oneroot-> namei_start There was a branch where 'ndp->ni_erootdir' was not initialized.
commit 21027ae766d42505d1476983fc214253d8896c23 Author: maxv Date: Sat Jul 6 08:00:19 2019 +0000 Revert previous, for now. commit 4c49d02418100522a93e7ea7913dfbd8c3a65471 Author: maxv Date: Sat Jul 6 05:41:23 2019 +0000 Add a condition in the loop. Otherwise there could be an infinite loop, and we could also be wrongfully adding more wedges than necessary. Arbitrarily limit the number of blocks to 512, like GPT. commit 54cda24b14d01f81401f96d68072d6068908f0f5 Author: maxv Date: Sat Jul 6 05:13:10 2019 +0000 Localify two functions that are no longer used outside. Also return the error from the *_vcpu_run() functions, now that we commit the states in them (which can fail). commit d0e828351e508c24e70f401a0bf33b573777ba7b Author: maxv Date: Sat Jul 6 05:05:53 2019 +0000 Fix two length checks, otherwise a malicious USB key plugged in the system could trigger overflows, seen with KASAN. commit 37ac2396b7048288aafbcc643923ceab1f0be482 Author: maxv Date: Fri Jul 5 17:14:48 2019 +0000 Fix info leak. The padding of 'sigact' is not initialized, it gets copied in the proc, and can later be obtained by userland. commit 666264fa94a1f22ae35a427810ca95cc519aef7c Author: maxv Date: Fri Jul 5 17:08:55 2019 +0000 More inlines, prerequisites for future changes. Also, remove fngetsw(), which was a duplicate of fnstsw(). commit 7d82f8bdff11301c0b06b8f611280cddc6ffe31e Author: maxv Date: Wed Jul 3 17:40:29 2019 +0000 Check the return value of PHY_READ(). Because, if it fails, 'reg' is not initialized. On Qemu, this read systematically fails. Print an error in this case, and act as if there was no fiber. Maybe there is a smarter way to fix this kind of things. commit 94fca8f69efe931f01e1975aecd51ebc542a78c4 Author: maxv Date: Wed Jul 3 17:31:32 2019 +0000 Invert two conditions, to fix uninitialized memory access. If the node is an immediate, then the 64 bits of nnode.sysctl_data may not all be initialized, since this is an union. Obviously, this is harmless; but still a bug, so fix it. 
commit e51942df84390cffffd973bc558545f1056e35c6 Author: maxv Date: Wed Jul 3 17:24:37 2019 +0000 Inline x86_cpuid2(), prerequisite for future changes. Also, add "memory" on certain other inlines, to make sure GCC does not reorder. commit 90669f0e93048e4b2b27fe2fd2f81097f612b095 Author: maxv Date: Mon Jul 1 17:15:43 2019 +0000 Restrict the size given to copyoutstr. It is safer to do that; even if there is no actual bug here, since the buffer is guaranteed to be NUL terminated. With KASAN we check the whole buffer to cover the "worst" case, and here it triggered false positives because the buffer size was not filtered. commit 919e47b6800acf15e2dd1faa4421c5a24269403b Author: maxv Date: Sat Jun 29 11:37:17 2019 +0000 Fix bug, don't release the reflock if we didn't take it in the first place. Looks like there are other locking issues in here. Reported-by: syzbot+81d2c90809163ab1e13c@syzkaller.appspotmail.com commit de17945650e85a6c884074a97fb3f6d5c43710f2 Author: maxv Date: Sat Jun 29 11:13:23 2019 +0000 The big pool allocators use pool_page_alloc(), which allocates page-aligned storage. So if we switch to a big pool, set PR_NOALIGN, because the address of the storage is not aligned to the item size. Should fix PR/54319. commit c7b68781231d718dc95541b79530b82eab7a4afb Author: maxv Date: Thu Jun 27 19:56:10 2019 +0000 Fix this fucking shit once and for all, for fuck's sake. commit c06e4da241cbc77cf957a25d50edf866e5464c32 Author: maxv Date: Wed Jun 26 20:28:59 2019 +0000 Remove useless debugging messages which achieved nothing but hiding bugs. commit 97515bf45b264ee189809748d78da4e5e03cccbb Author: maxv Date: Tue Jun 25 16:58:02 2019 +0000 Fix buffer overflow. It seems that some people need to go back to the basics of C programming. Reported-by: syzbot+8665827f389a9fac5cc9@syzkaller.appspotmail.com commit 61c59d054013a8b9f866c3456d2b75c6ecc9d01d Author: maxv Date: Sat Jun 22 12:57:40 2019 +0000 Revamp the TPM driver * Fix several bugs, and clean up. 
* Drop the "legacy" interface, it relied on an undocumented global variable that was never initialized. It likely had never been tested either, so good riddance. * Add support for TPM 2.0 chips via ACPI. For these we use the TIS1.2 interface, same as TPM 1.2. * Provide an ioctl to fetch TPM information from the driver. Tested on a Lenovo desktop with ACPI-TPM2.0, an HP laptop ACPI-TPM2.0, a Dell laptop with ISA-TPM1.2. commit 01d5453a9b53ab986625f17fb22d14709013652d Author: maxv Date: Sat Jun 22 12:39:40 2019 +0000 Dump TPM2. commit 6a56e877a12a793aee8e0a92442f7609917b1e87 Author: maxv Date: Sat Jun 22 06:45:46 2019 +0000 Fix buffer overflow. Triggerable by plugging a specially-crafted USB key in the machine (the kernel automatically tries to parse its GPT header). The check could maybe be appeased to allow bigger sizes, but we've never done that, so I'm leaving it as-is. commit 0a16923c5a7fc37329ec618c231a97e750880bdf Author: maxv Date: Thu Jun 20 17:33:30 2019 +0000 Add KASLR support in UEFI. commit 4febafc81e1fa9d1ed55d506625dd3b40e893987 Author: maxv Date: Sun Jun 16 18:30:31 2019 +0000 Make sure VMX-outside-SMX is allowed. It may not be if the BIOS decided to disable VMX. Seen on an HP laptop, where NVMM would panic because of that. commit 5f0a072e182158d1fcc3c74278301f0975b988a2 Author: maxv Date: Sun Jun 16 07:42:52 2019 +0000 Misc changes in RISC-V. commit 46ad26c7ef24fbefa69d0d1a90427fc656324d6f Author: maxv Date: Sat Jun 15 06:40:34 2019 +0000 Add KASAN_PANIC, an option to turn KASAN warning into kernel panics, requested by Siddharth. While here clarify a little. commit e057090624c844155c70a69824f114872453d095 Author: maxv Date: Thu Jun 13 17:33:34 2019 +0000 Fix the error handling in ehci_pci_attach(): if we got a USB<2 device we won't call ehci_init(), so don't call ehci_detach() in ehci_pci_detach(). 
Fixes a panic seen on a recent Lenovo machine, which has a USB 1.1 controller; ehci_detach() was getting called while 'sc' had not been completely initialized. commit da79db57eed0c5785bb489440456259d0c60f762 Author: maxv Date: Thu Jun 13 17:20:25 2019 +0000 Random style in ehci, also KM_SLEEP does not fail. commit 2d39ebf166acd9d66525b7758ae0391f7a5e5ea4 Author: maxv Date: Sat Jun 8 07:27:44 2019 +0000 Change the NVMM API to reduce data movements. Sent to tech-kern@. commit 0fd3afd674a655a6b2ac8668d4edcbbadb51d870 Author: maxv Date: Sat Jun 1 15:20:51 2019 +0000 Add XXXs for SCTP bugs. commit 0ddddbcaee5c19d263e5bb42a5f0535373171da6 Author: maxv Date: Sat Jun 1 12:42:27 2019 +0000 Misc changes in RISC-V. Start changing the memory layout, too. commit 6b86ec3cf8b2deafd07e6efabe773bf0adda9b66 Author: maxv Date: Sat Jun 1 08:12:26 2019 +0000 Fix two bugs in pmap_write_protect(): * The mask should be ~PAGE_MASK, not PTE_FRAME. PTE_FRAME eliminates the higher bits, and that's not wanted. * The computation of tva is incorrect: if the VA is in kernel space we must take the canonical hole into account, and here we were not. We've had these bugs basically forever. It meant that uvm_km_protect() would never flush the correct VA, and a stale TLB entry would persist. Fixes PR/54257. Since I added PCID support we execute invpcid in invlpg(), and invpcid triggers a #GP if the address is non-canonical, contrary to invlpg. The wrong computation of the VA during a modload happened to hit the canonical hole. commit 2f948a24b47a2a314b099f7b2184272b1863fc4a Author: maxv Date: Sat Jun 1 06:54:28 2019 +0000 Mmh, check the highest leaf before calling x86_cpuid(), otherwise on old CPUs we might be getting garbage. While here fix a typo. Likely fixes PR/54256. commit d2a25c1a5d3cda8ab802aa25f52d6b19384dce80 Author: maxv Date: Wed May 29 17:09:17 2019 +0000 Add support for AMD Family 17h.
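Note on the pmap_write_protect() entry above: the difference between the two masks can be shown with plain integer arithmetic. The sketch covers only the first of the two bugs (the mask), not the tva computation. The constants are assumptions that mirror the usual amd64 layout (4 KiB pages, 48-bit virtual addresses, physical frame in bits 12..51), and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_MASK	UINT64_C(0x0000000000000fff)	/* assumed 4 KiB pages */
#define DEMO_PTE_FRAME	UINT64_C(0x000ffffffffff000)	/* assumed frame bits 12..51 */

/* Intended rounding: drop only the offset within the page. */
static uint64_t
demo_trunc_page(uint64_t va)
{
	return va & ~DEMO_PAGE_MASK;
}

/* Wrong mask for a VA: also strips bits 52..63. */
static uint64_t
demo_trunc_frame(uint64_t va)
{
	return va & DEMO_PTE_FRAME;
}

/* With 48-bit VAs, bits 47..63 must be all zeros or all ones. */
static int
demo_is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;

	return top == 0 || top == UINT64_C(0x1ffff);
}

/* Models the invpcid constraint: a non-canonical address would #GP. */
static uint64_t
demo_invpcid_target(uint64_t va)
{
	assert(demo_is_canonical(va));
	return va;
}
```

For a kernel address such as 0xffffffff80201234, the first helper yields a canonical page address, while the second yields an address inside the canonical hole.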
commit 5fd19911815aeb30afe879ed7abcec9d77e15080 Author: maxv Date: Wed May 29 16:54:41 2019 +0000 Add PCID support in SVS. This avoids TLB flushes during kernel<->user transitions, which greatly reduces the performance penalty introduced by SVS. We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages in both ASIDs. The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether SVS+PCID is in use. commit 4c0a64f103c808652e0f37a3cc8e61aa0459627f Author: maxv Date: Tue May 28 18:25:23 2019 +0000 http -> https commit 7f27acd09dbbebeb7550e595c4517020b9f6b92d Author: maxv Date: Mon May 27 18:36:37 2019 +0000 Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled, but don't use PTE_G on the kernel PTEs in general. Add PTE_G on only a few pages, that are already leaked to userland and do not contain secrets. This slightly improves syscall performance. commit 26316327fe97e02bc89499e3ba59058f9ab030c2 Author: maxv Date: Mon May 27 17:32:36 2019 +0000 Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and there, reduces a future diff. commit 8a55f218bc791bba4e6b1ad3f325e792d5363f54 Author: maxv Date: Sat May 25 21:02:32 2019 +0000 Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be outdated, and we could be filling the AVX registers with garbage. commit 78d5e83e0d6b06ef351b1cdc2b34fefad1997b40 Author: maxv Date: Sun May 19 08:49:08 2019 +0000 Remove useless call to fpu_sigreset(), buildcontext() already calls it. commit 5d973841a56f389a2fee436bfa66bf747808cbbb Author: maxv Date: Sun May 19 08:46:15 2019 +0000 Rename fpu_save_area_clear -> fpu_clear fpu_save_area_reset -> fpu_sigreset Clearer, and reduces a future diff. No real functional change. commit 76a1c43c525b6869cb1d929d616a3db05a52777e Author: maxv Date: Sun May 19 08:17:02 2019 +0000 Misc changes in the x86 FPU code. Reduces a future diff. No real functional change. 
commit 7353232cf17070f5995d0c51c4c2a88e43ff3028 Author: maxv Date: Sat May 18 13:44:57 2019 +0000 Enable EagerFPU by default. Sent on port-amd64@. commit b69b8b5814b8c85218df54e131fbf4c0308fe7bf Author: maxv Date: Sat May 18 13:32:12 2019 +0000 Two changes in the CPU mitigations: * Micro-optimize: put every mitigation in the same branch. This removes two branches in each exc/int return path, and removes all branches in the syscall return path. * Modify the SpectreV2 mitigation to be compatible with SpectreV4. I recently realized that both couldn't be enabled at the same time on Intel. This is because initially, when there was just SpectreV2, we could reset the whole IA32_SPEC_CTRL MSR. But then Intel added another bit in it for SpectreV4, so it isn't right to reset it entirely anymore. SSBD needs to stay. commit 54ebea61e129501586cb6328d90159ccc3b04162 Author: maxv Date: Sat May 18 08:55:59 2019 +0000 Now that SVS cannot be disabled at run time, MSR_LSTAR is static, so no need to save it on each VM enter. commit cfa7714deb82186871e534d820c2a4bdf81a1582 Author: maxv Date: Sat May 18 08:54:38 2019 +0000 Use XC_HIGHPRI for SpectreV2 to reduce the CPU downtime. We already do this for MDS. commit 5dd16828fbcaa092a1a0aaea59a031657b7e95a8 Author: maxv Date: Sat May 18 08:17:39 2019 +0000 Clean up a little, add new XCR0 bits, remove a few unused MSRs, and fix typos. commit f30b2e320cce51af33482d52a415b517087eb604 Author: maxv Date: Sat May 18 07:58:58 2019 +0000 Set the symbol type for intrfastexit, so that tools like tprof can find the symbol name. commit 625bffbcef62e464abcd09531af15949bf3185da Author: maxv Date: Sat May 18 07:49:31 2019 +0000 Disable errata #1091. We are the only OS to apply it, and it seems to be causing trouble to VirtualBox (PR/54143). commit 970c6276142f701249b9a37b87697423aaaf23e5 Author: maxv Date: Wed May 15 18:27:51 2019 +0000 Enable EagerFPU on Xen PV. Should work as-is. Sent on port-amd64@. 
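Note on the SpectreV2/SpectreV4 entry above: the incompatibility it describes comes down to resetting a whole register versus clearing one bit. The sketch below is not the kernel's hotpatch code. It only models IA32_SPEC_CTRL as an integer, using the architectural bit positions (IBRS bit 0, STIBP bit 1, SSBD bit 2), with hypothetical `demo_*` names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SPEC_CTRL_IBRS	UINT64_C(0x1)	/* bit 0, SpectreV2 */
#define DEMO_SPEC_CTRL_STIBP	UINT64_C(0x2)	/* bit 1 */
#define DEMO_SPEC_CTRL_SSBD	UINT64_C(0x4)	/* bit 2, SpectreV4 */
#define DEMO_SPEC_CTRL_KNOWN \
	(DEMO_SPEC_CTRL_IBRS | DEMO_SPEC_CTRL_STIBP | DEMO_SPEC_CTRL_SSBD)

/* Original approach: turning IBRS off by resetting the whole value. */
static uint64_t
demo_ibrs_off_reset(uint64_t msr)
{
	(void)msr;
	return 0;
}

/* Compatible approach: clear IBRS only, leave the other bits alone. */
static uint64_t
demo_ibrs_off_masked(uint64_t msr)
{
	/* The sketch only models the three bits defined above. */
	assert((msr & ~DEMO_SPEC_CTRL_KNOWN) == 0);
	return msr & ~DEMO_SPEC_CTRL_IBRS;
}
```

With both mitigations active, the reset variant silently drops SSBD, while the masked variant keeps it.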
commit 12a952c184560235da8f3dd0cc1326f99f6f1b05 Author: maxv Date: Wed May 15 17:35:02 2019 +0000 RB_MD3 now disables SVS. commit 7e23abd6bcec9606d5a41483c6ffc122f065105f Author: maxv Date: Wed May 15 17:31:41 2019 +0000 Change the way SVS is disabled. Now you have to pass "boot -3" from the bootloader. The machdep.svs.enabled sysctl becomes read-only, and just indicates whether SVS is enabled. Sent on port-amd64@. commit e66fce09b8b94eb5624b1f46bd6a75f17c24c076 Author: maxv Date: Wed May 15 04:39:52 2019 +0000 NVMM: Expose MD_CLEAR to the guests. commit 0dc155513120e0b3359291b3b6829858a2341443 Author: maxv Date: Tue May 14 16:59:25 2019 +0000 Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS). It requires a microcode update, now available on the Intel website. The microcode modifies the behavior of the VERW instruction, and makes it flush internal CPU buffers. We hotpatch the return-to-userland path to add VERW. Two sysctls are added: machdep.mds.mitigated = {0/1} user-settable machdep.mds.method = {string} constructed by the kernel The kernel will automatically enable the mitigation if the updated microcode is present. If the new microcode is not present, the user can load it via cpuctl, and set machdep.mds.mitigated=1. commit baf8a2a31f7bf246b2e588424a1b6c75b5c93030 Author: maxv Date: Mon May 13 18:53:10 2019 +0000 Remove comment, since there is no parsing anymore. commit 1cec1b2ad28be2813ea5a9186a2f6bf10dd13b2e Author: maxv Date: Mon May 13 17:50:30 2019 +0000 Remove dead code. commit 68287dc149ec097044afe9dfd4dd7163873e81bd Author: maxv Date: Sat May 11 19:31:03 2019 +0000 Add smtoff, an rc.d script that disables Simultaneous Multi-Threading. It parses the output of cpuctl, and executes "cpuctl offline" for each CPU that has SmtID!=0. The default is "smtoff=NO", which means that SMT remains enabled. 
commit adfe7f88a8a5b56abecc8a597a1b40ef2dbec3f3 Author: maxv Date: Sat May 11 11:59:21 2019 +0000 Check the return value of cpuset_set(), to prevent future surprises. commit 954bc679a4c861ad8d384f99d2940f872ea22712 Author: maxv Date: Sat May 11 11:53:55 2019 +0000 Fix bug, the computation of cpuset_nentries was incorrect, we must do +1 to be able to address the last 32 bits. On a machine with 80 CPUs, this caused "cpuctl identify >64" to return garbage. commit 8d06476a4af78e72e2840470daac171f98e30abc Author: maxv Date: Sat May 11 07:44:00 2019 +0000 Replace "VMM" by "emulator", clearer. commit b0c3d26d9a3ac415670cce95549ae9a3d868a46d Author: maxv Date: Sat May 11 07:40:38 2019 +0000 Sync with reality. commit 08b6253b720d3b8f53c9ab0c3ed78e48dfb117d8 Author: maxv Date: Sat May 11 07:31:56 2019 +0000 Rework the machine configuration interface. Provide three ranges in the conf space: , and . Remove nvmm_callbacks_register(), and replace it by the conf op NVMM_MACH_CONF_CALLBACKS, handled by libnvmm. The callbacks are now per-machine, and the emulators should now do: - nvmm_callbacks_register(&cbs); + nvmm_machine_configure(&mach, NVMM_MACH_CONF_CALLBACKS, &cbs); This provides more granularity, for example if the process runs two VMs and wants different callbacks for each. commit f42f7d992df819472a3ad1e65c88074beb42350d Author: maxv Date: Fri May 10 18:21:01 2019 +0000 Clean up, and add sanity checks on the microcode lengths. commit c84db16209cd8b8b305501ecb2aa71b52830ce53 Author: maxv Date: Thu May 9 18:53:14 2019 +0000 Invalidate the cache before updating the microcode. Some platforms require this. Seen in Illumos and FreeBSD. commit 4c7f382b72887b7851d7a6947f05876fa44daf33 Author: maxv Date: Sat May 4 17:19:10 2019 +0000 Rewrite kasan_mark() to fix a still existing race in pool_cache_get_paddr() that could cause false positives. Now a buffer initially valid remains valid, with no invalid->valid dance. 
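Note on the cpuset_nentries entry above: the 80-CPU symptom follows from integer division. The sketch below is an illustration with hypothetical `demo_*` names and an assumed 32 bits per entry. It uses round-up division, whereas the commit describes a "+1", which likewise makes the last partial entry addressable.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BITS_PER_ENTRY	32	/* assumption: 32-bit entries */

/* Truncating division: loses the last, partially used entry. */
static size_t
demo_nentries_truncating(size_t ncpu)
{
	return ncpu / DEMO_BITS_PER_ENTRY;
}

/* Round-up division: every CPU index gets an entry. */
static size_t
demo_nentries_roundup(size_t ncpu)
{
	assert(ncpu > 0);
	return (ncpu + DEMO_BITS_PER_ENTRY - 1) / DEMO_BITS_PER_ENTRY;
}

/* Return 1 if the bit of 'cpu' lives inside an array of 'nentries'. */
static int
demo_cpu_addressable(size_t cpu, size_t nentries)
{
	return cpu / DEMO_BITS_PER_ENTRY < nentries;
}
```

With 80 CPUs the truncating count is 2 entries, so CPUs 64 to 79 fall outside the array, which matches the ">64" symptom in the commit message.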
commit 5376d5e6d6840960b6d44d2508904c0802b5cb1a Author: maxv Date: Sat May 4 10:07:10 2019 +0000 Add KASAN instrumentation for kcopy and copystr. commit 9b6778dfadd59333bcf718af3231759b3db80048 Author: maxv Date: Sat May 4 08:50:39 2019 +0000 Hum. Fix a potentially catastrophic bug: kcopy() sets DF=1 if the areas overlap, but doesn't clear it if the copy faults. If this happens, we return to the caller with DF=1, and each future memory copy will be backwards. I wonder if there really are places where kcopy() is called with overlapping areas. commit 1394b45ba7c93c827abe805cf54d0c9980891ddb Author: maxv Date: Sat May 4 07:20:22 2019 +0000 More inlined ASM. While here switch to proper types. commit 11d15af4940b40879a6158e646cc2b3ea96dcd1a Author: maxv Date: Wed May 1 15:17:49 2019 +0000 Start converting the x86 CPU functions to inlined ASM. Matters for NVMM, where some are invoked millions of times. commit cab61aa07012601a8a9f544e84c209442dc5d27e Author: maxv Date: Wed May 1 14:29:15 2019 +0000 Remove unused functions and reorder a little. commit 6087bfac2af9811302540232697de9be85ac714c Author: maxv Date: Wed May 1 09:20:21 2019 +0000 Use the comm page to inject events, rather than ioctls, and commit them in vcpu_run. This saves a few syscalls and copyins. For example on Windows 10, moving the mouse from the left to right sides of the screen generates ~500 events, which now don't result in syscalls. The error handling is done in vcpu_run and it is less precise, but this doesn't matter a lot, and will be solved with future NVMM error codes. commit 9b9b8400e517955de0a7298d2d980b618e657f42 Author: maxv Date: Mon Apr 29 19:03:17 2019 +0000 sync with reality commit ed9fe4d15fc1177ea499bd2708f7dc6965686646 Author: maxv Date: Mon Apr 29 18:54:25 2019 +0000 Stop taking care of the INT/NMI windows in the kernel, the emulator is supposed to do that itself. 
commit 604a261057e604c2616f4404e5b9f06f3e18a618 Author: maxv Date: Mon Apr 29 17:27:57 2019 +0000 Remove useless calls to nvmm_init().
commit 45e2a27ec0c25d5be79280294f4d27e2ab9e02d3 Author: maxv Date: Sun Apr 28 14:22:13 2019 +0000 Modify the communication layer between the kernel NVMM driver and libnvmm: introduce a bidirectional "comm page", a page of memory shared between the kernel and userland, and used to transfer data in and out in a more performant manner than ioctls. The comm page contains the VCPU state, plus three flags:
 - "wanted": the states the kernel must get/set when requested via ioctls
 - "cached": the states that are in the comm page
 - "commit": the states the kernel must set in vcpu_run
The idea is to avoid performing expensive syscalls, by using the VCPU state cached, either explicitly or speculatively, in the comm page. For example, if the state is cached we do a direct 1->5 with no syscall:

      +---------------------------------------------+
      |                    Qemu                     |
      +---------------------------------------------+
            |                              ^
            | (0) nvmm_vcpu_getstate       | (6) Done
            |                              |
            V                              |
         +---------------------------------------+
         |                libnvmm                |
         +---------------------------------------+
            |   ^          |               ^
  (1) State |   | (2) No   | (3) Ioctl:    | (5) Ok, state
    cached? |   |          | "please cache |     fetched
            |   |          |  the state"   |
            V   |          |               |
        +-----------+      |               |
        | Comm Page |------+---------------+
        +-----------+      |
              ^            |
 (4) "Alright |            V
        babe" |     +--------+
              +-----| Kernel |
                    +--------+

The main changes in behavior are:
 - nvmm_vcpu_getstate(): won't emit a syscall if the state is already cached in the comm page, will just fetch from the comm page directly
 - nvmm_vcpu_setstate(): won't emit a syscall at all, will just cache the wanted state in the comm page
 - nvmm_vcpu_run(): will commit the to-be-set state in the comm page, as previously requested by nvmm_vcpu_setstate()
In addition to this, the kernel NVMM driver is changed to speculatively cache certain states known to be of interest, so that the future nvmm_vcpu_getstate() calls performed by libnvmm or the emulator use the comm page rather than expensive syscalls. For example, if an I/O VMEXIT occurs, the I/O Assist in libnvmm will want GPRS+SEGS+CRS+MSRS, and now the kernel caches all of that in the comm page before returning to userland. Overall, in a normal run of Windows 10, this saves several million syscalls. Eg on a 4CPU Intel with 4VCPUs, booting the Win10 install ISO goes from taking 1min35 to taking 1min16. The libnvmm API is not changed, but the ABI is. If we changed the API it would be possible to save expensive memcpys on libnvmm's side. This will be avoided in a future version. The comm page can also be extended to implement future services.
commit 7fd08024e558cd8d5f30167e7232aefb8588350c Author: maxv Date: Sat Apr 27 17:30:38 2019 +0000 Mmh, fix nvmm_vcpu_create(), the cpuid is given, and must not be chosen from the free map. Looks like I forgot this after all my design rounds. While here reorder the initialization.
commit 476039c9961b8b37a4d97b94c56da14e713693f9 Author: maxv Date: Sat Apr 27 15:45:21 2019 +0000 Reorder the NVMM headers, to make a clear(er) distinction between MI and MD. Also use #defines for the exit reasons rather than a union.
No ABI change, and no API change except 'cap->u.{}' renamed to 'cap->arch'. commit c1fdc11ee2cdcff7799fd1f52dc5d0d9e7059c03 Author: maxv Date: Sat Apr 27 10:40:17 2019 +0000 Add support for EnhancedIBRS, a more performant mitigation for SpectreV2, available on future CPUs (or maybe they already exist now...). commit 5e829d46acdcee0321e4e572bb68f62c664bb94b Author: maxv Date: Sat Apr 27 09:06:18 2019 +0000 If guest events were being processed when a #VMEXIT occurred, reschedule the events rather than dismissing them. This can happen for instance when a guest wants to process an exception and an #NPF occurs on the guest IDT. In practice it occurs only when the host swapped out specific guest pages. commit 005f3c8b3a55c052dd3915faab8b403f46f4858f Author: maxv Date: Sat Apr 27 08:16:19 2019 +0000 Optimize nvmm-intel, use inlined GCC assembly rather than function calls. commit 6e083df1253a316ead4b31c755b9be92dc1b8ad9 Author: maxv Date: Wed Apr 24 18:45:15 2019 +0000 Match the structure order, for better cache utilization. commit 5752ed64e4a0572edcb1573eff2de53bcd5e7413 Author: maxv Date: Wed Apr 24 18:19:28 2019 +0000 Provide the hardware error code for NVMM_EXIT_INVALID, useful when debugging. commit 84199453cecc15e3d0a26299565ad77259dfdad1 Author: maxv Date: Sun Apr 21 06:48:37 2019 +0000 update commit 0e0adf5f5e9a8630a707a255a2949e1462a6383f Author: maxv Date: Sun Apr 21 06:46:03 2019 +0000 Note removal of COMPAT_OSF1. commit c22718bd6777a7854fb4b0da8c76bc96aa922abc Author: maxv Date: Sun Apr 21 06:37:21 2019 +0000 Rename the PTE bits. commit 2b64ba44dd6880f19807eee36a4dbd534a8d49b9 Author: maxv Date: Sat Apr 20 08:45:30 2019 +0000 Ah, take XSAVE into account in ECX too, not just in EBX. Otherwise if the guest relies only on ECX to initialize/copy the FPU state (like NetBSD does), spurious #GPs can be encountered because the bitmap is clobbered. 
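The "wanted" / "cached" / "commit" flags of the comm page commit (45e2a27e) above can be sketched roughly as follows. This is a userland toy under stated assumptions, not the driver's code: the structure layout, the ST_* bits and every function name are hypothetical, and the ioctl and the VCPU run are simulated by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state categories, one bit each. */
#define ST_GPRS	UINT64_C(0x1)
#define ST_SEGS	UINT64_C(0x2)
#define ST_CRS	UINT64_C(0x4)
#define ST_MSRS	UINT64_C(0x8)

/* Hypothetical stand-in for the shared page; the real layout differs. */
struct comm_page {
	uint64_t wanted;	/* states the kernel must get/set on ioctl */
	uint64_t cached;	/* states currently valid in the page */
	uint64_t commit;	/* states the kernel must set in vcpu_run */
};

static unsigned int nsyscalls;	/* counts simulated kernel entries */

/* Simulated kernel side of the "please cache the state" ioctl. */
static void
fake_ioctl_getstate(struct comm_page *cp)
{
	nsyscalls++;
	cp->cached |= cp->wanted;
	cp->wanted = 0;
}

/* getstate: skip the syscall when everything asked for is cached. */
static void
sketch_getstate(struct comm_page *cp, uint64_t flags)
{
	assert(cp != NULL);
	if ((cp->cached & flags) != flags) {
		cp->wanted = flags & ~cp->cached;
		fake_ioctl_getstate(cp);
	}
	assert((cp->cached & flags) == flags);
}

/* setstate: never a syscall, just mark the state for a later commit. */
static void
sketch_setstate(struct comm_page *cp, uint64_t flags)
{
	cp->cached |= flags;
	cp->commit |= flags;
}

/*
 * Simulated vcpu_run: the kernel applies the pending commit, runs the
 * guest (which invalidates the cache), then speculatively caches the
 * states it expects userland to ask for.
 */
static void
fake_vcpu_run(struct comm_page *cp, uint64_t speculative)
{
	nsyscalls++;
	cp->commit = 0;
	cp->cached = speculative;
}
```

In this toy, a getstate that follows a run costs no extra kernel entry as long as the requested states were among those cached speculatively.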
commit 7917a2a2578d5165b2662337a867d93eda74b474 Author: maxv Date: Sun Apr 14 09:09:55 2019 +0000 Add more checks, if the values are negative we hit a KASSERT later in the timeout. Reported-by: syzbot+662dbeb526303f458255@syzkaller.appspotmail.com commit db3439ae216062883bfd795441aef3981c7c5af5 Author: maxv Date: Sat Apr 13 08:41:36 2019 +0000 Introduce POOL_QUARANTINE, a feature that creates a window during which a freed buffer cannot be reallocated. This greatly helps detecting use-after-frees, because they are not short-lived anymore. We maintain a per-pool fifo of 128 buffers. On each pool_put, we do a real free of the oldest buffer, and insert the new buffer. Before insertion, we mark the buffer as invalid with KASAN. On each pool_cache_put, we destruct the object, so it lands in pool_put, and the quarantine is handled there. POOL_QUARANTINE can be used in conjunction with KASAN to detect more use-after-free bugs. commit 2e5fa34968ae799f3c75df4b5b121be529e4db1a Author: maxv Date: Sat Apr 13 06:17:33 2019 +0000 Fix use-after-free. If we're not polling, virtio_enqueue_commit() will send the transaction, and it means 'xs' can be immediately freed. So, save the value of xs_control beforehand. Detected by KASAN, ok jdolecek@. Fixes PR/54008 Reported-by: syzbot+6513c4afe66237d7207f@syzkaller.appspotmail.com commit 0577c18073da8622c42c5d9bf6fd41c98f99c7ed Author: maxv Date: Thu Apr 11 17:43:45 2019 +0000 Add KASAN instrumentation for copyin/copyinstr/copyoutstr. No copyout for now, because mm.c needs whitelisting. commit 71d0393c414b5bd8f6682493422c2eb34d0a4d57 Author: maxv Date: Wed Apr 10 18:49:04 2019 +0000 Add the NVMM_CTL ioctl, always privileged regardless of the permissions of /dev/nvmm. We'll use it to provide a way for an admin to control the registered VMs in the kernel. Add an associated wrapper in libnvmm. 
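The POOL_QUARANTINE commit (db3439ae) above describes a per-pool FIFO of 128 buffers where each put really frees the oldest entry. A minimal userland sketch of that idea follows; the structure and function names are hypothetical, free() stands in for the real release path, and the KASAN marking is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define QUAR_NITEMS 128	/* FIFO depth given in the commit message */

/* Hypothetical per-pool quarantine; not the kernel's actual structure. */
struct quarantine {
	void *items[QUAR_NITEMS];
	size_t head;	/* slot holding the oldest buffer */
	size_t count;	/* number of buffers currently held */
};

/*
 * Quarantine a freed buffer. Returns the buffer that must now be really
 * freed (the oldest one), or NULL while the FIFO is still filling up.
 */
static void *
quarantine_put(struct quarantine *q, void *buf)
{
	void *oldest = NULL;

	assert(q != NULL && buf != NULL);
	if (q->count == QUAR_NITEMS) {
		/* Full: evict the oldest, reuse its slot for the newcomer. */
		oldest = q->items[q->head];
		q->items[q->head] = buf;
		q->head = (q->head + 1) % QUAR_NITEMS;
	} else {
		q->items[(q->head + q->count) % QUAR_NITEMS] = buf;
		q->count++;
	}
	return oldest;
}

/* Userland stand-in for the put path: defer the real free(). */
static void
sketch_pool_put(struct quarantine *q, void *buf)
{
	/* The kernel would mark buf as invalid with KASAN at this point. */
	free(quarantine_put(q, buf));	/* free(NULL) is a no-op */
}
```

A freed buffer therefore stays unallocatable for the next 128 puts on the same pool, which is the window during which a stale access can be caught.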
commit d2195f0ad54a4e3d40fc9ed119474c79c70423ff Author: maxv Date: Mon Apr 8 18:38:45 2019 +0000 Reset so_cred to NULL after freeing it, because close() may leave the PCB in pcblist, and we don't want a future lookup (via eg netstat) to read freed data. Detected by KASAN, reported by Alexander Nasonov. commit 80bc33b376c379a8434db4527fdb5d1bd7b803e0 Author: maxv Date: Mon Apr 8 18:30:54 2019 +0000 Switch to MODULE_CLASS_MISC, from pgoyette@. commit 03a3a4384fc5891c12b53dce75f0493e6a093b84 Author: maxv Date: Mon Apr 8 18:23:46 2019 +0000 Don't forget to call (*machine_destroy) when killing VMs. commit 4077ef59599375eeed48ac452d63d440583af5a9 Author: maxv Date: Mon Apr 8 18:21:42 2019 +0000 Use the fd_clone approach, to avoid losing references to the registered VMs during fork(). We attach an nvmm_owner struct to the fd, reference it in each VM, and identify the process' VMs by just comparing the pointer. commit c5ddaabe68e9be361363faf215be0ef87108ed81 Author: maxv Date: Sun Apr 7 14:28:50 2019 +0000 Invert the filtering priority: now the kernel-managed cpuid leaves are overwritable by the virtualizer. This is useful to virtualizers that want to 100% control every leaf. commit 8f258a3edcb2d2d522f3090bba8c022fb41eb37a Author: maxv Date: Sun Apr 7 14:13:03 2019 +0000 Sync, and fix grammar. commit dbae0ceee4ed07f63929343fe58b428d04b94563 Author: maxv Date: Sun Apr 7 14:05:15 2019 +0000 Don't allow unloading when there are still VMs registered, and don't allow auto-unloading at all. Not a big problem actually, because since I changed the module class it's not auto-loadable anymore. commit a6d5868716e6305c9a6887896b8e7cfa8a78c234 Author: maxv Date: Sun Apr 7 09:20:04 2019 +0000 Provide a code argument in kasan_mark(), and give a code to each caller. Five codes used: GenericRedZone, MallocRedZone, KmemRedZone, PoolRedZone, and PoolUseAfterFree. This can greatly help debugging complex memory corruptions. 
commit 409bfd6f186ca45271decc4e9dfdda45d06bf4ef Author: maxv Date: Sun Apr 7 08:37:38 2019 +0000 Fix tiny race in pool+KASAN, that resulted in occasional false positives. We were uselessly marking already valid areas as valid. When doing that, our KASAN code emits two calls to kasan_markmem, and there is a very small window where the area becomes invalid. So, if the area happens to be already globally referenced, and if another thread happens to read the buffer via this reference, we get a false positive. This happens only with pool_caches that have a pc_ctor that creates a global reference to the buffer, and there is one single pool_cache that does that: 'file_cache'. So now, two changes: - In pool_cache_get_slow(), the pool_get() has already redzoned the object, so no need to call pool_redzone_fill(). - In pool_cache_destruct_object1(), don't re-mark the object. If there is no ctor pool_put is fine with already-invalid objects, if there is a ctor the object was not marked as invalid in the first place; so in either case, the re-marking is not needed. Fixes PR/53674. Although very rare and difficult to reproduce, a local quarantine patch of mine made the false positives recurrent. commit 2d7d00f32f8289cc7ae72e1790add5e61c84b17a Author: maxv Date: Sat Apr 6 11:49:53 2019 +0000 Replace the misc[] state by a new compressed nvmm_x64_state_intr structure, which describes the interruptibility state of the guest. Add evt_pending, read-only, that allows the virtualizer to know if an event is pending. commit cee0a1fca1498ab9c3de45bd069b06989d6be71b Author: maxv Date: Thu Apr 4 17:33:47 2019 +0000 Check the GPA permissions too in the Assists, because it is possible that the guest traps on a page the virtualizer marked as read-only (even if it appears as read-write in the HVA). 
commit 5045444de08ca75681072b18f4ec33422ba1d85c Author: maxv Date: Wed Apr 3 19:23:38 2019 +0000 Fix small read overflow; harmless, because since I removed RH0, the memory access on IPV6_RTHDR that would normally be illegal is not needed, and GCC automatically removes it. commit 48aa4d6e69e102f0fef64eb84ba540ffa29772bf Author: maxv Date: Wed Apr 3 19:14:25 2019 +0000 When scrolling the screen don't forget to update the last line. Whatever, there is no case where the screen scrolls actually. commit d4ea4f24d0f171e44ac50f48f055daade6a2d6e0 Author: maxv Date: Wed Apr 3 19:10:58 2019 +0000 VMX: if PAT is not valid, #GP on WRMSR, rather than crashing the guest. commit 142dd7189d5502fae113cded5d2b5952b714fbc3 Author: maxv Date: Wed Apr 3 18:05:55 2019 +0000 Add new VMCS bits. commit fe5d0e57b74a2dc1dccbd4262cbf1b3ef26b14c3 Author: maxv Date: Wed Apr 3 17:32:58 2019 +0000 Add MSR_TSC. commit f8afa7247c2b150fe651591f35487afdc0d8474f Author: maxv Date: Sun Mar 31 19:54:36 2019 +0000 Also check for MT_CONTROL, and end the receive operation if we see one. It is possible to get an MT_CONTROL if we sleep in MSG_WAITALL. The other BSDs do the same. Reported-by: syzbot+e8aa26ad551c649227b4@syzkaller.appspotmail.com commit 0895cc3a398fca7be974f1ea435e45ad6b992145 Author: maxv Date: Thu Mar 28 19:00:40 2019 +0000 Move NVMM in the "any" class, so that it can be enabled in GENERIC. Add missing files in files.nvmm, and add NVMM (commented out) in the amd64 GENERIC. Remove the "caveats" section in the man page. commit 8b0f4dd812b88cc53de4523154e9f0030ff54bbb Author: maxv Date: Thu Mar 28 18:12:24 2019 +0000 Move pnbuf_cache into vfs_init.c, where it belongs. commit 7971e505e1b748bcb2fd1a140a2565a113965ad0 Author: maxv Date: Wed Mar 27 18:27:46 2019 +0000 Kernel Heap Hardening: detect frees-in-wrong-pool on on-page pools. The detection is already implicitly done for off-page pools. 
We recycle pr_slack (unused) in struct pool, and make ph_node a union in order to recycle an unsigned int in struct pool_item_header. Each time a pool is created we atomically increase a global counter, and register the current value in pp. We then propagate this value in each ph, and ensure they match in pool_put. This can catch several classes of kernel bugs and basically makes them unexploitable. It comes with no increase in memory usage and no measurable increase in CPU cost (inexistent cost actually, just one check predicted false). commit 589eb749c3ec12516ecde0fb96e8de7128f5a121 Author: maxv Date: Tue Mar 26 20:05:18 2019 +0000 Remove unneeded PR_NOALIGN, pool_allocator_kmem is already page-aligned. commit ab8f33ba03fe8d584fee0e148eb20821455cd346 Author: maxv Date: Tue Mar 26 18:31:30 2019 +0000 Remove POOL_SUBPAGE, it is unused, undocumented, and adds confusion. commit a437fd7b1d35cab3da9fd159f47ceccf6869e80a Author: maxv Date: Mon Mar 25 19:24:29 2019 +0000 Remove compat_osf1, discussed on tech-kern@. commit 1c8d1ec43fa1c918eb73677a5b774321847de82a Author: maxv Date: Sun Mar 24 16:39:46 2019 +0000 regen commit 7cdeecffdf7dcbc465f6d6dc9b60807ac65a4a4b Author: maxv Date: Sun Mar 24 16:24:19 2019 +0000 Remove Alpha's compat_linux dependency on compat_osf1. Each function is copied as-is from compat_osf1 with no functional change. Discussed on tech-kern@, ok @thorpej. commit b6b1f0380be2b10c6d110c8c7c4bc610ce0b3c63 Author: maxv Date: Sun Mar 24 15:58:32 2019 +0000 Disable preemption when setting PCB_COMPAT32, to prevent a context switch before cpu_fsgs_reload() finishes, otherwise we write garbage in the GDT. On NetBSD-current it is harmless, however in NetBSD-8 it might cause panics, because NetBSD-8 uses the old SegRegs model and under this model we reload %fs and %gs during switches. commit ba1cbf467f32655d26db97068cad42b433684c88 Author: maxv Date: Sun Mar 24 13:15:42 2019 +0000 Fix a tiny race in setregs and linux_setregs. 
Between the moment we set pcb_flags to zero, and the moment cpu_segregs64_zero resets pcb_gs, we may be preempted. If this happens, and if the calling LWP was a 32bit thread, when switching back to that LWP, the context switcher sees that PCB_COMPAT32 is not set in pcb_flags and tries to perform a 64bit context switch; but pcb_gs contains a 32bit GDT descriptor, and not a 64bit GS.base value. The wrmsr therefore faults because the value is non-canonical, and this fault is fatal. Rearrange the code so that the update of pcb_flags and pcb_gs/pcb_fs is non interruptible. This fixes the problem, tested with a reproducer (which therefore doesn't work anymore). Likely fixes PR/53993. commit 4b92aafb0857370a5766f4521723d43a325ef5bc Author: maxv Date: Sat Mar 23 13:05:24 2019 +0000 Remove references to COMPAT_OSF1 in HPPA, it has never been supported on this architecture. commit 37ae473d4a2a167f154728d68e674607fc8c6ef6 Author: maxv Date: Sat Mar 23 12:01:18 2019 +0000 Enable QUEUEDEBUG under DIAGNOSTIC. It has never been documented and used, but it's very useful and costs basically nothing. I even think we could enable it by default in the kernel (if we added __predict_false's and removed some crap). commit 3c8cd92515737bfa26863a0057d8267e53c8cb81 Author: maxv Date: Sat Mar 23 10:02:05 2019 +0000 In fact, xc_broadcast also applies to offline CPUs, so we don't need to make sure each CPU is online. Remove the checks, I suspect they weren't totally correct by the way. commit df7220ac04759dff216e0cbdb1d74429f7ae392e Author: maxv Date: Thu Mar 21 20:21:40 2019 +0000 Make it possible for an emulator to set the protection of the guest pages. For some reason I had initially concluded that it wasn't doable; verily it is, so let's do it. The reserved 'flags' argument of nvmm_gpa_map() becomes 'prot' and takes mmap-like protection codes. commit 6bf721436d5875679a5c86c93d78d36d6f96a8dd Author: maxv Date: Tue Mar 19 19:23:39 2019 +0000 Add CVS ids, and rename the PTE bits. 
No functional change. commit 1ff11c123f87a29c51a5b9a748d10f0fc92f5830 Author: maxv Date: Tue Mar 19 19:15:57 2019 +0000 Fix/remove some half-baked stuff I left in the prekern: - Page-align the idt store, to be extra sure. - Remove unneeded prototypes. - Drop the TSS, we don't care and aren't even using it. - Initialize %ss with a default value. - Fix three exception handlers, no need to push an error code. No actual impact, because these things are used only when returning from exceptions received in the prekern; these exceptions are not supposed to be ever received, never are, and if they were we wouldn't return anyway. commit abd880643d6e1e046f3f0f9b14d6b4b3d830203e Author: maxv Date: Mon Mar 18 20:34:48 2019 +0000 Kernel Heap Hardening: manage freed items with bitmaps rather than linked lists when we're on-page and the page header is naturally big enough to contain a bitmap. This comes with no increase in memory consumption, and similar CPU cost (maybe it's a little faster actually). We want to favor bitmaps over linked lists, because linked lists install kernel pointers inside the items, and this can be too easily exploitable in use-after-free or double-free conditions, or in item buffer overflows occurring within a pool page. commit 65595eb047bd2556121b43844eb2dee48b5db251 Author: maxv Date: Sun Mar 17 19:57:54 2019 +0000 Introduce a new flag, PR_USEBMAP, that indicates whether the pool uses a bitmap to manage freed items. It dissociates PR_NOTOUCH from bitmaps, but for now is set only when PR_NOTOUCH is set, which reproduces the current behavior. Therefore, no functional change. Also clarify the code. commit b1ca7247c0d2be28e1f6314c4b3133f970b3b8a2 Author: maxv Date: Sun Mar 17 15:33:50 2019 +0000 Kernel Heap Hardening: put the pool header at the beginning of the backing page, not at the end of it. 
This makes it harder to exploit buffer overflows, because it eliminates the certainty that sensitive kernel data is located after the item space and is therefore overwritable. The pr_itemoffset field is recycled, and holds the (aligned) offset of the item space. The pr_phoffset field becomes unused. We align 'itemspace' for clarity, but it's not strictly necessary. This comes with no performance cost or increase in memory usage, in particular the potential padding consumed by roundup(PHSIZE, align) was already implicitly consumed before, because of the (necessary) truncations in the divisions. Now it's just more explicit, but not bigger. commit 61e03c23ec4fcb981051c2a00de49842e5c94a17 Author: maxv Date: Sun Mar 17 14:52:25 2019 +0000 Move some code into a separate function, and explain a bit. Also define PHSIZE. No functional change. commit cee8aa3875e151254c8bd9567cc26fa95675b72b Author: maxv Date: Sun Mar 17 07:22:18 2019 +0000 cosmetic commit 4a872c595ecd1659aa0f73101af48c249695e73f Author: maxv Date: Sun Mar 17 06:55:06 2019 +0000 Prepare the removal of the 'ioff' argument: add a KASSERT to ensure it is zero, and remove the internal logic. The pool code is simpler now. commit fd3f47a7ae22dab4303393a3ca0ec2c2c115f1cf Author: maxv Date: Sun Mar 17 06:36:22 2019 +0000 Hard-align the fields of the structures with __aligned(32), and pass ioff=0 in the pool cache. commit fa356fa0262cbd6acc4dc3710be79d8243379cbf Author: maxv Date: Sat Mar 16 13:33:10 2019 +0000 Misc changes: - Turn two KASSERTs to real panics, they are useful and not expensive. - Rename a few variables for clarity. - Add a new panic, to make sure a freed item is in the item space. commit 51cf5da1c747ba844cd9029a05d954c087822778 Author: maxv Date: Sat Mar 16 08:03:03 2019 +0000 Disable COMPAT_OSF1, will be removed. 
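For the "pool header at the beginning of the backing page" commit (b1ca7247) above, the layout arithmetic can be sketched as below. This assumes the item space starts at roundup(PHSIZE, align) as the message says; the structure, the function name and the example numbers (a 56-byte header, 64-byte items) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of y. */
#define ROUNDUP(x, y)	((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical result type, for illustration only. */
struct page_layout {
	size_t itemoffset;	/* where the item space starts in the page */
	size_t nitems;		/* how many items fit after the header */
};

/* Layout of a page whose header sits at offset 0. */
static struct page_layout
layout_header_first(size_t pagesz, size_t phsize, size_t align, size_t itemsz)
{
	struct page_layout l;

	assert(align != 0 && itemsz != 0);
	l.itemoffset = ROUNDUP(phsize, align);
	assert(l.itemoffset <= pagesz);
	l.nitems = (pagesz - l.itemoffset) / itemsz;
	return l;
}

/* Item count with the header at the end of the page (old layout). */
static size_t
nitems_header_last(size_t pagesz, size_t phsize, size_t itemsz)
{
	assert(itemsz != 0 && phsize <= pagesz);
	return (pagesz - phsize) / itemsz;
}
```

With a 4096-byte page, a 56-byte header and 64-byte aligned items, both layouts hold the same number of items in this example: the padding after the header was previously lost to the truncating division.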
commit b93efd52e70760b40fadce6293a47d26b3a560c0 Author: maxv Date: Thu Mar 14 20:29:53 2019 +0000 Optimize NVMM-Intel: keep the VMCS active on the host CPU, and lazy-switch it on demand only when needed. This allows the CPU to use the cached version of the guest state, rather than the in-memory copy of it. This is much more performant. A VMCS must be active on only one CPU, but one CPU can have several active VMCSs at the same time. We keep track of which CPU each VMCS is active on. When we want to execute a VCPU, we determine whether its VMCS is loaded on another CPU, and if so send an IPI to ask it to unbusy that VMCS. In most cases the VMCS is already active on the current CPU, so we don't have to do anything and can proceed with a fast VMRESUME. We send IPIs with kpreemption enabled but with a bound LWP, because we don't want to get context-switched to the CPU we just sent an IPI to. Overall, with this in place, I see a ~15% performance increase in the guests on NVMM-Intel. commit 7c63f593d73c2584a06f794e05bff5c594be25a4 Author: maxv Date: Thu Mar 14 19:26:44 2019 +0000 Move a KASSERT, applies to all branches. commit 78aa41a956f0454eb0b8003fab03bfd139f9f84a Author: maxv Date: Thu Mar 14 19:15:26 2019 +0000 Reduce the mask of the VTPR, only the first four bits matter. commit 3c093f6fb3e30cb7eaf4a43797824a8d4702895f Author: maxv Date: Thu Mar 14 19:10:27 2019 +0000 Fail early if we're beyond the guest max ram. commit 67b40fa53e3e6b781128957ba8b56e4576fefa0a Author: maxv Date: Wed Mar 13 20:56:33 2019 +0000 style commit 9dd9378171bb0ad4525ca5fa61374d9eb0264d66 Author: maxv Date: Mon Mar 11 20:38:27 2019 +0000 Add sanity check: make sure we retrieve a valid item header, by checking its page address against the one we computed. If there's a mismatch it means the buffer does not belong to the pool, and we panic. 
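The sanity check of commit 9dd93781 above (compare the page address recorded in the retrieved header against the one computed from the buffer) might look roughly like this. It is a sketch under assumptions: a 4096-byte power-of-two page size, and hypothetical structure and function names; the kernel panics where this sketch returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, power of two */

/* Hypothetical item header: records the page it manages. */
struct item_header {
	uintptr_t page;	/* base address of the backing page */
};

/* Base address of the page containing addr. */
static uintptr_t
page_of(uintptr_t addr)
{
	return addr & ~((uintptr_t)SKETCH_PAGE_SIZE - 1);
}

/*
 * True if the header retrieved for a freed buffer really describes the
 * page the buffer lives in. A mismatch means the buffer does not belong
 * to the pool.
 */
static bool
header_matches(const struct item_header *ph, uintptr_t buf)
{
	assert(ph != NULL);
	return ph->page == page_of(buf);
}
```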
commit e6f567f4e6d496baf9f251b8256133b3ee7b6add Author: maxv Date: Mon Mar 11 20:21:32 2019 +0000 Rename pr_item_notouch_* to pr_item_bitmap_*, and move some code into new pr_item_linkedlist_* functions. This makes it easier to see that we have two ways of handling freed items. No functional change. commit 989504d89d8ef743720cce08679e2830c10738f0 Author: maxv Date: Sun Mar 10 16:30:01 2019 +0000 Two changes: * Allow large pages to be passed in pmap_pdes_valid, this happens under DDB when it reads RIP (.text), called via pmap_extract. * Invert a branch in pmap_extract, so that 'l_cpu' is not touched if we're dealing with the kernel pmap. This fixes 'boot -d'. commit b29be6314e9783733fa14791c5ea83e98bd699fc Author: maxv Date: Sat Mar 9 09:09:56 2019 +0000 New software PTE bits. commit fad704ed94bc30d5147cf9d02bb7f1ea884ce5cc Author: maxv Date: Sat Mar 9 08:42:25 2019 +0000 Start replacing the x86 PTE bits. commit a13c832cf6991e0323e768c5dd50bc492b229bf9 Author: maxv Date: Thu Mar 7 18:32:10 2019 +0000 Mmh, fix len, mh_size includes the malloc header, but we don't redzone it. commit 824d33230d413d9a96d0392a69131c56377a8954 Author: maxv Date: Thu Mar 7 15:47:34 2019 +0000 Micro optimizations: - Compress x86_rexpref, x86_regmodrm, x86_opcode and x86_instr. - Cache-align the register, opcode and group tables. - Modify the opcode tables to have 256 entries, and avoid a lookup. commit 29f0099ec30e94009fcc2cbf75cfb803e8b04cef Author: maxv Date: Thu Mar 7 15:22:21 2019 +0000 Rename the internal NVMM HVA table entries from "segment" to "hmapping", less confusing. Also fix the error handling in nvmm_hva_unmap(). commit 4b725333a19d4295e7116a66e64600bf8c430d0d Author: maxv Date: Thu Mar 7 15:06:37 2019 +0000 Parse EXC_NMI on nvmm-intel, and don't return NVMM_EXIT_INVALID if we received a host NMI, otherwise the guest could get killed if an NMI comes in, typically when the host runs tprof at the same time. Already handled on nvmm-amd. 
commit 452754c6c646975c81be26c3f977d29e0a094cc9 Author: maxv Date: Thu Mar 7 14:40:35 2019 +0000 Introduce a new set of PTE bits, with a different naming convention.

    PG_V      -> PTE_P      /* Present */
    PG_RW     -> PTE_W      /* Write */
    PG_u      -> PTE_U      /* User */
    PG_WT     -> PTE_PWT    /* Write-Through */
    PG_N      -> PTE_PCD    /* Cache-Disable */
    PG_U      -> PTE_A      /* Accessed */
    PG_M      -> PTE_D      /* Dirty */
    PG_PAT    -> PTE_PAT    /* PAT on 4KB Pages */
    PG_PS     -> PTE_PS     /* Large Page Size */
    PG_G      -> PTE_G      /* Global Translation */
    PG_AVAIL1 -> PTE_AVL1   /* Ignored by Hardware */
    PG_AVAIL2 -> PTE_AVL2   /* Ignored by Hardware */
    PG_AVAIL3 -> PTE_AVL3   /* Ignored by Hardware */
    PG_LGPAT  -> PTE_LGPAT  /* PAT on Large Pages */
    PG_NX     -> PTE_NX     /* No Execute */

Until now we were using "PG_BIT". The "BIT" part of the naming did not follow the x86 naming convention in the spec, and was very confusing. We don't want the "PG_" part of it either, because UVM has similar flags (ie PG_BUSY). commit cc4bec4bb75c0df35a14018e05c06fd3c0409ecb Author: maxv Date: Thu Mar 7 13:26:24 2019 +0000 Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion. commit 887e6899e78d1d0fe89a3dc974810242488824f3 Author: maxv Date: Thu Mar 7 13:02:13 2019 +0000 Style, and remove useless comments. commit 978e3d4dc4ca94e34764c4bd6a513aa75ed2d813 Author: maxv Date: Thu Mar 7 12:29:14 2019 +0000 Remove getsombuf(), unused. commit 6dcedd92c2a4c74c95bec79fc9ddb80128870697 Author: maxv Date: Thu Mar 7 12:22:43 2019 +0000 style commit 4d7a6f4f63d89500ecc41ce5982e955c7472ac9f Author: maxv Date: Sun Mar 3 17:37:36 2019 +0000 Fix bug, the entry we're iterating on is 'current', not 'entry'. Here only the first entry gets wired in. commit f60c16c1d6aa08e0b987fc11b9a2c9c1399d7dd0 Author: maxv Date: Sun Mar 3 17:33:33 2019 +0000 Fix bug, PG_W is 'wired', not 'writable'. commit 155e55388e8fec5f9ada697af32077cf9390ad8f Author: maxv Date: Sun Mar 3 07:04:40 2019 +0000 Add KASAN use-after-scope detection in aarch64, tested by Ryo Shimizu, thanks.
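As a companion to the PTE renaming commit (452754c6) above, here is a small sketch of how the new names map onto bit positions. The positions are those of the x86 64-bit paging format and were not copied from the tree; the two helper functions are hypothetical and look at a single entry only, whereas real access rights also depend on the upper paging levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names follow the commit; values follow the x86 paging format. */
#define PTE_P		(UINT64_C(1) << 0)	/* Present */
#define PTE_W		(UINT64_C(1) << 1)	/* Write */
#define PTE_U		(UINT64_C(1) << 2)	/* User */
#define PTE_PWT		(UINT64_C(1) << 3)	/* Write-Through */
#define PTE_PCD		(UINT64_C(1) << 4)	/* Cache-Disable */
#define PTE_A		(UINT64_C(1) << 5)	/* Accessed */
#define PTE_D		(UINT64_C(1) << 6)	/* Dirty */
#define PTE_PAT		(UINT64_C(1) << 7)	/* PAT on 4KB Pages */
#define PTE_PS		(UINT64_C(1) << 7)	/* Large Page Size */
#define PTE_G		(UINT64_C(1) << 8)	/* Global Translation */
#define PTE_AVL1	(UINT64_C(1) << 9)	/* Ignored by Hardware */
#define PTE_AVL2	(UINT64_C(1) << 10)	/* Ignored by Hardware */
#define PTE_AVL3	(UINT64_C(1) << 11)	/* Ignored by Hardware */
#define PTE_LGPAT	(UINT64_C(1) << 12)	/* PAT on Large Pages */
#define PTE_NX		(UINT64_C(1) << 63)	/* No Execute */

/* Bit 7 means PAT in a 4KB entry and PS in an upper-level entry. */
static_assert(PTE_PAT == PTE_PS, "PAT and PS share bit 7");

/* Hypothetical helper: entry is present, writable and user-accessible. */
static bool
pte_user_writable(uint64_t pte)
{
	const uint64_t need = PTE_P | PTE_W | PTE_U;

	return (pte & need) == need;
}

/* Hypothetical helper: entry is present and not marked no-execute. */
static bool
pte_executable(uint64_t pte)
{
	return (pte & PTE_P) != 0 && (pte & PTE_NX) == 0;
}
```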
commit ad584c20185c433b9a7d3abb49a27a6a145f2f40 Author: maxv Date: Sun Mar 3 07:01:09 2019 +0000 Choose which CPUID bits to allow, rather than which bits to disallow. This is clearer, and also forward compatible with future CPUs. While here be more consistent when allowing the bits, and sync between nvmm-amd and nvmm-intel. Also make sure to disallow AVX, because the guest state we provide is only x86+SSE. Fixes a CentOS panic when booting on NVMM, reported by Jared McNeill, thanks. commit a7cdcccb8494989cb2e0a70cb06e5905b8a0ce28 Author: maxv Date: Tue Feb 26 12:23:12 2019 +0000 Change the layout of the SEG state: - Reorder it, to match the CPU encoding. This is the universal order, also used by Qemu. Drop the seg_to_nvmm[] tables. - Compress it. This divides its size by two. - Rename some of its fields, to better match the x86 spec. Also, take S out of Type, this was a NetBSD-ism that was likely confusing to other people. commit f35533a8094ee9d3fc61401beaf9028ab5d6f1f6 Author: maxv Date: Tue Feb 26 10:18:39 2019 +0000 Set hardseg to -1 rather than 0, because 0 can be a valid segment. commit 61fe4125ea51a54495e533b7eaecc232c2e6a2f9 Author: maxv Date: Tue Feb 26 06:52:34 2019 +0000 Fix locking: it is fine if the lock is already key_so_mtx, this can happen in socketpair. In that case don't take it. Ok ozaki-r@ Reported-by: syzbot+901e2e5edaaaed21c069@syzkaller.appspotmail.com commit d788e20728a29f2f68d3d8cee699d9c6dc58172f Author: maxv Date: Mon Feb 25 10:49:16 2019 +0000 Improve panic messages. commit e55989089613df24703a9a0f125d099932c433d3 Author: maxv Date: Mon Feb 25 07:31:32 2019 +0000 Fix the order in udp6_attach: soreserve should be called before in6_pcballoc, otherwise if it fails there is still a PCB attached, and we hit a KASSERT in socreate. In !DIAGNOSTIC this would have caused a memory leak. By the way I find the splsoftnet highly suspicious, in6_pcballoc already does that. Triggered by SyzKaller. 
Reported-by: syzbot+7bace612ca3cc3e124f8@syzkaller.appspotmail.com commit 1ddea7fcdc7a3c3753787a4a7efbef0a2b9b4d18 Author: maxv Date: Mon Feb 25 06:49:44 2019 +0000 RIP6, CAN, SCTP and SCTP6 lack a length check in their _send() functions. Fix RIP6 and CAN, add a big XXX in the SCTP ones. Found by KASAN, triggered by SyzKaller. Reported-by: syzbot+0b9692ae0f49f93b7dc7@syzkaller.appspotmail.com commit 2df69c0bc2a117843a26116890fcd0557058103e Author: maxv Date: Sun Feb 24 10:44:41 2019 +0000 Improve the KASAN output, provide an error code, to help distinguish classes of bugs. commit 237757f7f16d9e4ec4a43ceec804077c02db4621 Author: maxv Date: Sun Feb 24 08:02:45 2019 +0000 Add support for use-after-scope detection in KASAN. It is available since GCC7, and we have GCC7 by default now. Slightly reorder the code, and remove a duplicated KASSERT too. Tested on amd64-KASAN. Not yet enabled on aarch64-KASAN, but it should work as-is. commit 8a5c667c9cb103e4e0331fef47212ecc2c167539 Author: maxv Date: Sun Feb 24 07:20:33 2019 +0000 RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect() functions. Fix the first three, and add a big XXX in the SCTP ones. Found by KASAN, triggered by SyzKaller. Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com commit 3f7ba5d607892caff67ede7857c13f6b4659d0fc Author: maxv Date: Sat Feb 23 12:27:00 2019 +0000 Install the x86 RESET state at VCPU creation time, for convenience, so that the libnvmm users can expect a functional VCPU right away. commit d543662772e1b34c05bf351d19d5dd3069ce353b Author: maxv Date: Sat Feb 23 10:59:12 2019 +0000 Move PATENTRY into pmap.h, will be used outside. commit bce11997382ba886f8576eb6df01582b9747bd89 Author: maxv Date: Sat Feb 23 10:43:36 2019 +0000 Add support for CPUs that don't have the EPT_{A,D} bits. On such CPUs, these bits are ignored by the hardware. We don't care about setting them, however, we must always assume they are set. Modify the pmap code to do that. 
While here, in pmap_ept_remove_pte, don't flush the TLB when it's not needed. Tested on an old Intel Celeron. commit 1712eeb2645b4a221705e694b0a61ec3f9fc8830 Author: maxv Date: Sat Feb 23 08:19:16 2019 +0000 Reorder the functions, and constify setstate. No functional change. commit d3528af0854366fe84a0e7a20921a87bbd0e725d Author: maxv Date: Fri Feb 22 12:24:34 2019 +0000 Fix omission: if we receive a guest trap on CR0, and if the original instruction would have resulted in Long Mode being enabled, we need to manually enable Long Mode ourselves. We were already doing that correctly in setstate, but not in the CR0 trap handler. Problem initially reported by Aymeric Vincent; ArchLinux wouldn't boot, now it does and works correctly. While here, add CR0_ET in the CR0 mask, for the associated shadow to be taken into account. Normally this shadow bit shouldn't be necessary, but for now I keep it regardless. commit 94930a0185e2d94ac2d680ec8f0fa9d4f4b82936 Author: maxv Date: Thu Feb 21 14:56:23 2019 +0000 Add a TODO list for NVMM, just to list some known issues. commit 4390b9922a1b8f07540dcd57743362fac2f012cd Author: maxv Date: Thu Feb 21 14:31:54 2019 +0000 Remove wrong KASSERT in EPT, and reorder the code to reduce duplication. commit 875518e2abe1e083fc8caacc74a0731d7bdd944d Author: maxv Date: Thu Feb 21 13:25:44 2019 +0000 Reorder the detection in vmx_ident(), to fix panic on old CPUs. We must read MSR_IA32_VMX_EPT_VPID_CAP _after_ ensuring EPT is there, because if it's not, the rdmsr faults. commit d6f8093fcd631fbae0ce8ae04447ae1492d92992 Author: maxv Date: Thu Feb 21 12:17:52 2019 +0000 Another locking issue in NVMM: the {svm,vmx}_tlb_flush functions take VCPU mutexes which can sleep, but their context does not allow it. Rewrite the TLB handling code to fix that. It becomes a bit complex. In short, we use a per-VM generation number, which we increase on each TLB flush, before sending a broadcast IPI to everybody. 
The IPIs cause a #VMEXIT of each VCPU, and each VCPU Loop will synchronize the per-VM gen with a per-VCPU copy, and apply the flushes as needed, lazily. The behavior differs between AMD and Intel; in short, on Intel we don't flush the hTLB (EPT cache) if a context switch of a VCPU occurs, so now, we need to maintain a kcpuset to know which VCPU's hTLBs are active on which hCPU. This creates some redundancy on Intel, i.e. there are cases where we flush the hTLB several times unnecessarily; but hTLB flushes are very rare, so there is no real performance regression. The thing is lock-less and non-blocking, so it solves our problem. commit 409f23287dab9681e2948875bcbb1061a2009467 Author: maxv Date: Thu Feb 21 11:58:04 2019 +0000 Clarify the gTLB code a little. commit e6d1e1e3f2ff23d7ff7ff6f66b3bc3fa9a4dc70a Author: maxv Date: Mon Feb 18 19:03:12 2019 +0000 Fix stupid mistake, I didn't reflect correctly the behavior of pmap_sync_pv in the EPT callback, 'optep' can be NULL. commit 042f8e32f3f132fe4cd182325724da3b7ca055dc Author: maxv Date: Mon Feb 18 12:17:45 2019 +0000 Ah, finally found you. Fix scheduling bug in NVMM. When processing guest page faults, we were calling uvm_fault with preemption disabled. The thing is, uvm_fault may block, and if it does, we land in sleepq_block which calls mi_switch; so we get switched away while we explicitly asked not to be. From then on things could go really wrong. Fix that by processing such faults in MI, where we have preemption enabled and are allowed to block. A KASSERT in sleepq_block (or before) would have helped. commit b360319c237c85e717ab9e13d6b74c1ea9074404 Author: maxv Date: Sun Feb 17 20:25:46 2019 +0000 Fix handling of SIB instructions. We were jumping to the SIB node _before_ fetching the displacement, so the node would always think there was no displacement.
This didn't alter the final GPA we would be touching - because it is fetched from the kernel directly and not from the computation -, but it altered the instruction length, and on some guests (like Fedora 64bit), the VCPU would resume execution at the wrong RIP and crash. Now these guests work. commit bdf95c00c81f94ebe2b7638cfbd3378cadcb0062 Author: maxv Date: Sat Feb 16 12:58:13 2019 +0000 Ah no, adapt previous, on AMD RAX is in the VMCB. commit 613287c8d80e0ae608c972974949f8297a15a1c5 Author: maxv Date: Sat Feb 16 12:40:31 2019 +0000 Improve the FPU detection: hide XSAVES because we're not allowing it, and don't set CPUID2_OSXSAVE if the guest didn't first set CR4_OSXSAVE. With these changes in place, I can boot Windows 10 on NVMM. commit 60a2d4b8fb52e73be5f5c69b46c2d323ab33a930 Author: maxv Date: Sat Feb 16 12:05:30 2019 +0000 Handle MSR_MISC_ENABLE on NVMM-Intel (Intel-specific). commit 40c6faeb0023754bcb8de68dfad4e9aaf7e9786a Author: maxv Date: Fri Feb 15 16:42:27 2019 +0000 Remove the PSE check in the 32bit-PAE MMU. Setting CR4.PAE automatically enables PSE regardless of whether CR4.PSE is set or not, so we should just ignore it. With this in place I can boot Windows 8.1 on NVMM. commit 5f77213fd2f4c5b42649f18aa9f0c4c0a99870e6 Author: maxv Date: Fri Feb 15 13:17:05 2019 +0000 Initialize the guest TSC to zero at VCPU creation time, and handle guest writes to MSR_TSC at run time. This is imprecise, because the hardware does not provide a way to preserve the TSC during #VMEXITs, but that's fine enough. commit 83a61044252c6b126e7d3210ea15dc56dae6b2e8 Author: maxv Date: Thu Feb 14 14:30:20 2019 +0000 Harmonize the handling of the CPL between AMD and Intel. AMD has a separate guest CPL field, because on AMD, the SYSCALL/SYSRET instructions do not force SS.DPL to predefined values. On Intel they do, so the CPL on Intel is just the guest's SS.DPL value. 
Even though technically possible on AMD, there is no sane reason for a guest kernel to set a non-three SS.DPL, doing that would mess up several common segmentation practices and wouldn't be compatible with Intel. So, force the Intel behavior on AMD, by always setting SS.DPL<=>CPL. Remove the now unused CPL field from nvmm_x64_state::misc[]. This actually increases performance on AMD: to detect interrupt windows the virtualizer has to modify some fields of misc[], and because CPL was there, we had to flush the SEG set of the VMCB cache. Now there is no flush necessary. While here, remove the CPL check for XSETBV on Intel: contrary to AMD, Intel checks the CPL before the intercept, so if we receive an XSETBV VMEXIT, we are certain that it was executed at CPL=0 in the guest. By the way, my check was wrong in the first place, it was reading SS.RPL instead of SS.DPL. commit 9eb6ddd38e3562dec16e1271f08147ff965482c4 Author: maxv Date: Thu Feb 14 09:37:31 2019 +0000 On AMD, the segments have a simple "present" bit. On Intel however there is an extra "unusable" bit, which has a twisted meaning. We can't just ignore this bit, because when unset, the CPU performs extra checks on the other attributes, which may cause VMENTRY to fail and the guest to be killed. Typically, on Qemu, some guests like Windows XP trigger two consecutive getstate+setstate calls, and while processing them, we end up wrongfully removing the "unusable" bits that were previously set. Fix that by forcing "unusable = !present". Each hypervisor I could check does something different, but this seems to be the least problematic solution for now. While here, the fields of vmx_guest_segs are VMX indexes, so they should be uint64_t (no functional change). commit d5ee280cae82c9155f22541ccbfd1baed65c7243 Author: maxv Date: Wed Feb 13 16:06:28 2019 +0000 Note Intel support. commit 2829a4737b63754a5375b25d9d3f4dd56448f388 Author: maxv Date: Wed Feb 13 16:03:16 2019 +0000 Add Intel-VMX support in NVMM.
This allows us to run hardware-accelerated VMs on Intel CPUs. Overall this implementation is fast and reliable, I am able to run NetBSD VMs with many VCPUs on a quad-core Intel i5. NVMM-Intel applies several optimizations already present in NVMM-AMD, and has a code structure similar to it. No change was needed in the NVMM MI frontend, or in libnvmm. Some differences exist against AMD: - On Intel the ASID space is big, so we don't fall back to a shared ASID when there are more VCPUs executing than available ASIDs in the host, contrary to AMD. There are enough ASIDs for the maximum number of VCPUs supported by NVMM. - On Intel there are two TLBs we need to take care of, one for the host (EPT) and one for the guest (VPID). Changes in EPT paging flush the host TLB, changes to the guest mode flush the guest TLB. - On Intel there is no easy way to set/fetch the VTPR, so we intercept reads/writes to CR8 and maintain a software TPR, that we give to the virtualizer as if it was the effective TPR in the guest. - On Intel, because of SVS, the host CR4 and LSTAR are not static, so we're forced to save them on each VMENTRY. - There is extra Intel weirdness we need to take care of, for example the reserved bits in CR0 and CR4 when accesses trap. While this implementation is functional and can already run many OSes, we likely have a problem on 32bit-PAE guests, because they require special care on Intel CPUs, and currently we don't handle that correctly; such guests may misbehave for now (without altering the host stability). I expect to fix that soon. commit 098836b92dccbd757fe736cc3c8e5bfc67c8d3a1 Author: maxv Date: Wed Feb 13 10:55:13 2019 +0000 Drop support for software interrupts. I had initially added that to cover the three event types available on AMD, but Intel has seven of them, all with weird and twisted meanings, and they require extra parameters. Software interrupts should not be used anyway. 
commit da0695ce7f4a8b8d2e4ba67ee51c96cbbc1971a5 Author: maxv Date: Wed Feb 13 08:38:25 2019 +0000 Add the EPT pmap code, used by Intel-VMX. The idea is that under NVMM, we don't want to implement the hypervisor page tables manually in NVMM directly, because we want pageable guests; that is, we want to allow UVM to unmap guest pages when the host comes under pressure. Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from native, and this has three important consequences: - We can't use the native PTE bits, so each time we want to modify the page tables, we need to know whether we're dealing with a native pmap or an EPT pmap. This is accomplished with callbacks, that handle everything PTE-related. - There is no recursive slot possible, so we can't use pmap_map_ptes(). Rather, we walk down the EPT trees via the direct map, and that's actually a lot simpler (and probably faster too...). - The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded on the host. This has two sub-consequences: at creation time we must zero out all of the top-level PTEs, and at destruction time we force the page out of the pool cache and into the pool, to ensure that a next allocation will invoke pmap_pdp_ctor() to create a native pmap and not recycle some stale EPT entries. To create an EPT pmap, the caller must invoke pmap_ept_transform() on a newly-allocated native pmap. And that's about it, from then on the EPT callbacks will be invoked, and the pmap can be destroyed via the usual pmap_destroy(). The TLB shootdown callback is not initialized however, it is the responsibility of the hypervisor (NVMM) to set it. There are some twisted cases that we need to handle. For example if pmap_is_referenced() is called on a physical page that is entered both by a native pmap and by an EPT pmap, we take the Accessed bits from the two pmaps using different PTE sets in each case, and combine them into a generic PP_ATTRS_U flag (that does not depend on the pmap type). 
Given that the EPT layout is a 4-Level tree with the same address space as native x86_64, we allow ourselves to use a few native macros in EPT, such as pmap_pa2pte(), rather than re-defining them with "ept" in the name. Even though this EPT code is rather complex, it is not too intrusive: just a few callbacks in a few pmap functions, predicted-false to give priority to native. So this comes with no messy #ifdef or performance cost. commit e7b70c162b837efb5f76fd989aa58ffafbc36868 Author: maxv Date: Wed Feb 13 07:04:12 2019 +0000 Micro optimization: the STAR/LSTAR/CSTAR/SFMASK MSRs are static, so rather than saving them on each VMENTRY, save them only once, at VCPU creation time. commit 008a833faf902dcb502f8ae4c6bf368fa3c54654 Author: maxv Date: Wed Feb 13 06:32:45 2019 +0000 Reorder the GPRs to match the CPU encoding, simplifies things on Intel. commit 0196af49ab249be76c52d1ae4e6dd63fd820c186 Author: maxv Date: Tue Feb 12 14:54:59 2019 +0000 Optimize: the hardware does not clear the TLB flush command after a VMENTRY, so clear it ourselves, to avoid uselessly flushing the guest TLB. While here also fix the processing of EFER-induced flushes, they shouldn't be delayed. commit 43dbe2c4ed79a2aaa0b47688b2b078404598b463 Author: maxv Date: Tue Feb 12 14:50:21 2019 +0000 Optimize: fetch only 5 bytes instead of 15, the instruction can have only up to five prefixes. commit b71eee915378a883c91696071be21ab2ec7b1944 Author: maxv Date: Mon Feb 11 11:12:58 2019 +0000 Fix previous, pr_size includes the KASAN redzone. Repurpose pr_reqsize and use it for PR_ZERO, it holds the size requested by the user with no padding or redzone added, and only these bytes should be zeroed. commit 0c66f932a362a9c62828a8dab4ea30e3f0cefc98 Author: maxv Date: Mon Feb 11 07:07:37 2019 +0000 Increase the max guest ram from 4GB to 128GB. 
commit 1f9a1ec47be5f51d6cb858092d628f7d7071a438 Author: maxv Date: Thu Feb 7 10:58:45 2019 +0000 Improvements: - Emulate the instructions by executing them directly on the host CPU. This is easier and probably faster than doing it in software manually. - Decode SUB from Primary, CMP from Group1, TEST from Group3, and add associated tests. - Handle correctly the cases where an instruction that always implicitly reads the register operand is executed with the mem operand as source (eg: "orq (%rbx),%rax"). - Fix the MMU handling of 32bit-PAE. Under PAE CR3 is not page-aligned, so there are extra bits that are valid. With these changes in place I can boot Windows XP on Qemu+NVMM. commit f8d6eb99bd211e328c5abb14b35a6ddde54a587e Author: maxv Date: Tue Feb 5 17:03:10 2019 +0000 Ah, I had warnings disabled, fix the build. commit a1a558cbc8b8801712536119e62333865a1325cb Author: maxv Date: Tue Feb 5 13:56:32 2019 +0000 Sync with reality, and improve. commit d842c727ff09107d2fc91c0a141f06b5b5b0b8cc Author: maxv Date: Tue Feb 5 13:00:03 2019 +0000 Add 12 tests for libnvmm's I/O Assist. commit ce7fa18aa85f563960ecc4a059a763b1e2948852 Author: maxv Date: Mon Feb 4 15:13:54 2019 +0000 Clobber the size when freeing a buffer. This way, if the same buffer gets freed twice, the second size check will fire. commit 528d4eb72bac3766a57060163de516ddd49c3697 Author: maxv Date: Mon Feb 4 15:07:34 2019 +0000 Add more symbols to the unwinder, in case we get a KASAN message inside an exception handler. commit ef67fd308549ea0e74a403dfbf03c5c34647c610 Author: maxv Date: Mon Feb 4 12:11:18 2019 +0000 Improvements: - Guest reads/writes to PAT land in gPAT, so no need to emulate them. - When emulating EFER, don't advance the RIP if a fault occurs, and don't forget to flush the VMCB cache accordingly. commit a2b28b1ec87a5a97de86641088c9aebc55fd64ac Author: maxv Date: Fri Feb 1 11:35:13 2019 +0000 Add the remaining pmap callbacks, will be used by NVMM-VMX. 
commit cdb31846f88aa75d95305ec0042b31178809f208 Author: maxv Date: Fri Feb 1 06:49:58 2019 +0000 Fix two issues: * Uh I put the wrong masks in some GPRs, fuck. * When the opsize of MOVZX is 4, we need to combine the zero-extend from the instruction with the natural zero-extend of long mode. Add two associated tests. commit f2333748114011d1a76ae97c19612c9292025ff9 Author: maxv Date: Fri Feb 1 05:44:29 2019 +0000 Change the format of the pp_attrs field: instead of using PTE bits directly, use abstracted bits that are converted from/to PTE bits when needed (in pmap_sync_pv). This allows us to use the same pp_attrs for pmaps that have PTE bits at different locations. commit 3352642c9eb39d4fc3a555252b3d951bf2a3a392 Author: maxv Date: Fri Feb 1 05:32:08 2019 +0000 Put correct values in the seg fields. AMD doesn't check for that, but Intel does, so they need to be correct. commit 0864c15d2c9ce02aead896248a8e64b118b089b1 Author: maxv Date: Thu Jan 31 20:42:31 2019 +0000 Move some code into a separate function, no functional change. commit 22cbb91988d2bc6209e7c68987d87d05d81a84e8 Author: maxv Date: Thu Jan 31 20:09:05 2019 +0000 Fix kernel info leaks. commit 1d46b41d4f319aea9b30c2650c136817162c3ea4 Author: maxv Date: Sun Jan 27 09:07:23 2019 +0000 satlink removed commit 9defd879c45cadca7d8977f33cb912349efd36e0 Author: maxv Date: Sun Jan 27 08:57:04 2019 +0000 regen commit a61b377f3fc75ec2558411f5e4cd8cba6a65ad57 Author: maxv Date: Sun Jan 27 08:53:28 2019 +0000 Remove the satlink driver. It was disabled everywhere, had no man page and no use either. Spotted by thorpej in PR/21345, ok christos. commit 7d511c12ce179812b31745508a6bc6ef085ad22b Author: maxv Date: Sat Jan 26 15:25:51 2019 +0000 Optimize: keep a per-VCPU buffer for the state, and copy in and out directly on it. The VCPUs are protected by mutexes, so nothing to worry about. This saves two kmem_allocs in {get,set}state. 
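The per-VCPU state buffer described in the last commit above (7d511c12) can be sketched roughly as follows. This is a user-space illustration only, not the NVMM driver code: the structure layout and the names vcpu_state, vcpu, vcpu_setstate and vcpu_getstate are hypothetical, memcpy() stands in for copyin()/copyout(), and the caller is assumed to already hold the VCPU mutex.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced guest state. */
struct vcpu_state {
	uint64_t gprs[16];
	uint64_t rip;
};

struct vcpu {
	struct vcpu_state hw;      /* stands in for the hardware-backed state */
	struct vcpu_state scratch; /* preallocated once, reused on every call */
};

/* Caller is assumed to hold the VCPU mutex, so 'scratch' is not shared. */
static void
vcpu_setstate(struct vcpu *v, const struct vcpu_state *user)
{
	assert(v != NULL && user != NULL);
	/* Copy in directly to the per-VCPU buffer: no allocation needed. */
	memcpy(&v->scratch, user, sizeof(v->scratch));
	v->hw = v->scratch;
}

static void
vcpu_getstate(struct vcpu *v, struct vcpu_state *user)
{
	assert(v != NULL && user != NULL);
	v->scratch = v->hw;
	/* Copy out directly from the per-VCPU buffer. */
	memcpy(user, &v->scratch, sizeof(*user));
}
```

The point of the design is that the buffer lives as long as the VCPU, so the get and set paths each avoid one temporary allocation; this is only safe because access to a VCPU is serialized.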
commit c034519d8c971b80e248b55eb183b1b47b539dcb Author: maxv Date: Sat Jan 26 15:12:20 2019 +0000 Remove nvmm_exit_memory.npc, useless. commit 90b0ce8a8beb73f6c4aec9738ef1f69d2a2195a6 Author: maxv Date: Sat Jan 26 14:44:54 2019 +0000 Ah, fix bug: when the opcode has an immediate, we fill the src with a register storage, but then we overwrite it without zeroing out the highest bits of the resulting immediate (which may contain garbage from the union). commit 4edf9e08c2d5da7cf40982a7d6a4b75c2bfc4dbb Author: maxv Date: Thu Jan 24 13:05:59 2019 +0000 Optimize: change the behavior of the HLT vmexit, make it a "change in vcpu state" which occurs after the instruction executed, rather than an instruction intercept which occurs before. Disable the shadow and the intr window in kernel mode, and advance the RIP, so that the virtualizer doesn't have to do it itself. This saves two syscalls and one VMCB cache flush. Provide npc for other instruction intercepts, in case someone is interested. commit e0ff52afb5d1fcb09eaa6f0fb6ecea4b7d8b81ee Author: maxv Date: Sun Jan 20 16:55:21 2019 +0000 Improvements in NVMM * Handle the FPU differently, limit the states via the given mask rather than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by the virtualizer, to force a reload from memory. * Hide RDTSCP. * Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature. * Take ECX and not RCX on MSR instructions. commit e3f2552b0a135e77807f0c9fd4ba665a382e9835 Author: maxv Date: Thu Jan 17 19:26:03 2019 +0000 Increase VM_PHYSSEG_MAX from 32 to 64. Saw an example on tech-kern@ of a heavily fragmented memory map. commit cf5e1b013dad3e2b4783435d8bb6824ee0fead0a Author: maxv Date: Thu Jan 17 14:24:51 2019 +0000 Simplify pmap_sync_pv: just pass a pa, and build the pte inside. 
commit a1dbdac2fe28914a6560a2d853dc36ad8aa3315f Author: maxv Date: Mon Jan 14 18:54:07 2019 +0000 Add #ifndef i386, the dbregs are 32bit in this case anyway. commit 8e3396fa6a3b46c4db11860baa2e562f4ec7cc29 Author: maxv Date: Mon Jan 14 18:51:15 2019 +0000 Fix bug, should be ip6_protox[]. commit d1d35a7ab21869626e74e9a4bc2a2f500f269fac Author: maxv Date: Sun Jan 13 12:19:09 2019 +0000 Forgot to commit file along with identcpu.c::rev1.86. commit df8a533027710d433c5abaac61622d5c52a222cf Author: maxv Date: Sun Jan 13 12:16:58 2019 +0000 On certain AMD f10h CPUs (like mine), the BIOS does not enable WC+. It means that the guest pages that are WC+ become CD, and this degrades performance of the guests. Explicitly enable WC+. While here clarify the AMD identification code. commit 8cb399318926d8aee03f910c1af3c094621de84e Author: maxv Date: Sun Jan 13 10:43:22 2019 +0000 Handle more corner cases, clean up a little, and add a set of instructions in Group1. commit cf712e1021547cd92b0e6ce990bf11a6abc57259 Author: maxv Date: Sun Jan 13 10:07:50 2019 +0000 Reset DR7 before loading DR0-3, to prevent a fault if the host process has dbregs enabled. commit 49dd173d7aee5574cc64a7c2081a339f1f6bdfaf Author: maxv Date: Sun Jan 13 10:01:07 2019 +0000 Error out if the higher 32 bits of DR6 and DR7 are set. MOV DR would fault otherwise. commit fe65b1082f76612238471131fc9c6c66a0a04c8c Author: maxv Date: Thu Jan 10 06:58:36 2019 +0000 Optimize: * Don't save/restore the host CR2, we don't care because we're not in a #PF context (and preemption switches already handle CR2 safely). * Don't save/restore the host FS and GS, just reset them to zero after VMRUN. Note: DS and ES must be reset _before_ VMRUN, but that doesn't apply to FS and GS. * Handle FSBASE and KGSBASE outside of the VCPU loop, to avoid the cost of saving/restoring them when there's no reason to leave the loop. 
commit c73cb7aefbfc51da7ee63133ea164a8c908bc21a Author: maxv Date: Tue Jan 8 14:43:18 2019 +0000 Optimize: don't keep a full copy of the guest state, rather take only what is needed. This avoids expensive memcpy's. Also flush the V_TPR as part of the CR-state, because there is CR8 in it. commit c31f2b7a7f314e565fdf7bdf6e143bf974fc47af Author: maxv Date: Tue Jan 8 07:34:22 2019 +0000 Handle REPN. FreeBSD has a "repn movs", which is a bit unusual, but doesn't seem illegal as far as I can tell from the AMD SDM. With that, I can boot FreeBSD on Qemu+NVMM. commit 22f74dfea3707feb5529f4127650ed27a9c52aa3 Author: maxv Date: Tue Jan 8 07:29:46 2019 +0000 _IOWR -> _IOW commit f7bb016339701f15208edb198c280d8baef8e050 Author: maxv Date: Mon Jan 7 18:13:34 2019 +0000 Optimize the legpref node: omit BRN (we don't care and it's the same as OVR_CS), inline the loops, sort the checks from most to least likely prefix, and use a compact structure. commit 9edc324e9b99e233a44399f04ff26eb5caf4bbc1 Author: maxv Date: Mon Jan 7 16:30:25 2019 +0000 Optimize: on single memory operand instructions, take the GPA directly from the exit structure provided by the kernel. This saves an MMU translation, and sometimes complex address computation (eg SIB). Drop the GVA field, it is not useful to virtualizers. commit def9083f44681cdf683fb584498e6256ff19a17f Author: maxv Date: Mon Jan 7 14:08:02 2019 +0000 Optimize: cache the guest state entirely in the VMCB-cache, flush it on a state-by-state basis when needed. commit 15992ed9f793d45abc86f105b6f2051936ac850a Author: maxv Date: Mon Jan 7 13:47:33 2019 +0000 Improvements and fixes: * Decode AND/OR/XOR from Group1. * Sign-extend the immediates and displacements in 64bit mode. * Fix the storage of {read,write}_guest_memory, now that we batch certain IO operations we can copy more than 8 bytes, and shit hits the fan. * Remove the CR4_PSE check in the 64bit MMU. This bit is actually ignored in long mode, and some systems (like FreeBSD) don't set it. 
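As an illustration of the "sign-extend the immediates and displacements in 64bit mode" item in the last commit above (15992ed9), here is a minimal sketch of sign-extending an N-byte value to 64 bits. The helper name sign_extend is hypothetical and this is not the libnvmm implementation; it only shows the arithmetic involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sign-extend the low 'size' bytes of 'val' to 64 bits.
 * Hypothetical helper, for illustration only.
 */
static uint64_t
sign_extend(uint64_t val, size_t size)
{
	assert(size == 1 || size == 2 || size == 4 || size == 8);

	if (size == 8)
		return val;

	const unsigned bits = (unsigned)size * 8;
	const uint64_t mask = (UINT64_C(1) << bits) - 1;
	const uint64_t sign = UINT64_C(1) << (bits - 1);

	/* Discard anything above the operand, then propagate the sign bit. */
	val &= mask;
	return (val ^ sign) - sign;
}
```

For example, a one-byte displacement of 0x80 becomes 0xFFFFFFFFFFFFFF80 (that is, -128), while 0x7F stays 0x7F.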
commit f66528d8cdad2beb8654e43fbb5e26dad3d27dc8 Author: maxv Date: Sun Jan 6 18:32:54 2019 +0000 Add more VMCB fields. Also remove debugging code I mistakenly committed in the previous revision. No functional change. commit 8802d1ac34b7d52ef07196b5dcfa96bacf9cf512 Author: maxv Date: Sun Jan 6 16:19:12 2019 +0000 Flush the host TLB too when dealing with a guest pmap. The pmap is not active on the host so the pages aren't cached; but the recursive PTE entries may have been cached by our pmap code. commit bb0e98976e37cb597e3a1878e4e9bf1fa5291427 Author: maxv Date: Sun Jan 6 16:13:51 2019 +0000 Handle the NVMM signature. commit 02a074bc61ed811a75d5377ceb3325cc09f3a9cf Author: maxv Date: Sun Jan 6 16:10:51 2019 +0000 Improvements and fixes in NVMM. Kernel driver: * Don't take an extra (unneeded) reference to the UAO. * Provide npc for HLT. I'm not really happy with it right now, will likely be revisited. * Add the INT_SHADOW, INT_WINDOW_EXIT and NMI_WINDOW_EXIT states. Provide them in the exitstate too. * Don't take the TPR into account when processing INTs. The virtualizer can do that itself (Qemu already does). * Provide a hypervisor signature in CPUID, and hide SVM. * Ignore certain MSRs. One special case is MSR_NB_CFG in which we set NB_CFG_INITAPICCPUIDLO. Allow reads of MSR_TSC. * If the LWP has pending signals or softints, leave, rather than waiting for a rescheduling to happen later. This reduces interrupt processing time in the guest (Qemu sends a signal to the thread, and now we leave right away). This could be improved even more by sending an actual IPI to the CPU, but I'll see later. Libnvmm: * Fix the MMU translation of large pages, we need to add the lower bits too. * Change the IO and Mem structures to take a pointer rather than a static array. This provides more flexibility. * Batch together the str+rep IO transactions. We do one big memory read/write, and then send the IO commands to the hypervisor all at once. This considerably increases performance. 
* Decode MOVZX. With these changes in place, Qemu+NVMM works. I can install NetBSD 8.0 in a VM with multiple VCPUs, connect to the network, etc. commit 0780651548d17af289510226841426956e1eeb37 Author: maxv Date: Sat Jan 5 22:11:07 2019 +0000 Apply amd64/kobj_machdep.c::rev1.7 to the prekern too, to fix the relocation with updated binutils. commit b6c2e14c6bfa17ce0f7cecb911e286412a73a262 Author: maxv Date: Fri Jan 4 10:25:39 2019 +0000 In !64bit mode RIP-relative is null+disp32, handle that correctly. commit 24165c0d2188eedfa4a8c18f53bc2b6904803a4b Author: maxv Date: Thu Jan 3 10:16:43 2019 +0000 Add KASSERT. commit fac5e55a24367828649f075d3fd77fc9f88afd01 Author: maxv Date: Thu Jan 3 08:02:49 2019 +0000 Fix another gross copy-pasto. commit 1cb762c12044df197ad606bb031b92ee27edb4c8 Author: maxv Date: Wed Jan 2 12:18:08 2019 +0000 When there's no DecodeAssist in hardware, decode manually in software. This is needed on certain AMD CPUs (like mine): the segment base of OUTS can be overridden, and it is wrong to just assume DS. We fetch the instruction and look at the prefixes if any to determine the correct segment. commit 6b8576f31ede26be7f6e2e9b69db5cafaffe8129 Author: maxv Date: Sat Dec 29 17:54:54 2018 +0000 Fix the segmentation check, the limit is relative, not absolute. commit 25a48bf3dc08d99ae580927dce041b7305d44cf1 Author: maxv Date: Sat Dec 29 11:35:14 2018 +0000 Note mbuf API changes, and removal of compat_ibcs2. commit 2cc10ee9103631825c897428cfc19daf1edb08d6 Author: maxv Date: Sat Dec 29 11:33:00 2018 +0000 Remove reference to compat_darwin (was retired a long time ago). commit 1eadd3f33ed11428b33fe8be185d85bb96f6897f Author: maxv Date: Sat Dec 29 11:30:11 2018 +0000 Retire compat_ibcs2, as discussed on tech-kern@. FreeBSD did the same recently. commit 105b3e1b48705d2567fcc24c2e0c8ebb0eb30b39 Author: maxv Date: Sat Dec 29 09:48:54 2018 +0000 Disable compat_ibcs2, it is being retired. 
commit 2d9c252e28357d09fb01b586b9c0597b8c9e28d3 Author: maxv Date: Thu Dec 27 16:59:17 2018 +0000 Remove unused arguments. commit 03d928445a256d7b6b82893f174ebe6eac253058 Author: maxv Date: Thu Dec 27 14:24:11 2018 +0000 Style, use __nothing, and remove _M_ (unused, appears to be a typo). No functional change. commit 32206537b802d4e118965a504d01e004c3caf3b0 Author: maxv Date: Thu Dec 27 14:03:54 2018 +0000 Remove M_COPY_PKTHDR, M_MOVE_PKTHDR, M_ALIGN and MH_ALIGN. commit fd2813a50320abdee3490cd451b3464bdc614659 Author: maxv Date: Thu Dec 27 09:57:16 2018 +0000 Fix kernel info leaks. + Possible info leak: [len=80, leaked=10] | #0 0xffffffff80bad7a7 in kleak_copyout | #1 0xffffffff8048e71b in netbsd32___msgctl50 | #2 0xffffffff8022fb5b in netbsd32_syscall | #3 0xffffffff802096dd in handle_syscall commit 3f983a428861a65df592aa4e8422057ec748a745 Author: maxv Date: Thu Dec 27 07:56:43 2018 +0000 Fix apparent race. We're doing a LIST_FOREACH, but unlock filelist_lock in the middle of the loop and drop the reference to fp. We then read fp->...le_next, but it may have been freed by another thread. This is difficult to trigger and observe, probably only KASAN can see problems of this kind. Switch to LIST_FOREACH_SAFE, and re-fetch np after re-locking. May fix PR/53674. commit 6d0107130349e7ccc613d85a3866e5fc6356acd2 Author: maxv Date: Thu Dec 27 07:22:31 2018 +0000 Several improvements and fixes: * Change the Assist API. Rather than passing callbacks in each call, the callbacks are now registered beforehand. Then change the I/O Assist to fetch MMIO data via the Mem callback. This allows a guest to perform an I/O string operation on a memory that is itself an MMIO. * Introduce two new functions internal to libnvmm, read_guest_memory and write_guest_memory. They can handle mapped memory, MMIO memory and cross-page transactions. * Allow nvmm_gva_to_gpa and nvmm_gpa_to_hva to take non-page-aligned addresses. This simplifies a lot of things. 
* Support the MOVS instruction, and add a test for it. This instruction is special, in that it takes two implicit memory operands. In particular, it means that the two buffers can both be in MMIO memory, and we handle this case. * Fix gross copy-pasto in nvmm_hva_unmap. Also fix a few things here and there. commit d22f11146af4af92f13fa5788e650ea0708c7efb Author: maxv Date: Mon Dec 24 16:04:14 2018 +0000 Remove unused macros. commit e79d348f996b48927cefacb19af76844f53c3f90 Author: maxv Date: Mon Dec 24 15:57:15 2018 +0000 Remove unused function. commit d075e37709eea2c59203654ccde18829dc3cc640 Author: maxv Date: Sun Dec 23 13:35:02 2018 +0000 Add initial tests for libnvmm's Mem Assist, with 8 test cases. commit 178391aab82c98e343899616557de5785461ea05 Author: maxv Date: Sun Dec 23 12:18:30 2018 +0000 Add /dev/nvmm. commit 0c81886dfc845e75834bc25f2dff43c90affa709 Author: maxv Date: Sun Dec 23 12:15:01 2018 +0000 Simplify the KASAN API, use only kasan_mark() and explain briefly. The alloc/free naming was too confusing. commit 96f06b7faf6cdcf4e9cce253c47fd6fd7d015cd6 Author: maxv Date: Sun Dec 23 11:42:13 2018 +0000 Remove useless debugging code, the area is completely filled but it's not checked afterwards, only pi_magic is. commit b2a7da79360de69dd615ed840db87d0a0cd21dd1 Author: maxv Date: Sat Dec 22 14:39:46 2018 +0000 Update the man page, we don't want M_COPY_PKTHDR, M_MOVE_PKTHDR, MH_ALIGN and M_ALIGN. commit 599ca9ed39317b5e638c7ae1120e59d9d9ad2928 Author: maxv Date: Sat Dec 22 14:28:56 2018 +0000 Replace M_ALIGN and MH_ALIGN by m_align. commit f57b8dc3fd8ba022cc1b214bec3a45576e17bbac Author: maxv Date: Sat Dec 22 14:07:53 2018 +0000 Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the former is a macro to the latter. commit 64d9d1ae10a2a129bc4c8a9220d578d4c5f1b11d Author: maxv Date: Sat Dec 22 13:55:56 2018 +0000 Move m_align() back into the kernel, and switch M_ALIGN and MH_ALIGN to it. 
Forcing a distinction between M_ALIGN and MH_ALIGN is too bug-friendly and serves no particular purpose. commit fa3b4ce5a27e890b96f01a59de8b3828fe7a4d0f Author: maxv Date: Sat Dec 22 13:11:37 2018 +0000 Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the former is a macro to the latter. commit 4fa3e0601315b2fdaa911ef5e2bc933c5a900991 Author: maxv Date: Sat Dec 22 10:00:39 2018 +0000 In the end, disable the supposed architectural SpectreV2 mitigation on AMD f12h and f16h. The SDMs of these CPUs haven't been updated since, and we shouldn't assume the position of the bits, we just can't know where they are. Initially I included f12h and f16h because f10h is actually documented to have a bit to disable the indirect branch predictor, and there were patches available in SuSE and CentOS that were treating f10h/f12h/f16h all the same. Knowing that SuSE has ties with AMD, it seemed safe to assume that these patches were correct and that f12h and f16h could indeed be treated the same way as f10h. But these patches have now disappeared, and the main Linux branch doesn't have them, without clear explanation. Therefore, I prefer to roll back. commit 41fc05d4537e60ae81c0aca9a490f20962c14cd6 Author: maxv Date: Sat Dec 22 09:20:30 2018 +0000 Add AMD_SSB_NO, so that we explicitly say that an AMD CPU is not affected when it's not affected. commit 0efcb35274a70833c99cdff7a90b5656bde65382 Author: maxv Date: Sat Dec 22 08:59:44 2018 +0000 If the CPU is not vulnerable to SpectreV4, say it in the sysctl by default. Apply some minor style while here. commit 149df1c2bdd1a9d67a3156a6122288c295b4cb38 Author: maxv Date: Sat Dec 22 08:35:04 2018 +0000 Style, once again. commit 7ef1c58f7f6a294ac5a4c521dff4e7cbaa07892e Author: maxv Date: Wed Dec 19 14:07:51 2018 +0000 Note removal of COMPAT_SVR4.
commit 67cbb0910002d42c6a9cb8cf0274261c4971de5c Author: maxv Date: Wed Dec 19 13:57:44 2018 +0000 Remove compat_svr4 and compat_svr4_32, as discussed on tech-kern@ recently, but also as discussed several times in the past. commit be5787d0db9b28fe59a6e456df35f7953d2b5c62 Author: maxv Date: Mon Dec 17 07:10:07 2018 +0000 Remove dead checks, they were already pointless when I fixed them a few years ago, and now they are wrong because the PTE space is randomized. commit 6abc0b70f6234dbc412f8cdbef2235b332a22473 Author: maxv Date: Mon Dec 17 06:58:54 2018 +0000 Add two pmap fields, will be used by NVMM-VMX. Also apply a few cosmetic changes. commit 3ffd9dd96dfbe50aca11ee44f1ca86d0ab5b10d1 Author: maxv Date: Sun Dec 16 21:03:35 2018 +0000 Add support for detecting use-after-frees in KASAN. We poison each freed buffer, any subsequent read or write will be detected as illegal. * Add POOL_CHECK_MAGIC, which is disabled under KASAN, because the same detection is done in a better way. * Register the size+redzone in the pool structure, to reduce the overhead. * Fix the CTOR/DTOR check in KLEAK, the fields are never NULL. commit 3448a39defab6ed216bc32cf49dcb434e96875b5 Author: maxv Date: Sun Dec 16 10:42:32 2018 +0000 Explicitly disable ALTINST on VIA, in case it isn't disabled by default already (the 'VIA cpu backdoor'). commit a7d1548b50e74991bfacdee2a9f8bf4d6c154100 Author: maxv Date: Sat Dec 15 13:39:43 2018 +0000 Invert the mapping logic. Until now, the "owner" of the memory was the guest, and by calling nvmm_gpa_map(), the virtualizer was creating a view towards the guest memory. Qemu expects the contrary: it wants the owner to be the virtualizer, and nvmm_gpa_map should just create a view from the guest towards the virtualizer's address space. Under this scheme, it is legal to have two GPAs that point to the same HVA. Introduce nvmm_hva_map() and nvmm_hva_unmap(), that map/unmap the HVA into a dedicated UOBJ.
Change nvmm_gpa_map() and nvmm_gpa_unmap() to just perform an enter into the desired UOBJ. With this change in place, all the mapping-related problems in Qemu+NVMM are fixed. commit e88dc7e8bf7b918df1c85a6c72dacd5e18dc7468 Author: maxv Date: Sat Dec 15 13:09:02 2018 +0000 Two changes: - Fix the I/O Assist, for INS* it is RDI and not RSI, and the register gets updated regardless of the REP prefix. - Fill in the Mem Assist. We decode and emulate certain instructions, and pass a mem descriptor to the callback to handle the transaction. The disassembler could use some polishing, and there are still a few instructions missing; but basically it works. commit d7cd818ef4797bf4e0d098740fa249fee8376531 Author: maxv Date: Sat Dec 15 12:08:18 2018 +0000 Add KASAN and KLEAK. commit 65f495e8aa2116bef51801809f26a7ad61d3472b Author: maxv Date: Thu Dec 13 16:28:10 2018 +0000 Don't forget to advance the RIP after an XSETBV emulation. commit e3002faf2d915ce3ca295b1fb2b7ad042d25fa55 Author: maxv Date: Wed Dec 12 10:42:34 2018 +0000 Change the map/unmap functions, again. commit e785e4bc51e75f6bd156006b6aee07ecf3769f99 Author: maxv Date: Wed Dec 12 09:09:08 2018 +0000 Change the "FILES" section, in the end I don't want to commit toyvirt and smallkern, there is little interest installing them by default, rather they can be downloaded from www. It's better this way. While here add NVMM(4) in "SEE ALSO". commit 01e92e3efa7d567c1b4749a8b406fdee39340c23 Author: maxv Date: Wed Dec 12 08:28:19 2018 +0000 Say that on x86 you also have to modload tprof_x86. commit 549f1c0105da4908d956444b1ee5562e5fb7a127 Author: maxv Date: Wed Dec 12 08:24:50 2018 +0000 regen commit cb8f3d4e7846fc3a167b8a44ec157939df504ab5 Author: maxv Date: Wed Dec 12 08:20:53 2018 +0000 Add a NVMM(4) man page. 
commit c1b7dd634a6cd2383b0dd9afef3fa7077a2e6b20 Author: maxv Date: Wed Dec 12 08:02:17 2018 +0000 note kleak commit 9f395fe5d505418b85e4580383439c944d8bfa72 Author: maxv Date: Wed Dec 12 07:07:30 2018 +0000 Drop LMC-related entry from TODO.smpnet, and note removal of LMC. commit 527d9ef87a78a4f1b815b504abf94843800a49b9 Author: maxv Date: Wed Dec 12 07:04:05 2018 +0000 Retire the LMC driver, and its associated lmcconfig tool. LMC has been mentioned repeatedly as a non-MP-safe driver that is hard to maintain, and no one is taking care of it. LMC was removed from OpenBSD three years ago, and from FreeBSD a few months ago. commit b68616d271ed34833879bcccdcd9b394fde2bf58 Author: maxv Date: Wed Dec 12 06:29:36 2018 +0000 Remove references to "lmc" in the kernel configurations. commit e98f1007d331224aeedd9bed332d7fbb53eefde4 Author: maxv Date: Mon Dec 10 15:08:23 2018 +0000 Remove unused mbuf.h includes. commit 994868d3224d59042f625091249b538a0e2d09d2 Author: maxv Date: Mon Dec 10 14:46:24 2018 +0000 Remove unused mbuf.h includes. commit 2dc2e043dfa90b7784058a48b14323c641044b8f Author: maxv Date: Mon Dec 10 07:24:49 2018 +0000 Improve error handling, doesn't matter a lot, but still. commit 6e2f50d95103a656da1aa1e00177302ce5f1ff3f Author: maxv Date: Fri Dec 7 15:47:11 2018 +0000 Add an option to have a static kernel memory layout. This option is disabled by default - that is to say, KASLR remains enabled by default. commit c99f7266ed2cfcb0441cdf10dfb9bd3d4f8a2d19 Author: maxv Date: Thu Dec 6 17:44:28 2018 +0000 Simplify, use _pi instead of modulos, no real functional change. commit a532d8049825feb0d261efad1cac619278e292de Author: maxv Date: Thu Dec 6 17:26:18 2018 +0000 Fix inconsistency, these are indexes and not types, no real functional change. commit 996810818fe17b1e1c3e00b785143ff83ba5aa70 Author: maxv Date: Sun Dec 2 21:00:13 2018 +0000 Introduce KLEAK, a new feature that can detect kernel information leaks. 
It works by tainting memory sources with marker values, letting the data travel through the kernel, and scanning the kernel<->user frontier for these marker values. Combined with compiler instrumentation and rotation of the markers, it is able to yield relevant results with little effort. We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is supported on amd64 only for now, but it is not complicated to add more architectures (just a matter of having the address of .text, and a stack unwinder). A userland tool is provided that executes a command in rounds and monitors the leaks generated all the while. KLEAK has already directly detected 12 kernel info leaks, and prompted changes that in total fixed 25+ leaks. Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer FKIE). commit e4d0eeb86e0f7795b2374f41ff95dd758ac362ba Author: maxv Date: Sat Dec 1 14:05:33 2018 +0000 Fix kernel info leak, 4 bytes of padding in struct _ksiginfo. Maybe we should just set _pad to zero on LP64? + Possible info leak: [len=40, leaked=4] | #0 0xffffffff80baf397 in kleak_copyout | #1 0xffffffff80bda817 in sigtimedwait1 | #2 0xffffffff80bdab95 in sys_____sigtimedwait50 | #3 0xffffffff80259c42 in syscall commit 6882fb8eca9b71cad2c720fb94d273333dee0eca Author: maxv Date: Thu Nov 29 19:55:20 2018 +0000 Rewrite the gpa map/unmap functions. Dig holes in the mapped areas when there is an overlap. Close to what Qemu expects. commit d98d77b991a4660d6d713dd284a8974ad1b9e29b Author: maxv Date: Thu Nov 29 17:40:12 2018 +0000 Improve my kern_time.c::rev1.192, systematically clear the buffers we get from 'ptimer_pool' to prevent more leaks. commit ffa8c8354c734220f29d4f799e26accf0ece42e7 Author: maxv Date: Thu Nov 29 12:37:22 2018 +0000 Fix info leak. There is one branch where 'status' is not initialized at all.
+ Possible info leak: [len=4, leaked=4] | #0 0xffffffff80baf397 in kleak_copyout | #1 0xffffffff80b56d0c in sys_wait6 | #2 0xffffffff80259c42 in syscall commit ee382f5de49f3a0b97a03123c92fa70f253b9b24 Author: maxv Date: Thu Nov 29 11:45:52 2018 +0000 Fix stack info leak. + Possible info leak: [len=136, leaked=92] | #0 0xffffffff80baf397 in kleak_copyout | #1 0xffffffff80bd4155 in ptrace_copyout_siginfo | #2 0xffffffff80bd5348 in do_ptrace | #3 0xffffffff80bd40fe in sys_ptrace | #4 0xffffffff80259c42 in syscall commit 9c13015dc36b306a6e8f2bac533ba0bab7e5052b Author: maxv Date: Thu Nov 29 10:27:36 2018 +0000 Fix kernel info leak, 4 bytes of padding at the end of struct sigaction. + Possible info leak: [len=32, leaked=4] | #0 0xffffffff80baf327 in kleak_copyout | #1 0xffffffff80bd9ca8 in sys___sigaction_sigtramp | #2 0xffffffff80259c42 in syscall commit 5834073f1e87d8333d67793d05f8e84d55a57f18 Author: maxv Date: Wed Nov 28 15:10:40 2018 +0000 Fix kernel info leak. + Possible info leak: [len=32, leaked=16] | #0 0xffffffff80baf3a7 in kleak_copyout | #1 0xffffffff80b940f8 in sys___timer_settime50 | #2 0xffffffff80259c42 in syscall commit a0e329c4e66f7a7db54073049ca3867bdcc39f77 Author: maxv Date: Tue Nov 27 14:09:53 2018 +0000 Fix widespread leak in the sendsig_siginfo() functions. sigframe_siginfo has padding, so zero it out properly. While here I'm also zeroing out some other things in several ports, for safety. Same problem in netbsd32, so fix that too. I can't compile-test on each architecture, but there should be no breakage (tm). Overall this fixes at least 14 info leaks. Prompted by the discovery by KLEAK of a leak in amd64's sendsig_siginfo. commit 2affef406444a1316ae1662d2702c910145a788a Author: maxv Date: Sun Nov 25 14:11:24 2018 +0000 Appease the check: allow NVMM_MAX_RAM bytes of memory, and not just NVMM_MAX_RAM-1. commit aa9fffd9559d03f72c332e6e70a86dbe6c583e2a Author: maxv Date: Sun Nov 25 14:09:57 2018 +0000 Add RFLAGS in the exitstate. 
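The "Possible info leak: [len=..., leaked=...]" reports in the commits above come from the taint-and-scan scheme described in the KLEAK commit (996810818fe1): memory sources are filled with a marker, and the kernel<->user frontier is scanned for it. The following is only a minimal userland sketch of that idea, assuming a single fixed marker byte; it does not model marker rotation or compiler instrumentation. The names TOY_MARKER, toy_taint, toy_scan and toy_fill_record are hypothetical and are not part of KLEAK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MARKER	0x41	/* hypothetical marker value, not KLEAK's */
#define TOY_REC_LEN	32	/* size of the record handed to "userland" */
#define TOY_REC_USED	28	/* bytes the producer actually writes */

/* Taint a memory source so that bytes nobody writes keep the marker. */
static void
toy_taint(void *buf, size_t len)
{
	memset(buf, TOY_MARKER, len);
}

/* Count marker bytes in a buffer that is about to cross the frontier. */
static size_t
toy_scan(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i, leaked = 0;

	assert(buf != NULL);
	for (i = 0; i < len; i++) {
		if (p[i] == TOY_MARKER)
			leaked++;
	}
	return leaked;
}

/*
 * Stand-in for a code path that initializes only part of a record;
 * the last 4 bytes are left untouched, as trailing padding would be.
 */
static void
toy_fill_record(unsigned char rec[TOY_REC_LEN])
{
	memset(rec, 0, TOY_REC_USED);
}
```

Tainting a 32-byte record, filling it with toy_fill_record() and scanning it reports 4 marker bytes, which is the shape of a "[len=32, leaked=4]" report. A data byte that happens to equal the marker would also be counted; the original message mentions rotation of the markers, which presumably helps separate such coincidences from real leaks.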
commit 7eff9bc169e7a1ce6ce9d22100061174f612fc3f Author: maxv Date: Sat Nov 24 17:54:18 2018 +0000 Mark as done the two entries I added just minutes ago, they are now fixed. commit 242332e518f31084d9c785e7080b34bd21d66e77 Author: maxv Date: Sat Nov 24 17:52:39 2018 +0000 Fix kernel pointer leaks in sysctl_dobuf. While here constify argument. Also memset the buffer, to prevent leaks (even if there doesn't seem to be currently). commit 7d90f2d49665c3c344f22e4d8055fb72fbccc911 Author: maxv Date: Sat Nov 24 17:40:37 2018 +0000 Fix kernel pointer leaks in sysctl_doevcnt. While here also fix info leak; there is a big padding so use zalloc. commit d58d57719d119780306c816c56acfe8caf03d791 Author: maxv Date: Sat Nov 24 17:31:10 2018 +0000 Mark four issues as fixed, add two more. Netstat was actually sysctl_unpcblist, so remove it as duplicate. commit 611282111ea3de9b1a8b1e253a97a27bcbf10936 Author: maxv Date: Sat Nov 24 17:26:27 2018 +0000 Fix kernel pointer leaks in the kern.lwp sysctl. commit 9aba6728719ce6929f21c9c82f326ffb2c03c3f0 Author: maxv Date: Sat Nov 24 17:16:44 2018 +0000 Fix kernel pointer leaks in sysctl_unpcblist. commit f297c47f6629b40bee3b269bda67372b2c41d15e Author: maxv Date: Sat Nov 24 17:05:54 2018 +0000 KNF, no functional change. commit ed099ca6e32ce4843913f2f4e23ffecc1057a360 Author: maxv Date: Sat Nov 24 16:58:40 2018 +0000 Fix kernel pointer leaks in sysctl_inpcblist. commit 373383ddeebc0814f4c9991c83dc241e02e086c2 Author: maxv Date: Sat Nov 24 16:41:48 2018 +0000 Fix kernel pointer leaks in the kern.file sysctl, same as kern.file2. commit e18e92f8935783f17d88d629b521641368e5042f Author: maxv Date: Sat Nov 24 16:25:20 2018 +0000 Rename fill_file -> fill_file2, since that's the KERN_FILE2 sysctl. commit 84b63de6c5e4fe3c92ec37acc063b7df4c90d172 Author: maxv Date: Sat Nov 24 16:18:36 2018 +0000 Fix kernel info leak, we do a blunt copy of struct proc, but it has padding. So zero out the structure on each allocation. 
And copy field by field while here, because many fields should be hidden by COND_SET_VALUE. commit 5aea1ef46fc591f95cb4a13e4df7ac90c503813c Author: maxv Date: Thu Nov 22 07:37:12 2018 +0000 Add missing pmap_update after pmap_kenter_pa, noted by Kamil. commit 254997837180c25590347f145ce4478b6001b051 Author: maxv Date: Tue Nov 20 06:43:26 2018 +0000 Note support for Intel Silvermont/Airmont. commit ed114ff4dc2467302e81153e3b3a0fd3090ee411 Author: maxv Date: Mon Nov 19 21:45:37 2018 +0000 Fix error handling of realloc, and use memmove because the areas overlap; noted by agc@. These _nvmm_area_add/delete functions don't make a lot of sense right now and will likely be rewritten to match the behavior expected by Qemu; but still fix for the time being. Also fix a collision check while here. commit 365912a13862f67d74d6b9820d965323f2bc31c6 Author: maxv Date: Mon Nov 19 20:44:51 2018 +0000 Introduce pl_pi, will be used soon. commit 04abb5191938d3a0e12305d33c903a1018744de6 Author: maxv Date: Mon Nov 19 20:28:01 2018 +0000 Rename 'mask' -> 'frame', we will use the real 'mask' soon. commit 9d4e3adbeef2598e7eeb4e0141d9ad587e3266ce Author: maxv Date: Mon Nov 19 17:35:12 2018 +0000 Rename one constant, for clarity. commit e0463a397f0f81bd1549fdb2002e8de92e90f0df Author: maxv Date: Sun Nov 18 07:42:24 2018 +0000 Ah, should be UVM_ADV_RANDOM. commit 327344f16ac2ab65ec73fff66073bfd69412bbe8 Author: maxv Date: Sat Nov 17 16:11:33 2018 +0000 Don't forget to set 'prot' when the guest has paging disabled. commit ac6d5e2c0a0c9d5d175e04e665c59a905134e7c3 Author: maxv Date: Thu Nov 15 14:19:23 2018 +0000 Woah man, fix enormous leak. Possible info leak: [len=1056, leaked=931] #0 0xffffffff80bad351 in kleak_copyout #1 0xffffffff80b2cf64 in uvm_swap_stats.part.1 #2 0xffffffff80b2d38d in uvm_swap_stats #3 0xffffffff80b2d43c in sys_swapctl #4 0xffffffff80259b82 in syscall commit c569d70e2700a75f24d32a2d4613544cfef50ebc Author: maxv Date: Thu Nov 15 11:18:33 2018 +0000 Reduce indentation level. 
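Commit ed114ff4dc24 above fixes two classic problems in the _nvmm_area_add/delete functions: realloc error handling, and moving overlapping ranges with memmove. The sketch below illustrates both points on a plain dynamic array. It is an assumption-laden illustration, not the libnvmm code: struct toy_area, struct toy_areas, toy_area_add and toy_area_delete are hypothetical names, and the real area layout is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical area descriptor; not libnvmm's actual layout. */
struct toy_area {
	uint64_t gpa;
	size_t size;
};

struct toy_areas {
	struct toy_area *v;
	size_t n;
};

/* Insert 'a' at index 'idx'. Returns 0 on success, -1 if realloc fails. */
static int
toy_area_add(struct toy_areas *as, size_t idx, struct toy_area a)
{
	struct toy_area *nv;

	assert(idx <= as->n);

	/* Keep the old pointer until realloc succeeds, so a failure loses nothing. */
	nv = realloc(as->v, (as->n + 1) * sizeof(*nv));
	if (nv == NULL)
		return -1;
	as->v = nv;

	/* Source and destination overlap, so memmove is required, not memcpy. */
	memmove(&as->v[idx + 1], &as->v[idx], (as->n - idx) * sizeof(*nv));
	as->v[idx] = a;
	as->n++;
	return 0;
}

/* Remove the entry at index 'idx', shifting the tail down by one slot. */
static void
toy_area_delete(struct toy_areas *as, size_t idx)
{
	assert(idx < as->n);
	memmove(&as->v[idx], &as->v[idx + 1],
	    (as->n - idx - 1) * sizeof(*as->v));
	as->n--;
}
```

Assigning the result of realloc directly to as->v would leak the old array on failure; going through the temporary nv avoids that.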
commit 4898a5e65ada7b6387c0d5c7d9adba3c014a7325 Author: maxv Date: Thu Nov 15 10:56:29 2018 +0000 Remove the 'copy' argument from m_devget(), unused. While here rename off0->off. commit c33882d10b4f23e327a42e2e441f4519d5d8cf80 Author: maxv Date: Thu Nov 15 10:37:26 2018 +0000 Add KASSERTs. commit 37026718d2c8f806127a2919a781c6c8d5b03a27 Author: maxv Date: Thu Nov 15 10:23:55 2018 +0000 Remove the 't' argument from m_tag_find(). commit c4123d3dc7d878dba942c6f353ddaa400a956269 Author: maxv Date: Thu Nov 15 10:06:06 2018 +0000 Simplify the mtag API: - Remove m_tag_init(), m_tag_first(), m_tag_next() and m_tag_delete_nonpersistent(). - Remove the 't' argument from m_tag_delete_chain(). commit b81bcb9444486b705fcdb6db22344c1d65f98bf0 Author: maxv Date: Thu Nov 15 09:38:57 2018 +0000 Merge uipc_mbuf2.c into uipc_mbuf.c. Reorder the latter a little to gather similar functions. No functional change. commit 86a74f4fcdb0da86ef214b0af2bb28e833dba5e7 Author: maxv Date: Wed Nov 14 19:14:40 2018 +0000 Take RAX from the VMCB and not the VCPU state, the latter is not synchronized and contains old values. commit e06f19fd7f32d1d07eecb38bc3e82926204afd0c Author: maxv Date: Tue Nov 13 07:45:43 2018 +0000 Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr. [ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2] [ 944.617335] #0 0xffffffff80b7c44a in kleak_note [ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout [ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if [ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist [ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable [ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch [ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl [ 944.667354] #7 0xffffffff8025ab3c in sy_call [ 944.667354] #8 0xffffffff8025ad6e in sy_invoke [ 944.677365] #9 0xffffffff8025adf4 in syscall commit adf0c9ffac675b1dd72d1e1b242536a276e2c3ca Author: maxv Date: Tue Nov 13 07:16:33 2018 +0000 Fix kernel info leak. 
There are 2x4 bytes of padding in struct itimerval. [ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8] [ 738.481840] #0 0xffffffff80b7c42a in kleak_note [ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout [ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 [ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 [ 738.521781] #4 0xffffffff8025ab3c in sy_call [ 738.521781] #5 0xffffffff8025ad6e in sy_invoke [ 738.531808] #6 0xffffffff8025adf4 in syscall commit 690a1b0ab3f7250630ba7801d1c09408abf3b6d7 Author: maxv Date: Tue Nov 13 06:58:14 2018 +0000 Fix kernel info leak. There are 4 bytes of padding in struct kevent. [ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4] [ 287.537676] #0 0xffffffff80b7c41a in kleak_note [ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout [ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 [ 287.557677] #3 0xffffffff80b1dc6a in kevent1 [ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 [ 287.567683] #5 0xffffffff8025ab3c in sy_call [ 287.577688] #6 0xffffffff8025ad6e in sy_invoke [ 287.587693] #7 0xffffffff8025adf4 in syscall commit 214caece6f4646a1ee7e5ae9ffd1c9dafb364212 Author: maxv Date: Mon Nov 12 18:10:36 2018 +0000 Add a comment explaining an important rule. Just to better highlight that this rule is actually not respected. commit 451961169b68f5f52e5d9bf71fd6ec5c5ad1ba2d Author: maxv Date: Mon Nov 12 06:55:03 2018 +0000 Fix buffer overflow, which can lead to severe information leak. Detected by kASan. commit cdb77562ededd2cc01e458ca9fed784141ca5a8d Author: maxv Date: Mon Nov 12 06:53:43 2018 +0000 Fix inverted logic, which leads to buffer overflow. Detected by kASan. commit b161bce442f5e07ea7d438594c3abde8d1e2cd7c Author: maxv Date: Sun Nov 11 12:03:07 2018 +0000 Fix the libnvmm sets, do the same as libx86_64. commit 5cbe89012cdf8615ed79064712c547deb3b3fdfd Author: maxv Date: Sun Nov 11 11:17:49 2018 +0000 Fix stack info leak. There are 4 bytes of padding in struct timeval. 
Looks like there are other leaks related to timeval in this file. [ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4] [ 133.414352] #0 0xffffffff80224d0a in kleak_note [ 133.424360] #1 0xffffffff80224d8a in kleak_copyout [ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 [ 133.434361] #3 0xffffffff8025a89c in sy_call [ 133.444351] #4 0xffffffff8025aace in sy_invoke [ 133.454365] #5 0xffffffff8025ab54 in syscall commit d14c0d610e2fd0015a8b6fdd80dfa30ee82e2c54 Author: maxv Date: Sun Nov 11 10:58:40 2018 +0000 Fix stack info leak. There is a big padding in struct sigframe_siginfo. [ 224.006287] kleak: Possible leak in copyout: [len=920, leaked=92] [ 224.016977] #0 0xffffffff80224d0a in kleak_note [ 224.026268] #1 0xffffffff80224d8a in kleak_copyout [ 224.026268] #2 0xffffffff802224b5 in sendsig_siginfo [ 224.036261] #3 0xffffffff80b51564 in sendsig [ 224.046475] #4 0xffffffff80b51282 in postsig [ 224.046475] #5 0xffffffff80b2fc5d in lwp_userret [ 224.056273] #6 0xffffffff8025a951 in mi_userret [ 224.066277] #7 0xffffffff8025ab89 in syscall commit c07426e7214a8ab9422bcec3f4806aa9a278937f Author: maxv Date: Sun Nov 11 10:55:58 2018 +0000 Fix stack info leak. There are 2x4 bytes of padding in struct ps_strings. [ 223.896199] kleak: Possible leak in copyout: [len=32, leaked=8] [ 223.906430] #0 0xffffffff80224d0a in kleak_note [ 223.906430] #1 0xffffffff80224d8a in kleak_copyout [ 223.918363] #2 0xffffffff80b1e26c in copyoutpsstrs [ 223.926560] #3 0xffffffff80b1e331 in copyoutargs [ 223.936216] #4 0xffffffff80b21768 in execve_runproc [ 223.946225] #5 0xffffffff80b21cc9 in execve1 [ 223.946225] #6 0xffffffff8025a89c in sy_call [ 223.956225] #7 0xffffffff8025aace in sy_invoke [ 223.966232] #8 0xffffffff8025ab54 in syscall commit 174e9e1fc3f5999116b06a2b153c439473454ac2 Author: maxv Date: Sat Nov 10 11:08:54 2018 +0000 Merge the VIA detection code into cpu_probe_c3. 
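Several of the leaks above (struct timeval, struct ps_strings, struct sigframe_siginfo) have the same cause: a structure with padding is copied out without the padding bytes ever being written. The remedy named in the log is to zero the object first, then copy field by field. The sketch below shows that pattern, assuming a made-up struct toy_timeval and a hypothetical toy_export helper. It builds the outgoing image in a byte buffer so that the demonstration stays well-defined in portable C; kernel code would typically memset the structure itself before filling it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; on LP64 ABIs tv_usec is followed by tail padding. */
struct toy_timeval {
	long tv_sec;
	int tv_usec;
};

/*
 * Build the image that would be handed to copyout. 'out' must hold
 * sizeof(struct toy_timeval) bytes. Every byte is zeroed first, then
 * each field is copied to its offset, so padding never carries old data.
 */
static void
toy_export(unsigned char *out, long sec, int usec)
{
	assert(out != NULL);
	memset(out, 0, sizeof(struct toy_timeval));
	memcpy(out + offsetof(struct toy_timeval, tv_sec), &sec, sizeof(sec));
	memcpy(out + offsetof(struct toy_timeval, tv_usec), &usec, sizeof(usec));
}
```

Two buffers that held different stale contents end up byte-for-byte identical after toy_export() with the same arguments, which is the property that prevents a padding leak.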
commit 38a06a522d6f5adfb9ba8037988859f476b512a7 Author: maxv Date: Sat Nov 10 10:57:06 2018 +0000 Add copyright and RCSID, from wiz@. commit ea8168702673feedb3021e0472bf22b2f4656a3b Author: maxv Date: Sat Nov 10 10:52:51 2018 +0000 Declare the MSR_VIA_ACE values as macros, and use a consistent naming, similar to the rest of the file. I'm wondering if I'm not fixing a huge bug here. The ECX8 value we were using was wrong: ECX8 is bit 1, not bit 0. Bit 0 is ALTINST, an alternate ISA, which is now known to be backdoored. So it looks like we were explicitly enabling the backdoor. Not tested, because I don't have a VIA cpu. commit 33c33657c05873631724acbc68f17cb79eb93b2b Author: maxv Date: Sat Nov 10 09:42:42 2018 +0000 Remove unused cpu_msr.h includes. commit e00c4d7eb507db478a9a082f0db453d10159e0b9 Author: maxv Date: Sat Nov 10 09:28:56 2018 +0000 Add libnvmm, NetBSD's new virtualization API. It provides a way for VMM software to effortlessly create and manage virtual machines via NVMM. It is mostly complete, only nvmm_assist_mem needs to be filled -- I have a draft for that, but it needs some more care. This Mem Assist should not be needed when emulating a system in x2apic mode, so theoretically the current form of libnvmm is sufficient to emulate a whole class of systems. Generally speaking, there are so many modes in x86 that it is difficult to handle each corner case without introducing a ton of checks that just slow down the common-case execution. Currently we check a limited number of things; we may add more checks in the future if they turn out to be needed, but that's rather low priority. Libnvmm is compiled and installed only on amd64. A man page (reviewed by wiz@) is provided. commit 1e688a3c161a038c2ca39d7f14e1a7c90082442c Author: maxv Date: Thu Nov 8 10:55:41 2018 +0000 Simplify the ifdefs, and error out if XEN and USER_LDT are both defined. 
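The MSR_VIA_ACE fix in commit ea8168702673 above comes down to two bit positions: ECX8 is bit 1, and bit 0 is ALTINST. A small sketch of the intended masking follows, using only the bit positions stated in that message. The TOY_ macro names and toy_via_ace_fixup are illustrative assumptions, not the identifiers used in the tree, and the code does not touch any real MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the commit message; names are illustrative. */
#define TOY_VIA_ACE_ALTINST	(UINT64_C(1) << 0)	/* alternate ISA */
#define TOY_VIA_ACE_ECX8	(UINT64_C(1) << 1)

/* Return the MSR value with ECX8 set and ALTINST cleared. */
static uint64_t
toy_via_ace_fixup(uint64_t msr)
{
	msr |= TOY_VIA_ACE_ECX8;
	msr &= ~TOY_VIA_ACE_ALTINST;
	assert((msr & TOY_VIA_ACE_ALTINST) == 0);
	return msr;
}
```

With the old, wrong constant (bit 0 used as ECX8), the same "set ECX8" step would have set ALTINST instead, which is the concern raised in the message.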
commit 107b9be98012a0311aa08eeb4c74e6b8bbd3a1d6 Author: maxv Date: Thu Nov 8 08:32:57 2018 +0000 Note NVMM and aarch64+kasan. commit cf7ee228f603ed519aa37c2ab736acb01b120531 Author: maxv Date: Thu Nov 8 08:28:07 2018 +0000 Track the stack with kASan on aarch64. Same principle as on amd64. Illegal accesses occurring there are now detected. Originally written by me, but reworked by ryo@, thanks. commit fca82350004555f133f27276ef973b47e78b11d7 Author: maxv Date: Wed Nov 7 07:49:10 2018 +0000 regen for nvmm commit 5c184ba9b5764e690485213d54f1c92aade4a23a Author: maxv Date: Wed Nov 7 07:43:07 2018 +0000 Add NVMM - for NetBSD Virtual Machine Monitor -, a kernel driver that provides support for hardware-accelerated virtualization on NetBSD. It is made of an MI frontend, to which MD backends can be plugged. One MD backend is implemented, x86-SVM, for x86 AMD CPUs. We install /usr/include/dev/nvmm/nvmm.h /usr/include/dev/nvmm/nvmm_ioctl.h /usr/include/dev/nvmm/{arch}/nvmm_{arch}.h And the kernel module. For now, the only architecture where we do that is amd64 (arch=x86). NVMM is not enabled by default in amd64-GENERIC, but is instead easily modloadable. Sent to tech-kern@ a month ago. Validated with kASan, and optimized with tprof. commit a69f1ec159e6b9772a96abbb9d48fe5ca555fa86 Author: maxv Date: Wed Nov 7 07:14:51 2018 +0000 Add two pmap fields, will be used by NVMM. commit b355369f2ad65c2f697b943b6151a66e23dca3b1 Author: maxv Date: Sun Nov 4 12:48:01 2018 +0000 Add tprof in MAKEDEV.tmpl, and regen MAKEDEV.8. commit 01abb36e84fe9c3ebc545f32f60dc5d35f1753b5 Author: maxv Date: Sat Nov 3 08:27:16 2018 +0000 Remove VA_SIGN_POS from the computation of the indexes, it is not needed. commit 8f5d88e9f7fdd56966f7514e0cbbc90afe0893cb Author: maxv Date: Fri Nov 2 12:27:47 2018 +0000 Add LIST_INIT for filehead. 
commit 7ebc52147c474fe9131f61644fbe40099d450023 Author: maxv Date: Fri Nov 2 11:59:59 2018 +0000 no, should be dst commit bd93e82019a268f2694bf7d896f1067b208d7ec8 Author: maxv Date: Fri Nov 2 08:59:59 2018 +0000 Don't overflow on the strings we read. Introduce db_read_string(), which stops on '\0'. Probably this doesn't matter a lot because the read is supposed to be safe, but let's not have bugs in the debugger. Detected by kASan, via skrll@ on aarch64, by typing "ps/l" on DDB. commit 92ecfeae8f4ae869da4ec1adc2a45843ee93393c Author: maxv Date: Thu Nov 1 20:34:49 2018 +0000 Add kASan support for aarch64. Stack tracking needs more investigation and will come in a separate commit. Reviewed by ryo@ jmcneill@ skrll@. commit ee40db3ad6c463697f6f6bb6e42fb47aedbd597a Author: maxv Date: Wed Oct 31 18:35:04 2018 +0000 Revert my kasan addition in this makefile, it looks like it causes asan.h to be installed, while we don't want it to be. commit 4df269c0bcce078a6c4f27840c901d0151ba6c2c Author: maxv Date: Wed Oct 31 06:26:25 2018 +0000 Move the MI parts of KASAN into kern/subr_asan.c. This file includes machine/asan.h, which contains the MD functions. We use an include rather than a plain C file, because we want GCC to optimize/inline some functions into one single block. The amd64 MD parts of KASAN are moved accordingly. The naming convention we use is: kasan_* a generic kasan object, declared in subr_asan.c kasan_md_* an MD kasan object, declared in machine/asan.h, and used in subr_asan.c __md_* an MD object, declared in machine/asan.h, and not used outside Overall this makes it easier to add KASAN support on more architectures. Discussed with several people. commit 9fba878b2d684525b8a97ef89d7cf7fba23f019e Author: maxv Date: Sun Oct 28 14:12:16 2018 +0000 Add #ifdef _KERNEL, vaddr_t does not exist in userland, and we don't want externs anyway. 
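Commit bd93e82019a2 above introduces db_read_string(), described as a string read that stops on '\0' instead of running past the end. Its real signature is not given in the log, so the following is only a sketch of the general technique under an assumed interface; toy_read_string is a hypothetical name and reads ordinary memory rather than going through the debugger's safe-read path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most dstlen - 1 bytes from src into dst, stopping at the first
 * '\0' in src. dst is always NUL-terminated. Returns the number of bytes
 * copied, not counting the terminator.
 */
static size_t
toy_read_string(const char *src, char *dst, size_t dstlen)
{
	size_t i;

	assert(dstlen > 0);
	for (i = 0; i < dstlen - 1; i++) {
		if (src[i] == '\0')
			break;
		dst[i] = src[i];
	}
	dst[i] = '\0';
	return i;
}
```

The loop never reads src beyond its terminator and never writes dst beyond dstlen bytes, so a short source string and a small destination are both handled.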
commit faa4f11a93e57712f75cbc9a9f3f76b05b54adc6 Author: maxv Date: Sat Oct 27 06:46:43 2018 +0000 Remove printfs that are too easily reachable, switch to M_REGION_GET, and simplify the initialization. No real functional change. commit b211d00a2c5c837f00dd2a7c0615fd126087b30a Author: maxv Date: Sat Oct 27 06:35:54 2018 +0000 Rename kasan_shadow_fill, remove one check in it, and inline it. Remove the use-after-scope code for now, because our GCC does not support that and when it does we will want to test the feature for real rather than letting a potentially broken code compile. commit 9fbbe2536faa632c2283bbab455da312464b0236 Author: maxv Date: Sat Oct 27 06:06:31 2018 +0000 Remove functions that aren't supposed to be used. commit 4e284a119539c908574b378f4a5dae3c10d83e29 Author: maxv Date: Sat Oct 27 05:56:10 2018 +0000 style commit e8fb19dfdda38d94e14c7a2143c06841f1bc614e Author: maxv Date: Sat Oct 27 05:42:23 2018 +0000 Localify one function, and switch to C99 types while here. commit 642ea891843ae57cb4af30c4386e5b9f30e7e4c7 Author: maxv Date: Tue Oct 16 13:18:25 2018 +0000 fix the shit, as usual commit 74e70992dcc720bd107ba1a57c0e104e675510cf Author: maxv Date: Sun Oct 14 08:36:09 2018 +0000 Remove dead files that have never been built, and likely can't build since they are not correct C files. commit 2a9f626fb701dcdfb738aa0ca09cd51b2172528d Author: maxv Date: Sun Oct 14 08:27:39 2018 +0000 Clean up setkey: remove dead wood, KNF, localify, and slightly improve. commit 91b7d9b4e90c187b865cdad621530b6d75d02c7e Author: maxv Date: Sat Oct 13 15:38:28 2018 +0000 Fix SF#24: incorrect authentication algorithms, copy-pasto. commit 53af6d1d73741d10702de10eaa4dfb61c604d8c3 Author: maxv Date: Sat Oct 13 15:17:45 2018 +0000 Fix ticket SF#91: pass the correct size for tbuf. commit 5c4485069f8ecca11758b93d3a7f8a26943f68b4 Author: maxv Date: Sat Oct 13 15:08:51 2018 +0000 Reduce the diff against the latest release. Also remove netbsd-import.sh, since we are upstream now. 
commit e6ffbd46443331e98846ab63499b7193398f7dc7 Author: maxv Date: Sat Oct 13 05:53:50 2018 +0000 Mark one entry as done, and another one as pointless. commit 814e6cfd83ad4cb39ae15c8594b7dc4b94a7e280 Author: maxv Date: Fri Oct 12 05:41:18 2018 +0000 Force ip_off to zero when the reassembly is complete. This was lost in my rev1.19 - before that the IP struct was clobbered for the reassembly, but it actually implicitly guaranteed that the first fragment of the packet would end up with ip_off = 0, and this was a desired behavior. commit b6d84981e4212b70cf6f4d6348e48c9e6767b701 Author: maxv Date: Sun Oct 7 08:00:49 2018 +0000 Make it clear that you need to disable SVS if you enable USER_LDT. I could make SVS compatible with it, but there has to be someone doing Wine work first, to justify the effort. commit ba129c36a01e0b640f5266b3bf5e28aab635253d Author: maxv Date: Fri Oct 5 18:51:52 2018 +0000 export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore commit a724cd1db0b3ca798669e97f404c9c3b0423794b Author: maxv Date: Sun Sep 30 10:00:24 2018 +0000 remove hardcoded bullshit, probably fixes PR/53644 commit a3489ff7a7d5907caa26e2429efd8d148c1fc9b4 Author: maxv Date: Thu Sep 27 16:34:08 2018 +0000 no isdn commit 17514778e41852621dbb7a274f72e9a631f255f3 Author: maxv Date: Thu Sep 27 16:33:14 2018 +0000 regen commit d30f00996a1816566c07af8f8277f6bd358aee0a Author: maxv Date: Thu Sep 27 14:59:43 2018 +0000 Improve a bit, no real functional change. commit 48b688427c389cc792d246f36d124cb6d593a13a Author: maxv Date: Thu Sep 27 13:04:21 2018 +0000 Export x86_dbregs_{save/restore}, will be used outside. Reproduce some internal dbregs logic in them. commit 02ea798e141be8f7915d986182bdf606efad4d6d Author: maxv Date: Mon Sep 24 05:47:33 2018 +0000 Don't go beyond start(). 
commit c3709bcd53a3fb73fb1c575c8da58d347d3b23ef Author: maxv Date: Sun Sep 23 13:48:16 2018 +0000 remove references to isdn commit f3d0c92c2fbe7f917ea4a179d272b483f57f0e5f Author: maxv Date: Sun Sep 23 13:46:53 2018 +0000 note removal of isdn commit 4f0691b5e979b20d2dec39abc54f96b7223fd21a Author: maxv Date: Sun Sep 23 09:20:57 2018 +0000 Remove ISDN from the kernel. It has remained unmaintained for a long time, is of poor quality, and is now an obstacle to MP-ification. It was removed ten years ago from FreeBSD for the same reason. This retires a big user of the mbuf API, and will ease maintenance of the kernel. commit 6e4103a5c6e9ac30dfa396605116880016bf887b Author: maxv Date: Sun Sep 23 08:26:00 2018 +0000 Remove the isdn sets from syspkg, now that they have no user left. commit 87e2ed449279e9f5e2eea79d5d9ef3bff072ba05 Author: maxv Date: Sun Sep 23 07:24:19 2018 +0000 Remove the userland part of ISDN. The kernel part is untouched for now. ipppctl was actually an exact copy of pppoectl; there is no functional change in pppoectl in this commit. commit 2253dcb26142f19d9d5b105f25cabefaee18ff32 Author: maxv Date: Sat Sep 22 16:22:22 2018 +0000 Remove isic(4). It is part of ISDN, which we are now retiring. commit d0f30d027596fc46eebe2bb5543204ab8b0c48f4 Author: maxv Date: Sat Sep 22 12:56:16 2018 +0000 Unreference iwic (now removed), forgot that. commit 0e37a033b5d07653e9b50cc20e5fad08fbeb665d Author: maxv Date: Sat Sep 22 12:41:00 2018 +0000 Remove iwic(4). It is part of ISDN, which we are now retiring. This driver was still marked as experimental (its man page dates back to 2002). commit b6c06b5050ff9851d8980e920762a234827b0e9c Author: maxv Date: Sat Sep 22 12:26:27 2018 +0000 Remove the "ifritz" driver (no man page). It is part of ISDN, which we are retiring. commit d342534685de01b4bc9be362aab293a9365acbd7 Author: maxv Date: Sat Sep 22 12:19:11 2018 +0000 Remove ifpci(4). It is part of ISDN, which we are retiring. 
commit 913348f367a1a9b30a1f408262a9f6c2c40082db Author: maxv Date: Fri Sep 21 18:38:25 2018 +0000 Remove iavc(4). commit 5e9e1909747a36d19a097feee72d5079184cd933 Author: maxv Date: Fri Sep 21 08:43:18 2018 +0000 no, put umbctl into netutil commit 21485c209ec3f9fafc1f6c59183ccf280f8f944c Author: maxv Date: Fri Sep 21 08:38:16 2018 +0000 umbctl is not related to ISDN, move it into man-sysutil-man, spotted by martin@ commit 647e1c05302f75e0cc6f2f9eb4b7e19ae086ab57 Author: maxv Date: Fri Sep 21 07:22:26 2018 +0000 Wrap long lines, so that nothing overflows. commit 5723acda28676fc6781d3bd28d352041b47ceea5 Author: maxv Date: Wed Sep 19 16:23:05 2018 +0000 i386 xen is pae commit a26d2e79fa0bfd73bb8cc0ecd9a6b94b9b409cc7 Author: maxv Date: Wed Sep 19 16:11:53 2018 +0000 Don't build the module sets for non-pae-32bit-pv. Noted by John D. Baker on port-i386@, thanks. commit 711253cd839a62b3a98dad997faa077873f4b654 Author: maxv Date: Wed Sep 19 15:36:12 2018 +0000 Switch back to tabs, it was nicer this way. commit 9fd363dfbbe994e42ed4cc7f181d47decc745229 Author: maxv Date: Wed Sep 19 15:20:39 2018 +0000 Don't display l_wchan, either there is something in l_wmesg and we display it, or there's nothing and we print "-". commit 51d948645fd1d3b8f2b3221165e785b17025a874 Author: maxv Date: Wed Sep 19 13:58:26 2018 +0000 Remove daic(4), it has never been functional. ok martin@ commit 5a10a7776a3b6696a29b2e3c361e73f727cbe2cb Author: maxv Date: Mon Sep 17 15:53:06 2018 +0000 Reduce the noise, reorder and rename some things for clarity. commit 93f1341d4e9a60d090e6250ee18ce9ba85d0f12d Author: maxv Date: Mon Sep 17 08:11:27 2018 +0000 Kick fragments that would introduce several !MFFs in a reassembly chain. The problem arises if we receive three fragments of the kind 3. A -> has MFF 1. B -> doesn't have MFF 2. C -> doesn't have MFF Because of the received order B->C->A, we don't see that B is !MFF, and therefore that there is a problem in this chain. 
Now we do two checks, and drop us if: * there is a fragment preceding us, and this fragment is !MFF, or * there is a fragment following us, and we are !MFF Spotted a long time ago. commit b6eb95c3f89b8f06f0b41eddac926aca36d50e73 Author: maxv Date: Mon Sep 17 06:01:36 2018 +0000 Hold ip_off and ip_len in the fragment entry, instead of always reading the associated mbuf (and converting to host order). This reduces the cache/TLB misses when processing long lists. commit 2c7aeedad49dc151a07928a12287da2d146f33fb Author: maxv Date: Fri Sep 14 05:09:51 2018 +0000 Use non-variadic function pointer in protosw::pr_input. commit 65cabc890d140549c561fcd282d35f0b8ef494a0 Author: maxv Date: Fri Sep 14 04:29:46 2018 +0000 rename toff -> off commit 7b0555a29f8d0f17d5e24727407b433d8be9c123 Author: maxv Date: Fri Sep 14 04:25:16 2018 +0000 rename off -> thlen commit 1f55a06840c23a34a10fbbfc28cc9be72d3211c2 Author: maxv Date: Thu Sep 13 14:44:09 2018 +0000 Don't leak kernel pointers to userland in kern.file2, same as kern.proc2. commit e1631707ab4b6eb0722ac4cf7958e163a0e8c620 Author: maxv Date: Wed Sep 12 15:58:08 2018 +0000 Remove this check, it has never protected against mmap on page zero, and has since been replaced by the code in exec_vm_minaddr. commit d5cc66a0186b58b3c5ea7f7ff58afbc3449e627a Author: maxv Date: Mon Sep 10 17:25:21 2018 +0000 reduce the battlefield commit 874d8dcc36e67f213af2d8e144483a0f0e79ac37 Author: maxv Date: Mon Sep 10 16:43:24 2018 +0000 Replace KDASSERT by panic. commit 553992ab9a06c3c7135aa2a84b6f4d2bde100d35 Author: maxv Date: Mon Sep 10 15:14:50 2018 +0000 Rename _pmap_alloc_pdp -> pmap_alloc_pdp, and make it public. commit 621505da779a0453a2881ea96e1e460f38edde0b Author: maxv Date: Mon Sep 10 13:11:05 2018 +0000 Correctly align the size+redzone for KASAN, on amd64 it happens to be always 8byte-aligned but on other architectures it may not be. 
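The two checks from commit 93f1341d4e9a above (drop the incoming fragment if the fragment preceding it is !MFF, or if a fragment follows it and it is itself !MFF) can be sketched as a small predicate. This is an illustration under assumptions: struct toy_frag and toy_frag_should_drop are hypothetical, and the neighbours are assumed to have been looked up already in an offset-ordered reassembly chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment entry; only the fields the checks need. */
struct toy_frag {
	unsigned off;	/* fragment offset within the packet */
	bool mff;	/* true if the "more fragments" flag is set */
};

/*
 * Decide whether 'self' must be dropped, given its neighbours in the
 * chain. 'prev' and 'next' are NULL when there is no such neighbour.
 */
static bool
toy_frag_should_drop(const struct toy_frag *prev, const struct toy_frag *self,
    const struct toy_frag *next)
{
	assert(self != NULL);

	/* A fragment before us claims to be the last one. */
	if (prev != NULL && !prev->mff)
		return true;
	/* We claim to be the last one, yet a fragment follows us. */
	if (next != NULL && !self->mff)
		return true;
	return false;
}
```

In the B->C->A scenario from the message, B is accepted because it has no neighbours yet, and C is then rejected because the fragment preceding it (B) is !MFF.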
commit 905928d2d295f2134ea67884302f70a2be2a4a6b Author: maxv Date: Sat Sep 8 12:40:17 2018 +0000 Work around dumb KASSERT in vtopte(), the PTE area can now be above the MAIN area. I guess the KASSERT should be removed because it doesn't make a lot of sense. commit 16406f272bd02ce036b1ac8caef967a7dca493af Author: maxv Date: Fri Sep 7 10:20:32 2018 +0000 mark two entries as done, and add two more commit 32b0faa9b909471ed8bddbbdda2e33b5e505111f Author: maxv Date: Fri Sep 7 06:13:14 2018 +0000 Make raw_input non-variadic. commit 41d54c3b6e894f4be358694fc395536c95d339fe Author: maxv Date: Fri Sep 7 06:08:16 2018 +0000 Set unused pr_input field to NULL, discussed on tech-net@. commit a4a12d9f212f43381c44455ef8bb813c8d04a5bc Author: maxv Date: Thu Sep 6 19:19:44 2018 +0000 Remove netkey/. commit 9379883253fcf0a3f2f57cf30ad38f75093d0b80 Author: maxv Date: Thu Sep 6 19:07:13 2018 +0000 Remove netinet6/ipsec.h. commit 0325797f8174edd0771610c30b35675c07d79a55 Author: maxv Date: Thu Sep 6 14:08:24 2018 +0000 en leftover commit e3991e52fa44d8fed43a15b465039e7d05813326 Author: maxv Date: Thu Sep 6 10:09:29 2018 +0000 more netkey->netipsec commit 495359a5b81e5905f14e82ca57633acb5d775c41 Author: maxv Date: Thu Sep 6 09:54:36 2018 +0000 Remove dead references to netinet6/ipsec.h. commit ccec470db04c8b6faa658fcd432798825b310e88 Author: maxv Date: Thu Sep 6 09:47:30 2018 +0000 Remove lurking references to Midway. These lists don't seem to be really up-to-date, by the way. commit 5ca6b057c6bd583411ccc0a11e93ebda94403182 Author: maxv Date: Thu Sep 6 09:44:09 2018 +0000 Replace netkey/ -> netipsec/, everything was moved into netipsec/. 
commit 7954719838a709de449834f073dd942a1f615c8e Author: maxv Date: Thu Sep 6 09:38:05 2018 +0000 sync with reality commit 703ebc339c741a83f50cc7c429bed6967a28a8fa Author: maxv Date: Thu Sep 6 09:31:06 2018 +0000 remove netnatm leftover commit 4f314fb3d9ad692a6bb75d566f65092310e64d48 Author: maxv Date: Thu Sep 6 09:28:00 2018 +0000 fix references, the things were moved into netipsec/ a while ago commit e8814db155227775b36095e3e4837e9467d033a6 Author: maxv Date: Thu Sep 6 06:46:25 2018 +0000 Note removal of midway and NATM, and prune the entries from TODO.smpnet. commit 6c46a43cec8b28a253a84a009a0ef68b628e4281 Author: maxv Date: Thu Sep 6 06:41:59 2018 +0000 Remove the network ATM code. commit 1bdc722939a0afaa1e25c86ea503734e10b265db Author: maxv Date: Thu Sep 6 05:36:49 2018 +0000 Retire the 'midway' driver. Discussed on tech-net@ recently and also three years ago, part of removing the network ATM code. commit 3415797d968e4a6d093e9286910085174ed66bc8 Author: maxv Date: Tue Sep 4 16:03:56 2018 +0000 Clear the kernel pointers from kern.proc and kern.proc2 when the user is not privileged. commit 0b2479767dbbddc54a0970decc4005ce49c14928 Author: maxv Date: Tue Sep 4 15:48:44 2018 +0000 Use p->p_session instead of ep->e_sess, no real functional change. commit bfdfc6adeae1fd99d56f294645493f0e26f48723 Author: maxv Date: Tue Sep 4 15:41:08 2018 +0000 more kernel address leaks commit b19241f7294dfafc83561c18aa1c3589f434664a Author: maxv Date: Tue Sep 4 15:36:01 2018 +0000 Fix the "Interfaces" section, I understood wrong. Talk about inference, because it was not mentioned before, and it plays an important role. Discussed with rmind. Probably not the last pass. commit c680311e1c8b56c828346c09703c369d83f0abd1 Author: maxv Date: Tue Sep 4 14:31:18 2018 +0000 Introduce KAUTH_REQ_PROCESS_CANSEE_KPTR, and use it in the already-existing modstat code. No real functional change. 
commit c08747d346d78ec5bd0163f3520bb1ec2250b4c5 Author: maxv Date: Sun Sep 2 17:45:18 2018 +0000 Be clearer about the difference between static vs dynamic interface list, and slightly improve wording. My understanding is that when none of inet4/inet6/ifaddrs is passed, NPF assumes ifaddrs. commit 4ae9ff8f7f089f1ed2baec403db73de3d64f1de0 Author: maxv Date: Sun Sep 2 17:21:28 2018 +0000 well well well it's September now commit 9c4e1c0a44faed8ed77bbc939da3ec3dee681f8b Author: maxv Date: Sun Sep 2 16:13:42 2018 +0000 remove dead references to IPF; also remove references to netccitt/, it was removed 12 years ago. commit 8e9d6f6ccb3e3f852384dea7266552746b7d177d Author: maxv Date: Sun Sep 2 16:08:12 2018 +0000 replace ipf->npf commit 7b5c9bd8808d9d5d10b96bef22e3f8e5dc0e432a Author: maxv Date: Sun Sep 2 16:05:33 2018 +0000 remove reference to ipnat, and duplicate comments commit 0ceacf6daaa27bdc875a3e2ac9e24ed2b364b113 Author: maxv Date: Sun Sep 2 16:02:18 2018 +0000 remove reference to ipnat commit 03db620abdb09566c0b47eed853fd328d23c1e8e Author: maxv Date: Fri Aug 31 15:15:23 2018 +0000 Fix buffer overflow, detected by kASan. 
    ifconfig gif0 create
    ifconfig gif0 up
    [ 50.682919] kASan: Unauthorized Access In 0xffffffff80f22655: Addr 0xffffffff81b997a0 [8 bytes, read]
    [ 50.682919] #0 0xffffffff8021ce6a in kasan_memcpy
    [ 50.692999] #1 0xffffffff80f22655 in m_copyback_internal
    [ 50.692999] #2 0xffffffff80f22e81 in m_copyback
    [ 50.692999] #3 0xffffffff8103109a in rt_msg1
    [ 50.692999] #4 0xffffffff8159109a in compat_70_rt_newaddrmsg1
    [ 50.692999] #5 0xffffffff81031b0f in rt_newaddrmsg
    [ 50.692999] #6 0xffffffff8102c35e in rt_ifa_addlocal
    [ 50.692999] #7 0xffffffff80a5287c in in6_update_ifa1
    [ 50.692999] #8 0xffffffff80a54149 in in6_update_ifa
    [ 50.692999] #9 0xffffffff80a59176 in in6_ifattach
    [ 50.692999] #10 0xffffffff80a56dd4 in in6_if_up
    [ 50.692999] #11 0xffffffff80fc5cb8 in if_up_locked
    [ 50.703622] #12 0xffffffff80fcc4c1 in ifioctl_common
    [ 50.703622] #13 0xffffffff80fde694 in gif_ioctl
    [ 50.703622] #14 0xffffffff80fcdb1f in doifioctl
commit 5e783fda5a357e0b6eb39d0d55fb9e5815d7d264 Author: maxv Date: Fri Aug 31 14:16:06 2018 +0000 Introduce npf_set_mss(). When the MSS is not 16bit-aligned, it sets:
    0      8           16          24          32
    +------+-----------+-----------+------+
    | data | MSS (low) | MSS (hig) | data |
    +------+-----------+-----------+------+
    ^                  ^
    old[0]             old[1]
And sets new[0,1] accordingly with the new value. The MSS-clamping code then adjusts the checksum twice on a 16bit boundary:
    from old[0] to new[0]
    from old[1] to new[1]
Fixes PR/53479, opened by myself. Tested with wireshark and kASan.
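The double checksum adjustment described for npf_set_mss() can be illustrated with a small standalone sketch. It is not the NPF code: `cksum_full()`, `cksum_fixup16()`, `set_mss_unaligned()` and `mss_demo()` are hypothetical names, the incremental update follows the usual one's-complement rule (new = ~(~old + ~m + m')), and the MSS is assumed to be stored in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over 'len' bytes ('len' must be even). */
static uint16_t
cksum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Update 'cksum' after one 16bit word changed from 'oldw' to 'neww'. */
static uint16_t
cksum_fixup16(uint16_t cksum, uint16_t oldw, uint16_t neww)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~oldw;
	sum += neww;
	sum = (sum & 0xFFFF) + (sum >> 16);
	sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * 'w' points to the 16bit-aligned byte right before the MSS, so the
 * MSS occupies w[1] and w[2] and straddles two checksum words.
 */
static uint16_t
set_mss_unaligned(uint8_t *w, uint16_t cksum, uint16_t mss)
{
	uint16_t old0 = (uint16_t)((w[0] << 8) | w[1]);
	uint16_t old1 = (uint16_t)((w[2] << 8) | w[3]);

	w[1] = (uint8_t)(mss >> 8);
	w[2] = (uint8_t)(mss & 0xFF);

	uint16_t new0 = (uint16_t)((w[0] << 8) | w[1]);
	uint16_t new1 = (uint16_t)((w[2] << 8) | w[3]);

	cksum = cksum_fixup16(cksum, old0, new0);	/* old[0] -> new[0] */
	cksum = cksum_fixup16(cksum, old1, new1);	/* old[1] -> new[1] */
	return cksum;
}

/* Returns 1 if the incremental result matches a full recomputation. */
static int
mss_demo(void)
{
	/* NOP, MSS option (kind 2, len 4, value 1460), NOP, NOP, EOL */
	uint8_t opts[8] = { 0x01, 0x02, 0x04, 0x05, 0xB4, 0x01, 0x01, 0x00 };
	uint16_t cksum = cksum_full(opts, sizeof(opts));

	cksum = set_mss_unaligned(&opts[2], cksum, 1400);
	return cksum == cksum_full(opts, sizeof(opts));
}
```

The leading NOP pushes the MSS value to an odd offset, which is the unaligned case the commit addresses.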
commit 68204f10bdc2291a6b6f29f74737a8e9675205af Author: maxv Date: Fri Aug 31 11:18:35 2018 +0000 rename net-seg -> map-seg, and document it commit d956304c512abcd4f46a5f585cd84d35d9e0e6d6 Author: maxv Date: Fri Aug 31 11:11:21 2018 +0000 "interface" already contains "var-name", so don't mention it in "filt-addr", that's redundant commit e6c0caee9c04a0537d8f5eb298e5593386cbcf6f Author: maxv Date: Fri Aug 31 11:01:09 2018 +0000 should be port-opts commit 262f9a73581e2765e72c101a8723d0b2c1bcc46d Author: maxv Date: Fri Aug 31 10:52:30 2018 +0000 Clarify the "Groups" section. commit efce7c715f2d53b14e84128285afebc93f8b059e Author: maxv Date: Fri Aug 31 10:38:17 2018 +0000 remove commented reference to pflog commit 00a0a68dc25e13a0c37fd109cbc91da8085931aa Author: maxv Date: Thu Aug 30 10:38:01 2018 +0000 Use ASM markers for functions, it makes the code easier to understand and eliminates raw symbols. No functional change (tested on RPI3B+). commit f1ea6eb78cbc2f24a17313d4f961b3e0d15ed2a8 Author: maxv Date: Thu Aug 30 10:30:05 2018 +0000 style, no functional change commit c3a71ccacb4c211e3fb3816e6eb684d30bfd95f4 Author: maxv Date: Wed Aug 29 16:26:25 2018 +0000 clean up a little commit 61cd9cdbc8d0e07510e6ca2640f3bfbf1b249126 Author: maxv Date: Wed Aug 29 06:28:50 2018 +0000 Remove the constants of the DMAP, they are unused, and move NL4_SLOT_DIRECT into amd64/. commit a99d9e8cf332d93329dd043559302e32540f532d Author: maxv Date: Wed Aug 29 06:17:26 2018 +0000 Simplify the ASLR stuff, we don't care about resizable areas now, and it makes the code more complicated for no good reason. commit 0df37c8c09de0e2824a37e46d9f5447e84335748 Author: maxv Date: Mon Aug 27 13:09:16 2018 +0000 Improve the "Map" section. commit 1120426325d9e2b5e00b756c15eee49990f698be Author: maxv Date: Mon Aug 27 12:46:03 2018 +0000 Document ALGs. commit 3ccf28cb5e1146f92127c72675432c327c5bd5bc Author: maxv Date: Mon Aug 27 08:53:19 2018 +0000 Add kasan interceptors for strcpy/strcmp/strlen. 
commit 53b3598bef1d94f096d9282d37534415acabfd01 Author: maxv Date: Sat Aug 25 09:54:37 2018 +0000 Add KAUTH_REQ_PROCESS_CANSEE_EPROC, and use it for the kern.proc node. Same permission as before, so no functional change. commit f142cefdbc49236e9027d64273cad11785b4042a Author: maxv Date: Sat Aug 25 08:12:28 2018 +0000 Belatedly note the removal of vm86 (me, one year ago), and n8 (maya, two weeks ago). commit 92560aa5e589ab71677435ead4f4781f461193f6 Author: maxv Date: Sat Aug 25 08:08:26 2018 +0000 Note removal of NDIS. commit 42fc8544e933870a4b04cefcac5a614ac6aebf58 Author: maxv Date: Sat Aug 25 07:48:56 2018 +0000 Retire NDIS. It appears that it has never worked, after 13 years it was still marked as "experimental", and nowadays it may be one more obstacle to MPification of the network stack. Discussed on tech-net@. commit 40e7994a4d6d0dae149abeb8804582c976e2d5ae Author: maxv Date: Sat Aug 25 05:56:24 2018 +0000 Disable POOL_REDZONE until we figure out what's wrong. There must be a dumb problem, that is not triggerable on amd64. commit ec6e67283825d3880cfdafef26786b38f928de37 Author: maxv Date: Fri Aug 24 17:09:30 2018 +0000 mark one entry as done commit bba5248b810f64a05753612d54d0f53c584f367a Author: maxv Date: Fri Aug 24 17:06:29 2018 +0000 Use a random hunique, instead of sending the pointer of the interface. Tested via ATF. commit 44584a89ba77172f7eacdacecc332ff4facaccde Author: maxv Date: Fri Aug 24 14:04:27 2018 +0000 Use __predict_false to optimize, and also replace panic->printf. commit 86e33fe97e287f962735f1e2ed7f5d0635365a3a Author: maxv Date: Fri Aug 24 05:39:04 2018 +0000 Note kASan support. commit a4d0d5a837e781dc70e5a691d9b47561c477dbf5 Author: maxv Date: Thu Aug 23 12:18:02 2018 +0000 Add kASan redzones on pools and pool_caches. Also enable POOL_REDZONE on DIAGNOSTIC. commit d77be9a365f9c2d59249893e54d1b68da71b743b Author: maxv Date: Thu Aug 23 11:56:10 2018 +0000 Improve the detection on global variables, no need to round up. 
commit 1f7f80986dd4cbe30d097ef75d4395d2bab89020 Author: maxv Date: Thu Aug 23 11:53:15 2018 +0000 Fix buffer overflow, detected by kASan.
    [ 1.044878] kASan: Unauthorized Access In 0xffffffff804ec7e2: Addr 0xffffffff818a51e4 [2 bytes, read]
    [ 1.044878] #0 0xffffffff804ec7e2 in mskc_probe
    [ 1.044878] #1 0xffffffff80e92a77 in mapply
    [ 1.044878] #2 0xffffffff80e92e5f in config_search_loc
    [ 1.044878] #3 0xffffffff80e93fb5 in config_found_sm_loc
    [ 1.044878] #4 0xffffffff802ca9ea in pci_probe_device
    [ 1.044878] #5 0xffffffff802cad97 in pci_enumerate_bus
    [ 1.044878] #6 0xffffffff802caf00 in pcirescan
    [ 1.044878] #7 0xffffffff802cb1ee in pciattach
    [ 1.044878] #8 0xffffffff80e93e5b in config_attach_loc
    [ 1.044878] #9 0xffffffff80e93fce in config_found_sm_loc
    [ 1.044878] #10 0xffffffff80271212 in mp_pci_scan
    [ 1.044878] #11 0xffffffff8022d9ee in mainbus_attach
    [ 1.044878] #12 0xffffffff80e93e5b in config_attach_loc
    [ 1.044878] #13 0xffffffff8021e38b in cpu_configure
    [ 1.044878] #14 0xffffffff814a7068 in main
commit 1729e86a779fd40f158008759381b850d33320fa Author: maxv Date: Wed Aug 22 17:25:02 2018 +0000 Unwind the stack on error, to get the full path that led to the illegal access. Example of output:
    kASan: Unauthorized Access In 0xffffffff80e6219c: Addr 0xffffbb007a39fd03 [1 byte, read]
    #0 0xffffffff80e6219c in ras_purgeall
    #1 0xffffffff80e62330 in sys_rasctl
    #2 0xffffffff80265008 in syscall
(I manually added a one-byte stack read overflow in rasctl to demonstrate.) commit 973c52f91388ec9a3da4bb3ed991de5bc6d63659 Author: maxv Date: Wed Aug 22 17:04:36 2018 +0000 Explicitly unpoison the stack when entering a softint. Softints are the only place where we "discard" a part of the stack: we may have left the thread without allowing the asan instrumentation to clear the poison, and in this case, we can get false positives when we hit a poisoned area of the stack while executing another handler within the same softint thread.
(I was actually getting a rare false positive in ip6intr.) commit 40c2e013b43cafc538ba2502229d0b04ace72d31 Author: maxv Date: Wed Aug 22 12:42:06 2018 +0000 Add back the KASAN ifdefs in kern_malloc until we sort out the type issue, and fix sys/asan.h. Tested on i386, amd64 and amd64-kasan. commit 93211a8e230aaa1ff271191f3b214f49a6aa80a5 Author: maxv Date: Wed Aug 22 12:07:42 2018 +0000 Add support for monitoring the stack with kASan. This allows us to detect illegal memory accesses occurring there. The compiler inlines a piece of code in each function that adds redzones around the local variables and poisons them. The illegal accesses are then detected using the usual kASan machinery. The stack size is doubled, from 4 pages to 8 pages. Several boot functions are marked with the __noasan flag, to prevent the compiler from adding redzones in them (because we haven't yet initialized kASan). The kasan_early_init function is called early at boot time to quickly create the shadow for the current stack; after this is done, we don't need __noasan anymore in the boot path. We pass -fasan-shadow-offset=0xDFFF900000000000, because the compiler wants to do
    shad = shadow-offset + (addr >> 3)
and we do, in kasan_addr_to_shad
    shad = KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3)
hence
    shad = KASAN_SHADOW_START + (addr >> 3) - (CANONICAL_BASE >> 3)
         = [KASAN_SHADOW_START - (CANONICAL_BASE >> 3)] + (addr >> 3)
implies
    shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3)
                  = 0xFFFF800000000000 - (0xFFFF800000000000 >> 3)
                  = 0xDFFF900000000000
In UVM, we add a kasan_free (that is not preceded by a kasan_alloc). We don't add poisoned redzones ourselves, but all the functions we execute do, so we need to manually clear the poison before freeing the stack. With the help of Kamil for the makefile stuff.
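The shadow-offset derivation in the stack commit above (93211a8e) can be checked with a few lines of standalone C. This is only a sketch: the two constants are the values quoted in the commit message, while `SHADOW_OFFSET`, `shad_kernel_form()`, `shad_compiler_form()` and `shadow_forms_agree()` are hypothetical names introduced for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the commit message. */
#define KASAN_SHADOW_START	0xFFFF800000000000ULL
#define CANONICAL_BASE		0xFFFF800000000000ULL

/* Hypothetical name for the value passed to -fasan-shadow-offset. */
#define SHADOW_OFFSET	(KASAN_SHADOW_START - (CANONICAL_BASE >> 3))

/* The form the commit attributes to kasan_addr_to_shad. */
static uint64_t
shad_kernel_form(uint64_t addr)
{
	return KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3);
}

/* The form the compiler uses: shadow-offset + (addr >> 3). */
static uint64_t
shad_compiler_form(uint64_t addr)
{
	return SHADOW_OFFSET + (addr >> 3);
}

/* Returns 1 if both forms agree on a few upper-half addresses. */
static int
shadow_forms_agree(void)
{
	const uint64_t samples[] = {
		0xFFFF800000000000ULL,	/* first address of the upper half */
		0xFFFFFFFF80F22655ULL,	/* an arbitrary kernel text address */
		0xFFFFFFFFFFFFFFFFULL,	/* last address */
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		if (shad_kernel_form(samples[i]) !=
		    shad_compiler_form(samples[i]))
			return 0;
	}
	return 1;
}
```

The two forms agree because CANONICAL_BASE is a multiple of 8, so shifting the difference equals the difference of the shifts.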
commit 4d98693b33a2f3c2996a61098bf979a0d0cf020e Author: maxv Date: Wed Aug 22 10:09:21 2018 +0000 Actually add __unused on the functions themselves in case a .c file does not use one function. commit 26c2f18efde3a4e3f4d2bd8762d5f0f21c50f404 Author: maxv Date: Wed Aug 22 09:38:21 2018 +0000 Reduce the number of KASAN ifdefs, suggested by Christos/Taylor. commit 72a1ebbfe1d8c71623c1398ebe3aa4b177370728 Author: maxv Date: Wed Aug 22 09:11:47 2018 +0000 Fix the computation in kasan_shadow_map, we may need one more page because of the rounddown. commit 9b5023884c99bc6f920f03d46b26593654bd23af Author: maxv Date: Tue Aug 21 07:56:53 2018 +0000 Need to keep track of the requested size, when realloc is used under kASan. Maybe we could use mh_rqsz by default. commit 5695f45dfb9e5deb9c2112ce674a8a5b93f19c27 Author: maxv Date: Mon Aug 20 15:04:51 2018 +0000 Add support for kASan on amd64. Written by me, with some parts inspired from Siddharth Muralee's initial work. This feature can detect several kinds of memory bugs, and it's an excellent feature. It can be enabled by uncommenting these three lines in GENERIC: #makeoptions KASAN=1 # Kernel Address Sanitizer #options KASAN #no options SVS The kernel is compiled without SVS, without DMAP and without PCPU area. A shadow area is created at boot time, and it can cover the upper 128TB of the address space. This area is populated gradually as we allocate memory. With this design the memory consumption is kept at its lowest level. The compiler calls the __asan_* functions each time a memory access is done. We verify whether this access is legal by looking at the shadow area. We declare our own special memcpy/memset/etc functions, because the compiler's builtins don't add the __asan_* instrumentation. Initially all the mappings are marked as valid. During dynamic allocations, we add a redzone, which we mark as invalid. Any access on it will trigger a kASan error message. 
Additionally, the compiler adds a redzone on global variables, and we mark these redzones as invalid too. The illegal-access detection works with a 1-byte granularity. For now, we cover three areas: - global variables - kmem_alloc-ated areas - malloc-ated areas More will come, but that's a good start. commit 64e102182eaa144e7a1b1684095fd781fc2e2aa1 Author: maxv Date: Mon Aug 20 11:46:44 2018 +0000 Compute the pointer earlier, not in the return statement. No functional change. commit 2a120712c3fb8530ecdd3780f2e2eecaace054aa Author: maxv Date: Mon Aug 20 11:35:28 2018 +0000 Retire KMEM_REDZONE and KMEM_POISON. KMEM_REDZONE is not very efficient and cannot detect read overflows. KASAN can, and will be used instead. KMEM_POISON is enabled along with KMEM_GUARD, but it is redundant, since the latter can detect read UAFs contrary to the former. In fact maybe KMEM_GUARD should be retired too, because there are many cases where it doesn't apply. Simplifies the code. commit ffe11c98465e422e642fe9cd68233c77d4b43aa0 Author: maxv Date: Sat Aug 18 08:45:55 2018 +0000 Simplify the conditions. Fixes compilation of native amd64 without direct map. commit ee22dd250ad5f33066be1723f1a27b7446a6f94a Author: maxv Date: Fri Aug 17 14:39:51 2018 +0000 Remove big outdated comment, remove unused macros, remove XXX that has nothing to do here, style. commit b542c3a781ea6a74da3771e058e628f2eede590e Author: maxv Date: Fri Aug 17 12:36:53 2018 +0000 Add a deprecation note in each of the PF man pages (instead of just pf.4), so that it's really clear. commit 65db28d7a29d5a940e9f2d5a9d559085c67bcaa1 Author: maxv Date: Fri Aug 17 12:20:49 2018 +0000 Add the values of "algo" in the grammar, and use # as comment marker for man-k.org (and others) not to highlight things in an incorrect way. commit f332b765cb33f8bdf9950c6d218f8fa09d4dbebb Author: maxv Date: Fri Aug 17 12:04:20 2018 +0000 Add missing quote in static-rule, it causes man-k.org (and other tools) to wrongly highlight the grammar. 
commit a9f9acc10eadcdfab34639a1dcee0983d188a95b Author: maxv Date: Fri Aug 17 10:24:19 2018 +0000 Replace "rproc"->"proc" in the grammar (spotted by he@), and slightly reword. commit 58245e89faa3dcfcba7b39f3558e57ea7122a34e Author: maxv Date: Fri Aug 17 10:16:24 2018 +0000 Replace () by [] in tcp-flags. Fix proc-opts, the value is optional, noted by he@. commit 53e6b19985e9fa56250785fc1ce75211c580c7e9 Author: maxv Date: Thu Aug 16 09:58:00 2018 +0000 Improve wording. commit 8b4ad92798f91e186cc6a8adba49708c92940bbd Author: maxv Date: Thu Aug 16 09:50:37 2018 +0000 Improve the "Map" section a little. commit c540801327cadb5593ac8d2b53a10ae40923bc8d Author: maxv Date: Thu Aug 16 09:46:18 2018 +0000 Document the "flags" keyword. commit 516446cb5c1266b4a6de9d4b45ed29b40b61a432 Author: maxv Date: Thu Aug 16 09:21:00 2018 +0000 Improve the "Rules" section: better explain the "final" keyword (it is the same as PF's "quick", so use the same wording), and document the "return" options. While here simplify the man code, suggested by wiz. commit dee5a911e34846b5806e7ea749ddc1e49f149480 Author: maxv Date: Thu Aug 16 08:51:53 2018 +0000 Add quotes around the option names, to match the actual npf conf. commit 02f4fbab81bc670706b75aa55367e6f832e31ddc Author: maxv Date: Thu Aug 16 08:37:51 2018 +0000 Enlighten the "Procedures" section. In particular document the "no-df" option. Also replace "normalisation" -> "normalization", to match the name of the rule. commit 490daf0ae12f5e5efdb6c6480d00be783cc624d9 Author: maxv Date: Tue Aug 14 14:56:33 2018 +0000 Note removal of etherip, add entry about ASLR in amd64, and improve the ipkdb entry. commit 6e5d7e3e101d0576715a0480cc7248ffda70bd3a Author: maxv Date: Tue Aug 14 14:49:13 2018 +0000 Retire EtherIP, we have L2TP instead. commit 07a04b365a93aa21185fbb1e906a50c4cbb39122 Author: maxv Date: Tue Aug 14 06:37:59 2018 +0000 Replace references to etherip by l2tp. Etherip was already not enabled anyway. 
commit cfc6b9776b901b0bb55cb0f147ae6e3b23624c77 Author: maxv Date: Tue Aug 14 06:21:36 2018 +0000 Replace etherip by l2tp in the "see also" sections. commit b4a3f32b55d07b0c121e5ea3cc9ad6058ccee4b2 Author: maxv Date: Tue Aug 14 06:18:46 2018 +0000 Enlighten a little. commit 4c6d31c27d3241926d7065722f4b75c1f8bc9886 Author: maxv Date: Tue Aug 14 06:04:24 2018 +0000 Enable L2TP on all x86 configurations, not just native amd64. commit 0dcdd89e4614ca96dc01fe998aca1265ac2e3354 Author: maxv Date: Mon Aug 13 15:48:21 2018 +0000 Clarify, remove dead code, and add XXXSMP; really this static variable looks like a great bug. commit f9732fd8973e7ebb3ab2cae349c64834d9749795 Author: maxv Date: Mon Aug 13 09:29:13 2018 +0000 Clarify two functions. commit 40e9d10a99dfaf7eb6f6b6259f819b992f95a74f Author: maxv Date: Sun Aug 12 15:33:36 2018 +0000 mark two entries as done commit 0bbbdde3e9551b3d03db3fb7e2c3eccc17733212 Author: maxv Date: Sun Aug 12 15:31:01 2018 +0000 More ASLR: randomize the location of the PTE area. The PTE slot is not created in locore anymore, but a little later; by using the already entered L4 page, rather than the recursive slot itself (which doesn't exist yet). In the prekern we still map the slot - the prekern behaves as an external locore -, because we need it as part of the randomization/relocation work. The kernel then removes this slot, and regenerates a randomized one. Tested on GENERIC and GENERIC_KASLR, Xen doesn't have it and dom0 still boots fine. commit 7b729e7e0ebb2c1dcde8d75e08e1f4c68d020ad9 Author: maxv Date: Sun Aug 12 13:31:16 2018 +0000 Move the PCPU area from slot 384 to slot 510, to avoid creating too much fragmentation in the slot space (384 is in the middle of the kernel half of the VA). commit 0e1f07d873abd0e3d125febb8044093bf56798ce Author: maxv Date: Sun Aug 12 12:42:53 2018 +0000 Move the PTE area from slot 255 to slot 509. 
I've never understood why we put it on 255; the "kernel" half of the VM space begins on slot 256, so if anything, the PTE area should have been above it, not below. Virtually extend the user slots in slotspace, because we don't want (randomized) kernel mappings to land on slot 255. The prekern is updated accordingly. Tested on GENERIC, GENERIC_KASLR and XEN3_DOM0. commit 07857fe8f387378169dce4dead5b68361b0637c3 Author: maxv Date: Sun Aug 12 12:23:33 2018 +0000 Introduce PDIR_SLOT_USERLIM, which indicates the limit of the user slots. Use it instead of PDIR_SLOT_PTE when we just want to iterate over the user slots. Also use it in SVS, I had hardcoded 255 because there was no proper define (which there now is). commit 4f1ab3c3c6ebab179e8dd0e65de42aebe5599751 Author: maxv Date: Sun Aug 12 11:51:42 2018 +0000 Reduce the minefield: zero out the pdir only once, at the beginning of the function. This eliminates one assumption on the order of the VM areas. commit 6f145107a4397eaa983c9a7e97eca0a30fbe3836 Author: maxv Date: Sun Aug 12 10:50:35 2018 +0000 Randomize the main memory on Xen, same as native. Tested on amd64-dom0. commit 17a3c9fa254e52aa8cba382a5e7e405de49b8572 Author: maxv Date: Sun Aug 12 10:45:27 2018 +0000 Take the last area into account, there is a hole before it. commit e0dae19cdeef30475fba52e8d133a51c748b3410 Author: maxv Date: Sun Aug 12 09:29:16 2018 +0000 Rename 'slotspace' -> 'slotarea' in UVM, to avoid (future) collision with the x86 slotspace structure. commit a6b3457735840f9d6968abc9396e0c6a4a4d97c0 Author: maxv Date: Sun Aug 12 09:05:52 2018 +0000 Add a new area, SLAREA_HYPV, which indicates the slots used by the hypervisor, in our case Xen. commit 9a2eba1620a5fd568992f87ffcce16d70a676f5e Author: maxv Date: Sun Aug 12 08:17:50 2018 +0000 More ASLR: randomize the kernel main memory. VM_MIN_KERNEL_ADDRESS becomes variable, and its location is chosen at boot time. There is room for improvement, since for now we ask for an alignment of NBPD_L4. 
This is enabled by default in GENERIC, but not in Xen. Tested extensively on GENERIC and GENERIC_KASLR, XEN3_DOM0 still boots fine. commit ffb33f7e034d67cf165f371bbb68d2c6f3a07b95 Author: maxv Date: Sun Aug 12 06:11:47 2018 +0000 Eliminate the only ASM reference to VM_MIN_KERNEL_ADDRESS. Rename the value to VM_SPACE_SEP_HIGH32, it is now the highest 32bits of the first va of the higher half of the address space (right after the canonical hole). commit 4ae315c05dd6e3aeac73dd4527a4fee737883336 Author: maxv Date: Sun Aug 12 05:43:42 2018 +0000 enable the two errata for AMD Family 16h, tested by mrg@, thanks commit 66f64af1d1970aa955970486bcce19330bef62b2 Author: maxv Date: Fri Aug 10 17:47:14 2018 +0000 remove reference to CPU_ARMV2, suggested by jmcneill@ commit 6378a1059c1eb6ec790ac5777b0aaf1d2dce2bb4 Author: maxv Date: Fri Aug 10 17:46:06 2018 +0000 Enlighten a little. commit f059e481ae300d534177fbee48f9ab97509cd2e2 Author: maxv Date: Fri Aug 10 16:17:29 2018 +0000 Retire CPU_ARM2, CPU_ARM250 and CPU_ARM3, they are all leftovers of acorn26. ok jmcneill@ skrll@ commit c5daeef0d2841162efe06ff6f996e396dd6f0fdc Author: maxv Date: Fri Aug 10 07:16:13 2018 +0000 Fix compilation of PF/IPF... commit 8e4d8a6cf567d5e626bf701c268ca18249a9b161 Author: maxv Date: Fri Aug 10 06:55:04 2018 +0000 Remove the callback and localify. Same as IPv4. commit 223fe87b6a461080f7dc32eb9ba3c3783907c87e Author: maxv Date: Fri Aug 10 06:46:08 2018 +0000 Rename ip6_undefer_csum -> in6_undefer_cksum in6_delayed_cksum -> in6_undefer_cksum_tcpudp The two previous names were inconsistent and misleading. Put the two functions into in6_offload.c. Add comments to explain what we're doing. Same as IPv4. commit 4d7648db613024a8db257b7960b32d1f6b3bcb37 Author: maxv Date: Fri Aug 10 06:23:12 2018 +0000 Don't unconditionally call pmap_extract_ma, it is part of XENNET_DEBUG. This call costs us. 
commit 837e85bdb77f5c94247892737b3c18bf1353b769 Author: maxv Date: Thu Aug 9 17:43:54 2018 +0000 Localify mcl_cache. commit 07e85f732dfbe51e5f66a6c1bb571b1f2a55e5a4 Author: maxv Date: Thu Aug 9 17:32:44 2018 +0000 Use an independent pool, don't steal pages from mcl_cache. This was a bad hack. No particular functional change, since the (MCLBYTES != PAGE_SIZE) condition is already true. commit d55ca7630dbe9e226a7061af9d4c3f11ec6bf479 Author: maxv Date: Thu Aug 9 17:26:00 2018 +0000 style a bit commit 071881c9cab1a3fb829943e035c3444fa7ace2eb Author: maxv Date: Tue Aug 7 10:50:12 2018 +0000 Add five errata for AMD Family 17h (Ryzen etc), tested by Patrick Welche, thanks. Also add two errata for Family 16h, not yet tested, so not yet enabled. commit eeb4f8d353241c7a51f5ab007ac045d8214a490b Author: maxv Date: Thu Aug 2 17:34:51 2018 +0000 Mark two entries as done. commit 16f20c80bd9c26ae9d6aa4abf79eea6d47a67d6d Author: maxv Date: Thu Aug 2 17:18:00 2018 +0000 Add a "version" field in the prekern_args structure. The kernel checks it, and if it's not happy it returns back to the prekern. commit 9711627999260ba4d14fc8d8042021a419579c72 Author: maxv Date: Thu Aug 2 16:58:00 2018 +0000 Don't forget to call init_slotspace when we're booted via the prekern. commit bf9bbb2831ad4cdb7c72d29e75ebf8ab2ce06b44 Author: maxv Date: Thu Aug 2 16:26:09 2018 +0000 Distribute GENERIC_KASLR on amd64. commit 1b2f62cac2c6ec20b861867dc1c47f6ce1999f84 Author: maxv Date: Thu Aug 2 16:22:43 2018 +0000 Remove netbsd-INSTALL_XEN3_DOMU.gz (it doesn't exist anymore), and add netbsd-XEN3PAE_DOM0.gz (has always existed, but was apparently forgotten). commit ef848ac0f1e97871ba6ae2413bcc4a22c5676edc Author: maxv Date: Wed Aug 1 20:04:09 2018 +0000 Unreference IPF/PF from all the config files, and enable NPF instead when wanted. This also fixes some inconsistencies I saw in several files (eg IPF options while IPF was not compiled, IPF+PF enabled by default, etc). 
commit 114eaf70a4240f7a4b9df5605fe8c3a155c2db97 Author: maxv Date: Wed Aug 1 16:59:09 2018 +0000 Unreference IPF/PF from the x86 config files (amd64, i386, xen), and enable NPF instead when wanted. commit b70e28be952b138b09f9111de2f3fd10e3a48f09 Author: maxv Date: Wed Aug 1 13:41:26 2018 +0000 Note the removal of non-PAE-32bit-PV. commit 89d1ded46aed4b514cf8c8d829f4bc6b9676d984 Author: maxv Date: Wed Aug 1 13:35:01 2018 +0000 Xen is PAE, so remove ifdefs. commit 75b66c7e1f7800e3a5d58995091263271f89b06e Author: maxv Date: Wed Aug 1 13:30:13 2018 +0000 Add a bold note to say our PF is obsolete. commit 080500ed69c46eba5deb8ad1c13d4469de84433f Author: maxv Date: Tue Jul 31 19:43:24 2018 +0000 Add new item, about deprecating IPF/PF; I thought it was there, but realized it wasn't. commit 2bb8d930f7daae51fd4cd245b38e128b81846f03 Author: maxv Date: Sun Jul 29 08:02:24 2018 +0000 Reduce the confusion, rename a bunch of variables and reorg a little. Tested on i386PAE-domU and amd64-dom0. commit 36419d34d18a1c0d15e3cd87fbb996c0de6b34ee Author: maxv Date: Fri Jul 27 10:04:22 2018 +0000 Try to reduce the confusion, rename: l2_4_count -> PDIRSZ count -> nL2 bootstrap_tables -> our_tables init_tables -> xen_tables No functional change. commit c3f7637ac81b7f442574c5585e5658851e019661 Author: maxv Date: Fri Jul 27 09:37:31 2018 +0000 Reduce the size of the blocks. No functional change. commit 731f8cb0a68b0795cfae9d5a06a1c773a8700949 Author: maxv Date: Fri Jul 27 09:22:40 2018 +0000 style, localify global variables, etc, no real functional change commit b33c014e768acd59867eaae28ebd05d03f58df89 Author: maxv Date: Fri Jul 27 07:35:09 2018 +0000 Remove KERN_BASE, unused. It has always been wrong anyway, the value should have been passed into VA_SIGN_NEG(). commit 4bbadfa029b9f56bdb11065dfaa16a985bf15962 Author: maxv Date: Fri Jul 27 07:32:59 2018 +0000 Replace KERN_BASE by VM_MIN_KERNEL_ADDRESS. Also add XXX on INKERNEL. 
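The KERN_BASE remark above (b33c014e) is about sign-extending a kernel virtual address. The sketch below shows why that matters when an L4 slot number is turned into an address. It rests on assumptions: `slot_to_va()`, `L4_SLOT_SHIFT` and `VA_SIGN_BITS` are hypothetical names, each L4 slot is taken to map 2^39 bytes (4-level x86-64 paging), and VA_SIGN_NEG() is assumed to amount to setting the upper 16 bits.

```c
#include <stdint.h>

#define L4_SLOT_SHIFT	39			/* one L4 slot maps 2^39 bytes */
#define VA_SIGN_BITS	0xFFFF000000000000ULL	/* bits 63..48 */

/* First virtual address mapped by the given L4 slot (0..511). */
static uint64_t
slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SLOT_SHIFT;

	/* Upper-half addresses must have bits 63..48 equal to bit 47. */
	if (va & (1ULL << 47))
		va |= VA_SIGN_BITS;
	return va;
}
```

Under these assumptions slot 256 yields 0xFFFF800000000000, the first address after the canonical hole, while slot 255 stays in the lower half.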
commit 1e314aa6095da9aa2d72b954d00b6a77a62c31ed Author: maxv Date: Thu Jul 26 17:20:08 2018 +0000 Remove the non-PAE-i386 code of Xen. The branches are reordered so that __x86_64__ comes first, eg:
    #if defined(PAE)
        /* i386+PAE */
    #elif defined(__x86_64__)
        /* amd64 */
    #else
        /* i386 */
    #endif
becomes
    #ifdef __x86_64__
        /* amd64 */
    #else
        /* i386+PAE */
    #endif
Tested on i386pae-domU and amd64-dom0. commit 4dd80a8e8c3a15ebe29b8e929b535a65404d5221 Author: maxv Date: Thu Jul 26 16:22:49 2018 +0000 Retire the non-PAE-i386-PV configuration files. Keep only PAE-i386-PV. Non-PAE has been dropped years ago by Xen. The content of XEN3_* is merged into XEN3PAE_*, with "options PAE" set. commit 2a1843524cd5c6a036f44fedafe84411ce07b159 Author: maxv Date: Thu Jul 26 15:46:09 2018 +0000 Retire XENDEBUG_LOW, and switch its only user to XENDEBUG. commit 31f6b38c8ddda96d9d9d873b703807f47246f1d5 Author: maxv Date: Thu Jul 26 15:38:26 2018 +0000 Merge the content of xen_debug.c into xen_machdep.c, there is only one function. commit 84618c8b286d6a45a8a1110240831963d69c1fb3 Author: maxv Date: Thu Jul 26 15:26:10 2018 +0000 Remove dead code. This looks like a leftover from when our Xen port was being developed (2004), and it seems to have been copied from the Xen kernel examples. It can't have any use, so get rid of it. Also remove vprintk, unused. commit 8c45a4a1d92561b20368dd139a689bb1a07667a9 Author: maxv Date: Thu Jul 26 15:06:14 2018 +0000 Remove dead code. commit da6bf54aeed8200ee8893e4cfc2a64320b510a39 Author: maxv Date: Thu Jul 26 09:29:08 2018 +0000 Rework dbregs, to switch the registers during context switches, and not on each user->kernel transition via userret. Reloads of DR6/DR7 are expensive on both native and xen. commit fe6095293b8e00bae9fa25247a3ab941bfa3af3f Author: maxv Date: Thu Jul 26 08:22:19 2018 +0000 Remove useless/outdated comments. No functional change.
commit 99175a7e372690ba66f062c5e56333c20cd461fc Author: maxv Date: Thu Jul 26 08:18:25 2018 +0000 Merge the blocks. No functional change. commit 92715e46f5d8223bef74b02d9c8c523ff4a9669d Author: maxv Date: Thu Jul 26 08:08:24 2018 +0000 Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for amd64, so use ifdef __x86_64__. No functional change. commit 78ffe51201176b29ca9e486d5a9996b201b527d4 Author: maxv Date: Wed Jul 25 11:47:07 2018 +0000 Remove NPTECL, unused. commit 3d21e553ce7775d79035330de1e5d852f9e998dc Author: maxv Date: Tue Jul 24 10:05:36 2018 +0000 Add a "support" section. commit 56acbd0deec6ca2a6ac8e3854698756b68aae7e4 Author: maxv Date: Tue Jul 24 09:50:37 2018 +0000 Use errx, there is no errno. commit 690ca6f10e73de642db2f720be69de0321c5fdd5 Author: maxv Date: Tue Jul 24 09:47:35 2018 +0000 Merge the tprof_pmi and tprof_amdpmi modules into a single tprof_x86 module. commit be7e0e3acc07b90981acb5d057ada6fd3e518a71 Author: maxv Date: Sun Jul 22 15:02:51 2018 +0000 Clean up dbregs; remove useless comments, remove arguments from prototypes, style, add KASSERT and move x86_dbregspl into dbregs.c. No real functional change. commit e75fdcd7b9d95ba6584141502fae4e66948a9772 Author: maxv Date: Sat Jul 21 21:26:30 2018 +0000 I realized the changes I made broke the !aslr conf, so enable aslr by default now rather than later (and rather than adding more ifdefs). Now the location of the direct map is randomized at boot time in GENERIC. commit 9c0fa240d4cded3ef5e71bf5627ffdfb8c4d99fb Author: maxv Date: Sat Jul 21 16:21:27 2018 +0000 Forgot to commit a change in i386/cpufunc.S; add rdtsc(), so that it can be used in cpu_rng. Restore the cpu_rng code back to how it was in my initial commit. commit 1a242af56750dffe658e04120fd62b8f1d7b86cc Author: maxv Date: Sat Jul 21 07:46:56 2018 +0000 Create /dev/ksyms as "440 $g_kmem". This prevents unprivileged users from reading the kernel symbols. 
Discussed in January 2018 on tech-kern@, reported by maya@, tested by tih@. commit fbdad2bb9ab1d775a451118227cb1737869e96db Author: maxv Date: Sat Jul 21 06:30:27 2018 +0000 Remove "no options GPROF", we don't have GPROF in the x86 kernels anymore. By the way this caused a warning because GPROF is not defflag'ed correctly... commit f90d05455ff14939d3a09a245aec33f6476db4e8 Author: maxv Date: Sat Jul 21 06:28:02 2018 +0000 note removal of tpfmt commit 6537f2db43deb6f5393b229eb947a3a7f2a78d47 Author: maxv Date: Sat Jul 21 06:25:29 2018 +0000 Remove the tprof_amdpmi.4 and tprof_pmi.4 man pages. commit fcfcb2fc86f3b739efb7336f079fee803b40ec28 Author: maxv Date: Sat Jul 21 06:09:13 2018 +0000 More ASLR. Randomize the location of the direct map at boot time on amd64. This doesn't need "options KASLR" and works on GENERIC. Will soon be enabled by default. The location of the areas is abstracted in a slotspace structure. Ideally we should always use this structure when touching the L4 slots, instead of the current cocktail of global variables and constants. machdep initializes the structure with the default values, and we then randomize its dmap entry. Ideally machdep should randomize everything at once, but in the case of the direct map its size is determined a little later in the boot procedure, so we're forced to randomize its location later too. commit fe3f159b0511b2e2f7a50872c2ed5a07dec16b53 Author: maxv Date: Mon Jul 16 06:18:31 2018 +0000 Move arch/x86/x86/tprof_pmi.c arch/x86/x86/tprof_amdpmi.c into dev/tprof/tprof_x86_intel.c dev/tprof/tprof_x86_amd.c commit 357b53d09fac8c41b457971e32aa2699e0480743 Author: maxv Date: Sun Jul 15 08:47:43 2018 +0000 Hum. Move the __HAVE_DIRECT_MAP block a little below, otherwise dynamically loaded kernel modules use a wrong offset for some ci_* fields. Found when modloading tprof_amd on an AMD 10h, the read of ci_signature was at a wrong address, and the cpu family was not detected correctly. 
commit f052c0fd33c618ca935eaf24cee31fb66fd4d4f0 Author: maxv Date: Sun Jul 15 06:14:21 2018 +0000 Remove unused x86/include/tprof.h, there should be no need for this kind of include. commit 73993c39a46299b5caf056b8a85f33cfe24514c2 Author: maxv Date: Sun Jul 15 05:25:20 2018 +0000 Note improved tprof and removal of ipkdb. commit acf442c8c69460b946691b80b09424e4a5570000 Author: maxv Date: Sun Jul 15 05:16:40 2018 +0000 Retire ipkdb entirely. The option was removed from the config files yesterday. ok kamil christos commit 9f996fbe41605809208d7390814571490521e7ac Author: maxv Date: Sat Jul 14 15:09:40 2018 +0000 Remove "options IPKDB", and the other associated options, from the config files. ipkdb is being retired. Its code is really old, and hasn't kept pace with today's expectations: IPv6, SMP, modern NICs. The associated code for x86 was already removed because it was too incorrect to stay. There are plans to rewrite a similar feature from scratch. ok kamil christos commit 6b245a4bbba498cab8df2b0520bf6e6ddf9ba8c3 Author: maxv Date: Sat Jul 14 14:56:02 2018 +0000 Remove "options DEBUG_BY_TOOLS", it doesn't exist. commit ac8b4292fbadf12e2ceee67763e289ab4b58f2bd Author: maxv Date: Sat Jul 14 14:46:41 2018 +0000 Add splhigh() around the FPU code, we don't want to be preempted in the middle, this could corrupt the FPU state and trigger undefined behavior. Intentionally use splhigh and not kpreempt_disable, to match the generic x86 FPU code. Compile-tested only (I don't have VIA). Found by Maya almost a year ago. commit 4451b86e6e361fa7678bae75f197adfff86a44fb Author: maxv Date: Sat Jul 14 14:34:32 2018 +0000 Remove ifdef GPROF. commit dde30e2bc15631b86499af9146dfd1c1749f0bfb Author: maxv Date: Sat Jul 14 14:29:40 2018 +0000 Drop NENTRY() from the x86 kernels, use ENTRY(). With PMCs (and other hardware tracing facilities) we have much better ways of monitoring the CPU activity than GPROF, without software modification.
Also I think GPROF has never worked, because the 'start' functions of both i386 and amd64 use ENTRY(), and it would have caused a function call while the kernel was not yet relocated. commit 4f909702074aa730335689072c0c730619ee7074 Author: maxv Date: Sat Jul 14 07:54:37 2018 +0000 specialreg.h is x86-specific, don't include it commit e5fdd7efcc0c71f7bc24669bd8767f2e0c63834b Author: maxv Date: Sat Jul 14 07:54:04 2018 +0000 Finish the Skylake/Kabylake table, and improve the output of "tprof analyze". commit 6eb509759ecdb8e48309d7a1e71ca0ff3e533d79 Author: maxv Date: Fri Jul 13 12:04:50 2018 +0000 Ask for a file path with the "analyze" command, instead of reading stdin. commit f0e63fcc4a13e00580569c99e790f5ee2d561c55 Author: maxv Date: Fri Jul 13 11:14:14 2018 +0000 Remove tpfmt(1). Its code was merged into tprof(8). commit f0d56025d7ff91bd7abb76a6d421ac5692265ee1 Author: maxv Date: Fri Jul 13 11:03:36 2018 +0000 Merge tpfmt(1) into tprof(8). We want to have access to everything with only one tool. The code is copied mostly as-is, and the functionality is available via the "analyze" command. Eg: tprof monitor -e llc-misses:k -o myfile.out sleep 20 tprof analyze < myfile.out Will move soon, I don't like the reading via stdin. commit 64ae52880143aa4628825a2c2b515afcd9767a89 Author: maxv Date: Fri Jul 13 09:58:49 2018 +0000 Remove KAUTH_MACHDEP_X86PMC, now unused. commit a5dbb87c6fe3ad9f410c0d18e567c924c3974ea0 Author: maxv Date: Fri Jul 13 09:53:42 2018 +0000 Skylake/Kabylake are family 6, so add a check for that. While here improve the layout of "tprof list". commit 0bbf0707acb9b42adf95e7c621b0b526a9dbe621 Author: maxv Date: Fri Jul 13 09:37:32 2018 +0000 Remove the X86PMC code I had written, replaced by tprof. Many defines become unused in specialreg.h, so remove them. We don't want to add defines all the time, there are countless PMCs on many generations, and it's better to just inline the event/unit values. 
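Note on the "Skylake/Kabylake are family 6" check above: a minimal sketch of how a family and model can be decoded from the CPUID leaf 1 EAX signature. This is not the tprof code; the helper names `cpuid_family` and `cpuid_model` are hypothetical, and only the standard CPUID bit layout is assumed.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from tprof: decode the CPUID leaf 1
 * EAX signature using the standard x86 bit layout.
 */
static unsigned
cpuid_family(uint32_t sig)
{
	unsigned family = (sig >> 8) & 0xf;

	/* The extended family is only added when the base family is 0xF. */
	if (family == 0xf)
		family += (sig >> 20) & 0xff;
	return family;
}

static unsigned
cpuid_model(uint32_t sig)
{
	unsigned family = (sig >> 8) & 0xf;
	unsigned model = (sig >> 4) & 0xf;

	/* The extended model applies to base families 6 and 0xF. */
	if (family == 0x6 || family == 0xf)
		model |= ((sig >> 16) & 0xf) << 4;
	return model;
}
```

With these helpers, a family-6 check reduces to `cpuid_family(sig) == 6`, and a per-model table lookup can then key on `cpuid_model(sig)`.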
commit 527e87936bd15c89140948e8fb9794cc4b389209 Author: maxv Date: Fri Jul 13 09:15:55 2018 +0000 Remove the usr.bin/pmc tool. People should use tprof instead. commit ea978a909c6e98594666d660c1a849bc6a5f2c61 Author: maxv Date: Fri Jul 13 09:04:31 2018 +0000 Change the arguments of the tprof tool, to match the behavior of pmc(1) and cpuctl(8). They become: tprof list tprof monitor -e name:option [-o outfile] command commit 6d66b56ae8fb1e4c091234b9e2517635de9cdc03 Author: maxv Date: Fri Jul 13 08:09:21 2018 +0000 Inline the values in amd_f10h_names[], we're not going to use defines for each CPU model found in the wild. commit 9c6b137cb77aa5fb99ce6d51446f985ff643fc61 Author: maxv Date: Fri Jul 13 07:56:29 2018 +0000 Revamp tprof. Rewrite the Intel backend to use the generic PMC interface, which is available on all Intel CPUs. Synchronize the AMD backend with the new interface. The kernel identifies the PMC interface, and gives its id to userland. Userland then queries the events itself (via cpuid etc). These events depend on the PMC interface. The tprof utility is rewritten to allow the user to choose which event to count (which was not possible until now, the event was hardcoded in the backend). The command line format is based on usr.bin/pmc, eg: tprof -e llc-misses:k -o output sleep 20 The man page is updated too, but the arguments will likely change soon anyway so it doesn't matter a lot. The tprof utility has three tables: Intel Architectural Version 1 Intel Skylake/Kabylake AMD Family 10h A CPU can support a combination of tables. For example Kabylake has Intel-Architectural-Version-1 and its own Intel-Kabylake table. For now the Intel Skylake/Kabylake table contains only one event, just to demonstrate that the combination of tables works. Tested on an Intel Core i5 Kabylake. The code for AMD Family 10h is taken from the code I had written for usr.bin/pmc. I haven't tested it yet, but it's the same as pmc(1), so I guess it works as-is. 
The whole thing is written in such a way that (I think) it is not complicated to add more CPU models, and more architectures (other than x86). commit 14a4fb4eb1f003d8176d9b4971de0988dc5ceeae Author: maxv Date: Thu Jul 12 19:48:16 2018 +0000 Handle NMIs correctly when SVS is enabled. We store the kernel's CR3 at the top of the NMI stack, and we unconditionally switch to it, because we don't know with which page tables we received the NMI. Hotpatch the whole thing as usual. This restores the ability to use PMCs on Intel CPUs. commit ff28d416e27cd7ffdfec51d3ff28502900280502 Author: maxv Date: Thu Jul 12 18:39:09 2018 +0000 Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed. I was playing around with PMCs, and was wondering why some cache misses were occurring in svs_pdir_switch while I had SVS disabled. commit 4028a7afbf52949cebbb4a72eda66df6bc446e50 Author: maxv Date: Thu Jul 12 10:46:40 2018 +0000 Remove the kernel PMC code. Sent yesterday on tech-kern@. This change: * Removes "options PERFCTRS", the associated includes, and the associated ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is good. * Removes the PMC code of ARM XSCALE. * Removes all the pmc.h files. They were all empty, except for ARM XSCALE. * Reorders the x86 PMC code not to rely on the legacy pmc.h file. The definitions are put in sysarch.h. * Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control and sys_pmc_get_info syscalls. They are marked as OBSOL in kern, netbsd32 and rump. * Removes the pmc_evid_t and pmc_ctr_t types. * Removes all the associated man pages. The sets are marked as obsolete. commit dc1c83140cb32a9f1996e2c5071d1107e0f62ce9 Author: maxv Date: Thu Jul 12 07:06:35 2018 +0000 ...and obsolete the html of pmc.3 too... We will obsolete all the pmc* references anyway. commit 5ba15cc13dd9366c874be4c2ebd7379014952bcb Author: maxv Date: Thu Jul 12 07:04:15 2018 +0000 Obsolete pmc.3, it was part of libpmc. 
commit 769d697a436d91dd19d607c2e21e9fe85d1294e6 Author: maxv Date: Thu Jul 12 06:52:48 2018 +0000 Retire libpmc. It uses the legacy PMC interface in the kernel, which has support for only one ARM CPU. It used to have x86 support, but it was broken and I removed it. The legacy PMC interface will be removed from the kernel too. Sent on tech-kern@ yesterday, thorpej was fine. commit 763154da1fc06952648de45c68c2198197b22de7 Author: maxv Date: Wed Jul 11 06:25:05 2018 +0000 Add KASSERTs in in_undefer_cksum_tcpudp. commit cad93c0968abe137c0f64c780e27a47c275f025e Author: maxv Date: Wed Jul 11 06:00:34 2018 +0000 Style, rename 'iph' -> 'ip', and reduce the diff between in_undefer_cksum_tcpudp and the last part of in_undefer_cksum. commit ebffe4f9c2dd84d9d50954df7c0ce1aa35eea36d Author: maxv Date: Wed Jul 11 05:38:55 2018 +0000 Remove the callback, localify, and add a comment. commit 6f38db4ae3975b57486b6766b44285f999536b5e Author: maxv Date: Wed Jul 11 05:25:45 2018 +0000 Rename ip_undefer_csum -> in_undefer_cksum, and in_delayed_cksum -> in_undefer_cksum_tcpudp. The two previous names were inconsistent and misleading. Put the two functions into in_offload.c. Add comments to explain what we're doing. The same could be done for IPv6. commit bb6ca18e614289a5e760f1b2ce57d65a62b09690 Author: maxv Date: Tue Jul 10 16:49:09 2018 +0000 Modify the logic in npf_reassembly. Don't call nbuf_reset, we don't need it since we don't read the IPv4 header anymore. If ip{6}_reass_packet fails, always free 'm', and always clear the nbuf. We want to avoid the case where: (1) 'm' was reallocated, (2) the nbuf pointer was not updated accordingly, (3) the caller tried to use the nbuf pointer. This case doesn't happen right now, but the code is fragile, so strengthen it. commit 7e7cfcb1dc8a73095bc5ba6e091a8fb5eab11694 Author: maxv Date: Tue Jul 10 15:46:58 2018 +0000 Remove the second argument from ip_reass_packet(). We want the IP header on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change. commit 65e622d974671528c7614ea9a962daf0fec3c43d Author: maxv Date: Tue Jul 10 15:25:01 2018 +0000 Simplify the pointer handling. Set *mp = NULL at the beginning of the function. In npf_reassembly, pass a simple boolean instead of a ** mbuf pointer. Add a KASSERT for IPv4, we don't want (error && !m). Remove the 'fastout' label, use 'out'. commit 1d5ab98361bee5aae1d4f168acbe416b4dc24701 Author: maxv Date: Tue Jul 10 14:04:07 2018 +0000 Update the pointer when fast-kicking, because it may have been freed. Before my changes the nonsensical pointer initialization held, but when I started introducing sanity checks the whole thing collapsed. Need pullup-8. commit 40b76870d397db3a0d9af20c934eedc544ce2938 Author: maxv Date: Tue Jul 10 12:31:46 2018 +0000 Set con = NULL just once, instead of doing it in each branch. commit 538c2fcbc6d1f16ad460665508abf432f3f38061 Author: maxv Date: Tue Jul 10 06:44:49 2018 +0000 Fix bug, SPINOUT() is not supposed to take the value given to BACKOFF(). Here the exponential backoff is wrecked. commit 206328c736a19f3e1424c66a2cda3e0f938d028a Author: maxv Date: Mon Jul 9 18:52:04 2018 +0000 Don't push/pop %rdx, we don't care about preserving its value. commit acf1c69f6df46f25de13b7e1012864b7666a95c5 Author: maxv Date: Mon Jul 9 18:43:05 2018 +0000 Small optimization: don't execute the Meltdown/SpectreV2 cswitch code if we're leaving a softint. We were executing the softint with the LWP's context, so no need to switch the SVS/IBRS contexts, we already are in the desired contexts. commit 37c947dc95f961675f7deec3aae7556d2890f2f4 Author: maxv Date: Sun Jul 1 08:32:41 2018 +0000 Use a variable-sized memcpy, instead of copying the PCB and then adding the extra bytes. The PCB embeds the biggest static FPU state, but our real FPU state may be smaller (FNSAVE), so we don't need to memcpy the extra unused bytes. commit d8acc6afcedc67a0dd812cfd51cb8b820b431987 Author: maxv Date: Sun Jul 1 07:59:30 2018 +0000 Optimize FNSAVE.
The size of its save area is 108 bytes, so don't set x86_fpu_save_size = 512, because otherwise we uselessly memset extra bytes at execve time. While here use sizeof instead of hardcoded values. commit 6f2b94b7d5add64e0617301d30402e4e66f88a41 Author: maxv Date: Sun Jul 1 07:18:56 2018 +0000 Use a switch, we can (and will) optimize each case separately. No functional change. commit 9a42ee4f375559acadb8bfe7f17bcdc633e3d90f Author: maxv Date: Fri Jun 29 19:34:35 2018 +0000 Add more KASSERTs. Should help PR/53399. commit 4e26b978c293dab62748cd62ff71e02924f60b05 Author: maxv Date: Fri Jun 29 19:21:43 2018 +0000 Call fpu_eagerswitch a little later, after we make sure newlwp is not pinned. Because if it is, the fpu state of the lwp we are context-switching to is already installed on the current cpu, so no point re-installing it. Or, it isn't, and in this case we don't want to install it. This wrong re-installation can occur when we leave a softint. It may fix bugs in places that call fpusave_lwp with spl != IPL_HIGH, and that expect the fpu state to stay in memory. As far as I can tell only cpu_lwp_free meets these conditions, and as far as I can tell again, there it's harmless. Should help PR/53399. commit 01cbe9ad9d6d961391e1a8df2666b1575086e06a Author: maxv Date: Sun Jun 24 18:24:53 2018 +0000 Sync the ld scripts: * Force a PAGE_SIZE alignment of .bss on i386. Normally that's not required since the bootloader ensures page alignment, but let's be safe. Same on Xen-i386. * Fill the .text section padding with int3 instructions on Xen kernels, to prevent FALLTHROUGHs if a pointer goes crazy, same as native. commit 135e7cb1870430aea5e37ef1a1123472e60d60ad Author: maxv Date: Sat Jun 23 10:06:02 2018 +0000 Add XXX in fpuinit_mxcsr_mask. commit 862083c1391794a457930dbaf574041afb114df1 Author: maxv Date: Sat Jun 23 10:02:39 2018 +0000 Reorder the code a little. On Xen, return earlier, we don't need to do the XSAVE-related initialization if we don't support XSAVE. 
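Note on the FNSAVE and variable-sized memcpy commits above: a minimal sketch of the idea, assuming hypothetical stand-in types. The structures and function names below are illustrative and are not the kernel's; only the 108-byte (FNSAVE) and 512-byte (FXSAVE) sizes come from the commit messages and the x86 architecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the save areas; not the kernel's types. */
struct fsave_area  { uint8_t bytes[108]; };	/* FNSAVE area */
struct fxsave_area { uint8_t bytes[512]; };	/* FXSAVE area */

/* The container is sized for the biggest static state. */
union fpu_area {
	struct fsave_area  fsave;
	struct fxsave_area fxsave;
};

enum fpu_save_kind { FPU_SAVE_FSAVE, FPU_SAVE_FXSAVE };

/* Use a switch and sizeof, rather than hardcoded values. */
static size_t
fpu_save_size(enum fpu_save_kind kind)
{
	switch (kind) {
	case FPU_SAVE_FSAVE:
		return sizeof(struct fsave_area);
	case FPU_SAVE_FXSAVE:
		return sizeof(struct fxsave_area);
	}
	return 0;
}

/* Copy only the bytes the active save method really uses. */
static void
fpu_area_copy(union fpu_area *dst, const union fpu_area *src,
    enum fpu_save_kind kind)
{
	memcpy(dst, src, fpu_save_size(kind));
}
```

The point of the design is that the container stays large enough for any method, while copies and clears touch only the bytes in use.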
commit 470bf39cc8afd244b847ce3f4eaa8634e2f5008f Author: maxv Date: Sat Jun 23 09:51:34 2018 +0000 Revert the rest of jdolecek's changes. This puts us back in a clean, sensical state. commit ffef76c8107111d713af3c415b6a2bd2dfaad74f Author: maxv Date: Sat Jun 23 06:57:24 2018 +0000 constify commit ebd900ca04466ca9e6ef7dc30cfb5a6ae92be464 Author: maxv Date: Sat Jun 23 06:40:43 2018 +0000 constify commit ddd48480fd5bee258bec517bafca23dc6a8a8306 Author: maxv Date: Fri Jun 22 06:22:37 2018 +0000 Revert jdolecek's changes related to FXSAVE. They just didn't make any sense and were trying to hide a real bug, which is, that there is for some reason a wrong stack alignment that causes FXSAVE to fault in fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And as seen several months ago, as well. The rest of the changes in XSAVE are wrong too, but I'll let him fix these ones. commit 82dc3e154e0a4a768436dd3fcf2f2dfd7c2c842c Author: maxv Date: Thu Jun 21 17:03:45 2018 +0000 remove unused arguments commit d342e2a6afad53da842cf808475ae174f84dba72 Author: maxv Date: Thu Jun 21 16:53:10 2018 +0000 Fix use-after-free, m_cat can free m. commit 6c3d209311d4a12f9f3ac45cc4b76dbb9beb76d6 Author: maxv Date: Wed Jun 20 11:57:22 2018 +0000 Use PMAP_DIRECT_UNMAP. commit a05ff54e37ae64c281d2a9a580d5b93116f3a426 Author: maxv Date: Wed Jun 20 11:49:37 2018 +0000 Add and use bootspace.smodule. Initialize it in locore/prekern to better hide the specifics from the "upper" layers. This allows for greater flexibility. commit 53393d9c4c269bdc2e62467f4384f4ce74bb05a4 Author: maxv Date: Wed Jun 20 11:45:25 2018 +0000 Put these arrays in .rodata, they aren't supposed to be executable. commit d6520e46db12fcdeb92389a45f5b3505a6689bc7 Author: maxv Date: Tue Jun 19 09:25:13 2018 +0000 When using EagerFPU, create the fpu state in execve at IPL_HIGH. 
A preemption could occur in the middle, and we don't want that to happen, because the context switch would use the partially-constructed fpu state. The procedure becomes: (1) splhigh, (2) unbusy the current cpu's fpu, (3) create a new fpu state in memory, (4) install the state on the current cpu's fpu, (5) splx. Disabling preemption also ensures that x86_fpu_eager doesn't change in the middle. In LazyFPU mode we drop IPL_HIGH right away. Add more KASSERTs. commit 8a5a96e5863af704d845ca270f124e4a235ef670 Author: maxv Date: Tue Jun 19 07:23:44 2018 +0000 Explicitly clear l2's pcb_fpcpu when forking. A context switch (preemption) could occur between fpusave_lwp(l1, true); and memcpy(pcb2, pcb1, sizeof(struct pcb)); In this case, l1's FPU state is re-installed on the current CPU, and pcb1->pcb_fpcpu becomes non NULL. While it's fine to have l1's state installed, we don't want to indicate l2's state is installed too. With lazy fpu this was not a problem, because the context-switch would not re-install the state, so pcb1->pcb_fpcpu was NULL. Should fix PR/53383. commit d81fd13c913e805c9b027ad050ba0f7ffde8057a Author: maxv Date: Mon Jun 18 20:20:27 2018 +0000 Add more KASSERTs, see if they help PR/53383. commit 411475c4df4c20dafb0fe45d47571ba7c7fbdb58 Author: maxv Date: Mon Jun 18 06:09:56 2018 +0000 todo list for kaslr, with the issues I can think of right now commit 06e738cdcd2898cee4763cf65d60541a9feb75b5 Author: maxv Date: Sun Jun 17 15:46:39 2018 +0000 i586 and below don't have this 3-byte nop, so use three 1-byte nops, reported by Nathanial Sloss commit aa22ac3a7c6b637f5bae09350fa5f09bc56c2d0d Author: maxv Date: Sun Jun 17 07:13:02 2018 +0000 Enable eager fpu automatically at boot time if the cpu is affected. Intel hasn't published a list of its affected products, but it appears that Xen was given this information since they have a specific detection code.
We could just unconditionally enable eager; but on x86_32 eager may have a greater performance cost than lazy, and we don't want to lose performance on unaffected (and ~old) CPUs running NetBSD/i386. So use the same code as Xen: take Family 6, and whitelist certain models. commit a3e83c49ee4087c4ec537aa1b6601b3e45fd6983 Author: maxv Date: Sun Jun 17 06:03:40 2018 +0000 No, I meant to put the panic in fpudna not fputrap. Also appease it: panic only if the fpu already has a state. We're fine with getting a DNA, what we're not fine with is if the DNA is received while the FPU is busy. I believe (even though I couldn't trigger it) that the panic would otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup, probably. commit ee8255833c6b837797764c035e0c173e37ac82eb Author: maxv Date: Sat Jun 16 17:11:13 2018 +0000 Need IPIs when enabling eager fpu switch, to clear each fpu and get us started. Otherwise it is possible that the first context switch on one of the cpus will restore an invalid fpu state in the new lwp, if that lwp had its fpu state stored on another cpu that didn't have time to do an fpu save since eager-fpu was enabled. Use barriers and all the related crap. The point is that we want to ensure that no context switch occurs between [each fpu is cleared] and [x86_fpu_eager is set to 'true']. Also add KASSERTs. commit 4f9560d8a37667785d41231b5a71ef05e17f702b Author: maxv Date: Sat Jun 16 05:52:17 2018 +0000 Actually, don't do anything if we switch to a kernel thread. When the cpu switches back to a user thread the fpu is restored, so no point calling fninit (which doesn't clear all the states anyway). commit f7a877f1e135b32fb18185301a320fb5127e1621 Author: maxv Date: Thu Jun 14 18:00:15 2018 +0000 Install the FPU state on the current CPU in setregs (execve). commit 1197dc69cbc787f412d83a647019e1b6e5093fc7 Author: maxv Date: Thu Jun 14 17:58:22 2018 +0000 Eager FPU on i386. 
commit 53d0c6487f65e5281d664cdba5ada9adfb5d0339 Author: maxv Date: Thu Jun 14 14:48:59 2018 +0000 SpectreV4, backports in NetBSD-8, no XSAVEOPT commit 5ff1198da2a209bc9fa5f7b52b34c7b1387c860a Author: maxv Date: Thu Jun 14 14:36:46 2018 +0000 Add some code to support eager fpu switch, INTEL-SA-00145. We restore the FPU state of the lwp right away during context switches. This guarantees that when the CPU executes in userland, the FPU doesn't contain secrets. Maybe we also need to clear the FPU in setregs(), not sure about this one. Can be enabled/disabled via: machdep.fpu_eager = {0/1} Not yet turned on automatically on affected CPUs (Intel Family 6). More generally it would be good to turn it on automatically when XSAVEOPT is supported, because in this case there is probably a non-negligible performance gain; but we need to fix PR/52966. commit 0a68d2170b052e90dee1cad467e24976e27264d7 Author: maxv Date: Sun Jun 3 10:59:35 2018 +0000 Constify atu_devs[] so that it lands in .rodata (600 bytes). commit 2a429335f99f6b48a747f585af968675597ed34e Author: maxv Date: Sun Jun 3 10:45:16 2018 +0000 Constify ahc_pci_ident_table[] so that it lands in .rodata (1488 bytes). commit 784d89583a458c508c781bd5eaac5c091fd91013 Author: maxv Date: Sun Jun 3 10:37:23 2018 +0000 Constify a bunch of global variables under ipf/ so that they land in .rodata (3472 bytes). Also, remove ipf_tuneables[], unused. commit f9d509a9f594cd56fd5a1a30ee9539b2c80461b7 Author: maxv Date: Sun Jun 3 10:24:24 2018 +0000 Constify several variables in ixgbe/ so that they land in .rodata (1038 bytes). commit a1c6acbe3f4f35056b82d04c146b03ea0b093b3d Author: maxv Date: Sun Jun 3 10:13:54 2018 +0000 Constify lpcib_devices[] so that it lands in .rodata (1584 bytes). commit 62ed1ebd980d731b0f0dcd2534f618a9316ae7ec Author: maxv Date: Sun Jun 3 10:04:40 2018 +0000 Constify ug2_mb[], so that it lands in .rodata.
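Note on the "Constify ... so that it lands in .rodata" commits above: a minimal sketch of the pattern, assuming a hypothetical match table. The structure, table contents and lookup function are made up for illustration and do not correspond to any of the drivers named above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-match entry; not a real NetBSD structure. */
struct example_devid {
	uint16_t vendor;
	uint16_t product;
};

/*
 * Declared const with static storage, so a typical toolchain places the
 * table in .rodata instead of .data. The IDs are made-up values.
 */
static const struct example_devid example_devs[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0xabcd, 0x0010 },
};

/* Lookups take and return pointers to const, so nothing writes the table. */
static const struct example_devid *
example_lookup(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(example_devs) / sizeof(example_devs[0]); i++) {
		if (example_devs[i].vendor == vendor &&
		    example_devs[i].product == product)
			return &example_devs[i];
	}
	return NULL;
}
```

For arrays of pointers, both levels usually need the qualifier (for example `const char * const names[]`), otherwise the pointer array itself remains writable.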
commit 77efda58ae3127fdd232cc0ec705a18ee709bf8f Author: maxv Date: Sun Jun 3 10:01:21 2018 +0000 Constify the microcode variables used by BNX. This moves 38 pages of kernel memory from .data to .rodata. commit dc2b9cdf572a89029129bc605a666c1748a99e5c Author: maxv Date: Sat Jun 2 11:56:57 2018 +0000 Copy more mbuf flags. commit 7528bba2002fbc9abe4535c22f839a2c798b6766 Author: maxv Date: Fri Jun 1 09:34:39 2018 +0000 Fix M_PKTHDR use in if_alc, if_age and if_ena. if_alc and if_age always put in _rxhead a M_PKTHDR-flagged mbuf, so the flag must always be present. Instead of manually adding the flag, add a KASSERT to ensure it is already there. If it weren't, there would be memory corruptions. Same in if_ena, but this one does not compile so we don't really care. Also, use m_remove_pkthdr to remove the flag, instead of doing it manually. This ensures the tags get freed (even though these drivers don't seem to be using mtags). commit cfff0edae0ea5c8fb52f42a433cf7707f5517e9a Author: maxv Date: Fri Jun 1 09:10:52 2018 +0000 Use m_remove_pkthdr() instead of "&= ~M_PKTHDR", to ensure the tags get freed. Several other drivers have this problem it seems... commit ddbb812e2081c75e8c3efca61a9bc796489480a4 Author: maxv Date: Fri Jun 1 08:56:00 2018 +0000 Rename M_CSUM_DATA_IPv6_HL -> M_CSUM_DATA_IPv6_IPHL M_CSUM_DATA_IPv6_HL_SET -> M_CSUM_DATA_IPv6_SET Reduces the diff against IPv4. Also, clarify the definitions. commit 44f42bbc5895cf14d1ba14382345ed4318100855 Author: maxv Date: Thu May 31 15:41:11 2018 +0000 Add XXX for NULL deref. Not sure how to fix it, not sure we care either... commit 4c679d8ac6aba4845bed0c8ee55add7e67e0e101 Author: maxv Date: Thu May 31 15:34:25 2018 +0000 Clarify, remove superfluous things. commit 147c8dfb8f873478b9478dfa6af29909fa9a74d2 Author: maxv Date: Thu May 31 15:06:45 2018 +0000 Adapt rev1.75, suggested by Alexander Bluhm. Relax the checks to allow protocols smaller than two bytes (only IPPROTO_NONE). While here style. 
commit 5f9bceb8f27492e6bfaf43d0ebf06a67336a7bf3 Author: maxv Date: Thu May 31 13:51:56 2018 +0000 Remove the non-IKE part of the computation, too. commit 401cbf39b2df649b6d05a5b71de63084990c03b7 Author: maxv Date: Thu May 31 07:16:16 2018 +0000 Disable draft_00 in racoon, discussed on tech-net@ and now in PR/53334. While here clarify the comments, no #undef. No need to increase the library version I guess, since draft_00 is not used in libipsec. commit 932cd5d30dac228b648ab2c44891798431078eff Author: maxv Date: Thu May 31 07:03:57 2018 +0000 Remove support for non-IKE markers in the kernel. Discussed on tech-net@, and now in PR/53334. Basically non-IKE markers come from a deprecated draft, and our kernel code for them has never worked. Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE. Perhaps we should also add a check in key_handle_natt_info(), to make sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB. commit 935ddab25d7e904e91d183892c28733ac253011d Author: maxv Date: Thu May 31 06:25:41 2018 +0000 Constify ipseczeroes, and remove one use of it. commit 2fce44d99fdda05c64793c8c5af94ac9f52c195a Author: maxv Date: Thu May 31 06:14:18 2018 +0000 Add a comment and a KASSERT. I remember wondering whether this check was a problem, since ARC4 has a blocksize of one. Normally ARC4 can't be used in IPsec. commit afbeb36a7dcadae189f76fd2398e75cb25a1fd68 Author: maxv Date: Thu May 31 05:52:09 2018 +0000 style commit 6746d4709d7f1d786831214ddccc1e73528e6627 Author: maxv Date: Wed May 30 18:02:40 2018 +0000 Correctly handle the padding for IPv6-AH, as specified by RFC4302. Seen in a FreeBSD bug report, by Jason Mader. The RFC specifies that under IPv6 the complete AH header must be 64bit- aligned, and under IPv4 32bit-aligned. That's a rule we've never respected. The other BSDs and MacOS never have either. So respect it now. This makes it possible to set up IPv6-AH between Linux and NetBSD, and also probably between Windows and NetBSD. 
Until now all the tests I made were between two *BSD hosts, and everything worked "correctly" since both hosts were speaking the same non-standard AHv6, so they could understand each other. Tested with Fedora<->NetBSD, hmac-sha2-384. commit ee9284b3760b1356dfe874b7ea49091ee6c7edc0 Author: maxv Date: Wed May 30 17:17:11 2018 +0000 Introduce ah_authsiz, which computes the length of the ICV only. Use it in esp_hdrsiz, and clarify. Until now we were using ah_hdrsiz, and were relying on the fact that the size of the AH header happens to be equal to that of the ESP trailer. Now the size of the ESP trailer is added manually. This also fixes one branch in esp_hdrsiz: we always append an ESP trailer, so it must always be taken into account, and not just when an ICV is here. commit df8e34544a058a33a6894cf8caac0ffbba7d0b4f Author: maxv Date: Wed May 30 16:49:38 2018 +0000 Apply the previous change in esp_input too, same as esp_output. commit ccd3d9fdeea1046e06d3afbe387809f25c87caca Author: maxv Date: Wed May 30 16:43:29 2018 +0000 Remove dead code, 'espx' is never NULL and dereferenced earlier, so no need to NULL-check all the time. commit 44266fb307a249d9b70a9b044588797171391ab2 Author: maxv Date: Wed May 30 16:32:26 2018 +0000 Simplify the padding computation. Until now 'padlen' contained the ESP Trailer (two bytes), and we were doing minus two all the time. Declare 'tlen', which contains padlen+ESP_Trailer+ICV, and use 'struct esptail' instead of hardcoding the construction of the trailer. 'padlen' now indicates only the length of the padding, so no need to do -2. commit 0ab4c0f9d9a19bb789cadbe65aec8dd3021caebb Author: maxv Date: Wed May 30 16:15:19 2018 +0000 Rename padding -> padlen, pad -> tail, and clarify. commit ff5910961cf679416bbcbb10620f9d4099838e46 Author: maxv Date: Tue May 29 17:21:57 2018 +0000 Fix an XXX of mine, be clearer about what we're doing. Basically we want to preserve the fragment offset and flags. 
That's necessary if the packet we're fragmenting is itself a fragment. commit 00b1e746c972718de30dbf5baf0531b22857598f Author: maxv Date: Tue May 29 16:50:38 2018 +0000 Strengthen and simplify, once more. commit 2a2c8286126c40cf427ef198090bead2071e5d26 Author: maxv Date: Tue May 29 16:29:47 2018 +0000 Remove aarp_clean, unused. By the way this function was probably buggy since it didn't reset aat_hold to NULL. commit c7889787ad09c9e90100cd17b20a2e284179af9f Author: maxv Date: Tue May 29 16:24:34 2018 +0000 Remove an XXX of mine, actually it's fine. While here also remove a misleading printf. commit 3aa464728e06a4d9985a7695eae20860e6c597ec Author: maxv Date: Tue May 29 16:21:30 2018 +0000 Remove dead code, we don't care. commit ae37ae8f9dd7c62b55f78b29cc504ec5371a6c58 Author: maxv Date: Tue May 29 08:24:59 2018 +0000 Replace KASSERT by m_pullup. While the ethernet header is always there when the packet was received on a physical interface, it may not be if the packet was received over L2TP/EtherIP. In particular, if the inner ethernet header ends up on two separate IP fragments. Here the KASSERT is triggered, and on !DIAGNOSTIC we corrupt memory. Note that this is a widespread problem: a lot of L2 code was written with the assumption that "most" headers are present in the first mbuf. Obviously, that's not true if L2 encapsulation is being used. commit fbfe704631ec978d7e5cf32dae65f5eaadf9478c Author: maxv Date: Mon May 28 20:45:38 2018 +0000 drop __P, suggested by sevan commit e7337763036f32235ef05c50ed93b92f024896ff Author: maxv Date: Mon May 28 20:34:45 2018 +0000 drop __P, suggested by sevan commit e6933e10ce02ded28e8e1fd2400400c4875758ca Author: maxv Date: Mon May 28 20:18:58 2018 +0000 Mmh, don't automatically set enabled=1 for SpectreV4, the actual mitigation is not yet applied by default. Just so people can test. 
commit d1ac245e3f05ba13e9f4af2fc31904bb4da0b4f1 Author: maxv Date: Mon May 28 19:52:18 2018 +0000 fix -Wold-style-definition commit 051a817fb9fa05674c4331054aa61d183474a67d Author: maxv Date: Mon May 28 19:39:21 2018 +0000 Remove ipsec_bindump, there is no prototype, so the function can't be used. commit 8d1fcb2be75c0a16bae98a9c54fffe3260ac5d0d Author: maxv Date: Mon May 28 19:36:42 2018 +0000 fix -Wdiscarded-qualifiers commit 126b9624130b056835a0783bac0156d6814f13ce Author: maxv Date: Mon May 28 19:22:40 2018 +0000 fix -Wunused and -Wold-style-definition commit 3d5cefb93441c6cd8eef89f30a882fd229f0998e Author: maxv Date: Fri May 25 16:01:31 2018 +0000 Hide a bunch of local symbols. commit 2c04c3df3a059a892029927cb84a63a7b51527d6 Author: maxv Date: Fri May 25 15:52:11 2018 +0000 Rename the entry points of the prekern, rename the array and move it into .rodata. commit 769deb93b7491eaca8855684e94a987366d8eedf Author: maxv Date: Fri May 25 15:33:56 2018 +0000 When the previous context is in kernel mode we are not guaranteed to have a 16-byte-aligned stack pointer, so align it. That's what the CPU would do on exception entry. commit c8b6becb4da6f42950001a4a3d3fac59c47581c5 Author: maxv Date: Wed May 23 18:40:29 2018 +0000 Add XXX. commit f4838dcc2537696858806583fc07589fa0439f48 Author: maxv Date: Wed May 23 10:21:43 2018 +0000 Add a comment about recent AMD CPUs. commit c18dcd7ef146d5a59416b104bff89ad213e0bf83 Author: maxv Date: Wed May 23 10:00:27 2018 +0000 Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87 state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the state there too. commit 5f2a9c7cf62c2714842b78a7155b005dc9f054c0 Author: maxv Date: Wed May 23 07:45:35 2018 +0000 Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that are used only in fpu.c.
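Note on the 16-byte stack alignment commit above: a minimal sketch of the arithmetic, written in C for illustration only. The real change is presumably done in assembly; the macro and function name here are hypothetical.

```c
#include <stdint.h>

#define STACK_ALIGN	16	/* alignment, must be a power of two */

/*
 * Hypothetical helper: round a stack pointer value down to the alignment.
 * The x86 stack grows downward, so rounding down never moves the pointer
 * back into memory that is already in use.
 */
static uintptr_t
stack_align_down(uintptr_t sp)
{
	return sp & ~(uintptr_t)(STACK_ALIGN - 1);
}
```

An already aligned value is returned unchanged, so the operation is safe to apply unconditionally.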
commit 1f285e9a6a7236ddb8d8d33bc4ef0b9e70a5f74b Author: maxv Date: Wed May 23 07:34:40 2018 +0000 style commit 65082512426530b2521752510e552943dfe89a26 Author: maxv Date: Wed May 23 07:24:37 2018 +0000 Clean up the FPU headers. commit c3e601d3193f6cfb593376fca23e06c43b06f142 Author: maxv Date: Tue May 22 17:14:46 2018 +0000 Extend the AMD NONARCH method to family 17h. The AMD spec states that for 17h care must be taken when handling sibling threads. The concern is that if we have a protected two-thread process running on two siblings, and context switch one thread to another unprotected thread, disabling the SSB protection on one logical core will disable SSB on its sibling too (which is still running the protected thread). All of that doesn't matter to us, because the SSB value we set is system-wide, not per-process. commit 1d5b2fcf01a02ed7a60317e099a3f6b79bfeadee Author: maxv Date: Tue May 22 16:44:42 2018 +0000 Simplify the sysctl handlers. commit 54441675382a08c948d1d87abcc83abf3297e85e Author: maxv Date: Tue May 22 16:36:19 2018 +0000 Forgot switch cases for AMD. commit 6c8bcfca485a9039c23fff5daa96ee8e4d4174e8 Author: maxv Date: Tue May 22 11:09:57 2018 +0000 Mmh, don't compile spectre.c on Xen. commit 0d6a02a5a63c6bd6c9bb7b4e4e088029bebf8d12 Author: maxv Date: Tue May 22 10:20:04 2018 +0000 Implement a mitigation for SpectreV4 on AMD families 15h and 16h. We use a non-architectural MSR. This MSR is also available on 17h, but there SMT is involved, and it needs more investigation. Not tested (I have only 10h). commit 71b745d1747394115ce2ea1960ef7a5e0b1e0f27 Author: maxv Date: Tue May 22 09:25:58 2018 +0000 Several changes: - Move the sysctl initialization code into spectre.c. This way each variable is local. Rename the variables, use shorter names. - Use mitigation methods for SpectreV4, like SpectreV2. There are several available on AMD (that we don't support yet). Add a "method" leaf. - Make SSB_NO a mitigation method by itself. 
This way we report as "mitigated" a CPU that is not affected by SpectreV4. In this case, of course, the user can't enable/disable the mitigation. Drop the "affected" sysctl leaf. commit b3f36d0df5ead41f598b6c6559a49bdb13ead43a Author: maxv Date: Tue May 22 08:15:26 2018 +0000 Clarify the parameters for the SpectreV2 mitigation. Add: machdep.spectre_v2.swmitigated Rename: machdep.spectre_v2.mitigated -> machdep.spectre_v2.hwmitigated Change the method string, to combine both the hardware and software mitigations. swmitigated is set at compile time, hwmitigated can be set by the user. Examples: spectre_v2.swmitigated = 1 spectre_v2.hwmitigated = 0 spectre_v2.method = [GCC retpoline] spectre_v2.swmitigated = 0 spectre_v2.hwmitigated = 0 spectre_v2.method = (none) spectre_v2.swmitigated = 1 spectre_v2.hwmitigated = 1 spectre_v2.method = [GCC retpoline] + [Intel IBRS] commit b6d61eb7e65505bff29bc7c659e30e141a8dbba5 Author: maxv Date: Tue May 22 07:24:08 2018 +0000 Add RSBA. When set, it indicates that the CPU is vulnerable to SpectreV2 via the RSB. commit 4640ee633d137ef788c2ea99417fb9e94fde631b Author: maxv Date: Tue May 22 07:11:53 2018 +0000 Mitigation for SpectreV4, based on SSBD. The following sysctl branches are added: machdep.spectre_v4.mitigated = {0/1} user-settable machdep.spectre_v4.affected = {0/1} set by the kernel The mitigation is not enabled by default yet. It is not tested either, because no microcode update has been published yet. On current CPUs a microcode/bios update must be applied for SSBD to be available. The user can then set mitigated=1. Even with an update applied the kernel will set affected=1. On future CPUs, where the problem will presumably be fixed by default, the CPU will report SSB_NO, and the kernel will set affected=0. In this case we also have mitigated=0, but the mitigation is not needed. For now the feature is system-wide. Perhaps we will want a more fine-grained, per-process approach in the future. 
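Note on the SpectreV2 method string described above (b3f36d0d): a rough sketch of how the three example outputs could be produced from the two flags. The function name, its parameters and the hardware-only branch are assumptions made for illustration; the commit message shows only the three combinations listed in its examples.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch: build the "method" string from the software
 * (compile-time) and hardware (user-settable) mitigation flags.
 * hwname would be something like "Intel IBRS".
 */
static void
v2_method_string(bool swmitigated, bool hwmitigated, const char *hwname,
    char *buf, size_t len)
{
	if (swmitigated && hwmitigated)
		snprintf(buf, len, "[GCC retpoline] + [%s]", hwname);
	else if (swmitigated)
		snprintf(buf, len, "[GCC retpoline]");
	else if (hwmitigated)
		snprintf(buf, len, "[%s]", hwname);	/* assumed, not in the examples */
	else
		snprintf(buf, len, "(none)");
}
```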
commit 57f08dc2e235d4ce61fe024fe81a696b00e6ed81 Author: maxv Date: Tue May 22 06:31:05 2018 +0000 Reorder and rename, to make the code less SpectreV2-specific. commit c08ee49c89261b00919e2252718c7d8c40dc3771 Author: maxv Date: Sun May 20 09:14:18 2018 +0000 Add a note about FreeBSD. commit 10e0c87ab407de131907d1a18437b29f9f7fd09f Author: maxv Date: Sun May 20 08:55:25 2018 +0000 Update, after ten years. Importantly, add a "History" section, to explain what's going on. We have now become "upstream", and most of the ipsec-tools development is done in NetBSD's CVS. However, many distributions still take their tarballs from SourceForge (which is defunct, and not maintained). commit d209620651df434d50f9d570e3c5722f3eaec266 Author: maxv Date: Sun May 20 06:29:43 2018 +0000 Remove notyet, we've never had this. commit ad32f80954ad6559795787c99c58e98e47a36a20 Author: maxv Date: Sun May 20 06:15:45 2018 +0000 Style. commit a35013b58cd89241494dd491195c7be16982f5b1 Author: maxv Date: Sat May 19 20:40:40 2018 +0000 Remove dead code, and style. commit 9d5b13f302c23711e903ea6019383bb8cc35807a Author: maxv Date: Sat May 19 20:21:23 2018 +0000 Remove unused 'error' variables, it's obvious they should have no use. commit a9fd5d02ab68e8fac311d7968638499c3cd957f3 Author: maxv Date: Sat May 19 20:14:56 2018 +0000 Use strict prototypes, when they don't introduce more warnings than they fix. Also localify a few functions. commit 06642c08b71e80e0658d755c9bb42c06600c50b0 Author: maxv Date: Sat May 19 19:47:47 2018 +0000 Remove unused labels, functions, and function prototypes. commit e5ad283b9c6daaced5c6afe004766efc63cf75d9 Author: maxv Date: Sat May 19 19:32:16 2018 +0000 More unused variables. commit 7402c35cab71f7a875d0495c997edd4b5b7dd5c1 Author: maxv Date: Sat May 19 19:23:15 2018 +0000 Remove unused variables. commit 33d062a47025415d367aa6d96874b1fcb3a6a4c9 Author: maxv Date: Sat May 19 18:51:59 2018 +0000 Style, a little... 
commit 1651eee6d75b56c9e80c79525f0e411c9aaffe29 Author: maxv Date: Sat May 19 08:22:58 2018 +0000 Style. commit 2d2318920e9984bae8db630e391a820ba71067de Author: maxv Date: Sat May 19 06:44:08 2018 +0000 Remove misleading comment. commit f4f2ddc24df8994f3c95703327688d5f4125cbdf Author: maxv Date: Fri May 18 21:03:33 2018 +0000 Add missing m_put_rcvif_psref. commit 5da2a6c3ee5fd8fe1e557d88a95b0e7063aac750 Author: maxv Date: Fri May 18 18:58:51 2018 +0000 IP6_EXTHDR_GET -> M_REGION_GET, no functional change. commit 3593881298bd9d28bbf1115fd01410f400e4d007 Author: maxv Date: Fri May 18 18:52:17 2018 +0000 IP6_EXTHDR_GET performs a basic mbuf operation, which has nothing to do with IPv6. So declare an IP-independent M_REGION_GET, and make IP6_EXTHDR_GET an alias to it. commit e2a634b078768479540bee16d024dadfc9c6402f Author: maxv Date: Fri May 18 18:28:40 2018 +0000 Remove IP6_EXTHDR_GET0, remove pointless XXXs, and style. commit 83293ebcd6631e575e548f68b0f562aae909b789 Author: maxv Date: Thu May 17 12:07:48 2018 +0000 Fix the KASSERTs. It doesn't matter at all since the packet can't be this big anyway, and there are many other places that have this kind of typo; but still fix it, for the sake of closing PR/49834. commit 24edb16e54dc254749c1de22f62be7b8923d7699 Author: maxv Date: Thu May 17 11:59:36 2018 +0000 Add KASSERTs, related to PR/39794. commit d987e4c7f9bb858f733a52d0eeae226d90a70a51 Author: maxv Date: Thu May 17 07:30:13 2018 +0000 Remove reference to tcpiphdr in comment. commit 0ff6581288d54166c0db9d77ea4bf974bbfc0455 Author: maxv Date: Wed May 16 16:33:23 2018 +0000 Fix compilation on Xen. commit 742dfa9e4eb93a19bcaa32eae00ec82398af0fcd Author: maxv Date: Wed May 16 08:16:36 2018 +0000 Mitigation for CVE-2018-8897 on i386. Contrary to amd64 there is no clear way to determine if we are in kernel mode but with the user context; so we go the hard way, and scan the IDT. On i386 the bug is less of a problem, since we don't have GSBASE. 
All an attacker can do is panicking the system. commit ab18434a2b5735e780a943e9bbcc00d3cdac83ed Author: maxv Date: Tue May 15 19:16:38 2018 +0000 When reassembling IPv4/IPv6 packets, ensure each fragment has been subject to the same IPsec processing. That is to say, that all fragments are ESP, or AH, or AH+ESP, or none. The reassembly mechanism can be used both on the wire and inside an IPsec tunnel, so we need to make sure all fragments of a packet were received on only one side. Even though I haven't tried, I believe there are configurations where it would be possible for an attacker to inject an unencrypted fragment into a legitimate stream of already-decrypted-and-authenticated fragments. Typically on IPsec gateways with ESP tunnels, where we can encapsulate fragments (as opposed to the general case, where we fragment encapsulated data). Note, for the record: a funnier thing, under IPv4, would be to send a zero-sized !MFF fragment at the head of the packet, and manage to trigger an ICMP error; M_DECRYPTED gets lost by the reassembly, and ICMP will reply with the packet in clear (not encrypted). commit cf3f4ab0e97131dd1f05372f1cbe52cd92d537f4 Author: maxv Date: Mon May 14 17:34:26 2018 +0000 Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument a bool for clarity. Optimize the function: if M_CANFASTFWD is not there (because already removed by the firewall) leave now. Makes it easier to see that M_CANFASTFWD is not removed on IPv6. commit 496daeb9ea65511889f4c24ca13ed7faa5f6e919 Author: maxv Date: Mon May 14 17:26:16 2018 +0000 Don't crash if there is no inner IP header. commit a3ed677a3704c999aae639f4869d2451bf50c022 Author: maxv Date: Sun May 13 18:39:06 2018 +0000 Clarify ESP-in-UDP. commit 3f0ccee32bffb95ab55d031f081bf0e055f0a7eb Author: maxv Date: Sun May 13 18:34:59 2018 +0000 Remove unused calls to nat_t_ports_get. 
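Note on the reassembly check above (ab18434a): a simplified sketch of the rule that all fragments must have gone through the same IPsec processing. The flag names and the array representation are hypothetical; the kernel works on mbuf flags inside the reassembly queues, which is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fragment flags: which transforms were applied. */
#define FRAG_ESP	0x1
#define FRAG_AH		0x2

/*
 * Return true only if every fragment carries exactly the same set of
 * flags as the first one (all ESP, all AH, all AH+ESP, or all none).
 */
static bool
frags_same_ipsec(const unsigned *flags, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (flags[i] != flags[0])
			return false;
	}
	return true;
}
```

A single cleartext fragment mixed into an ESP stream makes the check fail, which is the injection case the commit message worries about.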
commit 4fb33bd32b6c623be78d31d1c49b62e7afc210d9 Author: maxv Date: Fri May 11 15:43:07 2018 +0000 ENOBUFS -> EACCES when updating the replay counter. commit aedf22bed3a989d61637d40be3902a077d592995 Author: maxv Date: Fri May 11 14:38:28 2018 +0000 Retire ICMPPRINTFS, it's annoying and it doesn't build. commit 39e1ca28aaf6a8ebcf82d6d928a61dd99d06dc63 Author: maxv Date: Fri May 11 14:25:50 2018 +0000 Dedup: introduce rip6_sbappendaddr. Same as IPv4. commit 4843b90c11c98f5adb39f96f1a4a7473fd54ec19 Author: maxv Date: Fri May 11 14:07:58 2018 +0000 Make sure we have at least an IP header, and remove pointless XXXs (there is no issue). commit 725d716b06ba5d1be284484548f86a81bd541438 Author: maxv Date: Fri May 11 13:56:43 2018 +0000 static commit 8dbe101a8ab10977759741242a9e3129cb8bdc19 Author: maxv Date: Fri May 11 13:52:48 2018 +0000 Improve comment, it's not just IPv4. commit 97b6d24d2699069c0511615e3005a351590a0328 Author: maxv Date: Fri May 11 13:50:38 2018 +0000 Clean up, and panic if we call functions that are not supposed to be called. commit 33118bfe3793caccc3e1da83ee30052cd6c386a7 Author: maxv Date: Thu May 10 05:15:14 2018 +0000 Replace dumb code by M_VERIFY_PACKET. In fact, perhaps we should not even call M_VERIFY_PACKET here, there is no particular reason for this place to be more wrong than the rest. commit 7bfecd77b41c6a1a209a941227c3c24ed4e4d789 Author: maxv Date: Thu May 10 05:08:53 2018 +0000 Rename ipsec4_forward -> ipsec_mtu, and switch to void. commit b0ba955d1aac5b525b5774b1c4c3b0a6141e9261 Author: maxv Date: Wed May 9 07:33:31 2018 +0000 static const on ipsecif4_encapsw commit cacb303f134acd2b7d9b3d6ba5293d58a205eddf Author: maxv Date: Wed May 9 07:30:21 2018 +0000 Rename allocopy -> xstrdup, and simplify. commit 2766b630f0cda223a0deb3a7b652f1b4d8434158 Author: maxv Date: Wed May 9 07:21:08 2018 +0000 Clean up. commit b9dd3c63e41d45996abea3b08b47bb771c9bbcd4 Author: maxv Date: Wed May 9 07:05:42 2018 +0000 Remove dead/broken code. 
commit c7899ea8e53dadd241087ed4862a6ad29d3cb129 Author: maxv Date: Wed May 9 06:55:26 2018 +0000 Remove nonsensical KASSERT. commit e43fa458e8f59084de8a2465b66b1cbf27837468 Author: maxv Date: Wed May 9 06:49:48 2018 +0000 Remove annoying things, style, and fix buffer overflows. commit 2914249883d73930cb99b5674fe8a33476da91ad Author: maxv Date: Wed May 9 06:35:10 2018 +0000 Replace m_copym(m, 0, M_COPYALL, M_DONTWAIT) by m_copypacket(m, M_DONTWAIT) when it is clear that we are copying a packet (that has M_PKTHDR) and not a raw mbuf chain. commit 271f8c6651c0490673fbd2d2b7c3f34c8871e17a Author: maxv Date: Tue May 8 17:20:44 2018 +0000 Mitigation for the SS bug, CVE-2018-8897. We disabled dbregs a month ago in -current and -8 so we are not particularly affected anymore. The #DB handler runs on ist3, if we decide to process the exception we copy the iret frame on the correct non-ist stack and continue as usual. commit fb8fd5a43fd7a9fba0eddec3fceb2c84ef55b028 Author: maxv Date: Tue May 8 16:47:58 2018 +0000 Use M_MOVE_PKTHDR. commit 83228edf2570df81af3aede8bc6a8f190efdcc42 Author: maxv Date: Tue May 8 07:02:07 2018 +0000 Remove three useless debug messages, remove meaningless XXXs, and remove ieee80211_note_frame (unused). commit 485e720c0104ec560e838e61ed8b55976b9ec4b0 Author: maxv Date: Tue May 8 06:11:45 2018 +0000 Don't remove M_PKTHDR manually, use m_remove_pkthdr instead. ok ryo@ commit 1c72ea96b8ebafb738109af1c8596c6dcae96110 Author: maxv Date: Tue May 8 06:08:19 2018 +0000 Simplify: use M_MOVE_PKTHDR directly. ok knakahara@ commit 15429772ee3911e6cbaa8ab84d96ffc23e367966 Author: maxv Date: Mon May 7 19:34:03 2018 +0000 Fix possible buffer overflow. We need to make sure the inner IPv4 packet doesn't have options, because we validate only an option-less header. commit abed137b7319ec84d31e37ddbdc4237320467d83 Author: maxv Date: Mon May 7 10:53:45 2018 +0000 Clean up, improve a bit, and document m_remove_pkthdr. 
commit 4e45c76ff87fbdd0a760e5a23cd7fa5110ac4507 Author: maxv Date: Mon May 7 10:21:08 2018 +0000 Remove misleading comments. commit 4dc84fb770962f5af280495129e768ca47575375 Author: maxv Date: Mon May 7 09:57:37 2018 +0000 Copy some KASSERTs from m_move_pkthdr into m_copy_pkthdr, and reorder the latter to reduce the diff with the former. commit 6e4c0a23c552d3f4d20aefa17a69ee3ceb5fae10 Author: maxv Date: Mon May 7 09:51:02 2018 +0000 Use m_remove_pkthdr. ok knakahara@ (for L2TP) commit 9981db39a014d6f350a4695d1da3328fd61a2619 Author: maxv Date: Mon May 7 09:41:10 2018 +0000 Fix double-free, m_tag_delete_chain is already called by m_free. commit 716dd145c036c7c03bd8364893bc7bfcf84b98c9 Author: maxv Date: Mon May 7 09:33:51 2018 +0000 Remove a dummy reference to XF_IP4, explain briefly why we don't use ipe4_xformsw, and remove unused includes. commit e3b14ad28274d7464624714b39441941f9999fa3 Author: maxv Date: Mon May 7 09:25:04 2018 +0000 Remove now unused 'isr', 'skip' and 'protoff' arguments from ipip_output. commit b2166dc2387bf418ecb08997006f819a1d5e056e Author: maxv Date: Mon May 7 09:16:46 2018 +0000 Remove unused 'mp' argument from all the xf_output functions. Also clean up xform.h a bit. commit 79032cb829b6e2346c94cba6c0b3da2e3f788be6 Author: maxv Date: Mon May 7 09:08:06 2018 +0000 Clarify IPIP: ipe4_xformsw is not allowed to call ipip_output, so replace the pointer by ipe4_output, which just panics. Group the ipe4_* functions together. Localify other functions. ok ozaki-r@ commit f68555508957e1cee0d6180531e89c579f82dadc Author: maxv Date: Fri May 4 11:25:24 2018 +0000 Remove duplicate macros. Reported in PR/29786. commit 285532887d246bcc843da87bd6ec1c67261a5a11 Author: maxv Date: Thu May 3 17:14:37 2018 +0000 Remove ovbcopy from net80211. commit 26eaee51f09f37fb25a0c1668a3eb1a7b8ee6b61 Author: maxv Date: Thu May 3 16:52:42 2018 +0000 Drop early if there's no PPPoE interface. Otherwise it is easy for someone to flood dmesg over the local subnet. 
commit dd0dc48111328eba831712e1dc99c6983605b99e Author: maxv Date: Thu May 3 08:39:28 2018 +0000 Remove unused M_MAXCOMPRESS macro. commit 552a5fd81cef9a365f89f3cbefd8bb42aff90450 Author: maxv Date: Thu May 3 08:14:29 2018 +0000 Fix comment, M_LOOP is not used for statistics, it's mostly used to avoid recomputing the checksum when the packet is received on loopback. commit 1800f8c97fe5ac00c315facef0227dfefdbf5da3 Author: maxv Date: Thu May 3 07:46:17 2018 +0000 Revert my rev1.190, remove the M_READONLY check. The initial code was correct: what is read-only is the mbuf storage, not the mbuf itself. The storage contains the packet payload, and never has anything related to mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only storage. In fact it was kind of obvious, since several places already manually remove M_PKTHDR without taking care of the external storage. commit f0ba0d93d196d12a90b5ee2aadc5fd64565140c8 Author: maxv Date: Thu May 3 07:25:49 2018 +0000 Rename m_pkthdr_remove -> m_remove_pkthdr, to match the existing naming convention, eg m_copy_pkthdr and m_move_pkthdr. commit ddb4dac5cdd444f307296f0a61de7d77503dbccc Author: maxv Date: Thu May 3 07:13:48 2018 +0000 Remove now unused tcpip.h includes. Some were already unused before. commit 04dc0c8596286832bb30a34c20e55c3260b55401 Author: maxv Date: Thu May 3 07:01:08 2018 +0000 Remove m_copy completely. commit fbcfe62ddcef793bdbf487cbb0d846e61bca3062 Author: maxv Date: Thu May 3 06:41:30 2018 +0000 Remove net_osdep.h completely. commit 084d055404789d5639850f090f88facf1bfb6207 Author: maxv Date: Tue May 1 08:42:41 2018 +0000 Remove unused argument from udp4_espinudp, and remove unused includes. commit 1ef21786436da5a7d49f070d2737b2428cd5779d Author: maxv Date: Tue May 1 08:34:08 2018 +0000 Remove some more dead code. 
commit f49a758a82313655ec5b002d168e0f9c09c64435 Author: maxv Date: Tue May 1 08:27:13 2018 +0000 When IP6_EXTHDR_GET fails, return ENOBUFS, and don't log an error (HDROPS is not supposed to be used here). commit 437bfdf0daacaada6aefa1825ccc3a242862f796 Author: maxv Date: Tue May 1 08:16:34 2018 +0000 When the replay check fails, return EACCES instead of ENOBUFS. commit 58fd4037b801cf92546e9043e9a78e86b533e320 Author: maxv Date: Tue May 1 08:13:37 2018 +0000 Remove double include, opencrypto/xform.h is already included in netipsec/xform.h. commit d7fb1a1b8a0eab8ac2d9a177c798230d8eff777b Author: maxv Date: Tue May 1 08:08:46 2018 +0000 Remove unused. commit 8200039857a1ea35b55daad199933db37fcdc1b4 Author: maxv Date: Tue May 1 07:21:39 2018 +0000 Remove now unused net_osdep.h includes, the other BSDs did the same. commit aadda1a95dedaa11b3c3bae26477204e7f829174 Author: maxv Date: Tue May 1 07:07:00 2018 +0000 Remove unused alias to tcpiphdr. commit d781ed3abd6496b9e9ec87945081eec071c7d087 Author: maxv Date: Tue May 1 07:03:33 2018 +0000 Redefine the structure, not to rely on tcpiphdr. commit 714fb98b01a9fcf22279f74b0a89e05f95739dd9 Author: maxv Date: Tue May 1 06:50:06 2018 +0000 Move if_name() from net_osdep.h to if.h. net_osdep.h is now unused and can be removed - the other BSDs did the same. Discussed with Kengo (if.h suggested by him). commit 5db55268002c3eb07a095389d458831c7c7dadc3 Author: maxv Date: Tue May 1 05:42:26 2018 +0000 Fix the checks in compute_ipsec_pos, otherwise m_copydata could crash. I already fixed half of the problem two months ago in rev1.67, back then I thought it was not triggerable because each packet we emit is guaranteed to have correctly formed IPv6 options; but it is actually triggerable via IPv6 forwarding, we emit a packet we just received, and we don't sanitize its options before invoking IPsec. 
Since it would be wrong to just stop the iteration and continue the IPsec processing, allow compute_ipsec_pos to fail, and when it does, drop the packet entirely. commit 1268fb103acbcf4ece25c387e82cd6c0583d450a Author: maxv Date: Sun Apr 29 14:54:09 2018 +0000 Remove useless icmp6.h include, remove manual externs and include in6.h to get proper definitions, and remove duplicate logic in ipsec6_common_input_cb. commit f84d20294b4a012ec5e51ba888da82e783fe069e Author: maxv Date: Sun Apr 29 14:35:35 2018 +0000 Remove obsolete/dead code, the IP-in-IP encapsulation doesn't work this way anymore (XF_IP4 partly dropped by FAST_IPSEC). commit 3ba53cad34336e62080e1adad9eccb66f848f811 Author: maxv Date: Sun Apr 29 12:12:42 2018 +0000 Move struct tcpiphdr from tcpip.h to tcp_var.h, to match UDP (udpiphdr in udp_var.h). tcpip.h is now empty, and can be removed. commit e8029cd7fcc422f0bd0d4dc80bf83220a22d3107 Author: maxv Date: Sun Apr 29 11:51:08 2018 +0000 Remove unused and misleading argument from ipsec_set_policy. commit e80656ffbf578b2da9b60a33148d5a4d9036b7f4 Author: maxv Date: Sun Apr 29 11:42:09 2018 +0000 Remove unused function. commit 43f84123284f219f2910dd1e99b454ca8363b302 Author: maxv Date: Sun Apr 29 07:24:38 2018 +0000 Remove duplicate prototype. commit 6334958a1141dbdccedd60bae7ff3edb440af4b4 Author: maxv Date: Sun Apr 29 07:16:28 2018 +0000 Add missing pserialize_read_exit in error branch, spotted during my previous commit. commit 97664afb4e53de0e049fc0f1445c3585dcc8cbce Author: maxv Date: Sun Apr 29 07:13:10 2018 +0000 Remove references to m_copy in comments. commit 795e027d4e73e9fed98dac608860447c66823dc8 Author: maxv Date: Sun Apr 29 07:05:13 2018 +0000 Replace m_copym(m, 0, M_COPYALL, M_DONTWAIT) by m_copypacket(m, M_DONTWAIT) when it is obvious that 'm' has M_PKTHDR set. commit 79a206a42a6fb40e6c476ea9b40793336af14a09 Author: maxv Date: Sun Apr 29 06:52:55 2018 +0000 Add KASSERTs in the rcvif functions. 
commit 8ddad75bf9cf508ca2e37664d9b402b81f803961 Author: maxv Date: Sat Apr 28 15:45:16 2018 +0000 Remove IPSEC_SPLASSERT_SOFTNET, it has always been a no-op. commit 320480cec5e4d66e050190961be0944ba80f711e Author: maxv Date: Sat Apr 28 14:39:34 2018 +0000 Inline M_EXT_WRITABLE directly, and remove the XXX, there's nothing wrong in the use of !M_READONLY. commit 54f1bfdf26572cbb0d315a769c1f4ba63f855916 Author: maxv Date: Sat Apr 28 14:25:56 2018 +0000 Move the ipsec6_input prototype into ipsec6.h, and style. commit 4b4e43a6f22f4696295e7116f0b795695a6d7cf9 Author: maxv Date: Sat Apr 28 14:21:03 2018 +0000 Stop using a macro, rename the function to ipsec_init_pcbpolicy directly. commit c7c8daf7b93628c1d0f76e927897c608430f6e61 Author: maxv Date: Sat Apr 28 14:01:50 2018 +0000 Style and remove unused stuff. commit e4027b22f700a085e333bbf833fc9d3920833318 Author: maxv Date: Sat Apr 28 13:44:19 2018 +0000 Fix the net.inet6.ipsec6.def_policy node, the variable should be &ip6_def_policy.policy, otherwise we're overwriting other fields of the structure. commit 7c6ab552c3cb10475b92803951bf53ad94c6b81c Author: maxv Date: Sat Apr 28 13:26:57 2018 +0000 Remove unused ipsec_var.h includes. commit f3468e2562182f9ab4e81a6daefb987bdcf2e8c9 Author: maxv Date: Sat Apr 28 13:23:17 2018 +0000 Remove unused macros. commit 53685f7acd9c7b6a3d2fd969ede87976eb7bb601 Author: maxv Date: Sat Apr 28 08:34:45 2018 +0000 Rename the 'flags' and 'nowait' arguments to 'how'. The other BSDs did the same. Also, in m_defrag, rename 'mold' to 'm'. commit 4ef6609c0f057c7867b1b068e849dd322b099cba Author: maxv Date: Sat Apr 28 08:16:15 2018 +0000 Modify m_defrag, so that it never frees the first mbuf of the chain. While here use the given 'flags' argument, and not M_DONTWAIT. We have a problem with several drivers: they poll an mbuf chain from their queues and call m_defrag on them, but m_defrag could update the mbuf pointer, so the mbuf in the queue is no longer valid. 
It is not easy to fix each driver, because doing pop+push will reorder the queue, and we don't really want that to happen. This problem was independently spotted by me, Kengo, Masanobu, and other people too it seems (perhaps PR/53218). Now m_defrag leaves the first mbuf in place, and compresses the chain only starting from the second mbuf in the chain. It is important not to compress the first mbuf with hacks, because the storage of this first mbuf may be shared with other mbufs. commit 6cd4e1c537606e0dd3b9ea7d63073dc362c3abb7 Author: maxv Date: Fri Apr 27 19:06:48 2018 +0000 Remove unused debug code. commit 8aa60338b745bf1a3fb67ea66b76fa8f0ef9f912 Author: maxv Date: Fri Apr 27 18:40:40 2018 +0000 Remove reference to m_ext.ext_type (doesn't exist). commit aca377cd3a2b87101578c6fd108a02479185c997 Author: maxv Date: Fri Apr 27 16:32:03 2018 +0000 Remove unused ext_flags field in struct _m_ext_storage. Also, simplify MEXTMALLOC, mbtypes[] doesn't exist anymore, but the code still compiled correctly because "malloc" is a macro and the argument was dropped. commit 810bd956b880bd97e49910efc4577a2af34aeff7 Author: maxv Date: Fri Apr 27 16:18:40 2018 +0000 Stop passing the pool as argument of the storage. M_EXT_CLUSTER mbufs are supposed to take their area from mcl_cache only. commit 0f0d3b880dd8475f38ea44f3bdfd4675e8cc6430 Author: maxv Date: Fri Apr 27 09:22:28 2018 +0000 Remove _MCLGET, merge its content into m_clget(). The code is slightly modified to reduce the indentation level. commit d508c6b20fd0173af206e711352e70005ad84965 Author: maxv Date: Fri Apr 27 09:02:16 2018 +0000 Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of MCLBYTES, so the area allocated is still too small. I think it should have been MEXTMALLOC, and of course I can't test my change. commit be8fb1a45e1191d4ea8cf979b95c16266449c153 Author: maxv Date: Fri Apr 27 08:51:26 2018 +0000 M_CLUSTER -> M_EXT_CLUSTER, and remove M_CLUSTER completely. 
commit fae1b959af1db97084c6655490bb15f2c0394e27 Author: maxv Date: Fri Apr 27 08:23:18 2018 +0000 Reorder, to group related functions. commit 2e3e35f52885801b6198c1719c4c52cfb956c988 Author: maxv Date: Fri Apr 27 07:53:07 2018 +0000 M_CLUSTER -> M_EXT_CLUSTER commit e25fc3a28abefbdf48a313b29bab2bbd536bae9d Author: maxv Date: Fri Apr 27 07:41:58 2018 +0000 Rename m_reclaim -> mb_drain, and localify. commit 2c4948c4ac5eca1fc6940451836c8eedd5e9bdc2 Author: maxv Date: Fri Apr 27 07:20:33 2018 +0000 Implement M_COPY_PKTHDR as a function, like m_move_pkthdr. commit 54bb611f7994b22f6ca6ec4ec944d6e8c373b3d5 Author: maxv Date: Fri Apr 27 06:56:21 2018 +0000 Move m_align and m_append into ieee80211_netbsd.c. They are part of net80211, and shouldn't be used outside. commit daa9ec3f3839702957ae4ba34959acb93596e296 Author: maxv Date: Fri Apr 27 06:36:16 2018 +0000 Simplify m_copydata, use unsigned int, and change its last argument to match that of the man page. commit 87787152a1454fe91683e9535dc646398aec36f2 Author: maxv Date: Fri Apr 27 06:27:36 2018 +0000 Style and simplify. commit 8398a2e5ca2c1a34464633518e2a6e4a9471c249 Author: maxv Date: Fri Apr 27 06:15:49 2018 +0000 Panic in m_copypacket if no header is present, that's a requirement. commit 7b6c20564ce663ebdfbbf28f5c68e29708781598 Author: maxv Date: Fri Apr 27 06:06:43 2018 +0000 Improve the documentation of m_copypacket(), to say explicitly that a header must be present, contrary to m_copym(). While here fix a variable name (from yesterday). commit f405682296a5960d7cf57a21b0c3c92bbca60cf0 Author: maxv Date: Thu Apr 26 20:10:44 2018 +0000 Hum. This should be M_READONLY, not M_ROMAP. M_ROMAP tells us whether the mbuf storage is mapped on a read-only page. But an mbuf can still be read-only in the sense that the storage is shared with other mbufs.
commit 4bd38589ccc2d82c31d4c1f3870d323cf21918bd Author: maxv Date: Thu Apr 26 19:56:55 2018 +0000 m_copy -> m_copym commit 8ac557d8c4331ee7b2566d64bfbab058311e1748 Author: maxv Date: Thu Apr 26 19:50:09 2018 +0000 Stop using m_copy(), use m_copym() directly. m_copy is useless, undocumented and confusing. commit 9524592b95f3a1d2d12242f5e76cfb88353ae3d5 Author: maxv Date: Thu Apr 26 19:33:02 2018 +0000 Fix inverted arguments in m_gethdr(). commit c747ce480a6f9286c7323a3be35ac3f3591dbe33 Author: maxv Date: Thu Apr 26 19:27:04 2018 +0000 Fix inverted arguments in MGET(). commit 89f7dda62fe9ae64c34af41aaa5bd6a1f3416314 Author: maxv Date: Thu Apr 26 19:22:17 2018 +0000 Remove unused mbuf argument from sbsavetimestamp. commit 7d40735bf7d217917d9ba7ac3edc2dc6c7144ac7 Author: maxv Date: Thu Apr 26 19:13:34 2018 +0000 Change MCLGET, so that it calls m_clget instead of doing the work in a macro. Macros are inefficient when they contain too many instructions and are used too often, because of cache coherency (and also register use). This change saves 32KB of kernel .text. commit f8201152952ce30ddad6e07900bf0d74097df293 Author: maxv Date: Thu Apr 26 08:31:36 2018 +0000 Rename m_copyback0 -> m_copyback_internal M_COPYBACK0_* -> CB_* That's a lot less misleading. While here, fix a bunch of panic messages. commit 19ade8ed4aa27bbbbbde7d356e7512d4d3865c2c Author: maxv Date: Thu Apr 26 08:13:30 2018 +0000 Stop adding '0's in parameter and function names, that's just misleading. Some remain, they need more investigation. commit c0f3b4e21c6712d9a011daf9f0bc446540f57ab3 Author: maxv Date: Thu Apr 26 07:48:21 2018 +0000 Remove m_prepend from the man page, it's a helper, and is not supposed to be part of the API. commit 4618554c94b9fc61bb137558bc8bf3fdd8707311 Author: maxv Date: Thu Apr 26 07:46:24 2018 +0000 Change comment, to clearly say that m_prepend should not be used directly. 
commit dd9c3cb635afa4a5cee08e699e7d85ee74a52752 Author: maxv Date: Thu Apr 26 07:28:21 2018 +0000 Use M_UNWRITABLE, no functional change. commit 30397e260b7f53166b7538658f83eb5dced0edf3 Author: maxv Date: Thu Apr 26 07:01:38 2018 +0000 Move the address checks into one function, ip6_badaddr(). In this function, reinstate the "IPv4-compatible IPv6 addresses" check; these addresses are deprecated by RFC4291 (2006). commit 99597189cb6216163ba935c2cf8b66f990530253 Author: maxv Date: Thu Apr 26 06:23:33 2018 +0000 Remove ping6_opts_hops, "-g" does not exist anymore (RH0 removed). commit e8edcce8587460ceac214d940a118ae798e206ce Author: maxv Date: Tue Apr 24 08:22:16 2018 +0000 Remove nullcheck, m is not allowed to be null. commit d1bf7cc0f9fd231572ff0755ef750cb489602ee8 Author: maxv Date: Tue Apr 24 08:10:32 2018 +0000 Change/Improve the comments, so that the definitions fit one line. commit 10967a45aa03485390af09ece1c39efda822a415 Author: maxv Date: Tue Apr 24 08:07:05 2018 +0000 Remove the M_AUTHIPDGM flag. It is equivalent to M_AUTHIPHDR, both are set in IPsec-AH, and they are always handled together. commit 1043d54a95e26ceae0346a7ba57121f2c9041cc2 Author: maxv Date: Tue Apr 24 07:22:32 2018 +0000 Add code 3 of paramprob, part of RFC7112: "IPv6 First Fragment has incomplete IPv6 Header Chain". Handle this code in ping6. commit fc5e118a7d0ab32bfe0c12302d2bbe7fe1adb726 Author: maxv Date: Tue Apr 24 07:12:04 2018 +0000 Remove annoying (void) casts. commit 4b94a1002a9225da022bb4525c3a896ad44e7ad4 Author: maxv Date: Mon Apr 23 18:59:03 2018 +0000 Clean up the IPsec ifdefs, same as ping6. commit cba58a74dab2a71e897ca72bfb3f6071f03747e3 Author: maxv Date: Mon Apr 23 18:48:30 2018 +0000 Remove double include and unused macros. commit bc8a06a2427b69661480ad7bd3b7ea2a7c07553d Author: maxv Date: Mon Apr 23 18:44:39 2018 +0000 Remove the "-R" option. It uses IPV6_REACHCONF, but we've never had this. 
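Note on the "IPv4-compatible IPv6 addresses" check mentioned above (30397e26): a userland sketch of what such a test looks like, using the standard IN6_IS_ADDR_V4COMPAT macro from <netinet/in.h>. The wrapper is_v4compat is hypothetical, and whether the kernel's ip6_badaddr() uses this macro or an equivalent open-coded test is an assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical userland helper: parse a textual IPv6 address and
 * report whether it is a deprecated IPv4-compatible address, that is
 * ::a.b.c.d with the upper 96 bits zero (:: and ::1 excluded).
 */
static bool
is_v4compat(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return false;	/* not a valid IPv6 address */
	return IN6_IS_ADDR_V4COMPAT(&a) != 0;
}
```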
commit fbd7e5db23cbdcc413b8d62c48d5b0d3ad522cca Author: maxv Date: Mon Apr 23 18:37:19 2018 +0000 Fix usage(), A/E don't exist. commit a2f7f0bec0c75a92f857d31b4717ed1565603b8d Author: maxv Date: Mon Apr 23 18:32:18 2018 +0000 Simplify: remove #ifdefs for constants that are always defined, and remove their #else's (some of which can't compile, since they use values that since got removed). commit 82b83b60f83661dfdbbbe30b547c4e31a415d9ee Author: maxv Date: Mon Apr 23 10:35:20 2018 +0000 Remove dead/broken code, we want to favor RFC3542 over RFC2292. No functional change. traceroute6 and rtadvd did the same. commit 74009f56872ece54fec4484963d920e1d4c157d4 Author: maxv Date: Mon Apr 23 10:23:38 2018 +0000 ... another occurrence of OLDRAWSOCKET ... commit 958f4ccd185f780dee3372878bc716a10376e01c Author: maxv Date: Mon Apr 23 10:22:18 2018 +0000 Remove dead code. commit 0c7cc412311a9b23c657adb75312fa15a2e7f9a7 Author: maxv Date: Mon Apr 23 10:19:11 2018 +0000 Remove CPPFLAGS+=-DUSE_RFC3542, it's not used anymore. commit 1f3b6d783081a8f935b5345c51321ccb62cb3e4e Author: maxv Date: Mon Apr 23 10:14:12 2018 +0000 Remove dead/broken code. We want to favor RFC3542 over RFC2292. No functional change. commit 8943e37c7c58c249660d1df4c025e6d7f4ef7681 Author: maxv Date: Mon Apr 23 09:58:35 2018 +0000 Remove dead code. commit 4243c9dad76748b77fb0dbea8b04105ce440cf56 Author: maxv Date: Mon Apr 23 09:47:03 2018 +0000 Remove now unused code. commit b3f68048b8452660daa84dbd7611c805126698f4 Author: maxv Date: Mon Apr 23 07:22:54 2018 +0000 Remove the kernel RH0 code. RH0 is deprecated by RFC5095, for security reasons. RH0 was already removed in the kernel's input path, but some parts were still present in the output path: they are now removed. Sent on tech-net@ a few days ago. 
commit 24783aca3f326fda494d68e31a93fcdb41d8e5a3 Author: maxv Date: Mon Apr 23 06:51:25 2018 +0000 Remove the "hops" parameter, it uses RH0, which is deprecated by RFC5095, and doesn't work on modern networks anymore. commit 3bf197d26f564adeecd9f636a1d7c9625c3e0c02 Author: maxv Date: Mon Apr 23 06:42:02 2018 +0000 Remove the "-g" option, it uses RH0, which is deprecated by RFC5095, and doesn't work on modern networks anymore. commit 85fc06bf46da4b10c589fad912dee28bbe35c166 Author: maxv Date: Sun Apr 22 10:25:40 2018 +0000 Rename ipip_allow->ipip_spoofcheck, and add net.inet.ipsec.ipip_spoofcheck. Makes it simpler, and also fixes PR/39919. commit d1a3fd8c0c996b5b58a29cbc35c70c8462eb6418 Author: maxv Date: Sat Apr 21 13:22:06 2018 +0000 Remove #ifndef __vax__. The check enforces a 4-byte-aligned size for the option mbuf. If the size is not multiple of 4, the computation of ip_hl gets truncated in the output path. There is no reason for this check not to be present on VAX. While here add a KASSERT in ip_insertoptions to enforce the assumption. Discussed briefly on tech-net@ commit 3d032c893f0889fd7900efca179f61d57b7c7b7c Author: maxv Date: Fri Apr 20 06:01:59 2018 +0000 Cast to int, to properly handle dstoff > MHLEN (which never happens). commit 91f240951b1febcd4ba8d4f0bbdb04363b4616f7 Author: maxv Date: Thu Apr 19 08:27:38 2018 +0000 Remove extra long file paths from the headers. commit 21507563171e522e7e209430e808a76e48a636c9 Author: maxv Date: Thu Apr 19 08:16:44 2018 +0000 Remove unused typedef, remove unused arguments from _ipip_input, sync comment with reality, and change panic message. commit 2cca1ceae7128735a788a67b36ca4c1ffdd6b132 Author: maxv Date: Thu Apr 19 07:58:26 2018 +0000 Add a KASSERT (which is not triggerable since ipsec_common_input already ensures 8 bytes are present), add an XXX (about the fact that it is better to use m_copydata, because it is faster and less error-prone), and improve two m_copybacks (remove useless casts). 
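Note on commit d1a3fd8c (4-byte-aligned option size): the truncation it mentions follows from ip_hl counting 32-bit words. The sketch below shows the arithmetic only; `BASE_HDR_LEN`, `header_words()`, `described_bytes()` and `optlen_ok()` are hypothetical names, not functions from the tree.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_HDR_LEN 20	/* IPv4 header without options, in bytes */
#define MAX_OPT_LEN  40	/* ip_hl is 4 bits: at most 15 words = 60 bytes */

/* ip_hl counts 32-bit words, so the byte length is shifted right by 2. */
static unsigned
header_words(size_t optlen)
{
	assert(optlen <= MAX_OPT_LEN);
	return (unsigned)((BASE_HDR_LEN + optlen) >> 2);
}

/* Number of bytes that the resulting header-length field describes. */
static size_t
described_bytes(size_t optlen)
{
	return (size_t)header_words(optlen) << 2;
}

/* Assumed shape of the check: reject sizes that are not multiples of 4. */
static int
optlen_ok(size_t optlen)
{
	return (optlen % 4) == 0;
}
```

With 6 bytes of options the header is 26 bytes long, but the length field only describes 24 of them; with 8 bytes the two agree at 28.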
commit cb57cdb9946d7a53dde2b0efe5301367ac7df85c Author: maxv Date: Thu Apr 19 07:36:23 2018 +0000 Style, and remove meaningless XXX. commit b0264df164b803266f6ad25ec215b6f822981c93 Author: maxv Date: Thu Apr 19 07:22:29 2018 +0000 cosmetic commit 70ea398bdb6c474fb25d112ee13e44fa7d6695e9 Author: maxv Date: Thu Apr 19 05:16:02 2018 +0000 The mbuf length is allowed to be zero. commit 471c1b3a93598a3cabe2dd80dda6b9ec9ef51e2e Author: maxv Date: Wed Apr 18 17:58:07 2018 +0000 Simplify the IPv4 parser. Get the option length in 'optlen', and sanitize it earlier. A new check is added (off + optlen > skip). In the IPv6 parser we reuse 'optlen', and remove 'ad' as a result. commit 8030ac1d8c167ac0c768acb989c1c3dc9fdc2078 Author: maxv Date: Wed Apr 18 17:34:54 2018 +0000 Remove unused includes, remove misleading comments, and style. commit d612b872795f0c840fceb2ff8b3192dcd8286b6b Author: maxv Date: Wed Apr 18 14:56:35 2018 +0000 m_free -> m_freem, m_copyback could have added mbufs in the chain commit f0c8ffec291c1e3f325d7cdb26c155bb92c7b232 Author: maxv Date: Wed Apr 18 14:47:11 2018 +0000 mention SVS, retpoline, SMAP commit 50966d0504e72120023540a478a00ae2e2c3cca4 Author: maxv Date: Wed Apr 18 14:42:16 2018 +0000 mention meltdown/spectre fixes commit 7ed061743e81c7ee4f9f1dd96759401f0fbdfd4f Author: maxv Date: Wed Apr 18 07:38:02 2018 +0000 Remove unused malloc.h include. commit 6248095d1571001535f3c8ac3d57c194406a7192 Author: maxv Date: Wed Apr 18 07:32:44 2018 +0000 Style, and remove unused MALLOC_DECLARE. commit 9ad54ef6f508727ae47b7551d6488e06f9396a0e Author: maxv Date: Wed Apr 18 07:17:49 2018 +0000 Remove unused netipsec/xform.h includes. commit b3f0637f47ca28d7198788dd275fbe31ac9d1c2c Author: maxv Date: Wed Apr 18 06:57:39 2018 +0000 Remove dead code. 
ok ozaki-r@ commit 32ad131c3177a9fb87006282b345002269f2dc17 Author: maxv Date: Wed Apr 18 06:43:10 2018 +0000 style commit 8284c193e45fb6fb142c9f418978c630e1a1ed2a Author: maxv Date: Wed Apr 18 06:22:47 2018 +0000 Style, and remove another misleading comment. commit 5ccac77955915d4f73b9854f78fa7af753b2a871 Author: maxv Date: Wed Apr 18 06:17:43 2018 +0000 Remove misleading comments. commit 44ee7beed582c77a8d584a73e34ba283380af496 Author: maxv Date: Wed Apr 18 06:13:23 2018 +0000 Remove the net.inet6.esp6 net.inet6.ipcomp6 net.inet6.ah6 subtrees. They are aliases to net.inet6.ipsec6, but they are not consistent with the original intended naming. (eg there was net.inet6.esp6.esp_trans_deflev instead of net.inet6.esp6.trans_deflev). commit 0b775456d5cd525f18fac1b5fb87ed530a73e2c0 Author: maxv Date: Wed Apr 18 06:03:36 2018 +0000 Remove duplicate sysctls: net.inet.esp.trans_deflev = net.inet.ipsec.esp_trans_deflev net.inet.esp.net_deflev = net.inet.ipsec.esp_net_deflev net.inet.ah.cleartos = net.inet.ipsec.ah_cleartos net.inet.ah.offsetmask = net.inet.ipsec.ah_offsetmask net.inet.ah.trans_deflev = net.inet.ipsec.ah_trans_deflev net.inet.ah.net_deflev = net.inet.ipsec.ah_net_deflev Use the convention on the right. Discussed a month ago on tech-net@. commit a94a5aec70db5968578634437f6c361c1ee6cfb8 Author: maxv Date: Tue Apr 17 17:56:08 2018 +0000 fix comments commit 74ff5df3649189fa65c0b7b58cb1d5c71ac8488b Author: maxv Date: Tue Apr 17 17:47:05 2018 +0000 Add XXX. If this code really does something, it should use MCHTYPE. commit cc92c60b29118095e71855df44638c802451a132 Author: maxv Date: Tue Apr 17 17:40:38 2018 +0000 Style, add XXX (about the mtu that goes negative), and remove #ifdef inet. commit d3ac13d6af4a1e585bc808892a425eb9774fd4b8 Author: maxv Date: Tue Apr 17 09:06:33 2018 +0000 Fix a pretty bad mistake, that has always been there. 
m_adj(m1, -(m1->m_len - roff)); if (m1 != m) m->m_pkthdr.len -= (m1->m_len - roff); This is wrong: m_adj will modify m1->m_len, so we're using a wrong value when manually adjusting m->m_pkthdr.len. Because of that, it is possible to exploit the attack I described in uipc_mbuf.c::rev1.182. The exploit is more complicated, but works 100% reliably. commit e19c778464c6b302f950774043eb23e166aaf31d Author: maxv Date: Tue Apr 17 07:58:31 2018 +0000 change the comment commit 1c1bbc5f53281990dfb1b2a211debda0bce657e1 Author: maxv Date: Tue Apr 17 07:41:34 2018 +0000 If the mbuf is shared leave M_PKTHDR in place. Given where this function is called from that's not supposed to happen, but I'm growing unconfident about our mbuf code. commit 15290065a881d035d7a8660c148999d66464fe60 Author: maxv Date: Tue Apr 17 06:23:30 2018 +0000 Don't assume M_PKTHDR is set only on the first mbuf of the chain. It should, but it looks like there are several places that can put M_PKTHDR on secondary mbufs (PR/53189), so drop this assumption right now to prevent further bugs. The check is replaced by (m1 != m), which is equivalent to the previous code: we want to modify m->m_pkthdr.len only when 'm' was not passed in m_adj(). commit 8c7027e8653e921124c3a1ee9b40bdc214ae4a8e Author: maxv Date: Mon Apr 16 19:19:51 2018 +0000 Disable the M_PKTHDR check for now. It causes PR/53189 (which is also reproducible on i386). It seems that someone is giving looutput a malformed chain. commit 77da9e87c2e69814942139f930d3c4c1a8518232 Author: maxv Date: Mon Apr 16 17:32:34 2018 +0000 Remove dead code. ok ozaki-r@ commit bd1c951fad7faa4c1efa71bb1cbff0f2603c9c4d Author: maxv Date: Sun Apr 15 17:26:39 2018 +0000 clarify commit 64a4e993bc548e45e75161e6e095d7aea0e709d3 Author: maxv Date: Sun Apr 15 08:31:18 2018 +0000 Remove useless DIAGNOSTIC block, the caller already ensures the assumptions, and here we're not doing anything (it should be a panic rather than a printf). 
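Note on commit d3ac13d6 (the m_adj mistake): the ordering problem can be reproduced with a toy model. Everything below is a simplified stand-in, assuming only what the commit message says: `struct toy_mbuf` and `toy_trim_tail()` are hypothetical, and the "fixed" variant is one plausible way to avoid the stale read, not necessarily the change that was committed.

```c
#include <assert.h>

/* Toy stand-in for an mbuf: just enough fields for the illustration. */
struct toy_mbuf {
	int m_len;	/* bytes stored in this mbuf */
	int pkt_len;	/* total packet length, meaningful on the head only */
};

/* Hypothetical helper: trim 'n' bytes from the tail of one mbuf. */
static void
toy_trim_tail(struct toy_mbuf *mb, int n)
{
	assert(n >= 0 && n <= mb->m_len);
	mb->m_len -= n;
}

/* Buggy order: m1->m_len is re-read after the trim already changed it. */
static void
remove_tail_buggy(struct toy_mbuf *m, struct toy_mbuf *m1, int roff)
{
	toy_trim_tail(m1, m1->m_len - roff);
	if (m1 != m)
		m->pkt_len -= (m1->m_len - roff);	/* now subtracts 0 */
}

/* Fixed order: compute the trimmed amount once, before the trim. */
static void
remove_tail_fixed(struct toy_mbuf *m, struct toy_mbuf *m1, int roff)
{
	int trimmed = m1->m_len - roff;

	toy_trim_tail(m1, trimmed);
	if (m1 != m)
		m->pkt_len -= trimmed;
}
```

In the buggy variant the head's packet length stays larger than the sum of the mbuf lengths in the chain, which is the kind of inconsistency the commit message describes as exploitable.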
commit 65fa378aefb2c0255edf8986086c7972eea13b15 Author: maxv Date: Sun Apr 15 08:27:21 2018 +0000 typo in comment commit 45a5d35dc52d120799705386e8ad72ce5f77cc94 Author: maxv Date: Sun Apr 15 07:35:49 2018 +0000 Introduce a m_verify_packet function, that verifies the mbuf chain of a packet to ensure it is not malformed. Call this function in "points of interest", that are the IPv4/IPv6/IPsec entry points. There could be more. We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only. This function should not be called everywhere, especially not in places that temporarily manipulate (and clobber) the mbuf structure; once they're done they put the mbuf back in a correct format. commit fa3c9dbe394574a0615c56b67925f79ffe8bb631 Author: maxv Date: Sat Apr 14 17:55:47 2018 +0000 Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer overflows in non-jumbogram packets. For jumbograms we will probably be in trouble here; but it doesn't seem possible to craft reliably a jumbogram for a non-jumbogram-enabled device. So I don't think it's a huge problem. commit f1da921972a2f6c60f31a7c2f89fb43bf050fbed Author: maxv Date: Sat Apr 14 14:59:58 2018 +0000 Cosmetic, and remove one XXX (no problem). commit 53dc588064749b2966765c1e64dccf80c0a05a97 Author: maxv Date: Sat Apr 14 08:13:58 2018 +0000 cosmetic commit aa2a1dca4f00defc612c2652e533b76c8352a760 Author: maxv Date: Sat Apr 14 08:03:33 2018 +0000 Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for security reasons. We already removed it in Route6. In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with the same offset, but still using the pointer from the first call, which could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs in place, instead of freeing them. 
And in general, using a 'finaldst' pointer on the mbuf, and then modifying that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error- prone. commit 8ddeeb99d94de0db3601b52d572047d6d659a01f Author: maxv Date: Sat Apr 14 06:45:17 2018 +0000 Remove dead code. It is the same as the non-obsolete one, since ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE, and the code leads to the same errno value (EHOSTUNREACH). commit 2a61565763906282c90781cfeb8c7915b2f8da60 Author: maxv Date: Fri Apr 13 17:43:37 2018 +0000 Document "debug" in usage(). commit a043c3fe5392f7deef8404b5a39c24a7cb7d147b Author: maxv Date: Fri Apr 13 11:32:44 2018 +0000 Localify global variables, style, and add two XXXs. commit 3725917354d877e4f0199155e2955c3326a3a9b6 Author: maxv Date: Fri Apr 13 11:19:09 2018 +0000 Add XXX, using a pool would be better than kmem. commit 0b0f423dc66fd720a91190b68d84142af8e393d7 Author: maxv Date: Fri Apr 13 11:18:08 2018 +0000 Release the lock a little earlier. commit 94672d15f548040846e44c827bbc49f21f67128f Author: maxv Date: Fri Apr 13 11:01:14 2018 +0000 style commit 536461a41934f3928cac138f30f7d3331ddb99b7 Author: maxv Date: Fri Apr 13 09:34:20 2018 +0000 Remove duplicate, to better show that this place doesn't make a lot of sense. The code should probably be removed, it's a leftover from when we had #ifdef __FreeBSD__. commit 1bd6f0e3a870d995c5fce89726671a966290ea22 Author: maxv Date: Fri Apr 13 09:29:04 2018 +0000 Improve the check, we want to have len >= udphdr all the time, and not just when the packet size doesn't match the mbuf size. Normally that's not a huge problem, since IP6_EXTHDR_GET gets called earlier, so we can't have (ip_len == iphlen + len) && (len < sizeof(struct udphdr)) commit dc76e49a2806591e0c9f2f93e528b803538441b7 Author: maxv Date: Fri Apr 13 09:00:29 2018 +0000 Remove useless comment and style. commit facfc66dfa23d2a64f59c312aed341ced0c2f7b5 Author: maxv Date: Fri Apr 13 08:55:50 2018 +0000 Add XXX. 
In fact, it would be better, if all the fragments were offloaded, to quickly recompute the checksum on the fly, and keep it in the mbuf header. commit d245abf00b60650a525356b06a4ff5810ec04a68 Author: maxv Date: Fri Apr 13 08:47:46 2018 +0000 Reduce the diff between similar blocks. commit 7e610062986b87763658aea19a46c8ff3f08031a Author: maxv Date: Fri Apr 13 08:44:41 2018 +0000 Add a KASSERT, we want M_PKTHDR. commit 6d601b07dd8311ca4e75614e10f9eacd38292c65 Author: maxv Date: Fri Apr 13 08:12:51 2018 +0000 Reorder a few instructions to clarify. Replace two bcopy by memcpy. commit 347497051c378c57f694c2dc1f06ce33196d3014 Author: maxv Date: Fri Apr 13 07:36:11 2018 +0000 No, fix previous. commit eaaf343f241965a10129e645740dadd56de89c70 Author: maxv Date: Fri Apr 13 07:30:46 2018 +0000 Improve comment. commit fe1531b8351f92355810f60478046e348f694c52 Author: maxv Date: Thu Apr 12 07:45:29 2018 +0000 Make 'opts' local to rip_sbappendaddr(). commit dc90fa12dcfc9487651f795c641c1113c296c9ca Author: maxv Date: Thu Apr 12 07:28:10 2018 +0000 Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is the same everywhere. commit 90cfce5f2bebd7a794986660684dbbd915b78420 Author: maxv Date: Thu Apr 12 06:49:39 2018 +0000 Remove misleading comment; we're just checking the SP, not verifying the AH/ESP payload. While here style a bit. commit 6e119faa35c423b5182524738f239e1f4108d384 Author: maxv Date: Wed Apr 11 08:29:19 2018 +0000 Remove whitespaces/tabs, and one non-ASCII character. commit 406231786e9a7d04c8a8935fa9b0fe93d41dece1 Author: maxv Date: Wed Apr 11 08:11:20 2018 +0000 Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in ipsec_getpolicybyaddr, and only IP_FORWARDING is taken. In fact it would be good to change the 'flags' argument of ipsec4_input to be a boolean, same for ipsec_getpolicybyaddr. It would be less misleading. commit 950fe064623e1f49293c15bf1e682039fb88eca9 Author: maxv Date: Wed Apr 11 07:55:19 2018 +0000 Add comment about IPsec. 
commit 7290856a6533d480783aa8673d71ba71fe52ed5e Author: maxv Date: Wed Apr 11 07:52:25 2018 +0000 Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't overlap. commit 4c128d155e75ce4d5d5c0dbb9d173c9c0156a68d Author: maxv Date: Wed Apr 11 07:15:12 2018 +0000 Add 'static', like the prototype. commit fa2a5aa348bcb405882768466b2c93aff4491ec9 Author: maxv Date: Wed Apr 11 06:37:32 2018 +0000 Add one more XXX in the list. commit 6afc30f362100b6acdadcffd7625d9f9ffd8fe66 Author: maxv Date: Wed Apr 11 06:26:00 2018 +0000 Add XXX. commit eddcdacb7c55eb22d40ac4ad9667e833fa7b1d8d Author: maxv Date: Wed Apr 11 05:59:42 2018 +0000 Add XXX. commit 158a49fe7e1480fe675fb233d78c89bac829e67b Author: maxv Date: Wed Apr 11 05:38:47 2018 +0000 Add XXX. commit ad4ab2621ef28f47e2df2443ab5c7d858df3bc13 Author: maxv Date: Tue Apr 10 16:12:29 2018 +0000 Remove m_getclr. It is unused, confusing (vs m_clget), and is a weak implementation (eg you can't request a zeroed pkthdr mbuf). commit 69ad968d7e00b59336d1082370a0ec452dfa5006 Author: maxv Date: Tue Apr 10 15:29:46 2018 +0000 Put the "free" functions close to one another. No functional change. commit ba4b70f7c7a427d7abbf15804a52de6e0d6c3f6f Author: maxv Date: Tue Apr 10 15:27:35 2018 +0000 Localify m_ext_free. commit c5aaba1b5cbcea366e6180bb9b2e3ebccc007434 Author: maxv Date: Tue Apr 10 08:41:14 2018 +0000 Remove unused mbuf argument from arpcreate() and arplookup(). commit d540724eadb056bb0206329138589274cabf086b Author: maxv Date: Tue Apr 10 08:22:35 2018 +0000 Replace comment by KASSERT. commit d1b4a5bed3e4f35e02ff35e514e0c9ba8695e3bd Author: maxv Date: Tue Apr 10 07:53:36 2018 +0000 Improve an XXX of mine, and fix one stat. commit bb19452abf8de642ae18aecdedd797e35972a4f6 Author: maxv Date: Tue Apr 10 06:32:23 2018 +0000 add two entries commit 263edb0db4f2688365cef47d5e77689cba8d575e Author: maxv Date: Mon Apr 9 16:14:11 2018 +0000 Replace KASSERTMSG by a real check. 
L2 encapsulation protocols (at least L2TP) don't ensure the LLC is there, and in !DIAGNOSTIC configurations m_copydata will crash. Tested with L2TP. commit 02becc9fe1eec0c0b34831904d9a69220d6792bf Author: maxv Date: Mon Apr 9 11:35:22 2018 +0000 Add KASSERT. The input point expects struct ether_header to be there. Now, I'm wondering whether it can be triggered by L2 encapsulation protocols - they may not provide a contiguous area. commit 65578a8f8c3ac77298d24cb38acff9764930d972 Author: maxv Date: Mon Apr 9 11:05:59 2018 +0000 Minor stylistic changes, add XXX and fix typo. No functional change. commit 308e1529803d949010379db973bb8675db39a1e9 Author: maxv Date: Sun Apr 8 12:18:06 2018 +0000 Remove the ipre_mlast field and the TRAVERSE macro. The goal was to store in ipre_mlast the last mbuf of the chain, so that m_cat could be called on it. But it's not needed, since m_cat already does the equivalent of TRAVERSE itself. If it were needed, there would be a bug, since we don't call TRAVERSE on ipre_mlast when creating a new reassembly entry. commit 7e076be9c1e8cf58ce2c64f71df902d6d5195114 Author: maxv Date: Sun Apr 8 11:50:46 2018 +0000 Remove unused field, and sync comment with reality. commit fedd32fedf6c643cc68a769aff46eb280f4bfd1d Author: maxv Date: Sun Apr 8 08:57:37 2018 +0000 Move NPF's todo list into src/doc/TODO.npf, and add some entries. After a conversation (two months ago) with rmind and sborrill. commit d392f0398276ab0482957fae12af0b496df8868d Author: maxv Date: Sun Apr 8 05:51:45 2018 +0000 Fix bug I introduced in previous commit. commit c615b2fb7228ff4dd663597bed4776ce86370a46 Author: maxv Date: Sat Apr 7 13:48:50 2018 +0000 Remove dead code. commit d73ca0a2775e4379127799e6ae05dd1c08478c60 Author: maxv Date: Sat Apr 7 09:20:25 2018 +0000 Fix an inverted logic. nbuf_cksum_barrier returns true when the direction is PFIL_OUT and TSO is active; that is to say, it returns true when the checksum was already recomputed by the function. 
The check should be !nbuf_cksum_barrier, because otherwise we're wrongfully checksumming twice, and it causes the packet to be kicked later in tcp_input. This can be seen with a configuration of the type: procedure "norm" { normalize: "max-mss" 15000 } group default { pass all apply "norm" } The packets systematically get dropped because the checksum validation in tcp_input fails. With this patch in place, it works. commit 52248867dd7c30cdab1b66584570a5ae4e140f7c Author: maxv Date: Sat Apr 7 09:06:26 2018 +0000 Rewrite npf_fetch_tcpopts: * Instead of doing several nbuf_advance/nbuf_ensure_contig and playing with gotos, fetch the TCP options only once, and iterate over the (safe) area. The code is similar to tcp_dooptions. * When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is the one we're expecting. If it isn't, then skip the option. This wasn't done before, and not doing it allowed a packet to bypass the max-mss clamping procedure. Discussed on tech-net@. commit b90e00669585eee2fc9768e2b547d5dc1736e34c Author: maxv Date: Fri Apr 6 17:30:25 2018 +0000 Change the iteration, to make sure the ACPI_MCFG_ALLOCATION structure we're reading fits the table we allocated. Linux does the same. I have a laptop which, for some reason, reports a table size of 62 bytes. Clearly that's incorrect, it should be 60 (44 + 16). Because of the stray +2, here the kernel reads past the end of the allocated buffer, hits an unmapped VA, and panics at boot time. So the laptop can't boot. Now it boots fine. commit 2dd9f544b767ac6ca35c19690f5538245dfe8df6 Author: maxv Date: Fri Apr 6 14:50:55 2018 +0000 If we're trying to read the mss on a packet that for some reason has two MAXSEG options, we find ourselves patching the second option with the value of the first one. Fix that by using a local variable. commit 4330d00048406920bf963bbb110546ae0ddee243 Author: maxv Date: Thu Apr 5 15:04:29 2018 +0000 Set the "method" string at boot time too. 
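Note on commit b90e0066 (ACPI_MCFG_ALLOCATION iteration): the 62-byte table from the commit message makes a convenient worked example. The two loops below are assumptions about the general shape of the old and new iteration, using the sizes quoted in the message (44-byte header, 16-byte entries); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_SIZE   44	/* MCFG table header, per the commit message */
#define ENTRY_SIZE 16	/* one allocation entry, per the commit message */

/* Loose iteration (assumed old shape): may visit a partial entry. */
static size_t
count_entries_loose(size_t table_len)
{
	size_t off, n = 0;

	for (off = HDR_SIZE; off < table_len; off += ENTRY_SIZE)
		n++;
	return n;
}

/* Strict iteration: only visit entries that fit entirely in the table. */
static size_t
count_entries_strict(size_t table_len)
{
	size_t off, n = 0;

	assert(table_len >= HDR_SIZE);
	for (off = HDR_SIZE; off + ENTRY_SIZE <= table_len; off += ENTRY_SIZE)
		n++;
	return n;
}
```

For a reported length of 62, the loose loop visits a second "entry" starting at offset 60, of which only 2 bytes exist; the strict loop stops after the one complete entry.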
commit ac73c89282ab040384488c016ee21049a9632044 Author: maxv Date: Thu Apr 5 14:14:27 2018 +0000 Hum, don't let userland set bit 13, because this can crash the kernel. commit d9cd9e485125f6bf0e554ebf6ace3b2dcc1edaf6 Author: maxv Date: Thu Apr 5 14:11:20 2018 +0000 Fix the check, should be >=. commit d40dfeef09f26c6aeb82f489ff2a5695900d5b1e Author: maxv Date: Thu Apr 5 08:43:07 2018 +0000 Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but we do have the AMD DIS_IND method. commit c27049e2903ef692e1171b54ce8fa8fd080404d0 Author: maxv Date: Wed Apr 4 16:23:27 2018 +0000 Add machdep.spectre_v2.method, a string that tells which method is active. commit 481c0c597154bf7c0d86f444e8a0062d6a3a07b4 Author: maxv Date: Wed Apr 4 12:59:49 2018 +0000 Enable the SpectreV2 mitigation by default at boot time. commit f2fa46aa64bd89ba5fe9bdd6ab29babbc468ba31 Author: maxv Date: Tue Apr 3 09:03:59 2018 +0000 Remove ipsec_copy_policy and ipsec_copy_pcbpolicy. No functional change, since we used only ipsec_copy_pcbpolicy, and it was a no-op. Originally we were using ipsec_copy_policy to optimize the IPsec-PCB cache: when an ACK was received in response to a SYN, we used to copy the SP cached in the SYN's PCB into the ACK's PCB, so that ipsec_getpolicybysock could use the cached SP instead of requerying it. Then we switched to ipsec_copy_pcbpolicy which has always been a no-op. As a result the SP cached in the SYN was/is not copied in the ACK, and the first call to ipsec_getpolicybysock had to query the SP and cache it itself. It's not totally clear to me why this change was made. But it has been this way for years, and after a conversation with Ryota Ozaki it turns out the optimization is not valid anymore due to MP-ification, so it won't be re-enabled. ok ozaki-r@ commit 6f2602ba9ce5e75c8993a883d166f5608b94c070 Author: maxv Date: Tue Apr 3 08:46:01 2018 +0000 Remove unused fields and outdated comment. 
commit 899eea122bf2878434723013206f9b717a8c3bc1 Author: maxv Date: Tue Apr 3 08:02:34 2018 +0000 bcopy -> memcpy, it's obvious the areas don't overlap. commit 5410f7d38042b3be881d3ad4cdf09c2ad818b688 Author: maxv Date: Sun Apr 1 12:58:47 2018 +0000 Change the check to be <= instead of <. This fixes one occurrence of an apparently widespread division-by-zero bug in our TCP code: if a user adds huge IPv6 options with setsockopt, and if the total size of the options happens to be equal to the available space calculated for the TCP payload, t_segsz gets set to zero, and given that we then divide several things by it, the kernel crashes. commit c3be4d5f3182ffcfac30838a0f1ce0638ca55280 Author: maxv Date: Sun Apr 1 12:46:50 2018 +0000 Reorder and style, for clarity. commit f37b296df7bc2fc583b1e54264ae86ed2c74f211 Author: maxv Date: Sat Mar 31 19:27:14 2018 +0000 typo in comments commit c4e90f6c85db32c4ed55fe593c1c2830f5fdca74 Author: maxv Date: Sat Mar 31 08:43:52 2018 +0000 Rename spectreV2 -> spectre_v2, and introduce spectre_v1 (which defaults to not-mitigated). This gives the user an easy way to find out whether the system is vulnerable: machdep.spectre_v1.mitigated machdep.spectre_v2.mitigated They are also available on i386. commit 40d2cbc4cd7c3920d54c4bd6d2cc087926466cd1 Author: maxv Date: Sat Mar 31 08:30:01 2018 +0000 Reorganize to simplify. commit cf5c9d3998f916d71f55108227264691c4311d42 Author: maxv Date: Sat Mar 31 07:15:47 2018 +0000 Add #ifdef, for i386 not to panic. commit 3573bddf8e7281f1a770769f5d6713cf597b3ebc Author: maxv Date: Fri Mar 30 19:58:05 2018 +0000 Improve the detection. Future generations of Intel CPUs will have a bit to say they are not affected by Meltdown. commit eccd917ca08e5efd1c68826bc97ae6e736fd00de Author: maxv Date: Fri Mar 30 19:51:53 2018 +0000 Retrieve cpuid.7:%edx. commit b92bd77bb732e42d1fcd91754501cfbb5ce77aa4 Author: maxv Date: Fri Mar 30 19:49:49 2018 +0000 Add RDCL_NO and IBRS_ALL. 
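Note on commit 5410f7d3 (check changed from < to <=): the boundary case is easy to see in isolation. This is a sketch under assumptions: `segment_size()` and `segments_needed()` are hypothetical helpers, and the real code computes t_segsz from more inputs than the two shown here.

```c
#include <assert.h>

/*
 * Returns the usable segment size, or -1 if the options leave no room.
 * With '<' instead of '<=', avail == optlen would pass the check and
 * produce a segment size of 0.
 */
static int
segment_size(int avail, int optlen)
{
	assert(avail >= 0 && optlen >= 0);
	if (avail <= optlen)
		return -1;
	return avail - optlen;
}

/* Number of segments needed for 'len' bytes; requires segsz > 0. */
static int
segments_needed(int len, int segsz)
{
	assert(segsz > 0);
	return (len + segsz - 1) / segsz;
}
```

When the options exactly fill the available space, the caller gets an error instead of a zero that would later be used as a divisor.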
commit 9d732d46aa382e22c476cd72c3e0229ab6d2e365 Author: maxv Date: Fri Mar 30 10:01:36 2018 +0000 Fix warning when compiling Xen; FLAT_RING3_CS64 is defined in a child of xen.h, which is already included in genassym.cf. So don't redefine it. commit dc7e1d383b5922ea4727a41bfe0cf7375341b47b Author: maxv Date: Fri Mar 30 09:53:08 2018 +0000 Add #ifndef XEN, xen doesn't have speculation_barrier. commit 2011e64e8e327e2f075b52e6a3ea901b85ae0a5c Author: maxv Date: Fri Mar 30 08:57:32 2018 +0000 Remove dead code. It was introduced in rev1 (25 years ago), and is irrelevant today. commit f4e2fd891eb57f828c969abb446a6ef4fe51899f Author: maxv Date: Fri Mar 30 08:53:51 2018 +0000 Style, use NULL for pointers, use KASSERT, and don't inline huge functions, we want to debug them with DDB (and not just with GPROF). commit 13b93282b7deefc6981464a084fb23f16f23cf1f Author: maxv Date: Fri Mar 30 08:25:06 2018 +0000 Fix the log. mtod never returns NULL, so 'ip' is always non-NULL, and the 'ip6' branch is never taken. As a result we log garbage on IPv6 packets. Use ip_v instead. commit dcb39852671ddd012fe1daf16a243e6c3f4de02a Author: maxv Date: Fri Mar 30 07:11:40 2018 +0000 Use consttime_memequal instead of memcmp, to prevent side channels. This function returns 1 when the buffers are equal, contrary to memcmp, hence the !. commit 1c0f69ac1a76ec4f009d7ce0964e3c07e6530662 Author: maxv Date: Thu Mar 29 18:54:48 2018 +0000 Remove TCPREASS_DEBUG. It was introduced 20 years ago when the reassembler was being developed, but it's irrelevant today. Makes the code clearer. commit eb3b149738933248bca21054eb9f3eddad424b2b Author: maxv Date: Thu Mar 29 17:46:17 2018 +0000 Reorder/Fix comments to clarify. commit 138c19ae9effbdb2caeda09374b6791a00614c35 Author: maxv Date: Thu Mar 29 17:12:36 2018 +0000 Remove two more 'else' branches. commit 123bc4debe4a35dba8f7fb4c3cf2e9521987e447 Author: maxv Date: Thu Mar 29 17:09:00 2018 +0000 Fix memory leak, we may reallocate 'tcp_saveti' after 'findpcb'.
It's not a tragic bug, because it happens only on sockets with debug enabled. commit 4ffb1b7133ec576d5faf8650a339d998c8b5a0c1 Author: maxv Date: Thu Mar 29 17:01:46 2018 +0000 Remove 'else', makes it clearer that we leave. commit 2252451e7e8c7b9b29feec35b84467844650fec0 Author: maxv Date: Thu Mar 29 16:59:38 2018 +0000 Clarify with KASSERT. commit 0f1e599b2d3f821dcadd0d7306f0b59c720c3768 Author: maxv Date: Thu Mar 29 16:54:59 2018 +0000 Simplify the computation: m->m_pkthdr.len - sizeof(struct tcphdr) - optlen - hlen = m->m_pkthdr.len - (sizeof(struct tcphdr) + optlen + hlen) = m->m_pkthdr.len - [tcp_len] = toff commit 84674ffc6c79621980e2d4e3cd9f81414aa27da3 Author: maxv Date: Thu Mar 29 08:11:41 2018 +0000 Misc changes; no real functional change. commit 4b1d2aa0a9cb8838607ff016c3ad234e2c9834db Author: maxv Date: Thu Mar 29 07:46:43 2018 +0000 Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to understand. Also make tcp6_mtudisc() static in tcp_subr.c. commit 955fb0ef008e1a0302d8e44763f25581e5d0fdca Author: maxv Date: Thu Mar 29 07:24:26 2018 +0000 Use EOPNOTSUPP instead of EINVAL. commit 43dcfb5e48f4efa79bf6e624061b5961ed7a0b9b Author: maxv Date: Thu Mar 29 07:21:24 2018 +0000 Allow IBRS to be disabled dynamically. commit 96b24973a865c64e093040547f2417268ef58189 Author: maxv Date: Thu Mar 29 07:15:12 2018 +0000 Fix sysctl type, should be bool. commit 39a2d25208ca6c41f75456ae022dd02f36aab143 Author: maxv Date: Wed Mar 28 19:56:40 2018 +0000 The call to svs_lwp_switch can clobber %rdi/%rsi, so restore them before calling speculation_barrier. commit f9ec9bddddd9a24931117d3b38a1008b5730185c Author: maxv Date: Wed Mar 28 19:50:57 2018 +0000 oldlwp can be NULL, so ensure it isn't. commit 87589b09822a792610466d0f89aecff0534c754a Author: maxv Date: Wed Mar 28 19:47:54 2018 +0000 Add 'break', otherwise we're not gonna go very far. While here use a less error-prone syntax. 
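Note on commit dcb39852 (consttime_memequal instead of memcmp): the return convention is the part that is easy to get wrong, since it is the opposite of memcmp. The function below is a hypothetical userland stand-in named `ct_memequal()`, written only to illustrate the idea of comparing without an early exit; it is not the libkern implementation, and a compiler is free to optimize it in ways a hardened implementation would guard against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant-time comparison: returns 1 if equal, 0 otherwise. */
static int
ct_memequal(const void *p1, const void *p2, size_t len)
{
	const unsigned char *a = p1, *b = p2;
	unsigned char diff = 0;
	size_t i;

	assert(a != NULL && b != NULL);
	/* Accumulate differences over the whole buffer, with no early exit. */
	for (i = 0; i < len; i++)
		diff |= a[i] ^ b[i];
	return diff == 0;
}
```

A caller that previously wrote `if (memcmp(x, y, n))` to detect a mismatch would write `if (!ct_memequal(x, y, n))` with this convention, which matches the "hence the !" remark in the commit message.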
commit 945554e74cdcf2a0e110cd6b08c9c10dae6ac03a Author: maxv Date: Wed Mar 28 16:02:49 2018 +0000 Add the IBRS mitigation for SpectreV2 on amd64. Different operations are performed during context transitions: user->kernel: IBRS <- 1 kernel->user: IBRS <- 0 And during context switches: user->user: IBPB <- 0 kernel->user: IBPB <- 0 [user->kernel:IBPB <- 0 this one may not be needed] We use two macros, IBRS_ENTER and IBRS_LEAVE, to set the IBRS bit. The thing is hotpatched for better performance, like SVS. The idea is that IBRS is a "privileged" bit, which is set to 1 in kernel mode and 0 in user mode. To protect the branch predictor between user processes (which are of the same privilege), we use the IBPB barrier. The Intel manual also talks about (MWAIT/HLT)+HyperThreading, and says that when using either of the two instructions IBRS must be disabled for better performance on the core. I'm not totally sure about this part, so I'm not adding it now. IBRS is available only when the Intel microcode update is applied. The mitigation must be enabled manually with machdep.spectreV2.mitigated. Tested by msaitoh a week ago (but I adapted a few things since). Probably more changes to come. commit 4bc5b56711a56aa0f3f2833ec735fd3d55bca1f2 Author: maxv Date: Wed Mar 28 14:56:59 2018 +0000 Move the SpectreV2 mitigation code into a dedicated spectre.c file. The content of the file is taken from the end of cpu.c, and is copied as-is. commit 59e462a625db7de3255915ea2aa58ffcd8093ed0 Author: maxv Date: Wed Mar 28 14:43:55 2018 +0000 Several changes in syn_cache_respond: * Replace idiotic diagnostic check by KASSERT. max_linkhdr+tlen<=MCLBYTES is a widespread assumption. * Improve initialization of 'tp'. * Put panics in dead branches. * Merge two switches. commit f03ef760423cb652fdbadb917192ae5e83e5e0ac Author: maxv Date: Wed Mar 28 14:30:42 2018 +0000 Remove unused variable. 
commit 916ba4b93518847739e51ea20fe6482d4327b6b0 Author: maxv Date: Wed Mar 28 14:22:16 2018 +0000 Remove two unused args from syn_cache_get(). commit 192275a5ecda6cd0bdddb93f0fe9ed9bb56c0197 Author: maxv Date: Wed Mar 28 14:16:59 2018 +0000 Dedup: introduce tcp_urp_drop() and use it. commit da06b6e79f3ace3c1909eaf3980fbaa35715e1fa Author: maxv Date: Wed Mar 28 13:50:14 2018 +0000 Minor changes: style, improve comments (and put them at the correct place), use NULL for pointers, and add {}s to prevent confusion. commit 999e99910c03207cdd7e29c776573146efed4f73 Author: maxv Date: Fri Mar 23 09:30:55 2018 +0000 Remove #ifdef INET. Nobody is doing that in the kernel, and there are even IPv4 places that are not covered here. commit 7b2860737048474575b7c89550b984df61aee124 Author: maxv Date: Fri Mar 23 08:57:40 2018 +0000 Improve a bit here and there. Replace bcopy by memcpy/memmove. commit 1258c533cbf0f3f7e50f1787b2f6938883c569c6 Author: maxv Date: Fri Mar 23 08:34:57 2018 +0000 In addition to checking L4 in the cache, here we also need to check the protocol. The NPF entry point does not ensure that ICMPv6 can be set only in IPv6 ICMPv4 can be set only in IPv4 So we could have ICMPv6 in IPv4. commit 6280f7d595445cbaa864bd51c3718cb2f104161e Author: maxv Date: Fri Mar 23 08:28:54 2018 +0000 If we fail to advance inside TCP/UDP/ICMPv4/ICMPv6, stop pretending L4 is unknown, and error out right away. This prevents bugs in machinery, if a place looks for L4 in 'npc_proto' without checking the cache too. I've seen a ~similar problem already. commit 0f5040d262e361fdc84ec75d9edbbba2de002f81 Author: maxv Date: Thu Mar 22 21:19:28 2018 +0000 Don't pass a pointer to tcp_reass, otherwise it looks like it can modify tlen while it doesn't. commit ad9781c03ad7924b8bfefff84ce410495a97e398 Author: maxv Date: Thu Mar 22 21:10:17 2018 +0000 Rearrange a bit. No real functional change. 
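Note on commit 1258c533 (checking the protocol in addition to L4): the pairing it describes, ICMPv4 only inside IPv4 and ICMPv6 only inside IPv6, can be written as a small predicate. `icmp_proto_matches()` is a hypothetical name and the NPF code is certainly structured differently; only the protocol numbers (1 and 58) are standard values.

```c
#include <assert.h>

#define PROTO_ICMP    1	/* value of IPPROTO_ICMP */
#define PROTO_ICMPV6 58	/* value of IPPROTO_ICMPV6 */

/*
 * Returns 1 if 'proto' is acceptable inside an IP packet of version
 * 'ipver', 0 if it is an ICMP flavour carried by the wrong IP version.
 */
static int
icmp_proto_matches(int ipver, int proto)
{
	assert(ipver == 4 || ipver == 6);
	if (proto == PROTO_ICMP)
		return ipver == 4;
	if (proto == PROTO_ICMPV6)
		return ipver == 6;
	return 1;	/* other protocols are not restricted by this check */
}
```

An inspection routine for ICMPv6 would then bail out early when handed an IPv4 packet that merely claims protocol 58.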
commit 8b2f367ce0a4e247f5956b2f176e3aa43cbe6d89 Author: maxv Date: Thu Mar 22 20:48:38 2018 +0000 Don't call tcp_input_checksum again, it was already called earlier, no need to checksum twice. Then call tcp_fields_to_host a bit earlier, so that we don't need to call it in each branch. commit aa6011464d73c4a58ec4390245f3c00c4b17eb3c Author: maxv Date: Thu Mar 22 12:16:11 2018 +0000 Ah, fix compilation. I tested my previous change by loading the kernel module from the filesystem, but the Makefile didn't have DIAGNOSTIC enabled, and the two KASSERTs I added did not compile properly. commit 51ea93459cdcf26683ae54bc32aed7062eea246b Author: maxv Date: Thu Mar 22 09:04:25 2018 +0000 Retrieve the complete IPv4 header right away, and make sure we did retrieve the IPv6 option header we were iterating on. commit 6a22a8fa9ba9f7361fc0f919efde733cecebdb17 Author: maxv Date: Thu Mar 22 08:57:47 2018 +0000 Change npf_cache_all so that it ensures the potential ICMP Query Id is in the nbuf. In such a way that we don't need to ensure that later. Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither the nbuf nor npc. Adapt their callers accordingly. In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave right away, without recaching npc (not needed since we didn't touch the nbuf). This fixes the handling of Query Id packets (that I broke in my previous commit), and also fixes another possible use-after-free. commit f960283b72546b8f1f0fb5772abaa54671f5a881 Author: maxv Date: Thu Mar 22 07:32:07 2018 +0000 Fix use-after-free. The nbuf can be reallocated as a result of caching 'enpc', so it is necessary to recache 'npc', otherwise it contains pointers to the freed mbuf - pointers which are then used in the ruleset machinery. We recache 'npc' when we are sure we won't use 'enpc' anymore, because 'enpc' can be clobbered as a result of caching 'npc' (in other words, only one of the two can be cached at the same time). 
Also, we recache 'npc' unconditionally, because there is no way to know whether the nbuf got clobbered relatively to it. We can't use the NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the cache. Discussed with rmind@. commit da03644a013ed3bf28957db118d6b70fce70766d Author: maxv Date: Wed Mar 21 17:03:09 2018 +0000 Localify and remove unused prototypes. commit 73204a39b7db6eac652cab91c4656a575aad5899 Author: maxv Date: Wed Mar 21 16:26:04 2018 +0000 Remove these global variables. They are unused, racy, and the only thing they do is triggering cache synchronization latencies between CPUs. commit 76ca46dc7b6736e2194d6ad09b671bc8438ee88b Author: maxv Date: Wed Mar 21 15:36:28 2018 +0000 Add XXX (we don't handle IPv6 Jumbograms), and whitespace. commit 1c9c60ff3c3057cd4a4b7934cbc26acab7609c27 Author: maxv Date: Wed Mar 21 15:33:25 2018 +0000 Fix an untriggerable memory leak. carp_prepare_ad does not fail, so switch it to void. commit 80fecd6dbdd3be5edf568359916a241561928d8b Author: maxv Date: Wed Mar 21 10:08:16 2018 +0000 Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets. AH must be considered as the payload, otherwise a block all pass in proto ah from any pass out proto ah from any configuration will actually block everything, because NPF checks the protocol against the one found after AH, and not AH itself. In addition it may have been a problem for stateful connections; an AH packet sent by an attacker with an incorrect authentication and a correct TCP/UDP/whatever payload from an active connection could manage to change NPF's FSM state, which would perhaps have altered the legitimate connection with the authenticated remote IPsec host. Note that IPv4 already doesn't go beyond AH, which is the correct behavior. commit fb54665e73407bd85293a3f3270ea6eae86e8798 Author: maxv Date: Tue Mar 20 18:27:58 2018 +0000 (Re)Fix handling of segment register faults. 
My previous attempt did fix faults occurring when reloading %es/%ds/%fs/%gs, but it did not fix faults occurring when executing 'iretq', because before iretq we needed to do +16 in %rsp, and the resulting stack layout was not the one kernuser_reenter() expected (tf_trapno and tf_err were not there). So now: pop tf_trapno and tf_err right away in intrfastexit(), and update the layout in kernuser_reenter() accordingly. The resulting code is actually simpler. Tested by "hardcoding" an iretq fault; the process correctly receives a SIGSEGV. (Note that segment register faults do not happen in the wild, you really need to try hard to trigger one.) commit 3ed90cfec8de411c6d7f68fd4cde88768a90eb60 Author: maxv Date: Tue Mar 20 14:26:49 2018 +0000 Remove the sysretq fault handler. It is broken with SVS, and not really needed anyway. Initially I had added it so that if such a fault was received the kernel would panic "cleanly" instead of crashing in a potentially undefined way. I'll re-add this handler later. commit 5121c2b7362a3b0c1285d3db8f72c52a425e50d1 Author: maxv Date: Sat Mar 17 17:12:39 2018 +0000 Add missing opt_svs.h. commit ec6e715aa5babd4148c76260d82c00a184c1b733 Author: maxv Date: Sat Mar 17 10:42:23 2018 +0000 Set the scopes before calling icmp6_error(). This fixes a bug similar to the one I fixed in rev1.17: since the scopes were not set the packet was never actually sent. Tested with wireshark, now the ICMPv6 reply is correctly sent, as expected. commit 6e3e7474181b7a22a2cb378354f56bf9064ac17c Author: maxv Date: Sat Mar 17 10:21:09 2018 +0000 Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this caused the "return-rst" rules to send back an RST with the wrong ACK when the received SYN had an IPv6 option. commit 9614cf026e61b4dbdb72ecb6288497c0f6663d9f Author: maxv Date: Fri Mar 16 12:48:54 2018 +0000 Remove ipkdb from i386. Also remove unused references in amd64. I already talked about doing that six months ago on port-i386@.
Back then it was a general cleanup, but now, with SVS etc, we do actually have good reasons for simplifying the entry points. Ok kamil@. (christos@ was in the conversation too) commit 282efa56e680699ec4846d9e363aea8275fb0ed4 Author: maxv Date: Fri Mar 16 12:21:50 2018 +0000 Remove the prototypes for cpu_uarea_*, I removed these functions two minutes ago. commit 554c8423e77a87b22c6288c2857bf87a8b7169ce Author: maxv Date: Fri Mar 16 12:19:35 2018 +0000 Remove the __HAVE_CPU_UAREA_ROUTINES code from x86. It was available only in amd64, and I disabled it a few months ago in order to support SVS. Regardless of SVS this option was questionable, since it made stack overflows more difficult to detect. commit 051fab14bceab6386e43f500f762f526a067cdbe Author: maxv Date: Fri Mar 16 08:48:34 2018 +0000 Rename "handle_" -> "Xhandle_", and add the function names (introduced by SVS) in db_machdep.c. Should fix the DDB part of PR/53060. commit 41ad27e402d770a64935f4b60263c9a026bad140 Author: maxv Date: Fri Mar 16 08:21:56 2018 +0000 Add one more page for the stack, to compensate for the fact that SVS's stack switching mechanism consumes approximately one page. commit 9c5c08b4b5e3a6f92ffdd1d32f4da69701c69ad8 Author: maxv Date: Thu Mar 15 09:17:31 2018 +0000 Remove #ifdef XEN (Xen has its own cpu.c), and add a comment. commit 3de5606cb1ea2b120b8f3a699e370d3a440c168e Author: maxv Date: Thu Mar 15 08:15:21 2018 +0000 Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a "require" IPsec policy is not enforced on them, and unauthenticated packets will be accepted. Tested with a require-AH configuration. Sent on tech-net@, no comment. commit 40ebd81a138abff2aa073a2dc1f5d07902f8f904 Author: maxv Date: Wed Mar 14 17:40:41 2018 +0000 Spectre V2 mitigation for certain families of AMD CPUs. A new sysctl is added, machdep.spectreV2.mitigated, that controls whether Spectre V2 is mitigated. For now it defaults to "false".
The code is written in such a way that there can be several methods. For now only one method is supported, on AMD Families 10h, 12h and 16h, where an MSR is available to disable branch prediction entirely. Compile-tested on Intel, AMD will be tested soon. commit 9ec91d8b065109578d737570d1ca40e53a165db0 Author: maxv Date: Wed Mar 14 15:03:16 2018 +0000 ... and also add IBPB ... commit 59abce50408f2ec35a0471d8d0a1a09c8622d368 Author: maxv Date: Wed Mar 14 14:44:25 2018 +0000 Add the IBRS and STIBP MSRs. commit c24f5fc354001821bbcc7247c3dde27dd4547c75 Author: maxv Date: Wed Mar 14 14:15:02 2018 +0000 Add IC_CFG.DIS_IND: "Disable Indirect Branch Predictor". Available (at least) on AMD Families 10h, 12h and 16h. commit 653bddb63fc5b5a34e052fc4a753286de57953c8 Author: maxv Date: Wed Mar 14 09:32:04 2018 +0000 Fix the "return-rst" rule on IPv6 packets. The scopes needed to be set on the addresses before invoking ip6_output, because ip6_output needs them. The reason they are not here already is because pfil_run_hooks (in ip6_input) is called _before_ the kernel initializes the scopes. Until now ip6_output was always failing, and the IPv6-TCP-RST packet was never actually sent. Perhaps it would be better to have the kernel initialize the scopes before invoking pfil_run_hooks, but several things will need to be fixed in several places. Tested with a simple TCPv6 server. Until now the client would block waiting for an answer that never came; now it receives an RST right away and closes the connection, as expected. I believe that the same problem exists in the "return-icmp" rules, but I can't investigate this right now (some problems with wireshark). commit ecaa9a185ae5d3975e51513add0195d653583d27 Author: maxv Date: Tue Mar 13 16:52:42 2018 +0000 Fix wrong order; first enable WP, then enable interrupts. Otherwise we might get an interrupt before re-enabling WP, and be rescheduled as a result. 
In practice it never happens, because the previous PSL always has interrupts disabled too. commit bd5d573bd1329b9ec66459278a1433e48f398141 Author: maxv Date: Tue Mar 13 16:45:52 2018 +0000 Mmh, add a missing x86_disable_intr(). My intention there was to ensure interrupts were disabled before the barriers. commit f5efdf3f9bcd5164402e5c88eab8a6831d191ebf Author: maxv Date: Tue Mar 13 16:23:40 2018 +0000 Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF is not happy in npf_reassembly, because NPC_IPFRAG is again returned after the packet was reassembled. I'm wondering whether it would not be better to just remove the fragment header in frag6_input directly. commit 134d0a47c7884da81e41b7acccd638fc231029c2 Author: maxv Date: Tue Mar 13 09:04:02 2018 +0000 Fix two consecutive mistakes. The first mistake was npf_inet.c rev1.37: "Don't reassemble ipv6 fragments, instead treat the first fragment as a regular packet (subject to filtering rules), and pass subsequent fragments in the same group unconditionally." Doing this was entirely wrong, because then a packet just had to push the L4 payload in a secondary fragment, and NPF wouldn't apply rules on it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake was supposed to be a fix for the second mistake. The second mistake was that ip6_reass_packet (in npf_reassembly) was getting called with npc->npc_hlen. But npc_hlen pointed to the last encountered header in the IPv6 chain, which was not necessarily the fragment header. So ip6_reass_packet was given garbage, and would fail, resulting in the packet getting kicked. So basically IPv6 was broken by NPF. The first mistake is reverted, and the second one is fixed by doing: - hlen = sizeof(struct ip6_frag); + hlen = 0; Now the iteration stops on the fragment header, and the call to ip6_reass_packet is valid. 
My npf_inet.c rev1.38 is partially reverted: we don't need to worry about failing properly to advance; once the packet is reassembled npf_cache_ip gets called again, and this time the whole chain should be there. Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the packet gets correctly reassembled by NPF now. commit 0e0638cff3dbe4949c895daa16bd3ab8873a9ccc Author: maxv Date: Mon Mar 12 12:45:26 2018 +0000 Remove dead branches, 'npc' can't be NULL (and it is dereferenced earlier). commit 21ed6a9c08be1cb023584fde2fc203eaa14a9a4b Author: maxv Date: Sun Mar 11 13:38:02 2018 +0000 Explain the TSC drift thing. commit 0daf215bbc1c6bb988acfaa98930331e85a934cb Author: maxv Date: Sat Mar 10 17:52:50 2018 +0000 Add KASSERTs. commit c4c2ccdda6bcd177eb18dc203277820d9fed793d Author: maxv Date: Sat Mar 10 17:48:32 2018 +0000 Fix the computation. Normally that's harmless since ip6_output recomputes ip6_plen. commit 61aa2b4e4b3544bdf7e87892ab0bbfb68e03196c Author: maxv Date: Fri Mar 9 11:57:38 2018 +0000 Remove M_PKTHDR from secondary mbufs when reassembling packets. This is a real problem, because I found at least one component that relies on the fact that only the first mbuf has M_PKTHDR: far from here, in m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a secondary mbuf. (The initial intention there was to avoid updating m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're dealing with the first mbuf.) Therefore, when handling fragmented IPsec packets (in particular IPv6, IPv4 is a bit more complicated), we may end up with an incorrect m_pkthdr.len after authentication or decryption. In the case of ESP, this can lead to a remote crash on this instruction: m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree); m_pkthdr.len is bigger than the actual mbuf chain. It seems possible to me to trigger this bug even if you don't have the ESP key, because the fragmentation part is outside of the encrypted ESP payload. 
So if you MITM the target, and intercept an incoming ESP packet (which you can't decrypt), you should be able to forge a new specially-crafted, fragmented packet and stuff the ESP payload (still encrypted, as you intercepted it) into it. The decryption succeeds and the target crashes. commit 0079e117e332a5de2d696275c2bca8f39534b9d6 Author: maxv Date: Thu Mar 8 07:54:14 2018 +0000 Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer magic values. commit 2d3c34b0db6ce41dbfb408b539d0ee3dafe74649 Author: maxv Date: Thu Mar 8 07:06:13 2018 +0000 Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header, we are allowed to fail to advance, otherwise we kick the packet. Sent on tech-net@ a few days ago, no response, but I'm committing it now anyway. commit 559705543e7dba7d9740517f33f2524c56b550e0 Author: maxv Date: Tue Mar 6 17:39:36 2018 +0000 Perform the IP (src/dst) checks _before_ calling the packet filter, because if the filter has a "return-icmp" rule it may call icmp6_error with an src field that was not entirely validated. commit 44145c3bb039f90343ceae5e52042679981b46ac Author: maxv Date: Mon Mar 5 12:42:28 2018 +0000 Improve stupid check, style, and fix leak (m, not m0). commit 1010a62369987b30f31c088085ddd4df71bbe05b Author: maxv Date: Mon Mar 5 11:50:25 2018 +0000 Call m_pullup earlier, fixes one branch. commit ebc45dd1860c9b1f6fa904721bf08d4eb1878b65 Author: maxv Date: Sat Mar 3 09:54:55 2018 +0000 Reduce the diff between ipsec4_output and ipsec6_check_policy. While here style. commit b2d4c7a166d64932f992fb7eb76da44e68d5e26e Author: maxv Date: Sat Mar 3 09:47:01 2018 +0000 Dedup. commit 6e7f682087e860b41a0bc03f39466f9a6c9da0b0 Author: maxv Date: Sat Mar 3 09:39:29 2018 +0000 Add KASSERTs, we don't want m_nextpkt in ipsec{4/6}_process_packet. 
commit b6695af912509cddf9ba1c55239b2f2449c47a46 Author: maxv Date: Thu Mar 1 16:55:01 2018 +0000 Replace PG_G by pmap_pg_g, for the sake of removing references to the former. No functional change since pmap_pg_g = PG_G. commit ee9e9ddc413d9498a36bc999281e45dbfc4c5373 Author: maxv Date: Thu Mar 1 16:49:06 2018 +0000 Remove these two KASSERTs. Thinking about it, they may fire when the user enters "sysctl -w machdep.svs.enabled=0", if the xcall is received between the 'svs_enabled' check in the caller and the same check in these KASSERTs. In such a case we perform an SVS operation with svs_enabled set to false, but that's intentional: after it is done svs_pmap_sync and svs_lwp_switch won't be called anymore, the pdir synchronization is dropped. Having said that, I didn't see these KASSERTs getting triggered. commit 444f885dcc9bd97f2d7507acbb027cb260a08a3b Author: maxv Date: Thu Mar 1 06:08:43 2018 +0000 Revert rev1.183 (2003). It was intended as an optimization, but it increases the attack surface: the IPsec policy is not enforced on RST packets when the socket is in the LISTEN state, and an (unauthenticated) attacker could jam the connection between two IPsec hosts by sending RST packets between the client's SYN and ACK packets. Discussed with ozaki-r@. commit 55e06a05c43782f1b324f6e5c96b88e7637c2b49 Author: maxv Date: Wed Feb 28 11:29:14 2018 +0000 add missing static commit 688bab5f8cf01cb18da6d83d74bd802dac809f31 Author: maxv Date: Wed Feb 28 11:23:24 2018 +0000 Remove unused ipsec_private.h includes. commit e9e3085492597b6e8481db4d0c4c1f178818f99f Author: maxv Date: Wed Feb 28 11:19:49 2018 +0000 Remove unused macros, and while here style. commit 55844e6a435c64c16798e3dd04c0a7d5a32e1cbb Author: maxv Date: Wed Feb 28 11:10:22 2018 +0000 (just forgot to commit this file, the message was) Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject already increases it. IPSEC6_STATINC is now unused, so remove it too. 
commit 978b00fe5793e35c68d3614d0ae6e952a21e735f Author: maxv Date: Wed Feb 28 11:09:03 2018 +0000 Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject already increases it. IPSEC6_STATINC is now unused, so remove it too. commit f217c4945cd13e6b5376da1ec115dd677e44a2b8 Author: maxv Date: Wed Feb 28 10:30:20 2018 +0000 Remove unused mbuf tags. commit 49a7f008a77bbf2a0bfd98624dd7ab665da53d9d Author: maxv Date: Wed Feb 28 10:16:19 2018 +0000 Dedup: merge ipsec4_setspidx_inpcb and ipsec6_setspidx_in6pcb. commit 5dad6a2a92984ac77ccbdfdda431d06b7a3a6993 Author: maxv Date: Wed Feb 28 10:09:17 2018 +0000 ipsec6_setspidx_in6pcb: call ipsec_setspidx() only once, just like the IPv4 code. While here put the correct variable in sizeof. ok ozaki-r@ commit 54c92d6260ec2282a3de49d11a9dc855d733b528 Author: maxv Date: Tue Feb 27 15:01:30 2018 +0000 Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the original ipsec_set_policy function is inlined into the new one. commit 9825405f8d3e53a377655fceffe78d1a367653a1 Author: maxv Date: Tue Feb 27 14:52:51 2018 +0000 Remove duplicate checks, and no need to initialize 'newsp' in ipsec_set_policy. commit c66c029e2c0274ffbc4af65fea2aa243a62d0ec3 Author: maxv Date: Tue Feb 27 14:45:43 2018 +0000 Oops, forgot this file; I just merged two IPsec functions, so adapt the rump stubs accordingly. commit 89dca4dfa2d7b8a93551f8d37b975c589467c09e Author: maxv Date: Tue Feb 27 14:44:10 2018 +0000 Dedup: merge ipsec4_get_policy and ipsec6_get_policy ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy The already-existing ipsec_get_policy() function is inlined in the new one. commit ce70003a68f6b37e52779875f1244a9dcec83563 Author: maxv Date: Tue Feb 27 14:28:01 2018 +0000 Remove the Econet code. It was part of acorn26, which was removed a month ago. 
commit 13802389806a0e6e349b2316800cb553b24c510e Author: maxv Date: Tue Feb 27 14:14:19 2018 +0000 style and fix typo commit 43ddd5596e17d94ff1f9cbb2f390acb77a380d38 Author: maxv Date: Tue Feb 27 13:36:21 2018 +0000 Use inpcb_hdr to reduce the diff between ipsec4_set_policy and ipsec6_set_policy ipsec4_get_policy and ipsec6_get_policy ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy No real functional change. commit 659b513c69af0895d5a6b6b64c616794a7f108ce Author: maxv Date: Tue Feb 27 08:05:19 2018 +0000 Optimize: use ipsec_sp_hdrsiz instead of ipsec_hdrsiz, not to re-query the SP. ok ozaki-r@ commit 55ff5a23d32788b4d68f95a48c2af312f044a7e9 Author: maxv Date: Mon Feb 26 10:36:24 2018 +0000 Dedup: call ipsec_in_reject directly. IPSEC_STAT_IN_POLVIO also gets increased now. commit 70672d9e03705eff56d7b24ebc5ba686239ef6b4 Author: maxv Date: Mon Feb 26 10:19:13 2018 +0000 Reduce the diff between ipsec6_input and ipsec4_input. commit 02364b42469a3449afa1a5e4b3a40408b367a25e Author: maxv Date: Mon Feb 26 09:13:00 2018 +0000 Remove redundant condition (harmless). PR/53030. commit a60da3dc9a07dd335f8892ad204272d38771e1c0 Author: maxv Date: Mon Feb 26 09:04:29 2018 +0000 Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject. While here fix misleading comment. ok ozaki-r@ commit fdb572c2b89fbc4f24fa07c2be4f44d633d1cd6e Author: maxv Date: Mon Feb 26 08:50:25 2018 +0000 Dedup: merge ipsec4_hdrsiz and ipsec6_hdrsiz into ipsec_hdrsiz. ok ozaki-r@ commit 10858aa57a8be096bbc1e941ca9455be0312a7b4 Author: maxv Date: Mon Feb 26 08:42:16 2018 +0000 Dedup: merge ipsec4_checkpolicy and ipsec6_checkpolicy into ipsec_checkpolicy. ok ozaki-r@ commit 1027f7a8a0ddff5ee433982289660129b6030de3 Author: maxv Date: Mon Feb 26 06:58:56 2018 +0000 If 'skip' is lower than sizeof(struct ip), we are in trouble. So remove a nonsensical branch, and add a panic at the beginning of the function. 
commit 5482b8b147dbd8b118d2c7a621736b3016fd192d Author: maxv Date: Mon Feb 26 06:53:22 2018 +0000 m is never allowed to be NULL, so turn the KASSERT (and the null check) to a panic. commit 9995172d07c7c4b008076041d81395f90015cdf7 Author: maxv Date: Mon Feb 26 06:48:01 2018 +0000 Fix nonsensical checks, neither in6p nor request is allowed to be NULL, and the former is already dereferenced in a kassert. This code should be the same as ipsec4_set_policy. commit 295566d6346499290655bda23c8992d622c0995b Author: maxv Date: Mon Feb 26 06:41:27 2018 +0000 Add XXX, it seems to me we need to free the mbuf here. commit d9b7fbf4cc19b0a34a763f81f08b1e5791029ab8 Author: maxv Date: Mon Feb 26 06:40:08 2018 +0000 Reinforce this area, make sure the length field fits the option. Normally it always does because the options were already sanitized earlier. commit 275a7311d5365d95924921c693dd763718a59685 Author: maxv Date: Mon Feb 26 06:34:39 2018 +0000 Fix mbuf mistake: we are using ip6 before it is pulled up properly. commit cefc36318ad9587213af73ada110e06c54a77ed1 Author: maxv Date: Mon Feb 26 06:17:01 2018 +0000 Merge some minor (mostly stylistic) changes from last week. commit 64d8338c16b6f005807bc958dc465bc7d012e840 Author: maxv Date: Mon Feb 26 05:52:50 2018 +0000 Enable SVS by default. commit 9a6ecc21ae7dfd9f3d3240ca942fe2f18d7f43c6 Author: maxv Date: Sun Feb 25 13:15:35 2018 +0000 Remove the first entry from the todo list, it's handled properly now. commit cd4d3a660f9e1429234062be1474d49ccf92973c Author: maxv Date: Sun Feb 25 13:14:27 2018 +0000 Remove INTRENTRY_L, it's not used anymore. commit 1f9f570e48068a961906043a585a03a42e29eac2 Author: maxv Date: Sun Feb 25 13:09:33 2018 +0000 Mmh. We shouldn't read %cr2 here. %cr2 is initialized by the CPU only during page faults (T_PAGEFLT), so here we're reading a value that comes from a previous page fault. 
That's a real problem; if you launch an unprivileged process, set up a signal handler, make it sleep 10 seconds, and trigger a T_ALIGNFLT fault, you get in si_addr the address of another LWP's page - and perhaps this can be used to defeat userland ASLR. This bug has been there since 2003. commit 938088c4419d0b74d615bc9f904d4fdace9db1b2 Author: maxv Date: Sun Feb 25 12:37:16 2018 +0000 Fix handling of segment register faults when running with SVS. The behavior is changed also in the non-SVS case. I've put a documentation in amd64_trap.S. Basically, the problem with SVS is that if iret faults, we already have a full trapframe pushed on the stack and the CPU will push another frame on this stack (nested), but it hits the redzone below the stack since it is still running with the user page table loaded. To fix that, we pop a good part of the trapframe earlier in intrfastexit. If iret faults, the current %rsp has enough room for an iret frame, and the CPU can push that without problem. We then switch back to the outer iret frame (the frame the CPU was trying to pop by executing iret, but that it didn't pop for real because iret faulted), call INTRENTRY, and handle the trap as if it had been received from userland directly. commit 440412b0100251fc352d14dd813795b8fc2f2acf Author: maxv Date: Sun Feb 25 11:57:44 2018 +0000 Ah. Don't use NENTRY() to declare check_swapgs, use LABEL() instead. NENTRY puts the code in the .text section, so the effect of TEXT_USER_BEGIN was overwritten, and check_swapgs was not put in the .text.user section. As a result kernels running SVS would crash when jumping here - because we execute this place with the user page table loaded, and in this page table only .text.user is mapped. While here, rename check_swapgs -> kernuser_reenter, because we do more things than just SWAPGS. commit 4bba47bb021795864b0f6d71271ad4685ff77aa4 Author: maxv Date: Sun Feb 25 08:28:55 2018 +0000 Replace %rax -> %rdi, so that check_swapgs clobbers only one register. 
commit 6f4bc6abb7ad29f01fe004f71235e824b103ac85 Author: maxv Date: Sun Feb 25 08:09:07 2018 +0000 There are two places where we reload %gs: * In setusergs. Here we can't fault. So we don't need to handle this case. * In intrfastexit for 32bit processes. This case needs to be handled, and we already have a label. So use the label instead of disassembling %rip. commit ff51a6812ef509f5c6f06f9814f17a78ebabb349 Author: maxv Date: Sat Feb 24 19:52:46 2018 +0000 Fix one thing in the documentation, I meant to say only SVS_UTLS. commit e4d12582a357160bbb6e80b7168048626d5a1fa9 Author: maxv Date: Sat Feb 24 17:12:10 2018 +0000 Use %rax instead of %r15 in the non-SVS case, to reduce the diff against SVS. In SVS we use %rax instead of %r15 because the following instructions cannot be encoded: movq %r15,SVS_UTLS+UTLS_SCRATCH movq SVS_UTLS+UTLS_RSP0,%r15 commit ec167c9c3517526768839c4a9ebe95f0d37435b5 Author: maxv Date: Sat Feb 24 10:31:30 2018 +0000 Document SVS. Also, remove an entry from the todo list. commit 248b6b2592474fc28f92c3c73f68b6bba3c81c8b Author: maxv Date: Fri Feb 23 19:43:08 2018 +0000 Fix off-by-one, we don't want the entry point to equal the maximum address. commit 03203329a119dc53e979df486f338d2767c652a7 Author: maxv Date: Fri Feb 23 19:39:27 2018 +0000 Add a new entry in the TODO list. commit 16f7d07e089d0310b2752ffe1b1dce5cb7eb2024 Author: maxv Date: Fri Feb 23 14:16:52 2018 +0000 Revert previous, we'll need something better (and compatible with Clang). commit 2dd76723d5817b3e4f823522116c9253702cc2b3 Author: maxv Date: Fri Feb 23 09:57:20 2018 +0000 Change the SVS node, from machdep.svs_enabled to machdep.svs.enabled. commit 76beaea9c2d9da3ff2e58cef16fee0e89fdac62c Author: maxv Date: Fri Feb 23 09:00:55 2018 +0000 Add -fno-shrink-wrap, to force GCC to push the frames at the very beginning of the functions. Otherwise DDB is unable to display a correct stack trace if a fault occurred in a function before the frame was pushed. 
Discussed on tech-kern@, flag suggested by Krister Walfridsson. Should fix PR/52560. commit 0a9052503b4fe69e92a07f0cf50c1cf923489426 Author: maxv Date: Thu Feb 22 14:57:11 2018 +0000 Adapt previous; put #ifdef SVS around the declaration directly. commit d67bbdb34281e22fe020c3a6bb7dd825e1bfab76 Author: maxv Date: Thu Feb 22 13:27:17 2018 +0000 Remove svs_pgg_update(). Instead of manually changing PG_G on each page, we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that. In addition, install CR4_PGE when SVS is disabled manually (via the sysctl). Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance completely, exactly as if SVS hadn't been enabled in the first place. commit 6f44eed14f40ce3d235b0f5c002f999d6dd43967 Author: maxv Date: Thu Feb 22 11:57:39 2018 +0000 Ensure the CPUs are all online. We take cpu_lock, so nobody can go offline in the meantime. commit e6fb53bd48ae58415222d5ec4553167445c77ea9 Author: maxv Date: Thu Feb 22 10:42:10 2018 +0000 Make the machdep.svs_enabled sysctl writable, and add the kernel code needed to disable SVS at runtime. We set 'svs_enabled' to false, and hotpatch the kernel entry/exit points to eliminate the context switch code. We need to make sure there is no remote CPU that is executing the code we are hotpatching. So we use two barriers: * After the first one each CPU is guaranteed to be executing in svs_disable_cpu with interrupts disabled (this way it can't leave this place). * After the second one it is guaranteed that SVS is disabled, so we flush the cache, enable interrupts and continue execution normally. Between the two barriers, cpu0 will disable SVS (svs_enabled=false and hotpatch), and each CPU will restore the generic syscall entry point. Three notes: * We should call svs_pgg_update(true) afterwards, to put back PG_G on the kernel pages (for better performance). This will be done in another commit. 
* The fact that we disable interrupts does not prevent us from receiving an NMI, and it would be problematic. So we need to add some code to verify that PMCs are disabled before hotpatching. This will be done in another commit. * In svs_disable() we expect each CPU to be online. We need to add a check to make sure they indeed are. The sysctl allows only a 1->0 transition. There is no point in doing 0->1 transitions anyway, and it would be complicated to implement because we need to re-synchronize the CPU user page tables with the current ones (we lost track of them in the last 1->0 transition). commit 1b63209bca296e853d41f4651a98e61648ef90a8 Author: maxv Date: Thu Feb 22 10:26:32 2018 +0000 Mmh, add #ifdef SVS around svs_init(). commit f57bb10e83cf68e4194f98830f5fb943b49a42d9 Author: maxv Date: Thu Feb 22 09:41:06 2018 +0000 Improve the SVS initialization. Declare x86_patch_window_open() and x86_patch_window_close(), and globalify x86_hotpatch(). Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching. Change svs_init() to take a bool. This function gets called twice; early when the system just booted (and nothing is initialized), lately when at least pmap_kernel has been initialized. commit 7089eaf0ece0486620b23a0e767a77959bd21834 Author: maxv Date: Thu Feb 22 08:56:51 2018 +0000 Add a dynamic detection for SVS. The SVS_* macros are now compiled as skip-noopt. When the system boots, if the cpu is from Intel, they are hotpatched to their real content. Typically: jmp 1f int3 int3 int3 ... int3 ... 1: gets hotpatched to: movq SVS_UTLS+UTLS_KPDIRPA,%rax movq %rax,%cr3 movq CPUVAR(KRSP0),%rsp These two chunks of code being of the exact same size. We put int3 (0xCC) to make sure we never execute there. In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that the SVS_* macros are small, this jump will likely leave us in the same icache line, so it's pretty fast. 
The syscall entry point is special, because there we use a scratch uint64_t not in curcpu but in the UTLS page, and it's difficult to hotpatch this properly. So instead of hotpatching we declare the entry point as an ASM macro, and define two functions: syscall and syscall_svs, the latter being the one used in the SVS case. While here 'syscall' is optimized not to contain an SVS_ENTER - this way we don't even need to do a jump on the non-SVS case. When adding pages in the user page tables, make sure we don't have PG_G, now that it's dynamic. A read-only sysctl is added, machdep.svs_enabled, that tells whether the kernel uses SVS or not. More changes to come, svs_init() is not very clean. commit 70e0dfab9e458b9d302d985ce0747eea96d2f76c Author: maxv Date: Thu Feb 22 08:36:31 2018 +0000 Revert all my latest changes, and restore this file back to how it was in rev1.24. I wanted to replace the functions dynamically for SVS, but that was a dumb idea, we'll just hotpatch instead. commit 627456174c43aa4f769374a99e753aec43348be6 Author: maxv Date: Wed Feb 21 17:04:52 2018 +0000 Style, no functional change. commit bcdb0e8683950f2ae2e60b2c975b9475681d7ae4 Author: maxv Date: Wed Feb 21 16:55:53 2018 +0000 Strengthen this check, to make sure there is room for an ip6_ext structure. Seems possible to crash m_copydata here (but I didn't test more than that). commit b9aa0edaf1c7efa8628bedb9439dbf2ad27b6415 Author: maxv Date: Wed Feb 21 16:48:28 2018 +0000 Argh, in my previous commit in this file I forgot to fix the IPv6 entry point; apply the same fix there. commit c0535d5527796bbff4ccb620ccd97e730d7afcb9 Author: maxv Date: Wed Feb 21 16:42:33 2018 +0000 Fix ipsec4_get_ulp(). We should do "goto done" instead of "return", otherwise the port fields of spidx are uninitialized. 
ok mlelstv@ commit 0a956b375a944fe6c8bf1ecf18d30049ae085220 Author: maxv Date: Wed Feb 21 16:38:15 2018 +0000 Use inpcb_hdr to reduce the diff between: ipsec4_hdrsiz and ipsec6_hdrsiz; ipsec4_in_reject and ipsec6_in_reject; ipsec4_checkpolicy and ipsec6_checkpolicy. The members of these couples are now identical, and could be merged, giving only three functions instead of six... commit 02abc3407d371a0b45c3afa4e2e29937bb143261 Author: maxv Date: Wed Feb 21 16:18:52 2018 +0000 Rename: ipsec_in_reject -> ipsec_sp_reject ipsec_hdrsiz -> ipsec_sp_hdrsiz localify the former, and do some cleanup while here. commit 560004ab188b7f2cd1e6688ec1bd5268b85a529c Author: maxv Date: Wed Feb 21 16:08:55 2018 +0000 Extend these #ifdef notyet. The m_copydata's in these branches are wrong, we are not guaranteed to have enough room for another struct ip, and we may crash here. Triggerable remotely, but after authentication, by sending an AH packet that has a one-byte-sized IPIP payload. commit cc9db41946a306043dbe890dc1de152a97ee988f Author: maxv Date: Sun Feb 18 14:32:31 2018 +0000 Pass the name of the function as argument in SWAPGS_HANDLER. commit b4d37c062509d62826afdcf1172e77c4e716f99d Author: maxv Date: Sun Feb 18 14:07:29 2018 +0000 Add svs_enabled, which defaults to 'true' when SVS is compiled (no dynamic detection yet). commit 6d2fc8dd98cbb99bed9c3f733bd2c233b355b611 Author: maxv Date: Sat Feb 17 21:05:58 2018 +0000 Declare check_swapgs in an ASM macro. No real functional change. commit 42024de930105aebd57efe338c1ab970001fa8fa Author: maxv Date: Sat Feb 17 20:59:14 2018 +0000 Use ASM macros for the rest of the entry points. No real functional change. Now the format of the entry points is: .macro TRAP_ENTRY_POINT_xx arg1,arg2,arg3 ...the asm code... .endm TEXT_USER_BEGIN TRAP_ENTRY_POINT_xx arg1,arg2,arg3 TEXT_USER_END commit 09f2dda95ccb299bf4314f361eb70925fd75b22d Author: maxv Date: Sat Feb 17 20:47:04 2018 +0000 Declare and use TRAP_ENTRY_POINT_DNA.
This time we don't give an is_ztrap argument, because the macro is tied to the entry point, and it would be wrong to suggest the parameter is controllable. No real functional change. commit 493da9e4bd17db1e62803f56a973bccada408af0 Author: maxv Date: Sat Feb 17 20:41:57 2018 +0000 Now that [Z]TRAP and [Z]TRAP_NJ are identical, put back the INTRENTRY jmp .Lalltraps_noentry instructions for Xen, and remove [Z]TRAP_NJ. commit 530c701d7f0d2fd8741f2cb060ff477ccd42145f Author: maxv Date: Sat Feb 17 20:33:28 2018 +0000 Declare and use TRAP_ENTRY_POINT_SPUR. No real functional change. commit b745a559d15fc4bcc9460297f60dcc79b205c983 Author: maxv Date: Sat Feb 17 20:28:18 2018 +0000 Declare and use TRAP_ENTRY_POINT_FPU. No real functional change. commit 096c4a13475f94038b5d23d49319a2cad5d19052 Author: maxv Date: Sat Feb 17 20:22:05 2018 +0000 Start using ASM macros to define the trap entry points. No real functional change. commit 964929f8439eb022843a22ea5c83ab01ae5bd3d2 Author: maxv Date: Sat Feb 17 19:26:20 2018 +0000 Define legacy_stubs in a macro. commit 78acd04140c98da4b1a15e63e0991fe425f9855b Author: maxv Date: Sat Feb 17 18:51:53 2018 +0000 Rename i8259_stubs -> legacy_stubs. We will want the entries to have the same name, eg: legacy_stubs -> Xintr_legacy0, Xrecurse_legacy0, Xresume_legacy0 -> Xintr_legacy1, Xrecurse_legacy1, Xresume_legacy1 ... commit 6c3d86ff124d1d426d211e377ee907431709f6b9 Author: maxv Date: Sat Feb 17 17:44:09 2018 +0000 Add svs_init. This is where we will detect the CPU and decide whether to turn SVS on or not. Add svs_pgg_update to dynamically add/remove PG_G from all the kernel pages. Use it now. commit ab200ca1196c936e1bba7cfa1344f555f9fda1c4 Author: maxv Date: Fri Feb 16 15:18:41 2018 +0000 Style, remove unused and misleading macros and comments, localify, and reduce the diff between similar functions. No functional change. 
commit 255473793c1cafc0ddd32ab3cc98c9ce9cd65e5a Author: maxv Date: Fri Feb 16 11:25:16 2018 +0000 Fix inverted logic, otherwise the kernel crashes when receiving a 1-byte AH packet. Triggerable before authentication when IPsec and forwarding are both enabled. commit 953bd6c53c48df5cd42c6e09a0ffec424a5d0657 Author: maxv Date: Fri Feb 16 11:07:44 2018 +0000 Style a bit, no functional change. commit 24178a357d3c9c1ff6862d5661171efcb38f5a66 Author: maxv Date: Fri Feb 16 10:07:07 2018 +0000 Remove unused. commit 654c33bc36b498fae07d6a08390651aacecaafa6 Author: maxv Date: Fri Feb 16 09:24:55 2018 +0000 Add [ah/esp/ipcomp]_enable sysctls, and remove the FreeBSD #ifdefs. Discussed with ozaki-r@. commit 38e35ce910b019a1750bd2d83f7fbd9021504402 Author: maxv Date: Fri Feb 16 09:07:50 2018 +0000 Remove some more FreeBSD sysctl declarations that already have NetBSD counterparts. Discussed with ozaki-r@. commit cf208351c0067da51e3ecddc4ef48bd65ed9b51b Author: maxv Date: Fri Feb 16 08:56:50 2018 +0000 Remove ipsec_replay and ipsec_integrity from this place, they are already declared as sysctls. Discussed with ozaki-r@. commit 7597f8bccaefede4fb0fbd6abfce1ab4b22ddec3 Author: maxv Date: Fri Feb 16 08:51:28 2018 +0000 Remove ip4_esp_randpad and ip6_esp_randpad, unused. Discussed with ozaki-r@. commit cfe83793389116f9407e3174903564242cb83c0b Author: maxv Date: Thu Feb 15 13:51:32 2018 +0000 Style and simplify. commit 7ce369a057e48824e07b7af9040d99d0e431d369 Author: maxv Date: Thu Feb 15 12:40:12 2018 +0000 Style a bit, and if we don't know the pad-filling policy use SADB_X_EXT_PZERO by default. There doesn't seem to be a sanity check in the keysock API to make sure this place is never reached, and it's better to fill in with zeros than not filling in at all (and leaking uninitialized mbuf data). commit e6d000256283973f27c8ad3c86d1946cf01198bd Author: maxv Date: Thu Feb 15 10:41:51 2018 +0000 Remove broken MROUTING code, rename ipo->ip4, and simplify. 
commit 696a27bc9008fd5d2ef21c86a9fb36b651df764d Author: maxv Date: Thu Feb 15 10:28:49 2018 +0000 Fix the IPIP_STAT_IBYTES stats; we did m_adj(m, iphlen) which subtracted iphlen, so no need to subtract it again. commit f969bd7671bf3793d5477e955e4c5c2be2ca6885 Author: maxv Date: Thu Feb 15 10:21:39 2018 +0000 dedup again commit 52ac9796870d27e123c1f65845aeacd9fc4db1b9 Author: maxv Date: Thu Feb 15 10:09:53 2018 +0000 dedup commit 434506389c2f3fbb37bbfa99943959ca57129b52 Author: maxv Date: Thu Feb 15 10:04:43 2018 +0000 Style and remove dead code. commit 5303211fc9875bdc07984e978316218999a1d4b1 Author: maxv Date: Thu Feb 15 08:38:00 2018 +0000 style commit ab2879532656cee78b555dfdddd6eee82e8800a9 Author: maxv Date: Thu Feb 15 07:38:46 2018 +0000 Make sure the Authentication Header fits the mbuf chain, otherwise panic. commit 803b346272655edc9bc77b6e5ad497e11633b79b Author: maxv Date: Thu Feb 15 07:16:05 2018 +0000 Fix use-after-free, 'ah' may not be valid after m_makewritable and ah_massage_headers. commit c16812dba2837a0f697984788f290377bd100ddb Author: maxv Date: Wed Feb 14 14:28:40 2018 +0000 Style, and remove unused prototypes and functions. commit 35bd12923f0e9616de47652071109e3695f1c4e9 Author: maxv Date: Wed Feb 14 14:19:53 2018 +0000 Remove m_checkalignment(), unused. This eliminates a reference to m_getptr(). commit a6f07e63c8073d1f9da950422a723b71c11085a4 Author: maxv Date: Wed Feb 14 14:15:53 2018 +0000 Remove IFF_STATICARP, we don't support this, and the code is useless in its current form. ok ozaki-r@ commit e37ef9d5cad1f932d25189e59c64566ee408cc77 Author: maxv Date: Wed Feb 14 06:52:41 2018 +0000 Use .Cm instead of .Li, same as arp.8. commit d77154e983389648b86d6df6e3a9ece01880ecee Author: maxv Date: Wed Feb 14 05:29:39 2018 +0000 Re-make ip6_nexthdr global, it will be used in soon-to-be-added code... commit 48a5832a22b8f88bd85712cc670832df22569828 Author: maxv Date: Wed Feb 14 05:24:44 2018 +0000 Revert my two last changes in this file. 
They are apparently causing problems with racoon, I'll investigate this later. commit c5b0fd3492d18a99d49ab77dab0929c071844b68 Author: maxv Date: Tue Feb 13 15:21:59 2018 +0000 Make the arpresolve branch more readable, fix typo, fix XXX (which I added), add missing pserialize_read_exit (which I forgot). commit dd80fcdfd166895b1f0abe610b20d498db493662 Author: maxv Date: Tue Feb 13 14:50:28 2018 +0000 Mmh. Add a missing check: if ARP was disabled on the interface, don't process ARP packets. Otherwise the kernel will add ARP entries even if ifconfig wm0 -arp was entered. commit 43daecaaeefeaaedea84e5d068aace66a33ae564 Author: maxv Date: Tue Feb 13 10:50:38 2018 +0000 Remove KERNEL_LOCK around the MPLS code. It's not needed, since we're only touching the tag of the mbuf - the tag belongs only to the mbuf, and the mbuf is not shared. ok knakahara@ commit 20c6b2773fd5b325fb468e5a8817240d4d5515ce Author: maxv Date: Tue Feb 13 10:47:41 2018 +0000 Be tougher: * In arpintr(), don't allow IEEE1394 packets on non-IEEE1394 interfaces. * In revarpinput(), kick IEEE1394 packets right away. They are not supported. commit 091e4da329f0bd4c6790607c76c93073b336376e Author: maxv Date: Tue Feb 13 10:31:01 2018 +0000 Same change as rev1.258, but this time in revarpinput: use m_pullup. commit 6af00b85dc39d2ad01ea6cab14c26f3613cb83e8 Author: maxv Date: Tue Feb 13 10:20:50 2018 +0000 Minor stylistic changes, and use C99 types. commit 0c5e19b9884aca7df5a3296b605e1ce41e633525 Author: maxv Date: Tue Feb 13 10:05:05 2018 +0000 Replace dead code by KASSERT. commit 9ba0bc0a0b9dce553a0747f5607f34bc594ccc05 Author: maxv Date: Tue Feb 13 09:26:17 2018 +0000 Put time_second and time_uptime in different cache lines, probably saves us some false sharing. commit 9b24accda6fa0c854020660dd345850bd402f044 Author: maxv Date: Tue Feb 13 08:51:37 2018 +0000 Don't force ARPHRD_IEEE1394 on IEEE1394 interfaces. If it's not there, then kick the packet. And do this earlier. 
commit bf47bc68ebea32926f77ad525f9b60444eb3b34b Author: maxv Date: Tue Feb 13 08:43:26 2018 +0000 Define ar_* as inlined functions, not as macros. Makes it easier to understand why ARPHRD_IEEE1394 needs to be handled with care - it doesn't have ar_tha. commit c02f6d2f26758e47401d0e7ae64d3772108468ef Author: maxv Date: Tue Feb 13 08:20:12 2018 +0000 Use only one label, clearer. commit b0ad34f06b7b981615577728a361011941868591 Author: maxv Date: Tue Feb 13 07:51:24 2018 +0000 Fix three things in arpintr(): * mtod can't return NULL. * It is wrong to kick the packet if m->m_len < arplen. While this check always returns false for native Ethernet interfaces, it may not if the frame is encapsulated in EtherIP/L2TP. Use m_pullup instead. * Remove XXX, it is fine. Reduce the indentation level afterwards. commit 65d8afa69c0757bac76b2034969ade8524e6c243 Author: maxv Date: Tue Feb 13 07:44:25 2018 +0000 Style, no functional change. commit e40073f1bec4c00ca65bc95a27581ea6146edd29 Author: maxv Date: Tue Feb 13 06:44:13 2018 +0000 Remove double declaration; 'ddb_regs' is already declared as a macro in db_machdep.h if MULTIPROCESSOR is on, and the macro has higher priority. Don't declare 'ddb_regs' locally in this case, because it is misleading. Part of PR/52964. commit f7cddc3e2e76927c5d6a5fee27b2785fb36f8f10 Author: maxv Date: Mon Feb 12 17:04:58 2018 +0000 Another missing NULL-check. commit aa9506852407296d3f1fc96771f7a77c3a8f8cb8 Author: maxv Date: Mon Feb 12 17:01:22 2018 +0000 m_free -> m_freem, otherwise leak commit 717aaf06372cf667dd86ae3363abd867a2607a92 Author: maxv Date: Mon Feb 12 16:58:01 2018 +0000 NULL-check after M_DONTWAIT. commit 20183e76f1c36febd3ffefbfe96114611945a99d Author: maxv Date: Mon Feb 12 16:01:35 2018 +0000 Add a KASSERT; we expect *from to be a single mbuf (not chained). commit 5999836e20a1b82c2fd346a94bbb6f2a4b1c48cb Author: maxv Date: Mon Feb 12 15:38:14 2018 +0000 Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in the chain. 
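Note on the m_free -> m_freem fixes above: a minimal userland sketch of why freeing only the head of a chain leaks the rest. Everything here is a toy model and an assumption made for illustration (struct toy_mbuf, toy_chain, toy_m_free, toy_m_freem and the toy_live counter are hypothetical names, not the kernel's mbuf API); it only mirrors the documented behavior that m_free releases one mbuf and returns its successor, while m_freem releases the whole chain.

```c
#include <stdlib.h>

/* Toy stand-in for a kernel mbuf: only the chain link is modeled. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
};

static int toy_live;	/* number of toy mbufs currently allocated */

/* Build a chain of n toy mbufs (hypothetical helper). */
static struct toy_mbuf *
toy_chain(int n)
{
	struct toy_mbuf *head = NULL;

	while (n-- > 0) {
		struct toy_mbuf *m = malloc(sizeof(*m));

		if (m == NULL)
			abort();
		m->m_next = head;
		head = m;
		toy_live++;
	}
	return head;
}

/* Modeled on m_free: release one mbuf and return its successor. */
static struct toy_mbuf *
toy_m_free(struct toy_mbuf *m)
{
	struct toy_mbuf *n = m->m_next;

	free(m);
	toy_live--;
	return n;
}

/* Modeled on m_freem: release every mbuf in the chain. */
static void
toy_m_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_m_free(m);
}
```

In this model, calling toy_m_free() on the head of a three-element chain and dropping the return value leaves two elements allocated with no pointer to them, which is the leak the commits describe.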
commit d808ffb77b336b4b95bd845f8383990e9a623cf3 Author: maxv Date: Mon Feb 12 12:52:12 2018 +0000 Replace bcopy -> memcpy when it is obvious that the areas don't overlap. Rearrange ip6_splithdr() for clarity. commit d970581739968cd95b7d3c729d8cfe2fcea0bd30 Author: maxv Date: Mon Feb 12 12:17:38 2018 +0000 Fix typo, and add a comment about MPLS. commit 808b520fad453cc1e121ae40672037564f70d585 Author: maxv Date: Mon Feb 12 09:31:06 2018 +0000 Don't rebase the pointers. 'm' is only allowed to become NULL (which means 'processed'). commit 70c73c7b9c9aa7ef0ae34cf713f9c501b41a9687 Author: maxv Date: Mon Feb 12 08:22:26 2018 +0000 Remove unused argument from tcp_signature_getsav. commit 881a52af7eb57bb220130279f0b1c844ccf47c70 Author: maxv Date: Mon Feb 12 08:13:08 2018 +0000 Add a KASSERT. commit 3f7078acd3a5967934a46ccf795660aa2318d0f4 Author: maxv Date: Mon Feb 12 08:08:28 2018 +0000 Remove the 'm' argument from syn_cache_respond(); all it does with it is freeing it, so free in the caller instead. commit c45c2b7d89c89e2f441370a7c01b5a80e11f2351 Author: maxv Date: Mon Feb 12 08:03:42 2018 +0000 Remove this multicast check. Multicast packets are already dropped at the beginning of the function. commit 9d97dccd3b3a429a502fb757c3204c7975d4f7ec Author: maxv Date: Sun Feb 11 09:39:36 2018 +0000 Move SVS into x86/svs.c commit f54b4a9b9cfbc597cdd87eb1fb5e8ffc69a387b0 Author: maxv Date: Sun Feb 11 08:27:18 2018 +0000 Style, and reduce the diff between i386 and amd64. No functional change. commit 17bd4cf6e691bb786a45695a3e8f900b828fe549 Author: maxv Date: Sat Feb 10 08:54:22 2018 +0000 Add a note, to say that basically the recent FreeBSD binaries can't be expected to work, and that we keep compat_freebsd only for tw_cli. commit 947ff4a729c1e818e27e56f5d188265778c60c1d Author: maxv Date: Sat Feb 10 08:17:00 2018 +0000 If the socket wants an ESP-over-UDP packet, and the packet is incorrect, stop processing it instead of giving it to udp4_sendup. 
It just doesn't make any sense not to drop it. I was already telling myself this the other day when I visited this place, but I just saw PR/36782 (11 years old) that suggests the exact same thing, so fix it. Now, udp4_espinudp always frees the mbuf, and is made void. The packet is not processed any further afterwards. commit 29492263ce522dc44b8f72bf8b30a17d36d47984 Author: maxv Date: Sat Feb 10 07:59:54 2018 +0000 Remove the last reference to IPSEC_ESP. This option was deleted in 2013. commit d3e14d9ab1ca9cada621a06d2346aff6e9291abb Author: maxv Date: Fri Feb 9 21:25:04 2018 +0000 Oh, what is this. Fix a remotely-triggerable integer overflow: the way we define TCPOLEN_SACK makes it unsigned, and the comparison in the while() is unsigned too. That's not the expected behavior, the original code wanted a signed comparison. It's pretty easy to make 'hlen' go negative and trigger a buffer overflow. This bug was reported 8 years ago by Lucio Albornoz in PR/44059. commit a97d8351e16c5add4e445259f31366329c3e6b46 Author: maxv Date: Fri Feb 9 18:45:55 2018 +0000 Disable XSAVEOPT, until it is clear what's wrong with it (PR/52966). commit 179a0b5ef9496d87aefac88be31cbb99bd2c17db Author: maxv Date: Fri Feb 9 18:31:52 2018 +0000 Remove dead code. commit 3b921823c7f08d2f85e7a57a1f164af96711a2fa Author: maxv Date: Fri Feb 9 14:06:17 2018 +0000 Style, and move the 'ip_srcroute' call after 'tcp_dooptions', otherwise we're leaking 'ipopts'. (Harmless, since TCP_SIGNATURE is disabled.) commit 6c65d2f643b6e549b3126b44bb1f73b7edb549e5 Author: maxv Date: Fri Feb 9 09:36:42 2018 +0000 Reset ddb_regp to NULL. Reported by David Binderman in PR/52964. commit b2a9fb65bb0feab0d6b07559898797178db73a19 Author: maxv Date: Fri Feb 9 09:07:13 2018 +0000 Use UVM_PROT_RW instead of UVM_PROT_ALL. This doesn't change anything, since the protection code is not applied: the pages are manually kentered as RW. But fix it anyway, so that "pmap 0" does not say the map is executable. 
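Note on the TCPOLEN_SACK fix (d3e14d9a) above: a small sketch of how an unsigned length constant turns a signed loop test into an unsigned one. The macro TOY_OPTLEN and both function names are hypothetical, and the sketch assumes a length macro built from sizeof (so of type size_t) on a platform where size_t is at least as wide as int; it is not the kernel's tcp_input code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option length built with sizeof, hence of type size_t. */
#define TOY_OPTLEN	(2 * sizeof(uint32_t))

/*
 * Buggy form. Writing "hlen >= TOY_OPTLEN" converts 'hlen' to size_t
 * implicitly; the cast below only spells that conversion out.
 */
static int
toy_continues_unsigned(int hlen)
{
	return (size_t)hlen >= TOY_OPTLEN;
}

/* Intended form: convert the constant instead, and compare as signed. */
static int
toy_continues_signed(int hlen)
{
	return hlen >= (int)TOY_OPTLEN;
}
```

With a negative 'hlen', the unsigned form still reports that the loop should continue, because the negative value becomes a very large size_t; the signed form stops as intended.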
commit 1bc2fb11b5e0e0fb5cc18574154f35a8d12e9494 Author: maxv Date: Fri Feb 9 08:58:01 2018 +0000 Force a reload of CW in fpu_set_default_cw(). This function is used only in COMPAT_FREEBSD, it really needs to die. commit 6964836208ab1fae12d510993ee5f751a4b6f16d Author: maxv Date: Fri Feb 9 08:54:11 2018 +0000 Don't restore segment registers when leaving NMIs. In nmitrap (and the functions it later calls), we are not allowing the trap frame to change; so the segregs don't change since we are running with interrupts disabled and there is no rescheduling in this case. commit dd12aac7cd378d59dc9c5c8a01e19304fcbd9615 Author: maxv Date: Fri Feb 9 08:42:26 2018 +0000 Define INTRSTUB_ARRAY, simplifies a lot. commit 3c3c50e2e2e54f6d81749e411822744b799b3156 Author: maxv Date: Fri Feb 9 08:03:33 2018 +0000 Style (realign everything correctly), and fix a typo. commit 5ff68066a840ae2d6dd750975a87e6a021f22048 Author: maxv Date: Thu Feb 8 21:02:05 2018 +0000 Remove ovbcopy. It's long dead; only sparc has a reference to a function of the same name, which too should be removed. commit a41ac430d6d2bb78c6eb204fc87a12587c7886a7 Author: maxv Date: Thu Feb 8 20:57:41 2018 +0000 Remove unused net_osdep.h include. commit 53d3033e34af29cc7de40fcdb1109414eb08f2fc Author: maxv Date: Thu Feb 8 20:50:00 2018 +0000 Style, rename a variable, and remove an unreachable case. commit b3e344937710ff4af5644eb30e4477be602b59c9 Author: maxv Date: Thu Feb 8 20:41:36 2018 +0000 Move the IPv4 multicast check earlier; we want to kick multicast packets all the time, and not just when they are SYNs. The IPv6 multicast check is already done earlier, so this block of code can be removed. commit 06b522b96a09e2025e3880023fc2d825f70434a0 Author: maxv Date: Thu Feb 8 20:19:30 2018 +0000 Remove the unused 'multicast' argument from tcp_vtw_input, and remove the now-unused multicast detection code. It couldn't have been correct on IPv6, since multicast packets are kicked at the beginning of the function. 
commit 9e5ebc39033b198c1ba82b9e1aa0facb053f6991 Author: maxv Date: Thu Feb 8 20:10:55 2018 +0000 Remove the default case, the beginning of the function already ensures af == AF_INET || af == AF_INET6. commit b2858fbe812a34087b01532fb44ea8b148ac8f2c Author: maxv Date: Thu Feb 8 20:06:21 2018 +0000 Dedup code. commit 538a184b0151ffbddb13bac675e938e23b497c29 Author: maxv Date: Thu Feb 8 19:58:05 2018 +0000 Remove the IN6_IS_ADDR_V4MAPPED checks in the protocol functions. They are useless, because the IPv6 entry point (ip6_input) already performs them. The checks were first added in the protocol functions: Wed Dec 22 04:03:02 1999 UTC (18 years, 1 month ago) by itojun "drop IPv6 packets with v4 mapped address on src/dst. they are illegal and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as source you may be able to pretend the packet is from local node)" Shortly afterwards they were also added in the IPv6 entry point, but were not removed from the protocol functions: Mon Jan 31 10:33:22 2000 UTC (18 years ago) by itojun "be proactive about malicious packet on the wire. we fear that v4 mapped address to be used as a tool to hose security filters (like bypassing "local host only" filter by using ::ffff:127.0.0.1)." OpenBSD did the same a few months ago. FreeBSD has never had these checks. commit 547b736470183bcf1178dfd9f5f17ef1b53e0dcc Author: maxv Date: Thu Feb 8 19:38:21 2018 +0000 Style, and remove outdated comments. commit aeb575e71ebe20d195331156c8002b541c9f56af Author: maxv Date: Thu Feb 8 19:25:48 2018 +0000 Remove this check, it is already done at the beginning of the function. commit d7027c90eaca953144514090b2228ab1dc4c0421 Author: maxv Date: Thu Feb 8 18:58:59 2018 +0000 Reduce the indentation level of this huge block (without realigning yet, for proofreadability). No functional change. 
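Note on the IN6_IS_ADDR_V4MAPPED commit (538a184b) above: a userland sketch of the test that the IPv6 entry point is said to perform on source and destination. IN6_IS_ADDR_V4MAPPED and inet_pton are the standard POSIX interfaces; toy_is_v4mapped is a hypothetical wrapper added only for illustration and says nothing about how ip6_input itself is written.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: parse a textual IPv6 address and report whether
 * it is an IPv4-mapped address (::ffff:a.b.c.d).
 * Returns 1 if mapped, 0 if not, -1 if the text does not parse.
 */
static int
toy_is_v4mapped(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;
	return IN6_IS_ADDR_V4MAPPED(&addr) ? 1 : 0;
}
```

The address quoted in the log, ::ffff:127.0.0.1, is reported as mapped, which is why such packets are dropped once at the entry point rather than in every protocol function.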
commit af2b3e0397c285d14821b618074030c37103513c Author: maxv Date: Thu Feb 8 18:55:11 2018 +0000 Move the SO_DEBUG block earlier, to reduce the indentation level. commit 3ddad9d839d783ea7d2ae79fc19369cc308cd247 Author: maxv Date: Thu Feb 8 11:49:37 2018 +0000 pr_send can be given a NULL lwp. It looks like the control != NULL && lwp == NULL condition is never supposed to happen, but add a panic for safety. commit 4baf81b5ab199ec201a89d3c13debbf578e6e861 Author: maxv Date: Thu Feb 8 11:34:35 2018 +0000 Move udp6_output() into udp6_usrreq.c, and remove udp6_output.c. This is more consistent with IPv4, and there is no good reason for keeping a separate file only for one function. FreeBSD did the same. commit 15ebda77833b7ab6f8268ee2d248dd1d60edab99 Author: maxv Date: Thu Feb 8 11:13:20 2018 +0000 Style, no functional change. commit f734d5452cfe94a3c2f0ebe85245e1031ac60208 Author: maxv Date: Thu Feb 8 10:42:12 2018 +0000 Use C99 types - in particular, stop using n_time and n_short -, style, and remove prototype of icmp_sysctl (does not exist). No functional change. commit 6785ffc13af12b0be1c1246a288e3a5376750a89 Author: maxv Date: Thu Feb 8 10:30:30 2018 +0000 Style, and remove prototype of udp_sysctl (does not exist). commit 88a0139694b339915f3594dafbd70e73b5cfdc0e Author: maxv Date: Thu Feb 8 10:24:46 2018 +0000 More style, no functional change. commit e6ebfce5977cb25656a489fdd5874daa6b31c625 Author: maxv Date: Thu Feb 8 10:03:52 2018 +0000 Change the error stat from IP_STAT_BADFRAGS to IP_STAT_TOOLONG. The ping_of_death ATF test expects this counter to get increased. commit 1a0b621f4cfe31baff0773eabddc0af8e12a1f5c Author: maxv Date: Thu Feb 8 09:56:19 2018 +0000 Now that we don't allow source-routed packets by default, set allowsrcrt=1 and forwsrcrt=1. Should fix the ATF failure. commit d5924650bc8c8b059140f9aa4ce24b79b4aa2863 Author: maxv Date: Thu Feb 8 09:32:02 2018 +0000 Fix a possible buffer overflow in the IPv4 _ctlinput functions. 
In _icmp_input we are guaranteeing that the ICMP_ADVLENMIN-byte area starting from 'icp' is contiguous. ICMP_ADVLENMIN = 8 + sizeof(struct ip) + 8 = 36 But the _ctlinput functions (eg udp_ctlinput) expect the area to be larger. These functions read at: (uint8_t *)icp + 8 + (icp->icmp_ip.ip_hl << 2) which can be crafted to be: (uint8_t *)icp + 68 So we end up reading 'icp+68' while the valid area ended at 'icp+36'. Having said that, it seems pretty complicated to trigger this bug; it would have to be a fragmented packet with half of the ICMP header in the first fragment, and we would need to have a driver that did not allocate a cluster for the first mbuf of the chain. The check of icmplen against ICMP_ADVLEN(icp) was not sufficient: while it did guarantee that the ICMP header fit the chain, it did not guarantee that it fit 'm'. Fix this bug by pulling up to hlen+ICMP_ADVLEN(icp). No need to log an error. Rebase the pointers afterwards. commit 12652b35f790293b1b50d172afad9f3f65be4ebc Author: maxv Date: Thu Feb 8 07:11:20 2018 +0000 Style, and remove printfs. commit 6d881e1e3f99778d93f231d3e0883769c1167cb4 Author: maxv Date: Thu Feb 8 06:50:38 2018 +0000 Fix three pretty bad mistakes in NAT-T: * If we got a keepalive packet, we need to call m_freem, not m_free. Here the next mbufs in the chain are not freed. Seems easy to remotely DoS the system by sending fragmented keepalives in a loop. * If !ipsec_used, free the mbuf. * In udp_input, we need to update 'uh', because udp4_realinput may have modified the chain. Perhaps we also need to re-enforce alignment, so add an XXX. commit 885e59916b6f3867e1651481bd2421adc0d4a7e6 Author: maxv Date: Wed Feb 7 14:03:18 2018 +0000 Keep /dev/ksyms open in _kvm_open(). This way /dev/ksyms can be put into $g_kmem without breaking the tools that need kmem+ksyms. Discussed on tech-kern@ three weeks ago. The original issue was reported by maya@, the patch was written by Tom Ivar Helbekkmo, ok christos@. 
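Note on the _ctlinput overflow (d5924650) above: the arithmetic from that message, written out. The TOY_ constants and function names are hypothetical; the values 8, 20 and 8 are taken from the formula "8 + sizeof(struct ip) + 8 = 36" in the log, assuming a 20-byte struct ip.

```c
#include <stddef.h>

#define TOY_ICMP_HDRLEN	8	/* fixed part of the ICMP header */
#define TOY_IP_MINHLEN	20	/* IPv4 header without options */
#define TOY_ULP_MIN	8	/* leading bytes of the upper-layer header */

/* Contiguous area guaranteed before the fix (ICMP_ADVLENMIN in the log). */
static size_t
toy_guaranteed_len(void)
{
	return TOY_ICMP_HDRLEN + TOY_IP_MINHLEN + TOY_ULP_MIN;
}

/*
 * Offset from 'icp' at which a _ctlinput function reads, for a given
 * 4-bit ip_hl value taken from the embedded IP header.
 */
static size_t
toy_read_offset(unsigned int ip_hl)
{
	return TOY_ICMP_HDRLEN + ((size_t)ip_hl << 2);
}
```

With the largest 4-bit header length, 15, the read offset is 68 while only 36 bytes were guaranteed contiguous, matching the figures in the commit message.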
commit 30d25651ec584ba53531f66fbb3a9904d22cea9a Author: maxv Date: Wed Feb 7 13:22:41 2018 +0000 Style and constify. commit 62cbd528eb2a42a8f2eff7627be8b1c5027f172f Author: maxv Date: Wed Feb 7 12:15:32 2018 +0000 More style. No functional change. commit 96ced4be92e316897ac68e903ae40979d2253a63 Author: maxv Date: Wed Feb 7 12:09:55 2018 +0000 Remove parentheses in return statements. No functional change. commit 958600405384774f9963990302e406872b0962df Author: maxv Date: Wed Feb 7 12:04:50 2018 +0000 Style and remove unused macros. More to come. commit ba1e942efba335fc5a698f17ef0e839b9cee1ccd Author: maxv Date: Wed Feb 7 11:42:57 2018 +0000 Remove RSVP_ISI, that's mostly dead code. FreeBSD and OpenBSD too removed it; FreeBSD kept some pieces but they are mostly no-ops. Sent on tech-net@, no comment. commit 85a3cc5eda852bcecf766a9a77e665c26da4fcd5 Author: maxv Date: Wed Feb 7 10:52:20 2018 +0000 Style, and localify IPV6FORWARDING. No functional change. commit e98efc36f5c68adcd2b3aa22ac39912d709a5b5c Author: maxv Date: Wed Feb 7 10:21:59 2018 +0000 Change ip6_hdrnestlimit to be 15 instead of 50. I couldn't find any reference in RFCs about what a correct limit should be, but FreeBSD already uses 15. If an IPv6 packet has 50 options, there is clearly something wrong with it. commit ed0d6c7ab01b39a8728c43630b514c962bb5d093 Author: maxv Date: Wed Feb 7 09:53:08 2018 +0000 Rename back to ip6af_mff. It was actually clearer than ip6af_more. commit b9f24eb468c616bd750ef3674aeaf23d82bacef7 Author: maxv Date: Wed Feb 7 08:12:25 2018 +0000 Remove null check on ip, it can't be null. (Confuses code scanners.) commit d0b0ef2f3e5d8a6a5dabf32d675d1074b7089757 Author: maxv Date: Tue Feb 6 17:08:18 2018 +0000 Several changes, mostly cosmetic: * Add a KASSERT in ip_output(), we expect (at least) the IP header to be here. * In ip_fragment(), declare two variables instead of recomputing the values each time. Add an XXX for ipoff, it seems to me we should also remove IP_RF. 
* Rename the arguments of ip_optcopy(). * Style: use NULL for pointers, remove ()s for return statements, and add whitespaces for clarity. No real functional change. commit 1374a6000156060be7bc84c4fc0075c9fc0a8aa5 Author: maxv Date: Tue Feb 6 15:48:02 2018 +0000 Add one more check in ip_reass_packet(): make sure that the end of each fragment does not exceed IP_MAXPACKET. In ip_reass(), we only check the final length of the reassembled packet against IP_MAXPACKET. But there is an integer overflow that can happen a little earlier. We are doing: i = ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) - ntohs(ip->ip_off); [...] ip->ip_off = htons(ntohs(ip->ip_off) + i); It is possible that ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) > 65535 so the computation of ip_off wraps to zero. This breaks an assumption in the reassembler - it expects the list of fragments to be ordered by offset, and here it's not ordered anymore. (Un)Fortunately I couldn't turn this into anything exploitable. With the new check, it is guaranteed that ip_off+ip_len<=65535. commit abac3ffa3a6b567ad913d3c044c082e2dbfcbbe5 Author: maxv Date: Tue Feb 6 06:36:40 2018 +0000 Typos and style a bit, no functional change. commit 22d5aac26f9ebb1ebd9b13ebde22e7ee616a6e2c Author: maxv Date: Tue Feb 6 06:32:25 2018 +0000 Remove dead code. commit c2d1a87bce2b4f277d9a4e2d63270fc028570aa7 Author: maxv Date: Mon Feb 5 15:23:14 2018 +0000 Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented. commit 85fc3ec56cce3dd737b1bff6be3231c76df57487 Author: maxv Date: Mon Feb 5 15:18:10 2018 +0000 Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It probably wouldn't have built correctly anyway, since there is no associated defflag. These ten lines of code in ip_input.c already look a lot better. commit 545b962c173bf78d75041df9693ce607d95aebd1 Author: maxv Date: Mon Feb 5 15:02:52 2018 +0000 Remove references to IPFORWSRCRT (the only one that was actually documented). 
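Note on the ip_reass_packet() check (1374a600) above: a sketch of the 16-bit wraparound and of a check in the spirit of the one the message describes. This is a simplified model under stated assumptions: offsets and lengths are plain host-order byte counts, and toy_frag_end16, toy_frag_ok and TOY_IP_MAXPACKET are hypothetical names, not the kernel's code.

```c
#include <stdint.h>

#define TOY_IP_MAXPACKET	65535	/* largest legal IPv4 packet, in bytes */

/*
 * End of a fragment as it would be stored back into a 16-bit field:
 * the sum is truncated modulo 65536, so a large offset plus a length
 * can wrap to a small value.
 */
static uint16_t
toy_frag_end16(uint16_t off, uint16_t len)
{
	return (uint16_t)((uint32_t)off + len);
}

/* The added guarantee: the end of each fragment stays <= IP_MAXPACKET. */
static int
toy_frag_ok(uint16_t off, uint16_t len)
{
	return (uint32_t)off + len <= TOY_IP_MAXPACKET;
}
```

A fragment at offset 65528 with length 16 ends at 65544, which truncates to 8 and would sort before fragments it actually follows; the check rejects it before the reassembler's ordering assumption can be broken.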
commit 34be00f6020dbbacb3051f91e0cc2175a3b37431 Author: maxv Date: Mon Feb 5 14:52:42 2018 +0000 Clean up this mess. This is typically the kind of places where we need to seriously cut the bullshit. These things are unreadable, undocumented, and all they bought us was not figuring out we had IPv4 forwarding enabled by default for 20+ years. commit e652e4d347a56cb920950971d04d95b62cf9c83b Author: maxv Date: Mon Feb 5 14:23:38 2018 +0000 Be tougher, and don't allow LSRR+SSRR (RFC7126). commit b9e1867a43a339e334580a77bdc96a684e77acb1 Author: maxv Date: Mon Feb 5 13:52:39 2018 +0000 Kick duplicate options, they are not allowed (RFC791). commit 703764a2595ccb4b21eb3af5e2068d91b413b5af Author: maxv Date: Mon Feb 5 13:34:20 2018 +0000 Remove unused variable. commit f29f593100f8e4149bec4bc47562dbaaecd761e9 Author: maxv Date: Mon Feb 5 13:23:11 2018 +0000 Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a completely dumb idea, because they have security implications. By sending an IPv4 packet containing an LSRR option, an attacker will cause the system to forward the packet to another IPv4 address - and this way he white-washes the source of the packet. It is also possible for an attacker to reach hidden networks: if a server has a public address, and a private one on an internal network (network which has several internal machines connected), the attacker can send a packet with: source = 0.0.0.0 destination = public address of the server LSRR first address = address of a machine on the internal network And the packet will be forwarded, by the server, to the internal machine, in some cases even with the internal IP address of the server as a source. commit 292156afe70b5dfa34d0f1748ae00cdc2eee5893 Author: maxv Date: Mon Feb 5 13:04:56 2018 +0000 Style, no functional change. commit c9f5979a949de7e771226aa047cfd8a2aad65a2a Author: maxv Date: Mon Feb 5 08:38:06 2018 +0000 Declare icmperrppslim in ip_icmp.c, it shouldn't be used elsewhere. 
commit 42aafc00d05f32599fa5f854f51e4c611347a24f Author: maxv Date: Sun Feb 4 17:54:34 2018 +0000 Explicitly disable the kernel-mode GPROF (even though it is never enabled), and explain a bit. commit 8778e3b7e09a63a1fd50e82a634cf39fadea154e Author: maxv Date: Sun Feb 4 17:31:51 2018 +0000 Add a proper defflag for GPROF, and include opt_gprof.h, otherwise we're not gonna go very far. commit 1b9402567314c0d262121ba38ca77087f7a8b549 Author: maxv Date: Sun Feb 4 17:03:21 2018 +0000 Add a TODO list for SVS. commit 43ece57b2a5f52ec3ff1498c4a6ce934885982c3 Author: maxv Date: Fri Feb 2 10:49:01 2018 +0000 Fix memory leak. Contrary to what the XXX indicates, this place is 100% reachable remotely. commit af8833b7d00373007aeea4d0938d2a9811f191b4 Author: maxv Date: Fri Feb 2 09:01:17 2018 +0000 Style, no functional change. commit bd160ad4d50df527e7ea1285c4a2eee05ecccadf Author: maxv Date: Fri Feb 2 06:23:45 2018 +0000 Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE, not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain on an mbuf that was already freed. commit 1a4f88a4e2d0acfc70937dd0a79981e706e83237 Author: maxv Date: Thu Feb 1 17:22:45 2018 +0000 Remove unused (and a reference to ovbcopy along the way). commit 2518beab7192994ad5e389cbe68ac971e39e2bda Author: maxv Date: Thu Feb 1 17:16:11 2018 +0000 Replace ovbcopy -> memmove, same. commit 99f4cb9d6479f4f3bb4e9b74e14c7ecd17c8426f Author: maxv Date: Thu Feb 1 16:49:34 2018 +0000 Replace ovbcopy -> memmove, same. commit f7fb9721940cc82edbfa308328abbca108f76856 Author: maxv Date: Thu Feb 1 16:36:01 2018 +0000 Style, no real functional change. commit cf3d29a5e637be684ddbc6426e25c5ed80de481f Author: maxv Date: Thu Feb 1 16:23:28 2018 +0000 Remove this code, RH0 must be dropped, according to RFC5095. FreeBSD and OpenBSD already do the same. Also, style, and remove useless includes. 
commit 249e4244f1efa02188a632177242e30d8f65cd23 Author: maxv Date: Thu Feb 1 16:17:00 2018 +0000 Fix the ICMP error code. rh was obtained via IP6_EXTHDR_GET, and it is not guaranteed to be in the same mbuf as ip6, so computing the difference between the pointers may result in a wrong offset. ip6 is now unused, so remove it. commit d83bb81960b590524ba24889cac72db9cd33631a Author: maxv Date: Thu Feb 1 15:53:16 2018 +0000 Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so it is wrong to read ip6->ip6_nxt. commit c5535f66abc098710fbc0c394d52068a1d3b18d4 Author: maxv Date: Thu Feb 1 07:49:19 2018 +0000 Style, and remove the 'len' argument from mld_allocbuf(), it is misleading, we only want a static struct. Beyond that no functional change. commit f1cd34f4e2f04e947d7da2cfa06745482897e9f6 Author: maxv Date: Wed Jan 31 15:23:08 2018 +0000 Correct the check; we want to find IPPROTO_HOPOPTS, not IPV6_HOPOPTS. This just couldn't work. By the way, I'm wondering what is the point of this block. Calling ip6_hopopts_input() won't achieve anything useful, and it could actually be a problem, because there are several paths in it that call icmp6_error, which calls ip6_output, and then we're back in the same function. Besides it is possible to reach icmp6_error with a packet we emitted (as opposed to a packet we are forwarding), and in that case we are sending an ICMP error back to ourselves. commit c6a2a7a962d877c5924950bfdfabf9d48bcd8265 Author: maxv Date: Wed Jan 31 14:16:28 2018 +0000 Remove a misleading instruction. We don't care about increasing m_pkthdr.len in ip6_insertfraghdr(), it gets recomputed after calling this function. If we cared there would be a bug, since we don't increase it in the other branches. commit ca52151aebd00d24c0ebaa54122c9b31fd287381 Author: maxv Date: Wed Jan 31 14:10:11 2018 +0000 Try to sound a little less pessimistic, there is nothing wrong here. 
commit 15a6888c52096c5affb80ef081286d88ea940da9 Author: maxv Date: Wed Jan 31 13:57:08 2018 +0000 Style, localify, constify, and reorder a bit. No real functional change. commit 1b3838becc6e29a3e6847fe8daf1d6411846832a Author: maxv Date: Tue Jan 30 15:54:02 2018 +0000 Style, localify, remove dead code, and fix typos. No functional change. commit 73e004131f97d56a6f9226c86b9dce9657a7e35a Author: maxv Date: Tue Jan 30 15:35:31 2018 +0000 Kick nested fragments. commit 9cd682207c7c57865815b46a8f0c27d7e520cd74 Author: maxv Date: Tue Jan 30 14:49:25 2018 +0000 Fix a buffer overflow in ip6_get_prevhdr. Doing mtod(m, char *) + len is wrong, an option is allowed to be located in another mbuf of the chain. If the offset of an option within the chain is bigger than the length of the first mbuf in that chain, we are reading/writing one byte of packet-controlled data beyond the end of the first mbuf. The length of this first mbuf depends on the layout the network driver chose. In the most difficult case, it will allocate a 2KB cluster, which is bigger than the Ethernet MTU. But there is at least one way of exploiting this case: by sending a special combination of nested IPv6 fragments, the packet can control a good bunch of 'len'. By luck, the memory pool containing clusters does not embed the pool header in front of the items, so it is not straightforward to predict what is located at 'mtod(m, char *) + len'. However, by sending offending fragments in a loop, it is possible to crash the kernel - at some point we will hit important data structures. As far as I can tell, PF protects against this difficult case, because it kicks nested fragments. NPF does not protect against this. IPF I don't know. Then there are the easier cases, if the MTU is bigger than a cluster, or if the network driver did not allocate a cluster, or perhaps if the fragments are received via a tunnel; I haven't investigated these cases. 
Change ip6_get_prevhdr so that it returns an offset in the chain, and always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET leaves M_PKTHDR untouched. This place is still fragile. commit 1aba4ed0782586a9d437b6f606d612ad34c04c96 Author: maxv Date: Mon Jan 29 10:57:13 2018 +0000 Start cleaning up ip6_input.c. Several pieces of code have evolved but their neighboring comments were not updated. So update them, and remove code that has been disabled for years (it has no use anyway). commit f2cc61990e251bf91a2110ba878a67cfa98d3740 Author: maxv Date: Mon Jan 29 08:27:10 2018 +0000 Style, and use __cacheline_aligned. By the way, it would be nice to revisit the use of 'ip6flow_lock' in ip6flow_fastforward(): it is taken right away because of 'ip6flow_inuse', but then we perform several checks that do not require it. commit fc6f984cc69f03565e551f333115f3078286d970 Author: maxv Date: Mon Jan 29 08:17:18 2018 +0000 style commit b638ff5f368872416b4ceb98a512b34ec06a9abc Author: maxv Date: Mon Jan 29 08:14:54 2018 +0000 Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed, and a 'goto out' is missing after ipsec6_process_packet. commit 307cd2102ece82d5d94b94fdc12a511c39315e85 Author: maxv Date: Sat Jan 27 18:48:59 2018 +0000 Declare INTR_RECURSE_HWFRAME, same as amd64. commit f38f8f594419145d5a93d6e2ec51c954713c7e4d Author: maxv Date: Sat Jan 27 18:44:19 2018 +0000 style commit cf313bef0134d1d96d70a20843e8f9e672515a70 Author: maxv Date: Sat Jan 27 18:27:08 2018 +0000 Put the default %cs value in INTR_RECURSE_HWFRAME. Pushing an immediate costs less than reading the %cs register and pushing its value. This value is not allowed to be != GSEL(GCODE_SEL,SEL_KPL) in all cases. commit 237a6113ef2762bfea04b862b3f3c4e093b86c95 Author: maxv Date: Sat Jan 27 18:17:57 2018 +0000 Declare and use INTR_RECURSE_ENTRY, an optimized version of INTRENTRY. 
When processing deferred interrupts, we are always entering the new handler in kernel mode, so there is no point performing the userland checks. Saves several instructions. commit 07d9ef39dd75309559c3fa95fbfedfe30effbb36 Author: maxv Date: Sat Jan 27 17:54:13 2018 +0000 Use testb, faster. commit 3faddf38eeb6e23f40ab633939afba6c3b658ca1 Author: maxv Date: Sat Jan 27 15:31:10 2018 +0000 SMAP on i386. commit ce18565ba866a14ff07afccca5f9c9aab17b1bab Author: maxv Date: Sat Jan 27 09:33:25 2018 +0000 Add SMAP support for i386. commit 324866daae84490e2bfdcb627d3b501df4d8b9fa Author: maxv Date: Sat Jan 27 08:12:27 2018 +0000 Remove DO_DEFERRED_SWITCH and DO_DEFERRED_SWITCH_RETRY, unused. commit 43f219b530221f50400a5d83ce517ad6e105fbda Author: maxv Date: Sat Jan 27 08:05:14 2018 +0000 Use .pushsection (like amd64), and align INTRENTRY. commit 5fd85b28c840883e7137263c2e1135e1635693d2 Author: maxv Date: Sat Jan 27 07:51:04 2018 +0000 Remove these files. No one cares about this on i386, and there is no point in keeping undocumented options nobody understands anyway. commit f42d24b1306958665f69353f1656ee49dbce2c10 Author: maxv Date: Sat Jan 27 07:45:57 2018 +0000 Sync with amd64, in particular, add END() markers, don't fall through functions, narrow the copy windows, and remove suword. commit dcb9a0093649a3f235d52417330f5ba4be23b4e3 Author: maxv Date: Fri Jan 26 14:47:41 2018 +0000 A few fixes: * Style. * Don't add M_PKTHDR manually, that's absolutely forbidden. Add a KASSERT to make sure it's already there. * Add a missing NULL check after m_pullup. commit 56eb496fd31fdec944e9f93c2229420626b50319 Author: maxv Date: Fri Jan 26 14:41:22 2018 +0000 Add etherip, so that we at least know it exists on amd64. commit b10e43e72f699e772dab05635365d5de8edf6de6 Author: maxv Date: Fri Jan 26 14:38:46 2018 +0000 Zero out the scratch value in the UTLS page during context switches. 
We temporarily put %rax there when processing syscalls, and we wouldn't want the new lwp to see the %rax value of the previous lwp. commit e8e282df9bfacc572af317daef7cd77f23f6a9cb Author: maxv Date: Fri Jan 26 14:10:15 2018 +0000 Use MH_ALIGN instead, ok knakahara@. commit c4f9b1288bb2f870a898a2de97bc414988b4fd22 Author: maxv Date: Fri Jan 26 11:06:32 2018 +0000 Don't call if_attach, do if_initialize+if_register, otherwise when an EtherIP packet is received the first KASSERT in if_input() fires. commit 4945fb7ac0ec6d61d156d9d343cf77a486d83d70 Author: maxv Date: Fri Jan 26 07:49:15 2018 +0000 Several fixes in L2TP: * l2tp_input(): use m_copydata, and ensure there is enough space in the chain. Otherwise overflow. * l2tp_tcpmss_clamp(): ensure there is enough space in the chain. * in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL. * in_l2tp_input(): no need to call m_pullup since we use m_copydata. Just check the space in the chain. * in_l2tp_input(): if there is a cookie, make sure the chain has enough space. * in6_l2tp_input(): same changes as in_l2tp_input(). Ok knakahara@ commit 2a22560dd4876b2824b7513e9cb86cc25ab2964b Author: maxv Date: Thu Jan 25 20:55:15 2018 +0000 Kick zero-sized fragments. We can't allow them to enter; two fragments could be put at the same offset. commit 22033410bc614f578aea0b20528467d04cfd15c6 Author: maxv Date: Thu Jan 25 15:55:57 2018 +0000 Remove outdated comment and fix typo. commit c8a0be6c80872d8d9d35c1ae7a9aa9e554761c80 Author: maxv Date: Thu Jan 25 15:33:06 2018 +0000 Several changes: * Move the structure definitions into frag6.c, they should not be used elsewhere. * Rename ip6af_mff -> ip6af_more, and switch it to bool, easier to understand. * Remove IP6_REASS_MBUF, no point in keeping this. * Remove ip6q_arrive and ip6q_nxtp, unused. * Style. 
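Note on the L2TP fixes above: "use m_copydata, and ensure there is enough space in the chain" boils down to checking the requested range against the total chain length before copying, and never assuming the bytes sit in the first buffer. A minimal userland sketch of that idea, assuming a hypothetical 'struct buf' chain in place of real mbufs; chain_length() and chain_copyout() are illustrative names, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain: separately stored buffers. */
struct buf {
	const struct buf *next;
	size_t len;
	const unsigned char *data;
};

/* Total number of bytes in the chain. */
static size_t
chain_length(const struct buf *b)
{
	size_t total = 0;

	for (; b != NULL; b = b->next)
		total += b->len;
	return total;
}

/*
 * Copy 'len' bytes starting at chain offset 'off' into 'dst'.
 * Returns 0 on success, -1 if the chain is too short (nothing is copied).
 */
static int
chain_copyout(const struct buf *b, size_t off, size_t len, void *dst)
{
	unsigned char *out = dst;
	size_t total = chain_length(b);

	assert(dst != NULL);
	if (off > total || len > total - off)
		return -1;

	for (; len > 0; b = b->next) {
		size_t n;

		if (off >= b->len) {	/* range starts in a later buffer */
			off -= b->len;
			continue;
		}
		n = b->len - off;
		if (n > len)
			n = len;
		memcpy(out, b->data + off, n);
		out += n;
		len -= n;
		off = 0;
	}
	return 0;
}
```

Copying into a local buffer this way also removes the need for the header to be contiguous, which is why the m_pullup call became unnecessary in the commit.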
commit 63ccfd5c2ca1bfe6e51ba84411b265f1fcfec4dc Author: maxv Date: Thu Jan 25 10:45:58 2018 +0000 Style, reduce the indentation level when possible, and add a missing NULL check after M_PREPEND. commit 88db1223505810c376681fb772fe5b9942b67d25 Author: maxv Date: Thu Jan 25 10:33:37 2018 +0000 style commit 581927ec91d421dfdd7f265ee0f52803bfc39f27 Author: maxv Date: Thu Jan 25 09:33:21 2018 +0000 Improve wording. commit 5068196fb271fdfab5c47ac3e80863fc82f92c71 Author: maxv Date: Thu Jan 25 09:29:18 2018 +0000 Improve wording, and put a new drawing, from me and Kengo Nakahara. commit 72c6a0abe6f0375e96361eba569d8126b42afb4b Author: maxv Date: Wed Jan 24 14:39:14 2018 +0000 style commit 7ece7efeda79499a70f9c53946150a12b05c4cd1 Author: maxv Date: Wed Jan 24 14:37:34 2018 +0000 As I said in my last commit in this file, ipo should be set to NULL; otherwise the 'local address spoofing' check below is always wrong on IPv6. commit c7a2c4fc78ed1c458b56a13ff139004a030c5552 Author: maxv Date: Wed Jan 24 14:28:13 2018 +0000 Fix the iteration: IPPROTO_FRAGMENT options are special, in the sense that they don't have a 'length' field. It is therefore incorrect to read ip6e.ip6e_len, it contains garbage. I'm not sure whether this is an exploitable vulnerability. Because of this bug you could theoretically craft 'protoff', which means that you can have the kernel patch the nxt value at the wrong place once the packet is decrypted. Perhaps it can be used in some unusual MITM - a router that happens to be between two IPsec hosts adds a frag6 option in the outer IPv6 header to trigger the bug in the receiver -, but I couldn't come up with anything worrying. commit 81da73478e7ec0e4d402f56f8c2a3c0de4c7202d Author: maxv Date: Wed Jan 24 14:01:40 2018 +0000 ipsec4_fixup_checksum calls m_pullup, so don't forget to do mtod() again, to prevent use-after-free. In fact, the m_pullup call is never reached: it is impossible for 'skip' to be zero in this function, so add an XXX for now.
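Note on the IPPROTO_FRAGMENT iteration fix above: when walking IPv6 extension headers, the Fragment header is always 8 bytes and its second byte is reserved, while Hop-by-Hop, Routing and Destination Options headers carry a length in 8-byte units that excludes the first 8 bytes. A minimal sketch of such a walk over a flat buffer; the real code works on mbufs, and skip_exthdrs() and the PROTO_* names are hypothetical (the numeric values are the assigned protocol numbers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	PROTO_HOPOPTS = 0,
	PROTO_ROUTING = 43,
	PROTO_FRAGMENT = 44,
	PROTO_DSTOPTS = 60
};

/*
 * Skip the extension headers at the start of 'pkt' (the bytes following the
 * fixed IPv6 header), where 'proto' is the type of the first header.
 * Returns the offset of the first non-extension header and stores its
 * protocol in '*protop', or returns (size_t)-1 if the buffer is truncated.
 */
static size_t
skip_exthdrs(const uint8_t *pkt, size_t len, uint8_t proto, uint8_t *protop)
{
	size_t off = 0;

	assert(pkt != NULL && protop != NULL);

	while (proto == PROTO_HOPOPTS || proto == PROTO_ROUTING ||
	    proto == PROTO_FRAGMENT || proto == PROTO_DSTOPTS) {
		size_t size;

		if (len - off < 2)
			return (size_t)-1;
		if (proto == PROTO_FRAGMENT)
			size = 8;	/* fixed; pkt[off + 1] is reserved */
		else
			size = ((size_t)pkt[off + 1] + 1) * 8;
		if (size > len - off)
			return (size_t)-1;
		proto = pkt[off];	/* Next Header field */
		off += size;
	}
	*protop = proto;
	return off;
}
```

A parser that read the reserved byte of the Fragment header as a length would compute an attacker-influenced offset, which is the 'protoff' problem the commit describes.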
commit 935b2f9a4f059650d2e0af811aee6ad1dce41ab4 Author: maxv Date: Wed Jan 24 13:54:16 2018 +0000 Add missing NULL check. Normally that's not triggerable remotely, since we are guaranteed that 8 bytes are valid at mbuf+skip. commit d8ca205077a363228996e6ecdda531be1fa609c7 Author: maxv Date: Wed Jan 24 13:49:23 2018 +0000 Reinforce and clarify. commit d912334212803ec0a6457263308e6c22a3d4d528 Author: maxv Date: Wed Jan 24 13:30:47 2018 +0000 Fix a vulnerability in IPsec-IPv6-AH, that allows an attacker to remotely crash the kernel with a single packet. In this loop we need to increment 'ad' by two, because the length field of the option header does not count the size of the option header itself. If the length is zero, then 'count' is incremented by zero, and there's an infinite loop. Beyond that, this code was written with the assumption that since the IPv6 packet already went through the generic IPv6 option parser, several fields are guaranteed to be valid; but this assumption does not hold because of the missing '+2', and there's as a result a triggerable buffer overflow (write zeros after the end of the mbuf, potentially to the next mbuf in memory since it's a pool). Add the missing '+2', this place will be reinforced in separate commits. commit 16ba966c0982123155e655e33286ae81f3a52455 Author: maxv Date: Wed Jan 24 13:13:11 2018 +0000 Revert a part of rev1.49 (six months ago). The pointer given to memcpy was correct. Discussed with Christos and Ryota. commit bb260d0dc777ed30177e15b6a2d36ed16a03b0d8 Author: maxv Date: Tue Jan 23 15:13:56 2018 +0000 Fix the calculation of the ICMP6 error pointer. It is not correct to use pointer = opt - mtod(m, u_int8_t *) because m may have gone through m_pulldown, and it is possible that m->m_data is no longer the beginning of the packet. commit 4f1e702b58fcb7b8d358d7781862252a5eb84cb9 Author: maxv Date: Tue Jan 23 10:55:38 2018 +0000 Style, localify, remove XXX when there's no issue, and switch 'extra' to int. 
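Note on the IPsec-IPv6-AH fix above: the missing '+2' matters because the length byte of a TLV option counts only the option data, not the type and length bytes themselves. A minimal sketch of a bounded option walk over a flat buffer; count_options() is a hypothetical helper that only counts options (the real code zeroes mutable ones), and Pad1 is handled separately because it is a single byte with no length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_PAD1	0	/* one-byte padding option, no length field */

/*
 * Count the TLV options in opts[0..len).
 * Returns the number of options, or -1 if an option is truncated.
 */
static int
count_options(const uint8_t *opts, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(opts != NULL || len == 0);

	while (off < len) {
		size_t optlen;

		if (opts[off] == OPT_PAD1) {
			off += 1;
			n++;
			continue;
		}
		if (len - off < 2)
			return -1;
		/* +2: the length byte does not count the type and length bytes. */
		optlen = (size_t)opts[off + 1] + 2;
		if (optlen > len - off)
			return -1;
		off += optlen;
		n++;
	}
	return n;
}
```

With the '+2' in place every iteration advances by at least one byte, so a zero length field can no longer stall the loop, and the bounds checks keep the walk inside the buffer.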
commit 2bbac22c2c5761009a7dde2dd71829054e4fca61 Author: maxv Date: Tue Jan 23 10:46:59 2018 +0000 Fix the check on 'maxlen', we are not creating struct icmp6_hdr but struct nd_redirect (which is bigger). Also, make sure we can add a struct nd_opt_rd_hdr. Normally this doesn't change anything, since the mbuf has IPV6_MMTU bytes, and it's always way bigger than what we need. commit 0421c243c3a5fd543be50c4f34b434485670f8b4 Author: maxv Date: Tue Jan 23 10:32:50 2018 +0000 Fix info leak. We are allocating a slot of size: roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8) But we are not filling in the padding caused by the roundup, and therefore several bytes are leaked, in the mbuf we're about to send to the network. commit 8b0909c33ffde55a4ad76fc30d928edac86f31c9 Author: maxv Date: Tue Jan 23 09:21:59 2018 +0000 Fix twice the same mistake: 'last' can't be null, so there's no point in having this misleading branch. commit 53325d1d86d6767cbc431b2769ed808b9885641c Author: maxv Date: Tue Jan 23 07:33:49 2018 +0000 Don't use global variables, that's obviously incorrect on MP systems. One remains, because it is imported in tcp_timer.c, and I'm not totally sure of how it interacts with icmp_mtudisc(). commit 47c8173cf1c86ba6c0da2f18dbbee6fb2ef01557 Author: maxv Date: Tue Jan 23 07:15:04 2018 +0000 Style, localify icmp_send, and add a clear KASSERT (that replaces a vague comment). commit acc4ca4752ee042651d22dd949d736e5303701c8 Author: maxv Date: Tue Jan 23 07:02:57 2018 +0000 Style, and four fixes: * Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it will have M_DECRYPTED, and this is already checked. * Memory leaks in icmp6_error2. They seem hardly triggerable. * Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed to be located right after the IP6 header. ok mlelstv@ * Memory leak in _icmp6_input. This one seems to be impossible to trigger. 
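Note on the info leak fix above: when an option slot is sized with roundup(..., 8), the bytes between the end of the address and the end of the slot must be cleared before the packet leaves the machine. A minimal sketch, assuming a flat destination buffer; put_lladdr_opt() and ROUNDUP8 are hypothetical names, and the layout (type byte, length byte in 8-byte units, address, zero padding) follows the generic neighbor discovery option format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROUNDUP8(x)	(((x) + 7u) & ~(size_t)7)

/*
 * Write a link-layer address option into 'dst'.
 * Returns the option size in bytes, or 0 if it does not fit.
 */
static size_t
put_lladdr_opt(uint8_t *dst, size_t dstlen, uint8_t type,
    const uint8_t *addr, size_t addrlen)
{
	size_t optlen;

	assert(dst != NULL && addr != NULL);

	if (addrlen > 255 * 8 - 2)	/* length byte is in 8-byte units */
		return 0;
	optlen = ROUNDUP8(2 + addrlen);
	if (optlen > dstlen)
		return 0;

	memset(dst, 0, optlen);		/* clears the roundup padding too */
	dst[0] = type;
	dst[1] = (uint8_t)(optlen / 8);
	memcpy(dst + 2, addr, addrlen);
	return optlen;
}
```

Zeroing the whole slot up front is the simplest choice here; zeroing only the tail after the address would work as well.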
commit c4086b98cd282474fb4e275d0e0295340fcca0ec Author: maxv Date: Mon Jan 22 15:05:27 2018 +0000 Style and clarify. commit d964a2074dfe89d05d3c028824e76fd87a9f2c3e Author: maxv Date: Mon Jan 22 14:40:53 2018 +0000 Fix memory leak, looks like there is still something wrong here. commit d3daf4df906e16f607943fbb9996f8a9dc5ac2f9 Author: maxv Date: Mon Jan 22 10:40:36 2018 +0000 m_split does not 'attempt' to restore the chain, it just restores it plain and simple. commit cb86046726ea76f410642108b8d81e04db6954d6 Author: maxv Date: Mon Jan 22 10:26:38 2018 +0000 m_prepend does not tolerate being given len > MHLEN, so add a panic, and document this behavior. commit e82d707cf2cf1b20297a984c35ffc846f3e5bc98 Author: maxv Date: Mon Jan 22 09:51:06 2018 +0000 Fix null deref, m could be NULL if M_PREPEND fails. commit b228904f98aee0a57a5f45d80814b5fcf8182661 Author: maxv Date: Mon Jan 22 09:06:40 2018 +0000 Style, no functional change. commit e43c0c085654f4a50a0e16e7e653f311fde9e274 Author: maxv Date: Mon Jan 22 08:14:09 2018 +0000 Ah, remove duplicate SVS_LEAVE. Fixes 32bit binaries. While here remove duplicate 'cli', but that's harmless. commit 88de34fa89c23967d97a5ecaf54b780fb9484522 Author: maxv Date: Mon Jan 22 07:22:52 2018 +0000 Fix m_prepend(). If 'm' is not a pkthdr, it doesn't make sense to use MH_ALIGN, it should rather be M_ALIGN. I'm wondering whether there should not be a KASSERT to make sure 'm' is always a pkthdr. commit 7392dcf19fb56108121807099183cb8d444ef6dd Author: maxv Date: Mon Jan 22 07:11:45 2018 +0000 Add KASSERTs in *_ALIGN: ensure the mbuf is of the correct type, and also make sure m->m_data points at the beginning of the mbuf. commit 4af1fb188e52fd96f591dd6635c5276f35ceeb39 Author: maxv Date: Mon Jan 22 06:56:25 2018 +0000 Adapt previous, reintroduce MH_ALIGN. It's used as an optimization - we can later prepend something to the current mbuf without having to allocate a new mbuf. 
commit e9098b58e63501398b4d6d0857c7d67a5fc9703e Author: maxv Date: Sun Jan 21 14:18:21 2018 +0000 Switch sp_timoff to u_int16_t, to prevent possible overflow in ieee80211_recv_mgmt_beacon(). Actually this field is unused. commit a55b6b32e89b82b39dc6b73de746b72202144041 Author: maxv Date: Sun Jan 21 14:13:49 2018 +0000 Appease the overflow check, 4 is enough. commit db265d37b0674ad971f5d165c57f76f979a9bf8e Author: maxv Date: Sun Jan 21 11:21:40 2018 +0000 Unmap the kernel from userland in SVS, and leave only the needed trampolines. As explained below, SVS should now completely mitigate Meltdown on GENERIC kernels, even though it needs some more tweaking for GENERIC_KASLR. Until now the kernel entry points looked like: FUNC(intr) pushq $ERR pushq $TRAPNO INTRENTRY ... handle interrupt ... INTRFASTEXIT END(intr) With this change they are split and become: FUNC(handle) ... handle interrupt ... INTRFASTEXIT END(handle) TEXT_USER_BEGIN FUNC(intr) pushq $ERR pushq $TRAPNO INTRENTRY jmp handle END(intr) TEXT_USER_END A new section is introduced, .text.user, that contains minimal kernel entry/exit points. In order to choose what to put in this section, two macros are introduced, TEXT_USER_BEGIN and TEXT_USER_END. The section is mapped in userland with normal 4K pages. In GENERIC, the section is 4K-page-aligned and embedded in .text, which is mapped with large pages. That is to say, when an interrupt comes in, the CPU has the user page tables loaded and executes the 'intr' functions on 4K pages; after calling SVS_ENTER (in INTRENTRY) these 4K pages become 2MB large pages, and remain so when executing in kernel mode. In GENERIC_KASLR, the section is 4K-page-aligned and independent from the other kernel texts. The prekern just picks it up and maps it at a random address. In GENERIC, SVS should now completely mitigate Meltdown: what we put in .text.user is not secret. 
In GENERIC_KASLR, SVS would have to be improved a bit more: the 'jmp handle' instruction is actually secret, since it leaks the address of the section we are jumping into. By exploiting Meltdown on Intel, this theoretically allows a local user to reconstruct the address of the first text section. But given that our KASLR produces several texts, and that each section is not correlated with the others, the level of protection KASLR provides is still good. commit dc0df07afef4614c03dc9411870a40737528b04d Author: maxv Date: Sun Jan 21 10:59:21 2018 +0000 Increase the size of the initial mapping of the kernel. KASLR kernels are bigger than their GENERIC counterparts, and the limit will soon be hit on them. commit 10de2466368b597d9978f61e526b14f530b1e5b5 Author: maxv Date: Sun Jan 21 08:33:46 2018 +0000 Fix the build, on Xen too amd64_trap.S needs to be compiled independently. commit fe9c2bdb57ebb8a1b41ca4fb1422ad3905f56edd Author: maxv Date: Sun Jan 21 08:20:30 2018 +0000 Make it possible for SVS to map in the user page tables a 4K kernel page contained in a 2MB large page. Will be used soon. commit 5bd45847270dcc6f9a156d2011c9457110fa99c7 Author: maxv Date: Sat Jan 20 14:39:21 2018 +0000 Use .pushsection/.popsection, we will soon embed macros in several layers of nested sections. commit 0c599f62d221b0a37ea6f62113acd85edcd5debb Author: maxv Date: Sat Jan 20 14:27:14 2018 +0000 Compile amd64_trap.S as a file instead of including it. commit 473011e7521609f3f6728e1b4d8562d5c1ac1a09 Author: maxv Date: Sat Jan 20 14:08:08 2018 +0000 Start with .text not to inherit the last section of amd64_trap.S, and remove outdated #define. commit f8a54ad1c68df33c1cc549f29818d2ecbeb75228 Author: maxv Date: Sat Jan 20 13:45:15 2018 +0000 Eliminate a '.text'. commit e4226ba1496e042911cb1aa2db4307d61becaceb Author: maxv Date: Sat Jan 20 13:42:07 2018 +0000 Don't declare exceptions[] with IDTVEC, it's an array, not a function. Rename it to x86_exceptions[], and move it to .rodata. 
commit 6b89df06b6e3167c9bdebcf89abab349a1b5dc2a Author: maxv Date: Sat Jan 20 08:45:28 2018 +0000 Mmh, restore PG_G on the direct map, we still want that in the non-SVS case. commit ce135aef2b31ab6619d28f94e34daec751afb489 Author: maxv Date: Sat Jan 20 08:30:53 2018 +0000 Fix the double-fault handler. We're executing on ist1 and must not jump out of it, so don't enable interrupts. And use the SVS_*_ALTSTACK macros. While here, fix the NMI handler too: it should use SVS_LEAVE_ALTSTACK. commit c934d768b72eaa4e1741c30be10767d67226d0b0 Author: maxv Date: Sat Jan 20 07:43:28 2018 +0000 Improve two comments and a KASSERT. commit fd44ad84e18def31e0414a7e30ecda36cb8ce3f1 Author: maxv Date: Fri Jan 19 15:04:29 2018 +0000 Several changes: * Declare TRIM_LABEL as a function. * In mpls_unlabel_inet, copy the label locally. It's not incorrect to keep a pointer on the mbuf, but it's bug-friendly. * In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we just want to make sure that m_copydata won't fail, but if we were guaranteed that m has M_PKTHDR set, we could simply check the length against m->m_pkthdr.len. commit 18b1d5044ebfdf6f837edec432e41b67ef5acead Author: maxv Date: Fri Jan 19 14:30:09 2018 +0000 Fix build failure, the structure is already defined now. commit ce7b46cade8371fa23534c6ee60853bb7bd44760 Author: maxv Date: Fri Jan 19 14:15:35 2018 +0000 Add XXX. commit 1a34d4597de0b222d8a7066f1818aa7d34f7658c Author: maxv Date: Fri Jan 19 13:17:29 2018 +0000 Fix a buffer overflow in icmp_error. We create in 'm' a packet that must contain: IPv4 header | Fixed part of ICMP header | Variable part of ICMP header But we perform length checks on 'totlen', which does not count the IPv4 header. So now, add sizeof(struct ip) in totlen, and stop doing this m_data nonsense, just get the pointers as usual. commit 688ca3bc97ffdf60555b68a4b57c31f0bc75ecb5 Author: maxv Date: Fri Jan 19 12:50:27 2018 +0000 Clarify icmp_error: * Rename (and constify) oiplen -> oiphlen. 
* Rename icmplen -> datalen, it's the size of the variable part of the ICMP header, not the total size of the ICMP header itself. * Introduce totlen, this is the total size of the ICMP header (icmp_ip included). No real functional change. commit e981c552f0760e0140bcd311eda4deb346ce7d50 Author: maxv Date: Fri Jan 19 10:54:31 2018 +0000 Move the ICMP Extension structures from mpls_ttl.c to ip_icmp.h; that's part of the ICMP protocol (per RFC4884), and not specific to MPLS. Also add ih_exthdr in struct icmp, the 'length' field appeared. While here, style in MPLS. commit 9270cdb99c727fdffb6c17b5a557c26868b12e48 Author: maxv Date: Fri Jan 19 10:21:24 2018 +0000 Style, explain a bit, and fix icmp_radv, it should be icmp_dun.id_radv. commit 89d4dc48e5f77de02b48deae73631f05bf836f12 Author: maxv Date: Fri Jan 19 07:58:25 2018 +0000 Style, no functional change. commit b16483e76446e650752d662afb30cf401d3ca2c6 Author: maxv Date: Fri Jan 19 07:57:50 2018 +0000 Style, and check the return value of m_append. commit 4dccc06ca7f83d96bfe7f732fb511fe4d32092cd Author: maxv Date: Fri Jan 19 07:53:46 2018 +0000 Style, no functional change. commit 671dd6e4491164fb42b3691f1536201fe5bc2ac9 Author: maxv Date: Fri Jan 19 07:52:37 2018 +0000 Style, and make sure that there is a header+trailer included in the packet. The crypto functions can touch the trailer, but they don't check whether it's there in the first place. commit 8db79d308940bef674ef9540f788b73c82ee9c43 Author: maxv Date: Thu Jan 18 17:59:29 2018 +0000 Style, no functional change. commit 0fcd3416fce91356dc5113b89f403a72dfcd69f9 Author: maxv Date: Thu Jan 18 17:57:49 2018 +0000 Style, and zero out 'ns' entirely, otherwise some bytes get leaked to userland (eg ns_rsvd0). commit 5acf42059777aba9d36dc0a049dd5c25d281984c Author: maxv Date: Thu Jan 18 16:23:43 2018 +0000 Several changes: * Make the code more readable. * Add a panic in ieee80211_compute_duration(). 
I'm not sure there's a bug here - I don't have the hardware -, but looking at the code, it may be possible for 'paylen' to go negative. Obviously that's not the correct way to fix it, but at least we'll see if it happens. commit a923103bba2ef45bebb14ee0850e90c664ba7258 Author: maxv Date: Thu Jan 18 13:31:20 2018 +0000 Don't return the address of the kernel modules if the user is not privileged. Discussed on tech-kern@. commit ed600df743388049da5376c7506e6a97126d0929 Author: maxv Date: Thu Jan 18 13:24:01 2018 +0000 Several changes: * Make the code more readable. In particular, declare variables as const along the way. * Explain what we're doing in ieee80211_send_mgmt(). The IEEE80211_FC0_SUBTYPE_PROBE_RESP case has some inconsistencies, but they are not inherently wrong so I'm not changing that. * When sending IEEE80211_FC0_SUBTYPE_REASSOC_RESP frames, make sure to zero out the 'association ID', otherwise two bytes are leaked. * Fix a possible memory leak in ieee80211_send_probereq(). commit 72152b7e3de641a6959f719b93783c0d5f288492 Author: maxv Date: Thu Jan 18 07:25:34 2018 +0000 Unmap the kernel heap from the user page tables (SVS). This implementation is optimized and organized in such a way that we don't need to copy the kernel stack to a safe place during user<->kernel transitions. We create two VAs that point to the same physical page; one will be mapped in userland and is offset in order to contain only the trapframe, the other is mapped in the kernel and maps the entire stack. Sent on tech-kern@ a week ago. commit a8560c326f1ed7d804cd5213a5eaa7eb95fe0089 Author: maxv Date: Wed Jan 17 17:41:38 2018 +0000 Style, and fix two pretty bad mistakes in the crypto functions: * They call M_PREPEND, but don't pass the updated pointer back to the caller. * They use memmove on the mbuf data, but they don't ensure that the area they touch is contiguous. This fix is not complete, ieee80211_crypto_encap too needs to pass back the updated pointer. 
This will be done in another commit. commit 1f6505e241170730cfc9638d1140e95641792822 Author: maxv Date: Wed Jan 17 16:03:16 2018 +0000 Several changes: * Style in several places, to make the code more readable or easier to understand. * Instead of checking m->m_pkthdr.len, check m->m_len. m_pkthdr.len is the total size of the packet, not the size of the current mbuf (which may be smaller). * Add a missing length check when handling QoS frames. * Cast the lengths passed in IEEE80211_VERIFY_LENGTH to size_t. * Remove the length check on scan.sp_xrates, that I added yesterday. xrates gets silently truncated in ieee80211_setup_rates(). * Fix several buffer overflows in the parsers of the MANAGEMENT frames. commit 998adca5710310a8b1e374a33e263497b876c555 Author: maxv Date: Tue Jan 16 18:53:32 2018 +0000 Various fixes: style, remove tiring XXXs, and prevent integer overflow in ieee80211_setup_rates (normally it already can't happen, because I added a length check on xrates in ieee80211_recv_mgmt_beacon). commit 932b8ca71a28ce8cf9644098d27146beb9207aa9 Author: maxv Date: Tue Jan 16 18:42:43 2018 +0000 Prepend 'sp_' to the name of the fields, so that they can easily be found via NXR or grep. commit 91c2939b07de14d2c9831dfdf9219b0c53fdd670 Author: maxv Date: Tue Jan 16 16:54:54 2018 +0000 Add comments about the length checks, and check xrates. commit 2cceeffafd6eb5800e379fe4558f1f9d3178404a Author: maxv Date: Tue Jan 16 16:31:37 2018 +0000 Gather related code. commit 992f61c7a5ab9875acfd6340f13bf53c9e49e4de Author: maxv Date: Tue Jan 16 16:20:57 2018 +0000 Style on the new functions. commit fbc428b7d848859aaa069344ed93a84914998e0e Author: maxv Date: Tue Jan 16 16:09:30 2018 +0000 Introduce ieee80211_recv_mgmt_disassoc. commit 9838344e164f40d81a3f2011b7f19e5e4409d1b9 Author: maxv Date: Tue Jan 16 16:04:16 2018 +0000 Introduce ieee80211_recv_mgmt_deauth. 
commit f12320afdecc763e8729c70117321d853d7c4ce6 Author: maxv Date: Tue Jan 16 16:00:17 2018 +0000 Introduce ieee80211_recv_mgmt_assoc_resp. commit f02200cd42b0170f9e0d0addd35976571fccc260 Author: maxv Date: Tue Jan 16 15:55:14 2018 +0000 Introduce ieee80211_recv_mgmt_assoc_req. commit d539df6fc2e4a5c211848c9dfc5254668cc832f7 Author: maxv Date: Tue Jan 16 15:48:32 2018 +0000 Introduce ieee80211_recv_mgmt_auth. commit d5b4fe97c44dc5d310270731c3b3ff0d893b61ea Author: maxv Date: Tue Jan 16 15:42:52 2018 +0000 Start splitting ieee80211_recv_mgmt. commit d905e8cc8cafb53b717022acb5bc0be76a0acc70 Author: maxv Date: Tue Jan 16 15:18:37 2018 +0000 More overflows... commit d8dc7fea125d58983aa644b26ab50e3777c0d733 Author: maxv Date: Tue Jan 16 14:37:24 2018 +0000 Fix overflow. commit 5d0dea8f3b939eadeddc158b1308b6a599f2a2af Author: maxv Date: Tue Jan 16 14:23:15 2018 +0000 Mmh refix previous, we also need to make sure frm[1] is there. commit 2a12fed139d1ef416fabda01cba79cd805f60ad2 Author: maxv Date: Tue Jan 16 14:01:13 2018 +0000 Fix memory leak. If m1 == m, m = NULL, so it's safe to just call m_freem. commit 65cefb8ef9be9474c3c212f3044c75b95bc0a9c5 Author: maxv Date: Tue Jan 16 13:48:21 2018 +0000 Fix overflow, noted by Maya. commit a230e38aa11e68456d9303631941ecb115fbcd1f Author: maxv Date: Tue Jan 16 09:42:11 2018 +0000 Style, remove pointless XXXs, and add a comment about LLC. commit 9af01d3acaa9a58a5df54a7736aeb62600afca3a Author: maxv Date: Tue Jan 16 09:04:30 2018 +0000 Update the mbuf pointer when m_pullup succeeds, I forgot this in my last revision (I only fixed the UAF in one branch). Meanwhile, style. commit 48575915a1fb6d001f0a35ebe5adb9b6544ae66d Author: maxv Date: Tue Jan 16 08:39:29 2018 +0000 Split ieee80211_input into three sub-functions, that parse received packets depending on their type: DATA -> ieee80211_input_data MANAGEMENT -> ieee80211_input_management CONTROL -> ieee80211_input_control No real functional change, but makes the code much clearer. 
commit 696f1d75feed6da194aa9e61c3936b79a2930e10 Author: maxv Date: Tue Jan 16 07:53:02 2018 +0000 Start cleaning up this mess. commit b1ae229bf4514e2e9579ee5844b4cfb5f300facf Author: maxv Date: Tue Jan 16 07:05:24 2018 +0000 Fix overflow. commit 1ac20093ba75487c5e626dc566a0eff6efb89ab9 Author: maxv Date: Tue Jan 16 06:38:42 2018 +0000 style commit fd270400bc2b1742a0d122e9b6f8120b43d70c9e Author: maxv Date: Mon Jan 15 16:36:51 2018 +0000 Mostly style, and add a bunch of KASSERTs. commit 1c41d1a636e0a739e59956edef88c24740211a87 Author: maxv Date: Mon Jan 15 14:00:34 2018 +0000 Style, and fix a bug in the AppleTalk path: we're doing M_PREPEND(M_DONTWAIT), but we forgot to NULL-check the mbuf afterwards. commit 22973b3b84fcf2995bc66895210c538d561e7861 Author: maxv Date: Mon Jan 15 13:14:18 2018 +0000 Fix two bugs in altq_etherclassify. When scanning the mbuf chain we need to make sure that m_next is not NULL, otherwise NULL deref. After that, we must not touch m->m_pkthdr, given that 'm' may not be the first mbuf of the chain anymore. Declare mtop, and add a KASSERT to make sure it has M_PKTHDR set. commit d8e2bf1537abbdc077a407d7af0642f749617e02 Author: maxv Date: Mon Jan 15 13:05:40 2018 +0000 Add a KASSERT in IFQ_CLASSIFY, we really need to make sure the given mbuf is the top of the chain. commit c6b2ba126a8acb5441cd35871ad54c575cab37fa Author: maxv Date: Mon Jan 15 12:17:05 2018 +0000 Fix a bug in the VLAN path: there's an inverted logic, the mbuf needs to be bigger than struct ether_vlan_header, not smaller. Meanwhile add a KASSERT in the LLC path. commit ccc2590d96e080d7af0f7f46081fad9d8c4cef6e Author: maxv Date: Mon Jan 15 11:57:27 2018 +0000 Style, make the code more readable, and add a KASSERT (we expect the mbuf to have M_PKTHDR set). commit 6eb93b7553eb45fc9a83dd38fa4fcbdd32cde170 Author: maxv Date: Mon Jan 15 11:16:04 2018 +0000 Mmh, fix a weird mistake: the guy who added #if NVLAN > 0 forgot to actually include vlan.h, so the branches are never compiled. 
They don't compile, by the way, so fix that too, by reproducing the vlan input path of ether_input(). commit 553247e8ae87745f8438e4233b458a4009eb2f9c Author: maxv Date: Mon Jan 15 10:27:51 2018 +0000 Several fixes: - Style and typos - Use kmem_zalloc, in case there is a padding between the fields of the structures - Use ETHER_ADDR_LEN instead of a hard-coded '6' - kmem_alloc(KM_SLEEP) can't fail - Simplify ether_aton_r - Use mutex_obj_free, not to leak memory commit 165a302b0f2d1a41154d61bb445e224aa6972c08 Author: maxv Date: Mon Jan 15 09:49:16 2018 +0000 If the bridge is not running, don't call bridge_stop. Otherwise the following commands will crash the kernel: ifconfig bridge0 create ifconfig bridge0 destroy commit db7cd8371a90b511f8fe7cb1ec51a7670ec3b907 Author: maxv Date: Mon Jan 15 09:26:21 2018 +0000 Fix spl leak. ifconfig gif0 create ifconfig gif0 destroy WARNING: SPL NOT LOWERED ON ... commit 0031abc3fe575856e907ac3b778d398e49c3de62 Author: maxv Date: Mon Jan 15 08:45:19 2018 +0000 Style, improve comment, and add KASSERTs on the assumptions. commit 7012bf78607cb957e7f23751989934afc9fa47fd Author: maxv Date: Mon Jan 15 07:59:48 2018 +0000 Fix the net.ether.multicast sysctl. If there is no multicast address don't kmem_alloc(0) (which panics the kernel), and if the number of multicast addresses has decreased don't copyout uninitialized kernel data. commit 43cdc7dca2a945b6b0a0d748267a61731e687d41 Author: maxv Date: Sun Jan 14 18:23:03 2018 +0000 Fix awful use of m_defrag, this code just can't work. And don't forget to return the updated pointer, because otherwise use-after-free. I couldn't test this change because I don't have the hardware. commit f1ebe49100b24ab599181bb0e2c452ae208c939c Author: maxv Date: Sun Jan 14 17:43:55 2018 +0000 Dedup. m_defrag is already a common function, no need to reimplement it there. 
Meanwhile this should fix two bugs (that I couldn't investigate more than that since I don't have this hardware): the mbuf passed to vge_m_defrag was leaked, and the tags were not copied in the returned mbuf. commit d54d5fc22a3978c1811cce92a8fdec8bad334859 Author: maxv Date: Sun Jan 14 17:16:58 2018 +0000 KDASSERT -> KASSERT. This code is fast and useful. commit 4bb0302d517a03cc896ff7a1c084adb21fbc5346 Author: maxv Date: Sun Jan 14 16:59:37 2018 +0000 style commit b3dece58fc767540958c301b64b62879145026c2 Author: maxv Date: Sun Jan 14 16:50:37 2018 +0000 If cnt == 0, don't kmem_alloc(0). Found by Mootja. Looking at the code, I also find it suspicious that we read ifv->ifv_mib->ifvm_p directly without making sure ifv_mib != NULL. commit abe8117b9d6c9cda5dfdce333d61753b029e4e51 Author: maxv Date: Sun Jan 14 16:43:03 2018 +0000 typos commit 53bf5be0d557508192b2581eaca33ed073f02289 Author: maxv Date: Sun Jan 14 16:36:04 2018 +0000 Fix use-after-free. There is a path where the mbuf gets pulled up without a proper mtod afterwards: 218 ipo = mtod(m, struct ip *); 281 m = m_pullup(m, hlen); 232 ipo->ip_src.s_addr Found by Mootja. Meanwhile it seems to me that 'ipo' should be set to NULL if the inner packet is IPv6, but I'll revisit that later. commit dacfe7734dd0710bfc6474669d63dbff9fcc7f5f Author: maxv Date: Sun Jan 14 16:18:11 2018 +0000 Fix memory leak, found by Mootja. commit d14d5cad5d7afb2a2dd06df9cb35c8efdb25cdbc Author: maxv Date: Fri Jan 12 09:12:01 2018 +0000 Split svs_page_add in two, one half will be used for other purposes, and update a comment. commit df03c95dfb78ada1d0020b67a3f2c7ab037d35b8 Author: maxv Date: Fri Jan 12 06:24:43 2018 +0000 Remove unused. commit d4b745eee4463fb4ac19f81267b87928e09e49d4 Author: maxv Date: Thu Jan 11 13:35:15 2018 +0000 Introduce a new svs_page_add function, which can be used to map in the user space a VA from the kernel space. 
Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates its own slot which maps only its own pcpu_entry plus the common area (IDT+ LDT). This way, the pcpu areas of the remote CPUs are not mapped in userland. commit 5a3dbe7bc9ed9cc496a0c23a4de47343233f51e3 Author: maxv Date: Thu Jan 11 11:15:34 2018 +0000 The uarea must always be page-aligned. commit 79118e4b0cff4b842511abb98af2184b55dcbf0e Author: maxv Date: Thu Jan 11 10:38:13 2018 +0000 Add ist0 to pcpu_entry. commit 45908b892ee69a505f05697314b61ae5ed24b728 Author: maxv Date: Thu Jan 11 10:30:26 2018 +0000 Initialize ist0 in cpu_init_tss. On amd64 this is the DDB stack, and it has nothing to do with ci_intrstack. While here, style, and don't forget to pass UVM_KMF_ZERO in uvm_km_alloc. commit 2ddd8078d65852551c0848922e76720b9f642b80 Author: maxv Date: Thu Jan 11 09:00:04 2018 +0000 Declare new SVS_* variants: SVS_ENTER_NOSTACK and SVS_LEAVE_NOSTACK. Use SVS_ENTER_NOSTACK in the syscall entry point, and put it before the code that touches curlwp. (curlwp is located in the direct map.) Then, disable __HAVE_CPU_UAREA_ROUTINES (to be removed later). This moves the kernel stack into pmap_kernel(), and not the direct map. That's a change I've always wanted to make: because of the direct map we can't add a redzone on the stack, and basically, a stack overflow can go very far in memory without being detected (as far as erasing all of the system's memory). Finally, unmap the direct map from userland. commit 3b6aad35d7c3febe7276865087cdc5ee81980884 Author: maxv Date: Wed Jan 10 20:51:11 2018 +0000 Restrict the check: SMAP faults are always protection violations, as the SDM points out, so make sure we have PGEX_P. This way NULL dereferences - which are caused by an unmapped VA, and therefore are not protection violations - don't take this branch, and don't display a misleading "SMAP" in ddb. 
Adding a PGEX_P check, or not, does not essentially change anything from a security point of view, it's just a matter of what gets displayed when a fatal fault comes in. I didn't put PGEX_P until now, because initially when I wrote the SMAP implementation Qemu did not always receive the fault if the PGEX_P check was there, while a native i5 would. I'm unable to reproduce this issue with a recent Qemu, so I assume I did something wrong when testing in the first place. commit 6e6b104371bbdde5669da615f58994bd5893dc33 Author: maxv Date: Wed Jan 10 18:13:29 2018 +0000 Add KASLR and SVS. commit 7cc64e90bb2160a1b940d8b9e1a1871fde173132 Author: maxv Date: Mon Jan 8 14:39:33 2018 +0000 Make Xen compile again. commit 496dc1cd0ad8a94bcbe8160877442c950eacd20a Author: maxv Date: Mon Jan 8 09:33:53 2018 +0000 Since SVS is now defined in files.x86, remove it from files.amd64 and files.i386. commit 5939d72a4aba02c8f746b6da66a1ae10598e67fc Author: maxv Date: Sun Jan 7 16:10:52 2018 +0000 Don't enable SVS yet. commit 0c4a3be28c47135311e0fb00132a0e9b60014ac5 Author: maxv Date: Sun Jan 7 16:10:16 2018 +0000 Add a new option, SVS (for Separate Virtual Space), that unmaps kernel pages when running in userland. For now, only the PTE area is unmapped. Sent on tech-kern@. commit 88d9257e94cd4a3adf765726e037a79db87e0ab1 Author: maxv Date: Sun Jan 7 13:43:23 2018 +0000 Switch x86_retpatch[] -> HOTPATCH(). commit 9f63dde9e6cea4a60148770dca0ed7cd9daf7bcd Author: maxv Date: Sun Jan 7 13:37:39 2018 +0000 Fix previous - atomic_lockpatch[] is still there. commit db920a754d8799daf2f9b9c22ee8a51a39e40cee Author: maxv Date: Sun Jan 7 13:15:23 2018 +0000 Switch x86_lockpatch[] -> HOTPATCH(). commit de80fc604897f3f7533000cbab06fc4b5ee049dc Author: maxv Date: Sun Jan 7 12:42:46 2018 +0000 Implement a real hotpatch feature. Define a HOTPATCH() macro, that puts a label and additional information in the new .rodata.hotpatch kernel section. In patch.c, scan the section and patch what needs to be. 
Now it is possible to hotpatch the content of a macro. SMAP is switched to use this new system; this saves a call+ret in each kernel entry/exit point. Many other operating systems do the same. commit 342f94af5d7f1e8817c343d69a459cebb8e070bc Author: maxv Date: Sun Jan 7 11:24:45 2018 +0000 Give patchbytes an array. commit fccb533166941fe4622e164b733b8e4e1efed98f Author: maxv Date: Sun Jan 7 10:16:13 2018 +0000 Use uvm_km_alloc instead of kmem_zalloc. commit 9540a7c9cb107bd3f27e18dcce7e5a4432e13993 Author: maxv Date: Sat Jan 6 08:44:01 2018 +0000 Mmh, I made a mistake in r1.10 - I forgot to update this function call. commit 8964449a7e73dcb94f202a557f97db8ffdd63dde Author: maxv Date: Fri Jan 5 08:04:20 2018 +0000 Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not Xen. With this option, the CPU structures that must always be present in the CPU's page tables are moved on L4 slot 384, which means address 0xffffc00000000000. A new pcpu_area structure is defined. It contains shared structures (IDT, LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci). Theoretically the LDT should be in the array, but this will be done later. During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a page tree that is able to map the pcpu_area structure entirely. cpu0 then immediately maps the shared structures. Later, every CPU goes through cpu_pcpuarea_init, which allocates physical pages and kenters the relevant pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea. The point of this change is to make sure that the structures that must always be present in the page tables have their own L4 slot. Until now their L4 slot was that of pmap_kernel, and making a distinction between what must be mapped and what does not need to be was complicated. 
Even in the non-speculative-bug case this change makes some sense: there are several x86 instructions that leak the addresses of the CPU structures, and putting these structures inside pmap_kernel actually offered a way to compute the address of the kernel heap - which would have made ASLR on it plainly useless, had we implemented that. Note that, for now, pcpuarea does not contain rsp0. Unfortunately this change adds many #ifdefs, and makes the code harder to understand. There is also some duplication, but that will be solved later. commit d883e3ded59c99c3b06bbded1889567f52bc898d Author: maxv Date: Thu Jan 4 20:38:30 2018 +0000 Declare gdt_size as const, simplifies. commit d8a4c9c8a4e3c7b9f88d251c1fe24c43ce371bd6 Author: maxv Date: Thu Jan 4 14:02:23 2018 +0000 Declare IOMAP_VALIDOFF, not to use ci_tss pointers. commit 36f7684150702fb6fa268f11e5379df146a2b58e Author: maxv Date: Thu Jan 4 13:36:30 2018 +0000 Allocate the TSS area dynamically. This way cpu_info and cpu_tss can be put in separate pages. commit 6825d8ad8283dc3f053ab8a2a33d6fff49889c37 Author: maxv Date: Thu Jan 4 12:34:15 2018 +0000 Group the different TSSes into a cpu_tss structure. And pack this structure to make sure there is no padding between 'tss' and 'iomap'. commit 84973b70e0ab64e91ee21b082bf01f085633db8e Author: maxv Date: Wed Jan 3 09:46:41 2018 +0000 style commit b71ed6754a3c3a7827f2058138aab817e18cd8de Author: maxv Date: Wed Jan 3 09:38:23 2018 +0000 simplify commit a4b8db089f9e6c7f5cf1153e0c5bba616d7a6bb0 Author: maxv Date: Tue Jan 2 18:54:26 2018 +0000 Stop sharing the double-fault stack. It is embedded in .data, and we won't want that in the future. This has always been wrong anyway, even if it is unlikely that two CPUs will double fault at the same time. commit 30dfbad20d344bb80a4cc0318863e4705c87f1f2 Author: maxv Date: Tue Jan 2 18:41:14 2018 +0000 Use decimal numbering - hex is just misleading -, use ZTRAP_NJ for NMIs, and declare intrspurious independently. 
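The __HAVE_PCPU_AREA commit above states that L4 slot 384 corresponds to address 0xffffc00000000000. The arithmetic behind that can be sketched as follows, assuming 4-level paging where each L4 slot covers 2^39 bytes and addresses are sign-extended from bit 47. The helper name l4_slot_to_va is hypothetical and is not taken from the NetBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4-level paging, each L4 slot covers 2^39 bytes (512GB). */
#define L4_SHIFT	39
#define L4_NSLOTS	512u

/* Hypothetical helper: canonical base VA of a given L4 slot (0..511). */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_NSLOTS);
	va = (uint64_t)slot << L4_SHIFT;

	/* If bit 47 is set, sign-extend into bits 48..63 (canonical form). */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Calling l4_slot_to_va(384) yields the address quoted in the commit message; slots below 256 land in the lower (userland) half and are returned unchanged.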
commit 1350303403f72a4b18f2f6274d503cdfd85e88e2 Author: maxv Date: Mon Jan 1 12:36:26 2018 +0000 Remove MFREE. commit 04ed035cc2a6ebd6012bcb632ae2a8357e7a1ebd Author: maxv Date: Mon Jan 1 12:22:59 2018 +0000 Detect use-after-frees on mbufs with external storage, too. This is done even when the refcount is > 1. Again, this code is enabled by default, because it is fast and quite useful. commit 17b73d88739888cf66bbc1bba935ce1c1f45401b Author: maxv Date: Mon Jan 1 12:09:56 2018 +0000 Don't use macros, rather inline, much clearer. For the record, I was partly mistaken in my previous commit: even though the macros were local, the function names were still the ones of the real callers. However, setting the name in m_data was not a good thing; this was a valid pointer, and the kernel could execute a long time before figuring out the mbuf was already freed - therefore making debugging more difficult. And information on the caller can be obtained via ddb anyway. commit e65fe66b16554cb4cdf767024b6fbacd7c010b1c Author: maxv Date: Mon Jan 1 08:14:13 2018 +0000 Compile the prekern entry point only under KASLR. commit 683b4c48842c01378f264d39b30d3cbf6e63ac36 Author: maxv Date: Mon Jan 1 08:03:43 2018 +0000 Use the default %cs, and mask the other segregs. commit 7a3545607e736a9fe89d7091d57935c43266826e Author: maxv Date: Sun Dec 31 15:41:05 2017 +0000 Ah, finally found you. Fix two bugs in pmap_remap_largepages(), that could cause KASLR kernels to crash early during the boot procedure. pmap_remap_largepages assumes that the kernel is far from the end of the VM space, but this assumption does not hold with KASLR, since the kernel sections are allowed to reside in the very last page of the VM space. Doing +NBPD_L2 or roundup() in such cases caused an integer overflow, which caused a page fault when touching &L2_BASE, which in turn caused an immediate CPU reset and a reboot. Took me a while to reproduce and debug this issue. 
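The pmap_remap_largepages() fix above is about rounding an address up to NBPD_L2 when the kernel sits in the very last pages of the VM space. A minimal sketch of that failure mode, assuming 64-bit addresses and a 2MB NBPD_L2; both function names are hypothetical and this is not the code that was committed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBPD_L2	(1ULL << 21)	/* assumption: 2MB large page */

/* Classic round-up: silently wraps when va is within the last 2MB. */
static uint64_t
roundup_l2_unchecked(uint64_t va)
{
	return (va + NBPD_L2 - 1) & ~(NBPD_L2 - 1);
}

/* Hypothetical checked variant: reports failure instead of wrapping. */
static bool
roundup_l2_checked(uint64_t va, uint64_t *out)
{
	assert(out != NULL);
	if (va > UINT64_MAX - (NBPD_L2 - 1))
		return false;	/* the addition would overflow */
	*out = (va + NBPD_L2 - 1) & ~(NBPD_L2 - 1);
	return true;
}
```

With an address such as 0xffffffffffe00001 the unchecked version wraps around to 0, which is the kind of bogus value that leads to a fault early in boot; the checked version refuses the operation instead.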
commit 28ab40b997f45d781a1ced73ed32826e6a8a856b Author: maxv Date: Sun Dec 31 08:29:38 2017 +0000 Fix a huge privilege separation vulnerability in Xen-amd64. On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL equals SEL_UPL. While Xen can make a distinction between usermode and kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes to read and write to the CPU ports. It is easy, then, to completely escalate privileges; by reprogramming the PIC, by reading the ATA disks, by intercepting the keyboard interrupts (keylogger), etc. Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use the ports but not userland. I didn't test this change on i386, but it seems fine enough. commit 550f0f830f78d107a1a31e7348dc528670ef86a5 Author: maxv Date: Sun Dec 31 07:23:09 2017 +0000 gc unused commit 94dc165913a1f76886a20737da31f73e2073b636 Author: maxv Date: Sun Dec 31 06:57:12 2017 +0000 Check MT_FREE by default, and not just under DEBUG (or DIAGNOSTIC). This code is fast, with a nonexistent overhead - and we already take care of setting MT_FREE, so why not check it. In addition, stop registering the function name, that's not helpful since the MBUFFREE macro is local. Instead, set m_data to NULL, so that any access to a freed mbuf's data after mtod() or similar will page fault. The combination of these two changes provides a fast and efficient way of detecting use-after-frees in the network stack. commit 8cf6cd717c810b4ae1dc8c427630d6017e80a339 Author: maxv Date: Thu Dec 28 14:34:39 2017 +0000 Use variables in PMAP_DIRECT_*, so that the location of the direct map can change. commit 7c4d6fbce10b8411c49ac8c7702a24076f2f9662 Author: maxv Date: Thu Dec 28 14:03:13 2017 +0000 Eliminate the assumption that the beginning of the direct map is aligned to NBPD_L4 and NBPD_L3. It won't be when we randomize its location.
commit 1a1f8e79f4a192ffe46253f52f9c0af7ba44f1a9 Author: maxv Date: Thu Dec 28 13:46:10 2017 +0000 Downgrade the direct map from 1GB superpages to 2MB large pages, and simplify. Then, map the "head" region and the kernel segments as RO instead of RW, to kill the last place that has .text mapped as writable. It will also allow for a greater number of possibilities when we will randomize the direct map. While it is true that this change theoretically reduces performance a bit, we are more interested in correctness. commit eb841456f7de9d161a30996278f808010ddab99a Author: maxv Date: Thu Dec 28 08:49:28 2017 +0000 Style, export struct acpisrat_node, and add acpisrat_get_node. commit e1a629f45eb1982ca538345a823267c0ebc1f8de Author: maxv Date: Thu Dec 28 08:30:36 2017 +0000 typos commit 5e26281014edf5cabfc6f96701ae84f746940241 Author: maxv Date: Fri Dec 22 07:37:27 2017 +0000 Sync comments with reality. commit 962a9bb019a537ae0150a6b2fdae6394de541e10 Author: maxv Date: Fri Dec 22 07:19:02 2017 +0000 Build and install the prekern by default. I didn't build a full distribution to test this change, but it seems fine enough. commit 911d988bf2f13932c1ab9d4566a2bce47d029d88 Author: maxv Date: Thu Dec 21 14:32:06 2017 +0000 Remove unused macros. commit e846436051f74769f59da18ad79fc0a7953c28c8 Author: maxv Date: Thu Dec 21 14:28:39 2017 +0000 Make sure we're loading a relocatable binary, to give the user a chance to correct the kernel name if he mistakenly typed pkboot on a static kernel, without having to reboot the machine (currently the prekern sees it's a static kernel and panics). commit 3ae0d9e98e8fd03569ce609044b1f492f181f3d4 Author: maxv Date: Sat Dec 16 10:15:12 2017 +0000 compat_util.c must be compiled by default in the kernel. It is needed by generic non-compat code, so it must not depend on anything (libcompat or whatever option we choose to associate it to). 
commit 19d7472ea3324cefbbe0e228866467a702587135 Author: maxv Date: Sat Dec 16 09:34:18 2017 +0000 Fix the linux dependency. It does not depend on COMPAT_16, it just wants the compat functions (not really controlled by COMPAT_NETBSD, but for the principle). Makes it possible to load compat_linux.kmod from the filesystem without any COMPAT_* option compiled (but COMPAT_NETBSD). commit 94ec8598782d94f71147638a3f6a296f530d6590 Author: maxv Date: Sat Dec 16 09:10:30 2017 +0000 Build these functions regardless of whether COMPAT_50 or COMPAT_70 are enabled. They must be there, because they are needed in rtsock.c even when no compat option is enabled. commit afc2299f0c2e5186f22de107abe18d22e00103c6 Author: maxv Date: Sat Dec 16 08:31:36 2017 +0000 Build libcompat as an object, not as a library. We want all of its functions compiled in, because compat modules loaded from the filesystem may depend on them. commit 2c49b545ca576e5ecf28b287091403eefe0897bc Author: maxv Date: Fri Dec 15 21:00:26 2017 +0000 Fix a vulnerability in NPF, that allows whatever incoming IPv6 packet to bypass a certain number of filtering rules. Basically there is an integer overflow in npf_cache_ip: npc_hlen is a 8bit unsigned int, and can wrap to zero if the IPv6 packet being processed has large extensions. As a result of an overflow, (mbuf + npc_hlen) won't point at the real protocol header, but instead at some garbage within the packet. That garbage, is what NPF applies its rules on. If these filtering rules allow the packet to enter, that packet is given to the main IPv6 entry point. This entry point, however, is not subject to an integer overflow, so it will actually parse the correct protocol header. The result is: NPF read a wrong header, allowed the packet to enter, the kernel read the correct header, and delivered the packet depending on this correct header. So the offending packet was supposed to be kicked, but still went through the firewall. 
Simple example, a packet with:
    packet + 0   = IP6 Header
    packet + 40  = IP6 Routing header (ip6r_len = 31)
    packet + 48  = Crafted UDP header (uh_dport = 7777)
    packet + 296 = IP6 Dest header (ip6e_len = 0)
    packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the crafted UDP header, sees 7777, lets the packet in; later the kernel reads the real UDP header, and delivers it on port 6666. Fix this by using uint32_t. While here, it seems to me there is also a memory overflow: still in npf_cache_ip, npc_hlen may be incremented with a value that goes beyond the mbuf. commit eba515c1f425d88fdd6998674fb2b6da0a4ee6a4 Author: maxv Date: Sun Dec 10 09:06:46 2017 +0000 Fix use-after-free: if m_pullup fails the (freed) mbuf is pushed on the ip6_pktq queue and re-processed later. Return 1 to say "processed and freed". commit fa370b63dbc0f963d91c7bec84c5aa66afdc9625 Author: maxv Date: Sun Dec 10 08:56:23 2017 +0000 Fix use-after-free: ieee80211_crypto_decap does a pullup on the mbuf but the updated pointer is not passed back. Looks like it is triggerable remotely. commit cb82da5e4200c669301a5f9d15e790994115a968 Author: maxv Date: Sun Dec 10 08:48:15 2017 +0000 Update the pointer after m_pullup, otherwise possible use-after-free. commit 4ad4b2434577b400e209a5c0e28e62d181daf959 Author: maxv Date: Sat Dec 9 10:51:30 2017 +0000 style commit b63c71a03b78b8d92b32aa7c4c7d1e458ca9b6ff Author: maxv Date: Sat Dec 9 10:30:30 2017 +0000 Kick MPLS packets earlier. commit a2db3960a7318cc084d234d5e224de8fc1ec019c Author: maxv Date: Sat Dec 9 10:19:42 2017 +0000 Make sure we have an llc structure in the packet, and don't read past the end of the mbuf if we don't. I'm wondering whether we should not pull up instead, but whatever. commit f0c9d6644f46f831a5c6370fd98f6a8e74051cd9 Author: maxv Date: Sat Dec 9 10:14:04 2017 +0000 Mmh, pull up the packet to ether_aarp, otherwise we're reading past the end of the mbuf.
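The offsets in the NPF example above (crafted header at 48, real header at 304) follow directly from accumulating the header length in 8 bits. A small sketch of that arithmetic, assuming the usual IPv6 rule that an extension header occupies (len + 1) * 8 bytes; the function names are hypothetical and this is not the NPF code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN	40u	/* fixed IPv6 header size */

/* Byte length of an IPv6 extension header whose length field is 'len'. */
static uint32_t
ext_hdr_bytes(uint8_t len)
{
	return ((uint32_t)len + 1) << 3;
}

/* Offset accumulated in an 8-bit counter: wraps modulo 256. */
static uint8_t
hlen_u8(const uint8_t *ext_lens, size_t n)
{
	uint8_t hlen = IP6_HDR_LEN;

	assert(ext_lens != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		hlen = (uint8_t)(hlen + ext_hdr_bytes(ext_lens[i]));
	return hlen;
}

/* Same computation with a 32-bit counter, as in the fix. */
static uint32_t
hlen_u32(const uint8_t *ext_lens, size_t n)
{
	uint32_t hlen = IP6_HDR_LEN;

	assert(ext_lens != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		hlen += ext_hdr_bytes(ext_lens[i]);
	return hlen;
}
```

For the length fields { 31, 0 } from the example, the 8-bit version ends at offset 48 (40 + 256 wraps back to 40, then + 8), while the 32-bit version ends at 304, where the real UDP header lives.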
commit 9370b4fcf99234134f1e7387199c4fb757abf1cb Author: maxv Date: Fri Dec 8 17:49:54 2017 +0000 Style, and fix several bugs: - ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform pullups, so we need to pass the updated pointers back - in mpls_lse() the route is not always freed Looks a little better now. commit a918ade6e27c6376eeafe8b5e12211beee9ed906 Author: maxv Date: Sun Dec 3 12:53:52 2017 +0000 Fix uninitialized pointer, found by Mootja. Not a surprise in untested code. commit 8f93bb10be24d8b433c3d9f72e9ef7ce77f5d3ec Author: maxv Date: Sat Dec 2 15:36:24 2017 +0000 Remove a piece of COMPAT_13, that I mistakenly didn't commit three hours ago (in my change to drop COMPAT_13 on amd64). commit 72233a1e8e647c4846915044b19934e661e112ac Author: maxv Date: Sat Dec 2 13:03:15 2017 +0000 Drop COMPAT_13 on amd64, already not enabled. Reduces the number of critical places. commit 336a3465941c435d0ddb8c6472ba015c9f380056 Author: maxv Date: Sat Dec 2 12:40:03 2017 +0000 Drop COMPAT_10 on amd64. The support for it comes down to one ifdef in trap.c - code that is incorrect anyway, there were originally three lcall LDT slots, and here only one instruction is decoded. Given that one of these slots was used by BSDi's syscall, also remove the references to COMPAT_NOMID to make clear we don't support that (it already is not enabled). Note: for some reason, COMPAT_10 does not even compile, because there are "multiple definitions of _KERNEL_OPT_COMPAT_...", and I don't really understand where this comes from. commit 285abd1c85c79f24a3225b8135e92e593c2260a8 Author: maxv Date: Sat Dec 2 09:59:02 2017 +0000 Remove options that do not exist on amd64. commit cec870baddd8e798bf959d84bedddaff6d0ebb84 Author: maxv Date: Fri Dec 1 21:22:45 2017 +0000 Don't even bother with the trap frame, and force the default values. commit b635c9b7628390c23e83e73814295695aebe127d Author: maxv Date: Thu Nov 30 18:44:16 2017 +0000 If no auxv is present, don't kmem_alloc(0). 
Easy to panic the kernel by typing 'cat /proc/aout_pid/auxv' on whatever a.out binary you're running. Fortunately, amd64 does not enable EXEC_AOUT by default. Unfortunately, i386 does enable it by default. commit 58188f1a641a4121878a008fdff20276547dcf8e Author: maxv Date: Tue Nov 28 08:43:49 2017 +0000 style commit 72283ea76d614319bdb8dfc5155ba40f29192fd9 Author: maxv Date: Mon Nov 27 09:18:01 2017 +0000 Inline _FRAME_GREG, and mask only 16 bits of the segment registers, otherwise the upper 48 bits may contain stack garbage. By the way, I find it suspicious that we're not masking regs[_REG_RFLAGS] with PSL_USER in process_write_regs. commit ecd25dfae280716f5be757f25a18c3cf7839845a Author: maxv Date: Mon Nov 27 09:10:12 2017 +0000 Remove unused fields, there is no alignment we need to enforce. commit 3d73bd9d48871f6368f441c34c3e87ccc9b7f590 Author: maxv Date: Sun Nov 26 15:00:16 2017 +0000 Update a comment, and use testw instead. commit b1cea2fdf59762e8477c93d2e28e4e8694583688 Author: maxv Date: Sun Nov 26 14:54:43 2017 +0000 Hide a bunch of raw symbols. commit cdf4208ef9bce7a6ab1159839947185deb112fbe Author: maxv Date: Sun Nov 26 14:29:48 2017 +0000 Oh, damn. Obviously I forgot one case here: an already-mapped region could be contained entirely in the region we're trying to create. So go through another round. While here add mm_reenter_pa, and make sure the va given to mm_enter_pa does not already point to something. commit b0079c63cc6e6bff1b54448b00cf71aff90f6292 Author: maxv Date: Sun Nov 26 11:37:10 2017 +0000 Remove unused variables. commit 3aa5ce310078f89f02afb81aadca098e135ef9b8 Author: maxv Date: Sun Nov 26 11:08:34 2017 +0000 I forgot to say in my previous commit that the PRNG is inspired from a conversation with Taylor and Thor on tech-kern@. (just add a comment) commit c056f199056ea5f14fb01e07eaa3541c4a0fe65f Author: maxv Date: Sun Nov 26 11:01:09 2017 +0000 Add a PRNG for the prekern, based on SHA512. 
The formula is basically:
    Y0   = SHA512(entropy-file, 256bit rdseed, 64bit rdtsc)
    Yn+1 = SHA512(256bit lowerhalf(Yn), 256bit rdseed, 64bit rdtsc)
On each round, random values are taken from the higher half of Yn. If rdseed is not available, rdrand is used. The SHA1 checksum of entropy-file is verified. However, the rndsave_t::data field is not updated by the prekern, because the area is accessed via the read-only view we created in locore. I like this design, so it will have to be updated differently. commit ac6c8cd624537870c13facb9ca9465015e9f06d6 Author: maxv Date: Sun Nov 26 10:21:20 2017 +0000 Add rdrand. commit 7bdc49beed80377dccf16c6febd6dd6d0d7dd4ee Author: maxv Date: Tue Nov 21 10:55:23 2017 +0000 Mmh, surprising bug. It's __packed, not __packed__. Here the structure is not packed for real, but instead a global __packed__ symbol is declared. commit efc379f0cec9fba9daa6446484fbbaa1ecfd88da Author: maxv Date: Tue Nov 21 10:45:12 2017 +0000 This should be "linux_sg_version", not "version". commit f0a0e01ebe84ff92576fff66a31b819af88cd620 Author: maxv Date: Tue Nov 21 10:42:44 2017 +0000 Remove unused variables. commit 11fc0dd21d55d0b1a91bc56f5c2dd2e52af0814e Author: maxv Date: Tue Nov 21 09:58:09 2017 +0000 Remove unused symbol - it is aligned to 4096 and this reduces the number of possible locations for .bss in KASLR kernels. commit 11b2b9ea8a37ea7de429dfcc5628a5e3faca91c2 Author: maxv Date: Tue Nov 21 07:56:05 2017 +0000 Clean up and add some ASSERTs. commit 3d7758ba0d75110ba95efebab14b38c374f18879 Author: maxv Date: Fri Nov 17 07:16:06 2017 +0000 Kernel ASLR and XSAVEOPT. commit bc74d54c2e026986f58a4c76b2e253f248b6c033 Author: maxv Date: Fri Nov 17 07:07:52 2017 +0000 style commit cd0a11a4180b7535fe755390c3010f02e9861105 Author: maxv Date: Wed Nov 15 20:45:16 2017 +0000 Small cleanup. commit 9e384226931123ddd92f4f1da611d66d0d379fbf Author: maxv Date: Wed Nov 15 20:25:29 2017 +0000 Mmh, should be <=.
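The "__packed, not __packed__" fix above is easy to misread, so here is a small illustration of why the typo compiles silently. It assumes a GCC- or Clang-compatible compiler and a typical ABI where uint32_t is 4-byte aligned; the structure names are made up for the example. With no macro named __packed__ in scope, the trailing identifier is parsed as a variable declarator, so the structure keeps its natural padding and a global object named __packed__ appears.

```c
#include <assert.h>
#include <stdint.h>

/* Intended effect, spelled with the compiler attribute directly. */
struct hdr_packed {
	uint8_t		type;
	uint32_t	value;
} __attribute__((__packed__));

/*
 * The typo pattern: '__packed__' is not a keyword and not a macro here,
 * so this declares a global variable called __packed__ of type
 * 'struct hdr_typo'. The structure itself is NOT packed.
 */
struct hdr_typo {
	uint8_t		type;
	uint32_t	value;
} __packed__;
```

On such an ABI sizeof(struct hdr_packed) is 5 while sizeof(struct hdr_typo) is 8, and the stray __packed__ object is visible in the symbol table, which matches the description in the commit message.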
commit 3af09de6f5ec11fd1c8a16ea8c4c0c534ba6198a Author: maxv Date: Wed Nov 15 18:44:34 2017 +0000 Define MM_PROT_* locally. commit 27958a79ad3990f969ce578239fad56100abdc9a Author: maxv Date: Wed Nov 15 18:02:36 2017 +0000 Support large pages on KASLR kernels, in a way that does not reduce randomness, but on the contrary that increases it. The size of the kernel sub-blocks is changed to be 1MB. This produces a kernel with sections that are always < 2MB in size, that can fit a large page. Each section is put in a 2MB physical chunk. In this chunk, there is a padding of approximately 1MB. The prekern uses a random offset aligned to sh_addralign, to shift the section in physical memory. For example, physical memory layout created by the bootloader for .text.4 and .rodata.0:
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
    |+---------------+                  |+---------------+                  |
    ||    .text.4    |       PAD        ||   .rodata.0   |       PAD        |
    |+---------------+                  |+---------------+                  |
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
    PA                                  PA+2MB                              PA+4MB
Then, physical memory layout, after having been shifted by the prekern:
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
    | P +---------------+               |       +---------------+           |
    | A |    .text.4    |      PAD      |  PAD  |   .rodata.0   |    PAD    |
    | D +---------------+               |       +---------------+           |
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
    PA                                  PA+2MB                              PA+4MB
The kernel maps these 2MB physical chunks with 2MB large pages. Therefore, randomness is enforced at both the virtual and physical levels, and the resulting entropy is higher than that of our implementation until now. The padding around the section is filled by the prekern. Not to consume too much memory, the sections that are smaller than PAGE_SIZE are mapped with normal pages - because there is no point in optimizing them. In these normal pages, the same shift is applied.
This change has two additional advantages: (a) the cache attacks based on the TLB are mostly mitigated, because even if you are able to determine that a given page-aligned range is mapped as executable you don't know where exactly within that range the section actually begins, and (b) given that we are slightly randomizing the physical layout we are making some rare physical attacks more difficult to conduct. NOTE: after this change you need to update GENERIC_KASLR / prekern / bootloader. commit ea94495e639e617fc468e3bfb997da3f51305069 Author: maxv Date: Tue Nov 14 13:58:07 2017 +0000 Remove XXX: set FRAMESIZE to the kernel value. Verily I don't understand why we are doing that in the non-kaslr kernels, but let's just reproduce the behavior. jump_kernel is changed to use callq, so that the stack alignment is preserved. commit 8eda08088e364e36a5fbe71711d5d9eef47d8ca5 Author: maxv Date: Tue Nov 14 10:15:40 2017 +0000 Split each kernel section into sub-blocks of approximately 2MB. The newly created sections are named .origname.i, for example: .text -> { .text .text.0 .text.1 .text.2 .text.3 .text.4 } Each section is randomized independently by the prekern - and in a random order obviously. As a result we can get intertwined mappings, of the type:
    +-------+-----------+------+---------+-----------+-------+-------+------+-
    | text1 | NOTMAPPED | bss0 | rodata1 | NOTMAPPED | data2 | text3 | bss1 |
    +-------+-----------+------+---------+-----------+-------+-------+------+-
    ---------+-
     rodata0 | ...
    ---------+-
The CTF section is dropped completely, because (a) when split it becomes enormous for some reason (that I don't quite understand, verily), and (b) the kernel expects only one CTF and can't handle several of them. commit 07468aa596a6213df880a72cecccd635fcb17fa0 Author: maxv Date: Tue Nov 14 09:56:26 2017 +0000 Remove max-page-size on KASLR, it doesn't play any role.
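The large-page KASLR commit above says the prekern shifts each section inside its 2MB chunk by a random offset aligned to sh_addralign. One way such an offset could be derived is sketched below. This is only an illustration under assumptions: the function name section_shift, the use of a caller-supplied random value, and the modulo reduction are all hypothetical, and the real prekern may compute the shift differently.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE	(1ULL << 21)	/* one 2MB physical chunk */

/*
 * Hypothetical helper: choose where a section of 'size' bytes starts
 * inside its chunk. 'align' must be a power of two (sh_addralign) and
 * 'rnd' is a random value supplied by the caller. The chunk base is
 * assumed to be aligned to at least 'align'.
 */
static uint64_t
section_shift(uint64_t size, uint64_t align, uint64_t rnd)
{
	uint64_t slack, off;

	assert(size <= CHUNK_SIZE);
	assert(align != 0 && (align & (align - 1)) == 0);

	slack = CHUNK_SIZE - size;	/* room left for leading padding */
	off = rnd % (slack + 1);	/* any offset that still fits */
	return off & ~(align - 1);	/* round down to the alignment */
}
```

Rounding down keeps the section inside the chunk, since the rounded offset never exceeds the unrounded one. A section that fills the whole chunk always gets a shift of zero.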
commit 6124354c7d6afac26e86b19f696c0dfd331af206 Author: maxv Date: Tue Nov 14 09:55:41 2017 +0000 Add missing ). commit 78191a2b2fb7f071349330f331e8a56cee46a99b Author: maxv Date: Tue Nov 14 07:06:34 2017 +0000 Add -Wstrict-prototypes, and fix each warning. commit 2f664a44e78808cfb028e766ec9533ba69259cc8 Author: maxv Date: Mon Nov 13 21:33:42 2017 +0000 One more ASSERT, won't hurt. commit 50fafb2a1d101c9026418452d62f19f3f3ac145b Author: maxv Date: Mon Nov 13 21:32:21 2017 +0000 Don't process ELF sections that don't have the ALLOC flag set. NOTE: you need to update both the prekern and the bootloader after this change. commit 313790418a1a73743160d570a6df7554292701d8 Author: maxv Date: Mon Nov 13 21:14:03 2017 +0000 Change the mapping logic: don't group sections of the same type into segments, and rather map each section independently at a random VA. In particular, .data and .bss are not merged anymore and reside at different addresses. commit f1662022a19c6f0472e38aa39153d4ee2ea5676a Author: maxv Date: Mon Nov 13 20:21:10 2017 +0000 Revert my last revision, that is to say, don't group sections into segments anymore. Initially I did this because I wanted to compress the sections by reducing the padding between them; but we'll handle that differently. commit 466a8afb271b6f18ba4b8d6a68dfc5493967caa0 Author: maxv Date: Mon Nov 13 20:03:26 2017 +0000 Link libkern in the prekern, and remove redefined functions. commit 2bbdf7ebc7415583b06c5289f4127c458a5ec6aa Author: maxv Date: Mon Nov 13 20:01:48 2017 +0000 Use SUBALIGN, to force the alignment at the section level, and remove the inter-section ALIGN which doesn't do anything since the physical address of the section is chosen dynamically by the bootloader. commit 23c60ac84c53543bc6a0d19e06ac8fedbd952699 Author: maxv Date: Sat Nov 11 13:50:57 2017 +0000 Detect collisions from bootspace directly. 
commit 332b821d0584cf2be94666a7c7a9fc649c6a2bea Author: maxv Date: Sat Nov 11 12:51:05 2017 +0000 Modify the layout of the bootspace structure, in such a way that it can contain several kernel segments of the same type (eg several .text segments). Some parts are still a bit messy but will be cleaned up soon. I cannot compile-test this change on i386, but it seems fine enough. NOTE: you need to rebuild and reinstall a new prekern after this change. commit af8c5169355ddf5ecb96db6841a8d6709d5ecc3d Author: maxv Date: Sat Nov 11 11:00:46 2017 +0000 Recommit http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's wrong with the Xen fpu. commit ea8a2039e2e043fb20ab741d75a7102ece0931c3 Author: maxv Date: Fri Nov 10 08:52:57 2017 +0000 Implement memcpy, the builtin version does not work with variable sizes. commit 589a841d724ae8399de1d0e517722db7c849364d Author: maxv Date: Fri Nov 10 08:05:38 2017 +0000 Add cpuid and rdseed. commit 0f4920e59686e1371b94e754a44c68faec1116b1 Author: maxv Date: Thu Nov 9 15:56:56 2017 +0000 Define utility functions as inlines in prekern.h. commit 16deffeb5fd9b76eacd1cb6662fa62ca3f157deb Author: maxv Date: Thu Nov 9 15:46:48 2017 +0000 Use another ld script for kaslr kernels, in which there are no alignment directives. They don't matter since the bootloader overwrites them. But, normally we still need to make sure .data.read_mostly is aligned. Unfortunately I couldn't find any way to force sh_addralign to be 64, so I'm leaving the alignment there as a useless reminder. commit 17e275ba418aa829f6ebd71173b6a92db8d343e5 Author: maxv Date: Thu Nov 9 15:24:39 2017 +0000 Fill in the page padding. Only .text is pre-filled by the ld script, but this will change in the future. commit 7257f29bfdc0f54313ee9506723d1f289e698cd8 Author: maxv Date: Wed Nov 8 18:31:00 2017 +0000 Add pkboot in "help". 
commit f275b4e8cd01135e82e7357ca80b52e497b75f5d Author: maxv Date: Wed Nov 8 18:29:04 2017 +0000 Don't fall through. commit 07d090a2a022816f324a9365fa3f421da07623c6 Author: maxv Date: Wed Nov 8 17:55:54 2017 +0000 remove vestige commit e0c18d5fae4ff05c13e842b15623972bbd6a5f51 Author: maxv Date: Wed Nov 8 17:52:22 2017 +0000 Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before touching xcr0. Then use clts/stts instead of modifying cr0, and enable the mxcsr_mask detection on Xen. commit 6aa3df72856040f8d5ec6b1a658e98dc7165f337 Author: maxv Date: Sun Nov 5 16:27:18 2017 +0000 Remove unused. commit e32f5d305afb9f5385a9cb958e7afc750b57fbc6 Author: maxv Date: Sun Nov 5 16:26:15 2017 +0000 Mprotect the segments in mm.c using bootspace, and remove the now unused fields of elfinfo. commit e5b5a40325832b215750fca48d6c93e33744499f Author: maxv Date: Sat Nov 4 12:53:00 2017 +0000 Fix stack overflow, found when testing a new feature. commit ae068992a0b60407a658c6f9e7c6884d59daceca Author: maxv Date: Sat Nov 4 08:58:30 2017 +0000 Add support for xsaveopt. It is basically an instruction that optimizes context switch performance by not saving to memory FPU registers that are known to be in their initial state or known not to have changed since the last time they were saved to memory. Our code is now compatible with the internal state tracking engine: - We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT. That is to say, we always call XRSTOR first. - During a fork, the whole in-memory FPU state area is memcopied in the new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it will fault, we migrate the area, call XRSTOR and clear CR0_TS. During this XRSTOR XSTATE_BV still contains the initial values, and it forces a reload of XINUSE. - Whenever software wants to change the in-memory FPU state, it manually sets XSTATE_BV[i]=1, which forces XINUSE[i]=1. 
- The address of the state passed to xrstor is always the same for a given LWP. fpu_save_area_clear is changed not to force a reload of CW if fx_cw is the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt will optimize this state. Small benchmark: switch lwp to cpu2 do float operation switch lwp to cpu3 do float operation Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds to 20,8 seconds. commit c415522ac729f1673e8e0d162a401574bae04ee6 Author: maxv Date: Sat Nov 4 07:38:42 2017 +0000 Always set XCR0_X87, to force a reload of CW. That's needed for compat options where fx_cw is not the standard fpu value. commit d772eba4edd20f72378c8fb8ada03c2f99d1fedb Author: maxv Date: Sat Nov 4 07:35:00 2017 +0000 Fix xen. Not tested, but seems fine enough. commit 62592c33c1318547fbe310ac6295c7869176fa31 Author: maxv Date: Fri Nov 3 09:59:07 2017 +0000 Handle absolute relocations coming from the kernel: preserve SHN_ABS in the kernel and module symbols, and when relocating a symbol that has SHN_ABS, take its value as-is and don't return an error if it equals zero. Sent on tech-kern@. commit 83d360529a566bf4d005234f17413674b0193716 Author: maxv Date: Fri Nov 3 07:14:24 2017 +0000 Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking MXCSR we are losing some features (eg DAZ). commit c8b47b7e6283dc7820598f86e4c80bd334456e04 Author: maxv Date: Wed Nov 1 17:00:17 2017 +0000 Handle absolute symbols. Since my linux_sigcode.S::rev1.4 there are two Elf_Rela that point to the NULL symbol - which the prekern thought was an external reference. In the ELF spec, STN_UNDEF means the value of the symbol is zero. commit 35e64ffb71d42f9c55e9015c3d23d764d6927239 Author: maxv Date: Wed Nov 1 09:47:53 2017 +0000 Use NENTRY -> END. commit dcd7beaa5523f8089754e2c18049c857b1a06683 Author: maxv Date: Wed Nov 1 09:38:43 2017 +0000 More END(). In linux_sigcode.S we only provide symbols, not defined as functions. 
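Note on 83d360529a56 ("Fix MXCSR_MASK") and e58fdc253a08 ("Mask mxcsr"): a minimal sketch of the masking idea, not the kernel code. It assumes the mask comes from the MXCSR_MASK field of an FXSAVE image, where a value of zero means the architectural default 0x0000FFBF applies. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural default, used when the FXSAVE image reports a zero mask. */
#define MXCSR_MASK_DEFAULT 0x0000FFBFu
/* Denormals-Are-Zeros flag, bit 6 of MXCSR. */
#define MXCSR_DAZ 0x00000040u

/* Hypothetical helper: pick the mask to use from the detected field. */
static uint32_t
mxcsr_mask_detect(uint32_t fxsave_mask_field)
{
	return (fxsave_mask_field != 0) ? fxsave_mask_field : MXCSR_MASK_DEFAULT;
}

/* Hypothetical helper: drop reserved bits from a userland-supplied MXCSR. */
static uint32_t
mxcsr_sanitize(uint32_t user_mxcsr, uint32_t mask)
{
	assert(mask != 0);
	return user_mxcsr & mask;
}
```

With the default mask, a requested DAZ bit is stripped; with a detected mask that includes bit 6, it survives. That is the feature loss the commit message refers to.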
commit 681da605244fe48229324e37a96b56eec0945a24 Author: maxv Date: Wed Nov 1 09:31:24 2017 +0000 Add linux_sigcode.o, otherwise it doesn't get rebuilt. commit 8f059525fac2dc8db3e2b63f0d77b5e8f644fd39 Author: maxv Date: Wed Nov 1 09:17:28 2017 +0000 Don't fall through functions, explicitly jump instead. While here don't call smap_enable twice (harmless), and add END() markers. commit 38ccf1c737c00d12b5e815f510e6db3388ab4db8 Author: maxv Date: Wed Nov 1 07:14:29 2017 +0000 Remove unused macros and LDT entries. commit 63e222e2ec34b949bb704d3f2e40cc1b98e125d1 Author: maxv Date: Tue Oct 31 18:30:36 2017 +0000 Remove outdated comment. commit a847238d8d8152939d3d38951194e0b40b45c775 Author: maxv Date: Tue Oct 31 18:23:29 2017 +0000 Zero out the buffer entirely. commit e58fdc253a08c94a9104016a389eb10c597c00c5 Author: maxv Date: Tue Oct 31 18:13:37 2017 +0000 Mask mxcsr, otherwise userland could set reserved bits to 1 and make xrstor fault. commit 7447f3d8c5daf4a9bf7081f924fa9e53e9e89248 Author: maxv Date: Tue Oct 31 15:16:10 2017 +0000 Initialize xstate_bv with the structures that were just filled in, otherwise xrstor does not restore them. This can happen only if userland calls setcontext without having used the FPU before. Until rev1.15 xstate_bv was implicitly initialized because the xsave area was not zeroed out properly. commit 89ed108d782e1b994654e02e06bc7a7f921b5c65 Author: maxv Date: Tue Oct 31 12:02:20 2017 +0000 Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB. commit 357436d81cec001ad32185cc54ce754472c9e221 Author: maxv Date: Tue Oct 31 11:37:05 2017 +0000 Always use x86_fpu_save, clearer. commit 30b3ae562d6349b243033ba251cf38519fd35976 Author: maxv Date: Tue Oct 31 10:39:13 2017 +0000 Add xsh_xcomp_bv and fx_zero, and use uint8_t instead. commit 295fec21a4d5726c995f28b88bab311a9f3f8e08 Author: maxv Date: Tue Oct 31 10:35:58 2017 +0000 Remove comments that are more misleading than anything else. 
While here make sure we zero out the FPU area entirely, and not just its legacy region. commit e15e0b5c5a3db598afa24d37095903f23f1076ff Author: maxv Date: Mon Oct 30 17:13:39 2017 +0000 Add END(). commit 041dce6f7906362fef36bac813fd179b97e3d2db Author: maxv Date: Mon Oct 30 17:06:42 2017 +0000 Always use END() markers when declaring functions in assembly, so that ld can compute the size of the functions. A few remain. While here, fix a bug in the INTRSTUB macro: we are falling through resume_, but it is aligned, so it looks like we're executing the inter-function padding - which probably happens to contain NOPs, but that's still bad. commit 90f0671611dcbed1883813d51d5121d4475def07 Author: maxv Date: Sun Oct 29 17:19:14 2017 +0000 Mmh, we don't map the CTF section on kaslr kernels, so disable KDTRACE_HOOKS for now. commit 5e4eea704de0155b8d6c566bb3ca3382fef73130 Author: maxv Date: Sun Oct 29 11:38:43 2017 +0000 Fix a few error messages, and be a little more verbose. commit f140e8b5282141006a0962c885aaea04d7c7a969 Author: maxv Date: Sun Oct 29 11:28:30 2017 +0000 Randomize the kernel segments independently. That is to say, put text, rodata and data at different addresses (and in a random order). To achieve that, the mapping order in the prekern is changed. Until now, we were creating the kernel map the following way: -> choose a random VA -> map [kernpa_start; kernpa_end[ at this VA -> parse the ELF structures from there -> determine where exactly the kernel segments are located -> relocate etc Now, we are doing: -> create a read-only view of [kernpa_start; kernpa_end[ -> from this view, compute the size of the "head" region -> choose a random VA in the HEAD window, and map the head there -> for each region in (text, rodata, data, boot) -> compute the size of the region from the RO view -> choose a random VA in the KASLR window -> map the region there -> relocate etc Each time we map a region, we initialize its bootspace fields right away.
The "head" region must be put before the other regions in memory, because the kernel uses (headva + sh_offset) to get the addresses of the symbols, and the offset is unsigned. Given that the head does not have an mcmodel constraint, its location is randomized in a window located below the KASLR window. The rest of the regions being in the same window, we need to detect collisions. Note that the module map is embedded in the "boot" region, and that therefore its location is randomized too. commit 39791836b222683918ddc2ec428cc83ea2ca29c2 Author: maxv Date: Sun Oct 29 10:25:28 2017 +0000 Use bootspace.head.va instead of the direct map. Otherwise there's the assumption that the offsets contained in sh_offset in physical memory are equal to the offsets in virtual memory, which won't be true in the future. commit 27a2509f0e4c8da359a710d994add141b3c1c7bf Author: maxv Date: Sun Oct 29 10:07:08 2017 +0000 Add three functions and start using them; will be more useful soon. commit abacc349c7f34a590082183c574085e079786e02 Author: maxv Date: Sun Oct 29 10:01:21 2017 +0000 Add a fifth region, called "head". On kaslr kernels it contains the ELF Header and the ELF Section Headers. On normal kernels it is empty (the headers are in the "boot" region). Note: if you're using GENERIC_KASLR, you also need to rebuild the prekern. commit 714514ca1d1f3cd82bb54cc696e056063ab0e572 Author: maxv Date: Sat Oct 28 20:06:31 2017 +0000 It appears that Xen remaps the userland %cs to 0xE033. So add it to the checklist. Otherwise we're going through Luexit32: %fs gets reloaded, which sets the FS.base to NULL, which will cause the thread to page-fault next time it accesses its TLS (as seen in PR/52662). This fix is not very clean, and it would be nice to understand why Xen remaps %cs. But I'm committing it now anyway, so that people can test. commit 0c2ccd509229a079704e0f104a7afe973f261f84 Author: maxv Date: Sat Oct 28 19:28:11 2017 +0000 Fix a mistake I made in the very first revision. 
The calculation of the number of slots was incorrect in some cases, and it could cause the prekern to fault right away at boot time, or the kernel to fault when loading kernel modules near the end of the module map. The variables are divided by PAGE_SIZE to prevent integer overflows. commit 4c79a8787e72fa22eab9b0fbdbdef11f3823d4d9 Author: maxv Date: Mon Oct 23 06:00:59 2017 +0000 Add two XXXs, so that people don't get confused, a fifth region is needed anyway. commit 3004ddfa3d11ac26997f5665f1cc97fdf9a73b5d Author: maxv Date: Sat Oct 21 15:20:52 2017 +0000 SMAP on amd64. commit f6da6af0b171ffb5998a4d239b34a62acd89ce02 Author: maxv Date: Sat Oct 21 15:14:57 2017 +0000 USER_LDT on amd64. commit 29b4dde66719972f1ecdddaa73e688732e9f5e46 Author: maxv Date: Sat Oct 21 15:12:27 2017 +0000 USER_LDT done. commit be7dee3ed8e7a998b652d7782254ff45004dd51b Author: maxv Date: Sat Oct 21 08:27:19 2017 +0000 Forbid 64bit entries. That's it, now we support USER_LDT on amd64. commit 081da90136d16f7c7f3e65b6d5f22a14fb8ae64d Author: maxv Date: Sat Oct 21 08:08:26 2017 +0000 Use labels instead of disassembling *(%rip). intrfastexit is now the only place where the segregs can fault. commit 719f97115c32b9e8bea288a4ace746c86937a36b Author: maxv Date: Sat Oct 21 07:24:26 2017 +0000 Include opt_user_ldt.h when needed. commit 9adf42c4e0dc4d96a962494f507cec909022db19 Author: maxv Date: Sat Oct 21 07:23:22 2017 +0000 Handle by default. commit 30e916fad063cf9546725c87d1b541ce775339f7 Author: maxv Date: Sat Oct 21 06:55:54 2017 +0000 Improve our segregs model. Pass 3/3. Treat %gs the same way we treat %ds/%es/%fs: restore it in INTRFASTEXIT on 32bit LWPs. On Xen however, its behavior does not change, because we need to do an hypercall before INTR_RESTORE_GPRS, and that's too complicated for now. As a side effect, this change fixes a bug in the ACPI wakeup code; %fs/%gs were not restored on 32bit LWPs, and chances are they would segfault shortly afterwards. 
Support for USER_LDT on amd64 is almost complete now. commit 15dacac9342080c5ac1a32890ef8fc974ab7bf89 Author: maxv Date: Thu Oct 19 20:27:12 2017 +0000 Use cmpw. commit 8428b60f28f4dea656ce7f9c249bb0659aab3f7c Author: maxv Date: Thu Oct 19 19:05:53 2017 +0000 Improve our segregs model. Pass 2/3. Treat %fs the same way we treat %ds and %es. For a new 32bit LWP %fs is set to GUDATA32_SEL, and always updated in INTRFASTEXIT. This solves an important issue we had until now: we couldn't handle the faults generated by the "movw $val,%fs" instructions, because they were deep into the kernel context. Now %fs can fault only in INTRFASTEXIT, which is safe. Note that it also fixes a bug I believe affected the kernel: on AMD CPUs, setting %fs to zero does not flush the internal register state, and therefore we could leak the %fs base address when context-switching. This being said, I couldn't trigger the issue on the AMD cpu I have. Whatever, it's fixed now, since we first set %fs to GUDATA32 - which does flush the register state. commit 02d2480fce6a3cf45cba77b190219bc9fe39269e Author: maxv Date: Thu Oct 19 18:36:31 2017 +0000 Improve our segregs model. Pass 1/3. Right now, we are saving and restoring %ds/%es each time we enter/leave the kernel. However, we let %fs/%gs live in the kernel space, and we rely on the fact that when switching to an LWP, %fs/%gs are set right away (via cpu_switchto or setregs). It has two drawbacks: we are taking care of %ds/%es while they are deprecated (useless) on 64bit LWPs, and we are restricting %fs/%gs while they still have a meaning on 32bit LWPs. Therefore, handle 32bit and 64bit LWPs differently: * 64bit LWPs use fixed segregs, which are not taken care of. * 32bit LWPs have dynamic segregs, always saved/restored. For now, only %ds and %es are changed; %fs and %gs will be in the next passes. The trapframe is constructed as usual. In INTRFASTEXIT, we restore %ds/%es depending on the %cs value. 
If %cs contains one of the two standard 64bit selectors, don't do anything. Otherwise, restore everything. When doing a context switch, just restore %ds/%es to their default values. On a 32bit LWP they will be overwritten by INTRFASTEXIT; on a 64bit LWP they won't be updated. In the ACPI wakeup code, restore %ds/%es to the default 64bit user value. commit 961a3f96032b29afca4e304d3d33daa05f56afb4 Author: maxv Date: Thu Oct 19 10:01:09 2017 +0000 Always mask the 16 bits of the segregs in the trapframe. We don't zero-extend the uint64_t's when building it, so we're leaking 48 bits of kernel stack to userland. Having said that, it appears that I unintentionally fixed most of this issue in locore.S::rev1.127 - by building the frame with interrupts disabled, we are implicitly guaranteeing that the structure doesn't get overwritten by the kernel. Which means, we are leaking to userland data that comes from userland anyway. (still other places with this issue, but I'll fix them differently) commit b4d75fd824e383f99c48c3b07df400af91f23b15 Author: maxv Date: Thu Oct 19 09:32:01 2017 +0000 Make sure we don't go farther with 32bit LWPs. There appears to be some confusion in the code - in part introduced by myself -, and clearly this place is not supposed to handle 32bit LWPs. Right now we're returning EINVAL, but verily we would need to redirect these calls to their netbsd32 counterparts. commit 322b7f2e7551a600ee2e73a46c67cf5d56979911 Author: maxv Date: Wed Oct 18 17:12:42 2017 +0000 If a branch is already there, use it and don't create a new one. This way we can call mm_map_tree twice with neighboring regions. commit b42f485c60c4ac8ca522faad805e9b216cab5bc2 Author: maxv Date: Wed Oct 18 16:29:56 2017 +0000 Group the sections into segments, and align to KERNALIGN only between segments. Prerequisite for other changes. Unfortunately the code is not very compact, but whatever.
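Note on 961a3f96032b ("Always mask the 16 bits of the segregs in the trapframe"): a minimal sketch of the masking described there, assuming each segment register is stored in a 64-bit trapframe slot of which only the low 16 bits carry the selector. The helper name is hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Only the low 16 bits of a segment register slot are meaningful. */
#define SEGREG_MASK UINT64_C(0xFFFF)

/*
 * Hypothetical helper: clear the upper 48 bits of a trapframe slot so that
 * stale stack contents are not copied out together with the selector.
 */
static uint64_t
segreg_sanitize(uint64_t slot)
{
	uint64_t clean = slot & SEGREG_MASK;

	assert((clean >> 16) == 0);
	return clean;
}
```

For example, a slot holding 0xDEADBEEFCAFE0023 is reduced to 0x23, so the 48 stale upper bits never reach userland.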
commit 188546e124d1534a76551cb1edaa21ba5d525c9b Author: maxv Date: Tue Oct 17 07:48:10 2017 +0000 Move %ds and %es into the GDT on 64bit LWPs. commit e4cee2e6b689d82a2a6dc3845e8f92adc7b2231b Author: maxv Date: Tue Oct 17 07:33:44 2017 +0000 Have the cpu clear PSL_D automatically when entering the kernel via a syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point, they are now both cleared by the cpu (faster). However they still need to be manually cleared in the interrupt/trap entry points. commit 720de732652df964cb7b0d5f5d2b956f125762eb Author: maxv Date: Tue Oct 17 07:02:50 2017 +0000 fix comment, rdx, not edx commit 6a260d6e5f58adbfefbdb5c8d3696a13676e7c91 Author: maxv Date: Tue Oct 17 06:58:15 2017 +0000 Add support for SMAP on amd64. PSL_AC is cleared from %rflags in each kernel entry point. In the copy sections, a copy window is opened and the kernel can touch userland pages. This window is closed when the kernel is done, either at the end of the copy sections or in the fault-recover functions. This implementation is not optimized yet, due to the fact that INTRENTRY is a macro, and we can't hotpatch macros. Sent on tech-kern@ a month or two ago, tested on a Kabylake. commit 6a1b00f490384eac99bd4252242a7d2a2d5b4913 Author: maxv Date: Sun Oct 15 13:34:24 2017 +0000 Mmh, don't forget to clear the TLS gdt slots on Xen. Otherwise, when doing a lwp32->lwp64 context switch, the new lwp can use the slots to reconstruct the address of the previous lwp's TLS space (and defeat ASLR?). commit 54316c6206690b64b8c2c3cc5dfeae966ae05b26 Author: maxv Date: Sun Oct 15 12:49:53 2017 +0000 Use two separate functions: cpu_segregs32_zero and cpu_segregs64_zero. The way segment registers work on amd64 will diverge between 32bit and 64bit LWPs. commit 4c5737040011efc6c1d348fee951ad65a6b75f9c Author: maxv Date: Sun Oct 15 11:39:42 2017 +0000 Remove this #undef on native amd64, but keep it on Xen. 
commit c4ae9ce9b0be658274c46e968b22cdd522ac346a Author: maxv Date: Sun Oct 15 11:36:15 2017 +0000 Make sure the 32bit LWPs don't have MDL_IRET set. That's not a problem right now, but will be in the future. commit c0f72f04d62be2dd6e23a9d8977b4648af1839f3 Author: maxv Date: Sun Oct 15 11:31:00 2017 +0000 Add setds and setes, will be useful in the future. commit 01f055b8bdd0d31be74d846dafd86edc23e4141b Author: maxv Date: Sun Oct 15 10:58:32 2017 +0000 Add setusergs on Xen, and simplify. commit 157bba0210c3ac7db8cc0e56d85901e86df57e3b Author: maxv Date: Sun Oct 15 06:37:32 2017 +0000 Descend the page tree from L4 to L1, instead of allocating a separate branch and linking it at the end. This way we don't need to allocate VA from the (tiny) prekern map. commit 4cb032f075cf09a484410bb87d506849aedfc9e6 Author: maxv Date: Fri Oct 13 10:39:26 2017 +0000 Introduce two functions, and dedup code. commit 89ee7cd75b7fd1b5ae952915eb2c4d8f5ca2bc90 Author: maxv Date: Fri Oct 13 10:04:27 2017 +0000 Constify offset, it must not change. commit 874efd4ab774939298d5da640dd1ea58a835c2da Author: maxv Date: Wed Oct 11 16:56:26 2017 +0000 Use bootspace. commit d6d42bc128ea6c78cbe9c710784a3d11f695e95d Author: maxv Date: Wed Oct 11 16:21:06 2017 +0000 Make sure we're relocating a relocatable kernel. commit 650a891fae347b10cfd123da64bbe8ced0dfad7d Author: maxv Date: Wed Oct 11 16:18:11 2017 +0000 Remove this #if, these options belong to the kernel and not the prekern. No real change since eblob is always here. And I was apparently drunk when writing some comments. commit 9dbd6a45ddbeb2e24bc8169d80a30b72dab54227 Author: maxv Date: Wed Oct 11 16:13:16 2017 +0000 Add an alignment to fill strictly all of the padding; does not increase the size of the prekern. commit 310a8f7f7c52fee2dcd6c8a0a2dc6921c9f38427 Author: maxv Date: Wed Oct 11 09:53:14 2017 +0000 Reset has_prekern if pkboot fails. Otherwise here: pkboot wrong_kernel_path boot netbsd the prekern still gets invoked in the second command. 
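Note on 157bba0210c3 ("Descend the page tree from L4 to L1"): a minimal sketch of how a virtual address selects one slot per level when walking a 4-level x86-64 page tree from the top. It assumes 4KB pages and 512 entries per level. The function name is hypothetical and the prekern's own helpers may differ.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SHIFT  12      /* 4KB pages */
#define PT_LEVEL_BITS  9       /* 512 entries per table */
#define PT_INDEX_MASK  0x1FFu

/*
 * Hypothetical helper: index into the table at the given level
 * (4 = top level, 1 = last level) for virtual address va.
 */
static unsigned
pt_index(uint64_t va, int level)
{
	assert(level >= 1 && level <= 4);
	return (unsigned)((va >> (PT_PAGE_SHIFT + PT_LEVEL_BITS * (level - 1)))
	    & PT_INDEX_MASK);
}
```

A top-down walk looks up pt_index(va, 4) first, creates the next-level table if the slot is empty, and repeats down to level 1, so no separate branch has to be built and linked afterwards.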
commit 0555d4c3927e756f3b0390c2d975311c67a5c0ed Author: maxv Date: Tue Oct 10 09:29:14 2017 +0000 Add the amd64 prekern. It is a kernel relocator used for Kernel ASLR (see tech-kern@). It works, but is not yet linked to the build system, because I can't build a distribution right now. commit a447f14c475971e3e1e7e5610b13510385ce6fd0 Author: maxv Date: Sun Oct 8 13:51:31 2017 +0000 Improve comments. commit bf6c0e78abed7fbbd781cd5b208818b50b221e0d Author: maxv Date: Sun Oct 8 13:49:38 2017 +0000 Use roundup instead. Otherwise some (userland) pages could get mapped in the text large pages. We were using rounddown to improve performance on i386 (mapping the text with large pages even if it was not aligned). But we're in a state where correctness matters more than performance - the correct way to get performance here is to align .text to 4MB. commit fa7ca063062e92987dde5d7ed3c4946456ece5b3 Author: maxv Date: Sun Oct 8 09:06:50 2017 +0000 KASLR: add workarounds to compute the bootinfo VAs (use the direct map), and don't use large pages yet. Both will be fixed later. commit dafff27619891150688d6adbbe01ff148bf743b7 Author: maxv Date: Sun Oct 8 08:26:01 2017 +0000 Add the prekern entry point in the kernel. commit cd2ae5172cdd60884c8ae5a5fab1e16af4a10049 Author: maxv Date: Sat Oct 7 10:32:56 2017 +0000 Bump bootloader version, support for booting KASLR amd64 kernels. commit c9c143633993187e6d9b4b75735162d9ea0f9013 Author: maxv Date: Sat Oct 7 10:26:38 2017 +0000 Add a new option in libsa, to load dynamic binaries. A separate function is used, and it does not break in any way the generic static loader. Then, add a new "pkboot" command in the x86 bootloader, which boots a GENERIC_KASLR kernel via the prekern. (See thread on tech-kern@.) commit 70dee003e420ae06c4b04d102c964be20c9441f7 Author: maxv Date: Sat Oct 7 10:16:47 2017 +0000 Add GENERIC_KASLR, only toolchain parts for now.
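Note on bf6c0e78abed ("Use roundup instead"): a minimal sketch of why the rounding direction matters when the start of a region is aligned to a large-page boundary. The 4MB size matches the message, while the helper names and the example address are illustrative assumptions, not the pmap code.

```c
#include <assert.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE (UINT64_C(4) * 1024 * 1024)   /* 4MB */

/* Largest multiple of align that is <= x. */
static uint64_t
round_down(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return x - (x % align);
}

/* Smallest multiple of align that is >= x. */
static uint64_t
round_up(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return round_down(x + align - 1, align);
}
```

With a region starting at 5MB, round_down() yields 4MB, so a large page placed there also covers the 1MB below the region. round_up() yields 8MB, which stays inside the region at the cost of mapping its first 3MB with small pages.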
commit 084a8872ba30ffdfbb690365326871adf409c4c9 Author: maxv Date: Mon Oct 2 19:23:16 2017 +0000 Add a machdep.tsc_user_enable sysctl, to enable/disable the rdtsc instruction in usermode. It defaults to enabled. commit 74a604890ff17593a0b363a8335c6fdf1026d7f7 Author: maxv Date: Sat Sep 30 12:35:48 2017 +0000 use bootspace (this branch is never taken) commit f3409b8f1d601a6b185cad1f6fe3ae9091ef8ed2 Author: maxv Date: Sat Sep 30 12:29:58 2017 +0000 Declare pmap_remap_global, and map the four regions independently with bootspace. commit c61789667e8b8cb14a0320e334fa81c56ba2c5ba Author: maxv Date: Sat Sep 30 12:12:29 2017 +0000 use bootspace commit cb5f837a069a2c7e9bef416596d1338775e0fd81 Author: maxv Date: Sat Sep 30 12:01:56 2017 +0000 use bootspace commit 32046ce3b3ec99f0110b06b7e8cdc4f2f8d51703 Author: maxv Date: Sat Sep 30 11:43:57 2017 +0000 Add a bootspace structure. It describes the physical and virtual space layout created by the early kernel bootstrap code. Start using it, and eliminate several references to KERNBASE and other global symbols. While here clean up xen-i386, it's really tiring. commit b036f7f88c2fac19427ea500398d205a939cf530 Author: maxv Date: Fri Sep 29 17:47:29 2017 +0000 Remove compat_linux32 from the autoload list and add a enable/disable sysctl, like compat_linux. commit f4568e4a943a277edb9ada7c64ef6ae6b6e6a70a Author: maxv Date: Fri Sep 29 17:08:00 2017 +0000 Remove compat_linux from the autoload list, and add a sysctl to enable or disable it - which defaults to disabled. The following command is now required to use linux binaries: sysctl -w emul.linux.enabled=1 After a discussion on tech-kern@. All the other ideas to reduce the attack surface have drawbacks, and this sysctl seems to be the best option. commit bf30151722f22fbc07a53c5453c47b21bafb48a3 Author: maxv Date: Thu Sep 28 17:48:20 2017 +0000 Pack the useful variables at the end of the trampoline page; eliminates a hard-coded dependency on KERNBASE. 
Note that I cannot test this change on i386 right now, but it seems fine enough. commit 8ee655b7158083041b725cdb80d1a6d72d428342 Author: maxv Date: Thu Sep 28 17:35:08 2017 +0000 Clean up, and initialize the lwp0 fields in init_x86_64. commit 0b76fdd8aecc7e43bcd1a3e8a841fa388a957feb Author: maxv Date: Mon Sep 25 20:39:21 2017 +0000 Clean up and split loadfile, reduces a patch I have. commit 4f69c747e1ed6784f70ed662386b4e302d14a1ff Author: maxv Date: Sat Sep 23 11:01:32 2017 +0000 Make MTRR_GET privileged, the structures are not always zeroed (thereby leaking information), and beyond that we are not particularly interested in letting userland know how the kernel uses its MTRRs. commit 54af0ad641b21fe5e3e201a96589d21565a3dc91 Author: maxv Date: Sat Sep 23 10:38:59 2017 +0000 Initialize the errata MSRs when waking up, otherwise they are clear and we're re-enabling certain CPU bugs. commit 6a56603d2419281aa5116df85fb42320cbc7e20c Author: maxv Date: Sat Sep 23 10:18:49 2017 +0000 Make sure %edx is clear. commit ad0661ae7ccb45b0d478eac15e5cb5a607598bd3 Author: maxv Date: Sat Sep 23 10:00:00 2017 +0000 Reinitialize the PAT MSR when waking up, otherwise the write-combined pages become write-through. commit 3c10dacb4f46678bb713658354cc7f9aa2aea0c7 Author: maxv Date: Sun Sep 17 09:59:23 2017 +0000 Declare INTRFASTEXIT as a function, like amd64; will be expanded soon. commit d2ad45316fd0dfeed4a7b436034942e0a0ab4490 Author: maxv Date: Sun Sep 17 09:41:35 2017 +0000 Remove the second argument from USERMODE and KERNELMODE, it is unused now that we don't have vm86 anymore. commit e901692be2edd25f71ef2a910f6489ba871ec26f Author: maxv Date: Sun Sep 17 09:11:19 2017 +0000 Remove tlog.h - unused now. Note that it is not installed. commit 1130ea96dd71b2e74a7db5c15c8c762bd52fa7d4 Author: maxv Date: Sun Sep 17 09:04:51 2017 +0000 Remove TRAPLOG from i386. Nowadays there are better instrumentation tools, in both software and hardware. 
commit f509a1c26622acad21e6fd36c7da54c037217873 Author: maxv Date: Sat Sep 16 09:28:38 2017 +0000 Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves 10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU. commit 95ec3176578c7b0b8408c5698aa0fbd6209f833c Author: maxv Date: Fri Sep 15 17:32:12 2017 +0000 Declare INTRFASTEXIT as a function, so that there is only one iretq in the kernel. Then, check %rip against the address of this iretq instead of disassembling (%rip) - which could fault again, or point at some random address which happens to contain the iretq opcode. The same is true for gs below, but I'll fix that in another commit. commit 3af44b456c948d0a98814a568b21b4794ffec58b Author: maxv Date: Fri Sep 15 17:22:09 2017 +0000 Obviously, I was being absolutely dumb here; it's XEN, not Xen. commit 608ea932bcad4253c77aad1b0677402ae2b426ef Author: maxv Date: Sun Sep 10 10:51:13 2017 +0000 simplify commit 68bcde723baca1eb1ee9ce273fe04355de6a3b21 Author: maxv Date: Sun Sep 3 09:19:51 2017 +0000 Declare onfault_restore, and be stricter with SMEP. commit 44624c8aabcfd7aeff5f10292b5c70d023e42401 Author: maxv Date: Sun Sep 3 09:01:03 2017 +0000 Treat page faults from iretq/etc as fatal, otherwise we could hide kernel stack bugs. Note that it would be good to call check_swapgs from trap0e, but a few things need to be fixed before that. commit 698739faf0caa163ec94505df2100115b97c97b0 Author: maxv Date: Sun Sep 3 08:52:18 2017 +0000 Remove useless debug code, and split trap() into smaller functions, easier to understand. NMIs take another, faster path now. No functional change beyond that. commit e3054ddab0c4bc13f6f1762bdb83479a5fe107c6 Author: maxv Date: Sat Sep 2 12:57:03 2017 +0000 Fix a subtle ring0 escalation vulnerability in amd64, and implement a mitigation against similar bugs. The operations on segment registers can generate a page fault if there is an issue when touching the in-memory gdt. 
Theoretically, it is never supposed to happen, since the gdt is mapped correctly. However, in the kernel we allow the gdt to be resized, and to do that, we allocate the maximum amount of va needed by it, but only kenter a few pages until we need more. Moreover, to avoid reloading the gdt each time we grow it, the 'size' field of gdtr is set to the maximum value. All of this means that if a mov or iretq is done with a segment register whose index hits a page that has not been kentered, a page fault is sent. Such a page fault, if received in kernel mode, does not trigger a swapgs on amd64; in other words, the kernel would be re-entered with the userland tls. And there just happens to be a place in compat_linux32 where the index of %cs is controlled by userland, making it easy to trigger the page fault and get kernel privileges. The mitigation simply consists in abandoning the gdt_grow mechanism and allocating/kentering the maximum size right away, in such a way that no page fault can be triggered because of segment registers. commit 13936b019825eadd3e6ba29ab632b7f20164f2df Author: maxv Date: Thu Aug 31 15:41:14 2017 +0000 check sc_eip in the ldt branch too commit 4635c7518b80c8eb429fa830b498c48da9f4358f Author: maxv Date: Thu Aug 31 10:30:58 2017 +0000 Add a layer of mitigation against the intel sysret vuln: restore %gs when sysretq faults. Right now we try to make sure that %rip is canonical by performing sanity checks in several places, but I've already found missing checks two times already, and there may be others. By performing an additional swapgs here, we are turning ring0 exploits to simple DoSes - which are still security bugs, but of a lower impact. commit eec3b82aeaa950969dc20136ceb62242d913244e Author: maxv Date: Thu Aug 31 09:33:19 2017 +0000 Reorder for clarity, and style. 
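Note on 4635c7518b80 ("Add a layer of mitigation against the intel sysret vuln"): a minimal sketch of the kind of canonical-address sanity check the message mentions for %rip. It assumes 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal. The function name is hypothetical and this is not the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 63..47 are 17 bits wide; they must be all zeros or all ones. */
#define CANON_SHIFT    47
#define CANON_ALL_ONES UINT64_C(0x1FFFF)

/* Hypothetical helper: true if va is canonical for 48-bit addressing. */
static bool
va_is_canonical48(uint64_t va)
{
	uint64_t top = va >> CANON_SHIFT;

	assert(top <= CANON_ALL_ONES);
	return top == 0 || top == CANON_ALL_ONES;
}
```

A return path that validates the user %rip this way before sysretq avoids the fault. The extra swapgs described in the commit covers the case where such a check is missing.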
commit a3e2288ac1f8dfed8eda40c4ac09a98c03ca5471 Author: maxv Date: Thu Aug 31 09:27:28 2017 +0000 Construct the trap frame with interrupts disabled, for safety, just like the rest of the interrupt entry points. commit 8c778261acc77a52f3304131ef90ee807f1d1fc1 Author: maxv Date: Wed Aug 30 16:01:55 2017 +0000 Make these pages non-executable, and style. commit 38374088632fb5782dbcad18f5b442495cd66dbd Author: maxv Date: Wed Aug 30 15:46:19 2017 +0000 Don't test call gates, they are not supported anymore. commit 71b4a94ac35acff21dbc264f5d7714c25a37cd2d Author: maxv Date: Wed Aug 30 15:44:01 2017 +0000 Don't allow userland to create 286/386 call gates anymore - they are not used by Wine. While here, don't allow it to overwrite the static entries either, don't allow unknown entry types, remove LDT_DEBUG, and style. commit 4160377b7edc070a426a0263a218f3c6e0c2e496 Author: maxv Date: Wed Aug 30 15:34:57 2017 +0000 Pfff, use %ss and not %ds. The latter is controlled by userland, the former contains the kernel value (flat); FreeBSD fixed this too a few weeks ago. As I said earlier, this dtrace code is complete bullshit. commit fca4bb1a5253a1acd4bc76cc6d599877c529e900 Author: maxv Date: Sun Aug 27 09:32:12 2017 +0000 style, and move some i386-specific code into i386/ commit 9673bb3af1b030ca1357f9ce5fe084a1597500ab Author: maxv Date: Sun Aug 27 08:38:32 2017 +0000 Localify. By the way, we should use a different stack for NMIs. commit ab6c51d634fc0772095cd14518a19ce4d4a5b340 Author: maxv Date: Fri Aug 25 11:35:03 2017 +0000 Move incq outside of the copy section. No functional change, reduces my smap diff. commit 3307cfbd2ee429e062c73b679e6d768b25a815be Author: maxv Date: Fri Aug 25 11:05:46 2017 +0000 Split comment, otherwise it is misleading. kcopy operates on kernel memory, and must *not* be used with userland pages. 
commit 4ea214eb59e5aaced1b86db67a4a06388fd1c22c Author: maxv Date: Wed Aug 23 08:14:18 2017 +0000 style, reduces an incoming diff commit a3ab657f3ca243923cfa97f06c64b7a4bb9de8b1 Author: maxv Date: Wed Aug 23 08:04:22 2017 +0000 Fix a bug in ucas_32 and ucas_64. There is a branch where they don't initialize %rax. commit bb57f4f91ad3bc75ce6e968122cc59901ec61692 Author: maxv Date: Tue Aug 22 09:12:49 2017 +0000 Apply only CCR. Otherwise userland could set PSTATE_PRIV in %pstate and get kernel privileges on the hardware. ok martin commit f94707ac4986925d62e8bbedfae3dffb0555714c Author: maxv Date: Sun Aug 20 11:06:35 2017 +0000 spl leak, found by mootja commit 465de3f57628907748e37563b602a22fc04419aa Author: maxv Date: Sun Aug 20 11:05:24 2017 +0000 as the xxx implicitly points out, there's a division by zero here, so panic right away; found by mootja commit 2d0a5074a5d696b5cae96818b7ae2d865235f211 Author: maxv Date: Sun Aug 20 11:03:04 2017 +0000 spl leak, found by mootja commit 532a9719e55689ea414c7394f499e6e7f415efb3 Author: maxv Date: Sun Aug 20 11:00:30 2017 +0000 M_WAITOK cannot fail, so remove the test, otherwise it looks like an spl leak; found by mootja commit 2c54eed5e62943371627c7ec88f16b563b4814b4 Author: maxv Date: Sun Aug 20 10:55:37 2017 +0000 M_WAITOK cannot fail, so remove the test branches. Otherwise it looks like leak/uninitialized area. commit e37071fa22145231f204d4a45b3c5813ae316222 Author: maxv Date: Fri Aug 18 14:52:19 2017 +0000 Revert my previous change. I hadn't checked carefully enough: the symbols are used in src/external. There is a number of things that seem wrong to me here, but I'm not changing them for now. commit 918a78d01f42eb0b21119503bfb85b74b00eb899 Author: maxv Date: Fri Aug 18 10:28:53 2017 +0000 Fill the .text padding with 0xcc (int3), in such a way that any jump into this area will automatically fault. 
The alignment within the section is necessary, in order to fill strictly all of the padding (took me a while to figure this out); but it does not change the kernel size. Greatly inspired from FreeBSD, but for some reason they decided not to apply the alignment. commit cd24cbcecdf56f173742902ea2dc0efc2bdc170a Author: maxv Date: Fri Aug 18 10:02:37 2017 +0000 Remove unused and broken code. On amd64 we won't want int3 from kernel mode to be valid. commit 950f9980bf3e1acdb88f44c901dcf2a43864e27b Author: maxv Date: Tue Aug 15 09:25:00 2017 +0000 Reduce the diff between amd64 and i386, and style. commit b5997a269455e7b642a09a88e16da59a2687f0d7 Author: maxv Date: Tue Aug 15 09:16:59 2017 +0000 Remove unused arg, to have the same definition as amd64. commit 5aab3a86885353af1648673b5b78690edb3ce722 Author: maxv Date: Tue Aug 15 09:08:39 2017 +0000 Rename intrddb -> intrddbipi, like i386. commit f6920d3c3497818b43d78b97fc17f4ca8ff96323 Author: maxv Date: Tue Aug 15 08:57:19 2017 +0000 style commit da45f39a103ff408d6b139d54dc77b90acc52af3 Author: maxv Date: Tue Aug 15 08:51:38 2017 +0000 Merge into x86/. commit c11d0cb636970c97095c59fbbb159dd49479d789 Author: maxv Date: Tue Aug 15 08:35:56 2017 +0000 Reduce the diff between amd64 and i386. It also fixes a bug in amd64, where large pages were not handled correctly. commit 961cb20613540ce5eb7bf69a53ed321960b57c89 Author: maxv Date: Tue Aug 15 06:57:53 2017 +0000 Reduce the diff between amd64 and i386. commit 5d4dea5b09a431f5ad6fc29fb80b1ca22a323e01 Author: maxv Date: Tue Aug 15 06:39:37 2017 +0000 Remove __ELF__ vestige. commit 7d15937fd65cf04f61e5208698556e1d50d866ef Author: maxv Date: Tue Aug 15 06:37:50 2017 +0000 style commit eaa150b6f3ad5e7adf10963f67a337e4a403fe4e Author: maxv Date: Tue Aug 15 06:27:39 2017 +0000 Merge into x86/. commit 7691907e11ef4a28be74afe545da2ec9c7857b20 Author: maxv Date: Tue Aug 15 06:04:28 2017 +0000 Reduce the diff between i386 and amd64 (bios32_service not implemented there). 
commit 28008c0d6e5433257df1ba116c73701cbc8ee4e3 Author: maxv Date: Sun Aug 13 08:07:52 2017 +0000 Mmh, restore %cs and %ss on Xen. Otherwise (unpriv) userland could set a non-three ring, causing the hypervisor to send a fatal interrupt to the kernel. commit ed54495d19a01f73bd34bbd9d1fec199f0e99655 Author: maxv Date: Sun Aug 13 07:16:44 2017 +0000 Remove unused include, remove dead code, KNF, and fix off-by-one. commit ab98486f8eff037bba672c7a7d8bcddd258bc83e Author: maxv Date: Sat Aug 12 19:48:28 2017 +0000 Bump - removal of i386_vm86 and i386_pmc. commit 3259c625a04a9ba954c85b561705997f7e17445f Author: maxv Date: Sat Aug 12 13:16:14 2017 +0000 Remove references to PSL_VM (implicitly vm86). commit 71c57f54ff5922046024055c8d77dda27b4fc3e5 Author: maxv Date: Sat Aug 12 13:11:23 2017 +0000 Remove the vm86 fields from the trap frame. It seems to me that we could now remove the '-16' when initializing pcb_esp0. commit 967e8772db3118a2038439591e0748da269cceab Author: maxv Date: Sat Aug 12 12:48:47 2017 +0000 Remove the vm86 fields from the pcb. commit 01cf7b742f3eb1d66182b2e6ad9dbc59c99ec482 Author: maxv Date: Sat Aug 12 12:33:31 2017 +0000 Don't include opt_vm86.h. commit 08cfc138ecf490576ce81a8b3260669dab021487 Author: maxv Date: Sat Aug 12 08:45:58 2017 +0000 State that this is SVR3, not iBCS2. commit ff35c3ae8f2a7d3867a13e1333772af1c1e66688 Author: maxv Date: Sat Aug 12 08:21:30 2017 +0000 All things considered, remove the i386_pmc API. I deprecated it some months ago, and clearly no one should be using it. (reminder: our new PMCs use the same sysarch, but the arguments are opaque and not compatible with the previous versions) commit 3696984a5edadc74e9a4b5a6ea5685df6a3a81b1 Author: maxv Date: Sat Aug 12 08:03:57 2017 +0000 Remove reference to vm86. commit f5990407e6403bf3c4b283b2208e7904fd7fece2 Author: maxv Date: Sat Aug 12 07:59:42 2017 +0000 Remove the i386_vm86 API (instead of just deprecating it). 
This API is not available anymore, and any binary using it won't function correctly. commit 53196a0f374e5cdf4ce8355163ab572047df34cb Author: maxv Date: Sat Aug 12 07:40:43 2017 +0000 Remove the vm86 tests. commit 34837190075740d032c7da505618bc888d0496e0 Author: maxv Date: Sat Aug 12 07:35:08 2017 +0000 Remove vm86. Pass 4. commit e017aec53e84e8705687b256171c131873d1fcaf Author: maxv Date: Sat Aug 12 07:21:57 2017 +0000 Remove vm86. Pass 3. commit 81ce6dad7cf5e11715127f1d28c8fb9721cdf869 Author: maxv Date: Sat Aug 12 07:07:53 2017 +0000 Remove vm86. Simplifies a number of critical places. Pass 2. commit c43c714eb7f7555d5111aa939adfcba9faeafc3b Author: maxv Date: Sat Aug 12 06:46:13 2017 +0000 Remove support for vm86 on i386. It is bug-friendly, and there is no point in having kernel support for this: the instruction set of the CPU is small, and it can easily be emulated in userland entirely. There are also several assumptions in the code that are not respected, and the slightest confusion in the trap frame can lead to ring0 exploits. vm86 has received zero maintenance. As far as I can tell, it was added 20 years ago in order to make doscmd work. But doscmd has not been maintained either, and was removed from pkgsrc in 2011. dosbox can be used instead: it does not require kernel support, and will produce better results than our flimsy implementation. Pass 1. (many pieces still in the tree) commit 351e23320752c3fceed46562b40bdbf86a949a4a Author: maxv Date: Fri Aug 11 12:58:14 2017 +0000 Don't build the ibcs2 module on i386. commit 1dc69f379a0212dcaac24515032064f3b6e0fd7b Author: maxv Date: Fri Aug 11 06:27:12 2017 +0000 Add a comment about APICBASE_PHYSADDR. Has to do with PR/42597. commit a50e66d1c0e7befeed3cf094d2811e00e28365e8 Author: maxv Date: Fri Aug 11 06:18:29 2017 +0000 Fix a bug introduced in r1.55, this should be LAPIC_BASE. 
commit f44587a14a098a5f8eac7029c3f986dfc18250ee Author: maxv Date: Thu Aug 10 17:33:32 2017 +0000 Pff, I forgot to revert my change in these files. I committed only the GENERIC files, and the message was: Revert my changes, and re-enable COMPAT_NOMID, COMPAT_09 and COMPAT_43. Several compat options happen to be dependent on the compat_43_* functions, the availability of which is (wrongfully) controlled with COMPAT_43. Same for COMPAT_09. commit 33c35ad6c763e9fa6c38bbb2f18ebd1c8bc22fd1 Author: maxv Date: Thu Aug 10 14:13:45 2017 +0000 Switch to the temporary stack right away when booted via multiboot. GRUB happens to give a correct stack, but it is not guaranteed by the spec. This temporary stack will be reset later, which is fine. Fixes PR/50245. commit 5f8d63ecf093992e29bb03d6b178466c3d1a98e8 Author: maxv Date: Thu Aug 10 13:39:08 2017 +0000 Should be comp-obsolete. commit 4877a30a7b8e7926e8a7ed0c03bc42c33b54b467 Author: maxv Date: Thu Aug 10 13:13:03 2017 +0000 Save and restore xcr0 when doing ACPI sleeps. Should fix PR/49174. commit 978b4c7b0514e6290aba15e56d7148e4c8841f2b Author: maxv Date: Thu Aug 10 12:51:22 2017 +0000 Don't include opt_compat_freebsd.h. commit 1f4c774c695dc7c2cd7bd9219cd07320b722e620 Author: maxv Date: Thu Aug 10 12:49:11 2017 +0000 Don't include opt_compat_ibcs2.h. No idea what it was doing in amd64, since it never got implemented there. commit 89aa159598d0c9711f84c195fd54ac101a3a1bfe Author: maxv Date: Thu Aug 10 12:46:31 2017 +0000 Remove the svr4/ibcs2 fpu flags. commit 7585203c3cd818ff88ca96fa494bd2ca419db32f Author: maxv Date: Wed Aug 9 19:18:59 2017 +0000 Remove references to svr4 and ibcs2, they are not supported here. commit 3075e4a110fd6e09adb1576a93cd6eee0459d70f Author: maxv Date: Wed Aug 9 19:11:13 2017 +0000 Remove several dead entries from the x86 makefiles. Looks like people (me included) regularly forget to take care of this. 
commit ad9c7fb07a7b68b57cf2a991cd28a68f0918c6a9 Author: maxv Date: Wed Aug 9 18:58:51 2017 +0000 Remove ibcs2_machdep.h on i386, and don't install it. commit 4ebb2b837a0d09b1a2f92f4e7008e70f8f2f462a Author: maxv Date: Wed Aug 9 18:52:00 2017 +0000 Remove __i386__. commit 205db71845b511a4f9b368131920c1c162a87d1d Author: maxv Date: Wed Aug 9 18:48:53 2017 +0000 Remove references to compat_ibcs2. commit 67980d44e3e68597b238558e044f9a2d201b1b69 Author: maxv Date: Wed Aug 9 18:45:30 2017 +0000 Remove compat_ibcs2 from i386. After a discussion on port-vax, it turns out that compat_ibcs2 does not implement the iBCS2 standard - which is x86-specific - but rather SVR3. Our real iBCS2 implementation was a mixture of compat_ibcs2 and compat_svr4, and was only partial. Keeping support for this in i386 is totally irrelevant today. I also asked on port-i386 but didn't wait long. The main issue is that compat_ibcs2 should have been called compat_svr3. But CVS does not support renaming files, and moving things around is both painful and tiring, even more so when no one seems to be interested in doing this work or in the feature at all. For now compat_ibcs2 is available on Vax and will stay, until someone (not me) cleans it up. commit e1fe920c49f58305c5fdaf0e5cd58dc24d25d782 Author: maxv Date: Tue Aug 8 17:27:34 2017 +0000 Mmh, don't overwrite tf_err and tf_trapno. Looks like it can be used to exploit the intel sysret vulnerability once again. commit da29c4eb50409db951bcb4793989c0cd4c4ba62d Author: maxv Date: Tue Aug 8 17:00:42 2017 +0000 Remove dumb debug code and outdated comment. commit 259b7bfeec00722e8b91d06c0384e6447755a7fb Author: maxv Date: Tue Aug 8 16:57:32 2017 +0000 Remove compat_svr4, compat_svr4_32 and compat_ibcs2 from the list of autoloaded modules. These options are disabled everywhere (except ibcs2 on Vax, but Vax does not support kernel modules, so doesn't matter), therefore there is no issue in removing them from the list. 
Interested users will now have to do a 'modload' first, or uncomment the entries in GENERIC. commit ed54b5b812bc13ead3aa8b150806640a8bb8d0ba Author: maxv Date: Tue Aug 8 08:12:14 2017 +0000 Remove compat_freebsd from the list of autoloaded modules. Interested users will now have to type 'modload' to use it, or uncomment the entry in GENERIC. I should have removed it when I disabled COMPAT_FREEBSD by default, sorry about that. commit 8b1f5c03b044be9a481051af5b9a62f4093c8d23 Author: maxv Date: Tue Aug 8 08:04:05 2017 +0000 Move freebsd_machdep.h into sys/compat/freebsd, and don't install it. Now, the compat_freebsd files are all contained in sys/compat/freebsd. commit f66b0a1d33cae273b54e90ff377147219c279abd Author: maxv Date: Mon Aug 7 17:31:11 2017 +0000 Fix GCC warning on NET4501, PR/52451. commit ca97d1b39bac4344ae544bb30dfa2503a10b118e Author: maxv Date: Mon Aug 7 17:10:09 2017 +0000 Remove incorrect KASSERT, only the allocation is protected by cpu_lock. commit 4391ae463a8f715d0cf997b46860379241451459 Author: maxv Date: Sun Aug 6 08:11:38 2017 +0000 Mention high mem. commit 159c658d09f3c5abd51b26ba3c23f0eec51763ef Author: maxv Date: Sun Aug 6 08:07:37 2017 +0000 Mention PMCs. commit 706e775a258994d75930258fe60a8414c2a27169 Author: maxv Date: Sun Aug 6 08:00:40 2017 +0000 Deprecate. commit 8b54dd6a77b9c38b4973ba3d027ce1ab35a8fa59 Author: maxv Date: Fri Aug 4 09:33:03 2017 +0000 typos commit 232cd22a44c4ce24ee51ab8bc6649792ad256a41 Author: maxv Date: Fri Aug 4 09:30:19 2017 +0000 Revert my changes, and re-enable COMPAT_NOMID, COMPAT_09 and COMPAT_43. Several compat options happen to be dependent on the compat_43_* functions, the availability of which is (wrongfully) controlled with COMPAT_43. Same for COMPAT_09. commit ffa34db72a5de6222bf831f034cb77ee442e48e1 Author: maxv Date: Tue Aug 1 14:43:54 2017 +0000 Move arch/i386/i386/freebsd_* into compat/freebsd/. COMPAT_FREEBSD is i386-specific. 
commit aed43e48024570ab23cfd5d8ef268b301c6f58c2 Author: maxv Date: Tue Aug 1 14:23:42 2017 +0000 Remove references to compat_freebsd when it is not supported. commit e6b3582001bf044c1a20c1664a18651fe0495403 Author: maxv Date: Tue Aug 1 13:57:03 2017 +0000 Remove svr4_machdep.h right away, no one should include it. commit 9272d3c7805926275a8f73903f992ac0b3d2f85c Author: maxv Date: Tue Aug 1 13:49:50 2017 +0000 Don't build the svr4 module on i386. commit ba8d84a6868134f365a9f6575216c1f5dd7d54e0 Author: maxv Date: Tue Aug 1 13:47:49 2017 +0000 Don't include files.svr4 and files.svr4_32. commit 03417a557dccb3645eb679f9cf52a07e9656aa0f Author: maxv Date: Mon Jul 31 18:54:40 2017 +0000 Use idt_vec_set instead. commit 303384bcc8683b2aa50bd8835d13a0bc46794e01 Author: maxv Date: Mon Jul 31 15:51:27 2017 +0000 Fix TCPCTL_NAMES, and remove TCPCTL_VARIABLES. commit 8fdd7ab3ad6d1271f584647f5914d775d7b60e4f Author: maxv Date: Mon Jul 31 15:43:33 2017 +0000 Disable all the compat options until COMPAT_10. NetBSD 1.0 was released on October 26 1994; 23 years of compatibility is enough. Discussed with christos quickly. commit 5760768b44d2e21066869705591a5d62def9610f Author: maxv Date: Mon Jul 31 15:38:01 2017 +0000 Remove references to COMPAT_OLDSOCK (itself removed years ago). commit 0aa35633a2c13d2fe002870919224e60e59ca40f Author: maxv Date: Sun Jul 30 16:13:24 2017 +0000 Remove references to COMPAT_IRIX - does not exist anymore. I believe svr4_machdep.h should be removed when the option is not implemented on the target architecture; and we should also remove the associated md.* entries. commit 3fddccdb14cad1418809eb82ce955c99ef786a46 Author: maxv Date: Sun Jul 30 16:07:06 2017 +0000 Remove lurking reference to TCP_COMPAT_42. commit 0b5a72baf3140fe3994877bd1d67b3d3c7b47295 Author: maxv Date: Sun Jul 30 13:12:49 2017 +0000 Disable svr4 and svr4_32 on sparc, sparc64 and amiga - the only places where they were still enabled. 
commit 761e6a02576c2cb2d170cb91e00fa71195ab8535 Author: maxv Date: Sat Jul 29 18:08:56 2017 +0000 Remove TCP_COMPAT_42 from the config files. Pass 3. commit 5a06981d8a91bcddea3401814ecff912b4565ccb Author: maxv Date: Sat Jul 29 13:05:15 2017 +0000 Remove unused. commit 249faed7b76a748d41d655e80c1e9efd1b100a6e Author: maxv Date: Sat Jul 29 12:34:34 2017 +0000 Remove undocumented hack. commit 42667fa37bc40b80665ae5fd4eec55da1dfd2afb Author: maxv Date: Sat Jul 29 12:28:27 2017 +0000 Remove TCP_COMPAT_42 from the config files. Pass 2. commit 3be21989adced6167875187aa86716e21bd16f91 Author: maxv Date: Sat Jul 29 12:15:12 2017 +0000 Remove references to i386. commit d02e99182cac38bf278c7e6d8969f872a2e8ca90 Author: maxv Date: Sat Jul 29 12:07:45 2017 +0000 Unlink svr4_machdep.h. commit f1fa824c03abfcab0dafa343ebe93344e05e9ec1 Author: maxv Date: Sat Jul 29 12:03:37 2017 +0000 Remove i386. By the way, it looks like several architectures are missing here. commit 89d6a0c5be6df9236603869c086fe4fe6909ed64 Author: maxv Date: Sat Jul 29 12:00:56 2017 +0000 Remove svr4 from the config files. commit 602ce9567d1316cac05a3d8fd19b20164d3876a5 Author: maxv Date: Sat Jul 29 11:54:14 2017 +0000 Drop support for svr4 on i386. This feature is not maintained, not reliable, and of a limited use case. Most svr4 applications got time to be ported to linux, and we do have a functional, maintained linux emulation. Reduces the number of entry points into the kernel, the number of places that need special care (cpu context). Note that compat_svr4 is still available on sparc. commit 4f68cdf6c087571c6ad356abcfb703c9a3933e37 Author: maxv Date: Sat Jul 29 10:39:48 2017 +0000 Remove exec_aout support in compat_freebsd. The only reason we still have compat_freebsd is because of tw_cli, and it is an elf32 binary (could test, manuel sent it to me). commit 80aa9d2197025542828a93052a2d2b938864871e Author: maxv Date: Sat Jul 29 07:19:47 2017 +0000 Remove DEBUG_HPUX (does not exist). 
commit 84b389be7310ceeb904b63916c19e99569af08c5 Author: maxv Date: Sat Jul 29 07:16:14 2017 +0000 Remove IBCS2_DEBUG (does not exist). commit 580d991a56036a88dbedb0830b14e8fa0e0b45c1 Author: maxv Date: Sat Jul 29 06:29:31 2017 +0000 Remove the remaining parts of compat_oldboot. commit 389ba963fcf3b3ad1028521112be8b895c781d86 Author: maxv Date: Sat Jul 29 06:12:50 2017 +0000 Only compat_43 needs compat_osock. Note that the use of vec_compat_ifioctl is racy. commit ff1f6380754e16174d7aa06626c2584d810de872 Author: maxv Date: Sat Jul 29 05:59:08 2017 +0000 Disable COMPAT_386BSD_MBRPART on Xen - not enabled in GENERIC. commit 6e765e29c055e4865c7b85be627f8129c2b28fdd Author: maxv Date: Sat Jul 29 05:46:29 2017 +0000 Remove TCP_COMPAT_42. commit dc18161cf0e717eb9132f77d79b37d6bdc67be90 Author: maxv Date: Sat Jul 29 05:08:48 2017 +0000 Forgot to commit this file yesterday. commit 97e5a08b2f4549e0fda367cc5c7e38c1b1a5bef2 Author: maxv Date: Fri Jul 28 19:26:15 2017 +0000 Remove TCP_COMPAT_42 from the config files. Pass 1. commit af908aef2ee2e3cb119f518365c92b9a633f513b Author: maxv Date: Fri Jul 28 19:16:41 2017 +0000 Remove TCP_COMPAT_42. This feature is a workaround for a bug in the TCP stack of BSD4.2. Having such features just does not make any sense, and looking at the code, I'm not sure it actually works. commit 63d0a27f860714efcc085e53cf03f36fe7a374d1 Author: maxv Date: Fri Jul 28 16:10:28 2017 +0000 After a careful review, and all things considered, disable compat43 by default on amd64. The use case is limited, the potential for damage too high, and it is safer to run a BSD4.3 binary on i386 since the kernel does not have to go through netbsd32 - which may not correctly reproduce i386. commit 46a15d10b411b0c4c61e77e8bb856d653bfd3971 Author: maxv Date: Fri Jul 28 14:26:50 2017 +0000 Don't include malloc.h. commit b3ccdbd8cb61d5eac83b9b2583a9a61a522dd3d5 Author: maxv Date: Fri Jul 28 14:13:13 2017 +0000 Disable svr4 and ibcs2 by default. 
These options are not well-tested, of a limited use case, and the potential for damage is too high. Vulnerabilities were presented at DEFCON 25 - I see that at least one of them can be exploited to get ring0 privileges. commit 76f51534b544d3ecf56a42106a00cb6c0f0d54ba Author: maxv Date: Fri Jul 28 13:59:07 2017 +0000 Disable vm86 by default. The use case is limited, and the potential for damage is too high. This code is fragile, and relies on a certain number of assumptions, some of which are not totally true. For example, it relies on the fact that a 16bit process cannot perform a syscall, but verily it can. The slightest confusion in the trap frame can lead to ring0 exploits. Also, I'm not convinced that it interacts well with the compatibility layers. commit 33e724835efcdc3919df42e4b5353e57d7e807c4 Author: maxv Date: Tue Jul 25 18:03:56 2017 +0000 This branch must be static, otherwise there is a condition under which the KASSERT in startlwp32 would be triggered. commit 3e5e4095cc7340d1a4133029ec7bb49841e34d7b Author: maxv Date: Tue Jul 25 17:43:44 2017 +0000 Must not be from n32. commit 379d918fc650f78dc559e1a1b8d227e213d2b12e Author: maxv Date: Sat Jul 22 13:03:54 2017 +0000 Add USER_LDT, commented out for now. commit d94ed138d6a6cd94ae1cac60c7392b59cfef5fb9 Author: maxv Date: Sat Jul 22 13:00:42 2017 +0000 Branch for USER_LDT. commit 0f8cf2db758bd25d1353a15451bfbc78addd17fa Author: maxv Date: Sat Jul 22 09:20:01 2017 +0000 Must be curlwp. commit 72c35db4c5cf05b8096e1fc5047b61469648bdd9 Author: maxv Date: Sat Jul 22 09:01:46 2017 +0000 Call _proc0_tss_ldt_init only once, and rename them. commit b603259080b6bffeb7ef55aff98110c4c8a23bb7 Author: maxv Date: Sat Jul 22 08:23:18 2017 +0000 Initialize these kpm fields in pmap_bootstrap. commit 087b90f67ef791fe672546f85c22d1436578f0ad Author: maxv Date: Sat Jul 22 08:01:35 2017 +0000 Clean up, it is easier to debug with qemu+gdb anyway. 
commit 095b7721134cffc007881692bad82acb0f530b72 Author: maxv Date: Fri Jul 14 13:23:48 2017 +0000 Should be loadfactor(). commit 6c8beef488d54783d050071b6e7850799c736cb7 Author: maxv Date: Fri Jul 14 13:21:29 2017 +0000 Don't forget to clean l_md.md_flags, otherwise there may be MDL_COMPAT32, in which case the kernel would always use iret (slower). commit 900e375bad423737995b69d694af3cd2682ad3ac Author: maxv Date: Fri Jul 14 13:02:20 2017 +0000 Revert rev1.26. l_estcpu is increased by only one cpu, not all of them. commit 3cb66843db628680cb6ce5a8f0408b7070a4d80e Author: maxv Date: Wed Jul 12 17:52:18 2017 +0000 rsp2, not 3 commit 482bcafeedd9d99a2ce9385aea7e64012628d277 Author: maxv Date: Wed Jul 12 17:40:33 2017 +0000 Enable PMCs by default. commit e310143491ebdf28f3561a3a4873524c2f2b102d Author: maxv Date: Wed Jul 12 17:38:15 2017 +0000 Update. commit f93123645f7afa1d0171d2035c32169179ad2e07 Author: maxv Date: Wed Jul 12 17:33:29 2017 +0000 Properly handle overflows, and take them into account in userland. commit 7865ee58f85983cbbb796903b3097f44f2123985 Author: maxv Date: Wed Jul 12 17:10:09 2017 +0000 Build the pmc tool on amd64. commit a5cb3465af9be2a0e45aba64f631d1ee1cef373e Author: maxv Date: Wed Jul 12 16:59:41 2017 +0000 include opt_pmc.h commit 2a46ba9b7ccf91c07c9a1dd3e37f0aeaef73b9f0 Author: maxv Date: Sat Jul 8 15:15:43 2017 +0000 explain a bit commit fa290400d30a0d2f933707373c523f92c8a6d07b Author: maxv Date: Sun Jul 2 11:21:13 2017 +0000 Put some ()s in the macro (kre). commit 32772c615f58ecc86fa53c0ac5992cd5644bf15a Author: maxv Date: Sun Jul 2 11:16:50 2017 +0000 Use a bitmap-based allocator for i386, same as amd64. Several functions are now identical - or nearly identical - on both sides. I couldn't test this change on xen, because I'm having some unrelated issues with my VM and I've spent enough time not understanding what's wrong with it. 
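For readers unfamiliar with the technique named in commit 32772c61 ("Use a bitmap-based allocator for i386, same as amd64"), here is a minimal sketch of a bitmap-based slot allocator. It is not the kernel's code: the names slot_bitmap, slot_alloc and slot_free, and the NSLOTS value, are hypothetical and chosen only for illustration. Each slot is tracked by one bit; allocation scans for the lowest clear bit, and freeing clears it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes and names, for illustration only. */
#define NSLOTS		64
#define BITS_PER_WORD	(sizeof(uint32_t) * CHAR_BIT)
#define NWORDS		((NSLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct slot_bitmap {
	uint32_t busy[NWORDS];	/* bit set = slot in use */
};

/* Return the lowest free slot and mark it busy, or -1 if none is free. */
int
slot_alloc(struct slot_bitmap *bm)
{
	for (size_t i = 0; i < NSLOTS; i++) {
		size_t word = i / BITS_PER_WORD;
		uint32_t bit = UINT32_C(1) << (i % BITS_PER_WORD);

		if ((bm->busy[word] & bit) == 0) {
			bm->busy[word] |= bit;
			return (int)i;
		}
	}
	return -1;
}

/* Mark a previously allocated slot as free again. */
void
slot_free(struct slot_bitmap *bm, int slot)
{
	size_t i = (size_t)slot;

	assert(slot >= 0 && i < NSLOTS);
	bm->busy[i / BITS_PER_WORD] &= ~(UINT32_C(1) << (i % BITS_PER_WORD));
}
```

Because the state is only a bit array, the same routines can be shared by two ports that differ in slot size or slot count, which is presumably what makes this layout easy to keep identical on both sides.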
commit c146046ae992a2cb251726d89ed877d914fde890 Author: maxv Date: Sun Jul 2 09:02:51 2017 +0000 Hide the computation in a macro. commit 52f55fda432a72ed7fc7023c8af38f8f9b2cb27a Author: maxv Date: Sun Jul 2 09:02:06 2017 +0000 Define MINGDTSIZ/MAXGDTSIZ in bytes, not in number of slots; same as amd64. commit c9159fe6881a396bf619f9b1c3be20659e766c3e Author: maxv Date: Sat Jul 1 10:44:42 2017 +0000 Remove the osyscall call gate on i386, and emulate it. There is a one-instruction race in it that could panic the kernel. commit 63d818410227d1108b36399086b36b518df736c0 Author: maxv Date: Sun Jun 25 12:44:04 2017 +0000 NULL deref, found by Mootja; not sure how to fix it, so just add a big XXX commit 72dcbcd2c5c1316844ad4ca042001f530b3f8088 Author: maxv Date: Sun Jun 25 12:39:27 2017 +0000 memory leak, found by Mootja; it seems that we should check the return value of 'fw_bindadd' in several other places, but whatever commit 0d4ced1b6b4c4f6373108d1a84b4d6cb54c60c99 Author: maxv Date: Sun Jun 25 12:29:32 2017 +0000 dumb instruction commit 42f9673b63e7dcf99a9046e1e0e29f3e8d71279e Author: maxv Date: Sun Jun 25 12:27:13 2017 +0000 spl leak, found by Mootja commit b4711da238bb357902f25ba1b1614283ca0c5ab4 Author: maxv Date: Sun Jun 25 12:25:02 2017 +0000 two spl leaks, found by Mootja commit 7c709b3fedfa386f9d081fd7013d783a14ae0d5a Author: maxv Date: Sun Jun 25 12:21:00 2017 +0000 spl leak, found by Mootja commit a56660bb5e1a675eccf339b8cbb25be4ce9b67f7 Author: maxv Date: Sun Jun 25 12:15:04 2017 +0000 uninitialized var, found by Mootja; don't know which value to put, so add a big XXX commit de87c056759ee8bf845e008aabf6fa6f90c652ee Author: maxv Date: Sun Jun 25 12:11:30 2017 +0000 memory leak, found by Mootja commit f482dc51422f0ba6823cab5b99b7100d1044dff2 Author: maxv Date: Sun Jun 25 12:07:23 2017 +0000 spl leak, found by Mootja commit 11c030056ceb29691d6f5897408cfeac0d296953 Author: maxv Date: Sun Jun 25 12:04:37 2017 +0000 uninitialized variable, found by Mootja commit 
26b85dcfdbe7736104383b40b49db0eb9d998cf6 Author: maxv Date: Sun Jun 25 12:02:59 2017 +0000 spl leak, found by Mootja a long time ago commit f249bc4e4f23ee2708b6d97cd13c8306f706076c Author: maxv Date: Sat Jun 17 09:32:53 2017 +0000 Remove dead and broken code. It is not a bad idea to implement USER_LDT on Xen, but it certainly shouldn't be done this way. commit 2490c6e6ad153be450f0cd5650d05aa8a065f91f Author: maxv Date: Sat Jun 17 08:40:46 2017 +0000 Increase the kernel heap size from 512GB to 32TB, in such a way that it is able to map the maximum amount of ram supported twice (16TB x 2). commit 8b162f3442f6bf8f3151644832f60ea648966f48 Author: maxv Date: Sat Jun 17 08:07:03 2017 +0000 Actually, use slot 456 instead, so that it fits a cache line. commit b1d0555b5bbb5961b342447c5140585115fc2cde Author: maxv Date: Sat Jun 17 07:45:13 2017 +0000 Check (inside), not (!outside). It explains the two install failures reported between pmap.h::r1.65 and vmparam.h::r1.40. commit 9ee5412c66bfe69de1958b0f88d71ba4d3d4963e Author: maxv Date: Thu Jun 15 13:42:55 2017 +0000 Fix a subtle but important bug in pmap_growkernel. When adding new toplevel slots to pmap_kernel, we are implicitly using the recursive slot; but this slot is in the active pmap, which may not be pmap_kernel. Therefore, adding L4 slots is fine in itself, but when adding L3 slots the kernel faults since the L4 slots that were just added are not active on the cpu. So far this has never been triggered, because the current va limit makes it impossible to add a new L4 slot, and i386 only has one level so the kernel cannot fault in a lower level. Now the tree is grown in the current pmap (cpm), copied into pmap_kernel, and propagated in the other pmaps as expected. Note that we're using CPUF_PRESENT, because this function may be called early, before cpu0 is attached. It does add to the current mess in the cpu attach code, so it will probably have to be revisited later. 
commit e048936b37ec75f803a2f7fb59fdb7b4061d0de0 Author: maxv Date: Thu Jun 15 11:25:52 2017 +0000 Correct these values. They must be consistent with NKL4_MAX_ENTRIES, otherwise the kernel thinks it has ~126TB of va while pmap knows it has only 512GB. commit 7cc7a0afeadaf3468eea8e41417b1f56603e87e4 Author: maxv Date: Thu Jun 15 09:31:48 2017 +0000 Mmh, correctly handle the physmem % lvl == 0 case. Don't know how I didn't see this in the first place. commit cd491ad5d6021aae4bb032fc8d75f91b15d3c956 Author: maxv Date: Thu Jun 15 07:05:32 2017 +0000 Limit the size of the direct map with a 2MB granularity (instead of 1GB). This way if there's a computation error somewhere we will fault earlier instead of letting the cpu access non-present physmem - which may cause some bizarre behavior. commit 0d5708a0ccdf88ba33bf27f03b057b3618282313 Author: maxv Date: Thu Jun 15 06:32:52 2017 +0000 Reorder these loops to reduce the number of enter->flush. I figured out yesterday that this has a clear impact: a system with 16TB of hard-coded ram has a 4-second black screen when booting. Now we're down to < 0.5s. It could be optimized more, but verily I don't have a machine with P1GB right now. commit f1d68793c024d5bc839796eed9f4bff14021cd38 Author: maxv Date: Wed Jun 14 17:54:01 2017 +0000 Check argc, and add a message. commit 9edf26c8964f8f550060ffe0f8832f051dd704b8 Author: maxv Date: Wed Jun 14 17:48:40 2017 +0000 Make the PMC syscalls privileged. commit 5c26a1a4f0aba292a9ea3a030538de1877567c33 Author: maxv Date: Wed Jun 14 17:21:04 2017 +0000 Disable interrupts for T_NMI (inline calltrap). Note that there's still a way to evade the NMI mode here, if a segment register faults in INTRFASTEXIT; but we don't care. I didn't test this change, but it seems fine enough. 
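The slot counts and sizes quoted in the direct-map commits above (512GB, 16TB, 32TB, and the 2MB versus 1GB granularity of commit cd491ad5) follow from standard x86-64 4-level paging arithmetic. The sketch below works them out. The macro and function names are hypothetical, and the round-up helper is only an assumption about how a size limit could be aligned; it is not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86-64 4-level paging geometry (names are illustrative). */
#define L2_PAGE_SIZE	(UINT64_C(1) << 21)	/* 2MB large page */
#define L3_PAGE_SIZE	(UINT64_C(1) << 30)	/* 1GB large page */
#define L4_SLOT_SIZE	(UINT64_C(1) << 39)	/* 512GB per top-level slot */

/* Bytes of va covered by 'nslots' top-level (L4) slots. */
uint64_t
l4_slots_to_bytes(unsigned int nslots)
{
	return (uint64_t)nslots * L4_SLOT_SIZE;
}

/* Round 'size' up to a multiple of 'gran' (gran must be a power of two). */
uint64_t
round_up(uint64_t size, uint64_t gran)
{
	return (size + gran - 1) & ~(gran - 1);
}
```

With these definitions, one slot covers 512GB, 32 slots cover 16TB and 64 slots cover 32TB. Rounding a hypothetical 3GB + 1MB of physical memory up to 2MB overshoots by 1MB, whereas rounding it up to 1GB overshoots by almost a full gigabyte, which illustrates why a finer granularity makes a stray access fault sooner.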
commit 03b96c316e25b5556567162513a7980dbc9369e8 Author: maxv Date: Wed Jun 14 17:02:16 2017 +0000 style commit 12153b63581c85c7f78576d8352687245614a83b Author: maxv Date: Wed Jun 14 14:17:15 2017 +0000 Give the direct map 32 slots (16TB of va). This matches MAXPHYSMEM, in such a way that the direct map is no longer the limiting factor for high memory systems. commit 813ba0e2f4735b2d5a5141535bd95355f18593ec Author: maxv Date: Wed Jun 14 12:49:37 2017 +0000 Move the direct map from slot 509 to slot 460. We will increase its size dynamically. commit 89a31c5e4ffcd74cff0674d968911406fa9f4965 Author: maxv Date: Wed Jun 14 12:27:24 2017 +0000 Define MAXPHYSMEM globally. commit 033dd527a3d088a8f7c6e3bbcba8cd9f7d9a7150 Author: maxv Date: Wed Jun 14 08:45:42 2017 +0000 Add EFER_TCE. This would be an interesting feature to have, since it reduces the indirect cost of invlpg; but I'm not convinced the way we flush upper-levels is correct for this yet. commit ac3745c77bc9d868bb8638e839f8730f1d9d009e Author: maxv Date: Wed Jun 14 08:12:22 2017 +0000 Fix a bug introduced in bus_space.c::r1.39. This check too is hard-coded. Might have had a cumulative effect on PR/52000. commit f771cea343621edd70bd70ea428d8191c069fb1c Author: maxv Date: Wed Jun 14 07:45:45 2017 +0000 Fix a pretty dumb mistake I made in r1.22: the alignment needs to be in the bss, otherwise the bootloader will use memory before __kernel_end and give a wrong start pa to the kernel. This issue was investigated by Anthony Mallet. Should fix PR/52000. commit b12026b8d5f2e867ffeaf68ce0bf38df23f6f8a4 Author: maxv Date: Sat Mar 25 15:07:21 2017 +0000 Don't need gdtstore here. commit cb13eedc220f9862b790663b0a48f08c27a7f356 Author: maxv Date: Sat Mar 25 15:05:16 2017 +0000 Use a bitmap-based allocator, will be easier to share with i386. commit fe7aba1813529219b302a3d053e559b62a12a23a Author: maxv Date: Fri Mar 24 19:21:06 2017 +0000 Handle counter overflows, and sample with 500000 events per interrupt. 
It's a pre-requisite for real sampling. Overflows are not yet displayed by pmc(1), but will be soon. commit 810e6112fcc7d2aa75fdede12c6c14a02d5d32ce Author: maxv Date: Fri Mar 24 18:30:44 2017 +0000 Drop support for 586 PMCs; the detection is broken, and I'm not sure the code even works. No one has ever cared about this anyway, and we won't maintain it. While here, fix the mask on the counter - K7 and F10H have 48bit counters. commit 44408b286b468c6bc99cac7010b41707bcf2f976 Author: maxv Date: Fri Mar 24 18:03:32 2017 +0000 Unconditionally save the segment registers - because we could have a kernel %gs and a userland %es/%ds -, and explain why T_NMI is a special case. Note that checking %gs directly is not a good idea: recent CPUs have the FSGSBASE instruction set, which allows userland to directly modify %gs without going through the kernel. If we ever enable this set, we will have to change this function, since we won't be able to test %gs against VM_MIN_KERNEL_ADDRESS anymore. commit e2b64c789c5b06b699deb1369b3fe242f1e64cd9 Author: maxv Date: Fri Mar 24 17:09:36 2017 +0000 Don't compile PMCs on Xen. commit e332366bd103073e75a293da8197faadc5a0e82f Author: maxv Date: Fri Mar 24 10:58:06 2017 +0000 Don't forget to flush the xpq queue, otherwise shit may happen. commit 9e0a179a96eea601d561c65a9c1c4412c4410075 Author: maxv Date: Thu Mar 23 18:08:06 2017 +0000 Remove PG_k completely. commit cb7a294384928a48c94460ae854c0bbece30845e Author: maxv Date: Thu Mar 23 17:25:51 2017 +0000 Remove this call gate on amd64, it is useless and vulnerable. Call gates do not modify %rflags, so interrupts are not disabled when entering the gate. There is a small window where we are in kernel mode and with a userland %gs, and if an interrupt happens here we will rejump into the kernel but not switch to the kernel TLS. Userland can simply perform a gate call in a loop, and hope that at some point an interrupt will be received in this window - which necessarily will be the case. 
With a specially-crafted %gs it is certainly enough to escalate privileges. commit 1b395df6a4a0c9b5f2bf37fb58924a18d471a058 Author: maxv Date: Sat Mar 18 13:39:23 2017 +0000 Mmh, allow iret to be handled when an #SS fault (T_STKFLT) happens. Even if the sdm is far from being clear, it appears that iret can trigger an #SS fault if %ss points to a writable but non-present segment; in which case the kernel would panic, thinking the fault was internal to it. In particular, userland can create a broken segment in the ldt with USER_LDT, update its %ss with setcontext and trigger the panic. I don't think amd64 is affected since USER_LDT does not exist there, and the changes on tf_ss seem correct - but I'm still adding T_STKFLT for safety. commit 35c40bd394f25e02657d8d4d51125a8f6fddd214 Author: maxv Date: Sat Mar 18 13:35:57 2017 +0000 Style, and remove debug code that does not work anyway. commit 526c571da9a019afc8583ba16e0bb00d820f81c0 Author: maxv Date: Wed Mar 15 16:42:18 2017 +0000 Add a comment to answer a question regarding privilege separation when modifying a PTE from an active page tree. The question is from Manuel Bouyer, and the answer is from me. commit 8ba43c36a349e03a91027855ede71fa47fa03c4c Author: maxv Date: Sat Mar 11 14:13:39 2017 +0000 Mmh, remove a debug printf I mistakenly added in my previous commit commit f559bad04f14b4160a504ba45d782357a523fcda Author: maxv Date: Sat Mar 11 10:33:46 2017 +0000 Add the AMD 10h family, with additional events that I believe are useful, the DTLB misses on large pages for example. While here, remove a few K7 flags that do not actually exist on K7 (there must have been a confusion between K7 and K8); and make the 'pmc list' command a little more user-friendly. commit 1d0457fea854e50c7a766b01a3fc909413757ba7 Author: maxv Date: Fri Mar 10 15:06:20 2017 +0000 PRIu64 commit 8bf7800cb36b80509af4313bf48e3ecc2214ef8e Author: maxv Date: Fri Mar 10 14:54:12 2017 +0000 PMCs for amd64 - still disabled, like i386. 
commit b728bc317e0cbbf39e9d5d90f8222455d3b477af Author: maxv Date: Fri Mar 10 14:40:56 2017 +0000 Move pmc.c into x86/, it can be shared with amd64. commit c74db929a38ea32cbb4eb57de2d7664114f7db17 Author: maxv Date: Fri Mar 10 13:42:47 2017 +0000 unused commit cd45a714ca5e13d599b322e8d197b0005828c73f Author: maxv Date: Fri Mar 10 13:09:11 2017 +0000 Switch to per-CPU PMC results, and completely rewrite the pmc(1) tool. Now the PMCs are system-wide, fine-grained and more tunable by the user. We don't do application tracking, since it would require storing the PMC values in mdproc and starting/stopping the counters on each context switch. While this doesn't seem to be particularly difficult to achieve, I don't think it is really interesting; and if someone really wants to measure the performance of an application, they can simply schedctl it to a cpu and look at the PMC results for this cpu. Note that several options are implemented but not yet used. commit ac63fe57a60163957e81c49b2fb6ccaefa35ed0b Author: maxv Date: Wed Mar 8 18:00:49 2017 +0000 A few changes: * Use markers to reduce false sharing. * Remove XENDEBUG_SYNC and several debug messages, they are just useless. * Remove xen_vcpu_*. They are unused and not optimized: if we really wanted to flush ranges we should pack the VAs in a mmuext_op array instead of performing several hypercalls in a loop. * Start removing PG_k. * KNF, reorder, simplify and remove stupid comments. commit a69664a349fe82e8e564e84d6e924f620d2c2233 Author: maxv Date: Wed Mar 8 16:52:17 2017 +0000 Mark as obsolete instead of removing (from Thomas Klausner). commit f52cdaa97b8387ca8bafdd6928242df39285b416 Author: maxv Date: Wed Mar 8 16:42:27 2017 +0000 Add a version argument, set to 1, and check it in usr.bin/pmc. Use uint32_t instead of uint8_t since we now need 12bit selectors (10h family). And while here KNF. 
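Commit f52cdaa9 widens the selector argument because a 12bit value does not fit in 8 bits. The sketch below illustrates the truncation with plain C integer conversions. The helper names and the sample value 0x1C0 are hypothetical, picked only to show the effect; they are not taken from the PMC interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a selector in an 8-bit field, then read it back. */
uint32_t
store_selector_u8(uint32_t selector)
{
	uint8_t field = (uint8_t)selector;	/* keeps only the low 8 bits */

	return field;
}

/* Hypothetical helper: store a selector in a 32-bit field, then read it back. */
uint32_t
store_selector_u32(uint32_t selector)
{
	uint32_t field = selector;	/* all 12 bits survive */

	return field;
}
```

A selector such as 0x1C0 comes back as 0xC0 through the 8-bit field, silently aliasing a different event, while the 32-bit field preserves it. Selectors that already fit in 8 bits are unaffected either way.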
commit ba5d54eefeaf442d3e76593a3ba7cd040f702980 Author: maxv Date: Wed Mar 8 16:09:27 2017 +0000 Deprecate the pmc functions in libi386. The parameters will be updated, and we are not interested in maintaining this anyway. Now i386's pmc interface is opaque, which is good. commit 9661c249579efa3e7a1b9651e6482db1782f70d4 Author: maxv Date: Wed Mar 8 16:05:29 2017 +0000 We don't use libi386 anymore. commit 61687248b290dbcd8b8de592186de0d2932f472e Author: maxv Date: Wed Mar 8 15:53:00 2017 +0000 Remove i386 from libpmc; it has its own interface (sysarch), and we won't maintain compatibility. Verily, I cannot build a distribution now, so I'm committing this rather blindly. This being said, it looks correct enough. commit aa2147dc9dd22e7435e47706415c297e02bba5f8 Author: maxv Date: Sun Mar 5 09:08:18 2017 +0000 Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. 
Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges. commit 2f71cf0a8745eb970508936c23e98bf689cf8d64 Author: maxv Date: Sun Mar 5 08:36:35 2017 +0000 Should be PG_k, doesn't change anything. commit 011314358c24ff854210d7b54f9f67d675440013 Author: maxv Date: Sat Feb 18 16:48:38 2017 +0000 KNF, and make it less i386-specific. commit eef79679ebc53f42d853c18c990f3c9e35fee5ce Author: maxv Date: Sat Feb 18 16:15:51 2017 +0000 Add the AMD 10h family PMC values. Some values depend on the CPU revision, they are commented out. Several other values are common with K7, we could merge them later. This family of CPUs has a 12bit event selector, contrary to K7 (8bit). The thing is, i386's PMC interface takes as argument a uint8_t from userland, so these counters are not accessible (yet). commit 01c8ebfdae3471d194f9d7eb9b365c4401c7e6eb Author: maxv Date: Sat Feb 18 15:56:03 2017 +0000 Fix a bug I introduced yesterday. The arguments are 8-bit ints, so the unit gets truncated. By luck, the counters I was testing could accept a null unit. commit d4ae1f29dcfab9114539134179aae09857780d1d Author: maxv Date: Sat Feb 18 14:43:34 2017 +0000 PERFCTRS -> PMC (not implemented anyway) commit 6dc981ea93b74f90737eeddf3b902e18862729ca Author: maxv Date: Sat Feb 18 14:36:32 2017 +0000 There is currently an ugly mix between the PERFCTRS subsystem (MI), and i386's own PMC interface (MD). 
Stop using PERFCTRS and use PMC instead. While here remove some unused flags, which are wrong on the latest CPUs anyway. commit e1189791dacbd0fef4ef1d31fb05b1513745c15a Author: maxv Date: Fri Feb 17 12:10:40 2017 +0000 Support PMCs on multi-processor systems. Still several things to fix, but at least it works a little. Will be improved and moved into x86/ soon. commit f9a1deadf0cb999b8a2c06dda04372b3828c7d73 Author: maxv Date: Tue Feb 14 09:11:05 2017 +0000 Add most of my USER_LDT code for amd64, but disable it and put a comment about why Wine still does not work. Nothing changes, but at least it is a step forward. commit 162273667de729413a1d49b559495da1d33b0ba0 Author: maxv Date: Tue Feb 14 09:03:48 2017 +0000 Check %eip with USER_LDT too. commit 051fbcd693b553d0ddf7589bd88d2545eafa8423 Author: maxv Date: Mon Feb 13 15:03:18 2017 +0000 Make sure %rip is in userland. This is harmless, since the return to userland is made with iret instead of sysret in this path. While here, use size_t. commit 21e281d828a69471d3aba9dd0db79c93753d9ac6 Author: maxv Date: Mon Feb 13 14:54:11 2017 +0000 Don't let userland choose %rip. This is the Intel Sysret vulnerability again. commit 83a5ee117b871849bbe28ce506276debdf93ce5b Author: maxv Date: Sun Feb 12 18:43:56 2017 +0000 Add a KASSERT, otherwise it looks like a NULL deref; from Mootja. commit 89cd850a981aa487c30cdd1c1d7a0f6f8749a4ce Author: maxv Date: Sun Feb 12 18:24:31 2017 +0000 Memory leak, found by Mootja; not tested, but obvious enough. commit a36bf099ad0467eadf6d37c874da0739bbaa1006 Author: maxv Date: Sun Feb 12 18:21:50 2017 +0000 Uninitialized var, found by Mootja; not tested, but obvious enough. commit e5b5f1232cf20488e5888b4da54a4d5779904162 Author: maxv Date: Sat Feb 11 16:02:11 2017 +0000 Put 2MB alignments between the kernel segments. This way the kernel image is entirely mapped with large pages, which uniformizes performance and reduces fluctuation. Sent on port-amd64. 
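Note on commit e5b5f123 (2MB alignments between the kernel segments): the actual change is not shown in this log, so the following is only a sketch of the rounding arithmetic involved. The macro and function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Size of an x86 large page as referred to in the commit message. */
#define LARGE_PAGE_SIZE (2ULL * 1024 * 1024)

/*
 * Round addr up to the next multiple of align.
 * align must be a non-zero power of two.
 */
static inline uint64_t
align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

If every segment starts at align_up(previous_end, LARGE_PAGE_SIZE), no large page straddles two segments, so each segment can be mapped with large pages carrying its own permissions.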
commit 1786cf4e267b60b2b7c2202724030d9a29b0c8cd Author: maxv Date: Sat Feb 11 15:11:45 2017 +0000 Fix a few (unused) MSR values, and add some others that I believe are relevant. From Murray Armfield (PR/42861). commit 37da66d68829988ee878843fa26aad5af6c4cbbc Author: maxv Date: Sat Feb 11 15:05:15 2017 +0000 Remove VM_MAX_KERNEL_BUF (unused). Looks like several other ports could do the same. commit 3ef2f444723079769bb7199423da1bce003e7214 Author: maxv Date: Sat Feb 11 14:11:24 2017 +0000 Instead of using a global array with per-cpu indexes, embed the tmp VAs into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64}, because amd64 already has a direct map that is way faster than that. There are two major issues with the global array: maxcpus entries are allocated while it is unlikely that common i386 machines have so many cpus, and the base VA of these entries is not cache-line-aligned, which mostly guarantees cache-line-thrashing each time the VAs are entered. Now the number of tmp VAs allocated is proportionate to the number of CPUs attached (which therefore reduces memory consumption), and the base is properly aligned. On my 3-core AMD, the number of DC_refills_L2 events triggered when performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on average divided by two with this patch. Discussed on tech-kern a little. commit ee9f8e658fb9f7094b31403eeebd21d9e804125f Author: maxv Date: Sat Feb 11 13:22:58 2017 +0000 As the XXX implicitly suggests, this line is wrong. Many other families support PMCs (like my 10h amd). While here, put a warning in a comment. commit 9694705d74bda734af134492f5cccc7a493b371b Author: maxv Date: Fri Feb 10 10:39:36 2017 +0000 If the segment list is full, print a warning on the console and launch the system with the available segments. High memory systems may have more than VM_PHYSSEG_MAX segments; it is better to truncate the memory and allow the system to work rather than just panicking. 
The user can still increase VM_PHYSSEG_MAX (or ask us to). Fixes issues such as PR/47093. Note: the warning is logged but does not appear in dmesg, this too needs to be fixed for the rest of the bootstrap procedure. commit c53f8d0b8d429fa588589f976a4372f950d632b2 Author: maxv Date: Fri Feb 10 10:02:26 2017 +0000 Use macros instead of hard-coded constants. By the way, I don't think this code is correct, but whatever. commit b167670e0e1290a452a38b7ca976cd410baa1e96 Author: maxv Date: Fri Feb 10 09:57:04 2017 +0000 Import iomem_ex locally. commit 797e7a125c1ea3368607a9a9899f1e684fd364e3 Author: maxv Date: Thu Feb 9 19:30:56 2017 +0000 If the preloaded modules cannot be mapped with the initial amount of VA, discard the associated bootinfo entry. Otherwise the machine faults and reboots immediately. I spotted this bug more than a year ago, but I recently saw that there is already PR/42645 (7 years old), so just fix it. The size has been increased in the meantime, so the limit is unlikely to be reached anyway. commit f41369c5cca23f0462537c8554d2a8eab1511c21 Author: maxv Date: Thu Feb 9 08:38:25 2017 +0000 No, do not just copy code from i386 and expect it to work on amd64. There are several structural differences. At least two issues here: segment registers that could fault in kernel mode with userland TLS, and a non-canonical %eip on iret. Not even tested, but just obvious. By the way, I believe this function is still buggy since we don't call cpu_fsgs_reload while %fs/%gs could have been reloaded. commit bc40dbbe06af72f18c258a269b001b2a083a23d1 Author: maxv Date: Thu Feb 9 08:23:46 2017 +0000 Restore %ds before swapgs. Movs to segment registers are allowed to fault in kernel mode but simply cause a signal to be sent to userland. The thing is, in this case %gs is not restored when entering the trap routine, which means the kernel uses userland's TLS instead of using its own. Which in short makes it easy to escalate privileges.
Currently, this bug is triggered only in one place, which I am about to fix too. commit 7b2a91dead5a596ba4c7935211f92fda3d6f37e6 Author: maxv Date: Wed Feb 8 10:08:26 2017 +0000 Remove gdt_reload_cpu. GDTR takes a VA as base, and in our x86 implementation this VA is per-cpu and does not change; there is therefore no need to remotely reload GDTR. commit 818b2c1cbc8efe4aa2152e765f211fb62dbe0581 Author: maxv Date: Wed Feb 8 09:39:32 2017 +0000 Localify, add a comment and merge some others. commit 6091e3b69a153161fb7be196cd065dc7e4c96f7c Author: maxv Date: Mon Feb 6 16:34:37 2017 +0000 In cpu_mcontext32_validate, allow the registers to have different locations if the LDT is user-set. I am intentionally not allowing this in check_sigcontext32, because I don't think Wine uses it. commit 747f1de187fe4faa207deb11e777b34a2f2c3d7c Author: maxv Date: Mon Feb 6 16:02:17 2017 +0000 Add the USER_LDT sysarch options in netbsd32. We don't translate 'desc', since if we ever implement USER_LDT we will only allow 8-byte-sized entries, which have the same layout on amd64 and i386. commit 3203c4bb9ecc7116ee7df1860cccd00537f9ece9 Author: maxv Date: Sun Feb 5 10:42:21 2017 +0000 Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with amd64, and makes it easier to track down these variables on nxr - 'ldt' and 'gdt' being common keywords. commit 68b4b5a903d408a607baaac7aa83f10e725b34d5 Author: maxv Date: Sun Feb 5 08:58:39 2017 +0000 Remove misleading comment; these macros should not be used if a user LDT is active. commit 461bd691015989dc5f25e6d9a40a3e1fe4bd5cc4 Author: maxv Date: Sun Feb 5 08:52:11 2017 +0000 Remove #if 0 on USER_LDT. commit ea7976adef3456c4debcfc7a1f4981ea6851238e Author: maxv Date: Sun Feb 5 08:42:49 2017 +0000 Missing pmap_ldt_cleanup. commit 84045b32c0d686db46c6044d5bb23c7fab2f5aec Author: maxv Date: Sun Feb 5 08:36:08 2017 +0000 Now that valid_user_selector only checks for LDT selectors, remove it. 
A user may legitimately want to have one register in the GDT, and another in the LDT. Pass 2/2. commit ccb50ffed0a22b44ab0092f7452ec35e135f15db Author: maxv Date: Sun Feb 5 08:19:05 2017 +0000 In cpu_mcontext_validate, treat %cs differently depending on whether a user LDT is set; just check the permission without checking the location (which may change). In valid_user_selector, don't check the length of the LDT. This is racy because pm_ldt_len could be updated by another thread, and useless since the length is already referenced in ldtr (ldt_alloc), which means that any overflow will fault in userland. Also, don't check the permission of the segment pointed to; this too is racy, and we don't care either since the permissions are checked earlier in x86_set_ldt1. Pass 1/2. commit b57c818d68ea89eb03f6af7278290af14438b8f7 Author: maxv Date: Thu Feb 2 19:12:09 2017 +0000 Fix these comments, we probably won't want to keep them up to date. commit 7276bb8b264f9a740e5ad5c804a2017f8fdd8046 Author: maxv Date: Thu Feb 2 19:09:08 2017 +0000 Increase KERNTEXTOFF from 1MB to 2MB on amd64. [1MB; 2MB[ is now handled by UVM, so there is no physical loss. On amd64 we always remap the kernel text with 2MB pages, and because of the 1MB start address we were forced to map [0MB; 2MB[ inside the first large page. The problem is, the lower half is used by UVM to allocate physical pages, and it is possible that some of these could be used by userland. We could end up with userland-controllable data mapped into the kernel text on a privileged page, which is far from being a good idea from a security pov. I am not fixing i386 yet, because the large page size depends on PAE, and we probably don't want to have a text located at 4MB on low-memory systems. (note: I didn't introduce this issue, it was already there when I came in) commit 3806b2394270571c9dcac2f7b7061b619ee0d98f Author: maxv Date: Thu Feb 2 17:37:49 2017 +0000 The first va should depend on the text offset, not the kernel base. 
Use rounddown. Note: this value is still wrong, it should be roundup. But that's another issue that will be fixed in amd64 soon. commit 0bd04880decfb7e835fb91d2c0206382857954f8 Author: maxv Date: Thu Feb 2 08:57:04 2017 +0000 Use __read_mostly on these variables, to reduce the probability of false sharing. commit 2e6c218cd54da250d1cbe17a20f8d1bc2f157bdc Author: maxv Date: Wed Feb 1 17:58:47 2017 +0000 Not sure what we are trying to achieve here, but there are two issues; error can be printed while it is not initialized, and if m_pulldown fails m is freed and reused. Quickly reviewed by christos and martin commit 922cd8ccbd7c7f650a0b38eec7e671c9c4307cf1 Author: maxv Date: Tue Jan 31 17:38:54 2017 +0000 Update the URLs, and add the DC_refills_ flags (from the spec, not present on my cpu). commit 4b2f9c027647ce40862fdbd60460583c07578fcb Author: maxv Date: Tue Jan 31 17:13:36 2017 +0000 Correctly handle the return value of arpresolve, otherwise we either leak memory or use some we already freed. Sent on tech-net, ok christos commit ca025ac8853b2f6384e0982ba56af3def4be02fe Author: maxv Date: Tue Jan 24 18:37:20 2017 +0000 Don't forget to free the mbuf when we decide not to reply to an ARP request. This obviously is a terrible bug, since it allows a remote sender to DoS the system with specially-crafted requests sent in a loop. commit d7831243b0ffff577e531d48db6ce541902730b5 Author: maxv Date: Sun Jan 22 20:17:10 2017 +0000 Use xpmap_pg_nx. Not tested (due to some unrelated panic I'm getting), but obvious enough. commit dea9d4839a2df19c6163850882fd85dbc3300602 Author: maxv Date: Sun Jan 22 20:04:35 2017 +0000 Put pmap_pg_nx into the dummy Xen page. While here, do some KNF and localify a bit. commit aac9d409df7523fef88c205edc6372da166b1b67 Author: maxv Date: Sun Jan 22 19:42:48 2017 +0000 Import xpmap_pg_nx, and put it in the per-cpu recursive slot on amd64. 
commit 997bc263e1b401f3f5fca157fa3eb91b82451c5b Author: maxv Date: Sun Jan 22 19:24:51 2017 +0000 Export xpmap_pg_nx, and put it in the page table pages. It does not change anything, since Xen removes the X bit on these; but it is better for consistency. commit e80d650760fe32ab41920f183d68c7d34a05351c Author: maxv Date: Sat Jan 21 11:07:46 2017 +0000 Add some checks, mostly same as in_arpinput. commit 8c05ef7c0c346aa9dcfd44b6cf92fed36fe08c08 Author: maxv Date: Fri Jan 20 19:21:01 2017 +0000 Make sure the protocol address length equals that of IPv4. Also, make sure the hardware address length equals that of the interface we received the packet on. Otherwise a packet could easily set them both to zero and make the kernel read beyond the allocated mbuf, which is terrible. Note: for the latter we drop the packet instead of replying, since it is malformed. Note: I also added an ugly hack in CARP, since it apparently expects at least six bytes. commit a7aeb25026199dce3b79ba78afb9719f8f53bc99 Author: maxv Date: Fri Jan 20 17:50:52 2017 +0000 Style commit cab5d78441774b467fe2f7fbee676898ef117602 Author: maxv Date: Fri Jan 20 17:45:42 2017 +0000 Reput a nullcheck that was mistakenly removed in rev1.204. ar_hrd is packet-controlled. commit 9dda8d65f690864a32d372f5c13a0ef115661ad1 Author: maxv Date: Fri Jan 6 09:14:36 2017 +0000 Explain how all that mess works, without actually fixing it yet. commit 01024c1e485b0c4784a618cb3c4568e8cab2147b Author: maxv Date: Fri Jan 6 09:04:06 2017 +0000 Rename a few things commit 4a4c95683599613e2a3f96e8b832ed30ea225f4a Author: maxv Date: Fri Jan 6 08:36:56 2017 +0000 Explain the computation commit 0dd7d2d7b0381c675c3cd194ab968e9856c26393 Author: maxv Date: Fri Jan 6 08:32:26 2017 +0000 Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs are entered in reversed order. 
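Note on commit 8c05ef7c (ARP length checks): a minimal sketch of the two comparisons the message describes, assuming the lengths have already been read from the packet header. The function name and parameter names are hypothetical and do not reproduce the in_arpinput code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An IPv4 address is 4 bytes long. */
#define IPV4_ADDR_LEN 4

static_assert(IPV4_ADDR_LEN == sizeof(uint32_t), "IPv4 addresses are 32 bits");

/*
 * Return true only if the packet-controlled lengths match what the
 * receiver expects: IPv4-sized protocol addresses, and hardware
 * addresses of the same size as the receiving interface's.
 */
static inline bool
arp_lengths_ok(uint8_t ar_hln, uint8_t ar_pln, uint8_t if_addrlen)
{
	return ar_pln == IPV4_ADDR_LEN && ar_hln == if_addrlen;
}
```

A packet that sets both lengths to zero, the case called out in the message, fails this check for any interface with a non-zero address length.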
commit 09a5d91ebb6b8ae4685fd7bade141da60e988fff Author: maxv Date: Tue Dec 20 14:09:09 2016 +0000 kernel modules on xen commit f637e78d569ece91b9d902ffb493016ffdab8be5 Author: maxv Date: Tue Dec 20 14:03:15 2016 +0000 When the i386 port was designed, the bootstrap code needed little physical memory, and taking it below the kernel image was fine: we had 160 free pages, and never allocated more than 20. With amd64 however, we create a direct map, and for this map we need a number of page table pages that is mostly proportionate to the number of physical addresses available, which implies that these 160 free pages may not be enough. In particular, if the CPU does not support 1GB superpages, each 1GB chunk of physical memory needs a 4k page in the direct map, which means that if a machine has 160GB of ram, the bootstrap code allocates more than 160 pages, thereby overwriting the I/O mem area. If we push a little further, if a machine has 512GB of ram, we allocate ~525 pages, and start overwriting the kernel text, causing the system to go crazy at boot time. Fix this moving the physical allocation area from below the kernel to above it. avail_start is now beyond the kernel, and lowmem_rsvd indicates the reserved low-memory pages. The area [lowmem_rsvd; IOM_BEGIN[ is internalized into UVM, so there is no pa loss. The only limit now is the pa of LAPIC, which is located at ~4GB of memory, so it is perfectly fine. This change theoretically adds va support for 512GB of ram; and it is a prerequisite if we want to support more memory anyway. commit a1580a97988dc259e8be4c2e10e119eaff28ee6f Author: maxv Date: Tue Dec 20 12:48:30 2016 +0000 Depend on KERNTEXTOFF - KERNBASE, not IOM_END, both are equal but the text address may change in the future. commit 0ce2527fd9ece58936556e91b7bd284b51a27719 Author: maxv Date: Sat Dec 17 15:23:08 2016 +0000 Remove a wrong comment - the FPU save size should never be percpu -, and be more explicit about Xen. 
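Note on commit f637e78d (bootstrap allocations above the kernel): the message states that, without 1GB superpages, each 1GB chunk of physical memory needs one 4k page in the direct map. The sketch below computes only that per-GB term; the function name is an assumption. The commit's own figures (more than 160 pages, ~525 pages) are higher than this term alone, presumably because the bootstrap code allocates other pages as well, which is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Number of 4k page-table pages needed by the direct map when each
 * 1GB chunk of physical memory costs one page (no 1GB superpages).
 */
static inline uint64_t
dmap_pages_per_gb_term(uint64_t ram_bytes)
{
	assert(ram_bytes <= UINT64_MAX - (GiB - 1));	/* avoid overflow when rounding up */
	return (ram_bytes + GiB - 1) / GiB;
}
```

For 160GB of RAM this term alone reaches the 160 free pages mentioned in the message, and for 512GB it is 512, well past that budget.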
commit 3cc8721e0906bd30924130350b30c7413745f31a Author: maxv Date: Sat Dec 17 14:49:26 2016 +0000 Add MODULAR in Xen kernels. commit e0ec582a902455efc9301972ad5a8177935b22b7 Author: maxv Date: Sat Dec 17 14:27:53 2016 +0000 Put a limit in the percpu segment, so we can detect overflows on %fs. commit 62860d9036aa5b255b658c7785b7abc03fdd3feb Author: maxv Date: Sat Dec 17 13:49:05 2016 +0000 Fix the name of the labels. I think I got confused by jne, so while here replace it by jnz, which is more explicit. commit e086639646646c7fe898a7389900bde0bfe22674 Author: maxv Date: Sat Dec 17 13:43:33 2016 +0000 Use pmap_bootstrap_valloc and simplify. By the way, I think the cache stuff is wrong, since the pte is not necessarily aligned to 64 bytes, so nothing guarantees there is no false sharing. commit 3dae435511ca7544cd6e1d900743b1c14c185b65 Author: maxv Date: Fri Dec 16 20:16:50 2016 +0000 This can actually be enabled in Xen; my rev1.235 fixed the issue. Before that kern_end was pointing to DUMMY PAGE, which was already kentered earlier in xen_locore, causing pmap to panic. This change adds support for kernel modules in Xen. commit 4f90bdf3e776e9cf54dc5c223591aabad9cb705e Author: maxv Date: Fri Dec 16 19:52:22 2016 +0000 The way the xen dummy page is taken care of makes absolutely no sense at all, with magic offsets here and there in different layers of the system. It is just blind luck that everything has always worked as expected so far. Due to this wrong design we have a problem now: we allocate one physical page for lapic, and it happens to overlap with the dummy page, which causes the system to crash. Fix this by keeping the dummy va directly in a variable instead of magic offsets. The asm locore now increments the first pa to hide the dummy page to machdep and pmap. 
commit 1becd0cd3ca391a61c0b1d043ef5aa93dc4d8790 Author: maxv Date: Sun Dec 11 08:31:53 2016 +0000 Kenter local_apic_va to a fake physical page, because our x86 implementation expects this va to be valid even if no lapic is present; which probably is a bug in itself, but let's just reproduce the old behavior and rehide that bug. commit d52be83d6965bb50ae3c9990f5e8a14f5c352bb0 Author: maxv Date: Fri Dec 9 17:57:24 2016 +0000 On amd64 we try to guarantee that VA = PA + KERNBASE in the bootstrap memory. But we have a problem with the ISA I/O MEM, because its va is located above the kernel and its pa below it, so it does not respect the rule. To compensate for that we make the map look like the ISA stuff is above the kernel by applying an offset on the pa. The issue with this design is that we systematically lose 96 pages of physical memory. Fix this by applying the offset on the va instead. Now these 96 pages are internalized into uvm, and the rule is respected until kern_end. commit b629c356e9f9cd8c3a2d27d684fb28e8766c106d Author: maxv Date: Tue Dec 6 15:09:04 2016 +0000 Memory leak, found by Mootja commit e560f1df5e714d2c5ba90364a9a4c11766789dd4 Author: maxv Date: Tue Dec 6 15:05:07 2016 +0000 Use __kernel_end instead. Does not change anything, but will be meaningful soon. commit a979f7f9dc965cf0ae3c516090cea1a3179fb4e0 Author: maxv Date: Sun Dec 4 08:21:08 2016 +0000 KNF and explain a few things commit 74605a3a0cf1828c4234e3b8471366b0b0a2233d Author: maxv Date: Sat Dec 3 09:20:55 2016 +0000 Fix a wrong flag and KNF. commit 437498b762a067c5d3eb4351d168d87bf18aa2da Author: maxv Date: Fri Nov 25 14:26:53 2016 +0000 Remove this comment and allow the beginning of .data to be mapped with large pages. The issue is fixed, the lapic va is dynamically allocated now. commit 3e6f2a9524435a6460b27c58511af8a4e5667f9c Author: maxv Date: Fri Nov 25 14:12:55 2016 +0000 Move the virtual address of the LAPIC page out of the data segment on amd64 and i386. 
The old design was error-prone, and it didn't allow us to map the data segment with large pages. Now, the VA is allocated dynamically in the pmap bootstrap code, and entered manually later. We go from using &local_apic to using *local_apic_va, and we therefore need one more level of indirection in the asm code. Discussed on tech-kern. commit 371c900d22520ddcbdfe058dc2432e6322b47a00 Author: maxv Date: Fri Nov 25 12:20:03 2016 +0000 KNF a little commit 219bef40000555358289bde3d5fa56acd0604dfa Author: maxv Date: Fri Nov 25 11:57:36 2016 +0000 Initialize the module map limits in amd64, not x86. For the record: normally we could enable this code on Xen, since the bootstrap layout is globally the same. But there appears to be an issue in xen_locore, since any kenter in the area after kern_end triggers a KASSERT because the va is already busy. commit a8ce5a92d86c230dcdf98e28545e45728de948ef Author: maxv Date: Sun Nov 20 09:28:43 2016 +0000 Memory leak, found by Mootja. commit db93f0fa9adee9fc6066d70561b27652f161a3f6 Author: maxv Date: Sat Nov 19 09:22:03 2016 +0000 Put a one-page redzone between userland and the PTE space on amd64 and i386. The PTE space is a critical region that maps the page tree, and bugs have been found in both amd64 and i386 where the kernel would wrongly overflow userland data on this area. This kind of bug is terrible, since it allows userland to overwrite some entries of the page tree, which makes it easy to patch the kernel text and get ring0 privileges. commit 9f2f4a61a8bf64bf166cad4c4a27cc44a9c17934 Author: maxv Date: Thu Nov 17 16:32:06 2016 +0000 Unmap tmpva once we are done using it, not to pollute the page tree. commit b1cb57c50a3d123ca546e6cbf64c28c6ab6cfd16 Author: maxv Date: Thu Nov 17 16:26:07 2016 +0000 Remap the pages with G until kern_end, and not just the preloaded modules. This way the bootstrap tables, proc0's stack and the I/O mem area don't get flushed each time userland needs a TLB shootdown. 
commit 49a457e46212e5d87dd202b5304660aa7966f677 Author: maxv Date: Tue Nov 15 17:01:12 2016 +0000 Mmh, apparently I didn't properly test my previous change since it does not compile anymore commit 29a0e3168a0b58022699a19623e6a62dcab49b12 Author: maxv Date: Tue Nov 15 15:37:20 2016 +0000 Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping from KERNTEXTOFF. For symmetry with the normal amd64, does not change anything. commit 9ebec0815a31ce439c6f35f77cc67afbf3e6bb62 Author: maxv Date: Tue Nov 15 15:26:59 2016 +0000 I actually came across the solution to this issue in the Intel SDM for a totally unrelated reason a few weeks ago. The reason we need a particular module_map on amd64 is because gcc makes us use RIP-relative addressing. The offset field of the associated opcodes is a 32bit signed displacement, which means we can access only up to 2GB around the current instruction. And given that kernel_map is too far away from the kernel .text, it is not RIP-addressable. Hence the module_map embedded into the bootstrap memory, which is right above the kernel image. commit e267906dc127133f4c573f8e457046211bfe2cf6 Author: maxv Date: Tue Nov 15 15:00:55 2016 +0000 Initialize kern_end in amd64 instead of x86. commit fc951f62391b4b2c53a77ddf59e73826dc0aae98 Author: maxv Date: Sun Nov 13 12:58:40 2016 +0000 Explain why this is the right value, otherwise someone (like me) could be tempted to increase it. The invlpg part is from rmind, the statistical from me. commit 6e1dcae1b198efe3210fc14801255689f1652c38 Author: maxv Date: Sun Nov 13 12:38:14 2016 +0000 The reason we are not using INTRENTRY here is because this interrupt goes through a task gate that points to a TSS entry in the GDT, and therefore the GPRs are saved in the TSS by the hardware itself. Explain it, otherwise it easily looks buggy. 
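Note on commit 9ebec081 (why amd64 needs module_map): RIP-relative operands use a signed 32-bit displacement, measured from the address of the next instruction. The sketch below checks whether a target is reachable under that constraint. The function name is hypothetical and the addresses in the usage note are illustrative, not the real kernel layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(int32_t) == 4, "disp32 is a 32-bit field");

/*
 * Return true if target can be encoded as next_rip + disp32, where
 * disp32 is a signed 32-bit displacement in [-2^31, 2^31 - 1].
 * The comparison is done on unsigned distances to avoid overflow.
 */
static inline bool
rip_rel_reachable(uint64_t next_rip, uint64_t target)
{
	if (target >= next_rip)
		return target - next_rip <= (uint64_t)INT32_MAX;
	return next_rip - target <= (uint64_t)INT32_MAX + 1;
}
```

For example, with code at the illustrative address 0xffffffff80200000, a target 1GB above it is reachable, whereas a target at 0xffff800000000000 is not; this is the situation the message describes for kernel_map versus memory placed right above the kernel image.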
commit 8d3b8d02890ad25641b653957f9b62238fd75951 Author: maxv Date: Fri Nov 11 12:06:31 2016 +0000 Remove useless values, and explain where some others come from commit 3dbc785add8c11d67ccb022380d85066c2c9b09d Author: maxv Date: Fri Nov 11 11:34:51 2016 +0000 Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with pmap and is just a C version of what amd64 and i386 do in asm. commit a3ef7b77c279b1800b30d11754a2051c71d669e4 Author: maxv Date: Fri Nov 11 11:31:26 2016 +0000 Mmh, I mistakenly removed the lapic page (which is part of another diff), put it back in. commit 7b7768b09038678832aa96e9a84a264ba2f4b023 Author: maxv Date: Fri Nov 11 11:12:42 2016 +0000 Start simplifying the Xen locore: rename and reorder several things, remove awful debug messages, use unsigned counters, fix typos and KNF. commit 18509a9fb38c106cc2ebb99d73a5976731c9994c Author: maxv Date: Fri Nov 11 11:00:38 2016 +0000 KNF and simplify Xen, and reduce the diff with amd64 a little commit ae289667f294b6690c7e3a5c4ade86a984876df6 Author: maxv Date: Fri Nov 11 10:40:00 2016 +0000 KNF and simplify Xen commit 7cb0b46bdd9909822594a9c5180eb031a60f873a Author: maxv Date: Fri Nov 11 09:47:18 2016 +0000 Update the pmap only once commit a6f85d963e5a288434499878e612c540c5544866 Author: maxv Date: Tue Nov 1 12:16:10 2016 +0000 Map the PTE space as non-executable on PAE. The same is already done on amd64. commit a5230e639daa76a7e47681fbf7034058812c7a51 Author: maxv Date: Tue Nov 1 12:00:21 2016 +0000 Map the remaining pages as non-executable. Only text should have X. commit 23da72060fa526d8778aeb2a8ea3ce26d3d30acb Author: maxv Date: Mon Oct 31 15:27:24 2016 +0000 The mbuf is freed by the protocol even on error, so always NULL the pointer instead of double-freeing it. Indirectly pointed out by Mootja. commit 93f862d33f96b82b36b75b4a7af7faa59f086e83 Author: maxv Date: Mon Oct 31 15:08:45 2016 +0000 Memory leak, found by Mootja. By the way, we probably shouldn't be returning -1 here. 
commit e1951bffbbce47539a30cd1a998dcda846432db3 Author: maxv Date: Mon Oct 31 15:05:05 2016 +0000 Memory leak, found by Mootja. It is easily triggerable from userland. commit 567edfb630e2acefc4bdd6e1887e31e97605203b Author: maxv Date: Thu Oct 20 16:05:04 2016 +0000 There is a huge fpu synchronization issue here. When the remote CPUs receive the ACPI sleep IPI, they do not save the fpu state of the lwp they are executing. The problem is, when waking up they reinitialize the registers of their local fpu and go back to their lwp directly. Therefore, if an lwp is interrupted while storing data in an fpu register, that data gets overwritten, which basically means the lwp is likely to go crazy when resuming execution. Fix this by simply saving the fpu state correctly. This way when going to sleep the state is stored in the lwp's pcb and CR0_TS is set, so the next time the lwp wants to use the fpu we'll get a dna, and the state will be restored as expected. While here, don't forget to reenable interrupts (and the spl) if an error occurs. commit 73491a0f9b1fd61011001e130f7d331e96ad0197 Author: maxv Date: Thu Oct 20 14:06:18 2016 +0000 Reload the MSRs on the original cpu on i386 - looks like I forgot this part in my rev1.41. Technically it does not change anything, since the only MSR is NOX and it is already reloaded in the trampoline. commit 0cd28157f8f3e8c9c9a55e703cb7a4d3e56c8e9a Author: maxv Date: Sun Oct 16 10:57:58 2016 +0000 Remove unused (and buggy) function. Not even compile-tested, but I've been told to go ahead anyway. commit 1f0bc3d53110a152cc6cf5910ea12aad3a02078a Author: maxv Date: Sun Oct 16 10:51:31 2016 +0000 Remove lapic_tpr on amd64 and i386, unused. Now, we have only one pointer to the LAPIC page, and each register access is done with relative offsets. commit 06f885a0dd8dbf135ff96d0625a1433526437785 Author: maxv Date: Sun Oct 16 10:38:49 2016 +0000 Use offsets to access the TPR, and not lapic_tpr. 
For the record: I'm not sure exactly why it was originally done this way, and the cvs logs are not quite enlightening. It might have been a way to speed up the asm access, but it would sound a little bizarre since there are many more legitimate reasons to do it on the EOI register instead. commit 5eadfcb1cd1a01b2904e7472cb919a4d9fa52526 Author: maxv Date: Sun Oct 16 10:24:58 2016 +0000 Use the generic i82489_writereg instead of lapic_tpr, for consistency. commit 255228fefbd947ae0f73686d5649d528d033ded5 Author: maxv Date: Sat Oct 15 09:50:27 2016 +0000 Instead of setting the TPR to the value that was in the data segment, set zero directly. On amd64, the data version of lapic_tpr is not explicitly initialized. commit fbb5bf9e6d382afbef6ac92bdfc761b350934f5d Author: maxv Date: Sat Oct 15 08:37:55 2016 +0000 There are several leaks in here, just fix one that should have been fixed in rev1.21 commit 0e65c5514e4fdbd51b0e9af68e1c5f1bf9d280cb Author: maxv Date: Sat Oct 15 08:30:42 2016 +0000 Memory leak, found by mootja; not tested, but obvious enough commit 27c5a67608bcfd22c81ba84a1119be531edcb668 Author: maxv Date: Tue Oct 11 13:04:57 2016 +0000 Memory leak, found by mootja; not tested, but obvious enough. By the way, I guess we should be handling the return value of OF_getprop. commit b62a1868c9cbb160160e37e16f2a7175a37a4935 Author: maxv Date: Tue Oct 11 12:53:56 2016 +0000 There are two memory leaks here, found by mootja; just add some XXXs. commit b58cbc34ddb5ccf33f15603d605f8eebf6360e54 Author: maxv Date: Sat Oct 8 15:48:07 2016 +0000 Uninitialized var, found by mootja; not tested, but obvious enough commit 3f70451742e4f4e44e4b18807082c3ff4db8102c Author: maxv Date: Thu Sep 29 17:01:43 2016 +0000 Remove outdated comments, typos, rename and reorder a few things. commit 888910932adb50529c153fd105a3b99cff264106 Author: maxv Date: Sun Sep 25 12:59:19 2016 +0000 Fix outdated comment, and #ifdef. 
commit 648363f5f18121e20e41cb981ee71ae0f10e0f4f Author: maxv Date: Sun Sep 25 12:53:24 2016 +0000 Revert my previous change. It is too severe: a fault might be happening in the kernel page if the map is pageable - implying it is not pmap_kernel. I thought it wouldn't be the case. commit 53b1ff80b59d9d04e0f9351a7a9e1051362203fe Author: maxv Date: Sat Sep 17 12:09:22 2016 +0000 This is just a temporary stack that holds fake arguments, and that gets remapped as RW in sys_execve. Still, in this small window, it does not need to be executable. commit 7321b9fe96c755cadb1f9a30f3bc2544985d3168 Author: maxv Date: Sat Sep 17 12:00:34 2016 +0000 Use VM_MAXUSER_ADDRESS for proc0, not VM_MAX_ADDRESS. It normally does not change anything, since kernel processes use the shared kernel map instead of the one they are given here. For consistency though, it is better to make sure UVM will not be tempted to access machine-dependent reserved areas (e.g., the PTE space on x86). commit 548506e195bfdd29c6f140b9e645e1425d1c0306 Author: maxv Date: Fri Sep 16 12:28:41 2016 +0000 x86_copyargs takes as third argument a size, but still copies two chunks of 16 and 24 bytes, without checking the userland<->kernel limit accordingly. Fix it by just checking the maximum size directly. It means that even if 16 bytes are copied, the kernel now makes sure 40 bytes are in userland. We could make it more fine-grained, but it would probably unoptimize the function, and we don't care enough. commit 9e723f27e5f47782caa2fe657a98f7882dcf5aed Author: maxv Date: Fri Sep 16 11:48:10 2016 +0000 Put two KASSERTs, to make sure the fault is happening in the correct half of the vm space when using special copy functions. It can detect bugs where the kernel would fault when copying a kernel buffer which it wrongly believes comes from userland. commit 439023955e2287defdf9902d779b1fb9dd4b0229 Author: maxv Date: Sat Sep 3 08:47:38 2016 +0000 Fix the mmap call, KNF, and make the output more readable.
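The x86_copyargs fix above (548506e1) checks the maximum copy size (16 + 24 = 40 bytes) against the userland limit, regardless of how many bytes are actually requested. The following C sketch shows one way such a check could look; it is not the assembly from the tree. `copyargs_range_ok` and `FAKE_VM_MAXUSER_ADDRESS` are hypothetical names, and the limit value is illustrative only, since the real VM_MAXUSER_ADDRESS is machine-dependent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limit only; the real VM_MAXUSER_ADDRESS is machine-dependent. */
#define FAKE_VM_MAXUSER_ADDRESS UINT64_C(0x00007f8000000000)

/* The two chunks mentioned in the commit message: 16 + 24 bytes. */
#define COPYARGS_MAX_SIZE (16 + 24)

/*
 * Return true if [uaddr, uaddr + COPYARGS_MAX_SIZE) lies entirely below
 * the user limit. The comparison is arranged so that no addition can
 * wrap around for very large uaddr values.
 */
bool
copyargs_range_ok(uint64_t uaddr)
{
	if (uaddr > FAKE_VM_MAXUSER_ADDRESS)
		return false;
	return FAKE_VM_MAXUSER_ADDRESS - uaddr >= COPYARGS_MAX_SIZE;
}
```

The trade-off the commit message mentions is visible here: an address 16 bytes below the limit is rejected even if only the 16-byte chunk would be copied, which is coarser than strictly necessary but keeps the check to a single comparison.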
commit ab8ecc02212dfe2c25e35a089cad06a9d908084d Author: maxv Date: Fri Sep 2 08:52:12 2016 +0000 Give the structure sizes. commit ae5c6f34b2c0cfeeb2d8b1e9081aa17becd96abe Author: maxv Date: Fri Sep 2 08:28:06 2016 +0000 KNF, and give the structure sizes. commit 68b5f6e4edbcc1df6a66ef243bcb1aa492255af0 Author: maxv Date: Fri Sep 2 07:51:05 2016 +0000 Fix argument (does not compile anyway). commit 9981bb98d789bfe49d560c1146c1e0a7dbf6e8a1 Author: maxv Date: Sat Aug 27 16:17:16 2016 +0000 Don't protect the second page, since it is not part of the IDT. commit 550f526c32393518f31d7a45f1c476db7c492e6e Author: maxv Date: Sat Aug 27 16:07:26 2016 +0000 Map the boot IDT, GDT and LDT in three different pages on x86. It is much better this way, and it reduces the diff between x86 and Xen. Also, zero them properly, otherwise we might end up with garbage in several slots. commit d25ae6e23d1d9cdb941f933efe3a4a41eced64d7 Author: maxv Date: Sat Aug 27 14:19:47 2016 +0000 Remove idt_init. commit 17065cb1f21ad866bf0bbde55ccb21e1aeedf00a Author: maxv Date: Sat Aug 27 14:12:58 2016 +0000 Rename this value, and use it. commit 8ee5a8806706ef44261a122d43bebd4b2b9452dd Author: maxv Date: Sun Aug 21 12:02:38 2016 +0000 My changes. commit 21560f09d73b5f69f72921e8f04a7f2ec85a8900 Author: maxv Date: Sun Aug 21 11:48:59 2016 +0000 KNF, and typo. commit 977fa4825b5c6ca88789c691ee8e36a2f367fe24 Author: maxv Date: Sun Aug 21 10:42:33 2016 +0000 Simplify gdt_grow, and make sure we don't kenter more than has been virtually allocated. commit dee9094fbf58cedc0eb256bed96ea22f6cdf253b Author: maxv Date: Sun Aug 21 10:20:21 2016 +0000 Use KASSERT, and panic by default instead of allowing the area to overflow. commit df221eb102bb588cd2a4da60128a40d85934b639 Author: maxv Date: Sun Aug 21 10:07:15 2016 +0000 Explain a little what we are doing. Also, make sure gdt_init_cpu is called on the currently running CPU. 
Theoretically, we could put the same KASSERT in gdt_reload_cpu, but the associated IPI is never sent, which is another issue. commit 526c38e4cce2c848dba51a7b4bafb23d2e165f67 Author: maxv Date: Sun Aug 21 09:53:25 2016 +0000 Simplify. commit 8a003210b441d05a44d7387b53d49b4fe2a126a6 Author: maxv Date: Sat Aug 20 18:31:06 2016 +0000 Make this area compile, even if we don't support USER_LDT on amd64. commit 880a851fdd8f4e281d9c2055cd395a88e16a3006 Author: maxv Date: Sat Aug 20 18:04:04 2016 +0000 The GDT needs to be grown on each CPU, and not just gdtstore (cpu0). Otherwise, if the caller gets switched to another CPU, the kernel will end up accessing unallocated memory. Currently, it never happens. The same is done in i386. commit 61038f3aa716f28861d214f0af8a29ac3de5f3d0 Author: maxv Date: Sat Aug 20 16:05:48 2016 +0000 Localify. commit 17316c674ea5c56606aca7dbabbf7c87eef347b7 Author: maxv Date: Fri Aug 19 19:04:57 2016 +0000 Unused. commit 045a6ec04d4516f76ef9aa23cd18a494f2842881 Author: maxv Date: Fri Aug 19 18:53:29 2016 +0000 KNF so NXR likes it, and some typos commit 440175dbb406540f1cb90377cc30190ec2655f45 Author: maxv Date: Fri Aug 19 18:24:57 2016 +0000 Switch the XXXCDC to panics. Normally it should never be triggered, since the kernel space is above the PTE space, and the user space is below it. Any attempt to write or remove this area should be blocked by UVM earlier. commit b630fbe1d5f893c1e92158f8760ee0d8f7f68adc Author: maxv Date: Fri Aug 19 18:08:50 2016 +0000 Remove the last references to KMEMSTATS. commit bed2280835ea9579825a5862e39db3405b076eae Author: maxv Date: Fri Aug 19 18:04:39 2016 +0000 Rename new_pve2 -> new_sparepve, makes it less bizarre. commit 99cd09fc0d8311d68275818728b8b14164adae31 Author: maxv Date: Thu Aug 18 13:00:54 2016 +0000 KNF and simplify. commit 13675a90ef36e6c31ddf1e484782a1b1bea5b4e0 Author: maxv Date: Thu Aug 18 12:36:35 2016 +0000 Simplify. 
commit 26be7c8dd6b883a171085bba38b58fef29599895 Author: maxv Date: Mon Aug 15 09:30:22 2016 +0000 Use the exact same argument for kmem_alloc and kmem_free; from brainy commit 7890a4d2d7572afec2ca41eac40e1371202067d5 Author: maxv Date: Mon Aug 15 09:20:11 2016 +0000 Uninitialized var, found by brainy; not tested, but obvious enough commit 11476148a545ede4b45c5e7ddd253f8f5bf245bf Author: maxv Date: Mon Aug 15 09:14:12 2016 +0000 Memory leak, found by brainy; not tested, but obvious enough commit eedf4fb2225b3081d8c0cb2427b4fb8866880c4f Author: maxv Date: Mon Aug 15 09:06:39 2016 +0000 Two uninitialized vars, found by brainy. The former is similar to the one I fixed in ia64/stand/efi/libefi/devicename.c. I don't know how to fix the latter, so just add a comment. I will probably file a PR for this one. commit 6734a0dffaf58cd481f016044e9726fcde4965ce Author: maxv Date: Mon Aug 15 09:00:52 2016 +0000 Uninitialized var, found by brainy. I haven't tested this change, and it may not be the perfect way to fix it. But it seems correct enough. commit d3565c4956cb3188ef62322f7f1092a64d5dc697 Author: maxv Date: Mon Aug 15 08:52:33 2016 +0000 This thing is completely buggy. There is a use-after-free and NULL pointer dereference. Just fix the uaf, and add a comment. Not tested, but obvious enough; found by brainy. commit 3d8a2c88ea62e52ef56f192bba5ecaf007885661 Author: maxv Date: Mon Aug 15 08:43:19 2016 +0000 Return zero instead of error, otherwise it looks like it is supposed to return an error; found by brainy. commit 1455d8c0813b4c6a15d6b6fd1ac6c853befa8c60 Author: maxv Date: Mon Aug 15 08:40:23 2016 +0000 Uninitialized var, found by brainy. FreeBSD fixed it this way four years ago. I haven't tested this change, but it is rather obvious, as the FreeBSD commit indicates, that sc->ndis_io_rid should be used instead. 
commit d2c5d4a712897c54fc7d8a79b5f7b9d7741a59ac Author: maxv Date: Mon Aug 15 08:24:05 2016 +0000 Uninitialized var, found by brainy; not tested, but obvious enough commit e79080f8c17497948459d6ce102bddff9fbbe1ef Author: maxv Date: Mon Aug 15 08:20:11 2016 +0000 Curious typo, found by mootja commit 1aaad973d7cbcdeb5d63bfe8ff8cd4cd4169bfe1 Author: maxv Date: Mon Aug 15 08:17:35 2016 +0000 Uninitialized vars, found by brainy commit 4042a74f25f33b59ac15b1423d094177e9205a12 Author: maxv Date: Mon Aug 15 08:12:32 2016 +0000 Uninitialized var, found by brainy commit 03dd568e6f1f80eb21fb147eab9a847008935e3a Author: maxv Date: Thu Aug 11 15:45:39 2016 +0000 Use absolute addressing mode, just like the rest. commit 97952075cb2a3a5e611a189cc031e5921520e4a3 Author: maxv Date: Thu Aug 11 15:35:10 2016 +0000 Make the I/O area non-executable on Xen. commit ad5173f659e8c7594deb248a63ff8337fbb9f8df Author: maxv Date: Thu Aug 11 15:03:23 2016 +0000 This should be VM_MIN_KERNEL_ADDRESS, not KERNBASE. commit e638ea4314ce39cdbc954dbf21085ff205c58321 Author: maxv Date: Thu Aug 11 14:58:29 2016 +0000 Reduce the diff, and typo. commit 779c94105456b476a234d7b04463e629e3074f9c Author: maxv Date: Sun Aug 7 10:07:58 2016 +0000 KNF a little. commit d442ae61d1ddf8c1d01d6d32a5c0084beffa1365 Author: maxv Date: Sun Aug 7 09:55:18 2016 +0000 Explicitly return syscall-specific error codes, instead of the ones given by range_test. This fixes msync, mlock and munlock, which all return EINVAL instead of ENOMEM if the address is not in the va space. It should also fix the recent ATF failures. commit db25b878be02969f37b54f5ebbb2ac220625b529 Author: maxv Date: Sun Aug 7 09:04:55 2016 +0000 Explain a little. commit e9652cc8f54db6dcd4c193f4e40483cb6584d55a Author: maxv Date: Sat Aug 6 15:13:13 2016 +0000 The way the kernel tries to prevent a userland process from allocating page zero is hugely flawed. 
It is easy to demonstrate that one can trick UVM into choosing a NULL hint after the user_va0_disable check from uvm_map. Such a bypass allows kernel NULL pointer dereferences to be exploitable on architectures with a shared userland<->kernel VA, like amd64. Fix this by increasing the limit of the vm space made available for userland processes. This way, UVM will never choose a NULL hint, since it would be outside of the vm space. The user_va0_disable sysctl still controls this feature. commit 0a5021b0b53c08657631480fee108c5c833635f9 Author: maxv Date: Sat Aug 6 14:54:25 2016 +0000 Use the stack to save %edx. commit 727188e396dbb60499c1b5901cd7befd33acc5ad Author: maxv Date: Wed Aug 3 11:51:18 2016 +0000 Map the recursive slot and page table pages as non-executable on Xen. Same as normal x86. commit 93de96c5faf9f71202462f3d7a6307b3d3883992 Author: maxv Date: Tue Aug 2 14:21:53 2016 +0000 Map the kernel text, rodata and data+bss independently on Xen, with respectively RX, R and RW. commit 2894fe84981d27dd57a0771e77781ec2210ed49e Author: maxv Date: Tue Aug 2 14:03:34 2016 +0000 Align the segments properly, and split text+rodata in two separate segments on Xen. commit 9871a75b569a33c7de46abc12405c1a6022212c5 Author: maxv Date: Tue Aug 2 13:29:35 2016 +0000 Use PG_RO instead of a magic zero. commit a62bdd106a70bc486316e0ebe589737d3ae5b60f Author: maxv Date: Tue Aug 2 13:25:56 2016 +0000 KNF, and use PAGE_SIZE instead of NBPG. commit da1722635a231b77bfb65bc85956f2bdff46e8ed Author: maxv Date: Mon Aug 1 16:07:39 2016 +0000 This panic is wrong. There could be two consecutive clusters below avail_start. commit 8b8456bbc194207f806b5a22264b7f395b7ad6a7 Author: maxv Date: Mon Aug 1 15:41:05 2016 +0000 Don't fail if a module does not have a data or rodata section. Small modules don't have data. commit 79265a663464e1436cff14706b7e79d6ba6c2cfb Author: maxv Date: Wed Jul 27 16:45:00 2016 +0000 Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument.
Otherwise, if UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and UVM waits for the page to fault before kentering it. When kentering it, it will use the UVM_PROT_ flag that was passed to uvm_map; which means that it will kenter it as RWX. With this change, the number of RWX pages in the amd64 kernel reaches strictly zero. commit 322839624ac647707e6a9bfaa93bbdac59a54559 Author: maxv Date: Wed Jul 27 13:04:28 2016 +0000 Call cpu_init_msrs on i386 when waking up. Currently it does not change anything, since MSR_EFER is already enabled earlier. But if we add new MSRs in the future, we will want them when waking up as well. commit 44703eefd441b8b35acf1a60622bfe9e9ed59e6d Author: maxv Date: Wed Jul 27 12:08:46 2016 +0000 Re-enable large pages on the data segment, but don't map the first page, and add a comment to explain why. We will have to move the LAPIC VA. The large page support is technically the same as before my last commit, since in practice, the first page of .data is never mapped with large pages. commit b794c1fa917d702958c71d7fe9babd7184880afe Author: maxv Date: Mon Jul 25 16:03:38 2016 +0000 Remove lapic_id, lapic_ppr and lapic_isr. We need to be careful though: the offset of lapic_tpr must not change, and the whole area must be exactly one page. commit 612a5dc79d660d80d66905fb2b98664ca923d932 Author: maxv Date: Mon Jul 25 15:29:06 2016 +0000 Unused. commit 3b093aa0d423f61f768edb7bfb488042e4595213 Author: maxv Date: Mon Jul 25 15:18:41 2016 +0000 This needs to be page-aligned anyway. commit db0ef349a8370be7e32e23a811b53b73351db4cc Author: maxv Date: Mon Jul 25 12:11:40 2016 +0000 The L1 entry of the first page of the data segment is overwritten for the LAPIC page, and set as RWX+PG_N. The LAPIC pa is fixed, and its va resides in the data segment. Because of this error-prone design, the kernel image map is not linear, and I first thought it was a bug (as I vaguely said in PR/51148). 
Using large pages for the data segment is therefore wrong, since the first page does not actually belong to the data segment (even if its va is in the range). This bug is not triggered currently, since local_apic is not large-page-aligned. We will certainly have to allocate a va dynamically instead of using the first page of data; but for now, disable large pages on the data segment, and map the LAPIC as RW. This is the last x86-specific RWX page. commit 1bb4a89625f38e763ad8ad59e9a10c306999b11a Author: maxv Date: Sun Jul 24 14:09:22 2016 +0000 The MSR EFER state is not saved and restored when sleeping on i386. On PAE, the CPU crashes right after waking up, since it needs to access NOX-ed pages, which are to be enabled in an MSR. Fix this by properly saving and restoring the EFER MSR. It's a little tricky since the wakeup code uses %edx, but rdmsr overwrites it. We just save it in %esi. Now, the CPU sleeps properly on PAE kernels. commit be2d2b15749b4cba15bd0ef4069624ad8651e433 Author: maxv Date: Sun Jul 24 13:04:58 2016 +0000 KNF, and reduce the diff between amd64 and i386. commit 3602b5b61bdfc683905b9c30079a3b3bc6794f83 Author: maxv Date: Fri Jul 22 14:08:33 2016 +0000 Remove pmap_prealloc_lowmem_ptps on amd64. This function creates levels in the page tree so that the first 2MB of virtual memory can be kentered in L1. Strictly speaking, the kernel should never kenter a virtual page below VM_MIN_KERNEL_ADDRESS, because then it wouldn't be available in userland. It used to need the first 2MB in order to map the CPU trampoline and the initial VAs used by the bootstrap code. Now, the CPU trampoline VA is allocated with uvm_km_alloc and the VAs used by the bootstrap code are allocated with pmap_bootstrap_valloc, and in either case the resulting VA is above VM_MIN_KERNEL_ADDRESS. The low levels in the page tree are therefore unused. 
By removing this function, we are making sure no one will be tempted to map an area below VM_MIN_KERNEL_ADDRESS in kernel mode, and particularly, we are making sure NULL cannot be kentered. In short, there is no way to map NULL in kernel mode anymore. commit 585212dfd328de21b5243e5573f4d2a91e26cf3a Author: maxv Date: Fri Jul 22 13:01:43 2016 +0000 Simplify pmap_alloc_level. It is designed to work only with normal_pdes and PTP_LEVELS, so don't pass them as argument. While here, explain what we are doing. commit 7173a2d84e95e84452b02790e1a2ee3a8ea1a40d Author: maxv Date: Fri Jul 22 12:36:03 2016 +0000 Unused. commit 33ff7b9ccd06e4e06de88a6467c2af2448e602da Author: maxv Date: Wed Jul 20 13:49:17 2016 +0000 This comment is wrong. In fact, we are in low physical memory, but in high virtual memory, and only the latter matters. I'm not exactly sure why, but it appears that the kernel modules must be placed above the kernel image. Just make this comment more ambiguous, in case the next passer-by gets inspired. commit 0bba1f99e0fff575a999778e24be042f2dcc96a5 Author: maxv Date: Wed Jul 20 13:36:19 2016 +0000 Split the data+bss+rodata segment in two data+bss and rodata segments. The latter is made read-only. commit 9ebdae98fcf12b769419570bca3f78708d93de50 Author: maxv Date: Wed Jul 20 13:11:58 2016 +0000 Change the protection of the kernel modules segments once we are done relocating them. The text is allocated as RWX, and then mprotected to RW. There is a bug that prevents us from doing RW->RX on amd64 and perhaps sparc64. On x86, the pmap waits for the page to fault before granting it the X permission. But in the trap handler, such a page is considered as belonging to kernel_map, while it actually belongs to module_map. The kernel then finds out the page is not present in kernel_map, and panics. In all cases, module_map is non pageable, so even if the trap were handled properly, it still wouldn't work. Therefore, there is a small window in which the segment is RWX. 
But that's fine enough, for now. commit 126067326186cfb865114e47051374f0e2e906aa Author: maxv Date: Wed Jul 20 12:38:43 2016 +0000 Introduce uvm_km_protect. commit c94d7bbf462e30e063929e51882b9f5262c7b945 Author: maxv Date: Wed Jul 20 12:33:59 2016 +0000 There is a huge bug in the way a uvm_map_protect is processed on x86. When mprotecting a page, the kernel updates the uvm protection associated with the page, and then gives control to the x86 pmap which splits the procedure in two: if we are restricting the permissions it updates the page tree right away, and if we are increasing the permissions it just waits for the page to fault. In the first case, it forgets to take care of the X permission. Which means that if we allocate an executable page, it is impossible to remove the X permission on it, this being true regardless of whether the mprotect call comes from the kernel or from userland. It is not possible to make sure the page is non executable either, since the only holder of the permission information is uvm, and no track is kept at the pmap level of the actual permissions enforced. In short, the kernel believes the page is non executable, while the cpu knows it is. Fix this by properly taking care of the !VM_PROT_EXECUTE case. Since the bit manipulation is a little tricky we use two vars: bit_rem (remove) and bit_put. commit 67115e57b177f848a510b0f280b34523132379b7 Author: maxv Date: Tue Jul 19 18:54:45 2016 +0000 This loop makes no sense at all. commit 0beecb77a07f0e24777d29345def76f222e811b6 Author: maxv Date: Sun Jul 17 10:46:43 2016 +0000 Simplify x86_add_cluster. commit ce4be4055f21e22cb362343b17eec0506f4a8379 Author: maxv Date: Sat Jul 16 17:13:25 2016 +0000 KNF, and rename. commit 9f5919201d2ccaa9baf666607495203d3243507f Author: maxv Date: Sat Jul 16 17:02:34 2016 +0000 Simplify the way physical pages are internalized into the VM system on x86. 
Only two functions are called now: init_x86_clusters, which initializes the memory clusters from the bootinfo, and init_x86_vm, which inserts the pages from the clusters into VM. commit e9cd54bc38e21034c8de35963a9d607816a1cead Author: maxv Date: Sat Jul 16 14:51:45 2016 +0000 Introduce x86_load_region(), and explain a little what we are doing. commit f5eb3b134d98a7a3a25efad9fdb8e477f6f95e3a Author: maxv Date: Sat Jul 16 13:47:01 2016 +0000 Add the cr4 flags for PKE and UMIP. commit 1c39317bb3aaed7aa4cbbd21f4e983b9e24c493e Author: maxv Date: Wed Jul 13 15:59:54 2016 +0000 x86_alldisks can be NULL, so don't dereference it. Not tested, but obvious enough. commit 7a48bfca49b6341b8ab4cc6637bbb1852e445fa8 Author: maxv Date: Wed Jul 13 15:53:26 2016 +0000 Reorder some instructions, reduces the diff between amd64 and i386. commit a3aa93490fbdfcaca6fb8ae15cdc745fd67d27f6 Author: maxv Date: Wed Jul 13 15:39:33 2016 +0000 Remove msgbuf_paddr. commit 6973d580cec0d2798bb755eab83f6f517f58d13f Author: maxv Date: Wed Jul 13 15:35:56 2016 +0000 KNF commit 1ed6520ab8692187e5e5c131c53a955e786fc485 Author: maxv Date: Mon Jul 11 14:52:54 2016 +0000 KNF and simplify. commit 2b563b947798f437acf296f20bd538d6a23ac870 Author: maxv Date: Mon Jul 11 14:18:16 2016 +0000 KNF and simplify a little. commit 8e1137ed18728e114400973883729ca2d6e781de Author: maxv Date: Sat Jul 9 09:33:21 2016 +0000 Simplify pmap_get_physpage. commit 2bd80feb3bb216d432d321f6d6609ca4151e3991 Author: maxv Date: Sat Jul 9 09:25:44 2016 +0000 Use pmap_bootstrap_palloc. commit f1306edbc638df55f1d758d82310c7bf41a9e8e5 Author: maxv Date: Sat Jul 9 08:05:46 2016 +0000 When a user pmap is created, it is populated with the higher kernel slots, which become accessible upon kernel entry (syscall, cpu switch, or whatever). Put the NOX bit in the user recursive slot, so the whole tree does not appear as executable in kernel mode. This is already what is done in the kernel pmap. 
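The uvm_map_protect fix above (c94d7bbf) mentions two variables, bit_rem and bit_put, used when permissions are being restricted. The following is a minimal sketch of that bit manipulation, assuming 64-bit entries where bit 1 is read/write and bit 63 is no-execute. `pte_restrict` is a hypothetical name, and the sketch uses a constant `PG_NX` where the kernel uses the pmap_pg_nx variable, which may be zero on CPUs without NX support.

```c
#include <stdint.h>

/* Protection flags, with the values UVM uses. */
#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04

/* x86 PTE bits: bit 1 is read/write, bit 63 is no-execute. */
#define PG_RW (UINT64_C(1) << 1)
#define PG_NX (UINT64_C(1) << 63)

/*
 * Compute the new PTE when restricting permissions to `prot`.
 * Losing write means clearing a bit (bit_rem); losing execute means
 * setting a bit (bit_put), which is the case the old code forgot.
 */
uint64_t
pte_restrict(uint64_t opte, int prot)
{
	uint64_t bit_rem = 0, bit_put = 0;

	if (!(prot & VM_PROT_WRITE))
		bit_rem = PG_RW;
	if (!(prot & VM_PROT_EXECUTE))
		bit_put = PG_NX;

	return (opte & ~bit_rem) | bit_put;
}
```

Two variables are needed because the two permissions have opposite polarity in the entry: write is granted by a set bit, execute is denied by a set bit.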
commit b4a2ae1073c17f6e845fa0deccc2f5df34189de2 Author: maxv Date: Sat Jul 9 07:47:25 2016 +0000 KNF this function a little commit e38d6778c42c6e41340124f190dfba9161b725d1 Author: maxv Date: Sat Jul 9 07:25:00 2016 +0000 When loading a module from VFS and from the bootloader, the kernel packs up the module segments into one big RWX chunk. Split this chunk into two different text and data+bss+rodata chunks. The latter is made non- executable. This also provides some kind of ASLR, since the chunks are not necessarily contiguous. commit d171963aeb11c64d4644507c02c5a2e37238eb10 Author: maxv Date: Sat Jul 9 06:58:06 2016 +0000 The CPU considers a given va as executable if none of its levels have the NOX bit. With the top level recursive slot, however, several levels are recursively omitted, which implies that each entry that is not the child of NOX-ed parents actually appears somewhere in the virtual space as executable via this slot, even if it is followed by an underlying entry that has the NOX bit. This recursive slot is only used to edit the page tree itself. Make it non-executable. commit a87f0f80a8d937c308758cd8c9185c3154cb4c80 Author: maxv Date: Fri Jul 8 09:15:38 2016 +0000 The preloaded modules are now reallocated dynamically by the kernel. This area does not need to be executable anymore. commit b15437bef049de86c688d03bb8dbe0f48d610e32 Author: maxv Date: Fri Jul 8 08:55:48 2016 +0000 Force the kernel to dynamically reallocate the preloaded modules. commit be0826f9221bfac77211eb7db63cb744189ea47e Author: maxv Date: Mon Jul 4 07:56:07 2016 +0000 Make the execution flow canonical instead of jumping back and forth, and complete the userland check. commit 2ed17b89144414c41386b9139f97d691abccfa9a Author: maxv Date: Sat Jul 2 07:22:09 2016 +0000 Explain why we should use kernel_map instead of module_map, and why we can't. We should probably add some GCC flags in the modules makefiles to make sure the relocations generated are not 32bit. Related to PR/43438. 
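The recursive-slot commit above (d171963a) relies on the rule that a va is executable only if none of the entries walked for it carries the NOX bit. The following sketch models that rule on an array of entries; it is an illustration, not pmap code. `va_is_executable` is a hypothetical name, and the entry values in the usage note are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* No-execute bit: bit 63 of a 64-bit page-table entry. */
#define PG_NX (UINT64_C(1) << 63)

/*
 * entries[] holds the page-table entries walked for one va, top level
 * first. The va is executable only if no level has PG_NX set.
 */
bool
va_is_executable(const uint64_t *entries, size_t nlevels)
{
	for (size_t i = 0; i < nlevels; i++) {
		if (entries[i] & PG_NX)
			return false;
	}
	return true;
}
```

A normal walk {L4e, L3e, L2e, L1e} with NOX set only in L1e is non-executable. Going through the recursive slot once, the walk becomes {recursive, L4e, L3e, L2e}: the NOX-ed L1e is no longer part of it, so the result is executable unless the recursive entry itself has NOX, which is what the commit sets.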
commit a05cf6e80a989d330bf21705febc34bd13804340 Author: maxv Date: Fri Jul 1 13:11:21 2016 +0000 Try to make this part more readable. No functional change. commit 7ef67fd160d264f0d547dd4720c60f79c19a97e5 Author: maxv Date: Fri Jul 1 12:49:22 2016 +0000 Ensure the restartable atomic sequence is in userland, for real. commit 31ba59e564b64f27e50afa83024d9ae75bce11ac Author: maxv Date: Fri Jul 1 12:41:28 2016 +0000 Don't confuse between VM_PROT and UVM_PROT. This should be VM_PROT. commit 077ca64228ac9dfe476cbf137a3ee460e664b93a Author: maxv Date: Fri Jul 1 12:36:43 2016 +0000 There is no direct map on i386, and therefore we always need to use temporary VAs and PTEs when mapping an area. These temporary VAs don't need to be executable. Put the NOX bit on them. commit 0cf2e32eddf3b19d889f5776709eddbf7afd357f Author: maxv Date: Fri Jul 1 12:18:34 2016 +0000 Surprisingly enough, the kernel expects the CPU to support large pages when creating the direct map on amd64. Therefore, the amd64 CPUs that do not support large pages basically don't work on NetBSD. It looks like it has always been this way; add a KASSERT to panic properly in case we come across one of these CPUs. commit 07eeef97ccb3a438be36390b89ccb39574ace671 Author: maxv Date: Fri Jul 1 12:12:06 2016 +0000 KNF a little, remove some stupid comments, and add some when needed. commit 4bb7781864911d1a1b69020ce9e6e0cdcfd144aa Author: maxv Date: Fri Jul 1 11:57:10 2016 +0000 We use only one L4 slot for the direct map, which means that we cannot map more than 512GB. Panic properly if this limit is reached. commit af0950ae80cc5c4e2991cee945ad59c9d94cacdb Author: maxv Date: Fri Jul 1 11:44:05 2016 +0000 Use pmap_bootstrap_valloc and pmap_bootstrap_palloc under XEN at least once, for these not to appear as unused functions (not tested, but I guess). commit 94712ab18141e34efcc7e34b627f8933b31733f3 Author: maxv Date: Fri Jul 1 11:39:45 2016 +0000 Create the direct map in a separate function. 
While here, add some comments to explain what we are doing. No functional change. commit 8608339d191c4b582b3d534cb9582ca283fa77ce Author: maxv Date: Fri Jul 1 11:28:18 2016 +0000 Introduce pmap_bootstrap_valloc and pmap_bootstrap_palloc, that are used to allocate a virtual/physical address before the VM system has been set up. Start using it. commit c0f69e2cb37961275182d64bf5b22a8390d4254a Author: maxv Date: Fri Jul 1 11:20:01 2016 +0000 Put the code in charge of remapping the kernel segments with large pages into another function. No functional change. commit 6808883e1c8cb22484deb85bc000ca585b7ad63b Author: maxv Date: Fri Jul 1 11:10:48 2016 +0000 Define pmap_pg_nx globally. Will be used soon. commit 0d87d7e76710a7582e0e1609ded2b7718c8d58da Author: maxv Date: Fri Jul 1 10:20:10 2016 +0000 Remove this area (unused). commit 6f8c3734b3927e7b628c4599c9694130c926a249 Author: maxv Date: Sun Jun 5 14:13:57 2016 +0000 Don't use a magic value. Define a limit, and enforce it. commit 349e1a478087c3142bbdda777b1e2b269e7b1ea8 Author: maxv Date: Sun Jun 5 14:06:31 2016 +0000 The bootinfo is refreshed each time the bootloader tries to execute a kernel, so there's no point in using this global variable. Because of this variable, only one "boot" command can be issued in the prompt, and you have to reboot the machine if you mistyped the kernel name. commit d3cf12532257b47c860f2d95d58341273e92700c Author: maxv Date: Sun Jun 5 13:44:48 2016 +0000 Remove the ALLOC_FIRST_FIT and ALLOC_TRACE options. This is a rather simple allocator, and it does not need to be that complicated. commit 8f2cb6db69ee1fde6ad6661780efbf0b9c234fd0 Author: maxv Date: Sun Jun 5 13:33:03 2016 +0000 Use gets_s instead of gets. The x86 bootloader prompt is easy to overflow. commit 5a0ce4fcdfcf617f54f053aef4bf6d36ad388ca2 Author: maxv Date: Sat Jun 4 10:48:11 2016 +0000 The ISA I/O MEM does not need to be executable. Remove the X permission on it. 
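The 512GB figure in the direct map commit above (4bb77818) follows from the x86-64 paging geometry: one L4 slot spans 512 L3 entries, each spanning 512 L2 entries, each spanning 512 pages of 4KB. The following sketch spells out that arithmetic and a corresponding bound check. `dmap_fits_one_l4_slot` and the macro names are hypothetical, and the exact boundary condition used by the kernel is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES   UINT64_C(4096)
#define ENTRIES_PER_TABLE UINT64_C(512)

/* Bytes covered by one L4 slot: 512 * 512 * 512 * 4KB = 2^39 = 512GB. */
#define L4_SLOT_BYTES \
	(ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * \
	    PAGE_SIZE_BYTES)

/*
 * mem_end is the (exclusive) end of physical memory. Return true if
 * all of it can be covered by a direct map that uses a single L4 slot.
 */
bool
dmap_fits_one_l4_slot(uint64_t mem_end)
{
	return mem_end <= L4_SLOT_BYTES;
}
```

In the kernel a failed check leads to a panic; the sketch returns a boolean so the caller decides.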
commit 50b5133a958a5bcf4d5a8f691c89f9d9487d9169 Author: maxv Date: Sat Jun 4 10:19:09 2016 +0000 Use the same instruction layout to map the ISA I/O. No functional change. The comment is still wrong: we are not on (4), we are actually below the kernel area in physical memory. I'll fix that later. commit 8ee94f69e229864b590a72dd5cca3cf0cfc7470f Author: maxv Date: Sat Jun 4 10:02:12 2016 +0000 Define and use fillkpt_blank on i386, like amd64. The PAE case is included in fillkpt_blank, since PDE_SIZE is either 4 or 8 bytes. commit 9471fc7e067237a2381f6a4ddc39105f59f37762 Author: maxv Date: Sat Jun 4 09:52:41 2016 +0000 Initialize cpuid_level at compile-time, not run-time. Same as amd64. commit db46ddbb39575f4147b3548071f91ff5f810e520 Author: maxv Date: Sat Jun 4 09:45:57 2016 +0000 Reorder some definitions. Reduces the diff between amd64 and i386. commit 2da0e5f72860e37fc6ff0c2291a81a2a2c080e46 Author: maxv Date: Sun May 29 09:16:11 2016 +0000 Define tablesize. Useful when debugging. commit 61b12c0d51b97b930a46c06c41a54dae7b98a871 Author: maxv Date: Sun May 29 09:04:19 2016 +0000 Revert rev1.94. It apparently raises a page fault from SMEP. I need to investigate the whole kernel mappings anyway, so I'll recommit this patch later. commit f7e41e32194b9048f46016a490e1c2e801af9aca Author: maxv Date: Sat May 28 09:03:16 2016 +0000 Define fillkpt_blank, which creates blank entries in a page table. Use it to map the first MB. No functional change. commit 51c36cec913f26c5e643e1a4d931b5145c6aed53 Author: maxv Date: Sat May 28 08:43:16 2016 +0000 Move proc0's stack out of the BOOTSTRAP TABLES, and map it independently with RW permissions. Reduces the impact of a stack overflow. commit 4caf51cb740414317282412b5bb90fb60ad0410b Author: maxv Date: Thu May 26 07:24:55 2016 +0000 There is an issue in the way the fillkpt macro sets up pages on both amd64 and i386. 
The fillkpt loop is equivalent to the following: do { /* fill in the slot */ /* increment %ebx to the next slot */ /* increment %eax to the next pa */ } while (%ecx > 0) The issue here is that if %ecx = 0 (i.e., the chunk we are trying to map is zero-sized), there is still one entry created in the page table. The kernel expects the va<->pa translation to be linear in low memory. If there is a zero-sized chunk, the dead entry creates a +4096 offset in the virtual space, with two consecutive entries that point to the same physical address. In other words, the mappings are not linear anymore, which causes the kernel to die. Before my recent changes, there were only two big chunks that were mapped, and neither of these could be zero-sized. Now, with multiple, fine-grained chunks, it is possible that the [SYMS]+[PRELOADED_MODULES] chunk could be zero-sized. [PRELOADED_MODULES] is almost never here, and [SYMS] is always here on default kernels. Except for floppies, where the bootloader does not load [SYMS]. Should fix PR 51148. commit 2c46ade1599b57294ba5f0fe820bc310963ca3df Author: maxv Date: Sun May 22 10:11:55 2016 +0000 Save L4's physical address earlier. Also, PDE_SIZE has nothing to do here, we are just zeroing out the upper 32bits of the 64bit pointer. commit b41804842c57259aa09c4f514b66f33f4d022fae Author: maxv Date: Sun May 22 09:10:37 2016 +0000 Revert my previous change. I missed an entry on NXR. commit 64430cfcd5dc1c9908a8fe4218d4d986f02c559f Author: maxv Date: Sat May 21 07:15:56 2016 +0000 There is an issue in the way the direct map is set up on amd64. When allocating memory, the kernel allocates physical pages and virtual addresses for these pages. In order to optimize allocations smaller than PAGE_SIZE, uvm_km_kmem_alloc can allocate a single physical page and take its virtual address in the direct map in high virtual memory. This direct map is set up at boot time, its PTEs do not change, and therefore they don't need to be kentered. 
These high virtual PTEs being constant, the permissions of the areas they point to are fixed at boot time and cannot change. The problem is that at boot time, they are created with RWX permissions. Therefore, allocations smaller than PAGE_SIZE in the kernel heap are all executable: mbufs, pnbufs, small kmem allocations, etc. Fix this by setting the NOX bit in the direct map pages at boot time. We also set the NOX bit in the temporary tmpva, since it does not need to be executable either. This also makes the U-area non executable on amd64. commit f803f1f824e9154193b57e537c30514f39043e45 Author: maxv Date: Sat May 21 07:00:18 2016 +0000 Explain where this value comes from. commit 6630351e242701667ec24b7e63d85c21f42c4e3a Author: maxv Date: Sat May 21 06:37:28 2016 +0000 USPACE and USPACE_ALIGN are constants. Use a #if instead. Probably saves some instructions. commit da47b4dadc9fef89eb28f074b9de8dec9de9bee1 Author: maxv Date: Mon May 16 07:52:31 2016 +0000 Update kern.ldscript.4MB. It is the same as kern.ldscript, but with a large page alignment before rodata. commit 1673c13037ed65475765e016e0ba94bdc8943b01 Author: maxv Date: Mon May 16 07:37:45 2016 +0000 Mention fine-grained permissions and large pages on x86. commit e4ec4bb9b241e5f8fd1d70f0c7b4d84ac2e31ae6 Author: maxv Date: Sun May 15 10:35:54 2016 +0000 Explicitly mention MP_TRAMPOLINE in these comments, so that NXR links them. commit 266038ae0711162ffa8f9bb3d60477fb1e52e1e4 Author: maxv Date: Sun May 15 07:17:53 2016 +0000 Split the PRELOADED_MODULES+BOOTSTRAP_TABLES chunk into two separate chunks mapped independently with RWX and RW, on both amd64 and i386. This way the BOOTSTRAP TABLES are non-executable. commit 068cb85bfbe17821b8d73662b680ed7d932206ef Author: maxv Date: Sun May 15 07:01:36 2016 +0000 Reduce the diff between amd64 and i386. We invert two instructions on amd64, but it makes no difference since PDE_SIZE = 8. 
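The fillkpt issue described above (4caf51cb) is the classic do-while pitfall: the body runs once even when the count is zero. The following C model contrasts the two loop shapes. It is a sketch of the logic in the commit message, not the assembly macro; `fill_do_while`, `fill_guarded` and the constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define PG_V            0x1u /* "valid" bit, for illustration */

/* Models the old loop: the slot is filled before the count is tested. */
size_t
fill_do_while(uint64_t *slot, uint64_t pa, size_t npages)
{
	size_t written = 0;

	do {
		slot[written++] = pa | PG_V; /* fill in the slot */
		pa += PAGE_SIZE_BYTES;       /* next physical page */
	} while (written < npages);

	return written; /* 1 even when npages == 0 */
}

/* Tests the count first, so a zero-sized chunk writes nothing. */
size_t
fill_guarded(uint64_t *slot, uint64_t pa, size_t npages)
{
	size_t written;

	for (written = 0; written < npages; written++) {
		slot[written] = pa | PG_V;
		pa += PAGE_SIZE_BYTES;
	}

	return written;
}
```

With the do-while form, a zero-sized chunk at physical address P followed by a non-empty chunk also starting at P yields two consecutive entries that both point to P. That is the +4096 shift in the virtual space the commit message describes.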
commit 5b7f80ea8ba17f213e95467b6121bb46d3928024 Author: maxv Date: Sat May 14 12:48:31 2016 +0000 KNF so it appears aligned on NXR, and fix a comment. commit fef69c45c3f71e1055b2462ec339d58056b00fa2 Author: maxv Date: Sat May 14 09:51:56 2016 +0000 Actually, put the NOX identification above. Old CPUs do not support the cpuid instruction. commit 0b7467d2671ac4f6f0941511619a910bb56110cb Author: maxv Date: Sat May 14 09:37:21 2016 +0000 The NOX bit on large pages does not need to be amd64-specific anymore. The i386 secondary CPUs can now properly handle it. commit 45ae6d6d54120be1960a0aeec441db0a35cb9dda Author: maxv Date: Sat May 14 08:49:16 2016 +0000 Map rodata and data+bss independently, and give them R and RW with fillkpt_nox. The code is exactly the same as amd64's. commit 9646b269d68bc78c2c69e5d4f5051aae769e4348 Author: maxv Date: Sat May 14 08:39:41 2016 +0000 Define fillkpt_nox on i386, same as amd64. But there is a difference in the way it is done here. If PAE is not enabled, PDE_SIZE = 4, so there is no NOX bit set. If PAE is enabled, PDE_SIZE = 8, so the NOX bit is set. This works exactly as intended, since NOX does not exist in the non-PAE case. commit 87ffcc85fafc8f0f75b11f8b87b9eac89039c120 Author: maxv Date: Sat May 14 08:34:00 2016 +0000 Fix the secondary CPUs bug in i386. Same as amd64. commit 9380e554d8ef3a138eb62f6e9be0679235340b06 Author: maxv Date: Sat May 14 08:19:42 2016 +0000 Align the segments on i386. We're going to map them independently. commit ea58bc11e613f5b96cff4068b04f7b35a2874ea5 Author: maxv Date: Sat May 14 06:49:34 2016 +0000 Define killkpt, and don't use _RELOC. Same as amd64. commit 4f9e07f14044aef347b7ebb6e1e8e6fbb35b9389 Author: maxv Date: Fri May 13 14:09:38 2016 +0000 Mention SMEP. commit e0aaf5b3b9095d5870baf798494ee0fc95c6777c Author: maxv Date: Fri May 13 14:03:00 2016 +0000 Bring some amd64 swag. No functional changes. 
commit 19a4beadbceb7142f184770a78bfc21b25bef1ea Author: maxv Date: Fri May 13 13:24:01 2016 +0000 KNF a little, use C-style comments, and remove susword/fusword. No functional changes. commit 11e0a9a845bd3e5c0d8d06f9173cc70cb863825e Author: maxv Date: Fri May 13 11:47:02 2016 +0000 Actually, make the NOX part amd64-specific. The secondary CPUs bug is not yet fixed on i386. commit 2e3d60854af2b45eec782d45854ce0c2f125b398 Author: maxv Date: Fri May 13 11:17:20 2016 +0000 KNF, so it appears aligned on NXR. commit 4b09c753b43c927795a7865456fc7a2fc22ecf8c Author: maxv Date: Fri May 13 10:24:42 2016 +0000 Remap the rodata and data+bss segments with large pages on x86. There still is a bug in the way the text segment is mapped, but I'll see later. commit 4123c24356611ad23daf39815802f5e48ddf52fd Author: maxv Date: Fri May 13 10:18:01 2016 +0000 Define __kernel_end. commit 24c58a1c0d557c0f3d92b145002dccb426444e19 Author: maxv Date: Fri May 13 05:45:13 2016 +0000 Xen therefore uses x86/db_memrw.c, as I suspected. Define __rodata_start in the Xen ld scripts, so that it can compile. We put the __rodata_start definition right before __data_start, for it to appear as dead code, since the rodata segment is not yet mapped independently on Xen. commit a1ad52c3929262f7cf46507ddfdee55fd68e9f55 Author: maxv Date: Thu May 12 09:40:23 2016 +0000 KNF, and reduce the diff between amd64 and i386. commit 966e1f6a0732b8c1d0e6c4c1c58b0aeda0371fec Author: maxv Date: Thu May 12 09:05:16 2016 +0000 Map the data+bss chunk independently on amd64, and remove the X permission on it. commit c433d3a39ca64b926a3d647e25ee517551888700 Author: maxv Date: Thu May 12 07:51:09 2016 +0000 Define fillkpt_nox, which sets up a set of pages and puts the NOX bit on them by using nox_flag. Use fillkpt_nox to map the rodata segment without X permissions. 
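Note on fillkpt_nox: a rough C model of the idea, assuming 64-bit page table entries where the no-execute bit is bit 63. The real routine is assembly and, per the log, works with a 32-bit form of PG_NX; fill_entries, PTE_P, PTE_RW and PTE_NX are illustrative names, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL
#define PTE_P		0x001ULL	/* present */
#define PTE_RW		0x002ULL	/* writable */
#define PTE_NX		(1ULL << 63)	/* no-execute (64-bit entries only) */

/*
 * In this model, nox_flag is 0 when the CPU lacks NX and PTE_NX otherwise.
 * It is meant to be set once by the boot CPU.
 */
static uint64_t nox_flag;

/* Map npages consecutive pages starting at pa, optionally non-executable. */
static void
fill_entries(uint64_t *pte, uint64_t pa, size_t npages, uint64_t flags,
    int nox)
{
	size_t i;

	for (i = 0; i < npages; i++)
		pte[i] = (pa + i * PAGE_SIZE) | flags | (nox ? nox_flag : 0);
}
```

Keeping the bit in a variable rather than a constant means the same fill code does the right thing on CPUs without NX: the flag is simply 0 there and the entries are unchanged.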
commit 670625d42cbd0ebf707ef1118981f3e9c74128fa Author: maxv Date: Thu May 12 07:21:18 2016 +0000 Map the rodata segment independently on amd64, and remove the W permission on it. commit 58e5be1881eabb1ab8efdf3399db9b8f3890c0e3 Author: maxv Date: Thu May 12 06:57:55 2016 +0000 KNF the Xen ld scripts on x86. commit a454c477d10b60e4db10925d64123911a331a6bb Author: maxv Date: Thu May 12 06:45:16 2016 +0000 Split the {text+rodata} chunk in two separate chunks on x86. The rodata segment now loses the large page optimization, gets mapped inside the data segment, and therefore becomes RWX. It may break the build on Xen. commit e4da397b83a01262ee5b0c6efec3d05b18e3099e Author: maxv Date: Wed May 11 19:35:08 2016 +0000 There is a bug in the way the secondary CPUs are launched on amd64. When CPU0 is launched, EFER_NXE is enabled in it, and it allows it to handle pages that have the NOX bit. When the secondary CPUs are launched, however, EFER_NXE is enabled only after paging is set in their %cr0. And therefore, between the moment when paging is enabled and the moment when EFER_NXE is enabled, the secondary CPUs cannot access pages that have the NOX bit - they crash if they try to. The funny thing is that in order to enable EFER_NXE, the secondary CPUs give a look at cpu_feature[2], which is in the DATA segment, which in turn could have the NOX bit. In other words, the secondary CPUs crash if the DATA segment is mapped with the NOX bit. Fix this by enabling EFER_NXE in the secondary CPUs before enabling paging. CPU0 initializes nox_flag to the 32bit version of PG_NX if NOX is supported; the secondary CPUs then use nox_flag to know whether NOX is supported. nox_flag will be used for other purposes soon. commit 9473570ff2c420939dc168bf1484ab9d1ffb541c Author: maxv Date: Wed May 11 17:48:05 2016 +0000 Switch to C-style comments, and reduce a little the diff between i386 and amd64. No functional changes. 
commit 22d64b6a5aebb336b3907cf0577d29819f9075f4 Author: maxv Date: Sun May 8 08:30:41 2016 +0000 Define __rodata_start. Will be used soon. commit 6edfdc72a7ef3d1be14fb0c84fbb6348e1234706 Author: maxv Date: Sun May 8 08:22:58 2016 +0000 Use killkpt for the PML4 entries as well. commit c49f89c3e68c844a0db3f16b2f636cd29ec532d1 Author: maxv Date: Sat May 7 13:08:30 2016 +0000 clarify commit f98cd8c05e56596b5af6ace8254ee84abca7750d Author: maxv Date: Sat May 7 12:45:55 2016 +0000 Large pages are supported by default for the text+rodata segments. Apply the proper alignment for the data segment, so that more pages can benefit from it. Reduces TLB contention. kern.ldscript.2MB and largepages.inc are useless. commit 7fbf4e80346e148a4ffa01decebe65282c4e5e29 Author: maxv Date: Sat May 7 11:59:08 2016 +0000 uaf commit d79b44555208ca44d3c14d0b908e714f03627918 Author: maxv Date: Sat May 7 11:49:21 2016 +0000 clarify commit e0261f2b242b187904e2bf51142630e9c2b27ceb Author: maxv Date: Sat Dec 19 13:15:21 2015 +0000 Missing field (was here before my change). commit 3ec2e4f7440874c6b6ea989a7c4edb86efb35289 Author: maxv Date: Wed Dec 16 18:54:03 2015 +0000 Extend SMEP support to i386 (does not require PAE). commit 018ecf7be0f164b1a5136f74be8db8e4f4b92387 Author: maxv Date: Sun Dec 13 15:53:05 2015 +0000 Implement amd64 support for SMEP - Supervisor Mode Execution Protection. Now, on CPUs that support this feature, if the kernel tries to execute an instruction located in userland, the CPU will trigger a page fault. Tested on amd64 (Intel Core i5). commit e43f12b5a94ade04819accf4060c561bad665263 Author: maxv Date: Sun Dec 13 15:02:19 2015 +0000 Retrieve cpuid7 (Structured Extended Features) into ci_feat_val. commit 323192a104dfa3bc87a2c46f60435644bbe6c142 Author: maxv Date: Sat Dec 12 15:27:42 2015 +0000 Put the code in charge of handling MODCTL_STAT (32bit) into a separate function. No functional change. 
commit 49b08116f1a1991f3ebbb28833e557b4b2612ef9 Author: maxv Date: Sat Dec 12 14:57:52 2015 +0000 secmodel_extensions_system_cb() is not mount-specific, even though KAUTH_SYSTEM_MOUNT happens to be the only option handled here. Put everything into a switch(action). No functional change. commit b9218dd63c74db8a597e834fa89d3da11abfcf74 Author: maxv Date: Sat Dec 12 14:47:37 2015 +0000 Put the code in charge of handling MODCTL_STAT into a separate function. No functional change. commit 76ed77605cf831afdecdbb2ca325481657f2e896 Author: maxv Date: Wed Dec 9 18:25:32 2015 +0000 Rename verified_exec.c -> veriexec.c. The old log is now in Attic/. commit 59f9a1d63c25d2c37389b3e08c7e9ae750aca781 Author: maxv Date: Wed Dec 9 16:55:18 2015 +0000 KNF, and use C-style comments. Also, remove fusword/susword. commit 45bed4b9ef39b01db0fced17b670daef95c11f5e Author: maxv Date: Wed Dec 9 16:26:16 2015 +0000 KNF commit 2657837c2660d9bf3e04cc0be5024f545f297b47 Author: maxv Date: Sat Nov 28 18:08:40 2015 +0000 KNF commit 3ebf55ce7973415498b52e36c90d69c0852e4345 Author: maxv Date: Wed Nov 25 16:00:09 2015 +0000 Cosmetic changes. commit ebad3755f62ab1ceb1a29dde05c26ac2424a87f6 Author: maxv Date: Sun Nov 22 14:06:08 2015 +0000 Remove cpu_vendorname (unused). It is retrieved later in identcpu.c. commit ab6f896459c0c87581d77010c1fcd20afd56bb08 Author: maxv Date: Sun Nov 22 13:41:24 2015 +0000 KNF a bit, so I don't get scared each time I open a file commit a0e905e28afa79e82bd965bc730cffa6f7b9756e Author: maxv Date: Sun Nov 22 10:18:59 2015 +0000 Clarify: - add some comments - rename some jumps - KNF No functional change. commit 40c56fa3998fcf15537475a7fe8b0ef4da6ecc2c Author: maxv Date: Sat Nov 21 12:34:48 2015 +0000 Remove the amd64 implementation of fuword and suword. They are not used in the MI+amd64 code - Christos replaced them yesterday by copy*.
They are both buggy: - suword does not properly check the userspace limit: 64 bits are copied, but the max address checked is VM_MAXUSER_ADDRESS-4, which means that 4 bytes may overflow. Reported by Ed Schouten. - fuword is supposed to be symmetrical with suword. But it uses 32bit registers, so it stores 32bit values! Spotted by Chuck (chs@). commit 37e261888efee0476e5fe7abf4111d8e080ba589 Author: maxv Date: Fri Nov 20 11:58:00 2015 +0000 A few changes: - remove cpu_id and cpu_brand_id (unused) - copy a comment from i386 about fillkpt - define PDE_SIZE (i386) commit de24102bff43b74873cd72c204c79bae503fd7ec Author: maxv Date: Sat Nov 14 14:01:23 2015 +0000 KNF, and fix some comments commit d6af0bce376955652f9d30584968deb326e1f4c3 Author: maxv Date: Sun Oct 25 07:51:16 2015 +0000 Uninitialized variable. Found by Brainy. ok pgoyette@ commit 188454744082d708f5c91433c951c53849ab12b9 Author: maxv Date: Fri Oct 23 19:40:10 2015 +0000 Change do_sys_mount() so that it only takes as argument the type of the drive instead of its associated vfsops. Makes it more friendly, and allows compat binaries to autoload VFS modules if needed. sent on tech-kern@, ok christos@ commit a8b41879d5182c2d5d3c59bf19ff3e6c78bc7967 Author: maxv Date: Thu Oct 22 11:48:02 2015 +0000 Reset the PaX flags, make sure ep_emul_arg is NULL, and add a comment. commit 8fce2945dd5c3f3651b757adf878ce3f6dfa8350 Author: maxv Date: Thu Oct 22 11:38:51 2015 +0000 Check the error code from es_setup_stack, and correctly free ep_emul_arg if it fails. That bug is harmless, since ep_setup_stack never fails. commit 9232f9f6f84ed633263083bb4e62c0e418101c0f Author: maxv Date: Thu Oct 22 11:31:31 2015 +0000 Fix PR 50070. From hannken@. commit 76a42ac7e865447b348833406c6a312de30f4fe1 Author: maxv Date: Tue Oct 20 14:46:45 2015 +0000 Harmless alloc inconsistency; make sure the exact same argument is given to kmem_alloc/kmem_free. Found by Brainy. 
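Note on the suword problem described in the fuword/suword removal above: the bounds check was sized for a 4-byte store while 8 bytes were written. A minimal C sketch of that mismatch follows. USER_LIMIT is a made-up small value standing in for VM_MAXUSER_ADDRESS, and store_ok_buggy and store_ok are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for illustration; the real VM_MAXUSER_ADDRESS differs. */
#define USER_LIMIT	0x1000ULL

/* Buggy shape: the check assumes 4 bytes whatever the store width is. */
static bool
store_ok_buggy(uint64_t uaddr)
{
	return uaddr <= USER_LIMIT - 4;
}

/* Width-aware check: the whole [uaddr, uaddr + len) range must fit. */
static bool
store_ok(uint64_t uaddr, size_t len)
{
	return len <= USER_LIMIT && uaddr <= USER_LIMIT - len;
}
```

An 8-byte store at USER_LIMIT - 4 passes the first check but spills 4 bytes past the limit; the second check rejects it.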
commit 817cc08474c9e5913858603fc785e05b4d8d7c87 Author: maxv Date: Sun Oct 18 17:13:32 2015 +0000 Add some {} when the meaning is too ambiguous. From Brainy. commit bbbdca85de02f821cf33987474a8e140ec7ab426 Author: maxv Date: Sun Oct 18 16:59:19 2015 +0000 Make sure we have space for the aout header. commit 22fa9f45719d791a4f2d5c142285519e18f05de4 Author: maxv Date: Sat Oct 10 10:51:15 2015 +0000 Remove the mach entry. commit 0bc84ee82fe473314907abdf6773cf59066e28b9 Author: maxv Date: Sat Sep 26 16:33:16 2015 +0000 Disable PAX_SEGVGUARD. We actually have a big problem: the fileassocs are never deleted. Therefore, if a user generates a lot of buggy binaries and launches them all, the kernel will allocate memory again and again for all these entries and will never free them (unless the files are deleted from the disk). Which means that a user can too easily put the kernel under memory pressure. commit 26bf730375771cd06807cc15380ad5222885ceef Author: maxv Date: Sat Sep 26 16:12:24 2015 +0000 Revamp the way processes are PaX'ed in the kernel. Sent on tech-kern@ two months ago, but no one reviewed it - probably because it's not a trivial change. This change fixes the following bug: when loading a PaX'ed binary, the kernel updates the PaX flag of the calling process before it makes sure the new process is actually launched. If the kernel fails to launch the new process, it does not restore the PaX flag of the calling process, leaving it in an inconsistent state. Actually, simply restoring it would be horrible as well, since in the meantime another thread may have used the flag. The solution is therefore: modify all the functions used by PaX so that they take as argument the exec package instead of the lwp, and set the PaX flag in the process *right before* launching the new process - it cannot fail in the meantime. commit 8ea9d455dec9c10425de811baa4c25e6aa544595 Author: maxv Date: Sat Sep 26 12:16:28 2015 +0000 Curious typo. Harmless.
Found by Brainy commit 525a9da904f31e1469914704531a94506f0cd83b Author: maxv Date: Sat Sep 26 11:16:12 2015 +0000 Remove KMEMSTATS. Normally it's ok now. commit 2aa6fddc6bebf727bc57906303ab20286b50b84e Author: maxv Date: Sat Aug 29 12:24:00 2015 +0000 Don't decrement the number of offline cpus if we fail to shut down one. ok christos@, via tech-kern@ commit 8b82f543c5e1094872d469dce2378cf51c99d08e Author: maxv Date: Sat Aug 15 10:31:41 2015 +0000 Mention UVM_KMF_EXEC. commit 464cc2d795f7066ef97f21b8243f6cfa2ee76d5c Author: maxv Date: Sat Aug 15 10:24:29 2015 +0000 Remove pax_adjust() (does not exist). commit 20b34bfe006a6ef470c7f126e6df7d131f5ab302 Author: maxv Date: Sat Aug 15 10:18:07 2015 +0000 Remove POOL_INIT() (does not exist). commit bbe540f1d178db088b150f4c027fa7de65ea1381 Author: maxv Date: Wed Aug 12 07:53:56 2015 +0000 Remove KMEMSTATS. commit ffe4e8ee5eee416e0266032bce025347be03f546 Author: maxv Date: Sat Aug 8 12:02:35 2015 +0000 easy kmem_alloc(0) ok shm@ commit 799233e5466b86f65ba456285cc79cce4540424a Author: maxv Date: Sat Aug 8 06:36:24 2015 +0000 Remove KMEMSTATS. commit 0c8eee95e92ad8325c93d8f0488753afd5abe21d Author: maxv Date: Sat Aug 8 06:24:40 2015 +0000 revert; but still fix the comment commit 11003d463c7da0957463610cd1490a2eba5d38ab Author: maxv Date: Fri Aug 7 15:53:24 2015 +0000 Document some of my most important changes (a bit late...) commit 989dfed2ba11fddbae5322c744c7cb83ccf7a7b0 Author: maxv Date: Fri Aug 7 13:53:28 2015 +0000 Remove KMEMSTATS. commit 3a8033fe385d23506cec0a56b18c65ed8092cfde Author: maxv Date: Fri Aug 7 07:34:56 2015 +0000 Remove KMEMSTATS commit 80d6f9241f82a9e32db39c3dad5d6eead911134c Author: maxv Date: Fri Aug 7 07:29:33 2015 +0000 Remove the KMEMSTATS option. It no longer exists. commit ff868e434ff5bf93cb5fa707bec6f0978c0ae6ae Author: maxv Date: Fri Aug 7 07:14:43 2015 +0000 Remove the malloc debug options. They no longer exist. 
commit 3ff0ceefe14782e75749473a700a3b7c7df8702c Author: maxv Date: Wed Aug 5 15:58:01 2015 +0000 stupid comment, and make sure we are not executing a lib commit e5c5d0570da2b5af8be26dd9d4b47972ad8a6002 Author: maxv Date: Tue Aug 4 18:28:09 2015 +0000 Some changes, to reduce a bit my tech-kern@ patch: - move the P_PAX_ flags out of #ifdef PAX_ASLR in pax.h - add a generic pax_flags_active() function - fix a comment in exec_elf.c; interp is not static - KNF for return - rename pax_aslr() to pax_aslr_mmap() - rename pax_segvguard_cb() to pax_segvguard_cleanup_cb() commit 6463c4da9d3ffd9878fbb36b90a1895cc40e2b55 Author: maxv Date: Tue Aug 4 12:44:04 2015 +0000 Remove uvm_extern.h and exec.h (unused). commit 11acd84342a8834be24eb83c897482f31233358b Author: maxv Date: Tue Aug 4 11:42:08 2015 +0000 Small changes: - remove the per-page stuff. It has been disabled for 10 years, and it is not implemented properly. - typo in comment - use KASSERT commit 10c9b59a35454d8eda84f5493e60ba1e89df5f7e Author: maxv Date: Sun Aug 2 07:37:57 2015 +0000 Wrong logic. Here, userland can control the size and the data copied, which basically means it can overflow kernel memory. ok martin@ christos@ commit a4ec753d1a236fb747898618fdb023b7d4f77e60 Author: maxv Date: Fri Jul 31 07:37:17 2015 +0000 Small changes: - rename pax_aslr_init() to pax_aslr_init_vm() - remove the PAX_ flags (unused) - fix a comment in pax.h commit 454467db209d06d7396abe3299e77dbfd125192c Author: maxv Date: Thu Jul 30 15:28:18 2015 +0000 Revamp PaX: - don't confuse between ELF flags and proc flags. Introduce the proc- specific P_PAX_ASLR, P_PAX_MPROTECT and P_PAX_GUARD flags. - introduce pax_setup_elf_flags(), which takes as argument the PaX flag of the ELF PaX note section, and which sets the proc flag as appropriate. Also introduce a couple of other functions used for that purpose. 
- modify pax_aslr_active(), and all the other similar pieces of code, so that it checks the proc flag directly, without extra ELF computation In addition to making PaX clearer, the combination of these changes fixes the following bug: if a non-PaX'ed process is launched, and then someone sets security.pax.{aslr,mprotect,segvguard}.global=1, the process becomes PaX'ed while its address space hasn't been randomized, which is not likely to be a good idea. Now, only the proc flag is checked at runtime, which means the process's PaX status won't be altered during the execution. Also: - declare PAX_DPRINTF, makes it more readable - fix a typo in exec_elf.h commit 8c451af6a57611f1d5db7719ba22ff9624b7f839 Author: maxv Date: Thu Jul 30 09:55:57 2015 +0000 Lock before calling uvm_swap_stats(). Otherwise a race condition could corrupt memory. commit 83a6074a98d426510e2f1191975201f4be175ca2 Author: maxv Date: Thu Jul 30 08:11:44 2015 +0000 Don't forget to unlock the LWP. ok rmind@ commit c1a82fe554fe3f52b59752a64d23995b9c6d0426 Author: maxv Date: Tue Jul 28 12:32:44 2015 +0000 Introduce POOL_REDZONE. commit ff4cd12f3f0a63d0dc42ff1d9ac086cea4d76163 Author: maxv Date: Tue Jul 28 08:59:47 2015 +0000 Document KMEM_SIZE, KMEM_REDZONE and KMEM_GUARD. commit 605d20f73a8afbe4aaad990c4c5f6d8aee7ce7ea Author: maxv Date: Mon Jul 27 09:24:28 2015 +0000 Several changes and improvements in KMEM_GUARD: - merge uvm_kmguard.{c,h} into subr_kmem.c. It is only used there, and makes it more consistent. Also, it allows us to enable KMEM_GUARD without enabling DEBUG. - rename uvm_kmguard_XXX to kmem_guard_XXX, for consistency - improve kmem_guard_alloc() so that it supports allocations bigger than PAGE_SIZE - remove the canary value, and use directly the kmem header as underflow pattern. - fix some comments (The UAF fifo is disabled for the moment; we actually need to register the va and its size, and add a weight support not to consume too much memory.)
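Note on the PaX revamp above (the one that introduces the P_PAX_ flags): the point is that the global knob is sampled once at exec time into a per-process flag, and only that flag is consulted afterwards. A small C model of the idea follows. The struct, the flag value and the function names are assumptions made for illustration and do not reproduce the kernel's definitions.

```c
#include <stdbool.h>

#define MODEL_P_PAX_ASLR	0x01	/* assumed bit value, illustration only */

/* Stand-in for the security.pax.aslr.global sysctl. */
static int pax_aslr_global;

struct proc_model {
	unsigned int pax_flags;		/* fixed for the life of the image */
};

/* At exec time, copy the current global policy into the process. */
static void
pax_model_exec(struct proc_model *p)
{
	p->pax_flags = pax_aslr_global ? MODEL_P_PAX_ASLR : 0;
}

/* At runtime, look only at the per-process flag, never at the global. */
static bool
pax_model_aslr_active(const struct proc_model *p)
{
	return (p->pax_flags & MODEL_P_PAX_ASLR) != 0;
}
```

A process started while the knob is 0 stays non-PaX'ed even if the knob is later set to 1, so its status never disagrees with how its address space was laid out.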
commit a7d9cc652842e4e2c3f8814806b26de09a572708 Author: maxv Date: Sat Jul 25 08:36:44 2015 +0000 Memory leak. Same as r1.93. I don't know why Brainy didn't detect it earlier; or perhaps I forgot to report it. Found by Brainy. commit 96293727e83e561ef734a3c6a2bc6d6ed21f7b28 Author: maxv Date: Fri Jul 24 13:02:52 2015 +0000 Unused inits (harmless). Found by Brainy. commit 87bb96fdb18f44088499a78eeadea2989a577478 Author: maxv Date: Fri Jul 24 12:29:55 2015 +0000 typo (comment) commit c0236f54da67124b0dc43f7b14e4b1154346caf9 Author: maxv Date: Wed Jul 22 14:25:39 2015 +0000 Memory leak, triggerable from an unprivileged user. commit 78519971d5a0a738f94a84e8b4ee1f5fbd09e613 Author: maxv Date: Wed Jul 22 14:18:08 2015 +0000 Memory leak. Triggerable from an unprivileged user via COMPAT_43. commit e5ef39d240b0709d1b0ef521a4d62a13762dd7c5 Author: maxv Date: Wed Jul 22 14:10:45 2015 +0000 Double compiler branch. Found by Brainy commit d9994ea3157291011b4b58a513e2d38926c1b0f7 Author: maxv Date: Wed Jul 22 14:06:26 2015 +0000 Set 'error' properly. commit e587dd03b142d64b8e8ec951aa8c2d189a497cf6 Author: maxv Date: Sat Jul 4 06:13:01 2015 +0000 Remove a dead continue. Harmless, found by Brainy commit 4953ce10a748e2fbe331a730d33f3a8107c890e6 Author: maxv Date: Mon Jun 29 16:36:17 2015 +0000 Remove a dead branch. Could look like a memory leak, but ih cannot be NULL. Found by Brainy. commit 7781484c85a63c2733c43de580341503c24041e2 Author: maxv Date: Mon Jun 29 12:27:41 2015 +0000 Use-after-free. ok christos@ Found by Brainy. commit 4671003166ca0a9e2ca2ca91b668d700d16dfa0c Author: maxv Date: Sun Jun 28 15:13:28 2015 +0000 Initialize 'error'. Can't test, but obvious enough apparently. Found by Brainy. commit 847dc81e6fde6bc8154766409ceedb6c274e1ef7 Author: maxv Date: Sun Jun 28 10:04:32 2015 +0000 Small fixes. ok hannken@ commit 69f3d6fb931cdeaba9cac8dde17519540a272105 Author: maxv Date: Sun Jun 28 09:15:45 2015 +0000 Use-after-free. ok christos@ Found by The Brainy Code Scanner. 
commit 3b4292756a802f7cda0e746ffbc3ea62b99e5be6 Author: maxv Date: Sun Jun 21 14:09:47 2015 +0000 KNF commit cee77081d0af335bc2170b9db5ac9b6ad15274bd Author: maxv Date: Sun Jun 21 13:50:34 2015 +0000 KNF commit 66c55f71c7826a94135c3481208a1511d07c4547 Author: maxv Date: Sun Jun 21 13:40:25 2015 +0000 KNF commit c3978e15fa788b8c6d58acee6f4202cf1c1567ef Author: maxv Date: Sat May 23 18:13:31 2015 +0000 Disable COMPAT_FREEBSD. The implementation is poor, not well tested and almost irrelevant. People who need it (for tw_cli for example) can still recompile their kernels with this option. Discussed on tech-kern@ commit 5308fbf405e3bb5871ce2517ae3c70f03e33ab6d Author: maxv Date: Sat May 23 17:05:03 2015 +0000 Remove the DIAGNOSTIC section, and two references to MALLOC and FREE. commit 25e86a5da9c6f9823f1022b9785f0cee34450d96 Author: maxv Date: Sat May 23 16:59:13 2015 +0000 Add a missing goto. (was here before my changes) ok christos@ commit 0d58a382208039cc92ed0a329870c6325dd25e44 Author: maxv Date: Thu May 14 07:27:14 2015 +0000 Use-after-free. Found by Brainy. ok christos@ commit de1d35761121d856872b0497a4285dce8b2a9f3d Author: maxv Date: Mon Apr 27 09:19:58 2015 +0000 Remove #ifdef notyet. commit 42c103908a3abddb596d52d04df859cbee2a67b4 Author: maxv Date: Mon Apr 27 09:17:31 2015 +0000 Remove FreeBSD. ok elad@ commit 187810545a414ad11f9e13a6bb7cca290c13a5e1 Author: maxv Date: Sun Apr 26 09:45:40 2015 +0000 Not to add even more confusion in an already overcomplicated subsystem, remove the FreeBSD code. This code is likely to be outdated, and Veriexec is in all cases not available on FreeBSD. 
commit 685d9278676ad99c0299c73dcfd9aee49e454340 Author: maxv Date: Sun Apr 26 09:38:01 2015 +0000 KNF commit 12c51df0e6b080a144577d55232741189b1a409b Author: maxv Date: Sun Apr 26 09:20:09 2015 +0000 Be a bit more verbose if the kernel rejects a file commit 8baf08853fcffed8bc545c5d68ec0f6106608c7c Author: maxv Date: Sun Apr 26 09:16:06 2015 +0000 If we already have an entry for the file being loaded, return EEXIST, don't silently skip it. commit 120432677cbfbf842a466efa0e1446192d0f661a Author: maxv Date: Sun Apr 26 06:19:36 2015 +0000 ffs_superblock_validate(): check the size of cylinder groups. commit 7e91eb6ddb858aeb82df065bb2aefe290c1c82b1 Author: maxv Date: Sat Apr 25 19:10:29 2015 +0000 Make veriexec_renamechk() more readable. Also add a KASSERT on vte_count. No real functional change commit bc430abcabd793829a46464ffe7066a56cab8e7c Author: maxv Date: Sat Apr 25 18:43:13 2015 +0000 Instead of duplicating code, add veriexec_fp_status(). Also reorder a useless goto. commit c35516e8a9a24e914fe0afd68384ed440b7ed6c3 Author: maxv Date: Sat Apr 25 09:08:51 2015 +0000 Don't mix veriexec lock and file lock in veriexec_file_verify(). Now: - 'veriexec_op_lock' needs to be held when calling veriexec_file_verify() - the 'file_lock_state' argument indicates if the file is locked - add some KASSERTs commit 93f9af897ce727e11b86b1d85e07741b00f90f0f Author: maxv Date: Sat Apr 25 08:19:06 2015 +0000 KNF commit 0f08c756e0b900137c5c5b3cdf1291e01f25b384 Author: maxv Date: Wed Apr 22 07:27:09 2015 +0000 Instead of duplicating code, create ffs_is_appleufs(): returns 1 if the device is an AppleUFS FS, 0 otherwise. This changes the behavior a bit: if the kernel cannot determine whether the disk is an AppleUFS one or not, it now considers it as a normal UFS rather than returning an error and not mounting/reloading it. No particular comment on tech-kern@ commit 355baf4f46b1206a20d7c8cd1fd3e0e139bb1297 Author: maxv Date: Mon Apr 20 14:10:31 2015 +0000 Fix the French translation. 
commit cc008f88087b819bc7b4b2359393f22ce7bd2446 Author: maxv Date: Sun Apr 19 16:14:03 2015 +0000 Several fixes for the French translation. Looks like the '{\n' break the interface: the "No" buttons sometimes disappear. Actually I can't test this change right now; will see tomorrow. commit 08e37649f53d9711a01333a512e21159229c73fe Author: maxv Date: Fri Apr 10 11:47:12 2015 +0000 Fix a double free. "Suggested" by Brainy. ok rjs@ riastradh@ commit 42ef7d264b453c7bc9c4408dafec133036510592 Author: maxv Date: Sat Apr 4 06:00:12 2015 +0000 ffs_superblock_validate(): ensure fs_ncg!=0 and fs_maxbpg!=0 to prevent several divisions by zero. commit 3254b5510b7546c0bdcf40e36f959d878b0ad0d4 Author: maxv Date: Sat Mar 28 19:29:16 2015 +0000 7.99.8 (bread, breadn) commit f61b6169f825204d0ecf7bb2725de8e0b2854450 Author: maxv Date: Sat Mar 28 19:24:04 2015 +0000 Remove the 'cred' argument from bread(). Remove a now unused var in ffs_snapshot.c. Update the man page accordingly. ok hannken@ commit 4ef29bad3d4d5e225c5e4e327d13c4fe438f8177 Author: maxv Date: Sat Mar 28 17:23:42 2015 +0000 Remove the 'cred' argument from breadn(), and update the man page accordingly. ok hannken@ commit 9db97a6a13275b0f7c56731a69509a32dbc6bd05 Author: maxv Date: Sat Mar 28 16:55:21 2015 +0000 Remove the 'cred' argument from bio_doread(). commit 18c81562cdc9217f390ea8e4ce104ac5c2e61331 Author: maxv Date: Fri Mar 20 20:36:27 2015 +0000 Zero-fill the ELF auxiliary vectors. Otherwise, on 64bit systems, the padding between a_v and a_type contains kernel garbage, therefore exposed to userland. Original report by uebayasi@ commit e2a99c33da6f8fdc7ec61ad5b15204c9e7fe21ff Author: maxv Date: Sun Mar 15 09:21:01 2015 +0000 ffs_reload(): fix a bug that prevents Big Endian FSes from being reloaded. 'newfs' should be tagged as FS_SWAPPED, not 'fs'. Was here before my changes. While here, also KNF a bit. 
commit b20cf6ae252af991047e5bc9132a50ea6e84035e Author: maxv Date: Sat Mar 14 19:52:54 2015 +0000 ffs_superblock_validate(): ensure fs_ipg and fs_fpg are != 0. Otherwise division by zero in several places. commit e0f295570c26deecd0b61b68e2784a7db767767a Author: maxv Date: Tue Mar 10 12:59:32 2015 +0000 ffs_superblock_validate(): check the number of inodes per block. Otherwise a malformed value could panic the system. commit e321468b53ab0ac0de03282b9b12e3204499592d Author: maxv Date: Fri Mar 6 19:03:30 2015 +0000 Fix uninitialized variable. Found by The Brainy Code Scanner in FreeBSD. commit d7f1de926a7fdb8f0d57d13f592fea1914869ee5 Author: maxv Date: Tue Mar 3 17:56:51 2015 +0000 ffs_reload(): release 'bp' earlier commit d4e60839d7bda1e4b1e310ee37b70553541b56a8 Author: maxv Date: Tue Mar 3 17:46:39 2015 +0000 ffs_reload(): the current implementation blindly guesses critical fields of the superblock didn't change. Add checks to ensure they didn't change for real. This prevents several memory corruptions. commit db82f426a3ccd09f67ea7c58c03b42c15ff7d898 Author: maxv Date: Mon Feb 23 17:05:58 2015 +0000 Hum. Perhaps I missed a bit of the specification. Let's not be that severe when checking the superblock. Should fix ATF. commit 0e8b721d34908fc286367f147e5a1db85de4f814 Author: maxv Date: Mon Feb 23 13:38:54 2015 +0000 Small changes: - instead of always calling DPRINTF with __func__, put __func__ directly in the macro - ffs_mountfs(): rename fsblockloc -> fs_sblockloc, initialize fs_sbsize to zero No real functional change commit b4605d5c7fb480994863d09e77d6480857737fb6 Author: maxv Date: Sun Feb 22 14:55:23 2015 +0000 Merge _sbcompute() and _sbcheck() into _sbfill(). In ext2fs_sbfill(), check more fields of the superblock, to prevent several kernel panics when mounting/unmounting a disk. commit 1ede3ff623961940f5de97a4c1569a480d57e16e Author: maxv Date: Sun Feb 22 14:22:34 2015 +0000 ffs_superblock_validate(): sanitize fs_fragshift, fs_bmask and fs_fmask. 
commit 910a6ba8f1bbc196f3818b0a4bc027c5daede885 Author: maxv Date: Sun Feb 22 14:12:48 2015 +0000 KNF, and simplify a bit. No functional change commit 2f08dc6f66bc24fdb2ef875aba6dfc78626b5ed6 Author: maxv Date: Fri Feb 20 17:44:54 2015 +0000 Several fixes: - rename ext2fs_checksb() -> ext2fs_sbcheck(): more consistent - in ext2fs_sbcheck(), add a check to ensure e2fs_inode_size!=0, otherwise division by zero - add ext2fs_sbcompute(), to compute dynamic values of the superblock. It is done twice in _reload() and _mountfs(), so put it in a function. - reorder the code in charge of loading the superblock: now, read the superblock, swap it directly, and *then* pass it to ext2fs_sbcheck(). It is similar to what ffs now does. It is better since the fields don't need to be swapped on the fly in ext2fs_sbcheck(). Tested on amd64. commit c82fa469f86884f0068afb08e3c0b66d1d0f9b4c Author: maxv Date: Fri Feb 20 17:10:17 2015 +0000 Style, and fix a DPRINTF No functional change commit 6d3f0115552394e5cfd5fcde5705e34a8e86c08f Author: maxv Date: Fri Feb 20 17:08:13 2015 +0000 Cosmetic changes: - add a ffs-like ntfs_superblock_validate function - remove unused includes - fix some comments - KNF No functional change. commit fd3aa5ffbc07374dd01f4275d1e8b8af50b2bb9c Author: maxv Date: Thu Feb 19 21:31:44 2015 +0000 e2fs_sbcheck(): add a check to ensure e2fs_bpg!=0. Otherwise the kernel panics with a division by zero. While here, remove the #ifdef's. commit 1a6e68e0c599bc827c7bb680ef45ddb03b85b0a1 Author: maxv Date: Sun Feb 15 11:04:43 2015 +0000 Revert a change in my previous commit that broke the checksum calculation. Noted by dholland@ commit 4801c66a96dede2ba836ede0abe4d49ca914ecfe Author: maxv Date: Sat Feb 14 13:43:28 2015 +0000 ffs_superblock_validate(): when checking the number of frag blocks, also make sure it matches fs->fs_frag. This also prevents an infinite loop if fs->fs_frag=0. 
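Note on the ffs_superblock_validate() commits in this log: taken together they reject superblocks whose fields would later cause a division by zero, a zero-sized allocation or an endless loop. The sketch below shows the general shape of such a validator on a hypothetical reduced structure. It is not the kernel function: the field subset, the MODEL_MINBSIZE value and the names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MINBSIZE	4096	/* assumed minimum block size for this sketch */

struct sb_model {		/* hypothetical subset of superblock fields */
	int32_t bsize;		/* block size in bytes */
	int32_t fsize;		/* fragment size in bytes */
	int32_t frag;		/* fragments per block */
	int32_t ipg;		/* inodes per cylinder group */
	int32_t fpg;		/* fragments per cylinder group */
	int32_t ncg;		/* number of cylinder groups */
};

static bool
is_pow2(int32_t x)
{
	return x > 0 && (x & (x - 1)) == 0;
}

/* Return true only if every field is safe to divide by or loop on. */
static bool
sb_model_validate(const struct sb_model *sb)
{
	if (sb->bsize < MODEL_MINBSIZE || !is_pow2(sb->bsize))
		return false;
	if (!is_pow2(sb->fsize) || sb->bsize < sb->fsize)
		return false;
	/* Later code divides by these, so zero is rejected up front. */
	if (sb->ipg == 0 || sb->fpg == 0 || sb->ncg == 0)
		return false;
	/* The fragment count must agree with the two sizes. */
	if (sb->frag == 0 || sb->bsize / sb->fsize != sb->frag)
		return false;
	return true;
}
```

Running all checks before any derived value is computed is what lets the rest of the mount path use the fields without re-checking them.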
commit 4a4652df38cb4e8dd9cc8bd45a85e09e52dd9be0 Author: maxv Date: Sat Feb 14 10:21:29 2015 +0000 ffs_superblock_validate(): compute fs_bshift and fs_fshift, and ensure they are consistent with what is indicated in the superblock. This allows us to safely use some ffs_ macros. commit b2f197f1320fe6998ebae7589a941231a4ccc5db Author: maxv Date: Sat Feb 14 09:55:53 2015 +0000 In fact, we need to sanitize the superblock *after* swapping it. Therefore, move the swap code inside the loop. 'fs->fs_sbsize' is swapped twice: the first time in order to get the correct superblock size, and later when swapping the whole superblock structure. As a result, we need to check 'fs->fs_sbsize' twice. This: - fixes my previous changes for swapped FSes - allows the kernel to look for other superblock locations if the current superblock is not validated And now: - ffs_superblock_validate() takes only one argument: the fs structure - 'fs_bsize' is unused, so delete it Add some comments to explain a bit what we are doing. commit a77ab2918b16a654406697cd856496114326a3d2 Author: maxv Date: Sat Feb 14 09:06:11 2015 +0000 Two typos: - "preferrably" -> "preferably" - "overriden" -> "overridden" No functional change. commit e3e449ae95a64aaf91442fc7f0b2e72bf725ee8a Author: maxv Date: Sat Feb 14 09:00:12 2015 +0000 ffs_superblock_validate(): sanitize the number of frag blocks. commit e0ca1a1d2c4fc28e9ded019ebcc11a03b88d1105 Author: maxv Date: Sat Feb 14 08:07:39 2015 +0000 ffs_appleufs_validate(): - remove superfluous printfs - ensure ul_namelen!=0, otherwise the kernel accesses ul_name[-1] and overwrites the previous field in the structure. commit 344c734f9de33c7d8c039efcd8050e4f0d73b745 Author: maxv Date: Sat Feb 14 07:56:31 2015 +0000 KNF. No functional change. commit d320e9f0aadb3eddf62412f1cdc6541334ef9b60 Author: maxv Date: Sat Feb 14 07:41:40 2015 +0000 Currently, in ffs_reload(), we don't handle the possibility that the superblock location may have changed. 
But that implies that we don't handle the possibility that its size may have changed either. Therefore: add a check to ensure the size hasn't changed. Otherwise the mismatch leads to a memory corruption with kmem. commit 17a3406ab266a25ec0d364fd0d2030abc9ab0378 Author: maxv Date: Sat Feb 14 07:20:11 2015 +0000 Style. No functional change. commit 7158e2dac36fbf3f193e96fd84d6f0ddb934721a Author: maxv Date: Sat Feb 14 07:11:34 2015 +0000 ffs_reload(): call ffs_superblock_validate() with the new superblock. commit 681c4e86d11d041c5ab56fbf3ce6b74f3126a71d Author: maxv Date: Fri Feb 13 17:55:24 2015 +0000 ... and I forgot to actually remove kern_verifiedexec.c. As I said in the first revision of kern_veriexec.c: rename kern_verifiedexec.c to kern_veriexec.c. The old history is now in Attic/, and no change between kern_verifiedexec.c and kern_veriexec.c. okayed by christos@ and blymn@ some months ago. commit 8f9f36e470c09e6bdfac6edd9deed70eee08c68d Author: maxv Date: Fri Feb 13 17:50:48 2015 +0000 Rename kern_verifiedexec.c to kern_veriexec.c. "Veriexec" is the name of the subsystem, not "Verifiedexec". The revisions of kern_verifiedexec.c are now in Attic/. No change between kern_verifiedexec.c and kern_veriexec.c. Also, update the man page accordingly. Okayed by christos@ and blymn@ some months ago. commit fd6143f8736865f34e43222b42e941474f4a7fdf Author: maxv Date: Fri Feb 13 17:13:20 2015 +0000 ffs_superblock_validate(): ensure fs->fs_cssize!=0, otherwise the kernel panics with kmem_alloc(0). commit 9ff5504e9f0e506cef73ae6d15344f6f157599ea Author: maxv Date: Fri Feb 13 16:59:52 2015 +0000 Add some checks in ffs_superblock_validate(): - fs_bsize < MINBSIZE - !powerof2(fs_bsize) - !powerof2(fs->fs_fsize) - fs_bsize < fs->fs_fsize Based on makefs/ffs. commit 0112a179ac557e6d0cf971588daa41335094431a Author: maxv Date: Fri Feb 13 15:52:29 2015 +0000 Add a new function: ffs_superblock_validate(). 
And add a new check to ensure fs_size!=0; otherwise the kernel panics with a division by zero. commit 4927ed650eee02a9c8b27d0dabf6fadb86e1ac65 Author: maxv Date: Fri Feb 13 15:28:56 2015 +0000 Make this a bit more readable. No functional change. commit 2347d7d9d2b8ab4d484fdc918d2548395981cd88 Author: maxv Date: Fri Feb 13 13:26:50 2015 +0000 Remove this MALLOC_DEFINE (M_PMF unused). commit c3cc9b8eaf50a726caf32cb8ffef155ee460731e Author: maxv Date: Sat Feb 7 10:40:57 2015 +0000 Revert previous, it was a false positive. In nilfs_mount_device() there's one branch where the node is not released: when the device is already mounted. Not releasing it was thus intentional, but this is something code scanners can't understand. commit f9fac3477b851096d2b0f29566170feffc99f53a Author: maxv Date: Fri Feb 6 18:21:29 2015 +0000 Don't include commit dfceeddcfa42f666414317b5c7cfc69ef9f007be Author: maxv Date: Fri Feb 6 18:19:22 2015 +0000 Kill kmeminit(). commit e7a966224c030961015d82342d5f7998f569dac8 Author: maxv Date: Fri Jan 16 17:02:12 2015 +0000 Fix a node leak. Sent on tech-kern@, tested by martin@ commit c5c9599d493d651299e28800ee2da249e07b29ec Author: maxv Date: Mon Dec 29 17:17:54 2014 +0000 Small cleanup: - KNF - malloc + memset -> malloc(|M_ZERO) - no need to check data == NULL commit cca151cb932eb8eccd2cd8c20563543407d9ead3 Author: maxv Date: Mon Dec 29 17:02:39 2014 +0000 I started to KNF this file but quickly ended up figuring out I was not courageous enough for such ugliness. So I only KNF'ed the first 300 lines. I'll come back later. commit 9eb43a8055ee2e72be289d9e15cdb64f4801ed3a Author: maxv Date: Mon Dec 29 16:37:27 2014 +0000 Typos: - "nessesary" -> "necessary" (comment) - "UNEXISTED" -> "NON-EXISTENT" (dprintf) - "NON-EXISTANT" -> "NON-EXISTENT" (dprintf) - "reach" -> "reaches" (comment) commit 5c93a2aa31192ae03d3fb9c666719fe32d2ad6ac Author: maxv Date: Sun Dec 28 14:42:56 2014 +0000 Make this more readable (KNF). 
commit cc7a520dacb7d2d6f718d249f41ce41e851673e0 Author: maxv Date: Sun Dec 28 13:11:52 2014 +0000 Prevent another division by zero in ntfs_loadntnode() by ensuring spc != 0. commit bb57954c28d6846e0bfe813c8d15dec4ad482029 Author: maxv Date: Sun Dec 28 12:57:44 2014 +0000 Ensure bps != 0 to prevent a division by zero. Zero byte per sector makes no sense. commit ca4df98693e96a3d2304b86e3b2448184032002a Author: maxv Date: Sun Dec 28 12:19:21 2014 +0000 Two typos: - reserver4 -> reserved4 (in struct bootfile) - "inducates" -> "indicates" (comment) commit 12f756c2222b3c2f7724b4c669821c2e72608f4a Author: maxv Date: Sun Dec 28 12:13:22 2014 +0000 Make this more readable (KNF). commit c49e353f549fd6a7342659baef95bea3c172c83f Author: maxv Date: Sat Dec 27 19:32:57 2014 +0000 Cleanup: - remove struct kmembuckets (dead) - correctly deadify MALLOC_XX - remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead) - remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT() and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc commit 33eaa8efa718426d976b9d2f55a007ce67415a36 Author: maxv Date: Fri Dec 5 17:26:21 2014 +0000 User-triggerable kmem_alloc(0). Ok martin@ christos@ User commit 0ea362857710e3bc649449cc99f3972a0d993d24 Author: maxv Date: Fri Nov 14 17:34:23 2014 +0000 Do not uselessly include . commit 321cf7e096dbd30a1c55cc9430db310a786c3916 Author: maxv Date: Mon Nov 10 18:46:33 2014 +0000 Do not uselessly include . commit 1d116dd6f1a5b67f0f596b4e0f4af366c91d9a90 Author: maxv Date: Sun Nov 9 18:23:28 2014 +0000 Do not uselessly include . commit 0da56f7c30a244c94764bd607d462420a0a01639 Author: maxv Date: Sun Nov 9 18:08:06 2014 +0000 Do not uselessly include . commit 0d955aa7ed8bdcbbd4611c526e06b655f0a24ccb Author: maxv Date: Sun Nov 9 17:48:07 2014 +0000 Do not uselessly include . commit 404da71becf1ff11a40af95f582fb99bb124b0b5 Author: maxv Date: Tue Nov 4 16:01:58 2014 +0000 Do not release secmodels_lock when it is not held. 
Sent on tech-kern@, ok lars@ commit 6f9d642323d2da534c2f1ca022bb14bda4799911 Author: maxv Date: Thu Oct 30 17:13:41 2014 +0000 Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy will read beyond the allocated buffer. Discussed a bit on tech-kern@. commit 752a2c84f7e0f7d836aedde637f8fa724ba8d2a8 Author: maxv Date: Thu Oct 30 16:45:28 2014 +0000 Reject non-regular files. Patch from njoly@. commit c3c9c5c9a9cca3e4461d9a039ed644d85a6fd3e1 Author: maxv Date: Mon Oct 20 08:20:08 2014 +0000 The userland namelen is size_t, but the kernel holds it in an int. The sizeof(login) test implicitly interprets 'namelen' as unsigned, which means that negative values get kicked anyway. Still this is fragile, so: int -> size_t commit cba50ac9822e1b9b91a97a6625167e8b19cdf451 Author: maxv Date: Mon Oct 20 06:56:38 2014 +0000 Memory leak, triggerable from root only. Found by my code scanner. ok christos@ commit 858a9750c1cc53661cd7d7ea924953423b5a2172 Author: maxv Date: Mon Oct 20 06:41:51 2014 +0000 Memory leak. Found by my code scanner. ok christos@ commit c02afeb9577e19b7f13510b9ffcb30c65b5ad79b Author: maxv Date: Sun Oct 19 17:33:58 2014 +0000 Resource leak. Found by my code scanner. Tested by njoly@; ok njoly@ rmind@ on tech-kern@. commit f1e5d176f7b86e33ac047fdfebcd02e3f16f6d23 Author: maxv Date: Fri Oct 10 16:29:56 2014 +0000 I'm not sure reading from an unsanitized userland pointer is a good idea. Some users might be tempted to give 0x01, in which case the kernel will crash. commit 4c2c78f28526d0b490e187a412448111542bba7e Author: maxv Date: Sun Aug 24 12:48:58 2014 +0000 Ensure nbytes > 0. Otherwise bad things may happen. Compile-tested only. ok christos@ commit e3fa51890df1655051939f1179d3340e7750e1c9 Author: maxv Date: Thu Aug 21 06:40:35 2014 +0000 Remove dead returns: return VAR/func(XX); return VAR; The latter is never reached. Sent on tech-kern@, no disagreement. 
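The namelen entry above relies on a C conversion rule: comparing an int against a sizeof result converts the int to size_t, so a negative length turns into a very large value and fails the bound check. A minimal sketch of both variants follows, assuming a hypothetical buffer size `TOY_LOGIN_MAX` and hypothetical function names; it is not the kernel code the commit touched.

```c
#include <stddef.h>

#define TOY_LOGIN_MAX	17	/* hypothetical buffer size */

/* Length held in an int, as before the change. Returns -1 if rejected. */
int
toy_check_int(int namelen)
{
	char login[TOY_LOGIN_MAX];

	/*
	 * sizeof yields a size_t, so the usual arithmetic conversions
	 * would turn 'namelen' into a size_t anyway; the cast only
	 * spells that out. A negative value becomes huge and is rejected.
	 */
	if ((size_t)namelen > sizeof(login))
		return -1;
	return 0;
}

/* Length held in a size_t, as after the change: no hidden conversion. */
int
toy_check_size(size_t namelen)
{
	char login[TOY_LOGIN_MAX];

	if (namelen > sizeof(login))
		return -1;
	return 0;
}
```

Both functions reject the same inputs; the size_t version simply stops depending on an implicit conversion that a later edit to the test could silently remove.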
commit b818eea4ee14c5ef97b2aa1fb12b76b6d33af58b Author: maxv Date: Sat Aug 16 17:27:09 2014 +0000 http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2 #03-0x02: Memory leak ok ozaki-r@ commit af0e11f098790e99eec442c181cb0a52b28b7e2d Author: maxv Date: Thu Aug 14 17:29:30 2014 +0000 http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2 #06-0x01: Empty compiler block ok christos@ commit d067e0d2d1ee3bcd3b4638a19289c64480fbd748 Author: maxv Date: Thu Aug 14 14:06:53 2014 +0000 Overflow if *data_len == OSIZE and args->version >= PTYFS_ARGSVERSION. Sent on tech-kern@, ok christos@ commit 8b9f5a76b24743e708c0b85adf8ebe3bd50cac79 Author: maxv Date: Tue Aug 12 06:57:20 2014 +0000 http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2 #04-0x01: Uninitialized var 'rqp' (does not compile anyway) commit d725bf115ece734a98bd6db2de06b8afde0c8072 Author: maxv Date: Tue Aug 12 06:49:10 2014 +0000 http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html#Report-2 #04-0x02: Remove 'doclusterread' and 'doclusterwrite' (unused). commit f7d1ffdc2e338b21b5da91e5672bddf3f0630171 Author: maxv Date: Mon Aug 11 14:02:14 2014 +0000 1) 'error' is returned while it does not even hold an error code. Which means that zero is returned, and the kernel keeps mounting (and it probably ends up in a deadlock/memory corruption somewhere). 2) 'nentries' and 'gnentries' are int and user-controlled, and there's no check to ensure they are greater than zero. Since they are used to compute the size of two copyin's, a user can control the copied size by giving a negative value (like 128-2^29), and thus overwrite kernel memory. Both triggerable from root only. commit de6810d06f2fc626acf378c8b2c9b07290afe978 Author: maxv Date: Thu Jul 31 12:35:33 2014 +0000 Just return sys_open(). COMPAT_10 will be handled internally. 
ok christos@ commit 74e98f65c6d997b2568c2431b70a3889650cbced Author: maxv Date: Fri Jul 25 16:28:12 2014 +0000 'result' -> 'error' commit 93cc1540087debc9fba84e8894a291f500d14487 Author: maxv Date: Fri Jul 25 16:23:13 2014 +0000 Remove ELF_ROUND and ELF_TRUNC (unused). Found by my code scanner. commit 411c285f096225e475852ea70eb3b30c64d6416b Author: maxv Date: Tue Jul 22 08:18:33 2014 +0000 1) On 64bit systems, don't add the 32bit execsw[] to the global exec array. exec_elf32 works on 32bit systems only, and will crash 32bit binaries on 64bit systems. 2) Now that exec_elf32 is dormant, we can give the native ELF loaders the highest priority. Binaries will load faster now (system boot, compilation, etc.). With the help of njoly@. Discussed a bit on tech-kern@, no disagreement. commit 524e06d13fc85f69d515772813fcbdd9f9c46b3f Author: maxv Date: Tue Jul 22 07:38:41 2014 +0000 Enable KMEM_REDZONE on DIAGNOSTIC. It will try to catch overflows. No comment on tech-kern@ commit f15e7710141a8a3886043c9fb56f125e1d4b1749 Author: maxv Date: Fri Jul 18 17:24:34 2014 +0000 Make DPRINTF more understandable, and replace my previous #ifdef DIAGNOSTIC... commit 50b315fc7f505993a6a5a3c88259a27bf7e05a97 Author: maxv Date: Fri Jul 18 16:25:17 2014 +0000 Fix the ATF failures caused by my recent smbfs change (smbfs_vfsops.c -r1.103). ok pooka@ commit 5fdb925c9f8dfd7a3c111e7dd16dcdc9829efa55 Author: maxv Date: Wed Jul 16 20:09:00 2014 +0000 Limit the minimum size of a disk sector to 512 bytes, to prevent memory overflow on extremely low secsize. This normally conforms to the old standard (for which there doesn't seem to be a clear spec). Since 2011, IDEMA's Advanced Format standardizes it to 4k, so this change won't cause any trouble on new devices. Put the printf under DIAGNOSTIC temporarily to see if someone complains.
after a quick discussion on tech-kern commit 8cedd53624017b83d694c1434726e72ea25cd9dc Author: maxv Date: Wed Jul 16 13:26:33 2014 +0000 Keep setting 'error' as appropriate (even if this place is broken enough to crash in many other ways...) commit 3e292afccec2c218ca9bfc205c7b62ddfe3671ec Author: maxv Date: Mon Jul 14 16:29:50 2014 +0000 smbfs depends on nsmb, so add the dependency as appropriate. Fixes # modload smbfs on modular kernels, PR kern/40011, and probably system crashes. commit 202290b6f0885875f8f14699aeac1eea31c4bad5 Author: maxv Date: Mon Jul 14 16:06:48 2014 +0000 Tell which dependency has failed commit 9dd7d2e5d7727493114e4a848c1128f648ffd21a Author: maxv Date: Fri Jul 11 16:22:49 2014 +0000 netbsd32 should depend on exec_elf32, since it will use exec_elf32's functions. This fixes # modload compat_netbsd32 when exec_elf32 is not loaded. ok njoly@ commit 29664c5236edb59bbcc82428cdbd7a9ffb496042 Author: maxv Date: Thu Jul 10 19:21:46 2014 +0000 Simplify a bit commit bcba01eb7580347c0b2919347602a8bd3da8e1d0 Author: maxv Date: Thu Jul 10 19:12:07 2014 +0000 Fix a user-controlled memory allocation. kmem_alloc(0) will panic the system. ok christos@ commit 3ec5dd65ce579048441001059901038df72b9383 Author: maxv Date: Wed Jul 9 09:00:18 2014 +0000 Minor changes: - malloc()+memset() -> malloc(|M_ZERO) - rename 'vers' to 'FSVers' - declare 'ExtFlags' instead of calling getushort() two times commit a246387a59eeed5de71c65cfc266dda683c655c3 Author: maxv Date: Wed Jul 9 08:43:54 2014 +0000 Remove ROOTNAME (unused). commit 25495aa6232c82992a86906a444535d654dc7f99 Author: maxv Date: Wed Jul 9 06:04:16 2014 +0000 What a terrible use-after-free commit 18d5fddcbc48402dc529c403f63747669fe730e5 Author: maxv Date: Wed Jul 9 05:50:51 2014 +0000 - limit the number of sections with ELF_MAXSHNUM - fix the (symstrindex > hdr->e_shnum) check: it should be >=, otherwise there's an off-by-one - fix the (symstrindex < 0) check: the value is unsigned, so it can't be <0. 
However, we should ensure that symstrindex!=0 (done with SHN_UNDEF) - set 'error' as appropriate - ensure that e_shstrndx < hdr->e_shnum, to prevent out-of-bound reads Fixes several crashes that could occur when loading a kernel module. Quick glance from martin@ commit eb33c2a4aa55768479cb0bdbc6669d6be7d1741a Author: maxv Date: Tue Jul 8 19:34:47 2014 +0000 - Perform sanity checks not just for GEMDOSFS, but for all FAT devices. This also fixes a division-by-zero bug that could crash the system. - Define GEMDOSFS_BSIZE instead of a hard-coded 512 value, and remove 'bsize'. - Rename 'tmp' to 'BlkPerSec'. From me, FreeBSD, OpenBSD and the FAT specification. ok christos@ commit e8c7d251a2a90fa8dafa452f4a9fa0ffbfb69c48 Author: maxv Date: Tue Jul 8 17:16:25 2014 +0000 Define ELF_MAXNOTESIZE, ELF_MAXSHNUM and ELF_MAXPHNUM in , so that it can be used externally. commit 4bf2499d46a2f12eca8b6f3e79c34986c81ee993 Author: maxv Date: Sun Jul 6 15:35:32 2014 +0000 Remove this (symtabindex == -1) check; it is already handled by (nsym != 1). Put a KASSERT instead. commit e6f48ff69a99b568a11233576c39f9035fedeaea Author: maxv Date: Sun Jul 6 15:22:31 2014 +0000 Use a macro instead of always putting __func__ and __LINE__. commit a8e6d7e6ca5887a74393797ec3d83579be4e42be Author: maxv Date: Sun Jul 6 07:41:41 2014 +0000 Check .evs_used==0 instead of .evs_cmds==NULL. evs_cmds would not be NULL if another _makecmds() had allocated and deallocated VMCMDs (not the case currently). commit 180f9cbac004df33d95dcdb11b3df2810e7ff95d Author: maxv Date: Thu Jul 3 08:43:49 2014 +0000 Change the pattern of KMEM_REDZONE so that the first byte is never '\0'. From me and lars@. commit 82a5a2fbf43ea432e1f8e647a28bd61d521826b3 Author: maxv Date: Wed Jul 2 15:00:28 2014 +0000 Fix the KMEM_POISON check: it should check the whole buffer, otherwise some write-after-free's wouldn't be detected (those occurring in the 8 last bytes of the allocated buffer). Was here before my changes, spotted by lars@. 
commit b4e7a946d83200a13ede7c9ef8d3bdaafb65dd96 Author: maxv Date: Tue Jul 1 12:08:33 2014 +0000 1) Define a malloc(9)-like kmem_header structure for KMEM_SIZE. It is in fact more consistent, and more flexible (eg if we want to add new fields). 2) When I say "page" I actually mean "kmem page". It may not be clear, so replace it by "memory chunk" (suggested by lars@). 3) Minor changes for KMEM_REDZONE. commit ff80582eb0b856db1559fac6546d8a0d6150a4c8 Author: maxv Date: Mon Jun 30 17:51:31 2014 +0000 This is weird; 'abort' already does all this, so simply use goto abort. commit bc21039b9f81931a2543fe986c67734368384e1f Author: maxv Date: Mon Jun 30 17:31:15 2014 +0000 Reorder two variables and fix some comments. commit 8d04aaa30c70eadf29d536c5b3c09c68e7797611 Author: maxv Date: Mon Jun 30 17:22:32 2014 +0000 If the interpreter is "", do not keep loading the script (which will later fail), but return ENOEXEC directly. ok christos@ commit b6700c989e18a87570160e368e8e937ac25ee566 Author: maxv Date: Sat Jun 28 15:52:45 2014 +0000 This KASSERT can trigger a panic too easily, if SCARG(uap, cmd)=SWAP_OFF and SCARG(uap, arg)=NULL. The same KASSERT is already in the SWAP_ON switch case, so just delete it here. commit 5279f0e2b19882a564063dc583c786d279420c65 Author: maxv Date: Sat Jun 28 11:39:15 2014 +0000 Sync getfh() with the native implementation. It also fixes: a) a return value b) a vnode lock c) a user-controlled memory allocation ok christos@, on tech-kern commit b9d888ec84ebb4d807c46364840194bf4da4839d Author: maxv Date: Sat Jun 28 11:06:31 2014 +0000 Empty comment commit 186ca86fc354e0d97068909a81667b5d26e8f4cf Author: maxv Date: Wed Jun 25 16:35:12 2014 +0000 1) Make clear that we want the space allocated for the KMEM_SIZE header to be aligned, by using kmem_roundup_size(). There's no functional difference with the current MAX(). 2) If there isn't enough space in the page padding for the red zone, allocate one more page, not just 2 bytes.
We only poison 1 or 2 bytes in this page, depending on the space left in the previous page. That way 'allocsz' is properly aligned. Again, there's no functional difference since the shift already handles it correctly. commit 08225dd9c9dd9a5a607e08f8d0f575c12faf3b71 Author: maxv Date: Wed Jun 25 16:05:22 2014 +0000 Rephrase some comments and remove whitespaces. No functional change. commit 78f115bf76b15bdaaf7f2aa1e3eb025ec702b5d7 Author: maxv Date: Tue Jun 24 14:42:43 2014 +0000 Do not hardcode the value. Use KQ_NEVENTS. commit 0a9d1a3680046133a33457b5b7690c09ccac6cdb Author: maxv Date: Tue Jun 24 14:33:57 2014 +0000 Allocate directly KQ_NEVENTS bytes. Otherwise a user can panic the system. ok christos@ commit 35e7ddc4639ad3d78b9ecff4e4e6b35b52d12fc7 Author: maxv Date: Tue Jun 24 12:17:40 2014 +0000 Remove unused headers. commit 5361dc07d0e78f880ad2ed5c816dcf44f20c4309 Author: maxv Date: Tue Jun 24 11:59:10 2014 +0000 Remove dead code. The kernel already checks for PT_INTERP sections, and puts their content into "itp". There's no need for re-reading the whole binary and trying to find this section again. Just use "itp". DEBUG_FREEBSD_ELF is now unused, so remove its references in amd64/conf/ALL and i386/conf/ALL. commit 90c106f51406a0b6c1f60eff3c9d9f1afa51ec54 Author: maxv Date: Tue Jun 24 10:08:45 2014 +0000 'miliseconds' -> 'milliseconds'. commit 2e7c01fbfdbcff97ef0069386b2b80092f141653 Author: maxv Date: Tue Jun 24 07:28:23 2014 +0000 KMEM_REDZONE+KMEM_POISON is supposed to detect buffer overflows. But it only poisons memory after kmem_roundup_size(), which means that if an overflow occurs in the page padding, it won't be detected. Fix this by making KMEM_REDZONE independent from KMEM_POISON and making it put a 2-byte pattern at the end of each requested buffer, and check it when freeing memory to ensure the caller hasn't written outside the requested area. Not enabled on DIAGNOSTIC for the moment. 
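The red-zone idea in the KMEM_REDZONE entry above (write a 2-byte pattern right after the requested size, verify it on free) can be sketched in userland C on top of malloc(). Everything named here is hypothetical: `toy_alloc()`, `toy_redzone_ok()`, `toy_free()` and the pattern bytes are illustration only and do not reproduce the kmem(9) implementation. As with kmem_free(), the caller passes the size back when freeing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical 2-byte pattern; the real KMEM_REDZONE bytes may differ. */
static const unsigned char toy_rz[2] = { 0xa5, 0x5a };

/* Allocate 'size' usable bytes followed by the 2-byte red zone. */
void *
toy_alloc(size_t size)
{
	unsigned char *p = malloc(size + sizeof(toy_rz));

	if (p == NULL)
		return NULL;
	memcpy(p + size, toy_rz, sizeof(toy_rz));
	return p;
}

/* Returns 1 if the pattern after the requested area is intact. */
int
toy_redzone_ok(const void *p, size_t size)
{
	const unsigned char *end = (const unsigned char *)p + size;

	return memcmp(end, toy_rz, sizeof(toy_rz)) == 0;
}

/* Check the red zone, then release; returns the check result. */
int
toy_free(void *p, size_t size)
{
	int ok = toy_redzone_ok(p, size);

	free(p);
	return ok;
}
```

Because the pattern sits at the end of the requested size rather than after any rounding, a write of even one byte past the request is caught, which is the gap the commit describes for overflows landing in the page padding.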
commit 821257818c09de66f49035603dd85a77aa38b92b Author: maxv Date: Mon Jun 23 18:06:32 2014 +0000 Use KASSERT() instead of #ifdef(DIAGNOSTIC). Clearer. commit 4059d4cb06a2652d4c7340cf3a247fc05f07122f Author: maxv Date: Mon Jun 23 17:43:42 2014 +0000 Enable KMEM_SIZE on DIAGNOSTIC. It will catch memory corruption bugs due to a different size given to kmem_alloc() and kmem_free(), with no performance impact. commit dde88c78cd4ed0410e8721ac04953eb632ebce5c Author: maxv Date: Sun Jun 22 19:09:39 2014 +0000 Sync swapctl() with netbsd32. Return EINVAL when misc<0, and 0 when misc=0 or uvmexp.nswapdev=0. commit 093b3f0948a096c422bb3581c00fa7cf3393a126 Author: maxv Date: Sun Jun 22 18:32:27 2014 +0000 Fix a NULL pointer dereference after a loooong discussion with dholland@, hannken@, blymn@ and martin@. This bug would panic the system when veriexec is set to the VERIEXEC_LOCKDOWN mode (only settable from root). commit 30fd3578a9ac10debf7a437477cdaf86d520dc9a Author: maxv Date: Sun Jun 22 17:36:42 2014 +0000 Put the KMEM_GUARD code under #if defined(KMEM_GUARD). No functional change. commit 196920d25820e595a37ef710f3209c3a1b13f1a4 Author: maxv Date: Sun Jun 22 17:23:34 2014 +0000 A KASSERT() is better. commit 64673a6c31ccede36d85a490c1adcaab48b1cfc2 Author: maxv Date: Sat Jun 21 10:23:07 2014 +0000 If SCARG(uap, what) = 0, copyin() will copy (size_t)-1 bytes, and it's not a good idea; but not proven harmful. With the help of njoly@ commit 9af409764183346ca2ba56ff896fad78a45a3bf0 Author: maxv Date: Tue Apr 22 19:01:47 2014 +0000 Fix a read-beyond-end string read. coredump_buildname() copies 'pattern' into 'name', and handles special characters such as "%n". "%n", if present, will be replaced by p->p_comm. error = coredump_buildname(p, name, pattern, MAXPATHLEN); This function handles overflows, and returns an error when 'name' becomes larger than MAXPATHLEN. 
However, when coredump() calls it, 'name' is used before the error check, with: lastslash = strrchr(name, '/'); 'name' is not guaranteed to be NUL-terminated, because of the *d = *s in coredump_buildname(). This strrchr will read a string which is not NUL-terminated (i.e. until finding a '\0' in memory). 'pattern' can't be longer than MAXPATHLEN. A user can fill it in via a PT_DUMPCORE ptrace call, given the input is not longer than MAXPATHLEN. Since the 2-byte "%n"s will be replaced by p->p_comm (which is user-settable, like a 10-byte "0123456789"), 'name' can become longer than 'pattern' (and thus longer than MAXPATHLEN). Some 'a's at the end of the buffer will make sure 'name' is not NUL-terminated.
pattern: "%n%n%naaaaaaaaaaaaaaaaaaaaaaaaaaaa\0"
-> name: "012345678901234567890123456789aaaaa" [no \0; MAXPATHLEN reached]
Fix it by checking 'error' before calling strrchr. commit 79003296e40c132f4b577c4b89dcea168a83c5bd Author: maxv Date: Sun Apr 20 21:26:51 2014 +0000 This thing is totally buggy: 'data_len' is modified by the fs, so calling kmem_free with it while its value has changed since the kmem_alloc is far from being a good idea. If the kernel figures out that something mismatches, it will panic (typically with kernfs). commit 8039abf772bb28a5e344181abf685c3e426f7619 Author: maxv Date: Fri Apr 18 11:44:31 2014 +0000 'error' is not set on failure. This is a true bug: everything is freed and unlocked while zero is returned. Since there's no error, execve_runproc() will get called and will try to use those freed things. PS: This bug was here before uebayasi@'s changes commit 2e64ee3dba6c36b6d10cda297a889eec8cb3463a Author: maxv Date: Fri Apr 18 05:22:13 2014 +0000 Memory leak (only triggerable from root). ok christos@ commit d7ae0e65229b63ecb165488daf6b3519161e58dc Author: maxv Date: Wed Apr 16 19:25:28 2014 +0000 Some fs's - like kernfs - set their vfs_min_mount_data to zero.
Add a check to prevent an (un)privileged user from requesting a zero-sized allocation (and thus a panic). commit f4f7f4f3d864a2657c445c3ae9702f2a9f282fbc Author: maxv Date: Wed Apr 16 18:55:17 2014 +0000 An (un)privileged user can easily make the kernel dereference a NULL pointer. The kernel allows 'data' to be NULL; it's the fs's responsibility to ensure that it isn't NULL (if the fs actually needs data). ok christos@ commit ab43e87a1c4f5b5e4742a70096b9421997ebd0b8 Author: maxv Date: Tue Apr 15 17:53:09 2014 +0000 There are two times the same branches. } else if (addr == LUSR_OFF(__signal)) { error = ENOTSUP; } else if (addr == LUSR_OFF(__signal)) { error = ENOTSUP; } Just delete one of them. Spotted by my code scanner. ok christos@ commit 021a7f93f66ea2ba47e4a700d7646df43bdf4264 Author: maxv Date: Tue Apr 15 17:29:00 2014 +0000 A specially-crafted binary could easily control a kernel array index. Add some checks to ensure that nothing will be read outside the allocated area. Rewrite the code so that we don't need to allocate the whole section. Spotted by several developers, patch from chs@/enami@ commit 36c872497b3a0d27999a30577df6d02e45f55d7b Author: maxv Date: Tue Apr 15 06:14:55 2014 +0000 There's no need for this NULL-check. commit 4053aa5877afe6c39984ee6eb027751538ea88b1 Author: maxv Date: Wed Apr 9 11:40:03 2014 +0000 'error' is not set on failure. Which means that if copyout() fails, 0 will be returned while the stack is not ready. This is a bug. commit 7d3119d6646577ab5f7194095ecd9c27b2ec0627 Author: maxv Date: Fri Apr 4 06:47:02 2014 +0000 Limit check for 'data_len'. Otherwise a (un)privileged user can easily panic the system by passing a huge size. ok christos@ commit d2d433f1b3055c9e53780994cdb22aa1e42c051f Author: maxv Date: Sat Mar 29 09:31:11 2014 +0000 Style commit dade17c09eb11df6de57af166fb695d212f817f3 Author: maxv Date: Sat Mar 22 08:15:25 2014 +0000 Fix a potential - but very unlikely - NULL pointer dereference. 
(it does not introduce a new error code for open(), since pathbuf_copyin() is already there and can return ENOMEM) Found by my code scanner. commit 8f8c56b208e3f884f01f274df7d7d68061f1ab6c Author: maxv Date: Sat Mar 22 07:46:35 2014 +0000 'newrt' is not supposed to be NULL. Therefore, the NULL-check in the if() is pointless; and even if 'newrt' were NULL, 'rt' would be dereferenced later. This is not a bug. CID 270855 ok christos@ commit 5fb614184d84a815189fb59ac9290245b763edf8 Author: maxv Date: Sat Mar 22 07:27:21 2014 +0000 Small changes: - rename elf_load_file() to elf_load_interp() - use the correct type for 'nused' - remove useless cases - reorder a kmem_alloc ok christos@ commit c512cfa934bfd4e95e52af53385f9d4cf1fc7e38 Author: maxv Date: Sun Mar 16 07:57:25 2014 +0000 Remove the 'prot' argument from elf_load_psection(). It is not used outside, and can be declared locally. Clearer. ok christos@ commit 123e11a61e796f575e287bf558946e1fc59de92a Author: maxv Date: Thu Mar 6 19:46:27 2014 +0000 Fix uninitialized variable. Found by my code scanner. ok christos@ commit 2d55e7ed835ee8d18747cb66c1d0af2152ec23a7 Author: maxv Date: Sat Mar 1 16:59:41 2014 +0000 Some {} are missing. The behavior is thus wrong: the code always jumps to out1. Spotted by my code scanner. ok christos@ commit 721d623b26d0483e2bbaa2070122476ea4d0d9b5 Author: maxv Date: Sat Mar 1 16:46:14 2014 +0000 ';;' -> ';' no functional change spotted by my code scanner ok christos@ commit da9c6bd04bfc02bd63c96f5f8b74c50ff478d421 Author: maxv Date: Thu Feb 27 09:58:05 2014 +0000 We have to ensure the string is NUL-terminated and of the expected length to avoid copying uninitialized data. ok christos@ commit c67949bde0e1afe46fb903faac2b37335bf5ca77 Author: maxv Date: Sat Feb 22 07:53:16 2014 +0000 Simplify error path. ok christos@ commit 4855680df77d28bfccb546e9cfae77b9cb8fa768 Author: maxv Date: Fri Feb 21 08:11:59 2014 +0000 Revert rev1.38. The header already begins with EXEC_SCRIPT_MAGIC="#!". 
So it can't be ELFMAG="\177ELF" at the same time. ok christos@ commit d91ba50b1f2201f6aece5635388d036fdcd2b58b Author: maxv Date: Fri Feb 21 07:53:53 2014 +0000 Increase LINUX32_ELF_AUX_ENTRIES to avoid overrun in linux32/. Also, add comments and KASSERTs to make sure people don't forget to increase XX_AUX_ENTRIES's when adding vectors. Reported by martin@ (CV), with suggestions from chs@. ok martin@ chs@ commit 2c728e7209ae0e0fe3fcf7b24c2a60499f85e66c Author: maxv Date: Fri Feb 21 07:47:02 2014 +0000 Properly check the section size to avoid out-of-bound reads. The computed size must be the exact same size that is indicated in sh_size. ok agc@ christos@ commit 15be23202d43f6d334ab7012fd942f55d53cd898 Author: maxv Date: Wed Feb 19 15:23:20 2014 +0000 We need VMCMDs for a binary and its interpreter, so make sure we have at least one VMCMD. This also prevents the kernel from using an uninitialized pointer as entry point for the execution. From me and Christos ok christos@ commit 1b4ad69824ed9ff41657d2b801f1a451d6dc65fc Author: maxv Date: Mon Feb 17 20:16:52 2014 +0000 Adapt my previous patch differently. read(2) wants EISDIR when the object is a directory. Which also means that tmpfs_read() was returning a wrong error code when dealing with non-regular vnodes. commit 1c305f5aa161e3467a19645698b22cc2daa1b103 Author: maxv Date: Mon Feb 17 19:29:46 2014 +0000 Cosmetic; just replace whitespaces by tabs commit 76a87f4c65e40b6ecf486ab94ed4fa581133f209 Author: maxv Date: Sun Feb 16 17:46:36 2014 +0000 Small cleanup: - make elf_load_file() and elf_load_psection() static - make loops consistent - 'nload' is not used - see rev1.24 - 'ap' is not used in elf_load_file() ok agc@ christos@ commit 53020ff48745403bd8f76dc7d634eda55ddabdcd Author: maxv Date: Sun Feb 16 12:54:07 2014 +0000 Fix tmpfs_read()'s return value; it should return EINVAL. Now consistent with tmpfs_write(). 
ok christos@ commit 2c3e5d3a51df503c64f1a22b56d9faec7f01ee12 Author: maxv Date: Sat Feb 15 16:17:01 2014 +0000 Remove the last argument of elf_check_header(). It is easier - and faster - to check the e_type field in the calling function. Other BSD's already do this. ok christos@ commit 5b63a38401956e5c1b3a3bacefcd04a1b7514328 Author: maxv Date: Fri Feb 14 07:30:07 2014 +0000 Fix memory leak. ok christos@ agc@ commit 9f24517bf24e98d4e1d9ce85b90682d53b8e244a Author: maxv Date: Tue Feb 11 16:00:13 2014 +0000 Fix uninitialized variable. Harmless: it does not change the behavior at all. ok rmind@ christos@ commit 4302522ab9718878f8cc380f5738d534627e77de Author: maxv Date: Sun Feb 9 14:51:13 2014 +0000 Reorder code to avoid using an uninitialized variable: if sysctl_copyin fails, 'tmp' is not initialized. This bug is harmless since only the return value will be different; it does not expose kernel memory unless diagnostic is enabled. ok agc@ martin@ commit fa991d7fd3756e53d4ece7042d25e44b4d905ae0 Author: maxv Date: Sun Feb 9 13:40:59 2014 +0000 Fix error message; argv[1] could be NULL commit 9fee09b61535daae49ccc5bccaca29ecf9c9587d Author: maxv Date: Sat Feb 8 15:50:29 2014 +0000 add myself