Intel hardware only uses 4 bits in MSR_EFER: SCE, LME, LMA and NXE. Changes to
LME and LMA are handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.

SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
and vmx_update_guest_efer(), and works by altering the host SCE value to match
the setting the guest wants. This works because, in HVM vcpu context, Xen
never needs to execute a SYSCALL or SYSRET instruction.

However, NXE has never been context switched. Unlike SCE, NXE cannot be
context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
reserved and causes a pagefault when they are encountered. This means that the
guest always has Xen's setting in effect, irrespective of the bit it can see
and modify in its virtualised view of MSR_EFER.

This isn't a major problem for production operating systems because they, like
Xen, always turn NXE on when it is available. However, it does have an
observable effect on which guest PTE bits are valid, and whether
PFEC_insn_fetch is visible in a #PF error code.
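
To illustrate the PTE side of that observable effect, here is a minimal
standalone sketch (not Xen code). The macro and function names are
hypothetical; only the bit positions (EFER.NXE is bit 11, PTE.NX is bit 63,
PTE.P is bit 0) are architectural. It shows why a guest which believes NXE is
clear, while Xen's NXE=1 is really in effect, does not get the reserved-bit
fault it would expect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; the names are illustrative only. */
#define SK_EFER_NXE    (1ULL << 11)  /* EFER.NXE: No-Execute Enable */
#define SK_PTE_PRESENT (1ULL << 0)   /* PTE.P */
#define SK_PTE_NX      (1ULL << 63)  /* PTE.NX / XD */

/*
 * Hypothetical helper: is PTE.NX a reserved bit violation under the EFER
 * value in effect?  Reserved bits are only checked in present entries.
 */
static bool sk_pte_nx_is_reserved(uint64_t efer, uint64_t pte)
{
    if ( !(pte & SK_PTE_PRESENT) )
        return false;

    return !(efer & SK_EFER_NXE) && (pte & SK_PTE_NX);
}

/*
 * Hypothetical helper: does the guest's expectation (from its virtualised
 * EFER) disagree with what hardware does (from the EFER really loaded)?
 */
static bool sk_nx_behaviour_mismatch(uint64_t guest_view_efer,
                                     uint64_t hw_efer, uint64_t pte)
{
    return sk_pte_nx_is_reserved(guest_view_efer, pte) !=
           sk_pte_nx_is_reserved(hw_efer, pte);
}
```

With guest_view_efer = 0 and hw_efer = SK_EFER_NXE, a present PTE with NX set
is reported as a mismatch: the guest expects a reserved-bit pagefault, but
hardware treats the bit as valid.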

Second generation VT-x hardware has host and guest EFER fields in the VMCS,
and support for loading and saving them automatically. First generation VT-x
hardware needs to use MSR load/save lists to cause an atomic switch of
MSR_EFER on vmentry/exit.

Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
support when available (and MSR load/save lists on older hardware) and drop
all ad-hoc alteration of SCE.
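
The choice between the two mechanisms can be sketched as below. This is a
standalone illustration under stated assumptions, not the code in
vmx_init_vmcs_config(): the enum, macro and function names are hypothetical,
and requiring both the entry-side and exit-side load controls is an assumption
of the sketch. The bit positions are those of the "load IA32_EFER" VM-entry
control (bit 15) and VM-exit control (bit 21).

```c
#include <stdbool.h>
#include <stdint.h>

/* Control bit positions; the names are illustrative only. */
#define SK_VM_ENTRY_LOAD_GUEST_EFER (1u << 15)
#define SK_VM_EXIT_LOAD_HOST_EFER   (1u << 21)

enum sk_efer_switch {
    SK_EFER_VIA_VMCS_FIELDS,  /* dedicated guest/host EFER fields */
    SK_EFER_VIA_MSR_LISTS,    /* fallback: MSR load/save lists */
};

/*
 * Hypothetical helper.  The arguments are the "allowed to be 1" halves of
 * the VM-entry and VM-exit control capability MSRs, as already read by the
 * caller.
 */
static enum sk_efer_switch sk_pick_efer_switch(uint32_t entry_allowed1,
                                               uint32_t exit_allowed1)
{
    bool load_guest = entry_allowed1 & SK_VM_ENTRY_LOAD_GUEST_EFER;
    bool load_host  = exit_allowed1 & SK_VM_EXIT_LOAD_HOST_EFER;

    /* Use the VMCS fields only when both directions are covered. */
    if ( load_guest && load_host )
        return SK_EFER_VIA_VMCS_FIELDS;

    return SK_EFER_VIA_MSR_LISTS;
}
```

Either way the switch of MSR_EFER happens atomically with vmentry/exit, which
is what makes it safe to give the guest its own NXE setting.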

There are two minor complications when selecting the EFER setting:

 * For shadow guests, NXE is a paging setting and must remain under host
   control, but this is fine as Xen also handles the pagefaults.
 * When the Unrestricted Guest control is clear, hardware doesn't tolerate LME
   and LMA being different. This doesn't matter in practice as we intercept
   all writes to CR0 and reads from MSR_EFER, so can provide architecturally
   consistent behaviour from the guest's point of view.
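
The two complications above can be sketched as a pure function from the
guest's virtualised EFER to the value handed to hardware. This is an
illustrative standalone sketch, not the body of vmx_update_guest_efer(); the
function and parameter names are hypothetical, and resolving the LME/LMA
mismatch by clearing LME while LMA is clear is an assumption of the sketch.
The EFER bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bits; the names are illustrative only. */
#define SK_EFER_SCE (1ULL << 0)
#define SK_EFER_LME (1ULL << 8)
#define SK_EFER_LMA (1ULL << 10)
#define SK_EFER_NXE (1ULL << 11)

/*
 * Hypothetical helper: compute the EFER value to load for the guest.
 *   guest_efer         - the guest's virtualised view of MSR_EFER
 *   xen_efer           - the host (Xen) EFER value
 *   shadow             - guest runs on shadow pagetables
 *   unrestricted_guest - the Unrestricted Guest control is set
 */
static uint64_t sk_hw_guest_efer(uint64_t guest_efer, uint64_t xen_efer,
                                 bool shadow, bool unrestricted_guest)
{
    uint64_t efer = guest_efer;

    if ( shadow )
    {
        /* NXE is a paging setting, so it follows the host's choice. */
        efer &= ~SK_EFER_NXE;
        efer |= xen_efer & SK_EFER_NXE;
    }

    /*
     * Without Unrestricted Guest, LME and LMA must not differ.  Assumption:
     * hide LME until LMA is set; CR0 writes are intercepted, so both can be
     * fixed up when the guest enables paging.
     */
    if ( !unrestricted_guest && !(efer & SK_EFER_LMA) )
        efer &= ~SK_EFER_LME;

    return efer;
}
```

For example, a guest which has set only LME, on hardware without Unrestricted
Guest, gets an EFER with LME clear loaded, while reads of MSR_EFER are
intercepted and still return the value the guest wrote.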

Now that the way EFER is loaded has changed, vmcs_dump_vcpu() needs adjusting.
Read EFER from the appropriate information source and, when dumping the guest
EFER value, identify which source was used.

As a result of fixing EFER context switching, we can remove the Intel-special
case from hvm_nx_enabled() and let guest_walk_tables() work with the real
guest paging settings.