x86/vmx: Fix security issue when a guest balloons out the #VE info page
The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is
dangerous. After #VE has been set up, the guest can balloon out and free the
nominated GFN, after which the processor may still write to it. Also, the
unlocked GFN query means the MFN may be stale by the time it is used.
Alternatively, a guest can race two disable calls to cause one VMCS to still
reference the nominated GFN after the tracking information was dropped.
Rework the logic from scratch to make it safe.
Hold an extra page reference on the underlying frame, to account for the
VMCS's reference. This means that if the GFN gets ballooned out, it isn't
freed back to Xen until #VE is disabled, and the VMCS no longer refers to the
page.
A consequence of this is that altp2m_vcpu_disable_ve() needs to be called
during the domain_kill() path, to drop the reference for domains which shut
down with #VE still enabled.
For domains using altp2m, we expect a single enable call and no disable for
the remaining lifetime of the domain. However, to avoid problems with
concurrent calls, use cmpxchg() to locklessly maintain safety.
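As a rough illustration of the scheme described above (this is not the code
from the patch), the sketch below uses C11 atomics in place of Xen's
cmpxchg(). The names struct page, struct vcpu_ve, get_page(), put_page(),
ve_enable() and ve_disable() are hypothetical stand-ins. The extra reference
is taken before the page is published and dropped only by the caller that
wins the race to clear it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted page frame. */
struct page {
    atomic_int refs;
};

/* Hypothetical per-vCPU state: the page the VMCS currently refers to. */
struct vcpu_ve {
    _Atomic(struct page *) ve_page;
};

static void get_page(struct page *pg)
{
    atomic_fetch_add(&pg->refs, 1);
}

static void put_page(struct page *pg)
{
    int old = atomic_fetch_sub(&pg->refs, 1);

    assert(old > 0); /* Dropping a reference we never held is a bug. */
}

/* Take the extra reference first, then publish with compare-and-exchange. */
static bool ve_enable(struct vcpu_ve *v, struct page *pg)
{
    struct page *expected = NULL;

    get_page(pg);
    if ( !atomic_compare_exchange_strong(&v->ve_page, &expected, pg) )
    {
        put_page(pg); /* Already enabled: undo our reference. */
        return false;
    }

    return true;
}

/* Only the caller whose compare-and-exchange succeeds drops the reference. */
static bool ve_disable(struct vcpu_ve *v)
{
    struct page *pg = atomic_load(&v->ve_page);

    if ( !pg || !atomic_compare_exchange_strong(&v->ve_page, &pg, NULL) )
        return false;

    put_page(pg);
    return true;
}
```

With this shape, two racing disable calls cannot both drop the reference, and
a second enable call leaves the existing tracking state untouched.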
This doesn't have an XSA because altp2m is not yet a security-supported
feature.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>