Commit
6e03363 ("x86: Implement TSC adjust feature for HVM guest")
implemented TSC_ADJUST MSR for hvm guests. Though while booting
an HVM guest the boot CPU would have a value set with delta_tsc -
guest tsc while secondary CPUS would have 0. For example one can
observe:
$ xen-hvmctx 17 | grep tsc_adjust
TSC_ADJUST: tsc_adjust
ff9377dfef47fe66
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
Upcoming Linux 4.10 now validates whether this MSR is correct and
adjusts them accordingly under the following conditions: values of < 0
(our case for CPU 0) or != 0 or values >
7FFFFFFF. In this conditions it
will force set to 0 and for the CPUs that the value doesn't match all
together. If this msr is not correct we would see messages such as:
[Firmware Bug]: TSC ADJUST: CPU0: -
30517044286984129 force to 0
And on HVM guests supporting TSC_ADJUST (requiring at least Haswell
Intel) it won't boot.
Our current vCPU 0 values are incorrect and according to Intel SDM which on
section "Time-Stamp Counter Adjustment" states that "On RESET, the value
of the IA32_TSC_ADJUST MSR is 0." hence we should set it 0 and be
consistent across multiple vCPUs. Perhaps this MSR should be only
changed by the guest which already happens through
hvm_set_guest_tsc_adjust(..) routines (see below). After this patch
guests running Linux 4.10 will see a valid IA32_TSC_ADJUST msr of value
0 for all CPUs and are able to boot.
On the same section of the spec ("Time-Stamp Counter Adjustment") it is
also stated:
"If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR
adds (or subtracts) value X from the TSC, the logical processor also
adds (or subtracts) value X from the IA32_TSC_ADJUST MSR.
Unlike the TSC, the value of the IA32_TSC_ADJUST MSR changes only in
response to WRMSR (either to the MSR itself, or to the
IA32_TIME_STAMP_COUNTER MSR). Its value does not otherwise change as
time elapses. Software seeking to adjust the TSC can do so by using
WRMSR to write the same value to the IA32_TSC_ADJUST MSR on each logical
processor."
This suggests these MSRs values should only be changed through guest i.e.
throught write intercept msrs. We keep IA32_TSC MSR logic such that writes
accomodate adjustments to TSC_ADJUST, hence no functional change in the
msr_tsc_adjust for IA32_TSC msr. Though, we do that in a separate routine
namely hvm_set_guest_tsc_msr instead of through hvm_set_guest_tsc(...).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit:
98297f09bd07bb63407909aae1d309d8adeb572e
master date: 2017-01-24 12:37:36 +0100
}
delta_tsc = guest_tsc - tsc;
- v->arch.hvm_vcpu.msr_tsc_adjust += delta_tsc
- - v->arch.hvm_vcpu.cache_tsc_offset;
v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
}
+static void hvm_set_guest_tsc_msr(struct vcpu *v, u64 guest_tsc)
+{
+ uint64_t tsc_offset = v->arch.hvm_vcpu.cache_tsc_offset;
+
+ hvm_set_guest_tsc(v, guest_tsc);
+ v->arch.hvm_vcpu.msr_tsc_adjust += v->arch.hvm_vcpu.cache_tsc_offset
+ - tsc_offset;
+}
+
void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
{
v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
break;
case MSR_IA32_TSC:
- hvm_set_guest_tsc(v, msr_content);
+ hvm_set_guest_tsc_msr(v, msr_content);
break;
case MSR_IA32_TSC_ADJUST: