Just before smp_send_stop() re-enables interrupts when shutting down
for reboot or kexec, it calls __stop_this_cpu() which in turn calls
disable_local_APIC(), which puts the APIC back in to the mode Xen found
it in at boot.
If that means turning x2APIC off and going back into xAPIC mode, then
a timer interrupt occurring just after interrupts come back on will
lead to a GP# when apic_timer_interrupt() attempts to ack the IRQ
through the EOI register in x2APIC MSR 0x80b:
(XEN) Executing kexec image on cpu0
(XEN) ----[ Xen-4.14-unstable x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<
ffff82d08026c139>] apic_timer_interrupt+0x29/0x40
(XEN) RFLAGS:
0000000000010046 CONTEXT: hypervisor
(XEN) rax:
0000000000000000 rbx:
00000000000000fa rcx:
000000000000080b
...
(XEN) Xen code around <
ffff82d08026c139> (apic_timer_interrupt+0x29/0x40):
(XEN) c0 b9 0b 08 00 00 89 c2 <0f> 30 31 ff e9 0e c9 fb ff 0f 1f 40 00 66 2e 0f
...
(XEN) Xen call trace:
(XEN) [<
ffff82d08026c139>] R apic_timer_interrupt+0x29/0x40
(XEN) [<
ffff82d080283825>] S do_IRQ+0x95/0x750
...
(XEN) [<
ffff82d0802a0ad2>] S smp_send_stop+0x42/0xd0
We can't clear the global x2apic_enabled variable in disable_local_APIC()
itself because that runs on each CPU. Instead, correct it (by using
current_local_apic_mode()) in smp_send_stop() while interrupts are still
disabled immediately after calling __stop_this_cpu() for the boot CPU,
after all other CPUs have been stopped.
cf:
d639bdd9bbe ("x86/apic: Disable the LAPIC later in smp_send_stop()")
... which didn't quite fix it completely.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit:
8b1002ab037aeacdece7723c07ab35ca16c1e22e
master date: 2020-02-14 18:01:52 +0000