When two CPU offline operations run back to back, cpu_state is still
CPU_STATE_DEAD from the first offline when the second one begins. As a
result, __cpu_die() sees the stale value and returns immediately instead
of waiting for the second slave to fully die and set cpu_state itself.
The fix is to set cpu_state to a new value, CPU_STATE_DYING, earlier
during CPU offline, before __cpu_die() starts to execute.
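The stale-state problem and the effect of the fix can be illustrated with
a small single-threaded model. This is only a sketch and not the
hypervisor code: offline_once() and its with_fix parameter are
hypothetical names, and using CPU_STATE_CALLIN as the value left behind
by bringup is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the patched enum; only the values the model needs. */
enum cpu_state {
    CPU_STATE_DYING,   /* slave -> master: I am dying */
    CPU_STATE_DEAD,    /* slave -> master: I am completely dead */
    CPU_STATE_CALLIN   /* assumed here to be the value left after bringup */
};

/*
 * Hypothetical model of one offline operation.
 * Returns true if the master, polling while the slave is still alive,
 * would see that it has to wait (cpu_state != CPU_STATE_DEAD).
 */
static bool offline_once(enum cpu_state *state, bool with_fix)
{
    /* Patched code marks the CPU as dying before the master starts polling. */
    if (with_fix)
        *state = CPU_STATE_DYING;

    /* Master-side check, modelled on the loop condition in __cpu_die(). */
    bool master_waits = (*state != CPU_STATE_DEAD);

    /* Slave finishes dying and reports it. */
    *state = CPU_STATE_DEAD;

    return master_waits;
}
```

Without the fix, the first call on a state of CPU_STATE_CALLIN returns
true and the second returns false, because the master sees the leftover
CPU_STATE_DEAD. With the fix, both calls return true.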
Original diagnosis and patch by Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 22975:d3d29df8f082
xen-unstable date: Sat Mar 05 11:34:41 2011 +0000
static int cpu_error;
static enum cpu_state {
- CPU_STATE_DEAD = 0, /* slave -> master: I am completely dead */
+ CPU_STATE_DYING, /* slave -> master: I am dying */
+ CPU_STATE_DEAD, /* slave -> master: I am completely dead */
CPU_STATE_INIT, /* master -> slave: Early bringup phase 1 */
CPU_STATE_CALLOUT, /* master -> slave: Early bringup phase 2 */
CPU_STATE_CALLIN, /* slave -> master: Completed phase 2 */
extern void fixup_irqs(void);
int cpu = smp_processor_id();
+ set_cpu_state(CPU_STATE_DYING);
+
local_irq_disable();
clear_local_APIC();
/* Allow any queued timer interrupts to get serviced */
while ( cpu_state != CPU_STATE_DEAD )
{
+ BUG_ON(cpu_state != CPU_STATE_DYING);
mdelay(100);
cpu_relax();
process_pending_softirqs();