ia64/xen-unstable

changeset 15473:34f285b57b87

[IA64] Fix soft lock up caused by xen_timer_interrupt()

This patch fixes a soft lockup caused by xen_timer_interrupt().
The lockup occurs when local_cpu_data->itm_next becomes inconsistent
with stime_irq and itc_at_irq on CPU0 of the hypervisor. The patch
sets stime_irq and itc_at_irq on every call to xen_timer_interrupt()
to avoid this soft lockup.

In other words, it is caused by a conflict between
local_cpu_data->itm_next and domain_itm in xen_timer_interrupt() and
reprogram_timer() (more specifically, vcpu_set_next_timer()).

For example:
1) reprogram_timer() runs, sets local_cpu_data->itm_next, and sets
domain_itm as the next itm.
2) xen_timer_interrupt() is called, but the following condition is not
satisfied:
while (time_after(ia64_get_itc(), local_cpu_data->itm_next))
so the loop body is skipped and stime_irq and itc_at_irq are not set.
3) goto 1)
4) Sometimes local_cpu_data->itm_next rolls back, because
ns_to_cycle() on IA64 can represent only about 32 bits.
(This occurred in reprogram_timer().)
5) This causes the soft lockup.
6) The hypervisor then resumes working (it does not hang).
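
The change in loop shape can be sketched with a small stand-alone model.
This is only an illustration under stated assumptions, not the Xen code:
struct timer_model, handler_old(), handler_new() and time_after_u64() are
hypothetical names, the ITC reading is passed in as a fixed "now" instead
of being re-read with ia64_get_itc(), and the timekeeper update is reduced
to recording itc_at_irq and counting updates.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is later than b", in the style of time_after(). */
static int time_after_u64(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) < 0;
}

/* Hypothetical stand-in for the per-CPU timer state. */
struct timer_model {
    uint64_t itm_next;   /* next timer match value */
    uint64_t itm_delta;  /* cycles per tick */
    uint64_t itc_at_irq; /* ITC value recorded at the last update */
    int updates;         /* number of time base refreshes */
};

/* Loop shape before the patch: body skipped if now is not past itm_next. */
static void handler_old(struct timer_model *t, uint64_t now)
{
    uint64_t new_itm = t->itm_next;

    while (time_after_u64(now, new_itm)) {
        new_itm += t->itm_delta;
        t->itc_at_irq = now;
        t->updates++;
        t->itm_next = new_itm;
    }
}

/* Loop shape after the patch: body always runs at least once. */
static void handler_new(struct timer_model *t, uint64_t now)
{
    uint64_t new_itm = t->itm_next;

    while (1) {
        t->itc_at_irq = now;
        t->updates++;
        t->itm_next = new_itm;

        if (time_after_u64(new_itm, now))
            break;

        new_itm += t->itm_delta;
    }
}

/* Interrupt arrives before itm_next (step 2 above). */
static void demo_early_interrupt(void)
{
    struct timer_model old_t = { 1000, 100, 0, 0 };
    struct timer_model new_t = { 1000, 100, 0, 0 };

    handler_old(&old_t, 900);
    handler_new(&new_t, 900);

    assert(old_t.updates == 0);      /* time base left stale */
    assert(new_t.updates == 1);      /* time base refreshed */
    assert(new_t.itc_at_irq == 900);
    assert(new_t.itm_next == 1000);  /* itm_next unchanged */
}
```

In this model the old loop leaves itc_at_irq untouched when the
interrupt arrives before itm_next, while the new loop refreshes it once
and keeps itm_next as it was.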

To reproduce this issue, I used the following configuration:

1) boot Xen with pcpu=4 and Dom0 with vcpu=4
2) boot domU1 with its vcpus pinned to 0-1 (vcpu-pin)
3) boot domU2 with its vcpus pinned to 0-1 (vcpu-pin)
4) run two "yes > /dev/null" processes on domU1
5) run nothing on domU2 (it is used to check whether the soft lockup
occurred)
6) run a kernel compile with -j4 on Dom0 continuously
7) wait 4 or 8 hours for the soft lockup to occur

Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>
author Alex Williamson <alex.williamson@hp.com>
date Thu Jul 05 13:04:00 2007 -0600 (2007-07-05)
parents 40608e5e394e
children f71dcdd9cddb
files xen/arch/ia64/xen/xentime.c
diff
--- a/xen/arch/ia64/xen/xentime.c	Mon Jul 02 21:06:46 2007 -0600
+++ b/xen/arch/ia64/xen/xentime.c	Thu Jul 05 13:04:00 2007 -0600
@@ -126,9 +126,7 @@ xen_timer_interrupt (int irq, void *dev_
 
 
 	new_itm = local_cpu_data->itm_next;
-	while (time_after(ia64_get_itc(), new_itm)) {
-		new_itm += local_cpu_data->itm_delta;
-
+	while (1) {
 		if (smp_processor_id() == TIME_KEEPER_ID) {
 			/*
 			 * Here we are in the timer irq handler. We have irqs locally
@@ -150,6 +148,10 @@ xen_timer_interrupt (int irq, void *dev_
 
 		local_cpu_data->itm_next = new_itm;
 
+		if (time_after(new_itm, ia64_get_itc())) 
+			break;
+
+		new_itm += local_cpu_data->itm_delta;
 	}
 
 	if (!is_idle_domain(current->domain) && !VMX_DOMAIN(current)) {