x86 mce: Small fix for polling/CMCI race conditions.
When CMCIs arrive in very quick succession, the polling and CMCI
processing paths may race. On Intel CPUs that support CMCI, polling is
now disabled for any error bank that has CMCI capability.
Signed-off-by: Liping Ke <liping.ke@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
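The rule amounts to splitting the MCA banks between the two paths. A minimal Python sketch (Python being the language of the xend tools elsewhere in this log), assuming banks are numbered from 0 and CMCI capability is known per bank; `banks_to_poll` is a hypothetical helper, not the hypervisor's C code:

```python
def banks_to_poll(num_banks, cmci_banks):
    """Return the MCA bank indices the periodic poller should still scan.

    cmci_banks is a (hypothetical) set of bank indices whose corrected
    errors are reported through CMCI; those are skipped so the poller and
    the CMCI handler never process the same bank.
    """
    return [bank for bank in range(num_banks) if bank not in cmci_banks]


print(banks_to_poll(6, {1, 3, 4}))  # [0, 2, 5]
```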
Although the description of an earlier (already applied) patch claimed
to remove the need to specify both "nolapic" and "noapic", since
"nolapic" alone should already have the intended effect, this turned
out not to be the case. This patch adds the missing bits.
This patch adds the ACPI fixed-hardware power button for HVM guests.
It allows Dom0 to trigger a graceful shutdown of the guest OS (provided
an appropriate power-button action is configured inside the guest).
network-bridge: Fix do_ifup in the case of ${bridge} != ${netdev}
On RHEL5.2, "ifup ${bridge}" fails if ${bridge} != ${netdev}, because
RHEL5.2's ifup ${bridge} runs the following sequence:
1. Search for the config file whose MAC address matches that of
${bridge}; ifcfg-${netdev} is found.
2. Run "ip link set dev ${netdev} up"
(${bridge} is what is expected here).
3. Print "Failed to bring up ${netdev}.", because ${netdev} does not
exist.
Thus, do_ifup() should not use ifup if ${bridge} != ${netdev}.
As its name says, vector_channel[] is indexed by vector, not by
IRQ.
hpet_assign_irq() is called not only on the boot path but also on the
resume path. Without knowing why this is the case, simply checking
whether a vector has already been assigned prevents previously assigned
vectors from leaking.
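The fix makes the assignment idempotent. A Python model of the check, assuming a simple vector pool; `VectorPool`, `HpetChannel` and `assign_irq` are hypothetical stand-ins for the C code:

```python
class VectorPool:
    """Toy allocator handing out interrupt vectors from a fixed range."""

    def __init__(self, first, last):
        self.free = list(range(first, last + 1))

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free vector")
        return self.free.pop(0)


class HpetChannel:
    """Hypothetical stand-in for an HPET channel; no vector until assigned."""

    def __init__(self):
        self.vector = None


def assign_irq(channel, pool):
    """Give the channel a vector unless it already has one.

    This runs on both the boot and the resume path, so without the check
    every resume would take (and leak) another vector from the pool.
    """
    if channel.vector is None:
        channel.vector = pool.allocate()
    return channel.vector


pool = VectorPool(0x30, 0x31)
ch = HpetChannel()
for _ in range(10):  # one boot followed by nine resumes
    assign_irq(ch, pool)
print(ch.vector, len(pool.free))  # 48 1
```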
xend: allow hvm domain to have multiple serial consoles
This patch allows an HVM domain to have multiple serial ports,
specified as serial = [ '...', '...'].
The old style, serial='option string', is still accepted for
compatibility.
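Accepting both forms comes down to normalizing the value to a list. A minimal Python sketch, assuming the option arrives as either a str or a list; `normalize_serial` is a hypothetical helper, not xend's actual function, and 'pty'/'vc' are just example option strings:

```python
def normalize_serial(serial):
    """Return the serial port option strings for an HVM domain as a list.

    Accepts the new list form (serial = ['...', '...']) as well as the
    old single-string form (serial = 'option string').
    """
    if serial is None:
        return []
    if isinstance(serial, str):
        return [serial]
    return list(serial)


print(normalize_serial('pty'))          # ['pty']
print(normalize_serial(['pty', 'vc']))  # ['pty', 'vc']
```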
Keir Fraser [Tue, 31 Mar 2009 12:28:45 +0000 (13:28 +0100)]
x86: Enable S3 for 32bit dom0 on 64bit Xen
The three SYSENTER MSRs must be saved and restored along with the BSP
context; otherwise a 32-bit dom0 stops working after S3 resume. Thanks
to Jan for helping to find this missing piece.
Signed-off-by: Guanqun Lu <guanqun.lu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Keir Fraser [Tue, 31 Mar 2009 12:27:03 +0000 (13:27 +0100)]
hvmloader acpi: Reserve ioport ranges for expanded PHP
There are now two control registers plus one register for each of the
32 PHP slots, for a total of 34 registers. Accordingly, the required
ioport space has expanded from 3 to 34 bytes.
Signed-off-by: Simon Horman <horms@verge.net.au>
hvmloader acpi: Use If and Else instead of Switch
Keir Fraser [Tue, 31 Mar 2009 12:23:11 +0000 (13:23 +0100)]
x86: unify BUG() & Co, reduce overhead on x86-64
Since only the string pointer representations differ between i386 and
x86-64, abstract those out and share everything else.
While touching this code, also use
- proper instructions rather than a mixture of such and raw
.byte/.long/.quad data emissions,
- PC-relative pointers on x86-64 to halve the amount of storage (and
in particular cache space) needed for string references.
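The saving comes from replacing an 8-byte absolute address with a 4-byte signed offset. The Python sketch below models the encoding with the standard `struct` module, assuming (for illustration) that the offset is taken relative to the address of the field that stores it; the real change is done in assembly and its exact layout may differ, and the addresses used are hypothetical:

```python
import struct

ABS64_SIZE = struct.calcsize("<Q")  # absolute 64-bit pointer: 8 bytes
REL32_SIZE = struct.calcsize("<i")  # signed 32-bit PC-relative offset: 4 bytes


def encode_rel32(field_addr, target_addr):
    """Store target_addr as a signed 32-bit offset from field_addr."""
    return struct.pack("<i", target_addr - field_addr)


def decode_rel32(field_addr, blob):
    """Recover the absolute target address from a stored offset."""
    (offset,) = struct.unpack("<i", blob)
    return field_addr + offset


field = 0xFFFF828C80200000  # hypothetical address of a bug-frame field
target = field - 0x1234     # hypothetical address of the file-name string
assert decode_rel32(field, encode_rel32(field, target)) == target
print(ABS64_SIZE, REL32_SIZE)  # 8 4
```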
Keir Fraser [Tue, 31 Mar 2009 12:22:12 +0000 (13:22 +0100)]
Use unlikely() in BUG_ON()/WARN_ON()
-fno-reorder-blocks was added in c/s 1712, when x86-64 support was
just starting to be enabled. Why it was added is entirely unclear to
me, and it defeats the intended effect of unlikely() constructs (in
particular the ones added here): moving code that is expected never to
execute out of line, and preferably reaching infrequently executed code
through forward branches (which the branch prediction units of various
processors statically predict as not taken).
Keir Fraser [Tue, 31 Mar 2009 12:20:04 +0000 (13:20 +0100)]
xend: less noise in xend-debug.log on HVM shutdown
When an HVM guest is shut down, xend-debug.log always shows:
Unhandled exception in thread started by
Traceback (most recent call last):
  File "//usr/lib64/python/xen/xend/image.py", line 549, in _sentinel_watch
    self._dmfailed(message)
  File "//usr/lib64/python/xen/xend/image.py", line 491, in _dmfailed
    xc.domain_shutdown(self.vm.getDomid(), DOMAIN_CRASH)
xen.lowlevel.xc.Error: (3, 'No such process')
Keir Fraser [Tue, 31 Mar 2009 12:11:56 +0000 (13:11 +0100)]
x86 mce: fix c/s 17968 for 32-on-64
The 32-on-64 aspects were not properly considered. Add the
corresponding checks, and adjust structure layouts where those checks
pointed out issues.
Also,
- fix a potential memory corruption issue (do_mca() could write beyond
the end of log_cpus if the guest specified fewer entries than the
number of online CPUs);
- there is no reason to make the (not even properly prefixed)
definitions in xen/public/arch-x86/xen-mca.h globally visible by
including the file from xen/public/arch-x86/xen.h.
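One way to avoid the overrun is to bound the copy by the guest-supplied length rather than by the number of online CPUs. A Python model of that bound; `copy_log_cpus` and its return convention are hypothetical, and the actual fix in do_mca() may differ in detail:

```python
def copy_log_cpus(guest_len, online_cpus):
    """Fill a guest-supplied array of guest_len entries with per-CPU data.

    The number of entries written is bounded by guest_len, so a guest that
    passes a buffer smaller than the number of online CPUs cannot cause a
    write past its end. Returns the filled buffer and the count written.
    """
    buf = [None] * guest_len
    count = min(guest_len, len(online_cpus))
    for i in range(count):
        buf[i] = online_cpus[i]
    return buf, count


print(copy_log_cpus(2, [0, 1, 2, 3]))  # ([0, 1], 2)
```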
Keir Fraser [Tue, 31 Mar 2009 10:54:12 +0000 (11:54 +0100)]
vtd: fix multiple Dom0 S3 on hosts that support Queued Invalidation.
On such hosts, Dom0 S3 cannot be performed more than once when VT-d is
enabled. The cause: during the first S3 resume, init_vtd_hw()
initializes the invalidation function pointers to the register-based
ones, and enable_qinval() later fails to overwrite the flush function
pointers with the queue-based ones. As a result, Queued Invalidation is
enabled but the register-based invalidation functions are still used.
Later, during the second Dom0 S3, iommu_suspend() -> iommu_flush_all()
tries to use the register-based invalidation functions to perform a
global flush while Queued Invalidation is enabled. This can cause a host
reset, because the VT-d spec says: when queued invalidation is enabled,
software must submit invalidation commands only through the IQ (and
not through any invalidation command registers).
This patch fixes enable_qinval(). In addition, iommu_resume() now
invokes iommu_flush_all() for safety.
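The bug is a function-pointer table that is reset but never switched back. A Python model of the fixed sequence, with one flush hook standing in for the full set; the functions are named after the C routines in the description but are hypothetical simplifications, and this toy model treats Queued Invalidation as off again after each resume:

```python
class Iommu:
    """Toy IOMMU with a swappable context-flush hook."""

    def __init__(self):
        self.qinval_enabled = False
        self.flush_context = None


def register_flush(iommu):
    # The VT-d spec forbids register-based invalidation while QI is on.
    if iommu.qinval_enabled:
        raise RuntimeError("register-based flush with Queued Invalidation enabled")
    return "register"


def queued_flush(iommu):
    return "queued"


def init_vtd_hw(iommu):
    # Runs on boot and on every S3 resume: resets to register-based flushing.
    iommu.qinval_enabled = False
    iommu.flush_context = register_flush


def enable_qinval(iommu):
    iommu.qinval_enabled = True
    # The fix: also switch the flush hook to the queue-based variant.
    iommu.flush_context = queued_flush


iommu = Iommu()
for _ in range(3):  # boot, then two S3 resumes
    init_vtd_hw(iommu)
    enable_qinval(iommu)
    print(iommu.flush_context(iommu))  # queued
```

Without the last line of `enable_qinval`, the second cycle's flush would call `register_flush` with Queued Invalidation enabled and raise, mirroring the host reset.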
Keir Fraser [Tue, 31 Mar 2009 10:51:56 +0000 (11:51 +0100)]
cpuidle: suspend/resume scheduler tick timer during cpu idle state entry/exit
cpuidle can collaborate with the scheduler to reduce unnecessary timer
interrupts. For example, the credit scheduler's accounting timer does
not need to be active while the CPU is idle, so it can be stopped at
cpuidle entry and resumed at cpuidle exit. This patch implements this
by adding two scheduler ops, tick_suspend and tick_resume, and
implementing them for the credit scheduler.
Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
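A Python model of the two new hooks, assuming (for illustration) that a scheduler which does not implement them is simply skipped; `CreditScheduler`, `cpuidle_enter` and `cpuidle_exit` here are hypothetical simplifications of the C scheduler interface:

```python
class CreditScheduler:
    """Toy credit scheduler whose per-CPU accounting timer can be paused."""

    def __init__(self, cpus):
        self.ticking = set(cpus)  # CPUs whose accounting timer is running

    def tick_suspend(self, cpu):
        self.ticking.discard(cpu)

    def tick_resume(self, cpu):
        self.ticking.add(cpu)


def cpuidle_enter(sched, cpu):
    # Treat the hook as optional: schedulers without it are left untouched.
    hook = getattr(sched, "tick_suspend", None)
    if hook is not None:
        hook(cpu)


def cpuidle_exit(sched, cpu):
    hook = getattr(sched, "tick_resume", None)
    if hook is not None:
        hook(cpu)


sched = CreditScheduler([0, 1])
cpuidle_enter(sched, 1)
print(sorted(sched.ticking))  # [0]
cpuidle_exit(sched, 1)
print(sorted(sched.ticking))  # [0, 1]
```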
Keir Fraser [Tue, 31 Mar 2009 10:41:13 +0000 (11:41 +0100)]
vtd: fix iommu vector leak
After many Dom0 S3 cycles, iommu_set_interrupt() fails during S3
resume because it cannot obtain a vector. Rather than requesting a new
vector on every Dom0 S3 resume, the same vector should be reused.
Keir Fraser [Tue, 31 Mar 2009 10:40:28 +0000 (11:40 +0100)]
xend: Allow user to specify vslots 0 - 1f for static pass-through
The current parser only accepts vslots 0 - f (hex), that is, only
slots with a single digit. This is an oversight, as two-digit slots
with a leading 0 or 1 are also valid, covering the full range of slots
0 - 1f.
Thanks to Dexuan Cui for spotting this problem.
Cc: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
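The accepted range can be expressed as one hex digit, optionally preceded by 0 or 1. A Python sketch using the standard `re` module; `parse_vslot` is a hypothetical helper, not the regex xend actually uses:

```python
import re

# One hex digit (0 - f), or two hex digits whose first digit is 0 or 1 (00 - 1f).
_VSLOT_RE = re.compile(r"[01]?[0-9a-fA-F]")


def parse_vslot(text):
    """Parse a virtual PCI slot number in the range 0 - 1f (hex)."""
    if _VSLOT_RE.fullmatch(text) is None:
        raise ValueError("invalid vslot: %r" % text)
    return int(text, 16)


print(parse_vslot("7"), parse_vslot("0a"), parse_vslot("1f"))  # 7 10 31
```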
Keir Fraser [Tue, 31 Mar 2009 10:39:32 +0000 (11:39 +0100)]
xen: include MSI/MSI-X information in interrupt debug output
With per-domain IRQ-to-vector mappings, dump_irqs() omitted some
vectors. This patch iterates over vectors rather than IRQs and prints
the same debug information. It also prints information about
interrupts that are mapped but not bound.
Keir Fraser [Tue, 31 Mar 2009 10:31:08 +0000 (11:31 +0100)]
xenapi: Fix VDI:read_only, VDI:sharable and VBD:mode of XenAPI
I started a VM using xm create and then checked the values of its VDI
and VBD records. For each of the following disk modes given in the disk
parameter, I got the following record values:
                 "r"    "w"    "w!"
VDI:read_only    True   True   True   <-- Always True!
VDI:sharable     True   True   True   <-- Always True!
VBD:mode         RO     RW     RO
                               ^^     <-- It should be RW.
This patch fixes the values of these records.
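The corrected VBD:mode value follows directly from the disk mode: only "r" is read-only. A Python sketch of that mapping; the VDI:read_only rule shown is an assumption (the log does not list the corrected values), VDI:sharable is left out for the same reason, and both function names are hypothetical:

```python
def vbd_mode(disk_mode):
    """Map an xm disk mode to the XenAPI VBD:mode value."""
    if disk_mode == "r":
        return "RO"
    if disk_mode in ("w", "w!"):
        # "w!" is still a writable mode, so it must be RW, not RO.
        return "RW"
    raise ValueError("unknown disk mode: %r" % disk_mode)


def vdi_read_only(disk_mode):
    """Assumed mapping: only read-only ("r") disks yield VDI:read_only True."""
    return vbd_mode(disk_mode) == "RO"


for mode in ("r", "w", "w!"):
    print(mode, vdi_read_only(mode), vbd_mode(mode))
```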
Keir Fraser [Tue, 31 Mar 2009 10:28:08 +0000 (11:28 +0100)]
xend: Fix 'xm pci_list_assignable_devices'
The current implementation of the 'xm pci-list-assignable-devices'
command invokes a hypercall directly through xen.lowlevel.xc.
This presumably assumes that the command is executed on the host
itself, but xm commands can also be executed against a remote server
through XML-RPC.
This patch therefore makes the xm command simply query xend for the
list of available devices.
Keir Fraser [Tue, 31 Mar 2009 10:27:10 +0000 (11:27 +0100)]
xend: Fix scsi_id for pvSCSI
pvSCSI allocations fail if the host OS has a relatively new version of
udev. I had not noticed the failure because I was using udev version
095. The failure is caused by an incompatibility in the scsi_id command
included with udev.
This patch addresses that incompatibility.
Keir Fraser [Tue, 31 Mar 2009 10:23:38 +0000 (11:23 +0100)]
xend: Properly save/restore vnc/vfb configuration
In 19284:0942BAA2A088, provision was made for running VNC and SDL
simultaneously. However, no arrangements were made for saving and
restoring the new configuration structure, or for allowing the new
code to load old save files.
This patch adds these facilities. Among other things, HVM VNC
save/restore now works properly again.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 31 Mar 2009 10:13:56 +0000 (11:13 +0100)]
xend: fix domain_migrate
When a guest (a PV-on-HVM guest that cannot suspend) reboots during
live migration, the disconnection on the source side is not propagated
to the destination side. As a result, error handling on the
destination side is never executed.
Keir Fraser [Tue, 24 Mar 2009 06:55:29 +0000 (06:55 +0000)]
libxc: fix link error of xc_save on ia64
The suspend event channel functions used by xc_save are
architecture-independent code.
Changeset 19382:a5f497f02e34 caused a link error on ia64.
This patch moves the functions from the x86-specific file
xc_domain_save.c into an architecture-common file.
Keir Fraser [Fri, 20 Mar 2009 17:24:29 +0000 (17:24 +0000)]
x86: Core support for Intel MCA support
These patches are based on the MCA-related work by AMD and Sun, and
have been rebased onto Sun's latest improvements. Follow-up patches for
recovery actions will come later; this is a basic framework for Intel.
Some implementation notes:
1) When an error happens, if it is fatal (pcc = 1) or cannot be
recovered (pcc = 0, but no good recovery method exists), we reset the
machine immediately to avoid losing logs in DOM0. Most MCA MSRs are
sticky, so after the reboot the MCA polling mechanism sends a vIRQ to
DOM0 for logging.
2) When MCE# happens, all CPUs enter MCA context. The first CPU that
reads and clears the error MSR bank becomes the owner of this MCE#.
Locks/synchronization are used to determine the owner and to select
the most severe error.
3) For convenience, the most offending CPU is selected to do most of
the processing and recovery work.
4) When MCE# happens, we do three jobs:
a. Send a vIRQ to DOM0 for logging
b. Send a vMCE# to the impacted guest (currently only injected into
DOM0)
c. Guest vMCE MSR virtualization
5) Further improvements/additions for newer CPUs may be made later:
a) Connection with recovery actions (CPU/memory online/offline)
b) More software-recovery identification in severity_scan
c) More refinement and testing for HVM, when needed.
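Points 2) and 3) describe a rendezvous in which a lock decides the owner and tracks the most severe error. A Python model using `threading.Lock`; the severity names and ranking and the `MceRendezvous` class are hypothetical, not the hypervisor's actual data structures:

```python
import threading

# Hypothetical severity ranking; higher means worse.
SEVERITY = {"corrected": 0, "recoverable": 1, "fatal": 2}


class MceRendezvous:
    """Shared state that all CPUs touch after entering MCA context."""

    def __init__(self):
        self.lock = threading.Lock()
        self.owner = None  # first CPU to read and clear a bank
        self.worst = None  # (severity rank, cpu) of the most severe error

    def report(self, cpu, severity):
        rank = SEVERITY[severity]
        with self.lock:
            if self.owner is None:
                self.owner = cpu
            if self.worst is None or rank > self.worst[0]:
                self.worst = (rank, cpu)

    def handler_cpu(self):
        """The most offending CPU, which does most of the processing."""
        return None if self.worst is None else self.worst[1]


state = MceRendezvous()
threads = [threading.Thread(target=state.report, args=(cpu, sev))
           for cpu, sev in [(0, "corrected"), (1, "fatal"), (2, "recoverable")]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state.handler_cpu())  # 1
```

The owner depends on thread timing, but the CPU chosen to handle the event does not, since it is fixed by severity.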
Keir Fraser [Fri, 20 Mar 2009 12:33:55 +0000 (12:33 +0000)]
CPUIDLE: enable MSI capable HPET for timer broadcast
HPET broadcast was not enabled by default because the old code used
HPET channel 0 in legacy replacement mode, which hides both the PIT and
RTC IRQs and causes issues when the RTC is needed. The upstream default
broadcast timer is the PIT, which runs in periodic mode (100 Hz) and
would be expensive to use in one-shot mode. MSI-capable HPETs are now
appearing; they can deliver interrupts directly through the FSB and
have no such side effects.
This patch extends the original legacy HPET code to support MSI HPET.
The broadcast timer selection logic becomes:
1. if an MSI-capable HPET is available, use multiple HPET channels (no
more than the number of possible CPUs) in MSI mode;
2. else, if legacy replacement mode is available for the HPET and the
'hpetbroadcast' option is given on the command line, use HPET channel 0
in legacy mode;
3. else, use the PIT.
On entering broadcast, a CPU gets an HPET channel (looking for a spare
one first and, failing that, allocating a shared one), attaches it to
the current CPU, and sets up the IRQ affinity and broadcast cpumask. On
exiting broadcast, it detaches the channel and tries to change the
owner if the broadcast mask is not empty. Some optimizations (static
affinity) are applied when the number of MSI HPET channels is at least
the number of possible CPUs.
A new hw_interrupt_controller 'HPET_MSI' is created for HPET MSI
interrupt handling.
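The three-step timer selection and the spare-then-shared channel allocation can be modeled in a few lines of Python. The function names, the timer labels and the choice of the least-shared channel when none is spare are assumptions for illustration; the hypervisor's C code may choose differently:

```python
def select_broadcast(msi_channels, legacy_ok, hpetbroadcast, possible_cpus):
    """Return (timer, channel count) per the three-step selection order."""
    if msi_channels > 0:
        # 1. MSI-capable HPET: use up to one channel per possible CPU.
        return ("hpet-msi", min(msi_channels, possible_cpus))
    if legacy_ok and hpetbroadcast:
        # 2. Legacy replacement mode, only with the 'hpetbroadcast' option.
        return ("hpet-legacy", 1)
    # 3. Fall back to the PIT.
    return ("pit", 1)


def attach_cpu(channels, cpu):
    """Attach cpu to a spare HPET channel, or share one if none is spare.

    channels is a list of sets, each holding the CPUs using that channel.
    Picking the least-shared channel is an assumption for illustration.
    """
    for idx, users in enumerate(channels):
        if not users:
            users.add(cpu)
            return idx
    idx = min(range(len(channels)), key=lambda i: len(channels[i]))
    channels[idx].add(cpu)
    return idx


channels = [set(), set()]
print([attach_cpu(channels, cpu) for cpu in range(3)])  # [0, 1, 0]
```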
Keir Fraser [Fri, 20 Mar 2009 12:09:20 +0000 (12:09 +0000)]
xend: specify the slot for pass-through devices
Currently a slot may be specified for a hot-plug device,
but not for a pass-through device that is inserted at boot time.
This patch adds support for the latter.
The syntax is:
BUS:DEV.FUNC[@VSLOT]
e.g.: 0000:00:1d.0@7
This may be important because recent changes that allow any free PCI
slot to be used for pass-through (and hotplug) may cause pass-through
devices to be assigned to different slots than before. Among other
things, specifying the slot allows users to move them back if needed.
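A Python sketch of parsing the BUS:DEV.FUNC[@VSLOT] syntax with the standard `re` module. The optional leading 4-digit domain is an assumption based on the example, the hex interpretation of VSLOT is assumed, and `parse_pci` is a hypothetical helper rather than xend's parser:

```python
import re

_PCI_RE = re.compile(
    r"(?:(?P<domain>[0-9a-fA-F]{4}):)?"    # optional PCI domain (assumed)
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\."
    r"(?P<func>[0-7])"
    r"(?:@(?P<vslot>[0-9a-fA-F]{1,2}))?"   # optional virtual slot
)


def parse_pci(spec):
    """Split a pass-through device spec into its numeric fields."""
    m = _PCI_RE.fullmatch(spec)
    if m is None:
        raise ValueError("bad PCI spec: %r" % spec)
    vslot = m.group("vslot")
    return {
        "domain": int(m.group("domain") or "0", 16),
        "bus": int(m.group("bus"), 16),
        "dev": int(m.group("dev"), 16),
        "func": int(m.group("func"), 16),
        "vslot": int(vslot, 16) if vslot is not None else None,
    }


print(parse_pci("0000:00:1d.0@7"))
# {'domain': 0, 'bus': 0, 'dev': 29, 'func': 0, 'vslot': 7}
```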
Keir Fraser [Fri, 20 Mar 2009 09:36:57 +0000 (09:36 +0000)]
vtd: fix Dom0 S3 when VT-d is enabled.
On some platforms that support Queued Invalidation and Interrupt
Remapping, Dom0 S3 doesn't work. The patch fixes the issue.
1) In device_power_down(), we should invoke iommu_suspend() after
ioapic_suspend(); in device_power_up(), we should invoke
iommu_resume() before ioapic_resume().
2) Add 2 functions: disable_qinval() and disable_intremap(); in
iommu_suspend(), we invoke them and iommu_disable_translation().
Rename qinval_setup() to enable_qinval() and rename
intremap_setup() to enable_intremap().
3) In iommu_resume(), remove the unnecessary
iommu_flush_{context, iotlb}_global() -- we actually must not do that
if Queued Invalidation was enabled before S3, because at this point in
S3 resume Queued Invalidation has not been re-enabled yet.
4) Add a static global array ioapic_pin_to_intremap_index[] to
remember which intremap_index each ioapic pin uses: during S3 resume,
ioapic_resume() rewrites all the ioapic RTEs, so the array lets us
reuse the previously allocated IRTEs.
5) Some cleanups:
a) Change some failure handling in enable_intremap() to panic().
b) Remove the unnecessary local variable iec_cap in
__iommu_flush_iec().
c) Add a dmar_writeq(iommu->reg, DMAR_IQT_REG, 0) in
enable_qinval().
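Item 4) is a pin-indexed memo of IRTE allocations. A Python model in which a dict plays the role of ioapic_pin_to_intremap_index[]; the `IntremapTable` class and its sequential allocation are hypothetical simplifications:

```python
class IntremapTable:
    """Toy interrupt remapping table that hands out IRTE indices."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0
        # Plays the role of ioapic_pin_to_intremap_index[]: pin -> IRTE index.
        self.pin_to_index = {}

    def irte_for_pin(self, pin):
        """Return the IRTE index for an IOAPIC pin, reusing any earlier one.

        ioapic_resume() rewrites every RTE, so without the lookup each
        resume would consume a fresh IRTE per pin.
        """
        if pin in self.pin_to_index:
            return self.pin_to_index[pin]
        if self.next_free >= self.size:
            raise RuntimeError("interrupt remapping table full")
        index = self.next_free
        self.next_free += 1
        self.pin_to_index[pin] = index
        return index


table = IntremapTable(size=4)
boot = [table.irte_for_pin(pin) for pin in range(3)]
resume = [table.irte_for_pin(pin) for pin in range(3)]
print(boot == resume, table.next_free)  # True 3
```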
Keir Fraser [Fri, 20 Mar 2009 09:10:55 +0000 (09:10 +0000)]
vtd: only enable Interrupt Remapping if Queued Invalidation is also enabled.
If Queued Invalidation is not supported or not enabled, we should not
enable Interrupt Remapping even if HW supports it, because Interrupt
Remapping needs Queued Invalidation to invalidate the Interrupt
Remapping Cache.
Keir Fraser [Fri, 20 Mar 2009 08:59:47 +0000 (08:59 +0000)]
xenpm: Add a small scheduler knob "sched_smt_power_savings"
The current scheduler only cares about performance, so it always picks
a pCPU from the most idle package. This knob provides the option of
picking a pCPU from the least idle package instead, for users who want
to balance performance and power.
Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
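A Python sketch of the two placement policies, counting idle CPUs per package. The data layout and the lowest-id tie-breaking are assumptions; the hypervisor's actual implementation is C and may weigh packages differently:

```python
def pick_package(idle_cpus, power_savings):
    """Choose the package to take a pCPU from.

    idle_cpus maps package id -> number of idle CPUs in that package;
    packages without an idle CPU are not candidates.
    """
    candidates = {pkg: n for pkg, n in idle_cpus.items() if n > 0}
    if not candidates:
        return None
    if power_savings:
        # sched_smt_power_savings: least idle package (ties -> lowest id).
        return min(candidates, key=lambda pkg: (candidates[pkg], pkg))
    # Default: most idle package (ties -> lowest id).
    return max(candidates, key=lambda pkg: (candidates[pkg], -pkg))


packages = {0: 1, 1: 4, 2: 2}
print(pick_package(packages, False), pick_package(packages, True))  # 1 0
```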