Keir Fraser [Wed, 17 Feb 2010 12:05:45 +0000 (12:05 +0000)]
Mask AMD CPUID masks in software before writing them to the MSRs
Mask AMD CPUID masks in software before writing them to the MSRs.
Setting bits in the CPUID mask MSR that are not set in the unmasked
CPUID response can cause those bits to be set in the masked response.
Avoid that by explicitly masking in software.
Keir Fraser [Wed, 17 Feb 2010 12:04:50 +0000 (12:04 +0000)]
x86/mcheck: do not blindly de-reference dom0 et al
Since machine checks and CMCIs can happen before Dom0 even gets
constructed, the handlers of these events have to avoid de-referencing
respective pointers without checking.
Keir Fraser [Tue, 16 Feb 2010 09:28:39 +0000 (09:28 +0000)]
hotplug: ignore xenstore-read error
The failure to read "backend/tap/<domid>/*" in the xenstore is a usual
case since the domain is gone after xenstore-ls command is executed.
The error should be ignored.
Keir Fraser [Tue, 16 Feb 2010 09:27:45 +0000 (09:27 +0000)]
cpuidle: do not enter deep C state if there is urgent VCPU
when VCPU is polling on event channel, it usually has urgent task
running, e.g. spin_lock, in this case, it is better for cpuidle driver
not to enter deep C state.
This patch fix the issue that SLES 11 SP1 domain0 hangs in the box of
large number of CPUs (>= 64 CPUs).
Signed-off-by: Yu Ke <ke.yu@intel.com> Signed-off-by: Tian Kevin <kevin.tian@intel.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 15 Feb 2010 08:14:21 +0000 (08:14 +0000)]
hvmloader: Fix an ACPI asl bug.
Fix an ACPI asl bug by explicitly convert PRS to buffer, otherwise PRS
would be parsed as integer if less than 32/64 bits (according to ACPI
1.0 or 2.0).
Keir Fraser [Mon, 15 Feb 2010 08:13:26 +0000 (08:13 +0000)]
Vcpu hotplug: Move ACPI processor from \_PR to \_SB
Move processor from \_PR to \_SB. ACPI processor can be defined
under \_PR or \_SB. However, recently os like linux 2.6.30/32 support
cpu hotplug better for \_SB processor object.
Keir Fraser [Fri, 12 Feb 2010 09:24:18 +0000 (09:24 +0000)]
x86_64: widen bit width usable for struct domain allocation
With it being a PDX (instead of a PFN) that gets stored when a 32-bit
quantity is needed, we should also account for the bits removed during
PFN-to-PDX conversion when doing the allocation.
Keir Fraser [Fri, 12 Feb 2010 09:23:10 +0000 (09:23 +0000)]
Remus: increase failover timeout from 500ms to 1s
500ms is aggressive enough to trigger split-brain under fairly
ordinary workloads, particularly for HVM. The long-term fix is to
integrate with a real HA monitor like linux HA.
Keir Fraser [Fri, 12 Feb 2010 09:21:57 +0000 (09:21 +0000)]
keyhandler: Do not serialise keyhandlers; increase scratch array size.
Although serialising keyhandlers is safer, and in particular
protects access to shared heyhandler_scratch[], in debug scenarios it
is probably better to 'have a go' when requested - and assume the user
knows what they are doing.
Meanwhile, increase scratch array size to 1024. That's enough for more
than a dozen lines of 80-column text, and should be plenty in any
practical situation.
Keir Fraser [Fri, 12 Feb 2010 09:16:10 +0000 (09:16 +0000)]
hvmloader: Fix parallel build of ACPI tables.
Make build.c dependency on ssdt_{pm,tpm}.h explicit, else they can be
built in the wrong order.
Also, improve naming of target DSDT structures, and specify -p option
to iasl so that all generated files for a given target have
target-unique names, hence build can proceed safely in parallel.
Keir Fraser [Thu, 11 Feb 2010 22:42:18 +0000 (22:42 +0000)]
hvmloader: Build a compatibility DSDT with only 15 processor objects.
Deploy this smaller DSDT where possible: this is required to boot
Windows 2000, which only supports up to 15 processors and will blue
screen if it sees more processor objects than that (even inactive
ones).
Keir Fraser [Thu, 11 Feb 2010 19:51:15 +0000 (19:51 +0000)]
VT-d: get rid of duplicated definition
free_pgtable_maddr was implemented the same for x86 and IA64, so it's
not necessary to define it separately for x86 and IA64. This patch
moves free_pgtable_maddr definition to iommu.c to avoid duplicated
definition.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Thu, 11 Feb 2010 19:48:58 +0000 (19:48 +0000)]
VT-d: ensure zapping ACPI DMAR signature in acpi_parse_dmar
VT-d is owned by Xen hypervisor. Xen zaps ACPI DMAR signature to
prevent dom0 to use VT-d. This patch changes the direct return when
DMAR width is zero, instead zaps ACPI DMAR signature before return.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 10 Feb 2010 09:20:56 +0000 (09:20 +0000)]
x86: move trampoline location
A partner of ours is reporting boot failures (Xen not even emitting a
single message) over iSCSI on new (UEFI based) systems. After
pointing at their BIOS initially I finally remembered to take a look
at the memory map a native kernel booted this way see - and voila, the
BIOS reports memory starting at 0x8d000 as reserved. Xen, however,
places about 12k of (trampoline) data at 0x8c000.
For now, move the trampolien down by 4kB to 0x88000. Later we may
choose the location dynamically based on E820 information, if this
proves to be an ongoing problem.
One thing this patch enforces in any case is a single point of
definition for the hard coded location, so that at least adjusting it
won't require more than a single line change in the future.
Keir Fraser [Wed, 10 Feb 2010 09:18:43 +0000 (09:18 +0000)]
Fix domain reference leaks
Besides two unlikely/rarely hit ones in x86 code, the main offender
was tmh_client_from_cli_id(), which didn't even have a counterpart
(albeit it had a comment correctly saying that it causes d->refcnt to
get incremented). Unfortunately(?) this required a bit of code
restructuring (as I needed to change the code anyway, I also fixed
a couple os missing bounds checks which would sooner or later be
reported as security vulnerabilities), so I would hope Dan could give
it his blessing before it gets applied.
Keir Fraser [Wed, 10 Feb 2010 09:18:11 +0000 (09:18 +0000)]
x86: MCE fixes
- fill_vmsr_data() leaked a domain reference; since the caller already
obtained one, there's no need to obtain another one here
- intel_UCR_handler() could call put_domain() with a NULL pointer
- mcheck_mca_logout() updated a local data structure that wasn't used
after the update
Keir Fraser [Mon, 8 Feb 2010 10:18:51 +0000 (10:18 +0000)]
Dump machine check context for fatal machine check
This small patches enable Xen hypervisor to always dump machine check
ontext, previously it will not print anything if fatal MCE happens. It
also add checking for NULL pointer.
It also change the address passing to guest to always use guest
mfn. It should benifit non-translated guest.
Keir Fraser [Mon, 8 Feb 2010 08:43:25 +0000 (08:43 +0000)]
vmx: Don't enable irq for machine check vmexit handling
We should not enable irq for machine check VMExit
In changeset 18658:824892134573, IRQ is enabled during VMExit except
external interrupt. The exception should apply for machine check also,
because :
a) The mce_logout_lock should be held in irq_disabled context.
b) The machine check event should be handled as quickly as possible,
enable irq will increase the period greatly.
Keir Fraser [Fri, 5 Feb 2010 10:37:24 +0000 (10:37 +0000)]
Remus: fix ia64 build
This patch fixes the following error:
/xen-unstable.hg/tools/remus/kmod/sch_queue.c: In function
`is_foreign':
/xen-unstable.hg/tools/remus/kmod/sch_queue.c:51: error:
`phys_to_machine_mapping' undeclared (first use in this function)
Keir Fraser [Fri, 5 Feb 2010 10:35:57 +0000 (10:35 +0000)]
libxl: notice if vbd virt device specifier ("path") unrecognised
Previously, specifying a virtual device string the vbd that couldn't
be parsed would result in attempting to actually create the device
with vbd number -1 !
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 4 Feb 2010 13:16:39 +0000 (13:16 +0000)]
libxc: Reorder functions in xc_misc.c to avoid weak symbol problem
Using a function, and then declaring it weak later, has undefined
behaviour:
cc1: warnings being treated as errors
xc_misc.c:388: error: weak declaration of 'xc_map_foreign_bulk'
after first use results in unspecified behavior
So swap the functions xc_map_foreign_pages and xc_map_foreign_bulk.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 4 Feb 2010 13:09:30 +0000 (13:09 +0000)]
mem hotplug: Fix an incorrect sanity check in memory add
Current, memory hot-add will fail if the new added memory is bigger
than current max_pages. This is really a stupid checking, considering
user may hot-add the biggest address riser card firstly.
This patch fix this issue. It check if all new added memory is
unpopulated, if yes, then it is ok.
Keir Fraser [Thu, 4 Feb 2010 09:09:13 +0000 (09:09 +0000)]
VT-d, tools: Intel IGD passthrough 2/2: hvm config file example
Add an option into hvm config file to enable graphics passthrough
including discrete and IGD. To passthrough graphics to guest, need to
set gfx_passthru=1, and also specify graphics device BDF in pci
passthrough option, like pci=['xx:xx.x'] in hvm config file.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Thu, 4 Feb 2010 08:53:16 +0000 (08:53 +0000)]
x86: fix frame table initialization when hotplug memory regions were detected
max_idx is not a pdx, and hence needs to be converted to one in all
cases where it is being passed to pdx_to_page().
Also, just like for max_pdx, the conversion result of max_idx may
point into an address space hole, and hence it must not be used
directly as an argument to pdx_to_page(). Note that this doesn't apply
to the arguments passed to memset(), as the size argument would be
zero in the case of hitting an address space hole.
Keir Fraser [Wed, 3 Feb 2010 09:46:38 +0000 (09:46 +0000)]
VT-d: fix a bug in enable_ats_device
In enable_ats_device, it should enable ATS if find matched atsr unit
for a device, and don't enable it if no matched atsr unit. But current
code does contrarily. This patch fixes it.
Signed-off-by: Weidong Han <Weidong.han@intel.com>
Keir Fraser [Wed, 3 Feb 2010 09:46:01 +0000 (09:46 +0000)]
libxc: Check there's enough memory for segments we're creating
Previously, xc_dom_alloc_segment would go ahead even if the segment
we're trying to create is too big for the domain's RAM (or the
requested addr is out of range). It would pass invalid parameters to
xc_dom_seg_to_ptr giving undefined behaviour.
Fixing xc_dom_seg_to_ptr to fail is not sufficient because we want to
provide a comprehensible explanation to the caller - which may
ultimately be the user.
In particular, with this change attempting "xl create" with a ramdisk
image bigger than the guest's specified RAM will provide a useful
error message mentioning the ramdisk.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Wed, 3 Feb 2010 09:45:40 +0000 (09:45 +0000)]
libxc: Check full range of pfns for xc_dom_pfn_to_ptr
Previously, passing a valid pfn but an overly large count to
xc_dom_pfn_to_ptr, and functions which call it, would run off the end
of the pfn array giving undefined behaviour.
It is tempting to change this check to an assert, as no callers should
be providing invalid parameters here. But this is probably best not
done while frozen for 4.0.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Wed, 3 Feb 2010 09:36:37 +0000 (09:36 +0000)]
xentrace: Disable tracing, then read records one more time.
When interrupted, first disable tracing, then read through the records
one last time.
Without this patch, it's possible to get traces which interact (such
as runstate changes) on processors with higher numbers, while missing
the corresponding traces generated on lower-numbered processors.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 3 Feb 2010 09:35:56 +0000 (09:35 +0000)]
xentrace: Clear lost records when disabling tracing
This patch clears the "lost records" flag on each cpu when tracing is
disabled.
Without this patch, the next time tracing starts, cpus with lost
records will generate lost record traces, even though buffers are
empty and no tracing has recently happened.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 3 Feb 2010 09:35:23 +0000 (09:35 +0000)]
xentrace: Trace p2m events
Add more tracing to aid in debugging ballooning / PoD:
* Nested page faults for EPT/NPT systems
* set_p2m_enry
* Decrease reservation (for ballooning)
* PoD populate, zero reclaim, superpage splinter
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 3 Feb 2010 09:16:11 +0000 (09:16 +0000)]
hvmloader: Fix CPU hotplug notify handler in ACPI DSDT.
By merging PRSC and NTFY methods we simplify the code, improve
efficiency, and fix a bug where PRSC iterated 0-255 but NTDY could
only handle CPU numbers 0-127.
Keir Fraser [Fri, 29 Jan 2010 08:55:27 +0000 (08:55 +0000)]
blktap2: Prefer AIO eventfd support on kernels >= 2.6.22
Mainline kernel support for eventfd(2) in linux aio was added between
2.6.21 and 2.6.22. Libaio after 0.3.107 has the header file, but
presently few systems support it. Neither do we rely on an up-to-date
libc6.
Instead, this patch adds a header which defines custom iocb_common
struct, and works around a potentially missing sys/eventfd.h.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Keir Fraser [Fri, 29 Jan 2010 08:54:51 +0000 (08:54 +0000)]
blktap2: Separate tapdisk raw I/O into different backends.
Hide tapdisk support for different raw I/O interfaces behind a new
struct tio. Libaio remains to dominate the interface, requiring
everyone to dispatch iocb/ioevent structs.
Misc:
- Fixes a bug in tapdisk-vbd which locks up the sync io mode.
- Wants a PERROR macro in blktaplib.h
- Removes dead code in qcow2raw to make it link again.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com> Signed-off-by: Jake Wires <jake.wires@citrix.com>
Keir Fraser [Fri, 29 Jan 2010 06:50:23 +0000 (06:50 +0000)]
x86 mca: Be more careful for printk in MCE context
MCE may happen in printk context, and will cause deadlock if we try to
call printk again in MCE context.
A new level(mce_critical) is added to mce_verbosity for printk in mce
context. This level is only for developer that aware of such issue.
In mce_panic, force console unlock.
Keir Fraser [Fri, 29 Jan 2010 06:49:42 +0000 (06:49 +0000)]
x86 mca: Add MCE broadcast checkiing.
Some platform will broadcast MCE to all logical processor, while some
platform will not. Distinguish these platforms will be helpful for
unified MCA handler.
the "mce_fb" is a option to emulate the broadcast MCA in non-broadcast
platform. This is mainly for MCA software trigger.
Keir Fraser [Fri, 29 Jan 2010 06:48:00 +0000 (06:48 +0000)]
x86 mca: Not GP fault when guest write non 0s or 1s to MCA CTL MSRs.
a) For Mci_CTL MSR, Guest can write any value to it. When read back,
it will be ANDed with the physical value. Some bit in physical value
can be 0, either because read-only in hardware (like masked by AMD's
Mci_CTL_MASK), or because Xen didn't enable it.
If guest write some bit as 0, while that bit is 1 in host, we will
not inject MCE corresponding that bank to guest, as we can't
distinguish if the MCE is caused by the guest-cleared bit.
b) For MCG_CTL MSR, guest can write any value to it. When read back,
it will be ANDed with the physical value.
If guest does not write all 1s. In mca_ctl_conflict(), we simply
not inject any vMCE to guest if some bit is set in physical MSR
while is cleared in guest 's vMCG_CTL MSR.
Keir Fraser [Fri, 29 Jan 2010 06:47:24 +0000 (06:47 +0000)]
x86 mca: Handle the vMCA bank correctly
Currently the virtual MCE MSR assume all MSRs range from 0 to
MAX_NR_BANKS are always MCE MSR, this is not always correct. With this
patch, the mce_rdmsr/mce_wrmsr will only handle vMCE MSR range from 0
to the MCA banks in the host platform.
Please notice that some MSR beyond current MCA banks in the host
platform are really MCA MSRs, that should be handled by general MSR
handler.
Keir Fraser [Tue, 26 Jan 2010 15:54:40 +0000 (15:54 +0000)]
pygrub: improve grub 2 support
* The "default" value can be a quoted string (containing an integer)
so strip the quotes before interpreting.
* The "set" command takes a variable with an arbitrary name so instead
of whitelisting the ones to ignore simply silently accept any set
command with an unknown variable.
* Ignore the echo command.
* Handle the function { ... } syntax. Previously pygrub would error
out with a syntax error on the closing "}" because it thought it was
the closing bracket of a menuentry.
This makes pygrub2 work with the configuration files generated by
Debian Squeeze today.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>