x86: Prefer multiboot-provided e820 over bios-provided e801 memory info.
Some UEFI systems do not provide e820 information. In this case we
should take the detailed memory map provided by a multiboot-capable
loader, rather than rely on very conservative values from the e801
bios call. Using the latter on any modern system hardly makes
sense.
Signed-off-by: Keir Fraser <keir@xen.org> Tested-by: Jonathan Tripathy <jonnyt@abpni.co.uk>
xen-unstable changeset: 25786:a0b5f8102a00
xen-unstable date: Tue Aug 28 21:40:45 UTC 2012
Fix shared entry status for grant copy operation on paged-out gfn
The unwind path was not clearing the shared entry status bits. This
was BSOD-ing guests on network activity under certain configurations.
Also:
* sed the fixup method name to signal it's related to grant copy.
* use atomic clear flag ops during fixup.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
xen-unstable changeset: 25771:1636cc4886f6
xen-unstable date: Wed Aug 22 21:27:50 UTC 2012
Jan Beulich [Thu, 20 Sep 2012 08:53:43 +0000 (10:53 +0200)]
x86-64: refine the XSA-9 fix
Our product management wasn't happy with the "solution" for XSA-9, and
demanded that customer systems must continue to boot. Rather than
having our and perhaps other distros carry non-trivial patches, allow
for more fine grained control (panic on boot, deny guest creation, or
merely warn) by means of a single line change.
Also, as this was found to be a problem with remotely managed systems,
don't default to boot denial (just deny guest creation).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25765:e6ca45ca03c2
xen-unstable date: Mon Aug 20 06:46:47 UTC 2012
Jan Beulich [Thu, 20 Sep 2012 08:52:24 +0000 (10:52 +0200)]
x86: don't expose SYSENTER on unknown CPUs
So far we only ever set up the respective MSRs on Intel CPUs, yet we
hide the feature only on a 32-bit hypervisor. That prevents booting of
PV guests on top of a 64-bit hypervisor making use of the instruction
on unknown CPUs (VIA in this case).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25764:4b0d263008cd
xen-unstable date: Mon Aug 20 06:40:01 UTC 2012
Jan Beulich [Thu, 20 Sep 2012 08:51:30 +0000 (10:51 +0200)]
EPT/PoD: fix interaction with 1Gb pages
When PoD got enabled to support 1Gb pages, ept_get_entry() didn't get
updated to match - the assertion in there triggered, indicating that
the call to p2m_pod_demand_populate() needed adjustment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 25757:3468a834be8d
xen-unstable date: Thu Aug 16 16:38:05 UTC 2012
Tim Deegan [Thu, 20 Sep 2012 08:50:39 +0000 (10:50 +0200)]
x86/mm: update max_mapped_pfn on MMIO mappings too.
max_mapped_pfn should reflect the highest mapping we've ever seen of
any type, or the tests in the lookup functions will be wrong. As it
happens, the highest mapping has always been a RAM one, but this is no
longer the case when we allow 64-bit BARs.
Reported-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 25756:8918737c7e80
xen-unstable date: Thu Aug 16 13:31:09 UTC 2012
Jan Beulich [Thu, 20 Sep 2012 08:49:17 +0000 (10:49 +0200)]
x86/PoD: clean up types
GMFN values must undoubtedly be "unsigned long". "count" and
"entry_count", since they are signed types, should also be "long" as
otherwise they can't fit all values that can fit into "d->tot_pages"
(which currently is "uint32_t").
Beyond that, the patch doesn't convert everything to "long" as in many
places it is clear that "int" suffices. In places where "long" is being
used partially already, the change is however being done.
Furthermore, page order values have no use of being "long".
Finally, in the course of updating a few printk messages anyway, some
also get slightly shortened (to focus on the relevant information).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 25755:c887c30a0a35
xen-unstable date: Thu Aug 16 08:16:19 UTC 2012
Jan Beulich [Thu, 20 Sep 2012 08:46:23 +0000 (10:46 +0200)]
x86/PoD: prevent guest from being destroyed upon early access to its memory
When an external agent (e.g. a monitoring daemon) happens to access the
memory of a PoD guest prior to setting the PoD target, that access must
fail for there not being any page in the PoD cache, and only the space
above the low 2Mb gets scanned for victim pages (while only the low 2Mb
got real pages populated so far).
To accommodate this
- set the PoD target first
- do all physmap population in PoD mode (i.e. not just large [2Mb or
1Gb] pages)
- slightly lift the restrictions enforced by p2m_pod_set_mem_target()
to accommodate the changed tools behavior
Tested-by: Jürgen Groß <juergen.gross@ts.fujitsu.com>
(in a 4.0.x based incarnation) Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 25754:be8ae5439a88
xen-unstable date: Thu Aug 16 08:14:11 UTC 2012
Boris Ostrovsky [Thu, 20 Sep 2012 08:43:49 +0000 (10:43 +0200)]
acpi: Make sure valid CPU is passed to do_pm_op()
Passing invalid CPU value to do_pm_op() will cause assertion
in cpu_online().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Such checks would, at a first glance, then also be missing at the top
of various helper functions, but these checks really were already
redundant with the check in do_pm_op(). Remove the redundant checks
for clarity and brevity.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25752:1df4fdbaade0
xen-unstable date: Wed Aug 15 07:43:25 UTC 2012
Andrew Cooper [Wed, 12 Sep 2012 18:35:07 +0000 (19:35 +0100)]
x86/passthrough: Fix corruption caused by race conditions between
device allocation and deallocation to a domain.
A toolstack, when dealing with a domain using PCIPassthrough, could
reasonably be expected to issue DOMCTL_deassign_device hypercalls to
remove all passed through devices before issuing a
DOMCTL_destroydomain hypercall to kill the domain. In the case where
a toolstack is perhaps less sensible in this regard, the hypervisor
should not fall over.
In domain_kill(), pci_release_devices() searches the alldevs_list list
looking for PCI devices still assigned to the domain. If the
toolstack has correctly deassigned all devices before killing the
domain, this loop does nothing.
However, if there are still devices attached to the domain, the loop
will call pci_cleanup_msi() without unbinding the pirq from the
domain. This eventually calls destroy_irq() which xfree()'s the
action.
However, as the irq_desc->action pointer is abused in an unsafe
manner, without unbinding first (which at least correctly cleans up),
the action is actually an irq_guest_action_t* rather than an
irqaction*, meaning that the cpu_eoi_map is leaked, and eoi_timer is
free()'d while still being on a pcpu's inactive_timer list. As a
result, when this free()'d memory gets reused, the inactive_timer list
becomes corrupt, and list_*** operations will corrupt hypervisor
memory.
If the above were not bad enough, the loop in pci_release_devices()
still leaves references to the irq it destroyed in
domain->arch.pirq_irq and irq_pirq, meaning that a later loop,
free_domain_pirqs(), which happens as a result of
complete_domain_destroy() will unbind and destroy all irqs which were
still bound to the domain, resulting in a double destroy of any irq
which was still bound to the domain at the point at which the
DOMCTL_destroydomain hypercall happened.
Because of the allocation of irqs from find_unassigned_irq(), the
lowest free irq number is going to be handed back from create_irq().
There is a further race condition between the original (incorrect)
call to destroy_irq() from pci_release_devices(), and the later call
to free_domain_pirqs() (which happens in a softirq context at some
point after the domain has officially died) during which the same irq
number (which is still referenced in a stale way in
domain->arch.pirq_irq and irq_pirq) has been allocated to a new domain
via a PHYSDEVOP_map_pirq hypercall (Say perhaps in the case of
rebooting a domain).
In this case, the cleanup for the dead domain will free the recently
bound irq under the feet of the new domain. Furthermore, after the
irq has been incorrectly destroyed, the same domain with another
PHYSDEVOP_map_pirq hypercall can be allocated the same irq number as
before, leading to an error along the lines of:
../physdev.c:188: dom54: -1:-1 already mapped to 74
In this case, the pirq_irq and irq_pirq mappings get updated to the
new PCI device from the latter PHYSDEVOP_map_pirq hypercall, and the
IOMMU interrupt remapping registers get updated, leading to IOMMU
Primary Pending Fault due to source-id verification failure for
incoming interrupts from the passed through device.
The easy fix is to simply deassign the device in pci_release_devices()
and leave all the real cleanup to the free_domain_pirqs() which
correctly unbinds and destroys the irq without leaving stale
references around.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25883:4fdaebea82d7
xen-unstable date: Wed Sep 12 19:31:16 2012 +0100
Jan Beulich [Tue, 4 Sep 2012 12:56:48 +0000 (14:56 +0200)]
make all (native) hypercalls consistently have "long" return type
for common and x86 ones at least, to address the problem of storing
zero-extended values into the multicall result field otherwise.
Reported-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25744:47080c965937
xen-unstable date: Fri Aug 10 07:51:01 UTC 2012
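Illustration (not part of the changeset): a minimal sketch of the
zero-extension problem described above. The helper names and the
result_field_t type are hypothetical, and fixed-width types stand in for
a 64-bit "long" and the multicall result field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit multicall result field. */
typedef uint64_t result_field_t;

/* A return value that travels as an unsigned 32-bit quantity is
 * zero-extended when widened, so a negative error code is lost. */
result_field_t store_result_from_u32(int32_t ret)
{
    return (result_field_t)(uint32_t)ret;
}

/* A 64-bit signed return value is sign-extended, so a negative error
 * code stays negative in the wide field. */
result_field_t store_result_from_long(int64_t ret)
{
    return (result_field_t)ret;
}

/* Usage: -22 keeps its meaning only with sign extension. */
void extension_demo(void)
{
    assert(store_result_from_u32(-22)  == 0x00000000ffffffeaULL);
    assert(store_result_from_long(-22) == 0xffffffffffffffeaULL);
}
```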
Jan Beulich [Tue, 4 Sep 2012 12:54:49 +0000 (14:54 +0200)]
x86-64: don't allow non-canonical addresses to be set for any callback
Rather than deferring the detection of these to the point where they
get actually used (the fix for XSA-7, 25480:76eaf5966c05, causing a #GP
to be raised by IRET, which invokes the guest's [fragile] fail-safe
callback), don't even allow such to be set.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25485:5b6a857411ba
xen-unstable date: Mon Jun 18 15:02:01 UTC 2012
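Illustration (not part of the changeset): a sketch of rejecting a
non-canonical address at the time it is set. It assumes 48-bit virtual
addresses; set_callback() is a hypothetical setter, not the hypervisor's
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses: bits 63..47 must all be equal. */
bool is_canonical_address(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}

/* Hypothetical setter: refuse a non-canonical callback address up
 * front instead of letting it fault when it is eventually used. */
int set_callback(uint64_t *slot, uint64_t addr)
{
    assert(slot);
    if (!is_canonical_address(addr))
        return -1;                      /* rejected, *slot untouched */
    *slot = addr;
    return 0;
}
```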
Jan Beulich [Tue, 4 Sep 2012 12:46:12 +0000 (14:46 +0200)]
xen: fix page_list_splice()
Other than in __list_splice(), the first element's prev pointer
doesn't need adjustment here - it already is PAGE_LIST_NULL. Rather
than fixing the assignment (to formally match __list_splice()), simply
assert that this assignment is really unnecessary.
Reported-by: Jisoo Yang <jisooy@gmail.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Also assert that the prev pointers are both PAGE_LIST_NULL.
Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25459:f6bfaf9daa50
xen-unstable date: Wed Jun 6 15:37:05 UTC 2012
Jan Beulich [Tue, 4 Sep 2012 12:43:57 +0000 (14:43 +0200)]
x86: don't hold off NMI delivery when MCE is masked
Likely through copy'n'paste, all three instances of guest MCE
processing jumped to the wrong place (where NMI processing code
correctly jumps to) when MCE-s are temporarily masked (due to one
currently being processed by the guest). A nested, unmasked NMI should
get delivered immediately, however.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25382:6dc80df50fa8
xen-unstable date: Tue May 22 14:30:11 UTC 2012
Fix save/restore of guest PAT table in HAP paging mode.
HAP paging mode guests use direct MSR read/write into the VMCS/VMCB
for the guest PAT table, while the current save/restore code was
accessing only the pat_cr field in hvm_vcpu, used when intercepting
the MSR mostly in shadow mode (the Intel scenario is a bit more
complicated). This patch fixes this issue creating a new couple of
hvm_funcs, get/set_guest_pat, that access the right PAT table based on
the paging mode and guest configuration.
Signed-off-by: Gianluca Guida <gianluca.guida@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 25196:375fa55c7a6c
xen-unstable date: Tue Apr 17 07:29:26 UTC 2012
PCID (Process-context identifier) is a facility by which a logical
processor may cache information for multiple linear-address spaces.
INVPCID is a new instruction to invalidate TLB entries. Refer to the latest Intel SDM:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
We disable PCID/INVPCID for dom0 and pv. Exposing them into dom0 and pv
may result in performance regression, and it would trigger GP or UD
depending on whether the platform supports INVPCID or not.
This patch disables PCID/INVPCID for dom0.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
xen-unstable changeset: 24278:d9cb04ed5539
xen-unstable date: Thu Dec 1 11:22:43 UTC 2011
Jan Beulich [Tue, 4 Sep 2012 12:23:18 +0000 (14:23 +0200)]
x86-64/MMCFG: correct base address computation for regions not starting at bus 0
As per the specification, the base address reported by ACPI is the one
that would be used if the region started at bus 0. Hence the
start_bus_number offset needs to be added not only to the virtual
address, but also the physical one when establishing the mapping, and
it then needs to be subtracted when obtaining the virtual address for
doing accesses.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23747:b07b6fa76656
xen-unstable date: Mon Jul 25 15:42:19 UTC 2011
David Vrabel [Thu, 9 Aug 2012 15:44:51 +0000 (16:44 +0100)]
cpufreq: P state stats aren't available if there is no cpufreq driver
If there is no cpufreq driver (e.g., with an AMD Opteron 8212) then
reading the P state statistics causes a deadlock as an uninitialized
spinlock is locked in do_get_pm_info(). The spinlock is initialized in
cpufreq_statistic_init() which is not called if cpufreq_driver ==
NULL.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25706:7fd5facb6084
xen-unstable date: Fri Aug 03 09:50:28 2012 +0200
Ian Campbell [Thu, 9 Aug 2012 14:47:42 +0000 (15:47 +0100)]
xen: only check for shared pages while any exist on teardown
Avoids worst case behaviour when guest has a large p2m.
This is XSA-11 / CVE-2012-3433
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Olaf Hering <olaf@aepfle.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 3 Aug 2012 09:43:24 +0000 (10:43 +0100)]
nestedhvm: fix nested page fault build error on 32-bit
cc1: warnings being treated as errors
hvm.c: In function ‘hvm_hap_nested_page_fault’:
hvm.c:1282: error: passing argument 2 of
‘nestedhvm_hap_nested_page_fault’ from incompatible pointer type
/local/scratch/ianc/devel/xen-unstable.hg/xen/include/asm/hvm/nestedhvm.h:55:
note: expected ‘paddr_t *’ but argument is of type ‘long unsigned
int *’
hvm_hap_nested_page_fault takes an unsigned long gpa and passes &gpa
to nestedhvm_hap_nested_page_fault which takes a paddr_t *. Since both
of the callers of hvm_hap_nested_page_fault (svm_do_nested_pgfault and
ept_handle_violation) actually have the gpa which they pass to
hvm_hap_nested_page_fault as a paddr_t I think it makes sense to
change the argument to hvm_hap_nested_page_fault.
The other user of gpa in hvm_hap_nested_page_fault is a call to
p2m_mem_access_check, which currently also takes a paddr_t gpa but I
think a paddr_t is appropriate there too.
Jan points out that this is also an issue for >4GB guests on the 32
bit hypervisor.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 25724:612898732e66
xen-unstable date: Fri Aug 03 09:54:17 2012 +0100 Backported-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 26 Jul 2012 15:56:35 +0000 (16:56 +0100)]
x86/hvm: don't leave emulator in inconsistent state
The fact that handle_mmio(), and thus the instruction emulator, is
being run through twice for emulations that require involvement of the
device model, allows for the second run to see a different guest state
than the first one. Since only the MMIO-specific emulation routines
update the vCPU's io_state, if they get invoked on the second pass,
internal state (and particularly this variable) can be left in a state
making successful emulation of a subsequent MMIO operation impossible.
Consequently, whenever the emulator invocation returns without
requesting a retry of the guest instruction, reset io_state.
[ This is a security issue. XSA#10. -iwj ]
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 25682:ffcb24876b4f Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Mon, 9 Jul 2012 09:30:44 +0000 (10:30 +0100)]
x86/PCI: fix guest_io_read() when pci_cfg_ok() denies access
For a multi-byte aligned read, this so far resulted in 0x00ff to be
put in the guest's register rather than 0xffff or 0xffffffff, which in
turn could confuse bus scanning functions (which, when reading vendor
and/or device IDs, expect to get back all zeroes or all ones).
As the value gets masked to the read width when merging back into the
full result, setting the initial value to all ones should not harm any
of the other cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25489:cc46bd403bc4
xen-unstable date: Fri Jun 22 10:04:30 2012 +0200
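Illustration (not part of the changeset): a sketch of the "start from
all ones, then mask to the read width" idea. denied_read_value() is a
hypothetical helper, not the actual guest_io_read() code.

```c
#include <assert.h>
#include <stdint.h>

/* Value a denied port read of `bytes` (1, 2 or 4) should yield: start
 * from all ones and mask down to the access width. */
uint32_t denied_read_value(unsigned int bytes)
{
    uint32_t data = UINT32_MAX;                /* all ones, not 0xff */

    assert(bytes == 1 || bytes == 2 || bytes == 4);
    if (bytes < 4)
        data &= (UINT32_C(1) << (bytes * 8)) - 1;  /* keep the low bytes */
    return data;
}
```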
Jan Beulich [Mon, 9 Jul 2012 09:30:16 +0000 (10:30 +0100)]
x86/mm: fix mod_l1_entry() return value when encountering r/o MMIO page
While putting together the workaround announced in
http://lists.xen.org/archives/html/xen-devel/2012-06/msg00709.html, I
found that mod_l1_entry(), upon encountering a set bit in
mmio_ro_ranges, would return 1 instead of 0 (the removal of the write
permission is supposed to be entirely transparent to the caller, even
more so to the calling guest).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25487:baa85434d0ec
xen-unstable date: Thu Jun 21 11:30:59 2012 +0200
George Dunlap [Mon, 9 Jul 2012 09:24:44 +0000 (10:24 +0100)]
xen: Fix schedule()'s grabbing of the schedule lock
Because the location of the lock can change between the time you read
it and the time you grab it, the per-cpu schedule locks need to check
after lock acquisition that the lock location hasn't changed, and
release and re-try if so. This change was effected throughout the
source code, but one very important place was apparently missed: in
schedule() itself.
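Illustration (not part of the changeset): a sketch of the
"acquire, re-check the lock location, retry" pattern described above.
The spinlock and the cpu_sched structure are simplified stand-ins built
on C11 atomics, not the hypervisor's own types.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal spinlock built on a C11 atomic flag. */
typedef struct { atomic_flag held; } spinlock_t;

void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                                   /* busy-wait */
}

void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical per-cpu scheduler data: the lock pointer may be changed
 * by someone else between reading it and acquiring the lock. */
struct cpu_sched { _Atomic(spinlock_t *) lock; };

/* Take whatever lock currently protects `sd`, retrying if the lock
 * pointer changed while we were waiting for the old one. */
spinlock_t *sched_lock_acquire(struct cpu_sched *sd)
{
    for (;;) {
        spinlock_t *l = atomic_load(&sd->lock);

        assert(l);
        spin_lock(l);
        if (l == atomic_load(&sd->lock))
            return l;                       /* still the right lock */
        spin_unlock(l);                     /* location moved: retry */
    }
}
```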
George Dunlap [Mon, 9 Jul 2012 09:22:58 +0000 (10:22 +0100)]
xen, vtd: Fix device check for devices behind PCIe-to-PCI bridges
On some systems, requests from devices behind a PCIe-to-PCI bridge all
appear to the IOMMU as though they come from slot 0, function 0
on that device; so the mapping code must punch a hole for X:0.0 in the
IOMMU for such devices. When punching the hole, if that device has
already been mapped once, we simply need to check ownership to make
sure it's legal. To do so, domain_context_mapping_one() will look up
the device for the mapping with pci_get_pdev() and look for the owner.
However, if there is no device in X:0.0, this look up will fail.
Rather than returning -ENODEV in this situation (causing a failure in
mapping the device), try to get the domain ownership from the iommu
context mapping itself.
Jan Beulich [Mon, 9 Jul 2012 09:21:42 +0000 (10:21 +0100)]
x86: update Intel CPUID masking code to latest spec
..., which adds masking of the xsave feature leaf.
Also add back (and fix to actually make it do what it was supposed to
do from the beginning) the printing of what specific masking couldn't
be done in case the user requested something the hardware doesn't
support.
Zhigang Wang [Mon, 9 Jul 2012 09:19:15 +0000 (10:19 +0100)]
tools/pygrub: fix solaris kernel sniff
Solaris 11 build 163+ removes '/platform/i86xpv/kernel/unix' and only the
64-bit PV kernel file '/platform/i86xpv/kernel/amd64/unix' exists.
This patch fixes the detection.
Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Frank Che <frank.che@oracle.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23686:7c39a2c0d870
xen-unstable date: Thu Jul 14 18:09:58 2011 +0100
Andrew Cooper [Tue, 3 Jul 2012 12:50:01 +0000 (13:50 +0100)]
xen: Fix off-by-one error when parsing command line arguments
As Xen currently stands, it will attempt to interpret the first few
bytes of the initcall section as a struct kernel_param.
The reason that this has not caused problems is that in the overflow
case, param->name is actually a function pointer to the first
initcall, and interpreting it as a string is very unlikely to match an
ASCII command line parameter name.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25587:2cffb7bf6e57
xen-unstable date: Tue Jul 03 13:38:19 2012 +0100
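Illustration (not part of the changeset): a sketch of walking a
parameter table as a half-open range, which is what avoids the
off-by-one described above. The struct layout and find_param() are
simplified, hypothetical stand-ins for the real parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified parameter descriptor, for illustration only. */
struct kernel_param { const char *name; int *var; };

/* Look up `name` in the half-open range [start, end). Using `<=`
 * instead of `<` would read one element past the table. */
const struct kernel_param *find_param(const struct kernel_param *start,
                                      const struct kernel_param *end,
                                      const char *name)
{
    assert(start <= end);
    for (const struct kernel_param *p = start; p < end; p++)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```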
Andrew Cooper [Tue, 3 Jul 2012 12:49:32 +0000 (13:49 +0100)]
x86/nmi: Fix deadlock in unknown_nmi_error()
Additionally, correct the text description to reflect what is being
done, and make use of fatal_trap() in preference to kexec_crash() in
case an unknown NMI occurs before a kdump kernel has been loaded.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25478:6d1a30dc47e8
xen-unstable date: Mon Jun 11 15:12:50 2012 +0100
Andrew Cooper [Tue, 3 Jul 2012 12:48:58 +0000 (13:48 +0100)]
x86_64: Fix off-by-one error setting up the Interrupt Stack Tables
The Interrupt Stack Table entries in a 64bit TSS are a 1 based data
structure as far as hardware is concerned. As a result, the code
setting up stacks in subarch_percpu_traps_init() fills in the wrong
IST entries.
The result is that the MCE handler executes on the stack set up for
NMIs; the NMI handler executes on a stack set up for Double Faults,
and Double Faults are executed with a stack pointer set to 0.
Once the #DF handler starts to execute, it will usually take a page
fault looking up the address at 0xfffffffffffffff8, which will cause a
triple fault. If a guest has mapped a page in that location, then it
will have some state overwritten, but as the #DF handler always calls
panic(), this is not a problem the guest will have time to care about.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25271:54da0329e259
xen-unstable date: Thu May 10 11:04:32 2012 +0100
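Illustration (not part of the changeset): a sketch of the 1-based IST
numbering. The IST_DF/IST_NMI/IST_MCE values are assumed for the
example, and tss_ist models only the IST slots of a 64-bit TSS.

```c
#include <assert.h>
#include <stdint.h>

#define IST_MAX 7

/* Assumed vector-to-IST assignment, for illustration only. */
enum { IST_DF = 1, IST_NMI = 2, IST_MCE = 3 };

/* Only the IST part of a 64-bit TSS: ist[0] is the slot that hardware
 * calls IST1. */
struct tss_ist { uint64_t ist[IST_MAX]; };

/* Store a stack top using the hardware's 1-based IST number. */
void set_ist(struct tss_ist *tss, unsigned int ist, uint64_t stack_top)
{
    assert(ist >= 1 && ist <= IST_MAX);
    tss->ist[ist - 1] = stack_top;
}

/* Stack the CPU would load for a gate whose IST field is `ist`. */
uint64_t ist_stack(const struct tss_ist *tss, unsigned int ist)
{
    assert(ist >= 1 && ist <= IST_MAX);
    return tss->ist[ist - 1];
}
```

Indexing the array directly with the 1-based number would shift every
stack up by one slot, which matches the symptoms listed above.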
Zheng Li [Tue, 3 Jul 2012 12:48:07 +0000 (13:48 +0100)]
tools/ocaml: Fix 2 bit-twiddling bugs and an off-by-one
The bit bugs are in ocaml vcpu affinity calls, and the off-by-one
error is in the ocaml console ring code
Signed-off-by: Zheng Li <zheng.li@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell.com> Committed-by: Ian Jackson <ian.jackson.citrix.com> Acked-by: Jon Ludlam <jonathan.ludlam@eu.citrix.com>
xen-unstable changeset: 23940:187d59e32a58
xen-unstable date: Mon Oct 10 16:41:16 2011 +0100
Jan Beulich [Tue, 12 Jun 2012 10:42:57 +0000 (11:42 +0100)]
x86-64: detect processors subject to AMD erratum #121 and refuse to boot
Processors with this erratum are subject to a DoS attack by unprivileged
guest users.
This is XSA-9 / CVE-2012-2934.
Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 25481:422880dc94a4
xen-unstable date: Tue Jun 12 11:33:42 2012 +0100 Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 12 Jun 2012 10:46:11 +0000 (11:46 +0100)]
x86-64: fix #GP generation in assembly code
When guest use of sysenter (64-bit PV guest) or syscall (32-bit PV
guest) gets converted into a GP fault (due to no callback having got
registered), we must
- honor the GP fault handler's request to keep enabled or mask event
delivery
- not allow TBF_EXCEPTION to remain set past the generation of the
(guest) exception in the vCPU's trap_bounce.flags, as that would
otherwise allow for the next exception occurring in guest mode,
should it happen to get handled in Xen itself, to nevertheless get
bounced to the guest kernel.
Also, just like compat mode syscall handling already did, native mode
sysenter handling should, when converting to #GP, subtract 2 from the
RIP present in the frame so that the guest's GP fault handler would
see the fault pointing to the offending instruction instead of past it.
Finally, since those exception generating code blocks needed to be
modified anyway, convert them to make use of UNLIKELY_{START,END}().
[ This bug is a security vulnerability, XSA-8 / CVE-2012-0218. ]
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25200:80f4113be500 25204:569d6f05e1ef Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 12 Jun 2012 10:38:30 +0000 (11:38 +0100)]
x86_64: Do not execute sysret with a non-canonical return address
Check for non-canonical guest RIP before attempting to execute sysret.
If sysret is executed with a non-canonical value in RCX, Intel CPUs
take the fault in ring0, but we will necessarily already have switched
to the user's stack pointer.
This is a security vulnerability, XSA-7 / CVE-2012-0217.
Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Keir Fraser <keir.xen@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 25480:76eaf5966c05
xen-unstable date: Tue Jun 12 11:33:40 2012 +0100 Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
block-remus.c: In function 'ramdisk_flush':
block-remus.c:508: error: 'buf' may be used uninitialized in this
function
make[5]: *** [block-remus.o] Error 1
This is because gcc can see that merge_requests doesn't always set
*mergedbuf but gcc isn't able to prove that it always does so if
merge_requests returns 0 and that in that case the value of
ramdisk_flush::buf isn't used.
This is too useful a warning to disable, despite the occasional false
positive of this form. The conventional approach is to suppress the
warning by explicitly initialising the variable to 0.
This has just come to light because 25275:27d63b9f111a reenabled
optimisation for this area of code, and gcc's data flow analysis
(which is required to trigger the uninitialised variable warning) only
occurs when optimisation is turned on.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 25281:60064411a8a9
xen-unstable date: Thu May 10 14:26:14 2012 +0100
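Illustration (not part of the changeset): a sketch of the false-positive
shape described above and the conventional fix. pick_buffer() and
flush() are hypothetical helpers, not the block-remus code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper shaped like the case above: *out is written only
 * on success (return 0). */
int pick_buffer(char *src, size_t len, char **out)
{
    if (len == 0)
        return -1;          /* failure: *out left untouched */
    *out = src;
    return 0;
}

/* The caller only uses buf after a successful call, but the compiler
 * may not be able to prove that without the explicit initialiser. */
size_t flush(char *src, size_t len)
{
    char *buf = NULL;       /* explicit initialisation silences the warning */

    if (pick_buffer(src, len, &buf) != 0)
        return 0;
    assert(buf != NULL);
    return len;
}
```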
Olaf Hering [Mon, 14 May 2012 15:51:27 +0000 (16:51 +0100)]
unmodified_drivers: remove inclusion of asm/system.h
Allow compilation of PVonHVM drivers with forward-ported xenlinux
sources in openSuSE 12.2. Since Linux 3.4 asm/system.h is not present
anymore. Remove inclusion of this header, it's not needed.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25327:cc7a054a5a27
xen-unstable date: Mon May 14 12:04:27 2012 +0200
Jan Beulich [Mon, 14 May 2012 15:50:58 +0000 (16:50 +0100)]
unmodified drivers: use upstream sync_bitops if available
The forward ported xenlinux sources in openSuSE 12.2 were switched
from the old synch_bitops to the sync_bitops since kernel version
3.3. Add compat macros to use either old or new helpers depending on
used kernel source version.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Olaf Hering <olaf@aepfle.de>
xen-unstable changeset: 25069:46bf3ab42baf
xen-unstable date: Fri Mar 16 11:35:06 2012 +0100
Olaf Hering [Mon, 14 May 2012 15:50:21 +0000 (16:50 +0100)]
unmodified drivers: add pfn_is_ram helper for kdump
Register pfn_is_ram helper to speed up reading /proc/vmcore in the kdump
kernel. It is compiled only if the kernel source is recent enough to
have the pfn_is_ram helper (v3.0-rc1, commit 997c136f518c5debd63847e78e2a8694f56dcf90).
Signed-off-by: Olaf Hering <olaf@aepfle.de> Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25068:e4460795ee66
xen-unstable date: Fri Mar 16 11:34:41 2012 +0100
Olaf Hering [Mon, 14 May 2012 15:50:03 +0000 (16:50 +0100)]
unmodified drivers: hide xen_cpuid_base() in version 2.6.38+
Allow compilation of PVonHVM drivers with forward-ported xenlinux
sources in openSuSE 12.1. xen_cpuid_base() is now in mainline, and the
copy in the xen tree leads to a compile error:
/usr/src/packages/BUILD/xen-4.2.24547/non-dbg/obj/default/platform-pci/platform-pci.c:121:
error: redefinition of 'xen_cpuid_base'
/usr/src/linux-3.0.13-0.11/arch/x86/include/asm/xen/hypervisor.h:43:
error: previous definition of 'xen_cpuid_base' was here
The reason is that the kernel sources are searched before the xen
sources for asm/hypervisor.h:
Ian Campbell [Mon, 14 May 2012 15:49:24 +0000 (16:49 +0100)]
unmodified_drivers: update README from
http://wiki.xen.org/xenwiki/UnmodifiedDrivers
Add reference to the fact that these drivers are for "classic-Xen"
kernels only and do not work with PVops but point towards the PVHVM
functionality in mainstream.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
xen-unstable changeset: 24045:4ed766d70396
xen-unstable date: Wed Oct 26 17:44:03 2011 +0100
George Dunlap [Tue, 1 May 2012 13:15:20 +0000 (14:15 +0100)]
svm: Fake out the Bus Unit Config MSR on revF AMD CPUs
Win2k8 x64 reads this MSR on revF chips, where it wasn't publicly
available; it uses a magic constant in %rdi as a password, which we
don't have in rdmsr_safe(). Since we'll ignore the later writes, just
use a plausible value here (the reset value from rev10h chips) if the
real CPU didn't provide one.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 24990:322300fd2ebd
xen-unstable date: Thu Mar 08 09:17:21 2012 +0000
svm: amend c/s 24990:322300fd2ebd (fake BU_CFG MSR on AMD revF)
Let's restrict such a hack to the known affected family.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
xen-unstable changeset: 25058:f47d91cb0faa
xen-unstable date: Thu Mar 15 15:09:18 2012 +0100
x86-64: Fix memory hotplug epfn upper limit test for updating the compat M2P table
The epfn is being compared to (RDWR_COMPAT_MPT_VIRT_END -
RDWR_COMPAT_MPT_VIRT_START) without a 2 bit shift, resulting in the
epfn being compared to the size of the RDWR_COMPAT_MPT table in bytes
instead of the maximum page frame number that the RDWR_COMPAT_MPT
table can map.
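Illustration (not part of the changeset): a sketch of the corrected
limit, assuming each compat M2P entry is 4 bytes wide (hence the 2 bit
shift). The function names and the clamping behaviour are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each compat M2P entry is assumed to be 4 bytes wide, so a table of
 * (virt_end - virt_start) bytes describes that many >> 2 page frames. */
uint64_t compat_mpt_max_pfn(uint64_t virt_start, uint64_t virt_end)
{
    assert(virt_end >= virt_start);
    return (virt_end - virt_start) >> 2;
}

/* Clamp the end pfn of a hot-added range to what the table can map.
 * Comparing against the unshifted byte size would let too much through. */
uint64_t clamp_epfn(uint64_t epfn, uint64_t virt_start, uint64_t virt_end)
{
    uint64_t max_pfn = compat_mpt_max_pfn(virt_start, virt_end);

    return epfn > max_pfn ? max_pfn : epfn;
}
```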
Jan Beulich [Tue, 17 Apr 2012 07:35:09 +0000 (08:35 +0100)]
x86/hpet: disable before reboot or kexec
Linux up to now is not smart enough to properly clear the HPET when it
boots, which is particularly a problem when a kdump attempt from
running under Xen is being made. Linux itself added code to work
around this to its shutdown paths quite some time ago, so let's do something
similar in Xen: Save the configuration register settings during boot,
and restore them during shutdown. This should cover the majority of
cases where the secondary kernel might not come up because timer
interrupts don't work.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25101:f06ff3dfde08
xen-unstable date: Tue Mar 27 15:20:23 2012 +0200
Jan Beulich [Tue, 17 Apr 2012 07:33:33 +0000 (08:33 +0100)]
XENPF_set_processor_pminfo XEN_PM_CX overflows states array
Calling XENPF_set_processor_pminfo with XEN_PM_CX could cause the states
array in "struct acpi_processor_power" to exceed its limit.
The array used to be reset (by function cpuidle_init_cpu()) for each
hypercall. The patch puts it back that way and adds an assertion to
make it clear in case that happens again.
Signed-off-by: Eric Chanudet <eric.chanudet@eu.citrix.com>
- convert assertion to printk() & bail
- eliminate struct acpi_processor_cx's valid member (not read anymore)
- further adjustments to one-time-only vs each-time operations in
cpuidle_init_cpu()
- don't use ACPI_STATE_Cn as array index anymore
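A minimal sketch of the pattern described above: reset the array on every call, refuse oversized input instead of asserting, and index by position rather than by C-state type. The structure layout, the limit of 8 and the function name are assumptions made for illustration; they are not the definitions used in Xen.

```c
#include <string.h>

#define MAX_CX_STATES 8 /* hypothetical limit, for illustration only */

struct cx_state {
    unsigned int type;    /* C-state type reported by the caller */
    unsigned int latency; /* exit latency, arbitrary units */
};

struct cx_power {
    unsigned int count;
    struct cx_state states[MAX_CX_STATES];
};

/*
 * Replace the stored C-state table with the n entries in `in`.
 * The table is reset on every call; an oversized request is rejected
 * (returns -1) and leaves the table empty. Entries are stored by
 * position, not by their type value.
 */
int set_cx_states(struct cx_power *p, const struct cx_state *in,
                  unsigned int n)
{
    unsigned int i;

    memset(p, 0, sizeof(*p));       /* reset for each call */

    if (n > MAX_CX_STATES)
        return -1;                  /* bail rather than overflow */

    for (i = 0; i < n; i++)
        p->states[p->count++] = in[i];

    return 0;
}
```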
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 24996:396801f25e92
xen-unstable date: Thu Mar 08 17:04:32 2012 +0100
Future AMD CPUs support TSC scaling. It allows guests to have a
different TSC frequency from the host system using this formula:
guest_tsc = host_tsc * tsc_ratio + vmcb_offset. The tsc_ratio is a
64-bit MSR that contains a fixed-point number in 8.32 format (8 bits
for the integer part and 32 bits for the fractional part). For instance
0x00000003_80000000 means tsc_ratio=3.5.
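The formula can be worked through in software as a minimal sketch. The function and parameter names are illustrative only, and this is not the code from the patch (the point of the feature is that hardware performs this calculation).

```c
#include <stdint.h>

/*
 * Apply an 8.32 fixed-point ratio to a host TSC value and add an offset:
 *   guest_tsc = host_tsc * tsc_ratio + offset
 * The fractional product is built from the two 32-bit halves of host_tsc,
 * so no 128-bit type is needed. The final sum wraps modulo 2^64 like any
 * unsigned arithmetic.
 */
uint64_t scale_tsc(uint64_t host_tsc, uint64_t tsc_ratio, uint64_t offset)
{
    uint64_t ratio_int  = (tsc_ratio >> 32) & 0xff;   /* 8-bit integer part */
    uint64_t ratio_frac = tsc_ratio & 0xffffffffULL;  /* 32-bit fraction */
    uint64_t tsc_hi = host_tsc >> 32;
    uint64_t tsc_lo = host_tsc & 0xffffffffULL;

    /* floor(host_tsc * ratio_frac / 2^32), computed exactly */
    uint64_t frac_part = tsc_hi * ratio_frac + ((tsc_lo * ratio_frac) >> 32);

    return host_tsc * ratio_int + frac_part + offset;
}
```

With the ratio from the text, `scale_tsc(1000, 0x0000000380000000ULL, 0)` evaluates to 3500, i.e. 1000 * 3.5.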
This patch enables TSC scaling ratio for SVM. With it, guest VMs don't
need to take a #VMEXIT to calculate a translated TSC value when
running under TSC emulation mode. This can substantially reduce the
rdtsc overhead.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
xen-unstable changeset: 23437:d7c755c25bb9
xen-unstable date: Sat May 28 08:58:08 2011 +0100
Jacob Shin [Thu, 12 Apr 2012 08:08:13 +0000 (09:08 +0100)]
hvm: vpmu: Add support for AMD Family 15h processors
AMD Family 15h CPU mirrors legacy K7 performance monitor counters to
a new location, and adds 2 new counters. This patch updates HVM VPMU
to take advantage of the new counters.
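As a rough illustration of the "mirrored to a new location" layout, the sketch below maps a legacy K7 counter MSR to its Family 15h counterpart. The MSR addresses are the commonly documented ones as far as I recall and should be verified against AMD's BKDG; the function name is hypothetical and this is not the code from the patch.

```c
#include <stdint.h>

/* Assumed MSR addresses; verify against AMD documentation. */
#define MSR_K7_EVNTSEL0    0xC0010000u  /* legacy event selects 0..3 */
#define MSR_K7_PERFCTR0    0xC0010004u  /* legacy counters 0..3 */
#define MSR_F15H_PERF_CTL  0xC0010200u  /* Fam15h: CTLn at base + 2*n */
#define MSR_F15H_PERF_CTR  0xC0010201u  /* Fam15h: CTRn at base + 2*n */

/*
 * Translate a legacy K7 performance-counter MSR to the interleaved
 * Family 15h location. Any other MSR is returned unchanged.
 */
uint32_t k7_to_f15h_msr(uint32_t msr)
{
    if (msr >= MSR_K7_EVNTSEL0 && msr < MSR_K7_EVNTSEL0 + 4)
        return MSR_F15H_PERF_CTL + 2 * (msr - MSR_K7_EVNTSEL0);

    if (msr >= MSR_K7_PERFCTR0 && msr < MSR_K7_PERFCTR0 + 4)
        return MSR_F15H_PERF_CTR + 2 * (msr - MSR_K7_PERFCTR0);

    return msr;
}
```

The two additional counters (indices 4 and 5) have no legacy alias under this layout, so they are reachable only through the new address range.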
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
xen-unstable changeset: 23306:e787d4f2e5ac
xen-unstable date: Mon May 09 09:54:46 2011 +0100
xenoprof: Add support for AMD Family 15h processors
AMD Family 15h CPU mirrors legacy K7 performance monitor counters to
a new location, and adds 2 new counters. This patch updates xenoprof
to take advantage of the new counters.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Rename fam15h -> amd_fam15h in a few places, as suggested by Jan
Beulich.
Andre Przywara [Thu, 12 Apr 2012 08:06:02 +0000 (09:06 +0100)]
svm: implement instruction fetch part of DecodeAssist (on #PF/#NPF)
Newer SVM implementations (Bulldozer) copy up to 15 bytes from the
instruction stream into the VMCB when a #PF or #NPF exception is
intercepted. This patch makes use of this information if available.
This saves us from a) traversing the guest's page tables, b) mapping
the guest's memory and c) copying the instructions from there into the
hypervisor's address space.
This speeds up #NPF intercepts quite a lot and avoids cache and TLB
thrashing.
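The fast path can be sketched as follows. The structure is a hypothetical stand-in for the relevant VMCB fields (a byte count plus up to 15 instruction bytes), and the function name is invented for illustration; the real field names and layout are defined by the APM and by Xen's VMCB structure.

```c
#include <stdint.h>
#include <string.h>

#define MAX_INSN_BYTES 15

/* Hypothetical stand-in for the DecodeAssist fields of the VMCB. */
struct vmcb_fetch_info {
    uint8_t insn_len;                   /* 0 if hardware supplied nothing */
    uint8_t insn_bytes[MAX_INSN_BYTES]; /* bytes copied by hardware */
};

/*
 * Copy the hardware-provided instruction bytes into buf (which must hold
 * at least MAX_INSN_BYTES). Returns the number of bytes copied, or 0 if
 * nothing usable was provided and the caller must fall back to walking
 * the guest's page tables.
 */
unsigned int fetch_from_vmcb(const struct vmcb_fetch_info *info, uint8_t *buf)
{
    unsigned int len = info->insn_len;

    if (len == 0 || len > MAX_INSN_BYTES)
        return 0;

    memcpy(buf, info->insn_bytes, len);
    return len;
}
```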
Newer SVM implementations (Bulldozer) give the desired address on
an INVLPG intercept explicitly in the EXITINFO1 field of the VMCB.
Use this address to avoid a costly instruction fetch and decode
cycle.
Newer SVM implementations (Bulldozer) now give the used general
purpose register on a MOV-CR intercept explicitly. This avoids
fetching and decoding the instruction from guest's memory and speeds
up some Windows guests, which exercise CR8 quite often.
Chapter 15.33 of recent APM Vol.2 manuals describes some additions
to SVM called DecodeAssist. Add the newly added fields to the VMCB
structure and name the associated CPUID bit.
vmx/hvm: move mov-cr handling functions to generic HVM code
Currently the handling of CR access intercepts is done much
differently in SVM and VMX. For future usage move the VMX part
into the generic HVM path and use the exported functions.
David Vrabel [Wed, 11 Apr 2012 18:41:14 +0000 (19:41 +0100)]
x86: fix delta calculation in TSC deadline timer emulation
In the virtual LAPIC, correct the delta calculation when emulating the
TSC deadline timer.
Without this fix, XenServer (which is based on Xen 4.1) does not work
when running as an HVM guest. dom0 fails to boot because its timer
interrupts are very delayed (by several minutes in some cases).
libxl: support for "rtc_timeoffset" and "localtime"
Implement "rtc_timeoffset" and "localtime" options compatible with xm.
rtc_timeoffset is the offset between host time and guest time.
localtime specifies whether the emulated RTC appears as UTC or is
offset by the host.
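One way to picture how the two options interact is the sketch below. It assumes they combine additively, which the text above does not spell out; the function and parameter names are hypothetical and this is not the libxl code.

```c
/*
 * Compute the offset (in seconds) applied to the guest's emulated RTC.
 * Assumption: with localtime enabled, the host's offset from UTC is
 * added on top of the configured rtc_timeoffset.
 */
long effective_rtc_offset(long rtc_timeoffset, int localtime,
                          long host_utc_offset)
{
    return localtime ? rtc_timeoffset + host_utc_offset : rtc_timeoffset;
}
```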
Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Committed-by: Ian Jackson <ian.jackson.citrix.com>
xen-unstable changeset: 25131:6f81f4d79fde Backport-requested-by: Giam Teck Choon <giamteckchoon@gmail.com> Signed-off-by: Giam Teck Choon <giamteckchoon@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Bugzilla 1680: Xend fails to start if /var/lib/xend/state/*.xml are empty
which I get often when replacing the Xen hypervisor with a newer version.
This can easily be reproduced under Fedora Core 16 by installing
xen RPMs and then replacing the xen.gz with a newer version.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Anthony Low <shinji@pikopiko.org> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 24140:a3a2e300951a Backport-requested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
xen-unstable changeset: 24459:caf9753d4cc1 Backport-requested-by: Roderick Colenbrander <thunderbird2k@gmail.com> Signed-off-by: Giam Teck Choon <giamteckchoon@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
During creation of the PV domain we allocate the E820 structure to
have the number of E820 entries on the machine, plus three.
This will allow the tool stack to fill the E820 with more than three
entries. Specifically the use cases is , where the toolstack retrieves
the E820, sanitizes it, and then sets it for the PV guest (for PCI
passthrough), this dynamic number of E820 is just right.
Jan Beulich [Fri, 23 Mar 2012 13:58:22 +0000 (13:58 +0000)]
x86/gnttab: fix asm() operand in gnttab_clear_flag()
The operand needs to use the 'w' modifier in case the compiler happens
to pick a register (which apparently it does for no-one but the
reporter of this problem).
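The effect of the modifier can be seen in a small sketch modelled on the description above. The function name is hypothetical and the constraints are one reasonable choice, not necessarily those used in Xen. On x86 with GCC or Clang, `%w1` makes the compiler print a register operand in its 16-bit form (for example `%ax`), which is what `btrw` requires; without it a 32-bit register name would be emitted and the assembler would reject the instruction.

```c
#include <stdint.h>

/*
 * Atomically clear bit nr (0..15) in a 16-bit word.
 * The caller must keep nr below 16 so the access stays within *addr.
 */
void clear_flag16(unsigned int nr, uint16_t *addr)
{
#if defined(__x86_64__) || defined(__i386__)
    /* %w1: print operand 1 as a 16-bit register if it is a register. */
    __asm__ volatile ("lock btrw %w1,%0"
                      : "+m" (*addr)
                      : "Ir" (nr)
                      : "cc");
#else
    /* Fallback for non-x86 builds, using a GCC/Clang atomic builtin. */
    __atomic_and_fetch(addr, (uint16_t)~(1u << nr), __ATOMIC_SEQ_CST);
#endif
}
```

Because `nr` is a function parameter here, the compiler has to place it in a register, which is exactly the case the modifier exists for.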
Reported-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25092:a66fb91cb8d3
xen-unstable date: Fri Mar 23 08:39:39 2012 +0100