libxl: Add a vkbd frontend/backend pair for HVM guests
Linux PV on HVM guests can use vkbd, so add a vkbd frontend/backend
pair for HVM guests by default. This is useful because, unlike the
emulated USB keyboard/mouse, it does not require frequent qemu wakeups.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Daniel De Graaf [Tue, 22 Nov 2011 13:29:48 +0000 (13:29 +0000)]
xsm/flask: fix resource list range checks
The FLASK security checks for resource ranges were not implemented
correctly - only the permissions on the endpoints of a range were
checked, instead of all items contained in the range. This would allow
certain resources (I/O ports, I/O memory) to be used by domains in
contravention of security policy.
This also corrects a bug where adding overlapping resource ranges did
not trigger an error.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
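A minimal C sketch of the corrected semantics, assuming a hypothetical per-item policy callback (the real FLASK code consults the policy's resource labels rather than calling a function per port): every item in the range is tested, not only the endpoints, and a helper detects overlapping inclusive ranges. The names range_allowed, ranges_overlap and example_policy are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-item policy query standing in for the FLASK check. */
typedef bool (*item_allowed_fn)(uint64_t item);

/* Corrected behaviour: every item in [start, end] must be permitted,
 * not only the two endpoints. */
static bool range_allowed(uint64_t start, uint64_t end, item_allowed_fn allowed)
{
    if (start > end)
        return false;
    for (uint64_t i = start; ; i++) {
        if (!allowed(i))
            return false;
        if (i == end)          /* stop here; avoids wrap-around at UINT64_MAX */
            break;
    }
    return true;
}

/* Two inclusive ranges overlap unless one ends before the other starts;
 * adding a range that overlaps an existing one should be rejected. */
static bool ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
    return s1 <= e2 && s2 <= e1;
}

/* Example (hypothetical) policy: deny only I/O port 0x70. */
static bool example_policy(uint64_t port)
{
    return port != 0x70;
}
```

With example_policy, an endpoints-only check would accept the range 0x60-0x80 because both 0x60 and 0x80 are allowed; range_allowed rejects it because 0x70 lies inside.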
Wei Wang [Tue, 22 Nov 2011 13:27:19 +0000 (13:27 +0000)]
amd iommu: Support INVALIDATE_IOMMU_ALL command.
It is one of the new architectural commands supported by iommu v2.
It instructs the IOMMU to clear all address translation and interrupt
remapping caches for all devices and all domains.
Signed-off-by: Wei Wang <wei.wang2@amd.com> Committed-by: Keir Fraser <keir@xen.org>
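Building such a command is straightforward. The sketch below assumes, from my reading of the AMD IOMMU specification, that commands are 128 bits (four 32-bit words), that the opcode occupies bits 31:28 of the second word, and that INVALIDATE_IOMMU_ALL is opcode 0x08; verify these against the specification. build_invalidate_all is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumed from the AMD IOMMU specification; double-check before use. */
#define CMD_OPCODE_SHIFT          28
#define CMD_INVALIDATE_IOMMU_ALL  0x08u

/* Fill one 128-bit command-buffer entry (four 32-bit words). */
static void build_invalidate_all(uint32_t cmd[4])
{
    memset(cmd, 0, 4 * sizeof(uint32_t));
    /* Only the opcode is set: the command takes no device or domain
     * operands because it flushes the caches for all of them. */
    cmd[1] = CMD_INVALIDATE_IOMMU_ALL << CMD_OPCODE_SHIFT;
}
```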
Jan Beulich [Mon, 21 Nov 2011 08:29:31 +0000 (09:29 +0100)]
x86/vioapic: clear remote IRR when switching RTE to edge triggered mode
Xen itself (as well as Linux) relies on this behavior, so it should
also emulate it properly. Not doing so reportedly gets in the way of
kexec inside an HVM guest.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Olaf Hering <olaf@aepfle.de>
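A sketch of the emulated behaviour, assuming the standard IO-APIC redirection-entry layout (bit 14 = remote IRR, read-only to software; bit 15 = trigger mode, 1 for level). vioapic_write_rte is a hypothetical helper, not Xen's actual function.

```c
#include <stdint.h>

#define RTE_REMOTE_IRR  (1ull << 14)
#define RTE_TRIG_LEVEL  (1ull << 15)   /* 1 = level, 0 = edge */

/* Software cannot write remote IRR, so keep its current value, but clear it
 * when the guest switches the entry to edge-triggered mode. */
static uint64_t vioapic_write_rte(uint64_t old_rte, uint64_t new_val)
{
    uint64_t rte = (new_val & ~RTE_REMOTE_IRR) | (old_rte & RTE_REMOTE_IRR);

    if (!(rte & RTE_TRIG_LEVEL))
        rte &= ~RTE_REMOTE_IRR;
    return rte;
}
```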
Jean Guyader [Fri, 18 Nov 2011 13:42:46 +0000 (13:42 +0000)]
hvmloader: Change memory relocation loop when overlap with PCI hole
Change the way we relocate memory pages that overlap with the PCI
hole. Use the new map space (XENMAPSPACE_gmfn_range) to move the loop
into Xen.
This code usually gets triggered when a device is passed through to a
guest and the PCI hole has to be extended to have enough room to map
the device BARs. The PCI hole will then start lower and might overlap
with some RAM that has been allocated for the guest. This usually
happens if the guest has more than 4G of RAM. We have to relocate
those pages into high memory, otherwise they won't be accessible.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
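The arithmetic behind a single batched move can be sketched as follows. The struct reloc_req and plan_relocation names are hypothetical; the struct only loosely mirrors what one XENMAPSPACE_gmfn_range request carries (source frame, destination frame, count), and the caller is assumed to pass 4GB as high_mem_end when no high memory exists yet.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical description of one batched relocation request. */
struct reloc_req {
    uint64_t src_pfn;   /* first guest frame to move */
    uint64_t dst_pfn;   /* new location above 4GB */
    uint64_t nr_pages;  /* number of contiguous frames */
};

/* RAM in [pci_mem_start, low_mem_end) collides with the enlarged PCI hole
 * and is moved, as one range, to the current end of high memory.
 * Returns 1 if a move is needed, 0 otherwise. */
static int plan_relocation(uint64_t pci_mem_start, uint64_t low_mem_end,
                           uint64_t high_mem_end, struct reloc_req *req)
{
    if (pci_mem_start >= low_mem_end)
        return 0;                       /* no overlap, nothing to move */

    req->src_pfn  = pci_mem_start >> PAGE_SHIFT;
    req->nr_pages = (low_mem_end - pci_mem_start) >> PAGE_SHIFT;
    req->dst_pfn  = high_mem_end >> PAGE_SHIFT;
    return 1;
}
```

Issuing one range request instead of looping page by page in hvmloader is what moves the loop into Xen.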
Juergen Gross [Fri, 18 Nov 2011 13:34:43 +0000 (13:34 +0000)]
sched_sedf: Avoid panic when adjusting sedf parameters
When using the sedf scheduler in a cpupool, the system might panic when
setting sedf scheduling parameters for a domain. Introduce a
for_each_domain_in_cpupool macro, as it is now usable in 4 places. Add
appropriate locking in cpupool_unassign_cpu().
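One way such an iterator can be written is sketched below, with minimal stand-in structures (the field names and the explicit list head argument are assumptions, not Xen's exact definitions). The trailing `if (...) {} else` makes the caller's statement the loop body for matching domains only, while staying safe against dangling-else surprises.

```c
#include <stddef.h>

/* Minimal stand-ins for the real structures; fields are illustrative. */
struct cpupool { int id; };
struct domain {
    int domain_id;
    struct cpupool *cpupool;
    struct domain *next_in_list;
};

/* Walk the whole domain list, visiting only domains that belong to pool c. */
#define for_each_domain_in_cpupool(d, head, c)                  \
    for ((d) = (head); (d) != NULL; (d) = (d)->next_in_list)    \
        if ((d)->cpupool != (c)) {} else

/* Example use: count the domains in a pool. */
static int count_domains_in_pool(struct domain *head, struct cpupool *c)
{
    struct domain *d;
    int n = 0;

    for_each_domain_in_cpupool(d, head, c)
        n++;
    return n;
}
```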
Paul Durrant [Fri, 18 Nov 2011 13:32:50 +0000 (13:32 +0000)]
hvmloader: Add configuration options to selectively disable S3 and S4 ACPI power states.
Introduce acpi_s3 and acpi_s4 configuration options (default=1). The
S3 and S4 packages are moved into separate SSDTs and their inclusion
is controlled by the new configuration options.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Paul Durrant [Fri, 18 Nov 2011 13:31:43 +0000 (13:31 +0000)]
hvmloader: Move acpi_enabled out of hvm_info_table into xenstore
Since hvmloader has a xenstore client, use a platform key in xenstore
to indicate whether ACPI is enabled or not rather than the shared
hvm_info_table structure.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 18 Nov 2011 08:22:45 +0000 (09:22 +0100)]
x86/xsave: provide guests with finit-like environment
Without the use of xsave, guests get their initial floating point
environment set up with finit. At least NetWare actually depends on
this (in particular on all exceptions being masked), so to be
consistent set the same environment also when using xsave. This is
also in line with all SSE exceptions getting masked initially.
To avoid further fragile casts in xstate_alloc_save_area(), the patch
also changes xsave_struct's fpu_sse member to have actually usable
fields.
The patch was tested in its 4.1.2 version, which is technically
identical but differs in which files it modifies.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Charles Arnold <carnold@suse.com> Acked-by: Keir Fraser <keir@xen.org>
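A sketch of what "usable fields" plus a finit-like setup can look like. It assumes the architectural FXSAVE-compatible layout of the first 512 bytes of the XSAVE area, and the architectural defaults FCW = 0x037F (what FNINIT loads, all x87 exceptions masked) and MXCSR = 0x1F80 (all SSE exceptions masked). struct fpu_sse_region and init_fpu_like_finit are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define FCW_DEFAULT   0x037f   /* FNINIT control word: x87 exceptions masked */
#define MXCSR_DEFAULT 0x1f80   /* reset MXCSR: SSE exceptions masked */

/* Legacy 512-byte region at the start of the XSAVE area, with named
 * fields instead of an opaque array (only the leading fields shown). */
struct fpu_sse_region {
    union {
        uint8_t x[512];
        struct {
            uint16_t fcw;
            uint16_t fsw;
            uint8_t  ftw;        /* abridged tag word: 0 = all registers empty */
            uint8_t  rsvd1;
            uint16_t fop;
            uint64_t fip;
            uint64_t fdp;
            uint32_t mxcsr;
            uint32_t mxcsr_mask;
        };
    };
};

_Static_assert(sizeof(struct fpu_sse_region) == 512, "legacy region is 512 bytes");

/* Give a fresh save area the same environment finit would produce. */
static void init_fpu_like_finit(struct fpu_sse_region *r)
{
    memset(r, 0, sizeof(*r));
    r->fcw = FCW_DEFAULT;
    r->mxcsr = MXCSR_DEFAULT;
}
```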
Jan Beulich [Fri, 18 Nov 2011 08:21:24 +0000 (09:21 +0100)]
x86/IRQ: prevent vector sharing within IO-APICs
Following the prevention of vector sharing for MSIs, this change
enforces the same within IO-APICs: pin-based interrupts use the IO-APIC
as their identifying device under the AMD IOMMU (and, just as for
MSIs, only the identifying device is used to remap interrupts here,
without regard to an interrupt's destination).
Additionally, LAPIC-initiated EOIs (for level-triggered interrupts)
also use only the vector to identify which interrupts to end. While
this generally causes no significant problem (at worst an interrupt
would be re-raised without a new interrupt event actually having
occurred), it still seems better to avoid the situation.
For this second aspect, a distinction is made between the traditional
and the directed-EOI cases: in the former, vectors must not be shared
across any of the IO-APICs in the system, while in the latter only
individual IO-APICs need to be constrained (or, where the firmware
indicates that the same GSI appears at multiple pins, sub-groups of
them).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
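The distinction between the two EOI modes can be sketched with a hypothetical per-IO-APIC bitmap of used vectors. This ignores the firmware-indicated sub-group case, and vector_used and vector_ok_for are illustrative names, not Xen's data structures.

```c
#include <stdbool.h>

#define NR_VECTORS   256
#define MAX_IO_APICS 8     /* illustrative limit */

/* Hypothetical bookkeeping: which vectors each IO-APIC already uses. */
static bool vector_used[MAX_IO_APICS][NR_VECTORS];

/* Traditional EOI is broadcast by vector to every IO-APIC, so a vector must
 * be unique system-wide; with directed EOI only the target IO-APIC matters. */
static bool vector_ok_for(unsigned int apic, unsigned int nr_apics,
                          unsigned int vector, bool directed_eoi)
{
    if (directed_eoi)
        return !vector_used[apic][vector];

    for (unsigned int a = 0; a < nr_apics; a++)
        if (vector_used[a][vector])
            return false;
    return true;
}
```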
Jan Beulich [Fri, 18 Nov 2011 08:18:41 +0000 (09:18 +0100)]
x86/IO-APIC: refine EOI-ing of migrating level interrupts
Rather than going through all IO-APICs and calling io_apic_eoi_vector()
for the vector in question, just use eoi_IO_APIC_irq().
This in turn allows quite a bit of other code to be eliminated.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 16 Nov 2011 16:04:31 +0000 (16:04 +0000)]
x86/emulator: add feature checks for newer instructions
Certain instructions were introduced only after the i686 or original
x86-64 architecture, so we should not try to emulate them if the guest
is not seeing the respective feature enabled (or, worse, if the
underlying hardware doesn't support them). This affects fisttp,
movnti, and cmpxchg16b.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Keir Fraser <keir@xen.org>
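A sketch of such a feature gate, assuming the ecx/edx arguments hold the guest-visible CPUID leaf 1 output (which should itself be a subset of what the hardware supports). The feature bits are the architectural ones: SSE3 (ECX bit 0) for fisttp, CMPXCHG16B (ECX bit 13), and SSE2 (EDX bit 26) for movnti. The enum and may_emulate are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 feature bits for the instructions named above. */
#define CPUID1_ECX_SSE3  (1u << 0)   /* fisttp */
#define CPUID1_ECX_CX16  (1u << 13)  /* cmpxchg16b */
#define CPUID1_EDX_SSE2  (1u << 26)  /* movnti */

enum insn { INSN_FISTTP, INSN_MOVNTI, INSN_CMPXCHG16B };

/* Refuse to emulate unless the guest's CPUID view advertises the feature. */
static bool may_emulate(enum insn i, uint32_t ecx, uint32_t edx)
{
    switch (i) {
    case INSN_FISTTP:     return ecx & CPUID1_ECX_SSE3;
    case INSN_MOVNTI:     return edx & CPUID1_EDX_SSE2;
    case INSN_CMPXCHG16B: return ecx & CPUID1_ECX_CX16;
    }
    return false;
}
```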
Introduce an event channel for buffered io event notifications, and
advertise the port number using an HVM param. This way the device
model is not forced to check the buffered io page for data several
times a second for the entire life of the VM (buffered io is mostly
used for stdvga emulation in Xen that is switched off after the guest
goes into graphical mode).
Keir Fraser [Mon, 14 Nov 2011 20:15:35 +0000 (20:15 +0000)]
hvmloader: Move acpi_info structure out from low memory.
This avoids a conflict with SeaBIOS's memory management. Moreover
there is no reason that acpi_info must live below 1MB, and moving it
out actually simplifies our code.
Create two new variable prefixes, PREPEND_ and APPEND_, to add compile
flags at the beginning or at the end of the search path.
Add new semantics for user-defined compile flags; here is the list
of possible options:
PREPEND_LIB: add libraries to the search path before the Xen
installation folders.
PREPEND_INCLUDES: add headers to the search path before the Xen
installation folders.
APPEND_LIB: add libraries to the search path at the end
(after all Xen installation folders have been added).
APPEND_INCLUDES: add headers to the search path at the end
(after all Xen installation folders have been added).
EXTRA_INCLUDES and EXTRA_LIB can still be used, and they have the
same effect as PREPEND_INCLUDES and PREPEND_LIB.
Signed-off-by: Roger Pau Monne <roger.pau@entel.upc.edu> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Bugzilla 1680: Xend fails to start if /var/lib/xend/state/*.xml files are empty,
which happens to me often when replacing the Xen hypervisor with a newer version.
This can easily be reproduced under Fedora Core 16 by installing
xen RPMs and then replacing the xen.gz with a newer version.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Anthony Low <shinji@pikopiko.org> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 14 Nov 2011 17:50:53 +0000 (17:50 +0000)]
docs: report if we do not build a doc due to lack of the necessary tool
Previously only some targets did this. An alternative would be to make these
tools a hard dependency; this might make more sense, especially for markdown.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Olaf Hering [Mon, 14 Nov 2011 17:49:14 +0000 (17:49 +0000)]
xenpaging: munmap all pages after page-in
Do munmap() on all mapped pages, not just the first one. Without this
change the gfns backing the remaining pages cannot be paged out again
because the page count does not go down to 1. This change was missing
from changeset 23827:d1d6abc1db20.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
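The shape of the fix, sketched with anonymous memory standing in for the foreign guest mapping that xenpaging actually uses, and assuming the pages were mapped as one contiguous range: the whole length must be unmapped, not just the first page. map_use_unmap is a hypothetical helper.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map nr pages in one go, touch them, then release the entire mapping.
 * Returns 0 on success, -1 on failure. */
static int map_use_unmap(size_t nr)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = nr * pg;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < nr; i++)
        p[i * pg] = 1;             /* touch every page */

    /* The bug pattern would be munmap(p, pg): only the first page is
     * released and the remaining mappings keep their references. */
    return munmap(p, len);
}
```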
Callers of lookups into the p2m code now use variants of get_gfn. All
callers need to call put_gfn. The code behind it is a no-op at the
moment, but will change to proper locking in a later patch.
This patch does not change functionality; it only changes naming and
adds put_gfn calls.
set_p2m_entry retains its name because it is always called with
p2m_lock held.
This patch is humongous, unfortunately, given the dozens of call sites
involved.
After this patch, anyone using old style gfn_to_mfn will not succeed
in compiling their code. This is on purpose: adapt to the new API.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
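The calling convention can be illustrated with a toy p2m. This is a sketch only: the real get_gfn takes a domain and returns the p2m type as well, and put_gfn is a no-op at this stage, whereas here a per-entry counter stands in for the future locking so that the get/put pairing is observable.

```c
#include <stdint.h>

/* Toy p2m: guest frame -> machine frame, plus a per-entry reference count
 * standing in for the locking that later patches add. */
#define P2M_SIZE    16
#define INVALID_MFN UINT64_MAX

static uint64_t p2m[P2M_SIZE];
static int p2m_refs[P2M_SIZE];

/* Lookup: returns the mfn and takes a reference the caller must drop. */
static uint64_t get_gfn(uint64_t gfn)
{
    if (gfn >= P2M_SIZE)
        return INVALID_MFN;
    p2m_refs[gfn]++;
    return p2m[gfn];
}

/* Every successful get_gfn() must be paired with exactly one put_gfn(). */
static void put_gfn(uint64_t gfn)
{
    if (gfn < P2M_SIZE)
        p2m_refs[gfn]--;
}
```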
Lasse Collin [Fri, 11 Nov 2011 13:35:51 +0000 (14:35 +0100)]
Decompressors: check input size in unlzo.c
From: Lasse Collin <lasse.collin@tukaani.org>
The code assumes that the input is valid and not truncated. Add checks to
avoid reading past the end of the input buffer. Change the type of "skip"
from u8 to int to fix a possible integer overflow.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
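Both fixes can be illustrated with a hypothetical header layout (9 fixed bytes, a 1-byte name length, the name, a 4-byte checksum); this is not the exact LZO header. The header size can reach 9 + 1 + 255 + 4 = 269, which no longer fits in a u8, and each read is checked against the available input first.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the header size, or -1 if the input is too short to hold it. */
static int parse_header_len(const uint8_t *in, size_t in_len)
{
    const size_t fixed = 9;         /* hypothetical fixed-size prefix */
    int skip;                       /* int, not u8: the sum can exceed 255 */

    if (in_len < fixed + 1)
        return -1;                  /* cannot even read the length byte */

    skip = (int)fixed + 1 + in[fixed] + 4;
    if ((size_t)skip > in_len)
        return -1;                  /* truncated: would read past the end */
    return skip;
}
```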
Lasse Collin [Fri, 11 Nov 2011 13:35:05 +0000 (14:35 +0100)]
Decompressors: check for write errors in unlzo.c
From: Lasse Collin <lasse.collin@tukaani.org>
The return value of flush() is not checked in unlzo(). This means that
the decompressor won't stop even if the caller doesn't want more data.
This can happen e.g. with a corrupt LZO-compressed initramfs image.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
Lasse Collin [Fri, 11 Nov 2011 13:34:24 +0000 (14:34 +0100)]
Decompressors: validate match distance in unlzma.c
From: Lasse Collin <lasse.collin@tukaani.org>
Validate the newly decoded distance (rep0) in process_bit1(). This is to
detect corrupt LZMA data quickly. The old code can run for a long time
producing garbage until it hits the end of the input.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
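The validity condition for a decoded match distance can be written as a small predicate. This is a sketch assuming the distance is expressed as "bytes back from the current output position" (already adjusted by one, if the decoder stores it off by one); match_distance_valid is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* A match copies from 'distance' bytes back in the output, so it is valid
 * only if it is non-zero, does not reach before the first byte produced,
 * and stays within the dictionary declared in the header. */
static bool match_distance_valid(uint32_t distance, uint64_t bytes_written,
                                 uint32_t dict_size)
{
    return distance != 0 && distance <= bytes_written && distance <= dict_size;
}
```

Rejecting an invalid distance as soon as it is decoded is what lets corrupt input fail quickly instead of producing garbage until the input runs out.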
Lasse Collin [Fri, 11 Nov 2011 13:33:30 +0000 (14:33 +0100)]
Decompressors: check for write errors in unlzma.c
From: Lasse Collin <lasse.collin@tukaani.org>
The return value of wr->flush() is not checked in write_byte(). This
means that the decompressor won't stop even if the caller doesn't want
more data. This can happen e.g. with corrupt LZMA-compressed initramfs.
Returning the error promptly lets the user see the error message
sooner.
There is a similar missing check for wr.flush() near the end of unlzma().
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
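A minimal sketch of checking the flush result, using a hypothetical writer with a tiny staging buffer and a flush callback that returns the number of bytes it consumed (the real unlzma writer differs in detail). A short count means the consumer wants no more data, so write_byte reports an error instead of carrying on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer: output is staged and handed to flush() when full. */
struct writer {
    uint8_t buf[4];
    size_t pos;
    long (*flush)(const void *data, size_t len);  /* returns bytes consumed */
};

/* Returns 0 on success, -1 if the consumer refused the data. */
static int write_byte(struct writer *wr, uint8_t byte)
{
    wr->buf[wr->pos++] = byte;
    if (wr->pos == sizeof(wr->buf)) {
        if (wr->flush(wr->buf, wr->pos) != (long)wr->pos)
            return -1;              /* stop decoding instead of carrying on */
        wr->pos = 0;
    }
    return 0;
}

/* Example consumers for trying it out. */
static long refuse_all(const void *d, size_t n) { (void)d; (void)n; return 0; }
static long accept_all(const void *d, size_t n) { (void)d; return (long)n; }
```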
Lasse Collin [Fri, 11 Nov 2011 13:32:57 +0000 (14:32 +0100)]
Decompressors: check for read errors in unlzma.c
From: Lasse Collin <lasse.collin@tukaani.org>
The return value of rc->fill() is checked in rc_read() and error() is called
when needed, but then the code continues as if nothing had happened.
rc_read() is a void function and it's at the top of performance-critical
call stacks, so propagating the error code via return values doesn't seem
like the best fix. It seems better to check rc->buffer_size (which holds
the return value of rc->fill()) in the main loop. It does no harm if
the code runs briefly on unknown data after a failed rc->fill().
This fixes an infinite loop in initramfs decompression if the
LZMA-compressed initramfs image is corrupt.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
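The pattern described above, a void hot-path reader plus one cheap check in the main loop, can be sketched with a toy input source. The struct layout, fill signature and decode_bytes are hypothetical simplifications of the real range decoder.

```c
#include <stddef.h>

/* Toy range-coder input. fill() returns the number of bytes produced,
 * or <= 0 on error/EOF; buffer_size keeps its last return value. */
struct rc {
    size_t avail;           /* bytes left in the current buffer */
    long buffer_size;
    long (*fill)(void);
};

/* Hot path stays void: it only records what fill() returned. */
static void rc_read(struct rc *rc)
{
    rc->buffer_size = rc->fill();
    if (rc->buffer_size > 0)
        rc->avail = (size_t)rc->buffer_size;
}

/* Main loop: a per-iteration check of buffer_size replaces error
 * propagation through rc_read(), so a failed fill ends the loop. */
static int decode_bytes(struct rc *rc, size_t want)
{
    while (want > 0) {
        if (rc->avail == 0)
            rc_read(rc);
        if (rc->buffer_size <= 0)
            return -1;
        rc->avail--;                /* "consume" one input byte */
        want--;
    }
    return 0;
}

static long fill_ok(void)   { return 8; }
static long fill_fail(void) { return -1; }
```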
Lasse Collin [Fri, 11 Nov 2011 13:32:03 +0000 (14:32 +0100)]
Decompressors: fix header validation in unlzma.c
From: Lasse Collin <lasse.collin@tukaani.org>
Validation of header.pos calls error() but doesn't make the function
return to indicate an error to the caller. Instead the decoding is
attempted with invalid header.pos. This fixes it.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
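For context, the LZMA properties byte (header.pos here) packs three parameters as (pb * 5 + lp) * 9 + lc, so any value of 9 * 5 * 5 = 225 or more is invalid. A sketch of decoding it with an error return the caller can act on; struct lzma_props and decode_props are hypothetical names.

```c
#include <stdint.h>

struct lzma_props { unsigned lc, lp, pb; };

/* Returns 0 on success and -1 on a bad header. The point of the fix is
 * that the caller sees the -1 and stops, rather than decoding anyway. */
static int decode_props(uint8_t pos, struct lzma_props *p)
{
    if (pos >= 9 * 5 * 5)
        return -1;
    p->lc = pos % 9;
    pos /= 9;
    p->lp = pos % 5;
    p->pb = pos / 5;
    return 0;
}
```

The common value 0x5D (93) decodes to lc = 3, lp = 0, pb = 2.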
Jan Beulich [Fri, 11 Nov 2011 13:25:16 +0000 (14:25 +0100)]
x86: quiesce cpuidle code
Until now these messages were emitted once per CPU, which is pointless
as the code in other places assumes a symmetric configuration. Hide the
debug one behind opt_cpu_info, and issue the info one just once (if the
code gets adjusted to support asymmetric configurations, this would need
to be revisited, but ideally without producing per-CPU messages again).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
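The logging pattern reduces to a static "already reported" flag plus an option gate. This sketch assumes per-CPU setup runs serially (so the flag needs no locking); report_cpuidle and the message texts are hypothetical, and only opt_cpu_info comes from the commit.

```c
#include <stdbool.h>
#include <stdio.h>

static bool opt_cpu_info;   /* stand-in for the command-line option */

/* Called once per CPU; returns how many lines it printed. */
static int report_cpuidle(unsigned int cpu, unsigned int nr_cstates)
{
    static bool reported;
    int printed = 0;

    if (opt_cpu_info) {                 /* per-CPU debug line, opt-in only */
        printf("CPU%u: %u C-states\n", cpu, nr_cstates);
        printed++;
    }
    if (!reported) {                    /* info line, first caller only */
        printf("cpuidle: %u C-states available\n", nr_cstates);
        reported = true;
        printed++;
    }
    return printed;
}
```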
Wei Wang [Fri, 11 Nov 2011 11:05:14 +0000 (12:05 +0100)]
amd iommu: Compress hyper-transport flags into a single byte
These flags are single bits, so there is no need to store them as integers.
Add 3 inline helpers to make single-bit access easier.
Introduce iommu_has_ht_flag and set_iommu_ht_flags.
Signed-off-by: Wei Wang <wei.wang2@amd.com> Committed-by: Jan Beulich <jbeulich@suse.com>
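A sketch of packing the flags into one byte. The helper names iommu_has_ht_flag and set_iommu_ht_flags come from the commit, but their bodies, the stand-in struct, the bit positions, and the third helper clear_iommu_ht_flags are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, one per former integer flag. */
enum { HT_TUNNEL = 0, HT_PASS_PW = 1, HT_RES_PASS_PW = 2, HT_ISOC = 3 };

struct amd_iommu_sketch {
    uint8_t ht_flags;        /* all flags packed into a single byte */
};

static inline bool iommu_has_ht_flag(const struct amd_iommu_sketch *iommu,
                                     unsigned int bit)
{
    return iommu->ht_flags & (1u << bit);
}

static inline void set_iommu_ht_flags(struct amd_iommu_sketch *iommu,
                                      unsigned int bit)
{
    iommu->ht_flags |= (uint8_t)(1u << bit);
}

/* Hypothetical third helper. */
static inline void clear_iommu_ht_flags(struct amd_iommu_sketch *iommu,
                                        unsigned int bit)
{
    iommu->ht_flags &= (uint8_t)~(1u << bit);
}
```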
* Define new structure to represent capability block.
* Remove unnecessary read for unused information.
* Add sanity check into get_iommu_capabilities.
* The IOMMU capability offset is 16 bits, not 8 bits; fix that.
Signed-off-by: Wei Wang <wei.wang2@amd.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Jean Guyader [Fri, 11 Nov 2011 09:14:22 +0000 (10:14 +0100)]
Hypercall continuation cancelation in compat mode for XENMEM_get/set_pod_target
If copy_to_guest fails in the compat code after a continuation has been
set up in the native code, we need to cancel it so that we don't
re-execute the hypercall but instead return from it with the appropriate error.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 11 Nov 2011 08:47:40 +0000 (09:47 +0100)]
x86/IRQ: eliminate irq_vector[]
The vector is already being tracked in struct irq_desc's arch.vector
member, so there's no real need to store it in a second place. The
only caveat is that legacy vectors (used for interrupts handled
through the 8259) must be special-cased so as not to prevent
non-legacy vectors from being assigned.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tim Deegan [Thu, 10 Nov 2011 11:12:35 +0000 (11:12 +0000)]
x86/mm: Refactor p2m get_entry accessor
Move the main query accessor to the p2m outside of an inline and into the
p2m code itself. This will allow for p2m internal locking to be added
to the accessor later.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 10 Nov 2011 11:12:35 +0000 (11:12 +0000)]
x86/mm: Enforce ordering constraints for the page alloc lock in the PoD code
The page alloc lock is sometimes used in the PoD code, with an
explicit expectation of ordering. Use our ordering constructs in the
mm layer to enforce this.
The additional book-keeping variables are kept in the arch_domain
sub-struct, as they are x86-specific and apply to the whole domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 10 Nov 2011 11:12:35 +0000 (11:12 +0000)]
x86/mm: Declare an order-enforcing construct for external locks used in the mm layer
Declare an order-enforcing construct for a lock used in the mm layer
that is not of type mm_lock_t. This is useful whenever the mm layer
takes locks from other subsystems, or locks not implemented as
mm_lock_t.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 10 Nov 2011 11:12:35 +0000 (11:12 +0000)]
x86/mm: Refactor mm-lock ordering constructs
The mm layer has a construct to enforce that locks are taken in a
predefined order, thus averting deadlock. Refactor pieces of this
code for later use, with no functional changes.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
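The ordering idea used by the entries above can be sketched as follows: each lock has a fixed level, the CPU tracks the deepest level it holds, and taking a lock with a lower level than that is an ordering violation (Xen panics; this sketch returns false). It models only the bookkeeping, single-threaded, with no real mutual exclusion; wrapping a lock of another type (an "external" lock) would just add the same take/release calls around it. struct ordered_lock and its functions are hypothetical names.

```c
#include <stdbool.h>

/* Single-threaded stand-in for the per-CPU "deepest level held". */
static int current_lock_level;

struct ordered_lock {
    int level;          /* fixed rank: take locks in non-decreasing level */
    bool held;
    int saved_level;    /* level to restore on release */
};

/* Returns false if taking this lock now would invert the order. */
static bool ordered_lock_take(struct ordered_lock *l)
{
    if (current_lock_level > l->level)
        return false;
    l->held = true;
    l->saved_level = current_lock_level;
    current_lock_level = l->level;
    return true;
}

static void ordered_lock_release(struct ordered_lock *l)
{
    l->held = false;
    current_lock_level = l->saved_level;
}
```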
Jean Guyader [Tue, 8 Nov 2011 19:41:47 +0000 (19:41 +0000)]
xsm: Add support for HVMOP_track_dirty_vram.
Xen tries to enforce the xsm policy when a HVMOP_track_dirty_vram
is received (xen/arch/x86/hvm/hvm.c:3637). It was failing because
in flask_hvmcontext, xsm didn't have any case for this operation.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Juergen Gross [Mon, 7 Nov 2011 14:36:44 +0000 (14:36 +0000)]
Make lock profiling usable again
Using lock profiling (option lock_profile in xen/Rules.mk) resulted in
build errors.
Changes:
- Include public/sysctl.h in spinlock.h when using lock profiling.
- Allocate profile data in a separate structure to avoid struct domain
becoming larger than one page.
Jan Beulich [Mon, 7 Nov 2011 09:29:14 +0000 (10:29 +0100)]
cpufreq: allocate CPU masks dynamically
struct cpufreq_policy, including a cpumask_t member, gets copied in
cpufreq_limit_change(), cpufreq_add_cpu(), set_cpufreq_gov(), and
set_cpufreq_para(). Make the member a cpumask_var_t, thus reducing the
amount of data needing copying (particularly with large NR_CPUS).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
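The saving can be seen by comparing struct sizes. This sketch assumes NR_CPUS = 4096 and models cpumask_var_t as a plain pointer to a heap-allocated mask (in Xen it may be an array for small NR_CPUS); the policy structs are cut-down, hypothetical stand-ins for struct cpufreq_policy.

```c
#include <stdlib.h>

#define NR_CPUS 4096   /* illustrative build-time maximum */

typedef struct {
    unsigned long bits[NR_CPUS / (8 * sizeof(unsigned long))];
} cpumask_t;
typedef cpumask_t *cpumask_var_t;   /* pointer variant, allocated on demand */

/* Every struct copy of the embedded version copies the full 512-byte mask;
 * the dynamic version copies only a pointer. */
struct policy_embedded { unsigned int cpu, min_freq, max_freq; cpumask_t cpus; };
struct policy_dynamic  { unsigned int cpu, min_freq, max_freq; cpumask_var_t cpus; };

static int policy_init(struct policy_dynamic *p)
{
    p->cpus = calloc(1, sizeof(cpumask_t));
    return p->cpus ? 0 : -1;
}
```

One design consequence worth keeping in mind: a struct copy of the dynamic version shares the mask with the original rather than duplicating it.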
Jan Beulich [Mon, 7 Nov 2011 09:26:23 +0000 (10:26 +0100)]
powernow: don't read never-initialized structure member
c/s 20361:51b031b0737e removed the writing of struct
processor_performance's shared_cpu_map member, but the powernow driver
still has code to read it (though presumably that code path can't be
taken on actual hardware supported by the powernow driver). Remove the
use of the field along with the field itself.
Jan Beulich [Fri, 4 Nov 2011 14:55:50 +0000 (15:55 +0100)]
x86/IRQ: fix create_irq() after c/s 24068:6928172f7ded
init_one_irq_desc() must be called with interrupts enabled (as it may
call functions from the xmalloc() group). Rather than misusing
vector_lock to also protect the search for an unused IRQ, make this
lockless by using cmpxchg(), and obtain the lock only around the
actual assignment of the vector.
Also fold find_unassigned_irq() into its only caller.
It is, btw, questionable whether create_irq() calling
__assign_irq_vector() (rather than assign_irq_vector()) is actually
correct - desc->affinity appears to not get initialized properly in
this case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
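The lockless search in the create_irq() fix can be sketched with C11 atomics in place of Xen's cmpxchg(): a compare-and-swap from "unused" to "reserved" succeeds for exactly one caller per slot, so no lock is needed to find a free IRQ. The state array and claim_free_irq are hypothetical; in Xen the state lives in the IRQ descriptors, and the vector assignment that follows still takes vector_lock.

```c
#include <stdatomic.h>

#define NR_IRQS      64
#define IRQ_UNUSED   0
#define IRQ_RESERVED 1

/* Zero-initialized static atomics start out as IRQ_UNUSED. */
static _Atomic int irq_state[NR_IRQS];

/* Claim the first unused IRQ at or above 'first'; -1 if none is free. */
static int claim_free_irq(int first)
{
    for (int irq = first; irq < NR_IRQS; irq++) {
        int expected = IRQ_UNUSED;
        if (atomic_compare_exchange_strong(&irq_state[irq], &expected,
                                           IRQ_RESERVED))
            return irq;
    }
    return -1;
}
```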