Dongxiao Xu [Fri, 11 Jan 2013 16:30:56 +0000 (17:30 +0100)]
nested vmx: fix CR0/CR4 emulation
While emulate CR0 and CR4 for nested virtualization, set the CR0/CR4
guest host mask to 0xffffffff in shadow VMCS, then calculate the
corresponding read shadow separately for CR0 and CR4. While getting
the VM exit for CR0/CR4 access, check if L1 VMM owns the bit. If so,
inject the VM exit to L1 VMM. Otherwise, L0 will handle it and sync
the value to L1 virtual VMCS.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 11 Jan 2013 12:22:30 +0000 (12:22 +0000)]
Switch from select() to poll() in xenconsoled's IO loop
In Linux select() typically supports up to 1024 file descriptors. This can be
a problem when user tries to boot up many guests. Switching to poll() has
minimum impact on existing code and has better scalibility.
pollfd array is dynamically allocated / reallocated. If the array fails to
expand, we just ignore the incoming fd.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Some renaming to correct the PCI and SBDF terminology.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 11 Jan 2013 12:22:29 +0000 (12:22 +0000)]
tools/ocaml: libxc bindings: Fix SBDF encoding
Changeset 23861:ec7c81fbe0de alters the SBDF encoding expected by the
DOMCTL_{de,}assign_device hypercalls.
While it updates libxl, libxc and the python bindings, the ocaml
bindings got missed. As a result, any attempt to use PCI Passthrough
with Xen-4.2 and later will fail.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
fix wrong path while calling pygrub and libxl-save-helper
in current xen x86_64, the default libexec directory is /usr/lib/xen/bin,
while the private binder is /usr/lib64/xen/bin. but some commands(pygrub,
libxl-save-helper) located in private binder directory is called from
libexec directory which lead to the following error:
1, for pygrub bootloader:
libxl: debug: libxl_bootloader.c:429:bootloader_disk_attached_cb: /usr/lib/xen/bin/pygrub doesn't exist, falling back to config path
2, for libxl-save-helper:
libxl: cannot execute /usr/lib/xen/bin/libxl-save-helper: No such file or directory
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 3 save/restore helper stdout pipe
libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: domain 3 save/restore helper [10222] exited with error status 255
there are two ways to fix above error. the first one is make such command
store in the /usr/lib/xen/bin and /usr/lib64/xen/bin(symbol link to
previous), e.g. qemu-dm. The second way is using private binder dir
instead of libexec dir. e.g. xenconsole.
For these cases, the latter one is suitable.
Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 11 Jan 2013 12:22:27 +0000 (12:22 +0000)]
libxc: x86: ensure that the initial mapping fits into the guest's memory
In particular we need to check that adding 512KB of slack and
rounding up to a 4MB boundary do not overflow the guest's memory
allocation. Otherwise we run off the end of the p2m when building the
guest's initial page tables and populate them with garbage.
Wei noticed this when build tiny (2MB) mini-os domains.
Reported-by: Wei Liu <Wei.Liu2@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Jim Fehlig [Fri, 11 Jan 2013 12:22:26 +0000 (12:22 +0000)]
libxl: Set vfb and vkb devid if not done so by the caller
Other devices set a sensible devid if the caller has not done so.
Do the same for vfb and vkb. While at it, factor out the common code
used to determine a sensible devid, so it can be used by other
libxl__device_*_add functions.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Fri, 11 Jan 2013 10:49:49 +0000 (10:49 +0000)]
xsm/flask: document the access vectors
This adds comments to the FLASK access_vectors file describing what
operations each access vector controls and the meanings of the source
and target fields in the permission check. This also makes the
indentation of the file consistent; no functionality changes are made.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:46:43 +0000 (10:46 +0000)]
tmem: add XSM hooks
This adds a pair of XSM hooks for tmem operations: xsm_tmem_op which
controls any use of tmem, and xsm_tmem_control which allows use of the
TMEM_CONTROL operations. By default, all domains can use tmem while
only IS_PRIV domains can use control operations.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:44:01 +0000 (10:44 +0000)]
xen/xsm: Add xsm_default parameter to XSM hooks
Include the default XSM hook action as the first argument of the hook
to facilitate quick understanding of how the call site is expected to
be used (dom0-only, arbitrary guest, or device model). This argument
does not solely define how a given hook is interpreted, since any
changes to the hook's default action need to be made identically to
all callers of a hook (if there are multiple callers; most hooks only
have one), and may also require changing the arguments of the hook.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:43:02 +0000 (10:43 +0000)]
xen: platform_hypercall XSM hook removal
A number of the platform_hypercall XSM hooks have no parameters or
only pass the operation ID, making them redundant with the
xsm_platform_op hook. Remove these redundant hooks.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:42:30 +0000 (10:42 +0000)]
xen: sysctl XSM hook removal
A number of the sysctl XSM hooks have no parameters or only pass the
operation ID, making them redundant with the xsm_sysctl hook. Remove
these redundant hooks.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:41:51 +0000 (10:41 +0000)]
xen: domctl XSM hook removal
A number of the domctl XSM hooks do nothing except pass the domain and
operation ID, making them redundant with the xsm_domctl hook. Remove
these redundant hooks.
The remaining domctls all use individual hooks because they pass extra
details of the call to the XSM module in order to allow a more
fine-grained access decision to be made - for example, considering the
exact device or memory range being set up for guest access.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:39:58 +0000 (10:39 +0000)]
arch/x86: use XSM hooks for get_pg_owner access checks
There are three callers of get_pg_owner:
* do_mmuext_op, which does not have XSM hooks on all subfunctions
* do_mmu_update, which has hooks that are inefficient
* do_update_va_mapping_otherdomain, which has a simple XSM hook
In order to preserve return values for the do_mmuext_op hypercall, an
additional XSM hook is required to check the operation even for those
subfunctions that do not use the pg_owner field. This also covers the
MMUEXT_UNPIN_TABLE operation which did previously have an XSM hook.
The XSM hooks in do_mmu_update were capable of replacing the checks in
get_pg_owner; however, the hooks are buried in the inner loop of the
function - not very good for performance when XSM is enabled and these
turn in to indirect function calls. This patch removes the PTE from
the hooks and replaces it with a bitfield describing what accesses are
being requested. The XSM hook can then be called only when additional
bits are set instead of once per iteration of the loop.
This patch results in a change in the FLASK permissions used for
mapping an MMIO page: the target for the permisison check on the
memory mapping is no longer resolved to the device-specific type, and
is instead either the domain's own type or domio_t (depending on if
the domain uses DOMID_SELF or DOMID_IO in the map
command). Device-specific access is still controlled via the "resource
use" permisison checked at domain creation (or device hotplug).
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:39:20 +0000 (10:39 +0000)]
arch/x86: Add missing mem_sharing XSM hooks
This patch adds splits up the mem_sharing and mem_event XSM hooks to
better cover what the code is doing. It also changes the utility
function get_mem_event_op_target to rcu_lock_live_remote_domain_by_id
because there is no mm-specific logic in there.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:38:39 +0000 (10:38 +0000)]
xsm/flask: add distinct SIDs for self/target access
Because the FLASK XSM module no longer checks IS_PRIV for remote
domain accesses covered by XSM permissions, domains now have the
ability to perform memory management and other functions on all
domains that have the same type. While it is possible to prevent this
by only creating one domain per type, this solution significantly
limits the flexibility of the type system.
This patch introduces a domain type transition to represent a domain
that is operating on itself. In the example policy, this is
demonstrated by creating a type with _self appended when declaring a
domain type which will be used for reflexive operations. AVCs for a
domain of type domU_t will look like the following:
This change also allows policy to distinguish between event channels a
domain creates to itself and event channels created between domains of
the same type.
The IS_PRIV_FOR check used for device model domains is also no longer
checked by FLASK; a similar transition is performed when the target is
set and used when the device model accesses its target domain.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:37:10 +0000 (10:37 +0000)]
xsm/flask: Add checks on the domain performing the set_target operation
The existing domain__set_target check only verifies that the source
and target domains can be associated. We also need to check that the
privileged domain making this association is allowed to do so.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:09:45 +0000 (10:09 +0000)]
xen: convert do_domctl to use XSM
The xsm_domctl hook now covers every domctl, in addition to the more
fine-grained XSM hooks in most sub-functions. This also removes the
need to special-case XEN_DOMCTL_getdomaininfo.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:07:19 +0000 (10:07 +0000)]
xen: avoid calling rcu_lock_*target_domain when an XSM hook exists
The rcu_lock_{,remote_}target_domain_by_id functions are wrappers
around an IS_PRIV_FOR check for the current domain. This is now
redundant with XSM hooks, so replace these calls with
rcu_lock_domain_by_any_id or rcu_lock_remote_domain_by_id to remove
the duplicate permission checks.
When XSM_ENABLE is not defined or when the dummy XSM module is used,
this patch should not change any functionality. Because the locations
of privilege checks have sometimes moved below argument validation,
error returns of some functions may change from EPERM to EINVAL when
called with invalid arguments and from a domain without permission to
perform the operation.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Jan 2013 10:06:43 +0000 (10:06 +0000)]
xen: use XSM instead of IS_PRIV where duplicated
The Xen hypervisor has two basic access control function calls:
IS_PRIV and the xsm_* functions. Most privileged operations currently
require that both checks succeed, and many times the checks are at
different locations in the code. This patch eliminates the explicit
and implicit IS_PRIV checks that are duplicated in XSM hooks.
When XSM_ENABLE is not defined or when the dummy XSM module is used,
this patch should not change any functionality. Because the locations
of privilege checks have sometimes moved below argument validation,
error returns of some functions may change from EPERM to EINVAL or
ESRCH if called with invalid arguments and from a domain without
permission to perform the operation.
Some checks are removed due to non-obvious duplicates in their
callers:
* acpi_enter_sleep is checked in XENPF_enter_acpi_sleep
* map_domain_pirq has IS_PRIV_FOR checked in its callers:
* physdev_map_pirq checks when acquiring the RCU lock
* ioapic_guest_write is checked in PHYSDEVOP_apic_write
* PHYSDEVOP_{manage_pci_add,manage_pci_add_ext,pci_device_add} are
checked by xsm_resource_plug_pci in pci_add_device
* PHYSDEVOP_manage_pci_remove is checked by xsm_resource_unplug_pci
in pci_remove_device
* PHYSDEVOP_{restore_msi,restore_msi_ext} are checked by
xsm_resource_setup_pci in pci_restore_msi_state
* do_console_io has changed to IS_PRIV from an explicit domid==0
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Thu, 10 Jan 2013 17:32:10 +0000 (17:32 +0000)]
arch/x86: add distinct XSM hooks for map/unmap
The xsm_iomem_permission and xsm_ioport_permission hooks are intended
to be called by the domain builder, while the calls in
arch/x86/domctl.c which control mapping are also performed by the
device model. Because these operations require distinct access
control policies, they cannot use the same XSM hooks.
This also adds a missing XSM hook in the unbind IRQ domctl.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Thu, 10 Jan 2013 17:30:47 +0000 (17:30 +0000)]
flask: move policy headers into hypervisor
Rather than keeping around headers that are autogenerated in order to
avoid adding build dependencies from xen/ to files in tools/, move the
relevant parts of the FLASK policy into the hypervisor tree and
generate the headers as part of the hypervisor's build.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Thu, 10 Jan 2013 17:27:58 +0000 (17:27 +0000)]
xsm: Use the dummy XSM module if XSM is disabled
This patch moves the implementation of the dummy XSM module to a
header file that provides inline functions when XSM_ENABLE is not
defined. This reduces duplication between the dummy module and callers
when the implementation of the dummy return is not just "return 0",
and also provides better compile-time checking for completeness of the
XSM implementations in the dummy module.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Ross Philipson [Thu, 10 Jan 2013 17:18:43 +0000 (17:18 +0000)]
HVM firmware passthrough ACPI processing
ACPI table passthrough support allowing additional static tables and
SSDTs (AML code) to be loaded. These additional tables are added at
the end of the secondary table list in the RSDT/XSDT tables.
Signed-off-by: Ross Philipson <ross.philipson@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Ross Philipson [Thu, 10 Jan 2013 17:18:10 +0000 (17:18 +0000)]
HVM firmware passthrough SMBIOS processing
Passthrough support for the SMBIOS structures including three new DMTF
defined types and support for OEM defined tables. Passed in SMBIOS
types override the default internal values. Default values can be
enabled for the new type 22 portable battery using a xenstore
flag. All other new DMTF defined and OEM structures will only be added
to the SMBIOS table if passthrough values are present.
Signed-off-by: Ross Philipson <ross.philipson@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Ross Philipson [Thu, 10 Jan 2013 17:17:21 +0000 (17:17 +0000)]
HVM firmware passthrough control tools support
Xen control tools support for loading the firmware passthrough blocks
during domain construction. SMBIOS and ACPI blocks are passed in using
the new xc_hvm_build_args structure. Each block is read and loaded
into the new domain address space behind the HVMLOADER image. The base
address for the two blocks is returned as an out parameter to the
caller via the args structure.
Signed-off-by: Ross Philipson <ross.philipson@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Ross Philipson [Thu, 10 Jan 2013 17:16:28 +0000 (17:16 +0000)]
HVM xenstore strings and firmware passthrough header
Add public HVM definitions header for xenstore strings used in
HVMLOADER. In addition this header describes the use of the firmware
passthrough values set using xenstore.
Signed-off-by: Ross Philipson <ross.philipson@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Samuel Thibault [Wed, 9 Jan 2013 08:43:53 +0000 (08:43 +0000)]
mini-os: Notify shutdown through weak function call instead of wake queue
To allow for more flexibility, this notifies domain shutdown through a
function rather than a wake queue, to let the application use a wake
queue only if it wishes.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Committed-by: Keir Fraser <keir@xen.org>
Dongxiao Xu [Tue, 8 Jan 2013 09:43:35 +0000 (10:43 +0100)]
nested vmx: synchronize page fault error code match and mask
Page fault is specially handled not only with exception bitmaps,
but also with consideration of page fault error code mask/match
values. Therefore in nested virtualization case, the two values
need to be synchronized from virtual VMCS to shadow VMCS.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Daniel De Graaf [Tue, 8 Jan 2013 09:37:38 +0000 (10:37 +0100)]
x86/hvm: Bind xen-created event channels to building domain
Instead of using a hardcoded domain 0 as the endpoint for the event
channels created in hvm_vcpu_initialise, use the domain ID of the
building domain so that a domain builder in a domain other than dom0 has
the expected access to the event channels.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 7 Jan 2013 13:20:26 +0000 (14:20 +0100)]
x86: fix assertion in get_page_type()
c/s 22998:e9fab50d7b61 (and immediately following ones) made it
possible that __get_page_type() returns other than -EINVAL, in
particular -EBUSY. Consequently, the assertion in get_page_type()
should check for only the return values we absolutely don't expect to
see there.
Jan Beulich [Mon, 7 Jan 2013 11:58:09 +0000 (12:58 +0100)]
IOMMU: add option to specify devices behaving like ones using phantom functions
At least certain Marvell SATA controllers are known to issue bus master
requests with a non-zero function as origin, despite themselves being
single function devices.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
Jan Beulich [Mon, 7 Jan 2013 11:56:52 +0000 (12:56 +0100)]
VT-d: relax source qualifier for MSI of phantom functions
With ordinary requests allowed to come from phantom functions, the
remapping tables ought to be set up to allow for MSI triggers to come
from other than the "real" device too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
Jan Beulich [Mon, 7 Jan 2013 11:55:42 +0000 (12:55 +0100)]
IOMMU: add phantom function support
Apart from generating device context entries for the base function,
all phantom functions also need context entries to be generated for
them.
In order to distinguish different use cases, a variant of
pci_get_pdev() is being introduced that, even when passed a phantom
function number, would return the underlying actual device.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
Jan Beulich [Mon, 7 Jan 2013 11:54:39 +0000 (12:54 +0100)]
IOMMU/PCI: consolidate pdev_type() and cache its result for a given device
Add an "unknown" device types as well as one for PCI-to-PCIe bridges
(the latter of which other IOMMU code with or without this patch
doesn't appear to handle properly).
Make sure we don't mistake a device for which we can't access its
config space as a legacy PCI device (after all we in fact don't know
how to deal with such a device, and hence shouldn't try to).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: "Zhang, Xiantao" <xiantao.zhang@intel.com>
Andrew Cooper [Fri, 4 Jan 2013 09:06:47 +0000 (10:06 +0100)]
passthrough/domctl: use correct struct in union
This appears to be a copy paste error from c/s 23861:ec7c81fbe0de.
It is safe, functionally speaking, as both the xen_domctl_assign_device
and xen_domctl_get_device_group structure start with a 'uint32_t
machine_sbdf'. We should however use the correct union structure.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Fri, 21 Dec 2012 17:05:38 +0000 (17:05 +0000)]
tools/tests: Restrict some tests to x86 only
MCE injection and x86_emulator are clearly x86 specific.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 19 Dec 2012 16:04:49 +0000 (16:04 +0000)]
xen: remove nr_irqs_gsi from generic code
The concept is X86 specific.
AFAICT the generic concept here is the number of static physical IRQs
which the current hardware has, so call this nr_static_irqs.
Also using "defined NR_IRQS" as a standin for x86 might have made
sense at one point but its just cleaner to push the necessary
definitions into asm/irq.h.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:33:24 +0000 (14:33 +0000)]
libxl: move definition of libxl_domain_config into the IDL
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:16:30 +0000 (14:16 +0000)]
xen: arm: introduce arm32 as a subarch of arm.
- move 32-bit specific files into subarch specific arm32 subdirectory.
- move gic.h to xen/include/asm-arm (it is needed from both subarch
and generic code).
- make the appropriate build and config file changes to support
XEN_TARGET_ARCH=arm32.
This prepares us for an eventual 64-bit subarch.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:16:29 +0000 (14:16 +0000)]
xen: arm: reorder registers in struct cpu_user_regs.
Primarily this is so that they are ordered in the same way as the
mapping from arm64 x0..x31 registers to the arm32 registers, which is
just less confusing for everyone going forward.
It also makes the implementation of select_user_regs in the next patch
slightly simpler.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:16:23 +0000 (14:16 +0000)]
xen: arm: stub page_is_ram_type.
Callers are VT-d (so x86 specific) and various bits of page offlining
support, which although it looks generic (and is in xen/common) does
things like diving into page_info->count_info which is not generic.
In any case on this is only reachable via XEN_SYSCTL_page_offline_op,
which clearly shouldn't be called on ARM just yet.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:16:22 +0000 (14:16 +0000)]
xen: arm: stub out wallclock time.
We don't currently have much concept of wallclock time on ARM (for
either the hypervisor, dom0 or guests). For now just stub everything
out. Specifically domain_set_time_offset, update_vcpu_system_time and
wallclock_time.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
` Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 19 Dec 2012 14:16:18 +0000 (14:16 +0000)]
xen: arm: Call init_xen_time earlier
If we panic before calling init_xen_time then the "Rebooting in 5
seconds" delay ends up calling udelay which uses cntfrq before it has
been initialised resulting in a divide by zero.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Andre Przywara [Wed, 19 Dec 2012 10:42:09 +0000 (11:42 +0100)]
x86, amd: Disable way access filter on Piledriver CPUs
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Add and use MSR_AMD64_IC_CFG. Update the value whenever it is found to
not have all bits set, rather than just when it's zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
Daniel De Graaf [Tue, 18 Dec 2012 18:16:52 +0000 (18:16 +0000)]
xen/arch/*: add struct domain parameter to arch_do_domctl
Since the arch-independent do_domctl function now RCU locks the domain
specified by op->domain, pass the struct domain to the arch-specific
domctl function and remove the duplicate per-subfunction locking.
This also removes two get_domain/put_domain call pairs (in
XEN_DOMCTL_assign_device and XEN_DOMCTL_deassign_device), replacing
them with RCU locking.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Tue, 18 Dec 2012 18:16:13 +0000 (18:16 +0000)]
xen: lock target domain in do_domctl common code
Because almost all domctls need to lock the target domain, do this by
default instead of repeating it in each domctl. This is not currently
extended to the arch-specific domctls, but RCU locks are safe to take
recursively so this only causes duplicate but correct locking.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Committed-by: Keir Fraser <keir@xen.org>
Dongxiao Xu [Tue, 18 Dec 2012 18:14:45 +0000 (18:14 +0000)]
nested vmx: nested TPR shadow/threshold emulation
TPR shadow/threshold feature is important to speedup the boot time
for Windows guest. Besides, it is a must feature for certain VMM.
We map virtual APIC page address and TPR threshold from L1 VMCS,
and synch it into shadow VMCS in virtual vmentry.
If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest, we
inject it into L1 VMM for handling.
Besides, this commit fixes an issue for apic access page, if L1
VMM didn't enable this feature, we need to fill zero into the
shadow VMCS.
So that it becomes possible to create scheduler specific trace
records, within each scheduler, without worrying about the
overlapping, and also without giving up being able to recognise them
univocally. The latter is deemed as useful, since we can have more
than one scheduler running at the same time, thanks to cpupools.
The event ID is 12 bits, and this change uses the upper 3 of them for
the 'scheduler ID'. This means we're limited to 8 schedulers and to
512 scheduler specific tracing events. Both seem reasonable
limitations as of now.
This also converts the existing credit2 tracing (the only scheduler
generating tracing events up to now) to the new system.
Dario Faggioli [Tue, 18 Dec 2012 18:10:57 +0000 (18:10 +0000)]
xen: sched_credit: improve tickling of idle CPUs
Right now, when a VCPU wakes-up, we check whether it should preempt
what is running on the PCPU, and whether or not the waking VCPU can
be migrated (by tickling some idlers). However, this can result in
suboptimal or even wrong behaviour, as explained here:
This change, instead, when deciding which PCPU(s) to tickle, upon
VCPU wake-up, considers both what it is likely to happen on the PCPU
where the wakeup occurs,and whether or not there are idlers where
the woken-up VCPU can run. In fact, if there are, we can avoid
interrupting the running VCPU. Only in case there aren't any of
these PCPUs, preemption and migration are the way to go.
This has been tested (on top of the previous change) by running
the following benchmarks inside 2, 6 and 10 VMs, concurrently, on
a shared host, each with 2 VCPUs and 960 MB of memory (host had 16
ways and 12 GB RAM).
Numbers show how the change has either no or very limited impact
(specjbb2005 case) or, when it does have some impact, that is a
real improvement in performances (sysbench-memory case).
Dario Faggioli [Tue, 18 Dec 2012 18:10:18 +0000 (18:10 +0000)]
xen: sched_credit: improve picking up the idle CPU for a VCPU
In _csched_cpu_pick() we try to select the best possible CPU for
running a VCPU, considering the characteristics of the underlying
hardware (i.e., how many threads, core, sockets, and how busy they
are). What we want is "the idle execution vehicle with the most
idling neighbours in its grouping".
In order to achieve it, we select a CPU from the VCPU's affinity,
giving preference to its current processor if possible, as the basis
for the comparison with all the other CPUs. Problem is, to discount
the VCPU itself when computing this "idleness" (in an attempt to be
fair wrt its current processor), we arbitrarily and unconditionally
consider that selected CPU as idle, even when it is not the case,
for instance:
1. If the CPU is not the one where the VCPU is running (perhaps due
to the affinity being changed);
2. The CPU is where the VCPU is running, but it has other VCPUs in
its runq, so it won't go idle even if the VCPU in question goes.
22005(...) line (the first line) means _csched_cpu_pick() was called
on VCPU 1 of domain 10, while it is running on CPU 0, and it choose
CPU 8, which is busy ('|'), even if there are plenty of idle
CPUs. That is because, as a consequence of changing the VCPU affinity,
CPU 8 was chosen as the basis for the comparison, and therefore
considered idle (its bit gets unconditionally set in the bitmask
representing the idle CPUs). 28004(...) line means the VCPU is woken
up and queued on CPU 8's runq, where it waits for a context switch or
a migration, in order to be able to execute.
This change fixes things by only considering the "guessed" CPU idle if
the VCPU in question is both running there and is its only runnable
VCPU.