xenbits.xensource.com Git - xen.git/log
xen.git
7 years ago  x86: check for allocation errors in modify_xen_mappings()
Jan Beulich [Fri, 25 Aug 2017 12:03:47 +0000 (14:03 +0200)]
x86: check for allocation errors in modify_xen_mappings()

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  xen: remove CONFIG_PAGING_ASSISTANCE
Wei Liu [Wed, 23 Aug 2017 15:58:23 +0000 (16:58 +0100)]
xen: remove CONFIG_PAGING_ASSISTANCE

Arm should always set it, while on x86 Xen can't even build with it set
to 0, which means nobody has used it disabled for years.

Remove it and simplify xen/paging.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years ago  hvmloader: add fields for SMBIOS 2.4 compliance
Roger Pau Monné [Wed, 23 Aug 2017 15:47:38 +0000 (17:47 +0200)]
hvmloader: add fields for SMBIOS 2.4 compliance

The version of SMBIOS set in the entry point is 2.4; however, several
structures are missing fields required by 2.4. Fix this by adding the
missing fields, based on the documents found at the DMTF site [0].

Most fields are set to 0 (undefined/not specified), except for the
cache-related handles, which need to be initialized to 0xffff in order
to signal that the information is not provided.

[0] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.1.1.pdf

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Chris Gilbert <chris.gilbert@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  xsm: policy hooks to require an IOMMU and interrupt remapping
Christopher Clark [Wed, 23 Aug 2017 15:47:04 +0000 (17:47 +0200)]
xsm: policy hooks to require an IOMMU and interrupt remapping

Isolation of devices passed through to domains usually requires an
active IOMMU. The existing method of requiring an IOMMU is via a Xen
boot parameter ("iommu=force") which will abort boot if an IOMMU is not
available.

More graceful degradation of behaviour when an IOMMU is absent can be
achieved by enabling XSM to enforce the IOMMU requirement instead.

This patch enables an enforceable XSM policy to specify that an IOMMU is
required for particular domains to access devices and how capable that
IOMMU must be. This allows a Xen system to boot whilst still
ensuring that an IOMMU is active before permitting device use.

Using an XSM policy ensures that the isolation properties remain enforced
even as the large, complex toolstack software changes.

For some hardware platforms interrupt remapping is a strict requirement
for secure isolation. Not all IOMMUs provide interrupt remapping.
The XSM policy can now optionally require interrupt remapping.

The device use hooks now check whether an IOMMU is:
 * Active and securely isolating:
    -- the current criterion for this is that interrupt remapping is
       available
 * Active, but interrupt remapping is not available
 * Not active

This patch also updates the reference XSM policy to use the new
primitives, with policy entries that do not require an active IOMMU.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ross Philipson <ross.philipson@gmail.com>
7 years ago  arm/mm: release grant lock on xenmem_add_to_physmap_one() error paths
Jan Beulich [Wed, 23 Aug 2017 15:45:45 +0000 (17:45 +0200)]
arm/mm: release grant lock on xenmem_add_to_physmap_one() error paths

Commit 55021ff9ab ("xen/arm: add_to_physmap_one: Avoid to map mfn 0 if
an error occurs") introduced error paths that do not release the grant
table lock. Replace them with a suitable check performed after the lock
is dropped.

This is XSA-235.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years ago  x86: switch to plain bool in passthrough code
Wei Liu [Mon, 21 Aug 2017 14:09:13 +0000 (15:09 +0100)]
x86: switch to plain bool in passthrough code

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen: merge common hvm/irq.h into x86 hvm/irq.h
Wei Liu [Mon, 21 Aug 2017 14:09:12 +0000 (15:09 +0100)]
xen: merge common hvm/irq.h into x86 hvm/irq.h

That header file is only used by x86. Merge it into the x86 header.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen: move hvm save code under common to x86
Wei Liu [Mon, 21 Aug 2017 14:09:11 +0000 (15:09 +0100)]
xen: move hvm save code under common to x86

The code is only used by x86 at this point. Merge common/hvm/save.c
into x86 hvm/save.c. Move the headers and fix up inclusions. Remove
the now empty common/hvm directory.

Also fix some issues while moving:
1. remove trailing spaces;
2. fix a multi-line comment;
3. make "i" in hvm_save unsigned int;
4. add some blank lines to separate sections of code;
5. change bool_t to bool.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  hvmloader, libxl: use the correct ACPI settings depending on device model
Igor Druzhinin [Thu, 17 Aug 2017 14:57:13 +0000 (15:57 +0100)]
hvmloader, libxl: use the correct ACPI settings depending on device model

We need to choose the ACPI tables and the ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS-type-specific
code in hvmloader, e.g. always load the QEMU-traditional-specific
tables if it's ROMBIOS and always load the QEMU-Xen-specific
tables if it's SeaBIOS.

This change preserves that behavior (for compatibility) but adds
an additional way (a xenstore key) to specify the correct
device model if we happen to run a non-default one. The toolstack
side makes use of it.

The enforcement of BIOS type depending on QEMU version will
be lifted later when the rest of ROMBIOS compatibility fixes
are in place.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  tools/libxc/xc_dom_arm: add missing variable initialization
Bernd Kuhls [Sat, 19 Aug 2017 14:21:42 +0000 (16:21 +0200)]
tools/libxc/xc_dom_arm: add missing variable initialization

The variable domctl.u.address_size.size may remain uninitialized if
guest_type is not one of xen-3.0-aarch64 or xen-3.0-armv7l, and the
code checks precisely whether this variable is still 0 to decide if
the guest type is supported or not.

This fixes the following build failure with gcc 7.x:

xc_dom_arm.c:229:31: error: 'domctl.u.address_size.size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     if ( domctl.u.address_size.size == 0 )

Patch originally taken from
https://www.mail-archive.com/xen-devel@lists.xen.org/msg109313.html.

Signed-off-by: Bernd Kuhls <bernd.kuhls@t-online.de>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  mm: Make sure pages are scrubbed
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
mm: Make sure pages are scrubbed

Add a debug Kconfig option that will make page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  mm: Print number of unscrubbed pages in 'H' debug handler
Boris Ostrovsky [Wed, 16 Aug 2017 18:30:00 +0000 (20:30 +0200)]
mm: Print number of unscrubbed pages in 'H' debug handler

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  mm: Keep heap accessible to others while scrubbing
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
mm: Keep heap accessible to others while scrubbing

Instead of scrubbing pages while holding the heap lock we can mark
the buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort the scrub by setting
the head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after
processing each page and stops its work as soon as it sees it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years ago  spinlock: Introduce spin_lock_cb()
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
spinlock: Introduce spin_lock_cb()

While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code, where
the scrubber, while waiting for the heap lock in order to merge back
clean pages, may be asked by the page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock(), but since it doesn't take a lock ticket
it may take a long time until the lock is acquired. Instead we add
spin_lock_cb(), which allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.

Since we may be sleeping in the lock until it is released, we need a
mechanism that makes sure the callback has a chance to run. We add
spin_lock_kick(), which will wake up the waiter.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years ago  mm: Scrub memory from idle loop
Boris Ostrovsky [Wed, 16 Aug 2017 18:30:00 +0000 (20:30 +0200)]
mm: Scrub memory from idle loop

Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another
CPU uses a mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make the
mapcache vcpu override a per-cpu variable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years ago  xen/arm: vtimer: Re-order the includes alphabetically
Julien Grall [Fri, 11 Aug 2017 18:02:51 +0000 (19:02 +0100)]
xen/arm: vtimer: Re-order the includes alphabetically

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: vgic-v3: Re-order the includes alphabetically
Julien Grall [Fri, 11 Aug 2017 18:02:50 +0000 (19:02 +0100)]
xen/arm: vgic-v3: Re-order the includes alphabetically

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: domain: Re-order the includes alphabetically
Julien Grall [Fri, 11 Aug 2017 18:02:48 +0000 (19:02 +0100)]
xen/arm: domain: Re-order the includes alphabetically

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: Tighten memory attribute requirement for memory shared
Julien Grall [Tue, 8 Aug 2017 17:17:26 +0000 (18:17 +0100)]
xen/arm: Tighten memory attribute requirement for memory shared

Xen allows shared mappings to be Normal inner-cacheable with any inner
cache allocation strategy and no restriction on the outer-cacheability.

However, Xen always maps those regions Normal Inner Write-Back
Outer Write-Back Inner-Shareable. Per B2.8 "Mismatched memory
attributes" in ARM DDI 0487B.a, if the guest is not using the exact same
memory attributes (excluding any cache allocation hints) for the shared
region, then the region will be accessed with mismatched attributes.

This can result in a loss of coherency and may impact
performance.

Given that the ARM ARM strongly recommends avoiding mismatched
attributes, we should require shared regions to be Normal Inner
Write-Back Outer Write-Back Inner-Shareable.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  xen/arm: p2m: Remove p2m_operation
Julien Grall [Fri, 11 Aug 2017 18:14:21 +0000 (19:14 +0100)]
xen/arm: p2m: Remove p2m_operation

This is a leftover from before the P2M code was reworked. So drop it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  hvmloader: support system enclosure asset tag (SMBIOS type 3)
Vivek Kumar Chaubey [Mon, 21 Aug 2017 13:49:36 +0000 (15:49 +0200)]
hvmloader: support system enclosure asset tag (SMBIOS type 3)

Allow setting the system enclosure asset tag for HVM guests. The guest
OS can check it and perform the desired operation, e.g. during support
or installation.
Also add documentation of the '~/bios-string/*' xenstore keys to
docs/misc/xenstore-paths.markdown.

Signed-off-by: Vivek Kumar Chaubey <vivekkumar.chaubey@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  mm: Scrub pages in alloc_heap_pages() if needed
Boris Ostrovsky [Wed, 16 Aug 2017 18:30:00 +0000 (20:30 +0200)]
mm: Scrub pages in alloc_heap_pages() if needed

When allocating pages in alloc_heap_pages(), first look for clean pages.
If none are found, then retry, taking pages marked as unscrubbed and
scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet.
However, this will become possible when we stop scrubbing from
free_heap_pages() and instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap
allocations), introduce a MEMF_no_scrub flag that callers can set if
they are willing to consume unscrubbed pages.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  mm: Extract allocation loop from alloc_heap_pages()
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
mm: Extract allocation loop from alloc_heap_pages()

This will make code a bit more readable, especially with changes that
will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  mm: Place unscrubbed pages at the end of pagelist
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
mm: Place unscrubbed pages at the end of pagelist

.. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have two values, 0 (the whole buddy
needs scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be
scrubbed). Subsequent patches will allow scrubbing to be interrupted,
resulting in first_dirty taking any value.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  gnttab: fix "don't use possibly unbounded tail calls"
Jan Beulich [Mon, 21 Aug 2017 13:43:36 +0000 (15:43 +0200)]
gnttab: fix "don't use possibly unbounded tail calls"

The compat mode code also needs adjustment to deal with the changed
return value from gnttab_copy().

This is part of XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  xen/x86/shadow: adjust barriers around gtable_dirty_version.
Tim Deegan [Fri, 18 Aug 2017 14:23:44 +0000 (15:23 +0100)]
xen/x86/shadow: adjust barriers around gtable_dirty_version.

Use the smp_ variants, as we're only synchronizing against other CPUs.

Add a write barrier before incrementing the version.

x86's memory ordering rules and the presence of various out-of-unit
function calls mean that this code worked OK before, and the barriers
are mostly decorative.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  arm: traps: handle SMC32 in check_conditional_instr()
Volodymyr Babchuk [Wed, 16 Aug 2017 18:44:57 +0000 (21:44 +0300)]
arm: traps: handle SMC32 in check_conditional_instr()

On the ARMv8 architecture we need to ensure that the condition check
passed for a trapped SMC instruction that originates from AArch32 state
(ARM DDI 0487B.a page D7-2271).
Thus, we should not skip it while checking the HSR.EC value.

For this type of exception a special encoding of HSR.ISS is used. There
is an additional flag (CCKNOWNPASS) to be checked before performing the
standard handling of the CCVALID and COND fields.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm: traps: handle unknown exceptions in check_conditional_instr()
Volodymyr Babchuk [Wed, 16 Aug 2017 18:44:56 +0000 (21:44 +0300)]
arm: traps: handle unknown exceptions in check_conditional_instr()

According to the ARM Architecture Reference Manual (ARM DDI 0487B.a page
D7-2259, ARM DDI 0406C.c page B3-1426), an exception with an unknown
reason (HSR.EC == 0) has no valid bits in HSR (apart from HSR.EC), so we
can't check whether it was caused by a conditional instruction. We need
to assume that it was unconditional.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm: processor: add new struct hsr_smc32 into hsr union
Volodymyr Babchuk [Wed, 16 Aug 2017 18:44:55 +0000 (21:44 +0300)]
arm: processor: add new struct hsr_smc32 into hsr union

On ARMv8, one of the conditional exceptions (an SMC that originates
from AArch32 state) has an extra field in its HSR.ISS encoding:

CCKNOWNPASS, bit [19]
Indicates whether the instruction might have failed its condition
code check.
   0 - The instruction was unconditional, or was conditional and
   passed  its condition code check.
   1 - The instruction was conditional, and might have failed its
   condition code check.
(ARM DDI 0487B.a page D7-2272)

This is an instruction-specific field, so it is better to add a new
structure to the hsr union. This structure describes the ISS encoding
for an exception from an SMC instruction executing in AArch32 state. We
define this struct for both ARMv7 and ARMv8, because the ARMv8 encoding
is backwards compatible with ARMv7.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Walk the guest's pt in software
Sergej Proskurin [Wed, 16 Aug 2017 13:17:44 +0000 (15:17 +0200)]
arm/mem_access: Walk the guest's pt in software

In this commit, we make use of the gpt walk functionality introduced in
the previous commits. If mem_access is active, hardware-based gva to ipa
translation might fail, as gva_to_ipa uses the guest's translation
tables, access to which might be restricted by the active VTTBR. To
side-step potential translation errors in the function
p2m_mem_access_check_and_get_page due to restricted memory (e.g. to the
guest's page tables themselves), we walk the guest's page tables in
software.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add short-descriptor based gpt
Sergej Proskurin [Wed, 16 Aug 2017 13:17:43 +0000 (15:17 +0200)]
arm/mem_access: Add short-descriptor based gpt

This commit adds functionality to walk the guest's page tables using the
short-descriptor translation table format for both ARMv7 and ARMv8. The
implementation is based on ARM DDI 0487B-a J1-6002 and ARM DDI 0406C-b
B3-1506.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add long-descriptor based gpt
Sergej Proskurin [Wed, 16 Aug 2017 13:17:42 +0000 (15:17 +0200)]
arm/mem_access: Add long-descriptor based gpt

This commit adds functionality to walk the guest's page tables using the
long-descriptor translation table format for both ARMv7 and ARMv8.
Similar to the hardware architecture, the implementation supports
different page granularities (4K, 16K, and 64K). The implementation is
based on ARM DDI 0487B.a J1-5922, J1-5999, and ARM DDI 0406C.b B3-1510.

Note that the current implementation lacks support for Large VA/PA on
ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
sizes). The associated location in the code is marked appropriately.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add software guest-page-table walk
Sergej Proskurin [Wed, 16 Aug 2017 13:17:41 +0000 (15:17 +0200)]
arm/mem_access: Add software guest-page-table walk

The function p2m_mem_access_check_and_get_page in mem_access.c
translates a gva to an ipa by means of the hardware functionality of the
ARM architecture. This is implemented in the function gva_to_ipa. If
mem_access is active, hardware-based gva to ipa translation might fail,
as gva_to_ipa uses the guest's translation tables, access to which might
be restricted by the active VTTBR. To address this issue, in this commit
we add a software-based guest-page-table walk, which will be used by the
function p2m_mem_access_check_and_get_page to perform the gva to ipa
translation in software in one of the following commits.

Note: The introduced function guest_walk_tables assumes that the domain,
the gva of which is to be translated, is running on the currently active
vCPU. To walk the guest's page tables on a different vCPU, the following
registers would need to be loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and
SCTLR_EL1.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/guest_access: Rename vgic_access_guest_memory
Sergej Proskurin [Wed, 16 Aug 2017 13:17:40 +0000 (15:17 +0200)]
arm/guest_access: Rename vgic_access_guest_memory

This commit renames the function vgic_access_guest_memory to
access_guest_memory_by_ipa. As the function name suggests, the function
expects an IPA as its argument. All invocations of this function have been
adapted accordingly. Apart from that, we have adjusted all printk
messages for cleanup and to eliminate artefacts of the function's
previous location.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/guest_access: Move vgic_access_guest_memory to guest_access.h
Sergej Proskurin [Wed, 16 Aug 2017 13:17:39 +0000 (15:17 +0200)]
arm/guest_access: Move vgic_access_guest_memory to guest_access.h

This commit moves the function vgic_access_guest_memory to guestcopy.c
and the header asm/guest_access.h. No functional changes are made.
Please note that the function will be renamed in the following commit.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Introduce GENMASK_ULL bit operation
Sergej Proskurin [Wed, 16 Aug 2017 13:17:38 +0000 (15:17 +0200)]
arm/mem_access: Introduce GENMASK_ULL bit operation

The current implementation of GENMASK is capable of creating bitmasks of
32-bit values on AArch32 and 64-bit values on AArch64. As we need to
create masks for 64-bit values on AArch32 as well, in this commit we
introduce the GENMASK_ULL bit operation. Please note that the
GENMASK_ULL implementation has been lifted from the Linux kernel source
code.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Introduce BIT_ULL bit operation
Sergej Proskurin [Wed, 16 Aug 2017 13:17:37 +0000 (15:17 +0200)]
arm/mem_access: Introduce BIT_ULL bit operation

We introduce the BIT_ULL macro, which uses unsigned long long values,
to enable setting bits of 64-bit registers on AArch32. In addition,
this commit adds a define holding the register width of 64-bit
double-word registers. This define simplifies using the associated
constants in the following commits.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Introduce GV2M_EXEC permission
Sergej Proskurin [Wed, 16 Aug 2017 13:17:36 +0000 (15:17 +0200)]
arm/mem_access: Introduce GV2M_EXEC permission

We extend the current implementation with an additional permission,
GV2M_EXEC, which will be used to describe the execute permissions of
PTEs as part of our guest translation table walk implementation.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add short-descriptor pte typedefs and macros
Sergej Proskurin [Wed, 16 Aug 2017 13:17:35 +0000 (15:17 +0200)]
arm/mem_access: Add short-descriptor pte typedefs and macros

The current implementation does not provide appropriate types for
short-descriptor translation table entries. As such, this commit adds new
types, which simplify managing the respective translation table entries.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/lpae: Introduce lpae_is_page helper
Sergej Proskurin [Wed, 16 Aug 2017 13:17:34 +0000 (15:17 +0200)]
arm/lpae: Introduce lpae_is_page helper

This commit introduces a new helper that checks whether the target PTE
holds a page mapping or not. This helper will be used as part of the
following commits.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add defines supporting PTs with varying page sizes
Sergej Proskurin [Wed, 16 Aug 2017 13:17:33 +0000 (15:17 +0200)]
arm/mem_access: Add defines supporting PTs with varying page sizes

AArch64 supports pages with different (4K, 16K, and 64K) sizes.  To
enable guest page table walks for various configurations, this commit
extends the defines and helpers of the current implementation.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  arm/mem_access: Add and cleanup (TCR_|TTBCR_)* defines
Sergej Proskurin [Wed, 16 Aug 2017 13:17:32 +0000 (15:17 +0200)]
arm/mem_access: Add and cleanup (TCR_|TTBCR_)* defines

This commit adds (TCR_|TTBCR_)* defines to simplify access to the
respective register contents. At the same time, we adjust the macros
TCR_T0SZ and TCR_TG0_* by using the newly introduced TCR_T0SZ_SHIFT and
TCR_TG0_SHIFT instead of the hardcoded values.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago  x86: move pv_emul_is_mem_write to pv/emulate.h
Wei Liu [Wed, 19 Jul 2017 15:15:48 +0000 (16:15 +0100)]
x86: move pv_emul_is_mem_write to pv/emulate.h

Make it a static inline function in pv/emulate.h. This requires
including pv/emulate.h in x86/mm.c.

The function will be used by different emulation handlers in later
patches.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/mm: document the return values from get_page_from_l*e
Wei Liu [Wed, 19 Jul 2017 14:59:11 +0000 (15:59 +0100)]
x86/mm: document the return values from get_page_from_l*e

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/mm: lift PAGE_CACHE_ATTRS to page.h
Wei Liu [Fri, 7 Jul 2017 14:26:28 +0000 (15:26 +0100)]
x86/mm: lift PAGE_CACHE_ATTRS to page.h

Currently all the users are within x86/mm.c. But that will change once
we split the PV-specific mm code into another file. Lift it to page.h
alongside _PAGE_* in preparation for later patches.

No functional change. Add some spaces around "|" while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/mm: split HVM grant table code to hvm/grant_table.c
Wei Liu [Thu, 20 Jul 2017 15:13:42 +0000 (16:13 +0100)]
x86/mm: split HVM grant table code to hvm/grant_table.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/mm: carve out replace_grant_pv_mapping
Wei Liu [Fri, 7 Jul 2017 13:50:36 +0000 (14:50 +0100)]
x86/mm: carve out replace_grant_pv_mapping

And at once make it an inline function. Add declarations of
replace_grant_{p2m,pv}_mapping to respective header files.

The code movement will be done later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/mm: carve out create_grant_pv_mapping
Wei Liu [Fri, 7 Jul 2017 13:04:18 +0000 (14:04 +0100)]
x86/mm: carve out create_grant_pv_mapping

And at once make create_grant_host_mapping an inline function.  This
requires making create_grant_{p2m,pv}_mapping non-static.  Provide
{p2m,pv}/grant_table.h. Include the headers where necessary.

The two functions create_grant_{p2m,pv}_mapping will be moved later in
a dedicated patch with all their helpers.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/smp: Misc cleanup
Andrew Cooper [Fri, 18 Aug 2017 10:27:27 +0000 (11:27 +0100)]
x86/smp: Misc cleanup

 * Delete trailing whitespace
 * Switch to using mfn_t for mfn_to_page()/page_to_mfn()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/mm: Override mfn_to_page() and page_to_mfn() to use mfn_t
Andrew Cooper [Fri, 18 Aug 2017 10:27:26 +0000 (11:27 +0100)]
x86/mm: Override mfn_to_page() and page_to_mfn() to use mfn_t

To avoid breaking the build elsewhere, the l{1..4}e_{from,get}_page() macros
are switched to using __mfn_to_page() and __page_to_mfn().

Most changes are wrapping or removing _mfn()/mfn_x() from existing callsites.

However, {alloc,free}_l1_table() are switched to using __map_domain_page(), as
their pfn parameters are otherwise unused.  get_page() has one pfn->mfn
correction in a printk(), and __get_page_type()'s IOMMU handling has its gfn
calculation broken out for clarity.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/asm: add .file directives
Jan Beulich [Thu, 17 Aug 2017 12:45:14 +0000 (14:45 +0200)]
x86/asm: add .file directives

Make sure local symbols are correctly associated with their source
files: I've just run across a cpufreq.c#create_bounce_frame stack trace
entry. Since we have multiple entry.S, don't use __FILE__ there to
fully disambiguate things.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: clean up main switch in do_grant_table_op()
Jan Beulich [Thu, 17 Aug 2017 12:44:38 +0000 (14:44 +0200)]
gnttab: clean up main switch in do_grant_table_op()

Add blank lines as necessary and drop unnecessary braces.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: drop struct active_grant_entry's gfn field for release builds
Jan Beulich [Thu, 17 Aug 2017 12:44:02 +0000 (14:44 +0200)]
gnttab: drop struct active_grant_entry's gfn field for release builds

This shrinks the size from 48 to 40 bytes on 64-bit builds.
Switch to gfn_t at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: re-arrange struct active_grant_entry
Jan Beulich [Thu, 17 Aug 2017 12:42:58 +0000 (14:42 +0200)]
gnttab: re-arrange struct active_grant_entry

While benign on 32-bit arches, this shrinks the size from 56 to 48
bytes on 64-bit ones (while still leaving a 16-bit hole).

Take the opportunity and consistently use bool/true/false for all
is_sub_page uses.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: drop pointless leading double underscores
Jan Beulich [Thu, 17 Aug 2017 12:42:27 +0000 (14:42 +0200)]
gnttab: drop pointless leading double underscores

They violate namespace rules, and we don't really need them. Where
they are followed by "gnttab_", drop that prefix as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: type adjustments
Jan Beulich [Thu, 17 Aug 2017 12:41:57 +0000 (14:41 +0200)]
gnttab: type adjustments

In particular use grant_ref_t and grant_handle_t where appropriate.
Also switch other nearby type uses to their canonical variants where
appropriate and introduce INVALID_MAPTRACK_HANDLE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: avoid spurious maptrack handle allocation failures
Jan Beulich [Thu, 17 Aug 2017 12:41:01 +0000 (14:41 +0200)]
gnttab: avoid spurious maptrack handle allocation failures

When no memory is available in the hypervisor, rather than immediately
failing the request, try to steal a handle from another vCPU.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: fix transitive grant handling
Jan Beulich [Thu, 17 Aug 2017 12:40:31 +0000 (14:40 +0200)]
gnttab: fix transitive grant handling

Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.

Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.

This is part of XSA-226.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: don't use possibly unbounded tail calls
Jan Beulich [Thu, 17 Aug 2017 12:39:18 +0000 (14:39 +0200)]
gnttab: don't use possibly unbounded tail calls

There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
  __acquire_grant_for_copy() won't permit use of multi-level transitive
  grants,
- __acquire_grant_for_copy() is fine to call itself with the last
  argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
  observed change to the active entry's pin count

This is part of XSA-226.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/mm: Reduce debug overhead of __virt_to_maddr()
Andrew Cooper [Wed, 16 Aug 2017 12:01:03 +0000 (13:01 +0100)]
x86/mm: Reduce debug overhead of __virt_to_maddr()

__virt_to_maddr() is used very frequently, but has a large footprint due to
its assertions and comparisons.

Rearrange its logic to drop one assertion entirely, encoding its check in a
second assertion (with no additional branch, and the comparison performed with
a 32bit immediate rather than requiring a movabs).

Bloat-o-meter net report is:
  add/remove: 0/0 grow/shrink: 1/72 up/down: 3/-2169 (-2166)

along with a reduction of 34 assertion frames (895 down to 861)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/svm: Use physical addresses for HSA and Host VMCB
Andrew Cooper [Wed, 16 Aug 2017 13:31:37 +0000 (13:31 +0000)]
x86/svm: Use physical addresses for HSA and Host VMCB

They are only referenced by physical address (either the HSA MSR, or via
VMSAVE/VMLOAD which take a physical operand).  Allocating xenheap pages and
storing their virtual address is wasteful.

Allocate them with domheap pages instead, taking the opportunity to suitably
NUMA-position them.  This avoids Xen needing to perform a virt to phys
translation on every context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/mcheck: Minor cleanup to amd_nonfatal
Andrew Cooper [Tue, 15 Aug 2017 15:14:08 +0000 (15:14 +0000)]
x86/mcheck: Minor cleanup to amd_nonfatal

  * Drop trailing whitespace.
  * Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing return.
  * Drop unnecessary wmb()'s.  Because of Xen's implementation, they are only
    compiler barriers anyway, and each wrmsr() is already fully serialising.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: Drop __PAGE_OFFSET
Andrew Cooper [Wed, 16 Aug 2017 12:47:25 +0000 (13:47 +0100)]
x86/mm: Drop __PAGE_OFFSET

It is a vestigial leftover of Xen having inherited Linux's memory management
code in the early days.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: Drop more PV superpage leftovers
Andrew Cooper [Wed, 16 Aug 2017 12:25:17 +0000 (13:25 +0100)]
x86/mm: Drop more PV superpage leftovers

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: don't check alloc_boot_pages return
Julien Grall [Wed, 16 Aug 2017 10:27:22 +0000 (12:27 +0200)]
x86/mm: don't check alloc_boot_pages return

The only way alloc_boot_pages will return 0 is in the error case, and
Xen will panic in that error path anyway. So the check in the caller
is pointless.

Looking at the loop, my understanding is that it tries to allocate in
smaller chunks if a bigger chunk fails. Given that alloc_boot_pages
can never return failure, the loop seems unnecessary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/srat: don't check alloc_boot_pages return
Julien Grall [Wed, 16 Aug 2017 10:27:02 +0000 (12:27 +0200)]
x86/srat: don't check alloc_boot_pages return

alloc_boot_pages will panic if it is not possible to allocate. So the
check in the caller is pointless.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/numa: don't check alloc_boot_pages return
Julien Grall [Wed, 16 Aug 2017 10:26:37 +0000 (12:26 +0200)]
x86/numa: don't check alloc_boot_pages return

alloc_boot_pages will panic if it is not possible to allocate. So the
check in the caller is pointless.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/psr: fix coding style issue
Yi Sun [Wed, 16 Aug 2017 09:03:29 +0000 (11:03 +0200)]
x86/psr: fix coding style issue

In psr.c, some macros are defined with poor coding style.
Use '(1u << X)' to replace '(1 << X)'.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
7 years agognttab: use DIV_ROUND_UP() instead of open-coding it
Jan Beulich [Wed, 16 Aug 2017 09:02:48 +0000 (11:02 +0200)]
gnttab: use DIV_ROUND_UP() instead of open-coding it

Also adjust formatting of nearby code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: move GNTPIN_* out of header file
Jan Beulich [Wed, 16 Aug 2017 09:02:10 +0000 (11:02 +0200)]
gnttab: move GNTPIN_* out of header file

They're private to grant_table.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: drop useless locking
Jan Beulich [Wed, 16 Aug 2017 08:56:23 +0000 (10:56 +0200)]
gnttab: drop useless locking

Holding any lock while accessing the maptrack entry fields is
pointless, as these entries are protected by their associated active
entry lock (which is being acquired later, before re-validating the
fields read without holding the lock).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86_64/mm: remove extraneous breaks in m2p_mapped
Wei Liu [Tue, 15 Aug 2017 10:21:04 +0000 (11:21 +0100)]
x86_64/mm: remove extraneous breaks in m2p_mapped

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen: lift hypercall_cancel_continuation to sched.h
Wei Liu [Mon, 14 Aug 2017 15:46:28 +0000 (16:46 +0100)]
xen: lift hypercall_cancel_continuation to sched.h

The function is the same on both x86 and arm. Lift it to sched.h to
save a function call, and make it take a pointer to the vcpu to avoid
resolving current every time it gets called.

Take the chance to change one of its callers to evaluate current only
once.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agolibxc: correct error message in xc_sr_common.c
Juergen Gross [Thu, 10 Aug 2017 11:24:27 +0000 (13:24 +0200)]
libxc: correct error message in xc_sr_common.c

When the record length for sending the p2m frames in a migration
stream is too large, the issued error message is not very helpful:

xc: Record (0x00000003, x86 PV P2M frames) length 0x8 exceeds max
    (0x800000): Internal error

When printing the error use the size which was tested instead that of
the record header length.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agocommon/gnttab: simplify gnttab_copy_lock_domain()
Andrew Cooper [Tue, 20 Jun 2017 09:40:56 +0000 (10:40 +0100)]
common/gnttab: simplify gnttab_copy_lock_domain()

Remove the opencoded rcu_lock_domain_by_any_id().  Drop the PIN_FAIL()s and
return GNTST_* values directly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agocommon/gnttab: gnttab_query_size() cleanup
Andrew Cooper [Tue, 20 Jun 2017 09:40:56 +0000 (10:40 +0100)]
common/gnttab: gnttab_query_size() cleanup

Drop pointless debugging messages, and reduce variable scope.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agocommon/gnttab: gnttab_setup_table() cleanup
Andrew Cooper [Tue, 20 Jun 2017 09:40:56 +0000 (10:40 +0100)]
common/gnttab: gnttab_setup_table() cleanup

Drop pointless debugging messages, reduce variable scope, and correct the type
of an induction variable.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agocommon/gnttab: General cleanup
Andrew Cooper [Tue, 20 Jun 2017 09:40:56 +0000 (10:40 +0100)]
common/gnttab: General cleanup

 * Drop trailing whitespace
 * Style corrections

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agocommon/gnttab: Correct error handling for gnttab_setup_table()
Andrew Cooper [Tue, 20 Jun 2017 09:40:56 +0000 (10:40 +0100)]
common/gnttab: Correct error handling for gnttab_setup_table()

Simplify the error labels to just "unlock" and "out".  This fixes an erroneous
path where a failure of rcu_lock_domain_by_any_id() still results in
rcu_unlock_domain() being called.

This is only not an XSA by luck.  rcu_unlock_domain() is a nop other than
decrementing the preempt count, and nothing reads the preempt count outside of
a debug build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/hpet: Improve handling of timer_deadline
Andrew Cooper [Wed, 31 May 2017 13:56:26 +0000 (14:56 +0100)]
x86/hpet: Improve handling of timer_deadline

timer_deadline is only ever updated via this_cpu() in timer_softirq_action(),
so is not going to change behind the back of the currently running cpu.

Update hpet_broadcast_{enter,exit}() to cache the value in a local variable to
avoid the repeated RELOC_HIDE() penalty.

handle_hpet_broadcast() reads the timer_deadlines of remote cpus, but there is
no need to force the read for cpus which are not present in the mask.  One
requirement is that we only sample the value once (which happens as a side
effect of RELOC_HIDE()), but is made more explicit with ACCESS_ONCE().

Bloat-o-meter shows a modest improvement:

  add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-144 (-144)
  function                                     old     new   delta
  hpet_broadcast_exit                          335     313     -22
  hpet_broadcast_enter                         327     278     -49
  handle_hpet_broadcast                        572     499     -73

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: correct pin status fixup for copy
Jan Beulich [Tue, 15 Aug 2017 13:08:03 +0000 (15:08 +0200)]
gnttab: correct pin status fixup for copy

Regardless of copy operations only setting GNTPIN_hst*, GNTPIN_dev*
also need to be taken into account when deciding whether to clear
_GTF_{read,writ}ing. At least for consistency with code elsewhere the
read part better doesn't use any mask at all.

This is XSA-230.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agognttab: split maptrack lock to make it fulfill its purpose again
Jan Beulich [Tue, 15 Aug 2017 13:07:25 +0000 (15:07 +0200)]
gnttab: split maptrack lock to make it fulfill its purpose again

The way the lock is currently being used in get_maptrack_handle(), it
protects only the maptrack limit: The function acts on current's list
only, so races on list accesses are impossible even without the lock.

On the other hand, list access races are possible between __get_maptrack_handle() and
put_maptrack_handle(), due to the invocation of the former for other
than current from steal_maptrack_handle(). Introduce a per-vCPU lock
for list accesses to become race free again. This lock will be
uncontended except when it becomes necessary to take the steal path,
i.e. in the common case there should be no meaningful performance
impact.

When get_maptrack_handle() adds a stolen entry to a fresh, empty
freelist, we think there is probably no concurrency. However, this is
not a fast path, and adding the locking there makes the code clearly
correct.

Also, while we are here: the stolen maptrack_entry's tail pointer was
not properly set.  Set it.

This is CVE-2017-12136 / XSA-228.

Reported-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agox86/grant: disallow misaligned PTEs
Andrew Cooper [Tue, 15 Aug 2017 13:06:45 +0000 (15:06 +0200)]
x86/grant: disallow misaligned PTEs

Pagetable entries must be aligned to function correctly.  Disallow attempts
from the guest to have a grant PTE created at a misaligned address, which
would result in corruption of the L1 table with largely-guest-controlled
values.

This is CVE-2017-12137 / XSA-227.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agomm: clean up free_heap_pages()
Boris Ostrovsky [Mon, 14 Aug 2017 15:18:49 +0000 (17:18 +0200)]
mm: clean up free_heap_pages()

Make buddy merging part of free_heap_pages() a bit more readable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agogrant_table: include mm.h in xen/grant_table.h
Julien Grall [Mon, 14 Aug 2017 15:17:44 +0000 (17:17 +0200)]
grant_table: include mm.h in xen/grant_table.h

While re-ordering the includes alphabetically in arch/arm/domain.c, I got
a compilation error because grant_table.h uses gfn_t before it has
been defined:

In file included from domain.c:14:0:
xen/xen/include/xen/grant_table.h:153:29: error: unknown type name 'gfn_t'
                             gfn_t *gfn, uint16_t *status);
                             ^

Fix it by including xen/mm.h in it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/page: Introduce and use PAGE_HYPERVISOR_UC
Andrew Cooper [Mon, 14 Aug 2017 10:52:20 +0000 (11:52 +0100)]
x86/page: Introduce and use PAGE_HYPERVISOR_UC

Always map the PCI MMCFG region as strongly uncacheable.  Nothing good will
happen if stray MTRR settings end up converting UC- to WC.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/page: Rename PAGE_HYPERVISOR_NOCACHE to PAGE_HYPERVISOR_UCMINUS
Andrew Cooper [Mon, 14 Aug 2017 10:42:24 +0000 (11:42 +0100)]
x86/page: Rename PAGE_HYPERVISOR_NOCACHE to PAGE_HYPERVISOR_UCMINUS

To better describe its actual function.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/config: Fix stale documentation concerning virtual layout
Andrew Cooper [Fri, 11 Aug 2017 13:35:48 +0000 (14:35 +0100)]
x86/config: Fix stale documentation concerning virtual layout

The hypercall argument translation area lives in the per-domain mappings in
PML4 slot 260.  Nothing currently resides in the lower canonical half above
the 4GB boundary in a 32bit PV guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen: remove struct domain and vcpu declarations from types.h
Wei Liu [Thu, 10 Aug 2017 17:22:53 +0000 (18:22 +0100)]
xen: remove struct domain and vcpu declarations from types.h

They don't belong there. Removing them causes build errors in several
places. Add the forward declarations in those places.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/flask: Switch to using bool
Andrew Cooper [Fri, 23 Jun 2017 10:56:37 +0000 (10:56 +0000)]
xen/flask: Switch to using bool

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years agoxsm/flask: Fix build following "xsm: correct AVC lookups for two sysctls"
Andrew Cooper [Thu, 10 Aug 2017 13:13:00 +0000 (14:13 +0100)]
xsm/flask: Fix build following "xsm: correct AVC lookups for two sysctls"

avc_current_has_perm() takes 4 arguments, not 3.  Spotted by a Travis
randconfig run which actually turned XSM on.

https://travis-ci.org/xen-project/xen/jobs/263063220

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years agocommon/domain_page: Drop domain_mmap_cache infrastructure
Andrew Cooper [Wed, 26 Jul 2017 09:18:02 +0000 (10:18 +0100)]
common/domain_page: Drop domain_mmap_cache infrastructure

This infrastructure is used exclusively by the x86 do_mmu_update() hypercall.
Mapping and unmapping domain pages is probably not the slow part of that
function, but even with an opencoded caching implementation, Bloat-o-meter
reports:

  function                                     old     new   delta
  do_mmu_update                               6815    6573    -242

The !CONFIG_DOMAIN_PAGE stub code has a mismatch between mapping and
unmapping, which is a latent bug.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/psr: remove useless check in free_socket_resources
Wei Liu [Wed, 9 Aug 2017 12:35:19 +0000 (13:35 +0100)]
x86/psr: remove useless check in free_socket_resources

The check is useless because pointer arithmetic ensures "info" is
always non-NULL.

Replace it with an ASSERT for socket_info. The only caller of
free_socket_resources already ensures socket_info is not NULL before
calling it.

Coverity-ID: 1416344

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/HVM: fix boundary check in hvmemul_insn_fetch() (again)
Jan Beulich [Thu, 10 Aug 2017 10:37:24 +0000 (12:37 +0200)]
x86/HVM: fix boundary check in hvmemul_insn_fetch() (again)

Commit 5a992b670b ("x86/hvm: Fix boundary check in
hvmemul_insn_fetch()") went a little too far in its correction to
commit 0943a03037 ("x86/hvm: Fixes to hvmemul_insn_fetch()"): Keep the
start offset check, but restore the original end offset one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agox86/mm: make various hotplug related functions static
Jan Beulich [Thu, 10 Aug 2017 10:36:58 +0000 (12:36 +0200)]
x86/mm: make various hotplug related functions static

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoIOMMU/PCI: properly annotate setup_one_hwdom_device()
Jan Beulich [Thu, 10 Aug 2017 10:36:24 +0000 (12:36 +0200)]
IOMMU/PCI: properly annotate setup_one_hwdom_device()

Its sole caller is __hwdom_init, so it can be such itself, too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agocpufreq: only stop ondemand governor if already started
Christopher Clark [Thu, 10 Aug 2017 10:35:50 +0000 (12:35 +0200)]
cpufreq: only stop ondemand governor if already started

On CPUFREQ_GOV_STOP in cpufreq_governor_dbs, shortcut to
return success if the governor is already stopped.

Avoid executing dbs_timer_exit, to prevent tripping an assertion
within a call to kill_timer on a timer that has not been prepared
with init_timer, if the CPUFREQ_GOV_START case has not
run beforehand.

kill_timer validates timer state:
 * itself, via BUG_ON(this_cpu(timers).running == timer);
 * within active_timer, ASSERTing timer->status is within bounds;
 * within list_del, which ASSERTs timer inactive list membership.

This patch is equivalent to an OpenXT patch produced at Citrix prior
to June 2014.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxsm: correct AVC lookups for two sysctls
Daniel De Graaf [Thu, 10 Aug 2017 10:35:28 +0000 (12:35 +0200)]
xsm: correct AVC lookups for two sysctls

The current code was incorrectly using SECCLASS_XEN instead of
SECCLASS_XEN2, resulting in the wrong permission being checked.

GET_CPU_LEVELLING_CAPS was checking MTRR_DEL
GET_CPU_FEATURESET was checking MTRR_READ

The default XSM policy only allowed these permissions to dom0, so this
didn't result in a security issue there.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/tboot: disable interrupts after map_pages_to_xen() in tboot_shutdown()
Christopher Clark [Thu, 10 Aug 2017 10:34:58 +0000 (12:34 +0200)]
x86/tboot: disable interrupts after map_pages_to_xen() in tboot_shutdown()

Move the point where interrupts are disabled in tboot_shutdown
to slightly later, to after the call to map_pages_to_xen.

This patch originated in OpenXT with the following report:

"Disabling interrupts early causes debug assertions.

This is only seen with debug builds but since it causes assertions it is
probably a bigger problem. It clearly says in map_pages_to_xen that it
should not be called with interrupts disabled. Moved disabling to just
after that call."

The Xen code comment ahead of map_pages_to_xen notes that the CPU cache
flushing in map_pages_to_xen differs depending on whether interrupts are
enabled or not. The flush logic with interrupts enabled is more
conservative, flushing all CPUs' TLBs/caches, rather than just local.
This is just before the tboot memory integrity MAC calculation is performed
in the case of entering S3.

Original patch author credit: Ross Philipson.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoAMD IOMMU: drop amd_iommu_setup_hwdom_device()
Jan Beulich [Thu, 10 Aug 2017 10:34:21 +0000 (12:34 +0200)]
AMD IOMMU: drop amd_iommu_setup_hwdom_device()

By moving its bridge special casing to amd_iommu_add_device(), we can
pass the latter to setup_hwdom_pci_devices() and at once consistently
handle bridges discovered at boot time as well as such reported by Dom0
later on.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>