xenbits.xensource.com Git - xen.git/log
7 years ago xen/arm: gic-v3: Make sure ICC_SRE_EL1 is restored before ICH_VMCR_EL2
Julien Grall [Thu, 19 Oct 2017 17:09:05 +0000 (18:09 +0100)]
xen/arm: gic-v3: Make sure ICC_SRE_EL1 is restored before ICH_VMCR_EL2

Per 8.4.8 in ARM IHI 0069D, ICH_VMCR_EL2.VFIQEn is RES1 when
ICC_SRE_EL1.SRE is 1. This causes a Group 0 interrupt (as generated in
GICv2 mode) to be delivered as a FIQ to the guest, with potential
consequences. So we must make sure that ICC_SRE_EL1 has actually been
programmed before ICH_VMCR_EL2.

This was discovered when booting EFI in a GICv2 guest on a GICv3
hardware.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago arm: configure interrupts to be in non-secure group1
Stefano Stabellini [Wed, 18 Oct 2017 21:29:58 +0000 (14:29 -0700)]
arm: configure interrupts to be in non-secure group1

Xen uses non-secure group1 interrupts, however it doesn't configure the
GICv3 accordingly. Xen needs to set GICD_IGROUPR for SPIs and
GICR_IGROUPR0 for local interrupt to "1" to specify that interrupts
belong to group1. This is particularly important if the system has
GICD_CTLR.DS set, also see commit
7c9b973061b03af62734f613f6abec46c0dd4a88 in Linux.
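
As a rough illustration of the register layout involved (not the patch itself): each 32-bit GICD_IGROUPR<n> register holds one group bit per interrupt, so interrupt ID i lives in register i / 32, bit i % 32. The sketch below models the registers as a plain array; all demo_* names and the interrupt count are hypothetical.

```c
#include <stdint.h>

#define DEMO_NR_IRQS 96u   /* hypothetical: 32 local IDs + 64 SPIs */

/* Stand-in for the GICD_IGROUPR<n> register file (not real MMIO). */
static uint32_t demo_igroupr[DEMO_NR_IRQS / 32];

/* Mark a single interrupt as Group 1 by setting its bit. */
static void demo_set_group1(unsigned int irq)
{
    demo_igroupr[irq / 32] |= UINT32_C(1) << (irq % 32);
}

/* Mark every SPI (IDs 32 and up) as Group 1, one register at a time. */
static void demo_all_spis_group1(void)
{
    for ( unsigned int i = 32; i < DEMO_NR_IRQS; i += 32 )
        demo_igroupr[i / 32] = UINT32_MAX;
}
```

Index 0 is left alone here because, as the message says, the local interrupts are configured through GICR_IGROUPR0 instead.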

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago xen/public: Correct the definition of GNTTAB_CACHE_SOURCE_GREF
Andrew Cooper [Tue, 17 Oct 2017 14:11:23 +0000 (15:11 +0100)]
xen/public: Correct the definition of GNTTAB_CACHE_SOURCE_GREF

Discovered when running the XSA-232 PoC on a UBSAN-enabled hypervisor.

  (d79) XSA-232 PoC
  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in grant_table.c:3217:25
  (XEN) left shift of 1 by 31 places cannot be represented in type 'int'
  (XEN) ----[ Xen-4.10.0-rc  x86_64  debug=y   Tainted:    H ]----

Update all of the GNTTAB_CACHE_* constants to be unsigned integers.
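
A minimal sketch of the class of fix, assuming a flag that occupies bit 31 as in the UBSAN report; DEMO_CACHE_FLAG and demo_set_flag() are hypothetical names, not the real GNTTAB_CACHE_* definitions.

```c
#include <stdint.h>

/*
 * Signed form: the literal 1 has type int, so shifting it into bit 31
 * cannot be represented and is undefined behaviour.
 *
 *   #define DEMO_CACHE_FLAG_BAD  (1 << 31)
 */

/* Unsigned form: 1u has type unsigned int, so bit 31 is well defined. */
#define DEMO_CACHE_FLAG  (1u << 31)

/* Set the flag in a 32-bit operation word. */
static uint32_t demo_set_flag(uint32_t op)
{
    return op | DEMO_CACHE_FLAG;
}
```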

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago x86/mm: Consolidate all Xen L4 slot writing into init_xen_l4_slots()
Andrew Cooper [Tue, 29 Aug 2017 10:35:31 +0000 (10:35 +0000)]
x86/mm: Consolidate all Xen L4 slot writing into init_xen_l4_slots()

There are currently three functions which write L4 pagetables for Xen, but
they all behave subtly differently.  sh_install_xen_entries_in_l4() in
particular is catering for two different usecases, which makes the safety of
the linear mappings hard to follow.

By consolidating the L4 pagetable writing in a single function, the resulting
setup of Xen's virtual layout is easier to understand.

No practical changes to the resulting L4, although the logic has been
rearranged to avoid rewriting some slots.  This changes the zap_ro_mpt
parameter to simply ro_mpt.

Both {hap,sh}_install_xen_entries_in_l4() get folded into their callers.  The
hap side has only a single caller, while the shadow side has two.  The shadow
split helps highlight the correctness of the linear slots.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86/mm: Consolidate all Xen L2 slot writing into init_xen_pae_l2_slots()
Andrew Cooper [Tue, 29 Aug 2017 10:35:31 +0000 (11:35 +0100)]
x86/mm: Consolidate all Xen L2 slot writing into init_xen_pae_l2_slots()

Having all of this logic together makes it easier to follow Xen's virtual
setup across the whole system.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago Revert "x86/mm: move PV l4 table setup code" and "x86/mm: factor out pv_arch_init_memory"
Andrew Cooper [Mon, 25 Sep 2017 10:11:05 +0000 (11:11 +0100)]
Revert "x86/mm: move PV l4 table setup code" and "x86/mm: factor out pv_arch_init_memory"

This reverts commit f3b95fd07fdb55b1db091fede1b9a7c71f1eaa1b and
1bd39738a5a34f529a610fb275cc83ee5ac7547a.

The following patches (post XSA-243 fixes) require init_guest_l4_table()
being common code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago tools: libxendevicemodel: Restore symbol versions for 1.0
Ian Jackson [Tue, 17 Oct 2017 16:52:02 +0000 (17:52 +0100)]
tools: libxendevicemodel: Restore symbol versions for 1.0

In 1462f9ea8f4219d520a530787b80c986e050aa98
"tools: libxendevicemodel: Provide xendevicemodel_shutdown"
we added a new version 1.1 to the symbol map and simply abolished
the old one.  That is quite wrong.

Instead, we should have left the 1.0 map alone and added a new version
which simply adds the new symbol.

Fix this.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago mm/shadow: fix declaration of fetch_type_names
Roger Pau Monné [Tue, 17 Oct 2017 10:23:53 +0000 (11:23 +0100)]
mm/shadow: fix declaration of fetch_type_names

fetch_type_names usage is guarded by SHADOW_DEBUG_PROPAGATE in
SHADOW_DEBUG; fix the declaration so it's also guarded by
SHADOW_DEBUG_PROPAGATE instead of DEBUG_TRACE_DUMP.

Observed while building with clang and ubsan enabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago xen/dom0: Fix latent dom0 construction bugs on all architectures
Andrew Cooper [Mon, 16 Oct 2017 13:20:00 +0000 (13:20 +0000)]
xen/dom0: Fix latent dom0 construction bugs on all architectures

 * x86 PV and ARM dom0's must not clear _VPF_down from v->pause_flags until
   all state is actually set up.  As it currently stands, d0v0 is eligible for
   scheduling before its registers have been set.  This is latent as we also
   hold a systemcontroller pause reference at the time which prevents d0 from
   being scheduled.

 * x86 PVH previously was not setting v->is_initialised for d0v0, despite
   setting the vcpu running eventually.  Therefore, a later VCPUOP_initialise
   hypercall will modify state under the feet of the running vcpu.  This is
   latent as PVH dom0 construction doesn't yet function.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago Config.mk, xen/Makefile: Update version to 4.10[.0]-rc 4.10.0-rc1
Ian Jackson [Mon, 16 Oct 2017 14:14:16 +0000 (15:14 +0100)]
Config.mk, xen/Makefile: Update version to 4.10[.0]-rc

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years ago *_REVISION: Switch to fixed tags for Xen 4.10-rc1
Ian Jackson [Mon, 16 Oct 2017 14:09:00 +0000 (15:09 +0100)]
*_REVISION: Switch to fixed tags for Xen 4.10-rc1

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years ago toolcore: Build in rumprun environment too
Ian Jackson [Mon, 16 Oct 2017 10:05:11 +0000 (11:05 +0100)]
toolcore: Build in rumprun environment too

Otherwise,
  f942a9b4a12081d5f9a4679d06e88cb5d503396e
  xentoolcore_restrict_all: "Implement" for xenstore
breaks the build of the tools inside rumprun.

toolcore is in libs, so we need to add the CONFIG_RUMP special case to
tools/libs/Makefile and add toolcore there.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago x86: fix do_update_va_mapping_otherdomain() wrt translated domains
Jan Beulich [Fri, 13 Oct 2017 10:43:41 +0000 (12:43 +0200)]
x86: fix do_update_va_mapping_otherdomain() wrt translated domains

While I can't seem to find any users of this hypercall (being a likely
explanation of why the problem wasn't noticed so far), just like for
do_mmu_update() paged-out and shared page handling is needed here. Move
all this logic into mod_l1_entry(), which then also results in no
longer
- doing any of this handling for non-present PTEs,
- acquiring two temporary page references when one is already more than
  enough.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago x86: request page table page-in for the correct domain
Jan Beulich [Fri, 13 Oct 2017 10:42:43 +0000 (12:42 +0200)]
x86: request page table page-in for the correct domain

The domain passed to p2m_mem_paging_populate() should match the one
passed to the corresponding get_page_from_gfn().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years ago libxl: dm_restrict: DEFINE_USERLOOKUP_HELPER returned a pointer to an auto
Ian Jackson [Fri, 13 Oct 2017 10:21:57 +0000 (11:21 +0100)]
libxl: dm_restrict: DEFINE_USERLOOKUP_HELPER returned a pointer to an auto

When I converted the previous open-coded user lookup functionality
into DEFINE_USERLOOKUP_HELPER, I moved the struct passwd buffer into
the function generated by the macro.  This is wrong because that
buffer is used by get{pw,gr}* for its return value, so the helper
function would contrive to return a pointer to the buffer on its own
stack.

Fix this by adding a buffer parameter to the generated helpers, that
the caller must supply, and updating all the call sites.
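
The shape of the fix can be sketched with the POSIX reentrant lookup, getpwnam_r(). This is an illustration under assumptions, not the libxl macro: demo_lookup_user() is a hypothetical helper, and the point is only that both the struct passwd and the string buffer belong to the caller, so nothing returned points into the helper's own stack frame.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: the caller supplies the struct passwd and the
 * string buffer, so the result stays valid after this function returns.
 * Returns 0 on success; *out is then either pwd (found) or NULL (no
 * such user).  A non-zero return is an errno-style error code.
 */
static int demo_lookup_user(const char *name, struct passwd *pwd,
                            char *buf, size_t buflen, struct passwd **out)
{
    return getpwnam_r(name, pwd, buf, buflen, out);
}
```

A caller would typically declare `struct passwd pwd; char buf[4096];` in its own frame and pass both in, keeping them alive for as long as it uses the result.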

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago x86/cpu: Fix IST handling during PCPU bringup
Andrew Cooper [Thu, 12 Oct 2017 12:50:31 +0000 (14:50 +0200)]
x86/cpu: Fix IST handling during PCPU bringup

Clear IST references in newly allocated IDTs.  Nothing good will come of
having them set before the TSS is suitably constructed (although the chances
of the CPU surviving such an IST interrupt/exception are extremely slim).

Uniformly set the IST references after the TSS is in place.  This fixes an
issue on AMD hardware, where onlining a PCPU while PCPU0 is in HVM context
will cause IST_NONE to be copied into the new IDT, making that PCPU vulnerable
to privilege escalation from PV guests until it subsequently schedules an HVM
guest.

This is XSA-244.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/shadow: Don't create self-linear shadow mappings for 4-level translated guests
Andrew Cooper [Thu, 12 Oct 2017 12:50:07 +0000 (14:50 +0200)]
x86/shadow: Don't create self-linear shadow mappings for 4-level translated guests

When initially creating a monitor table for 4-level translated guests, don't
install a shadow-linear mapping.  This mapping is actually self-linear, and
trips up the writeable heuristic logic into following Xen's mappings, not the
guests' shadows it was expecting to follow.

A consequence of this is that sh_guess_wrmap() needs to cope with there being
no shadow-linear mapping present, which in practice occurs once each time a
vcpu switches to 4-level paging from a different paging mode.

An appropriate shadow-linear slot will be inserted into the monitor table
either while constructing lower level monitor tables, or by sh_update_cr3().

While fixing this, clarify the safety of the other mappings.  Despite
appearing unsafe, it is correct to create a guest-linear mapping for
translated domains; this is self-linear and doesn't point into the translated
domain.  Drop a dead clause for translate != external guests.

This is XSA-243.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years ago x86: don't allow page_unlock() to drop the last type reference
Jan Beulich [Wed, 27 Sep 2017 10:00:56 +0000 (11:00 +0100)]
x86: don't allow page_unlock() to drop the last type reference

Only _put_page_type() does the necessary cleanup, and hence not all
domain pages can be released during guest cleanup (leaving around
zombie domains) if we get this wrong.

This is XSA-242.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86: don't store possibly stale TLB flush time stamp
Jan Beulich [Thu, 12 Oct 2017 12:48:25 +0000 (14:48 +0200)]
x86: don't store possibly stale TLB flush time stamp

While the timing window is extremely narrow, it is theoretically
possible for an update to the TLB flush clock and a subsequent flush
IPI to happen between the read and write parts of the update of the
per-page stamp. Exclude this possibility by disabling interrupts
across the update, preventing the IPI from being serviced in the middle.

This is XSA-241.

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86: limit linear page table use to a single level
Jan Beulich [Wed, 27 Sep 2017 10:46:52 +0000 (11:46 +0100)]
x86: limit linear page table use to a single level

That's the only way that they're meant to be used. Without such a
restriction arbitrarily long chains of same-level page tables can be
built, tearing down of which may then cause arbitrarily deep recursion,
causing a stack overflow. To facilitate this restriction, a counter is
being introduced to track both the number of same-level entries in a
page table as well as the number of uses of a page table in another
same-level one (counting into positive and negative direction
respectively, utilizing the fact that both counts can't be non-zero at
the same time).

Note that the added accounting introduces a restriction on the number
of times a page can be used in other same-level page tables - more than
32k of such uses are no longer possible.
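
The two-direction counter can be pictured roughly as follows. This is a simplified, non-atomic sketch under assumptions: the struct, the field name and the demo_* helpers are hypothetical, and a 16-bit signed field is assumed only because it matches the 32k limit mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state: > 0 counts same-level entries held by
 * this table, < 0 counts uses of this table by other same-level tables. */
struct demo_page {
    int16_t linear_pt_count;
};

/* Record one more same-level entry inside this table. */
static bool demo_inc_linear_entries(struct demo_page *pg)
{
    /* Refuse if the page is itself used by a same-level table. */
    if ( pg->linear_pt_count < 0 || pg->linear_pt_count == INT16_MAX )
        return false;
    pg->linear_pt_count++;
    return true;
}

/* Record one more use of this table by another same-level table. */
static bool demo_inc_linear_uses(struct demo_page *pg)
{
    /* Refuse if the page already contains same-level entries. */
    if ( pg->linear_pt_count > 0 || pg->linear_pt_count == INT16_MIN )
        return false;
    pg->linear_pt_count--;
    return true;
}
```

Because a page can only move in one direction from zero, a chain of same-level tables can never grow beyond a single level in this model.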

Note also that some put_page_and_type[_preemptible]() calls are
replaced with open-coded equivalents.  This seemed preferable to
adding "parent_table" to the matrix of functions.

Note further that cross-domain same-level page table references are no
longer permitted (they probably never should have been).

This is XSA-240.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86/HVM: prefill partially used variable on emulation paths
Jan Beulich [Thu, 12 Oct 2017 12:43:26 +0000 (14:43 +0200)]
x86/HVM: prefill partially used variable on emulation paths

Certain handlers ignore the access size (vioapic_write() being the
example this was found with), perhaps leading to subsequent reads
seeing data that wasn't actually written by the guest. For
consistency and extra safety also do this on the read path of
hvm_process_io_intercept(), even if this doesn't directly affect what
guests get to see, as we've supposedly already dealt with read handlers
leaving data completely uninitialized.

This is XSA-239.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/ioreq server: correctly handle bogus XEN_DMOP_{,un}map_io_range_to_ioreq_server...
Vitaly Kuznetsov [Tue, 5 Sep 2017 11:41:37 +0000 (13:41 +0200)]
x86/ioreq server: correctly handle bogus XEN_DMOP_{,un}map_io_range_to_ioreq_server arguments

A misbehaving device model can pass incorrect XEN_DMOP_map/
unmap_io_range_to_ioreq_server arguments, namely end < start when
specifying address range. When this happens we hit ASSERT(s <= e) in
rangeset_contains_range()/rangeset_overlaps_range() with debug builds.
Production builds will not trap right away but may misbehave later
while handling such bogus ranges.
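
A minimal sketch of the kind of up-front validation this calls for, assuming inclusive [start, end] ranges and -EINVAL as the rejection code (the error value and the helper name demo_check_range() are assumptions, not taken from the patch):

```c
#include <errno.h>
#include <stdint.h>

/* Reject inverted ranges before they reach any rangeset operation. */
static int demo_check_range(uint64_t start, uint64_t end)
{
    if ( start > end )
        return -EINVAL;

    return 0;
}
```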

This is XSA-238.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/FLASK: fix unmap-domain-IRQ XSM hook
Jan Beulich [Thu, 12 Oct 2017 12:37:56 +0000 (14:37 +0200)]
x86/FLASK: fix unmap-domain-IRQ XSM hook

The caller and the FLASK implementation of xsm_unmap_domain_irq()
disagreed about what the "data" argument points to in the MSI case:
Change both sides to pass/take a PCI device.

This is part of XSA-237.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/IRQ: conditionally preserve irq <-> pirq mapping on map error paths
Jan Beulich [Thu, 12 Oct 2017 12:37:26 +0000 (14:37 +0200)]
x86/IRQ: conditionally preserve irq <-> pirq mapping on map error paths

Mappings that had been set up before should not be torn down when
handling unrelated errors.

This is part of XSA-237.

Reported-by: HW42 <hw42@ipsumj.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86/MSI: disallow redundant enabling
Jan Beulich [Thu, 12 Oct 2017 12:36:58 +0000 (14:36 +0200)]
x86/MSI: disallow redundant enabling

At the moment, Xen attempts to allow redundant enabling of MSI by
having pci_enable_msi() return 0, and point to the existing MSI
descriptor, when the msi already exists.

Unfortunately, if subsequent errors are encountered, the cleanup
paths assume pci_enable_msi() had done full initialization, and
hence undo everything that was assumed to be done by that
function without also undoing other setup that would normally
occur only after that function was called (in map_domain_pirq()
itself).

Rather than try to make the redundant enabling case work properly, just
forbid it entirely by having pci_enable_msi() return -EEXIST when MSI
is already set up.

This is part of XSA-237.

Reported-by: HW42 <hw42@ipsumj.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86: enforce proper privilege when (un)mapping pIRQ-s
Jan Beulich [Thu, 12 Oct 2017 12:36:30 +0000 (14:36 +0200)]
x86: enforce proper privilege when (un)mapping pIRQ-s

(Un)mapping of IRQs, just like other RESOURCE__ADD* / RESOURCE__REMOVE*
actions (in FLASK terms) should be XSM_DM_PRIV rather than XSM_TARGET.
This in turn requires bypassing the XSM check in physdev_unmap_pirq()
for the HVM emuirq case just like is being done in physdev_map_pirq().
The primary goal security wise, however, is to no longer allow HVM
guests, by specifying their own domain ID instead of DOMID_SELF, to
enter code paths intended for PV guests and the control domains of HVM
guests only.

This is part of XSA-237.

Reported-by: HW42 <hw42@ipsumj.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86: don't allow MSI pIRQ mapping on unowned device
Jan Beulich [Thu, 12 Oct 2017 12:35:14 +0000 (14:35 +0200)]
x86: don't allow MSI pIRQ mapping on unowned device

MSI setup should be permitted only for existing devices owned by the
respective guest (the operation may still be carried out by the domain
controlling that guest).

This is part of XSA-237.

Reported-by: HW42 <hw42@ipsumj.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago xl: dm_restrict: Document that it does not work with PV
Ian Jackson [Thu, 12 Oct 2017 11:18:58 +0000 (12:18 +0100)]
xl: dm_restrict: Document that it does not work with PV

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago libxl: dm_restrict: Move to domain_build_info
Ian Jackson [Thu, 12 Oct 2017 11:13:48 +0000 (12:13 +0100)]
libxl: dm_restrict: Move to domain_build_info

Right now, this is broken because libxl__build_device_model_args_new
is used also for the qemu run for pv guests for qdisk devices, pvfb,
etc.

We can either make this option properly HVM-specific, or make it
generic.

In principle it is a reasonable request, to make the PV qemu
deprivileged (even though it is not likely to be implemented any time
soon).  So make this option generic.

We retain the name "device model" even though it is arguably
inaccurate, because the xl docs already say, for example
  For a PV guest a device-model is sometimes used to provide backends
  for certain PV devices

The documentation patch here is pure code motion.  For ease of review
we will fix up the docs, so that the wording is right for the new
context, in the next patch.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Change xl too, to avoid breaking the build.
    Fix manpage pod syntax.

7 years ago fuzz/x86_emulate: Move definitions into a header
George Dunlap [Wed, 11 Oct 2017 17:49:43 +0000 (18:49 +0100)]
fuzz/x86_emulate: Move definitions into a header

Move fuzz-emul.c function prototypes into a header.  Also share the
definition of the input size (rather than hard-coding it in
fuzz-emul.c).

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
7 years ago fuzz/x86_emulate: Take multiple test files for inputs
George Dunlap [Wed, 11 Oct 2017 17:49:42 +0000 (18:49 +0100)]
fuzz/x86_emulate: Take multiple test files for inputs

Finding aggregate coverage for a set of test files means running each
afl-generated test case through the harness.  At the moment, this is
done by re-executing afl-harness-cov with each input file.  When a
large number of test cases have been generated, this can take a
significant amount of time; a recent test with 30k total files
generated by 4 parallel fuzzers took over 7 minutes.

The vast majority of this time is taken up with 'exec', however.
Since the harness is already designed to loop over multiple inputs for
llvm "persistent mode", just allow it to take a large number of inputs
on the same command line when *not* running in llvm "persistent mode".  Then the
command can be efficiently executed like this:

  ls */queue/id* | xargs $path/afl-harness-cov

For the above-mentioned test on 30k files, the time to generate
coverage data was reduced from 7 minutes to under 30 seconds.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago fuzz/x86_emulate: Add 'afl-cov' target
George Dunlap [Wed, 11 Oct 2017 17:49:41 +0000 (18:49 +0100)]
fuzz/x86_emulate: Add 'afl-cov' target

...to generate a "normal" coverage-instrumented binary, suitable for
use with gcov or afl-cov.

This is slightly annoying because:

 - Every object file needs to have been instrumented to work
   effectively

 - You generally want to have both an afl-instrumented binary and a
   gcov-instrumented binary at the same time, but

 - gcov instrumentation and afl instrumentation are mutually exclusive

So when making the `afl-cov` target, generate a second set of object
files and a second binary with the `-cov` suffix.

While we're here, remove the redundant x86-emulate.c dependency for
x86-emulate.o.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago fuzz/x86_emulate: Rename the file containing the wrapper code
George Dunlap [Wed, 11 Oct 2017 17:49:40 +0000 (18:49 +0100)]
fuzz/x86_emulate: Rename the file containing the wrapper code

When generating coverage output, by default gcov generates output
filenames based only on the coverage file and the "leaf" source file,
not the full path.  As a result, it uses the same name for
x86_emulate.c and x86_emulate/x86_emulate.c, generally overwriting the
second (which we actually care about) with the first (which is just a
wrapper).

Rename the user-space wrapper helpers to x86-emulate.[ch], so
that it generates separate files.

There is actually an option to gcov, `--preserve-paths`, which will
cause the full path name to be included in the filename, properly
distinguishing between the two.  However, given that the user-space
wrapper doesn't actually do any emulation (and the poor state of gcov
documentation making it difficult to find the option in the first
place), it seems to make more sense to rename the file anyway.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago fuzz/x86_emulate: Implement input_read() and input_avail()
George Dunlap [Wed, 11 Oct 2017 17:49:39 +0000 (18:49 +0100)]
fuzz/x86_emulate: Implement input_read() and input_avail()

Rather than open-coding the "read" from the input file.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years ago fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness
George Dunlap [Wed, 11 Oct 2017 17:49:38 +0000 (18:49 +0100)]
fuzz/x86_emulate: Improve failure descriptions in x86_emulate harness

- Print the symbolic name rather than the number
- Explicitly state when data_read() fails due to EOI

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago fuzz/x86_emulate: Clear errors in the officially sanctioned way
George Dunlap [Wed, 11 Oct 2017 17:49:37 +0000 (18:49 +0100)]
fuzz/x86_emulate: Clear errors in the officially sanctioned way

Commit 849a1f10c9 was checked in inappropriately; review flagged up
that clearerr() was too big a hammer, as it would clear both the EOF
flag and stream errors.

Stream errors shouldn't be cleared; we only want the EOF and other
stream-related state reset.  To do this, it is sufficient to fseek()
to zero.
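
The behaviour relied on here is standard C: a successful fseek() clears the end-of-file indicator but leaves the error indicator alone, whereas clearerr() resets both. A small self-checking sketch (demo_eof_reset() is a hypothetical helper, not harness code):

```c
#include <stdio.h>

/*
 * Drain the stream to EOF, then seek back to the start.
 * Returns 1 if the EOF indicator was set by the drain and cleared
 * again by fseek(), 0 otherwise.
 */
static int demo_eof_reset(FILE *f)
{
    int saw_eof, still_eof;

    while ( fgetc(f) != EOF )
        ;                               /* read until nothing is left */
    saw_eof = feof(f) != 0;

    if ( fseek(f, 0, SEEK_SET) != 0 )   /* success clears only the EOF flag */
        return 0;
    still_eof = feof(f) != 0;

    return saw_eof && !still_eof;
}
```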

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago public: add and enable XENFEAT_ARM_SMCCC_supported feature
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:51 +0000 (18:52 +0300)]
public: add and enable XENFEAT_ARM_SMCCC_supported feature

This feature indicates that the hypervisor is compatible with the ARM
SMC calling convention. Previously the hypervisor would inject an
undefined instruction exception if an invalid SMC function was
called, or would crash a domain if an invalid HVC function
was invoked.
The XENFEAT_ARM_SMCCC_supported feature means that it is safe to invoke
SMC/HVC calls that are compatible with the SMC calling convention.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago arm: vsmc: remove 64 bit mode check in PSCI handler
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:50 +0000 (18:52 +0300)]
arm: vsmc: remove 64 bit mode check in PSCI handler

The PSCI handling code had a helper routine that checked the calling
convention. It is not needed anymore, because:

 - Generic handler checks that 64 bit calls can be made only by
   64 bit guests.

 - SMCCC requires that a 64-bit handler support both 32 and 64 bit
   calls even if they originate from a 64 bit caller.

This patch removes that extra check.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
7 years ago arm: PSCI: use definitions provided by asm/smccc.h
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:49 +0000 (18:52 +0300)]
arm: PSCI: use definitions provided by asm/smccc.h

smccc.h provides definitions to construct SMC call function number according
to SMCCC. We don't need multiple definitions for one thing, and definitions
in smccc.h are more generic than ones used in psci.h.

So psci.h will only provide function codes, while whole SMC function
identifier will be constructed using generic macros from smccc.h.

Function psci_mode_check() in vsmc.c will be removed in the next patch,
so there is no need to review it. I had to rework it, because the
PSCI_0_2_64BIT definition is dropped now.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years ago arm: traps: handle PSCI calls inside `vsmc.c`
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:48 +0000 (18:52 +0300)]
arm: traps: handle PSCI calls inside `vsmc.c`

PSCI is part of the HVC/SMC interface, so it should be handled in the
appropriate place: `vsmc.c`. This patch moves the PSCI handler
calls from `traps.c` to `vsmc.c`. It also corrects the coding
style of the PSCI handler functions.

Older PSCI 0.1 uses SMC function identifiers in range that is
reserved for existing APIs (ARM DEN 0028B, page 16), while newer
PSCI 0.2 and later is defined as "standard secure service" with its
own ranges (ARM DEN 0028B, page 18).

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago arm: smccc: handle SMCs according to SMCCC
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:47 +0000 (18:52 +0300)]
arm: smccc: handle SMCs according to SMCCC

SMCCC (SMC Call Convention) describes how to handle both HVCs and SMCs.
SMCCC states that both HVC and SMC are valid conduits for calling
firmware functions. Thus, for example, PSCI calls can be made by either
SMC or HVC. SMCCC also defines the function number coding for such calls.
Besides functional calls there are query calls, which allow the underlying
OS to determine the version, UUID and number of functions provided by a
service provider.
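
For orientation, the function number coding mentioned above packs several fields into one 32-bit identifier. The sketch below follows the bit layout in the SMCCC document (bit 31 fast call, bit 30 64-bit convention, bits 29:24 owning service, bits 15:0 function number); the DEMO_* macros and demo_* helpers are hypothetical names, not the definitions this series adds.

```c
#include <stdint.h>

#define DEMO_FAST_CALL    (UINT32_C(1) << 31)  /* fast (atomic) call      */
#define DEMO_CONV_64      (UINT32_C(1) << 30)  /* SMC64/HVC64 convention  */
#define DEMO_OWNER_SHIFT  24
#define DEMO_OWNER_MASK   UINT32_C(0x3f)       /* owning service, 6 bits  */
#define DEMO_FUNC_MASK    UINT32_C(0xffff)     /* function number, 16 bits */

/* Build a function identifier from its fields. */
static uint32_t demo_smccc_id(int fast, int is64, uint32_t owner,
                              uint32_t func)
{
    return (fast ? DEMO_FAST_CALL : 0) |
           (is64 ? DEMO_CONV_64 : 0) |
           ((owner & DEMO_OWNER_MASK) << DEMO_OWNER_SHIFT) |
           (func & DEMO_FUNC_MASK);
}

/* Extract the owning service from an identifier. */
static uint32_t demo_smccc_owner(uint32_t id)
{
    return (id >> DEMO_OWNER_SHIFT) & DEMO_OWNER_MASK;
}
```

For example, a 32-bit fast call to function 0 of owner 4 (the standard secure service range) yields 0x84000000.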

This patch adds new file `vsmc.c`, which handles both generic SMCs
and HVC according to SMCCC. At this moment it implements only one
service: Standard Hypervisor Service.

At this time the Standard Hypervisor Service only supports query calls,
so a caller can ask for the hypervisor UID and determine that it is Xen
that is running.

This change allows more generic handling for SMCs and HVCs and it can
be easily extended to support new services and functions.

However, before an SMC is forwarded to the standard SMCCC handler, it can
be routed to a domain monitor, if one is installed.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago arm: add SMCCC protocol definitions
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:46 +0000 (18:52 +0300)]
arm: add SMCCC protocol definitions

Add generic definitions used in the ARM SMC call convention.
Those definitions were originally added to the Linux kernel as
include/linux/arm-smccc.h by commit 98dd64f34f47
("ARM: 8478/2: arm/arm64: add arm-smccc")

I extended them and formatted according to XEN coding style. Some
of the macros were converted to inlined functions to ease parsing.

They can be used by both SMCCC clients (like PSCI) and by SMCCC
servers (like vPSCI or upcoming generic SMCCC handler).

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoarm: processor.h: add definition for immediate value mask
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:45 +0000 (18:52 +0300)]
arm: processor.h: add definition for immediate value mask

This patch defines HSR_XXC_IMM_MASK. It can be used to extract the
immediate value of trapped HVC32, HVC64, SMC64, SVC32 and SVC64
instructions, as described in the ARM ARM
(ARM DDI 0487B.a pages D7-2270, D7-2272).
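
A minimal sketch of how such a mask might be used, assuming the immediate
occupies the low 16 bits of the syndrome (the imm16 field) and the exception
class sits in bits 31:26. The EX_*/ex_* names are hypothetical and are not
the definitions added by this patch.

```c
#include <stdint.h>

#define EX_HSR_EC_SHIFT    26                 /* exception class, HSR[31:26] */
#define EX_HSR_IMM16_MASK  UINT32_C(0xffff)   /* assumed imm16 field, ISS[15:0] */

/* Immediate encoded in the trapped HVC/SVC/SMC64 instruction. */
static inline uint16_t ex_hsr_imm16(uint32_t hsr)
{
    return (uint16_t)(hsr & EX_HSR_IMM16_MASK);
}

/* Exception class of the trap. */
static inline uint32_t ex_hsr_ec(uint32_t hsr)
{
    return hsr >> EX_HSR_EC_SHIFT;
}
```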

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agopublic: xen.h: add definitions for UUID handling
Volodymyr Babchuk [Wed, 11 Oct 2017 11:57:59 +0000 (14:57 +0300)]
public: xen.h: add definitions for UUID handling

Added type xen_uuid_t. This type represents UUID as an array of 16
bytes in big endian format.

Added macro XEN_DEFINE_UUID that constructs UUID in the usual way:

 XEN_DEFINE_UUID(0x00112233, 0x4455, 0x6677, 0x8899,
0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff)

will construct UUID 00112233-4455-6677-8899-aabbccddeeff presented as
 {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
  0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}

NB: We define a new structure here rather than re-using EFI_GUID.
EFI_GUID uses a Microsoft-style encoding which, among other things,
mixes little-endian and big-endian. The structure defined in this
patch, unlike EFI_GUID, is compatible with the Linux kernel and libuuid.
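
The byte layout above can be reproduced with the following sketch. It is an
illustration only: ex_uuid_t and EX_DEFINE_UUID are hypothetical stand-ins,
not the definitions added to xen.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for xen_uuid_t: 16 bytes, most significant first. */
typedef struct {
    uint8_t a[16];
} ex_uuid_t;

/* Split each group into bytes in big-endian order. */
#define EX_DEFINE_UUID(a, b, c, d, e1, e2, e3, e4, e5, e6)          \
    {{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff,                      \
       ((a) >>  8) & 0xff, ((a) >>  0) & 0xff,                      \
       ((b) >>  8) & 0xff, ((b) >>  0) & 0xff,                      \
       ((c) >>  8) & 0xff, ((c) >>  0) & 0xff,                      \
       ((d) >>  8) & 0xff, ((d) >>  0) & 0xff,                      \
       e1, e2, e3, e4, e5, e6 }}

/* 00112233-4455-6677-8899-aabbccddeeff */
static const ex_uuid_t ex_uuid =
    EX_DEFINE_UUID(0x00112233, 0x4455, 0x6677, 0x8899,
                   0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff);
```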

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoarm: traps: check if SMC was conditional before handling it
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:43 +0000 (18:52 +0300)]
arm: traps: check if SMC was conditional before handling it

A trapped SMC instruction can fail its condition check on the ARMv8
architecture (ARM DDI 0487B.a page D7-2271). So we need to check whether the
condition was met.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoarm: traps: use generic register accessors in the PSCI code
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:42 +0000 (18:52 +0300)]
arm: traps: use generic register accessors in the PSCI code

There are standard functions set_user_reg() and get_user_reg(). We can
use them in PSCI_SET_RESULT()/PSCI_ARG() macros instead of relying on
CONFIG_ARM_64 definition.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoarm: traps: use only least 32 bits of fid in PSCI handler
Volodymyr Babchuk [Tue, 10 Oct 2017 15:52:41 +0000 (18:52 +0300)]
arm: traps: use only least 32 bits of fid in PSCI handler

According to SMCCC (ARM DEN 0028B, page 12), the function ID is
stored in the least significant 32 bits of the r0/x0 register:

    The least significant 32-bits are used, and the most significant
    32-bits are zero.
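
In code terms this amounts to truncating the register value before comparing
it with known function IDs. A sketch with a hypothetical helper name (the
actual handler code is not shown here):

```c
#include <stdint.h>

/* Keep only the least significant 32 bits of the x0 register value. */
static inline uint32_t ex_fid_from_x0(uint64_t x0)
{
    return (uint32_t)x0;
}
```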

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: guest_walk: Fix get_ipa_output_size
Julien Grall [Wed, 11 Oct 2017 14:29:02 +0000 (15:29 +0100)]
xen/arm: guest_walk: Fix get_ipa_output_size

The function get_ipa_output_size checks whether the input size
configured by the guest is valid and will return it.

The check is done with the IPS already shifted against
TCR_EL1_IPS_48_BIT. However the constant has been defined with the
shift included, as a result the check is always false.

Fix it by doing the check on the non-shifted value.
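
The class of bug can be sketched as follows. This is a simplified
illustration under assumptions, not the code from guest_walk.c: the EX_*
names are hypothetical, and IPS is assumed to be the 3-bit field at bits
34:32 of TCR_EL1 with 0b101 meaning 48 bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_TCR_IPS_SHIFT   32
#define EX_TCR_IPS_MASK    (UINT64_C(0x7) << EX_TCR_IPS_SHIFT)
#define EX_TCR_IPS_48_BIT  (UINT64_C(0x5) << EX_TCR_IPS_SHIFT) /* shift included */

/* Buggy: a shifted-down field (0..7) is compared with a shifted-up constant. */
static bool ex_ips_too_big_buggy(uint64_t tcr)
{
    uint64_t ips = (tcr & EX_TCR_IPS_MASK) >> EX_TCR_IPS_SHIFT;

    return ips > EX_TCR_IPS_48_BIT; /* can never be true */
}

/* Fixed: compare the masked, non-shifted field with the constant. */
static bool ex_ips_too_big_fixed(uint64_t tcr)
{
    uint64_t ips = tcr & EX_TCR_IPS_MASK;

    return ips > EX_TCR_IPS_48_BIT;
}
```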

This was introduced by commit 7d623b358a "arm/mem_access: Add long-descriptor
based gpt", which introduced the software page-table walk for stage-1.

Note that the IPS code is now surrounded with #ifdef CONFIG_ARM_64
because the Arm32 compiler will complain about a shift larger than the width
of the variable. This is fine as the code is executed for 64-bit domains only.

Coverity-ID: 1457707
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Rework MAIR* definitions to handle 32-bit compilation environment
Julien Grall [Wed, 11 Oct 2017 14:15:33 +0000 (15:15 +0100)]
xen/arm: mm: Rework MAIR* definitions to handle 32-bit compilation environment

Commit a0543df403 "xen/arm: page: Clean-up the definition of MAIRVAL"
combined the definitions of MAIR0VAL and MAIR1VAL into MAIRVAL. Sadly, when
building in a 32-bit environment, the assembler is unable to compute a
64-bit constant and will ignore the 32 most-significant bits. This
results in MAIR1 being set to 0.

Rather than fully reverting the offending commit, the code is reworked
to still avoid hardcoded values but split the definition in two.

Lastly, a comment is added to discourage blindly combining the two
definitions again in the future.
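
A sketch of the idea, with hypothetical attribute encodings and index
assignments (the real values live in the Xen headers and are not reproduced
here): each attribute is shifted by its index within its own 32-bit register,
so no 64-bit arithmetic is needed to produce MAIR0 and MAIR1.

```c
#include <stdint.h>

/* Attribute value shifted to its slot; indexes 0-3 go to MAIR0, 4-7 to MAIR1. */
#define EX_ATTR32(val, idx)  ((uint32_t)(val) << (8 * ((idx) & 3)))

/* Hypothetical attribute values, for illustration only. */
#define EX_MAIR0VAL  (EX_ATTR32(0x00, 0) | EX_ATTR32(0x44, 1) | \
                      EX_ATTR32(0xaa, 2) | EX_ATTR32(0xee, 3))
#define EX_MAIR1VAL  (EX_ATTR32(0x04, 4) | EX_ATTR32(0xff, 7))

/* Only valid where 64-bit constants can be computed (e.g. 64-bit builds). */
#define EX_MAIRVAL   (((uint64_t)EX_MAIR1VAL << 32) | EX_MAIR0VAL)
```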

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agocommon/gnttab: Improve logging message by including relevent domid
Andrew Cooper [Tue, 10 Oct 2017 20:03:06 +0000 (21:03 +0100)]
common/gnttab: Improve logging message by including relevent domid

Several logging messages cite "bad ref %#x", without identifying which domain
the ref belongs to.  Add a domain back-pointer to struct grant_table to
improve the debuggability.

While editing the messages, clean up some others:

 * Remove extraneous punctuation
 * Use d%d rather than Dom%d
 * Remove "gnttab_transfer:" prefixes, as it is included by the gdprintk()
 * Reflow several messages to not be split across multiple lines

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: fix XEN_DMOP_remote_shutdown return value
Ross Lagerwall [Wed, 11 Oct 2017 15:49:48 +0000 (17:49 +0200)]
x86: fix XEN_DMOP_remote_shutdown return value

Return 0 to indicate success rather than whatever rc was previously set
to (-EINVAL).

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoQEMU_TAG update
Ian Jackson [Wed, 11 Oct 2017 14:54:30 +0000 (15:54 +0100)]
QEMU_TAG update

7 years agoRevert "DEBUG PRINTFS"
Wei Liu [Wed, 11 Oct 2017 14:03:01 +0000 (15:03 +0100)]
Revert "DEBUG PRINTFS"

This reverts commit cf8e5f25a940928550e69b543ed67df1d73f7b09.

It is not supposed to be committed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: psr: support co-exist features' values setting
Yi Sun [Wed, 11 Oct 2017 12:51:45 +0000 (14:51 +0200)]
x86: psr: support co-exist features' values setting

The whole value array is passed into 'do_write_psr_msrs'. Then, we can
write all features' values for the COS ID into MSRs.

Because multiple features may co-exist, we need to handle all features and
write their values into a COS register with a new COS ID. E.g.:
1. L3 CAT and L2 CAT co-exist.
2. Dom1 and Dom2 share the same COS ID (2). The L3 CAT CBM of Dom1 is 0x1ff,
   the L2 CAT CBM of Dom1 is 0x1f.
3. User wants to change L2 CBM of Dom1 to be 0xf. Because COS ID 2 is
   used by Dom2 too, we have to pick a new COS ID 3. The values of Dom1 on
   COS ID 3 are all default values as below:
           ---------
           | COS 3 |
           ---------
   L3 CAT  | 0x7ff |
           ---------
   L2 CAT  | 0xff  |
           ---------
4. After setting, the L3 CAT CBM value of Dom1 should be kept and the new L2
   CAT CBM is set. So, the values on COS ID 3 should be below.
           ---------
           | COS 3 |
           ---------
   L3 CAT  | 0x1ff |
           ---------
   L2 CAT  | 0xf   |
           ---------
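
Steps 3 and 4 can be sketched as follows: the value array for the new COS ID
starts from the values the domain currently uses, and only the feature being
changed is overridden. This is an illustration with hypothetical names, not
the code in psr.c.

```c
#include <stddef.h>
#include <stdint.h>

enum ex_feat { EX_FEAT_L3_CAT, EX_FEAT_L2_CAT, EX_FEAT_NUM };

/*
 * Fill 'out' with the values to be written for the new COS ID: keep the
 * domain's current value for every feature except 'target', which takes
 * 'new_val'.
 */
static void ex_gather_vals(const uint32_t cur[EX_FEAT_NUM],
                           enum ex_feat target, uint32_t new_val,
                           uint32_t out[EX_FEAT_NUM])
{
    for ( size_t i = 0; i < EX_FEAT_NUM; i++ )
        out[i] = (i == (size_t)target) ? new_val : cur[i];
}
```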

Note that the original -ENOSPC return, which is being transformed into
an ASSERT(), could have been an ASSERT() from the beginning.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86emul: handle address wrapping for VMASKMOVP{S,D}
Jan Beulich [Wed, 11 Oct 2017 12:50:33 +0000 (14:50 +0200)]
x86emul: handle address wrapping for VMASKMOVP{S,D}

I failed to recognize the need to mirror the changes done by 7869e2bafe
("x86emul/fuzz: add rudimentary limit checking") into the earlier
written but later committed 2fe43d333f ("x86emul: support remaining AVX
insns"): Behavior here is the same as for multi-part reads or writes.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/vmx: remove unnecessary is_hvm_domain() test in construct_vmcs()
Boris Ostrovsky [Wed, 11 Oct 2017 12:49:55 +0000 (14:49 +0200)]
x86/vmx: remove unnecessary is_hvm_domain() test in construct_vmcs()

It's a leftover from PVHv1 days.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/hvm: implement hvmemul_write() using real mappings
Andrew Cooper [Wed, 11 Oct 2017 12:48:50 +0000 (14:48 +0200)]
x86/hvm: implement hvmemul_write() using real mappings

An access which crosses a page boundary is performed atomically by x86
hardware, albeit with a severe performance penalty.  An important corner case
is when a straddled access hits two pages which differ in whether a
translation exists, or in net access rights.

The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
a translation then completes the partial write, before moving onto the next
translation.

If an individual emulated write straddles two pages, the first of which is
writable, and the second of which is not, the first half of the write will
complete before #PF is raised from the second half.

This results in guest state corruption as a side effect of emulation, which
has been observed to cause Windows to crash while under introspection.

Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate the
entire contents of a linear access, and vmap() the underlying frames to
provide a contiguous virtual mapping for the emulator to use.  This is the
same mechanism as used by the shadow emulation code.

This will catch any translation issues and abort the emulation before any
modifications occur.
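
The "validate everything, then write" ordering can be illustrated with a toy
model. It uses tiny simulated pages and hypothetical names, and does not
reflect the actual hvmemul_map_linear_addr() implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_PAGE_SIZE 16u /* tiny pages keep the example small */
#define EX_NR_PAGES  2u

struct ex_mem {
    uint8_t data[EX_NR_PAGES * EX_PAGE_SIZE];
    bool writable[EX_NR_PAGES];
};

/* Either performs the whole write and returns true, or writes nothing. */
static bool ex_write_all_or_nothing(struct ex_mem *m, size_t addr,
                                    const void *src, size_t len)
{
    size_t first, last;

    if ( len == 0 || addr >= sizeof(m->data) || len > sizeof(m->data) - addr )
        return false;

    first = addr / EX_PAGE_SIZE;
    last = (addr + len - 1) / EX_PAGE_SIZE;

    /* Check every page touched by the access before modifying anything. */
    for ( size_t pg = first; pg <= last; pg++ )
        if ( !m->writable[pg] )
            return false; /* stands in for raising #PF */

    memcpy(&m->data[addr], src, len);
    return true;
}
```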

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agoDEBUG PRINTFS
Ian Jackson [Fri, 15 Sep 2017 10:52:32 +0000 (11:52 +0100)]
DEBUG PRINTFS

7 years agoxl: Document VGA problems arising from lack of physmap dmop
Ian Jackson [Fri, 6 Oct 2017 14:30:25 +0000 (15:30 +0100)]
xl: Document VGA problems arising from lack of physmap dmop

Ross reports that stdvga guests do not work, and cirrus guests are
slow, because qemu tries to do xc_domain_add_to_physmap.  We will need
another dmop to fix this properly.

For now, document the problem.

(In the cirrus case, the vram remains mapped at the old guest-physical
addresses, while the guest runs.  We are not sure whether this is a
correctness or security problem and we should advise against it.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <Paul.Durrant@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: xentoolcore_restrict_all: use domid_t
Ian Jackson [Thu, 14 Sep 2017 17:12:57 +0000 (18:12 +0100)]
tools: xentoolcore_restrict_all: use domid_t

This necessitates adding $(CFLAGS_xeninclude) to all the depending
libraries (which can be done via Rules.mk), so that the definition of
domid_t (in xen.h) can be found.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: dm_restrict: Support uid range user
Ian Jackson [Fri, 15 Sep 2017 17:37:19 +0000 (18:37 +0100)]
libxl: dm_restrict: Support uid range user

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: userlookup_helper_getpwnam rename and turn into a macro
Ian Jackson [Fri, 15 Sep 2017 17:35:44 +0000 (18:35 +0100)]
libxl: userlookup_helper_getpwnam rename and turn into a macro

We are going to want versions of getpwuid, too.  And maybe in the
future getgr*.

This is most sanely achieved with a macro, as otherwise the types are
a mess.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: libxl__dm_runas_helper: return pwd
Ian Jackson [Fri, 15 Sep 2017 17:21:53 +0000 (18:21 +0100)]
libxl: libxl__dm_runas_helper: return pwd

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: Rationalise calculation of user to run qemu as
Ian Jackson [Fri, 15 Sep 2017 15:55:54 +0000 (16:55 +0100)]
libxl: Rationalise calculation of user to run qemu as

If the config specifies a user we use that.  Otherwise:

When we are not restricting qemu, there is very little point running
it as a different user than root.  Indeed, previously, creating the
"magic" users would cause qemu to become slightly dysfunctional (for
example, you can't insert a cd that the qemu user can't read).
So, in that case, default to running it as root.

Conversely, if restriction is requested, we must insist on running
qemu as a non-root user.

Sadly the admin is still required to create 2^16-epsilon users!

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxl, libxl: Provide dm_restrict
Ian Jackson [Fri, 15 Sep 2017 15:55:06 +0000 (16:55 +0100)]
xl, libxl: Provide dm_restrict

This functionality is still quite imperfect, but it will be useful in
certain restricted use cases.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore, _restrict_all: Document implementation "complete"
Ian Jackson [Fri, 15 Sep 2017 13:51:58 +0000 (14:51 +0100)]
xentoolcore, _restrict_all: Document implementation "complete"

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: "Implement" for xenstore
Ian Jackson [Fri, 15 Sep 2017 13:01:35 +0000 (14:01 +0100)]
xentoolcore_restrict_all: "Implement" for xenstore

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/xenstore: get_handle: Allocate struct before opening fd
Ian Jackson [Fri, 15 Sep 2017 12:44:50 +0000 (13:44 +0100)]
tools/xenstore: get_handle: Allocate struct before opening fd

Now we can also abolish the temporary local variable "fd" and simply
use h->fd.

This ordering is necessary to be able to call
xentoolcore__register_active_handle sensibly.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/xenstore: get_handle: use "goto err" error handling style
Ian Jackson [Fri, 15 Sep 2017 12:42:38 +0000 (13:42 +0100)]
tools/xenstore: get_handle: use "goto err" error handling style

Replace the ad-hoc exit clauses with the error handling style where
  - local variables contain either things to be freed, or sentinels
  - all error exits go via an "err" label which frees everything
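
A generic sketch of this style, using a hypothetical handle type rather than
the xenstore code itself:

```c
#include <stdio.h>
#include <stdlib.h>

struct ex_handle {
    FILE *f;
    char *buf;
};

static struct ex_handle *ex_open(const char *path, size_t bufsz)
{
    struct ex_handle *h = NULL; /* sentinel: nothing allocated yet */

    h = malloc(sizeof(*h));
    if ( !h )
        goto err;
    h->f = NULL;                /* sentinels for the members */
    h->buf = NULL;

    h->buf = malloc(bufsz);
    if ( !h->buf )
        goto err;

    h->f = fopen(path, "r");
    if ( !h->f )
        goto err;

    return h;

 err:
    /* Single exit path: free whatever is not a sentinel. */
    if ( h )
    {
        if ( h->f )
            fclose(h->f);
        free(h->buf);
        free(h);
    }
    return NULL;
}

static void ex_close(struct ex_handle *h)
{
    fclose(h->f);
    free(h->buf);
    free(h);
}
```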

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: "Implement" for xengnttab
Ian Jackson [Fri, 15 Sep 2017 12:35:55 +0000 (13:35 +0100)]
xentoolcore_restrict_all: "Implement" for xengnttab

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: Declare problems due to no evtchn support
Ian Jackson [Fri, 15 Sep 2017 12:35:07 +0000 (13:35 +0100)]
xentoolcore_restrict_all: Declare problems due to no evtchn support

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: Implement for libxenforeignmemory
Ian Jackson [Fri, 15 Sep 2017 11:01:19 +0000 (12:01 +0100)]
xentoolcore_restrict_all: Implement for libxenforeignmemory

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict: Break out xentoolcore__restrict_by_dup2_null
Ian Jackson [Fri, 15 Sep 2017 10:50:07 +0000 (11:50 +0100)]
xentoolcore_restrict: Break out xentoolcore__restrict_by_dup2_null

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: "Implement" for libxencall
Ian Jackson [Fri, 15 Sep 2017 10:44:58 +0000 (11:44 +0100)]
xentoolcore_restrict_all: "Implement" for libxencall

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore_restrict_all: Implement for libxendevicemodel
Ian Jackson [Fri, 15 Sep 2017 10:28:54 +0000 (11:28 +0100)]
xentoolcore_restrict_all: Implement for libxendevicemodel

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: move CONTAINER_OF to xentoolcore_internal.h
Ian Jackson [Thu, 14 Sep 2017 17:05:49 +0000 (18:05 +0100)]
tools: move CONTAINER_OF to xentoolcore_internal.h

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: #include "xentoolcore_internal.h"
Ian Jackson [Thu, 14 Sep 2017 17:02:44 +0000 (18:02 +0100)]
libxl: #include "xentoolcore_internal.h"

We are going to want to move something here.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: qemu-xen build: prepare to link against xentoolcore
Ian Jackson [Fri, 15 Sep 2017 14:25:23 +0000 (15:25 +0100)]
tools: qemu-xen build: prepare to link against xentoolcore

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore: Link into minios (update MINIOS_UPSTREAM_REVISION)
Ian Jackson [Mon, 9 Oct 2017 14:32:01 +0000 (15:32 +0100)]
xentoolcore: Link into minios (update MINIOS_UPSTREAM_REVISION)

We need to do this before we start to make the other libraries call
into xentoolcore, or we break building minios with the new xen.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoxentoolcore: Link into stubdoms
Ian Jackson [Tue, 3 Oct 2017 18:45:52 +0000 (19:45 +0100)]
xentoolcore: Link into stubdoms

We need to do this before we start to make the other libraries call
into xentoolcore, or we break the stubdom build.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxentoolcore, _restrict_all: Introduce new library and implementation
Ian Jackson [Thu, 14 Sep 2017 16:51:08 +0000 (17:51 +0100)]
xentoolcore, _restrict_all: Introduce new library and implementation

In practice, qemu opens a great many fds.  Tracking them all down and
playing whack-a-mole is unattractive.  It is also potentially fragile
in that future changes might accidentally undo our efforts.

Instead, we are going to teach all the Xen libraries how to register
their fds so that they can be neutered with one qemu call.
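
The registration idea can be sketched as below. Everything here is
hypothetical (names, list handling, lack of locking, and the choice to keep
going after a failure); it is not the xentoolcore API, only an illustration
of libraries registering handles so that one call can restrict them all.

```c
#include <stddef.h>

struct ex_active_handle {
    int (*restrict_cb)(struct ex_active_handle *ah, unsigned int domid);
    struct ex_active_handle *next;
};

static struct ex_active_handle *ex_handles; /* all registered handles */

static void ex_register(struct ex_active_handle *ah)
{
    ah->next = ex_handles;
    ex_handles = ah;
}

/* Ask every registered handle to restrict itself to 'domid'. */
static int ex_restrict_all(unsigned int domid)
{
    int rc = 0;

    for ( struct ex_active_handle *ah = ex_handles; ah; ah = ah->next )
        if ( ah->restrict_cb(ah, domid) )
            rc = -1; /* keep going so every handle is attempted */

    return rc;
}

/* A library embeds the generic handle in its own handle structure. */
struct ex_lib_handle {
    struct ex_active_handle ah;
    int fd;
    unsigned int restricted_to;
};

static int ex_lib_restrict(struct ex_active_handle *ah, unsigned int domid)
{
    struct ex_lib_handle *h =
        (struct ex_lib_handle *)((char *)ah -
                                 offsetof(struct ex_lib_handle, ah));

    /* A real library would restrict or neuter h->fd here. */
    h->restricted_to = domid;
    return 0;
}
```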

Right now, nothing will go wrong if someone tries to link without
-ltoolcore, but that will stop working as soon as the first other Xen
library starts to register.  So this patch will be followed by the
stubdom build update, and should be followed by a
MINIOS_UPSTREAM_REVISION update.

Sadly qemu upstream's configuration arrangements are too crude, being
keyed solely off the Xen version number.  So they cannot provide
forward/backward build compatibility across changes in xen-unstable,
like this one.  qemu patches to link against xentoolcore should be
applied in qemu upstream to avoid the qemu build breaking against the
released version of Xen 4.10.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: libxendevicemodel: Provide xendevicemodel_shutdown
Ian Jackson [Fri, 15 Sep 2017 16:21:14 +0000 (17:21 +0100)]
tools: libxendevicemodel: Provide xendevicemodel_shutdown

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen: x86 dm_op: add missing newline before XEN_DMOP_inject_msi
Ian Jackson [Mon, 18 Sep 2017 13:55:45 +0000 (14:55 +0100)]
xen: x86 dm_op: add missing newline before XEN_DMOP_inject_msi

Coding style only; no functional change.

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
7 years agoxen: Provide XEN_DMOP_remote_shutdown
Ian Jackson [Fri, 15 Sep 2017 16:16:37 +0000 (17:16 +0100)]
xen: Provide XEN_DMOP_remote_shutdown

SCHEDOP_remote_shutdown should be a DMOP so that a deprivileged qemu
can do the proper tidying up.

We need to keep SCHEDOP_remote_shutdown for ABI stability reasons and
because it is needed for PV guests.

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agodocs: enable per-VCPU extratime flag for RTDS
Meng Xu [Tue, 10 Oct 2017 23:17:45 +0000 (19:17 -0400)]
docs: enable per-VCPU extratime flag for RTDS

Revise xl tool use case by adding -e option
Remove work-conserving from TODO list

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
7 years agoxl: enable per-VCPU extratime flag for RTDS
Meng Xu [Tue, 10 Oct 2017 23:17:43 +0000 (19:17 -0400)]
xl: enable per-VCPU extratime flag for RTDS

Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
7 years agolibxl: enable per-VCPU extratime flag for RTDS
Meng Xu [Tue, 10 Oct 2017 23:17:42 +0000 (19:17 -0400)]
libxl: enable per-VCPU extratime flag for RTDS

Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support per-VCPU extratime flag

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
7 years agoxen:rtds: towards work conserving RTDS
Meng Xu [Tue, 10 Oct 2017 23:17:41 +0000 (19:17 -0400)]
xen:rtds: towards work conserving RTDS

Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

Scheduling policy is modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but has a smaller deadline

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted budget.
Replenished queue is not modified.

Distribution of spare bandwidth
Spare bandwidth is distributed among all VCPUs with extratime flag set,
proportional to these VCPUs utilizations
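
The priority rule and the budget-depletion handling described above can be
sketched as follows. The structure and function names are hypothetical and
the fields are simplified; this is not the scheduler code itself.

```c
#include <stdbool.h>
#include <stdint.h>

struct ex_rt_vcpu {
    unsigned int priority_level; /* smaller value = higher priority */
    int64_t cur_deadline;
    int64_t budget;              /* full budget per period */
    int64_t cur_budget;          /* budget left in the current period */
    bool extratime;
};

/* True if v1 has strictly higher priority than v2. */
static bool ex_higher_prio(const struct ex_rt_vcpu *v1,
                           const struct ex_rt_vcpu *v2)
{
    if ( v1->priority_level != v2->priority_level )
        return v1->priority_level < v2->priority_level;

    return v1->cur_deadline < v2->cur_deadline;
}

/*
 * Called when the budget is depleted in the current period. Returns true if
 * the VCPU stays runnable (extratime), false if it goes to the depleted queue.
 */
static bool ex_on_budget_depleted(struct ex_rt_vcpu *v)
{
    if ( !v->extratime )
        return false;

    v->priority_level++;
    v->cur_budget = v->budget;
    return true;
}
```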

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
7 years agotools/libxc: Fix domid parameter types
Andrew Cooper [Fri, 6 Oct 2017 19:00:00 +0000 (20:00 +0100)]
tools/libxc: Fix domid parameter types

Mixed throughout libxc are uint32_t, int, and domid_t for domid parameters.
With a signed type, and an explicitly 16-bit type, it is exceedingly difficult
to construct an INVALID_DOMID constant which works with all of them.  (The
main problem being that domid_t gets unconditionally zero extended when
promoted to int for arithmetic.)

Libxl uses uint32_t consistently everywhere, so alter libxc to match.
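
The zero-extension problem can be demonstrated with a short sketch. It
assumes int is wider than 16 bits, and uses a hypothetical ex_domid_t in
place of domid_t.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t ex_domid_t; /* stand-in for the 16-bit domid_t */

/* Compare a 16-bit domid against a sentinel held in an int. */
static bool ex_matches(int sentinel, ex_domid_t d)
{
    return d == sentinel; /* d is promoted to int, zero extended */
}

/* Widen a 16-bit domid to the 32-bit type used by libxl. */
static uint32_t ex_widen(ex_domid_t d)
{
    return d;
}
```

An all-ones 16-bit value therefore never equals an int sentinel of -1, nor a
32-bit all-ones sentinel after widening.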

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
[ wei: fix compilation error in libxl ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoARM: sunxi: support more Allwinner SoCs
Andre Przywara [Sat, 7 Oct 2017 00:06:40 +0000 (01:06 +0100)]
ARM: sunxi: support more Allwinner SoCs

So far we only supported the Allwinner A20 SoC. Add support for most
of the other virtualization capable Allwinner SoCs by:
- supporting the watchdog in newer (sun8i) SoCs
- getting the watchdog address from DT
- adding compatible strings for other 32-bit SoCs
- adding compatible strings for 64-bit SoCs

As all 64-bit SoCs support system reset via PSCI, we don't use the
platform specific reset routine there. Should the 32-bit SoCs start to
properly support the PSCI 0.2 SYSTEM_RESET call, we will use it for them
automatically, as we try PSCI first, then fall back to platform reset.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Use memory flags for modify_xen_mappings rather than custom one
Julien Grall [Mon, 9 Oct 2017 13:23:41 +0000 (14:23 +0100)]
xen/arm: mm: Use memory flags for modify_xen_mappings rather than custom one

This will help to consolidate the page-table code and avoid different
paths depending on the action to perform.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoxen/arm: mm: Handle permission flags when adding a new mapping
Julien Grall [Mon, 9 Oct 2017 13:23:40 +0000 (14:23 +0100)]
xen/arm: mm: Handle permission flags when adding a new mapping

Currently, all the new mappings will be read-write non-executable. Allow the
caller to use other permissions.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Embed permission in the flags
Julien Grall [Mon, 9 Oct 2017 13:23:39 +0000 (14:23 +0100)]
xen/arm: mm: Embed permission in the flags

Currently, it is not possible to specify the permission of a new
mapping. It would be necessary to use the function modify_xen_mappings
with a different set of flags.

Introduce a couple of new flags for the permissions (Non-eXecutable,
Read-Only) and also provide definitions that combine the memory attribute
and permission for common combinations.

PAGE_HYPERVISOR is now an alias to PAGE_HYPERVISOR_RW (read-write,
non-executable mappings). This does not affect the current mappings using
PAGE_HYPERVISOR because Xen is currently forcing all mappings to be
non-executable by default (see mfn_to_xen_entry).
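
One way to picture the combined flags is sketched below. The bit positions
and the attribute index value are assumptions chosen for illustration, and
all EX_* names are hypothetical; the real layout is the one documented in the
Xen headers.

```c
#include <stdint.h>

/* Assumed layout: [2:0] memory attribute index, [3] read-only, [4] no-exec. */
#define EX_PAGE_AI_MASK(x)     ((x) & UINT32_C(0x7))
#define EX_PAGE_RO             (UINT32_C(1) << 3)
#define EX_PAGE_XN             (UINT32_C(1) << 4)

#define EX_MT_NORMAL           UINT32_C(0x7) /* hypothetical attribute index */

/* Common combinations of memory attribute and permissions. */
#define EX_PAGE_HYPERVISOR_RW  (EX_MT_NORMAL | EX_PAGE_XN)
#define EX_PAGE_HYPERVISOR_RO  (EX_MT_NORMAL | EX_PAGE_XN | EX_PAGE_RO)
#define EX_PAGE_HYPERVISOR_RX  (EX_MT_NORMAL | EX_PAGE_RO)
#define EX_PAGE_HYPERVISOR     EX_PAGE_HYPERVISOR_RW
```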

A follow-up patch will change modify_xen_mappings to use the new flags.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: page: Describe the layout of flags used to update page tables
Julien Grall [Mon, 9 Oct 2017 13:23:38 +0000 (14:23 +0100)]
xen/arm: page: Describe the layout of flags used to update page tables

Currently, the flags used to update page tables (i.e. PAGE_HYPERVISOR_*)
only contain the memory attribute index. Follow-up patches will add
more information to them. So document the current layout.

At the same time introduce PAGE_AI_MASK to get the memory attribute
index easily.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Use PAGE_HYPERVISOR_* instead of MT_* when calling set_fixmap
Julien Grall [Mon, 9 Oct 2017 13:23:37 +0000 (14:23 +0100)]
xen/arm: mm: Use PAGE_HYPERVISOR_* instead of MT_* when calling set_fixmap

At the moment, PAGE_HYPERVISOR_* and MT_* have exactly the same value.
In a follow-up patch the former will be extended to carry more
information.

It looks like the callers of set_fixmap are mixing the two. Stay
consistent and only use PAGE_HYPERVISOR_*. This also matches the
behavior of create_xen_entries and would potentially allow sharing some
parts in the future.

Also rename the parameter 'attributes' to 'flags' so the interface is
clearer.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Rename 'ai' into 'flags' in create_xen_entries
Julien Grall [Mon, 9 Oct 2017 13:23:36 +0000 (14:23 +0100)]
xen/arm: mm: Rename 'ai' into 'flags' in create_xen_entries

The parameter 'ai' is used either for attribute index or for
permissions. A follow-up patch will rework that parameter to carry more
information. So rename the parameter to 'flags'.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Switch to SYS_STATE_boot just after end_boot_allocator()
Julien Grall [Mon, 9 Oct 2017 13:23:35 +0000 (14:23 +0100)]
xen/arm: Switch to SYS_STATE_boot just after end_boot_allocator()

We should consider the early boot period to end when we stop using the
boot allocator. This is in line with x86 and will be helpful to know
whether we should allocate memory from the boot allocator or xenheap.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Rename and clarify AP[1] in the stage-1 page table
Julien Grall [Mon, 9 Oct 2017 13:23:34 +0000 (14:23 +0100)]
xen/arm: mm: Rename and clarify AP[1] in the stage-1 page table

The description of AP[1] in Xen is based on testing rather than the ARM
ARM.

Per the ARM ARM, on EL2 stage-1 page table, AP[1] is RES1 as the
translation regime applies to only one exception level (see D4.4.4 and
G4.6.1 in ARM DDI 0487B.a).

Update the comment and also rename the field to match the description in
the ARM ARM.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: page: Clean-up the definition of MAIRVAL
Julien Grall [Mon, 9 Oct 2017 13:23:33 +0000 (14:23 +0100)]
xen/arm: page: Clean-up the definition of MAIRVAL

Currently MAIRVAL is defined in terms of MAIR0VAL and MAIR1VAL, which are
both hardcoded values. This makes it quite difficult to understand the value
written to each register.

Rework the definition by using the value of each attribute shifted by its
associated index.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: page: Use ARMv8 naming to improve readability
Julien Grall [Mon, 9 Oct 2017 13:23:32 +0000 (14:23 +0100)]
xen/arm: page: Use ARMv8 naming to improve readability

This is based on the Linux ARMv8 naming scheme (see arch/arm64/mm/proc.S). Each
type will contain "NORMAL" or "DEVICE" to make clear whether each attribute
targets device or normal memory.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>