xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
6 years ago xen/arm: use p2m_mmio_direct_c to map reserved-memory [iomem_cache-wip]
Stefano Stabellini [Thu, 7 Mar 2019 21:22:10 +0000 (13:22 -0800)]
xen/arm: use p2m_mmio_direct_c to map reserved-memory

Don't use p2m_ram_rw for memory mapped into the guest with iomem, and
for reserved-memory regions. Instead, use p2m_mmio_direct_c which has
very similar pagetable properties but not the same security implications
(p2m_is_ram checks and memory allocations.)

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/docs: improve reserved-memory doc
Stefano Stabellini [Thu, 7 Mar 2019 21:21:59 +0000 (13:21 -0800)]
xen/docs: improve reserved-memory doc

Extend the device tree snippet example in the docs to have a memory
node that covers the reserved-memory range as required by the device
tree spec.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/arm: add reserved-memory regions to the dom0 memory node
Stefano Stabellini [Thu, 7 Mar 2019 21:21:43 +0000 (13:21 -0800)]
xen/arm: add reserved-memory regions to the dom0 memory node

Reserved memory regions are automatically remapped to dom0. Their device
tree nodes are also added to the dom0 device tree. However, the dom0
memory node is not currently extended to cover the reserved memory
region ranges as required by the spec. This commit fixes it.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/docs: how to map a page between dom0 and domU using iomem
Stefano Stabellini [Tue, 26 Feb 2019 23:10:52 +0000 (15:10 -0800)]
xen/docs: how to map a page between dom0 and domU using iomem

Document how to use the iomem option to share a page between Dom0 and a
DomU.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/arm: map reserved-memory regions as normal memory in dom0
Stefano Stabellini [Tue, 26 Feb 2019 23:09:52 +0000 (15:09 -0800)]
xen/arm: map reserved-memory regions as normal memory in dom0

reserved-memory regions should be mapped as normal memory. At the
moment, they get remapped as device memory in dom0 because Xen doesn't
know any better. Add an explicit check for it.

However, reserved-memory regions are allowed to overlap partially or
completely with memory nodes. In these cases, the overlapping memory is
reserved-memory and should be handled accordingly.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/arm: keep track of reserved-memory regions
Stefano Stabellini [Tue, 26 Feb 2019 23:08:52 +0000 (15:08 -0800)]
xen/arm: keep track of reserved-memory regions

As we parse the device tree in Xen, keep track of the reserved-memory
regions as they need special treatment (follow-up patches will make use
of the stored information.)

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago libxl/xl: add cacheability option to iomem
Stefano Stabellini [Tue, 26 Feb 2019 23:07:52 +0000 (15:07 -0800)]
libxl/xl: add cacheability option to iomem

Parse a new cacheability option for the iomem parameter. It can be
"devmem" for device memory mappings, which is the default, or "memory"
for normal memory mappings.

Store the parameter in a new field in libxl_iomem_range.

Pass the cacheability option to xc_domain_memory_mapping.
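
As an illustration only (this is not the libxl code from the patch), the
option parsing described above could look roughly like the sketch below.
The function name and CACHEABILITY_MEMORY are hypothetical; the numeric
values follow the "0 is device memory, 1 is normal memory" convention
stated elsewhere in this series.

```c
#include <string.h>

/* CACHEABILITY_MEMORY is a hypothetical name; the log only states that
 * 0 means device memory and 1 means normal memory. */
enum cacheability {
    CACHEABILITY_DEVMEM = 0,
    CACHEABILITY_MEMORY = 1,
};

/*
 * Parse the optional cacheability value of an iomem entry.
 * A NULL or empty string selects the default, device memory.
 * Returns 0 on success, -1 on an unrecognised value.
 */
static int parse_cacheability(const char *opt, enum cacheability *out)
{
    if (opt == NULL || opt[0] == '\0' || strcmp(opt, "devmem") == 0) {
        *out = CACHEABILITY_DEVMEM;
        return 0;
    }
    if (strcmp(opt, "memory") == 0) {
        *out = CACHEABILITY_MEMORY;
        return 0;
    }
    return -1;
}
```

Rejecting unknown strings, rather than silently falling back to the
default, is a design choice of this sketch and not something the commit
message specifies.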

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: ian.jackson@eu.citrix.com
CC: wei.liu2@citrix.com
6 years ago libxc: xc_domain_memory_mapping, handle cacheability
Stefano Stabellini [Tue, 26 Feb 2019 23:06:52 +0000 (15:06 -0800)]
libxc: xc_domain_memory_mapping, handle cacheability

Add an additional parameter to xc_domain_memory_mapping to pass
cacheability information. The parameter values are the same as for the
XEN_DOMCTL_memory_mapping hypercall (0 is device memory, 1 is normal
memory). Pass CACHEABILITY_DEVMEM by default -- no changes in behavior.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: ian.jackson@eu.citrix.com
CC: wei.liu2@citrix.com
6 years ago xen: extend XEN_DOMCTL_memory_mapping to handle cacheability
Stefano Stabellini [Tue, 26 Feb 2019 23:05:52 +0000 (15:05 -0800)]
xen: extend XEN_DOMCTL_memory_mapping to handle cacheability

Reuse the existing padding field to pass cacheability information about
the memory mapping, specifically, whether the memory should be mapped as
normal memory or as device memory (this is what we have today).

Add a cacheability parameter to map_mmio_regions. 0 means device
memory, which is what we have today.

On ARM, map device memory as p2m_mmio_direct_dev (as it is already done
today) and normal memory as p2m_ram_rw.

On x86, return error if the cacheability requested is not device memory.
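
A minimal sketch of the per-architecture behaviour described above, under
stated assumptions: the type and function names are hypothetical
stand-ins, and -EINVAL is an assumed error code (the commit message only
says "return error").

```c
#include <errno.h>

/* Hypothetical stand-ins for the real Xen definitions. */
typedef enum { p2m_mmio_direct_dev, p2m_ram_rw } p2m_type_sketch_t;

enum cacheability {
    CACHEABILITY_DEVMEM = 0,   /* device memory (the existing behaviour) */
    CACHEABILITY_MEMORY = 1,   /* normal memory */
};

/* ARM: choose the p2m type from the requested cacheability. */
static int arm_pick_p2m_type(unsigned int cache, p2m_type_sketch_t *t)
{
    switch (cache) {
    case CACHEABILITY_DEVMEM:
        *t = p2m_mmio_direct_dev;
        return 0;
    case CACHEABILITY_MEMORY:
        *t = p2m_ram_rw;
        return 0;
    default:
        return -EINVAL;        /* assumed error code */
    }
}

/* x86: only device memory is accepted. */
static int x86_check_cacheability(unsigned int cache)
{
    return cache == CACHEABILITY_DEVMEM ? 0 : -EINVAL;
}
```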

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
6 years ago xen/arm: gic: Remove duplicated comment in do_sgi
Julien Grall [Tue, 23 Oct 2018 18:17:08 +0000 (19:17 +0100)]
xen/arm: gic: Remove duplicated comment in do_sgi

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: gic: Ensure ordering between read of INTACK and shared data
Julien Grall [Tue, 23 Oct 2018 18:17:07 +0000 (19:17 +0100)]
xen/arm: gic: Ensure ordering between read of INTACK and shared data

When an IPI is generated by a CPU, the pattern looks roughly like:

  <write shared data>
  dsb(sy);
  <write to GIC to signal SGI>

On the receiving CPU we rely on the fact that, once we've taken the
interrupt, then the freshly written shared data must be visible to us.
Put another way, the CPU isn't going to speculate taking an interrupt.

Unfortunately, this assumption turns out to be broken.

Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
to read some shared_data. Before CPUx has done anything, a random
peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
CPUy then takes the IRQ and starts executing the entry code, heading
towards gic_handle_irq. Furthermore, let's assume that a bunch of the
previous interrupts handled by CPUy were SGIs, so the branch predictor
kicks in and speculates that irqnr will be <16 and we're likely to
head into handle_IPI. The prefetcher then grabs a speculative copy of
shared_data which contains a stale value.

Meanwhile, CPUx gets round to updating shared_data and asking the GIC
to send an SGI to CPUy. Internally, the GIC decides that the SGI is
more important than the peripheral interrupt (which hasn't yet been
ACKed) but doesn't need to do anything to CPUy, because the IRQ line
is already raised.

CPUy then reads the ACK register on the GIC, sees the SGI value which
confirms the branch prediction and we end up with a stale shared_data
value.

This patch fixes the problem by adding an smp_rmb() to the IPI entry
code in do_SGI.

At the same time document the write barrier.
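
The pairing of the two barriers can be sketched in portable C11, as an
analogy rather than the Xen code: an atomic flag stands in for the GIC
signalling and ack, a release fence loosely plays the role of dsb(sy) on
the sender, and an acquire fence plays the role of smp_rmb() on the
receiver. All names below are hypothetical.

```c
#include <stdatomic.h>

static int shared_data;          /* payload written before the "IPI" */
static atomic_int sgi_pending;   /* stands in for the GIC signal/ack */

/* Sender: publish the data, then signal. The fence keeps the data write
 * ordered before the signal becomes visible. */
static void send_ipi(int value)
{
    shared_data = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sgi_pending, 1, memory_order_relaxed);
}

/* Receiver: read the "ack" value first, then order the later load of
 * shared_data after it. Returns 1 if an IPI was pending and *out was
 * filled in, 0 otherwise. */
static int handle_ipi(int *out)
{
    if (atomic_exchange_explicit(&sgi_pending, 0, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = shared_data;
    return 1;
}
```

Without the acquire fence, the load of shared_data could be satisfied
before the ack read, which is the stale-value scenario described above.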

Based on Linux commit f86c4fbd930ff6fecf3d8a1c313182bd0f49f496
"irqchip/gic: Ensure ordering between read of INTACK and shared data".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()
Julien Grall [Tue, 23 Oct 2018 18:17:06 +0000 (19:17 +0100)]
xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()

Devices that expose their interrupt status registers via system
registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
vgic (although unused by Linux), ...) rely on a context synchronising
operation on the CPU to ensure that the updated status register is
visible to the CPU when handling the interrupt. This usually happens as
a result of taking the IRQ exception in the first place, but there are
two race scenarios where this isn't the case.

For example, let's say we have two peripherals (X and Y), where Y uses a
system register for its interrupt status.

Case 1:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. Y then raises its interrupt line, but the update to its system
   register is not yet visible to the CPU
3. The GIC decides to expose Y's interrupt number first in the Ack
   register
4. The CPU runs the IRQ handler for Y, but the status register is stale

Case 2:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. CPU reads the interrupt number for X from the Ack register and runs
   its IRQ handler
3. Y raises its interrupt line and the Ack register is updated, but
   again, the update to its system register is not yet visible to the
   CPU.
4. Since the GIC drivers poll the Ack register, we read Y's interrupt
   number and run its handler without a context synchronisation
   operation, therefore seeing the stale register value.

In either case, we run the risk of missing an IRQ. This patch solves the
problem by ensuring that we execute an ISB in the GIC drivers prior
to invoking the interrupt handler.
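
The shape of the fix can be sketched on any host by simulating the Ack
register, assuming hypothetical helper names throughout. context_sync()
stands in for the ISB and only counts calls here, so the sketch shows
where the synchronisation sits in the polling loop, not what it does to
the pipeline.

```c
#include <stddef.h>

#define SPURIOUS_IRQ 1023u   /* GIC ID meaning "nothing pending" */

/* Hypothetical stand-in for the ISB instruction: counts calls only. */
static unsigned int sync_count;
static void context_sync(void) { sync_count++; }

static unsigned int handled_count;
static void run_handler(unsigned int irq) { (void)irq; handled_count++; }

/* Simulated Ack register: yields queued interrupt numbers, then spurious. */
static const unsigned int *ack_queue;
static size_t ack_len, ack_pos;

static unsigned int read_ack(void)
{
    return ack_pos < ack_len ? ack_queue[ack_pos++] : SPURIOUS_IRQ;
}

/* Poll the Ack register. Synchronising after every ack and before every
 * handler covers both Case 1 and Case 2 above. */
static void gic_poll(void)
{
    for (;;) {
        unsigned int irq = read_ack();

        if (irq == SPURIOUS_IRQ)
            break;
        context_sync();
        run_handler(irq);
    }
}
```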

Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
"irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Move vgic_* helpers from gic.h to vgic.h
Julien Grall [Wed, 31 Oct 2018 18:13:13 +0000 (18:13 +0000)]
xen/arm: Move vgic_* helpers from gic.h to vgic.h

Keep vgic_* helpers in a single place. At the same time remove gic.h
from event.h since the helpers have now been moved to vgic.h (included by
domain.h).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: platform: Don't include p2m.h in exynos5 and omap5
Julien Grall [Wed, 31 Oct 2018 18:13:12 +0000 (18:13 +0000)]
xen/arm: platform: Don't include p2m.h in exynos5 and omap5

None of the platforms are using the p2m helpers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in asm/current.h
Julien Grall [Wed, 31 Oct 2018 18:13:11 +0000 (18:13 +0000)]
xen/arm: Remove unnecessary includes in asm/current.h

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in asm-arm/acpi.h
Julien Grall [Wed, 31 Oct 2018 18:13:10 +0000 (18:13 +0000)]
xen/arm: Remove unnecessary includes in asm-arm/acpi.h

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in asm/p2m.h
Julien Grall [Fri, 9 Nov 2018 18:08:11 +0000 (10:08 -0800)]
xen/arm: Remove unnecessary includes in asm/p2m.h

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in traps.c
Julien Grall [Wed, 31 Oct 2018 18:13:08 +0000 (18:13 +0000)]
xen/arm: Remove unnecessary includes in traps.c

Also, include smccc.h instead of psci.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in asm/mmio.h
Julien Grall [Wed, 31 Oct 2018 18:13:07 +0000 (18:13 +0000)]
xen/arm: Remove unnecessary includes in asm/mmio.h

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove unnecessary includes in asm/vgic.h
Julien Grall [Wed, 31 Oct 2018 18:13:06 +0000 (18:13 +0000)]
xen/arm: Remove unnecessary includes in asm/vgic.h

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Only include vreg.h when necessary
Julien Grall [Wed, 31 Oct 2018 18:13:05 +0000 (18:13 +0000)]
xen/arm: Only include vreg.h when necessary

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Only include stringify.h when necessary
Julien Grall [Wed, 31 Oct 2018 18:13:04 +0000 (18:13 +0000)]
xen/arm: Only include stringify.h when necessary

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Move out of processor.h traps related variable/function
Julien Grall [Wed, 31 Oct 2018 18:13:03 +0000 (18:13 +0000)]
xen/arm: Move out of processor.h traps related variable/function

do_unexpected_traps() is moved to traps.h while init_traps() and
hyp_traps_vectors() are moved to setup.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Move SYSREG accessors in sysregs.h
Julien Grall [Wed, 31 Oct 2018 18:13:02 +0000 (18:13 +0000)]
xen/arm: Move SYSREG accessors in sysregs.h

System register accessors are self-contained and should not be included
everywhere in Xen. Move the accessors into sysregs.h and include the file
when necessary.

With that change, it is not necessary to include processor.h in time.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Move HSR defines in a new header hsr.h
Julien Grall [Wed, 31 Oct 2018 18:13:01 +0000 (18:13 +0000)]
xen/arm: Move HSR defines in a new header hsr.h

The HSR defines are pretty much self-contained and do not need to be
included everywhere in Xen. So move them into a new header, hsr.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: gic-v3: Re-order includes in alphabetical order
Julien Grall [Wed, 31 Oct 2018 18:13:00 +0000 (18:13 +0000)]
xen/arm: gic-v3: Re-order includes in alphabetical order

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: gic-3: Remove unused includes
Julien Grall [Wed, 31 Oct 2018 18:12:59 +0000 (18:12 +0000)]
xen/arm: gic-3: Remove unused includes

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Move VABORT_GEN_BY_GUEST to traps.h and turned into inline
Julien Grall [Wed, 31 Oct 2018 18:12:58 +0000 (18:12 +0000)]
xen/arm: Move VABORT_GEN_BY_GUEST to traps.h and turned into inline

The macro VABORT_GEN_BY_GUEST is only used by the trap code. So move it
to traps.h.

While moving the code, convert it to a static inline to allow typecheck.
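
To illustrate what the conversion buys (with a made-up predicate, not the
real VABORT_GEN_BY_GUEST logic), compare a macro and a static inline that
test whether a saved PC falls inside a window. The struct, the window
bounds and both names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register frame and address window. */
struct regs_sketch {
    uintptr_t pc;
};

#define WINDOW_START ((uintptr_t)0x1000)
#define WINDOW_END   ((uintptr_t)0x2000)

/* Macro form: accepts any pointer whose target has a 'pc' member, so a
 * wrong argument type can slip through unnoticed. */
#define IN_WINDOW_MACRO(r) \
    ((r)->pc >= WINDOW_START && (r)->pc < WINDOW_END)

/* Inline form: the compiler checks that the argument really is a
 * 'const struct regs_sketch *'. */
static inline bool in_window(const struct regs_sketch *r)
{
    return r->pc >= WINDOW_START && r->pc < WINDOW_END;
}
```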

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Consolidate CPU identification in cpufeature.{c,h}
Julien Grall [Wed, 31 Oct 2018 18:12:57 +0000 (18:12 +0000)]
xen/arm: Consolidate CPU identification in cpufeature.{c,h}

At the moment, CPU Identification is spread across cpu.c, cpufeature.c,
processor.h, cpufeature.h. It would be better to keep everything
together in a single place.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: bugs: Move do_bug_frame to traps.h
Julien Grall [Wed, 31 Oct 2018 18:12:56 +0000 (18:12 +0000)]
xen/arm: bugs: Move do_bug_frame to traps.h

do_bug_frame is only necessary when trapping. Moving it allows the
processor.h include to be removed.

However, time.h was missing an include, resulting in a compilation error
if processor.h is removed from bug.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Remove __init from prototype
Julien Grall [Wed, 31 Oct 2018 18:12:55 +0000 (18:12 +0000)]
xen/arm: Remove __init from prototype

In Xen, it is common to add __init to the definition and not the
prototype. Remove the few __init annotations on prototypes, which avoids
the inclusion of init.h in headers.

With these changes, init.h now has to be included in some .c files.
Also, add __init where it was missing from a definition.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: regs: Convert guest_mode to a static inline helper
Julien Grall [Wed, 31 Oct 2018 18:12:54 +0000 (18:12 +0000)]
xen/arm: regs: Convert guest_mode to a static inline helper

At the same time, switch the parameter guest_mode from int to bool.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: traps: Constify show_*, do_unexpected_trap and do_bug_frame parameters
Julien Grall [Wed, 31 Oct 2018 18:12:53 +0000 (18:12 +0000)]
xen/arm: traps: Constify show_*, do_unexpected_trap and do_bug_frame parameters

Those helpers are not meant to modify most of the parameters. So constify them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago x86/hvm: clean up the rest of bool_t from vm_event
Alexandru Isaila [Fri, 9 Nov 2018 12:06:28 +0000 (13:06 +0100)]
x86/hvm: clean up the rest of bool_t from vm_event

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years ago pass-through: adjust pIRQ migration
Jan Beulich [Fri, 9 Nov 2018 12:05:28 +0000 (13:05 +0100)]
pass-through: adjust pIRQ migration

For one it is quite pointless to iterate over all pIRQ-s the domain has
when just one is being adjusted. Introduce hvm_migrate_pirq() as an
externally accessible function.

Additionally it is bogus to migrate the pIRQ to a vCPU different from
the one the event is supposed to be posted to - if anything, it might be
worth considering not to migrate the pIRQ at all in the posting case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86/dom0: Use init_xen_pae_l2_slots() rather than opencoding it
Andrew Cooper [Thu, 8 Nov 2018 14:17:46 +0000 (14:17 +0000)]
x86/dom0: Use init_xen_pae_l2_slots() rather than opencoding it

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago xen/grant_table: Remove stale comment on top of map_grant_ref
Julien Grall [Thu, 1 Nov 2018 10:16:58 +0000 (10:16 +0000)]
xen/grant_table: Remove stale comment on top of map_grant_ref

Remove the two-part comment on top of map_grant_ref:
    - The first part mentions the return value, which has been void since
    2006!
    - The second part mentions a local variable 'addr' which does not
    exist anymore.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago xen/arm: initialize access
Stefano Stabellini [Tue, 6 Nov 2018 22:05:57 +0000 (14:05 -0800)]
xen/arm: initialize access

Initialize variable *access before returning it back to the caller.
It makes the code a bit nicer and it is a safety certification
requirement.

M3CM Rule-9.1: The value of an object with automatic storage duration
shall not be read before it has been set
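
A minimal sketch of the pattern, using hypothetical names rather than the
function touched by this patch: the output parameter is given a defined
value on every path, so a caller that reads it after a failed lookup does
not read an unset automatic object.

```c
#include <stdbool.h>

/* Hypothetical access type for the sketch. */
typedef enum { ACCESS_NONE = 0, ACCESS_RW = 1 } access_sketch_t;

/* Returns true and fills *access on success. On failure *access still
 * holds a defined value instead of whatever the caller's stack held. */
static bool lookup_access(int key, access_sketch_t *access)
{
    *access = ACCESS_NONE;   /* set before any early return */

    if (key < 0)
        return false;

    *access = ACCESS_RW;
    return true;
}
```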

QAVerify: 2962
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: rcojocaru@bitdefender.com
CC: Tamas K Lengyel <tamas@tklengyel.com>
6 years ago xen/arm: initialize target
Stefano Stabellini [Tue, 6 Nov 2018 22:05:56 +0000 (14:05 -0800)]
xen/arm: initialize target

Initialize variable target before passing it as a parameter.
It makes the code a bit nicer and it is a safety certification
requirement.

M3CM Rule-9.1: The value of an object with automatic storage duration
shall not be read before it has been set

QAVerify: 2972
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago x86/traps: use only one stub function for l/cstar
Wei Liu [Fri, 9 Nov 2018 10:46:36 +0000 (10:46 +0000)]
x86/traps: use only one stub function for l/cstar

And place it into .text.cold.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago cpufreq: convert to a single post-init driver (hooks) instance
Jan Beulich [Fri, 9 Nov 2018 10:42:10 +0000 (11:42 +0100)]
cpufreq: convert to a single post-init driver (hooks) instance

This reduces the post-init memory footprint, eliminates a pointless
level of indirection at the use sites, and allows for subsequent
alternatives call patching.

Take the opportunity and also add a name to the PowerNow! instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago xsm: remove printing from set_to_dummy_if_null()
Xin Li [Fri, 9 Nov 2018 10:41:30 +0000 (11:41 +0100)]
xsm: remove printing from set_to_dummy_if_null()

Filling a null hook in the xsm_operations structure with the dummy
module's hook generates a debug message. This becomes boot time spew for
modules like silo, which set only a few hooks of their own. So remove the
printing to avoid boot time spew.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Xin Li <xin.li@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
6 years ago viridian: introduce struct viridian_page
Paul Durrant [Fri, 9 Nov 2018 10:40:12 +0000 (11:40 +0100)]
viridian: introduce struct viridian_page

The 'vp_assist' page is currently an example of a guest page which needs to
be kept mapped throughout the life-time of a guest, but there are other
such examples in the specification [1]. This patch therefore introduces a
generic 'viridian_page' type and converts the current vp_assist/apic_assist
related code to use it. Subsequent patches implementing other enlightenments
can then also make use of it.

This patch also renames the 'vp_assist_pending' field in struct
hvm_viridian_vcpu_context to 'apic_assist_pending' to more accurately
reflect its meaning. The term 'vp_assist' applies to the whole page rather
than just the EOI-avoidance enlightenment. New versions of the specification
have defined data structures for other enlightenments within the same page.

No functional change.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years ago viridian: define type for the 'virtual VP assist page'
Paul Durrant [Fri, 9 Nov 2018 10:39:27 +0000 (11:39 +0100)]
viridian: define type for the 'virtual VP assist page'

The specification [1] defines a type so we should use it, rather than just
OR-ing and AND-ing magic bits.

No functional change.

NOTE: The type defined in the specification does include an anonymous
      sub-struct in the page type but, as we currently use only the first
      element, the struct declaration has been omitted.
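
The general idea of replacing magic bit operations with a declared type
can be sketched as follows. The layout below is made up for illustration
and is taken neither from the specification nor from this patch; it also
relies on the bit-field allocation order used by GCC and Clang on
little-endian targets, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical 64-bit word: an enable flag in bit 0, reserved bits, and a
 * page frame number in the upper bits. */
union flagged_word_sketch {
    uint64_t raw;
    struct {
        uint64_t enabled:1;
        uint64_t reserved:11;
        uint64_t pfn:52;
    } fields;
};

/* The "magic bits" style the typed access replaces. */
static inline uint64_t pfn_by_shift(uint64_t raw)
{
    return raw >> 12;
}

static inline unsigned int enabled_by_mask(uint64_t raw)
{
    return (unsigned int)(raw & 1);
}
```

With the union, a reader sees w.fields.pfn instead of a shift by 12 whose
meaning has to be looked up.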

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years ago viridian: separate time related enlightenment implementations...
Paul Durrant [Fri, 9 Nov 2018 10:38:03 +0000 (11:38 +0100)]
viridian: separate time related enlightenment implementations...

...into new 'time' module.

This patch reduces the size of the main viridian source module by
moving time related enlightenments into their own source module. This is
done in anticipation of implementation of more such enlightenments and
a desire to not further lengthen the main source module when this work
is done.

While moving the code:

- Move the declaration of HV_REFERENCE_TSC_PAGE from the header file into
  the new source module, since it is only used there.
- Clean up a bool_t.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years ago viridian: separate interrupt related enlightenment implementations...
Paul Durrant [Fri, 9 Nov 2018 10:36:52 +0000 (11:36 +0100)]
viridian: separate interrupt related enlightenment implementations...

...into new 'synic' module.

The SynIC (synthetic interrupt controller) is specified [1] to be a super-
set of a virtualized LAPIC, and its definition encompasses all
enlightenments related to virtual interrupt control.

This patch reduces the size of the main viridian source module by giving
these enlightenments their own module. This is done in anticipation of
implementation of more such enlightenments and a desire not to further
lengthen the main source module when this work is done.

Whilst moving the code:

- Fix various style issues.
- Move the MSR definitions into the header (since they are now needed in
  more than one source module).

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years ago automation: build with Ubuntu 18.04
Wei Liu [Mon, 22 Oct 2018 15:18:51 +0000 (16:18 +0100)]
automation: build with Ubuntu 18.04

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago automation: add dockerfile for Ubuntu 18.04
Wei Liu [Mon, 22 Oct 2018 15:18:50 +0000 (16:18 +0100)]
automation: add dockerfile for Ubuntu 18.04

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago Revert "arch/x86: Add registers to vm_event"
Wei Liu [Thu, 8 Nov 2018 17:22:35 +0000 (17:22 +0000)]
Revert "arch/x86: Add registers to vm_event"

This reverts commit da61a2102ff9f2430cad14277009a4cae05ac779, because
it breaks !CONFIG_HVM builds.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago automation: build some customised configs
Wei Liu [Fri, 2 Nov 2018 17:49:47 +0000 (17:49 +0000)]
automation: build some customised configs

Introduce a new directory to put in configs we care about. Modify
build script to build with those configs.

While we only introduce x86 configs initially, provision for non-x86
configs.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years ago x86: expose CONFIG_PV
Wei Liu [Thu, 4 Oct 2018 09:15:08 +0000 (10:15 +0100)]
x86: expose CONFIG_PV

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86: make PV hypercall entry points work with !CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 13:44:01 +0000 (13:44 +0000)]
x86: make PV hypercall entry points work with !CONFIG_PV

We want Xen to crash if we hit these paths when PV is disabled.

For syscall, we provide stubs for {l,c}star_enter which end up calling
panic.  For sysenter, we initialise CS to 0 so that #GP can be raised.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86/amd: don't set pv_post_outb_hook when !CONFIG_PV
Wei Liu [Thu, 8 Nov 2018 14:52:03 +0000 (14:52 +0000)]
x86/amd: don't set pv_post_outb_hook when !CONFIG_PV

Obviously it won't exist when PV is disabled.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years ago amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
Roger Pau Monne [Thu, 8 Nov 2018 14:23:58 +0000 (15:23 +0100)]
amd/pvh: enable ACPI C1E disable quirk on PVH Dom0

PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
also enable C1E mode. Apply the same workaround as done on PV for a
PVH Dom0, which consists of trapping accesses to the SMI command IO
port and disabling C1E if ACPI is enabled.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago arch/x86: Add registers to vm_event
Alexandru Stefan ISAILA [Mon, 5 Nov 2018 09:54:06 +0000 (09:54 +0000)]
arch/x86: Add registers to vm_event

This patch adds a couple of registers to the vm_event that are used by
introspection. The base, limit and ar bits are compressed into a
uint64_t union so as not to enlarge the vm_event.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
6 years ago x86/genapic: remove indirection from genapic hook accesses
Jan Beulich [Thu, 8 Nov 2018 14:59:14 +0000 (15:59 +0100)]
x86/genapic: remove indirection from genapic hook accesses

Instead of loading a pointer at each use site, have a single runtime
instance of struct genapic, copying into it from the individual
instances. The individual instances can this way also be moved to .init
(also adjust apic_probe[] at this occasion).
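
The pattern can be sketched with a toy hooks structure (all names here are
hypothetical and unrelated to struct genapic's real members): instead of
keeping a pointer to the selected variant and loading it at every call
site, the selected variant is copied once into a single runtime instance.

```c
/* Toy hooks structure. */
struct ops_sketch {
    const char *name;
    int (*add)(int);
};

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

/* Per-variant templates. They are only read during selection, so in a
 * hypervisor they could live in init-only data. */
static const struct ops_sketch variant_a = { "a", add_one };
static const struct ops_sketch variant_b = { "b", add_two };

/* Single runtime instance: call sites use ops.add(x) directly, without
 * first loading a pointer to the structure. */
static struct ops_sketch ops;

static void select_ops(const struct ops_sketch *chosen)
{
    ops = *chosen;   /* copy once, at init time */
}
```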

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago tools/misc: fix hard tabs in xen-hvmctx.c
Paul Durrant [Wed, 7 Nov 2018 10:52:22 +0000 (10:52 +0000)]
tools/misc: fix hard tabs in xen-hvmctx.c

Also add emacs boilerplate to avoid future problems.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago tools/xen-cpuid: Fix 32bit build
Andrew Cooper [Wed, 7 Nov 2018 12:51:43 +0000 (12:51 +0000)]
tools/xen-cpuid: Fix 32bit build

Clang reports:

  xen-cpuid.c:307:29: error: format specifies type 'unsigned long' but the
  argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]

                 msrs[l].idx, msrs[l].val);
                              ^~~~~~~~~~~

Use PRIx64 instead.
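
For reference, a small self-contained example of the PRIx64 approach. The
function name and the output format are hypothetical and are not the ones
used in xen-cpuid.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format an MSR index/value pair. PRIx32 and PRIx64 expand to the correct
 * conversion for uint32_t and uint64_t on both 32-bit and 64-bit builds,
 * so no "unsigned long" assumption is baked into the format string. */
static int format_msr(char *buf, size_t len, uint32_t idx, uint64_t val)
{
    return snprintf(buf, len, "%08" PRIx32 " %016" PRIx64, idx, val);
}
```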

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago p2m: move p2m-common.h inclusion point
Jan Beulich [Wed, 7 Nov 2018 08:35:14 +0000 (09:35 +0100)]
p2m: move p2m-common.h inclusion point

The header is (hence its name) supposed to be a helper for the per-arch
p2m.h files. It was never supposed to be included directly, and for the
purpose of putting common function declarations into the common header
it is more helpful if things like p2m_t are already available at the
inclusion point.

This also undoes parts of 02ede7dc03 ("memory: add
check_get_page_from_gfn() as a wrapper..."), which had been there just
because of the unhelpful original way of including p2m-common.h.

Take the opportunity and also ditch a duplicate public/memory.h from the
ARM header.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agomm/page_alloc: make bootscrub happen in idle-loop
Sergey Dyasli [Wed, 7 Nov 2018 08:34:17 +0000 (09:34 +0100)]
mm/page_alloc: make bootscrub happen in idle-loop

Scrubbing RAM during boot may take a long time on machines with lots
of RAM. Add 'idle' option to bootscrub which marks all pages dirty
initially so they will eventually be scrubbed in idle-loop on every
online CPU.

It's guaranteed that the allocator will return scrubbed pages by doing
eager scrubbing during allocation (unless MEMF_no_scrub was provided).
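
The allocation-time guarantee can be pictured roughly as follows. This
is a toy model under stated assumptions, not Xen's allocator: the
page_sketch structure, the 16-byte "page" and MEMF_NO_SCRUB_SKETCH are
all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES_SKETCH     16          /* toy page size, assumption */
#define MEMF_NO_SCRUB_SKETCH  (1u << 0)   /* stand-in for MEMF_no_scrub */

/* Hypothetical page descriptor: a dirty flag plus the page contents. */
struct page_sketch {
    bool need_scrub;
    unsigned char data[PAGE_BYTES_SKETCH];
};

/* What the idle loop would eventually do for every dirty page. */
static void scrub_page_sketch(struct page_sketch *pg)
{
    memset(pg->data, 0, sizeof(pg->data));
    pg->need_scrub = false;
}

/*
 * Allocation path: if the idle loop has not reached this page yet,
 * scrub it eagerly unless the caller explicitly opted out.
 */
static struct page_sketch *alloc_page_sketch(struct page_sketch *pg,
                                             unsigned int memflags)
{
    assert(pg != NULL);
    if (pg->need_scrub && !(memflags & MEMF_NO_SCRUB_SKETCH))
        scrub_page_sketch(pg);
    return pg;
}
```

With this shape, boot only has to set need_scrub on every page; the
cost of scrubbing is paid later, either in the idle loop or on first
allocation.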

Use the new 'idle' option as the default one.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86: work around HLE host lockup erratum
Jan Beulich [Wed, 7 Nov 2018 08:33:24 +0000 (09:33 +0100)]
x86: work around HLE host lockup erratum

XACQUIRE prefixed accesses to the 4MiB range of memory starting at 1GiB
are liable to lock up the processor. Disallow use of this memory range.

Unfortunately the available Core Gen7 and Gen8 spec updates are pretty
old, so I can only guess that they're similarly affected when Core Gen6
is and the Xeon counterparts are, too.

This is part of XSA-282.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: extend get_platform_badpages() interface
Jan Beulich [Wed, 7 Nov 2018 08:32:08 +0000 (09:32 +0100)]
x86: extend get_platform_badpages() interface

Use a structure so along with an address (now frame number) an order can
also be specified.
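
A sketch of what a (frame number, order) pair buys over a plain
address, using the range from the HLE erratum workaround as the
example. The structure and helper names here are assumptions for
illustration and are not copied from the patch; 4KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT_SKETCH 12   /* 4KiB pages, assumption */

/* Hypothetical descriptor: first bad frame plus log2 of the frame count. */
struct bad_page_range_sketch {
    unsigned long mfn;
    unsigned int order;
};

/* The 4MiB range starting at 1GiB: 1024 frames from frame 0x40000. */
static const struct bad_page_range_sketch hle_range_sketch = {
    .mfn   = 0x40000000UL >> PAGE_SHIFT_SKETCH,
    .order = 10,
};

/* True if mfn falls inside the naturally sized range described by r. */
static bool frame_is_bad_sketch(const struct bad_page_range_sketch *r,
                                unsigned long mfn)
{
    assert(r != NULL);
    return mfn >= r->mfn && mfn - r->mfn < (1UL << r->order);
}
```

A single entry with order 10 covers what would otherwise need 1024
separate per-page entries.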

This is part of XSA-282.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/domctl: Implement XEN_DOMCTL_get_cpu_policy
Sergey Dyasli [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
x86/domctl: Implement XEN_DOMCTL_get_cpu_policy

This finally (after literally years of work!) marks the point where the
toolstack can ask the hypervisor for the current CPUID configuration of a
specific domain.

Introduce a new flask access vector and update the default policies.

Also extend xen-cpuid's --policy mode to be able to take a domid and dump a
specific domain's CPUID and MSR policy.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/sysctl: Implement XEN_SYSCTL_get_cpu_policy
Sergey Dyasli [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
x86/sysctl: Implement XEN_SYSCTL_get_cpu_policy

Provide a SYSCTL for the toolstack to obtain complete system CPUID and MSR
policy information.

For the flask side of things, this subop is closely related to
{phys,cputopo,numa}info, so shares the physinfo access vector.

Extend the xen-cpuid utility to be able to dump the system policies.  An
example output is:

  Xen reports there are maximum 113 leaves and 3 MSRs
  Raw policy: 93 leaves, 3 MSRs
   CPUID:
    leaf     subleaf  -> eax      ebx      ecx      edx
    00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
    00000001:ffffffff -> 000306c3:00100800:7ffafbff:bfebfbff
    00000002:ffffffff -> 76036301:00f0b5ff:00000000:00c10000
    00000004:00000000 -> 1c004121:01c0003f:0000003f:00000000
    00000004:00000001 -> 1c004122:01c0003f:0000003f:00000000
    00000004:00000002 -> 1c004143:01c0003f:000001ff:00000000
    00000004:00000003 -> 1c03c163:03c0003f:00001fff:00000006
    00000005:ffffffff -> 00000040:00000040:00000003:00042120
    00000006:ffffffff -> 00000077:00000002:00000009:00000000
    00000007:00000000 -> 00000000:000027ab:00000000:9c000000
    0000000a:ffffffff -> 07300403:00000000:00000000:00000603
    0000000b:00000000 -> 00000001:00000002:00000100:00000000
    0000000b:00000001 -> 00000004:00000008:00000201:00000000
    0000000d:00000000 -> 00000007:00000340:00000340:00000000
    0000000d:00000001 -> 00000001:00000000:00000000:00000000
    0000000d:00000002 -> 00000100:00000240:00000000:00000000
    80000000:ffffffff -> 80000008:00000000:00000000:00000000
    80000001:ffffffff -> 00000000:00000000:00000021:2c100800
    80000002:ffffffff -> 65746e49:2952286c:6f655820:2952286e
    80000003:ffffffff -> 55504320:2d334520:30343231:20337620
    80000004:ffffffff -> 2e332040:48473034:0000007a:00000000
    80000006:ffffffff -> 00000000:00000000:01006040:00000000
    80000007:ffffffff -> 00000000:00000000:00000000:00000100
    80000008:ffffffff -> 00003027:00000000:00000000:00000000
   MSRs:
    index    -> value
    000000ce -> 0000000080000000
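
To make the dump easier to read, here is a minimal sketch that decodes
the leaf 0 row above into the vendor string. The cpuid_leaf_sketch
structure is a hypothetical mirror of one output row, not the
hypervisor's ABI type; the ffffffff subleaf value is taken verbatim
from the dump and appears to mark leaves without subleaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one dump row: (leaf, subleaf) key plus data. */
struct cpuid_leaf_sketch {
    uint32_t leaf, subleaf;
    uint32_t a, b, c, d;
};

/* Store the four bytes of a register, least significant byte first. */
static void put_reg_bytes_sketch(char *out, uint32_t reg)
{
    for (unsigned int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Leaf 0 carries the vendor string in ebx, edx, ecx (in that order). */
static void vendor_from_leaf0_sketch(const struct cpuid_leaf_sketch *l,
                                     char out[13])
{
    assert(l != NULL && out != NULL && l->leaf == 0);
    put_reg_bytes_sketch(out + 0, l->b);
    put_reg_bytes_sketch(out + 4, l->d);
    put_reg_bytes_sketch(out + 8, l->c);
    out[12] = '\0';
}
```

Feeding it the first CPUID row of the example output yields
"GenuineIntel".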

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: Introduce struct cpu_policy to refer to a group of individual policies
Andrew Cooper [Mon, 2 Jul 2018 16:05:33 +0000 (16:05 +0000)]
x86: Introduce struct cpu_policy to refer to a group of individual policies

This is prep work for the following patch - please refer to it as well.

When auditing and manipulating policies, it is necessary to do so with a
complete set of policies, due to the interdependences of the contents.  A
containing structure like this will allow for clearer APIs and code.

As a first user, this structure is convenient for the mapping used by
XEN_SYSCTL_get_cpu_policy (implemented in the next patch), and for auditing
(later when XEN_DOMCTL_set_cpu_policy is implemented).

At this point, the distinction between *_max and *_default is introduced into
the ABI.  For now, *_default is mapped to *_max, but future development work
will result in *_default being a logical subset of *_max.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agolibx86: Introduce a helper to serialise msr_policy objects
Roger Pau Monné [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise msr_policy objects

As with CPUID, an architectural form is used for representing the MSR data.
It is expected not to change moving forwards, but does have a 32 bit field
(currently reserved) which can be used compatibly if needs be.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agolibx86: Introduce a helper to serialise cpuid_policy objects
Andrew Cooper [Thu, 21 Jun 2018 14:35:49 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise cpuid_policy objects

The serialised form is made up of the leaf, subleaf and data tuple.  As this
is the architectural form, it is expected not to change going forwards.

The serialisation of the Xen/Viridian leaves isn't fully implemented yet.  It
is just enough to be bug-compatible with the current DOMCTL_set_cpuid
behaviour, but needs further hypervisor work before the toolstack can sensibly
control these values.

x86_cpuid_copy_to_buffer() is implemented using Xen's regular copy_to_guest
primitives, while an API-compatible memcpy() is used for the libxc half of the
build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agotools/dm_depriv: Add first cut RLIMITs
George Dunlap [Tue, 6 Nov 2018 15:41:25 +0000 (15:41 +0000)]
tools/dm_depriv: Add first cut RLIMITs

Limit the ability of a potentially compromised QEMU to consume system
resources.  Key limits:
 - RLIMIT_FSIZE (file size): 256KiB
 - RLIMIT_NPROC (after uid changes to a unique uid)

Probably unnecessary limits but why not:
 - RLIMIT_CORE: 0
 - RLIMIT_MSGQUEUE: 0
 - RLIMIT_LOCKS: 0
 - RLIMIT_MEMLOCK: 0

NB that we do not yet set RLIMIT_AS (total virtual memory) or
RLIMIT_NOFILE (number of open files), since these require more care
and/or more coordination with QEMU to implement.
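
A table-driven way of applying such limits could look like the sketch
below. It is not the libxl code: the structure, table and function
names are assumptions, RLIMIT_NPROC is left out because it depends on
the uid change, and RLIMIT_MSGQUEUE / RLIMIT_LOCKS are Linux-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* One limit to apply; soft and hard limits are set to the same value. */
struct rlimit_entry_sketch {
    int resource;
    rlim_t limit;
};

/* Values from the commit message above (Linux only). */
static const struct rlimit_entry_sketch depriv_rlimits_sketch[] = {
    { RLIMIT_FSIZE,    256 * 1024 },
    { RLIMIT_CORE,     0 },
    { RLIMIT_MSGQUEUE, 0 },
    { RLIMIT_LOCKS,    0 },
    { RLIMIT_MEMLOCK,  0 },
};

/* Apply n entries in order; stop and return -1 on the first failure. */
static int apply_rlimits_sketch(const struct rlimit_entry_sketch *e, size_t n)
{
    assert(e != NULL);
    for (size_t i = 0; i < n; i++) {
        struct rlimit rl = { .rlim_cur = e[i].limit, .rlim_max = e[i].limit };

        if (setrlimit(e[i].resource, &rl) < 0)
            return -1;
    }
    return 0;
}
```

Lowering a hard limit needs no privilege, so this can run after the
uid change; the limits are then inherited across exec().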

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Put global headers before local headers (sugg by Paul)
- Move #undef inside the braces (sugg by Paul)

Changes since v3:
- Align RLIMIT_ENTRY list for easier reading
- Fix wrong format string specifier
- Get rid of some trailing whitespace

Changes since v2:
- Use a macro to define rlimit entries
- Use RLIMIT_NLIMITS as an end-of-list marker, rather than -1
- Various style clean-ups

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agotools/dm_restrict: Unshare mount and IPC namespaces on Linux
George Dunlap [Tue, 6 Nov 2018 15:41:24 +0000 (15:41 +0000)]
tools/dm_restrict: Unshare mount and IPC namespaces on Linux

QEMU running under Xen doesn't need mount or IPC functionality.
Create and enter separate namespaces for each of these before
executing QEMU, so that in the event that other restrictions fail, the
process won't be able to even name system mount points or existing
non-file-based IPC descriptors to attempt to attack them.

Unsharing is something a process can only do to itself (it would
seem); so add an os-specific "dm_preexec_restrict()" hook just before
we exec() the device model.

Also add checks to depriv-process-checker.sh to verify that dm is
running in a new namespace (or at least, a different one than the
caller).

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Fix function prototype for netbsd code

Changes since v3:
- Fix some more style issues

Changes since v2:
- Return an error rather than calling exit()
- Use LOGE() and print to the current stderr fd, rather than
  printing to the new stderr fd via write()
- Use r for external return values rather than rc.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agotools/dm_restrict: Ask QEMU to chroot
George Dunlap [Tue, 6 Nov 2018 15:41:23 +0000 (15:41 +0000)]
tools/dm_restrict: Ask QEMU to chroot

When dm_restrict is enabled, ask QEMU to chroot into an empty directory.

* Create $XEN_RUN_DIR/qemu-root-<domid> (deleting the old one if it's there)
* Pass the -chroot option to QEMU

Rather than running `rm -rf` on the directory before creating it
(since there is no library function to do this), simply rmdir the
directory, relying on the fact that the previous QEMU instance, if
properly restricted, shouldn't have been able to write anything
anyway.

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Minor change to comment
- Update stale directory name in commit message

Changes since v2:
- Style fixes
- Testing moved to a different patch

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agoSUPPORT.md: Add qemu-depriv section
George Dunlap [Tue, 6 Nov 2018 15:41:22 +0000 (15:41 +0000)]
SUPPORT.md: Add qemu-depriv section

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Fix some grammar (s/attack/attacking/;)

Changes since v3:
- Moved from the qemu-depriv doc patches.
- Reword to include the possibility of having a non-dom0 "devicemodel"
  domain which may want to be protected
- Specify `Linux dom0` as the currently-tech-supported window

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
6 years agodocs/qemu-deprivilege: Revise and update with status and future plans
George Dunlap [Tue, 6 Nov 2018 15:41:22 +0000 (15:41 +0000)]
docs/qemu-deprivilege: Revise and update with status and future plans

docs/qemu-deprivilege.txt had some basic instructions for using
dm_restrict, but it was incomplete, misleading, and stale.

Update the docs in a number of ways.

First, separate user-facing documentation and technical description
into docs/features and docs/design, respectively.

In the feature doc:

* Introduce a section mentioning minimum versions of Linux, Xen, and
qemu required (TBD)

* Fix the discussion of qemu userid.  Mention xen-qemuuser-range-base,
and provide example shell code that actually has some hope of working
(instead of failing out after creating 900 userids).

* Describe how to enable restrictions, as well as features which
probably don't or definitely don't work.

In the design doc, introduce a "Technical Details" section which
describes specifically what restrictions are currently done, and also
what restrictions we are looking at doing in the future.

The idea here is that as we implement the various items for the
future, we move them from "Restrictions still to do" to "Restrictions
done".  This can also act as a design document -- a place for public
discussion of what can or should be done and how.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Remove unnecessary FIXME
- Remove stale "Add SUPPORT.md"

Changes since v3:
- Fix typo (32->16)
- Use an example value not close to the `nobody` uids, but still a
  multiple of 2^16.
- Mention that using a multiple of 2^16 may have advantages.
- Have the example create a group as well
- Reorganize two comments on the "range-base" method for clarity

Changes since v2:
- Extraneous privcmd / evtchn instances aren't closed
- Expand description of how to test fd deprivileging
- Rework and clarify two namespace sections, give reference for QEMU NAK
- Add more information about migration technical challenges
- In UID section, mention possibility of container ID collisions.
- Fix name of design document.
- Add SUPPORT.md statement.  Specify Linux, to make sure that FreeBSD is
  evaluated separately.
- Mention that `-sandbox` is a blacklist and why

Changes since v1:
- Break into two, and move into appropriate directories (rather than 'misc')
- Updated version requirements
- Distinguish between features which "don't yet work" and features which we never expect to work
- Update description of xen-restrict functionality
- Reorder and expand further restrictions
- Make it more clear which restrictions are available on Linux only
- Include detailed description of how to kill a process
- Add RLIMIT_NPROC as something we can do without further changes to qemu
- Document the need to check for the sandbox feature before using it

Thank you to Ross Lagerwall, whose description of what XenServer is
doing formed much of the basis for the text here.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
6 years agotools: ipxe: Correct download error handling
Ian Jackson [Mon, 5 Nov 2018 18:40:49 +0000 (18:40 +0000)]
tools: ipxe: Correct download error handling

This shell fragment lacked set -e.  So, eg if the download failed a
broken ipxe.tar.gz would be left behind.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools: Once again honour, but no longer advertise GIT_HTTP env var
Ian Jackson [Mon, 5 Nov 2018 18:37:05 +0000 (18:37 +0000)]
tools: Once again honour, but no longer advertise GIT_HTTP env var

In "build: add autoconf to replace custom checks in tools/check"
--enable-githttp was introduced.  But we missed this comment where it
was advertised.

Also, that commit had the effect of unconditionally setting GIT_HTTP
from the configure variable.  But the env var has been advertised in
some places as the way to specify this behaviour, and overriding it is
just unfriendly.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools: libxl/xl: run NUMA placement even when an hard-affinity is set
Dario Faggioli [Fri, 19 Oct 2018 15:54:41 +0000 (17:54 +0200)]
tools: libxl/xl: run NUMA placement even when an hard-affinity is set

Right now, if either a hard or a soft affinity is explicitly specified
in a domain's config file, automatic NUMA placement is skipped. However,
automatic NUMA placement affects only the soft-affinity of the domain
which is being created.

Therefore, it is ok to let it run if a hard-affinity is specified. The
semantics will be that the best placement candidate would be found,
respecting the specified hard-affinity, i.e., using only the nodes that
contain the pcpus in the hard-affinity mask.
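
The "only the nodes that contain the pcpus in the hard-affinity mask"
rule can be sketched as a simple bitmap computation. This is an
illustration under assumptions (at most 64 CPUs and nodes, a flat
cpu_to_node array, invented function name), not the libxl placement
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a bitmask of the NUMA nodes that hold at least one pcpu from
 * the hard-affinity mask.  Placement candidates would then be chosen
 * among these nodes only.
 */
static uint64_t candidate_nodes_sketch(const unsigned int *cpu_to_node,
                                       unsigned int nr_cpus,
                                       uint64_t hard_affinity)
{
    uint64_t nodes = 0;

    assert(cpu_to_node != NULL && nr_cpus <= 64);
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
        if (!(hard_affinity & (UINT64_C(1) << cpu)))
            continue;
        assert(cpu_to_node[cpu] < 64);
        nodes |= UINT64_C(1) << cpu_to_node[cpu];
    }
    return nodes;
}
```

For a two-node host with CPUs 0-1 on node 0 and CPUs 2-3 on node 1, a
hard affinity of CPUs 2-3 leaves only node 1 as a candidate.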

This is particularly helpful if global xl pinning masks are defined, as
made possible by commit aa67b97ed34279c43 ("xl.conf: Add global affinity
masks"). In fact, without this commit, defining a global affinity mask
would also mean disabling automatic placement, but that does not
necessarily have to be the case (especially in large systems).

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: put x86emul_{read,write}_dr under CONFIG_PV
Wei Liu [Mon, 5 Nov 2018 17:38:58 +0000 (17:38 +0000)]
x86: put x86emul_{read,write}_dr under CONFIG_PV

A build breakage was discovered by a non-debug build. Debug builds
worked because the ASSERT made the compiler eliminate the rest of the
functions.

Currently they are PV only. There are comments alluding to possible
future HVM support but we can cross the bridge when we get there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoRelease: add release note link to SUPPORT.md
Juergen Gross [Fri, 26 Oct 2018 13:13:44 +0000 (15:13 +0200)]
Release: add release note link to SUPPORT.md

In order to have a link to the release notes in the feature list
generated from SUPPORT.md add that link in the "Release Support"
section of that file.

The real link needs to be adapted when the version is being released.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agodocs: remove ChangeLog file
Juergen Gross [Fri, 26 Oct 2018 10:38:06 +0000 (12:38 +0200)]
docs: remove ChangeLog file

docs/ChangeLog was last updated for Xen 3.3. Today it seems to be of
interest to archaeologists only.

Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: make entry point code build when !CONFIG_PV
Wei Liu [Fri, 19 Oct 2018 11:32:12 +0000 (12:32 +0100)]
x86: make entry point code build when !CONFIG_PV

Skip building x86_64/compat/entry.S and put CONFIG_PV in
x86_64/entry.S.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/traps: Misc non-functional cleanup
Andrew Cooper [Mon, 5 Nov 2018 16:03:03 +0000 (16:03 +0000)]
x86/traps: Misc non-functional cleanup

 * s/unsigned char/uint8_t/ for clarity
 * Drop redundant return at the end of a void function

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agotools: Move the typesafe min/max helpers into xen-tools/libs.h
Andrew Cooper [Thu, 19 Jul 2018 15:42:07 +0000 (16:42 +0100)]
tools: Move the typesafe min/max helpers into xen-tools/libs.h

... rather than implementing them separately for libxc and libxl.  They will
shortly be wanted in libx86 as well.

Fix up the style/consistency in the declaration, but no functional change.
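
For readers unfamiliar with the idiom, a typesafe min/max usually looks
something like the sketch below. It relies on GNU C extensions
(statement expressions and __typeof__); the _sketch names are
assumptions and the exact text in xen-tools/libs.h may differ.

```c
#include <assert.h>

/*
 * Comparing the addresses of the two temporaries makes the compiler
 * warn when the operands have different types.  Each operand is
 * evaluated exactly once.
 */
#define min_sketch(x, y) ({                 \
    const __typeof__(x) _x = (x);           \
    const __typeof__(y) _y = (y);           \
    (void)(&_x == &_y);                     \
    _x < _y ? _x : _y; })

#define max_sketch(x, y) ({                 \
    const __typeof__(x) _a = (x);           \
    const __typeof__(y) _b = (y);           \
    (void)(&_a == &_b);                     \
    _a > _b ? _a : _b; })

/* Example user: clamp v into the inclusive range [lo, hi]. */
static int clamp_sketch(int v, int lo, int hi)
{
    assert(lo <= hi);
    return min_sketch(max_sketch(v, lo), hi);
}
```

Having one copy in a shared header means libxc, libxl and libx86 all
get the same type checking.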

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/vcpu: Remove struct vcpu allocation restriction when possible
Andrew Cooper [Fri, 2 Nov 2018 17:46:38 +0000 (17:46 +0000)]
x86/vcpu: Remove struct vcpu allocation restriction when possible

There is no need for struct vcpu to live below the 4G boundary for PV guests,
or for HVM vcpus using HAP.

Plumb struct domain into alloc_vcpu_struct() so the x86 version can query the
domain's type and paging settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agox86: update help string for CONFIG_HVM
Wei Liu [Fri, 2 Nov 2018 15:55:45 +0000 (15:55 +0000)]
x86: update help string for CONFIG_HVM

Update text. Change "guest" to "domain" where appropriate because
"guest" doesn't include Domain 0.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86: rearrange x86_64/entry.S
Wei Liu [Fri, 2 Nov 2018 15:55:42 +0000 (15:55 +0000)]
x86: rearrange x86_64/entry.S

Split the file into two halves. The first half pertains to PV guest
code while the second half is mostly used by the hypervisor itself to
handle interrupts and exceptions.

No functional change intended.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/domctl: rework XEN_DOMCTL_{set,get}_address_size
Wei Liu [Fri, 2 Nov 2018 15:55:40 +0000 (15:55 +0000)]
x86/domctl: rework XEN_DOMCTL_{set,get}_address_size

Going through toolstack code, they are used for PV guests only.

Tighten their access to PV only. Return -EOPNOTSUPP if they are called
on HVM guests. Rewrite the code in a pattern that makes DCE work.
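
The general shape of such a check might be as follows. This is a
hedged sketch, not the domctl code: domain_sketch, CONFIG_PV_SKETCH and
the helper names are invented. The point is that when the config
constant is 0, the predicate is constant-false and the compiler can
drop everything after the early return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CONFIG_PV_SKETCH 1   /* stand-in for a Kconfig-derived constant */

/* Hypothetical, heavily reduced domain descriptor. */
struct domain_sketch {
    bool is_pv;
    unsigned int addr_size;
};

/* Constant-false when PV support is compiled out. */
static bool is_pv_domain_sketch(const struct domain_sketch *d)
{
    return CONFIG_PV_SKETCH && d->is_pv;
}

/* PV only: anything else gets -EOPNOTSUPP before touching PV state. */
static int get_address_size_sketch(const struct domain_sketch *d,
                                   unsigned int *size)
{
    assert(d != NULL && size != NULL);
    if (!is_pv_domain_sketch(d))
        return -EOPNOTSUPP;

    *size = d->addr_size;
    return 0;
}
```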

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: make traps.c build with !CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 15:55:39 +0000 (15:55 +0000)]
x86: make traps.c build with !CONFIG_PV

Provide a stub for pv_inject_event. Put code that accesses PV fields
and GDT / LDT fault handling code under CONFIG_PV. Move set_debugreg
to pv/misc-hypercalls.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: put vcpumask_to_pcpumask under CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 19:28:51 +0000 (19:28 +0000)]
x86: put vcpumask_to_pcpumask under CONFIG_PV

This function is used by PV code only. The issue was discovered by a
clang build.

Drop spurious inline while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: VME and PVI modes require a #GP(0) check first thing
Jan Beulich [Mon, 5 Nov 2018 10:13:59 +0000 (11:13 +0100)]
x86emul: VME and PVI modes require a #GP(0) check first thing

As explicitly spelled out by the SDM, EFLAGS.VIF and EFLAGS.VIP both set
at the start of an instruction trigger #GP(0) independent of actual
instruction.
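
A simplified model of the check is sketched below. The bit positions
are the architectural ones (EFLAGS.VM/VIF/VIP, CR4.VME/PVI); the
cpu_state_sketch structure and the function name are assumptions and do
not reflect the emulator's internal interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_VM_SKETCH   (UINT32_C(1) << 17)
#define EFLAGS_VIF_SKETCH  (UINT32_C(1) << 19)
#define EFLAGS_VIP_SKETCH  (UINT32_C(1) << 20)
#define CR4_VME_SKETCH     (UINT32_C(1) << 0)
#define CR4_PVI_SKETCH     (UINT32_C(1) << 1)

/* Hypothetical snapshot of the state consulted before an instruction. */
struct cpu_state_sketch {
    uint32_t eflags;
    uint32_t cr4;
    unsigned int cpl;
};

/*
 * True if #GP(0) must be raised before the instruction is looked at:
 * VME (virtual-8086 mode) or PVI (protected mode) is active at CPL 3
 * and both VIF and VIP are set.
 */
static bool vif_vip_faults_sketch(const struct cpu_state_sketch *s)
{
    const uint32_t both = EFLAGS_VIF_SKETCH | EFLAGS_VIP_SKETCH;
    uint32_t mode_bit;

    assert(s != NULL);
    if (s->cpl != 3)
        return false;

    mode_bit = (s->eflags & EFLAGS_VM_SKETCH) ? CR4_VME_SKETCH
                                              : CR4_PVI_SKETCH;
    if (!(s->cr4 & mode_bit))
        return false;

    return (s->eflags & both) == both;
}
```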

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: deal with firmware setting bogus TSC_ADJUST values
Jan Beulich [Mon, 5 Nov 2018 10:13:09 +0000 (11:13 +0100)]
x86: deal with firmware setting bogus TSC_ADJUST values

The system Intel have handed me for AVX512 emulator work ("Gigabyte
Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3
Pro-CF, BIOS F3 12/28/2017") would not come up under Xen - it hung in
the middle of Dom0 PCI initialization. As it turned out, Xen's time
management did not work because of the firmware setting (only) the boot
CPU's TSC_ADJUST MSR to a large negative value (on the order of -2^50).

Follow Linux (also shamelessly stealing their comments) in
- clearing the register for the boot CPU (we don't have a need for
  exceptions here yet, as the only exception in Linux is a class of
  systems Xen doesn't work on anyway as far as I'm aware),
- forcing non-negative values uniformly (commit 855615eee9 ["x86/tsc:
  Remove the TSC_ADJUST clamp"] dropped this, but without this my
  Haswell box won't boot anymore),
- syncing the registers within sockets.
Linux, prior to aforementioned commit, capped at 0x7fffffff as well, but as the
description there says this issue has been addressed with a microcode
update. Hence until someone runs into such a system without being able
to update its microcode, I think we should leave out that specific part.

In order to avoid making init_percpu_time() depend on running _before_
set_cpu_sibling_map() (and hence the booting CPU _not_ being accounted
in socket_cpumask[] yet), move that call slightly earlier in
start_secondary().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/TSC: don't allow deadline timer to be used with unfixed errata
Jan Beulich [Mon, 5 Nov 2018 10:12:39 +0000 (11:12 +0100)]
x86/TSC: don't allow deadline timer to be used with unfixed errata

In preparation for writes to the TSC_ADJUST MSR, avoid the bad
interaction of writes to it and the TSC_DEADLINE one. Presumably the
original Linux commit bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due
to errata") refers to e.g. KBW092. (Of course this is an issue also
without us writing the TSC_ADJUST MSR, if instead firmware did already.)

The errata checking can't be put in init_apic_mappings() as Linux does,
as that runs before we update microcode on the boot CPU. It needs to
happen before consumers of tdt_enabled, i.e.
- __setup_APIC_LVTT() <- setup_APIC_timer() <- setup_boot_APIC_clock()
-                     <- calibrate_APIC_clock() <- setup_boot_APIC_clock()
- setup_boot_APIC_clock()
setup_boot_APIC_clock() gets called from smp_prepare_cpus(), which sits
after microcode loading (note that calibrate_APIC_clock() gets called
before setting tdt_enabled).

Also add an MFENCE as per Linux commit 5d7c631d92 ("x86/apic: Serialize
LVTT and TSC_DEADLINE writes"), but I see no reason to put a conditional
around it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoviridian: remove duplicate union types
Paul Durrant [Mon, 5 Nov 2018 10:11:39 +0000 (11:11 +0100)]
viridian: remove duplicate union types

The 'viridian_vp_assist', 'viridian_hypercall_gpa' and
'viridian_reference_tsc' union types are identical in layout. The layout
is also common throughout the specification [1].

This patch declares a common 'viridian_page_msr' type and converts the rest
of the code to use that type for both the hypercall and VP assist pages.

Also, rename 'viridian_guest_os_id' to 'viridian_guest_os_id_msr' since it
also is a union representing an MSR value.

No functional change.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agoviridian: remove comments referencing section number in the spec
Paul Durrant [Mon, 5 Nov 2018 10:10:55 +0000 (11:10 +0100)]
viridian: remove comments referencing section number in the spec

Microsoft has a habit of re-numbering sections in the spec. so avoid
referring to section numbers in comments. Also remove the URL for the
spec. from the boilerplate... Again, Microsoft has a habit of changing
these too.

This patch also cleans up some > 80 character lines.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agoviridian: remove MSR perf counters
Paul Durrant [Mon, 5 Nov 2018 10:09:35 +0000 (11:09 +0100)]
viridian: remove MSR perf counters

They're not really useful so maintaining them is pointless.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agolibxl/arm: fix guest type conversion
Wei Liu [Fri, 2 Nov 2018 12:34:12 +0000 (12:34 +0000)]
libxl/arm: fix guest type conversion

Commit 359970fd8b ("tools/libxl: Switch Arm guest type to PVH") missed
changing the type field in c_info. This issue didn't surface until
ef72c93df9, which made PV guest creation on Arm unusable.

Create libxl__arch_domain_create_info_setdefault and switch the type
there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agox86/hvm: clean up may_defer from hvm_* helpers
Alexandru Isaila [Fri, 2 Nov 2018 11:16:32 +0000 (12:16 +0100)]
x86/hvm: clean up may_defer from hvm_* helpers

The may_defer var was left with the older bool_t type. This patch
changes the type to bool.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Paul Durrant <paul.durrant@citrix.com>
6 years agoVMX: fix vmx_handle_eoi()
Jan Beulich [Fri, 2 Nov 2018 11:15:33 +0000 (12:15 +0100)]
VMX: fix vmx_handle_eoi()

In commit 303066fdb1e ("VMX: fix interaction of APIC-V and Viridian
emulation") I screwed up: Instead of clearing SVI, other ISR bits
should be taken into account.

Introduce a new helper set_svi(), split out of vmx_process_isr(), and
use it also from vmx_handle_eoi().

Following the problems in vmx_intr_assist() (see the still present big
block of debugging code there) also warn (once) if EOI'd vector and
original SVI don't match.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agotools/ocaml: make type of Xsraw.sync more precise
Christian Lindig [Tue, 30 Oct 2018 10:19:06 +0000 (10:19 +0000)]
tools/ocaml: make type of Xsraw.sync more precise

The type of Xsraw.sync is made more precise:

from val sync : (Xenbus.Xb.t -> 'a) -> con -> string
to   val sync : (Xenbus.Xb.t -> unit) -> con -> string

The first argument is enforced to return unit rather than a value that
is not used anyway.

[ No functional change. -iwj ]

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agotools/ocaml: Re-introduce Xenctrl.with_intf wrapper
Christian Lindig [Thu, 1 Nov 2018 09:12:53 +0000 (09:12 +0000)]
tools/ocaml: Re-introduce Xenctrl.with_intf wrapper

Commit 81946a73dc975a7dafe9017a8e61d1e64fdbedbf removed
Xenctrl.with_intf based on its undesirable behaviour of opening and
closing a Xenctrl connection with every invocation. This commit
re-introduces with_intf but with an updated behaviour: it maintains a
global Xenctrl connection which is opened upon first usage and kept
open. This handle can be obtained by clients using new functions
get_handle() and close_handle().

The main motivation of re-introducing with_intf is that otherwise
clients will have to implement this functionality individually.

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agolibvchan: create xenstore entries in one transaction
Marek Marczykowski-Górecki [Tue, 30 Oct 2018 23:49:05 +0000 (00:49 +0100)]
libvchan: create xenstore entries in one transaction

This prevents a race when a client waits for the server with xs_watch:
all entries should appear at once.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/misc/xenpm: fix getting info when some CPUs are offline
Marek Marczykowski-Górecki [Wed, 31 Oct 2018 13:04:58 +0000 (14:04 +0100)]
tools/misc/xenpm: fix getting info when some CPUs are offline

Use physinfo.max_cpu_id instead of physinfo.nr_cpus to get max CPU id.
This fixes, for example, 'xenpm get-cpufreq-para' with smt=off, which
otherwise would miss half of the cores.
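
Why the loop bound matters can be seen in a small model. This is only
a sketch: physinfo_sketch is a hypothetical stand-in for the two
fields named above, and the online[] array is invented to model a host
where every second CPU id is offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two physinfo fields discussed above. */
struct physinfo_sketch {
    uint32_t nr_cpus;      /* number of online CPUs */
    uint32_t max_cpu_id;   /* highest CPU id, online or not */
};

/* Count the online CPUs that a loop over ids [0, bound) would visit. */
static unsigned int count_online_below_sketch(const bool *online,
                                              uint32_t bound)
{
    unsigned int n = 0;

    assert(online != NULL);
    for (uint32_t cpu = 0; cpu < bound; cpu++)
        if (online[cpu])
            n++;
    return n;
}
```

With 4 of 8 CPU ids online (ids 0, 2, 4, 6), a loop bounded by nr_cpus
visits ids 0-3 and finds only two online CPUs, while a loop bounded by
max_cpu_id + 1 finds all four.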

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>