xenbits.xensource.com Git - xen.git/log
Andrew Cooper [Mon, 26 Jun 2017 11:58:25 +0000 (12:58 +0100)]
x86/mm: Fix infinite loop in get_spage_pages()

c/s 2b8eb37 switched int i to being unsigned, but the undo logic on failure
relied on i being signed.  As i being unsigned is still preferable, adjust the
undo logic to work with an unsigned i.

Coverity-ID: 1413017
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
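
The failure mode described in the get_spage_pages() fix above can be sketched as follows. This is a minimal illustration, not the Xen code: acquire(), release() and the held counter are hypothetical stand-ins for taking and dropping page references. With an unsigned index, an undo loop written as "while ( --i >= 0 )" never terminates, whereas "while ( i-- )" undoes exactly the entries acquired so far.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping standing in for page references. */
static unsigned int held;

static bool acquire(unsigned int idx, unsigned int fail_at)
{
    if ( idx == fail_at )
        return false;
    held++;
    return true;
}

static void release(void)
{
    held--;
}

/*
 * Take 'count' references, undoing on failure.  With a signed index the
 * undo loop could be "while ( --i >= 0 )", but for an unsigned i that
 * condition is always true.  "while ( i-- )" stops once i reaches 0.
 */
static bool acquire_all(unsigned int count, unsigned int fail_at)
{
    unsigned int i;

    for ( i = 0; i < count; i++ )
    {
        if ( !acquire(i, fail_at) )
        {
            while ( i-- )
                release();
            return false;
        }
    }
    return true;
}
```

Calling acquire_all(4, 2) fails at the third element and leaves no references held.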
Bhupinder Thakur [Thu, 22 Jun 2017 07:38:37 +0000 (13:08 +0530)]
xen/arm: Rename vgic_reg* function definitions and calls to vreg_reg*

This patch renames the vgic_reg* access functions defined in vreg.h to vreg_reg*
and replaces all calls to vgic_reg* functions in vgic/its emulation code with vreg_reg*.

vreg_reg* are generic functions, which can be used to operate on 32/64-bit registers.

SBSA UART emulation code will also use vreg_reg* access functions for
accessing emulated pl011 registers.

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Bhupinder Thakur [Thu, 22 Jun 2017 07:38:36 +0000 (13:08 +0530)]
xen/arm: vpl011: Move vgic register access functions to vreg.h

These functions are generic in nature and can be reused by other emulation
code in Xen. vGICv3 ITS and SBSA UART emulation code would use these
functions to operate on their registers.

This patch moves the register access function definitions from vgic.h to
vreg.h.

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 18 Apr 2017 14:43:16 +0000 (15:43 +0100)]
x86/mm: Rename d to currd in do_mmuext_op()

This will make future cleanup more obviously correct.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 23 Jun 2017 15:59:12 +0000 (17:59 +0200)]
x86emul: correct CF output of SHLD/SHRD

CF reflects the last bit shifted out, i.e. can't possibly be derived
from the result value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
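
To illustrate why CF cannot be derived from the result, here is a minimal sketch of a 32-bit SHLD, written for this log and not taken from x86emul. The function name shld32() and its signature are assumptions. CF is taken from the original destination before the shifted-out bit is lost.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit SHLD sketch: shift dst left by 'count', filling from the top
 * bits of src.  CF is the last bit shifted out of dst, i.e. bit
 * (32 - count) of the original dst; the result no longer contains it.
 */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned int count,
                       bool *cf)
{
    count &= 31;               /* the count is masked for 32-bit operands */
    if ( count == 0 )
        return dst;            /* no shift: result and *cf are unchanged */

    *cf = (dst >> (32 - count)) & 1;
    return (dst << count) | (src >> (32 - count));
}
```

For example, shifting 0x80000000 left by one yields a result of 0 with CF set, which no inspection of the result alone could reveal.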
Wei Liu [Wed, 21 Jun 2017 14:41:52 +0000 (15:41 +0100)]
xen-detect: handle asprintf error

Otherwise gcc with -Wunused will complain the return value is not
used.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:28:46 +0000 (17:28 +0100)]
x86: fix a coding style issue in asm-x86/traps.h

And add an emacs block.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:22:33 +0000 (17:22 +0100)]
x86: move compat_show_guest_stack near its non-compat variant

And make it static, remove the declaration in header.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:13:29 +0000 (17:13 +0100)]
x86: move compat_iret alongside its non-compat variant

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:09:49 +0000 (17:09 +0100)]
x86: move hypercall_page_initialise_ring1_kernel

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:06:19 +0000 (17:06 +0100)]
x86: move hypercall_page_initialise_ring3_kernel to pv/hypercall.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Thu, 8 Jun 2017 16:02:08 +0000 (17:02 +0100)]
x86/traps: move init_int80_direct_trap to pv/traps.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 17:19:38 +0000 (18:19 +0100)]
x86: move do_iret to pv/iret.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 17:13:16 +0000 (18:13 +0100)]
x86: move toggle_guest_mode to pv/domain.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 14:58:25 +0000 (15:58 +0100)]
x86/traps: move set_guest_{machine,nmi}_trapbounce

Take the opportunity to change their return type to bool. And rename
"v" to "curr".

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 14:53:54 +0000 (15:53 +0100)]
x86/traps: move pv_inject_event to pv/traps.c

Take the opportunity to rename "v" to "curr".

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 14:49:14 +0000 (15:49 +0100)]
x86: move some misc PV hypercalls to misc-hypercalls.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 5 Jun 2017 14:16:17 +0000 (15:16 +0100)]
x86: clean up PV emulation code

Replace bool_t with bool. Fix coding style issues. Add spaces around
binary ops. Use 1U for shifting. Eliminate TOGGLE_MODE.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 13 Jun 2017 20:36:58 +0000 (21:36 +0100)]
xen/livepatch: Don't crash on encountering STN_UNDEF relocations

A symndx of STN_UNDEF is special, and means a symbol value of 0.  While
legitimate in the ELF standard, its existence in a livepatch is questionable
at best.  Until a plausible usecase presents itself, reject such a relocation
with -EOPNOTSUPP.

Additionally, fix an off-by-one error while range checking symndx, and perform
a safety check on elf->sym[symndx].sym before dereferencing it, to avoid
tripping over a NULL pointer when calculating val.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32]
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
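
The three checks described in the STN_UNDEF commit above can be sketched roughly as below. The structures and the function check_symndx() are simplified, hypothetical stand-ins for the livepatch ELF types, and the use of -EINVAL for the range and NULL cases is an assumption made for illustration. Only the -EOPNOTSUPP return for STN_UNDEF comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define STN_UNDEF 0            /* ELF: undefined symbol index */

/* Hypothetical, simplified stand-ins for the livepatch ELF structures. */
struct sketch_sym {
    const void *sym;           /* NULL for entries that were never filled in */
    unsigned long value;
};

struct sketch_elf {
    struct sketch_sym *sym;
    unsigned int nsym;         /* number of entries in sym[] */
};

/* Validate a relocation's symbol index before it is used. */
static int check_symndx(const struct sketch_elf *elf, unsigned int symndx)
{
    if ( symndx == STN_UNDEF )
        return -EOPNOTSUPP;    /* legal ELF, but rejected for now */

    if ( symndx >= elf->nsym ) /* using ">" here would be an off-by-one */
        return -EINVAL;

    if ( !elf->sym[symndx].sym )
        return -EINVAL;        /* would otherwise dereference NULL */

    return 0;
}
```

Because the symbol array is sparse, the NULL check only works reliably when unused entries are zeroed.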
Andrew Cooper [Thu, 22 Jun 2017 17:55:31 +0000 (18:55 +0100)]
xen/livepatch: Use zeroed memory allocations for arrays

Each of these arrays is sparse.  Use zeroed allocations to cause uninitialised
array elements to contain deterministic values, most importantly for the
embedded pointers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32]
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Andrew Cooper [Tue, 13 Jun 2017 20:17:47 +0000 (21:17 +0100)]
xen/livepatch: Clean up arch relocation handling

 * Reduce symbol scope and initialisation as much as possible
 * Annotate a fallthrough case in arm64
 * Fix switch statement style in arm32

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32]
Andrew Cooper [Wed, 21 Jun 2017 11:36:18 +0000 (12:36 +0100)]
xen: Replace ASSERT(0) with ASSERT_UNREACHABLE()

No functional change, but the result is more informative both in the code and
error messages if the assertions do get hit.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 18 Apr 2017 14:49:16 +0000 (15:49 +0100)]
x86/mm: Drop is_guest_l1_slot()

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 18 Apr 2017 14:41:16 +0000 (15:41 +0100)]
x86/mm: Misc nonfunctional cleanup

 * Drop trailing whitespace
 * Apply Xen comment and space style
 * Switch bool_t to bool
 * Drop TOGGLE_MODE() macro
 * Replace erroneous mandatory barriers with smp barriers
 * Switch boolean ints for real bools

No (intended) functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Razvan Cojocaru [Wed, 21 Jun 2017 16:37:31 +0000 (19:37 +0300)]
x86/monitor: Fixed CID 1412966: Memory - corruptions (OVERRUN)

Fixed an issue where the maximum index allowed (31) goes beyond the
actual number of array elements (4) of ad->monitor.write_ctrlreg_mask.
Coverity-ID: 1412966

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
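
The class of overrun fixed above can be illustrated with a bounds check tied to the real array size. This is a sketch only: the array, the constant and the function name are hypothetical, and only the element count of 4 is taken from the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_CTRLREGS 4   /* elements in the mask array, per the text */

static uint64_t sketch_ctrlreg_mask[SKETCH_NR_CTRLREGS];

/* Reject any index that does not fit the array, rather than allowing up to 31. */
static int sketch_set_ctrlreg_mask(unsigned int index, uint64_t mask)
{
    if ( index >= SKETCH_NR_CTRLREGS )
        return -EINVAL;

    sketch_ctrlreg_mask[index] = mask;
    return 0;
}
```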
Jan Beulich [Fri, 23 Jun 2017 13:49:41 +0000 (15:49 +0200)]
make steal_page() return a proper error value

... and use it where suitable (the tmem caller doesn't propagate an
error code). While it doesn't matter as much, also make donate_page()
follow suit on x86 (on ARM it already returns -ENOSYS).

Also move their declarations to common code and add __must_check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Fri, 23 Jun 2017 13:48:28 +0000 (15:48 +0200)]
x86emul: simplify SHLD/SHRD handling

First of all there's no point considering the "shift == width" case,
when immediately before that check we mask "shift" by "width - 1". And
then truncate_word() use can be reduced too: dst.val, as obtained by
generic operand fetching code, is already suitably truncated, and its
use can also be made symmetric in the main conditional expression (on
only left shift results). Finally masking the result of a right shift
is not necessary when the left hand operand doesn't have more than
"width" significant bits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Bernhard M. Wiedemann [Thu, 22 Jun 2017 09:16:34 +0000 (11:16 +0200)]
etherboot: use gzip -n

to not include current timestamp in results
to allow for reproducible builds.

See https://reproducible-builds.org/ for why this matters

Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:58:07 +0000 (09:58 +0200)]
x86/mm: drop redundant domain parameter from get_page_from_gfn_p2m()

It can always be read from the passed p2m. Take the opportunity and
also rename the function, making the "p2m" suffix a prefix, to match
other p2m functions, and convert the "gfn" parameter's type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:57:15 +0000 (09:57 +0200)]
x86/mm: consolidate setting of TLB flush time stamps

Move code and comment into a helper function instead of repeating it in
multiple places.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:56:20 +0000 (09:56 +0200)]
x86/mmuext: don't allow copying/clearing non-RAM pages

The two operations really aren't meant for anything else.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:55:08 +0000 (09:55 +0200)]
gnttab: limit mapkind()'s iteration count

There's no need for the function to observe increases of the maptrack
table (which can occur as the maptrack lock isn't being held) - actual
population of maptrack entries is excluded while we're here (by way of
holding the respective grant table lock for writing, while code
populating entries acquires it for reading). Latch the limit ahead of
the loop, allowing for the barrier to move out, too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Thu, 22 Jun 2017 07:53:18 +0000 (09:53 +0200)]
gnttab: remove host map in the event of a grant_map failure

The current code appropriately removes the reference and type counts
on failure, but leaves the mapping set up. As the only path which can
trigger this is failure from IOMMU manipulation, and as unprivileged
domains are being crashed in that case, this is not by itself a
security issue.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 22 Jun 2017 07:52:32 +0000 (09:52 +0200)]
x86: fold identical error paths in xenmem_add_to_physmap_one()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:51:29 +0000 (09:51 +0200)]
ARM: simplify page type handling

There's no need to have anything here on ARM other than the distinction
between writable and non-writable pages (and even that could likely be
eliminated, but with a more intrusive change). Limit type to a single
bit and drop pinned and validated flags altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Thu, 22 Jun 2017 07:50:47 +0000 (09:50 +0200)]
evtchn: convert evtchn_port_is_*() to plain bool

... at once reducing overall source size by combining some statements
and constifying a few pointers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:50:00 +0000 (09:50 +0200)]
domctl: restrict DOMCTL_set_target to HVM domains

Both the XSA-217 fix and
lists.xenproject.org/archives/html/xen-devel/2017-04/msg02945.html
make this assumption, so let's enforce it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:49:03 +0000 (09:49 +0200)]
gnttab: remove redundant xenheap check from gnttab_transfer()

The message isn't very useful, and the check is being done by
steal_page() anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Jun 2017 07:47:08 +0000 (09:47 +0200)]
public: adjust documentation following XSA-217

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Thu, 22 Jun 2017 07:45:37 +0000 (09:45 +0200)]
idle_loop: either deal with tasklets or go idle

In fact, there are two kinds of tasklets: vCPU and
softirq context. When we want to do vCPU context tasklet
work, we force the idle vCPU (of a particular pCPU) into
execution, and run it from there.

This means there are two possible reasons for choosing
to run the idle vCPU:
1) we want a pCPU to go idle,
2) we want to run some vCPU context tasklet work.

If we're in case 2), it does not make sense to even
try to go idle (as the check will _always_ fail).

This patch rearranges the code of the body of idle
vCPUs, so that we actually check whether we are in
case 1) or 2), and act accordingly.

As a matter of fact, this also means that we do not
check if there's any tasklet work to do after waking
up from idle. This is not a problem, because:
a) for softirq context tasklets, if any is queued
   "during" wakeup from idle, TASKLET_SOFTIRQ is
   raised, and the call to do_softirq() (which is still
   happening *after* the wakeup) will take care of it;
b) for vCPU context tasklets, if any is queued "during"
   wakeup from idle, SCHEDULE_SOFTIRQ is raised and
   do_softirq() (happening after the wakeup) calls
   the scheduler. The scheduler sees that there is
   tasklet work pending and confirms the idle vCPU
   in execution, which then will get to execute
   do_tasklet().

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
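
The rearranged idle vCPU body described in the idle_loop commit above might look roughly like the sketch below. All names are hypothetical stand-ins with counters in place of the real work, so this shows only the control flow: each iteration either runs tasklet work or goes idle, and softirq processing follows in both cases.

```c
#include <stdbool.h>

/* Hypothetical state and counters standing in for the real hooks. */
static bool sketch_tasklet_work_pending;
static unsigned int sketch_tasklets_run, sketch_idled, sketch_softirqs_run;

static void sketch_do_tasklet(void)
{
    sketch_tasklets_run++;
    sketch_tasklet_work_pending = false;
}

static void sketch_pm_idle(void)
{
    sketch_idled++;
}

static void sketch_do_softirq(void)
{
    sketch_softirqs_run++;
}

/* One iteration of the idle vCPU body: tasklet work or idling, never both. */
static void sketch_idle_iteration(void)
{
    if ( sketch_tasklet_work_pending )
        sketch_do_tasklet();   /* case 2): run vCPU context tasklet work */
    else
        sketch_pm_idle();      /* case 1): the pCPU really goes idle */

    /* Anything queued during wakeup is picked up via softirqs. */
    sketch_do_softirq();
}
```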
Stefano Stabellini [Tue, 20 Jun 2017 23:04:17 +0000 (16:04 -0700)]
docs: improve ARM passthrough doc

Add a warning: use passthrough with care.

Add a pointer to the gic device tree bindings. Add an explanation on how
to calculate irq numbers from device tree.

Add a brief explanation of the reg property and a pointer to the xl docs
for a description of the iomem property. Add a note that in the example
we are using different memory addresses for guests and host.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Wei Liu [Mon, 5 Jun 2017 15:04:57 +0000 (16:04 +0100)]
x86/traps: remove now unused inclusion of emulate.h

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 5 Jun 2017 12:07:16 +0000 (13:07 +0100)]
x86: move PV invalid op emulation code

Move the code to pv/emul-inv-op.c. Both functions are unchanged.
Provide pv_emulate_invalid_op and use it in traps.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 5 Jun 2017 11:59:07 +0000 (12:59 +0100)]
x86: move PV gate op emulation code

Move the code to pv/emul-gate-op.c. Prefix emulate_gate_op with pv_
and export it via pv/traps.h.

Pure code motion except for the rename.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 5 Jun 2017 11:44:51 +0000 (12:44 +0100)]
x86: move PV privileged instruction emulation code

Move the code to pv/emul-priv-op.c. Prefix emulate_privileged_op with
pv_ and export it via pv/traps.h.

Also move gpr_switch.S since it is used by the privileged instruction
emulation code only.

Code motion only except for the rename. Cleanup etc will come later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 5 Jun 2017 11:26:18 +0000 (12:26 +0100)]
x86: factor out common PV emulation code

We're going to split PV emulation code into several files. This patch
extracts the functions needed by them into a dedicated file.

The functions are now prefixed with "pv_emul_" and exported via a
local header file.

While at it, change bool_t to bool.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Thu, 15 Jun 2017 09:58:27 +0000 (11:58 +0200)]
tools/xen-detect: try sysfs node for obtaining guest type

Instead of relying on cpuid instruction behaviour to tell which domain
type we are, just try asking the kernel via the appropriate sysfs node
(added in Linux kernel 4.13).

Keep the old detection logic as a fallback for older kernels.

Signed-off-by: Juergen Gross <jgross@suse.com>
Petre Pircalabu [Tue, 20 Jun 2017 15:13:56 +0000 (17:13 +0200)]
xen-access: write_ctrlreg_c4 test

Add test for write_ctrlreg event handling.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Petre Pircalabu [Tue, 20 Jun 2017 15:13:20 +0000 (17:13 +0200)]
x86/monitor: add masking support for write_ctrlreg events

Add support for filtering out the write_ctrlreg monitor events if they
are generated only by changing certain bits.
A new parameter (bitmask) was added to the xc_monitor_write_ctrlreg
function in order to mask the event generation if the changed bits are
set.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
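
The filtering idea in the write_ctrlreg masking commit above can be sketched as a single predicate. The function name is hypothetical, and the sketch deliberately ignores any other monitor settings. It assumes that an event is suppressed when every changed bit falls inside the bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Report a control-register write only if it changes at least one bit
 * outside 'bitmask'.  Changes confined to masked bits are filtered out.
 */
static bool sketch_ctrlreg_event_wanted(uint64_t old, uint64_t new_val,
                                        uint64_t bitmask)
{
    return ((old ^ new_val) & ~bitmask) != 0;
}
```

With a bitmask of 0x80, for instance, a write that only toggles bit 7 generates no event, while a write that also changes bit 4 does.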
Ross Lagerwall [Tue, 20 Jun 2017 15:13:02 +0000 (17:13 +0200)]
rombios/ata: wait for BSY to clear after write

After rombios transfers the data for a write, it checks the status and
fails if BSY is set. qemu-trad doesn't set BSY for PIO writes, but QEMU
upstream does, and this causes rombios to fail writes because they are
marked as BSY. Instead, wait for BSY to clear after a write.

INT 13 writes are probably rarely used these days, but they are used by
GRUB 2 to write to its environment file which happens by default on
Ubuntu.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Jackson [Mon, 19 Jun 2017 14:04:08 +0000 (15:04 +0100)]
xen/test/Makefile: Fix clean target, broken by pattern rule

In "xen/test/livepatch: Regularise Makefiles" we reworked
xen/test/Makefile to use a pattern rule.  However, there are two
problems with this.  Both are related to the way that xen/Rules.mk is
implicitly part of this Makefile because of the way that Makefiles
under xen/ are invoked by their parent directory Makefiles.

Firstly, the Rules.mk `clean' target overrides the pattern rule in
xen/test/Makefile.  The result is that `make -C xen clean' does not
actually run the livepatch clean target.

The Rules.mk clean target does have provision for recursing into
subdirectories, but that feature is tangled up with complex object
file iteration machinery which is not desirable here.  However, we can
extend the Rules.mk clean target since it is a double-colon rule.

Sadly this involves duplicating the SUBDIR iteration boilerplate.  (A
make function could be used but the cure would be worse than the
disease.)

Secondly, Rules.mk has a number of -include directives.  make likes to
try to (re)build files mentioned in includes.  With the % pattern
rule, this applies to those files too.

As a result, make -C xen clean would try to build `.*.d' (for example)
in xen/test.  This would fail with an error message.  The error would
be ignored because of the `-', but it's annoying and ugly.

Solve this by limiting the % pattern rule to the targets we expect it
to handle.  These are those listed in the top-level Makefile help
message, apart from: those which are subdir- or component-qualified;
clean targets (which are handled specially, even distclean); and dist,
src-tarball-*, etc. (which are converted to install by an earlier
Makefile).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Tue, 20 Jun 2017 12:51:53 +0000 (14:51 +0200)]
memory: don't suppress P2M update in populate_physmap()

Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wolfram Strepp [Tue, 20 Jun 2017 12:51:07 +0000 (14:51 +0200)]
rb_tree: make clear distinction between two different cases in rb_erase()

There are two cases when a node, having 2 children, is erased:
'normal case': the successor is not the right-hand-child of the node to be
erased
'special case': the successor is the right-hand child of the node to be erased

Here is some ASCII art, using the following symbols (referring to the code):
O: node to be deleted
N: the successor of O
P: parent of N
C: child of N
L: some other node

normal case:

                   O                         N
                  / \                       / \
                 /   \                     /   \
                L     \                   L     \
               / \     P      ---->      / \     P
                      / \                       / \
                     /                         /
                    N                         C
                     \                       / \
                      \
                       C
                      / \

special case:
                  O|P                        N
                  / \                       / \
                 /   \                     /   \
                L     \                   L     \
               / \     N      ---->      /       C
                        \                       / \
                         \
                          C
                         / \

Notice that for the special case we don't have to reconnect C to N.

Signed-off-by: Wolfram Strepp <wstrepp@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4c60117811171d867d4f27f17ea07d7419d45dae]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
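
The distinction between the two rb_erase() cases above comes down to where the in-order successor sits. The following sketch uses a stripped-down node type (no colours or parent links) and hypothetical function names. It only classifies the case and performs no erase or rebalancing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal binary-tree node; colours and parent links are omitted. */
struct sketch_node {
    struct sketch_node *left, *right;
};

/* In-order successor of a node that has a right child. */
static struct sketch_node *sketch_successor(struct sketch_node *o)
{
    struct sketch_node *n = o->right;

    while ( n->left )
        n = n->left;
    return n;
}

/*
 * 'Special case' from the text: the successor N is O's right-hand child,
 * so O doubles as P and C stays attached to N.
 */
static bool sketch_is_special_case(struct sketch_node *o)
{
    return sketch_successor(o) == o->right;
}
```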
Wolfram Strepp [Tue, 20 Jun 2017 12:50:39 +0000 (14:50 +0200)]
rb_tree: reorganize code in rb_erase() for additional changes

First, move some code around in order to make the next change more obvious.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Wolfram Strepp <wstrepp@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 16c047add3ceaf0ab882e3e094d1ec904d02312d]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wolfram Strepp [Tue, 20 Jun 2017 12:50:13 +0000 (14:50 +0200)]
rbtree: optimize rb_erase()

Four redundant if-conditions in function __rb_erase_color() in
lib/rbtree.c are removed.

In pseudo-source-code, the structure of the code is as follows:

if ((!A || B) && (!C || D)) {
    .
    .
    .
} else {
    if (!C || D) { //if this is true, it implies: (A == true) && (B == false)
        if (A) { //hence this always evaluates to 'true'...
            .
        }
        .
        //at this point, C always becomes true, because of:
        __rb_rotate_right/left();
        //and:
        other = parent->rb_right/left;
    }
    .
    .
    if (C) { //...and this too !
        .
    }
}

Signed-off-by: Wolfram Strepp <wstrepp@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 55a63998b8967615a15e2211ba0ff3a84a565824]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
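
The implication claimed in the pseudo-code comments above (that inside the else branch, (!C || D) forces A to be true and B to be false) is easy to confirm by brute force. The sketch below is a standalone check written for this log, with a hypothetical function name. It is not part of rbtree.c.

```c
#include <stdbool.h>

/*
 * Exhaustively check the implication used in the text: whenever the outer
 * condition fails but (!C || D) holds, A is true and B is false.
 * Returns true if no counterexample exists among the 16 combinations.
 */
static bool sketch_implication_holds(void)
{
    for ( unsigned int bits = 0; bits < 16; bits++ )
    {
        bool a = bits & 1, b = bits & 2, c = bits & 4, d = bits & 8;
        bool outer = (!a || b) && (!c || d);

        if ( !outer && (!c || d) && !(a && !b) )
            return false;
    }
    return true;
}
```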
Artem Bityutskiy [Tue, 20 Jun 2017 12:49:40 +0000 (14:49 +0200)]
rbtree: add const qualifier to some functions

The 'rb_first()', 'rb_last()', 'rb_next()' and 'rb_prev()' calls
take a pointer to an RB node or RB root. They do not change the
pointed objects, so add a 'const' qualifier in order to make life
of the users of these functions easier.

Indeed, if I have my own constant pointer 'const struct my_type *p',
and I call 'rb_next(&p->rb)', I get a GCC warning:

warning: passing argument 1 of 'rb_next' discards qualifiers from pointer target type

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit f4b477c47332367d35686bd2b808c2156b96d7c7]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
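
The effect of the const qualifier can be seen in a small sketch of a successor lookup. The node type and function name here are hypothetical, simplified versions and not the rbtree.h definitions. Because the parameter is a pointer to const, a caller holding a const object can pass an embedded node without a qualifier warning.

```c
#include <stddef.h>

struct sketch_rb_node {
    struct sketch_rb_node *left, *right, *parent;
};

/*
 * Taking a pointer-to-const lets callers pass nodes embedded in const
 * objects.  The tree is not modified; the cast on return only restores
 * the non-const type that callers of such lookups usually expect.
 */
static struct sketch_rb_node *sketch_rb_next(const struct sketch_rb_node *node)
{
    struct sketch_rb_node *parent;

    if ( node->right )
    {
        /* Leftmost node of the right subtree. */
        node = node->right;
        while ( node->left )
            node = node->left;
        return (struct sketch_rb_node *)node;
    }

    /* Otherwise climb until we arrive from a left child. */
    while ( (parent = node->parent) && node == parent->right )
        node = parent;
    return parent;
}
```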
Roger Pau Monné [Tue, 20 Jun 2017 12:49:11 +0000 (14:49 +0200)]
x86/dpci: make sure hvm_do_IRQ_dpci is only called for HVM guests

While there add an ASSERT to hvm_do_IRQ_dpci.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Tue, 20 Jun 2017 12:48:10 +0000 (14:48 +0200)]
mm: don't use _{g,m}fn for defining INVALID_{G,M}FN

INVALID_{G,M}FN are defined using static inline helpers _{g,m}fn.
This means they cannot be used to initialize a build time static variable:

In file included from mm.c:24:0:
xen/xen/include/xen/mm.h:59:26: error: initializer element is not constant
 #define INVALID_MFN      _mfn(~0UL)

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Tim Deegan <tim@xen.org>
[jb: add parentheses and const]
Acked-by: Jan Beulich <jbeulich@suse.com>
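
The underlying C rule is that an object with static storage duration needs a constant initializer, and a call to a static inline function is not one. The sketch below shows the problem and one way around it. The sketch_ names and the brace-initializer macro are assumptions made for illustration, and the exact definition this commit adopts for INVALID_{G,M}FN is not reproduced here.

```c
typedef struct { unsigned long mfn; } sketch_mfn_t;

/* A static inline helper: fine at run time, not a constant expression. */
static inline sketch_mfn_t sketch_mfn(unsigned long m)
{
    return (sketch_mfn_t){ m };
}

/* Hypothetical brace-initializer form usable for static objects. */
#define SKETCH_INVALID_MFN_INIT { ~0UL }

/*
 * static sketch_mfn_t bad = sketch_mfn(~0UL);
 *   -> error: initializer element is not constant
 */
static const sketch_mfn_t sketch_invalid = SKETCH_INVALID_MFN_INIT;
```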
Julien Grall [Tue, 6 Jun 2017 14:35:42 +0000 (15:35 +0100)]
xen/arm: vgic: Sanitize target mask used to send SGI

The current function vgic_to_sgi does not sanitize the target mask and
may therefore get an invalid vCPU ID. This will result in an out-of-bounds
access of d->vcpu[...] as there is no check whether the vCPU ID is
within the maximum supported by the guest.
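
The kind of check that was missing can be sketched as follows. The structure,
the MAX_VCPUS limit and the choice to reject the whole request (rather than
skip the bad bits) are assumptions of this sketch, not taken from the patch.

```c
#include <stdint.h>

#define MAX_VCPUS 4u  /* hypothetical array size for this sketch */

struct domain {
    unsigned int max_vcpus;       /* must not exceed MAX_VCPUS */
    int sgi_count[MAX_VCPUS];     /* stands in for d->vcpu[...] */
};

/* Returns 0 on success, -1 if the mask names a vCPU the guest lacks. */
static int send_sgi_to_mask(struct domain *d, uint16_t target_mask)
{
    unsigned int i;

    /* Validate every set bit first, so a bad request delivers nothing. */
    for (i = 0; i < 16; i++)
        if ((target_mask & (1u << i)) && i >= d->max_vcpus)
            return -1;

    /* All targets are in range: indexing is now safe. */
    for (i = 0; i < d->max_vcpus; i++)
        if (target_mask & (1u << i))
            d->sgi_count[i]++;

    return 0;
}
```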

This was introduced by commit ea37fd2111 "xen/arm: split vgic driver
into generic and vgic-v2 driver".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agognttab: __gnttab_unmap_common_complete() is all-or-nothing
Jan Beulich [Tue, 20 Jun 2017 12:46:47 +0000 (14:46 +0200)]
gnttab: __gnttab_unmap_common_complete() is all-or-nothing

All failures have to be detected in __gnttab_unmap_common(), the
completion function must not skip part of its processing. In particular
the GNTMAP_device_map related putting of page references and adjustment
of pin count must not occur if __gnttab_unmap_common() signaled an
error. Furthermore the function must not make adjustments to global
state (here: clearing GNTTAB_device_map) before all possibly failing
operations have been performed.

There's one exception for IOMMU related failures: As IOMMU manipulation
occurs after GNTMAP_*_map have been cleared already, the related page
reference and pin count adjustments need to be done nevertheless. A
fundamental requirement for the correctness of this is that
iommu_{,un}map_page() crash any affected DomU in case of failure.

The version check appears to be pointless (or could perhaps be a
BUG_ON() or ASSERT()), but for the moment also move it.

This is part of XSA-224.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: correct logic to get page references during map requests
George Dunlap [Tue, 20 Jun 2017 12:46:21 +0000 (14:46 +0200)]
gnttab: correct logic to get page references during map requests

The rules for reference counting are somewhat complicated:

* Each of GNTTAB_host_map and GNTTAB_device_map needs its own
reference count

* If the mapping is writeable:
 - GNTTAB_host_map needs a type count under only some conditions
 - GNTTAB_device_map always needs a type count

If the mapping succeeds, we need to keep all of these; if the mapping
fails, we need to release whatever references we have acquired so far.

Additionally, the code that does a lot of this calculation "inherits"
a reference as part of the process of finding out who the owner is.

Finally, if the grant is mapped as writeable (without the
GNTMAP_readonly flag), but the hypervisor cannot grab a
PGT_writeable_page type, the entire operation should fail.

Unfortunately, the current code has several logic holes:

* If a grant is mapped only GNTTAB_device_map, and with a writeable
  mapping, but in conditions where a *host* type count is not
  necessary, the code will fail to grab the necessary type count.

* If a grant is mapped both GNTTAB_device_map and GNTTAB_host_map,
  with a writeable mapping, in conditions where the host type count is
  not necessary, *and* where the page cannot be changed to type
  PGT_writeable, the condition will not be detected.

In both cases, this means that on success, the type count will be
erroneously reduced when the grant is unmapped.  In the second case,
the type count will be erroneously reduced on the failure path as
well.  (In the first case the failure path logic has the same hole
as the reference grabbing logic.)

Additionally, the return value of get_page() is not checked; but this
may fail even if the first get_page() succeeded due to a reference
counting overflow.

First of all, simplify the restoration logic by explicitly counting
the reference and type references acquired.

Consider each mapping type separately, explicitly marking the
'incoming' reference as used so we know when we need to grab a second
one.

Finally, always check the return value of get_page[_type]() and go to
the failure path if appropriate.
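
The "count what you acquired, undo exactly that" idea can be sketched as
below. The page structure and helpers are simplified stand-ins written for
this example; the inherited reference and the host/device distinction are
deliberately left out.

```c
/* Hypothetical page with a bounded reference counter. */
struct page {
    unsigned int refs, type_refs;
    unsigned int max_refs;   /* get_page() fails once this is reached */
    int writable_ok;         /* can the page take a writable type? */
};

static int get_page(struct page *pg)
{
    if (pg->refs >= pg->max_refs)
        return 0;            /* models a reference count overflow */
    pg->refs++;
    return 1;
}

static void put_page(struct page *pg) { pg->refs--; }

static int get_page_type(struct page *pg)
{
    if (!pg->writable_ok)
        return 0;
    pg->type_refs++;
    return 1;
}

static void put_page_type(struct page *pg) { pg->type_refs--; }

/* Takes the requested references all-or-nothing; 0 on success, -1 on failure. */
static int map_refs(struct page *pg, unsigned int want_refs,
                    unsigned int want_types)
{
    unsigned int refcnt = 0, typecnt = 0;

    while (refcnt < want_refs) {
        if (!get_page(pg))
            goto undo;
        refcnt++;
    }
    while (typecnt < want_types) {
        if (!get_page_type(pg))
            goto undo;
        typecnt++;
    }
    return 0;

 undo:
    /* Release exactly what was counted, nothing more. */
    while (typecnt--)
        put_page_type(pg);
    while (refcnt--)
        put_page(pg);
    return -1;
}
```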

This is part of XSA-224.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: never create host mapping unless asked to
Jan Beulich [Tue, 20 Jun 2017 12:46:01 +0000 (14:46 +0200)]
gnttab: never create host mapping unless asked to

We shouldn't create a host mapping unless asked to even in the case of
mapping a granted MMIO page. In particular the mapping wouldn't be torn
down when processing the matching unmap request.

This is part of XSA-224.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: fix handling of dev_bus_addr during unmap
George Dunlap [Tue, 20 Jun 2017 12:45:33 +0000 (14:45 +0200)]
gnttab: fix handling of dev_bus_addr during unmap

If a grant has been mapped with the GNTTAB_device_map flag, calling
grant_unmap_ref() with dev_bus_addr set to zero should cause the
GNTTAB_device_map part of the mapping to be left alone.

Unfortunately, at the moment, op->dev_bus_addr is implicitly checked
before clearing the map and adjusting the pin count, but only the bits
above 12; and it is not checked at all before dropping page
references.  This means a guest can repeatedly make such a call to
cause the reference count to drop to zero, causing the page to be
freed and re-used, even though it's still mapped in its pagetables.

To fix this, always check op->dev_bus_addr explicitly for being
non-zero, as well as op->flag & GNTMAP_device_map, before doing
operations on the device_map.

While we're here, make the logic a bit cleaner:

* Always initialize op->frame to zero and set it from act->frame, to reduce the
chance of untrusted input being used

* Explicitly check the full dev_bus_addr against act->frame <<
  PAGE_SHIFT, rather than ignoring the lower 12 bits
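
The explicit checks described above could look roughly like this. It is a
sketch only: should_unmap_device() is a hypothetical helper, and a mismatching
address is simply reported as "do not unmap" rather than as an error code.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define GNTMAP_device_map (1u << 0)   /* value as assumed for this sketch */

/* Returns 1 if the device mapping should be torn down, 0 to leave it alone. */
static int should_unmap_device(uint32_t flags, uint64_t dev_bus_addr,
                               uint64_t frame)
{
    if (!(flags & GNTMAP_device_map))
        return 0;                     /* no device mapping was requested */

    if (dev_bus_addr == 0)
        return 0;                     /* zero means: leave it alone */

    /* Compare the full address, including the low 12 bits. */
    return dev_bus_addr == (frame << PAGE_SHIFT);
}
```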

This is part of XSA-224.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agoarm: vgic: Don't update the LR when the IRQ is not enabled
Julien Grall [Tue, 20 Jun 2017 12:41:55 +0000 (14:41 +0200)]
arm: vgic: Don't update the LR when the IRQ is not enabled

gic_raise_inflight_irq will be called if the IRQ is already inflight
(i.e. the IRQ is injected to the guest). If the IRQ is already in
the LRs, then the associated LR will be updated.

To know if the interrupt is already in the LR, the function checks if the
interrupt is queued. However, if the interrupt is not enabled then the
interrupt may not be queued nor in the LR. So gic_update_one_lr may be
called (if we inject on the current vCPU) and read the LR.

Because the interrupt is not in the LR, Xen will either read:
    * LR 0 if the interrupt was never injected before
    * LR 255 (GIC_INVALID_LR) if the interrupt was injected once. This
    is because gic_update_one_lr will reset p->lr.

Reading LR 0 may update the wrong interrupt and leave the LRs out of
sync with Xen.

Reading LR 255 will result in:
    * A Xen crash on GICv3, as the LR index is bigger than supported (see
    gicv3_ich_read_lr).
    * Reads/writes always going to GICH_LR + 255 * 4, which is not part of
    the mapped memory.

The problem can be prevented by checking whether the interrupt is
enabled in gic_raise_inflight_irq before calling gic_update_one_lr.

A follow-up of this patch is expected to mitigate the issue in the
future.

This is XSA-223.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoguest_physmap_remove_page() needs its return value checked
Jan Beulich [Tue, 20 Jun 2017 12:41:16 +0000 (14:41 +0200)]
guest_physmap_remove_page() needs its return value checked

Callers, namely those subsequently freeing the page, must not blindly
assume success: the function may fail when it needs to shatter a
super page but no memory is available for the intermediate page table
this requires.
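
The calling pattern this asks for, sketched with hypothetical stand-ins
(remove_mapping() and remove_and_free() are not Xen functions):

```c
#include <errno.h>

/* Stand-in for the removal: fails if the superpage split cannot allocate. */
static int remove_mapping(int have_memory)
{
    return have_memory ? 0 : -ENOMEM;
}

/* Frees the page only once the mapping is known to be gone. */
static int remove_and_free(int have_memory, int *freed)
{
    int rc = remove_mapping(have_memory);

    if (rc)
        return rc;   /* mapping may still exist: freeing would be unsafe */

    *freed = 1;      /* models the actual free */
    return 0;
}
```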

As it happens, guest_remove_page() callers now also all check the
return value.

Furthermore a missed put_gfn() on an error path in gnttab_transfer() is
also being taken care of.

This is part of XSA-222.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agomemory: fix return value handling of guest_remove_page()
Andrew Cooper [Tue, 20 Jun 2017 12:39:56 +0000 (14:39 +0200)]
memory: fix return value handling of guest_remove_page()

Despite the description in mm.h, guest_remove_page() previously returned 0 for
paging errors.

Switch guest_remove_page() to having regular 0/-error semantics, and propagate
the return values from clear_mmio_p2m_entry() and mem_sharing_unshare_page()
to the callers (although decrease_reservation() is the only caller which
currently cares).

This is part of XSA-222.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoevtchn: avoid NULL derefs
Jan Beulich [Tue, 20 Jun 2017 12:37:47 +0000 (14:37 +0200)]
evtchn: avoid NULL derefs

Commit fbbd5009e6 ("evtchn: refactor low-level event channel port ops")
added a de-reference of the struct evtchn pointer for a port without
first making sure the bucket pointer is non-NULL. This de-reference is
actually entirely unnecessary, as all relevant callers (beyond the
problematic do_poll()) already hold the port number in their hands, and
the actual leaf functions need nothing else.

For FIFO event channels there's a second problem in that the ordering
of reads and updates to ->num_evtchns and ->event_array[] was so far
undefined (the read side isn't always holding the domain's event lock).
Add respective barriers.
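
The ordering requirement can be sketched with C11 release/acquire atomics,
which are used here in place of Xen's own barrier primitives. The structure is
a simplified assumption: the counter tracks valid array slots, and a single
writer (holding the event lock) is assumed.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PAGES 8

struct fifo {
    int *event_array[MAX_PAGES];
    atomic_uint num_evtchns;     /* in this sketch: number of valid slots */
};

/* Writer: fill the slot first, then publish the larger count. */
static int fifo_add_page(struct fifo *f, int *page)
{
    unsigned int n = atomic_load_explicit(&f->num_evtchns,
                                          memory_order_relaxed);

    if (n >= MAX_PAGES)
        return -1;

    f->event_array[n] = page;
    /* Release: the slot write above is visible before the new count. */
    atomic_store_explicit(&f->num_evtchns, n + 1, memory_order_release);
    return 0;
}

/* Lockless reader: read the count with acquire, then index the array. */
static int *fifo_lookup(struct fifo *f, unsigned int idx)
{
    unsigned int n = atomic_load_explicit(&f->num_evtchns,
                                          memory_order_acquire);

    return idx < n ? f->event_array[idx] : NULL;
}
```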

This is XSA-221.

Reported-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: avoid leaking PKRU and BND* between vCPU-s
Jan Beulich [Tue, 20 Jun 2017 12:36:51 +0000 (14:36 +0200)]
x86: avoid leaking PKRU and BND* between vCPU-s

PKRU is explicitly "XSAVE-managed but not XSAVE-enabled", so guests
might access the register (via {RD,WR}PKRU) without setting XCR0.PKRU.
Force context switching as well as migrating the register as soon as
CR4.PKE is being set the first time.

For MPX (BND<n>, BNDCFGU, and BNDSTATUS) the situation is less clear,
and the SDM's information for that case is not entirely consistent.
While experimentally the instructions don't change register state as
long as the two XCR0 bits aren't both 1, be on the safe side and enable
both if BNDCFGS.EN is being set the first time.

This is XSA-220.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/shadow: hold references for the duration of emulated writes
Andrew Cooper [Tue, 20 Jun 2017 12:36:11 +0000 (14:36 +0200)]
x86/shadow: hold references for the duration of emulated writes

The (misnamed) emulate_gva_to_mfn() function translates a linear address to an
mfn, but releases its page reference before returning the mfn to its caller.

sh_emulate_map_dest() uses the results of one or two translations to construct
a virtual mapping to the underlying frames, completes an emulated
write/cmpxchg, then unmaps the virtual mappings.

The page references need holding until the mappings are unmapped, or the
frames can change ownership before the writes occur.

This is XSA-219.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agognttab: correct maptrack table accesses
Jan Beulich [Tue, 20 Jun 2017 12:34:34 +0000 (14:34 +0200)]
gnttab: correct maptrack table accesses

In order to observe a consistent (limit,pointer-table) pair, the reader
needs to either hold the maptrack lock (in line with documentation) or
both sides need to order their accesses suitably (the writer side
barrier was removed by commit dff515dfea ["gnttab: use per-VCPU
maptrack free lists"], and a read side barrier has never been there).

Make the writer publish a new table page before limit (for bounds
checks to work), and new list head last (for racing maptrack_entry()
invocations to work). At the same time add read barriers to lockless
readers.

Additionally get_maptrack_handle() must not assume ->maptrack_head to
not change behind its back: Another handle may be put (updating only
->maptrack_tail) and then got or stolen (updating ->maptrack_head).

This is part of XSA-218.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agognttab: Avoid potential double-put of maptrack entry
George Dunlap [Tue, 20 Jun 2017 12:33:13 +0000 (14:33 +0200)]
gnttab: Avoid potential double-put of maptrack entry

Each grant mapping for a particular domain is tracked by an in-Xen
"maptrack" entry.  This entry is referenced by a "handle", which is
given to the guest when it calls gnttab_map_grant_ref().

There are two types of mapping a particular handle can refer to:
GNTMAP_host_map and GNTMAP_device_map.  A given
gnttab_unmap_grant_ref() call can remove either only one or both of
these entries.  When a particular handle has no entries left, it must
be freed.

gnttab_unmap_grant_ref() loops through its grant unmap request list
twice.  It first removes entries from any host pagetables and (if
appropriate) iommus; then it does a single domain TLB flush; then it
does the clean-up, including telling the granter that entries are no
longer being used (if appropriate).

At the moment, it's during the first pass that the maptrack flags are
cleared, but the second pass that the maptrack entry is freed.

Unfortunately this allows the following race, which results in a
double-free:

 A: (pass 1) clear host_map
 B: (pass 1) clear device_map
 A: (pass 2) See that maptrack entry has no mappings, free it
 B: (pass 2) See that maptrack entry has no mappings, free it #

Unfortunately, unlike the active entry pinning update, we can't simply
move the maptrack flag changes to the second half, because the
maptrack flags are used to determine if iommu entries need to be
added: a domain's iommu must never have fewer permissions than the
maptrack flags indicate, or a subsequent map_grant_ref() might fail to
add the necessary iommu entries.

Instead, free the maptrack entry in the first pass if there are no
further mappings.
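
Why deciding "who frees" in the same step that clears the flag avoids the
double free can be sketched as below. An atomic read-modify-write is used
here purely for illustration; it is an assumption of this sketch, not the
locking the patch relies on, and the flag names are made up.

```c
#include <stdatomic.h>

#define MAP_HOST   1u   /* hypothetical flag values */
#define MAP_DEVICE 2u

/*
 * Clears 'bit' and returns 1 only for the caller that removed the last
 * remaining mapping. Because the clear and the emptiness check are one
 * atomic step, two racing callers cannot both be told to free.
 */
static int clear_mapping(atomic_uint *flags, unsigned int bit)
{
    unsigned int old = atomic_fetch_and(flags, ~bit);

    return (old & bit) && (old & ~bit) == 0;
}
```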

This is part of XSA-218.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: fix unmap pin accounting race
Jan Beulich [Tue, 20 Jun 2017 12:32:03 +0000 (14:32 +0200)]
gnttab: fix unmap pin accounting race

Once all {writable} mappings of a grant entry have been unmapped, the
hypervisor informs the guest that the grant entry has been released by
clearing the _GTF_{reading,writing} usage flags in the guest's grant
table as appropriate.

Unfortunately, at the moment, the code that updates the accounting
happens in a different critical section than the one which updates the
usage flags; this means that under the right circumstances, there may be
a window in time after the hypervisor reported the grant as being free
during which the grant referee still had access to the page.

Move the grant accounting code into the same critical section as the
reporting code to make sure this kind of race can't happen.

This is part of XSA-218.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: disallow page stealing from HVM domains
Jan Beulich [Tue, 20 Jun 2017 12:29:51 +0000 (14:29 +0200)]
x86/mm: disallow page stealing from HVM domains

The operation's success can't be controlled by the guest, as the device
model may have an active mapping of the page. If we nevertheless
permitted this operation, we'd have to add further TLB flushing to
prevent scenarios like

"Domains A (HVM), B (PV), C (PV); B->target==A
 Steps:
 1. B maps page X from A as writable
 2. B unmaps page X without a TLB flush
 3. A sends page X to C via GNTTABOP_transfer
 4. C maps page X as pagetable (potentially causing a TLB flush in C,
 but not in B)

 At this point, X would be mapped as a pagetable in C while being
 writable through a stale TLB entry in B."

A similar scenario could be constructed for A using XENMEM_exchange and
some arbitrary PV domain C then having this page allocated.

This is XSA-217.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoipxe: update to newer commit
Wei Liu [Mon, 12 Jun 2017 15:04:17 +0000 (16:04 +0100)]
ipxe: update to newer commit

To get 5f85cbb9ee1c00cec81a848a9e871ad5d1e7f53f to placate gcc 7.

The only patch we have applies cleanly.

Reported-by: Zhongze Liu <blackskygg@gmail.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agotools: fix several "format-truncation" warnings with GCC 7
Zhongze Liu [Wed, 14 Jun 2017 01:11:48 +0000 (09:11 +0800)]
tools: fix several "format-truncation" warnings with GCC 7

GCC 7.1.1 complains that several buffers passed to snprintf() in xenpmd
and tools/ocaml/xc are too small to hold the largest possible resulting string,
which is calculated by adding up the maximum length of all the substrings.

The warnings are treated as errors by -Werror, and go like this (abbreviated):

xenpmd.c:94:36: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
     #define BATTERY_INFO_FILE_PATH "/proc/acpi/battery/%s/info"
                                    ^
xenpmd.c:113:13: note: ‘snprintf’ output between 25 and 280 bytes into a
destination of size 32

xenpmd.c:95:37: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
     #define BATTERY_STATE_FILE_PATH "/proc/acpi/battery/%s/state"
                                     ^
xenpmd.c:116:13: note: ‘snprintf’ output between 26 and 281 bytes into a
destination of size 32

xenctrl_stubs.c:65:15: error: ‘%s’ directive output may be truncated writing
up to 1023 bytes into a region of size 252 [-Werror=format-truncation=]
      "%d: %s: %s", error->code,
               ^~
xenctrl_stubs.c:64:4: note: ‘snprintf’ output 5 or more bytes (assuming 1028)
into a destination of size 256

Enlarge the size of these buffers as suggested by the compiler
(and slightly rounded) to fix the warnings.
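
A sketch of how such a buffer can be sized from the format string plus the
longest substring, assuming a 255-byte maximum name as in the warning above.
NAME_MAX_LEN, INFO_PATH_SIZE and build_info_path() are names invented for
this example.

```c
#include <stdio.h>

#define NAME_MAX_LEN 255   /* longest name assumed, per the warning text */
#define BATTERY_INFO_FILE_PATH "/proc/acpi/battery/%s/info"

/*
 * sizeof() of the format counts the two "%s" characters and the NUL,
 * so this is two bytes more than strictly required.
 */
#define INFO_PATH_SIZE (sizeof(BATTERY_INFO_FILE_PATH) + NAME_MAX_LEN)

/* Returns the path length, or -1 if the output failed or was truncated. */
static int build_info_path(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, BATTERY_INFO_FILE_PATH, name);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Checking the return value against the buffer size catches truncation at run
time as well, independent of what the compiler can prove.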

No functional changes.

Signed-off-by: Zhongze Liu <blackskygg@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86emul: minor cleanup
Jan Beulich [Fri, 16 Jun 2017 14:18:54 +0000 (16:18 +0200)]
x86emul: minor cleanup

Drop a redundant input constraint and correct a comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/SVM: correct comments in vmcb.h
Dushyant Behl [Fri, 16 Jun 2017 14:18:10 +0000 (16:18 +0200)]
x86/SVM: correct comments in vmcb.h

The VMEXIT codes listed from EXCEPTION_PF to EXCEPTION_XF had comments
describing the exit codes that were slightly shifted from the expected values.
The expected exit code for a page fault is 78, which should be 0x4E,
and so on up to exception XF.
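
The arithmetic behind those values can be sketched as follows: SVM exception
exit codes are the base 0x40 plus the exception vector. The macro and
enumerator names are made up for this example.

```c
/* Exception intercepts occupy a contiguous range starting at 0x40 (64). */
#define VMEXIT_EXCEPTION_BASE 0x40
#define VMEXIT_EXCEPTION(vector) (VMEXIT_EXCEPTION_BASE + (vector))

/* x86 exception vectors used in this example. */
enum {
    VECTOR_DE = 0,    /* divide error */
    VECTOR_PF = 14,   /* page fault */
    VECTOR_XF = 19,   /* SIMD floating-point exception */
};

/* Page fault: 0x40 + 14 = 78 = 0x4E.  XF: 0x40 + 19 = 83 = 0x53. */
```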

Signed-off-by: Dushyant Behl <myselfdushyantbehl@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agoxen/arm: mm: Use typesafe MFN in dump_pt_walk
Julien Grall [Tue, 13 Jun 2017 16:13:17 +0000 (17:13 +0100)]
xen/arm: mm: Use typesafe MFN in dump_pt_walk

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Use typesafe MFN in set_fixmap
Julien Grall [Tue, 13 Jun 2017 16:13:16 +0000 (17:13 +0100)]
xen/arm: mm: Use typesafe MFN in set_fixmap

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Use the newly introduced MFN <-> MADDR and GFN <-> MADDR helpers
Julien Grall [Tue, 13 Jun 2017 16:13:14 +0000 (17:13 +0100)]
xen/arm: Use the newly introduced MFN <-> MADDR and GFN <-> MADDR helpers

Replace the following constructions:
    - _gfn(paddr_to_pfn(...))   => gaddr_to_gfn(...)
    - _mfn(paddr_to_pfn(...))   => maddr_to_mfn(...)
    - pfn_to_paddr(mfn_x(...))  => mfn_to_maddr(...)
    - pfn_to_paddr(gfn_x(...))  => gfn_to_gaddr(...)
    - _mfn(... >> PAGE_SHIFT)   => maddr_to_mfn(...)
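
What such helpers might look like, as a self-contained sketch. The type
definitions and the 4K PAGE_SHIFT are assumptions made here; the real
definitions live in the Xen headers and may differ.

```c
#include <stdint.h>

#define PAGE_SHIFT 12              /* 4K pages assumed */
typedef uint64_t paddr_t;

/* Distinct struct types, so an MFN cannot be passed where a GFN is expected. */
typedef struct { unsigned long mfn; } mfn_t;
typedef struct { unsigned long gfn; } gfn_t;

static inline mfn_t _mfn(unsigned long m) { mfn_t r = { m }; return r; }
static inline gfn_t _gfn(unsigned long g) { gfn_t r = { g }; return r; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Address -> frame number: drop the offset within the page. */
static inline mfn_t maddr_to_mfn(paddr_t ma)
{
    return _mfn((unsigned long)(ma >> PAGE_SHIFT));
}

static inline gfn_t gaddr_to_gfn(paddr_t ga)
{
    return _gfn((unsigned long)(ga >> PAGE_SHIFT));
}

/* Frame number -> address of the first byte of the page. */
static inline paddr_t mfn_to_maddr(mfn_t m)
{
    return (paddr_t)mfn_x(m) << PAGE_SHIFT;
}

static inline paddr_t gfn_to_gaddr(gfn_t g)
{
    return (paddr_t)gfn_x(g) << PAGE_SHIFT;
}
```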

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
Cc: Tamas K Lengyel <tamas@tklengyel.com>
7 years agoxen/arm: Introduce wrappers for MFN <-> MADDR and GFN <-> GADDR
Julien Grall [Tue, 13 Jun 2017 16:13:13 +0000 (17:13 +0100)]
xen/arm: Introduce wrappers for MFN <-> MADDR and GFN <-> GADDR

The new wrappers will add more safety when converting an address to a
frame number (either machine or guest). A follow-up patch will use them
to simplify the code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: traps: Replace p2m_lookup(..., ..., NULL) by gfn_to_mfn(..., ...)
Julien Grall [Tue, 13 Jun 2017 16:13:12 +0000 (17:13 +0100)]
xen/arm: traps: Replace p2m_lookup(..., ..., NULL) by gfn_to_mfn(..., ...)

gfn_to_mfn is a wrapper of p2m_lookup which does not return the
p2m_type.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Replace DIV_ROUND_UP(..., PAGE_SIZE) by PFN_UP(...)
Julien Grall [Tue, 13 Jun 2017 16:13:11 +0000 (17:13 +0100)]
xen/arm: Replace DIV_ROUND_UP(..., PAGE_SIZE) by PFN_UP(...)

DIV_ROUND_UP(..., PAGE_SIZE) and PFN_UP(...) are equivalent.
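
The equivalence is easy to see from typical definitions of the two macros.
The definitions below are written out for this sketch, with 4K pages assumed.

```c
#define PAGE_SHIFT 12                       /* 4K pages assumed */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Generic rounding-up division. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of page frames needed to cover x bytes. Dividing by PAGE_SIZE
 * and shifting right by PAGE_SHIFT are the same operation for unsigned x.
 */
#define PFN_UP(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
```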

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Replace __va(pfn_to_paddr(...)) by mfn_to_virt
Julien Grall [Tue, 13 Jun 2017 16:13:10 +0000 (17:13 +0100)]
xen/arm: mm: Replace __va(pfn_to_paddr(...)) by mfn_to_virt

__va(pfn_to_paddr(...)) and mfn_to_virt are equivalent.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: domain_build: Replace paddr_to_pfn(virt_to_maddr(.)) by virt_to_mfn(.)
Julien Grall [Tue, 13 Jun 2017 16:13:09 +0000 (17:13 +0100)]
xen/arm: domain_build: Replace paddr_to_pfn(virt_to_maddr(.)) by virt_to_mfn(.)

paddr_to_pfn(virt_to_maddr(.)) and virt_to_mfn(.) are equivalent.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Define mfn_to_page/page_to_mfn in term of __mfn_to_page/__page_to_mfn
Julien Grall [Tue, 13 Jun 2017 16:13:08 +0000 (17:13 +0100)]
xen/arm: Define mfn_to_page/page_to_mfn in term of __mfn_to_page/__page_to_mfn

This is matching the x86 side where the __* version is used if you need
to override the helpers in source files.

At the same time, move the non-underscore version to the end of the
definition and add a comment to explain them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Use typesafe MFN in mfn_to_xen_entry
Julien Grall [Tue, 13 Jun 2017 16:13:07 +0000 (17:13 +0100)]
xen/arm: mm: Use typesafe MFN in mfn_to_xen_entry

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Clean-up mfn_to_xen_entry
Julien Grall [Tue, 13 Jun 2017 16:13:06 +0000 (17:13 +0100)]
xen/arm: mm: Clean-up mfn_to_xen_entry

The physical address is computed from the machine frame number, so
checking if the physical address is page aligned is pointless.

Furthermore, directly assign the MFN to the corresponding field in the
entry rather than converting to a physical address and OR-ing the value.
This avoids relying on the field position and makes the code clearer.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Fix coding style of mfn_to_xen_entry
Julien Grall [Tue, 13 Jun 2017 16:13:05 +0000 (17:13 +0100)]
xen/arm: mm: Fix coding style of mfn_to_xen_entry

Fix the comment coding style and add a newline before the return.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Move mfn_to_xen_entry from page.h to mm.c
Julien Grall [Tue, 13 Jun 2017 16:13:04 +0000 (17:13 +0100)]
xen/arm: mm: Move mfn_to_xen_entry from page.h to mm.c

The file mm.c is the only user of mfn_to_xen_entry. This will also help
to use the typesafe MFN.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Introduce clear_table and use it
Julien Grall [Tue, 13 Jun 2017 16:13:03 +0000 (17:13 +0100)]
xen/arm: mm: Introduce clear_table and use it

Add a new helper clear_table to clear a page table entry and invalidate
the cache.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: gic-v2: Fix indentation in gicv2_map_hwdom_extra_mappings
Julien Grall [Tue, 13 Jun 2017 16:13:01 +0000 (17:13 +0100)]
xen/arm: gic-v2: Fix indentation in gicv2_map_hwdom_extra_mappings

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: create ITS subnodes for Dom0 DT
Andre Przywara [Thu, 18 Aug 2016 14:40:55 +0000 (15:40 +0100)]
ARM: vITS: create ITS subnodes for Dom0 DT

Dom0 expects all ITSes in the system to be propagated to be able to
use MSIs.
Create Dom0 DT nodes for each hardware ITS, keeping the register frame
address the same, as the doorbell address that the Dom0 drivers program
into the BARs has to match the hardware.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoARM: vITS: create and initialize virtual ITSes for Dom0
Andre Przywara [Wed, 7 Sep 2016 00:44:05 +0000 (01:44 +0100)]
ARM: vITS: create and initialize virtual ITSes for Dom0

For each hardware ITS create and initialize a virtual ITS for Dom0.
We use the same memory mapped address to keep the doorbell working.
This introduces a function to initialize a virtual ITS.
We maintain a list of virtual ITSes, at the moment for the only
purpose of later being able to free them again.
We configure the virtual ITSes to match the hardware ones, that is we
keep the number of device ID bits and event ID bits the same as the host
ITS.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoARM: vITS: increase mmio_count for each ITS
Andre Przywara [Mon, 10 Apr 2017 19:13:47 +0000 (20:13 +0100)]
ARM: vITS: increase mmio_count for each ITS

Increase the count of MMIO regions needed by one for each ITS Dom0 has
to emulate. We emulate the ITSes 1:1 from the hardware, so the number
is the number of host ITSes.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: handle INVALL command
Andre Przywara [Tue, 6 Sep 2016 16:20:50 +0000 (17:20 +0100)]
ARM: vITS: handle INVALL command

The INVALL command instructs an ITS to invalidate the configuration
data for all LPIs associated with a given redistributor (read: VCPU).
This is nasty to emulate exactly with our architecture, so we just
iterate over all mapped LPIs and filter for those from that particular
VCPU.
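
The iterate-and-filter approach can be sketched as below. The flat array and
the struct vlpi fields are assumptions of this sketch and do not reflect the
data structures actually used.

```c
#include <stddef.h>

/* Hypothetical per-LPI record; field names are illustrative only. */
struct vlpi {
    int mapped;              /* has the guest mapped this LPI? */
    unsigned int vcpu_id;    /* redistributor (vCPU) it targets */
    unsigned int refreshed;  /* times its configuration was re-read */
};

/* Re-reads configuration for every mapped LPI of one vCPU; returns how many. */
static unsigned int invall(struct vlpi *lpis, size_t n, unsigned int vcpu_id)
{
    unsigned int count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Skip unmapped LPIs and those belonging to other vCPUs. */
        if (!lpis[i].mapped || lpis[i].vcpu_id != vcpu_id)
            continue;

        lpis[i].refreshed++;   /* models re-reading the property table */
        count++;
    }

    return count;
}
```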

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: handle INV command
Andre Przywara [Tue, 6 Sep 2016 16:20:17 +0000 (17:20 +0100)]
ARM: vITS: handle INV command

The INV command instructs the ITS to update the configuration data for
a given LPI by re-reading its entry from the property table.
We don't need to care so much about the priority value, but enabling
or disabling an LPI has some effect: We remove or push virtual LPIs
to their VCPUs, also check the virtual pending bit if an LPI gets enabled.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: handle DISCARD command
Andre Przywara [Wed, 7 Sep 2016 00:51:41 +0000 (01:51 +0100)]
ARM: vITS: handle DISCARD command

The DISCARD command drops the connection between a DeviceID/EventID
and an LPI/collection pair.
We mark the respective structure entries as not allocated and make
sure that any queued IRQs are removed.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: handle MOVI command
Andre Przywara [Wed, 7 Sep 2016 00:50:55 +0000 (01:50 +0100)]
ARM: vITS: handle MOVI command

The MOVI command moves the interrupt affinity from one redistributor
(read: VCPU) to another.
For now migration of "live" LPIs is not yet implemented, but we store
the changed affinity in our virtual ITTE and the pending_irq.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: vITS: handle MAPTI/MAPI command
Andre Przywara [Wed, 7 Sep 2016 00:49:37 +0000 (01:49 +0100)]
ARM: vITS: handle MAPTI/MAPI command

The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
pair and actually instantiates LPI interrupts. MAPI is just a variant
of this command, where the LPI ID is the same as the event ID.
We connect the already allocated host LPI to this virtual LPI, so that
any triggering LPI on the host can be quickly forwarded to a guest.
Beside entering the domain and the virtual LPI number in the respective
host LPI entry, we also initialize and add the already allocated
struct pending_irq to our radix tree, so that we can now easily find it
by its virtual LPI number.
We also read the property table to update the enabled bit and the
priority for our new LPI, as we might have missed this during an earlier
INVALL call (which only checks mapped LPIs). But we make sure that the
property table is actually valid, as all redistributors might still
be disabled at this point.
Since write_itte() now sees its first usage, we change the declaration
to static.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>