Eric Chanudet [Thu, 16 Nov 2017 10:45:38 +0000 (11:45 +0100)]
x86/hvm: do not register hpet mmio during s3 cycle
Do it once at domain creation (hpet_init).
Sleep -> Resume cycles will end up crashing an HVM guest with hpet as
the sequence during resume takes the path:
-> hvm_s3_suspend
-> hpet_reset
-> hpet_deinit
-> hpet_init
-> register_mmio_handler
-> hvm_next_io_handler
register_mmio_handler will use a new io handler each time, until
eventually it reaches NR_IO_HANDLERS, then hvm_next_io_handler calls
domain_crash.
Signed-off-by: Eric Chanudet <chanudete@ainfosec.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 015d6738ddff4074668c1d4887bbffd507ed1a7f
master date: 2017-11-14 17:09:50 +0100
George Dunlap [Thu, 16 Nov 2017 10:44:58 +0000 (11:44 +0100)]
x86/mm: Make PV linear pagetables optional
Allowing pagetables to point to other pagetables of the same level
(often called 'linear pagetables') has been included in Xen since its
inception; but recently it has been the source of a number of subtle
reference-counting bugs.
It is not used by Linux or MiniOS; but it is used by NetBSD and Novell
Netware. There are significant numbers of people who are never going
to use the feature, along with significant numbers who need the
feature.
Add a Kconfig option for the feature (default to 'y'). Also add a
command-line option to control whether PV linear pagetables are
allowed (default to 'true').
NB that we leave linear_pt_count in the page struct. It's in a union,
so its presence doesn't increase the size of the data struct.
Changing the layout of the other elements based on configuration
options is asking for trouble however; so we'll just leave it there
and ASSERT that it's zero.
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3285e75dea89afb0ef5b3ee39bd15194bd7cc110
master date: 2017-10-27 14:36:45 +0100
Jan Beulich [Thu, 16 Nov 2017 10:43:07 +0000 (11:43 +0100)]
x86: don't latch wrong (stale) GS base addresses
load_segments() writes selector registers before doing any of the base
address updates. Any of these selector loads can cause a page fault in
case it references the LDT, and the LDT page accessed was only recently
installed. Therefore the call tree map_ldt_shadow_page() ->
guest_get_eff_kern_l1e() -> toggle_guest_mode() would in such a case
wrongly latch the outgoing vCPU's GS.base into the incoming vCPU's
recorded state.
Split page table toggling from GS handling - neither
guest_get_eff_kern_l1e() nor guest_io_okay() need more than the page
tables being the kernel ones for the memory access they want to do.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a711f6f24a7157ae70d1cc32e61b98f23dc0c584
master date: 2017-10-27 13:49:10 +0100
Jan Beulich [Thu, 16 Nov 2017 10:42:16 +0000 (11:42 +0100)]
x86: also show FS/GS base addresses when dumping registers
Their state may be important to figure the reason for a crash. To not
further grow duplicate code, break out a helper function.
I realize that (ab)using the control register array here may not be
considered the nicest solution, but it seems easier (and less overall
overhead) to do so compared to the alternative of introducing another
helper structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be7f60b5a39741eab0a8fea0324f7be0cb724cfb
master date: 2017-10-24 18:13:13 +0200
Jan Beulich [Thu, 16 Nov 2017 10:41:44 +0000 (11:41 +0100)]
x86: fix GS-base-dirty determination
load_segments() writes the two MSRs in their "canonical" positions
(GS_BASE for the user base, SHADOW_GS_BASE for the kernel one) and uses
SWAPGS to switch them around if the incoming vCPU is in kernel mode. In
order to not leave a stale kernel address in GS_BASE when the incoming
guest is in user mode, the check on the outgoing vCPU needs to be
dependent upon the mode it is currently in, rather than blindly looking
at the user base.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 91f85280b9b80852352fcad73d94ed29fafb88da
master date: 2017-10-24 18:12:31 +0200
David Esler [Thu, 16 Nov 2017 10:40:37 +0000 (11:40 +0100)]
x86/boot: fix early error output
In 9180f5365524 a change was made to the send_chr function to take in
C-strings and output a character at a time until a NULL was encountered.
However, when there is no VGA there is no code to increment the current
character position resulting in an endless loop of the first character.
This moves the (implicit) increment such that it occurs in all cases.
Signed-off-by: David Esler <drumandstrum@gmail.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
[jb: correct title and description] Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
master commit: 78e693cc123296db2f79e792cf474544c1ffd064
master date: 2017-10-20 09:29:29 +0200
Andrew Cooper [Tue, 24 Oct 2017 14:16:47 +0000 (16:16 +0200)]
x86/vvmx: Fix WRMSR interception of VMX MSRs
FEATURE_CONTROL is already read with LOCK bit set (so is unmodifiable), and
all VMX MSRs are read-only. Also, fix the MSR_IA32_VMX_TRUE_ENTRY_CTLS bound
to be MSR_IA32_VMX_VMFUNC, rather than having the intervening MSRs falling
into the default case.
Raise #GP faults if the guest tries to modify any of them.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 46c3acb308bf0cd044b114e637aacaf18b957618
master date: 2017-06-30 11:27:50 +0100
Jan Beulich [Tue, 24 Oct 2017 14:14:50 +0000 (16:14 +0200)]
x86: avoid #GP for PV guest MSR accesses
Halfway recent Linux kernels probe MISC_FEATURES_ENABLES on all CPUs,
leading to ugly recovered #GP fault messages with debug builds on older
systems. We can do better, so introduce synthetic feature flags for
both this and PLATFORM_INFO to avoid the rdmsr_safe() altogether.
Note that the r/o nature of PLATFORM_INFO is now also being enforced.
The rdmsr_safe() uses for MISC_ENABLE are left in place as benign - it
exists for all 64-bit capable Intel CPUs (see e.g. early_init_intel()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
While I can't seem to find any users of this hypercall (being a likely
explanation of why the problem wasn't noticed so far), just like for
do_mmu_update() paged-out and shared page handling is needed here. Move
all this logic into mod_l1_entry(), which then also results in no
longer
- doing any of this handling for non-present PTEs,
- acquiring two temporary page references when one is already more than
enough.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 46aaf41ee099a048d7a554c03ae01bcdaa07f776
master date: 2017-10-13 12:43:41 +0200
George Dunlap [Tue, 24 Oct 2017 14:13:22 +0000 (16:13 +0200)]
fuzz/x86_emulate: clear errors after each iteration
Once feof() returns true for a stream, it will continue to return true
for that stream until clearerr() is called (or the stream is closed
and re-opened).
In llvm-clang-fast-mode, the same file descriptor is used for each
iteration of the loop, meaning that the "Input too large" check was
broken -- feof() would return true even if the fread() hadn't hit the
end of the file. The result is that AFL generates testcases of
arbitrary size.
Fix this by clearing the error after each iteration.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
fuzz/x86_emulate: Clear errors in the officially sanctioned way
Commit 849a1f10c9 was checked in inappropriately; review flagged up
that clearerr() was too big a hammer, as it would clear both the EOF
flag and stream errors.
Stream errors shouldn't be cleared; we only want the EOF and other
stream-related state reset. To do this, it is sufficient to fseek()
to zero.
George Dunlap [Tue, 24 Oct 2017 14:13:03 +0000 (16:13 +0200)]
fuzz/x86_emulate: actually use cpu_regs input
Commit c07574b reorganized the way fuzzing was done, explicitly
creating a structure that the input data would be copied into.
Unfortunately, the cpu register state used by the emulator is on the
stack; it's cleared, but data is never copied into it.
If we're explicitly setting an entirely new cpu_regs struct for each
new input anyway, there's no need to have two copies around anymore;
just point to the one in the data structure.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 581c3e874c300695ddfa8b2a95675a42ebf97301
master date: 2017-10-09 16:03:53 +0200
Jan Beulich [Tue, 24 Oct 2017 14:12:30 +0000 (16:12 +0200)]
x86emul/fuzz: add rudimentary limit checking
fuzz_insn_fetch() is the only data access helper where it is possible
to see offsets larger than 4Gb in 16- or 32-bit modes, as we leave the
incoming rIP untouched in the emulator itself. The check is needed here
as otherwise, after successfully fetching insn bytes, we may end up
zero-extending EIP soon after complete_insn, which collides with the
X86EMUL_EXCEPTION-conditional respective ASSERT() in
x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
complete_insn to be reached with rc set to other than X86EMUL_OKAY or
X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
exception handling for rep_* hooks"].)
Add assert()-s for all other (data) access routines, as effective
address generation in the emulator ought to guarantee in-range values.
For them to not trigger, several adjustments to the emulator's address
calculations are needed: While the DstBitBase one is really mandatory,
the specification allows for either original or new behavior for two-
part accesses. Observed behavior on real hardware, however, is for such
accesses to silently wrap at the 2^^32 boundary in other than 64-bit
mode, just like they do at the 2^^64 boundary in 64-bit mode, which our
code is now being brought in line with. While adding truncate_ea()
invocations there, also convert open coded instances of it.
Reported-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 7869e2bafe11094260a69a31cb07d17901d07b8b
master date: 2017-10-09 16:01:22 +0200
Andrew Cooper [Tue, 24 Oct 2017 14:11:24 +0000 (16:11 +0200)]
xen/domctl: Fix Xen heap leak via XEN_DOMCTL_getvcpucontext
The backing structure for XEN_DOMCTL_getvcpucontext is only zeroed in the x86
HVM case. At the very least, this means that ARM returns junk through its
flags field (as it is only ever conditionally or'd into), and x86 PV leaks
data through gdt_frames[14...15]. (An exhaustive search for other leaks
hasn't been performed).
Unconditionally zero the memory upon allocation, and forgo the double clear
for x86 HVM. These hypercalls are not on hotpaths.
Note that this does not qualify for an XSA. Per XSA-77,
XEN_DOMCTL_getvcpucontext is unsafe for disaggregation, meaning that only the
control domain can use this hypercall.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 3b2eeb7412e529f38d1e8b872ba0bc6ab09a7008
master date: 2017-10-09 12:43:21 +0100
Segment bases (and limits) aren't being cleared by the loading of a nul
selector into a segment register on AMD CPUs. Therefore, if an
outgoing vCPU has a non-zero base in FS or GS and the subsequent
incoming vCPU has a non-zero but nul selector in the respective
register(s), the selector value(s) would be loaded without clearing the
segment base(s) in the hidden register portion.
Since the ABI states "zero" in its description of the fs and gs fields,
it is worth noting that the chosen approach to fix this alters the
written down ABI. I consider this preferrable over enforcing the
previously written down behavior, as nul selectors are far more likely
to be what was meant from the beginning.
The adjustments also eliminate an inconsistency between FS and GS
handling: Old code had an extra pointless (gs_base_user was always zero
when DIRTY_GS was set) conditional for GS. The old bitkeeper changeset
has no explanation for this asymmetry.
Additionally for DS and ES a flat selector is being loaded prior to the
loading of a nul one on AMD CPUs, just as a precautionary measure
(we're not currently aware of ways for a guest to deduce the base of a
segment register which has a nul selector loaded).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4e383df8650d72e47e2ca4ebfc4f6986f791d2f2
master date: 2017-10-04 14:17:08 +0200
Andrew Cooper [Tue, 24 Oct 2017 14:09:11 +0000 (16:09 +0200)]
x86/svm: Fix a livelock when trying to run shadowed unpaged guests
On AMD processors which support SMEP (Some Fam16h processors) and SMAP (Zen,
Fam17h), a guest which is running with shadow paging and clears CR0.PG while
keeping CR4.{SMEP,SMAP} set will livelock, as hardware raises #PF which the
shadow pagetable concludes shouldn't happen.
This occurs because hardware is running with host paging settings, which
causes the guests choice of SMEP/SMAP to actually take effect, even though
they shouldn't from the guests point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 3164f2f9db1e63ea64c3f9520d40cb09920d2b35
master date: 2017-10-02 13:57:34 +0100
Wei Liu [Tue, 24 Oct 2017 14:08:22 +0000 (16:08 +0200)]
x86/hvm/dmop: fix EFAULT condition
The copy macro returns false when the copy fails.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 1c2ea5ee05f6a046c458e5a0babbd72234b2526d
master date: 2017-09-28 11:57:21 +0100
Jan Beulich [Tue, 24 Oct 2017 14:06:14 +0000 (16:06 +0200)]
gnttab: fix pin count / page reference race
Dropping page references before decrementing pin counts is a bad idea
if assumptions are being made that a non-zero pin count implies a valid
page. Fix the order of operations in gnttab_copy_release_buf(), but at
the same time also remove the assertion that was found to trigger:
map_grant_ref() also has the potential of causing a race here, and
changing the order of operations there would likely be quite a bit more
involved.
This is CVE-2017-15597 / XSA-236.
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e008f7619dcd6d549727c9635b3f9f3c7ee483ed
master date: 2017-10-24 16:01:33 +0200
The variable domctl.u.address_size.size may remain uninitialized if
guest_type is not one of xen-3.0-aarch64 or xen-3.0-armv7l. And the
code precisely checks if this variable is still 0 to decide if the
guest type is supported or not.
This fixes the following build failure with gcc 7.x:
xc_dom_arm.c:229:31: error: 'domctl.u.address_size.size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if ( domctl.u.address_size.size == 0 )
Patch originally taken from
https://www.mail-archive.com/xen-devel@lists.xen.org/msg109313.html.
Signed-off-by: Bernd Kuhls <bernd.kuhls@t-online.de> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 88bfbf90e35f1213f9967a97dee0b2039f9998a4)
Andrew Cooper [Thu, 12 Oct 2017 13:08:34 +0000 (15:08 +0200)]
x86/cpu: Fix IST handling during PCPU bringup
Clear IST references in newly allocated IDTs. Nothing good will come of
having them set before the TSS is suitably constructed (although the chances
of the CPU surviving such an IST interrupt/exception is extremely slim).
Uniformly set the IST references after the TSS is in place. This fixes an
issue on AMD hardware, where onlining a PCPU while PCPU0 is in HVM context
will cause IST_NONE to be copied into the new IDT, making that PCPU vulnerable
to privilege escalation from PV guests until it subsequently schedules an HVM
guest.
This is XSA-244.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cc08c73c8c1f5ba5ed0f8274548db6725e1c3157
master date: 2017-10-12 14:50:31 +0200
Andrew Cooper [Thu, 12 Oct 2017 13:08:05 +0000 (15:08 +0200)]
x86/shadow: Don't create self-linear shadow mappings for 4-level translated guests
When initially creating a monitor table for 4-level translated guests, don't
install a shadow-linear mapping. This mapping is actually self-linear, and
trips up the writeable heuristic logic into following Xen's mappings, not the
guests' shadows it was expecting to follow.
A consequence of this is that sh_guess_wrmap() needs to cope with there being
no shadow-linear mapping present, which in practice occurs once each time a
vcpu switches to 4-level paging from a different paging mode.
An appropriate shadow-linear slot will be inserted into the monitor table
either while constructing lower level monitor tables, or by sh_update_cr3().
While fixing this, clarify the safety of the other mappings. Despite
appearing unsafe, it is correct to create a guest-linear mapping for
translated domains; this is self-linear and doesn't point into the translated
domain. Drop a dead clause for translate != external guests.
This is XSA-243.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: bf2b4eadcf379d0361b38de9725ea5f7a18a5205
master date: 2017-10-12 14:50:07 +0200
Jan Beulich [Thu, 12 Oct 2017 13:07:28 +0000 (15:07 +0200)]
x86: don't allow page_unlock() to drop the last type reference
Only _put_page_type() does the necessary cleanup, and hence not all
domain pages can be released during guest cleanup (leaving around
zombie domains) if we get this wrong.
Jan Beulich [Thu, 12 Oct 2017 13:06:55 +0000 (15:06 +0200)]
x86: don't store possibly stale TLB flush time stamp
While the timing window is extremely narrow, it is theoretically
possible for an update to the TLB flush clock and a subsequent flush
IPI to happen between the read and write parts of the update of the
per-page stamp. Exclude this possibility by disabling interrupts
across the update, preventing the IPI to be serviced in the middle.
This is XSA-241.
Reported-by: Jann Horn <jannh@google.com> Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 23a183607a427572185fc51c76cc5ab11c00c4cc
master date: 2017-10-12 14:48:25 +0200
Jan Beulich [Thu, 12 Oct 2017 13:06:12 +0000 (15:06 +0200)]
x86: limit linear page table use to a single level
That's the only way that they're meant to be used. Without such a
restriction arbitrarily long chains of same-level page tables can be
built, tearing down of which may then cause arbitrarily deep recursion,
causing a stack overflow. To facilitate this restriction, a counter is
being introduced to track both the number of same-level entries in a
page table as well as the number of uses of a page table in another
same-level one (counting into positive and negative direction
respectively, utilizing the fact that both counts can't be non-zero at
the same time).
Note that the added accounting introduces a restriction on the number
of times a page can be used in other same-level page tables - more than
32k of such uses are no longer possible.
Note also that some put_page_and_type[_preemptible]() calls are
replaced with open-coded equivalents. This seemed preferrable to
adding "parent_table" to the matrix of functions.
Note further that cross-domain same-level page table references are no
longer permitted (they probably never should have been).
This is XSA-240.
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
master commit: 6987fc7558bdbab8119eabf026e3cdad1053f0e5
master date: 2017-10-12 14:44:34 +0200
Jan Beulich [Thu, 12 Oct 2017 13:05:44 +0000 (15:05 +0200)]
x86/HVM: prefill partially used variable on emulation paths
Certain handlers ignore the access size (vioapic_write() being the
example this was found with), perhaps leading to subsequent reads
seeing data that wasn't actually written by the guest. For
consistency and extra safety also do this on the read path of
hvm_process_io_intercept(), even if this doesn't directly affect what
guests get to see, as we've supposedly already dealt with read handlers
leaving data completely unitialized.
This is XSA-239.
Reported-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 0d4732ac29b63063764c29fa3bd8946daf67d6f3
master date: 2017-10-12 14:43:26 +0200
Misbehaving device model can pass incorrect XEN_DMOP_map/
unmap_io_range_to_ioreq_server arguments, namely end < start when
specifying address range. When this happens we hit ASSERT(s <= e) in
rangeset_contains_range()/rangeset_overlaps_range() with debug builds.
Production builds will not trap right away but may misbehave later
while handling such bogus ranges.
Jan Beulich [Thu, 12 Oct 2017 13:04:27 +0000 (15:04 +0200)]
x86/FLASK: fix unmap-domain-IRQ XSM hook
The caller and the FLASK implementation of xsm_unmap_domain_irq()
disagreed about what the "data" argument points to in the MSI case:
Change both sides to pass/take a PCI device.
This is part of XSA-237.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6f17f5c43a3bd28d27ed8133b2bf513e2eab7d59
master date: 2017-10-12 14:37:56 +0200
Jan Beulich [Thu, 12 Oct 2017 13:03:26 +0000 (15:03 +0200)]
x86/MSI: disallow redundant enabling
At the moment, Xen attempts to allow redundant enabling of MSI by
having pci_enable_msi() return 0, and point to the existing MSI
descriptor, when the msi already exists.
Unfortunately, if subsequent errors are encountered, the cleanup
paths assume pci_enable_msi() had done full initialization, and
hence undo everything that was assumed to be done by that
function without also undoing other setup that would normally
occur only after that function was called (in map_domain_pirq()
itself).
Rather than try to make the redundant enabling case work properly, just
forbid it entirely by having pci_enable_msi() return -EEXIST when MSI
is already set up.
This is part of XSA-237.
Reported-by: HW42 <hw42@ipsumj.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: a46126fec20e0cf4f5442352ef45efaea8c89646
master date: 2017-10-12 14:36:58 +0200
Jan Beulich [Thu, 12 Oct 2017 13:02:54 +0000 (15:02 +0200)]
x86: enforce proper privilege when (un)mapping pIRQ-s
(Un)mapping of IRQs, just like other RESOURCE__ADD* / RESOURCE__REMOVE*
actions (in FLASK terms) should be XSM_DM_PRIV rather than XSM_TARGET.
This in turn requires bypassing the XSM check in physdev_unmap_pirq()
for the HVM emuirq case just like is being done in physdev_map_pirq().
The primary goal security wise, however, is to no longer allow HVM
guests, by specifying their own domain ID instead of DOMID_SELF, to
enter code paths intended for PV guest and the control domains of HVM
guests only.
This is part of XSA-237.
Reported-by: HW42 <hw42@ipsumj.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: db72faf69c94513e180568006a9d899ed422ff90
master date: 2017-10-12 14:36:30 +0200
Jan Beulich [Thu, 12 Oct 2017 13:02:08 +0000 (15:02 +0200)]
x86: don't allow MSI pIRQ mapping on unowned device
MSI setup should be permitted only for existing devices owned by the
respective guest (the operation may still be carried out by the domain
controlling that guest).
This is part of XSA-237.
Reported-by: HW42 <hw42@ipsumj.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3308374b1be7d43e23bd2e9eaf23ec06d7959882
master date: 2017-10-12 14:35:14 +0200
xen/arm: p2m: Read *_mapped_gfn with the p2m lock taken
*_mapped_gfn are currently read before acquiring the lock. However, they
may be modified by the p2m code before the lock was acquired. This means
we will use the wrong values.
Fix it by moving the read inside the section protected by the p2m lock.
xen/arm: Fix the issue in cmp_mmio_handler used in find_mmio_handler
This patch fixes the wrong range check done in cmp_mmio_handler().
This function returns -1 , 0 or 1 based on whether the key value
is below the range, in the range or above the range where the range is
(start, start+size). However, it should check against (start, start+size-1)
because start+size falls outside the range.
This resulted in returning a wrong mmio_handler for a given mmio address which
happened to be start+size.
This bug was introduced when the mmio region search switched from
linear search to binary search in the following commit:
8047e09 "xen/arm: io: Use binary search for mmio handler lookup".
Julien Grall [Fri, 6 Oct 2017 12:59:38 +0000 (14:59 +0200)]
xen/arm: Correctly report the memory region in the dummy NUMA helpers
NUMA is currently not supported on Arm. Because common code is
NUMA-aware, dummy helpers are instead provided to expose a single node.
Those helpers are for instance used to know the region to scrub.
However the memory region is not reported correctly. Indeed, the
frametable may not be at the beginning of the memory and there might be
multiple memory banks. This will lead to not scrub some part of the
memory.
The memory information can be found using:
* first_valid_mfn as the start of the memory
* max_page - first_valid_mfn as the spanned pages
Note that first_valid_mfn is now been exported. The prototype has been
added in asm-arm/numa.h and not in a common header because I would
expect the variable to become static once NUMA is fully supported on
Arm.
Julien Grall [Fri, 6 Oct 2017 12:59:00 +0000 (14:59 +0200)]
xen/page_alloc: Cover memory unreserved after boot in first_valid_mfn
On Arm, some regions (e.g Initramfs, Dom0 Kernel...) are marked as
reserved until the hardware domain is built and they are copied into its
memory. Therefore, they will not be added in the boot allocator via
init_boot_pages.
Instead, init_xenheap_pages will be called once the region are not used
anymore.
Update first_valid_mfn in both init_heap_pages and init_boot_pages
(already exist) to cover all the cases.
Jan Beulich [Fri, 6 Oct 2017 12:57:55 +0000 (14:57 +0200)]
x86/HVM: correct repeat count update in linear->phys translation
For the insn emulator's fallback logic in REP INS/OUTS handling
to work correctly, *reps must not be set to zero when returning
X86EMUL_UNHANDLEABLE.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul.durrant@citrix.com>
master commit: 49160d205236d8e36d27d40b6bf69b9b75f2c333
master date: 2017-09-08 16:23:46 +0200
Jan Beulich [Fri, 6 Oct 2017 12:57:20 +0000 (14:57 +0200)]
x86: introduce and use setup_force_cpu_cap()
For XEN_SMEP and XEN_SMAP to not be cleared while bringing up APs we'd
need to clone the respective hack used for CPUID_FAULTING. Introduce an
inverse of setup_clear_cpu_cap() instead, but let clearing of features
overrule forced setting of them.
XEN_SMAP being wrong post-boot is a problem specifically for live
patching, as a live patch may need alternative instruction patching
keyed off of that feature flag.
Reported-by: Sarah Newman <security@prgmr.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0829a6bdbdc6b79990bd0668e847275b6a2717e5
master date: 2017-09-06 12:32:00 +0200
Jan Beulich [Fri, 6 Oct 2017 12:56:47 +0000 (14:56 +0200)]
x86emul: correct VEX.L handling for VCVT{,T}S{S,D}2SI
Recent changes to the SDM (and XED) have made clear that older hardware
raising #UD when the bit is set was really an erratum. Generalize the
so far AMD-only override.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a6488965ca3ec30f2e0b7022b539bba78c2aeede
master date: 2017-09-05 17:32:05 +0200
Jan Beulich [Fri, 6 Oct 2017 12:56:11 +0000 (14:56 +0200)]
x86emul: correct VEX.W handling for non-64-bit VPINSRD
Going though the XED commits from the last couple of months made me
notice that VPINSRD, other than VPEXTRD, does not clear VEX.W for non-
64-bit modes, leading to an insertion of stray 32-bits of zero in case
the original instruction had the bit set.
Also remove a pointless fall-through in VPEXTRW handling, bringing
things in line with VPINSRW.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 9c2babd05a213f8802e3cc1c64a2af932b5cbd7d
master date: 2017-09-05 17:31:01 +0200
Andrew Cooper [Fri, 6 Oct 2017 12:55:33 +0000 (14:55 +0200)]
x86/emul: Fix the handling of unimplemented Grp7 instructions
Grp7 is abnormally complicated to decode, even by x86's standards, with
{s,l}msw being the problematic cases.
Previously, any value which fell through the first switch statement (looking
for instructions with entirely implicit operands) would be interpreted by the
second switch statement (handling instructions with memory operands).
Unimplemented instructions would then hit the #UD case for having a non-memory
operand, rather than taking the cannot_emulate path.
Consolidate the two switch statements into a single one, using ranges to cover
the instructions with memory operands.
Reported-by: Petre Pircalabu <ppircalabu@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com>
master commit: 4d3f0fde471e7588ce512eaff1abdab209d8cd4b
master date: 2017-09-05 12:58:47 +0100
Chao Gao [Fri, 6 Oct 2017 12:54:51 +0000 (14:54 +0200)]
VT-d: use correct BDF for VF to search VT-d unit
When SR-IOV is enabled, 'Virtual Functions' of a 'Physical Function'
are under the scope of the same VT-d unit as the 'Physical Function'.
A 'Physical Function' can be a 'Traditional Function' or an ARI
'Extended Function'. And furthermore, 'Extended Functions' on an
endpoint are under the scope of the same VT-d unit as the 'Traditional
Functions' on the endpoint. To search VT-d unit for a VF, if its PF
isn't an extended function, the BDF of PF should be used. Otherwise
the BDF of a traditional function in the same device with the PF
should be used.
Current code uses PCI_SLOT() to recognize an ARI 'Extended Funcion'.
But it is conceptually wrong w/o checking whether PF is an extended
function and would lead to match VFs of a RC integrated PF to a wrong
VT-d unit.
This patch overrides VF 'is_extfn' field and uses this field to
indicate whether the PF of this VF is an extended function. The field
helps to use correct BDF to search VT-d unit.
Reported-by: Crawford, Eric R <Eric.R.Crawford@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Tested-by: Crawford, Eric R <Eric.R.Crawford@intel.com>
master commit: c286af54c7177c14180121b422d8df7281e547cb
master date: 2017-09-01 11:02:23 +0200
Xiong Zhang [Fri, 6 Oct 2017 12:54:24 +0000 (14:54 +0200)]
hvmloader: use base instead of pci_mem_start for find_next_rmrr()
find_next_rmrr(base) is used to find the lowest RMRR ending above base
but below 4G. Current method couldn't cover the following situation:
a. two rmrr exist, small gap between them
b. pci_mem_start and mem_resource.base is below the first rmrr.base
c. find_next_rmrr(pci_mem_start) will find the first rmrr
d. After aligning mem_resource.base to bar size,
first_rmrr.end < new_base < second_rmrr.base and
new_base + bar_sz > second_rmrr.base.
So the new bar will overlap with the second rmrr and doesn't overlap
with the first rmrr.
But the next_rmrr point to the first rmrr, then check_overlap() couldn't
find the overlap. Finally assign a wrong address to bar.
This patch using aligned new base to find the next rmrr, could fix the
above case and find all the overlapped rmrr with new base.
David Woodhouse [Fri, 6 Oct 2017 12:53:51 +0000 (14:53 +0200)]
x86/efi: don't write relocations in efi_arch_relocate_image() first pass
The function is invoked with delta=0 before ExitBootServices() is called,
as a dummy run purely to validate that all the relocations can be handled.
This allows us to exit gracefully with an error message.
However, we have relocations in read-only sections such as .rodata and
.init.te(xt). Recent versions of UEFI will actually make those sections
read-only, which will cause a fault. This functionaity was added in
EDK2 commit d0e92aad4 ("MdeModulePkg/DxeCore: Add UEFI image protection.")
It's OK to actually make the changes in the later pass because UEFI will
tear down the protection when ExitBootServices() is called, because it
knows we're going to need to do this kind of thing.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
master commit: 34828425d36b560adfe96430b9b83dfb0f66f2a8
master date: 2017-08-25 14:07:40 +0200
Jan Beulich [Tue, 12 Sep 2017 13:01:58 +0000 (15:01 +0200)]
gnttab: also validate PTE permissions upon destroy/replace
In order for PTE handling to match up with the reference counting done
by common code, presence and writability of grant mapping PTEs must
also be taken into account; validating just the frame number is not
enough. This is in particular relevant if a guest fiddles with grant
PTEs via non-grant hypercalls.
Note that the flags being passed to replace_grant_host_mapping()
already happen to be those of the existing mapping, so no new function
parameter is needed.
This is CVE-2017-14319 / XSA-234.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 16b1414de91b5a82a0996c67f6db3af7d7e32873
master date: 2017-09-12 14:45:13 +0200
tools/xenstore: dont unlink connection object twice
A connection object of a domain with associated stubdom has two
parents: the domain and the stubdom. When cleaning up the list of
active domains in domain_cleanup() make sure not to unlink the
connection twice from the same domain. This could happen when the
domain and its stubdom are being destroyed at the same time leading
to the domain loop being entered twice.
Additionally don't use talloc_free() in this case as it will remove
a random parent link, leading eventually to a memory leak. Use
talloc_unlink() instead specifying the context from which the
connection object should be removed.
This is CVE-2017-14317 / XSA-233.
Reported-by: Eric Chanudet <chanudete@ainfosec.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: 562a1c0f7ef3fbf3c122c3dfa4f2ad9dd51da9fe
master date: 2017-09-12 14:44:56 +0200
Andrew Cooper [Tue, 12 Sep 2017 13:01:11 +0000 (15:01 +0200)]
grant_table: fix GNTTABOP_cache_flush handling
Don't fall over a NULL grant_table pointer when the owner of the domain
is a system domain (DOMID_{XEN,IO} etc).
This is CVE-2017-14318 / XSA-232.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c3d830b244998b3686e2eb64db95996be5eb5e5c
master date: 2017-09-12 14:44:11 +0200
George Dunlap [Tue, 12 Sep 2017 13:00:10 +0000 (15:00 +0200)]
xen/mm: make sure node is less than MAX_NUMNODES
The output of MEMF_get_node(memflags) can be as large as nodeid_t can
hold (currently 255). This is then used as an index to arrays of size
MAX_NUMNODE, which is 64 on x86 and 1 on ARM, can be passed in by an
untrusted guest (via memory_exchange and increase_reservation) and is
not currently bounds-checked.
Check the value in page_alloc.c before using it, and also check the
value in the hypercall call sites and return -EINVAL if appropriate.
Don't permit domains other than the hardware or control domain to
allocate node-constrained memory.
This is CVE-2017-14316 / XSA-231.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 2fece35303529395bfea6b03d2268380ef682c93
master date: 2017-09-12 14:43:16 +0200
When no memory is available in the hypervisor, rather than immediately
failing the request, try to steal a handle from another vCPU.
Reported-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d02f1a0b7576bafb2fba903c7e6e7221ab0d2847
master date: 2017-08-17 14:41:01 +0200
cpufreq: only stop ondemand governor if already started
On CPUFREQ_GOV_STOP in cpufreq_governor_dbs, shortcut to
return success if the governor is already stopped.
Avoid executing dbs_timer_exit, to prevent tripping an assertion
within a call to kill_timer on a timer that has not been prepared
with init_timer, if the CPUFREQ_GOV_START case has not
run beforehand.
kill_timer validates timer state:
* itself, via BUG_ON(this_cpu(timers).running == timer);
* within active_timer, ASSERTing timer->status is within bounds;
* within list_del, which ASSERTs timer inactive list membership.
Patch is synonymous to an OpenXT patch produced at Citrix prior to
June 2014.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e7ec6f5f32cd2d0f723083cde3d7761c4e675f2c
master date: 2017-08-10 12:35:50 +0200
Chao Gao [Mon, 28 Aug 2017 09:38:25 +0000 (11:38 +0200)]
VT-d PI: disable VT-d PI when CPU-side PI isn't enabled
From the context calling pi_desc_init(), we can conclude the current
implementation of VT-d PI depends on CPU-side PI. If we enable VT-d PI
and disable CPU-side PI by disabling APICv explicitly in xen boot
command line, we would get an assertion failure.
This patch clears iommu_intpost once finding CPU-side PI won't be enabled.
It is safe for this is done before this flag starts taking effect. Also
take this chance to remove the useless check of "acknowledge interrupt on
exit", which is a minimal requirement which has been checked earlier.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: e489eb6138e7efe4214a7e9ba0d21f54fc5b7d35
master date: 2017-08-10 12:32:16 +0200
Rusty Bird [Mon, 28 Aug 2017 09:37:43 +0000 (11:37 +0200)]
VT-d: don't panic/warn on iommu=no-igfx
When operating on an Intel graphics device, iommu_enable_translation()
panicked (force_iommu==1) or warned (force_iommu==0) about the BIOS if
is_igd_vt_enabled_quirk() returned 0. That's good if the actual BIOS
problem has been detected. But since commit 1463411, returning 0 could
also happen if the user simply passed "iommu=no-igfx", in which case
bailing out with an info message (instead of a panic/warning) would be
more appropriate.
The panic broke the combination "iommu=force,no-igfx", and also the case
where "iommu=no-igfx" is passed but force_iommu=1 is set automatically
by x2apic_bsp_setup().
Move the iommu_igfx check from is_igd_vt_enabled_quirk() into its only
caller iommu_enable_translation(), and tweak the logic.
Olaf Hering [Mon, 28 Aug 2017 09:37:20 +0000 (11:37 +0200)]
docs: correct paragraph indention in xen-tscmode
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: 55924baf2211ddcf5ba8f702c9a4c07730e0c8e8
master date: 2017-07-24 10:17:24 +0100
Olaf Hering [Mon, 28 Aug 2017 09:36:51 +0000 (11:36 +0200)]
docs: replace xm with xl in xen-tscmode
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: 763267e315a93e2b6d66a0afdcda96db939e09b6
master date: 2017-07-24 10:17:21 +0100
Andrew Cooper [Mon, 28 Aug 2017 09:36:05 +0000 (11:36 +0200)]
x86/hvm: Fixes to hvmemul_insn_fetch()
Force insn_off to a single byte, as offset can wrap around or truncate with
respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.
Furthermore, don't use an ASSERT() for bounds checking the write into
hvmemul_ctxt->insn_buf[].
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/hvm: Fix boundary check in hvmemul_insn_fetch()
c/s 0943a03037 added some extra protection for overflowing the emulation
instruction cache, but Coverity points out that boundary condition is off by
one when memcpy()'ing out of the buffer.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
x86/HVM: fix boundary check in hvmemul_insn_fetch() (again)
Commit 5a992b670b ("x86/hvm: Fix boundary check in
hvmemul_insn_fetch()") went a little too far in its correction to
commit 0943a03037 ("x86/hvm: Fixes to hvmemul_insn_fetch()"): Keep the
start offset check, but restore the original end offset one.
Olaf Hering [Mon, 28 Aug 2017 09:35:29 +0000 (11:35 +0200)]
rombios: prevent building with PIC/PIE
If the default compiler silently defaults to to -fPIC/-fPIE building
rombios fails:
ld -melf_i386 -s -r 32bitbios.o tcgbios/tcgbiosext.o util.o pmm.o -o 32bitbios_all.o
There are undefined symbols in the BIOS:
U _GLOBAL_OFFSET_TABLE_
make[10]: *** [Makefile:26: 32bitbios_all.o] Error 11
Prevent the failure by enforcing non-PIC/PIE mode.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 672949d6c61d9cba01c5b414eed9d522082f04d3
master date: 2017-06-26 14:32:46 +0100
Gregory Herrero [Mon, 28 Aug 2017 09:34:26 +0000 (11:34 +0200)]
stop_machine: fill fn_result only in case of error
When stop_machine_run() is called with NR_CPUS as last argument,
fn_result member must be filled only if an error happens since it is
shared across all cpus.
Assume CPU1 detects an error and set fn_result to -1, then CPU2 doesn't
detect an error and set fn_result to 0. The error detected by CPU1 will
be ignored.
Note that in case multiple failures occur on different CPUs, only the
last error will be reported.
Jan Beulich [Wed, 23 Aug 2017 15:51:44 +0000 (17:51 +0200)]
arm/mm: release grant lock on xenmem_add_to_physmap_one() error paths
Commit 55021ff9ab ("xen/arm: add_to_physmap_one: Avoid to map mfn 0 if
an error occurs") introduced error paths not releasing the grant table
lock. Replace them by a suitable check after the lock was dropped.
This is XSA-235.
Reported-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
master commit: 59546c1897a90fe9af5ebbbb05ead8d98b4d17b9
master date: 2017-08-23 17:45:45 +0200
Jan Beulich [Thu, 17 Aug 2017 13:07:23 +0000 (15:07 +0200)]
gnttab: fix transitive grant handling
Processing of transitive grants must not use the fast path, or else
reference counting breaks due to the skipped recursive call to
__acquire_grant_for_copy() (its __release_grant_for_copy()
counterpart occurs independent of original pin count). Furthermore
after re-acquiring temporarily dropped locks we need to verify no grant
properties changed if the original pin count was non-zero; checking
just the pin counts is sufficient only for well-behaved guests. As a
result, __release_grant_for_copy() needs to mirror that new behavior.
Furthermore a __release_grant_for_copy() invocation was missing on the
retry path of __acquire_grant_for_copy(), and gnttab_set_version() also
needs to bail out upon encountering a transitive grant.
This is part of XSA-226.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ad48fb963dbff02762d2db5396fa655ac0c432c7
master date: 2017-08-17 14:40:31 +0200
Jan Beulich [Thu, 17 Aug 2017 12:58:42 +0000 (14:58 +0200)]
gnttab: don't use possibly unbounded tail calls
There is no guarantee that the compiler would actually translate them
to branches instead of calls, so only ones with a known recursion limit
are okay:
- __release_grant_for_copy() can call itself only once, as
__acquire_grant_for_copy() won't permit use of multi-level transitive
grants,
- __acquire_grant_for_copy() is fine to call itself with the last
argument false, as that prevents further recursion,
- __acquire_grant_for_copy() must not call itself to recover from an
observed change to the active entry's pin count
This is part of XSA-226.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 999d2ccb7f73408aa22656e1ba2f98b077eaa1c2
master date: 2017-08-17 14:39:18 +0200
Jan Beulich [Tue, 15 Aug 2017 13:14:36 +0000 (15:14 +0200)]
gnttab: correct pin status fixup for copy
Regardless of copy operations only setting GNTPIN_hst*, GNTPIN_dev*
also need to be taken into account when deciding whether to clear
_GTF_{read,writ}ing. At least for consistency with code elsewhere the
read part better doesn't use any mask at all.
This is XSA-230.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6e2a4c73564ab907b732059adb317d6ca2d138a2
master date: 2017-08-15 15:08:03 +0200
Jan Beulich [Tue, 15 Aug 2017 13:14:02 +0000 (15:14 +0200)]
gnttab: split maptrack lock to make it fulfill its purpose again
The way the lock is currently being used in get_maptrack_handle(), it
protects only the maptrack limit: The function acts on current's list
only, so races on list accesses are impossible even without the lock.
Otoh list access races are possible between __get_maptrack_handle() and
put_maptrack_handle(), due to the invocation of the former for other
than current from steal_maptrack_handle(). Introduce a per-vCPU lock
for list accesses to become race free again. This lock will be
uncontended except when it becomes necessary to take the steal path,
i.e. in the common case there should be no meaningful performance
impact.
When in get_maptrack_handle adds a stolen entry to a fresh, empty,
freelist, we think that there is probably no concurrency. However,
this is not a fast path and adding the locking there makes the code
clearly correct.
Also, while we are here: the stolen maptrack_entry's tail pointer was
not properly set. Set it.
This is CVE-2017-12136 / XSA-228.
Reported-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
master commit: 02cbeeb6207508b0f04a2c6181445c8eb3f1e117
master date: 2017-08-15 15:07:25 +0200
Andrew Cooper [Tue, 15 Aug 2017 13:12:41 +0000 (15:12 +0200)]
x86/grant: disallow misaligned PTEs
Pagetable entries must be aligned to function correctly. Disallow attempts
from the guest to have a grant PTE created at a misaligned address, which
would result in corruption of the L1 table with largely-guest-controlled
values.
This is CVE-2017-12137 / XSA-227.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ce442926c2530da9376199dcc769436376ad2386
master date: 2017-08-15 15:06:45 +0200
Thomas Sanders [Tue, 28 Mar 2017 17:57:52 +0000 (18:57 +0100)]
oxenstored: trim history in the frequent_ops function
We were trimming the history of commits only at the end of each
transaction (regardless of how it ended).
Therefore if non-transactional writes were being made but no
transactions were being ended, the history would grow
indefinitely. Now we trim the history at regular intervals.
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Andrew Cooper [Tue, 13 Jun 2017 20:36:58 +0000 (21:36 +0100)]
xen/livepatch: Don't crash on encountering STN_UNDEF relocations
A symndx of STN_UNDEF is special, and means a symbol value of 0. While
legitimate in the ELF standard, its existance in a livepatch is questionable
at best. Until a plausible usecase presents itself, reject such a relocation
with -EOPNOTSUPP.
Additionally, fix an off-by-one error while range checking symndx, and perform
a safety check on elf->sym[symndx].sym before derefencing it, to avoid
tripping over a NULL pointer when calculating val.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32] Reviewed-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
(cherry picked from commit 2ff229643b739e2fd0cd0536ee9fca506cfa92f8) Release-acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Thu, 22 Jun 2017 17:55:31 +0000 (18:55 +0100)]
xen/livepatch: Use zeroed memory allocations for arrays
Each of these arrays is sparse. Use zeroed allocations to cause uninitialised
array elements to contain deterministic values, most importantly for the
embedded pointers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32] Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
(cherry picked from commit bd53b85156aaf0330181ab9b82d9a6c52fb30f8c) Release-acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 13 Jun 2017 20:17:47 +0000 (21:17 +0100)]
xen/livepatch: Clean up arch relocation handling
* Reduce symbol scope and initalisation as much as possible
* Annotate a fallthrough case in arm64
* Fix switch statement style in arm32
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [x86 and arm32]
(cherry picked from commit bf570f593554356eb508ac2f8ffb05ebbf747c78) Release-acked-by: Julien Grall <julien.grall@arm.com>
Add a pointer to the gic device tree bindings. Add an explanation on how
to calculate irq numbers from device tree.
Add a brief explanation of the reg property and a pointer to the xl docs
for a description of the iomem property. Add a note that in the example
we are using different memory addresses for guests and host.
Ian Jackson [Mon, 19 Jun 2017 14:04:08 +0000 (15:04 +0100)]
xen/test/Makefile: Fix clean target, broken by pattern rule
In "xen/test/livepatch: Regularise Makefiles" we reworked
xen/test/Makefile to use a pattern rule. However, there are two
problems with this. Both are related to the way that xen/Rules.mk is
implicitly part of this Makefile because of the way that Makefiles
under xen/ are invoked by their parent directory Makefiles.
Firstly, the Rules.mk `clean' target overrides the pattern rule in
xen/test/Makefile. The result is that `make -C xen clean' does not
actually run the livepatch clean target.
The Rules.mk clean target does have provision for recursing into
subdirectories, but that feature is tangled up with complex object
file iteration machinery which is not desirable here. However, we can
extend the Rules.mk clean target since it is a double-colon rule.
Sadly this involves duplicating the SUBDIR iteration boilerplate. (A
make function could be used but the cure would be worse than the
disease.)
Secondly, Rules.mk has a number of -include directives. make likes to
try to (re)build files mentioned in includes. With the % pattern
rule, this applies to those files too.
As a result, make -C xen clean would try to build `.*.d' (for example)
in xen/test. This would fail with an error message. The error would
be ignored because of the `-', but it's annoying and ugly.
Solve this by limiting the % pattern rule to the targets we expect it
to handle. These are those listed in the top-level Makefile help
message, apart from: those which are subdir- or component-qualified;
clean targets (which are handled specially, even distclean); and dist,
src-tarball-*, etc. (which are converted to install by an earlier
Makefile).
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 592e834522086009975bd48d59386094771bd06b)
Andre Przywara [Wed, 24 May 2017 00:07:00 +0000 (01:07 +0100)]
ARM: vGIC: avoid rank lock when reading priority
When reading the priority value of a virtual interrupt, we were taking
the respective rank lock so far.
However for forwarded interrupts (Dom0 only so far) this may lead to a
deadlock with the following call chain:
- MMIO access to change the IRQ affinity, calling the ITARGETSR handler
- this handler takes the appropriate rank lock and calls vgic_store_itargetsr()
- vgic_store_itargetsr() will eventually call vgic_migrate_irq()
- if this IRQ is already in-flight, it will remove it from the old
VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
- vgic_vcpu_inject_irq will call vgic_get_virq_priority()
- vgic_get_virq_priority() tries to take the rank lock - again!
It seems like this code path has never been exercised before.
Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
do in vgic_get_target_vcpu()).
Actually we are just reading one byte, and priority changes while
interrupts are handled are a benign race that can happen on real hardware
too. So it is safe to just prevent the compiler from reading from the
struct more than once.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 43208a9cb4c3decce67b653539c1b860121fbb5e)
Jan Beulich [Tue, 20 Jun 2017 15:07:12 +0000 (17:07 +0200)]
memory: don't suppress P2M update in populate_physmap()
Commit d18627583d ("memory: don't hand MFN info to translated guests")
wrongly added a null-handle check there - just like stated in its
description for memory_exchange(), the array is also an input for
populate_physmap() (and hence can't reasonably be null). I have no idea
how I've managed to overlook this.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Julien Grall <julien.grall@arm.com>
master commit: b964e3106d2cdaa11cc4524181ff14607d110ae4
master date: 2017-06-20 14:51:53 +0200
Julien Grall [Tue, 20 Jun 2017 13:48:42 +0000 (15:48 +0200)]
xen/arm: vgic: Sanitize target mask used to send SGI
The current function vgic_to_sgi does not sanitize the target mask and
may therefore get an invalid vCPU ID. This will result to an out of
bound access of d->vcpu[...] as there is no check whether the vCPU ID is
within the maximum supported by the guest.
This was introduced by commit ea37fd2111 "xen/arm: split vgic driver
into generic and vgic-v2 driver".
Jan Beulich [Tue, 20 Jun 2017 13:48:11 +0000 (15:48 +0200)]
gnttab: __gnttab_unmap_common_complete() is all-or-nothing
All failures have to be detected in __gnttab_unmap_common(), the
completion function must not skip part of its processing. In particular
the GNTMAP_device_map related putting of page references and adjustment
of pin count must not occur if __gnttab_unmap_common() signaled an
error. Furthermore the function must not make adjustments to global
state (here: clearing GNTTAB_device_map) before all possibly failing
operations have been performed.
There's one exception for IOMMU related failures: As IOMMU manipulation
occurs after GNTMAP_*_map have been cleared already, the related page
reference and pin count adjustments need to be done nevertheless. A
fundamental requirement for the correctness of this is that
iommu_{,un}map_page() crash any affected DomU in case of failure.
The version check appears to be pointless (or could perhaps be a
BUG_ON() or ASSERT()), but for the moment also move it.
George Dunlap [Tue, 20 Jun 2017 13:47:46 +0000 (15:47 +0200)]
gnttab: correct logic to get page references during map requests
The rules for reference counting are somewhat complicated:
* Each of GNTTAB_host_map and GNTTAB_device_map need their own
reference count
* If the mapping is writeable:
- GNTTAB_host_map needs a type count under only some conditions
- GNTTAB_device_map always needs a type count
If the mapping succeeds, we need to keep all of these; if the mapping
fails, we need to release whatever references we have acquired so far.
Additionally, the code that does a lot of this calculation "inherits"
a reference as part of the process of finding out who the owner is.
Finally, if the grant is mapped as writeable (without the
GNTMAP_readonly flag), but the hypervisor cannot grab a
PGT_writeable_page type, the entire operation should fail.
Unfortunately, the current code has several logic holes:
* If a grant is mapped only GNTTAB_device_map, and with a writeable
mapping, but in conditions where a *host* type count is not
necessary, the code will fail to grab the necessary type count.
* If a grant is mapped both GNTTAB_device_map and GNTTAB_host_map,
with a writeable mapping, in conditions where the host type count is
not necessary, *and* where the page cannot be changed to type
PGT_writeable, the condition will not be detected.
In both cases, this means that on success, the type count will be
erroneously reduced when the grant is unmapped. In the second case,
the type count will be erroneously reduced on the failure path as
well. (In the first case the failure path logic has the same hole
as the reference grabbing logic.)
Additionally, the return value of get_page() is not checked; but this
may fail even if the first get_page() succeeded due to a reference
counting overflow.
First of all, simplify the restoration logic by explicitly counting
the reference and type references acquired.
Consider each mapping type separately, explicitly marking the
'incoming' reference as used so we know when we need to grab a second
one.
Finally, always check the return value of get_page[_type]() and go to
the failure path if appropriate.
This is part of XSA-224.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 75b384ece635adc55c2bafbdc2d8959c10542c31
master date: 2017-06-20 14:46:21 +0200
Jan Beulich [Tue, 20 Jun 2017 13:47:19 +0000 (15:47 +0200)]
gnttab: never create host mapping unless asked to
We shouldn't create a host mapping unless asked to even in the case of
mapping a granted MMIO page. In particular the mapping wouldn't be torn
down when processing the matching unmap request.
George Dunlap [Tue, 20 Jun 2017 13:46:50 +0000 (15:46 +0200)]
gnttab: fix handling of dev_bus_addr during unmap
If a grant has been mapped with the GNTTAB_device_map flag, calling
grant_unmap_ref() with dev_bus_addr set to zero should cause the
GNTTAB_device_map part of the mapping to be left alone.
Unfortunately, at the moment, op->dev_bus_addr is implicitly checked
before clearing the map and adjusting the pin count, but only the bits
above 12; and it is not checked at all before dropping page
references. This means a guest can repeatedly make such a call to
cause the reference count to drop to zero, causing the page to be
freed and re-used, even though it's still mapped in its pagetables.
To fix this, always check op->dev_bus_addr explicitly for being
non-zero, as well as op->flag & GNTMAP_device_map, before doing
operations on the device_map.
While we're here, make the logic a bit cleaner:
* Always initialize op->frame to zero and set it from act->frame, to reduce the
chance of untrusted input being used
* Explicitly check the full dev_bus_addr against act->frame <<
PAGE_SHIFT, rather than ignoring the lower 12 bits
This is part of XSA-224.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 8fdfcb2b6bcd074776560e76843815f124d587f1
master date: 2017-06-20 14:45:33 +0200
Julien Grall [Tue, 20 Jun 2017 13:46:38 +0000 (15:46 +0200)]
arm: vgic: Don't update the LR when the IRQ is not enabled
gic_raise_inflight_irq will be called if the IRQ is already inflight
(i.e the IRQ is injected to the guest). If the IRQ is already already in
the LRs, then the associated LR will be updated.
To know if the interrupt is already in the LR, the function check if the
interrupt is queued. However, if the interrupt is not enabled then the
interrupt may not be queued nor in the LR. So gic_update_one_lr may be
called (if we inject on the current vCPU) and read the LR.
Because the interrupt is not in the LR, Xen will either read:
* LR 0 if the interrupt was never injected before
* LR 255 (GIC_INVALID_LR) if the interrupt was injected once. This
is because gic_update_one_lr will reset p->lr.
Reading LR 0 will result to potentially update the wrong interrupt and
not keep the LRs in sync with Xen.
Reading LR 255 will result to:
* Crash Xen on GICv3 as the LR index is bigger than supported (see
gicv3_ich_read_lr).
* Read/write always GICH_LR + 255 * 4 that is not part of the memory
mapped.
The problem can be prevented by checking whether the interrupt is
enabled in gic_raise_inflight_irq before calling gic_update_one_lr.
A follow-up of this patch is expected to mitigate the issue in the
future.
Jan Beulich [Tue, 20 Jun 2017 13:45:55 +0000 (15:45 +0200)]
guest_physmap_remove_page() needs its return value checked
Callers, namely such subsequently freeing the page, must not blindly
assume success - the function may namely fail when needing to shatter a
super page, but there not being memory available for the then needed
intermediate page table.
As it happens, guest_remove_page() callers now also all check the
return value.
Furthermore a missed put_gfn() on an error path in gnttab_transfer() is
also being taken care of.
This is part of XSA-222.
Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a0cce6048d010a30ac82f8db7787bbf9aada64f4
master date: 2017-06-20 14:41:16 +0200
Andrew Cooper [Tue, 20 Jun 2017 13:44:53 +0000 (15:44 +0200)]
memory: fix return value handing of guest_remove_page()
Despite the description in mm.h, guest_remove_page() previously returned 0 for
paging errors.
Switch guest_remove_page() to having regular 0/-error semantics, and propagate
the return values from clear_mmio_p2m_entry() and mem_sharing_unshare_page()
to the callers (although decrease_reservation() is the only caller which
currently cares).
This is part of XSA-222.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b614f642c35da5184416787352f51a6379a92628
master date: 2017-06-20 14:39:56 +0200
Jan Beulich [Tue, 20 Jun 2017 13:44:11 +0000 (15:44 +0200)]
evtchn: avoid NULL derefs
Commit fbbd5009e6 ("evtchn: refactor low-level event channel port ops")
added a de-reference of the struct evtchn pointer for a port without
first making sure the bucket pointer is non-NULL. This de-reference is
actually entirely unnecessary, as all relevant callers (beyond the
problematic do_poll()) already hold the port number in their hands, and
the actual leaf functions need nothing else.
For FIFO event channels there's a second problem in that the ordering
of reads and updates to ->num_evtchns and ->event_array[] was so far
undefined (the read side isn't always holding the domain's event lock).
Add respective barriers.
Jan Beulich [Tue, 20 Jun 2017 13:43:13 +0000 (15:43 +0200)]
x86: avoid leaking PKRU and BND* between vCPU-s
PKRU is explicitly "XSAVE-managed but not XSAVE-enabled", so guests
might access the register (via {RD,WR}PKRU) without setting XCR0.PKRU.
Force context switching as well as migrating the register as soon as
CR4.PKE is being set the first time.
For MPX (BND<n>, BNDCFGU, and BNDSTATUS) the situation is less clear,
and the SDM has not entirely consistent information for that case.
While experimentally the instructions don't change register state as
long as the two XCR0 bits aren't both 1, be on the safe side and enable
both if BNDCFGS.EN is being set the first time.
This is XSA-220.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: de20bb6c4f65c4161e0931402613f9ffac86302d
master date: 2017-06-20 14:36:51 +0200
Andrew Cooper [Tue, 20 Jun 2017 13:42:01 +0000 (15:42 +0200)]
x86/shadow: hold references for the duration of emulated writes
The (misnamed) emulate_gva_to_mfn() function translates a linear address to an
mfn, but releases its page reference before returning the mfn to its caller.
sh_emulate_map_dest() uses the results of one or two translations to construct
a virtual mapping to the underlying frames, completes an emulated
write/cmpxchg, then unmaps the virtual mappings.
The page references need holding until the mappings are unmapped, or the
frames can change ownership before the writes occurs.
This is XSA-219.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 26217aff67ae1538d4e1b2226afab6993cdbe772
master date: 2017-06-20 14:36:11 +0200
Jan Beulich [Tue, 20 Jun 2017 13:41:36 +0000 (15:41 +0200)]
gnttab: correct maptrack table accesses
In order to observe a consistent (limit,pointer-table) pair, the reader
needs to either hold the maptrack lock (in line with documentation) or
both sides need to order their accesses suitably (the writer side
barrier was removed by commit dff515dfea ["gnttab: use per-VCPU
maptrack free lists"], and a read side barrier has never been there).
Make the writer publish a new table page before limit (for bounds
checks to work), and new list head last (for racing maptrack_entry()
invocations to work). At the same time add read barriers to lockless
readers.
Additionally get_maptrack_handle() must not assume ->maptrack_head to
not change behind its back: Another handle may be put (updating only
->maptrack_tail) and then got or stolen (updating ->maptrack_head).
This is part of XSA-218.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4b78efa91c8ae3c42e14b8eaeaad773c5eb3b71a
master date: 2017-06-20 14:34:34 +0200
George Dunlap [Tue, 20 Jun 2017 13:41:07 +0000 (15:41 +0200)]
gnttab: Avoid potential double-put of maptrack entry
Each grant mapping for a particular domain is tracked by an in-Xen
"maptrack" entry. This entry is is referenced by a "handle", which is
given to the guest when it calls gnttab_map_grant_ref().
There are two types of mapping a particular handle can refer to:
GNTMAP_host_map and GNTMAP_device_map. A given
gnttab_unmap_grant_ref() call can remove either only one or both of
these entries. When a particular handle has no entries left, it must
be freed.
gnttab_unmap_grant_ref() loops through its grant unmap request list
twice. It first removes entries from any host pagetables and (if
appropraite) iommus; then it does a single domain TLB flush; then it
does the clean-up, including telling the granter that entries are no
longer being used (if appropriate).
At the moment, it's during the first pass that the maptrack flags are
cleared, but the second pass that the maptrack entry is freed.
Unfortunately this allows the following race, which results in a
double-free:
A: (pass 1) clear host_map
B: (pass 1) clear device_map
A: (pass 2) See that maptrack entry has no mappings, free it
B: (pass 2) See that maptrack entry has no mappings, free it #
Unfortunately, unlike the active entry pinning update, we can't simply
move the maptrack flag changes to the second half, because the
maptrack flags are used to determine if iommu entries need to be
added: a domain's iommu must never have fewer permissions than the
maptrack flags indicate, or a subsequent map_grant_ref() might fail to
add the necessary iommu entries.
Instead, free the maptrack entry in the first pass if there are no
further mappings.
This is part of XSA-218.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: b7f6cbb9d43f7384e1f38f8764b9a48216c8a525
master date: 2017-06-20 14:33:13 +0200
Jan Beulich [Tue, 20 Jun 2017 13:40:19 +0000 (15:40 +0200)]
gnttab: fix unmap pin accounting race
Once all {writable} mappings of a grant entry have been unmapped, the
hypervisor informs the guest that the grant entry has been released by
clearing the _GTF_{reading,writing} usage flags in the guest's grant
table as appropriate.
Unfortunately, at the moment, the code that updates the accounting
happens in a different critical section than the one which updates the
usage flags; this means that under the right circumstances, there may be
a window in time after the hypervisor reported the grant as being free
during which the grant referee still had access to the page.
Move the grant accounting code into the same critical section as the
reporting code to make sure this kind of race can't happen.
Jan Beulich [Tue, 20 Jun 2017 13:39:16 +0000 (15:39 +0200)]
x86/mm: disallow page stealing from HVM domains
The operation's success can't be controlled by the guest, as the device
model may have an active mapping of the page. If we nevertheless
permitted this operation, we'd have to add further TLB flushing to
prevent scenarios like
"Domains A (HVM), B (PV), C (PV); B->target==A
Steps:
1. B maps page X from A as writable
2. B unmaps page X without a TLB flush
3. A sends page X to C via GNTTABOP_transfer
4. C maps page X as pagetable (potentially causing a TLB flush in C,
but not in B)
At this point, X would be mapped as a pagetable in C while being
writable through a stale TLB entry in B."
A similar scenario could be constructed for A using XENMEM_exchange and
some arbitrary PV domain C then having this page allocated.
This is XSA-217.
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
master commit: fae7d5be8bb8b7a5b7005c4f3b812a47661a721e
master date: 2017-06-20 14:29:51 +0200
Zhongze Liu [Wed, 14 Jun 2017 01:11:48 +0000 (09:11 +0800)]
tools: fix several "format-truncation" warnings with GCC 7
GCC 7.1.1 complains that several buffers passed to snprintf() in xenpmd
and tools/ocmal/xc are too small to hold the largest possible resulting string,
which is calculated by adding up the maximum length of all the substrings.
The warnings are treated as errors by -Werror, and goes like this (abbreviated):
xenpmd.c:94:36: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
#define BATTERY_INFO_FILE_PATH "/proc/acpi/battery/%s/info"
^
xenpmd.c:113:13: note: ‘snprintf’ output between 25 and 280 bytes into a
destination of size 32
xenpmd.c:95:37: error: ‘%s’ directive output may be truncated writing up to
255 bytes into a region of size 13 [-Werror=format-truncation=]
#define BATTERY_STATE_FILE_PATH "/proc/acpi/battery/%s/state"
^
xenpmd.c:116:13: note: ‘snprintf’ output between 26 and 281 bytes into a
destination of size 32
xenctrl_stubs.c:65:15: error: ‘%s’ directive output may be truncated writing
up to 1023 bytes into a region of size 252 [-Werror=format-truncation=]
"%d: %s: %s", error->code,
^~
xenctrl_stubs.c:64:4: note: ‘snprintf’ output 5 or more bytes (assuming 1028)
into a destination of size 256
Enlarge the size of these buffers as suggested by the complier
(and slightly rounded) to fix the warnings.
No functional changes.
Signed-off-by: Zhongze Liu <blackskygg@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 2d78f78a14528752266982473c07118f1bc336e3)