xenbits.xensource.com Git - xen.git/log
xen.git
5 years agox86/cpuidle: clean up Cx dumping
Jan Beulich [Tue, 21 May 2019 06:31:47 +0000 (08:31 +0200)]
x86/cpuidle: clean up Cx dumping

Don't log the same global information once per CPU. Don't log the same
information (here: the currently active state) twice. Don't prefix
decimal numbers with zeros (giving the impression they're octal). Use
format specifiers matching the type of the corresponding expressions.
Don't split printk()-s without intervening new-lines.
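
The formatting points above can be illustrated with a minimal C sketch. The
struct cx_stat record and the format_cx() helper are hypothetical, and
standard snprintf() stands in for Xen's printk(); the sketch only shows
conversions that match their argument types, decimals without leading
zeros, and a whole line emitted by one call.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-state record; field names are illustrative only. */
struct cx_stat {
    unsigned int type;     /* C-state number, e.g. 2 for C2 */
    unsigned int latency;  /* exit latency */
    uint64_t usage;        /* number of times the state was entered */
};

/*
 * Format one state as a single, complete line.
 * - %u matches unsigned int, PRIu64 matches uint64_t.
 * - No "%02u"-style zero padding, so values cannot be mistaken for octal.
 * Returns what snprintf() returns (length the full line needs).
 */
int format_cx(char *buf, size_t len, const struct cx_stat *cx)
{
    assert(buf != NULL && cx != NULL);

    return snprintf(buf, len, "C%u: latency[%u] usage[%" PRIu64 "]\n",
                    cx->type, cx->latency, cx->usage);
}
```

A caller would pass a buffer and one record per state, and print global
information once outside any per-CPU loop.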

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuidle: push parked CPUs into deeper sleep states when possible
Jan Beulich [Tue, 21 May 2019 06:31:09 +0000 (08:31 +0200)]
x86/cpuidle: push parked CPUs into deeper sleep states when possible

When the mwait-idle driver isn't used, C-state information becomes
available only in the course of Dom0 starting up. Use the provided data
to allow parked CPUs to sleep in a more energy efficient way, by waking
them briefly (via NMI) once the data has been recorded.

This involves re-arranging how/when the governor's ->enable() hook gets
invoked. The changes there include addition of so far missing error
handling in the respective CPU notifier handlers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/idle: re-arrange dead-idle handling
Jan Beulich [Tue, 21 May 2019 06:30:23 +0000 (08:30 +0200)]
x86/idle: re-arrange dead-idle handling

In order to be able to wake parked CPUs from default_dead_idle() (for
them to then enter a different dead-idle routine), the function should
not itself loop. Move the loop into play_dead(), and use play_dead() as
well on the AP boot error path.

Furthermore, not the least considering the comment in play_dead(),
make sure NMI raised (for now this would be a bug elsewhere, but that's
about to change) against a parked or fully offline CPU won't invoke the
actual, full-blown NMI handler.

Note however that this doesn't make #MC any safer for fully offline
CPUs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: basic AVX512DQ testing
Jan Beulich [Tue, 21 May 2019 06:29:51 +0000 (08:29 +0200)]
x86emul: basic AVX512DQ testing

Test various of the insns which have been implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: basic AVX512BW testing
Jan Beulich [Tue, 21 May 2019 06:29:38 +0000 (08:29 +0200)]
x86emul: basic AVX512BW testing

Test various of the insns which have been implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512{BW,DQ} mask move insns
Jan Beulich [Tue, 21 May 2019 06:28:48 +0000 (08:28 +0200)]
x86emul: support AVX512{BW,DQ} mask move insns

Entries to the tables in evex-disp8.c are added despite these insns not
allowing for memory operands, with the goal of the tables giving a
complete picture of the supported EVEX-encoded insns in the end.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512{F,BW} integer shuffle insns
Jan Beulich [Tue, 21 May 2019 06:27:58 +0000 (08:27 +0200)]
x86emul: support AVX512{F,BW} integer shuffle insns

Also include vshuff{32x4,64x2} as being very similar to vshufi{32x4,64x2}.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512{F,BW,_VBMI} full permute insns
Jan Beulich [Tue, 21 May 2019 06:27:16 +0000 (08:27 +0200)]
x86emul: support AVX512{F,BW,_VBMI} full permute insns

Take the liberty and also correct the (public interface) name of the
AVX512_VBMI feature flag, on the assumption that no external consumer
has actually been using that flag so far. Furthermore make it have
AVX512BW instead of AVX512F as a prerequisite, for requiring full
64-bit mask registers (the upper 48 bits of which can't be accessed
other than through XSAVE/XRSTOR without AVX512BW support).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512{F,BW} integer unpack insns
Jan Beulich [Tue, 21 May 2019 06:23:57 +0000 (08:23 +0200)]
x86emul: support AVX512{F,BW} integer unpack insns

There's once again one extra twobyte_table[] entry which gets its Disp8
shift value set right away without getting support implemented just yet,
again to avoid needlessly splitting groups of entries.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuid: adjust dependencies of post-SSE ISA extensions
Jan Beulich [Tue, 21 May 2019 06:21:45 +0000 (08:21 +0200)]
x86/cpuid: adjust dependencies of post-SSE ISA extensions

Move AESNI, PCLMULQDQ, and SHA to SSE2, as all of them act on vectors of
integers, whereas plain SSE supports vectors of single precision floats
only. This is in line with how e.g. binutils and gcc treat them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/arm: traps: Avoid using BUG_ON() to check guest state in advance_pc()
Julien Grall [Wed, 15 May 2019 20:17:30 +0000 (21:17 +0100)]
xen/arm: traps: Avoid using BUG_ON() to check guest state in advance_pc()

The condition of the BUG_ON() in advance_pc() is pretty wrong because
the bits [26:25] and [15:10] have a different meaning between AArch32
and AArch64 state.

On AArch32, they are used to store PSTATE.IT. On AArch64, they are RES0
or used for new features (e.g. ARMv8.0-SSBS, ARMv8.5-BTI).

This means a 64-bit guest will hit the BUG_ON() if it is trying to use
any of these features.

More generally, RES0 means that the bits are reserved for future use. So
crashing the host is definitely not the right solution.

In this particular case, we only need to know whether the guest was in
32-bit mode and executing Thumb instructions. So replace the BUG_ON() with
a proper check.
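
A minimal sketch of such a check, assuming the architectural SPSR layout
(bit 4 set for AArch32, bit 5 for Thumb, IT state in bits [26:25] and
[15:10]). The macro and function names are illustrative and not taken from
the patch; the IT advance follows the architectural ITAdvance() rule.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_MODE_BIT  (UINT32_C(1) << 4)   /* saved state was AArch32 */
#define PSR_THUMB     (UINT32_C(1) << 5)   /* T bit: Thumb instruction set */
#define PSR_IT_MASK   ((UINT32_C(0x3) << 25) | (UINT32_C(0x3f) << 10))

/* The IT bits are only meaningful for a 32-bit guest running Thumb code. */
static int uses_thumb_it_state(uint32_t cpsr)
{
    return (cpsr & PSR_MODE_BIT) && (cpsr & PSR_THUMB);
}

/* Advance the IT state if applicable; otherwise leave every bit alone. */
uint32_t advance_it_state(uint32_t cpsr)
{
    uint32_t it;

    if ( !uses_thumb_it_state(cpsr) )
        return cpsr;   /* AArch64 or ARM state: bits have other meanings */

    /* Reassemble IT[7:0] from its two fields. */
    it = ((cpsr >> 25) & 0x3) | (((cpsr >> 10) & 0x3f) << 2);
    assert(it <= 0xff);

    if ( (it & 0x7) == 0 )
        it = 0;                                  /* last insn of the block */
    else
        it = (it & 0xe0) | ((it << 1) & 0x1f);   /* shift the mask left */

    cpsr &= ~PSR_IT_MASK;
    cpsr |= (it & 0x3) << 25;
    cpsr |= (it >> 2) << 10;

    return cpsr;
}
```

With this shape, a 64-bit guest that sets any of the formerly RES0 bits
simply takes the early return instead of tripping a host-side assertion.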

Reported-by: Lukas Jünger <lukas.juenger@ice.rwth-aachen.de>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agolibxl: fix libxl_domain_need_memory after 899433f149d
Wei Liu [Fri, 17 May 2019 17:05:55 +0000 (18:05 +0100)]
libxl: fix libxl_domain_need_memory after 899433f149d

After 899433f149d libxl needs to know the content of d_config to
determine which QEMU is used. The code is changed such that
libxl__domain_set_device_model needs to be called before
libxl__domain_build_info_setdefault.

This is fine for libxl code, but it is problematic for
libxl_domain_need_memory, which is the only public API that takes a
build_info. To avoid breaking its users, provide a compatibility
setting inside that function.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agocoverage: filter out libfdt.o and libelf.o
Viktor Mitin [Thu, 16 May 2019 13:20:16 +0000 (16:20 +0300)]
coverage: filter out libfdt.o and libelf.o

While the build system explicitly compiles any .init object without the
gcov option, this does not cover the libraries libfdt and libelf. This is
because the two libraries are built normally and then some sections have
.init appended.

As coverage will be enabled for libfdt, some of the GCOV counters may be
stored in a section that will be stripped after init. On Arm64, this
will reliably result in a crash when 'xencov' asks to reset the
counters.

Interestingly, on x86, all the counters for libelf seem to be in
sections that are not renamed so far, which is why this was not
discovered before. But this is a latent bug.

As the two libraries can only be used at boot, it is fine to disable
coverage for the entire library.

Reported-by: Viktor Mitin <viktor.mitin.19@gmail.com>
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Viktor Mitin <viktor.mitin.19@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
[julien: Reword commit message]
Signed-off-by: Julien Grall <julien.grall@arm.com>
5 years agox86/emul: dedup hvmemul_cpuid() and pv_emul_cpuid()
Andrew Cooper [Thu, 19 Jul 2018 16:40:06 +0000 (16:40 +0000)]
x86/emul: dedup hvmemul_cpuid() and pv_emul_cpuid()

They are identical, so provide a single x86emul_cpuid() instead.

As x86_emulate() now only uses the ->cpuid() hook for real CPUID instructions,
the hook can be omitted from all special-purpose emulation ops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/emul: Don't use the ->cpuid() hook for feature checks
Andrew Cooper [Thu, 19 Jul 2018 15:57:41 +0000 (15:57 +0000)]
x86/emul: Don't use the ->cpuid() hook for feature checks

For a release build of xen, this removes nearly 5k of code volume, and removes
a function pointer call from every instantiation.

  add/remove: 0/1 grow/shrink: 0/3 up/down: 0/-4822 (-4822)
  Function                                     old     new   delta
  adjust_bnd                                   260     244     -16
  x86_decode                                  8915    8890     -25
  vcpu_has.isra                                129       -    -129
  x86_emulate                               130040  125388   -4652
  Total: Before=3326565, After=3321743, chg -0.14%

Note that one corner case changes.  At the moment, it is possible for an
entity making direct DOMCTL_set_cpuid hypercalls to construct a policy with
max_leaf < 7, but feature bits set in leaf 7.  By default, libxc and libxl
don't do this, and the result is properly bounded by what the hardware is
capable of (so we won't start trying to use instructions which don't exist in
the CPU).

Previously, the cpuid() hook would end up hiding these features, but they may
still be set in cpuid_policy, and therefore might start being accepted by
x86_emulate().

This corner case will be fixed by the in-progress DOMCTL_set_cpu_policy work,
and a guest would only encounter the corner case if it was constructed in a
non-standard manner, and if it tried using an instruction for which it couldn't
see CPUID feature bits.  As such, it isn't a corner case which we need to
worry about.
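
The corner case can be sketched as follows. The structure and helper names
are hypothetical and far simpler than Xen's real cpuid_policy; only the
AVX2 bit position (CPUID leaf 7, subleaf 0, EBX bit 5) is architectural.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEAT_AVX2  (UINT32_C(1) << 5)   /* CPUID.7.0:EBX bit 5 */

/* Hypothetical, heavily reduced policy: just enough for the example. */
struct cpuid_policy_sketch {
    uint32_t max_leaf;    /* highest basic leaf reported to the guest */
    uint32_t leaf7_ebx;   /* raw feature bits stored for leaf 7 */
};

/* Old behaviour: a CPUID-style query, where leaves above max_leaf read 0. */
int has_avx2_via_hook(const struct cpuid_policy_sketch *p)
{
    uint32_t ebx;

    assert(p != NULL);
    ebx = (p->max_leaf >= 7) ? p->leaf7_ebx : 0;

    return (ebx & FEAT_AVX2) != 0;
}

/* New behaviour: read the stored bit directly, ignoring max_leaf. */
int has_avx2_via_policy(const struct cpuid_policy_sketch *p)
{
    assert(p != NULL);

    return (p->leaf7_ebx & FEAT_AVX2) != 0;
}
```

For a policy with max_leaf = 6 and the AVX2 bit set, the first helper
reports the feature as absent while the second reports it as present, which
is the difference described above.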

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/emul: Pass a full cpuid_policy into x86_emulate()
Andrew Cooper [Thu, 19 Jul 2018 15:52:06 +0000 (15:52 +0000)]
x86/emul: Pass a full cpuid_policy into x86_emulate()

This will be used to simplify feature checking.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: cover for clang's lack of support of -mpreferred-stack-boundary=<N>
Jan Beulich [Fri, 17 May 2019 15:32:20 +0000 (17:32 +0200)]
x86: cover for clang's lack of support of -mpreferred-stack-boundary=<N>

While clang supposedly supports -mstack-alignment=<N> instead, I'm not
using that alternative here due to being uncertain whether that's indeed
an exact equivalent of the gcc option. Only make use of the option
entirely conditional for now.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolibxc: elf_kernel loader: Remove check for shstrtab
Anthony PERARD [Fri, 17 May 2019 11:38:43 +0000 (12:38 +0100)]
libxc: elf_kernel loader: Remove check for shstrtab

This was probably useful as a sanity check when the "__xen_guest"
section was not legacy.  But now ELF notes are preferred and
"should live in a PT_NOTE segment" (elfnote.h).

This check is unnecessary as elf_xen_parse() from xen/common/libelf
will do the right thing and look for ELFNOTEs in the different places
in order of preference. elf_xen_parse() will still be able to also
look for the legacy "__xen_guest" section without the check in libxc.

This patch would allow writing a simpler ELF header for an OVMF blob
(which isn't an ELF) and allow it to be loaded as a PVH kernel. The
header only needs to declare two program segments:
- one to tell an ELF loader where to put the blob,
- one for a Xen ELFNOTE.

The ELFNOTE is to comply with the pvh design which wants the
XEN_ELFNOTE_PHYS32_ENTRY to declare a blob as compatible with the PVH
boot ABI.

Note that without the ELFNOTE, libxc will load an ELF but with
the plain ELF loader, which doesn't check for shstrtab.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/sched: fix csched2_deinit_pdata()
Juergen Gross [Fri, 17 May 2019 13:41:17 +0000 (15:41 +0200)]
xen/sched: fix csched2_deinit_pdata()

Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
introduced a regression when switching cpus between cpupools.

When assigning a cpu to a cpupool with credit2 being the default
scheduler csched2_deinit_pdata() is called for the credit2 private data
after the new scheduler's private data has been hooked to the per-cpu
scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
all per-cpu scheduler areas it knows of for removing the cpu from the
respective sibling masks including the area of the just moved cpu. This
will (depending on the new scheduler) either clobber the data of the
new scheduler or in case of sched_rt lead to a crash.

Avoid that by removing the cpu from the list of active cpus in credit2
data first.

The opposite problem is occurring when removing a cpu from a cpupool:
init_pdata() of credit2 will access the per-cpu data of the old
scheduler.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agovideo: fix handling framebuffer located above 4GB
Marek Marczykowski-Górecki [Fri, 17 May 2019 12:48:23 +0000 (14:48 +0200)]
video: fix handling framebuffer located above 4GB

On some machines (for example Thinkpad P52), UEFI GOP reports a
framebuffer located above 4GB (0x4000000000 on that machine). This
address does not fit in the {xen,dom0}_vga_console_info.u.vesa_lfb.lfb_base
field, which is 32 bits wide. The overflow here causes all kinds of memory
corruption when anything tries to write something on the screen,
starting with zeroing the whole framebuffer in vesa_init().

Fix this similarly to how it's done in Linux: add an ext_lfb_base field at
the end of the structure, to hold upper 32bits of the address. Since the
field is added at the end of the structure, it will work with older
Linux versions too (other than using possibly truncated address - no
worse than without this change). Thanks to ABI containing size of the
structure (start_info.console.dom0.info_size), Linux can detect when
this field is present and use it appropriately then.
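
How a consumer might combine the two fields can be sketched as below. The
struct lfb_info layout is a made-up subset for illustration, not the real
public structure; only the idea (a trailing field holding the upper 32
bits, used when the reported structure size covers it) comes from the text
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset only; NOT the real Xen console-info layout. */
struct lfb_info {
    uint32_t lfb_base;      /* low 32 bits of the framebuffer address */
    uint32_t lfb_size;
    uint32_t ext_lfb_base;  /* high 32 bits, appended at the very end */
};

/*
 * Return the full 64-bit framebuffer address. info_size is the structure
 * size announced by the producer; an older producer announces a smaller
 * size, in which case only the (possibly truncated) low half is used.
 */
uint64_t lfb_address(const struct lfb_info *info, size_t info_size)
{
    uint64_t base;

    assert(info != NULL);
    base = info->lfb_base;

    if ( info_size >= offsetof(struct lfb_info, ext_lfb_base) +
                      sizeof(info->ext_lfb_base) )
        base |= (uint64_t)info->ext_lfb_base << 32;

    return base;
}
```

With lfb_base = 0 and ext_lfb_base = 0x40 this yields 0x4000000000, the
address quoted for the Thinkpad P52.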

Since this changes the public interface and uses __XEN_INTERFACE_VERSION__,
bump __XEN_LATEST_INTERFACE_VERSION__.

Note: if/when backporting this change to Xen <= 4.12, #if in xen.h needs
to be extended with " || defined(__XEN__)".

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: adjust IOMMU list head initialization
Jan Beulich [Fri, 17 May 2019 12:43:43 +0000 (14:43 +0200)]
AMD/IOMMU: adjust IOMMU list head initialization

Do this statically, which will allow accessing the (empty) list even
without having come through acpi_ivrs_init().
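
The benefit of static initialization can be shown with a small sketch. The
list type, the LIST_HEAD_INIT() macro, and the iommu_list variable here are
local stand-ins written for this example, not Xen's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list head, for illustration only. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/*
 * Statically initialised: the head points at itself, so the list is a
 * valid empty list even if no runtime init function has ever been called.
 */
static struct list_head iommu_list = LIST_HEAD_INIT(iommu_list);

/* Walk any list and count its entries. */
size_t count_entries(const struct list_head *head)
{
    const struct list_head *pos;
    size_t n = 0;

    /* A zero-filled head that was never initialised would trip here. */
    assert(head != NULL && head->next != NULL);

    for ( pos = head->next; pos != head; pos = pos->next )
        n++;

    return n;
}

/* Safe to call at any time, including before any discovery code ran. */
size_t count_iommus(void)
{
    return count_entries(&iommu_list);
}
```

Had the head been initialised at runtime instead, an early caller would
have followed a NULL next pointer.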

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoIOMMU: patch certain indirect calls to direct ones
Jan Beulich [Fri, 17 May 2019 12:40:41 +0000 (14:40 +0200)]
IOMMU: patch certain indirect calls to direct ones

This is intentionally not touching hooks used rarely (or not at all)
during the lifetime of a VM, unless perhaps sitting on an error path
next to a call which gets changed (in which case I think the error
path better remains consistent with the respective main path).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
5 years agocpufreq: patch target() indirect call to direct one
Jan Beulich [Fri, 17 May 2019 12:40:12 +0000 (14:40 +0200)]
cpufreq: patch target() indirect call to direct one

This looks to be the only frequently executed hook; don't bother
patching any other ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuidle: patch some indirect calls to direct ones
Jan Beulich [Fri, 17 May 2019 12:39:38 +0000 (14:39 +0200)]
x86/cpuidle: patch some indirect calls to direct ones

For now only the ones used during entering/exiting of idle states are
converted. Additionally pm_idle{,_save} and lapic_timer_{on,off} can't
be converted, as they may get established rather late (when Dom0 is
already active).

Note that for patching to be deferred until after the pre-SMP initcalls
(from where cpuidle_init_cpu() runs the first time) the pointers need to
start out as NULL.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/genapic: patch indirect calls to direct ones
Jan Beulich [Fri, 17 May 2019 12:39:08 +0000 (14:39 +0200)]
x86/genapic: patch indirect calls to direct ones

For (I hope) obvious reasons only the ones used at runtime get
converted.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: patch ctxt_switch_masking() indirect call to direct one
Jan Beulich [Fri, 17 May 2019 12:38:38 +0000 (14:38 +0200)]
x86: patch ctxt_switch_masking() indirect call to direct one

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: patch vINTR indirect calls through hvm_funcs to direct ones
Jan Beulich [Fri, 17 May 2019 12:38:07 +0000 (14:38 +0200)]
x86/HVM: patch vINTR indirect calls through hvm_funcs to direct ones

While not strictly necessary, change the VMX initialization logic to
update the function table in start_vmx() from NULL rather than to NULL,
to make more obvious that we won't ever change an already (explicitly)
initialized function pointer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: patch indirect calls through hvm_funcs to direct ones
Jan Beulich [Fri, 17 May 2019 12:37:25 +0000 (14:37 +0200)]
x86/HVM: patch indirect calls through hvm_funcs to direct ones

This is intentionally not touching hooks used rarely (or not at all)
during the lifetime of a VM, like {domain,vcpu}_initialise or cpu_up,
as well as nested, VM event, and altp2m ones (they can all be done
later, if so desired). Virtual Interrupt delivery ones will be dealt
with in a subsequent patch.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: infrastructure to allow converting certain indirect calls to direct ones
Jan Beulich [Fri, 17 May 2019 12:36:36 +0000 (14:36 +0200)]
x86: infrastructure to allow converting certain indirect calls to direct ones

In a number of cases the targets of indirect calls get determined once
at boot time. In such cases we can replace those calls with direct ones
via our alternative instruction patching mechanism.

Some of the targets (in particular the hvm_funcs ones) get established
only in pre-SMP initcalls, making necessary a second pass through the
alternative patching code. Therefore some adjustments beyond the
recognition of the new special pattern are necessary there.

Note that patching such sites more than once is not supported (and the
supplied macros also don't provide any means to do so).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
5 years agox86: clone Linux'es ASM_CALL_CONSTRAINT
Jan Beulich [Fri, 17 May 2019 12:35:52 +0000 (14:35 +0200)]
x86: clone Linux'es ASM_CALL_CONSTRAINT

While we don't mean to run their objtool over our generated code, it
still seems desirable to avoid calls to further functions before a
function's frame pointer is set up.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
5 years agox86: reduce general stack alignment to 8
Jan Beulich [Fri, 17 May 2019 12:35:14 +0000 (14:35 +0200)]
x86: reduce general stack alignment to 8

We don't need bigger alignment except when calling EFI boot or runtime
services functions (and we don't guarantee that either, as explained
close to the top of xen/common/efi/runtime.c in the struct efi_rs_state
declaration). Hence if the compiler supports reducing stack alignment
from the ABI compatible 16 bytes (gcc 7 and newer), do so wherever
possible.

The EFI case itself is largely dealt with already (actually forcing
32-byte alignment) as a result of commit f6b7fedc89 ("x86/EFI: meet
further spec requirements for runtime calls"). However, as explained in
the description of that earlier change, without using
-mincoming-stack-boundary=3 (which we don't want) we still have to make
the compiler assume 16-byte stack boundaries for CUs making EFI calls in
order to keep the compiler from aligning the stack, but then placing an
odd number of 8-byte objects on it, resulting in a mis-aligned outgoing
stack.

This as a side effect yields some code size reduction, since for a
number of sufficiently simple non-leaf functions the stack adjustment
(by 8, when there are no local stack variables at all) gets dropped
altogether. I notice exceptions though, for example in guest_cpuid(),
where in a release build gcc 8.2 now decides to set up a frame pointer
(without ever using %rbp); I consider this a compiler quirk which we
should leave to the compiler folks to address eventually.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
5 years agoxen:arm: we never get into schedule_tail() with prev==current
Andrii Anisov [Wed, 8 May 2019 09:59:38 +0000 (12:59 +0300)]
xen:arm: we never get into schedule_tail() with prev==current

ARM's schedule_tail() is called from two places: context_switch() and
continue_new_vcpu(). Both functions are always called with
prev!=current. So replace the corresponding check in schedule_tail()
with an ASSERT(), which is the development (debug) build guard.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: Add early printk support for SCIFA compatible UARTs
Oleksandr Tyshchenko [Thu, 2 May 2019 17:00:22 +0000 (20:00 +0300)]
xen/arm: Add early printk support for SCIFA compatible UARTs

This patch makes it possible to use the existing early printk code
for the Renesas "Stout" board based on the R-Car H2 SoC (SCIFA).

The "EARLY_PRINTK_VERSION" for that board should be 'A':
CONFIG_EARLY_PRINTK=scif,0xe6c40000,A

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: Extend SCIF early prink code to handle other interfaces
Oleksandr Tyshchenko [Thu, 2 May 2019 17:00:21 +0000 (20:00 +0300)]
xen/arm: Extend SCIF early prink code to handle other interfaces

Extend the early printk code to be able to handle other SCIF(X)
compatible interfaces as well. These interfaces have a lot in common,
but mostly differ in offsets and bits for some registers.

Introduce "EARLY_PRINTK_VERSION" config option to choose which
interface version should be used (to properly apply register offsets).

Please note, nothing has been technically changed for Renesas "Lager"
and other supported boards (SCIF).

The "EARLY_PRINTK_VERSION" option for that board should be empty:
CONFIG_EARLY_PRINTK=scif,0xe6e60000

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agopage-alloc: accompany BUG() with printk()
Jan Beulich [Thu, 16 May 2019 11:43:54 +0000 (13:43 +0200)]
page-alloc: accompany BUG() with printk()

Log information likely relevant for understanding why the BUG()s were
triggering.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86emul: add support for missing {,V}PMADDWD insns
Jan Beulich [Thu, 16 May 2019 11:43:17 +0000 (13:43 +0200)]
x86emul: add support for missing {,V}PMADDWD insns

Their pre-AVX512 incarnations have clearly been overlooked during much
earlier work. Their memory access pattern is entirely standard, so no
specific tests get added to the harness.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoAMD/IOMMU: don't open-code for_each_amd_iommu()
Jan Beulich [Thu, 16 May 2019 11:41:39 +0000 (13:41 +0200)]
AMD/IOMMU: don't open-code for_each_amd_iommu()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolibxl: fix regression introduced in 5c883cf036cf
Wei Liu [Thu, 16 May 2019 09:11:53 +0000 (10:11 +0100)]
libxl: fix regression introduced in 5c883cf036cf

A few lines were erroneously deleted during rebase which caused domain
destruction to fail.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoDrop blktap2
Wei Liu [Wed, 15 May 2019 15:19:57 +0000 (16:19 +0100)]
Drop blktap2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools: remove blktap2 related code and documentation
Wei Liu [Mon, 15 Aug 2016 10:32:56 +0000 (11:32 +0100)]
tools: remove blktap2 related code and documentation

Blktap2 has been effectively dead for a few years.

Notable changes in this patch:

0. Unhook blktap2 from build system
1. libxl no longer supports TAP disk backend, with appropriate assertions
   added and some code paths now return ERROR_FAIL
2. Tap is no longer a supported backend
3. Remove blktap2 entry from MAINTAINERS

A patch to remove blktap2 directory will come later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoINSTALL: remove duplicate sentence
Wei Liu [Tue, 14 May 2019 14:22:33 +0000 (15:22 +0100)]
INSTALL: remove duplicate sentence

The same sentence is repeated in the next paragraph.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoREADME: document requirement about python
Wei Liu [Tue, 14 May 2019 14:22:32 +0000 (15:22 +0100)]
README: document requirement about python

Provide information on what is expected from the build system
regarding python.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agogitignore: ignore .vscode directory
Wei Liu [Tue, 14 May 2019 14:22:31 +0000 (15:22 +0100)]
gitignore: ignore .vscode directory

The directory is created by Visual Studio Code editor to store its
local state.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agogitlab-ci: allow specifying base and tip in build test
Wei Liu [Wed, 15 May 2019 10:00:38 +0000 (11:00 +0100)]
gitlab-ci: allow specifying base and tip in build test

We will soon provide this new capability to humans and automated
systems.

The default behaviour is retained: tip and base are passed by Gitlab
CI.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years agopvshim: make PV shim build selectable from configure
Roger Pau Monne [Tue, 14 May 2019 13:59:22 +0000 (15:59 +0200)]
pvshim: make PV shim build selectable from configure

So a user can decide whether to compile a PV shim as part of the tools
build. Note that the default behavior is preserved, which is to build
a PV shim when the target or host (if target is unset) architecture is
64bit x86.

Requested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: run autogen.sh ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
5 years agolibxl: make vkbd tunable for HVM guests
Eslam Elnikety [Tue, 14 May 2019 08:43:25 +0000 (08:43 +0000)]
libxl: make vkbd tunable for HVM guests

Each HVM guest currently gets a vkbd frontend/backend pair (c/s ebbd2561b4c).
This consumes host resources unnecessarily for guests that have no use for
vkbd. Make this behaviour tunable to allow an administrator to choose. The
commit retains the current behaviour -- HVM guests still get vkbd unless
specified otherwise.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years agolibxl: fix migration of PV and PVH domUs with and without qemu
Olaf Hering [Tue, 14 May 2019 08:05:58 +0000 (10:05 +0200)]
libxl: fix migration of PV and PVH domUs with and without qemu

If a domU has a qemu-xen instance attached, it is required to call qemu's
"xen-save-devices-state" method. Without it, the receiving side of a PV or
PVH migration may be unable to lock the image:

xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock
error: Failed to get "write" lock
xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed
initialise() failed

To fix this bug, libxl__domain_suspend_device_model() and
libxl__domain_resume_device_model() have to be called not only for HVM,
but also if the active device_model is QEMU_XEN.

Unfortunately, libxl__domain_build_info_setdefault() used to hardcode
b_info->device_model_version to QEMU_XEN if it does not know it any
better. As a result libxl__device_model_version_running() will return
incorrect values. This breaks domUs without a device_model.
libxl__qmp_stop() would wait 10 seconds in qmp_open() for a qemu that
will never appear. During this long timeframe the domU remains in state
paused on the sending side. As a result network connections may be
dropped. Once this bug is fixed as well, by just removing the assumption
that every domU has a QEMU_XEN, there is no code to actually initialise
b_info->device_model_version.

There is a helper function libxl__need_xenpv_qemu(), which is used in
various places to decide if a device_model has to be spawned. This
function can not be used as is, just to fill device_model_version,
because store_libxl_entry() was already called earlier.

Introduce LIBXL_DEVICE_MODEL_VERSION_NONE for PV and PVH that have no
need for a device_model to make the state explicit. Indicate this new
state via LIBXL_HAVE macro in libxl.h.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years agolibxl: add helper function to set device_model_version
Olaf Hering [Tue, 14 May 2019 07:27:41 +0000 (09:27 +0200)]
libxl: add helper function to set device_model_version

An upcoming change will set the value of device_model_version properly
also for the non-HVM case.

Move existing code to new function libxl__domain_set_device_model.
Move also initialization for device_model_stubdomain to that function.
Make sure libxl__domain_build_info_setdefault is called with
device_model_version set.

Update libxl__spawn_stub_dm() and initiate_domain_create() to call the
new function prior to libxl__domain_build_info_setdefault() because
device_model_version is expected to be initialized.
libxl_domain_need_memory() needs no update because it does not have a
d_config available anyway, and the callers provide a populated b_info.

The upcoming change needs a full libxl_domain_config, and the existing
libxl__domain_build_info_setdefault has just a libxl_domain_build_info
to work with.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years ago x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM
Razvan Cojocaru [Tue, 14 May 2019 16:13:57 +0000 (19:13 +0300)]
x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM

All its callers live inside #ifdef CONFIG_HVM sections.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago x86/spec-ctrl: Introduce options to control VERW flushing
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers: the Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures is flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) are
used:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SP only, and parts with any other combination of vulnerabilities.

 * SP only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data Sampling
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/spec-ctrl: Misc non-functional cleanup
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of memory clobber with a further
   barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago IOMMU: avoid NULL deref in iommu_lookup_page()
Jan Beulich [Tue, 14 May 2019 14:22:17 +0000 (16:22 +0200)]
IOMMU: avoid NULL deref in iommu_lookup_page()

Luckily the function currently has no callers - it would have called
through NULL for both Arm and x86/AMD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
5 years ago x86/mm: subsume set_gpfn_from_mfn() into guest_physmap_add_page()
Jan Beulich [Tue, 14 May 2019 14:21:33 +0000 (16:21 +0200)]
x86/mm: subsume set_gpfn_from_mfn() into guest_physmap_add_page()

The two callers in common/memory.c currently call set_gpfn_from_mfn()
themselves, so moving the call into guest_physmap_add_page() helps
tidy their code.

The two callers in common/grant_table.c fail to make that call alongside
the one to guest_physmap_add_page(), so will actually get fixed by the
change.

Other (x86) callers are HVM only and are hence unaffected by a change
to the function's !paging_mode_translate() part.

Sadly this isn't enough yet to drop Arm's dummy macro, as there's one
more use in page_alloc.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago x86/mm: make guest_physmap_add_entry() HVM-only
Jan Beulich [Tue, 14 May 2019 14:20:06 +0000 (16:20 +0200)]
x86/mm: make guest_physmap_add_entry() HVM-only

Lift its !paging_mode_translate() part into guest_physmap_add_page()
(which is what common code calls), eliminating the dummy use of a
(HVM-only really) P2M type in the PV case.

Suggested-by: George Dunlap <George.Dunlap@eu.citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago x86/mm: short-circuit HVM-only mode flags when !HVM
Jan Beulich [Tue, 14 May 2019 14:18:58 +0000 (16:18 +0200)]
x86/mm: short-circuit HVM-only mode flags when !HVM

#define-ing them to zero allows better code generation in this case,
and paves the way for more DCE, allowing certain functions to be left
just declared, but not defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago iommu: trivial re-organisation to avoid unnecessary test
Paul Durrant [Mon, 13 May 2019 15:50:46 +0000 (17:50 +0200)]
iommu: trivial re-organisation to avoid unnecessary test

An 'if ( !iommu_enabled )' followed by an 'if ( iommu_enabled )' with
only a printk() in between seems a little silly. Move the printk() and
use 'else' instead.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago memory: restrict XENMEM_remove_from_physmap to translated guests
Jan Beulich [Mon, 13 May 2019 15:49:39 +0000 (17:49 +0200)]
memory: restrict XENMEM_remove_from_physmap to translated guests

The commit re-introducing it (14eb3b41d0 ["xen: reinstate previously
unused XENMEM_remove_from_physmap hypercall"]) as well as the one having
originally introduced it (d818f3cb7c ["hvm: Use main memory for video
memory"]) and the one then purging it again (78c3097e4f ["Remove unused
XENMEM_remove_from_physmap"]) make clear that this operation is intended
for use on HVM (i.e. translated) guests only. Restrict it at least as
much, because for PV guests documentation (in the public header) does
not even match the implementation: It talks about GPFN as input, but
get_page_from_gfn() assumes a GMFN in the non-translated case (and hands
back the value passed in).

Also lift the check in XENMEM_add_to_physmap{,_batch} handling up
directly into top level hypercall handling, and clarify things in the
public header accordingly.

Take the liberty and also replace a pointless use of "current" with a
more efficient use of an existing local variable (or function parameter
to be precise).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago x86/mm: free_page_type() is PV-only
Jan Beulich [Mon, 13 May 2019 14:42:34 +0000 (16:42 +0200)]
x86/mm: free_page_type() is PV-only

While it already has a CONFIG_PV wrapped around its entire body, it is
still uselessly invoking mfn_to_gmfn(), which is about to be replaced.
Avoid morphing this code into even more suspicious shape and remove the
effectively dead code - translated mode has been made impossible for PV
quite some time ago.

Adjust and extend the assertions at the same time: The original
ASSERT(!shadow_mode_refcounts(owner)) really means
ASSERT(!shadow_mode_enabled(owner) || !paging_mode_refcounts(owner)),
which isn't what we want here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago x86/IRQ: avoid UB (or worse) in trace_irq_mask()
Jan Beulich [Mon, 13 May 2019 14:41:03 +0000 (16:41 +0200)]
x86/IRQ: avoid UB (or worse) in trace_irq_mask()

Dynamically allocated CPU mask objects may be smaller than cpumask_t, so
copying has to be restricted to the actual allocation size. This is
particularly important since the function doesn't bail early when tracing
is not active, so even production builds would be affected by potential
misbehavior here.
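
The size restriction can be sketched in plain C as follows. This is an
illustration under assumptions, not the Xen code: NR_CPUS, nr_cpu_ids and
the helper names are stand-ins chosen for the example.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 256   /* compile-time maximum (assumed value) */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; } cpumask_t;

static unsigned int nr_cpu_ids = 8;   /* CPUs actually present (assumed) */

/* Bytes backing a dynamically allocated mask: enough for nr_cpu_ids bits. */
static size_t cpumask_alloc_bytes(void)
{
    return BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long);
}

/* Allocate a mask sized for nr_cpu_ids only; it may be smaller than cpumask_t. */
static cpumask_t *alloc_small_mask(void)
{
    return calloc(1, cpumask_alloc_bytes());
}

/* Fill a full-size destination without reading past the end of the source. */
static void copy_mask(cpumask_t *dst, const cpumask_t *src)
{
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, cpumask_alloc_bytes());
}
```

Copying sizeof(cpumask_t) bytes instead would read beyond the allocation
whenever nr_cpu_ids is smaller than NR_CPUS.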

Take the opportunity and also
- use initializers instead of assignment + memset(),
- constify the cpumask_t input pointer,
- u32 -> uint32_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago public/tmem.h: fix version number in comment
Wei Liu [Mon, 13 May 2019 13:47:12 +0000 (14:47 +0100)]
public/tmem.h: fix version number in comment

The version number has been changed above due to rebasing onto 4.13
branch, but the one in the matching comment was left unchanged.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago install pkgconfig files into libdir
Olaf Hering [Mon, 25 Mar 2019 16:00:10 +0000 (17:00 +0100)]
install pkgconfig files into libdir

Most pkgconfig files contain a Libs: variable, which is either /usr/lib
or /usr/lib64. If a 32bit and a 64bit variant of xen libraries is
installed, the last one wins. As a result compiling for the other
bitsize will fail.

Instead of sharedir use libdir as install target. This matches both the
documentation and the expected result.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years ago docs/xl: Clarify documentation for mem-max and mem-set
George Dunlap [Mon, 8 Apr 2019 11:09:43 +0000 (12:09 +0100)]
docs/xl: Clarify documentation for mem-max and mem-set

mem-set is the primary command that users will need to use and
understand.  Move it first, and clarify the wording; also specify that
you can't set the target higher than maxmem from the domain config.

mem-max is actually a pretty useless command at the moment.  Clarify
that users are not expected to use it; and document all of its quirky
behavior.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years ago gitlab-ci: avoid deleting build-each-commit-gcc.log
Wei Liu [Tue, 7 May 2019 16:11:01 +0000 (17:11 +0100)]
gitlab-ci: avoid deleting build-each-commit-gcc.log

072a96c4901 used `git clean -ffdx` which caused the log to be deleted.

Generate the log in the parent directory then move it back.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago tools/Makefile: Fix build of QEMU, remove --source-path
Anthony PERARD [Thu, 2 May 2019 16:25:50 +0000 (17:25 +0100)]
tools/Makefile: Fix build of QEMU, remove --source-path

Following QEMU's commit 79d77bcd36 (configure: Remove --source-path
option), Xen's build system fails to build qemu-xen. The --source-path
option gives redundant information about the location of the sources
so simply remove it. (configure already looks at its $0 to find the
source-path.)

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago python: Adjust xc_physinfo wrapper for updated virt_caps bits
Marek Marczykowski-Górecki [Mon, 29 Apr 2019 22:42:52 +0000 (00:42 +0200)]
python: Adjust xc_physinfo wrapper for updated virt_caps bits

Commit f089fddd94 "xen: report PV capability in sysctl and use it in
toolstack" changed meaning of virt_caps bit 1 - previously it was
"directio", but was changed to "pv" and "directio" was moved to bit 2.
Adjust python wrapper to use #defines for the bits values, and add
reporting of both "pv_directio" and "hvm_directio".
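
A sketch of the bit decoding, for illustration. The macro and struct names
are hypothetical, bit 0 meaning "hvm" is an assumption, and deriving the
two directio flags by combining each mode with the directio bit is also an
assumption about how the wrapper reports them:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bits 1 and 2 follow the description above. */
#define PHYSCAP_HVM      (1u << 0)   /* assumed */
#define PHYSCAP_PV       (1u << 1)
#define PHYSCAP_DIRECTIO (1u << 2)

struct virt_caps_report {
    bool hvm, pv, hvm_directio, pv_directio;
};

/* Decode the raw capability word using named bits rather than literals. */
static struct virt_caps_report decode_virt_caps(uint32_t caps)
{
    bool directio = caps & PHYSCAP_DIRECTIO;
    struct virt_caps_report r = {
        .hvm = caps & PHYSCAP_HVM,
        .pv  = caps & PHYSCAP_PV,
    };

    r.hvm_directio = r.hvm && directio;
    r.pv_directio  = r.pv && directio;
    return r;
}
```

Using named bits means a future renumbering only needs the definitions
updated, not every consumer.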

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago tools/include: propagate python interpreter path
Roger Pau Monne [Wed, 24 Apr 2019 09:20:37 +0000 (11:20 +0200)]
tools/include: propagate python interpreter path

Propagate it to the Makefile that generates the cpuid policy. Without this
fix, if the tools python interpreter differs from the default 'python', it
won't be correctly propagated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years ago libxl: update prototype of libxl__device_vkb_dm_needed
Olaf Hering [Wed, 10 Apr 2019 10:26:34 +0000 (12:26 +0200)]
libxl: update prototype of libxl__device_vkb_dm_needed

Align code to match other usage of device_dm_needed_fn_t:
receive a void pointer, assign to expected type and use it accordingly.

No functional change expected.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years ago docs: remove tmem related text
Wei Liu [Tue, 27 Nov 2018 18:12:00 +0000 (18:12 +0000)]
docs: remove tmem related text

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen: remove tmem from hypervisor
Wei Liu [Wed, 28 Nov 2018 12:13:15 +0000 (12:13 +0000)]
xen: remove tmem from hypervisor

This patch removes all tmem related code and CONFIG_TMEM from the
hypervisor. Also remove tmem hypercalls from the default XSM policy.

It is written as if tmem is disabled and tmem freeable pages is 0.

We will need to keep public/tmem.h around forever to avoid breaking
guests.  Remove the hypervisor only part and put guest visible part
under a xen version check. Take the chance to remove trailing
whitespaces.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago tools: remove tmem code and commands
Wei Liu [Tue, 27 Nov 2018 17:53:00 +0000 (17:53 +0000)]
tools: remove tmem code and commands

Remove all tmem related code in libxc.

Leave some stubs in libxl in case anyone has linked to those functions
before the removal.

Remove all tmem related commands in xl, all tmem related code in other
utilities we ship.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen/arm: drivers: scif: Add support for SCIFA compatible UARTs
Oleksandr Tyshchenko [Thu, 2 May 2019 17:00:20 +0000 (20:00 +0300)]
xen/arm: drivers: scif: Add support for SCIFA compatible UARTs

For the driver to be able to handle the SCIFA interface as well,
this patch just adds the following:
- SCIFA related macros
- New element in "port_params" array to keep SCIFA specific things
- SCIFA compatible string

This patch makes it possible to use the existing driver for the Renesas
"Stout" board based on the R-Car H2 SoC (SCIFA).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: drivers: scif: Extend driver to handle other interfaces
Oleksandr Tyshchenko [Thu, 2 May 2019 17:00:19 +0000 (20:00 +0300)]
xen/arm: drivers: scif: Extend driver to handle other interfaces

Extend the driver to be able to handle other SCIF(X) compatible
interfaces as well. These interfaces have a lot in common,
but mostly differ in offsets and bits for some registers.

For example, the main differences between the SCIF and SCIFA interfaces
from the "scif-uart" driver's point of view are:
- Register offsets: the serial status and receive/transmit FIFO data
  registers have different offsets
- Internal FIFO size: 64 bytes for SCIFA and 16 bytes for SCIF
- Overrun bit location: serial status register for SCIFA and
  dedicated line status register for SCIF

Introduce "port_params" array to keep interface specific things.

The "data" field in struct dt_device_match is used for recognizing
what interface is present on a target board.
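
A minimal sketch of such a parameter table, to show the shape of the idea.
The type names, the compatible strings and the lookup helper are
hypothetical; only the FIFO sizes and the overrun bit placement come from
the description above, and register offsets are left out because they are
not given here:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum port_type_sketch { PORT_SCIF, PORT_SCIFA, NR_PORT_TYPES };

struct port_params_sketch {
    unsigned int fifo_size;        /* bytes, as stated in the text above */
    bool overrun_in_status_reg;    /* true: serial status; false: line status */
};

static const struct port_params_sketch port_params[NR_PORT_TYPES] = {
    [PORT_SCIF]  = { .fifo_size = 16, .overrun_in_status_reg = false },
    [PORT_SCIFA] = { .fifo_size = 64, .overrun_in_status_reg = true  },
};

/* Stand-in for a device-tree match entry carrying a "data" pointer. */
struct match_sketch {
    const char *compatible;
    const void *data;
};

static const struct match_sketch matches[] = {
    { "example,scif",  &port_params[PORT_SCIF]  },   /* placeholder string */
    { "example,scifa", &port_params[PORT_SCIFA] },   /* placeholder string */
};

/* Pick the parameter set for the interface named by the compatible string. */
static const struct port_params_sketch *lookup_port_params(const char *compatible)
{
    for ( size_t i = 0; i < sizeof(matches) / sizeof(matches[0]); i++ )
        if ( strcmp(matches[i].compatible, compatible) == 0 )
            return matches[i].data;

    return NULL;
}
```

The rest of the driver can then read per-interface values through the
returned pointer instead of hard-coding them.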

Please note, nothing has been technically changed for Renesas "Lager"
and other supported boards (SCIF).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: Misc improvements to do_common_cpu_on()
Andrew Cooper [Wed, 24 Apr 2019 18:10:58 +0000 (19:10 +0100)]
xen/arm: Misc improvements to do_common_cpu_on()

 * Use domain_vcpu() rather than opencoding the lookup.  Amongst other things,
   domain_vcpu() is spectre-v1-safe.
 * Unlock the domain immediately after arch_set_info_guest() completes.  There
   is no need for free_vcpu_guest_context() to be within the critical region,
   and moving the call simplifies the error case.

No practical change in functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm64: __cmpxchg and __cmpxchg_mb should always be inline
Julien Grall [Wed, 27 Mar 2019 18:45:31 +0000 (18:45 +0000)]
xen/arm64: __cmpxchg and __cmpxchg_mb should always be inline

Currently __cmpxchg_mb and __cmpxchg are only marked inline. The
compiler is free to decide not to honor the inline. This can result in
generated code that uses __bad_cmpxchg, leading to a link failure.

This was caught by Clang 8.0.
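
The pattern can be sketched as below. This is an illustration, not the
arm64 implementation: it uses the GCC/Clang __atomic builtins instead of
assembly, and bad_cmpxchg_size() is given a body so the sketch links at any
optimisation level, whereas the pattern described above leaves the "bad"
function undefined on purpose.

```c
#include <stdint.h>
#include <stdlib.h>

#define always_inline inline __attribute__((__always_inline__))

/* Defined here only so the sketch always links; see the note above. */
static void bad_cmpxchg_size(void)
{
    abort();
}

/*
 * With forced inlining, "size" is a compile-time constant at every call
 * site, so the unsupported-size path can be discarded by the compiler.
 */
static always_inline uint64_t
cmpxchg_sketch(volatile void *ptr, uint64_t old, uint64_t new, int size)
{
    switch ( size )
    {
    case 4:
    {
        uint32_t expected = (uint32_t)old;

        __atomic_compare_exchange_n((volatile uint32_t *)ptr, &expected,
                                    (uint32_t)new, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;   /* value found in memory */
    }
    case 8:
    {
        uint64_t expected = old;

        __atomic_compare_exchange_n((volatile uint64_t *)ptr, &expected,
                                    new, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return expected;
    }
    }

    bad_cmpxchg_size();    /* only reachable for unsupported sizes */
    return old;
}
```

If the function were merely "inline" and the compiler emitted it out of
line, the call to the bad-size function would survive for every caller.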

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: guest_walk: Avoid theoritical unitialized value in get_top_bit
Julien Grall [Wed, 27 Mar 2019 18:45:28 +0000 (18:45 +0000)]
xen/arm: guest_walk: Avoid theoritical unitialized value in get_top_bit

Clang 8.0 throws an error in the get_top_bit function:

guest_walk.c:328:15: error: variable 'topbit' is used uninitialized
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
    else if ( is_64bit_domain(d) )
              ^~~~~~~~~~~~~~~~~~

This is happening because clang thinks that is_32bit_domain(d) is not
the exact inverse of is_64bit_domain(d). So it expects an else case to
handle the case where the latter call is false.

In other parts of the code dealing with the difference between 32-bit and
64-bit domains, we usually use if ( is_XXbit_domain ) ... else ...

So use the same pattern here.
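
The shape of the change, as a standalone sketch. The struct, the predicate
and the returned values (31 and 63) are placeholders for illustration, not
what get_top_bit() actually computes:

```c
#include <stdbool.h>

struct domain_sketch { bool is_32bit; };   /* hypothetical stand-in */

static bool is_32bit_domain_sketch(const struct domain_sketch *d)
{
    return d->is_32bit;
}

/*
 * The form that triggers the warning has no final else:
 *
 *     if ( is_32bit_domain(d) )       topbit = ...;
 *     else if ( is_64bit_domain(d) )  topbit = ...;
 *
 * A plain if/else assigns on every path, so nothing is left uninitialized.
 */
static unsigned int top_bit_sketch(const struct domain_sketch *d)
{
    unsigned int topbit;

    if ( is_32bit_domain_sketch(d) )
        topbit = 31;   /* placeholder value */
    else
        topbit = 63;   /* placeholder value */

    return topbit;
}
```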

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm64: sysreg: Implement the 32-bit helpers using the 64-bit helpers
Julien Grall [Wed, 27 Mar 2019 18:45:25 +0000 (18:45 +0000)]
xen/arm64: sysreg: Implement the 32-bit helpers using the 64-bit helpers

Clang is pickier than GCC about the register size in asm statements. It
expects the register size to match the value size.

The msr/mrs instructions expect a 64-bit register. This means the
implementation of the 32-bit helpers is not correct. The easiest
solution is to implement the 32-bit helpers using the 64-bit helpers.
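
The layering can be sketched without any assembly. In this illustration a
plain variable stands in for the system register, and all helper names are
hypothetical; on real hardware only the 64-bit pair would contain the
mrs/msr instructions:

```c
#include <stdint.h>

static uint64_t fake_sysreg;   /* stands in for a system register */

/* The only helpers that touch the "register", always with a 64-bit value. */
static inline uint64_t read_sysreg64_sketch(void)
{
    return fake_sysreg;
}

static inline void write_sysreg64_sketch(uint64_t v)
{
    fake_sysreg = v;
}

/* 32-bit helpers are thin wrappers: truncate on read, zero-extend on write. */
static inline uint32_t read_sysreg32_sketch(void)
{
    return (uint32_t)read_sysreg64_sketch();
}

static inline void write_sysreg32_sketch(uint32_t v)
{
    write_sysreg64_sketch((uint64_t)v);
}
```

This way the operand handed to the instruction always has the width the
instruction expects, regardless of the width the caller works with.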

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano <sstabellini@kernel.org>
5 years ago xen/arm: zynqmp: Fix header guard for xilinx-zynqmp-eemi.h
Julien Grall [Wed, 27 Mar 2019 18:45:22 +0000 (18:45 +0000)]
xen/arm: zynqmp: Fix header guard for xilinx-zynqmp-eemi.h

The header guard for xilinx-zynqmp-eemi.h is not followed by a #define
of the macro used in the guard.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Add Amlogic Meson SoCs earlyprintk support
Amit Singh Tomar [Sun, 14 Apr 2019 17:50:06 +0000 (23:20 +0530)]
xen/arm: Add Amlogic Meson SoCs earlyprintk support

This patch adds earlyprintk support for Amlogic Meson SoC based
boards.

ATF[1] and U-boot[2] already initialize the UART for us, so there is no need to do it again.

Tested With:
 http://wiki.friendlyarm.com/wiki/index.php/NanoPi_K2

[1]: https://github.com/ARM-software/arm-trusted-firmware/blob/master/drivers/meson/console/aarch64/meson_console.S#L92
[2]: https://github.com/u-boot/u-boot/blob/master/drivers/serial/serial_meson.c#L44

Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
5 years ago xen/arm64: head: Combine lsl and str instructions in a single one
Julien Grall [Tue, 19 Mar 2019 23:27:53 +0000 (23:27 +0000)]
xen/arm64: head: Combine lsl and str instructions in a single one

We can optimize the assembly code a bit by combining the 2 instructions
into a single one. This is likely not going to make the code faster, but
will likely make the assembly easier to read.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Clarify usage of earlyprintk for Lager board
Oleksandr Tyshchenko [Wed, 17 Apr 2019 14:59:31 +0000 (17:59 +0300)]
xen/arm: Clarify usage of earlyprintk for Lager board

The current sentence is not entirely correct: the SCIF0 interface is
applicable for the Lager board, but not for all R-Car H2 based boards.
For example, the Stout board uses the SCIFA0 interface.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: kernel: Remove Dom prefix when using %pd format
Julien Grall [Tue, 19 Mar 2019 23:23:43 +0000 (23:23 +0000)]
xen/arm: kernel: Remove Dom prefix when using %pd format

The format %pd will already prefix the domain ID with 'd'. So avoid
prefixing it with 'Dom'.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago x86/vvmx: Simplify per-CPU memory allocations
Andrew Cooper [Wed, 27 Mar 2019 18:50:46 +0000 (18:50 +0000)]
x86/vvmx: Simplify per-CPU memory allocations

 * Use XFREE() instead of opencoding it in nvmx_cpu_dead()
 * Avoid redundant evaluations of per_cpu()
 * Don't allocate vvmcs_buf at all if it isn't going to be used.  It is never
   touched on hardware lacking the VMCS Shadowing feature.
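
A userspace sketch of the first and last points. The macro name, the
struct, the feature flag parameter and the buffer size are placeholders;
free() and malloc() stand in for the hypervisor allocator:

```c
#include <stdlib.h>

/* Free the pointer and clear it in one step, so stale pointers cannot linger. */
#define XFREE_SKETCH(p) do { free(p); (p) = NULL; } while ( 0 )

struct percpu_sketch { void *vvmcs_buf; };

/* Allocate the buffer only when the feature that uses it is present. */
static int cpu_up_sketch(struct percpu_sketch *pc, int has_vmcs_shadowing)
{
    if ( !has_vmcs_shadowing || pc->vvmcs_buf )
        return 0;

    pc->vvmcs_buf = malloc(64);   /* size is a placeholder */
    return pc->vvmcs_buf ? 0 : -1;
}

static void cpu_dead_sketch(struct percpu_sketch *pc)
{
    XFREE_SKETCH(pc->vvmcs_buf);  /* safe even if nothing was allocated */
}
```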

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
5 years ago sched/credit: avoid priority boost for capped domains when unpark
Eslam Elnikety [Fri, 3 May 2019 19:43:49 +0000 (19:43 +0000)]
sched/credit: avoid priority boost for capped domains when unpark

When unpausing a capped domain, the scheduler currently clears the
CSCHED_FLAG_VCPU_PARKED flag before vcpu_wake(). This, in turn, causes the
vcpu_wake to set CSCHED_PRI_TS_BOOST, resulting in an unfair credit boost. The
comment around the changed lines already states that clearing the flag should
happen AFTER the unpause. This bug was introduced in commit be650750945
"credit1: Use atomic bit operations for the flags structure".

Original patch author credit: Xi Xiong while at Amazon.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Petre Eftime <epetre@amazon.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago x86/boot: Annotate the Real Mode entry points
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Annotate the Real Mode entry points

... because it's already hard enough to follow.  Cross reference the locations
in C which set the entrypoints up, and state the alignment requirements and
entry conditions.

Drop a redundant .align 16, and panic() in do_boot_cpu() if the AP trampoline
isn't set up properly rather than blindly continuing and letting the APs
execute junk, or shifting part of the address into unrelated fields in ICR.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/boot: Fix latent memory corruption with early_boot_opts_t
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Fix latent memory corruption with early_boot_opts_t

c/s ebb26b509f "xen/x86: make VGA support selectable" added an #ifdef
CONFIG_VIDEO into the middle of the backing space for early_boot_opts_t,
but didn't adjust the structure definition in cmdline.c.

This only functions correctly because the affected fields are at the end
of the structure, and cmdline.c doesn't write to them in this case.

To retain the slimming effect of compiling out CONFIG_VIDEO, adjust
cmdline.c with enough #ifdef-ary to make C's idea of the structure match
the declaration in asm.  This requires adding __maybe_unused annotations
to two helper functions.
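
The idea of keeping both views in step can be sketched like this. The
field names, sizes and the ASM_BACKING_BYTES constant are invented for the
example and do not reflect the real structure:

```c
#include <stdint.h>

/* Uncomment to model a build with video support. */
/* #define CONFIG_VIDEO 1 */

/* Field names are illustrative; only the #ifdef placement matters. */
typedef struct __attribute__((__packed__)) {
    uint8_t  opt_a;
    uint8_t  opt_b;
    uint16_t opt_c;
#ifdef CONFIG_VIDEO
    uint16_t video_mode;
    uint16_t video_width;
#endif
} early_opts_sketch_t;

/* Bytes reserved by the (imagined) assembly side for the same configuration. */
#ifdef CONFIG_VIDEO
# define ASM_BACKING_BYTES 8
#else
# define ASM_BACKING_BYTES 4
#endif

_Static_assert(sizeof(early_opts_sketch_t) == ASM_BACKING_BYTES,
               "C and asm views of the structure differ");

#define maybe_unused_sketch __attribute__((__unused__))

/* Only needed with CONFIG_VIDEO; the attribute avoids an unused warning otherwise. */
static maybe_unused_sketch unsigned int parse_width(const char *s)
{
    unsigned int v = 0;

    while ( *s >= '0' && *s <= '9' )
        v = v * 10 + (unsigned int)(*s++ - '0');

    return v;
}
```

A compile-time size check of this kind is one way to catch the two
definitions drifting apart again; the commit text does not say whether the
actual change adds one.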

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/wakeup: Stop using %fs for lidt/lgdt
David Woodhouse [Sun, 28 Apr 2019 14:13:37 +0000 (17:13 +0300)]
x86/wakeup: Stop using %fs for lidt/lgdt

The wakeup code is now relocated alongside the trampoline code, so
as long as we move idt_48 and gdt_48 up a little bit so that they're
visible in the real-mode segment that the wakeup code runs in, using
%ds is just fine here.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/cpu: Use cpu_has_sep for configuring the SYSENTER MSRs
Andrew Cooper [Fri, 26 Apr 2019 10:19:07 +0000 (11:19 +0100)]
x86/cpu: Use cpu_has_sep for configuring the SYSENTER MSRs

Currently, configuration of the SYSENTER MSRs is behind a vendor check for
Intel and Centaur, but this misses Zhaoxin.

Use the feature bit, rather than a vendor check.  cpu_has_sep is cleared early
for AMD processors, which can't use SYSENTER/SYSEXIT when operating in long
mode.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/IRQ: reduce unused space in struct arch_irq_desc
Jan Beulich [Mon, 29 Apr 2019 11:25:49 +0000 (05:25 -0600)]
x86/IRQ: reduce unused space in struct arch_irq_desc

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/mem_sharing: aquire extra references for pages with correct domain
Tamas K Lengyel [Thu, 25 Apr 2019 15:32:50 +0000 (09:32 -0600)]
x86/mem_sharing: aquire extra references for pages with correct domain

Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
grabbing extra references for pages that drop references tied to PGC_allocated.
However, these pages are actually owned by dom_cow, resulting in both sharing and
unsharing breaking.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/timers: Fix memory leak with cpu unplug/plug (take 2)
Andrew Cooper [Tue, 23 Apr 2019 15:18:29 +0000 (16:18 +0100)]
xen/timers: Fix memory leak with cpu unplug/plug (take 2)

Previous attempts to fix this leak didn't identify the root cause, and
ultimately failed.  The cause is actually the CPU_UP_PREPARE case
(re)initialising ts->heap back to dummy_heap, which leaks the previous
allocation.

Rearrange the logic to only initialise ts once.  This also avoids the
redundant (but benign, due to ts->inactive always being empty) initialising of
the other ts fields.
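
The leak and its fix can be modelled in a few lines. This is a sketch: the
struct, the "initialised" flag and the helper names are hypothetical, and
the real code may detect first-time initialisation differently:

```c
#include <stdbool.h>
#include <stdlib.h>

static int dummy_heap;   /* shared placeholder used before any allocation */

struct timers_sketch {
    int *heap;
    bool initialised;    /* hypothetical "set up once" marker */
};

/* Runs on every CPU bring-up; must not discard a heap from an earlier life. */
static void cpu_up_prepare_sketch(struct timers_sketch *ts)
{
    if ( ts->initialised )
        return;

    ts->heap = &dummy_heap;
    ts->initialised = true;
}

/* Replace the current heap with a real allocation of n entries. */
static bool grow_heap_sketch(struct timers_sketch *ts, size_t n)
{
    int *h = malloc(n * sizeof(*h));

    if ( !h )
        return false;

    if ( ts->heap != &dummy_heap )
        free(ts->heap);

    ts->heap = h;
    return true;
}
```

Without the early return, a second bring-up would overwrite ts->heap with
the dummy pointer and the earlier allocation would become unreachable.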

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/domain: Block more speculative out-of-bound accesses
Andrew Cooper [Wed, 24 Apr 2019 17:53:15 +0000 (18:53 +0100)]
xen/domain: Block more speculative out-of-bound accesses

c/s f8303458 restricted speculative access for do_vcpu_op(), but neglected its
compat counterpart, which is reachable by guests using the 32bit ABI.

Make an identical adjustment.
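
For readers unfamiliar with the technique, one common way to restrict a
speculative out-of-bound access is to clamp the index with a branch-free
mask after the bounds check. The text above does not say which form the
referenced changes use, so the following is a generic sketch with
hypothetical names. It assumes size <= LONG_MAX and an arithmetic right
shift of negative values (as GCC and Clang provide), and it omits the
compiler barriers a hardened implementation would need:

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* All-ones when index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (BITS_PER_LONG_SKETCH - 1));
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index_sketch(unsigned long index, unsigned long size)
{
    return index & index_mask_sketch(index, size);
}
```

Because the clamp is a data dependency rather than a branch, a
mispredicted bounds check cannot steer the subsequent load out of range.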

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0
Andrew Cooper [Thu, 26 May 2016 16:37:30 +0000 (17:37 +0100)]
x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0

During development of the XTF pagewalk tests, I reliably encountered this
message exactly once per run.  It occurs when the first action to touch
TSS.RSP0 is an interrupt/exception taken in userspace, and the processor tries
to push the IRET frame.

Subsequently, OSSTest has demonstrated that it triggers frequently for a
KPTI-enabled kernel.

  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646687f38, mfn=0x2415a1
  [ 1411.949155] systemd-logind[2683]: New session 73 of user root.
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad264671ff38, mfn=0x240a41
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646837f38, mfn=0x2415c5
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad26468a7f38, mfn=0x2414e7
  [ 1442.207473] systemd-logind[2683]: New session 74 of user root.
  [ 1471.452206] systemd-logind[2683]: New session 75 of user root.
  (XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646d17f08, mfn=0x2417c5
  [ 1501.698971] systemd-logind[2683]: New session 76 of user root.

The actions performed by the shadow code are correct, and the guest continues
without error, but the emitted error is misleading.  Tweak the comment to more
clearly identify why the condition exists, but drop the message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years ago x86/svm: Fix handling of ICEBP intercepts
Andrew Cooper [Fri, 1 Feb 2019 14:48:48 +0000 (14:48 +0000)]
x86/svm: Fix handling of ICEBP intercepts

c/s 9338a37d "x86/svm: implement debug events" added support for introspecting
ICEBP debug exceptions, but didn't account for the fact that
svm_get_insn_len() (previously __get_instruction_length) can fail and may
already have raised #GP with the guest.

If svm_get_insn_len() fails, return to guest context rather than
continuing and mistaking a trap-style VMExit for a fault-style one.
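
A minimal sketch of the early-return pattern described above.  Every name
here (fake_vcpu, get_insn_len(), handle_icebp(), the enum) is hypothetical
and stands in for the real SVM code; in particular, signalling failure with
a zero length is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the Xen tree. */
enum exit_action { RESUME_GUEST, INJECT_DEBUG_TRAP };

struct fake_vcpu {
    unsigned int insn_len;   /* 0 models a failed length lookup */
    bool gp_pending;         /* set when the lookup queued #GP itself */
};

/* Models a length lookup which returns 0 on failure, after queueing #GP. */
static unsigned int get_insn_len(struct fake_vcpu *v)
{
    if ( v->insn_len == 0 )
        v->gp_pending = true;

    return v->insn_len;
}

static enum exit_action handle_icebp(struct fake_vcpu *v, unsigned int *len)
{
    *len = get_insn_len(v);

    if ( *len == 0 )
        return RESUME_GUEST;   /* Bail out: an exception is already pending. */

    return INJECT_DEBUG_TRAP;  /* Trap-style event with a valid length. */
}
```

The point of the check is that the caller never proceeds with a length of
zero, which is what would cause the exit to be treated as fault-style.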

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoxen/sched: we never get into context_switch() with prev==next
Dario Faggioli [Sat, 20 Apr 2019 15:24:47 +0000 (17:24 +0200)]
xen/sched: we never get into context_switch() with prev==next

In schedule(), if we pick, as the next vcpu to run (next) the same one
that is running already (prev), we never get to call context_switch().

We can, therefore, get rid of all the `if`-s testing prev and next being
different, replacing them with an ASSERT() (on ARM, the ASSERT() was even
already there!)
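
The invariant can be sketched as follows.  This is an illustration only:
the struct, the globals and both function bodies are hypothetical
simplifications, and the standard assert() stands in for ASSERT().

```c
#include <assert.h>

struct vcpu { int id; };           /* Hypothetical, minimal vcpu. */

static struct vcpu *current_vcpu;  /* What the CPU is running right now. */
static unsigned int switches;      /* Counts real context switches. */

static void context_switch(struct vcpu *prev, struct vcpu *next)
{
    assert(prev != next);  /* Guaranteed by the only caller, schedule(). */

    current_vcpu = next;
    switches++;
}

static void schedule(struct vcpu *next)
{
    struct vcpu *prev = current_vcpu;

    if ( next == prev )
        return;            /* Nothing to switch to; keep running prev. */

    context_switch(prev, next);
}
```

Because the prev == next case is filtered out in one place, the callee can
state the condition as an assertion instead of re-testing it.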

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
5 years agox86/boot: Detect the firmware SMT setting correctly on Intel hardware
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (13:26 +0100)]
x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user which has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which in
practice has existed since Nehalem, when booting on real hardware.  Fall back
to using the ACPI table APIC IDs.
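
As a rough sketch of how such a value might be interpreted: the field
layout used here (thread count in bits 15:0, core count in bits 31:16) is
an assumption of this example rather than something stated in this
message, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 15:0 = thread count, bits 31:16 = core count. */
static unsigned int ctc_threads(uint64_t val)
{
    return val & 0xffff;
}

static unsigned int ctc_cores(uint64_t val)
{
    return (val >> 16) & 0xffff;
}

/* SMT is taken to be active when there are more threads than cores. */
static bool smt_active(uint64_t val)
{
    return ctc_threads(val) > ctc_cores(val);
}
```

With Hyperthreading disabled in firmware the two counts would be equal, so
smt_active() would return false and no nagging would be warranted.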

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.
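
The effect of the mask width can be shown with a short sketch.  The field
position (bits 15:8) and the "value minus one" encoding are assumptions of
this example, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed: the field sits in bits 15:8 of ebx and holds (threads - 1). */
static unsigned int threads_8bit(uint32_t ebx)
{
    return ((ebx >> 8) & 0xff) + 1;
}

/* The narrower 2-bit mask, shown only for comparison. */
static unsigned int threads_2bit(uint32_t ebx)
{
    return ((ebx >> 8) & 0x3) + 1;
}
```

Both helpers agree while the field value fits in 2 bits, which would
explain why a too-narrow mask can stay latent; they diverge as soon as the
field exceeds 3.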

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (12:26 +0000)]
x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration of
cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and
will remain unchanged until the next system reset.

It is a read only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/spec-ctrl: Reposition the XPTI command line parsing logic
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Don't leak the module_map allocation in __start_xen()
Andrew Cooper [Fri, 5 Apr 2019 15:58:44 +0000 (15:58 +0000)]
x86/boot: Don't leak the module_map allocation in __start_xen()

Ever since its introduction in c/s 436fb462 "x86/microcode: enable boot
time (pre-Dom0) loading", the allocation has gone un-freed, and has its final
use as part of constructing dom0.

Xen already considers it an error to have more than a single unaccounted-for
module (again, logic from the same change), and will only pass the first one
to dom0 as the initrd.

Instead of having an 8 byte pointer to a bitmap which won't exceed 4 bits wide
in any production scenario (dom0 kernel, initrd, XSM blob and microcode blob),
allocate module_map[] on the stack and add a sanity bound for mbi->mods_count.
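
A sketch of the idea, assuming a single-word bitmap supplied by the caller
(for example a local array) and assuming module 0 is the dom0 kernel and so
is never left marked as unclaimed.  MAP_BITS and init_module_map() are
hypothetical names, not the ones used in __start_xen().

```c
#include <limits.h>

/* Number of modules a single unsigned long can track. */
#define MAP_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Clamp the bootloader-supplied count to the bitmap size, then mark every
 * module except module 0 as not yet claimed.
 */
static void init_module_map(unsigned long map[1], unsigned int *mods_count)
{
    if ( *mods_count > MAP_BITS )
        *mods_count = MAP_BITS;          /* Sanity bound. */

    /* Shifting by the full word width is undefined, so special-case it. */
    map[0] = (*mods_count == MAP_BITS) ? ~0UL : (1UL << *mods_count) - 1;
    map[0] &= ~1UL;                      /* Module 0 is already claimed. */
}
```

A caller would declare "unsigned long module_map[1];" as a local variable,
so there is no allocation left to free.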

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>