Andrew Cooper [Fri, 10 Jan 2020 16:35:14 +0000 (16:35 +0000)]
x86/boot: Create the l2_xenmap[] mappings dynamically
The build-time construction of l2_xenmap[] imposes an arbitrary limit of 16M
total, which is a limit looking to be lifted.
Adjust both the BIOS and EFI paths to fill it in dynamically, based on the
final linked size of Xen. l2_xenmap[] stays between __page_tables_{start,end}
(rather than move into .bss.page_aligned) as it is expected to gain a
different pagetable reference shortly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 8 Nov 2019 15:33:32 +0000 (16:33 +0100)]
xen/sched: eliminate sched_tick_suspend() and sched_tick_resume()
sched_tick_suspend() and sched_tick_resume() only call rcu related
functions, so eliminate them and do the rcu_idle_timer*() calling in
rcu_idle_[enter|exit]().
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Julien Grall <julien@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 8 Nov 2019 07:02:53 +0000 (08:02 +0100)]
xen/sched: remove special cases for free cpus in schedulers
With the idle scheduler now taking care of all cpus not in any cpupool
the special cases in the other schedulers for no cpupool associated
can be removed.
Juergen Gross [Thu, 7 Nov 2019 14:34:37 +0000 (15:34 +0100)]
xen/sched: make sched-if.h really scheduler private
include/xen/sched-if.h should be private to scheduler code, so move it
to common/sched/private.h and move the remaining use cases to
cpupool.c and core.c.
Jan Beulich [Wed, 22 Jan 2020 15:39:58 +0000 (16:39 +0100)]
VT-d: don't pass bridge devices to domain_context_mapping_one()
When passed a non-NULL pdev, the function does an owner check when it
finds an already existing context mapping. Bridges, however, don't get
passed through to guests, and hence their owner is always going to be
Dom0, leading to the assigment of all but one of the function of multi-
function PCI devices behind bridges to fail.
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Wed, 22 Jan 2020 15:38:39 +0000 (16:38 +0100)]
x86/smp: use APIC ALLBUT destination shorthand when possible
If the IPI destination mask matches the mask of online CPUs use the
APIC ALLBUT destination shorthand in order to send an IPI to all CPUs
on the system except the current one. This can only be safely used
when no CPU hotplug or unplug operations are taking place, no
offline CPUs or those have been onlined and parked, all CPUs in the
system have been accounted for (ie: the number of CPUs doesn't exceed
NR_CPUS and APIC IDs are below MAX_APICS) and there's no possibility
of CPU hotplug (ie: no disabled CPUs have been reported by the
firmware tables).
This is specially beneficial when using the PV shim, since using the
shorthand avoids performing an APIC register write (or multiple ones
if using xAPIC mode) for each destination when doing a global TLB
flush.
The lock time of flush_lock on a 32 vCPU guest using the shim in
x2APIC mode without the shorthand is:
Note that this requires locking the CPU maps (get_cpu_maps) which uses
a trylock. This is currently safe as all users of cpu_add_remove_lock
do a trylock, but will need reevaluating if non-trylock users appear.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Sat, 18 Jan 2020 15:39:24 +0000 (15:39 +0000)]
xen/arm: gic: Remove pointless assertion against enum gic_sgi
The Arm Compiler will complain that the assertions ASSERT(sgi < 16) are
always true. This is because sgi is an item of the enum gic_sgi and
should always contain less than 16 SGIs.
Rather than using ASSERTs, introduce a new item in the enum that could
be checked against a build time.
Take the opportunity to remove the specific assigned values for each
item. This is fine because enum always starts at zero and values will be
assigned by increment of one. None of our code also rely on hardcoded
value.
Julien Grall [Thu, 16 Jan 2020 21:51:36 +0000 (21:51 +0000)]
Revert "xen/arm32: setup: Give a xenheap page to the boot allocator"
Since commit c61c1b4943 "xen/page_alloc: statically allocate
bootmem_region_list", the boot allocator does not use the first page of
the first region passed for its own purpose.
George Dunlap [Fri, 17 Jan 2020 14:01:05 +0000 (14:01 +0000)]
golang/xenlight: Don't leak memory on context open failure
If libxl_ctx_alloc() returns an error, we need to destroy the logger
that we made.
Restructure the Close() method such that it checks for each resource
to be freed and then frees it. This allows Close() to be come
idempotent, as well as to be a useful clean-up to a partially-created
context.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
George Dunlap [Thu, 26 Dec 2019 17:18:14 +0000 (17:18 +0000)]
golang/xenlight: Errors are negative
Commit 871e51d2d4 changed the sign on the xenlight error types (making
the values negative, same as the C-generated constants), but failed to
flip the sign in the Error() string function. The result is that
ErrorNonspecific.String() prints "libxl error: 1" rather than the
human-readable error message.
Get rid of the whole issue by making libxlErrors a map, and mapping
actual error values to string, falling back to printing the actual
value of the Error type if it's not present.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
George Dunlap [Thu, 26 Dec 2019 14:45:08 +0000 (14:45 +0000)]
go/xenlight: More informative error messages
If an error is encountered deep in a complicated data structure, it's
often difficult to tell where the error actually is. Make the error
message from the generated toC() and fromC() structures more
informative by tagging which field being converted encountered the
error. This will have the effect of giving a "stack trace" of the
failure inside a nested data structure.
NB that my version of python insists on reordering a couple of switch
statements for some reason; In other patches I've reverted those
changes, but in this case it's more difficult because they interact
with actual code changes. I'll leave this here for now, as we're
going to remove helpers.gen.go from being tracked by git at some point
in the near future anyway.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
George Dunlap [Thu, 26 Dec 2019 17:43:17 +0000 (17:43 +0000)]
go/xenlight: Fix CpuidPoliclyList conversion
Empty Go strings should be converted to `nil` libxl_cpuid_policy_list;
otherwise libxl_cpuid_parse_config gets confused.
Also, libxl_cpuid_policy_list returns a weird error, not a "normal"
libxl error; if it returns one of these non-standard errors, convert
it to ErrorInval.
Finally, make the fromC() method take a pointer, and set the value of
CpuidPolicyList such that it will generate a valid CpuidPolicyList in
response.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Paul Durrant [Mon, 13 Jan 2020 15:32:17 +0000 (15:32 +0000)]
Introduce CHANGELOG.md
As agreed during the 2020-01 community call [1] this patch introduces a
changelog, based on the principles explained at keepachangelog.com [2].
A new MAINTAINERS entry is also added, with myself as (currently sole)
maintainer.
[1] See C.2 at https://cryptpad.fr/pad/#/2/pad/edit/ERZtMYD5j6k0sv-NG6Htl-AJ/
[2] https://keepachangelog.com/en/1.0.0/
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Lars Kurth <lars.kurth@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Roger Pau Monné [Mon, 20 Jan 2020 11:48:05 +0000 (12:48 +0100)]
x86/smp: move and clean APIC helpers
Move __prepare_ICR{2}, apic_wait_icr_idle and
__default_send_IPI_shortcut to the top of the file, since they will be
used by send_IPI_mask in future changes.
While there, take the opportunity to remove the leading underscores,
drop the inline attribute, drop the default prefix from the shorthand
helper, change the return type of the prepare helpers to unsigned and
do some minor style cleanups.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 20 Jan 2020 11:47:31 +0000 (12:47 +0100)]
VT-d: dma_pte_clear_one() can't fail anymore
Hence it's pointless for it to return an error indicator, and it's even
less useful for it to be __must_check. This is a result of commit e8afe1124cc1 ("iommu: elide flushing for higher order map/unmap
operations") moving the TLB flushing out of the function.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 17 Jan 2020 16:38:19 +0000 (17:38 +0100)]
build: fix dependency file generation with ENFORCE_UNIQUE_SYMBOLS=y
The recorded file, unless overridden by -MQ (or -MT) is that specified
by -o, which doesn't produce correct dependencies and hence will cause
failure to re-build when included files change.
Fixes: 81ecb38b83b0 ("build: provide option to disambiguate symbol names") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jason Andryuk [Fri, 17 Jan 2020 15:19:16 +0000 (16:19 +0100)]
x86/shadow: use single (atomic) MOV for emulated writes
This is the corresponding change to the shadow code as made by bf08a8a08a2e "x86/HVM: use single (atomic) MOV for aligned emulated
writes" to the non-shadow HVM code.
The bf08a8a08a2e commit message:
Using memcpy() may result in multiple individual byte accesses
(depending how memcpy() is implemented and how the resulting insns,
e.g. REP MOVSB, get carried out in hardware), which isn't what we
want/need for carrying out guest insns as correctly as possible. Fall
back to memcpy() only for accesses not 2, 4, or 8 bytes in size.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Tim Deegan <tim@xen.org>
Igor Druzhinin [Fri, 17 Jan 2020 15:18:20 +0000 (16:18 +0100)]
x86/sm{e, a}p: do not enable SMEP/SMAP in PV shim by default on AMD
Due to AMD and Hygon being unable to selectively trap CR4 bit modifications
running 32-bit PV guest inside PV shim comes with significant performance
hit. Moreover, for SMEP in particular every time CR4.SMEP changes on context
switch to/from 32-bit PV guest, it gets trapped by L0 Xen which then
tries to perform global TLB invalidation for PV shim domain. This usually
results in eventual hang of a PV shim with at least several vCPUs.
Since the overall security risk is generally lower for shim Xen as it being
there more of a defense-in-depth mechanism, choose to disable SMEP/SMAP in
it by default on AMD and Hygon unless a user chose otherwise.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 17 Jan 2020 15:17:23 +0000 (16:17 +0100)]
x86: adjust EFI-related build message
As of commit 93249f7fc17c ("x86/efi: split compiler vs linker support"),
EFI support in xen.gz may be available even if no xen.efi gets
generated. Distinguish the cases when emitting the message.
Also drop the pointlessly (afaict) left use of $(filter ...) (needed
only when used in $(if ...)), from the ifeq() introduced by 7059afb202ff
("x86/Makefile: remove $(guard) use from $(TARGET).efi target").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 17 Jan 2020 15:15:28 +0000 (16:15 +0100)]
x86: refine link time stub area related assertion
While it has been me to introduce this, the use of | there has become
(and perhaps was from the very beginning) misleading. Rather than
avoiding the right side of it when linking the xen.efi intermediate file
at a different base address, make the expression cope with that case,
thus verifying placement on every step.
Furthermore the original check was too strict: We don't use one page per
CPU, so account for this as well. This involves moving the
STUBS_PER_PAGE definition and making DIV_ROUND_UP() accessible from
assembly (and hence the linker script); move a few other potentially
generally useful definitions along with it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Igor Druzhinin [Fri, 17 Jan 2020 15:11:20 +0000 (16:11 +0100)]
x86/time: update TSC stamp on restore from deep C-state
If ITSC is not available on CPU (e.g if running nested as PV shim)
then X86_FEATURE_NONSTOP_TSC is not advertised in certain cases, i.e.
all AMD and some old Intel processors. In which case TSC would need to
be restored on CPU from platform time by Xen upon exiting C-states.
As platform time might be behind the last TSC stamp recorded for the
current CPU, invariant of TSC stamp being always behind local TSC counter
is violated. This has an effect of get_s_time() going negative resulting
in eventual system hang or crash.
Fix this issue by updating local TSC stamp along with TSC counter write.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Lars Kurth [Fri, 17 Jan 2020 15:10:57 +0000 (16:10 +0100)]
get-maintainer.pl: Dont fall over when L: contains a display name
Prior to this change e-mail addresses of the form "display name
<email>" would result into empty output. Also see
https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00753.html
Signed-off-by: Lars Kurth <lars.kurth@citrix.com> Reviewed-by: Julien Grall <julien@xen.org>
Despite being vaguely aware, the difference between PAGE_HYPERVISOR in ASM and
C code has nevertheless caused several bugs I should have known better about,
and contributed to review confusion.
There are exactly 4 uses of these constants in asm code (and one is shortly
going to disappear).
Instead of creating the constants which behave differently between ASM and C
code, expose all the constants and use non-ambiguous non-NX ones in ASM.
Adjust the hiding to just _PAGE_NX, which contains a C ternary expression.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Jan 2020 12:17:45 +0000 (12:17 +0000)]
tools/libxc: Construct 32bit PV guests with L3 A/D bits set
With the 32 PAE build of Xen gone, 32bit PV guests' top level pagetables no
longer behave exactly like PAE in hardware.
They should have A/D bits set, for the same performance reasons as apply to
other levels. This brings the domain builder in line with how Xen constructs
a 32bit dom0.
As a purely code improvement, make use of range notation to initialise
identical values in adjacent array elements.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Nick Rosbrook [Sat, 4 Jan 2020 21:00:52 +0000 (16:00 -0500)]
golang/xenlight: implement keyed union Go to C marshaling
Since the C union cannot be directly populated, populate the fields of the
corresponding C struct defined in the cgo preamble, and then copy that
struct as bytes into the byte slice that Go uses as the union.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Roger Pau Monne [Tue, 24 Dec 2019 10:18:10 +0000 (11:18 +0100)]
x86/hvm: always expose x2APIC feature in max HVM cpuid policy
On hardware without x2APIC support Xen emulated local APIC will
provide such mode, and hence the feature should be set in the maximum
HVM cpuid policy.
Not exposing it in the maximum policy results in HVM domains not
getting such feature exposed unless it's also supported by the
underlying hardware.
This was regressed by c/s 3e0c8272f20 which caused x2APIC not to be enabled
unilaterally for HVM guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 19 Dec 2019 21:19:35 +0000 (21:19 +0000)]
libxc/migration: Adjust layout of struct xc_sr_context
We are shortly going to want to introduce some common x86 fields, so having
x86_pv and x86_hvm as the top level objects is a problem. Insert a
surrounding struct x86 and drop the x86 prefix from the pv/hvm objects.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 5 Dec 2019 15:57:13 +0000 (15:57 +0000)]
tools/migration: Formatting and style cleanup
The code has devating from the prevailing style in many ways. Adjust spacing,
indentation, position of operators, layout of multiline comments, removal of
superfluous comments, constness, trailing commas, and use of unqualified
'unsigned'.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 15 Jan 2020 18:44:18 +0000 (18:44 +0000)]
xen/vcpu: Improve sanity checks in vcpu_create()
The BUG_ON() is confusing to follow. The (!is_idle_domain(d) || vcpu_id) part
is a vestigial remnant of architectures poisioning idle_vcpu[0] with non-NULL
pointers.
Now that idle_vcpu[0] is NULL on all architectures, and d->max_vcpus specified
before vcpu_create() is called, we can properly range check the requested
vcpu_id.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien@xen.org>
Andrew Cooper [Wed, 15 Jan 2020 18:43:58 +0000 (18:43 +0000)]
ARM/boot: Don't poison 'current' during early boot
This logic was inherited from x86 (which was updated several times since).
Unlike x86 (at the time) however, while NULL isn't mapped in ARM, 0xfffff000
is, making this actively dangerous.
Drop the logic entirely, and leave 'current' as NULL during early boot.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien@xen.org>
xen/arm: during efi boot, improve the check for usable memory
When booting via EFI, the EFI memory map has information about memory
regions and their type. Improve the check for the type and attribute of
each memory region to figure out whether it is usable memory or not.
This patch brings the check on par with Linux v5.5-rc6 and makes more
memory reusable as normal memory by Xen (except that Linux also reuses
EFI_PERSISTENT_MEMORY, which we do not).
Specifically, this patch also reuses memory marked as
EfiLoaderCode/Data, and it uses both Attribute and Type for the check
(Attribute needs to be EFI_MEMORY_WB).
Tamas K Lengyel [Fri, 10 Jan 2020 02:30:52 +0000 (19:30 -0700)]
Remove undocumented and unmaintained tools/memshr library
The library has been largely untouched for over a decade at this point, it is
undocumented and it's unclear what it was originally used for. Remove it from
tree, if anyone needs it in the future it can be carved out from git history.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Fri, 10 Jan 2020 16:06:08 +0000 (16:06 +0000)]
x86/boot: Rename l?_identmap to l?_directmap
Since c/s faa85d4fb3 "x86/boot: Don't map 0 during boot", l1_identmap no
longer has an alias mapped at 0, meaning that none of the l?_identmap[]
pagetables are actually an identity map.
Rename them to l?_directmap, which avoids any kind of implication that they
might be mapped at 0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 20 Dec 2019 16:34:16 +0000 (16:34 +0000)]
libxc/migration: Rationalise the 'checkpointed' field to 'stream_type'
Originally, 'checkpointed' was a boolean signalling the difference between a
plain and a Remus stream. COLO was added later, but several bits of code
retained boolean-style logic. While correct, it is confusing to follow.
Additionally, XC_MIG_STREAM_NONE means "no checkpoints" but reads as "no
stream".
Consolidate all the logic on the term 'stream_type', and rename STREAM_NONE
to STREAM_PLAIN. Re-position the stream_type variable so it isn't
duplicated in both the save and restore unions.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 18 Dec 2019 19:01:57 +0000 (19:01 +0000)]
libxc/restore: Introduce functionality to simplify blob handling
During migration, we buffer several blobs of data which ultimately need
handing back to Xen at an appropriate time.
Currently, this is all handled in an ad-hoc manner, but more blobs are soon
going to be added. Introduce xc_sr_blob to encapsulate a ptr/size pair, and
update_blob() to handle the memory management aspects.
Switch the HVM_CONTEXT and the four PV_VCPU_* blobs over to this new
infrastructure. No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
"IRQ: u16 is too narrow for an event channel number" introduced a use of
evetchn_port_t, but its typedef apparently surfaces indirectly here only
on x86.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Thu, 19 Dec 2019 08:12:21 +0000 (08:12 +0000)]
xen/arm: Place a speculation barrier sequence following an eret instruction
Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.
Newer CPUs may implement a new SB barrier instruction which acts
as an architected speculation barrier. For current CPUs, the sequence
DSB; ISB is known to prevent speculation.
The latter sequence is heavier than SB but it would never be executed
(this is speculation after all!).
Introduce a new macro 'sb' that could be used when a speculation barrier
is required. For now it is using dsb; isb but this could easily be
updated to cater SB in the future.
Julien Grall [Sat, 11 Jan 2020 00:03:44 +0000 (00:03 +0000)]
docs/misc: pvcalls: Verbatim block should be indented with 4 spaces
At the moment, the diagram is only indented with 2 spaces. So pandoc
will try to badly interpret it and not display it correctly.
Fix it by indenting all the block by 4 spaces (i.e an extra 2 spaces).
Fixes: d661611d08 ("docs/markdown: Switch to using pandoc, and fix underscore escaping") Signed-off-by: Julien Grall <julien@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Paul Durrant [Thu, 9 Jan 2020 11:15:05 +0000 (11:15 +0000)]
tools/Rules.mk: fix distclean
Running 'make distclean' under tools will currently result in:
tools/Rules.mk:245: *** You have to run ./configure before building or installing the tools. Stop.
This patch adds 'distclean', 'subdir-distclean%' and 'subdir-clean%' to
no-configure-targets, which allows 'make distclean' to run to completion.
Fixes: 00691c6c (tools: Allow to make *-dir-force-update without ./configure) Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 14 Jan 2020 11:03:47 +0000 (12:03 +0100)]
IRQ: u16 is too narrow for an event channel number
FIFO event channels allow ports up to 2^17, so we need to use a wider
field in struct pirq. Move "masked" such that it may share the 8-byte
slot with struct arch_pirq on 64-bit arches, rather than leaving a
7-byte hole in all cases.
Take the opportunity and also add a comment regarding "arch" placement
within the structure.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Mon, 9 Dec 2019 11:12:07 +0000 (11:12 +0000)]
CODING_STYLE: Document how to handle unexpected conditions
It's not always clear what the best way is to handle unexpected
conditions: whether with ASSERT(), domain_crash(), BUG_ON(), or some
other method. All methods have a risk of introducing security
vulnerabilities and unnecessary instabilities to production systems.
Provide guidelines for different options and when to use them.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien@xen.org>
Andrew Cooper [Sat, 28 Dec 2019 15:01:00 +0000 (15:01 +0000)]
x86/boot: Drop INVALID_VCPU
Now that NULL will fault at boot, there is no need for a special constant to
signify "current not set up yet".
Since c/s fae249d23413 "x86/boot: Rationalise stack handling during early
boot", the BSP cpu_info block is now consistently zero, so drop the adjacent
re-zeroing.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2020 13:37:41 +0000 (13:37 +0000)]
x86/boot: Don't map 0 during boot
In particular, it causes accidental NULL pointer dereferences to go unnoticed.
The majority of the early operation takes place either in Real mode, or
Protected Unpaged mode. The only bit which requires pagetable mappings is the
trampoline transition into Long mode and jump to the higher mappings, so there
is no need for the whole bottom 2M to be mapped.
Introduce a new l1_bootmap in .init.data, and use it instead of l1_identmap.
The EFI boot path doesn't pass through the trampoline, so doesn't need any
adjustment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2020 13:37:54 +0000 (13:37 +0000)]
x86/boot: Clean up l?_bootmap[] construction
The need for Xen to be identity mapped into the bootmap is not obvious, and
differs between the MB and EFI boot paths.
The EFI side is further complicated by an attempt to cope with with l2_bootmap
only being 4k long. This is undocumented, confusing, only works if Xen is the
single object wanting mapping.
The pageables are common to both the MB and EFI builds, so simplify the EFI
bootmap construction code to make exactly one identity-map of Xen, which now
makes the two paths consistent. Comment both pieces of logic, explaining what
the mappings are needed for.
Finally, leave a linker assert covering the fact that plenty of code blindly
assumes that Xen is less that 16M. This wants fixing in due course.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 28 Dec 2019 14:20:59 +0000 (14:20 +0000)]
x86/boot: Remove the preconstructed low 16M superpage mappings
These are left over from c/s b2804422 "x86: make Xen early boot code
relocatable", which made it possible for Xen not to be in the bottom 16M.
Nothing using the mappings any more. Build them in the directmap when walking
the E820 table along with everything else.
Furthermore, it is undefined to have superpages and MTRRs disagree on
cacheability boundaries, and nothing actually checks. While we don't fix this
explicitly, we do at least honour the E820 now if it says there are boundaries
in this range.
As a consequence, there are now no _PAGE_PRESENT entries between
__page_tables_{start,end} which need to skip relocation. This simplifies the
MB1/2 entry path logic to remove the l2_identmap[] special case.
The low 2M (using 4k pages) is retained for now. Amongst other things, it
matters for console logging while the legacy VGA hole is in use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 8 Jan 2020 13:36:42 +0000 (13:36 +0000)]
x86/boot: Rationalise stack handling during early boot
The top (numerically higher addresses) of cpu0_stack[] contains the BSP's
cpu_info block. Logic in Xen expects this to be initialised to 0, but this
area of stack is also used during early boot.
Update the head.S code to avoid using the cpu_info block. Additionally,
update the stack_start variable to match, which avoids __high_start() and
efi_arch_post_exit_boot() needing to make the adjustment manually.
Finally, leave a big warning by the BIOS BSS initialisation, because it is by
no means obvious that the stack doesn't survive the REP STOS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 9 Jan 2020 10:09:02 +0000 (11:09 +0100)]
x86/MCE: correct struct mcinfo_extended for compat guests
The use of any kind of pointers in the public interface is wrong,
including dimensioning arrays based on the size of pointers. The least
bad option of addressing the issue looks to be to pin down the number
that the (64-bit) hypervisor has used anyway (even when passing
information to compat but privileged guests). There aren't actual
instantiations of the structure apart from ones allocated dynamically
out of struct mc_info's mi_data[], which is entirely controlled by the
hypervisor.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 9 Jan 2020 10:08:29 +0000 (11:08 +0100)]
x86/MCE: avoid leaking stack data
While HYPERVISOR_mca is a privileged operation, we still shouldn't leak
stack contents (the tail of every array entry's mc_msrvalues[] of
XEN_MC_physcpuinfo output). Simply use a zeroing allocation here.
Take the occasion and also restrict the involved local variable's scope.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Thu, 9 Jan 2020 10:07:38 +0000 (11:07 +0100)]
x86: clear per cpu stub page information in cpu_smpboot_free()
cpu_smpboot_free() removes the stubs for the cpu going offline, but it
isn't clearing the related percpu variables. This will result in
crashes when a stub page is released due to all related cpus gone
offline and one of those cpus going online later.
Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
new stub page when needed, irrespective of whether the CPU gets parked
or removed.
Fixes: 2e6c8f182c9c50 ("x86: distinguish CPU offlining from CPU removal") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Tao Xu <tao3.xu@intel.com>
Andrew Cooper [Wed, 8 Jan 2020 13:11:13 +0000 (13:11 +0000)]
x86/boot: Simplify BSS zeroing
There is no need to load a non-flat %es to zero the BSS. Use sym_esi()
instead, which is easier to follow, faster (avoids two segment loads) and
doesn't require use of the stack.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2020 13:36:30 +0000 (13:36 +0000)]
x86/boot: Map the trampoline as read-only
c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
removed the final writes which occurred between enabling paging and switching
to the high mappings. There don't plausibly need to be any memory writes in
few instructions is takes to perform this transition.
As a consequence, we can remove the RWX mapping of the trampoline. It is RX
via its identity mapping below 1M, and RW via the directmap.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 8 Jan 2020 16:57:16 +0000 (17:57 +0100)]
MAINTAINERS: fix malformed entry
MAINTAINERS entries tagged with "L:" should have a pure mail address
as the second word. Fix a malformed entry. Otherwise add_maintainers.pl
will produce an empty "Cc:" line.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 8 Jan 2020 10:43:24 +0000 (11:43 +0100)]
xen/spinlock: disable spinlock debugging in console_force_unlock()
console_force_unlock() might result in subsequent ASSERT() triggering
when CONFIG_DEBUG_LOCKS was active. Avoid that by calling
spin_debug_disable() in console_force_unlock() and make the spinlock
debug assertions trigger only if spin_debug was active.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
mem_sharing.c:361:13: error: 'rmap_has_entries' defined but not used [-Werror=unused-function]
static bool rmap_has_entries(const struct page_info *page)
^
cc1: all warnings being treated as errors
This happens in a release build (disables MEM_SHARING_AUDIT) when
CONFIG_MEM_SHARING is enabled.
Expand both trivial helpers into their single callsite.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel <tamas@tklengyel.com>
Wei Liu [Tue, 7 Jan 2020 17:17:03 +0000 (17:17 +0000)]
x86/hyperv: drop all __packed from hyperv-tlfs.h
All structures are already naturally aligned. Linux added those
attributes out of paranoia.
In Xen we've had instance we had to drop pointless __packed to placate
gcc 9 (see ca9310b24e "x86/IO-APIC: fix build with gcc9"), it is better
to drop those attributes in hyperv-tlfs.h as well.
Requested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <liuwe@microsoft.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 8 Jan 2020 14:04:36 +0000 (15:04 +0100)]
libxl: don't needlessly report "highmem" in use
Due to the unconditional updating of dom->highmem_end in
libxl__domain_device_construct_rdm() I've observed on a 2Gb HVM guest
with a passed through device (without overly large BARs, and with no RDM
ranges at all)
Jan Beulich [Wed, 8 Jan 2020 14:03:19 +0000 (15:03 +0100)]
x86/mm: rename and tidy create_pae_xen_mappings()
After dad74b0f9e ("i386: fix handling of Xen entries in final L2 page
table") and the removal of 32-bit support the function doesn't modify
state anymore, and hence its name has been misleading. Change its name,
constify parameters and a local variable, and make it return bool.
Also drop the call to it from mod_l3_entry(): The function explicitly
disallows 32-bit domains to modify slot 3. This way we also won't
re-check slot 3 when a slot other than slot 3 changes. Doing so has
needlessly disallowed making some L2 table recursively link back to an
L2 used in some L3's 3rd slot, as we check for the type ref count to be
1. (Note that allowing dynamic changes of L3 entries in the way we do is
bogus anyway, as that's not how L3s behave in the native and EPT cases:
They get re-evaluated only upon CR3 reloads. NPT is different in this
regard.)
As a result of this we no longer need to play games to get at the start
of the L3 table.
Additionally move the single remaining call site, allowing to drop one
is_pv_32bit_domain() invocation and a _PAGE_PRESENT check (in the
function itself) as well as to exit the loop early (remaining entries
have all been set to empty just ahead of this loop).
Further move a BUG_ON() such that in the common case its condition
wouldn't need evaluating.
Finally, since we're at it, move init_xen_pae_l2_slots() next to the
renamed function, as they really belong together (in fact
init_xen_pae_l2_slots() was [indirectly] broken out of this function).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 8 Jan 2020 14:02:26 +0000 (15:02 +0100)]
x86/mm: mod_l<N>_entry() have no need to use __copy_from_user()
mod_l1_entry()'s need to do so went away with commit 2d0557c5cb ("x86:
Fold page_info lock into type_info"), and the other three never had such
a need, at least going back as far as 3.2.0. Replace the uses by
l<N>e_read_atomic().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Wed, 8 Jan 2020 13:59:25 +0000 (14:59 +0100)]
sched: fix resuming from S3 with smt=0
When resuming from S3 and smt=0 or maxcpus= are specified we must not
do anything in cpu_schedule_callback(). This is not true today for
taking down a cpu during resume.
If anything goes wrong during resume all the scheduler related error
handling is in cpupool.c, so we can just bail out early from
cpu_schedule_callback() when suspending or resuming.
This fixes commit 0763cd2687897b55e7 ("xen/sched: don't disable
scheduler on cpus during suspend").
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Tue, 7 Jan 2020 12:06:49 +0000 (12:06 +0000)]
x86/mm: change pl*e to l*t in virt_to_xen_l*e
We will need to have a variable named pl*e when we rewrite
virt_to_xen_l*e. Change pl*e to l*t to reflect better its purpose.
This will make reviewing later patch easier.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Hongyan Xia <hongyxia@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 12:06:46 +0000 (12:06 +0000)]
x86/mm: introduce l{1,2}t local variables to modify_xen_mappings
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the life time of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the life time of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track life time of pointers to xen page tables more easily.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 12:06:45 +0000 (12:06 +0000)]
x86/mm: introduce l{1,2}t local variables to map_pages_to_xen
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the life time of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the life time of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track life time of pointers to xen page tables more easily.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 17 Dec 2019 18:20:33 +0000 (18:20 +0000)]
tools/dombuilder: Don't allocate dom->p2m_host[] for translated domains
xc_dom_p2m() and dom->p2m_host[] implement a linear transform for translated
domains, but waste a substantial chunk of RAM doing so.
ARM literally never reads dom->p2m_host[] (because of the xc_dom_translated()
short circuit in xc_dom_p2m()). Drop it all.
x86 HVM does use dom->p2m_host[] for xc_domain_populate_physmap_exact() calls
when populating 4k pages. Reuse the same tactic from 2M/1G ranges and use an
on-stack array instead. Drop the memory allocation.
x86 PV guests do use dom->p2m_host[] as a non-identity transform. Rename the
field to pv_p2m to make it clear it is PV-only.
No change in the constructed guests.
Reported-by: Varad Gautam <vrd@amazon.de> Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:41:36 +0000 (17:41 +0000)]
tools/dombuilder: Remove p2m_guest from the common interface
In-guest p2m's are a concept specific to x86 PV guests. alloc_p2m_list() is
the only hook which initialises dom->p2m_guest, making
xc_dom_update_guest_p2m() a nop for non-PV guests.
Move p2m_guest into xc_dom_image_x86 and adjust alloc_p2m_list() to match.
Drop xc_dom_update_guest_p2m() entirely.
One caller, move_l3_below_4G(), only uses it to modify a single entry, so
rewriting the whole guest p2m is wasteful - opencode the single update
instead. The other caller is common code. Instead, move the logic into the
setup_pgtables() hooks, which know their own sizeof_pfn and can do away with
the switch statement.
No change in the constructed guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:08:22 +0000 (17:08 +0000)]
tools/dombuilder: Remove PV-only, mandatory hooks
Currently, the setup_pgtable() hook is optional, but alloc_pgtable() hook is
not. Both are specific to x86 PV guests, and stubbed in various ways by the
dombuilders for translated guests (x86 HVM, ARM).
Make alloc_pgtables() optional, and drop all the stubs for translated guest
types.
No change in the constructed guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Julien Grall <julien@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:03:17 +0000 (17:03 +0000)]
tools/dombuilder: xc_dom_x86 cleanup
The two xc_dom_params structures for PV pagetables are never modified and can
live in .rodata. Reduce their scope to the alloc_pgtable_*() functions which
construct xc_dom_image_x86 appropriately.
Rename {alloc,setup}_pgtables() to {alloc,setup}_pgtables_pv() to highlight
that they are PV only, and drop some _x86() suffixes from static helpers.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
And the following reduced to stubs:
arch_iommu_hwdom_init 852 2 -850
p2m_add_foreign 880 16 -864
This patch also has the unintended but useful consequence of stopping
hardware_dom= functionality from being usable (in at least PV_SHIM_EXCLUSIVE
builds).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wl@xen.org> Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Andrew Cooper [Fri, 3 Jan 2020 18:31:46 +0000 (18:31 +0000)]
tools/save: Drop unused parameters from xc_domain_save()
XCFLAGS_CHECKPOINT_COMPRESS has been unused since c/s b15bc4345 (2015),
XCFLAGS_HVM since c/s 9e8672f1c (2013), and XCFLAGS_STDVGA since c/s 087d43326 (2007). Drop the constants, and code which sets them.
The separate hvm parameter (appeared in c/s d11bec8a1, 2007 and ultimately
redundant with XCFLAGS_HVM), is used for sanity checking and debug printing,
then discarded and replaced with Xen's idea of whether the domain is PV or
HVM.
Rearrange the logic in xc_domain_save() to ask Xen sightly earlier, and use a
consistent idea of 'hvm' throughout. Removing this parameter removes the
final user of libxl's dss->hvm, so drop that field as well.
Update the doxygen comment to be accurate.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <Ian.Jackson@citrix.com>
livepatch: use proper rc variable in livepatch_do_action()
Fix c&p bug in the livepatch_do_action() code of
LIVEPATCH_ACTION_REPLACE case.
The correct variable handling return code of revert action is
other->rc in this case.
Coverity-ID: 1457467 Fixes: 6047104c3c ("livepatch: Add per-function applied/reverted state tracking marker") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Mon, 6 Jan 2020 13:26:28 +0000 (13:26 +0000)]
Coverity: Improve model for {,un}map_domain_page()
The first attempt resulted in several "Free of address-of
expression (BAD_FREE)" issues, because of code which relies on the fact that
any pointer in the same page is ok to pass to unmap_domain_page()
Model this property to remove the issues.
Coverity IDs: 1135356 113536{0,1} 1401300 141809{0,1} 1438864 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Fri, 3 Jan 2020 17:29:35 +0000 (18:29 +0100)]
tools/libxc: disable x2APIC when using nested virtualization
There are issues as reported by osstest when Xen is running nested on
itself and the L1 Xen is using x2APIC. While those are being
investigated, disable announcing the x2APIC feature in CPUID when nested
HVM mode is enabled.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Thu, 19 Dec 2019 07:42:08 +0000 (08:42 +0100)]
xen: put more code under CONFIG_CRASH_DEBUG
debugger_trap_entry() is not needed without CONFIG_CRASH_DEBUG, so only
include it if CONFIG_CRASH_DEBUG is defined.
While at it remove CONFIG_HAS_GDBSX as it can easily be replaced by
CONFIG_CRASH_DEBUG.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>