Currently the XEN_DOMCTL_get_vcpu_msrs is only capable of gathering a handful
of predetermined vcpu MSRs. In our use-case gathering the vPMU MSRs by an
external privileged tool is necessary, thus we introduce a new set of domctls
to allow for querying for any guest MSRs.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
wrmsr and AMD side totally unimplemented
still needs wiring into guest_{rd/wr}msr
Michal Orzel [Fri, 21 Oct 2022 13:22:38 +0000 (15:22 +0200)]
automation: Build Xen according to the type of the job
All the build jobs exist in two flavors: debug and non-debug, where the
former sets 'debug' variable to 'y' and the latter to 'n'. This variable
is only being recognized by the toolstack, because Xen requires
enabling/disabling debug build via e.g. menuconfig/config file.
As a corollary, we end up building/testing Xen with CONFIG_DEBUG always
set to a default value ('y' for unstable and 'n' for stable branches),
regardless of the type of the build job.
Fix this behavior by setting CONFIG_DEBUG according to the 'debug' value.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Michal Orzel [Mon, 24 Oct 2022 12:04:43 +0000 (14:04 +0200)]
automation: Explicitly enable NULL scheduler for boot-cpupools test
NULL scheduler is not enabled by default on non-debug Xen builds. This
causes the boot time cpupools test to fail on such build jobs. Fix the issue
by explicitly specifying the config options required to enable the NULL
scheduler.
Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Make may not have copied "_libxl_types_json.h" into $(XEN_INCLUDE)
before starting to build the different objects.
Make sure that the generated headers are copied into $(XEN_INCLUDE)
before using them. This is achieved by telling make which
headers are needed to use "libxl_internal.h", which uses "libxl_json.h",
which in turn uses "_libxl_types_json.h". "libxl_internal.h" also uses
"libxl.h", so add it to the list.
This also prevents `gcc` from using potentially installed headers
from a previous version of Xen.
Reported-by: Per Bilse <per.bilse@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Mon, 24 Oct 2022 13:46:11 +0000 (15:46 +0200)]
x86/shadow: drop (replace) bogus assertions
The addition of a call to shadow_blow_tables() from shadow_teardown()
has resulted in the "no vcpus" related assertion becoming triggerable:
If domain_create() fails with at least one page successfully allocated
in the course of shadow_enable(), or if domain_create() succeeds and
the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
Note that in-tree tests (test-resource and test-tsx) do exactly the
latter of these two.
The assertion's comment was bogus anyway: Shadow mode has been getting
enabled before allocation of vCPU-s for quite some time. Convert the
assertion to a conditional: As long as there are no vCPU-s, there's
nothing to blow away.
Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
A similar assertion/comment pair exists in _shadow_prealloc(); the
comment is similarly bogus, and the assertion could in principle trigger
e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
at the same time by a similar early return, here indicating failure to
the caller (which will generally lead to the domain being crashed in
shadow_prealloc()).
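The assertion-to-conditional conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Xen code: `struct toy_domain`, `toy_blow_tables()` and `toy_prealloc()` are hypothetical stand-ins for the real structures and functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain; not Xen's struct domain. */
struct toy_domain {
    size_t nr_vcpus;    /* stays 0 until vCPUs have been allocated */
    size_t free_pages;  /* pages available in the toy shadow pool */
};

/*
 * Formerly this would have asserted nr_vcpus != 0. With no vCPUs there
 * is nothing to blow away, so return early instead.
 * Returns true if any work would have been done.
 */
bool toy_blow_tables(struct toy_domain *d)
{
    if (d->nr_vcpus == 0)
        return false;
    /* ... the real code would unpin and free shadow tables here ... */
    return true;
}

/* Same pattern, but the early return reports failure to the caller. */
bool toy_prealloc(const struct toy_domain *d, size_t wanted)
{
    if (d->nr_vcpus == 0)
        return false;
    return d->free_pages >= wanted;
}
```

A domain that is torn down before any vCPU exists now takes the early-return path in both helpers rather than tripping an assertion.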
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Juergen Gross [Fri, 21 Oct 2022 10:50:26 +0000 (12:50 +0200)]
xen/sched: fix restore_vcpu_affinity() by removing it
When the system is coming up after having been suspended,
restore_vcpu_affinity() is called for each domain in order to adjust
the vcpu's affinity settings in case a cpu didn't come to life again.
The way restore_vcpu_affinity() is doing that is wrong, because the
specific scheduler isn't being informed about a possible migration of
the vcpu to another cpu. Additionally, the migration often happens even
if all cpus are running again, as it is done without checking
whether it is really needed.
As cpupool management is already calling cpu_disable_scheduler() for
cpus not having come up again, and cpu_disable_scheduler() is taking
care of eventually needed vcpu migration in the proper way, there is
simply no need for restore_vcpu_affinity().
So just remove restore_vcpu_affinity() completely, together with the
no longer used sched_reset_affinity_broken().
Fixes: 8a04eaa8ea83 ("xen/sched: move some per-vcpu items to struct sched_unit") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Dario Faggioli <dfaggioli@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Juergen Gross [Fri, 21 Oct 2022 10:32:23 +0000 (12:32 +0200)]
xen/sched: fix race in RTDS scheduler
When a domain gets paused the unit runnable state can change to "not
runnable" without the scheduling lock being involved. This means that
a specific scheduler isn't involved in this change of runnable state.
In the RTDS scheduler this can result in an inconsistency in case a
unit is losing its "runnable" capability while the RTDS scheduler's
scheduling function is active. RTDS will remove the unit from the run
queue, but doesn't do so for the replenish queue, leading to hitting
an ASSERT() in replq_insert() later when the domain is unpaused again.
Fix that by removing the unit from the replenish queue as well in this
case.
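The inconsistency and its fix can be modelled with two queue-membership flags. This is a toy sketch, not the RTDS code: `struct toy_unit`, `toy_drop_unrunnable()` and `toy_replq_insert()` are hypothetical names, and the real scheduler uses linked lists and locking that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical model of a scheduling unit and its queue membership. */
struct toy_unit {
    bool runnable;
    bool on_runq;   /* member of the run queue */
    bool on_replq;  /* member of the replenish queue */
};

/* Called from the scheduling function when a unit may have been paused. */
void toy_drop_unrunnable(struct toy_unit *u)
{
    if (u->runnable)
        return;
    u->on_runq = false;
    u->on_replq = false;  /* the removal that was previously missing */
}

/*
 * Re-insertion when the domain is unpaused. Returns false in the
 * situation where the real code would hit its ASSERT(), i.e. the unit
 * is still on the replenish queue.
 */
bool toy_replq_insert(struct toy_unit *u)
{
    if (u->on_replq)
        return false;
    u->on_replq = true;
    return true;
}
```

With both removals in place, a later insert on unpause finds the unit absent from the replenish queue, which is the state the insert path expects.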
Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Dario Faggioli <dfaggioli@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Fri, 21 Oct 2022 10:30:24 +0000 (12:30 +0200)]
EFI: don't convert memory marked for runtime use to ordinary RAM
efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
higher priority than the type of the range. To avoid accessing memory at
runtime which was re-used for other purposes, make
efi_arch_process_memory_map() follow suit. While in theory the same would
apply to EfiACPIReclaimMemory, we don't actually "reclaim" or clobber
that memory (converted to E820_ACPI on x86) there (and it would be a bug
if the Dom0 kernel tried to reclaim the range, bypassing Xen's memory
management, plus it would be at least bogus if it clobbered that space),
hence that type's handling can be left alone.
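The intended precedence can be sketched as a small classifier. All names below (`toy_type`, `toy_e820`, `TOY_ATTR_RUNTIME`, `toy_classify()`) are hypothetical and the type list is heavily reduced; the only point illustrated is that the runtime attribute is checked before a RAM-like type is converted to ordinary RAM, while the ACPI reclaim case is left alone.

```c
#include <stdint.h>

/* Illustrative attribute bit standing in for EFI_MEMORY_RUNTIME. */
#define TOY_ATTR_RUNTIME (UINT64_C(1) << 63)

/* Reduced, hypothetical set of descriptor types and resulting map types. */
enum toy_type { TOY_CONVENTIONAL, TOY_BOOT_SERVICES, TOY_ACPI_RECLAIM, TOY_OTHER };
enum toy_e820 { TOY_E820_RAM, TOY_E820_ACPI, TOY_E820_RESERVED };

enum toy_e820 toy_classify(enum toy_type type, uint64_t attr)
{
    switch (type) {
    case TOY_ACPI_RECLAIM:
        /* Handling of this type is left alone, as the message explains. */
        return TOY_E820_ACPI;
    case TOY_CONVENTIONAL:
    case TOY_BOOT_SERVICES:
        /* Runtime-marked ranges must not become ordinary RAM. */
        if (attr & TOY_ATTR_RUNTIME)
            return TOY_E820_RESERVED;
        return TOY_E820_RAM;
    default:
        return TOY_E820_RESERVED;
    }
}
```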
Fixes: bf6501a62e80 ("x86-64: EFI boot code") Fixes: facac0af87ef ("x86-64: EFI runtime code") Fixes: 6d70ea10d49f ("Add ARM EFI boot support") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Xenia Ragiadakou [Wed, 19 Oct 2022 14:49:13 +0000 (17:49 +0300)]
xen/arm: p2m: fix pa_range_info for 52-bit pa range
Currently, the fields 'root_order' and 'sl0' of the pa_range_info for
the 52-bit pa range have the values 3 and 3, respectively.
This configuration does not match any of the valid root table configurations
for 4KB granule and t0sz 12, described in ARM DDI 0487I.a D8.2.7.
More specifically, according to ARM DDI 0487I.a D8.2.7, in order to support
the 52-bit pa size with 4KB granule, the p2m root table needs to be configured
either as a single table at level -1 or as 16 concatenated tables at level 0.
Since there is currently no support for level -1, set the 'root_order' and
'sl0' fields of the 52-bit pa_range_info according to the second approach.
Note that the values of those fields are not used so far. This patch updates
their values only for the sake of correctness.
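The "16 concatenated tables at level 0" figure follows from simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that 'root_order' means log2 of the number of concatenated root tables; with a 4KB granule each level resolves 9 bits on top of the 12-bit page offset.

```c
/* 4KB granule: 12-bit page offset, 9 bits resolved per table level. */
#define TOY_PAGE_SHIFT     12u
#define TOY_BITS_PER_LEVEL 9u

/* Address bits covered by one table at start_level (0..3) down to level 3. */
unsigned int toy_single_table_bits(unsigned int start_level)
{
    return TOY_PAGE_SHIFT + TOY_BITS_PER_LEVEL * (4u - start_level);
}

/* log2 of the number of concatenated root tables needed for ipa_bits. */
unsigned int toy_root_order(unsigned int ipa_bits, unsigned int start_level)
{
    unsigned int covered = toy_single_table_bits(start_level);

    return ipa_bits > covered ? ipa_bits - covered : 0u;
}
```

A single level 0 table covers 12 + 4 * 9 = 48 bits, so a 52-bit range leaves 4 bits to be resolved by concatenation, i.e. 2^4 = 16 root tables.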
Fixes: 407b13a71e32 ("xen/arm: p2m don't fall over on FEAT_LPA enabled hw") Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
All functions in domain_build.c should be marked __init. This was
spotted when building the hypervisor with -Og.
Fixes: 1050a7b91c2e ("xen/arm: add pci-domain for disabled devices") Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Edwin Török [Fri, 21 Oct 2022 07:59:25 +0000 (08:59 +0100)]
tools/ocaml/xenstored: fix live update exception
During live update we will load the /tool/xenstored path from the previous binary,
and then try to mkdir /tool again which will fail with EEXIST.
Check for existence of the path before creating it.
The write call to /tool/xenstored should not need any changes
(and we do want to overwrite any previous path, in case it changed).
Prior to 7110192b1df6 live update would work only if the binary path was
specified; with 7110192b1df6 and this change, live update also works when
no binary path is specified in `xenstore-control live-update`.
Fixes: 7110192b1df6 ("tools/oxenstored: Fix Oxenstored Live Update") Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Peter Hoyes [Mon, 3 Oct 2022 14:42:16 +0000 (15:42 +0100)]
tools/xendomains: Restrict domid pattern in LIST_GREP
The xendomains script uses the output of `xl list -l` to collect the
id and name of each domain, which is used in the shutdown logic, amongst
other purposes.
The linked commit added a "domid" field to libxl_domain_create_info.
This causes the output of `xl list -l` to contain two "domid"s per
domain, which may not be equal. This in turn causes `xendomains stop` to
issue two shutdown commands per domain, one of which is to a duplicate
and/or invalid domid.
To work around this, make the LIST_GREP pattern more restrictive for
domid, so it only detects the domid at the top level and not the domid
inside c_info.
Fixes: 4a3a25678d92 ("libxl: allow creation of domains with a specified or random domid") Signed-off-by: Peter Hoyes <Peter.Hoyes@arm.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Andrew Cooper [Wed, 19 Oct 2022 17:12:33 +0000 (18:12 +0100)]
tools/oxenstored: Fix Oxenstored Live Update
tl;dr This hunk was part of the patch emailed to xen-devel, but was missing
from what ultimately got committed.
https://lore.kernel.org/xen-devel/4164cb728313c3b9fc38cf5e9ecb790ac93a9600.1610748224.git.edvin.torok@citrix.com/
is the patch in question, but was part of a series that had threading issues.
I have a vague recollection that I sourced the commits from a local branch,
which clearly wasn't as up-to-date as I had thought.
Either way, it's my fault/mistake, and this hunk should have been part of what
got committed.
Fixes: 00c48f57ab36 ("tools/oxenstored: Start live update process") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Roger Pau Monné [Thu, 20 Oct 2022 14:37:29 +0000 (16:37 +0200)]
test/vpci: enable by default
CONFIG_HAS_PCI is not defined for the tools build, and as a result the
vpci harness would never get built. Fix this by building it
unconditionally; there's nothing arch-specific in it.
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Roger Pau Monné [Thu, 20 Oct 2022 14:37:15 +0000 (16:37 +0200)]
test/vpci: fix vPCI test harness to provide pci_get_pdev()
Instead of pci_get_pdev_by_domain(), which is no longer present in the
hypervisor.
While there add parentheses around the define value.
Fixes: a37f9ea7a6 ('PCI: fold pci_get_pdev{,_by_domain}()') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Roger Pau Monné [Thu, 20 Oct 2022 14:36:48 +0000 (16:36 +0200)]
test/vpci: add dummy cfcheck define
Some vpci functions got the cfcheck attribute added, but that's not
defined in the user-space test harness, so add a dummy define in order
for the harness to build.
Fixes: 4ed7d5525f ('xen/vpci: CFI hardening') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Henry Wang [Tue, 18 Oct 2022 14:23:46 +0000 (14:23 +0000)]
xen/arm: p2m: Populate pages for GICv2 mapping in p2m_init()
Hardware using GICv2 needs a P2M mapping of the 8KB GICv2 area to be
created when the domain is created. In the worst case this requires 6
P2M pages, as the two pages will be consecutive but not necessarily in
the same L3 page table. To cover that and keep a buffer, populate 16
pages by default into the P2M pages pool in p2m_init() at the domain
creation stage. For GICv3 the above-mentioned P2M mapping is not
necessary, but since the 16 pages allocated here would not be lost,
populate them unconditionally.
With the default 16 P2M pages populated, domain creation can now fail
with P2M pages already in use. To properly free the P2M in this case,
first support optional preemption of p2m_teardown(), then call
p2m_teardown() and p2m_set_allocation(d, 0, NULL) non-preemptively in
p2m_final_teardown(). As non-preemptive p2m_teardown() should only
return 0, use a BUG_ON to confirm that.
Since p2m_final_teardown() is called either after
domain_relinquish_resources(), where relinquish_p2m_mapping() has been
called, or from the failure path of domain_create()/arch_domain_create(),
where mappings that require p2m_put_l3_page() should never be created,
relinquish_p2m_mapping() is not added to p2m_final_teardown(). Add
in-code comments to note this.
Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool") Suggested-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Release-acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Tue, 18 Oct 2022 14:23:45 +0000 (14:23 +0000)]
arm/p2m: Rework p2m_init()
p2m_init() is mostly trivial initialisation, but has two fallible operations
which are on either side of the backpointer trigger for teardown to take
actions.
p2m_free_vmid() is idempotent with a failed p2m_alloc_vmid(), so rearrange
p2m_init() to perform all trivial setup, then set the backpointer, then
perform all fallible setup.
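The ordering described above (trivial setup, then backpointer, then fallible setup) can be sketched like this. Everything here is hypothetical: `struct toy_p2m`, the `toy_fail_*` fault-injection switches and both functions only model the control flow, not the real allocation of a VMID or root table.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_domain { int id; };

/* Hypothetical model of the p2m state relevant to init/teardown ordering. */
struct toy_p2m {
    struct toy_domain *domain;  /* backpointer; teardown acts only if set */
    bool have_vmid;
    bool have_root;
};

/* Fault-injection switches so the failure paths can be exercised. */
bool toy_fail_vmid, toy_fail_root;

/* Safe to call after a partial init, and safe to call more than once. */
void toy_p2m_teardown(struct toy_p2m *p2m)
{
    if (p2m->domain == NULL)
        return;
    p2m->have_root = false;
    p2m->have_vmid = false;  /* releasing a never-allocated VMID is a no-op */
}

int toy_p2m_init(struct toy_p2m *p2m, struct toy_domain *d)
{
    /* 1. Trivial initialisation that cannot fail. */
    p2m->have_vmid = false;
    p2m->have_root = false;

    /* 2. Set the backpointer, which arms teardown. */
    p2m->domain = d;

    /* 3. Fallible operations, all after the backpointer is set. */
    if (toy_fail_vmid)
        return -1;
    p2m->have_vmid = true;

    if (toy_fail_root)
        return -1;
    p2m->have_root = true;

    return 0;
}
```

Because every fallible step comes after the backpointer is set, a failure at any of them leaves the structure in a state teardown can handle, and a further fallible step can be appended without changing that property.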
This will simplify a future bugfix which needs to add a third fallible
operation.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Mon, 17 Oct 2022 10:34:03 +0000 (11:34 +0100)]
tools: Workaround wrong use of tools/Rules.mk by qemu-trad
The qemu-trad build system, when built from xen.git, will make use of
Rules.mk (set up via qemu-trad.git/xen-setup). This means that changes
to Rules.mk will have an impact on our ability to build qemu-trad.
Recent commit e4f5949c4466 ("tools: Add -Werror by default to all
tools/") has added "-Werror" to the CFLAGS and qemu-trad started to use
it. But this fails, as there are lots of warnings that are now turned
into errors.
We should teach qemu-trad and xen.git to not have to use Rules.mk when
building qemu-trad, but for now, avoid adding -Werror to CFLAGS when
building qemu-trad.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:13 +0000 (14:05 +0100)]
tools: Rework linking options for ocaml binding libraries
Using a full path to the C libraries when preparing one of the ocaml
bindings for those libraries makes the binding unusable by external
projects. The full path is somehow embedded and reused by the external
project when linking against the binding.
Instead, we will use the proper way to link a library, by using '-l'.
For in-tree builds, we also need to provide the search directory via
'-L'.
(The search paths given via -L are still embedded, but at least that doesn't
prevent the ocaml binding from being used.)
Related-to: xen-project/xen#96 Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:12 +0000 (14:05 +0100)]
tools/golang/xenlight: Rework gengotypes.py and generation of *.gen.go
gengotypes.py creates both "types.gen.go" and "helpers.gen.go", but
make can start gengotypes.py twice. Rework the rules so that
gengotypes.py is executed only once.
Also, add the ability to provide a path to tell gengotypes.py where to
put the files. This doesn't matter yet but it will when for example
the script will be run from tools/ to generate the targets.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:09 +0000 (14:05 +0100)]
libs/light: Rework generation of include/_libxl_*.h
Instead of moving the public "_libxl_*.h" headers, we make a copy to
the destination so that make doesn't try to remake the targets
"_libxl_*.h" in libs/light/ again.
A new .PRECIOUS target is added to tell make not to delete the
intermediate targets generated by "gentypes.py".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:08 +0000 (14:05 +0100)]
libs/light: Rework acpi table build targets
Currently, a rebuild of libxl will always rebuild "build.o". This is because
the target depends on "acpi" which never exists. So instead we will have
"build.o" have as prerequisites targets that are actually generated by "acpi",
that is $(DSDT_FILES-y).
While "dsdt_*.c" isn't really a dependency for "build.o", a side
effect of building that dsdt_*.c is to also generate the "ssdt_*.h"
that "build.o" needs, but I don't want to list all the headers needed
by "build.o" and duplicate the information available in
"libacpi/Makefile" at this time.
Also avoid duplicating the "acpi" target for Arm, and use a single one for
both architectures. And move the "acpi" target to be with other targets
rather than in the middle of the source listing. For the same reason,
move the prerequisites listing for both $(DSDT_FILES-y) and "build.o".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:07 +0000 (14:05 +0100)]
tools/include: Rework Makefile
Rework "xen-xsm" rules so they no longer change directory to run
mkflask.sh: store the mkflask.sh path in a variable, use a full path
for FLASK_H_DEPEND, and make the output directory relative.
Rename "all-y" target to a more descriptive "xen/lib/x86/all".
Removed the "dist" target which was the only one existing in tools/.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:05 +0000 (14:05 +0100)]
libs: Avoid exposing -Wl,--version-script to other built library
$(SHLIB_LDFLAGS) is used by more targets than the single target that
expects it (libxenfoo.so.X.Y). There are also some dynamic libraries in
stats/ that use $(SHLIB_LDFLAGS) (even if those are never built), and
there's libxenlight_test.so, which doesn't need a version script.
Also, libxenlight_test.so might fail to build if the version script
doesn't exist yet.
For these reasons, avoid changing the generic $(SHLIB_LDFLAGS) flags,
and add the flag directly on the command line.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:04 +0000 (14:05 +0100)]
git-checkout.sh: handle running git-checkout from a different directory
"$DIR" might not be a full path and it might not have `pwd` as ".."
directory. So use `cd -` to undo the first `cd` command.
Also, use `basename` to make a symbolic link with a relative path.
This doesn't matter yet but it will when, for example, the commands to
clone OVMF are run from tools/ rather than tools/firmware/.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:03 +0000 (14:05 +0100)]
libs/light/gentypes.py: allow to generate headers in subdirectory
This doesn't matter yet but it will when, for example, the script is
run from tools/ to generate files in tools/libs/light/.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:02 +0000 (14:05 +0100)]
tools/hotplug: Generate "hotplugpath.sh" with configure
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:01 +0000 (14:05 +0100)]
tools: Remove -Werror everywhere else
The previous changeset, e4f5949c4466 ("tools: Add -Werror by default to all
tools/"), added "-Werror" to CFLAGS in tools/Rules.mk. Remove it from
everywhere else now that it is duplicated.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:05:00 +0000 (14:05 +0100)]
tools: Add -Werror by default to all tools/
And provide an option to ./configure to disable it.
A follow-up patch will remove -Werror from every other Makefile in
tools/.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:04:59 +0000 (14:04 +0100)]
tools: Introduce $(xenlibs-ldflags, ) macro
This avoids the need to open-code the list of flags needed to link
with an in-tree Xen library when using -lxen*.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:04:58 +0000 (14:04 +0100)]
tools/xentrace: rework Makefile
Remove "build" targets.
Use "$(TARGETS)" to list binary to be built.
Cleanup "clean" rule.
Also drop conditional install of $(BIN) and $(LIBBIN) as those two
variables are now always populated.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Anthony PERARD [Thu, 13 Oct 2022 13:04:57 +0000 (14:04 +0100)]
tools/debugger/gdbsx: Fix and cleanup makefiles
gdbsx/:
- Make use of subdir facility for the "clean" target.
- No need to remove the *.a, they aren't in this dir.
- Avoid calling "distclean" in subdirs as "distclean" targets do only
call "clean", and the "clean" also runs "clean" in subdirs.
- Avoid the need to make "gx_all.a" and "xg_all.a" in the "all"
recipe by forcing make to check for update of "xg/xg_all.a" and
"gx/gx_all.a" by having "FORCE" as prerequisite. Now, when making
"gdbsx", make will recurse even when both *.a already exist.
- List target in $(TARGETS).
gdbsx/*/:
- Fix dependency on *.h.
- Remove some dead code.
- List targets in $(TARGETS).
- Remove "build" target.
- Cleanup "clean" targets.
- remove comments about the choice of "ar" instead of "ld"
- Use "$(AR)" instead of plain "ar".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jason Andryuk [Fri, 7 Oct 2022 19:31:24 +0000 (15:31 -0400)]
argo: Remove reachable ASSERT_UNREACHABLE
I observed this ASSERT_UNREACHABLE in partner_rings_remove consistently
trip. It was in OpenXT with the viptables patch applied.
dom10 shuts down.
dom7 is REJECTED sending to dom10.
dom7 shuts down and this ASSERT trips for dom10.
The argo_send_info has a domid, but there is no refcount taken on
the domain. Therefore it's not appropriate to ASSERT that the domain
can be looked up via domid. Replace with a debug message.
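The reasoning can be illustrated with a lookup that is allowed to fail. This is a hypothetical sketch: `toy_lookup()` and `toy_partner_remove()` are invented names, and a flat array stands in for the hypervisor's domain list. Since no reference is held on the partner domain, the stored domid may refer to a domain that has already gone away, so a failed lookup is reported rather than asserted against.

```c
#include <stddef.h>
#include <stdio.h>

struct toy_domain { int domid; };

/* Lookup by id with no reference held: may legitimately return NULL. */
struct toy_domain *toy_lookup(struct toy_domain *table, size_t n, int domid)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].domid == domid)
            return &table[i];
    return NULL;
}

/* Returns 1 if the partner was found and handled, 0 if it was already gone. */
int toy_partner_remove(struct toy_domain *table, size_t n, int domid)
{
    struct toy_domain *d = toy_lookup(table, n, domid);

    if (d == NULL) {
        /* Formerly an "unreachable" assertion; now only a debug message. */
        fprintf(stderr, "partner dom%d not found\n", domid);
        return 0;
    }
    /* ... the real code would update the partner's ring state here ... */
    return 1;
}
```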
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
As that commit describes, on early Sapphire Rapids Xeon platforms the C1 and
C1E states were mutually exclusive, so that users could only have either C1 and
C6, or C1E and C6.
However, Intel firmware engineers managed to remove this limitation and make C1
and C1E completely independent, just like on previous Xeon platforms.
Therefore, this patch:
* Removes commentary describing the old, and now non-existing SPR C1E
limitation.
* Marks SPR C1E as available by default.
* Removes the 'preferred_cstates' parameter handling for SPR. Both C1 and
C1E will be available regardless of 'preferred_cstates' value.
We expect that all SPR systems are shipping with new firmware, which includes
the C1/C1E improvement.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 1548fac47a11 Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Peter Zijlstra [Thu, 13 Oct 2022 15:55:22 +0000 (17:55 +0200)]
x86/mwait-idle: disable IBRS during long idle
Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.
Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.
Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.
Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.
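The trade-off can be sketched as a threshold check around the idle entry. The structure, the numeric C-state index and the threshold value below are hypothetical; the sketch only counts the two MSR writes that toggling IBRS would cost and shows that shallow states skip them.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state: IBRS setting and a count of MSR writes. */
struct toy_cpu {
    bool ibrs_enabled;
    unsigned int msr_writes;
};

/* Hypothetical threshold: states at or beyond this index count as "deep". */
#define TOY_DEEP_THRESHOLD 6u

void toy_idle(struct toy_cpu *cpu, unsigned int cstate)
{
    /* Only pay for the two MSR writes when entering a deep state. */
    bool toggle = cpu->ibrs_enabled && cstate >= TOY_DEEP_THRESHOLD;

    if (toggle) {
        cpu->ibrs_enabled = false;  /* first MSR write */
        cpu->msr_writes++;
    }

    /* ... mwait would happen here, with interrupts kept disabled ... */

    if (toggle) {
        cpu->ibrs_enabled = true;   /* second MSR write */
        cpu->msr_writes++;
    }
}
```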
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bf5835bcdb96 Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Zhang Rui [Thu, 13 Oct 2022 15:54:23 +0000 (17:54 +0200)]
x86/mwait-idle: add AlderLake support
Similar to SPR, the C1 and C1E states on ADL are mutually exclusive.
Only one of them can be enabled at a time.
But in contrast to SPR, which usually has a strong latency requirement
as a Xeon processor, C1E is preferred on ADL for better energy
efficiency.
Add custom C-state tables for ADL with both C1 and C1E, and
1. Enable the "C1E promotion" bit in MSR_IA32_POWER_CTL and mark C1
with the CPUIDLE_FLAG_UNUSABLE flag, so C1 is not available by
default.
2. Add support for the "preferred_cstates" module parameter, so that
users can choose to use C1 instead of C1E by booting with
"intel_idle.preferred_cstates=2".
Separate custom C-state tables are introduced for the ADL mobile and
desktop processors, because of the exit latency differences between
these two variants, especially with respect to PC10.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits, code rearrangement ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git d1cf8bbfed1e Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Add a Sapphire Rapids Xeon C6 optimization, similar to what we have for Sky Lake
Xeon: if package C6 is disabled, adjust C6 exit latency and target residency to
match core C6 values, instead of using the default package C6 values.
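A minimal sketch of the adjustment, with hypothetical names (`struct toy_cstate`, `toy_adjust_c6()`); the latency and residency values are supplied by the caller here and no real SPR numbers are implied.

```c
#include <stdbool.h>

/* Hypothetical C-state parameters, in microseconds. */
struct toy_cstate {
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

/*
 * If package C6 is disabled, the state can never be deeper than core C6,
 * so use the core C6 parameters instead of the package C6 defaults.
 */
void toy_adjust_c6(struct toy_cstate *c6, bool pkg_c6_enabled,
                   const struct toy_cstate *core_c6)
{
    if (!pkg_c6_enabled)
        *c6 = *core_c6;
}
```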
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 3a9cf77b60dc
Make sure a contradictory "preferred-cstates" wouldn't cause bypassing
of the added logic.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Artem Bityutskiy [Thu, 13 Oct 2022 15:52:36 +0000 (17:52 +0200)]
x86/mwait-idle: add 'preferred-cstates' command line option
On Sapphire Rapids Xeon (SPR) the C1 and C1E states are basically mutually
exclusive - only one of them can be enabled. By default, 'intel_idle' driver
enables C1 and disables C1E. However, some users prefer to use C1E instead of
C1, because it saves more energy.
This patch adds a new module parameter ('preferred_cstates') for enabling C1E
and disabling C1. Here is the idea behind it.
1. This option has effect only for "mutually exclusive" C-states like C1 and
C1E on SPR.
2. It does not have any effect on independent C-states, which do not require
other C-states to be disabled (most states on most platforms as of today).
3. For mutually exclusive C-states, the 'intel_idle' driver always has a
reasonable default, such as enabling C1 on SPR by default. On other
platforms, the default may be different.
4. Users can override the default using the 'preferred_cstates' parameter.
5. The parameter accepts the preferred C-states bit-mask, similarly to the
existing 'states_off' parameter.
6. This parameter is not limited to C1/C1E, and leaves room for supporting
other mutually exclusive C-states, if they come in the future.
Today 'intel_idle' can only be compiled-in, which means that on SPR, in order
to disable C1 and enable C1E, users should boot with the following kernel
argument: intel_idle.preferred_cstates=4
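How such a bit-mask might be interpreted is sketched below, following the Linux-side description. The value 4 for C1E comes from the example argument above; placing C1 at bit 1 is an assumption, and `toy_pick_c1_flavour()` is a hypothetical helper rather than the driver's code.

```c
#include <stdbool.h>

#define TOY_C1  (1u << 1)  /* assumed bit position for C1 */
#define TOY_C1E (1u << 2)  /* matches the "=4" example for C1E */

/*
 * Choose between the mutually exclusive C1 and C1E.
 * 'dflt' is the platform default (TOY_C1 or TOY_C1E) and is kept when
 * the mask expresses no preference or a contradictory one.
 */
unsigned int toy_pick_c1_flavour(unsigned int preferred, unsigned int dflt)
{
    bool want_c1 = (preferred & TOY_C1) != 0;
    bool want_c1e = (preferred & TOY_C1E) != 0;

    if (want_c1e && !want_c1)
        return TOY_C1E;
    if (want_c1 && !want_c1e)
        return TOY_C1;
    return dflt;
}
```

Falling back to the default when both bits are set is a design choice of this sketch; it keeps a contradictory request from silently selecting one of the two states.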
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git da0e58c038e6
Enable C1E (if requested) not only on the BSP's socket / package. Alter
command line option to fit our model, and extend it to also accept
string form arguments.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Andrew Cooper [Mon, 25 Jul 2022 17:36:29 +0000 (18:36 +0100)]
tools/ocaml/xc: Address ABI issues with physinfo arch flags
The current bindings function, but the preexisting
type physinfo_arch_cap_flag =
| X86 of x86_physinfo_arch_cap_flag
is a special case in the Ocaml type system with an unusual indirection, and
will break when a second option, e.g. `| ARM of ...` is added.
Also, the position of the list is logically wrong. Currently, the types express
a list of elements which might be an x86 flag or an arm flag (and can
intermix), whereas what we actually want is either a list of x86 flags, or a
list of ARM flags (that cannot intermix).
Rework the Ocaml types to avoid the ABI special case and move the list
primitive, and adjust the C bindings to match.
Fixes: 2ce11ce249a3 ("x86/HVM: allow per-domain usage of hardware virtualized APIC") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Andrew Cooper [Wed, 12 Oct 2022 10:02:08 +0000 (11:02 +0100)]
tools/ocaml/xc: Fix code legibility in stub_xc_domain_create()
Reposition the defines to match the outer style and to make the logic
half-legible.
No functional change.
Fixes: 0570d7f276dd ("x86/msr: introduce an option for compatible MSR behavior selection") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Wed, 12 Oct 2022 15:57:56 +0000 (17:57 +0200)]
VMX: correct error handling in vmx_create_vmcs()
With the addition of vmx_add_msr() calls to construct_vmcs() there are
now cases where simply freeing the VMCS isn't enough: The MSR bitmap
page as well as one of the MSR area ones (if it's the 2nd vmx_add_msr()
which fails) may also need freeing. Switch to using vmx_destroy_vmcs()
instead.
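The general pattern, routing every error path through one destructor that tolerates partially built state, can be sketched as follows. This is not the VMX code: struct vcpu_ctx, alloc_step, ctx_create and ctx_destroy are hypothetical names, and malloc/free stand in for the hypervisor's page allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct vcpu_ctx {
    void *vmcs;
    void *msr_bitmap;
    void *msr_area;
};

/* Hypothetical allocator with failure injection for step 'fail_at'. */
static void *alloc_step(int step, int fail_at)
{
    return step == fail_at ? NULL : malloc(64);
}

/* Frees whatever has been allocated so far; free(NULL) is a no-op. */
static void ctx_destroy(struct vcpu_ctx *c)
{
    assert(c != NULL);
    free(c->msr_area);
    free(c->msr_bitmap);
    free(c->vmcs);
    c->msr_area = c->msr_bitmap = c->vmcs = NULL;
}

static int ctx_create(struct vcpu_ctx *c, int fail_at)
{
    assert(c != NULL);
    c->vmcs = c->msr_bitmap = c->msr_area = NULL;

    if ((c->vmcs = alloc_step(0, fail_at)) == NULL ||
        (c->msr_bitmap = alloc_step(1, fail_at)) == NULL ||
        (c->msr_area = alloc_step(2, fail_at)) == NULL) {
        /* Freeing only c->vmcs here would leak the later allocations. */
        ctx_destroy(c);
        return -1;
    }
    return 0;
}
```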
Fixes: 3bd36952dab6 ("x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests") Fixes: 53a570b28569 ("x86/spec-ctrl: Support IBPB-on-entry") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Tue, 11 Oct 2022 12:30:41 +0000 (14:30 +0200)]
x86emul: respect NSCB
protmode_load_seg() would better adhere to that "feature" of clearing
base (and limit) during NULL selector loads.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Tue, 11 Oct 2022 12:29:30 +0000 (14:29 +0200)]
gnttab: correct locking on transitive grant copy error path
While the comment next to the lock dropping in preparation of
recursively calling acquire_grant_for_copy() mistakenly talks about the
rd == td case (excluded a few lines further up), the same concerns apply
to the calling of release_grant_for_copy() on a subsequent error path.
This is CVE-2022-33748 / XSA-411.
Fixes: ad48fb963dbf ("gnttab: fix transitive grant handling") Signed-off-by: Jan Beulich <jbeulich@suse.com>
Henry Wang [Mon, 6 Jun 2022 06:17:30 +0000 (06:17 +0000)]
xen/arm: Allocate and free P2M pages from the P2M pool
This commit sets up and tears down the p2m pages pool for non-privileged Arm
guests by calling `p2m_set_allocation` and `p2m_teardown_allocation`.
- For dom0, P2M pages should come from heap directly instead of p2m
pool, so that the kernel may take advantage of the extended regions.
- For xl guests, the setting of the p2m pool is called in
`XEN_DOMCTL_shadow_op` and the p2m pool is destroyed in
`domain_relinquish_resources`. Note that domctl->u.shadow_op.mb is
updated with the new size when setting the p2m pool.
- For dom0less domUs, the setting of the p2m pool is called before
allocating memory during domain creation. Users can specify the p2m
pool size by `xen,domain-p2m-mem-mb` dts property.
To actually allocate/free pages from the p2m pool, this commit adds
two helper functions namely `p2m_alloc_page` and `p2m_free_page` to
`struct p2m_domain`. By replacing the `alloc_domheap_page` and
`free_domheap_page` with these two helper functions, p2m pages can
be added/removed from the list of p2m pool rather than from the heap.
Since a page from `p2m_alloc_page` is cleaned, take the opportunity
to remove the redundant `clean_page` in `p2m_create_table`.
This is part of CVE-2022-33747 / XSA-409.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Mon, 6 Jun 2022 06:17:29 +0000 (06:17 +0000)]
xen/arm, libxl: Implement XEN_DOMCTL_shadow_op for Arm
This commit implements the `XEN_DOMCTL_shadow_op` support in Xen
for Arm. The p2m pages pool size for xl guests is supposed to be
determined by `XEN_DOMCTL_shadow_op`. Hence, this commit:
- Introduces a function `p2m_domctl` and implements the subops
`XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` and
`XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION` of `XEN_DOMCTL_shadow_op`.
- Adds the `XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` support in libxl.
Therefore enabling the setting of shadow memory pool size
when creating a guest from xl and getting shadow memory pool size
from Xen.
Note that the `XEN_DOMCTL_shadow_op` added in this commit is only
a dummy op, and the functionality of setting/getting p2m memory pool
size for xl guests will be added in following commits.
This is part of CVE-2022-33747 / XSA-409.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Mon, 6 Jun 2022 06:17:28 +0000 (06:17 +0000)]
xen/arm: Construct the P2M pages pool for guests
This commit constructs the p2m pages pool for guests from the
data structure and helper perspective.
This is implemented by:
- Adding a `struct paging_domain` which contains a freelist, a
counter variable and a spinlock to `struct arch_domain` to
indicate the free p2m pages and the number of p2m total pages in
the p2m pages pool.
- Adding a helper `p2m_get_allocation` to get the p2m pool size.
- Adding a helper `p2m_set_allocation` to set the p2m pages pool
size. This helper should be called before allocating memory for
a guest.
- Adding a helper `p2m_teardown_allocation` to free the p2m pages
pool. This helper should be called during the xl domain destroy.
This is part of CVE-2022-33747 / XSA-409.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Mon, 6 Jun 2022 06:17:27 +0000 (06:17 +0000)]
libxl, docs: Add per-arch extra default paging memory
This commit adds a per-arch macro `EXTRA_DEFAULT_PAGING_MEM_MB`
to the default paging memory size, in order to cover the p2m
pool for extended regions of a xl-based guest on Arm.
For Arm, the extra default paging memory is 128MB.
For x86, the extra default paging memory is zero, since there
are no extended regions on x86.
Also update the xl.cfg documentation to add Arm documentation
according to code changes.
This is part of CVE-2022-33747 / XSA-409.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Tue, 11 Oct 2022 12:24:48 +0000 (14:24 +0200)]
xen/x86: p2m: Add preemption in p2m_teardown()
The list p2m->pages contains all the pages used by the P2M. On large
instances this can be quite large and the time spent to call
d->arch.paging.free_page() will take more than 1ms for an 80GB guest
on Xen running in a nested environment on a c5.metal.
By extrapolation, it would take > 100ms for an 8TB guest (what we
currently security support). So add some preemption in p2m_teardown()
and propagate to the callers. Note there are 3 places where
the preemption is not enabled:
- hap_final_teardown()/shadow_final_teardown(): We are
preventing updates to the P2M once the domain is dying (so
no more pages could be allocated) and most of the P2M pages
will be freed in a preemptive manner when relinquishing the
resources. So it is fine to disable preemption.
- shadow_enable(): This is fine because it will undo the allocation
that may have been made by p2m_alloc_table() (so only the root
page table).
The preemption is arbitrarily checked every 1024 iterations.
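A minimal sketch of such a loop, under simplifying assumptions: a singly linked list stands in for the page list, a counter stands in for actually freeing a page, and preempt_check() is a hypothetical stand-in for the hypervisor's preemption check. Passing a NULL 'preempted' pointer models the callers that keep preemption disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREEMPT_CHECK_INTERVAL 1024u  /* interval named in the message */

struct page { struct page *next; };

struct pool {
    struct page *head;    /* pages still to be freed */
    unsigned long freed;  /* stands in for really freeing a page */
};

/* Hypothetical stand-in for "is other work pending?". */
static bool preempt_pending;
static bool preempt_check(void) { return preempt_pending; }

/*
 * Free pages until the list is empty or preemption is requested.
 * With preempted == NULL the loop never yields.
 */
static void pool_teardown(struct pool *p, bool *preempted)
{
    unsigned int i = 0;

    assert(p != NULL);

    while (p->head != NULL) {
        struct page *pg = p->head;

        p->head = pg->next;
        p->freed++;

        /* Only look at the (comparatively costly) check every 1024 pages. */
        if (preempted != NULL && !(++i % PREEMPT_CHECK_INTERVAL) &&
            preempt_check()) {
            *preempted = true;
            return;
        }
    }
}
```

A caller that sees 'preempted' set would be expected to return to its own caller and be re-invoked later, continuing from the remaining list.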
We now need to include <xen/event.h> in p2m-basic in order to
import the definition for local_events_need_delivery() used by
general_preempt_check(). Ideally, the inclusion should happen in
xen/sched.h but it opened a can of worms.
Note that with the current approach, Xen doesn't keep track of whether
the alt/nested P2Ms have been cleared. So there is some redundant work.
However, this is not expected to incur too much overhead (the P2M lock
shouldn't be contended during teardown). So this optimization is
left outside of the security event.
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
----
Changes since v12:
- Correct altp2m preemption check placement.
Changes since v9:
- Integrate patch into series.
Changes since v2:
- Rework the loop doing the preemption
- Add a comment in shadow_enable() to explain why p2m_teardown()
doesn't need to be preemptible.
Changes since v1:
- Update the commit message
- Rebase on top of Roger's v8 series
- Fix preemption check
- Use 'unsigned int' rather than 'unsigned long' for the counter
Roger Pau Monné [Tue, 11 Oct 2022 12:24:21 +0000 (14:24 +0200)]
x86/p2m: free the paging memory pool preemptively
The paging memory pool is currently freed in two different places:
from {shadow,hap}_teardown() via domain_relinquish_resources() and
from {shadow,hap}_final_teardown() via complete_domain_destroy().
While the former does handle preemption, the latter doesn't.
Attempt to move as much p2m related freeing as possible to happen
before the call to {shadow,hap}_teardown(), so that most memory can be
freed in a preemptive way. In order to avoid causing issues to
existing callers leave the root p2m page tables set and free them in
{hap,shadow}_final_teardown(). Also modify {hap,shadow}_free to free
the page immediately if the domain is dying, so that pages don't
accumulate in the pool when {shadow,hap}_final_teardown() get called.
Move altp2m_vcpu_disable_ve() to be done in hap_teardown(), as that's
the place where altp2m_active gets disabled now.
This is part of CVE-2022-33746 / XSA-410.
Reported-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Roger Pau Monné [Tue, 11 Oct 2022 12:23:51 +0000 (14:23 +0200)]
x86/p2m: truly free paging pool memory for dying domains
Modify {hap,shadow}_free to free the page immediately if the domain is
dying, so that pages don't accumulate in the pool when
{shadow,hap}_final_teardown() get called. This is to limit the amount of
work which needs to be done there (in a non-preemptable manner).
Note the call to shadow_free() in shadow_free_p2m_page() is moved after
increasing total_pages, so that the decrease done in shadow_free() in
case the domain is dying doesn't underflow the counter, even if just for
a short interval.
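The ordering concern can be modelled with plain counters. This is a simplified sketch, not the shadow code: struct paging_pool and both function names are hypothetical, and no real pages are involved.

```c
#include <assert.h>
#include <stdbool.h>

struct paging_pool {
    unsigned int total_pages;  /* pages owned by the pool */
    unsigned int free_pages;   /* pages sitting on the pool's free list */
    unsigned int p2m_pages;    /* pages handed out for p2m use */
    bool dying;
};

/* Return a page to the pool, or straight to the heap if dying. */
static void pool_free(struct paging_pool *p)
{
    if (p->dying) {
        /* Would underflow if the page had not been accounted yet. */
        assert(p->total_pages > 0);
        p->total_pages--;
    } else {
        p->free_pages++;
    }
}

static void pool_free_p2m_page(struct paging_pool *p)
{
    assert(p->p2m_pages > 0);
    p->p2m_pages--;
    p->total_pages++;  /* account the page to the pool first ... */
    pool_free(p);      /* ... so the decrement above cannot underflow */
}
```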
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Roger Pau Monné [Tue, 11 Oct 2022 12:22:53 +0000 (14:22 +0200)]
x86/shadow: tolerate failure in shadow_prealloc()
Prevent _shadow_prealloc() from calling BUG() when unable to fulfill
the pre-allocation and instead return true/false. Modify
shadow_prealloc() to crash the domain on allocation failure (if the
domain is not already dying), as shadow cannot operate normally after
that. Modify callers to also gracefully handle {_,}shadow_prealloc()
failing to fulfill the request.
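The shape of the change, a boolean result from the inner helper and the crash decision in the wrapper, might look roughly like this. The struct and function names are hypothetical, and the inner helper is reduced to a counter comparison; the real one does considerably more work to reclaim pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dom {
    unsigned int free_pages;
    bool dying;
    bool crashed;
};

/* Inner helper: report failure instead of treating it as a fatal bug. */
static bool prealloc_raw(const struct dom *d, unsigned int count)
{
    return d->free_pages >= count;
}

/* Wrapper: crash the domain on failure, unless it is already dying. */
static bool prealloc(struct dom *d, unsigned int count)
{
    bool ok;

    assert(d != NULL);

    ok = prealloc_raw(d, count);
    if (!ok && !d->dying)
        d->crashed = true;

    return ok;  /* callers must still check and unwind gracefully */
}
```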
Note this in turn requires adjusting the callers of
sh_make_monitor_table() also to handle it returning INVALID_MFN.
sh_update_paging_modes() is also modified to add additional error
paths in case of allocation failure, some of those will return with
null monitor page tables (and the domain likely crashed). This is no
different than current error paths, but the newly introduced ones are
more likely to trigger.
The now added failure points in sh_update_paging_modes() also require
that on some error return paths the previous structures are cleared,
and thus monitor table is null.
While there adjust the 'type' parameter type of shadow_prealloc() to
unsigned int rather than u32.
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 11 Oct 2022 12:22:24 +0000 (14:22 +0200)]
x86/shadow: tolerate failure of sh_set_toplevel_shadow()
Subsequently sh_set_toplevel_shadow() will be adjusted to install a
blank entry in case prealloc fails. There are, in fact, pre-existing
error paths which would put in place a blank entry. The 4- and 2-level
code in sh_update_cr3(), however, assume the top level entry to be
valid.
Hence bail from the function in the unlikely event that it's not. Note
that 3-level logic works differently: In particular a guest is free to
supply a PDPTR pointing at 4 non-present (or otherwise deemed invalid)
entries. The guest will crash, but we already cope with that.
Really mfn_valid() is likely wrong to use in sh_set_toplevel_shadow(),
and it should instead be !mfn_eq(gmfn, INVALID_MFN). Avoid such a change
in security context, but add a respective assertion.
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 11 Oct 2022 12:21:56 +0000 (14:21 +0200)]
x86/HAP: adjust monitor table related error handling
hap_make_monitor_table() will return INVALID_MFN if it encounters an
error condition, but hap_update_paging_modes() wasn’t handling this
value, resulting in an inappropriate value being stored in
monitor_table. This would subsequently misguide at least
hap_vcpu_teardown(). Avoid this by bailing early.
Further, when a domain has/was already crashed or (perhaps less
important as there's no such path known to lead here) is already dying,
avoid calling domain_crash() on it again - that's at best confusing.
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Tue, 11 Oct 2022 12:21:23 +0000 (14:21 +0200)]
x86/p2m: add option to skip root pagetable removal in p2m_teardown()
Add a new parameter to p2m_teardown() in order to select whether the
root page table should also be freed. Note that all users are
adjusted to pass the parameter to remove the root page tables, so
behavior is not modified.
No functional change intended.
This is part of CVE-2022-33746 / XSA-410.
Suggested-by: Julien Grall <julien@xen.org> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Julien Grall [Mon, 6 Jun 2022 06:17:26 +0000 (06:17 +0000)]
xen/arm: p2m: Handle preemption when freeing intermediate page tables
At the moment the P2M page tables will be freed when the domain structure
is freed without any preemption. As the P2M is quite large, iterating
through this may take more time than is reasonable without intermediate
preemption (to run softirqs and perhaps scheduler).
Split p2m_teardown() in two parts: one preemptible and called when
relinquishing the resources, the other one non-preemptible and called
when freeing the domain structure.
As we are now freeing the P2M pages early, we also need to prevent
further allocation if someone calls p2m_set_entry() past p2m_teardown()
(I wasn't able to prove this will never happen). This is done by
checking domain->is_dying (added by the previous patch) in p2m_set_entry().
Similarly, we want to make sure that no one can access the freed
pages. Therefore the root is cleared before freeing pages.
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 6 Jun 2022 06:17:25 +0000 (06:17 +0000)]
xen/arm: p2m: Prevent adding mapping when domain is dying
During the domain destroy process, the domain will still be accessible
until it is fully destroyed. So is the P2M, because we don't bail
out early if is_dying is non-zero. If a domain has permission to
modify the other domain's P2M (i.e. dom0, or a stubdomain), then
foreign mapping can be added past relinquish_p2m_mapping().
Therefore, we need to prevent mappings from being added when the domain
is dying. This commit prevents such additions by adding the
d->is_dying check to p2m_set_entry(). Also this commit enhances the
check in relinquish_p2m_mapping() to make sure that no mappings can
be added in the P2M after the P2M lock is released.
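In outline, the guard amounts to an early return before any state is touched. The sketch below is a simplified model: struct p2m, its fields and set_entry are hypothetical, and the specific error code is an assumption of this sketch rather than something stated in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct p2m {
    bool domain_dying;         /* models d->is_dying being non-zero */
    unsigned int nr_mappings;  /* models entries present in the P2M */
};

/* Refuse to add a mapping once the owning domain is being destroyed. */
static int set_entry(struct p2m *p2m)
{
    assert(p2m != NULL);

    if (p2m->domain_dying)
        return -ENOMEM;  /* assumed error code */

    p2m->nr_mappings++;
    return 0;
}
```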
This is part of CVE-2022-33746 / XSA-410.
Signed-off-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Wed, 5 Oct 2022 08:55:27 +0000 (10:55 +0200)]
x86/NUMA: correct off-by-1 in node map population
As it turns out populate_memnodemap() so far "relied" on
extract_lsb_from_nodes() setting memnodemapsize one too high in edge
cases. Correct the issue there as well, by changing "epdx" to be an
inclusive PDX and adjusting the respective relational operators.
While there also limit the scope of both related variables.
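The effect of making the end index inclusive can be seen in a small stand-alone model. This is a sketch with plain integers in place of PDXes; populate_map and its parameters are hypothetical and the real function does more checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill map slots covering [start, end) with 'node'.
 * Returns 0 on success, -1 if the range does not fit the map.
 */
static int populate_map(uint8_t *map, size_t map_size, unsigned int shift,
                        unsigned long start, unsigned long end, uint8_t node)
{
    unsigned long last, idx;

    assert(map != NULL);

    if (start >= end)
        return 0;  /* empty range, nothing to record */

    last = end - 1;  /* inclusive end: the last index really covered */

    /* Comparing the exclusive 'end' here would reject exact fits. */
    if ((last >> shift) >= map_size)
        return -1;

    for (idx = start >> shift; idx <= (last >> shift); idx++)
        map[idx] = node;

    return 0;
}
```

With a 4-entry map and shift 2, the range [0, 16) fits exactly; testing the exclusive end (16 >> 2 == 4) against the map size would only pass if the size had been set one too high.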
Fixes: b1f4b45d02ca ("x86/NUMA: correct off-by-1 in node map size calculation") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
xen/arm: fix booting ACPI based system after static evtchn series
When ACPI is enabled and the system booted with ACPI, BUG() is observed
after merging the static event channel series. As there is no DT when
booted with ACPI there will be no chosen node, and because of that
"BUG_ON(chosen == NULL)" will be hit.
(XEN) Xen BUG at arch/arm/domain_build.c:3578
Move call to alloc_static_evtchn() under acpi_disabled check to fix the
issue.
Jan Beulich [Fri, 30 Sep 2022 13:16:22 +0000 (15:16 +0200)]
x86/NUMA: improve memnode_shift calculation for multi node system
SRAT may describe individual nodes using multiple ranges. When they're
adjacent (with or without a gap in between), only the start of the first
such range actually needs accounting for. Furthermore the very first
range doesn't need considering of its start address at all, as it's fine
to associate all lower addresses (with no memory) with that same node.
For this to work, the array of ranges needs to be sorted by address -
adjust logic accordingly in acpi_numa_memory_affinity_init().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 30 Sep 2022 07:56:27 +0000 (09:56 +0200)]
Arm/vGIC: adjust gicv3_its_deny_access() to fit other gic*_iomem_deny_access(
While an oversight in 9982fe275ba4 ("arm/vgic: drop const attribute
from gic_iomem_deny_access()"), the issue really became apparent only
when iomem_deny_access() was switched to have a non-const first
parameter.
Fixes: c4e5cc2ccc5b ("x86/ept: limit calls to memory_type_changed()") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Tested-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 30 Sep 2022 07:55:34 +0000 (09:55 +0200)]
x86/NUMA: correct off-by-1 in node map size calculation
extract_lsb_from_nodes() accumulates "memtop" from all PDXes one past
the covered ranges. Hence the maximum address which can validly be used
to index the node map is one below this value, and we may currently set
up a node map with an unused (and never initialized) trailing entry. In
boundary cases this may also mean we dynamically allocate a page when
the static (64-entry) map would suffice.
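The arithmetic can be checked with a tiny helper. This is a sketch: node_map_size is a hypothetical name, and "(memtop >> shift) + 1" is assumed here to be the earlier computation being corrected.

```c
#include <assert.h>

/*
 * 'memtop' is one past the highest covered index, so the highest valid
 * index is memtop - 1 and the map needs ((memtop - 1) >> shift) + 1 slots.
 */
static unsigned long node_map_size(unsigned long memtop, unsigned int shift)
{
    assert(memtop > 0);
    return ((memtop - 1) >> shift) + 1;
}
```

For memtop = 64 << 10 and shift = 10 this yields 64 entries, whereas (memtop >> shift) + 1 would yield 65 and so exceed a 64-entry static map.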
While there also correct the comment ahead of the function, for it to
match the actual code: Linux commit 54413927f022 ("x86-64:
x86_64-make-the-numa-hash-function-nodemap-allocation fix fix") removed
the ORing in of the end address before we actually cloned their code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Tamas K Lengyel [Fri, 30 Sep 2022 07:53:49 +0000 (09:53 +0200)]
x86/vpmu: Fix race-condition in vpmu_load
The vPMU code-base attempts to perform an optimization on saving/reloading the
PMU context by keeping track of what vCPU ran on each pCPU. When a pCPU is
getting scheduled, it checks if the previous vCPU isn't the current one. If so,
it attempts a call to vpmu_save_force. Unfortunately if the previous vCPU is
already getting scheduled to run on another pCPU its state will be already
runnable, which results in an ASSERT failure.
Fix this by always performing a pmu context save in vpmu_save when called from
vpmu_switch_from, and do a vpmu_load when called from vpmu_switch_to.
While this presents a minimal overhead in case the same vCPU is getting
rescheduled on the same pCPU, the ASSERT failure is avoided and the code is a
lot easier to reason about.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Thu, 29 Sep 2022 09:51:31 +0000 (10:51 +0100)]
automation: Information about running containers for a different arch
Adding pointer to 'qemu-user-static'.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Michal Orzel [Mon, 19 Sep 2022 18:37:37 +0000 (20:37 +0200)]
xen/arm: domain_build: Always print the static shared memory region
At the moment, the information about allocating static shared memory
region is only printed during the debug build. This information can also
be helpful for the end user (which may not be the same as the person
building the package), so switch to printk(). Also drop XENLOG_INFO to be
consistent with other printk() used to print the domain information.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Thu, 29 Sep 2022 12:47:45 +0000 (14:47 +0200)]
x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests
Ever since its introduction VCPUOP_register_vcpu_time_memory_area
was available only to native domains. Linux, for example, would attempt
to use it irrespective of guest bitness (including in its so called
PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
set only for clocksource=tsc, which in turn needs engaging via command
line option).
Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Jan Beulich [Thu, 29 Sep 2022 12:46:50 +0000 (14:46 +0200)]
x86: re-connect VCPUOP_send_nmi for 32-bit guests
With the "inversion" of VCPUOP handling, processing arch-specific ones
first, the forwarding of this sub-op from the (common) compat handler to
(common) non-compat one no longer had the intended effect. It now
needs forwarding between the arch-specific handlers.
Fixes: 8a96c0ea7999 ("xen: move do_vcpu_op() to arch specific code") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
memory_type_changed() is currently only implemented for Intel EPT, and
results in the invalidation of EMT attributes on all the entries in
the EPT page tables. Such invalidation causes EPT_MISCONFIG vmexits
when the guest tries to access any gfns for the first time, which
results in the recalculation of the EMT for the accessed page. The
vmexit and the recalculations are expensive, and as such should be
avoided when possible.
Remove the call to memory_type_changed() from
XEN_DOMCTL_memory_mapping: there are no modifications of the
iomem_caps ranges anymore that could alter the return of
cache_flush_permitted() from that domctl.
Encapsulate calls to memory_type_changed() resulting from changes to
the domain iomem_caps or ioport_caps ranges in the helpers themselves
(io{ports,mem}_{permit,deny}_access()), and add a note in
epte_get_entry_emt() to remind that changes to the logic there likely
need to be propagated to the IO capabilities helpers.
Note changes to the IO ports or memory ranges are not very common
during guest runtime, but Citrix Hypervisor has a use case for them
related to device passthrough.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
arm/vgic: drop const attribute from gic_iomem_deny_access()
While correct from a code point of view, the usage of the const
attribute for the domain parameter of gic_iomem_deny_access() is at
least partially bogus. Contents of the domain structure (the iomem
rangeset) is modified by the function. Such modifications succeed
because right now the iomem rangeset is allocated separately from
struct domain, and hence is not subject to the constness of struct
domain.
Amend this by dropping the const attribute from the function
parameter.
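Why such a modification compiles at all can be shown in a few lines of standard C. The types below are hypothetical stand-ins, not the real struct domain or rangeset.

```c
#include <assert.h>
#include <stddef.h>

struct rangeset { unsigned int nr_denied; };

struct domain {
    struct rangeset *iomem_caps;  /* allocated separately from the domain */
};

/*
 * 'const struct domain *' makes the pointer member itself read-only,
 * but not the object it points to, so this write is accepted.
 */
static void deny_access(const struct domain *d)
{
    assert(d != NULL && d->iomem_caps != NULL);
    d->iomem_caps->nr_denied++;
}
```

Were the rangeset embedded in the structure instead of referenced through a pointer, the same write would be rejected by the compiler, which is why the const qualifier is misleading here.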
This is required by further changes that will convert
iomem_{permit,deny}_access into a function.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Thu, 29 Sep 2022 12:39:52 +0000 (14:39 +0200)]
x86/NUMA: correct memnode_shift calculation for single node system
SRAT may describe even a single node system (including such with
multiple nodes, but only one having any memory) using multiple ranges.
Hence simply counting the number of ranges (note that function
parameters are mis-named) is not an indication of the number of nodes in
use. Since we only care about knowing whether we're on a single node
system, accounting for this is easy: Increment the local variable only
when adjacent ranges are for different nodes. That way the count may
still end up larger than the number of nodes in use, but it won't be
larger than 1 when only a single node has any memory.
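The counting rule can be sketched as below, assuming the ranges are given as an array of node IDs in range order; count_node_changes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count a range only when its node differs from the preceding range's. */
static unsigned int count_node_changes(const uint8_t *nodeids, size_t n)
{
    unsigned int count = 0;
    size_t i;

    assert(nodeids != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (i == 0 || nodeids[i] != nodeids[i - 1])
            count++;

    return count;
}
```

Three ranges all on node 0 give a count of 1, while the sequence 0, 1, 0 gives 3 even though only two nodes are in use, matching the caveat above that the count may exceed the number of nodes.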
To compensate populate_memnodemap() now needs to be prepared to find
the correct node ID already in place for a range. (This could of course
also happen when there's more than one node with memory, while at least
one node has multiple adjacent ranges, provided extract_lsb_from_nodes()
would also know to recognize this case.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
I am departing DornerWorks. I will still be working with Xen in my next
role, and I still have an interest in co-maintaining the ARINC 653
scheduler, so change to my personal email address.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com> Acked-by: Nathan Studer <nathan.studer@dornerworks.com>
tools: remove xenstore entries on vchan server closure
vchan server creates XenStore entries to advertise its event channel and
ring, but those are not removed after the server quits.
Add an additional cleanup step, so those are removed and clients do not try
to connect to a non-existing server.
Andrew Cooper [Mon, 26 Sep 2022 13:02:13 +0000 (14:02 +0100)]
CI: Force CONFIG_XEN_IBT in the buster-gcc-ibt test
buster-gcc-ibt is a dedicated test to run a not-yet-upstreamed compiler patch
which is relevant to CONFIG_XEN_IBT in 4.17 and later.
Force it on, rather than having 50% of the jobs not testing what they're
supposed to be testing.
Fixes: 5d59421815d5 ("x86: Use control flow typechecking where possible") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
[stefano: minor code style improvement] Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Andrew Cooper [Tue, 27 Sep 2022 15:47:08 +0000 (16:47 +0100)]
Build: Drop -no-pie from EMBEDDED_EXTRA_CFLAGS
This breaks all Clang builds, as demonstrated by Gitlab CI.
Contrary to the description in ecd6b9759919, -no-pie is not even an option
passed to the linker. GCC's actual behaviour is to inhibit the passing of
-pie to the linker, as well as selecting different crt0 artefacts to be linked.
EMBEDDED_EXTRA_CFLAGS is not used for $(CC)-doing-linking, and not liable to
gain such a usecase.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Tested-by: Stefano Stabellini <sstabellini@kernel.org> Fixes: ecd6b9759919 ("Config.mk: correct PIE-related option(s) in EMBEDDED_EXTRA_CFLAGS")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Fixes: 022e40edd4dc ("drivers/char: allow using both dbgp=xhci and dbgp=ehci") Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Michal Orzel [Mon, 26 Sep 2022 11:04:14 +0000 (13:04 +0200)]
automation: Use custom build jobs when extra config options are needed
Currently, all the arm64 defconfig build jobs, regardless of the
container used, end up building Xen with the extra config options
specified in the main build script (e.g. CONFIG_EXPERT,
CONFIG_STATIC_MEMORY). Because these options are only needed for
specific test jobs, the current behavior of the CI is incorrect
as we add the extra options to all the defconfig builds. This means
that on arm64 there is not a single job performing proper defconfig build.
To fix this issue, add custom build jobs each time there is a need for
building Xen with additional config options. Introduce EXTRA_XEN_CONFIG
variable to be used by these jobs to store the required options. This
variable will be then read by the main build script to modify the .config
file. This will also help users to understand what is needed to run a specific
test.
Anthony PERARD [Mon, 26 Sep 2022 09:16:04 +0000 (11:16 +0200)]
build: fix x86 out-of-tree build without EFI
We can't have a source file with the same name that exists in both the
common code and in the arch specific code for efi/. This can lead to
confusion in make and it can pick up the wrong source file. This issue
led to a failure to build a pv-shim for x86 out-of-tree, as this is
one example of an x86 build using the efi/stub.c.
The issue is that in out-of-tree, make might find x86/efi/stub.c via
VPATH, but as the target needs to be rebuilt due to FORCE, make
actually avoids changing the source tree and rebuilds the target with
VPATH ignored, so $@ leads to the build tree where "stub.c" doesn't
exist yet so a link is made to "common/stub.c".
Rework the new common/stub.c file to have a different name than the
already existing one, by renaming the existing one. We can hide the
compat aliases that x86 uses behind CONFIG_COMPAT so an Arm build will
not have them.
Also revert the change to the rule that creates symbolic links; it's
better to just recreate the link in cases where an existing file exists
or the link goes to the wrong file.
Avoid using $(EFIOBJ-y) as an alias for $(clean-files), add
"stub.c" directly to $(clean-files).
Also update .gitignore as this was also missing from the original
patch.
Fixes: 7f96859b0d00 ("xen: reuse x86 EFI stub functions for Arm") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Daniel P. Smith [Mon, 26 Sep 2022 09:14:19 +0000 (11:14 +0200)]
xsm/flask: adjust print messages to use %pd
Print messages from flask use an inconsistent format when printing the domain
id. When referencing system domains, the domain id is printed, which is not
immediately identifiable. The %pd conversion specifier provides a consistent
and clear way to format the domain id. In addition, this will assist in
aligning FLASK with current hypervisor code practices.
While addressing the domain id formatting, two related issues were addressed.
The first was that avc_printk() was not applying any conversion specifier
validation. To address this, the printf annotation was added to avc_printk() to
help ensure the correct types are passed to each conversion specifier. The second
was a concern about whether source and target domains were being appropriately
reported for an AVC. This was addressed by simplifying the conditional logic.
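As an illustration of what a printf annotation buys, here is a minimal
sketch. It is not the FLASK code: log_msg() is a hypothetical stand-in for
avc_printk(), and it uses the plain %u specifier because %pd is a Xen
extension that standard C libraries do not understand.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for avc_printk(); not the Xen function.
 * The format attribute tells GCC and Clang that argument 3 is a
 * printf-style format string and that the variadic arguments start at
 * position 4, so mismatched specifiers are reported under -Wformat.
 */
static int log_msg(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int log_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(buf, len, fmt, args);
    va_end(args);

    return n;
}
```

With the annotation in place, a call such as log_msg(buf, len, "%s", 5)
should draw a compiler warning instead of misbehaving at run time.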
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
drivers/char: add console=ehci as an alias for console=dbgp
Make it consistent with console=xhci.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
drivers/char: allow driving the rest of XHCI by a domain while Xen uses DbC
That's possible, because the capability was designed specifically to
allow a separate driver to handle it, in parallel to an unmodified xhci
driver (separate set of registers, pretending the port is "disconnected"
for the main xhci driver etc). It works with Linux dom0, although it
requires an awful hack - re-enabling bus mastering behind dom0's back.
The Linux driver does a similar thing - see
drivers/usb/early/xhci-dbc.c:xdbc_handle_events().
When controller sharing is enabled in kconfig (option marked as
experimental), dom0 is allowed to use the controller even if Xen uses it
for the debug console. Additionally, the option `dbgp=xhci,share=` is
available to either prevent even dom0 from using it (`no` value), or allow
any domain to use it (`any` value).
In any case, to avoid Linux messing with the DbC, mark this MMIO area as
read-only. This might cause issues for Linux's driver (if it tries to
write something on the same page - like another xcap), but makes Xen's
use safe. In practice, as of Linux 5.18, it seems to work without
issues.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
drivers/char: fix handling cable re-plug in XHCI console driver
When the cable is unplugged, dbc_ensure_running() correctly detects this
situation (DBC_CTRL_DCR flag is clear), and prevents sending data
immediately to the device. The data only gets queued in work ring buffers.
When the cable is plugged in again, a subsequent dbc_flush() will send the
buffered data.
But there is a corner case, where no subsequent data was buffered in the
work buffer, but a TRB was still pending. Ring the doorbell to let the
controller re-send it. For console output it is a rare corner case (a TRB
is pending for a very short time), but for console input it is a very
normal case (there is always one pending TRB for input).
Extract doorbell ringing into a separate function to avoid duplication.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Add another work ring buffer for received data, and point IN TRB at it.
Ensure there is always at least one pending IN TRB, so the controller
has a way to send incoming data to the driver.
Note that both "success" and "short packet" completion codes are okay -
in fact it will be "short packet" most of the time, as the TRB length
specifies the maximum size, not the required size.
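A minimal sketch of that acceptance check follows. The helper names are
hypothetical and not taken from the driver; the two numeric values are the
Success and Short Packet completion codes from the xHCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Completion code values as defined by the xHCI specification. */
#define TRB_CC_SUCCESS       1
#define TRB_CC_SHORT_PACKET 13

/*
 * Hypothetical helper: decide whether a completed IN transfer carries
 * usable data. A short packet only means the device sent fewer bytes
 * than the TRB allowed, which is the expected case for console input.
 */
static bool in_transfer_ok(uint8_t completion_code)
{
    return completion_code == TRB_CC_SUCCESS ||
           completion_code == TRB_CC_SHORT_PACKET;
}

/*
 * Hypothetical helper: number of bytes actually received, given the
 * length requested in the TRB and the residual (untransferred) byte
 * count reported in the transfer event.
 */
static uint32_t in_transfer_received(uint32_t trb_len, uint32_t residual)
{
    return residual > trb_len ? 0 : trb_len - residual;
}
```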
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
drivers/char: mark DMA buffers as reserved for the XHCI
The important part is to include those buffers in the IOMMU page table
relevant for the USB controller. Otherwise, DbC will stop working as
soon as the IOMMU is enabled, regardless of which domain the device is
assigned to (be it xen or dom0).
If the device is passed through to dom0 or another domain (see later
patches), that domain will effectively have access to those buffers too.
It does give such a domain yet another way to DoS the system (as is the
case when having a PCI device assigned already), but also possibly steal
the console ring content. Thus, such a domain should be a trusted one.
In any case, prevent anything else being placed on those pages by adding
artificial padding.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Add an API similar to the rmrr= and ivmd= arguments, but in common code.
This will allow drivers to register reserved memory regardless of the IOMMU
vendor.
The direct reason for this API is the xhci-dbc console driver (aka xue),
which needs to use DMA. But a future change may unify command line
arguments for user-supplied reserved memory, and it may be useful for
other drivers in the future too.
This commit just introduces an API; subsequent patches will plug it in
appropriate places. The reserved memory ranges need to be saved
locally, because at the point when they are collected, Xen doesn't know
yet which IOMMU driver will be used.
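The "collect early, hand over later" pattern described here could be
sketched as below. All names, the fixed-size table and the callback
signature are assumptions made for illustration; they are not the API this
commit adds.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTRA_RESERVED 8  /* arbitrary limit for this sketch */

struct reserved_range {
    uint64_t start_pfn;  /* first page frame of the range */
    uint64_t nr_pages;   /* number of pages in the range */
    const char *name;    /* label for diagnostics */
};

static struct reserved_range extra_reserved[MAX_EXTRA_RESERVED];
static size_t nr_extra_reserved;

/* Early boot: a driver records a range before any IOMMU driver is chosen. */
static int add_extra_reserved(uint64_t start_pfn, uint64_t nr_pages,
                              const char *name)
{
    if ( nr_extra_reserved >= MAX_EXTRA_RESERVED )
        return -1;

    extra_reserved[nr_extra_reserved].start_pfn = start_pfn;
    extra_reserved[nr_extra_reserved].nr_pages = nr_pages;
    extra_reserved[nr_extra_reserved].name = name;
    nr_extra_reserved++;

    return 0;
}

typedef int (*reserved_cb)(const struct reserved_range *r, void *ctxt);

/* Later: whichever IOMMU driver was selected walks the saved ranges. */
static int for_each_extra_reserved(reserved_cb cb, void *ctxt)
{
    for ( size_t i = 0; i < nr_extra_reserved; i++ )
    {
        int rc = cb(&extra_reserved[i], ctxt);

        if ( rc )
            return rc;
    }

    return 0;
}

/* Example callback: add up the number of reserved pages. */
static int sum_pages(const struct reserved_range *r, void *ctxt)
{
    *(uint64_t *)ctxt += r->nr_pages;
    return 0;
}
```

Keeping the table vendor-neutral is what lets the registration happen
before the IOMMU driver is known.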
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
drivers/char: allow using both dbgp=xhci and dbgp=ehci
This allows configuring EHCI and XHCI consoles separately,
simultaneously, such that e.g. one can be used for the console and the
other by the debugger.
This changes string_param() to custom_param() in both ehci and xhci
drivers. Both drivers parse only values applicable to them.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 26 Sep 2022 09:07:48 +0000 (11:07 +0200)]
build: correct cppcheck-misra make rule
Having cppcheck-misra.json depend on cppcheck-misra.txt does not
properly address the multiple targets problem. If cppcheck-misra.json
is deleted from the build tree but cppcheck-misra.txt is still there,
nothing will re-generate cppcheck-misra.json.
With GNU make 4.3 or newer we could use the &: grouped target separator,
but since we support older make as well we need to use some other
mechanism. Convert the rule to a pattern one (with "cppcheck" kind of
arbitrarily chosen as the stem), thus making known to make that both
files are created by a single command invocation. Since, as a result,
the JSON file is now "intermediate" from make's perspective, prevent it
being deleted again by making it a prereq of .PRECIOUS.
Fixes: 57caa5375321 ("xen: Add MISRA support to cppcheck make rule") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Daniel P. Smith [Wed, 14 Sep 2022 13:22:01 +0000 (15:22 +0200)]
xsm/flask: correcting initial sid assignment on context allocation
The current flow for initial SID assignment is that the function
flask_domain_alloc_security() allocates the security context and assigns an
initial SID based on the limited state information it can access. Specifically,
the initial SID is determined by the domid of the domain, where it would assign
the label for one of the domains the hypervisor constructed, with the exception
of the initial domain (dom0). In the case of the initial domain and all other
domains it would use the unlabeled_t SID.
When it came to the SID for the initial domain, its assignment was managed by
flask_domain_create() where it would be switched from unlabeled_t to dom0_t.
This logic worked under the assumption that the first call to
flask_domain_create() would be the hypervisor constructing the initial domain.
After that, it would be the toolstack constructing the domain, for which it is
expected to provide an appropriate SID or else unlabeled_t would be used.
The issue is that the assumptions upon which the current flow is built were
weak and are invalid for PV shim and dom0less. Under the current flow even
though the initial domain for PV shim is not set as privileged, flask would
label the domain as dom0_t. For dom0less, the situation is two-fold. First is
that every domain after the first domain creation will fail as they will be
labeled as unlabeled_t. The second is that if the dom0less configuration does
not include a "dom0", the first domain created would be labeled as dom0_t.
This commit only seeks to address the situation for PV shim, by including a
check for xenboot_t context in flask_domain_alloc_security() to determine if
the domain is being constructed at system boot. Then a check for is_privileged
and pv_shim is added to differentiate between a "dom0" initial domain and a PV
shim initial domain.
The logic for flask_domain_create() was altered to allow the incoming SID to
override the initial label. This allows a domain builder, whether it is a
toolstack, dom0less, or hyperlaunch, to provide the correct label for the
domain at construction.
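The resulting decision flow might look roughly like the sketch below. The
enum and function names are hypothetical. The message does not say which
label a PV shim guest receives, so labelling it domu_t here is an
assumption, as is leaving every other boot-time domain unlabeled.

```c
#include <stdbool.h>

/* Hypothetical labels standing in for the FLASK initial SIDs named above. */
enum label { LABEL_UNLABELED, LABEL_XENBOOT, LABEL_DOM0, LABEL_DOMU };

/*
 * Pick the label at context allocation time.
 *   creator    - label of the context that is creating the domain
 *   privileged - whether the new domain is marked privileged
 *   pv_shim    - whether Xen is running as a PV shim
 * Assumption: a PV shim guest gets domu_t, and any other domain built
 * at boot stays unlabeled until a builder supplies a label.
 */
static enum label initial_label(enum label creator, bool privileged,
                                bool pv_shim)
{
    if ( creator != LABEL_XENBOOT )
        return LABEL_UNLABELED;  /* not being constructed at system boot */

    if ( privileged )
        return LABEL_DOM0;

    if ( pv_shim )
        return LABEL_DOMU;

    return LABEL_UNLABELED;
}

/* At domain creation, a label supplied by the builder overrides the initial one. */
static enum label final_label(enum label initial, bool has_incoming,
                              enum label incoming)
{
    return has_incoming ? incoming : initial;
}
```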
The base policy was adjusted to allow the idle domain under the xenboot_t
context the ability to construct domains of both types, dom0_t and domu_t.
This will enable a hypervisor resident domain builder to construct domains
beyond the initial domain.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Henry Wang [Fri, 9 Sep 2022 05:23:57 +0000 (05:23 +0000)]
xen/arm, device-tree: Make static-mem use #{address,size}-cells
In order to keep consistency in the device tree binding, there is
no need for the static memory allocation feature to define a specific
set of address and size cells for the "xen,static-mem" property.
Therefore, this commit reuses the regular #{address,size}-cells
for parsing the device tree "xen,static-mem" property. Update
the documentation accordingly.
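For reference, decoding one (address, size) pair from such a property with
a given cell count can be sketched as follows. The helper names are
hypothetical and this is not Xen's parser; it only relies on the device
tree convention that each cell is a 32-bit big-endian value.

```c
#include <stdint.h>

struct mem_bank {
    uint64_t start;
    uint64_t size;
};

/*
 * Combine `cells` consecutive 32-bit big-endian cells into one value and
 * advance the cursor. With more than two cells the upper bits are lost,
 * which this sketch simply accepts.
 */
static uint64_t read_cells(const uint8_t **p, unsigned int cells)
{
    uint64_t val = 0;

    while ( cells-- )
    {
        const uint8_t *b = *p;
        uint32_t cell = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                        ((uint32_t)b[2] << 8) | (uint32_t)b[3];

        val = (val << 32) | cell;
        *p += 4;
    }

    return val;
}

/* Decode one bank using the parent's #address-cells and #size-cells. */
static struct mem_bank read_bank(const uint8_t *prop,
                                 unsigned int addr_cells,
                                 unsigned int size_cells)
{
    struct mem_bank bank;

    bank.start = read_cells(&prop, addr_cells);
    bank.size = read_cells(&prop, size_cells);

    return bank;
}
```

With two cells each, a property value of <0x0 0x30000000 0x0 0x20000000>
decodes to a bank starting at 0x30000000 with a size of 0x20000000.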
Also, take the chance to remove the unnecessary "#address-cells"
and "#size-cells" in the domU1 node of the device tree to only
emphasize the related part that the example is showing.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
xen/pci: replace call to is_memory_hole to pci_check_bar
is_memory_hole was implemented for x86 and not for ARM when introduced.
Replace the is_memory_hole call with pci_check_bar, as the function should
check if the device BAR is in a defined memory range. Also, add an
implementation for ARM, which is required for PCI passthrough.
On x86, pci_check_bar will call is_memory_hole which will check if the BAR
is not overlapping with any memory region defined in the memory map.
On ARM, pci_check_bar will go through the host bridge ranges and check
if the BAR is within one of the defined ranges.
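The ARM-side check amounts to a containment test, which can be sketched as
below. The struct and function names are hypothetical, and a flat array of
inclusive ranges is a simplification of however the host bridge ranges are
really stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one host bridge window (inclusive bounds). */
struct bridge_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return true if the BAR [bar_start, bar_end] lies entirely inside one
 * of the bridge windows. A BAR that straddles two windows is rejected.
 */
static bool bar_in_ranges(const struct bridge_range *ranges, size_t nr,
                          uint64_t bar_start, uint64_t bar_end)
{
    if ( bar_end < bar_start )
        return false;

    for ( size_t i = 0; i < nr; i++ )
    {
        if ( bar_start >= ranges[i].start && bar_end <= ranges[i].end )
            return true;
    }

    return false;
}
```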
xen/arm: create shared memory nodes in guest device tree
We expose the shared memory to the domU using the "xen,shared-memory-v1"
reserved-memory binding. See
Documentation/devicetree/bindings/reserved-memory/xen,shared-memory.txt
in Linux for the corresponding device tree binding.
To save the cost of re-parsing the shared memory device tree configuration
when creating shared memory nodes in the guest device tree, this commit adds
a new field "shm_mem" to store shm-info per domain.
For each shared memory region, a range is exposed under
the /reserved-memory node as a child node. Each range sub-node is
named xen-shmem@<address> and has the following properties:
- compatible:
    compatible = "xen,shared-memory-v1"
- reg:
    the base guest physical address and size of the shared memory region
- xen,id:
    a string that identifies the shared memory region.
- xen,offset: (borrower VMs only)
    64-bit integer offset within the owner virtual machine's shared
    memory region used for the mapping in the borrower VM.
Currently, we provide "xen,offset=<0x0>" as a temporary placeholder.