Andrew Cooper [Tue, 29 Oct 2024 15:27:54 +0000 (16:27 +0100)]
x86/pv: Rename pv.iobmp_limit to iobmp_nr and clarify behaviour
Ever since its introduction in commit 013351bd7ab3 ("Define new event-channel
and physdev hypercalls") in 2006, the public interface was named nr_ports
while the internal field was called iobmp_limit.
Rename the internal field to iobmp_nr to match the public interface, and
clarify that, when nonzero, Xen will read 2 bytes.
There isn't a perfect parallel with a real TSS, but iobmp_nr being 0 is the
paravirt "no IOPB" case, and it is important that no read occurs in this case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 633ee8b2df963f7e5cb8de1219c1a48bfb4447f6
master date: 2024-10-01 14:58:18 +0100
Andrew Cooper [Tue, 29 Oct 2024 15:27:41 +0000 (16:27 +0100)]
x86/pv: Handle #PF correctly when reading the IO permission bitmap
The switch statement in guest_io_okay() is a very expensive way of
pre-initialising x with ~0, and performing a partial read into it.
However, the logic isn't correct either.
In a real TSS, the CPU always reads two bytes (like here), and any TSS limit
violation turns silently into no-access. But, in-limit accesses trigger #PF
as usual. AMD document this property explicitly, and while Intel don't (so
far as I can tell), they do behave consistently with AMD.
Switch from __copy_from_guest_offset() to __copy_from_guest_pv(), like
everything else in this file. This removes code generation setting up
copy_from_user_hvm() (in the likely path even), and safety LFENCEs from
evaluate_nospec().
Change the logic to raise #PF if __copy_from_guest_pv() fails, rather than
disallowing the IO port access. This brings the behaviour better in line with
normal x86.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 8a6c495d725408d333c1b47bb8af44615a5bfb18
master date: 2024-10-01 14:58:18 +0100
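The behaviour described above can be sketched as follows. This is a minimal illustration, not Xen's guest_io_okay(): the result enum, struct guest_bitmap, read_two_bytes() and the exact limit comparison are assumptions made for the example. It shows the three outcomes: no read at all when iobmp_nr is 0, a #PF when the 2-byte read itself fails, and #GP when a relevant bit is set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes, loosely following the X86EMUL_* idea. */
enum io_check { IO_OKAY, IO_RAISE_GP, IO_RAISE_PF };

/* Hypothetical model of guest memory holding the bitmap. */
struct guest_bitmap {
    const uint8_t *bytes;   /* backing storage for the bitmap */
    unsigned int mapped;    /* number of readable bytes */
};

/* Stand-in for a guest read: fails if the 2 bytes are not all mapped. */
static int read_two_bytes(const struct guest_bitmap *bm, unsigned int off,
                          uint16_t *out)
{
    uint8_t tmp[2];

    if ( off + 2 > bm->mapped )
        return -1;          /* models a faulting guest access */

    memcpy(tmp, bm->bytes + off, 2);
    *out = (uint16_t)(tmp[0] | (tmp[1] << 8));
    return 0;
}

enum io_check io_port_check(const struct guest_bitmap *bm,
                            unsigned int iobmp_nr,
                            unsigned int port, unsigned int bytes)
{
    uint16_t x;
    unsigned int mask;

    assert(bytes == 1 || bytes == 2 || bytes == 4);

    /* iobmp_nr == 0 is the "no IOPB" case: no read may occur. */
    if ( iobmp_nr == 0 || port + bytes > iobmp_nr )
        return IO_RAISE_GP;

    /* In-limit accesses read 2 bytes; a failed read is #PF, not a denial. */
    if ( read_two_bytes(bm, port >> 3, &x) )
        return IO_RAISE_PF;

    /* One bit per port; any set bit in the accessed range denies access. */
    mask = ((1u << bytes) - 1) << (port & 7);
    return (x & mask) ? IO_RAISE_GP : IO_OKAY;
}
```

Reading two bytes regardless of the access width is what lets a 4-byte access starting at bit 7 of one byte be checked with a single read.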
Andrew Cooper [Tue, 29 Oct 2024 15:27:29 +0000 (16:27 +0100)]
x86/pv: Rework guest_io_okay() to return X86EMUL_*
In order to fix a bug with guest_io_okay() (subsequent patch), rework
guest_io_okay() to take in an emulation context, and return X86EMUL_* rather
than a boolean.
For the failing case, take the opportunity to inject #GP explicitly, rather
than returning X86EMUL_UNHANDLEABLE. There is a logical difference between
"we know what this is, and it's #GP", vs "we don't know what this is".
There is no change in practice as emulation is the final step on general #GP
resolution, but returning X86EMUL_UNHANDLEABLE would be a latent bug if a
subsequent action were to appear.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 7429e1cc071b0e20ea9581da4893fb9b2f6d21d4
master date: 2024-10-01 14:58:18 +0100
x86/traps: Re-enable interrupts after reading cr2 in the #PF handler
Hitting a page fault clobbers %cr2, so if a page fault is handled while
handling a previous page fault then %cr2 will hold the address of the
latter fault rather than the former. In particular, if a debug key
handler happens to trigger during #PF and before %cr2 is read, and that
handler itself encounters a #PF, then %cr2 will be corrupt for the outer #PF
handler.
This patch makes the page fault path delay re-enabling IRQs until %cr2
has been read in order to ensure it stays consistent.
A similar argument holds in additional cases, but they happen to be safe:
* %dr6 inside #DB: Safe because IST exceptions don't re-enable IRQs.
* MSR_XFD_ERR inside #NM: Safe because AMX isn't used in #NM handler.
While in the area, remove redundant q suffix to a movq in entry.S and
the space after the comma.
Fixes: a4cd20a19073 ("[XEN] 'd' key dumps both host and guest state.")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: b06e76db7c35974f1b127762683e7852ca0c8e76
master date: 2024-10-01 09:45:49 +0200
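The ordering problem can be modelled in plain C. This is a userspace simulation only: sim_cr2, enable_irqs() and the two handlers are invented names, and no real interrupt state is involved. It shows why the fault address has to be latched before interrupts are re-enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated state; none of these names are Xen's. */
static unsigned long sim_cr2;        /* stands in for %cr2 */
static bool nested_fault_pending;    /* an IRQ-time handler that will fault */

/* Models hardware delivering a #PF: the address lands in %cr2. */
static void raise_fault(unsigned long addr)
{
    sim_cr2 = addr;
}

/* Models re-enabling IRQs: a pending handler may run and take its own #PF. */
static void enable_irqs(void)
{
    if ( nested_fault_pending )
    {
        nested_fault_pending = false;
        raise_fault(0xdead000UL);
    }
}

/* Problematic ordering: IRQs on first, %cr2 read afterwards. */
unsigned long handler_irqs_first(void)
{
    enable_irqs();
    return sim_cr2;
}

/* Ordering described above: latch %cr2, then re-enable IRQs. */
unsigned long handler_cr2_first(void)
{
    unsigned long addr = sim_cr2;

    enable_irqs();
    return addr;
}

/* Run one scenario: outer fault at 'addr' with a nested fault queued. */
unsigned long run_scenario(unsigned long (*handler)(void), unsigned long addr)
{
    assert(handler);
    nested_fault_pending = true;
    raise_fault(addr);
    return handler();
}
```

With the nested fault queued, only handler_cr2_first() reports the outer fault's address.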
Taking a fault on a non-byte-granular insn means that the "number of
bytes not handled" return value would need extra care in calculating, if
we want callers to be able to derive e.g. exception context (to be
injected to the guest) - CR2 for #PF in particular - from the value. To
simplify things rather than complicating them, reduce inline assembly to
just byte-granular string insns. On recent CPUs that's also supposed to
be more efficient anyway.
For singular element accessors, however, alignment checks are added,
hence slightly complicating the code. Misaligned (user) buffer accesses
will now be forwarded to copy_{from,to}_guest_ll().
Naturally copy_{from,to}_unsafe_ll() accessors end up being adjusted the
same way, as they're produced by mere re-processing of the same code.
Otoh copy_{from,to}_unsafe() aren't similarly adjusted, but have their
comments made to match reality; down the road we may want to change their
return types, e.g. to bool.
Fixes: 76974398a63c ("Added user-memory accessing functionality for x86_64")
Fixes: 7b8c36701d26 ("Introduce clear_user and clear_guest")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 67a8e5721e1ea9c28526883036bf08fb2e8a8c9c
master date: 2024-10-01 09:44:55 +0200
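To illustrate why byte granularity keeps the "bytes not handled" value easy to use, here is a small sketch. It is plain C rather than inline assembly, and fault_at is a hypothetical way of simulating an access that faults; none of this is Xen's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Byte-at-a-time copy which stops at a simulated fault boundary.
 * 'fault_at' is a hypothetical address from which reads fail (NULL: none).
 * Returns the number of bytes NOT copied.
 */
size_t copy_bytes(uint8_t *dst, const uint8_t *src, size_t len,
                  const uint8_t *fault_at)
{
    assert(dst && src);

    while ( len && src != fault_at )
    {
        *dst++ = *src++;
        len--;
    }

    return len;
}

/* With byte granularity the faulting address follows directly. */
const uint8_t *fault_address(const uint8_t *src, size_t len, size_t not_copied)
{
    return src + (len - not_copied);
}
```

Had the copy moved 8 bytes per step, a fault in the middle of a step would leave the return value pointing at the start of that step rather than at the faulting byte.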
xen/ucode: Fix buffer under-run when parsing AMD containers
The AMD container format has no formal spec. It is, at best, precision
guesswork based on AMD's prior contributions to open source projects. The
Equivalence Table has both an explicit length, and an expectation of having a
NULL entry at the end.
Xen was sanity checking the NULL entry, but without confirming that an entry
was present, resulting in a read off the front of the buffer. With some
manual debugging/annotations this manifests as:
(XEN) *** Buf ffff83204c00b19c, eq ffff83204c00b194
(XEN) *** eq: 0c 00 00 00 44 4d 41 00 00 00 00 00 00 00 00 00 aa aa aa aa
^-Actual buffer-------------------^
(XEN) *** installed_cpu: 000c
(XEN) microcode: Bad equivalent cpu table
(XEN) Parsing microcode blob error -22
When loaded by hypercall, the 4 bytes interpreted as installed_cpu happen to
be the containing struct ucode_buf's len field, and luckily will be nonzero.
When loaded at boot, it's possible for the access to #PF if the module happens
to have been placed on a 2M boundary by the bootloader. Under Linux, it will
commonly be the end of the CPIO header.
Drop the probe of the NULL entry; nothing else cares. A container without one
is well formed, insofar that we can still parse it correctly. With this
dropped, the same container results in:
(XEN) microcode: couldn't find any matching ucode in the provided blob!
Fixes: 4de936a38aa9 ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()")
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a8bf14f6f331d4f428010b4277b67c33f561ed19
master date: 2024-09-13 15:23:30 +0100
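A sketch of a scan that never touches memory outside the table, whether or not a NULL entry is present. The entry layout follows the commonly used description of the format and should be treated as an assumption, as should the function name; this is not Xen's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 16-byte entry layout; there is no formal spec to check against. */
struct equiv_cpu_entry {
    uint32_t installed_cpu;
    uint32_t fixed_errata_mask;
    uint32_t fixed_errata_compare;
    uint16_t equiv_cpu;
    uint16_t reserved;
};

/*
 * Look up 'sig' in a table of 'len' bytes. The explicit length alone
 * bounds the walk, so an empty table results in no access at all.
 * Returns the equivalent CPU id, or 0 if nothing matches.
 */
uint16_t scan_equiv_table(const void *buf, size_t len, uint32_t sig)
{
    size_t i, nr = len / sizeof(struct equiv_cpu_entry);

    assert(buf || !len);

    for ( i = 0; i < nr; i++ )
    {
        struct equiv_cpu_entry e;

        memcpy(&e, (const uint8_t *)buf + i * sizeof(e), sizeof(e));

        if ( e.installed_cpu == 0 )     /* NULL entry: stop early */
            break;
        if ( e.installed_cpu == sig )
            return e.equiv_cpu;
    }

    return 0;
}
```

The failure mode described above corresponds to indexing entry nr - 1 when nr is 0, which lands in front of the buffer.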
blkif: reconcile protocol specification with in-use implementations
Current blkif implementations (both backends and frontends) have all slight
differences about how they handle the 'sector-size' xenstore node, and how
other fields are derived from this value or hardcoded to be expressed in units
of 512 bytes.
To give some context, this is an excerpt of how different implementations use
the value in 'sector-size' as the base unit for other fields rather than
just to set the logical sector size of the block device:
An attempt was made by 67e1c050e36b in order to change the base units of the
request fields and the xenstore 'sectors' node. That however only led to more
confusion, as the specification now clearly diverged from the reference
implementation in Linux. Such change was only implemented for QEMU Qdisk
and Windows PV blkfront.
Partially revert to the state before 67e1c050e36b while adjusting the
documentation for 'sectors' to match what it used to be previous to 2fa701e5346d:
* Declare 'feature-large-sector-size' deprecated. Frontends should not expose
the node, backends should not make decisions based on its presence.
* Clarify that 'sectors' xenstore node and the requests fields are always in
512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
All base units for the fields used in the protocol are 512-byte based, the
xenbus 'sector-size' field is only used to signal the logical block size. When
'sector-size' is greater than 512, blkfront implementations must make sure that
the offsets and sizes (despite being expressed in 512-byte units) are aligned
to the logical block size specified in 'sector-size', otherwise the backend
will fail to process the requests.
This will require changes to some of the frontends and backends in order to
properly support 'sector-size' nodes greater than 512.
Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
master commit: 221f2748e8dabe8361b8cdfcffbeab9102c4c899
master date: 2024-09-12 14:04:56 +0200
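The unit rules above can be sketched as below. The helper names are made up for illustration; only the 512-byte base unit and the alignment requirement come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLKIF_UNIT 512u   /* base unit for 'sectors' and the request fields */

/* Device size in bytes from 'sectors', which counts 512-byte units. */
uint64_t device_bytes(uint64_t sectors)
{
    return sectors * BLKIF_UNIT;
}

/*
 * Check a request expressed in 512-byte units against the logical block
 * size advertised in 'sector-size'.
 */
bool request_aligned(uint64_t sector_number, uint32_t nr_units,
                     uint32_t sector_size)
{
    uint32_t units_per_block;

    assert(sector_size >= BLKIF_UNIT && sector_size % BLKIF_UNIT == 0);
    units_per_block = sector_size / BLKIF_UNIT;

    return sector_number % units_per_block == 0 &&
           nr_units % units_per_block == 0;
}
```

For a 'sector-size' of 4096, offsets and lengths therefore have to be multiples of 8 units, even though they are still counted in 512-byte units.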
xen/x86/pvh: handle ACPI RSDT table in PVH Dom0 build
Xen always generates an XSDT table even if the firmware only provided an
RSDT table. Copy the RSDT header from the firmware table, adjusting the
signature, for the XSDT table when not provided by the firmware.
This is necessary to run Xen on QEMU.
Fixes: 1d74282c455f ('x86: setup PVHv2 Dom0 ACPI tables')
Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 6e7f7a0c16c4d406bda6d4a900252ff63a7c5fad
master date: 2024-09-12 09:18:25 +0200
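A rough sketch of the header handling described above. The struct is the standard 36-byte ACPI table header; make_xsdt_header() is a hypothetical helper, and recomputing the length and checksum for the new table is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard ACPI table header (36 bytes, no padding). */
struct acpi_header {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     asl_compiler_id[4];
    uint32_t asl_compiler_revision;
};

/*
 * Build the XSDT header from whichever root table the firmware provided.
 * The caller is expected to fix up length and checksum afterwards.
 */
void make_xsdt_header(const struct acpi_header *fw, struct acpi_header *xsdt)
{
    assert(!memcmp(fw->signature, "XSDT", 4) ||
           !memcmp(fw->signature, "RSDT", 4));

    *xsdt = *fw;                            /* keep OEM fields etc. */
    memcpy(xsdt->signature, "XSDT", 4);     /* adjust the signature */
}
```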
Jan Beulich [Tue, 24 Sep 2024 12:43:02 +0000 (14:43 +0200)]
x86/HVM: properly reject "indirect" VRAM writes
While ->count will only be different from 1 for "indirect" (data in
guest memory) accesses, it being 1 does not exclude the request being an
"indirect" one. Check both to be on the safe side, and bring the ->count
part also in line with what ioreq_send_buffered() actually refuses to
handle.
Fixes: 3bbaaec09b1b ("x86/hvm: unify stdvga mmio intercept with standard mmio intercept")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: eb7cd0593d88c4b967a24bca8bd30591966676cd
master date: 2024-09-12 09:13:04 +0200
Jan Beulich [Tue, 24 Sep 2024 12:42:39 +0000 (14:42 +0200)]
x86emul/test: fix build with gas 2.43
Drop explicit {evex} pseudo-prefixes. New gas (validly) complains when
they're used on things other than instructions. Our use was potentially
ahead of macro invocations - see simd.h's "override" macro.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3c09288298af881ea1bb568740deb2d2a06bcd41
master date: 2024-09-06 08:41:18 +0200
Jan Beulich [Tue, 24 Sep 2024 12:41:59 +0000 (14:41 +0200)]
x86: fix UP build with gcc14
The complaint is:
In file included from ././include/xen/config.h:17,
from <command-line>:
arch/x86/smpboot.c: In function ‘link_thread_siblings.constprop’:
./include/asm-generic/percpu.h:16:51: error: array subscript [0, 0] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds=]
16 | (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
./include/xen/compiler.h:140:29: note: in definition of macro ‘RELOC_HIDE’
140 | (typeof(ptr)) (__ptr + (off)); })
| ^~~
arch/x86/smpboot.c:238:27: note: in expansion of macro ‘per_cpu’
238 | cpumask_set_cpu(cpu2, per_cpu(cpu_sibling_mask, cpu1));
| ^~~~~~~
In file included from ./arch/x86/include/generated/asm/percpu.h:1,
from ./include/xen/percpu.h:30,
from ./arch/x86/include/asm/cpuid.h:9,
from ./arch/x86/include/asm/cpufeature.h:11,
from ./arch/x86/include/asm/system.h:6,
from ./include/xen/list.h:11,
from ./include/xen/mm.h:68,
from arch/x86/smpboot.c:12:
./include/asm-generic/percpu.h:12:22: note: while referencing ‘__per_cpu_offset’
12 | extern unsigned long __per_cpu_offset[NR_CPUS];
| ^~~~~~~~~~~~~~~~
Which I consider bogus in the first place ("array subscript [0, 0]" vs a
1-element array). Yet taking the experience from 99f942f3d410 ("Arm64:
adjust __irq_to_desc() to fix build with gcc14") I guessed that
switching function parameters to unsigned int (which they should have
been anyway) might help. And voilà ...
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a2de7dc4d845738e734b10fce6550c89c6b1092c
master date: 2024-09-04 16:09:28 +0200
Jan Beulich [Tue, 24 Sep 2024 12:41:51 +0000 (14:41 +0200)]
SUPPORT.md: split XSM from Flask
XSM is a generic framework, which in particular is also used by SILO.
With this it can't really be experimental: Arm mandates SILO for having
a security supported configuration.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
master commit: d7c18b8720824d7efc39ffa7296751e1812865a9
master date: 2024-09-04 16:05:03 +0200
libxl: Fix nul-termination of the return value of libxl_xen_console_read_line()
When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
call in main_dmesg(). ASAN reports a heap buffer overflow: an
off-by-one access to cr->buffer.
The readconsole sysctl copies up to count characters into the buffer,
but it does not add a null character at the end. Despite the
documentation of libxl_xen_console_read_line(), line_r is not
nul-terminated if 16384 characters were copied to the buffer.
Fix this by asking xc_readconsolering() to fill the buffer up to size
- 1. As the number of characters in the buffer is only needed in
libxl_xen_console_read_line(), make it a local variable there instead
of part of the libxl__xen_console_reader struct.
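The fix pattern can be sketched as below. fake_read_ring() is a hypothetical stand-in for the console read, which (like the sysctl) adds no terminator; read_line() is likewise an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the console read: copies up to 'count'
 * characters into 'buf' and adds no terminator. Returns the number
 * of characters copied.
 */
static size_t fake_read_ring(char *buf, size_t count, const char *ring)
{
    size_t n = strlen(ring);

    if ( n > count )
        n = count;
    memcpy(buf, ring, n);

    return n;
}

/* Ask for at most size - 1 characters so the terminator always fits. */
size_t read_line(char *buf, size_t size, const char *ring)
{
    size_t nr;

    assert(size > 0);

    nr = fake_read_ring(buf, size - 1, ring);
    buf[nr] = '\0';

    return nr;
}
```

Requesting the full size instead would leave no room for the terminator whenever the source has at least size characters available.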
Jan Beulich [Tue, 24 Sep 2024 12:40:34 +0000 (14:40 +0200)]
Arm64: adjust __irq_to_desc() to fix build with gcc14
With the original code I observe
In function ‘__irq_to_desc’,
inlined from ‘route_irq_to_guest’ at arch/arm/irq.c:465:12:
arch/arm/irq.c:54:16: error: array subscript -2 is below array bounds of ‘irq_desc_t[32]’ {aka ‘struct irq_desc[32]’} [-Werror=array-bounds=]
54 | return &this_cpu(local_irq_desc)[irq];
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which looks pretty bogus: How in the world does the compiler arrive at
-2 when compiling route_irq_to_guest()? Yet independent of that the
function's parameter wants to be of unsigned type anyway, as shown by
a vast majority of callers (others use plain int when they really mean
non-negative quantities). With that adjustment the code compiles fine
again.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
master commit: 99f942f3d410059dc223ee0a908827e928ef3592
master date: 2024-08-29 10:03:53 +0200
For partial writes the non-written parts of registers are folded into
the full 64-bit value from what they're presently set to. That's wrong
to do though when the behavior is write-1-to-clear: Writes not
including to low 3 bits would unconditionally clear all ISR bits which
are presently set. Re-calculate the value to use.
Fixes: be07023be115 ("x86/vhpet: add support for level triggered interrupts")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 41d358d2f9607ba37c216effa39b9f1bc58de69d
master date: 2024-08-29 10:02:20 +0200
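The difference between the two kinds of register can be shown with a small model. The helper names are invented and this is not the vHPET code; it only demonstrates why folding the current value into a partial write is wrong for write-1-to-clear bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering 'len' bytes starting at byte 'off' of a 64-bit register. */
static uint64_t write_mask(unsigned int off, unsigned int len)
{
    uint64_t m;

    assert(len >= 1 && off + len <= 8);
    m = (len == 8) ? ~0ULL : ((1ULL << (len * 8)) - 1);

    return m << (off * 8);
}

/* Ordinary register: unwritten bytes keep their present value. */
uint64_t fold_plain(uint64_t cur, uint64_t val,
                    unsigned int off, unsigned int len)
{
    uint64_t m = write_mask(off, len);

    return (cur & ~m) | ((val << (off * 8)) & m);
}

/* Write-1-to-clear register: unwritten bytes must count as zeroes. */
uint64_t write_w1c(uint64_t cur, uint64_t val,
                   unsigned int off, unsigned int len)
{
    uint64_t w = (val << (off * 8)) & write_mask(off, len);

    return cur & ~w;
}
```

Feeding the result of fold_plain() into the clearing step is the faulty pattern: the bits currently set come back as ones and clear themselves.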
x86/dom0: disable SMAP for PV domain building only
Move the logic that disables SMAP so it's only performed when building a PV
dom0; the PVH dom0 builder doesn't require disabling SMAP.
The fixes tag is to account for the wrong usage of cpu_has_smap in
create_dom0(), it should instead have used
boot_cpu_has(X86_FEATURE_XEN_SMAP). Fix while moving the logic to apply to PV
only.
While there also make cr4_pv32_mask __ro_after_init.
Fixes: 493ab190e5b1 ('xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: fb1658221a31ec1db33253a80001191391e73b17
master date: 2024-08-28 19:59:07 +0100
Jan Beulich [Tue, 24 Sep 2024 12:38:27 +0000 (14:38 +0200)]
x86/x2APIC: correct cluster tracking upon CPUs going down for S3
Downing CPUs for S3 is somewhat special: Since we can expect the system
to come back up in exactly the same hardware configuration, per-CPU data
for the secondary CPUs isn't de-allocated (and then cleared upon re-
allocation when the CPUs are being brought back up). Therefore the
cluster_cpus per-CPU pointer will retain its value for all CPUs other
than the final one in a cluster (i.e. in particular for all CPUs in the
same cluster as CPU0). That, however, is in conflict with the assertion
early in init_apic_ldr_x2apic_cluster().
Note that the issue is avoided on Intel hardware, where we park CPUs
instead of bringing them down.
Extend the bypassing of the freeing to the suspend case, thus making
suspend/resume also a tiny bit faster.
Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ad3ff7b4279d16c91c23cda6e8be5bc670b25c9a
master date: 2024-08-26 10:30:40 +0200
Jan Beulich [Tue, 24 Sep 2024 12:37:52 +0000 (14:37 +0200)]
x86emul: set (fake) operand size for AVX512CD broadcast insns
Back at the time I failed to pay attention to op_bytes still being zero
when reaching the respective case block: With the ext0f38_table[]
entries having simd_packed_int, the defaulting at the bottom of
x86emul_decode() won't set the field to non-zero for F3-prefixed insns.
Fixes: 37ccca740c26 ("x86emul: support AVX512CD insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6fa6b7feaafd622db3a2f3436750cf07782f4c12
master date: 2024-08-23 09:12:24 +0200
Jan Beulich [Tue, 24 Sep 2024 12:37:08 +0000 (14:37 +0200)]
x86emul: always set operand size for AVX-VNNI-INT8 insns
Unlike for AVX-VNNI-INT16 I failed to notice that op_bytes may still be
zero when reaching the respective case block: With the ext0f38_table[]
entries having simd_packed_int, the defaulting at the bottom of
x86emul_decode() won't set the field to non-zero for F3- or F2-prefixed
insns.
Fixes: 842acaa743a5 ("x86emul: support AVX-VNNI-INT8")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d45687cca2450bfebe1dfbddb22f4f03c6fbc9cb
master date: 2024-08-23 09:11:15 +0200
Andrew Cooper [Tue, 24 Sep 2024 12:36:25 +0000 (14:36 +0200)]
x86/pv: Address Coverity complaint in check_guest_io_breakpoint()
Commit 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV
guests") caused a Coverity INTEGER_OVERFLOW complaint based on the reasoning
that width could be 0.
It can't, but digging into the code generation, GCC 8 and later (bisected on
godbolt) choose to emit a CSWITCH lookup table, and because of the range (bottom
2 bits clear), it's a 16-entry lookup table.
So Coverity is understandable, given that GCC did emit a (dead) logic path
where width stayed 0.
Rewrite the logic. Introduce x86_bp_width() which compiles to a single basic
block, which replaces the switch() statement. Take the opportunity to also
make start and width be loop-scope variables.
No practical change, but it should compile better and placate Coverity.
Fixes: 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV guests")
Coverity-ID: 1616152
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d41a9d8a12ff89adabdc286e63e9391a0481699
master date: 2024-08-21 23:59:19 +0100
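A sketch of how such a width calculation can avoid a switch altogether. The %dr7 LEN encoding (0, 1, 2, 3 meaning 1, 2, 8, 4 bytes) is architectural; the function names and exact form are illustrative and not necessarily what x86_bp_width() looks like.

```c
#include <assert.h>

/* Extract the 2-bit LEN field for breakpoint 'bp' (0-3) from a %dr7 value. */
unsigned int dr7_len(unsigned long dr7, unsigned int bp)
{
    assert(bp < 4);

    /* Each breakpoint has 4 control bits from bit 16; LEN is the upper 2. */
    return (dr7 >> (18 + 4 * bp)) & 3;
}

/*
 * LEN encodes widths of 1, 2, 8 and 4 bytes for values 0 to 3. When the
 * top bit is set, flipping the bottom bit puts the two wide encodings in
 * order, after which the width is a plain shift.
 */
unsigned int bp_width(unsigned int len)
{
    assert(len < 4);

    len ^= (len >> 1) & 1;

    return 1u << len;
}
```

Because every input maps to a nonzero power of two, there is no path on which the width could remain 0.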
Andrew Cooper [Tue, 24 Sep 2024 12:34:30 +0000 (14:34 +0200)]
x86/pv: Fix merging of new status bits into %dr6
All #DB exceptions result in an update of %dr6, but this isn't captured in
Xen's handling, and is buggy just about everywhere.
To begin resolving this issue, add a new pending_dbg field to x86_event
(unioned with cr2 to avoid taking any extra space, adjusting users to avoid
old-GCC bugs with anonymous unions), and introduce pv_inject_DB() to replace
the current callers using pv_inject_hw_exception().
Push the adjustment of v->arch.dr6 into pv_inject_event(), and use the new
x86_merge_dr6() rather than the current incorrect logic.
A key property is that pending_dbg is taken with positive polarity to deal
with RTM/BLD sensibly. Most callers pass in a constant, but callers passing
in a hardware %dr6 value need to XOR the value with X86_DR6_DEFAULT to flip to
positive polarity.
This fixes the behaviour of the breakpoint status bits; that any left pending
are generally discarded when a new #DB is raised. In principle it would fix
RTM/BLD too, except PV guests can't turn these capabilities on to start with.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: db39fa4b27ea470902d4625567cb6fa24030ddfa
master date: 2024-08-21 23:59:19 +0100
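The polarity handling can be sketched as follows. This is a simplified model: the constants are the architectural %dr6 reset value and the B0-B3 mask, but merge_dr6() is an illustrative stand-in that omits the reserved-bit and feature handling a real helper would need.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_DEFAULT 0xffff0ff0u   /* architectural reset value of %dr6 */
#define DR6_BP_MASK 0x0000000fu   /* B0-B3 breakpoint status bits */

/* Flip between hardware format and positive polarity (XOR is its own inverse). */
uint32_t dr6_flip(uint32_t dr6)
{
    return dr6 ^ DR6_DEFAULT;
}

/*
 * Merge newly pending bits (positive polarity) into a hardware-format
 * %dr6: stale breakpoint bits are dropped, everything else accumulates.
 */
uint32_t merge_dr6(uint32_t dr6, uint32_t pending)
{
    uint32_t pos = dr6_flip(dr6);

    /* Only bits that are zero in the default value have positive polarity
     * in hardware; pending must not carry any default-set bits twice. */
    assert(!(pending & pos & ~DR6_BP_MASK & 0) );

    pos = (pos & ~DR6_BP_MASK) | pending;

    return dr6_flip(pos);
}
```

With positive polarity, an inverted bit such as RTM (bit 16) is passed as a 1 in pending and ends up as a 0 in the hardware-format value.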
Andrew Cooper [Tue, 24 Sep 2024 12:30:49 +0000 (14:30 +0200)]
x86/pv: Introduce x86_merge_dr6() and fix do_debug()
Pretty much everywhere in Xen the logic to update %dr6 when injecting #DB is
buggy. Introduce a new x86_merge_dr6() helper, and start fixing the mess by
adjusting the dr6 merge in do_debug(). Also correct the comment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 54ef601a66e8d812a6a6a308f02524e81201825e
master date: 2024-08-21 23:59:19 +0100
Jan Beulich [Tue, 24 Sep 2024 12:30:04 +0000 (14:30 +0200)]
x86emul: correct #UD check for AVX512-FP16 complex multiplications
avx512_vlen_check()'s argument was inverted, while the surrounding
conditional wrongly forced the EVEX.L'L check for the scalar forms when
embedded rounding was in effect.
Fixes: d14c52cba0f5 ("x86emul: handle AVX512-FP16 complex multiplication insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a30d438ce58b70c5955f5d37f776086ab8f88623
master date: 2024-08-19 15:32:31 +0200
Jan Beulich [Tue, 24 Sep 2024 12:28:22 +0000 (14:28 +0200)]
Arm: correct FIXADDR_TOP
While reviewing a RISC-V patch cloning the Arm code, I noticed an
off-by-1 here: FIX_PMAP_{BEGIN,END} being an inclusive range and
FIX_LAST being the same as FIX_PMAP_END, FIXADDR_TOP cannot derive from
FIX_LAST alone, or else the BUG_ON() in virt_to_fix() would trigger if
FIX_PMAP_END ended up being used.
While touching this area also add a check for fixmap and boot FDT area
to not only not overlap, but to have at least one (unmapped) page in
between.
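The off-by-one can be reproduced with a toy layout. All constants below are hypothetical, and in_fixmap() merely mirrors the kind of range check the BUG_ON() is described as performing.

```c
#include <assert.h>
#include <stdbool.h>

#define FIX_PAGE_SIZE  4096ul
#define FIXMAP_START   0x400000ul   /* hypothetical base address */
#define FIX_PMAP_BEGIN 3            /* hypothetical slot numbers */
#define FIX_PMAP_END   6            /* inclusive end of the range */
#define FIX_LAST       FIX_PMAP_END

#define FIXMAP_ADDR(n)  (FIXMAP_START + (n) * FIX_PAGE_SIZE)

/* Derived from FIX_LAST alone: excludes the last slot. */
#define FIXADDR_TOP_OLD FIXMAP_ADDR(FIX_LAST)
/* One past the last slot: the inclusive range is fully covered. */
#define FIXADDR_TOP_NEW FIXMAP_ADDR(FIX_LAST + 1)

/* Range check of the kind performed before converting back to a slot. */
bool in_fixmap(unsigned long vaddr, unsigned long top)
{
    assert(top > FIXMAP_ADDR(0));

    return vaddr >= FIXMAP_ADDR(0) && vaddr < top;
}
```

With the old definition the address of slot FIX_PMAP_END fails the check even though it is a valid slot.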
Jan Beulich [Tue, 24 Sep 2024 12:27:03 +0000 (14:27 +0200)]
x86/vLAPIC: prevent undue recursion of vlapic_error()
With the error vector set to an illegal value, the function invoking
vlapic_set_irq() would bring execution back here, with the non-recursive
lock already held. Avoid the call in this case, merely further updating
ESR (if necessary).
This is XSA-462 / CVE-2024-45817.
Fixes: 5f32d186a8b1 ("x86/vlapic: don't silently accept bad vectors")
Reported-by: Federico Serafini <federico.serafini@bugseng.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c42d9ec61f6d11e25fa77bd44dd11dad1edda268
master date: 2024-09-24 14:23:29 +0200
Use expect to invoke QEMU so that we can terminate the test as soon as
we get the right string in the output instead of waiting until the
final timeout.
For the timeout, instead of hardcoding the value, use a Gitlab CI
variable "QEMU_TIMEOUT" that can be changed depending on the latest
status of the Gitlab CI runners.
The Yocto jobs take a long time to run. We are changing Gitlab ARM64
runners and the new runners might not be able to finish the Yocto jobs
in a reasonable time.
For now, disable the Yocto jobs by turning them into "manual" trigger
(they need to be manually executed.)
Jan Beulich [Tue, 13 Aug 2024 14:48:13 +0000 (16:48 +0200)]
x86/pass-through: documents as security-unsupported when sharing resources
When multiple devices share resources and one of them is to be passed
through to a guest, security of the entire system and of respective
guests individually cannot really be guaranteed without knowing
internals of any of the involved guests. Therefore such a configuration
cannot really be security-supported, yet making that explicit was so far
missing.
Teddy Astie [Tue, 13 Aug 2024 14:47:19 +0000 (16:47 +0200)]
x86/IOMMU: move tracking in iommu_identity_mapping()
If for some reason xmalloc() fails after having mapped the reserved
regions, an error is reported, but the regions remain mapped in the P2M.
Similarly if an error occurs during set_identity_p2m_entry() (except on
the first call), the partial mappings of the region would be retained
without being tracked anywhere, and hence without there being a way to
remove them again from the domain's P2M.
Move the setting up of the list entry ahead of trying to map the region.
In cases other than the first mapping failing, keep record of the full
region, such that a subsequent unmapping request can be properly torn
down.
To compensate for the potentially excess unmapping requests, don't log a
warning from p2m_remove_identity_entry() when there really was nothing
mapped at a given GFN.
This is XSA-460 / CVE-2024-31145.
Fixes: 2201b67b9128 ("VT-d: improve RMRR region handling")
Fixes: c0e19d7c6c42 ("IOMMU: generalize VT-d's tracking of mapped RMRR regions")
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: beadd68b5490ada053d72f8a9ce6fd696d626596
master date: 2024-08-13 16:36:40 +0200
Matthew Barnes [Thu, 8 Aug 2024 11:47:30 +0000 (13:47 +0200)]
tools/lsevtchn: Use errno macro to handle hypercall error cases
Currently, lsevtchn aborts its event channel enumeration when it hits
an event channel that is owned by Xen.
lsevtchn does not distinguish between different hypercall errors, which
results in lsevtchn missing potential relevant event channels with
higher port numbers.
Use the errno macro to distinguish between hypercall errors, and
continue event channel enumeration if the hypercall error is not
critical to enumeration.
Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
master commit: e92a453c8db8bba62d6be3006079e2b9990c3978
master date: 2024-08-02 08:43:57 +0200
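The enumeration pattern can be sketched as below. fake_status() stands in for the status hypercall, and the choice of EACCES for a Xen-owned channel and EINVAL for the end of the port range is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_NR_PORTS 6u   /* hypothetical size of the port space */

/*
 * Hypothetical stand-in for the status hypercall: ports 2 and 3 are
 * not accessible (EACCES), ports past the end are invalid (EINVAL).
 */
static int fake_status(unsigned int port)
{
    if ( port >= FAKE_NR_PORTS )
    {
        errno = EINVAL;
        return -1;
    }
    if ( port == 2 || port == 3 )
    {
        errno = EACCES;
        return -1;
    }

    return 0;
}

/* Count the reportable ports, skipping over non-critical errors. */
unsigned int enumerate_ports(void)
{
    unsigned int port, seen = 0;

    for ( port = 0; ; port++ )
    {
        if ( fake_status(port) < 0 )
        {
            if ( errno == EACCES )
                continue;       /* not ours to inspect: keep going */
            break;              /* anything else ends the enumeration */
        }
        seen++;
    }

    assert(seen <= FAKE_NR_PORTS);
    return seen;
}
```

Treating every failure as the end of the enumeration would stop at port 2 and miss ports 4 and 5.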
George Dunlap [Thu, 8 Aug 2024 11:47:02 +0000 (13:47 +0200)]
xen/hvm: Don't skip MSR_READ trace record
Commit 37f074a3383 ("x86/msr: introduce guest_rdmsr()") introduced a
function to combine the MSR_READ handling between PV and HVM.
Unfortunately, by returning directly, it skipped the trace generation,
leading to gaps in the trace record, as well as xenalyze errors like
this:
hvm_generic_postprocess: d2v0 Strange, exit 7c(VMEXIT_MSR) missing a handler
Roger Pau Monné [Thu, 8 Aug 2024 11:45:58 +0000 (13:45 +0200)]
x86/altcall: further refine clang workaround
The current code in ALT_CALL_ARG() won't successfully workaround the clang
code-generation issue if the arg parameter has a size that's not a power of 2.
While there are no such sized parameters at the moment, improve the workaround
to also be effective when such sizes are used.
Instead of using a union with a long use an unsigned long that's first
initialized to 0 and afterwards set to the argument value.
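The "zero first, then copy the argument in" idea can be illustrated outside of any macro. struct three and widen_arg() are invented for the example and say nothing about how ALT_CALL_ARG() is actually written.

```c
#include <assert.h>
#include <string.h>

/* A deliberately odd-sized (3-byte) argument type, purely illustrative. */
struct three {
    unsigned char b[3];
};

/* Start from an all-zero unsigned long, then copy the argument over it. */
unsigned long widen_arg(struct three arg)
{
    unsigned long r = 0;

    assert(sizeof(arg) <= sizeof(r));
    memcpy(&r, &arg, sizeof(arg));

    return r;
}
```

Every byte of the result is defined: the first three come from the argument and the rest are zero, whatever the argument's size.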
Roger Pau Monné [Thu, 8 Aug 2024 11:45:28 +0000 (13:45 +0200)]
x86/dom0: fix restoring %cr3 and the mapcache override on PV build error
One of the error paths in the PV dom0 builder section that runs on the guest
page-tables wasn't restoring the Xen value of %cr3, nor removing the
mapcache override.
Andrew Cooper [Thu, 8 Aug 2024 11:44:56 +0000 (13:44 +0200)]
XSM/domctl: Fix permission checks on XEN_DOMCTL_createdomain
The XSM checks for XEN_DOMCTL_createdomain are problematic. There's a split
between xsm_domctl() called early, and flask_domain_create() called quite late
during domain construction.
All XSM implementations except Flask have a simple IS_PRIV check in
xsm_domctl(), and operate as expected when an unprivileged domain tries to
make a hypercall.
Flask however foregoes any action in xsm_domctl() and defers everything,
including the simple "is the caller permitted to create a domain" check, to
flask_domain_create().
As a consequence, when XSM Flask is active, and irrespective of the policy
loaded, all domains irrespective of privilege can:
* Mutate the global 'rover' variable, used to track the next free domid.
Therefore, all domains can cause a domid wraparound, and combined with a
voluntary reboot, choose their own domid.
* Cause a reasonable amount of a domain to be constructed before ultimately
failing for permission reasons, including the use of settings outside of
supported limits.
In order to remediate this, pass the ssidref into xsm_domctl() and at least
check that the calling domain is privileged enough to create domains.
Take the opportunity to also fix the sign of the cmd parameter to be unsigned.
This issue has not been assigned an XSA, because Flask is experimental and not
security supported.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
master commit: ee32b9b29af449d38aad0a1b3a81aaae586f5ea7
master date: 2024-07-30 17:42:17 +0100
Ross Lagerwall [Thu, 8 Aug 2024 11:44:26 +0000 (13:44 +0200)]
bunzip2: fix rare decompression failure
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
Fixes: ab77e81f6521 ("x86/dom0: support bzip2 and lzma compressed bzImage payloads")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 303d3ff85c90ee4af4bad4e3b1d4932fa2634d64
master date: 2024-07-30 11:55:56 +0200
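The overflow is easy to reproduce in isolation. The two counting functions below are illustrative only and differ solely in the counter type; the 258-symbol maximum is taken from the text.

```c
#include <assert.h>

#define MAX_SYMBOLS 258u   /* maximum number of symbols, per the text above */

/* Count symbols of code length 'want' using an 8-bit counter. */
unsigned int count_len_u8(const unsigned char *lens, unsigned int nr,
                          unsigned char want)
{
    unsigned char n = 0;            /* wraps back to 0 at 256 */
    unsigned int i;

    assert(nr <= MAX_SYMBOLS);
    for ( i = 0; i < nr; i++ )
        if ( lens[i] == want )
            n++;

    return n;
}

/* Same loop with a 16-bit counter, wide enough for 258 symbols. */
unsigned int count_len_u16(const unsigned char *lens, unsigned int nr,
                           unsigned char want)
{
    unsigned short n = 0;
    unsigned int i;

    assert(nr <= MAX_SYMBOLS);
    for ( i = 0; i < nr; i++ )
        if ( lens[i] == want )
            n++;

    return n;
}
```

With all 258 symbols sharing one bit length, the narrow counter silently reports a much smaller number, which is what later breaks the symbol lookup.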
"$dev" needs to be set correctly for backendtype=phy as well as
backendtype=tap. Move the setting into the conditional, so it can be
handled properly for each.
(dev could be captured during tap-ctl allocate for blktap module, but it
would not be set properly for the find_device case. The backendtype=tap
case would need to be handled regardless.)
x86/altcall: fix clang code-gen when using altcall in loop constructs
Yet another clang code generation issue when using altcalls.
The issue this time is with using loop constructs around alternative_{,v}call
instances using parameter types smaller than the register size.
Given the following example code:
    static void bar(bool b)
    {
        unsigned int i;

        for ( i = 0; i < 10; i++ )
        {
            int ret_;
            register union {
                bool e;
                unsigned long r;
            } di asm("rdi") = { .e = b };
            register unsigned long si asm("rsi");
            register unsigned long dx asm("rdx");
            register unsigned long cx asm("rcx");
            register unsigned long r8 asm("r8");
            register unsigned long r9 asm("r9");
            register unsigned long r10 asm("r10");
            register unsigned long r11 asm("r11");
Clang will generate machine code that only resets the low 8 bits of %rdi
between loop calls, leaving the rest of the register possibly containing
garbage from the use of %rdi inside the called function. Note also that clang
doesn't truncate the input parameters at the callee, thus breaking the psABI.
Fix this by turning the `e` element in the anonymous union into an array that
consumes the same space as an unsigned long, as this forces clang to reset the
whole %rdi register instead of just the low 8 bits.
Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang') Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
master commit: d51b2f5ea1915fe058f730b0ec542cf84254fca0
master date: 2024-07-23 13:59:30 +0200
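The effect of padding `e` out to a full word, as the altcall entry above describes, can be shown without register variables. This is a sketch under stated assumptions: `union arg_word`, `widen_bool()` and `nonzero_bytes()` are hypothetical names, and the sketch omits the `asm("rdi")` binding because that is compiler and architecture specific. It only demonstrates that initialising element 0 of the array leaves every other byte of the word zeroed.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the register union.  The array makes `e`
 * occupy the same space as `r`.
 */
union arg_word {
    bool e[sizeof(unsigned long) / sizeof(bool)];
    unsigned long r;
};

/*
 * Initialising e[0] zero-fills the remaining array elements, so every
 * byte of `r` has a defined value afterwards.
 */
unsigned long widen_bool(bool b)
{
    union arg_word w = { .e[0] = b };

    return w.r;
}

/* Count the non-zero bytes of a word (endianness independent). */
size_t nonzero_bytes(unsigned long v)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(v); i++)
        if ((v >> (8 * i)) & 0xff)
            n++;

    return n;
}
```

With a plain `bool e` member, only one byte of the word is specified by the initialiser, which is the gap that let clang reset just the low 8 bits.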
Jan Beulich [Tue, 16 Jul 2024 12:09:14 +0000 (14:09 +0200)]
x86/IRQ: avoid double unlock in map_domain_pirq()
Ever since its introduction, the main loop in the function dealing
with multi-vector MSI had error exit points ("break") with different
properties: In one case no IRQ descriptor lock is being held.
Nevertheless the subsequent error cleanup path assumed such a lock would
uniformly need releasing. Identify the case by setting "desc" to NULL,
thus allowing the unlock to be skipped as necessary.
This is CVE-2024-31143 / XSA-458.
Coverity ID: 1605298 Fixes: d1b6d0a02489 ("x86: enable multi-vector MSI") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
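The pattern in the map_domain_pirq() entry above, marking "no lock held" by setting the descriptor pointer to NULL, can be sketched as follows. Everything here is hypothetical: `struct desc`, the lock-depth counter and `map_all()` are illustrative stand-ins, not Xen's IRQ code.

```c
#include <stddef.h>

/* Hypothetical stand-in for an IRQ descriptor with a lock-depth counter. */
struct desc {
    int lock_depth;
};

static void desc_lock(struct desc *d)   { d->lock_depth++; }
static void desc_unlock(struct desc *d) { d->lock_depth--; }

/*
 * Process nr descriptors.  fail_unlocked / fail_locked select the index at
 * which an error exit is taken without / with the lock held (-1 for none).
 * Returns 0 on success, -1 on error.
 */
int map_all(struct desc *descs, int nr, int fail_unlocked, int fail_locked)
{
    struct desc *desc = NULL;
    int rc = 0;

    for (int i = 0; i < nr; i++) {
        if (i == fail_unlocked) {
            /* Error before taking the lock: flag that nothing is held. */
            desc = NULL;
            rc = -1;
            break;
        }

        desc = &descs[i];
        desc_lock(desc);

        if (i == fail_locked) {
            /* Error with the lock held: leave desc set for cleanup. */
            rc = -1;
            break;
        }

        desc_unlock(desc);
        desc = NULL;
    }

    /* Cleanup: only unlock when a descriptor is actually locked. */
    if (rc && desc)
        desc_unlock(desc);

    return rc;
}
```

Both error exits leave every lock depth at zero, whereas an unconditional unlock in the cleanup path would drive one counter negative on the unlocked exit.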
Andrew Cooper [Thu, 11 Jul 2024 15:09:58 +0000 (16:09 +0100)]
CI: Add Ubuntu 22.04 (Jammy) and 24.04 (Noble) testing
The containers are exactly as per 20.04 (Focal). However, this now brings us
to 5 releases * 4 build jobs worth of Ubuntu testing, which is overkill.
The oldest and newest toolchains are the most likely to find problems with new
code, so reduce the middle 3 releases (18/20/22) to just a single smoke test
each.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 11 Jul 2024 15:09:22 +0000 (16:09 +0100)]
CI: Refresh Ubuntu Focal container as 20.04-x86_64
As with 16.04 (Xenial), with python3-setuptools included. Having this package
only in some containers was intentional; see commit bbc72a7877d8 ("automation:
Add python3's setuptools to some containers") for the rationale.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 10 Jul 2024 13:37:53 +0000 (14:37 +0100)]
CI: Refresh OpenSUSE Leap container
See prior patch for most discussion.
Despite appearing to be a fixed release (and therefore not marked as permitted
failure), the dockerfile references the `leap` tag which is rolling in
practice. Switch to 15.6 explicitly, for better test stability.
Vs tumbleweed, use `zypper update` rather than dist-upgrade, and retain the
RomBIOS dependencies; bin86 and dev86.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 10 Jul 2024 13:40:23 +0000 (14:40 +0100)]
CI: Refresh OpenSUSE Tumbleweed container
Existing as suse:opensuse-tumbleweed is a historical quirk, and adjusted for
consistency with all the other containers.
Make it non-root, use heredocs for legibility, and use the zypper long names
for the benefit of those wondering what was being referenced or duplicated.
Trim the dependencies substantially. Testing docs isn't very interesting, and
dropping it saves a lot of space. Other savings come from removing a huge pile of
optional QEMU dependencies (QEMU just needs to build the Xen parts to be
useful here, not have a full GUI environment).
Finally, there were some packages such as bc, libssh2-devel, libtasn1-devel
and nasm that I'm not aware of any reason to have had, even historically.
Furthermore, identify which components of the build use which dependencies,
which will help managing them in the future.
Thanks to Olaf Hering for dependency fixes that have been subsumed into this
total overhaul.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 9 Jul 2024 14:54:52 +0000 (15:54 +0100)]
CI: Refresh and upgrade the GCC-IBT container
Upgrade from Debian buster to bookworm, GCC 11.3 to 11.4 and to be a non-root
container.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 8 Jul 2024 17:18:22 +0000 (18:18 +0100)]
CI: Refresh bullseye-ppc64le as debian:11-ppc64le
... in the style of debian:12-ppc64le.
Rename the jobs and reposition them later as they're not a dependency for the
smoke testing any more.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 8 Jul 2024 17:17:25 +0000 (18:17 +0100)]
CI: Use debian:12-ppc64le for smoke testing
qemu-system-ppc64/8.1.0-ppc64 was added because bullseye's QEMU didn't
understand the powernv9 machine. However bookworm's QEMU does and this is
preferable to maintaining a random build of QEMU ourselves.
Use the debian:12-ppc64le container and test the output of that build too.
Remove qemu-system-ppc64-8.1.0-ppc64-export which is unused now.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 8 Jul 2024 17:00:21 +0000 (18:00 +0100)]
CI: Introduce a debian:12-ppc64le container
... conforming to the new naming scheme; $DISTRO-$VERSION-$ARCH-* so the jobs
sort more coherently.
Make it non-root by default, and set XEN_TARGET_ARCH=ppc64. Include QEMU too,
which will be used subsequently.
Add build jobs too, with debian-12-ppc64le-gcc-debug specifically early as it
will be used for smoke testing shortly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 10 Jul 2024 12:38:52 +0000 (13:38 +0100)]
CI: Mark Archlinux/x86 as allowing failures
Archlinux is a rolling distro. As a consequence, rebuilding the container
periodically changes the toolchain, and this affects all stable branches in
one go.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 10 Jul 2024 00:01:13 +0000 (01:01 +0100)]
CI: Drop Ubuntu Trusty testing
This is also End of Life.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 9 Jul 2024 23:26:56 +0000 (00:26 +0100)]
CI: Drop Debian Stretch testing
Debian stretch is also End of Life. Update a couple of test steps to use
bookworm instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 9 Jul 2024 23:02:47 +0000 (00:02 +0100)]
CI: Drop Debian Jessie dockerfiles
These were removed from testing in Xen 4.18.
Fixes: 3817e3c1b4b8 ("automation: Remove testing on Debian Jessie") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
as PPC64 doesn't want randconfig right now, and buster-gcc-ibt is a special
job with a custom compiler.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 8 Jul 2024 17:00:49 +0000 (18:00 +0100)]
CI: Fix CONTAINER_UID0=1 scripts/containerize
Right now, most build containers use root. Archlinux, Fedora and Yocto set up
a regular user called `user`.
For those containers, trying to containerize as root fails, because
CONTAINER_UID0=1 does nothing, whereas CONTAINER_UID0=0 forces the user away
from root.
To make CONTAINER_UID0=1 work reliably, force to root if requested.
Fixes: 17fbe6504dfd ("automation: introduce a new variable to control container user") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 4 Jul 2024 12:09:21 +0000 (13:09 +0100)]
build: Drop xorg-x11 as a build dependency
The history on this one is complicated. The note to README was added in
commit 1f95747a4f16 ("Add openssl-dev and xorg-x11-dev to README") in 2007.
At the time, there was a vendored version of Qemu in xen.git with a local
modification using <X11/keysymdef.h> to access the monitor console over VNC.
The final reference to keysymdef.h was dropped in commit 85896a7c4dc7 ("build:
add autoconf to replace custom checks in tools/check") in 2012. The next
prior mention was in 2009 with commit a8ccb671c377 ("tools: fix x11 check")
noting that x11 was not a direct dependency of Xen; it was transitive through
SDL for Qemu for source-based distros.
It appears there may have been other unspecified dependencies on xorg,
e.g. the use of lndir by unmodified_drivers which are no longer relevant
either.
These days it's only the Debian-based dockerfiles which install xorg-x11, and
Qemu builds fine in these and others without x11.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 3 Jul 2024 20:02:20 +0000 (21:02 +0100)]
CI: Refresh the Coverity Github Action configuration
Update to Ubuntu 24.04, and checkout@v4 as v2 is deprecated.
The build step goes out of its way to exclude docs and stubdom (but include
plain MiniOS), so disable those at the ./configure stage.
Refresh the package list. libbz2-dev was in there twice, and e2fslibs-dev is
a transitional package to libext2fs-dev. I'm not aware of libtool ever
having been a Xen dependency.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 4 Jul 2024 12:08:40 +0000 (13:08 +0100)]
build: Fix the version of python checked for by ./configure
We previously upped the minimum python version to 2.7, but neglected to
reflect this in ./configure
Fixes: 2a353c048c68 ("tools: Don't use distutils in configure or Makefile") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 3 Jul 2024 17:21:09 +0000 (18:21 +0100)]
build: Regenerate ./configure with Autoconf 2.71
This is the version now found in Debian Bookworm.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
x86/physdev: Return pirq that irq was already mapped to
Fix a bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to allocate and
map a pirq"). After that refactoring, when pirq<0 and current_pirq>0, meaning the
caller wants to allocate a free pirq for an irq that already has a mapped pirq, the
function returns the negative pirq, so the call fails. The logic before the
refactoring was different: it returned the current_pirq that the irq was already
mapped to, making the call succeed. Restore that behaviour.
Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq") Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
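The decision described in the x86/physdev entry above can be sketched as a small pure function. This is not the Xen implementation: `choose_pirq()` and its parameters are hypothetical, `next_free` stands in for whatever a free-pirq search would return, and the `-EBUSY` result for a negative `current` is an assumption made for the sake of a complete example.

```c
#include <errno.h>

/*
 * requested:  pirq asked for by the caller (< 0 means "pick one for me")
 * current:    pirq the irq is already mapped to (0 if none)
 * next_free:  hypothetical result of a free-pirq search
 */
int choose_pirq(int requested, int current, int next_free)
{
    if (requested >= 0)
        return requested;

    /* Already mapped: report the existing pirq instead of failing. */
    if (current > 0)
        return current;

    /* Assumption for this sketch: a negative mapping is unusable. */
    if (current < 0)
        return -EBUSY;

    return next_free;
}
```

The regression corresponds to the `current > 0` branch returning the caller's negative `requested` value rather than `current`.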
Andrew Cooper [Wed, 3 Jul 2024 11:06:46 +0000 (12:06 +0100)]
CI: Rework the CentOS7 container
CentOS 7 is fully End-of-life as of 2024-06-30, and the Yum repo configuration
points at URLs which have become non-existent.
First, start by using a heredoc RUN for legibility. It's important to use
`set -e` to offset the fact that we're no longer chaining every command
together with an &&.
Also, because we're using a single RUN command to perform all RPM operations,
we no longer need to work around the OverlayFS bug.
Adjust the CentOS-*.repo files to point at vault.centos.org. This also
involves swapping mirrorlist= for baseurl= in the yum config.
Use a minor bashism to express the dependencies more coherently, and identify
why we have certain dependencies. Some adjustments are:
* We need bzip2-devel for the dombuilder. bzip2 needs retaining for stubdom or
`tar` fails to unpack the .bz2 archives.
* {lzo,lz4,ztd}-devel are new optional dependencies since the last time this
package list was refreshed.
* openssl-devel hasn't been a dependency since Xen 4.6.
* We long ago ceased being able to build Qemu and SeaBIOS in this container,
so drop their dependencies too.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
For inline files, use COPY with a heredoc, rather than opencoding it through
/bin/sh.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 2 Jul 2024 13:34:36 +0000 (14:34 +0100)]
CI: Formalise the use of heredocs
Commit b5739330d7f4 introduced the use of heredocs in the jessie/stretch
dockerfiles.
It turns out this was introduced by BuildKit in 2018 along with a
standardisation of Dockerfile syntax, and has subsequently been adopted by the
docker community.
Annotate all dockerfiles with a statement of the syntax in use, and extend
README.md details including how to activate BuildKit when it's available but
off by default.
This allows the containers to be rebuilt following commit a0e29b316363 ("CI:
Drop glibc-i386 from the build containers").
Fixes: b5739330d7f4 ("automation: fix jessie/stretch images to use archive.debian.org apt repos") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Python regexes should use raw strings. Convert all regexes, and drop escaped
backslashes. Note that regular escape sequences are interpreted normally when
parsing a regex, so \n even in a raw-string regex is a newline.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 3 Jul 2024 20:59:34 +0000 (21:59 +0100)]
build/mkheader: Remove C-isms from the code
This was clearly written by a C programmer, rather than a python programmer.
Drop all the useless semi-colons.
The very final line of the script simply references f.close, rather than
calling the function. Switch to using a with: statement, as python does care
about unclosed files if you enable enough warnings.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 3 Jul 2024 22:01:11 +0000 (23:01 +0100)]
tools/xs-clients: Fix `make clean` rule
Prior to the split, "the clients" used tools/xenstored/Makefile.common whose
clean rule includes *.o whereas after the split, the removal of *.o was lost
by virtue of not including Makefile.common any more.
This is the bug behind the following build error:
make[2]: Entering directory '/local/xen.git/tools/xs-clients'
gcc xenstore_client.o (snip)
/usr/bin/ld: xenstore_client.o: relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:35: xenstore] Error 1
which was caused by `make clean` not properly cleaning the tree as I was
swapping between various build containers.
Switch to a plain single-colon clean rule.
Fixes: 5c293058b130 ("tools/xenstore: move xenstored sources into dedicated directory") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
xen/riscv: use .insn with operands to support the older gas
Support for specifying "raw" insns was added only in 2.38.
To support older versions, it would be better to switch to .insn
with operands.
The following compilation error occurs:
./arch/riscv/include/asm/processor.h: Assembler messages:
./arch/riscv/include/asm/processor.h:70: Error: unrecognized opcode `0x0100000F'
In case of the following Binutils:
$ riscv64-linux-gnu-as --version
GNU assembler (GNU Binutils for Debian) 2.35.2
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Drop the 0, which is in line with how we annotate RCs elsewhere.
Fixes: 4a73eb4c205d ("Update Xen version to 4.19-rc") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 3 Jul 2024 12:04:15 +0000 (14:04 +0200)]
cmdline: "extra_guest_irqs" is inapplicable to PVH
PVH in particular has no (externally visible) notion of pIRQ-s. Mention
that in the description of the respective command line option and have
arch_hwdom_irqs() also reflect this (thus suppressing the log message
there as well, as being pretty meaningless in this case anyway).
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 3 Jul 2024 12:03:27 +0000 (14:03 +0200)]
amend 'cmdline: document and enforce "extra_guest_irqs" upper bounds'
Address late review comments for what is now commit 17f6d398f765:
- bound max_irqs right away against nr_irqs
- introduce a #define for a constant used twice
Requested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 2 Jul 2024 10:01:59 +0000 (12:01 +0200)]
xen: avoid UB in guest handle field accessors
Much like noted in 43d5c5d5f70b ("xen: avoid UB in guest handle
arithmetic"), address calculations involved in accessing a struct field
can overflow, too. Cast respective pointers to "unsigned long" and
convert type checking accordingly. Remaining arithmetic is, despite
there possibly being mathematical overflow, okay as per the C99 spec:
"A computation involving unsigned operands can never overflow, because a
result that cannot be represented by the resulting unsigned integer type
is reduced modulo the number that is one greater than the largest value
that can be represented by the resulting type." The overflow that we
need to guard against is checked for in array_access_ok().
While there add the missing (see {,__}copy_to_guest_offset()) is-not-
const checks to {,__}copy_field_to_guest().
Typically, but not always, no change to generated code; code generation
(register allocation) is different for at least common/grant_table.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
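The C99 rule quoted in the guest handle entry above is what makes the cast to "unsigned long" safe. A minimal sketch, assuming a hypothetical `struct guest_info` and hypothetical helper names (these are not Xen's accessors): the field address is computed purely on unsigned integers, so a mathematically overflowing sum wraps modulo the type's range instead of being undefined, and a separate range check can then reject it.

```c
#include <stddef.h>

/* Hypothetical guest structure used only for this sketch. */
struct guest_info {
    unsigned int version;
    unsigned long flags;
};

/*
 * Compute a field address on unsigned long.  Unlike pointer arithmetic,
 * unsigned addition is well defined even when the sum wraps around.
 */
unsigned long field_addr(unsigned long base, size_t offset)
{
    return base + offset;
}

/* Convenience wrapper deriving the offset from the structure layout. */
#define FIELD_ADDR(base, type, field) \
    field_addr((base), offsetof(type, field))
```

For example, `field_addr(~0UL - 3, 8)` wraps around to 4 rather than invoking undefined behaviour; catching such a wrapped address is then the job of a bounds check like the array_access_ok() mentioned in the commit message.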
Jan Beulich [Tue, 2 Jul 2024 10:01:21 +0000 (12:01 +0200)]
x86/entry: don't clear DF when raising #UD for lack of syscall handler
While doing so is intentional when invoking the actual callback, to
mimic a hard-coded SYSCALL_MASK / FMASK MSR, the same should not be done
when no handler is available and hence #UD is raised.
Fixes: ca6fcf4321b3 ("x86/pv: Inject #UD for missing SYSCALL callbacks") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 2 Jul 2024 10:00:27 +0000 (12:00 +0200)]
cmdline: document and enforce "extra_guest_irqs" upper bounds
PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
more than 32k pIRQ-s can be used by a domain on x86. Document this upper
bound.
To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
parameter type) and setup_system_domains(). This is primarily to avoid
exposing the two static variables or introducing yet further arch hooks.
While touching arch_hwdom_irqs() also mark it hwdom-init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
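The 32k figure in the "extra_guest_irqs" entry above follows from simple arithmetic, assuming the single GFN names one 4 KiB page used as a bitmap with one bit per pIRQ (that layout is an assumption of this sketch, as are all names below). Clamping is shown as one possible way to apply the bound; the commit message does not say whether Xen clamps or rejects.

```c
/* Assumed for this sketch: 4 KiB pages and one bitmap bit per pIRQ. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_BITS_PER_BYTE 8u
#define SKETCH_MAX_PIRQS     (SKETCH_PAGE_SIZE * SKETCH_BITS_PER_BYTE)

/* Hypothetical helper: limit a requested pIRQ count to the upper bound. */
unsigned int clamp_nr_pirqs(unsigned int requested)
{
    return requested > SKETCH_MAX_PIRQS ? SKETCH_MAX_PIRQS : requested;
}
```

Under these assumptions, 4096 bytes times 8 bits gives 32768 representable pIRQs, which matches the documented limit.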
Andrew Cooper [Fri, 28 Jun 2024 13:04:30 +0000 (14:04 +0100)]
tools/libxs: Fix CLOEXEC handling in xs_fileno()
xs_fileno() opens a pipe on first use to communicate between the watch thread
and the main thread. Nothing ever sets CLOEXEC on the file descriptors.
Check for the availability of the pipe2() function with configure. Despite
starting life as Linux-only, FreeBSD and NetBSD have gained it.
When pipe2() isn't available, try our best with pipe() and set_cloexec().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Andrew Cooper [Fri, 28 Jun 2024 13:10:12 +0000 (14:10 +0100)]
tools/libxs: Fix CLOEXEC handling in get_dev()
Move the O_CLOEXEC compatibility outside of an #ifdef USE_PTHREAD block.
Introduce set_cloexec() to wrap fcntl() setting FD_CLOEXEC. It will be reused
for other CLOEXEC fixes too.
Use set_cloexec() when O_CLOEXEC isn't available as a best-effort fallback.
Fixes: f4f2f3402b2f ("tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
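A helper of the kind described in the get_dev() entry above might look like the sketch below. The name set_cloexec() comes from the commit message, but its signature and return type here are assumptions, and `pipe_cloexec()` is a hypothetical illustration of the pipe() fallback mentioned in the xs_fileno() entry.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd while preserving any other descriptor flags. */
bool set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return false;

    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
}

/*
 * Hypothetical fallback for systems without pipe2(): create the pipe,
 * then mark both ends close-on-exec on a best-effort basis.
 */
int pipe_cloexec(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;

    /* Best effort: the pipe stays usable even if a flag update fails. */
    (void)set_cloexec(fds[0]);
    (void)set_cloexec(fds[1]);

    return 0;
}
```

Unlike pipe2() with O_CLOEXEC, this two-step approach leaves a short window in which another thread could fork and exec with the descriptors still inheritable, which is why it is only a fallback.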
Andrew Cooper [Thu, 27 Jun 2024 12:22:14 +0000 (13:22 +0100)]
tools/dombuilder: Correct the length calculation in xc_dom_alloc_segment()
xc_dom_alloc_segment() is passed a size in bytes, calculates a size in pages
from it, then fills in the new segment information with a bytes value
re-calculated from the number of pages.
This causes the module information given to the guest (MB, or PVH) to have
incorrect sizes; specifically, sizes rounded up to the next page.
This in turn is problematic for Xen. When Xen finds a gzipped module, it
peeks at the end metadata to judge the decompressed size, which is a -4
backreference from the reported end of the module.
Fill in seg->vend using the correct number of bytes.
Fixes: ea7c8a3d0e82 ("libxc: reorganize domain builder guest memory allocator") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
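The rounding problem in the xc_dom_alloc_segment() entry above comes down to two ways of computing the segment end. The sketch below uses hypothetical names and a 4 KiB page size as an assumption; it is not libxc code. It also shows why the exact end matters: the gzip format stores the uncompressed size (modulo 2^32) as a little-endian 32-bit value in the last four bytes of the stream, so a consumer looking 4 bytes back from a rounded-up end reads the wrong bytes.

```c
#include <stdint.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_BYTES 4096ULL

/* Number of whole pages needed to hold `bytes`. */
uint64_t pages_for(uint64_t bytes)
{
    return (bytes + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES;
}

/* Old behaviour: end derived from the page count, so rounded up. */
uint64_t vend_rounded(uint64_t vstart, uint64_t bytes)
{
    return vstart + pages_for(bytes) * SKETCH_PAGE_BYTES;
}

/* Fixed behaviour: end derived from the real byte count. */
uint64_t vend_exact(uint64_t vstart, uint64_t bytes)
{
    return vstart + bytes;
}

/*
 * Read the little-endian 32-bit size field stored in the last four bytes
 * of a gzip stream.  The caller must ensure len >= 4.
 */
uint32_t gzip_isize(const unsigned char *buf, uint64_t len)
{
    const unsigned char *p = buf + len - 4;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For a 5000-byte module the two end values differ by 3192 bytes, so the size field would be read from padding rather than from the gzip trailer.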
The Gitlab CI randconfig job for RISC-V failed with an error:
common/trace.c:57:22: error: expected '=', ',', ';', 'asm' or
'__attribute__' before '__read_mostly'
57 | static u32 data_size __read_mostly;
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@cloud.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 2 Jul 2024 06:35:56 +0000 (08:35 +0200)]
pirq_cleanup_check() leaks
Its original introduction had two issues: For one the "common" part of
the checks (carried out in the macro) was inverted. And then after
removal from the radix tree the structure wasn't scheduled for freeing.
(All structures still left in the radix tree would be freed upon domain
destruction, though.)
For the freeing to be safe even if it didn't use RCU (i.e. to avoid use-
after-free), re-arrange checks/operations in evtchn_close(), such that
the pointer wouldn't be used anymore after calling pirq_cleanup_check()
(noting that unmap_domain_pirq_emuirq() itself calls the function in the
success case).
Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree") Fixes: 79858fee307c ("xen: fix hvm_domain_use_pirq's behavior") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>