Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86/intel: Expose MSR_ARCH_CAPS to dom0
The overhead of (the lack of) MDS_NO alone has been measured at 30% on some
workloads. While we're not in a position yet to offer MSR_ARCH_CAPS generally
to guests, dom0 doesn't migrate, so we can pass a subset of hardware values
straight through.
This will cause PVH dom0's not to use KPTI by default, and all dom0's not to
use VERW flushing by default, and to use eIBRS in preference to retpoline on
recent Intel CPUs.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
tools: tweak tools/libs/libs.mk for being able to support libxenctrl
tools/libs/libs.mk needs to be modified for being able to support
building libxenctrl, as the pkg-config file of that library is not
following the same conventions as those of the other libraries.
So add support for specifying PKG_CONFIG before including libs.mk.
In order to make life easier for unstable libraries like libxenctrl
set MAJOR and MINOR automatically to the Xen-version and 0 when not
specified. This removes the need to bump the versions of unstable
libraries when switching to a new Xen version.
As all libraries built via libs.mk require a map file, generate a dummy
one in case none exists. This again helps avoid the
need to bump the library version in the map file of an unstable
library in case it is exporting all symbols.
The clean target is missing the removal of _paths.h.
Finally drop the foreach loop when setting PKG_CONFIG_LOCAL, as there
is always only one element in PKG_CONFIG.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Costin Lupu [Fri, 28 Aug 2020 07:17:44 +0000 (09:17 +0200)]
stubdom/grub: update init_netfront() call for mini-os
This patch updates the call of init_netfront() function according to its
recently updated declaration which can also include parameters for gateway
and netmask addresses. While we are here, the patch also removes passing
the ip parameter because (a) it is not used anywhere and (b) it wastes
memory since it would reference a dynamically allocated string.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
stubdom: simplify building xen libraries for stubdoms
The pattern for building a Xen library with sources under tools/libs
is always the same. Simplify stubdom/Makefile by defining a callable
make program for those libraries.
Even if not needed right now add the possibility for defining
additional dependencies for a library.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: generate most contents of library make variables
Library related make variables (CFLAGS_lib*, SHDEPS_lib*, LDLIBS_lib*
and SHLIB_lib*) mostly have a common pattern for their values. Generate
most of this content automatically by adding a new per-library variable
defining on which other libraries a lib is depending. Those definitions
are put into an own file in order to make it possible to include it
from various Makefiles, especially for stubdom.
This in turn makes it possible to drop the USELIB variable from each
library Makefile.
The LIBNAME variable can be dropped, too, as it can be derived from the
directory name the library is residing in.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: add a copy of library headers in tools/include
The headers.chk target in tools/Rules.mk tries to compile all headers
stand alone for testing them not to include any internal header.
Unfortunately the headers tested against are not complete, as any
header for a Xen library is not included in the include path of the
test compile run, resulting in a failure in case any of the tested
headers is including an official Xen library header.
Fix that by copying the official headers located in
tools/libs/*/include to tools/include.
In order to support libraries with header name other than xen<lib>.h
or with multiple headers add a LIBHEADER make variable a lib specific
Makefile can set in that case.
Move the headers.chk target from Rules.mk to libs.mk as it is used
for libraries in tools/libs only.
Add NO_HEADERS_CHK variable to skip checking headers as this will be
needed e.g. for libxenctrl.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: switch XEN_LIBXEN* make variables to lower case (XEN_libxen*)
In order to harmonize names of library related make variables switch
XEN_LIBXEN* names to XEN_libxen*, as all other related variables (e.g.
CFLAGS_libxen*, SHDEPS_libxen*, ...) already use this pattern.
Rename XEN_LIBXC to XEN_libxenctrl, XEN_XENSTORE to XEN_libxenstore,
XEN_XENLIGHT to XEN_libxenlight, XEN_XLUTIL to XEN_libxlutil, and
XEN_LIBVCHAN to XEN_libxenvchan for the same reason.
Introduce XEN_libxenguest with the same value as XEN_libxenctrl.
No functional change.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86: Begin to introduce support for MSR_ARCH_CAPS
... including serialisation/deserialisation logic and unit tests.
There is no current way to configure this MSR correctly for guests.
The toolstack side of this logic needs building, which is far easier to
do with it in place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 27 Aug 2020 10:48:38 +0000 (10:48 +0000)]
gitignore: ignore ebmalloc.c soft link
A previous commit split ebmalloc to its own translation unit but forgot
to modify gitignore.
Fixes: 8856a914bffd ("build: also check for empty .bss.* in .o -> .init.o conversion") Signed-off-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 11 Aug 2020 08:02:01 +0000 (09:02 +0100)]
xl: add 'mtu' option to network configuration
This patch adds code to parse a value for MTU from the network configuration
if it is present. The documentation in xl-network-configuration.5.pod is
also modified accordingly.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:02:00 +0000 (09:02 +0100)]
tools/hotplug/Linux: modify set_mtu() to optionally use a configured value...
...and also inform the frontend.
The set_mtu() function in xen-network-common.sh currently sets the backend
vif MTU to match that of the bridge.
A prior patch added code into libxl such that a tools-configured 'mtu' value
may be present in the xenstore backend area. If the node is present in
xenstore then it should be authoritative. Hence set_mtu() is modified to only
read the MTU of the bridge if it is not present.
The function is also modified to write whatever value it applies to the
backend vif into the xenstore frontend area where it may then be used to
configure the frontend network stack.
NOTE: There is also a small modification replacing '$mtu' with '${mtu}'
for style consistency.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:59 +0000 (09:01 +0100)]
libxl: wire the libxl_device_nic 'mtu' value into xenstore
Currently the 'mtu' field of libxl_device_nic objects is effectively ignored:
It is set by libxl__device_nic_setdefault() to a slightly odd default value of
1492 but otherwise ignored.
This patch changes the default value to a more conventional 1500 and modifies
libxl__set_xenstore_nic() to write the value into an 'mtu' node in the
xenstore backend area (if it is a non-default value), as well as a read-only
node of the same name in the frontend area.
The backend node is used to set the value of 'mtu' in
libxl__nic_from_xenstore(), when retrieving the configuration.
NOTE: There is currently no way to set a non-default value of 'mtu', hence
the backend node is never written. This, however, will be addressed
by a subsequent patch.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:58 +0000 (09:01 +0100)]
tools/hotplug/Linux: remove code duplication in vif-bridge
The 'add' and 'online' cases do exactly the same thing so have 'add' simply
fall through to 'online'.
NOTE: This patch also adds in the missing 'remove' case, which falls through
to 'offline'. (The former is passed for 'tap' devices, the latter for
'vif' devices).
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:57 +0000 (09:01 +0100)]
tools/hotplug/Linux: add remove_from_bridge()
This patch adds a remove_from_bridge() function into xen-network-common.sh
to partner with the existing add_to_bridge() function. The vif-bridge
script is then modified to use it.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:55 +0000 (09:01 +0100)]
public/io/netif: specify MTU override node
There is currently no documentation to state what MTU a frontend should
advertise to its network stack. It has however long been assumed that the
default value of 1500 is correct.
This patch specifies a mechanism to allow the tools to set the MTU via a
xenstore node in the frontend area and states that the absence of that node
means the frontend should assume an MTU of 1500 octets.
NOTE: The Windows PV frontend has used an MTU sampled from the xenstore
node specified in this patch.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wl@xen.org>
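A frontend following the rule above might resolve its MTU roughly as sketched below. This is only an illustration: frontend_mtu_sketch() and its argument (the string read from the xenstore node, or NULL when the node is absent) are hypothetical names, and falling back to 1500 for a malformed or out-of-range value is an assumption, not something the patch states.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_MTU 1500u   /* value assumed when the node is absent */

/* Hypothetical helper: node_value is the node's contents, or NULL. */
unsigned int frontend_mtu_sketch(const char *node_value)
{
    char *end;
    unsigned long val;

    if (node_value == NULL)
        return DEFAULT_MTU;

    val = strtoul(node_value, &end, 10);

    /* Assumption: treat unparsable or implausible values like absence. */
    if (end == node_value || *end != '\0' || val == 0 || val > 65535)
        return DEFAULT_MTU;

    return (unsigned int)val;
}
```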
libxl: fix -Werror=stringop-truncation in libxl__prepare_sockaddr_un
In file included from /usr/include/string.h:495,
from libxl_internal.h:38,
from libxl_utils.c:20:
In function 'strncpy',
inlined from 'libxl__prepare_sockaddr_un' at libxl_utils.c:1262:5:
/usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 108 equals destination size [-Werror=stringop-truncation]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
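The warning above fires because strncpy() with a bound equal to the destination size may leave the destination without a terminating NUL. One way to avoid that class of problem is sketched below; it is not the code from this commit. prepare_sockaddr_un_sketch() is a hypothetical stand-in that checks the length first and then copies the string together with its terminator.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper, not the libxl function: fill in *un for `path`.
 * Returns 0 on success, or -1 with errno set to ENAMETOOLONG when the
 * path plus its terminating NUL does not fit into sun_path.
 */
static int prepare_sockaddr_un_sketch(struct sockaddr_un *un, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof(un->sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    /* The length was checked above, so copy len bytes plus the NUL. */
    memcpy(un->sun_path, path, len + 1);

    return 0;
}
```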
It seems xlu_pci_parse_bdf has a state machine that is too complex for
gcc to understand. The build fails with:
libxlu_pci.c: In function 'xlu_pci_parse_bdf':
libxlu_pci.c:32:18: error: 'func' may be used uninitialized in this function [-Werror=maybe-uninitialized]
32 | pcidev->func = func;
| ~~~~~~~~~~~~~^~~~~~
libxlu_pci.c:51:29: note: 'func' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~~
libxlu_pci.c:31:17: error: 'dev' may be used uninitialized in this function [-Werror=maybe-uninitialized]
31 | pcidev->dev = dev;
| ~~~~~~~~~~~~^~~~~
libxlu_pci.c:51:24: note: 'dev' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
libxlu_pci.c:30:17: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
30 | pcidev->bus = bus;
| ~~~~~~~~~~~~^~~~~
libxlu_pci.c:51:19: note: 'bus' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
libxlu_pci.c:29:20: error: 'dom' may be used uninitialized in this function [-Werror=maybe-uninitialized]
29 | pcidev->domain = domain;
| ~~~~~~~~~~~~~~~^~~~~~~~
libxlu_pci.c:51:14: note: 'dom' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
cc1: all warnings being treated as errors
Work around it by setting the initial values to an invalid value (0xffffffff)
and then asserting on each value being set. This way we mute the gcc
warning, while still detecting bugs in the parse code.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
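The sentinel-plus-assert pattern described above can be sketched as follows. The parser here is a deliberately simplified, hypothetical stand-in (parse_bdf_sketch(), using sscanf() rather than the real state machine); only the idea of initialising to an invalid value and asserting afterwards is taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>

#define INVALID 0xffffffffU   /* must lie outside the valid range of every field */

struct bdf_sketch {
    unsigned int dom, bus, dev, func;
};

/* Hypothetical, much simplified parser for "DDDD:BB:DD.F" (hex fields). */
static int parse_bdf_sketch(const char *s, struct bdf_sketch *out)
{
    unsigned int dom = INVALID, bus = INVALID, dev = INVALID, func = INVALID;

    if (sscanf(s, "%x:%x:%x.%x", &dom, &bus, &dev, &func) != 4)
        return -1;

    /* The initialisers silence the compiler; the asserts still catch a
     * parser bug that would leave a field unset on the success path. */
    assert(dom != INVALID);
    assert(bus != INVALID);
    assert(dev != INVALID);
    assert(func != INVALID);

    out->dom = dom;
    out->bus = bus;
    out->dev = dev;
    out->func = func;

    return 0;
}
```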
Ian Jackson [Wed, 26 Aug 2020 14:47:19 +0000 (15:47 +0100)]
MAINTAINERS: Update my email address
I am changing my email address. (My affiliation to Citrix remains
unchanged.) See
https://xenbits.xen.org/people/iwj/2020/email-transition.txt
for a signed confirmation with full details.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 27 Aug 2020 07:51:07 +0000 (09:51 +0200)]
x86: don't build with EFI support in shim-exclusive mode
There's no need for xen.efi at all, and there's also no need for EFI
support in xen.gz since the shim runs in PVH mode, i.e. without any
firmware (and hence by implication also without EFI one).
The slightly odd looking use of $(space) is to ensure the new ifneq()
evaluates consistently between "build" and "install" invocations of
make.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 27 Aug 2020 07:46:55 +0000 (09:46 +0200)]
build: also check for empty .bss.* in .o -> .init.o conversion
We're gaining such sections, and like .text.* and .data.* they shouldn't
be present in objects subject to automatic to-init conversion. Oddly
enough for quite some time we did have an instance breaking this rule,
which gets fixed at this occasion, by breaking out the EFI boot
allocator functions into its own translation unit.
Fixes: c5b9805bc1f7 ("efi: create new early memory allocator") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Tue, 25 Aug 2020 15:47:27 +0000 (17:47 +0200)]
make better use of mfn local variable in free_heap_pages()
Besides the one use that there is in the function (of the value
calculated at function entry), there are two more places where the
redundant page-to-address conversion can be avoided.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 25 Aug 2020 15:46:27 +0000 (17:46 +0200)]
x86: don't maintain compat M2P when !PV32
It's effectively unused in this case (as well as when "pv=no-32").
While touching their definitions anyway, also adjust section placement
of m2p_compat_vstart and compat_idle_pg_table_l2. Similarly, while
putting init_xen_pae_l2_slots() inside #ifdef, also move it to a PV-only
source file.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 25 Aug 2020 15:43:52 +0000 (17:43 +0200)]
x86/EFI: sanitize build logic
With changes done over time and as far as linking goes, the only special
things about building with EFI support enabled are
- the need for the dummy relocations object (for xen.gz uniformly in all
build stages, for xen.efi in stage 1),
- the special efi/buildid.o file, which can't be made part of
efi/built_in.o, due to the extra linker options required for it.
All other efi/*.o can be consumed from the built_in*.o files.
In efi/Makefile, besides moving relocs-dummy.o to "extra", also properly
split between obj-y and obj-bin-y.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 24 Aug 2020 13:38:48 +0000 (15:38 +0200)]
x86/PV: also check kernel endianness when building Dom0
While big endian x86 images are pretty unlikely to appear, merely
logging endianness isn't of much use.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
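For illustration, an endianness check on an ELF image boils down to inspecting one byte of the identification array. The sketch below is not the Dom0 builder code; elf_is_little_endian_sketch() is a hypothetical name, while the offset and value are those defined by the ELF specification (EI_DATA and ELFDATA2LSB).

```c
#include <stdbool.h>

/* Offset and value as defined by the ELF specification. */
#define EI_DATA_OFFSET   5   /* index of the data-encoding byte in e_ident */
#define ELFDATA2LSB_VAL  1   /* two's complement, little endian */

/* Hypothetical helper: e_ident points at the first 16 bytes of the image. */
bool elf_is_little_endian_sketch(const unsigned char *e_ident)
{
    return e_ident[EI_DATA_OFFSET] == ELFDATA2LSB_VAL;
}
```

A builder that only accepts little-endian kernels would reject the image when this returns false, rather than merely logging the value.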
Jan Beulich [Mon, 24 Aug 2020 13:38:03 +0000 (15:38 +0200)]
x86: convert set_gpfn_from_mfn() to a function
It is already a little too heavy for a macro, and more logic is about to
get added to it.
This also allows reducing the scope of compat_machine_to_phys_mapping.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 24 Aug 2020 13:36:44 +0000 (15:36 +0200)]
x86/vpic: rename irq to pin in vpic_ioport_write
The irq variable is wrongly named, as it's used to store the pin on
the 8259 chip, but not the global irq value. While renaming, reduce
its scope and make it unsigned.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
vpic_elcr_mask wasn't using the v parameter, and instead worked
because in the context of the callers v would be vpic. Fix this by
correctly using the parameter. While there also remove the unneeded
casts to uint8_t and the ending semicolon.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
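The hazard being removed here, a macro that ignores its parameter and silently picks up a same-named variable at the call site, can be shown with a small sketch. Everything below is hypothetical (struct vpic_sketch, the macro names, and the mask values, which are only illustrative); it shows what would go wrong if a caller ever passed something other than the local vpic.

```c
struct vpic_sketch {
    int is_master;
};

/* Buggy form: `v` is ignored; whatever `vpic` is in scope gets used. */
#define ELCR_MASK_BUGGY(v) (vpic->is_master ? 0xf8 : 0xde)

/* Fixed form: uses the parameter, parenthesised, no trailing semicolon. */
#define ELCR_MASK_FIXED(v) ((v)->is_master ? 0xf8 : 0xde)

int mask_buggy(const struct vpic_sketch *vpic, const struct vpic_sketch *other)
{
    (void)other;                    /* the buggy macro never looks at it */
    return ELCR_MASK_BUGGY(other);  /* actually evaluates vpic->is_master */
}

int mask_fixed(const struct vpic_sketch *vpic, const struct vpic_sketch *other)
{
    (void)vpic;
    return ELCR_MASK_FIXED(other);  /* evaluates other->is_master */
}
```

As long as every caller passes a variable that happens to be named vpic, both forms behave identically, which matches the "no functional change" note.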
George Dunlap [Fri, 21 Aug 2020 14:32:01 +0000 (15:32 +0100)]
MAINTAINERS: Add Roger Pau Monné as x86 maintainer
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Enable CPU erratum of Speculative AT on the Neoverse N1 processor
versions r0p0 to r2p0.
Also Fix Cortex A76 Erratum string which had a wrong errata number.
Roger Pau Monne [Mon, 17 Aug 2020 15:57:52 +0000 (17:57 +0200)]
x86/pv: handle writes to the EFER MSR
Silently drop writes to the EFER MSR for PV guests if the value is
unchanged from what is being reported. Current PV Linux will attempt
to write to the MSR with the same value that's been read, and raising
a fault will result in a guest crash.
As part of this work introduce a helper to easily get the EFER value
reported to guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
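The policy described above reduces to a comparison against the value the guest would read back. The following is a minimal sketch under that assumption; pv_efer_write_ok_sketch() is a hypothetical name and not the helper introduced by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy check: returns true when the write may be silently
 * dropped (the guest wrote back what it read), false when the caller
 * should raise a fault as before.
 */
bool pv_efer_write_ok_sketch(uint64_t reported_efer, uint64_t new_val)
{
    return new_val == reported_efer;
}
```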
Edwin Török [Mon, 17 Aug 2020 18:45:47 +0000 (19:45 +0100)]
tools/ocaml/xenstored: drop select based socket watching
Poll has been the default since 2014, I think we can safely say by now
that poll() works and we don't need to fall back to select().
This will allow fixing up the way we call poll to be more efficient
(and pave the way for introducing epoll support):
currently poll wraps the select API, which is inefficient.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
xen/arm: cmpxchg: Add missing memory barriers in __cmpxchg_mb_timeout()
The function __cmpxchg_mb_timeout() was intended to have the same
semantics as __cmpxchg_mb(). Unfortunately, the memory barriers were
not added when first implemented.
There is no known issue with the existing callers, but the barriers are
added given this is the expected semantics in Xen.
The issue was introduced by XSA-295.
Backport: 4.8+ Fixes: 86b0bc958373 ("xen/arm: cmpxchg: Provide a new helper that can timeout") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
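The intended semantics, a bounded compare-and-exchange with a full barrier on either side, can be approximated in portable C11 as sketched below. This is not the Arm implementation from Xen: cmpxchg_mb_timeout_sketch(), its signature, and its return convention (true only if the swap happened) are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: try up to max_try times to replace *old with
 * `desired`. On a genuine mismatch, *old receives the observed value.
 */
bool cmpxchg_mb_timeout_sketch(atomic_uint *ptr, unsigned int *old,
                               unsigned int desired, unsigned int max_try)
{
    bool swapped = false;
    unsigned int i;

    atomic_thread_fence(memory_order_seq_cst);   /* barrier before */

    for (i = 0; i < max_try && !swapped; i++) {
        unsigned int expected = *old;

        /* The weak form may fail spuriously, hence the bounded retry. */
        swapped = atomic_compare_exchange_weak_explicit(
            ptr, &expected, desired,
            memory_order_relaxed, memory_order_relaxed);

        if (!swapped && expected != *old) {
            *old = expected;   /* genuine mismatch: report and stop */
            break;
        }
    }

    atomic_thread_fence(memory_order_seq_cst);   /* barrier after */

    return swapped;
}
```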
Roger Pau Monné [Wed, 12 Aug 2020 12:47:05 +0000 (14:47 +0200)]
x86/hvm: change EOI exit bitmap helper parameter
Change the last parameter of the update_eoi_exit_bitmap helper to be a
set/clear boolean instead of a triggering field. This is already
in line with how the function is implemented, and will allow deciding
whether an exit is required by the higher layers that call into
update_eoi_exit_bitmap. Note that the current behavior is not changed
by this patch.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Don Slutz [Sun, 9 Aug 2020 18:22:34 +0000 (14:22 -0400)]
rpmball: Adjust to new rpm, do not require --force
Also prevent warning: directory /boot: remove failed
Before:
[root@TestCloud1 xen]# rpm -hiv dist/xen*rpm
Preparing... ################################# [100%]
file /boot from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/bin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib64 from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/sbin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
[root@TestCloud1 xen]# rpm -e xen
warning: directory /boot: remove failed: Device or resource busy
Paul Durrant [Tue, 4 Aug 2020 13:41:59 +0000 (14:41 +0100)]
x86/iommu: convert AMD IOMMU code to use new page table allocator
This patch converts the AMD IOMMU code to use the new page table allocator
function. This allows all the free-ing code to be removed (since it is now
handled by the general x86 code) which reduces TLB and cache thrashing as well
as shortening the code.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 4 Aug 2020 13:41:57 +0000 (14:41 +0100)]
x86/iommu: add common page-table allocator
Instead of having separate page table allocation functions in VT-d and AMD
IOMMU code, we could use a common allocation function in the general x86 code.
This patch adds a new allocation function, iommu_alloc_pgtable(), for this
purpose. The function adds the page table pages to a list. The pages in this
list are then freed by iommu_free_pgtables(), which is called by
domain_relinquish_resources() after PCI devices have been de-assigned.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
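The allocate-onto-a-list, free-everything-at-teardown scheme described above can be sketched in plain C as follows. All names here are hypothetical (struct domain_iommu_sketch, alloc_pgtable_sketch(), free_pgtables_sketch()), and malloc()/calloc() stand in for the hypervisor's page allocator; this is not the Xen code.

```c
#include <stdlib.h>

#define PAGE_SIZE_SKETCH 4096

struct pgtable_sketch {
    struct pgtable_sketch *next;
    void *page;
};

struct domain_iommu_sketch {
    struct pgtable_sketch *pgtables;   /* every table allocated so far */
    unsigned int count;
};

/* Allocate a zeroed page table and remember it on the per-domain list. */
void *alloc_pgtable_sketch(struct domain_iommu_sketch *d)
{
    struct pgtable_sketch *pt = malloc(sizeof(*pt));

    if (pt == NULL)
        return NULL;

    pt->page = calloc(1, PAGE_SIZE_SKETCH);
    if (pt->page == NULL) {
        free(pt);
        return NULL;
    }

    pt->next = d->pgtables;
    d->pgtables = pt;
    d->count++;

    return pt->page;
}

/* Teardown: free every table in one pass; no per-table free calls needed. */
void free_pgtables_sketch(struct domain_iommu_sketch *d)
{
    while (d->pgtables != NULL) {
        struct pgtable_sketch *pt = d->pgtables;

        d->pgtables = pt->next;
        free(pt->page);
        free(pt);
        d->count--;
    }
}
```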
Paul Durrant [Tue, 4 Aug 2020 13:41:56 +0000 (14:41 +0100)]
x86/iommu: re-arrange arch_iommu to separate common fields...
... from those specific to VT-d or AMD IOMMU, and put the latter in a union.
There is no functional change in this patch, although the initialization of
the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
it is now done (correctly) in VT-d specific code rather than in general x86
code.
NOTE: I have not combined the AMD IOMMU 'root_table' and VT-d 'pgd_maddr'
fields even though they perform essentially the same function. The
concept of 'root table' in the VT-d code is different from that in the
AMD code so attempting to use a common name will probably only serve
to confuse the reader.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
David Woodhouse [Thu, 19 Mar 2020 20:40:24 +0000 (20:40 +0000)]
tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
The do_ls() function has somewhat inconsistent handling of errors.
If reading the node's contents with xs_read() fails, then do_ls() will
just quietly not display the contents.
If reading the node's permissions with xs_get_permissions() fails, then
do_ls() will print a warning, continue, and ultimately won't exit with
an error code (unless another error happens).
If recursing into the node with xs_directory() fails, then do_ls() will
abort immediately, not printing any further nodes.
For persistent failure modes — such as ENOENT because a node has been
removed, or EACCES because it has had its permissions changed since the
xs_directory() on the parent directory returned its name — it's
obviously quite likely that if either of the first two errors occur for
a given node, then so will the third and thus xenstore-ls will abort.
The ENOENT one is actually a fairly common case, and has caused tools to
fail to clean up a network device because it *apparently* already
doesn't exist in xenstore.
There is a school of thought that says, "Well, xenstore-ls returned an
error. So the tools should not trust its output."
The natural corollary of this would surely be that the tools must re-run
xenstore-ls as many times as is necessary until it manages to exit
without hitting the race condition. I am not keen on that conclusion.
For the specific case of ENOENT it seems reasonable to declare that,
but for the timing, we might as well just not have seen that node at
all when calling xs_directory() for the parent. By ignoring the error,
we give acceptable output.
The issue can be reproduced as follows:
(dom0) # for a in `seq 1 1000` ; do
xenstore-write /local/domain/2/foo/$a $a ;
done
Now simultaneously:
(dom0) # for a in `seq 1 999` ; do
xenstore-rm /local/domain/2/foo/$a ;
done
(dom2) # while true ; do
./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ;
done
We should expect to see node 1000 in the output, every time.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
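The error policy argued for above (ENOENT while recursing is treated as if the node had never been listed, while other errors still abort) can be sketched as follows. The code does not use the real xenstore API: fake_list() is a stand-in for the directory read and visit_node_sketch() is a hypothetical name.

```c
#include <errno.h>
#include <string.h>

/* Stand-in for a directory read: 0 on success, -1 with errno set. */
static int fake_list(const char *path)
{
    if (strcmp(path, "gone") == 0) {
        errno = ENOENT;
        return -1;
    }
    if (strcmp(path, "denied") == 0) {
        errno = EACCES;
        return -1;
    }
    return 0;
}

/* Returns 0 to keep walking the tree, -1 to abort the listing. */
int visit_node_sketch(const char *path)
{
    if (fake_list(path) < 0) {
        if (errno == ENOENT)
            return 0;   /* node vanished: act as if it was never listed */
        return -1;      /* any other failure is still fatal */
    }

    return 0;
}
```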
Paul Durrant [Thu, 13 Aug 2020 10:35:53 +0000 (11:35 +0100)]
x86/viridian: remove the viridian_vcpu msg_pending bit mask
The mask does not actually serve a useful purpose as we only use the SynIC
for timer messages. Dropping the mask means that the EOM MSR handler
essentially becomes a no-op. This means we can avoid setting 'message_pending'
for timer messages and hence avoid a VMEXIT for the EOM.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Trammell Hudson [Wed, 12 Aug 2020 17:42:48 +0000 (17:42 +0000)]
x86/setup: Ignore early boot parameters like no-real-mode
There are parameters in xen/arch/x86/boot/cmdline.c that
are only used early in the boot process, so handlers are
necessary to avoid an "Unknown command line option" in
dmesg.
This also updates ignore_param() to generate a temporary
variable name so that the macro can be used more than once
per file.
Signed-off-by: Trammell hudson <hudson@trmm.net> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Leave note to stop TEMP_NAME() finding more general use] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
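One common way to let a declaring macro be used several times per file is to paste __LINE__ into the identifier it defines. The sketch below illustrates that idea only; the macro names, struct ignored_param_sketch, and the second option string are hypothetical, and the real ignore_param()/TEMP_NAME() definitions may differ.

```c
/* Two-level paste so that __LINE__ is expanded before concatenation. */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b)  CONCAT_(a, b)
#define UNIQUE_NAME_SKETCH(prefix) CONCAT(prefix, __LINE__)

/* Two-level stringify, used below only to make the names visible. */
#define STR_(x) #x
#define STR(x)  STR_(x)

struct ignored_param_sketch {
    const char *name;
};

/* Each use defines a differently named object, provided the uses sit on
 * different source lines. */
#define IGNORE_PARAM_SKETCH(n) \
    const struct ignored_param_sketch UNIQUE_NAME_SKETCH(ignored_) = { n }

IGNORE_PARAM_SKETCH("no-real-mode");
IGNORE_PARAM_SKETCH("another-early-option");   /* placeholder option name */

/* For demonstration only: the identifiers generated on these two lines. */
const char *generated_a = STR(UNIQUE_NAME_SKETCH(ignored_));
const char *generated_b = STR(UNIQUE_NAME_SKETCH(ignored_));
```

Names built this way collide if the macro is used twice on one line or at the same line number of two headers included into one file, which is one reason to keep such a helper from finding more general use.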
Jan Beulich [Fri, 7 Aug 2020 11:32:11 +0000 (13:32 +0200)]
x86/EFI: sanitize build logic
With changes done over time and as far as linking goes, the only special
thing about building with EFI support enabled is the need for the dummy
relocations object for xen.gz uniformly in all build stages. All other
efi/*.o can be consumed from the built_in*.o files.
In efi/Makefile, besides moving relocs-dummy.o to "extra", also properly
split between obj-y and obj-bin-y.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:14:02 +0000 (13:14 +0200)]
x86: slightly re-arrange 32-bit handling in dom0_construct_pv()
Add #ifdef-s (the 2nd one will be needed in particular, to guard the
uses of m2p_compat_vstart and HYPERVISOR_COMPAT_VIRT_START()) and fold
duplicate uses of elf_32bit().
Also adjust what gets logged: Avoid "compat32" when support isn't built
in, and don't assume ELF class <> ELFCLASS64 means ELFCLASS32.
While doing this, in code getting touched anyway:
- use ROUNDUP() instead of open-coding it,
- drop a stale (dead) BUG_ON(),
- replace panic() by printk() plus error return, for being consistent
with other code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The originally used sed expression converted not just multiple leading
zeroes (as intended), but also trailing ones, rendering the error
message somewhat confusing. Collapse zeroes in just the one place where
we need them collapsed, and leave objdump's output as is for all other
purposes.
Fixes: 48115d14743e ("Move more kernel decompression bits to .init.* sections") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:12:00 +0000 (13:12 +0200)]
build: work around bash issue
Older bash (observed with 3.2.57(2)) fails to honor "set -e" for certain
built-in commands ("while" here), despite the command's status correctly
being non-zero. The subsequent objcopy invocation now being separated by
a semicolon results in no failure. Insert an explicit "exit" (replacing
; by && ought to be another possible workaround).
Fixes: e321576f4047 ("xen/build: start using if_changed") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
during boot. The units on the first line are Hz, not MHz, so correct that and
add a space for clarity.
Also, for the min/max line, use three dots instead of two and add more spaces
so that the line can't be mistaken for being a double decimal point typo.
Andrew Cooper [Wed, 5 Aug 2020 11:05:27 +0000 (12:05 +0100)]
x86/ioapic: Fix fixmap error path logic in ioapic_init_mappings()
In the case that bad_ioapic_register() fails, the current position of idx++
means that clear_fixmap(idx) will be called with the wrong index, and not
clean up the mapping just created.
Increment idx as part of the loop, rather than midway through the loop body.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
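The shape of the fix, advancing the index in the loop header so that the error path still refers to the slot just mapped, can be sketched as follows. The names are hypothetical stand-ins (set_slot(), clear_slot(), init_mappings_sketch()) and a boolean array replaces the fixmap; this is not the ioapic code itself.

```c
#include <stdbool.h>

#define NR_SLOTS 4

bool mapped[NR_SLOTS];   /* stand-in for fixmap slots */

static void set_slot(unsigned int idx)   { mapped[idx] = true; }
static void clear_slot(unsigned int idx) { mapped[idx] = false; }

/* Map one slot per entry, undoing the mapping for entries that fail
 * their check. Returns the number of entries kept. */
unsigned int init_mappings_sketch(const bool *bad, unsigned int n)
{
    unsigned int i, idx, kept = 0;

    /* idx advances in the loop header, never midway through the body. */
    for (i = 0, idx = 0; i < n && idx < NR_SLOTS; i++, idx++) {
        set_slot(idx);

        if (bad[i]) {
            clear_slot(idx);   /* idx still names the slot just mapped */
            continue;
        }

        kept++;
    }

    return kept;
}
```

Had idx been incremented between set_slot() and the check, clear_slot() would have been handed the next, still unmapped, slot.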
Jan Beulich [Wed, 5 Aug 2020 08:30:18 +0000 (10:30 +0200)]
x86emul: correct AVX512_BF16 insn names in EVEX Disp8 test
The leading 'v' ought to be omitted from the table entries.
Fixes: 7ff66809ccd5 ("x86emul: support AVX512_BF16 insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:29:18 +0000 (10:29 +0200)]
x86emul: AVX512PF insns aren't memory accesses
These are prefetches, so should be treated just like other prefetches.
Fixes: 467e91bde720 ("x86emul: support AVX512PF insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:28:40 +0000 (10:28 +0200)]
x86emul: AVX512F scatter insns are memory writes
While the custom handling renders the "to_mem" field generally unused,
x86_insn_is_mem_write() still (indirectly) consumes that information,
and hence the table entries want to be correct.
Fixes: 7d569b848036 ("x86emul: support AVX512F scatter insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:28:01 +0000 (10:28 +0200)]
x86emul: AVX512{F,BW} down conversion moves are memory writes
For this to be properly reported, the case labels need to move to a
different switch() block.
Fixes: 30e0bdf79828 ("x86emul: support AVX512{F,BW} down conversion moves") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:26:11 +0000 (10:26 +0200)]
x86emul: adjustments to mem access / write logic testing
The combination of specifying a ModR/M byte with the upper two bits set
and the modrm field set to T is pointless - the same test will be
executed twice, i.e. overall things will be slower for no extra gain. I
can only assume this was a copy-and-paste-without-enough-editing mistake
of mine.
Furthermore adjust the base type of a few bit fields to shrink table
size, as subsequently quite a few new entries will get added to the
tables using this type.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:20:59 +0000 (10:20 +0200)]
x86emul: further FPU env testing relaxation for AMD-like CPUs
See the code comment that's being extended. Additionally a few more
zap_fpsel() invocations are needed - whenever we stored state after
there potentially having been a context switch behind our backs.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Aug 2020 08:06:32 +0000 (10:06 +0200)]
libxl: avoid golang building without CONFIG_GOLANG=y
While this doesn't address the real problem I've run into (attempting to
update r/o source files), not recursing into tools/golang/xenlight/ is
enough to fix the build for me for the moment. I don't currently see why 60db5da62ac0 ("libxl: Generate golang bindings in libxl Makefile") found
it necessary to invoke this build step unconditionally.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Mon, 3 Aug 2020 14:27:22 +0000 (16:27 +0200)]
x86emul: avoid assembler warning about .type not taking effect in test harness
gcc re-orders top level blocks by default when optimizing. This
re-ordering results in all our .type directives getting emitted to the
assembly file first, followed by gcc's. The assembler warns about
attempts to change the type of a symbol when it was already set (and
when there's no intervening setting to "notype").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Fri, 31 Jul 2020 15:43:31 +0000 (17:43 +0200)]
x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()
Re-factor the code to take advantage of the fact that the APIC access page is
a 'special' page. The VMX code is left alone and hence the APIC access page is
still inserted into the P2M with type p2m_mmio_direct. This is left alone as it
is not obvious there is another suitable type to use, and the necessary
re-ordering in epte_get_entry_emt() is straightforward.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 31 Jul 2020 15:42:47 +0000 (17:42 +0200)]
x86/hvm: set 'ipat' in EPT for special pages
All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
when PV drivers running in a guest populate the BAR space of the Xen Platform
PCI Device with pages such as the Shared Info page or Grant Table pages,
accesses to these pages will be cachable.
However, should IOMMU mappings be enabled for the guest then these
accesses become uncachable. This has a substantial negative effect on I/O
throughput of PV devices. Arguably PV drivers should not be using BAR space to
host the Shared Info and Grant Table pages but it is currently commonplace for
them to do this and so this problem needs mitigation. Hence this patch makes
sure the 'ipat' bit is set for any special page regardless of where in GFN
space it is mapped.
NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
that there is any similar mitigation possible for AMD NPT. Downstreams
such as Citrix XenServer have been carrying a patch similar to this for
several releases though.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
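The effect described above can be sketched in C. This is only an illustration of the rule "special pages always get write-back with 'ipat' set", not the code in epte_get_entry_emt(). The helper name ept_cache_bits(), its parameters and the EPT_* macros are assumptions made for the example; the bit positions (memory type in bits 5:3, 'ipat' in bit 6) follow the Intel EPT leaf entry layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE 0u
#define MTRR_TYPE_WRBACK     6u

/* Hypothetical names for the EPT leaf entry fields used below. */
#define EPT_EMT_SHIFT 3          /* memory type lives in bits 5:3 */
#define EPT_IPAT_BIT  (1u << 6)  /* "ignore PAT" lives in bit 6 */

/*
 * Hypothetical helper: compute the cacheability bits of an EPT entry.
 * For a special page the result is forced to write-back with 'ipat' set,
 * regardless of what the fallback (e.g. IOMMU-influenced) choice would be.
 */
static uint64_t ept_cache_bits(bool is_special, unsigned int fallback_emt,
                               bool fallback_ipat)
{
    unsigned int emt = is_special ? MTRR_TYPE_WRBACK : fallback_emt;
    bool ipat = is_special ? true : fallback_ipat;

    return ((uint64_t)emt << EPT_EMT_SHIFT) | (ipat ? EPT_IPAT_BIT : 0);
}
```

With this shape, a special page that would otherwise have ended up uncachable still yields the write-back encoding with 'ipat' set.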
Jan Beulich [Fri, 31 Jul 2020 15:41:58 +0000 (17:41 +0200)]
x86emul: replace UB shifts
Displacement values can be negative, hence we shouldn't left-shift them.
Or else we get
(XEN) UBSAN: Undefined behaviour in x86_emulate/x86_emulate.c:3482:55
(XEN) left shift of negative value -2
While auditing shifts, I noticed a pair of missing parentheses, which
also get added right here.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
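As an illustration of the class of fix (not the actual change to x86_emulate.c), a negative displacement can be scaled by multiplication instead of a left shift. The helper name scale_disp() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a possibly negative displacement by 2^shift.
 * "disp << shift" is undefined behaviour in C when disp is negative, so
 * the scaling is done by multiplying with a non-negative power of two.
 */
static int64_t scale_disp(int32_t disp, unsigned int shift)
{
    /* Keep the scale factor small enough that the product fits in 64 bits. */
    assert(shift < 31);

    /* Only the non-negative constant 1 is shifted; disp is multiplied. */
    return (int64_t)disp * ((int64_t)1 << shift);
}
```

Calling scale_disp(-2, 3) is well defined, whereas the expression -2 << 3 is what UBSAN flags.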
Jan Beulich [Fri, 31 Jul 2020 15:40:13 +0000 (17:40 +0200)]
x86/PV: drop a few misleading paging_mode_refcounts() checks
The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
was apparently inconsistent so far: There was a type ref acquired
unconditionally for the new top level page table, but the dropping of
the old type ref was conditional upon !paging_mode_refcounts(). Mirror
this also to arch_set_info_guest().
Also move new_guest_cr3()'s #ifdef to around the function - both callers
now get built only when CONFIG_PV, i.e. no need to retain a stub.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jun 2020 16:46:38 +0000 (17:46 +0100)]
tools/configure: drop BASH configure variable
This is a weird variable to have in the first place. The only user of it is
XSM's CONFIG_SHELL, which opencodes a fallback to sh. The scripts are shebang
sh, which is already necessary to support non-Linux build environments.
Make the mkflask.sh and mkaccess_vector.sh scripts executable, drop the
CONFIG_SHELL, and drop the $BASH variable to prevent further use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/spinlock: move debug helpers inside the locked regions
Debug helpers such as lock profiling or the invariant pCPU assertions
must strictly be performed inside the exclusive locked region, or else
races might happen.
Note the issue was not strictly introduced by the pointed commit in
the Fixes tag, since lock stats were already incremented before the
barrier, but that commit made it more apparent as manipulating the cpu
field could happen outside of the locked regions and thus trigger the
BUG_ON on rel_lock(). This is only enabled on debug builds, and thus
releases are not affected.
Fixes: 80cba391a35 ('spinlocks: in debug builds store cpu holding the lock') Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
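The ordering requirement can be sketched with a toy C11 spinlock. This is not Xen's spinlock implementation; struct dbg_lock, its owner_cpu field and the dbg_lock_*() functions are hypothetical names chosen for the example. The point is only that the debug owner field is touched strictly between acquiring and releasing the lock.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Toy lock with a debug-only owner field, protected by the lock itself. */
struct dbg_lock {
    atomic_flag locked;
    int owner_cpu;
};

#define DBG_LOCK_INIT { ATOMIC_FLAG_INIT, NO_OWNER }

static void dbg_lock_acquire(struct dbg_lock *l, int cpu)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin until the lock is ours */

    /* Debug bookkeeping only after the lock is held. */
    assert(l->owner_cpu == NO_OWNER);
    l->owner_cpu = cpu;
}

static void dbg_lock_release(struct dbg_lock *l, int cpu)
{
    /* Check and clear the owner before the lock is dropped. */
    assert(l->owner_cpu == cpu);
    l->owner_cpu = NO_OWNER;

    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Were the owner field cleared after the releasing store, another CPU could take the lock and set the field in between, which is the kind of race the commit describes.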
Andrew Cooper [Fri, 20 Jul 2018 17:22:25 +0000 (17:22 +0000)]
x86/hvm: Clean up track_dirty_vram() calltree
* Rename nr to nr_frames. A plain 'nr' is confusing to follow in the
lower levels.
* Use DIV_ROUND_UP() rather than opencoding it in several different ways
* The hypercall input is capped at uint32_t, so there is no need for
nr_frames to be unsigned long in the lower levels.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
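A minimal sketch of the DIV_ROUND_UP() idiom referred to above, using the conventional definition of the macro. The helper dirty_bitmap_bytes() is hypothetical and only illustrates replacing an open-coded round-up with the macro while keeping nr_frames as a uint32_t.

```c
#include <stdint.h>

/* Conventional definition: integer division rounding towards +infinity. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: number of bytes needed for a bitmap holding one
 * bit per frame. The widening cast keeps "n + d - 1" from wrapping when
 * nr_frames is close to UINT32_MAX.
 */
static uint32_t dirty_bitmap_bytes(uint32_t nr_frames)
{
    return (uint32_t)DIV_ROUND_UP((uint64_t)nr_frames, 8);
}
```

Open-coded variants such as (nr + 7) / 8 or (nr + 7) >> 3 compute the same value; using the macro makes the intent uniform across call sites.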