I misread the upstream documentation. Adding max: for x86 is actually
necessary to not have it "attempt to allocate enough page tables to
cover all of *host* RAM which can exhaust its actual memory allocation".
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Ian Jackson [Fri, 22 Feb 2019 12:26:31 +0000 (12:26 +0000)]
xencommons xenstored startup: Use --pid-file $F not --pid-file=$F
The ocaml libraries in stretch understands only the variant
with separate arguments. So a build of this package for stretch
produces an error message and C xenstored instead of oxenstored.
Fix this here even in the buster branch since it makes backporting
easier.
Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Ian Jackson [Thu, 21 Feb 2019 17:34:39 +0000 (17:34 +0000)]
xendomains init script; Add should-dependency on nfs-kernel-server
Perhaps the guests use the hosts's NFS server, as in the case reported
in #826871.
In any case, it seems much more likely that on a system which is an
NFS server and also a Xen host, that the guests depend on the NFS,
rather than vice versa.
The obvious exception would be a driver domain. Driver domains
probably need to be handled separately anyway, since they must start
before other domains. (And anyway they are not particularly
well-handled by the current packaging.)
Closes: #826871 Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
xen init script: don't fail when being run in domU
When installing xen-utils-V in a driver domain domU, it drags in
xen-utils-common, which also contains the init script for xenstored and
xenconsoled.
Installing the package will fail right away, because it exits non-zero
after checking whether it's running in a xen dom0 or not:
systemd[1]: Starting LSB: Xen daemons...
xen[7215]: Starting Xen daemons: (warning).
systemd[1]: xen.service: Control process exited, code=exited, status=255/EXCEPTION
systemd[1]: xen.service: Failed with result 'exit-code'.
systemd[1]: Failed to start LSB: Xen daemons.
dpkg: error processing package xen-utils-common (--configure):
installed xen-utils-common package post-installation script subprocess returned error exit status 1
Since there's nothing to be fixed, there should not be a warning. It's
totally fine to skip xenstored, xenconsoled and qemu steps in this case,
so just exit cleanly.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922033 Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Hans van Kranenburg <hans@knorrie.org> Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Ok, I finally tried this, and it's really great that I now have correct
(different) linux kernel options generated for just linux and xen+dom0.
Do more cosmetics and reorganizing to make this wall of comments better
parseable by the eye. It was not obvious to me where sections started
and ended, and I like to have first the explanation and then at the end
the actual variables you can set.
Signed-off-by: Hans van Kranenburg <hans@knorrie.org> Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
...and not as xl.sh. All other files in that completions directory also
don't have .sh. This is just cosmetics, but it annoys me. And we have
dh-exec in here now anyway. :]
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
We're adding oxenstored, and we want use it by default in Xen 4.11.
When doing a Debian upgrade from Stretch to Buster, the xen-utils-common
package will be upgraded to the new version, but it still needs to
support running the Xen 4.8 hypervisor and utils, because the user might
not have rebooted yet, or might boot into the 4.8 hypervisor again
because there were troubles running 4.11.
So, this means that oxenstored might or might not be available, and we
have to deal with that.
See comments in the code for more explanation about the new program
flow.
Also...
* Allow the user to explicitly configure a xenstored binary in
/etc/default/xen.
* Use if statements rather than constructs using || and && to make the
program flow a bit easier to understand.
* Remove the confuscating madness of having 1 as a success return code.
* Don't print the xenstored progress message if we're not touching it.
Signed-off-by: Hans van Kranenburg <hans@knorrie.org> Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
This file has been in the package before, it contained TOOLSTACK= to
switch between xm and xl. It was abandoned and never cleaned up, so old
installs still have this file.
Ship it again.
Signed-off-by: Hans van Kranenburg <hans@knorrie.org> Acked-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Ian Jackson [Fri, 1 Feb 2019 16:49:33 +0000 (16:49 +0000)]
oxenstored: Build it
* Add the relevant build dependencies
ocaml-native-compilers is good on stretch because it
will get us better output code. In buster the
ocaml-native-compilers package is merged into ocaml-nox.
In bullseye we can drop ocaml-native-compilers from the list.
* Drop the rules line that disables the ocaml build.
* Ship /etc/xen/oxenstored.conf.
* Placate dh_missing about ocaml development libraries.
Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
[add trailing comma, fix typo, change bulleseye line] Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Ian Jackson [Thu, 7 Feb 2019 16:07:03 +0000 (16:07 +0000)]
xen init script: Tidy up wrong/missing Xen version error handling
We no longer want to discard the stderr from xen-dir, and treat this
as a success. All the reasons why this failure might previously have
been thought tolerable have been dealt with.
Specifically, we will no longer reach this code if we are not running
under Xen, or if we ran this init script on behalf of a xen-utils-V
package for some V different to the running Xen version.
We know we are running under Xen, and that either we're running not as
a result of a maint script, or as a result of a xen-utils-V maint
script for the running Xen version, or as a result of some other maint
script (of which we don't think there are any, but it presumably
expects this code to work).
So if xen-dir fails, let it print its error message, and also exit
nonzero. And don't mention not running under Xen in our
log_warning_msg.
Signed-off-by: Ian Jackson <ian.jackson@citrix.com> Acked-by: Hans van Kranenburg <hans@knorrie.org>
Ian Jackson [Thu, 7 Feb 2019 15:56:49 +0000 (15:56 +0000)]
xen init script: Do nothing if running for wrong Xen package
See the big comment. We think that this is responsible for various
bugs and, particularly, reports of mysteriously missing xenconsoled.
For example, this bug would mean that after a Xen version upgrade,
autoremoval of an obsolete xen-utils-V package would stop the running
xenconsoled. This is obviously awkward to track down, and could occur
many weeks or months after the upgrade.
Closes: #851654 Signed-off-by: Ian Jackson <ian.jackson@citrix.com> Acked-by: Hans van Kranenburg <hans@knorrie.org>
Ian Jackson [Thu, 7 Feb 2019 15:24:06 +0000 (15:24 +0000)]
xen version/upgrade handling: Improve an error message
When xen-dir cannot find xen-utils, mention that this might be because
xen-utils-<RUNNING-XEN-VERSION> was already removed.
This is generally helpful, but it does not solve the `missing
xenconsoled' problems because 1. it only changes messages and
2. actually in the init script, the error message is currently
discarded anyway (!)
But, anyway, it is an improvemennt.
Signed-off-by: Ian Jackson <ian.jackson@citrix.com> Acked-by: Hans van Kranenburg <hans@knorrie.org>
Lintian is complaining: missing-build-dependency-for-dh_-command
"The source package appears to be using a dh_ command but doesn't build
depend on the package that actually provides it. If it uses it, it must
build depend on it."
This part of commit ea2334dfe0 was left behind after redoing the
packaging and getting all libraries in the right place. The build now
complains about it:
dpkg-gensymbols: warning: some libraries disappeared in the symbols
file: libxentoolcore-4.10.so.1
dpkg-gensymbols: warning: debian/libxenstore3.0/DEBIAN/symbols doesn't
match completely debian/libxenstore3.0.symbols
In the theoretical case that xenstore-utils gets upgraded, when
upgrading from Stretch to Buster, and then deliberately gets downgraded
again by the user, a few manual page files could be removed.
In a normal sane upgrade scenario this would never happen.
As pointed out by Gergely in debian bug #919758, the examples in the
grub documentation contain incorrect suggestions, at least for dom0_mem
and earlyprintk.
Correct those, and take the opportunity to refresh all of this a bit,
including the most common set of used options.
Also, point to the online documentation where more explanation about all
options can be found.
Wei Liu [Tue, 29 Jan 2019 11:37:59 +0000 (11:37 +0000)]
libxl: correctly dispose of dominfo list in libxl_name_to_domid
Tamas reported ssid_label was leaked. Use the designated function to
free dominfo list to fix the leakage.
Reported-by: Tamas K Lengyel <tamas@tklengyel.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
(cherry picked from commit f50dd67950ca9d5a517501af10de7c8d88d1a188)
Juergen Gross [Thu, 17 Jan 2019 16:40:59 +0000 (16:40 +0000)]
libxl: don't set gnttab limits in soft reset case
In case of soft reset the gnttab limit setting will fail, so omit it.
Setting of max vcpu count is pointless in this case, too, so we can
drop that as well.
Without this patch soft reset will fail with:
libxl: error: libxl_dom.c:363:libxl__build_pre: Couldn't set grant table limits
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 1 Feb 2019 10:36:17 +0000 (11:36 +0100)]
x86/hvm: Fix bit checking for CR4 and MSR_EFER
Before the cpuid_policy logic came along, %cr4/EFER auditing on migrate-in was
complicated, because at that point no CPUID information had been set for the
guest. Auditing against the host CPUID was better than nothing, but not
ideal.
Similarly at the time, PVHv1 lacked the "CPUID passed through from hardware"
behaviour with PV guests had, and PVH dom0 had to be special-cased to be able
to boot.
Order of information in the migration stream is still an issue (hence we still
need to keep the restore parameter to cope with a nested virt corner case for
%cr4), but since Xen 4.9, all domains start with a suitable CPUID policy,
which is a more appropriate upper bound than host_cpuid_policy.
Finally, reposition the UMIP logic as it is the only row out of order.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
master commit: 9d8c1d1814b744d0fb41085463db5d8ae025607e
master date: 2019-01-29 11:28:11 +0000
Jan Beulich [Fri, 1 Feb 2019 10:35:41 +0000 (11:35 +0100)]
x86/AMD: flush TLB after ucode update
The increased number of messages (spec_ctrl.c:print_details()) within a
certain time window made me notice some slowness of boot time screen
output. Experimentally I've narrowed the time window to be from
immediately after the early ucode update on the BSP to the PAT write in
cpu_init(), which upon further investigation has an effect because of
the full TLB flush that's implied by that write.
For that reason, as a workaround, flush the TLB of the mapping of the
page that holds the blob. Note that flushing just a single page is
sufficient: As per verify_patch_size() patch size can't exceed 4k, and
the way xmalloc() works the blob can't be crossing a page boundary.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Brian Woods <brian.woods@amd.com>
master commit: f19a199281a23725beb73bef61eb8964d8e225ce
master date: 2019-01-28 17:40:39 +0100
Andrew Cooper [Fri, 1 Feb 2019 10:34:35 +0000 (11:34 +0100)]
xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
When the command line parsing was updated to use const strings and no longer
tokenise with NUL characters, string matches could no longer be made with
strcmp().
Unfortunately, the replacement was buggy. strncmp(s, "opt", ss - s) matches
"o", "op" and "opt" on the command line, as ss - s may be shorter than the
passed literal. Furthermore, parse_bool() is affected by this, so substrings
such as "d", "e" and "o" are considered valid, with the latter being ambiguous
between "on" and "off".
Introduce a new strcmp-like function for the task, which looks for exact
string matches, but declares success when the NUL of the literal matches a
comma, colon or semicolon in the command line fragment.
No change to the intended parsing functionality, but fixes cases where a
partial string on the command line will inadvertently trigger options.
A few areas were more than just a trivial change:
* parse_irq_vector_map_param() gained some style corrections.
* parse_vpmu_params() was rewritten to use the normal list-of-options form,
rather than just fixing up parse_vpmu_param() and leaving the parsing being
hard to follow.
* Instead of making the trivial fix of adding an explicit length check in
parse_bool(), use the length to select which token to we search for, which
is more efficient than the previous linear search over all possible tokens.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
master commit: 2ddf7e3e341df3ccf21613ff7ffd4b7693abe9e9
master date: 2019-01-15 12:58:34 +0000
Sergey Dyasli [Fri, 1 Feb 2019 10:33:44 +0000 (11:33 +0100)]
mm/page_alloc: fix MEMF_no_dma allocations for single NUMA
Currently dma_bitsize is zero by default on single NUMA node machines.
This makes all alloc_domheap_pages() calls with MEMF_no_dma return NULL.
There is only 1 user of MEMF_no_dma: dom0_memflags, which are used
during memory allocation for Dom0. Failing allocation with default
dom0_memflags is especially severe for the PV Dom0 case: it makes
alloc_chunk() to use suboptimal 2MB allocation algorithm with a search
for higher memory addresses.
This can lead to the NMI watchdog timeout during PV Dom0 construction
on some machines, which can be worked around by specifying "dma_bits"
in Xen's cmdline manually.
Fix the issue by ignoring MEMF_no_dma in cases when dma_bitsize is zero,
which means there is no DMA zone. This shouldn't cause any issues for
Dom0 because alloc_heap_pages() will first use higher memory addresses
for satisfying memory allocation requests.
Jan Beulich [Fri, 1 Feb 2019 10:33:09 +0000 (11:33 +0100)]
x86emul: work around SandyBridge errata
There are a number of exception condition related errata on SandyBridge
CPUs, some of which are unexpected #UD (others, of no interest here, are
lack of mandated exceptions, or exceptions of unexpected type). Annotate
the one workaround we already have, and add two more.
Due to the exception recovery we have in place for stub invocations
these aren't security issues.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0d4d9e8f55602415475e04a5dc8b4ad27845a7f9
master date: 2018-12-18 15:19:47 +0100
Jan Beulich [Fri, 1 Feb 2019 10:32:35 +0000 (11:32 +0100)]
x86emul: fix 3-operand IMUL
While commit 75066cd4ea ("x86emul: fix {,i}mul and {,i}div") indeed did
as its title says, it broke the 3-operand form by uniformly using AL/AX/
EAX/RAX as second source operand. Fix this and add tests covering both
cases.
Reported-by: Andrei Lutas <vlutas@bitdefender.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 19232b378fab04997c0612e5c19e82c29b59d99e
master date: 2018-12-18 14:27:09 +0100
Andrew Cooper [Fri, 1 Feb 2019 10:32:03 +0000 (11:32 +0100)]
x86/hvm: Corrections to RDTSCP intercept handling
For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
supports the instruction, but the guest may not have RDTSCP in its featureset.
Bring the vmexit handlers in line with the main emulator behaviour by
optionally handing back #UD.
Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug
build or first-gen SVM hardware which lacks NRIP, we first update regs->rcx,
then call __get_instruction_length() asking for RDTSC. As the two
instructions are different (and indeed, different lengths!),
__get_instruction_length_from_list() fails and hands back a #GP fault.
This can demonstrated by putting a guest into tsc_mode="always emulate" and
executing an RDTSCP instruction:
(d1) --- Xen Test Framework ---
(d1) Environment: HVM 64bit (Long mode 4 levels)
(d1) Test rdtscp
(d1) TSC mode 1
(XEN) emulate.c:147:d1v0 __get_instruction_length: Mismatch between expected and actual instruction:
(XEN) emulate.c:152:d1v0 insn_index 8, opcode 0xf0031 modrm 0
(XEN) emulate.c:154:d1v0 rip 0x10475f, nextrip 0x104762, len 3
(XEN) SVM insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00
(d1) ******************************
(d1) PANIC: Unhandled exception at 0008:000000000010475f
(d1) Vec 13 #GP[0000]
(d1) ******************************
First, teach __get_instruction_length() to cope with RDTSCP, and improve
svm_vmexit_do_rdtsc() to ask for the correct instruction. Move the regs->rcx
adjustment into this function to ensure it gets done after we are done
potentially raising faults.
Reported-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 3fd3fda9c26fc3c4f77250f795ed7ff9d38e2ec6
master date: 2018-12-17 16:28:03 +0000
Andrew Cooper [Fri, 1 Feb 2019 10:31:28 +0000 (11:31 +0100)]
x86/VT-x: Don't activate VMCS Shadowing outside of nested vmx mode
By default on capable hardware, SECONDARY_EXEC_ENABLE_VMCS_SHADOWING is
activated unilaterally. The VMCS Link pointer is initialised to ~0, but the
VMREAD/VMWRITE bitmap pointers are not.
This causes the 16bit IVT and Bios Data Area get interpreted as the read/write
permission bitmap for guests which blindly execute VMREAD/VMWRITE
instructions.
This is not a security issue because the VMCS Link pointer being ~0 causes
VMREAD/VMWRITE to complete with VMFailInvalid (rather than modifying a
potential shadow VMCS), and the contents of MFN 0 has already been determined
not to contain any interesting data because of L1TF's ability to read that 4k
frame.
Leave VMCS Shadowing disabled by default, and toggle it in
nvmx_{set,clear}_vmcs_pointer(). This isn't the most efficient course of
action, but it is the most simple way of leaving nested-virt working as it did
before.
While editing construct_vmcs(), collect all default secondary_exec_control
modifications together. The disabling of PML is latently buggy because it
happens after secondary_exec_control are written into the VMCS, although there
is an unconditional update later which writes the correct value into hardware.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 75ce36eb72cb93e8a3c9f60fd5e697067921d712
master date: 2018-12-10 16:24:08 +0000
Jan Beulich [Fri, 1 Feb 2019 10:30:55 +0000 (11:30 +0100)]
x86/shadow: don't enable shadow mode with too small a shadow allocation
We've had more than one report of host crashes after failed migration,
and in at least one case we've had a hint towards a too far shrunk
shadow allocation pool. Instead of just checking the pool for being
empty, check whether the pool is smaller than what
shadow_set_allocation() would minimally bump it to if it was invoked in
the first place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: 2634b997afabfdc5a972e07e536dfbc6febb4385
master date: 2018-11-30 12:10:39 +0100
Jan Beulich [Fri, 1 Feb 2019 10:29:53 +0000 (11:29 +0100)]
ns16550/PCI: fix skipping of devices
Selecting between single/multiple BAR mode should happen after checking
whether to skip the present device, or else multi-BAR devices won't be
skipped correctly, due to port_idx getting set to zero in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
master commit: c34fe0468acc61aca62422483c37a1309708f1cb
master date: 2018-11-30 12:07:33 +0100
Andrew Cooper [Fri, 1 Feb 2019 10:29:16 +0000 (11:29 +0100)]
x86/soft-reset: Drop gfn reference after calling get_gfn_query()
get_gfn_query() internally takes the p2m lock, and this error path leaves it
locked.
This wasn't included in XSA-277 because the error path can only be triggered
by a carefully timed phymap operation concurrent with the domain being paused
and the toolstack issuing DOMCTL_soft_reset.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e7969e917cef276318f722a607985a2e896aeb94
master date: 2018-11-22 17:58:46 +0000
Andrew Cooper [Fri, 1 Feb 2019 10:28:45 +0000 (11:28 +0100)]
x86/mem-sharing: Don't leave the altp2m lock held when nominating a page
get_gfn_type_access() internally takes the p2m lock, and nothing ever unlocks
it. Switch to using the unlocked accessor instead.
This wasn't included in XSA-277 because neither mem-sharing nor altp2m are
supported.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d6e02850d3b45c9658457214a749cc48097bdef4
master date: 2018-11-22 17:58:46 +0000
Jan Beulich [Fri, 1 Feb 2019 10:27:59 +0000 (11:27 +0100)]
x86/HVM: __hvm_copy() should not write to p2m_ioreq_server pages
Commit 3bdec530a5 ("x86/HVM: split page straddling emulated accesses in
more cases") introduced a hvm_copy_to_guest_linear() attempt before
falling back to hvmemul_linear_mmio_write(). This is wrong for the
p2m_ioreq_server special case. That change widened a pre-existing issue
though: Other writes to such pages also need to be failed (or forced
through emulation), in particular hypercall buffer writes.
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d7bff2bc003cd5fd8c618b70c62b8fcfd9cd187e
master date: 2018-11-15 16:42:25 +0100
Jan Beulich [Fri, 1 Feb 2019 10:25:52 +0000 (11:25 +0100)]
VMX: fix vmx_handle_eoi()
In commit 303066fdb1e ("VMX: fix interaction of APIC-V and Viridian
emulation") I screwed up: Instead of clearing SVI, other ISR bits
should be taken into account.
Introduce a new helper set_svi(), split out of vmx_process_isr(), and
use it also from vmx_handle_eoi().
Following the problems in vmx_intr_assist() (see the still present big
block of debugging code there) also warn (once) if EOI'd vector and
original SVI don't match.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 45cb9a4123b5550eb1f84846fe5482acae1c13a3
master date: 2018-11-02 12:15:33 +0100
Julien Grall [Mon, 1 Oct 2018 16:42:27 +0000 (17:42 +0100)]
xen/arm: vgic-v3: Don't create empty re-distributor regions
At the moment, Xen is assuming the hardware domain will have the same
number of re-distributor regions as the host. However, as the
number of CPUs or the stride (e.g on GICv4) may be different we end up
exposing regions which does not contain any re-distributors.
When booting, Linux will go through all the re-distributor region to
check whether a property (e.g vPLIs) is available accross all the
re-distributors. This will result to a data abort on empty regions
because there are no underlying re-distributor.
So we need to limit the number of regions exposed to the hardware
domain. The code reworked to only expose the minimun number of regions
required by the hardware domain. It is assumed the regions will be
populated starting from the first one.
Lastly, rename vgic_v3_rdist_count to reflect the value return by the
helper.
Julien Grall [Mon, 1 Oct 2018 16:42:26 +0000 (17:42 +0100)]
xen/arm: vgic-v3: Delay the initialization of the domain information
A follow-up patch will require to know the number of vCPUs when
initializating the vGICv3 domain structure. However this information is
not available at domain creation. This is only known once
XEN_DOMCTL_max_vpus is called for that domain.
In order to get the max vCPUs around, delay the domain part of the vGIC
v3 initialization until the first vCPU of the domain is initialized.
xen/arm: check for multiboot nodes only under /chosen
Make sure to only look for multiboot compatible nodes only under
/chosen, not under any other paths (depth <= 3).
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
[julien: Use sizeof(path) instead of len ] Reviewed-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit c32e3689c546305d4eae53e6ccf9c8b4e048c7df)
Julien Grall [Tue, 23 Oct 2018 18:17:07 +0000 (19:17 +0100)]
xen/arm: gic: Ensure ordering between read of INTACK and shared data
When an IPI is generated by a CPU, the pattern looks roughly like:
<write shared data>
dsb(sy);
<write to GIC to signal SGI>
On the receiving CPU we rely on the fact that, once we've taken the
interrupt, then the freshly written shared data must be visible to us.
Put another way, the CPU isn't going to speculate taking an interrupt.
Unfortunately, this assumption turns out to be broken.
Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
to read some shared_data. Before CPUx has done anything, a random
peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
CPUy then takes the IRQ and starts executing the entry code, heading
towards gic_handle_irq. Furthermore, let's assume that a bunch of the
previous interrupts handled by CPUy were SGIs, so the branch predictor
kicks in and speculates that irqnr will be <16 and we're likely to
head into handle_IPI. The prefetcher then grabs a speculative copy of
shared_data which contains a stale value.
Meanwhile, CPUx gets round to updating shared_data and asking the GIC
to send an SGI to CPUy. Internally, the GIC decides that the SGI is
more important than the peripheral interrupt (which hasn't yet been
ACKed) but doesn't need to do anything to CPUy, because the IRQ line
is already raised.
CPUy then reads the ACK register on the GIC, sees the SGI value which
confirms the branch prediction and we end up with a stale shared_data
value.
This patch fixes the problem by adding an smp_rmb() to the IPI entry
code in do_SGI.
Julien Grall [Tue, 23 Oct 2018 18:17:06 +0000 (19:17 +0100)]
xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()
Devices that expose their interrupt status registers via system
registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
vgic (although unused by Linux), ...) rely on a context synchronising
operation on the CPU to ensure that the updated status register is
visible to the CPU when handling the interrupt. This usually happens as
a result of taking the IRQ exception in the first place, but there are
two race scenarios where this isn't the case.
For example, let's say we have two peripherals (X and Y), where Y uses a
system register for its interrupt status.
Case 1:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. Y then raises its interrupt line, but the update to its system
register is not yet visible to the CPU
3. The GIC decides to expose Y's interrupt number first in the Ack
register
4. The CPU runs the IRQ handler for Y, but the status register is stale
Case 2:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. CPU reads the interrupt number for X from the Ack register and runs
its IRQ handler
3. Y raises its interrupt line and the Ack register is updated, but
again, the update to its system register is not yet visible to the
CPU.
4. Since the GIC drivers poll the Ack register, we read Y's interrupt
number and run its handler without a context synchronisation
operation, therefore seeing the stale register value.
In either case, we run the risk of missing an IRQ. This patch solves the
problem by ensuring that we execute an ISB in the GIC drivers prior
to invoking the interrupt handler.
Marc Zyngier [Tue, 25 Sep 2018 17:20:38 +0000 (18:20 +0100)]
xen/arm: smccc-1.1: Make return values unsigned long
An unfortunate consequence of having a strong typing for the input
values to the SMC call is that it also affects the type of the
return values, limiting r0 to 32 bits and r{1,2,3} to whatever
was passed as an input.
Let's turn everything into "unsigned long", which satisfies the
requirements of both architectures, and allows for the full
range of return values.
It appears that the pygrub script itself is still broken because of
import problems with a renamed library. Make sure we're not claiming
that the bugs are solved.
When a user uses a locale that results in translating menu item titles
into another language than English, the hardcoded "Debian GNU/Linux,
with Xen hypervisor" would not match anything.
So, use gettext to make it match the right translated entry.
Also see
- https://bugs.launchpad.net/ubuntu/+source/xen/+bug/1321144
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865086
Note that (thanks Ian for the info):
* When GRUB_TERMINAL is not empty and set to anything other than
`gfxterm', grub will not do translation at all, because grub-mkconfig
thinks that other GRUB_TERMINAL values including `serial' preclude
non-ASCII characters, and that causes it to set LANG=C. (I have
GRUB_TERMINAL="serial console", which caused much confusion when
trying to test all of this).
* Just trying the printf "$(gettext... below is not enough to test if a
translation shows up. It needs -d grub additionally for gettext, or
TEXTDOMAIN=grub in the environment, which is probably present when
this file gets run by update-grub.
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
Ian Jackson [Wed, 10 Oct 2018 14:43:49 +0000 (15:43 +0100)]
debian/control: Add missing Replaces on old xen-utils-common
Previously the xenstore utility manpages were erroneously in
xen-utils-common. We need to declare Replaces so that dpkg lets us
take them over rather than regarding it as a file conflict.
I think we can safely drop the old Conflicts/Replaces from Xen 3.1.0
days.
Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Andrew Cooper [Tue, 20 Nov 2018 14:35:48 +0000 (15:35 +0100)]
x86/dom0: Avoid using 1G superpages if shadowing may be necessary
The shadow code doesn't support 1G superpages, and will hand #PF[RSVD] back to
guests.
For dom0's with 512GB of RAM or more (and subject to the P2M alignment), Xen's
domain builder might use 1G superpages.
Avoid using 1G superpages (falling back to 2M superpages instead) if there is
a reasonable chance that we may have to shadow dom0. This assumes that there
are no circumstances where we will activate logdirty mode on dom0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 96f6ee15ad7ca96472779fc5c083b4149495c584
master date: 2018-11-12 11:26:04 +0000
Jan Beulich [Tue, 20 Nov 2018 14:34:51 +0000 (15:34 +0100)]
x86/shadow: shrink struct page_info's shadow_flags to 16 bits
This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.
Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).
This is part of XSA-280.
Reported-by: Prgmr.com Security <security@prgmr.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 789589968ed90e82a832dbc60e958c76b787be7e
master date: 2018-11-20 14:59:54 +0100
Jan Beulich [Tue, 20 Nov 2018 14:34:13 +0000 (15:34 +0100)]
x86/shadow: move OOS flag bit positions
In preparation of reducing struct page_info's shadow_flags field to 16
bits, lower the bit positions used for SHF_out_of_sync and
SHF_oos_may_write.
Instead of also adjusting the open coded use in _get_page_type(),
introduce shadow_prepare_page_type_change() to contain knowledge of the
bit positions to shadow code.
This is part of XSA-280.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: d68e1070c3e8f4af7a31040f08bdd98e6d6eac1d
master date: 2018-11-20 14:59:13 +0100