Andrew Cooper [Tue, 23 Oct 2018 18:49:34 +0000 (19:49 +0100)]
xen/gnttab: Simplify gnttab_map_frame()
* Reflow some lines to remove unnecessary line breaks.
* Factor out the gnttab_get_frame_gfn() calculation. Neither x86 nor ARM
builds seem to be able to fold the two calls, and the resulting code is far
easier to follow.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 22 Oct 2018 14:50:14 +0000 (15:50 +0100)]
x86/p2m: Switch the two_gfns infrastructure to using gfn_t
Additionally, drop surrounding trailing whitespace.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 22 Oct 2018 14:25:14 +0000 (15:25 +0100)]
xen/mm: Drop ARM put_gfn() stub
On x86, get_gfn_*() and put_gfn() are reference counting pairs. All the
get_gfn_*() functions are called from within CONFIG_X86 sections, but
put_gfn() is stubbed out on ARM.
As a result, the common code reads as if ARM is dropping references it never
acquired.
Put all put_gfn() calls in common code inside CONFIG_X86 to make the code
properly balanced, and drop the ARM stub.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
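The balanced-pair rule described above can be sketched as follows. This is an illustrative model only: `DEMO_X86`, `refs` and the demo function names are invented for the example and are not Xen's real API.

```c
#include <assert.h>

#define DEMO_X86 1  /* stand-in for CONFIG_X86; illustrative only */

static int refs;  /* models the p2m reference count */

static void get_gfn_query_demo(void) { refs++; }  /* acquires a reference */
static void put_gfn_demo(void)       { refs--; }  /* releases it */

/* Common code keeps the get/put pair together under one guard, so a
 * build without the guard neither takes nor drops the reference. */
static void common_lookup_demo(void)
{
#ifdef DEMO_X86
    get_gfn_query_demo();
    /* ... inspect the p2m entry ... */
    put_gfn_demo();
#endif
    /* architecture-independent work continues here */
}
```

With the pair guarded as one unit, the reference count is balanced regardless of which architecture the common code is built for.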
Andrew Cooper [Wed, 7 Nov 2018 12:25:19 +0000 (12:25 +0000)]
x86/soft-reset: Drop gfn reference after calling get_gfn_query()
get_gfn_query() internally takes the p2m lock, and this error path leaves it
locked.
This wasn't included in XSA-277 because the error path can only be triggered
by a carefully timed phymap operation concurrent with the domain being paused
and the toolstack issuing DOMCTL_soft_reset.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 8 Oct 2018 18:33:42 +0000 (19:33 +0100)]
xen/arm: p2m: Introduce a helper to generate P2M table entry from a page
Generating a P2M table entry requires setting some default values, which are
worth explaining in a comment. At the moment, there are two places where
such entries are created, but only one has a proper comment.
So move the code to generate P2M table entry in a separate helper.
This will be helpful in a follow-up patch to make modification on the
defaults.
At the same time, switch the default access from p2m->default_access to
p2m_access_rwx. This should not matter, as permissions are ignored for
table entries by the hardware.
Julien Grall [Mon, 8 Oct 2018 18:33:40 +0000 (19:33 +0100)]
xen/arm: guest_walk_tables: Switch the return to bool
At the moment, guest_walk_tables can return 0, -EFAULT, or -EINVAL.
The use of the last two is not clearly defined, and they are used
inconsistently in the code. The only current caller does not care about
the return value, and its usefulness seems very limited (there is no way
to differentiate between the 15-ish error paths).
So switch to bool to simplify the return value and make the developer's
life a bit easier.
Julien Grall [Mon, 8 Oct 2018 18:33:39 +0000 (19:33 +0100)]
xen/arm: Allow lpae_is_{table, mapping} helpers to work on invalid entry
Currently, the lpae_is_{table, mapping} helpers always return false on
entries with the valid bit unset. However, it would be useful to have them
operate on any entry, for instance to store information in advance while
still requesting a fault.
With that change, the p2m now provides overlays for *_is_{table,
mapping} that also check the valid bit of the entry.
Julien Grall [Mon, 8 Oct 2018 18:33:38 +0000 (19:33 +0100)]
xen/arm: Introduce helpers to get/set an MFN from/to an LPAE entry
The new helpers make it easier to read the code by abstracting the way to
set/get an MFN from/to an LPAE entry. The helpers are using "walk" as the
bits are common across different LPAE stages.
At the same time, use the new helpers to replace the various open-coded
places.
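A minimal sketch of what such accessors abstract, assuming the usual LPAE layout where the output address lives in bits [47:12]. The type and names here are illustrative, not Xen's actual lpae_t or helper names:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lpae_demo_t;                 /* stand-in for lpae_t */

#define LPAE_ADDR_MASK  0x0000fffffffff000ULL /* output address, bits [47:12] */
#define PAGE_SHIFT_DEMO 12

/* Extract the MFN from an entry, hiding the mask-and-shift. */
static uint64_t lpae_get_mfn_demo(lpae_demo_t e)
{
    return (e & LPAE_ADDR_MASK) >> PAGE_SHIFT_DEMO;
}

/* Install an MFN into an entry, preserving all the other bits. */
static lpae_demo_t lpae_set_mfn_demo(lpae_demo_t e, uint64_t mfn)
{
    return (e & ~LPAE_ADDR_MASK)
           | ((mfn << PAGE_SHIFT_DEMO) & LPAE_ADDR_MASK);
}
```

Because the address field sits at the same bits across LPAE stages, one pair of helpers can serve every walk level.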
Jan Beulich [Thu, 22 Nov 2018 13:31:06 +0000 (14:31 +0100)]
x86emul: suppress default test harness build with incapable compiler
A top level "make build", as used e.g. by osstest, wants to build all
"all" targets in enabled tools subdirectories, which by default also
includes the emulator test harness. The use of, in particular, AVX512
insns in, again in particular, test_x86_emulator.c causes this build to
fail though when the compiler is not new enough. Take a big hammer and
suppress the default harness build altogether when any of the extensions
used is not supported by the specified (or defaulted to) compiler.
Leave the "run" target alone though: While some of the test code blobs
may fail to build with older compilers, as long as the main executable
can be built some limited testing can still be done.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Sergey Dyasli [Thu, 22 Nov 2018 13:30:14 +0000 (14:30 +0100)]
x86/dom0: use MEMF_no_scrub during Dom0 construction
Now that idle scrub is the default option, all memory is marked as dirty
and alloc_domheap_pages() will do eager scrubbing by default. This can
lead to longer Dom0 construction and potentially to a watchdog timeout,
especially on older H/W (e.g. Harpertown).
Pass MEMF_no_scrub to optimise this process since there is little point
in scrubbing memory for Dom0.
Dario Faggioli [Thu, 22 Nov 2018 11:54:56 +0000 (11:54 +0000)]
credit2: during scheduling, update the idle mask before using it
Load balancing, which happens at the end of a "scheduler epoch", can
trigger vcpu migration, which in turn may call runq_tickle(). If the
cpu where this happens was idle, but we're now going to schedule a vcpu
on it, let's update the runq's idle cpus mask accordingly _before_ doing
load balancing.
Not doing that may cause runq_tickle() to think that the cpu is still
idle, and tickle it to go pick up a vcpu from the runqueue, which might
be wrong or suboptimal.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Wed, 21 Nov 2018 16:28:10 +0000 (16:28 +0000)]
automation: make clean between builds
Currently randconfig tests are more likely to fail than to succeed
because of a bug in xen's build system: symbols-dummy.o's dependency
is wrong, which causes it to not get rebuilt between runs, which
eventually causes linking to fail. There may also be other corner
cases we haven't discovered.
The fix is not straightforward. For now, make sure the tree is cleaned
properly between builds so we don't see random failures in Gitlab CI.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Doug Goldstein <cardoe@cardoe.com>
Dario Faggioli [Wed, 21 Nov 2018 15:44:53 +0000 (15:44 +0000)]
xen: sched: Credit2: avoid looping too much (over runqueues) during load balancing
For doing load balancing between runqueues, we check the load of each
runqueue, select the one whose load is most "distant" from our own, and
then take the proper runq lock and attempt vcpu migrations.
If we fail to take such lock, we try again, and the idea was to give up
and bail if, during the checking phase, we can't take the lock of any
runqueue (check the comment near to the 'goto retry;', in the middle of
balance_load())
However, the variable that controls the "give up and bail" part is not
reset upon retries. Therefore, provided we did manage to check the load of
at least one runqueue during the first pass, if we can't get any runq lock,
we don't bail, but we try again taking the lock of that same runqueue
(and possibly more than once).
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Razvan Cojocaru [Wed, 21 Nov 2018 09:55:21 +0000 (10:55 +0100)]
x86/mem_access: move p2m_mem_access_sanity_check() from header
Move p2m_mem_access_sanity_check() from the asm-x86/mem_access.h
header, where it currently is declared inline, to
arch/x86/mm/mem_access.c. This allows source code that includes it
directly, or indirectly (such as xen/mem_access.h), to not worry
about also including sched.h for is_hvm_domain(). Including
xen/mem_access.h is useful for code wanting to use p2m_access_t.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Wed, 21 Nov 2018 09:54:05 +0000 (10:54 +0100)]
x86: correct instances of PGC_allocated clearing
For domain heap pages assigned to a domain dropping the page reference
tied to PGC_allocated may not drop the last reference, as otherwise the
test_and_clear_bit() might already act on an unowned page.
Work around this where possible, but the need to acquire extra page
references is a fair hint that references should have been acquired in
other places instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
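The hazard and the workaround can be modelled with a toy reference counter. Everything below (the names, the `allocated` flag standing in for PGC_allocated) is a simplified illustration of the pattern, not the real page_info code:

```c
#include <assert.h>
#include <stdbool.h>

static int count = 1;          /* one ref, held via the allocated flag */
static bool allocated = true;  /* stands in for PGC_allocated */
static bool freed;

static void get_page_demo(void) { count++; }

static void put_page_demo(void)
{
    if (--count == 0)
        freed = true;          /* page goes back to the allocator */
}

/* Clearing the flag drops a reference that may be the last one, after
 * which the flag word would belong to a freed (or re-owned) page.  A
 * transient extra reference pins the page across the flag update. */
static void drop_allocated_safely(void)
{
    get_page_demo();
    if (allocated) {
        allocated = false;
        assert(!freed);        /* our extra ref keeps the page alive */
        put_page_demo();       /* the PGC_allocated reference */
    }
    put_page_demo();           /* release the transient reference */
}
```

As the message says, needing this transient reference at a call site is itself a hint that a longer-lived reference should have been taken elsewhere.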
Jan Beulich [Wed, 21 Nov 2018 09:53:14 +0000 (10:53 +0100)]
x86/shadow: un-hide "full" auditing code
In particular sh_oos_audit() has become stale due to changes elsewhere,
and the need for adjustment was not noticed because both "full audit"
flags are off in both release and debug builds. Switch away from
preprocessor conditionals, thus exposing the code to the compiler at all
times. This obviously requires correcting the accumulated issues with
the so far hidden code.
Note that shadow_audit_tables() now also gains an effect with "full
entry audit" mode disabled; the prior code structure suggests that this
was originally intended anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Norbert Manthey [Wed, 21 Nov 2018 09:52:05 +0000 (10:52 +0100)]
retpoline: disable jump tables
To mitigate Spectre v2, Xen uses a software mitigation, namely retpoline
sequences generated by the compiler. This way, indirect branches are
protected against the attack.
However, the retpoline sequence comes with a slow down. To make up for
this, we propose to avoid jump tables in the first place. Without the
retpoline sequences, this code would be less efficient. However, when
retpoline is enabled, this actually results in a slight performance
improvement.
This change might become irrelevant once the compiler starts avoiding
jump tables in case retpolines are used:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
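For illustration, a dense switch like the one below is a typical candidate for a compiler-generated jump table, i.e. exactly the kind of indirect branch that retpolines make expensive. Building with GCC's `-fno-jump-tables` (presumably the mechanism this change uses) lowers it to compares and direct branches instead:

```c
#include <assert.h>

/* With enough dense case labels, GCC/Clang normally emit a jump table
 * here: an indirect jump through a table indexed by `op`. */
static int opcode_cost(int op)
{
    switch (op) {
    case 0: return 1;
    case 1: return 3;
    case 2: return 3;
    case 3: return 7;
    case 4: return 2;
    case 5: return 5;
    case 6: return 4;
    default: return -1;
    }
}
```

The behaviour is identical either way; only the generated branch form (and hence the retpoline cost) changes.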
Paul Durrant [Wed, 21 Nov 2018 09:50:29 +0000 (10:50 +0100)]
iommu / p2m: add a page_order parameter to iommu_map/unmap_page()...
...and re-name them to iommu_map/unmap() since they no longer necessarily
operate on a single page.
The P2M code currently contains many loops to deal with the fact that,
while it may be required to handle page orders greater than 0, the
IOMMU map and unmap functions cannot.
This patch adds a page_order parameter to those functions and implements
the necessary loops within. This allows the P2M code to be substantially
simplified.
This patch also adds emacs boilerplate to xen/iommu.h to avoid tabbing
problems.
NOTE: This patch does not modify the underlying vendor IOMMU
implementations to deal with more than a single page at once.
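The loop that moves into the wrapper can be sketched like this. The names and signatures are invented for the example (the real Xen interfaces use dfn_t/mfn_t and mapping flags), with the vendor hook still operating on one 4k page per call:

```c
#include <assert.h>

static unsigned int pages_mapped;  /* counts vendor-level map calls */

/* Single-page vendor hook: the underlying implementations are not
 * modified, so they still see exactly one page per call. */
static int vendor_map_one(unsigned long dfn, unsigned long mfn)
{
    (void)dfn; (void)mfn;
    pages_mapped++;
    return 0;
}

/* page_order-aware wrapper: the loop the P2M code used to open-code. */
static int iommu_map_demo(unsigned long dfn, unsigned long mfn,
                          unsigned int page_order)
{
    unsigned long i, count = 1UL << page_order;

    for (i = 0; i < count; i++) {
        int rc = vendor_map_one(dfn + i, mfn + i);
        if (rc)
            return rc;  /* a real implementation would also unwind */
    }
    return 0;
}
```

An order-2 mapping then becomes one call at the P2M layer and four at the vendor layer.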
Wei Liu [Wed, 14 Nov 2018 18:17:30 +0000 (18:17 +0000)]
tools: update examples/README
This file gets installed to the host system.
This patch cleans it up:
1. remove things that don't exist anymore;
2. change xm to xl;
3. fix the xen-devel list address;
4. add things that are missing;
5. delete trailing whitespace.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 20 Nov 2018 14:13:54 +0000 (15:13 +0100)]
x86emul: support AVX512{F,BW} packed integer arithmetic insns
Note: vpadd* / vpsub* et al are put at seemingly the wrong slot of the
big switch(). This is in anticipation of adding e.g. vpunpck* to those
groups (see the legacy/VEX encoded case labels nearby to support this).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 20 Nov 2018 14:11:09 +0000 (15:11 +0100)]
x86emul: support AVX512F legacy-equivalent logic insns
Plus vpternlog{d,q} as being extensively used by the compiler, in order
to facilitate test enabling in the harness as soon as possible. Also the
twobyte_table[] entries for a few more insns get their .d8s field set
right away, in order to not split and later re-combine the groups.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 20 Nov 2018 14:06:24 +0000 (15:06 +0100)]
x86emul: test for correct EVEX Disp8 scaling
Besides the already existing tests (which are going to be extended once
respective ISA extension support is complete), let's also ensure for
every individual insn that their Disp8 scaling (and memory access width)
are correct.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 20 Nov 2018 14:05:12 +0000 (15:05 +0100)]
x86emul: support basic AVX512 moves
Note: SDM Vol 2 rev 067 is not really consistent about EVEX.L'L for LIG
insns - the only place where this is made explicit is a table in
the section titled "Vector Length Orthogonality": While they
tolerate 0, 1, and 2, a value of 3 uniformly leads to #UD.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 20 Nov 2018 13:59:54 +0000 (14:59 +0100)]
x86/shadow: shrink struct page_info's shadow_flags to 16 bits
This is to avoid it overlapping the linear_pt_count field needed for PV
domains. Introduce a separate, HVM-only pagetable_dying field to replace
the sole one left in the upper 16 bits.
Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
are not allowed on uint16_t fields and hence their use would have
required ugly casts. This is fine because all updates of the field ought
to occur with the paging lock held, and other updates of it use |= and
&= as well (i.e. using atomic operations here didn't really guard
against potentially racing updates elsewhere).
This is part of XSA-280.
Reported-by: Prgmr.com Security <security@prgmr.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
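The switch to plain |=/&= updates can be illustrated as below; the struct and macro names are invented for the sketch. The point is that with every writer holding the paging lock, a non-atomic read-modify-write on the uint16_t field is safe:

```c
#include <assert.h>
#include <stdint.h>

#define SHF_DEMO_L1  (1u << 0)
#define SHF_DEMO_L2  (1u << 1)

struct page_demo {
    uint16_t shadow_flags;  /* now 16 bits wide */
};

/* All updates happen under the paging lock, so plain |= / &= suffice;
 * {test,set,clear}_bit() would not be usable on a uint16_t anyway. */
static void sh_promote_demo(struct page_demo *pg, uint16_t type)
{
    pg->shadow_flags |= type;
}

static void sh_demote_demo(struct page_demo *pg, uint16_t type)
{
    pg->shadow_flags &= ~type;
}
```

Atomicity of the individual update buys nothing here, since unserialised concurrent |= and &= would race regardless; the lock is what provides the guarantee.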
Jan Beulich [Tue, 20 Nov 2018 13:59:13 +0000 (14:59 +0100)]
x86/shadow: move OOS flag bit positions
In preparation of reducing struct page_info's shadow_flags field to 16
bits, lower the bit positions used for SHF_out_of_sync and
SHF_oos_may_write.
Instead of also adjusting the open coded use in _get_page_type(),
introduce shadow_prepare_page_type_change() to contain knowledge of the
bit positions to shadow code.
This is part of XSA-280.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 20 Nov 2018 13:58:41 +0000 (14:58 +0100)]
x86/mm: Don't perform flush after failing to update a guests L1e
If the L1e update hasn't occurred, the flush cannot do anything useful. This
skips the potentially expensive vcpumask_to_pcpumask() conversion, and
broadcast TLB shootdown.
More importantly however, we might be in the error path due to a bad va
parameter from the guest, and this should not propagate into the TLB flushing
logic. The INVPCID instruction for example raises #GP for a non-canonical
address.
This is XSA-279.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 20 Nov 2018 13:58:10 +0000 (14:58 +0100)]
x86/mm: Put the gfn on all paths after get_gfn_query()
c/s 7867181b2 "x86/PoD: correctly handle non-order-0 decrease-reservation
requests" introduced an early exit in guest_remove_page() for unexpected p2m
types. However, get_gfn_query() internally takes the p2m lock, and must be
matched with a put_gfn() call later.
Fix the erroneous comment beside the declaration of get_gfn_query().
This is XSA-277.
Reported-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Tue, 20 Nov 2018 13:57:38 +0000 (14:57 +0100)]
x86/hvm/ioreq: use ref-counted target-assigned shared pages
Passing MEMF_no_refcount to alloc_domheap_pages() will allocate, as
expected, a page that is assigned to the specified domain but is not
accounted for in tot_pages. Unfortunately there is no logic for tracking
such allocations and avoiding any adjustment to tot_pages when the page
is freed.
The only caller of alloc_domheap_pages() that passes MEMF_no_refcount is
hvm_alloc_ioreq_mfn() so this patch removes use of the flag from that
call-site to avoid the possibility of a domain using an ioreq server as
a means to adjust its tot_pages and hence allocate more memory than it
should be able to.
However, the reason for using the flag in the first place was to avoid
the allocation failing if the emulator domain is already at its maximum
memory limit. Hence this patch switches to allocating memory from the
target domain instead of the emulator domain. There is already an extra
memory allowance of 2MB (LIBXL_HVM_EXTRA_MEMORY) applied to HVM guests,
which is sufficient to cover the pages required by the supported
configuration of a single IOREQ server for QEMU. (Stub-domains do not,
so far, use resource mapping.) It is also the case that QEMU will have
mapped the IOREQ server pages before the guest boots, hence it is not
possible for the guest to inflate its balloon to consume these pages.
Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Paul Durrant [Tue, 20 Nov 2018 13:57:05 +0000 (14:57 +0100)]
x86/hvm/ioreq: fix page referencing
The code does not take a page reference in hvm_alloc_ioreq_mfn(), only a
type reference. This can lead to a situation where a malicious domain with
XSM_DM_PRIV can engineer a sequence as follows:
- create IOREQ server: no pages as yet.
- acquire resource: page allocated, total 0.
- decrease reservation: -1 ref, total -1.
This will cause Xen to hit a BUG_ON() in free_domheap_pages().
This patch fixes the issue by changing the call to get_page_type() in
hvm_alloc_ioreq_mfn() to a call to get_page_and_type(). This change
in turn requires an extra put_page() in hvm_free_ioreq_mfn() in the case
that _PGC_allocated is still set (i.e. a decrease reservation has not
occurred) to avoid the page being leaked.
This is part of XSA-276.
Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 20 Nov 2018 13:55:14 +0000 (14:55 +0100)]
AMD/IOMMU: suppress PTE merging after initial table creation
The logic is not fit for this purpose, so simply disable its use until
it can be fixed / replaced. Note that this re-enables merging for the
table creation case, which was disabled as a (perhaps unintended) side
effect of the earlier "amd/iommu: fix flush checks". It relies on no
page getting mapped more than once (with different properties) in this
process, as that would still be beyond what the merging logic can cope
with. But arch_iommu_populate_page_table() guarantees this afaict.
Flush checking for AMD IOMMU didn't check whether the previous entry
was present, or whether the flags (writable/readable) changed in order
to decide whether a flush should be executed.
Fix this by taking the writable/readable/next-level fields into account,
together with the present bit.
Along these lines the flushing in amd_iommu_map_page() must not be
omitted for PV domains. The comment there was simply wrong: Mappings may
very well change, both their addresses and their permissions. Ultimately
this should honor iommu_dont_flush_iotlb, but to achieve this
amd_iommu_ops first needs to gain an .iotlb_flush hook.
Also make clear_iommu_pte_present() static, to demonstrate there's no
caller omitting the (subsequent) flush.
Andrew Cooper [Mon, 19 Nov 2018 21:16:28 +0000 (21:16 +0000)]
automation: Add 32bit Debian Jessie builds
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: rebase ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 19 Nov 2018 13:03:02 +0000 (13:03 +0000)]
libx86: Work around GCC being unable to spill the PIC hard register
Versions of GCC before 5 can't compile cpuid.c, and fail with the rather cryptic:
In file included from lib/x86/cpuid.c:3:0:
lib/x86/cpuid.c: In function ‘x86_cpuid_policy_fill_native’:
include/xen/lib/x86/cpuid.h:25:5: error: inconsistent operand constraints in an ‘asm’
asm ( "cpuid"
^
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232 for more details.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 1 Dec 2017 13:29:36 +0000 (13:29 +0000)]
x86/msr: Handle MSR_AMD64_DR{0-3}_ADDRESS_MASK in the new MSR infrastructure
This is a followup to c/s 96f235c26 which fulfils the remaining TODO item.
First of all, the pre-existing SVM code has a bug. The value in
msrs->dr_mask[] may be stale, as we allow direct access to these MSRs.
Resolve this in guest_rdmsr() by reading directly from hardware in the
affected case.
With the reading/writing logic moved to the common guest_{rd,wr}msr()
infrastructure, the migration logic can be simplified. The PV migration logic
drops all of its special casing, and SVM's entire {init,save,load}_msr()
infrastructure becomes unnecessary.
The resulting diffstat shows quite how expensive the PV special cases were
in arch_do_domctl().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 16 Nov 2018 18:58:55 +0000 (18:58 +0000)]
x86: fix efi.lds dependency generation
RANDCONFIG builds discover efi.lds is not updated when autogenerated
headers are updated.
Upon inspection, the generated .d file contains xen.lds.o as the target,
not efi.lds.o as previously thought. That's because gcc disregards the
output object name specified by -o when generating dependencies, so the
sed invocation has no effect.
Arguably that's a bug in gcc, which can be fixed at some point, so we
make the sed rune work with *.lds. At the same time replace the
hardcoded sed rune for xen.lds with the new one.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 6 Sep 2018 11:35:31 +0000 (11:35 +0000)]
xen/bitmap: Drop all bitmap_scn{,list}printf() infrastructure
All callers have been converted to using %*pb[l]. In the unlikely case that
future code wants to retain this functionality, it can be replicated in a
more convenient fashion with snprintf().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
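As the message notes, the old helper is easy to replicate with snprintf(). A minimal sketch, with an output format chosen arbitrarily for the example (comma-separated 32-bit hex words, most significant first):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render a bitmap (array of 32-bit words, word 0 = least significant)
 * as "%08x,%08x,..." from the most significant word down. */
static int bitmap_to_str(char *buf, size_t len,
                         const uint32_t *words, unsigned int nwords)
{
    int written = 0;

    while (nwords--)
        written += snprintf(buf + written, len - written,
                            nwords ? "%08x," : "%08x", words[nwords]);
    return written;
}
```

The %*pb[l] printk extension renders the bitmap inline instead, so no bounce buffer or helper like this is needed in Xen itself.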
Andrew Cooper [Thu, 6 Sep 2018 11:26:18 +0000 (11:26 +0000)]
xen/common: Use %*pb[l] instead of {cpu,node}mask_scn{,list}printf()
This removes all use of keyhandler_scratch as a bounce buffer for the
rendered string. In some cases, adjacent printk()s which were writing
parts of the same line have been combined.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Thu, 6 Sep 2018 11:14:56 +0000 (11:14 +0000)]
xen/sched: Use %*pb[l] instead of cpumask_scn{,list}printf()
This removes all use of keyhandler_scratch as a bounce buffer for the
rendered string. In some cases, adjacent printk()s which were writing
parts of the same line have been combined.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Dario Faggioli <dfaggioli@suse.com>
Jan Beulich [Thu, 15 Nov 2018 15:43:36 +0000 (16:43 +0100)]
x86/shadow: emulate_gva_to_mfn() should respect p2m_ioreq_server
Writes to such pages need to be handed to the emulator.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 15 Nov 2018 15:42:25 +0000 (16:42 +0100)]
x86/HVM: __hvm_copy() should not write to p2m_ioreq_server pages
Commit 3bdec530a5 ("x86/HVM: split page straddling emulated accesses in
more cases") introduced a hvm_copy_to_guest_linear() attempt before
falling back to hvmemul_linear_mmio_write(). This is wrong for the
p2m_ioreq_server special case. That change widened a pre-existing issue
though: Other writes to such pages also need to be failed (or forced
through emulation), in particular hypercall buffer writes.
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Fri, 9 Nov 2018 11:23:46 +0000 (11:23 +0000)]
xen: report PV capability in sysctl and use it in toolstack
0e2c886ef ("xen: decouple HVM and IOMMU capabilities") provided a
truth table for what `xl info` would report. In order to make the
table work, Xen needs to report its PV capability.
Replace cap_directio with cap_pv in libxl IDL. It is safe to do so
because cap_directio has never been released. Revert to using
cap_hvm_directio to mark the availability of IOMMU, to save us from
providing a compatibility layer.
Don't bump sysctl version number because we've already done so.
Also provide a new virt_caps "pv", change "directio" to "pv_directio".
The truth table is now:
Jan Beulich [Thu, 15 Nov 2018 12:36:52 +0000 (13:36 +0100)]
x86/HVM: hvmemul_cmpxchg() should also use known_gla()
To be consistent with the write and rmw cases the mapping approach
should not be used when the guest linear address translation is known.
This in particular excludes the discard-write case from bypassing the
emulation path. This also means that now EFLAGS should actually get
properly updated, despite the discarded write portion of the memory
access.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Thu, 15 Nov 2018 12:36:10 +0000 (13:36 +0100)]
x86/HVM: make hvmemul_map_linear_addr() honor p2m_ioreq_server
Write accesses to p2m_ioreq_server pages should get redirected to the
emulator also when using the mapping approach. Extend the
p2m_is_discard_write() check there, and restrict both to the write
access case (this is just a latent bug as currently we go this route
only for write accesses).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Razvan Cojocaru [Thu, 15 Nov 2018 12:35:08 +0000 (13:35 +0100)]
x86/altp2m: propagate ept.ad changes to all active altp2ms
This patch is a pre-requisite for fixing the logdirty VGA issue
(display freezes when switching to a new altp2m view early in a
domain's lifetime).
The new ept_set_ad_sync() function has been added to update all
active altp2ms' ept.ad. New altp2ms will inherit the hostp2m's
ept.ad value.
The p2m_{en,dis}able_hardware_log_dirty() hostp2m locking has
been moved to the new ept_{en,dis}able_hardware_log_dirty()
functions as part of the code refactoring, while locks for the
individual altp2ms are taken in ept_set_ad_sync() (called by
ept_{en,dis}able_pml()).
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Tested-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 15 Nov 2018 12:34:21 +0000 (13:34 +0100)]
IOMMU/x86: remove indirection from certain IOMMU hook accesses
There's no need to go through an extra level of indirection. In order to
limit code churn, call sites using struct domain_iommu's platform_ops
don't get touched here, however.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Brian Woods <brian.woods@amd.com>
Jan Beulich [Thu, 15 Nov 2018 12:32:47 +0000 (13:32 +0100)]
IOMMU: move inclusion point of asm/iommu.h
In preparation of allowing inline functions in asm/iommu.h to
de-reference struct iommu_ops, move the inclusion downwards past
the declaration of that structure. This in turn requires moving the
struct domain_iommu declaration, as it requires struct arch_iommu to be
fully declared beforehand.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 2 Oct 2018 19:10:27 +0000 (20:10 +0100)]
Revert "xen/arm: vgic-v3: Delay the initialization of the domain information"
This reverts commit 703d9d5ec13a0f487e7415174ba54e0e3ca158db. The domain
creation logic has been adjusted to set up d->max_vcpus early enough to be
usable in vgic_v3_domain_init().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 2 Oct 2018 14:02:55 +0000 (14:02 +0000)]
xen/domain: Allocate d->vcpu[] earlier during domain_create()
The ARM code has a chicken-and-egg problem. One of the vGICv3 emulations
wants to know d->max_vcpus to be able to size itself appropriately, but the
current order of initialisation requires the vGIC to be set up before the
requested number of vcpus can be checked.
Move the range checking of config->max_vcpus into the
sanitise_domain_config() path, which allows the allocation of d->vcpu[]
and d->max_vcpus to happen
earlier during create, and in particular, before the call to
arch_domain_create().
The x86 side is fairly easy, and implements the logical equivalent of
domain_max_vcpus() but using XEN_DOMCTL_CDF_hvm_guest rather than
is_hvm_domain().
For the ARM side, re-purpose vgic_max_vcpus() to take a domctl vGIC version,
and return the maximum number of supported vCPUs, reusing 0 for "version not
supported". To avoid exporting the vgic_ops structures (which are in the
process of being replaced), hard code the upper limits.
This allows for the removal of the domain_max_vcpus() infrastructure, which is
done to prevent it being reused incorrectly in the future.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Fri, 9 Nov 2018 18:55:59 +0000 (18:55 +0000)]
xen/domain: Move guest type checks into the arch_sanitise_domain_config() path
This is a more appropriate location for the checks to happen, and cleans up
the common code substantially.
Take the opportunity to make ARM strictly require HVM|HAP for guests, which is
how the toolstack behaves, and leave a dprintk() behind for auditing failures.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Make vpl011 usable without a userspace component in Dom0.
In that case, output is printed to the Xen serial and input is received
from the Xen serial one character at a time.
Call domain_vpl011_init during construct_domU if vpl011 is enabled.
Introduce a new ring struct with only the ring array to avoid a waste of
memory. Introduce separate read_data and write_data functions for
initial domains: vpl011_write_data_xen is very simple and just writes
to the console, while vpl011_read_data_xen is a duplicate of
vpl011_read_data. Although textually almost identical, we are forced to
duplicate the functions because the struct layout is different.
To avoid mixing the output of different domains on the console, buffer
the output chars and print line by line, unless the domain has input
from the serial, in which case we want to print char by char for a
smooth user experience.
The size of SBSA_UART_OUT_BUF_SIZE is arbitrary, choose the same size
as VUART_BUF_SIZE used in vuart.c.
Export a function named console_input_domain() to allow others to know
which domain has input at a given time.
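The buffering scheme can be sketched as follows; buffer sizes and names are illustrative (the real code sizes the buffer with SBSA_UART_OUT_BUF_SIZE and writes to the Xen console rather than a test buffer):

```c
#include <assert.h>
#include <string.h>

#define OUT_BUF_SIZE 16

static char out_buf[OUT_BUF_SIZE];
static unsigned int out_idx;

static char console[64];      /* stands in for the Xen serial console */
static unsigned int con_len;
static unsigned int flushes;  /* how many times a line was emitted */

static void flush_line(void)
{
    memcpy(console + con_len, out_buf, out_idx);
    con_len += out_idx;
    out_idx = 0;
    flushes++;
}

/* Buffer chars and emit whole lines, so concurrent domains' output is
 * not interleaved mid-line; flush early if the buffer fills up. */
static void vpl011_putc_demo(char c)
{
    out_buf[out_idx++] = c;
    if (c == '\n' || out_idx == OUT_BUF_SIZE)
        flush_line();
}
```

A domain with console input would bypass the buffering and emit char by char, as described above.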
xen: support console_switching between Dom0 and DomUs on ARM
Today Ctrl-AAA is used to switch between Xen and Dom0. Extend the
mechanism to allow for switching between Xen, Dom0, and any of the
initial DomU created from Xen alongside Dom0 out of information provided
via device tree.
Rename xen_rx to console_rx to match the new behavior.
Clarify existing comment about "notify the guest", making it clear that
it is only about the hardware domain.
Switching the console input to domUs started from Xen at boot is
#ifdef'ed to 0 in this patch. The code will be enabled when
vpl011_rx_char_xen is introduced. For now it is disabled for
bisectability.
Move the code to calculate in_fifo_level and out_fifo_level out of
vpl011_data_avail, to the caller.
This change will make it possible to reuse vpl011_data_avail with
different ring structures in a later patch.
Introduce a union in struct vpl011 to contain the console ring members.
A later patch will add another member of the union for the case where
the backend is in Xen.
xen/arm: generate vpl011 node on device tree for domU
Introduce vpl011 support to guests started from Xen: it provides a
simple way to print output from a guest, as most guests come with a
pl011 driver. It is also able to provide a working console with
interrupt support.
The UART exposed to the guest is a SBSA compatible UART and not a PL011.
SBSA UART is a subset of PL011 r1p5. A full PL011 implementation in Xen
would just be too difficult, so guests may require some driver changes.
Enable vpl011 conditionally if the user requested it.
Call a new function, "create_domUs", from setup_xen to start DomU VMs.
Introduce support for the "xen,domain" compatible node on device tree.
Create new DomU VMs based on the information found on device tree under
"xen,domain". Call construct_domU for each domain.
Introduce a simple global variable named max_init_domid to keep track of
the initial allocated domids. It holds the max domid among the initial
domains.
Move the discard_initial_modules after DomUs have been built.
First create domUs, then start dom0 -- no point in trying to start dom0
when the cpu is busy.
xen/arm: move unregister_init_virtual_region to init_done
Move unregister_init_virtual_region to init_done. Follow the same path
as x86. It is also useful to move it later so that create_domUs can be
called before that in following patches.
Introduce an allocate_memory function able to allocate memory for DomUs
and map it at the right guest addresses, according to the guest memory
map: GUEST_RAM0_BASE and GUEST_RAM1_BASE.