Roger Pau Monne [Fri, 6 Oct 2017 13:51:59 +0000 (14:51 +0100)]
libxc: panic when trying to create a PVH guest without kernel support
Previously when trying to boot a PV capable but not PVH capable kernel
inside of a PVH container xc_dom_guest_type would succeed and return a
PV guest type, which would lead to failures later on in the build
process.
Instead provide a clear error message when trying to create a PVH
guest using a kernel that doesn't support PVH.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Mon, 9 Oct 2017 14:27:33 +0000 (16:27 +0200)]
x86emul: re-order cases of main switch statement
Restore intended numerical ordering, which has become "violated"
mostly by incremental additions where moving around bigger chunks did
not seem advisable. One exception though at the very top of the
switch(): Keeping the arithmetic ops together seems preferable over
entirely strict ordering.
Additionally move a few macro definitions before their first uses (the
placement is benign as long as those uses are themselves only macro
definitions, but that's going to change when those macros have helpers
broken out).
No (intended) functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Mon, 9 Oct 2017 14:04:11 +0000 (16:04 +0200)]
fuzz/x86_emulate: clear errors after each iteration
Once feof() returns true for a stream, it will continue to return true
for that stream until clearerr() is called (or the stream is closed
and re-opened).
In llvm-clang-fast-mode, the same file descriptor is used for each
iteration of the loop, meaning that the "Input too large" check was
broken -- feof() would return true even if the fread() hadn't hit the
end of the file. The result is that AFL generates testcases of
arbitrary size.
Fix this by clearing the error after each iteration.
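The sticky end-of-file indicator can be shown with standard C stdio calls alone. This is a minimal sketch: the helper name drain_and_clear() is made up for illustration and is not part of the fuzzing harness.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read fp to the end, then sample the end-of-file indicator before and
 * after clearerr(). Returns the number of bytes consumed.
 */
size_t drain_and_clear(FILE *fp, int *eof_before, int *eof_after)
{
    char buf[64];
    size_t total = 0, n;

    assert(fp != NULL && eof_before != NULL && eof_after != NULL);

    while ( (n = fread(buf, 1, sizeof(buf), fp)) > 0 )
        total += n;

    *eof_before = feof(fp) != 0;  /* stays set once end of file was hit */
    clearerr(fp);                 /* resets the EOF and error indicators */
    *eof_after = feof(fp) != 0;

    return total;
}
```

Without the clearerr() call, a later feof() on the same stream would still report true, which is the condition that defeated the size check.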
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Mon, 9 Oct 2017 14:03:53 +0000 (16:03 +0200)]
fuzz/x86_emulate: actually use cpu_regs input
Commit c07574b reorganized the way fuzzing was done, explicitly
creating a structure that the input data would be copied into.
Unfortunately, the cpu register state used by the emulator is on the
stack; it's cleared, but data is never copied into it.
If we're explicitly setting an entirely new cpu_regs struct for each
new input anyway, there's no need to have two copies around anymore;
just point to the one in the data structure.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 9 Oct 2017 14:03:10 +0000 (16:03 +0200)]
x86emul: fold/eliminate some local variables
Make i switch-wide (at once making it unsigned, as it should have been)
and introduce n (for immediate use in enter and aam/aad handling).
Eliminate on-stack arrays in pusha/popa handling. Use ea.val instead of
a custom variable in bound handling.
No (intended) functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 9 Oct 2017 14:01:22 +0000 (16:01 +0200)]
x86emul/fuzz: add rudimentary limit checking
fuzz_insn_fetch() is the only data access helper where it is possible
to see offsets larger than 4GiB in 16- or 32-bit modes, as we leave the
incoming rIP untouched in the emulator itself. The check is needed here
as otherwise, after successfully fetching insn bytes, we may end up
zero-extending EIP soon after complete_insn, which collides with the
X86EMUL_EXCEPTION-conditional respective ASSERT() in
x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
complete_insn to be reached with rc set to other than X86EMUL_OKAY or
X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
exception handling for rep_* hooks"].)
Add assert()-s for all other (data) access routines, as effective
address generation in the emulator ought to guarantee in-range values.
For them to not trigger, several adjustments to the emulator's address
calculations are needed: While the DstBitBase one is really mandatory,
the specification allows for either original or new behavior for two-
part accesses. Observed behavior on real hardware, however, is for such
accesses to silently wrap at the 2^32 boundary in other than 64-bit
mode, just like they do at the 2^64 boundary in 64-bit mode, which our
code is now being brought in line with. While adding truncate_ea()
invocations there, also convert open coded instances of it.
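The wrapping behaviour can be sketched as follows. truncate_addr() is a hypothetical stand-in and not the emulator's truncate_ea(); ad_bytes is assumed to be the address size in bytes (2, 4 or 8).

```c
#include <assert.h>
#include <stdint.h>

/* Truncate an effective address to the current address size. */
uint64_t truncate_addr(uint64_t ea, unsigned int ad_bytes)
{
    assert(ad_bytes == 2 || ad_bytes == 4 || ad_bytes == 8);

    if ( ad_bytes == 8 )
        return ea;   /* 64-bit arithmetic already wraps at the top boundary */

    return ea & ((UINT64_C(1) << (ad_bytes * 8)) - 1);
}
```

With this model, the second half of a two-part access that starts at 0xfffffffe with a 32-bit address size lands at a low address again instead of above 4GiB.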
Reported-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Sun, 8 Oct 2017 14:12:18 +0000 (15:12 +0100)]
xen/domctl: Fix Xen heap leak via XEN_DOMCTL_getvcpucontext
The backing structure for XEN_DOMCTL_getvcpucontext is only zeroed in the x86
HVM case. At the very least, this means that ARM returns junk through its
flags field (as it is only ever conditionally or'd into), and x86 PV leaks
data through gdt_frames[14...15]. (An exhaustive search for other leaks
hasn't been performed).
Unconditionally zero the memory upon allocation, and forgo the double clear
for x86 HVM. These hypercalls are not on hotpaths.
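A minimal sketch of why zeroing at allocation closes this class of leak. struct demo_ctxt and demo_get_context() are hypothetical stand-ins, not the hypervisor's structures; only the pattern (conditionally written fields in a zeroed buffer) is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a context structure copied back to the caller. */
struct demo_ctxt {
    uint32_t flags;
    unsigned long gdt_frames[16];
};

struct demo_ctxt *demo_get_context(bool online, unsigned int nr_frames)
{
    struct demo_ctxt *c;
    unsigned int i;

    assert(nr_frames <= 16);

    /* calloc() zeroes the whole structure, so unset fields cannot leak. */
    c = calloc(1, sizeof(*c));
    if ( !c )
        return NULL;

    if ( online )
        c->flags |= 1u;                /* only ever or'd in conditionally */

    for ( i = 0; i < nr_frames; i++ )  /* trailing entries stay zero */
        c->gdt_frames[i] = 0x1000 + i;

    return c;
}
```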
Note that this does not qualify for an XSA. Per XSA-77,
XEN_DOMCTL_getvcpucontext is unsafe for disaggregation, meaning that only the
control domain can use this hypercall.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Julien Grall <julien.grall@linaro.org>
Julien Grall [Mon, 9 Oct 2017 11:26:35 +0000 (13:26 +0200)]
xenoprof: convert the file to use typesafe MFN
The file common/xenoprof.c is now converted to use typesafe MFNs. This
requires overriding the macros virt_to_mfn and mfn_to_page to make
them work with mfn_t.
Also, add a couple of missing newlines in the modified code.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Mon, 9 Oct 2017 11:24:50 +0000 (13:24 +0200)]
x86: use maddr_to_page and maddr_to_mfn to avoid open-coded >> PAGE_SHIFT
The constructions _mfn(... >> PAGE_SHIFT) and mfn_to_page(... >> PAGE_SHIFT)
could respectively be replaced by maddr_to_mfn(...) and
maddr_to_page(...).
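For readers unfamiliar with the typesafe wrappers, here is a simplified stand-alone sketch. All demo_* names and the 4KiB page size are assumptions for illustration; these are not the hypervisor's actual definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4KiB pages */

/* Wrapping the integer in a struct stops it mixing silently with other types. */
typedef struct { unsigned long mfn; } demo_mfn_t;

static inline demo_mfn_t demo_mfn(unsigned long m) { return (demo_mfn_t){ m }; }
static inline unsigned long demo_mfn_x(demo_mfn_t m) { return m.mfn; }

/* One named helper instead of an open-coded shift at every call site. */
static inline demo_mfn_t demo_maddr_to_mfn(uint64_t maddr)
{
    /* Guard against truncation where unsigned long is narrower than 64 bits. */
    assert((maddr >> DEMO_PAGE_SHIFT) <= ULONG_MAX);

    return demo_mfn((unsigned long)(maddr >> DEMO_PAGE_SHIFT));
}
```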
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Dario Faggioli [Mon, 9 Oct 2017 11:24:01 +0000 (13:24 +0200)]
RCU: make the period of the idle timer adaptive
Basically, if the RCU idle timer, when (if!) it fires,
finds that the grace period isn't over, we increase the
timer's period (i.e., it will fire later, next time).
If, OTOH, it finds the grace period is already finished,
we decrease the timer's period (i.e., it will fire a bit
earlier next time).
The goal is to let the period timer self-adjust to a
number of 'misses', of the order of 1%.
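The adjustment rule can be sketched as below. The bounds and step sizes are illustrative assumptions (this message does not state them), and demo_adapt_period() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All four constants are illustrative assumptions, in microseconds. */
#define DEMO_PERIOD_MIN_US      100
#define DEMO_PERIOD_MAX_US   100000
#define DEMO_PERIOD_INCR_US   10000
#define DEMO_PERIOD_DECR_US     100

/* Called when the idle timer fires; returns the period to use next time. */
int64_t demo_adapt_period(int64_t period_us, bool grace_period_over)
{
    assert(period_us >= DEMO_PERIOD_MIN_US && period_us <= DEMO_PERIOD_MAX_US);

    if ( !grace_period_over )
    {
        /* Fired too early: back off so the next wakeup comes later. */
        period_us += DEMO_PERIOD_INCR_US;
        if ( period_us > DEMO_PERIOD_MAX_US )
            period_us = DEMO_PERIOD_MAX_US;
    }
    else
    {
        /* The grace period had already ended: fire a bit earlier next time. */
        period_us -= DEMO_PERIOD_DECR_US;
        if ( period_us < DEMO_PERIOD_MIN_US )
            period_us = DEMO_PERIOD_MIN_US;
    }

    return period_us;
}
```

Making the increase much larger than the decrease is one way to keep the share of early firings small; the exact ratio here is a guess.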
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Mon, 9 Oct 2017 11:23:24 +0000 (13:23 +0200)]
RCU: make the period of the idle timer configurable
Make it possible for the user to specify, with the boot
time parameter rcu-idle-timer-period-ms, how frequently
a CPU that went idle with pending RCU callbacks should be
woken up to check if the grace period ended.
Typical values (i.e., some of the values used by Linux as
the tick frequency) are 10, 4 or 1 ms. Default value (used
when this parameter is not specified) is 10ms. Maximum is
100ms.
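On the command line this would look like rcu-idle-timer-period-ms=4. The sketch below shows one way such a value could be validated against the stated default and maximum. Falling back to the default for zero or out-of-range input is an assumption, and demo_parse_period_ms() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_PERIOD_DEFAULT_MS  10
#define DEMO_PERIOD_MAX_MS     100

/* Assumption: malformed, zero or too large requests fall back to the default. */
unsigned int demo_parse_period_ms(const char *arg)
{
    char *end;
    unsigned long val;

    assert(arg != NULL);

    val = strtoul(arg, &end, 10);
    if ( end == arg || *end != '\0' || val == 0 || val > DEMO_PERIOD_MAX_MS )
        return DEMO_PERIOD_DEFAULT_MS;

    return (unsigned int)val;
}
```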
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Mon, 9 Oct 2017 11:22:07 +0000 (13:22 +0200)]
RCU: let the RCU idle timer handler run
If stop_timer() is called between when the RCU
idle timer's interrupt arrives (and TIMER_SOFTIRQ is
raised) and when softirqs are checked and handled, the
timer is deactivated, and the handler never runs.
This happens to the RCU idle timer because stop_timer()
is called on it during the wakeup from idle (e.g., C-states,
on x86) path.
To fix that, we avoid calling stop_timer(), in case we see
that the timer itself is:
- still active,
- expired (i.e., its expiry time is in the past).
In fact, that indicates (for this particular timer) that
it has fired, and we are just about to handle the TIMER_SOFTIRQ
(which will perform the timer deactivation and run its handler).
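The check described above amounts to a small predicate. A sketch with a hypothetical struct demo_timer; treating "expires equal to now" as expired is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_timer {
    bool active;       /* armed and not yet deactivated */
    int64_t expires;   /* absolute expiry time */
};

/* True if stopping the timer cannot lose a pending handler run. */
bool demo_may_stop_timer(const struct demo_timer *t, int64_t now)
{
    assert(t != NULL);

    /*
     * Active and already expired means the timer has fired and the softirq
     * is about to run its handler, so it must be left alone.
     */
    return !(t->active && t->expires <= now);
}
```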
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm: p2m: Read *_mapped_gfn with the p2m lock taken
*_mapped_gfn are currently read before acquiring the lock. However, they
may be modified by the p2m code before the lock was acquired. This means
we will use the wrong values.
Fix it by moving the read inside the section protected by the p2m lock.
Dario Faggioli [Fri, 6 Oct 2017 16:02:34 +0000 (18:02 +0200)]
MAINTAINERS: update entries to Dario's new email address
Replace, in the 'M:' fields of the components I co-maintain
('CPU POOLS', 'SCHEDULING' and 'RTDS SCHEDULER'), the Citrix
email, to which I don't have access any longer, with my
personal email.
Awais Masood [Fri, 6 Oct 2017 16:01:50 +0000 (18:01 +0200)]
ns16550: fix ISR lockup on Allwinner uart
This patch fixes an ISR lockup seen on Allwinner uart.
On Allwinner H5, the serial driver goes into an infinite loop
when interrupts are enabled. The reason is a residual
"busy detect" interrupt. Since the condition UART_IIR_NOINT
will not be true unless this interrupt is cleared, the
interrupt handler will remain locked up in this while loop.
It checks for a busy condition during setup and clears the
condition by reading UART_USR register.
On Allwinner hardware, the "busy detect" condition occurs
later because an LCR write is performed during setup 'after'
this clear and if uart is busy, the "busy detect" condition
will trigger again and cause the ISR lockup.
To solve this problem, the same UART_USR read operation needs
to be performed within the interrupt handler to clear this
condition.
The Linux dw 8250 driver also handles this condition within its
interrupt handler:
http://elixir.free-electrons.com/linux/latest/source/drivers/tty/serial/8250/8250_dw.c#L233
Tested on Orange Pi PC2 (H5). This issue is seen on H3
as well and the same fix works.
Signed-off-by: Awais Masood <awais.masood@vadion.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Thu, 5 Oct 2017 17:42:16 +0000 (18:42 +0100)]
xen/x86: mem_sharing: Use copy_domain_page in __mem_sharing_unshare_page
The function __mem_sharing_unshare_page contains an open-coded version of
copy_domain_page. Use the function to simplify the code a bit.
At the same time replace _mfn(__page_to_mfn(...)) by page_to_mfn(...)
given that the file already provides a typesafe version of page_to_mfn.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:21:04 +0000 (16:21 +0100)]
x86/np2m: add break to np2m_flush_eptp()
Now that np2m sharing is implemented, there can be only one np2m object
with the same np2m_base. Break from loop if the required np2m was found
during np2m_flush_eptp().
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:21:03 +0000 (16:21 +0100)]
x86/np2m: refactor p2m_get_nestedp2m_locked()
Remove some code duplication.
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:21:02 +0000 (16:21 +0100)]
x86/np2m: implement sharing of np2m between vCPUs
At the moment, nested p2ms are not shared between vcpus even if they
share the same base pointer.
Modify p2m_get_nestedp2m() to allow sharing a np2m between multiple
vcpus with the same np2m_base (L1 np2m_base value in VMCx12).
If the current np2m doesn't match the current base pointer, first look
for another nested p2m in the same domain with the same base pointer,
before reclaiming one from the LRU.
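The lookup order (current table, then any table with the same base, then LRU reclaim) can be sketched as below. The structure layout, the timestamp-based LRU and all demo_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_np2m {
    uint64_t base;        /* base pointer this table shadows */
    uint64_t last_used;   /* larger means more recently used */
};

/*
 * Pick a table for 'base': keep the current one if it matches, otherwise
 * share any table with the same base, otherwise recycle the least recently
 * used one. Returns the chosen index.
 */
size_t demo_get_np2m(struct demo_np2m *tbl, size_t n, size_t cur,
                     uint64_t base, uint64_t now)
{
    size_t i, pick = 0;

    assert(tbl != NULL && n > 0 && cur < n);

    if ( tbl[cur].base == base )
        pick = cur;
    else
    {
        for ( i = 0; i < n; i++ )
            if ( tbl[i].base == base )
                break;

        if ( i < n )
            pick = i;                    /* share an existing table */
        else
        {
            for ( i = 1; i < n; i++ )    /* reclaim the least recently used */
                if ( tbl[i].last_used < tbl[pick].last_used )
                    pick = i;
            tbl[pick].base = base;
        }
    }

    tbl[pick].last_used = now;
    return pick;
}
```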
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:21:01 +0000 (16:21 +0100)]
x86/np2m: send flush IPIs only when a vcpu is actively using an np2m
Flush IPIs are sent to all cpus in an np2m's dirty_cpumask when
updated. This mask however is far too broad. A pcpu's bit is set in
the cpumask when a vcpu runs on that pcpu, but is only cleared when a
flush happens. This means that the IPI includes the current pcpu of
vcpus that are not currently running, and also includes any pcpu that
has ever had a vcpu use this p2m since the last flush (which in turn
will cause spurious invalidations if a different vcpu is using an np2m).
Avoid these IPIs by keeping closer track of where an np2m is being used,
and when a vcpu needs to be flushed:
- On schedule-out, clear v->processor in p2m->dirty_cpumask
- Add a 'generation' counter to the p2m and nestedvcpu structs to
detect changes that would require re-loads on re-entry
- On schedule-in or p2m change:
- Set v->processor in p2m->dirty_cpumask
- flush the vcpu's nested p2m pointer (and update nv->generation) if
the generation changed
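The generation counter idea can be sketched in isolation. The structures and function names below are hypothetical; only the compare-and-update pattern is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_np2m  { uint64_t generation; };   /* bumped on every flush */
struct demo_nvcpu { uint64_t generation; };   /* last generation the vcpu saw */

void demo_np2m_flush(struct demo_np2m *p2m)
{
    assert(p2m != NULL);
    p2m->generation++;
}

/*
 * Called on schedule-in or p2m change. Returns true if the vcpu's cached
 * nested p2m pointer is out of date and must be flushed by the caller.
 */
bool demo_vcpu_sync(struct demo_nvcpu *nv, const struct demo_np2m *p2m)
{
    assert(nv != NULL && p2m != NULL);

    if ( nv->generation == p2m->generation )
        return false;

    nv->generation = p2m->generation;
    return true;
}
```

A vcpu that was not running when the flush happened needs no IPI under this scheme; it notices the changed generation the next time it is scheduled in.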
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:21:00 +0000 (16:21 +0100)]
x86/vvmx: make updating shadow EPTP value more efficient
At the moment, the shadow EPTP value is written unconditionally in
ept_handle_violation().
Instead, write the value on vmentry to the guest; but only write it if
the value needs updating.
To detect this, add a flag to the nestedvcpu struct, stale_np2m, to
indicate when such an action is necessary. Set it when the nested p2m
changes or when the np2m is flushed by an IPI, and clear it when we
write the new value.
Since an IPI invalidating the p2m may happen between
nvmx_switch_guest() and vmx_vmenter, but we can't perform the vmwrite
with interrupts disabled, check the flag just before entering the
guest and restart the vmentry if it's set.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
There is a possibility for nested_p2m to become stale between
nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m(). At the moment
this is handled by detecting such a race inside nestedhap_fix_p2m() and
special-casing it.
Instead, introduce p2m_get_nestedp2m_locked(), which will return a
still-locked p2m. This allows us to call nestedhap_fix_p2m() with the
lock held and remove the code detecting the special-case.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:20:58 +0000 (16:20 +0100)]
x86/np2m: remove np2m_base from p2m_get_nestedp2m()
Remove np2m_base parameter as it should always match the value of
np2m_base in VMCx12.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:20:57 +0000 (16:20 +0100)]
x86/np2m: flush all np2m objects on nested INVEPT
At the moment, nvmx_handle_invept() updates the current np2m just to
flush it. Instead introduce a function, np2m_flush_base(), which will
look up the np2m base pointer and call p2m_flush_table() instead.
Unfortunately, since we don't know which p2m a given vcpu is using, we
must flush all p2ms that share that base pointer.
Convert p2m_flush_table() into p2m_flush_table_locked() in order not
to release the p2m_lock after np2m_base check.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Sergey Dyasli [Tue, 3 Oct 2017 15:20:56 +0000 (16:20 +0100)]
x86/np2m: refactor p2m_get_nestedp2m()
1. Add a helper function assign_np2m()
2. Remove useless volatile
3. Update function's comment in the header
4. Minor style fixes ('\n' and d)
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Wei Liu [Thu, 5 Oct 2017 09:35:28 +0000 (10:35 +0100)]
libxl: use correct type modifier for vuart_gfn
Fixes compilation error like:
libxl_console.c: In function ‘libxl__device_vuart_add’:
libxl_console.c:379:5: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘xen_pfn_t’ [-Werror=format=]
flexarray_append(ro_front, GCSPRINTF("%lu", state->vuart_gfn));
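The usual cure for this class of warning is to pair the type with its own format macro instead of hard-coding %lu. A sketch with hypothetical names: demo_pfn_t and PRI_demo_pfn stand in for the real type and for whichever modifier the fix uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical frame number type together with its matching format macro. */
typedef uint64_t demo_pfn_t;
#define PRI_demo_pfn PRIu64

int demo_format_gfn(char *buf, size_t len, demo_pfn_t gfn)
{
    assert(buf != NULL && len > 0);

    /* The macro expands to the right conversion whatever the typedef is. */
    return snprintf(buf, len, "%" PRI_demo_pfn, gfn);
}
```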
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
livepatch: Expand check for safe_for_reapply if livepatch has only .rodata.
If the livepatch has only .rodata sections then it is OK to also
apply/revert/apply the livepatch without having to worry about the
unforeseen consequences.
Ross Lagerwall [Wed, 28 Jun 2017 16:13:44 +0000 (17:13 +0100)]
livepatch: Declare live patching as a supported feature
See docs/features/livepatch.pandoc for the details.
Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the .bug.frames.X or .livepatch.funcs sizes are different
than what the hypervisor expects - we fail the payload. To help
in diagnosing this include the expected and the payload
sizes.
Also make it more natural by having "Multiples" in the warning.
Also fix one case where we would fail if the size of the .ex_table
was zero - but that is OK.
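A sketch of such a size check, reporting both sizes on failure and accepting an empty section. The function name and the message wording are made up for illustration and do not reproduce the hypervisor's output.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 0 if payload_size is a whole number of entries, -1 otherwise. */
int demo_check_size(const char *name, size_t payload_size, size_t entry_size,
                    char *msg, size_t msg_len)
{
    assert(name != NULL && msg != NULL && msg_len > 0 && entry_size > 0);

    msg[0] = '\0';

    /* An empty section is fine: zero is a multiple of any entry size. */
    if ( payload_size % entry_size == 0 )
        return 0;

    snprintf(msg, msg_len, "%s: size %zu is not a multiple of %zu",
             name, payload_size, entry_size);
    return -1;
}
```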
Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The ELF specification mentions nothing about the sh_size being
modulo the sh_addralign. Only that sh_addr MUST be aligned on
sh_addralign if sh_addralign is not zero or one.
On loading we did not take this into account, so this patch adds
a check on the ELF file as it is being parsed.
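The alignment rule reads as follows in code. struct demo_shdr is a cut-down, hypothetical section header holding only the two fields involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two section header fields relevant to the check. */
struct demo_shdr {
    uint64_t sh_addr;
    uint64_t sh_addralign;
};

bool demo_sh_addr_aligned(const struct demo_shdr *s)
{
    assert(s != NULL);

    /* Values 0 and 1 both mean "no alignment constraint". */
    if ( s->sh_addralign <= 1 )
        return true;

    return (s->sh_addr % s->sh_addralign) == 0;
}
```

Note that the section size plays no part in the check.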
Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Joao Martins [Tue, 3 Oct 2017 17:46:08 +0000 (18:46 +0100)]
public/io/netif.h: add gref mapping control messages
Adds 3 messages to allow guest to let backend keep grants mapped,
such that 1) guests allowing fast recycling of pages can avoid doing
grant ops for those cases, or otherwise 2) preferring copies over
grants and 3) always using a fixed set of pages for network I/O.
The three control ring messages added are:
- Add grefs to be mapped by backend
- Remove grefs mappings (If they are not in use)
- Get maximum amount of grefs kept mapped.
Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Juergen Gross [Wed, 4 Oct 2017 12:24:13 +0000 (14:24 +0200)]
gnttab: make resource limits per domain
Instead of using the same global resource limits of grant tables (max.
number of grant frames, max. number of maptrack frames) for all domains
make these limits per domain. Set those per-domain limits in
grant_table_set_limits(). The global settings are serving as an upper
boundary now which must not be exceeded by a per-domain value. The
default of max_grant_frames is set to the maximum default xl will use.
While updating the semantics of the boot parameters remove the
documentation of the no longer existing gnttab_max_nr_frames and
correct the default gnttab_max_maptrack_frames uses.
Segment bases (and limits) aren't being cleared by the loading of a nul
selector into a segment register on AMD CPUs. Therefore, if an
outgoing vCPU has a non-zero base in FS or GS and the subsequent
incoming vCPU has a non-zero but nul selector in the respective
register(s), the selector value(s) would be loaded without clearing the
segment base(s) in the hidden register portion.
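For reference, "non-zero but nul" covers selectors 1 to 3: index 0 in the GDT with a non-zero RPL. The sketch below models only the intended outcome (a nul selector never keeps a stale base); it is not the context switch code, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_seg {
    uint16_t sel;    /* visible selector */
    uint64_t base;   /* hidden base */
};

/* Selectors 0-3 are all nul: index 0 in the GDT, any RPL in bits 0-1. */
static bool demo_is_nul_selector(uint16_t sel)
{
    return (sel & ~3u) == 0;
}

/* Model of the intended result: a nul selector never keeps a stale base. */
void demo_load_seg(struct demo_seg *seg, uint16_t sel, uint64_t base)
{
    assert(seg != NULL);

    seg->sel = sel;
    seg->base = demo_is_nul_selector(sel) ? 0 : base;
}
```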
Since the ABI states "zero" in its description of the fs and gs fields,
it is worth noting that the chosen approach to fix this alters the
written down ABI. I consider this preferable over enforcing the
previously written down behavior, as nul selectors are far more likely
to be what was meant from the beginning.
The adjustments also eliminate an inconsistency between FS and GS
handling: Old code had an extra pointless (gs_base_user was always zero
when DIRTY_GS was set) conditional for GS. The old bitkeeper changeset
has no explanation for this asymmetry.
Additionally for DS and ES a flat selector is being loaded prior to the
loading of a nul one on AMD CPUs, just as a precautionary measure
(we're not currently aware of ways for a guest to deduce the base of a
segment register which has a nul selector loaded).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen/arm: vpl011: Add a pl011 uart DT node in the guest device tree
The SBSA UART node format is as specified in
Documentation/devicetree/bindings/serial/arm_sbsa_uart.txt and given below:
ARM SBSA defined generic UART
------------------------------
This UART uses a subset of the PL011 registers and consequently lives
in the PL011 driver. It's baudrate and other communication parameters
cannot be adjusted at runtime, so it lacks a clock specifier here.
Required properties:
- compatible: must be "arm,sbsa-uart"
- reg: exactly one register range
- interrupts: exactly one interrupt specifier
- current-speed: the (fixed) baud rate set by the firmware
Currently the baud rate of 115200 has been selected as a default value,
which is one of the valid baud rate settings. Higher baud rate was
selected since an emulated pl011 can support any valid baud rate without
any limitation of the hardware.
xen/arm: vpl011: Add support for multiple consoles in xenconsole
This patch adds the support for multiple consoles and introduces the
iterator functions to operate on multiple consoles.
The functions called by the iterators check that they are operating
on valid I/O parameters. This ensures that if a particular console is
not initialized then the functions will not do anything for that
console type.
This patch is in preparation to support a new vuart console.
Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> Acked-by: Wei Liu <wei.liu2@citrix.com>
xen/arm: vpl011: Add a new handle_console_ring function in xenconsole
This patch introduces a new handle_console_ring function. This function
reads the data from the ring buffer on receiving an event.
The initialization of event channel poll fd to -1 is moved inside the
handle_console_ring function as they are related. There should be no
change in the behavior as there is no functional change.
xen/arm: vpl011: Modify xenconsole functions to take console structure as input
Xenconsole functions take domain structure as input. These functions shall be
modified to take console structure as input since these functions typically perform
console specific operations.
Also the console specific functions starting with prefix "domain_" shall be modified
to "console_" to indicate that these are console specific functions.
This patch is in preparation to support multiple consoles to support vuart console.
xen/arm: vpl011: Modify xenconsole to define and use a new console structure
Xenconsole uses a domain structure which contains console specific fields. This
patch defines a new console structure, which would be used by the xenconsole
functions to perform console specific operations like reading/writing data from/to
the console ring buffer or reading/writing data from/to console tty.
This patch is in preparation to support multiple consoles to support vuart console.
xen/arm: vpl011: Add a new domctl API to initialize vpl011
Add a new domctl API to initialize vpl011. It takes the GFN and console
backend domid as input and returns an event channel to be used for
sending and receiving events from Xen.
Xen will communicate with xenconsole using GFN as the ring buffer and
the event channel to transmit and receive pl011 data on the guest domain's
behalf.
Add emulation code to emulate read/write access to pl011 registers
and pl011 interrupts:
- Emulate DR read/write by reading and writing from/to the IN
and OUT ring buffers and raising an event to the backend when
there is data in the OUT ring buffer and injecting an interrupt
to the guest when there is data in the IN ring buffer
- Other registers are related to interrupt management and
essentially control when interrupts are delivered to the guest
This patch implements the SBSA Generic UART which is a subset of ARM
PL011 UART.
The SBSA Generic UART is covered in Appendix B of
https://static.docs.arm.com/den0029/a/Server_Base_System_Architecture_v3_1_ARM_DEN_0029A.pdf
xen/arm: vpl011: Define common ring buffer helper functions in console.h
DEFINE_XEN_FLEX_RING(xencons) defines common helper functions such as
xencons_queued() to tell the current size of the ring buffer,
xencons_mask() to mask off the index, which are useful helper functions.
pl011 emulation code will use these helper functions.
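To show what such helpers do, here is a simplified model using free-running 32-bit indices and a power-of-two buffer size. demo_mask() and demo_queued() are illustrative only; they are not the functions that DEFINE_XEN_FLEX_RING generates and may differ from them in detail.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_ring_idx;

/* Wrap a free-running index into the buffer; size must be a power of two. */
static inline demo_ring_idx demo_mask(demo_ring_idx idx, demo_ring_idx size)
{
    assert(size != 0 && (size & (size - 1)) == 0);

    return idx & (size - 1);
}

/*
 * Bytes produced but not yet consumed. Unsigned wraparound keeps the result
 * correct even after the indices overflow.
 */
static inline demo_ring_idx demo_queued(demo_ring_idx prod, demo_ring_idx cons)
{
    return (demo_ring_idx)(prod - cons);
}
```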
io/console.h includes io/ring.h which defines DEFINE_XEN_FLEX_RING.
In console/daemon/io.c, string.h had to be included before io/console.h
because ring.h uses string functions.
Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Julien Grall [Mon, 2 Oct 2017 12:59:41 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Rework prototype of p2m_pod_demand_populate
- Switch the return type to bool
- Remove the parameter p2m_query_t q as it is not used
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:40 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Use typesafe gfn for the fields reclaim_single and max_guest
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:39 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Use typesafe gfn in p2m_pod_demand_populate
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:38 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Use typesafe gfn in p2m_pod_zero_check
At the same time make the array gfns const as it is not modified within
the function.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:37 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Clean-up p2m_pod_zero_check
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:36 +0000 (13:59 +0100)]
xen/x86: p2m-pod: Use typesafe GFN in pod_eager_record
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:35 +0000 (13:59 +0100)]
xen/x86: p2m: Use typesafe GFN in p2m_set_entry
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 12:59:34 +0000 (13:59 +0100)]
xen/x86: p2m: Use typesafe gfn for the P2M callbacks get_entry and set_entry
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Julien Grall [Mon, 2 Oct 2017 15:40:04 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Use typesafe gfn in p2m_pod_decrease_reservation
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:40:03 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Clean-up use of typesafe MFN
Some unboxing/boxing can be avoided by using mfn_add(...) instead.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:40:02 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Avoid redundant assignments in p2m_pod_demand_populate
gfn_aligned is assigned 3 times with the exact same formula. All the
variables used are not modified, so consolidate in a single assignment
at the beginning of the function.
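The formula itself is not quoted in this message. Assuming it rounds the gfn down to a 2^order boundary, the single assignment would compute something like the hypothetical helper below.

```c
#include <assert.h>
#include <limits.h>

/* Round gfn down to a multiple of 2^order (assumed meaning of "aligned"). */
unsigned long demo_gfn_align(unsigned long gfn, unsigned int order)
{
    assert(order < sizeof(gfn) * CHAR_BIT);

    return (gfn >> order) << order;
}
```

Since neither input changes within the function, computing the value once at the top gives the same result as recomputing it at each use.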
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:40:01 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Fix coding style
Also take the opportunity to:
- move from 1 << * to 1UL << *.
- use unsigned when possible
- move from unsigned int -> unsigned long for some induction
variables
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:40:01 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Fix coding style for comments
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:40:00 +0000 (16:40 +0100)]
xen/x86: p2m-pod: Remove trailing whitespaces
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Mon, 2 Oct 2017 15:39:59 +0000 (16:39 +0100)]
xen/x86: p2m-pod: Clean-up includes
A lot of the headers are not necessary. At the same time, sort them in
alphabetical order.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Petre Pircalabu [Mon, 2 Oct 2017 15:04:56 +0000 (16:04 +0100)]
x86/monitor: Notify monitor if an emulation fails.
In case of a vm_event with the emulate_flags set, if the instruction
is not implemented by the emulator, the monitor should be notified instead
of directly injecting a hw exception.
This behavior can be used to re-execute an instruction not supported by
the emulator using the real processor (e.g. altp2m) instead of just
crashing.
Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Petre Pircalabu [Mon, 2 Oct 2017 15:04:55 +0000 (16:04 +0100)]
x86emul: Add return code information to error messages
- print the return code of the last failed emulator operation
in hvm_dump_emulation_state.
- print the return code in sh_page_fault (SHADOW_PRINTK) to make the
distinction between X86EMUL_UNHANDLEABLE and X86EMUL_UNIMPLEMENTED.
Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Petre Pircalabu [Mon, 2 Oct 2017 15:04:54 +0000 (16:04 +0100)]
x86emul: New return code for unimplemented instruction
Enforce the distinction between an instruction not implemented by the
emulator and the failure to emulate that instruction by defining a new
return code, X86EMUL_UNIMPLEMENTED.
This value should only be returned by the core emulator when a valid
opcode is found but the execution logic for that instruction is missing.
It should NOT be returned by any of the x86_emulate_ops callbacks.
e.g. hvm_process_io_intercept should not return X86EMUL_UNIMPLEMENTED.
The return value of this function depends on either the return code of
one of the hvm_io_ops handlers (read/write) or the value returned by
hvm_copy_guest_from_phys / hvm_copy_to_guest_phys.
Similarly, none of these functions should return X86EMUL_UNIMPLEMENTED:
- hvm_io_intercept
- hvmemul_do_io
- hvm_send_buffered_ioreq
- hvm_send_ioreq
- hvm_broadcast_ioreq
- hvmemul_do_io_buffer
- hvmemul_validate
Also the behavior of hvm_emulate_one_insn and vmx_realmode_emulate_one
was modified to generate an Invalid Opcode trap when X86EMUL_UNRECOGNIZED
is returned by the emulator instead of just crashing the domain.
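The split between the two failure kinds might be pictured like this. It is only a sketch: the enum names, their values, and the action type are invented for the example, the "unimplemented" and "unrecognized" cases are collapsed into one code for simplicity, and it assumes that an unhandleable failure still crashes the domain.

```c
#include <assert.h>

/* Illustrative return codes; names and values are assumptions. */
enum emul_rc { EMUL_OKAY, EMUL_UNHANDLEABLE, EMUL_UNIMPLEMENTED };

/* What the caller of the emulator does next (hypothetical). */
enum emul_action { ACT_CONTINUE, ACT_CRASH_DOMAIN, ACT_INJECT_UD };

/* Caller side: a missing instruction becomes an Invalid Opcode trap
 * for the guest, while a genuine emulation failure is still fatal. */
static enum emul_action handle_emul_rc(enum emul_rc rc)
{
    switch ( rc )
    {
    case EMUL_OKAY:
        return ACT_CONTINUE;
    case EMUL_UNIMPLEMENTED:
        return ACT_INJECT_UD;
    case EMUL_UNHANDLEABLE:
    default:
        return ACT_CRASH_DOMAIN;
    }
}

/* Callback side: only the core emulator may report "unimplemented",
 * so a callback result carrying that code is a programming error. */
static enum emul_rc check_callback_rc(enum emul_rc rc)
{
    assert(rc != EMUL_UNIMPLEMENTED);
    return rc;
}
```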
Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Tue, 26 Sep 2017 16:08:33 +0000 (17:08 +0100)]
x86/svm: Fix a livelock when trying to run shadowed unpaged guests
On AMD processors which support SMEP (Some Fam16h processors) and SMAP (Zen,
Fam17h), a guest which is running with shadow paging and clears CR0.PG while
keeping CR4.{SMEP,SMAP} set will livelock, as hardware raises #PF which the
shadow pagetable concludes shouldn't happen.
This occurs because hardware is running with host paging settings, which
causes the guest's choice of SMEP/SMAP to actually take effect, even though
they shouldn't from the guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
xen/arm: Correctly report the memory region in the dummy NUMA helpers
NUMA is currently not supported on Arm. Because common code is
NUMA-aware, dummy helpers are instead provided to expose a single node.
Those helpers are for instance used to know the region to scrub.
However, the memory region is not reported correctly. Indeed, the
frametable may not be at the beginning of the memory and there might be
multiple memory banks. As a result, some parts of the memory are not
scrubbed.
The memory information can be found using:
* first_valid_mfn as the start of the memory
* max_page - first_valid_mfn as the spanned pages
Note that first_valid_mfn is now exported. The prototype has been
added in asm-arm/numa.h and not in a common header because I would
expect the variable to become static once NUMA is fully supported on
Arm.
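The two quantities listed above can be sketched for a single dummy node as below. The struct and function names are hypothetical, and the frame numbers are held as plain unsigned long for simplicity; only first_valid_mfn and max_page come from the commit message.

```c
#include <assert.h>

/* Hypothetical snapshot of the two values the dummy helpers rely on. */
struct mem_layout_sketch {
    unsigned long first_valid_mfn; /* lowest frame handed to the allocator */
    unsigned long max_page;        /* one past the highest frame */
};

/* Start of the single node: the first valid frame, not frame 0. */
static unsigned long node_start_sketch(const struct mem_layout_sketch *m)
{
    return m->first_valid_mfn;
}

/* Pages spanned by the single node, holes between banks included. */
static unsigned long node_spanned_sketch(const struct mem_layout_sketch *m)
{
    assert(m->max_page >= m->first_valid_mfn);
    return m->max_page - m->first_valid_mfn;
}
```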
xen/page_alloc: Cover memory unreserved after boot in first_valid_mfn
On Arm, some regions (e.g. Initramfs, Dom0 Kernel...) are marked as
reserved until the hardware domain is built and they are copied into its
memory. Therefore, they will not be added in the boot allocator via
init_boot_pages.
Instead, init_xenheap_pages will be called once the regions are no
longer used.
Update first_valid_mfn in both init_heap_pages and init_boot_pages
(where the update already exists) to cover all the cases.
This is XSA-245.
Signed-off-by: Julien Grall <julien.grall@arm.com>
[Adjust comment, added locking around first_valid_mfn update] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/arm: Fix the issue in cmp_mmio_handler used in find_mmio_handler
This patch fixes the wrong range check done in cmp_mmio_handler().
This function returns -1, 0 or 1 based on whether the key value
is below the range, in the range or above the range where the range is
(start, start+size). However, it should check against (start, start+size-1)
because start+size falls outside the range.
This resulted in returning a wrong mmio_handler for a given mmio address which
happened to be start+size.
This bug was introduced when the mmio region search switched from
linear search to binary search in the following commit:
8047e09 "xen/arm: io: Use binary search for mmio handler lookup".
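The off-by-one described above can be sketched with the standard bsearch(). The struct and function names here are made up for the illustration and the real handler structure is not reproduced; the point is only that an address equal to start+size must compare as above the range.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region descriptor; valid addresses are
 * addr .. addr + size - 1 inclusive. */
struct mmio_region_sketch {
    uint64_t addr;
    uint64_t size;
};

/* Comparator for bsearch(): -1 below the region, 1 above, 0 inside.
 * Using ">=" matters: addr + size is the first address outside. */
static int cmp_mmio_region(const void *key, const void *elem)
{
    const struct mmio_region_sketch *k = key;
    const struct mmio_region_sketch *r = elem;

    if ( k->addr < r->addr )
        return -1;
    if ( k->addr >= r->addr + r->size )
        return 1;
    return 0;
}

/* Look up the region containing addr in an array sorted by addr. */
static const struct mmio_region_sketch *
find_mmio_region(const struct mmio_region_sketch *regions, size_t n,
                 uint64_t addr)
{
    struct mmio_region_sketch key = { addr, 0 };

    assert(regions != NULL);
    return bsearch(&key, regions, n, sizeof(*regions), cmp_mmio_region);
}
```

With two adjacent regions, an access to the first address of the second region resolves to the second region rather than to the first, which is the behaviour the fix restores.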
xen: fail gnttab_grow_table() in case of missing allocations
In case gnttab_grow_table() is being called without
grant_table_set_limits() having been called for the domain, e.g. in
case of a toolstack error, fail the function instead of crashing the
system.
While at it let gnttab_grow_table() return a proper error code instead
of 1 for success.
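A minimal sketch of the calling convention this implies: 0 on success and a negative errno value on failure, with the "limits never set" case rejected up front. The struct, the function name, and the specific error codes chosen are assumptions for the example, not the actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified grant table state. */
struct grant_table_sketch {
    unsigned int nr_frames;  /* frames currently in use */
    unsigned int max_frames; /* 0 until the limits have been set */
};

/* Grow the table to req_frames. Returns 0 on success or a negative
 * errno value, instead of 1 for success. */
static int grow_table_sketch(struct grant_table_sketch *gt,
                             unsigned int req_frames)
{
    assert(gt != NULL);

    if ( gt->max_frames == 0 )        /* limits never set: fail, don't crash */
        return -ENODEV;
    if ( req_frames > gt->max_frames )
        return -EINVAL;

    if ( req_frames > gt->nr_frames )
        gt->nr_frames = req_frames;

    return 0;
}
```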
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 29 Sep 2017 12:29:21 +0000 (13:29 +0100)]
xen/gnttab: Clean up goto tangle in grant_table_init()
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
It should be possible to re-introduce it in the future with a proper
implementation, in order to create a HVM guest without a device model,
which is slightly different from a PVHv2 guest.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>