Jan Beulich [Fri, 13 May 2016 17:15:34 +0000 (18:15 +0100)]
x86: reduce code size of struct cpu_info member accesses
Instead of addressing these fields via the base of the stack (which
uniformly requires 4-byte displacements), address them from the end
(which for everything other than guest_cpu_user_regs requires just
1-byte ones). This yields a code size reduction somewhere between 8k
and 12k in my builds.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
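The saving comes from x86 instruction encoding: a displacement that fits in a signed 8-bit value takes one byte, anything larger takes four. The arithmetic can be sketched as follows, assuming a hypothetical 32 KiB stack and an invented structure layout (neither is taken from the Xen sources):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; not Xen's real stack size or layout. */
#define SKETCH_STACK_SIZE 32768L

struct cpu_info_sketch {
    uint8_t guest_cpu_user_regs[200]; /* stand-in for the large register frame */
    unsigned int processor_id;
    void *current_vcpu;
    unsigned long per_cpu_offset;
};

/* x86 can encode a displacement in one byte only if it is a signed 8-bit value. */
static int fits_disp8(long disp)
{
    return disp >= -128 && disp <= 127;
}

/* Displacement of a member relative to the base (lowest address) of the stack. */
static long disp_from_base(size_t member_offset)
{
    assert(member_offset < sizeof(struct cpu_info_sketch));
    return SKETCH_STACK_SIZE - (long)sizeof(struct cpu_info_sketch)
           + (long)member_offset;
}

/* Displacement of the same member relative to the end (top) of the stack. */
static long disp_from_end(size_t member_offset)
{
    assert(member_offset < sizeof(struct cpu_info_sketch));
    return (long)member_offset - (long)sizeof(struct cpu_info_sketch);
}
```

With this layout every member is thousands of bytes away from the stack base, while everything after the register frame is within 128 bytes of the stack end.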
Jan Beulich [Fri, 13 May 2016 17:12:22 +0000 (18:12 +0100)]
x86: suppress SMEP and SMAP while running 32-bit PV guest code
Since such guests' kernel code runs in ring 1, their memory accesses,
at the paging layer, are supervisor mode ones, and hence subject to
SMAP/SMEP checks. Such guests cannot be expected to be aware of those
two features though (and so far we also don't expose the respective
feature flags), and hence may suffer page faults they cannot deal with.
While the placement of the re-enabling slightly weakens the intended
protection, it was selected such that 64-bit paths would remain
unaffected where possible. At the expense of a further performance hit
the re-enabling could be put right next to the CLACs.
Note that this introduces a number of extra TLB flushes - CR4.SMEP
transitioning from 0 to 1 always causes a flush, and it transitioning
from 1 to 0 may also do so.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 12 May 2016 16:02:21 +0000 (18:02 +0200)]
x86/PoD: skip eager reclaim when possible
Reclaiming pages is pointless when the cache can already satisfy all
outstanding PoD entries, and doing reclaims in that case can be very
harmful to performance when that memory gets used by the guest, but
only to store zeroes there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
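The condition described above amounts to a single comparison. A minimal sketch, in which the structure and field names are hypothetical stand-ins rather than the actual PoD bookkeeping:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-domain PoD bookkeeping. */
struct pod_state_sketch {
    long cache_count; /* pages currently held in the PoD cache */
    long entry_count; /* outstanding PoD entries still to be populated */
};

/* Eager reclaim only helps while the cache cannot cover all entries. */
static bool pod_should_eager_reclaim(const struct pod_state_sketch *pod)
{
    assert(pod->cache_count >= 0 && pod->entry_count >= 0);
    return pod->cache_count < pod->entry_count;
}
```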
Jan Beulich [Thu, 12 May 2016 12:24:39 +0000 (14:24 +0200)]
Revert "blktap2: Use RING_COPY_REQUEST"
This reverts commit 19f6c522a6a9599317ee1d8c4a155d1400d04c89. It
was wrongly associated with XSA-155, and was (rightfully) never
backported to any of the stable trees. See also
http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg00571.html.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
xsplice: Unmask (aka reinstall NMI handler) if we need to abort.
If we have to abort in xsplice_spin() we end up following
the goto abort. But unfortunately we neglected to unmask.
This patch fixes that.
Reported-by: Martin Pohlack <mpohlack@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Wed, 11 May 2016 11:14:45 +0000 (12:14 +0100)]
tools/xendomains: Create lockfile on start unconditionally
At the moment, the xendomains init script will only create a lockfile
if, when started, it actually does something -- either tries to restore
a previously saved domain as a result of XENDOMAINS_RESTORE, or tries
to create a domain as a result of XENDOMAINS_AUTO.
RedHat-based SYSV init systems try to only call "${SERVICE} shutdown"
on systems which actually have an actively running component; and they
use the existence of /var/lock/subsys/${SERVICE} to determine which
systems are running.
This means that at the moment, on RedHat-based SYSV systems (such as
CentOS 6), if you enable xendomains, and have XENDOMAINS_RESTORE set
to "true", but don't happen to start a VM, then your running VMs will
not be suspended on shutdown.
Since the lockfile doesn't really have any other effect than to
prevent duplicate starting, just create it unconditionally every time
we start the xendomains script.
The other option would have been to touch the lockfile if
XENDOMAINS_RESTORE was true regardless of whether there were any
domains to be restored. But this would mean that if you started with
the xendomains script active but XENDOMAINS_RESTORE set to "false",
and then changed it to "true", then xendomains would still not run the
next time you shut down. This seems to me to violate the principle of
least surprise.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Olaf Hering <olaf@aepfle.de> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Wed, 11 May 2016 11:14:44 +0000 (12:14 +0100)]
hotplug: Fix xendomains lock path for RHEL-based systems
Commit c996572 changed the LOCKFILE path from a check between two
hardcoded paths (/var/lock/subsys/ or /var/lock) to using the
XEN_LOCK_DIR variable designated at configure time. Since
XEN_LOCK_DIR doesn't (and shouldn't) have the 'subsys' postfix, this
effectively moves all the lock files by default to /var/lock instead.
Unfortunately, this breaks xendomains on RedHat-based SYSV init
systems. RedHat-based SYSV init systems try to only call "${SERVICE}
shutdown" on systems which actually have an actively running
component; and they use the existence of /var/lock/subsys/${SERVICE}
to determine which systems are running.
Changing XEN_LOCK_DIR to /var/lock/subsys is not suitable, as only
system services like xendomains should create lockfiles there; other
locks (such as the console locks) should be created in /var/lock
instead.
Instead, re-instate the check for the subsys/ subdirectory of the lock
directory in the xendomains script.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Olaf Hering <olaf@aepfle.de> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Mon, 9 May 2016 16:43:14 +0000 (17:43 +0100)]
tools: configure correct trace backend for QEMU
Newer versions of the QEMU source have replaced the 'stderr' trace
backend with 'log'. This patch adjusts the tools Makefile to test for
the 'log' backend and specify it if it is available.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 11 May 2016 07:46:02 +0000 (09:46 +0200)]
XSA-77: widen scope again
As discussed at the hackathon, avoid us having to issue security
advisories for issues affecting only heavily disaggregated tool stack
setups, which no-one appears to use (or else they should step up to get
things into shape).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Tue, 10 May 2016 09:10:02 +0000 (10:10 +0100)]
xsplice: Prevent new symbols duplicating core symbols
When loading patches, the code prevents loading a patch containing a new
symbol that duplicates a symbol from another loaded patch. However, the
check should also prevent loading a new symbol that duplicates a symbol
from the core hypervisor.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
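The extended check can be pictured as a lookup against two tables instead of one. This is only a sketch: the flat string arrays and function names are hypothetical and stand in for whatever symbol tables the hypervisor really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Linear search of a (hypothetical) flat table of symbol names. */
static bool name_in_table(const char *name, const char *const *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, table[i]) == 0)
            return true;
    return false;
}

/*
 * A new symbol is acceptable only if it clashes neither with the core
 * hypervisor symbols nor with symbols from already loaded patches.
 */
static bool new_symbol_allowed(const char *name,
                               const char *const *core, size_t n_core,
                               const char *const *patched, size_t n_patched)
{
    assert(name != NULL);
    return !name_in_table(name, core, n_core) &&
           !name_in_table(name, patched, n_patched);
}
```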
Andrew Cooper [Mon, 9 May 2016 13:13:57 +0000 (13:13 +0000)]
x86/hvm: Fix invalidation for emulated invlpg instructions
hap_invlpg() is reachable from the instruction emulator, which means
introspection and tests using hvm_fep can end up here. As such, crashing the
domain is not an appropriate action to take.
Fixing this involves rearranging the callgraph.
paging_invlpg() is now the central entry point. It first checks for the
non-canonical NOP case, and calls into the paging subsystem. If a real flush
is needed, it will call the appropriate handler for the vcpu. This allows the
PV callsites of paging_invlpg() to be simplified.
The sole user of hvm_funcs.invlpg_intercept() is altered to use
paging_invlpg() instead, allowing the .invlpg_intercept() hook to be removed.
For both VMX and SVM, the existing $VENDOR_invlpg_intercept() is split in
half. $VENDOR_invlpg_intercept() stays as the intercept handler only (which
just calls paging_invlpg()), and new $VENDOR_invlpg() functions do the
ASID/VPID management. These latter functions are made available in hvm_funcs
for paging_invlpg() to use.
As a result, correct ASID/VPID management occurs for the hvmemul path, even if
it did not originate from a real hardware intercept.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
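The ordering described above (non-canonical check first, then the paging subsystem, then the vendor handler only if a flush is really needed) can be sketched as follows. The function names, the 48-bit address width, and the use of a plain flag and counter in place of the real paging and ASID/VPID hooks are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* With 48-bit virtual addresses, bits 63:47 must all be equal. */
static bool is_canonical_48(uint64_t va)
{
    uint64_t upper = va >> 47; /* the 17 topmost bits */

    return upper == 0 || upper == 0x1ffff;
}

/*
 * Sketch of the central entry point. mode_needs_flush stands in for the
 * answer from the paging subsystem; *vendor_flushes counts how often the
 * stand-in for the vendor ASID/VPID handler would have run.
 */
static bool paging_invlpg_sketch(uint64_t va, bool mode_needs_flush,
                                 unsigned int *vendor_flushes)
{
    assert(vendor_flushes != NULL);

    if (!is_canonical_48(va))
        return false;          /* architecturally a NOP */
    if (!mode_needs_flush)
        return false;          /* nothing can be cached for this address */

    ++*vendor_flushes;         /* stand-in for the vendor invlpg handler */
    return true;
}
```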
Andrew Cooper [Mon, 9 May 2016 17:09:38 +0000 (18:09 +0100)]
x86/svm: Don't unconditionally use a new ASID in svm_invlpg_intercept()
paging_invlpg() already returns a boolean indicating whether an invalidation
is necessary or not. A return value of 0 indicates that the specified virtual
address wasn't shadowed (or has already been flushed), and hence cannot
currently be cached in the TLB.
This is a performance optimisation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Tim Deegan <tim@xen.org> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 22 Apr 2016 08:44:53 +0000 (09:44 +0100)]
x86/hvm: Correct the emulated interaction of invlpg with segments
The `invlpg` instruction is documented to take a memory address, and is not
documented to suffer faults from segmentation violations. It is also
explicitly documented to be a NOP when issued on a non-canonical address.
Experimentally, and subsequently confirmed by both Intel and AMD, the
instruction does take into account segment bases, but will happily invalidate
a TLB entry for a mapping beyond the segment limit.
The emulation logic will currently raise #GP/#SS faults for segment limit
violations, or non-canonical addresses, which doesn't match hardware's
behaviour. Instead, squash exceptions generated by
hvmemul_virtual_to_linear() and proceed with invalidation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Sat, 7 May 2016 12:41:05 +0000 (13:41 +0100)]
x86/hvm: Raise #SS faults for %ss-based segmentation violations
Raising #GP under such circumstances is architecturally wrong.
Refer to the Intel or AMD manuals describing faults, and the conditions
under which #SS is raised.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 10 May 2016 13:37:00 +0000 (14:37 +0100)]
sched/rt: Fix memory leak in rt_init()
c/s 2656bc7b0 "xen: adopt .deinit_pdata and improve timer handling"
introduced an error path into rt_init() which leaked prv if the
allocation of prv->repl_timer failed.
Introduce an error cleanup path.
Spotted by Coverity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Meng Xu <mengxu@cis.upenn.edu> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
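The usual shape of such an error cleanup path is a single exit label that frees whatever was allocated so far. A minimal sketch, using the C library allocator and invented type names instead of the hypervisor's own, with a flag to simulate the second allocation failing:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the scheduler's private data and its timer. */
struct timer_sketch { int armed; };
struct rt_private_sketch { struct timer_sketch *repl_timer; };

/* fail_timer_alloc lets a caller simulate the timer allocation failing. */
static int rt_init_sketch(struct rt_private_sketch **out, int fail_timer_alloc)
{
    struct rt_private_sketch *prv;

    assert(out != NULL);
    *out = NULL;

    prv = calloc(1, sizeof(*prv));
    if (prv == NULL)
        goto err;

    prv->repl_timer = fail_timer_alloc ? NULL
                                       : calloc(1, sizeof(*prv->repl_timer));
    if (prv->repl_timer == NULL)
        goto err;

    *out = prv;
    return 0;

 err:
    free(prv); /* free(NULL) is a no-op, so this covers both failures */
    return -ENOMEM;
}

static void rt_deinit_sketch(struct rt_private_sketch *prv)
{
    if (prv != NULL) {
        free(prv->repl_timer);
        free(prv);
    }
}
```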
--- CC: George Dunlap <george.dunlap@eu.citrix.com> CC: Dario Faggioli <dario.faggioli@citrix.com>
Dario Faggioli [Mon, 9 May 2016 14:41:00 +0000 (15:41 +0100)]
xen: adopt .deinit_pdata and improve timer handling
The scheduling hooks API is now used properly, and no
initialization or de-initialization happen in
alloc/free_pdata any longer.
In fact, just like it is for Credit2, there is no real
need for implementing alloc_pdata and free_pdata.
This also made it possible to improve the replenishment
timer handling logic, such that now the timer is always
kept on one of the pCPUs of the scheduler it's servicing.
Before this commit, in fact, even if the pCPU where the
timer happened to be initialized at creation time was
moved to another cpupool, the timer stayed there,
potentially interfering with the new scheduler of the
pCPU itself.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-and-Tested-by: Meng Xu <mengxu@cis.upenn.edu> Acked-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
xen: sched: avoid spuriously re-enabling IRQs in csched2_switch_sched()
Interrupts are already disabled when calling the hook
(from schedule_cpu_switch()), so we must use spin_lock()
and spin_unlock().
Add an ASSERT(), so we will notice if this code and its
caller get out of sync with respect to disabling interrupts
(and add one at the same exact occurrence of this pattern
in Credit1 too).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 9 May 2016 11:21:06 +0000 (13:21 +0200)]
IOMMU: don't BUG() on exotic hardware
On x86, iommu_get_ops() BUG()s when running on non-Intel, non-AMD
hardware. While, with our current code, that's a correct prerequisite
assumption for IOMMU presence, this is wrong on systems without IOMMU.
Hence iommu_enabled (and alike) checks should be done prior to calling
that function, not after.
Also move iommu_suspend() next to iommu_resume() - it escapes me why
iommu_do_domctl() had got put between the two.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 3 May 2016 10:55:09 +0000 (12:55 +0200)]
xen/xsplice: add ELFOSABI_FREEBSD as a supported OSABI for payloads
The calling convention used by the FreeBSD ELF OSABI is exactly the same as
the one defined by System V, so payloads with a FreeBSD OSABI should be
accepted by the xsplice machinery.
Specifically "the FreeBSD ELF OSABI only has a meaning for userspace
applications, it's used by FreeBSD in order to detect if an application
is native or if it needs to be run in the linuxator (the Linux emulator,
or any other emulator that is available and matches the ELF OSABI specified
in the binary FWIW).
The only difference from SYSV to FreeBSD OSABI is the sysentvec that's
selected inside of the FreeBSD kernel (the ABI between the kernel and the
user-space application), but of course this doesn't apply to kernel code,
which is what Xen and the xsplice payloads are. Sadly this is not written
anywhere." And since the ELF tools on FreeBSD build with this OSABI by
default, payloads built there will carry this OSABI entry.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
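A check of this kind only needs to look at one byte of the ELF identification array. The sketch below defines the relevant ELF constants locally (with an SK_ prefix, to avoid relying on a system <elf.h>); the function name is hypothetical and is not the one used by the xsplice code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values from the ELF specification, defined locally for portability. */
#define SK_EI_OSABI         7 /* index of the OSABI byte in e_ident[] */
#define SK_ELFOSABI_NONE    0 /* also known as ELFOSABI_SYSV */
#define SK_ELFOSABI_FREEBSD 9

/* Accept payloads marked with either the System V or the FreeBSD OSABI. */
static bool osabi_supported(const unsigned char *e_ident)
{
    assert(e_ident != NULL);
    return e_ident[SK_EI_OSABI] == SK_ELFOSABI_NONE ||
           e_ident[SK_EI_OSABI] == SK_ELFOSABI_FREEBSD;
}
```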
Wei Liu [Fri, 29 Apr 2016 15:11:17 +0000 (16:11 +0100)]
blktap2: initialise buf in qcow2raw.c:main
Gcc complains:
qcow2raw.c: In function ‘main’:
qcow2raw.c:387:17: error: ‘buf’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
treq.buf = buf;
^
But buf is a valid buffer allocated by posix_memalign at that point.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
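The pattern behind this family of fixes is to give the pointer an explicit initial value so the compiler no longer has to prove that posix_memalign() always writes it. A minimal sketch with a hypothetical helper name, not the code from qcow2raw.c:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() */
#endif

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a zeroed, aligned buffer, or NULL on failure. Caller frees it. */
static void *alloc_aligned_buffer(size_t alignment, size_t size)
{
    void *buf = NULL; /* explicit initialiser silences -Wmaybe-uninitialized */

    assert(size > 0);
    if (posix_memalign(&buf, alignment, size) != 0)
        return NULL;

    memset(buf, 0, size);
    return buf;
}
```

The initialiser costs nothing at run time in the success path and keeps -Werror builds working on compilers that cannot see through the out-parameter.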
Wei Liu [Fri, 29 Apr 2016 15:11:16 +0000 (16:11 +0100)]
blktap2: initialise buf to NULL in img2qcow.c:main
Gcc complains:
qcow2raw.c: In function ‘main’:
qcow2raw.c:387:17: error: ‘buf’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
treq.buf = buf;
^
But at the point of that assignment, buf is a valid buffer allocated by
posix_memalign and filled in by read.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 29 Apr 2016 15:11:15 +0000 (16:11 +0100)]
blktap2: initialise buf in vhd_util_check_footer
Gcc complains:
vhd-util-check.c: In function ‘vhd_util_check_footer’:
vhd-util-check.c:413:2: error: ‘buf’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
memcpy(&backup, buf, sizeof(backup));
In fact buf is initialised a few lines above.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 29 Apr 2016 15:11:14 +0000 (16:11 +0100)]
rombios/tcgbios: initialise logdataptr in HashLogEvent32
Gcc complains:
tcgbios.c: In function ‘HashLogEvent32’:
tcgbios.c:1131:10: error: ‘logdataptr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
entry = tcpa_extend_acpi_log(logdataptr);
GCC fails to figure out that whenever logdataptr is used, it has always
been initialised in an if block a few lines above.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 29 Apr 2016 15:11:12 +0000 (16:11 +0100)]
rombios/tcgbios: initialise size in tcpa_extend_acpi_log
Gcc complains:
tcgbios.c:362:3: error: ‘size’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
memcpy((char *)lasa_last, (char *)entry_ptr, size);
GCC fails to figure out that whenever size is used in memcpy, it has always
been initialised.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Thu, 5 May 2016 20:18:09 +0000 (15:18 -0500)]
init: shebang should be the first line
The shebang was not on the first line in the init script and it should
be.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Thu, 5 May 2016 20:18:08 +0000 (15:18 -0500)]
init: drop GNU-isms for sleep command
Most implementations of the sleep command only take integers. GNU
coreutils has a GNU extension to allow any floating point number to be
passed but we shouldn't depend on that.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Paul Lai [Wed, 4 May 2016 15:54:07 +0000 (08:54 -0700)]
build: Honor '--enable-githttp' in toplevel Makefile generation
During the make world, git mini-os.git didn't honor the 'configure
--enable-githttp' option. The 'enable-githttp' was only honored in
the tools subdirectory.
Signed-off-by: Paul Lai <paul.c.lai@intel.com>
[ wei: add prefix "build:" to title ] Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 3 May 2016 21:46:50 +0000 (23:46 +0200)]
xen: credit2: fix 2 (minor) issues in load tracking logic
All calculations that involve load_last_update use quantities
shifted by LOADAVG_GRANULARITY_SHIFT, so make sure that this
is true even when the field is assigned a value for the first
time, during vcpu allocation.
Also, during migration, while the loads of both the source and
destination runqueues certainly need changing, the vcpu being
moved does not change its running/non-running status, and its
calculated load should hence not be affected.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 3 May 2016 21:46:42 +0000 (23:46 +0200)]
xen: sched: fix killing an uninitialized timer in free_pdata.
Commit 64269d9365 "sched: implement .init_pdata in Credit,
Credit2 and RTDS" helped fix Credit2 runqueues, and
the races we had in switching scheduler for pCPUs, but
introduced another issue. In fact, if CPU bringup fails
during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
but before CPU_STARTING) the CPU_UP_CANCELED notifier
would be executed, which calls the free_pdata hook.
Such hook does, right now, two things: (1) undo the
initialization done inside the init_pdata hook and (2)
free the memory allocated by the alloc_pdata hook.
However, in the failure path just described, it is possible
that only alloc_pdata was called, and this is potentially
an issue (depending on what free_pdata actually does).
In fact, for Credit1 (the only scheduler that actually
allocates per-pCPU data), this results in calling kill_timer()
on a timer that had not yet been initialized, which causes
the following:
Solve this by making the scheduler hooks API symmetric again,
i.e., by adding a deinit_pdata hook and making it responsible
for undoing what init_pdata did, rather than asking free_pdata
to do everything.
This is cleaner and, in the case at hand, makes it possible to
only call free_pdata (which is the right thing to do) as only
allocation and no initialization was performed.
Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
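The symmetry argued for above can be sketched with four small hooks, each undoing exactly one counterpart. All names and the bring-up function are hypothetical; the point is only that a cancelled bring-up calls the free hook alone, so nothing touches state that was never initialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-pCPU scheduler data with a timer-like member. */
struct pcpu_sketch { bool timer_initialised; };

static struct pcpu_sketch *alloc_pdata_sketch(void)
{
    return calloc(1, sizeof(struct pcpu_sketch));
}

static void init_pdata_sketch(struct pcpu_sketch *p)
{
    p->timer_initialised = true;
}

/* Undoes init_pdata_sketch() only; must not run if init never happened. */
static void deinit_pdata_sketch(struct pcpu_sketch *p)
{
    assert(p->timer_initialised);
    p->timer_initialised = false;
}

/* Undoes alloc_pdata_sketch() only; safe on never-initialised data. */
static void free_pdata_sketch(struct pcpu_sketch *p)
{
    free(p);
}

/* Returns true if the simulated CPU came up (and was torn down cleanly). */
static bool cpu_up_sketch(bool fail_before_init)
{
    struct pcpu_sketch *p = alloc_pdata_sketch();

    if (p == NULL)
        return false;

    if (fail_before_init) {
        free_pdata_sketch(p); /* cancel path: allocation is all there is */
        return false;
    }

    init_pdata_sketch(p);

    /* Normal teardown mirrors setup in reverse order. */
    deinit_pdata_sketch(p);
    free_pdata_sketch(p);
    return true;
}
```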
Anthony PERARD [Tue, 3 May 2016 15:59:49 +0000 (16:59 +0100)]
configure: Fix when no libsystemd compat lib are available
According to the systemd changelog, since version 209 libsystemd.so contains
everything, including libsystemd-daemon.so. Distros may or may not provide
the compatibility libraries, of which libsystemd-daemon is one.
So, if libsystemd-daemon is not available, check for the presence of
a recent enough libsystemd.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
[ wei: run autogen.sh ] Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 3 May 2016 10:55:07 +0000 (12:55 +0200)]
tools/xsplice: fix mixing system errno values with Xen ones.
Avoid using system errno values when comparing with Xen errno values.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 3 May 2016 10:55:06 +0000 (12:55 +0200)]
tools/xsplice: corrently use errno
Some error paths incorrectly used rc instead of errno.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 3 May 2016 10:55:05 +0000 (12:55 +0200)]
libxl: add a define for equivalent ENODATA errno on FreeBSD
Currently FreeBSD lacks the ENODATA errno value, so the privcmd driver
always translates ENODATA to ENOENT, add a define to libxl in order to
correctly match ENODATA with ENOENT on FreeBSD.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
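One way to express such a fallback is a macro that resolves to ENODATA where it exists and to ENOENT otherwise. The macro and function names below are hypothetical, and keying on whether ENODATA is defined (rather than on the operating system) is an assumption of this sketch, not necessarily how libxl does it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical name; falls back to ENOENT where ENODATA is unavailable. */
#ifdef ENODATA
#define SKETCH_ENODATA ENODATA
#else
#define SKETCH_ENODATA ENOENT
#endif

/* True if err is the "no data" error as seen on this platform. */
static bool is_no_data_error(int err)
{
    assert(err >= 0);
    return err == SKETCH_ENODATA;
}
```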
xen/arm64: ensure that the correct SP is used for exceptions
The ARMv8 architecture has a SPSel ("stack pointer selection") machine
register that allows us to determine which exception level's stack
pointer is loaded when an exception occurs. As we don't want to
use the non-privileged SP_EL0 stack pointer -- or even assume that SP_EL0
points to a valid address in the hypervisor context -- we'll need to ensure
that our EL2 code sets the SPSel to SP_ELn mode, so exceptions that trap
to EL2 use the EL2 stack pointer.
This corrects an issue that can manifest as a hang-on-IRQ on some
arm64 cores if the firmware/bootloader has previously initialized SPSel
to 0; in which case Xen's exceptions will incorrectly use an invalid SP_EL0,
and will endlessly spin on the synchronous abort handler.
Roger Pau Monné [Wed, 4 May 2016 07:46:57 +0000 (09:46 +0200)]
xsplice: check against ELFOSABI_NONE instead of ELFOSABI_SYSV
They are equivalent, but using ELFOSABI_NONE is more correct in this
context.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 4 May 2016 07:44:32 +0000 (09:44 +0200)]
IOMMU/x86: per-domain control structure is not HVM-specific
... and hence should not live in the HVM part of the PV/HVM union. In
fact it's not even architecture specific (there already is a per-arch
extension type to it), so it gets moved out right to common struct
domain.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 4 May 2016 07:43:37 +0000 (09:43 +0200)]
x86/p2m: also tear down altp2m
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Razvan Cojocaru [Wed, 4 May 2016 07:42:06 +0000 (09:42 +0200)]
x86/monitor: disallow setting mem_access_emulate_each_rep when vm_event is NULL
It is meaningless (and potentially dangerous - see hvmemul_virtual_to_linear())
to set mem_access_emulate_each_rep before xc_monitor_enable() (which allocates
vcpu->arch.vm_event) has been called, so return an error from the
XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP hypercall when that is the case.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
David Vrabel [Tue, 3 May 2016 16:15:38 +0000 (17:15 +0100)]
x86: show correct code in CPU state
When showing the CPU state (e.g., after a crash) the dump of code
around RIP is incorrect.
Incorrect:
Xen code around <ffff82d0801113cf> (...):
00 c6 c1 ee 08 48 c1 e0 <04> 03 04 f1 8b ...
^^ Uninitialized         ^^ Missing 0x48
Correct:
Xen code around <ffff82d0801113cf> (...):
c6 c1 ee 08 48 c1 e0 04 <48> 03 04 f1 8b ...
When copying the bytes before RIP, the destination was off-by-one.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-by: Jan Beulich <JBeulich@suse.com> Acked-by: Jan Beulich <JBeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
ocaml/xc_get_cpu_featureset/arm: Return not implemented on ARM
... as it is not implemented on it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 2 May 2016 07:20:17 +0000 (09:20 +0200)]
x86/shadow: account for ioreq server pages before complaining about not found mapping
prepare_ring_for_helper(), just like share_xen_page_with_guest(),
takes a write reference on the page, and hence should similarly be
accounted for when determining whether to log a complaint.
This requires using recursive locking for the ioreq server lock, as the
offending invocation of sh_remove_all_mappings() is down the call stack
from hvm_set_ioreq_server_state(). (While not strictly needed to be
done in all other instances too, convert all of them for consistency.)
At once improve the usefulness of the shadow error message: Log all
values involved in triggering it as well as the GFN (to aid
understanding which guest page it is that there is a problem with - in
cases like the one here the GFN is invariant across invocations, while
the MFN obviously can [and will] vary).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 29 Apr 2016 17:25:31 +0000 (18:25 +0100)]
mkelf32: fix compilation on 32 bit build host
When cross-compiling xen on a 32 bit build host:
boot/mkelf32.c: In function 'main':
boot/mkelf32.c:360:21: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'Elf64_Off' [-Werror=format]
cc1: all warnings being treated as errors
Fix that by using PRId64 in format string.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
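The PRId64 macro from <inttypes.h> expands to the right length modifier for a 64-bit value on every host, which is what makes the format string portable between 32-bit and 64-bit build machines. A minimal sketch, using int64_t as a stand-in for the ELF offset type and a hypothetical helper name:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Formats a 64-bit offset without assuming that long is 64 bits wide. */
static int format_offset(char *out, size_t len, int64_t offset)
{
    assert(out != NULL && len > 0);
    return snprintf(out, len, "offset=%" PRId64, offset);
}
```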
George Dunlap [Wed, 27 Apr 2016 16:00:37 +0000 (17:00 +0100)]
MAINTAINERS: Clarify the meaning of nested maintainership
Clarify the meaning of nested maintainership.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
We had a discussion about the meaning of nested maintainership at the
recent Xen Hackathon. The notes of that meeting can be found on this
list [1]. No decision is official until discussed on this list, so
consider this patch the official proposal for this change, and object
or ask for clarification accordingly.
Compared to v1, there is one change that is worth pointing out: The
claim that THE REST consists of all committers. This is the case at
the moment, but this change would codify that this is an invariant we
intend to keep going forward.
The advantage of this is that the dispute resolution mentioned in this
patch for maintainers who can't agree lines up directly with the
fall-back for broader community issues upon which we can't reach
consensus.
Changes in v2:
- fixed spelling of "maintainer"
- fixed path of multi.c
- clarified that the resolution by REST would be by *majority* vote
- Asserted that The REST consists of all committers
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Jan Beulich <jbeulich@suse.com> CC: Keir Fraser <keir@xen.org> CC: Tim Deegan <tim@xen.org> CC: Wei Liu <wei.liu2@citrix.com> CC: Konrad Wilk <konrad.wilk@oracle.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Lars Kurth <lars.kurth@citrix.com>
Jan Beulich [Fri, 29 Apr 2016 16:28:41 +0000 (18:28 +0200)]
x86/vMSI-X: also snoop REP MOVS
... as at least certain versions of Windows use such to update the
MSI-X table. However, to not overly complicate the logic for now
- only EFLAGS.DF=0 is being handled,
- only updates not crossing MSI-X table entry boundaries are handled.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
The reason is that state is in 'CHECKED' which changes to 'APPLIED'
once check_for_xsplice_work finishes. So we have a race between 1) -> 3)
where one can manipulate the payload.
To guard against this we add a check in xsplice_action to not allow
any actions if schedule_work has been called for this specific payload.
The function 'is_work_scheduled' checks xsplice_work which is safe as:
- The ->do_work changes to 1 under the payload_lock (which we also hold).
- The ->do_work changes to 0 when all CPUs are quiesced and IRQs have
been disabled.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
MAINTAINERS/xsplice: Add myself and Ross as the maintainers.
If you have a patch for xSplice send it our way!
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Wed, 6 Apr 2016 19:15:01 +0000 (15:15 -0400)]
xsplice: Prevent duplicate payloads from being loaded.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
[xen_hello_world depends on hypervisor build-id]
-bash-4.1# xen-xsplice load xen_bye_world.xsplice
Uploading xen_bye_world.xsplice (7076 bytes)
Performing check: completed
Performing apply:. completed
[xen_bye_world depends on xen_hello_world build-id]
-bash-4.1# xen-xsplice upload xen_replace_world xen_replace_world.xsplice
Uploading xen_replace_world.xsplice (7148 bytes)
-bash-4.1# xen-xsplice list
ID | status
----------------------------------------+------------
xen_hello_world | APPLIED
xen_bye_world | APPLIED
xen_replace_world | CHECKED
-bash-4.1# xen-xsplice replace xen_replace_world
Performing replace:. completed
-bash-4.1# xl info | grep extra
xen_extra : Hello Again World!
-bash-4.1# xen-xsplice list
ID | status
----------------------------------------+------------
xen_hello_world | CHECKED
xen_bye_world | CHECKED
xen_replace_world | APPLIED
and revert both of the previous payloads and apply
the xen_replace_world.
All the magic of this is in the Makefile - we extract
the build-id from the hypervisor (xen-syms) and jam it
in the xen_replace_world as .xsplice.depends.
We also make .old_addr be zero, forcing the hypervisor
to look up the xen_extra_version.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
We now expect that the ELF payloads be built with the
--build-id.
Also the .xsplice.deps section has to have the contents
of the hypervisor (or a preceding payload) build-id.
We already have the code to verify the Elf_Note build-id
so export parts of it.
This dependency means the hypervisor MUST be compiled with
--build-id - so we gate the build of xSplice on the availability
of said functionality.
This does not impact the order in which the payloads can
be loaded, but it does enforce a STRICT ordering when the
payloads are applied. Also the REPLACE is special - we need
to check its dependency against the hypervisor - not
the last applied patch.
To make this easier to test we also add an extra test-case
to be used - which can only be applied on top of the
xen_hello_world payload.
As in, one can apply xen_hello_world and then xen_bye_world
on top of that. Not the other way.
We also print the dependency and the payload's build_id in the keyhandler.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
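The ordering rule described above can be modelled in a few lines. This is a minimal sketch, not the hypervisor code: the Payload class, the function names and the build-id strings are all hypothetical, and build-ids are plain strings here rather than ELF note contents.

```python
class Payload:
    """Hypothetical stand-in for a loaded payload."""

    def __init__(self, name, build_id, depends):
        self.name = name
        self.build_id = build_id  # this payload's own build-id
        self.depends = depends    # build-id it must be applied on top of


def can_apply(payload, applied, hypervisor_build_id):
    """APPLY: the dependency must match the most recently applied payload,
    or the hypervisor itself when nothing has been applied yet."""
    expected = applied[-1].build_id if applied else hypervisor_build_id
    return payload.depends == expected


def can_replace(payload, hypervisor_build_id):
    """REPLACE: the dependency is checked against the hypervisor only."""
    return payload.depends == hypervisor_build_id
```

With these rules a payload depending on xen_hello_world is refused until xen_hello_world has been applied, while loading order does not matter.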
XENVER_build_id/libxc: Provide ld-embedded build-id
If the hypervisor was built with build-ids we can expose the
build-id value to the toolstack (if it is not built with
one it will just return -ENODATA). This is a privileged operation
so only the controlling stack is able to request this.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
xsplice: Print build_id in keyhandler and on bootup.
As it should be a useful debug mechanism.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
This patch enables the ELF to be built with the build-id
and provides in the Xen hypervisor the code to extract it.
The man-page for ld --build-id says it is:
"Request the creation of a ".note.gnu.build-id" ELF note
section or a ".build-id" COFF section. The contents of the
note are unique bits identifying this linked file. style can be
"uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash
on the normative parts of the output contents, ..."
One can also retrieve the value of the build-id by doing
'readelf -n xen-syms'.
For EFI builds we re-use the same build-id that the xen-syms
was built with.
The version of ld that first implemented --build-id is v2.18.
We check to see if the linker supports the --build-id
parameter and if so use it.
For x86 we have two binaries - the xen-syms and the xen - a
smaller version with lots of sections removed. To make
'readelf -n xen' work we also modify mkelf32 and xen.lds.S to include
the PT_NOTE ELF section.
The EFI binary is more complicated. We only build one type of
binary and expanding the amount of sections the EFI binary has to
include an .note one is pointless - as there is no concept of
PT_NOTE. The best we can do is move this .note in the .rodata section.
Further development should move it to a .buildid section
so that DataDirectory debug data or CodeView can view it.
(The author has no clue what those are).
Note that in earlier patches the linker script had:
Which meant you could have different ELF notes _outside_ the
__note_gnu_build_id_end. However for EFI builds we take the whole
.note* section and jam it in the EFI to be between
__note_gnu_build_id_start and __note_gnu_build_id_end.
To prevent this, on ELF builds we name the section
.note.gnu.build-id (instead of just .note).
If there is a need for a different type of note other folks
can add it as a different section name.
Note that we do pass --build-id=sha1 on all linker invocations.
We have to do this to ensure that the symbol offsets don't change
(the side effect is that we would have multiple build ids -
except that the last one is the final one).
Without this working the symbol table embedded in Xen ends
up incorrect - some of the values it contains would be offset by the
size of the included build id.
This obviously causes problems when resolving symbols.
We also define the NT_GNU_BUILD_ID in the elfstructs.h as we
need to use it in various places.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Martin Pohlack <mpohlack@amazon.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
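To make the note handling above concrete, here is a small sketch that walks a buffer of ELF notes and returns the NT_GNU_BUILD_ID descriptor. It assumes little-endian notes with the usual 4-byte padding of name and descriptor; find_build_id and make_note are hypothetical helper names, not functions from the patch.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used by ld --build-id


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def find_build_id(notes):
    """Scan concatenated ELF notes and return the GNU build-id bytes, or None."""
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from("<III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += _align4(namesz)
        desc = notes[off:off + descsz]
        off += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc
    return None


def make_note(name, ntype, desc):
    """Build one note record; only used to create sample input."""
    name_z = name + b"\0"
    header = struct.pack("<III", len(name_z), len(desc), ntype)
    return (header
            + name_z.ljust(_align4(len(name_z)), b"\0")
            + desc.ljust(_align4(len(desc)), b"\0"))
```

Checking both the type and the "GNU" owner name is what lets other notes share the same section without being mistaken for the build-id.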
Ross Lagerwall [Sat, 9 Apr 2016 14:07:21 +0000 (10:07 -0400)]
xsplice: Add support for alternatives
Add support for applying alternative sections within xsplice payload.
At payload load time, apply any alternative sections that are found.
Also we add a test-case exercising a rather useless alternative
(patching a NOP with a NOP) - but it does exercise the code-path.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Wed, 20 Apr 2016 20:20:26 +0000 (16:20 -0400)]
xsplice: Add support for exception tables.
Add support for exception tables contained within xSplice payloads. If an
exception occurs search either the main exception table or a particular
active payload's exception table depending on the instruction pointer.
Also we add a test-case to make sure we have an exception that
is handled.
To not grow the code-base if xSplice is not compiled in we add
certain #define to help in determining if code needs to be __init
or not.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Wed, 27 Apr 2016 15:30:54 +0000 (11:30 -0400)]
xsplice: Add support for bug frames.
Add support for handling bug frames contained within xsplice modules. If a
trap occurs search either the kernel bug table or an applied payload's
bug table depending on the instruction pointer.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Wed, 27 Apr 2016 15:30:25 +0000 (11:30 -0400)]
x86, xsplice: Print payload's symbol name and payload name in backtraces
Naturally the backtrace is presented when an instruction
hits a bug_frame or %p is used.
The payloads do not support bug_frames yet - however the functions
the payloads call could hit a BUG() or WARN().
The traps.c has logic to scan for this - and eventually it will
find the correct bug_frame and then walk the stack using %p to print
the backtrace. For %p and symbols to print a string - the
'is_active_kernel_text' is consulted which uses a 'struct virtual_region'.
Therefore we register our start->end addresses so that
'is_active_kernel_text' will include our payload address.
We also register our symbol lookup table function so that it can
scan the list of payloads and retrieve the correct name.
Lastly we change vsprintf to take into account s and namebuf.
For core code they are the same, but for payloads they are different.
This gets us:
Which is great if payloads have similar or same symbol names.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
The current mechanism is geared towards fast virtual address ->
symbol names lookup. This is fine for the normal use cases
(BUG_ON, WARN_ON, etc), but for xSplice - where we need to find
hypervisor symbols - it is slow.
To understand this patch, the existing method is described
first. Folks familiar with it can skip to 'NEW CODE:'.
HOW IT WORKS:
The symbol table lookup mechanism uses a simple encoding mechanism
where it extracts the common ascii characters that the symbols use.
This saves us space. The lookup mechanism is geared towards looking
up symbols based on address. We have one 0..N (where N is
the number of symbols, so 6849 for example) table:
symbols_addresses[0..N]
And a 1-1 mapping (in a loose fashion) of the symbols (encoded) in a
symbols_names stream of size N.
The N is variable (more on that below).
The symbols_names are sorted based on symbols_addresses, which
means that the decoded entries inside symbols_names are not in
ascending or descending order.
There is also the encoding mechanism - the table of 255 entries
called symbols_token_index[]. And the symbols_token_table which
is a stream of ASCIIZ characters, such as (it really
is not a table as the values are variable):
And the symbols_token_index:
@0 .short 0
@1 .short 7
@2 .short 12
@4 .short 16
...
@84 .short 300
The relationship between them is that the symbols_token_index
gives us the offset to symbols_token_table.
The symbols_names[] array is a stream of encoded values. Each value
follows the same pattern - <len> followed by <encoding values>.
And then another <len> followed by <encoding values>.
Hence to find the right one you need to read <len>, add <len>
(to skip over), read <len>, add <len>, and so on until one
finds the right tuple offset.
The <encoding values> are the indices into the symbols_token_index.
Meaning if you have:
0x04, 0x54, 0xda, 0xe2, 0x74
[4, 84, 218, 226, 116 in human numbering]
The 0x04 tells us that the symbol is four bytes past this one (so next
symbol offset starts at 5). If we look up symbols_token_index[84] we get 300.
symbols_token_table[300] gets us the "S". And so on, the string eventually
ends up being decoded as 'S_stext'. The first character is the type,
then optionally followed by the filename (and # right after filename)
and then lastly the symbol, such as:
tvpmu_intel.c#core2_vpmu_do_interrupt
Keep in mind that there are two fixed sized tables:
symbols_addresses[0..symbols_num_syms], and
symbols_markers[0..symbols_num_syms/255].
The symbols_markers is used to speed searching for the right address.
It gives us the offsets within symbols_names that start at the <len><encoded value>.
The way to find a symbol based on the address is:
1) Figure out the 'tuple offset' from symbols_addresses[0..symbols_num_syms].
This table is sorted by virtual addresses so finding the value is simple.
2) Get starting offset of symbols_names by retrieving value of
symbols_markers['tuple offset' / 255].
3) Iterate up to 'tuple_offset & 255' in symbols_names stream starting
at 'offset'.
4) Decode the <len><encoded value>
This however does not work very well if we want to search the other
way - we have the symbol name and want to find the address.
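The four lookup steps above can be sketched with a toy model. It is deliberately simplified and all names are hypothetical: each encoded byte indexes a token string directly (the intermediate symbols_token_index is skipped), and the marker stride is a parameter so a tiny table can exercise it.

```python
import bisect


def build_tables(symbols, stride):
    """symbols: list of (address, [token indices]) sorted by address.

    Returns (addresses, names stream, markers), where the stream is a
    sequence of <len><token indices> records and markers[i] is the stream
    offset of symbol number i * stride.
    """
    addresses, stream, markers = [], bytearray(), []
    for i, (addr, tokens) in enumerate(symbols):
        if i % stride == 0:
            markers.append(len(stream))
        addresses.append(addr)
        stream.append(len(tokens))
        stream.extend(tokens)
    return addresses, bytes(stream), markers


def expand(stream, off, token_table):
    """Decode one <len><token indices> record into a string."""
    n = stream[off]
    return "".join(token_table[t] for t in stream[off + 1:off + 1 + n])


def lookup(addr, addresses, stream, markers, token_table, stride):
    """Address -> name: find the tuple offset, jump via the marker, skip, decode."""
    pos = bisect.bisect_right(addresses, addr) - 1   # step 1
    if pos < 0:
        return None
    off = markers[pos // stride]                     # step 2
    for _ in range(pos % stride):                    # step 3
        off += stream[off] + 1
    return expand(stream, off, token_table)          # step 4
```

The markers bound the linear skipping in step 3 to at most stride - 1 records, which is why lookups by address are cheap while lookups by name would have to decode the whole stream.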
NEW CODE:
To make that work we add one fixed size table called symbols_sorted_offsets which
has two elements: offset in symbol stream, offset in the symbol-address.
This whole array is sorted on the original symbol name during build-time
(in case of collision we also take into account the type).
Which makes it incredibly easy to get in the symbols_names and also
symbols_addresses (or symbols_offsets)
Searching for symbols is simplified as we can do a binary search
on symbols_sorted_offsets. Since the symbols are sorted it takes on
average 13 calls to symbols_expand_symbol.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
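A sketch of the name -> address direction follows. It assumes a simplified names stream (plain length-prefixed ASCII instead of the token encoding) and hypothetical function names; only the idea of a build-time table of (stream offset, address index) pairs sorted by name is taken from the description above.

```python
def build(symbols):
    """symbols: list of (address, name) sorted by address.

    Returns (addresses, names stream, sorted_offsets), where sorted_offsets
    holds (offset in stream, index into addresses) pairs ordered by name.
    """
    addresses = [addr for addr, _ in symbols]
    stream, offsets = bytearray(), []
    for _, name in symbols:
        offsets.append(len(stream))
        raw = name.encode("ascii")
        stream.append(len(raw))
        stream.extend(raw)
    sorted_offsets = sorted(
        ((offsets[i], i) for i in range(len(symbols))),
        key=lambda entry: symbols[entry[1]][1],
    )
    return addresses, bytes(stream), sorted_offsets


def expand(stream, off):
    """Decode one length-prefixed name (stand-in for symbols_expand_symbol)."""
    n = stream[off]
    return stream[off + 1:off + 1 + n].decode("ascii")


def address_of(name, addresses, stream, sorted_offsets):
    """Binary search over sorted_offsets, expanding one name per probe."""
    lo, hi = 0, len(sorted_offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        stream_off, addr_idx = sorted_offsets[mid]
        candidate = expand(stream, stream_off)
        if candidate == name:
            return addresses[addr_idx]
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Each probe expands exactly one symbol, so for the 6849 symbols mentioned earlier the search needs roughly log2(6849), about 12.7, expansions, in line with the figure of 13 quoted above.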
Ross Lagerwall [Wed, 20 Apr 2016 20:19:27 +0000 (16:19 -0400)]
xsplice,symbols: Implement symbol name resolution on address.
If in the payload we do not have the old_addr we can resolve
the virtual address based on the UNDEFined symbols.
We also use a boolean flag: new_symbol to track symbols. The usual
cases where this is used are:
* A payload may introduce a new symbol
* A payload may override an existing symbol (introduced in Xen or another
payload)
* Overriding symbols must exist in the symtab for backtraces.
* A payload must always link against the object which defines the new symbol.
Considering that payloads may be loaded in any order it would be incorrect to
link against a payload which simply overrides a symbol because you could end
up with a chain of jumps which is inefficient and may result in the expected
function not being executed.
Since the payload we get is an relocatable image (partial linked ELF file)
we have to match up the symbols. We follow the ELF visibility rules for that
and for local symbols do what binutils ld does.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
This change demonstrates how to generate an xSplice ELF payload.
The idea here is that we want to patch in the hypervisor
the 'xen_extra_version' function with a function that will
return 'Hello World'. The 'xl info | grep extraversion'
will reflect the new value after the patching.
To generate this ELF payload file we need:
- C code of the new code (xen_hello_world_func.c).
- C code generating the .xsplice.funcs structure
(xen_hello_world.c)
- The address of the old code (xen_extra_version). We
retrieve it by using 'nm --defined' on xen-syms.
- The size of the new and old code for which we use
nm --defined -S on our code and xen-syms respectively.
There are two C files and one header file generated
during build. One could make this one C file if the
size of the newly patched function was known in
advance (or a random value was chosen).
There is also a strict order of compiling:
1) xen_hello_world_func.c
2) config.h - extract the size of the new function,
the old function and the old function address.
3) xen_hello_world.c - which contains the .xsplice.funcs
structure.
4) Link the object files into a xen_hello_world.xsplice file.
The use-case is simple:
$xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
$xen-xsplice list
ID | status
----------------------------------------+------------
xen_hello_world | APPLIED
$xl info | grep extra
xen_extra : Hello World
$xen-xsplice revert xen_hello_world
Performing revert: completed
$xen-xsplice unload xen_hello_world
Performing unload: completed
$xl info | grep extra
xen_extra : -unstable
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> [ARM] Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Wed, 27 Apr 2016 13:07:04 +0000 (09:07 -0400)]
xsplice: Implement support for applying/reverting/replacing patches.
Implement support for the apply, revert and replace actions.
To perform an action on a payload, the hypercall sets up a data
structure to schedule the work. A hook is added in the reset_stack_and_jump
to check for work and execute it if needed (specifically we check a
per-cpu flag to make this as quick as possible).
In this way, patches can be applied with all CPUs idle and without
stacks. The first CPU to run check_for_xsplice_work() becomes the
master and triggers a reschedule softirq to trigger all the other CPUs
to enter check_for_xsplice_work() with no stack. Once all CPUs
have rendezvoused, all CPUs disable their IRQs and NMIs are ignored.
The system is then quiescent and the master performs the action.
After this, all CPUs enable IRQs and NMIs are re-enabled.
Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on NMI/MCE path is liable to end in disaster on x86.
This is not addressed in this patch and is mentioned in the
design doc as a further TODO.
The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
number of bytes of the old function and replace it with a jump to the
new function. (on x86 it is 5 bytes, on ARM it will likely be 4 bytes).
- REVERT: Copy the previously stored bytes into the first arch-specific
number of bytes of the old function (again, 5 bytes on x86).
- REPLACE: Revert each applied module and then apply the new module.
To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out.
Measurements found patch application to take about 100 μs on a
72 CPU system, whether idle or fully loaded.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
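The APPLY and REVERT actions described above can be modelled on a byte buffer. This is only a sketch under stated assumptions: memory is a Python bytearray indexed by "address", the 5-byte patch is taken to be an x86 near jump (opcode 0xE9 followed by a signed 32-bit displacement relative to the end of the instruction), and all function names are hypothetical.

```python
import struct

PATCH_SIZE = 5  # x86: opcode byte plus 32-bit relative displacement


def make_jmp(old_addr, new_addr):
    """Encode 'jmp new_addr' as it would sit at old_addr."""
    rel = new_addr - (old_addr + PATCH_SIZE)
    return b"\xe9" + struct.pack("<i", rel)  # raises struct.error if rel needs > 32 bits


def apply_patch(mem, old_addr, new_addr):
    """Save the first PATCH_SIZE bytes of the old function, then overwrite them."""
    saved = bytes(mem[old_addr:old_addr + PATCH_SIZE])
    mem[old_addr:old_addr + PATCH_SIZE] = make_jmp(old_addr, new_addr)
    return saved


def revert_patch(mem, old_addr, saved):
    """Put the previously saved bytes back."""
    mem[old_addr:old_addr + PATCH_SIZE] = saved
```

REPLACE would then amount to calling revert_patch for every applied module followed by apply_patch for the new one. The real hypervisor does this only once all CPUs have rendezvoused, which this model does not attempt to show.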
Ross Lagerwall [Wed, 27 Apr 2016 13:01:51 +0000 (09:01 -0400)]
xsplice: Implement payload loading
Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the elf file.
- Parse the elf file.
- Allocate a region of memory mapped within a free area of
[xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region. Split them in three
regions - .text, .data, and .rodata. MUST have at least .text.
- Resolve section symbols. All other symbols must be absolute addresses.
(Note that patch titled "xsplice,symbols: Implement symbol name resolution
on address" implements that)
- Perform relocations.
- Secure the regions (.text,.data,.rodata) with proper permissions.
We capitalize on the vmalloc callback API (see patch titled:
"arm/x86/vmap: Add v[z|m]alloc_xen, and vm_init_type") to allocate
a region of memory within the [xen_virt_end, XEN_VIRT_END] for the code.
We also use the "x86/mm: Introduce modify_xen_mappings()"
to change the virtual address page-table permissions.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Fri, 19 Feb 2016 19:37:17 +0000 (14:37 -0500)]
xsplice: Add helper elf routines
Add Elf routines and data structures in preparation for loading an
xSplice payload.
We make an assumption that the max number of sections an ELF payload
can have is 64. We can in future make this be dependent on the
names of the sections and verifying against a list, but for right now
this suffices.
Also we add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that the offsets do not point past the file.
For most of the checks we print a message if the hypervisor is built
with debug enabled.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
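The kind of sanity checks described above can be sketched as follows. The 64-section limit is the one stated in the commit message; the function name, the (offset, size) representation and the error strings are hypothetical.

```python
MAX_SECTIONS = 64  # limit stated in the commit message


def validate_sections(file_size, sections):
    """sections: list of (offset, size) pairs taken from section headers.

    Returns a list of human-readable problems; an empty list means the
    layout passed these particular checks.
    """
    errors = []
    if len(sections) > MAX_SECTIONS:
        errors.append("too many sections")
    for i, (off, size) in enumerate(sections):
        # Compare size against the remaining space rather than computing
        # off + size, which in fixed-width C arithmetic could overflow.
        if off < 0 or size < 0 or off > file_size or size > file_size - off:
            errors.append("section %d points past the file" % i)
    return errors
```

Writing the bound as size > file_size - off mirrors how such a check is usually phrased in C to avoid wrap-around on hostile input.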
For those users who want to use the virtual addresses that
are in the hypervisor's code/data region address space -
these three new functions allow that.
Implementation wise the vmap API keeps track of two virtual
address regions now:
a) VMAP_VIRT_START
b) Any provided virtual address space (need start and end).
The a) one is the default one and the existing behavior
for users of vmalloc, vmap, etc is the same.
If however one wishes to use the b) one, one only has to use
vm_init_type to initialize it and v[z|m]alloc_xen to utilize
it (vfree and vunmap are capable of searching both address spaces).
This allows users (such as xSplice) to provide their own
mechanism to change the page flags, and also use virtual
addresses closer to the hypervisor virtual addresses (at least
on x86) while not having to deal with the allocation of
pages.
For example of users, see patch titled "xsplice: Implement payload
loading", where we parse the payload's ELF relocations - which
is defined to be signed 32-bit (on x86) (max displacement hence
is 2GB virtual space, ARM32 is 128MB). The displacement of the
hypervisor virtual addresses to the vmalloc (on x86)
is more than 32-bits - which means that ELF relocations would
truncate the 33rd and 34th bits. Hence this alternate API.
We also add extra checks in case the b) range has not been
initialized.
Part of this patch also removes 'vm_alloc' and 'vm_free'
declaration as we do not have any users of it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> [ARM] Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
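The displacement argument above comes down to a range check. A minimal sketch, using made-up addresses rather than the real Xen layout and a hypothetical function name:

```python
def fits_rel32(place, target):
    """True if 'target' is reachable from 'place' with a signed 32-bit displacement."""
    disp = target - place
    return -(1 << 31) <= disp < (1 << 31)


# Illustrative addresses only: a payload placed close to the code it patches,
# and one placed 12 GiB away (a displacement with bits 32 and 33 set).
CODE = 0xFFFF82D080000000
NEAR_PAYLOAD = CODE + 0x100000
FAR_PAYLOAD = CODE - (3 << 32)
```

With these values fits_rel32(NEAR_PAYLOAD, CODE) holds while fits_rel32(FAR_PAYLOAD, CODE) does not, which is the situation the alternate allocation range is meant to avoid.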
arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.
During execution of the hypervisor we have two regions of
executable code - stext -> _etext, and _sinittext -> _einittext.
The latter is not needed after bootup.
We also have various built-in macros and functions to search
in between those two swaths depending on the state of the system.
That is either for bug_frames, exceptions (x86) or symbol
names for the instruction.
With xSplice in the picture - we need a mechanism for new payloads
to be searched as well for all of this.
Originally we had extra 'if (xsplice)...' but that gets
a bit tiring and does not hook up nicely.
This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol names entries without having various calls in
other sub-components in the system.
Code which wishes to participate in bug_frames and exception table
entries search has to only use two public APIs:
- register_virtual_region
- unregister_virtual_region
to let the core code know.
If the ->lookup_symbol is not set then the default internal symbol lookup
mechanism is used.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> [ARM] Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
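The registration scheme above can be pictured with a small model. The two function names follow the public APIs named in the commit message; everything else (the Python class, default_lookup, symbol_for and the addresses) is a hypothetical stand-in.

```python
class VirtualRegion:
    """Stand-in for 'struct virtual_region': an address range plus an
    optional symbol lookup callback."""

    def __init__(self, start, end, lookup_symbol=None):
        self.start = start
        self.end = end
        self.lookup_symbol = lookup_symbol


_regions = []


def register_virtual_region(region):
    _regions.append(region)


def unregister_virtual_region(region):
    _regions.remove(region)


def default_lookup(addr):
    """Placeholder for the built-in symbol table lookup."""
    return "core+%#x" % addr


def symbol_for(addr):
    """Find the region containing addr and ask it (or the default) for a name."""
    for region in _regions:
        if region.start <= addr < region.end:
            lookup = region.lookup_symbol or default_lookup
            return lookup(addr)
    return None
```

Core code only walks the list, so a payload that registers its own range and callback is searched without any xSplice-specific branches elsewhere.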
A simple tool that allows a system admin to perform
basic xsplice operations:
- Upload an xsplice file (with a unique name)
- List all the xsplice payloads loaded.
- Apply, revert, replace, or unload the payload using the
unique name.
- Do both - upload, and apply the payload in one go (load).
Also will use the name of the file as the <name>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
The underlying toolstack code to do the basic
operations when using the XEN_XSPLICE_op syscalls:
- upload the payload,
- get status of a payload,
- list all the payloads,
- apply, check, replace, and revert the payload.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
The implementation does not actually do any patching.
It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
- query which payloads exist,
- query for specific payloads,
- check*1, apply*1, replace*1, and unload payloads.
*1: Which of course in this patch are nops.
The functionality is disabled on ARM until all arch
components are implemented.
Also by default it is disabled until the implementation
is in place.
We also use recursive spinlocks so that the find_payload
function does not need to have a 'lock' and 'non-lock' variant.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
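The point about recursive locks can be shown with a user-space analogy. This sketch uses Python's re-entrant threading.RLock in place of a hypervisor recursive spinlock; find_payload is the name used above, while payload_lock, payloads and upload are hypothetical.

```python
import threading

payload_lock = threading.RLock()  # re-entrant: the owner may take it again
payloads = {}


def find_payload(name):
    """Always takes the lock itself, whether or not the caller already holds it."""
    with payload_lock:
        return payloads.get(name)


def upload(name, data):
    """Holds the lock across the lookup and the insertion."""
    with payload_lock:
        if find_payload(name) is not None:  # nested acquisition, no deadlock
            return False
        payloads[name] = data
        return True
```

With a non-recursive lock the nested call in upload would deadlock, which is why a separate unlocked variant of find_payload would otherwise be needed.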
A mechanism is required to binarily patch the running hypervisor with new
opcodes that have come about primarily due to security updates.
This document describes the design of the API that would allow us to
upload to the hypervisor binary patches.
This document has been shaped by the input from:
Martin Pohlack <mpohlack@amazon.de>
Jan Beulich <jbeulich@suse.com>
Thank you!
Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 28 Apr 2016 13:10:45 +0000 (15:10 +0200)]
x86/vMSI-X: also snoop qword writes
... the high half of which may be a write to the Vector Control field.
This gets things in sync again with msixtbl_write().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
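Why a qword write matters can be seen from the MSI-X table entry layout, in which the Vector Control dword sits at byte offset 12 of each 16-byte entry. The helper below is a hypothetical illustration and assumes the write does not cross an entry boundary.

```python
MSIX_ENTRY_SIZE = 16     # bytes per MSI-X table entry
VECTOR_CTRL_OFFSET = 12  # Vector Control dword within an entry


def write_hits_vector_ctrl(offset, size):
    """True if a write of 'size' bytes at table offset 'offset' covers the
    first byte of an entry's Vector Control field."""
    start = offset % MSIX_ENTRY_SIZE
    return start <= VECTOR_CTRL_OFFSET < start + size
```

A dword write to Message Data (offset 8) leaves Vector Control alone, but a qword write at the same offset updates both, so its high half has to be snooped as well.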
Jan Beulich [Thu, 28 Apr 2016 13:10:22 +0000 (15:10 +0200)]
x86/vMSI-X: add further checks to snoop logic
msixtbl_range(), like any other MMIO ->check() handler, may get called
with other than the base address of an access - avoid the snoop logic
considering those.
Also avoid considering vCPU-s not blocked in the hypervisor in
msixtbl_pt_register(), just to be on the safe side.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 28 Apr 2016 13:09:26 +0000 (15:09 +0200)]
x86/HVM: fix forwarding of internally cached requests (part 2)
Commit 96ae556569 ("x86/HVM: fix forwarding of internally cached
requests") wasn't quite complete: hvmemul_do_io() also needs to
propagate up the clipped count. (I really should have re-tested the
forward port resulting in the earlier change, instead of relying on the
testing done on the older version of Xen which the fix was first needed
for.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
public/x86: remove HVMMEM_mmio_write_dm from the public interface
HVMMEM_mmio_write_dm is removed for new xen interface versions, and
is replaced with type HVMMEM_unused. Attempts to set a page to this
type will return -EINVAL in xen after 4.7.0. And there will be no
pages with type p2m_mmio_write_dm, therefore HVMOP_get_mem_type will
never get the old type - HVMMEM_mmio_write_dm.
New approaches to write protect guest ram pages will be provided in
future patches.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
The current test performed in order to check if the assembler supports
certain instructions doesn't take into account the value of AFLAGS, which
when using clang contains the option that disables the integrated assembler
due to the lack of features.
As a result of this, the current instruction tests were performed against the
integrated assembler, but then at build time the non-integrated assembler
was used. If both have feature-parity, this is a non-issue, but we cannot
assume this.
Fix this by passing AFLAGS in the instruction test, and including the arch
Rules.mk makefile after AFLAGS is set.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 28 Apr 2016 13:06:56 +0000 (15:06 +0200)]
x86/time: fix gtime_to_gtsc for vtsc=1 PV guests
For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
vcpu_time_info, is calculated from stime_local_stamp using
gtime_to_gtsc.
However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
actually happen when gtime_to_gtsc is called passing stime_local_stamp
(the caller function is __update_vcpu_system_time).
In that case the pvclock protocol doesn't work properly and the guest is
unable to calculate the system time correctly. As a consequence when the
guest tries to set a timer event (for example calling the
VCPUOP_set_singleshot_timer hypercall), the event will be in the past
causing Linux to hang.
The purpose of the pvclock protocol is to allow the guest to calculate
the system_time in nanosec correctly. The guest calculates as follows:
Given that with vtsc=1:
rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)
The expression evaluates to NOW(), which is what we want. However when
stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
actually 0. As a consequence the calculated overall system_time is not
correct.
This patch fixes the issue by letting gtime_to_gtsc return a negative
integer in the form of a wrapped around unsigned integer, so that when the
guest subtracts vcpu_time_info.tsc_timestamp from rdtsc it calculates
the right value.
Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
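The effect of the wrap-around can be checked with 64-bit modular arithmetic. This is a sketch under loud assumptions: the vTSC runs at a made-up 2 ticks per nanosecond, the guest-side formula is taken to have the usual pvclock shape (system_time plus the scaled TSC delta), and the _old/_new function names are hypothetical labels for the behaviour before and after the fix.

```python
MASK64 = (1 << 64) - 1
TICKS_PER_NS = 2  # hypothetical vTSC frequency of 2 GHz


def gtime_to_gtsc_old(time_ns, vtsc_offset):
    """Behaviour before the fix: clamp to 0 when time < vtsc_offset."""
    if time_ns < vtsc_offset:
        return 0
    return ((time_ns - vtsc_offset) * TICKS_PER_NS) & MASK64


def gtime_to_gtsc_new(time_ns, vtsc_offset):
    """Behaviour after the fix: a negative result wraps as an unsigned 64-bit value."""
    return ((time_ns - vtsc_offset) * TICKS_PER_NS) & MASK64


def guest_system_time(system_time, tsc_timestamp, rdtsc):
    """Assumed guest computation: system_time + scaled (rdtsc - tsc_timestamp)."""
    delta = (rdtsc - tsc_timestamp) & MASK64
    return system_time + delta // TICKS_PER_NS
```

With vtsc_offset = 1000, a stamp taken at 400 ns and a read at NOW() = 1500 ns, the wrapped timestamp makes the guest compute 1500, whereas the clamped value of 0 yields 900.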
Andrew Cooper [Wed, 27 Apr 2016 13:06:04 +0000 (14:06 +0100)]
travis: Enable tools when building with clang
tools now build under clang, so let them be tested.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Doug Goldstein <cardoe@cardoe.com>
Andrew Cooper [Wed, 27 Apr 2016 16:10:57 +0000 (17:10 +0100)]
travis: Remove clang-3.8 build
The package appears to have been renamed in Ubuntu. The only reason this test
is currently passing is because the hypervisor build falls back to clang, at
version 3.5.
Add an explicit test in the build script that our desired compiler is
available. Note that travis already performs this step, but in a way which
isn't fatal to the build.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Andrew Cooper [Wed, 27 Apr 2016 12:58:27 +0000 (13:58 +0100)]
tools/kdd: Fix uninitialised variable warning
Clang warns:
kdd.c:1031:9: error: variable 'fd' is used uninitialized whenever '||'
condition is true [-Werror,-Wsometimes-uninitialized]
if (argc != 4
^~~~~~~~~
kdd.c:1040:20: note: uninitialized use occurs here
if (select(fd + 1, &fds, NULL, NULL, NULL) > 0)
^~
This situation can't actually happen, as usage() is a terminal path. Annotate
it appropriately.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com>