Jan Beulich [Wed, 7 Dec 2016 12:54:41 +0000 (13:54 +0100)]
x86/HVM: drop hvm_emulate_one_no_write()
It was pointlessly non-static, and being static and a simple wrapper it
may as well be folded into its single caller.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 7 Dec 2016 12:53:50 +0000 (13:53 +0100)]
libelf: treat phdr and shdr similarly
Just like elf_shdr_count(), elf_phdr_count() should bounds-check the
value.
Add table entry size checks to elf_init().
Also, both program and section headers are optional, and hence they
should be checked only when any such headers are
present.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
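The checks described in this entry can be illustrated with a minimal sketch. The function name, its parameters and the idea of passing a minimum entry size are assumptions made for illustration, not libelf's actual interface; the sketch only shows a header table being validated when entries are present and skipped when there are none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not libelf code: validate a header table that
 * starts at byte 'off' of an image of 'file_size' bytes.
 */
static bool demo_table_ok(uint64_t off, uint64_t entsize, uint64_t count,
                          uint64_t min_entsize, uint64_t file_size)
{
    assert(min_entsize > 0);

    if ( count == 0 )
        return true;               /* headers are optional: nothing to check */
    if ( entsize < min_entsize )
        return false;              /* entries too small to hold a header */
    if ( off > file_size )
        return false;              /* table starts beyond the image */

    /* Dividing instead of multiplying avoids overflow in count * entsize. */
    return count <= (file_size - off) / entsize;
}
```

A caller would run the same check once for the program header table and once for the section header table.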
Jan Beulich [Wed, 7 Dec 2016 12:53:28 +0000 (13:53 +0100)]
libelf: type adjustments
Don't needlessly use uint64_t when unsigned suffices.
Also don't open code elf_phdr_count() and replace a redundant call to
elf_shdr_count().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 7 Dec 2016 12:52:59 +0000 (13:52 +0100)]
libelf: use UINT_MAX
While Xen indeed doesn't have limits.h, it still does have UINT_MAX, so
we should avoid open coding it (and perhaps - even if unlikely -
getting it wrong).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 7 Dec 2016 12:52:35 +0000 (13:52 +0100)]
libelf: section index 0 is special
When iterating over sections, table entry zero needs to be ignored.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
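As a rough illustration of the rule above, a section walk can simply start at index 1. The structure and function below are simplified, hypothetical stand-ins rather than libelf's real types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified section header; not libelf's real layout. */
struct demo_shdr {
    unsigned int type;
    unsigned int size;
};

/* Sum the sizes of all real sections; entry 0 is the reserved null entry. */
static unsigned long demo_total_size(const struct demo_shdr *tab,
                                     unsigned int count)
{
    unsigned long total = 0;
    unsigned int i;

    assert(tab != NULL || count == 0);

    for ( i = 1; i < count; i++ )   /* start at 1: index 0 is special */
        total += tab[i].size;

    return total;
}
```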
Jan Beulich [Wed, 7 Dec 2016 12:49:08 +0000 (13:49 +0100)]
x86/HVM: prefer structure assignment for seg reg copying
This makes things type safe.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
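A minimal sketch of why structure assignment is the type-safe choice: unlike memcpy(), the compiler rejects an assignment between mismatched structure types. The structure below is a hypothetical stand-in for a segment register, not Xen's actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a segment register; field names are assumed. */
struct demo_segment_register {
    uint16_t sel;
    uint16_t attr;
    uint32_t limit;
    uint64_t base;
};

static void demo_copy_seg(struct demo_segment_register *dst,
                          const struct demo_segment_register *src)
{
    assert(dst != NULL && src != NULL);

    /* The compiler checks that both sides have the same type. */
    *dst = *src;
}
```

With memcpy(dst, src, sizeof(*dst)) the same copy would compile even if src pointed at an unrelated type.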
Cédric Bosdonnat [Mon, 28 Nov 2016 13:53:57 +0000 (14:53 +0100)]
libxl: invert xc and domain model resume calls in xc_domain_resume()
Resume sometimes fails silently for HVM guests. Calling
xc_domain_resume() and libxl__domain_resume_device_model() in the
reverse of the order used by the suspend code fixes the problem.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: rebase it on top of staging ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 5 Oct 2016 11:42:15 +0000 (12:42 +0100)]
xen/common: Replace incorrect mandatory barriers with SMP barriers
Mandatory barriers are only for use with reduced-cacheability MMIO mappings.
All of these uses are just to deal with shared memory between multiple
processors, so use the smp_*() variants, which are the correct barriers for the purpose.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
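The shared-memory pattern these barriers protect can be sketched as follows. This is not Xen code: C11 fences from <stdatomic.h> are used as an approximate stand-in for smp_wmb() and smp_rmb(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int demo_payload;         /* ordinary shared memory, not MMIO */
static atomic_bool demo_ready;   /* zero-initialised, i.e. false */

/* Producer: make the payload visible before the flag. */
static void demo_publish(int value)
{
    demo_payload = value;
    atomic_thread_fence(memory_order_release);   /* role of smp_wmb() */
    atomic_store_explicit(&demo_ready, true, memory_order_relaxed);
}

/* Consumer: read the flag first, then the payload. */
static bool demo_consume(int *out)
{
    assert(out != NULL);

    if ( !atomic_load_explicit(&demo_ready, memory_order_relaxed) )
        return false;
    atomic_thread_fence(memory_order_acquire);   /* role of smp_rmb() */
    *out = demo_payload;
    return true;
}
```

Only ordering between processors is needed here, which is why an SMP barrier is enough and a mandatory barrier is not required.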
Julien Grall [Mon, 5 Dec 2016 17:43:23 +0000 (17:43 +0000)]
xen/arm: traps: Emulate ICC_SRE_EL1 as RAZ/WI
Recent Linux kernels (4.4 onwards [1]) check whether it is possible
to enable sysreg access (ICC_SRE_EL1.SRE) when the ID register
(ID_AA64PFR0_EL1.GIC) reports the presence of the sysreg interface.
When the guest has been configured to use GICv2, the hypervisor will
disable sysreg access for this VM (via ICC_SRE_EL2.Enable), and therefore
accesses to system registers such as ICC_SRE_EL1 are trapped to EL2.
However, ICC_SRE_EL1 is not emulated by the hypervisor. This means that
Linux will crash as soon as it tries to access ICC_SRE_EL1.
To solve this problem, Xen can implement ICC_SRE_EL1 as read-as-zero
write-ignore. The emulation will only be used when sysreg access is
disabled for EL1.
[1] 963fcd409 "arm64: cpufeatures: Check ICC_EL1_SRE.SRE before
enabling ARM64_HAS_SYSREG_GIC_CPUIF"
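Read-as-zero/write-ignore (RAZ/WI) emulation amounts to very little code. The helper below is a hypothetical sketch of the behaviour only; it does not reproduce Xen's trap handling or its register-access helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical RAZ/WI handler: 'reg' is the guest register involved in
 * the trapped access, 'is_read' tells whether the guest was reading.
 */
static void demo_handle_raz_wi(bool is_read, uint64_t *reg)
{
    assert(reg != NULL);

    if ( is_read )
        *reg = 0;   /* read-as-zero */
    /* write-ignore: the value the guest wrote is simply discarded */
}
```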
arm/irq: Reorder check when the IRQ is already used by someone
Call irq_get_domain for the IRQ we are interested in
only after making sure that it is the guest IRQ to avoid
ASSERT(test_bit(_IRQ_GUEST, &desc->status)) triggering.
The restriction on non-contiguous memory was resolved by commit 2d02b05c77fc5e7c76bf6f112db84bbaa44fdcb5:
"xen: arm: improve handling of system with non-contiguous RAM regions"
So revert this change to enable Xen image placement at the end of the
usable system RAM.
Peng Fan [Fri, 23 Sep 2016 02:55:34 +0000 (10:55 +0800)]
xen/arm: domain_build: allocate lowmem for dom0 as much as possible
On AArch64 SoCs, some IPs may only be able to access a 32-bit
address space. The physical memory assigned to Dom0 may not lie
within the first 4GB of the address space, in which case those IPs
will not work properly. So memory below 4GB needs to be allocated for Dom0.
There is no restriction on how much lowmem needs to be allocated for
Dom0, so allocate as much lowmem as possible for Dom0.
This patch does not affect 32-bit domains, because the variable "lowmem"
is set to true at the beginning. If bank0 cannot be allocated under 4GB,
Xen needs to panic for a 32-bit domain, because a 32-bit domain requires
bank0 to be allocated under 4GB.
For a 64-bit domain, set "lowmem" to false and continue allocating
memory from above 4GB.
Jun Sun [Mon, 10 Oct 2016 19:27:56 +0000 (12:27 -0700)]
Don't clear HCR_VM bit when updating VTTBR.
Currently function p2m_restore_state() would clear HCR_VM bit, i.e.,
disabling stage2 translation, before updating VTTBR register. After
some research and talking to ARM support, I got confirmation that this is not
necessary. We are currently working on a new platform that would need this
to be removed.
The patch is tested on FVP foundation model.
Signed-off-by: Jun Sun <jsun@junsun.net> Acked-by: Steve Capper <steve.capper@linaro.org> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Juergen Gross [Tue, 6 Dec 2016 06:41:54 +0000 (07:41 +0100)]
tools/xenstore: avoid unterminated string in xs_directory_part()
Commit d4016288ab1f ("xenstore: support XS_DIRECTORY_PART in
libxenstore") introduced a theoretical bug: the generation count of
the read node is transferred via strncpy without forcing a NUL byte
at the end. Correct this.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
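The class of bug fixed here is easy to reproduce: strncpy() does not write a terminating NUL when the source is at least as long as the size limit. A minimal sketch of the usual remedy, with a hypothetical helper name that is not taken from libxenstore:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy 'src' into 'dst' (capacity 'dst_size'), always NUL-terminating. */
static void demo_copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* force the terminator strncpy may omit */
}
```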
Wei Liu [Mon, 5 Dec 2016 16:45:36 +0000 (16:45 +0000)]
Travis-ci: specify KCONFIG_ALLCONFIG for randconfig
The file provided contains symbols that must be set to certain values.
This then prevents random build breakage in travis due to
known-incompatible symbol selections.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Andrew Cooper [Tue, 25 Oct 2016 18:41:01 +0000 (19:41 +0100)]
x86/emul: Debugging improvements to the test harness
Disable stdout buffering, so logging gets out even if the harness crashes.
Add a verbose option (disabled at compile time) which dumps all read/write calls
the harness makes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Dec 2016 18:23:02 +0000 (18:23 +0000)]
x86/shadow: Drop stale adjustment in the PAE second-half search
This shouldn't have been present in c/s 29a57c992 "x86/emul: Rework emulator
event injection". It was a leftover from a previous version of the series.
This conditional has no effect on the behaviour following it, as both
X86EMUL_EXCEPTION and X86EMUL_UNHANDLEABLE fall into the same "return back to
guest" path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 24 May 2016 10:56:58 +0000 (11:56 +0100)]
x86/pagewalk: Improve print_gw()
print_gw() has no callers, meaning that it only gets used as part of manual
debugging. As such, the FILE/LINE references are of no practical use, and
voluminous in the log. Additionally, the function becoming empty in a
non-debug build is unhelpful. Switch from gdprintk() to gprintk().
Print the entry and mfn for a specific level on the same line. This halves
the number of lines printed overall. There needs to be a small adjustment to
the #ifdef'ary to maintain the proper l3e behaviour for 3-level paging, where
there is no l3mfn to print.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Juergen Gross [Mon, 5 Dec 2016 07:48:52 +0000 (08:48 +0100)]
xenstore: add small default data buffer to internal struct
Instead of always allocating a data buffer for incoming or outgoing
xenstore wire data add a small buffer to the buffered_data structure
of xenstored. This has the advantage that especially sending simple
response messages like errors or "OK" will no longer need allocating
a data buffer. This requires adding a memory context where the
allocated buffer was used for that purpose.
In order to avoid allocating a new buffered_data structure for each
response reuse the structure of the original request. This in turn
will avoid any new memory allocations for sending e.g. an ENOMEM
response making it possible to send it at all. To do this the
allocation of the buffered_data structure for the incoming request
must be done when a new request is recognized instead of doing it
when accepting a new connect.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:51 +0000 (08:48 +0100)]
xenstore: add helper functions for wire argument parsing
The xenstore wire command argument parsing of the different commands
is repeating some patterns multiple times. Add some helper functions
to avoid the duplicated code.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:46 +0000 (08:48 +0100)]
xenstore: add support for reading directory with many children
As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.
In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries.
Input data are the path of the node and the byte offset into the child
list where returned data should start.
Output is the generation count of the node (which will change each time
the node is being modified) and a list of child names starting with
the specified index. The end of the list is indicated by an empty
child name. It is the responsibility of the caller to check for data
consistency by comparing the generation counts of all returned data
sets to be the same for one node.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
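The caller-side loop described above might look like the following sketch. It does not talk to xenstored: demo_directory_part() is a hypothetical in-memory stand-in for one XS_DIRECTORY_PART round trip, the "done" flag stands in for the empty child name that marks the end of the list, and all structure and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one xenstore node. */
struct demo_node {
    uint64_t generation;     /* changes whenever the node is modified */
    const char *children;    /* NUL-separated child names */
    size_t children_len;     /* total bytes, including the NULs */
};

/*
 * Stand-in for one round trip: copy whole child names starting at byte
 * offset 'off' into 'out' (at most 'max' bytes), report the generation,
 * and set *done once the list is exhausted. Returns the bytes copied.
 */
static size_t demo_directory_part(const struct demo_node *n, size_t off,
                                  char *out, size_t max,
                                  uint64_t *gen, int *done)
{
    size_t used = 0;

    assert(n && out && gen && done && off <= n->children_len);
    *gen = n->generation;

    while ( off + used < n->children_len )
    {
        size_t len = strlen(n->children + off + used) + 1;

        if ( used + len > max )
            break;                        /* next name does not fit */
        memcpy(out + used, n->children + off + used, len);
        used += len;
    }

    *done = (off + used == n->children_len);
    return used;
}

/*
 * Fetch chunks until the end marker, restarting from offset 0 whenever
 * the generation differs from that of the first chunk. Returns the number
 * of children seen, or -1 if a chunk made no progress.
 */
static int demo_count_children(const struct demo_node *n, size_t chunk)
{
    char buf[64];
    size_t off = 0;
    uint64_t first_gen = 0, gen;
    int count = 0, done = 0;

    assert(n != NULL && chunk <= sizeof(buf));

    while ( !done )
    {
        size_t i, got = demo_directory_part(n, off, buf, chunk, &gen, &done);

        if ( off != 0 && gen != first_gen )
        {
            off = 0;                      /* node changed: start over */
            count = 0;
            done = 0;
            continue;
        }
        if ( off == 0 )
            first_gen = gen;
        if ( got == 0 && !done )
            return -1;                    /* one name exceeds the chunk */

        for ( i = 0; i < got; i++ )
            if ( buf[i] == '\0' )
                count++;
        off += got;
    }

    return count;
}
```

Because the node in this sketch never changes, the restart path is shown but not exercised; a real caller would hit it whenever another client modifies the node between two chunks.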
Juergen Gross [Mon, 5 Dec 2016 07:48:45 +0000 (08:48 +0100)]
xenstore: add per-node generation counter
In order to be able to support reading the list of a node's children in
multiple chunks (needed for list sizes > 4096 bytes) without having to
allocate a temporary buffer we need some kind of generation counter for
each node. This will help to recognize a node has changed between
reading two chunks.
As removing a node and reintroducing it must result in different
generation counts each generation value has to be globally unique. This
can be ensured only by using a global 64 bit counter.
For handling of transactions there is already such a counter available,
it just has to be expanded to 64 bits and must be stored in each
modified node.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
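The uniqueness argument can be made concrete with a small sketch: every modification takes the next value of one global 64-bit counter, so a node that is removed and recreated can never end up with a generation it had before. Names below are hypothetical and not taken from xenstored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One global counter shared by all nodes (hypothetical name). */
static uint64_t demo_generation;

struct demo_gen_node {
    uint64_t generation;   /* value of the counter at the last modification */
};

/* Record a modification: stamp the node with a fresh, globally unique value. */
static void demo_mark_modified(struct demo_gen_node *node)
{
    assert(node != NULL);

    node->generation = ++demo_generation;
}
```

A per-node counter starting at zero would not give this guarantee, since a recreated node would repeat earlier values.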
Juergen Gross [Mon, 5 Dec 2016 07:48:43 +0000 (08:48 +0100)]
xenstore: call add_change_node() directly when writing node
Instead of calling add_change_node() at places where write_node() is
called, do that inside write_node().
Note that there is one case where add_change_node() is now called even
though a later failure will prevent the changed node from being written:
a write_node() failing due to an error in tdb_store(). As the only
visible change of behavior is a stale event fired for the node, while
the failing tdb_store() signals a corrupted xenstore database, the
stale event will be the least of the problems in this case.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
In order to prepare for adding a generation count to each node, modify
add_change_node() to take the connection pointer and a node pointer
instead of the transaction pointer and node name as parameters. This
requires moving the call of add_change_node() from do_rm() to
delete_node_single().
While at it correct the comment for the prototype: there is no
longjmp() involved.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
libxl.c: switch to LOG*D use (refactored messages)
Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sorting the logs by
domain.
This commit only changes LOG*() into LOG*D() and adds a domid
parameter. The message of these LOG* calls has been altered to
remove the domain id from it since it is already contained in
the output log string.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
libxl: add LIBXL_LOGD_* and LOG*D function families.
These functions should be used to log messages when the domain
id is known. libxl__log will now prefix the log message with
"Domain %PRIu32:" if the domain id is a valid one.
This aims at helping consumers filter logs on domain IDs.
Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
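The prefixing behaviour can be sketched as below. This is not libxl code: the function name and the sentinel used for "no valid domain id" are assumptions, and only the "Domain %PRIu32:" prefix comes from the description above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sentinel meaning "no valid domain id"; libxl's own test may differ. */
#define DEMO_INVALID_DOMID UINT32_MAX

/* Format 'msg' into 'out', prefixed with the domain id when it is valid. */
static int demo_format_log(char *out, size_t size, uint32_t domid,
                           const char *msg)
{
    assert(out != NULL && msg != NULL && size > 0);

    if ( domid != DEMO_INVALID_DOMID )
        return snprintf(out, size, "Domain %" PRIu32 ":%s", domid, msg);

    return snprintf(out, size, "%s", msg);
}
```

A log consumer can then filter on the fixed "Domain N:" prefix instead of parsing free-form messages.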
Juergen Gross [Tue, 8 Nov 2016 09:09:41 +0000 (10:09 +0100)]
stubdom: remove EXTRA_CFLAGS meant for building tools
When building stubdoms EXTRA_CFLAGS_XEN_TOOLS and
EXTRA_CFLAGS_QEMU_TRADITIONAL should be cleared as they might contain
flags not suitable for all stubdom builds (e.g. "-m64" often to be
found in $RPM_OPT_FLAGS will break building 32 bit stubdoms).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Fri, 4 Nov 2016 09:53:29 +0000 (10:53 +0100)]
stubdom: simplify and fix Makefile
The stubdom Makefile is setting up links for various libraries. This
is done only once when qemu links are created and each library's links
are updated/created only if the link for the Makefile of the library
doesn't already exist. If a source is added to one library after
doing the first make of stubdom the new source won't be linked by a
new call of make.
Instead of testing the existence of the Makefile link use a make
dependency which will catch changes of the linked Makefile, too.
At the same time don't repeat the same link pattern 7 times but use a
make macro to do the linking.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[ wei: move "touch $@" to correct location in do_links ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Thu, 13 Oct 2016 14:33:15 +0000 (15:33 +0100)]
flask: add gcov_op check
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Wei Liu [Thu, 29 Sep 2016 20:10:53 +0000 (21:10 +0100)]
gcov: add new interface and new formats support
Add a new sysctl interface for passing gcov data back to userspace. The new
interface uses a customised record file format. The new sysctl reuses
original sysctl number but renames the op to gcov_op.
Formats starting from gcc version 3.4 are supported. The code is
rewritten so that a new format can be easily added in the future.
Version specific code is grouped into different files. The format one
needs to use can be picked via Kconfig. The default format is the newest
one.
Userspace programs to handle extracted data will come in a later patch.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 29 Sep 2016 17:38:30 +0000 (18:38 +0100)]
xen, tools: rip out old gcov implementation
The internal data structure and code are tied to an old gcov format.
It's easier to just redo everything from scratch.
Salvage the reusable parts: leave xen/common/gcov and an empty Makefile
there, leave gcov support in Kconfig but mark that as broken. Also
reserve the sysctl number for later use (but delete relevant sysctl
structures).
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 1 Jul 2016 17:29:46 +0000 (18:29 +0100)]
x86/emul: Use system-segment relative memory accesses
With hvm_virtual_to_linear_addr() capable of doing proper system-segment
relative memory accesses, avoid open-coding the address and limit calculations
locally.
When a table spans the 4GB boundary (32bit) or non-canonical boundary (64bit),
segmentation errors are now raised. Previously, the use of x86_seg_none
resulted in segmentation being skipped, and the linear address being truncated
through the pagewalk, and possibly coming out valid on the far side.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Thu, 30 Jun 2016 22:55:33 +0000 (23:55 +0100)]
x86/emul: Prepare to allow use of system segments for memory references
All system segments (GDT/IDT/LDT and TR) describe a linear address and limit,
and act similarly to user segments. However all current uses of these tables
in the emulator opencode the address calculations and limit checks. In
particular, no care is taken for accesses which wrap around the 4GB or
non-canonical boundaries.
Alter hvm_virtual_to_linear_addr() to cope with performing segmentation checks
on system segments. This involves restricting access checks in the 32bit case
to user segments only, and adding presence/limit checks in the 64bit case.
When suffering a segmentation fault for a system segment, return
X86EMUL_EXCEPTION but leave the fault injection to the caller. The fault type
depends on the higher level action being performed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
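The limit and wrap-around checks discussed above can be illustrated for the 32-bit case. This is a simplified, hypothetical helper and not hvm_virtual_to_linear_addr(); it ignores access rights and the 64-bit canonical check, and reports failure to the caller rather than injecting a fault itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Translate an access of 'bytes' bytes at 'offset' into a system segment
 * described by 'base' and 'limit'. Returns false on a segmentation error.
 */
static bool demo_sysseg_to_linear(uint32_t base, uint32_t limit,
                                  uint32_t offset, uint32_t bytes,
                                  uint32_t *linear)
{
    uint64_t last_off, last_addr;

    assert(linear != NULL && bytes > 0);

    last_off = (uint64_t)offset + bytes - 1;
    if ( last_off > limit )
        return false;              /* access runs past the segment limit */

    last_addr = (uint64_t)base + last_off;
    if ( last_addr > UINT32_MAX )
        return false;              /* access would wrap the 4GB boundary */

    *linear = base + offset;
    return true;
}
```

Doing the arithmetic in 64 bits is what catches the wrap; a 32-bit sum would silently truncate and could come out valid on the far side.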
Andrew Cooper [Tue, 1 Nov 2016 20:02:35 +0000 (20:02 +0000)]
x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
to inject the pagefault themselves.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
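The division of labour this change asks for, where the copy helper reports a fault and the caller decides whether to inject it, can be sketched with a toy model. All names, the result enum and the 16-byte "guest memory" are hypothetical; none of this is Xen's real hvm_copy_*() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_copy_result { DEMO_COPY_OKAY, DEMO_COPY_BAD_LINEAR };

/* Hypothetical fault record filled in by the copy helper. */
struct demo_pagefault_info {
    unsigned long linear;   /* faulting linear address */
    int ec;                 /* error code to report */
};

/* Toy "guest memory": only linear addresses 0..15 are mapped. */
static unsigned char demo_guest_mem[16];

/* Copy from the toy guest; on failure describe the fault, do not inject it. */
static enum demo_copy_result demo_copy_from_guest_linear(
    void *buf, unsigned long addr, size_t size,
    struct demo_pagefault_info *pfinfo)
{
    const unsigned long mapped = sizeof(demo_guest_mem);

    assert(buf != NULL);

    if ( addr >= mapped || size > mapped - addr )
    {
        if ( pfinfo )
        {
            pfinfo->linear = addr >= mapped ? addr : mapped;
            pfinfo->ec = 0;   /* toy value: read of a not-present page */
        }
        return DEMO_COPY_BAD_LINEAR;
    }

    memcpy(buf, demo_guest_mem + addr, size);
    return DEMO_COPY_OKAY;
}

/* The caller owns the decision; here it only records where it would inject. */
static bool demo_read_u32(unsigned long addr, unsigned int *val,
                          unsigned long *injected_at)
{
    struct demo_pagefault_info pfinfo;

    assert(val != NULL && injected_at != NULL);

    if ( demo_copy_from_guest_linear(val, addr, sizeof(*val), &pfinfo)
         != DEMO_COPY_OKAY )
    {
        *injected_at = pfinfo.linear;   /* stands in for injecting #PF */
        return false;
    }

    return true;
}
```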
Andrew Cooper [Wed, 23 Nov 2016 11:11:23 +0000 (11:11 +0000)]
x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()
The functions use linear addresses, not virtual addresses, as no segmentation
is used. (Lots of other code in Xen makes this mistake.)
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Wed, 2 Nov 2016 11:49:25 +0000 (11:49 +0000)]
x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 1 Nov 2016 20:49:25 +0000 (20:49 +0000)]
x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer
which is filled with pagefault information should one occur.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Fri, 25 Nov 2016 15:20:44 +0000 (15:20 +0000)]
x86/shadow: Avoid raising faults behind the emulators back
Use x86_emul_{hw_exception,pagefault}() rather than
{pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
faults to be known to the emulator. This requires altering the callers of
x86_emulate() to properly re-inject the event.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>