Ian Jackson [Thu, 18 Feb 2016 12:37:04 +0000 (12:37 +0000)]
tools: libxl: Simplify logic in libxl__realloc
Replace the loop exit and separate test for loop overrun with an
assert in the loop body.
This simplifies the code. It also (hopefully) avoids Coverity
thinking that gc->alloc_maxsize might change, resulting in the loop
failing to find the right answer but also failing to abort.
(gc->alloc_maxsize can't change because gcs are all singlethreaded:
either they are on the stack of a specific thread, or they belong to
an ao and are covered by the ctx lock.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
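A minimal sketch of the pattern described above, not the actual libxl code: the loop has no exit condition of its own, and the bound is asserted inside the body. The names struct ptr_table and find_slot are hypothetical; maxsize stands in for gc->alloc_maxsize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gc's table of tracked pointers. */
struct ptr_table {
    void **slots;
    size_t maxsize;
};

/*
 * Return the index of ptr in the table.  The caller guarantees that ptr
 * is present, so running off the end is a programming error and is
 * caught by the assert rather than by a separate post-loop test.
 */
static size_t find_slot(const struct ptr_table *t, const void *ptr)
{
    size_t i;

    for (i = 0; ; i++) {
        assert(i < t->maxsize);
        if (t->slots[i] == ptr)
            return i;
    }
}
```

Note that the assert is the only overrun check in this sketch, so it is only meaningful in builds where assertions are enabled.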
Corneliu ZUZU [Thu, 18 Feb 2016 16:47:36 +0000 (17:47 +0100)]
x86/monitor: minor left-shift undefined behavior checks
This minor patch adds a range-check to avoid left-shift caused undefined
behavior. Also replaces '1 <<' w/ '1U <<' @ x86 monitor.h in an effort to avoid
a future potential '1 << 31' that would cause a similar issue.
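As an illustration of the two measures above (a sketch only; event_to_mask is a hypothetical helper, not a function from monitor.h): the shift count is range-checked first, and the shifted constant is unsigned so that bit 31 is representable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the single-bit mask for `event` in *mask.  Returns false, leaving
 * *mask untouched, when the shift count would be out of range for a
 * 32-bit value (shifting by >= the width is undefined behaviour).
 */
static bool event_to_mask(unsigned int event, uint32_t *mask)
{
    assert(mask != NULL);

    if (event >= 32)
        return false;

    /* Unsigned operand: a signed '1 << 31' would overflow int. */
    *mask = (uint32_t)1 << event;
    return true;
}
```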
Corneliu ZUZU [Thu, 18 Feb 2016 14:08:25 +0000 (15:08 +0100)]
x86/hvm_event: fix uninitialized struct field usage introduced by c/s f5365e6
c/s f5365e6: "xen/vm-events: Move parts of monitor_domctl code to common-side",
introduced a use without initialization issue.
hvm_event_breakpoint calls hvm_event_traps(&req) and if sync is true that
ors some bits into req->flags which was never initialised.
Reported by Coverity Scan.
Initializes req @ hvm_event_breakpoint entry.
Coverity-ID: 1353192 Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
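The shape of the bug and of the fix can be sketched as follows. This is an illustration under assumptions, not the hvm_event code: struct request, REQ_FLAG_VCPU_PAUSED, mark_sync and make_breakpoint_request are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_FLAG_VCPU_PAUSED (1U << 0)   /* hypothetical flag bit */

struct request {
    uint32_t flags;
    uint32_t reason;
};

/* The callee ORs into flags, i.e. it reads the old value first. */
static void mark_sync(struct request *req, bool sync)
{
    assert(req != NULL);
    if (sync)
        req->flags |= REQ_FLAG_VCPU_PAUSED;
}

static struct request make_breakpoint_request(uint32_t reason, bool sync)
{
    /* Zero every field at entry, so the |= above starts from 0. */
    struct request req = { 0 };

    req.reason = reason;
    mark_sync(&req, sync);
    return req;
}
```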
Jan Beulich [Thu, 18 Feb 2016 14:07:33 +0000 (15:07 +0100)]
x86: drop failsafe callback invocation from assembly
Afaict this was never necessary on a 64-bit hypervisor, and was instead
just blindly cloned over from 32-bit code: We don't fiddle with (and
hence don't reload) any of DS, ES, FS, or GS, and an exception on IRET
itself can equally well be reported to the guest as that very exception
on the target of that IRET.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Feb 2016 14:05:34 +0000 (15:05 +0100)]
x86emul: fix rIP handling
Deal with rIP just like with any other register: Truncate to designated
width upon entry, write back the zero-extended 32-bit value when
emulating 32-bit code, and leave the upper 48 bits unchanged for 16-bit
code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
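The three rules above can be sketched like this. It is a sketch under assumptions rather than the emulator's code: truncate_ip, write_back_ip and the width parameter (the designated width in bits) are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate an instruction pointer to the width in use (16, 32 or 64). */
static uint64_t truncate_ip(uint64_t ip, unsigned int width)
{
    assert(width == 16 || width == 32 || width == 64);
    return width == 64 ? ip : ip & ((UINT64_C(1) << width) - 1);
}

/* Write new_ip back over old_ip, following the usual register rules. */
static uint64_t write_back_ip(uint64_t old_ip, uint64_t new_ip,
                              unsigned int width)
{
    assert(width == 16 || width == 32 || width == 64);

    switch (width) {
    case 16:
        /* 16-bit code: upper 48 bits are left unchanged. */
        return (old_ip & ~UINT64_C(0xffff)) | (new_ip & 0xffff);
    case 32:
        /* 32-bit code: the value is zero-extended to 64 bits. */
        return (uint32_t)new_ip;
    default:
        return new_ip;
    }
}
```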
Dario Faggioli [Thu, 18 Feb 2016 14:04:23 +0000 (15:04 +0100)]
RTDS: pack trace data better for xentrace_format
When tracing runstate changes, the vcpu and domain IDs
are encoded in the lower and higher parts, respectively,
of a 32-bit integer. When decoding a trace with
xentrace_format, this makes it possible to display
such events like this:
For consistency, we should do the same when displaying
the events coming from the RTDS scheduler (when using
the same tool), and to do that, we need to invert the
order in which the fields are being put in the trace
struct right now.
While there, we also:
- fix the use of TRC_RTDS_SCHED_TASKLET (it should
only be involved when a tasklet is scheduled, not
_every_ time rt_schedule() is invoked!);
- remove a very chatty and useless (nothing has been
picked!) use of TRC_RTDS_RUNQ_PICK.
In fact, one can already figure out when nothing has been
picked from the runqueue, by looking at when cpu_idle
is invoked --which is the same thing one would do if on
Credit or Credit2.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
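For illustration, the encoding described above (vcpu ID in the lower part, domain ID in the higher part) could look like the sketch below. The 16/16 split and the helper names are assumptions; the actual trace struct layout is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Pack domain ID into the upper half and vcpu ID into the lower half. */
static uint32_t pack_ids(unsigned int domid, unsigned int vcpuid)
{
    /* Assumed: both IDs fit in 16 bits. */
    assert(domid <= 0xffff && vcpuid <= 0xffff);
    return ((uint32_t)domid << 16) | (uint32_t)vcpuid;
}

static unsigned int unpack_domid(uint32_t packed)
{
    return packed >> 16;
}

static unsigned int unpack_vcpuid(uint32_t packed)
{
    return packed & 0xffff;
}
```

With this layout a decoder can print the whole 32-bit word as one hex value and read the domain and vcpu straight off its two halves.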
Dario Faggioli [Thu, 18 Feb 2016 14:04:00 +0000 (15:04 +0100)]
credit2: pack trace data better for xentrace_format
When tracing runstate changes, the vcpu and domain IDs
are encoded in the lower and higher parts, respectively,
of a 32-bit integer. When decoding a trace with
xentrace_format, this makes it possible to display
such events like this:
For consistency, we should do the same when displaying
the events coming from the Credit2 scheduler (when using
the same tool), and to do that, we need to invert the
order in which the fields are being put in the trace
struct right now.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Thu, 18 Feb 2016 14:03:34 +0000 (15:03 +0100)]
sched: improve domain creation tracing
by doing the following two things:
- move TRC_SCHED_DOM_{ADD,REM}, into the functions
that do the actual scheduling-related domain
initialization;
- add two 'generic' DOM_{ADD,REM} events. They're
made part of the TRC_DOM0 tracing class, as Dom0
is, usually, from where domains are created and
destroyed.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Thu, 18 Feb 2016 14:03:15 +0000 (15:03 +0100)]
sched: move up the trace record for vcpu_wake and vcpu_sleep
vcpu_wake() and vcpu_sleep() are called before the specific
schedulers' wakeup and sleep routines (in fact, it is they
that call those specific routines).
Make the trace reflect that, by moving the records up. In
fact, it is more natural and easy to find the record of
the event (e.g., the wakeup) *before* the records of the
actions that deal with the event itself.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 18 Feb 2016 14:02:16 +0000 (15:02 +0100)]
x86: avoid flush IPI when possible
Since CLFLUSH, unlike WBINVD, is a cache coherency domain wide
flush, there's no need to IPI other CPUs if this is the only flushing
being requested. (As a secondary change, move a local variable into the
scope where it's actually needed.)
As a further change also eliminate another leftover from 32-bit days:
invalidate_interrupt() can clear FLUSH_TLB_GLOBAL alongside FLUSH_TLB,
since write_ptbase() (as a descendant of __sync_local_execstate()) now
unconditionally fiddles with CR4.PGE.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dirk Behme [Thu, 4 Feb 2016 16:49:35 +0000 (17:49 +0100)]
xen/arm64: Make sure we get all debug output
Starting in the wrong ELx mode I get the following debug output:
...
- Current EL 00000004 -
- Xen must be entered in NS EL2 mode -
- Boot failed -
The output of "Please update the bootloader" is missing here, because
string concatenation in gas, unlike in C, keeps the \0 between each
individual string.
Make sure this is output, too. With this, we get
...
- Current EL 00000004 -
- Xen must be entered in NS EL2 mode -
- Please update the bootloader -
- Boot failed -
as intended.
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- added same change to arm32 case ]
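The difference described above can be mimicked in C. In the sketch below, c_message relies on C's literal concatenation, while gas_message embeds an explicit \0 to imitate what the commit message says gas does; mentions_bootloader is a hypothetical helper and the message text is abbreviated from the output shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C merges adjacent literals into one string with a single terminator. */
static const char c_message[] =
    "- Xen must be entered in NS EL2 mode -"
    "- Please update the bootloader -";

/*
 * Imitation of the gas behaviour: a NUL sits between the two parts, so
 * anything that prints from the start stops after the first part.
 */
static const char gas_message[] =
    "- Xen must be entered in NS EL2 mode -\0"
    "- Please update the bootloader -";

/* Does the string, as seen by NUL-terminated string functions, mention it? */
static int mentions_bootloader(const char *msg)
{
    assert(msg != NULL);
    return strstr(msg, "bootloader") != NULL;
}
```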
Ian Campbell [Wed, 17 Feb 2016 14:58:33 +0000 (14:58 +0000)]
xenpaging: do not leak if --pagefile given twice
By freeing filename (which is either NULL or the previous iteration of
this argument). This implements a semantic where the last --pagefile
given on the command line takes precedence.
This is the same semantic as the other options have.
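A sketch of the "last one wins, nothing leaks" handling, assuming a hypothetical helper set_last_wins rather than the actual xenpaging option parser:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store a copy of value in *slot, releasing whatever an earlier
 * occurrence of the same option left there.  Returns 0 on success,
 * -1 if the copy could not be allocated (*slot is then unchanged).
 */
static int set_last_wins(char **slot, const char *value)
{
    size_t len;
    char *copy;

    assert(slot != NULL && value != NULL);

    len = strlen(value) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, len);

    free(*slot);        /* NULL on the first occurrence; free(NULL) is fine */
    *slot = copy;
    return 0;
}
```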
Jim Fehlig [Wed, 17 Feb 2016 17:20:57 +0000 (10:20 -0700)]
libxlu_cfg: reject unknown characters following '\'
When dequoting config strings in xlu__cfgl_dequote(), unknown
characters following a '\', and the '\' itself, are discarded.
E.g. a disk configuration string containing
Doug Goldstein [Wed, 17 Feb 2016 15:24:29 +0000 (16:24 +0100)]
x86/PMU: make {acquire,release}_pmu_ownership names consistent
The function names were inconsistent with acquire and release being
called acquire_pmu_ownership() and release_pmu_ownship() respectively.
Function prototypes were available for both spellings so this change
makes them consistent and drops the dual function prototypes.
Additionally change the internal variable names within those functions
to ownership as well.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Roger Pau Monné [Wed, 17 Feb 2016 15:22:21 +0000 (16:22 +0100)]
x86/PVHv2: update the start info structure layout
After some discussion around the new boot ABI, consensus has been reached
about the layout and contents of the start info. The following patch updates
the layout to what has been agreed.
Also, the new layout is described in binary terms in order to avoid issues
with alignments when using C structs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Wed, 17 Feb 2016 15:21:20 +0000 (16:21 +0100)]
public: make some constants usable for assembler
Some constants defined in xen/include/public/xen.h are not usable in
assembler sources as they are either defined with "U" or "UL" suffixes
or they are inside #ifndef __ASSEMBLY__ areas.
Change this as grub2 could make use of those definitions.
This requires moving the definition of mk_unsigned_long() up. While
we are touching this macro, rename it in order to avoid namespace
pollution. This in turn requires adaptation of some arch-x86 specific
headers.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 17 Feb 2016 15:20:01 +0000 (16:20 +0100)]
x86emul: relax asm() constraints
Let's give the compiler as much liberty at picking instruction operands
as possible. Also drop unnecessary size modifiers when the correct size
can already be derived from the asm() operands. Finally also drop an
"unsigned" from idiv_dbl()'s second parameter, allowing a cast to be
eliminated.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Feb 2016 15:18:50 +0000 (16:18 +0100)]
x86/HVM: fold hypercall tables
In order to reduce the risk of unintentionally adding a function
pointer to just one of the two tables, merge them into one, with each
entry pair getting generated by a single macro invocation (at once
dropping all explicit casting outside the macro definition).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 17 Feb 2016 15:18:08 +0000 (16:18 +0100)]
x86/VMX: sanitize rIP before re-entering guest
... to prevent guest user mode arranging for a guest crash (due to
failed VM entry). (On the AMD system I checked, hardware is doing
exactly the canonicalization being added here.)
Note that fixing this in an architecturally correct way would be quite
a bit more involved: Making the x86 instruction emulator check all
branch targets for validity, plus dealing with invalid rIP resulting
from update_guest_eip() or incoming directly during a VM exit. The only
way to get the latter right would be by not having hardware do the
injection.
Note further that there are two early returns from
vmx_vmexit_handler(): One (through vmx_failed_vmentry()) leads to
domain_crash() anyway, and the other covers real mode only and can
neither occur with a non-canonical rIP nor result in an altered rIP,
so we don't need to force those paths through the checking logic.
This is CVE-2016-2271 / XSA-170.
Reported-by: 刘令 <liuling-it@360.cn> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
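For reference, canonicalization of an address amounts to sign-extending its top implemented bit. The sketch below assumes a hypothetical helper canonicalise() and takes the number of implemented virtual address bits as a parameter (48 on the systems this is typically about); it is not the code added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend bit (va_bits - 1) of addr into all the bits above it. */
static uint64_t canonicalise(uint64_t addr, unsigned int va_bits)
{
    uint64_t sign_bit, low_mask;

    assert(va_bits >= 2 && va_bits <= 64);
    if (va_bits == 64)
        return addr;

    sign_bit = UINT64_C(1) << (va_bits - 1);
    low_mask = (sign_bit << 1) - 1;

    addr &= low_mask;                    /* drop whatever was above */
    return (addr ^ sign_bit) - sign_bit; /* replicate the sign bit upwards */
}
```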
Jan Beulich [Wed, 17 Feb 2016 15:16:53 +0000 (16:16 +0100)]
x86: enforce consistent cachability of MMIO mappings
We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).
This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.
Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.
Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, unlike
l1e_remove_flags() and the like, return the new PTE (and hence
ignoring their return values makes them no-ops).
This is CVE-2016-2270 / XSA-154.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
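The no-op pitfall mentioned in the last note can be reproduced with two toy helpers. Everything here (pte_t, PTE_RW, both function names) is hypothetical and only mirrors the calling conventions described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t l1; } pte_t;   /* hypothetical PTE wrapper */

#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_RW      (UINT64_C(1) << 1)

/* In-place style: the caller's PTE is modified through the pointer. */
static void pte_remove_flags_inplace(pte_t *pte, uint64_t flags)
{
    assert(pte != NULL);
    pte->l1 &= ~flags;
}

/* Value style: the argument is a copy, so only the result carries the change. */
static pte_t pte_remove_flags(pte_t pte, uint64_t flags)
{
    pte.l1 &= ~flags;
    return pte;
}
```

Calling pte_remove_flags(p, PTE_RW) without assigning the result leaves p writable, which is the kind of mistake the note describes.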
Ian Campbell [Tue, 16 Feb 2016 11:49:53 +0000 (11:49 +0000)]
libxl: close fd's in parent when spawning qdisk
Coverity points out that these remain open in the parent upon
success, which is a resource leak.
To fix this, rejig the exit paths such that success and error cases
both close the two fds. This means adjusting the callback to only
happen for the error case; it also makes sense to rename the label
from "error" to just "out".
Coverity (correctly) complains that the strncpy(p, "0x", 2) will not
null terminate p.
Although we can see that in the rest of the function p will
definitely be NUL terminated by the time it is complete, there is no
harm in passing 3 to the strncpy and allowing it to NUL terminate to
placate Coverity. We know this is safe because the allocation to hold
the string includes a "+3" for the 0x and the terminating NUL.
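A small sketch of the strncpy point, using a hypothetical helper hex_with_prefix() rather than the libxl function in question: with n = 3 the terminator of "0x" is copied as well, and the "+3" in the allocation leaves room for it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated "0x" + digits, or NULL on allocation failure. */
static char *hex_with_prefix(const char *digits)
{
    size_t len;
    char *p;

    assert(digits != NULL);

    len = strlen(digits);
    p = malloc(len + 3);            /* "0x" + digits + terminating NUL */
    if (p == NULL)
        return NULL;

    strncpy(p, "0x", 3);            /* 3, not 2: copies the NUL too */
    memcpy(p + 2, digits, len + 1); /* overwrites that NUL, adds the final one */
    return p;
}
```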
Ian Campbell [Tue, 16 Feb 2016 11:09:43 +0000 (11:09 +0000)]
tools: libxl: free devpath on failure in libxl__blktap_devpath
The underlying code paths in tap_ctl_create attempt to handle both
*devpath == NULL (by allocating) and *devpath != NULL (caller provided
name) and if they allocate tend to write the return immediately before
doing other potentially error generating tasks. All of which makes
handling this at a lower level rather more complicated than handling
it in the error path of libxl__blktap_devpath.
Note that libxl__blktap_devpath initialises devpath to NULL and if the
earlier GCSPRINTF succeeds then the value is returned earlier.
Therefore if we make it to the call to tap_ctl_create then devpath is
still NULL on entry, therefore on the error path devpath is either
still NULL or has been set to a freshly allocated value by
tap_ctl_create. Since free(NULL) is fine it is sufficient to just
free(devpath).
I also considered adding a non-NULL devpath to the gc, even on
failure, but that would have required a comment to explain the
apparently strange behaviour.
Wei Liu [Tue, 16 Feb 2016 12:28:27 +0000 (12:28 +0000)]
stubdom: fix link farm runes
Previously in the three problematic libraries all public headers were
linked to source code directory. We should have created an include
directory for each library and linked public headers there.
Note that there was no breakage for those three libraries before this
patch. This patch merely changes the location headers are linked to so
that all libraries follow the same pattern.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
docs: document shortcomings of running QEMU as non-root
Running QEMU as non-root causes migration and PCI passthrough not to
work properly. Migration can be fixed rather easily
(http://marc.info/?l=xen-devel&m=145382864118600), but PCI passthrough
cannot (http://marc.info/?l=xen-devel&m=145286946113964).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Fri, 12 Feb 2016 19:21:31 +0000 (19:21 +0000)]
tools/libxc: Fix use of zlib-options when building the domain builder
c/s de0f8c7c changed the use of zlib-options, and moved it from being locally
generated to coming from ./configure.
However, it neglected to modify the users of zlib-options. The curious use of
$(call ...) was to select either the -D or -l options as appropriate, but c/s de0f8c7c broke this by losing the `grep`.
Instead, use $(filter ...) to pick out either the -D or -l options. This
fixes the build with Clang, which complains at passing '-llzma' when trying
to compile xc_dom_bzimageloader.c to xc_dom_bzimageloader.o.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Whatever the reason for silly code like this, it fools the current stack
overflow detection logic in the #DF handler (which triggers reliably on the
'orq' instruction).
Update the overflow condition to declare an overflow if %esp is anywhere
within the guard page, rather than just within the upper 8th of the page.
Additionally, check %esp against the expected stack base in all builds.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Corneliu ZUZU [Mon, 15 Feb 2016 13:14:16 +0000 (14:14 +0100)]
x86: merge 2 hvm_event_... functions into 1
This patch merges almost identical functions hvm_event_int3 and
hvm_event_single_step into a single function called hvm_event_breakpoint.
Also fixes event.c file header comment in the process.
Razvan Cojocaru [Mon, 15 Feb 2016 13:13:31 +0000 (14:13 +0100)]
vm_event: remove xc_mem_access_enable_emulate() and friends
xc_mem_access_enable_emulate() and xc_mem_access_disable_emulate()
are currently no-ops, that is all they do is set a flag that
nobody else checks. The user can already set the EMULATE flags in
the vm_event response if emulation is desired, and having an extra
check above that is not inherently safer, but it does complicate
(currently unnecessarily) the API. This patch removes these
functions and the corresponding hypervisor code.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Coverity correctly identifies that the changes in mtrr_attrib_to_str()
introduce dead code. strings[] is a 2d array, rather than an array of
strings, which means that strings[x] will never be a NULL pointer.
Adjust the check to compensate, by looking for a NUL in strings[x][0]
instead.
Curiously, Coverity did not notice the same error with memory_type_to_str().
There was also a further error; the strings were not NUL terminated, which
made the return type of memory_type_to_str() erroneous.
Bump the 2D array to 3 characters, so the strings retain their NUL characters,
and introduce an ASSERT() as requested on one thread of the original patch.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
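The 2D-array point can be illustrated as below. This is a sketch, not the Xen function: type_to_str and the "??" fallback are hypothetical, while the indices are the usual x86 memory type encodings. Each row is 3 characters wide so that a two-letter name keeps its NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Rows without an initialiser are all zeroes, i.e. empty strings. */
static const char strings[7][3] = {
    [0] = "UC",
    [1] = "WC",
    [4] = "WT",
    [5] = "WP",
    [6] = "WB",
};

static const char *type_to_str(unsigned int type)
{
    assert(type < sizeof(strings) / sizeof(strings[0]));

    /*
     * strings[type] is an array and decays to a non-NULL pointer, so a
     * NULL test on it would be dead code.  Test the first character.
     */
    return strings[type][0] ? strings[type] : "??";
}
```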
Andrew Cooper [Mon, 15 Feb 2016 13:12:06 +0000 (14:12 +0100)]
x86: improvements to pv_cpuid()
pv_cpuid() has two completely separate paths inside it depending on whether
current is dom0 or a domU. This causes unnecessary divergence, and
complicates future improvements. Take steps to undo it.
Changes:
* Create leaf and subleaf variables and use them consistently, instead of a
mix of {a,c} and regs->e{a,c}x as the input parameters.
* Combine the dom0 and domU hypervisor leaf handling, with an early exit.
* Apply sanity checks to domU as well. This brings PV domU cpuid handling in
line with HVM domains and PV dom0.
* Perform a real cpuid instruction for calculating CPUID.0xD[ECX=0].EBX. The
correct xcr0 is in context, and this avoids the O(M*N) loop over the domain
cpuid policy list which exists currently.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Mon, 8 Feb 2016 02:45:03 +0000 (20:45 -0600)]
travis: add initial Travis CI script to do builds
This is just supposed to do a simple compile test on Travis CI. Currently
due to linux86 (bcc/bin86/dev86) not being whitelisted the tools cannot
be built.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 12 Feb 2016 09:34:13 +0000 (10:34 +0100)]
uniformly use __ variants for attribute names
Otherwise, debug code such as "void __attribute__((noreturn)) foobar()" fails
to compile when the noreturn itself gets expanded, resulting in
__attribute__((__attribute__((noreturn)))).
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
and it was having trouble combining the explicit REX prefix with the REX.B
required for the use of %r15.
Follow what Linux does and use a redundant %ds prefix instead, for a final
generated instruction of `3e 41 0f ae 3f`
While modifying this line, fix the indentation which was out by one space.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Doug Goldstein <cardoe@cardoe.com>
Introduce and use NOP_DS_PREFIX.
Also the above description of the issue is slightly wrong: We're not
suffering from some gas versions not being able to combine multiple REX
prefixes, but from the replacement instruction, when requiring a REX
prefix in order to express the memory operand, becoming one byte longer
than the original one, triggering the respective build time safety
check.
Olaf Hering [Thu, 11 Feb 2016 15:38:14 +0000 (15:38 +0000)]
tools/console: correct make dependencies for _paths.h
Correct dependencies for _paths.h to avoid build failure with make -j.
Only main.c requires _paths.h. This fixes commit 8398ec70 ("xenconsole:
Ensure exclusive access to console using locks")
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 11 Feb 2016 12:11:21 +0000 (12:11 +0000)]
tools: probe for existence of qemu-xen stderr trace backend.
QEMU upstream commit ed7f5f1d8db0 ("trace: convert stderr backend to
log") renamed the "stderr" trace backend to "log", which breaks the
xen build when pointed at a QEMU tree after that point:
Upstream also changed, in baf86d6b3ca0 ("trace: switch default backend
to "log""), to use "log" as the default backend (previously it was
"nop").
Use ./scripts/tracetool.py to check for the presence of the stderr
backend and if it is present then explicitly enable it. If the stderr
backend is not present then assume a newer QEMU which defaults to
"log" and simply accept that default (there is a 1 commit window
upstream where this would result in no trace backend being enabled).
The check is done using the older (deprecated?) --check-backend/--backend
variant of the tracetool.py options rather than the new plural
versions since the singular was supported even by very old versions of
QEMU. New QEMU has compatibility code but if/when that is removed we
will still do the right thing i.e. no explicit configuration resulting
in the upstream default (currently "log").
If the explicit selection of the "stderr" backend is required then it
is now done unconditionally (not depending on debug=y), which is
simpler to arrange here but also matches the newer upstream's default
to "log" which is not conditional on debug being enabled either.
Tested with current qemu-xen-unstable (e9d8252) and current QEMU
upstream master (88c73d1), both out of tree via
QEMU_UPSTREAM_URL=/path/to/qemu-xen.git.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Paul Durrant <paul.durrant@citrix.com> Cc: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 11 Feb 2016 15:48:38 +0000 (16:48 +0100)]
x86/traps: prevent interleaving of concurrent cpu state dumps
If two cpus enter show_execution_state() concurrently, the resulting console
output is interleaved, and of no help debugging the situation further.
As calls to these locations are rare and usually important, it is acceptable
to serialise them. These codepaths are also on the terminal error paths, so
the console lock must be the lock used for serialisation, to allow
console_force_unlock() to function properly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 11 Feb 2016 15:45:08 +0000 (16:45 +0100)]
x86/p2m: use large pages for MMIO mappings
When mapping large BARs (e.g. the frame buffer of a graphics card) the
overhead of establishing such mappings using only 4k pages has,
particularly after the XSA-125 fix, become unacceptable. Alter the
XEN_DOMCTL_memory_mapping semantics once again, so that there's no
longer a fixed amount of guest frames that represents the upper limit
of what a single invocation can map. Instead bound execution time by
limiting the number of iterations (regardless of page size).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Thu, 11 Feb 2016 15:40:47 +0000 (16:40 +0100)]
x86: drop X86_FEATURE_3DNOW_ALT
Introducing an X86_FEATURE aliased value turns out to complicate automatic
processing of the feature list. Drop X86_FEATURE_3DNOW_ALT and use
X86_FEATURE_PBE, extending the comment accordingly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Thu, 11 Feb 2016 09:23:54 +0000 (09:23 +0000)]
tools: libxl: make it illegal to pass libxl__realloc(gc) a non-gc ptr
That is, if gc is not NOGC and ptr is not NULL then ptr must be
associated with a gc.
Currently in this case the new_ptr would not be registered with any
gc, which Coverity rightly points out (in various different places)
would be a memory leak.
It would also be possible to fix this by adding a libxl__ptr_add() at
the same point, however semantically it seems like a programming error
to gc-realloc a pointer which is not associated with the gc in
question, so treat it as such.
Compile tested only, this change could expose latent bugs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 10 Feb 2016 16:56:22 +0000 (16:56 +0000)]
init-xenstore-domain: cleanup all resources on a single exit path
Previously xs_fd would be left open, which is CID 1055993 (previously
partially fixed by 3bca826aae5eb).
Instead arrange for both success and error cases to cleanup everything
on a single exit path instead of doing partial cleanup on the success
path a few operations higher up.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 10 Feb 2016 16:26:25 +0000 (16:26 +0000)]
xenmon: initialise dummy array
This is just used to expand the shared backing file to the expected
size (whether this is actually necessary I'm not sure). Rather than
leaking some small amount of the process's heap set the array to
zeroes.
While at it add a check that the malloc succeeded before using the
result.
Doug Goldstein [Thu, 11 Feb 2016 12:23:42 +0000 (12:23 +0000)]
build: specify minimum versions of make
To help people avoid having to figure out which versions of make
need to be supported, document them explicitly.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Thu, 11 Feb 2016 12:23:41 +0000 (12:23 +0000)]
build: specify minimum versions of gcc and binutils
To help people avoid having to figure out which versions of gcc and
binutils need to be supported, document them explicitly.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Wed, 10 Feb 2016 16:49:25 +0000 (16:49 +0000)]
public/io/netif.h: fix typos
Unfortunately my patch 162a81ab "document control ring and toeplitz
hashing" contained a couple of typos. This patch fixes them.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Wed, 10 Feb 2016 13:51:25 +0000 (14:51 +0100)]
x86: fix get_cpu_info() when built with clang
Clang understands the GCCism in use here, but still complains that sp is
uninitialised. In such cases, resort to the older version of this code, which
directly reads %rsp into the temporary variable.
Note that we still keep the GCCism in the default case, as it causes GCC to
create rather better assembly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 10 Feb 2016 13:50:37 +0000 (14:50 +0100)]
x86: fix section type mismatch in mm.c
Clang doesn't like mixing const and non-const data in the same section. Move
zero_page into .bss.page_aligned.const and wildcard .bss.page_aligned when
linking.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 10 Feb 2016 13:49:06 +0000 (14:49 +0100)]
remove or annotate possibly-unused functions
Clang notices more unused functions than GCC.
* sh_next_page() is only used at GUEST_PAGING_LEVELS=2, so remove it from the
other guest level translation units
* rcu_batch_after() is completely unused.
* Various of the COMPAT() generated functions are used only for their
BUILD_BUG_ON() properties. Annotate them all as __maybe_unused.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 10 Feb 2016 13:48:36 +0000 (14:48 +0100)]
xen/lib.h: fix ASSERT() to build with clang
Clang warns about a semicolon immediately following an if() clause as a
possible mistake, and recommends putting the semicolon on a new line if it was
intentional. A newline is not an option here, so use a set of empty braces
instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
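The empty-braces form can be sketched with a hypothetical macro MY_ASSERT (the actual definition in xen/lib.h is not reproduced here). In the no-op flavour the predicate is still compiled, but the "0 &&" short circuit means it is never evaluated, and "{}" replaces the bare semicolon that Clang warns about.

```c
#include <assert.h>

#ifdef NDEBUG
/* No-op flavour: 'if (...) {}' rather than 'if (...) ;'. */
#define MY_ASSERT(p) do { if (0 && (p)) {} } while (0)
#else
/* Checking flavour, delegating to the standard assert(). */
#define MY_ASSERT(p) assert(p)
#endif

static int checked_double(int x)
{
    MY_ASSERT(x > 0);
    return 2 * x;
}
```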
Anthony PERARD [Wed, 10 Feb 2016 13:46:45 +0000 (14:46 +0100)]
hvmloader: fix scratch_alloc to avoid overlaps
scratch_alloc() set scratch_start to the last byte of the current
allocation. The value of scratch_start is then reused as is (if it is
already aligned) in the next allocation. This results in a potential reuse
of the last byte of the previous allocation.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
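One way to avoid the overlap, sketched with a hypothetical bump allocator (scratch_alloc_sketch and the 0x00010000 base are assumptions, and the real hvmloader function may differ in signature and details): the cursor is left on the first free byte after the allocation rather than on its last used byte.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t scratch_start = 0x00010000;  /* hypothetical region base */

/* Allocate size bytes aligned to align (a power of two); returns the base. */
static uint32_t scratch_alloc_sketch(uint32_t size, uint32_t align)
{
    uint32_t base;

    assert(size != 0 && align != 0 && (align & (align - 1)) == 0);

    base = (scratch_start + align - 1) & ~(align - 1);

    /* First free byte, not the last byte of this allocation. */
    scratch_start = base + size;
    return base;
}
```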
Tamas K Lengyel [Wed, 10 Feb 2016 13:46:09 +0000 (14:46 +0100)]
x86/altp2m: merge p2m_set_altp2m_mem_access and p2m_set_mem_access
The altp2m subsystem in its current form duplicates much of the existing
code present in p2m for setting mem_access permissions. In this patch we
consolidate the two versions but keep the separate MEMOP and HVMOP interfaces.
Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 9 Feb 2016 12:24:00 +0000 (13:24 +0100)]
x86/mm: make {cmpxchg,write}_guest_entry() hook shadow mode specific
... as they're being used for PV guests only, which don't use HAP mode.
This eliminates another pair of NULL callbacks in HAP as well as in 2-
and 3-guest-level shadow modes.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 9 Feb 2016 12:23:30 +0000 (13:23 +0100)]
x86/mm: drop guest_{map,get_eff}_l1e() hooks
Disallow the unmaintained and presumed broken translated-but-not-
external paging mode combination, allowing the respective paging hooks
to go away (which eliminates one pair of NULL callbacks in HAP mode).
As a result of them no longer being generic paging operations, make the
inline functions private to mm.c, dropping their struct vcpu parameters
where suitable.
The enforcement of the proper mode combination is now done in
paging_enable(), requiring shadow_domctl() to no longer call
shadow_enable() directly.
As a result, support for XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE is
removed as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 9 Feb 2016 12:22:13 +0000 (13:22 +0100)]
x86/nHVM: avoid NULL deref during INVLPG intercept handling
When intercepting (or emulating) L1 guest INVLPG, the nested P2M
pointer may be (is?) NULL, and hence there's no point in calling
p2m_flush(). In fact doing so would cause a dereference of that NULL
pointer at least in the ASSERT() right at the beginning of the
function.
While so far nothing supports hap_invlpg() being reachable from the
INVLPG intercept paths (only INVLPG insn emulation would lead there),
and hence the code in question (added by dd6de3ab99 ["Implement
Nested-on-Nested"]) appears to be dead, this seems to be the change
which can be agreed on as an immediate fix. Ideally, however, the
problematic code would go away altogether. See thread at
lists.xenproject.org/archives/html/xen-devel/2016-01/msg03762.html.
Reported-by: 刘令 <liuling-it@360.cn> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Razvan Cojocaru [Tue, 9 Feb 2016 12:20:49 +0000 (13:20 +0100)]
x86/hvm: simplify emulation triggered by vm_event response
Currently, after receiving a vm_event reply requesting emulation,
the actual emulation is triggered in p2m_mem_access_check(),
which means that we're waiting for the page fault to occur again
before emulating. Aside from the performance impact, this
complicates the code since between hvm_do_resume() and the second
page fault it is possible that the latter becomes a completely
new page fault - hence checking that EIP and the GPA match with
the ones in the original page fault. If they don't, duplicate
EPT fault vm_events will occur, of which a monitoring application
needs to be aware.
This patch makes struct arch_vm_event smaller (since we no longer
need to track eip and gpa), removes the checking code from
p2m_mem_access_check(), and moves the emulation into hvm_do_resume().
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Tamas K Lengyel [Fri, 5 Feb 2016 21:22:20 +0000 (14:22 -0700)]
xen-access: minor fixes
Only copy the VCPU_PAUSED flag to the response. Copy the entire mem_access
struct, which is useful and easily forgotten when also testing the emulate
response flags. Turn off singlestepping on the vCPUs once we are done
processing all events, as we might have turned on singlestep there, which
would leave the VM in an undesirable state.
Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Juergen Gross [Mon, 8 Feb 2016 14:23:52 +0000 (15:23 +0100)]
libxc: correct memory range check in domain builder
Commit 81a76e4b12961a9f54f5021809074196dfe6dbba ("libxc: rework of
domain builder's page table handler") introduced a regression with
checking the required memory size of the domain. The needed maximum pfn
of the initial kernel mapping was added to the currently last used pfn
resulting in doubling the estimated memory need.
Correct the calculation of the last needed pfn to enable booting of
small domains again.
Reported-by: Anthony Perard <anthony.perard@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
When xc_map_foreign_batch got deprecated, reinitializing vm_event on a domain
where an event listener was previously active broke, as it relied on the flag
XEN_DOMCTL_PFINFO_XTAB to indicate that the magic page is not in the physmap.
Manually check the gpfn type, add it to the physmap if needed, and only then
try to map it.
Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roger Pau Monne [Thu, 4 Feb 2016 15:25:50 +0000 (15:25 +0000)]
libxc: fix uninitialised usage of rc in meminit_hvm
Due to the HVMlite changes there's a chance that the value in rc is checked
without being initialised. Fix this by initialising it to 0 prior to the
while loop. Also add a specific error check to a previous populate_physmap
call; this prevents us from overwriting that error.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>
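The pattern can be sketched as follows. This is not the meminit_hvm code: populate() and populate_all() are hypothetical stand-ins, and the sketch assumes only that the loop condition reads rc before the loop body has had a chance to assign it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a populate call: fails for negative extents. */
static int populate(int extent)
{
    return extent < 0 ? -1 : 0;
}

int populate_all(int first, const int *rest, size_t n)
{
    int rc = 0;     /* initialised: the loop condition reads rc up front */
    size_t i = 0;

    /*
     * The earlier call gets its own error check, so a failure here is
     * reported directly instead of being overwritten by the loop below.
     */
    if ( populate(first) != 0 )
        return -1;

    /* rc is tested before the first assignment inside the loop. */
    while ( rc == 0 && i < n )
        rc = populate(rest[i++]);

    return rc;
}
```

With n equal to 0 the loop body never runs, so without the initialiser the function would return an indeterminate value.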