Aliasing FAR_EL2 to HIFAR makes the code confusing because on ARMv8
FAR_EL2[31:0] is architecturally mapped to HDFAR and FAR_EL2[63:32] to
HIFAR. See D7.2.30 in ARM DDI 0487B.a. Open-code the alias instead.
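For illustration only, the register mapping described above can be modelled as follows. This is a Python sketch, not hypervisor code; the function name split_far_el2 is hypothetical.

```python
def split_far_el2(far_el2):
    """Return (hdfar, hifar) for a 64-bit FAR_EL2 value.

    Hypothetical helper: models the architectural mapping only.
    """
    if not 0 <= far_el2 < (1 << 64):
        raise ValueError("FAR_EL2 is a 64-bit register")
    hdfar = far_el2 & 0xFFFFFFFF           # FAR_EL2[31:0]
    hifar = (far_el2 >> 32) & 0xFFFFFFFF   # FAR_EL2[63:32]
    return hdfar, hifar
```

With this model, HIFAR is the upper half of FAR_EL2 rather than the whole register, which is why a plain alias is misleading.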
Boris Ostrovsky [Tue, 19 Sep 2017 15:47:47 +0000 (17:47 +0200)]
mm: scrub pages returned back to heap if MEMF_no_scrub is set
Set free_heap_pages()'s need_scrub to true if alloc_domheap_pages()
returns pages back to heap as result of assign_pages() error when those
pages were requested with MEMF_no_scrub flag.
We need to do this because there is a possibility that
alloc_heap_pages() might clear buddy's PGC_need_scrub flag without
actually clearing the page.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 18 Sep 2017 10:31:02 +0000 (12:31 +0200)]
x86emul: re-order checks in test harness
On older systems printing the "n/a" messages (resulting from the
compiler not being new enough to deal with some of the test code) isn't
very useful: If both CPU and compiler are too old for a certain test,
we can as well omit those messages, as those tests wouldn't be run even
if the compiler did produce code. (This has become obvious with the
3DNow! tests, which I had to run on an older system still supporting
those insns, and that system naturally also had an older compiler.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
At the moment, most of the callers will have to use mfn_x. However,
follow-up patches will remove some of these uses by propagating the
typesafe type a bit further.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
xen/arm: p2m: Check for p2m->domain to be initialized before releasing resources
Since p2m_teardown() can be called when p2m_init() hasn't executed yet,
we might deal with an uninitialized list "p2m->pages", which leads to a crash.
To avoid this, use the back pointer to the domain as an end-of-initialization indicator.
Currently, cpregs.h is indirectly included by every file of the hypervisor,
even for arm64. However, the only use for arm64 is when emulating co-processors.
For arm32, all users of processor.h expect cpregs.h to be included in
order to access co-processors. So move the inclusion into
asm-arm/arm32/processor.h.
cpregs.h will also be directly included in the co-processors emulation
to accommodate arm64.
This drastically reduces the exposure of cpregs.h to any source file
on arm64.
xen/arm: traps: Export a bunch of helpers to handle emulation
A follow-up patch will move some parts of traps.c into separate files.
This will require the use of helpers that are currently statically defined.
Export the following helpers:
- inject_undef64_exception
- inject_undef_exception
- check_conditional_instr
- advance_pc
- handle_raz_wi
- handle_wo_wi
- handle_ro_raz
Note that asm-arm/arm32/traps.h is empty; it exists only to keep parity
with the arm64 counterpart.
This commit implements the Xen part of the cap mechanism for
Credit2.
A cap is how much, in terms of % of physical CPU time, a domain
can execute at most.
For instance, a domain that must not use more than 1/4 of
one physical CPU, must have a cap of 25%; one that must not
use more than 1+1/2 of physical CPU time, must be given a cap
of 150%.
Caps are per domain, so it is all a domain's vCPUs, cumulatively,
that will be forced to execute no more than the decided amount.
This is implemented by giving each domain a 'budget', and
using a (per-domain again) periodic timer. Values of budget
and 'period' are chosen so that budget/period is equal to the
cap itself.
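As a rough illustration of the budget/period relation, here is a Python sketch. The function name and the 10000us period used in the examples are assumptions made for the example, not values taken from this commit.

```python
def budget_for_cap(cap_percent, period_us):
    """Budget (us of CPU time per period) so that budget/period == cap.

    Hypothetical helper; cap_percent may exceed 100 for domains that
    are allowed more than one physical CPU's worth of time.
    """
    if cap_percent <= 0 or period_us <= 0:
        raise ValueError("cap and period must be positive")
    return period_us * cap_percent // 100
```

For a cap of 25% and an assumed period of 10000us, the domain gets 2500us of budget per period; a cap of 150% gives 15000us, shared by all of the domain's vCPUs.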
Budget is burned by the domain's vCPUs, in a similar way to
how credits are.
When a domain runs out of budget, its vCPUs can't run any
longer. They can run again when the budget is replenished by
the timer, which happens once every period.
Blocking the vCPUs because of lack of budget happens by
means of a new (_VPF_parked) pause flag, so that, e.g.,
vcpu_runnable() still works. This is similar to what is
done in sched_rtds.c, as opposed to what happens in
sched_credit.c, where vcpu_pause() and vcpu_unpause() are used
(which means, among other things, more overhead).
Note that, while adding new fields to csched2_vcpu and
csched2_dom, currently existing members are being moved
around, to achieve best placement inside cache lines.
Note also that xenalyze and tools/xentrace/format are being
updated too.
The entire file of mctelem.c is in Linux coding style, so do not
change the coding style and only remove trailing spaces and extra
blank lines.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
mctelem.c uses tab indentation. Add an emacs block to avoid mixed
indentation styles in certain editors.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mce_intel.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mcaction.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/vmce: adapt vmce.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mce.{c, h} to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 11 Aug 2017 13:02:31 +0000 (13:02 +0000)]
x86/mm: Prevent 32bit PV guests using out-of-range linear addresses
The grant ABI uses 64 bit values, and allows a PV guest to specify linear
addresses. There is nothing interesting a 32bit PV guest can reference which
will pass an __addr_ok() check (and therefore succeed), but we should still
explicitly check and reject such an attempt.
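A minimal sketch of such a check, in Python for illustration. It assumes that "out of range" means "not representable in 32 bits"; the function name is hypothetical and the real check in the hypervisor may differ.

```python
def addr_ok_for_32bit_pv(linear_addr):
    """True if a 64-bit ABI value fits in a 32-bit guest's address space.

    Hypothetical model: rejects any value with bits set above bit 31.
    """
    return 0 <= linear_addr <= 0xFFFFFFFF
```

A value such as 0x100000000 would be rejected up front, instead of relying on later checks to fail.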
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
As with the create side of things, these are largely identical. Most cases
are actually destroying the mapping rather than replacing it with a stolen
entry.
Reimplement their logic in replace_grant_pv_mapping() in a mostly common
way.
No (intended) change in behaviour from a guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 1 Aug 2017 15:39:59 +0000 (15:39 +0000)]
x86/mm: Combine create_grant_{pte,va}_mapping()
create_grant_{pte,va}_mapping() are nearly identical; all that is really
different between them is how they convert their addr parameter to the pte to
install the grant into.
Reimplement their logic in create_grant_pv_mapping() in a mostly common way.
No (intended) change in behaviour from a guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 2 Aug 2017 11:40:02 +0000 (12:40 +0100)]
x86/mm: Improvements to PV l1e mapping helpers
Drop guest_unmap_l1e() and use unmap_domain_page() directly. This will
simplify future cleanup. Rename guest_map_l1e() to map_guest_l1e() to more
closely match the mapping nomenclature.
Switch map_guest_l1e() to using mfn_t. Correct the comment to indicate that
it takes a linear address (not a virtual address), and correct the parameter
name.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
The nic list obtained when the userspace proxy is used is
never freed. The fix is to keep the nic list in
cds->nics. cds->nics will be freed in
devices_teardown_cb.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Add libxl__device_add to simply write the XenStore device config
and libxl__device_add_async to update the domain configuration
and write the XenStore device config asynchronously.
Almost all devices have a similar libxl__device_xxxx_add function.
These generic functions implement the same functionality but
use the device handling framework. The device-specific
part, such as setting the XenStore configuration, is moved
to the set_xenstore_config callback of the device framework.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 12 Sep 2017 12:45:13 +0000 (14:45 +0200)]
gnttab: also validate PTE permissions upon destroy/replace
In order for PTE handling to match up with the reference counting done
by common code, presence and writability of grant mapping PTEs must
also be taken into account; validating just the frame number is not
enough. This is in particular relevant if a guest fiddles with grant
PTEs via non-grant hypercalls.
Note that the flags being passed to replace_grant_host_mapping()
already happen to be those of the existing mapping, so no new function
parameter is needed.
This is CVE-2017-14319 / XSA-234.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools/xenstore: don't unlink connection object twice
A connection object of a domain with associated stubdom has two
parents: the domain and the stubdom. When cleaning up the list of
active domains in domain_cleanup() make sure not to unlink the
connection twice from the same domain. This could happen when the
domain and its stubdom are being destroyed at the same time leading
to the domain loop being entered twice.
Additionally don't use talloc_free() in this case as it will remove
a random parent link, leading eventually to a memory leak. Use
talloc_unlink() instead specifying the context from which the
connection object should be removed.
This is CVE-2017-14317 / XSA-233.
Reported-by: Eric Chanudet <chanudete@ainfosec.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 12 Sep 2017 12:43:16 +0000 (14:43 +0200)]
xen/mm: make sure node is less than MAX_NUMNODES
The output of MEMF_get_node(memflags) can be as large as nodeid_t can
hold (currently 255). This is then used as an index into arrays of size
MAX_NUMNODES, which is 64 on x86 and 1 on ARM. The value can be passed in
by an untrusted guest (via memory_exchange and increase_reservation) and is
not currently bounds-checked.
Check the value in page_alloc.c before using it, and also check the
value in the hypercall call sites and return -EINVAL if appropriate.
Don't permit domains other than the hardware or control domain to
allocate node-constrained memory.
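The shape of the bounds check can be sketched as below. This is a simplified Python model, not the actual patch: check_node is a hypothetical name, the default of 64 is the x86 value quoted above, and any special "no node" sentinel is deliberately ignored.

```python
import errno

MAX_NUMNODES = 64  # x86 value quoted in the commit message (1 on ARM)

def check_node(node, max_numnodes=MAX_NUMNODES):
    """Return 0 if node may index a per-node array, else -EINVAL.

    Hypothetical helper: models only the range check.
    """
    if not 0 <= node < max_numnodes:
        return -errno.EINVAL
    return 0
```

A guest-supplied node of 255 would fail this check on both architectures, as would node 1 on ARM.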
This is CVE-2017-14316 / XSA-231.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The new wrappers will add more safety when converting an address to a
frame number (either machine or guest). They already exist for
Arm and could be useful in common code.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 16 Aug 2017 17:07:27 +0000 (18:07 +0100)]
xen/x86: Replace mandatory barriers with compiler barriers
In this case, rmb() is being used for its compiler barrier property. Replace
it with an explicit barrier() and comment, to avoid it becoming an unnecessary
lfence instruction (when rmb() gets fixed) or looking like an SMP issue.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 7 Sep 2017 16:38:52 +0000 (17:38 +0100)]
x86/mm: Allow map_domain_page_global() to be used during boot
map_domain_page_global() uses vmap under the hood, which is set up immediately
after switching to SYS_STATE_boot. Relax the local_irq_is_enabled() part of
the assertion before Xen has finished booting, so map_domain_page_global() can
be used during SMP preparation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 8 Sep 2017 14:24:41 +0000 (16:24 +0200)]
hvmloader: dynamically determine scratch memory range for tests
This re-enables tests on configurations where commit 0d6968635c
("hvmloader: avoid tests when they would clobber used memory") forced
them to be skipped.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Fri, 8 Sep 2017 13:44:33 +0000 (14:44 +0100)]
monitor: switch to plain bool
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Wed, 6 Sep 2017 13:34:04 +0000 (14:34 +0100)]
x86/page: Implement {get,set}_pte_flags() as static inlines
This resolves 11 Coverity issues along the lines of the following:
1600 for ( i = 0; i < NR_RESERVED_GDT_PAGES; i++ )
CID: Operands don't affect result
(CONSTANT_EXPRESSION_RESULT)result_independent_of_operands: ((33U /* 1U |
0x20U */) | (({...}) ? 8388608U /* 1U << 23 */ : 0) | 0x40U | 2U) & 4095
is always 0x63 regardless of the values of its operands. This occurs as
the bitwise second operand of "|".
1601 l1e_write(pl1e + FIRST_RESERVED_GDT_PAGE + i,
1602 l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
This is presumably because once preprocessed, the association of joint logic
inside {get,set}_pte_flags() is lost.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 4 Sep 2017 16:46:16 +0000 (17:46 +0100)]
DEPS handling: Remove absolute paths from references to cwd
In some directories we use gcc on source files elsewhere, to generate
a .o here in the current directory. Eg in tools/libxl/,
gcc -I -o build.o /path/to/libacpi/build.c
We pass -MMD and -MF options to generate a .d file right here.
In the general case this .c file might need to include things from the
directory here, eg libacpi/build.c eventually #includes various
*libxl*.h. We pass gcc -I. for this, which means things from the cwd
where we invoked gcc, not the directory of the #including file.
When we do this, gcc's -MMD output mentions /path/to/libxl/*libxl*.h,
even though it could refer to simply *libxl*.h. This is presumably
because gcc has noticed that `.' in this context must mean relative to
the invocation cwd, not relative to build.c, and gcc doesn't realise
that references in the .d file are also wrt the invocation cwd.
make distinguishes targets purely textually. It will canonicalise a
target name by removing ./ before comparison (so _libxl_types.h and
./_libxl_types.h are considered the same target) but it won't examine
the filesystem. So _libxl_types.h and
/path/to/tools/libxl/_libxl_types.h are different targets.
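The purely textual comparison can be modelled as follows. This is a simplified Python sketch of the behaviour described above, not of make's actual implementation; the function name is hypothetical.

```python
def make_target_key(name):
    """Model of how a target name is compared: strip a leading './',
    never consult the filesystem.
    """
    if name.startswith("./"):
        name = name[2:]
    return name
```

Under this model, _libxl_types.h and ./_libxl_types.h map to the same key, while the absolute path stays distinct even though it names the same file on disk.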
And, _libxl_types.h is generated from a pattern rule. This pattern
rule is therefore instantiated twice, and the two instances may be run
concurrently - but use the same tempfiles and can therefore fail.
The thing that is wrong here is gcc's choice to output an absolute
path.
We could work around it by adding a rule to teach make about a
relationship between these `two different files'. But this has to be
done for every autogenerated file and is therefore fragile (leaving a
race bug when we get it wrong).
Ideally we would fix the problem by fixing the .d file as it is
generated. But the .d files are generated by many many rules
mentioning $(CC) and $(CFLAGS). (We might in theory pass a bash
process substitution to -MF, but 1. that's not portable to people who
don't have bash and 2. it hangs, anyway.)
So instead we do this conversion at include time. That is, we tell
make to include not the raw .d files, but the sedded ones.
The sedding removes occurrences of ` $PWD/'. We use the shell
variable PWD because the make variable sometimes refers to the xen
toplevel. If gcc's output format should change, then this sed rune
may not work any more, but that doesn't seem very likely.
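The effect of the conversion can be illustrated with a small Python model. The exact sed expression is not reproduced here; strip_cwd is a hypothetical stand-in that only mirrors "remove occurrences of ` $PWD/'", and the paths in the example are made up.

```python
def strip_cwd(dep_text, cwd):
    """Remove every ' <cwd>/' prefix so that paths under the invocation
    directory become relative to it. Paths elsewhere are left alone.
    """
    return dep_text.replace(" " + cwd.rstrip("/") + "/", " ")


raw = "build.o: /src/libacpi/build.c /src/tools/libxl/_libxl_types.h libxl.h"
cooked = strip_cwd(raw, "/src/tools/libxl")
```

Here cooked refers to _libxl_types.h by its relative name, so make sees a single target, while the reference to /src/libacpi/build.c is untouched.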
The rune is only effective for dependencies on files which are exactly
in the current directory, or a subdirectory of it named simply by its
subdirectory name. If there are autogenerated include files which
exist in a sibling (or worse, somewhere completely else), this
approach will not work, because we'd have to figure out what name this
Makefile usually uses to refer to them. Hopefully such things don't
exist.
The indirect variables DEPS_RM and DEPS_INCLUDE are necessary to
preserve the assumptions made in the various Makefiles. Specifically,
xen/ Makefiles assume that it is ok to say DEPS+=something (where
something is in a subdirectory); tools/ Makefiles all used to include
DEPS themselves (but now they include DEPS_INCLUDE); and many
Makefiles tended to explicitly rm DEPS (but now rm DEPS_RM).
In the new scheme of things: DEPS is the files that come out of gcc
(or perhaps an assembler or something) and may be assigned to by
Makefiles. DEPS_INCLUDE is the processed form. And DEPS_RM is both
combined, so that they both get cleaned.
We need to explicitly use $(wildcard ) to do the wildcard expansion on
DEPS a bit earlier. If we didn't, then DEPS_INCLUDE would contain
`.*.d2' which would not exist.
Evaluation order: DEPS_RM and DEPS_INCLUDE are recursively expanded
variables, so that although they are defined early (in Config.mk),
their actual values are computed at the time of use, using the value
of DEPS that is prevailing at that time.
Reported-by: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
I have verified that I haven't missed anything, with this rune:
git-grep '\bDEPS\b'
Reported-by: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
And editing tools/xenstat/libxenstat/Makefile by hand.
I verified that I didn't miss anything with this rune:
git-grep '\bDEPS\b' | grep -v include |less
Reported-by: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 4 Sep 2017 16:46:13 +0000 (17:46 +0100)]
DEPS handling: Provide DEPS_RM and DEPS_INCLUDE
These are not used anywhere yet, so no functional change.
Reported-by: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 6 Sep 2017 15:33:52 +0000 (11:33 -0400)]
mm: Don't scrub pages while holding heap lock in alloc_heap_pages()
Instead, preserve PGC_need_scrub bit when setting PGC_state_inuse
state while still under the lock and clear those pages later.
Note that we still need to grab the lock when clearing PGC_need_scrub
bit since count_info might be updated during MCE handling in
mark_page_offline().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Mon, 4 Sep 2017 11:01:44 +0000 (19:01 +0800)]
tools: change the type of '*nr' in 'libxl_psr_cat_get_info'
For historical reasons, the type of parameter '*nr' in 'libxl_psr_cat_get_info'
is 'int'. This is not right; it should be 'unsigned int'. This patch fixes
this and makes the related changes.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Mon, 4 Sep 2017 11:01:43 +0000 (19:01 +0800)]
tools: use '__i386__' and '__x86_64__' to replace PSR macros
The libxl interfaces and related functions do not need to be guarded by
'LIBXL_HAVE_PSR_CMT' and 'LIBXL_HAVE_PSR_CAT'. So replace them with the common
x86 macros. Furthermore, only compile 'xl_psr.c' under x86.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Suggested-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Wed, 6 Sep 2017 10:32:00 +0000 (12:32 +0200)]
x86: introduce and use setup_force_cpu_cap()
For XEN_SMEP and XEN_SMAP to not be cleared while bringing up APs we'd
need to clone the respective hack used for CPUID_FAULTING. Introduce an
inverse of setup_clear_cpu_cap() instead, but let clearing of features
overrule forced setting of them.
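The intended precedence can be sketched with sets standing in for the feature bitmaps. This is a Python model for illustration; effective_caps is a hypothetical name and the feature strings are placeholders.

```python
def effective_caps(detected, forced, cleared):
    """Features visible after applying forced and cleared sets.

    Forced features are added first; cleared ones are removed last,
    so clearing overrules forcing.
    """
    return (set(detected) | set(forced)) - set(cleared)
```

If a feature is both forced and cleared, it ends up absent, which matches "let clearing of features overrule forced setting of them".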
XEN_SMAP being wrong post-boot is a problem specifically for live
patching, as a live patch may need alternative instruction patching
keyed off of that feature flag.
Reported-by: Sarah Newman <security@prgmr.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 1 Sep 2017 17:05:21 +0000 (17:05 +0000)]
xen: Drop asmlinkage everywhere
asmlinkage is defined as nothing on all architectures, and not used
consistently anywhere, even in common code. Remove it all.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>