It has a different behaviour compared to the old MSR handlers: if MSR
is being handled by guest_rdmsr() then RDMSR will either succeed (if
a guest is allowed to access it based on its MSR policy) or produce
a GP fault. A guest will never see a H/W value of some MSR unknown to
this function.
guest_rdmsr() unifies and replaces the handling code from
vmx_msr_read_intercept() and priv_op_read_msr().
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
This (along with the prep work in init_domain_msr_policy()) also fixes
a bug where Dom0 could probe and find CPUID faulting, even though it
couldn't actually use it.
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy
Since each vCPU now has struct msr_vcpu_policy, use cpuid_faulting bit
from there in current logic and remove arch_vcpu::cpuid_faulting.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
The new structure contains information about guest's MSRs that are
unique to each vCPU. It starts with only 1 MSR:
MSR_INTEL_MISC_FEATURES_ENABLES
Which currently has only 1 usable bit: cpuid_faulting.
Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. Availability of MSR_INTEL_MISC_FEATURES_ENABLES depends on
availability of MSR_INTEL_PLATFORM_INFO.
Add init_vcpu_msr_policy() which sets initial MSR policy for every vCPU
during domain creation with a special case for Dom0.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
The new structure contains information about guest's MSRs that are
shared between all domain's vCPUs. It starts with only 1 MSR:
MSR_INTEL_PLATFORM_INFO
Which currently has only 1 usable bit: cpuid_faulting.
Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. It's always possible to emulate CPUID faulting for HVM guests
while for PV guests the H/W support is required.
Add init_domain_msr_policy() which sets initial MSR policy during
domain creation with a special case for Dom0.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Wei Liu [Wed, 13 Sep 2017 16:46:53 +0000 (17:46 +0100)]
x86/mm: add pv prefix to {set,destroy}_gdt
Unfortunately they can't stay local to PV code because domain.c still
needs them. Change their names and fix up call sites. The code will be
moved later together with other descriptor table manipulation code.
Also move the declarations to pv/mm.h and provide stubs.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 13 Sep 2017 15:52:20 +0000 (16:52 +0100)]
x86/mm: move ro page fault emulation code
Move the code to pv/ro-page-fault.c. Create a new header file
asm-x86/pv/mm.h and move the declaration of pv_ro_page_fault there.
Include the new header file in traps.c.
Fix some coding style issues. The prefixes (ptwr and mmio) are
retained because there are two sets of emulation code in the same file.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
As we have generic functions to get device list
(libxl__device_list) no need to have callback in
the framework. To resolve issue when XS entry
doesn't match device name, device type is
extended with field "entry" which keeps XS entry.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Fri, 22 Sep 2017 14:28:35 +0000 (16:28 +0200)]
x86/PSR: fix cmdline option handling
Commit 0ade5e causes a bug that the psr features presented in cmdline
cannot be correctly enumerated.
1. If there is only 'psr=', then CMT is enumerated which is not right.
2. If cmdline is 'psr=cmt,cat,cdp,mba', only the last feature is enumerated.
This patch fixes the issues.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Fri, 22 Sep 2017 14:27:36 +0000 (16:27 +0200)]
custom parameter handling fixes
The recent changes to their handling introduced a few false warnings,
due to checks looking at the wrong string slot. While going through all
those commits and looking for patterns similar to the "dom0_mem=" I've
noticed this with, I also realized that there were other issues with
"dom0_nodes=" and "rmrr=", partly pre-existing, but partly also due to
those recent changes not having gone far enough.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
perfc: fix build after commit fc32575968 when CONFIG_PERF_COUNTERS=y
The commit fc32575968 "public/sysctl: drop unnecessary typedefs and
handles" went a bit too far by replacing all xen_systcl_*_t type to
struct xen_sysctl_*.
However, xen_sysctl_perfc_val_t was a typedef on uint32_t and therefore
is not associated to a structure.
Use xen_sysctl_perfc_val_t to fix the build when CONFIG_PERF_COUNTERS=y
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
The implementation of get_paged_frame is currently different whether the
architecture support sharing memory or paging memory. Both
version are extremely similar so it is possible to consolidate in a
single implementation.
The main difference is the x86 version will allow grant on foreign page
when using HVM/PVH whilst Arm does not. At the moment, on x86 foreign pages
are only allowed for PVH Dom0. It seems that foreign pages should never
be granted so deny them
The check for shared/paged memory are now gated with the respective ifdef.
Potentially, dummy p2m_is_shared/p2m_is_paging could be implemented for
Arm.
Lastly remove pointless parenthesis in the code modified.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
In commit ad4b3e1e9df34 ("xen: credit2: implement
utilization cap") xfree() was being called (for
deallocating the budget replenishment timer, during
domain destruction) inside an IRQ disabled critical
section.
That must not happen, as it uses the mem-pool's lock,
which needs to be taken with IRQ enabled. And, in fact,
we crash (in debug builds):
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************
Let's, therefore, kill and deallocate the timer outside of
the critical sections, when IRQs are enabled.
xen/arm: page: Remove unused attributes DEV_NONSHARED and DEV_CACHED
They were imported from non-LPAE Linux, but Xen is LPAE only. It is time
to do some clean-up in the memory attribute and keep only what make
sense for Xen. Follow-up patch will do more clean-up.
Also, update the comment saying our attribute matches Linux.
Jan Beulich [Wed, 20 Sep 2017 15:25:01 +0000 (17:25 +0200)]
public/sysctl: drop unnecessary typedefs and handles
By virtue of the struct xen_sysctl container structure, most of them
are really just cluttering the name space.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Wed, 20 Sep 2017 15:23:41 +0000 (17:23 +0200)]
public/domctl: drop unnecessary typedefs and handles
By virtue of the struct xen_domctl container structure, most of them
are really just cluttering the name space.
While doing so,
- convert an enum typed (pt_irq_type_t) structure field to a fixed
width type,
- make x86's paging_domctl() and descendants take a properly typed
handle,
- add const in a few places.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Meng Xu <mengxu@cis.upenn.edu> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Andrew Cooper [Wed, 20 Sep 2017 15:21:29 +0000 (17:21 +0200)]
x86/hvm: break out __hvm_copy()'s translation logic
It will be reused by later changes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 20 Sep 2017 15:20:53 +0000 (17:20 +0200)]
x86/hvm: rename enum hvm_copy_result to hvm_translation_result
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
add new domctl hypercall to set grant table resource limits
Add a domctl hypercall to set the domain's resource limits regarding
grant tables.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
move XENMAPSPACE_grant_table code into grant_table.c
The x86 and arm versions of XENMAPSPACE_grant_table handling are nearly
identical. Move the code into a function in grant_table.c and add an
architecture dependant hook to handle the differences.
Switch to mfn_t in order to be more type safe.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
x86/vvmx: add hvm_intsrc_vector support to nvmx_intr_intercept()
Under the following circumstances:
1. L1 doesn't enable PAUSE exiting or PAUSE-loop exiting controls
2. L2 executes PAUSE in a loop with RFLAGS.IE == 0
L1's PV IPI through event channel will never reach the target L1's vCPU
which runs L2 because nvmx_intr_intercept() doesn't know about
hvm_intsrc_vector. This leads to infinite L2 loop without nested
vmexits and can cause L1 to hang.
The issue is easily reproduced with Qemu/KVM on CentOS-7-1611 as L1
and an L2 guest with SMP.
Fix nvmx_intr_intercept() by injecting hvm_intsrc_vector irq into L1
which will cause nested vmexit.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
xen/arm: traps: Introduce a helper to read the hypersivor fault register
While ARM32 has 2 distinct registers for the hypervisor fault register
(one for prefetch abort, the other for data abort), AArch64 has only
one.
Currently, the logic is open-code but a follow-up patch will require to
read it too. So move the logic in a separate helper and use it instead
of open-coding it.
Aliasing FAR_EL2 to HIFAR makes the code confusing because on ARMv8
FAR_EL2[31:0] is architecturally mapped to HDFAR and FAR_EL2[63:32] to
HIFAR. See D7.2.30 in ARM DDI 0487B.a. Open-code the alias instead.
Boris Ostrovsky [Tue, 19 Sep 2017 15:47:47 +0000 (17:47 +0200)]
mm: scrub pages returned back to heap if MEMF_no_scrub is set
Set free_heap_pages()'s need_scrub to true if alloc_domheap_pages()
returns pages back to heap as result of assign_pages() error when those
pages were requested with MEMF_no_scrub flag.
We need to do this because there is a possibility that
alloc_heap_pages() might clear buddy's PGC_need_scrubs flag without
actually clearing the page.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 18 Sep 2017 10:31:02 +0000 (12:31 +0200)]
x86emul: re-order checks in test harness
On older systems printing the "n/a" messages (resulting from the
compiler not being new enough to deal with some of the test code) isn't
very useful: If both CPU and compiler are too old for a certain test,
we can as well omit those messages, as those tests wouldn't be run even
if the compiler did produce code. (This has become obvious with the
3DNow! tests, which I had to run on an older system still supporting
those insns, and that system naturally also had an older compiler.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
At the moment, most of the callers will have to use mfn_x. However
follow-up patches will remove some of them by propagating the typesafe a
bit further.
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
xen/arm: p2m: Check for p2m->domain to be initialized before releasing resources
Since p2m_teardown() can be called when p2m_init() haven't executed yet
we might deal with unitialized list "p2m->pages" which leads to crash.
To avoid this use back pointer to domain as end-of-initialization indicator.
Currently, cpregs.h is indirectly included every files of the hypervisor even
for arm64. However, the only use for arm64 is when emulating co-processors.
For arm32, all users of processor.h expect cpregs.h to be included in
order to access co-processors. So move the inclusion in
asm-arm/arm32/processor.h.
cpregs.h will also be directly included in the co-processors emulation
to accommodate arm64.
This is drastically reducing the exposure of cpregs.h to any source file
on arm64.
xen/arm: traps: Export a bunch of helpers to handle emulation
A follow-up patch will move some parts of traps.c in separate files.
The will require to use helpers that are currently statically defined.
Export the following helpers:
- inject_undef64_exception
- inject_undef_exception
- check_conditional_instr
- advance_pc
- handle_raz_wi
- handle_wo_wi
- handle_ro_raz
Note that asm-arm/arm32/traps.h is empty but it is to keep parity with
the arm64 counterpart.
This commit implements the Xen part of the cap mechanism for
Credit2.
A cap is how much, in terms of % of physical CPU time, a domain
can execute at most.
For instance, a domain that must not use more than 1/4 of
one physical CPU, must have a cap of 25%; one that must not
use more than 1+1/2 of physical CPU time, must be given a cap
of 150%.
Caps are per domain, so it is all a domain's vCPUs, cumulatively,
that will be forced to execute no more than the decided amount.
This is implemented by giving each domain a 'budget', and
using a (per-domain again) periodic timer. Values of budget
and 'period' are chosen so that budget/period is equal to the
cap itself.
Budget is burned by the domain's vCPUs, in a similar way to
how credits are.
When a domain runs out of budget, its vCPUs can't run any
longer. They can gain, when the budget is replenishment by
the timer, which event happens once every period.
Blocking the vCPUs because of lack of budget happens by
means of a new (_VPF_parked) pause flag, so that, e.g.,
vcpu_runnable() still works. This is similar to what is
done in sched_rtds.c, as opposed to what happens in
sched_credit.c, where vcpu_pause() and vcpu_unpause()
(which means, among other things, more overhead).
Note that, while adding new fields to csched2_vcpu and
csched2_dom, currently existing members are being moved
around, to achieve best placement inside cache lines.
Note also that xenalyze and tools/xentrace/format are being
updated too.
The entire file of mctelem.c is in Linux coding style, so do not
change the coding style and only remove trailing spaces and extra
blank lines.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
mctelem.c uses the tab indention. Add an emacs block to avoid mixed
indention styles in certain editors.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mce_intel.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mcation.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/vmce: adapt vmce.c to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mce: adapt mce.{c, h} to Xen hypervisor coding style
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 11 Aug 2017 13:02:31 +0000 (13:02 +0000)]
x86/mm: Prevent 32bit PV guests using out-of-range linear addresses
The grant ABI uses 64 bit values, and allows a PV guest to specify linear
addresses. There is nothing interesting a 32bit PV guest can reference which
will pass an __addr_ok() check (and therefore succeed), but we should still
explicitly check and reject such an attempt.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
As with the create side of things, these are largely identical. Most cases
are actually destroying the mapping rather than replacing it with a stolen
entry.
Reimplement their logic in replace_grant_pv_mapping() in a mostly common
way.
No (intended) change in behaviour from a guests point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 1 Aug 2017 15:39:59 +0000 (15:39 +0000)]
x86/mm: Combine create_grant_{pte,va}_mapping()
create_grant_{pte,va}_mapping() are nearly identical; all that is really
different between them is how they convert their addr parameter to the pte to
install the grant into.
Reimplement their logic in create_grant_pv_mapping() in a mostly common way.
No (intended) change in behaviour from a guests point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 2 Aug 2017 11:40:02 +0000 (12:40 +0100)]
x86/mm: Improvements to PV l1e mapping helpers
Drop guest_unmap_l1e() and use unmap_domain_page() directly. This will
simplify future cleanup. Rename guest_map_l1e() to map_guest_l1e() to closer
match the mapping nomenclature.
Switch map_guest_l1e() to using mfn_t. Correct the comment to indicate that
it takes a linear address (not a virtual address), and correct the parameter
name.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Getting nic list in case userspace proxy is called
without freeing. The fix is to use cds->nics to
keep nic list. cds->nics will be freed in
devices_teardown_cb.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>