This commit replaces direct accesses to the host's p2m
(&d->arch.p2m) with the macro "p2m_get_hostp2m". The macro improves
readability and makes it easier to differentiate between the host's p2m
and alternative p2ms, i.e., those belonging to the altp2m subsystem,
which will be submitted in the future.
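A minimal sketch of the accessor pattern, assuming heavily trimmed stand-ins for Xen's struct domain and struct p2m_domain (the helper is_host_p2m() is hypothetical). The macro shown mirrors the idea of p2m_get_hostp2m(); the upstream definition may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily trimmed stand-ins for Xen's structures. */
struct p2m_domain { unsigned long max_mapped_gfn; };
struct arch_domain { struct p2m_domain p2m; };
struct domain { struct arch_domain arch; };

/* Callers ask for "the host p2m" instead of spelling out &d->arch.p2m,
 * so an altp2m series can later add an accessor for the active p2m
 * without touching every call site. */
#define p2m_get_hostp2m(d) (&(d)->arch.p2m)

/* Hypothetical helper: is this p2m the host's rather than an altp2m? */
static bool is_host_p2m(struct domain *d, struct p2m_domain *p2m)
{
    return p2m == p2m_get_hostp2m(d);
}
```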
If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject an LMCE into the CPU specified by '-c' (or CPU0
if '-c' is not present).
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxc: add support of injecting MC# to specified CPUs
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
current xc_mca_op() does not use this feature and does not provide an
interface to callers. This commit adds a new xc_mca_op_inject_v2() that
receives a cpumap providing the set of target CPUs.
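A cpumap is a bitmap with one bit per CPU. The sketch below builds one for a set of target CPUs, assuming the layout "CPU n is bit (n % 8) of byte (n / 8)"; the demo_cpumap_* helpers are hypothetical, and the exact prototype of xc_mca_op_inject_v2() is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed map large enough for nr_cpus. */
static uint8_t *demo_cpumap_alloc(unsigned int nr_cpus)
{
    return calloc((nr_cpus + 7) / 8, 1);
}

/* Mark 'cpu' as a target (assumed layout: bit cpu%8 of byte cpu/8). */
static void demo_cpumap_set(uint8_t *map, unsigned int cpu)
{
    map[cpu / 8] |= (uint8_t)(1u << (cpu % 8));
}

/* Return 1 if 'cpu' is in the map, 0 otherwise. */
static int demo_cpumap_test(const uint8_t *map, unsigned int cpu)
{
    return (map[cpu / 8] >> (cpu % 8)) & 1;
}
```

The caller would then hand the filled map, together with its size in CPUs, to the new libxc entry point.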
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
If LMCE is supported by the host and ' mca_caps = [ "lmce" ] ' is present
in the xl config, the LMCE capability will be exposed in the guest's
MSR_IA32_MCG_CAP. By default, LMCE is not exposed to the guest, so as to
keep backwards migration compatibility.
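The exposure rule can be sketched as a pure function. MCG_LMCE_P is bit 27 of IA32_MCG_CAP per the Intel SDM; the helper guest_mcg_cap() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCG_LMCE_P: bit 27 of IA32_MCG_CAP (Intel SDM). */
#define MCG_LMCE_P (1ULL << 27)

/* Hypothetical helper: LMCE is advertised only when the host has it AND
 * the config opted in (mca_caps = [ "lmce" ]). A guest that never asked
 * for it keeps a capability set that older hosts can also provide. */
static uint64_t guest_mcg_cap(uint64_t base_cap, uint64_t host_cap, bool cfg_lmce)
{
    uint64_t cap = base_cap & ~MCG_LMCE_P;

    if ( cfg_lmce && (host_cap & MCG_LMCE_P) )
        cap |= MCG_LMCE_P;
    return cap;
}
```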
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> for hypervisor side Acked-by: Wei Liu <wei.liu2@citrix.com>
x86/domctl: generalize the restore of vMCE parameters
vMCE parameters in struct xen_domctl_ext_vcpucontext were extended in
the past, and are likely to be extended in the future. When migrating a
PV domain from an older Xen, XEN_DOMCTL_set_ext_vcpucontext should handle
the differences.
Instead of adding ad-hoc handling code at each extension, we introduce
an array to record the sizes of the current and all past versions of the
vMCE parameters, and search for the largest one that does not exceed the
size of the passed-in parameters to determine which vMCE parameters will
be restored. If vMCE parameters are extended in the future, we only need
to adapt the array to reflect the extension.
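The size-table search described above can be sketched as follows. The sizes in vmce_param_sizes[] are invented for illustration; only the "largest known layout that fits" logic reflects the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes (bytes) of every vMCE-parameter layout ever shipped.
 * Extending the ABI means appending one entry here. */
static const size_t vmce_param_sizes[] = { 8, 16, 24 };

/* Return the size of the largest known layout that fits in 'passed'
 * bytes, or 0 if none does (e.g. the sender predates vMCE state). */
static size_t vmce_restore_size(size_t passed)
{
    size_t best = 0;

    for ( size_t i = 0; i < sizeof(vmce_param_sizes) / sizeof(vmce_param_sizes[0]); i++ )
        if ( vmce_param_sizes[i] <= passed && vmce_param_sizes[i] > best )
            best = vmce_param_sizes[i];
    return best;
}
```

Scanning for the maximum (rather than assuming the array is sorted) keeps the lookup correct even if entries are appended out of order.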
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
VT-d: fix VF of RC integrated PF matched to wrong VT-d unit
The problem is that for a VF of an RC integrated PF (e.g. a PF whose BDF
is 00:02.0), we would wrongly use 00:00.0 to search for the VT-d unit.
If a PF is an extended function, the BDF of a traditional function within
the same device should be used to search for the VT-d unit. Otherwise, the
real BDF of the PF should be used. According to the PCIe spec, an extended
function is a function within an ARI device whose Function Number is
greater than 7. The original code tried to tell them apart by checking
PCI_SLOT(), but it lacks a counterpart of pci_ari_enabled() (a function
that exists in the Linux kernel). Without checking whether ARI is enabled,
an RC integrated PF with PCI_SLOT() > 0 is wrongly classified as an
extended function. Note that an RC integrated function isn't within an ARI
device and thus cannot be an extended function; in this case the real BDF
should be used.
Considering that the 'is_extfn' field of struct pci_dev has been passed
down from Domain0 to indicate whether the function is an extended function,
this patch just looks up the 'is_extfn' field of the PF's struct pci_dev
and sets 'devfn' to 0 when 'is_extfn' is true.
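A minimal sketch of the resulting devfn choice, assuming a trimmed, hypothetical mirror of struct pci_dev; vtd_lookup_devfn() is an illustrative name, not the upstream function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trimmed, hypothetical mirror of the fields the fix relies on. */
struct demo_pci_dev {
    uint8_t bus, devfn;
    bool is_extfn;   /* reported by Dom0: function number > 7 behind ARI */
};

/* devfn used to find the VT-d unit for a VF: an extended PF is
 * represented by function 0 of its device; any other PF, including an
 * RC integrated one such as 00:02.0, uses its own devfn. */
static uint8_t vtd_lookup_devfn(const struct demo_pci_dev *pf)
{
    return pf->is_extfn ? 0 : pf->devfn;
}
```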
Reported-by: Crawford, Eric R <Eric.R.Crawford@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Wei Liu [Fri, 30 Jun 2017 16:20:47 +0000 (17:20 +0100)]
x86/monitor.c: use plain bool
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
tools/libxl/libxl_pci.c: Judge igd through class code instead of device ID
IGD passthrough couldn't work on Skylake and Kabylake, because their
Device IDs aren't in fixup_ids[]. Currently we need to add every Intel
graphics ID into fixup_ids[], which is hard to maintain.
This patch identifies Intel graphics through the vendor ID (0x8086) and
class code (0x030000). This supports both old and new Intel graphics, and
reduces maintenance work in the future.
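The new check reduces to comparing two values read from the device. A sketch, assuming the vendor ID and 24-bit class code have already been read (e.g. from sysfs); is_intel_igd() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL 0x8086
#define PCI_CLASS_VGA       0x030000  /* display controller, VGA-compatible */

/* Hypothetical helper: no per-device-ID table to keep up to date. */
static bool is_intel_igd(uint16_t vendor, uint32_t class_code)
{
    return vendor == PCI_VENDOR_ID_INTEL && class_code == PCI_CLASS_VGA;
}
```

The trade-off is breadth: any Intel device reporting the VGA class code matches, which is exactly what removes the need to list new generations one by one.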
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
As both xen-netfront and xen-blkfront support multi-queue, they can
consume a lot of grant table references when many paravirtual devices
and vCPUs are assigned to a guest. The guest domU might panic or hang due
to grant allocation failure when nr_grant_frames in the guest has reached
its maximum value.
This utility helps administrators diagnose Xen issues. So far there is
only one command, gnttab_query_size, which monitors the guest's grant table
frame usage from the dom0 side, so that debugging on the guest kernel side
is no longer required for crash/hang analysis.
New commands can be added for more diagnostic functions; the framework
of xen-diag.c is taken from xen-livepatch.c.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxc: add interface for GNTTABOP_query_size
This patch adds a new interface for GNTTABOP_query_size in libxc to
query the current and maximum number of grant table frames for a
specific domain.
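A sketch of how a diagnostic tool might report the result. struct demo_query_size is a local mirror of the fields of the hypercall's argument (the authoritative layout is struct gnttab_query_size in Xen's public grant_table.h); the usage-percentage helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirror of GNTTABOP_query_size's IN/OUT fields (assumed layout). */
struct demo_query_size {
    uint16_t dom;            /* IN: domain to query */
    uint32_t nr_frames;      /* OUT: frames currently in use */
    uint32_t max_nr_frames;  /* OUT: per-domain limit */
    int16_t status;          /* OUT: GNTST_* code */
};

/* Percentage of the frame limit currently in use. */
static unsigned int gnttab_usage_pct(const struct demo_query_size *q)
{
    if ( q->max_nr_frames == 0 )
        return 0;
    return (unsigned int)((uint64_t)q->nr_frames * 100 / q->max_nr_frames);
}

static void print_usage(const struct demo_query_size *q)
{
    printf("domid=%u: nr_frames=%u, max_nr_frames=%u (%u%%)\n",
           (unsigned int)q->dom, (unsigned int)q->nr_frames,
           (unsigned int)q->max_nr_frames, gnttab_usage_pct(q));
}
```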
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Thomas Sanders [Tue, 28 Mar 2017 17:57:52 +0000 (18:57 +0100)]
oxenstored: trim history in the frequent_ops function
We were trimming the history of commits only at the end of each
transaction (regardless of how it ended).
Therefore if non-transactional writes were being made but no
transactions were being ended, the history would grow
indefinitely. Now we trim the history at regular intervals.
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
x86/vmx: expose LMCE feature via guest MSR_IA32_FEATURE_CONTROL
If MCG_LMCE_P is present in the guest's MSR_IA32_MCG_CAP, then set the
LMCE and LOCK bits in the guest's MSR_IA32_FEATURE_CONTROL. The Intel SDM
requires those bits to be set before software can enable LMCE.
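The rule can be sketched as a pure function. Bit positions follow the Intel SDM (IA32_FEATURE_CONTROL: LOCK is bit 0, LMCE On is bit 20; IA32_MCG_CAP: MCG_LMCE_P is bit 27); guest_feature_control() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define MCG_LMCE_P                    (1ULL << 27)
#define IA32_FEATURE_CONTROL_LOCK     (1ULL << 0)
#define IA32_FEATURE_CONTROL_LMCE_ON  (1ULL << 20)

/* Hypothetical helper: derive the guest's FEATURE_CONTROL from the
 * guest's MCG_CAP. LOCK is set alongside LMCE_ON because software may
 * only enable LMCE once the MSR is locked with LMCE_ON set. */
static uint64_t guest_feature_control(uint64_t fc, uint64_t guest_mcg_cap)
{
    if ( guest_mcg_cap & MCG_LMCE_P )
        fc |= IA32_FEATURE_CONTROL_LOCK | IA32_FEATURE_CONTROL_LMCE_ON;
    return fc;
}
```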
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
A round of mce_softirq() may handle multiple deferred MCEs.
1/ If all of them are LMCEs, then mce_softirq() is called on only one
CPU and should not wait for the others.
2/ If at least one of them is a non-local MCE, then mce_softirq()
should sync with the other CPUs.
mce_softirq() should check for these two cases and handle them
accordingly.
Because mce_softirq() can be interrupted by MC# again, we should also
ensure that the deferred MCE handling in mce_softirq() is robust against
changes in the result of that check.
A per-cpu list 'lmce_pending' is introduced to 'struct mc_telem_cpu_ctl',
alongside the existing per-cpu list 'pending', for LMCE handling.
The MC# handler mcheck_cmn_handler() ensures that
1/ if all deferred MCEs on a CPU are LMCEs, then all of their
telemetries will be only in 'lmce_pending' on that CPU;
2/ if at least one deferred MCE on a CPU is not an LMCE, then all
telemetries of deferred MCEs on that CPU will be only in
'pending' on that CPU.
Therefore, whether 'lmce_pending' is non-empty can be used by the MCE
softirq handler mce_softirq() to determine whether it is in the first of
the two cases described at the beginning.
mce_softirq() atomically moves deferred MCEs either from the list
'lmce_pending' on the current CPU, or from the lists 'pending' on the
current and other CPUs, to the list 'processing' on the current CPU, and
then handles the deferred MCEs in 'processing'. A newly arriving MC#,
before or after the atomic move, may change the result of the check, but
it does not change whether the MCEs in 'processing' are LMCEs, so
mce_softirq() can still handle 'processing' according to the result of
the earlier check.
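The "atomically move, then decide" idea can be sketched with C11 atomics. This is a single-file model under stated assumptions: the list and control-block names are hypothetical, real telemetry handling and the cross-CPU barriers are omitted, and the return value only reports which case was taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

struct telem { struct telem *next; bool lmce; };

/* Hypothetical per-CPU control block modelled on the description. */
struct demo_cpu_ctl {
    _Atomic(struct telem *) pending;       /* at least one non-local MCE */
    _Atomic(struct telem *) lmce_pending;  /* LMCEs only */
    struct telem *processing;
};

static struct demo_cpu_ctl ctl[DEMO_NR_CPUS];

/* Prepend list 'l' to 'dst' and return the combined list. */
static struct telem *splice(struct telem *l, struct telem *dst)
{
    struct telem *tail = l;

    if ( !l )
        return dst;
    while ( tail->next )
        tail = tail->next;
    tail->next = dst;
    return l;
}

/* Returns true if this round was LMCE-only (no sync with other CPUs).
 * Once a list is detached by atomic_exchange, a new MC# can only add to
 * the (now empty) source lists, never change what we are processing. */
static bool demo_mce_softirq(unsigned int cpu)
{
    struct telem *l = atomic_exchange(&ctl[cpu].lmce_pending, NULL);

    if ( l )
    {
        ctl[cpu].processing = l;
        return true;
    }
    /* Non-local case: the real code first syncs with the other CPUs. */
    ctl[cpu].processing = NULL;
    for ( unsigned int c = 0; c < DEMO_NR_CPUS; c++ )
        ctl[cpu].processing =
            splice(atomic_exchange(&ctl[c].pending, NULL), ctl[cpu].processing);
    return false;
}
```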
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/mce: allow mce_barrier_{enter,exit} to return without waiting
Add a 'wait' argument to mce_barrier_{enter,exit}() to specify whether
the barrier functions should return immediately instead of waiting for
mce_barrier_{enter,exit}() on other CPUs. This is useful when handling
LMCE, where mce_barrier_{enter,exit}() are called on only one CPU.
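A simplified sketch of the enter side with a 'wait' flag, assuming a plain arrival counter; the real Xen barrier also tracks generations so it can be reused, which is omitted here. demo_barrier_enter() is a hypothetical name, and the exit side would follow the same pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified barrier: 'total' CPUs must arrive before any
 * of them proceeds. */
struct demo_barrier {
    atomic_uint arrived;
    unsigned int total;
};

static void demo_barrier_enter(struct demo_barrier *bar, bool wait)
{
    if ( !wait )
        return;   /* LMCE: only this CPU is involved, nobody to wait for */

    atomic_fetch_add(&bar->arrived, 1);
    while ( atomic_load(&bar->arrived) < bar->total )
        ;         /* spin until every participating CPU has entered */
}
```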
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Since c/s cbc585158f ("x86/mce: eliminate unnecessary NR_CPUS-sized
arrays") introduced struct mc_telem_cpu_ctl, it has been used as the
type of per-cpu variables rather than of global variables. However,
some comments within it have not been updated accordingly.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Fri, 30 Jun 2017 15:54:26 +0000 (16:54 +0100)]
xen/arm: p2m: Rename p2m_valid, p2m_table, p2m_mapping and p2m_is_superpage
The helpers p2m_valid, p2m_table, p2m_mapping and p2m_is_superpage are
not specific to the stage-2 translation tables; they work on any LPAE
translation table. So rename them to lpae_* and use pte.walk to look up
the value of the field.
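A sketch of what level-aware LPAE helpers can look like. lpae_t here is a hypothetical trimmed mirror exposing only bit 0 (valid) and bit 1 (table at levels 0-2, page at level 3) through a 'walk' view; the function names follow the lpae_* prefix but the exact upstream names and signatures may differ. It relies on uint64_t bit-fields, which GCC and Clang accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical trimmed mirror of an LPAE descriptor. */
typedef union {
    uint64_t bits;
    struct {
        uint64_t valid:1;
        uint64_t table:1;
        uint64_t pad:62;
    } walk;
} lpae_t;

static bool lpae_valid(lpae_t pte) { return pte.walk.valid; }

/* Below level 3, bit 1 set means "points to a next-level table". */
static bool lpae_table(lpae_t pte, unsigned int level)
{
    return lpae_valid(pte) && level < 3 && pte.walk.table;
}

/* A leaf: bit 1 clear below level 3 (block), bit 1 set at level 3 (page). */
static bool lpae_mapping(lpae_t pte, unsigned int level)
{
    if ( !lpae_valid(pte) )
        return false;
    return level == 3 ? pte.walk.table : !pte.walk.table;
}

static bool lpae_is_superpage(lpae_t pte, unsigned int level)
{
    return level < 3 && lpae_mapping(pte, level);
}
```

Nothing here refers to stage-2 state, which is the point of the rename: the same checks apply to Xen's own stage-1 tables.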
Julien Grall [Fri, 30 Jun 2017 15:54:23 +0000 (16:54 +0100)]
xen/arm: create_xen_entries: Use typesafe MFN
Add a bit more safety when using create_xen_entries.
Also, when destroying/modifying a mapping, the MFN is currently not used.
Rather than passing _mfn(0), use INVALID_MFN to stay consistent with
other usages.
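The safety comes from wrapping the frame number in a struct, so the compiler rejects passing a bare integer or a different frame type by mistake. A simplified sketch (Xen generates these wrappers with a macro, and in some builds mfn_t may be a plain integer); demo_needs_mfn() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified typesafe MFN. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

#define INVALID_MFN _mfn(~0UL)

static inline bool mfn_eq(mfn_t a, mfn_t b) { return mfn_x(a) == mfn_x(b); }

/* Hypothetical caller: removal does not need a real frame, so it is given
 * INVALID_MFN rather than _mfn(0), which is a valid-looking frame number. */
static bool demo_needs_mfn(mfn_t mfn) { return !mfn_eq(mfn, INVALID_MFN); }
```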
Igor Druzhinin [Wed, 28 Jun 2017 19:27:08 +0000 (20:27 +0100)]
tools/libxenforeignmemory: add xenforeignmemory_map2 function
The new function behaves like the first version, except that it takes
an extended list of arguments which are passed on to the mmap() call.
This is needed for QEMU deprivileging.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Reverse the sorting order and add blank lines where the register changes.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Place it in .rodata instead of reconstructing it on the stack each time.
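Assuming the object in question is a constant lookup table (the commit does not say which one), the change amounts to the difference below. The table contents and demo_leaf_name() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* "static const" on both the array and the pointers it holds lets the
 * compiler emit the table once into .rodata. A non-static local array
 * with the same initialiser would be rebuilt on the stack on every call. */
static const char *const demo_leaf_names[] = {
    "basic", "extended", "structured-ext",
};

static const char *demo_leaf_name(size_t i)
{
    return i < sizeof(demo_leaf_names) / sizeof(demo_leaf_names[0])
           ? demo_leaf_names[i] : NULL;
}
```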
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
This is the result of parsing cpu_map.xml from libvirt.
The most important part is handling leaf 0x00000007, but while at it,
add the other bits too.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Sergey Dyasli [Wed, 28 Jun 2017 09:35:45 +0000 (10:35 +0100)]
vvmx: fix ept_sync() for nested p2m
If ept_sync_domain() is called for an np2m, the following happens:
1. *np2m*::ept_data::invalidate cpumask is updated
2. IPIs are sent to the CPUs in domain_dirty_cpumask, forcing vmexits
3. vmx_vmenter_helper() checks *hostp2m*::ept_data::invalidate and does nothing
This is clearly a bug. Make ept_sync_domain() update the hostp2m's
invalidate mask in the nested p2m case, and make vmx_vmenter_helper()
invalidate EPT translations for all EPTPs if nested virt is enabled.
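A trimmed model of the fix, assuming a plain bitmask in place of a cpumask; all names are hypothetical. The key point is that the pending flush is recorded where the VM-entry path looks (the host p2m), and that a nested guest forces an all-contexts flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed model of a p2m and its pending-flush mask. */
struct demo_p2m {
    bool nested;               /* np2m rather than the host p2m */
    struct demo_p2m *host;     /* owning domain's host p2m */
    unsigned long invalidate;  /* CPUs that must flush before VM entry */
};

/* Always record the pending flush on the host p2m. */
static void demo_ept_sync_domain(struct demo_p2m *p2m, unsigned long dirty_cpus)
{
    struct demo_p2m *target = p2m->nested ? p2m->host : p2m;

    target->invalidate |= dirty_cpus;
}

enum demo_flush { FLUSH_NONE, FLUSH_SINGLE_CONTEXT, FLUSH_ALL_CONTEXTS };

/* VM-entry check: with nested virt enabled, the stale translations may
 * belong to any nested p2m's EPTP, so flush all contexts. */
static enum demo_flush demo_vmenter(struct demo_p2m *host, unsigned int cpu,
                                    bool nested_virt)
{
    if ( !(host->invalidate & (1UL << cpu)) )
        return FLUSH_NONE;
    host->invalidate &= ~(1UL << cpu);
    return nested_virt ? FLUSH_ALL_CONTEXTS : FLUSH_SINGLE_CONTEXT;
}
```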
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 28 Jun 2017 14:05:35 +0000 (15:05 +0100)]
x86/vvmx: Fix WRMSR interception of VMX MSRs
FEATURE_CONTROL is already read with the LOCK bit set (so it is
unmodifiable), and all VMX MSRs are read-only. Also, fix the range bound,
extending it from MSR_IA32_VMX_TRUE_ENTRY_CTLS to MSR_IA32_VMX_VMFUNC,
rather than letting the intervening MSRs fall into the default case.
Raise #GP faults if the guest tries to modify any of them.
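A sketch of the write filter, using the architectural MSR indices (IA32_FEATURE_CONTROL 0x3a, IA32_VMX_BASIC 0x480 through IA32_VMX_VMFUNC 0x491). The result enum and function name are hypothetical, and the case range uses GNU C syntax, as Xen itself does.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_FEATURE_CONTROL  0x0000003a
#define MSR_IA32_VMX_BASIC        0x00000480
#define MSR_IA32_VMX_VMFUNC       0x00000491

enum demo_result { DEMO_OKAY, DEMO_EXCEPTION /* inject #GP(0) */ };

/* Hypothetical nested-VMX WRMSR filter: FEATURE_CONTROL is presented
 * already locked and every VMX capability MSR is read-only, so any
 * write faults. The range must end at VMFUNC, not TRUE_ENTRY_CTLS. */
static enum demo_result demo_nvmx_wrmsr(uint32_t msr, uint64_t val)
{
    (void)val;
    switch ( msr )
    {
    case MSR_IA32_FEATURE_CONTROL:
    case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
        return DEMO_EXCEPTION;
    default:
        return DEMO_OKAY;   /* not a VMX MSR: normal handling */
    }
}
```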
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Zhongze Liu [Thu, 22 Jun 2017 16:35:28 +0000 (00:35 +0800)]
libxc: add xc_domain_add_to_physmap_batch to wrap XENMEM_add_to_physmap_batch
This is a preparation for the proposal "allow setting up shared memory areas
between VMs from xl config file". See:
V2: https://lists.xen.org/archives/html/xen-devel/2017-06/msg02256.html
V1: https://lists.xen.org/archives/html/xen-devel/2017-05/msg01288.html
The plan is to use XENMEM_add_to_physmap_batch in xl to map foreign pages
from one DomU to another so that the pages can be shared. But currently
there is no wrapper for XENMEM_add_to_physmap_batch in libxc, so we add
one.
Signed-off-by: Zhongze Liu <blackskygg@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 23 Jun 2017 10:48:21 +0000 (10:48 +0000)]
xen/tmem: Switch to using bool
* Drop redundant initialisers
* Style corrections while changing client_over_quota()
* Drop all write-only bools from do_tmem_op()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Mon, 26 Jun 2017 14:20:35 +0000 (15:20 +0100)]
xen: move do_nmi_op and make it x86 only
Since ARM doesn't need {compat,do}_nmi_op, move the hypercall handlers
from common/kernel.c to pv/callback.c. Drop the stubs in ARM. Delete
the common and ARM nmi.h and adjust header inclusions in various
files.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Wei Liu [Mon, 5 Jun 2017 15:15:26 +0000 (16:15 +0100)]
x86/traps: factor out pv_trap_init
Factor out pv_trap_init and call it at the beginning of trap_init. We
then need to tune the code that generates stub handlers in entry.S. Take
the chance to tune init_irq_data so that 0x80 and 0x82 can be used for
regular interrupts in the !CONFIG_PV case.
While at it, fix some coding style issues in init_irq_data and replace
0x80 with LEGACY_SYSCALL_VECTOR in pv_trap_init.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 27 Jun 2017 17:45:03 +0000 (18:45 +0100)]
xen/pt: Avoid NULL dereference in hvm_pirq_eoi()
Coverity warns that pirq_dpci unconditionally dereferences a NULL pointer.
This warning appears to be triggered by pirq_dpci() which is a hidden ternary
expression. In reality, it appears that both callers pass a non-NULL pirq
parameter, so the code is ok in practice.
Rearrange the logic to be fail-safe, which should quiesce Coverity.
Clean up bool_t => bool and trailing whitespace for hvm_domain_use_pirq()
while auditing this area.
No (intended) functional change.
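A sketch of the fail-safe shape, assuming hypothetical mirrors of the pirq structures. As described above, the accessor hides a ternary, so a NULL pirq yields a NULL pointer that a static analyser will track; checking the result before use makes the function safe regardless of what callers pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the structures involved. */
struct demo_dpci { int masked; };
struct demo_pirq { struct demo_dpci dpci; };

/* Accessor in the same style: a hidden ternary that can yield NULL. */
#define demo_pirq_dpci(p) ((p) ? &(p)->dpci : NULL)

/* Fail-safe: bail out early instead of relying on every caller. */
static int demo_pirq_eoi(struct demo_pirq *pirq)
{
    struct demo_dpci *dpci = demo_pirq_dpci(pirq);

    if ( !dpci )
        return -1;       /* nothing to EOI */
    dpci->masked = 0;
    return 0;
}
```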
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 27 Jun 2017 17:29:55 +0000 (18:29 +0100)]
xen/pt: Unlock d->event_lock on error paths
Introduced by c/s fba00494268 "x86/pt: enable binding of GSIs to a PVH Dom0"
Spotted by Coverity.
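The usual shape of such a fix is a single exit label that drops the lock, so every error path releases it exactly once. A sketch under stated assumptions: a boolean stands in for spin_lock(&d->event_lock), and demo_bind_gsi() with its GSI limit of 48 is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock model standing in for d->event_lock. */
struct demo_domain { bool event_lock_held; };

static void demo_lock(struct demo_domain *d)
{
    assert(!d->event_lock_held);
    d->event_lock_held = true;
}

static void demo_unlock(struct demo_domain *d)
{
    assert(d->event_lock_held);
    d->event_lock_held = false;
}

/* Single-exit pattern: every failure jumps to 'out', so the lock is
 * dropped however the function fails. */
static int demo_bind_gsi(struct demo_domain *d, unsigned int gsi)
{
    int rc = 0;

    demo_lock(d);
    if ( gsi >= 48 )
    {
        rc = -EINVAL;    /* hypothetical validation failure */
        goto out;
    }
    /* ... perform the binding ... */
 out:
    demo_unlock(d);
    return rc;
}
```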
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>