Yi Sun [Tue, 1 Aug 2017 09:05:00 +0000 (11:05 +0200)]
tools: L2 CAT: support set cbm for L2 CAT.
This patch implements the xl/xc changes to support set CBM
for L2 CAT.
The new level option is introduced to original CAT setting
command in order to set CBM for specified level CAT.
- 'xl psr-cat-set' is updated to set cache capacity bitmasks(CBM)
for a domain according to input cache level.
root@:~$ xl psr-cat-set -l2 1 0x7f
root@:~$ xl psr-cat-show -l2 1
Socket ID : 0
Default CBM : 0xff
ID NAME CBM
1 ubuntu14 0x7f
Signed-off-by: He Chen <he.chen@linux.intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Tue, 1 Aug 2017 09:05:00 +0000 (11:05 +0200)]
tools: L2 CAT: support show cbm for L2 CAT.
This patch implements changes in xl/xc changes to support
showing CBM of L2 CAT.
The new level option is introduced to original CAT showing
command in order to show CBM for specified level CAT.
- 'xl psr-cat-show' is updated to show CBM of a domain
according to input cache level.
Examples:
root@:~$ xl psr-cat-show -l2 1
Socket ID : 0
Default CBM : 0xff
ID NAME CBM
1 ubuntu14 0x7f
Signed-off-by: He Chen <he.chen@linux.intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Tue, 1 Aug 2017 09:05:00 +0000 (11:05 +0200)]
tools: L2 CAT: support get HW info for L2 CAT.
This patch implements xl/xc changes to support get HW info
for L2 CAT.
'xl psr-hwinfo' is updated to show both L3 CAT and L2 CAT
info.
Example(on machine which only supports L2 CAT):
Cache Monitoring Technology (CMT):
Enabled : 0
Cache Allocation Technology (CAT): L2
Socket ID : 0
Maximum COS : 3
CBM length : 8
Default CBM : 0xff
Signed-off-by: He Chen <he.chen@linux.intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
x86: refactor psr: L3 CAT: set value: implement cos id picking flow.
Continue from previous patch:
'x86: refactor psr: L3 CAT: set value: implement cos finding flow.'
If fail to find a COS ID, we need pick a new COS ID for domain. Only COS ID
that ref[COS_ID] is 1 or 0 can be picked to input a new set feature values.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
x86: refactor psr: L3 CAT: set value: assemble features value array.
Only can one COS ID be used by one domain at one time. That means all enabled
features' COS registers at this COS ID are valid for this domain at that time.
When user updates a feature's value, we need make sure all other features'
values are not affected. So, we firstly need gather an array which contains
all features current values and replace the setting feature's value in array
to new value.
Then, we can try to find if there is a COS ID on which all features' COS
registers values are same as the array. If we can find, we just use this COS
ID. If fail to find, we need pick a new COS ID.
This patch implements value array assembling flow.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
x86: refactor psr: L3 CAT: set value: implement framework.
As set value flow is the most complicated one in psr, it will be
divided to some patches to make things clearer. This patch
implements the set value framework to show a whole picture firstly.
It also changes domctl interface to make it more general.
To make the set value flow be general and can support multiple features
at same time, it includes below steps:
1. Test and set dom_ids bit corresponding to the domain. If the old bit is 0
which means the domain's COS ID is invalid, restore COS ID to 0. If the
COS ID is valid, get the COS ID that current domain is using.
2. Gather a value array to store all features current value
into it and replace the current value of the feature which is
being set to the new input value.
3. Find if there is already a COS ID on which all features'
values are same as the array. Then, we can reuse this COS
ID.
4. If fail to find, we need pick an available COS ID. Only COS ID which ref
is 0 or 1 can be picked.
5. Write the feature's MSRs according to the COS ID.
6. Update ref according to COS ID.
7. Save the COS ID into current domain's psr_cos_ids[socket] so that we
can know which COS the domain is using on the socket.
So, some functions are abstracted and the callback functions will be
implemented in next patches.
Here is an example to understand the process. The CPU supports
two featuers, e.g. L3 CAT and L2 CAT. User wants to set L3 CAT
of Dom1 to 0x1ff.
1. At the initial time, the old_cos of Dom1 is 0. The COS registers values
are below at this time.
-------------------------------
| COS 0 | COS 1 | COS 2 | ... |
-------------------------------
L3 CAT | 0x7ff | 0x7ff | 0x7ff | ... |
-------------------------------
L2 CAT | 0xff | 0xff | 0xff | ... |
-------------------------------
2. Gather the value array and insert new value into it:
val[0]: 0x1ff
val[1]: 0xff
3. It cannot find a matching COS.
4. Pick COS 1 to store the value set.
5. Write the L3 CAT COS 1 registers. The COS registers values are
changed to below now.
-------------------------------
| COS 0 | COS 1 | COS 2 | ... |
-------------------------------
L3 CAT | 0x7ff | 0x1ff | ... | ... |
-------------------------------
L2 CAT | 0xff | 0xff | ... | ... |
-------------------------------
6. The ref[1] is increased to 1 because Dom1 is using it now.
7. Save 1 to Dom1's psr_cos_ids[socket].
Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos
of Dom2 is 0 too. Repeat above flow.
The val array assembled is:
val[0]: 0x1ff
val[1]: 0xff
So, it can find a matching COS, COS 1. Then, it can reuse COS 1
for Dom2.
The ref[1] is increased to 2 now because both Dom1 and Dom2 are
using this COS ID. Set 1 to Dom2's psr_cos_ids[socket].
There is one thing need to emphasize that we need restore domain's COS ID to
0 when socket is offline. Otherwise, a wrong COS ID will be used when the
socket is online again. That may cause user see the wrong CBM shown. But it
takes much time to iterate all domains to restore COS ID to 0. So, we define
a 'dom_ids[]' to represents all domains, one bit corresponds to one domain.
If the bit is 0 when entering 'psr_ctxt_switch_to', that means this is the
first time the domain is switched to this socket or domain's COS ID has not
been set since the socket is online. So, the COS ID set to ASSOC register on
this socket should be default value, 0. If not, that means the domain's COS
ID has been set when the socket was online. So, this COS ID is valid and we
can directly use it. We restore the domain's COS ID to 0 if the bit
corresponding to the domain is 0 but the domain's COS ID is not 0 when
'psr_get_val' and 'psr_set_val' is called. This can avoid CPU serialization
if restoring action is exectued in 'psr_ctxt_switch_to'.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
This patch implements the Domain init/free and schedule flows.
- When domain init, its psr resource should be allocated.
- When domain free, its psr resource should be freed too.
- When domain is scheduled, its COS ID on the socket should be
set into ASSOC register to make corresponding COS MSR value
work.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
x86: refactor psr: L3 CAT: implement main data structures, CPU init and free flows.
To construct an extendible framework, we need analyze PSR features
and abstract the common things and feature specific things. Then,
encapsulate them into different data structures.
By analyzing PSR features, we can get below map.
+------+------+------+
--------->| Dom0 | Dom1 | ... |
| +------+------+------+
| |
|Dom ID | cos_id of domain
| V
| +-----------------------------------------------------------------------------+
User --------->| PSR |
Socket ID | +--------------+---------------+---------------+ |
| | Socket0 Info | Socket 1 Info | ... | |
| +--------------+---------------+---------------+ |
| | cos_id=0 cos_id=1 ... |
| | +-----------------------+-----------------------+-----------+ |
| |->Ref : | ref 0 | ref 1 | ... | |
| | +-----------------------+-----------------------+-----------+ |
| | +-----------------------+-----------------------+-----------+ |
| |->L3 CAT: | cos 0 | cos 1 | ... | |
| | +-----------------------+-----------------------+-----------+ |
| | +-----------------------+-----------------------+-----------+ |
| |->L2 CAT: | cos 0 | cos 1 | ... | |
| | +-----------------------+-----------------------+-----------+ |
| | +-----------+-----------+-----------+-----------+-----------+ |
| |->CDP : | cos0 code | cos0 data | cos1 code | cos1 data | ... | |
| +-----------+-----------+-----------+-----------+-----------+ |
+-----------------------------------------------------------------------------+
So, we need define a socket info data structure, 'struct
psr_socket_info' to manage information per socket. It contains a
reference count array according to COS ID and a feature array to
manage all features enabled. Every entry of the reference count
array is used to record how many domains are using the COS registers
according to the COS ID. For example, L3 CAT and L2 CAT are enabled,
Dom1 uses COS_ID=1 registers of both features to save CBM values, like
below.
+-------+-------+-------+-----+
| COS 0 | COS 1 | COS 2 | ... |
+-------+-------+-------+-----+
L3 CAT | 0x7ff | 0x1ff | ... | ... |
+-------+-------+-------+-----+
L2 CAT | 0xff | 0xff | ... | ... |
+-------+-------+-------+-----+
If Dom2 has same CBM values, it can reuse these registers which COS_ID=1.
That means, both Dom1 and Dom2 use same COS registers(ID=1) to keep same
L3/L2 values. So, the value of ref[1] is 2 which means 2 domains are using
COS_ID 1.
To manage a feature, we need define a feature node data structure,
'struct feat_node', to manage feature's specific HW info, and an array of all
COS registers values of this feature.
To manage feature properties, we need define a feature property data structure,
'struct feat_props', to manage common properties (callback functions - all
feature's specific behaviors are encapsulated into these callback functions,
and generic values - e.g. the cos_max), the feature independent values.
CDP is a special feature which uses two entries of the array
for one COS ID. So, the number of CDP COS registers is the half of L3
CAT. E.g. L3 CAT has 16 COS registers, then CDP has 8 COS registers if
it is enabled. CDP uses the COS registers array as below.
For more details, please refer SDM and patches to implement 'get value' and
'set value'.
This patch also implements the CPU init and free flow including L3 CAT
initialization and some resources free. It includes below flows:
1. presmp init:
- parse command line parameter.
- allocate socket info for every socket.
- allocate feature resource.
- initialize socket info, get feature info and add feature into feature
array per cpuid result.
- free resources allocated if error happens.
- register cpu notifier to handle cpu events.
2. cpu notifier:
- handle cpu online events, if initialization work has been done before,
do nothing.
- handle cpu offline events, if it is the last cpu offline, free some
socket resources.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
x86: refactor psr: remove L3 CAT/CDP codes.
The current cache allocation codes in psr.c do not consider
future features addition and are not friendly to extend.
To make psr.c be more flexible to add new features and fulfill
the program principle, open for extension but closed for
modification, we have to refactor the psr.c:
1. Analyze cache allocation features and abstract general data
structures.
2. Analyze the init and all other functions flow, abstract all
steps that different features may have different implementations.
Make these steps be callback functions and register feature
specific fuctions. Then, the main processes will not be changed
when introducing a new feature.
Because the quantity of refactor codes is big and the logics are
changed a lot, it will cause reviewers confused if just change
old codes. Reviewers have to understand both old codes and new
implementations. After review iterations from V1 to V3, Jan has
proposed to remove all old cache allocation codes firstly, then
implement new codes step by step. This will help to make codes
be more easily reviewable.
There is no construction without destruction. So, this patch
removes all current L3 CAT/CDP codes in psr.c. The following
patches will introduce the new mechanism.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Yi Sun [Tue, 1 Aug 2017 09:04:00 +0000 (11:04 +0200)]
docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document
This patch creates CAT and CDP feature document in doc/features/. It describes
key points to implement L3 CAT/CDP and L2 CAT which is described in details in
Intel SDM "INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES".
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Move pre-existing PAGE_(SHIFT|SIZE|MASK|ALIGN)_(4K|64K) and
introduce corresponding defines for 16K page granularity to/in a
common place in xen/page-defs.h to allow later commits to use the
consolidated defines.
Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de> Acked-by: Jan Beulich <jbeulich@suse.com>
Praveen Kumar [Thu, 3 Aug 2017 10:24:25 +0000 (12:24 +0200)]
rbtree: changes to align the code with Linux tree
The patch aligns the code of rbtree related files with Linux tree.
This will minimize the conflicts during any future porting from Linux tree.
Linux commit till f4b477c47332367d35686bd2b808c2156b96d7c7 for rbtree.h
This includes addition of commented inline functions in rbtree.h, to have
complete replica from Linux tree.
Olaf Hering [Wed, 26 Jul 2017 14:39:50 +0000 (16:39 +0200)]
docs: add pod variant of xl-numa-placement
Convert source for xl-numa-placement.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 26 Jul 2017 14:39:49 +0000 (16:39 +0200)]
docs: add pod variant of xl-network-configuration.5
Convert source for xl-network-configuration.5 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 26 Jul 2017 14:39:48 +0000 (16:39 +0200)]
docs: add pod variant of xen-pv-channel.7
Convert source for xen-pv-channel.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com>
Running "make uninstall" does not remove all installed files, a
situation which might cause link related issues if xen is re-installed
in a different location.
In order to make uninstall correctly remove the files it is best
the process should be done recursively by mirroring each "install"
target with an "uninstall" who removes the installed files.
An exception to this rule is uninstalling the files produced by
"qemu-xen-dir-remote" and "qemu-xen-traditional-dir", which are external
to the project. These projects do not implement an "uninstall" target so
the files have to be removed manually.
Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
If xc_gntshr_open failed the only thing to cleanup is free allocated
memory. So instead of calling libxenvchan_close (which assume
valid calculated buffers being mmaped already) free memory and return.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 30 Jun 2017 12:24:19 +0000 (12:24 +0000)]
x86/svm: Alias the VMCB segment registers as an array
This allows svm_{get,set}_segment_register() to access the user segments by
array index, as the x86_seg_* constants match the hardware encoding.
While making this alteration, add some newlines for clarity, switch an int for
a bool, and make the functions fail safe in a release build, rather than
crashing Xen.
Bloat-o-meter reports some modest improvements:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-130 (-130)
function old new delta
svm_set_segment_register 662 653 -9
svm_get_segment_register 409 288 -121
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Wed, 19 Jul 2017 09:28:03 +0000 (10:28 +0100)]
x86/vvmx: Fix handing of the MSR_BITMAP field with VMCS shadowing
Currently, the following sequence of actions:
* VMPTRLD (creates a mapping, likely pointing at gfn 0 for an empty vmcs)
* VMWRITE CPU_BASED_VM_EXEC_CONTROL (completed by hardware)
* VMWRITE MSR_BITMAP (completed by hardware)
* VMLAUNCH
results in an L2 guest running with ACTIVATE_MSR_BITMAP set, but Xen using a
stale mapping (likely gfn 0) when reading the interception bitmap. The
MSR_BITMAP field needs unconditionally intercepting even with VMCS shadowing,
so Xen's mapping of the bitmap can be updated.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 18 Jul 2017 14:55:03 +0000 (14:55 +0000)]
x86/vvmx: Switch nested MSR intercept handling to use struct vmx_msr_bitmap
Rename vmx_check_msr_bitmap() to vmx_msr_is_intercepted() in order to more
clearly identify what the boolean return value means. Change the int
access_type to bool is_write.
The NULL pointer check is moved out, as it doesn't pertain to whether the MSR
is intercepted or not. The check is moved into nvmx_n2_vmexit_handler(),
where it becomes a hard error in the case that ACTIVATE_MSR_BITMAP is set.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 18 Jul 2017 14:44:05 +0000 (14:44 +0000)]
x86/vmx: Introduce and use struct vmx_msr_bitmap
This avoids opencoding the bitmap bases in accessor functions. Introduce a
build_assertions() function to check the structure layout against the manual
definiton. In addition, drop some stale comments and ASSERT() that callers
pass an in-range MSR.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 18 Jul 2017 14:33:13 +0000 (14:33 +0000)]
x86/vpmu: Use vmx_{clear,set}_msr_intercept() rather than opencoding them
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 18 Jul 2017 14:14:32 +0000 (14:14 +0000)]
x86/vmx: Improvements to vmx_{dis,en}able_intercept_for_msr()
* Shorten the names to vmx_{clear,set}_msr_intercept()
* Use an enumeration for MSR_TYPE rather than a plain integer
* Introduce VMX_MSR_RW, as most callers alter both the read and write
intercept at the same time.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 25 Jul 2017 18:48:43 +0000 (19:48 +0100)]
x86/hvm: Fix boundary check in hvmemul_insn_fetch()
c/s 0943a03037 added some extra protection for overflowing the emulation
instruction cache, but Coverity points out that boundary condition is off by
one when memcpy()'ing out of the buffer.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Wei Liu [Wed, 26 Jul 2017 07:44:56 +0000 (08:44 +0100)]
libxc: bail immediately when PV superpage is discovered
The original code was added with the hope that PV superpage migration
might work. But it was never proven that the code actually worked.
Now that PV superpage is gone, simplify the code by returning error
immediately.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Wed, 26 Jul 2017 07:44:55 +0000 (08:44 +0100)]
tools: nuke superpage parameters in code
Also fix manpage because there is no superpages options in xl.cfg.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Wed, 26 Jul 2017 07:44:54 +0000 (08:44 +0100)]
x86: nuke PV superpage option and code
Delete the user visible option and code for PV superpage support. The
mm code is modified as if the option is set to false (the default
value).
Return the address space occupied by spage_info back to the reserved
address space.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
GRUB_MODULES="boot chain configfile echo efinet eval ext2 fat font gettext gfxterm gzio help linux loadenv lsefi normal part_gpt par
t_msdos read regexp search search_fs_file search_fs_uuid search_label terminal terminfo test tftp time xen_boot"
The null scheduler does not really use hard-affinity for
scheduling, it uses it for 'placement', i.e., for deciding
to what pCPU to statically assign a vCPU.
Let's use soft-affinity in the same way, of course with the
difference that, if there's no free pCPU within the vCPU's
soft-affinity, we go checking the hard-affinity, instead of
putting the vCPU in the waitqueue.
This does has no impact on the scheduling overhead, because
soft-affinity is only considered in cold-path (like when a
vCPU joins the scheduler for the first time, or is manually
moved between pCPUs by the user).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Mon, 5 Jun 2017 16:19:27 +0000 (17:19 +0100)]
x86/emul: Drop segment_attributes_t
The amount of namespace resolution is unnecessarily large, as all code deals
in terms of struct segment_register. This removes the attr.fields part of all
references, and alters attr.bytes to just attr.
Three areas of code using initialisers for segment_register are tweaked to
compile with older versions of GCC. arch_set_info_hvm_guest() has its SEG()
macros altered to use plain comma-based initialisation, while
{rm,vm86}_{cs,ds}_attr are simplified to plain numbers which matches their
description in the manuals.
No functional change. (For some reason, the old {rm,vm86}_{cs,ds}_attr causes
GCC to create variable in .rodata, whereas the new code uses immediate
operands. As a result, vmx_{get,set}_segment_register() are slightly
shorter.)
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 30 Jun 2017 12:12:00 +0000 (12:12 +0000)]
x86/svm: Drop svm_segment_register_t
Most SVM code already uses struct segment_register. Drop the typedef and
adjust the definitions in struct vmcb_struct, and svm_dump_sel(). Introduce
some build-time assertions that struct segment_register from the common
emulation code is usable in struct vmcb_struct.
While making these adjustments, fix some comments to not mix decimal and
hexidecimal offsets, and drop all trailing whitespace in vmcb.h
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 24 Jul 2017 16:28:25 +0000 (17:28 +0100)]
x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()
The purpose of guest_walk_tables() is to match the behaviour of real hardware.
A PV guest can have 2M superpages in its pagetables, via the M2P (and for dom0
via the initial P2M), even if the guest isn't permitted to create arbitrary 2M
superpage mappings.
guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV
guest policy enforcement, rather than its intended purpose of meaning "would
hardware tolerate finding an L2 superpage with these control settings?"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Revert "VT-d: fix VF of RC integrated PF matched to wrong VT-d unit"
This reverts commit 89df98b77d28136c4d7aade13a1c8bc154d2919f, which
incurs Xen crash when loading VF driver. The reason seems that
pci_get_pdev() can't be called when interrupt is disabled. I don't have a
quick solution to fix this; therefore revert this patch to let common cases
work well. As to the corner case I intended to fix, I will propose another
solution later.
Andrew Cooper [Tue, 25 Jul 2017 10:40:40 +0000 (11:40 +0100)]
xen: Drop repeated semicolons
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
David Woodhouse [Tue, 25 Jul 2017 09:21:37 +0000 (10:21 +0100)]
xen/link: Move .data.rel.ro sections into .rodata for final link
This includes stuff like the hypercall tables which we really kind of want
to be read-only. And they were going into .data.read-mostly.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Fri, 23 Jun 2017 10:55:05 +0000 (12:55 +0200)]
xen: credit: rearrange members of control structures
With the aim of improving memory size and layout, and
at the same time trying to put related fields reside
in the same cacheline.
Here's a summary of the output of `pahole`, with and
without this patch, for the affected data structures.
csched_pcpu:
* Before:
size: 88, cachelines: 2, members: 6
sum members: 80, holes: 1, sum holes: 4
padding: 4
paddings: 1, sum paddings: 5
last cacheline: 24 bytes
* After:
size: 80, cachelines: 2, members: 6
paddings: 1, sum paddings: 5
last cacheline: 16 bytes
csched_vcpu:
* Before:
size: 72, cachelines: 2, members: 9
padding: 2
last cacheline: 8 bytes
* After:
same numbers, but move some fields to put
related fields in same cache line.
csched_private:
* Before:
size: 152, cachelines: 3, members: 17
sum members: 140, holes: 2, sum holes: 8
padding: 4
paddings: 1, sum paddings: 5
last cacheline: 24 bytes
* After:
same numbers, but move some fields to put
related fields in same cache line.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 23 Jun 2017 10:54:59 +0000 (12:54 +0200)]
xen: credit2: make the cpu to runqueue map per-cpu
Instead of keeping an NR_CPUS big array of int-s,
directly inside csched2_private, use a per-cpu
variable.
That's especially beneficial (in terms of saved
memory) when there are more instance of Credit2 (in
different cpupools), and also helps fitting
csched2_private itself into CPU caches.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Fri, 23 Jun 2017 10:54:52 +0000 (12:54 +0200)]
xen: credit2: allocate runqueue data structure dynamically
Instead of keeping an NR_CPUS big array of csched2_runqueue_data
elements, directly inside the csched2_private structure, allocate
it dynamically.
This has two positive effects:
- reduces the size of csched2_private sensibly, which is
especially good in case there are more instance of Credit2
(in different cpupools), and is also good from the point
of view of fitting the struct into CPU caches;
- we can use nr_cpu_ids as array size, which may be sensibly
smaller than NR_CPUS
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Mon, 17 Jul 2017 12:38:03 +0000 (13:38 +0100)]
tools: Drop xc_cpuid_check() and bindings
There are no current users which I can locate. One piece of xend which didn't
move forwards into xl/libxl is this:
# Configure host CPUID consistency checks, which must be satisfied for this
# VM to be allowed to run on this host's processor type:
#cpuid_check=[ '1:ecx=xxxxxxxxxxxxxxxxxxxxxxxxxx1xxxxx' ]
# - Host must have VMX feature flag set
The implementation of xc_cpuid_check() is conceptually broken. Dom0's view of
CPUID is not the approprite view to check, and will be wrong in the presence
of CPUID masking/faulting, and for HVM-based toolstack domains.
If it turns out that the functionality is required, it should be implemented
in terms of XEN_SYSCTL_get_cpuid_policy to use the proper CPUID view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Felix Schmoll [Thu, 20 Jul 2017 07:47:48 +0000 (09:47 +0200)]
xenconsole: Add option to xenconsole to always forward console input
Currently the default behaviour of the xenconsole client is to
ignore any input to stdin, unless stdin and stdout are both
ttys. The new option allows to manually overwrite this, causing the
client to forward input regardless.
Signed-off-by: Felix Schmoll <eggi.innovations@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
The patch introduces a new command line option 'cpu' that when used will create
runqueue per logical pCPU. This may be useful for small systems, and also for
development, performance evalution and comparison.
Owen Smith [Mon, 3 Jul 2017 12:57:53 +0000 (12:57 +0000)]
kbdif: Define "feature-raw-pointer" and "request-raw-pointer"
Backends set "feature-raw-pointer" if its capable of reporting
absolute positions without scaling the coordinates to screen
size. This should be set during the backend init.
Frontends set "request-raw-pointer" to request that backends
do not rescale absolute coordinates to screen size, and the
coordinates remain in the range [0, 0x7fff]. This request is
only applicable if "request-abs-pointer" is also set. Frontends
should set this value before setting Connected.
Signed-off-by: Owen Smith <owen.smith@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Thu, 22 Jun 2017 10:30:00 +0000 (11:30 +0100)]
x86/hvm: Drop more remains of the PVHv1 implementation
These functions don't need is_hvm_{vcpu,domain}() predicates.
hvmop_set_evtchn_upcall_vector() does need the predicate to prevent a PV
caller accessing the hvm union, but swap the copy_from_guest() and
is_hvm_domain() predicate to avoid reading the hypercall parameter if we not
going to use it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 9 May 2017 14:31:54 +0000 (15:31 +0100)]
x86/hvm: Fixes to hvmemul_insn_fetch()
Force insn_off to a single byte, as offset can wrap around or truncate with
respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.
Furthermore, don't use an ASSERT() for bounds checking the write into
hvmemul_ctxt->insn_buf[].
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 18 Jul 2017 14:21:46 +0000 (15:21 +0100)]
x86/evtchn: Restrict the ops usable in do_event_channel_op_compat()
This hypercall is unused by guests these days, but there was no prevention of
usable subops. The following ops have been restricted, as there is no
suitable structure in the evntchn_op union.
This commit substitutes the direct access of the host's p2m
(&d->arch.p2m) for the macro "p2m_get_hostp2m". This macro simplifies
readability and also the differentiation between the host's p2m and
alternative p2m's, i.e., as part of the altp2m subsystem that will be
submitted in the future.
If option '-l' or '--lmce' is specified and the host supports LMCE,
xen-mceinj will inject LMCE to CPU specified by '-c' (or CPU0 if '-c'
is not present).
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxc: add support of injecting MC# to specified CPUs
Though XEN_MC_inject_v2 allows injecting MC# to specified CPUs, the
current xc_mca_op() does not use this feature and not provide an
interface to callers. This commit add a new xc_mca_op_inject_v2() that
receives a cpumap providing the set of target CPUs.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
x86/vmce, tools/libxl: expose LMCE capability in guest MSR_IA32_MCG_CAP
If LMCE is supported by host and ' mca_caps = [ "lmce" ] ' is present
in xl config, the LMCE capability will be exposed in guest MSR_IA32_MCG_CAP.
By default, LMCE is not exposed to guest so as to keep the backwards migration
compatibility.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> for hypervisor side Acked-by: Wei Liu <wei.liu2@citrix.com>
x86/domctl: generalize the restore of vMCE parameters
vMCE parameters in struct xen_domctl_ext_vcpucontext were extended in
the past, and is likely to be extended in the future. When migrating a
PV domain from old Xen, XEN_DOMCTL_set_ext_vcpucontext should handle
the differences.
Instead of adding ad-hoc handling code at each extension, we introduce
an array to record sizes of the current and all past versions of vMCE
parameters, and search for the largest one that does not expire the
size of passed-in parameters to determine vMCE parameters that will be
restored. If vMCE parameters are extended in the future, we only need
to adapt the array to reflect the extension.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
VT-d: fix VF of RC integrated PF matched to wrong VT-d unit
The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
we would wrongly use 00:00.0 to search VT-d unit.
If a PF is an extended function, the BDF of a traditional function within the
same device should be used to search VT-d unit. Otherwise, the real BDF of PF
should be used. According PCI-e spec, an extended function is a function
within an ARI device and Function Number is greater than 7. The original code
tried to tell apart them through checking PCI_SLOT(), missing counterpart of
pci_ari_enabled() (this function exists in linux kernel) compared to linux
kernel. Without checking whether ARI is enabled, it incurs a RC integrated PF
with PCI_SLOT() >0 is wrongly classified to an extended function. Note that a
RC integrated function isn't within an ARI device and thus cannot be extended
function and in this case the real BDF should be used.
Considering 'is_extfn' field of struct pci_dev has been passed down from
Domain0 to indicate whether the function is an extended function, this patch
just looks up the 'is_extfn' field of PF's struct pci_dev and set 'devfn' to 0
when 'is_extfn' is true.
Reported-by: Crawford, Eric R <Eric.R.Crawford@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>