x86/mapcache: initialise the mapcache even for the idle domain
In situations where PMAP cannot be used or the mapcache of a domain is
simply not ready, we need to have a mapcache in the idle domain to map
pages when there is no direct map.
Wei Liu [Fri, 8 Feb 2019 17:19:26 +0000 (17:19 +0000)]
x86/mm: drop _new suffix for page table APIs
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Hongyan Xia <hongyax@amazon.com>
---
Changed since v1:
- Fix rebase conflicts against new master and other changes since v1.
Changed since v2:
- Also drop _new for the fix of l2t leak.
Wei Liu [Tue, 29 Jan 2019 14:40:26 +0000 (14:40 +0000)]
x86_64/mm: switch to new APIs in paging_init
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Hongyan Xia <hongyax@amazon.com>
---
Changed since v1:
- Use a global mapping for compat_idle_pg_table_l2, otherwise
l2_ro_mpt will unmap it.
Wei Liu [Tue, 29 Jan 2019 12:54:48 +0000 (12:54 +0000)]
x86/mm: change pl*e to l*t in virt_to_xen_l*e
We will need to have a variable named pl*e when we rewrite
virt_to_xen_l*e. Change pl*e to l*t to reflect better its purpose.
This will make reviewing later patch easier.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Hongyan Xia <hongyax@amazon.com>
Wei Liu [Mon, 28 Jan 2019 18:10:10 +0000 (18:10 +0000)]
x86/mm: introduce l{1,2}t local variables to modify_xen_mappings
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the life time of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the life time of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track life time of pointers to xen page tables more easily.
Wei Liu [Mon, 28 Jan 2019 17:54:24 +0000 (17:54 +0000)]
x86/mm: introduce l{1,2}t local variables to map_pages_to_xen
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the life time of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the life time of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track life time of pointers to xen page tables more easily.
Wei Liu [Wed, 23 Jan 2019 15:33:07 +0000 (15:33 +0000)]
x86: introduce a new set of APIs to manage Xen page tables
We are going to switch to using domheap page for page tables.
A new set of APIs is introduced to allocate, map, unmap and free pages
for page tables.
The allocation and deallocation work on mfn_t but not page_info,
because they are required to work even before frame table is set up.
Implement the old functions with the new ones. We will rewrite, site
by site, other mm functions that manipulate page tables to use the new
APIs.
Note these new APIs still use xenheap page underneath and no actual
map and unmap is done so that we don't break xen half way. They will
be switched to use domheap and dynamic mappings when usage of old APIs
is eliminated.
Olaf Hering [Wed, 2 Oct 2019 17:05:36 +0000 (19:05 +0200)]
stubdom/vtpm: include stdio.h for declaration of printf
The function read_vtpmblk uses printf(3), but stdio.h is not included
in this file. This results in a warning from gcc-7:
vtpmblk.c: In function 'read_vtpmblk':
vtpmblk.c:322:7: warning: implicit declaration of function 'printf' [-Wimplicit-function-declaration]
printf("Expected: ");
vtpmblk.c:322:7: warning: incompatible implicit declaration of built-in function 'printf'
vtpmblk.c:322:7: note: include '<stdio.h>' or provide a declaration of 'printf'
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Fri, 19 Jul 2019 07:57:50 +0000 (08:57 +0100)]
docs/sphinx: Indent cleanup
Sphinx, its linters, and RST modes in common editors, expect 3 spaces of
indentation. Some bits already conform to this expectation. Update the
rest to match.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Lars Kurth <lars.kurth@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
make_cpus_node() is using a static buffer to generate the FDT node name.
While mpdir_aff is a 64-bit integer, we only ever use the bits [23:0] as
only AFF{0, 1, 2} are supported for now.
To avoid any potential issues in the future, check that mpdir_aff has
only bits [23:0] set.
Take the opportunity to reduce the size of the buffer. Indeed, only 8
characters are needed to print a 32-bit hexadecimal number. So
sizeof("cpu@") + 8 + 1 (for '\0') = 13 characters is sufficient.
Fixes: c81a791d34 (xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity) Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Paul Durrant [Thu, 10 Oct 2019 15:45:15 +0000 (17:45 +0200)]
x86/mm: don't needlessly veto migration
Now that xl.cfg has an option to explicitly enable IOMMU mappings for a
domain, migration may be needlessly vetoed due to the check of
is_iommu_enabled() in paging_log_dirty_enable().
There is actually no need to prevent logdirty from being enabled unless
devices are assigned to a domain.
NOTE: While in the neighbourhood, the bool_t parameter type in
paging_log_dirty_enable() is replaced with a bool and the format
of the comment in assign_device() is fixed.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Igor Druzhinin [Thu, 10 Oct 2019 14:50:50 +0000 (16:50 +0200)]
x86/efi: properly handle 0 in pixel reserved bitmask
In some graphics modes firmware is allowed to return 0 in pixel reserved
bitmask which doesn't go against UEFI Spec 2.8 (12.9 Graphics Output Protocol).
Without this change non-TrueColor modes won't work which will cause
GOP init to fail - observed while trying to boot EFI Xen with Cirrus VGA.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Wed, 9 Oct 2019 18:21:14 +0000 (19:21 +0100)]
x86/hvm: Fix the use of "hap=0" following c/s c0902a9a143a
c/s c0902a9a143a refactored hvm_enable() a little, but dropped the logic which
cleared hap_supported in the case that the user had asked for it off.
This results in Xen booting up, claiming:
(XEN) HVM: Hardware Assisted Paging (HAP) detected but disabled
but with HAP advertised via sysctl, and XEN_DOMCTL_CDF_hap being accepted in
domain_create().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Release-acked-by: Juergen Gross <jgross@suse.com>
Roger Pau Monné [Thu, 10 Oct 2019 08:59:27 +0000 (10:59 +0200)]
pci: clear {host/guest}_maskall field on assign
The current implementation of host_maskall makes it sticky across
assign and deassign calls, which means that once a guest forces Xen to
set host_maskall the maskall bit is not going to be cleared until a
call to PHYSDEVOP_prepare_msix is performed. Such call however
shouldn't be part of the normal flow when doing PCI passthrough, and
hence the flag needs to be cleared when assigning in order to prevent
host_maskall being carried over from previous assignations.
Note that the entry maskbit is reset when the msix capability is
initialized, and the guest_maskall field is also cleared so that the
hardware value matches Xen's internal state (hardware maskall =
host_maskall | guest_maskall).
Also note that doing the reset of host_maskall there would allow the
guest to reset such field by enabling and disabling MSIX, which is not
intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Chao Gao <chao.gao@intel.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Igor Druzhinin [Thu, 10 Oct 2019 08:58:45 +0000 (10:58 +0200)]
efi/boot: make sure graphics mode is set while booting through MB2
If a bootloader is using native driver instead of EFI GOP it might
reset graphics mode to be different from what has been originally set
by firmware. While booting through MB2 Xen either need to parse video
setting passed by MB2 and use them instead of what GOP reports or
reset the mode to synchronise it with firmware - prefer the latter.
Observed while booting Xen using MB2 with EFI GRUB2 compiled with
all possible video drivers where native drivers take priority over firmware.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 10 Oct 2019 07:51:46 +0000 (09:51 +0200)]
AMD/IOMMU: pre-fill all DTEs right after table allocation
Make sure we don't leave any DTEs unexpected requests through which
would be passed through untranslated. Set V and IV right away (with
all other fields left as zero), relying on the V and/or IV bits
getting cleared only by amd_iommu_set_root_page_table() and
amd_iommu_set_intremap_table() under special pass-through circumstances.
Switch back to initial settings in amd_iommu_disable_domain_device().
Take the liberty and also make the latter function static, constifying
its first parameter at the same time, at this occasion.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 10 Oct 2019 07:51:12 +0000 (09:51 +0200)]
AMD/IOMMU: allow callers to request allocate_buffer() to skip its memset()
The command ring buffer doesn't need clearing up front in any event.
Subsequently we'll also want to avoid clearing the device tables.
While playing with functions signatures replace undue use of fixed width
types at the same time, and extend this to deallocate_buffer() as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 10 Oct 2019 07:50:00 +0000 (09:50 +0200)]
AMD/IOMMU: allocate one device table per PCI segment
Having a single device table for all segments can't possibly be right.
(Even worse, the symbol wasn't static despite being used in just one
source file.) Attach the device tables to their respective IVRS mapping
ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
When reserved-memory regions are present in the host device tree, dom0
is started with multiple memory nodes. Each memory node should have a
unique name, but today they are all called "memory" leading to Linux
printing the following warning at boot:
OF: Duplicate name in base, renamed to "memory#1"
This patch fixes the problem by appending a "@<unit-address>" to the
name, as per the Device Tree specification, where <unit-address> matches
the base of address of the first region.
Fixes: 248faa637d2 (xen/arm: add reserved-memory regions to the dom0 memory node) Reported-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Julien Grall <julien.grall@arm.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Lars Kurth [Thu, 3 Oct 2019 15:47:05 +0000 (08:47 -0700)]
docs: update all URLs in man pages
Specifically
* xen.org to xenproject.org
* http to https
* Replaced pages where page has moved
(including on xen pages as well as external pages)
* Removed some URLs (e.g. downloads for Linux PV drivers)
Tested-by: Lars Kurth <lars.kurth@citrix.com> Signed-off-by: Lars Kurth <lars.kurth@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org> Release-acked-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Mon, 7 Oct 2019 06:35:19 +0000 (08:35 +0200)]
xen/sched: let credit scheduler control its timer all alone
The credit scheduler is the only scheduler with tick_suspend and
tick_resume callbacks. Today those callbacks are invoked without being
guarded by the scheduler lock which is critical when at the same the
cpu those callbacks are active is being moved to or from a cpupool.
Crashes like the following are possible due to that race:
The callbacks are used when the cpu is going to or coming from idle in
order to allow higher C-states.
The credit scheduler knows when it is going to schedule an idle
scheduling unit or another one after idle, so it can easily stop or
resume the timer itself removing the need to do so via the callback.
As this timer handling is done in the main scheduling function the
scheduler lock is still held, so the race with cpupool operations can
no longer occur. Note that calling the callbacks from schedule_cpu_rm()
and schedule_cpu_add() is no longer needed, as the transitions to and
from idle in the cpupool with credit active will automatically occur
and do the right thing.
With the last user of the callbacks gone those can be removed.
Julien Grall [Fri, 4 Oct 2019 16:32:49 +0000 (17:32 +0100)]
xen/xsm: flask: Prevent NULL deference in flask_assign_{, dt}device()
flask_assign_{, dt}device() may be used to check whether you can test if
a device is assigned. In this case, the domain will be NULL.
However, flask_iommu_resource_use_perm() will be called and may end up
to deference a NULL pointer. This can be prevented by moving the call
after we check the validity for the domain pointer.
Coverity-ID: 1486741 Fixes: 71e617a6b8 ('use is_iommu_enabled() where appropriate...') Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Tue, 1 Oct 2019 16:27:49 +0000 (17:27 +0100)]
x86/Kconfig: Invert the defaults for CONFIG_{PVH_GUEST,PV_SHIM}
This is a minor UI change, but users which have elected to enable
XEN_GUEST (which still defaults to no) will definitely need one of these
options, and will typically want both.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
There are legitimate circumstance where array hardening is not wanted or
needed. Allow it to be turned off.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Thu, 3 Oct 2019 14:04:03 +0000 (15:04 +0100)]
x86/spec-ctrl: Annotate remaining model names
The names in retpoline_safe() are copied from should_use_eager_fpu(). The
names in mds_calculations() come partly from Linux's intel-family.h, and
partly from conversations with Intel.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
We don't have a clear way to know how many virtual SPIs we need for the
dom0-less domains. Introduce a new option under xen,domain to specify
the number of SPIs to allocate for a domain.
The property is optional. When absent, we'll use the physical number of
GIC lines for dom0-less domains, or GUEST_VPL011_SPI+1 if vpl011 is
requested, whichever is greater.
Remove the old setting of nr_spis based on the presence of the vpl011.
The implication of this change is that without nr_spis dom0less domains
get the same amount of SPI allocated as dom0, regardless of how many
physical devices they have assigned, and regardless of whether they have
a virtual pl011 (which also needs an emulated SPI). This is done because
the SPIs allocation needs to be done before parsing any passthrough
information, so we have to account for any potential physical SPI
assigned to the domain.
When nr_spis is present, the domain gets exactly nr_spis allocated SPIs.
If the number is too low, it might not be enough for the devices
assigned it to it. If the number is less than GUEST_VPL011_SPI, the
virtual pl011 won't work.
Detect "multiboot,device-tree" compatible nodes. Add them to the bootmod
array as BOOTMOD_GUEST_DTB. In kernel_probe, find the right
BOOTMOD_GUEST_DTB and store a pointer to it in dtb_bootmodule.
Scan the user provided dtb fragment at boot. For each device node, map
memory to guests, and route interrupts and setup the iommu.
The memory region to remap is specified by the "xen,reg" property.
The iommu is setup by passing the node of the device to assign on the
host device tree. The path is specified in the device tree fragment as
the "xen,path" string property.
The interrupts are remapped based on the information from the
corresponding node on the host device tree. Call
handle_device_interrupts to remap interrupts. Interrupts related device
tree properties are copied from the device tree fragment, same as all
the other properties.
Require both xen,reg and xen,path to be present, unless
xen,force-assign-without-iommu is also set. In that case, tolerate a
missing xen,path, also tolerate iommu setup failure for the passthrough
device.
Also set add the new flag XEN_DOMCTL_CDF_iommu so that dom0less domU
can use the IOMMU if a partial dtb is specified.
Read the dtb fragment corresponding to a passthrough device from memory
at the location referred to by the "multiboot,device-tree" compatible
node.
Add a new field named dtb_bootmodule to struct kernel_info to keep track
of the dtb fragment location.
Copy the fragment to the guest dtb (only /aliases and /passthrough).
Set kinfo->phandle_gic based on the phandle of the special "/gic"
node in the device tree fragment. "/gic" is a dummy node in the dtb
fragment that represents the gic interrupt controller. Other properties
in the dtb fragment might refer to it (for instance interrupt-parent of
a device node). We reuse the phandle of "/gic" from the dtb fragment as
the phandle of the full GIC node that will be created for the guest
device tree. That way, when we copy properties from the device tree
fragment to the domU device tree the links remain unbroken.
scan_passthrough_prop is introduced here and not used in this patch but
it will be used by later patches.
Some of the code below is taken from tools/libxl/libxl_arm.c. Note that
it is OK to take LGPL 2.1 code and including it into a GPLv2 code base.
The result is GPLv2 code.
Changes in v5:
- code style
- in-code comment
- remove depth parameter from scan_pfdt_node
- for instead of loop in domain_handle_dtb_bootmodule
- move "gic" check to domain_handle_dtb_bootmodule
- add check_partial_fdt
- use DT_ROOT_NODE_ADDR/SIZE_CELLS_DEFAULT
- add scan_passthrough_prop parameter, set it to false for "/aliases"
Changes in v4:
- use recursion in the implementation
- rename handle_properties to handle_prop_pfdt
- rename scan_pt_node to scan_pfdt_node
- pass kinfo to handle_properties
- use uint32_t instead of u32
- rename r to res
- add "passthrough" and "aliases" check
- add a name == NULL check
- code style
- move DTB fragment scanning earlier, before DomU GIC node creation
- set guest_phandle_gic based on "/gic"
- in-code comment
Changes in v3:
- switch to using device_tree_for_each_node for the copy
Changes in v2:
- add a note about the code coming from libxl in the commit message
- copy /aliases
- code style
Instead of always hard-coding the GIC phandle (GUEST_PHANDLE_GIC), store
it in a variable under kinfo. This way it can be dynamically chosen per
domain. Remove the fdt pointer argument to the make_*_domU_node
functions and oass a struct kernel_info * instead. The fdt pointer can
be accessed from kinfo->fdt. Remove the struct domain *d parameter to
the make_*_domU_node functions because it becomes unused.
Initialize phandle_gic to GUEST_PHANDLE_GIC at the beginning of
prepare_dtb_domU for DomUs. Later patches will change the value of
phandle_gic depending on user provided information.
For Dom0, initialize phandle_gic to dt_interrupt_controller->phandle
(current value) at the beginning of prepare_dtb.
xen/arm: boot with device trees with "mmu-masters" and "iommus"
Some Device Trees may expose both legacy SMMU and generic IOMMU bindings
together. However, the SMMU driver in Xen is only supporting the legacy
SMMU bindings, leading to fatal initialization errors at boot time.
This patch fixes the booting problem by adding a check to
iommu_add_dt_device: if the Xen driver doesn't support the new generic
bindings, and the device is behind an IOMMU, do not return error. The
following iommu_assign_dt_device should succeed.
This check will become superfluous, hence removable, once the Xen SMMU
driver gets support for the generic IOMMU bindings.
libxl: don't try to manipulate json config for stubdomain
Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.
This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
libxl: attach PCI device to qemu only after setting pciback/pcifront
When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.
Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>