Commit f855dd9625 "sched: add minimalistic idle scheduler for free cpus"
introduce the use of ZERO_BLOCK_PTR in the scheduler code. However, the
define does not exist outside of xmalloc_tsf.c for non-x86 architecture.
This will result to a compilation error on Arm:
schedule.c: In function ‘sched_idle_alloc_vdata’:
schedule.c:100:12: error: ‘ZERO_BLOCK_PTR’ undeclared (first use in this function)
return ZERO_BLOCK_PTR;
^~~~~~~~~~~~~~
schedule.c:100:12: note: each undeclared identifier is reported only once for each function it appears in
schedule.c:101:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
cc1: all warnings being treated as errors
To avoid the compilation error, the default definition for
ZERO_BLOCK_PTR is now moved in xen/config.h allowing all the code to use
the define.
Fixes: f855dd9625 ('sched: add minimalistic idle scheduler for free cpus') Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
sched: add minimalistic idle scheduler for free cpus
Instead of having a full blown scheduler running for the free cpus
add a very minimalistic scheduler for that purpose only ever scheduling
the related idle vcpu. This has the big advantage of not needing any
per-cpu, per-domain or per-scheduling unit data for free cpus and in
turn simplifying moving cpus to and from cpupools a lot.
Right now, CPUs that are not in any pool, still belong to Pool-0's
scheduler. This forces us to make, within the scheduler, extra effort
to avoid actually running vCPUs on those.
In the case of Credit1, this also cause issue to weights
(re)distribution, as the number of CPUs available to the scheduler is
wrong.
This is described in the changelog of commit e7191920261d ("xen:
credit2: never consider CPUs outside of our cpupool").
This new scheduler will just use a common lock for all free cpus.
As this new scheduler is not user selectable don't register it as an
official scheduler, but just include it in schedule.c.
Today a cpu which is removed from the system is taken directly from
Pool0 to the offline state. This will conflict with the new idle
scheduler, so remove it from Pool0 first. Additionally accept removing
a free cpu instead of requiring it to be in Pool0.
For the resume failed case we need to call the scheduler code for that
situation after the cpupool handling, so move the scheduler code into
a function and call it from cpupool_cpu_remove_forced() and remove the
CPU_RESUME_FAILED case from cpu_schedule_callback().
Note that we are calling now schedule_cpu_switch() in stop_machine
context so we need to switch from spinlock_irq to spinlock_irqsave.
Jan Beulich [Tue, 24 Sep 2019 08:50:33 +0000 (10:50 +0200)]
libxc/x86: avoid certain overflows in CPUID APIC ID adjustments
Recent AMD processors may report up to 128 logical processors in CPUID
leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
as the respective field is only 8 bits wide. Suppress doubling the value
(and its leaf 0x80000008 counterpart) in such a case.
Note that while there's a similar overflow in intel_xc_cpuid_policy(),
that one is being left alone for now.
Note further that while it was considered to suppress the multiplication
by 2 altogether if the host topology already provides at least one bit
of thread ID within APIC IDs, it was decided to avoid more change here
than really needed at this point.
Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD, as it
should have been from the beginning.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alexandru Isaila [Tue, 24 Sep 2019 08:49:36 +0000 (10:49 +0200)]
x86/emulate: send vm_event from emulate
A/D bit writes (on page walks) can be considered benign by an introspection
agent, so receiving vm_events for them is a pessimization. We try here to
optimize by filtering these events out.
Currently, we are fully emulating the instruction at RIP when the hardware sees
an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
incorrect, because the instruction at RIP might legitimately cause an
EPT fault of its own while accessing a _different_ page from the original one,
where A/D were set.
The solution is to perform the whole emulation, while ignoring EPT restrictions
for the walk part, and taking them into account for the "actual" emulating of
the instruction at RIP. When we send out a vm_event, we don't want the emulation
to complete, since in that case we won't be able to veto whatever it is doing.
That would mean that we can't actually prevent any malicious activity, instead
we'd only be able to report on it.
When we see a "send-vm_event" case while emulating, we need to first send the
event out and then suspend the emulation (return X86EMUL_RETRY).
After the emulation stops we'll call hvm_vm_event_do_resume() again after the
introspection agent treats the event and resumes the guest. There, the
instruction at RIP will be fully emulated (with the EPT ignored) if the
introspection application allows it, and the guest will continue to run past
the instruction.
A common example is if the hardware exits because of an EPT fault caused by a
page walk, p2m_mem_access_check() decides if it is going to send a vm_event.
If the vm_event was sent and it would be treated so it runs the instruction
at RIP, that instruction might also hit a protected page and provoke a vm_event.
Now if npfec.kind == npfec_kind_in_gpt and d->arch.monitor.inguest_pagefault_disabled
is true then we are in the page walk case and we can do this emulation optimization
and emulate the page walk while ignoring the EPT, but don't ignore the EPT for the
emulation of the actual instruction.
In the first case we would have 2 EPT events, in the second case we would have
1 EPT event if the instruction at the RIP triggers an EPT event.
We use hvmemul_map_linear_addr() to intercept write access and
__hvm_copy() to intercept exec, read and write access.
A new return type was added, HVMTRANS_need_retry, in order to have all
the places that consume HVMTRANS* return X86EMUL_RETRY.
hvm_emulate_send_vm_event() can return false if there was no violation,
if there was an error from monitor_traps() or p2m_get_mem_access().
-ESRCH from p2m_get_mem_access() is treated as restricted access.
NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
arch.vm_event->send_event
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Jan Beulich [Tue, 24 Sep 2019 08:48:44 +0000 (10:48 +0200)]
x86/traps: widen condition for logging top-of-stack
Despite -fno-omit-frame-pointer the compiler may omit the frame pointer,
often for relatively simple leaf functions. (To give a specific example,
the case I've run into this with is _pci_hide_device() and gcc 8.
Interestingly the even more simple neighboring iommu_has_feature() does
get a frame pointer set up, around just a single instruction. But this
may be a result of the size-of-asm() effects discussed elsewhere.)
Log the top-of-stack value if it looks valid _or_ if RIP looks invalid.
Also annotate all stack trace entries with a marker, to indicate their
origin:
R: register state
F: frame pointer based
S: raw stack contents
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 24 Sep 2019 08:47:53 +0000 (10:47 +0200)]
x86/traps: guard top-of-stack reads
Nothing guarantees that the original frame's stack pointer points at
readable memory. Avoid a (likely nested) crash by attaching exception
recovery to the read (making it a single read at the same time). Don't
even invoke _show_trace() in case of a non-readable top slot.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 23 Sep 2019 13:26:52 +0000 (14:26 +0100)]
libxl: Fix build when LIBXL_API_VERSION is set
The compatibility function mistakenly called itself.
Fixes: 95627b87c3159928458ee586e8c5c593bdd248d8 Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
We want to limit number of shared buffers that guest can register in
OP-TEE. Every such buffer consumes XEN resources and we don't want
guest to exhaust XEN. So we choose arbitrary limit for shared buffers.
xen/arm: optee: check for preemption while freeing shared buffers
We can check for hypercall_preempt_check() in the loop inside
optee_relinquish_resources() to increase hypervisor responsiveness in
case if preemption is required.
xen/arm: optee: impose limit on shared buffer size
We want to limit number of calls to lookup_and_pin_guest_ram_addr()
per one request. There are two ways to do this: either preempt
translate_noncontig() or limit size of one shared buffer size.
It is quite hard to preempt translate_noncontig(), because it is deep
nested. So we chose the second option. We will allow 129 pages per one
shared buffer. This corresponds to the GP standard, as it requires
that size limit for shared buffer should be at least 512kB. One extra
page (129th) is needed to cope with the fact that user's buffer is not
necessary aligned with page boundary.
Also, with this limitation OP-TEE still passes own "xtest" test suite,
so this is okay for now.
Anthony PERARD [Fri, 20 Sep 2019 16:19:02 +0000 (17:19 +0100)]
tools/ocaml: Build fix following libxl API changes
The following libxl API became asynchronous and gained an additional
`ao_how' parameter:
libxl_domain_pause()
libxl_domain_unpause()
libxl_send_trigger()
Adapt the ocaml binding.
Build tested only.
Fixes: edaa631ddcee665cdfae1cf6bc7492c791e01ef4 Fixes: 95627b87c3159928458ee586e8c5c593bdd248d8 Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/arm: livepatch: Prevent CPUs to fetch stale instructions after livepatching
During livepatch, a single CPU will take care of applying the patch and
all the others will wait for the action to complete. They will then once
execute arch_livepatch_post_action() to flush the pipeline.
Per B2.2.5 "Concurrent modification and execution of instructions" in
DDI 0487E.a, flushing the instruction cache is not enough to ensure new
instructions are seen. All the PEs should also do an isb() to
synchronize the fetched instruction stream.
xen/arm32: setup: Give a xenheap page to the boot allocator
After commit 6e3e771203 "xen/arm: setup: Relocate the Device-Tree later on
in the boot", the boot allocator will not receive any xenheap page (i.e.
mapped page) on Arm32.
However, the boot allocator implicitly relies on having the first page
already mapped and therefore result to break boot on Arm32.
The easiest way for now is to give a xenheap page to the boot allocator.
We may want to rethink the interface in the future.
[stefano: fix grammar in commit message]
Fixes: 6e3e771203 ('xen/arm: setup: Relocate the Device-Tree later on in the boot') Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Anthony PERARD [Tue, 13 Aug 2019 14:48:27 +0000 (15:48 +0100)]
libxlu: Handle += in config files
Handle += of both strings and lists.
If += is used for config options expected to be numbers, then a
warning is printed and the config option ignored (because xl ignores
config options with errors).
This is to be used for development purposes, where modifying config
option can be done on the `xl create' command line.
One could have a cmdline= in the cfg file, and specify cmdline+= on
the `xl create` command line without the need to write the whole
cmdline in `xl' command line but simply append to the one in the cfg file.
Or add an extra vif or disk by simply having "vif += [ '', ];" in the
`xl' cmdline.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 19 Sep 2019 16:52:24 +0000 (17:52 +0100)]
libxl_pci: Extract common part of *qemu_trad_watch_state_cb
Functions pci_add_qemu_trad_watch_state_cb and
pci_remove_qemu_trad_watch_state_cb are similar so the common part is
extracted in a different function.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 30 May 2019 17:08:45 +0000 (18:08 +0100)]
libxl: Use ev_qmp in libxl_set_vcpuonline
Removed libxl__qmp_cpu_add since it's not used anymore.
`cpumap' arg of libxl__set_vcpuonline_xenstore is constified.
The QMP command "query-cpus" is going to be called from different
places, so the algorithm that parse the answer is in a separate
function, qmp_parse_query_cpus.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Tue, 30 Jul 2019 14:56:30 +0000 (15:56 +0100)]
libxl_pci: Only check if qemu-dm is running in qemu-trad case
QEMU upstream (or qemu-xen) may not have set "running" state in
xenstore. "running" with QEMU doesn't mean that the binary is
running, it means that the emulation have started. When adding a
pci-passthrough device to QEMU, we do so via QMP, we have a direct
answer to whether QEMU is running or not, no need to check ahead.
Moving the check to do it only with qemu-trad makes upcoming changes
simpler.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 9 May 2019 17:08:09 +0000 (18:08 +0100)]
libxl_pci: Coding style of do_pci_add
do_pci_add is going to be asynchronous, so we start by having a single
path out of the function. All `return`s instead set rc and goto out.
While here, some use of `rc' was used to store the return value of
libxc calls, change them to store into `r'. Also, add the value of `r'
in the error message of those calls.
There were an `out' label that was use it seems to skip setting up the
IRQ, the label has been renamed to `out_no_irq'.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Wed, 8 May 2019 14:23:52 +0000 (15:23 +0100)]
libxl: Use aodev for libxl__device_usbdev_remove
This also mean libxl__initiate_device_usbctrl_remove, which uses
libxl__device_usbdev_remove synchronously, needs to be updated to use
it with multidev.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Tue, 7 May 2019 14:54:08 +0000 (15:54 +0100)]
libxl: Add device_{config,type} to libxl__ao_device
These two fields help to give more information about the device been
hotplug/hotunplug to callbacks.
There is already `dev' of type `libxl__device', but it is mostly
useful when the backend/frontend is xenstore. Some device (like
`usbdev') don't have devid, so `dev' can't be used.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Wed, 17 Apr 2019 16:16:07 +0000 (17:16 +0100)]
libxl: Add libxl__ev_qmp to libxl__ao_device
`aodev->qmp' is initialised in libxl__prepare_ao_device(), but since
there isn't a single exit path for a `libxl__ao_device', users of this
new `qmp' field will have to disposed of it.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Tue, 7 May 2019 16:18:56 +0000 (17:18 +0100)]
libxl: Inline do_usbdev_remove into libxl__device_usbdev_remove
Having the function do_usbdev_remove makes it harder to add asynchronous
calls into it. Move its body back into libxl__device_usbdev_remove and
adjust the latter as there are no reason to have a separated function.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 18 Apr 2019 11:10:30 +0000 (12:10 +0100)]
libxl: Inline do_usbdev_add into libxl__device_usbdev_add
Having the function do_usbdev_add makes it harder to add asynchronous
calls into it. Move its body back into libxl__device_usbdev_add and
adjust the latter as there are no reason to have a separated function.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Sun, 26 May 2019 12:37:44 +0000 (13:37 +0100)]
libxl: Re-introduce libxl__domain_resume
libxl__domain_resume is a rework libxl__domain_resume_deprecated. It
makes uses of ev_xswatch and ev_qmp, to replace synchronous QMP calls
and libxl__wait_for_device_model_deprecated call.
This patch also introduce libxl__dm_resume which is a sub-operation of
both libxl__domain_resume and libxl__domain_unpause and can be used
instead of libxl__domain_resume_device_model_deprecated.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 23 May 2019 14:07:52 +0000 (15:07 +0100)]
libxl: Deprecate libxl__domain_{unpause,resume}
These two functions are used from many places in libxl and need to
change to be able to accomodate libxl__ev_qmp calls and thus needs to
be asynchronous.
(There is also libxl__domain_resume_device_model in the mix.)
A later patch will introduce a new libxl__domain_resume and
libxl__domain_unpause which will make use of libxl__ev_qmp.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 17 May 2019 09:39:13 +0000 (10:39 +0100)]
libxl: Replace libxl__qmp_initializations by ev_qmp calls
Setup a timeout of 10s for all the commands. It used to be about 5s
per commands.
The order of command is changed, we call 'query-vnc' before
'change-vnc-password', but that should not matter. That makes it
easier to call 'change-vnc-password' conditionally.
Also 'change' command is replaced by 'change-vnc-password'
because 'change' is deprecated. The new command is available in all
QEMU versions that also have Xen support.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 16 May 2019 13:23:28 +0000 (14:23 +0100)]
libxl: Move "qmp_initializations" to libxl_dm
libxl__qmp_initializations is part of the device domain startup, it
queries information about the newly spawned QEMU and do some
post-startup configuration. So the function call doesn't belong to the
general domain creation, but only to the device model part of the
process, thus the call belong to libxl_dm and libxl__dm_spawn_state's
machinery.
We move the call ahead of a follow-up patch which going to "inline"
libxl__qmp_initializations.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 13 Jun 2019 15:51:29 +0000 (16:51 +0100)]
libxl_usb: Use usbctrl instead of usbctrlinfo
The functions that calls usbctrl_getinfo() only needs information that
can be found in a `libxl_device_usbctrl'. So avoid calling
libxl_device_usbctrl_getinfo and call libxl_devid_to_device_usbctrl
instead. (libxl_device_usbctrl_getinfo needs a `libxl_device_usbctrl'
anyway.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 13 Jun 2019 15:26:40 +0000 (16:26 +0100)]
libxl_usb: usbctrl, make use of generic device handling functions
Two functions in generate `libxl_device_usbctrl' can be replaced by
generic macro:
- libxl_device_usbctrl_list -> LIBXL_DEFINE_DEVICE_LIST
- libxl_devid_to_device_usbctrl -> LIBXL_DEFINE_DEVID_TO_DEVICE
This patch only needs to define `libxl__usbctrl_devtype.from_xenstore'
to makes use of them.
Small change, libxl_devid_to_device_usbctrl doesn't list all usbctrl
anymore before finding the right one.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 13 Jun 2019 15:42:09 +0000 (16:42 +0100)]
libxl: Constify libxl_device_* param of *_getinfo
The libxl_device_TYPE parameter of all the libxl_device_TYPE_getinfo
function seems to be only used as input to find more information to bi
stored in the libxl_TYPEinfo parameter.
Make sure this is always true and constify the input parameter to avoid
further mistake.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 13 Jun 2019 11:20:31 +0000 (12:20 +0100)]
libxl_usb: Fix wrong usage of asserts
Replace the assert(0) by abort() since the intention in libxl is that
asserts are always compiled in. This patch makes its clear and removes
the need to deal with asserts been compiled out.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Wed, 29 May 2019 16:01:06 +0000 (17:01 +0100)]
libxl_usb: Use proper domid value, from libxl__device
ao->domid isn't a reliable way of getting a domid, it might not be set
(this isn't the case here). The right domid value can be found in the
libxl__device (which is the device we want to remove) attached to
libxl__ao_device.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 12 Apr 2019 16:54:48 +0000 (17:54 +0100)]
libxl_dom_save: Reorder functions for switch_qemu_logdirty
There are two differents set of callbacks here, one for
libxl__domain_common_switch_qemu_logdirty,
and one for libxl__domain_suspend_common_switch_qemu_logdirty.
The first set calls the second.
Pure code motion.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 9 May 2019 14:04:33 +0000 (15:04 +0100)]
libxl_pci: Constify arg `pcidev' of libxl__device_pci_add_xenstore
libxl__device_pci_add_xenstore doesn't modify `pcidev', so it can be
constified. Also, we don't need pcidev_saved anymore, so remove the
saved copy. (device_add_domain_config is going to make it's own copy
anyway.)
To achieve this, constify pcidev in all functions that
libxl__device_pci_add_xenstore calls.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 9 May 2019 15:52:33 +0000 (16:52 +0100)]
libxl_pci: Make libxl__create_pci_backend static
libxl__create_pci_backend isn't called from outside of libxl_pci
anymore, and it's only useful as part of the pci_add process, so
remove the prototype from libxl_internal.h.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 18 Apr 2019 16:26:09 +0000 (17:26 +0100)]
libxl: Rename struct libxl_device_type to libxl__device_type
libxl__device_type is internal to libxl, rename it to the internal
only prefix. And eliminate redundant 'struct' keyword, in accord with
the coding style.
No functional changes.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Tue, 20 Aug 2019 12:22:55 +0000 (13:22 +0100)]
xen/arm: iommu: Panic if not all IOMMUs are initialized
At the moment, the platform can come up with only part of the IOMMUs
initialized. This could lead to a failure later on when building the
hardware domain or even trying to assign a device to a guest.
To avoid unwanted behavior, Xen will not continue if one of the IOMMUs
has not been initialized correctly.
[stefano: fix typo in comment, add '\n' to panic message]
Anthony PERARD [Fri, 22 Mar 2019 15:04:57 +0000 (15:04 +0000)]
libxl_disk: Cut libxl_cdrom_insert into steps ..
.. and use a new "slow" lock to avoid holding the userdata lock across
several functions.
This patch cuts libxl_cdrom_insert into different step/function but
there are still called synchronously. (Taking the ev_lock is the only
step that might be asynchronous.) A later patch will call them
asynchronously when QMP is involved.
Thee userdata lock (json_lock) use to protect against concurrent change
of cdrom is replaced by an ev_lock which can be held across different
CTX_LOCK sections. The json_lock is still used when reading/modifying
the domain userdata (mandatory) and update xenstore (mostly because
it's updated as the same time as the userdata).
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Fri, 7 Jun 2019 14:19:02 +0000 (15:19 +0100)]
libxl: Add optimisation to ev_lock
It will often be the case that the lock is free to grab. So we first
try to grab it before we have to fork. Even though in this case the
locks are grabbed in the wrong order in the lock hierarchy (ev_lock
should be outside of CTX_LOCK), it is fine to try without blocking. If
that failed, we will release CTX_LOCK and try to grab both lock again
in the right order.
That optimisation is only enabled in releases (debug=n) so the more
complicated code with fork is actually exercised.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 6 Jun 2019 13:32:11 +0000 (14:32 +0100)]
libxl_internal: Introduce libxl__ev_devlock for devices hotplug via QMP
The current lock `domain_userdata_lock' can't be used when modification
to a guest is done by sending command to QEMU, this is a slow process
and requires to call CTX_UNLOCK, which is not possible while holding
the `domain_userdata_lock'.
To resolve this issue, we create a new lock which can take over part
of the job of the json_lock.
This lock is outside CTX_LOCK in the lock hierarchy.
libxl__ev_devlock_lock will have CTX_UNLOCK before trying to grab the
ev_devlock. The callback is used to notify when the ev_devlock have
been acquired.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 18 Sep 2019 16:10:15 +0000 (17:10 +0100)]
tools/configure: Allow specifying python to be found from path
./configure takes a PYTHON=... argument. You can use this to specify
the python interpreter. However, for no good reason, it expects an
absolute path.
Fix this. The new logic is:
* if not set, default to `python'
* if not absolute, look it up with type -p
* split into directory and executable name
The results in config/Tools.mk (which contains @PYTHON@ and
@PYTHONPATH@) are identical for both
./configure
./configure PYTHON=/usr/bin/python
so I assert this has no functional change except that now you can say
./configure PYTHON=python
In particular you can now say
./configure PYTHON=python2
./configure PYTHON=python3
The latter is useful if you want python3 (which should probably be the
default, but does not work right now). The former is useful if you
want python2 but your distro has foolishly made "python" refer to
python3.
CC: Doug Goldstein <cardoe@cardoe.com> CC: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Mon, 9 Sep 2019 10:43:28 +0000 (11:43 +0100)]
x86: Misc trivial cleanup of bootsym_phys()
In smpboot, there is no need to abstract setup_trampoline() away. Drop the
define and use bootsym_phys() directly.
In tboot, the 3 size calculations are invariant of their bootsym_phys()/__pa()
transformations, but the compiler can't tell this. Drop the tranformations,
which simplifies the compiled function.
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-134 (-134)
Function old new delta
tboot_shutdown 620 486 -134
Total: Before=3337042, After=3336908, chg -0.00%
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
tools/arm: optee: create optee firmware node in DT if tee=optee
If TEE support is enabled with "tee=optee" option in xl.cfg,
then we need to inform guest about available TEE, by creating
corresponding node in the guest's device tree.
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 18 Sep 2019 13:20:00 +0000 (15:20 +0200)]
x86/CPUID: drop INVPCID dependency on PCID
PCID validly depends on LM, as it can be enabled in Long Mode only.
INVPCID, otoh, can be used not only without PCID enabled, but also
outside of Long Mode altogether. In both cases its functionality is
simply restricted to PCID 0, which is sort of expected as no other PCID
can be activated there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 18 Sep 2019 13:14:49 +0000 (15:14 +0200)]
x86: limit the amount of TLB flushing in switch_cr3_cr4()
We really need to flush the TLB just once, if we do so with or after the
CR3 write. The only case where two flushes are unavoidable is when we
mean to turn off CR4.PGE (perhaps just temporarily; see the code
comment).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 18 Sep 2019 13:13:21 +0000 (15:13 +0200)]
x86emul: treat Hygon guests like AMD ones
For some reason the Hygon enabling series left out the insn emulator.
Make appropriate adjustments wherever we've been special casing AMD.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Wed, 18 Sep 2019 13:12:33 +0000 (15:12 +0200)]
core-parking: interact with runtime SMT-disabling
When disabling SMT at runtime, secondary threads should no longer be
candidates for bringing back up in response to _PUR ACPI events. Purge
them from the tracking array.
Doing so involves adding locking to guard accounting data in the core
parking code. While adding the declaration for the lock, take the
liberty to drop two unnecessary forward function declarations.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
make subdirs-install
make[2]: Entering directory `/home/travis/build/andyhhp/xen/tools'
make[3]: Entering directory `/home/travis/build/andyhhp/xen/tools'
make -C libs install
make[4]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
make[5]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
make -C toolcore install
make[6]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs/toolcore'
make libs
make[7]: Entering directory`/home/travis/build/andyhhp/xen/tools/libs/toolcore'
for i in include/xentoolcore.h include/xentoolcore_internal.h; do \
gcc -x c -ansi -Wall -Werror -I<snip>/xen/tools/libs/toolcore/../../../tools/include \
-S -o /dev/null $i || exit 1; \
echo $i; \
done >headers.chk.new
include/xentoolcore_internal.h:30:31: fatal error: _xentoolcore_list.h: No such file or directory
#include "_xentoolcore_list.h"
^
compilation terminated.
make[7]: *** [headers.chk] Error 1
The problem is that xentoolcore_internal.h includes _xentoolcore_list.h which
hasn't been generated yet.
The toolcore headers.chk rule (unlike the other libraries) had an additional
dependency against $(AUTOINCS), which forced the headers to be generated
first. Replicate this in the common libs.mk
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/arm: Zero BSS after the MMU and D-cache is turned on
At the moment BSS is zeroed before the MMU and D-Cache is turned on.
In other words, the cache will be bypassed when zeroing the BSS section.
On Arm64, per the Image protocol [1], the state of the cache for BSS region
is not known because it is not part of the "loaded kernel image".
On Arm32, the boot protocol [2] does not mention anything about the
state of the cache. Therefore, it should be assumed that it is not known
for BSS region.
This means that the cache will need to be invalidated twice for the BSS
region:
1) Before zeroing to remove any dirty cache line. Otherwise they may
get evicted while zeroing and therefore overriding the value.
2) After zeroing to remove any cache line that may have been
speculated. Otherwise when turning on MMU and D-Cache, the CPU may
see old values.
At the moment, the only reason to have BSS zeroed early is because the
boot page tables are part of it. To avoid the two cache invalidations,
it would be better if the boot page tables are part of the "loaded
kernel image" and therefore be zeroed when loading the image into
memory. A good candidate is the section .data.page_aligned.
A new macro DEFINE_BOOT_PAGE_TABLE is introduced to create and mark
page-tables used before BSS is zeroed. This includes all boot_* but also
xen_fixmap as zero_bss() will print a message when earlyprintk is
enabled.
Boot CPU and secondary CPUs will use different entry point to C code. At
the moment, the decision on which entry to use is taken within launch().
In order to avoid using conditional instruction and make the call
clearer, launch() is reworked to take in parameters the entry point and its
arguments.
Lastly, document the behavior and the main registers usage within the
function.
Andrew Cooper [Fri, 13 Sep 2019 16:17:21 +0000 (17:17 +0100)]
drivers/acpi: Drop "ERST table was not found" message
ERST isn't a mandatory table, and also isn't very common to find. The message
is unnecessary noise during boot. Furthermore, it is redundant with the list
of found ACPI tables printed just ahead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
This patch defines a new bit reported in the hw_cap field of struct
xen_sysctl_physinfo to indicate whether the platform supports sharing of
HAP page tables (i.e. the P2M) with the IOMMU. This informs the toolstack
whether the domain needs extra memory to store discrete IOMMU page tables
or not.
NOTE: This patch makes sure iommu_hap_pt_shared is clear if HAP is not
supported or the IOMMU is disabled, and defines it to false if
!CONFIG_HVM.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Julien Grall <julien.grall@arm.com>