xenbits.xensource.com Git - people/hx242/xen.git/log

xen: sched: Fix Arm build after commit f855dd9625

Commit f855dd9625 "sched: add minimalistic idle scheduler for free cpus"
introduced the use of ZERO_BLOCK_PTR in the scheduler code. However, the
define does not exist outside of xmalloc_tlsf.c on non-x86 architectures.

This results in a compilation error on Arm:

schedule.c: In function ‘sched_idle_alloc_vdata’:
schedule.c:100:12: error: ‘ZERO_BLOCK_PTR’ undeclared (first use in this function)
return ZERO_BLOCK_PTR;
^~~~~~~~~~~~~~
schedule.c:100:12: note: each undeclared identifier is reported only once for each function it appears in
schedule.c:101:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
cc1: all warnings being treated as errors

To avoid the compilation error, the default definition of
ZERO_BLOCK_PTR is now moved to xen/config.h, allowing all code to use
the define.
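
As an illustration only, a default definition guarded by #ifndef lets an
architecture supply its own value while everyone else falls back to the
common one. The value below and the example_* helpers are assumptions made
for this sketch, not a copy of xen/config.h:

```c
#include <stddef.h>

/*
 * Sketch of a guarded default. An architecture header included earlier
 * may already define ZERO_BLOCK_PTR; otherwise this fallback applies.
 * The concrete value is an assumption for illustration.
 */
#ifndef ZERO_BLOCK_PTR
#define ZERO_BLOCK_PTR ((void *)-1L)
#endif

/* Hypothetical allocator hook that needs no real per-vcpu data. */
static void *example_alloc_vdata(void)
{
    /* Non-NULL sentinel: callers must not treat it as a failure. */
    return ZERO_BLOCK_PTR;
}

/* Hypothetical caller-side check: only NULL means allocation failure. */
static int example_alloc_failed(const void *p)
{
    return p == NULL;
}
```

The point of the sentinel is that it is distinguishable from NULL, so a
caller checking for allocation failure does not trip over it.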

Fixes: f855dd9625 ('sched: add minimalistic idle scheduler for free cpus')
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

sched: switch to debugtrace in cpupool handling

Instead of having a cpupool_dprintk() define, just use debugtrace.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

sched: add minimalistic idle scheduler for free cpus

Instead of having a full-blown scheduler running for the free cpus,
add a very minimalistic scheduler for that purpose, only ever scheduling
the related idle vcpu. This has the big advantage of not needing any
per-cpu, per-domain or per-scheduling unit data for free cpus, which in
turn greatly simplifies moving cpus to and from cpupools.

Right now, CPUs that are not in any pool still belong to Pool-0's
scheduler. This forces us to make extra effort within the scheduler
to avoid actually running vCPUs on those.

In the case of Credit1, this also causes issues with weight
(re)distribution, as the number of CPUs available to the scheduler is
wrong.

This is described in the changelog of commit e7191920261d ("xen:
credit2: never consider CPUs outside of our cpupool").

This new scheduler will just use a common lock for all free cpus.

As this new scheduler is not user selectable, don't register it as an
official scheduler, but just include it in schedule.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

sched: remove cpu from pool0 before removing it

Today a cpu which is removed from the system is taken directly from
Pool0 to the offline state. This will conflict with the new idle
scheduler, so remove it from Pool0 first. Additionally accept removing
a free cpu instead of requiring it to be in Pool0.

For the resume failed case we need to call the scheduler code for that
situation after the cpupool handling, so move the scheduler code into
a function and call it from cpupool_cpu_remove_forced() and remove the
CPU_RESUME_FAILED case from cpu_schedule_callback().

Note that we are now calling schedule_cpu_switch() in stop_machine
context, so we need to switch from spinlock_irq to spinlock_irqsave.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: Dario Faggioli <dfaggioli@suse.com>

libxc/x86: avoid certain overflows in CPUID APIC ID adjustments

Recent AMD processors may report up to 128 logical processors in CPUID
leaf 1. Doubling this value produces 0 (which OSes sincerely dislike),
as the respective field is only 8 bits wide. Suppress doubling the value
(and its leaf 0x80000008 counterpart) in such a case.
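
The wrap-around can be sketched as follows. The helper names are
hypothetical and this is not the libxc code; it only shows why doubling
128 in an 8-bit field yields 0 and one way to skip the doubling when the
result would not fit:

```c
#include <stdint.h>

/* Unguarded variant, shown only to illustrate the wrap-around. */
static uint8_t double_unguarded(uint8_t count)
{
    /* 128 * 2 == 256, which truncates to 0 in an 8-bit field. */
    return (uint8_t)(count * 2);
}

/*
 * Hypothetical guarded variant: double the count only when the result
 * still fits in 8 bits, otherwise leave the value untouched.
 */
static uint8_t double_if_it_fits(uint8_t count)
{
    if (count > UINT8_MAX / 2)
        return count;            /* suppress the doubling */
    return (uint8_t)(count * 2);
}
```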

Note that while there's a similar overflow in intel_xc_cpuid_policy(),
that one is being left alone for now.

Note further that while it was considered to suppress the multiplication
by 2 altogether if the host topology already provides at least one bit
of thread ID within APIC IDs, it was decided to avoid more change here
than really needed at this point.

Also zap leaf 4 (and at the same time leaf 2) EDX output for AMD, as it
should have been from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/emulate: send vm_event from emulate

A/D bit writes (on page walks) can be considered benign by an introspection
agent, so receiving vm_events for them is a pessimization. We try here to
optimize by filtering these events out.
Currently, we are fully emulating the instruction at RIP when the hardware sees
an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
incorrect, because the instruction at RIP might legitimately cause an
EPT fault of its own while accessing a _different_ page from the original one,
where A/D were set.
The solution is to perform the whole emulation, while ignoring EPT restrictions
for the walk part, and taking them into account for the "actual" emulation of
the instruction at RIP. When we send out a vm_event, we don't want the emulation
to complete, since in that case we won't be able to veto whatever it is doing.
That would mean that we can't actually prevent any malicious activity; instead
we'd only be able to report on it.
When we see a "send-vm_event" case while emulating, we need to first send the
event out and then suspend the emulation (return X86EMUL_RETRY).
After the emulation stops we'll call hvm_vm_event_do_resume() again after the
introspection agent treats the event and resumes the guest. There, the
instruction at RIP will be fully emulated (with the EPT ignored) if the
introspection application allows it, and the guest will continue to run past
the instruction.

A common example: if the hardware exits because of an EPT fault caused by a
page walk, p2m_mem_access_check() decides whether to send a vm_event.
If the vm_event is sent and handled such that the instruction at RIP is run,
that instruction might also hit a protected page and provoke a vm_event.

Now if npfec.kind == npfec_kind_in_gpt and d->arch.monitor.inguest_pagefault_disabled
is true then we are in the page walk case and we can do this emulation optimization
and emulate the page walk while ignoring the EPT, but don't ignore the EPT for the
emulation of the actual instruction.

In the first case we would have 2 EPT events; in the second case we would have
1 EPT event, and only if the instruction at RIP triggers an EPT event.

We use hvmemul_map_linear_addr() to intercept write access and
__hvm_copy() to intercept exec, read and write access.

A new return type was added, HVMTRANS_need_retry, in order to have all
the places that consume HVMTRANS* return X86EMUL_RETRY.

hvm_emulate_send_vm_event() can return false if there was no violation,
or if there was an error from monitor_traps() or p2m_get_mem_access().
-ESRCH from p2m_get_mem_access() is treated as restricted access.

NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
arch.vm_event->send_event

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>

x86/traps: widen condition for logging top-of-stack

Despite -fno-omit-frame-pointer the compiler may omit the frame pointer,
often for relatively simple leaf functions. (To give a specific example,
the case I've run into this with is _pci_hide_device() and gcc 8.
Interestingly the even more simple neighboring iommu_has_feature() does
get a frame pointer set up, around just a single instruction. But this
may be a result of the size-of-asm() effects discussed elsewhere.)

Log the top-of-stack value if it looks valid _or_ if RIP looks invalid.

Also annotate all stack trace entries with a marker, to indicate their
origin:
R: register state
F: frame pointer based
S: raw stack contents

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/traps: guard top-of-stack reads

Nothing guarantees that the original frame's stack pointer points at
readable memory. Avoid a (likely nested) crash by attaching exception
recovery to the read (making it a single read at the same time). Don't
even invoke _show_trace() in case of a non-readable top slot.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxl: Fix build when LIBXL_API_VERSION is set

The compatibility function mistakenly called itself.

Fixes: 95627b87c3159928458ee586e8c5c593bdd248d8
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen/arm: optee: limit number of shared buffers

We want to limit the number of shared buffers that a guest can register
in OP-TEE. Every such buffer consumes Xen resources and we don't want a
guest to exhaust Xen. So we choose an arbitrary limit for shared buffers.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/arm: optee: check for preemption while freeing shared buffers

We can call hypercall_preempt_check() in the loop inside
optee_relinquish_resources() to increase hypervisor responsiveness in
case preemption is required.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>

xen/arm: optee: impose limit on shared buffer size

We want to limit the number of calls to lookup_and_pin_guest_ram_addr()
per request. There are two ways to do this: either preempt
translate_noncontig() or limit the size of one shared buffer.

It is quite hard to preempt translate_noncontig(), because it is deeply
nested. So we chose the second option. We will allow 129 pages per
shared buffer. This corresponds to the GP standard, as it requires
that the size limit for a shared buffer be at least 512kB. One extra
page (the 129th) is needed to cope with the fact that the user's buffer
is not necessarily aligned to a page boundary.
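
The 129 figure can be checked with a small calculation. This sketch
assumes 4 KiB pages, and pages_spanned() is a hypothetical helper, not
part of the OP-TEE mediator:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [addr, addr + len), len > 0. */
static size_t pages_spanned(uint64_t addr, uint64_t len)
{
    uint64_t first = addr / PAGE_SIZE_ASSUMED;
    uint64_t last = (addr + len - 1) / PAGE_SIZE_ASSUMED;

    return (size_t)(last - first + 1);
}
```

Under that assumption a page-aligned 512kB buffer spans 128 pages, while
the same buffer starting at any other offset within a page spans 129.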

Also, with this limitation OP-TEE still passes its own "xtest" test
suite, so this is okay for now.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>

tools/ocaml: Build fix following libxl API changes

The following libxl APIs became asynchronous and gained an additional
`ao_how' parameter:
    libxl_domain_pause()
    libxl_domain_unpause()
    libxl_send_trigger()

Adapt the ocaml binding.

Build tested only.

Fixes: edaa631ddcee665cdfae1cf6bc7492c791e01ef4
Fixes: 95627b87c3159928458ee586e8c5c593bdd248d8
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen/arm: livepatch: Prevent CPUs to fetch stale instructions after livepatching

During livepatch, a single CPU will take care of applying the patch and
all the others will wait for the action to complete. Once it has
completed, they will execute arch_livepatch_post_action() to flush the
pipeline.

Per B2.2.5 "Concurrent modification and execution of instructions" in
DDI 0487E.a, flushing the instruction cache is not enough to ensure new
instructions are seen. All the PEs should also do an isb() to
synchronize the fetched instruction stream.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

xen/arm32: setup: Give a xenheap page to the boot allocator

After commit 6e3e771203 "xen/arm: setup: Relocate the Device-Tree later on
in the boot", the boot allocator will not receive any xenheap page (i.e.
mapped page) on Arm32.

However, the boot allocator implicitly relies on having the first page
already mapped, and this therefore breaks boot on Arm32.

The easiest way for now is to give a xenheap page to the boot allocator.
We may want to rethink the interface in the future.

[stefano: fix grammar in commit message]

Fixes: 6e3e771203 ('xen/arm: setup: Relocate the Device-Tree later on in the boot')
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

libxlu: Handle += in config files

Handle += of both strings and lists.

If += is used for config options expected to be numbers, then a
warning is printed and the config option ignored (because xl ignores
config options with errors).

This is to be used for development purposes, where config options can
be modified on the `xl create' command line.

One could have a cmdline= in the cfg file and specify cmdline+= on
the `xl create' command line, appending to the value in the cfg file
instead of writing out the whole cmdline on the `xl' command line.
Or add an extra vif or disk by simply having "vif += [ '', ];" on the
`xl' command line.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: fix linking hypervisor includes to tools include directory

An incremental build of tools/include won't pickup new hypervisor
headers in tools/include/xen. Fix that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Extract common part of *qemu_trad_watch_state_cb

Functions pci_add_qemu_trad_watch_state_cb and
pci_remove_qemu_trad_watch_state_cb are similar so the common part is
extracted in a different function.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: libxl_qemu_monitor_command now uses ev_qmp

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: libxl_retrieve_domain_configuration now uses ev_qmp

This was the last user of libxl__qmp_query_cpus which can now be
removed.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Use ev_qmp in libxl_set_vcpuonline

Removed libxl__qmp_cpu_add since it's not used anymore.

`cpumap' arg of libxl__set_vcpuonline_xenstore is constified.

The QMP command "query-cpus" is going to be called from different
places, so the algorithm that parses the answer is in a separate
function, qmp_parse_query_cpus.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Use ev_qmp for libxl_send_trigger

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Use ev_qmp for pci_remove

This patch also replaces the use of
libxl__wait_for_device_model_deprecated() by its equivalent
without the need for a thread.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Use libxl__ao_device with pci_remove

This is in preparation of using asynchronous operation to communicate
with QEMU via QMP (libxl__ev_qmp).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Use ev_qmp in do_pci_add

This patch also replaces the use of
libxl__wait_for_device_model_deprecated() by its equivalent
without the need for a thread.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Use libxl__ao_device with libxl__device_pci_add

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Only check if qemu-dm is running in qemu-trad case

QEMU upstream (or qemu-xen) may not have set "running" state in
xenstore. "running" with QEMU doesn't mean that the binary is
running, it means that the emulation has started. When adding a
pci-passthrough device to QEMU, we do so via QMP, so we have a direct
answer to whether QEMU is running or not, with no need to check ahead.

Moving the check to do it only with qemu-trad makes upcoming changes
simpler.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Coding style of do_pci_add

do_pci_add is going to be asynchronous, so we start by having a single
path out of the function. All `return`s instead set rc and goto out.
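
The single-exit shape described here can be sketched as below. All names
are hypothetical stand-ins (this is not do_pci_add); the sketch only
shows `rc' carrying the function's result, `r' holding the return value
of external calls, and every path leaving through one label:

```c
#include <stdlib.h>

/* Hypothetical result codes standing in for libxl error values. */
enum { EXAMPLE_OK = 0, EXAMPLE_ERROR_FAIL = -3 };

/* Hypothetical stand-in for an external (libxc-style) call. */
static int fake_external_call(int should_fail)
{
    return should_fail ? -1 : 0;
}

static int example_add(int fail_first, int fail_second, int *cleanups)
{
    int rc, r;
    char *buf = malloc(16);

    if (!buf) { rc = EXAMPLE_ERROR_FAIL; goto out; }

    /* `r' holds the external return value, `rc' the function result. */
    r = fake_external_call(fail_first);
    if (r < 0) { rc = EXAMPLE_ERROR_FAIL; goto out; }

    r = fake_external_call(fail_second);
    if (r < 0) { rc = EXAMPLE_ERROR_FAIL; goto out; }

    rc = EXAMPLE_OK;
out:
    /* Single exit path: cleanup runs exactly once on every path. */
    free(buf);
    (*cleanups)++;
    return rc;
}
```

Having one exit makes it straightforward to later replace the final
return with a callback, which is what an asynchronous version needs.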

While here, some uses of `rc' stored the return value of libxc calls;
change them to store into `r'. Also, add the value of `r' to the error
message of those calls.

There was an `out' label that was used, it seems, to skip setting up
the IRQ; the label has been renamed to `out_no_irq'.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Remove libxl__qmp_run_command_flexarray

There are no more users.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: libxl__initiate_device_usbdev_remove now use ev_qmp

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Use aodev for libxl__device_usbdev_remove

This also means libxl__initiate_device_usbctrl_remove, which uses
libxl__device_usbdev_remove synchronously, needs to be updated to use
it with multidev.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Make libxl__device_usbdev_add uses ev_qmp

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Make libxl__initiate_device_usbctrl_remove uses ev_qmp

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Make libxl__device_usbctrl_add uses ev_qmp

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Add device_{config,type} to libxl__ao_device

These two fields help to give callbacks more information about the
device being hotplugged/hot-unplugged.

There is already `dev' of type `libxl__device', but it is mostly
useful when the backend/frontend is xenstore. Some devices (like
`usbdev') don't have a devid, so `dev' can't be used.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Add libxl__ev_qmp to libxl__ao_device

`aodev->qmp' is initialised in libxl__prepare_ao_device(), but since
there isn't a single exit path for a `libxl__ao_device', users of this
new `qmp' field will have to dispose of it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Inline do_usbdev_remove into libxl__device_usbdev_remove

Having the function do_usbdev_remove makes it harder to add asynchronous
calls into it. Move its body back into libxl__device_usbdev_remove and
adjust the latter, as there is no reason to have a separate function.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Inline do_usbdev_add into libxl__device_usbdev_add

Having the function do_usbdev_add makes it harder to add asynchronous
calls into it. Move its body back into libxl__device_usbdev_add and
adjust the latter, as there is no reason to have a separate function.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_domain: Convert libxl_domain_unpause to use libxl__domain_unpause

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_dm: Update libxl__spawn_stub_dm to use libxl__domain_unpause

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Re-introduce libxl__domain_unpause

libxl__domain_unpause is a reimplementation of
libxl__domain_unpause_deprecated with asynchronous operation.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_domain: Convert libxl_domain_resume to use libxl__domain_resume

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Re-introduce libxl__domain_resume

libxl__domain_resume is a rework of libxl__domain_resume_deprecated. It
makes use of ev_xswatch and ev_qmp, to replace synchronous QMP calls
and the libxl__wait_for_device_model_deprecated call.

This patch also introduces libxl__dm_resume, which is a sub-operation of
both libxl__domain_resume and libxl__domain_unpause and can be used
instead of libxl__domain_resume_device_model_deprecated.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Deprecate libxl__domain_{unpause,resume}

These two functions are used from many places in libxl and need to
change to be able to accommodate libxl__ev_qmp calls, and thus need to
be asynchronous.

(There is also libxl__domain_resume_device_model in the mix.)

A later patch will introduce a new libxl__domain_resume and
libxl__domain_unpause which will make use of libxl__ev_qmp.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Replace libxl__qmp_initializations by ev_qmp calls

Set up a timeout of 10s for all the commands together. It used to be
about 5s per command.

The order of commands is changed: we call 'query-vnc' before
'change-vnc-password', but that should not matter. That makes it
easier to call 'change-vnc-password' conditionally.

Also 'change' command is replaced by 'change-vnc-password'
because 'change' is deprecated. The new command is available in all
QEMU versions that also have Xen support.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Move "qmp_initializations" to libxl_dm

libxl__qmp_initializations is part of the device domain startup; it
queries information about the newly spawned QEMU and does some
post-startup configuration. So the function call doesn't belong to the
general domain creation, but only to the device model part of the
process, thus the call belongs to libxl_dm and libxl__dm_spawn_state's
machinery.

We move the call ahead of a follow-up patch which is going to "inline"
libxl__qmp_initializations.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Use ev_qmp for switch_qemu_xen_logdirty

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Make libxl_qemu_monitor_command async

.. because it makes QMP calls which are going to be async.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Make libxl_retrieve_domain_configuration async

.. because it makes QMP calls which are going to be async.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Make libxl_set_vcpuonline async

.. because it makes QMP calls which are going to be async.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Make libxl_send_trigger async

.. because it makes QMP calls which are going to be async.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Make libxl_domain_unpause async

libxl_domain_unpause needs to make QMP calls, which are asynchronous,
so change the API to reflect that.

Make libxl_domain_pause async as well, even though it will keep
completing synchronously.

Also fix some coding style issues in those functions.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Use usbctrl instead of usbctrlinfo

The functions that call usbctrl_getinfo() only need information that
can be found in a `libxl_device_usbctrl'. So avoid calling
libxl_device_usbctrl_getinfo and call libxl_devid_to_device_usbctrl
instead. (libxl_device_usbctrl_getinfo needs a `libxl_device_usbctrl'
anyway.)

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: usbctrl, make use of generic device handling functions

Two functions that generate a `libxl_device_usbctrl' can be replaced by
generic macros:
- libxl_device_usbctrl_list -> LIBXL_DEFINE_DEVICE_LIST
- libxl_devid_to_device_usbctrl -> LIBXL_DEFINE_DEVID_TO_DEVICE

This patch only needs to define `libxl__usbctrl_devtype.from_xenstore'
to make use of them.

Small change: libxl_devid_to_device_usbctrl doesn't list all usbctrl
anymore before finding the right one.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Constify libxl_device_* param of *_getinfo

The libxl_device_TYPE parameter of all the libxl_device_TYPE_getinfo
functions seems to be only used as input to find more information to be
stored in the libxl_TYPEinfo parameter.

Make sure this is always true and constify the input parameter to avoid
further mistakes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Fix libxl_device_usbctrl_getinfo

`usbctrl' is modified in this function, which doesn't seem to be
intended, and usbctrlinfo.backend_id was never modified.

Take this opportunity to constify the argument `usbctrl' in the libxl
API to avoid similar mistakes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Fix wrong usage of asserts

Replace the assert(0) by abort() since the intention in libxl is that
asserts are always compiled in. This patch makes it clear and removes
the need to deal with asserts being compiled out.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_usb: Use proper domid value, from libxl__device

ao->domid isn't a reliable way of getting a domid, it might not be set
(this isn't the case here). The right domid value can be found in the
libxl__device (which is the device we want to remove) attached to
libxl__ao_device.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_domain: Cleanup libxl__destroy_domid

- dom_path isn't used anymore in that function, remove it.
- Use `r' to store return value of external calls.
- Use `CTX', no need for a local `ctx'.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Comment libxl__dm_spawn_state about init and dispose

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_dm: Fix initialisation of libxl__stub_dm_spawn_state

sdss->pvqemu wasn't initialised and disposed of properly.
Also, move the initialisation of sdss->xswait with the rest of the
initialisation of sdss.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_dom_save: Reorder functions for switch_qemu_logdirty

There are two different sets of callbacks here, one for
libxl__domain_common_switch_qemu_logdirty,
and one for libxl__domain_suspend_common_switch_qemu_logdirty.

The first set calls the second.

Pure code motion.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: `starting' is a bool

The argument `starting' is used as a boolean, change its type to
reflect that throughout libxl_pci.c.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Constify arg `pcidev' of libxl__device_pci_add_xenstore

libxl__device_pci_add_xenstore doesn't modify `pcidev', so it can be
constified. Also, we don't need pcidev_saved anymore, so remove the
saved copy. (device_add_domain_config is going to make its own copy
anyway.)

To achieve this, constify pcidev in all functions that
libxl__device_pci_add_xenstore calls.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_pci: Make libxl__create_pci_backend static

libxl__create_pci_backend isn't called from outside of libxl_pci
anymore, and it's only useful as part of the pci_add process, so
remove the prototype from libxl_internal.h.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Remove unused variable in libxl__device_pci_add_xenstore

*device isn't used.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Rename struct libxl_device_type to libxl__device_type

libxl__device_type is internal to libxl, so rename it to use the
internal-only prefix. Also eliminate the redundant 'struct' keyword, in
accordance with the coding style.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen/arm: iommu: Panic if not all IOMMUs are initialized

At the moment, the platform can come up with only part of the IOMMUs
initialized. This could lead to a failure later on when building the
hardware domain or even trying to assign a device to a guest.

To avoid unwanted behavior, Xen will not continue if one of the IOMMUs
has not been initialized correctly.

[stefano: fix typo in comment, add '\n' to panic message]

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

configure: fix print syntax for python 3

16cc3362a missed one print statement.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_disk: Use ev_qmp in libxl_cdrom_insert

Make libxl_cdrom_insert asynchronous when QEMU is involved. And
have the cdrom opened by libxl, sending a file descriptor to QEMU.

The "opaque" parameter of the "add-fd" can help to figure out what a
fdset in QEMU is used for. It can be queried by "query-fdsets".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Move qmp_parameters_* prototypes to libxl_internal.h

.. and rename them to libxl__qmp_param_*.

This is to allow other files than libxl_qmp.c to make QMP calls with
parameters.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_disk: Implement missing timeout for libxl_cdrom_insert

After the patch "libxl_disk: Use ev_qmp in libxl_cdrom_insert"
there will not be any kind of timeout, add one back.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_disk: Cut libxl_cdrom_insert into steps ..

.. and use a new "slow" lock to avoid holding the userdata lock across
several functions.

This patch cuts libxl_cdrom_insert into different steps/functions but
they are still called synchronously. (Taking the ev_lock is the only
step that might be asynchronous.) A later patch will call them
asynchronously when QMP is involved.

The userdata lock (json_lock) used to protect against concurrent change
of cdrom is replaced by an ev_lock which can be held across different
CTX_LOCK sections. The json_lock is still used when reading/modifying
the domain userdata (mandatory) and updating xenstore (mostly because
it's updated at the same time as the userdata).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_disk: Reorganise libxl_cdrom_insert

This is in preparation of cutting libxl_cdrom_insert into several
functions to allow asynchronous callbacks.

No functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Add optimisation to ev_lock

It will often be the case that the lock is free to grab. So we first
try to grab it before we have to fork. Even though in this case the
locks are grabbed in the wrong order in the lock hierarchy (ev_lock
should be outside of CTX_LOCK), it is fine to try without blocking. If
that fails, we will release CTX_LOCK and try to grab both locks again
in the right order.

That optimisation is only enabled in releases (debug=n) so the more
complicated code with fork is actually exercised.
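
The try-first idea can be illustrated with a minimal sketch. It is not
the libxl implementation: toy_lock, ev_lock_acquire_sketch() and the
fast_path_enabled flag are hypothetical names, the locks are plain
flags, and the blocking slow path (fork and wait) is left to the
caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock standing in for both CTX_LOCK and the ev_lock (hypothetical). */
struct toy_lock {
    bool held;
};

/* Non-blocking attempt: returns true only if the lock was free. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = false;
}

enum acquire_path { FAST_PATH, SLOW_PATH };

/*
 * Called with ctx_lock held.
 *
 * Fast path: try the ev_lock without blocking. Taking it "inside"
 * ctx_lock is out of order, but a non-blocking attempt cannot deadlock.
 *
 * Slow path: drop ctx_lock and tell the caller to take both locks
 * again in the right order (ev_lock first, then ctx_lock).
 */
static enum acquire_path ev_lock_acquire_sketch(struct toy_lock *ctx_lock,
                                                struct toy_lock *ev_lock,
                                                bool fast_path_enabled)
{
    assert(ctx_lock->held);

    if (fast_path_enabled && toy_trylock(ev_lock))
        return FAST_PATH;

    toy_unlock(ctx_lock);
    return SLOW_PATH;
}
```

Passing fast_path_enabled=false mirrors the debug build behaviour
described above, where the slow path is always taken so that it stays
exercised.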

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_internal: Introduce libxl__ev_devlock for devices hotplug via QMP

The current lock `domain_userdata_lock' can't be used when a guest is
modified by sending commands to QEMU: this is a slow process that
requires calling CTX_UNLOCK, which is not possible while holding
the `domain_userdata_lock'.

To resolve this issue, we create a new lock which can take over part
of the job of the json_lock.

This lock is outside CTX_LOCK in the lock hierarchy.
libxl__ev_devlock_lock will have CTX_UNLOCK before trying to grab the
ev_devlock. The callback is used to notify when the ev_devlock has
been acquired.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: Pointer on usage of libxl__domain_userdata_lock

It is currently difficult to know how/when/why the userdata lock is
supposed to be used. Add some pointers to the hotplug comments.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl_internal: Remove lost comment

That comment has been separated from the function it defines by
4197d3abbb3055d3798254eb7ba239bfb5824360, but then was not useful
anymore when the libxl__device_disk_add() prototype was removed by
22ea8ad02e465e32cd40887c750b55c3a997a288.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/configure: Allow specifying python to be found from path

./configure takes a PYTHON=... argument.  You can use this to specify
the python interpreter.  However, for no good reason, it expects an
absolute path.

Fix this.  The new logic is:
* if not set, default to `python'
* if not absolute, look it up with type -p
* split into directory and executable name

The results in config/Tools.mk (which contains @PYTHON@ and
@PYTHONPATH@) are identical for both
  ./configure
  ./configure PYTHON=/usr/bin/python
so I assert this has no functional change except that now you can say
  ./configure PYTHON=python

In particular you can now say
  ./configure PYTHON=python2
  ./configure PYTHON=python3

The latter is useful if you want python3 (which should probably be the
default, but does not work right now).  The former is useful if you
want python2 but your distro has foolishly made "python" refer to
python3.

CC: Doug Goldstein <cardoe@cardoe.com>
CC: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>

iommu/arm: Order the headers alphabetically in iommu.c

Clean up the code a bit by putting the headers in alphabetical order.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

x86: Misc trivial cleanup of bootsym_phys()

In smpboot, there is no need to abstract setup_trampoline() away.  Drop the
define and use bootsym_phys() directly.

In tboot, the 3 size calculations are invariant of their bootsym_phys()/__pa()
transformations, but the compiler can't tell this.  Drop the transformations,
which simplifies the compiled function.

  add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-134 (-134)
  Function                                     old     new   delta
  tboot_shutdown                               620     486    -134
  Total: Before=3337042, After=3336908, chg -0.00%

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

tools/arm: optee: create optee firmware node in DT if tee=optee

If TEE support is enabled with "tee=optee" option in xl.cfg,
then we need to inform guest about available TEE, by creating
corresponding node in the guest's device tree.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/arm: tee: add "tee" option for xl.cfg

This enumeration controls TEE type for a domain. Currently there are
two possible options: either 'none' or 'optee'.

'none' is the default value and it basically disables TEE support
entirely.

'optee' enables access to the OP-TEE running on a host machine. This
requires special OP-TEE build with virtualization support enabled.
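
As an illustration only, mapping the option string to an enumeration
could look like the sketch below. The enum and function names are
hypothetical and are not the identifiers used by libxl or xl.

```c
#include <string.h>

/* Hypothetical names; 'none' is the default, as described above. */
enum sketch_tee_type {
    SKETCH_TEE_NONE = 0,
    SKETCH_TEE_OPTEE = 1
};

/*
 * Parse the value of a "tee" option.
 * A missing option (NULL) falls back to the default 'none'.
 * Returns 0 on success, -1 for an unknown value (*out is left untouched).
 */
static int parse_tee_sketch(const char *value, enum sketch_tee_type *out)
{
    if (value == NULL || strcmp(value, "none") == 0) {
        *out = SKETCH_TEE_NONE;
        return 0;
    }
    if (strcmp(value, "optee") == 0) {
        *out = SKETCH_TEE_OPTEE;
        return 0;
    }
    return -1;
}
```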

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86: PCID is unused when !PV

This allows in particular some streamlining of the TLB flushing code
paths.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/CPUID: drop INVPCID dependency on PCID

PCID validly depends on LM, as it can be enabled in Long Mode only.
INVPCID, otoh, can be used not only without PCID enabled, but also
outside of Long Mode altogether. In both cases its functionality is
simply restricted to PCID 0, which is sort of expected as no other PCID
can be activated there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/HVM: relax shadow mode check in hvm_set_cr3()

There's no need to re-obtain a page reference if only bits not affecting
the address change.
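
A minimal sketch of the idea, assuming the page address occupies bits
51:12 of CR3 (the mask and function name below are hypothetical, not
the ones used in hvm_set_cr3()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 51:12 of CR3 hold the page-table base address. */
#define SKETCH_CR3_ADDR_MASK 0x000ffffffffff000ULL

/*
 * A new page reference is only needed when the address part of CR3
 * changes. Changes confined to other bits (for example the low twelve
 * bits or bit 63) still refer to the same page.
 */
static bool cr3_needs_new_page_ref(uint64_t old_cr3, uint64_t new_cr3)
{
    return ((old_cr3 ^ new_cr3) & SKETCH_CR3_ADDR_MASK) != 0;
}
```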

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: limit the amount of TLB flushing in switch_cr3_cr4()

We really need to flush the TLB just once, if we do so with or after the
CR3 write. The only case where two flushes are unavoidable is when we
mean to turn off CR4.PGE (perhaps just temporarily; see the code
comment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: adjust cr3_pcid() return type

There's no need for it to be 64 bits wide - only the low twelve bits
of CR3 hold the PCID.
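
For illustration, extracting the PCID needs nothing wider than an
unsigned int. The names below are hypothetical stand-ins for the real
helper:

```c
#include <stdint.h>

/* The PCID lives in bits 11:0 of CR3. */
#define SKETCH_CR3_PCID_MASK 0xfffULL

/* Twelve bits always fit in an unsigned int, so no 64-bit return type. */
static inline unsigned int cr3_pcid_sketch(uint64_t cr3)
{
    return (unsigned int)(cr3 & SKETCH_CR3_PCID_MASK);
}
```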

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86emul: treat Hygon guests like AMD ones

For some reason the Hygon enabling series left out the insn emulator.
Make appropriate adjustments wherever we've been special casing AMD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>

core-parking: interact with runtime SMT-disabling

When disabling SMT at runtime, secondary threads should no longer be
candidates for bringing back up in response to _PUR ACPI events. Purge
them from the tracking array.

Doing so involves adding locking to guard accounting data in the core
parking code. While adding the declaration for the lock, take the
liberty to drop two unnecessary forward function declarations.
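
The purge step can be pictured as an in-place compaction of the
tracking array. This is a sketch under assumptions: the function names
are hypothetical, the array is a plain list of CPU numbers, and the
"odd CPU number means secondary thread" predicate is a toy stand-in
for the real topology check. The caller is assumed to hold the lock
guarding the array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy predicate (assumption): treat odd-numbered CPUs as secondary threads. */
static bool sketch_is_secondary_thread(unsigned int cpu)
{
    return (cpu & 1u) != 0;
}

/*
 * Remove every CPU for which should_purge() returns true, keeping the
 * relative order of the remaining entries. Returns the new entry count.
 * The caller must hold the lock that guards the array.
 */
static size_t purge_parked_cpus(unsigned int *cpus, size_t count,
                                bool (*should_purge)(unsigned int cpu))
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (!should_purge(cpus[i]))
            cpus[kept++] = cpus[i];
    }

    return kept;
}
```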

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/libs: Fix build following c/s 56dccee3f, take 2

The fix for c/s 01ba8f62b618 was speculative given no local repro. It turns
out that it didn't fix the problem.

The $(AUTOINCS) variable needs to be visible before libs.mk is included, to
have any effect.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/libs: Fix build following c/s 56dccee3f

Travis reports:

  make subdirs-install
  make[2]: Entering directory `/home/travis/build/andyhhp/xen/tools'
  make[3]: Entering directory `/home/travis/build/andyhhp/xen/tools'
  make -C libs install
  make[4]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
  make[5]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
  make -C toolcore install
  make[6]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs/toolcore'
  make libs
  make[7]: Entering directory`/home/travis/build/andyhhp/xen/tools/libs/toolcore'
  for i in include/xentoolcore.h include/xentoolcore_internal.h; do \
          gcc -x c -ansi -Wall -Werror -I<snip>/xen/tools/libs/toolcore/../../../tools/include \
                    -S -o /dev/null $i || exit 1; \
                        echo $i; \
                        done >headers.chk.new
  include/xentoolcore_internal.h:30:31: fatal error: _xentoolcore_list.h: No such file or directory
   #include "_xentoolcore_list.h"
                                 ^
  compilation terminated.
  make[7]: *** [headers.chk] Error 1

The problem is that xentoolcore_internal.h includes _xentoolcore_list.h which
hasn't been generated yet.

The toolcore headers.chk rule (unlike the other libraries) had an additional
dependency against $(AUTOINCS), which forced the headers to be generated
first.  Replicate this in the common libs.mk

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen/arm: Zero BSS after the MMU and D-cache is turned on

At the moment BSS is zeroed before the MMU and D-Cache are turned on.
In other words, the cache will be bypassed when zeroing the BSS section.

On Arm64, per the Image protocol [1], the state of the cache for BSS region
is not known because it is not part of the "loaded kernel image".

On Arm32, the boot protocol [2] does not mention anything about the
state of the cache. Therefore, it should be assumed that it is not known
for BSS region.

This means that the cache will need to be invalidated twice for the BSS
region:
    1) Before zeroing to remove any dirty cache line. Otherwise they may
    get evicted while zeroing and therefore overwrite the value.
    2) After zeroing to remove any cache line that may have been
    speculated. Otherwise when turning on MMU and D-Cache, the CPU may
    see old values.

At the moment, the only reason to have BSS zeroed early is because the
boot page tables are part of it. To avoid the two cache invalidations,
it would be better if the boot page tables are part of the "loaded
kernel image" and therefore be zeroed when loading the image into
memory. A good candidate is the section .data.page_aligned.

A new macro DEFINE_BOOT_PAGE_TABLE is introduced to create and mark
page-tables used before BSS is zeroed. This includes all boot_* but also
xen_fixmap as zero_bss() will print a message when earlyprintk is
enabled.
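
A rough sketch of what such a macro can look like, assuming GCC or
Clang on an ELF target, 4 KiB pages and 512 64-bit entries per table.
The SKETCH_* names and the uint64_t entry type are placeholders rather
than the definitions used in Xen:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096 /* assumed page size */
#define SKETCH_LPAE_ENTRIES 512  /* assumed entries per table */

/*
 * Place a page-aligned table in .data.page_aligned instead of BSS, so
 * that it belongs to the loaded image and is already zero when the
 * image is in memory.
 */
#define DEFINE_BOOT_PAGE_TABLE_SKETCH(name)                   \
    uint64_t name[SKETCH_LPAE_ENTRIES]                        \
        __attribute__((aligned(SKETCH_PAGE_SIZE),             \
                       section(".data.page_aligned")))

/* Example table; the name is hypothetical. */
DEFINE_BOOT_PAGE_TABLE_SKETCH(boot_pgtable_sketch);
```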

[1] linux/Documentation/arm64/booting.txt
[2] linux/Documentation/arm/Booting

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Setup HTTBR in enable_mmu() and add missing isb

At the moment, HTTBR is set up in create_page_tables(). This is fine as
it is called by every CPU.

However, such an assumption may not hold in the future. To make changes
easier, the HTTBR is now set up in enable_mmu().

Take the opportunity to add the missing isb() to ensure the HTTBR is
seen before the MMU is turned on.

Lastly, the only use of r5 in create_page_tables() is now removed. So
the register can be removed from the clobber list of the function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm32: head: Rework and document launch()

Boot CPU and secondary CPUs will use different entry points to C code. At
the moment, the decision on which entry to use is taken within launch().

In order to avoid using conditional instructions and make the call
clearer, launch() is reworked to take as parameters the entry point and its
arguments.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

drivers/acpi: Drop "ERST table was not found" message

ERST isn't a mandatory table, and also isn't very common to find. The message
is unnecessary noise during boot. Furthermore, it is redundant with the list
of found ACPI tables printed just ahead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/vpmu: Drop "VPMU: disabled" message

Printing "$foo disabled" is unnecessary noise during boot. All other VPMU
settings emit a message, so this doesn't result in any ambiguity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

tools/libs: put common Makefile parts into new libs.mk

The Makefiles below tools/libs have a lot in common. Put those common
parts into a new libs.mk and include that from the specific Makefiles.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

vpci: honor read-only devices

Don't allow the hardware domain write access to the PCI config space of
devices marked as read-only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

sysctl / libxl: report whether IOMMU/HAP page table sharing is supported

This patch defines a new bit reported in the hw_cap field of struct
xen_sysctl_physinfo to indicate whether the platform supports sharing of
HAP page tables (i.e. the P2M) with the IOMMU. This informs the toolstack
whether the domain needs extra memory to store discrete IOMMU page tables
or not.

NOTE: This patch makes sure iommu_hap_pt_shared is clear if HAP is not
supported or the IOMMU is disabled, and defines it to false if
!CONFIG_HVM.
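
A minimal sketch of how such a bit might be produced and consumed. The
bit position and all names are hypothetical; only the rule that the
flag is clear when HAP is unsupported or the IOMMU is disabled comes
from the note above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the new capability flag. */
#define SKETCH_PHYSCAP_IOMMU_HAP_PT_SHARE (1u << 3)

/*
 * Hypervisor side: report sharing only if HAP is supported, the IOMMU
 * is enabled and sharing was not turned off. Otherwise clear the bit.
 */
static uint32_t report_pt_share_cap(uint32_t caps, bool hap_supported,
                                    bool iommu_enabled, bool share_wanted)
{
    if (hap_supported && iommu_enabled && share_wanted)
        caps |= SKETCH_PHYSCAP_IOMMU_HAP_PT_SHARE;
    else
        caps &= ~SKETCH_PHYSCAP_IOMMU_HAP_PT_SHARE;

    return caps;
}

/*
 * Toolstack side: discrete IOMMU page tables, and hence extra domain
 * memory, are needed whenever sharing is not reported.
 */
static bool needs_discrete_iommu_tables(uint32_t caps)
{
    return (caps & SKETCH_PHYSCAP_IOMMU_HAP_PT_SHARE) == 0;
}
```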

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <julien.grall@arm.com>