Daniel Sabogal [Fri, 25 Aug 2017 21:35:47 +0000 (17:35 -0400)]
libxl/arm: Fix build on arm64 + acpi
With musl, the build fails with the following errors:
actypes.h:202:2: error: #error unknown ACPI_MACHINE_WIDTH
#error unknown ACPI_MACHINE_WIDTH
^~~~~
actypes.h:207:9: error: unknown type name ‘acpi_native_uint’
typedef acpi_native_uint acpi_size;
^~~~~~~~~~~~~~~~
actypes.h:617:3: error: unknown type name ‘acpi_io_address’
acpi_io_address pblk_address;
^~~~~~~~~~~~~~~
This likely went undetected with glibc builds since glibc
indirectly pulls __BITS_PER_LONG from the linux headers
through a standard header. For musl, this is not the case.
Instead, use BITS_PER_LONG to fix the build.
Signed-off-by: Daniel Sabogal <dsabogalcc@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Tue, 29 Aug 2017 08:50:24 +0000 (09:50 +0100)]
acpi: set correct address of the control/event blocks in the FADT
Commit 149c6b unmasked an issue long present in Xen: the control/event
block addresses provided in the ACPI FADT table were hardcoded to the
V1 version. This was papered over because hvmloader would also always
set HVM_PARAM_ACPI_IOPORTS_LOCATION to 1 regardless of the BIOS
version.
The most notable issue caused by the above bug was that the QEMU
traditional GPE0 block was out of sync: the address provided in the
FADT didn't match the address QEMU was using.
Note that PM1a and TMR worked fine because the V1 address was
hardcoded in the FADT and HVM_PARAM_ACPI_IOPORTS_LOCATION was
unconditionally set to 1 by hvmloader.
Fix this by passing the address of the control/event blocks to
acpi_build_tables, so the values can be properly set in the FADT table
provided to the guest.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Basically, what happens is that runq_tickle() realizes
d0v13 should preempt d2v7, running on cpu 12, as it
has higher credits (10135529 vs. 2619231). It therefore
tickles cpu 12 [1], which, in turn, schedules [2].
But --surprise surprise-- d2v7 has run for less than the
ratelimit interval [3], and hence it is _not_ preempted,
and continues to run. This indeed looks fine. Actually,
this is what ratelimiting is there for. Note, however,
that:
1) we interrupted cpu 12 for nothing;
2) what if, say on cpu 8, there is a vcpu that has:
+ less credit than d0v13 (so d0v13 can well
preempt it),
+ more credit than d2v7 (that's why it was not
selected to be preempted),
+ run for more than the ratelimiting interval
(so it can really be scheduled out)?
With this patch, if we are in case 2), we'd realize
that tickling 12 would be pointless, and we'll continue
looking, eventually finding and tickling 8.
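The selection described above can be sketched as follows. This is only
an illustration: struct cpu_state and pick_cpu_to_tickle() are
hypothetical names, not the data structures or functions touched by
the patch, and the "lowest credit wins" rule is an assumption.

```c
#include <stdint.h>

/* Hypothetical per-cpu snapshot; not Xen's real data structures. */
struct cpu_state {
    int64_t credit;      /* credit of the vcpu currently running there */
    int64_t ran_for_us;  /* how long that vcpu has been running */
};

/*
 * Return the index of the cpu to tickle for a waking vcpu holding
 * 'new_credit', or -1 if no cpu qualifies. A cpu is skipped when its
 * current vcpu has not yet run for 'ratelimit_us', because it would
 * not be preempted anyway. Among the rest, pick the lowest credit.
 */
static int pick_cpu_to_tickle(const struct cpu_state *cpus, int nr_cpus,
                              int64_t new_credit, int64_t ratelimit_us)
{
    int best = -1;

    for (int i = 0; i < nr_cpus; i++) {
        if (cpus[i].credit >= new_credit)
            continue;   /* the waking vcpu cannot preempt it */
        if (cpus[i].ran_for_us < ratelimit_us)
            continue;   /* ratelimiting would block the preemption */
        if (best < 0 || cpus[i].credit < cpus[best].credit)
            best = i;
    }
    return best;
}
```

With a ratelimit in place, the cpu running the lowest-credit vcpu is
passed over if that vcpu has only just started running, and the search
continues with the other cpus.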
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Tue, 29 Aug 2017 09:18:52 +0000 (10:18 +0100)]
xen: credit2: optimize runq_candidate() a little bit
By factoring into one (at the top) all the checks
to see whether current is the idle vcpu, and marking
it as unlikely().
In fact, if current is idle, all the logic for
dealing with yielding, context switching rate
limiting and soft-affinity is just pure overhead,
and we had better go straight to checking the runq
and picking some vcpu up.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Tue, 29 Aug 2017 09:18:51 +0000 (10:18 +0100)]
xen: credit2: kick away vcpus not running within their soft-affinity
If, during scheduling, we realize that the current vcpu
is running outside of its own soft-affinity, it would be
preferable to send it somewhere else.
Of course, that may not be possible, and if we're too
strict, we risk having vcpus sit in runqueues, even if
there are idle pcpus (violating work-conservingness).
In fact, what if there are no pcpus, in the soft
affinity mask of the vcpu in question, where it can
run?
To make sure we don't fall in the above described trap,
only actually de-schedule the vcpu if there are idle and
not already tickled cpus from its soft affinity where it
can run immediately.
If there is at least one such cpu, we let current
be preempted, so that csched2_context_saved() will put
it back in the runq, and runq_tickle() will wake (one
of) those cpus.
If there is not even one, we let current run where it is,
as running outside its soft-affinity is still better than
not running at all.
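A minimal sketch of that decision, assuming cpu masks fit in a 64-bit
bitmap and that the soft-affinity mask has already been restricted to
cpus where the vcpu is allowed to run. should_kick_current() is a
hypothetical helper, not a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the current vcpu should be de-scheduled so that it
 * can move into its soft-affinity. Bit N of each mask stands for
 * cpu N; cur_cpu must be below 64.
 */
static bool should_kick_current(uint64_t soft_affinity, uint64_t idle,
                                uint64_t tickled, unsigned int cur_cpu)
{
    /* Already running inside its soft-affinity: nothing to do. */
    if (soft_affinity & (UINT64_C(1) << cur_cpu))
        return false;

    /* Only kick if an idle, not yet tickled cpu can take it right away. */
    return (soft_affinity & idle & ~tickled) != 0;
}
```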
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Tue, 29 Aug 2017 09:18:51 +0000 (10:18 +0100)]
xen: credit2: soft-affinity awareness in csched2_cpu_pick()
We want to find the runqueue with the least average load,
and to do that, we scan through all the runqueues.
It is, therefore, enough that, during such scan:
- we identify the runqueue with the least load, among
the ones that have pcpus that are part of the soft
affinity of the vcpu we're calling pick on;
- we identify the same, but for hard affinity.
At this point, we can decide whether to go for the
runqueue with the least load among the ones with some
soft-affinity, or overall.
Therefore, at the price of some code reshuffling, we
can avoid the loop.
(Also, kill a spurious ';' in the definition of MAX_LOAD.)
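The single scan can be sketched like this. The struct, the LOAD_MAX
constant and the final "prefer soft-affinity when a candidate exists"
rule are assumptions made for illustration; the patch may decide
between the two candidates differently.

```c
#include <stdint.h>

#define LOAD_MAX INT64_MAX  /* hypothetical sentinel */

/* Hypothetical runqueue summary, as seen by one particular vcpu. */
struct rq_info {
    int64_t avgload;
    int has_soft;   /* runqueue has pcpus in the vcpu's soft-affinity */
    int has_hard;   /* runqueue has pcpus in the vcpu's hard-affinity */
};

/* One pass: track the least loaded runqueue for both affinities. */
static int pick_runqueue(const struct rq_info *rqs, int nr)
{
    int64_t min_soft = LOAD_MAX, min_hard = LOAD_MAX;
    int soft_rq = -1, hard_rq = -1;

    for (int i = 0; i < nr; i++) {
        if (!rqs[i].has_hard)
            continue;   /* the vcpu cannot run there at all */
        if (rqs[i].has_soft && rqs[i].avgload < min_soft) {
            min_soft = rqs[i].avgload;
            soft_rq = i;
        }
        if (rqs[i].avgload < min_hard) {
            min_hard = rqs[i].avgload;
            hard_rq = i;
        }
    }
    /* Assumed policy: take the soft-affinity candidate if there is one. */
    return soft_rq >= 0 ? soft_rq : hard_rq;
}
```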
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Tue, 29 Aug 2017 09:18:50 +0000 (10:18 +0100)]
xen: credit2: soft-affinity awareness in get_fallback_cpu()
By, basically, moving all the logic of the function
inside the usual two steps (soft-affinity step and
hard-affinity step) loop.
While there, add two performance counters (in cpu_pick
and in get_fallback_cpu() itself), in order to be able
to tell how frequently it happens that we need to look
for a fallback cpu.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
George Dunlap [Tue, 29 Aug 2017 09:18:49 +0000 (10:18 +0100)]
xen/credit2: soft-affinity awareness in runq_tickle()
Soft-affinity support is usually implemented by means
of a two step "balancing loop", where:
- during the first step, we consider soft-affinity
(if the vcpu has one);
- during the second (if we get to it), we consider
hard-affinity.
In runq_tickle(), we need to do that for checking
whether we can execute the waking vCPU on a pCPU
that is idle. In fact, we want to be sure that, if
there is an idle pCPU in the vCPU's soft affinity,
we'll use it.
If there are no such idle pCPUs, though, and we
have to check non-idle ones, we can avoid the loop
and do both hard and soft-affinity in one pass.
In fact, we can scan the runqueue and compute a
"score" for each vCPU which is running on each pCPU.
The idea is, since we may have to preempt someone:
- try to make sure that the waking vCPU will run
inside its soft-affinity,
- try to preempt someone that is running outside
of its own soft-affinity.
The value of the score is added to a trace record,
so xenalyze's code and tools/xentrace/formats are
updated accordingly.
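One way to picture such a score is shown below. The bonus value, the
function name and the exact formula are assumptions for illustration
only and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SOFT_AFF_BONUS 1000  /* hypothetical weight */

/*
 * Score for preempting the vCPU currently running on a pCPU.
 * Returns -1 when the waking vCPU does not have more credit, i.e.
 * when preemption is not an option. Higher scores are better.
 */
static int64_t tickle_score(int64_t new_credit, int64_t cur_credit,
                            bool new_in_soft, bool cur_in_soft)
{
    int64_t score = new_credit - cur_credit;

    if (score <= 0)
        return -1;
    if (new_in_soft)
        score += SOFT_AFF_BONUS;  /* waker would run in its soft-affinity */
    if (!cur_in_soft)
        score += SOFT_AFF_BONUS;  /* victim is outside its soft-affinity */
    return score;
}
```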
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Juergen Gross [Mon, 28 Aug 2017 14:49:30 +0000 (16:49 +0200)]
xen: fix boolean parameter handling
Commit 63e8a1e5ffa7a7fdbde887805f673fea7e8d2e94 ("xen: check parameter
validity when parsing command line") introduced a bug for the case
when a boolean parameter was specified by its keyword only (no value).
It would set just the wrong boolean value for that parameter.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Mon, 28 Aug 2017 07:35:00 +0000 (09:35 +0200)]
xen: add hypercall for setting parameters at runtime
Add a sysctl hypercall to support setting parameters similar to
command line parameters, but at runtime. The parameters to set are
specified as a string, just like the boot parameters.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 28 Aug 2017 07:35:00 +0000 (09:35 +0200)]
xen: add basic support for runtime parameter changing
Add the needed infrastructure for runtime parameter changing similar
to that used at boot time via cmdline. We are using the same parsing
functions as for cmdline parsing, but with a different array of
parameter definitions.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Mon, 28 Aug 2017 07:35:00 +0000 (09:35 +0200)]
xen: carve out a generic parsing function from _cmdline_parse()
In order to support generic parameter parsing carve out the parser from
_cmdline_parse(). As this generic function might be called after boot
remove the __init annotations from all called sub-functions.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
With _cmdline_parse() now issuing error messages in case of illegal
parameters signalled by parsing functions specified in custom_param()
the message issued by parse_credit2_runqueue() can be removed.
With _cmdline_parse() now issuing error messages in case of illegal
parameters signalled by parsing functions specified in custom_param()
the message issued by setup_ioapic_ack() can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
With _cmdline_parse() now issuing error messages in case of illegal
parameters signalled by parsing functions specified in custom_param()
the message issued by parse_viridian_version() can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
With _cmdline_parse() now issuing error messages in case of illegal
parameters signalled by parsing functions specified in custom_param()
the message issued by mce_set_verbosity() can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
With _cmdline_parse() now issuing error messages in case of illegal
parameters signalled by parsing functions specified in custom_param()
the message issued by apic_set_verbosity() can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Mon, 28 Aug 2017 07:34:00 +0000 (09:34 +0200)]
xen: check parameter validity when parsing command line
Where possible check validity of parameters in _cmdline_parse() and
issue a warning message in case of an error detected.
In order to make sure a custom parameter parsing function really
returns a value (error or success), don't use a void pointer for
storing the function address, but a proper typed function pointer.
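A small sketch of the idea, with hypothetical names (param_fn,
parse_level(), handle_param()) rather than the real custom_param()
machinery: once the table stores a typed function pointer, every
handler has to return a status that the caller can check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Typed handler: a handler returning void no longer fits the table. */
typedef int (*param_fn)(const char *val);

struct param {
    const char *name;
    param_fn func;
};

/* Hypothetical handler: accepts a single digit '0'..'9'. */
static int parse_level(const char *val)
{
    if (val[0] < '0' || val[0] > '9' || val[1] != '\0')
        return -EINVAL;
    return 0;
}

static const struct param params[] = {
    { "level", parse_level },
};

/* Look up 'name'; return the handler's result, or -EINVAL if unknown. */
static int handle_param(const char *name, const char *val)
{
    for (size_t i = 0; i < sizeof(params) / sizeof(params[0]); i++)
        if (!strcmp(params[i].name, name))
            return params[i].func(val);
    return -EINVAL;
}
```

A caller can then print a warning whenever handle_param() returns a
negative value.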
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Xiong Zhang [Mon, 28 Aug 2017 08:51:24 +0000 (10:51 +0200)]
hvmloader: use base instead of pci_mem_start for find_next_rmrr()
find_next_rmrr(base) is used to find the lowest RMRR ending above base
but below 4G. The current method does not cover the following situation:
a. two rmrrs exist, with a small gap between them
b. pci_mem_start and mem_resource.base are below the first rmrr.base
c. find_next_rmrr(pci_mem_start) will find the first rmrr
d. After aligning mem_resource.base to bar size,
first_rmrr.end < new_base < second_rmrr.base and
new_base + bar_sz > second_rmrr.base.
So the new bar will overlap with the second rmrr but not with the
first one.
However, next_rmrr still points to the first rmrr, so check_overlap()
cannot find the overlap, and a wrong address is assigned to the bar.
This patch uses the aligned new base to find the next rmrr, which fixes
the above case and finds all the rmrrs overlapping with the new base.
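The fixed lookup order can be sketched as below. The range type and
the helpers find_next_range() and place_bar() are hypothetical, the 4G
limit is left out, and bar_sz is assumed to be a power of two.

```c
#include <stdint.h>

/* Reserved range [start, end); stands in for an RMRR. */
struct range {
    uint64_t start, end;
};

/* Index of the range with the lowest end above 'base', or -1. */
static int find_next_range(const struct range *r, int n, uint64_t base)
{
    int best = -1;

    for (int i = 0; i < n; i++)
        if (r[i].end > base && (best < 0 || r[i].end < r[best].end))
            best = i;
    return best;
}

/* Place a BAR of size 'bar_sz' at or above 'base', avoiding all ranges. */
static uint64_t place_bar(const struct range *r, int n,
                          uint64_t base, uint64_t bar_sz)
{
    for (;;) {
        int next;

        base = (base + bar_sz - 1) & ~(bar_sz - 1);  /* align first ... */
        next = find_next_range(r, n, base);          /* ... then look up */
        if (next < 0 || base + bar_sz <= r[next].start)
            return base;
        base = r[next].end;  /* overlap: retry just past that range */
    }
}
```

Looking up the next range from the unaligned base would return the
first range in the scenario above and miss the overlap with the second.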
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 28 Aug 2017 08:50:29 +0000 (10:50 +0200)]
x86/EFI: warn about r/o sections requiring relocations
EFI implementations may write-protect r/o sections, but we need to
apply relocations. Eliminate the one present case of a r/o section
with relocations (.init.text, which is now being combined with
.init.data into just .init).
Also correct a few other format strings (to account for the possibly
missing NUL in section names) in mkreloc.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 28 Aug 2017 08:48:55 +0000 (10:48 +0200)]
passthrough: give XEN_DOMCTL_test_assign_device more sane semantics
So far callers of the libxc interface passed in a domain ID which was
then ignored in the hypervisor. Instead, make the hypervisor honor it
(accepting DOMID_INVALID to obtain original behavior), allowing callers to
query whether a device can be assigned to a particular domain. Do this
by folding the assign and test-assign paths.
Drop XSM's test_assign_{,dt}device hooks as no longer being
individually useful.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 25 Aug 2017 16:11:25 +0000 (18:11 +0200)]
xen: fix parse_bool() with empty string
parse_bool() should return -1 in case it is called with an empty
string. In order to allow boolean parameters in the cmdline without
specifying a value this case must be handled in _cmdline_parse() by
always passing a value string.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 23 Aug 2017 18:01:02 +0000 (19:01 +0100)]
x86/mm: Introduce and use l?e_{get,from}_mfn()
This avoids the explicit boxing/unboxing of mfn_t in relevant codepaths.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 23 Aug 2017 18:01:02 +0000 (19:01 +0100)]
x86/mm: Replace opencoded forms of map_l?t_from_l?e()
No functional change (confirmed by diffing the disassembly).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 23 Aug 2017 18:01:02 +0000 (19:01 +0100)]
x86/mm: Replace opencoded forms of l?e_{get,from}_page()
No functional change (confirmed by diffing the disassembly).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 23 Aug 2017 16:47:42 +0000 (16:47 +0000)]
x86/pv: Minor improvements to guest_get_eff_{,kern}_l1e()
* These functions work in terms of linear addresses, not virtual addresses.
Update the comments and parameter names.
* Drop unnecessary inlines.
* Drop vcpu parameter from guest_get_eff_kern_l1e(). Its sole caller passes
current, and its callee strictly operates on current.
* Switch guest_get_eff_kern_l1e()'s parameter from void * to l1_pgentry_t *.
Both its caller and callee already use the correct type.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Fri, 25 Aug 2017 14:42:01 +0000 (16:42 +0200)]
x86/vlapic: apply change to TDCR right away to the timer
The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."
Observation of baremetal showed that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.
The patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.
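The behavior described above can be modelled roughly as follows. This
is a simplified sketch, not the vlapic code: struct apic_timer and its
helpers are hypothetical, time is in nanoseconds, and partial ticks
are dropped on each sync.

```c
#include <stdint.h>

/* Hypothetical model of a down-counting timer with a divider. */
struct apic_timer {
    uint32_t counter;        /* current count (TMCCT-like) */
    uint32_t divisor;        /* divide configuration value */
    uint64_t last_update_ns; /* when 'counter' was last brought up to date */
    uint32_t bus_cycle_ns;   /* duration of one undivided tick */
};

/* Bring the counter up to date using the divisor valid so far. */
static void timer_sync(struct apic_timer *t, uint64_t now_ns)
{
    uint64_t tick_ns = (uint64_t)t->bus_cycle_ns * t->divisor;
    uint64_t ticks = (now_ns - t->last_update_ns) / tick_ns;

    t->counter = ticks >= t->counter ? 0 : t->counter - (uint32_t)ticks;
    t->last_update_ns = now_ns;
}

/* Changing the divisor keeps the count and only changes the rate. */
static void timer_set_divisor(struct apic_timer *t, uint64_t now_ns,
                              uint32_t divisor)
{
    timer_sync(t, now_ns);
    t->divisor = divisor;
}

/* Time left until the counter reaches 0 at the current rate. */
static uint64_t timer_ns_to_expiry(const struct apic_timer *t)
{
    return (uint64_t)t->counter * t->bus_cycle_ns * t->divisor;
}
```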
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Fri, 25 Aug 2017 14:41:37 +0000 (16:41 +0200)]
x86/vlapic: keep timer running when switching between one-shot and periodic mode
If we take TSC-deadline mode timer out of the picture, the Intel SDM
does not say that the timer is disabled when the timer mode is changed,
either from one-shot to periodic or vice versa.
After this patch, the timer is no longer disarmed on change of mode, so
the counter (TMCCT) keeps counting down.
So what does a write to LVTT change? On baremetal, the change of mode
is probably taken into account only when the counter reaches 0. When this
happens, LVTT is used to figure out if the counter should restart counting
down from TMICT (so periodic mode) or stop counting (if one-shot mode).
This also means that if the counter reaches 0 and the mode is one-shot, a
change to periodic would not restart the timer. This is achieved by
setting vlapic->timer_last_update=0.
This patch is based on observation of the behavior of the APIC timer on
baremetal, as well as a check that it does not go against the description
written in the Intel SDM.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 23 Aug 2017 17:34:00 +0000 (19:34 +0200)]
xen: add an optional string end parameter to parse_bool()
Add a parameter to parse_bool() to specify the end of the string to be
parsed. Specifying it as NULL will preserve the current
behavior to parse until the end of the input string, while passing
a non-NULL pointer will specify the first character after the input
string.
This allows parsing boolean sub-strings without having to
write a NUL byte into the input string.
Modify all users of parse_bool() to pass NULL for the new parameter.
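A minimal sketch of a parser with such an end parameter is given
below. parse_bool_sketch() is a hypothetical stand-in, and the list of
accepted keywords is an assumption rather than a copy of what
parse_bool() accepts.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse a boolean from [s, e). If e is NULL, parse up to the
 * terminating NUL. Returns 1 for true, 0 for false and -1 for
 * anything else, including the empty string.
 */
static int parse_bool_sketch(const char *s, const char *e)
{
    static const char *const yes[] = { "yes", "on", "true", "enable", "1" };
    static const char *const no[] = { "no", "off", "false", "disable", "0" };
    size_t len = e ? (size_t)(e - s) : strlen(s);

    for (size_t i = 0; i < sizeof(yes) / sizeof(yes[0]); i++) {
        if (strlen(yes[i]) == len && !strncmp(s, yes[i], len))
            return 1;
        if (strlen(no[i]) == len && !strncmp(s, no[i], len))
            return 0;
    }
    return -1;
}
```

With a string such as "off,other", passing a pointer to the comma as
the end lets the caller parse "off" without modifying the input.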
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Woodhouse [Fri, 25 Aug 2017 12:07:40 +0000 (14:07 +0200)]
x86/efi: don't write relocations in efi_arch_relocate_image() first pass
The function is invoked with delta=0 before ExitBootServices() is called,
as a dummy run purely to validate that all the relocations can be handled.
This allows us to exit gracefully with an error message.
However, we have relocations in read-only sections such as .rodata and
.init.te(xt). Recent versions of UEFI will actually make those sections
read-only, which will cause a fault. This functionality was added in
EDK2 commit d0e92aad4 ("MdeModulePkg/DxeCore: Add UEFI image protection.")
It's OK to actually make the changes in the later pass because UEFI will
tear down the protection when ExitBootServices() is called, because it
knows we're going to need to do this kind of thing.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Boris Ostrovsky [Fri, 25 Aug 2017 12:07:12 +0000 (14:07 +0200)]
x86/hvm: vmx/svm_cpu_up_prepare should be called only once
These routines are first called via CPU_UP_PREPARE notifier by
the BSP and then by the booting ASP from vmx_cpu_up()/_svm_cpu_up().
Avoid the unnecessary second call. Because BSP doesn't go through
CPU_UP_PREPARE it is a special case. We pass 'bsp' flag to newly
added _vmx_cpu_up() (just like it's already done for _svm_cpu_up())
so they can decide whether or not to call vmx/svm_cpu_up_prepare().
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Alexandru Isaila [Fri, 25 Aug 2017 12:05:09 +0000 (14:05 +0200)]
x86/hvm: allow guest_request vm_events coming from userspace
In some introspection usecases, an in-guest agent needs to communicate
with the external introspection agent. An existing mechanism is
HVMOP_guest_request_vm_event, but this is restricted to kernel usecases
like all other hypercalls.
Introduce a mechanism whereby the introspection agent can whitelist the
use of HVMOP_guest_request_vm_event directly from userspace.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Wed, 23 Aug 2017 15:47:38 +0000 (17:47 +0200)]
hvmloader: add fields for SMBIOS 2.4 compliance
The version of SMBIOS set in the entry point is 2.4, however several
structures are missing fields required by 2.4. Fix this by adding the
missing fields, this is based on the documents found at the DMTF site
[0].
Most fields are set to 0 (undefined/not specified), except for the
cache related handlers that need to be initialized to 0xffff in order
to signal that the information is not provided.
xsm: policy hooks to require an IOMMU and interrupt remapping
Isolation of devices passed through to domains usually requires an
active IOMMU. The existing method of requiring an IOMMU is via a Xen
boot parameter ("iommu=force") which will abort boot if an IOMMU is not
available.
More graceful degradation of behaviour when an IOMMU is absent can be
achieved by enabling XSM to perform enforcement of IOMMU requirement.
This patch enables an enforceable XSM policy to specify that an IOMMU is
required for particular domains to access devices and how capable that
IOMMU must be. This allows a Xen system to boot whilst still
ensuring that an IOMMU is active before permitting device use.
Using a XSM policy ensures that the isolation properties remain enforced
even when the large, complex toolstack software changes.
For some hardware platforms interrupt remapping is a strict requirement
for secure isolation. Not all IOMMUs provide interrupt remapping.
The XSM policy can now optionally require interrupt remapping.
The device use hooks now check whether an IOMMU is:
* Active and securely isolating:
-- current criteria for this is that interrupt remapping is ok
* Active but interrupt remapping is not available
* Not active
This patch also updates the reference XSM policy to use the new
primitives, with policy entries that do not require an active IOMMU.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Ross Philipson <ross.philipson@gmail.com>
Jan Beulich [Wed, 23 Aug 2017 15:45:45 +0000 (17:45 +0200)]
arm/mm: release grant lock on xenmem_add_to_physmap_one() error paths
Commit 55021ff9ab ("xen/arm: add_to_physmap_one: Avoid to map mfn 0 if
an error occurs") introduced error paths not releasing the grant table
lock. Replace them by a suitable check after the lock was dropped.
This is XSA-235.
Reported-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
Wei Liu [Mon, 21 Aug 2017 14:09:11 +0000 (15:09 +0100)]
xen: move hvm save code under common to x86
The code is only used by x86 at this point. Merge common/hvm/save.c
into x86 hvm/save.c. Move the headers and fix up inclusions. Remove
the now empty common/hvm directory.
Also fix some issues while moving:
1. remove trailing spaces;
2. fix multi-line comment;
3. make "i" in hvm_save unsigned int;
4. add some blank lines to separate sections of code;
5. change bool_t to bool.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Igor Druzhinin [Thu, 17 Aug 2017 14:57:13 +0000 (15:57 +0100)]
hvmloader, libxl: use the correct ACPI settings depending on device model
We need to choose ACPI tables and ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS type specific
code in hvmloader, e.g. always load QEMU traditional specific
tables if it's ROMBIOS and always load QEMU Xen specific
tables if it's SeaBIOS.
This change preserves this behavior (for compatibility) but adds
an additional way (xenstore key) to specify the correct
device model if we happen to run a non-default one. The toolstack
part makes use of it.
The enforcement of BIOS type depending on QEMU version will
be lifted later when the rest of ROMBIOS compatibility fixes
are in place.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
The variable domctl.u.address_size.size may remain uninitialized if
guest_type is not one of xen-3.0-aarch64 or xen-3.0-armv7l. And the
code precisely checks if this variable is still 0 to decide if the
guest type is supported or not.
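The pattern at issue can be illustrated as follows. The helper
guest_address_size() and its table are a hypothetical reduction of the
code, kept only to show why the explicit zero initialization matters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the address size for a known guest type, or 0 if the type
 * is not supported. 'size' is initialized explicitly, so the "still
 * 0" check done by the caller is well defined.
 */
static unsigned int guest_address_size(const char *guest_type)
{
    static const struct {
        const char *guest;
        unsigned int size;
    } types[] = {
        { "xen-3.0-aarch64", 64 },
        { "xen-3.0-armv7l", 32 },
    };
    unsigned int size = 0;

    for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
        if (!strcmp(types[i].guest, guest_type))
            size = types[i].size;
    return size;
}
```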
This fixes the following build failure with gcc 7.x:
xc_dom_arm.c:229:31: error: 'domctl.u.address_size.size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if ( domctl.u.address_size.size == 0 )
Patch originally taken from
https://www.mail-archive.com/xen-devel@lists.xen.org/msg109313.html.
Signed-off-by: Bernd Kuhls <bernd.kuhls@t-online.de> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
mm: Keep heap accessible to others while scrubbing
Instead of scrubbing pages while holding heap lock we can mark
buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort scrub by setting
head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after
processing each page and stops its work as soon as it sees it.
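The abort handshake can be sketched with C11 atomics as below. All
names are hypothetical, the heap lock handling is left out, and
abort_after_two() merely simulates an allocator asking for the chunk
after two pages have been processed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 64  /* tiny page size, for illustration only */

enum { SCRUB_IDLE, SCRUBBING, SCRUB_ABORT };

struct buddy {
    atomic_int state;
    size_t nr_pages;
    unsigned char *mem;
};

/* Allocator side: ask a running scrubber to stop. */
static void request_abort(struct buddy *b)
{
    int expected = SCRUBBING;

    atomic_compare_exchange_strong(&b->state, &expected, SCRUB_ABORT);
}

/*
 * Scrubber side: clear page by page and check for an abort request
 * after each page. Returns the number of pages scrubbed. The
 * 'after_page' hook exists only so the example can inject an abort.
 */
static size_t scrub_buddy(struct buddy *b,
                          void (*after_page)(struct buddy *, size_t))
{
    size_t done = 0;

    atomic_store(&b->state, SCRUBBING);
    while (done < b->nr_pages) {
        memset(b->mem + done * PAGE_SZ, 0, PAGE_SZ);
        done++;
        if (after_page)
            after_page(b, done);
        if (atomic_load(&b->state) == SCRUB_ABORT)
            break;
    }
    atomic_store(&b->state, SCRUB_IDLE);
    return done;
}

/* Simulated allocator: requests the chunk after the second page. */
static void abort_after_two(struct buddy *b, size_t done)
{
    if (done == 2)
        request_abort(b);
}
```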
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Boris Ostrovsky [Wed, 16 Aug 2017 18:31:00 +0000 (20:31 +0200)]
spinlock: Introduce spin_lock_cb()
While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.
Specifically, this feature will be needed by scrubbing code where
the scrubber, while waiting for heap lock to merge back clean
pages, may be requested by page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.
We could use spin_trylock() but since it doesn't take lock ticket
it may take long time until the lock is taken. Instead we add
spin_lock_cb() that allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.
Since we may be sleeping in the lock until it is released we need a
mechanism that will make sure that the callback has a chance to run.
We add spin_lock_kick() that will wake up the waiter.
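A bare-bones ticket lock with a waiting callback might look like the
sketch below. It uses C11 atomics and hypothetical names, busy-waits
instead of sleeping, and therefore needs no kick. demo_cb() exists
only so that the example can run on a single thread.

```c
#include <stdatomic.h>

struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

/* Take a ticket, then run 'cb' on every iteration of the wait loop. */
static void ticket_lock_cb(struct ticket_lock *l,
                           void (*cb)(void *), void *data)
{
    unsigned int me = atomic_fetch_add(&l->next, 1);

    while (atomic_load(&l->owner) != me)
        if (cb)
            cb(data);
}

static void ticket_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}

/* Demo state: counts callback runs and simulates the holder. */
struct demo {
    struct ticket_lock *lock;
    unsigned int calls;
};

/* On its third run, act as the lock holder releasing the lock. */
static void demo_cb(void *data)
{
    struct demo *d = data;

    if (++d->calls == 3)
        ticket_unlock(d->lock);
}
```

Because the waiter already holds a ticket, it keeps its place in the
queue while the callback runs, which is the advantage over retrying a
trylock.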
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Julien Grall <julien.grall@arm.com>
Boris Ostrovsky [Wed, 16 Aug 2017 18:30:00 +0000 (20:30 +0200)]
mm: Scrub memory from idle loop
Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.
We might come to scrub_free_pages() from idle loop while another CPU
uses mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make mapcache
vcpu override a per-cpu variable.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
Julien Grall [Tue, 8 Aug 2017 17:17:26 +0000 (18:17 +0100)]
xen/arm: Tighten memory attribute requirement for memory shared
Xen allows shared mapping to be Normal inner-cacheable with any inner cache
allocation strategy and no restriction of the outer-cacheability.
However, Xen always maps those regions as Normal Inner Write-Back
Outer Write-Back Inner-shareable. Per B2.8 "Mismatched memory
attributes" in ARM DDI 0487B.a, if the guest is not using the exact same
memory attributes (excluding any cache allocation hints) for the shared
region then the region will be accessed with mismatched attributes.
This will result in a potential loss of coherency, and may impact
performance.
Given that the ARM ARM strongly recommends avoiding mismatched
attributes, we should require shared regions to be Normal Inner Write-Back
Outer Write-Back Inner-shareable.
hvmloader: support system enclosure asset tag (SMBIOS type 3)
Allow setting system enclosure asset tag for HVM guest. Guest OS can
check and perform desired operation like support installation.
Also added documentation of '~/bios-string/*' xenstore keys into
docs/misc/xenstore-paths.markdown
Signed-off-by: Vivek Kumar Chaubey <vivekkumar.chaubey@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>