xenbits.xensource.com Git - xen.git/log
Jan Beulich [Mon, 8 Apr 2019 11:03:07 +0000 (13:03 +0200)]
x86/ACPI: also parse AMD IOMMU tables early

In order to be able to initialize x2APIC mode we need to parse
respective ACPI tables early. Split amd_iov_detect() into two parts for
this purpose, and call the initial part earlier on.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Wei Liu [Mon, 8 Apr 2019 10:08:56 +0000 (11:08 +0100)]
gitlab-ci: log commit range in build test

It is easier to debug stuff when the target range is clearly visible
at the top.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Lukas Juenger [Fri, 5 Apr 2019 13:54:04 +0000 (15:54 +0200)]
xen/arm: Cap the number of interrupt lines for dom0

Dom0's vGIC will use the same number of interrupt lines as the hardware GIC.
While the hardware GIC can support up to 1020 interrupt lines,
the vGIC only supports up to 992 interrupt lines.
This means that Xen will not be able to boot on platforms where the hardware
GIC supports more than 992 interrupt lines.
While it would make sense to increase the limits in the vGICs, this is not
trivial because of their design choices.
At the moment, only models seem to report the maximum number of interrupt lines.
They also do not have any interrupt wired above the 992 limit.
So it should be fine to cap the number of interrupt lines for dom0 to 992 lines.
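A minimal sketch of the capping rule, assuming a hypothetical helper name (dom0_nr_irq_lines) and a constant mirroring the 992-line vGIC limit described above; the actual Xen change may express this differently.

```c
/* Hypothetical constant mirroring the vGIC limit described above. */
#define VGIC_MAX_IRQS 992U

/*
 * Hypothetical helper: number of interrupt lines to expose to dom0's vGIC,
 * given the number reported by the hardware GIC (up to 1020).
 */
static unsigned int dom0_nr_irq_lines(unsigned int hw_lines)
{
    return hw_lines < VGIC_MAX_IRQS ? hw_lines : VGIC_MAX_IRQS;
}
```

Platforms reporting 992 or fewer lines are unaffected; only the models that report the full 1020 get clamped.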

Signed-off-by: Lukas Juenger <juenger@ice.rwth-aachen.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Fri, 29 Mar 2019 16:17:24 +0000 (16:17 +0000)]
xen/timers: Fix memory leak with cpu unplug/plug

timer_softirq_action() reallocates a larger timer heap for itself whenever
necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
ever freed this allocation.

CPU plug and unplug have the side effect of zeroing the percpu data area, which
clears ts->heap.  This in turn causes new timers to be put on the list rather
than the heap, and for timer_softirq_action() to bootstrap itself again.

In practice, this leaks ts->heap every time a CPU is unplugged and replugged.

Implement free_percpu_timers() which includes freeing ts->heap when
appropriate, and update the notifier callback with the recent cpu parking
logic and free-avoidance across suspend.
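The ownership rule behind the fix can be sketched as follows. This is a simplified model, not Xen's timer.c: the heap layout, struct names and grow_heap() are hypothetical, and free_percpu_timers() here only illustrates the idea of freeing a real allocation while never freeing the shared dummy_heap.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of a per-CPU timer heap. */
struct heap {
    size_t limit;     /* capacity of entries[] */
    size_t size;      /* entries currently in use */
    void *entries[];  /* flexible array member */
};

/* Shared, statically allocated empty heap; must never be freed. */
static struct heap dummy_heap;

struct timers {
    struct heap *heap;
};

/* Grow ts->heap to new_limit entries (>= current size), bootstrapping
 * from dummy_heap on first use. */
static int grow_heap(struct timers *ts, size_t new_limit)
{
    struct heap *old = ts->heap;
    struct heap *bigger = malloc(sizeof(*bigger) + new_limit * sizeof(void *));

    if (!bigger)
        return -1;
    bigger->limit = new_limit;
    bigger->size = old->size;
    memcpy(bigger->entries, old->entries, old->size * sizeof(void *));

    if (old != &dummy_heap)   /* only free what we allocated */
        free(old);
    ts->heap = bigger;
    return 0;
}

/* When a CPU goes away: free a real heap, never the dummy one. */
static void free_percpu_timers(struct timers *ts)
{
    if (ts->heap != &dummy_heap)
        free(ts->heap);
    ts->heap = &dummy_heap;
}
```

Resetting the pointer to dummy_heap (rather than leaving it dangling or zeroed) keeps a second call harmless and lets the next bootstrap start from a known state.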

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 26 Mar 2019 11:54:34 +0000 (11:54 +0000)]
docs/hypervisor-guide: Code Coverage

During a discussion in person, it was identified that Coverage doesn't
currently work for ARM.  Also, there are a number of errors in the
existing coverage document.

Take the opportunity to rewrite it in RST, making it easier to follow for a
non-expert user.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 26 Mar 2019 11:54:32 +0000 (11:54 +0000)]
docs/sphinx: Introduce a hypervisor guide section

Include (and retrofit to the user guide) an introductory paragraph describing
the intended audience.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 15:27:13 +0000 (17:27 +0200)]
x86emul: don't read mask register on AVX512F-incapable platforms

Also don't read it when the register state isn't sufficiently enabled.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Fri, 5 Apr 2019 11:21:57 +0000 (12:21 +0100)]
automation: fix "build each commit" test

An error was introduced while rebasing 9b8b3f30. The new test
shouldn't depend on anything, otherwise artefacts will be downloaded
from build stage and cause the script to abort.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 14:28:31 +0000 (16:28 +0200)]
x86/entry: drop unused header inclusions

I'm in particular after getting rid of asm/apicdef.h, but there are more
headers which are no longer (or perhaps never were) used.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Julien Grall [Thu, 4 Apr 2019 14:04:10 +0000 (15:04 +0100)]
MAINTAINERS: Move xen/lib/x86 under x86 maintainership

At the moment, xen/lib/x86 is covered by the "REST". However, this is
x86-only, so this can fall under the x86 maintainership.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 5 Apr 2019 12:32:08 +0000 (13:32 +0100)]
docs/cmdline: Partially revert 3860d5534df4

This hunk modifies the cpuid= documentation, which is unrelated to the
spec-ctrl= section.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Petre Pircalabu [Fri, 5 Apr 2019 13:42:03 +0000 (15:42 +0200)]
vm_event: fix XEN_VM_EVENT_RESUME domctl

Make XEN_VM_EVENT_RESUME return 0 in case of success, instead of
-EINVAL.
Remove vm_event_resume from the vm_event.h header and make the function
static, as it is used only in vm_event.c.
Move the vm_event_check_ring test inside vm_event_resume in order to
simplify the code.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Jan Beulich [Fri, 5 Apr 2019 13:41:24 +0000 (15:41 +0200)]
x86/gnttab: relax a get_gfn() invocation

In the case here only a query is intended, i.e. without populating a
possible PoD or paged out entry, as the intention is to replace the
current (grant) entry anyway. Use get_gfn_query() there instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 13:40:42 +0000 (15:40 +0200)]
x86: don't allow clearing of TF_kernel_mode for other than 64-bit PV

The flag is really only meant for those; both HVM and 32-bit PV tell
kernel from user mode based on CPL/RPL. Remove the all-question-marks
comment and let's be on the safe side here and also suppress clearing
for 32-bit PV (this isn't a fast path after all).

Remove the no longer necessary is_pv_32bit_*() checks from sh_update_cr3() and
sh_walk_guest_tables(). Note that shadow_one_bit_disable() already
assumes the new behavior.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Norbert Manthey [Thu, 14 Mar 2019 12:57:00 +0000 (13:57 +0100)]
common/domain: block speculative out-of-bound accesses

When issuing a vcpu_op hypercall, guests have control over the
vcpuid variable. In the old code, this allowed speculative
out-of-bound accesses. To block this, we make use
of the domain_vcpu function.
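A bounds-checked lookup that stays safe under misprediction can be sketched as below. The structures and domain_vcpu_sketch() are hypothetical stand-ins, and the mask uses the generic branchless technique known from Linux's array_index_mask_nospec(); Xen's own helper may be implemented differently (for example with inline assembly on x86). The empty asm statement hides the index from the optimiser, since a compiler that knows idx < max_vcpus could otherwise delete the mask.

```c
#include <stddef.h>

/*
 * Generic branchless bounds mask: all ones if index < size, zero otherwise.
 * Assumes both values are below LONG_MAX and that right-shifting a
 * negative long is arithmetic (true for GCC and Clang).
 */
static inline unsigned long index_mask_nospec(unsigned long index,
                                              unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * 8 - 1);
}

/* Hypothetical, cut-down structures for illustration only. */
struct vcpu { unsigned int vcpu_id; };
struct domain { unsigned int max_vcpus; struct vcpu **vcpu; };

/* Hypothetical sketch of a speculation-safe vCPU lookup. */
static struct vcpu *domain_vcpu_sketch(const struct domain *d,
                                       unsigned int vcpu_id)
{
    unsigned long idx = vcpu_id;

    if (idx >= d->max_vcpus)
        return NULL;

    /* Hide idx from the optimiser so it cannot drop the mask below. */
    __asm__ ("" : "+r" (idx));

    /* Even if the check above was mispredicted, idx is forced to 0. */
    idx &= index_mask_nospec(idx, d->max_vcpus);
    return d->vcpu[idx];
}
```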

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
x86/hvm: add nospec to hvmop param

The params array in hvm can be accessed with get and set functions.
As the index is guest controlled, make sure no out-of-bound accesses
can be performed.

As we cannot influence how future compilers might modify the
instructions that enforce the bounds, we furthermore block speculation,
so that the update is visible in the architectural state.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
common/memory: block speculative out-of-bound accesses

The get_page_from_gfn method returns a pointer to a page that belongs
to a gfn. Before returning the pointer, the gfn is checked for being
valid. Under speculation, these checks can be bypassed, so that
the function get_page is still executed partially. Consequently, the
function page_get_owner_and_reference might be executed partially as
well. In this function, the computed pointer is accessed, resulting in
a speculative out-of-bound address load. As the gfn can be controlled by
a guest, this access is problematic.

To mitigate the root cause, an lfence instruction is added via the
evaluate_nospec macro. To make the protection generic, we do not
introduce the lfence instruction for this single check, but add it to
the mfn_valid function. This way, other potentially problematic accesses
are protected as well.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
is_hvm/pv_domain: block speculation

When checking whether a domain is an HVM or a PV domain, we have to make
sure that speculation cannot bypass that check and eventually access
data that should not end up in the cache for the current domain type.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 5 Apr 2019 10:16:52 +0000 (12:16 +0200)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

Wei Liu [Wed, 27 Feb 2019 17:26:42 +0000 (17:26 +0000)]
automation: introduce a test to build each commit

This is added to the test stage so that its failure won't block other
things.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Wei Liu [Thu, 28 Feb 2019 12:50:02 +0000 (12:50 +0000)]
automation: add a script to build newly pushed commits in Gitlab CI

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
is_control_domain: block speculation

Checks of domain properties, such as is_hardware_domain or is_hvm_domain,
might be bypassed by speculatively executing past them. One reason these
checks can be bypassed is that these macros access the domain
structure via a pointer and check a certain field. Since this memory
access is slow, the CPU predicts the result and continues
execution speculatively.

In case an is_control_domain check is bypassed, for example during a
hypercall, data that should only be accessible by the control domain could
be loaded into the cache.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 27 Feb 2019 18:22:34 +0000 (18:22 +0000)]
automation: set ret for potential error in build-test.sh

`git rev-list` can fail if the base..tip range contains invalid
commit(s). If that happens ret never gets a chance to be set.

Set ret beforehand to fix the issue.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Wei Liu [Wed, 27 Feb 2019 17:42:07 +0000 (17:42 +0000)]
automation: allow build-test.sh to run in detached HEAD state

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Norbert Manthey [Thu, 14 Mar 2019 12:55:00 +0000 (13:55 +0100)]
nospec: introduce evaluate_nospec

Since the discovery of the L1TF vulnerability of Intel CPUs, loading
hypervisor data into the L1 cache is problematic, because when hyperthreading
is used as well, a guest running on the sibling thread can leak this
potentially secret data.

To prevent these speculative accesses, we block speculation after
accessing the domain property field by adding lfence instructions. This
way, the CPU continues executing and loading data only once the condition
is actually evaluated.

As this protection is typically used in if statements, the lfence has to
be inserted in a compatible way. Therefore, a function that returns true after an
lfence instruction is introduced. To protect both branches after a
conditional, an lfence instruction has to be added for the two branches.
To be able to block speculation after several evaluations, the generic
barrier macro block_speculation is also introduced.

As the L1TF vulnerability is only present on the x86 architecture, there is
no need to add protection for other architectures. Hence, the introduced
functions are defined but empty.

On the x86 architecture, by default, the lfence instruction is not present
either. Only when an L1TF vulnerable platform is detected is the lfence
instruction patched in via alternative patching. Similarly, PV guests
are protected wrt L1TF by default, so the protection is additionally
disabled in case HVM is excluded via the build configuration.
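The mechanism described above can be sketched as follows. This is a minimal illustration, not Xen's nospec.h: barrier_nospec_true() is a hypothetical stand-in for the arch_barrier_nospec_true() helper, and it emits the LFENCE unconditionally on x86, whereas Xen only patches it in via alternatives on L1TF-vulnerable hardware. On other architectures the barrier is empty, matching the commit's description. struct dom_sketch and privileged_op() are purely illustrative.

```c
#include <stdbool.h>

/* Hypothetical sketch: emit an LFENCE (x86 only) and return true. */
static inline bool barrier_nospec_true(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__ ("lfence" ::: "memory");
#endif
    return true;
}

/* Fence on whichever branch is taken, so both paths are protected. */
#define evaluate_nospec(cond) \
    ((cond) ? barrier_nospec_true() : !barrier_nospec_true())

/* Generic barrier for use after several evaluations. */
static inline void block_speculation(void)
{
    (void)barrier_nospec_true();
}

/* Hypothetical example: a property check protected against bypass. */
struct dom_sketch { bool is_privileged; };

static int privileged_op(const struct dom_sketch *d)
{
    if (!evaluate_nospec(d->is_privileged))
        return -1;
    /* ... work that must only happen for a privileged domain ... */
    return 0;
}
```

Because the barrier sits inside both arms of the conditional expression, neither the taken nor the not-taken path can run ahead of the evaluated condition.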

Introducing the lfence instructions catches a lot of potential leaks with
a simple, unintrusive code change. During performance testing, we did not
notice any performance impact.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Norbert Manthey [Thu, 14 Mar 2019 12:55:00 +0000 (13:55 +0100)]
spec: add l1tf-barrier

To control the runtime behavior on L1TF vulnerable platforms better, the
command line option l1tf-barrier is introduced. This option controls
whether on vulnerable x86 platforms the lfence instruction is used to
prevent speculative execution from bypassing the evaluation of
conditionals that are protected with the evaluate_nospec macro.

Xen is already capable of identifying L1TF vulnerable hardware. However,
this information cannot be used for alternative patching, as a CPU feature
is required. To control alternative patching with the command line option,
a new x86 feature "X86_FEATURE_SC_L1TF_VULN" is introduced. This feature
is used to patch the lfence instruction into the arch_barrier_nospec_true
function. The feature is enabled only if L1TF vulnerable hardware is
detected and the command line option does not prevent using this feature.

The status of hyperthreading is considered when automatically deciding whether
to add the lfence instruction. Since platforms without hyperthreading can
still be vulnerable to L1TF in case the L1 cache is not flushed properly,
the additional lfence instructions are patched in if either hyperthreading
is enabled, or L1 cache flushing is missing.
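The enabling logic described above can be sketched as a single predicate. The structure and field names are hypothetical, and the sketch assumes that an explicit l1tf-barrier value overrides the automatic choice, while the barrier is never enabled on unaffected hardware; Xen's actual spec_ctrl code may order or combine these conditions differently.

```c
#include <stdbool.h>

/* Hypothetical inputs; field names are illustrative, not Xen's. */
struct l1tf_inputs {
    bool cpu_vulnerable;  /* L1TF-vulnerable hardware detected */
    int  opt_barrier;     /* l1tf-barrier: -1 = not given, 0 = off, 1 = on */
    bool smt_enabled;     /* hyperthreading is enabled */
    bool l1d_flush;       /* L1 cache flushing is available */
};

/* Decide whether to set the synthetic feature that patches in LFENCE. */
static bool want_l1tf_barrier(const struct l1tf_inputs *in)
{
    if (!in->cpu_vulnerable)
        return false;                        /* never on unaffected hardware */
    if (in->opt_barrier >= 0)
        return in->opt_barrier == 1;         /* explicit command line choice */
    return in->smt_enabled || !in->l1d_flush; /* automatic default */
}
```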

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 1 Apr 2019 10:08:28 +0000 (11:08 +0100)]
x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV

There are a number of bugs.  There are no read/write hooks on the HVM side, so
guest accesses fall into the "read/write-discard" defaults, which bypass the
correct faulting behaviour and the Intel special case.

For the PV side, writes are discarded (again, bypassing proper faulting),
except for a pinned dom0, which is permitted to actually write values
other than 0.  This is pointless with the read hook implementing the Intel
special case.

However, implementing the Intel special case is itself pointless.  First of
all, OS software can't rely on reading back 0 in the first place, because a)
this behaviour isn't guaranteed in the SDM, and b) there are SMM handlers
which use the CPUID instruction.  Secondly, when a guest executes CPUID, this
doesn't typically result in Xen executing a CPUID instruction in practice.

With the dom0 special case removed, there are now no writes to this MSR other
than Xen's microcode loading facilities, which means that the value held in
the MSR will be properly up-to-date.  Forward it directly, without jumping
through any hoops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 4 Apr 2019 18:39:08 +0000 (19:39 +0100)]
x86/cpu: Renumber X86_VENDOR_* to form a bitmap

CPUs from different vendors sometimes share characteristics.  All users of
X86_VENDOR_* are now direct equal/not-equal comparisons.  By expressing the
X86_VENDOR_* constants in a bitmap fashion, we can more concisely and
efficiently test whether a vendor is one of a group.
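The benefit of the bitmap encoding can be sketched as follows. The bit assignments below are illustrative only (Xen's actual values may differ), and the vendor group tested is arbitrary, chosen just to contrast a chain of equality comparisons with a single mask test.

```c
#include <stdbool.h>

/* Illustrative bit assignments; Xen's actual values may differ. */
#define X86_VENDOR_UNKNOWN  0u
#define X86_VENDOR_INTEL    (1u << 0)
#define X86_VENDOR_AMD      (1u << 1)
#define X86_VENDOR_CENTAUR  (1u << 2)
#define X86_VENDOR_SHANGHAI (1u << 3)

/* Before: one equality comparison per vendor in the group. */
static bool in_group_old(unsigned int vendor)
{
    return vendor == X86_VENDOR_INTEL || vendor == X86_VENDOR_CENTAUR ||
           vendor == X86_VENDOR_SHANGHAI;
}

/* After: a single AND against a mask of vendors. */
static bool in_group_new(unsigned int vendor)
{
    return vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR |
                     X86_VENDOR_SHANGHAI);
}
```

Keeping X86_VENDOR_UNKNOWN as 0 means it never matches any group mask.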

Update all parts of the code which can already benefit from this improvement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 4 Apr 2019 18:19:20 +0000 (19:19 +0100)]
x86/cpu: Introduce x86_cpuid_vendor_to_str() and drop cpu_dev.c_vendor[]

cpu_dev.c_vendor[] is a char[8] array which is printed using %s in two
locations.  This leads to subtle lack-of-NUL bugs when using an 8 character
vendor name.

Introduce x86_cpuid_vendor_to_str() to turn an x86_vendor into a printable
string, use it in the two locations that c_vendor is used, and drop c_vendor.

This drops the final user of X86_VENDOR_NUM, so drop that as well.
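The lack-of-NUL problem and the shape of the fix can be sketched as below. The enum and vendor_to_str() are hypothetical stand-ins for x86_vendor and x86_cpuid_vendor_to_str(), and "Shanghai" is used purely as an example of an 8-character name.

```c
#include <string.h>

/* Illustrative vendor identifiers (hypothetical values). */
enum vendor { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD, VENDOR_CENTAUR };

/* The bug class: legal C, but an 8-character name leaves no room for the
 * NUL, so printing this array with %s would read past its end. */
static const char c_vendor_example[8] = "Shanghai";

/* Fix: map the vendor to a string literal, which is always NUL-terminated. */
static const char *vendor_to_str(enum vendor v)
{
    switch (v) {
    case VENDOR_INTEL:   return "Intel";
    case VENDOR_AMD:     return "AMD";
    case VENDOR_CENTAUR: return "Centaur";
    default:             return "Unknown";
    }
}
```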

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 4 Apr 2019 14:51:25 +0000 (15:51 +0100)]
x86/cpu: Drop cpu_devs[] and $VENDOR_init_cpu() hooks

These helpers each fill in a single cpu_devs[] pointer, and since c/s
00b4f4d0f "x86/cpuid: Drop get_cpu_vendor() completely", this array is read
exactly once on boot.

Delete the hooks and cpu_devs[], and have early_cpu_detect() pick the
appropriate cpu_dev structure directly.

As early_cpu_init() is empty now other than a call to early_cpu_detect(), and
this isn't expected to change moving forwards, rename the latter and delete
the former.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 5 Apr 2019 08:42:39 +0000 (10:42 +0200)]
x86emul: support AVX512{F,BW} down conversion moves

Note that the vpmov{,s,us}{d,q}w table entries in evex-disp8.c are
slightly different from what one would expect, due to them requiring
EVEX.W to be zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:41:59 +0000 (10:41 +0200)]
x86emul: support AVX512{F,BW} zero- and sign-extending moves

Note that the testing in simd.c doesn't really follow the ISA extension
pattern - to fit the scheme, extensions from byte and word granular
vectors can (currently) sensibly only happen in the AVX512BW case (and
hence respective abstraction macros will be added there rather than
here).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:41:12 +0000 (10:41 +0200)]
x86emul: basic AVX512VL testing

Test the 128- and 256-bit variants of the insns which have been
implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:40:33 +0000 (10:40 +0200)]
x86emul: support AVX512{F,BW,DQ} integer broadcast insns

Note that the pbroadcastw table entry in evex-disp8.c is slightly
different from what one would expect, due to it requiring EVEX.W to be
zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:40:02 +0000 (10:40 +0200)]
x86emul: basic AVX512F testing

Test various insns which have been implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:39:17 +0000 (10:39 +0200)]
x86emul: support AVX512{F,BW,DQ} insert insns

Also correct the comment of the AVX form of VINSERTPS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Apr 2019 08:38:38 +0000 (10:38 +0200)]
x86emul: support AVX512{F,BW,DQ} extract insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Tue, 19 Mar 2019 15:29:00 +0000 (16:29 +0100)]
viridian: add implementation of the HvSendSyntheticClusterIpi hypercall

This patch adds an implementation of the hypercall as documented in the
specification [1], section 10.5.2. This enlightenment, as with others, is
advertised by CPUID leaf 0x40000004 and is under control of a new
'hcall_ipi' option in libxl.

If used, this enlightenment should mean the guest only takes a single VMEXIT
to issue IPIs to multiple vCPUs rather than the multiple VMEXITs that would
result from using the emulated local APIC.
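The saving comes from one hypercall naming many targets at once. A minimal sketch of the target selection, assuming the non-"Ex" form of the hypercall whose input carries a 64-bit processor mask (as described in the TLFS); cluster_ipi_targets() is hypothetical, only collects vCPU IDs rather than injecting the vector, and simply skips mask bits at or beyond the vCPU count.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: collect the vCPU IDs targeted by a 64-bit
 * processor mask, as carried by a single HvSendSyntheticClusterIpi call.
 * Returns the number of IDs written to out[] (at most 64).
 */
static unsigned int cluster_ipi_targets(uint64_t mask, unsigned int nr_vcpus,
                                        unsigned int out[64])
{
    unsigned int count = 0;

    for (unsigned int id = 0; id < 64 && id < nr_vcpus; id++)
        if (mask & (UINT64_C(1) << id))
            out[count++] = id;

    return count;
}
```

With the emulated local APIC, each of those targets would instead cost the guest a separate ICR write and VMEXIT.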

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:29:00 +0000 (16:29 +0100)]
viridian: add implementation of synthetic timers

This patch introduces an implementation of the STIMER0-15_CONFIG/COUNT MSRs
and hence the first SynIC message source.

The new (and documented) 'stimer' viridian enlightenment group may be
specified to enable this feature.

While in the neighbourhood, this patch adds a missing check for an
attempt to write the time reference count MSR, which should result in an
exception (but not be reported as an unimplemented MSR).

NOTE: It is necessary for correct operation that timer expiration and
      message delivery time-stamping use the same time source as the guest.
      The specification is ambiguous but testing with a Windows 10 1803
      guest has shown that using the partition reference counter as a
      source whilst the guest is using RDTSC and the reference tsc page
      does not work correctly. Therefore the time_now() function is used.
      This implements the algorithm for acquiring partition reference time
      that is documented in the specification.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add implementation of synthetic interrupt MSRs

This patch introduces an implementation of the SCONTROL, SVERSION, SIEFP,
SIMP, EOM and SINT0-15 SynIC MSRs. No message source is added and, as such,
nothing will yet generate a synthetic interrupt. A subsequent patch will
add an implementation of synthetic timers which will need the infrastructure
added by this patch to deliver expiry messages to the guest.

NOTE: A 'synic' option is added to the toolstack viridian enlightenments
      enumeration but is deliberately not documented as enabling these
      SynIC registers without a message source is only useful for
      debugging.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: stop directly calling viridian_time_ref_count_freeze/thaw()...

...from arch_domain_shutdown/pause/unpause().

A subsequent patch will introduce an implementation of synthetic timers
which will also need freeze/thaw hooks, so make the exported hooks more
generic and call through to (re-named and static) time_ref_count_freeze/thaw
functions.

NOTE: This patch also introduces a new time_ref_count() helper to return
      the current counter value. This is currently only used by the MSR
      read handler but the synthetic timer code will also need to use it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: use viridian_map/unmap_guest_page() for reference tsc page

Whilst the reference tsc page does not currently need to be kept mapped
after it is initially set up (or updated after migrate), the code can
be simplified by using the common guest page map/unmap and dump functions.
New functionality added by a subsequent patch will also require the page to
be kept mapped for the lifetime of the domain.

NOTE: Because the reference tsc page is per-domain rather than per-vcpu
      this patch also changes viridian_map_guest_page() to take a domain
      pointer rather than a vcpu pointer. The domain pointer cannot be
      const, unlike the vcpu pointer.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add missing context save helpers into synic and time modules

Currently the time module lacks vcpu context save helpers and the synic
module lacks domain context save helpers. These helpers are not yet
required but subsequent patches will require at least some of them so this
patch completes the set to avoid introducing them in an ad-hoc way.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: extend init/deinit hooks into synic and time modules

This patch simply adds domain and vcpu init/deinit hooks into the synic
and time modules and wires them into viridian_[domain|vcpu]_[init|deinit]().
Only one of the hooks is currently needed (to unmap the 'VP Assist' page)
but subsequent patches will make use of the others.

NOTE: To perform the unmap of the VP Assist page,
      viridian_unmap_guest_page() is now directly called in the new
      viridian_synic_vcpu_deinit() function (which is safe even if
      is_viridian_vcpu() evaluates to false). This replaces the slightly
      hacky mechanism of faking a zero write to the
      HV_X64_MSR_VP_ASSIST_PAGE MSR in viridian_cpu_deinit().

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: make 'fields' struct anonymous...

...inside viridian_page_msr and viridian_guest_os_id_msr unions.

There's no need to name it and the code is shortened by not doing so.
No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: use stack variables for viridian_vcpu and viridian_domain...

...where there is more than one dereference inside a function.

This shortens the code and makes it more readable. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: separately allocate domain and vcpu structures

Currently the viridian_domain and viridian_vcpu structures are inline in
the hvm_domain and hvm_vcpu structures respectively. Subsequent patches
will need to add sizable extra fields to the viridian structures which
will cause the PAGE_SIZE limit of the overall vcpu structure to be
exceeded. This patch, therefore, uses the new init hooks to separately
allocate the structures and converts the 'viridian' fields in hvm_domain
and hvm_cpu to be pointers to these allocations. These separate allocations
also allow some vcpu and domain pointers to become const.

Ideally, now that they are no longer inline, the allocations of the
viridian structures could be made conditional on whether the toolstack
is going to configure the viridian enlightenments. However the toolstack
is currently unable to convey this information to the domain creation code
so such an enhancement is deferred until that becomes possible.

NOTE: The patch also introduces the 'is_viridian_vcpu' macro to avoid
      introducing a second evaluation of 'is_viridian_domain' with an
      open-coded 'v->domain' argument. This macro will also be further
      used in a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add init hooks

This patch adds domain and vcpu init hooks for viridian features. The init
hooks do not yet do anything; the functionality will be added by
subsequent patches.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Xin Li [Fri, 5 Apr 2019 08:16:16 +0000 (10:16 +0200)]
hvmloader: add SMBIOS type 2 info for customized string

Extend the SMBIOS type 2 struct to match the specification, and add support
for writing it when a customized string is provided and no SMBIOS table is
passed in.

Signed-off-by: Xin Li <xin.li@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 5 Apr 2019 08:15:10 +0000 (10:15 +0200)]
x86/mm: drop redundant local variable from _get_page_type()

Instead of the separate iommu_ret, the general rc can be used even for
the IOMMU operations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxen: vcpu_migrate_start can be static
Wei Liu [Thu, 4 Apr 2019 14:13:36 +0000 (15:13 +0100)]
xen: vcpu_migrate_start can be static

It's not used outside of schedule.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agobuild: don't mandate availability of a fetcher program
Wei Liu [Thu, 14 Mar 2019 14:08:47 +0000 (14:08 +0000)]
build: don't mandate availability of a fetcher program

It is common for build hosts to be isolated from the outside world. They
don't necessarily have wget or ftp installed.

Turn the error into a warning in configure, and point FETCHER to the `false'
command if neither wget nor ftp is available, so that any attempt to
download will result in an error.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/sysctl: Clean up XEN_SYSCTL_cpu_hotplug
Andrew Cooper [Fri, 29 Mar 2019 12:14:37 +0000 (12:14 +0000)]
x86/sysctl: Clean up XEN_SYSCTL_cpu_hotplug

A future change is going to introduce two more cases.  Instead of open-coding
the XSM checks and continue_hypercall logic, collect the data into local
variables.

Switch the default return value to -EOPNOTSUPP to distinguish a bad op from a
bad cpu index.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/cpu: Distinguish "cpu already in that state" in cpu_{up,down}()
Andrew Cooper [Tue, 2 Apr 2019 13:21:56 +0000 (14:21 +0100)]
xen/cpu: Distinguish "cpu already in that state" in cpu_{up,down}()

All methods of querying the online state of a CPU are racy without the hotplug
lock held, which can lead to a TOCTOU race trying to online or offline CPUs.

Distinguish this case with -EEXIST rather than -EINVAL, so the caller can take
other actions if necessary.

While adjusting this, rework the code slightly to fold the exit paths, which
results in a minor reduction in compiled code size.
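A minimal sketch of how a caller might act on the new -EEXIST value. fake_cpu_up(), fake_online[] and online_cpu_idempotent() are hypothetical; the real cpu_up() also takes the hotplug lock and runs the notifier chain, none of which is modelled here.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 4

/* Hypothetical online map; real Xen uses cpu_online_map under a lock. */
static bool fake_online[NR_FAKE_CPUS] = { true, false, false, false };

/*
 * Hypothetical model of cpu_up(): -EINVAL for a bad index, -EEXIST if the
 * CPU is already online (e.g. another request won the race), 0 on success.
 */
static int fake_cpu_up(unsigned int cpu)
{
    if ( cpu >= NR_FAKE_CPUS )
        return -EINVAL;
    if ( fake_online[cpu] )
        return -EEXIST;
    fake_online[cpu] = true;
    return 0;
}

/* A caller that treats "already online" as success rather than an error. */
static int online_cpu_idempotent(unsigned int cpu)
{
    int rc = fake_cpu_up(cpu);

    return rc == -EEXIST ? 0 : rc;
}
```

With -EINVAL reserved for genuinely bad requests, a caller can tell a lost race apart from a programming error.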

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/pv: Drop redundant CONFIG_PV ifdefary
Andrew Cooper [Wed, 3 Apr 2019 18:55:55 +0000 (19:55 +0100)]
x86/pv: Drop redundant CONFIG_PV ifdefary

These were made redundant by c/s 23058e7b3 "x86/shadow: put PV L1TF functions
under CONFIG_PV", but they make the surrounding code read as if it were
outside of the ifdef.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agogitlab-ci: add fedora gcc build jobs
Wei Liu [Thu, 4 Apr 2019 11:23:02 +0000 (12:23 +0100)]
gitlab-ci: add fedora gcc build jobs

Although the image comes with clang, clang builds don't work yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add Fedora image to containerize script
Wei Liu [Thu, 4 Apr 2019 11:23:01 +0000 (12:23 +0100)]
automation: add Fedora image to containerize script

At the same time sort the list alphabetically.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add a Fedora image
Wei Liu [Thu, 4 Apr 2019 11:23:00 +0000 (12:23 +0100)]
automation: add a Fedora image

Use the latest and greatest.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agopublic/io/blkif.h: try to fix the semantics of sector based quantities
Paul Durrant [Thu, 4 Apr 2019 11:40:02 +0000 (12:40 +0100)]
public/io/blkif.h: try to fix the semantics of sector based quantities

The semantics of sector based quantities, such as first_sect and last_sect
in blkif_request_segment, and the value of "sectors" in the backend info
in xenstore have become confused. Some comments in the header suggest they
should be supplied/interpreted strictly in terms of 512-byte units, others
suggest they should be scaled by the value of "sector-size" i.e. the
logical block size of the underlying backend storage.
This confusion has caused mixed semantics to become ingrained in frontend
implementations. For instance Linux xen-blkfront.c contains code such as:

    fsect = offset >> 9;
    lsect = fsect + (len >> 9) - 1;

whereas the Windows XENVBD frontend contains the following equivalent code:

    Segment->FirstSector = (UCHAR)((Offset + SectorSize - 1) / SectorSize);
    *SectorsNow = __min(SectorsLeft, SectorsPerPage - Segment->FirstSector);
    Segment->LastSector = (UCHAR)(Segment->FirstSector + *SectorsNow - 1);

(where SectorSize is the "sector-size" value advertized in xenstore).

Thus it has become unsafe for a backend to set "sector-size" to anything
other than 512 as it does not know which way the frontend is coded.

This patch is intended to clarify the situation and also introduce a
mechanism to allow logical block sizes of more than 512 to be supported.

A new frontend feature node is specified: 'feature-large-sector-size'.
If this node is present and set to "1" then it means that the frontend is
coded to supply and interpret all sector based quantities in terms of the
advertized "sector-size" value rather than a hardcoded size of 512.
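To make the two interpretations concrete, here is a sketch computing first_sect/last_sect for a single segment both ways. It assumes the segment lies within one page and is aligned to the logical block size; segment_sectors() and struct seg_sectors are hypothetical and not part of blkif.h.

```c
#include <stdbool.h>
#include <stdint.h>

struct seg_sectors {
    uint8_t first_sect;
    uint8_t last_sect;
};

/*
 * Sector range of a segment covering 'len' bytes from byte 'offset' in a page.
 *   large_sectors == false: legacy meaning, units are always 512 bytes.
 *   large_sectors == true:  'feature-large-sector-size' meaning, units are
 *                           the advertized "sector-size".
 * Assumes offset and len are multiples of sector_size and len > 0.
 */
static struct seg_sectors segment_sectors(uint32_t offset, uint32_t len,
                                          uint32_t sector_size,
                                          bool large_sectors)
{
    uint32_t unit = large_sectors ? sector_size : 512;
    struct seg_sectors s;

    s.first_sect = (uint8_t)(offset / unit);
    s.last_sect = (uint8_t)(s.first_sect + len / unit - 1);
    return s;
}
```

With a 4096-byte "sector-size", a full-page segment is sectors 0..7 under the legacy meaning but 0..0 under the new one, which is why the backend cannot guess.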

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
6 years agoxen/sched: don't disable scheduler on cpus during suspend
Juergen Gross [Tue, 2 Apr 2019 05:34:57 +0000 (07:34 +0200)]
xen/sched: don't disable scheduler on cpus during suspend

Today there is special handling in cpu_disable_scheduler() for suspend,
forcing all vcpus onto the boot cpu. In fact there is no need for that,
as the vcpus are put on the correct cpus again during resume.

So we can just omit the call of cpu_disable_scheduler() when offlining
a cpu due to suspend and on resuming we can omit taking the schedule
lock for selecting the new processor.

In restore_vcpu_affinity() we should be careful when applying affinity
as the cpu might not have come back to life. This in turn enables us
to even support affinity_broken across suspend/resume.

Avoid the rest of the scheduler dealloc/alloc dance when doing suspend
and resume, too. It is enough to react to cpus failing to come up again
on resume.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
6 years agoxen/cpupool: simplify suspend/resume handling
Juergen Gross [Tue, 2 Apr 2019 05:34:56 +0000 (07:34 +0200)]
xen/cpupool: simplify suspend/resume handling

Instead of temporarily removing cpus from cpupools during suspend/resume,
only permanently remove those cpus which didn't come up again when
resuming.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
6 years agoxen: don't free percpu areas during suspend
Juergen Gross [Tue, 2 Apr 2019 05:34:55 +0000 (07:34 +0200)]
xen: don't free percpu areas during suspend

Instead of freeing percpu areas during suspend and allocating them
again when resuming, keep them. Only free an area in case a cpu didn't
come up again when resuming.

It should be noted that there is a potential change in behaviour as
the percpu areas are no longer zeroed out during suspend/resume. While
I have checked that the called cpu notifier hooks cope with that, there
might be some well hidden dependency on the previous behaviour. OTOH
a component not registering itself for cpu down/up and expecting to
see a zeroed percpu variable after suspend/resume is kind of broken
already. And the opposite case, where a component is not registered
to be called for cpu down/up and is not expecting a percpu variable
suddenly to be zero due to suspend/resume, is much more probable,
especially as the suspend/resume functionality seems not to be tested
that often.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen: add new cpu notifier action CPU_RESUME_FAILED
Juergen Gross [Tue, 2 Apr 2019 05:34:54 +0000 (07:34 +0200)]
xen: add new cpu notifier action CPU_RESUME_FAILED

Add a new cpu notifier action CPU_RESUME_FAILED which is called for all
cpus which failed to come up at resume. The calls are made after all
other cpus are up, so that it is known which resources are available
at that point.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
6 years agoxen: add helper for calling notifier_call_chain() to common/cpu.c
Juergen Gross [Tue, 2 Apr 2019 05:34:53 +0000 (07:34 +0200)]
xen: add helper for calling notifier_call_chain() to common/cpu.c

Add a helper cpu_notifier_call_chain() to call notifier_call_chain()
for a cpu with a specified action, returning an errno value.

This avoids coding the same pattern multiple times.

While at it, avoid side effects from using BUG_ON() by not using
cpu_online(cpu) as a parameter.
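The pattern being factored out can be modelled roughly as below. This is a simplified, self-contained model: an errno is folded into the notifier return value together with a stop bit, but the exact encoding, toy_call_chain(), toy_cpu_notifier_call_chain() and the callbacks are all hypothetical and not Xen's actual code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical encoding: 0 means "carry on"; an error is stored as its
 * magnitude together with a stop bit. */
#define TOY_NOTIFY_DONE      0x0000
#define TOY_NOTIFY_STOP_MASK 0x8000

static int toy_notifier_from_errno(int err)
{
    return err ? (TOY_NOTIFY_STOP_MASK | -err) : TOY_NOTIFY_DONE;
}

static int toy_notifier_to_errno(int ret)
{
    return -(ret & ~TOY_NOTIFY_STOP_MASK);
}

typedef int (*toy_cpu_callback_t)(unsigned long action, unsigned int cpu);

/* Call each callback in turn, stopping at the first that sets the stop bit. */
static int toy_call_chain(const toy_cpu_callback_t *chain, size_t n,
                          unsigned long action, unsigned int cpu)
{
    int ret = TOY_NOTIFY_DONE;

    for ( size_t i = 0; i < n; i++ )
    {
        ret = chain[i](action, cpu);
        if ( ret & TOY_NOTIFY_STOP_MASK )
            break;
    }
    return ret;
}

/* The factored-out helper: one place converts the raw result to an errno. */
static int toy_cpu_notifier_call_chain(const toy_cpu_callback_t *chain,
                                       size_t n, unsigned int cpu,
                                       unsigned long action)
{
    return toy_notifier_to_errno(toy_call_chain(chain, n, action, cpu));
}

static int cb_ok(unsigned long action, unsigned int cpu)
{
    (void)action;
    (void)cpu;
    return TOY_NOTIFY_DONE;
}

static int cb_busy_on_cpu3(unsigned long action, unsigned int cpu)
{
    (void)action;
    return toy_notifier_from_errno(cpu == 3 ? -EBUSY : 0);
}

static const toy_cpu_callback_t toy_chain[] = { cb_ok, cb_busy_on_cpu3 };
```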

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/sched: call cpu_disable_scheduler() via cpu notifier
Juergen Gross [Tue, 2 Apr 2019 16:19:05 +0000 (18:19 +0200)]
xen/sched: call cpu_disable_scheduler() via cpu notifier

cpu_disable_scheduler() is being called from __cpu_disable() today.
There is no need to execute it on the cpu just being disabled, so use
the CPU_DEAD case of the cpu notifier chain. Moving the call out of
stop_machine() context is fine, as we just need to hold the domain RCU
lock and need the scheduler percpu data to be still allocated.

Add another hook for CPU_DOWN_PREPARE to bail out early in case
cpu_disable_scheduler() would fail. This will avoid crashes in rare
cases for cpu hotplug or suspend.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoautomation: Add Arch Linux container and build jobs
Anthony PERARD [Wed, 3 Apr 2019 17:33:58 +0000 (18:33 +0100)]
automation: Add Arch Linux container and build jobs

One particularity of Arch Linux: /usr/bin/python is python3.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agox86/altp2m: treat view 0 as the hostp2m in p2m_get_mem_access()
Razvan Cojocaru [Wed, 3 Apr 2019 08:56:37 +0000 (11:56 +0300)]
x86/altp2m: treat view 0 as the hostp2m in p2m_get_mem_access()

p2m_set_mem_access() (and other places) treat view 0 as the
hostp2m, but p2m_get_mem_access() does not. Correct that
inconsistency.
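As a rough model of the rule being made consistent: view index 0 selects the host p2m, and any other in-range index selects an altp2m view. The structures, toy_lookup_view() and the MAX_ALTP2M value used here are assumptions for illustration only.

```c
#include <stddef.h>

#define MAX_ALTP2M 10   /* assumed value for this sketch */

struct toy_p2m { int id; };

struct toy_domain {
    struct toy_p2m hostp2m;
    struct toy_p2m altp2m[MAX_ALTP2M];
};

static struct toy_domain toy_dom;

/* Resolve the p2m a mem_access query should consult. */
static struct toy_p2m *toy_lookup_view(struct toy_domain *d, unsigned int idx)
{
    if ( idx == 0 )
        return &d->hostp2m;       /* view 0 is the host p2m */
    if ( idx >= MAX_ALTP2M )
        return NULL;              /* out of range */
    return &d->altp2m[idx];
}
```

Using one helper like this for both the get and set paths is one way to keep them from drifting apart again.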

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
6 years agoamd-iommu: use a bitfield for DTE
Paul Durrant [Wed, 3 Apr 2019 13:16:08 +0000 (15:16 +0200)]
amd-iommu: use a bitfield for DTE

The current use of get/set_field_from/in_reg_u32() is both inefficient and
requires some ugly casting.

This patch defines a new bitfield structure (amd_iommu_dte) and uses this
structure in all DTE manipulation, resulting in much more readable and
compact code.

NOTE: This patch also includes some clean-up of get_dma_requestor_id() to
      change the types of the arguments from u16 to uint16_t.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agoamd-iommu: use a bitfield for PTE/PDE
Paul Durrant [Wed, 3 Apr 2019 13:15:29 +0000 (15:15 +0200)]
amd-iommu: use a bitfield for PTE/PDE

The current use of get/set_field_from/in_reg_u32() is both inefficient and
requires some ugly casting.

This patch defines a new bitfield structure (amd_iommu_pte) and uses this
structure in all PTE/PDE manipulation, resulting in much more readable
and compact code.

NOTE: This commit also fixes one malformed comment in
      set_iommu_pte_present().
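The sketch below contrasts a mask-and-shift accessor with a bitfield structure. The helper names and the field layout of toy_iommu_pte are illustrative assumptions, not the real amd_iommu_pte definition. Bitfield layout is implementation-defined in C, so this style relies on a known compiler and ABI.

```c
#include <stdint.h>

/* Mask-and-shift style, in the spirit of the old register-field helpers. */
static uint32_t toy_get_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
    return (reg & mask) >> shift;
}

static uint32_t toy_set_field(uint32_t reg, uint32_t val, uint32_t mask,
                              unsigned int shift)
{
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Bitfield style: an illustrative 64-bit entry, widths summing to 64. */
struct toy_iommu_pte {
    uint64_t pr:1;          /* present */
    uint64_t ign0:8;
    uint64_t next_level:3;
    uint64_t mfn:40;        /* machine frame number */
    uint64_t reserved:9;
    uint64_t ir:1;          /* read permitted */
    uint64_t iw:1;          /* write permitted */
    uint64_t ign1:1;
};

static struct toy_iommu_pte toy_make_pte(uint64_t mfn, unsigned int level,
                                         int readable, int writable)
{
    struct toy_iommu_pte pte = {
        .pr = 1,
        .next_level = level,
        .mfn = mfn,
        .ir = !!readable,
        .iw = !!writable,
    };

    return pte;
}
```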

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agoxen/tools/symbols.c: fix potential segfault
Xiaochen Wang [Wed, 3 Apr 2019 08:18:20 +0000 (10:18 +0200)]
xen/tools/symbols.c: fix potential segfault

Description:
This bug hardly ever appears when compiling a real kernel,
because the vmlinux symbol table is huge.

But we can still trigger it under strict conditions, as follows.
   $ echo "c101b97b T do_fork" | ./scripts/kallsyms --all-symbols
   #include <asm/types.h>
   ......
   ......
   .globl kallsyms_token_table
           ALGN
   kallsyms_token_table:
   Segmentation fault (core dumped)
   $

If the symbol table is small, all entries in token_profit[0x10000] may
decrease to 0 after several calls of compress_symbols() in optimize_result().
In that case, find_best_token() always returns 0, so
best_table[i] is set to "\0\0" and best_table_len[i] is set to 2.

As a result, expand_symbol(best_table[0]="\0\0", best_table_len[0]=2, buf)
in write_src() will run in infinite recursion until stack overflows,
causing segfault.

This patch checks the find_best_token() return value. If it indicates
that all entries in token_profit[0x10000] have become 0, the loop in
optimize_result() is exited early. expand_symbol() works well when
best_table_len[i] is 0.
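A simplified sketch of the guard, loosely modelled on the description above. compress_symbols() is replaced by simply zeroing the chosen token's profit, and optimize_result_sketch() is a hypothetical name; the real symbols.c code differs in detail.

```c
#define TOKEN_SPACE 0x10000

static int token_profit[TOKEN_SPACE];
static unsigned char best_table[256][2];
static unsigned char best_table_len[256];

/* Token with the highest profit; index 0 if every profit is equal. */
static int find_best_token(void)
{
    int i, best = 0, bestprofit = -10000;

    for ( i = 0; i < TOKEN_SPACE; i++ )
        if ( token_profit[i] > bestprofit )
        {
            best = i;
            bestprofit = token_profit[i];
        }
    return best;
}

/* Fill free table slots with the most profitable tokens; returns the count. */
static int optimize_result_sketch(void)
{
    int i, filled = 0;

    for ( i = 255; i >= 0; i-- )
    {
        if ( best_table_len[i] )
            continue;

        int best = find_best_token();

        /* The fix: with all profits at 0 the "best" token is 0, which would
         * produce a "\0\0" entry that expand_symbol() recurses into forever. */
        if ( token_profit[best] == 0 )
            break;

        best_table_len[i] = 2;
        best_table[i][0] = best & 0xFF;
        best_table[i][1] = (best >> 8) & 0xFF;
        token_profit[best] = 0;   /* stand-in for compress_symbols() */
        filled++;
    }
    return filled;
}
```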

Signed-off-by: Xiaochen Wang <wangxiaochen0@gmail.com>
[Linux: e0a04b11e4059cab033469617 scripts/kallsyms.c: fix potential segfault]
Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoVT-d: return full destination ID for IO-APIC reads
Jan Beulich [Wed, 3 Apr 2019 08:15:54 +0000 (10:15 +0200)]
VT-d: return full destination ID for IO-APIC reads

In x2APIC mode it is 32 bits wide. Not having returned the full value
was mostly benign: We never modify the ID based on its original value;
full new values get written at all times. It was "just" debug logging
which ended up wrong this way (and which will need adjustment itself as
well, to also consume the full value).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86/IO-APIC: consolidate / complete #define-s
Jan Beulich [Wed, 3 Apr 2019 08:15:20 +0000 (10:15 +0200)]
x86/IO-APIC: consolidate / complete #define-s

Drop redundant ones from apic.h. Add a delivery mode mask. Use them in
place of open-coded hex numbers.

Take the opportunity to modify a helper function's parameters to be
just unsigned int. Also drop the bogus double underscore from its name,
as it and all its callers get touched anyway.
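As an illustration of replacing open-coded hex numbers with named constants plus a mask, the sketch below manipulates the delivery mode field, bits 10:8 of an IO-APIC redirection entry's low dword. The macro and function names are hypothetical and are not the ones in Xen's headers.

```c
#include <stdint.h>

#define RTE_DELIVERY_MODE_SHIFT 8
#define RTE_DELIVERY_MODE_MASK  (7u << RTE_DELIVERY_MODE_SHIFT)
#define RTE_DM_FIXED            (0u << RTE_DELIVERY_MODE_SHIFT)
#define RTE_DM_LOWEST           (1u << RTE_DELIVERY_MODE_SHIFT)
#define RTE_DM_NMI              (4u << RTE_DELIVERY_MODE_SHIFT)
#define RTE_DM_EXTINT           (7u << RTE_DELIVERY_MODE_SHIFT)

/* Replace the delivery mode, leaving the vector and other bits intact. */
static uint32_t rte_set_delivery_mode(uint32_t lo, uint32_t mode)
{
    return (lo & ~RTE_DELIVERY_MODE_MASK) | (mode & RTE_DELIVERY_MODE_MASK);
}

static int rte_is_nmi(uint32_t lo)
{
    return (lo & RTE_DELIVERY_MODE_MASK) == RTE_DM_NMI;
}
```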

No functional change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86emul: suppress general register update upon AVX gather failures
Jan Beulich [Wed, 3 Apr 2019 08:14:32 +0000 (10:14 +0200)]
x86emul: suppress general register update upon AVX gather failures

While destination and mask registers may indeed need updating in this
case, the rIP update in particular needs to be avoided, as well as e.g.
raising a single step trap.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/vvmx: Fix debug prints to not have 17 unnecessary spaces
Andrew Cooper [Wed, 27 Mar 2019 19:52:17 +0000 (19:52 +0000)]
x86/vvmx: Fix debug prints to not have 17 unnecessary spaces

This has been problematic since its introduction in Xen 4.3.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agotools/ocaml: make python scripts 2 and 3 compatible
Wei Liu [Mon, 1 Apr 2019 10:32:38 +0000 (11:32 +0100)]
tools/ocaml: make python scripts 2 and 3 compatible

1. Explicitly import reduce because that's required in 3.
2. Change print to function.
3. Eliminate invocations of has_key.

Signed-off-by: M A Young <m.a.young@durham.ac.uk>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
6 years agopygrub: encode / decode string in Python 3
Wei Liu [Mon, 1 Apr 2019 10:32:37 +0000 (11:32 +0100)]
pygrub: encode / decode string in Python 3

Strings are unicode in 3 but bytes in 2, so we need to call the encode /
decode functions when using Python 3.

Reported-by: M A Young <m.a.young@durham.ac.uk>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agopygrub/grub: always use integer for default entry
Wei Liu [Mon, 1 Apr 2019 10:32:36 +0000 (11:32 +0100)]
pygrub/grub: always use integer for default entry

The original code set the default to either a string or an integer
(0) and relied on Python 2 specific behaviour to work (an integer is
allowed to be compared to a string in Python 2 but not in 3).

Always use integer. The caller (pygrub) already has code to handle
that.

Reported-by: M A Young <m.a.young@durham.ac.uk>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agopygrub: fix message in grub parser
Wei Liu [Mon, 1 Apr 2019 10:32:35 +0000 (11:32 +0100)]
pygrub: fix message in grub parser

The code suggests 0 is allowed. Zero is not a positive number.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxen/sched: Remove d->is_pinned
Andrew Cooper [Mon, 1 Apr 2019 10:08:43 +0000 (10:08 +0000)]
xen/sched: Remove d->is_pinned

The is_pinned field is rather odd.  It can only be activated with the
"dom0_vcpus_pin" command line option, and causes dom0 (or the late hwdom) to
have its vcpus identity pinned to pcpus.

Having dom0_vcpus_pin active disallows the use of vcpu_set_hard_affinity().
However, when a pcpu is offlined, or moved between cpupools, the affinity is
broken and reverts to cpumask_all.  This results in vcpus which are no longer
pinned, and cannot be adjusted.

A related bit of functionality is the is_pinned_vcpu() predicate.  This is
only used by x86 code, and permits the use of VCPUOP_get_physid and writeable
access to some extra MSRs.

The implementation however returns true for is_pinned (which will include
unpinned vcpus from the above scenario), *or* if the hard affinity mask only
has a single bit set (which is redundant with the intended effect of
is_pinned, but also includes other domains).

Rework the behaviour of "dom0_vcpus_pin" to be only an initial pinning
configuration, and permit full adjustment.  This allows the user to
reconfigure dom0 after the fact or fix up from the fallout of cpu hot unplug
and cpupool manipulation.

An unprivileged domain has no business using VCPUOP_get_physid, and shouldn't
be able to just because it happens to be pinned by admin choice.  All uses of
is_pinned_vcpu() should be restricted to the hardware domain, so rename it to
is_hwdom_pinned_vcpu() to avoid future misuse.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
6 years agotools/xenmon: make xenmon.py compatible with python 2 and 3
Wei Liu [Mon, 1 Apr 2019 10:39:00 +0000 (11:39 +0100)]
tools/xenmon: make xenmon.py compatible with python 2 and 3

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/APIC: suppress redundant "Switched to ..." messages
Jan Beulich [Mon, 1 Apr 2019 09:12:54 +0000 (11:12 +0200)]
x86/APIC: suppress redundant "Switched to ..." messages

There's no need to log anything when what we "switch to" is what is in
use already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul/fuzz: adjust canonicalization in sanitize_input()
Jan Beulich [Mon, 1 Apr 2019 09:12:16 +0000 (11:12 +0200)]
x86emul/fuzz: adjust canonicalization in sanitize_input()

Drop it entirely for %rbp - this register is not special purpose enough
to warrant such special treatment. Add a comment to clarify the purpose
of the canonicalization of %rip and %rsp.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/paging: paging_set_allocation() is init-only
Jan Beulich [Mon, 1 Apr 2019 09:09:43 +0000 (11:09 +0200)]
x86/paging: paging_set_allocation() is init-only

This is needed for Dom0 creation only; therefore it is additionally
framed by an #ifdef.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
6 years agoxen/timers: Document and improve the representation of the timer heap metadata
Andrew Cooper [Fri, 29 Mar 2019 13:32:09 +0000 (13:32 +0000)]
xen/timers: Document and improve the representation of the timer heap metadata

The {GET,SET}_HEAP_{SIZE,LIMIT}() macros implement some completely
undocumented pointer misuse to store the size and limit information.  In
practice, heap[0] is never a timer pointer, and used to stash the metadata
instead.

Extend the HEAP OPERATIONS comment to include this detail.  Introduce a
structure representing the heap metadata, and a static inline function to
perform the type punning.
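A sketch of the idea, assuming the metadata is two 16-bit fields and the heap is allocated with malloc(). heap_alloc() is a hypothetical helper, and the static assertion stands in for whatever build-time check the real code uses.

```c
#include <stdint.h>
#include <stdlib.h>

struct timer;  /* opaque in this sketch */

/* Metadata stashed in heap[0] instead of a timer pointer. */
struct heap_metadata {
    uint16_t size, limit;
};

/* Type-pun heap[0] as the metadata; it must not spill into heap[1]. */
static inline struct heap_metadata *heap_metadata(struct timer **heap)
{
    _Static_assert(sizeof(struct heap_metadata) <= sizeof(struct timer *),
                   "heap metadata must fit in heap[0]");
    return (struct heap_metadata *)&heap[0];
}

/* Allocate room for 'limit' timers in heap[1..limit]. */
static struct timer **heap_alloc(uint16_t limit)
{
    struct timer **heap = malloc((limit + 1u) * sizeof(*heap));

    if ( heap )
    {
        heap_metadata(heap)->size = 0;
        heap_metadata(heap)->limit = limit;
    }
    return heap;
}
```

Routing every access through one typed helper replaces the scattered casts of the old macros.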

Replace all of the above macros with an equivalent expression involving the
heap_metadata() helper.  Note that I deliberately haven't rearranged the
surrounding code - this allows the correctness of the transformation to be
checked by confirming that the compiled binary is identical.

This also removes two cases of a macro argument with side effects, which only
worked correctly because the arguments were only evaluated once.

Finally, fix up the type of dummy_heap.  The old code functioned correctly,
but only by virtue of confusing a discrete object and a single-entry array.
Change its type to match the intended semantics, and drop the redundant
initialisation in timer_init().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/sched: fix credit2 smt idle handling
Juergen Gross [Thu, 28 Mar 2019 15:46:22 +0000 (16:46 +0100)]
xen/sched: fix credit2 smt idle handling

Credit2's smt_idle_mask_set() and smt_idle_mask_clear() are used to
identify idle cores where vcpus can be moved to. A core is thought to
be idle when all siblings are known to have the idle vcpu running on
them.

Unfortunately, the information about a vcpu running on a cpu is kept per
runqueue. So in case not all siblings are in the same runqueue, a core
will never be regarded as idle, as a sibling not in the runqueue
is never known to run the idle vcpu.

Use a credit2-specific cpumask of siblings in which only those cpus
are marked which are in the same runqueue as the cpu in question.
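The effect of the fix can be sketched with plain 64-bit masks standing in for cpumask_t, where bit n is cpu n. core_is_idle() and rq_siblings() are hypothetical helpers, not credit2 code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t;

/* A core counts as idle only if every sibling in 'sibs' is known idle. */
static bool core_is_idle(toy_cpumask_t sibs, toy_cpumask_t idlers)
{
    return (sibs & idlers) == sibs;
}

/* Restrict the hardware sibling mask to cpus in the same runqueue. */
static toy_cpumask_t rq_siblings(toy_cpumask_t hw_siblings,
                                 toy_cpumask_t rq_cpus)
{
    return hw_siblings & rq_cpus;
}
```

With siblings {0,1} split across runqueues, cpu 1 never appears in runqueue {0,2}'s idle set, so only the restricted mask lets the core be seen as idle.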

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
6 years agolibx86: Recalculate synthesised cpuid_policy fields when appropriate
Andrew Cooper [Tue, 10 Jul 2018 12:53:21 +0000 (13:53 +0100)]
libx86: Recalculate synthesised cpuid_policy fields when appropriate

When filling a policy, either from CPUID or an incoming leaf stream,
recalculate the synthesised vendor value.  All callers are expected to want
this behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agotools/libxc: Use x86_cpuid_lookup_vendor() rather than opencoding the logic
Andrew Cooper [Wed, 20 Mar 2019 14:56:15 +0000 (14:56 +0000)]
tools/libxc: Use x86_cpuid_lookup_vendor() rather than opencoding the logic

This doesn't address any of the assumptions that "anything which isn't AMD is
Intel".  This logic is expected to be replaced wholesale with libx86 in the
long term.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/cpuid: Drop get_cpu_vendor() completely
Andrew Cooper [Tue, 10 Jul 2018 12:53:21 +0000 (13:53 +0100)]
x86/cpuid: Drop get_cpu_vendor() completely

get_cpu_vendor() tries to do a number of things, and ends up doing none of
them well.

For calculating the vendor itself, use x86_cpuid_lookup_vendor() which is
implemented in a far more efficient manner than looping over cpu_devs[].

For setting up this_cpu, set it up once on the BSP only, rather than
latest-takes-precedence across the APs.  Such a system is probably not going to
boot, but this feels like a less dangerous course of action.  Adjust the
printed errors to be clearer in the mismatch case.

This removes the only user of cpu_dev->c_ident[], so drop that field as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agolibx86: Introduce x86_cpuid_lookup_vendor()
Andrew Cooper [Wed, 20 Mar 2019 14:05:11 +0000 (14:05 +0000)]
libx86: Introduce x86_cpuid_lookup_vendor()

Also introduce constants for the vendor strings in CPUID leaf 0.
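For reference, CPUID leaf 0 returns the vendor string in EBX, EDX and ECX, four ASCII bytes per register in little-endian order. A sketch of a lookup keyed on EBX first follows; the constant and function names are hypothetical, and whether libx86 structures its lookup this way is an assumption.

```c
#include <stdint.h>

#define TOY_INTEL_EBX 0x756e6547u /* "Genu" */
#define TOY_INTEL_EDX 0x49656e69u /* "ineI" */
#define TOY_INTEL_ECX 0x6c65746eu /* "ntel" */
#define TOY_AMD_EBX   0x68747541u /* "Auth" */
#define TOY_AMD_EDX   0x69746e65u /* "enti" */
#define TOY_AMD_ECX   0x444d4163u /* "cAMD" */

enum toy_vendor { TOY_VENDOR_UNKNOWN, TOY_VENDOR_INTEL, TOY_VENDOR_AMD };

/* Switch on EBX, then confirm ECX/EDX: no loop over a vendor table. */
static enum toy_vendor toy_lookup_vendor(uint32_t ebx, uint32_t ecx,
                                         uint32_t edx)
{
    switch ( ebx )
    {
    case TOY_INTEL_EBX:
        if ( ecx == TOY_INTEL_ECX && edx == TOY_INTEL_EDX )
            return TOY_VENDOR_INTEL;
        break;
    case TOY_AMD_EBX:
        if ( ecx == TOY_AMD_ECX && edx == TOY_AMD_EDX )
            return TOY_VENDOR_AMD;
        break;
    }
    return TOY_VENDOR_UNKNOWN;
}
```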

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoCI: Add a CentOS 6 container and build jobs
Andrew Cooper [Tue, 26 Mar 2019 14:23:03 +0000 (14:23 +0000)]
CI: Add a CentOS 6 container and build jobs

CentOS 6 is probably the most frequently broken build, so adding it to CI
would be a very good move.

One problem is that CentOS 6 comes with Python 2.6, and Qemu requires 2.7.
There appear to be no sensible ways to get Python 2.7 into a CentOS 6
environment, so modify the build script to skip the Qemu upstream build
instead.  Additionally, SeaBIOS requires GCC 4.6 or later, so skip it as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agoCI: Fix indentation in containerize script
Andrew Cooper [Fri, 22 Mar 2019 11:12:28 +0000 (11:12 +0000)]
CI: Fix indentation in containerize script

The script is mostly indented with spaces, but there are three tabs.  Fix them
up to be consistent.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agodocs/admin-guide: Boot time microcode loading
Andrew Cooper [Mon, 18 Mar 2019 16:22:29 +0000 (16:22 +0000)]
docs/admin-guide: Boot time microcode loading

Recent discussion on xen-devel has demonstrated that Xen's existing microcode
loading support isn't adequately documented.  Take the opportunity to address
this, and start some end-user focused documentation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agodocs/rst: Use pandoc to render ReStructuredText
Andrew Cooper [Wed, 21 Nov 2018 17:03:50 +0000 (17:03 +0000)]
docs/rst: Use pandoc to render ReStructuredText

Sphinx uses ReStructuredText as its markup format.  Although missing the
project wide integration, individual *.rst files can be rendered by pandoc to
supplement our existing ad-hoc documentation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agodocs/sphinx: Skeleton setup
Andrew Cooper [Wed, 21 Nov 2018 17:03:50 +0000 (17:03 +0000)]
docs/sphinx: Skeleton setup

Sphinx is a documentation system, which is popular for technical writing.  It
uses ReStructuredText as its markup syntax, and is designed for whole-project
documentation, rather than the misc assortment of individual files that we
currently have.

This is a skeleton setup with just enough infrastructure to render an empty
set of pages.  It will become better integrated into Xen's docs system when it
becomes less WIP.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agopassthrough/vtd: Drop the "workaround_bios_bug" logic entirely
Andrew Cooper [Thu, 21 Mar 2019 19:36:48 +0000 (19:36 +0000)]
passthrough/vtd: Drop the "workaround_bios_bug" logic entirely

It turns out that this code was previously dead.

c/s dcf41790 "x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
enough for acpi_parse_one_drhd() to not take the

  /* Skip checking if segment is not accessible yet. */

path unconditionally.  However, some systems have DMAR tables which list
devices which are disabled by user choice (in particular, Dell PowerEdge R740
with I/O AT DMA disabled), and turning off all IOMMU functionality in this
case is entirely unhelpful behaviour.

Leave the warning which identifies the problematic devices, but drop the
remaining logic.  This leaves the system in a better overall state, and working
in the same way that it did in previous releases.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/drivers: char: Match #if CONFIG_DEBUG_TRACE and #endif comment
Julien Grall [Tue, 4 Dec 2018 18:02:40 +0000 (18:02 +0000)]
xen/drivers: char: Match #if CONFIG_DEBUG_TRACE and #endif comment

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/memory: Fix typo in the comment on top of check_get_page_from_gfn
Julien Grall [Sat, 9 Mar 2019 21:20:23 +0000 (21:20 +0000)]
xen/memory: Fix typo in the comment on top of check_get_page_from_gfn

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/mm: Fix typo in comment on top of page_lock
Julien Grall [Sun, 10 Mar 2019 12:41:01 +0000 (12:41 +0000)]
x86/mm: Fix typo in comment on top of page_lock

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agolibxc: fix HVM core dump
Wei Liu [Wed, 20 Mar 2019 15:43:38 +0000 (15:43 +0000)]
libxc: fix HVM core dump

f969bc9fc96 forbade get_address_size calls on HVM guests, because that
didn't make sense. It broke core dump functionality on HVM because
libxc unconditionally asked for the guest width.

Force guest_width to a sensible value.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agox86: decouple xen alignment setting from EFI/ELF build
Wei Liu [Tue, 19 Mar 2019 13:57:06 +0000 (13:57 +0000)]
x86: decouple xen alignment setting from EFI/ELF build

Introduce a new Kconfig option to pick the alignment for the xen binary.
To retain the original behaviour, the default is 2M for EFI builds and
4K for ELF builds.

Make the PVHSHIM build use 2M alignment for potentially better
performance.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>