xenbits.xensource.com Git - people/dariof/xen.git/log
5 years ago x86/pv: Fix Clang build with !CONFIG_PV32
Andrew Cooper [Tue, 5 May 2020 13:03:35 +0000 (14:03 +0100)]
x86/pv: Fix Clang build with !CONFIG_PV32

Clang 3.5 doesn't do enough dead-code-elimination to drop the compat_gdt
reference, resulting in a linker failure:

  hidden symbol `per_cpu__compat_gdt' isn't defined

Drop the local variable, and move the evaluation of this_cpu(compat_gdt) to
within the guarded region.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86/pv: Prune include lists
Andrew Cooper [Tue, 5 May 2020 10:27:22 +0000 (11:27 +0100)]
x86/pv: Prune include lists

Several of these in particular haven't been pruned since the logic was all
part of arch/x86/traps.c.

Some adjustments to header files are required to avoid compile errors:
 * emulate.h needs xen/sched.h because gdt_ldt_desc_ptr() uses v->vcpu_id.
 * mmconfig.h needs to forward declare acpi_table_header.
 * shadow.h and trace.h need to have uint*_t in scope before including the Xen
   public headers.  For shadow.h, reorder the includes.  For trace.h, include
   types.h.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds
Andrew Cooper [Tue, 5 May 2020 10:17:32 +0000 (11:17 +0100)]
x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds

The caller is already guarded by is_pv_32bit_vcpu().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/hvm: simplify hvm_physdev_op allowance control
Roger Pau Monné [Tue, 5 May 2020 07:52:28 +0000 (09:52 +0200)]
x86/hvm: simplify hvm_physdev_op allowance control

PVHv1 dom0 was given access to all PHYSDEVOP hypercalls, and such
restriction was not removed when PVHv1 code was removed. As a result
the switch in hvm_physdev_op was more complicated than required, and
relied on PVHv2 dom0 not having PIRQ support in order to prevent
access to some PV specific PHYSDEVOPs.

Fix this by moving the default case to the bottom of the switch, since
there's no need for any fall through now. Also remove the hardware
domain check, as all the not explicitly listed PHYSDEVOPs are
forbidden for HVM domains.

Finally tighten the condition to allow usage of
PHYSDEVOP_pci_mmcfg_reserved: apart from having vPCI enabled it should
only be used by the hardware domain. Note that the code in
do_physdev_op is already restricting the call to privileged domains
only, but it can be further restricted to the hardware domain only, as
other privileged domains don't have access to MMCFG regions anyway.

Overall no functional change should arise from this change.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: extend x86_insn_is_mem_write() coverage
Jan Beulich [Tue, 5 May 2020 07:50:54 +0000 (09:50 +0200)]
x86emul: extend x86_insn_is_mem_write() coverage

Several insns were missed when this function was first added. As far as
insns already supported by the emulator go - SMSW and {,V}STMXCSR were
wrongly considered r/o insns so far.

Insns like the VMX, SVM, or CET-SS ones, PTWRITE, or AMD's new SNP ones
are intentionally not covered just yet. VMPTRST is put there just to
complete the respective group.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/amd: Initial support for Fam19h processors
Andrew Cooper [Thu, 30 Apr 2020 09:47:14 +0000 (10:47 +0100)]
x86/amd: Initial support for Fam19h processors

Fam19h is very similar to Fam17h in these regards.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/HyperV: correct hv_hcall_page for xen.efi build
Jan Beulich [Mon, 4 May 2020 09:54:35 +0000 (11:54 +0200)]
x86/HyperV: correct hv_hcall_page for xen.efi build

Along the lines of what the not reverted part of 3c4b2eef4941 ("x86:
refine link time stub area related assertion") did, we need to transform
the absolute HV_HCALL_PAGE into the image base relative hv_hcall_page
(or else there'd be no need for two distinct symbols). Otherwise
mkreloc, as used for generating the base relocations of xen.efi, will
spit out warnings like "Difference at .text:0009b74f is 0xc0000000
(expected 0x40000000)". As long as the offending relocations are PC
relative ones, the generated binary is correct afaict, but if there ever
was the absolute address stored, xen.efi would miss a fixup for it.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago x86/EFI: correct section offsets in mkreloc diagnostics
Jan Beulich [Mon, 4 May 2020 09:53:42 +0000 (11:53 +0200)]
x86/EFI: correct section offsets in mkreloc diagnostics

These are more helpful if they point at the address where the relocated
value starts, rather than at the specific byte of the difference.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/hap: be more selective with assisted TLB flush
Roger Pau Monné [Mon, 4 May 2020 09:53:01 +0000 (11:53 +0200)]
x86/hap: be more selective with assisted TLB flush

When doing an assisted flush on HAP the purpose of the
on_selected_cpus is just to trigger a vmexit on remote CPUs that are
in guest context, and hence just using is_vcpu_dirty_cpu is too lax;
also check that the vCPU is running. Due to the lazy context switching
done by Xen dirty_cpu won't always be cleared when the guest vCPU is
not running, and hence relying on is_running allows more fine grained
control of whether the vCPU is actually running.
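
A minimal sketch of the selection described above, not the code from this
patch. The struct, the SKETCH_NO_CPU marker and sketch_flush_mask() are
hypothetical stand-ins for the vCPU state, the "not dirty anywhere" value and
the mask handed to on_selected_cpus; the 64-CPU limit is an assumption made
to keep the mask in one integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not Xen's real definitions. */
#define SKETCH_NO_CPU UINT32_MAX   /* vCPU state is not dirty on any pCPU */

struct sketch_vcpu {
    bool     is_running;   /* currently executing in guest context */
    uint32_t dirty_cpu;    /* pCPU still holding its state, or SKETCH_NO_CPU */
};

/*
 * Build a bitmask of physical CPUs that need an IPI to force a vmexit.
 * A stale dirty_cpu left behind by lazy context switching is ignored
 * unless the vCPU is also running. Assumes fewer than 64 pCPUs.
 */
static uint64_t sketch_flush_mask(const struct sketch_vcpu *v, unsigned int nr)
{
    uint64_t mask = 0;

    for ( unsigned int i = 0; i < nr; i++ )
        if ( v[i].is_running && v[i].dirty_cpu != SKETCH_NO_CPU &&
             v[i].dirty_cpu < 64 )
            mask |= UINT64_C(1) << v[i].dirty_cpu;

    return mask;
}

static void sketch_demo(void)
{
    const struct sketch_vcpu vcpus[] = {
        { .is_running = true,  .dirty_cpu = 2 },
        { .is_running = false, .dirty_cpu = 5 },   /* stale, lazily kept */
        { .is_running = false, .dirty_cpu = SKETCH_NO_CPU },
    };

    /* Only pCPU 2 needs an IPI. */
    assert(sketch_flush_mask(vcpus, 3) == (UINT64_C(1) << 2));
}
```

Checking dirty_cpu alone would also have selected pCPU 5 here, which is the
kind of unnecessary IPI the patch avoids.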

I've measured the time of the non-local branch of flush_area_mask
inside the shim running with 32 vCPUs over 100000 executions and
averaged the result on a large Westmere system (80 ways total). The
figures were fetched during the boot of a SLES 11 PV guest. The
results are as follows (lower is better):

Non assisted flush with x2APIC:      112406ns
Assisted flush without this patch:   820450ns
Assisted flush with this patch:        8330ns

While there, also pass NULL as the data parameter of on_selected_cpus,
since the dummy handler doesn't consume the data in any way.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xenoprof: limit scope of types and #define-s
Jan Beulich [Mon, 4 May 2020 09:51:47 +0000 (11:51 +0200)]
xenoprof: limit scope of types and #define-s

Quite a few of the items are used by xenoprof.c only, so move them there
to limit their visibility as well as the amount of re-building needed in
case of changes. Also drop the inclusion of the public header there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago xenoprof: drop unused struct xenoprof fields
Jan Beulich [Mon, 4 May 2020 09:51:18 +0000 (11:51 +0200)]
xenoprof: drop unused struct xenoprof fields

Both is_primary and domain_ready are only ever written to. Drop both
fields and restrict structure visibility to just the one involved CU.
While doing so (and just for starters) make "is_compat" properly bool.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago xenoprof: adjust ordering of page sharing vs domain type setting
Jan Beulich [Mon, 4 May 2020 09:48:13 +0000 (11:48 +0200)]
xenoprof: adjust ordering of page sharing vs domain type setting

Buffer pages should be shared with "ignored" or "active" guests only
(besides, obviously, the primary profiling domain). Hence domain type
should be set to "ignored" before unsharing from the primary domain
(which implies even a previously "passive" domain may then access its
buffers, albeit that's not very useful unless it gets promoted to
"active" subsequently), i.e. such that no further writes of records to
the buffer would occur, and (at least for consistency) also before
sharing it (with the calling domain) from the XENOPROF_get_buffer path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/CPUID: correct error indicator for max extended leaf
Jan Beulich [Thu, 30 Apr 2020 08:45:09 +0000 (10:45 +0200)]
x86/CPUID: correct error indicator for max extended leaf

With the max base leaf using 0, this one should be using the extended
leaf counterpart thereof, rather than some arbitrary extended leaf.

Fixes: 588a966a572e ("libx86: Introduce x86_cpu_policies_are_compatible()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly
Wei Liu [Thu, 30 Apr 2020 08:44:34 +0000 (10:44 +0200)]
x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

Also, clean up the initialisation of plXe.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago mem_sharing: map shared_info page to same gfn during fork
Tamas K Lengyel [Thu, 30 Apr 2020 08:43:52 +0000 (10:43 +0200)]
mem_sharing: map shared_info page to same gfn during fork

During a VM fork we copy the shared_info page; however, we also need to ensure
that the page is mapped into the same GFN in the fork as it is in the parent.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86/pass-through: avoid double IRQ unbind during domain cleanup
Jan Beulich [Thu, 30 Apr 2020 08:40:59 +0000 (10:40 +0200)]
x86/pass-through: avoid double IRQ unbind during domain cleanup

XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTs.
In that scenario, it is possible to receive multiple _pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
  domain_kill()
    -> domain_relinquish_resources()
      -> pci_release_devices()
        -> pci_clean_dpci_irq()
          -> pirq_guest_unbind()
            -> __pirq_guest_unbind()

Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
from the tree being iterated after the first call there. In case such a
removed entry still has a softirq outstanding, record it and re-check
upon re-invocation.

Note that pirq_cleanup_check() gets relaxed beyond what's strictly
needed here, to avoid introducing an asymmetry there between HVM and PV
guests.

Reported-by: Varad Gautam <vrd@amazon.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Varad Gautam <vrd@amazon.de>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86: drop high compat r/o M2P table address range
Jan Beulich [Thu, 30 Apr 2020 08:38:07 +0000 (10:38 +0200)]
x86: drop high compat r/o M2P table address range

Now that we don't properly hook things up into the page tables anymore
we also don't need to set aside an address range. Drop it, using
compat_idle_pg_table_l2[] simply (explicitly) from slot 0.

While doing the re-arrangement, which is accompanied by the dropping or
replacing of some local variables, restrict the scopes of some further
ones at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/msr: Fix XEN_MSR_PAT to build with older binutils
Andrew Cooper [Thu, 30 Apr 2020 08:34:56 +0000 (10:34 +0200)]
x86/msr: Fix XEN_MSR_PAT to build with older binutils

Older binutils complains with:
  trampoline.S:95: Error: junk `ul&0xffffffff' after expression

Use an assembly-safe constant.
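
One common way to get an assembly-safe constant is to attach the C type
suffix only when the header is compiled as C, so the assembler sees a bare
number. The sketch below illustrates that pattern. SKETCH_AC and
SKETCH_MSR_VALUE are hypothetical names, the value is only illustrative, and
the source does not say that this is the exact form the patch uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: append the type suffix only for C, so that assembly
 * sources (built with __ASSEMBLY__ defined) never see "ul"/"ULL".
 */
#ifdef __ASSEMBLY__
# define SKETCH_AC(x, suffix) x
#else
# define SKETCH_AC_(x, suffix) (x ## suffix)
# define SKETCH_AC(x, suffix) SKETCH_AC_(x, suffix)
#endif

/* Illustrative 64-bit value; not claimed to be Xen's PAT layout. */
#define SKETCH_MSR_VALUE SKETCH_AC(0x0000050100070406, ULL)

/* Low 32 bits, the kind of expression the old assembler rejected. */
static uint32_t sketch_low_half(void)
{
    return (uint32_t)(SKETCH_MSR_VALUE & 0xffffffff);
}

/* High 32 bits of the same constant. */
static uint32_t sketch_high_half(void)
{
    return (uint32_t)(SKETCH_MSR_VALUE >> 32);
}

static void sketch_demo(void)
{
    assert(sketch_low_half() == 0x00070406u);
    assert(sketch_high_half() == 0x00000501u);
}
```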

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: drop unnecessary page table walking in compat r/o M2P handling
Jan Beulich [Thu, 30 Apr 2020 08:28:27 +0000 (10:28 +0200)]
x86: drop unnecessary page table walking in compat r/o M2P handling

We have a global variable where the necessary L2 table is recorded; no
need to inspect L4 and L3 tables (and this way a few less places will
eventually need adjustment when we want to support 5-level page tables).
Also avoid setting up the L3 entry, as the address range never gets used
anyway (it'll be dropped altogether in a subsequent patch).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/boot: Don't enable EFER.SCE for !CONFIG_PV builds
Andrew Cooper [Mon, 20 Apr 2020 13:36:53 +0000 (14:36 +0100)]
x86/boot: Don't enable EFER.SCE for !CONFIG_PV builds

This will cause all SYSCALL/SYSRET instructions to suffer #UD rather than
following the MSR_{L,C}STAR pointers.  That lets us drop the star_enter()
panic helper, which in turn allows the IST stacks to be cleaned up in a
subsequent patch.

Drop the now-dead conditional SYSENTER logic in the middle of
subarch_percpu_traps_init().

In addition, vmx_restore_host_msrs() need not restore any host
state.  (Regarding the asymmetric changes, VT-x automatically restores
SYSENTER state on vmexit, and SVM restores both SYSCALL/SYSENTER state with
the VMSAVE/VMLOAD instructions.)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago x86/pv: Compile out compat_gdt in !CONFIG_PV builds
Andrew Cooper [Fri, 17 Apr 2020 14:49:59 +0000 (15:49 +0100)]
x86/pv: Compile out compat_gdt in !CONFIG_PV builds

There is no need for the Compat GDT if there are no 32bit PV guests.  This
saves 4k per online CPU.

Bloat-o-meter reports the following savings in Xen itself:

  add/remove: 0/3 grow/shrink: 1/4 up/down: 7/-4612 (-4605)
  Function                                     old     new   delta
  cpu_smpboot_free                            1249    1256      +7
  per_cpu__compat_gdt_l1e                        8       -      -8
  per_cpu__compat_gdt                            8       -      -8
  init_idt_traps                               442     420     -22
  load_system_tables                           414     364     -50
  trap_init                                    444     280    -164
  cpu_smpboot_callback                        1255     991    -264
  boot_compat_gdt                             4096       -   -4096
  Total: Before=3062726, After=3058121, chg -0.15%

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds
Andrew Cooper [Fri, 17 Apr 2020 14:36:06 +0000 (15:36 +0100)]
x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds

... and move arch.is_32bit_pv into the pv union while at it.

Adjust the impacted code to use true/false, dropping the hunk in
pv_domain_initialise() which is storing 0 into an already zeroed
datastructure.

Bloat-o-meter reports the following net savings with some notable differences
highlighted:

  add/remove: 4/6 grow/shrink: 5/76 up/down: 1955/-18792 (-16837)
  Function                                     old     new   delta
  ...
  pv_vcpu_initialise                           411     158    -253
  guest_cpuid                                 1837    1584    -253
  pv_hypercall                                 579     297    -282
  check_descriptor                             427     130    -297
  _get_page_type                              5915    5202    -713
  arch_get_info_guest                         2225    1195   -1030
  context_switch                              3831    2635   -1196
  dom0_construct_pv                          10284    8939   -1345
  arch_set_info_guest                         5564    3267   -2297
  Total: Before=3079563, After=3062726, chg -0.55%

In principle, DOMAIN_is_32bit_pv should be based on CONFIG_PV32, but the
assembly code is going to need further untangling before that becomes easy to
do.  For now, use CONFIG_PV as missed accidentally by c/s ec651bd2460 "x86:
make entry point code build when !CONFIG_PV".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pv: Options to disable and/or compile out 32bit PV support
Andrew Cooper [Fri, 17 Apr 2020 11:39:40 +0000 (12:39 +0100)]
x86/pv: Options to disable and/or compile out 32bit PV support

This is the start of some performance and security-hardening improvements,
based on the fact that 32bit PV guests are few and far between these days.

Ring1 is full of architectural corner cases, such as counting as supervisor
from a paging point of view.  This accounts for a substantial performance hit
on processors from the last 8 years (adjusting SMEP/SMAP on every privilege
transition), and the gap is only going to get bigger with new hardware
features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/S3: Drop {save,restore}_rest_processor_state() completely
Andrew Cooper [Wed, 11 Dec 2019 20:59:19 +0000 (20:59 +0000)]
x86/S3: Drop {save,restore}_rest_processor_state() completely

There is no need to save/restore FS/GS/XCR0 state.  It will be handled
suitably on the context switch away from the idle.

The CR4 restoration in restore_rest_processor_state() was actually fighting
later code in enter_state() which tried to keep CR4.MCE clear until everything
was set up.  Delete the intermediate restoration, and defer final restoration
until after MCE is reconfigured.

Restoring PAT can be done earlier, and ideally before paging is enabled.  By
moving it into the trampoline during the setup for 64bit, the call can be
dropped from cpu_init().  The EFI boot path doesn't disable paging, so
make the adjustment when switching onto Xen's pagetables.

The only remaining piece of restoration is load_system_tables(), so suspend.c
can be deleted in its entirety.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/grants: fix hypercall continuation for GNTTABOP_cache_flush
Juergen Gross [Wed, 22 Apr 2020 13:07:53 +0000 (15:07 +0200)]
xen/grants: fix hypercall continuation for GNTTABOP_cache_flush

The GNTTABOP_cache_flush hypercall has a wrong test for hypercall
continuation; the test today is:

    if ( rc > 0 || opaque_out != 0 )

Unfortunately this will be true even in case of an error (rc < 0),
possibly leading to very long lasting hypercalls (times of more
than an hour have been observed in a test case).

Correct the test condition to result in false with rc < 0 and set
opaque_out only if no error occurred, to be on the safe side.
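
To make the difference concrete, here is a small sketch comparing the quoted
test with one possible corrected form. The function names are hypothetical,
and the corrected predicate is only an assumption that satisfies the stated
requirement (false whenever rc < 0); the source does not show the expression
the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed meaning of the inputs: rc < 0 is an error, rc > 0 is progress
 * that needs another pass, and a non-zero opaque_out is saved state from
 * a partially handled entry.
 */

/* The test quoted above: also true for rc < 0 with stale opaque state. */
static bool sketch_old_needs_continuation(int rc, unsigned long opaque_out)
{
    return rc > 0 || opaque_out != 0;
}

/* One possible correction: never continue once an error was returned. */
static bool sketch_new_needs_continuation(int rc, unsigned long opaque_out)
{
    return rc >= 0 && (rc > 0 || opaque_out != 0);
}

static void sketch_demo(void)
{
    /* Error with leftover opaque state: old test keeps restarting. */
    assert(sketch_old_needs_continuation(-22, 1));
    assert(!sketch_new_needs_continuation(-22, 1));

    /* Non-error cases behave identically. */
    assert(sketch_new_needs_continuation(3, 0));
    assert(sketch_new_needs_continuation(0, 7));
    assert(!sketch_new_needs_continuation(0, 0));
}
```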

Partially-suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago x86/hyperv: L0 assisted TLB flush
Wei Liu [Thu, 9 Apr 2020 17:41:04 +0000 (18:41 +0100)]
x86/hyperv: L0 assisted TLB flush

Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
of several hypercalls:

 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX

Pick the most efficient hypercall available.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years ago x86/hyperv: skeleton for L0 assisted TLB flush
Wei Liu [Thu, 9 Apr 2020 17:41:03 +0000 (18:41 +0100)]
x86/hyperv: skeleton for L0 assisted TLB flush

Implement a basic hook for L0 assisted TLB flush. The hook needs to
check if prerequisites are met. If they are not met, it returns an error
number to fall back to native flushes.

Introduce a new variable to indicate if hypercall page is ready.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years ago x86/hypervisor: pass flags to hypervisor_flush_tlb
Wei Liu [Thu, 9 Apr 2020 17:41:02 +0000 (18:41 +0100)]
x86/hypervisor: pass flags to hypervisor_flush_tlb

Hyper-V's L0 assisted flush has fine-grained control over what gets
flushed. We need all the flags available to make the best decisions
possible.

No functional change because Xen's implementation doesn't care about
what is passed to it.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/tlb: use Xen L0 assisted TLB flush when available
Roger Pau Monné [Wed, 29 Apr 2020 07:10:19 +0000 (09:10 +0200)]
x86/tlb: use Xen L0 assisted TLB flush when available

Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a large number of vCPUs as a Xen guest, and is especially important
when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real    4m35.973s
user    4m35.110s
sys     36m24.117s

Using L0 assisted flush:
real    1m2.596s
user    4m34.818s
sys     5m16.374s

The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook.
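
The hook arrangement can be pictured roughly as follows. This is a simplified
sketch, not the real hypervisor_ops: the struct layout, the function names
and the choice of -EOPNOTSUPP for "no hook installed" are all assumptions
made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified hook table. */
struct sketch_hypervisor_ops {
    const char *name;
    /* Optional hook: returns 0 on success or a negative errno value. */
    int (*flush_tlb)(unsigned long cpu_mask, const void *va,
                     unsigned int flags);
};

static unsigned int sketch_native_flushes;

static void sketch_native_flush(unsigned long cpu_mask)
{
    (void)cpu_mask;
    sketch_native_flushes++;    /* stands in for IPI-based flushing */
}

/* Try the assisted flush first; fall back to the native path on error. */
static void sketch_flush(const struct sketch_hypervisor_ops *ops,
                         unsigned long cpu_mask, const void *va,
                         unsigned int flags)
{
    int rc = (ops && ops->flush_tlb) ? ops->flush_tlb(cpu_mask, va, flags)
                                     : -EOPNOTSUPP;

    if ( rc )
        sketch_native_flush(cpu_mask);
}

/* An enlightenment that ignores mask and address and flushes globally. */
static int sketch_global_flush(unsigned long cpu_mask, const void *va,
                               unsigned int flags)
{
    (void)cpu_mask; (void)va; (void)flags;
    return 0;
}

static void sketch_demo(void)
{
    const struct sketch_hypervisor_ops none = { "none", NULL };
    const struct sketch_hypervisor_ops l0 = { "l0", sketch_global_flush };

    sketch_native_flushes = 0;
    sketch_flush(&none, 0x6, NULL, 0);
    assert(sketch_native_flushes == 1);   /* no hook: native flush used */

    sketch_flush(&l0, 0x6, NULL, 0);
    assert(sketch_native_flushes == 1);   /* hook succeeded: no fallback */
}
```

Another enlightenment would only need to supply its own flush_tlb function;
the caller stays unchanged.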

Note that the Xen implementation completely ignores the dirty CPU mask
and the linear address passed in, and always performs a global TLB
flush on all vCPUs. This is a limitation of the hypercall provided by
Xen. Also note that local TLB flushes are not performed using the
assisted TLB flush, only remote ones.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/tlb: allow disabling the TLB clock
Roger Pau Monné [Wed, 29 Apr 2020 07:07:32 +0000 (09:07 +0200)]
x86/tlb: allow disabling the TLB clock

The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case however when Xen is running virtualized, and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.
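
A sketch of how such a boolean can force NEED_FLUSH to be true, assuming the
timestamp check treats a current time of 0 as "always flush". All names here
are hypothetical, and the comparison is a simplified rendering of a
timestamp check rather than a copy of Xen's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; names do not match Xen's. */
static bool sketch_clock_enabled = true;
static uint32_t sketch_clock = 1;    /* 0 is reserved to mean "must flush" */

static uint32_t sketch_current_time(void)
{
    /* With the clock disabled, report 0 so every check forces a flush. */
    return sketch_clock_enabled ? sketch_clock : 0;
}

/*
 * Does a CPU last flushed at cpu_stamp still need a flush for a page last
 * used at lastuse_stamp? A current time of 0 always answers "yes".
 */
static bool sketch_need_flush(uint32_t cpu_stamp, uint32_t lastuse_stamp)
{
    uint32_t now = sketch_current_time();

    return now == 0 ||
           (cpu_stamp <= lastuse_stamp && lastuse_stamp <= now);
}

static void sketch_demo(void)
{
    sketch_clock_enabled = true;
    sketch_clock = 10;
    assert(!sketch_need_flush(8, 5));   /* flushed after the last use */
    assert(sketch_need_flush(3, 5));    /* not flushed since the last use */

    sketch_clock_enabled = false;
    assert(sketch_need_flush(8, 5));    /* clock off: always flush */
}
```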

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/tlb: introduce a flush HVM ASIDs flag
Roger Pau Monné [Wed, 29 Apr 2020 07:04:40 +0000 (09:04 +0200)]
x86/tlb: introduce a flush HVM ASIDs flag

Introduce a specific flag to request a HVM guest linear TLB flush,
which is an ASID/VPID tickle that forces a guest linear to guest
physical TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the page tables the guest runs on
in that case (i.e. when using HAP). Note that shadow paging code
already takes care of issuing the necessary flushes when the shadow
page tables are modified.

In order to keep the previous behavior, modify all shadow code TLB
flushes to also flush the guest linear to physical TLB if the guest is
HVM. I haven't looked at each specific shadow code TLB flush in order
to figure out whether it actually requires a guest TLB flush or not,
so there might be room for improvement in that regard.

Also perform ASID/VPID flushes when modifying the p2m tables as it's a
requirement for AMD hardware. Finally keep the flush in
switch_cr3_cr4, as it's not clear whether code could rely on
switch_cr3_cr4 also performing a guest linear TLB flush. A following
patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
not be necessary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years ago PCI: drop a redundant variable from pci_add_device()
Jan Beulich [Tue, 28 Apr 2020 15:49:55 +0000 (17:49 +0200)]
PCI: drop a redundant variable from pci_add_device()

Surrounding code already uses the available alternative, after all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago x86/pv: map and unmap page table in dom0_construct_pv
Wei Liu [Tue, 28 Apr 2020 15:49:17 +0000 (17:49 +0200)]
x86/pv: map and unmap page table in dom0_construct_pv

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/smpboot: map and unmap page tables in cleanup_cpu_root_pgt
Wei Liu [Tue, 28 Apr 2020 15:48:36 +0000 (17:48 +0200)]
x86/smpboot: map and unmap page tables in cleanup_cpu_root_pgt

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86_64/mm: map and unmap page tables in subarch_memory_op
Wei Liu [Tue, 28 Apr 2020 15:48:02 +0000 (17:48 +0200)]
x86_64/mm: map and unmap page tables in subarch_memory_op

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86_64/mm: map and unmap page tables in subarch_init_memory
Wei Liu [Tue, 28 Apr 2020 15:47:20 +0000 (17:47 +0200)]
x86_64/mm: map and unmap page tables in subarch_init_memory

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86_64/mm: map and unmap page tables in cleanup_frame_table
Wei Liu [Tue, 28 Apr 2020 15:46:29 +0000 (17:46 +0200)]
x86_64/mm: map and unmap page tables in cleanup_frame_table

Also fix a weird indentation and use PAGE_{MASK,SIZE} there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago tools/xenstore: simplify socket initialization
Juergen Gross [Tue, 28 Apr 2020 14:58:37 +0000 (16:58 +0200)]
tools/xenstore: simplify socket initialization

The setup of file descriptors for the Xenstore sockets is needlessly
complicated: the space is allocated dynamically, while two static
variables really would do the job.

For tearing down the sockets it is easier to widen the scope of the
file descriptors from function to file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago MAINTAINERS: list myself as mini-os reviewer
Wei Liu [Tue, 28 Apr 2020 11:23:46 +0000 (12:23 +0100)]
MAINTAINERS: list myself as mini-os reviewer

I probably don't have much time to actually review patches, but I do
want to be CC'ed such that I can commit patches in a timely manner.

Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pvh: Override opt_console_xen earlier
Andrew Cooper [Mon, 27 Apr 2020 12:19:15 +0000 (13:19 +0100)]
x86/pvh: Override opt_console_xen earlier

This allows printk() to work from the start of day, and backtraces from as
early as the IDT is set up.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/S3: Use percpu_traps_init() rather than opencoding SYSCALL/SYSENTER restoration
Andrew Cooper [Mon, 20 Apr 2020 13:54:30 +0000 (14:54 +0100)]
x86/S3: Use percpu_traps_init() rather than opencoding SYSCALL/SYSENTER restoration

This makes the S3 BSP path consistent with AP paths, and reduces the amount of
state needing stashing specially.  Also, it takes care of re-setting up Xen's
LBR configuration if requested, which was missing previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago golang/xenlight: stop tracking generated files
Nick Rosbrook [Fri, 24 Apr 2020 03:05:41 +0000 (23:05 -0400)]
golang/xenlight: stop tracking generated files

The generated go files were tracked temporarily while the initial
implementation of gengotypes.py was in progress. They can now be removed
and ignored by git and hg.

While here, make sure generated files are removed by make clean.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago tools: build golang tools if go compiler is present
Nick Rosbrook [Fri, 24 Apr 2020 03:05:40 +0000 (23:05 -0400)]
tools: build golang tools if go compiler is present

By default, if the go compiler is found by the configure script, build
the golang tools. If the compiler is not found, and --enable-golang was
not explicitly set, do not build the golang tools.

The corresponding make variable is CONFIG_GOLANG. Remove CONFIG_GOLANG
from tools/Rules.mk since the variable is now set by configure in
config/Tools.mk.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago xen/build: silence make warnings about missing auto.conf*
Anthony PERARD [Mon, 27 Apr 2020 07:31:13 +0000 (09:31 +0200)]
xen/build: silence make warnings about missing auto.conf*

In a clean tree, both files include/config/auto.conf{,.cmd} are
missing and older versions of GNU Make complain about it:
    Makefile:103: include/config/auto.conf: No such file or directory
    Makefile:106: include/config/auto.conf.cmd: No such file or directory

Those warnings are harmless, make will create the files and start over. But
to avoid confusion, we'll use "-include" to silence the warning.

Those warnings started to appear with commit 6c122d3984a5 ("xen/build:
include include/config/auto.conf in main Makefile").

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago guestcopy: evaluate {,__}copy{,_field}_to_guest*() ptr argument just once
Jan Beulich [Mon, 27 Apr 2020 07:30:16 +0000 (09:30 +0200)]
guestcopy: evaluate {,__}copy{,_field}_to_guest*() ptr argument just once

There's nothing wrong with having e.g.

    copy_to_guest(uarg, ptr++, 1);

yet until now this would increment "ptr" twice.
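
The double increment is the usual hazard of a macro that names an argument
more than once. The sketch below reproduces it with hypothetical macros
(they are not the real guest-access helpers, and the int-only types are an
assumption to keep it short): the buggy form uses "ptr" both in a type check
and in the copy, while the fixed form latches it into a local first.

```c
#include <assert.h>
#include <string.h>

/* Buggy: "ptr" is evaluated by the type check and again by the copy. */
#define SKETCH_COPY_TWICE(dst, ptr, nr) do {               \
    (void)((dst) == (ptr));                                \
    memcpy((dst), (ptr), sizeof(*(dst)) * (nr));           \
} while ( 0 )

/* Fixed: evaluate "ptr" exactly once, then use only the local copy. */
#define SKETCH_COPY_ONCE(dst, ptr, nr) do {                \
    const int *src_ = (ptr);                               \
    (void)((dst) == src_);                                 \
    memcpy((dst), src_, sizeof(*src_) * (nr));             \
} while ( 0 )

static void sketch_demo(void)
{
    const int src[4] = { 10, 20, 30, 40 };
    const int *p = src;
    int out = 0;

    SKETCH_COPY_TWICE(&out, p++, 1);
    assert(p == src + 2);   /* advanced twice */
    assert(out == 20);      /* and the wrong element was copied */

    p = src;
    SKETCH_COPY_ONCE(&out, p++, 1);
    assert(p == src + 1);   /* advanced once */
    assert(out == 10);
}
```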

Also drop a pair of unneeded parentheses from every instance at this
occasion.

Fixes: b7954cc59831 ("Enhance guest memory accessor macros so that source operands can be")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago guest_access: harden *copy_to_guest_offset() to prevent const dest operand
Julien Grall [Mon, 27 Apr 2020 07:28:21 +0000 (09:28 +0200)]
guest_access: harden *copy_to_guest_offset() to prevent const dest operand

At the moment, *copy_to_guest_offset() will allow the hypervisor to copy
data to a guest handle marked const.

Thankfully, no users of the helper will do that. Rather than hoping this
can be caught during review, harden copy_to_guest_offset() so the build
will fail if such users are introduced.

There is no easy way to check whether a variable is const in C99. The
approach used is to introduce an unused variable that is non-const and
assign the handle to it. If the handle were const, this would fail at
build time because, without an explicit cast, it is not possible to
assign a const variable to a non-const variable.
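
A minimal sketch of the idea, using a hypothetical COPY_TO() macro rather
than the real guest handle helpers. The assumption is that the copy path
contains a cast that would otherwise hide a const qualifier, and that the
build treats the resulting "discards qualifier" diagnostic as fatal (for
example via -Werror).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy macro. The (char *) cast would silently strip a const
 * qualifier from "dst", so the extra void * initialisation is what makes
 * the compiler complain when "dst" points to const data.
 */
#define COPY_TO(dst, src, nr) do {                        \
    void *writable_check_ = (dst);                        \
    char *d_ = (char *)(dst);                             \
    (void)writable_check_;                                \
    memcpy(d_, (src), sizeof(*(src)) * (nr));             \
} while (0)

static int copy_one(int value)
{
    int buf[1] = { 0 };

    /*
     * const int *ro = buf;
     * COPY_TO(ro, &value, 1);   <- would draw a diagnostic at build time
     */
    COPY_TO(buf, &value, 1);
    return buf[0];
}
```

The check costs nothing at run time: the extra variable is unused and is
expected to be optimised away.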

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoIntroduce a description of the Backport and Fixes tags
Stefano Stabellini [Tue, 21 Apr 2020 18:29:46 +0000 (11:29 -0700)]
Introduce a description of the Backport and Fixes tags

Create a new document under docs/process to describe our special tags.
Add a description of the Fixes tag and the new Backport tag. Also
clarify that lines with tags should not be split.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: jbeulich@suse.com
CC: george.dunlap@citrix.com
CC: julien@xen.org
CC: lars.kurth@citrix.com
CC: andrew.cooper3@citrix.com
CC: konrad.wilk@oracle.com
5 years agoUpdate QEMU_TRADITIONAL_REVISION
Ian Jackson [Fri, 24 Apr 2020 14:49:23 +0000 (15:49 +0100)]
Update QEMU_TRADITIONAL_REVISION

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86: drop cpu_has_ffxsr
Jan Beulich [Fri, 24 Apr 2020 13:06:15 +0000 (15:06 +0200)]
x86: drop cpu_has_ffxsr

Its definition is bogus when it comes to Hygon CPUs, but since we don't
use it anywhere drop it rather than correcting it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agomem_sharing: fix sharability check during fork reset
Tamas K Lengyel [Fri, 24 Apr 2020 13:05:33 +0000 (15:05 +0200)]
mem_sharing: fix sharability check during fork reset

When resetting a VM fork we ought to only remove pages that were allocated for
the fork during its execution and the contents copied over from the parent.
This can be determined if the page is sharable as special pages used by the
fork for other purposes will not pass this test. Unfortunately during the fork
reset loop we only partially check whether that's the case. A page's type may
indicate it is sharable (pass p2m_is_sharable) but that's not a sufficient
check by itself. All checks that are normally performed before a page is
converted to the sharable type need to be performed to avoid removing pages
from the p2m that may be used for other purposes. For example, currently the
reset loop also removes the vcpu info pages from the p2m, potentially putting
the guest into infinite page-fault loops.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen/build: start using if_changed
Anthony PERARD [Fri, 24 Apr 2020 13:02:03 +0000 (15:02 +0200)]
xen/build: start using if_changed

This patch starts to use if_changed introduced in a previous commit.

Whenever if_changed is called, the target must have FORCE as
dependency so that if_changed can check if the command line to be
run has changed, so the macro $(real-prereqs) must be used to
discover the dependencies without "FORCE".

Whenever a target isn't in obj-y, it should be added to extra-y so the
.*.cmd dependency file associated with the target can be loaded. This
is done for xsm/flask/ and both common/lib{elf,fdt}/ and
arch/x86/Makefile.

For the targets that generate .*.d dependency files, there's going to
be two dependency files (.*.d and .*.cmd) until we can merge them
together in a later patch via fixdep from Linux.

One cleanup, libelf-relocate.o doesn't exist anymore.

We import cmd_ld and cmd_objcopy from Linux v5.4.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agoxen/build: introduce if_changed and if_changed_rule
Anthony PERARD [Fri, 24 Apr 2020 13:01:11 +0000 (15:01 +0200)]
xen/build: introduce if_changed and if_changed_rule

The if_changed macro from Linux, in addition to checking whether any
file needs an update, checks whether the command line has changed since
the last invocation. The latter will force a rebuild if any options to
the executable have changed.
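
The decision if_changed makes can be modelled roughly as follows. This is
a conceptual sketch in C, not the make implementation: the struct, the
field names and needs_rebuild() are all hypothetical, and timestamps
stand in for make's own prerequisite checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the last build of one target. */
struct target_state {
    const char *saved_cmd;      /* command line from the .cmd file, or NULL */
    long target_mtime;          /* when the target was last built */
    long newest_prereq_mtime;   /* newest prerequisite timestamp */
};

/* Rebuild when a prerequisite is newer or the command line differs. */
static bool needs_rebuild(const struct target_state *t, const char *cmd)
{
    if (t->newest_prereq_mtime > t->target_mtime)
        return true;                        /* a prerequisite is newer */
    if (t->saved_cmd == NULL)
        return true;                        /* no recorded command line */
    return strcmp(t->saved_cmd, cmd) != 0;  /* options have changed */
}
```

A plain timestamp comparison would miss the last case, which is the one
if_changed adds.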

if_changed_rule checks dependencies like if_changed, but executes
rule_$(1) instead of cmd_$(1) when a target needs to be rebuilt. A rule_
macro can call more than one cmd_ macro. One of the cmd_ macros in a
rule needs to be called using a macro that records the command line, so
cmd_and_record is introduced. It is similar to cmd_and_fixup from
Linux but without a call to fixdep, which we don't have yet. (We will
later replace cmd_and_record by cmd_and_fixup.)

Example of a rule_ macro:
define rule_cc_o_c
    $(call cmd_and_record,cc_o_c)
    $(call cmd,objcopy)
endef

One of the calls needs to use cmd_and_record, otherwise no .*.cmd
file will be created, and the target will keep being rebuilt.

In order for if_changed to work correctly, we need to load the .%.cmd
files that the macro generates. This is done by adding targets to
the $(targets) variable. We use intermediate_targets to add the %.o
dependency of each %.init.o to targets, since those aren't in obj-y.

We also add $(MAKECMDGOALS) to targets so that when running for
example `make common/memory.i`, make will load the associated .%.cmd
dependency file.

Besides if_changed*, we import the machinery used to "beautify" the
output. The important part is running make with V=2, which helps to
debug the makefiles by printing why a target is being rebuilt, via the
$(echo-why) macro.

if_changed and if_changed_rule aren't used yet.

Most of this code is copied from Linux v5.4, including the
documentation.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: introduce documentation for xen Makefiles
Anthony PERARD [Fri, 24 Apr 2020 13:00:37 +0000 (15:00 +0200)]
build: introduce documentation for xen Makefiles

This starts explaining the variables that can be used in the many
Makefiles in xen/.

Most of the document copies and modifies text from Linux v5.4 document
linux.git/Documentation/kbuild/makefiles.rst. Modifications are mostly
to avoid mentioning kbuild. Thus I've added the SPDX tag which was
only in index.rst in linux.git.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/build: have the root Makefile generates the CFLAGS
Anthony PERARD [Fri, 24 Apr 2020 12:57:10 +0000 (14:57 +0200)]
xen/build: have the root Makefile generates the CFLAGS

Instead of generating the CFLAGS in Rules.mk every time we enter a new
subdirectory, we are going to generate most of them a single time, and
export the result in the environment so that Rules.mk can use it.  The
only flags left to be generated are the ones that depend on the
targets, but the variable $(c_flags) takes care of that.

Arch specific CFLAGS are generated by a new file "arch/*/arch.mk"
which is included by the root Makefile.

We export the *FLAGS via the environment variables XEN_*FLAGS because
Rules.mk still includes Config.mk and would add duplicated flags to
CFLAGS.

When running Rules.mk in the root directory (xen/), the variable
`root-make-done' is set, so `need-config' will remain undef and so the
root Makefile will not generate the cflags again.

We can't use CFLAGS in subdirectories to add flags to particular
targets, instead start to use CFLAGS-y. Idem for AFLAGS.
So there are two different CFLAGS-y, the one in xen/Makefile (and
arch.mk), and the one in subdirs that Rules.mk is going to use.
We can't add to XEN_CFLAGS because it is exported, so changes made to
it might be propagated to subdirectories, which isn't intended.

Some style changes are introduced in this patch:
    when LDFLAGS_DIRECT is included in LDFLAGS
    use of CFLAGS-$(CONFIG_INDIRECT_THUNK) instead of ifeq().

The LTO change hasn't been tested properly, as LTO is marked as
broken.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agogolang/xenlight: Implement DomainCreateNew
George Dunlap [Tue, 24 Dec 2019 12:51:56 +0000 (12:51 +0000)]
golang/xenlight: Implement DomainCreateNew

This implements the wrapper around libxl_domain_create_new().  With
the previous changes, it's now possible to create a domain using the
golang bindings (although not yet to unpause it or harvest it after it
shuts down).

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogolang/xenlight: Notify xenlight of SIGCHLD
George Dunlap [Thu, 26 Dec 2019 17:35:27 +0000 (17:35 +0000)]
golang/xenlight: Notify xenlight of SIGCHLD

libxl forks external processes and waits for them to complete; it
therefore needs to be notified when children exit.

In absence of instructions to the contrary, libxl sets up its own
SIGCHLD handlers.

Golang always unmasks and handles SIGCHLD itself.  libxl thankfully
notices this and throws an assert() rather than clobbering SIGCHLD
handlers.

Tell libxl that we'll be responsible for getting SIGCHLD notifications
to it.  Arrange for a channel in the context to receive notifications
on SIGCHLD, and set up a goroutine that will pass these on to libxl.

NB that every libxl context needs a notification; so multiple contexts
will each spin up their own goroutine when opening a context, and shut
it down on close.

libxl also wants to hold on to a const pointer to
xenlight_childproc_hooks rather than do a copy; so make a global
structure in C space.  Make it `static const`, just for extra safety;
this requires making a function in the C space to pass it to libxl.

While here, add a few comments to make the context set-up a bit easier
to follow.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogolang/xenlight: Don't try to marshall zero-length arrays in fromC
George Dunlap [Thu, 26 Dec 2019 13:08:05 +0000 (13:08 +0000)]
golang/xenlight: Don't try to marshall zero-length arrays in fromC

The current fromC array code will do the "magic" casting and
marshalling even when the num_foo variable is 0.  Go crashes when doing
the cast.

Only do array marshalling if the number of elements is non-zero;
otherwise, leave the target pointer empty (nil for Go slices, NULL for
C arrays).
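
The guard described above can be sketched on the C side as follows. The
helper name copy_ints() is hypothetical and only illustrates the "skip
when the count is zero" rule; the real bindings are generated Go and C
code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies "count" ints; returns NULL without touching "src" when count is 0. */
static int *copy_ints(const int *src, size_t count)
{
    int *dst;

    if (count == 0)
        return NULL;            /* nothing to marshal: leave target empty */

    dst = malloc(count * sizeof(*dst));
    if (dst != NULL)
        memcpy(dst, src, count * sizeof(*dst));
    return dst;                 /* caller frees; NULL on allocation failure */
}
```

Because the zero-length case returns early, a NULL source pointer paired
with a count of 0 is never dereferenced.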

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogolang/xenlight: add DeviceUsbdevAdd/Remove wrappers
Nick Rosbrook [Sun, 12 Apr 2020 22:02:42 +0000 (18:02 -0400)]
golang/xenlight: add DeviceUsbdevAdd/Remove wrappers

Add DeviceUsbdevAdd and DeviceUsbdevRemove as wrappers for
libxl_device_usbdev_add and libxl_device_usbdev_remove.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: add DevicePciAdd/Remove wrappers
Nick Rosbrook [Sun, 12 Apr 2020 22:02:41 +0000 (18:02 -0400)]
golang/xenlight: add DevicePciAdd/Remove wrappers

Add DevicePciAdd and DevicePciRemove as wrappers for
libxl_device_pci_add and libxl_device_pci_remove.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: add DeviceNicAdd/Remove wrappers
Nick Rosbrook [Sun, 12 Apr 2020 22:02:40 +0000 (18:02 -0400)]
golang/xenlight: add DeviceNicAdd/Remove wrappers

Add DeviceNicAdd and DeviceNicRemove as wrappers for
libxl_device_nic_add and libxl_device_nic_remove.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agomem_sharing: allow forking domain with IOMMU enabled
Tamas K Lengyel [Thu, 23 Apr 2020 08:03:18 +0000 (10:03 +0200)]
mem_sharing: allow forking domain with IOMMU enabled

The memory sharing subsystem by default doesn't allow a domain to share memory
if it has an IOMMU active for obvious security reasons. However, when fuzzing a
VM fork, the same security restrictions don't necessarily apply. While it makes
no sense to try to create a full fork of a VM that has an IOMMU attached as only
one domain can own the pass-through device at a time, creating a shallow fork
without a device model is still very useful for fuzzing kernel-mode drivers.

By allowing the parent VM to initialize the kernel-mode driver with a real
device that's passed through, the driver can enter into a state more suitable for
fuzzing. Some of these initialization steps are quite complex and are easier to
perform when a real device is present. After the initialization, shallow forks
can be utilized for fuzzing code-segments in the device driver that don't
directly interact with the device.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen/build: use new $(c_flags) and $(a_flags) instead of $(CFLAGS)
Anthony PERARD [Thu, 23 Apr 2020 08:00:07 +0000 (10:00 +0200)]
xen/build: use new $(c_flags) and $(a_flags) instead of $(CFLAGS)

In a later patch ("xen/build: have the root Makefile generates the
CFLAGS"), we want to generate the CFLAGS in xen/Makefile, then export
it and have Rules.mk use a CFLAGS from the environment variables. That
changes the flavor of the CFLAGS, and flags intended for one target
(like -D__OBJECT_FILE__ and -M%) get propagated and duplicated. So we
start by moving such flags out of $(CFLAGS) and into $(c_flags), which
is to be modified only by Rules.mk.

__OBJECT_FILE__ is only used by arch/x86/mm/*.c files, so having it in
$(c_flags) is enough, we don't need it in $(a_flags).

For include/Makefile and as-insn we can keep using CFLAGS, but since
it doesn't have -M* flags anymore there is no need to filter them out.

The XEN_BUILD_EFI test in arch/x86/Makefile was filtering out
CFLAGS-y, but according to dd40177c1bc8 ("x86-64/EFI: add CFLAGS to
check compile"), it was done to filter out -MF. CFLAGS doesn't
have those flags anymore, so no filtering is needed.

This is inspired by the way Kbuild generates CFLAGS for each target.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/build: include include/config/auto.conf in main Makefile
Anthony PERARD [Thu, 23 Apr 2020 07:59:27 +0000 (09:59 +0200)]
xen/build: include include/config/auto.conf in main Makefile

We are going to generate the CFLAGS early from "xen/Makefile" instead
of in "Rules.mk", but we need to include "config/auto.conf", so
include it in "Makefile".

Before including "config/auto.conf" we check which make target a user
is calling, as some targets don't need "auto.conf". For targets that
need auto.conf, make will generate it (and a default .config if
missing).

root-make-done is to avoid doing the calculation again once Rules.mk
takes over and is being executed with the root Makefile. When Rules.mk
is including xen/Makefile, `config-build' and `need-config' are
undefined so auto.conf will not be included again (it is already
included by Rules.mk) and kconfig targets are out of reach of Rules.mk.

We are introducing a target %config to catch all targets for kconfig.
So we need an extra target %/.config to prevent make from trying to
regenerate $(XEN_ROOT)/.config that is included in Config.mk.

The way targets are filtered is inspired by Kbuild, with some code
imported from Linux. That's why there is a PHONY variable that isn't
used yet, for example.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild,xsm: fix multiple call
Anthony PERARD [Thu, 23 Apr 2020 07:59:05 +0000 (09:59 +0200)]
build,xsm: fix multiple call

Both scripts, mkflask.sh and mkaccess_vector.sh, generate multiple
files. Exploit the 'multi-target pattern rule' trick to call each
script only once.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mm: use cache in guest_walk_tables()
Jan Beulich [Thu, 23 Apr 2020 07:58:04 +0000 (09:58 +0200)]
x86/mm: use cache in guest_walk_tables()

Emulation requiring device model assistance uses a form of instruction
re-execution, assuming that the second (and any further) pass takes
exactly the same path. This is a valid assumption as far as use of CPU
registers goes (as those can't change without any other instruction
executing in between [1]), but is wrong for memory accesses. In
particular it has been observed that Windows might page out buffers
underneath an instruction currently under emulation (hitting between two
passes). If the first pass translated a linear address successfully, any
subsequent pass needs to do so too, yielding the exact same translation.
To guarantee this, leverage the caching that now backs HVM insn
emulation.

[1] Other than on actual hardware, actions like
    XEN_DOMCTL_sethvmcontext, XEN_DOMCTL_setvcpucontext,
    VCPUOP_initialise, INIT, or SIPI issued against the vCPU can occur
    while the vCPU is blocked waiting for a device model to return data.
    In such cases emulation now gets canceled, though, and hence re-
    execution correctness is unaffected.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <pdurrant@amzn.com>
5 years agox86/HVM: implement memory read caching for insn emulation
Jan Beulich [Thu, 23 Apr 2020 07:55:00 +0000 (09:55 +0200)]
x86/HVM: implement memory read caching for insn emulation

Emulation requiring device model assistance uses a form of instruction
re-execution, assuming that the second (and any further) pass takes
exactly the same path. This is a valid assumption as far as use of CPU
registers goes (as those can't change without any other instruction
executing in between [1]), but is wrong for memory accesses. In
particular it has been observed that Windows might page out buffers
underneath an instruction currently under emulation (hitting between two
passes). If the first pass read a memory operand successfully, any
subsequent pass needs to get to see the exact same value.

Introduce a cache to make sure above described assumption holds. This
is a very simplistic implementation for now: Only exact matches are
satisfied (no overlaps or partial reads or anything); this is sufficient
for the immediate purpose of making re-execution an exact replay. The
cache also won't be used just yet for guest page walks; that'll be the
subject of a subsequent change.
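
A very small exact-match cache along the lines described above might look
like this. It is only a sketch: the structure layout, the capacity and
the function names are assumptions for illustration and do not reflect
the actual Xen data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_ENTRIES 4   /* hypothetical capacity */
#define MAX_READ      8   /* hypothetical largest cached read, in bytes */

struct read_cache {
    unsigned int used;
    struct {
        uint64_t addr;
        unsigned int size;
        uint8_t data[MAX_READ];
    } ent[CACHE_ENTRIES];
};

/* Exact match only: both address and size must be identical. */
static bool cache_lookup(const struct read_cache *c, uint64_t addr,
                         unsigned int size, void *out)
{
    for (unsigned int i = 0; i < c->used; i++)
        if (c->ent[i].addr == addr && c->ent[i].size == size) {
            memcpy(out, c->ent[i].data, size);
            return true;
        }
    return false;
}

/* Records the value seen by the first pass; fails when full or oversized. */
static bool cache_add(struct read_cache *c, uint64_t addr,
                      unsigned int size, const void *data)
{
    if (size > MAX_READ || c->used == CACHE_ENTRIES)
        return false;
    c->ent[c->used].addr = addr;
    c->ent[c->used].size = size;
    memcpy(c->ent[c->used].data, data, size);
    c->used++;
    return true;
}
```

On a replay pass, a hit returns the bytes recorded by the first pass even
if guest memory has changed in between. Overlapping or partial reads
simply miss, matching the "exact matches only" limitation stated above.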

With the cache being generally transparent to upper layers, but with it
having limited capacity yet being required for correctness, certain
users of hvm_copy_from_guest_*() need to disable caching temporarily,
without invalidating the cache. Note that the adjustments here to
hvm_hypercall() and hvm_task_switch() are benign at this point; they'll
become relevant once we start to be able to emulate respective insns
through the main emulator (and more changes will then likely be needed
to nested code).

As to the actual data page in a problematic scenario, there are a couple
of aspects to take into consideration:
- We must be talking about an insn accessing two locations (two memory
  ones, one of which is MMIO, or a memory and an I/O one).
- If the non I/O / MMIO side is being read, the re-read (if it occurs at
  all) is having its result discarded, by taking the shortcut through
  the first switch()'s STATE_IORESP_READY case in hvmemul_do_io(). Note
  how, among all the re-issue sanity checks there, we avoid comparing
  the actual data.
- If the non I/O / MMIO side is being written, it is the OS's
  responsibility to avoid actually moving page contents to disk while
  there might still be a write access in flight - this is no different
  in behavior from bare hardware.
- Read-modify-write accesses are, as always, complicated, and while we
  deal with them better nowadays than we did in the past, we're still
  not quite there to guarantee hardware like behavior in all cases
  anyway. Nothing is getting worse by the changes made here, afaict.

In __hvm_copy() also reduce p's scope and change its type to void *.

[1] Other than on actual hardware, actions like
    XEN_DOMCTL_sethvmcontext, XEN_DOMCTL_setvcpucontext,
    VCPUOP_initialise, INIT, or SIPI issued against the vCPU can occur
    while the vCPU is blocked waiting for a device model to return data.
    In such cases emulation now gets canceled, though, and hence re-
    execution correctness is unaffected.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <pdurrant@amzn.com>
5 years agox86/HVM: cancel emulation when register state got altered
Jan Beulich [Thu, 23 Apr 2020 07:51:18 +0000 (09:51 +0200)]
x86/HVM: cancel emulation when register state got altered

Re-execution (after having received data from a device model) relies on
the same register state still being in place as it was when the request
was first sent to the device model. Therefore vCPU state changes
effected by remote sources need to result in no attempt of re-execution.
Instead the returned data is to simply be ignored.

Note that any such asynchronous state changes happen with the vCPU at
least paused (potentially down and/or not marked ->is_initialised), so
there's no issue with fiddling with register state behind the actively
running emulator's back. Hence the new function doesn't need to
synchronize with the core emulation logic.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <pdurrant@amzn.com>
5 years agox86: validate VM assist value in arch_set_info_guest()
Jan Beulich [Wed, 22 Apr 2020 11:01:10 +0000 (13:01 +0200)]
x86: validate VM assist value in arch_set_info_guest()

While I can't spot anything that would go wrong, just like the
respective hypercall only permits applicable bits to be set, we should
also do so when loading guest context.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: expose VM assist hypercall
Jan Beulich [Wed, 22 Apr 2020 10:58:25 +0000 (12:58 +0200)]
x86/HVM: expose VM assist hypercall

In preparation for the addition of VMASST_TYPE_runstate_update_flag
commit 72c538cca957 ("arm: add support for vm_assist hypercall") enabled
the hypercall for Arm. I consider it not logical that it then isn't also
exposed to x86 HVM guests (with the same single feature permitted to be
enabled as Arm has); Linux actually tries to use it afaict.

Rather than introducing yet another thin wrapper around vm_assist(),
make that function the main handler, requiring a per-arch
arch_vm_assist_valid_mask() definition instead.
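
The shape of a common handler with a per-arch validity mask can be
sketched as below. All names here (domain_sketch, arch_valid_mask(),
vm_assist_sketch(), the command numbers) are hypothetical stand-ins and
not the real Xen definitions.

```c
#include <assert.h>
#include <errno.h>

#define ASSIST_CMD_ENABLE  0   /* hypothetical command numbers */
#define ASSIST_CMD_DISABLE 1

struct domain_sketch {
    unsigned long vm_assist;    /* currently enabled assist bits */
    unsigned long valid_mask;   /* stands in for the per-arch mask */
};

/* Hypothetical per-arch hook: which assist types may this domain use? */
static unsigned long arch_valid_mask(const struct domain_sketch *d)
{
    return d->valid_mask;
}

/* Common handler: reject any type the architecture does not permit. */
static int vm_assist_sketch(struct domain_sketch *d, unsigned int cmd,
                            unsigned int type)
{
    unsigned long bit;

    if (type >= sizeof(unsigned long) * 8)
        return -EINVAL;

    bit = 1UL << type;
    if (!(arch_valid_mask(d) & bit))
        return -EINVAL;

    switch (cmd) {
    case ASSIST_CMD_ENABLE:
        d->vm_assist |= bit;
        return 0;
    case ASSIST_CMD_DISABLE:
        d->vm_assist &= ~bit;
        return 0;
    }
    return -ENOSYS;
}
```

Keeping the mask behind a per-arch hook means each architecture only
states which bits are valid, while the enable/disable logic lives in one
place.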

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years agox86/mm: monitor table is HVM-only
Jan Beulich [Wed, 22 Apr 2020 08:55:15 +0000 (10:55 +0200)]
x86/mm: monitor table is HVM-only

Move the per-vCPU field to the HVM sub-structure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: sh_update_linear_entries() is a no-op for PV
Jan Beulich [Wed, 22 Apr 2020 08:54:08 +0000 (10:54 +0200)]
x86/shadow: sh_update_linear_entries() is a no-op for PV

Consolidate the shadow_mode_external() in here: Check this once at the
start of the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: make sh_remove_write_access() helper HVM only
Jan Beulich [Wed, 22 Apr 2020 08:50:05 +0000 (10:50 +0200)]
x86/shadow: make sh_remove_write_access() helper HVM only

Despite the inline attribute at least some clang versions warn about
trace_shadow_wrmap_bf() being unused in !HVM builds. Include the helper
in the #ifdef region.

Fixes: 8b8d011ad868 ("x86/shadow: the guess_wrmap() hook is needed for HVM only")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agoxen/arm: Avoid open-coding the relinquish state machine
Julien Grall [Sun, 19 Apr 2020 09:50:30 +0000 (10:50 +0100)]
xen/arm: Avoid open-coding the relinquish state machine

In commit 0dfffe01d5 "x86: Improve the efficiency of
domain_relinquish_resources()", the x86 version of the function has been
reworked to avoid open-coding the state machine and also add more
documentation.

Bring the Arm version on par with x86 by introducing a documented
PROGRESS() macro to avoid latent bugs and make the new PROG_* states
private to domain_relinquish_resources().
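
The pattern can be sketched as a resumable switch statement. The states,
the struct and the return codes below are hypothetical; only the general
shape of a PROGRESS() macro that records the state and then provides the
matching case label is taken from the description above.

```c
#include <assert.h>

enum { PROG_none, PROG_pages, PROG_tables, PROG_done };

struct dom_sketch {
    int rel_priv;       /* last state reached, persists across calls */
    int pages_left;     /* pretend work that can be preempted */
    int tables_freed;
};

/* Record the new state, then continue at (or resume from) its label. */
#define PROGRESS(x) \
    d->rel_priv = PROG_ ## x; /* Fallthrough */ case PROG_ ## x

/* Returns 0 when finished, -1 when preempted (call again), -2 on bad state. */
static int relinquish_sketch(struct dom_sketch *d)
{
    switch (d->rel_priv) {
    case PROG_none:

    PROGRESS(pages):
        if (d->pages_left > 0) {
            d->pages_left--;
            return -1;          /* preempted: resume at PROG_pages */
        }

    PROGRESS(tables):
        d->tables_freed = 1;

    PROGRESS(done):
        break;

    default:
        return -2;
    }

    return 0;
}
```

Since each step both stores its state and defines its label, a step
cannot be added without also being resumable, which is the class of
latent bug the macro is meant to avoid.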

Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: vgic-v3: fix GICD_ISACTIVER range
Peng Fan [Fri, 17 Apr 2020 22:16:09 +0000 (15:16 -0700)]
xen/arm: vgic-v3: fix GICD_ISACTIVER range

The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
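
The effect of the wrong range end can be illustrated with a small
sketch. The helper names are hypothetical, and the register offsets
(0x300 for the first and 0x37C for the last set-active register) are
quoted from memory of the GIC distributor layout, so treat them as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define GICD_ISACTIVER   0x300u   /* first set-active register */
#define GICD_ISACTIVERN  0x37Cu   /* last set-active register */

/* Range ending at the first register: only that one register matches. */
static bool in_isactiver_buggy(unsigned int reg)
{
    return reg >= GICD_ISACTIVER && reg <= GICD_ISACTIVER;
}

/* Range ending at the last register: the whole bank matches. */
static bool in_isactiver_fixed(unsigned int reg)
{
    return reg >= GICD_ISACTIVER && reg <= GICD_ISACTIVERN;
}
```

With the shorter range, accesses to every register in the bank other
than the first one fall through to whatever handles unmatched offsets.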

See https://marc.info/?l=xen-devel&m=158527653730795 for a discussion on
what it would take to implement GICD_ISACTIVER/GICD_ICACTIVER properly.

We chose v1 instead of v2 of this patch to avoid spamming the console:
v2 adds a printk for every read, and reads can happen often.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
[Stefano: improve commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agox86: Enumeration for Control-flow Enforcement Technology
Andrew Cooper [Fri, 21 Feb 2020 17:56:57 +0000 (17:56 +0000)]
x86: Enumeration for Control-flow Enforcement Technology

The CET spec has been published and guest kernels are starting to get support.
Introduce the CPUID and MSRs, and fully block the MSRs from guest use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/shadow: don't open-code shadow_blow_tables_per_domain()
Jan Beulich [Tue, 21 Apr 2020 09:03:46 +0000 (11:03 +0200)]
x86/shadow: don't open-code shadow_blow_tables_per_domain()

Make shadow_blow_all_tables() call the designated function, and on this
occasion make the function itself use domain_vcpu().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: the trace_emul_write_val() hook is HVM-only
Jan Beulich [Tue, 21 Apr 2020 09:02:36 +0000 (11:02 +0200)]
x86/shadow: the trace_emul_write_val() hook is HVM-only

Its only caller lives in HVM-only code, and the only caller of
trace_shadow_emulate() also already sits in an HVM-only code section.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/mm: pagetable_dying() is HVM-only
Jan Beulich [Tue, 21 Apr 2020 08:59:43 +0000 (10:59 +0200)]
x86/mm: pagetable_dying() is HVM-only

Its only caller lives in HVM-only code.

This involves wider changes, in order to limit #ifdef-ary: Shadow's
SHOPT_FAST_EMULATION and the fields used by it get constrained to HVM
builds as well. Additionally the shadow_{init,continue}_emulation()
stubs for the !HVM case aren't needed anymore and hence get dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: the guess_wrmap() hook is needed for HVM only
Jan Beulich [Tue, 21 Apr 2020 08:58:45 +0000 (10:58 +0200)]
x86/shadow: the guess_wrmap() hook is needed for HVM only

sh_remove_write_access() bails early for !external guests, and hence its
building and thus the need for the hook can be suppressed altogether in
!HVM configs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: sh_remove_write_access_from_sl1p() can be static
Jan Beulich [Tue, 21 Apr 2020 08:58:05 +0000 (10:58 +0200)]
x86/shadow: sh_remove_write_access_from_sl1p() can be static

It's only used by common.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: monitor table is HVM-only
Jan Beulich [Tue, 21 Apr 2020 08:57:04 +0000 (10:57 +0200)]
x86/shadow: monitor table is HVM-only

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/shadow: drop a stray forward structure declaration
Jan Beulich [Tue, 21 Apr 2020 08:55:58 +0000 (10:55 +0200)]
x86/shadow: drop a stray forward structure declaration

struct sh_emulate_ctxt is private to shadow code, and hence a
declaration for it is not needed here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/vtd: relax EPT page table sharing check
Roger Pau Monné [Tue, 21 Apr 2020 08:54:56 +0000 (10:54 +0200)]
x86/vtd: relax EPT page table sharing check

The EPT page tables can be shared with the IOMMU as long as the page
sizes supported by EPT are also supported by the IOMMU.

Current code checks that both the IOMMU and EPT support the same page
sizes, but this is not strictly required, the IOMMU supporting more
page sizes than EPT is fine and shouldn't block page table sharing.

This is likely not a common case (IOMMU supporting more page sizes
than EPT), but should still be fixed for correctness.
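
The difference between the strict and the relaxed check comes down to
equality versus a subset test on the supported page sizes. In the sketch
below the capability bits and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PG_2M (1u << 0)   /* hypothetical capability bits */
#define PG_1G (1u << 1)

/* Old behaviour: both sides must support exactly the same sizes. */
static bool can_share_strict(unsigned int ept, unsigned int iommu)
{
    return ept == iommu;
}

/* Relaxed: every size EPT may use must also be supported by the IOMMU. */
static bool can_share_relaxed(unsigned int ept, unsigned int iommu)
{
    return (ept & ~iommu) == 0;
}
```

An IOMMU that supports extra sizes never sees them in shared tables,
because only EPT-supported sizes are ever installed.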

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86emul: SYSRET must change CPL
Jan Beulich [Tue, 21 Apr 2020 08:51:42 +0000 (10:51 +0200)]
x86emul: SYSRET must change CPL

The special AMD behavior of leaving SS mostly alone wasn't really
complete: We need to adjust CPL aka SS.DPL.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/ocaml: Fix stubs build when OCaml has been compiled with -safe-string
Julien Grall [Mon, 30 Mar 2020 14:14:23 +0000 (15:14 +0100)]
tools/ocaml: Fix stubs build when OCaml has been compiled with -safe-string

The OCaml code was fixed to properly handle -safe-string in Xen
4.11, however the stubs were missed.

On OCaml newer than 4.06.1, String_val() will return a const char *
when using -safe-string, leading to a build failure when it is used
in a place where char * is expected.

The main use in Xen code base is when a new string is allocated. The
suggested approach by the OCaml community [1] is to use the helper
caml_alloc_initialized_string() but it was introduced by OCaml 4.06.1.

The next best approach is to cast String_val() to (char *) as the helper
would have done. So use it when we need to update the new string using
memcpy().

Take the opportunity to remove the unnecessary cast of the source as
memcpy() is expecting a void *.

[1] https://github.com/ocaml/ocaml/pull/1274

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agotools/ocaml: libxb: Avoid to use String_val() when value is bytes
Julien Grall [Mon, 30 Mar 2020 17:50:08 +0000 (18:50 +0100)]
tools/ocaml: libxb: Avoid to use String_val() when value is bytes

Commit ec7d54dd1a "ocaml/libs/xb: Use bytes in place of strings for
mutable buffers" switched mutable buffers from string to bytes. However
the C code was still using String_val() to access them.

While the underlying structure is the same between string and bytes, a
string is meant to be immutable. OCaml 4.06.1 and later will enforce it.
Therefore, it will not be possible to build the OCaml libs when using
-safe-string. This is because String_val() will return a const value.

To avoid a plain cast in the code, the code is now switched to use
Bytes_val(). As the macro is not defined in older OCaml versions, we need
to provide a stub.

Take the opportunity to switch to const the buffer in
ml_interface_write() as it should not be modified.

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agotools/ocaml: libxb: Harden stub_header_of_string()
Julien Grall [Mon, 30 Mar 2020 13:29:10 +0000 (14:29 +0100)]
tools/ocaml: libxb: Harden stub_header_of_string()

stub_header_of_string() should not modify the header. So mark the
variable 'hdr' as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agotools/ocaml: libxc: Check error return in stub_xc_vcpu_context_get()
Julien Grall [Sun, 29 Mar 2020 19:12:34 +0000 (20:12 +0100)]
tools/ocaml: libxc: Check error return in stub_xc_vcpu_context_get()

xc_vcpu_getcontext() may fail to retrieve the vcpu context. Rather than
ignoring the return value, check it and throw an error if needed.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agox86/pv: Delete CONFIG_PV_LDT_PAGING
Andrew Cooper [Fri, 17 Apr 2020 11:31:13 +0000 (12:31 +0100)]
x86/pv: Delete CONFIG_PV_LDT_PAGING

... in accordance with the timeline laid out in the Kconfig message.  There
has been no comment since it was disabled by default.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agosched: fix scheduler_disable() with core scheduling
Sergey Dyasli [Fri, 17 Apr 2020 07:28:16 +0000 (09:28 +0200)]
sched: fix scheduler_disable() with core scheduling

In core-scheduling mode, Xen might crash when entering ACPI S5 state.
This happens in sched_slave() during is_idle_unit(next) check because
next->vcpu_list is stale and points to already freed memory.

This situation happens shortly after scheduler_disable() is called if
some CPU is still inside sched_slave() softirq. Current logic simply
returns prev->next_task from sched_wait_rendezvous_in() which causes
the described crash because next_task->vcpu_list has become invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in() in
the case when scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agosched/core: fix bug when moving a domain between cpupools
Jeff Kubascik [Fri, 17 Apr 2020 07:27:21 +0000 (09:27 +0200)]
sched/core: fix bug when moving a domain between cpupools

For each UNIT, sched_set_affinity is called before unit->priv is updated
to the new cpupool private UNIT data structure. The issue is
sched_set_affinity will call the adjust_affinity method of the cpupool.
If defined, the new cpupool may use unit->priv (e.g. credit), which at
this point still references the old cpupool private UNIT data structure.

This change fixes the bug by moving the switch of unit->priv earlier in
the function.
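
The ordering issue can be modelled in isolation. This is a minimal sketch
under stated assumptions, not the scheduler code: struct pool_ops, struct
unit_priv, move_unit() and demo_move() are hypothetical stand-ins, and only
the "switch priv before calling the hook" ordering reflects the description
above.

```c
#include <assert.h>
#include <stddef.h>

struct unit;

/* Hypothetical per-cpupool scheduler hooks (not Xen's real structure). */
struct pool_ops {
    void (*adjust_affinity)(struct unit *u);
};

/* Hypothetical per-unit private data owned by one cpupool's scheduler. */
struct unit_priv {
    int pool_id;
    int affinity_updates;
};

struct unit {
    struct unit_priv *priv;
};

/* A credit-like hook: it dereferences unit->priv. */
static inline void credit_adjust_affinity(struct unit *u)
{
    u->priv->affinity_updates++;
}

/* Fixed ordering: switch priv to the new pool's data, then run the hook. */
static inline void move_unit(struct unit *u, const struct pool_ops *new_ops,
                             struct unit_priv *new_priv)
{
    u->priv = new_priv;
    if ( new_ops->adjust_affinity != NULL )
        new_ops->adjust_affinity(u);
}

/* Usage: returns 1 when only the new pool's data was touched. */
static inline int demo_move(void)
{
    struct unit_priv old_priv = { .pool_id = 0 }, new_priv = { .pool_id = 1 };
    struct unit u = { .priv = &old_priv };
    const struct pool_ops new_ops = {
        .adjust_affinity = credit_adjust_affinity,
    };

    move_unit(&u, &new_ops, &new_priv);
    return old_priv.affinity_updates == 0 && new_priv.affinity_updates == 1;
}
```

Calling the hook before the assignment in move_unit() would update the old
pool's data instead, which is the bug being described.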

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years agox86_64/mm: map and unmap page tables in destroy_m2p_mapping
Wei Liu [Thu, 16 Apr 2020 09:05:58 +0000 (11:05 +0200)]
x86_64/mm: map and unmap page tables in destroy_m2p_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86_64/mm: map and unmap page tables in share_hotadd_m2p_table
Wei Liu [Thu, 16 Apr 2020 09:05:28 +0000 (11:05 +0200)]
x86_64/mm: map and unmap page tables in share_hotadd_m2p_table

Fetch lYe by mapping and unmapping lXe instead of using the direct map,
which is now done via the lYe_from_lXe() helpers.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86_64/mm: map and unmap page tables in m2p_mapped
Wei Liu [Thu, 16 Apr 2020 09:04:51 +0000 (11:04 +0200)]
x86_64/mm: map and unmap page tables in m2p_mapped

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/shim: map and unmap page tables in replace_va_mapping
Wei Liu [Thu, 16 Apr 2020 09:01:46 +0000 (11:01 +0200)]
x86/shim: map and unmap page tables in replace_va_mapping

Also, introduce lYe_from_lXe() macros which do not rely on the direct
map when walking page tables. Unfortunately, they cannot be inline
functions due to the header dependency on domain_page.h, so keep them as
macros just like map_lYt_from_lXe().

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agohvmloader: enable MMIO and I/O decode, after all resource allocation
Harsha Shamsundara Havanur [Thu, 16 Apr 2020 08:58:46 +0000 (10:58 +0200)]
hvmloader: enable MMIO and I/O decode, after all resource allocation

It was observed that PCI MMIO and/or I/O BARs were programmed with
I/O and memory decodes (bits 0 and 1 of the PCI COMMAND register) enabled
during the PCI setup phase. This resulted in an incorrect memory mapping as
soon as the lower half of a 64-bit BAR was programmed, which displaced
any RAM mappings under 4G. After the upper half is programmed, the PCI
memory mapping is restored to its intended high mem location, but the
displaced RAM is not restored. The OS then continues to boot and function
until it tries to access the displaced RAM, at which point it suffers a
page fault and crashes.

This patch addresses the issue by deferring enablement of memory and
I/O decode in the command register until all the resources, like interrupts,
I/O and/or MMIO BARs for all the PCI device functions, are programmed
in descending order of memory requested.
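
The two-pass idea can be sketched against a toy model of config space. This
is an illustration under stated assumptions, not hvmloader's code: struct
fake_pci_fn and the helper names are hypothetical, and the ordering by
requested size is not modelled. Only the COMMAND register bit positions
(bit 0 for I/O decode, bit 1 for memory decode) come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_IO     0x1u /* bit 0: I/O space decode */
#define PCI_COMMAND_MEMORY 0x2u /* bit 1: memory space decode */

/* Hypothetical model of one PCI function; not hvmloader's data structure. */
struct fake_pci_fn {
    uint16_t cmd;
    uint32_t bar_lo, bar_hi;
    unsigned int live_bar_writes; /* BAR writes made while decode was on */
};

/* Write one half of a 64-bit BAR, counting writes made with decode on. */
static inline void write_bar_half(struct fake_pci_fn *f, uint32_t *half,
                                  uint32_t val)
{
    if ( f->cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY) )
        f->live_bar_writes++;
    *half = val;
}

static inline void program_all(struct fake_pci_fn *fns, unsigned int n,
                               uint64_t base, uint64_t stride)
{
    unsigned int i;

    /* Pass 1: turn decode off, then program both halves of each BAR. */
    for ( i = 0; i < n; i++ )
    {
        uint64_t addr = base + i * stride;

        fns[i].cmd &= (uint16_t)~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
        write_bar_half(&fns[i], &fns[i].bar_lo, (uint32_t)addr);
        write_bar_half(&fns[i], &fns[i].bar_hi, (uint32_t)(addr >> 32));
    }

    /* Pass 2: only now enable decode for every function. */
    for ( i = 0; i < n; i++ )
        fns[i].cmd |= PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
}

/* Usage: returns 1 if no BAR half was written while decode was enabled. */
static inline int demo_program(void)
{
    struct fake_pci_fn fns[2] = {
        { .cmd = PCI_COMMAND_IO | PCI_COMMAND_MEMORY },
        { .cmd = PCI_COMMAND_MEMORY },
    };

    program_all(fns, 2, 0x100000000ULL, 0x10000000ULL);
    return fns[0].live_bar_writes == 0 && fns[1].live_bar_writes == 0 &&
           (fns[0].cmd & PCI_COMMAND_MEMORY) && fns[1].bar_hi == 1;
}
```

Because decode stays off during pass 1, the window in which only the lower
half of a 64-bit BAR is valid never becomes visible as a mapping below 4G.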

Signed-off-by: Harsha Shamsundara Havanur <havanur@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/altp2m: add missing break
Roger Pau Monné [Thu, 16 Apr 2020 08:55:42 +0000 (10:55 +0200)]
x86/altp2m: add missing break

Add a missing break in the HVMOP_altp2m_set_visibility case, or else
code flow will continue into the default case and trigger the assert.
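
A stripped-down illustration of the fall-through, assuming a hypothetical
OP_SET_VISIBILITY constant and using a flag in place of the assert in the
default case (neither is the real hypercall code):

```c
#include <assert.h>

/* Hypothetical operation number standing in for the altp2m sub-op. */
#define OP_SET_VISIBILITY 1

/* Without the break, a handled operation still reaches the default case. */
static inline int hits_default_buggy(int op)
{
    int hit_default = 0;

    switch ( op )
    {
    case OP_SET_VISIBILITY:
        /* ... handle the operation ... */
        /* fall through (the bug: no break here) */
    default:
        hit_default = 1; /* stands in for the assert in the default case */
        break;
    }

    return hit_default;
}

/* With the break added, the default case is no longer reached. */
static inline int hits_default_fixed(int op)
{
    int hit_default = 0;

    switch ( op )
    {
    case OP_SET_VISIBILITY:
        /* ... handle the operation ... */
        break;

    default:
        hit_default = 1;
        break;
    }

    return hit_default;
}
```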

Fixes: 3fd3e9303ec4b1 ('x86/altp2m: hypercall to set altp2m view visibility')
Coverity-ID: 1461759
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agox86/boot: Fix early exception handling with CONFIG_PERF_COUNTERS
Andrew Cooper [Wed, 15 Apr 2020 16:01:09 +0000 (17:01 +0100)]
x86/boot: Fix early exception handling with CONFIG_PERF_COUNTERS

The PERFC_INCR() macro uses current->processor, but current is not valid
during early boot.  This causes the following crash to occur if
e.g. rdmsr_safe() has to recover from a #GP fault.

  (XEN) Early fatal page fault at e008:ffff82d0803b1a39 (cr2=0000000000000004, ec=0000)
  (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d0803b1a39>] x86_64/entry.S#handle_exception_saved+0x64/0xb8
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d0803b1a39>] R x86_64/entry.S#handle_exception_saved+0x64/0xb8
  (XEN)    [<ffff82d0806394fe>] F __start_xen+0x2cd/0x2980
  (XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e

Furthermore, the PERFC_INCR() macro is wildly inefficient.  There has been a
single caller for many releases now, so inline it and delete the macro
completely.

There is no need to reference current at all.  What is actually needed is the
per_cpu_offset which can be obtained directly from the top-of-stack block.
This simplifies the counter handling to 3 instructions and no spilling to the
stack at all.

The same breakage from above is now handled properly:

  (XEN) traps.c:1591: GPF (0000): ffff82d0806394fe [__start_xen+0x2cd/0x2980] -> ffff82d0803b3bfb

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Julien Grall <jgrall@amazon.com>
5 years agox86/svm: Don't use vmcb->tlb_control as if it is a boolean
Andrew Cooper [Tue, 12 Feb 2019 18:37:04 +0000 (18:37 +0000)]
x86/svm: Don't use vmcb->tlb_control as if it is a boolean

svm_asid_handle_vmrun() treats tlb_control as if it were boolean, but this has
been superseded by new additions to the SVM spec.

Introduce an enum containing all legal values, and update
svm_asid_handle_vmrun() to use appropriate constants.
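
For illustration, such an enum might look as follows. The numeric values
are the TLB_CONTROL encodings as I understand them from the AMD APM; the
constant and function names are assumptions and may not match what the
patch actually introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TLB_CONTROL encodings; names here are illustrative assumptions. */
enum tlb_ctrl {
    TLB_CTRL_NO_FLUSH             = 0, /* do nothing */
    TLB_CTRL_FLUSH_ALL            = 1, /* flush the entire TLB */
    TLB_CTRL_FLUSH_ASID           = 3, /* flush this guest's entries */
    TLB_CTRL_FLUSH_ASID_NONGLOBAL = 7, /* flush this guest's non-global entries */
};

/* Same behaviour as a 0/1 boolean write, but with named constants. */
static inline uint8_t tlb_control_for(bool need_flush)
{
    return need_flush ? TLB_CTRL_FLUSH_ALL : TLB_CTRL_NO_FLUSH;
}

/* Hypothetical helper: reject values that are not a legal encoding. */
static inline bool tlb_control_is_legal(uint8_t val)
{
    switch ( val )
    {
    case TLB_CTRL_NO_FLUSH:
    case TLB_CTRL_FLUSH_ALL:
    case TLB_CTRL_FLUSH_ASID:
    case TLB_CTRL_FLUSH_ASID_NONGLOBAL:
        return true;

    default:
        return false;
    }
}
```

The values produced by tlb_control_for() are still 0 and 1, consistent with
a change that only renames the constants.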

While adjusting this, take the opportunity to fix up two coding style issues,
and trim the include list.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agognttab: fix GNTTABOP_copy continuation handling
Jan Beulich [Tue, 14 Apr 2020 12:42:32 +0000 (14:42 +0200)]
gnttab: fix GNTTABOP_copy continuation handling

The XSA-226 fix was flawed - the backwards transformation on rc was done
too early, causing a continuation to not get invoked when the need for
preemption was determined at the very first iteration of the request.
This in particular means that all of the status fields of the individual
operations would be left untouched, i.e. set to whatever the caller may
or may not have initialized them to.

This is part of XSA-318.

Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Pawel Wieczorkiewicz <wipawel@amazon.de>