xenbits.xensource.com Git - people/iwj/xen.git/log
Julien Grall [Wed, 26 Jun 2019 11:29:54 +0000 (12:29 +0100)]
xen/arm32: head: Mark the end of subroutines with ENDPROC

putn() and puts() are two subroutines. Add ENDPROC for the benefit of
static analysis tools and the reader.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 15 Apr 2019 20:58:51 +0000 (21:58 +0100)]
xen/arm32: head: Add a macro to move an immediate constant into a 32-bit register

The current boot code is using the pattern ldr rX, =... to move an
immediate constant into a 32-bit register.

This pattern loads the immediate constant from a literal pool, meaning a
memory access is performed.

The memory access can be avoided by using movw/movt instructions.

A new macro is introduced to move an immediate constant into a 32-bit
register without a memory load. Follow-up patches will make use of it.
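The C model below is a rough sketch of what a movw/movt pair does. It does not reproduce the macro, whose definition isn't shown here. movw writes a zero-extended 16-bit value, which clears bits [31:16]. movt then replaces bits [31:16] and leaves the low half alone. In GNU as syntax the pair is typically written with the #:lower16: and #:upper16: operators. The ex_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* movw rX, #imm16: writes the zero-extended value, clearing bits [31:16]. */
static uint32_t ex_movw(uint16_t lo)
{
    return lo;
}

/* movt rX, #imm16: replaces bits [31:16], leaving bits [15:0] untouched. */
static uint32_t ex_movt(uint32_t reg, uint16_t hi)
{
    return (reg & 0xffffu) | ((uint32_t)hi << 16);
}

/* Model of "movw rX, #:lower16:imm; movt rX, #:upper16:imm". */
static uint32_t ex_mov_w(uint32_t imm)
{
    uint32_t reg = ex_movw((uint16_t)(imm & 0xffffu));

    return ex_movt(reg, (uint16_t)(imm >> 16));
}
```

Both register writes use immediates encoded in the instructions themselves, so no data load is issued.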

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 31 Jul 2019 19:26:19 +0000 (20:26 +0100)]
xen/arm64: head: Fix typo in the documentation on top of init_uart()

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 17 Jun 2019 13:51:21 +0000 (14:51 +0100)]
xen/arm64: head: Introduce a macro to get a PC-relative address of a symbol

Arm64 provides instructions to load a PC-relative address, but with some
limitations:
   - adr can only reach +/-1MB
   - adrp can reach +/-4GB, but only resolves to a 4KB-aligned page
     address

Because of that, the code needs two instructions to load the address of
an arbitrary Xen symbol. To make the code more obvious, a new macro adr_l
is introduced.

The new macro is used to replace a couple of open-coded uses in
efi_xen_start.

The macro is copied from Linux 5.2-rc4.
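The sketch below models, in C, the address computed by an "adrp + add :lo12:" pair, which is the pattern adr_l wraps. It shows why two instructions are needed. adrp adds a signed 21-bit page delta (hence +/-4GB) to the 4KB-aligned PC. The add then supplies the low 12 bits of the symbol. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* adrp encodes a signed 21-bit delta counted in 4KB pages: +/-4GB of reach. */
static bool ex_adrp_in_range(uint64_t pc, uint64_t sym)
{
    int64_t delta = (int64_t)(sym >> 12) - (int64_t)(pc >> 12);

    return delta >= -(INT64_C(1) << 20) && delta < (INT64_C(1) << 20);
}

/* Value left in the register by "adrp xN, sym; add xN, xN, :lo12:sym". */
static uint64_t ex_adr_l(uint64_t pc, uint64_t sym)
{
    int64_t delta = (int64_t)(sym >> 12) - (int64_t)(pc >> 12);
    /* adrp: 4KB-aligned PC plus the page delta. */
    uint64_t page = (pc & ~UINT64_C(0xfff)) + ((uint64_t)delta << 12);

    /* add :lo12: supplies the offset within the page. */
    return page + (sym & 0xfff);
}
```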

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Sat, 13 Apr 2019 21:55:18 +0000 (22:55 +0100)]
xen/arm64: head: Setup TTBR_EL2 in enable_mmu() and add missing isb

At the moment, TTBR_EL2 is set up in create_page_tables(). This is fine
as it is called by every CPU.

However, such an assumption may not hold in the future. To make changes
easier, TTBR_EL2 is now set up in enable_mmu().

Take the opportunity to add the missing isb() to ensure the TTBR_EL2 is
seen before the MMU is turned on.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 15 Apr 2019 11:24:30 +0000 (12:24 +0100)]
xen/arm64: head: Rework and document launch()

The boot CPU and secondary CPUs use different entry points into C code.
At the moment, the decision on which entry to use is taken within
launch().

In order to avoid a branch for the decision and make the code clearer,
launch() is reworked to take the entry point and its arguments as
parameters.

Lastly, document the behavior and the main register usage within the
function.
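The calling-convention change can be pictured with a C analogy. launch() itself is assembly, and the entry names and two-argument signature below are hypothetical. Rather than testing whether it runs on the boot CPU, launch() simply calls whatever entry point its caller supplies.

```c
#include <assert.h>

/* Hypothetical C-level type shared by both entry points. */
typedef void (*entry_fn)(unsigned long arg0, unsigned long arg1);

/* Bookkeeping so the example can observe which entry ran. */
static int boot_calls, secondary_calls;
static unsigned long seen_arg0, seen_arg1;

static void boot_entry(unsigned long arg0, unsigned long arg1)
{
    boot_calls++;
    seen_arg0 = arg0;
    seen_arg1 = arg1;
}

static void secondary_entry(unsigned long arg0, unsigned long arg1)
{
    secondary_calls++;
    seen_arg0 = arg0;
    seen_arg1 = arg1;
}

/*
 * launch() no longer decides which entry to use: each caller passes the
 * entry point and its arguments, so there is no "boot CPU?" branch here.
 */
static void launch(entry_fn entry, unsigned long arg0, unsigned long arg1)
{
    /* In head.S, stack/register setup would happen before this call. */
    entry(arg0, arg1);
}
```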

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 6 Aug 2019 17:14:08 +0000 (18:14 +0100)]
xen/arm: lpae: Allow more LPAE helpers to be used in assembly

A follow-up patch will need to use the *_table_offset() and *_MASK
helpers from assembly. This can be achieved by using the _AT() macro to
remove the type when called from assembly.
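_AT() is conventionally defined along the lines below, as in Linux's uapi const.h. In C it casts the constant to the given type. Under __ASSEMBLY__ it expands to the bare value, which the assembler can evaluate. The EX_-prefixed LPAE-style helpers are hypothetical stand-ins, not Xen's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Typed in C, bare value in assembly (same idea as Linux's const.h). */
#ifdef __ASSEMBLY__
#define _AT(T, X)   X
#else
#define _AT(T, X)   ((T)(X))
#endif

/* Hypothetical LPAE-style helpers built only from assembler-friendly ops. */
#define EX_THIRD_SHIFT        12
#define EX_SECOND_SHIFT       (EX_THIRD_SHIFT + 9)
#define EX_LPAE_ENTRY_MASK    ((_AT(uint64_t, 1) << 9) - 1)
#define EX_THIRD_MASK         (~((_AT(uint64_t, 1) << EX_THIRD_SHIFT) - 1))

/* Index of a VA within a third- / second-level table (512 entries each). */
#define ex_third_table_offset(va) \
    (((va) >> EX_THIRD_SHIFT) & EX_LPAE_ENTRY_MASK)
#define ex_second_table_offset(va) \
    (((va) >> EX_SECOND_SHIFT) & EX_LPAE_ENTRY_MASK)
```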

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Mon, 26 Nov 2018 17:06:23 +0000 (17:06 +0000)]
x86/cpuid: Extend the cpuid= option to support all named features

For gen-cpuid.py, fix a comment describing self.names, and generate the
reverse mapping in self.values.  Write out INIT_FEATURE_NAMES which maps a
string name to a bit position.

For parse_cpuid(), use cmdline_strcmp() and perform a binary search over
INIT_FEATURE_NAMES.  A tweak to cmdline_strcmp() is needed to break at equals
signs as well.
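Below is a minimal sketch of the lookup described above. It assumes a name table sorted in strcmp() order. The comparison treats ',' and '=' in the command-line token as end-of-string, so "avx2=0" matches "avx2". The table contents, bit numbers and function names are hypothetical. They are not the generated INIT_FEATURE_NAMES or Xen's cmdline_strcmp().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical excerpt of a name -> bit table, sorted in strcmp() order. */
static const struct ex_feature_name {
    const char *name;
    unsigned int bit;
} ex_feature_names[] = {
    { "avx",    1 },
    { "avx2",   2 },
    { "rdrand", 3 },
    { "sse2",   4 },
    { "xsave",  5 },
};

/*
 * strcmp()-like comparison of a command-line token against a table name,
 * where the token also ends at ',' or '=' (e.g. "avx2=0" or "sse2,...").
 */
static int ex_token_cmp(const char *tok, const char *name)
{
    for ( ; ; tok++, name++ )
    {
        unsigned char t = (*tok == ',' || *tok == '=') ? '\0' : *tok;
        unsigned char n = *name;

        if ( t != n || t == '\0' )
            return t - n;
    }
}

/* Binary search; returns the bit number, or -1 for an unknown name. */
static int ex_find_feature(const char *tok)
{
    size_t lo = 0;
    size_t hi = sizeof(ex_feature_names) / sizeof(ex_feature_names[0]);

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;
        int res = ex_token_cmp(tok, ex_feature_names[mid].name);

        if ( res == 0 )
            return (int)ex_feature_names[mid].bit;
        if ( res < 0 )
            hi = mid;
        else
            lo = mid + 1;
    }

    return -1;
}
```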

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Bandan Das [Fri, 6 Sep 2019 15:07:55 +0000 (17:07 +0200)]
x86/apic: do not initialize LDR and DFR for bigsmp

Legacy apic init uses bigsmp for smp systems with 8 or more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bits set.

This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.

The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.

Remove the bogus LDR/DFR initialization.

This is not intended to work around the KVM APIC bug. The LDR/DFR
initialization is wrong on its own.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit bae3a8d3308ee69a7dbdf145911b18dfda8ade0d]

Drop init_apic_ldr_x2apic_phys() at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Bandan Das [Fri, 6 Sep 2019 15:07:14 +0000 (17:07 +0200)]
x86/apic: include the LDR when clearing out APIC registers

Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.

This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.

Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.

Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit 558682b5291937a70748d36fd9ba757fb25b99ae]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 6 Sep 2019 15:06:19 +0000 (17:06 +0200)]
x86: drop CONFIG_X86_MCE_THERMAL

There's no point having this if it's not exposed through Kconfig.

Take the liberty of also dropping an unnecessary "return" in context.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Zhang Rui [Fri, 6 Sep 2019 15:05:39 +0000 (17:05 +0200)]
x86/mwait-idle: add support for Jacobsville

Jacobsville uses the same C-states as Denverton.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[Linux commit 04b1d5d098491244f506c4265cc95b87210eef2f]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Fri, 6 Sep 2019 15:04:39 +0000 (17:04 +0200)]
x86/xstate: make use_xsave non-init

LLVM code generation can attempt to load from a variable in the next
condition of an expression under certain circumstances, thus
attempting to load use_xsave regardless of the value of the bsp
variable, which leads to a page fault when the init section has
already been unmapped.

Fix this by making use_xsave non-init, thus preventing the page fault;
use __read_mostly instead. The LLVM bug with the discussion about this
issue can be found at:

https://bugs.llvm.org/show_bug.cgi?id=39707

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 6 Sep 2019 12:33:19 +0000 (13:33 +0100)]
Revert "x86/shim: Refresh pvshim_defconfig"

This reverts commit 32b1d62887d01f85f0c1d2e0103f69f74e1f6fa3 and its fixup
060f4eee0fb408b316548775ab921e16b7acd0e0, which are still causing build and
test problems.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 27 Dec 2018 15:14:01 +0000 (15:14 +0000)]
x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware

AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring
FOP/FIP/FDP if an x87 exception isn't pending.  This causes an information
leak, CVE-2006-1056, which has been worked around by several OSes, including
Xen.  AMD Fam17h CPUs no longer have this leak, and advertise so in a CPUID
bit.

Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose it to
all guests by default.  While adjusting libxl's cpuid table, add CLZERO which
looks to have been omitted previously.

Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it
on AMD hardware where RSTR_FP_ERR_PTRS is not advertised.  Optimise the
conditions for the workaround paths.
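The sketch below illustrates the idea. All word counts, bit positions and names in it are hypothetical. A bug word is appended after the feature words. The bug bit is set on AMD parts that don't advertise RSTR_FP_ERR_PTRS, so the (F)XRSTOR workaround paths can key off a single cpu_bug_*-style predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: feature words, then synthetic words, then bug words. */
#define EX_FSCAPINTS   2
#define EX_NR_SYNTH    1
#define EX_NR_BUG      1
#define EX_NCAPINTS    (EX_FSCAPINTS + EX_NR_SYNTH + EX_NR_BUG)

#define EX_FEATURE_RSTR_FP_ERR_PTRS  (0 * 32 + 2)   /* made-up position */
#define EX_BUG_FPU_PTRS              ((EX_FSCAPINTS + EX_NR_SYNTH) * 32 + 0)

struct ex_cpuinfo {
    bool is_amd;
    uint32_t caps[EX_NCAPINTS];
};

static bool ex_test_bit(const struct ex_cpuinfo *c, unsigned int bit)
{
    return c->caps[bit / 32] & (UINT32_C(1) << (bit % 32));
}

static void ex_set_bit(struct ex_cpuinfo *c, unsigned int bit)
{
    c->caps[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/* Flag the bug on AMD hardware which doesn't advertise the fix. */
static void ex_init_amd_bugs(struct ex_cpuinfo *c)
{
    if ( c->is_amd && !ex_test_bit(c, EX_FEATURE_RSTR_FP_ERR_PTRS) )
        ex_set_bit(c, EX_BUG_FPU_PTRS);
}

/* What the (F)XRSTOR workaround paths would test. */
static bool ex_cpu_bug_fpu_ptrs(const struct ex_cpuinfo *c)
{
    return ex_test_bit(c, EX_BUG_FPU_PTRS);
}
```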

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 27 Dec 2018 15:13:55 +0000 (15:13 +0000)]
x86/feature: Generalise synth and introduce a bug word

Future changes are going to want to use cpu_bug_* in a manner similar to
Linux.  Introduce one bug word, and generalise the calculation of
NCAPINTS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 27 Nov 2018 15:27:41 +0000 (15:27 +0000)]
x86/vtd: Drop struct intel_iommu

The sole remaining member of struct intel_iommu is the drhd backpointer.  Move
this into struct vtd_iommu, replacing the 'intel' pointer.

This removes one dynamic memory allocation per IOMMU on the system.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 15:06:15 +0000 (15:06 +0000)]
x86/vtd: Drop struct iommu_flush

It is unclear why this abstraction exists.  iommu_get_flush() can return
NULL, yet every user unconditionally dereferences the result.  In practice,
I can't spot a path where iommu is NULL, so I think it is mostly dead.

Move the two function pointers into struct vtd_iommu (using a flush prefix),
and delete iommu_get_flush().  Furthermore, there is no need to pass the IOMMU
pointer to the callbacks via a void pointer, so change the parameter to be
correctly typed as struct vtd_iommu.  Clean up bool_t to bool in surrounding
context.
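The resulting shape looks roughly like the sketch below. This struct vtd_iommu is a reduced stand-in, and its member names and callback signatures are assumptions for illustration. The real callbacks take more parameters. The point is that callers dereference the IOMMU structure directly, and the callbacks receive a typed struct vtd_iommu pointer rather than a void pointer.

```c
#include <assert.h>
#include <stdint.h>

struct vtd_iommu;

/* Hypothetical, reduced callback signatures taking a typed IOMMU pointer. */
typedef int (*ex_context_flush_fn)(struct vtd_iommu *iommu, uint16_t did);
typedef int (*ex_iotlb_flush_fn)(struct vtd_iommu *iommu, uint16_t did,
                                 uint64_t addr);

struct vtd_iommu {
    /* Formerly reached via iommu_get_flush(); now embedded directly. */
    struct {
        ex_context_flush_fn context;
        ex_iotlb_flush_fn iotlb;
    } flush;
    unsigned int flushes;   /* example-only counter */
};

static int ex_context_flush(struct vtd_iommu *iommu, uint16_t did)
{
    (void)did;
    iommu->flushes++;       /* stands in for programming the hardware */
    return 0;
}

static int ex_iotlb_flush(struct vtd_iommu *iommu, uint16_t did, uint64_t addr)
{
    (void)did;
    (void)addr;
    iommu->flushes++;
    return 0;
}

/* Callers no longer go through an accessor that might return NULL. */
static int ex_flush_domain(struct vtd_iommu *iommu, uint16_t did)
{
    int rc = iommu->flush.context(iommu, did);

    if ( !rc )
        rc = iommu->flush.iotlb(iommu, did, 0);

    return rc;
}
```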

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 15:02:18 +0000 (15:02 +0000)]
x86/vtd: Drop struct ir_ctrl

It is unclear why this abstraction exists.  iommu_ir_ctrl() can return
NULL, yet every user unconditionally dereferences the result.  In practice,
I can't spot a path where iommu is NULL, so I think it is mostly dead.

Move the fields into struct vtd_iommu, and delete iommu_ir_ctrl().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 14:57:14 +0000 (14:57 +0000)]
x86/vtd: Drop struct qi_ctrl

It is unclear why this abstraction exists.  iommu_qi_ctrl() can return
NULL, yet every user unconditionally dereferences the result.  In practice,
I can't spot a path where iommu is NULL, so I think it is mostly dead.

Move the sole member into struct vtd_iommu, and delete iommu_qi_ctrl().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 15:05:48 +0000 (15:05 +0000)]
x86/vtd: Rename struct iommu to vtd_iommu

VT-d's local struct iommu is an overly-generic name, for a structure which in
practice maps 1-to-1 with the real IOMMUs in the system.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 08:02:11 +0000 (10:02 +0200)]
VT-d/ATS: tidy device_in_domain()

Use appropriate types. Drop unnecessary casts. Check for failures which
can (at least in theory because of non-obvious breakage elsewhere)
occur, instead of ones which really can't (map_domain_page() won't
return NULL).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Juergen Gross [Thu, 5 Sep 2019 08:00:36 +0000 (10:00 +0200)]
x86: remove sched-if.h includes from various sources

xen/sched-if.h is included in multiple sources where it isn't directly
needed. Remove those #include statements.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 5 Sep 2019 08:00:07 +0000 (10:00 +0200)]
x86/cpu-policy: work around bogus warning in test harness

Despite %.12s properly limiting the number of characters read from
ident[], gcc 9 (at least up to 9.2.0) warns about the strings not
being nul-terminated:

test-cpu-policy.c:64:18: error: '%.12s' directive argument is not a nul-terminated string [-Werror=format-overflow=]
   64 |             fail("  Test '%.12s', expected vendor %u, got %u\n",
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-cpu-policy.c:20:12: note: in definition of macro 'fail'
   20 |     printf(fmt, ##__VA_ARGS__);                 \
      |            ^~~
test-cpu-policy.c:64:27: note: format string is defined here
   64 |             fail("  Test '%.12s', expected vendor %u, got %u\n",
      |                           ^~~~~
test-cpu-policy.c:44:7: note: referenced argument declared here
   44 |     } tests[] = {
      |       ^~~~~

The issue was reported against gcc in their bugzilla (bug 91667).

Re-order array entries, oddly enough suppressing the warning.

Reported-by: Christopher Clark <christopher.w.clark@gmail.com>
Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Thu, 5 Sep 2019 07:59:26 +0000 (09:59 +0200)]
p2m/ept: add _subtree suffix to ept_invalidate_emt

Add the suffix so that the name makes clear that the function walks the
page table passed as a parameter. Drop the parent_ prefix from the level
parameter, since the level passed is the one matching the EPT entry
passed in the mfn parameter.

While there, also change bool_t to bool and add an assert to make sure
no level 0 entries (i.e. 4K EPT leaf entries) are passed as parameters.

No functional change intended.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 07:58:17 +0000 (09:58 +0200)]
VT-d: avoid PCI device lookup

The two uses of pci_get_pdev_by_domain() lack proper locking, but are
also only used to get hold of a NUMA node ID. Calculate and store the
node ID earlier on and remove the lookups (in lieu of fixing the
locking).

While doing this it became apparent that iommu_alloc()'s use of
alloc_pgtable_maddr() would occur before RHSAs would have been parsed:
iommu_alloc() gets called from the DRHD parsing routine, which - on
spec-conforming platforms - happens strictly before RHSA parsing. Defer
the allocation until after all ACPI table parsing has finished,
establishing the node ID there first.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 07:57:44 +0000 (09:57 +0200)]
VT-d: tidy <X>_to_<Y>() functions

Drop iommu_to_drhd() altogether - there's no need for a loop here, the
corresponding DRHD is a field in struct intel_iommu.

Constify drhd_to_rhsa()'s parameter and adjust style.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 07:56:42 +0000 (09:56 +0200)]
x86/shadow: don't enable shadow mode with too small a shadow allocation (part 2)

Commit 2634b997af ("x86/shadow: don't enable shadow mode with too small
a shadow allocation") was incomplete: The adjustment done there to
shadow_enable() is also needed in shadow_one_bit_enable(). The (new)
problem report was (apparently) a failed PV guest migration followed by
another migration attempt for that same guest. Disabling log-dirty mode
after the first one had left a couple of shadow pages allocated (perhaps
something that also wants fixing), and hence the second enabling of
log-dirty mode wouldn't have allocated anything further.

Reported-by: James Wang <jnwang@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Alexandru Stefan ISAILA [Wed, 4 Sep 2019 15:17:39 +0000 (16:17 +0100)]
x86/altp2m: Add a new hypercall to get the active altp2m index

The patch adds a new libxc function (xc_altp2m_get_vcpu_p2m_idx) that
uses a new hvmop (HVMOP_altp2m_get_p2m_idx) to get the active altp2m
index from a given vcpu.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Mon, 2 Sep 2019 16:16:53 +0000 (17:16 +0100)]
tools/shim: Apply more duct tape to the linkfarm logic

Sander reported a build failure which manifests as `make; make install`
failing with:

  <snip>/cross-install -m0644 -p xen-dir/xen-shim //usr/local/lib/xen/boot/xen-shim
  install: cannot stat 'xen-dir/xen-shim': No such file or directory
  make[4]: *** [Makefile:52: install] Error 1
  make[4]: Leaving directory '/usr/src/new/xen-unstable/tools/firmware'

It has subsequently been seen intermittently by OSSTest.  This was caused by
c/s 32b1d628 triggering a preexisting linkfarm bug for partial rebuilds.

Between the first `make` and the subsequent `make install`, the linkfarm logic
observes new final build products and regenerates the linkfarm.  This includes
a distclean, which throws away everything from the first `make`.

As the xen-shim rule uses a symlink, the link itself still appears
up-to-date, but it is broken by the distclean, which causes install to fail.

Update the linkfarm logic to not regenerate itself when build artefacts
appear.  This isn't a comprehensive fix but is the best which can be done
easily.  Any further effort would be better spent making out-of-tree builds
work for Xen.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 29 Aug 2019 17:19:25 +0000 (18:19 +0100)]
tools/shim: Fix race condition creating linkfarm.stamp

In case the while loop gets interrupted, the target mustn't appear as
up-to-date.  The `mv $X.tmp $X` must be the last action of the rule.
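The Makefile rule itself isn't reproduced here. As an analogy, the C sketch below follows the same write-to-temporary-then-rename ordering. The target only comes into existence through the final rename, so an interruption at any earlier point leaves no up-to-date-looking stamp behind. The function and file names are hypothetical. Atomic replacement of an existing file by rename() is a POSIX guarantee, not an ISO C one.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Create 'path' containing 'content' such that 'path' never exists in a
 * partially written state: everything goes to "<path>.tmp" first, and the
 * rename into place is the very last action.
 */
static int write_stamp_atomically(const char *path, const char *content)
{
    char tmp[4096];
    FILE *f;
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    if ( n < 0 || (size_t)n >= sizeof(tmp) )
        return -1;

    f = fopen(tmp, "w");
    if ( !f )
        return -1;

    if ( fputs(content, f) == EOF )
    {
        fclose(f);
        remove(tmp);
        return -1;
    }

    if ( fclose(f) == EOF )
    {
        remove(tmp);
        return -1;
    }

    /* Equivalent of "mv $X.tmp $X": the only step that creates 'path'. */
    return rename(tmp, path) == 0 ? 0 : -1;
}
```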

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Tue, 3 Sep 2019 12:51:28 +0000 (14:51 +0200)]
debugtrace: use common output function

Today, dumping the debugtrace buffers is done via sercon_puts(), while
direct printing of trace entries (after toggling output to the console)
uses serial_puts().

Use sercon_puts() in both cases, as the distinction between the two
doesn't really make sense.

In order to prepare for moving the debugtrace functionality to its own
source file, rename sercon_puts() to console_serial_puts() and make it
globally visible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 3 Sep 2019 12:50:33 +0000 (14:50 +0200)]
x86emul: support INVPCID

Just like for INVLPGA, the HVM hook for now only supports PCID 0 for
individual address invalidation. It also translates the other types to a
full flush, which is architecturally permitted and performance-wise
presumably not much worse because emulation is slow anyway.
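The architectural INVPCID types are:
- 0: individual address
- 1: single context
- 2: all contexts, including globals
- 3: all contexts, excluding globals

The sketch below implements the policy described above, using hypothetical EX_-prefixed names. Type 0 is only handled for PCID 0, and every other valid type becomes a full flush. How Xen actually reports the unsupported case is not shown here.

```c
#include <assert.h>

/* Architectural INVPCID types; the EX_ names are hypothetical. */
#define EX_INVPCID_INDIV_ADDR       0
#define EX_INVPCID_SINGLE_CTXT      1
#define EX_INVPCID_ALL_INCL_GLOBAL  2
#define EX_INVPCID_ALL_NON_GLOBAL   3

enum ex_flush_action {
    EX_FLUSH_ONE_ADDR,  /* invalidate a single linear address */
    EX_FLUSH_ALL,       /* flush everything */
    EX_UNSUPPORTED,     /* not (yet) handled by the hook */
    EX_INVALID,         /* reserved type: the insn raises #GP */
};

static enum ex_flush_action ex_invpcid_action(unsigned long type,
                                              unsigned int pcid)
{
    switch ( type )
    {
    case EX_INVPCID_INDIV_ADDR:
        /* Individual-address invalidation: only PCID 0 for now. */
        return pcid == 0 ? EX_FLUSH_ONE_ADDR : EX_UNSUPPORTED;

    case EX_INVPCID_SINGLE_CTXT:
    case EX_INVPCID_ALL_INCL_GLOBAL:
    case EX_INVPCID_ALL_NON_GLOBAL:
        /* Flushing more than requested is architecturally permitted. */
        return EX_FLUSH_ALL;

    default:
        return EX_INVALID;
    }
}
```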

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:49:52 +0000 (14:49 +0200)]
x86emul: generalize invlpg() hook

The hook is already in use for INVLPGA as well. Rename the hook and add
parameters. For the moment INVLPGA with a non-zero ASID remains
unsupported, but the TODO item gets pushed into the actual hook handler.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:49:20 +0000 (14:49 +0200)]
x86/HVM: ignore guest INVD uses

The only place we'd expect the insn to be sensibly used is in
(virtualization unaware) firmware.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:48:19 +0000 (14:48 +0200)]
x86emul: support WBNOINVD

Rev 037 of Intel's ISA extensions document does not state intercept
behavior for the insn (I've been unofficially told that the distinction
is going to be by exit qualification, as I would have assumed
considering that this way it's sufficiently transparent to unaware
software, as using WBINVD in place of WBNOINVD is always correct, just
less efficient). Similarly AMD's PM volume 2 version 3.31 only states
that both use the same VMEXIT, but not how to distinguish them (other
than by decoding the insn). Therefore in the HVM case for now it'll be
backed by the same ->wbinvd_intercept() handlers.

Use this occasion and also add the two missing table entries for
CLDEMOTE, which doesn't require any further changes to make work.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Tue, 3 Sep 2019 12:47:18 +0000 (14:47 +0200)]
public: add macro for defining variable length array in public headers

Several public headers of the hypervisor contain structures with
variable length arrays. In order to be usable with different compilers,
those definitions depend on the compiler type and the standard
supported by the compiler.

In order to avoid open coding the different variants in each header,
add a common macro for that purpose in xen.h.

This at the same time corrects most of the existing definitions, which
miss one case, where the array then isn't defined at all.
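The sketch below shows what such a selector can look like. The macro name EX_FLEX_ARRAY_DIM is hypothetical and may not match the one added to xen.h. C99 and later get a true flexible array member, GNU C gets a zero-length array, and anything else falls back to a one-element array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define EX_FLEX_ARRAY_DIM        /* C99 flexible array member: empty */
#elif defined(__GNUC__)
#define EX_FLEX_ARRAY_DIM  0     /* GNU zero-length array extension */
#else
#define EX_FLEX_ARRAY_DIM  1     /* fallback: one element over-declared */
#endif

/* Hypothetical public-header-style structure with a trailing array. */
struct ex_list {
    uint32_t nr;
    uint32_t entries[EX_FLEX_ARRAY_DIM];
};

/* Bytes needed for 'nr' entries, correct for all three variants. */
static size_t ex_list_bytes(uint32_t nr)
{
    return offsetof(struct ex_list, entries) + nr * sizeof(uint32_t);
}
```

Sizing allocations with offsetof() rather than sizeof() keeps the arithmetic correct in all three cases, since the fallback over-declares the array by one element.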

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 3 Sep 2019 12:46:08 +0000 (14:46 +0200)]
x86/domain: remove the 'oos_off' flag

The flag is not needed since the domain 'options' can now be tested
directly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Thu, 29 Aug 2019 12:35:33 +0000 (13:35 +0100)]
x86/acpi: Adjust command line parsing for "acpi_sleep"

Perform parsing in a custom_param, rather than stashing the content in a
string and parsing in an initcall.  Adjust the parsing to conform to current
standards.

No practical change.
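Below is a minimal sketch of parsing the option directly in a custom_param-style handler. It assumes a comma-separated list of s3_bios and s3_mode flags (s3_mode is mentioned elsewhere in this log). The flag values, variable and function names are hypothetical. Unlike Xen's real handler, this one is not registered with custom_param().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define EX_S3_BIOS  (1u << 0)   /* hypothetical flag values */
#define EX_S3_MODE  (1u << 1)

static unsigned int ex_acpi_video_flags;

/* Does the token [s, s + len) spell exactly 'name'? */
static int ex_tok_is(const char *s, size_t len, const char *name)
{
    return strlen(name) == len && !strncmp(s, name, len);
}

/*
 * Handle "acpi_sleep=<list>" at parse time rather than stashing the string
 * for an initcall.  Unknown tokens yield -EINVAL, but the remaining tokens
 * are still processed.
 */
static int ex_parse_acpi_sleep(const char *s)
{
    const char *ss;
    int rc = 0;

    do {
        ss = strchr(s, ',');
        if ( !ss )
            ss = strchr(s, '\0');

        if ( ex_tok_is(s, (size_t)(ss - s), "s3_bios") )
            ex_acpi_video_flags |= EX_S3_BIOS;
        else if ( ex_tok_is(s, (size_t)(ss - s), "s3_mode") )
            ex_acpi_video_flags |= EX_S3_MODE;
        else
            rc = -EINVAL;

        s = ss + 1;
    } while ( *ss );

    return rc;
}
```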

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 29 Aug 2019 12:28:15 +0000 (13:28 +0100)]
x86/acpi: Drop sleep_states[] and associated print messages

sleep_states[] is a write-only array, and despite the loop logic, the printed
message is consistently "ACPI sleep modes: S3".  Drop it all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 2 Sep 2019 12:45:57 +0000 (14:45 +0200)]
x86: shrink video_{flags,mode} to {8,16} bits

We really don't need them to be any wider.

Also remove the C level declaration (and hence also the GLOBAL) of
video_mode altogether; it's used in assembly code only.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:45:18 +0000 (14:45 +0200)]
x86: a little bit of 16-bit video mode setting code cleanup

To "compensate" for the code size growth by an earlier change:
- drop "trampoline" labels (in almost all cases the target label is
  reachable with an 8-bit-displacement branch anyway, and a single 16-
  bit-displacement branch is still better than a pair of two branches)
- drop an entirely dead insn from wakeup.S:mode_setw
- reduce code size in a few other (obvious I hope) cases, by more
  suitable insn/operands selection

Also drop redundant #define-s (move suitable #include a little earlier
instead) and add two alignment directives.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:41:19 +0000 (14:41 +0200)]
x86/ACPI: restore VESA mode upon resume from S3

In order for "acpi_sleep=s3_mode" to have any effect, we should record
the video mode we switched to during boot. Since right now there's mode
setting code for VESA modes only in the resume case, record the mode
just in that one case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:40:15 +0000 (14:40 +0200)]
x86emul: generalize wbinvd() hook

The hook is already in use for other purposes, and emulating e.g.
CLFLUSH by issuing WBINVD is, well, not very nice. Rename the hook and
add parameters. Use lighter weight flushing insns when possible in
hvmemul_cache_op().

hvmemul_cache_op() treating x86emul_invd the same as x86emul_wbinvd is
to retain original behavior, but I'm not sure this is what we want in
the long run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:38:37 +0000 (14:38 +0200)]
timers: limit heap size

First and foremost make timer_softirq_action() avoid growing the heap
if its new size can't be stored without truncation. 64k entries is a
lot, and I don't think we're at risk of actually running into the issue,
but I also think it's better not to allow for hard to debug problems to
occur in the first place.

Furthermore, also adjust the code such that, should the size/limit fields
become unsigned int, it would at least work from a mere sizing point of
view. For this, also switch various uses of plain int to unsigned int.
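Below is a minimal sketch of the growth check. It assumes the heap's size and limit are stored in 16-bit fields and that the limit grows as ((old + 1) << 4) - 1. Both are assumptions for illustration. The heap is left alone whenever the new limit wouldn't survive truncation to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the limit to grow a timer heap to, or 'old_limit' unchanged if the
 * grown limit could not be stored in a 16-bit field without truncation.
 */
static unsigned int ex_next_heap_limit(unsigned int old_limit)
{
    unsigned int new_limit = ((old_limit + 1) << 4) - 1;

    if ( new_limit != (uint16_t)new_limit )
        return old_limit;   /* don't grow: caller keeps the current heap */

    return new_limit;
}
```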

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Igor Druzhinin [Fri, 30 Aug 2019 13:23:01 +0000 (15:23 +0200)]
x86/domain: don't destroy IOREQ servers on soft reset

Performing soft reset should not opportunistically kill IOREQ servers
for device emulators that might be currently running for a domain.
Every emulator is supposed to clean up IOREQ servers for itself on exit.
This allows a toolstack to elect whether or not a particular device
model should be restarted.

The original code was introduced in 3235cbfe ("arch-specific hooks for
domain_soft_reset()"), likely because the 'default' IOREQ server, which
existed in Xen at the time and was used by QEMU, had no API call to
destroy it. Since the removal of the 'default' IOREQ server from Xen, this
reason has gone away.

Since commit ba7fdd64b ("xen: cleanup IOREQ server on exit"), QEMU now
destroys its IOREQ server itself, as every other device emulator is
supposed to do. It's now safe to remove this code from the soft reset
path - existing systems with an old QEMU should keep working, as even if
IOREQ servers are left behind, a new QEMU instance will override their
ranges anyway.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 30 Aug 2019 13:21:54 +0000 (15:21 +0200)]
x86: move INVPCID_TYPE_* to x86-defns.h

This way the insn emulator can use the #define-s too. In place of
the TYPE infix, add an X86 prefix.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Aug 2019 08:24:13 +0000 (10:24 +0200)]
x86/ACPI: re-park previously parked CPUs upon resume from S3

As I understand it, when resuming from S3, CPUs come back out of RESET/INIT. Therefore
they need to undergo the same procedure as was added elsewhere by
commits d8f974f1a6 ("x86: command line option to avoid use of secondary
hyper-threads") and 8797d20a6e ("x86: possibly bring up all CPUs even
if not all are supposed to be used").

Just like done at boot time, avoid (at least pointlessly) using
stop-machine logic.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Chao Gao [Fri, 30 Aug 2019 08:22:55 +0000 (10:22 +0200)]
x86/ucode/AMD: make freeing of old ucode conditional

It is certain to be NULL at least the first time through.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 29 Aug 2019 13:10:07 +0000 (15:10 +0200)]
x86: properly gate clearing of PKU feature

setup_clear_cpu_cap() is __init and hence may not be called post-boot.
Note that opt_pku nevertheless does not get __initdata added; see
e.g. commit 43fa95ae6a ("mm: make opt_bootscrub non-init").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agopartially revert "x86/mm: Clean IOMMU flags from p2m-pt code"
Roger Pau Monné [Thu, 29 Aug 2019 13:08:46 +0000 (15:08 +0200)]
partially revert "x86/mm: Clean IOMMU flags from p2m-pt code"

This partially reverts commit
854a49a7486a02edae5b3e53617bace526e9c1b1 by re-adding the logic that
propagates changes to the domain physmap done by p2m_pt_set_entry into
the iommu page tables. Without this logic, changes to the guest physmap
are not propagated to the iommu, leaving stale iommu entries that can
leak data, or failing to add new entries.

Note that this commit doesn't re-introduce iommu flags to the cpu page
table entries, since the logic to add/remove entries to the iommu page
tables is based on the p2m type and the mfn.

Fixes: 854a49a7486a02 ('x86/mm: Clean IOMMU flags from p2m-pt code')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86/boot: Drop all use of lmsw
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Drop all use of lmsw

lmsw is an obsolete relic of the 286 processor - so much so that it even lacks
intercept assistance on AMD processors.

Use a plain mov to %cr0 which is easier to follow, certainly faster to
virtualise on AMD hardware, and almost certainly a faster microcode path in
real hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Further minor GDT corrections
Andrew Cooper [Mon, 19 Aug 2019 12:18:06 +0000 (13:18 +0100)]
x86/boot: Further minor GDT corrections

gdt_boot_descr and gdt_48 disagree on how long trampoline_gdt is.

Introduce an end label for each GDT and have the linker calculate their size,
rather than hard coding it.

Also, just as with c/s af292b41e9, there is no point forcing the CPU to set
Access bits.  Fix all affected GDTs.
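
The asm lets the linker compute each GDT's size from start and end labels. As a rough C analogue (the table contents, `example_gdt` and `make_gdtr()` are hypothetical, and the packed attribute is GCC/Clang-specific), the descriptor-table register limit is derived from the table itself as size minus one, and the descriptors already carry the Accessed bit so the CPU never needs to write it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GDT: null, ring-0 code and ring-0 data descriptors. */
static const uint64_t example_gdt[] = {
    0x0000000000000000ULL,
    0x00cf9b000000ffffULL,   /* code: access byte 0x9b, Accessed bit pre-set */
    0x00cf93000000ffffULL,   /* data: access byte 0x93, Accessed bit pre-set */
};

/* Operand layout for lgdt in 64-bit mode. */
struct __attribute__((packed)) gdtr {
    uint16_t limit;          /* size of the table in bytes, minus one */
    uint64_t base;
};

/* Derive the limit from the table rather than hard-coding it: the C
 * counterpart of letting the linker compute (end - start - 1). */
static struct gdtr make_gdtr(void)
{
    struct gdtr d = {
        .limit = sizeof(example_gdt) - 1,
        .base = (uint64_t)(uintptr_t)example_gdt,
    };
    return d;
}
```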

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/suspend: Simplify system table handling on resume
Andrew Cooper [Mon, 12 Aug 2019 17:40:04 +0000 (18:40 +0100)]
x86/suspend: Simplify system table handling on resume

load_TR() is used exclusively in the resume path, but jumps through a lot of
unnecessary hoops.  As suspend/resume is strictly on CPU0 in idle context, the
correct GDT to use is boot_gdt, which means it doesn't need saving on suspend.

Although doing more than strictly necessary, reuse load_system_tables(), which
is already used by APs on the S3 resume path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/desc: Move boot_gdtr into .rodata
Andrew Cooper [Mon, 12 Aug 2019 14:16:38 +0000 (15:16 +0100)]
x86/desc: Move boot_gdtr into .rodata

It is never written to.

This was an oversight when it was moved from asm into C.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/suspend: Sanity check more properties in enter_state()
Andrew Cooper [Mon, 12 Aug 2019 14:14:13 +0000 (15:14 +0100)]
x86/suspend: Sanity check more properties in enter_state()

The logic depends on being run on CPU0, and in IDLE context.  Having this
explicitly identified allows for simplification of the whole S3 path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: Drop XEN_DOMCTL_{get,set}_machine_address_size
Andrew Cooper [Wed, 7 Aug 2019 11:53:51 +0000 (12:53 +0100)]
xen: Drop XEN_DOMCTL_{get,set}_machine_address_size

This functionality is obsolete.  It was introduced by c/s 41296317a31 into
Xend, but was never exposed in libxl.

Nothing limits this to PV guests, but it makes no sense for HVM guests.

Looking through the XenServer templates, this was used to work around bugs in
the 32bit RHEL/CentOS 4.7 and 4.8 kernels (fixed in 4.9) and RHEL/CentOS/OEL
5.2 and 5.3 kernels (fixed in 5.4).  RHEL 4 as a major version went out of
support in 2017, whereas the 5.2/5.3 kernels went out of support when 5.4 was
released in 2009.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: Drop XEN_DOMCTL_suppress_spurious_page_faults
Andrew Cooper [Wed, 7 Aug 2019 11:49:37 +0000 (12:49 +0100)]
xen: Drop XEN_DOMCTL_suppress_spurious_page_faults

This functionality is obsolete.  It was introduced by c/s 39407bed9c0 into
Xend, but never exposed in libxl.

While not explicitly limited to PV guests, this is PV-only by virtue of its
position in the pagefault handler.

Looking through the XenServer templates, this was used to work around bugs in
the 32bit RHEL/CentOS 4.{5..7} kernels (fixed in 4.8).  RHEL 4 as a major
version went out of support in 2017.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hvm/domain: remove the 'hap_enabled' flag
Paul Durrant [Wed, 28 Aug 2019 14:58:45 +0000 (16:58 +0200)]
x86/hvm/domain: remove the 'hap_enabled' flag

The hap_enabled() macro can determine whether the feature is available
using the domain 'options'; there is no need for a separate flag.

NOTE: Furthermore, by extending sanitizing of the domain 'options', the
      macro can be transformed into an inline function and re-located to
      xen/sched.h. This also makes hap_enabled() common, thus allowing
      removal of an ugly ifdef CONFIG_X86 from the common iommu code.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agop2m/ept: pass correct level to atomic_write_ept_entry in ept_invalidate_emt
Roger Pau Monné [Wed, 28 Aug 2019 14:57:36 +0000 (16:57 +0200)]
p2m/ept: pass correct level to atomic_write_ept_entry in ept_invalidate_emt

The level passed to ept_invalidate_emt corresponds to the EPT entry
passed as the mfn parameter, which is a pointer to an EPT page table,
hence the entries in that page table will have one level less than the
parent.

Fix the call to atomic_write_ept_entry to pass the correct level, ie:
one level less than the parent.

Fixes: 50fe6e73705 ('pvh dom0: add and remove foreign pages')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agomicrocode/amd: fix memory leak
Chao Gao [Wed, 28 Aug 2019 14:52:18 +0000 (16:52 +0200)]
microcode/amd: fix memory leak

Two buffers inside 'mc_amd', '->equiv_cpu_table' and '->mpb', may be
allocated, but they are not freed properly on the error-handling path.
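
A minimal sketch of the error-path pattern, assuming a simplified stand-in for 'mc_amd' (the struct, `load_patch()` and the allocation counter are hypothetical, not the Xen code): every buffer allocated before the failure is released on the way out, not only the most recent one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two buffers held by 'mc_amd'. */
struct mc_amd {
    void *equiv_cpu_table;
    void *mpb;
};

/* Counts live allocations so the sketch can show that nothing leaks. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Allocates both buffers; 'fail_late' simulates a parse error afterwards. */
static int load_patch(struct mc_amd *mc, bool fail_late)
{
    int rc = -ENOMEM;

    mc->equiv_cpu_table = tracked_alloc(64);
    mc->mpb = tracked_alloc(256);
    if (!mc->equiv_cpu_table || !mc->mpb)
        goto err;

    if (fail_late) {
        rc = -EINVAL;
        goto err;
    }
    return 0;

 err:
    /* Release both buffers, whichever of them was actually allocated. */
    tracked_free(mc->equiv_cpu_table);
    tracked_free(mc->mpb);
    mc->equiv_cpu_table = NULL;
    mc->mpb = NULL;
    return rc;
}
```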

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/arm: traps: Remove all zero padding before PRIregister format
Julien Grall [Wed, 14 Aug 2019 09:36:07 +0000 (10:36 +0100)]
xen/arm: traps: Remove all zero padding before PRIregister format

Commit af156ff085 "xen/arm: types: Specify the zero padding in the
definition of PRIregister" moved the zero padding within the definition
of PRIregister.

However, some of the users still had zero padding before it, which
resulted in many zeroes being printed when dumping the CPU state.

To prevent this, remove the last users of zero padding before
PRIregister.
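
To see why padding in front of PRIregister goes wrong once the macro carries its own padding, here is a sketch assuming an arm64-style definition of `"016lx"` (the exact string is an assumption). The string literals concatenate, so `"%016" PRIregister` becomes `"%016016lx"`, which printf reads as the 0 flag followed by a field width of 16016:

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: the definition now embeds the zero padding itself. */
#define PRIregister "016lx"

/* Returns how many characters the formatted register value would take. */
static int formatted_len(unsigned long reg, int extra_pad)
{
    if (extra_pad)
        /* "%016" PRIregister == "%016016lx": flag '0', width 16016. */
        return snprintf(NULL, 0, "%016" PRIregister, reg);

    /* "%" PRIregister == "%016lx": flag '0', width 16. */
    return snprintf(NULL, 0, "%" PRIregister, reg);
}
```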

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agox86/mm: correctly initialise M2P entries on boot
Igor Druzhinin [Tue, 27 Aug 2019 11:48:05 +0000 (12:48 +0100)]
x86/mm: correctly initialise M2P entries on boot

Since the guest resource management work, it's now possible to have a page
assigned to a domain without a valid M2P entry. Some paths in the code
rely on the GFN returned by mfn_to_gfn() for such a page also being
invalid; see e.g. arch_iommu_populate_page_table().

On systems without 512GB of contiguous RAM, M2P entries were already
correctly initialised on boot with INVALID_M2P_ENTRY (~0UL), but on
systems where the M2P could be covered by a single 1GB page directory,
0x77 poison was used instead. That eventually resulted in a crash
during IOMMU construction on systems without shared PTs enabled.

While here, fix up the compat M2P entries as well.
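
A sketch of why the initial fill value matters, assuming (hypothetically) that callers treat an entry as invalid only when it equals INVALID_M2P_ENTRY: filling with 0xFF bytes produces ~0UL in every entry, whereas 0x77 poison produces a value that passes the check and looks like a real GFN. `m2p_entry_valid()` and `init_table()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define INVALID_M2P_ENTRY (~0UL)

/* Hypothetical consumer check: only the sentinel marks an entry invalid. */
static bool m2p_entry_valid(unsigned long entry)
{
    return entry != INVALID_M2P_ENTRY;
}

/* Fills an M2P-like table byte-wise, as a boot-time memset would. */
static void init_table(unsigned long *tbl, size_t n, int byte)
{
    memset(tbl, byte, n * sizeof(*tbl));
}
```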

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoMAINTAINERS: remove myself from SVM and AMD IOMMU
Brian Woods [Thu, 15 Aug 2019 22:49:26 +0000 (22:49 +0000)]
MAINTAINERS: remove myself from SVM and AMD IOMMU

I will no longer be working at AMD and am removing myself.

Signed-off-by: Brian Woods <brian.woods@amd.com>
5 years agotools/oxenstored: port XS_INTRODUCE evtchn rebind function from cxenstored
Igor Druzhinin [Mon, 19 Aug 2019 18:45:35 +0000 (19:45 +0100)]
tools/oxenstored: port XS_INTRODUCE evtchn rebind function from cxenstored

The C version of xenstored has had this ability since 61aaed0d5 ("Allow
XS_INTRODUCE to be used for rebinding the xenstore evtchn.") from 2007.
Copy it as-is to the OCaml version.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agox86/domain: remove the 's3_integrity' flag
Paul Durrant [Fri, 16 Aug 2019 17:19:56 +0000 (18:19 +0100)]
x86/domain: remove the 's3_integrity' flag

The flag is not needed since the domain 'options' can now be tested
directly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agodomain: remove the 'is_xenstore' flag
Paul Durrant [Fri, 16 Aug 2019 17:19:55 +0000 (18:19 +0100)]
domain: remove the 'is_xenstore' flag

This patch introduces a convenience macro, is_xenstore_domain(), which
tests the domain 'options' directly and then uses that in place of
the 'is_xenstore' flag.
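
A sketch of deriving the property from the creation options instead of caching a separate flag. The option bit name and value are hypothetical, and this version uses an inline function where the patch introduces a macro:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bit; the real flag name and value may differ. */
#define CDF_XS_DOMAIN (1U << 4)

/* Heavily simplified domain: only the creation options are modelled. */
struct domain {
    unsigned int options;
};

/* Answer the question from 'options' directly; no separate bool to keep in sync. */
static inline bool is_xenstore_domain(const struct domain *d)
{
    return d->options & CDF_XS_DOMAIN;
}
```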

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
5 years agopassthrough: make deassign_device() static
Paul Durrant [Fri, 16 Aug 2019 17:19:52 +0000 (18:19 +0100)]
passthrough: make deassign_device() static

This function is only ever called from within the same source module and
really has no business being declared in xen/iommu.h. This patch relocates
the function ahead of the first caller and makes it static.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/vtd: Fix S3 resume following c/s 650c31d3af
Andrew Cooper [Mon, 12 Aug 2019 17:08:00 +0000 (18:08 +0100)]
x86/vtd: Fix S3 resume following c/s 650c31d3af

c/s 650c31d3af "x86/IRQ: fix locking around vector management" adjusted the
locking in adjust_irq_affinity().

The S3 path ends up here via iommu_resume() before interrupts are enabled, at
which point spin_lock_irq() fails ASSERT(local_irq_is_enabled()), but with no
working console to report it.

Use spin_lock_irqsave() instead to cope with interrupts already being
disabled.
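
A toy model of the difference, assuming a single global flag stands in for the CPU interrupt flag (all `toy_*` names are hypothetical and the lock itself is omitted): the `_irq` variant insists that interrupts are enabled and unconditionally re-enables them, whereas the `_irqsave` variant records the prior state and restores exactly that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the local CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static void toy_local_irq_disable(void) { irqs_enabled = false; }
static void toy_local_irq_enable(void)  { irqs_enabled = true; }

/* spin_lock_irq()-style: requires interrupts on, unlock turns them on. */
static void toy_lock_irq(void)
{
    assert(irqs_enabled);   /* mirrors ASSERT(local_irq_is_enabled()) */
    toy_local_irq_disable();
}

static void toy_unlock_irq(void) { toy_local_irq_enable(); }

/* spin_lock_irqsave()-style: works whatever the current state is. */
static bool toy_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    toy_local_irq_disable();
    return flags;
}

static void toy_unlock_irqrestore(bool flags) { irqs_enabled = flags; }
```

On the early resume path interrupts are still off, so only the save/restore pair is safe there, and it leaves them off afterwards.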

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoxen/console: debugtrace: Compute the buffer length is O(1) rather O(n)
Julien Grall [Wed, 21 Aug 2019 21:19:17 +0000 (22:19 +0100)]
xen/console: debugtrace: Compute the buffer length is O(1) rather O(n)

This was meant to be part of commit e0bf98394e "xen/console: Fix build
when CONFIG_DEBUG_TRACE" but was not addressed before it was committed.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/console: Fix build when CONFIG_DEBUG_TRACE=y
Julien Grall [Mon, 19 Aug 2019 17:13:05 +0000 (18:13 +0100)]
xen/console: Fix build when CONFIG_DEBUG_TRACE=y

Commit b5e6e1ee8da "xen/console: Don't treat NUL character as the end
of the buffer" extended sercon_puts to take the number of characters
to print as an argument.

Sadly, a couple of the callers in debugtrace_dump_worker()
were not converted. This results in a build failure when enabling
CONFIG_DEBUG_TRACE.

Spotted by Travis using randconfig.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoinclude/public/memory.h: remove the XENMEM_rsrc_acq_caller_owned flag
Paul Durrant [Fri, 19 Jul 2019 12:25:45 +0000 (13:25 +0100)]
include/public/memory.h: remove the XENMEM_rsrc_acq_caller_owned flag

When commit 3f8f1228 "x86/mm: add HYPERVISOR_memory_op to acquire guest
resources" introduced the concept of directly mapping some guest resources,
it was envisaged that the memory for some resources associated with a guest
may not actually be assigned to that guest, specifically the IOREQ server
resource introduced in commit 6e387461 "x86/hvm/ioreq: add a new mappable
resource type...". Such resources were dubbed "caller owned", and acquiring
them resulted in the XENMEM_rsrc_acq_caller_owned flag being passed back to
the caller of the memory op.

Unfortunately the implementation led to XSA-276, which was mitigated
by commit f6b6ae78 "x86/hvm/ioreq: fix page referencing", and a related
memory accounting problem was then worked around by commit e862e6ce
"x86/hvm/ioreq: use ref-counted target-assigned shared pages". This latter
commit removed the only instance of a "caller owned" resource, but the
flag was left in the header and checked in one place in the core code.

This patch removes that now redundant check and removes the definition of
XENMEM_rsrc_acq_caller_owned from the public header. Also, since this was
the only flag defined for the XENMEM_acquire_resource memory op, it removes
the 'flags' field of struct xen_mem_acquire_resource and replaces it with
an equivalently sized 'pad' field.
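
A compile-time check of the ABI argument, using simplified, hypothetical layouts rather than the real struct xen_mem_acquire_resource: because 'pad' has the same type and position as 'flags', the structure size and the offsets of later fields are unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical "before" layout. */
struct acquire_before {
    uint16_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t flags;       /* only ever carried the caller-owned flag */
    uint64_t frame;
};

/* Same layout with the field renamed to padding. */
struct acquire_after {
    uint16_t domid;
    uint16_t type;
    uint32_t id;
    uint32_t nr_frames;
    uint32_t pad;         /* same size and position as 'flags' */
    uint64_t frame;
};

/* C11 compile-time checks that the guest-visible layout is unchanged. */
static_assert(sizeof(struct acquire_before) == sizeof(struct acquire_after),
              "structure size must not change");
static_assert(offsetof(struct acquire_before, frame) ==
              offsetof(struct acquire_after, frame),
              "later fields must not move");
```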

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agopython: do not report handled EAGAIN error
Marek Marczykowski-Górecki [Tue, 20 Aug 2019 02:12:41 +0000 (04:12 +0200)]
python: do not report handled EAGAIN error

When match_watch_by_token() returns an error, it also sets an exception
within Python. This is generally the right thing to do, but when
xspy_read_watch() handles the EAGAIN error internally, the exception needs
to be cleared. Otherwise it will fail like this:

    xen.lowlevel.xs.Error: (11, 'Resource temporarily unavailable')

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      (...)
        result = self.handle.read_watch()
    SystemError: <method 'read_watch' of 'xen.lowlevel.xs.xs' objects> returned a result with an error set

Fixes: f6e1023412 "python: Extract registered watch search logic from xspy_read_watch()"
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoviridian: make viridian_time_domain_freeze() safe to call...
Paul Durrant [Wed, 21 Aug 2019 08:22:58 +0000 (09:22 +0100)]
viridian: make viridian_time_domain_freeze() safe to call...

...on a partially destroyed domain.

viridian_time_domain_freeze() and viridian_time_vcpu_freeze() rely
(respectively) on the dynamically allocated per-domain and per-vcpu viridian
areas [1], which are freed during domain_relinquish_resources().
Because arch_domain_pause() can call viridian_time_domain_freeze(), this
can lead to host crashes if e.g. a XEN_DOMCTL_pausedomain is issued after
domain_relinquish_resources() has run.

To prevent such crashes, this patch adds a check of is_dying into
viridian_time_domain_freeze(), and into viridian_time_domain_thaw(), which
is similarly vulnerable to indirection into freed memory.
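
A minimal sketch of the guard, assuming a heavily simplified domain structure (in Xen, is_dying is a multi-state field rather than a bool, and all names below are hypothetical): once resources are relinquished, the freeze path returns before dereferencing the freed per-domain area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-domain viridian area, allocated separately. */
struct viridian_domain {
    int time_frozen;
};

struct domain {
    bool is_dying;
    struct viridian_domain *viridian;   /* freed when resources are relinquished */
};

static void relinquish_resources(struct domain *d)
{
    d->is_dying = true;
    free(d->viridian);
    d->viridian = NULL;
}

/* Returns false when it bailed out instead of touching freed state. */
static bool time_domain_freeze(struct domain *d)
{
    if (d->is_dying)
        return false;
    d->viridian->time_frozen = 1;
    return true;
}
```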

NOTE: The patch also makes viridian_time_vcpu_freeze/thaw() static, since
      they have no callers outside of the same source module.

[1] See commit e7a9b5e72f26 "viridian: separately allocate domain and vcpu
    structures".

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/p2m: fix non-translated handling of iommu mappings
Roger Pau Monne [Tue, 23 Jul 2019 12:43:43 +0000 (14:43 +0200)]
x86/p2m: fix non-translated handling of iommu mappings

The current usage of need_iommu_pt_sync in p2m for non-translated
guests is wrong because it doesn't correctly handle a relaxed PV
hardware domain, which has need_sync set to false but still needs
entries to be added by calls to {set/clear}_identity_p2m_entry.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: Roman Shaposhnik <roman@zededa.com>
5 years agopython: Add XC binding for Xen build ID
Pawel Wieczorkiewicz [Tue, 20 Aug 2019 12:51:08 +0000 (12:51 +0000)]
python: Add XC binding for Xen build ID

Extend the list of xc() object methods with an additional one to display
Xen's build ID. The implementation follows the libxl implementation
(e.g. the maximum build ID size is assumed to be XC_PAGE_SIZE minus
sizeof(buildid->len)).
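
The size limit works out as follows, assuming a 4 KiB XC_PAGE_SIZE and a build-ID buffer laid out as a 32-bit length followed by the payload (both are assumptions for this sketch, and `max_build_id_size()` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define XC_PAGE_SIZE 4096

/* Assumed shape of the buffer: length header plus flexible payload. */
struct build_id {
    uint32_t len;
    unsigned char buf[];
};

/* Largest build ID that fits in one page alongside the length field. */
static size_t max_build_id_size(void)
{
    return XC_PAGE_SIZE - sizeof(((struct build_id *)0)->len);
}
```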

Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Martin Mazein <amazein@amazon.de>
Reviewed-by: Andra-Irina Paraschiv <andraprs@amazon.com>
Reviewed-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
5 years agoxen/arm: add reserved-memory regions to the dom0 memory node
Stefano Stabellini [Mon, 19 Aug 2019 17:43:38 +0000 (10:43 -0700)]
xen/arm: add reserved-memory regions to the dom0 memory node

Reserved memory regions are automatically remapped to dom0. Their device
tree nodes are also added to dom0 device tree. However, the dom0 memory
node is not currently extended to cover the reserved memory regions
ranges as required by the spec.  This commit fixes it.

Change make_memory_node to take a struct meminfo * instead of a
kernel_info. Call it twice for dom0, once to create the first regular
memory node, and the second time to create a second memory node with the
ranges covering reserved-memory regions.

Also, make a small code style fix in make_memory_node.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: don't iomem_permit_access for reserved-memory regions
Stefano Stabellini [Mon, 19 Aug 2019 17:43:37 +0000 (10:43 -0700)]
xen/arm: don't iomem_permit_access for reserved-memory regions

Don't allow reserved-memory regions to be remapped into any unprivileged
guests, until reserved-memory regions are properly supported in Xen. For
now, do not call iomem_permit_access on them, because giving
iomem_permit_access to dom0 means that the toolstack will be able to
assign the region to a domU.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: handle reserved-memory in consider_modules and dt_unreserved_regions
Stefano Stabellini [Mon, 19 Aug 2019 17:43:36 +0000 (10:43 -0700)]
xen/arm: handle reserved-memory in consider_modules and dt_unreserved_regions

reserved-memory regions overlap with memory nodes. The overlapping
memory is reserved-memory and should be handled accordingly:
consider_modules and dt_unreserved_regions should skip these regions the
same way they are already skipping mem-reserve regions.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: early_print_info print reserved_mem
Stefano Stabellini [Mon, 19 Aug 2019 17:43:35 +0000 (10:43 -0700)]
xen/arm: early_print_info print reserved_mem

Improve early_print_info to also print the banks saved in
bootinfo.reserved_mem. Print them right after RESVD, increasing the same
index.

While at it, also switch the existing RESVD print to use unsigned
int.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: fix indentation in early_print_info
Stefano Stabellini [Mon, 19 Aug 2019 17:43:34 +0000 (10:43 -0700)]
xen/arm: fix indentation in early_print_info

No functional changes.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: keep track of reserved-memory regions
Stefano Stabellini [Mon, 19 Aug 2019 17:43:33 +0000 (10:43 -0700)]
xen/arm: keep track of reserved-memory regions

As we parse the device tree in Xen, keep track of the reserved-memory
regions as they need special treatment (follow-up patches will make use
of the stored information.)

Reuse process_memory_node to add reserved-memory regions to the
bootinfo.reserved_mem array.

Refuse to continue once we reach the max number of reserved memory
regions to avoid accidentally mapping any portions of them into a VM.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: make process_memory_node a device_tree_node_func
Stefano Stabellini [Mon, 19 Aug 2019 17:43:32 +0000 (10:43 -0700)]
xen/arm: make process_memory_node a device_tree_node_func

Change the signature of process_memory_node to match
device_tree_node_func. Thanks to this change, the next patch will be
able to use device_tree_for_each_node to call process_memory_node on all
the children of a provided node.

Return error if there is no reg property or if nr_banks is reached. Let
the caller deal with the error.
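
A sketch of the "return an error and let the caller decide" split, with hypothetical simplified types and a deliberately tiny bank limit; `add_bank()` is illustrative, and `-ENOSPC` is this sketch's choice of error code, not necessarily Xen's:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_MEM_BANKS 2   /* hypothetical, deliberately tiny limit */

struct membank {
    uint64_t start;
    uint64_t size;
};

struct meminfo {
    int nr_banks;
    struct membank bank[NR_MEM_BANKS];
};

/* Records one region, or reports that the array is full. */
static int add_bank(struct meminfo *mem, uint64_t start, uint64_t size)
{
    if (mem->nr_banks >= NR_MEM_BANKS)
        return -ENOSPC;   /* the caller decides whether this is fatal */

    mem->bank[mem->nr_banks].start = start;
    mem->bank[mem->nr_banks].size = size;
    mem->nr_banks++;
    return 0;
}
```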

Add a printk when device tree parsing fails.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: pass node to device_tree_for_each_node
Stefano Stabellini [Mon, 19 Aug 2019 17:43:31 +0000 (10:43 -0700)]
xen/arm: pass node to device_tree_for_each_node

Add a new parameter to device_tree_for_each_node: node, the node to
start the search from.

To avoid scanning the whole device tree to compute the absolute depth of
the starting node, and given that we only care about relative increments
of depth compared to the depth of the initial node, we set the initial
depth to 0. Then, we call func() for every node with depth > 0.

Don't call func() on the parent node passed as an argument. Clarify the
change in the comment on top of the function. The current callers pass
the root node as argument: it is OK to skip the root node because no
relevant properties are in it, only subnodes.
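
A sketch of relative depth tracking over a hypothetical flattened tree, where each step adjusts the depth by the difference between consecutive nodes, much like libfdt's fdt_next_node() updates its depth argument. Starting from 0 at the given node, callbacks fire for descendants only, and the walk stops as soon as the depth falls back to 0. `for_each_node_below()` and `count_node()` are illustrative names.

```c
#include <assert.h>

/* Hypothetical flattened tree: absolute depth of each node in DFS order. */
static const int depths[] = { 0, 1, 2, 2, 1, 2 };
#define NR_NODES ((int)(sizeof(depths) / sizeof(depths[0])))

typedef void (*node_func)(int node, int rel_depth, void *data);

/* Visits the descendants of 'start' (not 'start' itself). Only depth
 * deltas are used, so the absolute depth of 'start' is never needed. */
static void for_each_node_below(int start, node_func func, void *data)
{
    int depth = 0;

    for (int node = start + 1; node < NR_NODES; node++) {
        depth += depths[node] - depths[node - 1];
        if (depth <= 0)
            break;                 /* left the subtree of 'start' */
        func(node, depth, data);
    }
}

/* Example callback: counts the visited nodes. */
static void count_node(int node, int rel_depth, void *data)
{
    (void)node;
    (void)rel_depth;
    (*(int *)data)++;
}
```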

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
[julien: Remove min_depth variable]
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agolivepatch: always print XENLOG_ERR information
Pawel Wieczorkiewicz [Wed, 14 Aug 2019 12:23:05 +0000 (12:23 +0000)]
livepatch: always print XENLOG_ERR information

A lot of legitimate error messages were hidden behind debug-only
printks. Most of these messages can be triggered by loading a malformed
hotpatch payload and are invaluable for understanding issues with such
payloads.
Thus, always display all relevant XENLOG_ERR messages.

Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Amit Shah <aams@amazon.de>
Reviewed-by: Martin Mazein <amazein@amazon.de>
Reviewed-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
[Fix indentation and double LIVEPATCH prefixes, drop gratuitous punctuation]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/x86: pv: Convert update_intpte() to use typesafe MFN
Julien Grall [Tue, 30 Apr 2019 17:43:25 +0000 (18:43 +0100)]
xen/x86: pv: Convert update_intpte() to use typesafe MFN

The third parameter of update_intpte() is an MFN, so it can be switched
to use the typesafe MFN.

At the same time, the typesafe MFN is propagated as far as possible without
major modifications.
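
The typesafe idea in miniature: wrapping the frame number in a one-member struct makes mfn_t and gfn_t distinct types, so mixing them up is a compile error rather than a silent bug. This is a simplified sketch; `_mfn()`, `mfn_x()`, `_gfn()` and `gfn_x()` follow Xen's naming, but the definitions here are illustrative, and `frame_to_addr()` is hypothetical:

```c
#include <assert.h>

/* Distinct one-member structs: not implicitly convertible to each other. */
typedef struct { unsigned long mfn; } mfn_t;
typedef struct { unsigned long gfn; } gfn_t;

/* Constructors and accessors, mirroring Xen's naming. */
static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }
static inline gfn_t _gfn(unsigned long g) { return (gfn_t){ g }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Hypothetical consumer: passing a gfn_t here fails to compile. */
static unsigned long frame_to_addr(mfn_t mfn)
{
    return mfn_x(mfn) << 12;   /* assumes 4 KiB pages */
}
```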

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: Convert is_xen_fixed_mfn to use typesafe MFN
Julien Grall [Sat, 26 Jan 2019 16:38:47 +0000 (16:38 +0000)]
xen: Convert is_xen_fixed_mfn to use typesafe MFN

No functional changes.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen: Convert is_xen_heap_mfn to use typesafe MFN
Julien Grall [Sat, 26 Jan 2019 16:51:42 +0000 (16:51 +0000)]
xen: Convert is_xen_heap_mfn to use typesafe MFN

No functional changes.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen: Convert hotplug page function to use typesafe MFN
Julien Grall [Sat, 26 Jan 2019 16:31:55 +0000 (16:31 +0000)]
xen: Convert hotplug page function to use typesafe MFN

Convert online_page, offline_page and query_page_offline to use
typesafe MFN.

At the same time, the typesafe MFN is propagated as far as possible without
major modifications.

Note, for clarity, the words have been re-ordered in the error message
updated by this patch.

No functional changes.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/grant-table: Make arch specific macros typesafe
Julien Grall [Sat, 26 Jan 2019 16:14:22 +0000 (16:14 +0000)]
xen/grant-table: Make arch specific macros typesafe

This patch reworks all the arch specific macros in grant_table.h to use
the typesafe MFN/GFN.

At the same time, some functions are renamed s/gmfn/gfn/ to match the
current naming scheme (see include/mm.h).

No functional changes intended.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/x86: Use mfn_to_gfn rather than mfn_to_gmfn
Julien Grall [Sat, 26 Jan 2019 15:58:48 +0000 (15:58 +0000)]
xen/x86: Use mfn_to_gfn rather than mfn_to_gmfn

mfn_to_gfn and mfn_to_gmfn do exactly the same thing, except that the
former uses mfn_t for its argument and gfn_t for its return type.

Furthermore, the naming of the former is more consistent with the
current naming scheme (GFN/MFN). So replace mfn_to_gmfn with
mfn_to_gfn in x86 code.

Take the opportunity to convert some of the callers to use typesafe GFN and
format the message correctly.

No functional changes.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
--
    Changes in v3:
        - The hunk in x86/mm.c is not necessary anymore
        - Update printk message to use GFN rather than frame when suitable
        - Update commit message with some NITs
        - Add Jan's reviewed-by

    Changes in v2:
        - mfn_to_gfn now returns a gfn_t
        - Use %pd and PRI_gfn when possible in the message
        - Don't split format string to help grep/ack.

5 years agoxen/x86: Make mfn_to_gfn typesafe
Julien Grall [Wed, 13 Mar 2019 15:36:40 +0000 (15:36 +0000)]
xen/x86: Make mfn_to_gfn typesafe

No functional changes intended.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86: Restore IA32_MISC_ENABLE on wakeup
Michał Kowalczyk [Mon, 19 Aug 2019 02:23:33 +0000 (04:23 +0200)]
x86: Restore IA32_MISC_ENABLE on wakeup

Code in intel.c:early_init_intel() modifies IA32_MISC_ENABLE MSR. Those
modifications must be restored after resuming from S3 (see e.g. Linux wakeup
code), otherwise bad things may happen (e.g. wakeup code may cause #GP when
trying to set IA32_EFER.NXE [1]).

This bug was noticed on a ThinkPad x230 with NX disabled in the BIOS:
Xen could correctly boot, but crashed when resuming from suspend.
Applying this patch fixed the problem.

[1] Intel SDM vol 3: "If the execute-disable capability is not
available, a write to set IA32_EFER.NXE produces a #GP exception."

Signed-off-by: Michał Kowalczyk <mkow@invisiblethingslab.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/console: Simplify domU console handling in guest_console_write
Julien Grall [Tue, 2 Apr 2019 14:30:21 +0000 (15:30 +0100)]
xen/console: Simplify domU console handling in guest_console_write

Two paths in the domU console handling are now the same, so they can be
merged to make the code simpler.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years agoxen/public: Document HYPERCALL_console_io()
Julien Grall [Fri, 1 Mar 2019 15:39:21 +0000 (15:39 +0000)]
xen/public: Document HYPERCALL_console_io()

Currently, OS developers have to look at the Xen code in order to know
the parameters of a hypercall and how it is meant to work.

This is not a trivial task, as it may require a deep understanding
of Xen internals.

This patch attempts to document the behavior of HYPERCALL_console_io() to
help OS developers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/console: Rework HYPERCALL_console_io interface
Julien Grall [Mon, 5 Aug 2019 10:19:03 +0000 (11:19 +0100)]
xen/console: Rework HYPERCALL_console_io interface

At the moment, HYPERCALL_console_io is using signed int to describe the
command (@cmd) and the size of the buffer (@count).
    * @cmd does not need to be signed, as it is used as a set of named
    values. None of them are negative, and any new ones introduced can
    be positive as well.
    * @count is used to know the size of the buffer. It makes little
    sense to have a negative value here.

So both variables are now switched to use unsigned int.

Changing @count to an unsigned type will result in a change of behavior
for the existing commands:
    - write: Any buffer bigger than 2GB will now be printed rather than
      being ignored (the command returned 0).
    - read: The return value is a signed 32-bit value for 32-bit Xen.
      To keep compatibility between the 32-bit and 64-bit ABIs, it
      effectively means the return value is 32-bit (despite being long
      on 64-bit). Negative values are used for errors and positive values
      for the number of characters read. To avoid a clash between the two
      sets, the buffer is still limited to 2GB. The only difference is
      that an error is returned rather than claiming there are no
      characters.

The change only affects unlikely uses of the current interface, so
this is not a big concern regarding backward compatibility.
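
A sketch of the read-side rule under the new unsigned @count, using a hypothetical `console_read()` model in which `avail` characters are buffered; the specific error code is an assumption. Counts above INT_MAX would make a successful return indistinguishable from a negative error in a 32-bit return value, so they are rejected up front:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical model: returns characters read, or a negative error. */
static long console_read(unsigned int count, unsigned int avail)
{
    /* A count above INT_MAX cannot be reported unambiguously in a
     * signed 32-bit return value, so reject it (error code assumed). */
    if (count > INT_MAX)
        return -EINVAL;

    return count < avail ? count : avail;
}
```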

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/console: Don't treat NUL character as the end of the buffer
Julien Grall [Tue, 26 Feb 2019 21:39:58 +0000 (21:39 +0000)]
xen/console: Don't treat NUL character as the end of the buffer

After upgrading Debian to Buster, I have began to notice console
mangling when using zsh in Dom0. This is happenning because output sent by
zsh to the console may contain NULs in the middle of the buffer.

The actual implementation of CONSOLEIO_write considers that a buffer
always terminate with a NUL and therefore will ignore anything after it.

In general, NULs are perfectly legitimate in terminal streams. For
instance, this could be used for padding slow terminals. See terminfo(5)
section `Delays and Padding`, or search for the pcre '\bpad'.

Other use cases include using the console for dumping non-human-readable
information (e.g. debugger output, files when no network is available...).
With the current behavior, the resulting stream will end up corrupted.

The documentation for CONSOLEIO_write is pretty limited (not to say
nonexistent). From the declaration, the hypercall takes a buffer and a
size, which could lead one to think that the NUL character is allowed in
the middle of the buffer.

This patch updates the console API to pass the size along with the
buffer down the call chain, so we can remove the reliance on the buffer
being terminated by a NUL character.
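The difference between treating the first NUL as the end of the buffer and honouring an explicit size can be shown with a minimal sketch. The `struct sink` type and the function names are hypothetical stand-ins for the console output path, not Xen's actual console API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink standing in for the console. */
struct sink {
    char data[64];
    size_t len;
};

static void sink_append(struct sink *s, const char *buf, size_t n)
{
    assert(s->len + n <= sizeof(s->data));  /* sketch: no overflow handling */
    memcpy(s->data + s->len, buf, n);
    s->len += n;
}

/* Old behaviour (sketch): the first NUL is treated as the end of the
 * buffer, so everything after it is silently dropped. */
static void write_until_nul(struct sink *s, const char *buf, size_t count)
{
    const char *nul = memchr(buf, '\0', count);
    size_t n = nul ? (size_t)(nul - buf) : count;

    sink_append(s, buf, n);
}

/* New behaviour (sketch): the caller-supplied size is passed down and
 * honoured, so embedded NULs are forwarded like any other byte. */
static void write_sized(struct sink *s, const char *buf, size_t count)
{
    sink_append(s, buf, count);
}
```

With a 5-byte buffer `{ 'a', 'b', '\0', 'c', 'd' }`, the NUL-terminated path emits only "ab", while the size-based path emits all five bytes, which is what a terminal using NUL padding or a binary dump expects.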

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/mm: Clean IOMMU flags from p2m-pt code
Alexandru Stefan ISAILA [Wed, 14 Aug 2019 14:41:23 +0000 (15:41 +0100)]
x86/mm: Clean IOMMU flags from p2m-pt code

At the moment, IOMMU page-table sharing is disabled by commit [1].

This patch removes the IOMMU HAP sharing support from the p2m-pt code,
as it will not be used in the future. This frees the IOMMU bits in
pte[52:58] so they can be used in other ways.
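Freeing pte[52:58] means those bits can be repurposed. A minimal sketch of accessing that field follows, assuming pte[52:58] denotes the inclusive bit range 52 to 58 (7 bits) of a 64-bit entry; the macro and helper names are hypothetical and are not Xen's p2m accessors.

```c
#include <stdint.h>

/* Hypothetical field covering bits 52..58 (inclusive) of a 64-bit PTE. */
#define PTE_FIELD_SHIFT 52
#define PTE_FIELD_WIDTH 7
#define PTE_FIELD_MASK  (((UINT64_C(1) << PTE_FIELD_WIDTH) - 1) << PTE_FIELD_SHIFT)

/* Extract the 7-bit field from a PTE. */
static inline unsigned int pte_get_field(uint64_t pte)
{
    return (unsigned int)((pte & PTE_FIELD_MASK) >> PTE_FIELD_SHIFT);
}

/* Replace the field, leaving every other PTE bit untouched; values wider
 * than 7 bits are truncated by the mask. */
static inline uint64_t pte_set_field(uint64_t pte, unsigned int val)
{
    return (pte & ~PTE_FIELD_MASK) |
           (((uint64_t)val << PTE_FIELD_SHIFT) & PTE_FIELD_MASK);
}
```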

[1] c2ba3db31ef2d9f1e40e7b6c16cf3be3d671d555

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago xen/arm: setup: Add Xen as boot module before printing all boot modules
Julien Grall [Mon, 12 Aug 2019 11:23:43 +0000 (12:23 +0100)]
xen/arm: setup: Add Xen as boot module before printing all boot modules

Since commit f60658c6ae "xen/arm: Stop relocating Xen", the position of
Xen in memory is not printed anymore. This can make early code difficult
to debug.

As Xen is not relocated anymore, we can add Xen as a boot module before
calling boot_fdt_info(). With that, the function will print the Xen
module information along with all the other modules.
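The ordering matters because boot_fdt_info() prints whatever modules are registered when it runs. A minimal sketch of that effect follows; the types, `add_module()`, `fdt_info_sketch()` and the addresses are hypothetical stand-ins, not Xen's actual boot module code.

```c
#include <stdio.h>

/* Hypothetical stand-ins for Xen's boot module bookkeeping. */
enum mod_kind { MOD_XEN, MOD_KERNEL };

struct boot_module {
    enum mod_kind kind;
    unsigned long start, size;
};

struct boot_modules {
    unsigned int nr;
    struct boot_module mod[8];
};

static int add_module(struct boot_modules *bm, enum mod_kind kind,
                      unsigned long start, unsigned long size)
{
    if ( bm->nr >= sizeof(bm->mod) / sizeof(bm->mod[0]) )
        return -1;
    bm->mod[bm->nr].kind = kind;
    bm->mod[bm->nr].start = start;
    bm->mod[bm->nr].size = size;
    bm->nr++;
    return 0;
}

/* Stand-in for boot_fdt_info(): registers a module "found" in the device
 * tree, then prints every module registered so far. */
static unsigned int fdt_info_sketch(struct boot_modules *bm)
{
    static const char *const names[] = { "Xen", "Kernel" };
    unsigned int i;

    add_module(bm, MOD_KERNEL, 0x80080000UL, 0x01000000UL); /* made-up values */

    for ( i = 0; i < bm->nr; i++ )
        printf("%-6s: %#lx - %#lx\n", names[bm->mod[i].kind],
               bm->mod[i].start, bm->mod[i].start + bm->mod[i].size);

    return bm->nr;
}
```

Registering Xen with `add_module()` before calling `fdt_info_sketch()` is what makes the Xen entry appear in the printed list; registering it afterwards would leave it out.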

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago tools/pygrub: Failing to set value to 0 in Grub2ConfigFile
Michael Young [Tue, 13 Aug 2019 20:15:02 +0000 (21:15 +0100)]
tools/pygrub: Failing to set value to 0 in Grub2ConfigFile

In Grub2ConfigFile the code to handle ${saved_entry} and ${next_entry}
sets arg = "0" but this now does nothing following c/s d1b93ea2615bd
"tools/pygrub: Make pygrub understand default entry in string format"
which replaced arg.strip() with arg_strip in the following line.  This
patch restores the previous behaviour.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>