xenbits.xensource.com Git - people/liuw/xen.git/log
6 years ago x86: lift vcpu mapcache to arch_vcpu [vmap-xenheap]
Wei Liu [Mon, 17 Dec 2018 15:58:13 +0000 (15:58 +0000)]
x86: lift vcpu mapcache to arch_vcpu

HVM is going to need it as well, because we want even HVM vcpus
to have a per-vcpu mapcache.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: lift domain mapcache to arch_domain
Wei Liu [Mon, 17 Dec 2018 15:50:49 +0000 (15:50 +0000)]
x86: lift domain mapcache to arch_domain

HVM is going to need it as well, because we want even HVM
domains to have a per-domain mapcache.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago Revert "XXX a bunch of printks"
Wei Liu [Mon, 17 Dec 2018 15:53:37 +0000 (15:53 +0000)]
Revert "XXX a bunch of printks"

This reverts commit 75586dfa01f8f997686550c85ecc8dafec68ae73.

6 years ago Revert "XXX switch to using vmap to map pages from alloc_xenheap_pages"
Wei Liu [Mon, 17 Dec 2018 15:53:30 +0000 (15:53 +0000)]
Revert "XXX switch to using vmap to map pages from alloc_xenheap_pages"

This reverts commit e7f004da9e336bc98f265c63e9b646cecdce50c8.

6 years ago Revert "xxx debug"
Wei Liu [Mon, 17 Dec 2018 15:53:23 +0000 (15:53 +0000)]
Revert "xxx debug"

This reverts commit e1d6737def27410d80603fd7649dd6615a40b3ac.

6 years ago Revert "xxx"
Wei Liu [Mon, 17 Dec 2018 15:53:17 +0000 (15:53 +0000)]
Revert "xxx"

This reverts commit 34284fdd1aaa0f9709f69e6bc4425f71d4a75b75.

6 years ago xxx
Wei Liu [Wed, 12 Dec 2018 17:22:38 +0000 (17:22 +0000)]
xxx

6 years ago xxx debug
Wei Liu [Wed, 12 Dec 2018 16:19:52 +0000 (16:19 +0000)]
xxx debug

6 years ago XXX switch to using vmap to map pages from alloc_xenheap_pages
Wei Liu [Wed, 12 Dec 2018 11:48:59 +0000 (11:48 +0000)]
XXX switch to using vmap to map pages from alloc_xenheap_pages

This is broken at the moment because vm_init depends on
alloc_xenheap_pages being functional.

6 years ago XXX a bunch of printks
Wei Liu [Wed, 12 Dec 2018 12:36:12 +0000 (12:36 +0000)]
XXX a bunch of printks

6 years ago XXX x86: move vm_init to early boot stage
Wei Liu [Wed, 12 Dec 2018 12:20:44 +0000 (12:20 +0000)]
XXX x86: move vm_init to early boot stage

We want to make alloc_xenheap_pages use vm_init. But there is this
call chain:

   vm_init -> vm_init_type -> alloc_xen_pagetable ->
     alloc_xenheap_page / alloc_boot_pages

Obviously if it ends up calling alloc_xenheap_page it becomes
circular. We make it call alloc_boot_pages instead by moving it to
early boot stage.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
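The circular dependency described above can be illustrated with a small sketch. This is not the Xen code: alloc_source_sketch, boot_stage_sketch and pagetable_alloc_source() are hypothetical names. The only behaviour taken from the commit message is that page-table pages come from the boot allocator while still in early boot, and from the xenheap afterwards.

```c
/* Hypothetical model of the two allocators named in the call chain. */
enum alloc_source_sketch { FROM_BOOT_ALLOCATOR, FROM_XENHEAP };

/* Hypothetical stand-in for the system's boot state. */
enum boot_stage_sketch { STAGE_EARLY_BOOT, STAGE_BOOTED };

/*
 * Decide where a page-table page would come from. If vm_init ran after
 * early boot, this would pick the xenheap, which in turn would need vmap
 * to be initialised already: the circular dependency. Running vm_init
 * during early boot selects the boot allocator instead.
 */
static enum alloc_source_sketch
pagetable_alloc_source(enum boot_stage_sketch stage)
{
    return stage == STAGE_EARLY_BOOT ? FROM_BOOT_ALLOCATOR : FROM_XENHEAP;
}
```

Calling pagetable_alloc_source(STAGE_EARLY_BOOT) models the situation after this change, where vm_init no longer reaches the xenheap allocator.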
6 years ago XXX xen/vmap: allow vm_init_type to be called during early_boot
Wei Liu [Wed, 12 Dec 2018 12:17:09 +0000 (12:17 +0000)]
XXX xen/vmap: allow vm_init_type to be called during early_boot

We want to move vm_init, which calls vm_init_type under the hood, to
the early boot stage. Add a path to get a page from the boot allocator instead.

Add an emacs block to that file while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: always use vmap for global mapping
Wei Liu [Mon, 17 Dec 2018 15:36:30 +0000 (15:36 +0000)]
x86: always use vmap for global mapping

We will remove the direct map soon. Once that happens we can't rely on
the direct map for global mapping. Remove the fast path.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86emul: avoid triggering assertions with VME/PVI early #GP check
Jan Beulich [Tue, 18 Dec 2018 14:21:17 +0000 (15:21 +0100)]
x86emul: avoid triggering assertions with VME/PVI early #GP check

In commit efe9cba66c ("x86emul: VME and PVI modes require a #GP(0) check
first thing") I neglected the fact that the retire flags get zapped only
in x86_decode(), which hasn't been invoked yet at the point of the #GP(0)
check added. Move output state initialization into a helper function,
and invoke it from the callers of x86_decode() instead of doing it
(possibly too late) in that function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86emul: fix vector-length check for AVX512F scalar fused-multiply-add insns
Jan Beulich [Tue, 18 Dec 2018 14:20:32 +0000 (15:20 +0100)]
x86emul: fix vector-length check for AVX512F scalar fused-multiply-add insns

The check needs to happen whenever EVEX.b (SDM nomenclature) is clear,
not just in the memory operand case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86emul: work around SandyBridge errata
Jan Beulich [Tue, 18 Dec 2018 14:19:47 +0000 (15:19 +0100)]
x86emul: work around SandyBridge errata

There are a number of exception condition related errata on SandyBridge
CPUs, some of which are unexpected #UD (others, of no interest here, are
lack of mandated exceptions, or exceptions of unexpected type). Annotate
the one workaround we already have, and add two more.

Due to the exception recovery we have in place for stub invocations
these aren't security issues.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86emul: fix 3-operand IMUL
Jan Beulich [Tue, 18 Dec 2018 13:27:09 +0000 (14:27 +0100)]
x86emul: fix 3-operand IMUL

While commit 75066cd4ea ("x86emul: fix {,i}mul and {,i}div") indeed did
as its title says, it broke the 3-operand form by uniformly using AL/AX/
EAX/RAX as second source operand. Fix this and add tests covering both
cases.

Reported-by: Andrei Lutas <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
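To make the difference between the two IMUL forms concrete, here is a minimal sketch of their architectural semantics for 32-bit operands. It is not the emulator's code, and the function names are made up for illustration. The point is that only the one-operand form reads the accumulator; the three-operand form takes both sources explicitly.

```c
#include <stdint.h>

/*
 * One-operand IMUL: the accumulator (EAX for the 32-bit form) is an
 * implicit source, and the double-width product is produced.
 */
static int64_t imul_one_operand(int32_t accumulator, int32_t src)
{
    return (int64_t)accumulator * src;
}

/*
 * Three-operand IMUL (dst = src * imm): both sources are explicit, so the
 * accumulator must not be consulted. The product is truncated to the
 * operand size (the final narrowing relies on two's complement wrap).
 */
static int32_t imul_three_operand(int32_t src, int32_t imm)
{
    return (int32_t)(uint32_t)((int64_t)src * imm);
}
```

Using the accumulator as the second source in imul_three_operand() would reproduce the class of bug this commit fixes.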
6 years ago x86emul/test: drop another instance of .byte
Jan Beulich [Tue, 18 Dec 2018 13:26:44 +0000 (14:26 +0100)]
x86emul/test: drop another instance of .byte

Now that we require use of the {evex} pseudo-prefix, we can also use
the q-suffixed encoding of VPCMPESTRI, which is available as of 2.29
just like {evex} is.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86/hvm: Corrections to RDTSCP intercept handling
Andrew Cooper [Fri, 30 Nov 2018 16:14:08 +0000 (16:14 +0000)]
x86/hvm: Corrections to RDTSCP intercept handling

For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
supports the instruction, but the guest may not have RDTSCP in its featureset.
Bring the vmexit handlers in line with the main emulator behaviour by
optionally handing back #UD.

Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug
build or first-gen SVM hardware which lacks NRIP, we first update regs->rcx,
then call __get_instruction_length() asking for RDTSC.  As the two
instructions are different (and indeed, different lengths!),
__get_instruction_length_from_list() fails and hands back a #GP fault.

This can be demonstrated by putting a guest into tsc_mode="always emulate" and
executing an RDTSCP instruction:

  (d1) --- Xen Test Framework ---
  (d1) Environment: HVM 64bit (Long mode 4 levels)
  (d1) Test rdtscp
  (d1) TSC mode 1
  (XEN) emulate.c:147:d1v0 __get_instruction_length: Mismatch between expected and actual instruction:
  (XEN) emulate.c:152:d1v0   insn_index 8, opcode 0xf0031 modrm 0
  (XEN) emulate.c:154:d1v0   rip 0x10475f, nextrip 0x104762, len 3
  (XEN) SVM insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00
  (d1) ******************************
  (d1) PANIC: Unhandled exception at 0008:000000000010475f
  (d1) Vec 13 #GP[0000]
  (d1) ******************************

First, teach __get_instruction_length() to cope with RDTSCP, and improve
svm_vmexit_do_rdtsc() to ask for the correct instruction.  Move the regs->rcx
adjustment into this function to ensure it gets done after we are done
potentially raising faults.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
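The length mismatch in the log above (0f 01 f9 at rip, len 3) can be reproduced with a small sketch. This is not __get_instruction_length(): the enum and function names are hypothetical, and only the two instruction encodings (RDTSC is 0f 31, RDTSCP is 0f 01 f9) are taken as fact.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum insn_sketch { INSN_RDTSC, INSN_RDTSCP };

/*
 * Return the length of the expected instruction if the bytes at the
 * instruction pointer match its encoding, or 0 on a mismatch.
 */
static size_t insn_len_if_match(enum insn_sketch which,
                                const uint8_t *bytes, size_t avail)
{
    static const uint8_t rdtsc[]  = { 0x0f, 0x31 };
    static const uint8_t rdtscp[] = { 0x0f, 0x01, 0xf9 };
    const uint8_t *enc = (which == INSN_RDTSC) ? rdtsc : rdtscp;
    size_t len = (which == INSN_RDTSC) ? sizeof(rdtsc) : sizeof(rdtscp);

    if ( avail < len || memcmp(bytes, enc, len) != 0 )
        return 0;

    return len;
}
```

Asking for INSN_RDTSC on a byte stream that starts with 0f 01 f9 returns 0, which corresponds to the failure path that handed #GP back to the guest.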
6 years ago xen/arm: mm: Use pte_xen_addr when creating xen entries
Julien Grall [Fri, 14 Dec 2018 11:44:54 +0000 (11:44 +0000)]
xen/arm: mm: Use pte_xen_addr when creating xen entries

The helper pte_xen_addr computes the MFN based on the virtual
address and generates the PTE. This can be r

At the same time, make va a vaddr_t to make clear it holds a virtual address.

Signed-off-by: Julien Grall <julien.grall@arm.com>
6 years ago xen: add CONFIG item for default dom0 memory size
Juergen Gross [Mon, 10 Dec 2018 11:44:22 +0000 (12:44 +0100)]
xen: add CONFIG item for default dom0 memory size

Now that a dom0_mem value can be specified depending on host memory
size on x86, make it easy for distros to specify a default dom0 size by
adding a CONFIG_DOM0_MEM item which presets the dom0_mem boot parameter
value.

It will be used only if no dom0_mem parameter was specified in the
boot parameters.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago arm/irq: skip action availability check for non-debug build
Andrii Anisov [Wed, 12 Dec 2018 18:20:55 +0000 (20:20 +0200)]
arm/irq: skip action availability check for non-debug build

With desc->lock taken:
 * An IRQ with the _IRQ_GUEST flag set always has an action.
 * An IRQ with the _IRQ_DISABLED flag cleared always has an action.
Those flag checks cover all accesses to desc->action in do_IRQ,
so we can skip the desc->action check in non-debug builds.
Keep it in place for debug builds to help diagnose potential
misconfiguration.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago gic-vgic: Drop an excessive clear_lrs
Andrii Anisov [Wed, 12 Dec 2018 18:20:54 +0000 (20:20 +0200)]
gic-vgic: Drop an excessive clear_lrs

This action is excessive because for an invalid LR there is no need
to write another invalid value to a register. So we can skip it here,
saving a peripheral register write.
Keep clearing the LR for the DEBUG build. This would make dumped
invalid LRs be zero. That is more obvious than picking state bits
from a non-zero value.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
6 years ago amd-iommu: remove page merging code
Paul Durrant [Thu, 13 Dec 2018 11:01:50 +0000 (12:01 +0100)]
amd-iommu: remove page merging code

The page merging logic makes use of bits 1-8 and bit 63 of a PTE, which
used to be specified as 'ignored'. However, bits 5 and 6 are now specified
as 'accessed' and 'dirty' bits and their use only remains safe as long as
the DTE 'Host Access Dirty' bits remain unused by Xen, or by hardware
before the domain starts running. (XSA-275 disabled the operation of the
code after domain creation completes).

With the page merging logic present in its current form there are no spare
ignored bits in the PTE at all, but PV-IOMMU support will require at least
one spare bit to track which PTEs are added by hypercall.

This patch removes the code, freeing up the remaining PTE ignored bits
for other use, including PV-IOMMU support, as well as significantly
simplifying and shortening the source by ~170 lines. There may be some
marginal performance cost (but none has been observed in manual testing
with a passed-through NVIDIA GPU) since higher order mappings will now be
ruled out until a mapping order parameter is passed to iommu_ops. That will
be dealt with by a subsequent patch though.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
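The bit conflict described in this commit message can be checked with a few lines of C. The macro and function names below are hypothetical; only the bit positions (1-8 and 63 for the merging logic, 5 and 6 for accessed/dirty) come from the text above.

```c
#include <stdint.h>

/* Bits 1-8 and bit 63, which the page merging logic made use of. */
#define PTE_MERGE_BITS_SKETCH \
    ((UINT64_C(0xff) << 1) | (UINT64_C(1) << 63))

/* Bits 5 and 6, now specified as 'accessed' and 'dirty'. */
#define PTE_ACCESSED_DIRTY_SKETCH \
    ((UINT64_C(1) << 5) | (UINT64_C(1) << 6))

/* Bits claimed by both the merging logic and the accessed/dirty bits. */
static uint64_t merge_bits_overlapping_ad(void)
{
    return PTE_MERGE_BITS_SKETCH & PTE_ACCESSED_DIRTY_SKETCH;
}
```

The overlap is non-empty (both bit 5 and bit 6), which is why the merging logic was only safe while the 'Host Access Dirty' feature stayed unused.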
6 years ago xen/arm: mm: Set-up page permission for Xen mappings earlier on
Julien Grall [Thu, 29 Nov 2018 11:37:43 +0000 (11:37 +0000)]
xen/arm: mm: Set-up page permission for Xen mappings earlier on

The Xen mapping is first created using a 2MB page and then shattered into
4KB pages for fine-grained permissions. However, it is not safe to break
down a superpage without going through an intermediate step invalidating
the entry.

As we are changing Xen mappings, we cannot go through the intermediate
step. The only solution is to create the Xen mapping using 4KB entries
directly. As Xen should always access the mappings according to
the runtime permissions, it is then possible to set up the permissions
while creating the mapping.

We are still playing with fire as there are still some
break-before-make issues in setup_pagetables (i.e. switching between 2 sets
of page-tables). But it should be slightly better than the current state.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Reported-by: Jan-Peter Larsson <Jan-Peter.Larsson@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Matthew Daley <mattd@bugfuzz.com>
6 years ago xen/arm: domctl: Use typesafe gfn in XEN_DOMCTL_cacheflush
Julien Grall [Thu, 29 Nov 2018 19:14:43 +0000 (19:14 +0000)]
xen/arm: domctl: Use typesafe gfn in XEN_DOMCTL_cacheflush

This will make changes in a follow-up patch easier.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: p2m: Rework p2m_cache_flush_range
Julien Grall [Thu, 29 Nov 2018 19:02:09 +0000 (19:02 +0000)]
xen/arm: p2m: Rework p2m_cache_flush_range

A follow-up patch will add support for preemption in p2m_cache_flush_range.
Because of the complexity of the 2 loops, it would be necessary to add
preemption in both of them.

This can be avoided by merging the 2 loops together and still keeping
the code fairly simple to read and extend.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: traps: Rework leave_hypervisor_tail
Julien Grall [Mon, 26 Nov 2018 14:25:54 +0000 (14:25 +0000)]
xen/arm: traps: Rework leave_hypervisor_tail

The function leave_hypervisor_tail is called before each return to the
guest vCPU. It has two main purposes:
    1) Process physical CPU work (e.g rescheduling) if required
    2) Prepare the physical CPU to run the guest vCPU

2) will always be done once we have finished processing physical CPU work.
At the moment, it is done as part of the last iteration of 1), adding
some extra indentation in the code.

This could be streamlined by moving 2) out of the loop. At the same
time, 1) is moved into a separate function, making it more obvious what is
happening.

All those changes will help a follow-up patch where we would want to
introduce some vCPU work before returning to the guest vCPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
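The restructuring described above can be sketched as follows. This is a toy model, not the Xen function: struct pcpu_sketch, process_pcpu_work() and leave_hypervisor_tail_sketch() are hypothetical, and the "work" is reduced to a counter.

```c
/* Hypothetical per-pCPU state used only for this illustration. */
struct pcpu_sketch {
    int pending_work;    /* outstanding physical CPU work items */
    int guest_prepared;  /* how many times step 2) has run */
};

/* Step 1): process physical CPU work until none is left. */
static void process_pcpu_work(struct pcpu_sketch *cpu)
{
    while ( cpu->pending_work > 0 )
        cpu->pending_work--;   /* stand-in for e.g. rescheduling */
}

/* Step 2) now follows the loop instead of living in its last iteration. */
static void leave_hypervisor_tail_sketch(struct pcpu_sketch *cpu)
{
    process_pcpu_work(cpu);
    cpu->guest_prepared++;     /* stand-in for preparing the pCPU */
}
```

With this shape, extra per-vCPU work before returning to the guest could be slotted in between the two steps without touching the loop.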
6 years ago xen/arm: p2m: Extend p2m_get_entry to return the value of bit[0] (valid bit)
Julien Grall [Mon, 6 Aug 2018 16:47:54 +0000 (17:47 +0100)]
xen/arm: p2m: Extend p2m_get_entry to return the value of bit[0] (valid bit)

With the recent changes, a P2M entry may be populated but may not be
valid. In some situations, it would be useful to know whether the entry
has been marked available to the guest in order to perform a specific
action. So extend p2m_get_entry to return the value of bit[0] (valid bit).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: p2m: Allow to flush cache on any RAM region
Julien Grall [Wed, 21 Feb 2018 14:18:44 +0000 (14:18 +0000)]
xen/arm: p2m: Allow to flush cache on any RAM region

Currently, we only allow flushing the cache on regions mapped as p2m_ram_{rw,ro}.

There is no real problem in cache flushing any RAM region, such as grants
and foreign mappings. Therefore, relax the check to allow flushing the
cache on any RAM region.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Rework p2m_cache_flush to take a range [begin, end)
Julien Grall [Wed, 21 Feb 2018 14:18:43 +0000 (14:18 +0000)]
xen/arm: Rework p2m_cache_flush to take a range [begin, end)

The function will be easier to re-use in a follow-up patch if you have
only the begin and end.

At the same time, rename the function to reflect the change in the
prototype.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
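As an illustration of the half-open range convention named in the title, here is a sketch. It assumes the earlier interface took a start and a count, which the commit message does not state; gfn_sketch_t and both helpers are hypothetical.

```c
#include <stdint.h>

typedef uint64_t gfn_sketch_t;  /* hypothetical frame-number type */

/* Convert a (start, count) pair into the exclusive end of [begin, end). */
static gfn_sketch_t range_end(gfn_sketch_t start, uint64_t nr)
{
    return start + nr;
}

/* Count the frames visited when walking the half-open range [begin, end). */
static uint64_t frames_in_range(gfn_sketch_t begin, gfn_sketch_t end)
{
    uint64_t visited = 0;

    for ( gfn_sketch_t gfn = begin; gfn < end; gfn++ )
        visited++;

    return visited;
}
```

A caller that already holds both bounds can pass them straight through, and an empty range (begin == end) naturally visits nothing.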
6 years ago xen/arm: p2m: Introduce a function to resolve translation fault
Julien Grall [Mon, 16 Jul 2018 14:49:03 +0000 (15:49 +0100)]
xen/arm: p2m: Introduce a function to resolve translation fault

Currently a Stage-2 translation fault could happen because of:
    1) MMIO emulation
    2) Another pCPU was modifying the P2M using Break-Before-Make
    3) Guest Physical address is not mapped

A follow-up patch will re-purpose the valid bit in an entry to generate
translation fault. This would be used to do an action on each entry to
track pages used for a given period.

When receiving the translation fault, we would need to walk the page
tables to find the faulting entry and then toggle the valid bit. We can't
use p2m_lookup() for this purpose as it only tells us the mapping exists.

So this patch adds a new function to walk the page-tables and update
the entry. This function will also handle 2) as it also requires walking
the page-table.

The function is able to cope with both table and block entries having the
valid bit unset. This gives flexibility to the function clearing the
valid bits. To keep the algorithm simple, the fault will be propagated
one level down. This will be repeated until a block entry has been
reached.

At the moment, no action is taken when reaching a block/page entry
other than setting the valid bit to 1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: p2m: Handle translation fault in get_page_from_gva
Julien Grall [Wed, 21 Feb 2018 14:18:40 +0000 (14:18 +0000)]
xen/arm: p2m: Handle translation fault in get_page_from_gva

A follow-up patch will re-purpose the valid bit of LPAE entries to
generate a fault even on entries containing valid information.

This means that translating a guest VA to a guest PA (i.e. IPA) will
fail if the Stage-2 entries used have the valid bit unset. Because of
that, we need to fall back to walking the page-table in software to check
whether the fault was expected.

This patch adds the software page-table walk on all translation
faults. It would be possible in the future to avoid a pointless walk when
the fault in PAR_EL1 is not a translation fault.

This function has only ever worked for guest RAM pages (no foreign mappings
or MMIO mappings) because we require the page to belong to the domain for
getting a reference. This means we can deny all non guest RAM pages.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: p2m: Introduce p2m_is_valid and use it
Julien Grall [Wed, 21 Feb 2018 14:18:42 +0000 (14:18 +0000)]
xen/arm: p2m: Introduce p2m_is_valid and use it

The LPAE format allows storing information in an entry even with the
valid bit unset. In a follow-up patch, we will take advantage of this
feature to re-purpose the valid bit for generating a translation fault
even if an entry contains valid information.

So we need a different way to know whether an entry contains valid
information. It is possible to use the information held in the p2m_type
for that purpose. Indeed all entries containing valid
information will have a valid p2m type (i.e. p2m_type != p2m_invalid).

This patch introduces a new helper p2m_is_valid, which implements that
idea, and replaces most lpae_is_valid calls with the new helper. The ones
remaining are for TLB handling and entry accounting.

With the renaming there are 2 other changes required:
    - Generate table entry with a valid p2m type
    - Detect new mapping for proper stats accounting

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
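The distinction between the two validity checks can be sketched as below. The entry layout and every name ending in _sketch are hypothetical simplifications; the only rule taken from the commit message is that an entry holds valid information when its p2m type is not p2m_invalid.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of p2m types. */
typedef enum {
    p2m_invalid_sketch = 0,
    p2m_ram_rw_sketch,
} p2m_type_sketch_t;

/* Hypothetical, simplified view of an LPAE entry. */
struct lpae_sketch {
    bool valid;               /* hardware valid bit */
    p2m_type_sketch_t type;   /* software-maintained type */
};

/* Hardware view: does the MMU consider the entry valid? */
static bool lpae_is_valid_sketch(struct lpae_sketch pte)
{
    return pte.valid;
}

/* Software view: does the entry contain valid information? */
static bool p2m_is_valid_sketch(struct lpae_sketch pte)
{
    return pte.type != p2m_invalid_sketch;
}
```

An entry with the valid bit cleared but a RAM type is invalid to the hardware check yet valid to the type-based check, which is the case the follow-up patch relies on.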
6 years ago xen/arm: p2m: Clean-up headers included and order them alphabetically
Julien Grall [Thu, 22 Nov 2018 10:57:36 +0000 (10:57 +0000)]
xen/arm: p2m: Clean-up headers included and order them alphabetically

A lot of the headers are not necessary, so remove them. At the same
time, re-order them alphabetically.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: traps: Move the implementation of GUEST_BUG_ON in traps.h
Julien Grall [Wed, 21 Feb 2018 14:18:43 +0000 (14:18 +0000)]
xen/arm: traps: Move the implementation of GUEST_BUG_ON in traps.h

GUEST_BUG_ON may be used in other files doing guest emulation.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago xen/arm: Introduce helpers to clear/set flags in HCR_EL2
Julien Grall [Wed, 21 Feb 2018 14:18:44 +0000 (14:18 +0000)]
xen/arm: Introduce helpers to clear/set flags in HCR_EL2

A couple of places in the code will need to clear/set flags in HCR_EL2
for a given vCPU and then replicate them into the hardware. Introduce
helpers and replace the open-coded versions.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
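A minimal sketch of what such helpers might look like follows. The names, the struct and the global standing in for the hardware register are all hypothetical; the real helpers may differ in name and signature. The pattern shown is the one the commit message describes: update the per-vCPU copy, then replicate it into the hardware.

```c
#include <stdint.h>

/* Hypothetical per-vCPU state holding a cached HCR_EL2 value. */
struct vcpu_sketch {
    uint64_t hcr_el2;
};

/* Stand-in for the hardware register (a real helper would write a sysreg). */
static uint64_t hw_hcr_el2_sketch;

/* Set flags in the vCPU's copy, then replicate into the "hardware". */
static void vcpu_hcr_set_flags_sketch(struct vcpu_sketch *v, uint64_t flags)
{
    v->hcr_el2 |= flags;
    hw_hcr_el2_sketch = v->hcr_el2;
}

/* Clear flags in the vCPU's copy, then replicate into the "hardware". */
static void vcpu_hcr_clear_flags_sketch(struct vcpu_sketch *v, uint64_t flags)
{
    v->hcr_el2 &= ~flags;
    hw_hcr_el2_sketch = v->hcr_el2;
}
```

Keeping both steps inside one helper avoids call sites that update the cached value but forget the hardware write, or the other way round.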
6 years ago xen: simplify {check,poison}_one_page
Wei Liu [Tue, 11 Dec 2018 11:56:31 +0000 (11:56 +0000)]
xen: simplify {check,poison}_one_page

Use __map_domain_page macro to deal with page_info directly.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago xen: clean up common/page_alloc.c
Wei Liu [Tue, 11 Dec 2018 11:56:30 +0000 (11:56 +0000)]
xen: clean up common/page_alloc.c

Remove trailing whitespaces. Turn bool_t into bool. Annotate a section
for CONFIG_SEPARATE_XENHEAP.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86: remove out label in spurious_interrupt
Wei Liu [Tue, 11 Dec 2018 11:55:15 +0000 (11:55 +0000)]
x86: remove out label in spurious_interrupt

The out label is followed by a semicolon only. Use return directly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86: add dom0 memory sizing variants
Juergen Gross [Tue, 11 Dec 2018 08:43:00 +0000 (09:43 +0100)]
x86: add dom0 memory sizing variants

Today the memory size of dom0 can be specified only in terms of bytes
(either an absolute value or "host-mem - value"). When dom0 shouldn't
be auto-ballooned this nearly always requires a manual adaptation of the
Xen boot parameters to reflect the actual host memory size.

Add more possibilities to specify memory sizes. Today we have:

dom0_mem= List of ( min:<size> | max:<size> | <size> )

with <size> being a positive or negative size value (e.g. 1G).

Modify that to:

dom0_mem= List of ( min:<sz> | max:<sz> | <sz> )
<sz>: <size> | [<size>+]<frac>%
<frac>: integer value < 100

With the following semantics:

<frac>% specifies a fraction of host memory size in percent.
<sz> is a percentage of host memory plus an offset.

So <sz> being 1G+25% on a 256G host would result in 65G.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
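The worked example in this commit message (1G+25% on a 256G host) can be checked with a short sketch: 25% of 256G is 64G, and adding the 1G offset gives 65G. The function name and the byte-based interface are assumptions for illustration; this is not how the Xen code is structured.

```c
#include <assert.h>
#include <stdint.h>

#define GIB_SKETCH (UINT64_C(1) << 30)

/*
 * Compute <size>+<frac>% of host memory, all in bytes.
 * frac must be an integer value below 100, as the syntax requires.
 */
static uint64_t dom0_mem_bytes_sketch(uint64_t host_bytes,
                                      uint64_t offset_bytes,
                                      unsigned int frac)
{
    assert(frac < 100);

    /* host_bytes * frac does not overflow for realistic host sizes. */
    return offset_bytes + host_bytes * frac / 100;
}
```

Calling dom0_mem_bytes_sketch(256 * GIB_SKETCH, 1 * GIB_SKETCH, 25) yields 65 * GIB_SKETCH, matching the figure quoted above.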
6 years ago modify parse_size_and_unit() to support percentage
Juergen Gross [Tue, 11 Dec 2018 08:42:20 +0000 (09:42 +0100)]
modify parse_size_and_unit() to support percentage

Modify parse_size_and_unit() to support a value followed by a '%'
character. In this case ps is required to be non-NULL to ensure the
caller can detect that case. The returned value will be the integer
value s was pointing to and *ps will point to the '%' character.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
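The contract described above (return the integer, leave *ps pointing at the '%') can be sketched as follows. This is a simplified stand-in, not parse_size_and_unit() itself: the function name is hypothetical, and the reduced suffix handling (only G and M, no default unit) is an assumption of the sketch rather than a description of the real function's unit rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static unsigned long long
parse_size_or_percent_sketch(const char *s, const char **ps)
{
    char *end;
    unsigned long long value = strtoull(s, &end, 0);

    switch ( *end )
    {
    case 'G': case 'g':
        value <<= 30;
        end++;
        break;

    case 'M': case 'm':
        value <<= 20;
        end++;
        break;

    case '%':
        /* The caller must pass ps, or it could not tell "25%" from "25". */
        assert(ps != NULL);
        break;                 /* leave end pointing at the '%' */

    default:
        break;
    }

    if ( ps != NULL )
        *ps = end;

    return value;
}
```

A caller would inspect *ps after the call: if it points at '%', the returned value is a percentage rather than a size.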
6 years ago x86emul: slightly alter AVX512 exception checking conditionals
Jan Beulich [Tue, 11 Dec 2018 08:41:17 +0000 (09:41 +0100)]
x86emul: slightly alter AVX512 exception checking conditionals

While actually benign (operands are either register or memory ones
anyway), I think it is better to use != instead of == for such checks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago automation: skip test stage for some branches
Wei Liu [Mon, 10 Dec 2018 15:11:10 +0000 (15:11 +0000)]
automation: skip test stage for some branches

We skipped the build stage for those branches. We want to skip the test
stage for those branches too.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years ago x86/VT-x: Don't activate VMCS Shadowing outside of nested vmx mode
Andrew Cooper [Fri, 7 Dec 2018 17:00:47 +0000 (17:00 +0000)]
x86/VT-x: Don't activate VMCS Shadowing outside of nested vmx mode

By default on capable hardware, SECONDARY_EXEC_ENABLE_VMCS_SHADOWING is
activated unilaterally.  The VMCS Link pointer is initialised to ~0, but the
VMREAD/VMWRITE bitmap pointers are not.

This causes the 16bit IVT and BIOS Data Area to get interpreted as the read/write
permission bitmap for guests which blindly execute VMREAD/VMWRITE
instructions.

This is not a security issue because the VMCS Link pointer being ~0 causes
VMREAD/VMWRITE to complete with VMFailInvalid (rather than modifying a
potential shadow VMCS), and the contents of MFN 0 have already been determined
not to contain any interesting data because of L1TF's ability to read that 4k
frame.

Leave VMCS Shadowing disabled by default, and toggle it in
nvmx_{set,clear}_vmcs_pointer().  This isn't the most efficient course of
action, but it is the most simple way of leaving nested-virt working as it did
before.

While editing construct_vmcs(), collect all default secondary_exec_control
modifications together.  The disabling of PML is latently buggy because it
happens after secondary_exec_control are written into the VMCS, although there
is an unconditional update later which writes the correct value into hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago docs/cmdline: Rewrite the cpuid_mask_* section
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
docs/cmdline: Rewrite the cpuid_mask_* section

A large amount of the information here has been obsolete since Xen 4.7.

To begin with, however, this patch marks a change in style for section
headings, due to how HTML anchors are generated.  Having more than one
parameter per heading makes an awkward anchor, especially when brace globbing
is used.  Furthermore, the misc suffixes such as (AMD only) get included, as
do the escaping for the underscores.

Markdown doesn't require escaped underscores in headings (I'm not entirely
sure how we ended up with that style), so remove them and fully expand the
glob syntax.  Also adjust com1,com2 while at it, which is the only other
multi-parameter heading.  Move the misc suffixes into an "Applicability:" note
alongside the information about defaults.

This results in the headings being unadorned, and identical to how they are
expressed on the command line and in code.

For cpuid_mask_cpu option, collapse the long line of almost identical strings
using [] globbing.  The result is much shorter and clearer to read.  Add a
warning that this option no longer masks all features on Fam15h and above, due
to not making use of the leaf 7 masks.

For the remainder of the cpuid_mask_* options, collapse them all together into
a single description.

Finally, leave an explicit note explaining that people should not be using
these options for migration safety.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago docs/cmdline: Move XSM to be in alphabetical order
Andrew Cooper [Fri, 7 Dec 2018 13:43:25 +0000 (13:43 +0000)]
docs/cmdline: Move XSM to be in alphabetical order

Adjust the default line to note that the default is now selectable in Kconfig.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago docs/cmdline: Fix markdown syntax
Andrew Cooper [Fri, 7 Dec 2018 13:43:23 +0000 (13:43 +0000)]
docs/cmdline: Fix markdown syntax

 * vwfi needs a closing `.  rmrr needs one as well, and the opening ' switched
   to `
 * The com1/com2 example lines are already verbatim blocks and shouldn't
   escape their underscores.  This ends up in the rendered output.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/pv: Code improvements to do_update_descriptor()
Andrew Cooper [Thu, 6 Dec 2018 14:05:34 +0000 (14:05 +0000)]
x86/pv: Code improvements to do_update_descriptor()

 * Add "uint64_t raw" to seg_desc_t to remove the opencoded uint64_t casting
   in this function.  Change the parameter to be of type seg_desc_t.
 * Rename the 'pa' parameter to 'gaddr', because it lives in GFN space rather
   than physical address space.
 * Use gfn_t and mfn_t rather than unsigned longs.
 * Check the alignment and proposed new descriptor before taking a page
   reference.
 * Use the more flexible ACCESS_ONCE() accessor in preference to
   write_atomic()

No expected change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
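The first bullet above, adding a raw 64-bit view to the descriptor type, can be illustrated with a sketch. The type name, the two-word layout (a, b) and the helper are assumptions for illustration and may not match the real seg_desc_t definition; the anonymous struct requires C11.

```c
#include <stdint.h>

/* Hypothetical descriptor type with a raw 64-bit view. */
typedef union {
    uint64_t raw;          /* the whole descriptor as one value */
    struct {
        uint32_t a, b;     /* assumed two-word layout */
    };
} seg_desc_sketch_t;

/* With .raw available, whole-descriptor checks need no pointer casts. */
static int desc_is_zero_sketch(seg_desc_sketch_t d)
{
    return d.raw == 0;
}
```

Without the raw member, the same check would need either a comparison of both words or a cast of the struct's address to a uint64_t pointer, which is the open-coded pattern the commit removes.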
6 years ago x86: Switch "struct desc_struct" to being seg_desc_t
Andrew Cooper [Thu, 6 Dec 2018 14:05:29 +0000 (14:05 +0000)]
x86: Switch "struct desc_struct" to being seg_desc_t

The struct suffix is redundant in the name, and a future change will want to
turn it into a union, rather than a structure.  As this represents a segment
descriptor, give it an appropriate typedef.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago SUPPORT.md: Turn release notes link into a proper link.
Ian Jackson [Mon, 3 Dec 2018 11:18:25 +0000 (11:18 +0000)]
SUPPORT.md: Turn release notes link into a proper link.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs/parse-support-md: Allow definition lists for features
Ian Jackson [Mon, 3 Dec 2018 12:05:41 +0000 (12:05 +0000)]
docs/parse-support-md: Allow definition lists for features

Now, as well as a `code block', with
  |    Something: some status
we tolerate a definition list which in pandoc terms looks like this
  |Term
  |: Definition

This ought not usually be used for features but it will be useful
for linking to the release notes, because markup is not allowed in
code blocks but is in definitions.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs/parse-support-md: Correct handling of Status
Ian Jackson [Mon, 3 Dec 2018 12:01:55 +0000 (12:01 +0000)]
docs/parse-support-md: Correct handling of Status

In fact this was not markdown content, but just a string.  We are
however going to make it be markdown content.  So adjust the comments,
and the consumer.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs/parse-support-md: pandoc2html_inline: print failing json
Ian Jackson [Mon, 3 Dec 2018 12:03:48 +0000 (12:03 +0000)]
docs/parse-support-md: pandoc2html_inline: print failing json

If our run of pandoc, converting pieces of markup we have in hand into
html, fails, print the json that was rejected.

No change in non-error cases.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs/parse-support-md: Break out descr2key
Ian Jackson [Mon, 3 Dec 2018 12:03:19 +0000 (12:03 +0000)]
docs/parse-support-md: Break out descr2key

We are going to want to reuse this.  No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/parse-support-md: Adjust some (commented-out) debugging
Ian Jackson [Mon, 3 Dec 2018 12:01:27 +0000 (12:01 +0000)]
docs/parse-support-md: Adjust some (commented-out) debugging

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/parse-support-md: More complete example runes
Ian Jackson [Mon, 3 Dec 2018 12:09:28 +0000 (12:09 +0000)]
docs/parse-support-md: More complete example runes

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/hvm/viridian: stop open coding updates to APIC registers
Paul Durrant [Fri, 7 Dec 2018 17:50:08 +0000 (17:50 +0000)]
x86/hvm/viridian: stop open coding updates to APIC registers

The code in viridian_synic_wrmsr() duplicates logic in vlapic_reg_write()
to update the ICR, ICR2 and TASKPRI registers. Instead of doing this,
make vlapic_reg_write() non-static and call it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Rename "offset" to "reg" for consistency with the rest of the vlapic API.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/hvm: remove duplicate vlapic_find_highest_isr() calls
Paul Durrant [Fri, 7 Dec 2018 13:13:02 +0000 (13:13 +0000)]
x86/hvm: remove duplicate vlapic_find_highest_isr() calls

When viridian APIC assist is active, the code in vlapic_has_pending_irq()
may end up re-calling vlapic_find_highest_isr() after emulating an EOI
whereas simply moving the call after the EOI emulation removes the need
for this duplication.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxen/arm32: Remove __init prefixes from funcs that are used within CPU up flow
Oleksandr Tyshchenko [Fri, 7 Dec 2018 09:45:31 +0000 (11:45 +0200)]
xen/arm32: Remove __init prefixes from funcs that are used within CPU up flow

This is a follow-up patch to
commit 01a7e8ccef6e7d5718a251ad587567afbe723330
xen/arm: Remove __initdata and __init to enable CPU hotplug

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agoxen/arm: link: Link proc_info_list in .rodata instead of .init.data
Oleksandr Tyshchenko [Fri, 7 Dec 2018 13:41:16 +0000 (15:41 +0200)]
xen/arm: link: Link proc_info_list in .rodata instead of .init.data

To be able to use it for the hot-plugged CPUs as well.

The reason why we link proc_info_list in the ".rodata" section is that
its content should never be modified.

This patch also renames the ".init.proc.info" section to ".proc.info"
as the "init" prefix is no longer accurate.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
6 years agotools/libxl: fix boot of HVM domain with Xenstore-stubdom
Juergen Gross [Tue, 4 Dec 2018 14:28:57 +0000 (15:28 +0100)]
tools/libxl: fix boot of HVM domain with Xenstore-stubdom

The Xenstore domid isn't set for HVM domains. This will result in
failure when booting a HVM domain on a system with Xenstore not running
in dom0.

Same applies for console domid, so set both.

This has been broken since commit a2d9a6fa1fcd ("tools/libxenctrl: use new
xenforeignmemory API to seed grant table").

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/xenstore: Document failure for xs_{read,directory,read_watch}
Anthony PERARD [Wed, 5 Dec 2018 16:26:02 +0000 (16:26 +0000)]
tools/xenstore: Document failure for xs_{read,directory,read_watch}

Those functions can return NULL on failure, document it in the public
header.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agons16550: enable use of PCI MSI
Jan Beulich [Thu, 6 Dec 2018 11:21:34 +0000 (12:21 +0100)]
ns16550: enable use of PCI MSI

Which, on x86, requires fiddling with the INTx bit in PCI config space,
since for internally used MSI we can't delegate this to Dom0.

ns16550_init_postirq() also needs (benign) re-ordering of its
operations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoconsole: adjust IRQ initialization
Jan Beulich [Thu, 6 Dec 2018 11:20:55 +0000 (12:20 +0100)]
console: adjust IRQ initialization

In order for a Xen internal PCI device driver to enable MSI on the
device, we need another hook which the driver can use to create the IRQ
(doing this in the init_preirq hook is too early, since IRQ code hasn't
got initialized at that time yet, and doing it in init_postirq is too
late because at least on x86 smp_intr_init() needs to know the IRQ
number).

On x86 this additionally requires a slight ordering change to IRQ
initialization, to facilitate calling the new hook between basic
initialization and the call path leading to smp_intr_init().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agomake domain_adjust_tot_pages() __must_check
Jan Beulich [Thu, 6 Dec 2018 11:19:04 +0000 (12:19 +0100)]
make domain_adjust_tot_pages() __must_check

Even if unlikely, donate_page() should not ignore the possible need to
obtain a domain reference. To make people look more closely when they
add new uses of domain_adjust_tot_pages(), force its return value to be
checked. This in turn requires a benign change to assign_pages().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
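
A minimal sketch of the idea behind this change, assuming __must_check maps
to the GCC/Clang warn_unused_result attribute. struct toy_domain,
toy_adjust_tot_pages() and toy_donate_page() are hypothetical stand-ins for
illustration, not the real Xen definitions:

```c
#include <assert.h>

/* Assumption: __must_check is spelled via the warn_unused_result attribute. */
#define must_check __attribute__((warn_unused_result))

/* Hypothetical stand-in for a domain's page accounting. */
struct toy_domain {
    unsigned long tot_pages;
};

/*
 * Adjust the page count and return the new total.  Ignoring the result
 * now draws a compiler warning, which nudges callers to think about the
 * transitions to and from zero pages.
 */
static must_check unsigned long
toy_adjust_tot_pages(struct toy_domain *d, long pages)
{
    /* Refuse to drop below zero pages in this sketch. */
    assert(pages >= 0 || d->tot_pages >= (unsigned long)-pages);
    d->tot_pages += (unsigned long)pages;
    return d->tot_pages;
}

/*
 * Returns 1 when the donated page is the domain's first one, i.e. when
 * the caller would need to obtain a domain reference.
 */
static int toy_donate_page(struct toy_domain *d)
{
    return toy_adjust_tot_pages(d, 1) == 1;
}
```

Calling toy_adjust_tot_pages(d, 1); as a bare statement would be flagged by
the compiler, which is the effect the commit is after.
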
6 years agox86: reduce code duplication in guest_remove_page()
Jan Beulich [Thu, 6 Dec 2018 11:18:03 +0000 (12:18 +0100)]
x86: reduce code duplication in guest_remove_page()

Quite a bit of duplicate code has accumulated on the "paging" types
special case path. Re-use what can be re-used from the common path.

Since it needs touching anyway, slightly re-format and extend the
gdprintk() on the common path as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoautomation: break .gitlab-yaml into smaller files
Wei Liu [Thu, 22 Nov 2018 15:49:03 +0000 (15:49 +0000)]
automation: break .gitlab-yaml into smaller files

Break out files for build jobs and test jobs. Keep the top level
.gitlab-ci.yml small.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add a qemu smoke test for clang build
Wei Liu [Thu, 22 Nov 2018 15:49:02 +0000 (15:49 +0000)]
automation: add a qemu smoke test for clang build

Also rename the old test to have -gcc suffix.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agox86/hvm: Handle x2apic MSRs via the new guest_{rd,wr}msr() infrastructure
Andrew Cooper [Mon, 26 Feb 2018 12:45:58 +0000 (12:45 +0000)]
x86/hvm: Handle x2apic MSRs via the new guest_{rd,wr}msr() infrastructure

Dispatch from the guest_{rd,wr}msr() functions.  The read side should be safe
outside of current context, but the write side is definitely not.  As the
toolstack has no legitimate reason to access the APIC registers via this
interface (not least because whether they are accessible at all depends on
guest settings), unilaterally reject access attempts outside of current
context.

Rename to guest_{rd,wr}msr_x2apic() for consistency, and alter the functions
to use X86EMUL_EXCEPTION rather than X86EMUL_UNHANDLEABLE.  The previous
callers turned UNHANDLEABLE into EXCEPTION, but using UNHANDLEABLE will now
interfere with the fallback to legacy MSR handling.

While altering guest_rdmsr_x2apic() make a couple of minor improvements.
Reformat the initialiser for readable[] so it indents in a more natural way,
and alter high to be a 64bit integer to avoid shifting 0 by 32 in the common
path.

Observant people might notice that we now don't let PV guests read the x2apic
MSRs.  They should never have been able to in the first place.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agox86: Fix APIC MSR constant names
Andrew Cooper [Wed, 7 Mar 2018 16:48:01 +0000 (16:48 +0000)]
x86: Fix APIC MSR constant names

We currently have MSR_IA32_APICBASE and MSR_IA32_APICBASE_MSR which are
synonymous from a naming point of view, but refer to very different things.

Rename the x2APIC MSRs to MSR_X2APIC_*, which are shorter constants and
visually separate the register function from the generic APIC name.  For the
case ranges, introduce MSR_X2APIC_LAST, rather than relying on the knowledge
that there are 0x3ff MSRs architecturally reserved for x2APIC functionality.

For functionality relating to the APIC_BASE MSR, use MSR_APIC_BASE for the MSR
itself, but drop the MSR prefix from the other constants to shorten the names.
In all cases, the fact that we are dealing with the APIC_BASE MSR is obvious
from the context.

No functional change (the combined binary is identical).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
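
As an illustration of the range check this renaming enables, here is a small
sketch. The base index 0x800 is the architectural x2APIC MSR base and the
span of 0x3ff comes from the message above. MSR_X2APIC_FIRST, the helper
names, and the offset mapping are assumptions made for this example rather
than code taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed name for the first x2APIC MSR; the index itself is architectural. */
#define MSR_X2APIC_FIRST 0x00000800u
/* Last index of the block reserved for x2APIC, per the commit message. */
#define MSR_X2APIC_LAST  (MSR_X2APIC_FIRST + 0x3ffu)

/* Range check without hard-coding the size of the block at each use. */
static bool is_x2apic_msr(uint32_t msr)
{
    return msr >= MSR_X2APIC_FIRST && msr <= MSR_X2APIC_LAST;
}

/*
 * Hypothetical helper: translate an x2APIC MSR index into the matching
 * xAPIC MMIO register offset (registers are 16 bytes apart).
 */
static uint32_t x2apic_msr_to_offset(uint32_t msr)
{
    assert(is_x2apic_msr(msr));
    return (msr - MSR_X2APIC_FIRST) << 4;
}
```
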
6 years agox86/cpuid: Drop the synthetic X86_FEATURE_XEN_IBPB
Andrew Cooper [Thu, 29 Nov 2018 18:16:01 +0000 (18:16 +0000)]
x86/cpuid: Drop the synthetic X86_FEATURE_XEN_IBPB

This appears to be a vestigial remnant of an old version of the
XSA-254/Spectre series, and has never been used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/spec-ctrl: Drop the bti= command line option
Andrew Cooper [Thu, 29 Nov 2018 18:17:45 +0000 (18:17 +0000)]
x86/spec-ctrl: Drop the bti= command line option

bti= was introduced with the original Spectre fixes (Jan 2018), but by the
time Speculative Store Bypass came along (May 2018), it was superseded by the
more generic spec-ctrl=.

Since then, we've had LazyFPU (June 2018) and L1TF (August 2018), which means
no one will be using the option.  Remove it entirely - anyone who happens to
accidentally be using it might now spot Xen complaining about an option it
doesn't understand.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agopci: apply workaround for Intel errata HSE43 and BDF2/BDX2
Roger Pau Monné [Tue, 4 Dec 2018 13:04:54 +0000 (14:04 +0100)]
pci: apply workaround for Intel errata HSE43 and BDF2/BDX2

These errata affect the values read from the BAR registers, and could
render vPCI (and by extension PVH Dom0) unusable.

HSE43 is a Haswell erratum where a non-BAR register is implemented at
the position where the first BAR of the device should be found in a
Power Control Unit device. Note that there are no BARs on this device,
apart from the bogus CSR register positioned on top of the first BAR.

BDF2/BDX2 is a Broadwell erratum where BARs in the Home Agent device
will return bogus non-zero values.

In both cases the solution is to treat such devices as having no BARs
in the vPCI code.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agovmx: remove stale prototypes
Juergen Gross [Tue, 4 Dec 2018 13:04:20 +0000 (14:04 +0100)]
vmx: remove stale prototypes

Some prototypes in include/asm-x86/hvm/vmx/vmx.h have no related
implementation. Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86emul: raise #GP(0) in VME mode for POPF with TF set in new value
Jan Beulich [Tue, 4 Dec 2018 13:03:43 +0000 (14:03 +0100)]
x86emul: raise #GP(0) in VME mode for POPF with TF set in new value

This is a check explicitly listed by the instruction page in the SDM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: skip VIF processing in VME mode for 16-bit POPF at IOPL 3
Jan Beulich [Tue, 4 Dec 2018 13:02:46 +0000 (14:02 +0100)]
x86emul: skip VIF processing in VME mode for 16-bit POPF at IOPL 3

At IOPL 3 CR4.VME is irrelevant.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
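
The two POPF changes above can be pictured with the following sketch of the
16-bit POPF decision in virtual-8086 mode. It follows my reading of the SDM
pseudo-code and is not the emulator's code. popf16_vm86() and enum
popf_result are hypothetical names introduced for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF  (1u << 8)
#define X86_EFLAGS_IF  (1u << 9)
#define X86_EFLAGS_VIF (1u << 19)
#define X86_EFLAGS_VIP (1u << 20)

enum popf_result { POPF_OK, POPF_GP0 };

/*
 * cur:   current EFLAGS value
 * new16: 16-bit flags image popped off the stack
 * iopl:  current I/O privilege level (0..3)
 * On POPF_OK, *vif_out holds the resulting VIF bit.
 */
static enum popf_result popf16_vm86(uint32_t cur, uint16_t new16,
                                    unsigned int iopl, bool cr4_vme,
                                    bool *vif_out)
{
    assert(iopl <= 3);
    *vif_out = cur & X86_EFLAGS_VIF;

    /* At IOPL 3 CR4.VME is irrelevant: no VIF processing at all. */
    if ( iopl == 3 )
        return POPF_OK;

    /* Without VME the instruction is IOPL-sensitive and faults. */
    if ( !cr4_vme )
        return POPF_GP0;

    /* VME: TF set in the new value, or IF being set while VIP is pending. */
    if ( (new16 & X86_EFLAGS_TF) ||
         ((new16 & X86_EFLAGS_IF) && (cur & X86_EFLAGS_VIP)) )
        return POPF_GP0;

    /* VME: the popped IF bit is reflected into VIF instead of IF. */
    *vif_out = new16 & X86_EFLAGS_IF;
    return POPF_OK;
}
```
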
6 years agotools/libxc: Fix error handling in get_cpuid_domain_info()
Andrew Cooper [Thu, 29 Nov 2018 18:17:01 +0000 (18:17 +0000)]
tools/libxc: Fix error handling in get_cpuid_domain_info()

get_cpuid_domain_info() has two conflicting return styles - either -error for
local failures, or -1/errno for hypercall failures.  Switch to consistently
use -error.

While fixing the xc_get_cpu_featureset(), take the opportunity to remove the
redundancy and move it to be adjacent to the other featureset handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
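
To illustrate the two return styles and the conversion to a single one, here
is a small sketch. toy_hypercall() and toy_get_info() are hypothetical
helpers invented for this example. They are not libxc functions:

```c
#include <assert.h>
#include <errno.h>

/* Hypercall-style helper: returns 0, or -1 with errno set. */
static int toy_hypercall(int fail_errno)
{
    assert(fail_errno >= 0);
    if ( fail_errno )
    {
        errno = fail_errno;
        return -1;
    }
    return 0;
}

/*
 * Consistent "-error" style: 0 on success, a negative errno value on any
 * failure, whether it was detected locally or reported by the hypercall.
 */
static int toy_get_info(int have_buffer, int hypercall_errno)
{
    if ( !have_buffer )
        return -ENOMEM;              /* local failure, already -error */

    if ( toy_hypercall(hypercall_errno) < 0 )
        return -errno;               /* convert -1/errno into -error */

    return 0;
}
```

With this shape a caller only ever has to test for a negative return value
and never needs to consult errno separately.
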
6 years agotools/libxc: Fix issues with libxc and Xen having different featureset lengths
Andrew Cooper [Thu, 29 Nov 2018 18:10:38 +0000 (18:10 +0000)]
tools/libxc: Fix issues with libxc and Xen having different featureset lengths

In almost all cases, Xen and libxc will agree on the featureset length,
because they are built from the same source.

However, there are circumstances (e.g. security hotfixes) where the featureset
gets longer and dom0 will, after installing updates, be running with an old
Xen but new libxc.  Despite writing the code with this scenario in mind, there
were some bugs.

First, xen-cpuid's get_featureset() erroneously allocates a buffer based on
Xen's featureset length, but records libxc's length, which may be longer.

In this situation, the hypercall bounce buffer code reads/writes the recorded
length, which is beyond the end of the allocated object, and a later free()
encounters corrupt heap metadata.  Fix this by recording the same length that
we allocate.

Secondly, get_cpuid_domain_info() has a related bug when the passed-in
featureset is a different length to libxc's.

A large amount of the libxc cpuid functionality depends on info->featureset
being as long as expected, and it is allocated appropriately.  However, in the
case that a shorter external featureset is passed in, the logic to check for
trailing nonzero bits may read off the end of it.  Rework the logic to use the
correct upper bound.

In addition, leave a comment next to the fields in struct cpuid_domain_info
explaining the relationship between the various lengths, and how to cope with
different lengths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
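
The trailing-bits check described above can be sketched as follows, under
the assumption that the internal featureset has dst_len words and the
caller's has src_len words. import_featureset() and the choice of
-EOPNOTSUPP are illustrative guesses, not the libxc implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy an external featureset of src_len words into an internal one of
 * dst_len words.  Words the destination cannot hold must be zero, and the
 * scan for them never reads past the end of the source buffer.
 */
static int import_featureset(uint32_t *dst, size_t dst_len,
                             const uint32_t *src, size_t src_len)
{
    size_t i, common = src_len < dst_len ? src_len : dst_len;

    assert(dst && (src || !src_len));

    /* Runs zero times when the source is the shorter of the two. */
    for ( i = dst_len; i < src_len; i++ )
        if ( src[i] )
            return -EOPNOTSUPP;

    /* Zero-extend a short source, truncate a long (all-zero-tail) one. */
    memset(dst, 0, dst_len * sizeof(*dst));
    if ( common )
        memcpy(dst, src, common * sizeof(*dst));

    return 0;
}
```

The important detail is the loop bound: it is derived from the length the
caller actually supplied, not from the length libxc would have expected.
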
6 years agoxl: free bitmaps on exit
Olaf Hering [Wed, 28 Nov 2018 12:24:34 +0000 (13:24 +0100)]
xl: free bitmaps on exit

Every invocation of xl via valgrind will show three leaks.
Since libxl_bitmap_alloc uses NOGC, the caller has to free the memory
after use. And since xl_ctx_free might be called before
parse_global_config, also move the libxl_bitmap_init calls into
xl_ctx_alloc.

Also move the call to atexit() after xl_ctx_alloc, because the latter is
also called again in postfork.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/shadow: don't enable shadow mode with too small a shadow allocation
Jan Beulich [Fri, 30 Nov 2018 11:10:39 +0000 (12:10 +0100)]
x86/shadow: don't enable shadow mode with too small a shadow allocation

We've had more than one report of host crashes after failed migration,
and in at least one case we've had a hint that the shadow allocation
pool had been shrunk too far. Instead of just checking the pool for being
empty, check whether the pool is smaller than what
shadow_set_allocation() would minimally bump it to if it was invoked in
the first place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
6 years agoamd/iommu: skip host bridge devices when updating IOMMU page tables
Roger Pau Monné [Fri, 30 Nov 2018 11:10:00 +0000 (12:10 +0100)]
amd/iommu: skip host bridge devices when updating IOMMU page tables

Host bridges are not behind an IOMMU, and are already special cased and
skipped in amd_iommu_add_device. Apply the same special casing when
updating page tables.

This is required or else update_paging_mode will fail and return an
error to the caller (amd_iommu_{un}map_page) which will destroy the
domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agoamd/iommu: assign iommu devices to Xen
Roger Pau Monné [Fri, 30 Nov 2018 11:09:09 +0000 (12:09 +0100)]
amd/iommu: assign iommu devices to Xen

AMD IOMMU devices are exposed on the PCI bus, and thus are assigned by
default to the hardware domain. This can cause issues because the
IOMMU devices themselves are not behind an IOMMU, so update_paging_mode will
return an error if Xen tries to expand the page tables of a domain
that has assigned devices not behind an IOMMU. update_paging_mode
failing will cause the domain to be destroyed.

Fix this by hiding PCI IOMMU devices, so they are not assigned to the
hardware domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agoamd-iommu: replace occurrences of u<N> with uint<N>_t...
Paul Durrant [Fri, 30 Nov 2018 11:08:28 +0000 (12:08 +0100)]
amd-iommu: replace occurrences of u<N> with uint<N>_t...

...for N in {8, 16, 32, 64}.

Bring the coding style up to date.

Also, while in the neighbourhood, fix some tabs and remove use of uint64_t
values where it leads to the need for explicit casting.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agons16550/PCI: fix skipping of devices
Jan Beulich [Fri, 30 Nov 2018 11:07:33 +0000 (12:07 +0100)]
ns16550/PCI: fix skipping of devices

Selecting between single/multiple BAR mode should happen after checking
whether to skip the present device, or else multi-BAR devices won't be
skipped correctly, due to port_idx getting set to zero in that case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agolibxl: Remove redundant pidpath setting
George Dunlap [Fri, 23 Nov 2018 17:14:54 +0000 (17:14 +0000)]
libxl: Remove redundant pidpath setting

This exact same line is duplicated further on without being used or
modified in between.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agotools: set Dom0 UUID if requested
Wei Liu [Mon, 26 Nov 2018 10:40:44 +0000 (10:40 +0000)]
tools: set Dom0 UUID if requested

Introduce XEN_DOM0_UUID in Xen's global configuration file.  Make
xen-init-dom0 accept an extra argument for UUID.

Also switch xs_open error message in xen-init-dom0 to use perror.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agox86: fix paging_max_paddr_bits()
Juergen Gross [Wed, 28 Nov 2018 14:51:20 +0000 (15:51 +0100)]
x86: fix paging_max_paddr_bits()

paging_max_paddr_bits() has an invalid use of IS_ENABLED(): instead of
IS_ENABLED(CONFIG_BIGMEM) it is using IS_ENABLED(BIGMEM). Fix that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
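
Why the wrong spelling went unnoticed can be shown with a simplified
re-implementation of an IS_ENABLED()-style macro. The helper macros below
are modelled on the well-known kconfig trick and are assumptions made for
this sketch, not Xen's header verbatim. The two functions are hypothetical:

```c
#include <assert.h>

/* Pretend the build was configured with CONFIG_BIGMEM=y. */
#define CONFIG_BIGMEM 1

/*
 * If the option expands to 1, PLACEHOLDER_1 inserts an extra argument
 * ("0,"), which shifts a literal 1 into the second position.  Any other
 * token leaves junk in the first argument and 0 in the second.
 */
#define PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* The correctly spelled option is seen as enabled at compile time. */
static_assert(IS_ENABLED(CONFIG_BIGMEM) == 1, "CONFIG_BIGMEM should be on");

/* Misspelled option: compiles without any diagnostic, but is always 0. */
static int bigmem_enabled_buggy(void)
{
    return IS_ENABLED(BIGMEM);
}

/* Correct spelling: follows the actual configuration. */
static int bigmem_enabled_fixed(void)
{
    return IS_ENABLED(CONFIG_BIGMEM);
}
```

Because an unknown name simply evaluates to 0, the compiler cannot flag the
typo, so mistakes of this kind have to be caught by review.
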
6 years agox86emul: correct 32-bit address handling for AVX2 gathers
Jan Beulich [Wed, 28 Nov 2018 14:50:26 +0000 (15:50 +0100)]
x86emul: correct 32-bit address handling for AVX2 gathers

As done for other cases by commit 7869e2bafe ("x86emul/fuzz: add
rudimentary limit checking"), address calculations should also use
truncate_ea() for the AVX2 gather insns.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoamd-iommu: replace occurrences of bool_t with bool
Paul Durrant [Wed, 28 Nov 2018 14:49:01 +0000 (15:49 +0100)]
amd-iommu: replace occurrences of bool_t with bool

Bring the coding style up to date. No functional change (except for
removal of some pointless initializers).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years agoxen: remove trailing spaces from public headers
Juergen Gross [Wed, 28 Nov 2018 12:32:36 +0000 (13:32 +0100)]
xen: remove trailing spaces from public headers

Several public header files have trailing spaces in them. This is
rather annoying when importing them into other projects as they might
be rejected for not complying with the coding style.

Remove the trailing spaces in all headers below xen/include/public/.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agotools/xenstore: Document that xs_close(0) is OK.
Ian Jackson [Fri, 2 Nov 2018 17:01:07 +0000 (17:01 +0000)]
tools/xenstore: Document that xs_close(0) is OK.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/libvchan: Initialise xs_transaction_t to XBT_NULL, not NULL
Ian Jackson [Fri, 2 Nov 2018 17:01:06 +0000 (17:01 +0000)]
tools/libvchan: Initialise xs_transaction_t to XBT_NULL, not NULL

This is an integer type, not a pointer.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agoarm/irq: Fix block parentheses and whitespaces
Andrii Anisov [Fri, 16 Nov 2018 16:24:18 +0000 (18:24 +0200)]
arm/irq: Fix block parentheses and whitespaces

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoarm/irq: replace an odd tab with spaces
Andrii Anisov [Fri, 16 Nov 2018 16:24:17 +0000 (18:24 +0200)]
arm/irq: replace an odd tab with spaces

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agomm: make opt_bootscrub non-init
Roger Pau Monne [Mon, 26 Nov 2018 17:55:48 +0000 (18:55 +0100)]
mm: make opt_bootscrub non-init

LLVM code generation can attempt to load from a variable in the next
condition of an expression under certain circumstances, thus turning
the following condition:

if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )

Into:

0xffff82d080223967 <+103>: cmpl   $0x3,0x37b032(%rip) # 0xffff82d08059e9a0 <system_state>
0xffff82d08022396e <+110>: setb   -0x29(%rbp)
0xffff82d080223972 <+114>: cmpl   $0x2,0x228a8b(%rip) # 0xffff82d08044c404 <opt_bootscrub>

Such code will trigger a page fault if system_state >=
SYS_STATE_active because opt_bootscrub will be unmapped.

Fix this by making opt_bootscrub non-init, thus preventing the page
fault. The LLVM bug with the discussion about this issue can be found
at:

https://bugs.llvm.org/show_bug.cgi?id=39707

I haven't been able to find any other instances of such a conditional
expression that uses system_state together with an init variable or
function.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/libs: xenforeignmemory_unmap_resource() should be idempotent...
Paul Durrant [Tue, 27 Nov 2018 16:39:17 +0000 (16:39 +0000)]
tools/libs: xenforeignmemory_unmap_resource() should be idempotent...

...and is not because linux osdep_xenforeignmemory_unmap_resource() is not.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agoxen/tools: Fix gen-cpuid.py's ability to report errors
Andrew Cooper [Mon, 26 Nov 2018 12:03:07 +0000 (12:03 +0000)]
xen/tools: Fix gen-cpuid.py's ability to report errors

c/s 18596903 "xen/tools: support Python 2 and Python 3" unfortunately
introduced a TypeError when changing how Fail exceptions were printed:

  /local/xen.git/xen/../xen/tools/gen-cpuid.py:Traceback (most recent call last):
    File "/local/xen.git/xen/../xen/tools/gen-cpuid.py", line 483, in <module>
        sys.stderr.write(e)
  TypeError: expected a character buffer object

Coerce e to a string before printing.  While changing this, fold the three
write() calls making up the line into a single one, and take the opportunity
to neaten the output.

A sample error is:

  /local/xen.git/xen/tools/gen-cpuid.py: Fail: Aliased value between FOO and BAR

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: fix assertion failure
Paul Durrant [Mon, 26 Nov 2018 16:54:24 +0000 (17:54 +0100)]
viridian: fix assertion failure

Whilst attempting to crash an apparently wedged Windows domain using
'xen-hvmcrash' I managed to trigger the following ASSERT:

(XEN) Assertion '!vp->ptr' failed at viridian.c:607

with stack:

(XEN)    [<ffff82d08032c55d>] viridian_map_guest_page+0x1b4/0x1b6
(XEN)    [<ffff82d08032b1db>] viridian_synic_load_vcpu_ctxt+0x39/0x3b
(XEN)    [<ffff82d08032b90d>] viridian.c#viridian_load_vcpu_ctxt+0x93/0xcc
(XEN)    [<ffff82d0803096d6>] hvm_load+0x10e/0x19e
(XEN)    [<ffff82d080274c6d>] arch_do_domctl+0xb74/0x25b4
(XEN)    [<ffff82d0802068ab>] do_domctl+0x16f7/0x19d8

This happened because viridian_map_guest_page() was not written to cope
with being called multiple times, but this is unfortunately exactly what
happens when xen-hvmcrash re-loads the domain context (having clobbered
the values of RIP).

This patch simply makes viridian_map_guest_page() return immediately if it
finds the page already mapped (i.e. vp->ptr != NULL).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
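
The early-return approach described above can be sketched like this.
struct toy_page, toy_map_guest_page() and the use of malloc() as a stand-in
for the real mapping operation are all assumptions made for illustration.
Only the vp->ptr check mirrors the commit message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-page bookkeeping. */
struct toy_page {
    void *ptr;                 /* non-NULL while the page is mapped */
};

/* Counts how many real mapping operations were performed. */
static unsigned int toy_map_calls;

/* Safe to call repeatedly: an existing mapping is left untouched. */
static void toy_map_guest_page(struct toy_page *vp)
{
    if ( vp->ptr )             /* already mapped, nothing to do */
        return;

    vp->ptr = malloc(4096);    /* stand-in for mapping the guest page */
    assert(vp->ptr);
    toy_map_calls++;
}

static void toy_unmap_guest_page(struct toy_page *vp)
{
    free(vp->ptr);
    vp->ptr = NULL;
}
```

Loading the same context twice then maps the page only once, instead of
tripping an assertion on the second call.
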
6 years agox86emul: suppress default test harness build with incapable assembler
Jan Beulich [Mon, 26 Nov 2018 16:53:51 +0000 (17:53 +0100)]
x86emul: suppress default test harness build with incapable assembler

A top level "make build", as used e.g. by osstest, wants to build all
"all" targets in enabled tools subdirectories, which by default also
includes the emulator test harness. The use of, in particular, {evex}
insn pseudo-prefixes in, again in particular, test_x86_emulator.c causes
this build to fail though when the assembler is not new enough. Take
another big hammer and suppress the default harness build altogether
also when this and other pseudo-prefixes are not supported by the
specified (or defaulted to) assembler.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>