xenbits.xensource.com Git - people/royger/xen.git/log
6 years agoguest/pvh: special case the low 1MB fix_memory_scrub_v1 gitlab/fix_memory_scrub_v1
Roger Pau Monne [Fri, 9 Nov 2018 17:09:20 +0000 (18:09 +0100)]
guest/pvh: special case the low 1MB

When running as a PVH guest Xen only special cases the trampoline
code in the low 1MB, without also reserving the space used by the
relocated metadata or the trampoline stack.

Fix this by always reserving the low 1MB regardless of whether Xen is
running as a guest or natively.

Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
6 years agoguest/pvh: fix handling of multiboot info and module list
Roger Pau Monne [Fri, 9 Nov 2018 17:09:19 +0000 (18:09 +0100)]
guest/pvh: fix handling of multiboot info and module list

When booting Xen as a PVH guest the data in the PVH start info
structure is copied over to a multiboot structure and a module list
array that resides in the .init section of the Xen image. The
resulting multiboot structures are then handed to the generic boot
process using their physical address.

This works fine as long as the Xen image doesn't relocate itself. If
such a relocation happens, the physical addresses of the multiboot
structure and the module array are no longer valid.

Fix this by handing the virtual addresses of the multiboot structure
and module array to the generic boot process instead of their physical
addresses.

Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
6 years agox86/hvm: clean up the rest of bool_t from vm_event
Alexandru Isaila [Fri, 9 Nov 2018 12:06:28 +0000 (13:06 +0100)]
x86/hvm: clean up the rest of bool_t from vm_event

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agopass-through: adjust pIRQ migration
Jan Beulich [Fri, 9 Nov 2018 12:05:28 +0000 (13:05 +0100)]
pass-through: adjust pIRQ migration

For one it is quite pointless to iterate over all pIRQ-s the domain has
when just one is being adjusted. Introduce hvm_migrate_pirq() as an
externally accessible function.

Additionally it is bogus to migrate the pIRQ to a vCPU different from
the one the event is supposed to be posted to - if anything, it might be
worth considering not to migrate the pIRQ at all in the posting case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/dom0: Use init_xen_pae_l2_slots() rather than opencoding it
Andrew Cooper [Thu, 8 Nov 2018 14:17:46 +0000 (14:17 +0000)]
x86/dom0: Use init_xen_pae_l2_slots() rather than opencoding it

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/grant_table: Remove stale comment on top of map_grant_ref
Julien Grall [Thu, 1 Nov 2018 10:16:58 +0000 (10:16 +0000)]
xen/grant_table: Remove stale comment on top of map_grant_ref

Remove the two-part comment on top of map_grant_ref:
    - The first part mentions the return value, which has been void since
    2006!
    - The second part mentions a local variable 'addr' which does not
    exist anymore.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxen/arm: initialize access
Stefano Stabellini [Tue, 6 Nov 2018 22:05:57 +0000 (14:05 -0800)]
xen/arm: initialize access

Initialize variable *access before returning it to the caller.
It makes the code a bit nicer and it is a safety certification
requirement.

M3CM Rule-9.1: The value of an object with automatic storage duration
shall not be read before it has been set
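
As a minimal illustration of the rule (not the actual Xen change; the
type and function names below are hypothetical), an out-parameter can be
given a defined value up front, so that no return path hands back an
indeterminate object:

```c
#include <stdbool.h>

/* Hypothetical access type, standing in for the real one. */
typedef enum { ACCESS_NONE = 0, ACCESS_R, ACCESS_RW } access_sketch_t;

/*
 * Look up the access type for a frame.  *access is written before any
 * early return, so the caller never reads an indeterminate value, even
 * on the failure path.
 */
bool lookup_access_sketch(unsigned long gfn, access_sketch_t *access)
{
    *access = ACCESS_NONE;          /* defined default */

    if ( gfn >= 16 )                /* hypothetical "not mapped" case */
        return false;

    *access = (gfn & 1) ? ACCESS_RW : ACCESS_R;
    return true;
}
```

The cost is a single store, and it keeps static analysers from having
to prove that every caller checks the return value first.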

QAVerify: 2962
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: rcojocaru@bitdefender.com
CC: Tamas K Lengyel <tamas@tklengyel.com>
6 years agoxen/arm: initialize target
Stefano Stabellini [Tue, 6 Nov 2018 22:05:56 +0000 (14:05 -0800)]
xen/arm: initialize target

Initialize variable target before passing it as a parameter.
It makes the code a bit nicer and it is a safety certification
requirement.

M3CM Rule-9.1: The value of an object with automatic storage duration
shall not be read before it has been set

QAVerify: 2972
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agox86/traps: use only one stub function for l/cstar
Wei Liu [Fri, 9 Nov 2018 10:46:36 +0000 (10:46 +0000)]
x86/traps: use only one stub function for l/cstar

And place it into .text.cold.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agocpufreq: convert to a single post-init driver (hooks) instance
Jan Beulich [Fri, 9 Nov 2018 10:42:10 +0000 (11:42 +0100)]
cpufreq: convert to a single post-init driver (hooks) instance

This reduces the post-init memory footprint, eliminates a pointless
level of indirection at the use sites, and allows for subsequent
alternatives call patching.

Take the opportunity and also add a name to the PowerNow! instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxsm: remove printing from set_to_dummy_if_null()
Xin Li [Fri, 9 Nov 2018 10:41:30 +0000 (11:41 +0100)]
xsm: remove printing from set_to_dummy_if_null()

Filling a NULL hook in the xsm_operations structure with the dummy
module's hook generates a debug message. This becomes boot time spew for
modules like silo, which only set a few hooks of their own. So remove the
printing to avoid boot time spew.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Xin Li <xin.li@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
6 years agoviridian: introduce struct viridian_page
Paul Durrant [Fri, 9 Nov 2018 10:40:12 +0000 (11:40 +0100)]
viridian: introduce struct viridian_page

The 'vp_assist' page is currently an example of a guest page which needs to
be kept mapped throughout the life-time of a guest, but there are other
such examples in the specification [1]. This patch therefore introduces a
generic 'viridian_page' type and converts the current vp_assist/apic_assist
related code to use it. Subsequent patches implementing other enlightenments
can then also make use of it.

This patch also renames the 'vp_assist_pending' field in struct
hvm_viridian_vcpu_context to 'apic_assist_pending' to more accurately
reflect its meaning. The term 'vp_assist' applies to the whole page rather
than just the EOI-avoidance enlightenment. New versions of the specification
have defined data structures for other enlightenments within the same page.

No functional change.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agoviridian: define type for the 'virtual VP assist page'
Paul Durrant [Fri, 9 Nov 2018 10:39:27 +0000 (11:39 +0100)]
viridian: define type for the 'virtual VP assist page'

The specification [1] defines a type so we should use it, rather than just
OR-ing and AND-ing magic bits.
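
To illustrate the difference, here is a sketch that assumes the MSR
layout is an enable bit at bit 0, 11 reserved bits, and the page frame
number in the remaining bits. The names are hypothetical, and the
bit-field ordering relies on the usual GCC/Clang behaviour on
little-endian targets, which the C standard does not guarantee:

```c
#include <stdbool.h>
#include <stdint.h>

/* Typed view of the MSR value (assumed layout, see above). */
union vp_assist_msr_sketch {
    uint64_t raw;
    struct {
        uint64_t enabled:1;
        uint64_t reserved:11;
        uint64_t pfn:52;
    } fields;
};

/* The "magic bits" style that the typed view replaces. */
bool vp_assist_enabled_magic(uint64_t raw)
{
    return raw & 1;
}

uint64_t vp_assist_pfn_magic(uint64_t raw)
{
    return raw >> 12;
}
```

With the union, a use site reads msr.fields.enabled or msr.fields.pfn
and the shifts and masks live in exactly one place.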

No functional change.

NOTE: The type defined in the specification does include an anonymous
      sub-struct in the page type but, as we currently use only the first
      element, the struct declaration has been omitted.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoviridian: separate time related enlightenment implementations...
Paul Durrant [Fri, 9 Nov 2018 10:38:03 +0000 (11:38 +0100)]
viridian: separate time related enlightenment implementations...

...into new 'time' module.

This patch reduces the size of the main viridian source module by
moving time related enlightenments into their own source module. This is
done in anticipation of implementation of more such enlightenments and
a desire to not further lengthen the main source module when this work
is done.

While moving the code:

- Move the declaration of HV_REFERENCE_TSC_PAGE from the header file into
  the new source module, since it is only used there.
- Clean up a bool_t.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoviridian: separate interrupt related enlightenment implementations...
Paul Durrant [Fri, 9 Nov 2018 10:36:52 +0000 (11:36 +0100)]
viridian: separate interrupt related enlightenment implementations...

...into new 'synic' module.

The SynIC (synthetic interrupt controller) is specified [1] to be a
superset of a virtualized LAPIC, and its definition encompasses all
enlightenments related to virtual interrupt control.

This patch reduces the size of the main viridian source module by giving
these enlightenments their own module. This is done in anticipation of
implementation of more such enlightenments and a desire not to further
lengthen the main source module when this work is done.

Whilst moving the code:

- Fix various style issues.
- Move the MSR definitions into the header (since they are now needed in
  more than one source module).

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoautomation: build with Ubuntu 18.04
Wei Liu [Mon, 22 Oct 2018 15:18:51 +0000 (16:18 +0100)]
automation: build with Ubuntu 18.04

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoautomation: add dockerfile for Ubuntu 18.04
Wei Liu [Mon, 22 Oct 2018 15:18:50 +0000 (16:18 +0100)]
automation: add dockerfile for Ubuntu 18.04

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoRevert "arch/x86: Add registers to vm_event"
Wei Liu [Thu, 8 Nov 2018 17:22:35 +0000 (17:22 +0000)]
Revert "arch/x86: Add registers to vm_event"

This reverts commit da61a2102ff9f2430cad14277009a4cae05ac779, because
it breaks !CONFIG_HVM builds.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years agoautomation: build some customised configs
Wei Liu [Fri, 2 Nov 2018 17:49:47 +0000 (17:49 +0000)]
automation: build some customised configs

Introduce a new directory to put in configs we care about. Modify
build script to build with those configs.

While we only introduce x86 configs initially, make provision for
non-x86 configs.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agox86: expose CONFIG_PV
Wei Liu [Thu, 4 Oct 2018 09:15:08 +0000 (10:15 +0100)]
x86: expose CONFIG_PV

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: make PV hypercall entry points work with !CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 13:44:01 +0000 (13:44 +0000)]
x86: make PV hypercall entry points work with !CONFIG_PV

We want Xen to crash if we hit these paths when PV is disabled.

For syscall, we provide stubs for {l,c}star_enter which end up calling
panic.  For sysenter, we initialise CS to 0 so that #GP can be raised.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/amd: don't set pv_post_outb_hook when !CONFIG_PV
Wei Liu [Thu, 8 Nov 2018 14:52:03 +0000 (14:52 +0000)]
x86/amd: don't set pv_post_outb_hook when !CONFIG_PV

Obviously it won't exist when PV is disabled.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoamd/pvh: enable ACPI C1E disable quirk on PVH Dom0
Roger Pau Monne [Thu, 8 Nov 2018 14:23:58 +0000 (15:23 +0100)]
amd/pvh: enable ACPI C1E disable quirk on PVH Dom0

PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
also enable C1E mode. Apply the same workaround as done on PV for a
PVH Dom0, which consists of trapping accesses to the SMI command IO
port and disabling C1E if ACPI is enabled.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoarch/x86: Add registers to vm_event
Alexandru Stefan ISAILA [Mon, 5 Nov 2018 09:54:06 +0000 (09:54 +0000)]
arch/x86: Add registers to vm_event

This patch adds a couple of registers to the vm_event that are used by
introspection. The base, limit and ar bits are compressed into a
uint64_t union so as not to enlarge the vm_event.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
6 years agox86/genapic: remove indirection from genapic hook accesses
Jan Beulich [Thu, 8 Nov 2018 14:59:14 +0000 (15:59 +0100)]
x86/genapic: remove indirection from genapic hook accesses

Instead of loading a pointer at each use site, have a single runtime
instance of struct genapic, copying into it from the individual
instances. The individual instances can this way also be moved to .init
(also adjust apic_probe[] at this occasion).
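
The pattern can be sketched as follows. This is a generic illustration
with hypothetical names, not the actual genapic code: per-driver
templates are copied by value into one runtime instance, so call sites
no longer load a pointer first.

```c
/* Hypothetical hook table with a single hook. */
struct apic_hooks_sketch {
    const char *name;
    unsigned int (*cpu_to_id)(unsigned int cpu);
};

unsigned int flat_cpu_to_id(unsigned int cpu) { return 1u << cpu; }
unsigned int phys_cpu_to_id(unsigned int cpu) { return cpu; }

/* Per-driver templates; in a hypervisor these could live in .init. */
const struct apic_hooks_sketch flat_template = { "flat", flat_cpu_to_id };
const struct apic_hooks_sketch phys_template = { "phys", phys_cpu_to_id };

/* The single runtime instance that all use sites refer to. */
struct apic_hooks_sketch active_hooks_sketch;

/* Probe-time selection: copy the chosen template by value. */
void select_hooks_sketch(const struct apic_hooks_sketch *tmpl)
{
    active_hooks_sketch = *tmpl;
}

/* Use site: one indirect call, no pointer load beforehand. */
unsigned int cpu_to_id_sketch(unsigned int cpu)
{
    return active_hooks_sketch.cpu_to_id(cpu);
}
```

Because the templates are only read during selection, nothing refers to
them afterwards, which is what makes discarding them after init
possible.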

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agotools/misc: fix hard tabs in xen-hvmctx.c
Paul Durrant [Wed, 7 Nov 2018 10:52:22 +0000 (10:52 +0000)]
tools/misc: fix hard tabs in xen-hvmctx.c

Also add emacs boilerplate to avoid future problems.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/xen-cpuid: Fix 32bit build
Andrew Cooper [Wed, 7 Nov 2018 12:51:43 +0000 (12:51 +0000)]
tools/xen-cpuid: Fix 32bit build

Clang reports:

  xen-cpuid.c:307:29: error: format specifies type 'unsigned long' but the
  argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]

                 msrs[l].idx, msrs[l].val);
                              ^~~~~~~~~~~

Use PRIx64 instead.
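
A minimal sketch of the portable form (the helper name is hypothetical,
and this is not the actual xen-cpuid code): the <inttypes.h> macros
expand to the right length modifier for uint32_t and uint64_t on both
32bit and 64bit builds.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format one MSR entry as "index -> value" without assuming that
 * uint64_t is 'unsigned long'. */
int format_msr_sketch(char *buf, size_t len, uint32_t idx, uint64_t val)
{
    return snprintf(buf, len, "%08" PRIx32 " -> %016" PRIx64, idx, val);
}
```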

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agop2m: move p2m-common.h inclusion point
Jan Beulich [Wed, 7 Nov 2018 08:35:14 +0000 (09:35 +0100)]
p2m: move p2m-common.h inclusion point

The header is (hence its name) supposed to be a helper for the per-arch
p2m.h files. It was never supposed to be included directly, and for the
purpose of putting common function declarations into the common header
it is more helpful if things like p2m_t are already available at the
inclusion point.

This also undoes parts of 02ede7dc03 ("memory: add
check_get_page_from_gfn() as a wrapper..."), which had been there just
because of the unhelpful original way of including p2m-common.h.

Take the opportunity and also ditch a duplicate public/memory.h from the
ARM header.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agomm/page_alloc: make bootscrub happen in idle-loop
Sergey Dyasli [Wed, 7 Nov 2018 08:34:17 +0000 (09:34 +0100)]
mm/page_alloc: make bootscrub happen in idle-loop

Scrubbing RAM during boot may take a long time on machines with lots
of RAM. Add 'idle' option to bootscrub which marks all pages dirty
initially so they will eventually be scrubbed in idle-loop on every
online CPU.

It's guaranteed that the allocator will return scrubbed pages by doing
eager scrubbing during allocation (unless MEMF_no_scrub was provided).

Use the new 'idle' option as the default one.
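
The interaction between lazy and eager scrubbing can be modelled with a
toy allocator. This is only a sketch under simplifying assumptions
(fixed array of tiny pages, no locking, hypothetical names), not the
page_alloc implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES_SKETCH  4
#define PAGE_SIZE_SKETCH 16   /* tiny pages keep the toy model small */

struct page_sketch {
    bool free;
    bool need_scrub;
    unsigned char data[PAGE_SIZE_SKETCH];
};

struct page_sketch heap_sketch[NR_PAGES_SKETCH];

/* "bootscrub=idle": mark every page dirty at boot, scrub nothing yet. */
void boot_mark_all_dirty(void)
{
    for ( size_t i = 0; i < NR_PAGES_SKETCH; i++ )
    {
        heap_sketch[i].free = true;
        heap_sketch[i].need_scrub = true;
        memset(heap_sketch[i].data, 0xAA, PAGE_SIZE_SKETCH); /* stale data */
    }
}

void scrub_page_sketch(struct page_sketch *pg)
{
    memset(pg->data, 0, PAGE_SIZE_SKETCH);
    pg->need_scrub = false;
}

/* One idle-loop step: scrub one dirty free page.  False when none is left. */
bool idle_scrub_one(void)
{
    for ( size_t i = 0; i < NR_PAGES_SKETCH; i++ )
        if ( heap_sketch[i].free && heap_sketch[i].need_scrub )
        {
            scrub_page_sketch(&heap_sketch[i]);
            return true;
        }

    return false;
}

/* Allocation scrubs eagerly unless the caller opted out. */
struct page_sketch *alloc_page_sketch(bool no_scrub)
{
    for ( size_t i = 0; i < NR_PAGES_SKETCH; i++ )
    {
        struct page_sketch *pg = &heap_sketch[i];

        if ( !pg->free )
            continue;

        pg->free = false;
        if ( pg->need_scrub && !no_scrub )
            scrub_page_sketch(pg);

        return pg;
    }

    return NULL;
}
```

A caller that allocates before the idle loop has caught up still gets a
clean page; only a caller passing no_scrub can observe stale contents.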

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86: work around HLE host lockup erratum
Jan Beulich [Wed, 7 Nov 2018 08:33:24 +0000 (09:33 +0100)]
x86: work around HLE host lockup erratum

XACQUIRE prefixed accesses to the 4MB range of memory starting at 1GB
are liable to lock up the processor. Disallow use of this memory range.

Unfortunately the available Core Gen7 and Gen8 spec updates are pretty
old, so I can only guess that they're similarly affected when Core Gen6
is and the Xeon counterparts are, too.

This is part of XSA-282.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: extend get_platform_badpages() interface
Jan Beulich [Wed, 7 Nov 2018 08:32:08 +0000 (09:32 +0100)]
x86: extend get_platform_badpages() interface

Use a structure so along with an address (now frame number) an order can
also be specified.
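
As a sketch of what a frame number plus order entry expresses (the
struct and helper names are hypothetical, and 4KiB pages are assumed),
a 4MiB range starting at 1GiB becomes a single entry of order 10:

```c
#include <stdbool.h>

#define PAGE_SHIFT_SKETCH 12    /* assumes 4KiB pages */

/* Hypothetical entry: first frame number plus log2 of the frame count. */
struct bad_page_sketch {
    unsigned long mfn;
    unsigned int order;
};

/* 1GiB >> 12 = 0x40000; 4MiB / 4KiB = 1024 frames = 2^10. */
const struct bad_page_sketch hle_bad_range_sketch = {
    .mfn   = 0x40000000UL >> PAGE_SHIFT_SKETCH,
    .order = 10,
};

/* True if mfn falls inside [bp->mfn, bp->mfn + 2^order). */
bool mfn_is_bad_sketch(const struct bad_page_sketch *bp, unsigned long mfn)
{
    return mfn >= bp->mfn && mfn - bp->mfn < (1UL << bp->order);
}
```

Without the order field the same range would need 1024 separate
single-frame entries.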

This is part of XSA-282.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/domctl: Implement XEN_DOMCTL_get_cpu_policy
Sergey Dyasli [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
x86/domctl: Implement XEN_DOMCTL_get_cpu_policy

This finally (after literally years of work!) marks the point where the
toolstack can ask the hypervisor for the current CPUID configuration of a
specific domain.

Introduce a new flask access vector and update the default policies.

Also extend xen-cpuid's --policy mode to be able to take a domid and dump a
specific domain's CPUID and MSR policy.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/sysctl: Implement XEN_SYSCTL_get_cpu_policy
Sergey Dyasli [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
x86/sysctl: Implement XEN_SYSCTL_get_cpu_policy

Provide a SYSCTL for the toolstack to obtain complete system CPUID and MSR
policy information.

For the flask side of things, this subop is closely related to
{phys,cputopo,numa}info, so shares the physinfo access vector.

Extend the xen-cpuid utility to be able to dump the system policies.  An
example output is:

  Xen reports there are maximum 113 leaves and 3 MSRs
  Raw policy: 93 leaves, 3 MSRs
   CPUID:
    leaf     subleaf  -> eax      ebx      ecx      edx
    00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
    00000001:ffffffff -> 000306c3:00100800:7ffafbff:bfebfbff
    00000002:ffffffff -> 76036301:00f0b5ff:00000000:00c10000
    00000004:00000000 -> 1c004121:01c0003f:0000003f:00000000
    00000004:00000001 -> 1c004122:01c0003f:0000003f:00000000
    00000004:00000002 -> 1c004143:01c0003f:000001ff:00000000
    00000004:00000003 -> 1c03c163:03c0003f:00001fff:00000006
    00000005:ffffffff -> 00000040:00000040:00000003:00042120
    00000006:ffffffff -> 00000077:00000002:00000009:00000000
    00000007:00000000 -> 00000000:000027ab:00000000:9c000000
    0000000a:ffffffff -> 07300403:00000000:00000000:00000603
    0000000b:00000000 -> 00000001:00000002:00000100:00000000
    0000000b:00000001 -> 00000004:00000008:00000201:00000000
    0000000d:00000000 -> 00000007:00000340:00000340:00000000
    0000000d:00000001 -> 00000001:00000000:00000000:00000000
    0000000d:00000002 -> 00000100:00000240:00000000:00000000
    80000000:ffffffff -> 80000008:00000000:00000000:00000000
    80000001:ffffffff -> 00000000:00000000:00000021:2c100800
    80000002:ffffffff -> 65746e49:2952286c:6f655820:2952286e
    80000003:ffffffff -> 55504320:2d334520:30343231:20337620
    80000004:ffffffff -> 2e332040:48473034:0000007a:00000000
    80000006:ffffffff -> 00000000:00000000:01006040:00000000
    80000007:ffffffff -> 00000000:00000000:00000000:00000100
    80000008:ffffffff -> 00003027:00000000:00000000:00000000
   MSRs:
    index    -> value
    000000ce -> 0000000080000000
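
As an aid to reading the dump, leaf 0 carries the vendor string in ebx,
edx, ecx (in that order), four ASCII bytes per register with the least
significant byte first. A small decoding sketch, with hypothetical
helper names:

```c
#include <stdint.h>

/* Copy a register's four bytes, least significant byte first. */
static void put_reg_sketch(char *dst, uint32_t reg)
{
    for ( unsigned int i = 0; i < 4; i++ )
        dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Build the 12 character vendor string from CPUID leaf 0. */
void cpuid_vendor_sketch(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    put_reg_sketch(out, ebx);
    put_reg_sketch(out + 4, edx);
    put_reg_sketch(out + 8, ecx);
    out[12] = '\0';
}
```

Feeding it the leaf 0 row above (ebx 756e6547, ecx 6c65746e, edx
49656e69) yields "GenuineIntel".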

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: Introduce struct cpu_policy to refer to a group of individual policies
Andrew Cooper [Mon, 2 Jul 2018 16:05:33 +0000 (16:05 +0000)]
x86: Introduce struct cpu_policy to refer to a group of individual policies

This is prep work for the following patch - please refer to it as well.

When auditing and manipulating policies, it is necessary to do so with a
complete set of policies, due to the interdependences of the contents.  A
containing structure like this will allow for clearer APIs and code.

As a first user, this structure is convenient for the mapping used by
XEN_SYSCTL_get_cpu_policy (implemented in the next patch), and for auditing
(later when XEN_DOMCTL_set_cpu_policy is implemented).

At this point, the distinction between *_max and *_default is introduced into
the ABI.  For now, *_default is mapped to *_max, but future development work
will result in *_default being a logical subset of *_max.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agolibx86: Introduce a helper to serialise msr_policy objects
Roger Pau Monné [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise msr_policy objects

As with CPUID, an architectural form is used for representing the MSR data.
It is expected not to change moving forwards, but does have a 32 bit field
(currently reserved) which can be used compatibly if needs be.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agolibx86: Introduce a helper to serialise cpuid_policy objects
Andrew Cooper [Thu, 21 Jun 2018 14:35:49 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise cpuid_policy objects

The serialised form is made up of the leaf, subleaf and data tuple.  As this
is the architectural form, it is expected not to change going forwards.

The serialisation of the Xen/Viridian leaves isn't fully implemented yet.  It
is just enough to be bug-compatible with the current DOMCTL_set_cpuid
behaviour, but needs further hypervisor work before the toolstack can sensibly
control these values.

x86_cpuid_copy_to_buffer() is implemented using Xen's regular copy_to_guest
primitives, with an API-compatible memcpy() used for the libxc half of the
build.
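
A minimal sketch of serialising into a bounded buffer of leaf, subleaf
and data tuples follows. The struct layout, the names and the -ENOBUFS
return value are assumptions made for illustration, and the plain
memcpy() stands in for whichever copy primitive the build uses:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed tuple layout: leaf, subleaf, then the four data registers. */
struct cpuid_leaf_sketch {
    uint32_t leaf, subleaf;
    uint32_t a, b, c, d;
};

/*
 * Append one entry to a caller-provided array of nr_max entries,
 * tracking how many are in use.  Returns 0 on success or -ENOBUFS if
 * the array is already full.
 */
int append_leaf_sketch(struct cpuid_leaf_sketch *buf, uint32_t nr_max,
                       uint32_t *nr_used,
                       const struct cpuid_leaf_sketch *ent)
{
    if ( *nr_used >= nr_max )
        return -ENOBUFS;

    memcpy(&buf[*nr_used], ent, sizeof(*ent));
    ++*nr_used;

    return 0;
}
```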

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agotools/dm_depriv: Add first cut RLIMITs
George Dunlap [Tue, 6 Nov 2018 15:41:25 +0000 (15:41 +0000)]
tools/dm_depriv: Add first cut RLIMITs

Limit the ability of a potentially compromised QEMU to consume system
resources.  Key limits:
 - RLIMIT_FSIZE (file size): 256KiB
 - RLIMIT_NPROC (after uid changes to a unique uid)

Probably unnecessary limits but why not:
 - RLIMIT_CORE: 0
 - RLIMIT_MSGQUEUE: 0
 - RLIMIT_LOCKS: 0
 - RLIMIT_MEMLOCK: 0

NB that we do not yet set RLIMIT_AS (total virtual memory) or
RLIMIT_NOFILE (number of open files), since these require more care
and/or more coordination with QEMU to implement.
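
The limits listed above can be expressed as a small table applied with
setrlimit(). This is a sketch with hypothetical names rather than the
libxl code. RLIMIT_NPROC is left out because its value is not stated
here, and the non-POSIX resources are guarded in case a platform lacks
them:

```c
#include <stddef.h>
#include <sys/resource.h>

struct rlimit_entry_sketch {
    int resource;
    rlim_t limit;
};

const struct rlimit_entry_sketch depriv_limits_sketch[] = {
    { RLIMIT_FSIZE,    256 * 1024 },   /* 256KiB maximum file size */
    { RLIMIT_CORE,     0 },
#ifdef RLIMIT_MSGQUEUE
    { RLIMIT_MSGQUEUE, 0 },
#endif
#ifdef RLIMIT_LOCKS
    { RLIMIT_LOCKS,    0 },
#endif
#ifdef RLIMIT_MEMLOCK
    { RLIMIT_MEMLOCK,  0 },
#endif
};

#define NR_DEPRIV_LIMITS_SKETCH \
    (sizeof(depriv_limits_sketch) / sizeof(depriv_limits_sketch[0]))

/* Set soft and hard limit to the same value so it cannot be raised
 * again by an unprivileged process.  Returns 0 or -1 (errno is set). */
int apply_depriv_limits_sketch(void)
{
    for ( size_t i = 0; i < NR_DEPRIV_LIMITS_SKETCH; i++ )
    {
        struct rlimit rl = {
            .rlim_cur = depriv_limits_sketch[i].limit,
            .rlim_max = depriv_limits_sketch[i].limit,
        };

        if ( setrlimit(depriv_limits_sketch[i].resource, &rl) )
            return -1;
    }

    return 0;
}
```

The function is meant to run in the child just before exec(), since
the limits are irreversible for the calling process.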

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Put global headers before local headers (sugg by Paul)
- Move #undef inside the braces (sugg by Paul)

Changes since v3:
- Align RLIMIT_ENTRY list for easier reading
- Fix wrong format string specifier
- Get rid of some trailing whitespace

Changes since v2:
- Use a macro to define rlimit entries
- Use RLIMIT_NLIMITS as an end-of-list marker, rather than -1
- Various style clean-ups

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agotools/dm_restrict: Unshare mount and IPC namespaces on Linux
George Dunlap [Tue, 6 Nov 2018 15:41:24 +0000 (15:41 +0000)]
tools/dm_restrict: Unshare mount and IPC namespaces on Linux

QEMU running under Xen doesn't need mount or IPC functionality.
Create and enter separate namespaces for each of these before
executing QEMU, so that in the event that other restrictions fail, the
process won't be able to even name system mount points or existing
non-file-based IPC descriptors to attempt to attack them.

Unsharing is something a process can only do to itself (it would
seem); so add an os-specific "dm_preexec_restrict()" hook just before
we exec() the device model.
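
On Linux the core of such a hook is a single unshare(2) call. The
sketch below uses a hypothetical function name and simplified error
handling. The call needs sufficient privilege (CAP_SYS_ADMIN) and
affects only the calling process:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Detach the calling process from the mount and IPC namespaces it
 * inherited.  Intended to run in the child right before exec().
 * Returns 0 on success, or the errno value from unshare() on failure.
 */
int dm_preexec_restrict_sketch(void)
{
    if ( unshare(CLONE_NEWNS | CLONE_NEWIPC) )
        return errno;

    return 0;
}
```

Returning an error instead of exiting leaves the decision about how to
report the failure to the caller.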

Also add checks to depriv-process-checker.sh to verify that dm is
running in a new namespace (or at least, a different one than the
caller).

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Fix function prototype for netbsd code

Changes since v3:
- Fix some more style issues

Changes since v2:
- Return an error rather than calling exit()
- Use LOGE() and print to the current stderr fd, rather than
  printing to the new stderr fd via write()
- Use r for external return values rather than rc.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agotools/dm_restrict: Ask QEMU to chroot
George Dunlap [Tue, 6 Nov 2018 15:41:23 +0000 (15:41 +0000)]
tools/dm_restrict: Ask QEMU to chroot

When dm_restrict is enabled, ask QEMU to chroot into an empty directory.

* Create $XEN_RUN_DIR/qemu-root-<domid> (deleting the old one if it's there)
* Pass the -chroot option to QEMU

Rather than running `rm -rf` on the directory before creating it
(since there is no library function to do this), simply rmdir the
directory, relying on the fact that the previous QEMU instance, if
properly restricted, shouldn't have been able to write anything
anyway.
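
The rmdir-then-mkdir approach can be sketched as below. The function
name and the directory mode are assumptions for illustration; this is
not the libxl code:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Make sure 'path' is a freshly created, empty directory.
 *
 * rmdir() only removes empty directories, so a stale directory that
 * somehow contains files makes this fail instead of being silently
 * reused.  Returns 0 on success, -1 on failure (errno is set).
 */
int prepare_chroot_dir_sketch(const char *path)
{
    if ( rmdir(path) && errno != ENOENT )
        return -1;

    /* The mode is an assumption; the source does not specify one. */
    if ( mkdir(path, 0700) )
        return -1;

    return 0;
}
```

Failing on a non-empty directory is deliberate: leftover files would
mean the previous instance was not restricted as expected.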

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Minor change to comment
- Update stale directory name in commit message

Changes since v2:
- Style fixes
- Testing moved to a different patch

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
6 years agoSUPPORT.md: Add qemu-depriv section
George Dunlap [Tue, 6 Nov 2018 15:41:22 +0000 (15:41 +0000)]
SUPPORT.md: Add qemu-depriv section

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Fix some grammar (s/attack/attacking/;)

Changes since v3:
- Moved from the qemu-depriv doc patches.
- Reword to include the possibility of having a non-dom0 "devicemodel"
  domain which may want to be protected
- Specify `Linux dom0` as the currently-tech-supported window

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
6 years agodocs/qemu-deprivilege: Revise and update with status and future plans
George Dunlap [Tue, 6 Nov 2018 15:41:22 +0000 (15:41 +0000)]
docs/qemu-deprivilege: Revise and update with status and future plans

docs/qemu-deprivilege.txt had some basic instructions for using
dm_restrict, but it was incomplete, misleading, and stale.

Update the docs in a number of ways.

First, separate user-facing documentation and technical description
into docs/features and docs/design, respectively.

In the feature doc:

* Introduce a section mentioning minimum versions of Linux, Xen, and
qemu required (TBD)

* Fix the discussion of qemu userid.  Mention xen-qemuuser-range-base,
and provide example shell code that actually has some hope of working
(instead of failing out after creating 900 userids).

* Describe how to enable restrictions, as well as features which
probably don't or definitely don't work.

In the design doc, introduce a "Technical Details" section which
describes specifically what restrictions are currently done, and also
what restrictions we are looking at doing in the future.

The idea here is that as we implement the various items for the
future, we move them from "Restrictions still to do" to "Restrictions
done".  This can also act as a design document -- a place for public
discussion of what can or should be done and how.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Remove unnecessary FIXME
- Remove stale "Add SUPPORT.md"

Changes since v3:
- Fix typo (32->16)
- Use an example value not close to the `nobody` uids, but still a
  multiple of 2^16.
- Mention that using a multiple of 2^16 may have advantages.
- Have the example create a group as well
- Reorganize two comments on the "range-base" method for clarity

Changes since v2:
- Extraneous privcmd / evtchn instances aren't closed
- Expand description of how to test fd deprivileging
- Rework and clarify two namespace sections, give reference for QEMU NAK
- Add more information about migration technical challenges
- In UID section, mention possibility of container ID collisions.
- Fix name of design document.
- Add SUPPORT.md statement.  Specify Linux, to make sure that FreeBSD is
  evaluated separately.
- Mention that `-sandbox` is a blacklist and why

Changes since v1:
- Break into two, and move into appropriate directories (rather than 'misc')
- Updated version requirements
- Distinguish between features which "don't yet work" and features which we never expect to work
- Update description of xen-restrict functionality
- Reorder and expand further restrictions
- Make it more clear which restrictions are available on Linux only
- Include detailed description of how to kill a process
- Add RLIMIT_NPROC as something we can do without further changes to qemu
- Document the need to check for the sandbox feature before using it

Thank you to Ross Lagerwall, whose description of what XenServer is
doing formed much of the basis for the text here.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
6 years agotools: ipxe: Correct download error handling
Ian Jackson [Mon, 5 Nov 2018 18:40:49 +0000 (18:40 +0000)]
tools: ipxe: Correct download error handling

This shell fragment lacked set -e.  So, e.g., if the download failed, a
broken ipxe.tar.gz would be left behind.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools: Once again honour, but no longer advertise GIT_HTTP env var
Ian Jackson [Mon, 5 Nov 2018 18:37:05 +0000 (18:37 +0000)]
tools: Once again honour, but no longer advertise GIT_HTTP env var

In "build: add autoconf to replace custom checks in tools/check"
--enable-githttp was introduced.  But we missed this comment where it
was advertised.

Also, that commit had the effect of unconditionally setting GIT_HTTP
from the configure variable.  But the env var has been advertised in
some places as the way to specify this behaviour, and overriding it is
just unfriendly.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools: libxl/xl: run NUMA placement even when an hard-affinity is set
Dario Faggioli [Fri, 19 Oct 2018 15:54:41 +0000 (17:54 +0200)]
tools: libxl/xl: run NUMA placement even when an hard-affinity is set

Right now, if either a hard or a soft affinity is explicitly specified
in a domain's config file, automatic NUMA placement is skipped. However,
automatic NUMA placement affects only the soft-affinity of the domain
which is being created.

Therefore, it is ok to let it run if a hard-affinity is specified. The
semantics will be that the best placement candidate would be found,
respecting the specified hard-affinity, i.e., using only the nodes that
contain the pcpus in the hard-affinity mask.

This is particularly helpful if global xl pinning masks are defined, as
made possible by commit aa67b97ed34279c43 ("xl.conf: Add global affinity
masks"). In fact, without this commit, defining a global affinity mask
would also mean disabling automatic placement, but that does not
necessarily have to be the case (especially in large systems).
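
As a rough illustration of the rule above (a sketch only, not libxl's
actual placement code; the 64-bit masks, node_cpus[] and
candidate_nodes() are hypothetical), the candidate nodes are those that
contain at least one pcpu of the hard-affinity mask:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: at most 64 pcpus and 64 nodes, one bit each.
 * node_cpus[n] is the set of pcpus that belong to NUMA node n.
 * Returns the set of nodes that placement may consider.
 */
uint64_t candidate_nodes(const uint64_t *node_cpus, unsigned int nr_nodes,
                         uint64_t hard_affinity)
{
    uint64_t nodes = 0;
    unsigned int n;

    assert(nr_nodes <= 64);

    for ( n = 0; n < nr_nodes; n++ )
        if ( node_cpus[n] & hard_affinity )
            nodes |= UINT64_C(1) << n;

    return nodes;
}
```

With two nodes owning pcpus 0-3 and 4-7, a hard-affinity of pcpus 4-5
leaves only node 1 as a candidate, and the resulting soft-affinity is
then picked among the pcpus of that node.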

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: put x86emul_{read,write}_dr under CONFIG_PV
Wei Liu [Mon, 5 Nov 2018 17:38:58 +0000 (17:38 +0000)]
x86: put x86emul_{read,write}_dr under CONFIG_PV

A non-debug build uncovered a build breakage. The debug build worked
because the ASSERT made the compiler eliminate the rest of the
functions.

Currently they are PV only. There are comments alluding to possible
future HVM support but we can cross the bridge when we get there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoRelease: add release note link to SUPPORT.md
Juergen Gross [Fri, 26 Oct 2018 13:13:44 +0000 (15:13 +0200)]
Release: add release note link to SUPPORT.md

In order to have a link to the release notes in the feature list
generated from SUPPORT.md add that link in the "Release Support"
section of that file.

The real link needs to be adapted when the version is being released.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agodocs: remove ChangeLog file
Juergen Gross [Fri, 26 Oct 2018 10:38:06 +0000 (12:38 +0200)]
docs: remove ChangeLog file

docs/ChangeLog was last updated for Xen 3.3. It seems to be of interest
only to archaeologists today.

Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: make entry point code build when !CONFIG_PV
Wei Liu [Fri, 19 Oct 2018 11:32:12 +0000 (12:32 +0100)]
x86: make entry point code build when !CONFIG_PV

Skip building x86_64/compat/entry.S and put CONFIG_PV in
x86_64/entry.S.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/traps: Misc non-functional cleanup
Andrew Cooper [Mon, 5 Nov 2018 16:03:03 +0000 (16:03 +0000)]
x86/traps: Misc non-functional cleanup

 * s/unsigned char/uint8_t/ for clarity
 * Drop redundant return at the end of a void function

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agotools: Move the typesafe min/max helpers into xen-tools/libs.h
Andrew Cooper [Thu, 19 Jul 2018 15:42:07 +0000 (16:42 +0100)]
tools: Move the typesafe min/max helpers into xen-tools/libs.h

... rather than implementing them separately for libxc and libxl.  They will
shortly be wanted in libx86 as well.

Fix up the style/consistency in the declaration, but no functional change.
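
For readers unfamiliar with the idiom, a common GNU C formulation of
type-checked min()/max() looks roughly like the sketch below.  This is
an illustration under that assumption; the exact definitions in
xen-tools/libs.h may differ, and clamp_len() is a hypothetical user.

```c
#include <assert.h>

/*
 * Relies on GNU C statement expressions and __typeof__.  Comparing the
 * addresses of the two temporaries makes the compiler warn when x and y
 * have different types; each argument is evaluated exactly once.
 */
#define min(x, y) ({                        \
    const __typeof__(x) _x = (x);           \
    const __typeof__(y) _y = (y);           \
    (void)(&_x == &_y);                     \
    _x < _y ? _x : _y; })

#define max(x, y) ({                        \
    const __typeof__(x) _x = (x);           \
    const __typeof__(y) _y = (y);           \
    (void)(&_x == &_y);                     \
    _x > _y ? _x : _y; })

/* Hypothetical user: clamp len into the range [lo, hi]. */
unsigned int clamp_len(unsigned int len, unsigned int lo, unsigned int hi)
{
    unsigned int floor;

    assert(lo <= hi);
    floor = max(len, lo);

    return min(floor, hi);
}
```

Mixing, say, an int and an unsigned long in one invocation produces a
"comparison of distinct pointer types" diagnostic instead of a silent
conversion.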

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/vcpu: Remove struct vcpu allocation restriction when possible
Andrew Cooper [Fri, 2 Nov 2018 17:46:38 +0000 (17:46 +0000)]
x86/vcpu: Remove struct vcpu allocation restriction when possible

There is no need for struct vcpu to live below the 4G boundary for PV guests,
or for HVM vcpus using HAP.

Plumb struct domain into alloc_vcpu_struct() so the x86 version can query the
domain's type and paging settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agox86: update help string for CONFIG_HVM
Wei Liu [Fri, 2 Nov 2018 15:55:45 +0000 (15:55 +0000)]
x86: update help string for CONFIG_HVM

Update text. Change "guest" to "domain" where appropriate because
"guest" doesn't include Domain 0.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86: rearrange x86_64/entry.S
Wei Liu [Fri, 2 Nov 2018 15:55:42 +0000 (15:55 +0000)]
x86: rearrange x86_64/entry.S

Split the file into two halves. The first half pertains to PV guest
code while the second half is mostly used by the hypervisor itself to
handle interrupts and exceptions.

No functional change intended.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/domctl: rework XEN_DOMCTL_{set,get}_address_size
Wei Liu [Fri, 2 Nov 2018 15:55:40 +0000 (15:55 +0000)]
x86/domctl: rework XEN_DOMCTL_{set,get}_address_size

Going through toolstack code, they are used for PV guests only.

Tighten their access to PV only. Return -EOPNOTSUPP if they are called
on HVM guests. Rewrite the code in a pattern that makes DCE work.
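
The pattern can be sketched as follows.  This is a simplified model, not
the actual domctl code: struct domain, the CONFIG_PV constant, the
helper names and the -EINVAL case are assumptions for illustration.
Because is_pv_domain() folds to a compile-time false when CONFIG_PV is
0, the compiler can drop everything after the early return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define CONFIG_PV 1 /* hypothetical build-time switch */

struct domain {
    bool pv;                /* true for PV domains */
    unsigned int addr_size; /* 32 or 64 */
};

static inline bool is_pv_domain(const struct domain *d)
{
    /* Constant false when CONFIG_PV is 0, enabling DCE in callers. */
    return CONFIG_PV && d->pv;
}

int set_address_size(struct domain *d, unsigned int size)
{
    if ( !is_pv_domain(d) )
        return -EOPNOTSUPP;

    if ( size != 32 && size != 64 )
        return -EINVAL;

    d->addr_size = size;

    return 0;
}

int get_address_size(const struct domain *d, unsigned int *size)
{
    if ( !is_pv_domain(d) )
        return -EOPNOTSUPP;

    *size = d->addr_size;

    return 0;
}
```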

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: make traps.c build with !CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 15:55:39 +0000 (15:55 +0000)]
x86: make traps.c build with !CONFIG_PV

Provide a stub for pv_inject_event. Put code that accesses PV fields
and GDT / LDT fault handling code under CONFIG_PV. Move set_debugreg
to pv/misc-hypercalls.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: put vcpumask_to_pcpumask under CONFIG_PV
Wei Liu [Fri, 2 Nov 2018 19:28:51 +0000 (19:28 +0000)]
x86: put vcpumask_to_pcpumask under CONFIG_PV

This function is used by PV code only. This issue was discovered by a
clang build.

Drop spurious inline while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: VME and PVI modes require a #GP(0) check first thing
Jan Beulich [Mon, 5 Nov 2018 10:13:59 +0000 (11:13 +0100)]
x86emul: VME and PVI modes require a #GP(0) check first thing

As explicitly spelled out by the SDM, EFLAGS.VIF and EFLAGS.VIP both set
at the start of an instruction trigger #GP(0) independent of the actual
instruction.
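
The check itself is small.  A minimal sketch, assuming the caller has
already established whether the vCPU is in VME or PVI mode
(needs_gp0_first() and its vme_or_pvi parameter are hypothetical, not
the emulator's real interface):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_VIF (UINT32_C(1) << 19) /* Virtual Interrupt Flag */
#define X86_EFLAGS_VIP (UINT32_C(1) << 20) /* Virtual Interrupt Pending */

/*
 * Returns true when #GP(0) has to be raised before the instruction is
 * even decoded: VME or PVI mode is active and both VIF and VIP are set.
 */
bool needs_gp0_first(uint32_t eflags, bool vme_or_pvi)
{
    const uint32_t both = X86_EFLAGS_VIF | X86_EFLAGS_VIP;

    return vme_or_pvi && (eflags & both) == both;
}
```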

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: deal with firmware setting bogus TSC_ADJUST values
Jan Beulich [Mon, 5 Nov 2018 10:13:09 +0000 (11:13 +0100)]
x86: deal with firmware setting bogus TSC_ADJUST values

The system Intel have handed me for AVX512 emulator work ("Gigabyte
Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3
Pro-CF, BIOS F3 12/28/2017") would not come up under Xen - it hung in
the middle of Dom0 PCI initialization. As it turned out, Xen's time
management did not work because of the firmware setting (only) the boot
CPU's TSC_ADJUST MSR to a large negative value (on the order of -2^50).

Follow Linux (also shamelessly stealing their comments) in
- clearing the register for the boot CPU (we don't have a need for
  exceptions here yet, as the only exception in Linux is a class of
  systems Xen doesn't work on anyway as far as I'm aware),
- forcing non-negative values uniformly (commit 855615eee9 ["x86/tsc:
  Remove the TSC_ADJUST clamp"] dropped this, but without this my
  Haswell box won't boot anymore),
- syncing the registers within sockets.
Linux, prior to the aforementioned commit, capped at 0x7fffffff as well, but as the
description there says this issue has been addressed with a microcode
update. Hence until someone runs into such a system without being able
to update its microcode, I think we should leave out that specific part.
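
Very roughly, and ignoring the actual MSR accesses, the three rules
above amount to the sketch below.  tsc_adjust_target() and its
parameters are hypothetical names for illustration; the real logic also
has to track per-socket reference values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the value that should end up in TSC_ADJUST on this CPU.
 *
 * fw_val:     value found in the MSR when the CPU comes up
 * boot_cpu:   whether this is the boot CPU
 * socket_ref: value already adopted by another CPU of the same socket,
 *             or NULL if this is the first CPU of its socket
 */
int64_t tsc_adjust_target(int64_t fw_val, bool boot_cpu,
                          const int64_t *socket_ref)
{
    /* Sync within sockets: follow the CPU which came up first. */
    if ( socket_ref != NULL )
        return *socket_ref;

    /* Clear on the boot CPU, and never keep a negative value. */
    if ( boot_cpu || fw_val < 0 )
        return 0;

    return fw_val;
}
```

For the system described above, the boot CPU's value on the order of
-2^50 is replaced by 0.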

In order to avoid making init_percpu_time() depend on running _before_
set_cpu_sibling_map() (and hence the booting CPU _not_ being accounted
in socket_cpumask[] yet), move that call slightly earlier in
start_secondary().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/TSC: don't allow deadline timer to be used with unfixed errata
Jan Beulich [Mon, 5 Nov 2018 10:12:39 +0000 (11:12 +0100)]
x86/TSC: don't allow deadline timer to be used with unfixed errata

In preparation of writes to the TSC_ADJUST MSR, avoid the bad
interaction of writes to it and the TSC_DEADLINE one. Presumably the
original Linux commit bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due
to errata") refers to e.g. KBW092. (Of course this is an issue also
without us writing the TSC_ADJUST MSR, if firmware already did so.)

The errata checking can't be put in init_apic_mappings() as Linux does,
as that runs before we update microcode on the boot CPU. It needs to
happen before consumers of tdt_enabled, i.e.
- __setup_APIC_LVTT() <- setup_APIC_timer() <- setup_boot_APIC_clock()
-                     <- calibrate_APIC_clock() <- setup_boot_APIC_clock()
- setup_boot_APIC_clock()
setup_boot_APIC_clock() gets called from smp_prepare_cpus(), which sits
after microcode loading (note that calibrate_APIC_clock() gets called
before setting tdt_enabled).

Also add an MFENCE as per Linux commit 5d7c631d92 ("x86/apic: Serialize
LVTT and TSC_DEADLINE writes"), but I see no reason to put a conditional
around it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoviridian: remove duplicate union types
Paul Durrant [Mon, 5 Nov 2018 10:11:39 +0000 (11:11 +0100)]
viridian: remove duplicate union types

The 'viridian_vp_assist', 'viridian_hypercall_gpa' and
'viridian_reference_tsc' union types are identical in layout. The layout
is also common throughout the specification [1].

This patch declares a common 'viridian_page_msr' type and converts the rest
of the code to use that type for both the hypercall and VP assist pages.

Also, rename 'viridian_guest_os_id' to 'viridian_guest_os_id_msr' since it
also is a union representing an MSR value.
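
As an illustration of the shared layout (a sketch: the type and field
names below are hypothetical and need not match the ones used in the
patch), such a page MSR carries an enable bit in bit 0 and a page frame
number starting at bit 12:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit-field allocation order is implementation-defined; this assumes
 * the usual little-endian, LSB-first layout used by GCC and clang on
 * x86.
 */
union page_msr_sketch {
    uint64_t raw;
    struct {
        uint64_t enabled:1;   /* bit 0 */
        uint64_t reserved:11; /* bits 1-11 */
        uint64_t pfn:48;      /* bits 12-59 */
    };
};

/* Decode an MSR value; returns false if the page is not enabled. */
bool page_msr_to_gpa(uint64_t raw, uint64_t *gpa)
{
    union page_msr_sketch msr = { .raw = raw };

    if ( !msr.enabled )
        return false;

    *gpa = (uint64_t)msr.pfn << 12;

    return true;
}
```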

No functional change.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agoviridian: remove comments referencing section number in the spec
Paul Durrant [Mon, 5 Nov 2018 10:10:55 +0000 (11:10 +0100)]
viridian: remove comments referencing section number in the spec

Microsoft has a habit of re-numbering sections in the spec. so avoid
referring to section numbers in comments. Also remove the URL for the
spec. from the boilerplate... Again, Microsoft has a habit of changing
these too.

This patch also cleans up some > 80 character lines.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agoviridian: remove MSR perf counters
Paul Durrant [Mon, 5 Nov 2018 10:09:35 +0000 (11:09 +0100)]
viridian: remove MSR perf counters

They're not really useful so maintaining them is pointless.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
6 years agolibxl/arm: fix guest type conversion
Wei Liu [Fri, 2 Nov 2018 12:34:12 +0000 (12:34 +0000)]
libxl/arm: fix guest type conversion

Commit 359970fd8b ("tools/libxl: Switch Arm guest type to PVH") missed
changing the type field in c_info. This issue didn't surface until
ef72c93df9, which made creating PV guests on Arm unusable.

Create libxl__arch_domain_create_info_setdefault and switch the type
there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agox86/hvm: clean up may_defer from hvm_* helpers
Alexandru Isaila [Fri, 2 Nov 2018 11:16:32 +0000 (12:16 +0100)]
x86/hvm: clean up may_defer from hvm_* helpers

The may_defer var was left with the older bool_t type. This patch
changes the type to bool.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Paul Durrant <paul.durrant@citrix.com>
6 years agoVMX: fix vmx_handle_eoi()
Jan Beulich [Fri, 2 Nov 2018 11:15:33 +0000 (12:15 +0100)]
VMX: fix vmx_handle_eoi()

In commit 303066fdb1e ("VMX: fix interaction of APIC-V and Viridian
emulation") I screwed up: Instead of clearing SVI, other ISR bits
should be taken into account.

Introduce a new helper set_svi(), split out of vmx_process_isr(), and
use it also from vmx_handle_eoi().
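
To illustrate the intended behaviour: after an EOI, SVI should become
the highest vector still in service rather than simply zero.  The
sketch below models that on plain integers, using the SDM layout of the
guest interrupt status field (RVI in bits 7:0, SVI in bits 15:8).
highest_isr() and update_svi() are hypothetical helpers; the real
set_svi() operates on the VMCS.

```c
#include <assert.h>
#include <stdint.h>

/* Highest vector set in the 256-bit ISR, or -1 if none is set. */
int highest_isr(const uint32_t isr[8])
{
    int word, bit;

    for ( word = 7; word >= 0; word-- )
        for ( bit = 31; bit >= 0; bit-- )
            if ( isr[word] & (UINT32_C(1) << bit) )
                return word * 32 + bit;

    return -1;
}

/* Replace the SVI byte of the guest interrupt status, keeping RVI. */
uint16_t update_svi(uint16_t intr_status, int isr_vector)
{
    if ( isr_vector < 0 )
        isr_vector = 0;

    return (uint16_t)((intr_status & 0x00ff) |
                      ((unsigned int)isr_vector << 8));
}
```

With vectors 33 and 80 in service, EOI of vector 80 leaves 33 as the
value to load into SVI.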

Following the problems in vmx_intr_assist() (see the still present big
block of debugging code there) also warn (once) if EOI'd vector and
original SVI don't match.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agotools/ocaml: make type of Xsraw.sync more precise
Christian Lindig [Tue, 30 Oct 2018 10:19:06 +0000 (10:19 +0000)]
tools/ocaml: make type of Xsraw.sync more precise

The type of Xsraw.sync is made more precise:

from val sync : (Xenbus.Xb.t -> 'a) -> con -> string
to   val sync : (Xenbus.Xb.t -> unit) -> con -> string

The first argument is enforced to return unit rather than a value that
is not used anyway.

[ No functional change. -iwj ]

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agotools/ocaml: Re-introduce Xenctrl.with_intf wrapper
Christian Lindig [Thu, 1 Nov 2018 09:12:53 +0000 (09:12 +0000)]
tools/ocaml: Re-introduce Xenctrl.with_intf wrapper

Commit 81946a73dc975a7dafe9017a8e61d1e64fdbedbf removed
Xenctrl.with_intf based on its undesirable behaviour of opening and
closing a Xenctrl connection with every invocation. This commit
re-introduces with_intf but with an updated behaviour: it maintains a
global Xenctrl connection which is opened upon first usage and kept
open. Clients can obtain this handle with the new function get_handle()
and close it with close_handle().

The main motivation of re-introducing with_intf is that otherwise
clients will have to implement this functionality individually.

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agolibvchan: create xenstore entries in one transaction
Marek Marczykowski-Górecki [Tue, 30 Oct 2018 23:49:05 +0000 (00:49 +0100)]
libvchan: create xenstore entries in one transaction

This prevents a race when a client waits for the server with xs_watch:
all entries should appear at once.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/misc/xenpm: fix getting info when some CPUs are offline
Marek Marczykowski-Górecki [Wed, 31 Oct 2018 13:04:58 +0000 (14:04 +0100)]
tools/misc/xenpm: fix getting info when some CPUs are offline

Use physinfo.max_cpu_id instead of physinfo.nr_cpus to get max CPU id.
This fixes for example 'xenpm get-cpufreq-para' with smt=off, which
otherwise would miss half of the cores.
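
The difference can be shown with a toy model (hypothetical structure
and helper, and assuming smt=off leaves every second CPU id offline):
nr_cpus counts online CPUs, so using it as a loop bound stops before
the higher ids are reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two physinfo fields involved. */
struct physinfo_model {
    unsigned int nr_cpus;    /* number of online CPUs */
    unsigned int max_cpu_id; /* highest CPU id, online or not */
};

/* Count the online CPUs found when scanning ids 0 .. nr_ids - 1. */
unsigned int online_cpus_seen(const bool *online, unsigned int nr_ids)
{
    unsigned int i, seen = 0;

    for ( i = 0; i < nr_ids; i++ )
        if ( online[i] )
            seen++;

    return seen;
}
```

With eight CPU ids of which only the even ones are online, scanning
nr_cpus (4) ids finds two CPUs, while scanning max_cpu_id + 1 (8) ids
finds all four.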

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agotools/hotplug: remove xen-hotplug-cleanup
Marek Marczykowski-Górecki [Tue, 30 Oct 2018 23:56:59 +0000 (00:56 +0100)]
tools/hotplug: remove xen-hotplug-cleanup

Since udev is no longer used to call hotplug scripts (neither in dom0
nor driver domain), this script is no longer referenced anywhere. libxl
(xl devd or else) has its own cleanup code.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86: Consolidate the storage of MSR_AMD64_DR{0-3}_ADDRESS_MASK
Andrew Cooper [Fri, 19 Oct 2018 15:14:22 +0000 (16:14 +0100)]
x86: Consolidate the storage of MSR_AMD64_DR{0-3}_ADDRESS_MASK

The PV and HVM code both have a copy of these, which gives the false
impression in the context switch code that they are PV/HVM specific.

Move the storage into struct vcpu_msrs, and update all users to match.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: move the code into its own sub-directory
Paul Durrant [Wed, 31 Oct 2018 12:44:00 +0000 (13:44 +0100)]
viridian: move the code into its own sub-directory

Subsequent patches will introduce support for more viridian enlightenments
which will make a single source module quite lengthy.

This patch therefore creates a new arch/x86/hvm/viridian sub-directory and
moves viridian.c into that.

The patch also fixes some bad whitespace whilst moving the code and
adjusts the MAINTAINERS file.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/pvh: allow PVH Dom0 to use the debug IO port console
Roger Pau Monné [Wed, 31 Oct 2018 16:59:26 +0000 (17:59 +0100)]
x86/pvh: allow PVH Dom0 to use the debug IO port console

Force trapping accesses to IO port 0xe9 for a PVH Dom0, so it can
print to the HVM debug console.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/hvm: introduce a define for the debug output IO port
Roger Pau Monné [Wed, 31 Oct 2018 16:58:47 +0000 (17:58 +0100)]
x86/hvm: introduce a define for the debug output IO port

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/pvh: fix TSC mode setup for PVH Dom0
Roger Pau Monné [Wed, 31 Oct 2018 16:58:15 +0000 (17:58 +0100)]
x86/pvh: fix TSC mode setup for PVH Dom0

A PVH Dom0 might use TSC scaling or other HVM specific TSC
adjustments, so only short-circuit the TSC setup for a classic PV
Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agouse consistent values when consuming runtime-changeable parameters
Jan Beulich [Wed, 31 Oct 2018 16:57:19 +0000 (17:57 +0100)]
use consistent values when consuming runtime-changeable parameters

There's no guarantee that e.g. a switch() control expression's memory
operand(s) get(s) read just once. Guard against the compiler producing
"unexpected" code by sprinkling around some ACCESS_ONCE().
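
A minimal sketch of the idea, assuming the usual volatile-cast
formulation of ACCESS_ONCE() (opt_mode and mode_name() are hypothetical
and stand for any runtime-changeable parameter and its consumer):

```c
#include <assert.h>

/* Force exactly one read of x at this point (GNU C __typeof__). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical parameter which may be changed at runtime. */
unsigned int opt_mode;

const char *mode_name(void)
{
    /*
     * Without ACCESS_ONCE() the compiler would be free to re-read
     * opt_mode, e.g. once for a range check and again for a jump table
     * index, and the two reads could observe different values.
     */
    switch ( ACCESS_ONCE(opt_mode) )
    {
    case 0:
        return "off";

    case 1:
        return "on";

    default:
        return "unknown";
    }
}
```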

I'm leaving alone opt_conswitch[]: It gets accessed in quite a few
places anyway, and an intermediate change won't have any severe effect
afaict.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agoxen/arm: introduce NO_PLAT
Stefano Stabellini [Mon, 24 Sep 2018 22:55:04 +0000 (15:55 -0700)]
xen/arm: introduce NO_PLAT

Add a Kconfig option to select no specific platform support.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
6 years agoxen/arm: make platform specific code dependent on ALL32_PLAT or ALL64_PLAT
Stefano Stabellini [Mon, 24 Sep 2018 22:55:03 +0000 (15:55 -0700)]
xen/arm: make platform specific code dependent on ALL32_PLAT or ALL64_PLAT

Compile platform code that doesn't have its own specific kconfig option
only if ALL32_PLAT or ALL64_PLAT is selected, depending on the architecture. The
benefit is that choosing one of the platforms available as a menu
option allows the user not to build other unnecessary platform code.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
6 years agolibxencall: Improve linux syscall error messages
Ian Jackson [Mon, 15 Oct 2018 14:22:53 +0000 (15:22 +0100)]
libxencall: Improve linux syscall error messages

Make the bufdev and non-bufdev messages distinct, and always print the
non-constant argument (ie, the size).

This assists diagnosis.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Hans van Kranenburg <hans@knorrie.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agolibxl: libxl__ev_fd_callback: Document perhaps-no-retry semantics
Ian Jackson [Mon, 15 Oct 2018 13:58:54 +0000 (14:58 +0100)]
libxl: libxl__ev_fd_callback: Document perhaps-no-retry semantics

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Anthony PERARD <anthony.perard@citrix.com>
6 years agolibxl: CODING_STYLE: Clarify line length limit to 75
Ian Jackson [Mon, 15 Oct 2018 13:51:01 +0000 (14:51 +0100)]
libxl: CODING_STYLE: Clarify line length limit to 75

And give a reason.

The previous `limit' of 75-80 was ambiguous.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
CC: Anthony PERARD <anthony.perard@citrix.com>
6 years agox86/pv: Fix crash when using `xl set-parameter pcid=...`
Andrew Cooper [Tue, 30 Oct 2018 11:17:00 +0000 (11:17 +0000)]
x86/pv: Fix crash when using `xl set-parameter pcid=...`

"pcid=" is registered as a runtime parameter, which means that parse_pcid()
must not reside in .init, or the following happens when parse_params() tries
to call an unmapped function pointer.

  (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d080407fb3>] ffff82d080407fb3
  (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d0v1)
  (XEN) rax: ffff82d080407fb3   rbx: ffff82d0803cf270   rcx: 0000000000000000
  (XEN) rdx: ffff8300abe67fff   rsi: 000000000000000a   rdi: ffff8300abe67bfd
  (XEN) rbp: ffff8300abe67ca8   rsp: ffff8300abe67ba0   r8:  ffff83084d980000
  (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
  (XEN) r12: ffff8300abe67bfd   r13: ffff82d0803cb628   r14: 0000000000000000
  (XEN) r15: ffff8300abe67bf8   cr0: 0000000080050033   cr4: 0000000000172660
  (XEN) cr3: 0000000828efd000   cr2: ffff82d080407fb3
  (XEN) fsb: 00007fb810d4b780   gsb: ffff88007ce20000   gss: 0000000000000000
  (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
  (XEN) Xen code around <ffff82d080407fb3> (ffff82d080407fb3) [fault on access]:
  (XEN)  -- -- -- -- -- -- -- -- <--> -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  (XEN) Xen stack trace from rsp=ffff8300abe67ba0:
  (XEN)    ffff82d080217f61 ffff830826db0f09 ffff8300abe67bf8 ffff82d0803cf1e0
  (XEN)    00007cff54198409 ffff8300abe67bf0 010001d000000000 0000000000000000
  (XEN)    ffff82d0803cf288 ffff8300abe67c88 ffff82d0805a09c0 616c620064696370
  (XEN)    00000000aaaa0068 0000000000000296 ffff82d08023d60e aaaaaaaaaaaaaaaa
  (XEN)    ffff83084d9b4000 ffff8300abe67c68 ffff82d08024940e ffff83083736e000
  (XEN)    0000000000000080 000000000000007a 000000000000000a ffff82d08045e61c
  (XEN)    ffff82d080573d80 ffff8300abe67cb8 ffff82d080249805 80000007fce54067
  (XEN)    fffffffffffffff2 ffff830826db0f00 ffff8300abfa7000 ffff82d08045e61c
  (XEN)    ffff82d080573d80 ffff8300abe67cb8 ffff82d08021801e ffff8300abe67e48
  (XEN)    ffff82d08023f60a ffff83083736e000 0000000000000000 ffff8300abe67d58
  (XEN)    ffff82d080293d90 0000000000000092 ffff82d08023d60e ffff820040006ae0
  (XEN)    0000000000000000 0000000000000000 00007fb810d5c010 ffff83083736e248
  (XEN)    0000000000000286 ffff8300abe67d58 0000000000000000 ffff82e010521b00
  (XEN)    0000000000000206 0000000000000000 0000000000000000 ffff8300abe67e48
  (XEN)    ffff82d080295270 00000000ffffffff ffff83083736e000 ffff8300abe67e48
  (XEN)    ffff820040006ae0 ffff8300abe67d98 000000120000001c 00007fb810d5d010
  (XEN)    0000000000000009 0000000000000002 0000000000000001 00007fb810b53260
  (XEN)    0000000000000001 0000000000000000 0000000000638bc0 00007fb81066a748
  (XEN)    00007ffe11087881 0000000000000002 0000000000000001 00007fb810b53260
  (XEN)    0000000000638b60 0000000000000000 00007fb8100322a0 ffff82d08035d444
  (XEN) Xen call trace:
  (XEN)    [<ffff82d080217f61>] kernel.c#parse_params+0x34a/0x3eb
  (XEN)    [<ffff82d08021801e>] runtime_parse+0x1c/0x1e
  (XEN)    [<ffff82d08023f60a>] do_sysctl+0x108d/0x1241
  (XEN)    [<ffff82d0803535cb>] pv_hypercall+0x1ac/0x4c5
  (XEN)    [<ffff82d08035d4a2>] lstar_enter+0x112/0x120
  (XEN)
  (XEN) Pagetable walk from ffff82d080407fb3:
  (XEN)  L4[0x105] = 00000000abe5c063 ffffffffffffffff
  (XEN)  L3[0x142] = 00000000abe59063 ffffffffffffffff
  (XEN)  L2[0x002] = 000000084d9bf063 ffffffffffffffff
  (XEN)  L1[0x007] = 0000000000000000 ffffffffffffffff
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) FATAL PAGE FAULT
  (XEN) [error_code=0010]
  (XEN) Faulting linear address: ffff82d080407fb3
  (XEN) ****************************************

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/vvmx: Don't handle unknown nested vmexit reasons at L0
Andrew Cooper [Thu, 25 Oct 2018 13:11:58 +0000 (14:11 +0100)]
x86/vvmx: Don't handle unknown nested vmexit reasons at L0

This is very dangerous from a security point of view, because a missing entry
will cause L2's action to be interpreted as L1's action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86/vvmx: Drop the now-obsolete vmx_inst_check_privilege()
Andrew Cooper [Thu, 25 Oct 2018 14:17:50 +0000 (15:17 +0100)]
x86/vvmx: Drop the now-obsolete vmx_inst_check_privilege()

Now that nvmx_handle_vmx_insn() performs all VT-x instruction checks, there is
no need for redundant checking in vmx_inst_check_privilege().  Remove it, and
take out the vmxon_check boolean which was plumbed through decode_vmx_inst().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86/vvmx: Unconditionally initialise vmxon_region_pa during vcpu construction
Andrew Cooper [Thu, 25 Oct 2018 13:40:11 +0000 (14:40 +0100)]
x86/vvmx: Unconditionally initialise vmxon_region_pa during vcpu construction

This is a stopgap solution until the toolstack side of initialisation can be
sorted out, but it does result in the nvmx_vcpu_in_vmx() predicate working
correctly even when nested virt hasn't been enabled for the domain.

Update nvmx_handle_vmx_insn() to include the in-vmx mode check (for all
instructions other than VMXON) to complete the set of #UD checks.

In addition, sanity check that the nested vmexit handler has worked correctly,
and that we are only providing emulation of the VT-x instructions to L1
guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86/vvmx: Let L1 handle all the unconditional vmexit instructions
Andrew Cooper [Thu, 25 Oct 2018 13:08:33 +0000 (14:08 +0100)]
x86/vvmx: Let L1 handle all the unconditional vmexit instructions

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years agox86: Reorganise and rename debug register fields in struct vcpu
Andrew Cooper [Mon, 28 May 2018 14:22:49 +0000 (15:22 +0100)]
x86: Reorganise and rename debug register fields in struct vcpu

Reusing debugreg[5] for the PV emulated IO breakpoint information is confusing
to read.  Instead, introduce a dr7_emul field in pv_vcpu for the purpose.

With the PV emulation out of the way, debugreg[4,5] are entirely unused and
don't need to be stored.

Rename debugreg[0..3] to dr[0..3] to reduce code volume, but keep them as an
array because their behaviour is identical and this helps simplify some of the
PV handling.  Introduce dr6 and dr7 fields to replace debugreg[6,7] which
removes the storage for debugreg[4,5].

In arch_get_info_guest(), handle the merging of emulated dr7 state alongside
all other dr handling, rather than much later.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agox86/emul: Unfold %cr4.de handling in x86emul_read_dr()
Andrew Cooper [Mon, 28 May 2018 15:16:37 +0000 (15:16 +0000)]
x86/emul: Unfold %cr4.de handling in x86emul_read_dr()

No functional change (as curr->arch.debugreg[5] is zero when DE is clear), but
this change simplifies the following patch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agox86/domain: Fix build with GCC 4.3.x
Andrew Cooper [Mon, 29 Oct 2018 11:29:54 +0000 (11:29 +0000)]
x86/domain: Fix build with GCC 4.3.x

GCC 4.3.x can't initialise the user_regs structure like this.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoarm,smmu: backport "Disable stalling faults for all endpoints"
Stefano Stabellini [Sun, 14 Oct 2018 22:57:32 +0000 (15:57 -0700)]
arm,smmu: backport "Disable stalling faults for all endpoints"

Backport commit 3714ce1d6655098ee69ede632883e5874d67e4ab
"iommu/arm-smmu: Disable stalling faults for all endpoints" from the
Linux kernel. This works around Erratum #842869.

Original commit message:

  Enabling stalling faults can result in hardware deadlock on poorly
  designed systems, particularly those with a PCI root complex upstream of
  the SMMU.

  Although it's not really Linux's job to save hardware integrators from
  their own misfortune, it *is* our job to stop userspace (e.g. VFIO
  clients) from hosing the system for everybody else, even if they might
  already be required to have elevated privileges.

  Given that the fault handling code currently executes entirely in IRQ
  context, there is nothing that can sensibly be done to recover from
  things like page faults anyway, so let's rip this code out for now and
  avoid the potential for deadlock.

Cc: <stable@vger.kernel.org>
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reported-by: Matt Evans <matt.evans@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago Make credit2 the default scheduler
George Dunlap [Mon, 29 Oct 2018 14:51:51 +0000 (14:51 +0000)]
Make credit2 the default scheduler

Credit2 was declared "supported" in 4.8, and as of 4.10 had two other
critical features implemented (soft affinity / NUMA and caps).

Why change the default?

The code is better: more predictable, less jitter, easier to determine
how modifications will affect overall behavior, easier in the future
to make load-balancing behavior more subtle (e.g., taking into account
the cost of powering up extra cores, &c).

Overall performance compared to Credit1 is somewhat of a mixed bag.
Unfortunately most of what I have are tests using XenServer's internal
perf testing system, so I can't share the raw data (via links anyway).

Here is a summary of data from an internal e-mail Dario sent in the
past:

* DVDbench: On underloaded systems, credit2 outperformed credit1 by
about 4%.  On overloaded systems, credit2 underperformed by about 3%.

* On a range of tests (unixbench, lmbench, &c), credit and credit2
perform within 5% of each other (up and down).

* Credit2 fairly consistently beats credit for TCP-style workloads.

* Credit2 is sometimes equal to, sometimes 5-15% worse than, credit for
synthetic CPU workloads (e.g., Dhrystone).

* On LoginVSI, credit2 fairly consistently outperforms credit by about 10%.

Credit2, like credit, has a number of workloads / setups for which
performance could be improved.  Personally I think networking and
partially-loaded systems are going to be more representative of what
Xen is actually used for; so I think credit2 is on the whole the
better scheduler to use by default.  And in any case, making those
improvements on credit2 will be easier than on credit.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
6 years ago amd-iommu: get rid of pointless IOMMU_PAGING_MODE_LEVEL_X definitions
Paul Durrant [Mon, 29 Oct 2018 12:47:24 +0000 (13:47 +0100)]
amd-iommu: get rid of pointless IOMMU_PAGING_MODE_LEVEL_X definitions

The levels are absolute numbers such that IOMMU_PAGING_MODE_LEVEL_X
evaluates to X (for the valid range of 0 - 7) so simply use numbers in
the code.
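
A minimal sketch of why the macros carried no information, assuming the
removed definitions had the form shown (the exact definitions are not
quoted in this message, and paging_mode_field() is a hypothetical helper
used only for illustration):

```c
#include <assert.h>

/* Assumed shape of the removed definitions: each macro equals its level. */
#define IOMMU_PAGING_MODE_LEVEL_0 0
#define IOMMU_PAGING_MODE_LEVEL_1 1
#define IOMMU_PAGING_MODE_LEVEL_2 2
#define IOMMU_PAGING_MODE_LEVEL_3 3
#define IOMMU_PAGING_MODE_LEVEL_4 4

/*
 * Hypothetical helper: the paging mode is treated here as a small field
 * whose valid range is 0 - 7, as stated above.
 */
static inline unsigned int paging_mode_field(unsigned int level)
{
    assert(level <= 7);
    return level & 0x7;
}
```

Under that assumption, paging_mode_field(IOMMU_PAGING_MODE_LEVEL_3) and
paging_mode_field(3) are the same expression after preprocessing, which
is why plain numbers can be used instead.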

No functional change.

NOTE: This patch also adds emacs boilerplate to amd-iommu-defs.h

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years ago x86emul: generalize vector length handling for AVX512/EVEX
Jan Beulich [Mon, 29 Oct 2018 12:40:56 +0000 (13:40 +0100)]
x86emul: generalize vector length handling for AVX512/EVEX

To allow for some code sharing where possible, copy VEX.L into EVEX.LR
even for VEX (or XOP) encoded insns. Make operand size determination
use this right away, at the same time adding consistency checks for the
EVEX scalar insn cases (the non-scalar ones aren't uniform enough for
the checking to be done in a central place like this).
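
The idea can be sketched as follows. This is an illustration only, not
the emulator's actual code: struct vex_sketch, struct evex_sketch,
copy_vex_l() and vector_bytes() are hypothetical names, and the sketch
covers only the three plain vector lengths (16, 32 and 64 bytes).

```c
#include <assert.h>

/* Hypothetical stand-ins for the decoded prefix state. */
struct vex_sketch {
    unsigned int l:1;   /* VEX.L: 0 = 128-bit, 1 = 256-bit */
};

struct evex_sketch {
    unsigned int lr:2;  /* EVEX.L'L (or rounding control) */
};

/* Mirror VEX.L into EVEX.LR so later code has one field to look at. */
static inline void copy_vex_l(struct evex_sketch *evex,
                              const struct vex_sketch *vex)
{
    evex->lr = vex->l;
}

/* Vector length in bytes: 0 -> 16, 1 -> 32, 2 -> 64. */
static inline unsigned int vector_bytes(const struct evex_sketch *evex)
{
    /* The value 3 is not a plain vector length; not handled here. */
    assert(evex->lr <= 2);
    return 16u << evex->lr;
}
```

With the copy in place, operand size determination for VEX, XOP and EVEX
encodings can share the single vector_bytes()-style computation.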

Note that the broadcast case is not handled here, but will be taken care
of elsewhere (in just a single place rather than at least two).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86: put some code in arch_set_info_guest under CONFIG_PV
Wei Liu [Fri, 19 Oct 2018 14:28:28 +0000 (15:28 +0100)]
x86: put some code in arch_set_info_guest under CONFIG_PV

This function is called by both PV and HVM. Unfortunately the code is
very convoluted. We can reason that code between the call to
hvm_set_info_guest and the out label is PV only. Put that portion under
CONFIG_PV.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86: make mm.c build with !CONFIG_PV
Wei Liu [Fri, 19 Oct 2018 14:28:27 +0000 (15:28 +0100)]
x86: make mm.c build with !CONFIG_PV

Start by putting hypercall handlers which are supposed to be PV only
under CONFIG_PV. Shuffle some code around to avoid introducing
excessive numbers of CONFIG_PV.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86emul: correct EVEX decoding
Jan Beulich [Fri, 26 Oct 2018 15:50:01 +0000 (17:50 +0200)]
x86emul: correct EVEX decoding

Fix an inverted pair of checks, drop an incorrect instance of #UD
raising for non-64-bit mode, and add further generic checks.

Note: Despite what SDM Vol 2 rev 067 states, EVEX.V' is _not_ ignored
      outside of 64-bit mode when the field does not encode a register.
      Just like EVEX.VVVV is required to be 0b1111 in that case, EVEX.V'
      is required to be 1 there.
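
A sketch of the check described in the note, assuming the standard EVEX
layout in which vvvv occupies bits 6:3 of the second payload byte and V'
occupies bit 3 of the third, both stored inverted. The helper names are
hypothetical and the emulator's own decoding is structured differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * p1 and p2 are the second and third bytes after the 0x62 escape.
 * Both fields are stored inverted, so "no register encoded" reads back
 * as all ones.
 */
static inline unsigned int evex_vvvv(uint8_t p1)
{
    return (p1 >> 3) & 0xf;
}

static inline unsigned int evex_v_prime(uint8_t p2)
{
    return (p2 >> 3) & 1;
}

/*
 * For an insn whose vvvv field does not encode a register: returns true
 * if the encoding is acceptable, false if #UD should be raised.
 */
static inline bool evex_unused_reg_ok(uint8_t p1, uint8_t p2)
{
    return evex_vvvv(p1) == 0xf && evex_v_prime(p2) == 1;
}

/* Usage example with hand-assembled payload bytes. */
static inline void evex_unused_reg_selfcheck(void)
{
    assert(evex_unused_reg_ok(0x7d, 0x08));   /* vvvv=1111, V'=1 */
    assert(!evex_unused_reg_ok(0x7d, 0x00));  /* V'=0 -> #UD */
    assert(!evex_unused_reg_ok(0x75, 0x08));  /* vvvv=1110 -> #UD */
}
```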

Also rename the bcst field to br, as #UD generation for individual insns
will need to consider both of its possible meanings.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86emul/test: introduce eq()
Jan Beulich [Fri, 26 Oct 2018 13:21:20 +0000 (15:21 +0200)]
x86emul/test: introduce eq()

In preparation for sensible to-boolean conversion on AVX512, wrap
another abstraction function around the present to_bool(<x> == <y>), to
get rid of the open-coded == (which will get in the way of using
built-in functions instead). For the future AVX512 use, scalar operands
can no longer be used: use (vec_t){} when the operand is zero,
and broadcast (if available) otherwise (assume pre-AVX512 when broadcast
is not available, in which case a plain scalar is still fine).
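
A rough sketch of the wrapper, using the GCC/Clang vector extension. The
vec_t width and this loop-based to_bool() are assumptions made for the
illustration; the test harness defines its own per-ISA variants:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed vector type: four 32-bit lanes. */
typedef int vec_t __attribute__((vector_size(16)));

/*
 * Hypothetical pre-AVX512 style to_bool(): a vector comparison yields -1
 * in each lane that compared true, so require that in every lane.
 */
static inline bool to_bool(vec_t cmp)
{
    for ( unsigned int i = 0; i < sizeof(cmp) / sizeof(cmp[0]); ++i )
        if ( cmp[i] != -1 )
            return false;

    return true;
}

/*
 * The wrapper hides the == operator, so a different build could swap in
 * a built-in based comparison without touching the callers.
 */
#define eq(x, y) to_bool((x) == (y))

/* Usage example. */
static inline void eq_selfcheck(void)
{
    vec_t a = { 1, 2, 3, 4 }, b = { 1, 2, 3, 4 }, zero = { 0 };

    assert(eq(a, b));
    assert(!eq(a, zero));
}
```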

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86emul: support AVX512 opmask insns
Jan Beulich [Fri, 26 Oct 2018 13:20:37 +0000 (15:20 +0200)]
x86emul: support AVX512 opmask insns

These are all VEX encoded, so the EVEX decoding logic continues to
remain unused at this point.

The new testcase is deliberately coded in assembly, as a C one would
have become almost unreadable due to the overwhelming amount of
__builtin_...() that would need to be used. After all the compiler has
no underlying type (yet) that could be operated on without builtins,
other than the vector types used for "normal" SIMD insns.

Note that outside of 64-bit mode and despite the SDM not currently
saying so, VEX.W is ignored for the KMOV{D,Q} encodings to/from GPRs,
just like e.g. for the similar VMOV{D,Q}.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86: restrict HVMOP_pagetable_dying to current
Jan Beulich [Fri, 26 Oct 2018 13:18:52 +0000 (15:18 +0200)]
x86: restrict HVMOP_pagetable_dying to current

This is not used (and probably was never meant to be) by the tool stack.
Limiting it to the current domain in particular allows eliminating a
bogus use of vCPU 0 in pagetable_dying().

Remove the now unnecessary domain/vCPU parameters from the wrapper/hook
functions at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86: don't build guest-walk code without HVM and SHADOW_PAGING
Jan Beulich [Fri, 26 Oct 2018 13:16:23 +0000 (15:16 +0200)]
x86: don't build guest-walk code without HVM and SHADOW_PAGING

It's dead code in that case.

We could go further, as we don't really need the 2- and 3-level walk
code in PV mode, but to drop their compilation requires quite a bit of
disentangling of shadow mode code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>