xenbits.xensource.com Git - people/julieng/xen-unstable.git/log

svm: fix incorrect TSC scaling

SVM TSC ratio is incorrectly used in the current
svm_get_tsc_offset(). This patch replaces the scaling logic in
svm_get_tsc_offset() with a correct implementation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86: refine nr_sockets calculation

The previous variant didn't work for non-contiguous socket numbers.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Revert "vVMX: use latched VMCS machine address"

This reverts commit d02e84b9d9d16b6b56186f0dfdcb3c90b83c82a3,
which caused a regression on some systems.

x86/libxc: add an arch domain config parameter to xc_domain_create

With the addition of HVMlite the hypervisor now always requires a non-null
arch domain config, which is different between HVM and PV guests.

Add a new parameter to xc_domain_create that contains a pointer to an arch
domain config. If the pointer is null, create a default arch domain config
based on guest type.

Fix all the in-tree callers to provide a null arch domain config in order to
mimic previous behaviour.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

build: fix clean to remove all *.o and .*.d files

In commit 8b6ef9c152edceabecc7f90c811cd538a7b7a110, several files in
xen/common/compat were changed to be built using the Makefile in
xen/common, by appending the compat prefix to the object
files. Additionally, the xen/common/compat directory was removed from
the subdirs-y variable, so it is no longer visited by the clean
rule. This resulted in some object files and dependency files being
generated by inclusion into obj-y, but not cleaned because they lived in a
directory that was unvisited by the clean rules.

Since there is a desire for all of the object files and dependency files
to be cleaned, just search for all objects and dependency files and
delete them on clean. The previous method of only tracking with the
$(DEPS) and *.o in the clean rules had the disadvantage that, if the
configuration changed between a build and a clean, some of the
dependencies or objects could get left behind. This method does not have
the same disadvantage.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
[dropped removal of *.o and $(DEPS) from xen/Rules.mk's clean rule]
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: __{cpu,dev}initdata drop follow-up

While reviewing those patches I noticed a few types that could do with
tweaking.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: make sure the HVM callback vector is correctly set

If certain devices (like the local or the IO APIC) are disabled, some modes
of operation of the HVM event channel callback cannot be used. Make sure Xen
doesn't try to set them up.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

VT-d: drop unneeded Ivybridge quirk workaround

We've been told by Intel that server chipsets don't need the workaround
anymore starting with Ivybridge (Xeon E5/E7 v2); the second half of the
workaround was missing anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/PCI: make all config space writes subject to XSM checking

Now that we intercept them all, there's no reason not to also uniformly
hand them to XSM. Reads (which are expected to be of less interest) get
handled as before (MMCFG accesses un-audited).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxc: do proper return code checking of allocator in domain builder

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxc: replace INVALID_P2M_ENTRY by INVALID_PFN

INVALID_P2M_ENTRY is defined as (xen_pfn_t)-1 and is often used,
in line with its type, for an invalid pfn. Change the name of the
macro to INVALID_PFN.
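
As a rough illustration of why an all-ones value works as a generic
"invalid pfn" sentinel, here is a minimal sketch. The names demo_pfn_t,
DEMO_INVALID_PFN and demo_pfn_is_valid are hypothetical, and the 64-bit
width of the type is an assumption rather than something taken from the
libxc headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 64-bit unsigned frame number type, standing in for xen_pfn_t. */
typedef uint64_t demo_pfn_t;

/* Casting -1 to an unsigned type yields its maximum value (all bits set). */
#define DEMO_INVALID_PFN ((demo_pfn_t)-1)

/* Hypothetical helper: true for any frame number other than the sentinel. */
static bool demo_pfn_is_valid(demo_pfn_t pfn)
{
    return pfn != DEMO_INVALID_PFN;
}
```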

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xendomains initscript: test for privcmd char device

Allow the init script to continue if either the character device or the
proc file is available.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: update outdated header comment on privcmd.h

The BSDs have always accessed privcmd via /dev/xen/privcmd while Linux
has used /proc/xen/privcmd but things are shifting to /dev/xen/privcmd
as well.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxc: prefer using privcmd character device

Prefer using the character device over the proc file if the character
device exists. This follows similar conversions of xenbus to avoid
issues with FMODE_ATOMIC_POS added in Linux 3.14 and newer.
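
The preference order can be sketched as a simple "first path that exists
wins" probe. This is only an illustration under assumptions: the helper
demo_first_existing and the table demo_privcmd_paths are hypothetical names,
and the actual libxc change may well be structured differently.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return the index of the first path that exists, or -1 if none does.
 * Listing the character device first makes it the preferred choice.
 */
static int demo_first_existing(const char *const *paths, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (access(paths[i], F_OK) == 0)
            return (int)i;
    return -1;
}

/* The two locations named in the privcmd.h commit, in order of preference. */
static const char *const demo_privcmd_paths[] = {
    "/dev/xen/privcmd",
    "/proc/xen/privcmd",
};
```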

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xen/build: disable default built-in rules and variables

Disable the built-in rules and variables from GNU make to improve
build performance and avoid awkward corner cases with the built-in
rules. Currently none of the implicit rules are used but this is helpful
to do when developing changes to the build system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

MAINTAINERS: restore original maintainership of arch VPMU files

It was lost when vpmu* files were moved from xen/arch/x86/hvm/{vmx|svm}/ to
xen/arch/x86/cpu/

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

evtchn: don't reuse ports that are still "busy"

When using the FIFO ABI a guest may close an event channel that is
still LINKED. If this port is reused, subsequent events may be lost
because they may become pending on the wrong queue.

This could be fixed by requiring guests to only close event channels
that are not linked. This is difficult since: a) irq cleanup in the
guest may be done in a context that cannot wait for the event to be
unlinked; b) the guest may attempt to rebind a PIRQ whose previous
close is still pending; and c) existing guests already have the
problematic behaviour.

Instead, simply check a port is not "busy" (i.e., it's not linked)
before reusing it.
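
A minimal sketch of the idea, assuming a much simplified port record
(struct demo_port and demo_alloc_port are hypothetical and do not mirror
Xen's struct evtchn or its allocator): a port that is closed but still
linked is skipped when looking for a free port.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port record; not Xen's struct evtchn. */
struct demo_port {
    bool in_use;   /* currently bound to something */
    bool linked;   /* still linked on an event queue ("busy") */
};

/*
 * Return the index of the first port that is neither in use nor busy,
 * or -1 if there is none. A closed-but-still-linked port is skipped.
 */
static int demo_alloc_port(const struct demo_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!ports[i].in_use && !ports[i].linked)
            return (int)i;
    return -1;
}
```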

Guests should still drain any queues for VCPUs that are being
offlined, or the port will become unusable until the VCPU is onlined
and starts processing events again.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/HVM: XSETBV intercept needs to check CPL on SVM only

VMX doesn't need a software CPL check on the XSETBV intercept, and
SVM can do that check without resorting to hvm_get_segment_register().

Clean up what is left of hvm_handle_xsetbv(), namely make it return a
proper error code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vmx: enable PML by default

Since the PML series was merged (but disabled by default) we have conducted lots of
PML tests (live migration, GUI display) and PML has been working fine; therefore
turn it on by default.

The PML command line documentation is adjusted accordingly as well.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Tested-by: Robert Hu <robert.hu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/ept: remove unnecessary sync after resolving misconfigured entries

When using EPT, type changes are done with the following steps:

1. Set entry as invalid (misconfigured) by setting a reserved memory
type.

2. Flush all EPT and combined translations (ept_sync_domain()).

3. Fixup misconfigured entries as required (on EPT_MISCONFIG vmexits or
when explicitly setting an entry).

Since resolve_misconfig() only updates entries that were misconfigured,
there is no need to invalidate any translations since the hardware
does not cache misconfigured translations (vol 3, section 28.3.2).

Remove the unnecessary (and very expensive) ept_sync_domain() calls.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

libxc: refactor memory allocation functions

There were some problems with the original memory allocation functions:
1. xc_dom_alloc_segment and xc_dom_alloc_pad ended up calling
xc_dom_chk_alloc_pages while xc_dom_alloc_page open-coded everything.
2. xc_dom_alloc_pad didn't call dom->allocate.

Refactor the code so that:
1. xc_dom_alloc_{segment,pad,page} end up calling
xc_dom_chk_alloc_pages.
2. xc_dom_chk_alloc_pages calls dom->allocate.

This way we avoid scattering dom->allocate over multiple locations and
open-coding.

Also change the return type of xc_dom_alloc_page to xen_pfn_t and return
an invalid pfn when xc_dom_chk_alloc_pages fails.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>

libxc: correct domain builder for 64 bit guest with 32 bit tools

Commit 8c45adec18e0512c3d34dcafb13414ecba21be6a ("create unmapped
initrd in domain builder if supported") introduced an error for
building a 64 bit guest with a 32 bit toolset.

The initrd start address and size were stored in an unsigned long
instead of using a 64 bit type.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxc: use correct return type for do_memory_op()

Currently do_memory_op() is returning int, while the hypervisor is
returning long. This will lead to incorrect return values as soon as
e.g. a pfn larger than about 2 billion (8 TB) is returned.

Use the correct long return type instead and correct the functions
expecting a pfn via the return value of do_memory_op().
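
The "2 billion (8 TB)" figure can be checked with a short sketch. It
assumes 4 KiB pages (DEMO_PAGE_SHIFT is a hypothetical constant) and uses
hypothetical helper names; it only shows where a value stops fitting in an
int, not how do_memory_op() itself is written.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* True if a hypercall-style return value survives narrowing to int. */
static bool demo_fits_in_int(int64_t value)
{
    return value >= INT_MIN && value <= INT_MAX;
}

/* Byte address corresponding to a page frame number. */
static uint64_t demo_pfn_to_bytes(uint64_t pfn)
{
    return pfn << DEMO_PAGE_SHIFT;
}
```

With a 32-bit int, the first pfn that no longer fits is 2^31, and
2^31 pages of 4 KiB each is 2^43 bytes, i.e. 8 TiB.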

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

ocaml/xc: add softreset shutdown reason

According to public/sched.h, there is a new shutdown_reason called
soft_reset. Propagate that value to ocaml.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>

libxl: implement libxl__xs_mknod using XS_WRITE rather than XS_MKDIR

This patch modifies the implementation of libxl__xs_mknod() to use XS_WRITE
rather than XS_MKDIR since passing an empty value to the former will
ensure that the path is both existent and empty upon return, rather than
merely existent. The function return type is also changed to a libxl
error value rather than a boolean, its declaration is accordingly moved
into the 'checked' section in libxl_internal.h, and a comment is added to
clarify its semantics.

This patch also contains a small whitespace fix in the definition of
libxl__xs_mknod() and the addition of 'ok' to CODING_STYLE as the
canonical variable name for holding return values from boolean functions.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: replace libxl__xs_mkdir() with libxl__xs_mknod()

This patch is purely cosmetic; it contains no functional change. A
change in the implementation of libxl__xs_mknod() will be made in a
subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

mwait_idle: Skylake Client Support

Skylake Client CPU idle Power states (C-states)
are similar to the previous generation, Broadwell.
However, Skylake does get its own table with updated
worst-case latency and average energy-break-even residency values.

Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit 493f133f47750aa5566fafa9403617e3f0506f8c]

mwait_idle: Skylake Client Support - updated

Addition of PC9 state, and minor tweaks to existing PC6 and PC8 states.

Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit 135919a3a80565070b9645009e65f73e72c661c0]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

drop unused __devexit{,data} and CONFIG_HOTPLUG

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Also CONFIG_HOTPLUG_CPU.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

drop empty __devinit annotation, and aliased __pminit

x86 is the only architecture which uses __devinit, and also has CONFIG_HOTPLUG
enabled, making the annotation empty.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

drop empty __devinitdata annotation

x86 is the only architecture which uses __devinitdata, and also has
CONFIG_HOTPLUG enabled, making the annotation empty.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

drop empty __cpuinitdata annotation

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

drop unused fastcall annotation

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86: properly macroize the two XRSTOR flavors

All they differ by is the REX64 prefix. Create a single macro covering
both, at once allowing to get rid of the disconnect between the current
partial macro and its two use sites.

No change in generated code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86: drop dummy input from alternative_{input,io}()

We don't need the claimed API compatibility. No change in generated
code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/cpu: introduce cpu_dev.c_early_init()

The name is chosen to be consistent with Linux. Doing this allows
early_intel_workaround() to be removed from common code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86: allow disabling the emulated local apic

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/vlapic: fixes for HVM code when running without a vlapic

The HVM related code (SVM, VMX) generally assumed that a local APIC is
always present. With the introduction of an HVM mode where the local APIC can
be removed, some broken code paths arose.

The SVM exit/resume paths unconditionally checked the state of the lapic,
which is wrong if it's been disabled by hardware. Fix this by adding the
necessary checks. On the VMX side, make sure we don't add mappings for a
local APIC if it's disabled.

In the generic vlapic code, add checks to prevent setting the TSC deadline
timer if the lapic is disabled, and also prevent trying to inject interrupts
from the PIC if the lapic is also disabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86: suppress bogus log message

The way we populate mpc_cpufeature is not compatible with modern CPUs,
and hence the message printed using that information is useless/bogus.
It's of interest only anyway when not using ACPI, so move it into MPS
parsing code. This at once significantly reduces boot time logging on
huge systems.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

HVM/save: allow the usage of zeroextend and a fixup function

With the current compat implementation in the save/restore context handling,
only one compat structure is allowed, and using _zeroextend prevents the
fixup function from being called.

In order to allow for the compat handling layer to be able to handle
different compat versions allow calling the fixup function with
hvm_load_entry_zeroextend.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

HVM/save: pass a size parameter to the HVM compat functions

In order to cope with types having multiple compat versions pass a size
parameter to the fixup function so we can identify which compat version
Xen is dealing with.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

build: fix dependencies for files compiled from their parent directory

The use of $(basename ...) here was wrong (yet I'm sure I tested it).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

MAINTAINERS: change the vt-d maintainer

add Feng as the new maintainer of VT-d stuff

Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

x86/viridian: flush remote tlbs by hypercall

The Microsoft Hypervisor Top Level Functional Spec. (section 3.4) defines
two bits in CPUID leaf 0x40000004:EAX for the hypervisor to recommend
whether or not to issue a hypercall for local or remote TLB flush.

Whilst it's doubtful whether using a hypercall for local TLB flush would
be any more efficient than a specific INVLPG VMEXIT, a remote TLB flush
may well be more efficiently done. This is because the alternative
mechanism is to IPI all the vCPUs in question which (in the absence of
APIC virtualisation) will require emulation and scheduling of the vCPUs
only to have them immediately VMEXIT for local TLB flush.

This patch therefore adds a viridian option which, if selected, enables
the hypercall for remote TLB flush and implements it using ASID
invalidation for targeted vCPUs followed by an IPI only to the set of
CPUs that happened to be running a targeted vCPU (which may be the empty
set). The flush may be more severe than requested since the hypercall can
request flush only for a specific address space (CR3) but Xen neither
keeps a mapping of ASID to guest CR3 nor allows invalidation of a specific
ASID, but on a host with contended CPUs performance is still likely to
be better than a more specific flush using IPIs.

The implementation of the patch introduces per-vCPU viridian_init() and
viridian_deinit() functions to allow a scratch cpumask to be allocated.
This avoids needing to put this potentially large data structure on stack
during hypercall processing. It also modifies the hypercall input and
output bit-fields to allow a check for the 'fast' calling convention,
and a white-space fix in the definition of HVMPV_feature_mask (to remove
hard tabs).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

public/event_channel.h: correct comment

According to the definition of structure evtchn_alloc_unbound,
there is an entry "domid_t remote_dom", not "rdom". So
use "remote_dom" in comments instead of "rdom".

Signed-off-by: Peng Fan <van.freenix@gmail.com>

x86/boot: check for not allowed sections before linking

Currently the check for disallowed sections is performed just after
compilation. However, if compilation succeeds and the check fails, then a
second build will create xen.gz/xen.efi without any visible error.
This happens because the %.o: %.c recipe created the object file during the
first run and make does not execute this recipe during the second run. So, look
for disallowed sections before linking. This way the check will be
executed every time.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

libxc: expose xsaves/xgetbv1/xsavec to hvm guest

This patch exposes xsaves/xgetbv1/xsavec to HVM guests.
The reserved bits of eax/ebx/ecx/edx must be cleared
when calling cpuid(0dh) with leaf 1 or 2..63.

According to the spec the following bits must be reserved:
For leaf 1, bits 03-04/08-31 of ecx are reserved. Edx is reserved.
For leaf 2...63, bits 01-31 of ecx are reserved. Edx is reserved.

But as no XSS features are currently supported, even in HVM guests,
for leaf 2...63, ecx should be zero at the moment.

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

x86/xsaves: enable xsaves/xrstors for hvm guest

This patch enables xsaves for HVM guests. It includes:
1. Handle xsaves VMCS init and vmexit.
2. Add logic to write/read the XSS MSR.

Add IA32_XSS_MSR save/restore support.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/xsaves: enable xsaves/xrstors/xsavec in xen

This patch uses xsaves/xrstors/xsavec instead of xsaveopt/xrstor
to perform the xsave_area switching so that xen itself
can benefit from them when available.

For xsaves/xrstors/xsavec only use the compact format. Add format conversion
support for use when performing guest OS migration. Also, PV guests will not
support xsaves/xrstors.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
[dropped redundant uses of XRSTOR_FIXUP and fix formatting]
Signed-off-by: Jan Beulich <jbeulich@suse.com>

x86/xsaves: using named operand instead numbered operand in xrstor

This is a prerequisite patch for a later xsaves patch. This patch introduces
a macro to handle the restore fixup, and also uses named operands instead of
numbered operands in the restore fixup code.

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
[with the expectation of later doing some cleanup:]
Acked-by: Jan Beulich <jbeulich@suse.com>

build: remove .d files from xen/ on a clean

Dependency files were getting left behind in the xen
directory (since 8b6ef9c152edceabecc7f90c811cd538a7b7a110),
so append the $(DEPS) to the clean rule that runs in the
hypervisor directory.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>

console: make printk() line continuation tracking per-CPU

This avoids cases where split messages (with other than the initial
part not carrying a log level; single line messages only of course)
issued on multiple CPUs interfere with each other, causing messages to
be issued which are supposed to be suppressed due to the log level
setting. E.g.

    CPU A                       CPU B
    XENLOG_G_DEBUG "abc"
                                XENLOG_G_DEBUG "def\n"
    "xyz\n"

would cause the last message to be logged despite this obviously not
being intended (at default log levels).
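
The interference can be modelled with a small sketch. Everything here is a
hypothetical model (struct demo_cont, demo_state, demo_emit), not Xen's
printk code, and it assumes that a chunk starting a new line without a
log-level prefix is always emitted.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_NR_CPUS 2

/* Per-CPU continuation state (hypothetical model, not Xen's printk code). */
struct demo_cont {
    bool mid_line;    /* previous chunk on this CPU did not end in '\n' */
    bool suppressed;  /* the line in progress is being filtered out */
};

static struct demo_cont demo_state[DEMO_NR_CPUS];

/*
 * Decide whether a chunk is emitted. has_level is false for chunks with no
 * log-level prefix; level_enabled says whether the prefix passes the filter.
 */
static bool demo_emit(unsigned int cpu, const char *chunk,
                      bool has_level, bool level_enabled)
{
    struct demo_cont *c = &demo_state[cpu];
    size_t len = strlen(chunk);
    bool emit;

    if (c->mid_line && !has_level)
        emit = !c->suppressed;      /* continuation inherits the decision */
    else
        emit = has_level ? level_enabled : true;  /* new line: judge afresh */

    c->suppressed = !emit;
    if (len)
        c->mid_line = chunk[len - 1] != '\n';
    return emit;
}
```

Feeding the three chunks from the example through separate slots for CPU A
and CPU B keeps "xyz\n" suppressed; feeding all three through a single
slot, which models the old shared tracking, lets "xyz\n" through.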

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Make clear that GICD_*SPI_* registers are reserved

Our vGIC emulation has GICD_TYPER.MBIS set to 0, which means that
GICD_*SPI_* registers are reserved. Implement them using the *_reserved
labels.

Also, implement these registers for the read part.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Don't implement write-only register read as zero

The result of reading a write-only register is unknown. Use a memorable value to
differentiate from an actual RAZ register.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Remove spurious return in GICR_INVALLR

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Emulate read to GICD_ICACTIVER<n>

The GICD_ICACTIVER<n> registers are missing in the read emulation of the
distributor.

Call the common emulation for the whole range.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic: Re-order the register emulations to match the memory map

It helps to find quickly whether we forgot to emulate a register or not.

At the same time add the missing reserved/implementation defined
registers. All other missing registers will be added in a follow-up if
necessary.

Note that only the distributor register map explicitly states the
size of a register (see 8.8 in ARM IHI 0069A). When the size is not
known, the implementation defined/reserved registers may not be emulated
correctly.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Remove GICR_MOVALLR and GICR_MOVLPIR

The 2 registers are not described in the software spec (ARM IHI 0069A)
and their offsets are marked "implementation defined".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic: Properly emulate the full register

The offset in the emulation is expressed in bytes. As most of the registers
are 64/32 bits, they will span over multiple bytes.

However, the current emulation only cares about the first offset. This
will result in not properly emulating any access on the register with
any other offset.

Introduce new macros to help implement accesses to multiple bytes and
use them throughout the vGIC emulation.
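
The underlying check can be sketched as a range test. This is only an
illustration: the commit introduces macros, whereas the hypothetical
demo_offset_in_reg below is a plain function, and its name and signature
are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a byte offset falls anywhere inside the register that starts at
 * reg_base and is reg_bytes wide (4 for 32-bit, 8 for 64-bit registers).
 */
static bool demo_offset_in_reg(uint32_t offset, uint32_t reg_base,
                               uint32_t reg_bytes)
{
    return offset >= reg_base && offset - reg_base < reg_bytes;
}
```

Matching only on reg_base corresponds to the old behaviour; an access at
reg_base + 1, + 2 or + 3 of a 32-bit register would have been missed.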

Note that I didn't convert the reserved/implementation defined
registers. It will be done in a follow-up.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Only emulate identification registers required by the spec

Most of the identification registers space contains implementation
defined registers (see 8.1.13 in ARM IHI 0069A) and only GIC{D,R}_PIDR2
is required to be implemented.

Currently the emulation of those registers mimics the ARM implementation,
but it's untrue to say that we properly emulate such an implementation.

Keep only GIC{D,R}_PIDR2 implemented with the "implementation defined
bits" to zero and the ArchRev field (bits[7:4]) to 0x3 as we emulate a
GICv3.

Note that the emulation of the range wasn't valid anyway because the
registers are split in 2 sets (PIDR4-PIDR7 and PIDR0-PIDR2).

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Use the correct offset GICR_IGRPMODR0

The offset is 0x0D00 and not 0x0F80.

Also re-order the definition to keep all the definitions ordered.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v3: Don't try to emulate IROUTER which do not exist in the spec

The range of valid IROUTER<n> are n = 32 - 1019 (see 8.9.13 in IHI 0069A)
which correspond to the offset 0x6100-0x7FD8.

Other offsets are invalid and therefore should not be emulated.

Also remove the now unused label read_as_zero_64 and write_ignore_64.

Note that GICD_IROUTER is kept to accommodate the GICv3 driver, which has
been in part taken from Linux.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v2: Implement correctly ICFGR{0, 1} read-only

Each ITARGETSR register is 4-bytes wide and the offset is in bytes.

The current implementation computes the offset of ICFGR1 and ICFG2
wrongly, resulting in only the first 2 bytes of the ICFGR<n> range being
emulated as read-only. The rest will be treated as read-write.

For convenience introduce ITARGETSR1 and ITARGETSR2.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- typoes in commit message ]

xen/arm: vgic-v3: Support 32-bit access for 64-bit registers

Based on 8.1.3 (IHI 0069A), unless stated otherwise, the 64-bit registers
support both 32-bit and 64-bit accesses.

All the registers we properly emulate (i.e. not RAZ/WI) support 32-bit access.

For RAZ/WI, it also seems to be the case but I'm not 100% sure. Anyway,
emulating 32-bit access for them doesn't hurt. Note that we would need
some extra care when they are implemented (for instance GICR_PROPBASER).

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC register ...

and use them in the vGIC emulation.

The GIC registers may support different access sizes. Rather than open
coding the access for every register, provide a set of helpers to access
them.

The caller will have to call vgic_regN_* where N is the size of the
emulated registers.

The new helpers support any access size and expect the caller to
validate the access size supported by the emulated registers.
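
A minimal sketch of what such extract/update helpers might look like for a
64-bit register. The names demo_size_mask, demo_reg64_extract and
demo_reg64_update are hypothetical and are not the vgic_regN_* helpers
themselves; the sketch assumes that byte offset 0 is the least significant
byte and, like the helpers described above, leaves validation of the access
size to the caller.

```c
#include <stdint.h>

/* Mask covering `bytes` bytes (1, 2, 4 or 8). */
static uint64_t demo_size_mask(unsigned int bytes)
{
    return bytes >= 8 ? UINT64_MAX : (((uint64_t)1 << (8 * bytes)) - 1);
}

/* Read `bytes` bytes starting at byte `offset` within a 64-bit register. */
static uint64_t demo_reg64_extract(uint64_t reg, unsigned int offset,
                                   unsigned int bytes)
{
    return (reg >> (8 * (offset & 7))) & demo_size_mask(bytes);
}

/* Overwrite `bytes` bytes at byte `offset` with `val`; other bits are kept. */
static uint64_t demo_reg64_update(uint64_t reg, uint64_t val,
                                  unsigned int offset, unsigned int bytes)
{
    unsigned int shift = 8 * (offset & 7);
    uint64_t mask = demo_size_mask(bytes) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}
```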

Finally, take the opportunity to fix the coding style in the sections we are
modifying.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic: Optimize the way to store the target vCPU in the rank

Xen is currently directly storing the value of GICD_ITARGETSR register
(for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
emulation of the registers access very simple but makes the code to get
the target vCPU for a given vIRQ more complex.

While the target vCPU of a vIRQ is retrieved every time a vIRQ is
injected to the guest, the access to the register occurs less often.

So the data structure should be optimized for the most common case
rather than the inverse.

This patch introduces the usage of an array to store the target vCPU for
every interrupt in the rank. This will make the code to get the target
very quick. The emulation code will now have to generate the GICD_ITARGETSR
and GICD_IROUTER register for read access and split it to store in a
convenient way.
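
The trade-off can be sketched as follows. The layout is hypothetical
(struct demo_rank, demo_get_target and demo_read_itargetsr are illustrative
names, not Xen's struct vgic_irq_rank), and the sketch assumes GICv2-style
vCPU ids in the range 0..7 so that each target fits in one byte lane.

```c
#include <stdint.h>

#define DEMO_IRQS_PER_RANK 32

/* Hypothetical rank: one target vCPU id per interrupt, one byte each. */
struct demo_rank {
    uint8_t vcpu[DEMO_IRQS_PER_RANK];
};

/* Hot path: looking up the target is a single array read. */
static unsigned int demo_get_target(const struct demo_rank *r, unsigned int virq)
{
    return r->vcpu[virq % DEMO_IRQS_PER_RANK];
}

/*
 * Cold path: rebuild a 32-bit ITARGETSR-style value for four consecutive
 * interrupts. Each byte is a CPU mask with only the stored target's bit set.
 * Assumption: vCPU ids are 0..7 (the id is masked to stay within one byte).
 */
static uint32_t demo_read_itargetsr(const struct demo_rank *r, unsigned int first)
{
    uint32_t reg = 0;

    for (unsigned int i = 0; i < 4; i++)
        reg |= (UINT32_C(1) << (demo_get_target(r, first + i) & 7)) << (8 * i);
    return reg;
}
```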

With the new way to store the target vCPU, the structure vgic_irq_rank
is shrunk down from 320 bytes to 92 bytes. This is saving about 228
bytes of memory allocated separately per vCPU.

Note that with these changes, any read of those registers will list only
the target vCPU used by Xen. As the spec is not clear whether this is a
valid choice or not, OSes which have a different interpretation of the
spec (i.e. OSes which perform read-modify-write operations on these
registers) may not boot anymore on Xen. However, I think this is a fair
trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
with no SPIs) and a strict interpretation of the spec (though all the
cases are not clearly defined).

Furthermore, the implementation of the callback get_target_vcpu is now
exactly the same. Consolidate the implementation in the common vGIC code
and drop the callback.

Finally take the opportunity to fix coding style and replace "irq" by
"virq" to make clear that we are dealing with virtual IRQs in the sections we
are modifying.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0

The current implementation ignores the whole write if one of the fields is
0. However, based on the spec (4.3.12 IHI 0048B.b), 0 is a valid value
when:
    - The interrupt is not wired in the distributor. From the Xen
    point of view, it means that the corresponding bit is not set in
    d->arch.vgic.allocated_irqs.
    - The user wants to disable the IRQ forwarding in the distributor.
    I.e. the IRQ stays pending in the distributor and is never received
    by the guest.

Implementing the latter will require more work in Xen because we always
assume the interrupt is forwarded to a valid vCPU. So for now, ignore
any field where the value is 0.

The emulation of the write access of ITARGETSR has been reworked and
moved to a new function because it would have been difficult to
implement properly the behavior with the current code.

The new implementation is breaking the register in 4 distinct bytes. For
each byte, it will check the validity of the target list, find the new
target, migrate the interrupt and store the value if necessary.

In the new implementation there is nearly no distinction based on the
access size, to avoid having too many different paths, which are harder
to test.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
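
The per-byte handling described in the commit above can be sketched as
follows. This is only an illustration: store_itargetsr_sketch and its
parameters are hypothetical, picking the lowest set bit as the new
target is an assumption, and the interrupt migration step is reduced to
a comment.

```c
#include <stdint.h>

/*
 * Hypothetical helper: apply a 32-bit ITARGETSR write to the four
 * per-interrupt target slots it covers. 'valid_mask' has one bit set per
 * vCPU that exists in the domain. Returns the number of slots changed.
 */
static unsigned int store_itargetsr_sketch(uint8_t targets[4], uint32_t val,
                                           uint8_t valid_mask)
{
    unsigned int i, updated = 0;

    for (i = 0; i < 4; i++) {
        /* Keep only the bits that name an existing vCPU. */
        uint8_t list = (uint8_t)(val >> (8 * i)) & valid_mask;
        unsigned int vcpu = 0;

        if (list == 0)
            continue;          /* ignore this field only, not the whole write */

        while (!(list & 1)) {  /* assumption: lowest set bit is the target */
            list >>= 1;
            vcpu++;
        }

        if (targets[i] != vcpu) {
            /* A real implementation would migrate the interrupt here. */
            targets[i] = (uint8_t)vcpu;
            updated++;
        }
    }

    return updated;
}
```

Handling every access as up to four independent bytes is what keeps the
code nearly independent of the access size.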

xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR

During a store, the byte is always in the low part of the register (i.e.
[0:7]).

We are incorrectly masking the register by using a shift of the byte
offset in the ITARGETSR while the byte is always in r[0:7]. This will
result in a target list equal to 0 which is ignored by the emulation.

Because of that the guest will only be able to modify the first byte in
each ITARGETSR.

Furthermore, the body of the loop is retrieving the old target list
using the index of the byte.

To avoid modifying the loop too much, shift the byte stored to the correct
offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
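
A minimal sketch of the fix described in the commit above, next to one
reading of the faulty masking. Both helper names are hypothetical, and
the "buggy" variant is a reconstruction from the commit message, not a
copy of the original code.

```c
#include <stdint.h>

/*
 * Fixed behaviour: the byte of a 1-byte store arrives in bits [7:0], so
 * move it to the position it occupies inside the 32-bit ITARGETSR.
 */
static uint32_t itargetsr_byte_to_reg(uint32_t r, uint32_t offset)
{
    unsigned int byte = offset & 0x3;  /* byte index within the register */

    return (r & 0xffu) << (8 * byte);
}

/*
 * Assumed faulty behaviour: mask at the byte's position without moving
 * the data there first. For any byte index other than 0 this yields 0.
 */
static uint32_t itargetsr_byte_buggy(uint32_t r, uint32_t offset)
{
    unsigned int byte = offset & 0x3;

    return r & (0xffu << (8 * byte));
}
```

With the faulty variant only stores to byte 0 survive, which matches the
observation that the guest could modify only the first byte of each
ITARGETSR.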

xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only

Each ITARGETSR register is 4 bytes wide and the offset is in bytes.

The current implementation computes the end of the range wrongly, with
the result that only ITARGETSR{0,1} are emulated as read-only. The rest
will be treated as read-write.

As 8 registers should be read-only, the end of the range should be
ITARGETSR + (4 * 8) - 1.

For convenience introduce ITARGETSR7 and ITARGETSR8.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
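
The range arithmetic from the commit above can be checked with a small
sketch. The 0x800 offset is the GICv2 distributor offset of
GICD_ITARGETSR; the helper itargetsr_is_read_only is hypothetical and
only illustrates the corrected bounds.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICD_ITARGETSR   0x800u                      /* ITARGETSR0 */
#define GICD_ITARGETSR7  (GICD_ITARGETSR + 4u * 7u)  /* 0x81c, last read-only */
#define GICD_ITARGETSR8  (GICD_ITARGETSR + 4u * 8u)  /* 0x820, first read-write */

/* True if the byte offset falls inside ITARGETSR0 - ITARGETSR7. */
static bool itargetsr_is_read_only(uint32_t offset)
{
    /* Last read-only byte: ITARGETSR + (4 * 8) - 1 = 0x81f. */
    return offset >= GICD_ITARGETSR && offset <= GICD_ITARGETSR8 - 1;
}
```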

xen/arm: move ticks conversions function declarations to the header file

This is just a cleanup, not required at the moment.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

arm: export platform_op XENPF_settime64

Call update_domain_wallclock_time at domain initialization.
Set time_offset_seconds to the number of seconds between physical boot
and domain initialization: it is going to be used to get/set the
wallclock time.
Add time_offset_seconds to system_time before calling do_settime,
so that system_time actually accounts for all the time in nsec between
machine boot and when the wallclock was set.

Expose xsm_platform_op to ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: dgdegra@tycho.nsa.gov

xen: move wallclock functions from x86 to common

Remove dummy arm implementation of wallclock_time.
Use shared_info() in common code rather than x86-ism to access it, when
possible.

Define the static variable wc_sec, and the local variable sec in
update_domain_wallclock_time, as uint64_t instead of unsigned long, to
avoid size issue on arm.
Take a uint64_t sec parameter in do_settime for the same reason.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
[ ijc -- typoes in commit message ]

x86/VPMU: return correct fixed PMC count

Fixes a register typo.

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

public/io/netif.h: tidy up and remove duplicate comments

Now that requests and response types and extra info segments are
documented in block comments, we can get rid of the inline comments
in the structures. This has the happy side-effect of making the Linux
checkpatch.pl script make fewer complaints after import.

This patch also fixes a small whitespace issue in the initial boiler-
plate comment, and a typo in one of the ascii-art diagrams.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

public/io/netif.h: add definition of gso_prefix flag

This flag is defined here only for compatibility with the Linux variant of
this header. The feature has never been documented and should be
considered deprecated.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

public/io/netif.h: document the reality of netif_rx_request/reponse

Because GSO metadata is passed from backend to frontend using
netif_extra_info segments, which do not carry information stating which
netif_rx_request_t was consumed to free up their slot, frontends must
assume some form of identity relation between ring slot and request.
Hence, so that it is able to use GSO metadata, Linux netfront simply
assumes rx responses appear in the same ring slot as their corresponding
request.

This patch documents the assumption made by Linux netfront and the
necessity of the assumption (to support GSO) so that backends are coded
to be compatible.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

x86/VPMU: Initialize VPMU's lvtpc vector

If a guest sets up performance counters so that they can generate
a PMC interrupt but does not initialize the APIC LVTPC register, the
resulting interrupt will cause an APIC error.

Note that a guest deciding to clear LVTPC in order to induce the error
will not be successful in achieving its goal: emulation code only
looks at the mask bit and always sets the vector to PMU_APIC_VECTOR.
Only the initial value of LVTPC (which is zero) that gets loaded into
APIC as result of PMC initialization is the problem.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/vPMU: document as unsupported

This is XSA-163.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

x86/kexec: hide more kexec infrastructure behind CONFIG_KEXEC

Experimenting with the kconfig series showed that various bits of kexec
infrastructure were still being unconditionally included. Make them
conditional on CONFIG_KEXEC.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>

x86: drop MAX_APICID

It's unused and wrong (we already have MAX_LOCAL_APIC and MAX_APICS).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxl: fix line wrapping issues introduced by automatic replacement

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

libxl: convert libxl__sprintf(gc) to GCSPRINTF

The rune used is:

sed -i 's/libxl__sprintf(gc,\s*/GCSPRINTF(/g' libxl*.c

This rune is simple and better than trying to match every possible
pattern.

Two instances in libxl_dm.c need fixing up. They are in fact better to just
use libxl__strdup.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

tools/hotplug: quote all variables in vif-bridge

Cosmetics: most of the variables used in vif-bridge are already quoted.
Add quoting also to the remaining shell variables.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

docs: Introduce xenstore paths for guest network address information

It is useful for a toolstack to be able to see the network addresses
in use by a domain for a particular vif in xenstore for display
purposes and, for example, so that a VNC session can be established
to the guest GUI.

This patch documents paths to allow a domain to advertise an interface
name, MAC (unicast and multicast) and IP (version 4 and 6) address
information.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

docs: Introduce xenstore paths for hotplug features

Without some indication from a guest it is not possible for a
toolstack to know whether instantiation of a new vbd or vif should
result in a new PV device of the appropriate type being brought online.
(In other words whether guest PV drivers are present and functioning).

This patch documents two paths which vif and vbd frontend drivers can
use to advertise their ability to respond to new vif or vbd
instantiations.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

docs: Introduce xenstore paths for PV driver information

For domain management purposes it is convenient to be able to see
information about PV drivers in xenstore. The XAPI toolstack in
XenServer has always created a ~/drivers path for this purpose.

This patch documents that path and also adds a specification of how
it should be used.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

docs: Introduce xenstore paths for PV control features

XenServer already makes use of ~/control/feature-suspend being written
to advertise guest capability of responding to 'suspend' when written to
~/control/shutdown and, since they are derived from XenServer drivers,
the Xen Project Windows PV drivers attempt to write this value. The write
currently fails for libxl provisioned VMs because ~/control is read-only
to the guest (only ~/control/shutdown is writable, for acknowledgement
purposes).

This patch documents feature-suspend and also a set of similar control
feature flags, so that they may be added to libxl provisioned
guests by subsequent patches:

feature-poweroff: PV drivers/agent can shut down the guest
feature-reboot: PV drivers/agent can reboot the guest
feature-s3: PV drivers/agent can trigger guest sleep (HVM only)
feature-s4: PV drivers/agent can trigger guest hibernate (HVM only)

The patch (because it adds features relating to S3 and S4 power states)
also clarifies that the initial set of platform properties mentioned are
booleans, and updates the specifier accordingly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

get_maintainer: fix perl 5.22/5.24 deprecated/incompatible "\C" use

Perl 5.22 emits a deprecated message when "\C" is used in a regex. Perl
5.24 will disallow it altogether.

Fix it by using [A-Z] instead of \C.

[ Upstream commit ce8155f7a3d59ce868ea16d8891edda4d865e873 ]

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

tools/libxl: Drop dead code following calls to libxl__exec()

libxl__exec() doesn't ever return. Inform the compiler of this, and
remove all dead code.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: use masking operation instead of test_bit for MCSF bits

This is a follow-up to commit 90f2e2a307fc6a6258c39cc87b3b2bf9441c0fa7 "use
masking operation instead of test_bit for MCSF bits" where the ARM
changes were missing.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
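
For readers unfamiliar with the pattern, the sketch below shows what
testing a flag with a masking operation looks like. The MCSF_* names
are modelled on the multicall state flags the commit title refers to;
their exact definitions here, and the helper in_multicall_sketch, are
assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed layout: bit numbers plus the corresponding mask values. */
#define _MCSF_in_multicall   0
#define _MCSF_call_preempted 1
#define MCSF_in_multicall    (1UL << _MCSF_in_multicall)
#define MCSF_call_preempted  (1UL << _MCSF_call_preempted)

/*
 * Masking form: a plain AND on a value already held in a register,
 * instead of a bit-number lookup through a pointer as a test_bit()
 * style helper would do.
 */
static bool in_multicall_sketch(unsigned long flags)
{
    return (flags & MCSF_in_multicall) != 0;
}
```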

MAINTAINERS: mini-os patches should be copied to minios-devel

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: samuel.thibault@ens-lyon.org
Cc: stefano.stabellini@eu.citrix.com
Cc: minios-devel@lists.xenproject.org
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

MINIOS_UPSTREAM_REVISION Update

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Config.mk: Update SEABIOS_UPSTREAM_TAG to rel-1.9.0

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

sched: get rid of the per domain vCPU list in Credit2

As, currently, there is no reason to bother having
it and keeping it updated.

In fact, it is only used for dumping and changing
vCPUs parameters, but that can be achieved easily with
for_each_vcpu.

While there, improve alignment of comments, and
add a const qualifier to a pointer, making things
more consistent with what happens everywhere else
in the source file.

This also allows us to kill one of the remaining
FIXMEs in the code, which is always good.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

sched: get rid of the per domain vCPU list in RTDS

As, currently, there is no reason to bother having
it and keeping it updated.

In fact, it is only used for dumping and changing
vCPUs parameters, but that can be achieved easily with
for_each_vcpu.

While there, take care of the case when
XEN_DOMCTL_SCHEDOP_getinfo is called but no vCPUs have
been allocated yet (by returning the default scheduling
parameters).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

sched: better handle (not) inserting idle vCPUs in runqueues

Idle vCPUs are set to run immediately, as a part of their
own initialization, so we shouldn't even try to put them
in a runqueue. In fact, no scheduler does that, even when
asked to (that is rather explicit in Credit2 and RTDS, a
bit less evident in Credit1).

Let's make things look as follows:
- in generic code, explicitly avoid even trying to
insert idle vCPUs in runqueues;
- in specific schedulers' code, enforce that.

Note that, as csched_vcpu_insert() is no longer being
called, during boot (from sched_init_vcpu()) we can
safely avoid saving the flags when taking the runqueue
lock.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

sched: clarify use cases of schedule_cpu_switch()

schedule_cpu_switch() is meant to be only used for moving
pCPUs from a cpupool to no cpupool, and from there back
to a cpupool, *not* to move them directly from one cpupool
to another.

This is something inherent to the way the function is
implemented and called, but is not that clear, just by the
look of it.

Make it more evident by:
- adding commentary and ASSERT()s;
- update the cpupool per-CPU variable (mapping pCPUs to
pools) directly in schedule_cpu_switch(), rather than
in various places in cpupool.c.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

sched: fix locking for insert_vcpu() in credit1 and RTDS

The insert_vcpu() hook is handled with inconsistent locking.
In fact, schedule_cpu_switch() calls the hook with runqueue
lock held, while sched_move_domain() relies on the hook
implementations to take the lock themselves (and, since that
is not done in Credit1 and RTDS, such operation is not safe
in those cases).

This is fixed as follows:
- take the lock in the hook implementations, in specific
   schedulers' code;
- avoid calling insert_vcpu(), for the idle vCPU, in
   schedule_cpu_switch(). In fact, idle vCPUs are set to run
   immediately, and the various schedulers won't insert them
   in their runqueues anyway, even when explicitly asked to.

While there, still in schedule_cpu_switch(), locking with
_irq() is enough (there's no need to do *_irqsave()).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/HVM: type adjustments

- constify struct hvm_trap * function parameters
- width reduce and shuffle some struct hvm_trap members
- use bool_t for boolean fields struct hvm_function_table
- use unsigned for struct hvm_function_table's hap_capabilities field

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

VMX: fix/adjust trap injection

In the course of investigating the 4.1.6 backport issue of the XSA-156
patch I realized that #DB injection has always been broken, but with it
now getting always intercepted the problem has got worse: Documentation
clearly states that neither DR7.GD nor DebugCtl.LBR get cleared before
the intercept, so this is something we need to do before reflecting the
intercepted exception.

While adjusting this (and also with 4.1.6's strange use of
X86_EVENTTYPE_SW_EXCEPTION for #DB in mind) I further realized that
the special casing of individual vectors shouldn't be done for
software interrupts (resulting from INT $nn).

And then some code movement: Setting of CR2 for #PF can be done in the
same switch() statement (no need for a separate if()), and reading of
intr_info is better done close to the consumption of the variable
(allowing the compiler to generate better code / use fewer registers
for variables).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
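
The first point of the commit above, clearing DR7.GD and DebugCtl.LBR
before reflecting an intercepted #DB, amounts to two bit clears. The
sketch below shows them on a plain structure; dbg_state_sketch and
prepare_db_reflection are hypothetical names, and real code would
operate on the guest state held in the VMCS rather than on a struct.

```c
#include <stdint.h>

#define DR7_GD        (UINT64_C(1) << 13)  /* DR7.GD: general detect enable */
#define DEBUGCTL_LBR  (UINT64_C(1) << 0)   /* IA32_DEBUGCTL.LBR */

/* Hypothetical container for the two pieces of guest debug state. */
struct dbg_state_sketch {
    uint64_t dr7;
    uint64_t debugctl;
};

/*
 * Do what the CPU would have done when delivering #DB had it not been
 * intercepted: clear DR7.GD and DebugCtl.LBR, leave everything else.
 */
static void prepare_db_reflection(struct dbg_state_sketch *s)
{
    s->dr7 &= ~DR7_GD;
    s->debugctl &= ~DEBUGCTL_LBR;
}
```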