xenbits.xensource.com Git - people/dariof/xen.git/log
(refs: rel/sched/misc-credit1-credit2-optim, github/rel/sched/misc-credit1-credit2-optim, gitlab/rel/sched/misc-credit1-credit2-optim)
Dario Faggioli [Thu, 14 Sep 2017 01:46:35 +0000 (03:46 +0200)]
xen: sched: simplify (and speedup) checking soft-affinity

Whether or not a vCPU has an effective soft-affinity
(i.e., one with the power of actually affecting the
scheduling of the vCPU itself) rarely changes. Very,
very rarely, compared to how often we need to check
for it (basically, at every scheduling decision!).

That can be improved by storing in a per-vCPU flag
(it's actually a boolean field in struct vcpu) whether
or not, given what hard-affinity and soft-affinity
look like, soft-affinity should be taken into account
during scheduling decisions.

This saves some cpumask manipulations, which is nice,
considering how frequently they were being done. Note
that we can't get rid of 100% of the cpumask operations
involved in the check, because whether soft-affinity is
effective depends not only on the relationship between
the hard and soft-affinity masks of a vCPU, but also on
the online pCPUs and/or on which pCPUs are part of the
cpupool where the vCPU lives, and that's rather
impractical to store in a per-vCPU flag. Still, the
overhead is reduced to "just" one cpumask_subset() (and
only if the newly introduced flag is 'true')!

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
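A minimal sketch of the caching idea described above, assuming 64-bit masks in place of Xen's cpumask_t. The names (vcpu_sketch, set_affinity(), mask_subset(), mask_intersects()) are hypothetical, and the exact condition used for "effective" is an assumption, not necessarily Xen's code. The flag is recomputed only when affinity changes; the per-decision check then costs one subset test, and only when the flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-pCPU mask standing in for Xen's cpumask_t. */
typedef uint64_t mask_t;

static bool mask_subset(mask_t a, mask_t b)     { return (a & ~b) == 0; }
static bool mask_intersects(mask_t a, mask_t b) { return (a & b) != 0; }

/* Illustrative vCPU: only the fields this sketch needs. */
struct vcpu_sketch {
    mask_t hard_affinity;
    mask_t soft_affinity;
    bool   soft_aff_effective;   /* cached; recomputed on affinity change */
};

/* Called (rarely) whenever hard or soft affinity changes. */
static void set_affinity(struct vcpu_sketch *v, mask_t hard, mask_t soft)
{
    v->hard_affinity = hard;
    v->soft_affinity = soft;
    /* Assumed rule: soft-affinity can only matter if it restricts the
     * hard-affinity (hard is not a subset of soft) and overlaps it. */
    v->soft_aff_effective = !mask_subset(hard, soft) &&
                            mask_intersects(soft, hard);
}

/* Called at every scheduling decision. 'pool_cpus' models the online
 * pCPUs of the vCPU's cpupool, which cannot be cached per vCPU, so one
 * subset test remains, evaluated only if the cached flag is set. */
static bool has_soft_affinity(const struct vcpu_sketch *v, mask_t pool_cpus)
{
    return v->soft_aff_effective &&
           !mask_subset(pool_cpus, v->soft_affinity);
}
```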
Dario Faggioli [Wed, 13 Sep 2017 17:05:21 +0000 (19:05 +0200)]
xen: sched: improve checking soft-affinity

Whether or not a vCPU has an effective soft-affinity
(i.e., one with the power of actually affecting the
scheduling of the vCPU itself) is checked in a
helper function, called has_soft_affinity().

This function takes a custom cpumask as its third
parameter, for better flexibility, but that mask
differs from the vCPU's hard-affinity in only one
case. Getting rid of that parameter not only simplifies
the function, but also enables optimizing the soft
affinity check (which will happen in a subsequent
commit).

This commit, therefore, does that. It's mostly
mechanical, the only exception being _csched_cpu_pick()
(in Credit1 code).

Signed-off-by: Dario Faggioli <raistlin@linux.it>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
Changes from v2:
- fix potential spurious usage of the scratch space of the wrong cpu.

Dario Faggioli [Fri, 16 Jun 2017 15:48:37 +0000 (17:48 +0200)]
xen: sched: optimize exclusive pinning case (Credit1 & 2)

Exclusive pinning of vCPUs is used, sometimes, for
achieving the highest level of determinism, and the
least possible overhead, for the vCPUs in question.

Although static 1:1 pinning is not recommended for
general use cases, optimizing the tickling code (of
Credit1 and Credit2) is easy and cheap enough, so go
for it.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
Changes from last posting:
- none!

Changes from v2:
- deal with NULL being passed as the value of 'hard' to sched_set_affinity().

Changes from v1:
- use a flag during runtime, as suggested during review;
- make use of the affinity-change hook, introduced in previous patch.
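A sketch of how a tickling fast path for exclusively pinned vCPUs might look, assuming the "pinned to exactly one pCPU" fact is precomputed whenever affinity changes. Everything here (vcpu_sketch, update_pinning(), pick_cpus_to_tickle()) is hypothetical and much simpler than the real Credit1/Credit2 logic; __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <stdint.h>

typedef uint64_t mask_t;

/* Hypothetical per-vCPU state; 'pinned_cpu' is -1 unless the vCPU's
 * hard-affinity contains exactly one pCPU. */
struct vcpu_sketch {
    mask_t hard_affinity;
    int    pinned_cpu;
};

/* Recomputed only when affinity changes (e.g. from an affinity hook). */
static void update_pinning(struct vcpu_sketch *v)
{
    mask_t m = v->hard_affinity;
    /* Exactly one bit set: non-zero, and clearing the lowest set bit
     * leaves nothing. */
    v->pinned_cpu = (m != 0 && (m & (m - 1)) == 0) ? __builtin_ctzll(m) : -1;
}

/* Returns the mask of pCPUs to tickle for a waking vCPU, given the set
 * of idle pCPUs. Priority comparisons with busy pCPUs are omitted. */
static mask_t pick_cpus_to_tickle(const struct vcpu_sketch *v, mask_t idle)
{
    if (v->pinned_cpu >= 0) {
        /* Fast path: the vCPU can only ever run on one pCPU, so there is
         * nothing to search; tickle that pCPU if it is idle. */
        mask_t cpu = (mask_t)1 << v->pinned_cpu;
        return (idle & cpu) ? cpu : 0;
    }
    /* Slow path (greatly simplified): idle pCPUs in the hard-affinity. */
    return idle & v->hard_affinity;
}
```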

Dario Faggioli [Thu, 27 Jul 2017 14:54:06 +0000 (16:54 +0200)]
xen: sched: introduce 'adjust_affinity' hook.

For now, just as a way to give a scheduler a "heads up"
about the fact that the affinity changed.

This enables some optimizations, such as pre-computing
and storing (e.g., in flags) facts like a vcpu being
exclusively pinned to a pcpu, or having or not having a
soft affinity, i.e., conditions that, despite the fact
that they rarely change, are currently checked very
frequently, even in hot paths.

Note that, as we expect many scheduler-specific
implementations of the adjust_affinity hook to do
something with the per-scheduler vCPU private data,
this commit moves the calls to sched_set_affinity()
after that data is allocated (in sched_init_vcpu()).

Note also that this may, in future, turn out to be a
useful means for, e.g., having the schedulers vet, ack
or nack the changes themselves.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
Changes from last posting:
- rebased on staging.

Changes from v2:
- fix sched_set_affinity() so that it does not always use hard_affinity;
- move calls to sched_set_affinity() below per-scheduler vCPU data allocation.
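A minimal sketch of an optional per-scheduler hook invoked from generic affinity-setting code, assuming a NULL mask means "leave unchanged" (as the changelog note about NULL 'hard' suggests). The struct and function names are hypothetical stand-ins, not Xen's definitions. demo_adjust_affinity() shows a scheduler caching an "exclusively pinned" fact in its private data, which is why that data must be allocated before the hook first runs.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mask_t;

struct vcpu_sketch;

/* Hypothetical scheduler ops table; the hook is optional (may be NULL). */
struct scheduler_sketch {
    const char *name;
    void (*adjust_affinity)(const struct scheduler_sketch *ops,
                            struct vcpu_sketch *v,
                            const mask_t *hard, const mask_t *soft);
};

struct vcpu_sketch {
    mask_t hard_affinity, soft_affinity;
    void  *sched_priv;   /* per-scheduler data; must exist before the hook */
};

/* Generic code: give the scheduler a heads up, then update the masks.
 * A NULL 'hard' or 'soft' means that mask is left unchanged. */
static void sched_set_affinity(const struct scheduler_sketch *ops,
                               struct vcpu_sketch *v,
                               const mask_t *hard, const mask_t *soft)
{
    if (ops->adjust_affinity)
        ops->adjust_affinity(ops, v, hard, soft);
    if (hard)
        v->hard_affinity = *hard;
    if (soft)
        v->soft_affinity = *soft;
}

/* Example hook: record in sched_priv whether the new hard-affinity
 * pins the vCPU to a single pCPU. */
static void demo_adjust_affinity(const struct scheduler_sketch *ops,
                                 struct vcpu_sketch *v,
                                 const mask_t *hard, const mask_t *soft)
{
    (void)ops;
    (void)soft;
    if (hard && v->sched_priv) {
        int *pinned = v->sched_priv;
        *pinned = (*hard != 0 && (*hard & (*hard - 1)) == 0);
    }
}
```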

Andrew Cooper [Thu, 15 Mar 2018 16:15:45 +0000 (16:15 +0000)]
xen/x86: Implement enable_nmis() in C

I don't recall why I chose to implement this in assembly to begin with, but
it can happily live in a static inline instead, and only has two callers.

Doing so reduces the quantity of code in .text.entry.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 15 Mar 2018 11:56:40 +0000 (11:56 +0000)]
tools/libacpi: Drop useless print messages

Libraries have no business using stdout directly, and these messages have no
real value.  Dropping them removes the following output when building a PVH guest:

  [root@fusebot ~]# xl create shim.cfg
  Parsing config from shim.cfg
  S3 disabled
  S4 disabled
  CONV disabled
  [root@fusebot ~]#

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 15 Mar 2018 16:01:33 +0000 (17:01 +0100)]
x86emul: place test blobs in executable section

This allows the section contents to be disassembled without going
through any extra hoops, simplifying the analysis of problems in test
and/or emulation code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 15 Mar 2018 16:00:56 +0000 (17:00 +0100)]
x86emul: support 3DNow! insns

Yes, recent AMD CPUs don't support them anymore, but I think we should
nevertheless cope.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Liran Alon [Thu, 15 Mar 2018 15:59:52 +0000 (16:59 +0100)]
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR

According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."

Before this patch, the TMR bit was cleared on LAPIC EOI, which is not what
real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").

Behavior after this patch is aligned with both Intel SDM and KVM
implementation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
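The SDM rule quoted above can be modelled in a few lines. This is a hypothetical sketch (lapic_sketch, accept_irq() and eoi() are illustrative names), simplified so that eoi() is told which vector is being completed. The TMR bit is updated at IRR acceptance and only consulted, never cleared, at EOI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-vector LAPIC state, one bit per vector. */
struct lapic_sketch {
    uint32_t irr[8], isr[8], tmr[8];
};

static void set_bit256(uint32_t *r, unsigned v)   { r[v / 32] |=  (1u << (v % 32)); }
static void clear_bit256(uint32_t *r, unsigned v) { r[v / 32] &= ~(1u << (v % 32)); }
static bool test_bit256(const uint32_t *r, unsigned v)
{
    return r[v / 32] & (1u << (v % 32));
}

/* Acceptance into the IRR: the TMR bit is set for level-triggered and
 * cleared for edge-triggered interrupts. */
static void accept_irq(struct lapic_sketch *l, unsigned vec, bool level)
{
    set_bit256(l->irr, vec);
    if (level)
        set_bit256(l->tmr, vec);
    else
        clear_bit256(l->tmr, vec);
}

/* EOI: clear the in-service bit but leave the TMR alone. Returns true
 * if an EOI message must be broadcast to the I/O APICs. */
static bool eoi(struct lapic_sketch *l, unsigned vec)
{
    clear_bit256(l->isr, vec);
    return test_bit256(l->tmr, vec);
}
```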
Jan Beulich [Thu, 15 Mar 2018 11:45:30 +0000 (12:45 +0100)]
x86/VMX: don't risk corrupting host CR4

Instead of "syncing" the live value to what mmu_cr4_features has, make
sure vCPU-s run with the value most recently loaded into %cr4, such that
after the next VM exit we continue to run with the intended value rather
than a possibly stale one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 15 Mar 2018 11:44:24 +0000 (12:44 +0100)]
x86: ignore guest microcode loading attempts

The respective MSRs are write-only, and hence, as of 1f1d183d49 ("x86/HVM:
don't give the wrong impression of WRMSR succeeding"), attempts by guests
to write to them are no longer ignored. Restore the original behavior for
the two affected MSRs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
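A sketch of the restored behaviour, assuming the two affected MSRs are Intel's microcode update trigger (0x79) and AMD's patch loader (0xc0010020). The handler and its return codes are hypothetical, standing in for the real HVM and PV WRMSR paths.

```c
#include <stdint.h>

/* Assumed MSR numbers for the two microcode-loading MSRs. */
#define MSR_IA32_UCODE_WRITE  0x00000079u  /* Intel: IA32_BIOS_UPDT_TRIG */
#define MSR_AMD_PATCHLOADER   0xc0010020u  /* AMD patch loader */

enum wrmsr_result { WRMSR_OK, WRMSR_FAULT };

/* Hypothetical guest WRMSR handler: special-case the write-only
 * microcode MSRs and silently drop the write, where the generic path
 * would otherwise reject it. */
static enum wrmsr_result guest_wrmsr(uint32_t msr, uint64_t val)
{
    (void)val;
    switch (msr) {
    case MSR_IA32_UCODE_WRITE:
    case MSR_AMD_PATCHLOADER:
        return WRMSR_OK;      /* ignore guest microcode loading attempts */
    default:
        return WRMSR_FAULT;   /* stand-in for the generic handling */
    }
}
```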
Wei Liu [Wed, 14 Mar 2018 17:15:15 +0000 (17:15 +0000)]
Revert "tools: detect appropriate debug optimization level"

This reverts commit b43501451733193b265de30fd79a764363a2a473.

Due to the implementation of cc-option, the check is always true,
which means the build is broken with gcc versions that don't support -Og.

This patch can be reapplied once we have fixed cc-option.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Wed, 14 Mar 2018 15:00:14 +0000 (15:00 +0000)]
docs: Fix entry for the "usbdev" option

The xl.cfg man page documents a "devtype=hostdev" option, but xl only
understands "type=hostdev"; fix the manual to reflect the actual
implementation.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 14 Mar 2018 10:36:09 +0000 (10:36 +0000)]
x86/entry: Trivial nonfunctional fixes

 * Drop unnecessary size suffixes
 * The C pseudocode refers to a trap_info object, not trap_bounce.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 14 Mar 2018 10:48:36 +0000 (10:48 +0000)]
x86/pv: Fix guest crashes following f75b1a5247b "x86/pv: Drop int80_bounce from struct pv_vcpu"

The original init_int80_direct_trap() was in fact buggy; `int $0x80` is not an
exception.  This went unnoticed for years because int80_bounce and trap_bounce
were separate structures, until that commit combined them.

Exception handling is different to interrupt handling for PV guests.  By
reusing trap_bounce, the following corner case can occur:

 * Handle a guest `int $0x80` instruction.  Latches TBF_EXCEPTION into
   trap_bounce.
 * Handle an exception, which emulates to success (such as ptwr support),
   which leaves trap_bounce unmodified.
 * The exception exit path sees TBF_EXCEPTION set and re-injects the `int
   $0x80` a second time.

Drop the TBF_EXCEPTION from the int80 invocation, which matches the equivalent
logic from the syscall/sysenter paths.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
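The corner case can be replayed with a toy model of trap_bounce. The flag value, structure and functions below are hypothetical; the model assumes, as the description implies, that nothing clears the latched flag between the `int $0x80` delivery and the next exception exit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; Xen's real TBF_* constants may differ. */
#define TBF_EXCEPTION  (1u << 0)

struct trap_bounce_sketch {
    uint8_t  flags;
    uint16_t vector;
};

/* Handle a guest 'int $0x80'. It is a software interrupt, not an
 * exception, so the fixed path does not latch TBF_EXCEPTION. */
static void handle_int80(struct trap_bounce_sketch *tb, bool buggy)
{
    tb->vector = 0x80;
    tb->flags  = buggy ? TBF_EXCEPTION : 0;
}

/* An exception that emulates to success (e.g. ptwr): nothing needs
 * injecting, so trap_bounce is left untouched. */
static void handle_emulated_exception(struct trap_bounce_sketch *tb)
{
    (void)tb;
}

/* Exception exit path: inject whatever TBF_EXCEPTION says is pending.
 * Returns the injected vector, or -1 if nothing was injected. */
static int exception_exit(struct trap_bounce_sketch *tb)
{
    if (!(tb->flags & TBF_EXCEPTION))
        return -1;
    tb->flags &= ~TBF_EXCEPTION;
    return tb->vector;
}
```

With the buggy latch, the exception exit re-injects vector 0x80 a second time; with the fix, it injects nothing.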
Anthony PERARD [Tue, 13 Mar 2018 11:13:18 +0000 (11:13 +0000)]
libxl_qmp: Tell QEMU about live migration or snapshot

Since version 2.10, QEMU locks the disk images so that a second QEMU
instance will not try to open them. This would prevent live migration from
working correctly. A new parameter has been added to the QMP command
"xen-save-devices-state" in QEMU version 2.11, which allows unlocking the
disk image for a live migration while keeping it locked for a snapshot.

QEMU commit: 5d6c599fe1d69a1bf8c5c4d3c58be2b31cd625ad
"migration, xen: Fix block image lock issue on live migration"

The extra "live" parameter can only be used if QEMU knows about it, so
only add it if QEMU is recent enough.

struct libxl__domain_suspend_state now records whether the suspend
is part of a live migration.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
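A sketch of adding the "live" argument only when QEMU is new enough. The helper name and the plain snprintf()-based JSON are assumptions for illustration (libxl builds QMP commands differently), and the filename is assumed not to need JSON escaping.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Builds the QMP command text into 'buf'. 'qemu_has_live' would come
 * from a QEMU version check (>= 2.11 per the description above).
 * Returns the snprintf() result so callers can detect truncation. */
static int build_save_devices_state(char *buf, size_t len,
                                    const char *filename,
                                    bool qemu_has_live, bool live)
{
    if (qemu_has_live)
        return snprintf(buf, len,
                        "{\"execute\":\"xen-save-devices-state\","
                        "\"arguments\":{\"filename\":\"%s\",\"live\":%s}}",
                        filename, live ? "true" : "false");
    /* Older QEMU: omit "live" entirely, since it would not know it. */
    return snprintf(buf, len,
                    "{\"execute\":\"xen-save-devices-state\","
                    "\"arguments\":{\"filename\":\"%s\"}}",
                    filename);
}
```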
Anthony PERARD [Tue, 13 Mar 2018 11:13:17 +0000 (11:13 +0000)]
libxl: Add a version check of QEMU for QMP commands

On connection to QEMU via QMP, QEMU provides its version; store it
for later use.

Add a function qmp_qemu_check_version that can be used to check if QEMU
is new enough for certain functionality. This will be used in a moment.

As it's a static function, it is commented out until first use, which is
in the next patch.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
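A minimal version comparison of the kind described above, assuming the major/minor/micro triple from the QMP greeting has already been parsed and stored. qemu_version and qemu_version_at_least() are hypothetical and do not reproduce the signature of libxl's qmp_qemu_check_version.

```c
#include <stdbool.h>

/* Hypothetical stored version, as reported in the QMP greeting
 * ({"QMP": {"version": {"qemu": {"major": .., "minor": .., "micro": ..}}}}). */
struct qemu_version {
    int major, minor, micro;
};

/* True if 'v' is at least major.minor.micro, compared field by field
 * from the most significant one. */
static bool qemu_version_at_least(const struct qemu_version *v,
                                  int major, int minor, int micro)
{
    if (v->major != major)
        return v->major > major;
    if (v->minor != minor)
        return v->minor > minor;
    return v->micro >= micro;
}
```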
Wei Liu [Wed, 14 Mar 2018 11:02:31 +0000 (11:02 +0000)]
gitignore: ignore wrappers.c link for fuzzer

At the same time reorder the entries alphabetically.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 14 Mar 2018 11:09:24 +0000 (11:09 +0000)]
xl: remove apic option for PVH guests

XSA-256 forces the local APIC to always be enabled for PVH guests, so
ignore any apic option for PVH guests. Update the documentation
accordingly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
John Thomson [Wed, 14 Mar 2018 08:21:24 +0000 (18:21 +1000)]
tools: xenalyze.c fix format-truncation

With gcc optimization enabled by:
tools: detect appropriate debug optimization level
b43501451733193b265de30fd79a764363a2a473

-Wformat-truncation throws warnings

gcc version 7.3.0

xenalyze.c: In function 'find_symbol':
xenalyze.c:382:36: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
     snprintf(name, 128, "(%s +%llx)",
                                    ^
xenalyze.c:382:5: note: 'snprintf' output between 6 and 144 bytes into a destination of size 128
     snprintf(name, 128, "(%s +%llx)",
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              lastname, offset);
              ~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
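One common way to handle this warning (not necessarily the fix this commit applied) is to check snprintf()'s return value, which is the length the untruncated output would have had. As far as we understand GCC's documentation, level 1 of -Wformat-truncation does not warn about calls whose return value is used. format_symbol() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdio.h>

/* Formats "(symbol +offset)" into 'name'. A return value >= size means
 * the text was cut short, so report that instead of truncating
 * silently. */
static bool format_symbol(char *name, size_t size,
                          const char *lastname, unsigned long long offset)
{
    int n = snprintf(name, size, "(%s +%llx)", lastname, offset);
    return n >= 0 && (size_t)n < size;
}
```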
Doug Goldstein [Tue, 13 Mar 2018 16:25:29 +0000 (11:25 -0500)]
tools/xl: fix uninitialized variable in xl_vdispl

The code added in 7a48622a78a0b452e8afa55b8442c958abd226a7 could use rc
uninitialized in main_vdisplattach().

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Michael Young [Mon, 12 Mar 2018 18:49:29 +0000 (18:49 +0000)]
make xen ocaml safe-strings compliant

Xen built with ocaml 4.06 gives errors such as
Error: This expression has type bytes but an expression was
        expected of type string
because Bytes and safe-strings, which were introduced in 4.02, are the
default in 4.06.
This patch, which is partly by Richard W.M. Jones of Red Hat
(from https://bugzilla.redhat.com/show_bug.cgi?id=1526703),
fixes these issues.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Doug Goldstein [Tue, 13 Mar 2018 04:06:51 +0000 (23:06 -0500)]
tools: detect appropriate debug optimization level

When building debug, use -Og as the optimization level if it's available,
otherwise retain the use of -O0. -Og was added to GCC to enable all
optimizations that do not affect debugging while retaining full
debuggability.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Fri, 9 Mar 2018 11:04:18 +0000 (11:04 +0000)]
docs: Remove redundant qemu-xen-security document

All this information is now covered in SUPPORT.md.

Most of the emulated hardware is obvious, but a couple of the items are
worth pointing out specifically.

"xen_disk" is listed under "Blkback"

"...the PCI host bridge and the PIIX3 chipset...": This statement is
redundant -- the PCI host bridge is a part of the piix3 chipset, which
is listed as supported.

xenfb: The "graphics" side of "xenfb" is listed under "PV Framebuffer
(backend)", and the "input" side of "xenfb" (including both keyboard
and mouse) is listed under "PV Keyboard (backend)".

Backing storage image format is listed in the "Blkback" section.

Fix 'stdvga' spelling while we're here.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Fri, 9 Mar 2018 17:27:58 +0000 (17:27 +0000)]
SUPPORT.md: Specify support for various image formats

QEMU supports various image formats, but we only provide security
support for raw, qcow, qcow2, and vhd formats.

Rather than duplicate this information under the "x86/Emulated
storage" section, just refer to the "Blkback" section.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Fri, 9 Mar 2018 11:26:03 +0000 (11:26 +0000)]
SUPPORT.md: Clarify that the PV keyboard protocol includes mouse support

s/fo/fo; while we're here.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:33 +0000 (15:11 +0000)]
ARM: GICv2: fix GICH_V2_LR definitions

The bit definition for the CPUID mask in the GICv2 LR register was
wrong; fortunately, the current implementation does not use that bit.
Fix it up (it starts at bit 10, not bit 9) and clean up some
nearby definitions on the way.
This will be used by the new VGIC shortly.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
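A sketch of the GICv2 list-register layout relevant here, based on the GICv2 architecture (virtual ID in bits [9:0], SGI source CPU ID in bits [12:10], priority in [27:23], state in [29:28]). The macro and function names are illustrative, not Xen's GICH_V2_LR_* definitions.

```c
#include <stdint.h>

/* GICv2 GICH_LR<n> fields; names are illustrative. */
#define LR_VIRTUAL_ID_SHIFT  0
#define LR_VIRTUAL_ID_MASK   0x3ffu
#define LR_CPUID_SHIFT       10      /* SGI source CPU: bits [12:10] */
#define LR_CPUID_MASK        0x7u
#define LR_PRIORITY_SHIFT    23      /* priority[7:3] in bits [27:23] */
#define LR_PRIORITY_MASK     0x1fu
#define LR_STATE_SHIFT       28      /* [29:28]: 1 pending, 2 active */
#define LR_STATE_MASK        0x3u

/* Encode an SGI into a list register value. */
static uint32_t lr_encode_sgi(unsigned vid, unsigned src_cpu,
                              unsigned prio, unsigned state)
{
    return ((vid & LR_VIRTUAL_ID_MASK) << LR_VIRTUAL_ID_SHIFT) |
           ((src_cpu & LR_CPUID_MASK) << LR_CPUID_SHIFT) |
           (((prio >> 3) & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT) |
           ((state & LR_STATE_MASK) << LR_STATE_SHIFT);
}

/* Extract the SGI source CPU ID; with a shift of 9, bit 9 of the
 * virtual ID would leak into the result. */
static unsigned lr_cpuid(uint32_t lr)
{
    return (lr >> LR_CPUID_SHIFT) & LR_CPUID_MASK;
}
```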
Andre Przywara [Fri, 9 Mar 2018 15:11:32 +0000 (15:11 +0000)]
ARM: GICv3: poke_irq: make RWP optional

A GICv3 can be implemented in several parts that
communicate with each other (think multi-socket systems).
To make sure that critical settings have arrived at all endpoints, some
bits are tracked using the RWP bit in the GICD_CTLR register, which
signals whether a register write is still in progress.
However this only applies to *some* registers, namely the bits in the
GICD_ICENABLER (disabling interrupts) and some bits in the GICD_CTLR
register (cf. Arm IHI 0069D, 8.9.4: RWP, bit[31]).
But our gicv3_poke_irq() was always polling this bit before returning,
resulting in pointless MMIO reads for many registers.
Add an option to gicv3_poke_irq() to state whether we want to wait for
this bit and use it accordingly to match the spec.
Replace a "1 << " with a "1U << " on the way to fix potentially
undefined behaviour when the shift count evaluates to 31.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
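A simulated sketch of making the RWP wait optional. The register model, the MMIO read counter and poke_irq() are hypothetical; a caller would pass true for writes the spec says are tracked by RWP (such as GICD_ICENABLER) and false otherwise. GICD_CTLR_RWP also shows why "1U << 31" is preferred: shifting a signed 1 into bit 31 of a 32-bit int is undefined.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICD_CTLR_RWP  (1U << 31)   /* "1U": 1 << 31 on a signed int is UB */

/* Simulated distributor: CTLR plus one 32-bit bitmap register, with a
 * counter of MMIO reads so the cost of polling is visible. */
struct gicd_sim {
    volatile uint32_t ctlr;
    volatile uint32_t bitmap;
    unsigned mmio_reads;
};

static uint32_t gicd_read_ctlr(struct gicd_sim *d)
{
    d->mmio_reads++;
    return d->ctlr;
}

/* Set the bit for 'irq' and, only if requested, wait until the
 * distributor reports the write has propagated (RWP clear). */
static void poke_irq(struct gicd_sim *d, unsigned irq, bool wait_for_rwp)
{
    d->bitmap |= 1U << (irq % 32);
    if (wait_for_rwp)
        while (gicd_read_ctlr(d) & GICD_CTLR_RWP)
            ;   /* real code would relax the CPU and time out */
}
```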
Andre Przywara [Fri, 9 Mar 2018 15:11:31 +0000 (15:11 +0000)]
ARM: GICv2: introduce gicv2_poke_irq()

The GICv2 uses bitmaps spanning several MMIO registers for holding some
interrupt state. Similar to GICv3, add a poke helper function to set a bit
for a given irq_desc in one of those bitmaps.
At the moment there is only one use in gic-v2.c, but there will be more
coming soon.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
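The index arithmetic behind such a poke helper, as a hypothetical sketch over a simulated distributor register file. Each bank (for example GICD_ISENABLER<n>, assumed here at offset 0x100) holds 32 interrupts per 32-bit register. Because these banks are write-1-to-set or write-1-to-clear, writing just the one bit is enough and no read-modify-write is needed.

```c
#include <stdint.h>

/* Byte offset, within a bank, of the register holding 'irq', and the
 * bit for 'irq' inside that register. */
static uint32_t bank_reg_offset(unsigned irq) { return (irq / 32) * 4; }
static uint32_t bank_reg_bit(unsigned irq)    { return 1U << (irq % 32); }

/* Hypothetical poke helper over a simulated register file 'dist',
 * indexed in 32-bit words from the distributor base; 'bank' is the
 * byte offset of the bank (e.g. 0x100 for GICD_ISENABLER). */
static void poke_irq(uint32_t *dist, uint32_t bank, unsigned irq)
{
    dist[(bank + bank_reg_offset(irq)) / 4] = bank_reg_bit(irq);
}
```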
Andre Przywara [Fri, 9 Mar 2018 15:11:29 +0000 (15:11 +0000)]
ARM: GICv3: rename HYP interface definitions to use ICH_ prefix

On a GICv3 in non-compat mode the hypervisor interface is always
accessed via system registers. Those register names have an "ICH_" prefix
in the manual, to differentiate them from the MMIO registers. Also those
registers are mostly 64-bit (compared to the 32-bit GICv2 registers) and
use different bit assignments.
To make this obvious and to avoid clashes with double definitions using
the same names for actually different bits, let's change all GICv3
hypervisor interface registers to use the "ICH_" prefix from the manual.
This renames the definitions in gic_v3_defs.h and their usage in gic-v3.c
and is needed to allow co-existence of the GICv2 and GICv3 definitions
in the same file.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:28 +0000 (15:11 +0000)]
ARM: VGIC: Introduce gic_get_nr_lrs()

So far the number of list registers (LRs) a GIC implements is only
needed in the hardware-facing side of the VGIC code (gic-vgic.c).
The new VGIC will need this information in multiple places, so
export a function that returns the number.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:27 +0000 (15:11 +0000)]
ARM: VGIC: reorder prototypes in vgic.h

Currently vgic.h contains both prototypes used by Xen arch code outside
of the actual VGIC (for instance vgic_vcpu_inject_irq()), and prototypes
for functions used by the VGIC internally.
Group them to later allow an easy split with one #ifdef.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:26 +0000 (15:11 +0000)]
ARM: VGIC: carve out struct vgic_cpu and struct vgic_dist

Currently we describe the VGIC specific fields in a structure
*embedded* in struct arch_domain and struct arch_vcpu. These members
are, however, specific to the current VGIC implementation, and will
be substantially different in the future.
To allow coexistence of two implementations, move the definition of these
embedded structures into vgic.h, and just use the opaque type in the arch
specific structures.
This allows easy switching between different implementations later.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:25 +0000 (15:11 +0000)]
ARM: VGIC: change to level-IRQ compatible IRQ injection interface

At the moment vgic_vcpu_inject_irq() is the interface for Xen internal
code and virtual devices to inject IRQs into a guest. This interface has
two shortcomings:
1) It requires a VCPU pointer, which we may not know (and don't need!)
for shared interrupts. A second function (vgic_vcpu_inject_spi()) was
there to work around this issue.
2) This interface only really supports edge-triggered IRQs, which is
all the Xen VGIC emulates anyway. However, this needs to and will
change, so we need to add the desired level (high or low) to the
interface.
This replaces the existing injection call (taking a VCPU and an IRQ
parameter) with a new one, taking domain, VCPU, IRQ and level parameters.
The VCPU can be NULL in case we don't know and don't care.
We change all call sites to use this new interface. This still doesn't
give us the missing level IRQ handling, but at least prepares the callers
to do the right thing later automatically.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
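A hypothetical sketch of the new calling convention: domain, optional vCPU, IRQ number and line level. Private IRQs (below 32) require a vCPU, while shared ones accept NULL and are resolved through per-IRQ routing. The types and the 64-SPI limit are assumptions for illustration, and, like the current VGIC, the model only acts on a high level.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS 4

/* Minimal hypothetical model: SGIs/PPIs (< 32) are per-vCPU; SPIs
 * (32..95 here) are shared and routed to a target vCPU per IRQ. */
struct vcpu_sk { unsigned id; bool pending[32]; };
struct domain_sk {
    struct vcpu_sk vcpus[NR_VCPUS];
    unsigned spi_target[64];
    bool spi_pending[64];
};

/* New-style injection. Returns the target vCPU id, or -1 on error. */
static int vgic_inject_irq(struct domain_sk *d, struct vcpu_sk *v,
                           unsigned virq, bool level)
{
    if (virq < 32) {
        if (!v)
            return -1;              /* private IRQs need a vCPU */
        if (level)                  /* edge-only model: act on high level */
            v->pending[virq] = true;
        return (int)v->id;
    }
    if (virq >= 96)
        return -1;
    /* Shared IRQ: the caller may pass NULL; routing picks the target. */
    if (level)
        d->spi_pending[virq - 32] = true;
    return (int)d->spi_target[virq - 32];
}
```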
Andre Przywara [Fri, 9 Mar 2018 15:11:23 +0000 (15:11 +0000)]
ARM: VGIC: Adjust domain_max_vcpus() to be VGIC specific

domain_max_vcpus(), which is used by generic Xen code, returns the
maximum number of VCPUs for a domain, which on ARM is mostly limited by
the VGIC model emulated (a (v)GICv2 can only handle 8 CPUs).
Our current implementation lives in arch/arm/domain.c, but reaches into
VGIC internal data structures.
Move the actual functionality into vgic.c, and provide a shim in
domain.h, to keep this VGIC internal.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:22 +0000 (15:11 +0000)]
ARM: VGIC: Move gic_remove_from_lr_pending() prototype

The prototype for gic_remove_from_lr_pending() is the last function in
gic.h which references a VGIC data structure.
Move it over to vgic.h, so that we can remove the inclusion of vgic.h
from gic.h, and include vgic.h in asm/domain.h instead, where it is
actually needed.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:21 +0000 (15:11 +0000)]
ARM: VGIC: rename gic_inject() and gic_clear_lrs()

The two central functions to synchronise our emulated VGIC state with
the GIC hardware (the LRs, really) are named somewhat confusingly.
Rename them from gic_inject() to vgic_sync_to_lrs() and from
gic_clear_lrs() to vgic_sync_from_lrs(), to make the code more readable.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:20 +0000 (15:11 +0000)]
ARM: vGICv3: remove rdist_stride from VGIC structure

The last patch removed the usage of the hardware's redistributor-stride
value from our (Dom0) GICv3 emulation. This means we no longer need to
store this value in the VGIC data structure.
Remove that variable and all the code that handled it, and instead
simply always use the architected value.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
Andre Przywara [Fri, 9 Mar 2018 15:11:19 +0000 (15:11 +0000)]
ARM: vGICv3: always use architected redist stride

The redistributor-stride property in a GICv3 DT node is only there to
cover broken platforms where this value deviates from the architected one.
Since we emulate the GICv3 distributor even for Dom0, we don't need to
copy the broken behaviour. All the special handling for Dom0s using
GICv3 is just for using the hardware's memory map, which is unaffected
by the redistributor stride - it can never be smaller than the
architected two pages.
Remove the redistributor-stride property from Dom0's DT node and also
remove the code that tried to reuse the hardware value for Dom0's GICv3
emulation.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
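A sketch of the address arithmetic, assuming a single contiguous redistributor region and the architected GICv3 stride of two 64KiB frames (RD_base and SGI_base). The helper names are hypothetical.

```c
#include <stdint.h>

#define SZ_64K              0x10000ULL
/* Architected GICv3 redistributor: RD_base frame + SGI_base frame. */
#define GICV3_RDIST_STRIDE  (2 * SZ_64K)

/* Address of the redistributor for vCPU 'vcpu_id' in a single
 * contiguous region starting at 'base'. */
static uint64_t vrdist_base(uint64_t base, unsigned vcpu_id)
{
    return base + (uint64_t)vcpu_id * GICV3_RDIST_STRIDE;
}

/* How many redistributors fit in a region of 'size' bytes. */
static unsigned vrdist_count(uint64_t size)
{
    return (unsigned)(size / GICV3_RDIST_STRIDE);
}
```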
Andre Przywara [Fri, 9 Mar 2018 15:11:18 +0000 (15:11 +0000)]
ARM: GICv3: use hardware GICv3 redistributor values for Dom0

The code to generate the DT node or MADT table for Dom0 reaches into the
domain's vGIC structure to learn the number of redistributor regions and
their base addresses.
Since those values are copied from the hardware, we can just as well use
those hardware values directly when setting up the hardware domain.

This avoids having the hardware GIC code reference vGIC data structures.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:17 +0000 (15:11 +0000)]
ARM: vGICv3: clarify on GUEST_GICV3_RDIST_REGIONS symbol

Normally there is only one GICv3 redistributor region, and we use
that for DomU guests using a GICv3.
Explain the background in a comment and why we need to keep the number
of hardware regions for Dom0.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Fri, 9 Mar 2018 16:30:49 +0000 (17:30 +0100)]
cpufreq/ondemand: fix race while offlining CPU

Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.

Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).

Also adjust coding style in cpufreq_dbs_timer_resume().

Reported-by: Martin Cerveny <martin@c-home.cz>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Martin Cerveny <martin@c-home.cz>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
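A sketch of the tristate protocol with C11 atomics, assuming +1 means "timer may run", -1 means "handler in progress" and 0 means "stopped". The names are hypothetical and Xen's actual code uses its own primitives. The key property is that dbs_timer_exit() cannot complete while a handler started by the resume path is still running.

```c
#include <stdatomic.h>

/* Hypothetical per-CPU state; see the assumed meanings above. */
struct dbs_sketch {
    atomic_int stoppable;
    int handler_runs;
};

static void dbs_timer_handler(struct dbs_sketch *s) { s->handler_runs++; }

/* Resume path: invoke the handler only if not stopped, marking it as
 * in progress (-1) for the duration of the call. */
static void dbs_timer_resume(struct dbs_sketch *s)
{
    int expected = 1;
    if (atomic_compare_exchange_strong(&s->stoppable, &expected, -1)) {
        dbs_timer_handler(s);
        atomic_store(&s->stoppable, 1);
    }
}

/* Exit path (possibly on another CPU): wait while a handler runs, then
 * switch +1 to 0 so that later resumes do nothing. */
static void dbs_timer_exit(struct dbs_sketch *s)
{
    for (;;) {
        int cur = atomic_load(&s->stoppable);
        if (cur < 0)
            continue;               /* handler in progress: spin */
        if (cur == 0 ||
            atomic_compare_exchange_weak(&s->stoppable, &cur, 0))
            break;
    }
}
```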
Jan Beulich [Fri, 9 Mar 2018 16:29:45 +0000 (17:29 +0100)]
x86: improve MSR_SHADOW_GS accesses

Instead of using RDMSR/WRMSR, on fsgsbase-capable systems use a double
SWAPGS combined with RDGSBASE/WRGSBASE. This halves execution time for
a shadow GS update alone on my Haswell (and we have indications of
good performance improvements by this on Skylake too), while the win is
even higher when e.g. updating more than one base (as may and commonly
will happen in load_segments()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Fri, 9 Mar 2018 15:01:21 +0000 (15:01 +0000)]
x86/traps: Put idt_table[] back into .bss

c/s d1d6fc97d "x86/xpti: really hide almost all of Xen image" accidentally
moved idt_table[] from .bss to .data by virtue of using the page_aligned
section.  We also have .bss.page_aligned, so use that.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 9 Mar 2018 13:47:21 +0000 (06:47 -0700)]
x86emul/test: wrap libc functions with FPU save/restore code

Currently with the native tool chain on Debian Jessie ./test_x86_emulator
yields:

  Testing AVX2 256bit single native execution...okay
  Testing AVX2 256bit single 64-bit code sequence...[line 933] failed!

The bug is that libc's memcpy() in read() uses %xmm8 (specifically, in
__memcpy_sse2_unaligned()), which corrupts %ymm8 behind the back of the AVX2
test code.

Introduce wrappers (and machinery to forward calls to those wrappers)
saving/restoring FPU state around certain library calls.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 6 Mar 2018 13:42:36 +0000 (13:42 +0000)]
tests/x86emul: Helpers to save and restore FPU state

Introduce common helpers for saving and restoring FPU state.  During
emul_test_init(), calculate whether to use xsave or fxsave, and tweak the
existing mxcsr_mask logic to avoid using another large static buffer.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/alt: Drop explicit padding of origin sites
Andrew Cooper [Fri, 9 Feb 2018 14:33:59 +0000 (14:33 +0000)]
x86/alt: Drop explicit padding of origin sites

Now that the alternatives infrastructure can calculate the required padding
automatically, there is no need to hard code it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/alt: Support for automatic padding calculations
Andrew Cooper [Fri, 9 Feb 2018 12:47:58 +0000 (12:47 +0000)]
x86/alt: Support for automatic padding calculations

The correct amount of padding in an origin patch site can be calculated
automatically, based on the relative lengths of the replacements.

This requires a bit of trickery to calculate correctly, especially in the
ALTERNATIVE_2 case where a branchless max() calculation is needed.  The
calculation is further complicated because GAS's idea of true is -1 rather
than 1, which is why the extra negations are required.
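A sketch of the arithmetic, modelled in C. Because C comparisons yield 1 rather than GAS's -1, the hypothetical helper `gas_lt()` reproduces the GAS result so that the negations can be followed. `as_max()` and `alt_pad_len()` are illustrative names and may not match the real macros. Taking the padding as the longest replacement minus the origin length, clamped at zero, is an assumption about the exact rule.

```c
#include <stdint.h>

/* Models a GAS comparison: -1 when true, 0 when false (C would give 1). */
static int64_t gas_lt(int64_t a, int64_t b)
{
    return -(int64_t)(a < b);
}

/* Turn a GAS truth value (-1/0) into a C-style one (1/0). */
#define as_true(x) (-(x))

/* Branchless max(): the mask is all ones exactly when a < b. */
static int64_t as_max(int64_t a, int64_t b)
{
    return a ^ ((a ^ b) & -as_true(gas_lt(a, b)));
}

/* Padding needed at an origin site so either replacement fits (sketch). */
static int64_t alt_pad_len(int64_t orig_len, int64_t repl1_len,
                           int64_t repl2_len)
{
    return as_max(as_max(repl1_len, repl2_len) - orig_len, 0);
}
```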

Additionally, have apply_alternatives() attempt to optimise the padding nops.
This is complicated by the fact that we must not attempt to optimise nops over
an origin site which has already been modified.

To keep track of this, add a priv field to struct alt_instr, which gets
modified by apply_alternatives().  This method is used in preference to a
local variable in case we make multiple passes.  One extra requirement is that
alt_instr's referring to the same origin site must now be consecutive, but we
already have this property.
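The priv bookkeeping can be illustrated with a much-reduced structure. `struct alt_sketch`, `apply_sketch()` and the action codes are hypothetical; a real `apply_alternatives()` also copies replacement bytes and fixes up relative branches. The sketch relies on the consecutiveness property described above: walking back to the first entry for the same origin site gives a single place to record that the site has been modified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced alt_instr (not Xen's layout). */
struct alt_sketch {
    uintptr_t orig;     /* origin site address */
    bool      wanted;   /* does the CPU have the feature for this entry? */
    bool      priv;     /* set once the origin site has been modified */
};

enum alt_action { ALT_PATCHED, ALT_NOPS_OPTIMISED, ALT_SKIPPED };

/* Record, per entry, what a patching pass would do. */
static void apply_sketch(struct alt_sketch *start, size_t n,
                         enum alt_action *out)
{
    for ( size_t i = 0; i < n; i++ )
    {
        struct alt_sketch *a = &start[i], *base = a;

        /* Entries for one site are consecutive: find the first of them. */
        while ( base != start && base[-1].orig == a->orig )
            base--;

        if ( !a->wanted )
        {
            /* Never rewrite nops over a site that was already patched. */
            out[i] = base->priv ? ALT_SKIPPED : ALT_NOPS_OPTIMISED;
            continue;
        }

        out[i] = ALT_PATCHED;   /* real code copies the replacement here */
        base->priv = true;
    }
}
```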

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/asm: Remove opencoded uses of altinstruction_entry
Andrew Cooper [Fri, 9 Feb 2018 15:58:39 +0000 (15:58 +0000)]
x86/asm: Remove opencoded uses of altinstruction_entry

With future changes, altinstruction_entry is going to become more complicated
to use.  Furthermore, there are already ALTERNATIVE* macros which can be used
to avoid opencoding the creation of replacement information.

For ASM_STAC, ASM_CLAC and CR4_PV32_RESTORE, this means the removal of all
hardcoded label numbers.  For the cr4_pv32 alternatives, this means hardcoding
the extra space required in the original patch site, but the hardcoding will
be removed by a later patch.

No change to any functionality, but the handling of nops inside the original
patch sites is a bit different.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/alt: Clean up the assembly used to generate alternatives
Andrew Cooper [Fri, 9 Feb 2018 13:31:28 +0000 (13:31 +0000)]
x86/alt: Clean up the assembly used to generate alternatives

 * On the C side, switch to using local labels rather than hardcoded numbers.
 * Rename parameters and labels to be consistent with alt_instr names, and
   consistent between the C and asm versions.
 * On the asm side, factor some expressions out into macros to aid clarity.
 * Consistently declare section attributes.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/alt: Clean up struct alt_instr and its users
Andrew Cooper [Fri, 9 Feb 2018 13:31:28 +0000 (13:31 +0000)]
x86/alt: Clean up struct alt_instr and its users

 * Rename some fields for consistency and clarity, and use standard types.
 * Don't opencode the use of ALT_{ORIG,REPL}_PTR().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agox86/alt: Drop unused alternative infrastructure
Andrew Cooper [Fri, 9 Feb 2018 12:54:58 +0000 (12:54 +0000)]
x86/alt: Drop unused alternative infrastructure

ALTERNATIVE_3 is more complicated than ALTERNATIVE_2 when it comes to
calculating extra padding length, and we have no need for the complexity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/domain: Added debug safety in the domain_create() failure path
Andrew Cooper [Wed, 28 Feb 2018 14:02:41 +0000 (14:02 +0000)]
xen/domain: Added debug safety in the domain_create() failure path

Hitting the fail path with err = 0 causes callers to dereference a NULL
pointer, as 0 fails an IS_ERR() check.

All of the paths appear to be fine, but leave some logic to help catch stray
misuses.
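To see why err = 0 is dangerous, consider minimal ERR_PTR()-style helpers, modelled on the Linux/Xen convention in which only the top 4095 pointer values encode errors. These are stand-ins rather than Xen's headers; `domain_create_fail_path()` is hypothetical, and -EILSEQ is merely an illustrative choice of fallback error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins modelled on the Linux/Xen ERR_PTR() convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Only the top MAX_ERRNO addresses encode errors; NULL does not. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct domain;

/* Hypothetical fail path: never hand ERR_PTR(0), i.e. NULL, to callers. */
static struct domain *domain_create_fail_path(int err)
{
    /* A debug build would ASSERT(err < 0) here to catch stray paths. */
    if ( err == 0 )
        err = -EILSEQ;   /* release-build safety net (illustrative value) */

    return ERR_PTR(err);
}
```

Without the fallback, ERR_PTR(0) is NULL, IS_ERR() reports success, and a caller would go on to dereference the NULL pointer.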

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agosched/rt: Fix build after c/s c3715dd8fb766
Andrew Cooper [Thu, 8 Mar 2018 11:10:52 +0000 (11:10 +0000)]
sched/rt: Fix build after c/s c3715dd8fb766

Travis reports:

  sched_rt.c:241:30: error: unused function 'rt_dom' [-Werror,-Wunused-function]
  static inline struct rt_dom *rt_dom(const struct domain *dom)
                               ^
  1 error generated.

when compiling with Clang.  Drop the function.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
7 years agocommon/sched: Fix ARM build following c/s 340edc3902
Andrew Cooper [Wed, 7 Mar 2018 19:36:50 +0000 (19:36 +0000)]
common/sched: Fix ARM build following c/s 340edc3902

The OSSTest smoke tests report:

  sched_credit2.c: In function 'csched2_alloc_domdata':
  sched_credit2.c:3015:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
           return ERR_PTR(-ENOMEM);
           ^
  sched_credit2.c:3015:9: error: nested extern declaration of 'ERR_PTR' [-Werror=nested-externs]

As the ERR infrastructure is now part of the main scheduler interface, include it from xen/sched-if.h.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years agotools/xenstore: add libdl dependency to libxenstore
Juergen Gross [Wed, 7 Mar 2018 11:03:18 +0000 (12:03 +0100)]
tools/xenstore: add libdl dependency to libxenstore

Commit 448c03b3cbe14873ee63 ("tools/xenstore: try to get minimum thread
stack size for watch thread") added a dependency on libdl to
libxenstore.

Add the needed flags to LDLIBS_libxenstore and the pkg-config file of
libxenstore.

Fixes: 448c03b3cbe14873ee63
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/domain: Call sched_destroy_domain() in the domain_create() error path
Andrew Cooper [Wed, 28 Feb 2018 11:43:25 +0000 (11:43 +0000)]
xen/domain: Call sched_destroy_domain() in the domain_create() error path

If domain_create() fails, complete_domain_destroy() doesn't get called,
meaning that sched_destroy_domain() is missed.  In practice, this can only
fail because of exceptional late_hwdom_init() issues at the moment.

Make sched_destroy_domain() idempotent, and call it in the fail path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years agoxen/sched: Remove {init,destroy}_domain() interfaces
Andrew Cooper [Tue, 27 Feb 2018 16:48:19 +0000 (16:48 +0000)]
xen/sched: Remove {init,destroy}_domain() interfaces

These hooks have one single caller (sched_{init,destroy}_domain()
respectively) and are all identical (when implemented).

Previous changes have ensured that only real domains reach these functions, so
ASSERT() that system domains are not seen. Call sched_{alloc,free}_domdata()
directly, and handle d->sched_priv directly.

The net diffstat is:
  add/remove: 0/8 grow/shrink: 1/7 up/down: 7/-335 (-328)
  function                                     old     new   delta
  sched_destroy_domain                         130     137      +7
  sched_init_domain                            138     137      -1
  rt_dom_destroy                                 6       -      -6
  null_dom_destroy                               6       -      -6
  csched_dom_destroy                             9       -      -9
  csched2_dom_destroy                            9       -      -9
  sched_rtds_def                               264     248     -16
  sched_null_def                               264     248     -16
  sched_credit_def                             264     248     -16
  sched_credit2_def                            264     248     -16
  sched_arinc653_def                           264     248     -16
  ops                                          264     248     -16
  rt_dom_init                                   52       -     -52
  null_dom_init                                 52       -     -52
  csched_dom_init                               52       -     -52
  csched2_dom_init                              52       -     -52

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years agoxen/sched: Improvements to the {alloc,free}_domdata() interfaces
Andrew Cooper [Tue, 27 Feb 2018 16:48:19 +0000 (16:48 +0000)]
xen/sched: Improvements to the {alloc,free}_domdata() interfaces

The main purpose of this change is for the subsequent cleanup it enables, but
it stands on its own merits.

In principle, these hooks are optional, but the SCHED_OP() default (returning
NULL) aliases a memory allocation failure, which causes arinc653 to play the
dangerous game of passing its priv pointer back, and remembering not to
actually free it.

Redefine alloc_domdata to use ERR_PTR() for errors, NULL for nothing, and
non-NULL for a real allocation, which allows the hook to become properly
optional.  Redefine free_domdata to be idempotent.

For arinc653, this means the dummy hooks can be dropped entirely.  For the
other schedulers, this means returning ERR_PTR(-ENOMEM) instead of NULL for
memory allocation failures, and modifying the free hooks to cope with a NULL
pointer.  While making the alterations, drop some spurious casts to void *.

Introduce and use proper wrappers for sched_{alloc,free}_domdata().  These are
strictly better than SCHED_OP(), as the source code is visible to
grep/cscope/tags, the generated code is better, and there can be proper
per-hook defaults and checks.

Callers of the alloc hooks are switched to using IS_ERR(), rather than
checking for NULL.
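The redefined contract can be sketched as follows: ERR_PTR() for errors, NULL for "no per-domain data", non-NULL for a real allocation, and a free path that tolerates NULL. `struct sched_ops_sketch`, the `*_sketch()` wrappers and the demo hooks are hypothetical, not Xen's actual definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct domain;

/* Hypothetical cut-down scheduler ops table; both hooks are optional. */
struct sched_ops_sketch {
    void *(*alloc_domdata)(const struct domain *d);
    void (*free_domdata)(void *data);
};

/* A missing hook means "no per-domain data" (NULL), not an error. */
static void *sched_alloc_domdata_sketch(const struct sched_ops_sketch *ops,
                                        const struct domain *d)
{
    return ops->alloc_domdata ? ops->alloc_domdata(d) : NULL;
}

/* Freeing NULL (or having no hook) is a harmless no-op. */
static void sched_free_domdata_sketch(const struct sched_ops_sketch *ops,
                                      void *data)
{
    if ( ops->free_domdata )
        ops->free_domdata(data);
}

/* Demo hooks for a scheduler that does keep per-domain state. */
static void *demo_alloc(const struct domain *d)
{
    void *p = calloc(1, 64);

    (void)d;
    return p ? p : ERR_PTR(-ENOMEM);   /* errors via ERR_PTR(), not NULL */
}

static void demo_free(void *data)
{
    free(data);   /* free(NULL) is already a no-op */
}
```

Callers then need only `IS_ERR()` to detect failure, and a NULL result is a legitimate "nothing allocated".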

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years agoxen/credit2: Move repl_timer into struct csched2_dom
Andrew Cooper [Tue, 27 Feb 2018 16:48:19 +0000 (16:48 +0000)]
xen/credit2: Move repl_timer into struct csched2_dom

For exactly the same reason as 418ae6021d.  Having a separate allocation is
unnecessary and wasteful.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years agoxen/domain: Reduce the quantity of initialisation for system domains
Andrew Cooper [Tue, 27 Feb 2018 16:35:02 +0000 (16:35 +0000)]
xen/domain: Reduce the quantity of initialisation for system domains

 * System domains don't need watchdog initialisation or iomem/irq rangesets,
   and will not plausibly be a xenstore or hardware domain.
 * The idle domain doesn't need scheduler initialisation (and in particular,
   removing this path allows for substantial scheduler cleanup), and isn't
   liable to ever need late_hwdom_init().

Move all of these initialisations past the DOMCRF_dummy early exit, and into
non-idle paths.  rangeset_domain_initialise() remains because it makes no
allocations, but does initialise a linked list and spinlock.  The poolid
parameter can be dropped as sched_init_domain()'s parameter is now
unconditionally 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years agotools/libxc: Correct comment for normalise_pagetable()
Andrew Cooper [Wed, 7 Mar 2018 12:15:36 +0000 (12:15 +0000)]
tools/libxc: Correct comment for normalise_pagetable()

This is most likely a copy/paste mistake.

Reported-by: Bruno Alvisio <bruno.alvisio@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agovvmx: fixes after CR4 trapping optimizations
Roger Pau Monne [Fri, 2 Mar 2018 16:19:29 +0000 (16:19 +0000)]
vvmx: fixes after CR4 trapping optimizations

Commit 40681735502 didn't update the nested VMX code to take into
account the L1 CR4 host mask when a nested guest (L2) writes to CR4,
and thus the mask written to CR4_GUEST_HOST_MASK is likely not as
restrictive as it should be.

Also the VVMCS GUEST_CR4 value should be updated to match the
underlying value when syncing the VVMCS state.
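For background, VMX decides whether a guest write to CR4 causes a VM exit by comparing the new value with the CR4 read shadow on every bit set in the guest/host mask. A simplified sketch follows. The function names are hypothetical, and taking the union of the L0 and L1 masks is an assumption about what "as restrictive as it should be" requires, not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A guest MOV to CR4 exits if, for any bit set in the guest/host mask,
 * the new value differs from the read shadow.
 */
static bool cr4_write_exits(uint64_t mask, uint64_t shadow, uint64_t new_cr4)
{
    return ((new_cr4 ^ shadow) & mask) != 0;
}

/*
 * Simplified: while running L2, trap every bit that either L0 or L1
 * wants to own, so the hardware mask is the union of the two.
 */
static uint64_t nested_cr4_mask(uint64_t l0_mask, uint64_t l1_mask)
{
    return l0_mask | l1_mask;
}
```

With only L0's mask loaded, an L2 write to a bit owned solely by L1 would not exit, and L1 would never see it.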

Fixes: 40681735502 ("vmx/hap: optimize CR4 trapping")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agotools/xenstore: Don't link libxenstore against systemd
Andrew Cooper [Wed, 7 Mar 2018 11:13:19 +0000 (11:13 +0000)]
tools/xenstore: Don't link libxenstore against systemd

It is only xenstored which uses libsystemd.  Avoid having libxenstore pull
libsystemd into the address space of all of its users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/arm: disable CPUs with different dcache line sizes
Stefano Stabellini [Tue, 6 Mar 2018 19:29:23 +0000 (11:29 -0800)]
xen/arm: disable CPUs with different dcache line sizes

Even different cpus in big.LITTLE systems are expected to have the same
dcache line size. Unless the minimum of all dcache line sizes is used
across all cpu cores, cache coherency protocols can go wrong. Instead,
for now, just disable any cpu with a different dcache line size.

This check is not covered by the hmp-unsafe option, because even with
the correct scheduling and vcpu pinning in place, the system breaks if
dcache line sizes differ across cores. We don't believe it is a problem
for most big.LITTLE systems.

This patch moves the implementation of setup_cache to a static inline,
still setting dcache_line_bytes at the beginning of start_xen as
before.

In start_secondary we check that the dcache level 1 line sizes match,
otherwise we disable the cpu.
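A sketch of such a check, assuming the line size comes from CTR: its DminLine field, bits [19:16], gives log2 of the number of 4-byte words in the smallest data cache line. The function names are hypothetical and operate on register values passed in, rather than reading the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smallest D-cache line in bytes: 4 << CTR.DminLine (bits [19:16]). */
static unsigned int ctr_dcache_line_bytes(uint32_t ctr)
{
    return 4u << ((ctr >> 16) & 0xf);
}

/* Hypothetical secondary-CPU check: refuse a CPU whose line size differs. */
static bool secondary_cpu_ok(uint32_t boot_ctr, uint32_t this_ctr)
{
    return ctr_dcache_line_bytes(boot_ctr) == ctr_dcache_line_bytes(this_ctr);
}
```

For instance, a CTR value with DminLine = 4 describes 64-byte lines, and one with DminLine = 3 describes 32-byte lines; a CPU reporting the latter alongside a boot CPU reporting the former would be disabled.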

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: update the docs about heterogeneous computing
Stefano Stabellini [Tue, 6 Mar 2018 19:29:23 +0000 (11:29 -0800)]
xen/arm: update the docs about heterogeneous computing

Introduce a new document about big.LITTLE and update the documentation
of hmp-unsafe.

Also update the warning messages to point users to the docs.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: jbeulich@suse.com
CC: konrad.wilk@oracle.com
CC: tim@xen.org
CC: wei.liu2@citrix.com
CC: andrew.cooper3@citrix.com
CC: George.Dunlap@eu.citrix.com
CC: ian.jackson@eu.citrix.com
7 years agoxen/arm: set VPIDR based on the MIDR value of the underlying pCPU
Stefano Stabellini [Tue, 6 Mar 2018 19:29:23 +0000 (11:29 -0800)]
xen/arm: set VPIDR based on the MIDR value of the underlying pCPU

On big.LITTLE systems not all cores have the same MIDR. Instead of
storing only one VPIDR per domain, initialize it to the value of the
MIDR of the pCPU where the vCPU will run.

This way, assuming that the vCPU has been created with the right pCPU
affinity, the guest will be able to read the right VPIDR value, matching
the one of the physical cpu.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: read ACTLR on the pcpu where the vcpu will run
Stefano Stabellini [Tue, 6 Mar 2018 19:29:23 +0000 (11:29 -0800)]
xen/arm: read ACTLR on the pcpu where the vcpu will run

On big.LITTLE systems not all cores have the same ACTLR. Instead of
reading ACTLR and setting v->arch.actlr in vcpu_initialise, do it later
on the same pcpu where the vcpu will run.

This way, assuming that the vcpu has been created with the right pcpu
affinity, the guest will be able to read the right ACTLR value, matching
the one of the physical cpu.

Also move processor_vcpu_initialise(v) to continue_new_vcpu as it
can modify v->arch.actlr.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: make processor a per cpu variable
Stefano Stabellini [Tue, 6 Mar 2018 19:29:23 +0000 (11:29 -0800)]
xen/arm: make processor a per cpu variable

There can be processors of different kinds on a single system. Make
processor a per_cpu variable pointing to the right processor type for
each core.

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: Park CPUs with a MIDR different from the boot CPU.
Julien Grall [Tue, 6 Mar 2018 19:28:54 +0000 (11:28 -0800)]
xen/arm: Park CPUs with a MIDR different from the boot CPU.

Xen does not properly support big.LITTLE platforms. All vCPUs of a guest
will always have the MIDR of the boot CPU (see arch_domain_create).
At best the guest may see unreliable performance (vCPU switching between
big and LITTLE), at worst the guest will become unreliable or insecure.

This is becoming more apparent with branch predictor hardening in Linux,
because it targets specific kinds of CPU and may not work on other
CPUs.

For the time being, park any CPUs with a MIDR different from the boot
CPU. This will be revisited in the future once Xen gains understanding
of big.LITTLE.
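Parking can be reduced to a comparison against the boot CPU's MIDR. The macros and function below are hypothetical. The field layout (implementer in bits [31:24], part number in bits [15:4]) follows the Arm architecture. Comparing the full register, revision included, is an assumption based on the wording above.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR field extraction (Arm architectural layout). */
#define MIDR_IMPLEMENTER(m)  (((m) >> 24) & 0xffu)
#define MIDR_PARTNUM(m)      (((m) >> 4) & 0xfffu)

/* Hypothetical: park a secondary CPU unless its MIDR matches the boot CPU. */
static bool cpu_should_park(uint32_t boot_midr, uint32_t this_midr)
{
    return this_midr != boot_midr;
}
```

For example, 0x410FD034 decodes as implementer 0x41 (Arm) and part number 0xD03 (Cortex-A53).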

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00826.html

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Oleksandr Tyshchenkko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/arm: Read the dcache line size from CTR register
Stefano Stabellini [Tue, 6 Mar 2018 19:27:54 +0000 (11:27 -0800)]
xen/arm: Read the dcache line size from CTR register

See the corresponding Linux commit as reference:

  commit f91e2c3bd427239c198351f44814dd39db91afe0
  Author: Catalin Marinas <catalin.marinas@arm.com>
  Date:   Tue Dec 7 16:52:04 2010 +0100

      ARM: 6527/1: Use CTR instead of CCSIDR for the D-cache line size on ARMv7

      The current implementation of the dcache_line_size macro reads the L1
      cache size from the CCSIDR register. This, however, is not guaranteed to
      be the smallest cache line in the cache hierarchy. The patch changes to
      the macro to use the more architecturally correct CTR register.

      Reported-by: Kevin Sapp <ksapp@quicinc.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Also rename cacheline_bytes to dcache_line_bytes to clarify that it is
the minimum D-Cache line size.

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agox86: remove CR reads from exit-to-guest path
Jan Beulich [Tue, 6 Mar 2018 15:49:36 +0000 (16:49 +0100)]
x86: remove CR reads from exit-to-guest path

CR3 is - during normal operation - only ever loaded from v->arch.cr3,
so there's no need to read the actual control register. For CR4 we can
generally use the cached value on all synchronous entry and exit paths.
Drop the write_cr3 macro, as the two use sites are probably easier to
follow without its use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: slightly reduce Meltdown band-aid overhead
Jan Beulich [Tue, 6 Mar 2018 15:48:44 +0000 (16:48 +0100)]
x86: slightly reduce Meltdown band-aid overhead

I'm not sure why I didn't do this right away: By avoiding the use of
global PTEs in the cloned directmap, there's no need to fiddle with
CR4.PGE on any of the entry paths. Only the exit paths need to flush
global mappings.

The reduced flushing, however, requires that we now have interrupts off
on all entry paths until after the page table switch, so that flush IPIs
can't be serviced while on the restricted pagetables, leaving a window
where a potentially stale guest global mapping can be brought into the
TLB. Along those lines the "sync" IPI after L4 entry updates now needs
to become a real (and global) flush IPI, so that inside Xen we'll also
pick up such changes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agopv_console: remove unnecessary #ifdefs
Sergey Dyasli [Tue, 6 Mar 2018 15:47:34 +0000 (16:47 +0100)]
pv_console: remove unnecessary #ifdefs

The header for the PV console provides empty function definitions in the
!CONFIG_XEN_GUEST case specifically to avoid #ifdefs in the code that uses
them, keeping that code clean.

Unfortunately, during the release of shim-comet, the PV console functions
were wrapped in unnecessary #ifdef CONFIG_X86 blocks. Remove them.
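The pattern being relied upon looks roughly like this. The header layout is a sketch: `pv_console_init()` mirrors a real function name but its signature here is assumed, while `pv_console_rx()` and `console_poll()` are hypothetical.

```c
#include <stddef.h>

/* Define CONFIG_XEN_GUEST to mimic the Kconfig option being enabled. */
#ifdef CONFIG_XEN_GUEST
void pv_console_init(void);
size_t pv_console_rx(char *buf, size_t len);
#else
/* Empty stubs: callers need no #ifdef of their own. */
static inline void pv_console_init(void) {}
static inline size_t pv_console_rx(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;
}
#endif

/* Caller: identical source whether or not the option is enabled. */
static size_t console_poll(char *buf, size_t len)
{
    pv_console_init();
    return pv_console_rx(buf, len);
}
```

Wrapping the caller in an extra `#ifdef` adds nothing, because the stubs already compile to nothing when the option is off.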

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/xpti: don't map stack guard pages
Jan Beulich [Tue, 6 Mar 2018 15:46:57 +0000 (16:46 +0100)]
x86/xpti: don't map stack guard pages

Unlike for the main mappings, don't map them even in release builds,
as there are no huge page shattering concerns here.

Note that since we don't run on the restricted page tables while HVM
guests execute, the non-present mappings won't trigger the triple fault
issue AMD SVM is susceptible to with our current placement of STGI vs
TR loading.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/xpti: really hide almost all of Xen image
Jan Beulich [Tue, 6 Mar 2018 15:46:27 +0000 (16:46 +0100)]
x86/xpti: really hide almost all of Xen image

Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.

Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.

Moving cleanup_cpu_root_pgt() earlier is not strictly necessary for
functionality, but it is more logical this way, and it keeps cleanup
being done in the inverse order of setup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: move and rename XSTATE_*
Jan Beulich [Tue, 6 Mar 2018 15:45:43 +0000 (16:45 +0100)]
x86: move and rename XSTATE_*

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: support SWAPGS
Jan Beulich [Tue, 6 Mar 2018 15:44:03 +0000 (16:44 +0100)]
x86emul: support SWAPGS

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agotools: ARM: vGICv3: Avoid inserting optional DT properties
Andre Przywara [Mon, 5 Mar 2018 16:03:19 +0000 (16:03 +0000)]
tools: ARM: vGICv3: Avoid inserting optional DT properties

When creating a GICv3 devicetree node, we currently insert the
redistributor-stride and #redistributor-regions properties, with fixed
values which are actually the architected ones. Since those properties are
optional, and in the case of the stride only needed to cover for broken
platforms, we don't need to describe them if they don't differ from the
default values. This will always be the case for our constructed
DomU memory map.
So we drop those properties altogether and provide a clean and architected
GICv3 DT node for DomUs.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoPlease Welcome Julien, our new Committer
Stefano Stabellini [Thu, 1 Mar 2018 19:17:13 +0000 (11:17 -0800)]
Please Welcome Julien, our new Committer

In recognition of his expertise and commitment to Xen Project, please
join me in welcoming Julien among the Committers and REST Maintainers.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/pv: Drop int80_bounce from struct pv_vcpu
Andrew Cooper [Fri, 5 May 2017 16:01:47 +0000 (17:01 +0100)]
x86/pv: Drop int80_bounce from struct pv_vcpu

The int80_bounce field of struct pv_vcpu is a bit of an odd special case,
because it is a simple derivation of trap_ctxt[0x80], which is also stored.

It is also the only use of {compat_,}create_bounce_frame() which isn't
referencing the plain trap_bounce field of struct pv_vcpu.  (Altering this
property is the purpose of this patch.)

Remove the int80_bounce field entirely, along with init_int80_direct_trap(),
which in turn requires that the int80_direct_trap() path gain logic previously
contained in init_int80_direct_trap().

This does admittedly make the int80 fastpath slightly longer, but these few
instructions are in the noise compared to the architectural context switch
overhead, and it now matches the syscall/sysenter paths (which have far less
architectural overhead already).

No behavioural change from the guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/entry: Correct comparisons against boolean variables
Andrew Cooper [Wed, 21 Feb 2018 13:00:23 +0000 (13:00 +0000)]
x86/entry: Correct comparisons against boolean variables

The correct way to check a boolean is `cmpb $0` or `testb $0xff`, whereas a
lot of our entry code uses `testb $1`.  This will work in principle for values
which are really C _Bool types, but won't work for other integer types which
are intended to have boolean properties.

cmp is the more logical way of thinking about the operation, so adjust all
outstanding uses of `testb $1` against boolean values.  Changing test to cmp
changes the logical mnemonic of the following condition from 'zero' to
'equal', but the actual encoding remains the same.

No functional change, as all uses are real C _Bool types, and confirmed by
diffing the disassembly.
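The difference is easiest to see in C terms. The two hypothetical helpers model what each instruction tests: `testb $1` looks only at bit 0, while `cmpb $0` treats any non-zero byte as true. This matters for integer types that are used as booleans but may hold values such as 2.

```c
#include <stdbool.h>

/* Models `testb $1, flag`: only bit 0 is examined. */
static bool testb_1(unsigned char flag)
{
    return (flag & 1) != 0;
}

/* Models `cmpb $0, flag`: any non-zero byte counts as true. */
static bool cmpb_0(unsigned char flag)
{
    return flag != 0;
}
```

For a genuine C `_Bool`, which only ever holds 0 or 1, the two agree, which is why the change is not functional here.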

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/boot: Annotate the multiboot headers with size and type information
Andrew Cooper [Fri, 2 Mar 2018 17:45:52 +0000 (17:45 +0000)]
x86/boot: Annotate the multiboot headers with size and type information

This causes objdump not to try and disassemble the data.

While altering this area, switch to using .balign, and fill with 0xc2 to help
highlight the embedded padding (rather than having it filled with 0f 1f 40 00
which is a long nop).  Also, shorten the labels by stripping off the _start
suffix.

The end result is now:
  ffff82d080200000 <_start>:
  ffff82d080200000:       e9 af c1 1c 00          jmpq   ffff82d0803cc1b4 <__start>
  ffff82d080200005:       0f 1f 00                nopl   (%rax)

  ffff82d080200008 <multiboot1_header>:
  ffff82d080200008:       02 b0 ad 1b 03 00 00 00 fb 4f 52 e4 c2 c2 c2 c2     .........OR.....

  ffff82d080200018 <multiboot2_header>:
  ffff82d080200018:       d6 50 52 e8 00 00 00 00 88 00 00 00 a2 ae ad 17     .PR.............
  ffff82d080200028:       01 00 00 00 10 00 00 00 04 00 00 00 06 00 00 00     ................
  ffff82d080200038:       06 00 00 00 08 00 00 00 0a 00 01 00 18 00 00 00     ................
  ffff82d080200048:       00 00 20 00 ff ff ff ff 00 00 20 00 02 00 00 00     .. ....... .....
  ffff82d080200058:       04 00 01 00 0c 00 00 00 02 00 00 00 c2 c2 c2 c2     ................
  ffff82d080200068:       05 00 01 00 14 00 00 00 00 00 00 00 00 00 00 00     ................
  ffff82d080200078:       00 00 00 00 c2 c2 c2 c2 07 00 01 00 08 00 00 00     ................
  ffff82d080200088:       09 00 01 00 0c 00 00 00 5e c0 3c 00 c2 c2 c2 c2     ........^.<.....
  ffff82d080200098:       00 00 00 00 08 00 00 00                             ........

  ffff82d0802000a0 <__high_start>:
  ffff82d0802000a0:       0f 01 15 5f 8f 25 00    lgdt   0x258f5f(%rip)        # ffff82d080459006 <gdt_descr>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86emul: support XOP insns
Jan Beulich [Mon, 5 Mar 2018 15:23:52 +0000 (16:23 +0100)]
x86emul: support XOP insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: support AVX2 gather insns
Jan Beulich [Mon, 5 Mar 2018 15:23:06 +0000 (16:23 +0100)]
x86emul: support AVX2 gather insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: support most remaining AVX2 insns
Jan Beulich [Mon, 5 Mar 2018 15:21:49 +0000 (16:21 +0100)]
x86emul: support most remaining AVX2 insns

That is, those which are not equivalents of SSEn ones, with the exception
of the various gather operations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: extend vbroadcasts{s,d} to AVX2
Jan Beulich [Mon, 5 Mar 2018 15:20:46 +0000 (16:20 +0100)]
x86emul: extend vbroadcasts{s,d} to AVX2

These gain register forms now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/arm: domain_builder: irq sanity check logic fix
Stewart Hildebrand [Tue, 27 Feb 2018 15:15:55 +0000 (15:15 +0000)]
xen/arm: domain_builder: irq sanity check logic fix

Since commit "xen/arm: domain_build: Rework the way to allocate the
event channel interrupt", it is not possible for an irq to be both below 16
and greater than or equal to 32.

Also fix the reference to Linux documentation while we're at it.
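For reference, GIC interrupt IDs 0-15 are SGIs, 16-31 are PPIs and SPIs start at 32. The sketch below shows why a range check must join its two comparisons with ||. The names are hypothetical, and `buggy_out_of_range()` illustrates an && form, which can never be true.

```c
#include <stdbool.h>

/* GIC interrupt ID ranges: SGIs 0-15, PPIs 16-31, SPIs from 32. */
static bool is_ppi(unsigned int irq)
{
    return irq >= 16 && irq < 32;
}

/* Sanity check: reject anything outside the PPI range. */
static bool irq_out_of_ppi_range(unsigned int irq)
{
    return irq < 16 || irq >= 32;
}

/* With &&, no irq satisfies both halves, so this never fires. */
static bool buggy_out_of_range(unsigned int irq)
{
    return irq < 16 && irq >= 32;
}
```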

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
[Slightly rework the commit message]

7 years agoxen/arm: domain_build: Rework the way to allocate the event channel interrupt
Julien Grall [Tue, 27 Feb 2018 15:15:54 +0000 (15:15 +0000)]
xen/arm: domain_build: Rework the way to allocate the event channel interrupt

At the moment, a placeholder will be created in the device-tree for the
event channel information. Later in the domain construction, the
interrupt for the event channel upcall will be allocated and the
device-tree fixed up.

Looking at the code, the current split is not necessary because all the
PPIs used by the hardware domain will be registered by the time we create
the node in the device-tree.

From now on, mandate that all interrupts are registered before
acpi_prepare() and dtb_prepare(). This allows us to rework the event
channel code and remove one placeholder.

Note that this will also help to fix the BUG(...) condition in
set_interrupt_ppi, which is completely wrong; see the follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: domain_build: Prepare DTB/ACPI tables after specific mappings
Julien Grall [Tue, 27 Feb 2018 15:15:53 +0000 (15:15 +0000)]
xen/arm: domain_build: Prepare DTB/ACPI tables after specific mappings

A follow-up patch will require all interrupts routed to the hardware
domain to be registered before calling prepare_dtb/prepare_acpi.

At the moment, the platform-specific mappings (GIC and platform) do not
need to happen after the DTB/ACPI preparation, so it is fine to move them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agotools/xenstore: try to get minimum thread stack size for watch thread
Juergen Gross [Mon, 26 Feb 2018 08:46:12 +0000 (09:46 +0100)]
tools/xenstore: try to get minimum thread stack size for watch thread

When creating a pthread in xs_watch(), try to get the minimal stack size
needed for the thread from glibc instead of using a constant. This avoids
problems when the library is used in programs with large per-thread
memory.

Use dlsym() to get the pointer to __pthread_get_minstack() in order to
avoid linkage problems and fall back to the current constant size if
not found.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
7 years agox86: rename HAVE_GAS_* to HAVE_AS_*
Wei Liu [Fri, 2 Mar 2018 16:46:25 +0000 (16:46 +0000)]
x86: rename HAVE_GAS_* to HAVE_AS_*

Xen also uses clang's assembler when it is possible. Change the macro
names to not be GAS specific.

Patch produced with:

$ for f in `git grep HAVE_GAS_ | cut -d':' -f1`; \
    do sed -i 's/HAVE_GAS_/HAVE_AS_/g' $f; done

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86: invpcid support
Wei Liu [Fri, 2 Mar 2018 16:23:38 +0000 (16:23 +0000)]
x86: invpcid support

Provide the functions needed for different modes. Add cpu_has_invpcid.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agopublic: correct GNTTABOP_set_version comment
Jan Beulich [Fri, 2 Mar 2018 14:20:15 +0000 (15:20 +0100)]
public: correct GNTTABOP_set_version comment

Version changes are allowed any number of times. Simply re-use the
comment XTF has (thanks Andrew).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: guard more stack pages
Jan Beulich [Fri, 2 Mar 2018 14:19:28 +0000 (15:19 +0100)]
x86: guard more stack pages

There's no reason to keep the unused pages (of which there are actually
two; respective commentary also gets adjusted) mapped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/link: Don't merge .init.text and .init.data
Andrew Cooper [Sat, 11 Nov 2017 19:08:37 +0000 (19:08 +0000)]
x86/link: Don't merge .init.text and .init.data

c/s 1308f0170c merged .init.text and .init.data, because EFI might properly
write-protect r/o sections.

However, that change makes xen-syms unusable for disassembly analysis, in
particular for searching for indirect branches as part of the SP2/Spectre
mitigation series.

As the merging isn't necessary for ELF targets at all, make it conditional on
the EFI side of the build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agofuzz/x86_emulate: fix bounds for input size
Paul Semel [Fri, 23 Feb 2018 22:48:57 +0000 (23:48 +0100)]
fuzz/x86_emulate: fix bounds for input size

The maximum input size was set to INPUT_SIZE, which is actually the size
of the data array inside the fuzz_corpus structure, and so did not allow
the user (or AFL) to fill in the whole structure. Changing it to
sizeof(struct fuzz_corpus) corrects this problem.

Signed-off-by: Paul Semel <semelpaul@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: drop stale references to curl/xml2-config
Olaf Hering [Thu, 1 Mar 2018 14:26:32 +0000 (15:26 +0100)]
tools: drop stale references to curl/xml2-config

Curl and xml2 are not required anymore since 185bb58be3 ("tools: drop
libxen") removed their only user.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: run autogen.sh ]

7 years agolibxl: set channel devid when not provided by application
Jim Fehlig [Mon, 26 Feb 2018 18:28:39 +0000 (11:28 -0700)]
libxl: set channel devid when not provided by application

Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later
retrieve the libxl-produced devid. Indeed most devices are handled
this way in libvirt, channel devices included.

This works well when only one channel device is defined, but more
than one results in

qemu-system-i386: -chardev socket,id=libxl-channel-1,\
path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
Duplicate ID 'libxl-channel-1' for chardev

Besides the odd '-1' value in the id, multiple channels have the same
id, causing qemu to fail. A simple fix is to set an uninitialized
devid (-1) to the dev_num passed to libxl__init_console_from_channel().

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agolibxl: do not fail device removal if backend domain is gone
Marek Marczykowski-Górecki [Fri, 23 Feb 2018 20:00:41 +0000 (21:00 +0100)]
libxl: do not fail device removal if backend domain is gone

The backend domain may be destroyed independently; there is no
synchronization of libxl structures (including the /libxl tree) elsewhere.
The backend might also remove the device info from its backend xenstore
subtree on its own.

We have various cases (not a comprehensive list):

 - both frontend and backend operational: after setting
   be/state=XenbusStateClosing, the backend waits for frontend
   confirmation and responds with be/state=XenbusStateClosed; then libxl
   in dom0 removes the frontend entries and libxl in the backend domain
   (which may be the same) removes the backend entries
 - unresponsive backend/frontend: after a timeout, force=1 is used to remove
   frontend entries, instead of just setting
   be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
   If that times out too, remove both frontend and backend entries
 - backend gone, with this patch: no place for setting/waiting on
   be/state - go directly to removing frontend entries, without waiting
   for be/state=XenbusStateClosed (this is the difference vs force=1)

Without this patch the end result is similar (both frontend and backend
entries are removed), but in case the backend is gone:
 - libxl waits for be/state=XenbusStateClosed (and obviously times out)
 - the return value from the function signals an error, which confuses,
   for example, libvirt: it thinks the device removal failed, so the
   device is still there

If such a situation is detected, do not fail the removal, but finish the
cleanup of the frontend side and return 0.

This is just a workaround; the real fix should watch for the device
backend being removed (including backend domain destruction) and remove
the frontend at that time, and report such an event to higher-layer code
so that, for example, libvirt could synchronize its state.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>