Dario Faggioli [Thu, 22 Feb 2018 14:30:21 +0000 (15:30 +0100)]
xen/libxc: suppress direct access to Credit1's migration delay
Removes special purpose access to Credit1 vCPU
migration delay parameter.
This fixes a build breakage, occurring when Xen
is configured with SCHED_CREDIT=n.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
--- Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
Changes from v1:
* bumped the interface version, as requested.
Dario Faggioli [Mon, 19 Feb 2018 18:07:43 +0000 (19:07 +0100)]
tools: xenpm: continue to support {set,get}-vcpu-migration-delay
Now that it is possible to get and set the migration
delay via the SCHEDOP sysctl, use that in xenpm, instead
of the special purpose libxc interface (which will be
removed in a following commit).
The sysctl, however, requires a cpupool-id argument,
to know which scheduler it is operating on. In
this case, since we don't want to alter xenpm's command
line interface, we always use '0', which means xenpm
will always act on the default cpupool ('Pool-0').
From this commit on, `xenpm {set,get}-vcpu-migration-delay'
commands work again. But that is only for the sake of
backward compatibility, and their use is deprecated, in
favour of 'xl sched-credit -s [-c <poolid>] -m <delay>'.
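Roughly, the new xenpm flow looks like the sketch below, assuming the libxc sysctl wrapper is xc_sched_credit_params_get() taking a cpupool id and that the new field is called vcpu_migr_delay_us (names taken from the description above; the exact interface may differ):

    #include <stdio.h>
    #include <xenctrl.h>

    /* xenpm has no cpupool option, so always act on the default pool. */
    #define DEFAULT_CPUPOOL 0

    static int get_vcpu_migration_delay(xc_interface *xch, uint32_t *delay_us)
    {
        struct xen_sysctl_credit_schedule sparam;
        int rc = xc_sched_credit_params_get(xch, DEFAULT_CPUPOOL, &sparam);

        if ( rc == 0 )
            *delay_us = sparam.vcpu_migr_delay_us;
        else
            /* Warnings/errors go to stderr, keeping stdout clean for values. */
            fprintf(stderr, "failed to get migration delay (rc=%d)\n", rc);

        return rc;
    }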
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
--- Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes from v2:
* send the warning to stderr, rather than to stdout.
Dario Faggioli [Mon, 19 Feb 2018 18:03:31 +0000 (19:03 +0100)]
tools: libxl/xl: allow to get/set Credit1's vcpu_migration_delay
Make it possible to get and set a (Credit1) scheduler's
vCPU migration delay via the SCHEDOP sysctl, from both
libxl and xl (no change needed in libxc).
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
--- Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes from v2:
* drop a redundant scinfo->vcpu_migr_delay_us != 0 check
Changes from v1:
* add missing 'break', fix using wrong variable in xl_sched.c.
Dario Faggioli [Thu, 15 Feb 2018 14:53:50 +0000 (15:53 +0100)]
xen: sched/credit1: make vcpu_migration_delay per-cpupool
Right now, vCPU migration delay is controlled by
the vcpu_migration_delay boot parameter. This means
the same value will always be used for every instance
of Credit1, in any cpupool that will be created.
Also, in order to get and set such value, a special
purpose libxc interface is defined, and used by the
xenpm tool. And this is problematic if Xen is built
without Credit1 support.
This commit adds a vcpu_migr_delay field inside
struct csched_private, so that we can get/set the
migration delay independently for each Credit1 instance,
in different cpupools.
Getting and setting now happens via XEN_SYSCTL_SCHEDOP_*,
which is much better suited for this parameter.
The value of the boot time parameter is used for
initializing the vcpu_migr_delay field of the private
structure of all the scheduler instances, when they're
created.
While there, avoid reading NOW() and doing any s_time_t
operations in __csched_vcpu_is_cache_hot() when the
migration delay of a scheduler is zero (as it is by
default).
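A sketch of the resulting helper; field and function names follow the text above, and the details of the real patch may differ. Note how the short-circuit && avoids reading NOW() entirely when the per-scheduler delay is zero:

    static inline bool __csched_vcpu_is_cache_hot(const struct csched_private *prv,
                                                  const struct vcpu *v)
    {
        bool hot = prv->vcpu_migr_delay &&
                   (NOW() - v->runstate.state_entry_time) < prv->vcpu_migr_delay;

        if ( hot )
            SCHED_STAT_CRANK(vcpu_hot);

        return hot;
    }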
Finally, note that, from this commit on, using `xenpm
{set,get}-vcpu-migration-delay' will have no effect
any longer. A subsequent commit will re-enable it, for
the sake of backwards-compatibility.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes from v1:
* improved the changelog, as suggested;
* add the _US suffix to XEN_SYSCTL_CSCHED_MGR_DLY_MAX;
* add the _us suffix to vcpu_migration_delay too;
* fix wrong time conversions;
* drop redundant and wrong checks for [params]->vcpu_migration_delay to be 0.
Jan Beulich [Thu, 15 Mar 2018 11:45:30 +0000 (12:45 +0100)]
x86/VMX: don't risk corrupting host CR4
Instead of "syncing" the live value to what mmu_cr4_features has, make
sure vCPU-s run with the value most recently loaded into %cr4, such that
after the next VM exit we continue to run with the intended value rather
than a possibly stale one.
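The gist of the change, as a sketch (the exact spot in the VMCS setup code may differ):

    /* Populate the VMCS host CR4 field from the value most recently loaded
     * into %cr4 (Xen's cached copy), rather than from mmu_cr4_features. */
    __vmwrite(HOST_CR4, read_cr4());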
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 15 Mar 2018 11:44:24 +0000 (12:44 +0100)]
x86: ignore guest microcode loading attempts
The respective MSRs are write-only, and hence attempts by guests to
write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
impression of WRMSR succeeding") - no longer ignored. Restore the original
behavior for the two affected MSRs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 14 Mar 2018 15:00:14 +0000 (15:00 +0000)]
docs: Fix entry for the "usbdev" option
The man page for xl.cfg has the "devtype=hostdev" option, but xl only
understands "type=hostdev"; fix the manual to reflect the actual
implementation.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 14 Mar 2018 10:48:36 +0000 (10:48 +0000)]
x86/pv: Fix guest crashes following f75b1a5247b "x86/pv: Drop int80_bounce from struct pv_vcpu"
The original init_int80_direct_trap() was in fact buggy; `int $0x80` is not an
exception. This went unnoticed for years because int80_bounce and trap_bounce
were separate structures, but were combined by this change.
Exception handling is different to interrupt handling for PV guests. By
reusing trap_bounce, the following corner case can occur:
* Handle a guest `int $0x80` instruction. Latches TBF_EXCEPTION into
trap_bounce.
* Handle an exception, which emulates to success (such as ptwr support),
which leaves trap_bounce unmodified.
* The exception exit path sees TBF_EXCEPTION set and re-injects the `int
$0x80` a second time.
Drop the TBF_EXCEPTION from the int80 invocation, which matches the equivalent
logic from the syscall/sysenter paths.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Tue, 13 Mar 2018 11:13:18 +0000 (11:13 +0000)]
libxl_qmp: Tell QEMU about live migration or snapshot
Since version 2.10, QEMU locks its disk images so that a second QEMU
instance will not try to open them. This would prevent live migration from
working correctly. A new parameter has been added to the QMP command
"xen-save-devices-state" in QEMU version 2.11 which allows unlocking the
disk image for a live migration, while keeping it locked for a snapshot.
xenalyze.c: In function 'find_symbol':
xenalyze.c:382:36: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
snprintf(name, 128, "(%s +%llx)",
^
xenalyze.c:382:5: note: 'snprintf' output between 6 and 144 bytes into a destination of size 128
snprintf(name, 128, "(%s +%llx)",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lastname, offset);
~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
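One plausible shape of a fix for this class of warning is simply to size the destination for the worst case the compiler computed; this is an illustration only, not necessarily the exact xenalyze change:

    #include <stdio.h>

    /* Worst case per the diagnostic: "(" + symbol name + " +" + 16 hex
     * digits + ")" + NUL. */
    #define SYMBOL_BUF_SIZE 150

    static void format_symbol(char name[SYMBOL_BUF_SIZE], const char *lastname,
                              unsigned long long offset)
    {
        snprintf(name, SYMBOL_BUF_SIZE, "(%s +%llx)", lastname, offset);
    }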
Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Michael Young [Mon, 12 Mar 2018 18:49:29 +0000 (18:49 +0000)]
make xen ocaml safe-strings compliant
Xen built with ocaml 4.06 gives errors such as
Error: This expression has type bytes but an expression was
expected of type string
as Bytes and safe-strings, which were introduced in 4.02, are the
default in 4.06.
This patch, which is partly by Richard W.M. Jones of Red Hat
(from https://bugzilla.redhat.com/show_bug.cgi?id=1526703),
fixes these issues.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
When building debug, use -Og as the optimization level if it's available,
otherwise retain the use of -O0. -Og has been added by GCC to enable all
optimizations that do not affect debugging, while retaining full
debuggability.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Fri, 9 Mar 2018 11:04:18 +0000 (11:04 +0000)]
docs: Remove redundant qemu-xen-security document
All this information is now covered in SUPPORT.md.
Most of the emulated hardware is obvious; a couple of the items are
worth pointing out specifically.
"xen_disk" is listed under "Blkback"
"...the PCI host bridge and the PIIX3 chipset...": This statement is
redundant -- the PCI host bridge is a part of the piix3 chipset, which
is listed as supported.
xenfb: The "graphics" side of "xenfb" is listed under "PV Framebuffer
(backend)", and the "input" side of "xenfb" (including both keyboard
and mouse) is listed under "PV Keyboard (backend)".
Backing storage image format is listed in the "Blkback" section.
Fix 'stdvga' spelling while we're here.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:33 +0000 (15:11 +0000)]
ARM: GICv2: fix GICH_V2_LR definitions
The bit definition for the CPUID mask in the GICv2 LR register was
wrong; fortunately the current implementation does not use that bit.
Fix it up (it starts at bit 10, not bit 9) and clean up some
nearby definitions on the way.
This will be used by the new VGIC shortly.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:32 +0000 (15:11 +0000)]
ARM: GICv3: poke_irq: make RWP optional
A GICv3 hardware implementation can consist of several parts that
communicate with each other (think multi-socket systems).
To make sure that critical settings have arrived at all endpoints, some
bits are tracked using the RWP bit in the GICD_CTLR register, which
signals whether a register write is still in progress.
However this only applies to *some* registers, namely the bits in the
GICD_ICENABLER (disabling interrupts) and some bits in the GICD_CTLR
register (cf. Arm IHI 0069D, 8.9.4: RWP, bit[31]).
But our gicv3_poke_irq() was always polling this bit before returning,
resulting in pointless MMIO reads for many registers.
Add an option to gicv3_poke_irq() to state whether we want to wait for
this bit and use it accordingly to match the spec.
Replace a "1 << " with a "1U << " on the way to fix a potentially
undefined behaviour when the argument evaluates to 31.
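The resulting helper then looks roughly like this (a sketch using the accessors from gic-v3.c; the real function also has to select the redistributor base for local IRQs):

    static void gicv3_poke_irq(struct irq_desc *irqd, uint32_t offset,
                               bool wait_for_rwp)
    {
        uint32_t mask = 1U << (irqd->irq % 32);

        writel_relaxed(mask, GICD + offset + (irqd->irq / 32) * 4);

        if ( wait_for_rwp )
            /* Only architected for e.g. GICD_ICENABLER and GICD_CTLR bits. */
            gicv3_wait_for_rwp(irqd->irq);
    }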
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:31 +0000 (15:11 +0000)]
ARM: GICv2: introduce gicv2_poke_irq()
The GICv2 uses bitmaps spanning several MMIO registers for holding some
interrupt state. Similar to GICv3, add a poke helper function to set a bit
for a given irq_desc in one of those bitmaps.
At the moment there is only one use in gic-v2.c, but there will be more
coming soon.
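A sketch of the helper, mirroring the GICv3 one (writel_gicd() being the distributor write accessor in gic-v2.c):

    static void gicv2_poke_irq(struct irq_desc *irqd, uint32_t offset)
    {
        /* Set the bit for this IRQ in the bitmap register holding it. */
        writel_gicd(1U << (irqd->irq % 32),
                    offset + (irqd->irq / 32) * 4);
    }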
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:29 +0000 (15:11 +0000)]
ARM: GICv3: rename HYP interface definitions to use ICH_ prefix
On a GICv3 in non-compat mode the hypervisor interface is always
accessed via system registers. Those register names have a "ICH_" prefix
in the manual, to differentiate them from the MMIO registers. Also those
registers are mostly 64-bit (compared to the 32-bit GICv2 registers) and
use different bit assignments.
To make this obvious and to avoid clashes with double definitions using
the same names for actually different bits, let's change all GICv3
hypervisor interface registers to use the "ICH_" prefix from the manual.
This renames the definitions in gic_v3_defs.h and their usage in gic-v3.c
and is needed to allow co-existence of the GICv2 and GICv3 definitions
in the same file.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:28 +0000 (15:11 +0000)]
ARM: VGIC: Introduce gic_get_nr_lrs()
So far the number of list registers (LRs) a GIC implements is only
needed in the hardware facing side of the VGIC code (gic-vgic.c).
The new VGIC will need this information in more and multiple places, so
export a function that returns the number.
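The exported helper is essentially a one-liner, something along these lines (the nr_lrs value comes from the gic_info structure the hardware driver fills in):

    unsigned int gic_get_nr_lrs(void)
    {
        return gic_hw_ops->info->nr_lrs;
    }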
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:27 +0000 (15:11 +0000)]
ARM: VGIC: reorder prototypes in vgic.h
Currently vgic.h both contains prototypes used by Xen arch code outside
of the actual VGIC (for instance vgic_vcpu_inject_irq()), and prototypes
for functions used by the VGIC internally.
Group them to later allow an easy split with one #ifdef.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:26 +0000 (15:11 +0000)]
ARM: VGIC: carve out struct vgic_cpu and struct vgic_dist
Currently we describe the VGIC specific fields in a structure
*embedded* in struct arch_domain and struct arch_vcpu. These members,
however, are specific to the current VGIC implementation, and will
be substantially different in the future.
To allow coexistence of two implementations, move the definition of these
embedded structures into vgic.h, and just use the opaque type in the arch
specific structures.
This allows easy switching between different implementations later.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:25 +0000 (15:11 +0000)]
ARM: VGIC: change to level-IRQ compatible IRQ injection interface
At the moment vgic_vcpu_inject_irq() is the interface for Xen internal
code and virtual devices to inject IRQs into a guest. This interface has
two shortcomings:
1) It requires a VCPU pointer, which we may not know (and don't need!)
for shared interrupts. A second function (vgic_vcpu_inject_spi()), was
there to work around this issue.
2) This interface only really supports edge-triggered IRQs, which is
all the Xen VGIC currently emulates anyway. However, this needs to and will
change, so we need to add the desired level (high or low) to the
interface.
This replaces the existing injection call (taking a VCPU and an IRQ
parameter) with a new one, taking domain, VCPU, IRQ and level parameters.
The VCPU can be NULL in case we don't know and don't care.
We change all call sites to use this new interface. This still doesn't
give us the missing level IRQ handling, but at least prepares the callers
to do the right thing later automatically.
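The new interface therefore looks roughly as follows (a sketch; the exact prototype may differ):

    /*
     * Inject an IRQ into domain d. "v" may be NULL for shared (SPI)
     * interrupts; "level" carries the desired line state, which for now
     * is always true (assert) given the edge-only emulation.
     */
    void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
                         bool level);

    /* Example call site, for a PPI targeting a known VCPU: */
    vgic_inject_irq(v->domain, v, irq, true);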
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:23 +0000 (15:11 +0000)]
ARM: VGIC: Adjust domain_max_vcpus() to be VGIC specific
domain_max_vcpus(), which is used by generic Xen code, returns the
maximum number of VCPUs for a domain, which on ARM is mostly limited by
the VGIC model emulated (a (v)GICv2 can only handle 8 CPUs).
Our current implementation lives in arch/arm/domain.c, but reaches into
VGIC internal data structures.
Move the actual functionality into vgic.c, and provide a shim in
domain.h, to keep this VGIC internal.
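The shim in domain.h then reduces to something like (sketch):

    /* asm-arm/domain.h: thin wrapper, the real limit is VGIC-specific. */
    static inline unsigned int domain_max_vcpus(const struct domain *d)
    {
        return vgic_max_vcpus(d);
    }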
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
The prototype for gic_remove_from_lr_pending() is the last function in
gic.h which references a VGIC data structure.
Move it over to vgic.h, so that we can remove the inclusion of vgic.h
from gic.h. We add that inclusion to asm/domain.h instead, where it is
actually needed.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:21 +0000 (15:11 +0000)]
ARM: VGIC: rename gic_inject() and gic_clear_lrs()
The two central functions to synchronise our emulated VGIC state with
the GIC hardware (the LRs, really), are named somewhat confusingly.
Rename them from gic_inject() to vgic_sync_to_lrs() and from
gic_clear_lrs() to vgic_sync_from_lrs(), to make the code more readable.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:20 +0000 (15:11 +0000)]
ARM: vGICv3: remove rdist_stride from VGIC structure
The last patch removed the usage of the hardware's redistributor-stride
value from our (Dom0) GICv3 emulation. This means we no longer need to
store this value in the VGIC data structure.
Remove that variable and every code snippet that handled it; instead,
simply always use the architected value.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@linaro.org>
Andre Przywara [Fri, 9 Mar 2018 15:11:19 +0000 (15:11 +0000)]
ARM: vGICv3: always use architected redist stride
The redistributor-stride property in a GICv3 DT node is only there to
cover broken platforms where this value deviates from the architected one.
Since we emulate the GICv3 distributor even for Dom0, we don't need to
copy the broken behaviour. All the special handling for Dom0s using
GICv3 is just for using the hardware's memory map, which is unaffected
by the redistributor stride - it can never be smaller than the
architected two pages.
Remove the redistributor-stride property from Dom0's DT node and also
remove the code that tried to reuse the hardware value for Dom0's GICv3
emulation.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:18 +0000 (15:11 +0000)]
ARM: GICv3: use hardware GICv3 redistributor values for Dom0
The code to generate the DT node or MADT table for Dom0 reaches into the
domain's vGIC structure to learn the number of redistributor regions and
their base addresses.
Since those values are copied from the hardware, we can as well use
those hardware values directly when setting up the hardware domain.
This avoids the hardware GIC code having to reference vGIC data structures.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:17 +0000 (15:11 +0000)]
ARM: vGICv3: clarify on GUEST_GICV3_RDIST_REGIONS symbol
Normally there is only one GICv3 redistributor region, and we use
that for DomU guests using a GICv3.
Explain the background in a comment and why we need to keep the number
of hardware regions for Dom0.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Fri, 9 Mar 2018 16:30:49 +0000 (17:30 +0100)]
cpufreq/ondemand: fix race while offlining CPU
Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.
Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).
Also adjust coding style in cpufreq_dbs_timer_resume().
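A sketch of the tristate protocol described above (names are illustrative, and the real code has to be more careful about memory ordering):

    struct dbs_info {
        volatile int stoppable;   /* 0 = stopped, 1 = armed, -1 = handler running */
        /* ... other per-CPU governor state ... */
    };

    static void dbs_timer_fn(struct dbs_info *info)
    {
        info->stoppable = -1;     /* flag the handler as in progress */
        do_dbs_work(info);        /* the actual frequency-governing work */
        info->stoppable = 1;      /* handler done, timer still armed */
    }

    void dbs_timer_exit(struct dbs_info *info)
    {
        /* Wait for a handler running in parallel (on another CPU) ... */
        while ( info->stoppable < 0 )
            cpu_relax();
        /* ... and only then mark the timer as stopped. */
        if ( info->stoppable > 0 )
            info->stoppable = 0;
    }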
Reported-by: Martin Cerveny <martin@c-home.cz> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Martin Cerveny <martin@c-home.cz> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 9 Mar 2018 16:29:45 +0000 (17:29 +0100)]
x86: improve MSR_SHADOW_GS accesses
Instead of using RDMSR/WRMSR, on fsgsbase-capable systems use a double
SWAPGS combined with RDGSBASE/WRGSBASE. This halves execution time for
a shadow GS update alone on my Haswell (and we have indications of
good performance improvements by this on Skylake too), while the win is
even higher when e.g. updating more than one base (as may and commonly
will happen in load_segments()).
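A sketch of the fast path, assuming a cpu_has_fsgsbase-style predicate and the wrmsrl() fallback (the real patch folds this into the existing base-register accessors):

    static inline void write_shadow_gs_base(unsigned long base)
    {
        if ( cpu_has_fsgsbase )
            asm volatile ( "swapgs\n\t"      /* make the shadow base active */
                           "wrgsbase %0\n\t" /* write it directly */
                           "swapgs"          /* and swap back */
                           :: "r" (base) );
        else
            wrmsrl(MSR_SHADOW_GS_BASE, base);
    }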
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Fri, 9 Mar 2018 15:01:21 +0000 (15:01 +0000)]
x86/traps: Put idt_table[] back into .bss
c/s d1d6fc97d "x86/xpti: really hide almost all of Xen image" accidentally
moved idt_table[] from .bss to .data by virtue of using the page_aligned
section. We also have .bss.page_aligned, so use that.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 9 Mar 2018 13:47:21 +0000 (06:47 -0700)]
x86emul/test: wrap libc functions with FPU save/restore code
Currently with the native tool chain on Debian Jessie ./test_x86_emulator
yields:
Testing AVX2 256bit single native execution...okay
Testing AVX2 256bit single 64-bit code sequence...[line 933] failed!
The bug is that libc's memcpy() in read() uses %xmm8 (specifically, in
__memcpy_sse2_unaligned()), which corrupts %ymm8 behind the back of the AVX2
test code.
Introduce wrappers (and machinery to forward calls to those wrappers)
saving/restoring FPU state around certain library calls.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 6 Mar 2018 13:42:36 +0000 (13:42 +0000)]
tests/x86emul: Helpers to save and restore FPU state
Introduce common helpers for saving and restoring FPU state. During
emul_test_init(), calculate whether to use xsave or fxsave, and tweak the
existing mxcsr_mask logic to avoid using another large static buffer.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 9 Feb 2018 14:33:59 +0000 (14:33 +0000)]
x86/alt: Drop explicit padding of origin sites
Now that the alternatives infrastructure can calculate the required padding
automatically, there is no need to hard code it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 9 Feb 2018 12:47:58 +0000 (12:47 +0000)]
x86/alt: Support for automatic padding calculations
The correct amount of padding in an origin patch site can be calculated
automatically, based on the relative lengths of the replacements.
This requires a bit of trickery to calculate correctly, especially in the
ALTERNATIVE_2 case where a branchless max() calculation is needed. The
calculation is further complicated because GAS's idea of true is -1 rather
than 1, which is why the extra negations are required.
Additionally, have apply_alternatives() attempt to optimise the padding nops.
This is complicated by the fact that we must not attempt to optimise nops over
an origin site which has already been modified.
To keep track of this, add a priv field to struct alt_instr, which gets
modified by apply_alternatives(). This method is used in preference to a
local variable in case we make multiple passes. One extra requirement is that
alt_instr's referring to the same origin site must now be consecutive, but we
already have this property.
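For reference, the descriptor ends up looking roughly like this (field names approximate):

    struct alt_instr {
        int32_t  orig_offset;   /* original instruction, relative offset */
        int32_t  repl_offset;   /* replacement instruction, relative offset */
        uint16_t cpuid;         /* feature bit gating the replacement */
        uint8_t  orig_len;      /* length of the original patch site */
        uint8_t  repl_len;      /* length of the replacement */
        uint8_t  pad_len;       /* automatically calculated padding */
        uint8_t  priv;          /* scratch for apply_alternatives() bookkeeping */
    };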
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 9 Feb 2018 15:58:39 +0000 (15:58 +0000)]
x86/asm: Remove opencoded uses of altinstruction_entry
With future changes, altinstruction_entry is going to become more complicated
to use. Furthermore, there are already ALTERNATIVE* macros which can be used
to avoid opencoding the creation of replacement information.
For ASM_STAC, ASM_CLAC and CR4_PV32_RESTORE, this means the removal of all
hardcoded label numbers. For the cr4_pv32 alternatives, this means hardcoding
the extra space required in the original patch site, but the hardcoding will
be removed by a later patch.
No change to any functionality, but the handling of nops inside the original
patch sites is a bit different.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 9 Feb 2018 13:31:28 +0000 (13:31 +0000)]
x86/alt: Clean up the assembly used to generate alternatives
* On the C side, switch to using local labels rather than hardcoded numbers.
* Rename parameters and labels to be consistent with alt_instr names, and
consistent between the C and asm versions.
* On the asm side, factor some expressions out into macros to aid clarity.
* Consistently declare section attributes.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 9 Feb 2018 13:31:28 +0000 (13:31 +0000)]
x86/alt: Clean up struct alt_instr and its users
* Rename some fields for consistency and clarity, and use standard types.
* Don't opencode the use of ALT_{ORIG,REPL}_PTR().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Fri, 9 Feb 2018 12:54:58 +0000 (12:54 +0000)]
x86/alt: Drop unused alternative infrastructure
ALTERNATIVE_3 is more complicated than ALTERNATIVE_2 when it comes to
calculating extra padding length, and we have no need for the complexity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Andrew Cooper [Wed, 7 Mar 2018 19:36:50 +0000 (19:36 +0000)]
common/sched: Fix ARM build following c/s 340edc3902
The OSSTest smoke tests report:
sched_credit2.c: In function 'csched2_alloc_domdata':
sched_credit2.c:3015:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
return ERR_PTR(-ENOMEM);
^
sched_credit2.c:3015:9: error: nested extern declaration of 'ERR_PTR' [-Werror=nested-externs]
As the ERR infrastructure is part of the main scheduler interface now, include it from xen/sched-if.h
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 28 Feb 2018 11:43:25 +0000 (11:43 +0000)]
xen/domain: Call sched_destroy_domain() in the domain_create() error path
If domain_create() fails, complete_domain_destroy() doesn't get called,
meaning that sched_destroy_domain() is missed. In practice, this can only
fail because of exceptional late_hwdom_init() issues at the moment.
Make sched_destroy_domain() idempotent, and call it in the fail path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
These hooks have one single caller (sched_{init,destroy}_domain()
respectively) and are all identical (when implemented).
Previous changes have ensured that only real domains reach these functions, so
ASSERT() that system domains are not seen. Call sched_{alloc,free}_domdata()
directly, and handle d->sched_priv directly.
Andrew Cooper [Tue, 27 Feb 2018 16:48:19 +0000 (16:48 +0000)]
xen/sched: Improvements to the {alloc,free}_domdata() interfaces
The main purpose of this change is for the subsequent cleanup it enables, but
it stands on its own merits.
In principle, these hooks are optional, but the SCHED_OP() default aliases a
memory allocation failure, which causes arinc653 to play the dangerous game of
passing its priv pointer back, and remembering not to actually free it.
Redefine alloc_domdata to use ERR_PTR() for errors, NULL for nothing, and
non-NULL for a real allocation, which allows the hook to become properly
optional. Redefine free_domdata to be idempotent.
For arinc653, this means the dummy hooks can be dropped entirely. For the
other schedulers, this means returning ERR_PTR(-ENOMEM) instead of NULL for
memory allocation failures, and modifying the free hooks to cope with a NULL
pointer. While making the alterations, drop some spurious casts to void *.
Introduce and use proper wrappers for sched_{alloc,free}_domdata(). These are
strictly better than SCHED_OP(), as the source code is visible to
grep/cscope/tags, the generated code is better, and there can be proper
per-hook defaults and checks.
Callers of the alloc hooks are switched to using IS_ERR(), rather than
checking for NULL.
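A sketch of the wrapper and a typical caller (simplified; the free side and error unwinding are omitted):

    /* sched-if.h: the hook is genuinely optional now. */
    static inline void *sched_alloc_domdata(const struct scheduler *s,
                                            struct domain *d)
    {
        return s->alloc_domdata ? s->alloc_domdata(s, d) : NULL;
    }

    /* Caller, e.g. in sched_init_domain(): */
    void *domdata = sched_alloc_domdata(ops, d);

    if ( IS_ERR(domdata) )
        return PTR_ERR(domdata);

    d->sched_priv = domdata;   /* may legitimately be NULL */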
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Meng Xu <mengxu@cis.upenn.edu> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Andrew Cooper [Tue, 27 Feb 2018 16:48:19 +0000 (16:48 +0000)]
xen/credit2: Move repl_timer into struct csched2_dom
For exactly the same reason as 418ae6021d. Having a separate allocation is
unnecessary and wasteful.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Andrew Cooper [Tue, 27 Feb 2018 16:35:02 +0000 (16:35 +0000)]
xen/domain: Reduce the quantity of initialisation for system domains
* System domains don't need watchdog initialisation or iomem/irq rangesets,
and will not plausibly be a xenstore or hardware domain.
* The idle domain doesn't need scheduler initialisation (and in particular,
removing this path allows for substantial scheduler cleanup), and isn't
liable to ever need late_hwdom_init().
Move all of these initialisations past the DOMCRF_dummy early exit, and into
non-idle paths. rangeset_domain_initialise() remains because it makes no
allocations, but does initialise a linked list and spinlock. The poolid
parameter can be dropped as sched_init_domain()'s parameter is now
unconditionally 0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Roger Pau Monne [Fri, 2 Mar 2018 16:19:29 +0000 (16:19 +0000)]
vvmx: fixes after CR4 trapping optimizations
Commit 40681735502 didn't update the nested VMX code to take into
account the L1 CR4 host mask when the nested guest (L2) writes to CR4,
and thus the mask written to CR4_GUEST_HOST_MASK is likely not as
restrictive as it should be.
Also the VVMCS GUEST_CR4 value should be updated to match the
underlying value when syncing the VVMCS state.
Fixes: 40681735502 ("vmx/hap: optimize CR4 trapping") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
xen/arm: disable CPUs with different dcache line sizes
Even different cpus in big.LITTLE systems are expected to have the same
dcache line size. Unless the minimum of all dcache line sizes is used
across all cpu cores, cache coherency protocols can go wrong. Instead,
for now, just disable any cpu with a different dcache line size.
This check is not covered by the hmp-unsafe option, because even with
the correct scheduling and vcpu pinning in place, the system breaks if
dcache line sizes differ across cores. We don't believe it is a problem
for most big.LITTLE systems.
This patch moves the implementation of setup_cache to a static inline,
still setting dcache_line_bytes at the beginning of start_xen as
before.
In start_secondary we check that the dcache level 1 line sizes match,
otherwise we disable the cpu.
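A sketch of that secondary-CPU check, reading the DminLine field of CTR (log2 of the line size in 4-byte words); helper and register names are arm64-flavoured and may differ from the real patch:

    static inline unsigned long read_dcache_line_bytes(void)
    {
        /* CTR_EL0 bits [19:16] = DminLine = log2(line size in words). */
        return 4UL << ((READ_SYSREG32(CTR_EL0) >> 16) & 0xf);
    }

    /* In start_secondary(): refuse to bring up a mismatched CPU. */
    if ( read_dcache_line_bytes() != dcache_line_bytes )
    {
        printk(XENLOG_ERR "CPU%u: D-cache line size differs from the boot CPU, not bringing it up\n",
               smp_processor_id());
        stop_cpu();
    }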
xen/arm: set VPIDR based on the MIDR value of the underlying pCPU
On big.LITTLE systems not all cores have the same MIDR. Instead of
storing only one VPIDR per domain, initialize it to the value of the
MIDR of the pCPU where the vCPU will run.
This way, assuming that the vCPU has been created with the right pCPU
affinity, the guest will be able to read the right VPIDR value, matching
the one of the physical cpu.
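The context-switch side then boils down to a single register write (arm64 sketch):

    /* ctxt_switch_to(): expose the MIDR of the pCPU we are running on. */
    WRITE_SYSREG32(current_cpu_data.midr.bits, VPIDR_EL2);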
xen/arm: read ACTLR on the pcpu where the vcpu will run
On big.LITTLE systems not all cores have the same ACTLR. Instead of
reading ACTLR and setting v->arch.actlr in vcpu_initialise, do it later
on the same pcpu where the vcpu will run.
This way, assuming that the vcpu has been created with the right pcpu
affinity, the guest will be able to read the right ACTLR value, matching
the one of the physical cpu.
Also move processor_vcpu_initialise(v) to continue_new_vcpu as it
can modify v->arch.actlr.
Julien Grall [Tue, 6 Mar 2018 19:28:54 +0000 (11:28 -0800)]
xen/arm: Park CPUs with a MIDR different from the boot CPU.
Xen does not properly support big.LITTLE platform. All vCPUs of a guest
will always have the MIDR of the boot CPU (see arch_domain_create).
At best the guest may see unreliable performance (vCPU switching between
big and LITTLE), at worst the guest will become unreliable or insecure.
This is becoming more apparent with branch predictor hardening in Linux
because they target a specific kind of CPUs and may not work on other
CPUs.
For the time being, park any CPUs with a MIDR different from the boot
CPU. This will be revisited in the future once Xen gains understanding
of big.LITTLE.
ARM: 6527/1: Use CTR instead of CCSIDR for the D-cache line size on ARMv7
The current implementation of the dcache_line_size macro reads the L1
cache size from the CCSIDR register. This, however, is not guaranteed to
be the smallest cache line in the cache hierarchy. The patch changes
the macro to use the more architecturally correct CTR register.
Reported-by: Kevin Sapp <ksapp@quicinc.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Also rename cacheline_bytes to dcache_line_bytes to clarify that it is
the minimum D-Cache line size.
Jan Beulich [Tue, 6 Mar 2018 15:49:36 +0000 (16:49 +0100)]
x86: remove CR reads from exit-to-guest path
CR3 is - during normal operation - only ever loaded from v->arch.cr3,
so there's no need to read the actual control register. For CR4 we can
generally use the cached value on all synchronous entry and exit paths.
Drop the write_cr3 macro, as the two use sites are probably easier to
follow without its use.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 6 Mar 2018 15:48:44 +0000 (16:48 +0100)]
x86: slightly reduce Meltdown band-aid overhead
I'm not sure why I didn't do this right away: By avoiding the use of
global PTEs in the cloned directmap, there's no need to fiddle with
CR4.PGE on any of the entry paths. Only the exit paths need to flush
global mappings.
The reduced flushing, however, requires that we now have interrupts off
on all entry paths until after the page table switch, so that flush IPIs
can't be serviced while on the restricted pagetables, leaving a window
where a potentially stale guest global mapping can be brought into the
TLB. Along those lines the "sync" IPI after L4 entry updates now needs
to become a real (and global) flush IPI, so that inside Xen we'll also
pick up such changes.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Sergey Dyasli [Tue, 6 Mar 2018 15:47:34 +0000 (16:47 +0100)]
pv_console: remove unnecessary #ifdefs
The header for the PV console contains empty function definitions for the
!CONFIG_XEN_GUEST case, specifically to avoid #ifdefs in the code that uses
them and to make that code look cleaner.
Unfortunately, during the release of shim-comet, the PV console functions
were enclosed in unnecessary CONFIG_X86 #ifdefs. Remove them.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 6 Mar 2018 15:46:57 +0000 (16:46 +0100)]
x86/xpti: don't map stack guard pages
Other than for the main mappings, don't even do this in release builds,
as there are no huge page shattering concerns here.
Note that since we don't run on the restricted page tables while HVM
guests execute, the non-present mappings won't trigger the triple fault
issue AMD SVM is susceptible to with our current placement of STGI vs
TR loading.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 6 Mar 2018 15:46:27 +0000 (16:46 +0100)]
x86/xpti: really hide almost all of Xen image
Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.
Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.
The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
for functionality, but things are more logical this way, and we retain
cleanup being done in the inverse order of setup.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
When creating a GICv3 devicetree node, we currently insert the
redistributor-stride and #redistributor-regions properties, with fixed
values which are actually the architected ones. Since those properties are
optional, and in the case of the stride only needed to cover for broken
platforms, we don't need to describe them if they don't differ from the
default values. This will always be the case for our constructed
DomU memory map.
So we drop those properties altogether and provide a clean and architected
GICv3 DT node for DomUs.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 5 May 2017 16:01:47 +0000 (17:01 +0100)]
x86/pv: Drop int80_bounce from struct pv_vcpu
The int80_bounce field of struct pv_vcpu is a bit of an odd special case,
because it is a simple derivation of trap_ctxt[0x80], which is also stored.
It is also the only use of {compat_,}create_bounce_frame() which isn't
referencing the plain trap_bounce field of struct pv_vcpu. (And altering this
property is the purpose of this patch.)
Remove the int80_bounce field entirely, along with init_int80_direct_trap(),
which in turn requires that the int80_direct_trap() path gain logic previously
contained in init_int80_direct_trap().
This does admittedly make the int80 fastpath slightly longer, but these few
instructions are in the noise compared to the architectural context switch
overhead, and it now matches the syscall/sysenter paths (which have far less
architectural overhead already).
No behavioural change from the guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 21 Feb 2018 13:00:23 +0000 (13:00 +0000)]
x86/entry: Correct comparisons against boolean variables
The correct way to check a boolean is `cmpb $0` or `testb $0xff`, whereas a
lot of our entry code uses `testb $1`. This will work in principle for values
which are really C _Bool types, but won't work for other integer types which
are intended to have boolean properties.
cmp is the more logical way of thinking about the operation, so adjust all
outstanding uses of `testb $1` against boolean values. Changing test to cmp
changes the logical mnemonic of the following condition from 'zero' to
'equal', but the actual encoding remains the same.
No functional change, as all uses are real C _Bool types, and confirmed by
diffing the disassembly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 2 Mar 2018 17:45:52 +0000 (17:45 +0000)]
x86/boot: Annotate the multiboot headers with size and type information
This causes objdump not to try and disassemble the data.
While altering this area, switch to using .balign, and fill with 0xc2 to help
highlight the embedded padding (rather than having it filled with 0f 1f 40 00
which is a long nop). Also, shorten the labels by stripping off the _start
suffix.
Since commit "xen/arm: domain_build: Rework the way to allocate the
event channel interrupt", it is not possible for an irq to be both below 16
and greater/equal than 32.
Also fix the reference to linux documentation while we're at it.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
[Slightly rework the commit message]
Julien Grall [Tue, 27 Feb 2018 15:15:54 +0000 (15:15 +0000)]
xen/arm: domain_build: Rework the way to allocate the event channel interrupt
At the moment, a placeholder will be created in the device-tree for the
event channel information. Later in the domain construction, the
interrupt for the event channel upcall will be allocated and the device-tree
fixed up.
Looking at the code, the current split is not necessary, because all the
PPIs used by the hardware domain will already be registered by the time we
create the node in the device-tree.
From now on, mandate that all interrupts are registered before
acpi_prepare() and dtb_prepare(). This allows us to rework the event
channel code and remove one placeholder.
Note, this will also help to fix the BUG(...) condition in set_interrupt_ppi,
which is completely wrong; see a follow-up patch.
Juergen Gross [Mon, 26 Feb 2018 08:46:12 +0000 (09:46 +0100)]
tools/xenstore: try to get minimum thread stack size for watch thread
When creating a pthread in xs_watch() try to get the minimal needed
size of the thread from glibc instead of using a constant. This avoids
problems when the library is used in programs with large per-thread
memory.
Use dlsym() to get the pointer to __pthread_get_minstack() in order to
avoid linkage problems and fall back to the current constant size if
not found.
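A self-contained illustration of the lookup; the fallback constant is illustrative, and __pthread_get_minstack() is a glibc-internal symbol that may well be absent:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>
    #include <stddef.h>

    #define DEFAULT_THREAD_STACKSIZE (16 * 1024)   /* fallback, illustrative */

    static size_t watch_thread_stacksize(const pthread_attr_t *attr)
    {
        size_t (*getsz)(const pthread_attr_t *);

        getsz = (size_t (*)(const pthread_attr_t *))
                dlsym(RTLD_DEFAULT, "__pthread_get_minstack");

        return getsz ? getsz(attr) : DEFAULT_THREAD_STACKSIZE;
    }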
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
Wei Liu [Fri, 2 Mar 2018 16:46:25 +0000 (16:46 +0000)]
x86: rename HAVE_GAS_* to HAVE_AS_*
Xen also uses clang's assembler when it is possible. Change the macro
names to not be GAS specific.
Patch produced with:
$ for f in `git grep HAVE_GAS_ | cut -d':' -f1`; \
do sed -i 's/HAVE_GAS_/HAVE_AS_/g' $f; done
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Sat, 11 Nov 2017 19:08:37 +0000 (19:08 +0000)]
x86/link: Don't merge .init.text and .init.data
c/s 1308f0170c merged .init.text and .init.data, because EFI might properly
write-protect r/o sections.
However, that change makes xen-syms unusable for disassembly analysis. In
particular, searching for indirect branches as part of the SP2/Spectre
mitigation series.
As the merging isn't necessary for ELF targets at all, make it conditional on
the EFI side of the build.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Semel [Fri, 23 Feb 2018 22:48:57 +0000 (23:48 +0100)]
fuzz/x86_emulate: fix bounds for input size
The maximum input size was set to INPUT_SIZE, which is actually
the size of the data array inside the fuzz_corpus structure, and so did not
allow the user (or AFL) to fill in the whole structure. Changing it to
sizeof(struct fuzz_corpus) corrects this problem.
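In terms of the harness entry point, the corrected bound is along these lines (sketch):

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        /* Accept anything up to the full corpus structure, not just the
         * size of its embedded data[] array (the old INPUT_SIZE bound). */
        if ( size > sizeof(struct fuzz_corpus) )
            return -1;

        /* ... copy 'data' into a struct fuzz_corpus and run the emulator ... */
        return 0;
    }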
Signed-off-by: Paul Semel <semelpaul@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jim Fehlig [Mon, 26 Feb 2018 18:28:39 +0000 (11:28 -0700)]
libxl: set channel devid when not provided by application
Applications like libvirt may not populate a device devid field,
delegating that to libxl. If needed, the application can later
retrieve the libxl-produced devid. Indeed most devices are handled
this way in libvirt, channel devices included.
This works well when only one channel device is defined, but more
than one results in
qemu-system-i386: -chardev socket,id=libxl-channel-1,\
path=/tmp/test-org.qemu.guest_agent.00,server,nowait:
Duplicate ID 'libxl-channel-1' for chardev
Besides the odd '-1' value in the id, multiple channels have the same
id, causing qemu to fail. A simple fix is to set an uninitialized
devid (-1) to the dev_num passed to libxl__init_console_from_channel().
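The fix is then essentially a one-liner of this shape (sketch, in libxl__init_console_from_channel()):

    /* The application didn't pick a devid: use the console slot number, so
     * each channel ends up with a unique "libxl-channel-N" chardev id. */
    if (channel->devid == -1)
        channel->devid = dev_num;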
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
libxl: do not fail device removal if backend domain is gone
The backend domain may be independently destroyed - there is no
synchronization of libxl structures (including the /libxl tree) elsewhere.
The backend might also remove the device info from its backend xenstore
subtree on its own.
We have various cases (not a comprehensive list):
- both frontend and backend operational: after setting
be/state=XenbusStateClosing, the backend waits for frontend confirmation
and responds with be/state=XenbusStateClosed; then libxl in dom0
removes the frontend entries and libxl in the backend domain (which may be the
same) removes the backend entries
- unresponsive backend/frontend: after a timeout, force=1 is used to remove
frontend entries, instead of just setting
be/state=XenbusStateClosing; then wait for be/state=XenbusStateClosed.
If that times out too, remove both frontend and backend entries
- backend gone, with this patch: no place for setting/waiting on
be/state - go directly to removing frontend entries, without waiting
for be/state=XenbusStateClosed (this is the difference vs force=1)
Without this patch the end result is similar, both frontend and backend
entries are removed, but in case of backend gone:
- libxl waits for be/state=XenbusStateClosed (and obviously times out)
- the return value from the function signals an error, which for example
confuses libvirt - it thinks the device removal failed, so the device is
still there
If such a situation is detected, do not fail the removal, but finish the
cleanup of the frontend side and return 0.
This is just a workaround; the real fix should watch for when the device
backend is removed (including backend domain destruction) and remove the
frontend at that time, and report such an event to higher layer code, so
that for example libvirt could synchronize its state.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>