Roger Pau Monne [Thu, 13 Feb 2020 10:26:12 +0000 (11:26 +0100)]
smp: convert cpu_hotplug_begin into a blocking lock acquisition
Don't allow cpu_hotplug_begin to fail by converting the trylock into a
blocking lock acquisition. Write users of the cpu_add_remove_lock are
limited to CPU plug/unplug operations, and cannot deadlock between
themselves or other users taking the lock in read mode as
cpu_add_remove_lock is always locked with interrupts enabled. There
are also no other locks taken during the plug/unplug operations.
The exclusive lock usage in register_cpu_notifier is also converted
into a blocking lock acquisition, as it was previously not allowed to
fail anyway.
This is meaningful when running Xen in shim mode, since VCPU_{up/down}
hypercalls use cpu hotplug/unplug operations in the background, and
hence failing to take the lock results in VCPU_{up/down} failing with
-EBUSY, which most users are not prepared to handle.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I've tested this and it seems to work fine AFAICT either when running on
native or when used in the shim. I'm not sure if I'm missing something
that would prevent the write lock acquisition from being made
blocking.
Roger Pau Monne [Thu, 13 Feb 2020 09:44:10 +0000 (10:44 +0100)]
smp: convert the cpu maps lock into a rw lock
Most users of the cpu maps just care about the maps not changing while
the lock is being held, but don't actually modify the maps.
Convert the lock into a rw lock, and take the lock in read mode in
get_cpu_maps and in write mode in cpu_hotplug_begin. This will lower
the contention around the lock, since plug and unplug operations that
take the lock in write mode are not that common.
Note that the read lock can be taken recursively (as it's a shared
lock), and hence will keep the same behavior as the previously used
recursive lock. As for the write lock, it's only used by CPU
plug/unplug operations, and the lock is never taken recursively in
that case.
While there also change get_cpu_maps return type to bool.
Reported-by: Julien Grall <julien@xen.org> Suggested-also-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
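The read/write split described above can be sketched roughly as follows.
This is only an illustrative model built on C11 atomics, not Xen's rwlock
implementation; the *_sketch names and the single state word are
assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical minimal rw lock: a value > 0 counts readers, -1 marks a
 * writer. It only models the semantics discussed above.
 */
static atomic_int cpu_maps_state;

/* Read side: shared, so the same caller may take it more than once. */
static bool get_cpu_maps_sketch(void)
{
    int cur = atomic_load(&cpu_maps_state);

    while ( cur >= 0 )
        if ( atomic_compare_exchange_weak(&cpu_maps_state, &cur, cur + 1) )
            return true;

    return false; /* a writer (CPU plug/unplug) holds the lock */
}

static void put_cpu_maps_sketch(void)
{
    int prev = atomic_fetch_sub(&cpu_maps_state, 1);

    assert(prev > 0);
    (void)prev;
}

/* Write side: exclusive, only succeeds when there are no readers. */
static bool cpu_hotplug_begin_sketch(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&cpu_maps_state, &expected, -1);
}

static void cpu_hotplug_done_sketch(void)
{
    int prev = atomic_exchange(&cpu_maps_state, 0);

    assert(prev == -1);
    (void)prev;
}
```

Readers nest freely, which mirrors the behavior of the previous recursive
lock, while the write side is only ever taken once per plug/unplug
operation.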
Roger Pau Monne [Tue, 11 Feb 2020 10:14:48 +0000 (11:14 +0100)]
x86: add accessors for scratch cpu mask
Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.
This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
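A get/put pair of the kind described above might look like the sketch
below. It models a single CPU, and every name in it is hypothetical;
returning NULL on misuse is an assumption made for the example, real code
may well prefer to assert or crash instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned long bits[4]; } cpumask_sketch_t;

/* One CPU is modelled here; real code would keep this state per-CPU. */
static cpumask_sketch_t scratch_mask;
static bool scratch_in_use;
static bool in_irq_context; /* stand-in for an in_irq()-style check */

/* Returns the scratch mask, or NULL if the request is not allowed. */
static cpumask_sketch_t *get_scratch_mask_sketch(void)
{
    if ( in_irq_context || scratch_in_use )
        return NULL; /* interrupt context or nested usage */

    scratch_in_use = true;
    return &scratch_mask;
}

/* Releases the mask so the next caller on this CPU can take it. */
static void put_scratch_mask_sketch(const cpumask_sketch_t *mask)
{
    assert(mask == &scratch_mask && scratch_in_use);
    (void)mask;
    scratch_in_use = false;
}
```

The point of the accessors is that misuse is detected at run time rather
than by inspecting every caller and call path by hand.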
Roger Pau Monne [Tue, 11 Feb 2020 11:04:41 +0000 (12:04 +0100)]
x86/smp: use a dedicated scratch cpumask in send_IPI_mask
Using scratch_cpumask in send_IPI_mask is not safe because it can be
called from interrupt context, and hence Xen would have to make sure
all the users of the scratch cpumask disable interrupts while using
it.
Instead introduce a new cpumask to be used by send_IPI_mask, and
disable interrupts while using it.
Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible') Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Thu, 19 Dec 2019 13:16:16 +0000 (14:16 +0100)]
x86/tlb: use Xen L0 assisted TLB flush when available
Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high number of vCPUs as a Xen guest, and is especially important
when running in shim mode.
The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.
Using x2APIC and ALLBUT shorthand:
real 4m35.973s
user 4m35.110s
sys 36m24.117s
Using L0 assisted flush:
real 1m2.596s
user 4m34.818s
sys 5m16.374s
The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook. Note that the Xen implementation completely ignores the
dirty CPU mask and the linear address passed in, and always performs a
global TLB flush on all vCPUs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
- Use an alternative call for the flush hook.
Changes since v1:
- Add a L0 assisted hook to hypervisor ops.
Roger Pau Monne [Thu, 6 Feb 2020 14:56:30 +0000 (15:56 +0100)]
xen/guest: prepare hypervisor ops to use alternative calls
Adapt the hypervisor ops framework so it can be used with the
alternative calls framework. So far no hooks are modified to make use
of the alternatives patching, as they are not in any hot path.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
- New in this version.
Roger Pau Monne [Mon, 27 Jan 2020 09:41:24 +0000 (10:41 +0100)]
x86/tlb: allow disabling the TLB clock
The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.
This is not the case however when Xen is running virtualized, and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.
Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.
No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
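The effect of the boolean on the flush decision can be sketched as below.
This is a simplified model, not the actual NEED_FLUSH implementation: the
variable names are hypothetical and clock wrap-around handling is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool tlb_clk_enabled_sketch = true; /* hypothetical name */
static uint32_t tlbflush_clock_sketch = 1; /* global flush timestamp */

/*
 * Does a CPU that last flushed at cpu_stamp still need flushing for a page
 * last used at lastuse_stamp? Wrap-around handling is omitted.
 */
static bool need_flush_sketch(uint32_t cpu_stamp, uint32_t lastuse_stamp)
{
    if ( !tlb_clk_enabled_sketch )
        return true; /* no timestamps to consult: always flush */

    assert(lastuse_stamp <= tlbflush_clock_sketch);

    /* Flush only if the CPU has not flushed since the page was last used. */
    return cpu_stamp <= lastuse_stamp;
}
```

With the clock disabled every candidate CPU is flushed, trading some
unnecessary flushes for not having to maintain per-CPU timestamps.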
Roger Pau Monne [Mon, 27 Jan 2020 10:23:08 +0000 (11:23 +0100)]
x86/tlb: introduce a flush guests TLB flag
Introduce a specific flag to request a HVM guest TLB flush, which is
an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.
This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP).
Modify all shadow code TLB flushes to also flush the guest TLB, in
order to keep the previous behavior. I haven't looked at each specific
shadow code TLB flush in order to figure out whether it actually
requires a guest TLB flush or not, so there might be room for
improvement in that regard.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wl@xen.org>
The current implementation of the hypervisor assisted flush for HAP is
extremely inefficient.
First of all there's no need to call paging_update_cr3, as the only
relevant part of that function when doing a flush is the ASID vCPU
flush, so just call that function directly.
Since hvm_asid_flush_vcpu is protected against concurrent callers by
using atomic operations there's no need anymore to pause the affected
vCPUs.
Finally the global TLB flush performed by flush_tlb_mask is also not
necessary: since we only want to flush the guest TLB state, it's enough
to trigger a vmexit on the pCPUs currently holding any vCPU state, as
such a vmexit will already perform an ASID/VPID update, and thus clear
the guest TLB.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wl@xen.org>
---
Changes since v3:
- s/do_flush/handle_flush/.
- Add comment about handle_flush usage.
- Fix VPID typo in comment.
Roger Pau Monne [Tue, 14 Jan 2020 09:38:44 +0000 (10:38 +0100)]
x86/paging: add TLB flush hooks
Add shadow and hap implementation specific helpers to perform guest
TLB flushes. Note that the code for both is exactly the same at the
moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
further patches that will add implementation specific optimizations to
them.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wl@xen.org>
---
Changes since v3:
- Fix stray newline removal.
- Fix return of shadow_flush_tlb dummy function.
Roger Pau Monne [Tue, 21 Jan 2020 17:23:46 +0000 (17:23 +0000)]
x86/hvm: allow ASID flush when v != current
Current implementation of hvm_asid_flush_vcpu is not safe to use
unless the target vCPU is either paused or the currently running one,
as it modifies the generation without any locking.
Fix this by using atomic operations when accessing the generation
field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
allows the current ASID generation to be flushed safely. Note that if
the vCPU is currently running, a vmexit is required for the flush to
take effect.
Note the same could be achieved by introducing an extra field to
hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
fields to 0, so there's no need to delay this to the vmentry ASID
helper.
This is not a bugfix as no callers that would violate the assumptions
listed in the first paragraph have been found, but a preparatory
change in order to allow remote flushing of HVM vCPUs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wl@xen.org>
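The idea of flushing by atomically resetting the generation can be
sketched as follows. The structure and function names are hypothetical
and the ASID allocation itself is reduced to a parameter; this is a model
of the approach, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu_asid_sketch {
    _Atomic uint64_t generation; /* 0 means "needs a new ASID" */
    uint32_t asid;
};

/* May be called for a remote vCPU, even while it is running. */
static void asid_flush_vcpu_sketch(struct vcpu_asid_sketch *a)
{
    assert(a != NULL);
    atomic_store(&a->generation, 0);
}

/* Run on the vmentry path: returns true if a fresh ASID was assigned. */
static bool asid_handle_vmenter_sketch(struct vcpu_asid_sketch *a,
                                       uint64_t core_generation,
                                       uint32_t next_asid)
{
    if ( atomic_load(&a->generation) == core_generation )
        return false; /* current ASID is still valid */

    a->asid = next_asid;
    atomic_store(&a->generation, core_generation);
    return true;
}
```

A remote flush only takes effect the next time the target vCPU goes
through the vmentry path, which is why a vmexit is needed when the vCPU is
currently running.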
Roger Pau Monne [Thu, 23 Jan 2020 17:37:47 +0000 (18:37 +0100)]
x86/apic: simplify disconnect_bsp_APIC setup of LVT{0/1}
There's no need to read the current values of LVT{0/1} for the
purposes of the function, which seem to be to save the currently
selected vector: in the destination modes used (ExtINT and NMI) the
vector field is ignored and hence can be set to 0.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Tue, 14 Jan 2020 18:06:26 +0000 (19:06 +0100)]
x86/hvmloader: round up memory BAR size to 4K
When placing memory BARs with sizes smaller than 4K multiple memory
BARs can end up mapped to the same guest physical address, and thus
won't work correctly.
Round up all memory BAR sizes to be at least 4K, so that they are
naturally aligned to a page size and thus don't end up sharing a page.
Also add a couple of asserts to the current code to make sure the MMIO
hole is properly sized and aligned.
Note that the guest can still move the BARs around and create these
collisions, and that BARs not filling up a physical page might leak
access to other MMIO regions placed in the same host physical page.
This is however no worse than what's currently done, and hence should
be considered an improvement over the current state.
Reported-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jason Andryuk <jandryuk@gmail.com>
---
Changes since v1:
- Do the round up when sizing the BARs, so that the MMIO hole is
correctly sized.
- Add some asserts that the hole is properly sized and size-aligned.
- Dropped Jason Tested-by since the code has changed.
---
Jason, can you give this a spin? Thanks.
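The rounding applied when sizing memory BARs can be sketched as below.
The helper name and the PAGE_SIZE_SKETCH constant are made up for the
example; it relies on BAR sizes being powers of two, so anything below 4K
simply becomes 4K.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u

/* Memory BAR sizes are powers of two, so sizes below 4K simply become 4K. */
static uint64_t round_mem_bar_size_sketch(uint64_t bar_sz)
{
    assert(bar_sz != 0 && (bar_sz & (bar_sz - 1)) == 0);

    return bar_sz < PAGE_SIZE_SKETCH ? PAGE_SIZE_SKETCH : bar_sz;
}
```

Because each rounded size is still a power of two and at least a page,
size-aligned placement keeps every BAR in its own page.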
Roger Pau Monne [Wed, 29 Jan 2020 11:38:00 +0000 (12:38 +0100)]
nvmx: always trap accesses to x2APIC MSRs
Nested VMX doesn't expose support for
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE,
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT, and hence the x2APIC MSRs should
always be trapped in the nested guest MSR bitmap, or else a nested
guest could access the hardware x2APIC MSRs given certain conditions.
Accessing the hardware MSRs could be achieved by forcing the L0 Xen to
use SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT (if supported), and then creating a
L2 guest with a MSR bitmap that doesn't trap accesses to the x2APIC
MSR range. Then OR'ing both L0 and L1 MSR bitmaps would result in a
bitmap that doesn't trap certain x2APIC MSRs and a VMCS that doesn't
have SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT set either.
Fix this by making sure x2APIC MSRs are always trapped in the nested
MSR bitmap.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since v3:
- Use bitmap_set.
Changes since v1:
- New in this version (split from #1 patch).
- Use non-locked set_bit.
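Forcing the x2APIC MSR range to trap can be sketched as follows. The
helper name is hypothetical, and the sketch assumes the usual VMX MSR
bitmap layout in which the read bitmap for low MSRs starts at byte 0 and
the write bitmap for low MSRs at byte 2048 of the 4K page, with a set bit
meaning "trap". The patch itself uses bitmap_set rather than an open-coded
loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_X2APIC_FIRST_SKETCH 0x800u
#define MSR_X2APIC_LAST_SKETCH  0x8ffu
#define READ_LOW_OFFSET_SKETCH  0u    /* read bitmap, MSRs 0x0-0x1fff */
#define WRITE_LOW_OFFSET_SKETCH 2048u /* write bitmap, MSRs 0x0-0x1fff */

/* Set the read and write intercept bits for every x2APIC MSR. */
static void trap_x2apic_msrs_sketch(uint8_t *bitmap)
{
    assert(bitmap != NULL);

    for ( unsigned int msr = MSR_X2APIC_FIRST_SKETCH;
          msr <= MSR_X2APIC_LAST_SKETCH; msr++ )
    {
        uint8_t bit = (uint8_t)(1u << (msr % 8));

        bitmap[READ_LOW_OFFSET_SKETCH + msr / 8] |= bit;
        bitmap[WRITE_LOW_OFFSET_SKETCH + msr / 8] |= bit;
    }
}
```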
Roger Pau Monne [Tue, 7 Jan 2020 11:32:39 +0000 (12:32 +0100)]
nvmx: implement support for MSR bitmaps
Current implementation of nested VMX has a half baked handling of MSR
bitmaps for the L1 VMM: it maps the L1 VMM provided MSR bitmap, but
doesn't actually load it into the nested vmcs, and thus the nested
guest vmcs ends up using the same MSR bitmap as the L1 VMM.
This is wrong as there's no assurance that the set of features enabled
for the L1 vmcs are the same that L1 itself is going to use in the
nested vmcs, and thus can lead to misconfigurations.
For example L1 vmcs can use x2APIC virtualization and virtual
interrupt delivery, and thus some x2APIC MSRs won't be trapped so that
they can be handled directly by the hardware using virtualization
extensions. On the other hand, the nested vmcs created by L1 VMM might
not use any of such features, so using a MSR bitmap that doesn't trap
accesses to the x2APIC MSRs will be leaking them to the underlying
hardware.
Fix this by crafting a merged MSR bitmap between the one used by L1
and the nested guest.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
This seems better than what's done currently, but TBH there's a lot of
work to be done in nvmx in order to make it functional and secure that
I'm not sure whether building on top of the current implementation is
something sane to do, or it would be better to start from scratch and
re-implement nvmx to just support the minimum required set of VTx
features in a sane and safe way.
---
Changes since v4:
- Add static to vcpu_relinquish_resources.
Changes since v3:
- Free the merged MSR bitmap page in nvmx_purge_vvmcs.
Changes since v2:
- Pass shadow_ctrl into update_msrbitmap, and check there if
CPU_BASED_ACTIVATE_MSR_BITMAP is set.
- Do not enable MSR bitmap unless it's enabled in both L1 and L2.
- Rename L1 guest to L2 in nestedvmx struct comment.
Changes since v1:
- Split the x2APIC MSR fix into a separate patch.
- Move setting MSR_BITMAP vmcs field into load_vvmcs_host_state for
virtual vmexit.
- Allocate memory with MEMF_no_owner.
- Use tabs to align comment of the nestedvmx struct field.
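Merging the two MSR bitmaps amounts to a bitwise OR, since a set bit means
"trap" and an access must be intercepted if either side asks for it. A
minimal sketch, with a hypothetical helper name and plain byte arrays
standing in for the mapped bitmap pages:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BITMAP_BYTES_SKETCH 4096

/*
 * A set bit means "trap". The merged bitmap traps an access if either
 * input bitmap wants it trapped, hence the bitwise OR.
 */
static void merge_msr_bitmaps_sketch(uint8_t *merged, const uint8_t *a,
                                     const uint8_t *b)
{
    assert(merged != NULL && a != NULL && b != NULL);

    for ( size_t i = 0; i < MSR_BITMAP_BYTES_SKETCH; i++ )
        merged[i] = a[i] | b[i];
}
```

An MSR is only passed through to the nested guest when neither bitmap
requests an intercept for it.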
Jeff Kubascik [Tue, 4 Feb 2020 19:51:50 +0000 (14:51 -0500)]
xen/arm: Handle unimplemented VGICv3 registers as RAZ/WI
Per the ARM Generic Interrupt Controller Architecture Specification (ARM
IHI 0069E), reserved registers should generally be treated as RAZ/WI.
To simplify the VGICv3 design and improve guest compatibility, treat the
default case for GICD and GICR registers as read_as_zero/write_ignore.
Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> Acked-by: Julien Grall <julien@xen.org>
Julien Grall [Thu, 6 Feb 2020 15:41:18 +0000 (15:41 +0000)]
xen/include: public: Document the padding in struct xen_hvm_param
There is an implicit padding of 2 bytes in struct xen_hvm_param between
the field domid and index. Make it explicit by introducing a padding
field. This can also serve as documentation.
Note that I don't think we can mandate it to be zero because a guest may
not have initialized the padding.
Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
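The layout effect of the explicit padding can be checked with a small
sketch. The structure below is a stand-in, not the public header: the
field types (16-bit domid, 32-bit index, 64-bit value) are assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the hypercall structure; field types are assumed. */
struct hvm_param_sketch {
    uint16_t domid;
    uint16_t pad;   /* explicit; previously inserted silently by the compiler */
    uint32_t index;
    uint64_t value;
};

/* Naming the padding must not move any other field. */
_Static_assert(offsetof(struct hvm_param_sketch, index) == 4,
               "index must stay at offset 4");
_Static_assert(offsetof(struct hvm_param_sketch, value) == 8,
               "value must stay at offset 8");
```

Since the offsets are unchanged, existing guests keep working, and because
they may pass uninitialized bytes in the padding its value cannot be
enforced.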
Jan Beulich [Thu, 6 Feb 2020 15:23:30 +0000 (16:23 +0100)]
x86/HVM: reduce scope of pfec in hvm_emulate_init_per_insn()
It needs calculating only in one out of three cases. Re-structure the
code a little such that the variable truly gets calculated only when we
don't get any insn bytes from elsewhere, and hence need to (try to)
fetch them. Also OR in PFEC_insn_fetch right in the initializer.
While in this mood, restrict addr's scope as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Wei Liu [Wed, 5 Feb 2020 18:02:24 +0000 (18:02 +0000)]
x86/guest/xen: only set HVM parameter on BSP
There is no need for every CPU to set a guest property.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 6 Feb 2020 08:55:18 +0000 (09:55 +0100)]
domctl/vNUMA: avoid arithmetic overflow
Checking the result of a multiplication against a certain limit has no
sufficient implication on the original value's range. In the case here
it is in particular problematic that while handling the domctl we do
    if ( copy_from_guest(info->vdistance, uinfo->vdistance,
                         nr_vnodes * nr_vnodes) )
        goto vnuma_fail;
which means copying sizeof(unsigned int) * (nr_vnodes * nr_vnodes)
bytes, and the handling of XENMEM_get_vnumainfo similarly has
Jan Beulich [Thu, 6 Feb 2020 08:53:12 +0000 (09:53 +0100)]
xmalloc: guard against integer overflow
There are hypercall handling paths (EFI ones are what this was found
with) needing to allocate buffers of a caller specified size. This is
generally fine, as our page allocator enforces an upper bound on all
allocations. However, certain extremely large sizes could, when adding
in allocator overhead, result in an apparently tiny allocation size,
which would typically result in either a successful allocation, but a
severe buffer overrun when using that memory block, or in a crash right
in the allocator code.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
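The wrap-around described above, and a guard against it, can be sketched
as follows. The overhead constant and helper name are hypothetical; the
real allocator's overhead and checks differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOC_OVERHEAD_SKETCH 64u /* hypothetical header + alignment slack */

/*
 * Computes the padded allocation size. Returns false instead of letting
 * size + overhead wrap around to a tiny value.
 */
static bool padded_alloc_size_sketch(size_t size, size_t *padded)
{
    assert(padded != NULL);

    if ( size > SIZE_MAX - ALLOC_OVERHEAD_SKETCH )
        return false;

    *padded = size + ALLOC_OVERHEAD_SKETCH;
    return true;
}
```

Without the check, a request of SIZE_MAX - 10 bytes would turn into a
53 byte allocation in this model, which is the "apparently tiny
allocation size" scenario.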
Jan Beulich [Thu, 6 Feb 2020 08:52:33 +0000 (09:52 +0100)]
EFI: don't leak heap contents through XEN_EFI_get_next_variable_name
Commit 1f4eb9d27d0e ("EFI: fix getting EFI variable list on some
systems") switched to using the caller provided size for the copy-out
without making sure the copied buffer is properly scrubbed.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 6 Feb 2020 08:51:17 +0000 (09:51 +0100)]
EFI: re-check {get,set}-variable name strings after copying in
A malicious guest given permission to invoke XENPF_efi_runtime_call may
play with the strings underneath Xen sizing them and copying them in.
Guard against this by re-checking the copied in data for consistency
with the initial sizing. At the same time also check that the actual
copy-in is in fact successful, and switch to the lighter weight non-
checking flavor of the function.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
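The consistency re-check can be sketched as below: after copying in as
many characters as the initial sizing found, the first terminator in the
private copy has to sit exactly at the last position. The helper name is
hypothetical and 16-bit characters are assumed for the variable name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * len is the number of characters, including the terminator, found when
 * the guest string was first sized. After copying len characters in, the
 * first terminator in the copy must be at index len - 1.
 */
static bool name_copy_is_consistent_sketch(const uint16_t *copy, size_t len)
{
    assert(copy != NULL);

    if ( len == 0 || copy[len - 1] != 0 )
        return false; /* terminator vanished: string grew underneath us */

    for ( size_t i = 0; i + 1 < len; i++ )
        if ( copy[i] == 0 )
            return false; /* early terminator: string shrank underneath us */

    return true;
}
```

The check runs on the hypervisor's own copy, so the guest can no longer
change the outcome after it has been made.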
Wei Liu [Wed, 15 Jan 2020 16:40:49 +0000 (16:40 +0000)]
x86/hyperv: setup hypercall page
Hyper-V uses a technique called overlay page for its hypercall page. It
will insert a backing page to the guest when the hypercall functionality
is enabled. That means we can use a page that is not backed by real
memory for the hypercall page.
To avoid shattering L0 superpages and treading on any MMIO areas
residing in low addresses, use the top-most addressable page for that
purpose. Adjust e820 map accordingly.
We also need to register Xen's guest OS ID to Hyper-V. Use 0x3 as the
vendor ID. Fix the comment in hyperv-tlfs.h while at it.
Signed-off-by: Wei Liu <liuwe@microsoft.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Paul Durrant <pdurrant@amazon.com>
Wei Liu [Wed, 8 Jan 2020 21:35:23 +0000 (21:35 +0000)]
x86: provide executable fixmap facility
This allows us to set aside some address space for executable mapping.
This fixed map range starts from XEN_VIRT_END so that it is within reach
of the .text section.
Shift the percpu stub range and shrink livepatch range accordingly.
Signed-off-by: Wei Liu <liuwe@microsoft.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 5 Feb 2020 12:53:14 +0000 (13:53 +0100)]
x86/mem_sharing: use default_access in add_to_physmap
When plugging a hole in the target physmap don't use the access permission
returned by __get_gfn_type_access as it is nonsensical (p2m_access_n) in
the use-case add_to_physmap was intended to be used in. It leads to vm_events
being sent out for access violations at unexpected locations. Make use of
p2m->default_access instead and document the ambiguity surrounding "hole"
types and corner-cases with custom mem_access being set on holes.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 5 Feb 2020 12:52:29 +0000 (13:52 +0100)]
x86/hvm: introduce hvm_copy_context_and_params
Currently the hvm parameters are only accessible via the HVMOP hypercalls. In
this patch we introduce a new function that can copy both the hvm context and
parameters directly into a target domain.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:50:46 +0000 (13:50 +0100)]
x86/vvmx: don't enable interrupt window when using virt intr delivery
If virtual interrupt delivery is used to inject the interrupt to the
guest the interrupt window shouldn't be enabled, as the interrupt is
already injected using the GUEST_INTR_STATUS vmcs field.
Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:50:09 +0000 (13:50 +0100)]
x86/vvmx: fix VM_EXIT_ACK_INTR_ON_EXIT handling
When VM_EXIT_ACK_INTR_ON_EXIT is clear in the vmexit control vmcs
register the bit 31 of VM_EXIT_INTR_INFO must be 0, in order to denote
that the field doesn't contain any interrupt information. This is not
currently acknowledged as the field always gets filled with valid
interrupt information, regardless of whether VM_EXIT_ACK_INTR_ON_EXIT
is set.
Fix this and only fill VM_EXIT_INTR_INFO when VM_EXIT_ACK_INTR_ON_EXIT
is set. Note that this requires one minor change in
nvmx_update_apicv in order to obtain the interrupt information from
the internal state rather than the nested vmcs register.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:49:09 +0000 (13:49 +0100)]
x86/vvmx: fix virtual interrupt injection when Ack on exit control is used
When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
interrupts should only be injected using the virtual interrupt delivery
mechanism when the Ack on exit vmexit control bit is not set in the
nested vmcs.
Gate the call to nvmx_update_apicv helper on whether the nested vmcs
has the Ack on exit bit set in the vmexit control field.
Note that this fixes the usage of x2APIC by the L1 VMM, at least when
the L1 VMM is Xen.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 4 Feb 2020 20:29:38 +0000 (20:29 +0000)]
libxc/restore: Fix REC_TYPE_X86_PV_VCPU_XSAVE data auditing (take 2)
It turns out that a bug (since forever) in Xen causes XSAVE records to have
non-architectural behaviour on xsave-capable hardware, when a PV guest has not
touched the state.
In such a case, the data record returned from Xen is 2*uint64_t, both claiming
the (illegitimate) state of %xcr0 and %xcr0_accum being 0.
Adjust the bound in handle_x86_pv_vcpu_blob() to cope with this.
Fixes: 2a62c22715b "libxc/restore: Fix data auditing in handle_x86_pv_vcpu_blob()" Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Wed, 5 Feb 2020 11:24:12 +0000 (11:24 +0000)]
libxl: fix assertion failure in stub domain creation
An assertion in libxl__domain_make():
'soft_reset || *domid == INVALID_DOMID'
does not hold true for stub domain creation, where soft_reset is false
but the passed in domid == 0. This is easily fixed by changing the
initializer in libxl__spawn_stub_dm().
NOTE: The comment for XEN_DOMCTL_createdomain in domctl.h is changed to
reflect reality.
Fixes: 75259239d85d ("libxl_create: make 'soft reset' explicit") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Stefan Bader [Tue, 4 Feb 2020 09:34:23 +0000 (09:34 +0000)]
tools/xenstore: Re-introduce (fake) xs_restrict call to preserve ABI
libxenstore3.0 in Xen 4.8 had this function. We don't really want to
bump the ABI version (soname) just for this, since we don't think
there are actual callers anywhere. But tools complain about the
symbol going away.
So, provide a function xs_restrict which conforms to the original
semantics, although it always fails.
Gbp-Pq: Topic xenstore
Gbp-Pq: Name tools-fake-xs-restrict.patch Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 3 Feb 2020 12:07:19 +0000 (13:07 +0100)]
x86/EPT: do away with hidden GUEST_TABLE_MAP_FAILED == 0 assumptions
The code is quite a bit easier to read and to reason about this way,
I think.
In ept_set_entry() additionally change the function's return value in
the MAP_FAILED case to -ENOMEM; -ENOENT would be applicable only when
ept_next_entry() was invoked with "read_only" set to true.
In two cases, where ept_next_level() follows an ept_split_superpage()
invocation, actually tighten the loop exit condition from
"== MAP_FAILED" to "!= NORMAL_PAGE". Continuing these loops for other
than NORMAL_PAGE is invalid, and there are ASSERT()s in place after
these loops.
Also reduce the scope of "ret" variables where possible, in particular
to better distinguish them from "rc" often used in the same function.
Finally drop pointless "else" in a few areas touched anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Juergen Gross [Mon, 3 Feb 2020 12:04:30 +0000 (13:04 +0100)]
xen: split parameter related definitions in own header file
Move the parameter related definitions from init.h into a new header
file param.h. This will avoid include hell when new dependencies are
added to parameter definitions.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Julien Grall <julien@xen.org> Acked-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 27 Jan 2020 13:34:12 +0000 (13:34 +0000)]
xen/x86: domctl: Don't leak data via XEN_DOMCTL_gethvmcontext
The HVM context may not fill up the full buffer passed by the caller.
While we correctly report the size of the context, we will still be
copying back the full size of the buffer.
As the buffer is allocated through xmalloc(), we will be copying some
bits from the previous allocation.
Only copy back the part of the buffer used by the HVM context to prevent
any leak.
Note that per XSA-72, this is not a security issue.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
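The difference between copying back the whole buffer and only the used
part can be sketched as follows. The helper is hypothetical and uses
memcpy in place of the guest copy primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * buf has buf_size bytes but only the first `used` bytes were written by
 * the context save. Copy back min(used, buf_size) so the uninitialised
 * tail of the allocation never reaches the caller. Returns the number of
 * bytes copied.
 */
static size_t copy_back_context_sketch(void *dst, const void *buf,
                                       size_t used, size_t buf_size)
{
    size_t n = used < buf_size ? used : buf_size;

    assert(dst != NULL && buf != NULL);
    memcpy(dst, buf, n);

    return n;
}
```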
Julien Grall [Mon, 20 Jan 2020 14:10:57 +0000 (14:10 +0000)]
xen/x86: domain: Remove specific case when allocating struct domain
Commit 8916fcf4577 "x86/domain: compile with lock_profile=y enabled"
allowed the struct domain to use more than a PAGE_SIZE (i.e. 4096).
However, the function free_domheap_struct() will only free the first
page.
We could modify the free part to free the correct number of pages, but
the structure has been fitting in a page (even with lock profile
enabled) since commit 428607a410 "x86: shrink 'struct domain', was
already PAGE_SIZE" (part of Xen 4.7).
Therefore, the specific case for lock profile is now removed.
This is not a security issue because struct domain can only be bigger
than a page size for lock profiling. The feature can only be selected
in DEBUG and EXPERT mode.
Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled") Reported-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 31 Jan 2020 14:25:57 +0000 (15:25 +0100)]
tools/xenstore: don't apply write limiting for privileged domain
Xenstore write limiting should not be applied to dom0. Unfortunately
write limiting is disabled only for connections via sockets. When
running in a stubdom Xenstore will apply write limiting to dom0, too.
Change that by testing for the domain to be privileged as well.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Fri, 31 Jan 2020 15:01:45 +0000 (15:01 +0000)]
libxl: generalise libxl__domain_userdata_lock()
This function implements a file-based lock with a file name generated
from a domid.
This patch splits it into two, generalising the core of the locking code
into a new libxl__lock_file() function which operates on a specified file,
leaving just the file name generation in libxl__domain_userdata_lock().
This patch also generalises libxl__unlock_domain_userdata() to
libxl__unlock_file() and modifies all call-sites.
Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Fri, 31 Jan 2020 15:01:44 +0000 (15:01 +0000)]
libxl_create: make 'soft reset' explicit
The 'soft reset' code path in libxl__domain_make() is currently taken if a
valid domid is passed into the function. A subsequent patch will enable
higher levels of the toolstack to determine the domid of newly created or
restored domains and therefore this criterion for choosing 'soft reset'
will no longer be usable.
This patch adds an extra boolean option to libxl__domain_make() to specify
whether it is being invoked in soft reset context and appropriately
modifies callers to choose the right value. To facilitate this, a new
'soft_reset' boolean field is added to struct libxl__domain_create_state
and the 'domid_soft_reset' field is renamed to 'domid' in anticipation of
its wider remit. For the moment do_domain_create() will always set
domid to INVALID_DOMID and hence we can add an assertion into
libxl__domain_create() that, if it is not called in soft reset context,
the passed in domid is exactly that value.
Whilst in the neighbourhood, some checks of 'restore_fd > -1' have been
replaced by 'restore_fd >= 0' to be more conventional and consistent with
checks of 'restore_fd < 0'.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Fri, 31 Jan 2020 15:01:43 +0000 (15:01 +0000)]
libxl: add definition of INVALID_DOMID to the API
Currently both xl and libxl have internal definitions of INVALID_DOMID
which happen to be identical. However, for the purposes of describing the
behaviour of libxl_domain_create_new/restore() it is useful to have a
specified invalid value for a domain id.
This patch therefore moves the libxl definition from libxl_internal.h to
libxl.h and removes the internal definition from xl_utils.h. The hardcoded
'-1' passed back via domcreate_complete() is then updated to INVALID_DOMID
and the comment above libxl_domain_create_new/restore() is accordingly
modified.
NOTE: The value of INVALID_DOMID (~0) is distinct from the hypervisor's
DOMID_INVALID. This patch preserves that value.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 31 Jan 2020 15:47:29 +0000 (16:47 +0100)]
x86/HVM: relinquish resources also from hvm_domain_destroy()
Domain creation failure paths don't call domain_relinquish_resources(),
yet allocations and the like done from hvm_domain_initialise() need to be
undone nevertheless. Call the function also from hvm_domain_destroy(),
after making sure all descendants are idempotent.
Note that while viridian_{domain,vcpu}_deinit() were already used in
ways suggesting they're idempotent, viridian_time_vcpu_deinit() actually
wasn't: One can't kill a timer that was never initialized.
For hvm_destroy_all_ioreq_servers()'s purposes make
relocate_portio_handler() return whether the to be relocated port range
was actually found. This seems cheaper than introducing a flag into
struct hvm_domain's ioreq_server sub-structure.
In hvm_domain_initialise() additionally
- use XFREE() also to replace adjacent xfree(),
- use hvm_domain_relinquish_resources() as being idempotent now.
There as well as in hvm_domain_destroy() the explicit call to
rtc_deinit() isn't needed anymore.
In hvm_domain_relinquish_resources() additionally drop a no longer
relevant if().
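
The idempotency requirement can be sketched as follows. This is not the Xen code: all names (timer_sketch, vcpu_sketch, XFREE_SKETCH) are hypothetical, and free() stands in for xfree(). The point is only that a deinit function may be reached on a path where init never ran, or may run twice, and must cope with both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Free a pointer and NULL it, so a repeated call becomes free(NULL). */
#define XFREE_SKETCH(p) do { free(p); (p) = NULL; } while (0)

struct timer_sketch {
    bool initialised;   /* set by init, consulted by deinit */
    int  kills;         /* counts real teardown work, for illustration */
};

struct vcpu_sketch {
    struct timer_sketch timer;
    char *buf;
};

static int vcpu_init(struct vcpu_sketch *v)
{
    assert(v != NULL);
    v->buf = malloc(16);
    if (!v->buf)
        return -1;
    v->timer.initialised = true;
    v->timer.kills = 0;
    return 0;
}

/* Safe to call on a zeroed structure, and safe to call twice. */
static void vcpu_deinit(struct vcpu_sketch *v)
{
    assert(v != NULL);
    if (v->timer.initialised) {
        /* Only tear down a timer that was actually set up. */
        v->timer.kills++;
        v->timer.initialised = false;
    }
    XFREE_SKETCH(v->buf);
}
```

With this shape, a creation failure path and the regular destroy path can both call the same function without tracking how far initialization got.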
Fixes: e7a9b5e72f26 ("viridian: separately allocate domain and vcpu structures") Fixes: 26fba3c85571 ("viridian: add implementation of synthetic timers") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Jan Beulich [Thu, 30 Jan 2020 16:19:46 +0000 (17:19 +0100)]
x86: fold linker script pre-processing rules
There's no need to have twice almost the same rule. Simply add the extra
-DEFI to AFLAGS for the EFI variant, and specify both targets for the
then single rule.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 30 Jan 2020 16:18:12 +0000 (17:18 +0100)]
x86: undo part of "refine link time stub area related assertion"
The original check was not too strict: While we don't use one page of
memory per CPU, we do use one page of VA space per CPU. It is the
latter which matters here.
Undo that part of the change, but leave everything else in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 11 Dec 2019 13:55:06 +0000 (13:55 +0000)]
xen: Move CONFIG_INDIRECT_THUNK to Kconfig
Now that Kconfig has the capability to run shell commands when
generating CONFIG_* we can use it in some cases to test CFLAGS.
CONFIG_INDIRECT_THUNK is a good example that wants to exist both in
Makefile and as a C macro, which Kconfig does. So use Kconfig to
generate CONFIG_INDIRECT_THUNK and have the CFLAGS depend on that.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 4 Dec 2019 17:13:51 +0000 (17:13 +0000)]
xen: Import cc-ifversion from Kbuild
This is in preparation of importing Kbuild to build Xen. We won't be
able to include Config.mk so we will need a replacement for the macro
`cc-ifversion'.
This patch imports parts of "scripts/Kbuild.include" from Linux v5.4,
the macro cc-ifversion. It makes use of CONFIG_GCC_VERSION that
Kconfig now provides.
Since there are no other uses of Xen's `cc-ifversion' macro, we can
remove it.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 4 Dec 2019 16:33:23 +0000 (16:33 +0000)]
xen: Have Kconfig check $(CC)'s version
This imports several files from Linux v5.3
- scripts/Kconfig.include
- scripts/clang-version.sh
- scripts/gcc-version.sh
and several config values from Linux's init/Kconfig file.
But gcc-version.sh has been modified to return "0" when $CC isn't
GCC, like clang-version.sh does.
Files are copied into scripts/ directory because that's where the files
are found in Linux tree, and also because we are going to import more
of Kbuild from Linux which is located in scripts/.
CONFIG_GCC_VERSION and CONFIG_CC_IS_CLANG are going to be used in
follow-up patches.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Tue, 17 Sep 2019 13:13:50 +0000 (14:13 +0100)]
xen: Update Kconfig to Linux v5.4
This patch updates Kconfig to a more recent version of Kconfig, found
in Linux v5.4.0, 219d54332a09 ("Linux 5.4").
With the updated version of Kconfig, other changes are necessary to
avoid breaking the build.
Kconfig files:
- fix Kconfig files that were using option env=*:
Since Linux commit 104daea149c4 ("kconfig: reference environment
variables directly and remove 'option env='"), we can access the
environment directly via $() and "option env=" has been removed.
- CONFIG_EXPERT='y' will now appear in .config file if
XEN_CONFIG_EXPERT=y in the environment. The alternative is to change
"EXPERT" to "$(XEN_CONFIG_EXPERT)" in all Kconfig files.
Makefile:
- silentoldconfig target has been removed from Kconfig. To update
include/generated/autoconf.h, we need to use syncconfig target
instead.
Makefile.kconfig:
- Import newer needed code from Linux's Makefile.lib and
Kbuild.include and Makefile.build.
- Set Q to empty, Xen build system doesn't silence commands. Having Q
empty means we can import stuff from Linux without having to remove the
leading $(Q) from build commands. And quiet='' means commands will be
echoed.
- Add $(PHONY) to .PHONY. Like it is intended by Kbuild.
Makefile.host is also updated and copied from Linux.
Dependency change:
- Now depends on flex/bison, maybe we could _shipped those files like
before. Linux doesn't do that anymore.
The .gitignore in kconfig/ has more entries, compared to upstream, for
files generated by Makefile.host.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel [Wed, 29 Jan 2020 14:06:50 +0000 (15:06 +0100)]
x86/mem_access: use __get_gfn_type_access in set_mem_access
Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Igor Druzhinin [Wed, 29 Jan 2020 14:06:10 +0000 (15:06 +0100)]
x86/suspend: disable watchdog before calling console_start_sync()
... and enable it after exiting S-state. Otherwise accumulated
output in serial buffer might easily trigger the watchdog if it's
still enabled after entering sync transmission mode.
The issue is observed on machines which, unfortunately, generate non-0
output in CPU offline callbacks.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 29 Jan 2020 13:48:15 +0000 (14:48 +0100)]
x86/mem_sharing: replace MEM_SHARING_DEBUG with gdprintk
Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set). Also use %pd to print the domain
ids.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 29 Jan 2020 13:47:00 +0000 (14:47 +0100)]
x86/apic: fix disabling LVT0 in disconnect_bsp_APIC
The Intel SDM states:
"When an illegal vector value (0 to 15) is written to a LVT entry and
the delivery mode is Fixed (bits 8-11 equal 0), the APIC may signal an
illegal vector error, without regard to whether the mask bit is set or
whether an interrupt is actually seen on the input."
And that's exactly what's currently done in disconnect_bsp_APIC when
virt_wire_setup is true and LVT LINT0 is being masked. By writing only
APIC_LVT_MASKED Xen is actually setting the vector to 0 and the
delivery mode to Fixed (0), and hence it triggers an APIC error even
when the LVT entry is masked.
This would usually manifest when Xen is being shut down, as that's
where disconnect_bsp_APIC is called:
(XEN) APIC error on CPU0: 40(00)
Fix this by calling clear_local_APIC prior to setting the LVT LINT
registers, which already clears LVT LINT0, and hence the troublesome
write can be avoided as the register is already cleared.
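
The reasoning above can be checked with a small sketch that decodes an LVT value. The field positions (vector in the low 8 bits, delivery mode starting at bit 8, mask at bit 16) are assumptions of this sketch based on the quoted text, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_LVT_MASKED       (1u << 16)           /* mask bit */
#define LVT_VECTOR(v)         ((v) & 0xffu)        /* low 8 bits */
#define LVT_DELIVERY_MODE(v)  (((v) >> 8) & 0x7u)  /* field starting at bit 8 */
#define LVT_MODE_FIXED        0u

/* Returns 1 if, per the rule quoted from the SDM, writing 'val' to an LVT
 * entry may raise an illegal vector error. The mask bit is deliberately
 * ignored, since the quoted rule applies regardless of it. */
static int lvt_write_may_fault(uint32_t val)
{
    return LVT_DELIVERY_MODE(val) == LVT_MODE_FIXED && LVT_VECTOR(val) < 16;
}
```

Writing just APIC_LVT_MASKED decodes to vector 0 with Fixed delivery, so lvt_write_may_fault() reports it as problematic even though the entry is masked.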
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Paul Durrant <pdurrant@amazon.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Ian Jackson [Fri, 10 Jan 2020 13:19:36 +0000 (13:19 +0000)]
libxl: event: Move poller pipe emptying to the end of afterpoll
This seems neater. It doesn't have any significant effect because:
The poller fd wouldn't be emptied by time_occurs. It would only be
woken by time_occurs as a result of an ao completing, or by
libxl__egc_ao_cleanup_1_baton. But ...1_baton won't be called in
between (for one thing, this would violate the rule of not still
having the active caller when ...1_baton is called).
While discussing this patch, I noticed that there is a possibility (in
libxl in general) that poller_put might be called on a woken poller.
It would probably be sensible at some point to make poller_get empty
the pipe, at least if the pipe_nonempty flag is set.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v3: Completely revised commit message; now we think this is just
cleanup.
Ian Jackson [Fri, 10 Jan 2020 13:05:42 +0000 (13:05 +0000)]
libxl: event: Fix possible hang with libxl_osevent_beforepoll
If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.
We fix the latent bug by waking up the "poller_app" pipe every time we
add osevents. If the application does not ever call beforepoll, we
write one byte to the pipe and set pipe_nonempty and then we ignore
it. We only write another byte if beforepoll is called again.
Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll. The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent. We'll fix that in a
moment.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: New addition to correctness arguments in libxl_event.c comment.
Ian Jackson [Fri, 10 Jan 2020 13:11:07 +0000 (13:11 +0000)]
libxl: event: Break out baton_wake
No functional change.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Now it takes a gc, not an egc.
Ian Jackson [Fri, 10 Jan 2020 13:11:46 +0000 (13:11 +0000)]
libxl: event: poller pipe optimisation
Track in userland whether the poller pipe is nonempty. This saves us
writing many many bytes to the pipe if nothing ever reads them.
This is going to be relevant in a moment, where we are going to create
a situation where this will happen quite a lot.
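
A minimal sketch of the idea, assuming a POSIX pipe and a single caller (the real code would need the ctx lock around the flag). The structure and function names are hypothetical, apart from pipe_nonempty which is the flag named elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

struct poller_sketch {
    int  wakeup_pipe[2];  /* [0] is the read end, [1] the write end */
    bool pipe_nonempty;   /* userland record of whether a byte is queued */
    int  writes;          /* number of write() calls made, for illustration */
};

static int poller_init(struct poller_sketch *p)
{
    p->pipe_nonempty = false;
    p->writes = 0;
    return pipe(p->wakeup_pipe);
}

/* Wake the poller; skips the syscall if a wakeup byte is already queued. */
static int poller_wakeup(struct poller_sketch *p)
{
    static const char byte[1] = { 0 };

    if (p->pipe_nonempty)
        return 0;
    if (write(p->wakeup_pipe[1], byte, 1) != 1)
        return -1;
    p->pipe_nonempty = true;
    p->writes++;
    return 0;
}

/* Consume the queued wakeup byte, if any, and reset the flag. */
static int poller_drain(struct poller_sketch *p)
{
    char byte[1];

    if (!p->pipe_nonempty)
        return 0;
    if (read(p->wakeup_pipe[0], byte, 1) != 1)
        return -1;
    p->pipe_nonempty = false;
    return 0;
}
```

However many times poller_wakeup() is called between drains, at most one byte is ever written, so the pipe cannot fill up when nobody is reading it.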
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
Ian Jackson [Fri, 10 Jan 2020 12:37:43 +0000 (12:37 +0000)]
libxl: event: Fix hang when mixing blocking and eventy calls
If the application calls libxl with ao_how==0 and also makes calls
like _occurred, libxl will sometimes get stuck.
The bug happens as follows (for example):
Thread A
libxl_do_thing(,ao_how==0)
libxl_do_thing starts, sets up some callbacks
libxl_do_thing exit path calls AO_INPROGRESS
libxl__ao_inprogress goes into event loop
eventloop_iteration sleeps on:
- do_thing's current fd set
- sigchld pipe if applicable
- its poller
Thread B
libxl_something_occurred
the something is to do with do_thing, above
do_thing_next_callback does some more work
do_thing_next_callback becomes interested in fd N
thread B returns to application
Note that nothing wakes up thread A. A is not listening on fd N. So
do_thing_* will not spot when fd N signals. do_thing will not make
further timely progress. If there is no timeout thread A will never
wake up.
The problem here occurs because thread A is waiting on an out of date
osevent set.
There is also the possibility that a thread might block waiting for
libxl osevents but outside libxl, eg if the application used
libxl_osevent_beforepoll. We will deal with that in a moment.
See the big comment in libxl_event.c for a fairly formal correctness
argument.
This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
an egc or ao is disposed of. Firstly egcs: in this patch we rename
libxl__egc_cleanup, which means we catch all the disposal sites.
Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
(ii) ao__inprogress and (iii) an event which completes the ao later.
(i) and (ii) we handle by adding the call to _baton. In the case of
(iii) any such function must be an event-generating function so it has
an egc too, so it will pass on the baton when the egc is disposed.
Reported-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Call libxl__egc_ao_cleanup_1_baton (renamed from __egc_cleanup) on
all exits from ao_inprogress, even requests for async processing.
Fixes a remaining instance of this bug (!)
This involves disposing of ao->poller somewhat earlier.
v2: New correctness arguments in libxl_event.c comment and
in commit message.
Ian Jackson [Mon, 13 Jan 2020 15:56:28 +0000 (15:56 +0000)]
libxl: event: Make libxl__poller_wakeup take a gc, not an egc
We are going to want to call this in the following situation:
* We have just set up an ao, which is to call back - so a
non-synchronous one. It ought not to call the application
back right away, so no egc.
* There is a libxl thread blocking somewhere but it is using
an out of date fd or timeout set, which does not take into
account the ao we have just started.
* We try to wake that thread up, but libxl__poller_wakeup fails.
In more detail:
The idea before was that these two functions take an egc, not so much
because they actually use the egc, but to make sure they're only called in a
restricted set of conditions; and now we're relaxing those conditions.
Specifically, we need to make one exception, relating to ao's.
In the situation described above, there is no egc, but we need to call
libxl__poller_wakeup. Introducing an egc is wrong because that would
imply that this situation might result in application callbacks, but
it shouldn't (and not having an egc prevents that).
libxl__poller_wakeup and LIBXL__EVENT_DISASTER only take an egc for
form's sake; they don't use any part of it other than the gc. The
"form's sake" is to stop them being called from libxl entrypoints that
are not involved in event generation.
Before this patch this is enforced by the types: you can't call it in
the wrong place because it wants an egc which you don't have.
After this patch this is no longer enforced. But the mistake
(principally, calling _DISASTER) seems unlikely. The type enforcement
I mention above was done because it was possible and easy, not because
it was important.
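
The type-based restriction being relaxed can be sketched like this. The struct and function names are hypothetical stand-ins. The only property taken from the text above is that an egc contains a gc and that the functions use nothing but that gc.

```c
#include <assert.h>
#include <stddef.h>

struct gc_sketch  { int wakeups; };
struct egc_sketch { struct gc_sketch gc; int pending_callbacks; };

/* Before: demanding an egc means only event-generating code can call
 * this, because nobody else has an egc to pass. */
static void wakeup_taking_egc(struct egc_sketch *egc)
{
    assert(egc != NULL);
    egc->gc.wakeups++;          /* only the embedded gc is touched */
}

/* After: any caller holding a gc may call this, including one that has
 * just set up an ao and has no egc. The compiler no longer helps. */
static void wakeup_taking_gc(struct gc_sketch *gc)
{
    assert(gc != NULL);
    gc->wakeups++;
}
```

Code that does hold an egc simply passes &egc->gc, so existing call sites change only mechanically.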
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v3: Significantly expanded commit message based on irc comments
v2: New patch
Ian Jackson [Mon, 13 Jan 2020 15:53:39 +0000 (15:53 +0000)]
libxl: event: Make LIBXL__EVENT_DISASTER take a gc, not an egc
We are going to want to change libxl__poller_wakeup to take a gc.
In theory there is a risk here that it would be called inappropriately
in a future patch but this seems unlikely.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v2: New patch
Ian Jackson [Thu, 9 Jan 2020 18:54:19 +0000 (18:54 +0000)]
libxl: event: Introduce CTX_UNLOCK_EGC_FREE
This is a very common exit pattern. We are going to want to change
this pattern. So we should make it into a macro of its own.
No functional change.
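
As an illustration of folding a repeated exit pattern into one macro, here is a sketch with counters standing in for the real lock and egc. Every name ending in _SK is hypothetical, and the order of the two steps inside the combined macro is an assumption of this sketch, not a statement about libxl.

```c
#include <assert.h>

struct ctx_sketch { int locked; int egcs_live; };

#define CTX_LOCK_SK(c)    ((c)->locked++)
#define CTX_UNLOCK_SK(c)  ((c)->locked--)
#define EGC_INIT_SK(c)    ((c)->egcs_live++)
#define EGC_FREE_SK(c)    ((c)->egcs_live--)

/* The two-step exit pattern, wrapped so later changes happen in one place. */
#define CTX_UNLOCK_EGC_FREE_SK(c) \
    do { CTX_UNLOCK_SK(c); EGC_FREE_SK(c); } while (0)

/* Hypothetical entrypoint: every exit funnels through the one macro. */
static int entrypoint_sketch(struct ctx_sketch *c, int fail)
{
    int rc = 0;

    assert(c != 0);
    EGC_INIT_SK(c);
    CTX_LOCK_SK(c);

    if (fail) {
        rc = -1;
        goto out;
    }

 out:
    CTX_UNLOCK_EGC_FREE_SK(c);
    return rc;
}
```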
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
Ian Jackson [Thu, 9 Jan 2020 18:20:24 +0000 (18:20 +0000)]
libxl: event: Rename ctx.pollers_fd_changed to .pollers_active
We are going to use this a bit more widely. Make the name more
general.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
Ian Jackson [Thu, 9 Jan 2020 18:06:54 +0000 (18:06 +0000)]
libxl: event: Rename poller.fds_changed to .fds_deregistered
This is only for deregistration. We are going to add another variable
for new events, with different semantics, and this overly-general name
will become confusing.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
Paul Durrant [Mon, 27 Jan 2020 15:19:07 +0000 (15:19 +0000)]
docs: retrospectively add XS_DIRECTORY_PART to the xenstore protocol...
... specification.
This was added by commit 0ca64ed8 "xenstore: add support for reading
directory with many children" but not added to the specification at that
point. A version of xenstored supporting the command was first released
in Xen 4.9.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Backport: 4.9+