Paul Durrant [Thu, 30 Jan 2020 11:55:35 +0000 (11:55 +0000)]
add a domain_tot_pages() helper function
This patch adds a new domain_tot_pages() inline helper function into
sched.h, which will be needed by a subsequent patch.
No functional change.
NOTE: While modifying the comment for 'tot_pages' in sched.h this patch
makes some cosmetic fixes to surrounding comments.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: George Dunlap <George.Dunlap@eu.citrix.com> Acked-by: Julien Grall <julien@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Wei Liu [Thu, 13 Feb 2020 21:40:27 +0000 (21:40 +0000)]
libxl: mark parameters in stub functions as unused
Hopefully this can fix issues like:
In file included from ../../src/libxl/xen_xl.c:24:0:
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h: In function 'libxl_cpuid_apply_policy':
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h:2345:56: error: unused parameter 'ctx' [-Werror=unused-parameter]
static inline void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid) {}
Fixes: dacb80f9 ("tools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API") Signed-off-by: Wei Liu <wl@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 13 Feb 2020 13:42:00 +0000 (13:42 +0000)]
automation: update debian:unstable-arm64v8 to have python3-config
The Arm container wasn't updated in the original patch.
Fixes: 1a3673da6482 ("automation: updating container to have python3-config binary") Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 12 Feb 2020 07:41:53 +0000 (08:41 +0100)]
xenstore: add console xenstore entries for xenstore stubdom
In order to be able to connect to the console of Xenstore stubdom we
need to create the appropriate entries in Xenstore.
For the moment we don't support xenconsoled living in another domain
than dom0, as this information isn't available other then via
Xenstore which we are just setting up.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
In order to be able to get access to the console of Xenstore stubdom
we need an appropriate granttab entry. So call xc_dom_gnttab_init()
when constructing the domain and preset some information needed
for that function in the dom structure.
We need to create the event channel for the console, too. Do that and
store all necessary data locally.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Wed, 12 Feb 2020 09:52:20 +0000 (10:52 +0100)]
dom0-build: fix build with clang5
With non-empty CONFIG_DOM0_MEM clang5 produces
dom0_build.c:344:24: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
^ ~~~~~~~~~~~~~~~~~~
dom0_build.c:344:24: note: use '&' for a bitwise operation
if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
^~
&
dom0_build.c:344:24: note: remove constant to silence this warning
if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
~^~~~~~~~~~~~~~~~~~~~~
1 error generated.
Obviously neither of the two suggestions are an option here. Oddly
enough swapping the operands of the && helps, while e.g. casting or
parenthesizing doesn't. Another workable variant looks to be the use of
!! on the constant.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien@xen.org> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Wed, 5 Feb 2020 15:25:21 +0000 (15:25 +0000)]
tools/libxl: Combine legacy CPUID handling logic
While we are in the process of overhauling boot time CPUID/MSR handling, the
existing logic is going to have to remain in roughly this form for backwards
compatibility.
Fold libxl__cpuid_apply_policy() and libxl__cpuid_set() together into a single
libxl__cpuid_legacy() to reduce the complexity for callers.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Thu, 5 Dec 2019 17:19:47 +0000 (17:19 +0000)]
MAINTAINERS: Add explicit check-in policy section
The "nesting" section in the MAINTAINERS file was not initially
intended to describe the check-in policy for patches, but only how
nesting worked; but since there was no check-in policy, it has been
acting as a de-facto policy.
One problem with this is that the policy is not complete: It doesn't
cover open objections, time to check-in, or so on. The other problem
with the policy is that, as written, it doesn't account for
maintainers submitting patches to files which they themselves
maintain. This is fine for situations where there are are multiple
maintainers, but not for situations where there is only one
maintainer.
Add an explicit "Check-in policy" section to the MAINTAINERS document
to serve as the canonical reference for the check-in policy. Move
paragraphs not explicitly related to nesting into it.
While here, "promote" the "The meaning of nesting" section title.
DISCUSSION
This seems to be a change from people's understanding of the current
policy. Most people's understanding of the current policy seems to be:
1. In order to get a change to a given file committed, it must have
an Ack or Review from at least one *maintainer* of that file other
than the submitter.
2. In the case where a file has only one maintainer, it must have an
Ack or Review from a "nested" maintainer.
I.e., if I submitted something to x86/mm, it would require an Ack from
Jan or Andy, or (in exceptional circumstances) The Rest; but an Ack from
(say) Roger or Juergen wouldn't suffice.
Let's call this the "maintainer-ack" approach (because it must have an
ack or r-b from a maintainer to be checked in), and the proposal in
this patch the "maintainer-approval" (since SoB from a maintainer
indicates approval).
The core issue I have with "maintainer-ack" is that it makes the
maintainer less privileged with regard to writing code than
non-maintainers. If component X has maintainers A and B, then a
non-maintainer can have code checked in if reviewed either by A or B.
If A or B wants code checked in, they have to wait for exactly one
person to review it.
In fact, if B is quite busy, the easiest way for A really to get their
code checked in might be to hand it to a non-maintainer N, and ask N
to submit it as their own. Then A can Ack the patches and check them
in.
The current system, therefore, either sets up a perverse incentive (if
you think the behavior described above is unacceptable) or unnecessary
bureaucracy (if you think it's acceptable). Either way I think we
should set up our system to avoid it.
Other variations on "maintainer-ack" have been proposed:
- Allow maintainer's patches to go in with an R-b from "designated
reviewers"
- Allow maintainer's patches to go in with an Ack from more general
maintainer
Both fundamentally make it harder for maintainers to get their code in
and/or reviewed effectively than non-maintainers, setting up the
perverse incentive / unnecessary bureaucracy.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 10 Feb 2020 18:33:26 +0000 (18:33 +0000)]
x86/pvh: Adjust dom0's starting state
Fixes: b25fb1a04e "xen/pvh: Fix segment selector ABI" Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sun, 2 Feb 2020 18:23:47 +0000 (18:23 +0000)]
AMD/IOMMU: Treat head/tail pointers as byte offsets
The MMIO registers as already byte offsets. Using them in this form removes
the need to shift their values for use.
It is also inefficient to store both entries and alloc_size (which only differ
by entry_size). Rename alloc_size to size, and drop entries entirely, which
simplifies the allocation/deallocation helpers slightly.
Mark send_iommu_command() and invalidate_iommu_all() as static, as they have
no external declaration or callers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Thu, 6 Feb 2020 16:48:10 +0000 (16:48 +0000)]
MAINTAINERS: cc community manager on patches to CHANGELOG.md
The purpose of the change-log is to be a human-readable list of notable
changes and, as such, additions to it are likely of interest to the
community manager in preparing blog entries, release summaries, etc.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Julien Grall <julien@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Mon, 10 Feb 2020 11:27:32 +0000 (11:27 +0000)]
x86/svm: Reduce vmentry latency
Writing to the stack pointer in the middle of a line of pop operations is
specifically recommended against by the optimisation guide, and is a technique
used by Speculative Load Hardening to combat SpectreRSB.
In practice, it causes all further stack-relative accesses to block until the
write to the stack pointer retires, so the stack engine can get back in sync.
Pop into any dead register to discard %rax's value without clobbering the
stack engine. Smaller compiled code, and runs faster.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 3 Feb 2020 13:50:34 +0000 (13:50 +0000)]
AMD/IOMMU: Treat guest head/tail pointers as byte offsets
The MMIO registers as already formatted as byte offsets. Start by masking out
reserved bits, which fixes an implementation bug (reserved bits should be
read-only zero, rather than preserving their previously-written value). As a
consequence, we can use the values directly, instead of masking/shifting on
every use.
Store the buffer size, rather than the number of entries, to keep the same
units for comparison purposes.
This simplifies guest_iommu_get_table_mfn() by dropping the entry_size
parameter, and simplifies the map_domain_page() handling by being able to drop
the log_base variables.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Drop all status register MASK/SHIFT constants, and enumerate the bits
normally. Rename EVENT_OVERFLOW to EVENT_LOG_OVERFLOW for consistency. (The
field name in the spec is inconsistent, despite the description referring to
an overflow of the event log.)
The only semantic change is in iommu_reset_log() where 'run_bit' changes from
being a bit position to being a single-bit mask. Update some local variable
types to be more suitable.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 3 Feb 2020 13:09:17 +0000 (13:09 +0000)]
AMD/IOMMU: Move headers to be local
We currently have amd-iommu-defs.h, amd-iommu-proto.h and amd-iommu.h, and no
references outside of the AMD IOMMU driver.
Keep iommu-defs.h as is, but merge amd-iommu.h and amd-iommu-proto.h to just
iommu.h, and move them both into drivers/passthrough/amd/. (While merging,
drop the bogus #pragma pack around the *_entry structures.)
Take the opportunity to trim the include lists, including x86/mm/p2m.c
which (AFAICT) hasn't needed this include since c/s aef3f2275 "x86/mm/p2m:
break into common, pt-implementation and pod parts" in 2011.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 8 Feb 2020 14:47:48 +0000 (14:47 +0000)]
xen/pvh: Fix segment selector ABI
The written ABI states that %es will be set up, but libxc doesn't do so. In
practice, it breaks `rep movs` inside guests before they reload %es.
The written ABI doesn't mention %ss, but libxc does set it up. Having %ds
different to %ss is obnoxous to work with, as different registers have
different implicit segments.
Modify the spec to state that %ss is set up as a flat read/write segment.
This a) matches the Multiboot 1 spec, b) matches what is set up in practice,
and c) is the more sane behaviour for guests to use.
Fixes: 68e1183411b ('libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests') Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jeff Kubascik [Tue, 4 Feb 2020 19:51:50 +0000 (14:51 -0500)]
xen/arm: Handle unimplemented VGICv3 registers as RAZ/WI
Per the ARM Generic Interrupt Controller Architecture Specification (ARM
IHI 0069E), reserved registers should generally be treated as RAZ/WI.
To simplify the VGICv3 design and improve guest compatibility, treat the
default case for GICD and GICR registers as read_as_zero/write_ignore.
Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> Acked-by: Julien Grall <julien@xen.org>
Julien Grall [Thu, 6 Feb 2020 15:41:18 +0000 (15:41 +0000)]
xen/include: public: Document the padding in struct xen_hvm_param
There is an implicit padding of 2 bytes in struct xen_hvm_param between
the field domid and index. Make it explicit by introduce a padding
field. This can also serve as documentation.
Note that I don't think we can mandate it to be zero because a guest may
not have initialized the padding.
Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Thu, 6 Feb 2020 15:23:30 +0000 (16:23 +0100)]
x86/HVM: reduce scope of pfec in hvm_emulate_init_per_insn()
It needs calculating only in one out of three cases. Re-structure the
code a little such that the variable truly gets calculated only when we
don't get any insn bytes from elsewhere, and hence need to (try to)
fetch them. Also OR in PFEC_insn_fetch right in the initializer.
While in this mood, restrict addr's scope as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Wei Liu [Wed, 5 Feb 2020 18:02:24 +0000 (18:02 +0000)]
x86/guest/xen: only set HVM parameter on BSP
There is no need for every CPU to set a guest property.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 6 Feb 2020 08:55:18 +0000 (09:55 +0100)]
domctl/vNUMA: avoid arithmetic overflow
Checking the result of a multiplication against a certain limit has no
sufficient implication on the original value's range. In the case here
it is in particular problematic that while handling the domctl we do
if ( copy_from_guest(info->vdistance, uinfo->vdistance,
nr_vnodes * nr_vnodes) )
goto vnuma_fail;
which means copying sizeof(unsigned int) * (nr_vnodes * nr_vnodes)
bytes, and the handling of XENMEM_get_vnumainfo similarly has
Jan Beulich [Thu, 6 Feb 2020 08:53:12 +0000 (09:53 +0100)]
xmalloc: guard against integer overflow
There are hypercall handling paths (EFI ones are what this was found
with) needing to allocate buffers of a caller specified size. This is
generally fine, as our page allocator enforces an upper bound on all
allocations. However, certain extremely large sizes could, when adding
in allocator overhead, result in an apparently tiny allocation size,
which would typically result in either a successful allocation, but a
severe buffer overrun when using that memory block, or in a crash right
in the allocator code.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 6 Feb 2020 08:52:33 +0000 (09:52 +0100)]
EFI: don't leak heap contents through XEN_EFI_get_next_variable_name
Commit 1f4eb9d27d0e ("EFI: fix getting EFI variable list on some
systems") switched to using the caller provided size for the copy-out
without making sure the copied buffer is properly scrubbed.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 6 Feb 2020 08:51:17 +0000 (09:51 +0100)]
EFI: re-check {get,set}-variable name strings after copying in
A malicious guest given permission to invoke XENPF_efi_runtime_call may
play with the strings underneath Xen sizing them and copying them in.
Guard against this by re-checking the copyied in data for consistency
with the initial sizing. At the same time also check that the actual
copy-in is in fact successful, and switch to the lighter weight non-
checking flavor of the function.
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Wed, 15 Jan 2020 16:40:49 +0000 (16:40 +0000)]
x86/hyperv: setup hypercall page
Hyper-V uses a technique called overlay page for its hypercall page. It
will insert a backing page to the guest when the hypercall functionality
is enabled. That means we can use a page that is not backed by real
memory for hypercall page.
To avoid shattering L0 superpages and treading on any MMIO areas
residing in low addresses, use the top-most addressable page for that
purpose. Adjust e820 map accordingly.
We also need to register Xen's guest OS ID to Hyper-V. Use 0x3 as the
vendor ID. Fix the comment in hyperv-tlfs.h while at it.
Signed-off-by: Wei Liu <liuwe@microsoft.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Paul Durrant <pdurrant@amazon.com>
Wei Liu [Wed, 8 Jan 2020 21:35:23 +0000 (21:35 +0000)]
x86: provide executable fixmap facility
This allows us to set aside some address space for executable mapping.
This fixed map range starts from XEN_VIRT_END so that it is within reach
of the .text section.
Shift the percpu stub range and shrink livepatch range accordingly.
Signed-off-by: Wei Liu <liuwe@microsoft.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 5 Feb 2020 12:53:14 +0000 (13:53 +0100)]
x86/mem_sharing: use default_access in add_to_physmap
When plugging a hole in the target physmap don't use the access permission
returned by __get_gfn_type_access as it is non-sensical (p2m_access_n) in
the use-case add_to_physmap was intended to be used in. It leads to vm_events
being sent out for access violations at unexpected locations. Make use of
p2m->default_access instead and document the ambiguity surrounding "hole"
types and corner-cases with custom mem_access being set on holes.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 5 Feb 2020 12:52:29 +0000 (13:52 +0100)]
x86/hvm: introduce hvm_copy_context_and_params
Currently the hvm parameters are only accessible via the HVMOP hypercalls. In
this patch we introduce a new function that can copy both the hvm context and
parameters directly into a target domain.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:50:46 +0000 (13:50 +0100)]
x86/vvmx: don't enable interrupt window when using virt intr delivery
If virtual interrupt delivery is used to inject the interrupt to the
guest the interrupt window shouldn't be enabled, as the interrupt is
already injected using the GUEST_INTR_STATUS vmcs field.
Reported-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:50:09 +0000 (13:50 +0100)]
x86/vvmx: fix VM_EXIT_ACK_INTR_ON_EXIT handling
When VM_EXIT_ACK_INTR_ON_EXIT is clear in the vmexit control vmcs
register the bit 31 of VM_EXIT_INTR_INFO must be 0, in order to denote
that the field doesn't contain any interrupt information. This is not
currently acknowledged as the field always get filled with valid
interrupt information, regardless of whether VM_EXIT_ACK_INTR_ON_EXIT
is set.
Fix this and only fill VM_EXIT_INTR_INFO when VM_EXIT_ACK_INTR_ON_EXIT
is set. Note that this requires one minor change in
nvmx_update_apicv in order to obtain the interrupt information from
the internal state rather than the nested vmcs register.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Wed, 5 Feb 2020 12:49:09 +0000 (13:49 +0100)]
x86/vvmx: fix virtual interrupt injection when Ack on exit control is used
When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
interrupts shouldn't be injected using the virtual interrupt delivery
mechanism unless the Ack on exit vmexit control bit isn't set in the
nested vmcs.
Gate the call to nvmx_update_apicv helper on whether the nested vmcs
has the Ack on exit bit set in the vmexit control field.
Note that this fixes the usage of x2APIC by the L1 VMM, at least when
the L1 VMM is Xen.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 4 Feb 2020 20:29:38 +0000 (20:29 +0000)]
libxc/restore: Fix REC_TYPE_X86_PV_VCPU_XSAVE data auditing (take 2)
It turns out that a bug (since forever) in Xen causes XSAVE records to have
non-architectural behaviour on xsave-capable hardware, when a PV guest has not
touched the state.
In such a case, the data record returned from Xen is 2*uint64_t, both claiming
the (illegitimate) state of %xcr0 and %xcr0_accum being 0.
Adjust the bound in handle_x86_pv_vcpu_blob() to cope with this.
Fixes: 2a62c22715b "libxc/restore: Fix data auditing in handle_x86_pv_vcpu_blob()" Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Wed, 5 Feb 2020 11:24:12 +0000 (11:24 +0000)]
libxl: fix assertion failure in stub domain creation
An assertion in libxl__domain_make():
'soft_reset || *domid == INVALID_DOMID'
does not hold true for stub domain creation, where soft_reset is false
but the passed in domid == 0. This is easily fixed by changing the
initializer in libxl__spawn_stub_dm().
NOTE: The comment for XEN_DOMCTL_createdomain in domctl.h is changed to
reflect reality.
Fixes: 75259239d85d ("libxl_create: make 'soft reset' explicit") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Stefan Bader [Tue, 4 Feb 2020 09:34:23 +0000 (09:34 +0000)]
tools/xenstore: Re-introduce (fake) xs_restrict call to preserve ABI
libxenstore3.0 in Xen 4.8 had this function. We don't really want to
bump the ABI version (soname) just for this, since we don't think
there are actual callers anywhere. But tools complain about the
symbol going away.
So, provide a function xs_restrict which conforms to the original
semantics, although it always fails.
Gbp-Pq: Topic xenstore
Gbp-Pq: Name tools-fake-xs-restrict.patch Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 3 Feb 2020 12:07:19 +0000 (13:07 +0100)]
x86/EPT: do away with hidden GUEST_TABLE_MAP_FAILED == 0 assumptions
The code is quite a bit easier to read and to reason about this way,
I think.
In ept_set_entry() additionally change the function's return value in
the MAP_FAILED case to -ENOMEM; -ENOENT would be applicable only when
ept_next_entry() was invoked with "read_only" set to true.
In two cases, where ept_next_level() follows an ept_split_superpage()
invocation, actually tighten the loop exit condition from
"== MAP_FAILED" to "!= NORMAL_PAGE". Continuing these loops for other
than NORMAL_PAGE is invalid, and there are ASSERT()s in place after
these loops.
Also reduce the scope of "ret" variables where possible, in particular
to better distinguish them from "rc" often used in the same function.
Finally drop pointless "else" in a few areas touched anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Juergen Gross [Mon, 3 Feb 2020 12:04:30 +0000 (13:04 +0100)]
xen: split parameter related definitions in own header file
Move the parameter related definitions from init.h into a new header
file param.h. This will avoid include hell when new dependencies are
added to parameter definitions.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Julien Grall <julien@xen.org> Acked-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 27 Jan 2020 13:34:12 +0000 (13:34 +0000)]
xen/x86: domctl: Don't leak data via XEN_DOMCTL_gethvmcontext
The HVM context may not fill up the full buffer passed by the caller.
While we report corectly the size of the context, we will still be
copying back the full size of the buffer.
As the buffer is allocated through xmalloc(), we will be copying some
bits from the previous allocation.
Only copy back the part of the buffer used by the HVM context to prevent
any leak.
Note that per XSA-72, this is not a security issue.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 20 Jan 2020 14:10:57 +0000 (14:10 +0000)]
xen/x86: domain: Remove specific case when allocating struct domain
Commit 8916fcf4577 "x86/domain: compile with lock_profile=y enabled"
allowed the struct domain to use more than a PAGE_SIZE (i.e 4096).
However, the function free_domheap_struct() will only free the first
page.
We could modify the free part to free the correct number of pages, but
the structure has been fitting in a page (even with lock profile
enabled) since commit 428607a410 "x86: shrink 'struct domain', was
already PAGE_SIZE" (part of Xen 4.7).
Therefore, the specific case for lock profile is now removed.
This is not a security issue because struct domain can only be bigger
than a page size for lock profiling. The feature can only be selected
in DEBUG and EXPERT mode.
Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled") Reported-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 31 Jan 2020 14:25:57 +0000 (15:25 +0100)]
tools/xenstore: don't apply write limiting for privileged domain
Xenstore write limiting should not be applied to dom0. Unfortunately
write limiting is disabled only for connections via sockets. When
running in a stubdom Xenstore will apply write limiting to dom0, too.
Change that by testing for the domain to be privileged as well.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Fri, 31 Jan 2020 15:01:45 +0000 (15:01 +0000)]
libxl: generalise libxl__domain_userdata_lock()
This function implements a file-based lock with a file name generated
from a domid.
This patch splits it into two, generalising the core of the locking code
into a new libxl__lock_file() function which operates on a specified file,
leaving just the file name generation in libxl__domain_userdata_lock().
This patch also generalises libxl__unlock_domain_userdata() to
libxl__unlock_file() and modifies all call-sites.
Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Fri, 31 Jan 2020 15:01:44 +0000 (15:01 +0000)]
libxl_create: make 'soft reset' explicit
The 'soft reset' code path in libxl__domain_make() is currently taken if a
valid domid is passed into the function. A subsequent patch will enable
higher levels of the toolstack to determine the domid of newly created or
restored domains and therefore this criteria for choosing 'soft reset'
will no longer be usable.
This patch adds an extra boolean option to libxl__domain_make() to specify
whether it is being invoked in soft reset context and appropriately
modifies callers to choose the right value. To facilitate this, a new
'soft_reset' boolean field is added to struct libxl__domain_create_state
and the 'domid_soft_reset' field is renamed to 'domid' in anticipation of
its wider remit. For the moment do_domain_create() will always set
domid to INVALID_DOMID and hence we can add an assertion into
libxl__domain_create() that, if it is not called in soft reset context,
the passed in domid is exactly that value.
Whilst in the neighbourhood, some checks of 'restore_fd > -1' have been
replaced by 'restore_fd >= 0' to be more conventional and consistent with
checks of 'restore_fd < 0'.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Fri, 31 Jan 2020 15:01:43 +0000 (15:01 +0000)]
libxl: add definition of INVALID_DOMID to the API
Currently both xl and libxl have internal definitions of INVALID_DOMID
which happen to be identical. However, for the purposes of describing the
behaviour of libxl_domain_create_new/restore() it is useful to have a
specified invalid value for a domain id.
This patch therefore moves the libxl definition from libxl_internal.h to
libxl.h and removes the internal definition from xl_utils.h. The hardcoded
'-1' passed back via domcreate_complete() is then updated to INVALID_DOMID
and comment above libxl_domain_create_new/restore() is accordingly
modified.
NOTE: The value of INVALID_DOMID (~0) is distinct from the hypervisor's
DOMID_INVALID. This patch preserves that value.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 31 Jan 2020 15:47:29 +0000 (16:47 +0100)]
x86/HVM: relinquish resources also from hvm_domain_destroy()
Domain creation failure paths don't call domain_relinquish_resources(),
yet allocations and alike done from hvm_domain_initialize() need to be
undone nevertheless. Call the function also from hvm_domain_destroy(),
after making sure all descendants are idempotent.
Note that while viridian_{domain,vcpu}_deinit() were already used in
ways suggesting they're idempotent, viridian_time_vcpu_deinit() actually
wasn't: One can't kill a timer that was never initialized.
For hvm_destroy_all_ioreq_servers()'s purposes make
relocate_portio_handler() return whether the to be relocated port range
was actually found. This seems cheaper than introducing a flag into
struct hvm_domain's ioreq_server sub-structure.
In hvm_domain_initialise() additionally
- use XFREE() also to replace adjacent xfree(),
- use hvm_domain_relinquish_resources() as being idempotent now.
There as well as in hvm_domain_destroy() the explicit call to
rtc_deinit() isn't needed anymore.
In hvm_domain_relinquish_resources() additionally drop a no longer
relevant if().
Fixes: e7a9b5e72f26 ("viridian: separately allocate domain and vcpu structures") Fixes: 26fba3c85571 ("viridian: add implementation of synthetic timers") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Jan Beulich [Thu, 30 Jan 2020 16:19:46 +0000 (17:19 +0100)]
x86: fold linker script pre-processing rules
There's no need to have twice almost the same rule. Simply add the extra
-DEFI to AFLAGS for the EFI variant, and specify both targets for the
then single rule.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 30 Jan 2020 16:18:12 +0000 (17:18 +0100)]
x86: undo part of "refine link time stub area related assertion"
The original check was not too strict: While we don't use one page of
memory per CPU, we do use ons page of VA space per CPU. It is the
latter which matters here.
Undo that part of the change, but leave everything else in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 11 Dec 2019 13:55:06 +0000 (13:55 +0000)]
xen: Move CONFIG_INDIRECT_THUNK to Kconfig
Now that Kconfig has the capability to run shell command when
generating CONFIG_* we can use it in some cases to test CFLAGS.
CONFIG_INDIRECT_THUNK is a good example that wants to exist both in
Makefile and as a C macro, which Kconfig do. So use Kconfig to
generate CONFIG_INDIRECT_THUNK and have the CFLAGS depends on that.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 4 Dec 2019 17:13:51 +0000 (17:13 +0000)]
xen: Import cc-ifversion from Kbuild
This is in preparation of importing Kbuild to build Xen. We won't be
able to include Config.mk so we will need a replacement for the macro
`cc-ifversion'.
This patch imports parts of "scripts/Kbuild.include" from Linux v5.4,
the macro cc-ifversion. It makes use of CONFIG_GCC_VERSION that
Kconfig now provides.
Since they are no other use of Xen's `cc-ifversion' macro, we can
remove it.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 4 Dec 2019 16:33:23 +0000 (16:33 +0000)]
xen: Have Kconfig check $(CC)'s version
This import several files from Linux v5.3
- scripts/Kconfig.include
- scripts/clang-version.sh
- scripts/gcc-version.sh
and several config values from from Linux's init/Kconfig file.
But gcc-version.sh have been modified to return "0" when $CC isn't
GCC, like clang-version.sh do.
Files are copied into scripts/ directory because that's were the files
are found in Linux tree, and also because we are going to import more
of Kbuild from Linux which is located in scripts/.
CONFIG_GCC_VERSION and CONFIG_CC_IS_CLANG are going to be use in
follow-up patches.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Tue, 17 Sep 2019 13:13:50 +0000 (14:13 +0100)]
xen: Update Kconfig to Linux v5.4
This patch updates Kconfig to a more recent version of Kconfig, found
in Linux v5.4.0, 219d54332a09 ("Linux 5.4").
With the updated version of Kconfig, other changes are necessary to
avoid breaking the build.
Kconfig files:
- fix Kconfig files that where using option env=*:
Since Linux commit 104daea149c4 ("kconfig: reference environment
variables directly and remove 'option env='"), we can access the
environment directly via $() and "option env=" as been removed.
- CONFIG_EXPERT='y' will now appear in .config file if
XEN_CONFIG_EXPERT=y in the environment. The alternative is to change
"EXPERT" to "$(XEN_CONFIG_EXPERT)" in all Kconfig files.
Makefile:
- silentoldconfig target as been removed from Kconfig. To update
include/generated/autoconf.h, we need to use syncconfig target
instead.
Makefile.kconfig:
- Import newer needed code from Linux's Makefile.lib and
Kbuild.include and Makefile.build.
- Set Q to empty, Xen build system doesn't silence commands. Having Q
empty mean we can import stuff from Linux without having to remove the
leading $(Q) from build commands. And quiet='' means commands will be
echoed.
- Add $(PHONY) to .PHONY. Like it is intended by Kbuild.
Makefile.host is also updated and copied from Linux.
Dependency change:
- Now depends on flex/bison, maybe we could _shipped those files like
before. Linux doesn't do that anymore.
The .gitignore in kconfig/ has more entries, compared to upstream, for
file generated by Makefile.host.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel [Wed, 29 Jan 2020 14:06:50 +0000 (15:06 +0100)]
x86/mem_access: use __get_gfn_type_access in set_mem_access
Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Igor Druzhinin [Wed, 29 Jan 2020 14:06:10 +0000 (15:06 +0100)]
x86/suspend: disable watchdog before calling console_start_sync()
... and enable it after exiting S-state. Otherwise accumulated
output in serial buffer might easily trigger the watchdog if it's
still enabled after entering sync transmission mode.
The issue observed on machines which, unfortunately, generate non-0
output in CPU offline callbacks.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Wed, 29 Jan 2020 13:48:15 +0000 (14:48 +0100)]
x86/mem_sharing: replace MEM_SHARING_DEBUG with gdprintk
Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set). Also use %pd to print the domain
ids.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 29 Jan 2020 13:47:00 +0000 (14:47 +0100)]
x86/apic: fix disabling LVT0 in disconnect_bsp_APIC
The Intel SDM states:
"When an illegal vector value (0 to 15) is written to a LVT entry and
the delivery mode is Fixed (bits 8-11 equal 0), the APIC may signal an
illegal vector error, without regard to whether the mask bit is set or
whether an interrupt is actually seen on the input."
And that's exactly what's currently done in disconnect_bsp_APIC when
virt_wire_setup is true and LVT LINT0 is being masked. By writing only
APIC_LVT_MASKED Xen is actually setting the vector to 0 and the
delivery mode to Fixed (0), and hence it triggers an APIC error even
when the LVT entry is masked.
This would usually manifest when Xen is being shut down, as that's
where disconnect_bsp_APIC is called:
(XEN) APIC error on CPU0: 40(00)
Fix this by calling clear_local_APIC prior to setting the LVT LINT
registers which already clear LVT LINT0, and hence the troublesome
write can be avoided as the register is already cleared.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Paul Durrant <pdurrant@amazon.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Ian Jackson [Fri, 10 Jan 2020 13:19:36 +0000 (13:19 +0000)]
libxl: event: Move poller pipe emptying to the end of afterpoll
This seems neater. It doesn't have any significant effect because:
The poller fd wouldn't be emptied by time_occurs. It would only be
woken by time_occurs as a result of an ao completing, or by
libxl__egc_ao_cleanup_1_baton. But ...1_baton won't be called in
between (for one thing, this would violate the rule of not still
having the active caller when ...1_baton is called).
While discussing this patch, I noticed that there is a possibility (in
libxl in general) that poller_put might be called on a woken poller.
It would probably be sensible at some point to make poller_get empty
the pipe, at least if the pipe_nonempty flag is set.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v3: Completely revised commit message; now we think this is just
cleanup.
Ian Jackson [Fri, 10 Jan 2020 13:05:42 +0000 (13:05 +0000)]
libxl: event: Fix possible hang with libxl_osevent_beforepoll
If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.
We fix the latent bug by waking up the "poller_app" pipe every time we
add osevents. If the application does not ever call beforepoll, we
write one byte to the pipe and set pipe_nonempty and then we ignore
it. We only write another byte if beforepoll is called again.
Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll. The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent. We'll fix that in a
moment.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: New addition to correctness arguments in libxl_event.c comment.
Ian Jackson [Fri, 10 Jan 2020 13:11:07 +0000 (13:11 +0000)]
libxl: event: Break out baton_wake
No functional change.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Now it takes a gc, not an egc.
Ian Jackson [Fri, 10 Jan 2020 13:11:46 +0000 (13:11 +0000)]
libxl: event: poller pipe optimisation
Track in userland whether the poller pipe is nonempty. This saves us
writing many many bytes to the pipe if nothing ever reads them.
This is going to be relevant in a moment, where we are going to create
a situation where this will happen quite a lot.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
Ian Jackson [Fri, 10 Jan 2020 12:37:43 +0000 (12:37 +0000)]
libxl: event: Fix hang when mixing blocking and eventy calls
If the application calls libxl with ao_how==0 and also makes calls
like _occurred, libxl will sometimes get stuck.
The bug happens as follows (for example):
Thread A
libxl_do_thing(,ao_how==0)
libxl_do_thing starts, sets up some callbacks
libxl_do_thing exit path calls AO_INPROGRESS
libxl__ao_inprogress goes into event loop
eventloop_iteration sleeps on:
- do_thing's current fd set
- sigchld pipe if applicable
- its poller
Thread B
libxl_something_occurred
the something is to do with do_thing, above
do_thing_next_callback does some more work
do_thing_next_callback becomes interested in fd N
thread B returns to application
Note that nothing wakes up thread A. A is not listening on fd N. So
do_thing_* will not spot when fd N signals. do_thing will not make
further timely progress. If there is no timeout thread A will never
wake up.
The problem here occurs because thread A is waiting on an out of date
osevent set.
There is also the possibility that a thread might block waiting for
libxl osevents but outside libxl, eg if the application used
libxl_osevent_beforepoll. We will deal with that in a moment.
See the big comment in libxl_event.c for a fairly formal correctness
argument.
This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
an egc or ao is disposed of. Firstly egcs: in this patch we rename
libxl__egc_cleanup, which means we catch all the disposal sites.
Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
(ii) ao__inprogress and (iii) an event which completes the ao later.
(i) and (ii) we handle by adding the call to _baton. In the case of
(iii) any such function must be an event-generating function so it has
an egc too, so it will pass on the baton when the egc is disposed.
Reported-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Call libxl__egc_ao_cleanup_1_baton (renamed from __egc_cleanup) on
all exits from ao_inprogress, even requests for async processing.
Fixes a remaining instance of this bug (!)
This involves disposing of ao->poller somewhat earlier.
v2: New correctness arguments in libxl_event.c comment and
in commit message.