]> xenbits.xensource.com Git - people/dwmw2/xen.git/log
people/dwmw2/xen.git
5 years agoadd a domain_tot_pages() helper function
Paul Durrant [Thu, 30 Jan 2020 11:55:35 +0000 (11:55 +0000)]
add a domain_tot_pages() helper function

This patch adds a new domain_tot_pages() inline helper function into
sched.h, which will be needed by a subsequent patch.

No functional change.

NOTE: While modifying the comment for 'tot_pages' in sched.h this patch
      makes some cosmetic fixes to surrounding comments.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agolibxl: mark parameters in stub functions as unused
Wei Liu [Thu, 13 Feb 2020 21:40:27 +0000 (21:40 +0000)]
libxl: mark parameters in stub functions as unused

Hopefully this can fix issues like:

In file included from ../../src/libxl/xen_xl.c:24:0:
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h: In function 'libxl_cpuid_apply_policy':
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h:2345:56: error: unused parameter 'ctx' [-Werror=unused-parameter]
 static inline void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid) {}

Fixes: dacb80f9 ("tools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API")
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/arm: Use asm/ rather than asm-arm/ when including header
Julien Grall [Thu, 13 Feb 2020 12:39:06 +0000 (12:39 +0000)]
xen/arm: Use asm/ rather than asm-arm/ when including header

All the arch headers (i.e under asm-arm) are included using "asm/*.h".

To stay consistent, remove the only instance where "asm-arm/*.h" is
used.

Take the opportunity to move the inclusion with the rest of the asm/
include.

Signed-off-by: Julien Grall <julien@xen.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agolibxl: fix libxl__cpuid_legacy in libxl_nocpuid.c
Wei Liu [Thu, 13 Feb 2020 15:27:51 +0000 (15:27 +0000)]
libxl: fix libxl__cpuid_legacy in libxl_nocpuid.c

Its last parameter should be libxl_domain_build_info.

Fixes: 1b3cec69 ("tools/libxl: Combine legacy CPUID handling logic")
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agoautomation: update debian:unstable-arm64v8 to have python3-config
Anthony PERARD [Thu, 13 Feb 2020 13:42:00 +0000 (13:42 +0000)]
automation: update debian:unstable-arm64v8 to have python3-config

The Arm container wasn't updated in the original patch.

Fixes: 1a3673da6482 ("automation: updating container to have python3-config binary")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: remove not applicable control commands in stubdom
Juergen Gross [Wed, 12 Feb 2020 07:41:54 +0000 (08:41 +0100)]
xenstore: remove not applicable control commands in stubdom

When run in a stubdom environment Xenstore can't select a logfile or
emit memory statistics to a specific file.

So remove or modify those control commands accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: add console xenstore entries for xenstore stubdom
Juergen Gross [Wed, 12 Feb 2020 07:41:53 +0000 (08:41 +0100)]
xenstore: add console xenstore entries for xenstore stubdom

In order to be able to connect to the console of Xenstore stubdom we
need to create the appropriate entries in Xenstore.

For the moment we don't support xenconsoled living in another domain
than dom0, as this information isn't available other then via
Xenstore which we are just setting up.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: setup xenstore stubdom console interface properly
Juergen Gross [Wed, 12 Feb 2020 07:41:52 +0000 (08:41 +0100)]
xenstore: setup xenstore stubdom console interface properly

In order to be able to get access to the console of Xenstore stubdom
we need an appropriate granttab entry. So call xc_dom_gnttab_init()
when constructing the domain and preset some information needed
for that function in the dom structure.

We need to create the event channel for the console, too. Do that and
store all necessary data locally.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxen: remove empty softirq_init()
Juergen Gross [Wed, 12 Feb 2020 09:55:06 +0000 (10:55 +0100)]
xen: remove empty softirq_init()

softirq_init() is empty since Xen 4.1. Remove it together with its call
sites.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoAMD/IOMMU: drop redundant code
Jan Beulich [Wed, 12 Feb 2020 09:54:08 +0000 (10:54 +0100)]
AMD/IOMMU: drop redundant code

The level 1 special exit path is unnecessary in iommu_pde_from_dfn() -
the subsequent code takes care of this case quite fine.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodom0-build: fix build with clang5
Jan Beulich [Wed, 12 Feb 2020 09:52:20 +0000 (10:52 +0100)]
dom0-build: fix build with clang5

With non-empty CONFIG_DOM0_MEM clang5 produces

dom0_build.c:344:24: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                       ^  ~~~~~~~~~~~~~~~~~~
dom0_build.c:344:24: note: use '&' for a bitwise operation
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                       ^~
                       &
dom0_build.c:344:24: note: remove constant to silence this warning
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                      ~^~~~~~~~~~~~~~~~~~~~~
1 error generated.

Obviously neither of the two suggestions are an option here. Oddly
enough swapping the operands of the && helps, while e.g. casting or
parenthesizing doesn't. Another workable variant looks to be the use of
!! on the constant.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agotools/libxl: Combine legacy CPUID handling logic
Andrew Cooper [Wed, 5 Feb 2020 15:25:21 +0000 (15:25 +0000)]
tools/libxl: Combine legacy CPUID handling logic

While we are in the process of overhauling boot time CPUID/MSR handling, the
existing logic is going to have to remain in roughly this form for backwards
compatibility.

Fold libxl__cpuid_apply_policy() and libxl__cpuid_set() together into a single
libxl__cpuid_legacy() to reduce the complexity for callers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API
Andrew Cooper [Wed, 8 Jan 2020 12:53:49 +0000 (12:53 +0000)]
tools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API

These functions should never have been exposed.  They don't have external
users, and can't usefully be used for several reasons.

Move libxl_cpuid_{set,apply_policy}() to being internal functions, and leave
an equivalent of the nop stubs in the API for caller compatibility.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoMAINTAINERS: Add explicit check-in policy section
George Dunlap [Thu, 5 Dec 2019 17:19:47 +0000 (17:19 +0000)]
MAINTAINERS: Add explicit check-in policy section

The "nesting" section in the MAINTAINERS file was not initially
intended to describe the check-in policy for patches, but only how
nesting worked; but since there was no check-in policy, it has been
acting as a de-facto policy.

One problem with this is that the policy is not complete: It doesn't
cover open objections, time to check-in, or so on.  The other problem
with the policy is that, as written, it doesn't account for
maintainers submitting patches to files which they themselves
maintain.  This is fine for situations where there are are multiple
maintainers, but not for situations where there is only one
maintainer.

Add an explicit "Check-in policy" section to the MAINTAINERS document
to serve as the canonical reference for the check-in policy.  Move
paragraphs not explicitly related to nesting into it.

While here, "promote" the "The meaning of nesting" section title.

DISCUSSION

This seems to be a change from people's understanding of the current
policy.  Most people's understanding of the current policy seems to be:

1.  In order to get a change to a given file committed, it must have
an Ack or Review from at least one *maintainer* of that file other
than the submitter.

2. In the case where a file has only one maintainer, it must have an
Ack or Review from a "nested" maintainer.

I.e., if I submitted something to x86/mm, it would require an Ack from
Jan or Andy, or (in exceptional circumstances) The Rest; but an Ack from
(say) Roger or Juergen wouldn't suffice.

Let's call this the "maintainer-ack" approach (because it must have an
ack or r-b from a maintainer to be checked in), and the proposal in
this patch the "maintainer-approval" (since SoB from a maintainer
indicates approval).

The core issue I have with "maintainer-ack" is that it makes the
maintainer less privileged with regard to writing code than
non-maintainers.  If component X has maintainers A and B, then a
non-maintainer can have code checked in if reviewed either by A or B.
If A or B wants code checked in, they have to wait for exactly one
person to review it.

In fact, if B is quite busy, the easiest way for A really to get their
code checked in might be to hand it to a non-maintainer N, and ask N
to submit it as their own.  Then A can Ack the patches and check them
in.

The current system, therefore, either sets up a perverse incentive (if
you think the behavior described above is unacceptable) or unnecessary
bureaucracy (if you think it's acceptable).  Either way I think we
should set up our system to avoid it.

Other variations on "maintainer-ack" have been proposed:

- Allow maintainer's patches to go in with an R-b from "designated
  reviewers"

- Allow maintainer's patches to go in with an Ack from more general
  maintainer

Both fundamentally make it harder for maintainers to get their code in
and/or reviewed effectively than non-maintainers, setting up the
perverse incentive / unnecessary bureaucracy.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/vmx: Shrink TASK_SWITCH's hvm_task_switch_reason reasons[]
Andrew Cooper [Wed, 4 Dec 2019 18:21:04 +0000 (18:21 +0000)]
x86/vmx: Shrink TASK_SWITCH's hvm_task_switch_reason reasons[]

No need to use 4-byte integers to store two bits of information.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agotools: Default to python3
Anthony PERARD [Mon, 20 Jan 2020 11:50:53 +0000 (11:50 +0000)]
tools: Default to python3

Main reason, newer version of QEMU doesn't support python 2.x anymore.
Second main reason, python2 is EOL.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/pvh: Adjust dom0's starting state
Andrew Cooper [Mon, 10 Feb 2020 18:33:26 +0000 (18:33 +0000)]
x86/pvh: Adjust dom0's starting state

Fixes: b25fb1a04e "xen/pvh: Fix segment selector ABI"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Treat head/tail pointers as byte offsets
Andrew Cooper [Sun, 2 Feb 2020 18:23:47 +0000 (18:23 +0000)]
AMD/IOMMU: Treat head/tail pointers as byte offsets

The MMIO registers as already byte offsets.  Using them in this form removes
the need to shift their values for use.

It is also inefficient to store both entries and alloc_size (which only differ
by entry_size).  Rename alloc_size to size, and drop entries entirely, which
simplifies the allocation/deallocation helpers slightly.

Mark send_iommu_command() and invalidate_iommu_all() as static, as they have
no external declaration or callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/sched: remove pointless BUG_ON() in credit2
Juergen Gross [Tue, 11 Feb 2020 09:44:18 +0000 (10:44 +0100)]
xen/sched: remove pointless BUG_ON() in credit2

The BUG_ON() at the top of csched2_context_saved() is completely
pointless, as the ASSERT() just following it catches the same problem
already.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoMAINTAINERS: cc community manager on patches to CHANGELOG.md
Paul Durrant [Thu, 6 Feb 2020 16:48:10 +0000 (16:48 +0000)]
MAINTAINERS: cc community manager on patches to CHANGELOG.md

The purpose of the change-log is to be a human-readable list of notable
changes and, as such, additions to it are likely of interest to the
community manager in preparing blog entries, release summaries, etc.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86/svm: Reduce vmentry latency
Andrew Cooper [Mon, 10 Feb 2020 11:27:32 +0000 (11:27 +0000)]
x86/svm: Reduce vmentry latency

Writing to the stack pointer in the middle of a line of pop operations is
specifically recommended against by the optimisation guide, and is a technique
used by Speculative Load Hardening to combat SpectreRSB.

In practice, it causes all further stack-relative accesses to block until the
write to the stack pointer retires, so the stack engine can get back in sync.

Pop into any dead register to discard %rax's value without clobbering the
stack engine.  Smaller compiled code, and runs faster.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Treat guest head/tail pointers as byte offsets
Andrew Cooper [Mon, 3 Feb 2020 13:50:34 +0000 (13:50 +0000)]
AMD/IOMMU: Treat guest head/tail pointers as byte offsets

The MMIO registers as already formatted as byte offsets.  Start by masking out
reserved bits, which fixes an implementation bug (reserved bits should be
read-only zero, rather than preserving their previously-written value).  As a
consequence, we can use the values directly, instead of masking/shifting on
every use.

Store the buffer size, rather than the number of entries, to keep the same
units for comparison purposes.

This simplifies guest_iommu_get_table_mfn() by dropping the entry_size
parameter, and simplifies the map_domain_page() handling by being able to drop
the log_base variables.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Delete iommu_{get,set,clear}_bit() helpers
Andrew Cooper [Sun, 2 Feb 2020 16:35:32 +0000 (16:35 +0000)]
AMD/IOMMU: Delete iommu_{get,set,clear}_bit() helpers

These are obfuscations around simple bit operations, and the compiler really
can do a better job when it can see them normally:

  add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-189 (-189)
  Function                                     old     new   delta
  guest_iommu_add_ppr_log                      266     251     -15
  guest_iommu_add_event_log                    266     251     -15
  iommu_reset_log                              274     242     -32
  guest_iommu_process_command                 1602    1544     -58
  guest_iommu_mmio_write                      1123    1054     -69
  Total: Before=3019344, After=3019155, chg -0.01%

Drop all status register MASK/SHIFT constants, and enumerate the bits
normally.  Rename EVENT_OVERFLOW to EVENT_LOG_OVERFLOW for consistency.  (The
field name in the spec is inconsistent, despite the description referring to
an overflow of the event log.)

The only semantic change is in iommu_reset_log() where 'run_bit' changes from
being a bit position to being a single-bit mask.  Update some local variable
types to be more suitable.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Move headers to be local
Andrew Cooper [Mon, 3 Feb 2020 13:09:17 +0000 (13:09 +0000)]
AMD/IOMMU: Move headers to be local

We currently have amd-iommu-defs.h, amd-iommu-proto.h and amd-iommu.h, and no
references outside of the AMD IOMMU driver.

Keep iommu-defs.h as is, but merge amd-iommu.h and amd-iommu-proto.h to just
iommu.h, and move them both into drivers/passthrough/amd/.  (While merging,
drop the bogus #pragma pack around the *_entry structures.)

Take the opportunity to trim the include lists, including x86/mm/p2m.c
which (AFAICT) hasn't needed this include since c/s aef3f2275 "x86/mm/p2m:
break into common, pt-implementation and pod parts" in 2011.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/pvh: Fix segment selector ABI
Andrew Cooper [Sat, 8 Feb 2020 14:47:48 +0000 (14:47 +0000)]
xen/pvh: Fix segment selector ABI

The written ABI states that %es will be set up, but libxc doesn't do so.  In
practice, it breaks `rep movs` inside guests before they reload %es.

The written ABI doesn't mention %ss, but libxc does set it up.  Having %ds
different to %ss is obnoxous to work with, as different registers have
different implicit segments.

Modify the spec to state that %ss is set up as a flat read/write segment.
This a) matches the Multiboot 1 spec, b) matches what is set up in practice,
and c) is the more sane behaviour for guests to use.

Fixes: 68e1183411b ('libxc: introduce a xc_dom_arch for hvm-3.0-x86_32 guests')
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agotools/ocaml: Drop cpuid helpers
Andrew Cooper [Tue, 28 Jan 2020 15:36:23 +0000 (15:36 +0000)]
tools/ocaml: Drop cpuid helpers

These have no callers, and the underlying infrastructure is about to be
rewritten completely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
5 years agotools/python: Drop cpuid helpers
Andrew Cooper [Wed, 5 Feb 2020 16:50:53 +0000 (16:50 +0000)]
tools/python: Drop cpuid helpers

These are believed-unused, and the underlying infrastructure is about to be
rewritten completely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
5 years agoxen/arm: Handle unimplemented VGICv3 registers as RAZ/WI
Jeff Kubascik [Tue, 4 Feb 2020 19:51:50 +0000 (14:51 -0500)]
xen/arm: Handle unimplemented VGICv3 registers as RAZ/WI

Per the ARM Generic Interrupt Controller Architecture Specification (ARM
IHI 0069E), reserved registers should generally be treated as RAZ/WI.
To simplify the VGICv3 design and improve guest compatibility, treat the
default case for GICD and GICR registers as read_as_zero/write_ignore.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agoxen/include: Fix typoes in asm-x86/domain.h
Julien Grall [Tue, 4 Feb 2020 16:53:49 +0000 (16:53 +0000)]
xen/include: Fix typoes in asm-x86/domain.h

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/include: public: Document the padding in struct xen_hvm_param
Julien Grall [Thu, 6 Feb 2020 15:41:18 +0000 (15:41 +0000)]
xen/include: public: Document the padding in struct xen_hvm_param

There is an implicit padding of 2 bytes in struct xen_hvm_param between
the field domid and index. Make it explicit by introduce a padding
field. This can also serve as documentation.

Note that I don't think we can mandate it to be zero because a guest may
not have initialized the padding.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/HVM: reduce scope of pfec in hvm_emulate_init_per_insn()
Jan Beulich [Thu, 6 Feb 2020 15:23:30 +0000 (16:23 +0100)]
x86/HVM: reduce scope of pfec in hvm_emulate_init_per_insn()

It needs calculating only in one out of three cases. Re-structure the
code a little such that the variable truly gets calculated only when we
don't get any insn bytes from elsewhere, and hence need to (try to)
fetch them. Also OR in PFEC_insn_fetch right in the initializer.

While in this mood, restrict addr's scope as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agox86/HVM: introduce "curr" into hvmemul_rep_{mov,sto}s()
Jan Beulich [Thu, 6 Feb 2020 15:22:28 +0000 (16:22 +0100)]
x86/HVM: introduce "curr" into hvmemul_rep_{mov,sto}s()

There are a number of uses of "current" already, and more may appear
down the road. Latch into a local variable.

At this occasion also drop stray casts from code getting touched anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agox86/guest/xen: only set HVM parameter on BSP
Wei Liu [Wed, 5 Feb 2020 18:02:24 +0000 (18:02 +0000)]
x86/guest/xen: only set HVM parameter on BSP

There is no need for every CPU to set a guest property.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: make it clear range is inclusive when printing e820 range
Wei Liu [Thu, 6 Feb 2020 12:10:24 +0000 (12:10 +0000)]
x86: make it clear range is inclusive when printing e820 range

Change the format string to use "[,]" and subtract 1 from the end.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agodomctl/vNUMA: avoid arithmetic overflow
Jan Beulich [Thu, 6 Feb 2020 08:55:18 +0000 (09:55 +0100)]
domctl/vNUMA: avoid arithmetic overflow

Checking the result of a multiplication against a certain limit has no
sufficient implication on the original value's range. In the case here
it is in particular problematic that while handling the domctl we do

    if ( copy_from_guest(info->vdistance, uinfo->vdistance,
                         nr_vnodes * nr_vnodes) )
        goto vnuma_fail;

which means copying sizeof(unsigned int) * (nr_vnodes * nr_vnodes)
bytes, and the handling of XENMEM_get_vnumainfo similarly has

        tmp.vdistance = xmalloc_array(unsigned int, dom_vnodes * dom_vnodes);

which means allocating sizeof(unsigned int) * (dom_vnodes * dom_vnodes)
bytes, whereas in then goes on doing this:

        memcpy(tmp.vdistance, d->vnuma->vdistance,
               sizeof(*d->vnuma->vdistance) * dom_vnodes * dom_vnodes);

Note the lack of parentheses in the multiplication expression.

Adjust the overflow check, moving the must-not-be-zero one right next to
it to avoid questions on whether there might be division by zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agoArm/GICv2: don't needlessly use xzalloc_bytes()
Jan Beulich [Thu, 6 Feb 2020 08:53:43 +0000 (09:53 +0100)]
Arm/GICv2: don't needlessly use xzalloc_bytes()

... when plain xzalloc() (which is more type safe) does.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agoxmalloc: guard against integer overflow
Jan Beulich [Thu, 6 Feb 2020 08:53:12 +0000 (09:53 +0100)]
xmalloc: guard against integer overflow

There are hypercall handling paths (EFI ones are what this was found
with) needing to allocate buffers of a caller specified size. This is
generally fine, as our page allocator enforces an upper bound on all
allocations. However, certain extremely large sizes could, when adding
in allocator overhead, result in an apparently tiny allocation size,
which would typically result in either a successful allocation, but a
severe buffer overrun when using that memory block, or in a crash right
in the allocator code.

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoEFI: don't leak heap contents through XEN_EFI_get_next_variable_name
Jan Beulich [Thu, 6 Feb 2020 08:52:33 +0000 (09:52 +0100)]
EFI: don't leak heap contents through XEN_EFI_get_next_variable_name

Commit 1f4eb9d27d0e ("EFI: fix getting EFI variable list on some
systems") switched to using the caller provided size for the copy-out
without making sure the copied buffer is properly scrubbed.

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoEFI: re-check {get,set}-variable name strings after copying in
Jan Beulich [Thu, 6 Feb 2020 08:51:17 +0000 (09:51 +0100)]
EFI: re-check {get,set}-variable name strings after copying in

A malicious guest given permission to invoke XENPF_efi_runtime_call may
play with the strings underneath Xen sizing them and copying them in.
Guard against this by re-checking the copyied in data for consistency
with the initial sizing. At the same time also check that the actual
copy-in is in fact successful, and switch to the lighter weight non-
checking flavor of the function.

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86/hyperv: setup VP assist page
Wei Liu [Sun, 29 Dec 2019 17:54:12 +0000 (17:54 +0000)]
x86/hyperv: setup VP assist page

VP assist page is rather important as we need to toggle some bits in it
for efficient nested virtualisation.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/hyperv: retrieve vp_index from Hyper-V
Wei Liu [Sun, 29 Dec 2019 17:16:40 +0000 (17:16 +0000)]
x86/hyperv: retrieve vp_index from Hyper-V

This will be useful when invoking hypercall that targets specific
vcpu(s).

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hyperv: provide percpu hypercall input page
Wei Liu [Sun, 29 Dec 2019 16:57:21 +0000 (16:57 +0000)]
x86/hyperv: provide percpu hypercall input page

Hyper-V's input / output argument must be 8 bytes aligned an not cross
page boundary. One way to satisfy those requirements is to use percpu
page.

For the foreseeable future we only need to provide input for TLB
and APIC hypercalls, so skip setting up an output page.

We will also need to provide an ap_setup hook for secondary cpus to
setup its own input page.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/hyperv: provide Hyper-V hypercall functions
Wei Liu [Sun, 29 Dec 2019 16:27:47 +0000 (16:27 +0000)]
x86/hyperv: provide Hyper-V hypercall functions

These functions will be used later to make hypercalls to Hyper-V.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hyperv: setup hypercall page
Wei Liu [Wed, 15 Jan 2020 16:40:49 +0000 (16:40 +0000)]
x86/hyperv: setup hypercall page

Hyper-V uses a technique called overlay page for its hypercall page. It
will insert a backing page to the guest when the hypercall functionality
is enabled. That means we can use a page that is not backed by real
memory for hypercall page.

To avoid shattering L0 superpages and treading on any MMIO areas
residing in low addresses, use the top-most addressable page for that
purpose. Adjust e820 map accordingly.

We also need to register Xen's guest OS ID to Hyper-V. Use 0x3 as the
vendor ID. Fix the comment in hyperv-tlfs.h while at it.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Paul Durrant <pdurrant@amazon.com>
5 years agox86/hypervisor: provide hypervisor_fixup_e820
Wei Liu [Fri, 31 Jan 2020 11:57:31 +0000 (11:57 +0000)]
x86/hypervisor: provide hypervisor_fixup_e820

And implement the hook for Xen guest.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: provide executable fixmap facility
Wei Liu [Wed, 8 Jan 2020 21:35:23 +0000 (21:35 +0000)]
x86: provide executable fixmap facility

This allows us to set aside some address space for executable mapping.
This fixed map range starts from XEN_VIRT_END so that it is within reach
of the .text section.

Shift the percpu stub range and shrink livepatch range accordingly.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/smp: don't online cpu if hypervisor_ap_setup fails
Wei Liu [Wed, 29 Jan 2020 13:53:53 +0000 (13:53 +0000)]
x86/smp: don't online cpu if hypervisor_ap_setup fails

Push hypervisor_ap_setup down to smp_callin.

Take the chance to replace xen_guest with cpu_has_hypervisor.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hypervisor: make hypervisor_ap_setup return an error code
Wei Liu [Wed, 29 Jan 2020 13:47:37 +0000 (13:47 +0000)]
x86/hypervisor: make hypervisor_ap_setup return an error code

We want to be able to handle AP setup error in the upper layer.

For Xen, remove all panic() and BUG_ON() in init_evtchn and
map_vcpuinfo. Only panic/BUG_ON when Xen can't fail gracefully.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mem_sharing: use default_access in add_to_physmap
Tamas K Lengyel [Wed, 5 Feb 2020 12:53:14 +0000 (13:53 +0100)]
x86/mem_sharing: use default_access in add_to_physmap

When plugging a hole in the target physmap don't use the access permission
returned by __get_gfn_type_access as it is non-sensical (p2m_access_n) in
the use-case add_to_physmap was intended to be used in. It leads to vm_events
being sent out for access violations at unexpected locations. Make use of
p2m->default_access instead and document the ambiguity surrounding "hole"
types and corner-cases with custom mem_access being set on holes.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hvm: introduce hvm_copy_context_and_params
Tamas K Lengyel [Wed, 5 Feb 2020 12:52:29 +0000 (13:52 +0100)]
x86/hvm: introduce hvm_copy_context_and_params

Currently the hvm parameters are only accessible via the HVMOP hypercalls. In
this patch we introduce a new function that can copy both the hvm context and
parameters directly into a target domain.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/HVM: rename a variable in __hvm_copy()
Jan Beulich [Wed, 5 Feb 2020 12:51:56 +0000 (13:51 +0100)]
x86/HVM: rename a variable in __hvm_copy()

This is to reflect its actual purpose. Also use in a 2nd place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoRevert "tools/libxc: disable x2APIC when using nested virtualization"
Roger Pau Monné [Wed, 5 Feb 2020 12:51:19 +0000 (13:51 +0100)]
Revert "tools/libxc: disable x2APIC when using nested virtualization"

This reverts commit 7b3c5b70a32303b46d0d051e695f18d72cce5ed0 and
re-enables the usage of x2APIC with nested virtualization.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/vvmx: don't enable interrupt window when using virt intr delivery
Roger Pau Monné [Wed, 5 Feb 2020 12:50:46 +0000 (13:50 +0100)]
x86/vvmx: don't enable interrupt window when using virt intr delivery

If virtual interrupt delivery is used to inject the interrupt to the
guest the interrupt window shouldn't be enabled, as the interrupt is
already injected using the GUEST_INTR_STATUS vmcs field.

Reported-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/vvmx: fix VM_EXIT_ACK_INTR_ON_EXIT handling
Roger Pau Monné [Wed, 5 Feb 2020 12:50:09 +0000 (13:50 +0100)]
x86/vvmx: fix VM_EXIT_ACK_INTR_ON_EXIT handling

When VM_EXIT_ACK_INTR_ON_EXIT is clear in the vmexit control vmcs
register the bit 31 of VM_EXIT_INTR_INFO must be 0, in order to denote
that the field doesn't contain any interrupt information. This is not
currently acknowledged as the field always get filled with valid
interrupt information, regardless of whether VM_EXIT_ACK_INTR_ON_EXIT
is set.

Fix this and only fill VM_EXIT_INTR_INFO when VM_EXIT_ACK_INTR_ON_EXIT
is set. Note that this requires one minor change in
nvmx_update_apicv in order to obtain the interrupt information from
the internal state rather than the nested vmcs register.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/vvmx: fix virtual interrupt injection when Ack on exit control is used
Roger Pau Monné [Wed, 5 Feb 2020 12:49:09 +0000 (13:49 +0100)]
x86/vvmx: fix virtual interrupt injection when Ack on exit control is used

When doing a virtual vmexit (ie: a vmexit handled by the L1 VMM)
interrupts shouldn't be injected using the virtual interrupt delivery
mechanism unless the Ack on exit vmexit control bit isn't set in the
nested vmcs.

Gate the call to nvmx_update_apicv helper on whether the nested vmcs
has the Ack on exit bit set in the vmexit control field.

Note that this fixes the usage of x2APIC by the L1 VMM, at least when
the L1 VMM is Xen.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agolibxc/restore: Fix REC_TYPE_X86_PV_VCPU_XSAVE data auditing (take 2)
Andrew Cooper [Tue, 4 Feb 2020 20:29:38 +0000 (20:29 +0000)]
libxc/restore: Fix REC_TYPE_X86_PV_VCPU_XSAVE data auditing (take 2)

It turns out that a bug (since forever) in Xen causes XSAVE records to have
non-architectural behaviour on xsave-capable hardware, when a PV guest has not
touched the state.

In such a case, the data record returned from Xen is 2*uint64_t, both claiming
the (illegitimate) state of %xcr0 and %xcr0_accum being 0.

Adjust the bound in handle_x86_pv_vcpu_blob() to cope with this.

Fixes: 2a62c22715b "libxc/restore: Fix data auditing in handle_x86_pv_vcpu_blob()"
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl: fix assertion failure in stub domain creation
Paul Durrant [Wed, 5 Feb 2020 11:24:12 +0000 (11:24 +0000)]
libxl: fix assertion failure in stub domain creation

An assertion in libxl__domain_make():

'soft_reset || *domid == INVALID_DOMID'

does not hold true for stub domain creation, where soft_reset is false
but the passed in domid == 0. This is easily fixed by changing the
initializer in libxl__spawn_stub_dm().

NOTE: The comment for XEN_DOMCTL_createdomain in domctl.h is changed to
      reflect reality.

Fixes: 75259239d85d ("libxl_create: make 'soft reset' explicit")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoINSTALL: Mention kconfig
Ian Jackson [Tue, 4 Feb 2020 09:34:24 +0000 (09:34 +0000)]
INSTALL: Mention kconfig

Firstly, add a reference to the documentation for the kconfig system.

Secondly, warn the user about the XEN_CONFIG_EXPERT problem.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/xenstore: Re-introduce (fake) xs_restrict call to preserve ABI
Stefan Bader [Tue, 4 Feb 2020 09:34:23 +0000 (09:34 +0000)]
tools/xenstore: Re-introduce (fake) xs_restrict call to preserve ABI

libxenstore3.0 in Xen 4.8 had this function.  We don't really want to
bump the ABI version (soname) just for this, since we don't think
there are actual callers anywhere.  But tools complain about the
symbol going away.

So, provide a function xs_restrict which conforms to the original
semantics, although it always fails.

Gbp-Pq: Topic xenstore
Gbp-Pq: Name tools-fake-xs-restrict.patch
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
5 years agox86: move viridian_page_msr to hyperv-tlfs.h
Wei Liu [Fri, 31 Jan 2020 17:49:29 +0000 (17:49 +0000)]
x86: move viridian_page_msr to hyperv-tlfs.h

And rename it to hv_vp_assist_page_msr.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agodocs/xl: fix typo in xl.cfg
Roger Pau Monne [Mon, 3 Feb 2020 10:31:12 +0000 (11:31 +0100)]
docs/xl: fix typo in xl.cfg

The name of the option is nographic.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agodocs/misc: xen-command-line: fix parameters in sample serial configuration
Sarah Newman [Mon, 3 Feb 2020 12:09:13 +0000 (13:09 +0100)]
docs/misc: xen-command-line: fix parameters in sample serial configuration

The names of the serial parameters use hyphens, not underscores.

Signed-off-by: Sarah Newman <srn@prgmr.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86 / vmx: move teardown from domain_destroy()...
Paul Durrant [Mon, 3 Feb 2020 12:08:44 +0000 (13:08 +0100)]
x86 / vmx: move teardown from domain_destroy()...

... to domain_relinquish_resources().

The teardown code frees the APICv page. This does not need to be done late
so do it in domain_relinquish_resources() rather than domain_destroy().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/EPT: drop redundant ept_p2m_type_to_flags() parameters
Jan Beulich [Mon, 3 Feb 2020 12:08:06 +0000 (13:08 +0100)]
x86/EPT: drop redundant ept_p2m_type_to_flags() parameters

All callers set the respective fields in the entry being updated before
the call.

Take the opportunity and also constify the first parameter as well as
make a few style adjustments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/EPT: do away with hidden GUEST_TABLE_MAP_FAILED == 0 assumptions
Jan Beulich [Mon, 3 Feb 2020 12:07:19 +0000 (13:07 +0100)]
x86/EPT: do away with hidden GUEST_TABLE_MAP_FAILED == 0 assumptions

The code is quite a bit easier to read and to reason about this way,
I think.

In ept_set_entry() additionally change the function's return value in
the MAP_FAILED case to -ENOMEM; -ENOENT would be applicable only when
ept_next_entry() was invoked with "read_only" set to true.

In two cases, where ept_next_level() follows an ept_split_superpage()
invocation, actually tighten the loop exit condition from
"== MAP_FAILED" to "!= NORMAL_PAGE". Continuing these loops for other
than NORMAL_PAGE is invalid, and there are ASSERT()s in place after
these loops.

Also reduce the scope of "ret" variables where possible, in particular
to better distinguish them from "rc" often used in the same function.

Finally drop pointless "else" in a few areas touched anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/tlb: fix NEED_FLUSH return type
Roger Pau Monné [Mon, 3 Feb 2020 12:06:19 +0000 (13:06 +0100)]
x86/tlb: fix NEED_FLUSH return type

The returned type wants to be bool instead of int.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agoxen: split parameter related definitions in own header file
Juergen Gross [Mon, 3 Feb 2020 12:04:30 +0000 (13:04 +0100)]
xen: split parameter related definitions in own header file

Move the parameter related definitions from init.h into a new header
file param.h. This will avoid include hell when new dependencies are
added to parameter definitions.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/x86: domctl: Don't leak data via XEN_DOMCTL_gethvmcontext
Julien Grall [Mon, 27 Jan 2020 13:34:12 +0000 (13:34 +0000)]
xen/x86: domctl: Don't leak data via XEN_DOMCTL_gethvmcontext

The HVM context may not fill up the full buffer passed by the caller.
While we report corectly the size of the context, we will still be
copying back the full size of the buffer.

As the buffer is allocated through xmalloc(), we will be copying some
bits from the previous allocation.

Only copy back the part of the buffer used by the HVM context to prevent
any leak.

Note that per XSA-72, this is not a security issue.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/x86: domain: Remove specific case when allocating struct domain
Julien Grall [Mon, 20 Jan 2020 14:10:57 +0000 (14:10 +0000)]
xen/x86: domain: Remove specific case when allocating struct domain

Commit 8916fcf4577 "x86/domain: compile with lock_profile=y enabled"
allowed the struct domain to use more than a PAGE_SIZE (i.e 4096).
However, the function free_domheap_struct() will only free the first
page.

We could modify the free part to free the correct number of pages, but
the structure has been fitting in a page (even with lock profile
enabled) since commit 428607a410 "x86: shrink 'struct domain', was
already PAGE_SIZE" (part of Xen 4.7).

Therefore, the specific case for lock profile is now removed.

This is not a security issue because struct domain can only be bigger
than a page size for lock profiling. The feature can only be selected
in DEBUG and EXPERT mode.

Fixes: 8916fcf4577 ("x86/domain: compile with lock_profile=y enabled")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: make paddr_bits available earlier
Wei Liu [Wed, 29 Jan 2020 14:09:54 +0000 (14:09 +0000)]
x86: make paddr_bits available earlier

Move early_cpu_init before init_e820, such that paddr_bits can be used
by e820 code.

This will reduce code repetition and prepare for further adjustment when
L0 hypervisor comes into play.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/xenstore: don't apply write limiting for privileged domain
Juergen Gross [Fri, 31 Jan 2020 14:25:57 +0000 (15:25 +0100)]
tools/xenstore: don't apply write limiting for privileged domain

Xenstore write limiting should not be applied to dom0. Unfortunately
write limiting is disabled only for connections via sockets. When
running in a stubdom Xenstore will apply write limiting to dom0, too.
Change that by testing for the domain to be privileged as well.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/xenstore: add newline for printing of stubdom console messages
Juergen Gross [Fri, 31 Jan 2020 14:25:09 +0000 (15:25 +0100)]
tools/xenstore: add newline for printing of stubdom console messages

There are several places in xenstore-stubdom where newlines at the end
of messages on the console are missing. Add them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl: generalise libxl__domain_userdata_lock()
Paul Durrant [Fri, 31 Jan 2020 15:01:45 +0000 (15:01 +0000)]
libxl: generalise libxl__domain_userdata_lock()

This function implements a file-based lock with a file name generated
from a domid.

This patch splits it into two, generalising the core of the locking code
into a new libxl__lock_file() function which operates on a specified file,
leaving just the file name generation in libxl__domain_userdata_lock().

This patch also generalises libxl__unlock_domain_userdata() to
libxl__unlock_file() and modifies all call-sites.

Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl_create: make 'soft reset' explicit
Paul Durrant [Fri, 31 Jan 2020 15:01:44 +0000 (15:01 +0000)]
libxl_create: make 'soft reset' explicit

The 'soft reset' code path in libxl__domain_make() is currently taken if a
valid domid is passed into the function. A subsequent patch will enable
higher levels of the toolstack to determine the domid of newly created or
restored domains and therefore this criteria for choosing 'soft reset'
will no longer be usable.

This patch adds an extra boolean option to libxl__domain_make() to specify
whether it is being invoked in soft reset context and appropriately
modifies callers to choose the right value. To facilitate this, a new
'soft_reset' boolean field is added to struct libxl__domain_create_state
and the 'domid_soft_reset' field is renamed to 'domid' in anticipation of
its wider remit. For the moment do_domain_create() will always set
domid to INVALID_DOMID and hence we can add an assertion into
libxl__domain_create() that, if it is not called in soft reset context,
the passed in domid is exactly that value.

Whilst in the neighbourhood, some checks of 'restore_fd > -1' have been
replaced by 'restore_fd >= 0' to be more conventional and consistent with
checks of 'restore_fd < 0'.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: add definition of INVALID_DOMID to the API
Paul Durrant [Fri, 31 Jan 2020 15:01:43 +0000 (15:01 +0000)]
libxl: add definition of INVALID_DOMID to the API

Currently both xl and libxl have internal definitions of INVALID_DOMID
which happen to be identical. However, for the purposes of describing the
behaviour of libxl_domain_create_new/restore() it is useful to have a
specified invalid value for a domain id.

This patch therefore moves the libxl definition from libxl_internal.h to
libxl.h and removes the internal definition from xl_utils.h. The hardcoded
'-1' passed back via domcreate_complete() is then updated to INVALID_DOMID
and comment above libxl_domain_create_new/restore() is accordingly
modified.

NOTE: The value of INVALID_DOMID (~0) is distinct from the hypervisor's
      DOMID_INVALID. This patch preserves that value.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86/HVM: avoid truncation of PM timer I/O port range version
Jan Beulich [Fri, 31 Jan 2020 15:48:25 +0000 (16:48 +0100)]
x86/HVM: avoid truncation of PM timer I/O port range version

Don't silently ignore the upper 32 bits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: relinquish resources also from hvm_domain_destroy()
Jan Beulich [Fri, 31 Jan 2020 15:47:29 +0000 (16:47 +0100)]
x86/HVM: relinquish resources also from hvm_domain_destroy()

Domain creation failure paths don't call domain_relinquish_resources(),
yet allocations and alike done from hvm_domain_initialize() need to be
undone nevertheless. Call the function also from hvm_domain_destroy(),
after making sure all descendants are idempotent.

Note that while viridian_{domain,vcpu}_deinit() were already used in
ways suggesting they're idempotent, viridian_time_vcpu_deinit() actually
wasn't: One can't kill a timer that was never initialized.

For hvm_destroy_all_ioreq_servers()'s purposes make
relocate_portio_handler() return whether the to be relocated port range
was actually found. This seems cheaper than introducing a flag into
struct hvm_domain's ioreq_server sub-structure.

In hvm_domain_initialise() additionally
- use XFREE() also to replace adjacent xfree(),
- use hvm_domain_relinquish_resources() as being idempotent now.
There as well as in hvm_domain_destroy() the explicit call to
rtc_deinit() isn't needed anymore.

In hvm_domain_relinquish_resources() additionally drop a no longer
relevant if().

Fixes: e7a9b5e72f26 ("viridian: separately allocate domain and vcpu structures")
Fixes: 26fba3c85571 ("viridian: add implementation of synthetic timers")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agoMAINTAINERS: put Hyper-V code under Viridian maintainership
Wei Liu [Wed, 29 Jan 2020 20:20:23 +0000 (20:20 +0000)]
MAINTAINERS: put Hyper-V code under Viridian maintainership

And add myself as a maintainer.

Sort the list alphabetically while at it.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agox86: fold linker script pre-processing rules
Jan Beulich [Thu, 30 Jan 2020 16:19:46 +0000 (17:19 +0100)]
x86: fold linker script pre-processing rules

There's no need to have twice almost the same rule. Simply add the extra
-DEFI to AFLAGS for the EFI variant, and specify both targets for the
then single rule.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: undo part of "refine link time stub area related assertion"
Jan Beulich [Thu, 30 Jan 2020 16:18:12 +0000 (17:18 +0100)]
x86: undo part of "refine link time stub area related assertion"

The original check was not too strict: While we don't use one page of
memory per CPU, we do use ons page of VA space per CPU. It is the
latter which matters here.

Undo that part of the change, but leave everything else in place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Move GCC_HAS_VISIBILITY_ATTRIBUTE to Kconfig and common
Anthony PERARD [Wed, 11 Dec 2019 16:38:57 +0000 (16:38 +0000)]
xen: Move GCC_HAS_VISIBILITY_ATTRIBUTE to Kconfig and common

The check for $(CC) -fvisibility=hidden is done by both arm and x86,
so the patch also move the check to the common area.

The check doesn't check if $(CC) is gcc, and clang can accept that
option as well, so s/GCC/CC/ is done to the define name.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Use $(CONFIG_CC_IS_CLANG) instead of $(clang) in Makefile
Anthony PERARD [Wed, 11 Dec 2019 15:27:33 +0000 (15:27 +0000)]
xen: Use $(CONFIG_CC_IS_CLANG) instead of $(clang) in Makefile

Kconfig can check if $(CC) is clang or not, if it is
CONFIG_CC_IS_CLANG will be set.

With that patch, the hypervisor can be built using clang by running
`make CC=clang CXX=clang++` without needed to provide an extra clang
parameter.

`make clang=y` still works as Config.mk will set CC and CXX.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Move CONFIG_INDIRECT_THUNK to Kconfig
Anthony PERARD [Wed, 11 Dec 2019 13:55:06 +0000 (13:55 +0000)]
xen: Move CONFIG_INDIRECT_THUNK to Kconfig

Now that Kconfig has the capability to run shell command when
generating CONFIG_* we can use it in some cases to test CFLAGS.

CONFIG_INDIRECT_THUNK is a good example that wants to exist both in
Makefile and as a C macro, which Kconfig do. So use Kconfig to
generate CONFIG_INDIRECT_THUNK and have the CFLAGS depends on that.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Import cc-ifversion from Kbuild
Anthony PERARD [Wed, 4 Dec 2019 17:13:51 +0000 (17:13 +0000)]
xen: Import cc-ifversion from Kbuild

This is in preparation of importing Kbuild to build Xen. We won't be
able to include Config.mk so we will need a replacement for the macro
`cc-ifversion'.

This patch imports parts of "scripts/Kbuild.include" from Linux v5.4,
the macro cc-ifversion. It makes use of CONFIG_GCC_VERSION that
Kconfig now provides.

Since they are no other use of Xen's `cc-ifversion' macro, we can
remove it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Have Kconfig check $(CC)'s version
Anthony PERARD [Wed, 4 Dec 2019 16:33:23 +0000 (16:33 +0000)]
xen: Have Kconfig check $(CC)'s version

This import several files from Linux v5.3
 - scripts/Kconfig.include
 - scripts/clang-version.sh
 - scripts/gcc-version.sh
 and several config values from from Linux's init/Kconfig file.
But gcc-version.sh have been modified to return "0" when $CC isn't
GCC, like clang-version.sh do.

Files are copied into scripts/ directory because that's were the files
are found in Linux tree, and also because we are going to import more
of Kbuild from Linux which is located in scripts/.

CONFIG_GCC_VERSION and CONFIG_CC_IS_CLANG are going to be use in
follow-up patches.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: Update Kconfig to Linux v5.4
Anthony PERARD [Tue, 17 Sep 2019 13:13:50 +0000 (14:13 +0100)]
xen: Update Kconfig to Linux v5.4

This patch updates Kconfig to a more recent version of Kconfig, found
in Linux v5.4.0, 219d54332a09 ("Linux 5.4").

With the updated version of Kconfig, other changes are necessary to
avoid breaking the build.

Kconfig files:
- fix Kconfig files that where using option env=*:
  Since Linux commit 104daea149c4 ("kconfig: reference environment
  variables directly and remove 'option env='"), we can access the
  environment directly via $() and "option env=" as been removed.
- CONFIG_EXPERT='y' will now appear in .config file if
  XEN_CONFIG_EXPERT=y in the environment. The alternative is to change
  "EXPERT" to "$(XEN_CONFIG_EXPERT)" in all Kconfig files.

Makefile:
- silentoldconfig target as been removed from Kconfig. To update
  include/generated/autoconf.h, we need to use syncconfig target
  instead.

Makefile.kconfig:
- Import newer needed code from Linux's Makefile.lib and
  Kbuild.include and Makefile.build.
- Set Q to empty, Xen build system doesn't silence commands. Having Q
  empty mean we can import stuff from Linux without having to remove the
  leading $(Q) from build commands. And quiet='' means commands will be
  echoed.
- Add $(PHONY) to .PHONY. Like it is intended by Kbuild.

Makefile.host is also updated and copied from Linux.

Dependency change:
- Now depends on flex/bison, maybe we could _shipped those files like
  before. Linux doesn't do that anymore.

The .gitignore in kconfig/ has more entries, compared to upstream, for
file generated by Makefile.host.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/mem_access: use __get_gfn_type_access in set_mem_access
Tamas K Lengyel [Wed, 29 Jan 2020 14:06:50 +0000 (15:06 +0100)]
x86/mem_access: use __get_gfn_type_access in set_mem_access

Use __get_gfn_type_access instead of p2m->get_entry to trigger page-forking
when the mem_access permission is being set on a page that has not yet been
copied over from the parent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/suspend: disable watchdog before calling console_start_sync()
Igor Druzhinin [Wed, 29 Jan 2020 14:06:10 +0000 (15:06 +0100)]
x86/suspend: disable watchdog before calling console_start_sync()

... and enable it after exiting S-state. Otherwise accumulated
output in serial buffer might easily trigger the watchdog if it's
still enabled after entering sync transmission mode.

The issue observed on machines which, unfortunately, generate non-0
output in CPU offline callbacks.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/domctl: fix typo in comment
Olaf Hering [Wed, 29 Jan 2020 13:48:54 +0000 (14:48 +0100)]
x86/domctl: fix typo in comment

The array is named msr_policy.

Fixes commit 60529dfeca1

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/mem_sharing: replace MEM_SHARING_DEBUG with gdprintk
Tamas K Lengyel [Wed, 29 Jan 2020 13:48:15 +0000 (14:48 +0100)]
x86/mem_sharing: replace MEM_SHARING_DEBUG with gdprintk

Using XENLOG_ERR level since this is only used in debug paths (ie. it's
expected the user already has loglvl=all set). Also use %pd to print the domain
ids.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/apic: fix disabling LVT0 in disconnect_bsp_APIC
Roger Pau Monné [Wed, 29 Jan 2020 13:47:00 +0000 (14:47 +0100)]
x86/apic: fix disabling LVT0 in disconnect_bsp_APIC

The Intel SDM states:

"When an illegal vector value (0 to 15) is written to a LVT entry and
the delivery mode is Fixed (bits 8-11 equal 0), the APIC may signal an
illegal vector error, without regard to whether the mask bit is set or
whether an interrupt is actually seen on the input."

And that's exactly what's currently done in disconnect_bsp_APIC when
virt_wire_setup is true and LVT LINT0 is being masked. By writing only
APIC_LVT_MASKED Xen is actually setting the vector to 0 and the
delivery mode to Fixed (0), and hence it triggers an APIC error even
when the LVT entry is masked.

This would usually manifest when Xen is being shut down, as that's
where disconnect_bsp_APIC is called:

(XEN) APIC error on CPU0: 40(00)

Fix this by calling clear_local_APIC prior to setting the LVT LINT
registers which already clear LVT LINT0, and hence the troublesome
write can be avoided as the register is already cleared.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agodocs: document CONTROL command of xenstore protocol
Juergen Gross [Tue, 28 Jan 2020 06:21:07 +0000 (06:21 +0000)]
docs: document CONTROL command of xenstore protocol

The CONTROL command (former DEBUG command) isn't specified in the
xenstore protocol doc. Add it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Backport: 4.9+

5 years agodocs: fix StudlyCaps in libxl-migration-stream.pandoc
Wei Liu [Tue, 28 Jan 2020 12:40:31 +0000 (12:40 +0000)]
docs: fix StudlyCaps in libxl-migration-stream.pandoc

Note that "LibxlFmt" in the stream should remain unchanged.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodocs: Fix StudlyCaps in libxc-migration-stream.pandoc and xl.1.pod
Ian Jackson [Mon, 27 Jan 2020 16:45:47 +0000 (16:45 +0000)]
docs: Fix StudlyCaps in libxc-migration-stream.pandoc and xl.1.pod

$ git-grep libxenctrl | wc -l
99
$ git-grep libxc | wc -l
206
$ git-grep libxenlight | wc -l
48
$ git-grep libxl | wc -l
13433
$ git-grep LibXen | wc -l
2
$

Reported-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agodocs: add DIRECTORY_PART specification do xenstore protocol doc
Juergen Gross [Mon, 27 Jan 2020 16:50:50 +0000 (17:50 +0100)]
docs: add DIRECTORY_PART specification do xenstore protocol doc

DIRECTORY_PART was missing in docs/misc/xenstore.txt. Add it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Backport: 4.9+

5 years agolibxl: event: Move poller pipe emptying to the end of afterpoll
Ian Jackson [Fri, 10 Jan 2020 13:19:36 +0000 (13:19 +0000)]
libxl: event: Move poller pipe emptying to the end of afterpoll

This seems neater.  It doesn't have any significant effect because:

The poller fd wouldn't be emptied by time_occurs.  It would only be
woken by time_occurs as a result of an ao completing, or by
libxl__egc_ao_cleanup_1_baton.  But ...1_baton won't be called in
between (for one thing, this would violate the rule of not still
having the active caller when ...1_baton is called).

While discussing this patch, I noticed that there is a possibility (in
libxl in general) that poller_put might be called on a woken poller.
It would probably be sensible at some point to make poller_get empty
the pipe, at least if the pipe_nonempty flag is set.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v3: Completely revised commit message; now we think this is just
    cleanup.

5 years agolibxl: event: Fix possible hang with libxl_osevent_beforepoll
Ian Jackson [Fri, 10 Jan 2020 13:05:42 +0000 (13:05 +0000)]
libxl: event: Fix possible hang with libxl_osevent_beforepoll

If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
   libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.

We fix the latent bug by waking up the "poller_app" pipe every time we
add osevents.  If the application does not ever call beforepoll, we
write one byte to the pipe and set pipe_nonempty and then we ignore
it.  We only write another byte if beforepoll is called again.

Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll.  The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent.  We'll fix that in a
moment.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: New addition to correctness arguments in libxl_event.c comment.

5 years agolibxl: event: Break out baton_wake
Ian Jackson [Fri, 10 Jan 2020 13:11:07 +0000 (13:11 +0000)]
libxl: event: Break out baton_wake

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Now it takes a gc, not an egc.

5 years agolibxl: event: poller pipe optimisation
Ian Jackson [Fri, 10 Jan 2020 13:11:46 +0000 (13:11 +0000)]
libxl: event: poller pipe optimisation

Track in userland whether the poller pipe is nonempty.  This saves us
writing many many bytes to the pipe if nothing ever reads them.

This is going to be relevant in a moment, where we are going to create
a situation where this will happen quite a lot.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
5 years agolibxl: event: Fix hang when mixing blocking and eventy calls
Ian Jackson [Fri, 10 Jan 2020 12:37:43 +0000 (12:37 +0000)]
libxl: event: Fix hang when mixing blocking and eventy calls

If the application calls libxl with ao_how==0 and also makes calls
like _occurred, libxl will sometimes get stuck.

The bug happens as follows (for example):

  Thread A
       libxl_do_thing(,ao_how==0)
       libxl_do_thing starts, sets up some callbacks
       libxl_do_thing exit path calls AO_INPROGRESS
       libxl__ao_inprogress goes into event loop
       eventloop_iteration sleeps on:
          - do_thing's current fd set
          - sigchld pipe if applicable
          - its poller

  Thread B
       libxl_something_occurred
       the something is to do with do_thing, above
       do_thing_next_callback does some more work
       do_thing_next_callback becomes interested in fd N
       thread B returns to application

Note that nothing wakes up thread A.  A is not listening on fd N.  So
do_thing_* will not spot when fd N signals.  do_thing will not make
further timely progress.  If there is no timeout thread A will never
wake up.

The problem here occurs because thread A is waiting on an out of date
osevent set.

There is also the possibility that a thread might block waiting for
libxl osevents but outside libxl, eg if the application used
libxl_osevent_beforepoll.  We will deal with that in a moment.

See the big comment in libxl_event.c for a fairly formal correctness
argument.

This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
an egc or ao is disposed of.  Firstly egcs: in this patch we rename
libxl__egc_cleanup, which means we catch all the disposal sites.
Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
(ii) ao__inprogress and (iii) an event which completes the ao later.
(i) and (ii) we handle by adding the call to _baton.  In the case of
(iii) any such function must be an event-generating function so it has
an egc too, so it will pass on the baton when the egc is disposed.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
---
v2: Call libxl__egc_ao_cleanup_1_baton (renamed from __egc_cleanup) on
    all exits from ao_inprogress, even requests for async processing.
    Fixes a remaining instance of this bug (!)
    This involves disposing of ao->poller somewhat earlier.

v2: New correctness arguments in libxl_event.c comment and
    in commit message.