xenbits.xensource.com Git - people/dariof/xen.git/log
4 years agoxen: credit2: rebalance the number of CPUs in the scheduler runqueues sched/credit2-max-cpus-runqueue-v2
Dario Faggioli [Wed, 27 May 2020 08:32:11 +0000 (10:32 +0200)]
xen: credit2: rebalance the number of CPUs in the scheduler runqueues

When adding and removing CPUs to/from a pool, we can end up in a
situation where some runqueues have a lot of CPUs, while others have only
a couple of them. Even if the scheduler (namely, the load balancer)
should be capable of dealing with such a situation, it is something that
is better avoided.

We now have all the pieces in place to attempt an actual rebalancing
of the Credit2 scheduler runqueues, so let's go for it.

In short:
- every time we add or remove a CPU, especially considering the topology
  implications (e.g., we may have removed the last HT from a queue, so
  now there's space there for two CPUs, etc.), we try to rebalance;
- rebalancing happens under the control of the cpupool_sync() mechanism.
  Basically, it happens from inside a tasklet, after having put the
  cpupool in a quiescent state;
- the new runqueue configuration may end up being either different from
  or identical to the current one. It would be good to have a way to
  check whether the result would be identical, and in that case skip
  the balancing, but there is no way to do that.

Rebalancing, since it pauses all the domains of a pool, etc., is a
time-consuming operation. But it only happens when the cpupool configuration
is changed, so it is considered acceptable.
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch

4 years agocpupool: create the 'cpupool sync' infrastructure
Dario Faggioli [Wed, 27 May 2020 07:25:24 +0000 (09:25 +0200)]
cpupool: create the 'cpupool sync' infrastructure

In case we want to make some live changes to the configuration
of (typically) the scheduler of a cpupool, we need things to be
quiet in that pool.

Not necessarily like with stop machine, but we at least need
to make sure that no domains are either running or sitting
in the runqueues of the scheduler itself.

In fact, we need exactly something like this mechanism, for
changing "on the fly" which CPUs are assigned to which runqueue
in a Credit2 cpupool (check the following changes).
Therefore, instead of doing something specific for such a
use case, let's implement a generic mechanism.

The reason is, of course, that it may turn out to be useful for
other purposes, in future. But even for this specific case,
it is much easier and cleaner to just cede control to cpupool
code, instead of trying to do everything inside the scheduler.

Within the new cpupool_sync() function, we want to pause all
domains of a pool, including potentially the one calling
the function. Therefore, we defer the pausing, the actual work
and also the unpausing to a tasklet.
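
The pause / work / unpause sequence described above can be sketched as
follows. This is a minimal standalone model, not the Xen implementation:
struct pool, struct sync_req, pool_sync() and sync_worker() are
hypothetical names, and the tasklet is modelled as a function that some
other context calls later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpupool: only a pause count is tracked. */
struct pool {
    int paused;       /* > 0 while the pool's domains are paused */
    int work_runs;    /* how many times the deferred work has run */
};

typedef void (*sync_fn)(struct pool *p, void *arg);

/* Deferred request, standing in for the data handed to a tasklet. */
struct sync_req {
    struct pool *pool;
    sync_fn fn;
    void *arg;
    int pending;
};

/*
 * Caller side: only record the request. Nothing is paused here, because
 * the caller may itself belong to the pool that has to be paused.
 */
void pool_sync(struct sync_req *req, struct pool *p, sync_fn fn, void *arg)
{
    assert(req != NULL && p != NULL && fn != NULL);
    req->pool = p;
    req->fn = fn;
    req->arg = arg;
    req->pending = 1;
}

/* "Tasklet" side: pause, do the work, unpause, all in one place. */
void sync_worker(struct sync_req *req)
{
    assert(req != NULL);
    if (!req->pending)
        return;
    req->pool->paused++;              /* quiesce the pool */
    req->fn(req->pool, req->arg);     /* the actual work */
    req->pool->paused--;              /* resume the pool */
    req->pending = 0;
}

/* Example work item: checks that it only ever runs on a paused pool. */
void count_run(struct pool *p, void *arg)
{
    (void)arg;
    assert(p->paused > 0);
    p->work_runs++;
}
```

In this model a caller would invoke pool_sync(&req, &p, count_run, NULL)
and return; the work only happens once sync_worker(&req) runs.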

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
Changes from v1:
* new patch

4 years agoxen: credit2: compute cpus per-runqueue more dynamically.
Dario Faggioli [Tue, 26 May 2020 07:57:25 +0000 (09:57 +0200)]
xen: credit2: compute cpus per-runqueue more dynamically.

During boot, we use num_online_cpus() as an indication of how
many CPUs will end up in cpupool 0. We then decide (based also
on the value of the boot time parameter opt_max_cpus_runqueue)
the actual number of CPUs that we want in each runqueue, in such
a way that the runqueues themselves are as balanced (in terms of
how many CPUs they have) as possible.

After boot, though, when for instance we are creating a cpupool,
it would be more appropriate to use the number of CPUs of the
pool, rather than the total number of online CPUs.

Do exactly that, even if this means (since from Xen's perspective
CPUs are added to pools one by one) we'll be computing a different
maximum number of CPUs per runqueue each time.

In fact, we do it in preparation for the next change where,
after having computed the new value, we will also re-balance
the runqueues, by rebuilding them in such a way that the newly
computed maximum is actually respected for all of them.
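
One plausible way to derive a balanced per-runqueue maximum from the
number of CPUs in a pool is sketched below. The formula is an assumption
made for illustration (the patch's actual computation is not shown in
this log), and div_round_up() and balanced_max_cpus() are hypothetical
names; "limit" plays the role of opt_max_cpus_runqueue.

```c
#include <assert.h>

/* Ceiling division for unsigned integers. */
unsigned int div_round_up(unsigned int n, unsigned int d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/*
 * Given the number of CPUs in a pool and the configured upper bound,
 * return a per-runqueue maximum that keeps the runqueues evenly sized.
 */
unsigned int balanced_max_cpus(unsigned int nr_cpus, unsigned int limit)
{
    unsigned int nr_runqs;

    assert(limit != 0);
    if (nr_cpus == 0)
        return limit;                         /* empty pool: keep the limit */
    nr_runqs = div_round_up(nr_cpus, limit);  /* fewest runqueues that fit */
    return div_round_up(nr_cpus, nr_runqs);   /* spread CPUs evenly */
}
```

With a limit of 16, a pool of 20 CPUs gives two runqueues of 10 CPUs
each rather than one of 16 and one of 4.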
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch

4 years agoxen: credit2: limit the max number of CPUs in a runqueue
Dario Faggioli [Thu, 27 Feb 2020 18:10:18 +0000 (19:10 +0100)]
xen: credit2: limit the max number of CPUs in a runqueue

In Credit2 CPUs (can) share runqueues, depending on the topology. For
instance, with per-socket runqueues (the default) all the CPUs that are
part of the same socket share a runqueue.

On platforms with a huge number of CPUs per socket, that could be a
problem. An example is AMD EPYC2 servers, where we can have up to 128
CPUs in a socket.

It is of course possible to define other, still topology-based, runqueue
arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
runqueues with too many CPUs on other/future platforms. For instance, a
system with 96 CPUs and 2 NUMA nodes will end up having 48 CPUs per
runqueue. Not as bad, but still a lot!

Therefore, let's set a limit to the max number of CPUs that can share a
Credit2 runqueue. The actual value is configurable (at boot time), the
default being 16. If, for instance, there are more than 16 CPUs in a
socket, they'll be split among two (or more) runqueues.

Note: with core scheduling enabled, this parameter sets the max number
of *scheduling resources* that can share a runqueue. Therefore, with
granularity set to core (and assuming 2 threads per core), we will have
at most 16 cores per runqueue, which corresponds to 32 threads. But that
is fine, considering how core scheduling works.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
- always try to add a CPU to the runqueue with the least CPUs already in
  it. This should guarantee a more even distribution of CPUs among
  runqueues, as requested during review;
- rename the matching function from foo_smt_bar() to foo_siblings_bar(),
  which is more generic, and do the same to the per-arch wrappers;
- deal with the case where the user is trying to set fewer CPUs per
  runqueue than there are siblings per core (by putting siblings in the
  same runq anyway, but logging a message), as requested during review;
- use the per-cpupool value for the scheduling granularity, as requested
  during review;
- add a comment about why we also count siblings that are currently
  outside of our cpupool, as suggested during review;
- add a boot command line doc entry;
- fix typos in comments;
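
The first item above (always adding a CPU to the runqueue with the
fewest CPUs) can be sketched as follows. This is a simplified model
under stated assumptions: it ignores topology and sibling handling
entirely, and pick_runqueue() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the runqueue with the fewest CPUs that still has
 * room for one more, or -1 if every runqueue is already full.
 * Ties are resolved in favour of the lowest index.
 */
int pick_runqueue(const unsigned int *nr_cpus, size_t nr_runqs,
                  unsigned int max_cpus)
{
    int best = -1;
    size_t i;

    assert(nr_cpus != NULL || nr_runqs == 0);
    for (i = 0; i < nr_runqs; i++) {
        if (nr_cpus[i] >= max_cpus)
            continue;                          /* no room in this one */
        if (best < 0 || nr_cpus[i] < nr_cpus[best])
            best = (int)i;
    }
    return best;
}
```

A return value of -1 would be the point at which a new runqueue has to
be created.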

4 years agoxen: cpupool: add a back-pointer from a scheduler to its pool
Dario Faggioli [Thu, 28 May 2020 10:52:52 +0000 (12:52 +0200)]
xen: cpupool: add a back-pointer from a scheduler to its pool

If we need to know within which pool a particular scheduler
is working, we can do that by querying the cpupool pointer
of any of the sched_resource-s (i.e., ~ any of the CPUs)
assigned to the scheduler itself.

Basically, we pick any sched_resource that we know uses that
scheduler, and we check its *cpupool pointer. If we really
know that the resource uses the scheduler, this is fine, as
it also means the resource is inside the pool we are
looking for.

But, of course, we cannot do that for a pool/scheduler that has
not been given any sched_resource yet (or if we do not
know whether or not it has any sched_resource).

To overcome such limitation, add a back pointer from the
scheduler, to its own pool.
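
The idea can be sketched with two mutually linked structures. The field
and function names below are hypothetical and much simplified; they are
not the definitions used in Xen.

```c
#include <assert.h>
#include <stddef.h>

struct cpupool;

/* Simplified scheduler instance with a back-pointer to its pool. */
struct scheduler {
    const char *name;
    struct cpupool *cpupool;    /* back-pointer, NULL until attached */
};

/* Simplified pool: it already knew its scheduler before this change. */
struct cpupool {
    int id;
    struct scheduler *sched;
};

/* Link both directions in one place so they cannot get out of sync. */
void cpupool_attach_scheduler(struct cpupool *c, struct scheduler *s)
{
    assert(c != NULL && s != NULL);
    c->sched = s;
    s->cpupool = c;
}

/* Works even when the scheduler has no sched_resource assigned yet. */
int scheduler_pool_id(const struct scheduler *s)
{
    assert(s != NULL);
    return s->cpupool != NULL ? s->cpupool->id : -1;
}
```

The lookup no longer needs to go through any CPU of the pool, so it
also works for a pool that is still empty.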
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@citrix.com>
---
Changes from v1:
* new patch

4 years agoxen: credit2: factor runqueue initialization in its own function.
Dario Faggioli [Wed, 27 May 2020 16:01:35 +0000 (18:01 +0200)]
xen: credit2: factor runqueue initialization in its own function.

As it will be useful in later changes. While there, fix
the doc-comment.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
* new patch

4 years agoxen: credit2: factor cpu to runqueue matching in a function
Dario Faggioli [Thu, 2 Apr 2020 15:55:24 +0000 (17:55 +0200)]
xen: credit2: factor cpu to runqueue matching in a function

Just move the big if() condition in an inline function.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
4 years agoautomation/containerize: Add a shortcut for Debian unstable
George Dunlap [Thu, 28 May 2020 11:20:57 +0000 (12:20 +0100)]
automation/containerize: Add a shortcut for Debian unstable

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoautomation: Add golang packages to various dockerfiles
George Dunlap [Thu, 28 May 2020 11:20:56 +0000 (12:20 +0100)]
automation: Add golang packages to various dockerfiles

Specifically, Fedora 29, Archlinux, and Debian unstable.  This will
cause the CI loop to detect golang build failures.

CentOS 6 and 7 don't have golang packages, and the packages in
stretch, jessie, xenial, and trusty are too old.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoautomation/archlinux: Add 32-bit glibc headers
George Dunlap [Thu, 28 May 2020 11:20:55 +0000 (12:20 +0100)]
automation/archlinux: Add 32-bit glibc headers

This fixes the following build error in hvmloader:

usr/include/gnu/stubs.h:7:11: fatal error: gnu/stubs-32.h: No such file or directory

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agogolang/xenlight: Get rid of GOPATH-based build artefacts
George Dunlap [Thu, 28 May 2020 11:20:54 +0000 (12:20 +0100)]
golang/xenlight: Get rid of GOPATH-based build artefacts

The original build setup used a "fake GOPATH" in tools/golang to test
the mechanism of building from go package files installed on a
filesystem.  With the move to modules, this isn't necessary, and leads
to potentially confusing directories being created.  (I.e., it might
not be obvious that files under tools/golang/src shouldn't be edited.)

Get rid of the code that creates this (now unused) intermediate
directory.  Add direct dependencies from 'build' onto the source
files.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
4 years agolibxl: Generate golang bindings in libxl Makefile
George Dunlap [Thu, 28 May 2020 11:20:53 +0000 (12:20 +0100)]
libxl: Generate golang bindings in libxl Makefile

The generated golang bindings (types.gen.go and helpers.gen.go) are
left checked in so that they can be fetched from xenbits using the
golang tooling.  This means that they must be updated whenever
libxl_types.idl (or other dependencies) are updated.  However, the
golang bindings are only built optionally; we can't assume that anyone
updating libxl_types.idl will also descend into the tools/golang tree
to re-generate the bindings.

Fix this by re-generating the golang bindings from the libxl Makefile
when the IDL dependencies are updated, so that anyone who updates
libxl_types.idl will also end up updating the golang generated files
as well.

 - Make a variable for the generated files, and a target in
   xenlight/Makefile which will only re-generate the files.

 - Add a target in libxl/Makefile to call external idl generation
   targets (currently only golang).

For ease of testing, also add a specific target in libxl/Makefile just
to check and update files generated from the IDL.

This does mean that there are two potential paths for generating the
files during a parallel build; but that shouldn't be an issue, since
tools/golang/xenlight should never be built until after tools/libxl
has completed building anyway.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoVT-x: extend LBR Broadwell errata coverage
Jan Beulich [Thu, 28 May 2020 10:03:25 +0000 (12:03 +0200)]
VT-x: extend LBR Broadwell errata coverage

For lbr_tsx_fixup_check() simply name a few more specific erratum
numbers.

For bdf93_fixup_check(), however, more models are affected. Oddly enough
despite being the same model and stepping, the erratum is listed for
Xeon E3 but not its Core counterpart. Apply the workaround uniformly,
and also for Xeon D, which only has the LBR-from one listed in its spec
update.

Seeing this broader applicability, rename anything BDF93-related to more
generic names.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86: relax LDT check in arch_set_info_guest()
Jan Beulich [Thu, 28 May 2020 10:00:24 +0000 (12:00 +0200)]
x86: relax LDT check in arch_set_info_guest()

It is wrong for us to check the base address when there's no LDT in the
first place. Once we don't do this check anymore we can also set the
base address to a non-canonical value when the LDT is empty.
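
The relaxed rule can be sketched as below. This is an illustration
only: the real check in arch_set_info_guest() is not shown in this log
and may cover further conditions. ldt_settings_ok() is a hypothetical
name, and canonical means 48-bit canonical here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 48-bit canonical address: bits 63..47 are all zeros or all ones. */
bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Only look at the base address when there actually is an LDT. */
bool ldt_settings_ok(uint64_t ldt_base, unsigned int ldt_ents)
{
    if (ldt_ents == 0)
        return true;                /* no LDT: the base is irrelevant */
    return is_canonical(ldt_base);
}

/* Usage example: the same bad base is fine only for an empty LDT. */
void ldt_check_example(void)
{
    assert(ldt_settings_ok(UINT64_C(0x8000000000000000), 0));
    assert(!ldt_settings_ok(UINT64_C(0x8000000000000000), 4));
}
```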

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/arm: call iomem_permit_access for passthrough devices
Stefano Stabellini [Wed, 15 Apr 2020 01:02:55 +0000 (18:02 -0700)]
xen/arm: call iomem_permit_access for passthrough devices

iomem_permit_access should be called for MMIO regions of devices
assigned to a domain. Currently it is not called for MMIO regions of
passthrough devices of Dom0less guests. This patch fixes it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <julien@xen.org>
4 years agox86/boot: Fix load_system_tables() to be NMI/#MC-safe
Andrew Cooper [Wed, 27 May 2020 12:48:45 +0000 (13:48 +0100)]
x86/boot: Fix load_system_tables() to be NMI/#MC-safe

During boot, load_system_tables() is used in reinit_bsp_stack() to switch the
virtual addresses used from their .data/.bss alias, to their directmap alias.

The structure assignment is implemented as a memset() to zero first, then a
copy-in of the new data.  This causes the NMI/#MC stack pointers to
transiently become 0, at a point where we may have an NMI watchdog running.

Rewrite the logic using a volatile tss pointer (equivalent to, but more
readable than, using ACCESS_ONCE() for all writes).

This does drop the zeroing side effect for holes in the structure, but the
backing memory for the TSS is fully zeroed anyway, and architecturally, they
are all reserved.
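
A minimal sketch of the technique, using a hypothetical and heavily
simplified TSS layout (struct tss and tss_update() are not the Xen
definitions): every field is written in place through a volatile
pointer, so each one goes directly from its old value to its new one
and is never transiently zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TSS: only the fields the sketch needs. */
struct tss {
    uint64_t rsp0;
    uint64_t ist[7];
};

/*
 * Update the TSS in place. Because the stores go through a volatile
 * pointer, each one is performed exactly as written, one per field,
 * rather than as a whole-structure zero-then-copy.
 */
void tss_update(struct tss *t, uint64_t rsp0, const uint64_t ist[7])
{
    volatile struct tss *vt = t;
    unsigned int i;

    assert(t != NULL && ist != NULL);
    vt->rsp0 = rsp0;
    for (i = 0; i < 7; i++)
        vt->ist[i] = ist[i];
}
```

By contrast, a plain structure assignment such as *t = (struct tss){ ... }
leaves the compiler free to zero the destination first, which is the
behaviour the commit message describes.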

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mem_sharing: gate enabling on cpu_has_vmx
Tamas K Lengyel [Wed, 27 May 2020 07:50:55 +0000 (09:50 +0200)]
x86/mem_sharing: gate enabling on cpu_has_vmx

It is unclear whether mem_sharing was ever made to work on other architectures
but at this time the only verified platform for it is vmx. No plans to support
or maintain it on other architectures. Make this explicit by checking during
initialization.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Wei Liu <wl@xen.org>
4 years agox86: clear RDRAND CPUID bit on AMD family 15h/16h
Jan Beulich [Wed, 27 May 2020 07:49:37 +0000 (09:49 +0200)]
x86: clear RDRAND CPUID bit on AMD family 15h/16h

Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:

    There have been reports of RDRAND issues after resuming from suspend on
    some AMD family 15h and family 16h systems. This issue stems from a BIOS
    not performing the proper steps during resume to ensure RDRAND continues
    to function properly.

    Update the CPU initialization to clear the RDRAND CPUID bit for any family
    15h and 16h processor that supports RDRAND. If it is known that the family
    15h or family 16h system does not have an RDRAND resume issue or that the
    system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
    can be used to stop the clearing of the RDRAND CPUID bit.

    Note, that clearing the RDRAND CPUID bit does not prevent a processor
    that normally supports the RDRAND instruction from executing it. So any
    code that determined the support based on family and model won't #UD.

Warn if no explicit choice was given on affected hardware.

Check RDRAND functions at boot as well as after S3 resume (the retry
limit chosen is entirely arbitrary).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/ioemul: Rewrite stub generation to be shadow stack compatible
Andrew Cooper [Mon, 27 Apr 2020 12:19:21 +0000 (13:19 +0100)]
x86/ioemul: Rewrite stub generation to be shadow stack compatible

The logic is completely undocumented and almost impossible to follow.  It
actually uses return oriented programming.  Rewrite it to conform to more
normal call mechanics, and leave a big comment explaining things.  As well as
the code being easier to follow, it will execute faster as it isn't fighting
the branch predictor.

Move the ioemul_handle_quirk() function pointer from traps.c to
ioport_emulate.c.  There is no reason for it to live outside the two
translation units which use it.  Alter the behaviour to return the number of
bytes written into the stub.

Introduce a new nocall annotation using __attribute__((error)) to prohibit
calls being made.
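
A sketch of how such an annotation can be built on the GCC error
attribute. The macro body, the message text and the function names are
assumptions made for illustration, not the definitions from the patch.
The attribute makes the build fail if a direct call to the function
survives optimisation; taking its address remains allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Fall back to a no-op where the attribute is not available. */
#if defined(__has_attribute)
# if __has_attribute(error)
#  define nocall __attribute__((error("nocall function called directly")))
# endif
#endif
#ifndef nocall
# define nocall
#endif

/* May be referenced (e.g. as a jump target), but never called from C. */
void nocall stub_helper(void);

/* Stand-in body; such a helper would normally be written in assembly. */
void stub_helper(void)
{
}

/* Taking the address is fine and does not trigger the diagnostic. */
void (*stub_helper_address(void))(void)
{
    return stub_helper;
}

/* Usage example: only the address is used, the function is not called. */
void nocall_example(void)
{
    assert(stub_helper_address() == stub_helper);
}
```

Adding a statement such as stub_helper(); anywhere in C code would then
be rejected at compile time on compilers that support the attribute.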

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agogolang: Add a variable for the libxl source directory
George Dunlap [Tue, 26 May 2020 11:01:27 +0000 (12:01 +0100)]
golang: Add a variable for the libxl source directory

...rather than duplicating the path in several places.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
4 years agogolang: Add a minimum go version to go.mod
George Dunlap [Tue, 26 May 2020 11:01:26 +0000 (12:01 +0100)]
golang: Add a minimum go version to go.mod

`go build` wants to add the current go version to go.mod as the
minimum every time we run `make` in the directory.  Add 1.11 (the
earliest Go version that supports modules) there to make it happy.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
4 years agox86/shadow: Reposition sh_remove_write_access_from_sl1p()
Andrew Cooper [Thu, 21 May 2020 08:45:27 +0000 (09:45 +0100)]
x86/shadow: Reposition sh_remove_write_access_from_sl1p()

When compiling with SHOPT_OUT_OF_SYNC disabled, the build fails with:

  common.c:41:12: error: ‘sh_remove_write_access_from_sl1p’ declared ‘static’ but never defined [-Werror=unused-function]
   static int sh_remove_write_access_from_sl1p(struct domain *d, mfn_t gmfn,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

due to an unguarded forward declaration.

It turns out there is no need to forward declare
sh_remove_write_access_from_sl1p() to begin with, so move it to just ahead of
its first user, which is within a larger #ifdef'd SHOPT_OUT_OF_SYNC block.

Fix up for style while moving it.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
4 years agovmx: let opt_ept_ad always reflect the current setting
Juergen Gross [Mon, 25 May 2020 06:21:55 +0000 (08:21 +0200)]
vmx: let opt_ept_ad always reflect the current setting

In case opt_ept_ad has not been set explicitly by the user via command
line or runtime parameter, it is treated as "no" on Avoton cpus.

Change that handling by setting opt_ept_ad to 0 for this cpu type
explicitly if no user value has been set.

By putting this into the (renamed) boot time initialization of vmcs.c
_vmx_cpu_up() can be made static.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoxen/arm: plat: Allocate as much as possible memory below 1GB for dom0 for RPI
Julien Grall [Sat, 16 May 2020 19:16:57 +0000 (20:16 +0100)]
xen/arm: plat: Allocate as much as possible memory below 1GB for dom0 for RPI

The Raspberry PI 4 has devices that can only DMA into the first GB of
RAM. Therefore we want to allocate as much memory as possible below 1GB
for dom0.

Use the recently introduced dma_bitsize field to specify the DMA width
supported.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reported-by: Corey Minyard <minyard@acm.org>
Tested-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agoxen/arm: Take into account the DMA width when allocating Dom0 memory banks
Julien Grall [Sat, 16 May 2020 10:57:00 +0000 (11:57 +0100)]
xen/arm: Take into account the DMA width when allocating Dom0 memory banks

At the moment, Xen is assuming that all the devices are at least 32-bit
DMA capable. However, some SoCs have devices that may be able to access
a much more restricted range. For instance, the Raspberry PI 4 has devices
that can only access the first GB of RAM.

The function arch_get_dma_bit_size() will return the lowest DMA width on
the platform. Use it to decide what is the limit for the low memory.
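
The arithmetic behind this can be sketched as follows: a DMA width of N
bits means devices can reach addresses below 2^N, so the first GB of
RAM corresponds to 30 bits. dma_limit() and low_mem_in_bank() are
hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* First address that a device with the given DMA width cannot reach. */
uint64_t dma_limit(unsigned int dma_bits)
{
    assert(dma_bits > 0 && dma_bits < 64);
    return UINT64_C(1) << dma_bits;
}

/* Bytes of the bank [start, start + size) that lie below the DMA limit. */
uint64_t low_mem_in_bank(uint64_t start, uint64_t size, unsigned int dma_bits)
{
    uint64_t limit = dma_limit(dma_bits);

    if (start >= limit)
        return 0;                   /* bank is entirely above the limit */
    return size < limit - start ? size : limit - start;
}
```

For a 2 GB bank starting at address 0, a 30-bit width leaves 1 GB
usable as low memory, while the previously assumed 32-bit width would
have counted the whole bank.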

Signed-off-by: Julien Grall <jgrall@amazon.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agoxen/arm: Allow a platform to override the DMA width
Julien Grall [Sat, 16 May 2020 10:41:16 +0000 (11:41 +0100)]
xen/arm: Allow a platform to override the DMA width

At the moment, Xen is assuming that all the devices are at least 32-bit
DMA capable. However, some SoCs have devices that may be able to access
a much more restricted range. For instance, the RPI has devices that can
only access the first 1GB of RAM.

The structure platform_desc is now extended to allow a platform to
override the DMA width. The new field is used to implement
arch_get_dma_bit_size().

The prototype is now moved in asm-arm/mm.h as the function is not NUMA
specific. The implementation is done in platform.c so we don't have to
include platform.h everywhere. This should be fine as the function is
not expected to be called in a hot path.
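
A sketch of the override mechanism, under stated assumptions: the
structure below is a cut-down stand-in for platform_desc, the
convention that 0 means "not overridden" is an assumption, and
get_dma_bit_size() is a hypothetical name standing in for
arch_get_dma_bit_size().

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_DMA_BITSIZE 32U   /* the previously hard-wired assumption */

/* Cut-down platform descriptor; only the field of interest is shown. */
struct platform_desc {
    const char *name;
    unsigned int dma_bitsize;     /* assumed: 0 means "not overridden" */
};

/* Return the platform's DMA width, or the default if it sets none. */
unsigned int get_dma_bit_size(const struct platform_desc *plat)
{
    if (plat == NULL || plat->dma_bitsize == 0)
        return DEFAULT_DMA_BITSIZE;
    assert(plat->dma_bitsize <= 64);
    return plat->dma_bitsize;
}
```

A platform whose devices can only reach the first 1GB of RAM would set
the field to 30; all other platforms keep the 32-bit default.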

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agogolang/xenlight: add an empty line after DO NOT EDIT comment
Nick Rosbrook [Thu, 21 May 2020 14:55:25 +0000 (10:55 -0400)]
golang/xenlight: add an empty line after DO NOT EDIT comment

When generating documentation, pkg.go.dev and godoc.org assume a comment
that immediately precedes the package declaration is a "package
comment", and should be shown in the documentation. Add an empty line
after the DO NOT EDIT comment in generated files to prevent these
comments from appearing as "package comments."

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
4 years agoxen/trace: Don't dump offline CPUs in debugtrace_dump_worker()
Andrew Cooper [Thu, 21 May 2020 08:19:33 +0000 (09:19 +0100)]
xen/trace: Don't dump offline CPUs in debugtrace_dump_worker()

The 'T' debugkey reliably wedges on one of my systems, which has a sparse
APIC_ID layout due to a non power-of-2 number of cores per socket.  The
per_cpu(dt_cpu_data, cpu) calculation falls over the deliberately non-canonical
poison value.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/idle: Extend ISR/C6 erratum workaround to Haswell
Andrew Cooper [Fri, 22 May 2020 14:46:44 +0000 (15:46 +0100)]
x86/idle: Extend ISR/C6 erratum workaround to Haswell

This bug was first discovered against Haswell.  It is definitely affected.

(The XenServer ticket for this bug was opened on 2013-05-30 which is coming up
on 7 years old, and predates Broadwell).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/traps: Rework #PF[Rsvd] bit handling
Andrew Cooper [Mon, 18 May 2020 15:13:33 +0000 (16:13 +0100)]
x86/traps: Rework #PF[Rsvd] bit handling

The reserved_bit_page_fault() paths effectively turn reserved bit faults into
a warning, but in the light of L1TF, the real impact is far more serious.

Make #PF[Rsvd] a hard error, irrespective of mode.  Any new panic() caused by
this constitutes pagetable corruption, and probably an L1TF gadget needing
fixing.

Drop the PFEC_reserved_bit check in __page_fault_type() which has been made
dead by the rearrangement in do_page_fault().

Additionally, drop the comment for do_page_fault().  It is inaccurate (bit 0
being set isn't always a protection violation) and stale (missing bits
5,6,15,31).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/PV: polish pv_set_gdt()
Jan Beulich [Fri, 22 May 2020 14:10:40 +0000 (16:10 +0200)]
x86/PV: polish pv_set_gdt()

There's no need to invoke get_page_from_gfn(), and there's also no need
to update the passed in frames[]. Invoke get_page_and_type() directly.

Also make the function's frames[] parameter const, change its return
type to int, and drop the bogus casts from two of its invocations.

Finally a little bit of cosmetics.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86: relax GDT check in arch_set_info_guest()
Jan Beulich [Fri, 22 May 2020 14:09:54 +0000 (16:09 +0200)]
x86: relax GDT check in arch_set_info_guest()

It is wrong for us to check frames beyond the guest specified limit
(in the compat case another loop bound is already correct).
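
The loop bound can be sketched like this. A 4096-byte page holds 512
eight-byte descriptors, so only the first ceil(entries / 512) frames
are in use. The function names and the frame_ok callback are
hypothetical; the real validation done by Xen is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DESC_PER_PAGE 512U   /* 4096-byte page / 8-byte descriptor */

/* Number of GDT frames actually covered by the guest-specified limit. */
unsigned int gdt_frames_in_use(unsigned int gdt_ents)
{
    return (gdt_ents + DESC_PER_PAGE - 1) / DESC_PER_PAGE;
}

/* Validate only the frames in use; entries beyond them are ignored. */
bool gdt_frames_ok(const unsigned long *frames, unsigned int nr_frames,
                   unsigned int gdt_ents, bool (*frame_ok)(unsigned long))
{
    unsigned int i, used = gdt_frames_in_use(gdt_ents);

    assert(frames != NULL && frame_ok != NULL);
    if (used > nr_frames)
        return false;
    for (i = 0; i < used; i++)
        if (!frame_ok(frames[i]))
            return false;
    return true;
}

/* Example predicate for the sketch: any non-zero frame number is valid. */
bool nonzero_frame(unsigned long frame)
{
    return frame != 0;
}
```

With 600 entries only the first two frames are inspected, so unused
trailing slots of the frame array no longer cause a failure.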

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/idle: prevent entering C3/C6 on some Intel CPUs due to errata
Roger Pau Monné [Fri, 22 May 2020 14:08:54 +0000 (16:08 +0200)]
x86/idle: prevent entering C3/C6 on some Intel CPUs due to errata

Apply a workaround for errata BA80, AAK120, AAM108, AAO67, BD59,
AAY54: Rapid Core C3/C6 Transition May Cause Unpredictable System
Behavior.

Limit maximum C state to C1 when SMT is enabled on the affected CPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/idle: prevent entering C6 with in service interrupts on Intel
Roger Pau Monné [Fri, 22 May 2020 14:07:38 +0000 (16:07 +0200)]
x86/idle: prevent entering C6 with in service interrupts on Intel

Apply a workaround for Intel errata BDX99, CLX30, SKX100, CFW125,
BDF104, BDH85, BDM135, KWB131: "A Pending Fixed Interrupt May Be
Dispatched Before an Interrupt of The Same Priority Completes".

Apply the errata to all server and client models (big cores) from
Broadwell to Cascade Lake. The workaround is grouped together with the
existing fix for errata AAJ72, and the eoi from the function name is
removed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/HVM: cosmetics to hvm_set_cr3()
Jan Beulich [Fri, 22 May 2020 12:41:15 +0000 (14:41 +0200)]
x86/HVM: cosmetics to hvm_set_cr3()

Eliminate the not really useful local variable "old". Reduce the scope
of "page". Rename the latched "current".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/HVM: refuse CR3 loads with reserved (upper) bits set
Jan Beulich [Fri, 22 May 2020 12:40:30 +0000 (14:40 +0200)]
x86/HVM: refuse CR3 loads with reserved (upper) bits set

While bits 11 and below are, if not used for other purposes, reserved
but ignored, bits beyond physical address width are supposed to raise
exceptions (at least in the non-nested case; I'm not convinced the
current nested SVM/VMX behavior of raising #GP(0) here is correct, but
that's not the subject of this change).
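
The upper-bits test can be sketched as a single shift. This is an
illustration with a hypothetical function name; it assumes the caller
has already stripped any control bit it handles itself (such as the
no-flush bit on MOV-to-CR3) and that paddr_bits is the guest's physical
address width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if any bit at or above the physical address width is set. */
bool cr3_has_reserved_upper_bits(uint64_t cr3, unsigned int paddr_bits)
{
    assert(paddr_bits > 0 && paddr_bits <= 52);
    return (cr3 >> paddr_bits) != 0;
}
```

Bits 11 and below are deliberately not examined, matching the
"reserved but ignored" behaviour described above.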

Introduce currd as a local variable, and replace other v->domain
instances at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/HVM: move NOFLUSH handling out of hvm_set_cr3()
Jan Beulich [Fri, 22 May 2020 12:37:09 +0000 (14:37 +0200)]
x86/HVM: move NOFLUSH handling out of hvm_set_cr3()

The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
particular not when loading nested guest state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86emul: correct test harness {evex} assembler capability check
Jan Beulich [Fri, 22 May 2020 12:35:04 +0000 (14:35 +0200)]
x86emul: correct test harness {evex} assembler capability check

The {evex} pseudo prefix gets rejected by gas for insns not allowing
EVEX encoding. Except there's a gas bug due to which its check gets
bypassed for insns without operands. Let's not rely on that bug to
remain there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agogolang: Update generated files after libxl_types.idl change
George Dunlap [Fri, 22 May 2020 09:35:10 +0000 (10:35 +0100)]
golang: Update generated files after libxl_types.idl change

c/s 7efd9f3d45 ("libxl: Handle Linux stubdomain specific QEMU
options.") modified libxl_types.idl.  Run gengotypes.py again to update
the generated golang bindings.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agotools/xenstore: mark variable in header as extern
Anthony PERARD [Wed, 20 May 2020 16:39:42 +0000 (17:39 +0100)]
tools/xenstore: mark variable in header as extern

This patch fixes "multiple definition of `xprintf'" (or xgt_handle)
build error with GCC 10.1.0.

These are the errors reported:
    gcc xs_tdb_dump.o utils.o tdb.o talloc.o      -o xs_tdb_dump
    /usr/bin/ld: utils.o:./utils.h:27: multiple definition of `xprintf'; xs_tdb_dump.o:./utils.h:27: first defined here
    [...]
    gcc xenstored_core.o xenstored_watch.o xenstored_domain.o xenstored_transaction.o xenstored_control.o xs_lib.o talloc.o utils.o tdb.o hashtable.o xenstored_posix.o      -lsystemd   -Wl,-rpath-link=... ../libxc/libxenctrl.so -lrt  -o xenstored
    /usr/bin/ld: xenstored_watch.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
    /usr/bin/ld: xenstored_domain.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
    /usr/bin/ld: xenstored_transaction.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
    /usr/bin/ld: xenstored_control.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
    /usr/bin/ld: xenstored_posix.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here

A difference that I noticed compared with an earlier version of the build
chain is that before, I had:
    $ nm xs_tdb_dump.o | grep xprintf
    0000000000000008 C xprintf
And now, it's:
    0000000000000000 B xprintf
With the patch applied, the symbol isn't in xs_tdb_dump.o anymore.
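
For illustration, a minimal single-file sketch of the pattern (not the
actual patch). GCC 10 changed its default from -fcommon to -fno-common, so
a variable defined in a header without extern becomes a real definition in
every object file (the B symbol above) instead of a mergeable common symbol
(C). The sketch assumes xprintf is a function-pointer variable; demo_printf
and demo_calls are hypothetical names used only here.

```c
#include <stddef.h>

/* Header part: 'extern' makes this a declaration only, so including the
 * header from many .c files no longer creates many definitions. */
extern void (*xprintf)(const char *fmt, ...);

/* Source part: exactly one .c file provides the single definition. */
void (*xprintf)(const char *fmt, ...) = NULL;

/* Hypothetical counter and sink, only to show the pointer being used. */
int demo_calls;

void demo_printf(const char *fmt, ...)
{
    (void)fmt;      /* the format string is ignored in this sketch */
    demo_calls++;
}
```

Every other file that includes the header then refers to the one
definition instead of emitting its own copy.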

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agox86/mem-paging: further adjustments to p2m_mem_paging_prep()'s error handling
Jan Beulich [Wed, 20 May 2020 10:49:28 +0000 (12:49 +0200)]
x86/mem-paging: further adjustments to p2m_mem_paging_prep()'s error handling

Address late comments on ecb913be4aaa ("x86/mem-paging: correct
p2m_mem_paging_prep()'s error handling"):
- insert a gprintk() ahead of domain_crash(),
- add a comment.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/idle: rework C6 EOI workaround
Roger Pau Monné [Wed, 20 May 2020 10:48:37 +0000 (12:48 +0200)]
x86/idle: rework C6 EOI workaround

Change the C6 EOI workaround (erratum AAJ72) to use x86_match_cpu. Also
call the workaround from mwait_idle; previously it was only used by
the ACPI idle driver. Finally, make sure the routine is called for all
states equal to or greater than ACPI_STATE_C3. Note that the ACPI driver
doesn't currently handle them, but the erratum condition shouldn't be
limited by that.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/setup: lift dom0 creation out into create_dom0() function
David Woodhouse [Wed, 20 May 2020 10:47:48 +0000 (12:47 +0200)]
x86/setup: lift dom0 creation out into create_dom0() function

The creation of dom0 can be relatively self-contained. Shift it into
a separate function and simplify __start_xen() a little bit.

This is a cleanup in its own right, but will be even more desirable
when live update provides an alternative path through __start_xen()
that doesn't involve creating a new dom0 at all.

Move the calculation of the 'initrd' parameter for create_dom0()
down past the cosmetic printk about NX support, because in the fullness
of time the whole initrd and create_dom0() part will be under the same
"not live update" conditional. And in the meantime it's just neater.

Also drop the explicit check for initrd to be module #0 since that would
be the dom0 kernel and the corresponding bit is always clear in
module_map.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agolibxl: Check stubdomain kernel & ramdisk presence
Jason Andryuk [Tue, 19 May 2020 01:55:03 +0000 (21:55 -0400)]
libxl: Check stubdomain kernel & ramdisk presence

Just out of context is the following comment for libxl__domain_make:
/* fixme: this function can leak the stubdom if it fails */

When the stubdomain kernel or ramdisk is not present, the domid and
stubdomain name will indeed be leaked.  Avoid the leak by checking the
file presence and erroring out when absent.  It doesn't fix all cases,
but it avoids a big one when using a linux device model stubdomain.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agodocs: Add device-model-domid to xenstore-paths
Jason Andryuk [Tue, 19 May 2020 01:55:02 +0000 (21:55 -0400)]
docs: Add device-model-domid to xenstore-paths

Document device-model-domid for when using a device model stubdomain.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: consider also qemu in stubdomain in libxl__dm_active check
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:55:01 +0000 (21:55 -0400)]
libxl: consider also qemu in stubdomain in libxl__dm_active check

Since qemu-xen can now run in a stubdomain too, handle this case when
checking its state.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: ignore emulated IDE disks beyond the first 4
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:55:00 +0000 (21:55 -0400)]
libxl: ignore emulated IDE disks beyond the first 4

Qemu supports only 4 emulated IDE disks; when given more (or with higher
indexes), it will fail to start. Since the disks can still be accessed
using the PV interface, just ignore the emulated path and log a warning,
instead of rejecting the configuration altogether.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: require qemu in dom0 for multiple stubdomain consoles
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:59 +0000 (21:54 -0400)]
libxl: require qemu in dom0 for multiple stubdomain consoles

Device model stubdomains (both Mini-OS + qemu-trad and linux + qemu-xen)
are always started with at least 3 consoles: log, save, and restore.
Until xenconsoled learns how to handle multiple consoles, this is needed
for save/restore support.

For Mini-OS stubdoms, this is a bug.  In practice, it works in most
cases because there is something else that triggers qemu in dom0 too:
vfb/vkb added if vnc/sdl/spice is enabled.

Additionally, Linux-based stubdomain waits for all the backends to
initialize during boot. Lack of some console backends results in
stubdomain startup timeout.

This is a temporary patch until xenconsoled is improved.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
[Updated commit message with Marek's explanation from mailing list.]
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: use vchan for QMP access with Linux stubdomain
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:58 +0000 (21:54 -0400)]
libxl: use vchan for QMP access with Linux stubdomain

Access to QMP of QEMU in a Linux stubdomain is possible over a vchan
connection. Handle the actual vchan connection in a separate process
(vchan-socket-proxy). This simplifies integration with QMP (already
quite complex), and also allows preliminary filtering of (potentially
malicious) QMP input.
Since only one client can be connected to a vchan server at a time,
and this is not enforced by libxenvchan itself, additional client-side
locking is needed. It is implicitly implemented by vchan-socket-proxy,
as it handles only one connection at a time. Note that qemu supports only
one simultaneous client on a control socket anyway (but in the UNIX socket
case, it enforces this server-side), so it doesn't add any extra
limitation.

libxl qmp client code already has locking to handle concurrent access
attempts to the same qemu qmp interface.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Squash in changes of regenerated autotools files.

Kill the vchan-socket-proxy so we don't leak the daemonized processes.
libxl__stubdomain_is_linux_running() works against the guest_domid, but
the xenstore path is beneath the stubdomain.  This leads to the use of
libxl_is_stubdom in addition to libxl__stubdomain_is_linux_running() so
that the stubdomain calls kill for the qmp-proxy.

Also call libxl__qmp_cleanup() to remove the unix sockets used by
vchan-socket-proxy.  vchan-socket-proxy only creates qmp-libxl-$domid,
and libxl__qmp_cleanup removes that as well as qmp-libxenstat-$domid.
However, it tolerates ENOENT, and a stray qmp-libxenstat-$domid should
not exist.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: Refactor kill_device_model to libxl__kill_xs_path
Jason Andryuk [Tue, 19 May 2020 01:54:57 +0000 (21:54 -0400)]
libxl: Refactor kill_device_model to libxl__kill_xs_path

Move kill_device_model to libxl__kill_xs_path so we have a helper to
kill a process from a pid stored in xenstore.  We'll be using it to kill
vchan-qmp-proxy.

libxl__kill_xs_path takes a "what" string for use in printing error
messages.  kill_device_model is retained in libxl_dm.c to provide the
string.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools: add simple vchan-socket-proxy
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:56 +0000 (21:54 -0400)]
tools: add simple vchan-socket-proxy

Add a simple proxy for tunneling a socket connection over vchan. This is
based on the existing vchan-node* applications, but extended with socket
support. vchan-socket-proxy serves both as a client and as a server,
depending on parameters. It can be used to transparently communicate
with an application in another domain that normally exposes a UNIX socket
interface. Specifically, it's written to communicate with qemu running
within a stubdom.

Server mode listens for vchan connections and when one is opened,
connects to the given UNIX socket.  Client mode listens on a UNIX
socket and when someone connects, opens a vchan connection.  Only
a single connection at a time is supported.

Additionally, the socket can be given as a number, in which case it's
interpreted as an already open FD (for a UNIX listening socket, listen()
must already have been called), or as "-", meaning stdin/stdout, in
which case the proxy is reduced to vchan-node2 functionality.

Example usage:

1. (in dom0) vchan-socket-proxy --mode=client <DOMID>
    /local/domain/<DOMID>/data/vchan/1234 /run/qemu.(DOMID)

2. (in DOMID) vchan-socket-proxy --mode=server 0
   /local/domain/<DOMID>/data/vchan/1234 /run/qemu.(DOMID)

This will listen on /run/qemu.(DOMID) in dom0 and whenever a connection is
made, it will connect to DOMID, where the server process will connect to
/run/qemu.(DOMID) there. When the client disconnects, the vchan connection
is terminated and the server vchan-socket-proxy process also disconnects
from qemu.
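
The three forms of the socket argument described above (a path, a number
for an already open FD, or "-" for stdin/stdout) could be told apart
roughly as follows. This is only a sketch under assumptions:
classify_socket_arg() and the sock_kind enum are hypothetical names, and
the real tool's parsing may differ in detail.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of the socket argument. */
enum sock_kind { SOCK_KIND_STDIO, SOCK_KIND_FD, SOCK_KIND_PATH };

/* Returns the kind of 'arg'; for SOCK_KIND_FD the number is stored in
 * *fd_out, otherwise *fd_out is left untouched. */
enum sock_kind classify_socket_arg(const char *arg, int *fd_out)
{
    char *end;
    long val;

    if (strcmp(arg, "-") == 0)
        return SOCK_KIND_STDIO;          /* use stdin/stdout */

    errno = 0;
    val = strtol(arg, &end, 10);
    /* Accept only a complete, non-negative number that fits in an int. */
    if (errno == 0 && end != arg && *end == '\0' &&
        val >= 0 && val <= INT_MAX) {
        *fd_out = (int)val;
        return SOCK_KIND_FD;             /* already open descriptor */
    }

    return SOCK_KIND_PATH;               /* anything else is a path */
}
```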

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools: add missing libxenvchan cflags
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:55 +0000 (21:54 -0400)]
tools: add missing libxenvchan cflags

libxenvchan.h includes xenevtchn.h and xengnttab.h, so applications built
with it need the applicable -I in CFLAGS too.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: add save/restore support for qemu-xen in stubdomain
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:54 +0000 (21:54 -0400)]
libxl: add save/restore support for qemu-xen in stubdomain

Rely on a wrapper script in stubdomain to attach relevant consoles to
qemu.  The save console (1) must be attached to fdset/1.  When
performing a restore, $STUBDOM_RESTORE_INCOMING_ARG must be replaced on
the qemu command line by "fd:$FD", where $FD is an open file descriptor
number to the restore console (2).

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Address TODO in dm_state_save_to_fdset: Only remove savefile for
non-stubdom.
Use $STUBDOM_RESTORE_INCOMING_ARG instead of fd:3 and update commit
message.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools/libvchan: notify server when client is connected
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:53 +0000 (21:54 -0400)]
tools/libvchan: notify server when client is connected

Let the server know when the client is connected. Otherwise the server will
notice only when the client sends some data.
This change does not break existing clients, as a libvchan user should
handle spurious notifications anyway (for example, an acknowledgement of
the remote side reading the data).

Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Replace spaces with tabs to match the file's whitespace.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoxl: add stubdomain related options to xl config parser
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:52 +0000 (21:54 -0400)]
xl: add stubdomain related options to xl config parser

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: write qemu arguments into separate xenstore keys
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:51 +0000 (21:54 -0400)]
libxl: write qemu arguments into separate xenstore keys

This allows using arguments with spaces, like -append, without
nominating any special "separator" character.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Write arguments in dm-argv directory instead of overloading mini-os's
dmargs string.

Make libxl__write_stub_dmargs vary behaviour based on the
is_linux_stubdom flag.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: Use libxl__xs_* in libxl__write_stub_dmargs
Jason Andryuk [Tue, 19 May 2020 01:54:50 +0000 (21:54 -0400)]
libxl: Use libxl__xs_* in libxl__write_stub_dmargs

Re-work libxl__write_stub_dmargs to use libxl_xs_* functions in a loop.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools: Use INSTALL_PYTHON_PROG
Anthony PERARD [Wed, 11 Mar 2020 17:59:33 +0000 (17:59 +0000)]
tools: Use INSTALL_PYTHON_PROG

Whenever python scripts are installed, have the shebang be modified to use
whatever PYTHON_PATH is. This is useful for systems where python isn't
available, or where the package build tools prevent unversioned shebangs.

INSTALL_PYTHON_PROG only looks for "#!/usr/bin/env python".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agotools/python: Fix install-wrap
Anthony PERARD [Wed, 11 Mar 2020 17:59:32 +0000 (17:59 +0000)]
tools/python: Fix install-wrap

This allows install-wrap to be used when the source script is in a
subdirectory.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agolibxl: Handle Linux stubdomain specific QEMU options.
Eric Shelton [Tue, 19 May 2020 01:54:49 +0000 (21:54 -0400)]
libxl: Handle Linux stubdomain specific QEMU options.

This patch creates an appropriate command line for the QEMU instance
running in a Linux-based stubdomain.

NOTE: a number of items are not currently implemented for Linux-based
stubdomains, such as:
- save/restore
- QMP socket
- graphics output (e.g., VNC)

Signed-off-by: Eric Shelton <eshelton@pobox.com>
Simon:
 * fix disk path
 * fix cdrom path and "format"

Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
[drop Qubes-specific parts]
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Allow setting stubdomain_ramdisk independently from stubdomain_kernel
Add a qemu- prefix for qemu-stubdom-linux-{kernel,rootfs} since stubdom
doesn't convey device-model.  Use qemu- since this code is qemu specific.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: Allow running qemu-xen in stubdomain
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:48 +0000 (21:54 -0400)]
libxl: Allow running qemu-xen in stubdomain

No longer prohibit using a stubdomain with qemu-xen.
To help distinguish MiniOS and Linux stubdomains, add helper inline
functions libxl__stubdomain_is_linux() and
libxl__stubdomain_is_linux_running(). Those should be used where the
difference really is about MiniOS/Linux, not qemu-xen/qemu-xen-traditional.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agolibxl: fix qemu-trad cmdline for no sdl/vnc case
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:47 +0000 (21:54 -0400)]
libxl: fix qemu-trad cmdline for no sdl/vnc case

When qemu is running in a stubdomain, any attempt to initialize vnc/sdl
there will crash it (on a failed attempt to load a keymap from a file). If
vfb is present, all those cases are skipped. But since
b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do not start dom0 qemu
for stubdomain when not needed" it is possible to create a stubdomain
without vfb and, contrary to the comment, -vnc none does trigger the VNC
initialization code (it just skips exposing it externally).
Change the implicit SDL avoiding method to the -nographics option, used
when neither SDL nor VNC is enabled.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
4 years agoDocument ioemu Linux stubdomain protocol
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:46 +0000 (21:54 -0400)]
Document ioemu Linux stubdomain protocol

Add documentation for upcoming Linux stubdomain for qemu-upstream.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoDocument ioemu MiniOS stubdomain protocol
Marek Marczykowski-Górecki [Tue, 19 May 2020 01:54:45 +0000 (21:54 -0400)]
Document ioemu MiniOS stubdomain protocol

Add documentation based on reverse-engineered toolstack-ioemu stubdomain
protocol.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotools: use HOSTCC/CPP to compile rombios code and helper
Olaf Hering [Mon, 18 May 2020 14:44:00 +0000 (16:44 +0200)]
tools: use HOSTCC/CPP to compile rombios code and helper

Also use HOSTCFLAGS for biossums while touching the code.

Spotted by inspecting build logfile.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86: determine MXCSR mask in all cases
Jan Beulich [Mon, 18 May 2020 15:18:56 +0000 (17:18 +0200)]
x86: determine MXCSR mask in all cases

For its use(s) by the emulator to be correct in all cases, the filling
of the variable needs to be independent of XSAVE availability. As
there's no suitable function in i387.c to put the logic in, keep it in
xstate_init(), arrange for the function to be called unconditionally,
and pull the logic ahead of all return paths there.

Fixes: 9a4496a35b20 ("x86emul: support {,V}{LD,ST}MXCSR")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/mem-paging: consistently use gfn_t
Jan Beulich [Mon, 18 May 2020 15:17:51 +0000 (17:17 +0200)]
x86/mem-paging: consistently use gfn_t

Where gprintk()s get touched anyway to switch to PRI_gfn, also switch to
%pd for the domain logged.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/mem-paging: move code to its dedicated source file
Jan Beulich [Mon, 18 May 2020 15:16:55 +0000 (17:16 +0200)]
x86/mem-paging: move code to its dedicated source file

Do a little bit of style adjustment along the way, and drop the
"p2m_mem_paging_" prefixes from the now static functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/mem-paging: use guest handle for XENMEM_paging_op_prep
Jan Beulich [Mon, 18 May 2020 15:15:46 +0000 (17:15 +0200)]
x86/mem-paging: use guest handle for XENMEM_paging_op_prep

While it should have been this way from the beginning, not doing so will
become an actual problem with PVH Dom0. The interface change is binary
compatible, but requires tools side producers to be re-built.

Drop the bogus/unnecessary page alignment restriction on the input
buffer at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agox86/mm: no-one passes a NULL domain to init_xen_l4_slots()
Jan Beulich [Mon, 18 May 2020 15:13:38 +0000 (17:13 +0200)]
x86/mm: no-one passes a NULL domain to init_xen_l4_slots()

Drop the NULL checks - they've been introduced by commit 8d7b633ada
("x86/mm: Consolidate all Xen L4 slot writing into
init_xen_l4_slots()") without giving a reason; I'm told this was done
in anticipation of the function potentially getting called with a NULL
argument down the road.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/hvm: Fix shifting in stdvga_mem_read()
Andrew Cooper [Sat, 16 May 2020 18:50:45 +0000 (19:50 +0100)]
x86/hvm: Fix shifting in stdvga_mem_read()

stdvga_mem_read() has a return type of uint8_t, which promotes to int rather
than unsigned int.  Shifting by 24 may hit the sign bit.
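
A small sketch of the promotion issue, assuming a 32-bit int; read_byte()
and read_dword() are hypothetical stand-ins, not the Xen functions. The
cast to uint32_t before the shift keeps the arithmetic unsigned, so a top
byte of 0x80 or above cannot overflow a signed int.

```c
#include <stdint.h>

/* Hypothetical byte-wide read returning a fixed pattern. */
uint8_t read_byte(unsigned int idx)
{
    static const uint8_t bytes[4] = { 0x11, 0x22, 0x33, 0x80 };

    return bytes[idx & 3];
}

/* Assemble four bytes, least significant first, into one 32-bit value. */
uint32_t read_dword(void)
{
    uint32_t data = 0;
    unsigned int i;

    for (i = 0; i < 4; i++) {
        /* Without the cast the uint8_t is promoted to int, and
         * 0x80 << 24 does not fit in a 32-bit signed int. */
        data |= (uint32_t)read_byte(i) << (8 * i);
    }

    return data;
}
```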

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/hvm: Fix memory leaks in hvm_copy_context_and_params()
Andrew Cooper [Sat, 16 May 2020 12:10:07 +0000 (13:10 +0100)]
x86/hvm: Fix memory leaks in hvm_copy_context_and_params()

Any error from hvm_save() or hvm_set_param() leaks the c.data allocation.
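
The general shape of such a fix can be sketched with a single exit path
that always frees the buffer. This is an illustration under assumptions,
not the actual Xen change: copy_context(), step() and live_allocations
are hypothetical names, and malloc()/free() stand in for the hypervisor
allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter of buffers that are allocated but not yet freed. */
int live_allocations;

/* Hypothetical step that can be told to fail. */
int step(int fail)
{
    return fail ? -EINVAL : 0;
}

int copy_context(int fail_first, int fail_second)
{
    int rc;
    void *data = malloc(64);

    if (!data)
        return -ENOMEM;
    live_allocations++;

    rc = step(fail_first);
    if (rc)
        goto out;       /* an early 'return rc' here would leak data */

    rc = step(fail_second);

 out:
    /* Single exit path: the buffer is released on success and on error. */
    free(data);
    live_allocations--;
    return rc;
}
```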

Spotted by Coverity.

Fixes: 353744830 "x86/hvm: introduce hvm_copy_context_and_params"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoCHANGELOG: add hypervisor framework and Hyper-V support
Wei Liu [Sat, 16 May 2020 11:54:38 +0000 (12:54 +0100)]
CHANGELOG: add hypervisor framework and Hyper-V support

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Paul Durrant <paul@xen.org>
4 years agoxen/sched: fix latent races accessing vcpu->dirty_cpu
Juergen Gross [Thu, 14 May 2020 15:36:14 +0000 (17:36 +0200)]
xen/sched: fix latent races accessing vcpu->dirty_cpu

The dirty_cpu field of struct vcpu denotes which cpu still holds data
of a vcpu. All accesses to this field should be atomic in case the
vcpu could just be running, as it is accessed without any lock held
in most cases. In particular, sync_local_execstate() and context_switch()
running concurrently for the same vcpu risk failing.

There are some instances where accesses are not atomically done, and
even worse where multiple accesses are done when a single one would
be mandated.
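
The "single access" point can be sketched with C11 atomics: read the field
once into a local and make every decision on that snapshot. This assumes
nothing about the real Xen helpers; vcpu_sketch, CPU_CLEAN_SKETCH and
state_held_elsewhere() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical marker meaning "no CPU holds state of this vcpu". */
#define CPU_CLEAN_SKETCH (~0u)

struct vcpu_sketch {
    _Atomic unsigned int dirty_cpu;
};

/* Does a CPU other than this_cpu still hold state of the vcpu? */
bool state_held_elsewhere(struct vcpu_sketch *v, unsigned int this_cpu)
{
    /* One atomic read; both comparisons below use the same snapshot, so
     * a concurrent update cannot make them disagree with each other. */
    unsigned int dirty = atomic_load_explicit(&v->dirty_cpu,
                                              memory_order_relaxed);

    return dirty != CPU_CLEAN_SKETCH && dirty != this_cpu;
}
```

Reading the field twice instead could see two different values if the
vcpu is being switched on another CPU in between.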

Correct that in order to avoid potential problems.

Add some assertions to verify dirty_cpu is handled properly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/sched: don't call sync_vcpu_execstate() in sched_unit_migrate_finish()
Juergen Gross [Thu, 14 May 2020 15:36:13 +0000 (17:36 +0200)]
xen/sched: don't call sync_vcpu_execstate() in sched_unit_migrate_finish()

With support of core scheduling sched_unit_migrate_finish() gained a
call of sync_vcpu_execstate() as it was believed to be called as a
result of vcpu migration in any case.

In case of migrating a vcpu away from a physical cpu for a short period
of time but without ever being scheduled on the selected new cpu, this
might not be true so drop the call and let the lazy state syncing do its
job.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
4 years agochangelog: add relevant changes during 4.14 development window
Roger Pau Monne [Mon, 11 May 2020 10:31:45 +0000 (12:31 +0200)]
changelog: add relevant changes during 4.14 development window

Add entries for the relevant changes I've been working on during the
4.14 development time frame. Mostly performance improvements related
to pvshim scalability issues when running with a high number of vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Paul Durrant <paul@xen.org>
4 years agoxen/build: use the correct kconfig makefile
Stewart Hildebrand [Fri, 15 May 2020 18:25:09 +0000 (14:25 -0400)]
xen/build: use the correct kconfig makefile

This resolves the following observed error during config merge:

  /bin/sh /path/to/xen/xen/../xen/tools/kconfig/merge_config.sh -m .config /path/to/xen/xen/../xen/arch/arm/configs/custom.config
  Using .config as base
  Merging /path/to/xen/xen/../xen/arch/arm/configs/custom.config
  #
  # merged configuration written to .config (needs make)
  #
  make -f /path/to/xen/xen/../xen/Makefile olddefconfig
  make[2]: Entering directory '/path/to/xen/xen'
  make[2]: *** No rule to make target 'olddefconfig'.  Stop.
  make[2]: Leaving directory '/path/to/xen/xen'
  tools/kconfig/Makefile:95: recipe for target 'custom.config' failed

The build was invoked by first doing a defconfig (which succeeded):

  $ make -C xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig

Followed by the config fragment merge command (which failed before this patch):

  $ cat > xen/arch/arm/configs/custom.config <<EOF
  CONFIG_DEBUG=y
  CONFIG_EARLY_PRINTK_ZYNQMP=y
  EOF
  $ make -C xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- custom.config

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
4 years agodomain_page: handle NULL within unmap_domain_page() itself
Hongyan Xia [Wed, 13 May 2020 15:43:33 +0000 (16:43 +0100)]
domain_page: handle NULL within unmap_domain_page() itself

The macro version UNMAP_DOMAIN_PAGE() does both NULL checking and
variable clearing. Move NULL checking into the function itself so that
the semantics are consistent with other similar constructs like XFREE().
This also eases the use of unmap_domain_page() in error handling paths,
where we only care about NULL checking but not about variable clearing.
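
A sketch of the resulting split of responsibilities, using hypothetical
names (unmap_page_sketch(), UNMAP_PAGE_SKETCH(), unmap_calls) rather than
the real Xen definitions: the function tolerates NULL on its own, and the
macro only adds clearing of the variable.

```c
#include <stddef.h>

/* Hypothetical counter of real unmap operations performed. */
int unmap_calls;

/* The function itself ignores NULL, like free() does. */
void unmap_page_sketch(const void *ptr)
{
    if (!ptr)
        return;
    unmap_calls++;      /* stands in for the actual unmapping work */
}

/* The macro variant additionally clears the caller's variable. */
#define UNMAP_PAGE_SKETCH(p) do { \
    unmap_page_sketch(p);         \
    (p) = NULL;                   \
} while (0)
```

An error path can then call the function unconditionally on pointers that
may never have been mapped.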

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/build: fixup path to merge_config.sh
Stewart Hildebrand [Tue, 12 May 2020 17:52:05 +0000 (13:52 -0400)]
xen/build: fixup path to merge_config.sh

This resolves the following observed error:

/bin/sh: /path/to/xen/xen/../xen/scripts/kconfig/merge_config.sh: No such file or directory

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
4 years agox86: retrieve and log CPU frequency information
Jan Beulich [Fri, 15 May 2020 14:16:29 +0000 (16:16 +0200)]
x86: retrieve and log CPU frequency information

While from just a single Skylake system it is already clear that we
can't base any of our logic on CPUID leaf 15 [1] (leaf 16 is
documented to be used for display purposes only anyway), logging this
information may still give us some reference in case of problems as well
as for future work. Additionally on the AMD side it is unclear whether
the deviation between reported and measured frequencies is because of us
not doing well, or because of nominal and actual frequencies being quite
far apart.

The chosen variable naming in amd_log_freq() has pointed out a naming
problem in rdmsr_safe(), which is being taken care of at the same time.
Symmetrically wrmsr_safe(), being an inline function, also gets an
unnecessary underscore dropped from one of its local variables.

[1] With a core crystal clock of 24MHz and a ratio of 216/2, the
    reported frequency nevertheless is 2600MHz, rather than the to be
    expected (and calibrated by both us and Linux) 2592MHz.
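
The arithmetic in [1] can be written out as below, assuming "leaf 15"
means CPUID leaf 0x15, which reports the ratio denominator in EAX, the
numerator in EBX and the core crystal clock in Hz in ECX.
tsc_hz_from_leaf15() is a hypothetical helper, not a Xen function.

```c
#include <stdint.h>

/* Frequency = crystal clock * numerator / denominator.
 * Returns 0 if any of the three values is not enumerated (zero). */
uint64_t tsc_hz_from_leaf15(uint32_t denominator, uint32_t numerator,
                            uint32_t crystal_hz)
{
    if (!denominator || !numerator || !crystal_hz)
        return 0;

    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (uint64_t)crystal_hz * numerator / denominator;
}
```

With a 24MHz crystal and a ratio of 216/2 this gives 2592000000 Hz, i.e.
the 2592MHz mentioned above.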

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86emul: support X{SUS,RES}LDTRK
Jan Beulich [Fri, 15 May 2020 14:13:03 +0000 (16:13 +0200)]
x86emul: support X{SUS,RES}LDTRK

There's nothing to be done by the emulator, as we unconditionally abort
any XBEGIN.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86emul: support SERIALIZE
Jan Beulich [Fri, 15 May 2020 14:09:22 +0000 (16:09 +0200)]
x86emul: support SERIALIZE

... enabling its use by all guest kinds at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agosched: allow rcu work to happen when syncing cpus in core scheduling
Juergen Gross [Fri, 15 May 2020 14:04:00 +0000 (16:04 +0200)]
sched: allow rcu work to happen when syncing cpus in core scheduling

With RCU barriers moved from tasklets to normal RCU processing, cpu
offlining in core scheduling might deadlock due to cpu synchronization
required by RCU processing and core scheduling concurrently.

Fix that by bailing out from core scheduling synchronization in case
of pending RCU work. Additionally the RCU softirq is now required to
be of higher priority than the scheduling softirqs in order to do
RCU processing before entering the scheduler again, as bailing out from
the core scheduling synchronization requires raising another softirq,
SCHED_SLAVE, which would bypass RCU processing again.

Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
4 years agox86/mem-paging: add minimal lock order enforcement to p2m_mem_paging_prep()
Jan Beulich [Fri, 15 May 2020 14:02:39 +0000 (16:02 +0200)]
x86/mem-paging: add minimal lock order enforcement to p2m_mem_paging_prep()

While full checking is impossible (as the lock is being acquired/
released down the call tree), perform at least a lock level check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/mem-paging: correct p2m_mem_paging_prep()'s error handling
Jan Beulich [Fri, 15 May 2020 14:01:06 +0000 (16:01 +0200)]
x86/mem-paging: correct p2m_mem_paging_prep()'s error handling

Communicating errors from p2m_set_entry() to the caller is not enough:
Neither the M2P nor the stats updates should occur in such a case.
Instead the allocated page needs to be freed again; for cleanliness
reasons also properly take into account _PGC_allocated there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/mem-paging: fold p2m_mem_paging_prep()'s main if()-s
Jan Beulich [Fri, 15 May 2020 13:57:56 +0000 (15:57 +0200)]
x86/mem-paging: fold p2m_mem_paging_prep()'s main if()-s

The condition of the second can be true only if the condition of the
first was met; the second half of the condition of the second then also
is redundant with an earlier check. Combine them, drop a pointless
local variable, and take the liberty to drop the affected gdprintk()
altogether, as we don't normally log anything on -EFAULT paths.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/gen-cpuid: Distinguish default vs max in feature annotations
Andrew Cooper [Tue, 25 Feb 2020 15:33:31 +0000 (15:33 +0000)]
x86/gen-cpuid: Distinguish default vs max in feature annotations

Allow lowercase a/s/h to be used to annotate a non-default feature.

However, until the toolstack migration logic is fixed, it is not safe to
activate yet.  Tolerate the annotations, but ignore them for now.
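
A minimal sketch of "tolerate, but ignore" in Python follows. The helper name
and the idea of passing the annotation as a plain string of letters are
assumptions for illustration; this is not the actual gen-cpuid.py code.

```python
def default_annotations(annotation):
    """Return the set of annotation letters that take effect.

    Uppercase A/S/H are honoured. Lowercase a/s/h (non-default features)
    are accepted without error but have no effect for now.
    """
    effective = set()
    for ch in annotation:
        if ch in "ASH":
            effective.add(ch)
        elif ch in "ash":
            # Tolerated, but deliberately ignored until it is safe to activate.
            continue
        else:
            raise ValueError(f"unknown annotation character {ch!r}")
    return effective
```

With this shape, switching the lowercase letters on later would only require
changing the one branch that currently skips them.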

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/APIC: restrict certain messages to BSP
Jan Beulich [Thu, 14 May 2020 13:04:32 +0000 (15:04 +0200)]
x86/APIC: restrict certain messages to BSP

All CPUs get an equal setting of EOI broadcast suppression; no need to
log one message per CPU, even if it's only in verbose APIC mode.

Only the BSP is eligible to possibly get ExtINT enabled; no need to log
that it gets disabled on all APs, even if - again - it's only in verbose
APIC mode.

Take the opportunity to introduce a "bsp" parameter to the function, to
stop using smp_processor_id() to tell BSP from APs. No functional change
from this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago xen: Allow EXPERT mode to be selected from the menuconfig directly
Julien Grall [Thu, 30 Apr 2020 14:25:48 +0000 (15:25 +0100)]
xen: Allow EXPERT mode to be selected from the menuconfig directly

EXPERT mode is currently used to gate any options that are in technical
preview or not security supported. At the moment, this is selected by
adding XEN_CONFIG_EXPERT=y on the make command line, or to the
(currently undocumented) top-level .config file.

This makes the option very unintuitive to use: If the user forgets to
add the option when (re)building or when using menuconfig, then
xen/.config will be silently rewritten, leading to behavior which is
very difficult to diagnose.  Adding XEN_CONFIG_EXPERT=y to the
top-level .config is not obvious behavior, particularly as the file is
undocumented.

A lot of the options behind EXPERT would benefit from being more
accessible so users can experiment with them and voice any concerns
before they are fully supported.

To make this option more discoverable and consistent to use, make it
possible to select it from the menuconfig.

This doesn't change the fact that a Xen with EXPERT mode selected will
not be security supported.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years ago xen/Kconfig: define EXPERT a bool rather than a string
Julien Grall [Thu, 30 Apr 2020 14:25:47 +0000 (15:25 +0100)]
xen/Kconfig: define EXPERT a bool rather than a string

Since commit f80fe2b34f08 "xen: Update Kconfig to Linux v5.4" EXPERT
can only have two values (enabled or disabled). So switch from a string
to a bool.

Take the opportunity to replace all "EXPERT = y" with "EXPERT" and
squash the bool and prompt lines together in the modified places.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years ago tools/xenstore: don't store domU's mfn of ring page in xenstored
Juergen Gross [Thu, 30 Apr 2020 05:38:42 +0000 (07:38 +0200)]
tools/xenstore: don't store domU's mfn of ring page in xenstored

The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.

The gfn is not really needed. It is stored in the per-domain struct
in xenstored, and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match, the
command is aborted via EINVAL; otherwise the event channel to the
domain is recreated.

As XS_INTRODUCE is limited to dom0 and there is no real downside to
recreating the event channel, just omit the test for the gfn to
match and don't return EINVAL for multiple XS_INTRODUCE calls.
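
The resulting behaviour can be modelled in a few lines of Python. ToyXenstored
and its fields are hypothetical names chosen for this sketch; the real daemon
is written in C and differs in detail.

```python
class ToyXenstored:
    """Hypothetical model of per-domain state kept by the daemon."""

    def __init__(self):
        self.domains = {}  # domid -> {"evtchn": int, "rebinds": int}

    def introduce(self, domid, gfn, evtchn):
        """Handle an XS_INTRODUCE-like request; `gfn` is accepted, not stored."""
        dom = self.domains.get(domid)
        if dom is None:
            self.domains[domid] = {"evtchn": evtchn, "rebinds": 0}
        else:
            # Repeated introduce: recreate the event channel, no gfn comparison.
            dom["evtchn"] = evtchn
            dom["rebinds"] += 1
        return 0
```

A second call for the same domain with a different gfn now succeeds and
simply rebinds the event channel.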

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago x86/PVH: PHYSDEVOP_pci_mmcfg_reserved should not blindly register a region
Jan Beulich [Thu, 14 May 2020 05:53:55 +0000 (07:53 +0200)]
x86/PVH: PHYSDEVOP_pci_mmcfg_reserved should not blindly register a region

The op has an "is reserved" flag, and hence registration shouldn't
happen unilaterally.

Fixes: eb3dd90e4089 ("x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago x86/build: Unilaterally disable -fcf-protection
Andrew Cooper [Tue, 12 May 2020 18:18:43 +0000 (19:18 +0100)]
x86/build: Unilaterally disable -fcf-protection

Xen doesn't support CET-IBT yet.  At a minimum, logic is required to enable it
for supervisor use, and the livepatch functionality needs to learn not to
overwrite ENDBR64 instructions.

Furthermore, Ubuntu enables -fcf-protection by default, along with a buggy
version of GCC-9 which objects to it in combination with
-mindirect-branch=thunk-extern (Fixed in GCC 10, 9.4).

Various objects (Xen boot path, Rombios 32 stubs) require .text to be at the
beginning of the object.  These paths explode when .note.gnu.property gets
put ahead of .text and we end up executing the notes data.

Disable -fcf-protection for all embedded objects.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/build: move -fno-asynchronous-unwind-tables into EMBEDDED_EXTRA_CFLAGS
Andrew Cooper [Wed, 13 May 2020 12:06:28 +0000 (13:06 +0100)]
x86/build: move -fno-asynchronous-unwind-tables into EMBEDDED_EXTRA_CFLAGS

Users of EMBEDDED_EXTRA_CFLAGS already use -fno-asynchronous-unwind-tables, or
ought to.  This shrinks the size of the rombios 32bit stubs in guest memory.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/build32: Discard all orphaned sections
Andrew Cooper [Tue, 12 May 2020 18:18:37 +0000 (19:18 +0100)]
x86/build32: Discard all orphaned sections

Linkers may put orphaned sections ahead of .text, which breaks the calling
requirements.  A concrete example is Ubuntu's GCC-9 default of enabling
-fcf-protection which causes us to try and execute .note.gnu.property during
Xen's boot.

Put .got.plt in its own section as it specifically needs preserving from the
linker's point of view, and discard everything else.  This will hopefully be
more robust to other unexpected toolchain properties.

Fixes boot from an Ubuntu build of Xen.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/guest: Fix assembler warnings with newer binutils
Andrew Cooper [Tue, 12 May 2020 16:21:33 +0000 (17:21 +0100)]
x86/guest: Fix assembler warnings with newer binutils

GAS of at least version 2.34 complains:

  hypercall_page.S: Assembler messages:
  hypercall_page.S:24: Warning: symbol 'HYPERCALL_set_trap_table' already has its type set
  ...
  hypercall_page.S:71: Warning: symbol 'HYPERCALL_arch_7' already has its type set

which is because the whole page is declared as STT_OBJECT already.  Rearrange
.set with respect to .type in DECLARE_HYPERCALL() so STT_FUNC is already in
place.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago stubdom: Use matching quotes in error message
Andrew Cooper [Wed, 13 May 2020 12:07:53 +0000 (13:07 +0100)]
stubdom: Use matching quotes in error message

This prevents syntax highlighting from believing the rest of the file is a
string.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years ago tools/libxc: Reduce feature handling complexity in xc_cpuid_apply_policy()
Andrew Cooper [Mon, 2 Mar 2020 14:36:03 +0000 (14:36 +0000)]
tools/libxc: Reduce feature handling complexity in xc_cpuid_apply_policy()

xc_cpuid_apply_policy() is gaining extra parameters to untangle CPUID
complexity in Xen.  While an improvement in general, it does have the
unfortunate side effect of duplicating some settings across multiple
parameters.

Rearrange the logic to only consider 'pae' if no explicit featureset is
provided.  This reduces the complexity for callers who have already provided a
pae setting in the featureset.
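
The precedence rule can be sketched as follows in Python. The helper name and
the modelling of a featureset as a set of feature names are assumptions made
for illustration; the real function is C code with a different signature.

```python
def resolve_pae(featureset, pae):
    """Decide whether PAE is visible (hypothetical helper).

    featureset: set of feature names, or None if the caller gave none.
    pae: the separate boolean setting, consulted only without a featureset.
    """
    if featureset is not None:
        # An explicit featureset is authoritative; the 'pae' argument is ignored.
        return "pae" in featureset
    return bool(pae)
```

Callers that already encode their choice in the featureset therefore no
longer need to keep a second, matching 'pae' argument in sync.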

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amzn.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago golang/xenlight: add necessary module/package documentation
Nick Rosbrook [Wed, 13 May 2020 14:18:19 +0000 (10:18 -0400)]
golang/xenlight: add necessary module/package documentation

Add a README and package comment giving a brief overview of the package.
These also help pkg.go.dev generate better documentation.

Also, add a copy of the LGPL (the same license used by libxl) to
tools/golang/xenlight. This is required for the package to be shown
on pkg.go.dev and added to the default module proxy, proxy.golang.org.

Finally, add an entry for the xenlight package to SUPPORT.md.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
4 years ago golang/xenlight: init xenlight go module
Nick Rosbrook [Wed, 13 May 2020 00:58:06 +0000 (20:58 -0400)]
golang/xenlight: init xenlight go module

Initialize the xenlight Go module using the xenbits git-http URL,
xenbits.xenproject.org/git-http/xen.git/tools/golang/xenlight.

Also simplify the build Make target by using `go build` instead of `go
install`, and do not set GOPATH here because it is now unnecessary.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>