xenbits.xensource.com Git - xen.git/log
Yi Sun [Wed, 11 Oct 2017 12:51:45 +0000 (14:51 +0200)]
x86: psr: support co-exist features' values setting

The whole value array is transferred into 'do_write_psr_msrs'. Then, we can
write the values of all features on the COS ID into the MSRs.

Because multiple features may co-exist, we need to handle all features and
write their values into a COS register with the new COS ID. E.g.:
1. L3 CAT and L2 CAT co-exist.
2. Dom1 and Dom2 share the same COS ID (2). The L3 CAT CBM of Dom1 is 0x1ff,
   the L2 CAT CBM of Dom1 is 0x1f.
3. User wants to change L2 CBM of Dom1 to be 0xf. Because COS ID 2 is
   used by Dom2 too, we have to pick a new COS ID 3. The values of Dom1 on
   COS ID 3 are all default values as below:
           ---------
           | COS 3 |
           ---------
   L3 CAT  | 0x7ff |
           ---------
   L2 CAT  | 0xff  |
           ---------
4. After setting, the L3 CAT CBM value of Dom1 should be kept and the new L2
   CAT CBM should be set. So, the values on COS ID 3 should be as below:
           ---------
           | COS 3 |
           ---------
   L3 CAT  | 0x1ff |
           ---------
   L2 CAT  | 0xf   |
           ---------
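The merge described in steps 3 and 4 can be sketched in C as below. This is a minimal illustration, not Xen's psr.c: the table `cos_vals`, the feature enum and both helper functions are hypothetical stand-ins for the real per-socket feature state and 'do_write_psr_msrs'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature indices and COS count; not Xen's psr.c layout. */
enum { FEAT_L3_CAT, FEAT_L2_CAT, NR_FEATS };
#define NR_COS 4

/* cos_vals[f][cos]: the CBM currently programmed for feature f on COS ID cos. */
static uint32_t cos_vals[NR_FEATS][NR_COS];

/*
 * Build the complete value array for a domain moving off old_cos:
 * start from every feature's value on the old COS ID, then apply the
 * single change the user asked for.
 */
static void build_new_cos_values(unsigned int old_cos, unsigned int feat,
                                 uint32_t new_val, uint32_t vals[NR_FEATS])
{
    for ( unsigned int f = 0; f < NR_FEATS; f++ )
        vals[f] = cos_vals[f][old_cos];
    vals[feat] = new_val;
}

/* Write the whole array for new_cos, standing in for the per-feature MSR writes. */
static void write_cos_values(unsigned int new_cos, const uint32_t vals[NR_FEATS])
{
    assert(new_cos < NR_COS);
    for ( unsigned int f = 0; f < NR_FEATS; f++ )
        cos_vals[f][new_cos] = vals[f];
}
```

Seeding COS 2 with Dom1's values (0x1ff, 0x1f) and requesting an L2 CBM of 0xf yields 0x1ff and 0xf on COS 3, matching the second table, while COS 2 stays intact for Dom2.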

Note that the original -ENOSPC return, which is being transformed into
an ASSERT(), could have been an ASSERT() from the beginning.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 11 Oct 2017 12:50:33 +0000 (14:50 +0200)]
x86emul: handle address wrapping for VMASKMOVP{S,D}

I failed to recognize the need to mirror the changes done by 7869e2bafe
("x86emul/fuzz: add rudimentary limit checking") into the earlier
written but later committed 2fe43d333f ("x86emul: support remaining AVX
insns"): Behavior here is the same as for multi-part reads or writes.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Wed, 11 Oct 2017 12:49:55 +0000 (14:49 +0200)]
x86/vmx: remove unnecessary is_hvm_domain() test in construct_vmcs()

It's a leftover from PVHv1 days.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Oct 2017 12:48:50 +0000 (14:48 +0200)]
x86/hvm: implement hvmemul_write() using real mappings

An access which crosses a page boundary is performed atomically by x86
hardware, albeit with a severe performance penalty.  An important corner case
is when a straddled access hits two pages which differ in whether a
translation exists, or in net access rights.

The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
a translation then completes the partial write, before moving onto the next
translation.

If an individual emulated write straddles two pages, the first of which is
writable, and the second of which is not, the first half of the write will
complete before #PF is raised from the second half.

This results in guest state corruption as a side effect of emulation, which
has been observed to cause Windows to crash while under introspection.

Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate the
entire contents of a linear access, and vmap() the underlying frames to
provide a contiguous virtual mapping for the emulator to use.  This is the
same mechanism as used by the shadow emulation code.

This will catch any translation issues and abort the emulation before any
modifications occur.
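The difference between the two strategies can be shown with a toy address space. In this hedged sketch, `struct page`, the 16-byte page size and both write functions are hypothetical; the real helpers translate guest linear addresses and vmap() the frames rather than copying page by page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy guest: 4 pages of 16 bytes, each writable or not. */
#define PG_SIZE  16
#define NR_PAGES 4

struct page {
    bool writable;
    unsigned char data[PG_SIZE];
};

static struct page guest[NR_PAGES];

/* Old style: translate and write one page at a time. */
static bool write_piecewise(size_t addr, const void *buf, size_t len)
{
    const unsigned char *src = buf;

    while ( len )
    {
        size_t pg = addr / PG_SIZE, off = addr % PG_SIZE;
        size_t chunk = PG_SIZE - off < len ? PG_SIZE - off : len;

        if ( pg >= NR_PAGES || !guest[pg].writable )
            return false;  /* fault, but earlier chunks were already written */
        memcpy(&guest[pg].data[off], src, chunk);
        addr += chunk;
        src += chunk;
        len -= chunk;
    }
    return true;
}

/* New style: check every page the access touches before writing anything. */
static bool write_all_or_nothing(size_t addr, const void *buf, size_t len)
{
    if ( len == 0 )
        return true;

    for ( size_t pg = addr / PG_SIZE; pg <= (addr + len - 1) / PG_SIZE; pg++ )
        if ( pg >= NR_PAGES || !guest[pg].writable )
            return false;  /* abort with guest memory untouched */

    /* Every page is known good, so this can no longer fail part-way. */
    return write_piecewise(addr, buf, len);
}
```

With page 0 writable and page 1 read-only, an 8-byte write at offset 12 leaves page 0 untouched in the validate-first version, whereas the piecewise version has already modified bytes 12 to 15 before it fails.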

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 10:52:32 +0000 (11:52 +0100)]
DEBUG PRINTFS

Ian Jackson [Fri, 6 Oct 2017 14:30:25 +0000 (15:30 +0100)]
xl: Document VGA problems arising from lack of physmap dmop

Ross reports that stdvga guests do not work, and cirrus guests are
slow, because qemu tries to do xc_domain_add_to_physmap.  We will need
another dmop to fix this properly.

For now, document the problem.

(In the cirrus case, the vram remains mapped at the old guest-physical
addresses, while the guest runs.  We are not sure whether this is a
correctness or security problem and we should advise against it.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <Paul.Durrant@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 14 Sep 2017 17:12:57 +0000 (18:12 +0100)]
tools: xentoolcore_restrict_all: use domid_t

This necessitates adding $(CFLAGS_xeninclude) to all the depending
libraries (which can be done via Rules.mk), so that the definition of
domid_t (in xen.h) can be found.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 17:37:19 +0000 (18:37 +0100)]
libxl: dm_restrict: Support uid range user

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 17:35:44 +0000 (18:35 +0100)]
libxl: userlookup_helper_getpwnam rename and turn into a macro

We are going to want versions of getpwuid, too.  And maybe in the
future getgr*.

This is most sanely achieved with a macro, as otherwise the types are
a mess.
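A sketch of why a macro keeps the types sane: one definition stamps out a helper per getpw*_r() variant, each with the correct key type and no void * casts. The macro name `DEFINE_USERLOOKUP_HELPER` and the helper signature are assumptions for illustration, not libxl's code; getpwnam_r() and getpwuid_r() are the standard POSIX functions. Because the fields of the returned struct passwd point into the buffer, the caller frees *buf_r only when done with them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical generator. Each helper returns 0 or an errno value; on a
 * match *pwd_r is non-NULL and *buf_r must be freed by the caller, otherwise
 * both are NULL. ERANGE from the libc call means "buffer too small": retry.
 */
#define DEFINE_USERLOOKUP_HELPER(NAME, SPEC_TYPE, LOOKUP_FN)              \
    static int userlookup_helper_##NAME(SPEC_TYPE spec,                   \
                                        struct passwd *resultbuf,         \
                                        char **buf_r,                     \
                                        struct passwd **pwd_r)            \
    {                                                                     \
        size_t buflen = 1024;                                             \
        char *buf = NULL;                                                 \
        int r;                                                            \
                                                                          \
        *buf_r = NULL;                                                    \
        *pwd_r = NULL;                                                    \
        for ( ;; ) {                                                      \
            char *nbuf = realloc(buf, buflen);                            \
            if ( !nbuf ) { r = ENOMEM; break; }                           \
            buf = nbuf;                                                   \
            r = LOOKUP_FN(spec, resultbuf, buf, buflen, pwd_r);           \
            if ( r != ERANGE ) break;                                     \
            buflen *= 2;                                                  \
        }                                                                 \
        if ( r || !*pwd_r ) { free(buf); *pwd_r = NULL; return r; }       \
        *buf_r = buf;                                                     \
        return 0;                                                         \
    }

DEFINE_USERLOOKUP_HELPER(getpwnam, const char *, getpwnam_r)
DEFINE_USERLOOKUP_HELPER(getpwuid, uid_t, getpwuid_r)
```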

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 17:21:53 +0000 (18:21 +0100)]
libxl: libxl__dm_runas_helper: return pwd

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 15:55:54 +0000 (16:55 +0100)]
libxl: Rationalise calculation of user to run qemu as

If the config specifies a user we use that.  Otherwise:

When we are not restricting qemu, there is very little point running
it as a different user than root.  Indeed, previously, creating the
"magic" users would cause qemu to become slightly dysfunctional (for
example, you can't insert a cd that the qemu user can't read).
So, in that case, default to running it as root.

Conversely, if restriction is requested, we must insist on running
qemu as a non-root user.
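The resulting policy fits in a few lines. This is a minimal sketch under stated assumptions: the function name and the `default_nonroot_user` parameter are hypothetical, and the real libxl code additionally has to look the user up and report errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns the user qemu should run as, or NULL when
 * restriction was requested but no non-root user is available.
 */
static const char *dm_user_to_run_as(const char *config_user, bool dm_restrict,
                                      const char *default_nonroot_user)
{
    /* An explicitly configured user always wins. */
    if ( config_user )
        return config_user;

    /* Unrestricted: a separate user buys little, so keep qemu fully functional. */
    if ( !dm_restrict )
        return "root";

    /* Restricted: insist on a non-root user; NULL means "refuse to start". */
    return default_nonroot_user;
}
```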

Sadly the admin is still required to create 2^16-epsilon users!

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 15:55:06 +0000 (16:55 +0100)]
xl, libxl: Provide dm_restrict

This functionality is still quite imperfect, but it will be useful in
certain restricted use cases.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 13:51:58 +0000 (14:51 +0100)]
xentoolcore, _restrict_all: Document implementation "complete"

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 13:01:35 +0000 (14:01 +0100)]
xentoolcore_restrict_all: "Implement" for xenstore

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 12:44:50 +0000 (13:44 +0100)]
tools/xenstore: get_handle: Allocate struct before opening fd

Now we can also abolish the temporary local variable "fd" and simply
use h->fd.

This ordering is necessary to be able to call
xentoolcore__register_active_handle sensibly.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 12:42:38 +0000 (13:42 +0100)]
tools/xenstore: get_handle: use "goto err" error handling style

Replace the ad-hoc exit clauses with the error handling style where
  - local variables contain either things to be freed, or sentinels
  - all error exits go via an "err" label which frees everything
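A hedged sketch of a constructor written in this style, which also allocates the struct before opening the fd so that h->fd is used directly. `struct handle` and this `get_handle()` are hypothetical simplifications, not the xenstore code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, much-simplified handle. */
struct handle {
    int fd;
};

static struct handle *get_handle(const char *path)
{
    /* Sentinels: h == NULL means nothing allocated, h->fd < 0 nothing opened. */
    struct handle *h = malloc(sizeof(*h));

    if ( !h )
        goto err;

    /* The struct exists first, so the fd goes straight into h->fd. */
    h->fd = open(path, O_RDWR);
    if ( h->fd < 0 )
        goto err;

    if ( fcntl(h->fd, F_SETFD, FD_CLOEXEC) < 0 )
        goto err;

    return h;

 err:
    /* Single exit path: release whatever was acquired, in reverse order. */
    if ( h )
    {
        if ( h->fd >= 0 )
            close(h->fd);
        free(h);
    }
    return NULL;
}
```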

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 12:35:55 +0000 (13:35 +0100)]
xentoolcore_restrict_all: "Implement" for xengnttab

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 12:35:07 +0000 (13:35 +0100)]
xentoolcore_restrict_all: Declare problems due to no evtchn support

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 11:01:19 +0000 (12:01 +0100)]
xentoolcore_restrict_all: Implement for libxenforeignmemory

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 10:50:07 +0000 (11:50 +0100)]
xentoolcore_restrict: Break out xentoolcore__restrict_by_dup2_null

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 10:44:58 +0000 (11:44 +0100)]
xentoolcore_restrict_all: "Implement" for libxencall

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 10:28:54 +0000 (11:28 +0100)]
xentoolcore_restrict_all: Implement for libxendevicemodel

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 14 Sep 2017 17:05:49 +0000 (18:05 +0100)]
tools: move CONTAINER_OF to xentoolcore_internal.h

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 14 Sep 2017 17:02:44 +0000 (18:02 +0100)]
libxl: #include "xentoolcore_internal.h"

We are going to want to move something here.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 14:25:23 +0000 (15:25 +0100)]
tools: qemu-xen build: prepare to link against xentoolcore

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 9 Oct 2017 14:32:01 +0000 (15:32 +0100)]
xentoolcore: Link into minios (update MINIOS_UPSTREAM_REVISION)

We need to do this before we start to make the other libraries call
into xentoolcore, or we break building minios with the new xen.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 3 Oct 2017 18:45:52 +0000 (19:45 +0100)]
xentoolcore: Link into stubdoms

We need to do this before we start to make the other libraries call
into xentoolcore, or we break the stubdom build.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 14 Sep 2017 16:51:08 +0000 (17:51 +0100)]
xentoolcore, _restrict_all: Introduce new library and implementation

In practice, qemu opens a great many fds.  Tracking them all down and
playing whack-a-mole is unattractive.  It is also potentially fragile
in that future changes might accidentally undo our efforts.

Instead, we are going to teach all the Xen libraries how to register
their fds so that they can be neutered with one qemu call.
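A minimal sketch of such a registry, assuming hypothetical names (`struct active_handle`, `register_active_handle`, `restrict_all`) rather than libxentoolcore's real API, and a plain int where the real interface takes a domid_t. Each library registers a callback for its handle; one call walks the list, and a dup2()-onto-/dev/null callback keeps the fd number valid while cutting it off from Xen.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical registry entry; each library embeds one per open handle. */
struct active_handle {
    int (*restrict_cb)(struct active_handle *ah, int domid);
    struct active_handle *next;
    int fd;
};

static struct active_handle *active_handles;

static void register_active_handle(struct active_handle *ah)
{
    ah->next = active_handles;
    active_handles = ah;
}

/* dup2() /dev/null over the fd: the number stays valid but no longer reaches Xen. */
static int restrict_by_dup2_null(struct active_handle *ah, int domid)
{
    int nullfd, r;

    (void)domid;  /* this policy does not depend on the target domain */
    nullfd = open("/dev/null", O_RDONLY);
    if ( nullfd < 0 )
        return -1;
    r = dup2(nullfd, ah->fd);
    close(nullfd);
    return r < 0 ? -1 : 0;
}

/* One call neuters every registered handle. */
static int restrict_all(int domid)
{
    for ( struct active_handle *ah = active_handles; ah; ah = ah->next )
        if ( ah->restrict_cb(ah, domid) )
            return -1;
    return 0;
}
```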

Right now, nothing will go wrong if something tries to link without
-ltoolcore, but that will stop working as soon as the first other Xen
library starts to register.  So this patch will be followed by the
stubdom build update, and should be followed by a
MINIOS_UPSTREAM_REVISION update.

Sadly qemu upstream's configuration arrangements are too crude, being
keyed solely off the Xen version number.  So they cannot provide
forward/backward build compatibility across changes in xen-unstable,
like this one.  qemu patches to link against xentoolcore should be
applied in qemu upstream to avoid the qemu build breaking against the
released version of Xen 4.10.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 15 Sep 2017 16:21:14 +0000 (17:21 +0100)]
tools: libxendevicemodel: Provide xendevicemodel_shutdown

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 18 Sep 2017 13:55:45 +0000 (14:55 +0100)]
xen: x86 dm_op: add missing newline before XEN_DMOP_inject_msi

Coding style only; no functional change.

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Ian Jackson [Fri, 15 Sep 2017 16:16:37 +0000 (17:16 +0100)]
xen: Provide XEN_DMOP_remote_shutdown

SCHEDOP_remote_shutdown should be a DMOP so that a deprivileged qemu
can do the proper tidying up.

We need to keep SCHEDOP_remote_shutdown for ABI stability reasons and
because it is needed for PV guests.

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Meng Xu [Tue, 10 Oct 2017 23:17:45 +0000 (19:17 -0400)]
docs: enable per-VCPU extratime flag for RTDS

Revise xl tool use case by adding -e option
Remove work-conserving from TODO list

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
Meng Xu [Tue, 10 Oct 2017 23:17:43 +0000 (19:17 -0400)]
xl: enable per-VCPU extratime flag for RTDS

Change main_sched_rtds and related output functions to support
per-VCPU extratime flag.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
Meng Xu [Tue, 10 Oct 2017 23:17:42 +0000 (19:17 -0400)]
libxl: enable per-VCPU extratime flag for RTDS

Modify libxl_vcpu_sched_params_get/set and sched_rtds_vcpu_get/set
functions to support the per-VCPU extratime flag.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
Meng Xu [Tue, 10 Oct 2017 23:17:41 +0000 (19:17 -0400)]
xen:rtds: towards work conserving RTDS

Make RTDS scheduler work conserving without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended to have an extratime flag
and a priority_level field.
When a VCPU's budget is depleted in the current period,
if it has extratime flag set,
its priority_level will increase by 1 and its budget will be refilled;
otherwise, the VCPU will be moved to the depletedq.

The scheduling policy is a modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but a smaller deadline.

Queue management:
Run queue holds VCPUs with extratime flag set and VCPUs with
remaining budget. Run queue is sorted in increasing order of VCPUs priorities.
Depleted queue holds VCPUs which have extratime flag cleared and depleted budget.
Replenished queue is not modified.

Distribution of spare bandwidth:
Spare bandwidth is distributed among all VCPUs with the extratime flag set,
proportional to these VCPUs' utilizations.
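The VCPU model and priority rule above can be sketched as follows. `struct rt_vcpu` and both functions are simplified, hypothetical stand-ins for the RTDS scheduler's internals; queue insertion, replenishment and any reset of priority_level at the next period are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified VCPU state; not Xen's sched_rt.c structures. */
struct rt_vcpu {
    int64_t deadline;
    int64_t budget;              /* configured budget per period */
    int64_t cur_budget;          /* remaining budget in this period */
    unsigned int priority_level;
    bool extratime;
    bool depleted;               /* true: parked on the depletedq */
};

/* Modified global EDF: smaller priority_level first, then earlier deadline. */
static bool rt_higher_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
{
    if ( v1->priority_level != v2->priority_level )
        return v1->priority_level < v2->priority_level;
    return v1->deadline < v2->deadline;
}

/* Called when v has used up its budget within the current period. */
static void rt_budget_depleted(struct rt_vcpu *v)
{
    if ( v->extratime )
    {
        v->priority_level++;         /* lower priority, but still runnable */
        v->cur_budget = v->budget;   /* refill and stay on the run queue */
    }
    else
        v->depleted = true;          /* wait on the depletedq */
}
```

In this sketch a VCPU running on extra time always loses to any VCPU still inside its reserved budget, regardless of deadlines, which is what keeps the real-time guarantees intact.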

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <raistlin@linux.it>
Andrew Cooper [Fri, 6 Oct 2017 19:00:00 +0000 (20:00 +0100)]
tools/libxc: Fix domid parameter types

Mixed throughout libxc are uint32_t, int, and domid_t for domid parameters.
With a signed type, and an explicitly 16-bit type, it is exceedingly difficult
to construct an INVALID_DOMID constant which works with all of them.  (The
main problem being that domid_t gets unconditionally zero extended when
promoted to int for arithmetic.)

Libxl uses uint32_t consistently everywhere, so alter libxc to match.
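The zero-extension pitfall can be demonstrated directly. `domid_t` below mirrors the 16-bit type from Xen's public headers; `EXAMPLE_INVALID_DOMID` is a hypothetical all-ones sentinel, not a constant defined by libxc or libxl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;            /* as in Xen's public xen.h */
#define EXAMPLE_INVALID_DOMID (~0U)  /* hypothetical sentinel */

static bool is_invalid_domid_t(domid_t d)
{
    /* d is zero-extended to int (65535), which never equals 0xffffffff. */
    return d == EXAMPLE_INVALID_DOMID;
}

static bool is_invalid_u32(uint32_t d)
{
    /* Same width as the sentinel, so the round trip is exact. */
    return d == EXAMPLE_INVALID_DOMID;
}
```

Passing the sentinel through a domid_t parameter silently turns it into 0xffff, so the check fails; with uint32_t everywhere, as libxl already does, one constant works for every caller.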

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
[ wei: fix compilation error in libxl ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Andre Przywara [Sat, 7 Oct 2017 00:06:40 +0000 (01:06 +0100)]
ARM: sunxi: support more Allwinner SoCs

So far we only supported the Allwinner A20 SoC. Add support for most
of the other virtualization capable Allwinner SoCs by:
- supporting the watchdog in newer (sun8i) SoCs
- getting the watchdog address from DT
- adding compatible strings for other 32-bit SoCs
- adding compatible strings for 64-bit SoCs

As all 64-bit SoCs support system reset via PSCI, we don't use the
platform specific reset routine there. Should the 32-bit SoCs start to
properly support the PSCI 0.2 SYSTEM_RESET call, we will use it for them
automatically, as we try PSCI first, then fall back to platform reset.
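The PSCI-first ordering can be sketched with function pointers. `struct reset_ops`, `machine_restart()` and the stubs are hypothetical; a successful reset on real hardware does not return, which the stubs model by returning true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset hooks: return true if the reset was carried out. */
struct reset_ops {
    bool (*psci_system_reset)(void);  /* false if firmware lacks SYSTEM_RESET */
    bool (*platform_reset)(void);     /* NULL when no platform routine is used */
};

enum reset_path { RESET_FAILED, RESET_VIA_PSCI, RESET_VIA_PLATFORM };

static enum reset_path machine_restart(const struct reset_ops *ops)
{
    /* Always try PSCI first ... */
    if ( ops->psci_system_reset && ops->psci_system_reset() )
        return RESET_VIA_PSCI;
    /* ... and only then fall back to the SoC-specific (e.g. watchdog) reset. */
    if ( ops->platform_reset && ops->platform_reset() )
        return RESET_VIA_PLATFORM;
    return RESET_FAILED;
}

/* Stubs standing in for firmware and the sunxi watchdog. */
static bool psci_works(void)       { return true; }
static bool psci_unsupported(void) { return false; }
static bool watchdog_reset(void)   { return true; }
```

Because PSCI is tried unconditionally, a 32-bit SoC whose firmware later gains SYSTEM_RESET switches over without any code change.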

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:41 +0000 (14:23 +0100)]
xen/arm: mm: Use memory flags for modify_xen_mappings rather than custom one

This will help to consolidate the page-table code and avoid different
paths depending on the action to perform.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Julien Grall [Mon, 9 Oct 2017 13:23:40 +0000 (14:23 +0100)]
xen/arm: mm: Handle permission flags when adding a new mapping

Currently, all the new mappings will be read-write non-executable. Allow the
caller to use other permissions.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:39 +0000 (14:23 +0100)]
xen/arm: mm: Embed permission in the flags

Currently, it is not possible to specify the permission of a new
mapping. It would be necessary to use the function modify_xen_mappings
with a different set of flags.

Introduce a couple of new flags for the permissions (Non-eXecutable,
Read-Only) and also provide definitions that combine the memory attribute
and permission for common combinations.

PAGE_HYPERVISOR is now an alias to PAGE_HYPERVISOR_RW (read-write,
non-executable mappings). This does not affect the current mappings using
PAGE_HYPERVISOR because Xen is currently forcing all mappings to be
non-executable by default (see mfn_to_xen_entry).
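A sketch of flags that carry both the attribute index and the permissions. The bit positions, the `PAGE_RO_FLAG`/`PAGE_NX_FLAG` names and the `MT_NORMAL` value are illustrative assumptions; Xen's ARM page.h header defines the actual layout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag layout (bit positions are illustrative only):
 *   bits [2:0]  memory attribute index (AI) into MAIR
 *   bit  3      read-only
 *   bit  4      non-executable
 */
#define PAGE_AI_MASK(x)  ((x) & 0x7U)
#define PAGE_RO_FLAG     (1U << 3)
#define PAGE_NX_FLAG     (1U << 4)

#define MT_NORMAL        0x7U  /* illustrative attribute index */

/* Common attribute + permission combinations. */
#define PAGE_HYPERVISOR_RW  (MT_NORMAL | PAGE_NX_FLAG)
#define PAGE_HYPERVISOR_RX  (MT_NORMAL | PAGE_RO_FLAG)
#define PAGE_HYPERVISOR_RO  (MT_NORMAL | PAGE_RO_FLAG | PAGE_NX_FLAG)
#define PAGE_HYPERVISOR     PAGE_HYPERVISOR_RW

static bool flags_writable(unsigned int flags)   { return !(flags & PAGE_RO_FLAG); }
static bool flags_executable(unsigned int flags) { return !(flags & PAGE_NX_FLAG); }
```

Encoding the permissions as "restriction" bits means a bare attribute index still decodes as read-write-execute, so the safe default (non-executable) has to be spelled out explicitly in PAGE_HYPERVISOR.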

A follow-up patch will change modify_xen_mappings to use the new flags.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:38 +0000 (14:23 +0100)]
xen/arm: page: Describe the layout of flags used to update page tables

Currently, the flags used to update page tables (i.e. PAGE_HYPERVISOR_*)
only contain the memory attribute index. Follow-up patches will add
more information to them, so document the current layout.

At the same time introduce PAGE_AI_MASK to get the memory attribute
index easily.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:37 +0000 (14:23 +0100)]
xen/arm: mm: Use PAGE_HYPERVISOR_* instead of MT_* when calling set_fixmap

At the moment, PAGE_HYPERVISOR_* and MT_* have exactly the same value.
In a follow-up patch the former will be extended to carry more
information.

It looks like the callers of set_fixmap are mixing both. Stay
consistent and only use PAGE_HYPERVISOR_*. This also matches the
behavior of create_xen_entries and would potentially allow sharing some
parts in the future.

Also rename the parameter 'attributes' to 'flags' so the interface is
clearer.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:36 +0000 (14:23 +0100)]
xen/arm: mm: Rename 'ai' into 'flags' in create_xen_entries

The parameter 'ai' is used either for attribute index or for
permissions. A follow-up patch will rework that parameter to carry more
information, so rename the parameter to 'flags'.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:35 +0000 (14:23 +0100)]
xen/arm: Switch to SYS_STATE_boot just after end_boot_allocator()

We should consider the early boot period to end when we stop using the
boot allocator. This is in line with x86 and will be helpful to know
whether we should allocate memory from the boot allocator or xenheap.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:34 +0000 (14:23 +0100)]
xen/arm: mm: Rename and clarify AP[1] in the stage-1 page table

The description of AP[1] in Xen is based on testing rather than the ARM
ARM.

Per the ARM ARM, in the EL2 stage-1 page table, AP[1] is RES1 as the
translation regime applies to only one exception level (see D4.4.4 and
G4.6.1 in ARM DDI 0487B.a).

Update the comment and also rename the field to match the description in
the ARM ARM.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:33 +0000 (14:23 +0100)]
xen/arm: page: Clean-up the definition of MAIRVAL

Currently MAIRVAL is defined in terms of MAIR0VAL and MAIR1VAL, which are
both hardcoded values. This makes it quite difficult to understand the
values written in both registers.

Rework the definition by using the value of each attribute shifted by its
associated index.
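A sketch of the reworked style: each 8-bit attribute encoding is shifted into the byte selected by its index. The encodings (0x00, 0x44, 0xbb, 0xff) are the architectural MAIR values for Device-nGnRnE, Normal non-cacheable, Normal write-through and Normal write-back; the index assignment is an illustrative assumption and may not match Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute encodings as defined for the MAIR registers in the ARM ARM. */
#define MAIR_DEVICE_nGnRnE  0x00ULL
#define MAIR_NORMAL_NC      0x44ULL
#define MAIR_NORMAL_WT      0xbbULL
#define MAIR_NORMAL_WB      0xffULL

/* Illustrative attribute indices (the AI field in each page-table entry). */
#define MT_DEVICE_nGnRnE  0
#define MT_NORMAL_NC      1
#define MT_NORMAL_WT      2
#define MT_NORMAL         7

/* Each 8-bit attribute lives at byte position 'idx' of the 64-bit MAIR. */
#define MAIR(attr, idx)   ((uint64_t)(attr) << ((idx) * 8))

#define MAIRVAL (MAIR(MAIR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) | \
                 MAIR(MAIR_NORMAL_NC,     MT_NORMAL_NC)     | \
                 MAIR(MAIR_NORMAL_WT,     MT_NORMAL_WT)     | \
                 MAIR(MAIR_NORMAL_WB,     MT_NORMAL))

/* On 32-bit ARM, MAIR0 holds attributes 0-3 and MAIR1 holds attributes 4-7. */
#define MAIR0VAL ((uint32_t)(MAIRVAL & 0xffffffffULL))
#define MAIR1VAL ((uint32_t)(MAIRVAL >> 32))
```

With these indices the combined value is 0xff00000000bb4400, which splits into MAIR0 = 0x00bb4400 and MAIR1 = 0xff000000; each byte can be traced back to a named attribute instead of a magic constant.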

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Mon, 9 Oct 2017 13:23:32 +0000 (14:23 +0100)]
xen/arm: page: Use ARMv8 naming to improve readability

This is based on the Linux ARMv8 naming scheme (see arch/arm64/mm/proc.S). Each
type will contain "NORMAL" or "DEVICE" to make clear whether each attribute
targets device or normal memory.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Stefano Stabellini [Tue, 10 Oct 2017 20:14:19 +0000 (13:14 -0700)]
ARM: replace ACPI_MEMCPY with memcpy

ACPI_MEMCPY is defined as memcpy. The macro is for the benefit of
drivers/acpi and shouldn't be used elsewhere.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Manish Jaggi [Tue, 10 Oct 2017 12:52:32 +0000 (18:22 +0530)]
ARM: ITS: Expose ITS in the MADT table

Add gicv3_its_make_hwdom_madt to update hwdom MADT ITS information.

Signed-off-by: Manish Jaggi <mjaggi@cavium.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Manish Jaggi [Tue, 10 Oct 2017 12:52:31 +0000 (18:22 +0530)]
ARM: Update Formula to compute MADT size using new callbacks in gic_hw_operations

estimate_acpi_efi_size needs to be updated to provide the correct size of
the hardware domain's MADT, which now adds ITS information as well.

This patch updates the formula to compute extra MADT size, as per GICv2/3
by calling gic_get_hwdom_extra_madt_size.

Signed-off-by: Manish Jaggi <mjaggi@cavium.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Manish Jaggi [Tue, 10 Oct 2017 12:52:30 +0000 (18:22 +0530)]
ARM: ITS: Deny hardware domain access to ITS

This patch extends the gicv3_iomem_deny_access functionality by adding
support for ITS region as well. Add function gicv3_its_deny_access.

Signed-off-by: Manish Jaggi <mjaggi@cavium.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Manish Jaggi [Tue, 10 Oct 2017 12:52:29 +0000 (18:22 +0530)]
ARM: ITS: Populate host_its_list from ACPI MADT Table

Added gicv3_its_acpi_init to update host_its_list from MADT table.
For ACPI, host_its structure stores dt_node as NULL.

Signed-off-by: Manish Jaggi <mjaggi@cavium.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Manish Jaggi [Tue, 10 Oct 2017 12:52:28 +0000 (18:22 +0530)]
ARM: ITS: Introduce common function add_to_host_its_list

add_to_host_its_list will update the host_its_list. This common
function is to be invoked from gicv3_its_dt_init and gic_v3_its_acpi_probe.

Signed-off-by: Manish Jaggi <mjaggi@cavium.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Wei Liu [Tue, 10 Oct 2017 17:23:33 +0000 (18:23 +0100)]
travis: disable UBSAN

The stock compiler in travis doesn't support -fsanitize=undefined.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Andrew Cooper [Tue, 26 Sep 2017 16:08:33 +0000 (17:08 +0100)]
x86/vmx: Better description of CR4 settings outside of paged mode

This rearranges the logic to avoid the double !hvm_paging_enabled(v) check, but
is otherwise identical.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 27 Sep 2017 15:55:09 +0000 (15:55 +0000)]
x86/vmx: Don't self-recurse in vmx_update_guest_cr()

An update to CR4 following a CR0 update can be done easily by falling
through into the CR4 case.  This avoids unnecessary passes through
vmx_vmcs_{enter,exit}() and unnecessary stack usage (as the compiler
cannot optimise this use to a tailcall).
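The refactoring pattern looks roughly like this. The counters and the `vmcs_enter()`/`vmcs_exit()` stubs are hypothetical stand-ins for the VMCS accessors, and in this sketch a CR0 update always falls through to the CR4 update.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VMCS enter/exit pair and CR writes. */
static int vmcs_enters, cr0_writes, cr4_writes;
static void vmcs_enter(void) { vmcs_enters++; }
static void vmcs_exit(void)  { }

/*
 * Previously case 0 ended with a recursive update_guest_cr(4), costing a
 * second enter/exit pair and another stack frame. Now it falls through.
 */
static void update_guest_cr(unsigned int cr)
{
    vmcs_enter();

    switch ( cr )
    {
    case 0:
        cr0_writes++;
        /* A CR0 change also requires CR4 to be recalculated. */
        /* fallthrough */
    case 4:
        cr4_writes++;
        break;
    }

    vmcs_exit();
}
```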

No behavioural change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 27 Sep 2017 15:54:12 +0000 (15:54 +0000)]
x86/vmx: Misc cleanup to vmx_update_guest_cr()

 * Drop trailing whitespace
 * Fix indentation and newlines
 * Use bool where appropriate

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 27 Sep 2017 14:30:13 +0000 (14:30 +0000)]
x86: Make use of pagetable_get_mfn() where appropriate

... instead of the opencoded _mfn(pagetable_get_pfn(...)) construct.

Fix two overly long lines; no functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Wei Liu [Mon, 9 Oct 2017 13:54:58 +0000 (14:54 +0100)]
xen: hook up UBSAN with CONFIG_UBSAN

Make the following changes:

1. Introduce CONFIG_UBSAN and other auxiliary options.
2. Introduce a build system rune to filter objects.
3. Make ubsan.c build.

Currently only x86 is supported. All init.o's are filtered out because
of a limitation in the build system. There is no user of noubsan-y yet
but it is worth keeping to ease future development.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Oct 2017 18:07:52 +0000 (19:07 +0100)]
xen/ubsan: Implement __ubsan_handle_nonnull_arg()

This hook appears to be missing from the Linux ubsan implementation.  This patch
is a forward port of https://lkml.org/lkml/2014/10/20/182

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Oct 2017 18:07:51 +0000 (19:07 +0100)]
xen/ubsan: Import ubsan implementation from Linux 4.13

A future change will adjust it to compile in Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoexamples: add a PVH guest config file template
Roger Pau Monne [Fri, 6 Oct 2017 13:52:01 +0000 (14:52 +0100)]
examples: add a PVH guest config file template

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agoexamples: fix HVM config file example
Roger Pau Monne [Fri, 6 Oct 2017 13:52:00 +0000 (14:52 +0100)]
examples: fix HVM config file example

To use the new 'type' option.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxc: panic when trying to create a PVH guest without kernel support
Roger Pau Monne [Fri, 6 Oct 2017 13:51:59 +0000 (14:51 +0100)]
libxc: panic when trying to create a PVH guest without kernel support

Previously, when trying to boot a PV capable but not PVH capable kernel
inside a PVH container, xc_dom_guest_type would succeed and return a
PV guest type, which would lead to failures later on in the build
process.

Instead provide a clear error message when trying to create a PVH
guest using a kernel that doesn't support PVH.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agox86emul: re-order cases of main switch statement
Jan Beulich [Mon, 9 Oct 2017 14:27:33 +0000 (16:27 +0200)]
x86emul: re-order cases of main switch statement

Restore the intended numerical ordering, which has become "violated"
mostly by incremental additions where moving around bigger chunks did
not seem advisable. One exception though at the very top of the
switch(): Keeping the arithmetic ops together seems preferable over
entirely strict ordering.

Additionally move a few macro definitions before their first uses (the
placement is benign as long as those uses are themselves only macro
definitions, but that's going to change when those macros have helpers
broken out).

No (intended) functional change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: support remaining AVX insns
Jan Beulich [Mon, 9 Oct 2017 14:26:46 +0000 (16:26 +0200)]
x86emul: support remaining AVX insns

I.e. those not being equivalents of SSEn ones.

There's one necessary change to generic code: Faulting behavior of
VMASKMOVP{S,D} requires us to do partial reads/writes.
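To illustrate why masked loads need partial reads, here is a minimal toy model of a 4 x 32-bit VMASKMOVPS-style load. It is a sketch, not the emulator's code: `masked_load4()` and the `readable` bound (standing in for the accessible part of memory) are hypothetical. Only elements whose mask sign bit is set are read, so a fault can only come from an enabled element, and the destination is left untouched if one would fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model of a masked load of four 32-bit elements (VMASKMOVPS-like).
 * Element i is loaded only if the sign bit of mask[i] is set; masked-off
 * elements are zeroed and their memory is never accessed.  `readable` is a
 * hypothetical stand-in for how many bytes at `mem` can be accessed without
 * faulting.  Returns 0 on success, or -1 (leaving dst untouched) if any
 * enabled element lies outside the readable range.
 */
static int masked_load4(uint32_t dst[4], const unsigned char *mem,
                        size_t readable, const int32_t mask[4])
{
    unsigned int i;

    /* First pass: only enabled elements can cause a fault. */
    for (i = 0; i < 4; i++)
        if (mask[i] < 0 && (size_t)i * 4 + 4 > readable)
            return -1;

    /* Second pass: perform the partial read. */
    for (i = 0; i < 4; i++) {
        if (mask[i] < 0)
            memcpy(&dst[i], mem + (size_t)i * 4, sizeof(dst[i]));
        else
            dst[i] = 0;
    }
    return 0;
}
```

A single full-width read would fault on an out-of-range element even when that element is masked off, which the real instruction does not do.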

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxl: set default maptrack frames to 1024
Roger Pau Monne [Mon, 9 Oct 2017 13:30:07 +0000 (14:30 +0100)]
xl: set default maptrack frames to 1024

This is in line with the previous behavior; setting the number of
maptrack frames to 0 prevents driver domains from working
correctly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl: set the default grant/maptrack frames at structure init
Roger Pau Monne [Mon, 9 Oct 2017 13:30:06 +0000 (14:30 +0100)]
libxl: set the default grant/maptrack frames at structure init

libxl_domain_build_info had both the maptrack and grant frames set to
0 by default, forcing the client of libxl to set a sane default.

This is not backwards compatible, so instead initialize both
max_grant_frames and max_maptrack_frames to a sane default (i.e. matching
the previous behavior).
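The idea of setting defaults at structure-init time, rather than leaving zeros for every client to fix up, can be sketched as follows. The struct and function names are hypothetical simplifications of libxl_domain_build_info and its init routine; the 1024 maptrack default matches the xl change in this log, while the grant-frame default of 32 is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical defaults (32 is assumed; 1024 follows the xl commit). */
#define DEFAULT_MAX_GRANT_FRAMES    32
#define DEFAULT_MAX_MAPTRACK_FRAMES 1024

/* Simplified stand-in for libxl_domain_build_info. */
struct build_info {
    unsigned int max_grant_frames;
    unsigned int max_maptrack_frames;
};

/* Initialise with sane defaults so clients that never touch these fields
 * keep the old behaviour instead of silently getting 0. */
static void build_info_init(struct build_info *b)
{
    memset(b, 0, sizeof(*b));
    b->max_grant_frames = DEFAULT_MAX_GRANT_FRAMES;
    b->max_maptrack_frames = DEFAULT_MAX_MAPTRACK_FRAMES;
}
```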

This fixes the libvirt tests in osstest.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agofuzz/x86_emulate: clear errors after each iteration
George Dunlap [Mon, 9 Oct 2017 14:04:11 +0000 (16:04 +0200)]
fuzz/x86_emulate: clear errors after each iteration

Once feof() returns true for a stream, it will continue to return true
for that stream until clearerr() is called (or the stream is closed
and re-opened).

In llvm-clang-fast-mode, the same file descriptor is used for each
iteration of the loop, meaning that the "Input too large" check was
broken -- feof() would return true even if the fread() hadn't hit the
end of the file.  The result is that AFL generates testcases of
arbitrary size.

Fix this by clearing the error after each iteration.
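A minimal sketch of the pattern, assuming a reader that treats "fread() filled the buffer without reaching EOF" as "input too large". `read_input()` is hypothetical, not the fuzzer's actual function; the point is the clearerr() call, without which a stale EOF indicator from an earlier iteration would make the size check meaningless.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read up to `cap` bytes from `fp`.  Returns the number of bytes read, or
 * -1 on a read error or if the input did not fit (EOF was not reached).
 * clearerr() resets the end-of-file and error indicators, which otherwise
 * stay set across reads of a reused stream.
 */
static long read_input(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t n;

    clearerr(fp);
    n = fread(buf, 1, cap, fp);
    if (ferror(fp))
        return -1;
    if (!feof(fp))
        return -1;          /* buffer filled before EOF: input too large */
    return (long)n;
}
```

Note that an input of exactly `cap` bytes can also be reported as too large, because fread() may stop without ever observing EOF.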

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agofuzz/x86_emulate: actually use cpu_regs input
George Dunlap [Mon, 9 Oct 2017 14:03:53 +0000 (16:03 +0200)]
fuzz/x86_emulate: actually use cpu_regs input

Commit c07574b reorganized the way fuzzing was done, explicitly
creating a structure that the input data would be copied into.

Unfortunately, the cpu register state used by the emulator is on the
stack; it's cleared, but data is never copied into it.

If we're explicitly setting an entirely new cpu_regs struct for each
new input anyway, there's no need to have two copies around anymore;
just point to the one in the data structure.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86emul: fold/eliminate some local variables
Jan Beulich [Mon, 9 Oct 2017 14:03:10 +0000 (16:03 +0200)]
x86emul: fold/eliminate some local variables

Make i switch-wide (at once making it unsigned, as it should have been)
and introduce n (for immediate use in enter and aam/aad handling).
Eliminate on-stack arrays in pusha/popa handling. Use ea.val instead of
a custom variable in bound handling.

No (intended) functional change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul/fuzz: add rudimentary limit checking
Jan Beulich [Mon, 9 Oct 2017 14:01:22 +0000 (16:01 +0200)]
x86emul/fuzz: add rudimentary limit checking

fuzz_insn_fetch() is the only data access helper where it is possible
to see offsets larger than 4GiB in 16- or 32-bit modes, as we leave the
incoming rIP untouched in the emulator itself. The check is needed here
as otherwise, after successfully fetching insn bytes, we may end up
zero-extending EIP soon after complete_insn, which collides with the
X86EMUL_EXCEPTION-conditional respective ASSERT() in
x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
complete_insn to be reached with rc set to other than X86EMUL_OKAY or
X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
exception handling for rep_* hooks"].)

Add assert()-s for all other (data) access routines, as effective
address generation in the emulator ought to guarantee in-range values.
For them to not trigger, several adjustments to the emulator's address
calculations are needed: While the DstBitBase one is really mandatory,
the specification allows for either original or new behavior for two-
part accesses. Observed behavior on real hardware, however, is for such
accesses to silently wrap at the 2^^32 boundary in other than 64-bit
mode, just like they do at the 2^^64 boundary in 64-bit mode, which our
code is now being brought in line with. While adding truncate_ea()
invocations there, also convert open coded instances of it.
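The wrapping rule described above can be sketched with a small helper. It is modelled on the idea behind the emulator's truncate_ea(), but the names and signatures here are hypothetical. The address of the second part of a two-part access is computed in 64-bit arithmetic and then truncated, so it wraps at 2^32 outside 64-bit mode and, through unsigned overflow, at 2^64 in 64-bit mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truncate an effective address to an address size of `bytes` (2, 4 or 8). */
static uint64_t truncate_ea(uint64_t ea, unsigned int bytes)
{
    return bytes >= 8 ? ea : ea & ((UINT64_C(1) << (bytes * 8)) - 1);
}

/*
 * Effective address of the second part of a two-part access whose first
 * part is `first_len` bytes long.  This wraps at 2^32 outside 64-bit mode;
 * in 64-bit mode the unsigned addition already wraps at 2^64.
 */
static uint64_t second_part_ea(uint64_t ea, unsigned int first_len, bool mode64)
{
    return truncate_ea(ea + first_len, mode64 ? 8 : 4);
}
```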

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/domctl: Fix Xen heap leak via XEN_DOMCTL_getvcpucontext
Andrew Cooper [Sun, 8 Oct 2017 14:12:18 +0000 (15:12 +0100)]
xen/domctl: Fix Xen heap leak via XEN_DOMCTL_getvcpucontext

The backing structure for XEN_DOMCTL_getvcpucontext is only zeroed in the x86
HVM case.  At the very least, this means that ARM returns junk through its
flags field (as it is only ever conditionally or'd into), and x86 PV leaks
data through gdt_frames[14...15].  (An exhaustive search for other leaks
hasn't been performed).

Unconditionally zero the memory upon allocation, and forgo the double clear
for x86 HVM.  These hypercalls are not on hotpaths.
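The class of bug fixed here can be shown in userspace, with calloc() standing in for a zeroing allocator such as Xen's xzalloc(). The structure below is a hypothetical simplification, not the real vCPU context: fields the producer never writes, and flags that are only ever OR'd into, come back as zero instead of leftover heap contents.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a vCPU context structure. */
struct vcpu_ctx {
    unsigned long flags;
    unsigned long gdt_frames[16];
};

/*
 * Fill in only the first `nr_frames` GDT frames and OR in a flag, as a
 * producer that doesn't know about every field might.  Because the memory
 * comes from calloc(), every untouched field is zero rather than stale
 * heap data that would otherwise be copied back to the caller.
 */
static struct vcpu_ctx *get_ctx(unsigned int nr_frames)
{
    struct vcpu_ctx *c = calloc(1, sizeof(*c));
    unsigned int i;

    if (!c)
        return NULL;
    for (i = 0; i < nr_frames && i < 16; i++)
        c->gdt_frames[i] = 0x1000 + i;
    c->flags |= 1;   /* only correct because flags started out as 0 */
    return c;
}
```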

Note that this does not qualify for an XSA.  Per XSA-77,
XEN_DOMCTL_getvcpucontext is unsafe for disaggregation, meaning that only the
control domain can use this hypercall.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoxenoprof: convert the file to use typesafe MFN
Julien Grall [Mon, 9 Oct 2017 11:26:35 +0000 (13:26 +0200)]
xenoprof: convert the file to use typesafe MFN

The file common/xenoprof.c is now converted to use typesafe MFNs. This
requires overriding the macros virt_to_mfn and mfn_to_page to make
them work with mfn_t.

Also, add a couple of missing newlines in the code modified.
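The per-file override pattern can be sketched as below. Everything here is a simplified, hypothetical model: mfn_t is a one-member struct in the spirit of Xen's typesafe wrappers, and __virt_to_mfn() pretends virtual addresses are identity-mapped. The file-local #undef/#define makes virt_to_mfn() return mfn_t, so passing its result where a raw integer is expected becomes a compile error.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Typesafe MFN: a distinct struct type, so it can't be mixed with plain
 * integers by accident. */
typedef struct { unsigned long m; } mfn_t;
static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.m; }

/* Stand-in for the generic, non-typesafe helper (identity mapping assumed). */
#define __virt_to_mfn(va) ((unsigned long)((uintptr_t)(va) >> PAGE_SHIFT))

/* File-local override: callers in this file now get an mfn_t. */
#undef virt_to_mfn
#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
```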

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agokexec, kimage: convert kexec and kimage to use typesafe mfn_t
Julien Grall [Mon, 9 Oct 2017 11:25:53 +0000 (13:25 +0200)]
kexec, kimage: convert kexec and kimage to use typesafe mfn_t

At the same time, correctly align one of the changed prototypes.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: use maddr_to_page and maddr_to_mfn to avoid open-coded >> PAGE_SHIFT
Julien Grall [Mon, 9 Oct 2017 11:24:50 +0000 (13:24 +0200)]
x86: use maddr_to_page and maddr_to_mfn to avoid open-coded >> PAGE_SHIFT

The constructions _mfn(... >> PAGE_SHIFT) and mfn_to_page(... >> PAGE_SHIFT)
can be replaced by maddr_to_mfn(...) and maddr_to_page(...)
respectively.
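A sketch of why a named helper beats the open-coded shift, assuming 4KiB pages and a hypothetical one-member mfn_t. The conversion lives in one place, and a slip such as writing `>` instead of `>>` (which silently yields 0 or 1) can no longer creep in at call sites.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12            /* 4KiB pages (assumption) */

typedef uint64_t paddr_t;        /* machine (physical) address */
typedef struct { unsigned long m; } mfn_t;
static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.m; }

/* Machine address -> machine frame number, done once, centrally. */
static inline mfn_t maddr_to_mfn(paddr_t ma)
{
    return _mfn((unsigned long)(ma >> PAGE_SHIFT));
}
```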

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years agoRCU: make the period of the idle timer adaptive
Dario Faggioli [Mon, 9 Oct 2017 11:24:01 +0000 (13:24 +0200)]
RCU: make the period of the idle timer adaptive

Basically, if the RCU idle timer, when (if!) it fires,
finds that the grace period isn't over, we increase the
timer's period (i.e., it will fire later, next time).
If, OTOH, it finds the grace period is already finished,
we decrease the timer's period (i.e., it will fire a bit
earlier next time).

The goal is to let the timer period self-adjust so that 'misses'
make up on the order of 1% of firings.
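A minimal sketch of such a feedback rule, with hypothetical constants and names (not necessarily the values Xen uses): grow the period by a large step on a miss, shrink it by a small step on a hit, and clamp it to a range. With an increment 100 times the decrement, the period stops drifting on average when misses make up DECR / (INCR + DECR) = 1/101 of firings, i.e. roughly 1%.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning constants, in nanoseconds. */
#define PERIOD_MIN   UINT64_C(100000)      /* 100us */
#define PERIOD_MAX   UINT64_C(100000000)   /* 100ms */
#define PERIOD_INCR  UINT64_C(10000000)    /* +10ms after a miss */
#define PERIOD_DECR  UINT64_C(100000)      /* -100us after a hit */

/*
 * Compute the next idle-timer period.  `grace_period_over` says whether the
 * grace period had already finished when the timer fired (a "hit"); if not,
 * the timer fired too early (a "miss") and should fire later next time.
 */
static uint64_t adapt_period(uint64_t period, int grace_period_over)
{
    if (!grace_period_over)
        return period + PERIOD_INCR > PERIOD_MAX ? PERIOD_MAX
                                                 : period + PERIOD_INCR;
    return period < PERIOD_MIN + PERIOD_DECR ? PERIOD_MIN
                                             : period - PERIOD_DECR;
}
```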

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoRCU: make the period of the idle timer configurable
Dario Faggioli [Mon, 9 Oct 2017 11:23:24 +0000 (13:23 +0200)]
RCU: make the period of the idle timer configurable

Make it possible for the user to specify, with the boot
time parameter rcu-idle-timer-period-ms, how frequently
a CPU that went idle with pending RCU callbacks should be
woken up to check if the grace period ended.

Typical values (i.e., some of the values used by Linux as
the tick period) are 10, 4 or 1 ms. The default value (used
when this parameter is not specified) is 10ms. The maximum is
100ms.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoRCU: let the RCU idle timer handler run
Dario Faggioli [Mon, 9 Oct 2017 11:22:07 +0000 (13:22 +0200)]
RCU: let the RCU idle timer handler run

If stop_timer() is called between when the RCU
idle timer's interrupt arrives (and TIMER_SOFTIRQ is
raised) and when softirqs are checked and handled, the
timer is deactivated, and the handler never runs.

This happens to the RCU idle timer because stop_timer()
is called on it during the wakeup from idle (e.g., C-states,
on x86) path.

To fix that, we avoid calling stop_timer(), in case we see
that the timer itself is:
- still active,
- expired (i.e., its expiry time is in the past).
In fact, that indicates (for this particular timer) that
it has fired, and we are just about to handle the TIMER_SOFTIRQ
(which will perform the timer deactivation and run its handler).
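The decision can be sketched as a small predicate. The timer structure and function name are hypothetical simplifications, not Xen's timer API; "expired" is taken to mean the expiry time is not in the future, i.e. the timer has already fired and its softirq is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;            /* system time in ns */

struct toy_timer {                   /* hypothetical, simplified timer */
    bool active;                     /* still queued (not yet deactivated) */
    s_time_t expires;                /* absolute expiry time */
};

/*
 * Should the idle path stop this timer?  Stopping an inactive timer is
 * pointless, and stopping one that is active but already expired would
 * cancel it after it fired but before TIMER_SOFTIRQ ran its handler, so
 * only a timer that is active and not yet due is stopped.
 */
static bool should_stop_idle_timer(const struct toy_timer *t, s_time_t now)
{
    return t->active && t->expires > now;
}
```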

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen, tools: console.h shouldn't require string.h by default
Wei Liu [Fri, 6 Oct 2017 17:41:09 +0000 (18:41 +0100)]
xen, tools: console.h shouldn't require string.h by default

Unilaterally making string.h a prerequisite for console.h is going to
break the build for a lot of consumers of console.h.

Define a macro for the new flex ring. Consumers which want to use it
should define the macro.

Partially revert af8d9356417cb617b635c5ace782388ebfe86e3a.
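The opt-in pattern can be sketched in a single file. The macro name WANT_FLEX_CONSOLE_RING and the helper flex_ring_copy() are hypothetical, not the names the real header uses. The header only pulls in <string.h> (and the code that needs it) when a consumer defines the macro before including it, so existing consumers keep building unchanged.

```c
#include <assert.h>

/* A consumer that wants the flex ring opts in before the include. */
#define WANT_FLEX_CONSOLE_RING 1

/* ---- begin illustrative "console.h" ---- */
#ifdef WANT_FLEX_CONSOLE_RING
#include <string.h>   /* only required by consumers that opted in */

static inline void flex_ring_copy(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}
#endif /* WANT_FLEX_CONSOLE_RING */
/* ---- end illustrative "console.h" ---- */
```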

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/arm: p2m: Read *_mapped_gfn with the p2m lock taken
Julien Grall [Thu, 14 Sep 2017 15:39:01 +0000 (16:39 +0100)]
xen/arm: p2m: Read *_mapped_gfn with the p2m lock taken

*_mapped_gfn are currently read before acquiring the lock. However, they
may be modified by the p2m code before the lock is acquired, which means
we may use stale values.

Fix it by moving the read inside the section protected by the p2m lock.
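A minimal sketch of reading the bounds inside the critical section. The structure and the spinlock built on C11 atomic_flag are hypothetical stand-ins for the Arm p2m and its lock; the point is that the values used for clamping are the ones that were current while the lock was held.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified p2m with a toy spinlock. */
struct toy_p2m {
    atomic_flag lock;
    unsigned long lowest_mapped_gfn;
    unsigned long max_mapped_gfn;
};

static void toy_lock(struct toy_p2m *p)
{
    while (atomic_flag_test_and_set(&p->lock))
        ;                               /* spin */
}

static void toy_unlock(struct toy_p2m *p)
{
    atomic_flag_clear(&p->lock);
}

/* Clamp [*start, *end) to the mapped range.  The bounds are read only
 * after taking the lock, so a concurrent update can't leave us working
 * with stale values. */
static void clamp_to_mapped(struct toy_p2m *p, unsigned long *start,
                            unsigned long *end)
{
    toy_lock(p);
    if (*start < p->lowest_mapped_gfn)
        *start = p->lowest_mapped_gfn;
    if (*end > p->max_mapped_gfn + 1)
        *end = p->max_mapped_gfn + 1;
    /* ... operate on the clamped range while still holding the lock ... */
    toy_unlock(p);
}
```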

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoMAINTAINERS: update entries to Dario's new email address
Dario Faggioli [Fri, 6 Oct 2017 16:02:34 +0000 (18:02 +0200)]
MAINTAINERS: update entries to Dario's new email address

Replace, in the 'M:' fields of the components I co-maintain
('CPU POOLS', 'SCHEDULING' and 'RTDS SCHEDULER'), the Citrix
email, to which I don't have access any longer, with my
personal email.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agons16550: fix ISR lockup on Allwinner uart
Awais Masood [Fri, 6 Oct 2017 16:01:50 +0000 (18:01 +0200)]
ns16550: fix ISR lockup on Allwinner uart

This patch fixes an ISR lockup seen on the Allwinner UART.

On Allwinner H5, the serial driver goes into an infinite loop
when interrupts are enabled. The reason is a residual
"busy detect" interrupt. Since the condition UART_IIR_NOINT
will not be true unless this interrupt is cleared, the
interrupt handler will remain locked up in this while loop.

A hardware quirk fix for the DesignWare UART was previously added in
commit:
50417cd978aa54930d065ac1f139f935d14af76d

It checks for a busy condition during setup and clears the
condition by reading UART_USR register.

On Allwinner hardware, the "busy detect" condition occurs
later because an LCR write is performed during setup 'after'
this clear, and if the UART is busy, the "busy detect" condition
will trigger again and cause the ISR lockup.

To solve this problem, the same UART_USR read operation needs
to be performed within the interrupt handler to clear this
condition.

The Linux DesignWare 8250 driver also handles this condition within its
interrupt handler:
http://elixir.free-electrons.com/linux/latest/source/drivers/tty/serial/8250/8250_dw.c#L233

Tested on Orange Pi PC2 (H5). This issue is seen on H3
as well and the same fix works.
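The quirk handling described above can be sketched against a toy register model. The structure and functions are hypothetical; the IIR value 0x07 for "busy detect" and clearing it by reading USR follow the DesignWare convention that the Linux 8250_dw driver relies on, but treat the exact values as assumptions here.

```c
#include <assert.h>
#include <stdint.h>

#define UART_IIR_ID_MASK 0x0f   /* interrupt identification bits */
#define UART_IIR_NOINT   0x01   /* "no interrupt pending" */
#define UART_IIR_BUSY    0x07   /* DesignWare "busy detect" indication */

/* Hypothetical register model of a DesignWare-style UART. */
struct toy_uart {
    uint8_t iir;                /* current IIR value */
    unsigned int usr_reads;     /* how often USR was read */
};

static uint8_t read_iir(struct toy_uart *u)
{
    return u->iir;
}

/* On the real hardware, reading USR clears a pending busy-detect condition. */
static uint8_t read_usr(struct toy_uart *u)
{
    u->usr_reads++;
    if ((u->iir & UART_IIR_ID_MASK) == UART_IIR_BUSY)
        u->iir = (uint8_t)((u->iir & ~UART_IIR_ID_MASK) | UART_IIR_NOINT);
    return 0;
}

static void toy_isr(struct toy_uart *u)
{
    /* ... normal RX/TX servicing would happen here ... */

    /* Quirk: a residual busy-detect indication must be acknowledged by
     * reading USR, or it keeps being reported. */
    if ((read_iir(u) & UART_IIR_ID_MASK) == UART_IIR_BUSY)
        (void)read_usr(u);
}
```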

Signed-off-by: Awais Masood <awais.masood@vadion.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/tmem: Drop unnecessary noinline attribute
Andrew Cooper [Tue, 3 Oct 2017 18:07:50 +0000 (19:07 +0100)]
xen/tmem: Drop unnecessary noinline attribute

tmem_mempool_page_get() is only referenced by address, so isn't eligible for
inlining in the first place.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoxen/kimage: Remove defined but unused variables
Julien Grall [Thu, 5 Oct 2017 17:42:18 +0000 (18:42 +0100)]
xen/kimage: Remove defined but unused variables

In the function kimage_alloc_normal_control_page, the variables mfn and
emfn are defined but not used. Remove them.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/x86: mem_sharing: Use copy_domain_page in __mem_sharing_unshare_page
Julien Grall [Thu, 5 Oct 2017 17:42:16 +0000 (18:42 +0100)]
xen/x86: mem_sharing: Use copy_domain_page in __mem_sharing_unshare_page

The function __mem_sharing_unshare_page contains an open-coded version of
copy_domain_page. Use the function to simplify the code a bit.

At the same time, replace _mfn(__page_to_mfn(...)) with page_to_mfn(...),
given that the file already provides a typesafe version of page_to_mfn.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
7 years agox86/np2m: add break to np2m_flush_eptp()
Sergey Dyasli [Tue, 3 Oct 2017 15:21:04 +0000 (16:21 +0100)]
x86/np2m: add break to np2m_flush_eptp()

Now that np2m sharing is implemented, there can be only one np2m object
with the same np2m_base. Break from loop if the required np2m was found
during np2m_flush_eptp().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: refactor p2m_get_nestedp2m_locked()
Sergey Dyasli [Tue, 3 Oct 2017 15:21:03 +0000 (16:21 +0100)]
x86/np2m: refactor p2m_get_nestedp2m_locked()

Remove some code duplication.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: implement sharing of np2m between vCPUs
Sergey Dyasli [Tue, 3 Oct 2017 15:21:02 +0000 (16:21 +0100)]
x86/np2m: implement sharing of np2m between vCPUs

At the moment, nested p2ms are not shared between vcpus even if they
share the same base pointer.

Modify p2m_get_nestedp2m() to allow sharing a np2m between multiple
vcpus with the same np2m_base (L1 np2m_base value in VMCx12).

If the current np2m doesn't match the current base pointer, first look
for another nested p2m in the same domain with the same base pointer,
before reclaiming one from the LRU.
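The lookup order described above (share a matching np2m first, otherwise reclaim from the LRU) can be sketched as follows. The pool, LRU clock and function are hypothetical simplifications, not Xen's data structures. Because at most one np2m can hold a given base, the search stops at the first match.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NP2M      4            /* hypothetical pool size */
#define NP2M_INVALID UINT64_MAX   /* marks an unused np2m */

struct toy_np2m {
    uint64_t base;                /* np2m_base shadowed by this table */
    unsigned long lru;            /* larger = more recently used */
};

static unsigned long lru_clock;

/*
 * Pick the np2m for `base`: share one that already shadows the same base
 * pointer if there is one, otherwise reclaim the least recently used entry.
 */
static unsigned int get_np2m(struct toy_np2m pool[NR_NP2M], uint64_t base)
{
    unsigned int i, idx = NR_NP2M;

    /* Bases are unique across the pool, so stop at the first match. */
    for (i = 0; i < NR_NP2M; i++)
        if (pool[i].base == base) {
            idx = i;
            break;
        }

    if (idx == NR_NP2M) {
        idx = 0;
        for (i = 1; i < NR_NP2M; i++)
            if (pool[i].lru < pool[idx].lru)
                idx = i;
        pool[idx].base = base;    /* a real np2m would be flushed first */
    }

    pool[idx].lru = ++lru_clock;
    return idx;
}
```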

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: send flush IPIs only when a vcpu is actively using an np2m
Sergey Dyasli [Tue, 3 Oct 2017 15:21:01 +0000 (16:21 +0100)]
x86/np2m: send flush IPIs only when a vcpu is actively using an np2m

Flush IPIs are sent to all cpus in an np2m's dirty_cpumask when
updated.  This mask however is far too broad.  A pcpu's bit is set in
the cpumask when a vcpu runs on that pcpu, but is only cleared when a
flush happens.  This means that the IPI includes the current pcpu of
vcpus that are not currently running, and also includes any pcpu that
has ever had a vcpu use this p2m since the last flush (which in turn
will cause spurious invalidations if a different vcpu is using an np2m).

Avoid these IPIs by keeping closer track of where an np2m is being used,
and when a vcpu needs to be flushed:

- On schedule-out, clear v->processor in p2m->dirty_cpumask
- Add a 'generation' counter to the p2m and nestedvcpu structs to
  detect changes that would require re-loads on re-entry
- On schedule-in or p2m change:
  - Set v->processor in p2m->dirty_cpumask
  - flush the vcpu's nested p2m pointer (and update nv->generation) if
    the generation changed
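The bookkeeping in the list above can be sketched with a generation counter and a 64-bit CPU mask. The structures and function names are hypothetical simplifications (CPU indices are assumed to be below 64); the sketch only shows how a vCPU notices, on schedule-in, that its np2m was flushed while it was away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified np2m and per-vCPU nested state. */
struct toy_np2m {
    uint64_t generation;          /* bumped whenever the np2m is flushed */
    uint64_t dirty_cpumask;       /* pCPUs (0..63) actively using it */
};

struct toy_nvcpu {
    struct toy_np2m *np2m;
    uint64_t generation;          /* np2m generation last seen */
};

static void np2m_flush(struct toy_np2m *p)
{
    p->generation++;
}

static void np2m_schedule_out(struct toy_nvcpu *nv, unsigned int cpu)
{
    if (nv->np2m)
        nv->np2m->dirty_cpumask &= ~(UINT64_C(1) << cpu);
}

/* Returns true if the vCPU's cached np2m state had to be reloaded. */
static bool np2m_schedule_in(struct toy_nvcpu *nv, unsigned int cpu)
{
    if (!nv->np2m)
        return false;
    nv->np2m->dirty_cpumask |= UINT64_C(1) << cpu;
    if (nv->generation != nv->np2m->generation) {
        nv->generation = nv->np2m->generation;
        return true;
    }
    return false;
}
```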

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/vvmx: make updating shadow EPTP value more efficient
Sergey Dyasli [Tue, 3 Oct 2017 15:21:00 +0000 (16:21 +0100)]
x86/vvmx: make updating shadow EPTP value more efficient

At the moment, the shadow EPTP value is written unconditionally in
ept_handle_violation().

Instead, write the value on vmentry to the guest; but only write it if
the value needs updating.

To detect this, add a flag to the nestedvcpu struct, stale_np2m, to
indicate when such an action is necessary.  Set it when the nested p2m
changes or when the np2m is flushed by an IPI, and clear it when we
write the new value.

Since an IPI invalidating the p2m may happen between
nvmx_switch_guest() and vmx_vmenter, but we can't perform the vmwrite
with interrupts disabled, check the flag just before entering the
guest and restart the vmentry if it's set.
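The stale-flag-and-recheck scheme can be sketched as a single-threaded simulation. All names are hypothetical, and the "IPI" is just a callback invoked once in the window between the update and the final check. Clearing the flag before the write means an invalidation landing in between is still caught by the recheck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-vCPU nested state. */
struct toy_nvcpu {
    volatile bool stale_np2m;     /* set on np2m change or flush IPI */
    unsigned long np2m_eptp;      /* EPTP the guest should be using */
    unsigned long shadow_eptp;    /* EPTP currently loaded */
    unsigned int vmwrites;
};

/* Stands in for an invalidation IPI landing in the race window. */
static void simulate_flush_ipi(struct toy_nvcpu *nv)
{
    nv->np2m_eptp += 0x1000;
    nv->stale_np2m = true;
}

/* Only write the shadow EPTP when it is actually stale. */
static void update_shadow_eptp(struct toy_nvcpu *nv)
{
    if (nv->stale_np2m) {
        nv->stale_np2m = false;              /* clear first, then write */
        nv->shadow_eptp = nv->np2m_eptp;     /* the "vmwrite" */
        nv->vmwrites++;
    }
}

/* Returns the number of entry attempts.  `ipi`, if non-NULL, fires once
 * between the update and the final check. */
static unsigned int vmentry(struct toy_nvcpu *nv,
                            void (*ipi)(struct toy_nvcpu *))
{
    unsigned int attempts = 0;

    for (;;) {
        attempts++;
        update_shadow_eptp(nv);          /* interrupts still enabled */
        if (ipi) {
            ipi(nv);
            ipi = NULL;
        }
        /* Interrupts would be disabled from here on. */
        if (!nv->stale_np2m)
            return attempts;             /* safe to enter the guest */
        /* Stale again: re-enable interrupts and restart the entry. */
    }
}
```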

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: simplify nestedhvm_hap_nested_page_fault()
Sergey Dyasli [Tue, 3 Oct 2017 15:20:59 +0000 (16:20 +0100)]
x86/np2m: simplify nestedhvm_hap_nested_page_fault()

There is a possibility for nested_p2m to become stale between
nestedhvm_hap_nested_page_fault() and nestedhap_fix_p2m().  At the moment
this is handled by detecting such a race inside nestedhap_fix_p2m() and
special-casing it.

Instead, introduce p2m_get_nestedp2m_locked(), which returns a
still-locked p2m.  This allows us to call nestedhap_fix_p2m() with the
lock held and remove the code detecting the special-case.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: remove np2m_base from p2m_get_nestedp2m()
Sergey Dyasli [Tue, 3 Oct 2017 15:20:58 +0000 (16:20 +0100)]
x86/np2m: remove np2m_base from p2m_get_nestedp2m()

Remove np2m_base parameter as it should always match the value of
np2m_base in VMCx12.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/np2m: flush all np2m objects on nested INVEPT
Sergey Dyasli [Tue, 3 Oct 2017 15:20:57 +0000 (16:20 +0100)]
x86/np2m: flush all np2m objects on nested INVEPT

At the moment, nvmx_handle_invept() updates the current np2m just to
flush it.  Instead introduce a function, np2m_flush_base(), which will
look up the np2m base pointer and call p2m_flush_table() instead.

Unfortunately, since we don't know which p2m a given vcpu is using, we
must flush all p2ms that share that base pointer.

Convert p2m_flush_table() into p2m_flush_table_locked() in order not
to release the p2m_lock after np2m_base check.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agox86/np2m: refactor p2m_get_nestedp2m()
Sergey Dyasli [Tue, 3 Oct 2017 15:20:56 +0000 (16:20 +0100)]
x86/np2m: refactor p2m_get_nestedp2m()

1. Add a helper function assign_np2m()
2. Remove useless volatile
3. Update function's comment in the header
4. Minor style fixes ('\n' and d)

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
7 years agolibxl: use correct type modifier for vuart_gfn
Wei Liu [Thu, 5 Oct 2017 09:35:28 +0000 (10:35 +0100)]
libxl: use correct type modifier for vuart_gfn

Fixes compilation errors like:

libxl_console.c: In function ‘libxl__device_vuart_add’:
libxl_console.c:379:5: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘xen_pfn_t’ [-Werror=format=]
      flexarray_append(ro_front, GCSPRINTF("%lu", state->vuart_gfn));
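The general remedy for this class of warning is a format specifier that matches the argument's actual type. The sketch below assumes a 64-bit frame-number type (as on Arm) and uses the standard PRIu64 macro; `format_gfn()` is hypothetical, and the actual fix may have used a different, Xen-specific macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Assumption for illustration: a 64-bit frame number type, as on Arm,
 * where "%lu" is wrong whenever unsigned long is only 32 bits wide. */
typedef uint64_t xen_pfn_t;

/* Format a GFN with a specifier that matches its type on every ABI. */
static int format_gfn(char *buf, size_t len, xen_pfn_t gfn)
{
    return snprintf(buf, len, "%" PRIu64, gfn);
}
```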

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
7 years agolivepatch: Expand check for safe_for_reapply if livepatch has only .rodata.
Konrad Rzeszutek Wilk [Wed, 2 Aug 2017 00:29:01 +0000 (00:29 +0000)]
livepatch: Expand check for safe_for_reapply if livepatch has only .rodata.

If the livepatch has only .rodata sections, then it is OK to also
apply/revert/apply the livepatch without having to worry about
unforeseen consequences.

See commit 98b728a7b235c67e210f67f789db5d9eb38ca00c
"livepatch: Disallow applying after an revert" for details.

Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agolivepatch: Declare live patching as a supported feature
Ross Lagerwall [Wed, 28 Jun 2017 16:13:44 +0000 (17:13 +0100)]
livepatch: Declare live patching as a supported feature

See docs/features/livepatch.pandoc for the details.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agomkhex: Move it to tools/misc
Konrad Rzeszutek Wilk [Mon, 18 Sep 2017 21:25:54 +0000 (17:25 -0400)]
mkhex: Move it to tools/misc

It makes more sense for a tool used by other subsystems to live
in 'tools/misc', alongside 'mkrpm', 'mkdeb', etc.

The patch titled "xen/livepatch/x86/arm32: Force .livepatch.depends
section to be uint32_t aligned" uses mkhex.

Acked-by: Wei Liu <wei.liu2@citrix.com>
Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agolivepatch: Include sizes when an mismatch occurs
Konrad Rzeszutek Wilk [Tue, 20 Jun 2017 14:55:12 +0000 (10:55 -0400)]
livepatch: Include sizes when an mismatch occurs

If the sizes of the .bug.frames.X or .livepatch.funcs sections differ
from what the hypervisor expects, we reject the payload. To help
diagnose this, include both the expected and the payload
sizes.

Also make the warning read more naturally by using "Multiples" in it.

Also fix one case where we would fail if the size of .ex_table
was zero, which is perfectly valid.
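The check described above can be sketched as follows. The function name and message format are hypothetical, and the entry sizes used with it are arbitrary. A section must hold a whole number of fixed-size entries, an empty section is accepted, and a mismatch reports both sizes.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Check that a section holds a whole number of fixed-size entries.
 * An empty section (size 0) is accepted.  On mismatch, both sizes are
 * reported to help diagnose a payload built against different headers.
 */
static int check_section_size(const char *name, size_t sec_size,
                              size_t entry_size)
{
    if (entry_size == 0 || sec_size % entry_size) {
        fprintf(stderr, "%s: size %zu is not a multiple of %zu\n",
                name, sec_size, entry_size);
        return -1;
    }
    return 0;
}
```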

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>