xenbits.xensource.com Git - people/dariof/xen.git/log

xen/arm: Fix the issue in cmp_mmio_handler used in find_mmio_handler

This patch fixes the wrong range check done in cmp_mmio_handler().

This function returns -1 , 0 or 1 based on whether the key value
is below the range, in the range or above the range where the range is
(start, start+size). However, it should check against (start, start+size-1)
because start+size falls outside the range.

This resulted in returning a wrong mmio_handler for a given mmio address which
happened to be start+size.

This bug was introduced when the mmio region search switched from
linear search to binary search in the following commit:

8047e09 "xen/arm: io: Use binary search for mmio handler lookup".

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen: fail gnttab_grow_table() in case of missing allocations

In case gnttab_grow_table() is being called without
grant_table_set_limits() having been called for the domain, e.g. in
case of a toolstack error, fail the function instead of crashing the
system.

While at it let gnttab_grow_table() return a proper error code instead
of 1 for success.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

xen/gnttab: Clean up goto tangle in grant_table_init()

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

libxl: remove device model "none" from IDL

And the xl.cfg man page documentation.

It should be possible to re-introduce it in the future with a proper
implementation, in order to create a HVM guest without a device model,
which is slightly different from a PVHv2 guest.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: add PVH as a guest type

And remove device model "none".

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to x86 functions

This also includes the x86 ACPI related functions. Remove support for
device model "none"

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to USB

Add PVH support to usb related functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: remove device model "none" support from stream functions

Remove the usage of device model "none" in the migration stream
related functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: PVH guests use PV nics

Remove device model "none" support from the nic functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to memory functions

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to vpcu hotplug, domain destruction/pause and domain configuration

And remove support for device model "none".

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to domain save/suspend

And remove the device model "none" support.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to domain building

And remove device model "none" support.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: set device model for PVH guests

PVH guests use the same device model selection as PV guests, because
PVH guests only use the device model for the PV backends.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: remove device model "none" support from disk related functions

CD-ROM backend selection was partially based on the device model, this
is no longer needed since the device model "none" is now removed, so
HVM guests always have a device model.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add PVH support to domain creation

Remove the device model "none" support from domain creation and
introduce support for PVH.

This requires changing some of the HVM checks to be applied for both
HVM and PVH.

Setting device model to none was never supported since it was an
unstable interface used while transitioning from PVHv1 to PVHv2.

Now that PVHv1 has been finally removed and that a supported
interface for PVHv2 is being added this option is no longer necessary,
hence it's removed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: set PVH guests to use the PV console

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: allow PVH guests to use a bootloader

Allow PVH guests to boot using a bootloader.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: introduce a PVH guest type

The new guest type is introduced to the libxl IDL. libxl__domain_make
is also modified to save the guest type, and libxl__domain_type is
expanded to fetch that information when detecting guest type.

This is required because the hypervisor only differentiates between PV
and HVM guests, so libxl needs some extra information in order to
differentiate between a HVM and a PVH guest.

The new PVH guest type and its options are documented on the xl.cfg
man page.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: introduce a firmware option

The new firmware option aims to provide a coherent way to set the
firmware for the different kind of guests Xen supports.

For PV guests the available firmwares are pvgrub{32|64}, and for HVM
the following are supported: bios, uefi, seabios, rombios and ovmf.
Note that uefi maps to ovmf, and bios maps to the default firmware for
each device model.

The xl.cfg man page is updated to document the new feature.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: introduce a domain type option

Introduce a new type option to xl configuration files in order to
specify the domain type. This supersedes the current builder option.

The new option is documented in the xl.cfg man page, and the previous
builder option is marked as deprecated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl/xl: use the new location of domain_build_info fields

This is required because those options will be used by the new PVH
guest type, and thus need to be shared between PV and HVM.

Defines are added in order to signal consumers that the fields are
available.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: parsing code movement

Code movement in preparation for making the bootloader,
bootloader_args, nested_hvm and timer_mode fields shared between all
guests types. While moving the code, limit the line-length to 80
columns.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: introduce a way to mark fields as deprecated in the idl

The deprecation involves generating a function that copies the
deprecated fields into it's new location if the new location has not
been set.

The fields that are going to be shared between PVH and HVM or between
PVH and PV are moved to the top level of libxl_domain_build_info, and
the old locations are marked as deprecated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add is_default checkers for string and timer_mode types

Those types are missing a helper to check whether a definition of the
type holds the default value. This will be required by a later patch
that will implement deprecation of fields inside of a libxl type.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxendevicemodel: initialise extent.pad to zero

The pad field needs to be zero as required by the hypervisor.

Instead of setting the pad separately, use C99 initialiser.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

x86/hvm/dmop: fix EFAULT condition

The copy macro returns false when the copy fails.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

public: add some comments to arch-x86/cpuid.h

Leaf 4 of the Xen-specific CPUID leaves isn't mentioned at all in
include/public/arch-x86/cpuid.h, the comments for leaf 5 don't tell
anything about the sub-leaf semantics.

Add comments to clarify the interface.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: add new Xen cpuid node for max address width info

On very large hosts a pv-guest needs to know whether it will have to
handle frame numbers larger than 32 bits in order to select the
appropriate grant interface version.

Add a new Xen specific CPUID node to contain the maximum machine address
width similar to the x86 CPUID node 0x80000008 containing the maximum
physical address width. The maximum frame width needs to take memory
hotplug into account.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

arm: move arch specific grant table bits into grant_table.c

Instead of attaching the ARM specific grant table data to the domain
structure add it to struct grant_table. Add the needed arch functions
to the asm-*/grant_table.h includes.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

gnttab: delay allocation of sub structures

Delay the allocation of the grant table sub structures in order to
allow modifying parameters needed for sizing of these structures at a
per domain basis. Allocate the structures and the table frames only
from grant_table_init().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

libxl: add libxl support for setting grant table resource limits

Add new domain config items for setting the limits for the maximum
numbers of grant table frames and maptrack frames of a domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

xl: add global grant limit config items

Add xl.conf config items for default values of grant limits:

max_grant_frames will set the default for the maximum number of grant
frames for a domain which will take effect if the domain's config file
doesn't specify a value. If max_grant_frames isn't set in xl.conf it
will default to 32 for hosts with all memory below 16TB and to 64 for
hosts with memory above 16TB.

max_maptrack_frames will set the default for the maximum number of
maptrack frames for a domain. If max_maptrack_frames isn't set in
xl.conf it will default to 0, as normally only backend domains need
maptrack frames.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: add max possible mfn to libxl_physinfo

Add the maximum possible mfn of the host to the libxl_physinfo
data structure.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: set grant limits for xenstore stubdom

When creating a Xenstore stubdom set the grant limits: the stubdom
will need to setup a very limited amount of grants only, so 4 grant
frames are enough. For being able to support up to 32768 domains it
will need 128 maptrack frames (1 mapping per domain, 256 maptrack
entries per maptrack frame).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxc: add libxc support for setting grant table resource limits

Add a new libxc function xc_domain_set_gnttbl_limits() setting the
limits for the maximum numbers of grant table frames and maptrack
frames of a domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

add function for obtaining highest possible memory address

Add a function for obtaining the highest possible physical memory
address of the system. This value is influenced by:

- hypervisor configuration (CONFIG_BIGMEM)
- processor capability (max. addressable physical memory)
- memory map at boot time
- memory hotplug capability

Add this value to xen_sysctl_physinfo in order to enable dom0 to do a
proper sizing of grant frame limits of guests.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

x86emul: properly refuse LOCK on most 0FC7 insns

When adding support for RDRAND/RDSEED/RDPID I didn't remember to also
update this special early check. Make it (hopefully) future-proof by
also refusing VEX-encodings.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

python/libxc: extend the call to get/set cap for credit2

Commit 68817024 ("xen: credit2: allow to set and get utilization cap")
added a new parameter. Implement it for the python binding as well.

Coverity-ID: 1418532

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

xen/credit2: add missing unlock

Coverity-ID: 1418531

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Merge remote-tracking branch 'origin/staging' into staging

Introduce migration precopy policy

This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

Tidy libxc xc_domain_save

Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

Config.mk: update OVMF changeset

Update to allow to build OVMF with GCC 7.2.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

libxl/xl: allow to get and set cap on Credit2.

Note that a cap is considered valid only if
it is within the [1, nr_vcpus]% interval.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

xen: credit2: improve distribution of budget (for domains with caps)

Instead of letting the vCPU that for first tries to get
some budget take it all (although temporarily), allow each
vCPU to only get a specific quota of the total budget.

This improves fairness, allows for more parallelism, and
prevents vCPUs from not being able to get any budget (e.g.,
because some other vCPU always comes before and gets it all)
for one or more period, and hence starve (and cause troubles
in guest kernels, such as livelocks, triggering of whatchdogs,
etc.).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

xen: credit2: allow to set and get utilization cap

As cap is already present in Credit1, as a parameter, all
the wiring is there already for it to be percolate down
to csched2_dom_cntl() too.

In this commit, we actually deal with it, and implement
setting, changing or disabling the cap of a domain.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/pv: Fix assertion failure in pv_emulate_privileged_op()

The ABI of {read,write}_msr() requires them to use x86_emul_hw_exception() if
they report an exception with the emulator core.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/levelling: Avoid NULL pointer dereference

Coverity points out that next is indeed NULL at times. Only try to read the
.cpuid_faulting field when we sure that next isn't NULL.

Fixes e7a370733bd "x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/hvm/ioreq: use bool rather than bool_t

This patch changes use of bool_t to bool in the ioreq server code. It also
fixes an incorrect indentation in a continuation line.

This patch is purely cosmetic. No semantic or functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm/ioreq: rename .*pfn and .*gmfn to .*gfn

Since ioreq servers are only relevant to HVM guests and all the names in
question unequivocally refer to guest frame numbers, name them all .*gfn
to avoid any confusion.

This patch is purely cosmetic. No semantic or functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86emul/test: generate non-pie executable for 64bit builds

PIE may (and commonly will) result in the binary being loaded above
the 4Gb boundary, which can't work with at least the VZEROUPPER compat
mode test.

Add -fno-PIE and -no-pie when appropriate so that gcc won't generate a
PIE executable.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

python: Add binding for non-blocking xs_check_watch()

xs_check_watch() checks for watch notifications without blocking.
Together with the binding for xs_fileno(), this makes it possible
to write event-driven clients in Python.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

python: Extract registered watch search logic from xspy_read_watch()

When a watch fires, xspy_read_watch() checks whether the client has
registered interest in the path which changed and, if so, returns the
path and a client-supplied token. The binding for xs_check_watch()
needs to do the same, so this patch extracts the search code into a
separate function.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

python: Add binding for xs_fileno()

xs_fileno() returns a file descriptor which receives events when Xenstore
watches fire. Exposing this in the Python bindings is a prerequisite
for writing event-driven clients in Python.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

x86/msr: introduce guest_wrmsr()

The new function is responsible for handling WRMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if MSR
is being handled by guest_wrmsr() then WRMSR will either succeed (if
a guest is allowed to access it and provided a correct value based on
its MSR policy) or produce a GP fault. A guest will never see
a successful WRMSR of some MSR unknown to this function.

guest_wrmsr() unifies and replaces the handling code from
vmx_msr_write_intercept() and priv_op_write_msr().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
This also fixes a bug on AMD hardware where a guest which tries to
enable CPUID faulting via a direct write to the MSR will observe it
appearing to succeed, but because Xen actually ignored the write.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce guest_rdmsr()

The new function is responsible for handling RDMSR from both HVM and PV
guests. Currently it handles only 2 MSRs:

MSR_INTEL_PLATFORM_INFO
MSR_INTEL_MISC_FEATURES_ENABLES

It has a different behaviour compared to the old MSR handlers: if MSR
is being handled by guest_rdmsr() then RDMSR will either succeed (if
a guest is allowed to access it based on its MSR policy) or produce
a GP fault. A guest will never see a H/W value of some MSR unknown to
this function.

guest_rdmsr() unifies and replaces the handling code from
vmx_msr_read_intercept() and priv_op_read_msr().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
This (along with the prep work in init_domain_msr_policy()) also fixes
a bug where Dom0 could probe and find CPUID faulting, even though it
couldn't actually use it.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy

Since each vCPU now has struct msr_vcpu_policy, use cpuid_faulting bit
from there in current logic and remove arch_vcpu::cpuid_faulting.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce struct msr_vcpu_policy

The new structure contains information about guest's MSRs that are
unique to each vCPU. It starts with only 1 MSR:

MSR_INTEL_MISC_FEATURES_ENABLES

Which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. Availability of MSR_INTEL_MISC_FEATURES_ENABLES depends on
availability of MSR_INTEL_PLATFORM_INFO.

Add init_vcpu_msr_policy() which sets initial MSR policy for every vCPU
during domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/msr: introduce struct msr_domain_policy

The new structure contains information about guest's MSRs that are
shared between all domain's vCPUs. It starts with only 1 MSR:

MSR_INTEL_PLATFORM_INFO

Which currently has only 1 usable bit: cpuid_faulting.

Add 2 global policy objects: hvm_max and pv_max that are inited during
boot up. It's always possible to emulate CPUID faulting for HVM guests
while for PV guests the H/W support is required.

Add init_domain_msr_policy() which sets initial MSR policy during
domain creation with a special case for Dom0.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/mm: move declaration of new_guest_cr3 to local pv/mm.h

It is only used by PV. The code can only be moved together with other
PV mm code.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move PV l4 table setup code

Move the code to pv/mm.c. Export pv_arch_init_memory via global
pv/mm.h and init_guest_l4_table via local pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: factor out pv_arch_init_memory

Move the split l4 setup code into the new function. It can then be
moved to pv/ in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move and rename map_ldt_shadow_page

Add pv prefix to it. Move it to pv/mm.c. Fix call sites.

Take the chance to change v to curr and d to currd.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move compat descriptor table manipulation code

Move them alongside the non-compat variants.

Change u{32,64} to uint{32,64}_t and the type of ret from long to int
while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: split out descriptor table manipulation code

Move the code to pv/descriptor-tables.c. Change u64 to uint64_t while
moving. Use currd in do_update_descriptor.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: add pv prefix to {set,destroy}_gdt

Unfortunately they can't stay local to PV code because domain.c still
needs them. Change their names and fix up call sites. The code will be
moved later together with other descriptor table manipulation code.

Also move the declarations to pv/mm.h and provide stubs.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: split out pv grant table code

Move the code to pv/grant_table.c. Nothing needs to be done with
regard to headers.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move map_guest_l1e to pv/mm.c

And export the function via pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: remove the now unused inclusion of pv/emulate.h

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move ro page fault emulation code

Move the code to pv/ro-page-fault.c. Create a new header file
asm-x86/pv/mm.h and move the declaration of pv_ro_page_fault there.
Include the new header file in traps.c.

Fix some coding style issues. The prefixes (ptwr and mmio) are
retained because there are two sets of emulation code in the same file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move {un,}adjust_guest_l*e to pv/mm.h

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move update_intpte to pv/mm.h

While at it, change the type of preserve_ad to bool. Also move
UPDATE_ENTRY there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: export get_page_from_mfn

It will be used later in multiple files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: move guest_get_eff_l1e to pv/mm.h

Make it static inline. It will be used by map_ldt_shadow_page and ro
page fault emulation code later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

libxl: remove list callback from device framework

As we have generic functions to get device list
(libxl__device_list) no need to have callback in
the framework. To resolve issue when XS entry
doesn't match device name, device type is
extended with field "entry" which keeps XS entry.

Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

x86/PSR: fix cmdline option handling

Commit 0ade5e causes a bug that the psr features presented in cmdline
cannot be correctly enumerated.
1. If there is only 'psr=', then CMT is enumerated which is not right.
2. If cmdline is 'psr=cmt,cat,cdp,mba', only the last feature is enumerated.

This patch fixes the issues.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

custom parameter handling fixes

The recent changes to their handling introduced a few false warnings,
due to checks looking at the wrong string slot. While going through all
those commits and looking for patterns similar to the "dom0_mem=" I've
noticed this with, I also realized that there were other issues with
"dom0_nodes=" and "rmrr=", partly pre-existing, but partly also due to
those recent changes not having gone far enough.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

libxl: provide typedefs for device framework functions

Use the new typedefs to avoid copy-n-paste everywhere.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

x86/mem_access: fix mis-indented line

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

perfc: fix build after commit fc32575968 when CONFIG_PERF_COUNTERS=y

The commit fc32575968 "public/sysctl: drop unnecessary typedefs and
handles" went a bit too far by replacing all xen_systcl_*_t type to
struct xen_sysctl_*.

However, xen_sysctl_perfc_val_t was a typedef on uint32_t and therefore
is not associated to a structure.

Use xen_sysctl_perfc_val_t to fix the build when CONFIG_PERF_COUNTERS=y

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

grant-table: simplify get_paged_frame

The implementation of get_paged_frame is currently different whether the
architecture support sharing memory or paging memory. Both
version are extremely similar so it is possible to consolidate in a
single implementation.

The main difference is the x86 version will allow grant on foreign page
when using HVM/PVH whilst Arm does not. At the moment, on x86 foreign pages
are only allowed for PVH Dom0. It seems that foreign pages should never
be granted so deny them

The check for shared/paged memory are now gated with the respective ifdef.
Potentially, dummy p2m_is_shared/p2m_is_paging could be implemented for
Arm.

Lastly remove pointless parenthesis in the code modified.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

xen: credit2: fix spinlock irq-safety violation

In commit ad4b3e1e9df34 ("xen: credit2: implement
utilization cap") xfree() was being called (for
deallocating the budget replenishment timer, during
domain destruction) inside an IRQ disabled critical
section.

That must not happen, as it uses the mem-pool's lock,
which needs to be taken with IRQ enabled. And, in fact,
we crash (in debug builds):

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************

Let's, therefore, kill and deallocate the timer outside of
the critical sections, when IRQs are enabled.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

xen/arm: page: Prefix memory types with MT_

This will avoid confusion in the code when using them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: page: Use directly BUFFERABLE and drop DEV_WC

DEV_WC is only used for PAGE_HYPERVISOR_WC and does not bring much
improvement.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: page: Remove unused attributes DEV_NONSHARED and DEV_CACHED

They were imported from non-LPAE Linux, but Xen is LPAE only. It is time
to do some clean-up in the memory attribute and keep only what make
sense for Xen. Follow-up patch will do more clean-up.

Also, update the comment saying our attribute matches Linux.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

public/sysctl: drop unnecessary typedefs and handles

By virtue of the struct xen_sysctl container structure, most of them
are really just cluttering the name space.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>

public/domctl: drop unnecessary typedefs and handles

By virtue of the struct xen_domctl container structure, most of them
are really just cluttering the name space.

While doing so,
- convert an enum typed (pt_irq_type_t) structure field to a fixed
width type,
- make x86's paging_domctl() and descendants take a properly typed
handle,
- add const in a few places.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>

x86/hvm: break out __hvm_copy()'s translation logic

It will be reused by later changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/hvm: rename enum hvm_copy_result to hvm_translation_result

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

add new domctl hypercall to set grant table resource limits

Add a domctl hypercall to set the domain's resource limits regarding
grant tables.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

clean up grant_table.h

Many definitions can be moved from xen/grant_table.h to
common/grant_table.c now, as they are no longer used in other sources.

Rearrange the elements of struct grant_table to minimize holes due to
alignment of elements.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

move XENMAPSPACE_grant_table code into grant_table.c

The x86 and arm versions of XENMAPSPACE_grant_table handling are nearly
identical. Move the code into a function in grant_table.c and add an
architecture dependant hook to handle the differences.

Switch to mfn_t in order to be more type safe.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>

x86/vvmx: add hvm_intsrc_vector support to nvmx_intr_intercept()

Under the following circumstances:

1. L1 doesn't enable PAUSE exiting or PAUSE-loop exiting controls
2. L2 executes PAUSE in a loop with RFLAGS.IE == 0

L1's PV IPI through event channel will never reach the target L1's vCPU
which runs L2 because nvmx_intr_intercept() doesn't know about
hvm_intsrc_vector. This leads to infinite L2 loop without nested
vmexits and can cause L1 to hang.

The issue is easily reproduced with Qemu/KVM on CentOS-7-1611 as L1
and an L2 guest with SMP.

Fix nvmx_intr_intercept() by injecting hvm_intsrc_vector irq into L1
which will cause nested vmexit.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

xen/arm: Replace ioremap_attr(PAGE_HYPERVISOR_NOCACHE) call by ioremap_nocache

ioremap_cache is a wrapper of ioremap_attr(...).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Improve logging for data/prefetch abort fault

Walk the hypervisor page table for data/prefetch abort fault to help
diagnostics error in the page tables.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: traps: Introduce a helper to read the hypersivor fault register

While ARM32 has 2 distinct registers for the hypervisor fault register
(one for prefetch abort, the other for data abort), AArch64 has only
one.

Currently, the logic is open-code but a follow-up patch will require to
read it too. So move the logic in a separate helper and use it instead
of open-coding it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Introduce hsr_xabt to gather common bits between hsr_{d,i}abt

This will allow to consolidate some part of the data abort and prefetch
abort handling in a single function later on.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Add FnV field in hsr_*abt

FnV (FAR not Valid) bit was introduced by ARMv8 in both AArch32 and
AArch64 (See D7-2275, D7-2277, G6-4958, G6-4962 in ARM DDI 0487B.a).

Note the new revision of ARMv8 defined more bits in HSR. They haven't
been added at the moment because we have no use of them in Xen.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: arm32: Don't define FAR_EL1

Aliasing FAR_EL1 to IFAR is wrong because on ARMv8 FAR_EL1[31:0] is
architecturally mapped to DFAR and FAR_EL1[63:32] to IFAR.

As FAR_EL1 is not currently used in ARM32 code, remove it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>