Dario Faggioli [Sat, 6 Feb 2016 01:25:45 +0000 (02:25 +0100)]
xenalyze: handle scheduling events
so the trace will show properly decoded info,
rather than just a bunch of hex codes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* SCHED_DOM_{ADD,REM} handling slightly changed, to avoid
confusion with DOM0_DOM_{ADD,REM} (introduced later in
the series);
* '} * r =' turned into '} *r =', as requested
during review.
Dario Faggioli [Tue, 16 Feb 2016 12:13:47 +0000 (13:13 +0100)]
xentrace: formats: add domain create and destroy events.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Changes from v2:
* new patch in the series.
Dario Faggioli [Sat, 6 Feb 2016 01:25:16 +0000 (02:25 +0100)]
xentrace: formats: add events from Credit2 scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* fix typo in two events (rq_idx/rq_id), as requested during
review.
Dario Faggioli [Sat, 6 Feb 2016 01:25:04 +0000 (02:25 +0100)]
xentrace: formats: add events from Credit scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
Dario Faggioli [Sat, 6 Feb 2016 01:24:52 +0000 (02:24 +0100)]
xentrace: formats: update format of scheduling events
to include the vcpu IDs, in a way that matches
how the "dom:vcpu" couple is displayed in other
events (runstate changes).
Also add the trace for TRC_SCHED_SHUTDOWN_CODE which
was missing and was done via SCHEDOP_shutdown_code hypercall.
(TRC_SCHED_SHUTDOWN trace was present).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* enhanced changelog, as suggested during review.
Shannon Zhao [Fri, 4 Mar 2016 15:45:52 +0000 (16:45 +0100)]
arm/timer: fix panic when booting with DT
To support ACPI, patch "arm/acpi: Parse GTDT to initialize timer"
refactored the functions preinit_xen_time and init_xen_time. But it
wrongly moved the platform_get_irq call from init_xen_time to
preinit_dt_xen_time, which causes a boot failure.
So move platform_get_irq back to init_xen_time to fix it.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 4 Mar 2016 13:15:53 +0000 (14:15 +0100)]
x86/HVM: limit flushing on cache attribute pinning adjustments
Avoid cache flush on EPT when removing a UC- range, since when used
this type gets converted to UC anyway (there's no UC- among the types
valid in MTRRs and hence EPT's emt field).
We might further want to consider only forcing write buffer flushes
when removing WC ranges.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 4 Mar 2016 13:14:25 +0000 (14:14 +0100)]
x86/HVM: remove unnecessary indirection from hvm_get_mem_pinned_cacheattr()
Its return value can easily serve the purpose. We cannot, however,
return unspecific "success" anymore for a domain of the wrong type -
since no caller exists that would call this for PV domains, simply add
an ASSERT().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 4 Mar 2016 13:12:11 +0000 (14:12 +0100)]
x86/HVM: honor cache attribute pinning for RAM only
Call hvm_get_mem_pinned_cacheattr() for RAM ranges only, and only when
the guest has a physical device assigned: XEN_DOMCTL_pin_mem_cacheattr
is documented to be intended for RAM only.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Paul Durrant [Fri, 4 Mar 2016 13:08:38 +0000 (14:08 +0100)]
public/io/netif.h: make control ring hash protocol more general
This patch modifies the control ring protocol (of which there is
not yet an implementation) to make it more general. Most of the
concepts are not limited to toeplitz hashing so it's best not to
make them unnecessarily specific.
Apart from changing the names of various definitions and modifying
comments, this patch:
- Adds a new control message type to select a hash algorithm.
- Adds a reference implementation of the toeplitz hash.
- Changes the 'toeplitz' extra info fragment into a 'hash' extra
info fragment and replaces the octet of padding with the index of
the algorithm that was used to create the hash value.
- Relaxes the restriction that the mapping table has to be
power-of-2 sized.
The patch also fixes a few spelling typos noticed along the way.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
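For orientation, a Toeplitz hash and a lookup into a mapping table of arbitrary size could be sketched as follows. This is a minimal illustration, not the reference implementation added to netif.h: the names toeplitz_hash() and pick_queue(), the zero-padding of a short key, and the use of a modulo for the table lookup are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a Toeplitz hash. For every set bit of the input
 * (most significant bit first), XOR in the 32-bit window of the key that
 * is aligned with that bit. A key shorter than the input is treated as
 * zero-padded (an assumption of this sketch).
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                              const uint8_t *data, size_t datalen)
{
    uint32_t hash = 0;
    uint32_t window = 0;
    size_t next = 4; /* index of the next key byte to shift in */
    size_t i;
    int bit;

    /* Load the first 32 key bits, big-endian. */
    for (i = 0; i < 4; i++)
        window = (window << 8) | (i < keylen ? key[i] : 0);

    for (i = 0; i < datalen; i++, next++) {
        uint8_t incoming = next < keylen ? key[next] : 0;

        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one key bit to the right. */
            window = (window << 1) | ((incoming >> bit) & 1u);
        }
    }

    return hash;
}

/*
 * Hypothetical lookup: a modulo works for any non-zero table size,
 * whereas masking with (size - 1) would require a power of two.
 */
static uint32_t pick_queue(uint32_t hash, const uint32_t *mapping, size_t size)
{
    return mapping[hash % size];
}
```

An all-zero input always hashes to 0, and an input whose only set bit is the very first one hashes to the first 32 bits of the key.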
Shannon Zhao [Wed, 2 Mar 2016 07:38:00 +0000 (08:38 +0100)]
arm/acpi: Add acpi parameter to enable/disable acpi
Define new command line parameter "acpi" to enable/disable acpi.
This implements the following policy to decide whether ACPI should be
used to boot the system:
- acpi=off: ACPI will not be used to boot the system, even if there is
no alternative available (e.g., device tree is empty)
- acpi=force: only ACPI will be used to boot the system; if that fails,
there will be no fallback to alternative methods (such as device tree)
- otherwise, ACPI will be used as a fallback if the device tree turns
out to lack a platform description; the heuristic to decide this is
whether /chosen is the only node present at depth 1
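The three-way policy above can be sketched roughly as follows. The enum, the function names and the dt_has_platform flag are hypothetical and only illustrate the decision; they are not the identifiers used in the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum acpi_policy { ACPI_POLICY_AUTO, ACPI_POLICY_OFF, ACPI_POLICY_FORCE };

/* Map the value of the "acpi" command line parameter to a policy. */
static enum acpi_policy parse_acpi_param(const char *value)
{
    if (value && !strcmp(value, "off"))
        return ACPI_POLICY_OFF;
    if (value && !strcmp(value, "force"))
        return ACPI_POLICY_FORCE;
    return ACPI_POLICY_AUTO;
}

/*
 * dt_has_platform is false when /chosen is the only node present at
 * depth 1 of the device tree, i.e. there is no platform description.
 */
static bool should_use_acpi(enum acpi_policy policy, bool dt_has_platform)
{
    switch (policy) {
    case ACPI_POLICY_OFF:
        return false;            /* never, even without an alternative */
    case ACPI_POLICY_FORCE:
        return true;             /* ACPI only, no fallback */
    default:
        return !dt_has_platform; /* ACPI as fallback for an empty DT */
    }
}
```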
Shannon Zhao [Wed, 2 Mar 2016 07:40:00 +0000 (08:40 +0100)]
arm/acpi: Parse GTDT to initialize timer
Parse GTDT (Generic Timer Descriptor Table) to initialize timer. Use
the information presented by GTDT to initialize the arch timer (not
memory-mapped).
Shannon Zhao [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/gic: Add ACPI support for GIC preinit
Since ACPI 6.0 defines that the GIC Distributor Structure contains the GIC
version field, the GIC version can be read from it. Then call the ACPI device
initialization function to preinit the GIC device.
Parth Dixit [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/gic-v2: Add ACPI boot support for GICv2
ACPI on Xen hypervisor uses MADT table for proper GIC initialization.
First get the GIC version from GIC Distributor. Then parse GIC related
subtables, collect CPU interface and distributor addresses and call
driver initialization function (which is hardware abstraction agnostic).
FDT initializes GICv2 in a similar way.
Shannon Zhao [Wed, 2 Mar 2016 07:39:00 +0000 (08:39 +0100)]
arm/acpi: Add ACPI support for SMP initialization
ACPI 5.1 only has two explicit methods to boot up SMP, PSCI and the Parking
protocol, but the Parking protocol is only specified for ARMv7 now, so
make PSCI the only SMP boot protocol until the ACPI spec or the Parking
protocol spec is updated.
ACPI only supports PSCI 0.2+, since prior to PSCI 0.2 function IDs are
not well-defined.
Parth Dixit [Wed, 2 Mar 2016 07:35:00 +0000 (08:35 +0100)]
arm/acpi: Parse MADT to map logical cpu to MPIDR and get cpu_possible_map
MADT contains the information for MPIDR which is essential for SMP
initialization, parse the GIC cpu interface structures to get the MPIDR
value and map it to cpu_logical_map(), and add enabled cpu with valid
MPIDR into cpu_possible_map.
Move BAD_MADT_ENTRY to common place, parenthesize its parameters and
drop the pointer cast.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Naresh Bhat <naresh.bhat@linaro.org> Signed-off-by: Parth Dixit <parth.dixit@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Shannon Zhao [Wed, 2 Mar 2016 07:43:00 +0000 (08:43 +0100)]
arm/acpi: Parse FADT table and get PSCI flags
There are two flags: PSCI_COMPLIANT and PSCI_USE_HVC. When set, the
former signals to the OS that the hardware is PSCI compliant. The latter
selects the appropriate conduit for PSCI calls by toggling between
Hypervisor Calls (HVC) and Secure Monitor Calls (SMC). FADT table
contains such information, parse FADT to get the flags for future
usage.
Since the STAO table and the GIC version are introduced by ACPI 6.0, we will
check the version and only parse an FADT table with version >= 6.0. If
firmware provides ACPI tables with an ACPI version less than 6.0, the OS
would be confused by that information, so disable ACPI if we get an FADT
table with version less than 6.0.
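A rough sketch of how the two flags and the version check might be interpreted is given here. The macro and function names are hypothetical; the bit positions (bit 0 for PSCI_COMPLIANT, bit 1 for PSCI_USE_HVC) follow the ARM boot architecture flags of the FADT and should be treated as an assumption of this example rather than as the code of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions are an assumption of this sketch. */
#define FADT_PSCI_COMPLIANT (1u << 0)
#define FADT_PSCI_USE_HVC   (1u << 1)

enum psci_conduit { PSCI_CONDUIT_NONE, PSCI_CONDUIT_SMC, PSCI_CONDUIT_HVC };

/* Pick the conduit for PSCI calls from the FADT ARM boot flags. */
static enum psci_conduit psci_conduit_from_flags(uint16_t arm_boot_flags)
{
    if (!(arm_boot_flags & FADT_PSCI_COMPLIANT))
        return PSCI_CONDUIT_NONE;

    return (arm_boot_flags & FADT_PSCI_USE_HVC) ? PSCI_CONDUIT_HVC
                                                : PSCI_CONDUIT_SMC;
}

/* Only FADT tables from ACPI 6.0 onwards are accepted. */
static bool fadt_version_ok(unsigned int major)
{
    return major >= 6;
}
```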
Shannon Zhao [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/acpi: Add basic ACPI initialization
acpi_boot_table_init() will be called in start_xen to get the RSDP and
all the table pointers. With this patch, we can get ACPI boot-time
tables from firmware on ARM64.
Juergen Gross [Thu, 3 Mar 2016 07:55:30 +0000 (08:55 +0100)]
silence affinity messages on suspend/resume
When taking cpus offline for suspend or bringing them online on resume
again the scheduler might issue debug messages when temporarily
breaking vcpu affinity or restoring the original affinity settings.
The resume message can be removed completely, while the message when
breaking affinity should only be issued if the breakage is permanent.
Yang Hongyang [Wed, 2 Mar 2016 03:44:50 +0000 (11:44 +0800)]
Remus: update email address in MAINTAINERS file
Signed-off-by: Yang Hongyang <imhy.yang@gmail.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Fri, 19 Feb 2016 15:13:17 +0000 (09:13 -0600)]
travis: add IRC notifications
This will cause failed builds, and builds that flip back to success,
to be reported to #xentest on FreeNode. The syntax of the message will
be:
<travis-ci> xen-project/xen#BUILDID (BRANCH - REVISION : COMMITTER)
<travis-ci> Change view :
https://github.com/xen-project/xen/compare/RANGE
<travis-ci> Build details :
https://travis-ci.org/xen-project/xen/builds/BUILDID
The blob was generated with the following command:
travis encrypt -r xen-project/xen 'chat.freenode.net#xentest'
The reason it is encrypted is to prevent people who fork the repo from
spamming #xentest. This value will only properly decrypt when running within
the xen-project/xen space.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Doug Goldstein [Fri, 19 Feb 2016 02:57:04 +0000 (20:57 -0600)]
m4/python: fix checks for Python library support
AC_CHECK_LIB() was running gcc -Llib -lm -lutils conftest.c which, on
platforms that link with --as-needed by default, will result in
underlinking. Instead, AC_CHECK_LIB() suggests supplying the extra
libraries necessary in a 5th argument.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Liang Li [Mon, 11 Jan 2016 08:52:10 +0000 (16:52 +0800)]
libxc: Expose the MPX cpuid flag to guest
If the hardware supports Memory Protection Extensions (MPX), expose this
feature to the guest by default. Users don't have to use a 'cpuid= ' option
in the config file to turn it on.
Signed-off-by: Liang Li <liang.z.li@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Wed, 24 Feb 2016 15:03:29 +0000 (09:03 -0600)]
tools/configure: only require bcc/ld86/as86 when needed
bcc/ld86/as86 are necessary when we build ROMBIOS. However if we do not
build it (and are not building qemu-trad), the build requirements are
overly strict and can lead to failures.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Campbell [Wed, 17 Feb 2016 10:34:24 +0000 (10:34 +0000)]
xl: NULL terminate buf when reading dom0 /proc/uptime
The contents of /proc/uptime are typically something like "80164.57
640617.58", so the existing 512 byte buffer is more than large enough.
Reduce its effective size to 511 bytes and ensure we include a
NUL terminator.
Otherwise Coverity points out that we pass a potentially unterminated
string to strtok. In practice this likely doesn't actually cause
issues (at least on Linux) because the
string should always contain a space so we will stop parsing.
CID: 105590
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
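The pattern described in this commit (read at most size - 1 bytes, then terminate before handing the buffer to strtok) can be sketched as below. read_first_token() and UPTIME_BUF_SIZE are hypothetical names used for illustration; this is not the xl code itself.

```c
#include <string.h>
#include <unistd.h>

#define UPTIME_BUF_SIZE 512

/*
 * Hypothetical helper: read at most UPTIME_BUF_SIZE - 1 bytes from fd so
 * that there is always room for the terminator, NUL-terminate the data,
 * and return the first space-separated token (or NULL on error/EOF).
 */
static char *read_first_token(int fd, char buf[UPTIME_BUF_SIZE])
{
    ssize_t nr = read(fd, buf, UPTIME_BUF_SIZE - 1);

    if (nr <= 0)
        return NULL;

    buf[nr] = '\0'; /* strtok() must never see an unterminated string */

    return strtok(buf, " ");
}
```

Without the explicit terminator, strtok() would rely on the data happening to contain a space, which is what Coverity flagged.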
Feng Wu [Tue, 1 Mar 2016 13:42:13 +0000 (14:42 +0100)]
vmx: VT-d posted-interrupt core logic handling
This is the core logic handling for VT-d posted-interrupts. Basically it
deals with how and when to update posted-interrupts during the following
scenarios:
- vCPU is preempted
- vCPU is slept
- vCPU is blocked
When vCPU is preempted/slept, we update the posted-interrupts during
scheduling by introducing two new architectural scheduler hooks:
vmx_pi_switch_from() and vmx_pi_switch_to(). When vCPU is blocked, we
introduce a new architectural hook: arch_vcpu_block() to update
posted-interrupts descriptor.
Besides that, before VM-entry, we will make sure the 'NV' field is set
to 'posted_intr_vector' and the vCPU is not in any blocking lists, which
is needed when vCPU is running in non-root mode. The reason we do this check
is because we change the posted-interrupts descriptor in vcpu_block(),
however, we don't change it back in vcpu_unblock() or when vcpu_block()
directly returns due to event delivery (in fact, we don't need to do it
in the two places, that is why we do it before VM-Entry).
When we handle the lazy context switch for the following two scenarios:
- Preempted by a tasklet, which runs in an idle context.
- The prev vcpu is offline and there are no new available vcpus in the run queue.
We don't change the 'SN' bit in the posted-interrupt descriptor. This
may incur spurious PI notification events, but a PI notification
event is only sent when 'ON' is clear, and once the PI notification
is sent, 'ON' is set by hardware, hence there are no more notification
events until 'ON' is cleared again. Besides that, spurious PI notification
events are going to happen from time to time in the Xen hypervisor anyway:
for example, when guests trap to Xen and a PI notification event happens,
there is nothing Xen actually needs to do about it; the interrupts will be
delivered to the guest the next time we do a VMENTRY.
Suggested-by: Yang Zhang <yang.z.zhang@intel.com> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com> Suggested-by: George Dunlap <george.dunlap@citrix.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Ian Campbell [Wed, 17 Feb 2016 10:39:40 +0000 (10:39 +0000)]
xl: close nullfd after dup2'ing it to stdin
We assert that nullfd is not std{in,out,err} since that would result
in closing one of the just dup2'd fds. For this to happen
std{in,out,err} would have needed to be closed, at which point all
sorts of other things could go wrong.
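A minimal sketch of the pattern, assuming a hypothetical helper named redirect_stdin_to_null() (the real xl code differs in detail):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: point stdin at /dev/null, then release the spare
 * descriptor so it does not leak.
 */
static int redirect_stdin_to_null(void)
{
    int nullfd = open("/dev/null", O_RDONLY);

    if (nullfd < 0)
        return -1;

    /*
     * If nullfd were 0, 1 or 2, the close() below would close one of the
     * standard descriptors instead of a spare one.
     */
    assert(nullfd > STDERR_FILENO);

    if (dup2(nullfd, STDIN_FILENO) < 0) {
        close(nullfd);
        return -1;
    }

    close(nullfd);
    return 0;
}
```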
Haozhong Zhang [Tue, 1 Mar 2016 13:38:22 +0000 (14:38 +0100)]
x86/hvm: move saving/loading vcpu's TSC to common code
Both VMX and SVM save/load vcpu's TSC when saving/loading vcpu's
context, so this patch moves saving/loading vcpu's TSC to the common
functions hvm_[save|load]_cpu_ctxt().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Tue, 1 Mar 2016 13:37:53 +0000 (14:37 +0100)]
x86/hvm: replace architecture TSC scaling by a common function
This patch implements a common function hvm_scale_tsc() to scale TSC by
using TSC scaling information collected by architecture code.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> for SVM bits Reviewed-by: Jan Beulich <jbeulich@suse.com>
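Scaling a TSC value by a fixed-point ratio can be sketched as follows. This is only an illustration of the arithmetic: scale_tsc() and its frac_bits parameter are hypothetical, and the number of fractional bits in the ratio depends on the architecture code that supplies it. The sketch relies on the unsigned __int128 extension of GCC and Clang on 64-bit targets.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: ratio is a fixed-point number with frac_bits
 * fractional bits. Use a 128-bit intermediate product so the
 * multiplication cannot overflow before the shift.
 */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio, unsigned int frac_bits)
{
    unsigned __int128 product = (unsigned __int128)tsc * ratio;

    return (uint64_t)(product >> frac_bits);
}
```

With 32 fractional bits, a ratio of 1ULL << 32 leaves the TSC unchanged and 3ULL << 31 scales it by 1.5.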
Doug Goldstein [Mon, 29 Feb 2016 15:09:09 +0000 (16:09 +0100)]
build: consolidate CONFIG_HAS_VIDEO and CONFIG_VIDEO
No real advantage to keeping these separate. The use case of this from
Linux is when the platform or target board has support for something but
the user wants to be given the option to disable it.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Mon, 29 Feb 2016 15:08:43 +0000 (16:08 +0100)]
build: consolidate CONFIG_HAS_VGA and CONFIG_VGA
No real advantage to keeping these separate. The use case of this from
Linux is when the platform or target board has support for something but
the user wants to be given the option to disable it.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Corneliu ZUZU [Mon, 29 Feb 2016 15:07:49 +0000 (16:07 +0100)]
arm/monitor vm-events: implement guest-request support
This patch adds ARM support for guest-request monitor vm-events.
Note: on ARM hypercall instruction skipping must be done manually
by the caller. This will probably be changed in a future patch.
Summary of changes:
== Moved to common-side:
* XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST handling (moved from X86
arch_monitor_domctl_event to common monitor_domctl)
* hvm_event_guest_request->vm_event_monitor_guest_request
* hvm_event_traps->vm_event_monitor_traps (also added target vcpu as param)
* guest-request bits from X86 'struct arch_domain' (to common 'struct domain')
== ARM implementations:
* do_hvm_op now handling of HVMOP_guest_request_vm_event => calls
vm_event_monitor_guest_request (as on X86)
* arch_monitor_get_capabilities->vm_event_monitor_get_capabilities,
updated to reflect support for XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST
* vm_event_init_domain (does nothing), vm_event_cleanup_domain
== Misc:
* vm_event_fill_regs, no longer X86-specific. ARM-side implementation of this
function currently does nothing, that will be added in a separate patch.
Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Haozhong Zhang [Mon, 29 Feb 2016 15:06:40 +0000 (16:06 +0100)]
x86/hvm: setup TSC scaling ratio
This patch adds a field tsc_scaling_ratio in struct hvm_domain to record
the per-domain TSC scaling ratio, and sets it in tsc_set_info().
Before setting the per-domain TSC scaling ratio, we check its validity
in tsc_set_info(). If an invalid ratio is given, we will leave the
default value in tsc_scaling_ratio (i.e. ratio = 1) and setup guest TSC
as if no TSC scaling is used:
* For TSC_MODE_DEFAULT,
- if a user-specified TSC frequency is given, we will set the guest
TSC frequency to it; otherwise, we set it to the host TSC frequency.
- if guest TSC frequency does not equal the host TSC frequency, we will
emulate guest TSC (i.e. d->arch.vtsc is set to 1). In both cases,
guest TSC runs in the guest TSC frequency.
* For TSC_MODE_PVRDTSCP,
- we set the guest TSC frequency to the host TSC frequency.
- guest rdtsc is executed natively in the host TSC frequency as
before.
- if rdtscp is not available to guest, it will be emulated; otherwise,
it will be executed natively. In both cases, guest rdtscp gets TSC
in the host TSC frequency as before.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
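The TSC_MODE_DEFAULT fallback described above (used when the ratio is invalid) amounts to the following decision. The struct and function names are hypothetical and stand in for the relevant part of tsc_set_info(); a user_khz of 0 is assumed to mean that no frequency was specified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct tsc_setup {
    uint32_t guest_khz; /* frequency the guest TSC runs at */
    bool vtsc;          /* true if the guest TSC must be emulated */
};

/*
 * Use the user-specified frequency if there is one, otherwise the host
 * frequency; emulate the TSC only when the two differ.
 */
static struct tsc_setup tsc_default_mode_fallback(uint32_t user_khz,
                                                  uint32_t host_khz)
{
    struct tsc_setup s;

    s.guest_khz = user_khz ? user_khz : host_khz;
    s.vtsc = (s.guest_khz != host_khz);

    return s;
}
```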
Parth Dixit [Mon, 29 Feb 2016 15:06:07 +0000 (16:06 +0100)]
arm/acpi: read acpi memory info from uefi
ACPI memory is separate from conventional memory and should be marked
as reserved while passing to DOM0. Create a new meminfo structure to
store all the acpi tables listed in uefi.
Shannon Zhao [Mon, 29 Feb 2016 15:05:32 +0000 (16:05 +0100)]
arm/acpi: add placeholder for efi and acpi load address
We will create the EFI table, the memory description table and some of the
ACPI tables, and we're going to map them to kinfo->gnttab_start of Dom0.
Add a placeholder for the starting address for loading in DOM0 and the
size of the newly added tables. Also add a placeholder to store the newly
created tables.
Dario Faggioli [Mon, 29 Feb 2016 14:58:49 +0000 (15:58 +0100)]
credit1: avoid boosting vCPUs being "just" migrated
Moving a vCPU to a different pCPU means offlining it and
then waking it up, on the new pCPU. Credit1 grants BOOST
priority to vCPUs that wake up, with the aim of improving
I/O latency. The net effect of this all is that vCPUs get
boosted when migrating, which shouldn't happen.
For instance, this causes scheduling anomalies and,
potentially, performance problems, as reported here:
http://lists.xen.org/archives/html/xen-devel/2015-10/msg02851.html
This patch fixes this by noting down (by means of a flag)
the fact that the vCPU is about to undergo a migration.
This way we can tell, later, during a wakeup, whether the
vCPU is migrating or unblocking, and decide whether or
not to apply the boosting.
Note that it is important that atomic-safe bit operations
are used when manipulating vCPUs' flags. Take the chance
and add a comment about this.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
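The idea can be sketched with C11 atomics as follows. All names here (VCPU_FLAG_MIGRATING, struct vcpu_sched, mark_migrating(), wake()) are hypothetical and do not match the Credit1 code, which uses Xen's own atomic bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag and priorities, for illustration only. */
#define VCPU_FLAG_MIGRATING (1u << 0)

enum { PRI_UNDER = -1, PRI_BOOST = 0 };

struct vcpu_sched {
    atomic_uint flags; /* may be touched concurrently from several pCPUs */
    int pri;
};

/* Called just before the vCPU is moved to another pCPU. */
static void mark_migrating(struct vcpu_sched *v)
{
    atomic_fetch_or(&v->flags, VCPU_FLAG_MIGRATING);
}

/*
 * Called on wakeup. Atomically test and clear the flag: a vCPU that is
 * merely resuming after a migration is not boosted, one that is really
 * unblocking is. Returns true if the vCPU was boosted.
 */
static bool wake(struct vcpu_sched *v)
{
    unsigned int old = atomic_fetch_and(&v->flags, ~VCPU_FLAG_MIGRATING);

    if (old & VCPU_FLAG_MIGRATING)
        return false;

    v->pri = PRI_BOOST;
    return true;
}
```

The flag is tested and cleared in one atomic step, so a concurrent update to another flag bit cannot be lost.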
tools/libxl: seperate device init/cleanup from checkpoint device layer
We call (init|cleanup)_subkind_nic and (init|cleanup)_subkind_drbd_disk
directly in checkpoint device. Move them to libxl_remus.c, and call them
before calling libxl__checkpoint_devices_setup() or after calling
libxl__checkpoint_devices_teardown().
This is pure refactoring with no functional changes.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Tue, 15 Dec 2015 05:59:52 +0000 (13:59 +0800)]
tools/libxl: move remus state into a seperate structure
Add a new structure, remus state, and move the concrete layer's private
members into it.
This is pure refactoring with no functional changes.
Init interval in libxl__remus_setup(). It is safe to move this initialisation,
because this value is only used for remus, and remus will use this value after
libxl__remus_setup().
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxl: store remus_ops in checkpoint device state
Checkpoint device is an abstract layer to do checkpoint.
COLO can also use it to do checkpoint. But there is
still some code in checkpoint device which touches remus.
This patch and:
tools/libxl: move remus state into a seperate structure
tools/libxl: seperate device init/cleanup from checkpoint device layer
will separate remus from the checkpoint device layer.
We use remus ops directly in checkpoint device. Store it
in the checkpoint device state so that the checkpoint device layer
is not aware of remus_ops.
This is pure refactoring with no functional changes.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
The patch also fixes the following backward compatibility issue:
The error code ERROR_REMUS_XXX was introduced in Xen 4.5, and
changed to ERROR_CHECKPOINT_XXX after previous renaming.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Reviewed-Lightly-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Mon, 14 Dec 2015 07:01:44 +0000 (15:01 +0800)]
migration/save: pass checkpointed_stream from libxl to libxc
Pass checkpointed_stream from libxl to libxc.
It won't affect legacy migration because legacy migration
won't use this param.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Mon, 14 Dec 2015 06:14:28 +0000 (14:14 +0800)]
tools/libxl: introduce enum type libxl_checkpointed_stream
Introduce enum type libxl_checkpointed_stream in IDL.
Rename the last argument of migrate_receive from "remus" to
"checkpointed" since the semantics of this parameter have
changed.
NOTE:
libxl_domain_restore_params and domain_create aren't changed here,
checkpointed_stream is still an int. Because we will pass the
value from libxl to libxc.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Currently struct libxl__domain_suspend_state contains two types of state:
one is the save state, the other is the suspend state. This patch separates
those two out.
The motivation for this is that COLO will need to do suspend/resume
continuously, so we need a more common suspend state.
After this change, dss stands for libxl__domain_save_state,
dsps stands for libxl__domain_suspend_state.
Also introduce libxl__domain_suspend_init to initialise the
libxl__domain_suspend_state.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxl: move save/restore code into libxl_dom_save.c
This is purely code motion.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
After previous refactoring, we are now able to move all remus code
into a separate file libxl_remus.c.
Export following functions for internal use:
- setup/teardown Remus:
* libxl__remus_setup
* libxl__remus_teardown
* libxl__remus_restore_setup
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wen Congyang [Tue, 16 Feb 2016 03:41:16 +0000 (11:41 +0800)]
libxl/remus: init checkpoint callback in Remus setup callback
Init stream {read/write} state checkpoint_callback, suspend/resume/checkpoint
callback in Remus setup callback.
There's no functional change, it's just refactoring so that we can move
all remus code into one file.
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Bob Moore [Fri, 26 Feb 2016 11:37:18 +0000 (12:37 +0100)]
ACPICA / Headers: Add support for CSRT and DBG2 ACPI tables
These tables are defined outside of the ACPI specification.
Signed-off-by: Bob Moore <robert.moore@intel.com>
[Linux commit 4e2f9c278ad84196991fcf6f6646a3e15967fe90]
[only port the DBG2 changes] Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Fri, 26 Feb 2016 11:31:47 +0000 (12:31 +0100)]
build: consolidate CONFIG_HAS_ACPI and CONFIG_ACPI
No real advantage to keeping these separate. The use case of this from
Linux is when the platform or target board has support for something but
the user wants to be given the option to disable it.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Fri, 26 Feb 2016 11:30:11 +0000 (12:30 +0100)]
x86/hvm: add HVM_PARAM_X87_FIP_WIDTH
Add the HVM parameter HVM_PARAM_X87_FIP_WIDTH to allow tools and the guest
to adjust the width of the FIP/FDP registers to be saved/restored by
the hypervisor. This is in case the hypervisor heuristics do not do
the right thing.
Add this parameter to the set saved during domain save/migrate.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
David Vrabel [Fri, 26 Feb 2016 11:16:13 +0000 (12:16 +0100)]
x86/fpu: add a per-domain field to set the width of FIP/FDP
The x86 architecture allows either: a) the 64-bit FIP/FDP registers to
be restored (clearing FCS and FDS); or b) the 32-bit FIP/FDP and
FCS/FDS registers to be restored (clearing the upper 32-bits).
Add a per-domain field to indicate which of these options a guest
needs. The options are: 8, 4 or 0. Where 0 indicates that the
hypervisor should automatically guess the FIP width by checking the
value of FIP/FDP when saving the state (this is the existing
behaviour).
The FIP width is initially automatic but is set explicitly in the
following cases:
- 32-bit PV guest: 4
- Newer CPUs that do not save FCS/FDS: 8
The x87_fip_width field is placed into an existing 1 byte hole in
struct arch_domain.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Fix build.
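How the per-domain setting might feed into the save path can be sketched as below. The function name is hypothetical, and the automatic guess shown here (use the 64-bit form only when FIP or FDP has bits set above bit 31) is an assumption about the heuristic, which the commit message describes only as "checking the value of FIP/FDP".

```c
#include <stdint.h>

/*
 * Hypothetical sketch: configured is the per-domain x87_fip_width value
 * (8, 4, or 0 for automatic). In automatic mode, guess from the saved
 * pointer values (assumed heuristic, see lead-in).
 */
static unsigned int effective_fip_width(unsigned int configured,
                                        uint64_t fip, uint64_t fdp)
{
    if (configured == 4 || configured == 8)
        return configured;

    /* Automatic: any bit above bit 31 implies 64-bit pointers. */
    return ((fip | fdp) >> 32) ? 8 : 4;
}
```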
Jan Beulich [Fri, 26 Feb 2016 11:15:36 +0000 (12:15 +0100)]
vVMX: use latched VMCS machine address
Instead of calling domain_page_map_to_mfn() over and over, latch the
guest VMCS machine address unconditionally (i.e. independent of whether
VMCS shadowing is supported by the hardware).
Since this requires altering the parameters of __[gs]et_vmcs{,_real}()
(and hence all their callers) anyway, take the opportunity to also drop
the bogus double underscores from their names (and from
__[gs]et_vmcs_virtual() as well).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Liang Z Li <liang.z.li@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 26 Feb 2016 11:15:09 +0000 (12:15 +0100)]
x86emul: simplify IRET logic
Since we only handle real mode, we need to consider neither non-ring0
nor IOPL. Also for POPF the mode_iopl() check can really be inside the
not-ring-0 body.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 26 Feb 2016 11:14:39 +0000 (12:14 +0100)]
x86emul: limit-check branch targets
All branches need to #GP when their target violates the segment limit
(in 16- and 32-bit modes) or is non-canonical (in 64-bit mode). For
near branches facilitate this via a zero-byte instruction fetch from
the target address (resulting in address translation and validation
without an actual read from memory), while far branches get dealt with
by breaking up the segment register loading into a read-and-validate
part and a write one. The latter at once allows correcting some
ordering issues in how the individual emulation steps get carried out:
Before updating machine state, all exceptions unrelated to that state
updating should have got raised (i.e. the only ones possibly resulting
in partly updated state are faulting memory writes [pushes]).
Note that while not immediately needed here, write and distinct read
emulation routines get updated to deal with zero byte accesses too, for
overall consistency.
Reported-by: 刘令 <liuling-it@360.cn> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
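The validity rule for a near branch target can be sketched as follows. The function names are hypothetical, the canonical check assumes 48-bit virtual addresses, and the limit check covers only ordinary (expand-up) segments; the emulator itself performs these checks through a zero-byte instruction fetch rather than an open-coded test like this.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses (an assumption of this sketch), an
 * address is canonical when bits 63:47 are all zeros or all ones.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 bits 63:47 */

    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical check: in 64-bit mode the target must be canonical, in
 * 16- and 32-bit modes it must not exceed the segment limit. A false
 * result corresponds to raising #GP.
 */
static bool branch_target_ok(uint64_t target, bool long_mode,
                             uint32_t seg_limit)
{
    if (long_mode)
        return is_canonical_48(target);

    return target <= seg_limit;
}
```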