arm64: fix incorrect memory region size in TCR_EL2
The maximum and minimum values for TxSZ depend on the level of
translation, as per the AArch64 Virtual Memory System Architecture.
According to ARM specification DDI0487A_h (sec D4.2.2, page 1752),
the minimum TxSZ value is 16. If TxSZ is programmed to a value
smaller than 16, the behaviour is IMPLEMENTATION DEFINED.
This patch sets T0SZ to (64 - 48) = 16 instead of zero, since Xen
uses all 4 levels to cover a 48-bit (256 TB) virtual address space.
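As a worked example of the arithmetic, TxSZ encodes the region size as 64 minus the number of virtual address bits covered. The helper name and the TXSZ_MIN constant below are hypothetical and are not taken from the Xen source; this is only a sketch of the calculation.

```c
#include <assert.h>

/* Architectural minimum for TxSZ quoted in the commit message. */
#define TXSZ_MIN 16u

/*
 * Hypothetical helper (not Xen's actual code): derive the TxSZ field
 * value from the number of virtual address bits to be covered.
 */
static inline unsigned int txsz_for_va_bits(unsigned int va_bits)
{
    unsigned int txsz = 64u - va_bits;

    /* Values below the minimum are IMPLEMENTATION DEFINED, so reject them. */
    assert(txsz >= TXSZ_MIN);
    return txsz;
}
```

With 48 address bits the helper yields 16, the value this patch programs into T0SZ.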
Doug Goldstein [Wed, 16 Mar 2016 14:11:00 +0000 (09:11 -0500)]
tmem: drop direct usage of opt_tmem
Most callers of tmem_freeable_pages() checked opt_tmem before calling
it, but not all of them did. This seemed like an oversight, so to avoid
similar situations move the check into tmem_freeable_pages() itself.
Similarly, other places should not check opt_tmem directly but instead
use the tmem_enabled() helper function.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Razvan Cojocaru [Thu, 3 Mar 2016 13:58:00 +0000 (15:58 +0200)]
libxc: Have xc_translate_foreign_address() set errno properly
Currently it's possible for xc_translate_foreign_address() to fail
and errno still be set to success. This patch fixes the issue.
Based on the first half of Don Slutz' patch:
http://lists.xen.org/archives/html/xen-devel/2014-03/msg03720.html
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
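The rule that a failing call must never leave errno indicating success can be sketched as follows. The function, its table-based lookup and the specific errno values are assumptions made for the illustration; this is not libxc's implementation of xc_translate_foreign_address().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical lookup, standing in for a foreign address translation.
 * A return value of 0 means failure, and every failure path sets errno
 * before returning.
 */
static uint64_t translate_sketch(const uint64_t *table, size_t nr, size_t idx)
{
    assert(table != NULL);      /* passing no table is a caller bug */

    if ( idx >= nr )
    {
        errno = EINVAL;         /* index outside the table */
        return 0;
    }

    if ( table[idx] == 0 )
    {
        errno = EADDRNOTAVAIL;  /* entry present but nothing mapped */
        return 0;
    }

    return table[idx];
}
```

A caller can then rely on errno whenever the function returns 0.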
Ross Lagerwall [Tue, 27 Oct 2015 16:21:32 +0000 (16:21 +0000)]
elf: Add relocation types to elfstructs.h
GCC generates R_X86_64_64, R_X86_64_PC32, and R_X86_64_PLT32
relocations so those are the ones we need initially
to support xSplice.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Tue, 15 Mar 2016 11:22:04 +0000 (12:22 +0100)]
hvmloader: add high memory e820 region if needed
If the MMIO hole is large and hvmloader needs to relocate memory to
immediately above the 4 GiB boundary, the e820 presented to the guest
will not have a RAM region above 4 GiB.
e.g., a guest with 3 GiB of memory and a 2 GiB MMIO hole will only see
2 GiB.
The required e820 memory region above 4 GiB needs to be added, and not
just filled in.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
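The 3 GiB / 2 GiB example can be worked through with a small sketch. The structure and helper names are hypothetical, and the model assumes the MMIO hole always ends at the 4 GiB boundary; it is not hvmloader's e820 code.

```c
#include <assert.h>
#include <stdint.h>

#define GiB(x) ((uint64_t)(x) << 30)

struct ram_layout {
    uint64_t low_end;    /* end of RAM below the MMIO hole */
    uint64_t high_start; /* start of RAM above 4 GiB (0 if none) */
    uint64_t high_end;   /* end of RAM above 4 GiB (0 if none) */
};

/* Hypothetical helper: split guest RAM around an MMIO hole ending at 4 GiB. */
static struct ram_layout split_ram(uint64_t mem_size, uint64_t mmio_hole)
{
    struct ram_layout l = { 0, 0, 0 };
    uint64_t hole_start;

    assert(mmio_hole <= GiB(4));
    hole_start = GiB(4) - mmio_hole;

    if ( mem_size <= hole_start )
    {
        l.low_end = mem_size;           /* everything fits below the hole */
        return l;
    }

    l.low_end = hole_start;
    l.high_start = GiB(4);              /* relocated RAM needs its own region */
    l.high_end = GiB(4) + (mem_size - hole_start);
    return l;
}
```

For 3 GiB of memory and a 2 GiB hole this gives 2 GiB of low RAM plus a region from 4 GiB to 5 GiB. Without an e820 entry for that high region the guest sees only the low 2 GiB.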
Jan Beulich [Tue, 15 Mar 2016 11:21:38 +0000 (12:21 +0100)]
x86: move both exception tables into .rodata
While they are being written during early boot (when sorting them),
that writing takes place before we actually start fiddling with page
table permissions, so these tables can benefit from getting write
protected just like ordinary r/o data does (for now only when using
2M mappings).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 15 Mar 2016 11:21:04 +0000 (12:21 +0100)]
x86: partially revert use of 2M mappings for hypervisor image
As explained by Andrew in
http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01380.html
that change makes the uncompressed xen.gz image too large for certain
boot environments. As a result this change makes some of the effects of
commits cf393624ee ("x86: use 2M superpages for text/data/bss
mappings") and 53aa3dde17 ("x86: unilaterally remove .init mappings")
conditional, restoring alternative previous code where necessary. This
is so that xen.efi can still benefit from the new mechanisms, as it is
unaffected by said limitations.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Tue, 15 Mar 2016 11:19:13 +0000 (12:19 +0100)]
vmx: restore debug registers when injecting #DB traps
Commit a929bee0e652 ("x86/vmx: Fix injection of #DB traps following
XSA-156") prevents an infinite loop in certain #DB traps. However, it
changed the behavior to not call hvm_hw_inject_trap() for #DB and #AC
traps, which means that the debug registers are not restored
correctly, and it nullified commit b56ae5b48c38 ("VMX: fix/adjust trap
injection").
To fix this, restore the original code path through hvm_inject_trap(),
but ensure that the struct hvm_trap is populated with all the required
data.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
The change to the enum ordering broke this assumption and caused functional
problems for the nested hap code. As it may be error prone to audit and find
all other p2m_access users assuming bitmask semantics, instead restore the
previous enum order and make it explicit that bitmask semantics are to be
preserved for the read, write and execute access types.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
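A sketch of what bitmask semantics mean for such an enum, and of one way to make the assumption explicit. The enumerator and function names are illustrative, chosen for this example rather than copied from the Xen headers.

```c
#include <assert.h>

/* Illustrative access enum; values are chosen so that r, w and x act as bits. */
typedef enum {
    access_n   = 0,
    access_r   = 1,
    access_w   = 2,
    access_rw  = 3,
    access_x   = 4,
    access_rx  = 5,
    access_wx  = 6,
    access_rwx = 7
} access_sketch_t;

/* Make the bitmask assumption explicit so that a reordering breaks the build. */
_Static_assert((access_r | access_w) == access_rw, "rw must equal r|w");
_Static_assert((access_r | access_x) == access_rx, "rx must equal r|x");
_Static_assert((access_w | access_x) == access_wx, "wx must equal w|x");
_Static_assert((access_r | access_w | access_x) == access_rwx,
               "rwx must equal r|w|x");

/* Code relying on the semantics: does access a permit everything in b? */
static inline int access_permits(access_sketch_t a, access_sketch_t b)
{
    assert(b <= access_rwx);    /* only the r/w/x combinations are bitmasks */
    return (a & b) == b;
}
```

A check like access_permits() silently gives wrong answers once the enum order changes, which is the kind of breakage described above.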
Jan Beulich [Mon, 14 Mar 2016 12:02:56 +0000 (13:02 +0100)]
x86emul: fix 32-bit test build
Commit 5644ce0142 ("x86emul: relax asm() constraints") introduced a
64-bit only instruction suffix, which breaks running the emulator test
on a 32-bit system. Mirror __OS (and _OP for completeness) to the test
wrapper source file.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Quan Xu [Mon, 14 Mar 2016 11:59:39 +0000 (12:59 +0100)]
AMD IOMMU: fix an init time spinlock flaw
pcidevs_lock doesn't require interrupts to be disabled while being acquired.
However, there remains an exception in the AMD IOMMU code, where the lock is
acquired with interrupts disabled. This inconsistency might lead to deadlock.
The fix is straightforward: use spin_lock() instead. Interrupts are already
enabled when this function is invoked, so consistency around pcidevs_lock
is guaranteed after this fix.
Signed-off-by: Quan Xu <quan.xu@intel.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Tue, 8 Mar 2016 02:23:39 +0000 (20:23 -0600)]
libxl: ensure var is inited in libxl__domain_firmware
Some versions of GCC complain that the 'firmware' variable can be used
uninitialized. It looks like the switch inside of the else case is just
confusing GCC.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Thu, 10 Mar 2016 15:51:40 +0000 (16:51 +0100)]
sched: add Meng as RTDS maintainer
Meng Xu is one of the maintainers of the RT-Xen project,
which is from where the RTDS scheduler comes. He also
is the main author of the version of RTDS that we currently
have here upstream.
Since the upstreaming effort, he's continued looking after
the code, engaging with the community, coming and presenting
at XenSummits and, last but not least, doing development
himself as well as directing the work of others, with the
aim of improving the scheduler.
In summary, he has reached the point, from both the technical
and the community engagement points of view, where he can
effectively serve as a maintainer of the RTDS code.
David Vrabel [Thu, 10 Mar 2016 15:51:03 +0000 (16:51 +0100)]
x86: don't flush the whole cache when changing cachability
Introduce the FLUSH_VA_VALID flag to flush_area_mask() and friends to
say that it is safe to use CLFLUSH (i.e., the virtual address is still
valid).
Use this when changing the cachability of the Xen direct mappings (in
response to the guest changing the cachability of its mappings). This
significantly improves performance by avoiding an expensive WBINVD.
This fixes a performance regression introduced by c61a6f74f80eb36ed83a82f713db3143159b9009 (x86: enforce consistent
cachability of MMIO mappings), the fix for XSA-154.
e.g., A set_memory_wc() call in Linux:
before: 4097 us
after: 47 us
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 10 Mar 2016 15:50:30 +0000 (16:50 +0100)]
x86/shadow: avoid extra local array variable
mfns[2] was there just because struct sh_emulate_ctxt's two MFN values
can't be used to hand to vmap(). Making the structure fields an array
avoids the extra copying.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 10 Mar 2016 15:48:23 +0000 (16:48 +0100)]
x86/shadow: compile most write emulation code just once
No need to compile all of this code three times, as most of it really
is guest mode independent. The savings are between 3k and 4k of binary
code in my builds. For this to fully work out, the sh_gva_to_gfn()
calls are being replaced by indirect ones through the paging mode
table.
No functional change (i.e. only formatting and naming changes) except
for
- sh_emulate_map_dest()'s user mode check corrected for the PV case
(affecting debugging mode only, this isn't being split out)
- simplifying the vaddr argument to emulate_gva_to_mfn() for the second
part in the cross page write case
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Razvan Cojocaru [Thu, 10 Mar 2016 15:47:48 +0000 (16:47 +0100)]
x86/HVM: don't disable the REP emulation optimizations for regular IO
Currently REP emulation optimizations remain disabled even if
the emulation does not happen as a result of a vm_event reply
requesting emulation (i.e. even for regular IO). This patch takes
emulate_each_rep into account only if emulation has been requested
by a vm_event-capable application, and is a noticeable speed
optimization for monitored guests.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 10 Mar 2016 15:47:02 +0000 (16:47 +0100)]
x86/xstate: undo bogus adjustment to xsave()
This reverts an unintended change in commit 879b44b041 ("x86/fpu: add
a per-domain field to set the width of FIP/FDP"), which I had done
intermediately while fixing the build issue: After having reverted that
adjustment I must have forgotten to "git add" the adjustment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 10 Mar 2016 15:45:55 +0000 (16:45 +0100)]
tools/foreign: avoid using alignment directives when not appropriate
The foreign header generation blindly replaces 'uint64_t' with '__align8__
uint64_t', to get correct alignment when built as 32bit. This is correct in
most circumstances, but Clang objects to two specific uses.
* Inside a sizeof() expression
* As part of a typecast
An example error looks like:
/local/xen.git/tools/libxc/../../tools/include/xen/foreign/x86_64.h:204:44:
error: 'aligned' attribute ignored when parsing type [-Werror,-Wignored-attributes]
__align8__ uint64_t evtchn_mask[sizeof(__align8__ uint64_t) * 8];
^~~~~~~~~~
/local/xen.git/tools/libxc/../../tools/include/xen/foreign/x86_64.h:13:36:
note: expanded from macro '__align8__'
^~~~~~~~~~~
This sed expression is sufficient to fix all the bad examples without
touching any of the legitimate uses, and is simpler than teaching
mkheader.py how to parse C.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
I would like to propose Julien Grall (julien.grall@arm.com) as
co-maintainer for Xen on ARM. His track record of contributions to the
project is outstanding and I think speaks for itself.
Julien made multiple public presentations about Xen on ARM at Xen
Developer Summit and Linaro Connect conferences. He led the development
of several key features, such as SMMU support and non-PCI passthrough
and participated in the design and the review of large code
contributions from others.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 9 Mar 2016 15:52:31 +0000 (16:52 +0100)]
mm: fix page_list_* helpers to evaluate all their arguments
If an architecture does not provide a custom page_list_entry, default
page_list_* helpers are provided, wrapping list_head as an underlying type for
page_list_head.
The two declarations of the page_list_* helpers differ between defines and
static inline functions, where the defines discard some of their parameters.
This causes a compilation failure if CONFIG_BIGMEM and debug=n in p2m-pod.c:
p2m-pod.c: In function ‘p2m_pod_cache_add’:
p2m-pod.c:72:20: error: unused variable ‘d’ [-Werror=unused-variable]
struct domain *d = p2m->domain;
^
cc1: all warnings being treated as errors
because the use of d outside of the !NDEBUG section doesn't get evaluated as a
parameter by page_list_del().
Fix this by turning all #defines into static inline functions, so all
parameters are evaluated even if they are not used.
While editing this area, correct the return type of page_list_empty from int
to bool_t.
No functional change.
Reported-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
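The difference between the two helper styles can be sketched as follows. The types and helper names are invented for the illustration and are not Xen's page_list_* API.

```c
#include <assert.h>
#include <stddef.h>

struct page { struct page *next; };
struct page_list { struct page *head; };

/*
 * Macro form: the 'list' argument is discarded, so a variable used only
 * to compute it counts as unused once the preprocessor has run.
 */
#define page_unlink_macro(pg, list) ((pg)->next = NULL)

/*
 * Inline function form: both arguments are evaluated and type checked
 * at every call site, even though 'list' is not needed by the body.
 */
static inline void page_unlink_inline(struct page *pg, struct page_list *list)
{
    (void)list;          /* deliberately unused, but still evaluated by callers */
    assert(pg != NULL);
    pg->next = NULL;
}
```

With the macro, a local variable passed only as 'list' disappears from the preprocessed code and can trigger -Werror=unused-variable. With the inline function it remains a real use.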
Andrew Cooper [Wed, 9 Mar 2016 15:51:50 +0000 (16:51 +0100)]
mm: introduce arch_free_heap_page()
common/page_alloc.c references d->arch.relmem_list, which only exists on x86.
This only compiles on ARM because page_list_del2() discards its second
argument.
Introduce a new common arch_free_heap_page() which only uses common lists in
struct domain, and allow an architecture to override this with a custom
alternative. x86 then provides a custom arch_free_heap_page() which takes
care of managing d->arch.relmem_list.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
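One common way to express "a common default that an architecture may override" is an #ifndef guard around the default. The sketch below assumes that pattern and uses made-up structure and helper names; the actual mechanism in the patch may differ.

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch { int on_list; };
struct domain_sketch { unsigned int nr_pages; };

/* Common default: only touches fields every architecture has. */
static inline void common_free_heap_page(struct domain_sketch *d,
                                         struct page_sketch *pg)
{
    assert(d != NULL && pg != NULL && d->nr_pages > 0);
    pg->on_list = 0;
    d->nr_pages--;
}

/*
 * An architecture header included earlier may define arch_free_heap_page
 * itself; the common default is used only when it has not.
 */
#ifndef arch_free_heap_page
#define arch_free_heap_page(d, pg) common_free_heap_page(d, pg)
#endif
```

Common code then calls arch_free_heap_page() unconditionally and never needs to name architecture-specific fields.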
Juergen Gross [Wed, 9 Mar 2016 15:50:29 +0000 (16:50 +0100)]
domctl: add force flag to xen_domctl_vcpuaffinity for undoing pin override
Add a XEN_VCPUAFFINITY_FORCE flag to the xen_domctl_vcpuaffinity
structure. It allows a SCHEDOP_pin_override to be undone when a driver
in the hardware domain has erroneously failed to issue the expected
SCHEDOP_pin_override with cpu < 0, which would normally perform the
undo operation.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 9 Mar 2016 15:49:59 +0000 (16:49 +0100)]
sched: add hypercall option to override and restore vcpu affinity
Some hardware (e.g. Dell studio 1555 laptops) requires SMIs to be
called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
to achieve this by pinning the running thread to cpu 0, but in Dom0
this is not enough: the vcpu must be pinned to physical cpu 0 via
Xen, too.
Add a stable hypercall option SCHEDOP_pin_override to the sched_op
hypercall to achieve this. It takes a physical cpu number as its
parameter. If pinning is possible (the calling domain has the
privilege to make the call and the cpu is available in the domain's
cpupool), the calling vcpu is pinned to the specified cpu and the old
cpu affinity is saved. To undo the override pinning, a negative cpu
value is specified. This restores the original cpu affinity of
the vcpu.
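The override and restore semantics can be modelled with a small state sketch. The structure, the function name, the bitmask representation of affinity and the -EINVAL return value are all assumptions made for the illustration. The privilege check is left out, and this is not the hypervisor's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified vcpu state; affinity is a bitmask of physical cpus. */
struct vcpu_sketch {
    unsigned long affinity;        /* current affinity mask */
    unsigned long saved_affinity;  /* affinity saved while overridden */
    bool overridden;
};

/*
 * Hypothetical model: cpu >= 0 pins to that cpu and saves the old
 * affinity, cpu < 0 restores it. 'pool_mask' stands in for the cpus
 * available in the domain's cpupool.
 */
static int pin_override(struct vcpu_sketch *v, int cpu, unsigned long pool_mask)
{
    assert(v != NULL);

    if ( cpu < 0 )
    {
        if ( v->overridden )
        {
            v->affinity = v->saved_affinity;
            v->overridden = false;
        }
        return 0;
    }

    if ( cpu >= (int)(sizeof(unsigned long) * 8) ||
         !(pool_mask & (1UL << cpu)) )
        return -EINVAL;                   /* cpu not available to this domain */

    if ( !v->overridden )
    {
        v->saved_affinity = v->affinity;  /* keep the original, not a prior pin */
        v->overridden = true;
    }
    v->affinity = 1UL << cpu;
    return 0;
}
```

Saving the affinity only on the first override means that repeated pins followed by one undo still return the vcpu to its original affinity.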
Juergen Gross [Wed, 9 Mar 2016 15:44:04 +0000 (16:44 +0100)]
cpupool: correct error handling when removing cpu from cpupool
When schedule_cpu_switch(), called from cpupool_unassign_cpu_helper(),
returns an error, the domlist_read_lock isn't released again.
As cpu_disable_scheduler() might have changed the affinity of some
domains, domain_update_node_affinity() must be called for all domains
in the cpupool even in the error case.
Although it looks odd, it is okay to leave the to-be-removed cpu set in
cpupool_free_cpus when cpu_disable_scheduler() returns an error. Add a
comment explaining the reason for this.
Andrew Cooper [Mon, 7 Mar 2016 16:46:25 +0000 (17:46 +0100)]
x86/vPMU: do not clobber IA32_MISC_ENABLE
The VMX RDMSR intercept for MSR_IA32_MISC_ENABLE falls through into
vpmu_do_rdmsr(), so that core2_vpmu_do_rdmsr() may play with the BTS and PEBS
UNAVAIL bits.
Some 64-bit Windows versions include IA32_MISC_ENABLE in the set of items checked by
PatchGuard, and will suffer a BSOD 0x109 CRITICAL_STRUCTURE_CORRUPTION if the
contents change on migrate.
The vPMU infrastructure should not clobber IA32_MISC_ENABLE at all.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 7 Mar 2016 16:46:03 +0000 (17:46 +0100)]
hvmloader: use xen/errno.h rather than the host system's errno.h
hvmloader is unhosted, and shouldn't use the system errno.h. It already has
to use Xen's errno.h for other hypercalls. The use of public/io/xs_wire.h
requires the use of un-prefixed errno values.
This fixes the build on stricter toolchains where requesting -fno-builtin does
reduce the include path as much as it can.
Reported-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 7 Mar 2016 16:45:13 +0000 (17:45 +0100)]
public/errno: Reduce complexity of inclusion
The inclusion conditions for errno.h were unnecessarily complicated, and
required the includer to jump through hoops if they wished to avoid getting
multiple namespaces' worth of constants.
Simplify the logic, and document what is going on.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Merge branch 'tracing/sched-events-improvements' of git://xenbits.xen.org/people/dariof/xen into staging
* 'tracing/sched-events-improvements' of git://xenbits.xen.org/people/dariof/xen:
xenalyze: handle Credit2 scheduler events
xenalyze: handle Credit1 scheduler events
xenalyze: handle scheduling events
xentrace: formats: add domain create and destroy events.
xentrace: formats: add events from RTDS scheduler
xentrace: formats: add events from Credit2 scheduler
xentrace: formats: add events from Credit scheduler
xentrace: formats: update format of scheduling events
On the mailing list all patches have: Acked-by: George Dunlap <george.dunlap@citrix.com>
and have been Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
so in they go.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Doug Goldstein [Fri, 4 Mar 2016 20:09:48 +0000 (14:09 -0600)]
travis: actually disable debug for non-debug
Non-debug builds need to explicitly disable debug due to debug being
defaulted to y in Config.mk
(Xen keeps debug=y in the staging branch. When Xen is in rc-X stage
the debug is altered so that the builds don't have the debug
option enabled by default).
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Doug Goldstein [Fri, 4 Mar 2016 20:09:47 +0000 (14:09 -0600)]
travis: use matching C++ for GCC version
When we use GCC 5.x, we need to install the C++ compiler and the C
compiler together because QEMU tests for feature flags against the C
compiler and assumes the C++ compiler has them. We also have to
ensure that GCC C++ is used. We have to modify the CXX variable
in two steps to support the older versions of bash in use by the
test machines. While we're at it, simplify how we select our compiler.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Doug Goldstein [Fri, 4 Mar 2016 20:09:46 +0000 (14:09 -0600)]
travis: skip building coverity, smoke, and master
Skip building of the coverity, smoke, stable, and master branches since
they just fast forward from staging.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Dario Faggioli [Sat, 6 Feb 2016 01:26:04 +0000 (02:26 +0100)]
xenalyze: handle Credit2 scheduler events
so the trace will show properly decoded info,
rather than just a bunch of hex codes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Changes from v1:
* '} * r =' turned into '} *r =', as requested
during review.
Dario Faggioli [Sat, 6 Feb 2016 01:25:56 +0000 (02:25 +0100)]
xenalyze: handle Credit1 scheduler events
so the trace will show properly decoded info,
rather than just a bunch of hex codes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* '} * r =' turned into '} *r =', as requested
during review.
Dario Faggioli [Sat, 6 Feb 2016 01:25:45 +0000 (02:25 +0100)]
xenalyze: handle scheduling events
so the trace will show properly decoded info,
rather than just a bunch of hex codes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* SCHED_DOM_{ADD,REM} handling slightly changed, to avoid
confusion with DOM0_DOM_{ADD,REM} (introduced later in
the series);
* '} * r =' turned into '} *r =', as requested
during review.
Dario Faggioli [Tue, 16 Feb 2016 12:13:47 +0000 (13:13 +0100)]
xentrace: formats: add domain create and destroy events.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Changes from v2:
* new patch in the series.
Dario Faggioli [Sat, 6 Feb 2016 01:25:16 +0000 (02:25 +0100)]
xentrace: formats: add events from Credit2 scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* fix typo in two events (rq_idx/rq_id), as requested during
review.
Dario Faggioli [Sat, 6 Feb 2016 01:25:04 +0000 (02:25 +0100)]
xentrace: formats: add events from Credit scheduler
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
Dario Faggioli [Sat, 6 Feb 2016 01:24:52 +0000 (02:24 +0100)]
xentrace: formats: update format of scheduling events
to include the vcpu IDs, in a way that matches
how the "dom:vcpu" couple is displayed in other
events (runstate changes).
Also add the trace for TRC_SCHED_SHUTDOWN_CODE which
was missing and was done via SCHEDOP_shutdown_code hypercall.
(TRC_SCHED_SHUTDOWN trace was present).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--- Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de>
---
Changes from v1:
* enhanced changelog, as suggested during review.
Shannon Zhao [Fri, 4 Mar 2016 15:45:52 +0000 (16:45 +0100)]
arm/timer: fix panic when booting with DT
To support ACPI, patch "arm/acpi: Parse GTDT to initialize timer"
refactored the functions preinit_xen_time and init_xen_time. But it
wrongly moved the platform_get_irq call from init_xen_time to
preinit_dt_xen_time, which causes a boot failure.
Move platform_get_irq back to init_xen_time to fix it.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 4 Mar 2016 13:15:53 +0000 (14:15 +0100)]
x86/HVM: limit flushing on cache attribute pinning adjustments
Avoid cache flush on EPT when removing a UC- range, since when used
this type gets converted to UC anyway (there's no UC- among the types
valid in MTRRs and hence EPT's emt field).
We might further want to consider only forcing write buffer flushes
when removing WC ranges.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 4 Mar 2016 13:14:25 +0000 (14:14 +0100)]
x86/HVM: remove unnecessary indirection from hvm_get_mem_pinned_cacheattr()
Its return value can easily serve the purpose. We cannot, however,
return unspecific "success" anymore for a domain of the wrong type -
since no caller exists that would call this for PV domains, simply add
an ASSERT().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 4 Mar 2016 13:12:11 +0000 (14:12 +0100)]
x86/HVM: honor cache attribute pinning for RAM only
Call hvm_get_mem_pinned_cacheattr() for RAM ranges only, and only when
the guest has a physical device assigned: XEN_DOMCTL_pin_mem_cacheattr
is documented to be intended for RAM only.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Paul Durrant [Fri, 4 Mar 2016 13:08:38 +0000 (14:08 +0100)]
public/io/netif.h: make control ring hash protocol more general
This patch modifies the control ring protocol (of which there is
not yet an implementation) to make it more general. Most of the
concepts are not limited to toeplitz hashing so it's best not to
make them unnecessarily specific.
Apart from changing the names of various definitions and modifying
comments, this patch:
- Adds a new control message type to select a hash algorithm.
- Adds a reference implementation of the toeplitz hash.
- Changes the 'toeplitz' extra info fragment into a 'hash' extra
info fragment and replaces the octet of padding with the index of
the algorithm that was used to create the hash value.
- Relaxes the restriction that the mapping table has to be
power-of-2 sized.
The patch also fixes a few spelling typos noticed along the way.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Shannon Zhao [Wed, 2 Mar 2016 07:38:00 +0000 (08:38 +0100)]
arm/acpi: Add acpi parameter to enable/disable acpi
Define new command line parameter "acpi" to enable/disable acpi.
This implements the following policy to decide whether ACPI should be
used to boot the system:
- acpi=off: ACPI will not be used to boot the system, even if there is
no alternative available (e.g., device tree is empty)
- acpi=force: only ACPI will be used to boot the system; if that fails,
there will be no fallback to alternative methods (such as device tree)
- otherwise, ACPI will be used as a fallback if the device tree turns
out to lack a platform description; the heuristic to decide this is
whether /chosen is the only node present at depth 1
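The three-way policy above can be sketched as a single decision function. The function name and the convention of passing an empty string when the option is absent are assumptions for this example, not the parser used by Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical decision helper modelling the policy.
 * 'param' is the value of the "acpi" option, or "" when it was not given.
 * 'dt_has_platform' says whether the device tree describes the platform,
 * i.e. /chosen is not the only node present at depth 1.
 */
static bool should_use_acpi(const char *param, bool dt_has_platform)
{
    assert(param != NULL);

    if ( strcmp(param, "off") == 0 )
        return false;               /* never use ACPI, even with an empty DT */

    if ( strcmp(param, "force") == 0 )
        return true;                /* ACPI only, no fallback to device tree */

    return !dt_has_platform;        /* otherwise ACPI is only a fallback */
}
```

Only the default case consults the device tree heuristic; both explicit settings ignore it.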
Shannon Zhao [Wed, 2 Mar 2016 07:40:00 +0000 (08:40 +0100)]
arm/acpi: Parse GTDT to initialize timer
Parse the GTDT (Generic Timer Descriptor Table) to initialize the
timer, using the information it presents to set up the arch timer
(not memory-mapped).
Shannon Zhao [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/gic: Add ACPI support for GIC preinit
Since ACPI 6.0 defines that the GIC Distributor Structure contains the
GIC version field, the GIC version can be obtained from it. Then call
the ACPI device initialization function to preinit the GIC device.
Parth Dixit [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/gic-v2: Add ACPI boot support for GICv2
ACPI on the Xen hypervisor uses the MADT table for proper GIC initialization.
First get the GIC version from the GIC Distributor. Then parse the GIC related
subtables, collect the CPU interface and distributor addresses, and call the
driver initialization function (which is hardware abstraction agnostic).
FDT initializes GICv2 in a similar way.
Shannon Zhao [Wed, 2 Mar 2016 07:39:00 +0000 (08:39 +0100)]
arm/acpi: Add ACPI support for SMP initialization
ACPI 5.1 only has two explicit methods to boot up SMP, PSCI and the Parking
protocol, but the Parking protocol is currently only specified for ARMv7, so
make PSCI the only SMP boot protocol until the ACPI spec or the Parking
protocol spec is updated.
ACPI only supports PSCI 0.2+, since prior to PSCI 0.2 function IDs are
not well-defined.
Parth Dixit [Wed, 2 Mar 2016 07:35:00 +0000 (08:35 +0100)]
arm/acpi: Parse MADT to map logical cpu to MPIDR and get cpu_possible_map
The MADT contains the MPIDR information that is essential for SMP
initialization. Parse the GIC cpu interface structures to get the MPIDR
value and map it to cpu_logical_map(), and add enabled cpus with a valid
MPIDR to cpu_possible_map.
Move BAD_MADT_ENTRY to common place, parenthesize its parameters and
drop the pointer cast.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Naresh Bhat <naresh.bhat@linaro.org> Signed-off-by: Parth Dixit <parth.dixit@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Shannon Zhao [Wed, 2 Mar 2016 07:43:00 +0000 (08:43 +0100)]
arm/acpi: Parse FADT table and get PSCI flags
There are two flags: PSCI_COMPLIANT and PSCI_USE_HVC. When set, the
former signals to the OS that the hardware is PSCI compliant. The latter
selects the appropriate conduit for PSCI calls by toggling between
Hypervisor Calls (HVC) and Secure Monitor Calls (SMC). The FADT table
contains this information, so parse the FADT to get the flags for future
use.
Since the STAO table and the GIC version are introduced by ACPI 6.0, we
check the version and only parse an FADT table with version >= 6.0. If
firmware provides ACPI tables with an ACPI version less than 6.0, the OS
would be confused by that information, so disable ACPI if we get an FADT
table with version less than 6.0.
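The version gate and the flag handling can be sketched together. The structure layout, the bit positions and all names below are assumptions made for the illustration; the real FADT layout and flag definitions should be taken from the ACPI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions assumed for illustration; check the ACPI spec before reuse. */
#define SKETCH_PSCI_COMPLIANT (1u << 0)
#define SKETCH_PSCI_USE_HVC   (1u << 1)

/* Only the FADT fields this sketch needs; not the real table layout. */
struct fadt_sketch {
    uint8_t  major_version;
    uint16_t arm_boot_flags;
};

enum psci_conduit { PSCI_NONE, PSCI_SMC, PSCI_HVC };

/*
 * Returns false when ACPI should be disabled (FADT older than 6.0).
 * Otherwise stores the conduit selected by the PSCI flags in *conduit.
 */
static bool parse_fadt_sketch(const struct fadt_sketch *fadt,
                              enum psci_conduit *conduit)
{
    assert(fadt != NULL && conduit != NULL);

    if ( fadt->major_version < 6 )
        return false;

    if ( !(fadt->arm_boot_flags & SKETCH_PSCI_COMPLIANT) )
        *conduit = PSCI_NONE;
    else if ( fadt->arm_boot_flags & SKETCH_PSCI_USE_HVC )
        *conduit = PSCI_HVC;
    else
        *conduit = PSCI_SMC;

    return true;
}
```

The USE_HVC bit is only consulted when the COMPLIANT bit is set, matching the description of the second flag as a toggle between the two conduits.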
Shannon Zhao [Wed, 2 Mar 2016 07:37:00 +0000 (08:37 +0100)]
arm/acpi: Add basic ACPI initialization
acpi_boot_table_init() will be called in start_xen to get the RSDP and
all the table pointers. With this patch, we can get ACPI boot-time
tables from firmware on ARM64.
Juergen Gross [Thu, 3 Mar 2016 07:55:30 +0000 (08:55 +0100)]
silence affinity messages on suspend/resume
When taking cpus offline for suspend or bringing them online again on
resume, the scheduler might issue debug messages when temporarily
breaking vcpu affinity or restoring the original affinity settings.
The resume message can be removed completely, while the message when
breaking affinity should only be issued if the breakage is permanent.
Yang Hongyang [Wed, 2 Mar 2016 03:44:50 +0000 (11:44 +0800)]
Remus: update email address in MAINTAINERS file
Signed-off-by: Yang Hongyang <imhy.yang@gmail.com> Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Fri, 19 Feb 2016 15:13:17 +0000 (09:13 -0600)]
travis: add IRC notifications
This will cause failed builds, and builds that flip back to success, to
be reported to #xentest on FreeNode. The syntax of the message will
be:
<travis-ci> xen-project/xen#BUILDID (BRANCH - REVISION : COMMITTER)
<travis-ci> Change view : https://github.com/xen-project/xen/compare/RANGE
<travis-ci> Build details : https://travis-ci.org/xen-project/xen/builds/BUILDID
The blob was generated with the following command:
travis encrypt -r xen-project/xen 'chat.freenode.net#xentest'
The reason it is encrypted is to prevent people who fork the repo from
spamming #xentest. This value will only properly decrypt when running within
the xen-project/xen space.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Doug Goldstein [Fri, 19 Feb 2016 02:57:04 +0000 (20:57 -0600)]
m4/python: fix checks for Python library support
AC_CHECK_LIB() was running gcc -Llib -lm -lutils conftest.c, which on
platforms that link with --as-needed by default results in
underlinking. Instead, the AC_CHECK_LIB() documentation suggests supplying
the extra libraries in a 5th argument.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Liang Li [Mon, 11 Jan 2016 08:52:10 +0000 (16:52 +0800)]
libxc: Expose the MPX cpuid flag to guest
If the hardware supports the Memory Protection Extensions (MPX), expose
this feature to the guest by default. Users don't have to use a 'cpuid= '
option in the config file to turn it on.
Signed-off-by: Liang Li <liang.z.li@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Wed, 24 Feb 2016 15:03:29 +0000 (09:03 -0600)]
tools/configure: only require bcc/ld86/as86 when needed
bcc/ld86/as86 are necessary when we build ROMBIOS. However if we do not
build it (and are not building qemu-trad), the build requirements are
overly strict and can lead to failures.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Campbell [Wed, 17 Feb 2016 10:34:24 +0000 (10:34 +0000)]
xl: NULL terminate buf when reading dom0 /proc/uptime
The contents of /proc/uptime are typically something like "80164.57
640617.58", so the existing 512 byte buffer is more than large enough.
Reduce its effective size to 511 bytes and ensure we include a
terminating NUL.
Otherwise Coverity points out that we pass a potentially unterminated
string to strtok. In practice this likely doesn't actually cause
issues (at least on Linux) because the string should always contain a
space so we will stop parsing.
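As an illustration of the pattern described above, here is a minimal sketch that reads at most 511 bytes into a 512 byte buffer and terminates it before calling strtok. The helper name read_uptime() and its path parameter are hypothetical; this is not the xl code itself.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the first field of an uptime-style file, or -1.0 on error. */
static double read_uptime(const char *path)
{
    char buf[512];
    char *tok;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1.0;

    /* Read at most sizeof(buf) - 1 bytes so one byte is left for the NUL. */
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return -1.0;
    buf[n] = '\0';

    /* buf is now a valid C string, so strtok cannot run off its end. */
    tok = strtok(buf, " ");
    return tok ? strtod(tok, NULL) : -1.0;
}
```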
CID: 105590
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Feng Wu [Tue, 1 Mar 2016 13:42:13 +0000 (14:42 +0100)]
vmx: VT-d posted-interrupt core logic handling
This is the core logic handling for VT-d posted-interrupts. Basically it
deals with how and when to update posted-interrupts during the following
scenarios:
- vCPU is preempted
- vCPU is slept
- vCPU is blocked
When vCPU is preempted/slept, we update the posted-interrupts during
scheduling by introducing two new architectural scheduler hooks:
vmx_pi_switch_from() and vmx_pi_switch_to(). When vCPU is blocked, we
introduce a new architectural hook: arch_vcpu_block() to update
the posted-interrupts descriptor.
Besides that, before VM-entry, we make sure the 'NV' field is set
to 'posted_intr_vector' and the vCPU is not in any blocking lists, which
is needed when the vCPU is running in non-root mode. We do this check
because we change the posted-interrupts descriptor in vcpu_block() but
don't change it back in vcpu_unblock() or when vcpu_block() returns
directly due to event delivery (in fact, we don't need to do it
in those two places, which is why we do it before VM-entry).
When we handle the lazy context switch for the following two scenarios:
- Preempted by a tasklet, which runs in an idle context.
- The prev vcpu is offline and there are no new available vcpus in the
  run queue.
we don't change the 'SN' bit in the posted-interrupt descriptor. This
may incur spurious PI notification events, but a PI notification
event is only sent when 'ON' is clear, and once the PI notification
is sent, 'ON' is set by hardware, so no more notification events are
sent before 'ON' is cleared again. Besides that, spurious PI notification
events are going to happen from time to time in the Xen hypervisor anyway:
for example, when a guest traps to Xen and a PI notification event
happens, there is nothing Xen actually needs to do about it, and the
interrupts will be delivered to the guest the next time we do a VMENTRY.
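The 'ON'/'SN' argument above can be made concrete with a toy software model. It is only a sketch of the behaviour described in this message: struct pi_desc_model and both helpers are hypothetical, and in real hardware these steps are performed atomically by the VT-d unit rather than by C code.

```c
#include <stdbool.h>

/* Toy model of the posted-interrupt descriptor bits discussed above. */
struct pi_desc_model {
    bool on;          /* Outstanding Notification */
    bool sn;          /* Suppress Notification */
    unsigned pending; /* number of interrupts posted but not yet delivered */
};

/* Post one interrupt; returns true if a notification event is sent. */
static bool post_interrupt(struct pi_desc_model *d)
{
    d->pending++;
    if (d->on || d->sn)
        return false; /* already notified, or notifications suppressed */
    d->on = true;     /* set by "hardware" when the notification is sent */
    return true;
}

/* Deliver pending interrupts at VM entry; returns how many were delivered. */
static unsigned sync_on_vmentry(struct pi_desc_model *d)
{
    unsigned n = d->pending;

    d->pending = 0;
    d->on = false;
    return n;
}
```

In this model a burst of posted interrupts produces at most one notification until the next sync, which is why leaving 'SN' clear costs at most one spurious event per burst.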
Suggested-by: Yang Zhang <yang.z.zhang@intel.com> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com> Suggested-by: George Dunlap <george.dunlap@citrix.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Ian Campbell [Wed, 17 Feb 2016 10:39:40 +0000 (10:39 +0000)]
xl: close nullfd after dup2'ing it to stdin
We assert that nullfd is not std{in,out,err} since that would result
in closing one of the just dup2'd fds. For this to happen
std{in,out,err} would have needed to be closed, at which point all
sorts of other things could go wrong.
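A minimal sketch of the pattern, assuming a POSIX environment; stdin_to_devnull() is a hypothetical helper for illustration, not the function changed in xl:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Redirect stdin to /dev/null; returns 0 on success, -1 on error. */
static int stdin_to_devnull(void)
{
    int nullfd = open("/dev/null", O_RDONLY);

    if (nullfd < 0)
        return -1;

    /* If nullfd were 0, 1 or 2, the close() below would close a
     * standard stream that was just set up. */
    assert(nullfd > STDERR_FILENO);

    if (dup2(nullfd, STDIN_FILENO) < 0) {
        close(nullfd);
        return -1;
    }

    /* stdin now refers to /dev/null, so the original fd can be closed. */
    close(nullfd);
    return 0;
}
```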
Haozhong Zhang [Tue, 1 Mar 2016 13:38:22 +0000 (14:38 +0100)]
x86/hvm: move saving/loading vcpu's TSC to common code
Both VMX and SVM save/load vcpu's TSC when saving/loading vcpu's
context, so this patch moves saving/loading vcpu's TSC to the common
functions hvm_[save|load]_cpu_ctxt().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>