xenbits.xensource.com Git - xen.git/log
Andrew Cooper [Wed, 3 Aug 2016 16:56:56 +0000 (16:56 +0000)]
x86/traps: Drop use_error_code parameter from do_{,guest_}trap()

Whether or not an error code is needed can be determined entirely from the
trapnr parameter, as error codes are architecturally specified.

Introduce TRAP_HAVE_EC as a bitmap of reserved vectors which have error codes,
and drop the use_error_code argument from all callsites.

As a result, the DO_ERROR{,_NOCODE}() macros become entirely superfluous and
can be dropped.  Update the exception_table to point straight at do_trap().
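As an illustration of the idea, the sketch below derives the "has an error code" property from the vector number alone. It assumes the classic set of x86 vectors that push an error code per the Intel SDM (#DF, #TS, #NP, #SS, #GP, #PF, #AC). The macro values are written out for self-containment and the helper name trap_has_error_code() is hypothetical, not Xen's; newer architectural extensions add further vectors that are not listed here.

```c
#include <stdbool.h>

/* Architectural vectors that push an error code (classic set only). */
#define TRAP_double_fault     8
#define TRAP_invalid_tss     10
#define TRAP_no_segment      11
#define TRAP_stack_error     12
#define TRAP_gp_fault        13
#define TRAP_page_fault      14
#define TRAP_alignment_check 17

/* One bit per reserved vector (0-31) that carries an error code. */
#define TRAP_HAVE_EC                                          \
    ((1u << TRAP_double_fault) | (1u << TRAP_invalid_tss) |   \
     (1u << TRAP_no_segment)   | (1u << TRAP_stack_error) |   \
     (1u << TRAP_gp_fault)     | (1u << TRAP_page_fault)  |   \
     (1u << TRAP_alignment_check))

/* Replaces a caller-supplied use_error_code flag: the answer depends
 * only on the vector number. The range check avoids an oversized shift. */
static bool trap_has_error_code(unsigned int trapnr)
{
    return trapnr < 32 && (TRAP_HAVE_EC & (1u << trapnr));
}
```

With the property derived this way, callers no longer need to pass a flag that could disagree with the architecture.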

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 8 Aug 2016 10:21:31 +0000 (11:21 +0100)]
libxl: CODING_STYLE: Forbid if (...) { stmt; } else stmt;

And clarify that the rule about omitting braces for single statements
is optional (it is even contradicted by the example).

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:55:59 +0000 (10:55 +0100)]
xl: use xenconsole startup protocol

If the user asks xl to automatically connect to the console when creating
a guest, use the new startup protocol before trying to unpause the domain
so that we don't lose any console output.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:36:57 +0000 (10:36 +0100)]
docs: document xenconsole startup protocol

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:28:00 +0000 (10:28 +0100)]
libxl: libxl_{primary_,}console_exec now take notify_fd argument

The new argument will be passed down to the xenconsole process, which
then uses it to notify readiness.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 11:20:09 +0000 (12:20 +0100)]
libxl: factor out libxl__console_tty_path

No other user yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Fri, 29 Jul 2016 17:24:25 +0000 (18:24 +0100)]
tools/console: introduce --start-notify-fd option for console client

The console client will write 0x00 to that fd before entering the console
loop to indicate its readiness.
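Here is a minimal sketch of the readiness handshake, assuming a plain pipe between the launcher and the console client. The helper names notify_ready() and wait_ready() are hypothetical. Only the single 0x00 byte written before the console loop comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Client side: report readiness by writing one 0x00 byte to the notify fd. */
static int notify_ready(int fd)
{
    const unsigned char msg = 0x00;
    ssize_t r;

    do {
        r = write(fd, &msg, 1);
    } while (r < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return r == 1 ? 0 : -1;
}

/* Launcher side: block until the client has reported readiness. */
static int wait_ready(int fd)
{
    unsigned char msg;
    ssize_t r;

    do {
        r = read(fd, &msg, 1);
    } while (r < 0 && errno == EINTR);
    return (r == 1 && msg == 0x00) ? 0 : -1;   /* EOF or wrong byte: failure */
}
```

For xl, the write end would be passed to xenconsole via --start-notify-fd, and the guest would be unpaused only after wait_ready() succeeds.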

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Fri, 29 Jul 2016 17:22:26 +0000 (18:22 +0100)]
tools/console: fix help string in client

There is no short '-t' option.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Mon, 8 Aug 2016 10:07:46 +0000 (11:07 +0100)]
CODING_STYLE: Allow single-sentence comments without full stops

One of the common ways in which contributors trip up over the
CODING_STYLE guides is by not putting a full stop at the end of a
comment when there is only a single sentence.  Calling these out is a
waste of everybody's time: The full stop at the end of a comment with
a single sentence (or a single phrase) adds absolutely nothing to the
legibility of the code.

Modify CODING_STYLE to allow comments with a single sentence or
sentence fragment to either have a full stop or not, while making it
clear that comments with multiple sentences must have a full stop at
the end of each sentence.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@citrix.com>
Dario Faggioli [Thu, 4 Aug 2016 08:59:03 +0000 (10:59 +0200)]
tools: xenalyze: kill spurious sched_switch output in non dump mode.

In fact, 52cf096df7 ("xenalyze: handle scheduling event"),
when dealing with TRC_SCHED_SWITCH, forgot to check whether
we actually are in dump mode, causing the printf() in
dump_sched_switch() to always produce its output, which
is not what we want.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 5 Aug 2016 16:00:45 +0000 (18:00 +0200)]
x86/time: also use rdtsc_ordered() in check_tsc_warp()

This really was meant to be added in a v2 of what became commit
fa74e70500 ("x86/time: introduce and use rdtsc_ordered()").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Aug 2016 15:59:32 +0000 (17:59 +0200)]
libelf: drop struct elf_dom_parms' virt_offset member

It's being used solely by elf_xen_addr_calc_check(), and hence can be
a local variable there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Bob Liu [Thu, 4 Aug 2016 01:07:56 +0000 (09:07 +0800)]
libxl: return any serial tty path in libxl_console_get_tty

When specifying a serial list in domain config, users of
libxl_console_get_tty cannot get the tty path of a second specified pty serial,
since right now it always returns the tty path of serial 0.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:47 +0000 (18:10 +0200)]
tools: make xenstore domain easy configurable

Add configuration entries to sysconfig.xencommons for selection of the
xenstore type (domain or daemon) and start the selected xenstore
service via a script called from sysvinit or systemd.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:46 +0000 (18:10 +0200)]
tools: use pidfile for test if xenstored is running

Instead of trying to read xenstore via xenstore-read, use the pidfile
of xenstored to test whether xenstored is running. This prepares for
supporting a xenstore domain, as trying to read xenstore will block
forever if the xenstore domain is started after the read attempt.
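The change itself is in the init scripts. The C sketch below shows the same kind of check, assuming the pidfile holds a single decimal PID. The function names write_pidfile() and daemon_running() are hypothetical. kill() with signal 0 performs only the existence and permission checks, so unlike a xenstore read it cannot block.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Daemon side: record our own PID so that scripts can probe for it. */
static int write_pidfile(const char *pidfile)
{
    FILE *f = fopen(pidfile, "w");

    if (!f)
        return -1;
    fprintf(f, "%ld\n", (long)getpid());
    return fclose(f) == 0 ? 0 : -1;
}

/* Return true if the PID recorded in pidfile refers to a live process. */
static bool daemon_running(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid;
    int n;

    if (!f)
        return false;                 /* no pidfile: treat as not running */
    n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return false;                 /* unreadable or bogus content */
    /* Signal 0 sends nothing; EPERM still means the process exists. */
    return kill((pid_t)pid, 0) == 0 || errno == EPERM;
}
```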

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:45 +0000 (18:10 +0200)]
tools: split out xenstored starting from xencommons

In order to prepare for starting a xenstore domain, split out the
starting of the xenstore daemon from the xencommons script into a
dedicated launch-xenstore script.

A rerun of autogen.sh is required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:44 +0000 (18:10 +0200)]
tools: remove systemd xenstore socket definitions

On a system with systemd the xenstore sockets are created via systemd.
Remove the related configuration files in order to be able to decide
at runtime whether the sockets should be created or not. This will
enable Xen to start xenstore either via a daemon or via a stub domain.

As the xenstore domain start program will exit after it has done its
job, prepare systemd to tolerate the same behaviour from the xenstore
daemon by specifying the appropriate flags in the service file.

A rerun of autogen.sh is required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Julien Grall [Fri, 29 Jul 2016 18:53:19 +0000 (19:53 +0100)]
xen/arm: p2m: Don't use default access permission when shattering a superpage

The following message floods the console when memaccess is enabled on
various platforms:

traps.c:2510:d1v0 HSR=0x9383004f pc=0xffff000008b7d4c4 gva=0xffff000008eeb8e0 gpa=0x0000004903f8e0

This is because a data abort from a guest was received due to a
permission fault, but memaccess thought there was no permission fault.

On ARM, memaccess permissions are stored in a radix tree because there
are not enough available bits in the p2m entry to store the access
restriction. When memaccess is restricting the access (i.e. any other
access than p2m_access_rwx), the access will be added in the radix tree
using the GFN as a key. This will be done for all 4KB pages.

This means that memaccess has to shatter all the superpages in a given
region to set the permission on a 4KB granularity. Currently, when a
superpage is shattered, the new entries use the value
p2m->default_access, which will restrict permissions (because memaccess
has been enabled). However the radix tree does not yet contain
an entry for this GFN.

If a guest VCPU is running at the same time and trying to access the
modified region, it will result in a stage-2 permission fault. As
the radix tree does not yet contain an entry for the GFN, memaccess will
deduce that the fault was not valid and a data abort will be injected
into the guest (and crash it).

Furthermore, the permission may be restricted outside of the requested
region if it is only a subset of a 1GB/2MB superpage.

The two issues can be fixed by re-using the permission of the superpage
entry and overriding the necessary fields. This is not a problem because
memaccess cannot work on superpages.
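To make the fix concrete, here is a minimal sketch of shattering a superpage so that each new 4KB entry inherits the superpage's permissions instead of a default. The struct layout and the shatter() helper are hypothetical simplifications, not Xen's lpae_t or p2m code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified p2m entry (not Xen's lpae_t layout). */
struct p2m_entry {
    uint64_t mfn;            /* first machine frame mapped by this entry */
    bool valid;
    bool read, write, exec;
};

/* Split a superpage entry covering nr 4KB frames into nr 4KB entries.
 * Each child starts as a copy of the superpage entry, so it keeps the
 * superpage's permissions; only the frame number is overridden. */
static void shatter(const struct p2m_entry *super, struct p2m_entry *out,
                    unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++) {
        out[i] = *super;
        out[i].mfn = super->mfn + i;
    }
}
```

A 2MB superpage yields 512 entries this way. Real code also has to adjust the entry type for the new level, which the sketch omits.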

Lastly, document the code which calls mfn_to_p2m_entry when creating
the p2m entry for a page table, to explain that permissions are ignored
by the hardware (see D4.3.1 in ARM DDI 0487A.j), so the value of the
parameter 'access' of mfn_to_p2m_entry does not matter.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:07 +0000 (18:50 +0100)]
xen/arm: arm64: Add Cortex-A57 erratum 834220 workaround

The ARM erratum applies to certain revisions of Cortex-A57. The
processor may report a Stage 2 translation fault as the result of a
Stage 1 fault for a load crossing a page boundary when there is a
permission fault or device memory fault at Stage 1 and a translation
fault at Stage 2.

So Xen needs to check that Stage 1 translation does not generate a fault
before handling the Stage 2 fault. If it is a Stage 1 translation fault,
return to the guest to let the processor inject the correct fault.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:06 +0000 (18:50 +0100)]
xen/arm: traps: Avoid unnecessary VA -> IPA translation in abort handlers

Translating a VA to an IPA is expensive. Currently, Xen is assuming that
HPFAR_EL2 is only valid when the stage-2 data/instruction abort happened
during a translation table walk of a first stage translation (i.e. S1PTW
is set).

However, based on the ARM ARM (D7.2.34 in DDI 0487A.j), the register is
also valid when the data/instruction abort occurred for a translation
fault.

With this change, the VA -> IPA translation will only happen for
permission faults that are not related to a translation table of a
first stage translation.
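The resulting decision can be sketched as follows. The fault status code encodings (translation fault 0b0001xx, with the low two bits giving the lookup level) come from the ARMv8 ARM. The helper and macro names are assumptions made for illustration, and the check for erratum 834220 is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define FSC_LL_MASK    0x03u   /* low two bits: lookup level that faulted */
#define FSC_FLT_TRANS  0x04u   /* translation fault (any level) */

/* Can the faulting IPA be taken from HPFAR_EL2, or is the (expensive)
 * software VA -> IPA translation needed? */
static bool hpfar_is_valid(bool s1ptw, uint8_t fsc)
{
    /* Valid for a fault during a stage-1 table walk, or for a
     * translation fault; permission faults need the slow path. */
    return s1ptw || (fsc & ~FSC_LL_MASK) == FSC_FLT_TRANS;
}
```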

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:05 +0000 (18:50 +0100)]
xen/arm: traps: MMIO should only be emulated for fault translation

The function do_trap_data_abort_guest assumes that a stage-2 data abort
can only be taken for a translation fault or a permission fault.

Whilst this is true today, it might not be in the future. Rather than
emulating the MMIO for any fault other than the permission one, print
a warning message when the fault is not handled by Xen.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:04 +0000 (18:50 +0100)]
xen/arm: Use check_workaround to handle the erratum 766422

Currently, Xen is accessing the stored MIDR every time it has to check
whether the processor is affected by the erratum 766422.

This can take advantage of the new capability bitfields to detect at
boot time whether the processor is affected.

With this patch, the number of instructions to check the erratum
goes down from ~13 (including 2 loads and a co-processor access) to
~6 instructions (including 1 load).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:03 +0000 (18:50 +0100)]
xen/arm: Provide macros to help creating workaround helpers

Workarounds may require executing a different path when the platform
is affected by the associated erratum. Furthermore, this may need to
be called from common code.

To avoid too much intrusion/overhead, the workaround helpers need to
be a nop on architectures which will never need the workaround, and
have to be quick to check whether the platform requires it.

The alternative framework is used to transform the check into a single
instruction. When the framework is not available, the helper will have
~6 instructions, including 1 load instruction.

The macro will create a handler called check_workaround_xxxxx, where
xxxxx is the erratum number.

For instance, the line below will create a workaround helper for
erratum #424242 which is enabled when the capability
ARM64_WORKAROUND_424242 is set and only available for ARM64:

CHECK_WORKAROUND_HELPER(424242, ARM64_WORKAROUND_424242, CONFIG_ARM64)
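Below is a portable sketch of what such a macro might expand to when the alternative framework is not available: a load of a capability bitmap plus a bit test. The bitmap, the cpus_have_cap() body, the capability number and the BUILD_ARM64 switch are hypothetical stand-ins. Xen's real macro differs, notably by patching the check into a single instruction when alternatives are in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boot-time capability bitmap: bit n set => capability n. */
static uint64_t cpu_hwcaps;

static inline bool cpus_have_cap(unsigned int num)
{
    return (cpu_hwcaps >> num) & 1;
}

/* Generate check_workaround_<erratum>(). When arch_enabled is a constant 0,
 * the compiler can fold the helper to "false", i.e. a nop for callers. */
#define CHECK_WORKAROUND_HELPER(erratum, feature, arch_enabled)  \
static inline bool check_workaround_##erratum(void)              \
{                                                                \
    if (!(arch_enabled))                                         \
        return false;                                            \
    return cpus_have_cap(feature);                               \
}

#define BUILD_ARM64              1  /* stand-in for IS_ENABLED(CONFIG_ARM64) */
#define ARM64_WORKAROUND_424242  5  /* hypothetical capability number */

CHECK_WORKAROUND_HELPER(424242, ARM64_WORKAROUND_424242, BUILD_ARM64)
```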

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:02 +0000 (18:50 +0100)]
xen/arm: traps: Simplify the switch in do_trap_*_abort_guest

The fault statuses we care about are in the form BBBBxx, where xx is the
lookup level that gave the fault. We can simplify the code by masking
the 2 least significant bits.
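The sketch below shows the simplification. Masking off the two lookup-level bits collapses the four per-level cases of each fault type into one. The encodings follow the ARMv8 ARM fault status codes; the enum and classify_fsc() are hypothetical.

```c
#include <stdint.h>

#define FSC_LL_MASK    0x03u   /* lookup level that faulted */
#define FSC_FLT_TRANS  0x04u   /* translation fault, levels 0-3 */
#define FSC_FLT_ACCESS 0x08u   /* access flag fault, levels 0-3 */
#define FSC_FLT_PERM   0x0cu   /* permission fault, levels 0-3 */

enum fault_kind { FAULT_TRANSLATION, FAULT_ACCESS, FAULT_PERMISSION, FAULT_OTHER };

/* One case per fault type instead of one case per type and level. */
static enum fault_kind classify_fsc(uint8_t fsc)
{
    switch (fsc & ~FSC_LL_MASK) {
    case FSC_FLT_TRANS:  return FAULT_TRANSLATION;
    case FSC_FLT_ACCESS: return FAULT_ACCESS;
    case FSC_FLT_PERM:   return FAULT_PERMISSION;
    default:             return FAULT_OTHER;   /* not handled: caller warns */
    }
}
```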

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 4 Aug 2016 11:38:05 +0000 (12:38 +0100)]
x86/debug: Make debugger_trap_entry() safe during early boot

debugger_trap_entry() is reachable during early boot where its unconditional
use of current is unsafe.  Add a warning to the function to this effect.

Perform the vector check first, as this allows the compiler to elide the other
content from most of its callsites.  Check guest_mode(regs) before using
current, which makes the path safe on early boot.

While editing this area, drop DEBUGGER_trap_{entry,fatal}, as hiding a return
statement in a function-like macro is very antisocial programming; show the
real control flow at each of the callsites.  Finally, switch
debugger_trap_{entry,fatal} to having boolean return types, to match their
semantics.

No behavioural change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 4 Aug 2016 08:52:49 +0000 (10:52 +0200)]
x86: support newer Intel CPU models

... as per the June 2016 edition of the SDM.

Also remove a couple of dead break statements as well as unused
*MSR_PM_LASTBRANCH* #define-s.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 4 Aug 2016 08:08:48 +0000 (10:08 +0200)]
hvmloader: don't hard-code IO-APIC parameters

The IO-APIC address has variable bits determined by the PCI-to-ISA
bridge (albeit for now we refrain from actually evaluating them, as
there's still implicit rather than explicit agreement on the IO-APIC
base address between qemu and the hypervisor), and the IO-APIC version
should be read from the IO-APIC.
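The version register can be read and decoded rather than hard-coded. It is register index 0x01, accessed by writing the index to IOREGSEL at base + 0x00 and then reading IOWIN at base + 0x10. It holds the version in bits 7:0 and the highest redirection entry in bits 23:16. The sketch below shows only the decoding step and omits the MMIO access; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct ioapic_version {
    uint8_t version;        /* bits 7:0 of the version register */
    unsigned int nr_pins;   /* max redirection entry (bits 23:16) + 1 */
};

static struct ioapic_version decode_ioapic_version(uint32_t reg)
{
    struct ioapic_version v;

    v.version = reg & 0xff;
    v.nr_pins = ((reg >> 16) & 0xff) + 1;
    return v;
}
```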

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 4 Aug 2016 08:08:00 +0000 (10:08 +0200)]
x86/time: relax barriers

On x86 there's no need for full barriers in loops waiting for some
memory location to change. Nor do we need full barriers between two
reads and two writes - SMP ones fully suffice.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 4 Aug 2016 08:07:02 +0000 (10:07 +0200)]
x86/time: group time stamps into a structure

If that had been done from the beginning, mistakes like the one
corrected in commit b64438c7c1 ("x86/time: use correct (local) time
stamp in constant-TSC calibration fast path") would likely never have
happened.

Also add a few "const" qualifiers to make it more obvious when things
aren't expected to change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 4 Aug 2016 08:04:29 +0000 (10:04 +0200)]
x86/time: fold recurring code

Common code between time_calibration_{std,tsc}_rendezvous() can better
live in a single place, eliminating the risk of adjusting one without
the other.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 4 Aug 2016 08:03:28 +0000 (10:03 +0200)]
x86/time: support 32-bit wide ACPI PM timer

I have no idea why we didn't do so from the beginning.
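For context, the ACPI FADT advertises the PM timer width via the TMR_VAL_EXT flag (bit 8 of the Flags field). Clear means 24 bits and set means 32 bits. At the standard 3.579545 MHz, a 24-bit counter wraps after roughly 4.7 s, while a 32-bit one wraps after roughly 20 minutes. Below is a minimal sketch of width-aware delta computation; the function names are hypothetical.

```c
#include <stdint.h>

#define ACPI_FADT_TMR_VAL_EXT (1u << 8)   /* FADT Flags: 32-bit PM timer */

/* Counter mask for the width advertised by the FADT. */
static uint32_t pmtmr_mask(uint32_t fadt_flags)
{
    return (fadt_flags & ACPI_FADT_TMR_VAL_EXT) ? 0xffffffffu : 0x00ffffffu;
}

/* Ticks elapsed between two reads, tolerating at most one counter wrap:
 * unsigned subtraction wraps modulo 2^32, and the mask trims to the width. */
static uint32_t pmtmr_delta(uint32_t prev, uint32_t now, uint32_t mask)
{
    return (now - prev) & mask;
}
```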

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 4 Aug 2016 08:02:52 +0000 (10:02 +0200)]
x86/time: calibrate TSC against platform timer

... instead of unconditionally against the PIT. This allows for local
and master system times to remain in better sync (which matters even
when, on any modern system, the master time is really used only during
secondary CPU bringup, as the error between the two is in fact
noticeable in cross-CPU NOW() invocation monotonicity).

This involves moving the init_platform_timer() invocation into
early_time_init(), splitting out the few things which really need to be
done in init_xen_time(). That in turn allows dropping the open coded
PIT initialization from init_IRQ() (it was needed for APIC clock
calibration, which runs between early_time_init() and init_xen_time()).

In the course of this re-ordering also set the timer channel 2 gate low
after having finished calibration. This should be benign to overall
system operation, but appears to be the cleaner state.
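Calibrating against whichever platform timer was chosen comes down to a ratio. If both counters are sampled over the same interval, then tsc_hz = tsc_delta * plt_hz / plt_delta. The sketch below uses hypothetical names. The unsigned __int128 intermediate is a GCC/Clang extension, used here to keep the product from overflowing.

```c
#include <stdint.h>

/* Estimate the TSC frequency from TSC and platform-timer deltas that were
 * measured over the same interval. */
static uint64_t calibrate_tsc_hz(uint64_t tsc_delta, uint64_t plt_delta,
                                 uint64_t plt_hz)
{
    if (plt_delta == 0)
        return 0;   /* no elapsed platform time: cannot calibrate */
    return (uint64_t)(((unsigned __int128)tsc_delta * plt_hz) / plt_delta);
}
```

For example, 250,000,000 TSC ticks over 357,954 ticks of a 3.579545 MHz ACPI PM timer (about 0.1 s) gives roughly 2.5 GHz.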

Also do away with open coded 8254 register manipulation from 8259 code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 4 Aug 2016 08:01:57 +0000 (10:01 +0200)]
x86/HVM: add new functions to get/set memory types

For clarity this patch breaks the code to set/get memory types out
of do_hvm_op() into dedicated functions: hvmop_set/get_mem_type().
Also, for clarity, checks for whether a memory type change is allowed
are broken out into a separate function called by hvmop_set_mem_type().

There is no intentional functional change in this patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 4 Aug 2016 08:01:17 +0000 (10:01 +0200)]
x86: rename p2m_mmio_write_dm to p2m_ioreq_server

Previously p2m type p2m_mmio_write_dm was introduced for write-
protected memory pages whose write operations are supposed to be
forwarded to and emulated by an ioreq server. Yet limitations of
rangeset restrict the number of guest pages to be write-protected.

This patch replaces the p2m type p2m_mmio_write_dm with a new name:
p2m_ioreq_server, which means this p2m type can be claimed by one
ioreq server, instead of being tracked inside the rangeset of ioreq
server. And a new memory type, HVMMEM_ioreq_server, is now used in
the HVMOP_set/get_mem_type interface to set/get this p2m type.

Follow-up patches will add the related HVMOP handling code which
maps/unmaps type p2m_ioreq_server to/from an ioreq server. Without
those patches, memory type changes to HVMMEM_ioreq_server can
still be allowed, and in such cases, p2m_ioreq_server pages will be
treated the same as ones with previous type p2m_mmio_write_dm, and
are tracked inside the ioreq server's rangeset.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Wed, 27 Jul 2016 16:37:11 +0000 (17:37 +0100)]
xen/arm: traps: Don't inject a fault if the translation VA -> IPA fails

Based on ARM ARM (D4.5.3 in ARM DDI 0486A and B3.12.7 in ARM DDI 0406C.c),
a Stage 1 translation error has priority over a Stage 2 translation error.

Therefore gva_to_ipa can only fail if another vCPU is playing with the
page table.

Rather than injecting a custom fault, replay the instruction and let the
processor inject the correct fault.

This is fine as Xen handles all the pending softirqs
(see leave_hypervisor_tail) before returning to the guest. One of them
is the scheduler, which could reschedule the vCPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 27 Jul 2016 16:37:10 +0000 (17:37 +0100)]
xen/arm: arm64: Add cortex-A57 erratum 832075 workaround

The ARM erratum 832075 applies to certain revisions of Cortex-A57; one
of the workarounds is to change device loads to use load-acquire
semantics.

Use the alternative framework to enable the workaround only on affected
cores.

Whilst a guest could trigger the deadlock, it can be broken when the
processor receives an interrupt. As the Xen scheduler will always set up
a timer (firing every 1ms to 300ms depending on the running time
slice) on each processor, the deadlock would last only a few milliseconds
and only affect the guest's time slice.

Therefore a malicious guest could only hurt itself. Note that all the
guests should implement/enable the workaround for the affected cores.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 27 Jul 2016 16:37:09 +0000 (17:37 +0100)]
xen/arm: arm64: Add Cortex-A53 cache errata workaround

The ARM errata 819472, 827319 and 824069 define the same workaround for
these hardware issues in certain Cortex-A53 parts.

The cache instructions "dc cvac" and "dc cvau" need to be upgraded to
"dc civac".

Use the alternative framework to replace those instructions only on
affected cores.

Whilst the errata affect cache instructions issued at any exception
level, it is not necessary to trap EL1/EL0 data cache instructions
access in order to upgrade them. Indeed the data cache corruption would
always be at the address used by the data cache instructions. Note that
this address could point to a shared memory between guests and the
hypervisors, however all the information present in it are be validated
before any use.

Therefore a malicious guest could only hurt itself. Note that all the
guests should implement/enable the workaround for the affected cores.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 27 Jul 2016 16:37:08 +0000 (17:37 +0100)]
xen/arm: Document the errata implemented in Xen

The new document will help to keep track of each erratum Xen is able to
handle.

The text is based on the Linux doc in Documentation/arm64/silicon-errata.txt.

Also list the current errata that Xen is aware of.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Julien Grall [Wed, 27 Jul 2016 16:37:07 +0000 (17:37 +0100)]
xen/arm: Detect silicon revision and set cap bits accordingly

After each CPU has been started, we iterate through a list of CPU
errata to detect CPUs which need hypervisor code patches.

For each bug there is a function which checks if a particular CPU is
affected. This needs to be done on every CPU to cover heterogeneous
systems properly.

If a certain erratum has been detected, the capability bit will be set.
In the case the erratum requires code patching, this will be triggered
by the call to apply_alternatives.

The code is based on the file arch/arm64/kernel/cpu_errata.c in Linux
v4.6-rc3.
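Erratum detection of this kind typically compares each CPU's MIDR_EL1 against a CPU model and a variant/revision window. Below is a minimal sketch using the MIDR field layout from the ARMv8 ARM. The struct, the helper name and the example Cortex-A57 r0p0 to r1p2 window are illustrative assumptions, not Xen's errata table.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 fields (ARMv8 ARM). */
#define MIDR_REVISION_MASK  0x0000000fu   /* bits 3:0   */
#define MIDR_PARTNUM_MASK   0x0000fff0u   /* bits 15:4  */
#define MIDR_ARCH_MASK      0x000f0000u   /* bits 19:16 */
#define MIDR_VARIANT_MASK   0x00f00000u   /* bits 23:20 */
#define MIDR_IMPL_MASK      0xff000000u   /* bits 31:24 */
#define MIDR_VARIANT_SHIFT  20

#define MIDR_CPU_MODEL_MASK (MIDR_IMPL_MASK | MIDR_ARCH_MASK | MIDR_PARTNUM_MASK)
#define MIDR_CORTEX_A57     0x410fd070u   /* implementer 0x41, part 0xd07 */

struct midr_range {
    uint32_t model;    /* compared under MIDR_CPU_MODEL_MASK */
    uint32_t rv_min;   /* variant:revision, inclusive */
    uint32_t rv_max;
};

/* Example window: Cortex-A57 r0p0 to r1p2 (illustrative only). */
static const struct midr_range example_a57_range = {
    MIDR_CORTEX_A57, 0, (1u << MIDR_VARIANT_SHIFT) | 2
};

/* Must be evaluated on every CPU, since heterogeneous systems differ. */
static bool midr_in_range(uint32_t midr, const struct midr_range *r)
{
    uint32_t rv = midr & (MIDR_VARIANT_MASK | MIDR_REVISION_MASK);

    return (midr & MIDR_CPU_MODEL_MASK) == r->model &&
           rv >= r->rv_min && rv <= r->rv_max;
}
```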

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 27 Jul 2016 16:37:06 +0000 (17:37 +0100)]
xen/arm: cpufeature: Provide a helper to check if a capability is supported

The CPU capabilities will be set depending on the value found in the CPU
registers. This patch provides a generic helper to go through a set of capabilities
and find which one should be enabled.

The parameter "info" is used to display the kind of capability updated (e.g.
workaround, feature...).
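Below is a minimal sketch of such a generic helper. It walks a NULL-terminated table, asks each entry's matches() callback, sets the capability bit, and logs newly enabled entries prefixed with info. The names echo Linux's cpufeature code, but the layout, the bitmap and the example table here are simplified and hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct arm_cpu_capabilities {
    const char *desc;
    unsigned int capability;   /* bit number in cpu_hwcaps (< 64) */
    bool (*matches)(const struct arm_cpu_capabilities *entry);
};

static uint64_t cpu_hwcaps;    /* hypothetical capability bitmap */

static void update_cpu_capabilities(const struct arm_cpu_capabilities *caps,
                                    const char *info)
{
    for (unsigned int i = 0; caps[i].matches; i++) {
        uint64_t bit = 1ull << caps[i].capability;

        if (!caps[i].matches(&caps[i]))
            continue;
        if (!(cpu_hwcaps & bit))       /* only log the first detection */
            printf("%s: %s\n", info, caps[i].desc);
        cpu_hwcaps |= bit;
    }
}

/* Example table with trivial matchers, for illustration only. */
static bool match_yes(const struct arm_cpu_capabilities *e) { (void)e; return true; }
static bool match_no(const struct arm_cpu_capabilities *e) { (void)e; return false; }

static const struct arm_cpu_capabilities example_errata[] = {
    { "Example erratum A", 1, match_yes },
    { "Example erratum B", 2, match_no },
    { NULL, 0, NULL },                 /* terminator */
};
```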

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Julien Grall [Wed, 27 Jul 2016 16:37:05 +0000 (17:37 +0100)]
xen/arm: Introduce alternative runtime patching

Some processor errata will require modifying code sequences.
As those modifications may impact the performance, they should only
be enabled on affected cores. Furthermore, Xen may also want to take
advantage of new hardware features coming up with v8.1 and v8.2.

This patch adds an infrastructure to patch Xen during boot time
depending on the "features" available on the platform.

This code is based on the file arch/arm64/kernel/alternative.c in
Linux 4.6-rc3. Any references to arm64 have been dropped to make the
code as generic as possible.

When Xen is creating the page tables, all the executable sections
(.text and .init.text) will be marked read-only and then enforced by
setting SCTLR.WNX.

Whilst it might be possible to mark those entries read-only after
Xen has been patched, we would need extra care to avoid possible
TLB conflicts (see D4-1732 in ARM DDI 0487A.i) as all
physical CPUs will be running.

All the physical CPUs have to be brought up before patching Xen because
each core may have different errata/features which require code
patching. The only way to find them is to probe system registers on
each CPU.

To avoid extra complexity, it is possible to create a temporary
writeable mapping with vmap. This mapping will be used to write the
new instructions.

Lastly, runtime patching is currently not necessary for ARM32. So the
code is only enabled for ARM64.

Note that the header asm-arm/alternative.h is a verbatim copy of the
Linux one (arch/arm64/include/asm/alternative.h). It may contain
inaccurate comments, but I did not touch them for now.
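The sketch below illustrates only the core idea by patching an in-memory instruction buffer. For each entry whose capability is present, the original instructions are overwritten by a replacement of equal length. It is a simplification: real entries use relative offsets, the writes go through a temporary writable mapping, and caches must be maintained afterwards. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Simplified alternative entry: indices into buffers instead of the
 * relative offsets used by real implementations. */
struct alt_instr {
    unsigned int orig;   /* index of first original instruction */
    unsigned int alt;    /* index of first replacement instruction */
    unsigned int cap;    /* capability bit enabling the replacement */
    unsigned int len;    /* number of 32-bit instructions (same for both) */
};

static void apply_alternatives(uint32_t *text, const uint32_t *repl,
                               const struct alt_instr *alts, unsigned int nr,
                               uint64_t caps)
{
    for (unsigned int i = 0; i < nr; i++) {
        if (!((caps >> alts[i].cap) & 1))
            continue;    /* erratum/feature absent: keep original code */
        memcpy(&text[alts[i].orig], &repl[alts[i].alt],
               alts[i].len * sizeof(*text));
        /* Real code must also clean the D-cache and invalidate the I-cache. */
    }
}
```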

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Wed, 3 Aug 2016 09:48:42 +0000 (10:48 +0100)]
x86/mmcfg: Fix initialisation of variables in pci_mmcfg_nvidia_mcp55()

Shifting into the sign bit of an integer is undefined behaviour.

Only the first integer is actually undefined, but switch all the shifts
for consistency.
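Here is a short illustration of this class of bug. With a 32-bit int, 1 << 31 shifts into the sign bit, which C leaves undefined, while 1u << 31 is well defined. The variable names below are hypothetical, not the ones in pci_mmcfg_nvidia_mcp55().

```c
#include <stdint.h>

/* Only the first initialiser would be undefined with a signed literal
 * (1 << 31); the others fit in an int but use unsigned for consistency. */
static const uint32_t enable_mask = 1u << 31;
static const uint32_t start_mask  = 0xffu << 16;
static const uint32_t size_mask   = 0x3u << 28;
```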

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Anshul Makkar [Wed, 3 Aug 2016 12:35:22 +0000 (13:35 +0100)]
ratelimit: Implement rate limit for credit2 scheduler

Rate limit assures that a vcpu will execute for a minimum amount of
time before being put at the back of a queue or being preempted by
a higher-priority thread.

It introduces context-switch rate-limiting. The patch enables the VM
to batch its work and prevents the system from spending most of its
time in context switches because of a VM that is waking/sleeping at
high rate.

ratelimit can be disabled by setting it to 0.
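The core decision can be sketched as a single predicate. This is a simplification that assumes times in microseconds and ignores how Credit2 picks the next vcpu; the function name is hypothetical. Rate limiting only delays preemption, so a vcpu that blocks voluntarily still gives up the CPU immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* May the running vcpu be preempted after running for ran_us? */
static bool may_preempt(uint64_t ran_us, uint64_t ratelimit_us)
{
    if (ratelimit_us == 0)
        return true;                  /* rate limiting disabled */
    return ran_us >= ratelimit_us;    /* guarantee a minimum run time */
}
```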

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Dario Faggioli [Wed, 3 Aug 2016 12:31:49 +0000 (13:31 +0100)]
xen: cpupool: small optimization when moving between pools

If the domain is already where we want it to go,
there's not much to do indeed.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Dario Faggioli [Wed, 3 Aug 2016 12:31:49 +0000 (13:31 +0100)]
xen: fix a (latent) cpupool-related race during domain destroy

So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain sits outside
of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL,
and we don't allow that to hold true, for anything
but the idle domain (and there are, in fact, ASSERT()s
and BUG_ON()s to that effect).

Currently, we never really check d->cpupool during the
window, but that does not mean the race is not there.
For instance, Credit2 at some point (during load balancing)
iterates on the list of domains, and if we add logic that
needs checking d->cpupool, and any one of them had
cpupool_rm_domain() called on itself already... Boom!

(In fact, calling __vcpu_has_soft_affinity() from inside
balance_load() makes `xl shutdown <domid>' reliably
crash, and this is how I discovered this.)

On the other hand, cpupool_rm_domain() "only" does
cpupool related bookkeeping, and there's no harm
postponing it a little bit.

Also, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes sense for the destruction path to look like
the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

And hence that's what this patch does.

Actually, for better robustness, what we really do is
move both cpupool_add_domain() and cpupool_rm_domain()
inside sched_init_domain() and sched_destroy_domain(),
respectively (and also add a couple of ASSERT()s).
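
The resulting symmetry can be sketched as follows. The stub functions
take no arguments and only record the call order; they are
hypothetical stand-ins for the real Xen functions, which operate on a
struct domain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical call log, used only to show the ordering. */
static const char *call_log[8];
static size_t ncalls;

static void log_call(const char *fn)
{
    if (ncalls < 8)
        call_log[ncalls++] = fn;
}

static void cpupool_add_domain(void) { log_call("cpupool_add_domain"); }
static void cpupool_rm_domain(void)  { log_call("cpupool_rm_domain"); }

/* Init: join the cpupool first, then set up scheduler state. */
static void sched_init_domain(void)
{
    cpupool_add_domain();
    log_call("sched_init_domain body");
}

/* Destroy: the mirror image, tear down scheduler state, then leave. */
static void sched_destroy_domain(void)
{
    log_call("sched_destroy_domain body");
    cpupool_rm_domain();
}
```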

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.
Dario Faggioli [Wed, 27 Jul 2016 03:09:49 +0000 (05:09 +0200)]
xen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.

In fact, when no suitable runqueue is found in which to
place a vCPU, and a fallback is therefore used, we either:
 - don't issue any trace record (while we should, at
   least, output the chosen pcpu),
 - risk underrunning when accessing the runqueues
   array, while preparing the trace record.

Fix both issues and, while there, also a couple of style
problems found nearby.
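
A sketch of the kind of guard the fix needs, assuming the fallback
path leaves the runqueue index at -1. The array, struct and function
names are hypothetical.

```c
#define NR_RUNQUEUES 4

/* Hypothetical per-runqueue load, read while preparing the trace record. */
static const int rq_load[NR_RUNQUEUES] = { 10, 20, 30, 40 };

struct pick_trace { int rq_id; int load; int cpu; };

static struct pick_trace make_pick_trace(int min_rqi, int chosen_cpu)
{
    struct pick_trace t;

    /* Always record the chosen pCPU, even on the fallback path. */
    t.cpu = chosen_cpu;
    if (min_rqi >= 0 && min_rqi < NR_RUNQUEUES) {
        t.rq_id = min_rqi;
        t.load = rq_load[min_rqi];
    } else {
        /* Fallback: min_rqi may be -1, so never index rq_load[] with it. */
        t.rq_id = -1;
        t.load = 0;
    }
    return t;
}
```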

Spotted by Coverity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agomwait-idle: add Denverton
Jacob Pan [Wed, 3 Aug 2016 12:41:13 +0000 (14:41 +0200)]
mwait-idle: add Denverton

Denverton is an Intel Atom based micro server which shares the same
Goldmont architecture as Broxton. The available C-states on
Denverton are a subset of Broxton's, with only C1, C1e, and C6.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 0080d65b7719fc58e60b5595fc61acded330004f]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: introduce and use rdtsc_ordered()
Jan Beulich [Wed, 3 Aug 2016 12:40:44 +0000 (14:40 +0200)]
x86/time: introduce and use rdtsc_ordered()

Matching Linux commit 03b9730b76 ("x86/asm/tsc: Add rdtsc_ordered() and
use it in trivial call sites") and earlier ones it builds upon, let's
make sure timing loops don't have their rdtsc()-s re-ordered, as that
would harm precision of the result (values were observed to be several
hundred clocks off without this adjustment).
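
A minimal x86 sketch of an ordered TSC read, using GCC-style inline asm
and assuming LFENCE is a sufficient barrier (Linux, and Xen following
it, may use MFENCE instead on some CPUs). rdtsc_ordered_sketch() is a
hypothetical name; the non-x86 branch is only a fallback so the example
builds elsewhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

static inline uint64_t rdtsc_ordered_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* LFENCE keeps RDTSC from executing ahead of earlier instructions. */
    __asm__ __volatile__ ("lfence\n\trdtsc"
                          : "=a" (lo), "=d" (hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for non-x86 builds: a monotonic clock in nanoseconds. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

In a timing loop, reading the TSC this way on both sides of the
measured region keeps the reads from drifting relative to the
surrounding instructions.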

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
8 years agox86/time: adjust local system time initialization
Jan Beulich [Wed, 3 Aug 2016 12:39:31 +0000 (14:39 +0200)]
x86/time: adjust local system time initialization

Using the bare return value from read_platform_stime() is not suitable
when local_time_calibration() is going to use its fast path: Divergence
of several dozen microseconds between NOW() return values on different
CPUs results when platform and local time don't stay in close sync.

Latch local and platform time on the CPU initiating AP bringup, such
that the AP can use these values to seed its stime_local_stamp with as
little of an error as possible. The boot CPU, on the other hand, can simply
calculate the correct initial value (other CPUs could do so too with
even greater accuracy than the approach being introduced, but that can
work only if all CPUs' TSCs start ticking at the same time, which
generally can't be assumed to be the case on multi-socket systems).

This slightly defers init_percpu_time() (moved ahead by commit
dd2658f966 ["x86/time: initialise time earlier during
start_secondary()"]) in order to reduce as much as possible the gap
between populating the stamps and consuming them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agolibxl: use llabs() instead abs() for int64_t argument
Juergen Gross [Tue, 2 Aug 2016 17:25:42 +0000 (19:25 +0200)]
libxl: use llabs() instead abs() for int64_t argument

Commit 57f8b13c724023c78fa15a80452d1de3e51a1418 ("libxl: memory size
in kb requires 64 bit variable") introduced a bug: abs() shouldn't
be called with an int64_t argument. llabs() is to be used here.

Caught by clang build with error message:

libxl.c:4198:33: error: absolute value function 'abs' given an argument of type
    'int64_t' (aka 'long') but has parameter of type 'int' which may cause
    truncation of value [-Werror,-Wabsolute-value]
    if (target_memkb < 0 && abs(target_memkb) > current_target_memkb)
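
A sketch of the corrected comparison, wrapped in a hypothetical helper
around the condition quoted in the error message above: llabs() takes a
long long, which is at least 64 bits wide, whereas abs() takes an int.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: does a negative target exceed the current one? */
static bool shrink_exceeds_target(int64_t target_memkb,
                                  uint64_t current_target_memkb)
{
    /* llabs() keeps all 64 bits; abs() would convert to int first. */
    return target_memkb < 0 &&
           (uint64_t)llabs((long long)target_memkb) > current_target_memkb;
}
```

With abs(), a value such as -5000000000 would first be converted to
int, which is implementation-defined for out-of-range values, so the
comparison could give the wrong answer.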

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters
Andrew Cooper [Wed, 27 Jul 2016 17:34:39 +0000 (18:34 +0100)]
x86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters

Introduce and use the nonnull attribute to help the compiler catch NULL
parameters being passed to functions which require their parameters not to be
NULL.  Experimentally, GCC 4.9 on Debian Jessie only warns of non-NULL-ness
from immediate callers, so propagate the attributes out to all helpers.

A sample error looks like:

mem_sharing.c: In function ‘mem_sharing_nominate_page’:
mem_sharing.c:884:13: error: null argument where non-null required (argument 3) [-Werror=nonnull]
             amfn = get_gfn_type_access(ap2m, gfn, NULL, &ap2ma, 0, NULL);
             ^

As part of this, replace the get_gfn_type_access() macro with an equivalent
static inline function for extra type safety, and the ability to be annotated.
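
A minimal illustration of the attribute, using a hypothetical function
rather than the real get_gfn_type_access() signature. GCC's nonnull
attribute takes 1-based argument positions; passing a literal NULL for
either annotated argument produces a -Wnonnull warning like the one
quoted above.

```c
#include <stddef.h>

/* Hypothetical lookup: arguments 2 and 3 must never be NULL. */
static int lookup_type_sketch(unsigned long gfn, int *type, int *access)
    __attribute__((nonnull(2, 3)));

static int lookup_type_sketch(unsigned long gfn, int *type, int *access)
{
    /* Unconditional writes through the pointers, hence the annotation. */
    *type = (int)(gfn & 0xf);
    *access = 0;
    return 0;
}
```

The attribute also lets GCC assume the pointers are non-NULL inside the
function, so it should only be added where that is really guaranteed.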

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agosystemd: remove hard-coded pid file in xendriverdomain service
Wei Liu [Wed, 20 Jul 2016 15:36:15 +0000 (16:36 +0100)]
systemd: remove hard-coded pid file in xendriverdomain service

Per the discussion in [0], the hard-coded pid file can be removed
completely. Systemd has no trouble figuring out the pid of devd all by
itself.

[0]: https://lists.xen.org/archives/html/xen-devel/2016-07/msg01393.html

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agolibxl: memory size in kb requires 64 bit variable
Juergen Gross [Thu, 28 Jul 2016 13:35:19 +0000 (15:35 +0200)]
libxl: memory size in kb requires 64 bit variable

libxl_set_memory_target() and several other interface functions of
libxl use a 32 bit sized parameter for a memory size value in kBytes.
This limits the maximum size that can be passed in such a parameter
to 2TB or 4TB, depending on the signedness of the parameter.

Correct this by using 64 bit types.
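
The limits can be checked with a little arithmetic, assuming "kBytes"
means KiB so that 1 TiB = 2^30 KiB; max_size_tib() is a hypothetical
helper.

```c
#include <stdint.h>

/* Largest size, in TiB, expressible by a KiB counter whose maximum
 * value is max_kb (counting 0..max_kb gives max_kb + 1 KiB). */
static uint64_t max_size_tib(uint64_t max_kb)
{
    return (max_kb + 1) >> 30;
}
```

For a signed 32-bit parameter this gives 2^31 KiB = 2 TiB, for an
unsigned one 2^32 KiB = 4 TiB, while a signed 64-bit parameter raises
the limit to 2^33 TiB.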

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/mem-sharing: mem-sharing a range of memory
Tamas K Lengyel [Mon, 1 Aug 2016 17:14:27 +0000 (11:14 -0600)]
x86/mem-sharing: mem-sharing a range of memory

Currently, mem-sharing can be performed on a page-by-page basis from the control
domain. However, this process is quite wasteful when a range of pages has to
be deduplicated.

This patch introduces a new mem_sharing memop for range sharing where
the user doesn't have to separately nominate each page in both the source and
destination domain, and the looping over all pages happens in the hypervisor.
This significantly reduces the overhead of sharing a range of memory.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agolibxl: create xenstore nodes for control/feature-XXX flags
Paul Durrant [Mon, 1 Aug 2016 08:57:10 +0000 (09:57 +0100)]
libxl: create xenstore nodes for control/feature-XXX flags

The xenstore-paths documentation specifies various control/feature-XXX
flags to allow a guest to tell a toolstack about its abilities to
respond to values written to control/shutdown. However, because the
parent control xenstore key is created read-only to the guest, unless
empty nodes for the feature flags are also created read/write by the
toolstack, the guest will not be able to set any flags.

This patch adds code to create all specified feature flag nodes at
domain creation time.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: fix printing hotplug arguments/environment
Roger Pau Monne [Tue, 2 Aug 2016 10:49:51 +0000 (12:49 +0200)]
libxl: fix printing hotplug arguments/environment

An OS could decide to not pass any environment variables to hotplug scripts,
and this will trigger a bug in device_hotplug logic, since it expects the
environment array to exist. Allow env to be NULL.
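
A sketch of the defensive pattern, assuming a NULL-terminated string
array; print_string_array() is hypothetical and is not libxl's actual
logging code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logger for hotplug script arguments or environment.
 * arr may be NULL when the OS passes no environment variables. */
static size_t print_string_array(const char *label, char *const *arr)
{
    size_t n = 0;

    if (arr == NULL) {
        printf("%s: (none)\n", label);
        return 0;
    }
    for (; arr[n] != NULL; n++)
        printf("%s[%zu]: %s\n", label, n, arr[n]);
    return n;
}
```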

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: define semantics of vncpasswd in xl.cfg
Jim Fehlig [Fri, 29 Jul 2016 22:56:22 +0000 (16:56 -0600)]
docs: define semantics of vncpasswd in xl.cfg

A recent discussion around LSN-2016-0001 [1] included defining
the semantics of an empty string for a VNC password. It was stated
that "libxl interprets an empty password in the caller's
configuration to mean that passwordless access should be permitted".

The same applies to the vncpasswd setting in xl.cfg. This patch
extends the xl.cfg documentation to define the semantics of setting
vncpasswd to an empty string.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/PCI: update ACPI Check to include SGI Ux3
Boris Ostrovsky [Tue, 2 Aug 2016 15:52:44 +0000 (17:52 +0200)]
x86/PCI: update ACPI Check to include SGI Ux3

These systems use variations of SGI3* for ID string.

Instead of adding another set of strings, do what Linux did
in commit 526018bc and look at the first three letters.
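
A sketch of the prefix check, assuming the ID is available as a
NUL-terminated string; is_sgi_oem_id() is a hypothetical name, and
Xen's actual comparison may differ in detail.

```c
#include <stdbool.h>
#include <string.h>

/* Accept any ID beginning with "SGI" (e.g. "SGI2", "SGI3...") by
 * comparing only the first three characters. */
static bool is_sgi_oem_id(const char *oem_id)
{
    return strncmp(oem_id, "SGI", 3) == 0;
}
```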

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: use gcc6'es flags asm() output support
Jan Beulich [Tue, 2 Aug 2016 15:51:10 +0000 (17:51 +0200)]
x86: use gcc6'es flags asm() output support

..., rendering affected code more efficient and smaller.

Note that in atomic.h this at once does away with the redundant output
and input specifications of the memory location touched.
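
A sketch of what flag outputs look like, assuming x86-64 with GCC 6+ or
a recent clang, which define __GCC_ASM_FLAG_OUTPUTS__ when the feature
is available. The "=@ccc" constraint hands the carry flag to a C
variable, so the compiler can branch on the flag directly instead of
the asm needing an explicit SETC. test_bit_sketch() is a hypothetical
helper, not Xen's test_bit().

```c
#include <stdbool.h>

/* Return bit nr of word, read via the carry flag that BT sets. */
static inline bool test_bit_sketch(unsigned int nr, unsigned long word)
{
    bool set;
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
    /* "=@ccc": the output operand is the carry flag itself. */
    __asm__ ("btq %2, %1"
             : "=@ccc" (set)
             : "r" (word), "r" ((unsigned long)nr));
#else
    /* Portable fallback; BT on a register also takes nr modulo width. */
    set = (word >> (nr % (8 * sizeof(word)))) & 1;
#endif
    return set;
}
```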

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agoxen/types: Correct the definition of uintptr_t
Andrew Cooper [Mon, 1 Aug 2016 12:36:44 +0000 (13:36 +0100)]
xen/types: Correct the definition of uintptr_t

uintptr_t is specified as unsigned int in 32bit, not unsigned long.  This is
why, when copying inttypes.h from GCC, the use of PRIxPTR and similar is
broken for 32bit builds.

Use __attribute__((__mode__(__pointer__))) to get the compiler's default
pointer-sized integer type, which matches the pre-existing inttypes.h.

Fix the identified breakage with ELF_PRPTRVAL.

Compile tested on all architectures, with a manual printk() to trigger any
potential -Wformat issues.
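
A minimal sketch of the typedef technique, using a hypothetical name so
it does not clash with <stdint.h>. The mode attribute converts the base
integer type to the compiler's pointer-sized mode while keeping it
unsigned, so it resolves to unsigned int on typical 32-bit targets and
to a 64-bit unsigned type on 64-bit ones.

```c
/* Pointer-sized unsigned integer, whatever the target. */
typedef unsigned int my_uintptr_t __attribute__((__mode__(__pointer__)));

_Static_assert(sizeof(my_uintptr_t) == sizeof(void *),
               "my_uintptr_t must be pointer-sized");
```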

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/common: Sort the obj build order
Andrew Cooper [Mon, 1 Aug 2016 13:03:32 +0000 (14:03 +0100)]
xen/common: Sort the obj build order

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/types: Alter typedef for bool_t
Andrew Cooper [Mon, 1 Aug 2016 10:34:35 +0000 (11:34 +0100)]
xen/types: Alter typedef for bool_t

As xen/stdbool.h is included, the typedef should use bool rather than _Bool.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mm: Clean up the construction of base_disallow_mask
Andrew Cooper [Fri, 15 Jul 2016 18:34:00 +0000 (19:34 +0100)]
x86/mm: Clean up the construction of base_disallow_mask

 * Use _PAGE_AVAIL_HIGH and _PAGE_NX instead of open-coding them
 * Drop further remnants of the 32bit hypervisor build

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mm: Avoid NULL dereference when checking altp2m's for shareability
Andrew Cooper [Wed, 27 Jul 2016 17:54:16 +0000 (18:54 +0100)]
x86/mm: Avoid NULL dereference when checking altp2m's for shareability

Coverity identifies that __get_gfn_type_access() unconditionally writes to its
type parameter under a number of circumstances.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agox86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()
Chao Gao [Mon, 1 Aug 2016 16:22:54 +0000 (18:22 +0200)]
x86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()

MSI-X table initialization was deferred by commit
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798. If an assigned device does not support
MSI-X, msixtbl_list won't be initialized. However, the following call path
    XEN_DOMCTL_bind_pt_irq
      pt_irq_create_bind
        msixtbl_pt_register
does not check for this case. Errors (malware, etc.) may lead to
XEN_DOMCTL_bind_pt_irq being called without a clear gtable, which will cause
a Xen panic.
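
A sketch of the missing check, assuming (as a simplification) that a
list head which was never initialized is all zeroes, so its next
pointer is NULL. The structures and the _sketch functions are
hypothetical stand-ins, and -ENODEV is likewise an assumed error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal list head: a zeroed head has next == NULL. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *l) { l->next = l->prev = l; }

/* Hypothetical per-domain state holding the MSI-X table list. */
struct domain_sketch { struct list_head msixtbl_list; };

static bool msixtbl_initialised(const struct domain_sketch *d)
{
    return d->msixtbl_list.next != NULL;
}

static int msixtbl_pt_register_sketch(struct domain_sketch *d)
{
    /* Refuse instead of walking a list that was never set up. */
    if (!msixtbl_initialised(d))
        return -ENODEV;
    /* ... walk d->msixtbl_list and register the table here ... */
    return 0;
}
```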

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomwait-idle: correct/improve BXT support
Jan Beulich [Mon, 1 Aug 2016 16:21:37 +0000 (18:21 +0200)]
mwait-idle: correct/improve BXT support

Linux commit 5dcef69486 ("intel_idle: add BXT support") added an
8-element lookup array with just a 2-bit value used for lookups. As per
the SDM that bit field is really 3 bits wide. Since the top two array
entries are zero, deal with the resulting invalid (zero) values by
moving the zero-MSR-value check into irtl_2_usec() and having that
function's caller check its result instead.
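
A sketch of the corrected conversion, modelled on the Linux intel_idle
helper named above. The unit table follows the SDM's encoding of the
IRTL time-unit field as we understand it (reserved encodings 6 and 7
map to 0), and callers are assumed to treat a zero result as "no usable
value".

```c
#include <stdint.h>

/* Time unit in ns selected by IRTL bits 12:10; 6 and 7 are reserved. */
static const uint64_t irtl_ns_units[8] = {
    1, 32, 1024, 32768, 1048576, 33554432, 0, 0
};

/* Convert an IRTL MSR value to microseconds; 0 means "unusable". */
static uint64_t irtl_2_usec(uint64_t irtl)
{
    uint64_t ns;

    if (!irtl)                                  /* zero MSR value */
        return 0;
    ns = irtl_ns_units[(irtl >> 10) & 0x7];     /* 3-bit field, not 2-bit */
    return (irtl & 0x3ff) * ns / 1000;
}
```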

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Linux commit: 3451ab3ebf92b12801878d8b5c94845afd4219f0]
[Linux commit: bef450962597ff39a7f9d53a30523aae9eb55843]

8 years agoMAINTAINERS: update Quan Xu's email address
Quan Xu [Mon, 1 Aug 2016 10:41:26 +0000 (11:41 +0100)]
MAINTAINERS: update Quan Xu's email address

Signed-off-by: Quan Xu <xuquan8@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
8 years agolibxl: compilation warning fix for arm & aarch64
Chris Patterson [Wed, 27 Jul 2016 20:01:26 +0000 (16:01 -0400)]
libxl: compilation warning fix for arm & aarch64

GCC 6 will warn on unused static const variables in C modules:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00847.html

When compiling with LIBXL_HAVE_NO_SUSPEND_RESUME set (arm & aarch64),
the compiler emits the following errors:
  xl_cmdimpl.c:101:19: error: 'migrate_report'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:99:19: error: 'migrate_permission_to_go'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:97:19: error: 'migrate_receiver_ready'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:95:19: error: 'migrate_receiver_banner'
      defined but not used [-Werror=unused-const-variable=]

These const variables are only used in functions that live inside the
ifndef block:
   #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
   ...
   #endif

Wrap the same ifndef around these variables.

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxsm: don't require configuring tools to build xen xsm blob
Wei Liu [Mon, 25 Jul 2016 15:13:13 +0000 (16:13 +0100)]
xsm: don't require configuring tools to build xen xsm blob

Starting from 08cffe66 ("xsm: add a default policy to .init.data") we
can attach an XSM policy blob to the hypervisor. To build that policy
blob, the hypervisor build system now needs to enter the tools directory.

The expectations for the hypervisor and tools build systems are
different. We don't want the Xen build system to depend on configure,
but we do want the tools build system to. That commit broke this
expectation because it required users to run configure before building
the hypervisor. This broke the ARM build because ARM developers normally
build the hypervisor and tools separately (and possibly on different
platforms). It can also break x86 if developers don't run configure
before building the hypervisor with XSM on.

To fix it, move the major part of tools/flask/policy/Makefile into
Makefile.common and create a tools-only Makefile that includes that
common Makefile. The hypervisor Makefile will use Makefile.common to
build the XSM policy.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: p2m: Pass the p2m in parameter rather the domain when it is possible
Julien Grall [Thu, 28 Jul 2016 14:20:20 +0000 (15:20 +0100)]
xen/arm: p2m: Pass the p2m in parameter rather the domain when it is possible

Some p2m functions do not care about the domain except to get the
associated p2m.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Replace flush_tlb_domain by p2m_flush_tlb
Julien Grall [Thu, 28 Jul 2016 14:20:19 +0000 (15:20 +0100)]
xen/arm: p2m: Replace flush_tlb_domain by p2m_flush_tlb

The function to flush the TLBs for a given p2m does not need to know about
the domain, so pass the p2m directly as a parameter.

At the same time, rename the function to p2m_flush_tlb to match the
parameter change.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Don't export flush_tlb_domain
Julien Grall [Thu, 28 Jul 2016 14:20:18 +0000 (15:20 +0100)]
xen/arm: Don't export flush_tlb_domain

The function flush_tlb_domain is not used outside of the file where it
is defined.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state
Julien Grall [Thu, 28 Jul 2016 14:20:17 +0000 (15:20 +0100)]
xen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state

p2m_restore_state is the last caller of p2m_load_VTTBR and already checks
whether the vCPU belongs to the idle domain.

Note that it is likely possible to remove some ISBs in
p2m_restore_state; however, this is not the purpose of this patch, so the
numerous ISBs have been left.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain
Julien Grall [Thu, 28 Jul 2016 14:20:16 +0000 (15:20 +0100)]
xen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain

The current implementation of flush_tlb_domain relies on the domain
having a single p2m. With the upcoming altp2m feature, a single domain
may have several p2ms, so we would need to switch to the correct p2m in
order to flush the TLBs.

Rather than checking whether the domain is not the current domain, check
whether the VTTBR is different. The resulting assembly code is much
smaller: from 38 instructions (+ 2 function calls) to 22 instructions.
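
A sketch of the compare-and-switch logic, with a plain variable
standing in for the VTTBR_EL2 system register; every name here is
hypothetical. Real code would use system-register accessors, an ISB
after each write, and would typically keep interrupts disabled while
the other p2m's VTTBR is loaded.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VTTBR_EL2 and for observing the flush. */
static uint64_t fake_vttbr_el2;
static uint64_t vttbr_seen_by_flush;

struct p2m_sketch { uint64_t vttbr; };

/* The flush acts on whatever VMID is currently loaded. */
static void flush_current_vmid_tlb(void)
{
    vttbr_seen_by_flush = fake_vttbr_el2;
}

static void p2m_flush_tlb_sketch(const struct p2m_sketch *p2m)
{
    uint64_t ovttbr = fake_vttbr_el2;

    /* Only switch if the target p2m is not the one already loaded. */
    if (ovttbr != p2m->vttbr)
        fake_vttbr_el2 = p2m->vttbr;

    flush_current_vmid_tlb();

    /* Restore the previous VTTBR if we switched. */
    if (ovttbr != p2m->vttbr)
        fake_vttbr_el2 = ovttbr;
}
```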

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Don't need to restore the state for an idle vCPU.
Julien Grall [Thu, 28 Jul 2016 14:20:15 +0000 (15:20 +0100)]
xen/arm: p2m: Don't need to restore the state for an idle vCPU.

The function p2m_restore_state could be called with an idle vCPU as
argument (when called by construct_dom0). However, we will never return
to EL0/EL1 in this case, so it is not necessary to restore the p2m
registers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain
Julien Grall [Thu, 28 Jul 2016 14:20:14 +0000 (15:20 +0100)]
xen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain

The field vttbr holds the base address of the translation table for the
guest. Its value depends on how the p2m has been initialized and
is only used by the P2M code.

So move the field from arch_domain to p2m_domain. This will also ease
the implementation of altp2m.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Don't call p2m_alloc_table from arch_domain_create
Julien Grall [Thu, 28 Jul 2016 14:20:13 +0000 (15:20 +0100)]
xen/arm: Don't call p2m_alloc_table from arch_domain_create

The p2m root table does not need to be allocated separately.

Also remove unnecessary fields initialization as the structure is already
memset to 0 and the fields will be overridden by p2m_alloc_table.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Switch the p2m lock from spinlock to rwlock
Julien Grall [Thu, 28 Jul 2016 14:20:12 +0000 (15:20 +0100)]
xen/arm: p2m: Switch the p2m lock from spinlock to rwlock

P2M reads do not need to be serialized. Serializing them with a spinlock
adds contention when PV drivers use multi-queue, because parallel grant
maps/unmaps/copies will happen on the DomU's p2m.
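
The read/write split can be sketched with POSIX rwlocks standing in for
Xen's rwlock_t. The helper names follow the p2m_{read,write}_{,un}lock
pattern from this series, but the structure and the lookup/update
functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

struct p2m_lock_sketch {
    pthread_rwlock_t lock;
    unsigned long entry;
};

static void p2m_read_lock(struct p2m_lock_sketch *p)    { pthread_rwlock_rdlock(&p->lock); }
static void p2m_read_unlock(struct p2m_lock_sketch *p)  { pthread_rwlock_unlock(&p->lock); }
static void p2m_write_lock(struct p2m_lock_sketch *p)   { pthread_rwlock_wrlock(&p->lock); }
static void p2m_write_unlock(struct p2m_lock_sketch *p) { pthread_rwlock_unlock(&p->lock); }

/* Lookups only read the p2m, so many can proceed concurrently. */
static unsigned long p2m_lookup_sketch(struct p2m_lock_sketch *p)
{
    unsigned long v;

    p2m_read_lock(p);
    v = p->entry;
    p2m_read_unlock(p);
    return v;
}

/* Updates still exclude all readers and other writers. */
static void p2m_set_entry_sketch(struct p2m_lock_sketch *p, unsigned long v)
{
    p2m_write_lock(p);
    p->entry = v;
    p2m_write_unlock(p);
}
```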

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Introduce p2m_{read,write}_{,un}lock helpers
Julien Grall [Thu, 28 Jul 2016 14:20:11 +0000 (15:20 +0100)]
xen/arm: p2m: Introduce p2m_{read,write}_{,un}lock helpers

Some functions in the p2m code do not need to modify the P2M.
Document it by introducing separate helpers to lock the p2m.

This patch does not change the lock. This will be done in a subsequent
patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Remove unnecessary locking
Julien Grall [Thu, 28 Jul 2016 14:20:10 +0000 (15:20 +0100)]
xen/arm: p2m: Remove unnecessary locking

The p2m is not yet in use when p2m_init and p2m_allocate_table are
called. Furthermore, the p2m is no longer used when p2m_teardown is
called. So taking the p2m lock is not necessary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Find the memory attributes based on the p2m type
Julien Grall [Thu, 28 Jul 2016 14:20:09 +0000 (15:20 +0100)]
xen/arm: p2m: Find the memory attributes based on the p2m type

Currently, mfn_to_p2m_entry relies on the caller to provide the
correct memory attribute and deduces the shareability from it.

Some of the callers, such as p2m_create_table, use the same memory
attribute regardless of the underlying p2m type. For instance, this
leads to the memory attribute changing from MATTR_DEV to MATTR_MEM when
an MMIO superpage is shattered.

Furthermore, it makes it more difficult to support different shareability
with the same memory attribute.

All the memory attributes can be deduced from the p2m type. This will
simplify the code by dropping one parameter.
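
A sketch of deducing attributes from the type alone, assuming a reduced
set of p2m types. The enums and the shareability chosen for each case
are illustrative assumptions, not a copy of Xen's mfn_to_p2m_entry().

```c
/* Hypothetical subset of p2m types and attribute encodings. */
typedef enum {
    p2m_ram_rw, p2m_ram_ro, p2m_mmio_direct_nc, p2m_mmio_direct_c
} p2m_type_t;
enum mattr { MATTR_DEV, MATTR_MEM };
enum shareability { SH_INNER, SH_OUTER };

struct pte_attrs { enum mattr mattr; enum shareability sh; };

/* Derive attributes from the type, so shattering an MMIO superpage
 * cannot silently turn device memory into normal memory. */
static struct pte_attrs attrs_from_type(p2m_type_t t)
{
    struct pte_attrs a;

    switch (t) {
    case p2m_mmio_direct_nc:
        a.mattr = MATTR_DEV;
        a.sh = SH_OUTER;
        break;
    case p2m_mmio_direct_c:
        a.mattr = MATTR_MEM;
        a.sh = SH_OUTER;
        break;
    default:                  /* RAM and other normal-memory types */
        a.mattr = MATTR_MEM;
        a.sh = SH_INNER;
        break;
    }
    return a;
}
```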

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Differentiate cacheable vs non-cacheable MMIO
Julien Grall [Thu, 28 Jul 2016 14:20:08 +0000 (15:20 +0100)]
xen/arm: p2m: Differentiate cacheable vs non-cacheable MMIO

Currently, the p2m type p2m_mmio_direct is used to map in stage-2 both
cacheable MMIO (via map_regions_rw_cache) and non-cacheable MMIO (via
map_mmio_regions). The p2m code relies on the caller to give the
correct memory attribute.

In a follow-up patch, the p2m code will rely on the p2m type to find the
correct memory attribute. In preparation for this, introduce
p2m_mmio_direct_nc and p2m_mmio_direct_c to differentiate the
cacheability of the MMIO.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Use a whitelist rather than blacklist in get_page_from_gfn
Julien Grall [Thu, 28 Jul 2016 14:20:07 +0000 (15:20 +0100)]
xen/arm: p2m: Use a whitelist rather than blacklist in get_page_from_gfn

Currently, the check in get_page_from_gfn uses a blacklist. This is
very fragile because we may forget to update the check when a new p2m
type is added.

To avoid any possible issue, use a whitelist. All types backed by a RAM
page could potentially be valid. The check is borrowed from x86.

Note that with this change, it is no longer possible to retrieve a page
when the p2m type is p2m_iommu_map_*. This is fine because they are
special mappings for the direct mapping workaround and the associated
GFN should not be used at all by callers of get_page_from_gfn.
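
A sketch of a whitelist check in the style of x86's p2m_to_mask() and
P2M_RAM_TYPES, assuming only the two RAM types are listed; the enum
values and their order are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical p2m types. */
typedef enum {
    p2m_invalid, p2m_ram_rw, p2m_ram_ro, p2m_mmio_direct_nc,
    p2m_iommu_map_rw, p2m_iommu_map_ro,
} p2m_type_t;

#define p2m_to_mask(t) (1UL << (t))

/* Whitelist: only types backed by a RAM page may yield a page. */
#define P2M_RAM_TYPES (p2m_to_mask(p2m_ram_rw) | p2m_to_mask(p2m_ram_ro))

static bool p2m_is_any_ram(p2m_type_t t)
{
    return (p2m_to_mask(t) & P2M_RAM_TYPES) != 0;
}
```

Because the mask lists what is allowed, a newly added type is rejected
until someone deliberately adds it, which is the safer failure mode.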

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Use the typesafe MFN in mfn_to_p2m_entry
Julien Grall [Thu, 28 Jul 2016 14:20:06 +0000 (15:20 +0100)]
xen/arm: p2m: Use the typesafe MFN in mfn_to_p2m_entry

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agolibxl: fix double free when doing xl save
Juergen Gross [Thu, 28 Jul 2016 07:21:43 +0000 (09:21 +0200)]
libxl: fix double free when doing xl save

Commit d2412fd63b14c6c21d0a3d4367afa448425dfb8a ("libxl: move common
nic stuff into one source") introduced a double free error in libxl
which occurred during "xl save".

Correct this error.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/arm: Fix coding style and update comment in acpi_route_spis
Julien Grall [Wed, 27 Jul 2016 13:58:30 +0000 (14:58 +0100)]
xen/arm: Fix coding style and update comment in acpi_route_spis

The comment was not correctly indented. Also, the preferred name for the
initial domain is "hardware domain" and not "dom0", so replace it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: acpi: route all unused IRQs to DOM0
Julien Grall [Wed, 27 Jul 2016 13:58:29 +0000 (14:58 +0100)]
xen/arm: acpi: route all unused IRQs to DOM0

It is not possible to know which IRQs will be used by DOM0 when ACPI is
in use. The approach implemented by this patch routes all unused
IRQs to DOM0 before it has booted.

The number of IRQs routed is based on the maximum number of SPIs
supported by the hardware (up to ~1000). However, some of them might not
be wired, so we would allocate resources for nothing.

For each IRQ routed, Xen allocates memory for irqaction (40 bytes)
and irq_guest (16 bytes). So in the worst-case scenario ~54KB of memory
will be allocated. Given that ACPI will mostly be used by servers, I
think it is a small drawback.

map_irq_to_domain is slightly reworked to remove the dependency on
device-tree, so the function can also be used for ACPI, which
avoids code duplication.
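
The worst-case figure follows directly from the per-IRQ sizes quoted
above; routing_overhead_bytes() is a hypothetical helper, and 1000
stands in for the approximate SPI count.

```c
#include <stddef.h>

/* Worst-case bytes allocated when routing nr_spis IRQs, using the
 * per-IRQ structure sizes quoted in the commit message. */
static size_t routing_overhead_bytes(size_t nr_spis)
{
    const size_t irqaction_sz = 40;   /* irqaction */
    const size_t irq_guest_sz = 16;   /* irq_guest */

    return nr_spis * (irqaction_sz + irq_guest_sz);
}
```

For 1000 SPIs this is 1000 * (40 + 16) = 56000 bytes, or about 54 KiB.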

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
8 years agoxen/arm: Allow DOM0 to set the IRQ type
Julien Grall [Wed, 27 Jul 2016 13:58:28 +0000 (14:58 +0100)]
xen/arm: Allow DOM0 to set the IRQ type

The function route_irq_to_guest mandates that the IRQ type, stored in
desc->arch.type, be valid. However, in the case of ACPI, this
information is not part of the static tables. Therefore Xen needs to
rely on DOM0 to provide a valid type based on the firmware tables.

A new helper, irq_type_set_by_domain, is provided to check whether a
domain is allowed to set the IRQ type. For now, only DOM0 is allowed to
configure it.

When the helper returns 1, the routing function will neither check
whether the IRQ type is correctly set nor configure the GIC. Instead,
this will be done when the domain enables the interrupt.

Note that irq_set_spi_type is not called because it validates the type
and does not allow the domain to change the type after the first
write. It means that desc->arch.type may never be set, which is fine
because the field is only used to configure the type during the routing.

According to section 4.3.13 of ARM IHI 0048B.b, changing the value of
Int_config is UNPREDICTABLE when the corresponding interrupt is not
disabled.

Therefore, setting the IRQ type when the guest writes into ICFGR
would require more work to make sure the IRQ has been disabled before
writing into the host ICFGR. As the behavior is UNPREDICTABLE, the type
will be set before enabling the physical IRQ associated with the virtual IRQ.
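
A sketch of how the helper changes the routing path, using hypothetical
minimal structures; only the helper's name comes from the commit, and
the -1 error value is an assumption.

```c
#include <stdbool.h>

/* Hypothetical minimal domain and IRQ descriptor. */
struct domain_sketch { int domain_id; };
static struct domain_sketch *hardware_domain;

struct irq_desc_sketch { bool type_valid; bool gic_configured; };

/* Only the hardware domain (DOM0) may set the IRQ type. */
static bool irq_type_set_by_domain(const struct domain_sketch *d)
{
    return d == hardware_domain;
}

/* When the domain owns the type, skip both the type check and the GIC
 * configuration; they happen later, when the IRQ is enabled. */
static int route_irq_to_guest_sketch(const struct domain_sketch *d,
                                     struct irq_desc_sketch *desc)
{
    if (!irq_type_set_by_domain(d)) {
        if (!desc->type_valid)
            return -1;            /* type must already be known */
        desc->gic_configured = true;
    }
    return 0;
}
```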

Signed-off-by: Julien Grall <julien.grall@arm.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoRevert "xen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI"
Julien Grall [Wed, 27 Jul 2016 13:58:27 +0000 (14:58 +0100)]
Revert "xen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI"

This reverts commit f91c84edebe67296e4051af055dbf0adafb13a37. SPI
routing for ACPI support will be added in a follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: Document how gic_set_irq_type should be called
Julien Grall [Wed, 27 Jul 2016 13:58:26 +0000 (14:58 +0100)]
xen/arm: gic: Document how gic_set_irq_type should be called

Changing the value of Int_config is UNPREDICTABLE when the corresponding
interrupt is not disabled.

The driver assumes the interrupt will be disabled by the caller of
gic_set_irq_type. Add an ASSERT to ensure it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: set_type: Pass the type in parameter rather than in desc->arch.type
Julien Grall [Wed, 27 Jul 2016 13:58:25 +0000 (14:58 +0100)]
xen/arm: gic: set_type: Pass the type in parameter rather than in desc->arch.type

A follow-up patch will not store the type in desc->arch.type. Also,
passing the type as a parameter makes the callback prototype more logical.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: split set_irq_properties
Julien Grall [Wed, 27 Jul 2016 13:58:24 +0000 (14:58 +0100)]
xen/arm: gic: split set_irq_properties

The callback set_irq_properties will configure the GIC for a specific
IRQ with the type and the priority.

In a follow-up patch, Xen will configure the type and the priority at
different stages of the routing. So split it into 2 separate callbacks.

At the same time, move the ASSERTs checking the validity of the type and
whether desc->lock is held into the common code (gic.c). This is because
the constraints are the same for GICv2 and GICv3; however, the driver
of the latter did not contain any sanity check.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: Do not configure affinity during routing
Julien Grall [Wed, 27 Jul 2016 13:58:23 +0000 (14:58 +0100)]
xen/arm: gic: Do not configure affinity during routing

The affinity of a guest IRQ is set every time the guest enables it (see
vgic_enable_irqs).

It is not necessary to set the affinity when the IRQ is routed to the
guest because Xen will never receive the IRQ until it has been enabled
by the guest.

To keep gic_route_irq_to_{xen,guest} behaving the same way (i.e. just
setting up the routing), setting the affinity of IRQs routed to Xen is
moved into __setup_irq.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
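The ordering described in this commit can be modelled in a few lines. This is a hypothetical sketch: the descriptor fields and the three helpers are invented, and they only mimic where the affinity gets programmed (at guest enable time, or in `__setup_irq` for IRQs routed to Xen); they do not reproduce `vgic_enable_irqs` or `__setup_irq`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; an affinity mask of 0 means "not programmed yet". */
struct irq_desc {
    bool routed;
    bool enabled;
    unsigned int affinity_mask;
};

/* Guest routing only sets up the route; the affinity is left alone. */
static void route_irq_to_guest(struct irq_desc *desc)
{
    desc->routed = true;
}

/* Modelled on vgic_enable_irqs: the affinity is set on every guest enable. */
static void guest_enable_irq(struct irq_desc *desc, unsigned int cpu_mask)
{
    desc->affinity_mask = cpu_mask;
    desc->enabled = true;
}

/* Modelled on __setup_irq: IRQs routed to Xen get their affinity here. */
static void setup_xen_irq(struct irq_desc *desc, unsigned int cpu_mask)
{
    desc->routed = true;
    desc->affinity_mask = cpu_mask;
}
```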
8 years agoxen/arm: gic: Consolidate the IRQ affinity set in a single place
Julien Grall [Wed, 27 Jul 2016 13:58:22 +0000 (14:58 +0100)]
xen/arm: gic: Consolidate the IRQ affinity set in a single place

The code to set the IRQ affinity is duplicated: once in
gicv{2,3}_set_properties and again in gicv{2,3}_irq_set_affinity.

Remove the code from gicv{2,3}_set_properties and call the affinity
helper directly from the common code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
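A rough sketch of the consolidated affinity flow follows: the driver's properties callback no longer writes the affinity, and common code calls the single affinity helper directly. All names and fields are hypothetical simplifications, and a plain `unsigned int` bitmask stands in for Xen's `cpumask_t`.

```c
#include <assert.h>

struct irq_desc {
    unsigned int type;
    unsigned int priority;
    unsigned int affinity;   /* models the interrupt routing target */
};

struct hw_irq_controller {
    void (*set_affinity)(struct irq_desc *desc, unsigned int cpu_mask);
};

/* The single place in this hypothetical driver that writes the affinity. */
static void drv_irq_set_affinity(struct irq_desc *desc, unsigned int cpu_mask)
{
    desc->affinity = cpu_mask;
}

static const struct hw_irq_controller drv_handler = {
    .set_affinity = drv_irq_set_affinity,
};

/* set_properties no longer duplicates the affinity code. */
static void drv_set_properties(struct irq_desc *desc, unsigned int type,
                               unsigned int priority)
{
    desc->type = type;
    desc->priority = priority;
}

/* Common code: configure properties, then call the affinity helper. */
static void gic_route_irq(struct irq_desc *desc, unsigned int type,
                          unsigned int priority, unsigned int cpu_mask)
{
    drv_set_properties(desc, type, priority);
    drv_handler.set_affinity(desc, cpu_mask);
}
```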
8 years agoxen/domctl: Add DOMINFO_hap to xen_domctl_getdomaininfo
Andrew Cooper [Fri, 15 Jul 2016 15:43:48 +0000 (16:43 +0100)]
xen/domctl: Add DOMINFO_hap to xen_domctl_getdomaininfo

This allows a toolstack to identify whether a running domain is using
hardware-assisted paging or not.

The appropriate tests differ by architecture, so introduce
arch_get_domain_info().  ARM unconditionally sets the new flag, while x86
checks with the paging subsystem first.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
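The per-architecture split in arch_get_domain_info() can be sketched as follows. The flag name `DEMO_DOMINF_hap`, its bit position, and the `hap_enabled` field are assumptions made for this example; the real flag is defined in Xen's public domctl interface, and x86 consults the paging subsystem rather than a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real value lives in Xen's public domctl.h. */
#define DEMO_DOMINF_hap (1u << 8)

struct domain {
    bool hap_enabled;        /* stands in for the x86 paging subsystem state */
};

struct domaininfo {
    uint32_t flags;
};

/* ARM: always report HAP. */
static void arm_get_domain_info(const struct domain *d, struct domaininfo *info)
{
    (void)d;
    info->flags |= DEMO_DOMINF_hap;
}

/* x86: report HAP only if the paging subsystem says the domain uses it. */
static void x86_get_domain_info(const struct domain *d, struct domaininfo *info)
{
    if ( d->hap_enabled )
        info->flags |= DEMO_DOMINF_hap;
}
```

A toolstack would then simply test the bit in the returned flags.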
8 years agolibxl: move common nic stuff into one source
Juergen Gross [Tue, 12 Jul 2016 15:30:44 +0000 (17:30 +0200)]
libxl: move common nic stuff into one source

Move all NIC-related code of libxl from the common files into a
dedicated source file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: add config update callback to device type framework
Juergen Gross [Tue, 12 Jul 2016 15:30:43 +0000 (17:30 +0200)]
libxl: add config update callback to device type framework

Some device types require a configuration update after a domain has
been resumed. Add a callback for this purpose.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
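One way to picture the optional per-device-type config update callback added above is a table of types with a nullable function pointer, invoked once the domain has resumed, as in the hypothetical sketch below. The structure, field names, and the choice of which types provide the callback are illustrative assumptions, not libxl's actual device type framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain state touched by the callbacks. */
struct domain_config {
    int nic_updates;
    int vtpm_updates;
};

struct device_type {
    const char *type;
    /* Optional: NULL if this type needs no update after resume. */
    void (*update_config)(struct domain_config *cfg);
};

static void nic_update_config(struct domain_config *cfg)
{
    cfg->nic_updates++;
}

/* Illustrative table: only "nic" provides the callback here. */
static const struct device_type device_types[] = {
    { "nic",  nic_update_config },
    { "vtpm", NULL },
};

/* Called once the domain has been resumed. */
static void update_devices_after_resume(struct domain_config *cfg)
{
    for (size_t i = 0; i < sizeof(device_types) / sizeof(device_types[0]); i++)
        if (device_types[i].update_config)
            device_types[i].update_config(cfg);
}
```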
8 years agolibxl: split libxl vtpm code into one source
Juergen Gross [Tue, 12 Jul 2016 15:30:42 +0000 (17:30 +0200)]
libxl: split libxl vtpm code into one source

Move all vtpm-related code of libxl into a dedicated source file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: move library pvusb specific code into libxl_pvusb.c
Juergen Gross [Tue, 12 Jul 2016 15:30:41 +0000 (17:30 +0200)]
libxl: move library pvusb specific code into libxl_pvusb.c

Outside libxl_pvusb.c, only libxl_util.c still contains some pvusb code.

Move it to libxl_pvusb.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: add "pv device mode needed" support to device type framework
Juergen Gross [Tue, 12 Jul 2016 15:30:40 +0000 (17:30 +0200)]
libxl: add "pv device mode needed" support to device type framework

Add another callback to the device type framework to help decide
whether a pv domain needs a device model.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
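The device-model decision described above can be sketched as asking each device type in turn. The `dm_needed` callback shape, the configuration fields, and the rule that a disk with a device-model backend triggers the requirement are assumptions for illustration only, not libxl's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a pv domain's devices. */
struct pv_domain_config {
    int disks_with_dm_backend;   /* assumption: disks served by a device model */
    int nics;
};

struct device_type {
    const char *type;
    /* Returns true if this type requires a device model; NULL means never. */
    bool (*dm_needed)(const struct pv_domain_config *cfg);
};

static bool disk_dm_needed(const struct pv_domain_config *cfg)
{
    return cfg->disks_with_dm_backend > 0;
}

static const struct device_type device_types[] = {
    { "disk", disk_dm_needed },
    { "nic",  NULL },   /* in this sketch NICs never require a device model */
};

/* A pv domain needs a device model if any device type says so. */
static bool pv_domain_needs_dm(const struct pv_domain_config *cfg)
{
    for (size_t i = 0; i < sizeof(device_types) / sizeof(device_types[0]); i++)
        if (device_types[i].dm_needed && device_types[i].dm_needed(cfg))
            return true;
    return false;
}
```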