xenbits.xensource.com Git - xen.git/log
Jan Beulich [Fri, 12 Aug 2016 14:57:07 +0000 (16:57 +0200)]
x86emul: don't special case fetching unsigned 8-bit immediates

These can be made to work using SrcImmByte, making sure the low 8 bits of
src.val get suitably zero extended upon consumption. SHLD and SHRD
require a little more adjustment: Their source operands get changed
away from SrcReg, handling the register access "manually" instead of
the insn byte fetching.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
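
A minimal sketch of the zero extension described in the commit message above, assuming the generic byte-immediate fetch sign-extends into a wider value. The helper names are hypothetical and are not taken from the emulator source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic byte-immediate fetch: sign-extends. */
static long fetch_imm8_signed(const uint8_t *insn_bytes)
{
    assert(insn_bytes != NULL);
    return (int8_t)insn_bytes[0];
}

/* A consumer wanting an unsigned 8-bit immediate keeps only the low 8 bits. */
static unsigned long imm8_as_unsigned(long src_val)
{
    return (uint8_t)src_val;
}
```

With the byte 0xf0, the fetch yields -16 and the consumer recovers 0xf0, so no separate unsigned fetch path is needed.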
Jan Beulich [Fri, 12 Aug 2016 14:55:48 +0000 (16:55 +0200)]
x86emul: all push flavors are data moves

Make all paths leading to the "push" label have the Mov flag set, and
ASSERT() that to be the case. For the opcode FF group the adjustment is
benign for the paths not leading to "push", as they all set dst.type to
OP_NONE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 12 Aug 2016 14:55:13 +0000 (16:55 +0200)]
x86emul: don't special case fetching immediates of near and short branches

These immediates follow the standard patterns in all modes, so they're
better fetched by the generic source operand handling code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 12 Aug 2016 14:54:24 +0000 (16:54 +0200)]
x86emul: don't special case fetching the immediate of PUSH

These immediates follow the standard patterns in all modes, so they're
better fetched by the generic source operand handling code.

To facilitate testing, instead of adding yet another of these pretty
convoluted individual test cases, simply introduce another blowfish run
with -mno-accumulate-outgoing-args (the additional -Dstatic is to
keep the compiler from converting the calling convention to
"regparm(3)", which I did observe it does).

To make this introduction of a new blowfish pass (and potential further
ones later on) have less impact on the readability of the final code,
abstract all such "binary blob" executions via a table to iterate
through.

The resulting native code execution adjustment also uncovered a lack of
clobbers on the asm() in the 64-bit case, which is being fixed at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Razvan Cojocaru [Fri, 12 Aug 2016 14:51:36 +0000 (16:51 +0200)]
vm_event: synchronize vCPU state in vm_event_resume()

vm_event_vcpu_pause() needs to use vcpu_pause_nosync() in order
for the current vCPU to not get stuck. A consequence of this is
that the custom vm_event response handlers will not always see
the real vCPU state in v->arch.user_regs. This patch makes sure
that the state is always synchronized in vm_event_resume, before
any handlers have been called. This problem especially affects
vm_event_set_registers().

Simply checking vm_event_pause_count to make sure the vCPU is
paused suffices since there's only one ring / consumer at a
time, and events are being processed one-by-one, so the
toolstack won't unpause the vCPU behind our backs.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Razvan Cojocaru [Fri, 12 Aug 2016 14:51:21 +0000 (16:51 +0200)]
vm_event: fix comment

There's no such thing as function vm_event_wake_waiters() anymore.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Thu, 11 Aug 2016 17:21:14 +0000 (17:21 +0000)]
x86/cpufreq: Avoid using processor_pminfo[cpu] when it is NULL

The undefined behaviour sanitiser shows that it really is NULL via the
pre_initcall path.

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in cpufreq.c:158:66
  (XEN) member access within null pointer of type 'struct processor_pminfo'
  (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Not tainted ]----
  <snip>
  (XEN)    [<ffff82d0801c4231>] cpufreq_add_cpu+0x161/0xdc0
  (XEN)    [<ffff82d0801c6610>] cpufreq.c#cpu_callback+0x20/0x30
  (XEN)    [<ffff82d0804eefad>] cpufreq.c#cpufreq_presmp_init+0x2d/0x50
  (XEN)    [<ffff82d0804c5942>] do_presmp_initcalls+0x22/0x30
  (XEN)    [<ffff82d08051852d>] __start_xen+0x378d/0x42f0
  (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60

Fix two other occurrences of the same buggy logic.

The processor_pminfo[] objects are only allocated as a result of
XENPF_set_processor_pminfo hypercalls, which means that this early cpu
callback will always hit the early NULL check, and is therefore pointless.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
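
The pattern behind the fix above can be sketched as follows: test the per-CPU pointer itself before reading any member through it. The structure, array, and function names are hypothetical stand-ins, not the actual cpufreq code.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical stand-in for the per-CPU power management info. */
struct pminfo_sketch {
    unsigned int init;
};

/* Entries stay NULL until something (a hypercall, in Xen's case) allocates them. */
static struct pminfo_sketch *pminfo[NR_CPUS_SKETCH];

static int pm_ready(unsigned int cpu)
{
    const struct pminfo_sketch *pm;

    if ( cpu >= NR_CPUS_SKETCH )
        return 0;

    pm = pminfo[cpu];

    /* Check the pointer before touching any member. */
    return pm != NULL && pm->init != 0;
}
```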
Andrew Cooper [Thu, 11 Aug 2016 16:32:10 +0000 (16:32 +0000)]
x86/boot: Align e820 and video data in the boot trampoline

The undefined behaviour sanitiser in Clang 3.8 identifies that these are all
misaligned when used in __start_xen().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 11 Aug 2016 10:13:10 +0000 (11:13 +0100)]
libxc: use DPRINTF in xc_domain_dumpcore_via_callback

That line doesn't reveal much information to ordinary users.

Change that to debug output.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Thu, 11 Aug 2016 11:36:42 +0000 (13:36 +0200)]
x86/NUMA: cleanup

- drop the only left CONFIG_NUMA conditional (this is always true)
- drop struct node_data's node_id field (being always equal to the
  node_data[] array index used)
- don't open code node_{start,end}_pfn() nor node_spanned_pages()
  except when used as lvalues (those could be converted too, but this
  seems a little awkward)
- no longer open code pfn_to_paddr() in an expression being modified
  anyway
- make dump less verbose by logging actual vs intended node IDs only
  when they don't match

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 11 Aug 2016 11:35:50 +0000 (13:35 +0200)]
page-alloc/x86: don't restrict DMA heap to node 0

When node zero has no memory, the DMA bit width will end up getting set
to 9, which is obviously not helpful to hold back a reasonable amount
of low enough memory for Dom0 to use for DMA purposes. Find the lowest
node with memory below 4GB instead.

Introduce arch_get_dma_bitsize() to keep this arch-specific logic out
of common code.

Also adjust the original calculation: I think the subtraction of 1
should have been part of the flsl() argument rather than getting
applied to its result. And while previously the division by 4 was valid
to be done on the flsl() result, this now also needs to be converted,
as it should only be applied to the spanned pages value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
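
The arithmetic discussed in the commit above can be sketched as below. This is a reconstruction under assumptions, not the code from the patch: a 4KB page size (PAGE_SHIFT of 12), a portable flsl_sketch() in place of Xen's flsl(), and a presumed shape for the old and new formulas based only on the description in the message.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumed: 4KB pages */

/* 1-based index of the highest set bit, 0 when x is 0. */
static unsigned int flsl_sketch(unsigned long x)
{
    unsigned int bit = 0;

    while ( x )
    {
        x >>= 1;
        bit++;
    }
    return bit;
}

/* Presumed old shape: "- 1" and "- 2" (divide by 4) applied to the result. */
static unsigned int dma_bits_old(unsigned long spanned_pages)
{
    int bits = (int)flsl_sketch(spanned_pages) - 1 + PAGE_SHIFT - 2;

    return bits < 32 ? (unsigned int)bits : 32;
}

/* Presumed new shape: divide and subtract inside the flsl() argument. */
static unsigned int dma_bits_new(unsigned long start_pfn,
                                 unsigned long spanned_pages)
{
    unsigned int bits;

    assert(start_pfn + spanned_pages / 4 > 0);
    bits = flsl_sketch(start_pfn + spanned_pages / 4 - 1) + PAGE_SHIFT;

    return bits < 32 ? bits : 32;
}
```

Under these assumptions, a node with zero spanned pages gives 9 with the old formula, matching the value quoted in the message. The two formulas agree when the page count is a power of two and differ otherwise.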
Trammell Hudson [Thu, 11 Aug 2016 11:34:59 +0000 (13:34 +0200)]
allow reproducible builds of xen.gz

The mkelf32 executable was using an uninitialized stack buffer for
padding after the ehdr and phdr are written to the xen file, which
leads to non-deterministic bytes in the binary and prevented Xen
hypervisors from being reproducibly built.

Additionally, the file was then compressed with gzip -9 without the
-n | --no-name flag, which led to the xen.gz file having
non-deterministic bytes (the timestamp) in the compressed file.

Signed-off-by: Trammell Hudson <trammell.hudson@twosigma.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
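
A minimal sketch of the padding half of the fix above, assuming the padding is written through stdio. The function name and buffer size are hypothetical. The point is only that the buffer is explicitly zeroed, so the bytes written do not depend on whatever was left on the stack.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes of zero padding; returns 0 on success, -1 on a short write. */
static int write_zero_padding(FILE *out, size_t len)
{
    char buffer[1024];

    assert(out != NULL);

    /* Without this, the padding would expose leftover stack contents. */
    memset(buffer, 0, sizeof(buffer));

    while ( len > 0 )
    {
        size_t chunk = len < sizeof(buffer) ? len : sizeof(buffer);

        if ( fwrite(buffer, 1, chunk, out) != chunk )
            return -1;
        len -= chunk;
    }

    return 0;
}
```

The compression half needs no code: as the message says, passing -n (--no-name) to gzip keeps the timestamp out of the compressed file.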
Boris Ostrovsky [Thu, 11 Aug 2016 11:34:16 +0000 (13:34 +0200)]
AMD/VPMU: 0xc0010000 - 0xc001007 MSRs are in PMU range

We need to check for older PMU MSR range when emulating MSR
accesses for PV guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Thu, 11 Aug 2016 11:18:24 +0000 (13:18 +0200)]
x86/HVM: add more checks verifying that PIT/PIC/IOAPIC are emulated

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 5 Aug 2016 13:26:21 +0000 (14:26 +0100)]
x86/microcode: Avoid undefined behaviour from signed integer overflow

The checksums should be calculated using unsigned 32bit integers, as they are
intended to overflow and end at 0.  Replace some other signed integers with
unsigned ones, to avoid mixed-sign comparisons.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
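
As an illustration of the commit above, a checksum that is meant to wrap around and end at 0 can be computed with unsigned 32-bit arithmetic, where overflow is well defined. The function names are hypothetical and the layout of a real microcode blob is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit words; unsigned overflow wraps modulo 2^32 by definition. */
static uint32_t checksum32(const uint32_t *words, size_t count)
{
    uint32_t sum = 0;
    size_t i;

    assert(words != NULL || count == 0);

    for ( i = 0; i < count; i++ )
        sum += words[i];

    return sum;
}

/* A blob is considered valid in this sketch when its words sum to 0. */
static int blob_is_valid(const uint32_t *words, size_t count)
{
    return checksum32(words, count) == 0;
}
```

With a signed accumulator the same loop would overflow past INT_MAX, which is undefined behaviour in C.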
Andrew Cooper [Fri, 5 Aug 2016 13:24:01 +0000 (14:24 +0100)]
xen/x86: Avoid undefined behaviour by shifting into a sign bit

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 5 Aug 2016 13:22:48 +0000 (14:22 +0100)]
xen/common: Avoid undefined behaviour by shifting into a sign bit

For d->shutdown_code, change the field to being unsigned and using an unsigned
sentinel.  The sentinel needs to be distinguishable from any value
representable in a u8.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
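
The two ideas in the commits above can be sketched as follows, assuming a 32-bit int. Shifting a signed 1 into bit 31 is undefined behaviour, while the unsigned form is well defined. The sentinel name and value are hypothetical, chosen only to lie outside the range of a u8.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned operand: setting bit 31 is well defined, unlike 1 << 31. */
static uint32_t bit32(unsigned int nr)
{
    assert(nr < 32);
    return UINT32_C(1) << nr;
}

/* Hypothetical sentinel: unsigned, and not representable in 8 bits. */
#define SHUTDOWN_CODE_UNSET (~0u)

static int shutdown_code_is_set(unsigned int code)
{
    return code != SHUTDOWN_CODE_UNSET;
}
```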
Andrew Cooper [Wed, 10 Aug 2016 09:41:28 +0000 (10:41 +0100)]
x86/traps: Fix failed ASSERT() in do_guest_trap()

c/s 2e426d6 "x86/traps: Drop use_error_code parameter from do_{,guest_}trap()"
introduced an assertion which covered the correctness of shifting 1u by an
input parameter.

While all other inputs provide a constant vector, the `int $N` handling path
from do_general_protection() passes any vector.

This path is triggered by XTF, which uses `int 0x20` to facilitate returning
to kernel mode after running specific tests in user mode.

No vectors above 32 have an error code, so adjust the logic to cope.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
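
A sketch of the kind of check the commit above describes, assuming a bitmap of the exception vectors that architecturally push an error code (#DF, #TS, #NP, #SS, #GP, #PF, #AC). The macro and function names are made up for this example and need not match the Xen source.

```c
#include <assert.h>
#include <stdbool.h>

/* Vectors 8, 10, 11, 12, 13, 14 and 17 push an error code. */
#define TRAP_HAVE_EC_SKETCH                              \
    ((1u << 8) | (1u << 10) | (1u << 11) | (1u << 12) |  \
     (1u << 13) | (1u << 14) | (1u << 17))

static bool trap_has_error_code(unsigned int trapnr)
{
    /* Bound check first: shifting 1u by 32 or more would be undefined. */
    return trapnr < 32 && (TRAP_HAVE_EC_SKETCH & (1u << trapnr));
}
```

An arbitrary vector such as 0x20 from an `int $N` instruction then simply reports no error code instead of tripping an assertion.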
Boris Ostrovsky [Wed, 10 Aug 2016 09:58:34 +0000 (11:58 +0200)]
hvmloader: acpi_build_tables() can't take acpi_config as const

We'd need to update other routines' definitions. However, acpi_config
is not really a const since new_vm_gid() wants to update
acpi_config.vm_gid_addr.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Boris Ostrovsky [Wed, 10 Aug 2016 09:58:17 +0000 (11:58 +0200)]
hvmloader: include libacpi.h instead of acpi2_0.h in rombios.c

This is where struct acpi_config is now defined.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Tue, 9 Aug 2016 15:31:46 +0000 (17:31 +0200)]
common: clean up taint logic

Drop unused UNSAFE_SMP and BAD_PAGE flags. Style adjustments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Boris Ostrovsky [Tue, 9 Aug 2016 15:31:15 +0000 (17:31 +0200)]
hvmloader: move passthrough initialization from ACPI code

Initialize it in hvmloader, avoiding ACPI code's use of xenstore_read().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 9 Aug 2016 15:30:56 +0000 (17:30 +0200)]
hvmloader: decide which SSDTs to install in hvmloader

With that, xenstore_read() won't need to be done in ACPI code.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 9 Aug 2016 15:30:39 +0000 (17:30 +0200)]
hvmloader: initialize vm_gid data outside ACPI code

This way ACPI code won't use xenstore_read() and hvm_param_set()
which are private to hvmloader.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 9 Aug 2016 15:28:59 +0000 (17:28 +0200)]
acpi/hvmloader: allow acpi_build_tables() callers specify acpi_info values

By doing this we can move hvmloader-private interfaces (such as
uart_exists(), lpt_exists() etc.) out of the ACPI builder. This will
help us allow the builder to be called from places other than
hvmloader.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 9 Aug 2016 15:27:39 +0000 (17:27 +0200)]
hvmloader: provide hvmloader_acpi_build_tables()

In preparation for moving out the ACPI builder, make all
BIOSes call hvmloader_acpi_build_tables() instead of
calling ACPI code directly.

No functional changes.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Mon, 8 Aug 2016 09:42:50 +0000 (10:42 +0100)]
tools/xenalyze: Allow automatic resizing of sample buffers

Rather than have large fixed-size buffers, start with smaller buffers
and allow them to grow as needed (doubling each time), with a fairly
large maximum.  Allow this maximum to be set by a command-line
parameter.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
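
The growth policy described in the commit above (start small, double when full, stop at a configurable maximum) might look roughly like this. The structure, field names, and initial capacity of 16 are assumptions made for the sketch, not xenalyze's own definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct sample_buf {
    long long *samples;
    size_t count;    /* samples stored so far */
    size_t capacity; /* samples currently allocated */
    size_t max;      /* upper bound on capacity */
};

/* Append one sample; returns 0 on success, -1 if full at max or out of memory. */
static int sample_buf_add(struct sample_buf *b, long long value)
{
    assert(b != NULL && b->max > 0);

    if ( b->count == b->capacity )
    {
        size_t new_cap = b->capacity ? b->capacity * 2 : 16;
        long long *p;

        if ( new_cap > b->max )
            new_cap = b->max;
        if ( new_cap == b->capacity )
            return -1; /* already at the maximum */

        p = realloc(b->samples, new_cap * sizeof(*p));
        if ( p == NULL )
            return -1;

        b->samples = p;
        b->capacity = new_cap;
    }

    b->samples[b->count++] = value;
    return 0;
}
```

What a caller does once the maximum is reached (dropping or replacing samples, for example) is left open here.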
George Dunlap [Mon, 8 Aug 2016 09:42:49 +0000 (10:42 +0100)]
tools/xenalyze: Get rid of extraneous data structure

The only difference between event_cycle_summary and cycle_summary was
that the former has a separate counter for "events" which had
zero-cycle events.  But a lot of the code dealing with them had to be
duplicated with slightly different fields.

Remove event_cycle_summary, add an "event_count" field to
cycle_summary, and use cycle_summary for everything.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 8 Aug 2016 09:42:48 +0000 (10:42 +0100)]
tools/xenalyze: Remove weighted cpi summaries

At the moment these structures are not used, and half of the code for
collecting it is commented out.  To be used they require further
support for collecting hardware instruction counter data inside of
Xen.

Remove the code entirely; when they're wanted again they will be here
in the git log.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 8 Aug 2016 09:42:48 +0000 (10:42 +0100)]
tools/xenalyze: Remove bogus library dependencies

xenalyze was inheriting LDLIBS of xentrace; but it doesn't need them.

Remove this dependency, which allows xenalyze to be built without the libraries
having been built, and run without the libraries being installed.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Wei Liu [Mon, 8 Aug 2016 15:02:34 +0000 (16:02 +0100)]
libxl: fix declaration of libxl_primary_console_exec_0x040700

Add missing "int".

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 3 Aug 2016 16:56:56 +0000 (16:56 +0000)]
x86/traps: Drop use_error_code parameter from do_{,guest_}trap()

Whether or not an error code is needed can be determined entirely from the
trapnr parameter, as error codes are architecturally specified.

Introduce TRAP_HAVE_EC as a bitmap of reserved vectors which have error codes,
and drop the use_error_code from all callsites.

As a result, the DO_ERROR{,_NOCODE}() macros become entirely superfluous and
can be dropped.  Update the exception_table to point straight at do_trap().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 8 Aug 2016 10:21:31 +0000 (11:21 +0100)]
libxl: CODING_STYLE: Forbid if (...) { stmt; } else stmt;

And clarify that the rule about omitting braces for single statements
is optional (it is even contradicted by the example).

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:55:59 +0000 (10:55 +0100)]
xl: use xenconsole startup protocol

If the user asks xl to automatically connect to the console when creating a
guest, use the new startup protocol before trying to unpause the domain so
that we don't lose any console output.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:36:57 +0000 (10:36 +0100)]
docs: document xenconsole startup protocol

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 09:28:00 +0000 (10:28 +0100)]
libxl: libxl_{primary_,}console_exec now take notify_fd argument

The new argument will be passed down to the xenconsole process, which then
uses it to notify readiness.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 1 Aug 2016 11:20:09 +0000 (12:20 +0100)]
libxl: factor out libxl__console_tty_path

No other user yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Fri, 29 Jul 2016 17:24:25 +0000 (18:24 +0100)]
tools/console: introduce --start-notify-fd option for console client

The console client will write 0x00 to that fd before entering the console
loop to indicate its readiness.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
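
The readiness notification described above can be sketched with plain POSIX calls. Both function names are hypothetical. The sketch only shows one side writing a single 0x00 byte and the other side blocking until it arrives, and does not reproduce the console client or xl code.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Notifier side: write one 0x00 byte; returns 0 on success, -1 on error. */
static int notify_ready(int notify_fd)
{
    const unsigned char ready = 0x00;
    ssize_t r;

    do {
        r = write(notify_fd, &ready, 1);
    } while ( r < 0 && errno == EINTR );

    return r == 1 ? 0 : -1;
}

/* Waiter side: block until the byte arrives; 0 if it was the expected 0x00. */
static int wait_ready(int fd)
{
    unsigned char byte = 0xff;
    ssize_t r;

    do {
        r = read(fd, &byte, 1);
    } while ( r < 0 && errno == EINTR );

    return (r == 1 && byte == 0x00) ? 0 : -1;
}
```

A creator of the guest could hand the write end of a pipe to the client and call wait_ready() on the read end before unpausing, so no early console output is lost.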
Wei Liu [Fri, 29 Jul 2016 17:22:26 +0000 (18:22 +0100)]
tools/console: fix help string in client

There is no short '-t' option.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Mon, 8 Aug 2016 10:07:46 +0000 (11:07 +0100)]
CODING_STYLE: Allow single-sentence comments without full stops

One of the common ways in which contributors trip up over the
CODING_STYLE guides is by not putting a full stop at the end of a
comment when there is only a single sentence.  Calling these out is a
waste of everybody's time: The full stop at the end of a comment with
a single sentence (or a single phrase) adds absolutely nothing to the
legibility of the code.

Modify CODING_STYLE to allow comments with a single sentence or
sentence fragment to either have a full stop or not, while making it
clear that comments with multiple sentences must have a full stop at
the end of each sentence.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@citrix.com>
Dario Faggioli [Thu, 4 Aug 2016 08:59:03 +0000 (10:59 +0200)]
tools: xenalyze: kill spurious sched_switch output in non dump mode.

In fact, 52cf096df7 ("xenalyze: handle scheduling event"),
when dealing with TRC_SCHED_SWITCH, forgot to check whether
we actually are in dump mode, causing the printf() in
dump_sched_switch() to always produce its output, which
is not what we want.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 5 Aug 2016 16:00:45 +0000 (18:00 +0200)]
x86/time: also use rdtsc_ordered() in check_tsc_warp()

This really was meant to be added in a v2 of what became commit
fa74e70500 ("x86/time: introduce and use rdtsc_ordered()").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Aug 2016 15:59:32 +0000 (17:59 +0200)]
libelf: drop struct elf_dom_parms' virt_offset member

It's being used solely by elf_xen_addr_calc_check(), and hence can be
a local variable there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Bob Liu [Thu, 4 Aug 2016 01:07:56 +0000 (09:07 +0800)]
libxl: return any serial tty path in libxl_console_get_tty

When specifying a serial list in domain config, users of
libxl_console_get_tty cannot get the tty path of a second specified pty serial,
since right now it always returns the tty path of serial 0.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:47 +0000 (18:10 +0200)]
tools: make xenstore domain easy configurable

Add configuration entries to sysconfig.xencommons for selection of the
xenstore type (domain or daemon) and start the selected xenstore
service via a script called from sysvinit or systemd.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:46 +0000 (18:10 +0200)]
tools: use pidfile for test if xenstored is running

Instead of trying to read xenstore via xenstore-read use the pidfile
of xenstored for the test whether xenstored is running. This prepares
support of xenstore domain, as trying to read xenstore will block
forever in case the xenstore domain is started after trying to read.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:45 +0000 (18:10 +0200)]
tools: split out xenstored starting form xencommons

In order to prepare for starting a xenstore domain, split out the starting
of the xenstore daemon from the xencommons script into a dedicated
launch-xenstore script.

A rerun of autogen.sh is required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Tue, 2 Aug 2016 16:10:44 +0000 (18:10 +0200)]
tools: remove systemd xenstore socket definitions

On a system with systemd the xenstore sockets are created via systemd.
Remove the related configuration files in order to be able to decide
at runtime whether the sockets should be created or not. This will
enable Xen to start xenstore either via a daemon or via a stub domain.

As the xenstore domain start program will exit after it has done its
job, prepare the same behaviour to be tolerated by systemd for the
xenstore daemon by specifying the appropriate flags in the service
file.

A rerun of autogen.sh is required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Julien Grall [Fri, 29 Jul 2016 18:53:19 +0000 (19:53 +0100)]
xen/arm: p2m: Don't use default access permission when shattering a superpage

The following message floods the console when memaccess is enabled on
various platforms:

traps.c:2510:d1v0 HSR=0x9383004f pc=0xffff000008b7d4c4 gva=0xffff000008eeb8e0 gpa=0x0000004903f8e0

This is because a data abort from a guest was received due to a
permission fault but memaccess thought there was no permission fault.

On ARM, memaccess permissions are stored in a radix tree because there
are not enough available bits in the p2m entry to store the access
restriction. When memaccess is restricting the access (i.e. any other
access than p2m_access_rwx), the access will be added in the radix tree
using the GFN as a key. This will be done for all 4KB pages.

This means that memaccess has to shatter all the superpages in a given
region to set the permission on a 4KB granularity. Currently, when a
superpage is shattered, the new entries are using the value
p2m->default_access which will restrict permission (because memaccess
has been enabled). However the radix tree does not yet contain
an entry for this GFN.

If a guest VCPU is running at the same time and trying to access the
modified region, it will result in a stage-2 permission fault. As
the radix tree does not yet contain an entry for the GFN, memaccess will
deduce that the fault was not valid and a data abort will be injected
into the guest (and crash it).

Furthermore, the permission may be restricted outside of the requested
region if it is only a subset of a 1GB/2MB superpage.

The two issues can be fixed by re-using the permission of the superpage
entry and overriding the necessary fields. This is not a problem because
memaccess cannot work on superpages.

Lastly, document the code which calls mfn_to_p2m_entry when creating
the p2m entry for a table, to explain that permissions are ignored by
the hardware (see D4.3.1 in ARM DDI 0487A.j), so the value of the
parameter 'access' of mfn_to_p2m_entry does not matter.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:07 +0000 (18:50 +0100)]
xen/arm: arm64: Add Cortex-A57 erratum 834220 workaround

The ARM erratum applies to certain revisions of Cortex-A57. The
processor may report a Stage 2 translation fault as the result of
a Stage 1 fault for a load crossing a page boundary when there is a
permission fault or device memory fault at stage 1 and a translation
fault at Stage 2.

So Xen needs to check that Stage 1 translation does not generate a fault
before handling the Stage 2 fault. If it is a Stage 1 translation fault,
return to the guest to let the processor inject the correct fault.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:06 +0000 (18:50 +0100)]
xen/arm: traps: Avoid unnecessary VA -> IPA translation in abort handlers

Translating a VA to an IPA is expensive. Currently, Xen is assuming that
HPFAR_EL2 is only valid when the stage-2 data/instruction abort happened
during a translation table walk of a first stage translation (i.e. S1PTW
is set).

However, based on the ARM ARM (D7.2.34 in DDI 0487A.j), the register is
also valid when the data/instruction abort occurred for a translation
fault.

With this change, the VA -> IPA translation will only happen for
permission faults that are not related to a translation table of a
first stage translation.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:05 +0000 (18:50 +0100)]
xen/arm: traps: MMIO should only be emulated for fault translation

The function do_trap_data_abort_guest assumes that a stage-2 data abort
can only be taken for a translation fault or permission fault today.

Whilst this is true today, it might not be in the future. Rather than
emulating the MMIO for any fault other than the permission one, print
a warning message when the fault is not handled by Xen.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:04 +0000 (18:50 +0100)]
xen/arm: Use check_workaround to handle the erratum 766422

Currently, Xen is accessing the stored MIDR every time it has to check
whether the processor is affected by the erratum 766422.

This could take advantage of the new capability bitfields to detect
whether the processor is affected at boot time.

With this patch, the number of instructions to check the erratum is
going down from ~13 (including 2 loads and a co-processor access) to
~6 instructions (including 1 load).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 4 Aug 2016 17:50:03 +0000 (18:50 +0100)]
xen/arm: Provide macros to help creating workaround helpers

Workarounds may require executing a different path when the platform
is affected by the associated erratum. Furthermore, this may need to
be called in the common code.

To avoid too much intrusion/overhead, the workaround helpers need to
be a nop on architectures which will never have the workaround and have
to be quick to check whether the platform requires it.

The alternative framework is used to transform the check into a single
instruction. When the framework is not available, the helper will have
~6 instructions including 1 instruction load.

The macro will create a handler called check_workaround_xxxxx with
xxxxx being the erratum number.

For instance, the line below will create a workaround helper for
erratum #424242 which is enabled when the capability
ARM64_WORKAROUND_424242 is set and only available for ARM64:

CHECK_WORKAROUND_HELPER(424242, ARM64_WORKAROUND_424242, CONFIG_ARM64)

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
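
A simplified sketch of what such a helper-generating macro could look like, without the alternative framework. Everything below is an assumption made for illustration: the capability bitmap, the cpus_have_cap() stand-in, the capability number, and the CONFIG_ARM64 definition are placeholders, and the real macro may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bitmap, filled in once at boot in this sketch. */
static unsigned long cpu_caps;

#define ARM64_WORKAROUND_424242 0 /* hypothetical capability bit number */
#define CONFIG_ARM64 1            /* pretend this build targets arm64 */

/* Stand-in for a boot-time capability lookup. */
static bool cpus_have_cap(unsigned int cap)
{
    return cpu_caps & (1UL << cap);
}

/* Generates check_workaround_<erratum>(); constant false when arch is 0. */
#define CHECK_WORKAROUND_HELPER(erratum, feature, arch) \
static bool check_workaround_##erratum(void)            \
{                                                       \
    return (arch) && cpus_have_cap(feature);            \
}

CHECK_WORKAROUND_HELPER(424242, ARM64_WORKAROUND_424242, CONFIG_ARM64)
```

Common code can then call check_workaround_424242() without any architecture-specific conditionals at the call site.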
Julien Grall [Thu, 4 Aug 2016 17:50:02 +0000 (18:50 +0100)]
xen/arm: traps: Simplify the switch in do_trap_*_abort_guest

The fault statuses we care about are of the form BBBBxx where xx is the lookup
level that gave the fault. We can simplify the code by masking the 2 least
significant bits.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
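
The masking described in the commit above can be sketched as follows, using the ARMv8 fault status encodings in which the low two bits carry the lookup level. The macro and enum names are chosen for this example and may not match those in the Xen tree.

```c
#include <assert.h>

#define FSC_LL_MASK    0x03u /* low 2 bits: lookup level */
#define FSC_FLT_TRANS  0x04u /* 0b0001xx: translation fault */
#define FSC_FLT_ACCESS 0x08u /* 0b0010xx: access flag fault */
#define FSC_FLT_PERM   0x0cu /* 0b0011xx: permission fault */

enum fault_kind {
    FAULT_TRANSLATION,
    FAULT_ACCESS,
    FAULT_PERMISSION,
    FAULT_OTHER,
};

/* One case per fault type instead of one case per (type, level) pair. */
static enum fault_kind classify_fault(unsigned int fsc)
{
    switch ( fsc & ~FSC_LL_MASK )
    {
    case FSC_FLT_TRANS:
        return FAULT_TRANSLATION;
    case FSC_FLT_ACCESS:
        return FAULT_ACCESS;
    case FSC_FLT_PERM:
        return FAULT_PERMISSION;
    default:
        return FAULT_OTHER;
    }
}
```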
8 years agox86/debug: Make debugger_trap_entry() safe during early boot
Andrew Cooper [Thu, 4 Aug 2016 11:38:05 +0000 (12:38 +0100)]
x86/debug: Make debugger_trap_entry() safe during early boot

debugger_trap_entry() is reachable during early boot where its unconditional
use of current is unsafe.  Add a warning to the function to this effect.

Perform the vector check first, as this allows the compiler to elide the other
content from most of its callsites.  Check guest_mode(regs) before using
current, which makes the path safe on early boot.

While editing this area, drop DEBUGGER_trap_{entry,fatal}, as hiding a return
statement in a function-like macro is very antisocial programming; show the
real control flow at each of the callsites.  Finally, switch
debugger_trap_{entry,fatal} to having boolean return types, to match their
semantics.

No behavioural change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: support newer Intel CPU models
Jan Beulich [Thu, 4 Aug 2016 08:52:49 +0000 (10:52 +0200)]
x86: support newer Intel CPU models

... as per the June 2016 edition of the SDM.

Also remove a couple of dead break statements as well as unused
*MSR_PM_LASTBRANCH* #define-s.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agohvmloader: don't hard-code IO-APIC parameters
Jan Beulich [Thu, 4 Aug 2016 08:08:48 +0000 (10:08 +0200)]
hvmloader: don't hard-code IO-APIC parameters

The IO-APIC address has variable bits determined by the PCI-to-ISA
bridge (albeit for now we refrain from actually evaluating them, as
there's still implicit rather than explicit agreement on the IO-APIC
base address between qemu and the hypervisor), and the IO-APIC version
should be read from the IO-APIC.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: relax barriers
Jan Beulich [Thu, 4 Aug 2016 08:08:00 +0000 (10:08 +0200)]
x86/time: relax barriers

On x86 there's no need for full barriers in loops waiting for some
memory location to change. Nor do we need full barriers between two
reads and two writes - SMP ones fully suffice.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: group time stamps into a structure
Jan Beulich [Thu, 4 Aug 2016 08:07:02 +0000 (10:07 +0200)]
x86/time: group time stamps into a structure

If that had been done from the beginning, mistakes like the one
corrected in commit b64438c7c1 ("x86/time: use correct (local) time
stamp in constant-TSC calibration fast path") would likely never have
happened.

Also add a few "const" to make more obvious when things aren't expected
to change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: fold recurring code
Jan Beulich [Thu, 4 Aug 2016 08:04:29 +0000 (10:04 +0200)]
x86/time: fold recurring code

Common code between time_calibration_{std,tsc}_rendezvous() can better
live in a single place, eliminating the risk of adjusting one without
the other.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: support 32-bit wide ACPI PM timer
Jan Beulich [Thu, 4 Aug 2016 08:03:28 +0000 (10:03 +0200)]
x86/time: support 32-bit wide ACPI PM timer

I have no idea why we didn't do so from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: calibrate TSC against platform timer
Jan Beulich [Thu, 4 Aug 2016 08:02:52 +0000 (10:02 +0200)]
x86/time: calibrate TSC against platform timer

... instead of unconditionally against the PIT. This allows for local
and master system times to remain in better sync (which matters even
when, on any modern system, the master time is really used only during
secondary CPU bringup, as the error between the two is in fact
noticeable in cross-CPU NOW() invocation monotonicity).

This involves moving the init_platform_timer() invocation into
early_time_init(), splitting out the few things which really need to be
done in init_xen_time(). That in turn allows dropping the open coded
PIT initialization from init_IRQ() (it was needed for APIC clock
calibration, which runs between early_time_init() and init_xen_time()).

In the course of this re-ordering also set the timer channel 2 gate low
after having finished calibration. This should be benign to overall
system operation, but appears to be the more clean state.

Also do away with open coded 8254 register manipulation from 8259 code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/HVM: add new functions to get/set memory types
Paul Durrant [Thu, 4 Aug 2016 08:01:57 +0000 (10:01 +0200)]
x86/HVM: add new functions to get/set memory types

For clarity this patch breaks the code to set/get memory types out
of do_hvm_op() into dedicated functions: hvmop_set/get_mem_type().
Also, for clarity, checks for whether a memory type change is allowed
are broken out into a separate function called by hvmop_set_mem_type().

There is no intentional functional change in this patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: rename p2m_mmio_write_dm to p2m_ioreq_server
Paul Durrant [Thu, 4 Aug 2016 08:01:17 +0000 (10:01 +0200)]
x86: rename p2m_mmio_write_dm to p2m_ioreq_server

Previously p2m type p2m_mmio_write_dm was introduced for write-
protected memory pages whose write operations are supposed to be
forwarded to and emulated by an ioreq server. Yet limitations of
rangeset restrict the number of guest pages to be write-protected.

This patch replaces the p2m type p2m_mmio_write_dm with a new name:
p2m_ioreq_server, which means this p2m type can be claimed by one
ioreq server, instead of being tracked inside the rangeset of ioreq
server. And a new memory type, HVMMEM_ioreq_server, is now used in
the HVMOP_set/get_mem_type interface to set/get this p2m type.

Patches following up will add the related HVMOP handling code which
map/unmap type p2m_ioreq_server to/from an ioreq server. Without
following patches, memory type changes to HVMMEM_ioreq_server can
still be allowed, and in such cases, p2m_ioreq_server pages will be
treated the same as ones with previous type p2m_mmio_write_dm, and
are tracked inside the ioreq server's rangeset.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen/arm: traps: Don't inject a fault if the translation VA -> IPA fails
Julien Grall [Wed, 27 Jul 2016 16:37:11 +0000 (17:37 +0100)]
xen/arm: traps: Don't inject a fault if the translation VA -> IPA fails

Based on ARM ARM (D4.5.3 in ARM DDI 0486A and B3.12.7 in ARM DDI 0406C.c),
a Stage 1 translation error has priority over a Stage 2 translation error.

Therefore gva_to_ipa can only fail if another vCPU is playing with the
page table.

Rather than injecting a custom fault, replay the instruction and let the
processor inject the correct fault.

This is fine as Xen handles all the pending softirqs
(see leave_hypervisor_tail) before returning to the guest. One of them
is the scheduler, which could reschedule the vCPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: arm64: Add cortex-A57 erratum 832075 workaround
Julien Grall [Wed, 27 Jul 2016 16:37:10 +0000 (17:37 +0100)]
xen/arm: arm64: Add cortex-A57 erratum 832075 workaround

The ARM erratum 832075 applies to certain revisions of Cortex-A57. One
of the workarounds is to change device loads to use load-acquire
semantics.

Use the alternative framework to enable the workaround only on affected
cores.

Whilst a guest could trigger the deadlock, it is broken as soon as the
processor receives an interrupt. As the Xen scheduler will always set up
a timer (firing every 1ms to 300ms depending on the running time
slice) on each processor, the deadlock would last only a few milliseconds
and only affect the guest's time slice.

Therefore a malicious guest could only hurt itself. Note that all the
guests should implement/enable the workaround for the affected cores.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: arm64: Add Cortex-A53 cache errata workaround
Julien Grall [Wed, 27 Jul 2016 16:37:09 +0000 (17:37 +0100)]
xen/arm: arm64: Add Cortex-A53 cache errata workaround

The ARM errata 819472, 827319 and 824069 define the same workaround for
these hardware issues in certain Cortex-A53 parts.

The cache instructions "dc cvac" and "dc cvau" need to be upgraded to
"dc civac".

Use the alternative framework to replace those instructions only on
affected cores.

Whilst the errata affect cache instructions issued at any exception
level, it is not necessary to trap EL1/EL0 data cache instructions
access in order to upgrade them. Indeed the data cache corruption would
always be at the address used by the data cache instructions. Note that
this address could point to memory shared between guests and the
hypervisor; however, all the information present in it is validated
before any use.

Therefore a malicious guest could only hurt itself. Note that all the
guests should implement/enable the workaround for the affected cores.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Document the errata implemented in Xen
Julien Grall [Wed, 27 Jul 2016 16:37:08 +0000 (17:37 +0100)]
xen/arm: Document the errata implemented in Xen

The new document will help to keep track of each erratum Xen is able to
handle.

The text is based on the Linux doc in Documentation/arm64/silicon-errata.txt.

Also list the current errata that Xen is aware of.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoxen/arm: Detect silicon revision and set cap bits accordingly
Julien Grall [Wed, 27 Jul 2016 16:37:07 +0000 (17:37 +0100)]
xen/arm: Detect silicon revision and set cap bits accordingly

After each CPU has been started, we iterate through a list of CPU
errata to detect CPUs which need hypervisor code patches.

For each bug there is a function which checks whether a particular CPU is
affected. This needs to be done on every CPU to cover heterogeneous
systems properly.

If a certain erratum has been detected, the capability bit will be set.
In the case the erratum requires code patching, this will be triggered
by the call to apply_alternatives.
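
The detection loop described above can be sketched as follows. This is
an assumption-laden illustration, not the code imported from Linux: the
structure, the CPU ID values and all demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and ID values here are hypothetical. */
struct demo_erratum {
    const char *desc;                 /* human-readable description */
    unsigned int capability;          /* bit to set when a CPU is affected */
    bool (*matches)(uint32_t cpu_id); /* per-erratum detection function */
};

static bool demo_matches_core_a(uint32_t cpu_id) { return cpu_id == 0xA1; }
static bool demo_matches_core_b(uint32_t cpu_id) { return cpu_id == 0xB2; }

static const struct demo_erratum demo_errata[] = {
    { "demo erratum 1", 0, demo_matches_core_a },
    { "demo erratum 2", 1, demo_matches_core_b },
};

static unsigned long demo_caps; /* system-wide capability bits */

/* Run on every CPU as it comes up, so mixed-core systems are covered. */
static void demo_check_local_cpu_errata(uint32_t cpu_id)
{
    size_t i;

    for ( i = 0; i < sizeof(demo_errata) / sizeof(demo_errata[0]); i++ )
        if ( demo_errata[i].matches(cpu_id) )
            demo_caps |= 1UL << demo_errata[i].capability;
}
```

Because the bits are only ever ORed in, a capability stays set once any
single CPU has reported the erratum.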

The code is based on the file arch/arm64/kernel/cpu_errata.c in Linux
v4.6-rc3.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: cpufeature: Provide an helper to check if a capability is supported
Julien Grall [Wed, 27 Jul 2016 16:37:06 +0000 (17:37 +0100)]
xen/arm: cpufeature: Provide an helper to check if a capability is supported

The CPU capabilities will be set depending on the value found in the CPU
registers. This patch provides a generic helper to go through a set of
capabilities and find which ones should be enabled.

The parameter "info" is used to display the kind of capability updated (e.g.
workaround, feature...).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoxen/arm: Introduce alternative runtime patching
Julien Grall [Wed, 27 Jul 2016 16:37:05 +0000 (17:37 +0100)]
xen/arm: Introduce alternative runtime patching

Some processor errata require modifying code sequences.
As those modifications may impact performance, they should only
be enabled on affected cores. Furthermore, Xen may also want to take
advantage of new hardware features coming with v8.1 and v8.2.

This patch adds an infrastructure to patch Xen during boot time
depending on the "features" available on the platform.

This code is based on the file arch/arm64/kernel/alternative.c in
Linux 4.6-rc3. Any references to arm64 have been dropped to make the
code as generic as possible.

When Xen is creating the page tables, all the executable sections
(.text and .init.text) will be marked read-only and then enforced by
setting SCTLR.WXN.

Whilst it might be possible to mark those entries read-only after
Xen has been patched, we would need extra care to avoid possible
TLB conflicts (see D4-1732 in ARM DDI 0487A.i) as all
physical CPUs will be running.

All the physical CPUs have to be brought up before patching Xen because
each core may have different errata/features which require code
patching. The only way to find them is to probe system registers on
each CPU.

To avoid extra complexity, it is possible to create a temporary
writeable mapping with vmap. This mapping will be used to write the
new instructions.

Lastly, runtime patching is currently not necessary for ARM32. So the
code is only enabled for ARM64.

Note that the header asm-arm/alternative.h is a verbatim copy of the
Linux one (arch/arm64/include/asm/alternative.h). It may contain
inaccurate comments, but I did not touch them for now.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mmcfg: Fix initalisation of variables in pci_mmcfg_nvidia_mcp55()
Andrew Cooper [Wed, 3 Aug 2016 09:48:42 +0000 (10:48 +0100)]
x86/mmcfg: Fix initalisation of variables in pci_mmcfg_nvidia_mcp55()

Shifting into the sign bit of an integer is undefined behaviour.

Only the first integer is actually undefined, but switch all the shifts
for consistency.
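
A short sketch of the pattern, with a hypothetical helper name: building
the mask from an unsigned constant keeps the shift well defined even for
bit 31, whereas (1 << 31) on a 32-bit signed int is not.

```c
#include <stdint.h>

/*
 * Build a single-bit mask. The constant is unsigned, so shifting into
 * bit 31 is well defined. n must be below 32.
 */
static inline uint32_t demo_bit(unsigned int n)
{
    return UINT32_C(1) << n;
}
```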

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
8 years agoratelimit: Implement rate limit for credit2 scheduler
Anshul Makkar [Wed, 3 Aug 2016 12:35:22 +0000 (13:35 +0100)]
ratelimit: Implement rate limit for credit2 scheduler

Rate limit ensures that a vcpu will execute for a minimum amount of
time before being put at the back of a queue or being preempted by
a higher priority thread.

It introduces context-switch rate-limiting. The patch enables the VM
to batch its work and prevents the system from spending most of its
time in context switches because of a VM that is waking/sleeping at
a high rate.

ratelimit can be disabled by setting it to 0.
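
The rule can be pictured with a small predicate. This is a sketch under
assumptions, not the credit2 implementation: the function name and the
microsecond units are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the currently running vcpu may be switched out.
 * ran_us:       how long it has run since it was scheduled in
 * ratelimit_us: minimum guaranteed run time, 0 meaning "disabled"
 */
static bool demo_may_preempt(uint64_t ran_us, uint64_t ratelimit_us)
{
    if ( ratelimit_us == 0 )
        return true;              /* rate limiting switched off */

    return ran_us >= ratelimit_us;
}
```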

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
8 years agoxen: cpupool: small optimization when moving between pools
Dario Faggioli [Wed, 3 Aug 2016 12:31:49 +0000 (13:31 +0100)]
xen: cpupool: small optimization when moving between pools

If the domain is already where we want it to go,
there's not much to do indeed.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
8 years agoxen: fix a (latent) cpupool-related race during domain destroy
Dario Faggioli [Wed, 3 Aug 2016 12:31:49 +0000 (13:31 +0100)]
xen: fix a (latent) cpupool-related race during domain destroy

So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain still exists outside
of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL,
and we don't allow that to hold true, for anything
but the idle domain (and there are, in fact, ASSERT()s
and BUG_ON()s to that effect).

Currently, we never really check d->cpupool during the
window, but that does not mean the race is not there.
For instance, Credit2 at some point (during load balancing)
iterates on the list of domains, and if we add logic that
needs checking d->cpupool, and any one of them had
cpupool_rm_domain() called on itself already... Boom!

(In fact, calling __vcpu_has_soft_affinity() from inside
balance_load() makes `xl shutdown <domid>' reliably
crash, and this is how I discovered this.)

On the other hand, cpupool_rm_domain() "only" does
cpupool related bookkeeping, and there's no harm
postponing it a little bit.

Also, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes sense for the destruction path to look like
the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

And hence that's what this patch does.

Actually, for better robustness, what we really do is
moving both cpupool_add_domain() and cpupool_rm_domain()
inside sched_init_domain() and sched_destroy_domain(),
respectively (and also add a couple of ASSERT()-s).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.
Dario Faggioli [Wed, 27 Jul 2016 03:09:49 +0000 (05:09 +0200)]
xen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.

In fact, when not finding a suitable runqueue where to
place a vCPU, and hence using a fallback, we either:
 - don't issue any trace record (while we should, at
   least, output the chosen pcpu),
 - risk underrunning when accessing the runqueues
   array, while preparing the trace record.

Fix both issues and, while there, also a couple of style
problems found nearby.

Spotted by Coverity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agomwait-idle: add Denverton
Jacob Pan [Wed, 3 Aug 2016 12:41:13 +0000 (14:41 +0200)]
mwait-idle: add Denverton

Denverton is an Intel Atom based micro server which shares the same
Goldmont architecture as Broxton. The available C-states on
Denverton are a subset of Broxton's, with only C1, C1e, and C6.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 0080d65b7719fc58e60b5595fc61acded330004f]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/time: introduce and use rdtsc_ordered()
Jan Beulich [Wed, 3 Aug 2016 12:40:44 +0000 (14:40 +0200)]
x86/time: introduce and use rdtsc_ordered()

Matching Linux commit 03b9730b76 ("x86/asm/tsc: Add rdtsc_ordered() and
use it in trivial call sites") and earlier ones it builds upon, let's
make sure timing loops don't have their rdtsc()-s re-ordered, as that
would harm precision of the result (values were observed to be several
hundred clocks off without this adjustment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
8 years agox86/time: adjust local system time initialization
Jan Beulich [Wed, 3 Aug 2016 12:39:31 +0000 (14:39 +0200)]
x86/time: adjust local system time initialization

Using the bare return value from read_platform_stime() is not suitable
when local_time_calibration() is going to use its fast path: Divergence
of several dozen microseconds between NOW() return values on different
CPUs results when platform and local time don't stay in close sync.

Latch local and platform time on the CPU initiating AP bringup, such
that the AP can use these values to seed its stime_local_stamp with as
little of an error as possible. The boot CPU, otoh, can simply
calculate the correct initial value (other CPUs could do so too with
even greater accuracy than the approach being introduced, but that can
work only if all CPUs' TSCs start ticking at the same time, which
generally can't be assumed to be the case on multi-socket systems).

This slightly defers init_percpu_time() (moved ahead by commit
dd2658f966 ["x86/time: initialise time earlier during
start_secondary()"]) in order to reduce as much as possible the gap
between populating the stamps and consuming them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agolibxl: use llabs() instead abs() for int64_t argument
Juergen Gross [Tue, 2 Aug 2016 17:25:42 +0000 (19:25 +0200)]
libxl: use llabs() instead abs() for int64_t argument

Commit 57f8b13c724023c78fa15a80452d1de3e51a1418 ("libxl: memory size
in kb requires 64 bit variable") introduced a bug: abs() shouldn't
be called with an int64_t argument. llabs() is to be used here.

Caught by clang build with error message:

libxl.c:4198:33: error: absolute value function 'abs' given an argument
of type
    'int64_t' (aka 'long') but has parameter of type 'int' which may cause
    truncation of value [-Werror,-Wabsolute-value]
    if (target_memkb < 0 && abs(target_memkb) > current_target_memkb)
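
A minimal sketch of the corrected comparison, wrapped in a hypothetical
helper so it can be tried in isolation. With llabs() the full 64-bit
magnitude is kept, whereas abs() would first convert the argument to int.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Return non-zero when a negative (relative) target would shrink the
 * domain by more than its current target. llabs() takes a long long,
 * so no truncation of the 64-bit value occurs.
 */
static int demo_shrinks_too_far(int64_t target_memkb,
                                int64_t current_target_memkb)
{
    return target_memkb < 0 && llabs(target_memkb) > current_target_memkb;
}
```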

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters
Andrew Cooper [Wed, 27 Jul 2016 17:34:39 +0000 (18:34 +0100)]
x86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters

Introduce and use the nonnull attribute to help the compiler catch NULL
parameters being passed to functions which require their parameters not to be
NULL.  Experimentally, GCC 4.9 on Debian Jessie only warns of non-NULL-ness
from immediate callers, so propagate the attributes out to all helpers.

A sample error looks like:

mem_sharing.c: In function ‘mem_sharing_nominate_page’:
mem_sharing.c:884:13: error: null argument where non-null required (argument 3) [-Werror=nonnull]
             amfn = get_gfn_type_access(ap2m, gfn, NULL, &ap2ma, 0, NULL);
             ^

As part of this, replace the get_gfn_type_access() macro with an equivalent
static inline function for extra type safety, and the ability to be annotated.
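
For illustration, this is roughly how the attribute looks on a static
function. The function below is hypothetical and far simpler than the
real gfn helpers; only the GCC/Clang nonnull attribute itself is real.

```c
#include <stddef.h>

/*
 * Both pointer parameters are declared as never being NULL. A call such
 * as demo_lookup(table, NULL) is then diagnosed at the call site when
 * building with -Wnonnull (an error under -Werror).
 */
static __attribute__((__nonnull__(1, 2)))
int demo_lookup(const int *table, int *type)
{
    *type = table[0];   /* unconditional write, hence the annotation */
    return 0;
}
```

A macro cannot carry such an annotation, which is one reason a static
inline function is the better fit here.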

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agosystemd: remove hard-coded pid file in xendriverdomain service
Wei Liu [Wed, 20 Jul 2016 15:36:15 +0000 (16:36 +0100)]
systemd: remove hard-coded pid file in xendriverdomain service

Per the discussion in [0], the hard-coded pid file can be removed
completely. Systemd has no trouble figuring out the pid of devd all by
itself.

[0]: https://lists.xen.org/archives/html/xen-devel/2016-07/msg01393.html

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agolibxl: memory size in kb requires 64 bit variable
Juergen Gross [Thu, 28 Jul 2016 13:35:19 +0000 (15:35 +0200)]
libxl: memory size in kb requires 64 bit variable

libxl_set_memory_target() and several other interface functions of
libxl use a 32 bit sized parameter for a memory size value in kBytes.
This limits the maximum size to be passed in such a parameter
depending on signedness of the parameter to 2TB or 4TB.

Correct this by using 64 bit types.
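
The 2TB and 4TB figures follow directly from the width of the type, as
this small check shows (the helper name is hypothetical): the first value
not representable in a signed 32-bit count of kB is 2^31 kB, and in an
unsigned one 2^32 kB.

```c
#include <stdint.h>

/* Convert a size in KiB to whole TiB: 1 TiB = 2^30 KiB. */
static uint64_t demo_kib_to_tib(uint64_t kib)
{
    return kib >> 30;
}
```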

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/mem-sharing: mem-sharing a range of memory
Tamas K Lengyel [Mon, 1 Aug 2016 17:14:27 +0000 (11:14 -0600)]
x86/mem-sharing: mem-sharing a range of memory

Currently mem-sharing can be performed on a page-by-page basis from the control
domain. However, this process is quite wasteful when a range of pages has to
be deduplicated.

This patch introduces a new mem_sharing memop for range sharing where
the user doesn't have to separately nominate each page in both the source and
destination domain, and the looping over all pages happens in the hypervisor.
This significantly reduces the overhead of sharing a range of memory.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agolibxl: create xenstore nodes for control/feature-XXX flags
Paul Durrant [Mon, 1 Aug 2016 08:57:10 +0000 (09:57 +0100)]
libxl: create xenstore nodes for control/feature-XXX flags

The xenstore-paths documentation specifies various control/feature-XXX
flags to allow a guest to tell a toolstack about its abilities to
respond to values written to control/shutdown. However, because the
parent control xenstore key is created read-only to the guest, unless
empty nodes for the feature flags are also created read/write by the
toolstack, the guest will not be able to set any flags.

This patch adds code to create all specified feature flag nodes at
domain creation time.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: fix printing hotplug arguments/environment
Roger Pau Monne [Tue, 2 Aug 2016 10:49:51 +0000 (12:49 +0200)]
libxl: fix printing hotplug arguments/environment

An OS could decide to not pass any environment variables to hotplug scripts,
and this will trigger a bug in device_hotplug logic, since it expects the
environment array to exist. Allow env to be NULL.
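
The kind of NULL-tolerant walk this calls for might look like the sketch
below. It is not the libxl code; the helper name is hypothetical and it
only counts entries instead of logging them.

```c
#include <stddef.h>

/*
 * Count the entries of a NULL-terminated string array, treating a NULL
 * array (no environment passed at all) as empty.
 */
static size_t demo_count_strings(char *const *arr)
{
    size_t n = 0;

    if ( arr == NULL )
        return 0;

    while ( arr[n] != NULL )
        n++;

    return n;
}
```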

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: define semantics of vncpasswd in xl.cfg
Jim Fehlig [Fri, 29 Jul 2016 22:56:22 +0000 (16:56 -0600)]
docs: define semantics of vncpasswd in xl.cfg

A recent discussion around LSN-2016-0001 [1] included defining
the semantics of an empty string for a VNC password. It was stated
that "libxl interprets an empty password in the caller's
configuration to mean that passwordless access should be permitted".

The same applies to the vncpasswd setting in xl.cfg. This patch
extends the xl.cfg documentation to define the semantics of setting
vncpasswd to an empty string.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/PCI: update ACPI Check to include SGI Ux3
Boris Ostrovsky [Tue, 2 Aug 2016 15:52:44 +0000 (17:52 +0200)]
x86/PCI: update ACPI Check to include SGI Ux3

These systems use variations of SGI3* for ID string.

Instead of adding another set of strings do what Linux did
in commit 526018bc and look at first three letters.
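
In other words, the check becomes a prefix comparison. A sketch, with a
hypothetical helper name and assuming the ID string provides at least
three readable characters or an earlier terminator:

```c
#include <stdbool.h>
#include <string.h>

/* Match any ID string starting with "SGI", whatever follows. */
static bool demo_is_sgi_id(const char *oem_id)
{
    return strncmp(oem_id, "SGI", 3) == 0;
}
```

Note that this is deliberately wider than a list of exact strings: any
ID beginning with the three letters is accepted.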

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: use gcc6'es flags asm() output support
Jan Beulich [Tue, 2 Aug 2016 15:51:10 +0000 (17:51 +0200)]
x86: use gcc6'es flags asm() output support

..., rendering affected code more efficient and smaller.

Note that in atomic.h this at once does away with the redundant output
and input specifications of the memory location touched.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agoxen/types: Correct the definition of uintptr_t
Andrew Cooper [Mon, 1 Aug 2016 12:36:44 +0000 (13:36 +0100)]
xen/types: Correct the definition of uintptr_t

uintptr_t is specified as unsigned int in 32bit, not unsigned long.  This is
why, when copying inttypes.h from GCC, the use of PRIxPTR and similar is
broken for 32bit builds.

Use __attribute__((__mode__(__pointer__))) to get the compiler's default
pointer type, which matches the pre-existing inttypes.h.
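
As a sketch of the technique (using a hypothetical type name so as not
to clash with the standard uintptr_t), the mode attribute lets the
compiler pick the integer type whose width is that of a pointer:

```c
/* Integer type sized by the compiler to hold a pointer (GCC/Clang). */
typedef unsigned int __attribute__((__mode__(__pointer__))) demo_uintptr_t;

/* A pointer survives a round trip through the integer type. */
static int demo_roundtrip_ok(void *p)
{
    return (void *)(demo_uintptr_t)p == p;
}
```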

Fix the identified breakage with ELF_PRPTRVAL.

Compile tested on all architectures, with a manual printk() to trigger any
potential -Wformat issues.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/common: Sort the obj build order
Andrew Cooper [Mon, 1 Aug 2016 13:03:32 +0000 (14:03 +0100)]
xen/common: Sort the obj build order

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/types: Alter typedef for bool_t
Andrew Cooper [Mon, 1 Aug 2016 10:34:35 +0000 (11:34 +0100)]
xen/types: Alter typedef for bool_t

As xen/stdbool.h is included, the typedef should use bool rather than _Bool.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mm: Clean up the construction of base_disallow_mask
Andrew Cooper [Fri, 15 Jul 2016 18:34:00 +0000 (19:34 +0100)]
x86/mm: Clean up the construction of base_disallow_mask

 * Use _PAGE_AVAIL_HIGH and _PAGE_NX instead of open-coding them
 * Drop further remnants of the 32bit hypervisor build

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/mm: Avoid NULL dereference when checking altp2m's for shareability
Andrew Cooper [Wed, 27 Jul 2016 17:54:16 +0000 (18:54 +0100)]
x86/mm: Avoid NULL dereference when checking altp2m's for shareability

Coverity identifies that __get_gfn_type_access() unconditionally writes to its
type parameter under a number of circumstances.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agox86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()
Chao Gao [Mon, 1 Aug 2016 16:22:54 +0000 (18:22 +0200)]
x86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()

MSI-x tables' initialization had been deferred in commit
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798. If an assigned device does not
support MSI-x, the msixtbl_list won't be initialized. However, the
following path
    XEN_DOMCTL_bind_pt_irq
      pt_irq_create_bind
        msixtbl_pt_register
does not check this case. Some errors (malware, etc.) may lead to calling
XEN_DOMCTL_bind_pt_irq without a clear gtable, which will cause Xen to panic.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomwait-idle: correct/improve BXT support
Jan Beulich [Mon, 1 Aug 2016 16:21:37 +0000 (18:21 +0200)]
mwait-idle: correct/improve BXT support

Linux commit 5dcef69486 ("intel_idle: add BXT support") added an
8-element lookup array with just a 2-bit value used for lookups. As per
the SDM that bit field is really 3 bits wide. Since the top two array
entries are zero, deal with the resulting invalid (zero) values by
moving the zero-MSR-value check into irtl_2_usec() and having that
function's caller check its result instead.
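
The resulting shape can be sketched as below. The unit table values and
field layout (10-bit time value, 3-bit unit selector at bit 10) are
assumptions recalled from the Linux intel_idle driver, and the demo_
names are hypothetical.

```c
#include <stdint.h>

/* ns per time unit, indexed by the 3-bit unit field; top two are invalid. */
static const unsigned int demo_irtl_ns_units[8] = {
    1, 32, 1024, 32768, 1048576, 33554432, 0, 0
};

/*
 * Translate an IRTL MSR value to microseconds. A zero MSR value and the
 * two invalid unit encodings all yield 0, which the caller must check.
 */
static uint64_t demo_irtl_2_usec(uint64_t irtl)
{
    uint64_t ns;

    if ( !irtl )
        return 0;

    ns = demo_irtl_ns_units[(irtl >> 10) & 0x7];

    return (irtl & 0x3ff) * ns / 1000;
}
```

Masking with 0x7 rather than 0x3 is the point: every one of the eight
table slots is now reachable, so the zero entries have to be handled.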

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Linux commit: 3451ab3ebf92b12801878d8b5c94845afd4219f0]
[Linux commit: bef450962597ff39a7f9d53a30523aae9eb55843]

8 years agoMAINTAINERS: update Quan Xu's email address
Quan Xu [Mon, 1 Aug 2016 10:41:26 +0000 (11:41 +0100)]
MAINTAINERS: update Quan Xu's email address

Signed-off-by: Quan Xu <xuquan8@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
8 years ago libxl: compilation warning fix for arm & aarch64
Chris Patterson [Wed, 27 Jul 2016 20:01:26 +0000 (16:01 -0400)]
libxl: compilation warning fix for arm & aarch64

GCC 6 will warn on unused static const variables in C modules:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00847.html

When compiling with LIBXL_HAVE_NO_SUSPEND_RESUME set (arm & aarch64),
the compiler emits the following errors:
  xl_cmdimpl.c:101:19: error: 'migrate_report'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:99:19: error: 'migrate_permission_to_go'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:97:19: error: 'migrate_receiver_ready'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:95:19: error: 'migrate_receiver_banner'
      defined but not used [-Werror=unused-const-variable=]

These const variables are only used in functions which sit inside the
ifndef block:
   #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
   ...
   #endif

Wrap the same ifndef around these variables.
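
As a small illustration of the pattern (not the actual xl_cmdimpl.c contents): demo_migrate_banner and demo_copy_banner() are hypothetical, and the string is a placeholder. Because the constant and its only user sit under the same guard, a build that defines LIBXL_HAVE_NO_SUSPEND_RESUME sees neither of them and has nothing to warn about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uncomment to model a build without suspend/resume support: */
/* #define LIBXL_HAVE_NO_SUSPEND_RESUME */

#ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
/* Hypothetical constant, guarded together with its only user. */
static const char demo_migrate_banner[] = "demo receiver ready\n";

/* Copy the banner into buf; returns its length, or 0 if buf is too small. */
static size_t demo_copy_banner(char *buf, size_t len)
{
    size_t n = sizeof(demo_migrate_banner) - 1;

    assert(buf != NULL);
    if (len <= n)
        return 0;

    memcpy(buf, demo_migrate_banner, n + 1);  /* include the terminator */
    return n;
}
#endif
```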

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago xsm: don't require configuring tools to build xen xsm blob
Wei Liu [Mon, 25 Jul 2016 15:13:13 +0000 (16:13 +0100)]
xsm: don't require configuring tools to build xen xsm blob

Starting from 08cffe66 ("xsm: add a default policy to .init.data") we
can attach an xsm policy blob to the hypervisor. To build that policy blob,
the hypervisor build system now needs to enter the tools directory.

The expectations for the hypervisor and tools build systems are different.
We don't want the xen build system to depend on configure, but we do want
the tools build system to. That commit broke this expectation because it
required users to run configure before building the hypervisor. This broke
the ARM build because ARM developers normally build the hypervisor and tools
separately (and possibly on different platforms). It can also break x86 if
developers don't run configure before building the hypervisor with XSM on.

To fix it, move the major part of tools/flask/policy/Makefile into
Makefile.common and create a tools-only Makefile that includes that common
Makefile. The hypervisor Makefile will use Makefile.common to build the xsm
policy.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Julien Grall <julien.grall@arm.com>