xenbits.xensource.com Git - people/liuw/libxenctrl-split/xen.git/log
people/liuw/libxenctrl-split/xen.git
9 years ago gnttab: eliminate several explicit version checks
Jan Beulich [Tue, 16 Jun 2015 10:24:49 +0000 (12:24 +0200)]
gnttab: eliminate several explicit version checks

By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at once fix
grant_map_exists() running into the being removed ASSERT() when called
for a page owned by a domain not having its grant table set up yet.
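
The idea above can be sketched as follows. This is a minimal illustration, not the actual Xen code: the structure, the `_sketch` names, and the entries-per-frame constants are assumptions made for the example. Because the count is zero while the version is unset, a single bounds check rejects every reference and no separate "version not set" error path is needed.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a grant table. */
struct grant_table_sketch {
    unsigned int gt_version;   /* 0 means "version not set yet" */
    unsigned int nr_frames;    /* frames currently allocated */
};

/* Assumed entries-per-frame figures; illustrative only. */
#define ENTRIES_PER_FRAME_V1 512u
#define ENTRIES_PER_FRAME_V2 256u

/* Returns 0 while the version is unset, so every ref fails the bounds check. */
static unsigned int nr_grant_entries_sketch(const struct grant_table_sketch *gt)
{
    switch (gt->gt_version) {
    case 1:  return gt->nr_frames * ENTRIES_PER_FRAME_V1;
    case 2:  return gt->nr_frames * ENTRIES_PER_FRAME_V2;
    default: return 0;
    }
}

/* One bounds check covers both "ref too large" and "table not set up". */
static bool grant_ref_valid_sketch(const struct grant_table_sketch *gt,
                                   unsigned int ref)
{
    return ref < nr_grant_entries_sketch(gt);
}
```

A caller that only checks `grant_ref_valid_sketch()` therefore never dereferences table state that has not been set up.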

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/MSI: partly restore commit 73cb5d43a8 (build fix)
Jan Beulich [Mon, 15 Jun 2015 11:27:53 +0000 (13:27 +0200)]
x86/MSI: partly restore commit 73cb5d43a8 (build fix)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years ago gnttab: make the grant table lock a read-write lock
David Vrabel [Mon, 15 Jun 2015 11:25:20 +0000 (13:25 +0200)]
gnttab: make the grant table lock a read-write lock

In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since in the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).

The write lock is also required when two active entries must be
acquired.

The double lock is still required when updating IOMMU page tables.

With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago gnttab: introduce maptrack lock
David Vrabel [Mon, 15 Jun 2015 11:23:34 +0000 (13:23 +0200)]
gnttab: introduce maptrack lock

Split grant table lock into two separate locks. One to protect
maptrack free list (maptrack_lock) and one for everything else (lock).

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago gnttab: per-active entry locking
David Vrabel [Mon, 15 Jun 2015 11:22:07 +0000 (13:22 +0200)]
gnttab: per-active entry locking

Introduce a per-active entry spin lock to protect active entry state.
The grant table lock must be locked before acquiring (locking) an
active entry.

This is a step in reducing contention on the grant table lock, but
will only do so once the grant table lock is turned into a read-write
lock.

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago Revert "x86/MSI-X: use qword MMIO access for address writes"
Jan Beulich [Mon, 15 Jun 2015 09:32:28 +0000 (11:32 +0200)]
Revert "x86/MSI-X: use qword MMIO access for address writes"

This reverts commit 73cb5d43a8f48930e4594ef7b15b974487651ffe,
which appears to break with certain Tigon3 NICs.

9 years ago libxl: libxl_internal.h: Clarify ao rule against internal callers
Ian Jackson [Thu, 11 Jun 2015 16:56:15 +0000 (17:56 +0100)]
libxl: libxl_internal.h: Clarify ao rule against internal callers

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
9 years ago x86: avoid tripping watchdog when constructing dom0
Ross Lagerwall [Fri, 12 Jun 2015 10:07:05 +0000 (12:07 +0200)]
x86: avoid tripping watchdog when constructing dom0

Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at once clean up other includes).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years ago cpupool: fix shutdown with cpupools with different schedulers
Dario Faggioli [Fri, 12 Jun 2015 10:06:24 +0000 (12:06 +0200)]
cpupool: fix shutdown with cpupools with different schedulers

Trying to shut down the host when a cpupool exists, has
pCPUs, and has a scheduler different from Xen's default
one, produces this:

 root@Zhaman:~# xl cpupool-cpu-remove Pool-0 8
 root@Zhaman:~# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
 Using config file "command line"
 cpupool name:   Pool-1
 scheduler:      credit2
 number of cpus: 0
 root@Zhaman:~# xl cpupool-cpu-add Pool-1 8
 root@Zhaman:~# shutdown -h now

 (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
 (XEN) CPU:    0
 (XEN) RIP:    e008:[<ffff82d080133bdf>] kill_timer+0x56/0x298
 (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
  ... ... ...
 (XEN) Xen call trace:
 (XEN)    [<ffff82d080133bdf>] kill_timer+0x56/0x298
 (XEN)    [<ffff82d08012233f>] csched_free_pdata+0x9b/0xcf
 (XEN)    [<ffff82d08012c30c>] cpu_schedule_callback+0x64/0x8b
 (XEN)    [<ffff82d08011bc7a>] notifier_call_chain+0x67/0x87
 (XEN)    [<ffff82d08010153e>] cpu_down+0xd9/0x12c
 (XEN)    [<ffff82d080101744>] disable_nonboot_cpus+0x93/0x138
 (XEN)    [<ffff82d0801aa6e7>] enter_state_helper+0xbd/0x365
 (XEN)    [<ffff82d0801061e5>] continue_hypercall_tasklet_handler+0x4a/0xb1
 (XEN)    [<ffff82d080132387>] do_tasklet_work+0x78/0xab
 (XEN)    [<ffff82d0801326bd>] do_tasklet+0x5e/0x8a
 (XEN)    [<ffff82d0801646d2>] idle_loop+0x56/0x6b
  ... ... ...
 (XEN) ****************************************
 (XEN) Panic on CPU 0:
 (XEN) FATAL PAGE FAULT
 (XEN) [error_code=0000]
 (XEN) Faulting linear address: 0000000000000041
 (XEN) ****************************************

The fix is, when tearing down a pCPU, call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
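
The fix described above can be pictured with a small dispatch sketch. Everything here is hypothetical (the `_sketch` types, the per-CPU array, the counters); it only illustrates choosing the teardown hook from the scheduler that owns the pCPU rather than from the default scheduler.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical scheduler descriptor with only the hook of interest. */
struct scheduler_sketch {
    const char *name;
    void (*free_pdata)(unsigned int cpu);
};

/* Counters so the effect of each hook can be observed. */
static int credit_frees, credit2_frees;

static void credit_free_pdata(unsigned int cpu)  { (void)cpu; credit_frees++; }
static void credit2_free_pdata(unsigned int cpu) { (void)cpu; credit2_frees++; }

static const struct scheduler_sketch sched_credit  = { "credit",  credit_free_pdata };
static const struct scheduler_sketch sched_credit2 = { "credit2", credit2_free_pdata };

/* Scheduler of the cpupool each pCPU belongs to (NULL means the default). */
static const struct scheduler_sketch *cpu_sched[NR_CPUS_SKETCH];

/* Tear down a pCPU using the hook of the scheduler that owns its data. */
static void cpu_teardown_sketch(unsigned int cpu)
{
    const struct scheduler_sketch *s =
        cpu_sched[cpu] ? cpu_sched[cpu] : &sched_credit;

    s->free_pdata(cpu);
}
```

Calling the default scheduler's hook on data allocated by another scheduler would interpret that data with the wrong layout, which is consistent with the page fault in the trace above.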

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
9 years ago libelf: fix elf_parse_bsdsyms call
Roger Pau Monné [Fri, 12 Jun 2015 10:05:54 +0000 (12:05 +0200)]
libelf: fix elf_parse_bsdsyms call

elf_parse_bsdsyms expects the second parameter to be a physical address, not
a virtual one.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
9 years ago x86/context-switch: prefer is_..._domain() over is_..._vcpu()
Jan Beulich [Fri, 12 Jun 2015 10:04:26 +0000 (12:04 +0200)]
x86/context-switch: prefer is_..._domain() over is_..._vcpu()

Latch both domains alongside both vCPU-s into local variables, making
use of them where possible also beyond what the title says.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/HVM: prefer is_..._domain() over is_..._vcpu()
Jan Beulich [Fri, 12 Jun 2015 10:03:56 +0000 (12:03 +0200)]
x86/HVM: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available or such operations
occur frequently in a function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86: prefer is_..._domain() over is_..._vcpu()
Jan Beulich [Fri, 12 Jun 2015 10:03:13 +0000 (12:03 +0200)]
x86: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available or such operations
occur frequently in a function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago domctl: prefer is_..._domain() over is_..._vcpu()
Jan Beulich [Fri, 12 Jun 2015 10:02:12 +0000 (12:02 +0200)]
domctl: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago EFI: map allocation size must be set to zero
Jan Beulich [Thu, 11 Jun 2015 12:47:54 +0000 (14:47 +0200)]
EFI: map allocation size must be set to zero

Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.

Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/traps: loop in the correct direction in compat_iret()
Andrew Cooper [Thu, 11 Jun 2015 12:44:47 +0000 (14:44 +0200)]
x86/traps: loop in the correct direction in compat_iret()

This is CVE-2015-4164 / XSA-136.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago gnttab: add missing version check to GNTTABOP_swap_grant_ref handling
Jan Beulich [Thu, 11 Jun 2015 12:44:12 +0000 (14:44 +0200)]
gnttab: add missing version check to GNTTABOP_swap_grant_ref handling

... avoiding NULL derefs when the version to use wasn't set yet (via
GNTTABOP_setup_table or GNTTABOP_set_version).

This is CVE-2015-4163 / XSA-134.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago EFI/ARM: don't treat EfiBootServices{Code,Data} as normal RAM under /mapbs
Jan Beulich [Thu, 11 Jun 2015 09:58:29 +0000 (11:58 +0200)]
EFI/ARM: don't treat EfiBootServices{Code,Data} as normal RAM under /mapbs

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago VT-d: extend quirks to newer desktop chipsets
Jan Beulich [Thu, 11 Jun 2015 09:55:05 +0000 (11:55 +0200)]
VT-d: extend quirks to newer desktop chipsets

We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
9 years ago VT-d: use qword MMIO access for MSI address writes
Jan Beulich [Thu, 11 Jun 2015 09:54:10 +0000 (11:54 +0200)]
VT-d: use qword MMIO access for MSI address writes

Also make dmar_{read,write}q() actually do what their names suggest (we
don't need to be concerned about 32-bit restrictions anymore).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago x86/MSI-X: use qword MMIO access for address writes
Jan Beulich [Thu, 11 Jun 2015 09:53:20 +0000 (11:53 +0200)]
x86/MSI-X: use qword MMIO access for address writes

Now that we support it for our guests, let's do so ourselves too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/vMSI-X: support qword MMIO access
Jan Beulich [Thu, 11 Jun 2015 09:52:18 +0000 (11:52 +0200)]
x86/vMSI-X: support qword MMIO access

The specification explicitly provides for this, so we should have
supported this from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
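
For illustration only: on a 32-bit build `uint64_t` and `unsigned long` have different widths, so a type-checking `min()` rejects the pair. One common way around this is to convert both operands to a single type before comparing. The helpers below are hypothetical and are not claimed to be how this commit resolved the build failure.

```c
#include <stdint.h>

/* Both operands are converted to uint64_t first, so the comparison never
 * mixes uint64_t with unsigned long (32 bits wide on 32-bit x86). */
static inline uint64_t min_u64_sketch(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Hypothetical caller: clamp a 64-bit byte count to a buffer size that is
 * held in an unsigned long. The result always fits in unsigned long because
 * it can never exceed 'avail'. */
static inline unsigned long clamp_len_sketch(uint64_t wanted, unsigned long avail)
{
    return (unsigned long)min_u64_sketch(wanted, avail);
}
```

The cast back to `unsigned long` is safe here only because the result is bounded by `avail`; a general-purpose helper would need to state that precondition.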

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years ago evtchn: profile event channel lock
David Vrabel [Wed, 10 Jun 2015 10:06:02 +0000 (12:06 +0200)]
evtchn: profile event channel lock

The per-domain event channel lock may suffer from contention.  Add it to
the set of locks to be profiled when lock profiling is enabled.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
9 years ago x86/EFI: adjust EFI_MEMORY_WP handling for spec version 2.5
Jan Beulich [Wed, 10 Jun 2015 10:05:21 +0000 (12:05 +0200)]
x86/EFI: adjust EFI_MEMORY_WP handling for spec version 2.5

That flag now means cachability rather than protection, and a new flag
EFI_MEMORY_RO got added in its place.

Along with EFI_MEMORY_RO also add the two other new EFI_MEMORY_*
definitions, even if we don't need them right away.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago EFI: support default attributes to map Runtime service areas with none given
Konrad Rzeszutek Wilk [Wed, 10 Jun 2015 10:04:07 +0000 (12:04 +0200)]
EFI: support default attributes to map Runtime service areas with none given

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

For example on Dell machines we see:

(XEN)  00000fed18000-00000fed19fff type=11 attr=8000000000000000
(XEN) Unknown cachability for MFNs 0xfed18-0xfed19

Let's allow them to be mapped as UC.
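
A rough sketch of that fallback, assuming the cacheability bits defined by the UEFI specification. In the log above, `attr=8000000000000000` has only `EFI_MEMORY_RUNTIME` set and no cacheability bit at all. The enum and function names are made up for this example and do not reflect Xen's actual mapping code.

```c
#include <stdint.h>

/* Cacheability attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC      0x0000000000000001ULL
#define EFI_MEMORY_WC      0x0000000000000002ULL
#define EFI_MEMORY_WT      0x0000000000000004ULL
#define EFI_MEMORY_WB      0x0000000000000008ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Hypothetical result type; not Xen's page-table flags. */
enum cache_mode_sketch { CACHE_WB, CACHE_WT, CACHE_WC, CACHE_UC };

static enum cache_mode_sketch pick_cache_mode_sketch(uint64_t attr)
{
    if (attr & EFI_MEMORY_WB)
        return CACHE_WB;
    if (attr & EFI_MEMORY_WT)
        return CACHE_WT;
    if (attr & EFI_MEMORY_WC)
        return CACHE_WC;
    /* Either UC was requested or no cacheability bit is set at all:
     * fall back to uncacheable instead of refusing to map the range. */
    return CACHE_UC;
}
```

With this shape, a descriptor like the Dell one quoted above is mapped uncacheable rather than being reported as having unknown cacheability.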

We also alter the 'efi-rs' to be 'efi=rs' or 'efi=no-rs'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago EFI/early: add /mapbs to map EfiBootServices{Code,Data}
Konrad Rzeszutek Wilk [Wed, 10 Jun 2015 10:02:43 +0000 (12:02 +0200)]
EFI/early: add /mapbs to map EfiBootServices{Code,Data}

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

To help Xen run on certain platforms.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86/EFI: fix EFI_MEMORY_WP handling
Jan Beulich [Wed, 10 Jun 2015 10:01:35 +0000 (12:01 +0200)]
x86/EFI: fix EFI_MEMORY_WP handling

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago efi: avoid calling boot services after ExitBootServices()
Ross Lagerwall [Wed, 10 Jun 2015 09:57:18 +0000 (11:57 +0200)]
efi: avoid calling boot services after ExitBootServices()

After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices() and so are called via
SystemTable->BootServices.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago QEMU_TAG update
Ian Jackson [Tue, 9 Jun 2015 15:21:43 +0000 (16:21 +0100)]
QEMU_TAG update

9 years ago kexec: add more pages to v1 environment
Jan Beulich [Tue, 9 Jun 2015 14:00:24 +0000 (16:00 +0200)]
kexec: add more pages to v1 environment

Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).

Further, without the tools mapping the low 1MB (expected by at least
some Linux versions), we need to do so in the hypervisor in the v1 case.

Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86: adjust PV I/O emulation functions' types
Jan Beulich [Tue, 9 Jun 2015 13:59:31 +0000 (15:59 +0200)]
x86: adjust PV I/O emulation functions' types

admin_io_okay(), guest_io_read(), and guest_io_write() all don't need
their current "regs" parameter at all, and they don't use the vCPU
passed to them for other than obtaining its domain. Drop the former and
replace the latter by a struct domain pointer.

pci_cfg_okay() returns a boolean type, and its "write" parameter is of
boolean kind too.

All of them get called for the current vCPU (and hence current domain)
only, so name the domain parameters accordingly except in the
admin_io_okay() case, which a subsequent patch will use for simplifying
setup_io_bitmap().

Latch current->domain into a local variable in emulate_privileged_op().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago rangeset: "has" and "is" functions return boolean
Jan Beulich [Tue, 9 Jun 2015 13:57:26 +0000 (15:57 +0200)]
rangeset: "has" and "is" functions return boolean

Additionally rangeset_is_empty()'s sole parameter can be const.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago make do_sched_op_compat() x86-specific
Jan Beulich [Tue, 9 Jun 2015 13:56:03 +0000 (15:56 +0200)]
make do_sched_op_compat() x86-specific

Being a pre-3.1 compatibility hypercall handler only, it's not needed
on ARM or any future architectures Xen may get ported to.

Also the function shouldn't really be used internally - its use should
be limited to its purpose (and hence there's also no need for a
prototype).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago arinc653: don't leak hypervisor stack contents through XEN_SYSCTL_SCHEDOP_getinfo
Jan Beulich [Tue, 9 Jun 2015 13:54:53 +0000 (15:54 +0200)]
arinc653: don't leak hypervisor stack contents through XEN_SYSCTL_SCHEDOP_getinfo

Note that due to XSA-77 this is not a security issue.

Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Robert VanVossen <robert.vanvossen@dornerworks.com>

9 years ago arm: use existing __section() macro instead of opencoding it
Andrew Cooper [Mon, 8 Jun 2015 13:38:39 +0000 (15:38 +0200)]
arm: use existing __section() macro instead of opencoding it

No functional change

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/mm: print domain IDs instead of pointers
Jan Beulich [Mon, 8 Jun 2015 12:41:25 +0000 (14:41 +0200)]
x86/mm: print domain IDs instead of pointers

Printing pointers to struct domain isn't really useful for initial
problem analysis. In get_page() also drop the page only after issuing
the log message, so that at the time of printing the state can be
considered reasonably consistent.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/VPMU: add lost Intel processor
Alan Robinson [Mon, 8 Jun 2015 12:17:06 +0000 (14:17 +0200)]
x86/VPMU: add lost Intel processor

commit 6d112f2b50 ("x86/vPMU: change Intel model numbers from decimal
to hex") translated 47 to 0x27, now corrected to 0x2f.
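
As a quick check of the arithmetic: 47 = 2 * 16 + 15, so its hexadecimal spelling is 0x2f, whereas 0x27 is 39. The constant names below are invented purely for this illustration.

```c
/* 47 = 2 * 16 + 15, so the hexadecimal spelling is 0x2f; 0x27 is 39. */
enum {
    MODEL_DECIMAL   = 47,
    MODEL_HEX_RIGHT = 0x2f,
    MODEL_HEX_WRONG = 0x27
};
```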

Signed-off-by: Alan Robinson <Alan.Robinson@ts.fujitsu.com>
Signed-off-by: Dietmar Hahn <Dietmar.Hahn@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86/setup: move CPU0s stack out of the Xen text/data/bss virtual region
Andrew Cooper [Mon, 8 Jun 2015 12:16:27 +0000 (14:16 +0200)]
x86/setup: move CPU0s stack out of the Xen text/data/bss virtual region

Currently, the BSP's stack is the BSS symbol cpu0_stack.  In builds using
memguard_stack(), a page gets shot out of the mappings.

To avoid shattering the superpage which will eventually map the BSS, use the
directmap virtual address of cpu0_stack, while still using the same underlying
physical memory.  (Xen has an order 21 physical relocation requirement meaning
that the order 3 alignment requirement for cpu0_stack will be honoured even
via its directmap mapping.)

In addition, fix two issues exposed by the changes.

 * do_invalid_op() should use is_active_kernel_text() rather than having its
   own, different, idea of when to search through the bugframes.
 * Setting of system_state to active needs to be deferred until after code has
   left .init.text, for bugframes/backtraces to function in reinit_bsp_stack().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86: misc boot/link tweaking
Andrew Cooper [Mon, 8 Jun 2015 12:15:59 +0000 (14:15 +0200)]
x86: misc boot/link tweaking

 * Introduce symbols bounding the multiboot1 header, which helps clarify that
   it is data and not code corruption when viewing the disassembly.
 * Move the __high_start symbol to its implementation, and declare it
   correctly as ENTRY()
 * Move the l1_identmap construction to be with all the other pagetables, and
   within __page_tables_{start,end}.  This won't affect the EFI relocation
   algorithm, as l1_identmap contains no relocations.
 * Move the cpu0_stack alignment check to the linker.  Chances are very good
   that a binary with a misaligned stack won't get as far as the test.
 * Use MB() in linker script.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86: use existing __section() macro instead of opencoding it
Andrew Cooper [Mon, 8 Jun 2015 12:14:38 +0000 (14:14 +0200)]
x86: use existing __section() macro instead of opencoding it

No functional change

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago sched_rt: fix memory leak in rt_init()
Andrew Cooper [Mon, 8 Jun 2015 12:13:23 +0000 (14:13 +0200)]
sched_rt: fix memory leak in rt_init()

Introduced by c/s 376bbba "sched_rt: print useful affinity info when dumping".
If the allocation of cpumask failed, prv was leaked.
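
The leak pattern and its fix can be sketched as below. This is an assumption-laden illustration rather than the scheduler's real code: the allocator wrappers, the counter, and the `_sketch` structure are hypothetical and exist only so the failure path can be exercised.

```c
#include <stdlib.h>

/* Counts allocations that have not been freed yet (for demonstration). */
static int outstanding_allocs;

/* Allocation wrapper with an injectable failure, for the example only. */
static void *alloc_sketch(size_t n, int fail)
{
    void *p = fail ? NULL : calloc(1, n);

    if (p)
        outstanding_allocs++;
    return p;
}

static void free_sketch(void *p)
{
    if (p) {
        outstanding_allocs--;
        free(p);
    }
}

struct priv_sketch {
    unsigned long *mask;   /* stands in for the separately allocated cpumask */
};

/* Returns NULL on failure without leaking the first allocation. */
static struct priv_sketch *init_sketch(int fail_mask_alloc)
{
    struct priv_sketch *prv = alloc_sketch(sizeof(*prv), 0);

    if (!prv)
        return NULL;

    prv->mask = alloc_sketch(sizeof(*prv->mask), fail_mask_alloc);
    if (!prv->mask) {
        free_sketch(prv);   /* the step whose absence causes the leak */
        return NULL;
    }
    return prv;
}

static void deinit_sketch(struct priv_sketch *prv)
{
    if (prv) {
        free_sketch(prv->mask);
        free_sketch(prv);
    }
}
```

Calling `init_sketch(1)` simulates the second allocation failing; with the cleanup in place the outstanding-allocation counter returns to zero.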

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-ID: 1304398
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
9 years ago Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Fri, 5 Jun 2015 13:35:49 +0000 (14:35 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

9 years ago xen: arm: add missing newline to error message.
Ian Campbell [Thu, 4 Jun 2015 15:31:41 +0000 (16:31 +0100)]
xen: arm: add missing newline to error message.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
9 years ago xen/arm: vgic-v3: Clean the emulation of IROUTER
Julien Grall [Mon, 25 May 2015 20:44:20 +0000 (21:44 +0100)]
xen/arm: vgic-v3: Clean the emulation of IROUTER

The read emulation of the register IROUTER contains lots of unnecessary
code as irouter is already valid and doesn't need any processing before
setting the value in a register.

Also take the opportunity to factorize the code to find a vCPU from the
affinity in a single place. It will be easier to change the way to do it
later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Chen Baozi <cbz@baozis.org>
Acked-by: Chen Baozi <baozich@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago vm_event: clean up control-register-write vm_events and add XCR0 event
Razvan Cojocaru [Fri, 5 Jun 2015 10:20:18 +0000 (12:20 +0200)]
vm_event: clean up control-register-write vm_events and add XCR0 event

As suggested by Andrew Cooper, this patch attempts to remove
some redundancy and allow for an easier time when adding vm_events
for new control registers in the future, by having a single
VM_EVENT_REASON_WRITE_CTRLREG vm_event type, meant to serve CR0,
CR3, CR4 and (newly introduced) XCR0. The actual control register
will be deduced by the new .index field in vm_event_write_ctrlreg
(renamed from vm_event_mov_to_cr).
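
One way to picture the consolidated event is a single payload whose index field selects the register. The layout and constant names below are hypothetical and are not copied from the public headers; only the idea of one event type plus an `.index` field comes from the description above.

```c
#include <stdint.h>

/* Hypothetical index values; the real constants may be named differently. */
enum ctrlreg_index_sketch {
    CTRLREG_CR0,
    CTRLREG_CR3,
    CTRLREG_CR4,
    CTRLREG_XCR0
};

/* One payload for every control-register write, told apart by .index. */
struct write_ctrlreg_sketch {
    uint32_t index;
    uint64_t new_value;
    uint64_t old_value;
};

/* A consumer needs a single handler and switches on the index. */
static const char *ctrlreg_name_sketch(const struct write_ctrlreg_sketch *ev)
{
    switch (ev->index) {
    case CTRLREG_CR0:  return "CR0";
    case CTRLREG_CR3:  return "CR3";
    case CTRLREG_CR4:  return "CR4";
    case CTRLREG_XCR0: return "XCR0";
    default:           return "unknown";
    }
}
```

Supporting a further register in this scheme would mean adding one index value and one `case`, rather than a new event type.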

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years ago vmap: convert vmap() to using mfn_t
Andrew Cooper [Fri, 5 Jun 2015 10:17:16 +0000 (12:17 +0200)]
vmap: convert vmap() to using mfn_t

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years ago mem: expose typesafe mfns/gfns/pfns to common code
Andrew Cooper [Fri, 5 Jun 2015 10:10:33 +0000 (12:10 +0200)]
mem: expose typesafe mfns/gfns/pfns to common code

As the first step of memory management cleanup, introduce the common code to
mfn_t, gfn_t and pfn_t.

The typesafe construction moves to its own header file, while the declarations
and sentinel values are moved to being common.

No functional change.
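
As a minimal sketch of what a typesafe frame number can look like: each kind of frame number is wrapped in its own single-member struct, so mixing them up becomes a compile-time error. The type and helper names here are illustrative stand-ins, not the definitions introduced by this commit, and the 4 KiB page size is an assumption.

```c
/* Each frame-number kind gets its own single-member struct type, so the
 * compiler rejects passing one where another is expected. */
typedef struct { unsigned long mfn; } mfn_sketch_t;
typedef struct { unsigned long gfn; } gfn_sketch_t;

static inline mfn_sketch_t make_mfn(unsigned long n) { return (mfn_sketch_t){ n }; }
static inline unsigned long mfn_value(mfn_sketch_t m) { return m.mfn; }

static inline gfn_sketch_t make_gfn(unsigned long n) { return (gfn_sketch_t){ n }; }
static inline unsigned long gfn_value(gfn_sketch_t g) { return g.gfn; }

/* Accepts only an MFN; passing make_gfn(5) here would not compile. */
static inline unsigned long mfn_to_byte_addr(mfn_sketch_t m)
{
    return mfn_value(m) << 12;   /* assumes 4 KiB pages */
}
```

Wrapping and unwrapping are explicit, which makes every conversion between address spaces visible at the call site.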

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years ago x86/paging: remove pointless current domain checks
Jan Beulich [Fri, 5 Jun 2015 10:09:18 +0000 (12:09 +0200)]
x86/paging: remove pointless current domain checks

Checking that the subject domain is not the current one is pointless
when already having paused that domain: domain_pause() already
ASSERT()s this to be the case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
9 years ago tools: link executables with libtinfo explicitly
Daniel Kiper [Tue, 2 Jun 2015 13:33:26 +0000 (15:33 +0200)]
tools: link executables with libtinfo explicitly

binutils 2.22 changed the ld default from --copy-dt-needed-entries
to --no-copy-dt-needed-entries. This revealed that some objects
are linked implicitly with libtinfo and newer ld fails to build
relevant executables.

Below is a short explanation why we should not do that...

http://fedoraproject.org/wiki/UnderstandingDSOLinkChange says:

The default behaviour for ld (my note: before version 2.22) allows
users to 'indirectly' link to required objects/libraries through
intermediate objects/libraries. While this is convenient, it can
also be dangerous because it makes your program's dependencies tied
to the dependencies of other objects. If those objects ever change
their linkages, they can break your program without any changes
to your own code!

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago xen/arm: gic-hip04: Resync the driver with the GICv2
Julien Grall [Wed, 6 May 2015 18:52:30 +0000 (19:52 +0100)]
xen/arm: gic-hip04: Resync the driver with the GICv2

The GIC hip04 driver was differing from GICv2. I suspect that some of
the changes in the common GIC code make boot fail on hip04. However, I
don't have a platform to check, so it has only been build tested.

List of GICv2 commits ported to the HIP04:
    commit ce12e6dba4b2d120e35dffd95a745452224e7144
    Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
    Date:   Fri Apr 10 16:21:10 2015 +1000

        xen/arm: Don't write to GICH_MISR

        GICH_MISR is read-only in GICv2.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
    commit 2eb4f996547dc632aa94b2b7b4f783bec8ffe457
    Author: Julien Grall <julien.grall@linaro.org>
    Date:   Wed Apr 1 17:21:47 2015 +0100

        xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts

        GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts, although
        IRQs 1020-1023 are reserved for special purposes.

        The result is used by the callers of gic_number_lines in order to check
        the validity of an IRQ.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
    commit e2d486b385ce58b6db7561417de28ba837dcd4ac
    Author: Julien Grall <julien.grall@linaro.org>
    Date:   Wed Apr 1 17:21:34 2015 +0100

        xen/arm: Divide GIC initialization in 2 parts

        Currently the function to translate IRQ from the device tree is set
        unconditionally to be able to retrieve serial/timer IRQ
        before the GIC has been initialized.

        It assumes that the xlate function won't ever change. We may also need
        to have the primary interrupt controller very early.

        Rework the gic initialization in 2 parts:
            - gic_preinit: Get the interrupt controller device tree node and
            set up GIC and xlate callbacks
            - gic_init: Initialize the interrupt controller and the boot CPU
            interrupts.

        The former function will be called just after the IRQ subsystem has been
        initialized.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@huawei.com>
Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
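
The interrupt-count limit mentioned in the ported GICv2 commit above (GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts, with 1020-1023 reserved) can be sketched as follows. The macro and function names are made up for this example; the field position (bits [4:0]) and the 32 * (N + 1) formula follow the GIC architecture specification.

```c
#include <stdint.h>

/* ITLinesNumber occupies bits [4:0] of GICD_TYPER. */
#define GICD_TYPER_ITLINES_MASK 0x1fu
/* INTIDs 1020-1023 are reserved, so at most 1020 lines are usable. */
#define GIC_MAX_USABLE_LINES    1020u

/* Number of usable interrupt lines derived from a GICD_TYPER value. */
static unsigned int gic_nr_lines_sketch(uint32_t gicd_typer)
{
    unsigned int n = 32u * ((gicd_typer & GICD_TYPER_ITLINES_MASK) + 1u);

    return n < GIC_MAX_USABLE_LINES ? n : GIC_MAX_USABLE_LINES;
}
```

Callers that validate an IRQ number against this result then reject the reserved IDs even when the field holds its maximum value.
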
9 years ago libxl: remove code in stubdom creation failure path and callback
Wei Liu [Mon, 1 Jun 2015 17:24:35 +0000 (18:24 +0100)]
libxl: remove code in stubdom creation failure path and callback

The snippet to destroy stubdom and the callback were added in 1fc3aeb3
("libxl: use new QEMU xenstore protocol"). The intention was to destroy
stubdom when it is not responsive. That approach is problematic because
rc is not propagated back to sdss->callback, hence the guest is leaked.

The solution is simple. The destruction of stubdom can be done later in
sdss->callback. That code path already does the right thing to destroy
both the guest and the stubdom that serves the guest.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxl: fix HVM vNUMA
Wei Liu [Mon, 1 Jun 2015 10:19:14 +0000 (11:19 +0100)]
libxl: fix HVM vNUMA

This patch does two things:

The original code erroneously fills in xc_hvm_build_args before
generating vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to right place to fix this.

Move the subtraction of video ram to libxl__vnuma_build_vmemrange_hvm
because it's the central place for generating vmemranges.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxc: rework vnuma bits in setup_guest
Wei Liu [Mon, 1 Jun 2015 10:19:13 +0000 (11:19 +0100)]
libxc: rework vnuma bits in setup_guest

Make the setup process similar to its PV counterpart. That is, to allocate a
P2M array that covers the whole memory range and start from there. This
is clearer than using an array with no holes in it.

Also the dummy layout should take MMIO hole into consideration. We might
end up having two vmemranges in the dummy layout.
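
To illustrate why the dummy layout may need two ranges, here is a sketch that splits guest memory around an MMIO hole. It assumes the hole runs from a caller-supplied `mmio_start` up to 4 GiB and that memory above the hole resumes at 4 GiB; the structure and function names are hypothetical and this is not the libxc implementation.

```c
#include <stdint.h>

/* A half-open byte range [start, end). */
struct vmemrange_sketch {
    uint64_t start;
    uint64_t end;
};

#define FOUR_GIB (UINT64_C(1) << 32)

/* Splits mem_bytes around an MMIO hole [mmio_start, 4 GiB).
 * Returns the number of ranges written to out[] (1 or 2). */
static unsigned int dummy_layout_sketch(uint64_t mem_bytes, uint64_t mmio_start,
                                        struct vmemrange_sketch out[2])
{
    out[0].start = 0;

    if (mem_bytes <= mmio_start) {
        /* Everything fits below the hole: a single range suffices. */
        out[0].end = mem_bytes;
        return 1;
    }

    /* Low memory stops at the hole; the remainder is placed above 4 GiB. */
    out[0].end = mmio_start;
    out[1].start = FOUR_GIB;
    out[1].end = FOUR_GIB + (mem_bytes - mmio_start);
    return 2;
}
```

A small guest gets one range, while a guest larger than the space below the hole gets a low range and a high range whose sizes add up to the requested memory.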

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxc: print more error messages when failed
Wei Liu [Mon, 1 Jun 2015 10:19:12 +0000 (11:19 +0100)]
libxc: print more error messages when failed

No functional changes introduced.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc/libxl: fill xc_hvm_build_args in libxl
Wei Liu [Mon, 1 Jun 2015 10:19:11 +0000 (11:19 +0100)]
libxc/libxl: fill xc_hvm_build_args in libxl

When building HVM guests, originally some fields of xc_hvm_build_args
are filled in xc_hvm_build (and buried in the wrong function), some are
set in libxl__build_hvm before passing xc_hvm_build_args to
xc_hvm_build. This is fragile.

After examining the code in xc_hvm_build that sets those fields, we can
in fact move the setting of mmio_start etc. into libxl. This way we
consolidate memory layout setting in libxl.

The setting of firmware data related fields is left in xc_hvm_build
because it depends on parsing ELF image. Those fields only point to
scratch data that doesn't affect memory layout.

There should be no change in the generated guest memory layout. But the
semantics of xc_hvm_build have changed. Toolstacks built directly on
top of libxc need to adjust to this change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: "Chen, Tiejun" <tiejun.chen@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: traps: Print a message in debug build when a guest dabt is not handled
Julien Grall [Thu, 28 May 2015 09:10:47 +0000 (10:10 +0100)]
xen/arm: traps: Print a message in debug build when a guest dabt is not handled

This is useful for debugging a low level kernel before the guest has set up
the vector table.

Note that the value of the IPA is only here for reference and may not
always be valid if the error came from a stage 1 table translation walk.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- dropped spurious w/s change ]

9 years agolibxc: add missing xc_hypercall_bounce_pre calls
Daniel De Graaf [Tue, 26 May 2015 18:13:29 +0000 (14:13 -0400)]
libxc: add missing xc_hypercall_bounce_pre calls

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/flask: change bool_maxstr to PAGE_SIZE
Daniel De Graaf [Tue, 26 May 2015 18:13:28 +0000 (14:13 -0400)]
xen/flask: change bool_maxstr to PAGE_SIZE

When FLASK_{GET,SET}BOOL is called with a named boolean, the call to
flask_security_resolve_bool is made prior to bool_maxstr being populated
by flask_security_make_bools.  This results in the maximum string length
being specified as zero, which is not useful.  While it would be
possible to initialize bool_maxstr correctly prior to its use, it is
simpler to use a fixed maximum of PAGE_SIZE as is done for the other
calls to safe_copy_string_from_guest.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoflask/policy: updates from osstest runs
Daniel De Graaf [Tue, 26 May 2015 18:13:27 +0000 (14:13 -0400)]
flask/policy: updates from osstest runs

Migration and HVM domain creation both trigger AVC denials that should
be allowed in the default policy; add these rules.

Guest console writes need to be either allowed or denied without audit
depending on the decision of the local administrator; introduce a policy
boolean to switch between these possibilities.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxentrace: install into sbin
Olaf Hering [Sat, 23 May 2015 08:24:10 +0000 (08:24 +0000)]
xentrace: install into sbin

Collecting the trace buffer requires root permissions. Adjust Makefile
to install xentrace and xentrace_setsize into sbindir. Leave the
existing support for BIN in place for upcoming changes.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
9 years agox86/memcpy: reduce code size
Andrew Cooper [Wed, 3 Jun 2015 07:28:05 +0000 (09:28 +0200)]
x86/memcpy: reduce code size

'n % BYTES_PER_LONG' is at most 7, and doesn't need a 64bit register mov.
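
The split behind this remark can be sketched in portable C. This is only an
illustration, not Xen's inline assembly: sketch_memcpy() is a hypothetical
helper, and BYTES_PER_LONG is defined here as sizeof(unsigned long), which is
8 on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 8 on a 64-bit build. */
#define BYTES_PER_LONG sizeof(unsigned long)

/* Hypothetical helper: copy whole longs first, then the short tail. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t longs = n / BYTES_PER_LONG;
    /* The tail is always < BYTES_PER_LONG, so a 32-bit value holds it. */
    unsigned int tail = (unsigned int)(n % BYTES_PER_LONG);

    /* Bulk part: one long at a time. */
    while (longs--) {
        unsigned long w;

        memcpy(&w, s, sizeof(w));
        memcpy(d, &w, sizeof(w));
        s += sizeof(w);
        d += sizeof(w);
    }

    /* Remainder: at most BYTES_PER_LONG - 1 single bytes. */
    while (tail--)
        *d++ = *s++;

    return dst;
}
```

Because the tail count never exceeds BYTES_PER_LONG - 1, loading it needs only
a 32-bit move, which has a shorter encoding on x86-64.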

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/debugger: use copy_to/from_guest() in dbg_rw_guest_mem()
Andrew Cooper [Wed, 3 Jun 2015 07:27:09 +0000 (09:27 +0200)]
x86/debugger: use copy_to/from_guest() in dbg_rw_guest_mem()

Using gdbsx on Broadwell systems suffers a SMAP violation because
dbg_rw_guest_mem() uses memcpy() with a userspace pointer.

The functions dbg_rw_mem() and dbg_rw_guest_mem() have been updated to pass
'void * __user' pointers which indicates their nature clearly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/crash: don't use set_fixmap() in the crash path
Andrew Cooper [Wed, 3 Jun 2015 07:26:13 +0000 (09:26 +0200)]
x86/crash: don't use set_fixmap() in the crash path

Experimentally, this can result in memory allocation, and in particular a
failed assertion that interrupts are enabled when performing a TLB flush.

  (XEN) Assertion 'local_irq_is_enabled()' failed at smp.c:223
  <snip>
  (XEN) [<ffff82d08018a0d3>] flush_area_mask+0x7/0x134
  (XEN) [<ffff82d08011f7c6>] alloc_domheap_pages+0xa9/0x12a
  (XEN) [<ffff82d08011f8ab>] alloc_xenheap_pages+0x64/0xdb
  (XEN) [<ffff82d080178e08>] alloc_xen_pagetable+0x1c/0xa0
  (XEN) [<ffff82d08017926b>] virt_to_xen_l1e+0x38/0x1be
  (XEN) [<ffff82d080179bff>] map_pages_to_xen+0x80e/0xfd9
  (XEN) [<ffff82d080185a23>] __set_fixmap+0x2c/0x2e
  (XEN) [<ffff82d0801a6fd4>] machine_crash_shutdown+0x186/0x2b2
  (XEN) [<ffff82d0801172bb>] kexec_crash+0x3f/0x5b
  (XEN) [<ffff82d0801479b7>] panic+0x100/0x118
  (XEN) [<ffff82d08019002b>] set_guest_machinecheck_trapbounce+0/0x6d
  (XEN) [<ffff82d080195c15>] do_page_fault+0x40b/0x541
  (XEN) [<ffff82d0802345e0>] handle_exception_saved+0x2e/0x6c

Instead, use the directmap mappings, which are writable and involve far less
complexity than set_fixmap().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/apic: Disable the LAPIC later in smp_send_stop()
Andrew Cooper [Wed, 3 Jun 2015 07:25:43 +0000 (09:25 +0200)]
x86/apic: Disable the LAPIC later in smp_send_stop()

__stop_this_cpu() may reset the LAPIC mode back from x2apic to xapic, but will
leave x2apic_enabled alone.  This may cause disconnect_bsp_APIC() in
disable_IO_APIC() to suffer a #GP fault.

Disabling the LAPIC can safely be deferred to being the last action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agosched_rt: use the correct type for _cpumask_scratch
Julien Grall [Wed, 3 Jun 2015 07:24:50 +0000 (09:24 +0200)]
sched_rt: use the correct type for _cpumask_scratch

The commit 376bbbabbda607d2039b8f839f15ff02721597d2 "sched_rt: print useful
affinity info when dumping" breaks build on ARM64:

sched_rt.c: In function ‘rt_init’:
sched_rt.c:442:26: error: assignment from incompatible pointer type [-Werror]
         _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
                          ^
sched_rt.c: In function ‘rt_alloc_pdata’:
sched_rt.c:489:29: error: passing argument 1 of ‘alloc_cpumask_var’ from incompatible pointer type [-Werror]
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )

This is because cpumask_var_t is not a type alias to cpumask_t** when
the number of CPU > 2 * BITS_PER_LONG. The correct type for
_cpumask_scratch should be cpumask_var_t*.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
9 years agounmodified-drivers: tolerate IRQF_DISABLED being undefined
Jan Beulich [Tue, 2 Jun 2015 11:45:03 +0000 (13:45 +0200)]
unmodified-drivers: tolerate IRQF_DISABLED being undefined

It's being removed in Linux 4.1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoefi: fix allocation problems if ExitBootServices() fails
Ross Lagerwall [Tue, 2 Jun 2015 11:44:24 +0000 (13:44 +0200)]
efi: fix allocation problems if ExitBootServices() fails

If calling ExitBootServices() fails, the required memory map size may
have increased. When initially allocating the memory map, allocate a
slightly larger buffer (by an arbitrary 8 entries) to fix this.
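
A minimal sketch of the sizing rule, assuming the firmware has already
reported a required map size and a per-descriptor size. memmap_alloc_size()
and EXTRA_MAP_ENTRIES are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary slack, as in the description above: 8 extra descriptors. */
#define EXTRA_MAP_ENTRIES 8

/*
 * Hypothetical helper: how many bytes to allocate for the memory map,
 * given the size the firmware reported and the size of one descriptor.
 */
static size_t memmap_alloc_size(size_t reported_size, size_t desc_size)
{
    return reported_size + EXTRA_MAP_ENTRIES * desc_size;
}
```

With this headroom, a map that grows by a few entries between the first and a
repeated ExitBootServices() attempt still fits in the original buffer.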

The ARM code path was already allocating a larger buffer than required,
so this moves the code to be common for all architectures.

This was seen on the following machine when using the iscsidxe UEFI
driver. The machine would consistently fail the first call to
ExitBootServices().
System Information
        Manufacturer: Supermicro
        Product Name: X10SLE-F/HF
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 2.00
        Release Date: 04/24/2014

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agosched_rt: print useful affinity info when dumping
Dario Faggioli [Tue, 2 Jun 2015 11:43:15 +0000 (13:43 +0200)]
sched_rt: print useful affinity info when dumping

In fact, printing the cpupool's CPU online mask
for each vCPU is just redundant, as that is the
same for all the vCPUs of all the domains in the
same cpupool, while hard affinity is already part
of the output of dumping domains info.

Instead, print the intersection between hard
affinity and online CPUs, which is --in case of this
scheduler-- the effective affinity always used for
the vCPUs.
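
As an illustration of the mask being printed, here is a toy version using a
64-bit integer as the cpumask. Xen itself uses cpumask_t with helpers such as
cpumask_and(); toy_cpumask_t and effective_affinity() below are hypothetical
names for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cpumask: bit N set means pCPU N is in the mask (at most 64 CPUs). */
typedef uint64_t toy_cpumask_t;

/* Effective affinity: CPUs that are both allowed (hard) and online. */
static toy_cpumask_t effective_affinity(toy_cpumask_t hard,
                                        toy_cpumask_t online)
{
    return hard & online;
}
```

For example, hard affinity 0x0F (pCPUs 0-3) in a cpupool whose online mask is
0x3C (pCPUs 2-5) gives an effective affinity of 0x0C (pCPUs 2-3).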

This change also takes the chance to add a scratch
cpumask area, to avoid having to either put one
(more) cpumask_t on the stack, or dynamically
allocate it within the dumping routine. (The former
being bad because hypervisor stack size is limited,
the latter because dynamic allocations can fail, if
the hypervisor was built for a large enough number
of CPUs.) We allocate such scratch area, for all pCPUs,
when the first instance of the RTDS scheduler is
activated and, in order not to lose track/leak it
if other instances are activated in new cpupools,
and when the last instance is deactivated, we (sort
of) refcount it.

Such scratch area can be used to kill most of the
cpumask{_var}_t local variables in other functions
in the file, but that is *NOT* done in this change.

Finally, convert the file to use keyhandler scratch,
instead of open coded string buffers.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
9 years agodocs: clarification to terms used in hypervisor memory management
Andrew Cooper [Mon, 1 Jun 2015 10:00:18 +0000 (12:00 +0200)]
docs: clarification to terms used in hypervisor memory management

Memory management is hard[citation needed].  Furthermore, it isn't helped by
the inconsistent use of terms through the code, or that some terms have
changed meaning over time.

Describe the currently-used terms in a more practical fashion, so new code has
a concrete reference.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agox86: don't crash when mapping a page using EFI runtime page tables
Ross Lagerwall [Mon, 1 Jun 2015 09:59:14 +0000 (11:59 +0200)]
x86: don't crash when mapping a page using EFI runtime page tables

When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.

To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables which is treated equivalently to running in an idle
vCPU.

This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.

Example call trace:
[<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
[<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
[<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
[<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
[<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
[<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
[<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
[<ffff82d080173816>] move_native_irq+0x25/0x36
[<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
[<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
[<ffff82d080173e8b>] do_IRQ+0x251/0x635
[<ffff82d080234502>] common_interrupt+0x62/0x70
[<00000000df7ed2be>] 00000000df7ed2be

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
9 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Fri, 29 May 2015 12:22:31 +0000 (13:22 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

9 years agolibxc/restore: implement Remus checkpointed restore
Yang Hongyang [Mon, 18 May 2015 07:03:56 +0000 (15:03 +0800)]
libxc/restore: implement Remus checkpointed restore

With Remus, the restore flow should be:
the first full migration stream -> { periodically restore stream }

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc/save: implement Remus checkpointed save
Yang Hongyang [Mon, 18 May 2015 07:03:55 +0000 (15:03 +0800)]
libxc/save: implement Remus checkpointed save

With Remus, the save flow should be:
live migration->{ periodically save(checkpointed save) }

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc/save: refactor of send_domain_memory_live()
Yang Hongyang [Mon, 18 May 2015 07:03:54 +0000 (15:03 +0800)]
libxc/save: refactor of send_domain_memory_live()

Split send_domain_memory_live() into three helper functions:
  - send_memory_live() does the actual live send
  - suspend_and_send_dirty() suspends the guest and sends dirty pages
  - send_memory_verify()
The motivation for this is that when we send a checkpointed stream, we
will skip the actual live part.
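
The resulting control flow might look roughly like the sketch below. The three
helpers are argument-less stand-ins that only record that they ran (the real
ones operate on the save context), and the send_domain_memory() wrapper with
its checkpointed and verify flags is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Records which stand-in helpers ran, in order. */
static char trace[8];

/* Stand-ins for the three helpers; each just logs one letter. */
static int send_memory_live(void)       { strcat(trace, "L"); return 0; }
static int suspend_and_send_dirty(void) { strcat(trace, "S"); return 0; }
static int send_memory_verify(void)     { strcat(trace, "V"); return 0; }

/* Hypothetical wrapper: a checkpointed stream skips the live phase. */
static int send_domain_memory(int checkpointed, int verify)
{
    int rc;

    trace[0] = '\0';

    if (!checkpointed) {
        rc = send_memory_live();
        if (rc)
            return rc;
    }

    rc = suspend_and_send_dirty();
    if (rc)
        return rc;

    return verify ? send_memory_verify() : 0;
}
```

A plain live migration with verification runs all three steps, while a
checkpoint runs only the suspend-and-send-dirty step.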

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoRevert "use ticket locks for spin locks"
Jan Beulich [Thu, 28 May 2015 10:07:33 +0000 (12:07 +0200)]
Revert "use ticket locks for spin locks"

This reverts commit 45fcc4568c5162b00fb3907fb158af82dd484a3d as it
introduces yet to be explained issues on ARM.

9 years agoRevert "spinlock: fix build with older GCC"
Jan Beulich [Thu, 28 May 2015 10:06:47 +0000 (12:06 +0200)]
Revert "spinlock: fix build with older GCC"

This reverts commit 1037e33c88bb0e1fe530c164f242df17030102e1 as its
prereq commit 45fcc4568c is about to be reverted.

9 years agoRevert "x86,arm: remove asm/spinlock.h from all architectures"
Jan Beulich [Thu, 28 May 2015 09:59:34 +0000 (11:59 +0200)]
Revert "x86,arm: remove asm/spinlock.h from all architectures"

This reverts commit e62e49e6d5d4e8d22f3df0b75443ede65a812435 as
its prerequisite 45fcc4568c is going to be reverted.

9 years agox86/pvh: disable posted interrupts
Roger Pau Monné [Thu, 28 May 2015 08:56:08 +0000 (10:56 +0200)]
x86/pvh: disable posted interrupts

Enabling posted interrupts requires the virtual interrupt delivery feature,
which is disabled for PVH guests, so make sure posted interrupts are also
disabled or else vmlaunch will fail.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: Lars Eggert <lars@netapp.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agopublic: fix xen_domctl_monitor_op_t definition
Vitaly Kuznetsov [Thu, 28 May 2015 08:55:43 +0000 (10:55 +0200)]
public: fix xen_domctl_monitor_op_t definition

It seems xen_domctl_monitor_op_t was supposed to be a typedef for
struct xen_domctl_monitor_op and not the non-existent xen_domctl__op.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
9 years agoQEMU_TAG update
Ian Jackson [Wed, 27 May 2015 15:54:07 +0000 (16:54 +0100)]
QEMU_TAG update

9 years agoxen: Simplify TSC domctls by removing double info field
Ian Campbell [Tue, 26 May 2015 11:14:48 +0000 (12:14 +0100)]
xen: Simplify TSC domctls by removing double info field

There is no need to have this twice and we can simply inline
xen_guest_tsc_info into xen_domctl_tsc_info as well.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
9 years agoxen: x86: copy back tsc info, not pointer to tsc info in domctl
Ian Campbell [Tue, 26 May 2015 11:14:47 +0000 (12:14 +0100)]
xen: x86: copy back tsc info, not pointer to tsc info in domctl

In 38b37ed82705 "x86/domctl: cleanup", XEN_DOMCTL_gettscinfo was
changed to use the standard copyback mechanism.

However, the output TSC info is a guest handle, i.e. a pointer to the
location for the information, so copyback just copies the unchanged
pointer back.

Switch back to fetching the details into a local struct and explicitly
copying it back.
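
The difference between the two patterns can be shown with ordinary pointers.
This is a simplified sketch: the hypervisor works with guest handles and
copy_to_guest(), and the struct layouts and function names below are
hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical payload the caller wants filled in. */
struct tsc_info {
    unsigned int tsc_mode;
    unsigned long long elapsed_nsec;
};

/* Hypothetical request: the output is reached through a pointer. */
struct gettsc_req {
    struct tsc_info *out;
};

/* Broken pattern: copy the request back. Only the pointer is rewritten. */
static void copyback_request(struct gettsc_req *guest_req,
                             const struct gettsc_req *local)
{
    *guest_req = *local;
}

/* Fixed pattern: fill a local struct, then copy it to the pointed-to area. */
static void copyback_info(const struct gettsc_req *req,
                          const struct tsc_info *info)
{
    memcpy(req->out, info, sizeof(*info));
}
```

After copyback_request() the caller's buffer is untouched; only
copyback_info() actually delivers the data.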

This caused test failures in the Cambridge instance of osstest, but
not for some reason in the production instance.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/cpuidle: prevent out of bounds array access
Jan Beulich [Fri, 22 May 2015 15:34:51 +0000 (17:34 +0200)]
x86/cpuidle: prevent out of bounds array access

... resulting from fbeef5570c ("x86/cpuidle: get accurate C0 value with
xenpm tool"). For consistency also no longer account an unknown state
to C0 in pmstat_get_cx_stat().

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agouse ULL for GB and MB macros
Julien Grall [Fri, 22 May 2015 15:33:39 +0000 (17:33 +0200)]
use ULL for GB and MB macros

On 32bit, GB(4) doesn't fit in an unsigned long. Modify MB as well to
avoid further issues.

Also, fix a couple of printf formats in x86 which break after using
unsigned long long.
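
The overflow can be reproduced with a 32-bit type standing in for unsigned
long on a 32-bit build. The macro bodies below are a sketch of the idea (force
a 64-bit operand before shifting), not necessarily the exact definitions in
the tree; gb_in_32bit() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: widen to unsigned long long before shifting. */
#define MB(x) ((unsigned long long)(x) << 20)
#define GB(x) ((unsigned long long)(x) << 30)

/* What a 32-bit unsigned long would compute for GB(x): wraps at 4GB. */
static uint32_t gb_in_32bit(uint32_t x)
{
    return (uint32_t)(x << 30);
}
```

Values produced by these macros have type unsigned long long, so printf-style
format strings need %llu (or %llx) rather than %lu.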

Signed-off-by: Julien Grall <julien.grall@citrix.com>
9 years agox86: switch default mapping attributes to non-executable
Jan Beulich [Fri, 22 May 2015 08:50:14 +0000 (10:50 +0200)]
x86: switch default mapping attributes to non-executable

Only a very limited subset of mappings need to be done as executable
ones; in particular the direct mapping should not be executable to
limit the damage attackers can cause by exploiting security relevant
bugs.

The EFI change at once includes an adjustment to set NX only when
supported by the hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86: move I/O emulation stubs off the stack
Jan Beulich [Fri, 22 May 2015 08:48:42 +0000 (10:48 +0200)]
x86: move I/O emulation stubs off the stack

This is needed as stacks are going to become non-executable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86emul: move stubs off the stack
Jan Beulich [Fri, 22 May 2015 08:46:32 +0000 (10:46 +0200)]
x86emul: move stubs off the stack

This is needed as stacks are going to become non-executable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86: move syscall trampolines off the stack
Jan Beulich [Fri, 22 May 2015 08:45:43 +0000 (10:45 +0200)]
x86: move syscall trampolines off the stack

This is needed as stacks are going to become non-executable. Use
separate stub pages (shared among suitable CPUs on the same node)
instead.

Stub areas (currently 128 bytes each) are being split into two parts -
a fixed usage one (the syscall ones) and dynamically usable space,
which will be used by subsequent changes to hold dynamically generated
code during instruction emulation.

While sharing physical pages among certain CPUs on the same node, for
now the virtual mappings get established in distinct pages for each
CPU. This isn't a strict requirement, but simplifies VA space
management for this initial implementation: Sharing VA space would
require additional tracking of which areas are currently in use. If
the VA and/or TLB overhead turned out to be a problem, such extra code
could easily be added.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/shadow: fix uninitialized rc in shadow_track_dirty_vram()
Jan Beulich [Fri, 22 May 2015 08:13:30 +0000 (10:13 +0200)]
x86/shadow: fix uninitialized rc in shadow_track_dirty_vram()

Commit bd1b4a71b3 ("x86/shadow: fix shadow_track_dirty_vram to work on
hvm guests"), trying to mirror its HAP counterpart, deleted a couple of
assignments to rc without making sure rc is initialized on all paths.

Coverity ID: 1299410
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agox86/irq: limit the maximum number of domain PIRQs
Andrew Cooper [Fri, 22 May 2015 08:13:04 +0000 (10:13 +0200)]
x86/irq: limit the maximum number of domain PIRQs

c/s 7e73a6e "have architectures specify the number of PIRQs a hardware domain
gets" increased the default number of pirqs for dom0, as 256 was found to be
too low in some cases.

However, it didn't account for the upper bound presented by the domain's EOI
bitmap, registered with the PHYSDEVOP_pirq_eoi_gmfn_v* hypercall.

On a server with 240 cpus, Xen was observed to be attempting to clear the EOI
bit for dom0's pirq 0xb40f, which hit a pagefault.
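
A sketch of the bound, assuming the EOI bitmap occupies a single 4K page with
one bit per pirq. The names below are hypothetical and the actual limit
applied by the patch may be expressed differently.

```c
#include <assert.h>

#define PAGE_SIZE        4096u              /* x86 page size */
#define EOI_BITMAP_BITS  (PAGE_SIZE * 8u)   /* one bit per pirq, one page */

/* Hypothetical helper: cap the number of pirqs at what the bitmap covers. */
static unsigned int clamp_nr_pirqs(unsigned int requested)
{
    return requested < EOI_BITMAP_BITS ? requested : EOI_BITMAP_BITS;
}

/* Does this pirq have a bit inside the single-page bitmap? */
static int pirq_in_bitmap(unsigned int pirq)
{
    return pirq < EOI_BITMAP_BITS;
}
```

Under this assumption the bitmap covers pirqs 0 to 32767, so pirq 0xb40f
(46095) lies beyond the mapped page, which matches the observed pagefault.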

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/cpuidle: get accurate C0 value with xenpm tool
Huaitong Han [Fri, 22 May 2015 08:12:02 +0000 (10:12 +0200)]
x86/cpuidle: get accurate C0 value with xenpm tool

When checking the ACPI function of C-status, after a 100 second sleep,
the value of C0 status sampled by the xenpm tool decreases.
Because C0=NOW()-C1-C2-C3-C4, when NOW() is taken during idle time it
is bigger than the last C-status update time, so the C0 value
is also bigger than the true value. If the margin of the second error
cannot make up for the margin of the first error, the value of C0 decreases.
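
The effect can be reproduced with a toy accounting model. This is an
illustrative sketch, not the code changed by the patch: struct cx_sample and
both functions are hypothetical, and times are plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CSTATES 4

struct cx_sample {
    uint64_t residency[NR_CSTATES]; /* time accounted to C1..C4 so far */
    int      idle;                  /* non-zero if the CPU is idle now */
    uint64_t idle_entry;            /* start of the current idle period */
};

/* Naive: whatever has not been accounted to C1..C4 yet counts as C0. */
static uint64_t c0_naive(const struct cx_sample *s, uint64_t now)
{
    uint64_t c0 = now;
    int i;

    for (i = 0; i < NR_CSTATES; i++)
        c0 -= s->residency[i];

    return c0;
}

/* Adjusted: an idle period still in progress is not C0 time. */
static uint64_t c0_adjusted(const struct cx_sample *s, uint64_t now)
{
    uint64_t c0 = c0_naive(s, now);

    if (s->idle)
        c0 -= now - s->idle_entry;

    return c0;
}
```

Take a CPU that runs for 0-50, idles for 50-150, runs for 150-160, idles for
160-300 and runs again from 300. Sampling at 200 (mid-idle) and again at 310
gives 100 then 70 with the naive formula, i.e. C0 appears to decrease, while
the adjusted value goes from 60 to 70.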

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agopublic: restrict xen_arch_domainconfig visibility
Jan Beulich [Fri, 22 May 2015 08:10:28 +0000 (10:10 +0200)]
public: restrict xen_arch_domainconfig visibility

As an extension to 931f5777c7 ("public: clarify xen_arch_domainconfig
ABI statement") limit the respective definitions' visibility to
hypervisor and tools.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/VT-x: align segment table columns when dumping a VMCS
Andrew Cooper [Fri, 22 May 2015 08:09:27 +0000 (10:09 +0200)]
x86/VT-x: align segment table columns when dumping a VMCS

This makes it more succinct and easier to read.

Before:
  (XEN) Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
  (XEN) CS: sel=0x0008, attr=0x0c09b, limit=0xffffffff, base=0x0000000000000000
  (XEN) DS: sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  (XEN) SS: sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  (XEN) ES: sel=0x0010, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
  (XEN) FS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
  (XEN) GS: sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
  (XEN) GDTR:                           limit=0x00000017, base=0x0000000000102eb8
  (XEN) LDTR: sel=0x0000, attr=0x00082, limit=0x0000ffff, base=0x0000000000000000
  (XEN) IDTR:                           limit=0x0000ffff, base=0x0000000000000000
  (XEN) TR: sel=0x0000, attr=0x0008b, limit=0x0000ffff, base=0x0000000000000000
  (XEN) EFER = 0x0000000000000000  PAT = 0x0007040600070406

After:
  (XEN) Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
  (XEN)        sel  attr  limit   base
  (XEN)   CS: 0008 0c09b ffffffff 0000000000000000
  (XEN)   DS: 0010 0c093 ffffffff 0000000000000000
  (XEN)   SS: 0010 0c093 ffffffff 0000000000000000
  (XEN)   ES: 0010 0c093 ffffffff 0000000000000000
  (XEN)   FS: 0000 00093 0000ffff 0000000000000000
  (XEN)   GS: 0000 00093 0000ffff 0000000000000000
  (XEN) GDTR:            00000017 0000000000102eb8
  (XEN) LDTR: 0000 00082 0000ffff 0000000000000000
  (XEN) IDTR:            0000ffff 0000000000000000
  (XEN)   TR: 0000 0008b 0000ffff 0000000000000000
  (XEN) EFER = 0x0000000000000000  PAT = 0x0007040600070406

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agodocs: correct pod syntax
Ian Campbell [Thu, 21 May 2015 16:31:39 +0000 (17:31 +0100)]
docs: correct pod syntax

Olaf reports:
[  146s] man/xl.pod.1 around line 1529: '=item' outside of any '=over'
[  146s] man/xl.pod.1 around line 1531: You forgot a '=back' before '=head1'
[  146s] POD document had syntax errors at /usr/bin/pod2text line 84.
[  146s] Makefile:167: recipe for target 'txt/man/xl.1.txt' failed
[  146s] make[1]: *** [txt/man/xl.1.txt] Error 255

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoxen: Move preinit_xen_time in ARM headers
Julien Grall [Thu, 21 May 2015 14:31:21 +0000 (15:31 +0100)]
xen: Move preinit_xen_time in ARM headers

This function is ARM specific. It was added to the common code by
mistake.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agotools: add configure --with-dumpdir=DIR option
Olaf Hering [Mon, 11 May 2015 15:33:51 +0000 (15:33 +0000)]
tools: add configure --with-dumpdir=DIR option

The current base directory /var/xen/dump for domU dumps will be patched
to /var/lib/xen/dump by most distros to follow FHS.

This change does three things:
 - change the default from /var/xen/dump to /var/lib/xen/dump
 - provide a configure option to avoid patching the source.
 - update docs to refer to the new default location

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- s,/var,LOCALSTATEDIR, in help test, ran autogen.sh ]

9 years agolibxl: assign a default ssidref (XSM label) to guests
Ian Campbell [Wed, 20 May 2015 14:39:00 +0000 (15:39 +0100)]
libxl: assign a default ssidref (XSM label) to guests

We have now arranged for SECINITSID_DOMU and SECINITSID_DOMDM to be
defined (corresponding to system_u:system_r:domU_t and
system_u:system_r:dm_dom_t respectively in the default policy). Use
these as the default for the SSID of every (stub)domain.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Wei.Liu2@citrix.com
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agoflask/policy: add initial SIDs for domU/domDM
Daniel De Graaf [Wed, 20 May 2015 14:38:59 +0000 (15:38 +0100)]
flask/policy: add initial SIDs for domU/domDM

Add default security contexts to the XSM policy for use by the toolstack
when a domain is created without specifying an explicit security label.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agotools: Expose XSM Flask initial SIDs list to tools
Ian Campbell [Wed, 20 May 2015 14:38:58 +0000 (15:38 +0100)]
tools: Expose XSM Flask initial SIDs list to tools

By generating tools/include/xen-xsm/flask/flask.h using the same tool
as used during the hypervisor build.

Note that this is done regardless of whether XSM is enabled, since we
want the tools to be agnostic to whether or not XSM is enabled in the
hypervisor.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>