xenbits.xensource.com Git - people/liuw/libxenctrl-split/xen.git/log

Jan Beulich [Thu, 18 Jun 2015 13:07:10 +0000 (15:07 +0200)]
x86: synchronize PCI config space access decoding

Both PV and HVM logic have similar but not similar enough code here.
Synchronize the two so that
- in the HVM case we don't unconditionally try to access extended
  config space
- in the PV case we pass a correct range to the XSM hook
- in the PV case we don't needlessly deny access when the operation
  isn't really on PCI config space
All this along with sharing the macros HVM already had here.
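For readers unfamiliar with the decoding being synchronized: a port 0xCF8 address (PCI configuration mechanism #1) packs an enable bit, the bus/device/function, and a dword-aligned register offset into one 32-bit value. AMD processors can additionally use bits 27:24 as register bits 11:8 to reach extended config space, but only when the platform has that enabled. Below is a minimal standalone sketch of that decoding, assuming the conventional bit layout. The macro names imitate a CF8_* style but are written here for illustration, and decode_cf8() is hypothetical, not Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers modelled on PCI configuration mechanism #1. */
#define CF8_ENABLED(cf8)  (((cf8) & 0x80000000u) != 0)   /* bit 31 */
#define CF8_BDF(cf8)      (((cf8) & 0x00ffff00u) >> 8)   /* bus:dev.fn */
#define CF8_ADDR_LO(cf8)  ((cf8) & 0x000000fcu)          /* reg bits 7:2 */
#define CF8_ADDR_HI(cf8)  (((cf8) & 0x0f000000u) >> 16)  /* reg bits 11:8 (AMD) */

/*
 * Decode the register targeted by an access through ports 0xCFC-0xCFF.
 * 'port_off' is the byte offset within that window. Extended (>= 256)
 * offsets are only formed when extended decoding is enabled, rather
 * than unconditionally.
 */
static bool decode_cf8(uint32_t cf8, unsigned int port_off, bool ext_enabled,
                       unsigned int *bdf, unsigned int *reg)
{
    if ( !CF8_ENABLED(cf8) )
        return false;  /* not a config space access at all */

    *bdf = CF8_BDF(cf8);
    *reg = CF8_ADDR_LO(cf8) | (port_off & 3);
    if ( ext_enabled )
        *reg |= CF8_ADDR_HI(cf8);
    return true;
}
```

With the enable bit clear, the access is simply not treated as a config space access, which matches the idea of not denying operations that aren't really on config space.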

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 12:55:18 +0000 (14:55 +0200)]
slightly simplify SCHEDOP_remote_shutdown handling

There's no need for two exit paths each using rcu_unlock_domain() on
its own here.
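The simplification follows the usual single-exit pattern: instead of releasing the domain reference on two separate return paths, both paths fall through to one unlock. In this self-contained sketch, `struct domain`, `lock_domain()`, `unlock_domain()` and `remote_shutdown()` are hypothetical stand-ins (not Xen's rcu_lock/rcu_unlock_domain()) that just count references so the pairing can be checked.

```c
#include <errno.h>

/* Hypothetical stand-ins: a domain with a reference count that the
 * lock/unlock pair below increments and decrements. */
struct domain {
    int id;
    int refs;
    unsigned int shutdown_reason;
};

static void lock_domain(struct domain *d)   { d->refs++; }
static void unlock_domain(struct domain *d) { d->refs--; }

/* Error and success paths both reach the single unlock at the end. */
static int remote_shutdown(struct domain *d, int caller_id, unsigned int reason)
{
    int ret = 0;

    lock_domain(d);

    if ( d->id == caller_id )
        ret = -EPERM;                   /* illustrative permission check */
    else
        d->shutdown_reason = reason;    /* stand-in for the real work */

    unlock_domain(d);                   /* one exit path, one unlock */
    return ret;
}
```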

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
David Vrabel [Thu, 18 Jun 2015 12:54:25 +0000 (14:54 +0200)]
evtchn: remove the locking when unmasking an event channel

The event channel lock is no longer required to check if the port is
valid.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
David Vrabel [Thu, 18 Jun 2015 12:53:23 +0000 (14:53 +0200)]
evtchn: simplify port_is_valid()

By keeping a count of the number of currently valid event channels,
port_is_valid() can be simplified.

d->valid_evtchns is only increased (while holding d->event_lock), so
port_is_valid() may be safely called without taking the lock (this
will be useful later).
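The reasoning can be sketched as follows: if ports are handed out densely from zero and the count only ever grows, a port is valid exactly when it is below the count, so readers need no lock. This sketch uses C11 atomics to publish the counter. The structure and function names are illustrative, Xen uses its own primitives, and the writer side is assumed to be serialized by the event lock, which is not modelled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative per-domain state: only the field relevant here. */
struct domain {
    atomic_uint valid_evtchns;  /* only ever increases */
};

/* Called with the (not modelled) event lock held, so writers are
 * serialized and a plain load + store is enough to bump the count. */
static void expand_evtchns(struct domain *d, unsigned int extra)
{
    unsigned int n = atomic_load_explicit(&d->valid_evtchns,
                                          memory_order_relaxed);

    /* Release: setup of the new ports is visible before the new count. */
    atomic_store_explicit(&d->valid_evtchns, n + extra, memory_order_release);
}

/* Lock-free: a monotonically growing count can never invalidate a port. */
static bool port_is_valid(struct domain *d, unsigned int port)
{
    return port < atomic_load_explicit(&d->valid_evtchns, memory_order_acquire);
}
```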

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Thu, 18 Jun 2015 12:52:32 +0000 (14:52 +0200)]
pvusb: don't rely on linux kernel macros for the interface

The interface description of pvUSB lacks some access macros, as using
Linux kernel macros is assumed to work well. This solution is rather
unfriendly for pvUSB implementations outside the Linux kernel.
Additionally, things will break quite unpleasantly in case the Linux
kernel implementation is changed.

To avoid these problems, define our own macros for accessing bitfields
of the interface and for values of several structure members.

While working on the file, add some more comments, especially for the
xenstore interface.

Signed-off-by: Juergen Gross <jgross@suse.com>
Wei Liu [Wed, 17 Jun 2015 19:39:49 +0000 (20:39 +0100)]
oxenstored: fix del_watches and del_transactions

The statement to reset nb_watches should be in del_watches, not
del_transactions.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Scott <dave.scott@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
[ ijc -- fix syntax error by adding a ";" to the previous line in the
         new location and removing it from the previous line in the old one ]

Wei Liu [Wed, 17 Jun 2015 11:08:38 +0000 (12:08 +0100)]
libxl: refactor toolstack save restore code

This patch does the following things:
1. Document v1 format.
2. Factor out function to handle QEMU restore data and function to
   handle v1 blob for restore path.
3. Refactor save function to generate different blobs in the order
   specified in format specification.
4. Change functions to use "goto out" idiom.

No functional changes introduced.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 17 Jun 2015 12:37:00 +0000 (13:37 +0100)]
xen: arm: Do not expose PMU to domain 0

It uses a PPI which we cannot route to a guest, and will surely need
more support than just that anyway.

I noticed this on Mustang with UEFI where the built-in DTB contains a
node of this type.

According to linux/Documentation/devicetree/bindings/arm/pmu.txt the
ARM v7 (Cortex-A{7,15}) PMUs require a PPI too, so blacklist them as
well.
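Conceptually, the change amounts to skipping device-tree nodes whose `compatible` string matches a PMU binding when building dom0's device tree. Here is a minimal sketch of such a match, assuming a plain string table. The three compatible strings come from the Linux PMU binding, while `dt_is_blacklisted()` is a hypothetical helper, not Xen's device-tree matching API. A real `compatible` property is a list of strings, so this sketch checks only one entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* PMU bindings whose interrupt is a PPI (cannot be routed to a guest). */
static const char *const pmu_blacklist[] = {
    "arm,armv8-pmuv3",
    "arm,cortex-a15-pmu",
    "arm,cortex-a7-pmu",
};

/* Hypothetical helper: true if this compatible string is blacklisted. */
static bool dt_is_blacklisted(const char *compatible)
{
    for ( size_t i = 0; i < sizeof(pmu_blacklist) / sizeof(pmu_blacklist[0]); i++ )
        if ( strcmp(compatible, pmu_blacklist[i]) == 0 )
            return true;
    return false;
}
```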

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Roger Pau Monne [Thu, 11 Jun 2015 16:05:20 +0000 (18:05 +0200)]
libxc: fix xc_dom_load_elf_symtab

xc_dom_load_elf_symtab was incorrectly trying to perform the same
calculations already done in elf_parse_bsdsyms when load == 0 is used.
Instead of trying to repeat the calculations, just trust what
elf_parse_bsdsyms has already accounted for.

This also simplifies the code by allowing the non-load case to return
earlier.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Mon, 15 Jun 2015 10:12:07 +0000 (11:12 +0100)]
tools/libxc: Batch memory allocations for PV guests

The current code for allocating memory for PV guests batches the
hypercalls to allocate memory by allocating 1024*1024 extents of order 0
at a time. To make this faster, first try allocating extents of order 9
(2 MiB) before falling back to order 0 allocation if the order 9
allocation fails.

On my test machine this reduced the time to start a 128 GiB PV guest by
about 60 seconds.
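The strategy can be sketched as two passes: request as many 2 MiB (order 9, i.e. 512 x 4 KiB pages) extents as fit, then use order-0 pages for whatever remains, including any superpage requests that failed. In this sketch `populate()` is a hypothetical stand-in for the populate-physmap hypercall wrapper, simulated by a budget of free 2 MiB chunks. It is not libxc's actual function.

```c
#define SUPERPAGE_ORDER 9                       /* 2^9 x 4 KiB = 2 MiB */
#define SUPERPAGE_PAGES (1UL << SUPERPAGE_ORDER)

/* Simulated hypervisor state: 2 MiB chunks still available. */
static unsigned long free_superpages;

/* Hypothetical stand-in for a populate-physmap call: try to allocate
 * 'nr' extents of the given order, return how many succeeded. */
static unsigned long populate(unsigned long nr, unsigned int order)
{
    if ( order == 0 )
        return nr;                   /* assume 4 KiB pages never run out */

    unsigned long done = nr < free_superpages ? nr : free_superpages;
    free_superpages -= done;
    return done;
}

/* Allocate 'nr_pages' 4 KiB pages, preferring 2 MiB extents. */
static void alloc_guest_pages(unsigned long nr_pages,
                              unsigned long *super, unsigned long *small)
{
    /* First pass: as many order-9 extents as fit and are granted. */
    *super = populate(nr_pages / SUPERPAGE_PAGES, SUPERPAGE_ORDER);

    /* Fallback: order-0 pages for the remainder. */
    *small = nr_pages - *super * SUPERPAGE_PAGES;
    if ( *small )
        populate(*small, 0);
}
```

Each successful order-9 extent replaces 512 order-0 extents, which is where the reduction in start-up work comes from.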

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Tue, 9 Jun 2015 10:08:14 +0000 (11:08 +0100)]
oxenstored: implement XS_RESET_WATCHES

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
Wei Liu [Thu, 4 Jun 2015 10:23:01 +0000 (11:23 +0100)]
libxc: unify handling of vNUMA layout

This patch does the following:
1. Use local variables for dummy vNUMA layout in PV case.
2. Avoid leaking dummy layout back to caller in PV case.
3. Use local variables to reference vNUMA layout (whether it is dummy
   or provided by caller) for both PV and HVM.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Jun 2015 10:44:50 +0000 (11:44 +0100)]
libxl: clean up qemu-save and qemu-resume files

These files are leaked when using qemu-trad stubdom. They are
intermediate files created by libxc. Unfortunately they don't fit well
in our userdata scheme. Clean them up after we destroy all userdata;
at that point we're sure they are no longer useful.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Tue, 16 Jun 2015 11:42:27 +0000 (12:42 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

Olaf Hering [Thu, 11 Jun 2015 16:30:44 +0000 (16:30 +0000)]
xenalyze: remove argp_program_version

Since xenalyze is now upstream, it is Open Source and part of the given
release.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:43 +0000 (16:30 +0000)]
xenalyze: remove trailing whitespaces

Result of "sed -i 's@[[:blank:]]\+$@@' tools/xentrace/xenalyze.c"

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:41 +0000 (16:30 +0000)]
xenalyze: handle TRC_TRACE_WRAP_BUFFER

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:40 +0000 (16:30 +0000)]
xenalyze: include odd mmio states in default output

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:39 +0000 (16:30 +0000)]
xenalyze: print newline after unknown hvm events

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:38 +0000 (16:30 +0000)]
xenalyze: add to tools/xentrace/

This merges xenalyze.hg, changeset 150:24308507be1d,
into tools/xentrace/xenalyze.c to have the tool and
public/trace.h in one place.

Adjust code to use public/trace.h instead of private trace.h

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- wrap $(BIN) install in a check in case it is empty (which it
         is on !x86); avoid BIN += since it results in BIN = ' ' on
         !x86 ]

David Vrabel [Tue, 16 Jun 2015 10:30:16 +0000 (12:30 +0200)]
evtchn: factor out freeing an event channel

We're going to want to free an event channel from two places.  Factor out
the code into a free_evtchn() function.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Don Slutz [Tue, 16 Jun 2015 10:29:59 +0000 (12:29 +0200)]
hvmloader: properly parenthesize pci_write* macros

Signed-off-by: Don Slutz <dslutz@verizon.com>
Jan Beulich [Tue, 16 Jun 2015 10:29:18 +0000 (12:29 +0200)]
gnttab: make struct grant_mapping private

This documents that no entity outside of gnttab.c actually accesses
objects of that type, which is particularly important with the now more
fine-grained locking in place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:28:11 +0000 (12:28 +0200)]
gnttab: fix/adjust gnttab_transfer()

- don't update shared entry's frame number for translated domains (as
  MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
  intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit
  5cc77f9098 ("32-on-64: Fix domain address-size clamping, implement")
  uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
  invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:26:35 +0000 (12:26 +0200)]
gnttab: simplify page copying/clearing

... by making {copy,clear}_domain_page() available on architectures
other than x86 too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:26:03 +0000 (12:26 +0200)]
gnttab: simplify shared entry v1 vs v2 handling

In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:25:35 +0000 (12:25 +0200)]
gnttab: limit mapcount() looping

The function doesn't need to return counts in the first place; all its
callers care about is whether at least one entry of a certain kind
exists. Hence there's no point in continuing the loop once one entry
has been found to meet the condition being looked for. Rename the
function to match the changed behavior.
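The change is the classic move from counting matches to an existence test that stops at the first hit. Below is a sketch over a hypothetical array of mapping records. `struct map_entry` and `map_exists()` are illustrative names, not the ones the patch introduces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative mapping record: which frame it refers to and how. */
struct map_entry {
    unsigned long frame;
    bool writable;
};

/* Return true as soon as one entry maps 'frame' with the wanted kind,
 * instead of counting every match. */
static bool map_exists(const struct map_entry *maps, size_t n,
                       unsigned long frame, bool want_writable)
{
    for ( size_t i = 0; i < n; i++ )
        if ( maps[i].frame == frame && maps[i].writable == want_writable )
            return true;   /* no need to look any further */
    return false;
}
```

In the worst case (no match) the loop still scans everything, but any hit now ends it early.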

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:24:49 +0000 (12:24 +0200)]
gnttab: eliminate several explicit version checks

By having nr_grant_entries() return zero when the grant table version
is still unset, we can reduce the number of error paths and at the same
time fix grant_map_exists() running into the ASSERT() being removed here
when called for a page owned by a domain not having its grant table set
up yet.
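The idea is that a zero entry count makes every bounds check fail on its own, so callers no longer need a separate "version not yet set" branch. This is a minimal sketch. The structure layout is an assumption for illustration, as is the sizing (4 KiB frames holding 8-byte v1 or 16-byte v2 shared entries). `ref_is_valid()` is hypothetical.

```c
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* Illustrative grant table: version 0 means "not set up yet". */
struct grant_table {
    unsigned int gt_version;   /* 0, 1 or 2 */
    unsigned int nr_frames;
};

static unsigned int nr_grant_entries(const struct grant_table *gt)
{
    switch ( gt->gt_version )
    {
    case 1:  return gt->nr_frames * (PAGE_SIZE / 8);   /* 8-byte v1 entries */
    case 2:  return gt->nr_frames * (PAGE_SIZE / 16);  /* 16-byte v2 entries */
    default: return 0;  /* unset version: no entries, every ref invalid */
    }
}

/* A single bounds check now also covers the "version unset" case. */
static bool ref_is_valid(const struct grant_table *gt, unsigned int ref)
{
    return ref < nr_grant_entries(gt);
}
```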

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Mon, 15 Jun 2015 11:27:53 +0000 (13:27 +0200)]
x86/MSI: partly restore commit 73cb5d43a8 (build fix)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Mon, 15 Jun 2015 11:25:20 +0000 (13:25 +0200)]
gnttab: make the grant table lock a read-write lock

In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since in the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).

The write lock is also required when two active entries must be
acquired.

The double lock is still required when updating IOMMU page tables.

With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Mon, 15 Jun 2015 11:23:34 +0000 (13:23 +0200)]
gnttab: introduce maptrack lock

Split the grant table lock into two separate locks: one to protect the
maptrack free list (maptrack_lock) and one for everything else (lock).

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Mon, 15 Jun 2015 11:22:07 +0000 (13:22 +0200)]
gnttab: per-active entry locking

Introduce a per-active entry spin lock to protect active entry state.
The grant table lock must be locked before acquiring (locking) an
active entry.

This is a step in reducing contention on the grant table lock, but
will only do so once the grant table lock is turned into a read-write
lock.

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 15 Jun 2015 09:32:28 +0000 (11:32 +0200)]
Revert "x86/MSI-X: use qword MMIO access for address writes"

This reverts commit 73cb5d43a8f48930e4594ef7b15b974487651ffe,
which appears to break with certain Tigon3 NICs.

Ian Jackson [Thu, 11 Jun 2015 16:56:15 +0000 (17:56 +0100)]
libxl: libxl_internal.h: Clarify ao rule against internal callers

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Ross Lagerwall [Fri, 12 Jun 2015 10:07:05 +0000 (12:07 +0200)]
x86: avoid tripping watchdog when constructing dom0

Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.
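The pattern is to break a long, non-preemptible loop into chunks and service pending work between them. In this sketch `process_pending_work()` is a hypothetical stand-in for Xen's process_pending_softirqs(), here just counting calls. The interval of 1024 iterations is an arbitrary illustrative choice, not the value the patch uses.

```c
/* Hypothetical stand-in for process_pending_softirqs(): counts calls. */
static unsigned long softirq_checks;
static void process_pending_work(void) { softirq_checks++; }

#define WORK_INTERVAL 1024ul   /* illustrative, not Xen's value */

/* Long-running construction loop, e.g. populating page tables. */
static void build_mappings(unsigned long nr_pages)
{
    for ( unsigned long i = 0; i < nr_pages; i++ )
    {
        /* ... set up the mapping for page i ... */
        if ( !(i % WORK_INTERVAL) )
            process_pending_work();   /* let pending softirqs (e.g. timers) run */
    }
}
```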

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at the same time clean up other includes).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Fri, 12 Jun 2015 10:06:24 +0000 (12:06 +0200)]
cpupool: fix shutdown with cpupools with different schedulers

Trying to shut down the host when a cpupool exists, has
pCPUs, and has a scheduler different from Xen's default
one produces this:

 root@Zhaman:~# xl cpupool-cpu-remove Pool-0 8
 root@Zhaman:~# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
 Using config file "command line"
 cpupool name:   Pool-1
 scheduler:      credit2
 number of cpus: 0
 root@Zhaman:~# xl cpupool-cpu-add Pool-1 8
 root@Zhaman:~# shutdown -h now

 (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
 (XEN) CPU:    0
 (XEN) RIP:    e008:[<ffff82d080133bdf>] kill_timer+0x56/0x298
 (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
  ... ... ...
 (XEN) Xen call trace:
 (XEN)    [<ffff82d080133bdf>] kill_timer+0x56/0x298
 (XEN)    [<ffff82d08012233f>] csched_free_pdata+0x9b/0xcf
 (XEN)    [<ffff82d08012c30c>] cpu_schedule_callback+0x64/0x8b
 (XEN)    [<ffff82d08011bc7a>] notifier_call_chain+0x67/0x87
 (XEN)    [<ffff82d08010153e>] cpu_down+0xd9/0x12c
 (XEN)    [<ffff82d080101744>] disable_nonboot_cpus+0x93/0x138
 (XEN)    [<ffff82d0801aa6e7>] enter_state_helper+0xbd/0x365
 (XEN)    [<ffff82d0801061e5>] continue_hypercall_tasklet_handler+0x4a/0xb1
 (XEN)    [<ffff82d080132387>] do_tasklet_work+0x78/0xab
 (XEN)    [<ffff82d0801326bd>] do_tasklet+0x5e/0x8a
 (XEN)    [<ffff82d0801646d2>] idle_loop+0x56/0x6b
  ... ... ...
 (XEN) ****************************************
 (XEN) Panic on CPU 0:
 (XEN) FATAL PAGE FAULT
 (XEN) [error_code=0000]
 (XEN) Faulting linear address: 0000000000000041
 (XEN) ****************************************

The fix is, when tearing down a pCPU, to call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
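The gist of the fix, sketched with hypothetical types: each pCPU records the scheduler of the pool it belongs to, and teardown dispatches through that scheduler's hook instead of the default one. `struct scheduler`, `per_cpu_sched[]` and the two hook bodies are illustrative, not Xen's definitions. Falling back to the default when no pool scheduler is recorded is an assumption of this sketch.

```c
#include <stddef.h>

#define NR_CPUS 16

struct scheduler {
    const char *name;
    void (*free_pdata)(unsigned int cpu);
};

static unsigned int last_freed_by;  /* records which hook ran */

static void credit_free_pdata(unsigned int cpu)  { (void)cpu; last_freed_by = 1; }
static void credit2_free_pdata(unsigned int cpu) { (void)cpu; last_freed_by = 2; }

static struct scheduler sched_credit  = { "credit",  credit_free_pdata };
static struct scheduler sched_credit2 = { "credit2", credit2_free_pdata };

/* Default scheduler, and the scheduler of the cpupool each pCPU is in. */
static struct scheduler *default_sched = &sched_credit;
static struct scheduler *per_cpu_sched[NR_CPUS];

static void cpu_teardown(unsigned int cpu)
{
    /* Use the pool's scheduler; only fall back to the default if the
     * CPU has no pool scheduler recorded. */
    struct scheduler *s = per_cpu_sched[cpu] ? per_cpu_sched[cpu] : default_sched;

    s->free_pdata(cpu);
}
```

Calling credit's hook on data allocated by credit2 (the old behaviour) is what produced the bogus kill_timer() access in the trace above.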

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Roger Pau Monné [Fri, 12 Jun 2015 10:05:54 +0000 (12:05 +0200)]
libelf: fix elf_parse_bsdsyms call

elf_parse_bsdsyms expects the second parameter to be a physical address, not
a virtual one.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 12 Jun 2015 10:04:26 +0000 (12:04 +0200)]
x86/context-switch: prefer is_..._domain() over is_..._vcpu()

Latch both domains alongside both vCPUs into local variables, making
use of them where possible also beyond what the title says.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 12 Jun 2015 10:03:56 +0000 (12:03 +0200)]
x86/HVM: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available or such operations
occur frequently in a function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 12 Jun 2015 10:03:13 +0000 (12:03 +0200)]
x86: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available or such operations
occur frequently in a function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 12 Jun 2015 10:02:12 +0000 (12:02 +0200)]
domctl: prefer is_..._domain() over is_..._vcpu()

... when the domain pointer is already available.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 12:47:54 +0000 (14:47 +0200)]
EFI: map allocation size must be set to zero

Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.

Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 11 Jun 2015 12:44:47 +0000 (14:44 +0200)]
x86/traps: loop in the correct direction in compat_iret()

This is CVE-2015-4164 / XSA-136.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 11 Jun 2015 12:44:12 +0000 (14:44 +0200)]
gnttab: add missing version check to GNTTABOP_swap_grant_ref handling

... avoiding NULL derefs when the version to use wasn't set yet (via
GNTTABOP_setup_table or GNTTABOP_set_version).
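The class of bug is a missing guard before dereferencing state that only exists once a version has been chosen. The sketch below uses a hypothetical table whose shared-entry array stays NULL until a version is set. `struct gtab`, `swap_refs()` and the choice of -EINVAL are illustrative, not the patch's code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative table: 'shared' is NULL until a version has been set. */
struct gtab {
    unsigned int version;     /* 0 = not set up yet */
    unsigned int nr_entries;
    uint64_t *shared;
};

static int swap_refs(struct gtab *gt, unsigned int a, unsigned int b)
{
    uint64_t tmp;

    if ( gt->version == 0 )               /* the previously missing check */
        return -EINVAL;
    if ( a >= gt->nr_entries || b >= gt->nr_entries )
        return -EINVAL;

    tmp = gt->shared[a];
    gt->shared[a] = gt->shared[b];
    gt->shared[b] = tmp;
    return 0;
}
```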

This is CVE-2015-4163 / XSA-134.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 09:58:29 +0000 (11:58 +0200)]
EFI/ARM: don't treat EfiBootServices{Code,Data} as normal RAM under /mapbs

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 09:55:05 +0000 (11:55 +0200)]
VT-d: extend quirks to newer desktop chipsets

We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Jan Beulich [Thu, 11 Jun 2015 09:54:10 +0000 (11:54 +0200)]
VT-d: use qword MMIO access for MSI address writes

Also make dmar_{read,write}q() actually do what their names suggest (we
don't need to be concerned about 32-bit restrictions anymore).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 11 Jun 2015 09:53:20 +0000 (11:53 +0200)]
x86/MSI-X: use qword MMIO access for address writes

Now that we support it for our guests, let's do so ourselves too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 09:52:18 +0000 (11:52 +0200)]
x86/vMSI-X: support qword MMIO access

The specification explicitly provides for this, so we should have
supported this from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
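For context, a type-checking min() typically compares the addresses of two temporaries, which makes the compiler diagnose mismatched argument types. On 32-bit builds uint64_t and unsigned long are distinct types, so such a min() trips the check. The usual remedy is an explicit-type variant such as min_t(). The macros below sketch that idiom using GCC/Clang extensions (statement expressions and __typeof__). They are not copied from the Xen headers, and `clamp_to_buffer()` is a hypothetical caller.

```c
#include <stdint.h>

/* Type-checking min(): '&_x == &_y' is diagnosed if x and y differ in type. */
#define min(x, y) ({                    \
    __typeof__(x) _x = (x);             \
    __typeof__(y) _y = (y);             \
    (void)(&_x == &_y);                 \
    _x < _y ? _x : _y; })

/* Explicit-type variant: both arguments are converted to 'type' first. */
#define min_t(type, x, y) ({            \
    type _x = (x);                      \
    type _y = (y);                      \
    _x < _y ? _x : _y; })

static uint64_t clamp_to_buffer(uint64_t wanted, unsigned long avail)
{
    /* min(wanted, avail) would mix uint64_t and unsigned long, which are
     * different types on 32-bit targets; min_t() sidesteps the check. */
    return min_t(uint64_t, wanted, avail);
}
```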

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
David Vrabel [Wed, 10 Jun 2015 10:06:02 +0000 (12:06 +0200)]
evtchn: profile event channel lock

The per-domain event channel lock may suffer from contention.  Add it to
the set of locks to be profiled when lock profiling is enabled.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Wed, 10 Jun 2015 10:05:21 +0000 (12:05 +0200)]
x86/EFI: adjust EFI_MEMORY_WP handling for spec version 2.5

That flag now means cacheability rather than protection, and a new flag
EFI_MEMORY_RO got added in its place.

Along with EFI_MEMORY_RO also add the two other new EFI_MEMORY_*
definitions, even if we don't need them right away.
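For reference, here are the attribute bits involved, with values as given in the UEFI specification. EFI_MEMORY_WP (0x1000) describes a write-protected cacheability setting as of version 2.5. The new EFI_MEMORY_RO (0x20000) carries the read-only meaning. The sketch derives mapping protections from an attribute word. `map_readonly()` and `map_noexec()` are hypothetical helpers, not Xen's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_MEMORY_WP       0x0000000000001000ULL /* 2.5+: cacheability */
#define EFI_MEMORY_XP       0x0000000000004000ULL /* execute-protect */
#define EFI_MEMORY_RO       0x0000000000020000ULL /* 2.5+: read-only */
#define EFI_MEMORY_RUNTIME  0x8000000000000000ULL

/* Under 2.5 semantics only RO implies mapping without write access;
 * WP no longer does. */
static bool map_readonly(uint64_t attr)
{
    return (attr & EFI_MEMORY_RO) != 0;
}

static bool map_noexec(uint64_t attr)
{
    return (attr & EFI_MEMORY_XP) != 0;
}
```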

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 10 Jun 2015 10:04:07 +0000 (12:04 +0200)]
EFI: support default attributes to map Runtime service areas with none given

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

For example on Dell machines we see:

(XEN)  00000fed18000-00000fed19fff type=11 attr=8000000000000000
(XEN) Unknown cachability for MFNs 0xfed18-0xfed19

Let's allow them to be mapped as UC.
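In the dump above, attr=8000000000000000 means only EFI_MEMORY_RUNTIME is set, with no cacheability bit at all. The sketch below picks a cache type from the UEFI cacheability bits and defaults to UC when none is present. The bit values follow the UEFI specification. The enum and the preference order are assumptions made for illustration, not the exact logic of the patch.

```c
#include <stdint.h>

/* Cacheability bits, values as in the UEFI specification. */
#define EFI_MEMORY_UC       0x01ULL
#define EFI_MEMORY_WC       0x02ULL
#define EFI_MEMORY_WT       0x04ULL
#define EFI_MEMORY_WB       0x08ULL
#define EFI_MEMORY_RUNTIME  0x8000000000000000ULL

enum cache_type { CT_UC, CT_WC, CT_WT, CT_WB };

/* Prefer the most performant type the firmware allows; fall back to
 * uncachable when no cacheability attribute is given at all. */
static enum cache_type pick_cache_type(uint64_t attr)
{
    if ( attr & EFI_MEMORY_WB )
        return CT_WB;
    if ( attr & EFI_MEMORY_WT )
        return CT_WT;
    if ( attr & EFI_MEMORY_WC )
        return CT_WC;
    return CT_UC;   /* covers both explicit UC and "none given" */
}
```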

We also change the 'efi-rs' command line option to 'efi=rs' or 'efi=no-rs'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Konrad Rzeszutek Wilk [Wed, 10 Jun 2015 10:02:43 +0000 (12:02 +0200)]
EFI/early: add /mapbs to map EfiBootServices{Code,Data}

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

To help certain platforms run.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 10 Jun 2015 10:01:35 +0000 (12:01 +0200)]
x86/EFI: fix EFI_MEMORY_WP handling

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Wed, 10 Jun 2015 09:57:18 +0000 (11:57 +0200)]
efi: avoid calling boot services after ExitBootServices()

After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices(), and so they are called via
SystemTable->BootServices.
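The mechanism can be sketched with a hypothetical boot-services table. After the first ExitBootServices() call, the cached pointer is cleared so that general boot services can no longer be reached through it. The two permitted services are still called through the system table. All type and function names below are illustrative stand-ins, not the EFI or Xen definitions. The real blexit() halts where this sketch's `cleanup()` merely returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of a boot services table. */
struct boot_services {
    int (*get_memory_map)(void);
    int (*exit_boot_services)(void);
    int (*free_pool)(void *p);
};

struct system_table { struct boot_services *boot_services; };

static int fake_ok(void) { return 0; }
static int fake_free(void *p) { (void)p; return 0; }

static struct boot_services bs_impl = { fake_ok, fake_ok, fake_free };
static struct system_table st = { &bs_impl };

/* Cached pointer used for general boot-service calls. */
static struct boot_services *efi_bs = &bs_impl;

static int exit_boot(void)
{
    efi_bs = NULL;  /* from now on, no general boot services */
    /* Only GetMemoryMap()/ExitBootServices() remain legal: go via st. */
    st.boot_services->get_memory_map();
    return st.boot_services->exit_boot_services();
}

/* Cleanup path: only release resources while boot services are usable. */
static bool cleanup(void *pool)
{
    if ( !efi_bs )
        return false;       /* the real code halts here instead */
    efi_bs->free_pool(pool);
    return true;
}
```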

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 9 Jun 2015 15:21:43 +0000 (16:21 +0100)]
QEMU_TAG update

Jan Beulich [Tue, 9 Jun 2015 14:00:24 +0000 (16:00 +0200)]
kexec: add more pages to v1 environment

Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).

Further, since the tools don't map the low 1 MiB (expected by at least
some Linux versions), we need to do so in the hypervisor in the v1 case.

Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:59:31 +0000 (15:59 +0200)]
x86: adjust PV I/O emulation functions' types

None of admin_io_okay(), guest_io_read(), and guest_io_write() need
their current "regs" parameter, and they use the vCPU passed to them
only to obtain its domain. Drop the former and
replace the latter by a struct domain pointer.

pci_cfg_okay() returns a boolean type, and its "write" parameter is of
boolean kind too.

All of them get called for the current vCPU (and hence current domain)
only, so name the domain parameters accordingly except in the
admin_io_okay() case, which a subsequent patch will use for simplifying
setup_io_bitmap().

Latch current->domain into a local variable in emulate_privileged_op().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:57:26 +0000 (15:57 +0200)]
rangeset: "has" and "is" functions return boolean

Additionally rangeset_is_empty()'s sole parameter can be const.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:56:03 +0000 (15:56 +0200)]
make do_sched_op_compat() x86-specific

Being a pre-3.1 compatibility hypercall handler only, it's not needed
on ARM or any future architectures Xen may get ported to.

Also the function shouldn't really be used internally - its use should
be limited to its purpose (and hence there's also no need for a
prototype).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@cirix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:54:53 +0000 (15:54 +0200)]
arinc653: don't leak hypervisor stack contents through XEN_SYSCTL_SCHEDOP_getinfo

Note that due to XSA-77 this is not a security issue.
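
The usual remedy for this class of leak is to clear an on-stack structure before filling in only some of its fields and copying it out. A minimal sketch of that general technique (not necessarily the exact change made here), assuming a hypothetical schedule layout and using memcpy() as a stand-in for copy_to_guest():

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: unused entries would otherwise carry stale stack data. */
struct sched_info {
    unsigned int num_entries;
    struct { unsigned char vcpu_id; unsigned long runtime; } entries[4];
};

/* memcpy() stands in for copy_to_guest(); the "guest" buffer is plain memory. */
static void fill_and_copy(struct sched_info *guest_buf, unsigned int n)
{
    struct sched_info local;

    memset(&local, 0, sizeof(local));   /* clear everything first */
    local.num_entries = n;
    for ( unsigned int i = 0; i < n && i < 4; i++ )
    {
        local.entries[i].vcpu_id = (unsigned char)i;
        local.entries[i].runtime = 1000UL * (i + 1);
    }
    memcpy(guest_buf, &local, sizeof(local));
}
```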

Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Robert VanVossen <robert.vanvossen@dornerworks.com>

9 years agoarm: use existing __section() macro instead of opencoding it
Andrew Cooper [Mon, 8 Jun 2015 13:38:39 +0000 (15:38 +0200)]
arm: use existing __section() macro instead of opencoding it

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/mm: print domain IDs instead of pointers
Jan Beulich [Mon, 8 Jun 2015 12:41:25 +0000 (14:41 +0200)]
x86/mm: print domain IDs instead of pointers

Printing pointers to struct domain isn't really useful for initial
problem analysis. In get_page(), also drop the page only after issuing
the log message, so that the printed state can be considered
reasonably consistent.
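
A minimal sketch of both points, assuming simplified stand-ins for struct domain and struct page_info; the message text and the log_then_put() helper are hypothetical. Domains are identified as "d%d" rather than by pointer, and the reference is dropped only after the message has been formatted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins (assumption): only the fields the sketch needs. */
struct domain    { int domain_id; };
struct page_info { struct domain *owner; unsigned long count; };

static void put_page(struct page_info *pg) { pg->count--; }

/* Log by domain ID first, then drop the reference. */
static int log_then_put(char *buf, size_t len, struct page_info *pg,
                        const struct domain *d)
{
    int n = snprintf(buf, len, "get_page failed: d%d, owner d%d, count %lu",
                     d->domain_id, pg->owner->domain_id, pg->count);

    put_page(pg);
    return n;
}
```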

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/VPMU: add lost Intel processor
Alan Robinson [Mon, 8 Jun 2015 12:17:06 +0000 (14:17 +0200)]
x86/VPMU: add lost Intel processor

Commit 6d112f2b50 ("x86/vPMU: change Intel model numbers from decimal
to hex") translated model 47 to 0x27; correct it to 0x2f.
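
The arithmetic behind the fix: 47 = 2 × 16 + 15 = 0x2f, whereas 0x27 = 2 × 16 + 7 = 39, a different model. A minimal sketch of a model check written in hex, assuming a hypothetical helper name (the real list lives in the vPMU code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: model numbers written in hex, as the vPMU code now does. */
static bool is_listed_model(unsigned int model)
{
    switch ( model )
    {
    case 0x2f:   /* decimal 47: previously mistranslated as 0x27 */
        return true;
    default:
        return false;
    }
}
```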

Signed-off-by: Alan Robinson <Alan.Robinson@ts.fujitsu.com>
Signed-off-by: Dietmar Hahn <Dietmar.Hahn@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/setup: move CPU0's stack out of the Xen text/data/bss virtual region
Andrew Cooper [Mon, 8 Jun 2015 12:16:27 +0000 (14:16 +0200)]
x86/setup: move CPU0's stack out of the Xen text/data/bss virtual region

Currently, the BSP's stack is the BSS symbol cpu0_stack.  In builds using
memguard_stack(), a page gets shot out of the mappings.

To avoid shattering the superpage which will eventually map the BSS, use the
directmap virtual address of cpu0_stack, while still using the same underlying
physical memory.  (Xen has an order 21 physical relocation requirement, meaning
that the order 3 alignment requirement for cpu0_stack will be honoured even
via its directmap mapping.)
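
Why the alignment survives the switch to the directmap alias: if the physical load address is a multiple of 2^21 bytes and cpu0_stack's offset within the image is a multiple of the (smaller) stack alignment, their sum is too, and adding a highly aligned directmap base preserves that. A sketch with hypothetical addresses, treating "order 3" as 2^3 pages of 4 KiB, i.e. 2^15 bytes (an assumption for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 2^order bytes. */
static bool aligned_to_order(uint64_t addr, unsigned int order)
{
    return (addr & ((UINT64_C(1) << order) - 1)) == 0;
}

/* Hypothetical values for illustration only. */
#define XEN_PHYS_START  UINT64_C(0x40200000)          /* 2 MiB (2^21) aligned */
#define CPU0_STACK_OFF  UINT64_C(0x00348000)          /* 32 KiB aligned offset */
#define DIRECTMAP_BASE  UINT64_C(0xffff830000000000)  /* assumed directmap start */

/* Directmap alias of the same physical memory backing cpu0_stack. */
static uint64_t cpu0_stack_directmap(void)
{
    return DIRECTMAP_BASE + XEN_PHYS_START + CPU0_STACK_OFF;
}
```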

In addition, fix two issues exposed by the changes.

 * do_invalid_op() should use is_active_kernel_text() rather than having its
   own, different, idea of when to search through the bugframes.
 * Setting of system_state to active needs to be deferred until after code has
   left .init.text, for bugframes/backtraces to function in reinit_bsp_stack().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: misc boot/link tweaking
Andrew Cooper [Mon, 8 Jun 2015 12:15:59 +0000 (14:15 +0200)]
x86: misc boot/link tweaking

 * Introduce symbols bounding the multiboot1 header, which helps clarify that
   it is data and not code corruption when viewing the disassembly.
 * Move the __high_start symbol to its implementation, and declare it
   correctly as ENTRY().
 * Move the l1_identmap construction to be with all the other pagetables, and
   within __page_tables_{start,end}.  This won't affect the EFI relocation
   algorithm, as l1_identmap contains no relocations.
 * Move the cpu0_stack alignment check to the linker.  Chances are very good
   that a binary with a misaligned stack won't get as far as the test.
 * Use MB() in the linker script.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: use existing __section() macro instead of opencoding it
Andrew Cooper [Mon, 8 Jun 2015 12:14:38 +0000 (14:14 +0200)]
x86: use existing __section() macro instead of opencoding it

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agosched_rt: fix memory leak in rt_init()
Andrew Cooper [Mon, 8 Jun 2015 12:13:23 +0000 (14:13 +0200)]
sched_rt: fix memory leak in rt_init()

Introduced by c/s 376bbba "sched_rt: print useful affinity info when dumping".
If the allocation of cpumask failed, prv was leaked.
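
The shape of the fix is the standard "free what was already allocated on each later error path" pattern. A minimal sketch, assuming hypothetical simplified types, a calloc()-based scratch array, and a test knob to force the second allocation to fail:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified private data and scratch allocation. */
struct rt_private { int dummy; };

static void **scratch;     /* stand-in for the scratch cpumask array */
static int fail_scratch;   /* test knob: force the second allocation to fail */

static int rt_init_sketch(struct rt_private **out, unsigned int nr_cpus)
{
    struct rt_private *prv = calloc(1, sizeof(*prv));

    if ( prv == NULL )
        return -ENOMEM;

    scratch = fail_scratch ? NULL : calloc(nr_cpus, sizeof(*scratch));
    if ( scratch == NULL )
    {
        free(prv);          /* release prv on this error path too */
        return -ENOMEM;
    }

    *out = prv;
    return 0;
}
```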

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-ID: 1304398
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
9 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Fri, 5 Jun 2015 13:35:49 +0000 (14:35 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

9 years agoxen: arm: add missing newline to error message.
Ian Campbell [Thu, 4 Jun 2015 15:31:41 +0000 (16:31 +0100)]
xen: arm: add missing newline to error message.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
9 years agoxen/arm: vgic-v3: Clean the emulation of IROUTER
Julien Grall [Mon, 25 May 2015 20:44:20 +0000 (21:44 +0100)]
xen/arm: vgic-v3: Clean the emulation of IROUTER

The read emulation of the register IROUTER contains lots of unnecessary
code, as irouter is already valid and doesn't need any processing before
setting the value in a register.

Also take the opportunity to factor the code that finds a vCPU from the
affinity out into a single place. This will make it easier to change how
it is done later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Chen Baozi <cbz@baozis.org>
Acked-by: Chen Baozi <baozich@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agovm_event: clean up control-register-write vm_events and add XCR0 event
Razvan Cojocaru [Fri, 5 Jun 2015 10:20:18 +0000 (12:20 +0200)]
vm_event: clean up control-register-write vm_events and add XCR0 event

As suggested by Andrew Cooper, this patch attempts to remove
some redundancy and make it easier to add vm_events for new
control registers in the future, by having a single
VM_EVENT_REASON_WRITE_CTRLREG vm_event type that serves CR0,
CR3, CR4 and (newly introduced) XCR0. The actual control register
is identified by the new .index field in vm_event_write_ctrlreg
(renamed from vm_event_mov_to_cr).
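
A sketch of the "one event type, register selected by index" design. The enum values, struct, and ctrlreg_name() helper below are hypothetical stand-ins, not the definitions in the public vm_event header; supporting another register means adding an index value rather than a new event type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index values (the real constants live in the public header). */
enum ctrlreg_index { CTRLREG_CR0, CTRLREG_CR3, CTRLREG_CR4, CTRLREG_XCR0 };

/* One payload shape for every control register write. */
struct write_ctrlreg_event {
    uint32_t index;       /* which register was written */
    uint64_t new_value;
    uint64_t old_value;
};

/* A consumer dispatches on .index instead of on separate event types. */
static const char *ctrlreg_name(const struct write_ctrlreg_event *ev)
{
    static const char *const names[] = { "CR0", "CR3", "CR4", "XCR0" };

    return ev->index < sizeof(names) / sizeof(names[0]) ? names[ev->index]
                                                        : "unknown";
}
```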

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agovmap: convert vmap() to using mfn_t
Andrew Cooper [Fri, 5 Jun 2015 10:17:16 +0000 (12:17 +0200)]
vmap: convert vmap() to using mfn_t

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agomem: expose typesafe mfns/gfns/pfns to common code
Andrew Cooper [Fri, 5 Jun 2015 10:10:33 +0000 (12:10 +0200)]
mem: expose typesafe mfns/gfns/pfns to common code

As the first step of memory management cleanup, introduce mfn_t, gfn_t and
pfn_t to common code.

The typesafe construction moves to its own header file, while the declarations
and sentinel values are moved to being common.

No functional change.
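
A sketch of the typesafe idea, modeled on the TYPE_SAFE() construction the text refers to (the exact macro in Xen's header may differ). Each frame-number kind becomes a distinct one-member struct, so mixing them up is a compile-time error rather than a silent bug:

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap _type in a struct and provide a constructor (_name) and accessor (_x). */
#define TYPE_SAFE(_type, _name)                                            \
    typedef struct { _type _name; } _name##_t;                             \
    static inline _name##_t _##_name(_type n) { return (_name##_t){ n }; } \
    static inline _type _name##_x(_name##_t n) { return n._name; }

TYPE_SAFE(unsigned long, mfn)
TYPE_SAFE(unsigned long, gfn)
TYPE_SAFE(unsigned long, pfn)

/* Sentinel values shared by all code. */
#define INVALID_MFN _mfn(~0UL)
#define INVALID_GFN _gfn(~0UL)

static bool mfn_eq(mfn_t a, mfn_t b) { return mfn_x(a) == mfn_x(b); }
```

With these definitions, a call such as mfn_eq(m, _gfn(1)) fails to compile, because gfn_t and mfn_t are unrelated struct types.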

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agox86/paging: remove pointless current domain checks
Jan Beulich [Fri, 5 Jun 2015 10:09:18 +0000 (12:09 +0200)]
x86/paging: remove pointless current domain checks

Checking that the subject domain is not the current one is pointless
when that domain has already been paused: domain_pause() already
ASSERT()s this to be the case.
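
A minimal sketch of why the separate check is redundant, assuming simplified stand-ins for struct domain and "current"; domain_pause() here is a toy that only mirrors the assertion the text describes, and paging_op_sketch() is hypothetical.

```c
#include <assert.h>

/* Simplified stand-ins (assumption). */
struct domain { int domain_id; int pause_count; };
static struct domain *current_domain;   /* stand-in for current->domain */

/* Toy domain_pause(): the "not the current domain" invariant lives here. */
static void domain_pause(struct domain *d)
{
    assert(d != current_domain);
    d->pause_count++;
}

/* Hypothetical paging operation: no extra "d == current" test needed. */
static int paging_op_sketch(struct domain *d)
{
    domain_pause(d);
    /* ... operate on the paused domain ... */
    d->pause_count--;   /* stand-in for domain_unpause() */
    return 0;
}
```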

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
9 years agotools: link executables with libtinfo explicitly
Daniel Kiper [Tue, 2 Jun 2015 13:33:26 +0000 (15:33 +0200)]
tools: link executables with libtinfo explicitly

binutils 2.22 changed the ld default from --copy-dt-needed-entries
to --no-copy-dt-needed-entries. This revealed that some objects
are linked implicitly with libtinfo, and newer ld fails to build
the relevant executables.

Below is a short explanation of why we should not rely on that...

http://fedoraproject.org/wiki/UnderstandingDSOLinkChange says:

The default behaviour for ld (my note: before version 2.22) allows
users to 'indirectly' link to required objects/libraries through
intermediate objects/libraries. While this is convenient, it can
also be dangerous because it makes your program's dependencies tied
to the dependencies of other objects. If those objects ever change
their linkages, they can break your program without any changes
to your own code!

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: gic-hip04: Resync the driver with the GICv2
Julien Grall [Wed, 6 May 2015 18:52:30 +0000 (19:52 +0100)]
xen/arm: gic-hip04: Resync the driver with the GICv2

The GIC hip04 driver had diverged from GICv2. I suspect that some of
the changes in the common GIC code make boot fail on hip04. However, I
don't have a platform to test on, so it has only been build tested.

List of GICv2 commits ported to the HIP04:
    commit ce12e6dba4b2d120e35dffd95a745452224e7144
    Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
    Date:   Fri Apr 10 16:21:10 2015 +1000

        xen/arm: Don't write to GICH_MISR

        GICH_MISR is read-only in GICv2.

        Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
        Reviewed-by: Julien Grall <julien.grall@citrix.com>
        Acked-by: Ian Campbell <ian.campbell@citrix.com>
    commit 2eb4f996547dc632aa94b2b7b4f783bec8ffe457
    Author: Julien Grall <julien.grall@linaro.org>
    Date:   Wed Apr 1 17:21:47 2015 +0100

        xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts

        GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
        IRQs 1020-1023 are reserved for special purposes.

        The result is used by the callers of gic_number_lines in order to check
        the validity of an IRQ.

        Signed-off-by: Julien Grall <julien.grall@linaro.org>
        Acked-by: Ian Campbell <ian.campbell@citrix.com>
        Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
        Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
    commit e2d486b385ce58b6db7561417de28ba837dcd4ac
    Author: Julien Grall <julien.grall@linaro.org>
    Date:   Wed Apr 1 17:21:34 2015 +0100

        xen/arm: Divide GIC initialization in 2 parts

        Currently the function to translate IRQ from the device tree is set
        unconditionally to be able to retrieve serial/timer IRQ
        before the GIC has been initialized.

        It assumes that the xlate function won't ever change. We may also need
        to have the primary interrupt controller very early.

        Rework the gic initialization in 2 parts:
            - gic_preinit: Get the interrupt controller device tree node and
            set up GIC and xlate callbacks
            - gic_init: Initialize the interrupt controller and the boot CPU
            interrupts.

        The former function will be called just after the IRQ subsystem has been
        initialized.

        Signed-off-by: Julien Grall <julien.grall@linaro.org>
        Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
        Acked-by: Ian Campbell <ian.campbell@citrix.com>
        Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
        Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
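
The interrupt-line rule from the quoted "GICv2 & GICv3 only supports 1020 physical interrupts" commit can be sketched as follows. ITLinesNumber N (bits [4:0] of GICD_TYPER) encodes 32 * (N + 1) lines, so the cap matters only for N = 31. The mask name and helper are assumptions for illustration.

```c
#include <assert.h>

/* GICD_TYPER.ITLinesNumber occupies bits [4:0] (mask name assumed). */
#define GICD_TYPE_LINES 0x1fU

/* 32 * (N + 1) lines, at most 1024, but IDs 1020-1023 are reserved. */
static unsigned int gic_nr_lines(unsigned int typer)
{
    unsigned int lines = 32U * ((typer & GICD_TYPE_LINES) + 1);

    return lines > 1020U ? 1020U : lines;
}
```
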
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Reviewed-by: Zoltan Kiss <zoltan.kiss@huawei.com>
Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: remove code in stubdom creation failure path and callback
Wei Liu [Mon, 1 Jun 2015 17:24:35 +0000 (18:24 +0100)]
libxl: remove code in stubdom creation failure path and callback

The snippet to destroy stubdom and the callback were added in 1fc3aeb3
("libxl: use new QEMU xenstore protocol"). The intention was to destroy
stubdom when it is not responsive. That approach is problematic because
rc is not propagated back to sdss->callback, hence the guest is leaked.

The solution is simple. The destruction of stubdom can be done later in
sdss->callback. That code path already does the right thing to destroy
both the guest and the stubdom that serves the guest.
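
A minimal sketch of the control flow after the fix, assuming a heavily simplified, hypothetical state structure (the real libxl types differ). The failure path only forwards rc, and the final callback, which already knows how to tear down both domains, does the cleanup.

```c
#include <assert.h>

/* Hypothetical stand-in for the stubdom creation state. */
struct stubdom_state {
    void (*callback)(struct stubdom_state *sdss, int rc);
    int guest_destroyed;
};

/* Final callback owns cleanup: on failure, destroy guest and stubdom. */
static void final_callback(struct stubdom_state *sdss, int rc)
{
    if ( rc )
        sdss->guest_destroyed = 1;   /* stand-in for destroying both domains */
}

/* Failure path: propagate rc instead of destroying the stubdom locally. */
static void stubdom_failed(struct stubdom_state *sdss, int rc)
{
    sdss->callback(sdss, rc);
}
```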

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: fix HVM vNUMA
Wei Liu [Mon, 1 Jun 2015 10:19:14 +0000 (11:19 +0100)]
libxl: fix HVM vNUMA

This patch does two things:

The original code erroneously fills in xc_hvm_build_args before
generating vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to the right place to fix this.

Move the subtraction of video RAM to libxl__vnuma_build_vmemrange_hvm
because it's the central place for generating vmemranges.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc: rework vnuma bits in setup_guest
Wei Liu [Mon, 1 Jun 2015 10:19:13 +0000 (11:19 +0100)]
libxc: rework vnuma bits in setup_guest

Make the setup process similar to its PV counterpart, that is, allocate a
P2M array that covers the whole memory range and start from there. This
is clearer than using an array with no holes in it.

Also, the dummy layout should take the MMIO hole into consideration. We
might end up having two vmemranges in the dummy layout.
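
Why the dummy layout can contain two vmemranges: RAM that would overlap the MMIO hole is placed above it. A minimal sketch, assuming the hole spans [mmio_start, 4 GiB) and using a hypothetical simplified vmemrange struct and helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified vmemrange: [start, end) on node nid. */
struct vmemrange { uint64_t start, end; unsigned int nid; };

#define GB(x) ((uint64_t)(x) << 30)

/* Single-node dummy layout for mem_size bytes around the MMIO hole. */
static unsigned int dummy_vmemranges(uint64_t mem_size, uint64_t mmio_start,
                                     struct vmemrange out[2])
{
    uint64_t lowmem = mem_size < mmio_start ? mem_size : mmio_start;

    out[0] = (struct vmemrange){ 0, lowmem, 0 };
    if ( mem_size == lowmem )
        return 1;                   /* everything fits below the hole */

    /* Remainder is relocated above 4 GiB. */
    out[1] = (struct vmemrange){ GB(4), GB(4) + (mem_size - lowmem), 0 };
    return 2;
}
```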

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc: print more error messages when failed
Wei Liu [Mon, 1 Jun 2015 10:19:12 +0000 (11:19 +0100)]
libxc: print more error messages when failed

No functional changes introduced.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc/libxl: fill xc_hvm_build_args in libxl
Wei Liu [Mon, 1 Jun 2015 10:19:11 +0000 (11:19 +0100)]
libxc/libxl: fill xc_hvm_build_args in libxl

When building HVM guests, some fields of xc_hvm_build_args were
originally filled in by xc_hvm_build (buried in the wrong function),
while others were set in libxl__build_hvm before passing
xc_hvm_build_args to xc_hvm_build. This is fragile.

After examining the code in xc_hvm_build that sets those fields, we can
in fact move the setting of mmio_start etc. into libxl. This way we
consolidate memory layout setting in libxl.

The setting of firmware data related fields is left in xc_hvm_build
because it depends on parsing the ELF image. Those fields only point to
scratch data that doesn't affect memory layout.

There should be no change in the generated guest memory layout. However,
the semantics of xc_hvm_build change: toolstacks built directly on
top of libxc need to adjust to this change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: "Chen, Tiejun" <tiejun.chen@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: traps: Print a message in debug build when a guest dabt is not handled
Julien Grall [Thu, 28 May 2015 09:10:47 +0000 (10:10 +0100)]
xen/arm: traps: Print a message in debug build when a guest dabt is not handled

This is useful for debugging a low-level kernel before the guest has set
up its vector table.

Note that the value of the IPA is only here for reference and may not
always be valid if the error came from a stage 1 table translation walk.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- dropped spurious w/s change ]

9 years agolibxc: add missing xc_hypercall_bounce_pre calls
Daniel De Graaf [Tue, 26 May 2015 18:13:29 +0000 (14:13 -0400)]
libxc: add missing xc_hypercall_bounce_pre calls

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/flask: change bool_maxstr to PAGE_SIZE
Daniel De Graaf [Tue, 26 May 2015 18:13:28 +0000 (14:13 -0400)]
xen/flask: change bool_maxstr to PAGE_SIZE

When FLASK_{GET,SET}BOOL is called with a named boolean, the call to
flask_security_resolve_bool is made prior to bool_maxstr being populated
by flask_security_make_bools.  This results in the maximum string length
being specified as zero, which is not useful.  While it would be
possible to initialize bool_maxstr correctly prior to its use, it is
simpler to use a fixed maximum of PAGE_SIZE as is done for the other
calls to safe_copy_string_from_guest.
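
Why a zero maximum is "not useful": with a bounded copy that rejects strings longer than the limit, a limit of zero rejects every non-empty boolean name. A minimal sketch, assuming a hypothetical copy helper in place of safe_copy_string_from_guest() and 4 KiB pages:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical bounded copy: strings longer than maxsize are rejected. */
static int copy_string_sketch(const char *src, size_t len, size_t maxsize,
                              char **out)
{
    if ( len > maxsize )
        return -EINVAL;
    *out = malloc(len + 1);
    if ( *out == NULL )
        return -ENOMEM;
    memcpy(*out, src, len);
    (*out)[len] = '\0';
    return 0;
}
```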

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoflask/policy: updates from osstest runs
Daniel De Graaf [Tue, 26 May 2015 18:13:27 +0000 (14:13 -0400)]
flask/policy: updates from osstest runs

Migration and HVM domain creation both trigger AVC denials that should
be allowed in the default policy; add these rules.

Guest console writes need to be either allowed or denied without audit
depending on the decision of the local administrator; introduce a policy
boolean to switch between these possibilities.

Reported-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxentrace: install into sbin
Olaf Hering [Sat, 23 May 2015 08:24:10 +0000 (08:24 +0000)]
xentrace: install into sbin

Collecting the trace buffer requires root permissions. Adjust Makefile
to install xentrace and xentrace_setsize into sbindir. Leave the
existing support for BIN in place for upcoming changes.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
9 years agox86/memcpy: reduce code size
Andrew Cooper [Wed, 3 Jun 2015 07:28:05 +0000 (09:28 +0200)]
x86/memcpy: reduce code size

'n % BYTES_PER_LONG' is at most 7, so it doesn't need a 64-bit register mov.
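
A C analogue of the split the text relies on. This is not Xen's routine, which is inline assembly; memcpy_sketch() is hypothetical. Whole longs are copied first, then a tail of at most BYTES_PER_LONG - 1 bytes, a count small enough for a 32-bit unsigned int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BYTES_PER_LONG sizeof(unsigned long)

static void *memcpy_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t longs = n / BYTES_PER_LONG;
    unsigned int tail = n % BYTES_PER_LONG;   /* < 8, fits in 32 bits */

    for ( size_t i = 0; i < longs; i++ )
    {
        unsigned long w;

        memcpy(&w, s, sizeof(w));   /* alignment-safe word load */
        memcpy(d, &w, sizeof(w));   /* alignment-safe word store */
        s += sizeof(w);
        d += sizeof(w);
    }
    while ( tail-- )
        *d++ = *s++;

    return dest;
}
```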

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/debugger: use copy_to/from_guest() in dbg_rw_guest_mem()
Andrew Cooper [Wed, 3 Jun 2015 07:27:09 +0000 (09:27 +0200)]
x86/debugger: use copy_to/from_guest() in dbg_rw_guest_mem()

Using gdbsx on Broadwell systems triggers a SMAP violation because
dbg_rw_guest_mem() uses memcpy() with a userspace pointer.

The functions dbg_rw_mem() and dbg_rw_guest_mem() have been updated to pass
'void * __user' pointers, which makes their nature clear.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/crash: don't use set_fixmap() in the crash path
Andrew Cooper [Wed, 3 Jun 2015 07:26:13 +0000 (09:26 +0200)]
x86/crash: don't use set_fixmap() in the crash path

Experimentally, this can result in memory allocation, and in particular a
failed assertion that interrupts are enabled when performing a TLB flush.

  (XEN) Assertion 'local_irq_is_enabled()' failed at smp.c:223
  <snip>
  (XEN) [<ffff82d08018a0d3>] flush_area_mask+0x7/0x134
  (XEN) [<ffff82d08011f7c6>] alloc_domheap_pages+0xa9/0x12a
  (XEN) [<ffff82d08011f8ab>] alloc_xenheap_pages+0x64/0xdb
  (XEN) [<ffff82d080178e08>] alloc_xen_pagetable+0x1c/0xa0
  (XEN) [<ffff82d08017926b>] virt_to_xen_l1e+0x38/0x1be
  (XEN) [<ffff82d080179bff>] map_pages_to_xen+0x80e/0xfd9
  (XEN) [<ffff82d080185a23>] __set_fixmap+0x2c/0x2e
  (XEN) [<ffff82d0801a6fd4>] machine_crash_shutdown+0x186/0x2b2
  (XEN) [<ffff82d0801172bb>] kexec_crash+0x3f/0x5b
  (XEN) [<ffff82d0801479b7>] panic+0x100/0x118
  (XEN) [<ffff82d08019002b>] set_guest_machinecheck_trapbounce+0/0x6d
  (XEN) [<ffff82d080195c15>] do_page_fault+0x40b/0x541
  (XEN) [<ffff82d0802345e0>] handle_exception_saved+0x2e/0x6c

Instead, use the directmap mappings, which are writable and involve far less
complexity than set_fixmap().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/apic: Disable the LAPIC later in smp_send_stop()
Andrew Cooper [Wed, 3 Jun 2015 07:25:43 +0000 (09:25 +0200)]
x86/apic: Disable the LAPIC later in smp_send_stop()

__stop_this_cpu() may reset the LAPIC mode back from x2apic to xapic, but will
leave x2apic_enabled alone.  This may cause disconnect_bsp_APIC() in
disable_IO_APIC() to suffer a #GP fault.

Disabling the LAPIC can safely be deferred to being the last action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agosched_rt: use the correct type for _cpumask_scratch
Julien Grall [Wed, 3 Jun 2015 07:24:50 +0000 (09:24 +0200)]
sched_rt: use the correct type for _cpumask_scratch

Commit 376bbbabbda607d2039b8f839f15ff02721597d2 "sched_rt: print useful
affinity info when dumping" breaks the build on ARM64:

sched_rt.c: In function ‘rt_init’:
sched_rt.c:442:26: error: assignment from incompatible pointer type [-Werror]
         _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
                          ^
sched_rt.c: In function ‘rt_alloc_pdata’:
sched_rt.c:489:29: error: passing argument 1 of ‘alloc_cpumask_var’ from incompatible pointer type [-Werror]
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )

This is because cpumask_var_t is only an alias of cpumask_t* (making
cpumask_var_t* the same as cpumask_t**) when the number of CPUs is
> 2 * BITS_PER_LONG; otherwise it is an array type. The correct type for
_cpumask_scratch is therefore cpumask_var_t*.
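
A self-contained sketch of why cpumask_var_t * is the portable choice. It assumes a 64-bit build with NR_CPUS = 128, so the array variant is selected, as on the ARM64 build that broke. The cpumask_t layout and helpers are simplified stand-ins, not Xen's headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumptions: 64-bit longs, small NR_CPUS (array variant selected). */
#define BITS_PER_LONG 64
#define NR_CPUS 128

typedef struct {
    unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
} cpumask_t;

#if NR_CPUS > 2 * BITS_PER_LONG
typedef cpumask_t *cpumask_var_t;     /* large configs: allocated separately */
static int alloc_cpumask_var(cpumask_var_t *mask)
{
    *mask = calloc(1, sizeof(cpumask_t));
    return *mask != NULL;
}
#else
typedef cpumask_t cpumask_var_t[1];   /* small configs: mask stored inline */
static int alloc_cpumask_var(cpumask_var_t *mask)
{
    (void)mask;                       /* nothing to allocate */
    return 1;
}
#endif

/* Correct for both variants; cpumask_t ** matches only the pointer one. */
static cpumask_var_t *_cpumask_scratch;

/* Error-path cleanup omitted for brevity. */
static int scratch_init(unsigned int nr_cpu_ids)
{
    _cpumask_scratch = calloc(nr_cpu_ids, sizeof(cpumask_var_t));
    if ( _cpumask_scratch == NULL )
        return -1;
    for ( unsigned int cpu = 0; cpu < nr_cpu_ids; cpu++ )
        if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
            return -1;
    return 0;
}
```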

Signed-off-by: Julien Grall <julien.grall@citrix.com>
9 years agounmodified-drivers: tolerate IRQF_DISABLED being undefined
Jan Beulich [Tue, 2 Jun 2015 11:45:03 +0000 (13:45 +0200)]
unmodified-drivers: tolerate IRQF_DISABLED being undefined

It's being removed in Linux 4.1.
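
A common way to tolerate a removed flag is to define it as 0 when the kernel headers no longer provide it, so existing call sites compile unchanged. This is a sketch of that general compatibility pattern, not necessarily the exact change made here; IRQF_SHARED_SKETCH and platform_irq_flags() are hypothetical.

```c
#include <assert.h>

/* IRQF_DISABLED had no effect in recent kernels; default it to 0 if absent. */
#ifndef IRQF_DISABLED
#define IRQF_DISABLED 0
#endif

#define IRQF_SHARED_SKETCH 0x80U   /* hypothetical stand-in for IRQF_SHARED */

/* Hypothetical call site still OR-ing in the old flag. */
static unsigned long platform_irq_flags(void)
{
    return IRQF_SHARED_SKETCH | IRQF_DISABLED;
}
```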

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoefi: fix allocation problems if ExitBootServices() fails
Ross Lagerwall [Tue, 2 Jun 2015 11:44:24 +0000 (13:44 +0200)]
efi: fix allocation problems if ExitBootServices() fails

If calling ExitBootServices() fails, the required memory map size may
have increased. When initially allocating the memory map, allocate a
slightly larger buffer (by an arbitrary 8 entries) to fix this.

The ARM code path was already allocating a larger buffer than required,
so this moves the code to be common for all architectures.
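
A minimal sketch of the sizing rule, assuming the initial GetMemoryMap() call has reported both the total map size and the size of one descriptor, as the UEFI interface does. The helper name is hypothetical; the slack is counted in descriptors so it scales with the firmware's descriptor size.

```c
#include <assert.h>
#include <stddef.h>

#define EXTRA_DESCRIPTORS 8   /* the arbitrary headroom from the text */

/* Buffer size to allocate: current map plus room for 8 more descriptors. */
static size_t efi_map_alloc_size(size_t map_size, size_t desc_size)
{
    return map_size + EXTRA_DESCRIPTORS * desc_size;
}
```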

This was seen on the following machine when using the iscsidxe UEFI
driver. The machine would consistently fail the first call to
ExitBootServices().
System Information
        Manufacturer: Supermicro
        Product Name: X10SLE-F/HF
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 2.00
        Release Date: 04/24/2014

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agosched_rt: print useful affinity info when dumping
Dario Faggioli [Tue, 2 Jun 2015 11:43:15 +0000 (13:43 +0200)]
sched_rt: print useful affinity info when dumping

In fact, printing the cpupool's CPU online mask
for each vCPU is just redundant, as that is the
same for all the vCPUs of all the domains in the
same cpupool, while hard affinity is already part
of the output of dumping domains info.

Instead, print the intersection between hard
affinity and online CPUs, which, for this
scheduler, is the effective affinity always
used for the vCPUs.
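
The effective affinity is a plain set intersection. A sketch using hypothetical single-word masks (CPU n is bit n, up to 64 pCPUs); in Xen itself this would be a cpumask_and() into a cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word CPU mask: bit n set means pCPU n is included. */
typedef uint64_t mask64_t;

/* Hard affinity restricted to the cpupool's online CPUs. */
static mask64_t effective_affinity(mask64_t hard_affinity, mask64_t pool_online)
{
    return hard_affinity & pool_online;
}
```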

This change also takes the chance to add a scratch
cpumask area, to avoid having to either put one
(more) cpumask_t on the stack, or dynamically
allocate it within the dumping routine. (The former
being bad because hypervisor stack size is limited,
the latter because dynamic allocations can fail, if
the hypervisor was built for a large enough number
of CPUs.) We allocate such scratch area, for all pCPUs,
when the first instance of the RTDS scheduler is
activated and, in order not to lose track of or leak it
if other instances are activated in new cpupools,
and when the last instance is deactivated, we (sort
of) refcount it.
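
A minimal sketch of the "(sort of) refcount" lifetime: the first instance allocates, later instances share, and the last one frees. It uses hypothetical names and assumes init/deinit calls are serialized by the caller (no locking shown).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared scratch area for all pCPUs. */
static unsigned long *scratch_area;
static unsigned int scratch_users;   /* active scheduler instances */

static int sched_instance_init(unsigned int nr_cpus)
{
    if ( scratch_users == 0 )
    {
        scratch_area = calloc(nr_cpus, sizeof(*scratch_area));
        if ( scratch_area == NULL )
            return -1;
    }
    scratch_users++;
    return 0;
}

static void sched_instance_deinit(void)
{
    if ( --scratch_users == 0 )
    {
        free(scratch_area);
        scratch_area = NULL;
    }
}
```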

Such scratch area can be used to kill most of the
cpumasks{_var}_t local variables in other functions
in the file, but that is *NOT* done in this change.

Finally, convert the file to use keyhandler scratch,
instead of open coded string buffers.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
9 years agodocs: clarification to terms used in hypervisor memory management
Andrew Cooper [Mon, 1 Jun 2015 10:00:18 +0000 (12:00 +0200)]
docs: clarification to terms used in hypervisor memory management

Memory management is hard[citation needed].  Furthermore, it isn't helped by
the inconsistent use of terms throughout the code, or by the fact that some
terms have changed meaning over time.

Describe the currently-used terms in a more practical fashion, so new code has
a concrete reference.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
9 years agox86: don't crash when mapping a page using EFI runtime page tables
Ross Lagerwall [Mon, 1 Jun 2015 09:59:14 +0000 (11:59 +0200)]
x86: don't crash when mapping a page using EFI runtime page tables

When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.

To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables, which is treated the same as running on an idle
vCPU.
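
A minimal sketch of the idea, assuming a hypothetical using_efi_rt_pgtables flag and simplified stand-ins for "current" and struct vcpu. While the EFI runtime page tables are loaded, "current" does not describe the active address space, so the lookup reports "no vCPU mapcache" and callers take the path they already use in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumption). */
struct vcpu { int vcpu_id; };

static struct vcpu *current_vcpu;   /* stand-in for "current" */
static bool using_efi_rt_pgtables;  /* hypothetical: EFI RT page tables loaded */

static struct vcpu *mapcache_current_vcpu_sketch(void)
{
    if ( using_efi_rt_pgtables )
        return NULL;   /* "current" does not match the loaded page tables */
    return current_vcpu;
}

/* Hypothetical caller: use the vCPU mapcache only when one is available. */
static int uses_mapcache(void)
{
    return mapcache_current_vcpu_sketch() != NULL;
}
```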

This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.

Example call trace:
[<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
[<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
[<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
[<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
[<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
[<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
[<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
[<ffff82d080173816>] move_native_irq+0x25/0x36
[<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
[<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
[<ffff82d080173e8b>] do_IRQ+0x251/0x635
[<ffff82d080234502>] common_interrupt+0x62/0x70
[<00000000df7ed2be>] 00000000df7ed2be

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
9 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Fri, 29 May 2015 12:22:31 +0000 (13:22 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

9 years agolibxc/restore: implement Remus checkpointed restore
Yang Hongyang [Mon, 18 May 2015 07:03:56 +0000 (15:03 +0800)]
libxc/restore: implement Remus checkpointed restore

With Remus, the restore flow should be: the first full migration stream,
followed by periodically restored checkpoint streams:
the first full migration stream -> { periodic checkpoint streams }
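
A minimal sketch of that flow: one full stream, then checkpoints until the primary stops sending. The stream struct and reader functions are hypothetical stand-ins for the real libxc restore machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stream source: checkpoints_left runs out when the primary stops. */
struct stream { unsigned int checkpoints_left; unsigned int applied; };

static bool read_full_stream(struct stream *s)
{
    s->applied++;
    return true;
}

static bool read_checkpoint(struct stream *s)
{
    if ( s->checkpoints_left == 0 )
        return false;            /* primary gone: stop and fail over */
    s->checkpoints_left--;
    s->applied++;
    return true;
}

/* First full migration stream, then checkpoints until the stream ends. */
static unsigned int remus_restore_sketch(struct stream *s)
{
    if ( !read_full_stream(s) )
        return 0;
    while ( read_checkpoint(s) )
        ;                        /* each checkpoint updates the restored state */
    return s->applied;
}
```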

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>