Jan Beulich [Thu, 18 Jun 2015 13:07:10 +0000 (15:07 +0200)]
x86: synchronize PCI config space access decoding
Both PV and HVM logic have similar but not similar enough code here.
Synchronize the two so that
- in the HVM case we don't unconditionally try to access extended
config space
- in the PV case we pass a correct range to the XSM hook
- in the PV case we don't needlessly deny access when the operation
isn't really on PCI config space
All this along with sharing the macros HVM already had here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
There's no need for two exit paths each using rcu_unlock_domain() on
its own here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
David Vrabel [Thu, 18 Jun 2015 12:53:23 +0000 (14:53 +0200)]
evtchn: simplify port_is_valid()
By keeping a count of the number of currently valid event channels,
port_is_valid() can be simplified.
d->valid_evtchns is only increased (while holding d->event_lock), so
port_is_valid() may be safely called without taking the lock (this
will be useful later).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
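A minimal sketch of the simplified check, for illustration only. The struct below is a hypothetical, reduced stand-in for struct domain, and the real function may need additional care (for example around memory ordering) that is not shown here.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the relevant part of struct domain. */
struct domain_sketch {
    unsigned int valid_evtchns; /* only ever increased, under the event lock */
};

/* A port is valid if it lies below the count of valid event channels. */
static bool port_is_valid(const struct domain_sketch *d, unsigned int port)
{
    return port < d->valid_evtchns;
}
```

Because the count never shrinks, a port that was once reported valid stays valid, which is what makes a lockless read plausible.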
Juergen Gross [Thu, 18 Jun 2015 12:52:32 +0000 (14:52 +0200)]
pvusb: don't rely on linux kernel macros for the interface
The interface description of pvUSB lacks some access macros as using
linux kernel macros is assumed to work well. This solution is rather
unfriendly for pvusb implementations being outside the linux kernel.
Additionally things will break quite unpleasantly in case the linux
kernel implementation is changed.
To avoid these problems define own macros for accessing bitfields of
the interface and for values of several structure members.
While working on the file add some more comments, especially for the
xenstore interface.
Wei Liu [Wed, 17 Jun 2015 19:39:49 +0000 (20:39 +0100)]
oxenstored: fix del_watches and del_transactions
The statement to reset nb_watches should be in del_watches, not
del_transactions.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: David Scott <dave.scott@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
[ ijc -- fix syntax error by adding a ";" to the previous line in the
new location and removing from the previous line in the old ]
Wei Liu [Wed, 17 Jun 2015 11:08:38 +0000 (12:08 +0100)]
libxl: refactor toolstack save restore code
This patch does the following things:
1. Document v1 format.
2. Factor out function to handle QEMU restore data and function to
handle v1 blob for restore path.
3. Refactor save function to generate different blobs in the order
specified in format specification.
4. Change functions to use "goto out" idiom.
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
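For readers unfamiliar with the "goto out" idiom mentioned in item 4, here is a generic sketch. The function copy_blob() is hypothetical and unrelated to the actual libxl code; it only shows the single-exit-path shape.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates src into *out, with one exit path. */
static int copy_blob(const char *src, char **out)
{
    int rc = -1;
    char *buf = NULL;

    if (!src || !out)
        goto out;

    buf = malloc(strlen(src) + 1);
    if (!buf)
        goto out;

    strcpy(buf, src);
    *out = buf;
    buf = NULL;   /* ownership has moved to the caller */
    rc = 0;

 out:
    free(buf);    /* free(NULL) is a no-op, so this is safe on every path */
    return rc;
}
```

All cleanup lives under the single label, so adding a new failure case does not require duplicating it.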
Roger Pau Monne [Thu, 11 Jun 2015 16:05:20 +0000 (18:05 +0200)]
libxc: fix xc_dom_load_elf_symtab
xc_dom_load_elf_symtab was incorrectly trying to perform the same
calculations already done in elf_parse_bsdsyms when load == 0 is used.
Instead of trying to repeat the calculations, just trust what
elf_parse_bsdsyms has already accounted for.
This also simplifies the code by allowing the non-load case to return
earlier.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Mon, 15 Jun 2015 10:12:07 +0000 (11:12 +0100)]
tools/libxc: Batch memory allocations for PV guests
The current code for allocating memory for PV guests batches the
hypercalls to allocate memory by allocating 1024*1024 extents of order 0
at a time. To make this faster, first try allocating extents of order 9
(2 MiB) before falling back to the order 0 allocation if the order 9
allocation fails.
On my test machine this reduced the time to start a 128 GiB PV guest by
about 60 seconds.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
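The fallback strategy can be sketched as follows. This is not the libxc code: fake_host and fake_alloc() are hypothetical stand-ins for the hypercall, and the real implementation batches many extents per call, which this sketch omits.

```c
#define SUPERPAGE_ORDER 9                        /* 2^9 * 4 KiB = 2 MiB */
#define SUPERPAGE_PAGES (1UL << SUPERPAGE_ORDER)

/* Hypothetical stand-in for the host allocator. */
struct fake_host {
    unsigned int superpages_left; /* order 9 requests that will still succeed */
    unsigned long calls;          /* allocation requests issued */
    unsigned long pages;          /* 4 KiB pages handed out */
};

/* Returns 0 on success, -1 if an order 9 extent is unavailable. */
static int fake_alloc(struct fake_host *h, unsigned int order)
{
    h->calls++;
    if (order == SUPERPAGE_ORDER) {
        if (h->superpages_left == 0)
            return -1;
        h->superpages_left--;
    }
    h->pages += 1UL << order;
    return 0;
}

/* Populate nr_pages pages, trying order 9 first and order 0 as fallback. */
static int populate(struct fake_host *h, unsigned long nr_pages)
{
    unsigned long pfn = 0;

    while (pfn < nr_pages) {
        unsigned long end = pfn + SUPERPAGE_PAGES;

        if (end <= nr_pages && fake_alloc(h, SUPERPAGE_ORDER) == 0) {
            pfn = end;
            continue;
        }

        /* Fall back to order 0 for this chunk (or for the short tail). */
        if (end > nr_pages)
            end = nr_pages;
        for (; pfn < end; pfn++) {
            if (fake_alloc(h, 0) != 0)
                return -1;
        }
    }
    return 0;
}
```

The saving comes from the number of requests: one order 9 request replaces 512 order 0 requests whenever it succeeds.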
Wei Liu [Thu, 4 Jun 2015 10:23:01 +0000 (11:23 +0100)]
libxc: unify handling of vNUMA layout
This patch does the following:
1. Use local variables for dummy vNUMA layout in PV case.
2. Avoid leaking dummy layout back to caller in PV case.
3. Use local variables to reference vNUMA layout (whether it is dummy
or provided by caller) for both PV and HVM.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Jun 2015 10:44:50 +0000 (11:44 +0100)]
libxl: clean up qemu-save and qemu-resume files
These files are leaked when using qemu-trad stubdom. They are
intermediate files created by libxc. Unfortunately they don't fit well
in our userdata scheme. Clean them up after we destroy all userdata, as
we're sure they are not useful anymore at that point.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:44 +0000 (16:30 +0000)]
xenalyze: remove argp_program_version
Since xenalyze is now upstream, it is Open Source and part of the given
release.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:43 +0000 (16:30 +0000)]
xenalyze: remove trailing whitespaces
Result of "sed -i 's@[[:blank:]]\+$@@' tools/xentrace/xenalyze.c"
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:41 +0000 (16:30 +0000)]
xenalyze: handle TRC_TRACE_WRAP_BUFFER
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:40 +0000 (16:30 +0000)]
xenalyze: include odd mmio states in default output
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:39 +0000 (16:30 +0000)]
xenalyze: print newline after unknown hvm events
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:38 +0000 (16:30 +0000)]
xenalyze: add to tools/xentrace/
This merges xenalyze.hg, changeset 150:24308507be1d,
into tools/xentrace/xenalyze.c to have the tool and
public/trace.h in one place.
Adjust code to use public/trace.h instead of private trace.h
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- wrap $(BIN) install in a check in case it is empty (which it
is on !x86, avoid BIN += since it results in BIN = ' ' on
!x86 ]
Jan Beulich [Tue, 16 Jun 2015 10:29:18 +0000 (12:29 +0200)]
gnttab: make struct grant_mapping private
This documents that no entity outside of gnttab.c actually accesses
objects of that type, which is particularly important with the now more
fine grained locking in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:28:11 +0000 (12:28 +0200)]
gnttab: fix/adjust gnttab_transfer()
- don't update shared entry's frame number for translated domains (as
MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit 5cc77f9098 ("32-on-64: Fix domain address-size clamping, implement")
uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:26:03 +0000 (12:26 +0200)]
gnttab: simplify shared entry v1 vs v2 handling
In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:25:35 +0000 (12:25 +0200)]
gnttab: limit mapcount() looping
The function doesn't need to return counts in the first place; all its
callers are after is whether at least one entry of a certain kind
exists. With that there's no point for that loop to continue once the
looked for condition was found to be met by one entry. Rename the
function to match the changed behavior.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
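A generic sketch of the change in shape, from counting to an early-exit existence check. The structure, flag and function name are hypothetical and do not reflect the actual gnttab.c definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAP_READONLY_SKETCH 0x1u   /* hypothetical flag */

/* Hypothetical, reduced mapping record. */
struct mapping_sketch {
    unsigned int flags;
};

/* Stops at the first writable entry instead of counting all of them. */
static bool has_writable_mapping(const struct mapping_sketch *maps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!(maps[i].flags & MAP_READONLY_SKETCH))
            return true;
    }
    return false;
}
```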
Jan Beulich [Tue, 16 Jun 2015 10:24:49 +0000 (12:24 +0200)]
gnttab: eliminate several explicit version checks
By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at once fix
grant_map_exists() running into the being removed ASSERT() when called
for a page owned by a domain not having its grant table set up yet.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
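The idea can be sketched like this, under stated assumptions: the struct is a hypothetical reduction, the page size is taken as 4 KiB, and the per-entry sizes (8 bytes for v1, 16 bytes for v2) are my understanding of the formats rather than something stated in this message.

```c
#include <stdbool.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size */

/* Hypothetical, reduced grant table state. */
struct grant_table_sketch {
    unsigned int gt_version;       /* 0 while the version is still unset */
    unsigned int nr_grant_frames;
};

/* Returns 0 for an unset version, so range checks fail without extra code. */
static unsigned int nr_grant_entries(const struct grant_table_sketch *gt)
{
    switch (gt->gt_version) {
    case 1:
        return gt->nr_grant_frames * (PAGE_SIZE_SKETCH / 8);
    case 2:
        return gt->nr_grant_frames * (PAGE_SIZE_SKETCH / 16);
    default:
        return 0;
    }
}

/* A single bounds check now also covers the "table not set up" case. */
static bool ref_in_range(const struct grant_table_sketch *gt, unsigned int ref)
{
    return ref < nr_grant_entries(gt);
}
```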
David Vrabel [Mon, 15 Jun 2015 11:25:20 +0000 (13:25 +0200)]
gnttab: make the grant table lock a read-write lock
In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).
The write lock is also required when two active entries must be
acquired.
The double lock is still required when updating IOMMU page tables.
With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Thu, 11 Jun 2015 16:56:15 +0000 (17:56 +0100)]
libxl: libxl_internal.h: Clarify ao rule against internal callers
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Juergen Gross <jgross@suse.com>
Ross Lagerwall [Fri, 12 Jun 2015 10:07:05 +0000 (12:07 +0200)]
x86: avoid tripping watchdog when constructing dom0
Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at once clean up other includes).
The fix is, when tearing down a pCPU, call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
Jan Beulich [Thu, 11 Jun 2015 12:47:54 +0000 (14:47 +0200)]
EFI: map allocation size must be set to zero
Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.
Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 09:55:05 +0000 (11:55 +0200)]
VT-d: extend quirks to newer desktop chipsets
We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Don Dugger <donald.d.dugger@intel.com>
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
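One way to sidestep such a type check is to compare through a single explicit type. The helper below is a hypothetical illustration of that approach and not necessarily how the patch resolves the build failure.

```c
#include <stdint.h>

/* Hypothetical helper: both operands are converted to uint64_t first. */
static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}
```

On a 32-bit build, an unsigned long argument is widened on the way in, so no mixed-type comparison remains.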
Jan Beulich [Wed, 10 Jun 2015 10:05:21 +0000 (12:05 +0200)]
x86/EFI: adjust EFI_MEMORY_WP handling for spec version 2.5
That flag now means cachability rather than protection, and a new flag
EFI_MEMORY_RO got added in its place.
Along with EFI_MEMORY_RO also add the two other new EFI_MEMORY_*
definitions, even if we don't need them right away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
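A sketch of what the flag change means for a consumer of the memory map. The bit values are the ones I understand the UEFI 2.5 specification to assign; the helper name and the boolean revision parameter are hypothetical, and the patch itself may make the decision differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits as I understand UEFI 2.5 to define them (assumption). */
#define EFI_MEMORY_WP 0x0000000000001000ULL
#define EFI_MEMORY_RO 0x0000000000020000ULL

/*
 * Hypothetical helper: before 2.5, WP expressed write protection;
 * from 2.5 on, RO does, and WP describes cacheability instead.
 */
static bool efi_region_read_only(uint64_t attr, bool spec_2_5_or_later)
{
    uint64_t flag = spec_2_5_or_later ? EFI_MEMORY_RO : EFI_MEMORY_WP;

    return (attr & flag) != 0;
}
```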
We also alter the 'efi-rs' to be 'efi=rs' or 'efi=no-rs'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ross Lagerwall [Wed, 10 Jun 2015 09:57:18 +0000 (11:57 +0200)]
efi: avoid calling boot services after ExitBootServices()
After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices() and so are called via
SystemTable->BootServices.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 14:00:24 +0000 (16:00 +0200)]
kexec: add more pages to v1 environment
Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).
Further, without the tools mapping the low 1Mb (expected by at least
some Linux versions), we need to do so in the hypervisor in the v1 case.
Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:59:31 +0000 (15:59 +0200)]
x86: adjust PV I/O emulation functions' types
admin_io_okay(), guest_io_read(), and guest_io_write() all don't need
their current "regs" parameter at all, and they don't use the vCPU
passed to them for other than obtaining its domain. Drop the former and
replace the latter by a struct domain pointer.
pci_cfg_okay() returns a boolean type, and its "write" parameter is of
boolean kind too.
All of them get called for the current vCPU (and hence current domain)
only, so name the domain parameters accordingly except in the
admin_io_okay() case, which a subsequent patch will use for simplifying
setup_io_bitmap().
Latch current->domain into a local variable in emulate_privileged_op().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 8 Jun 2015 12:41:25 +0000 (14:41 +0200)]
x86/mm: print domain IDs instead of pointers
Printing pointers to struct domain isn't really useful for initial
problem analysis. In get_page() also drop the page only after issuing
the log message, so that at the time of printing the state can be
considered reasonably consistent.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 8 Jun 2015 12:16:27 +0000 (14:16 +0200)]
x86/setup: move CPU0s stack out of the Xen text/data/bss virtual region
Currently, the BSP's stack is the BSS symbol cpu0_stack. In builds using
memguard_stack(), a page gets shot out of the mappings.
To avoid shattering the superpage which will eventually map the BSS, use the
directmap virtual address of cpu0_stack, while still using the same underlying
physical memory. (Xen has an order 21 physical relocation requirement meaning
that the order 3 alignment requirement for cpu0_stack will be honoured even
via its directmap mapping.)
In addition, fix two issues exposed by the changes.
* do_invalid_op() should use is_active_kernel_text() rather than having its
own, different, idea of when to search through the bugframes.
* Setting of system_state to active needs to be deferred until after code has
left .init.text, for bugframes/backtraces to function in reinit_bsp_stack().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 8 Jun 2015 12:15:59 +0000 (14:15 +0200)]
x86: misc boot/link tweaking
* Introduce symbols bounding the multiboot1 header, which helps clarify that
it is data and not code corruption when viewing the disassembly.
* Move the __high_start symbol to its implementation, and declare it
correctly as ENTRY()
* Move the l1_identmap construction to be with all the other pagetables, and
within __page_tables_{start,end}. This won't affect the EFI relocation
algorithm, as l1_identmap contains no relocations.
* Move the cpu0_stack alignment check to the linker. Chances are very good
that a binary with a misaligned stack won't get as far as the test.
* Use MB() in linker script.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 25 May 2015 20:44:20 +0000 (21:44 +0100)]
xen/arm: vgic-v3: Clean the emulation of IROUTER
The read emulation of the register IROUTER contains lots of unnecessary
code as irouter is already valid and doesn't need any processing before
setting the value in a register.
Also take the opportunity to factorize the code to find a vCPU from the
affinity in a single place. It will be easier to change the way to do it
later.
Razvan Cojocaru [Fri, 5 Jun 2015 10:20:18 +0000 (12:20 +0200)]
vm_event: clean up control-register-write vm_events and add XCR0 event
As suggested by Andrew Cooper, this patch attempts to remove
some redundancy and allow for an easier time when adding vm_events
for new control registers in the future, by having a single
VM_EVENT_REASON_WRITE_CTRLREG vm_event type, meant to serve CR0,
CR3, CR4 and (newly introduced) XCR0. The actual control register
will be deduced by the new .index field in vm_event_write_ctrlreg
(renamed from vm_event_mov_to_cr).
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
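The single-event design can be sketched as below. The enum constants and the reduced struct are hypothetical names for illustration; the real definitions live in the public vm_event header and may differ in layout and naming.

```c
#include <stdint.h>

/* Hypothetical index values for the .index field. */
enum ctrlreg_index_sketch {
    CTRLREG_CR0,
    CTRLREG_CR3,
    CTRLREG_CR4,
    CTRLREG_XCR0,
};

/* Hypothetical, reduced form of the event payload described above. */
struct write_ctrlreg_sketch {
    uint32_t index;      /* which control register was written */
    uint64_t new_value;
    uint64_t old_value;
};

/* One handler serves every register; .index tells them apart. */
static const char *ctrlreg_name(const struct write_ctrlreg_sketch *ev)
{
    switch (ev->index) {
    case CTRLREG_CR0:  return "CR0";
    case CTRLREG_CR3:  return "CR3";
    case CTRLREG_CR4:  return "CR4";
    case CTRLREG_XCR0: return "XCR0";
    default:           return "unknown";
    }
}
```

Supporting a further control register then means adding one index value rather than a whole new event type.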
Jan Beulich [Fri, 5 Jun 2015 10:09:18 +0000 (12:09 +0200)]
x86/paging: remove pointless current domain checks
Checking that the subject domain is not the current one is pointless
when already having paused that domain: domain_pause() already
ASSERT()s this to be the case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Daniel Kiper [Tue, 2 Jun 2015 13:33:26 +0000 (15:33 +0200)]
tools: link executables with libtinfo explicitly
binutils 2.22 changed ld default from --copy-dt-needed-entries
to --no-copy-dt-needed-entries. This revealed that some objects
are linked implicitly with libtinfo and newer ld fails to build
relevant executables.
Below is a short explanation of why we should not do that...
The default behaviour for ld (my note: before version 2.22) allows
users to 'indirectly' link to required objects/libraries through
intermediate objects/libraries. While this is convenient, it can
also be dangerous because it makes your program's dependencies tied
to the dependencies of other objects. If those objects ever change
their linkages, they can break your program without any changes
to your own code!
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Wed, 6 May 2015 18:52:30 +0000 (19:52 +0100)]
xen/arm: gic-hip04: Resync the driver with the GICv2
The GIC hip04 driver was differing from GICv2. I suspect that some of
the changes in the common GIC code make boot fail on hip04. However, I
don't have a platform to check, so it has only been build tested.
List of GICv2 commits ported to the HIP04:
commit ce12e6dba4b2d120e35dffd95a745452224e7144
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Fri Apr 10 16:21:10 2015 +1000
xen/arm: Don't write to GICH_MISR
GICH_MISR is read-only in GICv2.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
commit 2eb4f996547dc632aa94b2b7b4f783bec8ffe457
Author: Julien Grall <julien.grall@linaro.org>
Date: Wed Apr 1 17:21:47 2015 +0100
xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts
GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
IRQs 1020-1023 are reserved for special purposes.
The result is used by the callers of gic_number_lines in order to check
the validity of an IRQ.
Currently the function to translate IRQ from the device tree is set
unconditionally to be able to retrieve serial/timer IRQ
before the GIC has been initialized.
It assumes that the xlate function won't ever change. We may also need
to have the primary interrupt controller very early.
Rework the gic initialization in 2 parts:
- gic_preinit: Get the interrupt controller device tree node and
set up GIC and xlate callbacks
- gic_init: Initialize the interrupt controller and the boot CPU
interrupts.
The former function will be called just after the IRQ subsystem has been
initialized.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Frediano Ziglio <frediano.ziglio@huawei.com> Cc: Zoltan Kiss <zoltan.kiss@huawei.com> Signed-off-by: Julien Grall <julien.grall@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@huawei.com> Reviewed-by: Zoltan Kiss <zoltan.kiss@huawei.com> Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 17:24:35 +0000 (18:24 +0100)]
libxl: remove code in stubdom creation failure path and callback
The snippet to destroy stubdom and the callback were added in 1fc3aeb3
("libxl: use new QEMU xenstore protocol"). The intention was to destroy
stubdom when it is not responsive. That approach is problematic because
rc is not propagated back to sdss->callback, hence the guest is leaked.
The solution is simple. The destruction of stubdom can be done later in
sdss->callback. That code path already does the right thing to destroy
both the guest and the stubdom that serves the guest.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:14 +0000 (11:19 +0100)]
libxl: fix HVM vNUMA
This patch does two things:
The original code erroneously fills in xc_hvm_build_args before
generating vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to right place to fix this.
Move the subtraction of video ram to libxl__vnuma_build_vmemrange_hvm
because it's the central place for generating vmemranges.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:13 +0000 (11:19 +0100)]
libxc: rework vnuma bits in setup_guest
Make the setup process similar to PV counterpart. That is, to allocate a
P2M array that covers the whole memory range and start from there. This
is clearer than using an array with no holes in it.
Also the dummy layout should take MMIO hole into consideration. We might
end up having two vmemranges in the dummy layout.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:12 +0000 (11:19 +0100)]
libxc: print more error messages when failed
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:11 +0000 (11:19 +0100)]
libxc/libxl: fill xc_hvm_build_args in libxl
When building HVM guests, originally some fields of xc_hvm_build_args
are filled in xc_hvm_build (and buried in the wrong function), some are
set in libxl__build_hvm before passing xc_hvm_build_args to
xc_hvm_build. This is fragile.
After examining the code in xc_hvm_build that sets those fields, we can
in fact move setting of mmio_start etc in libxl. This way we consolidate
memory layout setting in libxl.
The setting of firmware data related fields is left in xc_hvm_build
because it depends on parsing ELF image. Those fields only point to
scratch data that doesn't affect memory layout.
There should be no change in the generated guest memory layout. But the
semantics of xc_hvm_build are changed. Toolstacks built directly on
top of libxc need to adjust to this change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: "Chen, Tiejun" <tiejun.chen@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Tue, 26 May 2015 18:13:28 +0000 (14:13 -0400)]
xen/flask: change bool_maxstr to PAGE_SIZE
When FLASK_{GET,SET}BOOL is called with a named boolean, the call to
flask_security_resolve_bool is made prior to bool_maxstr being populated
by flask_security_make_bools. This results in the maximum string length
being specified as zero, which is not useful. While it would be
possible to initialize bool_maxstr correctly prior to its use, it is
simpler to use a fixed maximum of PAGE_SIZE as is done for the other
calls to safe_copy_string_from_guest.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Tue, 26 May 2015 18:13:27 +0000 (14:13 -0400)]
flask/policy: updates from osstest runs
Migration and HVM domain creation both trigger AVC denials that should
be allowed in the default policy; add these rules.
Guest console writes need to be either allowed or denied without audit
depending on the decision of the local administrator; introduce a policy
boolean to switch between these possibilities.
Reported-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Sat, 23 May 2015 08:24:10 +0000 (08:24 +0000)]
xentrace: install into sbin
Collecting the trace buffer requires root permissions. Adjust Makefile
to install xentrace and xentrace_setsize into sbindir. Leave the
existing support for BIN in place for upcoming changes.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Andrew Cooper [Wed, 3 Jun 2015 07:25:43 +0000 (09:25 +0200)]
x86/apic: Disable the LAPIC later in smp_send_stop()
__stop_this_cpu() may reset the LAPIC mode back from x2apic to xapic, but will
leave x2apic_enabled alone. This may cause disconnect_bsp_APIC() in
disable_IO_APIC() to suffer a #GP fault.
Disabling the LAPIC can safely be deferred to being the last action.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
sched_rt.c: In function ‘rt_init’:
sched_rt.c:442:26: error: assignment from incompatible pointer type [-Werror]
_cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
^
sched_rt.c: In function ‘rt_alloc_pdata’:
sched_rt.c:489:29: error: passing argument 1 of ‘alloc_cpumask_var’ from incompatible pointer type [-Werror]
if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
This is because cpumask_var_t is not a type alias to cpumask_t** when
the number of CPU > 2 * BITS_PER_LONG. The correct type for
_cpumask_scratch should be cpumask_var_t*.
Ross Lagerwall [Tue, 2 Jun 2015 11:44:24 +0000 (13:44 +0200)]
efi: fix allocation problems if ExitBootServices() fails
If calling ExitBootServices() fails, the required memory map size may
have increased. When initially allocating the memory map, allocate a
slightly larger buffer (by an arbitrary 8 entries) to fix this.
The ARM code path was already allocating a larger buffer than required,
so this moves the code to be common for all architectures.
This was seen on the following machine when using the iscsidxe UEFI
driver. The machine would consistently fail the first call to
ExitBootServices().
System Information
Manufacturer: Supermicro
Product Name: X10SLE-F/HF
BIOS Information
Vendor: American Megatrends Inc.
Version: 2.00
Release Date: 04/24/2014
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roy Franz <roy.franz@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
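The sizing rule described above amounts to the following arithmetic. The function and macro names are hypothetical; only the slack of 8 entries comes from the commit message.

```c
#include <stddef.h>

#define EXTRA_MAP_ENTRIES 8   /* arbitrary slack, as described above */

/*
 * Hypothetical helper: size of the buffer to allocate, given the size the
 * firmware reported for the current map and the size of one descriptor.
 */
static size_t map_alloc_size(size_t reported_size, size_t desc_size)
{
    return reported_size + EXTRA_MAP_ENTRIES * desc_size;
}
```

With the slack in place, a map that grows by a few entries between the first and second ExitBootServices() attempt still fits in the buffer allocated up front.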
Dario Faggioli [Tue, 2 Jun 2015 11:43:15 +0000 (13:43 +0200)]
sched_rt: print useful affinity info when dumping
In fact, printing the cpupool's CPU online mask
for each vCPU is just redundant, as that is the
same for all the vCPUs of all the domains in the
same cpupool, while hard affinity is already part
of the output of dumping domains info.
Instead, print the intersection between hard
affinity and online CPUs, which is --in case of this
scheduler-- the effective affinity always used for
the vCPUs.
This change also takes the chance to add a scratch
cpumask area, to avoid having to either put one
(more) cpumask_t on the stack, or dynamically
allocate it within the dumping routine. (The former
being bad because hypervisor stack size is limited,
the latter because dynamic allocations can fail, if
the hypervisor was built for a large enough number
of CPUs.) We allocate such scratch area, for all pCPUs,
when the first instance of the RTDS scheduler is
activated and, in order not to lose track/leak it
if other instances are activated in new cpupools,
and when the last instance is deactivated, we (sort
of) refcount it.
Such scratch area can be used to kill most of the
cpumasks{_var}_t local variables in other functions
in the file, but that is *NOT* done in this change.
Finally, convert the file to use keyhandler scratch,
instead of open coded string buffers.
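The "(sort of) refcount" scheme for the shared scratch area can be sketched as below. All names are hypothetical, each scratch slot is reduced to a single word per pCPU, and the locking that the real scheduler code would need is left out.

```c
#include <stdlib.h>

/* Hypothetical shared scratch area: one word per pCPU in this sketch. */
static unsigned long *scratch;
static unsigned int nr_users;   /* scheduler instances currently active */

/* Called when an instance is activated; allocates on first use only. */
static int scratch_get(unsigned int nr_cpus)
{
    if (nr_users == 0) {
        scratch = calloc(nr_cpus, sizeof(*scratch));
        if (!scratch)
            return -1;
    }
    nr_users++;
    return 0;
}

/* Called when an instance is deactivated; frees after the last user. */
static void scratch_put(void)
{
    if (nr_users && --nr_users == 0) {
        free(scratch);
        scratch = NULL;
    }
}
```

Allocating once up front is what lets the dump path avoid both a cpumask on the stack and an allocation that might fail at dump time.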
Andrew Cooper [Mon, 1 Jun 2015 10:00:18 +0000 (12:00 +0200)]
docs: clarification to terms used in hypervisor memory management
Memory management is hard[citation needed]. Furthermore, it isn't helped by
the inconsistent use of terms through the code, or that some terms have
changed meaning over time.
Describe the currently-used terms in a more practical fashion, so new code has
a concrete reference.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ross Lagerwall [Mon, 1 Jun 2015 09:59:14 +0000 (11:59 +0200)]
x86: don't crash when mapping a page using EFI runtime page tables
When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.
To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables which is treated equivalently to running in an idle
vCPU.
This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.
With Remus, the restore flow should be:
the first full migration stream -> { periodically restore stream }
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>