David Vrabel [Mon, 22 Jun 2015 09:38:01 +0000 (11:38 +0200)]
evtchn: defer freeing struct evtchn's until evtchn_destroy_final()
notify_via_xen_event_channel() and free_xen_event_channel() had to
check if the domain was dying because they may be called while the
domain is being destroyed and the struct evtchn's are being freed.
By deferring the freeing of the struct evtchn's until all references
to the domain are dropped, these functions can rely on the channel
state being present and valid.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
David Vrabel [Mon, 22 Jun 2015 09:36:17 +0000 (11:36 +0200)]
evtchn: clear xen_consumer when clearing state
Freeing a xen event channel would clear xen_consumer before clearing
the channel state, leaving a window where the channel is in a funny
state (still bound but no consumer).
Move the clear of xen_consumer into free_evtchn() where the state is
also cleared.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Ditch the pointless evtchn_close() wrapper around __evtchn_close()
(renaming the latter) as well as some bogus casts of function results
to void.
Jan Beulich [Mon, 22 Jun 2015 09:34:57 +0000 (11:34 +0200)]
x86/HVM: EOI handling function adjustments
The vector parameters are more usefully u8 right away. This is
particularly important for the vioapic_update_EOI() invocation from
vioapic_write() (which luckily is only a latent issue, as
VIOAPIC_VERSION_ID is still hard coded to 0x11 right now). But it at
once allows simplifying VMX's EXIT_REASON_EOI_INDUCED handling (the
kind of pointless helper function should have been static anyway; not
being used for anything else, it gets removed altogether).
Plus vlapic_handle_EOI() (now renamed for that purpose) can be used as
the tail of vlapic_EOI_set() instead of duplicating that code.
Finally replace a stray current->domain use in vlapic_handle_EOI().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Malcolm Crossley [Fri, 19 Jun 2015 09:01:24 +0000 (11:01 +0200)]
gnttab: use per-VCPU maptrack free lists
Performance analysis of aggregate network throughput with many VMs
shows that performance is significantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.
Instead of a single free list use a per-VCPU list. This avoids any
contention when obtaining a handle. Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).
Increase the default maximum number of maptrack frames by 4 times
because: a) struct grant_mapping is now 16 bytes (instead of 8); and
b) a guest may not evenly distribute all the grant map operations
across the VCPUs (meaning some VCPUs need more maptrack entries than
others).
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
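The per-VCPU free list idea from the maptrack commit above can be sketched roughly as follows. This is a single-threaded illustration only: all names (struct maptrack, maptrack_get(), maptrack_put(), TAIL_NONE, the fixed sizes) are hypothetical, and the atomic handling of the tail pointer that the commit message alludes to is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS   2u          /* hypothetical sizes for the sketch */
#define NR_HANDLES 8u
#define TAIL_NONE  UINT32_MAX  /* marks "no handle" / end of list */

struct maptrack_entry {
    uint32_t next;  /* next free handle on the same list, or TAIL_NONE */
    uint32_t vcpu;  /* the list this handle must be released back to */
};

struct vcpu_freelist {
    uint32_t head;  /* handles are taken from here by the owning VCPU */
    uint32_t tail;  /* handles are appended here, possibly by other VCPUs */
};

struct maptrack {
    struct maptrack_entry e[NR_HANDLES];
    struct vcpu_freelist fl[NR_VCPUS];
};

/* Release a handle to the tail of its original list, whoever calls this. */
void maptrack_put(struct maptrack *mt, uint32_t h)
{
    struct vcpu_freelist *fl;

    assert(h < NR_HANDLES);
    fl = &mt->fl[mt->e[h].vcpu];
    mt->e[h].next = TAIL_NONE;
    if (fl->tail == TAIL_NONE)
        fl->head = h;
    else
        mt->e[fl->tail].next = h;
    fl->tail = h;
}

/* Obtain a handle from the calling VCPU's own list; TAIL_NONE if empty. */
uint32_t maptrack_get(struct maptrack *mt, uint32_t vcpu)
{
    struct vcpu_freelist *fl;
    uint32_t h;

    assert(vcpu < NR_VCPUS);
    fl = &mt->fl[vcpu];
    h = fl->head;
    if (h == TAIL_NONE)
        return TAIL_NONE;
    fl->head = mt->e[h].next;
    if (fl->head == TAIL_NONE)
        fl->tail = TAIL_NONE;
    return h;
}

/* Distribute the handles round-robin across the per-VCPU lists. */
void maptrack_init(struct maptrack *mt)
{
    uint32_t v, h;

    for (v = 0; v < NR_VCPUS; v++)
        mt->fl[v].head = mt->fl[v].tail = TAIL_NONE;
    for (h = 0; h < NR_HANDLES; h++) {
        mt->e[h].vcpu = h % NR_VCPUS;
        maptrack_put(mt, h);
    }
}
```

In this sketch only maptrack_put() can touch another VCPU's list, which mirrors the remark that the remaining contention is on the destination list's tail pointer.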
Jan Beulich [Fri, 19 Jun 2015 08:59:53 +0000 (10:59 +0200)]
x86/MSI: track host and guest masking separately
In particular we want to avoid losing track of our own intention to
have an entry masked. Physical unmasking now happens only when both
host and guest requested so.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
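The masking rule stated in the x86/MSI commit above could be modelled along these lines. The structure and function names are hypothetical and are not taken from the Xen sources; the sketch only shows that an entry is physically unmasked when neither side wants it masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-entry state; field names are illustrative only. */
struct msi_mask_sketch {
    bool host_masked;   /* the hypervisor wants the entry masked */
    bool guest_masked;  /* the guest wants the entry masked */
};

/* The entry stays physically masked while either side asks for it. */
bool msi_physically_masked(const struct msi_mask_sketch *s)
{
    assert(s != NULL);
    return s->host_masked || s->guest_masked;
}

/* A guest unmask request alone never clears the host's intention. */
void msi_guest_set_mask(struct msi_mask_sketch *s, bool masked)
{
    assert(s != NULL);
    s->guest_masked = masked;
}
```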
Jan Beulich [Fri, 19 Jun 2015 08:58:45 +0000 (10:58 +0200)]
x86/MSI-X: cleanup
- __pci_enable_msix() now checks that an MSI-X capability was actually
found
- pass "pos" to msix_capability_init() as both callers already know it
(and hence there's no need to re-obtain it)
- call __pci_disable_msi{,x}() directly instead of via
pci_disable_msi() from __pci_enable_msi{x,}() state validation paths
- use msix_control_reg() instead of open coding it
- log message adjustments
- coding style corrections
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 14:44:15 +0000 (16:44 +0200)]
x86/HVM: avoid pointer wraparound in bufioreq handling
The number of slots per page being 511 (i.e. not a power of two) means
that the (32-bit) read and write indexes going beyond 2^32 will likely
disturb operation. Extend I/O req server creation so the caller can
indicate that it is using suitable atomic accesses where needed (not
all accesses to the two pointers really need to be atomic), allowing
the hypervisor to atomically canonicalize both pointers when both have
gone through at least one cycle.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
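To see why 511 slots and free-running 32-bit indexes do not mix, note that 2^32 is not a multiple of 511, so the slot sequence jumps when an index wraps. The following sketch shows the jump and a simplified, non-atomic stand-in for the canonicalization step described above. The function names are hypothetical, and the real code has to perform the update atomically.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_NUM 511u  /* slots per page, as stated in the commit message */

/* Slot selected by a free-running 32-bit index. */
uint32_t slot_of(uint32_t idx)
{
    return idx % SLOT_NUM;
}

/*
 * Hypothetical, non-atomic canonicalization: once both indexes have
 * completed at least one cycle, pull both back by whole cycles. The slot
 * each index selects and the distance between them are unchanged.
 */
void canonicalize(uint32_t *rd, uint32_t *wr)
{
    assert(rd != 0 && wr != 0);
    while (*rd >= SLOT_NUM && *wr >= SLOT_NUM) {
        *rd -= SLOT_NUM;
        *wr -= SLOT_NUM;
    }
}
```

With these definitions, index 0xFFFFFFFF selects slot 31 and the next index (0 after wrapping) selects slot 0 instead of slot 32, which is the disturbance the commit guards against by keeping the indexes small.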
Jan Beulich [Thu, 18 Jun 2015 14:42:56 +0000 (16:42 +0200)]
x86/HAP: prefer is_..._domain() over is_..._vcpu()
In hvm_hap_nested_page_fault() latch the current domain alongside the
current vCPU into a local variable, making use of it where possible
also beyond what the title says.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 13:07:10 +0000 (15:07 +0200)]
x86: synchronize PCI config space access decoding
Both PV and HVM logic have similar but not similar enough code here.
Synchronize the two so that
- in the HVM case we don't unconditionally try to access extended
config space
- in the PV case we pass a correct range to the XSM hook
- in the PV case we don't needlessly deny access when the operation
isn't really on PCI config space
All this along with sharing the macros HVM already had here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
There's no need for two exit paths each using rcu_unlock_domain() on
its own here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
David Vrabel [Thu, 18 Jun 2015 12:53:23 +0000 (14:53 +0200)]
evtchn: simplify port_is_valid()
By keeping a count of the number of currently valid event channels,
port_is_valid() can be simplified.
d->valid_evtchns is only increased (while holding d->event_lock), so
port_is_valid() may be safely called without taking the lock (this
will be useful later).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
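With a count of valid event channels available, the validity check in the commit above presumably reduces to a simple bounds comparison. A minimal sketch, assuming a cut-down domain structure with hypothetical names (the real function would also need a suitably atomic read of the counter):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down domain; only the counter from the commit message. */
struct domain_sketch {
    unsigned int valid_evtchns;  /* only ever increased */
};

/* A port is valid if it lies below the number of valid event channels. */
bool port_is_valid_sketch(const struct domain_sketch *d, unsigned int port)
{
    assert(d != NULL);
    return port < d->valid_evtchns;
}
```

Because the counter never decreases, a port that was once reported valid stays valid, which is what makes a lockless check plausible.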
Juergen Gross [Thu, 18 Jun 2015 12:52:32 +0000 (14:52 +0200)]
pvusb: don't rely on linux kernel macros for the interface
The interface description of pvUSB lacks some access macros as using
linux kernel macros is assumed to work well. This solution is rather
unfriendly for pvusb implementations being outside the linux kernel.
Additionally things will break quite unpleasantly in case the linux
kernel implementation is changed.
To avoid these problems define own macros for accessing bitfields of
the interface and for values of several structure members.
While working on the file add some more comments, especially for the
xenstore interface.
Wei Liu [Wed, 17 Jun 2015 19:39:49 +0000 (20:39 +0100)]
oxenstored: fix del_watches and del_transactions
The statement to reset nb_watches should be in del_watches, not
del_transactions.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: David Scott <dave.scott@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
[ ijc -- fix syntax error by adding a ";" to the previous line in the
new location and removing from the previous line in the old ]
Wei Liu [Wed, 17 Jun 2015 11:08:38 +0000 (12:08 +0100)]
libxl: refactor toolstack save restore code
This patch does the following things:
1. Document v1 format.
2. Factor out function to handle QEMU restore data and function to
handle v1 blob for restore path.
3. Refactor save function to generate different blobs in the order
specified in format specification.
4. Change functions to use "goto out" idiom.
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
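For readers unfamiliar with the "goto out" idiom mentioned in item 4 above, a generic sketch follows. It is not taken from libxl; copy_and_measure() is a hypothetical helper used only to show a single exit path that performs all cleanup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` into a temporary buffer and report its
 * length. Returns 0 on success and -1 on failure. Every path leaves
 * through the `out` label, so cleanup is written exactly once.
 */
int copy_and_measure(const char *src, size_t *len_out)
{
    int rc = -1;
    char *buf = NULL;

    assert(len_out != NULL);

    if (src == NULL)
        goto out;

    buf = malloc(strlen(src) + 1);
    if (buf == NULL)
        goto out;

    strcpy(buf, src);
    *len_out = strlen(buf);
    rc = 0;

 out:
    free(buf);  /* free(NULL) is a no-op, so this is safe on every path */
    return rc;
}
```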
Roger Pau Monne [Thu, 11 Jun 2015 16:05:20 +0000 (18:05 +0200)]
libxc: fix xc_dom_load_elf_symtab
xc_dom_load_elf_symtab was incorrectly trying to perform the same
calculations already done in elf_parse_bsdsyms when load == 0 is used.
Instead of trying to repeat the calculations, just trust what
elf_parse_bsdsyms has already accounted for.
This also simplifies the code by allowing the non-load case to return
earlier.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Mon, 15 Jun 2015 10:12:07 +0000 (11:12 +0100)]
tools/libxc: Batch memory allocations for PV guests
The current code for allocating memory for PV guests batches the
hypercalls to allocate memory by allocating 1024*1024 extents of order 0
at a time. To make this faster, first try allocating extents of order 9
(2 MiB) before falling back to the order 0 allocation if the order 9
allocation fails.
On my test machine this reduced the time to start a 128 GiB PV guest by
about 60 seconds.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
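The "try order 9 first, fall back to order 0" strategy described above can be sketched as follows. The allocator callback, the toy allocator and all names are hypothetical; the real code issues batched hypercalls rather than one call per extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUPERPAGE_ORDER 9u                        /* 2^9 4 KiB pages = 2 MiB */
#define SUPERPAGE_PAGES (1ul << SUPERPAGE_ORDER)  /* 512 pages */

/* Hypothetical callback: populate one extent of `order` at `pfn`. */
typedef bool (*alloc_extent_fn)(unsigned long pfn, unsigned int order,
                                void *ctx);

/*
 * Populate [0, nr_pages), preferring order 9 extents. Returns the number
 * of pages populated, which is short only if an order 0 request fails.
 */
unsigned long populate_sketch(unsigned long nr_pages, alloc_extent_fn alloc,
                              void *ctx)
{
    unsigned long pfn = 0;

    assert(alloc != NULL);
    while (pfn < nr_pages) {
        unsigned long chunk = nr_pages - pfn;
        unsigned long i;

        if (chunk > SUPERPAGE_PAGES)
            chunk = SUPERPAGE_PAGES;

        /* Fast path: one order 9 extent covers the whole chunk. */
        if (chunk == SUPERPAGE_PAGES && alloc(pfn, SUPERPAGE_ORDER, ctx)) {
            pfn += chunk;
            continue;
        }

        /* Fallback: populate the chunk page by page. */
        for (i = 0; i < chunk; i++)
            if (!alloc(pfn + i, 0, ctx))
                return pfn + i;
        pfn += chunk;
    }
    return pfn;
}

/* Toy allocator: grants a limited number of order 9 requests. */
struct toy_alloc {
    unsigned long superpages_left;
    unsigned long order9_ok;
    unsigned long order0_ok;
};

bool toy_alloc_extent(unsigned long pfn, unsigned int order, void *ctx)
{
    struct toy_alloc *t = ctx;

    (void)pfn;
    if (order == SUPERPAGE_ORDER) {
        if (t->superpages_left == 0)
            return false;
        t->superpages_left--;
        t->order9_ok++;
        return true;
    }
    t->order0_ok++;
    return true;
}
```

The saving comes from the fast path: each successful order 9 request replaces 512 order 0 requests.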
Wei Liu [Thu, 4 Jun 2015 10:23:01 +0000 (11:23 +0100)]
libxc: unify handling of vNUMA layout
This patch does the following:
1. Use local variables for dummy vNUMA layout in PV case.
2. Avoid leaking dummy layout back to caller in PV case.
3. Use local variables to reference vNUMA layout (whether it is dummy
or provided by caller) for both PV and HVM.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Jun 2015 10:44:50 +0000 (11:44 +0100)]
libxl: clean up qemu-save and qemu-resume files
These files are leaked when using qemu-trad stubdom. They are
intermediate files created by libxc. Unfortunately they don't fit well
in our userdata scheme. Clean them up after we destroy all userdata;
we're sure they are not useful anymore at that point.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:44 +0000 (16:30 +0000)]
xenalyze: remove argp_program_version
Since xenalyze is now upstream, it is Open Source and part of the given
release.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:43 +0000 (16:30 +0000)]
xenalyze: remove trailing whitespaces
Result of "sed -i 's@[[:blank:]]\+$@@' tools/xentrace/xenalyze.c"
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:41 +0000 (16:30 +0000)]
xenalyze: handle TRC_TRACE_WRAP_BUFFER
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:40 +0000 (16:30 +0000)]
xenalyze: include odd mmio states in default output
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:39 +0000 (16:30 +0000)]
xenalyze: print newline after unknown hvm events
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:38 +0000 (16:30 +0000)]
xenalyze: add to tools/xentrace/
This merges xenalyze.hg, changeset 150:24308507be1d,
into tools/xentrace/xenalyze.c to have the tool and
public/trace.h in one place.
Adjust code to use public/trace.h instead of private trace.h
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- wrap $(BIN) install in a check in case it is empty (which it
is on !x86); avoid BIN += since it results in BIN = ' ' on
!x86 ]
Jan Beulich [Tue, 16 Jun 2015 10:29:18 +0000 (12:29 +0200)]
gnttab: make struct grant_mapping private
This documents that no entity outside of gnttab.c actually accesses
objects of that type, which is particularly important with the now more
fine grained locking in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:28:11 +0000 (12:28 +0200)]
gnttab: fix/adjust gnttab_transfer()
- don't update shared entry's frame number for translated domains (as
MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit 5cc77f9098 ("32-on-64: Fix domain address-size clamping, implement")
uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:26:03 +0000 (12:26 +0200)]
gnttab: simplify shared entry v1 vs v2 handling
In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:25:35 +0000 (12:25 +0200)]
gnttab: limit mapcount() looping
The function doesn't need to return counts in the first place; all its
callers are after is whether at least one entry of a certain kind
exists. With that there's no point for that loop to continue once the
looked for condition was found to be met by one entry. Rename the
function to match the changed behavior.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
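The change from counting to an early-exit existence check can be illustrated as below. The structure, flag values and function names are hypothetical and do not reflect the actual gnttab code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_READONLY 0x1u  /* hypothetical flag values for this sketch */
#define MAP_WRITABLE 0x2u

struct mapping_sketch {
    unsigned long mfn;
    unsigned int flags;
};

/* Before: walk every entry, although callers only test for non-zero. */
size_t count_maps(const struct mapping_sketch *m, size_t n,
                  unsigned long mfn, unsigned int flag)
{
    size_t i, count = 0;

    assert(m != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (m[i].mfn == mfn && (m[i].flags & flag))
            count++;
    return count;
}

/* After: stop as soon as one matching entry has been found. */
bool map_exists(const struct mapping_sketch *m, size_t n,
                unsigned long mfn, unsigned int flag)
{
    size_t i;

    assert(m != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (m[i].mfn == mfn && (m[i].flags & flag))
            return true;
    return false;
}
```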
Jan Beulich [Tue, 16 Jun 2015 10:24:49 +0000 (12:24 +0200)]
gnttab: eliminate several explicit version checks
By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at once fix
grant_map_exists() running into the being removed ASSERT() when called
for a page owned by a domain not having its grant table set up yet.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
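Returning zero entries for an unset version lets an ordinary range check reject every reference, so separate version checks become unnecessary. A rough sketch, assuming 4 KiB frames and entry sizes of 8 bytes (v1) and 16 bytes (v2); the structure and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FRAME_SIZE    4096u
#define V1_ENTRY_SIZE 8u    /* size of a v1 shared entry */
#define V2_ENTRY_SIZE 16u   /* size of a v2 shared entry */

/* Hypothetical cut-down grant table. */
struct grant_table_sketch {
    unsigned int gt_version;  /* 0 = not set up yet, otherwise 1 or 2 */
    unsigned int nr_frames;
};

/* Number of grant entries; zero while the version is still unset. */
unsigned int nr_grant_entries_sketch(const struct grant_table_sketch *gt)
{
    assert(gt != NULL);
    switch (gt->gt_version) {
    case 1:
        return gt->nr_frames * (FRAME_SIZE / V1_ENTRY_SIZE);
    case 2:
        return gt->nr_frames * (FRAME_SIZE / V2_ENTRY_SIZE);
    default:
        return 0;
    }
}

/* With the zero return, this one check also covers the unset case. */
bool grant_ref_in_range(const struct grant_table_sketch *gt, unsigned int ref)
{
    return ref < nr_grant_entries_sketch(gt);
}
```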
David Vrabel [Mon, 15 Jun 2015 11:25:20 +0000 (13:25 +0200)]
gnttab: make the grant table lock a read-write lock
In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since in the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).
The write lock is also required when two active entries must be
acquired.
The double lock is still required when updating IOMMU page tables.
With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Thu, 11 Jun 2015 16:56:15 +0000 (17:56 +0100)]
libxl: libxl_internal.h: Clarify ao rule against internal callers
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Juergen Gross <jgross@suse.com>
Ross Lagerwall [Fri, 12 Jun 2015 10:07:05 +0000 (12:07 +0200)]
x86: avoid tripping watchdog when constructing dom0
Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at once clean up other includes).
The fix is, when tearing down a pCPU, call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
Jan Beulich [Thu, 11 Jun 2015 12:47:54 +0000 (14:47 +0200)]
EFI: map allocation size must be set to zero
Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.
Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 11 Jun 2015 09:55:05 +0000 (11:55 +0200)]
VT-d: extend quirks to newer desktop chipsets
We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Don Dugger <donald.d.dugger@intel.com>
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
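On a 32-bit build uint64_t and unsigned long are different types, which is what a type-checking min() objects to. One portable way to sidestep such a check is to widen both operands explicitly before comparing. The sketch below illustrates that approach only; it is not necessarily the change made by the commit above, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands have the same type here, whatever the width of long. */
uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical caller: clamp a 64-bit request to a limit that is held in
 * an unsigned long. The result never exceeds the limit, so narrowing it
 * back to unsigned long cannot lose information.
 */
unsigned long clamp_to_limit(uint64_t requested, unsigned long limit)
{
    uint64_t r = min_u64(requested, (uint64_t)limit);

    assert(r <= limit);
    return (unsigned long)r;
}
```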
Jan Beulich [Wed, 10 Jun 2015 10:05:21 +0000 (12:05 +0200)]
x86/EFI: adjust EFI_MEMORY_WP handling for spec version 2.5
That flag now means cachability rather than protection, and a new flag
EFI_MEMORY_RO got added in its place.
Along with EFI_MEMORY_RO also add the two other new EFI_MEMORY_*
definitions, even if we don't need them right away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We also alter the 'efi-rs' to be 'efi=rs' or 'efi=no-rs'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ross Lagerwall [Wed, 10 Jun 2015 09:57:18 +0000 (11:57 +0200)]
efi: avoid calling boot services after ExitBootServices()
After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices() and so are called via
SystemTable->BootServices.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 14:00:24 +0000 (16:00 +0200)]
kexec: add more pages to v1 environment
Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).
Further, without the tools mapping the low 1Mb (expected by at least
some Linux version), we need to do so in the hypervisor in the v1 case.
Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 9 Jun 2015 13:59:31 +0000 (15:59 +0200)]
x86: adjust PV I/O emulation functions' types
admin_io_okay(), guest_io_read(), and guest_io_write() all don't need
their current "regs" parameter at all, and they don't use the vCPU
passed to them for other than obtaining its domain. Drop the former and
replace the latter by a struct domain pointer.
pci_cfg_okay() returns a boolean type, and its "write" parameter is of
boolean kind too.
All of them get called for the current vCPU (and hence current domain)
only, so name the domain parameters accordingly except in the
admin_io_okay() case, which a subsequent patch will use for simplifying
setup_io_bitmap().
Latch current->domain into a local variable in emulate_privileged_op().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 8 Jun 2015 12:41:25 +0000 (14:41 +0200)]
x86/mm: print domain IDs instead of pointers
Printing pointers to struct domain isn't really useful for initial
problem analysis. In get_page() also drop the page only after issuing
the log message, so that at the time of printing the state can be
considered reasonably consistent.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 8 Jun 2015 12:16:27 +0000 (14:16 +0200)]
x86/setup: move CPU0s stack out of the Xen text/data/bss virtual region
Currently, the BSP's stack is the BSS symbol cpu0_stack. In builds using
memguard_stack(), a page gets shot out of the mappings.
To avoid shattering the superpage which will eventually map the BSS, use the
directmap virtual address of cpu0_stack, while still using the same underlying
physical memory. (Xen has an order 21 physical relocation requirement meaning
that the order 3 alignment requirement for cpu0_stack will be honoured even
via its directmap mapping.)
In addition, fix two issues exposed by the changes.
* do_invalid_op() should use is_active_kernel_text() rather than having its
own, different, idea of when to search through the bugframes.
* Setting of system_state to active needs to be deferred until after code has
left .init.text, for bugframes/backtraces to function in reinit_bsp_stack().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 8 Jun 2015 12:15:59 +0000 (14:15 +0200)]
x86: misc boot/link tweaking
* Introduce symbols bounding the multiboot1 header, which helps clarify that
it is data and not code corruption when viewing the disassembly.
* Move the __high_start symbol to its implementation, and declare it
correctly as ENTRY()
* Move the l1_identmap construction to be with all the other pagetables, and
within __page_tables_{start,end}. This won't affect the EFI relocation
algorithm, as l1_identmap contains no relocations.
* Move the cpu0_stack alignment check to the linker. Chances are very good
that a binary with a misaligned stack won't get as far as the test.
* Use MB() in linker script.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 25 May 2015 20:44:20 +0000 (21:44 +0100)]
xen/arm: vgic-v3: Clean the emulation of IROUTER
The read emulation of the register IROUTER contains lots of unnecessary
code as irouter is already valid and doesn't need any processing before
setting the value in a register.
Also take the opportunity to factorize the code to find a vCPU from the
affinity in a single place. It will be easier to change the way to do it
later.
Razvan Cojocaru [Fri, 5 Jun 2015 10:20:18 +0000 (12:20 +0200)]
vm_event: clean up control-register-write vm_events and add XCR0 event
As suggested by Andrew Cooper, this patch attempts to remove
some redundancy and allow for an easier time when adding vm_events
for new control registers in the future, by having a single
VM_EVENT_REASON_WRITE_CTRLREG vm_event type, meant to serve CR0,
CR3, CR4 and (newly introduced) XCR0. The actual control register
will be deduced by the new .index field in vm_event_write_ctrlreg
(renamed from vm_event_mov_to_cr).
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
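The idea of one event type carrying an index, instead of one event type per control register, might look roughly like this. The enum values, the structure layout and the function names are hypothetical; the authoritative definitions are in Xen's public vm_event header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index values identifying the written register. */
enum ctrlreg_index_sketch { IDX_CR0, IDX_CR3, IDX_CR4, IDX_XCR0, IDX_COUNT };

/* Hypothetical payload of the single "control register written" event. */
struct write_ctrlreg_sketch {
    uint32_t index;      /* which control register was written */
    uint64_t new_value;
    uint64_t old_value;
};

/* Build an event; supporting a new register only needs a new index. */
struct write_ctrlreg_sketch make_ctrlreg_event(uint32_t index,
                                               uint64_t old_value,
                                               uint64_t new_value)
{
    struct write_ctrlreg_sketch ev;

    assert(index < IDX_COUNT);
    ev.index = index;
    ev.new_value = new_value;
    ev.old_value = old_value;
    return ev;
}

/* A consumer dispatches on the index rather than on the event type. */
const char *ctrlreg_name(uint32_t index)
{
    static const char *const names[IDX_COUNT] = {
        "CR0", "CR3", "CR4", "XCR0"
    };

    return index < IDX_COUNT ? names[index] : "unknown";
}
```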
Jan Beulich [Fri, 5 Jun 2015 10:09:18 +0000 (12:09 +0200)]
x86/paging: remove pointless current domain checks
Checking that the subject domain is not the current one is pointless
when already having paused that domain: domain_pause() already
ASSERT()s this to be the case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Daniel Kiper [Tue, 2 Jun 2015 13:33:26 +0000 (15:33 +0200)]
tools: link executables with libtinfo explicitly
binutils 2.22 changed ld default from --copy-dt-needed-entries
to --no-copy-dt-needed-entries. This revealed that some objects
are linked implicitly with libtinfo and newer ld fails to build
relevant executables.
Below is a short explanation why we should not do that...
The default behaviour for ld (my note: before version 2.22) allows
users to 'indirectly' link to required objects/libraries through
intermediate objects/libraries. While this is convenient, it can
also be dangerous because it makes your program's dependencies tied
to the dependencies of other objects. If those objects ever change
their linkages, they can break your program without any changes
to your own code!
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Julien Grall [Wed, 6 May 2015 18:52:30 +0000 (19:52 +0100)]
xen/arm: gic-hip04: Resync the driver with the GICv2
The GIC hip04 driver was differing from GICv2. I suspect that some of
the changes in the common GIC code make boot fail on hip04. However, I
don't have a platform to check on, so it has only been build tested.
List of GICv2 commits ported to the HIP04:
commit ce12e6dba4b2d120e35dffd95a745452224e7144
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Fri Apr 10 16:21:10 2015 +1000
xen/arm: Don't write to GICH_MISR
GICH_MISR is read-only in GICv2.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
commit 2eb4f996547dc632aa94b2b7b4f783bec8ffe457
Author: Julien Grall <julien.grall@linaro.org>
Date: Wed Apr 1 17:21:47 2015 +0100
xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts
GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
IRQs 1020-1023 are reserved for special purposes.
The result is used by the callers of gic_number_lines in order to check
the validity of an IRQ.
Currently the function to translate IRQ from the device tree is set
unconditionally to be able to retrieve serial/timer IRQ
before the GIC has been initialized.
It assumes that the xlate function won't ever change. We may also need
to have the primary interrupt controller very early.
Rework the gic initialization in 2 parts:
- gic_preinit: Get the interrupt controller device tree node and
set up GIC and xlate callbacks
- gic_init: Initialize the interrupt controller and the boot CPU
interrupts.
The former function will be called just after the IRQ subsystem has been
initialized.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Frediano Ziglio <frediano.ziglio@huawei.com> Cc: Zoltan Kiss <zoltan.kiss@huawei.com> Signed-off-by: Julien Grall <julien.grall@citrix.com> Cc: Zoltan Kiss <zoltan.kiss@huawei.com> Reviewed-by: Zoltan Kiss <zoltan.kiss@huawei.com> Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
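The 1020-interrupt limit quoted in the ported GICv2 commit above follows from how GICD_TYPER.ITLinesNumber is decoded: the 5-bit field N describes 32 * (N + 1) lines, which reaches 1024 for N = 31, while IDs 1020-1023 are reserved. A small sketch of that arithmetic, with hypothetical function and macro names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TYPER_LINES_MASK 0x1fu  /* ITLinesNumber, bits [4:0] of GICD_TYPER */
#define GIC_MAX_LINES    1020u  /* IDs 1020-1023 are reserved */

/* Number of usable interrupt lines described by a GICD_TYPER value. */
unsigned int gic_nr_lines_sketch(uint32_t typer)
{
    unsigned int lines = 32u * ((typer & TYPER_LINES_MASK) + 1u);
    unsigned int capped = lines > GIC_MAX_LINES ? GIC_MAX_LINES : lines;

    assert(capped <= GIC_MAX_LINES);
    return capped;
}

/* Callers can then validate an IRQ number against the capped count. */
bool gic_irq_in_range(uint32_t typer, unsigned int irq)
{
    return irq < gic_nr_lines_sketch(typer);
}
```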
Wei Liu [Mon, 1 Jun 2015 17:24:35 +0000 (18:24 +0100)]
libxl: remove code in stubdom creation failure path and callback
The snippet to destroy stubdom and the callback were added in 1fc3aeb3
("libxl: use new QEMU xenstore protocol"). The intention was to destroy
stubdom when it is not responsive. That approach is problematic because
rc is not propagated back to sdss->callback, hence the guest is leaked.
The solution is simple. The destruction of stubdom can be done later in
sdss->callback. That code path already does the right thing to destroy
both the guest and the stubdom that serves the guest.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:14 +0000 (11:19 +0100)]
libxl: fix HVM vNUMA
This patch does two things:
The original code erroneously fills in xc_hvm_build_args before
generating vmemranges. The effect is that guest memory is populated
without vNUMA information. Move the hunk to right place to fix this.
Move the subtraction of video ram to libxl__vnuma_build_vmemrange_hvm
because it's the central place for generating vmemranges.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:13 +0000 (11:19 +0100)]
libxc: rework vnuma bits in setup_guest
Make the setup process similar to its PV counterpart. That is, to allocate a
P2M array that covers the whole memory range and start from there. This
is clearer than using an array with no holes in it.
Also the dummy layout should take MMIO hole into consideration. We might
end up having two vmemranges in the dummy layout.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:12 +0000 (11:19 +0100)]
libxc: print more error messages when failed
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Jun 2015 10:19:11 +0000 (11:19 +0100)]
libxc/libxl: fill xc_hvm_build_args in libxl
When building HVM guests, originally some fields of xc_hvm_build_args
are filled in xc_hvm_build (and buried in the wrong function), some are
set in libxl__build_hvm before passing xc_hvm_build_args to
xc_hvm_build. This is fragile.
After examining the code in xc_hvm_build that sets those fields, we can
in fact move setting of mmio_start etc in libxl. This way we consolidate
memory layout setting in libxl.
The setting of firmware data related fields is left in xc_hvm_build
because it depends on parsing ELF image. Those fields only point to
scratch data that doesn't affect memory layout.
There should be no change in the generated guest memory layout. But the
semantics of xc_hvm_build have changed. Toolstacks built directly on
top of libxc need to adjust to this change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: "Chen, Tiejun" <tiejun.chen@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Tue, 26 May 2015 18:13:28 +0000 (14:13 -0400)]
xen/flask: change bool_maxstr to PAGE_SIZE
When FLASK_{GET,SET}BOOL is called with a named boolean, the call to
flask_security_resolve_bool is made prior to bool_maxstr being populated
by flask_security_make_bools. This results in the maximum string length
being specified as zero, which is not useful. While it would be
possible to initialize bool_maxstr correctly prior to its use, it is
simpler to use a fixed maximum of PAGE_SIZE as is done for the other
calls to safe_copy_string_from_guest.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Tue, 26 May 2015 18:13:27 +0000 (14:13 -0400)]
flask/policy: updates from osstest runs
Migration and HVM domain creation both trigger AVC denials that should
be allowed in the default policy; add these rules.
Guest console writes need to be either allowed or denied without audit
depending on the decision of the local administrator; introduce a policy
boolean to switch between these possibilities.
Reported-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Sat, 23 May 2015 08:24:10 +0000 (08:24 +0000)]
xentrace: install into sbin
Collecting the trace buffer requires root permissions. Adjust Makefile
to install xentrace and xentrace_setsize into sbindir. Leave the
existing support for BIN in place for upcoming changes.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>