Jan Beulich [Thu, 8 Oct 2015 09:38:44 +0000 (11:38 +0200)]
x86/p2m-pt: tighten conditions of IOMMU mapping updates
Whether the MFN changes does not depend on the new entry being valid
(but solely on the old one), and the need to update or TLB-flush also
depends on permission changes.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 660fd65d5578a95ec5eac522128bba23325179eb
master date: 2015-10-02 13:40:36 +0200
Dario Faggioli [Thu, 8 Oct 2015 09:38:05 +0000 (11:38 +0200)]
credit1: fix tickling when it happens from a remote pCPU
especially if that is also from a different cpupool than the
processor of the vCPU that triggered the tickling.
In fact, it is possible that we get as far as calling vcpu_unblock()-->
vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
but all while running on a pCPU that is different from 'vc->processor'.
For instance, this can happen when an HVM domain runs in a cpupool,
with a different scheduler than the default one, and issues IOREQs
to Dom0, running in Pool-0 with the default scheduler.
In fact, right in this case, the following crash can be observed:
In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
woken is. pCPU's 7 pool has a different scheduler than credit, but it
is, however, right from pCPU 7 that we are waking the Dom0's vCPUs.
Therefore, the current code tries to access csched_balance_mask for
pCPU 7, but that is not defined, and hence the Oops.
(Note that, in case the two pools run the same scheduler we see no
Oops, but things are still conceptually wrong.)
Cure things by making the csched_balance_mask macro accept a
parameter for fetching a specific pCPU's mask (instead than always
using smp_processor_id()).
Quan Xu [Thu, 8 Oct 2015 09:34:56 +0000 (11:34 +0200)]
vt-d: fix IM bit mask and unmask of Fault Event Control Register
Bit 0:29 in Fault Event Control Register are 'Reserved and Preserved',
software cannot write 0 to it unconditionally. Software must preserve
the value read for writes.
Signed-off-by: Quan Xu <quan.xu@intel.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
vt-d: fix IM bit unmask of Fault Event Control Register in init_vtd_hw()
Bit 0:29 in Fault Event Control Register are 'Reserved and Preserved',
software cannot write 0 to it unconditionally. Software must preserve
the value read for writes.
xen/xsm: Make p->policyvers be a local variable (ver) to shut up GCC 5.1.1 warnings.
policydb.c: In function ‘user_read’:
policydb.c:1443:26: error: ‘buf[2]’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
usrdatum->bounds = le32_to_cpu(buf[2]);
^
cc1: all warnings being treated as errors
Which (as Andrew mentioned) is because GCC cannot assume
that 'p->policyvers' has the same value between checks.
We make it local, optimize the name to 'ver' and the warnings go away.
We also update another call site with this modification to
make it more inline with the rest of the functions.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 6a2f81459e1455d65a9a6f78dd2a0d0278619680
master date: 2015-09-22 12:09:03 -0400
xen/arm: vgic-v2: Map the GIC virtual CPU interface with the correct size
On GICv2, the GIC virtual CPU interface is at minimum 8KB. Due some to
some necessary quirk for GIC using 64KB stride, we are mapping the
region in 2 time.
The first mapping is 4KB and the second one is 8KB, i.e 12KB in total.
Although the minimum supported size (and widely used) is 8KB. This means
that we are mapping 4KB more to any guest using GICv2.
While this looks scary at first glance, the GIC virtual CPU interface is
most frequently at the end the GIC I/O region. So we will most likely
map an an unused I/O region or a mirrored version of GICV for platform
using 64KB stride.
Nonetheless, fix the second mapping to only map 4KB.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(Backported from 493a67ee4a3fd9420e94fa2cf108e2a27961202b)
xen/arm: vgic: Correctly emulate write when byte is used
When a guest is writing a byte, the value will be located in bits[7:0]
of the register.
Although the current implementation is expecting the byte at the Nth
byte of the register where N = address & 4;
When the address is not 4-byte aligned, the corresponding byte in the
internal state will always be set to zero rather.
Note that byte access are only used for GICD_IPRIORITYR and
GICD_ITARGETSR. So the worst things that could happen is not setting the
priority correctly and ignore the target vCPU written.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3f214fea76acc6cbc1101fe1815cee795483a67d)
Ian Campbell [Thu, 16 Jul 2015 08:50:07 +0000 (09:50 +0100)]
xen: arm: bootfdt: Avoid reading off the front of *_cells array
In device_tree_for_each_node the call to the callback was using
{address,size}_cells[depth - 1], which at depth 0 could read off the
front of the array.
We already handled this correctly in the rest of the loop so fixup
this instance as well.
Reported-by: Chris (Christopher) Brand <chris.brand@broadcom.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Chris (Christopher) Brand <chris.brand@broadcom.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
(cherry picked from commit 989f3939bd16a0e1669c179b6c5c264812a25fc2)
Ian Campbell [Mon, 30 Mar 2015 11:12:24 +0000 (12:12 +0100)]
xen: arm: handle accesses to CNTP_CVAL_EL0
All OSes we have run on top of Xen use CNTP_TVAL_EL0 but for
completeness we really should handle CVAL too.
In vtimer_emulate_cp64 pull the propagation of the 64-bit result into
two 32-bit registers out of the switch to avoid duplicating for every
register. We also need to initialise x now since previously the only
register implemented register was R/O.
While adding HSR_SYSREG_CNTP_CVAL_EL0 also move
HSR_SYSREG_CNTP_CTL_EL0 so it is sorted correctly.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 9cfa8ebe2e82d38c2e0c32fa23ea920a43e414ca)
Ian Campbell [Mon, 30 Mar 2015 11:12:23 +0000 (12:12 +0100)]
xen: arm: correctly handle vtimer traps from userspace
Previously 32-bit userspace on 32-bit kernel and 64-bit userspace on
64-bit kernel could access these registers irrespective of whether the
kernel had configured them to be allowed to. To fix this:
- Userspace access to CNTP_CTL_EL0 and CNTP_TVAL_EL0 should be gated
on CNTKCTL_EL1.EL0PTEN.
- Userspace access to CNTPCT_EL0 should be gated on
CNTKCTL_EL1.EL0PCTEN.
When we do not handle an access we now silently inject an undef even
in debug mode since the debugging messages are not helpful (we have
handled the access, by explicitly choosing not to).
The usermode accessibility check is rather repetitive, so a helper
macro is introduced.
Since HSR_EC_CP15_64 cannot be taken from a guest in AArch64 mode
except due to a hardware bug switch the associated check to a BUG_ON
(which will be switched to something more appropriate in a subsequent
patch)
Fix a coding style issue in HSR_CPREG64(CNTPCT) while touching similar
code.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit d306211e2131eb2d160522f21f21fceaa9dd054c)
Andrew Cooper [Wed, 23 Sep 2015 07:08:49 +0000 (09:08 +0200)]
x86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)
There is no current problem, as both NCAPINTS and pi->hw_cap are 8 entries,
but the limit should be calculated appropriately so as to avoid hypervisor
stack corruption if the two do get out of sync.
Jan Beulich [Wed, 23 Sep 2015 07:07:52 +0000 (09:07 +0200)]
x86/p2m: fix mismatched unlock
Luckily, due to gfn_unlock() currently mapping to p2m_unlock(), this is
only a cosmetic issue right now.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 1f180822ad3fe83fe293393ec175f14ded98f082
master date: 2015-09-14 13:39:19 +0200
The ACPI PM timer is sometimes broken on live migration.
Since vcpu->arch.hvm_vcpu.guest_time is always zero in other than
"delay for missed ticks mode". Even in "delay for missed ticks mode",
vcpu's guest_time field is not valid (i.e. zero) when
the state of vcpu is "blocked". (see pt_save_timer function)
The original author (Tim Deegan) of pmtimer_save() must have intended
that it saves the last scheduled time of the vcpu. Unfortunately it was
already implied this bug. FYI, there is no other timer mode than
"delay for missed ticks mode" then.
For consistency with HPET, pmtimer_save() should refer hvm_get_guest_time()
to update the counter as well as hpet_save() does.
Without this patch, the clock of windows server 2012R2 without HPET
might leap forward several minutes on live migration.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Retain use of ->arch.hvm_vcpu.guest_time when non-zero. Do the inverse
adjustment for vHPET.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Kouya Shimura <kouya@jp.fujitsu.com>
master commit: 244582a01dcb49fa30083725964a066937cc94f2
master date: 2015-09-11 16:24:56 +0200
Objects loaded by FileHandle->Read need to be flushed from dcache,
otherwise copy_from_paddr will read stale data when copying the kernel,
causing a failure to boot.
Introduce efi_arch_flush_dcache_area and call it from read_file.
This commit introduces no functional changes on x86.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 0d6a3c755374f04f6dd25373da28291a8f35bede
master date: 2015-09-09 15:29:06 +0200
The current libxl code doesn't deal with read-only drives at all.
Upstream QEMU and qemu-xen only support read-only cdrom drives: make
sure to specify "readonly=on" for cdrom drives and return error in case
the user requested a non-cdrom read-only drive.
This is XSA-142, discovered by Lin Liu
(https://bugzilla.redhat.com/show_bug.cgi?id=1257893).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Backport to Xen 4.5 and earlier, apropos of report and review from
Michael Young.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.
On a busy host, QEMU may take more than 10s to load and start.
This is likely due to a bug in Linux where the I/O subsystem sometime
produce high latency under load and result in QEMU taking a long time to
load every single dynamic libraries.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 9acfbe14d7261b03e3b3f4dc3c850ba2a7093e1f)
Wei Liu [Tue, 14 Jul 2015 16:41:10 +0000 (17:41 +0100)]
xl: correct handling of extra_config in main_cpupoolcreate
Don't dereference extra_config if it's NULL. Don't leak extra_config in
the end.
Also fixed a typo in error string while I was there.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 705c9e12426cba82804cb578fc70785281655d94)
Jan Beulich [Thu, 10 Sep 2015 13:36:12 +0000 (15:36 +0200)]
x86/NUMA: make init_node_heap() respect Xen heap limit
On NUMA systems, where we try to use node local memory for the basic
control structures of the buddy allocator, this special case needs to
take into consideration a possible address width limit placed on the
Xen heap. In turn this (but also other, more abstract considerations)
requires that xenheap_max_mfn() not be called more than once (at most
we might permit it to be called a second time with a larger value than
was passed the first time), and be called only before calling
end_boot_allocator().
While inspecting all the involved code, a couple of off-by-one issues
were found (and are being corrected here at once):
- arch_init_memory() cleared one too many page table slots
- the highmem_start based invocation of xenheap_max_mfn() passed too
big a value
- xenheap_max_mfn() calculated the wrong bit count in edge cases
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm64: do not (incorrectly) limit size of xenheap
The xenheap bits variable is used to know the last RAM MFN always mapped
in Xen virtual memory. If the value is 0, it means that all the memory is
always mapped in Xen virtual memory.
On X-gene the RAM bank resides above 128GB and last xenheap MFN is
0x4400000. With the new way to calculate the number of bits, xenheap_bits
will be equal to 38 bits. This will result to hide all the RAM and the
impossibility to allocate xenheap memory.
Given that aarch64 have always all the memory mapped in Xen virtual
memory, it's not necessary to call xenheap_max_mfn which set the number
of bits.
Jan Beulich [Thu, 10 Sep 2015 13:35:11 +0000 (15:35 +0200)]
x86/NUMA: don't account hotplug regions
... except in cases where they really matter: node_memblk_range[] now
is the only place all regions get stored. nodes[] and NODE_DATA() track
present memory only. This improves the reporting when nodes have
disjoint "normal" and hotplug regions, with the hotplug region sitting
above the highest populated page. In such cases a node's spanned-pages
value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
covered all the way up to top of populated memory, giving quite
different a picture from what an otherwise identically configured
system without and hotplug regions would report. Note, however, that
the actual hotplug case (as well as cases of nodes with multiple
disjoint present regions) is still not being handled such that the
reported values would represent how much memory a node really has (but
that can be considered intentional).
Reported-by: Jim Fehlig <jfehlig@suse.com>
This at once makes nodes_cover_memory() no longer consider E820_RAM
regions covered by SRAT hotplug regions.
Also reject self-overlaps with mismatching hotplug flags.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
master commit: c011f470e6e79208f5baa071b4d072b78c88e2ba
master date: 2015-08-31 13:52:24 +0200
Jan Beulich [Thu, 10 Sep 2015 13:34:26 +0000 (15:34 +0200)]
x86/NUMA: fix setup_node()
The function referenced an __initdata object (nodes_found). Since this
being a node mask was more complicated than needed, the variable gets
replaced by a simple counter. Check at once that the count of nodes
doesn't go beyond MAX_NUMNODES.
Also consolidate three printk()s related to the function's use into just
one.
Finally (quite the opposite of the above issue) __init-annotate
nodes_cover_memory().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8f945d36d9bddd5b589ba23c7322b30d623dd084
master date: 2015-08-31 13:51:52 +0200
Jan Beulich [Thu, 10 Sep 2015 13:33:12 +0000 (15:33 +0200)]
x86/IO-APIC: don't create pIRQ mapping from masked RTE
While moving our XenoLinux patches to 4.2-rc I noticed bogus "already
mapped" messages resulting from Linux (legitimately) writing RTEs with
only the mask bit set. Clearly we shouldn't even attempt to create a
pIRQ <-> IRQ mapping from such RTEs.
In the course of this I also found that the respective message isn't
really useful without also printing the pre-existing mapping. And I
noticed that map_domain_pirq() allowed IRQ0 to get through, despite us
never allowing a domain to control that interrupt.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 669d4b85c433674ab3b52ef707af0d3a551c941f
master date: 2015-08-25 16:18:31 +0200
x86, amd_ucode: skip microcode updates for final levels
Some of older[Fam10h] systems require that certain number of
applied microcode patch levels should not be overwritten by
the microcode loader. Otherwise, system hangs are known to occur.
The 'final_levels' of patch ids have been obtained empirically.
Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
for details of the issue.
The short version is that people have predominantly noticed
system hang issues when trying to update microcode levels
beyond the patch IDs below.
[0x01000098, 0x0100009f, 0x010000af]
From internal discussions, we gathered that OS/hypervisor
cannot reliably perform microcode updates beyond these levels
due to hardware issues. Therefore, we need to abort microcode
update process if we hit any of these levels.
In this patch, we check for those microcode versions and abort
if the current core has one of those final patch levels applied
by the BIOS
A linux version of the patch has already made it into tip-
http://marc.info/?l=linux-kernel&m=143703405627170
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 22c5675877c8209adcfdb6bceddb561320374529
master date: 2015-08-25 16:17:13 +0200
mm: populate_physmap: validate correctly the gfn for direct mapped domain
Direct mapped domain has already the memory allocated 1:1, so we are
directly using the gfn as mfn to map the RAM in the guest.
While we are validating that the page associated to the first mfn belongs to
the domain, the subsequent MFN are not validated when the extent_order
is > 0.
This may result to map memory region (MMIO, RAM) which doesn't belong to the
domain.
Although, only DOM0 on ARM is using a direct memory mapped. So it
doesn't affect any guest (at least on the upstream version) or even x86.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 9503ab0e9c6a41a1ee7a70c8ea9313d08ebaa8c5
master date: 2015-08-13 14:41:09 +0200
A domain with sufficient shadow allocation can cause a watchdog timeout
during domain destruction. Expand the existing -ERESTART logic in
paging_teardown() to allow {hap/sh}_set_allocation() to become
restartable during the DOMCTL_destroydomain hypercall.
Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: 0174da5b79752e2d5d6ca0faed89536e8f3d91c7
master date: 2015-08-06 10:04:43 +0100
Andrew Cooper [Thu, 10 Sep 2015 13:29:31 +0000 (15:29 +0200)]
x86/gdt: Drop write-only, xalloc()'d array from set_gdt()
It is not used, and can cause a spurious failure of the set_gdt() hypercall in
low memory situations.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: a7bd9b1661304500cd18b7d216d616ecf053ebdb
master date: 2015-08-05 10:32:45 +0100
Julien Grall [Thu, 13 Aug 2015 11:03:43 +0000 (12:03 +0100)]
xen/arm: mm: Do not dump the p2m when mapping a foreign gfn
The physmap operation XENMAPSPACE_gfmn_foreign is dumping the p2m when
an error occured by calling dump_p2m_lookup. But this function is not
using ratelimited printk.
Any domain able to map foreign gfmn would be able to flood the Xen
console.
The information wasn't not useful so drop it.
This is XSA-141.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit afc13fe5e21d18c09e44f8ae6f7f4484e9f1de7f)
It can happen that an fd is deregistered, and closed, and then a new
fd opened, and reregistered, all while another thread is in poll().
If this happens poll might report POLLNVAL, but the event loop would
think that the fd was supposed to have been valid, and then fail an
assertion:
libxl_event.c:1183: afterpoll_check_fd: Assertion `poller->fds_changed || !(fds[slot].revents & 0x020)' failed.
We can't simply ignore POLLNVAL because if we have bugs which cause
messed-up fds, it is a serious problem which we really need to detect.
Instead, add extra tracking to spot when this possibility arises, and
abort on POLLNVAL if we are sure that it is unexpected.
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 681ce1681622a46d111cfdc4fc07e4cb565ae131)
Ian Jackson [Thu, 9 Jul 2015 16:05:07 +0000 (17:05 +0100)]
libxl: poll: Use poller_get and poller_put for poller_app
This makes the code more regular. We are going to want to do some
more work in poller_get and poller_put, which work also wants to be
done for poller_app.
Two very minor functional changes:
* We call malloc an extra time since poller_app is now a pointer
* ERROR_FAIL on poller_get failing for poller_app is generated in
libxl_ctx_init rather than passed through by libxl_poller_init
from libxl__pipe_nonblock.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit aae37652067eafd0f2b85050306772df0cb71f08)
Ian Jackson [Thu, 9 Jul 2015 15:52:02 +0000 (16:52 +0100)]
libxl: poll: Make libxl__poller_get have only one success return path
In preparation for doing some more work on successful exit.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 6fc946bc5520ebdbba5cbae4d49e53895df8b393)
Ian Campbell [Mon, 13 Jul 2015 12:31:23 +0000 (13:31 +0100)]
tools: libxl: Handle failure to create qemu dm logfile
If libxl_create_logfile fails for some reason then
libxl__create_qemu_logfile previously just carried on and dereferenced
the uninitialised logfile.
Check for the error from libxl_create_logfile, which has already
logged for us.
This was reported as Debian bug #784880.
Reported-by: Russell Coker <russell@coker.com.au> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: 784880@bugs.debian.org Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit e9d3a859977913704605e0fd87887451b12d4722)
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments
Various xl sub-commands take additional parameters containing = as
additional config fragments.
The handling of these config fragments has a number of bugs:
1. Use of a static 1024-byte buffer. (If truncation would occur,
with semi-trusted input, a security risk arises due to quotes
being lost.)
2. Mishandling of the return value from snprintf, so that if
truncation occurs, the to-write pointer is updated with the
wanted-to-write length, resulting in stack corruption. (This is
XSA-137.)
3. Clone-and-hack of the code for constructing the appended
config file.
These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere. The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.
The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).
Additional bugs I observe, not fixed here:
4. The functions which now call string_realloc_append use ad-hoc
error returns, with multiple calls to `return'. This currently
necessitates multiple new calls to `free'.
5. Many of the paths in xl call exit(-rc) where rc is a libxl status
code. This is a ridiculous exit status `convention'.
6. The loops for handling extra config data are clone-and-hacks.
7. Once the extra config buffer is accumulated, it must be combined
with the appropriate main config file. The code to do this
combining is clone-and-hacked too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian,campbell@citrix.com>
(cherry picked from commit dd84604f35bd3855c57146eb8fe53924c10d3963)
Elena Ufimtseva [Tue, 21 Jul 2015 09:13:03 +0000 (11:13 +0200)]
dmar: device scope mem leak fix
Release memory allocated for scope.devices dmar units on various
failure paths and when disabling dmar. Set device count after
sucessfull memory allocation, not before, in device scope parsing function.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
# Commit 132231d10343608faf5892785a08acc500326d04
# Date 2015-07-16 15:23:37 +0200
# Author Andrew Cooper <andrew.cooper3@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
dmar: fix double free in error paths following c/s a8bc99b
Several error paths would end up freeing scope->devices twice.
Jan Beulich [Tue, 21 Jul 2015 09:12:35 +0000 (11:12 +0200)]
make rangeset_report_ranges() report all ranges
find_range() returns NULL when s is below the lowest range, so we have
to use first_range() here (which is as good performance wise), or else
no range gets reported at all in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: b1c780cd315eb4db06be3bbb5c6d80b1cabd27a9
master date: 2015-07-15 16:11:42 +0200
Ian Campbell [Tue, 21 Jul 2015 09:11:49 +0000 (11:11 +0200)]
xen: earlycpio: Pull in latest linux earlycpio.[ch]
AFAICT our current version does not correspond to any version in the
Linux history. This commit resynchronised to the state in Linux
commit 598bae70c2a8e35c8d39b610cca2b32afcf047af.
Differences from upstream: find_cpio_data is __init, printk instead of
pr_*.
This appears to fix Debian bug #785187. "Appears" because my test box
happens to be AMD and the issue is that the (valid) cpio generated by
the Intel ucode is not liked by the old Xen code. I've tested by
hacking the hypervisor to look for the Intel path.
Reported-by: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Cc: 785187@bugs.debian.org Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 39c6664a0e6e1b4ed80660d545dff34ce41bee31
master date: 2015-07-07 15:10:45 +0100
Andrew Cooper [Tue, 21 Jul 2015 09:11:19 +0000 (11:11 +0200)]
x86/hvmloader: avoid data corruption with xenstore reads/writes
The functions ring_read and ring_write() have logic to try and deal with
partial reads and writes.
However, in all cases where the "while (len)" loop executed twice, data
corruption would occur as the second memcpy() starts from the beginning of
"data" again, rather than from where it got to.
This bug manifested itself as protocol corruption when a reply header crossed
the first wrap of the response ring. However, similar corruption would also
occur if hvmloader observed xenstored performing partial writes of the block
in question, or if hvmloader had to wait for xenstored to make space in either
ring.
Reported-by: Adam Kucia <djexit@o2.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bbbe7e7157a964c485fb861765be291734676932
master date: 2015-07-07 14:39:27 +0200
credit1: properly deal with pCPUs not in any cpupool
Ideally, the pCPUs that are 'free', i.e., not assigned
to any cpupool, should not be considred by the scheduler
for load balancing or anything. In Credit1, we fail at
this, because of how we use cpupool_scheduler_cpumask().
In fact, for a free pCPU, cpupool_scheduler_cpumask()
returns a pointer to cpupool_free_cpus, and hence, near
the top of csched_load_balance():
if ( unlikely(!cpumask_test_cpu(cpu, online)) )
goto out;
is false (the pCPU _is_ free!), and we therefore do not
jump to the end right away, as we should. This, causes
the following splat when resuming from ACPI S3 with
pCPUs not assigned to any pool:
The cure is:
* use cpupool_online_cpumask(), as a better guard to the
case when the cpu is being offlined;
* explicitly check whether the cpu is free.
SEDF is in a similar situation, so fix it too.
Still in Credit1, we must make sure that free (or offline)
CPUs are not considered "ticklable". Not doing so would impair
the load balancing algorithm, making the scheduler think that
it is possible to 'ask' the pCPU to pick up some work, while
in reallity, that will never happen! Evidence of such behavior
is shown in this trace:
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
In fact, when a pCPU goes down, we want to clear its
bit in the correct cpupool's valid mask, rather than
always in cpupool0's one.
Before this commit, all the pCPUs in the non-default
pool(s) will be considered immediately valid, during
system resume, even the one that have not been brought
up yet. As a result, the (Credit1) scheduler will attempt
to run its load balancing logic on them, causing the
following Oops:
The reason why the error is a #GP fault is that, without
this commit, we try to access the per-cpu area of a not
yet allocated and initialized pCPU.
In fact, %rax, which is what is used as pointer, is 80007d2f7fccb780, and we also have this:
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e4e9d2d4e76bd8fe229c124bd57fc6ba824271b3
master date: 2015-07-07 11:37:26 +0200
Liang Li [Tue, 21 Jul 2015 09:08:05 +0000 (11:08 +0200)]
nested EPT: fix the handling of nested EPT
If the host EPT entry is changed, the nested EPT should be updated.
the current code does not do this, and it's wrong.
I have tested this patch, the L2 guest can boot and run as normal.
Signed-off-by: Liang Li <liang.z.li@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Reported-by: Tim Deegan <tim@xen.org> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 71bb7304e7a7a35ea6df4b0cedebc35028e4c159
master date: 2015-06-30 15:00:54 +0100
Andrew Cooper [Mon, 13 Jul 2015 11:54:47 +0000 (13:54 +0200)]
x86/traps: avoid using current too early on boot
Early on boot, current has the sentinel value 0xfffff000. Blindly using it in
show_registers() causes a nested failure and no useful information printed
from an early crash.
Ross Lagerwall [Mon, 13 Jul 2015 11:53:56 +0000 (13:53 +0200)]
x86: avoid tripping watchdog when constructing dom0
Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at once clean up other includes).
Jan Beulich [Mon, 13 Jul 2015 11:52:27 +0000 (13:52 +0200)]
kexec: add more pages to v1 environment
Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).
Further, without the tools mapping the low 1Mb (expected by at least
some Linux version), we need to do so in the hypervisor in the v1 case.
Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 5cb57f4bddee1f11079e69bf43c193a8b104c476
master date: 2015-06-09 16:00:24 +0200
Tim Deegan [Mon, 13 Jul 2015 11:51:10 +0000 (13:51 +0200)]
passthrough/amd: avoid reading an uninitialized variable
update_intremap_entry_from_msi() doesn't write to its data pointer on
some error paths, so we copying that variable into the msg would count
as undefined behaviour.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
master commit: a8ccf2d9f6f291f8fc6003e3d8bc7275ac1cc69f
master date: 2015-04-24 12:04:57 +0200
The fix is, when tearing down a pCPU, call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
Jan Beulich [Thu, 18 Jun 2015 06:57:03 +0000 (08:57 +0200)]
VT-d: extend quirks to newer desktop chipsets
We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Don Dugger <donald.d.dugger@intel.com>
master commit: cdc6204b7749a53e6a4d95fac4440601c35a539d
master date: 2015-06-11 11:55:05 +0200
We also alter the 'efi-rs' to be 'efi=rs' or 'efi=no-rs'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 74cdad5dae72c78af4f6b343f38fd55e6a526ab1
master date: 2015-06-10 12:04:07 +0200
Jan Beulich [Thu, 18 Jun 2015 06:54:57 +0000 (08:54 +0200)]
x86/EFI: fix EFI_MEMORY_WP handling
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Backport: Also fix EFI_MEMORY_XP handling (along the lines of what
master commit abcf15fa8f ["x86: switch default mapping attributes
to non-executable"] does): We must not set the NX bit when the CPU
doesn't support it, as the bit being set may trigger Reserved Bit
faults in that case.
Ross Lagerwall [Thu, 18 Jun 2015 06:54:08 +0000 (08:54 +0200)]
efi: avoid calling boot services after ExitBootServices()
After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
setting efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices() and so are are called via
SystemTable->BootServices.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: d4300db3a03a0cd999745135d7879fc4b6b5aa61
master date: 2015-06-10 12:00:10 +0200
Andrew Cooper [Thu, 18 Jun 2015 06:51:00 +0000 (08:51 +0200)]
x86/apic: Disable the LAPIC later in smp_send_stop()
__stop_this_cpu() may reset the LAPIC mode back from x2apic to xapic, but will
leave x2apic_enabled alone. This may cause disconnect_bsp_APIC() in
disable_IO_APIC() to suffer a #GP fault.
Disabling the LAPIC can safely be deferred to being the last action.
Ross Lagerwall [Thu, 18 Jun 2015 06:48:59 +0000 (08:48 +0200)]
efi: fix allocation problems if ExitBootServices() fails
If calling ExitBootServices() fails, the required memory map size may
have increased. When initially allocating the memory map, allocate a
slightly larger buffer (by an arbitrary 8 entries) to fix this.
The ARM code path was already allocating a larger buffer than required,
so this moves the code to be common for all architectures.
This was seen on the following machine when using the iscsidxe UEFI
driver. The machine would consistently fail the first call to
ExitBootServices().
System Information
Manufacturer: Supermicro
Product Name: X10SLE-F/HF
BIOS Information
Vendor: American Megatrends Inc.
Version: 2.00
Release Date: 04/24/2014
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roy Franz <roy.franz@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
EFI: map allocation size must be set to zero
Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.
Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 8a753b3f1cf5e4714974196df9517849bf174324
master date: 2015-06-02 13:44:24 +0200
master commit: 4c94684bb7c20ff01d03fb1f22c03cc0c2fc417b
master date: 2015-06-11 14:47:54 +0200
Ross Lagerwall [Thu, 18 Jun 2015 06:48:13 +0000 (08:48 +0200)]
x86: don't crash when mapping a page using EFI runtime page tables
When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.
To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables which is treated equivalently to running in an idle
vCPU.
This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.
Roger Pau Monné [Thu, 18 Jun 2015 06:47:40 +0000 (08:47 +0200)]
x86/pvh: disable posted interrupts
Enabling posted interrupts requires the virtual interrupt delivery feature,
which is disabled for PVH guests, so make sure posted interrupts are also
disabled or else vmlaunch will fail.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-and-Tested-by: Lars Eggert <lars@netapp.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: cf6b3ccf28faee01a078311fcfe671148c81ea75
master date: 2015-05-28 10:56:08 +0200
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 12e817e281034f5881f46e0e4f1d127820101a78)
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.
As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.
We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c)
Conflicts:
tools/libxl/libxl.c Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:20:32 +0000 (15:20 +0000)]
libxl: event handling: ao_inprogress does waits while reports outstanding
libxl__ao_inprogress needs to check (like
libxl__ao_complete_check_progress_reports) that there are no
oustanding progress callbacks.
Otherwise it might happen that we would destroy the ao while another
thread has an outstanding callback its egc report queue. The other
thread would then, in its egc_run_callbacks, touch the destroyed ao.
Instead, when this happens in libxl__ao_inprogress, simply run round
the event loop again. The thread which eventually makes the callback
will spot our poller in the ao, and notify the poller, waking us up.
This fixes an assertion failure race seen with libvirt:
libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
or (after "Add an assert to egc_run_callbacks")
libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit f1335f0d7b2402e94e0c6e8a905dc309edaafcfb)
Ian Jackson [Mon, 9 Feb 2015 15:18:30 +0000 (15:18 +0000)]
libxl: event handling: Break out ao_work_outstanding
Break out the test in libxl__ao_complete_check_progress_reports, into
ao_work_outstanding, which reports false if either (i) the ao is still
ongoing or (ii) there is a progress report (perhaps on a different
thread's callback queue) which has yet to be reported to the
application.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 93699882d98cce9736d6e871db303275df1138a2)
This is because, for free CPUs, -EBUSY were being returned
when trying to tear them down, making cpu_down() unhappy.
It is certainly unpractical to forbid shutting down or
suspenging if there are unassigned CPUs, so this change
fixes the above by just avoiding returning -EBUSY for those
CPUs. If shutting off, that does not matter much anyway. If
suspending, we make sure that the CPUs remain unassigned
when resuming.
While there, take the chance to:
- fix the doc comment of cpupool_cpu_remove() (it was
wrong);
- improve comments in general around and in cpupool_cpu_remove()
and cpupool_cpu_add();
- add a couple of ASSERT()-s for checking consistency.
Ian Campbell [Thu, 28 May 2015 12:00:20 +0000 (14:00 +0200)]
xen: common: Use unbounded array for symbols_offset.
Using a singleton array causes gcc5 to report:
symbols.c: In function 'symbols_lookup':
symbols.c:128:359: error: array subscript is above array bounds [-Werror=array-bounds]
symbols.c:136:176: error: array subscript is above array bounds [-Werror=array-bounds]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 3f82ea62826d4eb06002d8dba475bafcc454b845
master date: 2015-03-20 12:02:03 +0000
Andrew Cooper [Thu, 28 May 2015 10:29:49 +0000 (12:29 +0200)]
x86/irq: limit the maximum number of domain PIRQs
c/s 7e73a6e "have architectures specify the number of PIRQs a hardware domain
gets" increased the default number of pirqs for dom0, as 256 was found to be
too low in some cases.
However, it didn't account for the upper bound presented by the domains EOI
bitmap, registered with the PHYSDEVOP_pirq_eoi_gmfn_v* hypercall.
On a server with 240 cpus, Xen was observed to be attempting to clear the EOI
bit for dom0's pirq 0xb40f, which hit a pagefault.
Andrew Cooper [Mon, 2 Mar 2015 15:04:37 +0000 (15:04 +0000)]
tools/xenconsoled: Increase file descriptor limit
XenServer's VM density testing uncovered a regression when moving from
sysvinit to systemd where the file descriptor limit dropped from 4096 to
1024. (XenServer had previously inserted a ulimit statement into its
initscripts.)
One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
lost ulimit, but that is only a stopgap solution.
As Xenconsoled genuinely needs a large number of file descriptors if a large
number of domains are running, attempt to increase the limit.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 588df84c0d702e835e526ecef3af6c5444857558)
Ross Lagerwall [Tue, 24 Feb 2015 08:05:50 +0000 (08:05 +0000)]
tools/hotplug: systemd: Don't ever kill xenstored
Don't kill xenstored as part of the usual service shutdown process to
prevent hangs on shutdown where the kernel tries to unplug a VIF
after xenstored has exited.
In an ideal case with all guests cooperating, xendomains will have shut
down all guests before xenstored is killed.
However in the uncooperative case, malicious or crashed guests may still
be running after xendomains has exited and this should not block the
shutdown/reboot of dom0.
Xenstored has no state to sync to disk, and never used to be killed in
the sysvinit case; observe the warning in xencommons. Our testing has
shown regressions caused by the change in behaviour between sysvinit and
systemd when it comes to killing xenstored.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- added systemd to title ]
(cherry picked from commit 96e0ee8386cf690c28c46a2c9f75cd2b03f646e1)
Andrew Cooper [Fri, 30 Jan 2015 14:11:14 +0000 (14:11 +0000)]
ocaml/xenctrl: Fix stub_xc_readconsolering()
The Ocaml stub to retrieve the hypervisor console ring had a few problems.
* A single 32k buffer would truncate a large console ring.
* The buffer was static and not under the protection of the Ocaml GC lock so
could be clobbered by concurrent accesses.
* Embedded NUL characters would cause caml_copy_string() (which is strlen()
based) to truncate the buffer.
The function is rewritten from scratch, using the same algorithm as the python
stubs, but uses the protection of the Ocaml GC lock to maintain a static
running total of the ring size, to avoid redundant realloc()ing in future
calls.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <dave.scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1a010ca99e9b04c1cfbd0ee718aa22d5ebd530ab)
Andrew Cooper [Wed, 28 Jan 2015 17:55:32 +0000 (17:55 +0000)]
ocaml/xenctrl: Make failwith_xc() thread safe
The static error_str[] buffer is not thread-safe, and 1024 bytes is
unreasonably large. Reduce to 256 bytes (which is still much larger than any
current use), and move it to being a stack variable.
Also, propagate the Noreturn attribute from caml_raise_with_string().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <Dave.Scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit c8945d51613450c19e0898b1b3056c90f4929179)
Andrew Cooper [Tue, 27 Jan 2015 20:38:11 +0000 (20:38 +0000)]
ocaml/xenctrl: Check return values from hypercalls
rather than blindly continuing and possibly using negative values.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Dave Scott <dave.scott@eu.citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit 3380f5b6270e6fa4b24313f8808e7625e4c5a6ba)
Ian Jackson [Thu, 5 Mar 2015 16:28:04 +0000 (16:28 +0000)]
libxl: Domain destroy: fork
Call xc_domain_destroy in a subprocess. That allows us to do so
asynchronously, rather than blocking the whole process calling libxl.
The changes in detail:
* Provide an libxl__ev_child in libxl__domain_destroy_state, and
initialise it in libxl__domain_destroy. There is no possibility
to `clean up' a libxl__ev_child, but there need to clean it up, as
the control flow ensures that we only continue after the child has
exited.
* Call libxl__ev_child_fork at the right point and put the call to
xc_domain_destroy and associated logging in the child. (The child
opens a new xenctrl handle because we mustn't use the parent's.)
* Consequently, the success return path from domain_destroy_domid_cb
no longer calls dis->callback. Instead it simply returns.
* We plumb the errorno value through the child's exit status, if it
fits. This means we normally do the logging only in the parent.
* Incidentally, we fix the bug that we were treating the return value
from xc_domain_destroy as an errno value when in fact it is a
return value from do_domctl (in this case, 0 or -1 setting errno).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 188e9c541f698109a76cf334b925c77b6007aba1)
Ian Jackson [Tue, 17 Mar 2015 15:30:58 +0000 (09:30 -0600)]
libxl: Domain destroy: unlock userdata earlier
Unlock the userdata before we actually call xc_domain_destroy. This
leaves open the possibility that other libxl callers will see the
half-destroyed domain (with no devices, paused), but this is fine.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1c91d6fba20c1517d78076d412d33cb41f501be5)
Ian Jackson [Tue, 17 Mar 2015 15:30:57 +0000 (09:30 -0600)]
libxl: In domain death search, start search at first domid we want
From: Ian Jackson <Ian.Jackson@eu.citrix.com>
When domain_death_xswatch_callback needed a further call to
xc_domain_getinfolist it would restart it with the last domain it
found rather than the first one it wants.
If it only wants one it will also only ask for one domain. The result
would then be that it gets the previous domain again (ie, the previous
one to the one it wants), which still doesn't reveal the answer to the
question, and it would therefore loop again.
It's completely unclear to me why I thought it was a good idea to
start the xc_domain_getinfolist with the last domain previously found
rather than the first one left un-confirmed. The code has been that
way since it was introduced.
Instead, start each xc_domain_getinfolist at the next domain whose
status we need to check.
We also need to move the test for !evg into the loop, we now need evg
to compute the arguments to getinfolist.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec)
Jan Beulich [Tue, 19 May 2015 10:00:09 +0000 (12:00 +0200)]
x86: don't change affinity with interrupt unmasked
With ->startup unmasking the IRQ, setting the affinity afterwards
without masking the IRQ again is invalid namely for MSI (address and
data can't be updated atomically and may - at least for MSI-X - be
cached while unmasked).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
AMD IOMMU: only translate remapped IO-APIC RTEs
1aeb1156fa ("x86 don't change affinity with interrupt unmasked")
introducing RTE reads prior to the respective interrupt having got
enabled for the first time uncovered a bug in 2ca9fbd739 ("AMD IOMMU:
allocate IRTE entries instead of using a static mapping"): We obviously
shouldn't be translating RTEs for which remapping didn't get set up
yet.
x86_emulate: fix EFLAGS setting of CMPXCHG emulation
CMPXCHG sets CF, PF, AF, SF, and OF flags according to the results of the
comparison the rAX with the operand of the instruction.
rAX must be the first argument of the comparison (a minuend), the operand
must be the second one (a subtrahend).
Due to improper order of comparison arguments, CF, PF, AF, SF and OF flags were
set incorrectly in the case of inequality. Need to swap them.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
# Commit 20fd4b70a7647656812b8f276510e09b252db9f7
# Date 2015-05-04 12:03:19 +0200
# Author Eugene Korenevsky <ekorenevsky@gmail.com>
# Committer Jan Beulich <jbeulich@suse.com>
test_x86_emulate: extend EFLAGS check of CMPXCHG test
CMPXCHG: in the case of inequality of the rAX and the operand,
need to check CF, PF, AF, SF and OF flags as well.
This adjustment covers the fix of incorrect comparison during
CMPXCHG emulation.
Paul Durrant [Tue, 19 May 2015 09:45:34 +0000 (11:45 +0200)]
x86/hvm: implicitly disable an ioreq server when it is destroyed
Currently, unless a (non-default) ioreq server is explicitly disabled before
being destroyed, its gmfns will not be placed back into the p2m but still
released back into the ioreq_gmfn mask. This is somewhat counter-intuitive
and easily remedied by this small patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ebd41901b96565b392e8d434a4c4ab543d5fb52b
master date: 2015-04-24 12:14:23 +0200
Paul Durrant [Tue, 19 May 2015 09:44:59 +0000 (11:44 +0200)]
x86/hvm: actually release ioreq server pages
hvm_free_ioreq_gmfn has the sense of the ioreq_gmfn mask inverted; it
needs to set a bit to release the gmfn, not clear it.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cb791b84e7c0adce14194647912c4c3d28cddc4a
master date: 2015-04-24 12:13:48 +0200
Ross Lagerwall [Tue, 19 May 2015 09:43:35 +0000 (11:43 +0200)]
x86/efi: reserve SMBIOS table region when EFI booting
Some EFI firmware implementations may place the SMBIOS table in RAM
marked as BootServicesData, which Xen does not consider as reserved.
When dom0 tries to access the SMBIOS, the region is not contained in the
initial P2M and it crashes with a page fault. To fix this, reserve the
SMBIOS region.
Also, fix the memcmp checks for existence of the SMBIOS.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bf68adcadaa2b0885c5d2f1c8e2e068e209eb041
master date: 2015-04-17 10:44:48 +0200