It can happen that an fd is deregistered, and closed, and then a new
fd opened, and reregistered, all while another thread is in poll().
If this happens poll might report POLLNVAL, but the event loop would
think that the fd was supposed to have been valid, and then fail an
assertion:
libxl_event.c:1183: afterpoll_check_fd: Assertion `poller->fds_changed || !(fds[slot].revents & 0x020)' failed.
We can't simply ignore POLLNVAL because if we have bugs which cause
messed-up fds, it is a serious problem which we really need to detect.
Instead, add extra tracking to spot when this possibility arises, and
abort on POLLNVAL if we are sure that it is unexpected.
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 681ce1681622a46d111cfdc4fc07e4cb565ae131)
Ian Jackson [Thu, 9 Jul 2015 16:05:07 +0000 (17:05 +0100)]
libxl: poll: Use poller_get and poller_put for poller_app
This makes the code more regular. We are going to want to do some
more work in poller_get and poller_put, which work also wants to be
done for poller_app.
Two very minor functional changes:
* We call malloc an extra time since poller_app is now a pointer
* ERROR_FAIL on poller_get failing for poller_app is generated in
libxl_ctx_init rather than passed through by libxl_poller_init
from libxl__pipe_nonblock.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit aae37652067eafd0f2b85050306772df0cb71f08)
Ian Jackson [Thu, 9 Jul 2015 15:52:02 +0000 (16:52 +0100)]
libxl: poll: Make libxl__poller_get have only one success return path
In preparation for doing some more work on successful exit.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 6fc946bc5520ebdbba5cbae4d49e53895df8b393)
Ian Campbell [Mon, 13 Jul 2015 12:31:23 +0000 (13:31 +0100)]
tools: libxl: Handle failure to create qemu dm logfile
If libxl_create_logfile fails for some reason then
libxl__create_qemu_logfile previously just carried on and dereferenced
the uninitialised logfile.
Check for the error from libxl_create_logfile, which has already
logged for us.
This was reported as Debian bug #784880.
Reported-by: Russell Coker <russell@coker.com.au> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: 784880@bugs.debian.org Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit e9d3a859977913704605e0fd87887451b12d4722)
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments
Various xl sub-commands take additional parameters containing = as
additional config fragments.
The handling of these config fragments has a number of bugs:
1. Use of a static 1024-byte buffer. (If truncation would occur,
with semi-trusted input, a security risk arises due to quotes
being lost.)
2. Mishandling of the return value from snprintf, so that if
truncation occurs, the to-write pointer is updated with the
wanted-to-write length, resulting in stack corruption. (This is
XSA-137.)
3. Clone-and-hack of the code for constructing the appended
config file.
These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere. The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.
The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).
Additional bugs I observe, not fixed here:
4. The functions which now call string_realloc_append use ad-hoc
error returns, with multiple calls to `return'. This currently
necessitates multiple new calls to `free'.
5. Many of the paths in xl call exit(-rc) where rc is a libxl status
code. This is a ridiculous exit status `convention'.
6. The loops for handling extra config data are clone-and-hacks.
7. Once the extra config buffer is accumulated, it must be combined
with the appropriate main config file. The code to do this
combining is clone-and-hacked too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian,campbell@citrix.com>
(cherry picked from commit dd84604f35bd3855c57146eb8fe53924c10d3963)
Elena Ufimtseva [Tue, 21 Jul 2015 09:13:03 +0000 (11:13 +0200)]
dmar: device scope mem leak fix
Release memory allocated for scope.devices dmar units on various
failure paths and when disabling dmar. Set device count after
sucessfull memory allocation, not before, in device scope parsing function.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
# Commit 132231d10343608faf5892785a08acc500326d04
# Date 2015-07-16 15:23:37 +0200
# Author Andrew Cooper <andrew.cooper3@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
dmar: fix double free in error paths following c/s a8bc99b
Several error paths would end up freeing scope->devices twice.
Jan Beulich [Tue, 21 Jul 2015 09:12:35 +0000 (11:12 +0200)]
make rangeset_report_ranges() report all ranges
find_range() returns NULL when s is below the lowest range, so we have
to use first_range() here (which is as good performance wise), or else
no range gets reported at all in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: b1c780cd315eb4db06be3bbb5c6d80b1cabd27a9
master date: 2015-07-15 16:11:42 +0200
Ian Campbell [Tue, 21 Jul 2015 09:11:49 +0000 (11:11 +0200)]
xen: earlycpio: Pull in latest linux earlycpio.[ch]
AFAICT our current version does not correspond to any version in the
Linux history. This commit resynchronised to the state in Linux
commit 598bae70c2a8e35c8d39b610cca2b32afcf047af.
Differences from upstream: find_cpio_data is __init, printk instead of
pr_*.
This appears to fix Debian bug #785187. "Appears" because my test box
happens to be AMD and the issue is that the (valid) cpio generated by
the Intel ucode is not liked by the old Xen code. I've tested by
hacking the hypervisor to look for the Intel path.
Reported-by: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Cc: 785187@bugs.debian.org Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 39c6664a0e6e1b4ed80660d545dff34ce41bee31
master date: 2015-07-07 15:10:45 +0100
Andrew Cooper [Tue, 21 Jul 2015 09:11:19 +0000 (11:11 +0200)]
x86/hvmloader: avoid data corruption with xenstore reads/writes
The functions ring_read and ring_write() have logic to try and deal with
partial reads and writes.
However, in all cases where the "while (len)" loop executed twice, data
corruption would occur as the second memcpy() starts from the beginning of
"data" again, rather than from where it got to.
This bug manifested itself as protocol corruption when a reply header crossed
the first wrap of the response ring. However, similar corruption would also
occur if hvmloader observed xenstored performing partial writes of the block
in question, or if hvmloader had to wait for xenstored to make space in either
ring.
Reported-by: Adam Kucia <djexit@o2.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bbbe7e7157a964c485fb861765be291734676932
master date: 2015-07-07 14:39:27 +0200
credit1: properly deal with pCPUs not in any cpupool
Ideally, the pCPUs that are 'free', i.e., not assigned
to any cpupool, should not be considred by the scheduler
for load balancing or anything. In Credit1, we fail at
this, because of how we use cpupool_scheduler_cpumask().
In fact, for a free pCPU, cpupool_scheduler_cpumask()
returns a pointer to cpupool_free_cpus, and hence, near
the top of csched_load_balance():
if ( unlikely(!cpumask_test_cpu(cpu, online)) )
goto out;
is false (the pCPU _is_ free!), and we therefore do not
jump to the end right away, as we should. This, causes
the following splat when resuming from ACPI S3 with
pCPUs not assigned to any pool:
The cure is:
* use cpupool_online_cpumask(), as a better guard to the
case when the cpu is being offlined;
* explicitly check whether the cpu is free.
SEDF is in a similar situation, so fix it too.
Still in Credit1, we must make sure that free (or offline)
CPUs are not considered "ticklable". Not doing so would impair
the load balancing algorithm, making the scheduler think that
it is possible to 'ask' the pCPU to pick up some work, while
in reallity, that will never happen! Evidence of such behavior
is shown in this trace:
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
In fact, when a pCPU goes down, we want to clear its
bit in the correct cpupool's valid mask, rather than
always in cpupool0's one.
Before this commit, all the pCPUs in the non-default
pool(s) will be considered immediately valid, during
system resume, even the one that have not been brought
up yet. As a result, the (Credit1) scheduler will attempt
to run its load balancing logic on them, causing the
following Oops:
The reason why the error is a #GP fault is that, without
this commit, we try to access the per-cpu area of a not
yet allocated and initialized pCPU.
In fact, %rax, which is what is used as pointer, is 80007d2f7fccb780, and we also have this:
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e4e9d2d4e76bd8fe229c124bd57fc6ba824271b3
master date: 2015-07-07 11:37:26 +0200
Liang Li [Tue, 21 Jul 2015 09:08:05 +0000 (11:08 +0200)]
nested EPT: fix the handling of nested EPT
If the host EPT entry is changed, the nested EPT should be updated.
the current code does not do this, and it's wrong.
I have tested this patch, the L2 guest can boot and run as normal.
Signed-off-by: Liang Li <liang.z.li@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Reported-by: Tim Deegan <tim@xen.org> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 71bb7304e7a7a35ea6df4b0cedebc35028e4c159
master date: 2015-06-30 15:00:54 +0100
Andrew Cooper [Mon, 13 Jul 2015 11:54:47 +0000 (13:54 +0200)]
x86/traps: avoid using current too early on boot
Early on boot, current has the sentinel value 0xfffff000. Blindly using it in
show_registers() causes a nested failure and no useful information printed
from an early crash.
Ross Lagerwall [Mon, 13 Jul 2015 11:53:56 +0000 (13:53 +0200)]
x86: avoid tripping watchdog when constructing dom0
Constructing dom0 may take a few seconds, particularly if the slow VESA
graphics terminal is used. Process pending softirqs a few times to avoid
tripping a watchdog with a short timeout.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Move inclusion of xen/softirq.h (and at once clean up other includes).
Jan Beulich [Mon, 13 Jul 2015 11:52:27 +0000 (13:52 +0200)]
kexec: add more pages to v1 environment
Destination pages need mappings to be added to the page tables in the
v1 case (where nothing else calls machine_kexec_add_page() for them).
Further, without the tools mapping the low 1Mb (expected by at least
some Linux version), we need to do so in the hypervisor in the v1 case.
Suggested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alan Robinson <alan.robinson@ts.fujitsu.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 5cb57f4bddee1f11079e69bf43c193a8b104c476
master date: 2015-06-09 16:00:24 +0200
Tim Deegan [Mon, 13 Jul 2015 11:51:10 +0000 (13:51 +0200)]
passthrough/amd: avoid reading an uninitialized variable
update_intremap_entry_from_msi() doesn't write to its data pointer on
some error paths, so we copying that variable into the msg would count
as undefined behaviour.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
master commit: a8ccf2d9f6f291f8fc6003e3d8bc7275ac1cc69f
master date: 2015-04-24 12:04:57 +0200
The fix is, when tearing down a pCPU, call the free_pdata()
hook from the scheduler of the cpupool the pCPU belongs to,
not always the one from the default scheduler.
Jan Beulich [Thu, 18 Jun 2015 06:57:03 +0000 (08:57 +0200)]
VT-d: extend quirks to newer desktop chipsets
We're being told that while on the server side the issue we're trying
to work around is fixed starting with IvyBridge (another round of
double checking is going on before we're going to remove the one
IvyBridge ID that we're currently applying the workaround for), on the
desktop side even Skylake still requires the workaround. Hence we need
to add a whole bunch of desktop IDs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Don Dugger <donald.d.dugger@intel.com>
master commit: cdc6204b7749a53e6a4d95fac4440601c35a539d
master date: 2015-06-11 11:55:05 +0200
We also alter the 'efi-rs' to be 'efi=rs' or 'efi=no-rs'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 74cdad5dae72c78af4f6b343f38fd55e6a526ab1
master date: 2015-06-10 12:04:07 +0200
Jan Beulich [Thu, 18 Jun 2015 06:54:57 +0000 (08:54 +0200)]
x86/EFI: fix EFI_MEMORY_WP handling
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Backport: Also fix EFI_MEMORY_XP handling (along the lines of what
master commit abcf15fa8f ["x86: switch default mapping attributes
to non-executable"] does): We must not set the NX bit when the CPU
doesn't support it, as the bit being set may trigger Reserved Bit
faults in that case.
Ross Lagerwall [Thu, 18 Jun 2015 06:54:08 +0000 (08:54 +0200)]
efi: avoid calling boot services after ExitBootServices()
After the first call to ExitBootServices(), avoid calling any boot
services (except GetMemoryMap() and ExitBootServices()) by setting
setting efi_bs to NULL and halting in blexit(). Only GetMemoryMap() and
ExitBootServices() are explicitly allowed to be called after the first
call to ExitBootServices() and so are are called via
SystemTable->BootServices.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: d4300db3a03a0cd999745135d7879fc4b6b5aa61
master date: 2015-06-10 12:00:10 +0200
Andrew Cooper [Thu, 18 Jun 2015 06:51:00 +0000 (08:51 +0200)]
x86/apic: Disable the LAPIC later in smp_send_stop()
__stop_this_cpu() may reset the LAPIC mode back from x2apic to xapic, but will
leave x2apic_enabled alone. This may cause disconnect_bsp_APIC() in
disable_IO_APIC() to suffer a #GP fault.
Disabling the LAPIC can safely be deferred to being the last action.
Ross Lagerwall [Thu, 18 Jun 2015 06:48:59 +0000 (08:48 +0200)]
efi: fix allocation problems if ExitBootServices() fails
If calling ExitBootServices() fails, the required memory map size may
have increased. When initially allocating the memory map, allocate a
slightly larger buffer (by an arbitrary 8 entries) to fix this.
The ARM code path was already allocating a larger buffer than required,
so this moves the code to be common for all architectures.
This was seen on the following machine when using the iscsidxe UEFI
driver. The machine would consistently fail the first call to
ExitBootServices().
System Information
Manufacturer: Supermicro
Product Name: X10SLE-F/HF
BIOS Information
Vendor: American Megatrends Inc.
Version: 2.00
Release Date: 04/24/2014
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roy Franz <roy.franz@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
EFI: map allocation size must be set to zero
Commit 8a753b3f1c ("efi: fix allocation problems if ExitBootServices()
fails") replaced the use of a static (and hence zero-initialized)
variable by an automatic (and hence uninitialized) one.
Also drop the variable introduced by that commit in favor of re-using
another available and suitable one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 8a753b3f1cf5e4714974196df9517849bf174324
master date: 2015-06-02 13:44:24 +0200
master commit: 4c94684bb7c20ff01d03fb1f22c03cc0c2fc417b
master date: 2015-06-11 14:47:54 +0200
Ross Lagerwall [Thu, 18 Jun 2015 06:48:13 +0000 (08:48 +0200)]
x86: don't crash when mapping a page using EFI runtime page tables
When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.
To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables which is treated equivalently to running in an idle
vCPU.
This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.
Enabling posted interrupts requires the virtual interrupt delivery feature,
which is disabled for PVH guests, so make sure posted interrupts are also
disabled or else vmlaunch will fail.
Andrew Cooper [Mon, 13 Apr 2015 16:07:03 +0000 (16:07 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 12e817e281034f5881f46e0e4f1d127820101a78)
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.
As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.
We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c)
Conflicts:
tools/libxl/libxl.c Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:20:32 +0000 (15:20 +0000)]
libxl: event handling: ao_inprogress does waits while reports outstanding
libxl__ao_inprogress needs to check (like
libxl__ao_complete_check_progress_reports) that there are no
oustanding progress callbacks.
Otherwise it might happen that we would destroy the ao while another
thread has an outstanding callback its egc report queue. The other
thread would then, in its egc_run_callbacks, touch the destroyed ao.
Instead, when this happens in libxl__ao_inprogress, simply run round
the event loop again. The thread which eventually makes the callback
will spot our poller in the ao, and notify the poller, waking us up.
This fixes an assertion failure race seen with libvirt:
libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
or (after "Add an assert to egc_run_callbacks")
libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit f1335f0d7b2402e94e0c6e8a905dc309edaafcfb)
Ian Jackson [Mon, 9 Feb 2015 15:18:30 +0000 (15:18 +0000)]
libxl: event handling: Break out ao_work_outstanding
Break out the test in libxl__ao_complete_check_progress_reports, into
ao_work_outstanding, which reports false if either (i) the ao is still
ongoing or (ii) there is a progress report (perhaps on a different
thread's callback queue) which has yet to be reported to the
application.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 93699882d98cce9736d6e871db303275df1138a2)
This is because, for free CPUs, -EBUSY were being returned
when trying to tear them down, making cpu_down() unhappy.
It is certainly unpractical to forbid shutting down or
suspenging if there are unassigned CPUs, so this change
fixes the above by just avoiding returning -EBUSY for those
CPUs. If shutting off, that does not matter much anyway. If
suspending, we make sure that the CPUs remain unassigned
when resuming.
While there, take the chance to:
- fix the doc comment of cpupool_cpu_remove() (it was
wrong);
- improve comments in general around and in cpupool_cpu_remove()
and cpupool_cpu_add();
- add a couple of ASSERT()-s for checking consistency.
Ian Campbell [Thu, 28 May 2015 12:00:20 +0000 (14:00 +0200)]
xen: common: Use unbounded array for symbols_offset.
Using a singleton array causes gcc5 to report:
symbols.c: In function 'symbols_lookup':
symbols.c:128:359: error: array subscript is above array bounds [-Werror=array-bounds]
symbols.c:136:176: error: array subscript is above array bounds [-Werror=array-bounds]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 3f82ea62826d4eb06002d8dba475bafcc454b845
master date: 2015-03-20 12:02:03 +0000
Andrew Cooper [Thu, 28 May 2015 10:29:49 +0000 (12:29 +0200)]
x86/irq: limit the maximum number of domain PIRQs
c/s 7e73a6e "have architectures specify the number of PIRQs a hardware domain
gets" increased the default number of pirqs for dom0, as 256 was found to be
too low in some cases.
However, it didn't account for the upper bound presented by the domains EOI
bitmap, registered with the PHYSDEVOP_pirq_eoi_gmfn_v* hypercall.
On a server with 240 cpus, Xen was observed to be attempting to clear the EOI
bit for dom0's pirq 0xb40f, which hit a pagefault.
Andrew Cooper [Mon, 2 Mar 2015 15:04:37 +0000 (15:04 +0000)]
tools/xenconsoled: Increase file descriptor limit
XenServer's VM density testing uncovered a regression when moving from
sysvinit to systemd where the file descriptor limit dropped from 4096 to
1024. (XenServer had previously inserted a ulimit statement into its
initscripts.)
One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
lost ulimit, but that is only a stopgap solution.
As Xenconsoled genuinely needs a large number of file descriptors if a large
number of domains are running, attempt to increase the limit.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 588df84c0d702e835e526ecef3af6c5444857558)
Ross Lagerwall [Tue, 24 Feb 2015 08:05:50 +0000 (08:05 +0000)]
tools/hotplug: systemd: Don't ever kill xenstored
Don't kill xenstored as part of the usual service shutdown process to
prevent hangs on shutdown where the kernel tries to unplug a VIF
after xenstored has exited.
In an ideal case with all guests cooperating, xendomains will have shut
down all guests before xenstored is killed.
However in the uncooperative case, malicious or crashed guests may still
be running after xendomains has exited and this should not block the
shutdown/reboot of dom0.
Xenstored has no state to sync to disk, and never used to be killed in
the sysvinit case; observe the warning in xencommons. Our testing has
shown regressions caused by the change in behaviour between sysvinit and
systemd when it comes to killing xenstored.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- added systemd to title ]
(cherry picked from commit 96e0ee8386cf690c28c46a2c9f75cd2b03f646e1)
Andrew Cooper [Fri, 30 Jan 2015 14:11:14 +0000 (14:11 +0000)]
ocaml/xenctrl: Fix stub_xc_readconsolering()
The Ocaml stub to retrieve the hypervisor console ring had a few problems.
* A single 32k buffer would truncate a large console ring.
* The buffer was static and not under the protection of the Ocaml GC lock so
could be clobbered by concurrent accesses.
* Embedded NUL characters would cause caml_copy_string() (which is strlen()
based) to truncate the buffer.
The function is rewritten from scratch, using the same algorithm as the python
stubs, but uses the protection of the Ocaml GC lock to maintain a static
running total of the ring size, to avoid redundant realloc()ing in future
calls.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <dave.scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1a010ca99e9b04c1cfbd0ee718aa22d5ebd530ab)
Andrew Cooper [Wed, 28 Jan 2015 17:55:32 +0000 (17:55 +0000)]
ocaml/xenctrl: Make failwith_xc() thread safe
The static error_str[] buffer is not thread-safe, and 1024 bytes is
unreasonably large. Reduce to 256 bytes (which is still much larger than any
current use), and move it to being a stack variable.
Also, propagate the Noreturn attribute from caml_raise_with_string().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <Dave.Scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit c8945d51613450c19e0898b1b3056c90f4929179)
Andrew Cooper [Tue, 27 Jan 2015 20:38:11 +0000 (20:38 +0000)]
ocaml/xenctrl: Check return values from hypercalls
rather than blindly continuing and possibly using negative values.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Dave Scott <dave.scott@eu.citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit 3380f5b6270e6fa4b24313f8808e7625e4c5a6ba)
Ian Jackson [Thu, 5 Mar 2015 16:28:04 +0000 (16:28 +0000)]
libxl: Domain destroy: fork
Call xc_domain_destroy in a subprocess. That allows us to do so
asynchronously, rather than blocking the whole process calling libxl.
The changes in detail:
* Provide an libxl__ev_child in libxl__domain_destroy_state, and
initialise it in libxl__domain_destroy. There is no possibility
to `clean up' a libxl__ev_child, but there need to clean it up, as
the control flow ensures that we only continue after the child has
exited.
* Call libxl__ev_child_fork at the right point and put the call to
xc_domain_destroy and associated logging in the child. (The child
opens a new xenctrl handle because we mustn't use the parent's.)
* Consequently, the success return path from domain_destroy_domid_cb
no longer calls dis->callback. Instead it simply returns.
* We plumb the errorno value through the child's exit status, if it
fits. This means we normally do the logging only in the parent.
* Incidentally, we fix the bug that we were treating the return value
from xc_domain_destroy as an errno value when in fact it is a
return value from do_domctl (in this case, 0 or -1 setting errno).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 188e9c541f698109a76cf334b925c77b6007aba1)
Ian Jackson [Tue, 17 Mar 2015 15:30:58 +0000 (09:30 -0600)]
libxl: Domain destroy: unlock userdata earlier
Unlock the userdata before we actually call xc_domain_destroy. This
leaves open the possibility that other libxl callers will see the
half-destroyed domain (with no devices, paused), but this is fine.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1c91d6fba20c1517d78076d412d33cb41f501be5)
Ian Jackson [Tue, 17 Mar 2015 15:30:57 +0000 (09:30 -0600)]
libxl: In domain death search, start search at first domid we want
From: Ian Jackson <Ian.Jackson@eu.citrix.com>
When domain_death_xswatch_callback needed a further call to
xc_domain_getinfolist it would restart it with the last domain it
found rather than the first one it wants.
If it only wants one it will also only ask for one domain. The result
would then be that it gets the previous domain again (ie, the previous
one to the one it wants), which still doesn't reveal the answer to the
question, and it would therefore loop again.
It's completely unclear to me why I thought it was a good idea to
start the xc_domain_getinfolist with the last domain previously found
rather than the first one left un-confirmed. The code has been that
way since it was introduced.
Instead, start each xc_domain_getinfolist at the next domain whose
status we need to check.
We also need to move the test for !evg into the loop, we now need evg
to compute the arguments to getinfolist.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec)
Jan Beulich [Tue, 19 May 2015 10:00:09 +0000 (12:00 +0200)]
x86: don't change affinity with interrupt unmasked
With ->startup unmasking the IRQ, setting the affinity afterwards
without masking the IRQ again is invalid namely for MSI (address and
data can't be updated atomically and may - at least for MSI-X - be
cached while unmasked).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
AMD IOMMU: only translate remapped IO-APIC RTEs
1aeb1156fa ("x86 don't change affinity with interrupt unmasked")
introducing RTE reads prior to the respective interrupt having got
enabled for the first time uncovered a bug in 2ca9fbd739 ("AMD IOMMU:
allocate IRTE entries instead of using a static mapping"): We obviously
shouldn't be translating RTEs for which remapping didn't get set up
yet.
x86_emulate: fix EFLAGS setting of CMPXCHG emulation
CMPXCHG sets CF, PF, AF, SF, and OF flags according to the results of the
comparison the rAX with the operand of the instruction.
rAX must be the first argument of the comparison (a minuend), the operand
must be the second one (a subtrahend).
Due to improper order of comparison arguments, CF, PF, AF, SF and OF flags were
set incorrectly in the case of inequality. Need to swap them.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
# Commit 20fd4b70a7647656812b8f276510e09b252db9f7
# Date 2015-05-04 12:03:19 +0200
# Author Eugene Korenevsky <ekorenevsky@gmail.com>
# Committer Jan Beulich <jbeulich@suse.com>
test_x86_emulate: extend EFLAGS check of CMPXCHG test
CMPXCHG: in the case of inequality of the rAX and the operand,
need to check CF, PF, AF, SF and OF flags as well.
This adjustment covers the fix of incorrect comparison during
CMPXCHG emulation.
Paul Durrant [Tue, 19 May 2015 09:45:34 +0000 (11:45 +0200)]
x86/hvm: implicitly disable an ioreq server when it is destroyed
Currently, unless a (non-default) ioreq server is explicitly disabled before
being destroyed, its gmfns will not be placed back into the p2m but still
released back into the ioreq_gmfn mask. This is somewhat counter-intuitive
and easily remedied by this small patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ebd41901b96565b392e8d434a4c4ab543d5fb52b
master date: 2015-04-24 12:14:23 +0200
Paul Durrant [Tue, 19 May 2015 09:44:59 +0000 (11:44 +0200)]
x86/hvm: actually release ioreq server pages
hvm_free_ioreq_gmfn has the sense of the ioreq_gmfn mask inverted; it
needs to set a bit to release the gmfn, not clear it.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: cb791b84e7c0adce14194647912c4c3d28cddc4a
master date: 2015-04-24 12:13:48 +0200
Ross Lagerwall [Tue, 19 May 2015 09:43:35 +0000 (11:43 +0200)]
x86/efi: reserve SMBIOS table region when EFI booting
Some EFI firmware implementations may place the SMBIOS table in RAM
marked as BootServicesData, which Xen does not consider as reserved.
When dom0 tries to access the SMBIOS, the region is not contained in the
initial P2M and it crashes with a page fault. To fix this, reserve the
SMBIOS region.
Also, fix the memcmp checks for existence of the SMBIOS.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bf68adcadaa2b0885c5d2f1c8e2e068e209eb041
master date: 2015-04-17 10:44:48 +0200
Ian Jackson [Tue, 21 Apr 2015 13:47:38 +0000 (14:47 +0100)]
Config.mk: Fix QEMU_TAG and QEMU_TRADITIONAL_REVISION handling
In 2417e243 "QEMU_TAG update" my tag update script mangled the
machinery which sets QEMU_TRADITIONAL_REVISION, by replacing the first
assignment to QEMU_TRADITIONAL_REVISION it found rather than the one
which ought to have been replaced.
The result was that from that commit on, QEMU_TAG was no longer
honoured although QEMU_TRADITIONAL_REVISION still was.
Fix this by restoring the transfer from QEMU_TAG.
This fix is analogous to 5d4c0952 "Config.mk: Fix (and, effectively,
update) QEMU_TAG" from xen.git#staging, but in 4.5 there has not been
a subsequent attempt to update the qemu revision, so both settings of
QEMU_TRADITIONAL_REVISION are the same, and restoring the proper
behaviour will not have the side effect of making a previous qemu tag
update effective.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com> CC: Jan Beulich <jbeulich@suse.com> Reported-by: Jan Beulich <jbeulich@suse.com>
Liang Li [Tue, 21 Apr 2015 07:16:06 +0000 (09:16 +0200)]
x86/hvm: fix the unknown nested vmexit reason 80000021 bug
This bug will be trigged when NMI happen in the L2 guest. The current
code handles the NMI incorrectly. According to Intel SDM 31.7.1.2
(Resuming Guest Software after Handling an Exception), If bit 31 of the
IDT-vectoring information fields is set, and the virtual NMIs VM-execution
control is 1, while bits 10:8 in the IDT-vectoring information field is
2, bit 3 in the interruptibility-state field should be cleared to avoid
the next VM entry fail.
Signed-off-by: Liang Li <liang.z.li@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f7708796db463410a97b5fa3dd5902f6e8a1d102
master date: 2015-04-10 11:35:46 -0400
Jan Beulich [Tue, 21 Apr 2015 07:14:34 +0000 (09:14 +0200)]
VT-d: improve fault info logging
I got repeatedly annoyed by there not getting anything logged by
default on VT-d faults (and hence having to tell people to add extra
command line options), and hence I think it is time to redo this code:
Log basic fault information at guest-warning level (rate limited by
default), and show the page walk in verbose rather than only in debug
mode. Break up multi-line message so that each gets a proper log level
attached, at once splitting out the common part. Also don't log
"unknown" faults as interrupt-remapping ones.
As a minor cleanup fix the type of the involved "fault_type" variables.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: f0250f4b4818f5f4230995407ea2501de3485934
master date: 2015-03-27 15:23:25 +0100
Jan Beulich [Tue, 21 Apr 2015 07:13:51 +0000 (09:13 +0200)]
x86/MSI: fix error handling
__setup_msi_irq() needs to undo what it did before calling
write_msi_msg() in case that returned an error.
map_domain_pirq() needs to get rid of the MSI descriptor it
(implicitly) allocated. The case of a setup_msi_irq() failure on a
non-initial multi-vector-MSI interrupt needs special handling: While
the initial IRQ will get freed by the caller (who also passed it to
us), we need to undo the effect setup_msi_irq() had on it. (As a
benefit from the added call to msi_free_irq() we no longer need to
explicitly call destroy_irq() on the non-initial slots.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 29c1b7886c36d4e6aa03a779b2251b829d9689c3
master date: 2015-03-26 11:19:57 +0100
If the part of the compression data are corrupted, or the compression
data is totally fake, the memory access over the limit is possible.
This is the log from my system usning lz4 decompression.
[6502]data abort, halting
[6503]r0 0x00000000 r1 0x00000000 r2 0xdcea0ffc r3 0xdcea0ffc
[6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce80000
[6515]r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0xb9a98000
[6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc 0x820149bc
[6528]spsr 0x400001f3
and the memory addresses of some variables at the moment are
ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000
As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory
over @oend.
Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com> Reviewed-by: David Sterba <dsterba@suse.cz>
[Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: fcc17f96c2776eb220e3dee79fd0ad6a624ffcd9
master date: 2015-03-26 11:19:10 +0100
Jan Beulich [Tue, 21 Apr 2015 07:12:50 +0000 (09:12 +0200)]
hvmloader: don't treat ROM BAR like other BARs
Its low 11 bits have different meaning.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 73aa7fc2926c5ae30d8ebd049beadbb48e24d6c6
master date: 2015-03-26 11:17:51 +0100
Andrew Cooper [Tue, 21 Apr 2015 07:10:16 +0000 (09:10 +0200)]
domctl/sysctl: don't leak hypervisor stack to toolstacks
This is CVE-2015-3340 / XSA-132.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4ff3449f0e9d175ceb9551d3f2aecb59273f639d
master date: 2015-04-21 09:03:15 +0200
Julien Grall [Wed, 14 Jan 2015 18:00:43 +0000 (18:00 +0000)]
xen/arm: Blacklist the memory mapped timer (armv7-timer-mem)
Some platform (such as the VFP Base AEMv8 model) has a memory mapped
timer. We don't want DOM0 use this timer rather than the generic ARM
timer. So blacklist it for all platforms.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 770ca104b5bb4dd3fee4071e2d8f1133f3c6d134)
Ian Campbell [Mon, 30 Mar 2015 11:12:22 +0000 (12:12 +0100)]
xen: arm: Factor out psr_mode_is_user
This embodies the logic on arm64 that userspace can be either 32-bit
or 64-bit. It will be used in other places shortly.
Note that the logic differs slightly because the original (in
inject_abt64_exception) knew that the kernel was 64-bit and could
therefore assume that any 32-bit mode was userspace. Instead the
refactored code explicitly checks for usr mode.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 9e1342cf1aabf5841293d32c918f59ffee01ec67)
Jan Beulich [Fri, 23 Jan 2015 14:02:39 +0000 (15:02 +0100)]
arm64: fix fls()
It using CLZ on a 64-bit register while specifying the input operand as
only 32 bits wide is wrong: An operand intentionally shrunk down to 32
bits at the source level doesn't imply respective zero extension also
happens at the machine instruction level, and hence the wrong result
could get returned.
Add suitable inline assembly abstraction so that the function can
remain shared between arm32 and arm64. The need to include asm_defns.h
in bitops.h makes it necessary to adjust processor.h though - it is
generally wrong to include public headers without making sure that
integer types are properly defined. (I didn't innvestigate or try
whether the possible alternative of moving the public/arch-arm.h
inclusion down in the file would also work.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit e9c1747a25763c66de4a06923946c59a9f043069)
Ian Campbell [Thu, 26 Mar 2015 10:54:04 +0000 (10:54 +0000)]
xen: arm: correctly handle continuations for 64-bit guests
The 64-bit ABI is different to 32-bit:
- uses x16 as the op register rather than r12.
- arguments in x0..x5 and not r0..r5. Using rN here potentially
truncates.
- return value goes in x0, not r0.
Hypercalls can only be made directly from kernel space, so checking
the domain's size is sufficient.
Spotted due to spurious -EFAULT when destroying a domain, due to the
hypercall's pointer argument being truncated. I'm unclear why I am
only seeing this now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 7466b88ec2c59b1769da358bdc843bcabbee522c)
All the GICv2 registers are word-accessible. Some them are also
byte-accessible (see GICD_IPRIORITYR*).
Those registers are incorrectly implemented when they should be RAZ. Only
word-access size are currently allowed for them.
To avoid further issues, introduce different label following the access-size
of the registers:
- read_as_zero_32 and write_ignore_32: Used for registers accessible
via a word.
- read_as_zero: Used when we don't have to check the access size.
The latter is used when the access size has already been checked in the
register emulation and/or when the register offset is reserved/implementation
defined.
Note that, only used labels has been introduced.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1fefa550274758204a6bf58ea9b9509296197080)
Julien Grall [Mon, 16 Feb 2015 14:50:42 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Correctly set GICD_TYPER.CPUNumber
On GICv3, the value (CPUNumber + 1) indicates the number of processor that may
be used as interrupts targets when ARE bit is zero. The maximum is 8
processors.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 834551bace5cfda7ca5ebbdc2ec9fd18f002e4ce)
Julien Grall [Mon, 16 Feb 2015 14:50:41 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Correctly set GICD_TYPER.IDbits
From Linux 3.19, the GICv3 drivers is using GICD_TYPER.IDbits to check
the validity of the hardware interrupt number.
The field IDBits in the register GICD_TYPER is used to know the number of
interrupt identifiers (SPI, PPIs, SGIs, LPIs) supported by GIC Stream Protocol
Interface.
This field contains the number of interrupt identifier bits minus one.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 8206d052eb11061d7b6cada566c0804c14001fec)
Julien Grall [Thu, 15 Jan 2015 20:23:40 +0000 (20:23 +0000)]
xen/arm: vgic: Rename nr_lines into nr_spis
The field nr_lines in the arch_domain vgic structure contains the number of
SPIs for the emulated GIC. Using the nr_lines make confusion with the GIC
code, where it means the number of IRQs. This can lead to coding error.
Also introduce vgic_num_irqs to get the number of IRQ handled by the emulated
GIC.
Finally drop the initialization of nr_spis in both gicv2v_init and gicv3_init
as it's already initialized in vgic common code.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 96fcbd0599b74b4b2629447a6ee90580f43e3aa4)
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Said hypercall for large BARs can take quite a while. As such
we can require that the hypercall MUST break up the request
in smaller values.
Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invocate the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.
Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).
Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can provided.
This patch sidesteps this problem by:
- Setting an hard limit of nr_mfns having to be 64 or less.
- Toolstack adjusts correspondingly to the nr_mfn limit.
- If the there is an error when adding the toolstack will call the
remove operation to remove the whole region.
The need to break this hypercall down is for large BARs can take
more than the guest (initial domain usually) time-slice. This has
the negative result in that the guest is locked out for a long
duration and is unable to act on any pending events.
We also augment the code to return zero if nr_mfns instead
of trying to the hypercall.
This is XSA-125 / CVE-2015-2752.
Suggested-by: Jan Beulich <jbeulich@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ross Lagerwall [Thu, 26 Mar 2015 07:27:13 +0000 (08:27 +0100)]
x86: don't apply reboot quirks if reboot set by user
If reboot= is specified on the command-line, don't apply reboot quirks
to allow the command-line option to take precedence.
This is a port of Linux commit 5955633e91bf ("x86/reboot: Skip DMI
checks if reboot set by user").
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Leverage (and make apply on top of) c643fb110a ("x86/EFI: allow
reboot= overrides when running under EFI").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 9832f5e8e3575f8affceb2751f7422704bf7b446
master date: 2015-03-13 12:41:51 +0100
At the point this patch calls domain_update_node_affinity(), the vcpu
hard affinities have not yet been updated; so calling it at this point
can in some circumstances trigger an ASSERT().
domain_update_node_affinity() is already called in
cpu_disable_scheduler(), so adding it to cpupool_unassign_cpu() is
redundant. Simply reverting the patch is sufficient.
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
x86/EFI: allow reboot= overrides when running under EFI
By default we will always use EFI reboot mechanism when
running under EFI platforms. However some EFI platforms
are buggy and need to use the ACPI mechanism to
reboot (such as Lenovo ThinkCentre M57). As such
respect the 'reboot=' override and DMI overrides
for EFI platforms.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- BOOT_INVALID is just zero
- also consider acpi_disabled in BOOT_INVALID resolution
- duplicate BOOT_INVALID resolution in machine_restart()
- don't fall back from BOOT_ACPI to BOOT_EFI (if it was overridden, it
surely was for a reason)
- adjust doc change formatting
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/EFI: fix reboot after c643fb110a
acpi_disabled needs to be moved out of .init.data.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: c643fb110a51693e82a36ca9178d54f0b9744024
master date: 2015-03-13 11:25:52 +0100
master commit: 8ff330ec11e471919621bce97c069b83b0319d15
master date: 2015-03-23 18:01:51 +0100
Ross Lagerwall [Thu, 26 Mar 2015 07:21:40 +0000 (08:21 +0100)]
EFI: fix getting EFI variable list on some systems
Copy the entire output buffer to the guest because some firmwares update
size on successful calls (contrary to the spec) and the buffer may
contain data beyond the output size that the firmware requires on a
subsequent GetNextVariableName() call (e.g. a NULL character).
Note that this shouldn't change the amount of data copied because on success, a
compliant firmware does not change size and so the entire buffer is copied
anyway. If size is changed, Xen does not copy the buffer.
Without this change, the following (simplified) sequence would occur:
GetNextVariableName: in \0, size 1024 || out AdminPw\0, size 7
GetNextVariableName: in AdminPw\0, size 1024 || out UserPw\0, size 6
GetNextVariableName: in UserPww\0, size 1024 || NOT FOUND
This was seen on an Intel S1200RP_SE with firmware
S1200RP.86B.02.02.0005.102320140911, version 4.6, date 2014-10-23.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 1f4eb9d27d0ebd62a0b6cdff8213726f5ae8f25c
master date: 2015-03-10 13:52:01 +0100
Jan Beulich [Thu, 26 Mar 2015 07:21:03 +0000 (08:21 +0100)]
VT-d: print_vtd_entries() should cope with superpages
Even if VT-d code alone (i.e. when not sharing tables with EPT) still
doesn't support superpages, this function - invoked upon DMA remapping
faults - needs to cope with such.
While at it also replace a few more plain numbers with suitable named
constants.
Jan Beulich [Thu, 26 Mar 2015 07:20:11 +0000 (08:20 +0100)]
complete conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd
While converting to __cpumask_set_cpu() was correct, the first argument
passed should have been corrected to be "cpu" instead of "nr" at once.
The wrong construct results in problems on systems with relatively few
CPUs.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citirx.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 5dbdf33c57e3c95125b92f86d847ed8432e28f1c
master date: 2015-02-27 16:09:27 +0100