]> xenbits.xensource.com Git - xen.git/log
xen.git
9 years agotools/console: Fix build after "xenconsole tolerate tty errors" stable-4.2 staging-4.2
Ian Jackson [Wed, 21 Oct 2015 14:10:22 +0000 (15:10 +0100)]
tools/console: Fix build after "xenconsole tolerate tty errors"

2ddcdd96 "tools/console: xenconsole tolerate tty errors" contains a
compile error introduced during conflict resolution of the backport
from 4.4 to 4.3:

 client/main.c: In function 'get_pty_fd':
 client/main.c:124:10: error: passing argument 1 of 'warn' makes pointer from integer without a cast [-Werror]
 In file included from client/main.c:35:0:
 /usr/include/err.h:35:13: note: expected 'const char *' but argument is of type 'int'

Fix this.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 85ca813ec23c5a60680e4a13777dad530065902b)

9 years agodocs: xl.cfg: permissive option is not PV only.
Ian Campbell [Tue, 6 Oct 2015 08:42:35 +0000 (09:42 +0100)]
docs: xl.cfg: permissive option is not PV only.

Since XSA-131 qemu-xen has defaulted to non-permissive mode and the
option was extended to cover that case in 015a373351e5 "tools: libxl:
allow permissive qemu-upstream pci passthrough".

Since I was rewrapping to adjust the text anyway I've split the safety
warning into a separate paragraph to make it more obvious.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric <epretorious@yahoo.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 7f25baba1c0942e50757f4ecb233202dbbc057b9)

Conflicts:
docs/man/xl.cfg.pod.5

The text here in Xen 4.2 is quite different to that in 4.3 resulting
in a big conflict.  I resolved it by turning this commit into one
which simply removes the annotation `(PV only)'.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: libxl: allow permissive qemu-upstream pci passthrough.
Ian Campbell [Mon, 1 Jun 2015 10:32:23 +0000 (11:32 +0100)]
tools: libxl: allow permissive qemu-upstream pci passthrough.

Since XSA-131 qemu-xen now restricts access to PCI cfg by default. In
order to allow local configuration of the existing libxl_device_pci
"permissive" flag needs to be plumbed through via the new QMP property
added by the XSA-131 patches.

Versions of QEMU prior to XSA-131 did not support this permissive
property, so we only pass it if it is true. Older versions only
supported permissive mode.

qemu-xen-traditional already supports the permissive mode setting via
xenstore.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 015a373351e5c3553f848324ac0f07a9d92883fa)

9 years agotools/console: xenconsole tolerate tty errors
Ian Jackson [Thu, 27 Feb 2014 17:46:49 +0000 (17:46 +0000)]
tools/console: xenconsole tolerate tty errors

Since 28d386fc4341 (XSA-57), libxl writes an empty value for the
console tty node, with read-only permission for the guest, when
setting up pv console "frontends".  (The actual tty value is later set
by xenconsoled.)   Writing an empty node is not strictly necessary to
stop the frontend from writing dangerous values here, but it is a good
belt-and-braces approach.

Unfortunately this confuses xenconsole.  It reads the empty value, and
tries to open it as the tty.  xenconsole then exits.

Fix this by having xenconsole treat an empty value the same way as no
value at all.

Also, make the error opening the tty be nonfatal: we just print a
warning, but do not exit.  I think this is helpful in theoretical
situations where xenconsole is racing with libxl and/or xenconsoled.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
v2: Combine two conditions and move the free
(cherry picked from commit 39ba2989b10b6a1852e253b204eb010f8e7026f1)
(cherry picked from commit 7b161be2e51c519754ac4435d63c8fc03db606ec)

Conflicts:
tools/console/client/main.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 2ddcdd96a0996fe755c6a9ba08182925c57ea412)

9 years agolibxl: handle read-only drives with qemu-xen
Stefano Stabellini [Tue, 22 Sep 2015 15:56:35 +0000 (16:56 +0100)]
libxl: handle read-only drives with qemu-xen

The current libxl code doesn't deal with read-only drives at all.

Upstream QEMU and qemu-xen only support read-only cdrom drives: make
sure to specify "readonly=on" for cdrom drives and return error in case
the user requested a non-cdrom read-only drive.

This is XSA-142, discovered by Lin Liu
(https://bugzilla.redhat.com/show_bug.cgi?id=1257893).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Backport to Xen 4.5 and earlier, apropos of report and review from
Michael Young.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 998424e33db121270690586320e899a03c88b4aa)
(Backport to 4.2 and earlier.)
Conflicts:
tools/libxl/libxl_dm.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: Increase device model startup timeout to 1min.
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.

On a busy host, QEMU may take more than 10s to load and start.

This is likely due to a bug in Linux where the I/O subsystem sometime
produce high latency under load and result in QEMU taking a long time to
load every single dynamic libraries.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 9acfbe14d7261b03e3b3f4dc3c850ba2a7093e1f)

Conflicts:
tools/libxl/libxl_internal.h
(cherry picked from commit bbbd29a25d090f1180d14210358c6d7ccdebef85)
(cherry picked from commit fe66a76a2d6e30f2cd3aea6b2c8cb41f20ca6ea2)
(cherry picked from commit 6d81366482a2fbc418cc96bd2d9875ae99da612c)

9 years agoxl: correct handling of extra_config in main_cpupoolcreate
Wei Liu [Tue, 14 Jul 2015 16:41:10 +0000 (17:41 +0100)]
xl: correct handling of extra_config in main_cpupoolcreate

Don't dereference extra_config if it's NULL. Don't leak extra_config in
the end.

Also fixed a typo in error string while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 705c9e12426cba82804cb578fc70785281655d94)
(cherry picked from commit ffb4e6387f489b6b5ce287f51db43cb37ebae064)
(cherry picked from commit 213e243819ba5f72e13afad41ce2f5df17715530)
(cherry picked from commit f97021eb92e91db8032d600893a531863a18bd23)

9 years agolibxl: fix leak of config_data in main_cpupoolcreate
Matthew Daley [Wed, 18 Sep 2013 03:37:43 +0000 (15:37 +1200)]
libxl: fix leak of config_data in main_cpupoolcreate

Coverity-ID: 1087193
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 7737ecb20c5babff45445d27fff91894b43a2d28)
(cherry picked from commit 8d394a43552434fbe62b9495b88ea8f2bfc12550)

9 years agoQEMU_TAG update
Ian Jackson [Fri, 11 Sep 2015 10:49:53 +0000 (11:49 +0100)]
QEMU_TAG update

9 years agolibxl: poll: Avoid fd deregistration race POLLNVAL crash
Ian Jackson [Wed, 12 Aug 2015 11:50:12 +0000 (12:50 +0100)]
libxl: poll: Avoid fd deregistration race POLLNVAL crash

It can happen that an fd is deregistered, and closed, and then a new
fd opened, and reregistered, all while another thread is in poll().

If this happens poll might report POLLNVAL, but the event loop would
think that the fd was supposed to have been valid, and then fail an
assertion:
  libxl_event.c:1183: afterpoll_check_fd: Assertion `poller->fds_changed || !(fds[slot].revents & 0x020)' failed.

We can't simply ignore POLLNVAL because if we have bugs which cause
messed-up fds, it is a serious problem which we really need to detect.

Instead, add extra tracking to spot when this possibility arises, and
abort on POLLNVAL if we are sure that it is unexpected.

Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 681ce1681622a46d111cfdc4fc07e4cb565ae131)

(cherry picked from commit 7f7642f778b78e8e204fc082ce03072bb26887c7)

And, adjusted for semantic conflict over CTX vs ctx.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 3646b134c1673f09c0a239de10b0da4c9265c8e8)
Conflicts:
tools/libxl/libxl_event.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 3bcb2c062a02e3c45d3f87478d2cbe1a134d395c)

9 years agolibxl: make libxl__poller_put tolerate p==NULL
Ian Jackson [Fri, 11 Oct 2013 11:10:45 +0000 (12:10 +0100)]
libxl: make libxl__poller_put tolerate p==NULL

This is less fragile, and more in keeping with the usual style of
initialising everything to 0 and freeing things unconditionally.

Correspondingly, remove the tests at the call sites.

Apropos of c1f3f174.  No overall functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 6ed09e37722f601661fff42f80279a41773c574e)
(cherry picked from commit f66f746df7983fac7f63b38b1b6bde11a791f1ed)

9 years agolibxl: poll: Use poller_get and poller_put for poller_app
Ian Jackson [Thu, 9 Jul 2015 16:05:07 +0000 (17:05 +0100)]
libxl: poll: Use poller_get and poller_put for poller_app

This makes the code more regular.  We are going to want to do some
more work in poller_get and poller_put, which work also wants to be
done for poller_app.

Two very minor functional changes:

 * We call malloc an extra time since poller_app is now a pointer

 * ERROR_FAIL on poller_get failing for poller_app is generated in
   libxl_ctx_init rather than passed through by libxl_poller_init
   from libxl__pipe_nonblock.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit aae37652067eafd0f2b85050306772df0cb71f08)

(cherry picked from commit 9f6f513eecbdc76ce30b5f2e6c52e02076bac30b)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 0348c450fe253b3db6edc77568820327ff991478)
(cherry picked from commit 856f59437b2fde90f44570433b4cdbb46bfd0c47)

9 years agolibxl: poll: Make libxl__poller_get have only one success return path
Ian Jackson [Thu, 9 Jul 2015 15:52:02 +0000 (16:52 +0100)]
libxl: poll: Make libxl__poller_get have only one success return path

In preparation for doing some more work on successful exit.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit 6fc946bc5520ebdbba5cbae4d49e53895df8b393)

(cherry picked from commit 8c409135e69c7321cb6d82b8cae0868a81d05ddc)
Conflicts:
tools/libxl/libxl_event.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 900c7970de4cb809ba208465cb0abd094b4dec58)
(cherry picked from commit dd80ee80c9bf84c3c7d4c72ffe4c1a9d9d170bf9)

9 years agolibxl: don't leak memory in libxl__poller_get failure case
Matthew Daley [Wed, 30 Oct 2013 07:51:53 +0000 (20:51 +1300)]
libxl: don't leak memory in libxl__poller_get failure case

Coverity-ID: 1055894
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1edd6d8da354442b860ae28b8944dbd8a102d5f7)
(cherry picked from commit e9a0fc5998917b4c77067fa35b16eee4185ca2fa)

9 years agolibxl: event handling: ao_inprogress does waits while reports outstanding
Ian Jackson [Mon, 9 Feb 2015 15:20:32 +0000 (15:20 +0000)]
libxl: event handling: ao_inprogress does waits while reports outstanding

libxl__ao_inprogress needs to check (like
libxl__ao_complete_check_progress_reports) that there are no
oustanding progress callbacks.

Otherwise it might happen that we would destroy the ao while another
thread has an outstanding callback its egc report queue.  The other
thread would then, in its egc_run_callbacks, touch the destroyed ao.

Instead, when this happens in libxl__ao_inprogress, simply run round
the event loop again.  The thread which eventually makes the callback
will spot our poller in the ao, and notify the poller, waking us up.

This fixes an assertion failure race seen with libvirt:
  libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
or (after "Add an assert to egc_run_callbacks")
  libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit f1335f0d7b2402e94e0c6e8a905dc309edaafcfb)
(cherry picked from commit dfed6d96fd6af00c9970e2a1c600d6bb991d137e)
(cherry picked from commit 32f82fa8d9f6ea0eddfd6ab38c0b867b54f99f14)

9 years agolibxl: event handling: Break out ao_work_outstanding
Ian Jackson [Mon, 9 Feb 2015 15:18:30 +0000 (15:18 +0000)]
libxl: event handling: Break out ao_work_outstanding

Break out the test in libxl__ao_complete_check_progress_reports, into
ao_work_outstanding, which reports false if either (i) the ao is still
ongoing or (ii) there is a progress report (perhaps on a different
thread's callback queue) which has yet to be reported to the
application.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 93699882d98cce9736d6e871db303275df1138a2)
(cherry picked from commit ba683105b52f60e312151654d72b0b0e8508cee5)
(cherry picked from commit 546960ca0572d99edff03fd1eb1436c71ff24d6d)

9 years agolibxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:29 +0000 (16:02 -0400)]
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.

There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.

As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.

We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 0d8cbcad03764e42ff2f0d224aff883c3734d782)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit ca0f468192d12d8d30c2a48a37c5d3460a464a29)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 116bbc6062cd90b98f746c3a058f4ec24347527d)
Conflicts:
tools/libxl/libxl.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
Andrew Cooper [Mon, 13 Apr 2015 16:11:12 +0000 (16:11 +0000)]
tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
uint64_t and unsigned long.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 6c1cb3dba4ff97dd40909670755f24fcdf903012)
Conflicts:
tools/libxc/xc_domain.c
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit c776b6429bb3be7a68e819de496a272a4d0cd2f4)

9 years agotools/xenconsoled: Increase file descriptor limit
Andrew Cooper [Mon, 2 Mar 2015 15:04:37 +0000 (15:04 +0000)]
tools/xenconsoled: Increase file descriptor limit

XenServer's VM density testing uncovered a regression when moving from
sysvinit to systemd where the file descriptor limit dropped from 4096 to
1024. (XenServer had previously inserted a ulimit statement into its
initscripts.)

One solution is to use LimitNOFILE=4096 in xenconsoled.service to match the
lost ulimit, but that is only a stopgap solution.

As Xenconsoled genuinely needs a large number of file descriptors if a large
number of domains are running, attempt to increase the limit.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 588df84c0d702e835e526ecef3af6c5444857558)
(cherry picked from commit 9d5b2b01024d18b2135c1b0deebb8bfbf5dced5f)
(cherry picked from commit a490f8df8c989483624ebd6befb63377be55722b)
(cherry picked from commit 9d6754dc2dc488f1a5877955836fa46b4ee119bb)

9 years agoocaml/xenctrl: Fix stub_xc_readconsolering()
Andrew Cooper [Fri, 30 Jan 2015 14:11:14 +0000 (14:11 +0000)]
ocaml/xenctrl: Fix stub_xc_readconsolering()

The Ocaml stub to retrieve the hypervisor console ring had a few problems.

 * A single 32k buffer would truncate a large console ring.
 * The buffer was static and not under the protection of the Ocaml GC lock so
   could be clobbered by concurrent accesses.
 * Embedded NUL characters would cause caml_copy_string() (which is strlen()
   based) to truncate the buffer.

The function is rewritten from scratch, using the same algorithm as the python
stubs, but uses the protection of the Ocaml GC lock to maintain a static
running total of the ring size, to avoid redundant realloc()ing in future
calls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Dave Scott <dave.scott@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 1a010ca99e9b04c1cfbd0ee718aa22d5ebd530ab)
(cherry picked from commit cfc4c43be14e60608ed0b8173365c737950afe41)
(cherry picked from commit c669c244246a7e45cedb03a30d59656c95d09719)
(cherry picked from commit 04f8006b66fd9f351467ad0e9c77a0a14741c6e4)

9 years agoocaml/xenctrl: Make failwith_xc() thread safe
Andrew Cooper [Wed, 28 Jan 2015 17:55:32 +0000 (17:55 +0000)]
ocaml/xenctrl: Make failwith_xc() thread safe

The static error_str[] buffer is not thread-safe, and 1024 bytes is
unreasonably large.  Reduce to 256 bytes (which is still much larger than any
current use), and move it to being a stack variable.

Also, propagate the Noreturn attribute from caml_raise_with_string().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Dave Scott <Dave.Scott@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit c8945d51613450c19e0898b1b3056c90f4929179)
(cherry picked from commit 032673c8836e28d9e291e0d02235001c41aedaab)
(cherry picked from commit 9702e084d09550495c2e71f2639c1c2c43aeaf63)
(cherry picked from commit dea3fd1e945de74154e74989623a4272f43338fe)

9 years agoocaml/xenctrl: Check return values from hypercalls
Andrew Cooper [Tue, 27 Jan 2015 20:38:11 +0000 (20:38 +0000)]
ocaml/xenctrl: Check return values from hypercalls

rather than blindly continuing and possibly using negative values.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Dave Scott <dave.scott@eu.citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
(cherry picked from commit 3380f5b6270e6fa4b24313f8808e7625e4c5a6ba)
(cherry picked from commit c91ed889ea3c3781a94a30909f30c3aad56c97d5)
(cherry picked from commit 10a95535b0dcde7daa3402b92f3c8d4494781c28)
(cherry picked from commit 8d0ca8a5e1965e241baf32e2c8dfcd82bb924aac)

9 years agolibxl: In domain death search, start search at first domid we want
Ian Jackson [Tue, 17 Mar 2015 15:30:57 +0000 (09:30 -0600)]
libxl: In domain death search, start search at first domid we want

From: Ian Jackson <Ian.Jackson@eu.citrix.com>

When domain_death_xswatch_callback needed a further call to
xc_domain_getinfolist it would restart it with the last domain it
found rather than the first one it wants.

If it only wants one it will also only ask for one domain.  The result
would then be that it gets the previous domain again (ie, the previous
one to the one it wants), which still doesn't reveal the answer to the
question, and it would therefore loop again.

It's completely unclear to me why I thought it was a good idea to
start the xc_domain_getinfolist with the last domain previously found
rather than the first one left un-confirmed.  The code has been that
way since it was introduced.

Instead, start each xc_domain_getinfolist at the next domain whose
status we need to check.

We also need to move the test for !evg into the loop, we now need evg
to compute the arguments to getinfolist.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec)
(cherry picked from commit 0b19348f3cd176e4badb173dd0054c49346a6ce1)
(cherry picked from commit 13623d5d8e854f20b2da885f4d452dc870912205)
(cherry picked from commit 0332b3f24b2c1be06a17b00786021aeea01c52f5)

9 years agodocs: workaround markdown parser error in xen-command-line.markdown
Ian Campbell [Wed, 19 Nov 2014 10:42:18 +0000 (10:42 +0000)]
docs: workaround markdown parser error in xen-command-line.markdown

Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.

The effect is that the "Default" section of all following entries are replace
by some sort of hash or checksum (at least, a string of 32 random seeming hex
digits).

Workaround this issue by making the decriptions of the DPS options a nested
list, moving the existing nested list describing the options for S into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least its not a regression IMHO)

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ae325e94d2076f3953824e069c062908221f7325)
(cherry picked from commit 3e9054c523ed089d04cf10b7f5b417240be85760)
(cherry picked from commit f558eb43bf6f381853ef010d2d7ba9327481d2de)

9 years agoxl: Sane handling of extra config file arguments
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments

Various xl sub-commands take additional parameters containing = as
additional config fragments.

The handling of these config fragments has a number of bugs:

 1. Use of a static 1024-byte buffer.  (If truncation would occur,
    with semi-trusted input, a security risk arises due to quotes
    being lost.)

 2. Mishandling of the return value from snprintf, so that if
    truncation occurs, the to-write pointer is updated with the
    wanted-to-write length, resulting in stack corruption.  (This is
    XSA-137.)

 3. Clone-and-hack of the code for constructing the appended
    config file.

These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere.  The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.

The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).

Additional bugs I observe, not fixed here:

 4. The functions which now call string_realloc_append use ad-hoc
    error returns, with multiple calls to `return'.  This currently
    necessitates multiple new calls to `free'.

 5. Many of the paths in xl call exit(-rc) where rc is a libxl status
    code.  This is a ridiculous exit status `convention'.

 6. The loops for handling extra config data are clone-and-hacks.

 7. Once the extra config buffer is accumulated, it must be combined
    with the appropriate main config file.  The code to do this
    combining is clone-and-hacked too.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian,campbell@citrix.com>
(cherry picked from commit dd84604f35bd3855c57146eb8fe53924c10d3963)
(cherry picked from commit 6040b3aeb32b4bce2d9958ecbcbd020c46c35d61)
(cherry picked from commit 214fd40a20fa5988b4ea021c2d06e8aca8dda184)
(cherry picked from commit d7ab3a1c1cc245dc0683bb937467c27141754053)

9 years agoQEMU_TAG update
Ian Jackson [Wed, 29 Jul 2015 15:35:24 +0000 (16:35 +0100)]
QEMU_TAG update

9 years agoQEMU_TAG update
Ian Jackson [Tue, 23 Jun 2015 10:44:13 +0000 (11:44 +0100)]
QEMU_TAG update

9 years agoQEMU_TAG update
Ian Jackson [Thu, 11 Jun 2015 15:49:25 +0000 (16:49 +0100)]
QEMU_TAG update

9 years agox86/traps: loop in the correct direction in compat_iret()
Andrew Cooper [Thu, 11 Jun 2015 13:05:36 +0000 (15:05 +0200)]
x86/traps: loop in the correct direction in compat_iret()

This is CVE-2015-4164 / XSA-136.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1f0721377952fc038b48f175d7061ec701359aac
master date: 2015-06-11 14:44:47 +0200

9 years agognttab: add missing version check to GNTTABOP_swap_grant_ref handling
Jan Beulich [Thu, 11 Jun 2015 13:03:58 +0000 (15:03 +0200)]
gnttab: add missing version check to GNTTABOP_swap_grant_ref handling

... avoiding NULL derefs when the version to use wasn't set yet (via
GNTTABOP_setup_table or GNTTABOP_set_version).

This is CVE-2015-4163 / XSA-134.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 5d5c09d853d3f212861f70c577c65d1703f752ae
master date: 2015-06-11 14:44:12 +0200

9 years agoQEMU_TAG update
Ian Jackson [Wed, 27 May 2015 15:57:21 +0000 (16:57 +0100)]
QEMU_TAG update

10 years agodomctl/sysctl: don't leak hypervisor stack to toolstacks
Andrew Cooper [Tue, 21 Apr 2015 07:27:07 +0000 (09:27 +0200)]
domctl/sysctl: don't leak hypervisor stack to toolstacks

This is CVE-2015-3340 / XSA-132.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4ff3449f0e9d175ceb9551d3f2aecb59273f639d
master date: 2015-04-21 09:03:15 +0200

10 years agoLimit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Konrad Rzeszutek Wilk [Wed, 1 Apr 2015 09:12:19 +0000 (10:12 +0100)]
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)

Said hypercall for large BARs can take quite a while. As such
we can require that the hypercall MUST break up the request
in smaller values.

Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invocate the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.

Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).

Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can provided.

This patch sidesteps this problem by:
 - Setting an hard limit of nr_mfns having to be 64 or less.
 - Toolstack adjusts correspondingly to the nr_mfn limit.
 - If the there is an error when adding the toolstack will call the
   remove operation to remove the whole region.

The need to break this hypercall down is for large BARs can take
more than the guest (initial domain usually) time-slice. This has
the negative result in that the guest is locked out for a long
duration and is unable to act on any pending events.

We also augment the code to return zero if nr_mfns instead
of trying to the hypercall.

This is XSA-125 / CVE-2015-2752.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoQEMU_TAG update
Ian Jackson [Tue, 31 Mar 2015 15:35:14 +0000 (16:35 +0100)]
QEMU_TAG update

10 years agotools: libxl: Explicitly disable graphics backends on qemu cmdline
Ian Campbell [Fri, 20 Feb 2015 14:41:09 +0000 (14:41 +0000)]
tools: libxl: Explicitly disable graphics backends on qemu cmdline

By default qemu will try to create some sort of backend for the
emulated VGA device, either SDL or VNC.

However when the user specifies sdl=0 and vnc=0 in their configuration
libxl was not explicitly disabling either backend, which could lead to
one unexpectedly running.

If either sdl=1 or vnc=1 is configured then both before and after this
change only the backends which are explicitly enabled are configured,
i.e. this issue only occurs when all backends are supposed to have
been disabled.

This affects qemu-xen and qemu-xen-traditional differently.

If qemu-xen was compiled with SDL support then this would result in an
SDL window being opened if $DISPLAY is valid, or a failure to start
the guest if not. Passing "-display none" to qemu before any further
-sdl options disables this default behaviour and ensures that SDL is
only started if the libxl configuration demands it.

If qemu-xen was compiled without SDL support then qemu would instead
start a VNC server listening on ::1 (IPv6 localhost) or 127.0.0.1
(IPv4 localhost) with IPv6 preferred if available. Explicitly pass
"-vnc none" when vnc is not enabled in the libxl configuration to
remove this possibility.

qemu-xen-traditional would never start a vnc backend unless asked.
However by default it will start an SDL backend, the way to disable
this is to pass a -vnc option. In other words passing "-vnc none" will
disable both vnc and sdl by default. sdl can then be reenabled if
configured by subsequent use of the -sdl option.

Tested with both qemu-xen and qemu-xen-traditional built with SDL
support and:
xl cr # defaults
xl cr sdl=0 vnc=0
xl cr sdl=1 vnc=0
xl cr sdl=0 vnc=1
xl cr sdl=0 vnc=0 vga=\"none\"
xl cr sdl=0 vnc=0 nographic=1
with both valid and invalid $DISPLAY.

This is XSA-119 / CVE-2015-2152.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years agox86emul: fully ignore segment override for register-only operations
Jan Beulich [Tue, 10 Mar 2015 13:00:22 +0000 (14:00 +0100)]
x86emul: fully ignore segment override for register-only operations

For ModRM encoded instructions with register operands we must not
overwrite ea.mem.seg (if a - bogus in that case - segment override was
present) as it aliases with ea.reg.

This is CVE-2015-2151 / XSA-123.

Reported-by: Felix Wilhelm <fwilhelm@ernw.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Keir Fraser <keir@xen.org>
master commit: bcf92a5382b75fd964c1f8678b2d9a3abe6dec39
master date: 2015-03-10 13:45:51 +0100

10 years agopre-fill structures for certain HYPERVISOR_xen_version sub-ops
Aaron Adams [Thu, 5 Mar 2015 12:51:53 +0000 (13:51 +0100)]
pre-fill structures for certain HYPERVISOR_xen_version sub-ops

... avoiding to pass hypervisor stack contents back to the caller
through space unused by the respective strings.

This is CVE-2015-2045 / XSA-122.

Signed-off-by: Aaron Adams <Aaron.Adams@nccgroup.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: fe2e079f642effb3d24a6e1a7096ef26e691d93e
master date: 2015-03-05 13:35:54 +0100

10 years agox86/HVM: return all ones on wrong-sized reads of system device I/O ports
Jan Beulich [Thu, 5 Mar 2015 12:50:50 +0000 (13:50 +0100)]
x86/HVM: return all ones on wrong-sized reads of system device I/O ports

So far the value presented to the guest remained uninitialized.

This is CVE-2015-2044 / XSA-121.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: c9e57594e1ba5da9d705dee9f00aa4e7e925963d
master date: 2015-03-05 13:34:54 +0100

10 years agox86/HVM: prevent use-after-free when destroying a domain
Mihai Donțu [Tue, 6 Jan 2015 12:52:21 +0000 (12:52 +0000)]
x86/HVM: prevent use-after-free when destroying a domain

hvm_domain_relinquish_resources() can free certain domain resources
which can still be accessed, e.g. by HVMOP_set_param, while the domain
is being cleaned up.

Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
Tested-by: Răzvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
This is CVE-2015-0361 / XSA-116.
(cherry picked from commit 5d4e3ff19c33770ce01bec949c50326b11088fef)

10 years agoswitch to write-biased r/w locks
Keir Fraser [Mon, 8 Dec 2014 14:32:04 +0000 (15:32 +0100)]
switch to write-biased r/w locks

This is to improve fairness: A permanent flow of read acquires can
otherwise lock out eventual writers indefinitely.

This is CVE-2014-9065 / XSA-114.

Signed-off-by: Keir Fraser <keir@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 2a549b9c8aa48dc39d7c97e5a93978b781b3a1db
master date: 2014-12-08 14:45:46 +0100

10 years agox86/HVM: confine internally handled MMIO to solitary regions
Jan Beulich [Thu, 27 Nov 2014 13:18:14 +0000 (14:18 +0100)]
x86/HVM: confine internally handled MMIO to solitary regions

While it is generally wrong to cross region boundaries when dealing
with MMIO accesses of repeated string instructions (currently only
MOVS) as that would do things a guest doesn't expect (leaving aside
that none of these regions would normally be accessed with repeated
string instructions in the first place), this is even more of a problem
for all virtual MSI-X page accesses (both msixtbl_{read,write}() can be
made dereference NULL "entry" pointers this way) as well as undersized
(1- or 2-byte) LAPIC writes (causing vlapic_read_aligned() to access
space beyond the one memory page set up for holding LAPIC register
values).

Since those functions validly assume to be called only with addresses
their respective checking functions indicated to be okay, it is generic
code that needs to be fixed to clip the repetition count.

To be on the safe side (and consistent), also do the same for buffered
I/O intercepts, even if their only client (stdvga) doesn't put the
hypervisor at risk (i.e. "only" guest misbehavior would result).

This is CVE-2014-8867 / XSA-112.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86: limit checks in hypercall_xlat_continuation() to actual arguments
Jan Beulich [Thu, 27 Nov 2014 13:17:32 +0000 (14:17 +0100)]
x86: limit checks in hypercall_xlat_continuation() to actual arguments

HVM/PVH guests can otherwise trigger the final BUG_ON() in that
function by entering 64-bit mode, setting the high halves of affected
registers to non-zero values, leaving 64-bit mode, and issuing a
hypercall that might get preempted and hence become subject to
continuation argument translation (HYPERVISOR_memory_op being the only
one possible for HVM, PVH also having the option of using
HYPERVISOR_mmuext_op). This issue got introduced when HVM code was
switched to use compat_memory_op() - neither that nor
hypercall_xlat_continuation() were originally intended to be used by
other than PV guests (which can't enter 64-bit mode and hence have no
way to alter the high halves of 64-bit registers).

This is CVE-2014-8866 / XSA-111.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86/mm: fix a reference counting error in MMU_MACHPHYS_UPDATE
Andrew Cooper [Thu, 20 Nov 2014 16:47:07 +0000 (17:47 +0100)]
x86/mm: fix a reference counting error in MMU_MACHPHYS_UPDATE

Any domain which can pass the XSM check against a translated guest can cause a
page reference to be leaked.

While shuffling the order of checks, drop the quite-pointless MEM_LOG().  This
brings the check in line with similar checks in the vicinity.

Discovered while reviewing the XSA-109/110 followup series.

This is XSA-113.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86emul: enforce privilege level restrictions when loading CS
Jan Beulich [Tue, 18 Nov 2014 13:33:55 +0000 (14:33 +0100)]
x86emul: enforce privilege level restrictions when loading CS

Privilege level checks were basically missing for the CS case, the
only check that was done (RPL == DPL for nonconforming segments)
was solely covering a single special case (return to non-conforming
segment).

Additionally in long mode the L bit set requires the D bit to be clear,
as was recently pointed out for KVM by Nadav Amit
<namit@cs.technion.ac.il>.

Finally we also need to force the loaded selector's RPL to CPL (at
least as long as lret/retf emulation doesn't support privilege level
changes).

This is CVE-2014-8595 / XSA-110.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years agox86: don't allow page table updates on non-PV page tables in do_mmu_update()
Jan Beulich [Tue, 18 Nov 2014 13:32:57 +0000 (14:32 +0100)]
x86: don't allow page table updates on non-PV page tables in do_mmu_update()

paging_write_guest_entry() and paging_cmpxchg_guest_entry() aren't
consistently supported for non-PV guests (they'd deref NULL for PVH or
non-HAP HVM ones). Don't allow respective MMU_* operations on the
page tables of such domains.

This is CVE-2014-8594 / XSA-109.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agox86/paging: make log-dirty operations preemptible
Jan Beulich [Fri, 31 Oct 2014 10:23:17 +0000 (11:23 +0100)]
x86/paging: make log-dirty operations preemptible

Both the freeing and the inspection of the bitmap get done in (nested)
loops which - besides having a rather high iteration count in general,
albeit that would be covered by XSA-77 - have the number of non-trivial
iterations they need to perform (indirectly) controllable by both the
guest they are for and any domain controlling the guest (including the
one running qemu for it).

Note that the tying of the continuations to the invoking domain (which
previously [wrongly] used the invoking vCPU instead) implies that the
tools requesting such operations have to make sure they don't issue
multiple similar operations in parallel.

Note further that this breaks supervisor-mode kernel assumptions in
hypercall_create_continuation() (where regs->eip gets rewound to the
current hypercall stub beginning), but otoh
hypercall_cancel_continuation() doesn't work in that mode either.
Perhaps time to rip out all the remains of that feature?

This is part of CVE-2014-5146 / XSA-97.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 070493dfd2788e061b53f074b7ba97507fbcbf65
master date: 2014-10-06 11:22:04 +0200

10 years agox86/HVM: properly bound x2APIC MSR range
Jan Beulich [Wed, 1 Oct 2014 13:13:17 +0000 (15:13 +0200)]
x86/HVM: properly bound x2APIC MSR range

While the write path change appears to be purely cosmetic (but still
gets done here for consistency), the read side mistake permitted
accesses beyond the virtual APIC page.

Note that while this isn't fully in line with the specification
(digesting MSRs 0x800-0xBFF for the x2APIC), this is the minimal
possible fix addressing the security issue and getting x2APIC related
code into a consistent shape (elsewhere a 256 rather than 1024 wide
window is being used too). This will be dealt with subsequently.

This is CVE-2014-7188 / XSA-108.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 61fdda7acf3de11f3d50d50e5b4f4ecfac7e0d04
master date: 2014-10-01 14:54:47 +0200

10 years agox86emul: only emulate software interrupt injection for real mode
Jan Beulich [Tue, 23 Sep 2014 12:51:12 +0000 (14:51 +0200)]
x86emul: only emulate software interrupt injection for real mode

Protected mode emulation currently lacks proper privilege checking of
the referenced IDT entry, and there's currently no legitimate way for
any of the respective instructions to reach the emulator when the guest
is in protected mode.

This is XSA-106.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 346d4545569928b652c40c7815c1732676f8587c
master date: 2014-09-23 14:33:50 +0200

10 years agox86/emulate: check cpl for all privileged instructions
Andrew Cooper [Tue, 23 Sep 2014 12:50:39 +0000 (14:50 +0200)]
x86/emulate: check cpl for all privileged instructions

Without this, it is possible for userspace to load its own IDT or GDT.

This is XSA-105.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrei LUTAS <vlutas@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0e442727ceccfa32a7276cccd205b4722e68fdc1
master date: 2014-09-23 14:33:06 +0200

10 years agox86/shadow: fix race condition sampling the dirty vram state
Andrew Cooper [Tue, 23 Sep 2014 12:49:35 +0000 (14:49 +0200)]
x86/shadow: fix race condition sampling the dirty vram state

d->arch.hvm_domain.dirty_vram must be read with the domain's paging lock held.

If not, two concurrent hypercalls could both end up attempting to free
dirty_vram (the second of which will free a wild pointer), or both end up
allocating a new dirty_vram structure (the first of which will be leaked).

This is XSA-104.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 46a49b91f1026f64430b84dd83e845a33f06415e
master date: 2014-09-23 14:31:47 +0200

10 years agoupdate Xen version to 4.2.5 RELEASE-4.2.5
Jan Beulich [Tue, 2 Sep 2014 06:22:57 +0000 (08:22 +0200)]
update Xen version to 4.2.5

10 years agox86_emulate: properly do IP updates and other side effects on success
Jan Beulich [Fri, 22 Aug 2014 12:13:27 +0000 (14:13 +0200)]
x86_emulate: properly do IP updates and other side effects on success

The two MMX/SSE/AVX code blocks failed to update IP properly, and these
as well as get_reg_refix(), which "manually" updated IP so far, failed
to do the TF and RF processing needed at the end of successfully
emulated instructions.

Fix the test utility at once to check IP is properly getting updated,
and while at it macroize the respective code quite a bit, hopefully
making it easier to add further tests when the need arises.

Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
master commit: 3af450fd2d9403f208d3ac6459716f027b8597ad
master date: 2014-08-08 09:34:03 +0200

10 years agoRevert "x86/paging: make log-dirty operations preemptible"
Ian Jackson [Tue, 19 Aug 2014 11:54:06 +0000 (12:54 +0100)]
Revert "x86/paging: make log-dirty operations preemptible"

This reverts commit 6fbd99d7cc46a91566839e85bffb9bff406adbae.

This fix has been discovered to cause regressions.  Investigations are
ongoing.

Revert-reqeuested-by: Jan Beulich <JBeulich@suse.com>
10 years agox86/cpu: undo BIOS CPUID max_leaf limit before querying for features
Andrew Cooper [Tue, 12 Aug 2014 14:10:01 +0000 (16:10 +0200)]
x86/cpu: undo BIOS CPUID max_leaf limit before querying for features

If IA32_MISC_ENABLE[22] is set by the BIOS, CPUID.0.EAX will be limited to 3.
Lift this limit before considering whether to query CPUID.7[ECX=0].EBX for
features.

Without this change, dom0 is able to see this feature leaf (as the limit was
subsequently lifted), and will set features appropriately in HVM domain cpuid
policies.

The specific bug XenServer observed was the advertisement of the FSGSBASE
feature, but an inability to set CR4.FSGSBASE as Xen considered the bit to be
reserved as cpu_has_fsgsbase incorrectly evaluated as false.

This is a regression introduced by c/s 44e24f8567 "x86: don't call
generic_identify() redundantly" where the redundant call actually resampled
CPUID.7[ECX=0] properly to obtain the feature flags.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a1ac4cf52e38386bac7ac3440c7da0099662ca5c
master date: 2014-07-29 17:02:25 +0200

10 years agox86/paging: make log-dirty operations preemptible
Jan Beulich [Tue, 12 Aug 2014 14:09:12 +0000 (16:09 +0200)]
x86/paging: make log-dirty operations preemptible

Both the freeing and the inspection of the bitmap get done in (nested)
loops which - besides having a rather high iteration count in general,
albeit that would be covered by XSA-77 - have the number of non-trivial
iterations they need to perform (indirectly) controllable by both the
guest they are for and any domain controlling the guest (including the
one running qemu for it).

This is CVE-2014-5146 / XSA-97.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 95e6d82224689fdfd967a093a4d69efc24c17e91
master date: 2014-08-12 15:30:11 +0200

10 years agox86/mm/hap: Adjust vram tracking to play nicely with log-dirty.
Robert Phillips [Tue, 12 Aug 2014 14:08:10 +0000 (16:08 +0200)]
x86/mm/hap: Adjust vram tracking to play nicely with log-dirty.

The previous code assumed the guest would be in one of three mutually exclusive
modes for bookkeeping dirty pages: (1) shadow, (2) hap utilizing the log dirty
bitmap to support functionality such as live migrate, (3) hap utilizing the
log dirty bitmap to track dirty vram pages.
Races arose when a guest attempted to track dirty vram while performing live
migrate.  (The dispatch table managed by paging_log_dirty_init() might change
in the middle of a log dirty or a vram tracking function.)

This change allows hap log dirty and hap vram tracking to be concurrent.
Vram tracking no longer uses the log dirty bitmap.  Instead it detects
dirty vram pages by examining their p2m type.  The log dirty bitmap is only
used by the log dirty code.  Because the two operations use different
mechanisms, they are no longer mutually exclusive.

Signed-Off-By: Robert Phillips <robert.phillips@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Minor whitespace changes to conform with coding style
Signed-off-by: Tim Deegan <tim@xen.org>
master commit: fd91a2a662bc59677e0f217423a7a155d5465886
master date: 2012-12-13 12:10:14 +0000

10 years agoupdate Xen version to 4.2.5-rc2 4.2.5-rc2
Jan Beulich [Tue, 5 Aug 2014 11:44:16 +0000 (13:44 +0200)]
update Xen version to 4.2.5-rc2

10 years agox86/mem_event: prevent underflow of vcpu pause counts
Andrew Cooper [Mon, 28 Jul 2014 13:16:19 +0000 (15:16 +0200)]
x86/mem_event: prevent underflow of vcpu pause counts

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tested-by: Aravindh Puthiyaparambil <aravindp@cisco.com>
master commit: 868d9b99b39c53dc1f6ae9bfd7b148c206fd7240
master date: 2014-07-23 18:08:04 +0200

10 years agox86/mem_event: validate the response vcpu_id before acting on it
Andrew Cooper [Mon, 28 Jul 2014 13:15:21 +0000 (15:15 +0200)]
x86/mem_event: validate the response vcpu_id before acting on it

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
master commit: ee75480b3c8856db9ef1aa45418f35ec0d78989d
master date: 2014-07-23 18:07:11 +0200

10 years agox86/mem_event: fix regression affecting CR0 memory events
Tamas K Lengyel [Mon, 28 Jul 2014 13:14:43 +0000 (15:14 +0200)]
x86/mem_event: fix regression affecting CR0 memory events

This is a patch repairing a regression in code previously functional in 4.1.x.
It appears that, during some refactoring work, call to hvm_memory_event_cr0 was lost.

This function was originally called in mov_to_cr() of vmx.c, but the commit
http://xenbits.xen.org/hg/xen-unstable.hg/rev/1276926e3795 abstracted the
original code into generic functions up a level in hvm.c, dropping the call
in the process.

The same issue affected the CR3 and CR4 events, which were fixed in patch
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/7ab899e46347.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 5d570c1d0274cac3b333ef378af3325b3b69905e
master date: 2014-07-23 18:05:11 +0200

10 years agoavoid crash when doing shutdown with active cpupools
Juergen Gross [Mon, 28 Jul 2014 13:11:21 +0000 (15:11 +0200)]
avoid crash when doing shutdown with active cpupools

When shutting down the machine while there are cpus in a cpupool other than
Pool-0 a crash is triggered due to cpupool handling rejecting offlining the
non-boot cpus in other cpupools.

It is easy to detect this case and allow offlining those cpus.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
master commit: 05377dede434c746e6708f055858378d20f619db
master date: 2014-07-23 18:03:19 +0200

10 years agoproperly reference count DOMCTL_{,un}pausedomain hypercalls
Andrew Cooper [Mon, 28 Jul 2014 13:10:41 +0000 (15:10 +0200)]
properly reference count DOMCTL_{,un}pausedomain hypercalls

For safety reasons, c/s 6ae2df93c27 "mem_access: Add helper API to setup
ring and enable mem_access" has to pause the domain while it performs a set of
operations.

However without properly reference counted hypercalls, xc_mem_event_enable()
now unconditionally unpauses a previously paused domain.

To prevent toolstack software running wild, there is an arbitrary limit of 255
on the toolstack pause count.  This is high enough for several components of
the toolstack to safely use, but prevents over/underflow of d->pause_count.

The previous domain_{,un}pause_by_systemcontroller() functions are updated to
return an error code.  domain_pause_by_systemcontroller() is modified to have
a common stub and take a pause_fn pointer, allowing for both sync and nosync
domain pauses.  domain_pause_for_debugger() has a hand-rolled nosync pause
replaced with the new domain_pause_by_systemcontroller_nosync(), and has its
variables shuffled slightly to avoid rereading current multiple times.

Suggested-by: Don Slutz <dslutz@verizon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
With a couple of formatting adjustments:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/gdbsx: invert preconditions for XEN_DOMCTL_gdbsx_{,un}pausevcpu hypercalls

c/s 3eb1c708ab "properly reference count DOMCTL_{,un}pausedomain hypercalls"
accidentally inverted the use of d->controller_pause_count.

Revert back to how it was originally, i.e. the XEN_DOMCTL_gdbsx_{,un}pausevcpu
hypercalls are only valid for a domain already paused by the system controller.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3eb1c708ab0fe1067a436498a684907afa14dacf
master date: 2014-07-03 16:51:13 +0200
master commit: 680d79f10bb70691a9ae3b4a6a8b669e0f2837f6
master date: 2014-07-25 11:53:31 +0200

10 years agoVT-d/ATS: correct and clean up dev_invalidate_iotlb()
Jan Beulich [Mon, 28 Jul 2014 13:09:47 +0000 (15:09 +0200)]
VT-d/ATS: correct and clean up dev_invalidate_iotlb()

While this was intended to only do cleanup (replace the two bogus
"ret |= " constructs, and a simple formatting correction), this now
also
- fixes the bit manipulations for size_order > 0
  a) correct an off-by-one in the use of size_order for shifting (till
     now double the requested size got invalidated)
  b) in fact setting bit 12 and up if necessary (without which too
     small a region might have got invalidated)
  c) making them capable of dealing with regions of 4Gb size and up
- corrects the return value handling, such that a later iteration's
  success won't clear an earlier iteration's error indication
- uses PCI_BDF2() instead of open coding it
- bail immediately on bad passed in invalidation type, rather than
  repeatedly printing the same message for each ATS-capable device, at
  once also no longer hiding that failure from the caller

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: fd33987ba27607c3cc7da258cf1d86d21beeb735
master date: 2014-06-30 15:57:40 +0200

10 years agoblktap2: Fix two 'maybe uninitialized' variables
Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap2: Fix two 'maybe uninitialized' variables

for which gcc 4.9.0 complains about, like this:

block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   memcpy(tmp_ptr, l1_ptr, 4096);
   ^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
   if (write(s->fd, tmp_ptr2, 4096) != 4096) {
       ^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
 recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1

The proper behavior is to return upon allocation failure.
About what to return, 0 seems the best option, looking
at both the function and the call sites.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 345e44a85d71a1a910385f33c7f1ba3683026d18)
(cherry picked from commit 5e39eb05aa2a6d9bfa6c3b3e299b071422498625)
(cherry picked from commit c591613f8c629f18a521269d67240d532f2c27d1)

10 years agoVT-d/qinval: make local variable used for communication with IOMMU "volatile"
Jan Beulich [Tue, 24 Jun 2014 08:24:52 +0000 (10:24 +0200)]
VT-d/qinval: make local variable used for communication with IOMMU "volatile"

Without that there is - afaict - nothing preventing the compiler from
putting the variable into a register for the duration of the wait loop.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: ceec46c02074e1b2ade0b13c3c4a2f3942ae698c
master date: 2014-06-20 10:25:33 +0200

10 years agox86/EFI: allow FPU/XMM use in runtime service functions
Jan Beulich [Tue, 24 Jun 2014 08:24:20 +0000 (10:24 +0200)]
x86/EFI: allow FPU/XMM use in runtime service functions

UEFI spec update 2.4B developed a requirement to enter runtime service
functions with CR0.TS (and CR0.EM) clear, thus making feasible the
already previously stated permission for these functions to use some of
the XMM registers. Enforce this requirement (along with the connected
ones on FPU control word and MXCSR) by going through a full FPU save
cycle (if the FPU was dirty) in efi_rs_enter() (along with loading  the
specified values into the other two registers).

Note that the UEFI spec mandates that extension registers other than
XMM ones (for our purposes all that get restored eagerly) are preserved
across runtime function calls, hence there's nothing we need to restore
in efi_rs_leave() (they do get saved, but just for simplicity's sake).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: e0fe297dabc96d8161d568f19a99722c4739b9f9
master date: 2014-06-18 15:53:27 +0200

10 years agoIOMMU: prevent VT-d device IOTLB operations on wrong IOMMU
Malcolm Crossley [Tue, 24 Jun 2014 08:23:12 +0000 (10:23 +0200)]
IOMMU: prevent VT-d device IOTLB operations on wrong IOMMU

PCIe ATS allows for devices to contain IOTLBs, the VT-d code was iterating
around all ATS capable devices and issuing IOTLB operations for all IOMMUs,
even though each ATS device is only accessible via one particular IOMMU.

Issuing an IOMMU operation to a device not accessible via that IOMMU results
in an IOMMU timeout because the device does not reply. VT-d IOMMU timeouts
result in a Xen panic.

Therefore this bug prevents any Intel system with 2 or more ATS enabled IOMMUs,
each with an ATS device connected to them, from booting Xen.

The patch adds a IOMMU pointer to the ATS device struct so the VT-d code can
ensure it does not issue IOMMU ATS operations on the wrong IOMMU. A void
pointer has to be used because AMD and Intel IOMMU implementations do not have
a common IOMMU structure or indexing mechanism.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 84c340ba4c3eb99278b6ba885616bb183b88ad67
master date: 2014-06-18 15:50:02 +0200

10 years agox86/mce: don't spam the console with "CPUx: Temperature z"
Konrad Rzeszutek Wilk [Tue, 24 Jun 2014 08:18:29 +0000 (10:18 +0200)]
x86/mce: don't spam the console with "CPUx: Temperature z"

If the machine has been quite busy it ends up with these messages
printed on the hypervisor console:

(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal

While the state changes are important, the non-altered state
information is not needed. As such add a latch mechanism to only print
the information if it has changed since the last update (and the
hardware doesn't properly suppress redundant notifications).

This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
master commit: 323338f86fb6cd6f6dba4f59a84eed71b3552d21
master date: 2014-06-16 11:59:32 +0200

10 years agox86/HVM: refine SMEP test in HVM_CR4_GUEST_RESERVED_BITS()
Jan Beulich [Tue, 24 Jun 2014 08:17:45 +0000 (10:17 +0200)]
x86/HVM: refine SMEP test in HVM_CR4_GUEST_RESERVED_BITS()

Andrew validly points out that the use of the macro on the restore path
can't rely on the CPUID bits for the guest already being in place (as
their setting by the tool stack in turn requires the other restore
operations already having taken place). And even worse, using
hvm_cpuid() is invalid here because that function assumes to be used in
the context of the vCPU in question.

Reverting to the behavior prior to the change from checking
cpu_has_sm?p to hvm_vcpu_has_sm?p() would break the other (non-restore)
use of the macro. So let's revert to the prior behavior only for the
restore path, by adding a respective second parameter to the macro.

Obviously the two cpu_has_* uses in the macro should really also be
converted to hvm_cpuid() based checks at least for the non-restore
path.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
master commit: 584287380baf81e5acdd9dc7dfc7ffccd1e9a856
master date: 2014-06-10 13:12:05 +0200

10 years agoavoid crash on HVM domain destroy with PCI passthrough
Juergen Gross [Tue, 24 Jun 2014 08:16:12 +0000 (10:16 +0200)]
avoid crash on HVM domain destroy with PCI passthrough

c/s bac6334b5 "move domain to cpupool0 before destroying it" introduced a
problem when destroying a HVM domain with PCI passthrough enabled. The
moving of the domain to cpupool0 includes moving the pirqs to the cpupool0
cpus, but the event channel infrastructure already is unusable for the
domain. So just avoid moving pirqs for dying domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: b9ae60907e6dbc686403e52a7e61a6f856401a1b
master date: 2014-06-10 12:04:08 +0200

10 years agox86: fix reboot/shutdown with running HVM guests
Roger Pau Monné [Tue, 24 Jun 2014 08:15:29 +0000 (10:15 +0200)]
x86: fix reboot/shutdown with running HVM guests

If there's a guest using VMX/SVM when the hypervisor shuts down, it
can lead to the following crash due to VMX/SVM functions being called
after hvm_cpu_down has been called. In order to prevent that, check in
{svm/vmx}_ctxt_switch_from that the cpu virtualization extensions are
still enabled.

(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
(XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
(XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
(XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
(XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
(XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
(XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
(XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
(XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
(XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
(XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
(XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
(XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
(XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
(XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
(XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
(XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
(XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
(XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
(XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ****************************************

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 39ede234d1fd683430ffb1784d6d35b096f16457
master date: 2014-06-05 17:53:35 +0200

10 years agox86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate
Andrew Cooper [Tue, 24 Jun 2014 08:11:54 +0000 (10:11 +0200)]
x86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate

Interacting with the vcpu itself should be protected by vcpu_pause().
Buggy/naive toolstacks might encounter adverse interaction with a vcpu context
switch, or increase of xcr0_accum.  There are no much problems with current
in-tree code.

Explicitly permit a NULL guest handle as being a request for size.  It is the
prevailing Xen style, and without it, valgrind's ioctl handler is unable to
determine whether evc->buffer actually got written to.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/domctl: further fix to XEN_DOMCTL_[gs]etvcpuextstate

Do not clobber errors from certain codepaths.  Clobbering of -EINVAL from
failing "evc->size <= PV_XSAVE_SIZE(_xcr0_accum)" was a pre-existing bug.

However, clobbering -EINVAL/-EFAULT from the get codepath was a bug
unintentionally introduced by 090ca8c1 "x86/domctl: two functional fixes to
XEN_DOMCTL_[gs]etvcpuextstate".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 090ca8c155b7321404ea7713a28aaedb7ac4fffd
master date: 2014-06-05 17:52:57 +0200
master commit: 895661ae98f0249f50280b4acfb9dda70b76d7e9
master date: 2014-06-10 12:03:16 +0200

10 years agoVT-d: honor APEI firmware-first mode in XSA-59 workaround code
Jan Beulich [Tue, 24 Jun 2014 08:10:13 +0000 (10:10 +0200)]
VT-d: honor APEI firmware-first mode in XSA-59 workaround code

When firmware-first mode is being indicated by firmware, we shouldn't
be modifying AER registers - these are considered to be owned by
firmware in that case. Violating this is being reported to result in
SMI storms. While circumventing the workaround means re-exposing
affected hosts to the XSA-59 issues, this in any event seems better
than not booting at all. Respective messages are being issued to the
log, so the situation can be diagnosed.

The basic building blocks were taken from Linux 3.15-rc. Note that
this includes a block of code enclosed in #ifdef CONFIG_X86_MCE - we
don't define that symbol, and that code also wouldn't build without
suitable machine check side code added; that should happen eventually,
but isn't subject of this change.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 1cc37ba8dbd89fb86dad3f6c78c3fba06019fe21
master date: 2014-06-05 17:49:14 +0200

10 years agoACPI: Prevent acpi_table_entries from falling into a infinite loop
Malcolm Crossley [Tue, 24 Jun 2014 08:01:38 +0000 (10:01 +0200)]
ACPI: Prevent acpi_table_entries from falling into a infinite loop

If a buggy BIOS programs an ACPI table with to small an entry length
then acpi_table_entries gets stuck in an infinite loop.

To aid debugging, report the error and exit the loop.

Based on Linux kernel commit 369d913b242cae2205471b11b6e33ac368ed33ec

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Use < instead of <= (which I wrongly suggested), return -ENODATA
instead of -EINVAL, and make description match code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 9c1e8cae657bc13e8b1ddeede17603d77f3ad341
master date: 2014-06-04 11:26:15 +0200

10 years agox86, amd_ucode: flip revision numbers in printk
Aravind Gopalakrishnan [Tue, 24 Jun 2014 08:00:49 +0000 (10:00 +0200)]
x86, amd_ucode: flip revision numbers in printk

A failure would result in log message like so-
(XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 failed
                                           ^^^^^^^^^^^^^^^^^^^^^^
The above message has the revision numbers inverted. Fix this.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
master commit: 071a4c70a634f7d4f74cde4086ff3202968538c9
master date: 2014-06-02 10:19:27 +0200

10 years agox86-64: fix incorrect assembly instruction operand combination
Jan Beulich [Tue, 24 Jun 2014 08:00:08 +0000 (10:00 +0200)]
x86-64: fix incorrect assembly instruction operand combination

Using %r11 with "cmpw" is invalid (which gas 2.25 is going to complain
about). Rather than rolling a branch specific fix, re-use the
respective hunk from master commit 4d246723 ("x86: use MOV instead of
PUSH/POP when saving/restoring register state").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years agopage-alloc: scrub pages used by hypervisor upon freeing
Jan Beulich [Tue, 17 Jun 2014 14:07:28 +0000 (16:07 +0200)]
page-alloc: scrub pages used by hypervisor upon freeing

... unless they're part of a fully separate pool (and hence can't ever
be used for guest allocations).

This is CVE-2014-4021 / XSA-100.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 4bd78937ec324bcef4e29ef951e0ff9815770de1
master date: 2014-06-17 15:21:10 +0200

10 years agoupdate Xen version to 4.2.5-rc1 4.2.5-rc1
Jan Beulich [Tue, 17 Jun 2014 14:06:54 +0000 (16:06 +0200)]
update Xen version to 4.2.5-rc1

10 years agoACPI/ERST: fix table mapping
Jan Beulich [Wed, 4 Jun 2014 15:20:38 +0000 (17:20 +0200)]
ACPI/ERST: fix table mapping

acpi_get_table(), when executed before reaching SYS_STATE_active, will
return a mapping valid only until the next invocation of that funciton.
Consequently storing the returned pointer for later use is incorrect.
Copy the logic used in VT-d's DMAR handling.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: fca69b1fc606ece62430076ca4a157e4bed749a8
master date: 2014-05-26 12:25:01 +0200

10 years agoRevert "ACPI/ERST: fix table mapping"
Jan Beulich [Wed, 4 Jun 2014 13:36:25 +0000 (15:36 +0200)]
Revert "ACPI/ERST: fix table mapping"

This reverts commit 45ce6fb083335abf82f5b7d301bcc91261c94925
(missing 32-bit code).

10 years agox86/HVM: eliminate vulnerabilities from hvm_inject_msi()
Jan Beulich [Tue, 3 Jun 2014 14:11:52 +0000 (16:11 +0200)]
x86/HVM: eliminate vulnerabilities from hvm_inject_msi()

- pirq_info() returns NULL for a non-allocated pIRQ, and hence we
  mustn't unconditionally de-reference it, and we need to invoke it
  another time after having called map_domain_emuirq_pirq()
- don't use printk(), namely without XENLOG_GUEST, for error reporting

This is XSA-96.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 6f4cc0ac41625a054861b417ea1fc3ab88e2e40a
master date: 2014-06-03 15:17:14 +0200

10 years agotimers: set the deadline more accurately
Ross Lagerwall [Tue, 3 Jun 2014 10:23:25 +0000 (12:23 +0200)]
timers: set the deadline more accurately

Program the timer to the deadline of the closest timer if it is further
than 50us ahead, otherwise set it 50us ahead.  This way a single event
fires on time rather than 50us late (as it would have previously) while
still preventing too many timer wakeups in the case of having many
timers scheduled close together.

(where 50us is the timer_slop)

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 054b6dfb61eab00d86ddd5d0ac508f5302da0d52
master date: 2014-05-28 10:07:50 +0200

10 years agox86: don't use VA for cache flush when also flushing TLB
Jan Beulich [Tue, 3 Jun 2014 10:22:49 +0000 (12:22 +0200)]
x86: don't use VA for cache flush when also flushing TLB

Doing both flushes at once is a strong indication for the address
mapping to either having got dropped (in which case the cache flush,
when done via INVLPG, would fault) or its physical address having
changed (in which case the cache flush would end up being done on the
wrong address range). There is no adverse effect (other than the
obvious performance one) using WBINVD in this case regardless of the
range's size; only map_pages_to_xen() uses combined flushes at present.

This problem was observed with the 2nd try backport of d6cb14b3 ("VT-d:
suppress UR signaling for desktop chipsets") to 4.2 (where ioremap()
needs to be replaced with set_fixmap_nocache(); the previously
commented out and now being re-enabled __set_fixmap(, 0, 0) there to
undo the mapping resulted in the first of the above two scenarios).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 50df6f7429f73364bbddb0970a3a34faa01a7790
master date: 2014-05-28 09:51:07 +0200

10 years agoAMD IOMMU: don't free page table prematurely
Jan Beulich [Tue, 3 Jun 2014 10:22:14 +0000 (12:22 +0200)]
AMD IOMMU: don't free page table prematurely

iommu_merge_pages() still wants to look at the next level page table,
the TLB flush necessary before freeing too happens in that function,
and if it fails no free should happen at all. Hence the freeing must
be done after that function returned successfully, not before it's
being called.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 6b4d71d028f445cba7426a144751fddc8bfdd67b
master date: 2014-05-28 09:50:33 +0200

10 years agoVT-d: fix mask applied to DMIBAR in desktop chipset XSA-59 workaround
Jan Beulich [Tue, 3 Jun 2014 10:21:12 +0000 (12:21 +0200)]
VT-d: fix mask applied to DMIBAR in desktop chipset XSA-59 workaround

In commit  ("VT-d: suppress UR signaling for desktop chipsets")
the mask applied to the value read from DMIBAR is to narrow, only the
comment accompanying it was correct. Fix that and tag the literal
number as "long long" at once to avoid eventual compiler warnings.

The widest possible value so far is 39 bits; all chipsets covered here
but having less than this number of bits have the remaining bits marked
reserved (zero), and hence there's no need for making the mask chipset
specific.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: f8ecf31c31906552522c2a1b0d1cada07d78876e
master date: 2014-05-26 12:28:46 +0200

10 years agoACPI/ERST: fix table mapping
Jan Beulich [Tue, 3 Jun 2014 10:19:45 +0000 (12:19 +0200)]
ACPI/ERST: fix table mapping

acpi_get_table(), when executed before reaching SYS_STATE_active, will
return a mapping valid only until the next invocation of that funciton.
Consequently storing the returned pointer for later use is incorrect.
Copy the logic used in VT-d's DMAR handling.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: fca69b1fc606ece62430076ca4a157e4bed749a8
master date: 2014-05-26 12:25:01 +0200

10 years agomove domain to cpupool0 before destroying it
Juergen Gross [Fri, 23 May 2014 13:42:16 +0000 (15:42 +0200)]
move domain to cpupool0 before destroying it

Currently when a domain is destroyed it is removed from the domain_list
before all of it's resources, including the cpupool membership, are freed.
This can lead to a situation where the domain is still member of a cpupool
without for_each_domain_in_cpupool() (or even for_each_domain()) being
able to find it any more. This in turn can result in rejection of removing
the last cpu from a cpupool, because there seems to be still a domain in
the cpupool, even if it can't be found by scanning through all domains.

This situation can be avoided by moving the domain to be destroyed to
cpupool0 first and then remove it from this cpupool BEFORE deleting it from
the domain_list. As cpupool0 is always active and a domain without any cpupool
membership is implicitly regarded as belonging to cpupool0, this poses no
problem.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
master date: 2014-05-20 15:55:42 +0200

10 years agoVT-d: extend error report masking workaround to newer chipsets
Jan Beulich [Fri, 23 May 2014 13:41:34 +0000 (15:41 +0200)]
VT-d: extend error report masking workaround to newer chipsets

Add two more PCI IDs to the set that has been taken care of with a
different workaround long before XSA-59, and (for constency with the
newer workarounds) log a message here too.

Also move the function wide comment to the cases it applies to; this
should really have been done by d061d200 ("VT-d: suppress UR signaling
for server chipsets").

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 04734664eb20c3bf239e473af182bb7ab901d779
master date: 2014-05-20 15:54:01 +0200

10 years agoVT-d: apply quirks at device setup time rather than only at boot
Jan Beulich [Fri, 23 May 2014 13:40:47 +0000 (15:40 +0200)]
VT-d: apply quirks at device setup time rather than only at boot

Accessing extended config space may not be possible at boot time, e.g.
when the memory space used by MMCFG is reserved only via ACPI tables,
but not in the E820/UEFI memory maps (which we need Dom0 to tell us
about). Consequently the change here still leaves the issue unaddressed
for systems where the extended config space remains inaccessible (due
to firmware bugs, i.e. not properly reserving the address space of
those regions).

With the respective messages now potentially getting logged more than
once, we ought to consider whether we should issue them only if we in
fact were required to do any masking (i.e. if the relevant mask bits
weren't already set).

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 5786718fbaafbe47d72cc1512cd93de79b8fc2fa
master date: 2014-05-20 15:53:20 +0200

10 years agoVT-d: suppress UR signaling for desktop chipsets
Jan Beulich [Fri, 23 May 2014 13:35:45 +0000 (15:35 +0200)]
VT-d: suppress UR signaling for desktop chipsets

Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the desktop chipsets dealt with here.

This is CVE-2013-3495 / XSA-59.

Note that in the backported version the clearing of the fixmap entry
is commented out - it's not strictly needed, as we don't re-use the
fixmap slot (and if we did it would still caue no problems), but causes
a problem in map_pages_to_xen(), which wants to flush the cache for the
page in question, but that works only when the page is still mapped.
Fixing this will need to be a separate patch (coming through -unstable)
though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d6cb14b34ffc2a830022d059f1aa22bf19dcf55f
master date: 2014-04-25 12:12:38 +0200

10 years agox86: use native RDTSC(P) execution when guest and host frequencies are the same
Boris Ostrovsky [Fri, 23 May 2014 13:33:44 +0000 (15:33 +0200)]
x86: use native RDTSC(P) execution when guest and host frequencies are the same

We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).

This also allows us to revert main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6
master date: 2014-04-22 12:07:37 +0200

10 years agotools/pygrub: Fix error handling if no valid partitions are found
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found

If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.

Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit d75215805ce6ed20b3807955fab6a7f7a3368bee)
(cherry picked from commit 5ee75ef147f83457fa28d4d4374efcf066581e26)
(cherry picked from commit 11b2541f458a3d09c63980e669c166cf6e96980a)

10 years agolibxl_json: remove extra "break"
Wei Liu [Wed, 9 Apr 2014 13:29:13 +0000 (14:29 +0100)]
libxl_json: remove extra "break"

... otherwise JSON array elements are not freed and memory is leaked.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3eb54a2fdbc216b39dc2c0a86f11a32d4c838269)
(cherry picked from commit d6eff6fcc05f7167e5b2232d3bc60047fffb8fc4)
(cherry picked from commit a14bb4db517ca076ad7d785be52d4bd7a6df6de9)

10 years agotmem: remove dumb check in do_tmem_destroy_pool
Julien Grall [Fri, 4 Apr 2014 09:13:32 +0000 (11:13 +0200)]
tmem: remove dumb check in do_tmem_destroy_pool

do_tmem_destroy_pool is checking if pools == NULL. But, pools is a fixed
array.

Clang 3.5 will fail to compile xen/common/tmem.c with the following error:
tmem.c:1848:18: error: comparison of array 'client->pools' equal to a null
pointer is always false [-Werror,-Wtautological-pointer-compare]
    if ( client->pools == NULL )

Coverity-ID:1055632

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ac0f56a2fa407e0704fade12630a5a960dedce87)
(cherry picked from commit 6ce0c3fca9bd1c0d45908452d6e5e9f7bf22f7b7)
(cherry picked from commit 804d9af208c5c95156140b1c62cf8857ba250b03)

10 years agoRevert "VT-d: suppress UR signaling for desktop chipsets"
Jan Beulich [Wed, 21 May 2014 14:33:57 +0000 (16:33 +0200)]
Revert "VT-d: suppress UR signaling for desktop chipsets"

This reverts commit a910070d4289fdf71c3ca35886192a602a3724d5 -
the use of ioremap()/iounmap() is only valid from 4.3 onwards.

10 years agoadd xen/include/xen/pci_ids.h forgotten in 64c4c7c8
Jan Beulich [Tue, 13 May 2014 06:30:56 +0000 (08:30 +0200)]
add xen/include/xen/pci_ids.h forgotten in 64c4c7c8

10 years agox86: fix guest CPUID handling
Jan Beulich [Mon, 12 May 2014 15:43:00 +0000 (17:43 +0200)]
x86: fix guest CPUID handling

The way XEN_DOMCTL_set_cpuid got handled so far allowed for surprises
to the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.

While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 4c0ff6bd54b5a67f8f820f9ed0a89a79f1a26a1c
master date: 2014-05-02 12:09:03 +0200

10 years agohvm_set_ioreq_page() releases wrong page in error path
Paul Durrant [Mon, 12 May 2014 15:42:33 +0000 (17:42 +0200)]
hvm_set_ioreq_page() releases wrong page in error path

The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 16e2a7596e9fc86881c73cef57602b2c88155528
master date: 2014-05-02 11:46:32 +0200

10 years agox86/HVM: correct the SMEP logic for HVM_CR0_GUEST_RESERVED_BITS
Feng Wu [Mon, 12 May 2014 15:41:42 +0000 (17:41 +0200)]
x86/HVM: correct the SMEP logic for HVM_CR0_GUEST_RESERVED_BITS

When checking the SMEP feature for HVM guests, we should check the
VCPU instead of the host CPU.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 31ee951a3bee6e7cc21f94f900fe989e3701a79a
master date: 2014-04-28 12:47:24 +0200

10 years agopassthrough: allow to suppress SERR and PERR signaling altogether
Jan Beulich [Mon, 12 May 2014 15:39:59 +0000 (17:39 +0200)]
passthrough: allow to suppress SERR and PERR signaling altogether

This is just to have a workaround at hand in case other chipsets (not
covered by the previous two patches) also have similar issues.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: 1a2a390a560e8319a6be98c7ab6cfaebd230f67e
master date: 2014-04-25 12:13:31 +0200