Michael Young [Wed, 13 Feb 2013 17:00:15 +0000 (17:00 +0000)]
tools: Fix memset(&p,0,sizeof(p)) idiom in several places.
gcc 4.8 identifies several places where code of the form memset(x, 0,
sizeof(x)); is used incorrectly, meaning that less memory is set to
zero than required.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Committed-by: Keir Fraser <keir@xen.org>
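For illustration, a minimal sketch of this class of bug. The names struct config and clear_config are hypothetical and not taken from the patch. When p is a pointer, sizeof(p) is the size of the pointer, not of the object it points to:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct config {
    char name[64];
    int  flags;
};

/*
 * Buggy form (shown as a comment only, since gcc 4.8 and later warn on it):
 *
 *     memset(p, 0, sizeof(p));
 *
 * p is a pointer, so sizeof(p) is typically 4 or 8 bytes and most of
 * the structure is left untouched.
 */

/* Fixed form: sizeof(*p) is the size of the pointed-to object. */
static void clear_config(struct config *p)
{
    memset(p, 0, sizeof(*p));
}
```

The same reasoning applies to memset(&p, 0, sizeof(p)) when p is itself a pointer: that call zeroes the pointer variable rather than the object behind it.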
Add Xenoprofile support for AMD Family16h. The corresponding OProfile
patch has already been submitted to the OProfile mailing list
(http://marc.info/?l=oprofile-list&m=136036136017302&w=2).
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Thu, 7 Feb 2013 14:21:47 +0000 (14:21 +0000)]
oxenstored: Enforce a maximum message size of 4096 bytes
The maximum size of a message is part of the protocol spec in
xen/include/public/io/xs_wire.h
Before this patch a client which sends an overly large message can
cause a buffer read overrun.
Note that if a badly-behaved client sends a very large message
then it will be difficult for them to make their connection
work again; they will probably need to reboot.
Signed-off-by: David Scott <dave.scott@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
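The check amounts to rejecting any message whose declared payload length exceeds the protocol limit before the payload is read. A minimal sketch in C (the actual fix is in the OCaml daemon; msg_len_ok is a hypothetical helper, and the header layout and XENSTORE_PAYLOAD_MAX are reproduced from xs_wire.h only to keep the example self-contained):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Protocol limit on the payload of a single message. */
#define XENSTORE_PAYLOAD_MAX 4096

/* Message header as laid out on the wire. */
struct xsd_sockmsg {
    uint32_t type;
    uint32_t req_id;
    uint32_t tx_id;
    uint32_t len;    /* length of the payload that follows */
};

/* Hypothetical helper: true if the declared payload length is acceptable. */
static bool msg_len_ok(const struct xsd_sockmsg *hdr)
{
    return hdr->len <= XENSTORE_PAYLOAD_MAX;
}
```

A receiver would call such a check immediately after reading the header and drop the connection on failure, so that no buffer is ever sized or indexed by an unchecked length.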
Ian Jackson [Thu, 7 Feb 2013 14:21:44 +0000 (14:21 +0000)]
tools/ocaml: oxenstored: Be more paranoid about ring reading
oxenstored makes use of the OCaml Xenbus bindings, in which the
function xs_ring_read in tools/ocaml/libs/xb/xs_ring_stubs.c is used
to read from the shared memory Xenstore ring.
This function does not correctly handle all possible (prod, cons)
states when MASK_XENSTORE_IDX(prod) > MASK_XENSTORE_IDX(cons).
The root cause is the use of the unmasked values of prod and cons to
calculate to_read. If prod is set to an out-of-range value, the ring
peer can cause to_read to be too large or even negative. This allows
the ring peer to force oxenstored to read and write out of range for
the buffers leading to a crash or possibly to privilege escalation.
Correct this by masking the values of cons and prod at the start, so
we only deal with masked values. This makes the logic simpler, as
semantically inappropriate values of the upper bits of the ring
pointers are simply ignored.
The same vulnerability does not exist in the ring writer because the
only use made of the unmasked value is the check which prevents the
prod pointer overtaking the cons pointer. A ring peer which defeats
this check will suffer only lost data.
However, additionally, precautions need to be taken to ensure that
req_cons and req_prod are only read once in each function. Without
the use of volatile or some asm construct, the compiler can "prove"
that req_cons and req_prod do not change unexpectedly and is permitted
to "amplify" the read of (say) req_cons into two reads at different
times, giving two different values for use as cons, and then use the
two sources of cons interchangeably. (The use of xen_mb() does not
forbid this.)
Therefore do the reads of req_cons and req_prod through a volatile
pointer in both xs_ring_read and xs_ring_write.
This is currently believed to be a theoretical vulnerability as we are
not aware of any compilers which amplify reads in this way.
This is a security issue, part of XSA-38 / CVE-2013-0215.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Matthew Daley <mattjd@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
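The approach described above can be sketched as follows. This is a simplified illustration, not the code from xs_ring_stubs.c: struct ring and ring_read are hypothetical, the memory barrier is omitted, and only the ring size and mask macro follow xs_wire.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XENSTORE_RING_SIZE 1024
#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE - 1))

/* Simplified stand-in for the request half of the shared ring. */
struct ring {
    char     req[XENSTORE_RING_SIZE];
    uint32_t req_cons, req_prod;
};

/* Copy at most len bytes out of the ring; returns the number copied. */
static uint32_t ring_read(struct ring *r, char *buf, uint32_t len)
{
    /* Read each index exactly once, through a volatile pointer. */
    uint32_t cons = *(volatile uint32_t *)&r->req_cons;
    uint32_t prod = *(volatile uint32_t *)&r->req_prod;
    uint32_t to_read;

    /* Real code needs a memory barrier here; omitted in this sketch. */

    /* Work only with masked values, so both are below the ring size. */
    cons = MASK_XENSTORE_IDX(cons);
    prod = MASK_XENSTORE_IDX(prod);

    if (prod == cons)
        return 0;                               /* treated as empty */
    if (prod > cons)
        to_read = prod - cons;
    else
        to_read = XENSTORE_RING_SIZE - cons;    /* up to end of buffer */
    if (to_read < len)
        len = to_read;

    memcpy(buf, r->req + cons, len);
    r->req_cons += len;
    return len;
}
```

Because cons is masked before use, cons + to_read can never exceed XENSTORE_RING_SIZE, whatever value the peer writes into req_prod.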
Ian Campbell [Tue, 5 Feb 2013 15:47:41 +0000 (15:47 +0000)]
xen: enable stubdom on a per arch basis
... and disable on ARM (for now).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Tue, 5 Feb 2013 14:21:25 +0000 (15:21 +0100)]
AMD,IOMMU: Disable IOMMU if SATA Combined mode is on
AMD's SP5100 chipset can be placed into SATA Combined mode,
which may prevent dom0 from booting when the IOMMU is
enabled and the per-device interrupt remapping table is used.
While SP5100 erratum 28 requires BIOSes to disable this mode,
some may still use it.
This patch checks whether this mode is on and, if per-device
table is in use, disables IOMMU.
This is XSA-36 / CVE-2013-0153.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Flipped operands of && in amd_iommu_init() to make the message issued
by amd_sp5100_erratum28() match reality (when amd_iommu_perdev_intremap
is zero, there's really no point in calling the function).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 5 Feb 2013 14:20:47 +0000 (15:20 +0100)]
AMD,IOMMU: Clean up old entries in remapping tables when creating new one
When changing the affinity of an IRQ associated with a passed
through PCI device, clear previous mapping.
This is XSA-36 / CVE-2013-0153.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
In addition, because some BIOSes may incorrectly program the IVRS
entries for the IOAPIC, check the entries for consistency. Specifically,
if conflicting entries are found, disable the IOMMU if the per-device
remapping table is used. If entries refer to bogus IOAPIC IDs,
disable the IOMMU unconditionally.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Boris Ostrovsky [Tue, 5 Feb 2013 14:18:18 +0000 (15:18 +0100)]
ACPI: acpi_table_parse() should return handler's error code
Currently, the error code returned by acpi_table_parse()'s handler
is ignored. This patch propagates the handler's return value to
acpi_table_parse()'s caller.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Committed-by: Jan Beulich <jbeulich@suse.com>
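A minimal sketch of the pattern. The types and the table_parse function are hypothetical simplifications, not the Xen ACPI code, and the -ENODEV value for a missing table is an assumption of this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified table type and handler signature. */
struct table_header { char signature[4]; };
typedef int (*table_handler)(struct table_header *table);

/* Run the handler on a table and hand its result back to the caller. */
static int table_parse(struct table_header *table, table_handler handler)
{
    if (handler == NULL)
        return -EINVAL;
    if (table == NULL)
        return -ENODEV;        /* assumed value for "table not found" */

    /* Before: handler(table); return 0;  (the error was dropped) */
    return handler(table);
}

/* Example handler that rejects the table. */
static int rejecting_handler(struct table_header *table)
{
    (void)table;
    return -ENOMEM;
}
```

With the result propagated, a caller can distinguish a table that parsed cleanly from one that the handler rejected.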
Ian Campbell [Tue, 5 Feb 2013 11:46:12 +0000 (11:46 +0000)]
xen: arm: fix build of hvm.c
Add include of xsm/xsm.h to fix:
hvm.c: In function 'do_hvm_op':
hvm.c:37:9: error: implicit declaration of function 'xsm_hvm_param' [-Werror=implicit-function-declaration]
hvm.c:37:9: error: nested extern declaration of 'xsm_hvm_param' [-Werror=nested-externs]
hvm.c:37:28: error: 'XSM_TARGET' undeclared (first use in this function)
hvm.c:37:28: note: each undeclared identifier is reported only once for each function it appears in
cc1: all warnings being treated as errors
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Tue, 5 Feb 2013 11:31:11 +0000 (11:31 +0000)]
tools: update ocamlfind handling
configure checks just for ocamlc, but the tools in tools/ocaml depend
also on ocamlfind. On my workstation I have just ocamlc installed, but
no ocamlfind. As a result make will fail.
Update configure.ac to check also for OCAMLFIND, update various
Makefiles and replace the hardcoded ocamlfind string with $(OCAMLFIND).
Please rerun autogen.sh after applying this patch.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
xencommons: redirect serial and parallel to /dev/null
Upstream QEMU doesn't support -nographic with -daemonize unless monitor,
serial and parallel outputs are all redirected:
/* According to documentation and historically, -nographic redirects
* serial port, parallel port and monitor to stdio, which does not work
* with -daemonize. We can redirect these to null instead, but since
* -nographic is legacy, let's just error out.
* We disallow -nographic only if all other ports are not redirected
* explicitly, to not break existing legacy setups which uses
* -nographic _and_ redirects all ports explicitly - this is valid
* usage, -nographic is just a no-op in this case.
*/
Considering that we do want to redirect them to /dev/null anyway, do so.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- handled reject vs 26352:9a1610c1e564 "xencommons: Stop QEMU in
do_stop()" and rewrapped the resulting long line ] Committed-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 5 Feb 2013 08:44:00 +0000 (09:44 +0100)]
x86/HVM: assorted RTC emulation adjustments
- only call check_update_timer() on REG_B writes when SET changes
- only call alarm_timer_update() on REG_B writes when relevant bits
change
- instead properly handle AF and PF when the guest is not also setting
AIE/PIE respectively (for UF this was already the case, only a
comment was slightly inaccurate), including calling the respective
update functions upon REG_C reads
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Olaf Hering <olaf@aepfle.de>
This conflicts with changes done in 26486:7648ef657fe7 and
26489:83a3fa9c8434 (i.e. the code added by them needs adjustment in
order for the change here to be correct).
Jan Beulich [Mon, 4 Feb 2013 11:10:26 +0000 (12:10 +0100)]
x86/EFI: simplify PCI option ROM retrieval
While putting together the kernel side of this I realized that c/s
26397:d9c7b82aa7b1 went a little too far in requiring a buffer for the
option ROM contents - all that is really needed is handing Dom0
physical address and size of the data block.
Dongxiao Xu [Mon, 4 Feb 2013 11:08:15 +0000 (12:08 +0100)]
nEPT: fix INVEPT instruction parameter
While emulating the INVEPT instruction in the L0 VMM, the EPT pointer
should be fetched from the instruction decoding result rather than
from the currently loaded EPT pointer.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Dongxiao Xu [Mon, 4 Feb 2013 11:07:34 +0000 (12:07 +0100)]
nEPT: fix EPT pointer setting for L2 guest
On each virtual_vmentry(), the code needs to cover both EPT and
shadow mode for the L2 guest, updating the shadow VMCS with a
different EPT pointer in each case.
This fixes an issue where, after launching a guest with EPT, killing
it, and launching a second guest with shadow, the second guest hangs
at the startup screen.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 4 Feb 2013 11:03:38 +0000 (12:03 +0100)]
x86/nestedhvm: properly clean up after failure to set up all vCPU-s
This implies that the individual destroy functions will have to remain
capable of being called for a vCPU that the corresponding init function
was never run on.
While at it, also clean up some inefficiencies in the corresponding
parameter validation code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Dongxiao Xu [Wed, 30 Jan 2013 17:17:30 +0000 (09:17 -0800)]
VMX: disable SMEP feature when guest is in non-paging mode
SMEP is disabled if CPU is in non-paging mode in hardware.
However Xen always uses paging mode to emulate guest non-paging
mode with HAP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.
We hit an issue where an SMP Linux guest with a recent kernel (one
with SMEP support enabled, for example 3.5.3) would crash with a triple
fault when unrestricted_guest=0 is set in grub. This is because Xen uses
an identity mapping page table to emulate the non-paging mode, where
the page table is set with the USER flag. If SMEP is still enabled in
this case, the guest hits an unhandleable page fault and then crashes.
David Vrabel [Wed, 30 Jan 2013 10:38:37 +0000 (02:38 -0800)]
mini-os: build fixes for lwip 1.3.2
Various fixes to mini-os needed to build lwip 1.3.2:
- Don't build the tests.
- Add BSD-style endianness macros to endian.h.
- free() is called via a function pointer so it needs to be a real
function. Do the same for malloc() and realloc().
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Committed-by: Keir Fraser <keir@xen.org>
Ian Campbell [Mon, 28 Jan 2013 16:48:19 +0000 (16:48 +0000)]
tools: revert to installing in /usr
26470:acaf29203cf9 missed a bunch of hardcoded paths, e.g. in the
initscripts. I think at this juncture it is appropriate to revert
this change and try again after some more testing.
Ian Campbell [Fri, 25 Jan 2013 15:04:11 +0000 (15:04 +0000)]
stubdom: Install xenstore stubdom in $(XENFIRMWAREDIR)
Removes hardcoded /usr prefix.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 25 Jan 2013 15:04:10 +0000 (15:04 +0000)]
tools: revert to using /var and /etc/
26470:acaf29203cf9 "tools+stubdom: install under /usr/local by
default" moved more stuff under /usr/local than was desirable.
In particular SYSCONFIG_DIR (configuration for initscripts) moved to
/usr/local/etc/{sysconfig,defaults} while the initscripts themselves
(correctly) remained in /etc/init.d. Moving /etc/xen/scripts breaks
the udev backend rules file. Lastly stuff under /var was moved to
/usr/local/var.
Move these back to /etc/ and /var. Moving /etc wholesale rather than
just the problematic bits is preferable for consistency.
Although there seems to be some disagreement about /usr/local/var vs
/var, using /var is compatible with the FHS and what we think most
people will expect.
Most of this impacts Linux only but NetBSD appears to have been using
/usr/local/var/lib which I have also reset to /var/lib.
Note that we already paid no attention to autoconf --sysconfdir or
--localstatedir ('etc' and 'var' respectively) so there is no change
from that PoV.
Dongxiao Xu [Fri, 25 Jan 2013 09:19:55 +0000 (10:19 +0100)]
nested vmx: enable VMCS shadowing feature
Non-root VMREAD/VMWRITE is currently handled by a VM-Exit followed by
emulation, which carries some overhead.
New Intel platforms introduce a feature called VMCS shadowing, where
non-root VMREAD/VMWRITE does not trigger a VM-Exit and the hardware
reads/writes the virtual VMCS instead.
This has been shown to improve performance.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Dongxiao Xu [Fri, 25 Jan 2013 09:19:15 +0000 (10:19 +0100)]
nested vmx: optimize for bulk access of virtual VMCS
After we use the VMREAD/VMWRITE to build up the virtual VMCS, each
access to the virtual VMCS needs two VMPTRLD and one VMCLEAR to
switch the environment, which can hurt performance.
This commit handles multiple virtual VMCS accesses together
to improve the performance.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Dongxiao Xu [Fri, 25 Jan 2013 09:18:40 +0000 (10:18 +0100)]
nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing
Before the VMCS shadowing feature, we used memory operations to build up
the virtual VMCS. This works because this virtual VMCS is never
loaded into real hardware. However, once the VMCS shadowing feature is
introduced, this VMCS will be loaded into hardware, which
requires all fields in the VMCS to be accessed by VMREAD/VMWRITE.
Besides, the virtual VMCS revision identifier should also meet the
hardware's requirement, instead of using a faked one.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Dongxiao Xu [Fri, 25 Jan 2013 09:17:00 +0000 (10:17 +0100)]
nested vmx: Use a list to store the launched vvmcs for L1 VMM
Originally a virtual VMCS field was used to store the launch state of
a given VMCS. However, with the VMCS shadowing feature this
virtual VMCS must also be loadable into real hardware,
where VMREAD/VMWRITE would then operate on invalid fields.
The new approach is to store the launch state in a list for the L1 VMM.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Fri, 25 Jan 2013 09:03:37 +0000 (09:03 +0000)]
docs: check for documentation generation tools in docs/configure.
It is sometimes hard to discover all the optional tools that should be
on a system to build all available Xen documentation. By checking for
documentation generation tools at ./configure time and displaying a
warning, Xen packagers will more easily learn about new optional build
dependencies, like markdown, when they are introduced.
Based on a patch by Matt Wilson. Changed to use a separate
docs/configure which is called from the top-level in the same manner
as stubdoms.
Rerun autogen.sh and "git add docs/configure" after applying this patch.
Ian Campbell [Fri, 25 Jan 2013 09:02:13 +0000 (09:02 +0000)]
docs: Remove xen-api docs
This document is about an old unmaintained version of the XenAPI,
which bears little to no relation to what is implemented in xapi and
which is only partially implemented in xend (which is deprecated). The
doc hasn't seen much in the way of updates since 2009.
Anyone who is actually interested can continue to use the version
which was in 4.2.
Ian Campbell [Fri, 25 Jan 2013 08:54:21 +0000 (08:54 +0000)]
xl: SWITCH_FOREACH_OPT handles special options directly.
This removes the need for the "case 0: case 2:" boilerplate in every
main_foo(). Calls exit(3) directly which is OK since xl cleans up the
context etc in an atexit(3) handler.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 25 Jan 2013 08:54:20 +0000 (08:54 +0000)]
xl: Introduce helper macro for option parsing.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 25 Jan 2013 08:54:19 +0000 (08:54 +0000)]
xl: allow def_getopt to handle long options
Improves consistency of option parsing and error handling.
Consistently support --help for all options.
Many users of getopt_long were needlessly passing an option_index
pointer which was not used.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Thu, 24 Jan 2013 12:47:57 +0000 (12:47 +0000)]
libxc: match types of 'subject' and 'foreigndom' between struct xc_mmu and do_mmu_update
In do_mmu_update() (in the hypervisor) the parameter 'foreigndom' is
'unsigned int' and both its high (bits 31-16) and low (bits 15-0)
parts are utilised, as explained here:
http://xenbits.xen.org/docs/unstable/hypercall/include,public,xen.h.html#Func_HYPERVISOR_mmu_update
However, the actual parameter, i.e., the 'subject' field in
struct xc_mmu is declared as domid_t, which typedef-s to uint16_t.
This means we are never able to pass anything via the higher 16 bits
of 'foreigndom', which in turn may cause the hypercall to fail when
called on an actual foreign domain.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
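The truncation is easy to reproduce in isolation. In this sketch pack_foreigndom is a hypothetical helper and the values 3 and 7 are arbitrary; only the domid_t typedef mirrors the description above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;

/* Hypothetical helper: combine a high half (bits 31-16) and a low half
 * (bits 15-0) into one 32-bit 'foreigndom' value. */
static unsigned int pack_foreigndom(unsigned int high, domid_t low)
{
    return (high << 16) | low;
}

/* Storing the packed value in a domid_t keeps only bits 15-0. */
static domid_t store_as_domid(unsigned int foreigndom)
{
    return (domid_t)foreigndom;
}
```

For example, pack_foreigndom(3, 7) yields 0x00030007, but store_as_domid() of that value is 7: the high half is silently lost, which is why the field has to be as wide as the hypercall parameter.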
Ian Campbell [Thu, 24 Jan 2013 12:47:56 +0000 (12:47 +0000)]
xen: Simplify the space of spaces supported by XENMEM_add_to_physmap(_range)
XENMAPSPACE_gmfn_foreign is not supported by XENMEM_add_to_physmap.
Although in theory XENMEM_add_to_physmap_range could support
XENMAPSPACE_gmfn_range this is no different to
XENMAPSPACE_gmfn in the context of the ranged hypercall so disallow it
to avoid any confusion.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 24 Jan 2013 12:47:55 +0000 (12:47 +0000)]
xen: arm: do not panic when failing to translate a guest address
The gva_to_{par,ipa} functions currently panic if the guest address
cannot be translated. Often the input comes from the guest so
panicking the host is a bit harsh!
Change the API of those functions to allow them to return a failure
and plumb it through where it is used.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
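The shape of such an API change can be sketched as below. toy_gva_to_ipa and its translation rule are invented for illustration and do not reflect the real ARM page table walk; the point is only that failure is reported through the return value and the result through an out parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/*
 * Toy translation (hypothetical): only addresses below 1 MiB translate,
 * and they map at a fixed offset. Returns 0 on success or -EFAULT on
 * failure, in which case *ipa is left untouched.
 */
static int toy_gva_to_ipa(uint32_t va, paddr_t *ipa)
{
    if (va >= 0x100000u)
        return -EFAULT;        /* previously this case would panic */
    *ipa = (paddr_t)va + 0x80000000ull;
    return 0;
}
```

Callers then check the return value and fail the guest's request, rather than taking down the host on bad guest input.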
Ian Campbell [Thu, 24 Jan 2013 12:47:55 +0000 (12:47 +0000)]
vtpm/vtpmmgr: Use libpolarssl.a instead of hardcoding own list of .o files
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 24 Jan 2013 12:47:54 +0000 (12:47 +0000)]
tools+stubdom: install under /usr/local by default.
This is the de facto (or FHS mandated?) standard location for software
built from source, in order to avoid clashing with packaged software
which is installed under /usr/bin etc.
I think there is benefit in having Xen's install behave more like the
majority of other OSS software out there.
The major downside here is in the transition from 4.2 to 4.3 where
people who have built from source will inevitably discover breakage
because 4.3 no longer overwrites stuff in /usr like it used to so they
pick up old stale bits from /usr instead of new stuff from /usr/local.
Packages will use ./configure --prefix=/usr or whatever helper macro
their package manager gives them. I have confirmed that doing this
results in the same list of installed files as before this patch was
applied.
The hypervisor remains in /boot/ and there is no intention to move it.
Ian Jackson [Thu, 24 Jan 2013 12:47:53 +0000 (12:47 +0000)]
libxl: fix stale timeout event callback race
Because there is not necessarily any lock held at the point the
application (eg, libvirt) calls libxl_osevent_occurred_timeout, in a
multithreaded program those calls may be arbitrarily delayed in
relation to other activities within the program.
Specifically this means when ->timeout_deregister returns, libxl does
not know whether it can safely dispose of the for_libxl value or
whether it needs to retain it in case of an in-progress call to
_occurred_timeout.
The interface could be fixed by requiring the application to make a
new call into libxl to say that the deregistration was complete.
However that new call would have to be threaded through the
application's event loop; this is complicated and some application
authors are likely not to implement it properly. Furthermore the
easiest way to implement this facility in most event loops is to queue
up a time event for "now".
Shortcut all of this by having libxl always call timeout_modify
setting abs={0,0} (ie, ASAP) instead of timeout_deregister. This will
cause the application to call _occurred_timeout. When processing this
calldown we see that we were no longer actually interested and simply
throw it away.
Additionally, there is a race between _occurred_timeout and
->timeout_modify. If libxl ever adjusts the deadline for a timeout
the application may already be in the process of calling _occurred, in
which case the situation with for_app's lifetime becomes very
complicated. Therefore abolish libxl__ev_time_modify_{abs,rel} (which
have no callers) and promise to the application only ever to call
->timeout_modify with abs=={0,0}. The application still needs to cope
with ->timeout_modify racing with its internal function which calls
_occurred_timeout. Document this.
This is a forwards-compatible change for applications using the libxl
API, and will hopefully eliminate these races in callback-supplying
applications (such as libvirt) without the need for corresponding
changes to the application. (It is possible that this might expose
bugs in applications, though, as previously libxl would never call
libxl_osevent_hooks->timeout_modify and now it never calls
->timeout_deregister).
For clarity, fold the body of time_register_finite into its one
remaining call site. This makes the semantics of ev->infinite
slightly clearer.
Cc: Bamvor Jian Zhang <bjzhang@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 24 Jan 2013 12:47:52 +0000 (12:47 +0000)]
libxl: fix stale fd event callback race
Because there is not necessarily any lock held at the point the
application (eg, libvirt) calls libxl_osevent_occurred_timeout and
..._fd, in a multithreaded program those calls may be arbitrarily
delayed in relation to other activities within the program.
libxl therefore needs to be prepared to receive very old event
callbacks. Arrange for this to be the case for fd callbacks.
This requires a new layer of indirection through a "hook nexus" struct
which can outlive the libxl__ev_foo. Allocation and deallocation of
these nexi is mostly handled in the OSEVENT macros which wrap up
the application's callbacks.
Document the problem and the solution in a comment in libxl_event.c
just before the definition of struct libxl__osevent_hook_nexus.
There is still a race relating to libxl__osevent_occurred_timeout;
this will be addressed in the following patch.
Reported-by: Bamvor Jian Zhang <bjzhang@suse.com> Cc: Bamvor Jian Zhang <bjzhang@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Tested-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
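The indirection idea can be sketched as follows. All names here are hypothetical and much simpler than the real libxl structures; locking and the allocation and reuse of nexi are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct ev_fd;

/* Small object handed to the application in place of the event itself;
 * it may outlive the event it was created for. */
struct hook_nexus {
    struct ev_fd *ev;          /* NULL once the event is deregistered */
};

struct ev_fd {
    int fd;
    int fired;                 /* number of callbacks delivered */
    struct hook_nexus *nexus;
};

static void ev_fd_register(struct ev_fd *ev, struct hook_nexus *nexus, int fd)
{
    ev->fd = fd;
    ev->fired = 0;
    ev->nexus = nexus;
    nexus->ev = ev;
}

/* Detach the event; the nexus stays valid for late callbacks. */
static void ev_fd_deregister(struct ev_fd *ev)
{
    ev->nexus->ev = NULL;
    ev->nexus = NULL;
}

/* Callback entry point: returns 1 if delivered, 0 if stale and ignored. */
static int occurred_fd(struct hook_nexus *nexus)
{
    struct ev_fd *ev = nexus->ev;

    if (ev == NULL)
        return 0;              /* very old callback: event is gone */
    ev->fired++;
    return 1;
}
```

A callback that arrives after deregistration finds a NULL event pointer in the nexus and is dropped, instead of dereferencing a freed event.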
xen: infrastructure to have cross-platform video drivers
- introduce a new HAS_VIDEO config variable;
- build xen/drivers/video/font* if HAS_VIDEO;
- rename vga_puts to video_puts;
- rename vga_init to video_init;
- rename vga_endboot to video_endboot.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Introduce a function to map a range of physical memory into Xen virtual
memory.
It doesn't need domheap to be set up.
It is going to be used to map the videoram.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Add flush_xen_data_tlb_range_va, which flushes a range of virtual addresses.
Replace all the calls to flush_xen_data_tlb_va with calls to
flush_xen_data_tlb_range_va and remove flush_xen_data_tlb_va.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 23 Jan 2013 13:19:13 +0000 (14:19 +0100)]
x86/HVM: fix RTC hour conversions
Properly mask off bit 7 when retrieving the hour values in
alarm_timer_update(), and properly use RTC_HOURS_ALARM's bit 7 when
converting from 12- to 24-hour value.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
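For reference, a sketch of the 12- to 24-hour conversion this relies on. It is a standalone illustration, not the Xen rtc.c code: the function names and the bcd and mode_24h flags are assumptions of this example. In 12-hour mode bit 7 of the hour register marks PM, and the low seven bits hold the hour (1 to 12).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a two-digit BCD value. */
static int from_bcd(uint8_t v)
{
    return (v >> 4) * 10 + (v & 0x0f);
}

/* Convert a raw RTC hour register value to a 0-23 hour. */
static int convert_hour(uint8_t raw, bool bcd, bool mode_24h)
{
    uint8_t val = raw & 0x7f;           /* mask off bit 7 (PM flag) */
    int hour = bcd ? from_bcd(val) : val;

    if (!mode_24h) {
        hour %= 12;                     /* 12 AM -> 0, 12 PM -> 0 (+12 below) */
        if (raw & 0x80)
            hour += 12;                 /* PM */
    }
    return hour;
}
```

Without the mask, a PM value such as 0x92 (12 PM in BCD) would be decoded with the flag bit included and produce a nonsensical hour.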
Jan Beulich [Wed, 23 Jan 2013 13:15:16 +0000 (14:15 +0100)]
x86: support up to 16Tb
This mainly involves adjusting the number of L4 entries needing copying
between page tables (which is now different between PV and HVM/idle
domains), and changing the cutoff point and method when more than the
supported amount of memory is found in a system.
Since TMEM doesn't currently cope with the full 1:1 map not always
being visible, it gets forcefully disabled in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Jan Beulich [Wed, 23 Jan 2013 13:14:34 +0000 (14:14 +0100)]
tmem: partial adjustments for x86 16Tb support
Despite the changes below, tmem still has code that assumes it can
directly access all memory, or map arbitrary amounts of not
directly accessible memory. I cannot see how to fix this without
converting _all_ its domheap allocations to xenheap ones. And even then
I wouldn't be certain about there not being other cases where the "all
memory is always mapped" assumption would be broken. Therefore, tmem
gets disabled by the next patch for the time being if the full 1:1
mapping isn't always visible.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 23 Jan 2013 13:12:54 +0000 (14:12 +0100)]
x86: properly use map_domain_page() in nested HVM code
This eliminates a couple of incorrect/inconsistent uses of
map_domain_page() from VT-x code.
Note that this does _not_ add error handling where none was present
before, even though I think NULL returns from any of the mapping
operations touched here need to properly be handled. I just don't know
this code well enough to figure out what the right action in each case
would be.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 23 Jan 2013 13:08:44 +0000 (14:08 +0100)]
x86: properly use map_domain_page() when building Dom0
This requires a minor hack to allow the correct page tables to be used
while running on Dom0's page tables (as they can't be determined from
"current" at that time).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 23 Jan 2013 13:06:20 +0000 (14:06 +0100)]
x86: re-introduce map_domain_page() et al
This is being done mostly in the form previously used on x86-32,
utilizing the second L3 page table slot within the per-domain mapping
area for those mappings. It remains to be determined whether that
concept is really suitable, or whether instead re-implementing at least
the non-global variant from scratch would be better.
Also add the helpers {clear,copy}_domain_page() as well as initial uses
of them.
One question is whether, to exercise the non-trivial code paths, we
shouldn't make the trivial shortcuts conditional upon NDEBUG being
defined. See the debugging patch at the end of the series.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Fix S3 regression introduced by cs 23013:65d26504e843 (ACPI: large
cleanup). The dmar virtual pointer returned from acpi_get_table cannot
be safely stored away and used later, as the underlying
acpi_os_map_memory / __acpi_map_table functions overwrite the mapping
causing it to point to different tables than dmar (last fetched table is
used). This subsequently causes acpi_dmar_reinstate() and
acpi_dmar_zap() to write data to wrong table, causing its corruption and
problems with consecutive s3 resumes.
Added a new function to fetch the ACPI table physical address, and
established a separate static mapping for the dmar_table pointer instead
of using acpi_get_table().
Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Added call to acpi_tb_verify_table(). Fixed page count passed to
map_pages_to_xen(). Cosmetic changes.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Daniel De Graaf [Wed, 23 Jan 2013 09:18:50 +0000 (09:18 +0000)]
xen/arch/x86: complete XSM hooks on irq/pirq mappings
Manipulation of a domain's pirq namespace was not fully protected by
XSM hooks because the XSM hooks for IRQs needed a physical IRQ. Since
this may not apply to HVM domains, a complete solution needs to split
the XSM hook for this operation, using one hook for the PIRQ
manipulation and one for controlling access to the hardware IRQ.
This reworking has the advantage of providing the same MSI data to
remove_irq that is provided to add_irq, allowing the PCI device to be
determined in both functions. It also eliminates the last callers of
rcu_lock_target_domain_by_id in x86 and common code in preparation for
this function's removal.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Wed, 23 Jan 2013 09:17:19 +0000 (09:17 +0000)]
hvm: wire up domctl and xsm hypercalls
These hypercalls are usable by HVM guests. Once connected, simple
functions of the Xen toolstack can be run from an HVM domain if that
domain is permitted access (which is currently only possible via XSM).
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 22 Jan 2013 08:33:10 +0000 (09:33 +0100)]
x86: restore (optional) forwarding of PCI SERR induced NMI to Dom0
c/s 22949:54fe1011f86b removed the forwarding of NMIs to Dom0 when they
were caused by PCI SERR. NMI buttons as well as BMCs (like HP's iLO)
may however want such events to be seen in Dom0 (e.g. to trigger a
dump).
Therefore restore most of the functionality which named c/s removed
(adjusted for subsequent changes, and adjusting the public interface to
use the modern term, retaining the old one for backwards
compatibility).
Ian Campbell [Mon, 21 Jan 2013 16:04:56 +0000 (16:04 +0000)]
vtpmmgr: fix build on 32-bit
Correct format string, fixing:
vtpm_storage.c: In function 'vtpm_storage_load_header':
vtpm_storage.c:658: error: format '%ld' expects type 'long int', but argument 5 has type 'unsigned int'
vtpm_storage.c:658: error: format '%ld' expects type 'long int', but argument 5 has type 'unsigned int'
make[2]: *** [vtpm_storage.o] Error 1
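The diagnostic above arises when a conversion specifier does not match the argument type on a given ABI. The corrected format string from the commit is not reproduced in this log; one common way to avoid this class of error for fixed-width types is the <inttypes.h> macros, sketched here with a hypothetical helper that is not taken from vtpm_storage.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not the code in vtpm_storage.c. */
int format_size(char *buf, size_t len, uint32_t size)
{
    assert(buf != NULL && len > 0);
    /* PRIu32 expands to the correct conversion for uint32_t on both
     * 32-bit and 64-bit builds; "%ld" is only correct for long int. */
    return snprintf(buf, len, "size=%" PRIu32, size);
}
```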
Add padlock.o to PSSL_OBJS, fixing:
/local/scratch/ianc/devel/xen-unstable.git/stubdom/mini-os-x86_32-vtpmmgr/mini-os.o: In function `aes_crypt_ecb':
/local/scratch/ianc/devel/xen-unstable.git/stubdom/polarssl-x86_32/library/aes.c:659: undefined reference to `padlock_supports'
/local/scratch/ianc/devel/xen-unstable.git/stubdom/polarssl-x86_32/library/aes.c:661: undefined reference to `padlock_xcryptecb'
/local/scratch/ianc/devel/xen-unstable.git/stubdom/mini-os-x86_32-vtpmmgr/mini-os.o: In function `aes_crypt_cbc':
/local/scratch/ianc/devel/xen-unstable.git/stubdom/polarssl-x86_32/library/aes.c:771: undefined reference to `padlock_supports'
/local/scratch/ianc/devel/xen-unstable.git/stubdom/polarssl-x86_32/library/aes.c:773: undefined reference to `padlock_xcryptcbc'
make[1]: *** [/local/scratch/ianc/devel/xen-unstable.git/stubdom/mini-os-x86_32-vtpmmgr/mini-os] Error 1
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Matthew Fioravante <matthew.fioravante@jhuapl.edu>
[ ijc -- applied same fix to stubdom/vtpm/Makefile ] Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: flush dcache after memcpy'ing the kernel image
After memcpy'ing the kernel in guest memory we need to flush the dcache
to make sure that the data actually reaches the memory before we start
executing guest code with caches disabled.
copy_from_paddr is the function that does the copy, so add a
flush_xen_dcache_va_range there.
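The ordering described above (copy first, then clean the destination range before anything reads it with caches off) can be sketched as follows. This is a user-space stand-in only: flush_dcache_range_stub() is hypothetical and merely records the range, whereas the real code would call an architecture-specific cache maintenance routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache-maintenance primitive; it only
 * records how many bytes it was asked to flush. */
static size_t flushed_bytes;

static void flush_dcache_range_stub(const void *va, size_t len)
{
    (void)va;
    flushed_bytes += len;
}

/* Copy, then flush exactly the destination range that was written. */
void copy_and_flush(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    flush_dcache_range_stub(dst, len);
}

size_t total_flushed(void)
{
    return flushed_bytes;
}
```

Putting the flush inside the copy helper means every caller gets it, which matches the reasoning given for adding it to the function that does the copy.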
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 21 Jan 2013 12:40:31 +0000 (12:40 +0000)]
arm: use module provided command line for domain 0 command line
Fall back to xen,dom0-bootargs if this isn't present.
Ideally this would use module1-args iff the kernel came from the
modules and the existing xen,dom0-bootargs if the kernel came from
flash, but this approach is simpler and has the same effect in
practice.
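The selection rule described above amounts to a simple preference order. A minimal sketch, assuming both strings have already been extracted by the caller; pick_dom0_cmdline() is a hypothetical name, and treating an empty module command line as "not present" is an assumption of this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Prefer the module-provided command line; otherwise fall back to the
 * xen,dom0-bootargs value; otherwise return an empty string. */
const char *pick_dom0_cmdline(const char *module_args,
                              const char *dt_bootargs)
{
    const char *result;

    if (module_args != NULL && module_args[0] != '\0')
        result = module_args;
    else if (dt_bootargs != NULL)
        result = dt_bootargs;
    else
        result = "";

    assert(result != NULL);   /* callers always get a valid string */
    return result;
}
```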
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 21 Jan 2013 12:40:27 +0000 (12:40 +0000)]
arm: avoid placing Xen over any modules.
This will still fail if the modules are such that Xen is pushed out of
the top 32M of RAM since it will then overlap with the domheap (or
possibly xenheap). This will be dealt with later.
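One way to avoid placing an image over any module is to walk candidate addresses down from the top of RAM and reject any that overlap a module. The sketch below only illustrates that idea; the struct, the function names, the alignment parameter and the "0 means failure" convention are assumptions, not the code in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start;
    uint64_t size;
};

/* Half-open intervals [start, start + size) overlap if each one begins
 * before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Returns the highest 'align'-aligned start address in
 * [ram_start, ram_end) where 'size' bytes avoid every module, or 0 if
 * none fits (this assumes RAM does not begin at address 0). */
uint64_t place_below_top(uint64_t ram_start, uint64_t ram_end,
                         uint64_t size, uint64_t align,
                         const struct range *mods, size_t nr_mods)
{
    assert(align != 0 && size != 0 && ram_end >= ram_start);
    if (ram_end - ram_start < size)
        return 0;

    uint64_t start = (ram_end - size) - (ram_end - size) % align;

    while (start >= ram_start) {
        struct range cand = { start, size };
        bool clash = false;

        for (size_t i = 0; i < nr_mods; i++)
            if (ranges_overlap(cand, mods[i]))
                clash = true;
        if (!clash)
            return start;
        if (start < align)
            break;            /* the next step would wrap below zero */
        start -= align;
    }
    return 0;
}
```

As the commit message notes, a search like this can push the image out of the region it is expected to occupy when the modules sit near the top of RAM; the sketch does not address that case.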
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 21 Jan 2013 12:40:26 +0000 (12:40 +0000)]
xen: arm: introduce concept of modules which can be in RAM at start of day
The intention is that these will eventually be filled in with
information from the bootloader, perhaps via a DTB binding.
Allow for 2 modules (kernel and initrd), plus a third pseudo-module
which is the hypervisor itself. Currently we neither parse nor do
anything with them.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>