Juergen Gross [Wed, 7 Jan 2015 10:10:28 +0000 (11:10 +0100)]
expand x86 arch_shared_info to support linear p2m list
The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top level page frame of the 3 level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains and for crash dump analysis. With
three levels of the p2m tree it is possible to support up to 512 GB of
RAM for a 64 bit pv domain.
A 32 bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB.
To be able to support more RAM on x86-64 switch to an additional
virtual mapped p2m list.
This patch expands struct arch_shared_info with a new p2m list virtual
address, the root of the page table root and a p2m generation count.
The new information is indicated by the domain to be valid by storing
a non-zero value into the page table root member.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Wed, 7 Jan 2015 10:09:50 +0000 (11:09 +0100)]
use more fixed strings to build the hypervisor
It should be possible to repeatedly build identical sources and get
identical binaries, even on different hosts at different build times.
This fails for xen.gz and xen.efi because current time and buildhost
get included in the binaries.
Provide variables XEN_BUILD_DATE, XEN_BUILD_TIME and XEN_BUILD_HOST
which the build environment can set to fixed strings to get a
reproducible build.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Wed, 7 Jan 2015 10:08:49 +0000 (11:08 +0100)]
x86/hvm: extend HVM cpuid leaf with vcpu id
To perform certain hypercalls HVM guests need to use Xen's idea of
vcpu id, which may well not match the guest OS idea of CPU id.
This patch adds vcpu id to the HVM cpuid leaf allowing the guest
to build a mapping.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Jackson [Tue, 6 Jan 2015 16:21:21 +0000 (16:21 +0000)]
configure: Rerun autogen.sh
Various configure scripts have the Xen version built into them by
autoconf. Rereun autogen.sh (on Debian wheezy) so that they all say
4.6. There are no changes other than to doc comments, usage messages,
and so forth.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 6 Jan 2015 16:18:42 +0000 (16:18 +0000)]
Open Xen 4.6.
* Update README's figlet.
* Remove obsolete 4.3 features paragraph from README (!)
* Update QEMU_UPSTREAM_REVISION to refer simply to `master'
* Update QEMU_TRADITIONAL_REVISION to refer to the actual commit hash.
* Change version in xen/Makefile to 4.6.0-unstable
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:32 +0000 (12:25 +0100)]
tools/hotplug: remove EnvironmentFile from xen-qemu-dom0-disk-backend.service
The referenced Environment file does not exist, and the service file
does not make use of variables anyway.
N.B. If we start honouring env settings for any reason this will
have to be changed.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:31 +0000 (12:25 +0100)]
tools/hotplug: use XENCONSOLED_TRACE in xenconsoled.service
Instead of inventing a new XENCONSOLED_LOG= variable reuse the
existing XENCONSOLED_TRACE= variable in xenconsoled.service.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:30 +0000 (12:25 +0100)]
tools/hotplug: use xencommons as EnvironmentFile in xenconsoled.service
The referenced sysconfig/xenconsoled does not exist. If anything
needs to be specified it has to go into the existing
sysconfig/xencommons file.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:29 +0000 (12:25 +0100)]
tools/hotplug: xendomains.service depends on network
Starting domains during boot will most likely require network for
the local bridge and it may need access to remote filesystems. Add
ordering tags to systemd service file.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:28 +0000 (12:25 +0100)]
tools/hotplug: remove XENSTORED_ROOTDIR from xenstored.service
There is no need to export XENSTORED_ROOTDIR. This variable can be
enabled in sysconfig/xencommons. If the variable is unset xenstored
will automatically use @XEN_LIB_STORED@.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 19 Dec 2014 11:25:27 +0000 (12:25 +0100)]
tools/hotplug: remove SELinux options from var-lib-xenstored.mount
Using SELinux mount options per default breaks several systems.
Either the context= mount option is not known at all to the kernel,
as reported for ArchLinux. Or the default value "none" is unknown to
SELinux, as reported for Fedora. In both cases the unit will fail.
The proper place to specify mount options is /etc/fstab. Apparently
systemd is kind enough to use values from there even if Options= or
What= is specified in a .mount file.
Remove XENSTORED_MOUNT_CTX, the reference to a non-existent
EnvironmentFile and trim default Options= for the mount point.
The removed code was first mentioned in the patch referenced below,
with the following description:
...
* Some systems define the selinux context in the systemd Option for
the /var/lib/xenstored tmpfs:
Options=mode=755,context="system_u:object_r:xenstored_var_lib_t:s0"
For the upstream version we remove that and let systems specify
the context on their system /etc/default/xenstored or
/etc/sysconfig/xenstored $XENSTORED_MOUNT_CTX variable
...
It is nowhere stated (on xen-devel) what "Some systems" means, which
is unfortunately common practice in nearly all opensource projects.
http://lists.xenproject.org/archives/html/xen-devel/2014-03/msg02462.html
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Anthony PERARD <anthony.perard@citrix.com> Cc: M A Young <m.a.young@durham.ac.uk> Cc: Luis R. Rodriguez <mcgrof@do-not-panic.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ed Swierk [Tue, 6 Jan 2015 15:21:07 +0000 (15:21 +0000)]
libxl: Fix building libxlu_cfg_y.y with bison 3.0
- Use %lex-param instead of obsolete YYLEX_PARAM to override lex scanner
parameter
- Change deprecated %name-prefix= to %name-prefix
Tested against bison 2.4.1 and 3.0.2.
This is expected to sometimes (depending on timestamps and whether the
bison input files are edited) break building on systems with ancient
versions of bison. Bison 2.4.1 is known to work and was released in
December 2008.
Also, consquentially, regenerate bison output files with bison
1:2.5.dfsg-2.1 from Debian wheezy.
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Tue, 6 Jan 2015 15:15:15 +0000 (15:15 +0000)]
libxl: Renegerate flex output files
Regenerate libxl_*_l.* with flex 2.5.35-10.1 as in current Debian
wheezy. The differences are trivial: addition of declarations of
xlu__cfg_yyget_column and xlu__cfg_yyset_column, but no code body
changes.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 19 Dec 2014 11:17:02 +0000 (11:17 +0000)]
EFI: suppress bogus loader warning
This was accidentally lost in commit fbc3d9a220 ("EFI: add
efi_arch_handle_cmdline() for processing commandline"), leading to the
"Unknown command line option" warning being printed whenever options
get passed to the core hypervisor or the Dom0 kernel.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mihai Donțu [Tue, 6 Jan 2015 12:49:52 +0000 (12:49 +0000)]
x86/HVM: prevent use-after-free when destroying a domain
hvm_domain_relinquish_resources() can free certain domain resources
which can still be accessed, e.g. by HVMOP_set_param, while the domain
is being cleaned up.
This is CVE-2015-0361 / XSA-116.
Signed-off-by: Mihai Donțu <mdontu@bitdefender.com> Tested-by: Răzvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
reset PCI devices on force removal even when QEMU returns error
On do_pci_remove when QEMU returns error, we just bail out early without
resetting the device. On domain shutdown we are racing with QEMU exiting
and most often QEMU closes the QMP connection before executing the
requested command.
In these cases if force=1, it makes sense to go ahead with rest of the
PCI device removal, that includes resetting the device and calling
xc_deassign_device. Otherwise we risk not resetting the device properly
on domain shutdown.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Sun, 21 Dec 2014 11:18:53 +0000 (11:18 +0000)]
xen: arm: correct off-by-one error in consider_modules
By iterating up to <= mi->nr_mods we are running off the end of the boot
modules, but more importantly it causes us to then skip the first FDT reserved
region, meaning we might clobber it.
Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Andrew Cooper [Mon, 5 Jan 2015 14:19:58 +0000 (14:19 +0000)]
tools/libxl: Use of init()/dispose() to avoid leaking libxl_dominfo.ssid_label
libxl_dominfo contains a ssid_label pointer which will have memory allocated
for it in libxl_domain_info() if the hypervisor has CONFIG_XSM compiled.
However, the lack of appropriate use of libxl_dominfo_{init,dispose}() will
cause the label string to be leaked, even in success cases.
This was discovered by XenServers Coverity scanning, and are issues not
identified by upstream Coverity Scan.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 14 Nov 2014 14:41:38 +0000 (14:41 +0000)]
libxl: Fix if{} nesting in do_pci_remove
do_pci_remove contained this:
if (type == LIBXL_DOMAIN_TYPE_HVM) {
[stuff]
} else if (type != LIBXL_DOMAIN_TYPE_PV)
abort();
{
This is bizarre, and not correct. The effect is that HVM guests end
up running both the proper code and that intended for PV guests. This
causes (amongst other things) trouble when PCI devices are
hot-unplugged from HVM guests.
This bug was introduced in abfb006f "tools/libxl: explicitly grant
access to needed I/O-memory ranges".
This is clear candidate for Xen 4.5, being a bugfix to an important
feature.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Robert Hu <robert.hu@intel.com> Rlease-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> CC: Sander Eikelenboom <linux@eikelenboom.it> CC: George Dunlap <George.Dunlap@eu.citrix.com>
Ian Jackson [Mon, 5 Jan 2015 14:31:00 +0000 (14:31 +0000)]
libxl: Initialise CTX->xce in domain suspend, as needed
When excuting xl migrate/Remus, the following error can occur:
[root@master xen]# xl migrate 5 slaver
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x1/0x0/1225)
Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/12\
)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
Segmentation fault (core dumped)
This is because CTX->xce is used without been initialized.
The bug was introduced by commit 2ffeb5d7f5d8
libxl: events: Deregister evtchn fd when not needed
which removed the initialization of xce from libxl__ctx_alloc.
In this patch we initialise the CTX->xce before using it. Also, we
adjust the doc comment for libxl__ev_evtchn_* to mention the need to
do so.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Wei Liu <wei.liu2@citrix.com>
Avoid emitting an error message referring to an incorrect or corrupt
container file just because no entry was found for the running CPU.
Additionally switch the order of data validation and consumption in
cpu_request_microcode()'s first loop, and also check the types of
skipped blocks in container_fast_forward().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 12 Dec 2014 10:24:13 +0000 (10:24 +0000)]
domctl: fix IRQ permission granting/revocation
Commit 545607eb3c ("x86: fix various issues with handling guest IRQs")
wasn't really consistent in one respect: The granting of access to an
IRQ shouldn't assume the pIRQ->IRQ translation to be the same in both
domains. In fact it is wrong to assume that a translation is already/
still in place at the time access is being granted/revoked.
What is wanted is to translate the incoming pIRQ to an IRQ for
the invoking domain (as the pIRQ is the only notion the invoking
domain has of the IRQ), and grant the subject domain access to
the resulting IRQ.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 11 Dec 2014 10:47:21 +0000 (10:47 +0000)]
x86: don't deliver NMI to PVH Dom0
... for the time being: The mechanism used depends on the domain's use
of the IRET hypercall - which PVH is not using. HVM code (which PVH
uses) will deliver an NMI if it sees v->nmi_pending however that
temporary affinity adjustment gets undone in the HYPERVISOR_iret
handler, yet PVH can't call that hypercall.
Also drop two bogus code lines spotted while going through the involved
code paths: Addresses of per-CPU variables can't possibly be NULL, and
the setting of st->vcpu in send_guest_trap()'s MCE case is redundant
with an earlier cmpxchgptr().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
M A Young [Thu, 18 Dec 2014 10:02:16 +0000 (10:02 +0000)]
tools/xl: fix segfault in xl migrate --debug
If differences are found during the verification phase of xl migrate
--debug then it is likely to crash with a segfault because the bogus
pagebuf->pfn_types[pfn] is used in a print statement instead of
pfn_type[pfn] .
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Tue, 9 Dec 2014 14:04:19 +0000 (14:04 +0000)]
libxl: Tell qemu to use raw format when using a tapdisk
At the moment libxl unconditinally passes the underlying file format
to qemu in the device string. However, when tapdisk is in use,
tapdisk handles the underlying format and presents qemu with
effectively a raw disk. When qemu looks at the tapdisk block device
and doesn't find the image format it was looking for, it will fail.
This effectively means that tapdisk cannot be used with HVM domains at
the moment except for raw files.
Instead, if we're using a tapdisk backend, tell qemu to use a raw file
format.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- nuked extra blank line ]
Wei Liu [Mon, 15 Dec 2014 10:56:24 +0000 (10:56 +0000)]
xl: print message to stdout when (!debug && dryrun)
In commit d36a3734a ("xl: fix migration failure with xl migrate
--debug"), message is printed to stderr for both debug mode
and dryrun mode. That caused rdname() in xendomains fails to parse
domain name since it's expecting input from xl's stdout.
So this patch separates those two cases. If xl is running in debug mode,
then message is printed to stderr; if xl is running in dryrun mode and
debug is not enabled, message is printed to stdout. This will fix
xendomains and other scripts that use "xl create --dryrun", as well as
not re-introducing the old bug fixed in d36a3734a.
Reported-by: Mark Pryor <tlviewer@yahoo.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: M A Young <m.a.young@durham.ac.uk> Cc: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 12 Dec 2014 18:26:02 +0000 (18:26 +0000)]
docs/commandline: Minor formatting fixes and clarifications
`font` had a trailing single quote which was out of place.
`gnttab_max_frames` was missing escapes for the underscores which caused the
underscores to take their markdown meaning, causing 'max' in the middle to be
italicised. Escape the underscores, and make all command line parameters
bold, to be consistent with the existing style.
Clarify how the default for `nmi` changes between debug and non debug builds
of Xen.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 9 Dec 2014 16:43:22 +0000 (16:43 +0000)]
python/xc: Fix multiple issues in pyflask_context_to_sid()
The error handling from a failed memory allocation should return
PyErr_SetFromErrno(xc_error_obj); rather than simply calling it and continuing
to the memcpy() below, with the dest pointer being NULL.
Coverity also complains about passing a non-NUL terminated string to
xc_flask_context_to_sid(). xc_flask_context_to_sid() doesn't actually take a
NUL terminated string, but it does take a char* which, in context, used to be
a string, which is why Coverity complains.
One solution would be to use strdup(ctx) which is simpler than a
strlen()/malloc()/memcpy() combo, which would result in a NUL-terminated
string being used with xc_flask_context_to_sid().
However, ctx is strictly an input to the hypercall and is not mutated along
the way. Both these issues can be fixed, and the error logic simplified, by
not duplicating ctx in the first place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-IDs: 10553051055721 Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Xen Coverity Team <coverity@xen.org>
The UART is not able to receive bytes when idle mode is not configured
properly, therefore setup the UART with autoidle and wakeup enabled.
Older Linux kernels (for example 3.8) configure hwmods for all devices
even if the device tree nodes for those devices is absent in device
tree, thus UART idle mode is configured too. With such kernels we can
workaround the issue by adding a fake node in the UART containing this
MMIO range, which is therefore mapped by Xen to dom0, which
reconfigures the UART, causing things to work normally.
Newer Linux Kernels (3.12 and beyond) do not configure idle mode for
UART and so this hack no longer works.
Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated commit message as discussed ]
Jan Beulich [Mon, 15 Dec 2014 08:30:05 +0000 (09:30 +0100)]
console: allocate ring buffer earlier
... when "conring_size=" was specified on the command line. We can't
really do this as early as we would want to when the option was not
specified, as the default depends on knowing the system CPU count. Yet
the parsing of the ACPI tables is one of the things that generates a
lot of output especially on large systems.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> (ARM and generic bits) Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 11 Dec 2014 16:14:07 +0000 (17:14 +0100)]
have architectures specify the number of PIRQs a hardware domain gets
The current value of nr_static_irqs + 256 is often too small for larger
systems. Make it dependent on CPU count and number of IO-APIC pins on
x86, and (until it obtains PCI support) simply NR_IRQS on ARM.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Jan Beulich [Thu, 11 Dec 2014 16:13:04 +0000 (17:13 +0100)]
lock down hypercall continuation encoding masks
Andrew validly points out that even if these masks aren't a formal part
of the hypercall interface, we aren't free to change them: A guest
suspended for migration in the middle of a continuation would fail to
work if resumed on a hypervisor using a different value. Hence add
respective comments to their definitions.
Additionally, to help future extensibility as well as in the spirit of
reducing undefined behavior as much as possible, refuse hypercalls made
with the respective bits non-zero when the respective sub-ops don't
make use of those bits.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Release-Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 17:28:18 +0000 (17:28 +0000)]
libxl: events: Document and enforce actual callbacks restriction
libxl_event_register_callbacks cannot reasonably be called while libxl
is busy (has outstanding operations and/or enabled events).
This is because the previous spec implied (although not entirely
clearly) that event hooks would not be called for existing fd and
timeout interests. There is thus no way to reliably ensure that libxl
would get told about fds and timeouts which it became interested in
beforehand.
So there have to be no such fds or timeouts, which means that the
callbacks must only be registered or changed when the ctx is idle.
Document this restriction, and enforce it with a pair of asserts.
(It would be nicer, perhaps, to say that the application may not call
libxl_osevent_register_hooks other than right after creating the ctx.
But there are existing callers, including libvirt, who do it later -
even after doing major operations such as domain creation.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 17:27:27 +0000 (17:27 +0000)]
libxl: events: Deregister evtchn fd when not needed
We want to have no fd events registered when we are idle.
In this patch, deal with the evtchn fd:
* Defer setup of the evtchn handle to the first use.
* Defer registration of the evtchn fd; register as needed on use.
* When cancelling an evtchn wait, or when wait setup fails, check
whether there are now no evtchn waits and if so deregister the fd.
* On libxl teardown, the evtchn fd should therefore be unregistered.
assert that this is the case.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Do not bother putting evtchn_fd in the ctx; instead, get it
from xc_evtchn_fd when we need it. (Cosmetic.)
Do not register the evtchn fd multiple times: check it's not
registered before we call libxl__ev_fd_register. (Bugfix.)
Ian Jackson [Thu, 27 Nov 2014 18:04:29 +0000 (18:04 +0000)]
libxl: events: Tear down SIGCHLD machinery on ctx destruction
We want to have no fd events registered when we are idle.
Also, we should put back the default SIGCHLD handler. So:
* In libxl_ctx_free, use libxl_childproc_setmode to set the mode to
the default, which is libxl_sigchld_owner_libxl (ie `libxl owns
SIGCHLD only when it has active children').
But of course there are no active children at libxl teardown so
this results in libxl__sigchld_notneeded: the ctx loses its
interest in SIGCHLD (unsetting the SIGCHLD handler if we were the
last ctx) and deregisters the per-ctx selfpipe fd.
* assert that this is the case: ie that we are no longer interested
in the selfpipe.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Thu, 27 Nov 2014 18:03:03 +0000 (18:03 +0000)]
libxl: events: Deregister, don't just modify, sigchld pipe fd
We want to have no fd events registered when we are idle. This
implies that we must be able to deregister our interest in the sigchld
self-pipe fd, not just modify to request no events.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 16:44:52 +0000 (16:44 +0000)]
libxl: events: Deregister xenstore watch fd when not needed
We want to have no fd events registered when we are idle.
In this patch, deal with the xenstore watch fd:
* Track the total number of active watches.
* When deregistering a watch, or when watch registration fails, check
whether there are now no watches and if so deregister the fd.
* On libxl teardown, the watch fd should therefore be unregistered.
assert that this is the case.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Wed, 26 Nov 2014 16:17:49 +0000 (16:17 +0000)]
libxl: events: Assert that libxl_ctx_free is not called from a hook
No-one in their right mind would do this, and if they did everything
would definitely collapse. Arrange that if this happens, we crash
ASAP.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Olaf Hering [Fri, 5 Dec 2014 10:49:47 +0000 (11:49 +0100)]
tools/xenstore: fix link error with libsystemd
Linking fails with undefined reference to the used systemd functions.
Move LDFLAGS after the object files to fix the failure.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Mon, 8 Dec 2014 13:45:46 +0000 (14:45 +0100)]
switch to write-biased r/w locks
This is to improve fairness: A permanent flow of read acquires can
otherwise lock out eventual writers indefinitely.
This is CVE-2014-9065 / XSA-114.
Signed-off-by: Keir Fraser <keir@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Don Dugger [Fri, 5 Dec 2014 11:46:15 +0000 (12:46 +0100)]
VT-d: decouple SandyBridge quirk from VTd timeout
Currently the quirk code for SandyBridge uses the VTd timeout value when
writing to an IGD register. This is the wrong timeout to use and, at
1000 msec., is also much too large. This patch changes the quirk code
to use a timeout that is specific to the IGD device and allows the user
control of the timeout.
Boolean settings for the boot parameter `snb_igd_quirk' keep their current
meaning, enabling or disabling the quirk code with a timeout of 1000 msec.
In addition specifying `snb_igd_quirk=default' will enable the code and
set the timeout to the theoretical maximum of 670 msec. For finer control,
specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
the code and set the timeout to `n' msec.
Signed-off-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Update documentation.
Wei Liu [Tue, 2 Dec 2014 15:11:30 +0000 (15:11 +0000)]
systemd: use pkg-config to determine systemd library availability
AC_CHECK_LIB fails on Debian Jessie since the ld flag it generates is
incorrect, even in the event systemd library is available. Use
PKG_CHECK_MODULES instead.
Tested on Debian Jessie and Arch Linux.
Reported-by: Mark Pryor <tlviewer@yahoo.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Luis R. Rodriguez <mcgrof@do-not-panic.com> Cc: Mark Pryor <tlviewer@yahoo.com>
[ ijc -- reran autogen.sh as requested ]
Julien Grall [Fri, 28 Nov 2014 15:17:06 +0000 (15:17 +0000)]
xen/arm: Handle platforms with edge-triggered virtual timer
Some platforms (such as Xgene and ARMv8 models) use an edge-triggered interrupt
for the virtual timer. Even if the timer output signal is masked in the
context switch, the GIC will keep track that of any interrupts raised
while IRQs are disabled. As soon as IRQs are re-enabled, the virtual
interrupt timer will be injected to Xen.
If an idle vVCPU was scheduled next then the interrupt handler doesn't
expect to the receive the IRQ and will crash:
The proper solution is to context switch the virtual interrupt state at
the GIC level. This would also avoid masking the output signal which
requires specific handling in the guest OS and more complex code in Xen
to deal with EOIs, and so is desirable for that reason too.
Sadly, this solution requires some refactoring which would not be
suitable for a freeze exception for the Xen 4.5 release.
For now implement a temporary solution which ignores the virtual timer
interrupt when the idle VCPU is running.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- tweaked some wording in the comment ]
Boris Ostrovsky [Tue, 25 Nov 2014 16:11:50 +0000 (11:11 -0500)]
pygrub: Fix regression from c/s d1b93ea, attempt 2
c/s d1b93ea causes substantial functional regressions in pygrub's ability to
parse bootloader configuration files.
c/s d1b93ea itself changed an an interface which previously used exclusively
integers, to using strings in the case of a grub configuration with explicit
default set, along with changing the code calling the interface to require a
string. The default value for "default" remained as an integer.
As a result, any Extlinux or Lilo configuration (which drives this interface
exclusively with integers), or Grub configuration which doesn't explicitly
declare a default will die with an AttributeError when attempting to call
"self.cf.default.isdigit()" where "default" is an integer.
Sadly, this AttributeError gets swallowed by the blanket ignore in the loop
which searches partitions for valid bootloader configurations, causing the
issue to be reported as "Unable to find partition containing kernel"
We should explicitly check type of "default" in image_index() and process it
appropriately.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxc: check in xc_get_tot_pages() that the proper domain is reported
XEN_DOMCTL_getdomaininfo, which is being used by xc_domain_getinfo(), has
strange interface: it reports first domain which has domid >= requested domid
so all callers are supposed to check that the proper domain(s) was queried
by checking domid. xc_get_tot_pages() doesn't do that. In case the requested
domain was destroyed it will report first domain with domid > requested domid
which is apparently misleading as there is no way xc_get_tot_pages() callers
can figure out that they got tot_pages for some other domain.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 12:34:34 +0000 (12:34 +0000)]
python/xs: Correct the indirection of the NULL xshandle() check
The code now now matches its comment, and will actually catch the case of a
bad xs handle.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-ID: 1055948 CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Xen Coverity Team <coverity@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 12:34:33 +0000 (12:34 +0000)]
python/xc: Fix multiple issues in pyxc_readconsolering()
Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first
xc_readconsolering() fail. It is trivial to run throught the processes memory
by repeatedly passing junk parameters to this function.
In the case that the call to xc_readconsolering() in the while loop fails,
reinstate str before breaking out, and passing a spurious pointer to free().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-IDs: 10549841055906 CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Xen Coverity Team <coverity@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Tue, 2 Dec 2014 15:39:23 +0000 (16:39 +0100)]
tools/hotplug: update systemd dependency to use service instead of socket
Since commit 4542ae340d75bd6319e3fcd94e6c9336e210aeef ("tools/hotplug:
systemd xenstored dependencies") all service files use the .socket unit
as startup dependency. While this happens to work for boot it fails for
shutdown because a .socket does not seem to enforce ordering. When
xendomains.service runs during shutdown then systemd will stop
xenstored.service at the same time.
Change all "xenstored.socket" to "xenstored.service" to let systemd know
that xenstored has to be shutdown after everything else.
Reported-by: Mark Pryor <tlviewer@yahoo.com> Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Dec 2014 10:41:38 +0000 (10:41 +0000)]
libxl: expose #define to 4.5 and above
In e3abab74 (libxl: un-constify return value of libxl_basename), the
macro was exposed to releases < 4.5. However only new code is able to
make use of that macro so it should be exposed to releases >= 4.5.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Wed, 3 Dec 2014 08:52:34 +0000 (09:52 +0100)]
INSTALL: fix typo in xendomains.service name
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Dec 2014 11:31:13 +0000 (11:31 +0000)]
xl: fix two memory leaks
Free strings returned by libxl_basename after used.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- s/basename/kernel_basename in parse_config_data to avoid
shadowing basename(3). ]
Euan Harris [Mon, 1 Dec 2014 14:27:06 +0000 (14:27 +0000)]
libxl: Don't dereference null new_name pointer in libxl_domain_rename()
libxl__domain_rename() unconditionally dereferences its new_name
parameter, to check whether it is an empty string. Add a check to
avoid a segfault if new_name is null.
Signed-off-by: Euan Harris <euan.harris@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 1 Dec 2014 11:31:12 +0000 (11:31 +0000)]
libxl: un-constify return value of libxl_basename
The string returned is malloc'ed but marked as "const".
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Razvan Cojocaru [Fri, 28 Nov 2014 12:26:48 +0000 (14:26 +0200)]
xenstore: Clarify xs_open() semantics
Added to the xs_open() comments in xenstore.h. The text has been
taken almost verbatim from a xen-devel email by Ian Campbell,
and confirmed as accurate by Ian Jackson.
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
M A Young [Tue, 2 Dec 2014 13:48:54 +0000 (13:48 +0000)]
xl: fix migration failure with xl migrate --debug
Migrations with xl migrate --debug will fail because debugging
information from the receiving process is written to the stdout
channel. This channel is also used for status messages so the
migration will fail as the sending process receives an unexpected
message. This patch moves the debugging information to the stderr
channel.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 2 Dec 2014 11:48:01 +0000 (12:48 +0100)]
x86/HVM: prevent infinite VM entry retries
This reverts the VMX side of commit 28b4baac ("x86/HVM: don't crash
guest upon problems occurring in user mode") and gets SVM in line with
the resulting VMX behavior. This is because Andrew validly says
"A failed vmentry is overwhelmingly likely to be caused by corrupt
VMC[SB] state. As a result, injecting a fault and retrying the the
vmentry is likely to fail in the same way."
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Chunyan Liu [Wed, 19 Nov 2014 06:34:11 +0000 (14:34 +0800)]
fix rename: xenstore not fully updated
libxl__domain_rename only updates /local/domain/<domid>/name,
/vm/<uuid>/name in xenstore are not updated. Add code in
libxl__domain_rename to update /vm/<uuid>/name too.
Signed-off-by: Chunyan Liu <cyliu@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 26 Nov 2014 15:09:40 +0000 (15:09 +0000)]
tools/oxenstored: Fix | vs & error in fd event handling
This makes fields 0 and 1 true more often than they should be, resulting
problems when handling events.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Dave Scott <Dave.Scott@eu.citrix.com> CC: Zheng Li <zheng.li3@citrix.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Zheng Li <dev@zheng.li> Reviewed-by: David Scott <dave.scott@citrix.com>
Olaf Hering [Thu, 27 Nov 2014 09:26:26 +0000 (10:26 +0100)]
INSTALL: correct EXTRA_CFLAGS handling
The already documented configure patch was not applied.
Adjust documentation to describe existing behaviour.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Tue, 25 Nov 2014 10:59:47 +0000 (10:59 +0000)]
libxl: allow copying between bitmaps of different sizes
When parsing bitmap objects JSON parser will create libxl_bitmap map of the
smallest size needed.
This can cause problems when saved image file specifies CPU affinity. For
example, if 'vcpu_hard_affinity' in the saved image has only the first CPU
specified, just a single byte will be allocated and libxl_bitmap->size will be
set to 1.
This will result in assertion in libxl_set_vcpuaffinity()->libxl_bitmap_copy()
since the destination bitmap is created for maximum number of CPUs.
We could allocate that bitmap of the same size as the source, however, it is
later passed to xc_vcpu_setaffinity() which expects it to be sized to the max
number of CPUs
To fix this issue, introduce an internal function to allowing copying between
bitmaps of different sizes. Note that this function is only used in
libxl_set_vcpuaffinity at the moment. Though NUMA placement logic invoke
libxl_bitmap_copy as well there's no need to replace those invocations. NUMA
placement logic comes into effect when no vcpu / node pinning is provided, so
it always operates on bitmap of the same sizes (that is, size of maximum
number of cpus /nodes).
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 27 Nov 2014 13:04:44 +0000 (14:04 +0100)]
docs/commandline: Refresh document for 4.5
* Add options whose code has been committed (without a patch to this file).
* Remove options which have been deleted (without a patch to this file).
* Tweak some formatting for consistency.
* Nuke some trailing whitespace.
I believe this document now identifies the exact set of command line options
accepted by Xen, for both x86 and ARM architectures.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel De Graaf [Thu, 27 Nov 2014 13:04:23 +0000 (14:04 +0100)]
xsm/flask: add two missing domctls
Reported-by: Michael Young <m.a.young@durham.ac.uk> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 27 Nov 2014 13:03:23 +0000 (14:03 +0100)]
x86/PVH: properly disable vLAPIC
Rather than guarding higher level operations (like vPMU initialization
as suggested by Boris in
http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg02278.html)
mark the vLAPIC hardware disabled for PVH guests and prevent it from
getting moved out of this state.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Boris Ostrovsky [Thu, 27 Nov 2014 13:02:45 +0000 (14:02 +0100)]
x86: disable VPMU for PVH guests
Currently when VPMU is enabled on a system both HVM and PVH VPCUs will
initialize their VPMUs, including setting up vpmu_ops. As result even
though VPMU will not work for PVH guests (APIC is not supported there),
the guest may decide to perform a write to a PMU MSR. This will cause a
call to is_vlapic_lvtpc_enabled() which will crash the hypervisor, e.g.:
Jan Beulich [Thu, 27 Nov 2014 13:01:40 +0000 (14:01 +0100)]
x86/HVM: confine internally handled MMIO to solitary regions
While it is generally wrong to cross region boundaries when dealing
with MMIO accesses of repeated string instructions (currently only
MOVS) as that would do things a guest doesn't expect (leaving aside
that none of these regions would normally be accessed with repeated
string instructions in the first place), this is even more of a problem
for all virtual MSI-X page accesses (both msixtbl_{read,write}() can be
made dereference NULL "entry" pointers this way) as well as undersized
(1- or 2-byte) LAPIC writes (causing vlapic_read_aligned() to access
space beyond the one memory page set up for holding LAPIC register
values).
Since those functions validly assume to be called only with addresses
their respective checking functions indicated to be okay, it is generic
code that needs to be fixed to clip the repetition count.
To be on the safe side (and consistent), also do the same for buffered
I/O intercepts, even if their only client (stdvga) doesn't put the
hypervisor at risk (i.e. "only" guest misbehavior would result).
This is CVE-2014-8867 / XSA-112.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 27 Nov 2014 13:00:23 +0000 (14:00 +0100)]
x86: limit checks in hypercall_xlat_continuation() to actual arguments
HVM/PVH guests can otherwise trigger the final BUG_ON() in that
function by entering 64-bit mode, setting the high halves of affected
registers to non-zero values, leaving 64-bit mode, and issuing a
hypercall that might get preempted and hence become subject to
continuation argument translation (HYPERVISOR_memory_op being the only
one possible for HVM, PVH also having the option of using
HYPERVISOR_mmuext_op). This issue got introduced when HVM code was
switched to use compat_memory_op() - neither that nor
hypercall_xlat_continuation() were originally intended to be used by
other than PV guests (which can't enter 64-bit mode and hence have no
way to alter the high halves of 64-bit registers).
This is CVE-2014-8866 / XSA-111.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 25 Nov 2014 16:21:52 +0000 (17:21 +0100)]
vNUMA: rename interface structures
No-one (including me) paid attention during review that these
structures don't adhere to the naming requirements of the public
interface: Consistently use xen_ prefixes at least for all new
additions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Campbell [Thu, 20 Nov 2014 15:48:47 +0000 (15:48 +0000)]
libxc: don't leak buffer containing the uncompressed PV kernel
The libxc xc_dom_* infrastructure uses a very simple malloc memory pool which
is freed by xc_dom_release. However the various xc_try_*_decode routines (other
than the gzip one) just use plain malloc/realloc and therefore the buffer ends
up leaked.
The memory pool currently supports mmap'd buffers as well as a directly
allocated buffers, however the try decode routines make use of realloc and do
not fit well into this model. Introduce a concept of an external memory block
to the memory pool and provide an interface to register such memory.
The mmap_ptr and mmap_len fields of the memblock tracking struct lose their
mmap_ prefix since they are now also used for external memory blocks.
We are only seeing this now because the gzip decoder doesn't leak and it's only
relatively recently that kernels in the wild have switched to better
compression.
This is https://bugs.debian.org/767295
Reported by: Gedalya <gedalya@gedalya.net> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 19 Nov 2014 15:28:15 +0000 (15:28 +0000)]
xen: arm: Support the other 4 PCI buses on Xgene
Currently we only establish specific mappings for pcie0, which is
used on the Mustang platform. However at least McDivitt uses pcie3.
So wire up all the others, based on whether the corresponding DT node
is marked as available.
UIE being set can cause maintenance interrupts to occur when Xen writes
to one or more LR registers. The effect is a busy loop around the
interrupt handler in Xen
(http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
Julien Grall [Thu, 20 Nov 2014 17:36:03 +0000 (17:36 +0000)]
scripts/get_maintainer.pl: Correctly CC the maintainers
The current script is setting $email_remove_duplicates to 1 by default, on
complex patch (see [1]), this will result to ommitting randomly some
maintainers.
This is because, the script will:
1) Get the list of maintainers of the file (incidentally all the
maintainers in "THE REST" role are added). If the email address already
exists in the global list, skip it. => The role will be lost
2) Filter the list to remove the entry with "THE REST" role
So if a maintainers is marked with "THE REST" role on the first file and
actually be an x86 maintainers on the script, the script will only retain
the "THE REST" role. During the filtering step, this maintainers will
therefore be dropped.
This patch fixes this by setting $email_remove_duplicates to 0 by default.
The new behavior of the script will be:
1) Append the list of maintainers for every file
2) Filter the list to remove the entry with "THE REST" role
3) Remove duplicated email address
Example:
Patch: https://patches.linaro.org/41083/
Before the patch:
Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Wei Liu <wei.liu2@citrix.com>
George Dunlap <george.dunlap@eu.citrix.com>
xen-devel@lists.xen.org
After the patch:
Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Wei Liu <wei.liu2@citrix.com>
Stefano Stabellini <stefano.stabellini@citrix.com>
Tim Deegan <tim@xen.org>
Keir Fraser <keir@xen.org>
Jan Beulich <jbeulich@suse.com>
George Dunlap <george.dunlap@eu.citrix.com>
xen-devel@lists.xen.org
George Dunlap [Wed, 12 Nov 2014 17:31:33 +0000 (17:31 +0000)]
xl: Return proper error codes for block-attach and block-detach
Return proper error codes on failure so that scripts can tell whether
the command completed properly or not.
This is not a proper fix, since it fails to call
libxl_device_disk_dispose() on the error path. But a proper fix
requires some refactoring, so given where we are in the release
process, it's better to have a fix that is simple and obvious, and do
the refactoring once the next development window opens up.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 25 Nov 2014 09:08:57 +0000 (10:08 +0100)]
x86/HVM: don't crash guest upon problems occurring in user mode
This extends commit 5283b310 ("x86/HVM: only kill guest when unknown VM
exit occurred in guest kernel mode") to a few more cases, including the
failed VM entry one that XSA-110 was needed to be issued for.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 25 Nov 2014 09:08:04 +0000 (10:08 +0100)]
x86: don't ignore foreigndom input on various MMUEXT ops
Instead properly fail requests that shouldn't be issued on foreign
domains or - for MMUEXT_{CLEAR,COPY}_PAGE - extend the existing
operation to work that way.
In the course of doing this the need to always clear "okay" even when
wanting an error code other than -EINVAL became unwieldy, so the
respective logic is being adjusted at once, together with a little
other related cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Tue, 25 Nov 2014 09:05:29 +0000 (10:05 +0100)]
x86/cpuidle: don't count C1 multiple times
Commit 4ca6f9f0 ("x86/cpuidle: publish new states only after fully
initializing them") resulted in the state counter to be incremented
for C1 despite that using a fixed table entry (and the statically
initialized counter value already accounting for it and C0).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
dpci: add 'masked' as a gate for hvm_dirq_assist to process
commit f6dd295381f4b6a66acddacf46bca8940586c8d8 "dpci: replace tasklet
with softirq" used the 'masked' as an two-bit state mechanism
(STATE_SCHED, STATE_RUN) to communicate between 'raise_softirq_for' and
'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be
re-scheduled.
However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper
dialogue and was not using locked cmpxchg or test_bit operations and
ended setting 'state' set to zero. That meant 'raise_softirq_for' was
free to schedule it while the 'struct hvm_pirq_dpci'' was still on an
per-cpu list causing an list corruption.
The code would trigger the following path causing list corruption:
\-timer_softirq_action
pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0.
pirq_dpci is still on dpci_list.
\- dpci_sofitrq
while (!list_emptry(&our_list))
list_del, but has not yet done 'entry->next = LIST_POISON1;'
[interrupt happens]
raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list.
[interrupt is done, back to dpci_softirq]
finishes the entry->next = LIST_POISON1;
.. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits.
\- dpci_softirq
while (!list_emtpry)
list_del, but ->next already has LIST_POISON1 and we blow up.
An alternative solution was proposed (adding STATE_ZOMBIE and making
pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above
issue but had an fatal bug. It would miss interrupts that are to be scheduled!
This patch brings back the 'masked' boolean which is used as an
communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and
'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime
'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it.
The 'state' is left as a seperate mechanism to provide an mechanism between
'raise_sofitrq' and 'softirq_dpci' to communicate the state of the
'struct hvm_dirq_pirq'.
However since we have now two seperate machines we have to deal with an
cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind'
is called while an 'hvm_dirq_assist' is just about to service.
The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and
the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls
'set_timer' hitting an ASSERT.
By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that
scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'.
In the 'pt_irq_create_bind' - in the error cases we could be seeing
an softirq scheduled right away and being serviced (though stuck at
the spinlock). The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset'
to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED').
'pt_irq_create_bind' continues on with setting '->flag=0' and unlocks the lock.
'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and
'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing
anything until it hits 'set_timer' which is undefined for MSI. Adding
in 'masked=0' for the MSI case fixes that.
The legacy interrupt one does not need it as there is no chance of
do_IRQ being called at that point.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 20 Nov 2014 15:22:00 +0000 (15:22 +0000)]
docs/commandline: Fix formatting issues
For 'dom0_max_vcpus' and 'hvm_debug', markdown was interpreting the text as
regular text, and reflowing it as a regular paragraph, leading to a single
line as output. Reformat them as code blocks inside blockquote blocks, which
causes them to take their precise whitespace layout.
For 'psr', the bullet point was incorrectly delineated from paragraph text,
causing it to be reflowed. Alter the formatting to include the CMT-specific
options as sub-bullets of the overall CMT resource.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 19 Nov 2014 15:28:14 +0000 (15:28 +0000)]
xen: arm: correct specific mappings for PCIE0 on X-Gene
The region assigned to PCIE0, according to the docs, is 0x0e000000000 to
0x10000000000. They make no distinction between PCI CFG and PCI IO mem within
this range (in fact, I'm not sure that isn't up to the driver).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 15:28:13 +0000 (15:28 +0000)]
xen: arm: correct off by one in xgene-storm's map_one_mmio
The callers pass the end as the pfn immediately *after* the last page to be
mapped, therefore adding one is incorrect and causes an additional page to be
mapped.
At the same time correct the printing of the mfn values, zero-padding them to
16 digits as for a paddr when they are frame numbers is just confusing.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 15:28:12 +0000 (15:28 +0000)]
xen: arm: Drop EARLY_PRINTK_BAUD from entries which don't set ..._INIT_UART
EARLY_PRINTK_BAUD doesn't do anything unless EARLY_PRINTK_INIT_UART is set.
Furthermore only the pl011 driver implements the init routine at all, so the
entries which use any other UART driver and specified a BAUD were doubly wrong.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Wed, 19 Nov 2014 10:42:18 +0000 (10:42 +0000)]
docs: workaround markdown parser error in xen-command-line.markdown
Some versions of markdown (specifically the one in Debian Wheezy, currently
used to generate
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be
confused by nested lists in the middle of multi-paragraph parent list entries
as seen in the com1,com2 entry.
The effect is that the "Default" section of all following entries are replace
by some sort of hash or checksum (at least, a string of 32 random seeming hex
digits).
Workaround this issue by making the decriptions of the DPS options a nested
list, moving the existing nested list describing the options for S into a third
level list. This seems to avoid the issue, and is arguably better formatting in
its own right (at least its not a regression IMHO)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>