xenbits.xensource.com Git - people/julieng/xen-unstable.git/log
Jan Beulich [Mon, 16 Nov 2015 12:11:08 +0000 (13:11 +0100)]
x86/IO-APIC: fix setup of Xen internally used IRQs (take 2)

..., namely that of a PCI serial card with an IRQ above the
legacy range. This was broken by the switch to cpumask_any() in
cpu_mask_to_apicid_phys(). Fix this by allowing all CPUs for that IRQ
(via setup_vector_irq() properly updating a booting CPU's vector_irq[],
thus avoiding "No irq handler for vector" messages and a non-working
interrupt).

Clean up coding style and types there at the same time.

While doing this I also noticed that io_apic_set_pci_routing() can't
be quite right: it sets up the destination _before_ getting a vector
allocated (which, on systems not using flat APIC mode, affects the
possible destinations), and it also doesn't restrict affinity to
->arch.cpu_mask (as established by assign_irq_vector()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell [Mon, 16 Nov 2015 11:29:45 +0000 (11:29 +0000)]
MINIOS_UPSTREAM_REVISION Update

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 10 Nov 2015 10:46:44 +0000 (10:46 +0000)]
tools/ocaml/xb: Correct calculations of data/space in the ring

ml_interface_{read,write}() would miscalculate the quantity of
data/space in the ring if it crossed the ring boundary, and incorrectly
return a short read/write.

This causes a protocol stall, as each side of the ring ends up waiting
for the other side to take the next action.

Correct the calculations to cope with crossing the ring boundary.

In addition, correct the error detection.  It is a hard error if the
producer index gets more than a ring size ahead of the consumer, or if
the consumer ever overtakes the producer.
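A minimal C sketch of the corrected calculation. It assumes a hypothetical ring with a power-of-two buffer and free-running 32-bit producer/consumer indices, in the style of the xenstore ring. The struct and function names are illustrative and are not the ml_interface_{read,write}() code. The memory barriers a real shared ring needs are omitted. Unsigned subtraction of the indices gives the amount of data even after the indices wrap, and a difference larger than the ring flags both hard errors. Each copy is split at the end of the buffer, so a transfer that straddles the boundary is no longer cut short.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ring for illustration: power-of-two buffer, free-running
 * 32-bit indices. Barriers needed between real domains are omitted. */
#define RING_SIZE 1024u
#define MASK_IDX(idx) ((idx) & (RING_SIZE - 1))

struct ring {
    char buf[RING_SIZE];
    uint32_t cons;  /* next byte to read */
    uint32_t prod;  /* next byte to write */
};

/* Copy up to len bytes out of the ring. Returns bytes read, or -1 if the
 * indices are inconsistent. */
static int ring_read(struct ring *r, char *out, uint32_t len)
{
    uint32_t cons = r->cons, prod = r->prod;
    uint32_t avail = prod - cons;   /* unsigned arithmetic copes with index wrap */

    /* Producer more than a ring ahead, or consumer past the producer
     * (which makes the unsigned difference huge): hard error. */
    if (avail > RING_SIZE)
        return -1;
    if (len > avail)
        len = avail;

    /* Bytes up to the end of the buffer, then the part wrapping to the start. */
    uint32_t first = RING_SIZE - MASK_IDX(cons);
    if (first > len)
        first = len;
    memcpy(out, r->buf + MASK_IDX(cons), first);
    memcpy(out + first, r->buf, len - first);

    r->cons = cons + len;
    return (int)len;
}

/* Copy up to len bytes into the ring; same return convention. */
static int ring_write(struct ring *r, const char *in, uint32_t len)
{
    uint32_t cons = r->cons, prod = r->prod;
    uint32_t used = prod - cons;

    if (used > RING_SIZE)
        return -1;
    if (len > RING_SIZE - used)
        len = RING_SIZE - used;

    uint32_t first = RING_SIZE - MASK_IDX(prod);
    if (first > len)
        first = len;
    memcpy(r->buf + MASK_IDX(prod), in, first);
    memcpy(r->buf, in + first, len - first);

    r->prod = prod + len;
    return (int)len;
}
```

If the consumer overtakes the producer, prod - cons wraps to a huge unsigned value. A single comparison against the ring size therefore catches both error conditions.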

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: David Scott <dave@recoil.org>
Jim Fehlig [Fri, 13 Nov 2015 02:40:46 +0000 (19:40 -0700)]
libxl: relax readonly check introduced by XSA-142 fix

The fix for XSA-142 is quite a big hammer, rejecting readonly
disk configurations even when the requested backend is known to
support readonly. While it is true that qemu doesn't support
readonly for emulated IDE or AHCI disks:

$ /usr/lib/xen/bin/qemu-system-i386 \
 -drive file=/tmp/disk.raw,if=ide,media=disk,format=raw,readonly=on
qemu-system-i386: Can't use a read-only drive

$ /usr/lib/xen/bin/qemu-system-i386 -device ahci,id=ahci0 \
 -drive file=/tmp/disk.raw,if=none,id=ahcidisk-0,format=raw,readonly=on \
 -device ide-hd,bus=ahci0.0,unit=0,drive=ahcidisk-0
qemu-system-i386: -device ide-hd,bus=ahci0.0,unit=0,drive=ahcidisk-0:
Can't use a read-only drive

It does support readonly SCSI disks:

$ /usr/lib/xen/bin/qemu-system-i386 \
 -drive file=/tmp/disk.raw,if=scsi,media=disk,format=raw,readonly=on
[ok]

Inside a guest using such a disk, the SCSI kernel driver reports
write protection as enabled:

[   7.339232] sd 2:0:1:0: [sdb] Write Protect is on

Also, PV drivers support readonly, but the patch rejects such a
configuration even when PV drivers (vdev=xvd*) have been explicitly
specified and creation of an emulated twin is skipped.

This follow-up patch loosens the restriction: readonly is rejected only
when creating an emulated IDE or AHCI disk, and allowed when the
backend is known to support it.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Fri, 23 Oct 2015 13:05:01 +0000 (15:05 +0200)]
libxc: remove xc_get_bit_size() from tools/libxc/xc_dom_compat_linux.c

xc_get_bit_size() is used only by the unused Python wrapper
xc.getBitSize(). Remove the wrapper and xc_get_bit_size().

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Fri, 23 Oct 2015 13:05:00 +0000 (15:05 +0200)]
libxc: remove most of tools/libxc/xc_dom_compat_linux.c

In tools/libxc/xc_dom_compat_linux.c, xc_linux_build() is the only
domain building function that is really necessary, as it is used by an
in-tree component (qemu-xen).

Remove the other domain building functions and the unused Python
wrapper xc.linux_build(), which references one of the functions being
removed.

Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Thu, 12 Nov 2015 10:06:58 +0000 (10:06 +0000)]
Config.mk: update OVMF changeset

The new osstest-tested head contains a fix for the gcc-4.4 toolchain.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jonathan Davies [Wed, 11 Nov 2015 11:21:53 +0000 (11:21 +0000)]
oxenstored: Quota.merge: don't assume domain already exists

In Quota.merge, we merge two quota hashtables, orig_quota and mod_quota, putting
the results into dest_quota. These hashtables map domids to the number of
entries currently owned by that domain.

When mod_quota contains an entry for a domid that was not present in orig_quota
(or dest_quota), the call to get_entry caused Quota.merge to raise a Not_found
exception. This propagates back to the client as an ENOENT error, which is not
an appropriate return value from some operations, such as transaction_end.

This situation can arise when a transaction that introduces a domain (hence
calling Quota.add_entry) needs to be coalesced due to concurrent xenstore
activity.

This patch handles the merge in the case where mod_quota contains an entry not
present in orig_quota (or in dest_quota) by treating the missing entry as
having an existing value of 0.
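A C sketch of that merge rule, under stated assumptions. The hashtables are modeled as a hypothetical fixed-size array indexed by domid with a presence flag, whereas the real code is OCaml in oxenstored. Each domid touched by the transaction adds the difference between its modified and original counts to dest. A domid missing from orig or dest counts as 0 instead of raising an error.

```c
#include <stdbool.h>

#define MAX_DOMS 8  /* hypothetical small table for illustration */

/* Maps domid -> number of entries owned; 'present' marks known domids. */
struct quota {
    bool present[MAX_DOMS];
    int entries[MAX_DOMS];
};

/* A missing domid is treated as owning 0 entries (no Not_found). */
static int get_entry_or_zero(const struct quota *q, int domid)
{
    return q->present[domid] ? q->entries[domid] : 0;
}

static void set_entry(struct quota *q, int domid, int nb)
{
    q->present[domid] = true;
    q->entries[domid] = nb;
}

/* Apply the changes made by a transaction (orig -> mod) on top of dest. */
static void quota_merge(const struct quota *orig, const struct quota *mod,
                        struct quota *dest)
{
    for (int domid = 0; domid < MAX_DOMS; domid++) {
        if (!mod->present[domid])
            continue;
        int diff = mod->entries[domid] - get_entry_or_zero(orig, domid);
        if (diff != 0)
            set_entry(dest, domid, get_entry_or_zero(dest, domid) + diff);
    }
}
```

A domain introduced inside the transaction (present only in mod) now simply contributes its full count to dest.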

Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Oleksandr Tyshchenko [Thu, 5 Nov 2015 17:53:07 +0000 (19:53 +0200)]
xen/serial: Return actual bytes stored in TX FIFO for OMAP

This is intended to reduce the time the transmitter spends waiting
for free space in the TX FIFO and, as a result, to reduce the impact
of hvc on the entire system running on OMAP5/DRA7XX-based platforms.

Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Julien Grall <julien.grall@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Oleksandr Tyshchenko [Thu, 5 Nov 2015 17:53:06 +0000 (19:53 +0200)]
xen/serial: Move any OMAP specific things to OMAP UART driver

8250-uart.h contains extra serial register definitions for the
internal UARTs in TI OMAP SoCs, which are used only by the OMAP UART
driver. To clean up the code, move these definitions to omap-uart.c.
Also rename some definitions to use the UART_OMAP* prefix.

Signed-off-by: Oleksandr Tyshchenko <oleksandr.tyshchenko@globallogic.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Julien Grall <julien.grall@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Simon Rowe [Thu, 5 Nov 2015 11:39:05 +0000 (11:39 +0000)]
ocaml/xc: correct shutdown_reason enumeration

As defined by the Xen public header, the fifth value of
shutdown_reason is watchdog.
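For reference, here is a C sketch of the enumeration. The values are copied for illustration from the SHUTDOWN_* constants in Xen's public sched.h header. Because the values start at 0, "fifth" means value 4. Suppose the OCaml binding passes a constant constructor through as its integer representation, which OCaml assigns in declaration order. In that case the variant must list its constructors in exactly this order. shutdown_reason_name() is a hypothetical helper.

```c
#include <stddef.h>

/* Values copied for illustration from SHUTDOWN_* in Xen's public/sched.h. */
enum shutdown_reason {
    SHUTDOWN_poweroff = 0,
    SHUTDOWN_reboot   = 1,
    SHUTDOWN_suspend  = 2,
    SHUTDOWN_crash    = 3,
    SHUTDOWN_watchdog = 4,  /* the fifth value */
};

/* Hypothetical helper: map a reason code to a name, or NULL if unknown. */
static const char *shutdown_reason_name(unsigned int reason)
{
    static const char *const names[] = {
        [SHUTDOWN_poweroff] = "poweroff",
        [SHUTDOWN_reboot]   = "reboot",
        [SHUTDOWN_suspend]  = "suspend",
        [SHUTDOWN_crash]    = "crash",
        [SHUTDOWN_watchdog] = "watchdog",
    };

    return reason < sizeof(names) / sizeof(names[0]) ? names[reason] : NULL;
}
```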

Signed-off-by: Simon Rowe <simon.rowe@eu.citrix.com>
Acked-by: David Scott <dave@recoil.org>
Stefano Stabellini [Thu, 5 Nov 2015 12:47:26 +0000 (12:47 +0000)]
run QEMU as non-root

Try to use "xen-qemuuser-domid$domid" first, then
"xen-qemuuser-shared", and fall back to root if everything else fails.

The uids need to be manually created by the user or, more likely, by the
xen package maintainer.

Expose a device_model_user setting in libxl_domain_build_info, so that
opinionated callers, such as libvirt, can set any user they like. Do not
fall back to root if device_model_user is set. Users can also set
device_model_user by hand in the xl domain config file.
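The lookup order can be sketched in C with POSIX getpwnam(). pick_qemu_user() and its parameters are hypothetical, and this is not libxl's actual code. The configured argument stands in for device_model_user. An explicitly configured user is returned unchanged, so in that case there is no fallback to root.

```c
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper; 'configured' plays the role of device_model_user.
 * 'buf' receives the per-domain user name. */
static const char *pick_qemu_user(const char *configured, int domid,
                                  char *buf, size_t len)
{
    /* An explicit setting is used as-is and never falls back to root. */
    if (configured)
        return configured;

    /* 1) per-domain user */
    snprintf(buf, len, "xen-qemuuser-domid%d", domid);
    if (getpwnam(buf) != NULL)
        return buf;

    /* 2) shared user */
    if (getpwnam("xen-qemuuser-shared") != NULL)
        return "xen-qemuuser-shared";

    /* 3) last resort */
    return "root";
}
```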

QEMU will setuid and setgid to the user ID and group ID of the
specified user soon after initialization, before it starts handling
any guest I/O.

To actually secure QEMU when running in Dom0, we need at least to
deprivilege the privcmd and xenstore interfaces; this is just the first
step in that direction.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 5 Nov 2015 14:46:12 +0000 (14:46 +0000)]
tools: pygrub: if partition table is empty, try treating as a whole disk

pygrub (in identify_disk_image()) detects a DOS style partition table
via the presence of the 0xaa55 signature at the end of the first
sector of the disk.

However, this signature is also present in whole-disk configurations
when there is an MBR on the disk. Many filesystems (e.g. ext[234])
include leading padding in their on-disk format specifically to enable
this.

So if we think we have a DOS partition table but do not find any
actual partition table entries we may as well try looking at it as a
whole disk image. Worst case is we probe and find there isn't anything
there.
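A C sketch of that heuristic. It assumes the classic MBR layout: four 16-byte partition entries at offset 446, and boot signature bytes 0x55 0xaa at offsets 510 and 511. It treats an entry as empty when its partition type byte is zero. That emptiness test is an assumption of this sketch rather than pygrub's exact check, and pygrub itself is written in Python.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE     512
#define PART_TABLE_OFF  446  /* four 16-byte entries precede the signature */
#define PART_ENTRY_SIZE 16

/* True if the first sector carries the 0xaa55 boot signature
 * (stored little-endian: 0x55 then 0xaa). */
static bool has_mbr_signature(const uint8_t *sector)
{
    return sector[510] == 0x55 && sector[511] == 0xaa;
}

/* True if any partition entry has a non-zero type byte (offset 4). */
static bool has_partitions(const uint8_t *sector)
{
    for (int i = 0; i < 4; i++)
        if (sector[PART_TABLE_OFF + i * PART_ENTRY_SIZE + 4] != 0)
            return true;
    return false;
}

enum image_kind { IMAGE_WHOLE_DISK, IMAGE_PARTITIONED };

static enum image_kind identify(const uint8_t *sector)
{
    if (has_mbr_signature(sector) && has_partitions(sector))
        return IMAGE_PARTITIONED;
    /* Signature but an empty table (e.g. an MBR written over a filesystem's
     * leading padding): probe the image as a whole disk instead. */
    return IMAGE_WHOLE_DISK;
}
```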

This was reported by Sjors Gielen in Debian bug #745419. The fix was
inspired by a patch by Adi Kriegisch in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745419#27

Tested by genext2fs'ing my /boot into a new raw image (works) and
then:
   dd if=/usr/lib/grub/i386-pc/g2ldr.mbr of=img conv=notrunc bs=512 count=1

to add an MBR (with 0xaa55 signature) to it, which after this patch
also works.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: 745419-forwarded@bugs.debian.org
Ian Campbell [Wed, 11 Nov 2015 13:33:46 +0000 (13:33 +0000)]
tools: migration: Use PRIpfn when printing frame numbers.

This avoids various printf formatting warnings when building on arm32.

While touching the affected lines, make them consistently use %#.
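A small C sketch of the idea. pfn_t and this PRIpfn definition are hypothetical stand-ins. The frame number type is 64 bits wide everywhere, as xen_pfn_t is on arm32, where unsigned long is only 32 bits. The format macro is built on the standard PRIx64 from <inttypes.h>. Using the macro keeps the conversion specifier in sync with the type, and "%#" adds the 0x prefix.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins: a 64-bit frame number type and its matching
 * printf conversion, so the format is correct on 32- and 64-bit hosts. */
typedef uint64_t pfn_t;
#define PRIpfn PRIx64

/* Formats e.g. "pfn 0x1234"; returns the length snprintf() reports. */
static int format_pfn(char *buf, size_t len, pfn_t pfn)
{
    return snprintf(buf, len, "pfn %#" PRIpfn, pfn);
}
```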

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 13 Nov 2015 14:41:47 +0000 (15:41 +0100)]
ns16550: misc minor adjustments

First and foremost, fix the documentation: the use of "clock_hz" where
"base_baud" was meant cost me several hours (I suspected a more
complicated problem with the PCIe card I was trying to get working).
At the same time, correct the description of the "gdb" option, which
is more like "console" than "com<N>".

Next, fix the types of ns_{read,write}_reg(): in particular, the signed
return type of the former caused quite interesting effects when
determining the baud rate if "auto" was specified. In that same code,
also avoid dividing by zero when the baud rate was not in fact
previously set up.
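A C sketch of the "auto" calculation, assuming the divisor latch bytes (DLL/DLM) have already been read. baud_from_divisor() is a hypothetical helper. Reading the latch, which requires setting LCR.DLAB, is not shown. Taking the bytes as uint8_t avoids the sign-extension problem that a signed read return type can cause. A zero divisor is reported as 0 instead of being divided by.

```c
#include <stdint.h>

/* Hypothetical helper: recover the baud rate from a programmed divisor.
 * Unsigned bytes mean a value such as 0x80 can never become negative. */
static unsigned int baud_from_divisor(uint8_t dll, uint8_t dlm,
                                      unsigned int clock_hz)
{
    unsigned int divisor = dll | ((unsigned int)dlm << 8);

    /* No divisor programmed: report "unknown" instead of dividing by zero. */
    if (divisor == 0)
        return 0;

    /* A 16550 samples each bit 16 times: baud = clock / (16 * divisor). */
    return clock_hz / (divisor << 4);
}
```

With the common 1.8432 MHz clock, a divisor of 1 gives 115200 baud and a divisor of 12 gives 9600.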

Further, accept I/O port based serial PCI cards with a port range wider
than 8 bytes.

Finally, slightly rearrange struct ns16550 to reduce holes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 13 Nov 2015 14:39:57 +0000 (15:39 +0100)]
Revert "x86/IO-APIC: fix setup of Xen internally used IRQs"

This reverts commit 1126b40892ab56cb13c3cae5822bf3a18a689ffb,
as it breaks (at least) x2apic systems.

Jan Beulich [Thu, 12 Nov 2015 16:04:31 +0000 (17:04 +0100)]
x86/IO-APIC: make SET_DEST() easier to use

There has been quite a bit of redundancy between the various use sites.
Eliminate that. No change of generated code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 16:04:10 +0000 (17:04 +0100)]
x86/IO-APIC: fix setup of Xen internally used IRQs

..., namely that of a PCI serial card with an IRQ above the
legacy range. This was broken by the switch to cpumask_any() in
cpu_mask_to_apicid_phys(). Fix this by allowing all CPUs for that IRQ
(such that __setup_vector_irq() will properly update a booting CPU's
vector_irq[], avoiding "No irq handler for vector" messages and a
non-working interrupt).

While doing this I also noticed that io_apic_set_pci_routing() can't
be quite right: it sets up the destination _before_ getting a vector
allocated (which, on systems not using flat APIC mode, affects the
possible destinations), and it also doesn't restrict affinity to
->arch.cpu_mask (as established by assign_irq_vector()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 16:03:20 +0000 (17:03 +0100)]
x86/event: correct debug event generation

RIP is not a linear address, and hence should not on its own be subject
to GVA -> GFN translation. While at it, move all of the (perhaps
expensive) operations in the two functions into their main if()'s body,
and improve the error code passed to the translation function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 16:02:35 +0000 (17:02 +0100)]
x86: #PF error code adjustments

Add a definition for the (for now unused) protection-key-related error
code bit, moving our own custom ones out of the way. While checking
the uses of the latter, I realized that although right now they can
only get set on their own, callers had better not depend on that
property, and should check just for the bit rather than matching the
entire value.
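A C sketch of the difference between matching the whole value and testing the bit. The hardware bit positions follow the architectural #PF error code: present, write, user, reserved, instruction fetch, and protection key at bit 5. The names mimic Xen's PFEC_* style. The positions chosen here for the software-only paged/shared bits are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error-code bits. */
#define PFEC_page_present (1u << 0)
#define PFEC_write_access (1u << 1)
#define PFEC_user_mode    (1u << 2)
#define PFEC_reserved_bit (1u << 3)
#define PFEC_insn_fetch   (1u << 4)
#define PFEC_prot_key     (1u << 5)
/* Software-only bits, kept clear of the hardware ones (positions assumed). */
#define PFEC_page_paged   (1u << 16)
#define PFEC_page_shared  (1u << 17)

/* Fragile: only true when PFEC_page_paged is the sole bit set. */
static bool is_paged_exact(uint32_t pfec)
{
    return pfec == PFEC_page_paged;
}

/* Robust: true whenever the bit is set, regardless of other bits. */
static bool is_paged(uint32_t pfec)
{
    return (pfec & PFEC_page_paged) != 0;
}
```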

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 12 Nov 2015 16:01:53 +0000 (17:01 +0100)]
x86/traps: honor EXT bit in error codes

The specification does not explicitly limit the use of this bit to
exceptions that can have selector style error codes, so to be on the
safe side we should deal with it being set even on error codes formally
documented to be always zero (if they're indeed always zero, the change
is simply dead code in those cases).

Introduce and use (where suitable) X86_XEC_* constants to make the code
easier to read.
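A C sketch of decoding a selector-style error code. It assumes the architectural layout: bit 0 EXT, bit 1 IDT, bit 2 TI, and bits 3-15 the selector or vector index. The constant names imitate the X86_XEC_* prefix mentioned above, but the exact set and spelling used here are assumptions. xec_is_zero_ignoring_ext() shows the tolerant check for codes documented as always zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selector-style error code layout (names modeled on X86_XEC_*). */
#define X86_XEC_EXT       (1u << 0)  /* event external to the program */
#define X86_XEC_IDT       (1u << 1)  /* index refers to an IDT gate */
#define X86_XEC_TI        (1u << 2)  /* if !IDT: LDT (1) vs GDT (0) */
#define X86_XEC_IDX_SHIFT 3

struct xec {
    bool ext, idt, ti;
    unsigned int index;
};

static struct xec decode_xec(uint32_t ec)
{
    struct xec d = {
        .ext = ec & X86_XEC_EXT,
        .idt = ec & X86_XEC_IDT,
        .ti = ec & X86_XEC_TI,
        .index = (ec & 0xffffu) >> X86_XEC_IDX_SHIFT,
    };
    return d;
}

/* Treat an error code as "zero" even if the EXT bit happens to be set. */
static bool xec_is_zero_ignoring_ext(uint32_t ec)
{
    return (ec & ~X86_XEC_EXT) == 0;
}
```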

To match the placement of the "hardware_trap" label, the "hardware_gp"
one gets moved slightly too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 16:01:04 +0000 (17:01 +0100)]
x86/SVM: don't exceed segment limit when fetching instruction bytes

Also consistently use the vmcb local variable whenever possible.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 16:00:31 +0000 (17:00 +0100)]
x86/HVM: unify and fix #UD intercept

The SVM and VMX versions really were identical, so instead of fixing
the same issue in two places, fold them into one. The issue fixed is the
missing seg:off -> linear translation of the current code address.
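A minimal C sketch of that translation, using a hypothetical segment structure. With a 64-bit code segment, the CS base is treated as zero and RIP is already linear. Otherwise the linear address is the CS base plus the instruction pointer, truncated to 32 bits. Limit and permission checks are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached code-segment state. */
struct seg {
    uint64_t base;
};

/* Linear address of the instruction at cs:rip. With a 64-bit code segment
 * the CS base is treated as zero; otherwise base + offset wraps at 4GiB. */
static uint64_t insn_linear_addr(const struct seg *cs, uint64_t rip,
                                 bool cs_is_64bit)
{
    if (cs_is_64bit)
        return rip;
    return (uint32_t)(cs->base + rip);
}
```

For example, a real-mode reset vector at CS base 0xf0000 and IP 0xfff0 corresponds to linear address 0xffff0.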

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Nov 2015 15:59:18 +0000 (16:59 +0100)]
x86/HVM: don't inject #DB with error code

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Roger Pau Monné [Thu, 12 Nov 2015 15:58:07 +0000 (16:58 +0100)]
elfnotes: introduce a new PHYS_ENTRY elfnote

This new elfnote contains the 32-bit entry point into the kernel. Xen will
use this entry point to launch the guest kernel in 32-bit protected
mode with paging disabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Shannon Zhao [Tue, 10 Nov 2015 11:08:29 +0000 (12:08 +0100)]
efi: fix booting failure with UEFI on ARM

Commit 9fd08b4 (efi: split out efi_get_gop()) moved the code that
obtains the GOP pointer into efi_get_gop(), but it doesn't initialize
the variables handles and gop to NULL as the original code did. This
causes a boot failure on ARM, with the following log output:
Xen 4.7-unstable (c/s Tue Oct 13 14:40:28 2015 +0100 git:7a92036) EFI loader
Synchronous Exception at 0x00000000FECB021C

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Riku Voipio [Tue, 10 Nov 2015 11:07:55 +0000 (12:07 +0100)]
symbols.c: avoid warn_unused_result build failure on fgets()

In commit:

d37d63d symbols: prefix static symbols with their source file names

An unchecked fgets was added. This causes a compile error at least
on Ubuntu Utopic:

symbols.c: In function 'read_symbol':
symbols.c:181:3: error: ignoring return value of 'fgets', declared with
attribute warn_unused_result [-Werror=unused-result]
   fgets(str, 500, in); /* discard rest of line */
   ^

Paper over the warning by checking the return value in the if statement.
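A C sketch of that pattern, using a hypothetical helper named discard_rest_of_line(). Comparing fgets()'s result with NULL counts as using it, which satisfies warn_unused_result. A NULL return (end of file or error) simply means there was nothing left to discard. As with the original 500-byte buffer, a line longer than the buffer is only partly consumed.

```c
#include <stdio.h>

/* Hypothetical helper: discard the rest of the current input line.
 * Testing the return value silences warn_unused_result. */
static void discard_rest_of_line(FILE *in)
{
    char str[500];

    if (fgets(str, sizeof(str), in) == NULL)
        return;  /* EOF or error: nothing left to discard */
}
```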

Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:07:32 +0000 (12:07 +0100)]
x86: allow disabling the emulated VGA

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:07:03 +0000 (12:07 +0100)]
x86: allow disabling the emulated RTC

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:06:48 +0000 (12:06 +0100)]
x86: allow disabling power management

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:06:28 +0000 (12:06 +0100)]
x86: allow disabling the emulated PIT

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:06:09 +0000 (12:06 +0100)]
x86: allow disabling the emulated PIC

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:05:35 +0000 (12:05 +0100)]
x86: allow disabling the emulated IOMMU

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:05:18 +0000 (12:05 +0100)]
x86: allow disabling the emulated IO APIC

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:04:57 +0000 (12:04 +0100)]
x86: allow disabling the emulated HPET

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:04:04 +0000 (12:04 +0100)]
x86: add bitmap of enabled emulated devices

Introduce a bitmap in x86 xen_arch_domainconfig that allows enabling or
disabling specific devices emulated inside of Xen for HVM guests.
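A C sketch of how such a bitmap might be consulted. The flag names and bit positions, the struct, and the emulation_flags field name are all hypothetical illustrations. They cover the devices made optional in this series. Each device's setup path would test its own bit, and unknown bits would be rejected when the domain is created.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for the devices this series makes optional. */
#define EMU_HPET   (1u << 0)
#define EMU_PM     (1u << 1)
#define EMU_RTC    (1u << 2)
#define EMU_IOAPIC (1u << 3)
#define EMU_PIC    (1u << 4)
#define EMU_VGA    (1u << 5)
#define EMU_IOMMU  (1u << 6)
#define EMU_PIT    (1u << 7)
#define EMU_ALL    (EMU_HPET | EMU_PM | EMU_RTC | EMU_IOAPIC | \
                    EMU_PIC | EMU_VGA | EMU_IOMMU | EMU_PIT)

/* Hypothetical stand-in for the x86 xen_arch_domainconfig. */
struct arch_domainconfig {
    uint32_t emulation_flags;
};

/* A device's init/teardown path checks its bit before touching state. */
static bool has_emulated(const struct arch_domainconfig *cfg, uint32_t dev)
{
    return (cfg->emulation_flags & dev) != 0;
}

/* Reject bits that do not correspond to any known device. */
static bool emulation_flags_valid(uint32_t flags)
{
    return (flags & ~EMU_ALL) == 0;
}
```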

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 10 Nov 2015 11:03:08 +0000 (12:03 +0100)]
x86/HVM: always intercept #AC and #DB

Both are benign exceptions, and both can be triggered during exception
delivery, so intercepting them is required to prevent a guest from
locking up a CPU (since no other VM exits occur once it gets into
such a loop).

The specific scenarios:

1) #AC may be raised during exception delivery if the handler is set to
be a ring-3 one by a 32-bit guest, and the stack is misaligned.

This is CVE-2015-5307 / XSA-156.

Reported-by: Benjamin Serebrin <serebrin@google.com>

2) #DB may be raised during exception delivery when a breakpoint has been
placed on a data structure involved in delivering the exception. This
can result in an endless loop when a 64-bit guest uses a non-zero IST
for the vector 1 IDT entry, but even without use of IST the time it
takes until a contributory fault is raised (depending on the handler)
may be quite long.

This is CVE-2015-8104 / XSA-156.
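A C sketch of the resulting policy, expressed as an exception-intercept bitmap with one bit per vector, as in the VMX exception bitmap. The helper names are hypothetical. Vectors 1 (#DB) and 17 (#AC) are forced on regardless of what else is requested.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRAP_debug           1   /* #DB */
#define TRAP_alignment_check 17  /* #AC */

/* Intercepted unconditionally: both are benign and can be raised during
 * exception delivery, so without a VM exit a guest could spin forever. */
#define ALWAYS_INTERCEPTED ((1u << TRAP_debug) | (1u << TRAP_alignment_check))

/* Hypothetical helper: final exception bitmap from the optional bits. */
static uint32_t exception_bitmap(uint32_t requested)
{
    return requested | ALWAYS_INTERCEPTED;
}

static bool is_intercepted(uint32_t bitmap, unsigned int vector)
{
    return vector < 32 && ((bitmap >> vector) & 1u);
}
```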

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Fri, 6 Nov 2015 14:17:00 +0000 (15:17 +0100)]
x86/hvm: make sure stdvga cache cannot be re-enabled

As soon as the cache is disabled, it will become out-of-sync with the
VGA device model, and since no mechanism exists to acquire the current
VRAM state from the device model, re-enabling it leads to the guest
seeing stale data.

The problem was introduced by commit 3bbaaec0 ("x86/hvm: unify stdvga
mmio intercept with standard mmio intercept") and can be seen by
deliberately crashing a Windows guest; the BSOD output is corrupted.

This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a
tri-state enum and only allows the state to move from 'uninitialized' to
'enabled'. Once the cache state becomes 'disabled' it will remain so for
the lifetime of the VM.
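A C sketch of that state machine, with hypothetical enum and function names. Enabling succeeds only from the uninitialized state, or is a no-op when the cache is already enabled. Disabling is final.

```c
#include <stdbool.h>

/* Hypothetical tri-state replacing a plain 'cache' boolean. */
enum stdvga_cache_state {
    STDVGA_CACHE_UNINITIALIZED,
    STDVGA_CACHE_ENABLED,
    STDVGA_CACHE_DISABLED,
};

/* Returns true if the cache is (now) enabled. Only the transition
 * UNINITIALIZED -> ENABLED is allowed; a disabled cache stays disabled,
 * because its contents may no longer match the device model's VRAM. */
static bool stdvga_try_enable_cache(enum stdvga_cache_state *s)
{
    if (*s == STDVGA_CACHE_UNINITIALIZED)
        *s = STDVGA_CACHE_ENABLED;
    return *s == STDVGA_CACHE_ENABLED;
}

/* Disabling is permanent for the lifetime of the VM. */
static void stdvga_disable_cache(enum stdvga_cache_state *s)
{
    *s = STDVGA_CACHE_DISABLED;
}
```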

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Fri, 6 Nov 2015 14:16:38 +0000 (15:16 +0100)]
sched: fix locking of remove_vcpu() in credit1

In fact, csched_vcpu_remove() (i.e., the credit1
implementation of remove_vcpu()) manipulates runqueues,
so holding the runqueue lock is necessary.

However, the vCPU simply cannot be on the runqueue when
the function is called. We can therefore ASSERT() that,
and avoid doing any runqueue manipulations (rather than
adding the runqueue locking around it).

Also, while there: *_lock_irq() (for the private lock) is
enough; there is no need for *_lock_irqsave().

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 6 Nov 2015 14:15:32 +0000 (15:15 +0100)]
cpufreq: allow ordinary boolean options to be passed on the command line

I was quite surprised to find "cpufreq=off" not doing what one would
expect it to do. Fix this.
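A C sketch of the behavior. It assumes a boolean parser in the usual Xen command-line style, but the exact set of accepted spellings here is an assumption. parse_bool(), cpufreq_enabled, and setup_cpufreq_option() are hypothetical names. A plain boolean such as "off" toggles cpufreq. Any other value would fall through to the existing option parsing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical boolean parser: 1 for a "true" word, 0 for a "false" word,
 * -1 if the string is not a boolean. Spellings are assumed. */
static int parse_bool(const char *s)
{
    static const char *const no[]  = { "no", "off", "false", "disable", "0" };
    static const char *const yes[] = { "yes", "on", "true", "enable", "1" };

    for (size_t i = 0; i < sizeof(no) / sizeof(no[0]); i++) {
        if (strcmp(s, no[i]) == 0)
            return 0;
        if (strcmp(s, yes[i]) == 0)
            return 1;
    }
    return -1;
}

/* Hypothetical global toggled by the option. */
static int cpufreq_enabled = 1;

/* Hypothetical handler for "cpufreq=<value>". */
static void setup_cpufreq_option(const char *str)
{
    int b = parse_bool(str);

    if (b >= 0) {
        cpufreq_enabled = b;  /* "off", "no", "0", ... do the obvious thing */
        return;
    }
    /* Non-boolean values keep their existing handling (not shown). */
}
```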

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 4 Nov 2015 16:47:17 +0000 (17:47 +0100)]
x86: cleanup of early cpuid handling

Use register names for variables, rather than their content for leaf 1.
Reduce the number of cpuid instructions issued.  Also drop some trailing
whitespace.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Harmandeep Kaur [Wed, 4 Nov 2015 16:46:46 +0000 (17:46 +0100)]
credit: remove cpu argument to __runq_insert()

__runq_insert() takes two arguments, cpu and svc. However,
the cpu argument is redundant because we can get all the
information we need about cpu from svc.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Bob Liu [Wed, 4 Nov 2015 16:46:24 +0000 (17:46 +0100)]
blkif: document blkif multi-queue/ring extension

Document the multi-queue/ring feature in terms of XenStore keys to be written
by the backend and by the frontend.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ross Lagerwall [Mon, 2 Nov 2015 11:17:38 +0000 (11:17 +0000)]
xenconsoled: Remove unexpected daemonize behavior

Previously, xenconsoled's daemonize function would do nothing if its
parent process was init (as it is under systemd but not under sysv init).
This is confusing. Instead, always daemonize when asked to, but use the
"interactive" switch when running from the systemd service.

Because a pidfile is only written when daemonizing, drop the pidfile
parameters from the service file (systemd keeps track of the pids
anyway).

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Wed, 4 Nov 2015 10:48:24 +0000 (11:48 +0100)]
xl: log an error if libxl_cpupool_destroy() fails

Right now, a failure to destroy a cpupool is not reported
to the user in any explicit way.

Log an error, as is customary for xl in these cases.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Wed, 4 Nov 2015 12:03:31 +0000 (13:03 +0100)]
xl: avoid (another) uninitialised use of rc in vcpuset()

Rearrange the check of the new number of vCPUs against the
number of host pCPUs so that it does not use rc for internal
error reporting, because:
 - rc was at risk of being used uninitialised;
 - rc should only be used for holding libxl error codes.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 4 Nov 2015 11:32:57 +0000 (11:32 +0000)]
xl: initialise rc before using it in vcpuset

In 5b725e56 (xl: improve return and exit codes of vcpu related
functions), the return value of libxl_cpu_bitmap_alloc was no longer
stored in rc, yet the subsequent fprintf still used it.

Reinstate the original implementation, that is, store the return value
of libxl_cpu_bitmap_alloc in rc before using rc.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 4 Nov 2015 09:38:51 +0000 (09:38 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

Andrew Cooper [Tue, 3 Nov 2015 17:15:58 +0000 (18:15 +0100)]
x86: query for paddr_bits in early_cpu_detect()

It is __read_mostly, so repeatedly writing to it is suboptimal. As the
MTRRs have already been set up, nothing good will come from its value
changing across CPUs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 3 Nov 2015 17:15:39 +0000 (18:15 +0100)]
x86: correct {a,m}perf check in generic_identify()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 3 Nov 2015 17:15:15 +0000 (18:15 +0100)]
x86/vmx: replace unqualified ud2 instructions with BUG frames

Using new _ASM_BUGFRAME* internals.

A side effect of complicating the ASM statements is that GCC now chooses to
out-of-line the stub functions, resulting in identical copies being present in
all translation units.  As with the stac()/clac() stubs, force them always
inline.

No functional change, other than the failure cases, which now produce a
far clearer error message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 3 Nov 2015 17:14:49 +0000 (18:14 +0100)]
x86/bug: break out the internals of BUG_FRAME()

This allows bug frames to be created inside existing asm() statements. To
do so, the current bugframe positional parameters are changed to named
parameters, avoiding interactions with the parameters of the existing
asm() statement.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Nov 2015 17:14:27 +0000 (18:14 +0100)]
x86: replace unqualified ud2 instructions with BUG frames

No functional change, other than the failure cases, which now produce a
far clearer error message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Nov 2015 17:14:02 +0000 (18:14 +0100)]
x86/vmx: improvements to vmentry failure handling

Combine the almost identical vm_launch_fail() and vm_resume_fail() into a
single vmx_vmentry_failure().

Re-save all GPRs so that domain_crash() prints the real register values,
rather than the stack frame of the vmx_vmentry_failure() call.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 3 Nov 2015 17:11:56 +0000 (18:11 +0100)]
x86/HAP: use %pv printk() format where suitable

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 3 Nov 2015 17:11:15 +0000 (18:11 +0100)]
timer-op: demote a debugging message to really be debugging only

The issue the message points out may have been of relevance during the
early days, but shouldn't anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agocompat: enforce distinguishable file names in symbol table
Jan Beulich [Tue, 3 Nov 2015 17:07:20 +0000 (18:07 +0100)]
compat: enforce distinguishable file names in symbol table

To make it possible to tell apart the static symbols in files built a
second time for compat guest support, arrange for their source file
names to be prefixed by a suitable path. We can't do this without
explicit .file directives, since gcc has always been stripping paths
from file names handed to the internally generated .file directive.
However, we can leverage __FILE__ if we make sure the second instance
gets compiled from a directory other than the one the wrapper sits in.

Where suitable, remove the long-redundant explicit inclusions of
xen/config.h at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agosymbols: prefix static symbols with their source file names
Jan Beulich [Tue, 3 Nov 2015 17:05:35 +0000 (18:05 +0100)]
symbols: prefix static symbols with their source file names

This requires adjustments to the tool generating the symbol table, as
well as to its own and nm's invocation.

Note: Not warning about duplicate symbols in the EFI case for now, as
a binutils bug causes misnamed file name entries to appear in EFI
binaries' symbol tables when the file name is longer than 18 chars.
(Not doing so also avoids other duplicates getting printed twice.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxl: improve return and exit codes of parse related functions
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:24 +0000 (07:56 +0530)]
xl: improve return and exit codes of parse related functions

Turn parsing-related functions' exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
        - for main_*: arbitrary -> EXIT_SUCCESS|EXIT_FAILURE.
        - for internal functions: arbitrary -> 0/1.

Don't touch parse_config_data() which is big enough to deserve its own patch.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: improve return and exit codes of parse related functions
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:23 +0000 (07:56 +0530)]
xl: improve return and exit codes of parse related functions

Turn cpupool-related functions' exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: improve return and exit codes of vcpu related functions
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:22 +0000 (07:56 +0530)]
xl: improve return and exit codes of vcpu related functions

Turn vcpu manipulation functions' exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: improve return and exit codes of scheduling related functions
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:21 +0000 (07:56 +0530)]
xl: improve return and exit codes of scheduling related functions

Turn scheduling-related functions' exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
        - for main_*: arbitrary -> EXIT_SUCCESS|EXIT_FAILURE.
        - for internal functions: arbitrary -> 0/1.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: convert main() exit codes to EXIT_[SUCCESS|FAILURE]
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:20 +0000 (07:56 +0530)]
xl: convert main() exit codes to EXIT_[SUCCESS|FAILURE]

Turn main() function exit codes towards using the EXIT_[SUCCESS|FAILURE]
constants, instead of arbitrary numbers or libxl return codes.

Also includes a documentation comment in xl.h stating that the xl process
should always return EXIT_FOO, and that main_* functions can be treated
like main(), i.e. as returning a process exit status rather than a
function return value.

Signed-off-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
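
As an illustration of the convention described above, here is a minimal
sketch. parse_vcpu_count() and main_example() are hypothetical names made
up for this example, not functions from xl: the internal helper returns
0/1, while the main_*-style function returns a process exit status.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical internal helper: 0 on success, 1 on failure
 * ("internal functions: arbitrary -> 0/1"). */
int parse_vcpu_count(const char *s, int *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    /* Reject empty input, trailing junk, and out-of-range values
     * (the 4096 cap is arbitrary, for the sketch only). */
    if (end == s || *end != '\0' || v <= 0 || v > 4096)
        return 1;
    *out = (int)v;
    return 0;
}

/* Hypothetical main_*-style entry point: treated like main(), so it
 * returns EXIT_SUCCESS/EXIT_FAILURE, never a libxl error code. */
int main_example(int argc, char **argv)
{
    int n;

    if (argc < 2) {
        fprintf(stderr, "usage: example <vcpus>\n");
        return EXIT_FAILURE;
    }
    if (parse_vcpu_count(argv[1], &n)) {
        fprintf(stderr, "invalid vCPU count '%s'\n", argv[1]);
        return EXIT_FAILURE;
    }
    printf("vcpus = %d\n", n);
    return EXIT_SUCCESS;
}
```

Because main_example() already returns an exit status, a dispatcher can
hand its result straight to exit() without any translation.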
9 years agoMAINTAINERS: adding myself as co-maintainer of vTPM
Quan Xu [Sat, 10 Oct 2015 16:26:07 +0000 (00:26 +0800)]
MAINTAINERS: adding myself as co-maintainer of vTPM

Signed-off-by: Quan Xu <quan.xu@intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agotools/hotplug: xendomains.service conflicts with libvirt
Olaf Hering [Thu, 29 Oct 2015 11:02:54 +0000 (11:02 +0000)]
tools/hotplug: xendomains.service conflicts with libvirt

xendomains will manage guests behind libvirt's back:
- libvirt starts a guest
- that guest can be "managed" by libvirt and xl at the same time
- when xendomains runs on shutdown, it will save the guest using xl;
  libvirt does not know about this
- when xendomains runs on boot, it will restore the saved guest using xl;
  libvirt does not know about this and will simply fail to manage the
  restored guest

To prevent xendomains from interfering with libvirt, add a Conflicts=
entry to xendomains.service. It will cause libvirt to be stopped if
xendomains is started manually with 'systemctl start'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: domain_build: Avoid to shadow the variable "mod" in write_properties
Julien Grall [Tue, 27 Oct 2015 15:39:14 +0000 (15:39 +0000)]
xen/arm: domain_build: Avoid to shadow the variable "mod" in write_properties

The variable "mod" is defined twice with different values. This makes
the code confusing to read.

Rename both "mod" variables to something more meaningful.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
--

Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agotools: create XEN_DUMP_DIR with mode 0700
Wei Liu [Wed, 21 Oct 2015 14:15:56 +0000 (15:15 +0100)]
tools: create XEN_DUMP_DIR with mode 0700

That directory is used to store guest memory dumps, which contain
sensitive information.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxl: Die on unknown options
Ian Jackson [Fri, 23 Oct 2015 15:44:11 +0000 (16:44 +0100)]
xl: Die on unknown options

def_getopt would print a message to stderr, but blunder on anyway.

Sadly this is probably not a backport candidate.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
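
A minimal sketch of the intended behaviour, using plain POSIX getopt().
parse_opts() and its "-f"/"-p" options are hypothetical; this is not
xl's actual def_getopt(). When getopt() meets an unknown option it has
already printed a diagnostic to stderr, so the caller simply fails
instead of carrying on.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical option parser for an xl-style subcommand accepting
 * "-f" and "-p".  Returns EXIT_SUCCESS or EXIT_FAILURE. */
int parse_opts(int argc, char **argv, int *force, int *do_pause)
{
    int opt;

    *force = *do_pause = 0;
    optind = 1;                      /* allow parsing more than once */
    while ((opt = getopt(argc, argv, "fp")) != -1) {
        switch (opt) {
        case 'f':
            *force = 1;
            break;
        case 'p':
            *do_pause = 1;
            break;
        default:                     /* '?': unknown option, already reported */
            return EXIT_FAILURE;     /* die rather than blunder on */
        }
    }
    return EXIT_SUCCESS;
}
```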
9 years agox86/mm: return -ESRCH for an invalid foreign domid
Andrew Cooper [Mon, 2 Nov 2015 14:34:01 +0000 (15:34 +0100)]
x86/mm: return -ESRCH for an invalid foreign domid

For consistency with all other invalid domid handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/PoD: Make p2m_pod_empty_cache() restartable
Andrew Cooper [Mon, 2 Nov 2015 14:33:38 +0000 (15:33 +0100)]
x86/PoD: Make p2m_pod_empty_cache() restartable

This avoids a long running operation when destroying a domain with a
large PoD cache.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agocredit1: on vCPU wakeup, kick away current only if makes sense
Dario Faggioli [Mon, 2 Nov 2015 14:33:19 +0000 (15:33 +0100)]
credit1: on vCPU wakeup, kick away current only if makes sense

In fact, when waking up a vCPU, __runq_tickle() is called
to allow the new vCPU to run on a pCPU (which one depends
on the relationship between the priority of the new vCPU
and those of the vCPUs that are already running).

If there is no idle processor on which the new vCPU can
run (e.g., because of pinning/affinity), we try to migrate
away the vCPU that is currently running on the new vCPU's
processor (i.e., the processor on which the vCPU is waking
up).

Now, trying to migrate a vCPU has the effect of pushing it
through a

 running --> offline --> runnable

transition, which, in turn, has the following negative
effects:

 1) Credit1 counts that as a wakeup, and it BOOSTs the
    vCPU, even if it is a CPU-bound one, which wouldn't
    normally have deserved boosting. This can prevent
    legitimately IO-bound vCPUs from getting hold of the
    processor until such spurious boosting expires, hurting
    performance!

 2) since the vCPU fails the vcpu_runnable() test
    (within the call to csched_schedule() that follows
    the wakeup, as a consequence of tickling), the
    scheduling rate-limiting mechanism is also fooled,
    i.e., the context switch happens even if less than
    the minimum execution time has passed.

In particular, 1) has been reported to cause the following
issue:

 * VM-I/O: 1-vCPU pinned to a pCPU, running netperf
 * VM-CPU: 1-vCPU pinned to the same pCPU, running a busy
           CPU loop
 ==> Only VM-I/O: throughput is 806.64 Mbps
 ==> VM-I/O + VM-CPU: throughput is 166.50 Mbps

This patch solves (for the above scenario) the problem
by checking whether or not it makes sense to try to
migrate away the vCPU currently running on the processor.
In fact, if there aren't idle processors where such a vCPU
can execute, attempting the migration is just futile
(harmful, actually!).

With this patch, in the above configuration, results are:

 ==> Only VM-I/O: throughput is 807.18 Mbps
 ==> VM-I/O + VM-CPU: throughput is 731.66 Mbps

Reported-by: Kun Suo <ksuo@uccs.edu>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Kun Suo <ksuo@uccs.edu>
Acked-by: George Dunlap <george.dunlap@citrix.com>
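
The check described above boils down to a bitmask test. This is a
simplified sketch, not credit1's code: pCPUs are modelled as bits of a
64-bit mask (so at most 64 of them), and cpu_bitmap_t and
worth_migrating_current() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per pCPU, at most 64 pCPUs. */
typedef uint64_t cpu_bitmap_t;

/*
 * Return true only if the vCPU currently running on this_cpu could
 * actually run elsewhere right now: some other pCPU must be both idle
 * and within its affinity.  Otherwise, kicking it away would only push
 * it through running -> offline -> runnable for nothing.
 */
bool worth_migrating_current(cpu_bitmap_t idle, cpu_bitmap_t cur_affinity,
                             unsigned int this_cpu)
{
    /* Exclude this_cpu itself (assumes this_cpu < 64). */
    cpu_bitmap_t others = ~((cpu_bitmap_t)1 << this_cpu);

    return (idle & cur_affinity & others) != 0;
}
```

In the reported scenario both VMs are pinned to the same pCPU, so the
intersection is empty and the running vCPU is left alone.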
9 years agox86: make compat_iret() domain crash cases distinguishable
Jan Beulich [Mon, 2 Nov 2015 14:32:48 +0000 (15:32 +0100)]
x86: make compat_iret() domain crash cases distinguishable

Rather than issuing a (mostly) useless separate message, rely on
domain_crash() providing enough data, and leverage the line number
information it prints.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
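
The idea of relying on the call site, rather than a separate message
per crash path, can be sketched as follows. It assumes, as the message
above states, that domain_crash() prints the caller's line number;
sketch_domain_crash(), report_crash() and last_crash_line are
hypothetical names for illustration only.

```c
#include <stdio.h>

/* Hypothetical: line of the most recent report, kept for inspection. */
int last_crash_line;

/* Hypothetical reporting function standing in for the real crash path. */
void report_crash(int domid, const char *file, int line)
{
    last_crash_line = line;
    fprintf(stderr, "domain_crash called from %s:%d (d%d)\n",
            file, line, domid);
}

/* Each expansion captures its own __FILE__/__LINE__, so two crash
 * paths in the same function are told apart by line number alone. */
#define sketch_domain_crash(domid) report_crash((domid), __FILE__, __LINE__)
```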
9 years agolibxlu: avoid linker warnings
Jan Beulich [Mon, 2 Nov 2015 14:28:33 +0000 (15:28 +0100)]
libxlu: avoid linker warnings

Recent ld warns about libxenlight.so's dependency libraries not being
available, which can be easily avoided by not just passing the raw
library name on ld's command line.

In the course of checking how things fit together (I originally
suspected the warning came from the linking of xl), I also noticed a
stray L in SHLIB_libxenguest, which is removed at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agodrop get_xen_guest_handle()
Jan Beulich [Mon, 2 Nov 2015 14:26:40 +0000 (15:26 +0100)]
drop get_xen_guest_handle()

Its use in the tools (and its apparent abuse in the hypervisor) are
long gone.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: adjust PoD target by memory fudge, too
Ian Jackson [Wed, 21 Oct 2015 15:18:30 +0000 (16:18 +0100)]
libxl: adjust PoD target by memory fudge, too

PoD guests need to balloon at least as far as required by PoD, or risk
crashing.  Currently they don't necessarily know what the right value
is, because our memory accounting is (at the very least) confusing.

Apply the memory limit fudge factor to the in-hypervisor PoD memory
target, too.  This will increase the size of the guest's PoD cache by
the fudge factor LIBXL_MAXMEM_CONSTANT (currently 1Mby).  This ensures
that even with a slightly-off balloon driver, the guest will be
stable even under memory pressure.

There are two call sites of xc_domain_set_pod_target that need fixing:

The one in libxl_set_memory_target is straightforward.

The one in xc_hvm_build_x86.c:setup_guest is more awkward.  Simply
setting the PoD target differently does not work because the various
amounts of memory during domain construction no longer match up.
Instead, we adjust the guest memory target in xenstore (but only for
PoD guests).

This introduces a 1Mby discrepancy between the balloon target of a PoD
guest at boot, and the target set by an apparently-equivalent `xl
mem-set' (or similar) later.  This approach is low-risk for a security
fix but we need to fix this up properly in xen.git#staging and
probably also in stable trees.

This is XSA-153.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 56fb5fd62320eb40a7517206f9706aa9188d6f7b)
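
To make the arithmetic concrete, the sketch below assumes that memory
targets are expressed in KiB, so the 1Mby fudge factor is 1024 KiB, and
that pages are 4 KiB. SKETCH_MAXMEM_CONSTANT_KB and pod_target_pages()
are hypothetical names; the actual fix goes through
xc_domain_set_pod_target() and xenstore as described above.

```c
#include <stdint.h>

/* Assumption: the fudge factor from the message above, 1 MiB in KiB. */
#define SKETCH_MAXMEM_CONSTANT_KB 1024u

/* Hypothetical helper: pad a PoD target (in KiB) by the fudge factor
 * and express the result in 4 KiB pages. */
uint64_t pod_target_pages(uint64_t target_kb)
{
    uint64_t padded_kb = target_kb + SKETCH_MAXMEM_CONSTANT_KB;

    return padded_kb / 4;   /* 4 KiB per page */
}
```

For a 1 GiB (1048576 KiB) target this yields 262400 pages rather than
262144, i.e. 256 extra pages in the guest's PoD cache.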

9 years agox86: rate-limit logging in do_xen{oprof,pmu}_op()
Jan Beulich [Thu, 29 Oct 2015 12:37:19 +0000 (13:37 +0100)]
x86: rate-limit logging in do_xen{oprof,pmu}_op()

Some of the sub-ops are accessible to all guests, and hence should be
rate-limited. In the xenoprof case, just like for XSA-146, include them
only in debug builds. Since the vPMU code is rather new, allow them to
be always present, but downgrade them to (rate limited) guest messages.

This is CVE-2015-7971 / XSA-152.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
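
A minimal fixed-window limiter illustrates what "rate-limited" means
for guest-triggerable messages. This is a generic sketch under stated
assumptions (caller-supplied monotonic millisecond timestamps, one
struct per message source); it is not Xen's printk rate limiting, and
struct ratelimit and ratelimit_allow() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most "burst" messages per window. */
struct ratelimit {
    uint64_t window_ms;     /* length of each window */
    unsigned int burst;     /* messages allowed per window */
    uint64_t window_start;  /* start of the current window */
    unsigned int count;     /* messages emitted in this window */
};

/* Returns true if a message may be logged at time now_ms.
 * Assumes now_ms never goes backwards. */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    if (now_ms - rl->window_start >= rl->window_ms) {
        rl->window_start = now_ms;   /* start a new window */
        rl->count = 0;
    }
    if (rl->count < rl->burst) {
        rl->count++;
        return true;
    }
    return false;   /* drop: this source is logging too fast */
}
```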
9 years agoxenoprof: free domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 12:36:52 +0000 (13:36 +0100)]
xenoprof: free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-151.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/PoD: Eager sweep for zeroed pages
Andrew Cooper [Thu, 29 Oct 2015 12:36:25 +0000 (13:36 +0100)]
x86/PoD: Eager sweep for zeroed pages

Depending on the contents of a guest's physical address space,
p2m_pod_emergency_sweep() could degrade into a linear memcmp() from 0 to
max_gfn, which runs non-preemptibly.

As p2m_pod_emergency_sweep() runs behind the scenes in a number of contexts,
making it preemptible is not feasible.

Instead, a different approach is taken.  Recently-populated pages are eagerly
checked for reclamation, which amortises the p2m_pod_emergency_sweep()
operation across each p2m_pod_demand_populate() operation.

Note that in the case that a 2M superpage can't be reclaimed as a superpage,
it is shattered if 4K pages of zeros can be reclaimed.  This is unfortunate
but matches the previous behaviour, and is required to avoid regressions
(domain crash from PoD exhaustion) with VMs configured close to the limit.

This is CVE-2015-7970 / XSA-150.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
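
The amortisation idea can be modelled in a few dozen lines. This is a
toy sketch, not Xen's implementation: guest memory is a small array of
4 KiB pages, "reclaiming" just clears a populated flag, superpages are
ignored, and struct pod_model, HISTORY_MAX and pod_populate() are
hypothetical. Each populate checks only the one page it is about to
evict from a short history ring, so no single operation scans the whole
address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ      4096
#define MAX_GFN      64   /* hypothetical tiny guest */
#define HISTORY_MAX  4    /* hypothetical; small for illustration */

struct pod_model {
    unsigned char (*pages)[PAGE_SZ]; /* MAX_GFN pages of guest memory */
    bool populated[MAX_GFN];
    size_t recent[HISTORY_MAX];      /* ring of recently populated gfns */
    size_t recent_idx;               /* next slot to overwrite */
    size_t nr_recent;                /* valid entries in the ring */
    unsigned int reclaimed;
};

static bool page_is_zero(const unsigned char *p)
{
    for (size_t i = 0; i < PAGE_SZ; i++)
        if (p[i])
            return false;
    return true;
}

/* Demand-populate gfn (< MAX_GFN).  Before recording it, check the
 * oldest entry it will overwrite and reclaim that page if the guest
 * left it all zeros. */
void pod_populate(struct pod_model *m, size_t gfn)
{
    size_t slot = m->recent_idx;

    if (m->nr_recent == HISTORY_MAX) {
        size_t old = m->recent[slot];

        if (m->populated[old] && page_is_zero(m->pages[old])) {
            m->populated[old] = false;   /* back to PoD: page reclaimed */
            m->reclaimed++;
        }
    } else {
        m->nr_recent++;
    }
    m->populated[gfn] = true;
    memset(m->pages[gfn], 0, PAGE_SZ);   /* populated pages start zeroed */
    m->recent[slot] = gfn;
    m->recent_idx = (slot + 1) % HISTORY_MAX;
}
```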
9 years agofree domain's vcpu array
Jan Beulich [Thu, 29 Oct 2015 12:35:40 +0000 (13:35 +0100)]
free domain's vcpu array

This was overlooked in fb442e2171 ("x86_64: allow more vCPU-s per
guest").

This is CVE-2015-7969 / XSA-149.

Reported-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86: guard against undue super page PTE creation
Jan Beulich [Thu, 29 Oct 2015 12:35:07 +0000 (13:35 +0100)]
x86: guard against undue super page PTE creation

When optional super page support got added (commit bd1cd81d64 "x86: PV
support for hugepages"), two adjustments were missed: mod_l2_entry()
needs to consider the PSE and RW bits when deciding whether to use the
fast path, and the PSE bit must not be removed from L2_DISALLOW_MASK
unconditionally.

This is CVE-2015-7835 / XSA-148.

Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
9 years agoarm: handle races between relinquish_memory and free_domheap_pages
Ian Campbell [Thu, 29 Oct 2015 12:34:17 +0000 (13:34 +0100)]
arm: handle races between relinquish_memory and free_domheap_pages

Primarily this means XENMEM_decrease_reservation from a toolstack
domain.

Unlike x86, we have no requirement right now to queue such pages onto
a separate list: if we hit this race, then the other code has already
fully accepted responsibility for freeing this page, and therefore
there is nothing more for relinquish_memory to do.

This is CVE-2015-7814 / XSA-147.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agoarm: rate-limit logging from unimplemented PHYSDEVOP and HVMOP.
Ian Campbell [Thu, 29 Oct 2015 12:33:38 +0000 (13:33 +0100)]
arm: rate-limit logging from unimplemented PHYSDEVOP and HVMOP.

These are guest accessible and should therefore be rate-limited.
Moreover, include them only in debug builds.

This is CVE-2015-7813 / XSA-146.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agoarm: Support hypercall_create_continuation for multicall
Julien Grall [Thu, 29 Oct 2015 12:31:10 +0000 (13:31 +0100)]
arm: Support hypercall_create_continuation for multicall

Multicall for ARM has been supported since commit f0dbdc6 "xen: arm: fully
implement multicall interface.". However, if a hypercall in a multicall
requires preemption, it will crash the host:

(XEN) Xen BUG at domain.c:347
(XEN) ----[ Xen-4.7-unstable  arm64  debug=y  Tainted:    C ]----
[...]
(XEN) Xen call trace:
(XEN)    [<00000000002420cc>] hypercall_create_continuation+0x64/0x380 (PC)
(XEN)    [<0000000000217274>] do_memory_op+0x1b00/0x2334 (LR)
(XEN)    [<0000000000250d2c>] do_multicall_call+0x114/0x124
(XEN)    [<0000000000217ff0>] do_multicall+0x17c/0x23c
(XEN)    [<000000000024f97c>] do_trap_hypercall+0x90/0x12c
(XEN)    [<0000000000251ca8>] do_trap_hypervisor+0xd2c/0x1ba4
(XEN)    [<00000000002582cc>] guest_sync+0x88/0xb8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) Xen BUG at domain.c:347
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

Looking at the code, the multicall support looks valid to me, as we only
need to fill call.args[...]. So drop the BUG().

This is CVE-2015-7812 / XSA-145.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agosched-rt: avoid to shadow the variable "svc" in rt_dom_cntl
Julien Grall [Thu, 29 Oct 2015 11:24:13 +0000 (12:24 +0100)]
sched-rt: avoid to shadow the variable "svc" in rt_dom_cntl

The variable "svc" is declared twice within rt_dom_cntl. However, the
top declaration can be reused, avoiding re-declaring the variable.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
9 years agocredit2: avoid to shadow the variable "cur" in runq_tickle
Julien Grall [Thu, 29 Oct 2015 11:23:53 +0000 (12:23 +0100)]
credit2: avoid to shadow the variable "cur" in runq_tickle

The variable "cur" is declared twice within runq_tickle. However, the
top declaration can be reused, avoiding re-declaring the variable.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agocommon/memory: avoid to shadow the variable "d" in do_memory_op
Julien Grall [Thu, 29 Oct 2015 11:23:34 +0000 (12:23 +0100)]
common/memory: avoid to shadow the variable "d" in do_memory_op

The variable "d" is declared multiple times within do_memory_op.

The subsequent declarations are not useful because the top one is never
used. So drop them.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
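
None of these shadowing cleanups fixes a functional bug; the
hypothetical example below merely shows the kind of hazard that makes
the pattern confusing, and which GCC's -Wshadow flags. sum_shadowed()
and sum_fixed() are made up for illustration.

```c
/* Shadowed version: the inner declaration hides the outer "total",
 * so the accumulation never reaches the value that is returned. */
int sum_shadowed(const int *v, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        int total = 0;      /* shadows the outer "total" */
        total += v[i];
        (void)total;
    }
    return total;           /* still 0 */
}

/* Fixed version: reuse the top-level declaration instead. */
int sum_fixed(const int *v, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```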
9 years agogrant_table: avoid to shadow "frame" in __gnttab_map_grant_ref
Julien Grall [Thu, 29 Oct 2015 11:20:38 +0000 (12:20 +0100)]
grant_table: avoid to shadow "frame" in __gnttab_map_grant_ref

The variable "frame" is declared twice within the function
__gnttab_map_grant_ref.  This makes the code quite confusing to read.

The second definition is not useful as the first one is never used
until then. So drop it.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agocommon/domain: avoid to shadow the variable "d" in do_vcpu_op
Julien Grall [Thu, 29 Oct 2015 11:19:23 +0000 (12:19 +0100)]
common/domain: avoid to shadow the variable "d" in do_vcpu_op

The variable "d" is defined twice. However, the second one is not
necessary as the vCPU has already been deduced from the first "d".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agotools/python: Further pruning of the defunct xl bindings
Andrew Cooper [Wed, 28 Oct 2015 15:55:43 +0000 (15:55 +0000)]
tools/python: Further pruning of the defunct xl bindings

No need to generate xen/lowlevel/xl/_pyxl_types.{h,c}, following c/s
598e97f "tools/python: remove broken xl binding"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/mm: don't call HVM-only function for PV guests
Jan Beulich [Tue, 27 Oct 2015 15:34:29 +0000 (16:34 +0100)]
x86/mm: don't call HVM-only function for PV guests

Somehow I managed to drop the HVM dependency from v2 to v3 of what
became commit 5c23c760a8 ("x86/HVM: correct page dirty marking in
hvm_map_guest_frame_rw()"), obviously breaking migration of PV guests.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agomm: improve message in populate physmap when the domain is direct mapped
Julien Grall [Tue, 27 Oct 2015 13:47:33 +0000 (14:47 +0100)]
mm: improve message in populate physmap when the domain is direct mapped

The current domain and the domain pointed to by the variable "d" are not
the same.

However, when it's not possible to get a reference on the page, the
target domain ID is not printed. This makes the message difficult to
understand.

Improve the message by printing the target domain ID.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agomm: unmap page for direct mapped domain on decrease reservation
Julien Grall [Tue, 27 Oct 2015 13:47:01 +0000 (14:47 +0100)]
mm: unmap page for direct mapped domain on decrease reservation

A direct-mapped domain needs to retrieve the exact same underlying
physical page when the region is re-populated.

Currently, when the memory reservation for this domain is decreased, the
request is simply ignored and the page stays mapped in the P2M. However,
this makes it harder to spot issues when the domain has not yet mapped a
foreign page but is trying to access the region.

What we really care about for a direct-mapped domain is not giving the
page back to the allocator, so that the direct mapping can be restored
when the guest memory region is re-populated.

The rest of the process to remove a page can be done safely. This
also keeps us close to the normal domain memory handling.

At the same time, drop the trailing whitespace around the modified
code.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/PV: don't zero-map LDT
Jan Beulich [Tue, 27 Oct 2015 13:46:12 +0000 (14:46 +0100)]
x86/PV: don't zero-map LDT

This effectively reverts the LDT related part of commit cf6d39f819
("x86/PV: properly populate descriptor tables"), which broke demand
paged LDT handling in guests.

Reported-by: David Vrabel <david.vrabel@citrix.com>
Diagnosed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/mm: only a single instance of gw_page_flags[] is needed
Jan Beulich [Tue, 27 Oct 2015 10:46:35 +0000 (11:46 +0100)]
x86/mm: only a single instance of gw_page_flags[] is needed

None of its elements depends on GUEST_PAGING_LEVELS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/mm: build map_domain_gfn() just once
Jan Beulich [Tue, 27 Oct 2015 10:46:05 +0000 (11:46 +0100)]
x86/mm: build map_domain_gfn() just once

It doesn't depend on GUEST_PAGING_LEVELS. Moving the function to p2m.c
at once allows a bogus #define/#include pair to be removed from
hap/nested_ept.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/mm: override stored file names for multiply built sources
Jan Beulich [Tue, 27 Oct 2015 10:44:52 +0000 (11:44 +0100)]
x86/mm: override stored file names for multiply built sources

To make it possible to tell apart the static symbols therein, use their
object file names instead of their source ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agouse clear_domain_page() instead of open coding it
Jan Beulich [Tue, 27 Oct 2015 10:44:20 +0000 (11:44 +0100)]
use clear_domain_page() instead of open coding it

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/xsaves: add basic definitions/helpers to support xsaves
Shuai Ruan [Tue, 27 Oct 2015 10:42:57 +0000 (11:42 +0100)]
x86/xsaves: add basic definitions/helpers to support xsaves

This patch adds basic definitions/helpers which will be used in
later patches.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
Jan Beulich [Tue, 27 Oct 2015 10:42:04 +0000 (11:42 +0100)]
x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()

Rather than dirtying a page when establishing a (permanent) mapping,
dirty it when the page gets unmapped, or - if still mapped - on the
final iteration of a save operation (or in other cases where the guest
is paused or already shut down). (Transient mappings continue to get
dirtied upon getting mapped, to avoid the overhead of tracking.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
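
The deferred dirtying policy can be modelled as below. This is a toy
sketch, not the hvm_map_guest_frame_rw() code: the log-dirty bitmap is
a plain bool array for a tiny guest, and struct dirty_model, map_rw(),
unmap_rw() and final_iteration() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GFN 16   /* hypothetical small guest */

struct dirty_model {
    bool dirty[MAX_GFN];        /* log-dirty bitmap */
    bool perm_mapped[MAX_GFN];  /* pages with a permanent writable mapping */
};

void map_rw(struct dirty_model *m, size_t gfn, bool permanent)
{
    if (permanent)
        m->perm_mapped[gfn] = true;  /* defer dirtying until unmap */
    else
        m->dirty[gfn] = true;        /* transient: dirty immediately */
}

void unmap_rw(struct dirty_model *m, size_t gfn)
{
    if (m->perm_mapped[gfn]) {
        m->perm_mapped[gfn] = false;
        m->dirty[gfn] = true;        /* may have been written while mapped */
    }
}

/* Final save iteration (guest paused): anything still permanently
 * mapped may have been written, so mark it dirty now. */
void final_iteration(struct dirty_model *m)
{
    for (size_t g = 0; g < MAX_GFN; g++)
        if (m->perm_mapped[g])
            m->dirty[g] = true;
}
```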
9 years agox86: remove assumptions about the layout of x86_capabilities
Andrew Cooper [Mon, 26 Oct 2015 13:02:30 +0000 (14:02 +0100)]
x86: remove assumptions about the layout of x86_capabilities

Future work will rearrange it, invalidating these assumptions.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>