Tim Deegan [Mon, 30 Sep 2013 12:18:25 +0000 (14:18 +0200)]
x86/mm/shadow: Fix initialization of PV shadow L4 tables.
Shadowed PV L4 tables must have the same Xen mappings as their
unshadowed equivalent. This is done by copying the Xen entries
verbatim from the idle pagetable, and then using guest_l4_slot()
in the SHADOW_FOREACH_L4E() iterator to avoid touching those entries.
adc5afbf1c70ef55c260fb93e4b8ce5ccb918706 (x86: support up to 16Tb)
changed the definition of ROOT_PAGETABLE_XEN_SLOTS to extend right to
the top of the address space, which causes the shadow code to
copy Xen mappings into guest-kernel-address slots too.
In the common case, all those slots are zero in the idle pagetable,
and no harm is done. But if any slot above #271 is non-zero, Xen will
crash when that slot is later cleared (it attempts to drop
shadow-pagetable refcounts on its own L4 pagetables).
Fix by using the new ROOT_PAGETABLE_PV_XEN_SLOTS when appropriate.
Monitor pagetables need the full Xen mappings, so they keep using the
old name (with its new semantics).
This is CVE-2013-4356 / XSA-64.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ignoring them generally implies using uninitialized data and, in all
but two of the cases dealt with here, potentially leaking hypervisor
stack contents to guests.
This is CVE-2013-4355 / XSA-63.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The current logic does not handle the case when HPET special->handle
is invalid in IVRS. On such system, the following message is shown:
(XEN) AMD-Vi: Failed to setup HPET MSI remapping: Wrong HPET
This patch will allow the ivrs_hpet[<handle>]=<sbdf> to override the
IVRS. Also, it removes struct hpet_sbdf.iommu since it is not
used anywhere in the code.
Ian Campbell [Thu, 29 Aug 2013 15:25:00 +0000 (16:25 +0100)]
xen: arm: rewrite start of day page table and cpu bring up
This is unfortunately a rather large monolithic patch.
Rather than bringing up all CPUs in lockstep as we setup paging and relocate
Xen instead create a simplified set of dedicated boot time pagetables.
This allows secondary CPUs to remain powered down or in the firmware until we
actually want to enable them. The bringup is now done later on in C and can be
driven by DT etc. I have included code for the vexpress platform, but other
platforms will need to be added.
The mechanism for deciding how to bring up a CPU differs between arm32 and
arm64. On arm32 it is essentially a per-platform property, with the exception
of PSCI which can be implemented globally (but isn't here). On arm64 there is a
per-cpu property in the device tree.
Secondary CPUs are brought up directly into the relocated Xen image, instead of
relying on being able to launch on the unrelocated Xen and hoping that it
hasn't been clobbered.
As part of this change drop support for switching from secure mode to NS HYP as
well as the early CPU kick. Xen now requires that it is launched in NS HYP
mode and that firmware configure things such that secondary CPUs can be woken
up by a primarly CPU in HYP mode. This may require fixes to bootloaders or the
use of a boot wrapper.
The changes done here (re)exposed an issue with relocating Xen and the compiler
spilling values to the stack between the copy and the actual switch to the
relocaed copy of Xen in setup_pagetables. Therefore switch to doing the copy
and switch in a single asm function where we can control precisely what gets
spilled to the stack etc.
Since we now have a separate set of boot pagetables it is much easier to build
the real Xen pagetables inplace before relocating rather than the more complex
approach of rewriting the pagetables in the relocated copy before switching.
This will also enable Xen to be loaded above the 4GB boundary on 64-bit.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Julien Grall <julien.grall@linaro.org>
x86/microcode: Check whether the microcode is correct
We do the microcode code update in two steps - the presmp:
'microcode_presmp_init' and when CPUs are brought up: 'microcode_init'.
The earlier performs the microcode update on the BSP - but
unfortunately it does not check whether the update failed. Which means
that we might try later to update a incorrect payload on the rest of
CPUs.
This patch handles this odd situation.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
x86/microcode: Scan the initramfs payload for microcode blob
The Linux kernel is able to update the microcode during early bootup
via inspection of the initramfs blob to see if there is an cpio image
with certain microcode files. Linux is able to function with two (or
more) cpio archives in the initrd b/c it unpacks all of the cpio
archives.
The format of the early initramfs is nicely documented in Linux's
Documentation/x86/early-microcode.txt:
Early load microcode
====================
By Fenghua Yu <fenghua.yu@intel.com>
Kernel can update microcode in early phase of boot time. Loading microcode early
can fix CPU issues before they are observed during kernel boot time.
Microcode is stored in an initrd file. The microcode is read from the initrd
file and loaded to CPUs during boot time.
The format of the combined initrd image is microcode in cpio format followed by
the initrd image (maybe compressed). Kernel parses the combined initrd image
during boot time. The microcode file in cpio name space is:
kernel/x86/microcode/GenuineIntel.bin
During BSP boot (before SMP starts), if the kernel finds the microcode file in
the initrd file, it parses the microcode and saves matching microcode in memory.
If matching microcode is found, it will be uploaded in BSP and later on in all
APs.
The cached microcode patch is applied when CPUs resume from a sleep state.
There are two legacy user space interfaces to load microcode, either through
/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
in sysfs.
In addition to these two legacy methods, the early loading method described
here is the third method with which microcode can be uploaded to a system's
CPUs.
The following example script shows how to generate a new combined initrd file in
/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
original initrd image /boot/initrd-3.5.0.img.
As such this code inspects the initrd to see if the microcode
signatures are present and if so updates the hypervisor.
The option to turn this scan on/off is gated by the 'ucode'
parameter. The options are now:
'scan' Scan for the microcode in any multiboot payload.
<index> Attempt to load microcode blob (not the cpio archive
format) from the multiboot payload number.
This option alters slightly the 'ucode' parameter by only allowing
either parameter:
ucode=[<index>|scan]
Implementation wise the ucode_blob is defined as __initdata.
That is OK from the viewpoint of suspend/resume as the the underlaying
architecture microcode (microcode_intel or microcode_amd) end up saving
the blob in 'struct ucode_cpu_info' which is a per-cpu data
structure (see ucode_cpu_info). They end up saving it when doing the
pre-SMP (for CPU0) and SMP (for the rest) microcode loading.
Naturally if one does a hypercall to update the microcode and it is
newer, then the old per-cpu data is replaced.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Fri, 27 Sep 2013 08:15:28 +0000 (10:15 +0200)]
hvmloader/smbios: Change strncpy to memcpy for anchor strings
Coverity complains about the use of strncpy() to completely fill the anchor
strings, resulting in an unterminated string.
Although the strncpy result is correct, the anchor strings are not strings in
the C sense, and use of memcpy is the prevaling style elsewhere in hvmloader
anyway.
While tidying up the style in this function, also remove some trailing
whitespace and gratuitous cast.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
AMD IOMMU: fix Dom0 device setup failure for host bridges
The host bridge device (i.e. 0x18 for AMD) does not require IOMMU, and
therefore is not included in the IVRS. The current logic tries to map
all PCI devices to an IOMMU. In this case, "xl dmesg" shows the
following message on AMD sytem.
(XEN) setup 0000:00:18.0 for d0 failed (-19)
(XEN) setup 0000:00:18.1 for d0 failed (-19)
(XEN) setup 0000:00:18.2 for d0 failed (-19)
(XEN) setup 0000:00:18.3 for d0 failed (-19)
(XEN) setup 0000:00:18.4 for d0 failed (-19)
(XEN) setup 0000:00:18.5 for d0 failed (-19)
This patch adds a new device type (i.e. DEV_TYPE_PCI_HOST_BRIDGE) which
corresponds to PCI class code 0x06 and sub-class 0x00. Then, it uses
this new type to filter when trying to map device to IOMMU.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reported-by: Stefan Bader <stefan.bader@canonical.com>
On VT-d refuse (un)mapping host bridges for other than the hardware
domain.
Coding style cleanup.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Ian Campbell [Thu, 26 Sep 2013 11:35:42 +0000 (12:35 +0100)]
xen: support RAM at addresses 0 and 4096
Currently the mapping from pages to zones causes the page at zero to go into
zone -1 and the page at 4096 to go into zone 0, which is the Xen zone
(confusing various assertions).
Arrange instead for the mapping to be such that zone 0 is always reserved for
Xen and all other pages map to a zone >= 1.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Cc: jbeulich@suse.com Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Thu, 26 Sep 2013 11:35:39 +0000 (12:35 +0100)]
xen/arm: Support dtb /memreserve/ regions
This requires a mapping of the DTB during setup_mm. Previously this was in
the BOOT_MISC slot, which is clobbered by setup_pagetables. Split it out
into its own slot which can be preserved.
Also handle these regions as part of consider_modules() and when adding pages
to the heaps to ensure we do not locate any part of Xen or the heaps over
them.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Thu, 26 Sep 2013 11:35:37 +0000 (12:35 +0100)]
xen/arm: do not relocate Xen outside of visible RAM
Since we do not handle non-contiguous banks of memory lets avoid relocating
Xen into such a bank. Avoids issues such as free_init_memory releasing pages
which are outside of the frametable.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Thu, 26 Sep 2013 11:35:34 +0000 (12:35 +0100)]
xen/arm: ensure the xenheap is 32MB aligned
My patch 08693f5948d8 "xen: arm: reduce the size of the xen heap to max 1/8
RAM size" unintentionally violated the constraint that the xenheap must be
32MB aligned, since we only explicitly align the end of the heap and
xenheap_pages was not a multiple of 32 pages.
Round xenheap pages up to a 32MB boundary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Wed, 25 Sep 2013 11:21:51 +0000 (12:21 +0100)]
xen: arm: use new 64-bit zImage magic numbers for Xen binary
Upstream commit 4370eec05a88 "arm64: Expand arm64 image header" ended up
changing the zImage magic (which was actually the initial branch instructio
encoding!). The new header has a proper magic number at a fixed location.
Switch Xen itself to using this format. Neither the bootwrapper nor the
models care about this header themselves and real bootloaders are not widely
used, so now is as good a time as any to switch (as upstream have proven)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Campbell [Wed, 25 Sep 2013 11:21:35 +0000 (12:21 +0100)]
xen: arm: handle new 64-bit zImage magic numbers
Upstream commit 4370eec05a88 "arm64: Expand arm64 image header" ended up
changing the zImage magic (which was actually the initial branch instruction
encoding!). The new header has a proper magic number at a fixed location. Support that as well as the original magic.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
xen/arm: Use the hardware ID to boot correctly secondary cpus
Secondary CPUs will spin in head.S until their MPIDR[23:0] correspond to
the smp_up_cpu. Actually Xen will set the value with the logical CPU ID
which is wrong. Use the cpu_logical_map to get the correct CPU ID.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Introduce cpu_logical_map to associate a logical CPU ID to an hardware CPU ID.
This map will be filled during Xen boot via the device tree. Each CPU node
contains a "reg" property which contains the hardware ID (ie MPIDR[0:23]).
Also move /cpus parsing later so we can use the dt_* API.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
The GIC mapping of CPU interfaces does not necessarily match the logical
CPU numbering.
When Xen wants to send an SGI to specific CPU, it needs to use the GIC CPU
ID. It can be retrieved from ITARGETSR0, in fact when this field is read,
the GIC will return a value that corresponds only to the processor reading
the register. So Xen can use the PPI 0 to initialize the mapping.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
When Xen initialize the GIC distributor, we need to route all the IRQs to
the boot CPU. The CPU ID can differ between Xen and the GIC.
When ITARGETSR0 is read, each field will return a value that corresponds
only to the processor reading the register. So Xen can use the PPI 0 to
initialize correctly the routing.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 26 Sep 2013 08:23:39 +0000 (10:23 +0200)]
x86: fix compat guest handling of XENPF_enter_acpi_sleep
Rather than blindly defining the native name to the compat one, when
we want to pass the compat structure to a native function we ought to
verify that their layouts match. With a respective xlat.lst entry
there's then also no need anymore to do such aliasing.
While cleaaning up that file I also noticed that the Cx and Px
interface handling here has quite a few unnecessary #define-s - delete
them.
Daniel De Graaf [Thu, 26 Sep 2013 08:15:47 +0000 (10:15 +0200)]
fix DOMID_IO mapping permission checks (try 2)
When the permission checks for memory mapping were moved from
get_pg_owner to xsm_mmu_update in aaba7a677, the exception for DOMID_IO
was not taken into account. This will cause IO memory mappings by PV
domains (mini-os in particular) to fail when XSM/FLASK is not being
used. This patch reintroduces the exception for DOMID_IO; the actual
restrictions on IO memory mappings have always been checked separately
using iomem_access_permitted, so this change should not break existing
access control.
Reported-by: Eduardo Peixoto Macedo <epm@cin.ufpe.br> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Andrew Cooper [Thu, 26 Sep 2013 08:14:51 +0000 (10:14 +0200)]
x86/crash: Indicate how well nmi_shootdown_cpus() managed to do
Having nmi_shootdown_cpus() report which pcpus failed to be shot down is a
useful debugging hint as to what possibly went wrong (especially when the
crash logs seem to indicate that an NMI timeout occurred while waiting for one
of the problematic pcpus to perform an action).
This is achieved by swapping an atomic_t count of unreported pcpus with a
cpumask. In the case that the 1 second timeout occurs, use the cpumask to
identify the problematic pcpus.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 26 Sep 2013 08:11:00 +0000 (10:11 +0200)]
x86: fix rdrand asm()
Just learned the hard way that at least for non-volatile asm()s gcc
indeed does what the documentation says: It may move it across jumps
(i.e. ahead of the cpu_has() check). While the documentation claims
that this can also happen for volatile asm()s, if that was the case
we'd have many more problems in our code (and e,g, Linux would too).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Matthew Daley [Wed, 18 Sep 2013 03:37:40 +0000 (15:37 +1200)]
libxl: fix libxl_string_list_length and its only caller
The wrong amount of indirections were being taken in
libxl_string_list_length, and its only caller was miscounting the amount
of initial non-list arguments, seemingly since the initial commit
(599c784).
This has been seen and reported in the wild (##xen):
< Trixboxer> Hi, any idea why would I get
< Trixboxer> xl: libxl_bootloader.c:42: bootloader_arg: Assertion `bl->nargs < bl->argsspace' failed.
< Trixboxer> 4.2.2-23.el6
Coverity-ID: 1054954 Signed-off-by: Matthew Daley <mattjd@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Wed, 25 Sep 2013 08:48:20 +0000 (10:48 +0200)]
fix DOMID_IO mapping permission checks
When the permission checks for memory mapping were moved from
get_pg_owner to xsm_mmu_update in aaba7a677, the exception for DOMID_IO
was not taken into account. This will cause IO memory mappings by PV
domains (mini-os in particular) to fail when XSM/FLASK is not being
used. This patch reintroduces the exception for DOMID_IO; the actual
restrictions on IO memory mappings have always been checked separately
using iomem_access_permitted, so this change should not break existing
access control.
Reported-by: Eduardo Peixoto Macedo <epm@cin.ufpe.br> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Wed, 25 Sep 2013 08:41:25 +0000 (10:41 +0200)]
x86/xsave: initialize extended register state when guests enable it
Till now, when setting previously unset bits in XCR0 we wouldn't touch
the active register state, thus leaving in the newly enabled registers
whatever a prior user of it left there, i.e. potentially leaking
information between guests.
This is CVE-2013-1442 / XSA-62.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 23 Sep 2013 15:37:00 +0000 (17:37 +0200)]
VMX: also use proper instruction mnemonic for VMREAD
... when assembler supports it, following commit cfd54835 ("VMX: use
proper instruction mnemonics if assembler supports them"). This merely
got split off from the earlier change becase of the significant number
of call sites needing to be changed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Jan Beulich [Mon, 23 Sep 2013 07:55:14 +0000 (09:55 +0200)]
x86/HVM: refuse doing string operations in certain situations
We shouldn't do any acceleration for
- "rep movs" when either side is passed through MMIO or when both sides
are handled by qemu
- "rep ins" and "rep outs" when the memory operand is any kind of MMIO
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 23 Sep 2013 07:53:55 +0000 (09:53 +0200)]
x86/HVM: linear address must be canonical for the whole accessed range
... rather than just for the first byte.
While at it, also
- make the real mode case at least dpo a wrap around check
- drop the mis-named "gpf" label (we're not generating faults here)
and use in-place returns instead
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 23 Sep 2013 07:52:29 +0000 (09:52 +0200)]
x86_emulate: fix wrap around handling for repeated string instructions
For one, repeat count clipping for MOVS must be done taking into
consideration both source and destination addresses.
And then we should allow a wrap on the final iteration only if either
the wrap is a precise one (i.e. the access itself doesn't wrap, just
the resulting index register value would) or if there is just one
iteration. In all other cases we should do a bulk operation first
without hitting the wrap, and then issue an individual iteration. If
we don't do it that way,
- the last iteration not completing successfully will cause the whole
operation to fail (i.e. registers not get updated to the failure
point)
- hvmemul_virtual_to_linear() may needlessly enforce non-repeated
operation
Additionally with the prior implementation there was a case
(df=1, ea=~0, reps=~0, bytes_per_rep=1) where we'd end up passing zero
reps back to the caller, yet various places assume that there's at
least on iteration.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Fri, 20 Sep 2013 16:18:35 +0000 (17:18 +0100)]
ns16550: support DesignWare 8250
This hardware has an additional feature which signals an error if you try to
write LCR while the UART is busy. We need to clear this error during setup,
otherwise LCR.DLAB doesn't get set and we cannot read/write the divisor.
This has been tested on the cubieboard2
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Cc: jbeulich@suse.com
Ian Campbell [Fri, 20 Sep 2013 16:18:34 +0000 (17:18 +0100)]
ns16550: make usable on ARM
There are several aspects to this:
- Correctly conditionalise use of PCI
- Correctly conditionalise use of IO ports
- Add discovery via device tree
- Support different registers shift/stride and widths
- Add vuart hooks.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Frser <keir@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: PranavkumarSawargaonkar<pranavkumar@linaro.org>
Ian Campbell [Mon, 9 Sep 2013 16:45:51 +0000 (17:45 +0100)]
xen/arm: replace io{read,write}{l,b} with {read,write}{l,b}
We appear to have invented the io versions ourselves for Xen on ARM, while x86
has the plain read/write. (and so does Linux FWIW)
read/write are used in common driver code (specifically ns16550) so instead of
keeping our own variant around lets replace it with the more standard ones.
At the same time resync with Linux making the "based on" comment in both sets of
io.h somewhat true (they don't look to have been very based on before...). Our
io.h is now consistent with Linux v3.11.
Note that iowrite and write take their arguments in the opposite order.
Also make asm-arm/io.h useful and include it where necessary instead of picking
up the include from mm.h. Remove the include from mm.h
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 17 Sep 2013 15:28:12 +0000 (16:28 +0100)]
xen: arm: rework placement of fdt in initial dom0 memory map
The 32-bit Linux kernel uses its lowmem direct mapping to access the FDT. The
lowmem mapping is around 0.75GiB but varies depending on the kernel's .config.
Our current scheme of loading the FDT as high as 4GB therefore fails with
larger amounts of dom0 RAM.
The upstream documentation has recently been update to provide more guidance
<http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7824/1>. In
accordance with this load the kernel just below 128MiB (aligned to 2MB) and
the FDT just above, or if there is less RAM available then as high as
possible.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 17 Sep 2013 15:56:28 +0000 (16:56 +0100)]
xen: arm: improve VMID allocation.
The VMID field is 8 bits. Rather than allowing only up to 256 VMs per host
reboot before things start "acting strange" instead maintain a simple bitmap
of used VMIDs and allocate them statically to guests upon creation.
This limits us to 256 concurrent VMs which is a reasonable improvement.
Eventually we will want a proper scheme to allocate VMIDs on context switch.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Olaf Hering [Fri, 20 Sep 2013 09:41:08 +0000 (11:41 +0200)]
unmodified_drivers: enable unplug per default
Since xen-3.3 an official unplug protocol for emulated hardware is
available in the toolstack. The pvops kernel does the unplug per
default, so it is safe to do it also in the drivers for forward ported
xenlinux.
Currently its required to load xen-platform-pci with the module
parameter dev_unplug=all, which is cumbersome.
Also recognize the dev_unplug=never parameter, which provides the
default before this patch.
sched_credit: filter node-affinity mask against online cpus
in _csched_cpu_pick(), as not doing so may result in the domain's
node-affinity mask (as retrieved by csched_balance_cpumask() )
and online mask (as retrieved by cpupool_scheduler_cpumask() )
having an empty intersection.
Therefore, when attempting a node-affinity load balancing step
and running this:
...
/* Pick an online CPU from the proper affinity mask */
csched_balance_cpumask(vc, balance_step, &cpus);
cpumask_and(&cpus, &cpus, online);
...
we end up with an empty cpumask (in cpus). At this point, in
the following code:
....
/* If present, prefer vc's current processor */
cpu = cpumask_test_cpu(vc->processor, &cpus)
? vc->processor
: cpumask_cycle(vc->processor, &cpus);
....
an ASSERT (from inside cpumask_cycle() ) triggers like this:
It is for example sufficient to have a domain with node-affinity
to NUMA node 1 running, and issueing a `xl cpupool-numa-split'
would make the above happen. That is because, by default, all
the existing domains remain assigned to the first cpupool, and
it now (after the cpupool-numa-split) only includes NUMA node 0.
This change prevents that by generalizing the function used
for figuring out whether a node-affinity load balancing step
is legit or not. This way we can, in _csched_cpu_pick(),
figure out early enough that the mask would end up empty,
skip the step all together and avoid the splat.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Fri, 20 Sep 2013 09:05:28 +0000 (11:05 +0200)]
x86_emulate: fold wide reads
With HVM's MMIO operand handling now being capable of splitting large
reads, there's no need to issue at most machine word size reads when
we really need wider operands.
Not that this is not done everywhere - there are a couple of cases
where keeping the reads separate is more natural (and folding them
would complicate the code rather than simplifying it).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 20 Sep 2013 09:04:52 +0000 (11:04 +0200)]
x86_emulate: fix flag setting for 8-bit signed multiplication
We really need to check for a signed overflow of 8 bits, while the
previous check compared the sign-extended 8-bit result with the
zero-extended 16-bit one (which was wrong for all negative results).
Once at it
- also adjust the 16-bit comparison for symmetry
- improve the 8-bit multiplication (no need to zero-extend to 32-bits
the sign-extended to 16 bits original 8-bit value)
- fold both signed multiplication variants
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 20 Sep 2013 09:03:53 +0000 (11:03 +0200)]
x86_emulate: PUSH <mem> must read source operand just once
... for the case of accessing MMIO.
Rather than doing the early operand type adjustment for just for that
case, do it for all of the 0xF6, 0xF7, and 0xFF groups (allowing some
other code to be dropped instead).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 20 Sep 2013 09:03:12 +0000 (11:03 +0200)]
x86_emulate: MOVSXD must read source operand just once
... for the case of accessing MMIO.
Also streamline the ARPL emulation a little, and add tests for both
instructions (the MOVSXD one requires a few other adjustments, as we
now need to run in a mode where the emulator's mode_64bit() returns
true).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 20 Sep 2013 09:01:08 +0000 (11:01 +0200)]
x86/HVM: properly handle MMIO reads and writes wider than a machine word
Just like real hardware we ought to split such accesses transparently
to the caller. With little extra effort we can at once even handle page
crossing accesses correctly.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 19 Sep 2013 14:38:09 +0000 (15:38 +0100)]
x86: mark BUG()s and assertion failures as terminal.
This helps avoid static analysis false-positives, and might lead to
better code density as the compiler knows it doesn't have to restore
spilled state &c.
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
George Dunlap [Wed, 18 Sep 2013 12:45:24 +0000 (14:45 +0200)]
x86/HVM: fix failure path in hvm_vcpu_initialise
It looks like one of the failure cases in hvm_vcpu_initialise jumps to
the wrong label; this could lead to slow leaks if something isn't
cleaned up properly.
I will probably change these labels in a future patch, but I figured
it was better to have this fix separately.
This is also a candidate for backport.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>