Jan Beulich [Thu, 18 Jul 2013 11:32:12 +0000 (13:32 +0200)]
VT-d: enable for multi-vector MSI
The main change being to make alloc_remap_entry() capable of allocating
a block of entries.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Jan Beulich [Thu, 18 Jul 2013 08:05:14 +0000 (10:05 +0200)]
x86: fix cache flushing condition in map_pages_to_xen()
This fixes yet another shortcoming of the function (exposed by 8bfaa2c2
["x86: add locking to map_pages_to_xen()"]'s adjustment to
msix_put_fixmap()): It must not flush caches when transitioning to a
non-present mapping. Doing so causes the CLFLUSH to fault, if used in
favor of WBINVD.
To help code readability, factor out the whole flush flags updating
in map_pages_to_xen() into a helper macro.
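The flush decision described above can be sketched as a small helper. This is only an illustration: the function name, the flag names and the bit values are assumptions made for the example, not the macro the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; the real definitions live in Xen's headers. */
#define PTE_PRESENT      (1u << 0)
#define PTE_PWT          (1u << 3)
#define PTE_PCD          (1u << 4)
#define PTE_PAT          (1u << 7)
#define PTE_GLOBAL       (1u << 8)
#define PTE_CACHE_ATTRS  (PTE_PWT | PTE_PCD | PTE_PAT)

#define FLUSH_TLB        (1u << 0)
#define FLUSH_TLB_GLOBAL (1u << 1)
#define FLUSH_CACHE      (1u << 2)

/*
 * Hypothetical helper: work out which flushes are needed when a present
 * entry with flags old_flags is replaced by one with flags new_flags.
 */
unsigned int update_flush_flags(uint32_t old_flags, uint32_t new_flags)
{
    unsigned int flush = FLUSH_TLB;

    assert(old_flags & PTE_PRESENT);

    if (old_flags & PTE_GLOBAL)
        flush |= FLUSH_TLB_GLOBAL;

    /*
     * Flush caches only when the new mapping is present: a CLFLUSH
     * through a non-present mapping would fault.
     */
    if ((new_flags & PTE_PRESENT) &&
        ((old_flags ^ new_flags) & PTE_CACHE_ATTRS))
        flush |= FLUSH_CACHE;

    return flush;
}
```

With this shape, tearing down a mapping (new_flags without the present bit) never requests a cache flush, whatever the old cache attributes were.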
Andrew Cooper [Thu, 18 Jul 2013 07:16:15 +0000 (09:16 +0200)]
x86/time: Update wallclock in shared info when altering domain time offset
domain_set_time_offset() updates d->time_offset_seconds, but does not correct
the wallclock in the shared info, meaning that it is incorrect until the next
XENPF_settime hypercall from dom0 which resynchronises the wallclock for all
domains.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
George Dunlap [Fri, 5 Jul 2013 11:13:54 +0000 (12:13 +0100)]
libxl: Allow network driver domains when run_hotplug_scripts is set
As of commit 05bfd984dfe7014f1f5ea1133608b9bab589c120, hotplug scripts
are not run if backend_domid != LIBXL_TOOLSTACK_DOMID; so there is no reason
to restrict this for network driver domains any more.
Ian Campbell [Mon, 15 Jul 2013 08:24:05 +0000 (09:24 +0100)]
xen: arm: correctly configure NSACR.
Previously we were setting it up twice, the second time neglecting to set the
NS_SMP bit.
NSACR.NS_SMP is a processor specific bit which on Cortex-A7 and -A15 regulates
access to the (also processor specific) ACTLR.SMP bit. Not setting NSACR.NS_SMP
meant that Xen's attempts to set ACTLR.SMP were silently ignored. Setting this
bit is required in order to cause the processor to take part in cache and TLB
coherency protocols. Failure to set this bit leads to random memory corruption
in guests (although nothing like as catastrophic as you might expect!).
An alternative fix would have been to set ACTLR.SMP when in Secure World,
however Linux expects to set ACTLR.SMP itself in NS mode, so it's a good bet
that bootloaders will set NSACR.NS_SMP instead.
While here switch to a read-modify-write of NSACR to preserve any existing
bits -- seems safer.
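The read-modify-write mentioned above boils down to the following. This is a sketch only: the bit position is an assumption that should be checked against the processor's TRM, and the real code runs in assembly with coprocessor accesses rather than on a plain integer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of NSACR.NS_SMP on Cortex-A7/A15; verify against the TRM. */
#define NSACR_NS_SMP (UINT32_C(1) << 18)

/*
 * Hypothetical helper: compute the value to write back to NSACR.
 * Every bit already set by earlier setup code is preserved.
 */
uint32_t nsacr_allow_ns_smp(uint32_t current)
{
    return current | NSACR_NS_SMP;
}
```

Writing a fixed constant instead of `current | NSACR_NS_SMP` is the failure mode the commit describes: the second write dropped a bit the first one had set.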
Ian Murray [Wed, 3 Jul 2013 23:58:27 +0000 (00:58 +0100)]
xl: support for leaving domain paused after save
New feature to allow xl save to leave a domain paused after its
memory has been saved. This is to allow disk snapshots of domU
to be taken that exactly correspond to the memory state at save time.
Once the snapshot(s) have been taken or whatever, the domain can be
unpaused in the usual manner.
Usage:
xl save -p <domid> <filespec>
Signed-off-by: Ian Murray <murrayie@yahoo.co.uk> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Introduce Cortex-A7 support with a scalable proc_info_list, which includes the
CPU ID and the CPU initialization function.
In head.S, search procinfo for the CPU-specific MIDR and call the matching
initialization function. Currently, Cortex-A7 and Cortex-A15 are supported.
Signed-off-by: Bamvor Jian Zhang <bjzhang@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 17 Jul 2013 08:21:33 +0000 (10:21 +0200)]
x86: don't use destroy_xen_mappings() for vunmap()
Its attempt to tear down intermediate page table levels may race with
map_pages_to_xen() establishing them, and now that
map_domain_page_global() is backed by vmap() this teardown is also
wasteful (as it's very likely to need the same address space populated
again within foreseeable time).
As the race between vmap() and vunmap(), according to the latest stage
tester logs, doesn't appear to be the only one still left, the patch
also adds logging for vmap() and vunmap() uses (there shouldn't be too
many of them, so logs shouldn't get flooded). These are supposed to
get removed (and are made stand out clearly) as soon as we're certain
that there's no issue left.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 17 Jul 2013 06:48:24 +0000 (08:48 +0200)]
VMX: suppress pointless indirect calls
Get the other virtual interrupt delivery related actors in sync
with the newly added handle_eoi() one: Clear the respective pointers
(thus avoiding the call from generic code) when the feature is
unavailable instead of checking feature availability in the actors.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
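The approach taken in the VMX change above (clear the hook once at setup time, so generic code skips the call) might look roughly like this. The structure, hook and function names are hypothetical stand-ins, not the real function table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table standing in for the real function table. */
struct intr_ops {
    void (*handle_eoi)(unsigned int vector);
    void (*update_eoi_exit_bitmap)(unsigned int vector, bool set);
};

static unsigned int eoi_count;

static void vid_handle_eoi(unsigned int vector)
{
    (void)vector;
    eoi_count++;
}

static void vid_update_eoi_exit_bitmap(unsigned int vector, bool set)
{
    (void)vector;
    (void)set;
}

static struct intr_ops ops = {
    .handle_eoi = vid_handle_eoi,
    .update_eoi_exit_bitmap = vid_update_eoi_exit_bitmap,
};

/* Setup time: drop the hooks once if the feature is unavailable. */
void intr_ops_init(bool have_virtual_intr_delivery)
{
    if (!have_virtual_intr_delivery) {
        ops.handle_eoi = NULL;
        ops.update_eoi_exit_bitmap = NULL;
    }
}

/* Generic caller: a NULL check replaces an indirect call that would do nothing. */
void generic_eoi(unsigned int vector)
{
    if (ops.handle_eoi)
        ops.handle_eoi(vector);
}
```

The hooks themselves no longer need a feature check, since they can only be reached when the feature is present.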
Jan Beulich [Wed, 17 Jul 2013 06:47:18 +0000 (08:47 +0200)]
VMX: fix interaction of APIC-V and Viridian emulation
Viridian using a synthetic MSR for issuing EOI notifications bypasses
the normal in-processor handling, which would clear
GUEST_INTR_STATUS.SVI. Hence we need to do this in software in order
for future interrupts to get delivered.
Based on analysis by Yang Z Zhang <yang.z.zhang@intel.com>.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Andrew Cooper [Wed, 17 Jul 2013 06:45:20 +0000 (08:45 +0200)]
x86/cpuidle: Change logging for unknown APIC IDs
Dom0 uses this hypercall to pass ACPI information to Xen. It is not very
uncommon for more cpus to be listed in the ACPI tables than are present on the
system, particularly on systems with a common BIOS for 2 and 4 socket server
variants.
As Dom0 does not control the number of entries in the ACPI tables, and is
required to pass everything it finds to Xen, change the logging.
There is now a single unconditional warning for the first unknown ID, and
further warnings if "cpuinfo" is requested by the user on the command line.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 16 Jul 2013 09:54:07 +0000 (11:54 +0200)]
AMD IOMMU: untie remap and vector maps
With the specific IRTEs used for an interrupt no longer depending on
the vector, there's no need to tie the remap sharing model to the
vector sharing one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Jan Beulich [Tue, 16 Jul 2013 09:52:38 +0000 (11:52 +0200)]
AMD IOMMU: allocate IRTE entries instead of using a static mapping
For multi-vector MSI, where we surely don't want to allocate
contiguous vectors and do want to be able to set affinities of the individual
vectors separately, we need to drop the use of the tuple of vector and
delivery mode to determine the IRTE to use, and instead allocate IRTEs
(which imo should have been done from the beginning).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Andrew Cooper [Tue, 16 Jul 2013 09:10:45 +0000 (11:10 +0200)]
x86: Special case __HYPERVISOR_iret rather more when writing hypercall pages
In all cases when a hypercall page is written, __HYPERVISOR_iret is first
written as a regular hypercall, then subsequently rewritten in its special
case.
For VMX and SVM, this means that following the ud2a instruction is 3 bytes of
an imm32 parameter. For a ring3 kernel, this means that following the syscall
instruction is the second half of 'pop %r11'.
For a ring1 kernel, the iret case ends up as the same number of bytes as the
rest of the hypercalls, but it is pointless writing it twice, and is changed
for consistency.
Therefore, skip the loop iteration which would write the incorrect
__HYPERVISOR_iret hypercall. This removes junk machine code from the tail and
makes disassemblers rather more happy when looking at the hypercall page.
Also, a miscellaneous whitespace fix in the comment for ring3 kernel.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 15 Jul 2013 12:21:45 +0000 (14:21 +0200)]
AMD IOMMU: use ioremap()
There's no point in using the fixmap here, and it gets
map_iommu_mmio_region() in line with unmap_iommu_mmio_region(), which
was already using iounmap() (thus crashing if actually used).
Jan Beulich [Mon, 15 Jul 2013 12:21:03 +0000 (14:21 +0200)]
VT-d: use ioremap()
There's no point in using the fixmap here, and it gets iommu_alloc()
in line with iommu_free(), which was already using iounmap() (thus
crashing if actually used).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 15 Jul 2013 12:17:56 +0000 (14:17 +0200)]
x86: add locking to map_pages_to_xen()
While boot time calls don't need this, run time uses of the function
which may result in L2 page tables getting populated need to be
serialized to avoid two CPUs populating the same L2 (or L3) entry,
overwriting each other's results.
This is expected to fix what would seem to be a regression from commit
b0581b92 ("x86: make map_domain_page_global() a simple wrapper around
vmap()"), albeit that change only made more readily visible the already
existing issue.
This patch intentionally does not
- add locking to the page table de-allocation logic in
destroy_xen_mappings() (the only user having potential races here,
msix_put_fixmap(), gets converted to use __set_fixmap() instead)
- avoid races between super page splitting and reconstruction in
map_pages_to_xen() (no such uses exist; races between multiple
splitting attempts or between multiple reconstruction attempts are
being taken care of)
If we wanted to take care of these, we'd need to alter the behavior
of virt_to_xen_l?e() - they would need to return with the lock held
then.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Wed, 10 Jul 2013 10:54:00 +0000 (12:54 +0200)]
arm: correct vfp save/restore asm constraints
Some versions of gcc complain:
> vfp.c: In function 'vfp_restore_state':
> vfp.c:45:27: error: memory input 0 is not directly addressable
> vfp.c:51:31: error: memory input 0 is not directly addressable
There is no way to express the constraint we want (which is the address of the
array, clobbering the whole array). Therefore we have to fake it up by using
two constraints.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Will.Deacon@arm.com Acked-by: Julien Grall <julien.grall@linaro.org>
Jan Beulich [Wed, 10 Jul 2013 08:03:40 +0000 (10:03 +0200)]
adjust x86 EFI build
While the rule to generate .init.o files from .o ones already correctly
included $(extra-y), the setting of the necessary compiler flag didn't
have the same. With some yet to be posted patch this resulted in build
breakage because of the compiler deciding not to inline a few functions
(which then results in .text not being empty as required for these
object files).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
libxl: do not call exit() in libxl_device_vtpm_list
Signal error with NULL return value, do not terminate the whole process.
Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 4 Jul 2013 08:33:18 +0000 (10:33 +0200)]
x86/mm: Ensure useful progress in alloc_l2_table()
While debugging the issue which turned out to be XSA-58, a printk in this loop
showed that it was quite easy to never make useful progress, because of
consistently failing the preemption check.
One single l2 entry is a reasonable amount of work to do, even if an action is
pending, and also assures forwards progress across repeat continuations.
Tweak the continuation criteria to fail on the first iteration of the loop.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
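A minimal sketch of the continuation criteria described in the alloc_l2_table() change above: the preemption check is skipped for the first entry handled in each call, so every call makes progress. The function names, the return code and the callback are hypothetical; only the loop shape is the point.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ENTRIES   512
#define RC_PREEMPTED (-1)   /* hypothetical "call me again" code */

/* Hypothetical stand-in for a preemption check that always fires. */
bool always_pending(void)
{
    return true;
}

/*
 * Process entries from *next onwards. The preemption check is not
 * applied to the first entry of a call, so each call handles at
 * least one entry even if an action is permanently pending.
 */
int process_table(unsigned int *next, bool (*preempt_pending)(void))
{
    unsigned int start = *next;

    for (unsigned int i = start; i < NR_ENTRIES; i++) {
        if (i > start && preempt_pending()) {
            *next = i;              /* resume point for the continuation */
            return RC_PREEMPTED;
        }
        /* ... validate entry i here ... */
    }

    *next = NR_ENTRIES;
    return 0;
}
```

Even with a check that always reports pending work, repeated continuations finish after at most NR_ENTRIES calls instead of looping forever.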
Ian Campbell [Thu, 4 Jul 2013 08:32:44 +0000 (10:32 +0200)]
use SMP barrier in common code dealing with shared memory protocols
Xen currently makes no strong distinction between the SMP barriers (smp_mb
etc) and the regular barrier (mb etc). In Linux, where we inherited these
names from having imported Linux code which uses them, the SMP barriers are
intended to be sufficient for implementing shared-memory protocols between
processors in an SMP system while the standard barriers are useful for MMIO
etc.
On x86 with the stronger ordering model there is not much practical difference
here but ARM has weaker barriers available which are suitable for use as SMP
barriers.
Therefore ensure that common code uses the SMP barriers when that is all which
is required.
On both ARM and x86 both types of barrier are currently identical so there is
no actual change. A future patch will change smp_mb to a weaker barrier on
ARM.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 4 Jul 2013 08:27:39 +0000 (10:27 +0200)]
x86: make map_domain_page_global() a simple wrapper around vmap()
This is in order to reduce the number of fundamental mapping mechanisms
as well as to reduce the amount of code to be maintained. In the course
of this the virtual space available to vmap() is being grown from 16Gb
to 64Gb.
Note that this requires callers of unmap_domain_page_global() to no
longer pass misaligned pointers - map_domain_page_global() returns page
size aligned pointers, so unmapping should be done accordingly.
unmap_vcpu_info() violated this and is being adjusted here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
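For the alignment requirement noted above, a caller holding a pointer into the middle of a mapped page has to round it down before unmapping. A sketch, assuming 4KiB pages and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))

/* Hypothetical helper: round a pointer down to the start of its page. */
void *page_align_down(const void *p)
{
    return (void *)((uintptr_t)p & PAGE_MASK);
}
```

A caller that stored `base + offset` would pass `page_align_down(ptr)` to the unmap function rather than `ptr` itself.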
Jan Beulich [Thu, 4 Jul 2013 08:26:24 +0000 (10:26 +0200)]
bitmap_*() should cope with zero size bitmaps
... to match expectations set by memset()/memcpy().
Similarly for find_{first,next}_{,zero_}bit() on x86.
__bitmap_shift_{left,right}() would also need fixing (they more
generally can't cope with the shift count being larger than the bitmap
size, and they perform undefined operations by possibly shifting an
unsigned long value by BITS_PER_LONG bits), but since these functions
aren't really used anywhere I wonder if we wouldn't better simply get
rid of them.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
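To illustrate what coping with a zero size bitmap means, here is a stand-alone emptiness test that never dereferences the map when nbits is 0 and never shifts by the full word width. It is a sketch written for this note, not the Xen implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* True if none of the first nbits bits are set; nbits == 0 never touches map. */
bool bitmap_is_empty(const unsigned long *map, size_t nbits)
{
    size_t full = nbits / BITS_PER_LONG;
    size_t rem = nbits % BITS_PER_LONG;

    for (size_t i = 0; i < full; i++)
        if (map[i])
            return false;

    /* rem is in 1..BITS_PER_LONG-1 here, so the shift is well defined. */
    if (rem && (map[full] & ((1UL << rem) - 1)))
        return false;

    return true;
}
```

The `rem &&` guard is what avoids both the out-of-bounds read for empty bitmaps and the undefined shift by BITS_PER_LONG that the message mentions for the shift helpers.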
Ben Guthro [Thu, 4 Jul 2013 08:23:36 +0000 (10:23 +0200)]
x86: Restore reboot quirks by DMI, fix reboot on a number of systems
This patch ports the functionality of the following changeset from
Linux (from 2008) to Xen:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14d7ca5c
It implements an additional reboot quirk to do a PCI reset via port
CF9.
This also restores some code dropped in the x86_32 target removal
(changeset 5d1181a5ea5e0f11d481a94b16ed00d883f9726e) which sets some
quirks based on DMI matching.
This will add reboot quirks on the following systems that are known to
be necessary on Linux:
Dell E520
Dell PowerEdge 1300
Dell PowerEdge 300
Dell OptiPlex 745
Dell OptiPlex 745
Dell OptiPlex 745
Dell OptiPlex 330
Dell OptiPlex 360
Dell OptiPlex 760
Dell PowerEdge 2400
Dell Precision T5400
Dell Precision T7400
HP Compaq Laptop
Dell XPS710
Dell DXP061
Sony VGN-Z540N
ASUS P4S800
Acer Aspire One A110
Apple MacBook5
Apple MacBookPro5
Apple Macmini3,1
Apple iMac9,1
Dell Latitude E6320
Dell Latitude E5420
Dell Latitude E6220
Dell Latitude E6420
Dell OptiPlex 990
Dell OptiPlex 990
Dell Latitude E6520
Dell OptiPlex 790
Dell OptiPlex 990
Dell OptiPlex 390
Dell Latitude E6320
Dell Latitude E6420
Dell Latitude E6520
I clearly have not been able to test on all of these systems.
It does fix rebooting on the Dell 790, and should *not* change the
reboot paths of systems not on this DMI match list.
Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com>
Use driver_data, thus requiring only a single handler function.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ben Guthro <benjamin.guthro@citrix.com>
The IOMMU interrupt handling in the bottom half must clear the PPR log interrupt
and event log interrupt bits to re-enable the interrupt. This is done by
writing 1 to the memory mapped register to clear the bit. Due to a hardware bug,
if the driver tries to clear this bit while the IOMMU hardware is also setting
this bit, the conflict will result in the bit remaining set. If the interrupt
handling code does not make sure to clear this bit, subsequent changes in the
event/PPR logs will no longer generate interrupts, which would result in
buffer overflow. After clearing the bits, the driver must read back
the register to verify.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Adjust to apply on top of heavily modified patch 1. Adjust flow to get away
with a single readl() in each instance of the status register checks.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
iommu/amd: Fix logic for clearing the IOMMU interrupt bits
The IOMMU interrupt bits in the IOMMU status registers are
"read-only, and write-1-to-clear" (RW1C). Therefore, the existing
logic, which reads the register, sets the bit, and then writes back
the value, could accidentally clear other bits that happen to be set.
The correct logic is to write a value in which only the interrupt
bits are set, leaving the rest as zeros.
This patch also cleans up the #define masks as Jan has suggested.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
With iommu_interrupt_handler() properly having got switched its readl()
from status to control register, the subsequent writel() needed to be
switched too (and the RW1C comment there was bogus).
Some of the cleanup went too far - undone.
Further, with iommu_interrupt_handler() now actually disabling the
interrupt sources, they also need to get re-enabled by the tasklet once
it finished processing the respective log. This also implies re-running
the tasklet so that log entries added between reading the log and re-
enabling the interrupt will get handled in a timely manner.
Finally, guest write emulation to the status register needs to be done
with the RW1C (and RO for all other bits) semantics in mind too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
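The RW1C problem described in the entry above can be reproduced with a small software model of the register. The bit layout and function names below are illustrative assumptions, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; consult the IOMMU specification for the real one. */
#define STATUS_EVENT_LOG_INT (UINT32_C(1) << 1)
#define STATUS_PPR_LOG_INT   (UINT32_C(1) << 6)
#define STATUS_RW1C_MASK     (STATUS_EVENT_LOG_INT | STATUS_PPR_LOG_INT)

/*
 * Model of a register write: RW1C bits written as 1 are cleared,
 * every other bit is read-only and keeps its value.
 */
uint32_t status_write(uint32_t status, uint32_t val)
{
    return status & ~(val & STATUS_RW1C_MASK);
}

/* Problematic pattern: read, OR in the bit, write everything back. */
uint32_t ack_read_modify_write(uint32_t status, uint32_t bit)
{
    return status_write(status, status | bit);
}

/* Intended pattern: write only the bit being acknowledged. */
uint32_t ack_write_bit_only(uint32_t status, uint32_t bit)
{
    return status_write(status, bit);
}
```

With both interrupt bits pending, the read-modify-write variant acknowledges both, so the PPR log interrupt is lost; the bit-only write leaves it pending. The same `status_write()` rule is what an emulated guest write to the status register would need to follow.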
Jan Beulich [Tue, 2 Jul 2013 06:48:03 +0000 (08:48 +0200)]
x86: don't pass negative time to gtime_to_gtsc() (try 2)
This mostly reverts commit eb60be3d ("x86: don't pass negative time to
gtime_to_gtsc()") and instead corrects __update_vcpu_system_time()'s
handling of this_cpu(cpu_time).stime_local_stamp dating back before the
start of a HVM guest (which would otherwise lead to a negative value
getting passed to gtime_to_gtsc(), causing scale_delta() to produce
meaningless output).
Flushing the value to zero was wrong, and printing a message for
something that can validly happen wasn't very useful either.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jacob Shin [Tue, 2 Jul 2013 06:47:00 +0000 (08:47 +0200)]
cpufreq, xenpm: fix cpufreq and xenpm mismatch
Currently cpufreq and xenpm are out of sync. Fix cpufreq reporting of
whether turbo mode is enabled or not. Fix xenpm to decode a boolean
rather than a tristate.
Jan Beulich [Tue, 2 Jul 2013 06:42:49 +0000 (08:42 +0200)]
x86/fxsave: bring in line with recent xsave adjustments
Defer the FIP/FDP pointer reset needed on AMD CPUs to the restore path,
and switch from using EMMS to FFREE here too (to be resistant against
eventual future CPUs without MMX support). Also switch from using an
almost typeless pointer in fpu_fxrstor() to a properly typed one, thus
telling the compiler the truth about which memory gets accessed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 2 Jul 2013 06:41:28 +0000 (08:41 +0200)]
x86/xsave: adjust state management
The initial state for a vCPU is using default values, so there's no
need to force the XRSTOR to read the state from memory. This saves a
couple of thousand restores from memory just during boot of Linux on
my Sandy Bridge system (I didn't try to make further measurements).
The above requires that arch_set_info_guest() updates the state flags
in the save area when valid floating point state got passed in, but
that would really have been needed even before in case XSAVE{,OPT}
decided to clear one or both of the FP and SSE bits.
Furthermore, hvm_vcpu_reset_state() shouldn't just clear out the FPU/
SSE area, but needs to re-initialize MXCSR and FCW.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Ian Jackson [Mon, 1 Jul 2013 14:20:28 +0000 (15:20 +0100)]
libxl: suppress device assignment to HVM guest when there is no IOMMU
This in effect copies similar logic from xend: While there's no way to
check whether a device is assigned to a particular guest,
XEN_DOMCTL_test_assign_device at least allows checking whether an
IOMMU is there and whether a device has been assigned to _some_
guest.
For the time being, this should be enough to cover for the missing
error checking/recovery in other parts of libxl's device assignment
paths.
There remains a (functionality-, but not security-related) race in
that the iommu should be set up earlier, but this is too risky a
change for this stage of the 4.3 release.
This is a security issue, XSA-61.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Thu, 27 Jun 2013 17:13:30 +0000 (18:13 +0100)]
xen/arm: Rework the way to compute dom0 DTB base address
If the DTB is loaded right after the kernel, on some setups, Linux will
overwrite the DTB during the decompression step.
To be sure the DTB won't be overwritten by the decompression stage, load
the DTB near the end of the first memory bank and below 4GiB (if the memory
range is greater).
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Fri, 28 Jun 2013 11:25:57 +0000 (12:25 +0100)]
xen/arm: gic_shutdown_irq must only disable the right IRQ
When GICD_ICENABLERn is read, all bits set to 1 represent enabled IRQs.
Currently gic_shutdown_irq:
- reads GICD_ICENABLER
- sets the corresponding bit to 1
- writes back the new value
That means Xen will disable more IRQs than necessary.
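The effect can be shown with a software model of the register, in which a write disables every IRQ whose bit is 1 and a read returns the set of enabled IRQs. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a GICD_ICENABLERn write: each 1 bit disables that IRQ. */
uint32_t icenabler_write(uint32_t enabled, uint32_t val)
{
    return enabled & ~val;
}

/* Old behaviour: read, set the bit, write back. */
uint32_t shutdown_irq_rmw(uint32_t enabled, unsigned int irq)
{
    uint32_t val = enabled;             /* a read returns the enabled set */

    val |= UINT32_C(1) << (irq % 32);
    return icenabler_write(enabled, val);
}

/* Writing only the target bit disables just that IRQ. */
uint32_t shutdown_irq_fixed(uint32_t enabled, unsigned int irq)
{
    return icenabler_write(enabled, UINT32_C(1) << (irq % 32));
}
```

In this model the read-modify-write variant disables every IRQ that was enabled in the register, while the single-bit write leaves the others alone.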
Dongxiao Xu [Thu, 27 Jun 2013 15:01:26 +0000 (17:01 +0200)]
nested vmx: Fix the booting of L2 PAE guest
When doing virtual VM entry and virtual VM exit, we need to
synchronize the PAE PDPTR related VMCS registers. With this fix,
we can boot 32bit PAE L2 guest (Win7 & RHEL6.4) on "Xen on Xen"
environment.
Andrew Cooper [Thu, 27 Jun 2013 12:01:18 +0000 (14:01 +0200)]
AMD/intremap: Prevent use of per-device vector maps until irq logic is fixed
XSA-36 changed the default vector map mode from global to per-device. This is
because a global vector map does not prevent one PCI device from impersonating
another and launching a DoS on the system.
However, the per-device vector map logic is broken for devices with multiple
MSI-X vectors, which can either result in a failed ASSERT() or misprogramming
of a guest's interrupt remapping tables. The core problem is not trivial to
fix.
In an effort to get AMD systems back to a non-regressed state, introduce a new
type of vector map called per-device-global. This uses per-device vector maps
in the IOMMU, but uses a single used_vector map for the core IRQ logic.
This patch is intended to be removed as soon as the per-device logic is fixed
correctly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
This grub.cfg from a default fedora 19 Beta install
caused pygrub failures. The previous pygrub commit
fixed that, so this example file is added for reference.
Signed-off-by: Marcel Mol <marcel@mesa.nl> Acked-by: Ian Campbell <ian.campbell@citrix.com>
pygrub/GrubConf: fix boot problem for fedora 19 grub.cfg (2nd attempt)
Booting a fedora 19 domU failed because it could not properly
parse the grub.cfg file. This was caused by
set default="${next_entry}"
This statement actually is within an 'if' statement, so maybe it would
be better to skip code within if/fi blocks...
But this patch seems to work fine.
Signed-off-by: Marcel Mol <marcel@mesa.nl> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Murray [Sat, 22 Jun 2013 12:38:11 +0000 (13:38 +0100)]
Xendomains was not correctly suspending domains when a STOP was issued.
The regex was not selecting the { when parsing JSON output of xl list -l.
It was also not selecting (domain when parsing xl list -l when SXP selected.
Prefixed { with 4 spaces, and removed an extra ( before domain in the regex
string.
Added quotes around the grep strings so the spaces inserted into the string
did not break the grepping.
This has now been tested against 4.3RC5
Signed-off-by: Ian Murray <murrayie@yahoo.co.uk> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Anthony PERARD [Wed, 26 Jun 2013 15:54:31 +0000 (16:54 +0100)]
libxl: Use QMP cpu-add to hotplug CPU with qemu-xen.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anthony PERARD [Wed, 26 Jun 2013 15:54:30 +0000 (16:54 +0100)]
libxl: Add "cpu-add" QMP command.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- rename index parameter to avoid Wshadow due to index(3) in strings.h ]
Andrew Cooper [Mon, 24 Jun 2013 15:47:05 +0000 (16:47 +0100)]
tools/libxc: Fix memory leaks in xc_domain_save()
Introduces outbuf_free() to mirror the currently existing outbuf_init().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
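The init/free pairing introduced above follows a common pattern, sketched here. The structure layout and both signatures are assumptions made for illustration; the real definitions in libxc may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; the real struct outbuf in libxc may differ. */
struct outbuf {
    void *buf;
    size_t size;
    size_t pos;
};

/* Allocate the buffer. Returns 0 on success, -1 on allocation failure. */
int outbuf_init(struct outbuf *ob, size_t size)
{
    ob->buf = malloc(size);
    if (!ob->buf)
        return -1;
    ob->size = size;
    ob->pos = 0;
    return 0;
}

/* Mirror of outbuf_init(): release the buffer and reset the fields. */
void outbuf_free(struct outbuf *ob)
{
    free(ob->buf);
    ob->buf = NULL;
    ob->size = 0;
    ob->pos = 0;
}
```

Resetting the pointer to NULL makes a second outbuf_free() harmless, which simplifies shared error paths.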
Julien Grall [Wed, 26 Jun 2013 13:23:35 +0000 (14:23 +0100)]
libxc: Fix guest boot on ARM after XSA-55
XSA-55 has exposed errors for guest creation on ARM:
- domain virt_base was not defined;
- xc_dom_alloc_segment allocates pfn from 0 instead of the RAM base address.
Jim Fehlig [Tue, 25 Jun 2013 22:02:15 +0000 (16:02 -0600)]
libxl: Fix assignment of devid value returned from libxl__device_nextid
Commit 5420f265 has some misplaced parentheses that caused devid
to be assigned 1 or 0 based on checking return value of
libxl__device_nextid < 0, e.g.
devid = libxl__device_nextid(...) < 0
This works when only one instance of a given device type exists, but
subsequent devices of the same type will also have a devid = 1 if
libxl__device_nextid succeeds. Fix by checking the value assigned to
devid, e.g.
(devid = libxl__device_nextid(...)) < 0
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
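The precedence issue behind the fix above is easy to reproduce in isolation. In this sketch next_id() is a hypothetical stand-in for libxl__device_nextid() that always succeeds and returns 5.

```c
#include <assert.h>

/* Hypothetical stand-in for the id allocator; always returns 5. */
int next_id(void)
{
    return 5;
}

int assign_buggy(void)
{
    int devid;

    /*
     * '<' binds tighter than '=', so devid receives the result of the
     * comparison (0 or 1), not the id returned by next_id().
     */
    if ((devid = next_id() < 0))
        return -1;
    return devid;
}

int assign_fixed(void)
{
    int devid;

    /* Parenthesise the assignment, then compare the assigned value. */
    if ((devid = next_id()) < 0)
        return -1;
    return devid;
}
```

In the buggy form every successful call yields the same value, so a second device of the same type collides with the first.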
In the original patch 7 of the series addressing XSA-45 I mistakenly
took the addition of the call to get_page_light() in alloc_page_type()
to cover two decrements that would happen: One for the PGT_partial bit
that is getting set along with the call, and the other for the page
reference the caller holds (and would be dropping on its error path).
But of course the additional page reference is tied to the PGT_partial
bit, and hence any caller of a function that may leave
->arch.old_guest_table non-NULL for error cleanup purposes has to make
sure a respective page reference gets retained.
Similar issues were then also spotted elsewhere: In effect all callers
of get_page_type_preemptible() need to deal with errors in similar
ways. To make sure error handling can work this way without leaking
page references, a respective assertion gets added to that function.
This is CVE-2013-1432 / XSA-58.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 25 Jun 2013 13:57:44 +0000 (15:57 +0200)]
VMX/Viridian: suppress MSR-based APIC suggestion when having APIC-V
When the CPU has the necessary capabilities, having Windows use
synthetic MSR reads/writes is bogus, as this still requires emulation
(which is pretty much guaranteed to be slower than having the hardware
carry out the operation).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Ian Jackson [Tue, 25 Jun 2013 10:24:22 +0000 (11:24 +0100)]
libxl: Restrict permissions on PV console device xenstore nodes
Matthew Daley has observed that the PV console protocol places sensitive host
state into guest-writeable xenstore locations. This includes:
- The pty used to communicate between the console backend daemon and its
client, allowing the guest administrator to read and write arbitrary host
files.
- The output file, allowing the guest administrator to write arbitrary host
files or to target arbitrary qemu chardevs which include sockets, udp, pty,
pipes etc (see -chardev in qemu(1) for a more complete list).
- The maximum buffer size, allowing the guest administrator to consume more
resources than the host administrator has configured.
- The backend to use (qemu vs xenconsoled), potentially allowing the guest
administrator to confuse host software.
So we arrange to make the sensitive keys in the xenstore frontend directory
read only for the guest. This is safe since the xenstore permissions model,
unlike POSIX directory permissions, does not allow the guest to remove and
recreate a node if it has write access to the containing directory.
There are a few associated wrinkles:
- The primary PV console is "special". Its xenstore node is not under the
usual /devices/ subtree and it does not use the customary xenstore state
machine protocol. Unfortunately its directory is used for other things,
including the vnc-port node, which we do not want the guest to be able to
write to. Rather than trying to track down all the possible secondary uses
of this directory just make it r/o to the guest. All newly created
subdirectories inherit these permissions and so are now safe by default.
- The other serial consoles do use the customary xenstore state machine and
therefore need write access to at least the "protocol" and "state" nodes,
however they may also want to use arbitrary "feature-foo" nodes (although
I'm not aware of any) and therefore we cannot simply lock down the entire
frontend directory. Instead we add support to libxl__device_generic_add for
frontend keys which are explicitly read only and use that to lock down the
sensitive keys.
- Minios' console frontend wants to write the "type" node, which it has no
business doing since this is a host/toolstack level decision. This fails
now that the node has become read only to the PV guest. Since the toolstack
already writes this node just remove the attempt to set it.
This is a security issue, XSA-57.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 21 Jun 2013 16:36:26 +0000 (17:36 +0100)]
tools/libxc: Fix memory leaks in xc_domain_restore()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (re 4.3 release) Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 18 Jun 2013 09:08:01 +0000 (10:08 +0100)]
libxl,hvmloader: Don't relocate memory for MMIO hole
At the moment, qemu-xen can't handle memory being relocated by
hvmloader. This may happen if a device with a large enough memory
region is passed through to the guest. At the moment, if this
happens, then at some point in the future qemu will crash and the
domain will hang. (qemu-traditional is fine.)
It's too late in the release to do a proper fix, so we try to do
damage control.
hvmloader already has mechanisms to relocate memory to 64-bit space if
it can't make a big enough MMIO hole. By default this is 2GiB; if we
just refuse to make the hole bigger if it will overlap with guest
memory, then the relocation will happen by default.
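The hole-sizing rule described above can be sketched roughly as follows. This is a minimal illustration, not the hvmloader source: the function name grow_mmio_hole(), its parameters, and the 4KiB PAGE_SHIFT are assumptions chosen to mirror the terms used in this message.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4KiB pages */

/*
 * Hypothetical sketch: grow the low MMIO hole by lowering its start
 * address until the devices fit, the hole reaches 2GiB, or growing it
 * further would overlap guest RAM while relocation is not allowed.
 * Returns the resulting start of the hole.
 */
static uint32_t grow_mmio_hole(uint32_t pci_mem_start, uint32_t pci_mem_end,
                               uint64_t mmio_total, uint32_t low_mem_pgend,
                               bool allow_memory_relocate)
{
    for (;;) {
        /* Doubling the start address modulo 2^32 lowers it, enlarging the hole. */
        uint32_t next = pci_mem_start << 1;

        if (mmio_total <= (uint64_t)(pci_mem_end - pci_mem_start))
            break; /* everything already fits */
        if (next == 0)
            break; /* start is already at 2GiB, the largest hole */
        if (!allow_memory_relocate && (next >> PAGE_SHIFT) < low_mem_pgend)
            break; /* a bigger hole would overlap guest memory */
        pci_mem_start = next;
    }
    return pci_mem_start;
}
```

When the loop stops early because of guest memory, the devices that do not fit are left for the existing 64-bit relocation path.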
v5:
- Update comment to not refer to "this series".
v4:
- Wrap long line in libxl_dm.c
- Fix comment
v3:
- Fix polarity of comparison
- Move diagnostic messages to another patch
- Tested with xen platform pci device hacked to have different BAR sizes
{256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory
configurations
- Add comment explaining why we default to "allow"
- Remove cast to bool
v2:
- style fixes
- fix and expand comment on the MMIO hole loop
- use "%d" rather than "%s" -> (...)?"1":"0"
- use bool instead of uint8_t
- Move 64-bit bar relocate detection to another patch
- Add more diagnostic messages
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 13:56:29 +0000 (14:56 +0100)]
hvmloader: Remove minimum size for BARs to relocate to 64-bit space
Allow devices with BARs less than 512MiB to be relocated to high
memory.
This will only be invoked if there is not enough low MMIO space to map
the device, and will be done preferentially to large devices first; so
in all likelihood only large devices will be remapped anyway.
This is needed to work around the issue of qemu-xen not being able to
handle moving guest memory around to resize the MMIO hole. The
default MMIO hole size is less than 256MiB.
v3:
- Fixed minor style issue
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 14:32:35 +0000 (15:32 +0100)]
hvmloader: Load large devices into high MMIO space as needed
Keep track of how much MMIO space is left in total, as well as the amount
of "low" MMIO space (<4GiB), and only load devices into high memory if
there is not enough low memory for the rest of the devices to fit.
Because devices are processed by size in order from large to small,
this should preferentially relocate devices with large BARs to 64-bit
space.
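The placement rule above might look roughly like the sketch below. The struct and function names are hypothetical; only the idea of comparing the remaining total against the remaining low MMIO space is taken from this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the low MMIO window [base, max). */
struct resource {
    uint64_t base;
    uint64_t max;
};

/*
 * Decide whether a BAR should be placed in high (64-bit) MMIO space.
 * mmio_total is the summed size of all BARs not yet placed, including
 * the one being considered. Only 64-bit capable BARs can move.
 */
static bool place_in_high_mmio(bool is_64bit_bar, uint64_t mmio_total,
                               const struct resource *low)
{
    uint64_t low_left = low->max - low->base;

    return is_64bit_bar && mmio_total > low_left;
}
```

Since BARs are visited from largest to smallest, the first ones to see mmio_total exceed the remaining low space are the large ones, which is why they are the ones that end up in 64-bit space.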
v3:
- Just use mmio_total rather than introducing a new variable.
- Port to using mem_resource directly rather than low_mmio_left
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 14:11:03 +0000 (15:11 +0100)]
hvmloader: Correct bug in low mmio region accounting
When deciding whether to map a device in low MMIO space (<4GiB),
hvmloader compares it with "mmio_left", which is set to the size of
the low MMIO range (pci_mem_end - pci_mem_start). However, even if it
does map a device in high MMIO space, it still removes the size of its
BAR from mmio_left.
In reality we don't need to do a separate accounting of the low memory
available -- this can be calculated from mem_resource. Just get rid
of the variable and the duplicate accounting entirely. This will make
the code more robust.
Note also that the calculation of whether to move a device to 64-bit
is fragile at the moment, depending on some unstated assumptions.
State those assumptions in a comment for future reference.
v5:
- Add comment documenting fragility of the move-to-highmem check
v3:
- Use mem_resource values directly instead of doing duplicate
accounting
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 11:48:36 +0000 (12:48 +0100)]
hvmloader: Fix check for needing a 64-bit bar
After attempting to resize the MMIO hole, the check to determine
whether there is a need to relocate BARs into 64-bit space checks the
specific thing that caused the loop to exit (MMIO hole == 2GiB) rather
than checking whether the required MMIO will fit in the hole.
But even then it does it wrong: the polarity of the check is
backwards.
Check for the actual condition we care about (the size of the MMIO
hole) rather than checking for the loop exit condition.
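As a rough illustration of the corrected check (the function name is hypothetical, and the variable names are assumed from the surrounding descriptions), the decision depends only on whether the required MMIO fits in the hole, not on where the hole starts:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: BARs need relocating to 64-bit space only when
 * the total MMIO required does not fit in the low hole
 * [pci_mem_start, pci_mem_end).
 */
static bool need_64bit_relocation(uint64_t mmio_total,
                                  uint64_t pci_mem_start,
                                  uint64_t pci_mem_end)
{
    return mmio_total > pci_mem_end - pci_mem_start;
}
```

With a hole that has grown to start at 2GiB, this still reports false when the devices fit, which is the case the old loop-exit test got wrong.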
v3:
- Move earlier in the series, before other functional changes
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Wed, 19 Jun 2013 16:30:13 +0000 (17:30 +0100)]
hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
hvmloader will read hvm_info->high_mem_pgend to calculate where to
start the highmem PCI region. However, if the guest does not have any
memory in the high region, this is set to zero, which will cause
hvmloader to use 0 as the base of the highmem region, rather
than 1 << 32.
Check to see whether hvm_info->high_mem_pgend is set; if so, do the
normal calculation; otherwise, use 1<<32.
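A minimal sketch of the base calculation, assuming 4KiB pages and a hypothetical helper name; falling back to 4GiB for a non-zero value that does not point into high memory is also an assumption here, and the warning mentioned in the v4 note below is omitted:

```c
#include <stdint.h>

#define PAGE_SHIFT 12          /* assumption: 4KiB pages */
#define FOUR_GIB (1ULL << 32)

/*
 * Hypothetical helper: compute the base of the high PCI MMIO region
 * from the page number at which high guest RAM ends.
 */
static uint64_t highmem_pci_base(uint32_t high_mem_pgend)
{
    /* Widen before shifting so the result is not truncated to 32 bits. */
    uint64_t end = (uint64_t)high_mem_pgend << PAGE_SHIFT;

    /*
     * Zero means there is no RAM above 4GiB; a non-zero value below
     * 4GiB is treated the same way in this sketch.
     */
    if (end < FOUR_GIB)
        return FOUR_GIB;
    return end;
}
```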
v4:
- Handle case where hvm_info->high_mem_pgend is non-zero but doesn't
point into high memory, throwing a warning.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 21 Jun 2013 06:39:43 +0000 (08:39 +0200)]
tpmif: fix identifier prefixes
The definitions here shouldn't use vtpm_ or VTPM_ as their prefixes;
the interface should instead make use of tpmif_ and TPMIF_. This
fixes a build failure after syncing the public headers to
linux-2.6.18-xen.hg (where a struct vtpm_state already exists).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>