Paul Durrant [Fri, 24 Apr 2015 12:49:58 +0000 (13:49 +0100)]
x86/hvm: disallow guest get and set of all ioreq server HVM params
A guest has no need to touch these parameters and reading
HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN
may cause Xen to create a default ioreq server where one did not already
exist.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Fri, 24 Apr 2015 12:42:25 +0000 (13:42 +0100)]
x86/hvm: introduce functions for HVMOP_get/set_param allowance checks
Some parameters can only (validly) be set once. Some cannot be set
by a guest for its own domain. Consolidate these checks, along with
the XSM check, in a new hvm_allow_set_param() function for clarity.
Also, introduce hvm_allow_get_param() for similar reasons.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
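As a rough illustration of the consolidation described above, the sketch below gathers a "set once" rule and a "not by a guest for its own domain" rule into a single predicate. It is not the Xen implementation: the parameter indices, `struct domain_sketch` and `allow_set_param_sketch()` are hypothetical names chosen for the example, and the XSM check is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter indices, for illustration only. */
enum {
    PARAM_ORDINARY,      /* no special restriction */
    PARAM_SET_ONCE,      /* may only be set while it is still zero */
    PARAM_NOT_BY_GUEST,  /* may not be set by a guest on its own domain */
    NR_PARAMS
};

struct domain_sketch {
    uint64_t params[NR_PARAMS];
};

/* Returns 0 if the set is allowed, or a negative errno value otherwise. */
static int allow_set_param_sketch(const struct domain_sketch *d,
                                  bool caller_is_target, unsigned int index)
{
    if (index >= NR_PARAMS)
        return -EINVAL;

    /* Some parameters cannot be set by a guest for its own domain. */
    if (index == PARAM_NOT_BY_GUEST && caller_is_target)
        return -EPERM;

    /* Some parameters can only (validly) be set once. */
    if (index == PARAM_SET_ONCE && d->params[index] != 0)
        return -EEXIST;

    return 0;
}
```

Keeping every rule in one function means the set path needs only a single call before touching any state, which is the readability gain the message refers to.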
Paul Durrant [Fri, 24 Apr 2015 12:07:49 +0000 (13:07 +0100)]
x86/hvm: give HVMOP_set_param and HVMOP_get_param their own functions
The level of switch nesting in those ops is getting unreadable. Giving
them their own functions does introduce some code duplication in the
pre-op checks but the overall result is easier to follow.
This patch is code movement. There is no functional change.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 10 Apr 2015 15:26:18 +0000 (11:26 -0400)]
VTd/dmar: Tweak how the DMAR table is clobbered
Rather than clobbering DMAR -> XMAR and back, clobber to RMAD instead. This
means that changing the signature does not alter the checksum, which allows
the clobbering/unclobbering to be performed atomically and idempotently, which
is an advantage on the kexec path which can reenter acpi_dmar_reinstate().
This DMAR clobbering was introduced by 83904107a33c9badc34ecdd1f8ca0f9271e5e370 which claims that the dom0 VT-d
driver was capable of playing with the IOMMU(s) while Xen was also using
them. An alternative approach might be to leave the DMAR table alone
and sprinkle some iomem_deny_access() around to forcibly prevent dom0
from playing but this is simpler.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix> CC: Yang Zhang <yang.z.zhang@intel> Acked-by: Kevin Tian <kevin.tian@intel>
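The checksum argument can be checked with a small sketch. An ACPI table checksum is a byte-wise sum modulo 256, so rewriting the signature with a permutation of the same four bytes (DMAR to RMAD) leaves it unchanged, while DMAR to XMAR does not. The helper names below are made up for the example and are not the functions used in Xen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise sum modulo 256, as used for ACPI table checksums. */
static uint8_t acpi_byte_sum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + table[i]);
    return sum;
}

/*
 * Overwrite the 4-byte signature at the start of the table. Writing a
 * fixed value makes the operation idempotent: doing it twice is harmless.
 */
static void set_signature(uint8_t *table, const char *sig)
{
    memcpy(table, sig, 4);
}
```

Because the sum is unaffected, no separate checksum fix-up is needed after changing the signature, so there is no intermediate state in which the table is inconsistent.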
Andrew Cooper [Mon, 30 Mar 2015 14:20:19 +0000 (15:20 +0100)]
tools/hvmloader: Don't perform AML hotplug debugging in production
It is a number of vmexits and a moderate quantity of qemu logging which can
safely be avoided when not specifically debugging a PCI hotplug issue.
As mk_dsdt is a build system tool, pass 'debug' as a command line parameter
rather than "hardcoding" it via the compilation of mk_dsdt itself.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 7 Apr 2015 17:26:18 +0000 (18:26 +0100)]
x86/link: Introduce and use __bss_end
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 7 Apr 2015 17:26:17 +0000 (18:26 +0100)]
x86/smp: Clean up use of memflags in cpu_smpboot_alloc()
Hoist MEMF_node(cpu_to_node(cpu)) to the start of the function, and avoid
passing (potentially bogus) memflags if node information is not available.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 7 Apr 2015 17:26:16 +0000 (18:26 +0100)]
x86/numa: Correct the extern of cpu_to_node
This was missed by c/s 54ce2db "x86/numa: adjust datatypes for node and pxm"
which changed the array definition in numa.c
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 7 Apr 2015 17:26:15 +0000 (18:26 +0100)]
x86/link: Discard the alternatives ".discard" sections
This appears to have been missed when porting the alternatives framework from
Linux, and saves us a section which is otherwise loaded into memory.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Boris Ostrovsky [Thu, 9 Apr 2015 20:38:43 +0000 (16:38 -0400)]
x86/dom0: Don't allow dom0_max_vcpus to be zero
In case dom0_max_vcpus is incorrectly specified on the boot line, make sure
we will still boot.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liang Li [Tue, 7 Apr 2015 13:27:02 +0000 (21:27 +0800)]
x86/hvm: Fix the unknown nested vmexit reason 80000021 bug
This bug is triggered when an NMI happens in the L2 guest. The current
code handles the NMI incorrectly. According to Intel SDM 31.7.1.2
(Resuming Guest Software after Handling an Exception), if bit 31 of the
IDT-vectoring information field is set, the "virtual NMIs" VM-execution
control is 1, and bits 10:8 of the IDT-vectoring information field are
2, then bit 3 in the interruptibility-state field should be cleared to
prevent the next VM entry from failing.
Signed-off-by: Liang Li <liang.z.li@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
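The condition quoted from the SDM can be written out as a small pure function. This is only a sketch of the bit test described in the message, not the code in the patch: the macro names and `fixup_intr_state()` are invented for the example, and the caller is assumed to have read both fields from the VMCS already.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDT_VECTORING_VALID       (1u << 31)  /* bit 31: field is valid */
#define IDT_VECTORING_TYPE_SHIFT  8           /* bits 10:8: event type */
#define IDT_VECTORING_TYPE_MASK   0x7u
#define EVENT_TYPE_NMI            2u
#define INTR_STATE_NMI_BLOCKING   (1u << 3)   /* bit 3 of interruptibility state */

/* Returns the interruptibility state to use for the next VM entry. */
static uint32_t fixup_intr_state(uint32_t idt_vectoring_info,
                                 bool virtual_nmis, uint32_t intr_state)
{
    uint32_t type = (idt_vectoring_info >> IDT_VECTORING_TYPE_SHIFT) &
                    IDT_VECTORING_TYPE_MASK;

    if ((idt_vectoring_info & IDT_VECTORING_VALID) &&
        virtual_nmis && type == EVENT_TYPE_NMI)
        intr_state &= ~INTR_STATE_NMI_BLOCKING;

    return intr_state;
}
```

For instance, an IDT-vectoring value of 0x80000202 (valid, type 2, vector 2) with virtual NMIs enabled clears bit 3, while any other event type leaves the state untouched.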
Wei Liu [Thu, 9 Apr 2015 18:49:25 +0000 (19:49 +0100)]
libxl: use new QEMU xenstore protocol
Originally both QEMU traditional and QEMU upstream used hardcoded
/local/domain/0 paths. This patch changes the protocol to use
/local/domain/$dm_domid path.
For QEMU traditional and upstream without stubdom, $dm_domid is 0 so
the path is in fact still /local/domain/0.
For QEMU traditional stubdom, this is an incompatible protocol change.
However QEMU traditional is shipped with Xen so we are allowed to make
such a change. This change requires a corresponding QEMU traditional
changeset.
There is no compatibility issue with QEMU upstream stubdom, because QEMU
upstream stubdom doesn't exist yet.
Watch /local/domain/$dm_domid/device-model/$domid/state, wait until
state turns "running" then unpause guest.
LIBXL_STUBDOM_START_TIMEOUT is the timeout used to wait for the stubdom to be
ready. My test on a very old machine (Core 2 6400) showed that it might
need more than 20s before the stubdom is ready to serve DomU.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
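To make the path layout concrete, the following sketch formats the watched node from the two domain IDs. The helper name `dm_state_path()` is hypothetical and is not part of libxl; only the path shape is taken from the message above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build /local/domain/$dm_domid/device-model/$domid/state into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int dm_state_path(char *buf, size_t len,
                         unsigned int dm_domid, unsigned int domid)
{
    int n = snprintf(buf, len, "/local/domain/%u/device-model/%u/state",
                     dm_domid, domid);

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Without a stubdom, dm_domid is 0 and the result is the old hardcoded /local/domain/0 path, which is why the non-stubdom cases remain compatible.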
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:58 +0000 (22:06 +0100)]
x86/hvm: factor out and rename vm_event related functions
To avoid growing hvm.c these functions can be stored separately. Minor style
changes are applied to the logic in the file.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:57 +0000 (22:06 +0100)]
tools/tests: Clean-up tools/tests/xen-access
The spin-lock implementation in the xen-access test program is implemented
in a fashion that is actually incomplete. The x86 assembly that guarantees that
the lock is held by only one thread lacks the "lock;" instruction.
However, the spin-lock is not actually necessary in xen-access as it is not
multithreaded. The presence of the faulty implementation of the lock in a non-
multithreaded environment is unnecessarily complicated for developers who are
trying to follow this code as a guide in implementing their own applications.
Thus, removing it from the code improves the clarity on the behavior of the
system.
Also convert functions that always return 0 to return void, and make
the teardown function actually return an error code on error.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:56 +0000 (22:06 +0100)]
xen: Rename mem_event to vm_event
In this patch we mechanically rename mem_event to vm_event. This patch
introduces no logic changes to the code. Using the name vm_event better
describes the intended use of this subsystem, which is not limited to memory
events. It can be used for off-loading the decision making logic into helper
applications when encountering various events during a VM's execution.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:55 +0000 (22:06 +0100)]
xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
The only use-case of the mem_event_op structure had been in mem_paging,
thus renaming the structure mem_paging_op and relocating its associated
functions clarifies its actual usage.
As part of this fix-up we also convert the gfn's in the toolstack to be
explicitly 64-bit wide and clean the code up a bit.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:54 +0000 (22:06 +0100)]
xen/mem_event: Cleanup mem_event names in rings, functions and domctls
The name of one of the mem_event rings still implies it is used only
for memory accesses, which is no longer the case. It is also used to
deliver various HVM events, thus the name "monitor" is more appropriate
in this setting.
A couple of functions incorrectly labeled as part of mem_event are also renamed
to reflect that they belong to mem_access.
The mem_event subop definitions are also shortened to be more meaningful.
The tool side changes are only mechanical renaming to match these new names.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:53 +0000 (22:06 +0100)]
xen/mem_event: Cleanup of mem_event structures
The public mem_event structures used to communicate with helper applications via
shared rings have been used in different settings. However, the variable names
within this structure have not reflected this fact, resulting in the reuse of
variables to mean different things under different scenarios.
This patch remedies the issue by clearly defining the structure members based on
the actual context within which the structure is used.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Thu, 2 Apr 2015 14:32:22 +0000 (15:32 +0100)]
Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
breaks that. 7e9ec50b, when handling a pty master, violates the
specification of the datacopier interface (as defined).
When the bootloader exits, several things change, all at once:
(a) The master pty fd (held by libxl) starts to signal POLLHUP
and maybe also POLLIN.
(b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
which will be handled by the libxl child process code).
(c) reads on the master pty fd start to return EOF
From the point of view of the datacopier these might happen in any
order. I think there is a latent bug with (c), which I will discuss
later in this email.
In a recent bug report from a FreeBSD installation, the datacopier
gets told about (a) before (b). But 7e9ec50b filters the POLLHUP out,
so that the dc signals eof rather than hup. As a result in
bootloader_copyfail we take the error path.
Olaf Hering [Wed, 1 Apr 2015 13:28:35 +0000 (13:28 +0000)]
hvmloader: add knob for fixed VGABIOS date string
To allow reproducible builds of hvmloader introduce a make variable
VGABIOS_REL_DATE="dd Mon yyyy" to provide a fixed date string. Without
this change the hvmloader binary changes with every rebuild.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Wed, 1 Apr 2015 13:28:34 +0000 (13:28 +0000)]
hvmloader: add knob for fixed SMBIOS date string
To allow reproducible builds of hvmloader introduce a make variable
SMBIOS_REL_DATE=mm/dd/yyyy to provide a fixed date string. Without this
change the hvmloader binary changes with every rebuild.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Apr 2015 13:28:33 +0000 (13:28 +0000)]
INSTALL: mention variables for reproducible builds
Mention two variables introduced by commit ac977f5 ("use more fixed
strings to build the hypervisor").
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Wed, 1 Apr 2015 13:28:32 +0000 (13:28 +0000)]
tools/hotplug: introduce XENSTORED_ARGS= in sysconfig file.
It is already used in the runlevel script and the service file.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
xen/arm: route_irq_to_guest: Check validity of the IRQ
Currently Xen only supports routing SPIs to guests. Add a function
is_assignable_irq to check if we can assign a given IRQ to the guest.
Secondly, make sure the vIRQ is not greater than the number of IRQs
configured in the vGIC and that it is an SPI.
Thirdly, when the IRQ is already assigned to the domain, check the user
is not asking to use a different vIRQ than the one already bound.
Finally, desc->arch.type which contains the IRQ type (i.e. level/edge) must
be correctly configured beforehand. The misconfiguration can happen when:
- the device has been blacklisted for the current platform
- the IRQ has not been described in the device tree
Also, use XENLOG_G_ERR in the error message within the function as it will
be later called from a guest.
Currently, Xen is assuming that the virtual IRQ will always be the same
as IRQ.
Modify route_guest_irq to take the virtual IRQ as a parameter, which allows
Xen to assign a different IRQ number. Also store the vIRQ in the desc
action to easily retrieve the IRQ target when we need to inject the
interrupt.
As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.
At the same time modify the behavior of irq_get_domain. The function now
requires that the irq_desc belongs to an IRQ assigned to a guest.
xen: Extend DOMCTL createdomain to support arch configuration
On ARM the virtual GIC may differ between each guest (emulated GIC version,
number of SPIs...). This information is already known at the domain creation
and can never change.
For now only the gic_version is set. In the long run, there will be more
parameters such as the number of SPIs. All will be required to be set at the
same time.
A new arch-specific structure arch_domainconfig has been created. The x86
one doesn't have any specific configuration, so for now a dummy structure
(C-spec compliant) has been created.
Some external tools (qemu, xenstore) may be required to create a domain.
Rather than asking them to take care of the arch-specific domain
configuration, let the current function (xc_domain_create) choose a
default configuration and introduce a new one (xc_domain_create_config).
This patch also drops the DOMCTL arm_configure_domain previously introduced
in Xen 4.5, as it has been made useless.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com>
MAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Keir Fraser <keir@xen.org>
When a device is marked for passthrough (via the new property
"xen,passthrough"), dom0 must not access the device (i.e. not
load a driver), but should be able to manage the MMIO/interrupt
of the passthrough device.
The latter part will allow the toolstack to map MMIO/IRQ when a device
is passed through to a guest.
The property "xen,passthrough" will be translated as 'status="disabled"'
in the device tree to avoid DOM0 using the device. We assume that DOM0 is
able to cope with this property (already the case for Linux, and
required by ePAPR).
Rework the function map_device (renamed into handle_device) to:
* For a given device node:
- Give permission to manage IRQ/MMIO for this device
- Retrieve the IRQ configuration (i.e edge/level) from the device
tree
* When the device is not marked for guest passthrough:
- Assign the device to the guest if it's protected by an IOMMU
- Map the IRQs and MMIOs regions to the guest
The check to avoid mapping disabled devices in DOM0 was added in
anticipation of device passthrough. But a brand new property will
be added later to mark devices which will be passed through.
Also, remove the memory type check as we already skipped them earlier in
the function via skip_matches.
Furthermore, some platforms (such as the OMAP) may try to poke a device even
if the property "status" is set to "disabled".
Currently the function to translate IRQs from the device tree is set
unconditionally, to be able to retrieve the serial/timer IRQs before the
GIC has been initialized.
It assumes that the xlate function won't ever change. We may also need to
have the primary interrupt controller very early.
Rework the gic initialization in 2 parts:
- gic_preinit: Get the interrupt controller device tree node and set
up GIC and xlate callbacks
- gic_init: Initialize the interrupt controller and the boot CPU
interrupts.
The former function will be called just after the IRQ subsystem has been
initialized.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Frediano Ziglio <frediano.ziglio@huawei.com> Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Said hypercall can take quite a while for large BARs. As such
we can require that the request MUST be broken up
into smaller chunks.
Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invoke the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.
Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).
Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can be provided.
This patch sidesteps this problem by:
- Setting a hard limit of nr_mfns having to be 64 or less.
- Toolstack adjusts correspondingly to the nr_mfn limit.
- If there is an error when adding, the toolstack will call the
remove operation to remove the whole region.
The need to break this hypercall down is that for large BARs it can take
more than the guest's (initial domain usually) time-slice. This has
the negative result that the guest is locked out for a long
duration and is unable to act on any pending events.
We also augment the code to return zero if nr_mfns is zero, instead
of trying to do the hypercall.
This is XSA-125 / CVE-2015-2752.
Suggested-by: Jan Beulich <jbeulich@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
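The toolstack-side behaviour described above (batches of at most 64 frames, whole-region removal on error, early return for an empty request) could look roughly like the sketch below. It does not call libxc: `struct fake_hv` and `fake_mapping_op()` are stand-ins invented so the loop can be exercised in isolation, and `map_region_sketch()` is a hypothetical name.

```c
#include <stdbool.h>

#define MAX_BATCH 64UL  /* per-call limit on frames, as described above */

/* Stand-in for the hypervisor side; it only records what it was asked. */
struct fake_hv {
    unsigned long add_calls;       /* "add" calls received so far */
    unsigned long fail_on_add;     /* 1-based add call that fails, 0 = never */
    unsigned long removed_frames;  /* frames covered by "remove" calls */
};

/* Hypothetical single call: add or remove at most MAX_BATCH frames. */
static int fake_mapping_op(struct fake_hv *hv, unsigned long gfn,
                           unsigned long mfn, unsigned long nr, bool add)
{
    (void)gfn;
    (void)mfn;
    if (nr > MAX_BATCH)
        return -1;                 /* over the hard limit */
    if (!add) {
        hv->removed_frames += nr;
        return 0;
    }
    hv->add_calls++;
    return hv->add_calls == hv->fail_on_add ? -1 : 0;
}

static unsigned long batch_size(unsigned long remaining)
{
    return remaining < MAX_BATCH ? remaining : MAX_BATCH;
}

/* Map nr frames in batches; on any error, remove the whole region. */
static int map_region_sketch(struct fake_hv *hv, unsigned long first_gfn,
                             unsigned long first_mfn, unsigned long nr)
{
    unsigned long done = 0, off;

    if (nr == 0)
        return 0;                  /* nothing to do, skip the call entirely */

    while (done < nr) {
        unsigned long batch = batch_size(nr - done);

        if (fake_mapping_op(hv, first_gfn + done, first_mfn + done,
                            batch, true)) {
            /* Undo from the original start, not just the current round. */
            for (off = 0; off < nr; off += MAX_BATCH)
                fake_mapping_op(hv, first_gfn + off, first_mfn + off,
                                batch_size(nr - off), false);
            return -1;
        }
        done += batch;
    }
    return 0;
}
```

Because the caller keeps the original first_gfn, first_mfn and nr in its own variables, the undo can cover the whole region, which is exactly what an in-hypervisor continuation reusing those input fields could not do.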
Charles Arnold [Tue, 24 Mar 2015 02:55:08 +0000 (20:55 -0600)]
xentop: add support for qdisks
Now that Xen uses qdisks by default and qemu does not write out
statistics to sysfs this patch queries the QMP for disk statistics.
This patch depends on libyajl for parsing statistics returned from
QMP. The runtime requires libyajl 2.0.3 or newer for required bug
fixes in yajl_tree_parse().
Libxl is modified to create a new socket dedicated for the use of
libxenstat for querying the block statistics using QMP.
The current APIs remain unchanged. It works within the existing
framework of libxenstat.
Signed-off-by: Charles Arnold <carnold@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Dario Faggioli [Thu, 26 Mar 2015 08:55:04 +0000 (09:55 +0100)]
libxl: cleanup some misuse of 'cpumap' as parameter
in favour of the more generic 'bitmap', which is better
since these are generic libxl_bitmap_* functions.
Also fix a typo, and remove a stale (and wrong) comment.
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Thu, 26 Mar 2015 08:54:57 +0000 (09:54 +0100)]
libxl: automatically set soft affinity after vnuma info
More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu does not have any
soft affinity, automatically build up one, matching the pcpus
of the said pnode.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Thu, 26 Mar 2015 08:54:48 +0000 (09:54 +0100)]
libxl: check whether vcpu affinity and vnuma info match
More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu also has, in its
(hard or soft) affinity, some pcpus that are not part of the
said pnode, print a warning to the user.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Robbie VanVossen [Tue, 24 Mar 2015 20:48:19 +0000 (16:48 -0400)]
xen/passthrough: Support a single iommu_domain per xen domain per SMMU
If multiple devices are being passed through to the same domain and they
share a single SMMU, then they only require a single iommu_domain.
In arm_smmu_assign_dev, before a new iommu_domain is created, the
xen_domain->contexts is checked for any iommu_domains that are already
assigned to a device that uses the same SMMU as the current device. If one
is found, attach the device to that iommu_domain. If one isn't
found, create a new iommu_domain just like before.
The arm_smmu_deassign_dev function assumes that there is a single
device per iommu_domain. This meant that when the first device was
deassigned, the iommu_domain was freed and when another device was
deassigned a crash occurred in xen.
To fix this, a reference counter was added to the iommu_domain struct.
When an arm_smmu_xen_device references an iommu_domain, the
iommu_domain's ref is incremented. When that reference is removed, the
iommu_domain's ref is decremented. The iommu_domain will only be freed
when the ref is 0.
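The reference counting scheme can be sketched in a few lines. The structure and function names here are illustrative only and do not match the SMMU driver; locking is also omitted, on the assumption that callers already serialise assign and deassign.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for an iommu_domain shared by several devices. */
struct iommu_domain_sketch {
    unsigned int ref;  /* number of devices currently attached */
};

static struct iommu_domain_sketch *domain_alloc(void)
{
    return calloc(1, sizeof(struct iommu_domain_sketch));
}

/* Called when a device is attached to the domain. */
static void domain_get(struct iommu_domain_sketch *d)
{
    d->ref++;
}

/*
 * Called when a device is detached. Frees the domain once the last
 * reference is dropped; returns 1 if it was freed, 0 otherwise.
 */
static int domain_put(struct iommu_domain_sketch *d)
{
    assert(d->ref > 0);
    if (--d->ref == 0) {
        free(d);
        return 1;
    }
    return 0;
}
```

With two devices attached, the first deassign only drops the count to 1, and the domain is freed on the second, which avoids the use-after-free described above.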
Ian Campbell [Mon, 30 Mar 2015 11:12:34 +0000 (12:12 +0100)]
xen: arm: Allow traps from 32 bit userspace on 64 bit hypervisors again
This removes the unconditional #undef injected in response to such
traps which was added by the fixes to CVE-2014-5147 / XSA-102 in c0020e099702 "xen: arm: Handle traps from 32-bit userspace on 64-bit
kernel as undef". We now handle such traps correctly.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:32 +0000 (12:12 +0100)]
xen: arm: handle remaining traps from userspace
CP14 dbg and general CP register access are both handled with
unconditional injection of #undef from their respective handlers, so
allow these even from 32-bit userspace on a 64-bit kernel.
SMC32 and HVC32 should only come from a guest in AArch32 mode and
SMC64 and HVC64 should only come from a guest in AArch64 mode. Add
appropriate BUG_ONs to all cases.
After this bad_trap is no longer used.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:30 +0000 (12:12 +0100)]
xen: arm: Handle CP14 32-bit register accesses from userspace
Accesses to these from 32-bit userspace would cause a hypervisor
exception (host crash) when running a 64-bit kernel, which is worked
around by the fix to XSA-102. On 32-bit kernels they would be
implemented as RAZ/WI which is incorrect but harmless.
Update as follows:
- DBGDSCRINT should be R/O.
- DBGDSCREXT should be EL1 only.
- DBGOSLAR is WO and EL1 only.
- DBGVCR, DBGB[VC]R*, DBGW[VC]R*, and DBGOSDLR are EL1 only.
DBGDIDR and DBGDSCRINT are accessible from EL0 if DBGDSCRext.UDCCdis.
Since we emulate that as RAZ/WI we allow access.
When we do not allow an access we now silently inject an undef even in
debug mode since the debugging messages are not helpful (we have
handled the access, by explicitly choosing not to).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:29 +0000 (12:12 +0100)]
xen: arm: Handle CP15 register traps from userspace
Previously userspace access to PM* would have been incorrectly (but
benignly) implemented as RAZ/WI when running on a 32-bit kernel and
would cause a hypervisor exception (host crash) when running a 64-bit
kernel (this was already solved via the fix to XSA-102).
PMINTENSET, PMINTENCLR are EL1 only, but it is not clear whether
attempts to access from EL0 will trap to EL1 or EL2, so be conservative
and handle EL0 access with an undef injection.
ACTLR is EL1 only and the ARM ARM states that HCR_EL2.TACR causes
accesses from EL1 to trap. However remain conservative even here and
handle accesses from EL0 by injecting an undef.
PMUSERENR is R/O at EL0 and we implement as RAZ/WI at EL1 as before.
The remaining PM* registers are accessible to EL0 only if
PMUSERENR_EL0.EN is set, since we emulate this as RAZ/WI the bit is
never set so we inject a trap on attempted access. We weren't
previously handling PMCCNTR.
HSR_EC_CP15_32 should never be seen from a 64-bit guest, so BUG_ON if
that occurs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:28 +0000 (12:12 +0100)]
xen: arm: drop cache maintenance by set/way trap handling
We do not set HCR_EL2.TSW so we will never see these.
This is undoubtedly wrong, but for now remove the dead code.
However, retain the HSR_SYSREG_* added by the precursor to this patch,
although they aren't used they are factually accurate and may as well
be kept for future use.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:24 +0000 (12:12 +0100)]
xen: arm: handle accesses to CNTP_CVAL_EL0
All OSes we have run on top of Xen use CNTP_TVAL_EL0 but for
completeness we really should handle CVAL too.
In vtimer_emulate_cp64 pull the propagation of the 64-bit result into
two 32-bit registers out of the switch to avoid duplicating it for every
register. We also need to initialise x now since previously the only
implemented register was R/O.
While adding HSR_SYSREG_CNTP_CVAL_EL0 also move
HSR_SYSREG_CNTP_CTL_EL0 so it is sorted correctly.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Mon, 30 Mar 2015 11:12:23 +0000 (12:12 +0100)]
xen: arm: correctly handle vtimer traps from userspace
Previously 32-bit userspace on 32-bit kernel and 64-bit userspace on
64-bit kernel could access these registers irrespective of whether the
kernel had configured them to be allowed to. To fix this:
- Userspace access to CNTP_CTL_EL0 and CNTP_TVAL_EL0 should be gated
on CNTKCTL_EL1.EL0PTEN.
- Userspace access to CNTPCT_EL0 should be gated on
CNTKCTL_EL1.EL0PCTEN.
When we do not handle an access we now silently inject an undef even
in debug mode since the debugging messages are not helpful (we have
handled the access, by explicitly choosing not to).
The usermode accessibility check is rather repetitive, so a helper
macro is introduced.
Since HSR_EC_CP15_64 cannot be taken from a guest in AArch64 mode
except due to a hardware bug, switch the associated check to a BUG_ON
(which will be switched to something more appropriate in a subsequent
patch).
Fix a coding style issue in HSR_CPREG64(CNTPCT) while touching similar
code.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
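The gating rule from the list above can be expressed as a small predicate. This is a sketch rather than the helper macro the patch introduces: the enum and function names are invented, and the bit positions (EL0PCTEN at bit 0, EL0PTEN at bit 9 of CNTKCTL_EL1) are stated from my reading of the architecture and should be checked against the ARM ARM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CNTKCTL_EL1 bit positions; verify against the ARM ARM. */
#define CNTKCTL_EL0PCTEN (1u << 0)  /* EL0 access to the physical counter */
#define CNTKCTL_EL0PTEN  (1u << 9)  /* EL0 access to the physical timer */

enum timer_reg { REG_CNTP_CTL, REG_CNTP_TVAL, REG_CNTPCT };

/*
 * Returns true if the trapped access should be emulated, false if an
 * undef should be injected instead.
 */
static bool timer_access_allowed(bool from_user, uint32_t cntkctl,
                                 enum timer_reg reg)
{
    if (!from_user)
        return true;  /* kernel accesses are not gated by these bits */

    switch (reg) {
    case REG_CNTP_CTL:
    case REG_CNTP_TVAL:
        return (cntkctl & CNTKCTL_EL0PTEN) != 0;
    case REG_CNTPCT:
        return (cntkctl & CNTKCTL_EL0PCTEN) != 0;
    }
    return false;
}
```

A single predicate like this keeps each register's case in the emulation switch to one line, which is the repetition the helper macro in the patch is meant to remove.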
Ian Campbell [Mon, 30 Mar 2015 11:12:22 +0000 (12:12 +0100)]
xen: arm: Factor out psr_mode_is_user
This embodies the logic on arm64 that userspace can be either 32-bit
or 64-bit. It will be used in other places shortly.
Note that the logic differs slightly because the original (in
inject_abt64_exception) knew that the kernel was 64-bit and could
therefore assume that any 32-bit mode was userspace. Instead the
refactored code explicitly checks for usr mode.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
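The difference between "any 32-bit mode" and "explicitly usr mode" is easy to see in a sketch. The constants below follow the usual PSR mode encodings (AArch64 EL0t is 0x00, AArch32 usr is 0x10, and bit 4 marks a 32-bit mode), but the macro and function names are made up for the example and are not the ones in the Xen headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_MASK   0x1fu  /* PSR mode field, bits 4:0 */
#define MODE_BIT_32 0x10u  /* set for every AArch32 mode */
#define MODE_EL0T   0x00u  /* AArch64 userspace */
#define MODE_USR    0x10u  /* AArch32 userspace */

static bool mode_is_32bit(uint32_t psr)
{
    return (psr & MODE_BIT_32) != 0;
}

/* True for 64-bit EL0 and for 32-bit usr mode, and for nothing else. */
static bool mode_is_user(uint32_t psr)
{
    uint32_t mode = psr & MODE_MASK;

    return mode == MODE_EL0T || mode == MODE_USR;
}
```

A 32-bit kernel running in svc mode (0x13) is a 32-bit mode but not userspace, which is the case the original shortcut could not distinguish.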
Boris Ostrovsky [Mon, 30 Mar 2015 20:17:59 +0000 (16:17 -0400)]
flask: Update XEN_SYSCTL_cputopoinfo name
Commit 2090f14c5cbd ("sysctl: make XEN_SYSCTL_topologyinfo sysctl a
little more efficient") renamed XEN_SYSCTL_topologyinfo to
XEN_SYSCTL_cputopoinfo.
It, however, neglected to update this macro for flask-related files.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxc: Introduce xc_domain_nr_gpfns as a cousin of xc_domain_maximum_gpfn.
The commit a8f8a590e02d2d2b717257c0bd9a8b396103bdf4
"libxc: Check xc_domain_maximum_gpfn for negative return values"
introduced a regression in tools outside libxc (migrate v2)
which wanted the unfiltered GPFN value. Said commit added
a wrapper which added +1 if there were no errors.
To make it work pre-commit a8f8a59 we add an xc_domain_nr_gpfns
which will add +1 if there are no errors (and change all in-tree
callers to use it). The xc_domain_maximum_gpfn will return the
unfiltered GPFN value.
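The relationship between the two calls can be sketched as follows. The names maximum_gpfn() and nr_gpfns(), the pfn_t type and the stubbed return values are hypothetical stand-ins for illustration, not the libxc prototypes.

```c
#include <stdint.h>

typedef uint64_t pfn_t; /* hypothetical stand-in for the real PFN type */

/*
 * Hypothetical stub for the raw query: stores the highest GPFN and
 * returns 0, or returns a negative value on error without touching *gpfn.
 */
static int maximum_gpfn(int domid, pfn_t *gpfn)
{
    if (domid < 0)
        return -1;

    *gpfn = 0xffff;
    return 0;
}

/* The "cousin": converts the highest GPFN into a count, only on success. */
static int nr_gpfns(int domid, pfn_t *gpfns)
{
    int rc = maximum_gpfn(domid, gpfns);

    if (rc >= 0)
        *gpfns += 1;

    return rc;
}
```

Callers that want the raw maximum keep using the unfiltered call, and callers that want a size use the wrapper.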
Suggested-by: Ian Campbell <ian.campbell@citrix.com> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-IDs: 1291939 (stray semicolon), 1291941 (structurally dead code) CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Xen Coverity Team <coverity@xen.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Thu, 26 Mar 2015 10:54:04 +0000 (10:54 +0000)]
xen: arm: correctly handle continuations for 64-bit guests
The 64-bit ABI is different to 32-bit:
- uses x16 as the op register rather than r12.
- arguments in x0..x5 and not r0..r5. Using rN here potentially
truncates.
- return value goes in x0, not r0.
Hypercalls can only be made directly from kernel space, so checking
the domain's size is sufficient.
Spotted due to spurious -EFAULT when destroying a domain, due to the
hypercall's pointer argument being truncated. I'm unclear why I am
only seeing this now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Ian Jackson [Tue, 10 Feb 2015 20:09:49 +0000 (20:09 +0000)]
libxl: Comment cleanups
* Add two comments in libxl_remus_disk_drbd documenting buggy handling
of the hotplug script exit status.
* Add a section heading for async exec in libxl_aoutils.c
* Mention the right function name (libxl__ev_child_fork, not
libxl__ev_fork) in libxl_internal.h
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Yang Hongyang <yanghy@cn.fujitsu.com> CC: Wen Congyang <wency@cn.fujitsu.com> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 10 Feb 2015 20:09:48 +0000 (20:09 +0000)]
libxl: Further fix exit paths from libxl_device_events_handler
On the success path, do not call GC_FREE explicitly. Instead, call
AO_INPROGRESS.
GC_FREE will free the gc underlying the long-term ao, which is then
subsequently referenced in backend_watch_callback's call to
libxl__nested_ao_create. It is a miracle that this ever works at all.
Also, add an `if (rc) goto out;' after the xswatch registration.
After this, libxl_device_events_handler has the conventional and
correct ao initiation pattern.
Olaf Hering [Tue, 24 Mar 2015 14:37:42 +0000 (14:37 +0000)]
tools/mkrpm: improve version.release handling
An increasing version and/or release number helps to update existing
packages without --force as in "rpm -Uvh --force xen.rpm". Instead it's
possible to do "rpm -Fvh *.rpm" to update only already installed
packages.
The usage of --force disables essential checks such as file conflict
detection. As a result the new xen.rpm may overwrite files owned by
other packages.
With the current way of calculating version-release it is difficult to
get an increasing release number into the spec file. The release is
always zero unless "make XEN_VENDORVERSION=`date +.%s`" is used,
which has the bad side effect that xen.gz always gets a different
filename every time.
Update mkrpm to recognize PKG_RELEASE=. Its value will be appended to
the Release string. It can be filled with a time stamp, like:
make rpmball PKG_RELEASE="`date +%Y%m%d%H%M%S`"
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Tested-by: George Dunlap <george.dunlap@eu.citrix.com>
Olaf Hering [Fri, 27 Mar 2015 10:29:24 +0000 (10:29 +0000)]
hotplug/Linux: add missing backslash in dom0_ip
Without it the actual error message is not written to xenstore.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Thu, 26 Mar 2015 18:08:44 +0000 (14:08 -0400)]
libxc: Make conversion from page count to bytes 32-bit safe
Commit ba59e2ce935d ("libxc: allocate memory with vNUMA information for
PV guest") creates default vNUMA layout with a single range containing
all memory. The end of the range is calculated by shifting
dom->total_pages by 12 to the left.
On 32-bit dom0 this may result in losing upper bits since total_pages is
a 32-bit type.
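The truncation can be illustrated with a small sketch. It assumes a 32-bit page count, modelled here as uint32_t, and the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Problematic form: the shift is evaluated in 32 bits, so upper bits are lost. */
static uint64_t pages_to_bytes_truncating(uint32_t pages)
{
    return (uint32_t)(pages << PAGE_SHIFT);
}

/* 32-bit safe form: widen to 64 bits first, then shift. */
static uint64_t pages_to_bytes(uint32_t pages)
{
    return (uint64_t)pages << PAGE_SHIFT;
}
```

For 0x200000 pages (8 GiB) the first form yields 0 while the second yields 0x200000000.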
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 27 Mar 2015 14:23:25 +0000 (15:23 +0100)]
VT-d: improve fault info logging
I got repeatedly annoyed by nothing getting logged by
default on VT-d faults (and hence having to tell people to add extra
command line options), and hence I think it is time to redo this code:
Log basic fault information at guest-warning level (rate limited by
default), and show the page walk in verbose rather than only in debug
mode. Break up multi-line message so that each gets a proper log level
attached, at once splitting out the common part. Also don't log
"unknown" faults as interrupt-remapping ones.
As a minor cleanup fix the type of the involved "fault_type" variables.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
Jan Beulich [Thu, 26 Mar 2015 10:23:33 +0000 (11:23 +0100)]
x86: simplify non-atomic bitops
- being non-atomic, their pointer arguments shouldn't be volatile-
qualified
- their (half fake) memory operands can be a single "+m" instead of
being both an output and an input
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 26 Mar 2015 10:19:57 +0000 (11:19 +0100)]
x86/MSI: fix error handling
__setup_msi_irq() needs to undo what it did before calling
write_msi_msg() in case that returned an error.
map_domain_pirq() needs to get rid of the MSI descriptor it
(implicitly) allocated. The case of a setup_msi_irq() failure on a
non-initial multi-vector-MSI interrupt needs special handling: While
the initial IRQ will get freed by the caller (who also passed it to
us), we need to undo the effect setup_msi_irq() had on it. (As a
benefit from the added call to msi_free_irq() we no longer need to
explicitly call destroy_irq() on the non-initial slots.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
JeHyeon Yeon [Thu, 26 Mar 2015 10:19:10 +0000 (11:19 +0100)]
LZ4 : fix the data abort issue
If part of the compressed data is corrupted, or the compressed
data is entirely fake, memory accesses beyond the limit are possible.
This is the log from my system using lz4 decompression.
[6502]data abort, halting
[6503]r0 0x00000000 r1 0x00000000 r2 0xdcea0ffc r3 0xdcea0ffc
[6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce80000
[6515]r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0xb9a98000
[6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc 0x820149bc
[6528]spsr 0x400001f3
and the memory addresses of some variables at the moment are
ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000
As you can see, COPYLENGTH is 8 bytes, so @ref and @op can access memory
beyond @oend.
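The arithmetic behind the abort can be checked with a short sketch. The helper copy_overruns() is hypothetical and only models the addresses from the log above; it is not the check added to the decompressor.

```c
#include <stdbool.h>
#include <stdint.h>

#define COPYLENGTH 8u

/*
 * Hypothetical helper: would a COPYLENGTH-byte access starting at addr
 * run past the end of the output buffer (oend)?
 */
static bool copy_overruns(uint32_t addr, uint32_t oend)
{
    return addr > oend || oend - addr < COPYLENGTH;
}
```

With op at 0xdcea0ffc and oend at 0xdcea1000 only 4 bytes remain, so an 8-byte copy crosses oend.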
Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com> Reviewed-by: David Sterba <dsterba@suse.cz>
[Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 26 Mar 2015 10:18:28 +0000 (11:18 +0100)]
x86: don't change affinity with interrupt unmasked
With ->startup unmasking the IRQ, setting the affinity afterwards
without masking the IRQ again is invalid namely for MSI (address and
data can't be updated atomically and may - at least for MSI-X - be
cached while unmasked).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 26 Mar 2015 10:17:51 +0000 (11:17 +0100)]
hvmloader: don't treat ROM BAR like other BARs
Its low 11 bits have different meaning.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Thu, 26 Mar 2015 10:13:01 +0000 (11:13 +0100)]
sysctl: don't overwrite array size variable when it is set on error earlier
When querying CPU topology, if caller-provided array size is smaller than
number of online CPUs then, in addition to returning -ENOBUFS, sysctl is
expected to provide back this number. However, this value, stored in 'i',
is overwritten in the subsequent loop's control statement.
Make sure we don't do this by converting the loop to 'while'.
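A simplified model of the pattern, assuming a hypothetical query_topology() helper and using the CPU index as stand-in data. It is a sketch of the idea rather than the sysctl code itself.

```c
#include <errno.h>

/*
 * Hypothetical model: fill in per-CPU data when the caller's array is
 * large enough, and always report the number of online CPUs in *count.
 */
static int query_topology(unsigned int size, unsigned int online,
                          unsigned int *out, unsigned int *count)
{
    unsigned int i = 0;
    int ret = 0;

    if (size < online) {
        ret = -ENOBUFS;
        i = online;     /* the number the caller must be told */
    }

    /* A "for (i = 0; ...)" here would reset i and lose that number. */
    while (i < online) {
        out[i] = i;     /* stand-in for the real per-CPU data */
        i++;
    }

    *count = i;
    return ret;
}
```

On the error path the loop body never runs, so the required size survives in 'i'.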
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell [Thu, 26 Mar 2015 10:09:31 +0000 (11:09 +0100)]
arm: use gprintk as appropriate
gdprintk is now only included with debug=y builds. Therefore:
- switch some uses to gprintk
- remove some now redundant #ifndef NDEBUG surrounding existing
gdprintk uses.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Jan Beulich [Thu, 26 Mar 2015 10:08:28 +0000 (11:08 +0100)]
introduce gprintk()
... and convert several gdprintk()-s to it, as the next patch will make
them no-ops in non-debug builds.
Note that as a non-debug facility this does not print file name and
line number of the origin, so people are expected to use meaningful and
easily distinguishable messages (i.e. just like with plain printk()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Thu, 26 Mar 2015 10:05:01 +0000 (11:05 +0100)]
add flag to start info regarding virtual mapped p2m list
Xen pv domains are using a domain private p2m list to convert guest pfns
to mfns. This p2m list has to be updated by the Xen tools during domain
restore and migration, as the mfns will most likely change. In order to
locate the p2m list the Xen tools need an interface provided by the
guest. Up to now this interface has been the shared info page where the
guest would store the mfn of the top level page of a 3-level p2m tree.
This p2m tree is fixed in its layout and due to the limitation of
entries it can hold at each level it is limiting the maximum size of the
p2m list which can be reported to the Xen tools. The maximum memory the
p2m tree can support for 64 bit domains is 512 GB (32 bit domains don't
have a problem, as the p2m tree limit is much higher than the supported
domain size of 64 GB).
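The 512 GB figure can be reproduced with a short calculation. The sketch assumes 4 KiB pages and one machine-word-sized entry per slot at every level (8 bytes for 64-bit domains, 4 bytes for 32-bit domains); these sizes are assumptions and are not stated above.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/*
 * Bytes of guest memory a 3-level tree of page-sized tables can describe,
 * where each leaf entry maps one page. entry_bytes is the assumed size
 * of one entry.
 */
static uint64_t p2m_tree_limit(unsigned int entry_bytes)
{
    uint64_t per_page = PAGE_SIZE / entry_bytes;

    return per_page * per_page * per_page * PAGE_SIZE;
}
```

Under these assumptions p2m_tree_limit(8) is 512 GiB, and p2m_tree_limit(4) is 4 TiB, well above the 64 GB supported for 32-bit domains.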
In order to be able to support pv domains with more than 512 GB an
additional way to specify the p2m list for the Xen tools has been added:
instead of a tree structure linked via mfns, the virtual address of a
linear p2m list, the cr3 value of the related address space and the size
of the p2m list can be specified by the guest (added by commit 50bd1f0825339dfacde471df7664729216fc46e3).
Guests implementing this new interface need to know, of course, whether
the Xen tools are capable of using the new interface instead of the old
p2m tree interface. Otherwise a guest using only the new interface with
the virtual mapped linear p2m list on a machine with old Xen tools not
supporting this interface could not be restored or migrated.
The added flag in the start info indicates the Xen tools' capability to
use the new interface enabling the guest to omit the p2m tree and thus
to support more than 512 GB of RAM.
Vijaya Kumar K [Tue, 24 Mar 2015 11:44:47 +0000 (17:14 +0530)]
xen: Add populate_pt_range interface to reserve non-leaf level table entries
On x86, for the pages mapped with PAGE_HYPERVISOR attribute
non-leaf page tables are allocated with valid pte entries,
and with MAP_SMALL_PAGES attribute only non-leaf page tables are
allocated with invalid (valid bit set to 0) pte entries.
However on arm this is not the case. On arm for the pages
mapped with PAGE_HYPERVISOR and MAP_SMALL_PAGES both
non-leaf and leaf level page table are allocated with valid bit
in pte entries.
This behaviour in arm makes common vmap code fail to
allocate memory beyond 128MB as described below.
In vm_init, map_pages_to_xen() is called for mapping
vm_bitmap. Initially one page of vm_bitmap is allocated
and mapped using PAGE_HYPERVISOR attribute.
For the rest of vm_bitmap pages, MAP_SMALL_PAGES attribute
is used to map.
In ARM for both PAGE_HYPERVISOR and MAP_SMALL_PAGES, valid bit
is set to 1 in pte entry for these mapping.
In vm_alloc(), map_pages_to_xen() is failing for >128MB because
for this next vm_bitmap page the mapping is already set in vm_init()
with valid bit set in pte entry. So map_pages_to_xen() in
ARM returns error.
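The 128MB boundary follows from the size of the first vm_bitmap page. The sketch below assumes 4 KiB pages and one bitmap bit per page of virtual address space; the helper name is hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/*
 * Bytes of virtual address space tracked by the given number of bitmap
 * pages, assuming one bit per 4 KiB page.
 */
static uint64_t vmap_bytes_covered(unsigned int bitmap_pages)
{
    return bitmap_pages * PAGE_SIZE * 8 * PAGE_SIZE;
}
```

One bitmap page holds 32768 bits, so the first allocation that needs the second bitmap page is the one crossing 128 MiB.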
With this patch, MAP_SMALL_PAGES is dropped and arch specific
populate_pt_range() api is introduced to populate non-leaf
page table entries for the requested pages. Added RESERVE option
to map non-leaf page table entries.
Signed-off-by: Vijaya Kumar K<Vijaya.Kumar@caviumnetworks.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- rewrote subject line ]
Wei Liu [Fri, 20 Mar 2015 16:19:12 +0000 (16:19 +0000)]
libxl: use new QEMU xenstore protocol
Originally both QEMU traditional and QEMU upstream used hardcoded
/local/domain/0 paths. This patch changes the protocol to use
/local/domain/$dm_domid path.
For QEMU traditional and upstream without stubdom, $dm_domid is 0 so
the path is in fact still /local/domain/0.
For QEMU traditional stubdom, this is an incompatible protocol change.
However QEMU traditional is shipped with Xen so we are allowed to make
such a change. This change needs to work with corresponding QEMU
traditional changeset.
There is no compatibility issue with QEMU upstream stubdom, because QEMU
upstream stubdom doesn't exist yet.
Watch /local/domain/$dm_domid/device-model/$domid/state, wait until
state turns "running" then unpause guest.
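How the watched path is formed can be sketched with plain string formatting. The helper names dm_state_path() and dm_is_running() are hypothetical and are not libxl functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the device-model state path for a guest. */
static int dm_state_path(char *buf, size_t len,
                         unsigned int dm_domid, unsigned int domid)
{
    return snprintf(buf, len, "/local/domain/%u/device-model/%u/state",
                    dm_domid, domid);
}

/* Hypothetical helper: has the device model reported that it is running? */
static int dm_is_running(const char *state)
{
    return state != NULL && strcmp(state, "running") == 0;
}
```

Without a stubdom dm_domid is 0, so the result is the same /local/domain/0 path as before.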
LIBXL_STUBDOM_START_TIMEOUT is the timeout used to wait for stubdom to be
ready. My test on a very old machine (Core 2 6400) showed that it might
need more than 20s before the stubdom is ready to serve DomU.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Tue, 24 Mar 2015 08:27:00 +0000 (09:27 +0100)]
x86: don't use BAD_APICID for non-APICID fields
BAD_APICID is used by cpuinfo_x86's phys_proc_id, cpu_core_id
and compute_unit_id even though these fields don't hold an APIC ID
itself but rather its derivative.
Provide appropriate macros for each of those three (and make them
unsigned).
This also fixes a regression introduced by commit 2090f14c5cbd ("sysctl:
make XEN_SYSCTL_topologyinfo sysctl a little more efficient") which
leaked BAD_APICID into common code, breaking ARM.
Reported-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Ditch INVALID_{CORE,SOCKET}ID in favor of always using
XEN_INVALID_{CORE,SOCKET}_ID.
Boris Ostrovsky [Tue, 24 Mar 2015 08:23:54 +0000 (09:23 +0100)]
pci: include asm/numa.h in pci.h
Commit 4fa6b0bacf9c ("pci: stash device's PXM information in struct
pci_dev") added node field to xen/include/xen/pci.h. Its type,
nodeid_t, is defined in asm/numa.h and we should include this file
explicitly in pci.h.
Reported-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Tue, 24 Mar 2015 08:23:00 +0000 (09:23 +0100)]
x86: support newer Intel CPU models
This just follows what the January 2015 edition of the SDM documents,
with additional clarification from Intel:
- Broadwell models 0x4f and 0x56 don't cross-reference other tables,
but should be treated like other Broadwell (0x3d),
- Xeon Phi model 0x57 lists LASTBRANCH_TOS but not where the actual
stack is. Being told it's Silvermont based, attach it there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
acpi_disabled needs to be moved out of .init.data.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>