xenbits.xensource.com Git - xen.git/log
10 years ago libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:29 +0000 (16:02 -0400)]
libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.

There is no sense in trying to online (or offline) CPUs when the size of
cpumap is greater than the maximum number of VCPUs the guest can go to.

As such fail the operation if the count of CPUs to online is greater
than what the guest started with. For the offline case we do not
check (as the bits are unset in the cpumap) and let it go through.

We coalesce some of the underlying libxl_set_vcpuonline code
together which was duplicated in QMP and XenStore codepaths.
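
The check described above can be sketched as follows. This is only an
illustration: the byte-array map layout and both function names are
hypothetical and are not the libxl API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits set in a byte-array CPU map (hypothetical layout). */
unsigned int cpumap_count_set(const uint8_t *map, size_t nbytes)
{
    unsigned int count = 0;

    for (size_t i = 0; i < nbytes; i++)
        for (uint8_t b = map[i]; b; b &= (uint8_t)(b - 1))
            count++;            /* each iteration clears the lowest set bit */
    return count;
}

/* Return 0 if the request may proceed, -1 if it asks for too many VCPUs. */
int check_vcpuonline_request(const uint8_t *map, size_t nbytes,
                             unsigned int max_vcpus)
{
    return cpumap_count_set(map, nbytes) > max_vcpus ? -1 : 0;
}
```

An all-zero map (the pure offline case) always passes, matching the
"let it go through" behaviour described for offlining.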

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: Add ERROR_DOMAIN_NOTFOUND for libxl_domain_info when it cannot find the domain
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 20:02:28 +0000 (16:02 -0400)]
libxl: Add ERROR_DOMAIN_NOTFOUND for libxl_domain_info when it cannot find the domain

And use that for all of its callers in the tree.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
Ian Jackson [Tue, 7 Apr 2015 13:05:28 +0000 (14:05 +0100)]
libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof

Some operating systems (including Linux and FreeBSD[1]) signal not
(only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or
without POLLIN).  This is permitted[2].  The implications are that in
the general case it is not possible to determine whether POLLHUP
indicates an error or simply eof without attempting a read.

Datacopiers mishandle this, because they always treat POLLHUP
exceptionally (either reporting it via callback_pollhup, or treating
it as an error).  Datacopiers reading from pipes on such OSs can fail
(perhaps leaving some data unprocessed) rather than completing
successfully.

[1] http://www.greenend.org.uk/rjk/tech/poll.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Distinguishing POLLHUP is needed for pty fds, but most callers in
libxl do not care about POLLHUP except as an error or eof condition.

So change the datacopier semantics so that if callback_pollhup is not
specified we treat POLLHUP almost like POLLIN.  The difference is that
if we get HUP from poll, but EWOULDBLOCK from read, we must signal an
error rather than attempting the read again.
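
A minimal sketch of these semantics, written as two pure decision
functions so that the rule is easy to see.  The enum and function names
are hypothetical and do not come from libxl; POLLERR and POLLNVAL are
deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>

enum dc_step {
    DC_REPORT_POLLHUP, DC_READ, DC_WAIT,    /* results of dc_on_poll() */
    DC_DATA, DC_EOF, DC_RETRY, DC_ERROR     /* results of dc_on_read() */
};

/* Decide what to do with the revents that poll() reported for the read fd. */
enum dc_step dc_on_poll(short revents, bool have_pollhup_callback)
{
    if (revents & POLLHUP)
        /* Without a callback, POLLHUP is handled almost like POLLIN. */
        return have_pollhup_callback ? DC_REPORT_POLLHUP : DC_READ;
    if (revents & POLLIN)
        return DC_READ;
    return DC_WAIT;
}

/* Interpret the result of the read() that followed. */
enum dc_step dc_on_read(ssize_t r, int err, short revents)
{
    if (r > 0)
        return DC_DATA;
    if (r == 0)
        return DC_EOF;
    if (err == EWOULDBLOCK || err == EAGAIN)
        /* HUP from poll but nothing readable: signal an error, do not retry. */
        return (revents & POLLHUP) ? DC_ERROR : DC_RETRY;
    return DC_ERROR;
}
```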

This fixes the problem which 7e9ec50b0535 was aimed at.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago libxl: datacopier: Avoid eof/POLLHUP race
Ian Jackson [Tue, 7 Apr 2015 13:05:27 +0000 (14:05 +0100)]
libxl: datacopier: Avoid eof/POLLHUP race

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code).
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.

(c) can be detected only after a previous POLLIN without POLLHUP and
that previous POLLIN would be associated with data which was read,
which must therefore have ended up in the dc's buffer.  But nothing
stops the dc from writing that data into the output fd and reporting
eof before it calls poll again.

This race is unlikely, but nevertheless it should be fixed.

We solve the race with a poll of the reading fd, to double-check, when
we detect eof via read.  (This is only necessary if the caller has
specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
treated as eof anyway.)
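
The double-check can be sketched as a non-blocking poll of the reading
fd after read() has returned 0.  The function name is hypothetical and
this is not the libxl code; it only illustrates the idea.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * After read() returned 0, poll the same fd once more without blocking.
 * Returns 1 if POLLHUP is now pending, 0 if not, -1 if poll() failed.
 */
int recheck_hup_after_eof(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);      /* timeout 0: return immediately */

    if (rc < 0)
        return -1;
    return (pfd.revents & POLLHUP) ? 1 : 0;
}
```

A caller that has specified callback_pollhup would, under this sketch,
report hup instead of eof when the function returns 1.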

With a testing patch supplied by me, Roger Pau Monné has reproduced
the failure on FreeBSD and verified that this patch fixes the problem.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
10 years ago x86/vMSI-X: add valid bits for read acceleration
Jan Beulich [Tue, 14 Apr 2015 14:51:18 +0000 (16:51 +0200)]
x86/vMSI-X: add valid bits for read acceleration

Again because Xen doesn't get to see all guest writes, it shouldn't
serve reads from its cache before having seen a write to the respective
address.

Also use DECLARE_BITMAP() in a related field declaration instead of
open coding it.
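
The idea of valid bits can be sketched like this: a cached value is
served only if a write to that slot has been observed.  The structure,
its size and the function names are hypothetical, not the Xen ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CACHED 64            /* hypothetical number of cached slots */

struct reg_cache {
    uint32_t value[NR_CACHED];
    uint64_t valid;             /* bit i is set once a write to slot i was seen */
};

void cache_write(struct reg_cache *c, unsigned int idx, uint32_t v)
{
    assert(idx < NR_CACHED);
    c->value[idx] = v;
    c->valid |= UINT64_C(1) << idx;
}

/* Returns true and fills *out only if the slot has been written before. */
bool cache_read(const struct reg_cache *c, unsigned int idx, uint32_t *out)
{
    assert(idx < NR_CACHED);
    if (!(c->valid & (UINT64_C(1) << idx)))
        return false;           /* caller must fall back to the slow path */
    *out = c->value[idx];
    return true;
}
```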

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86/vMSI-X: honor all mask requests
Jan Beulich [Tue, 14 Apr 2015 14:50:35 +0000 (16:50 +0200)]
x86/vMSI-X: honor all mask requests

Commit 74fd0036de ("x86: properly handle MSI-X unmask operation from
guests") didn't go far enough: it fixed an issue with unmasking, but
left an issue with masking in place: Due to the (late) point in time
when qemu requests the hypervisor to set up MSI-X interrupts (which is
where the MMIO intercept gets put in place), the hypervisor doesn't
see all guest writes, and hence shouldn't make assumptions on the state
the virtual MSI-X resources are in. Bypassing the rest of the logic on
a guest mask operation leads to

[00:04.0] pci_msix_write: Error: Can't update msix entry 1 since MSI-X is already enabled.

which surprisingly enough doesn't lead to the device not working
anymore (I didn't dig in deep enough to figure out why that is). But it
does prevent the IRQ from being migrated inside the guest, i.e. all
interrupts will always arrive in vCPU 0.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86: use real assert frames for ASSERT_INTERRUPTS_{EN,DIS}ABLED
Andrew Cooper [Tue, 14 Apr 2015 13:29:19 +0000 (15:29 +0200)]
x86: use real assert frames for ASSERT_INTERRUPTS_{EN,DIS}ABLED

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86: infrastructure to create BUG_FRAMES in asm code
Andrew Cooper [Tue, 14 Apr 2015 13:07:24 +0000 (15:07 +0200)]
x86: infrastructure to create BUG_FRAMES in asm code

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago x86: set regs->entry_vector for early_page_fault
Don Slutz [Tue, 14 Apr 2015 13:03:27 +0000 (15:03 +0200)]
x86: set regs->entry_vector for early_page_fault

This changes:

(XEN) Early fatal page fault at e008:ffff82d080164252 (cr2=0000000000000000, ec=0000)
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
...
(XEN) Xen call trace:
(XEN)    [<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
(XEN)    [<ffff82d080105262>] domain_create+0x384/0x556
(XEN)    [<ffff82d0802a0de4>] scheduler_init+0x1c4/0x244
(XEN)    [<ffff82d0802be359>] __start_xen+0x1d0e/0x22a1
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x58
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL TRAP: vector = 0 (divide error)
(XEN) [error_code=0000] , IN INTERRUPT CONTEXT
(XEN) ****************************************
...

to:

(XEN) Early fatal page fault at e008:ffff82d080164252 (cr2=0000000000000000, ec=0000)
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
...
(XEN) Xen call trace:
(XEN)    [<ffff82d080164252>] arch_domain_create+0x3e/0x4ef
(XEN)    [<ffff82d080105262>] domain_create+0x384/0x556
(XEN)    [<ffff82d0802a0de4>] scheduler_init+0x1c4/0x244
(XEN)    [<ffff82d0802be359>] __start_xen+0x1d0e/0x22a1
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x58
(XEN)
(XEN) Faulting linear address: 0000000000000000
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000083a1a6063 ffffffffffffffff
(XEN)  L3[0x000] = 000000083a1a5063 ffffffffffffffff
(XEN)  L2[0x000] = 000000083a1a4063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL TRAP: vector = 14 (page fault)
(XEN) [error_code=0000] , IN INTERRUPT CONTEXT
(XEN) ****************************************
...

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86/mtrr: include asm/atomic.h
David Vrabel [Tue, 14 Apr 2015 13:02:32 +0000 (15:02 +0200)]
x86/mtrr: include asm/atomic.h

asm/atomic.h is needed but only included indirectly via
asm/spinlock.h.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86/hvm: don't include asm/spinlock.h
David Vrabel [Tue, 14 Apr 2015 13:02:10 +0000 (15:02 +0200)]
x86/hvm: don't include asm/spinlock.h

asm/spinlock.h should not be included directly.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago Revert "x86/hvm: wait for at least one ioreq server to be enabled"
Wei Liu [Tue, 14 Apr 2015 13:01:14 +0000 (15:01 +0200)]
Revert "x86/hvm: wait for at least one ioreq server to be enabled"

We don't need this workaround anymore since we have fixed the toolstack
interlock problem that affects stubdom.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
10 years ago x86: clean up psr boot parameter parsing
Chao Peng [Tue, 14 Apr 2015 13:00:44 +0000 (15:00 +0200)]
x86: clean up psr boot parameter parsing

Change type of opt_psr from bool to int so more psr features can fit.

Introduce a new routine to parse bool parameter so that both cmt and
future psr features like cat can use it.
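
A rough sketch of such a routine, with the option word used as a set of
feature flags.  The flag values, the token syntax ("name", "name=1",
"name=0") and the function name are assumptions made for illustration;
an unsigned int is used here for the flag word.

```c
#include <assert.h>
#include <string.h>

#define PSR_CMT (1u << 0)       /* hypothetical flag values */
#define PSR_CAT (1u << 1)

/*
 * If token is "<name>", "<name>=1" or "<name>=0", set or clear flag in
 * *opts and return 1; return 0 if the token is anything else.
 */
int parse_bool_feature(const char *token, const char *name,
                       unsigned int flag, unsigned int *opts)
{
    size_t n = strlen(name);

    if (strncmp(token, name, n) != 0)
        return 0;
    if (token[n] == '\0' || strcmp(token + n, "=1") == 0)
        *opts |= flag;
    else if (strcmp(token + n, "=0") == 0)
        *opts &= ~flag;
    else
        return 0;
    return 1;
}
```

The same helper can then serve every boolean feature, which is the
reuse the commit message is after.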

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago docs: efi: given some hint about the dom0 command line
Ian Campbell [Tue, 14 Apr 2015 13:00:28 +0000 (15:00 +0200)]
docs: efi: given some hint about the dom0 command line

Suggested-by: Carlos Gustavo Ramirez Rodriguez <carlosgrr@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago x86/traps: identify the vcpu in context when dumping registers
Andrew Cooper [Tue, 14 Apr 2015 12:59:53 +0000 (14:59 +0200)]
x86/traps: identify the vcpu in context when dumping registers

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago x86/cpuidle: identify a legitimate fallthrough case
Andrew Cooper [Tue, 14 Apr 2015 12:59:37 +0000 (14:59 +0200)]
x86/cpuidle: identify a legitimate fallthrough case

to appease the Missing Break checker.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-id: 1291938

10 years ago sched_credit2: more info when dumping
Dario Faggioli [Tue, 14 Apr 2015 12:58:52 +0000 (14:58 +0200)]
sched_credit2: more info when dumping

more specifically, for each runqueue, print what pCPUs
belong to it, which ones are idle and which ones have
been tickled.

While there, also convert the whole file to use
keyhandler_scratch for printing cpumask-s.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years ago rework locking for dump of scheduler info (debug-key r)
Dario Faggioli [Tue, 14 Apr 2015 12:56:13 +0000 (14:56 +0200)]
rework locking for dump of scheduler info (debug-key r)

such that it is taken care of by the various schedulers, rather
than happening in schedule.c. In fact, it is the schedulers
that know better which locks are necessary for the specific
dumping operations.

While there, fix a few style issues (indentation, trailing
whitespace, parentheses and blank line after var declarations)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years ago VTd/dmar: Tweak how the DMAR table is clobbered
Andrew Cooper [Fri, 10 Apr 2015 15:26:18 +0000 (11:26 -0400)]
VTd/dmar: Tweak how the DMAR table is clobbered

Rather than clobbering DMAR -> XMAR and back, clobber to RMAD. This
means that changing the signature does not alter the checksum, which allows
the clobbering/unclobbering to be performed atomically and idempotently, which
is an advantage on the kexec path which can reenter acpi_dmar_reinstate().
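
Why RMAD keeps the checksum intact: an ACPI table checksum is a plain
byte sum modulo 256, so permuting the signature bytes leaves it
unchanged, while replacing a byte (D with X) does not.  A small sketch
with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ACPI-style checksum: all bytes of a valid table sum to 0 modulo 256. */
uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while (len--)
        sum = (uint8_t)(sum + *p++);
    return sum;
}

/* Overwrite the 4-byte signature; repeating the call changes nothing. */
void set_signature(uint8_t *table, const char *sig)
{
    memcpy(table, sig, 4);
}
```

Because the write is a single fixed 4-byte store with no checksum
fix-up, calling it twice is harmless, which is the idempotency the
commit message refers to.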

This DMAR clobbering was introduced by
83904107a33c9badc34ecdd1f8ca0f9271e5e370 which claims that the dom0 VT-d
driver was capable of playing with the IOMMU(s) while Xen was also using
them. An alternative approach might be to leave the DMAR table alone
and sprinkle some iomem_deny_access() around to forcibly prevent dom0
from playing but this is simpler.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix>
CC: Yang Zhang <yang.z.zhang@intel>
Acked-by: Kevin Tian <kevin.tian@intel>
10 years ago tools/hvmloader: Don't perform AML hotplug debugging in production
Andrew Cooper [Mon, 30 Mar 2015 14:20:19 +0000 (15:20 +0100)]
tools/hvmloader: Don't perform AML hotplug debugging in production

It is a number of vmexits and a moderate quantity of qemu logging which can
safely be avoided when not specifically debugging a PCI hotplug issue.

As mk_dsdt is a build system tool, pass 'debug' as a command line parameter
rather than "hardcoding" it via the compilation of mk_dsdt itself.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago x86/smp: Allocate pcpu stacks on their local numa node
Andrew Cooper [Tue, 7 Apr 2015 17:26:19 +0000 (18:26 +0100)]
x86/smp: Allocate pcpu stacks on their local numa node

Previously, all pcpu stacks tended to be allocated on node 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
10 years ago x86/link: Introduce and use __bss_end
Andrew Cooper [Tue, 7 Apr 2015 17:26:18 +0000 (18:26 +0100)]
x86/link: Introduce and use __bss_end

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86/smp: Clean up use of memflags in cpu_smpboot_alloc()
Andrew Cooper [Tue, 7 Apr 2015 17:26:17 +0000 (18:26 +0100)]
x86/smp: Clean up use of memflags in cpu_smpboot_alloc()

Hoist MEMF_node(cpu_to_node(cpu)) to the start of the function, and avoid
passing (potentially bogus) memflags if node information is not available.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86/numa: Correct the extern of cpu_to_node
Andrew Cooper [Tue, 7 Apr 2015 17:26:16 +0000 (18:26 +0100)]
x86/numa: Correct the extern of cpu_to_node

This was missed by c/s 54ce2db "x86/numa: adjust datatypes for node and pxm"
which changed the array definition in numa.c

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86/link: Discard the alternatives ".discard" sections
Andrew Cooper [Tue, 7 Apr 2015 17:26:15 +0000 (18:26 +0100)]
x86/link: Discard the alternatives ".discard" sections

This appears to have been missed when porting the alternatives framework from
Linux, and saves us a section which is otherwise loaded into memory.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago x86/dom0: Don't allow dom0_max_vcpus to be zero
Boris Ostrovsky [Thu, 9 Apr 2015 20:38:43 +0000 (16:38 -0400)]
x86/dom0: Don't allow dom0_max_vcpus to be zero

In case dom0_max_vcpus is incorrectly specified on boot line make sure
we will still boot.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
10 years ago x86/hvm: Fix the unknown nested vmexit reason 80000021 bug
Liang Li [Tue, 7 Apr 2015 13:27:02 +0000 (21:27 +0800)]
x86/hvm: Fix the unknown nested vmexit reason 80000021 bug

This bug is triggered when an NMI happens in the L2 guest. The current
code handles the NMI incorrectly. According to Intel SDM 31.7.1.2
(Resuming Guest Software after Handling an Exception), if bit 31 of the
IDT-vectoring information field is set, the virtual NMIs VM-execution
control is 1, and bits 10:8 of the IDT-vectoring information field are
2, bit 3 in the interruptibility-state field should be cleared to avoid
failure of the next VM entry.
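
The condition quoted from the SDM can be written down as a small pure
function.  The macro and function names below are hypothetical and are
not the identifiers used in Xen; only the bit positions come from the
text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDT_VECTORING_VALID      (UINT32_C(1) << 31)       /* bit 31 */
#define IDT_VECTORING_TYPE(info) (((info) >> 8) & 0x7u)    /* bits 10:8 */
#define IDT_VECTORING_TYPE_NMI   2u
#define INTR_STATE_BIT3          (UINT32_C(1) << 3)        /* bit 3 */

/* Return the interruptibility state to use for the next VM entry. */
uint32_t fixup_interruptibility(uint32_t idt_vectoring_info,
                                bool virtual_nmis_enabled,
                                uint32_t interruptibility)
{
    if ((idt_vectoring_info & IDT_VECTORING_VALID) &&
        virtual_nmis_enabled &&
        IDT_VECTORING_TYPE(idt_vectoring_info) == IDT_VECTORING_TYPE_NMI)
        interruptibility &= ~INTR_STATE_BIT3;
    return interruptibility;
}
```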

Signed-off-by: Liang Li <liang.z.li@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago libxl: use new QEMU xenstore protocol
Wei Liu [Thu, 9 Apr 2015 18:49:25 +0000 (19:49 +0100)]
libxl: use new QEMU xenstore protocol

Originally both QEMU traditional and QEMU upstream used hardcoded
/local/domain/0 paths. This patch changes the protocol to use
/local/domain/$dm_domid path.

For QEMU traditional and upstream without stubdom, $dm_domid is 0 so
the path is in fact still /local/domain/0.

For QEMU traditional stubdom, this is an incompatible protocol change.
However QEMU traditional is shipped with Xen so we are allowed to make
such a change.  This change requires the corresponding QEMU traditional
changeset.

There is no compatibility issue with QEMU upstream stubdom, because QEMU
upstream stubdom doesn't exist yet.

Watch /local/domain/$dm_domid/device-model/$domid/state, wait until
state turns "running" then unpause guest.
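
For illustration, the watched path can be built as below.  Only the path
layout comes from the text above; the helper name and its error
convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the device-model state path into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int dm_state_path(char *buf, size_t len,
                  unsigned int dm_domid, unsigned int domid)
{
    int n = snprintf(buf, len, "/local/domain/%u/device-model/%u/state",
                     dm_domid, domid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With dm_domid equal to 0 the result is the old /local/domain/0 path, so
the non-stubdom case is unchanged.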

LIBXL_STUBDOM_START_TIMEOUT is the timeout used to wait for stubdom to be
ready. My test on a very old machine (Core 2 6400) showed that it might
need more than 20s before the stubdom is ready to serve DomU.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago x86/hvm: factor out and rename vm_event related functions
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:58 +0000 (22:06 +0100)]
x86/hvm: factor out and rename vm_event related functions

To avoid growing hvm.c these functions can be stored separately. Minor style
changes are applied to the logic in the file.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago tools/tests: Clean-up tools/tests/xen-access
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:57 +0000 (22:06 +0100)]
tools/tests: Clean-up tools/tests/xen-access

The spin-lock implementation in the xen-access test program is implemented
in a fashion that is actually incomplete. The x86 assembly that guarantees that
the lock is held by only one thread lacks the "lock;" instruction.

However, the spin-lock is not actually necessary in xen-access as it is not
multithreaded. The presence of the faulty implementation of the lock in a non-
multithreaded environment is unnecessarily complicated for developers who are
trying to follow this code as a guide in implementing their own applications.
Thus, removing it from the code improves the clarity on the behavior of the
system.

Also convert functions that always return 0 to return void, and make
the teardown function actually return an error code on error.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen: Rename mem_event to vm_event
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:56 +0000 (22:06 +0100)]
xen: Rename mem_event to vm_event

In this patch we mechanically rename mem_event to vm_event. This patch
introduces no logic changes to the code. Using the name vm_event better
describes the intended use of this subsystem, which is not limited to memory
events. It can be used for off-loading the decision making logic into helper
applications when encountering various events during a VM's execution.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years ago xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:55 +0000 (22:06 +0100)]
xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup

The only use-case of the mem_event_op structure had been in mem_paging,
thus renaming the structure mem_paging_op and relocating its associated
functions clarifies its actual usage.

As part of this fix-up we also convert the gfn's in the toolstack to be
explicitly 64-bit wide and clean the code a bit.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen/mem_event: Cleanup mem_event names in rings, functions and domctls
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:54 +0000 (22:06 +0100)]
xen/mem_event: Cleanup mem_event names in rings, functions and domctls

The name of one of the mem_event rings still implies it is used only
for memory accesses, which is no longer the case. It is also used to
deliver various HVM events, thus the name "monitor" is more appropriate
in this setting.

A couple of functions incorrectly labeled as part of mem_event are also renamed
to reflect that they belong to mem_access.

The mem_event subop definitions are also shortened to be more meaningful.

The tool side changes are only mechanical renaming to match these new names.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago xen/mem_event: Cleanup of mem_event structures
Tamas K Lengyel [Thu, 26 Mar 2015 21:06:53 +0000 (22:06 +0100)]
xen/mem_event: Cleanup of mem_event structures

The public mem_event structures used to communicate with helper applications via
shared rings have been used in different settings. However, the variable names
within this structure have not reflected this fact, resulting in the reuse of
variables to mean different things under different scenarios.

This patch remedies the issue by clearly defining the structure members based on
the actual context within which the structure is used.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago hvmloader: fix build error `invalid digit "8" in octal constant'
Wen Congyang [Wed, 8 Apr 2015 01:49:26 +0000 (01:49 +0000)]
hvmloader: fix build error `invalid digit "8" in octal constant'

Commit b9245b75 introduces a build error:
make[1]: Entering directory `/root/work/xen/tools/firmware/hvmloader'
gcc   -O1 -fno-omit-frame-pointer -m32 -march=i686 -g -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes -Wdeclaration-after-statement   -O0 -g3 -D__XEN_TOOLS__ -MMD -MF .smbios.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fno-optimize-sibling-calls -mno-tls-direct-seg-refs  -Werror -fno-stack-protector -fno-exceptions -fno-builtin -msoft-float -I/root/work/xen/tools/firmware/hvmloader/../../../tools/include -DENABLE_ROMBIOS -DENABLE_SEABIOS -D__SMBIOS_DATE__="04/08/2015"  -c -o smbios.o smbios.c
smbios.c:384:46: error: invalid digit "8" in octal constant
smbios.c:792:46: error: invalid digit "8" in octal constant
make[1]: *** [smbios.o] Error 1
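
The log above suggests that the date reached the compiler as the bare
tokens 04/08/2015 rather than as a string literal, so 08 was parsed as
an octal constant.  One way to make such a macro safe is to stringify
it in the preprocessor, sketched below; this is not necessarily how the
commit fixes it, and RELEASE_DATE is a stand-in name.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x)  STR_(x)         /* expand the macro first, then stringify */

/* Stand-in for a date passed as -DRELEASE_DATE=04/08/2015 (no quotes). */
#define RELEASE_DATE 04/08/2015

/* The tokens are never evaluated as numbers, only turned into a string. */
const char *release_date(void)
{
    return STR(RELEASE_DATE);
}
```

The alternative is to escape the quotes on the compiler command line so
that the macro itself expands to a string literal.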

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago xentop: fix potential memory leak
Charles Arnold [Thu, 2 Apr 2015 15:42:02 +0000 (09:42 -0600)]
xentop: fix potential memory leak

On a read failure the qstats buffer is not freed.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
Ian Jackson [Thu, 2 Apr 2015 14:32:22 +0000 (15:32 +0100)]
Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"

The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
breaks that.  7e9ec50b, when handling a pty master, violates the
specification of the datacopier interface (as defined).

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code).
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.  I think there is a latent bug with (c), which I will discuss
later in this email.

In a recent bug report from a FreeBSD installation, the datacopier
gets told about (a) before (b).  But 7e9ec50b filters the POLLHUP out,
so that the dc signals eof rather than hup.  As a result in
bootloader_copyfail we take the error path.

This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
10 years ago hvmloader: add knob for fixed VGABIOS date string
Olaf Hering [Wed, 1 Apr 2015 13:28:35 +0000 (13:28 +0000)]
hvmloader: add knob for fixed VGABIOS date string

To allow reproducible builds of hvmloader introduce a make variable
VGABIOS_REL_DATE="dd Mon yyyy" to provide a fixed date string. Without
this change the hvmloader binary changes with every rebuild.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago hvmloader: add knob for fixed SMBIOS date string
Olaf Hering [Wed, 1 Apr 2015 13:28:34 +0000 (13:28 +0000)]
hvmloader: add knob for fixed SMBIOS date string

To allow reproducible builds of hvmloader introduce a make variable
SMBIOS_REL_DATE=mm/dd/yyyy to provide a fixed date string. Without this
change the hvmloader binary changes with every rebuild.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
10 years ago INSTALL: mention variables for reproducible builds
Olaf Hering [Wed, 1 Apr 2015 13:28:33 +0000 (13:28 +0000)]
INSTALL: mention variables for reproducible builds

Mention two variables introduced by commit ac977f5 ("use more fixed
strings to build the hypervisor").

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago tools/hotplug: introduce XENSTORED_ARGS= in sysconfig file.
Olaf Hering [Wed, 1 Apr 2015 13:28:32 +0000 (13:28 +0000)]
tools/hotplug: introduce XENSTORED_ARGS= in sysconfig file.

It is already used in the runlevel script and the service file.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
10 years ago xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts
Julien Grall [Wed, 1 Apr 2015 16:21:47 +0000 (17:21 +0100)]
xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts

GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
IRQs 1020-1023 are reserved for special purposes.

The result is used by the callers of gic_number_lines in order to check
the validity of an IRQ.
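
As a sketch of the resulting computation: the 5-bit ITLinesNumber field
N encodes 32 * (N + 1) lines, and the result is capped at 1020.  The
function name is hypothetical, not the one used in Xen.

```c
#include <assert.h>
#include <stdint.h>

/* it_lines_number is the raw 5-bit GICD_TYPER.ITLinesNumber field. */
unsigned int gic_usable_lines(uint32_t it_lines_number)
{
    unsigned int lines = 32u * ((it_lines_number & 0x1fu) + 1u);

    return lines > 1020u ? 1020u : lines;   /* 1020-1023 are reserved */
}
```

Only the maximum field value (31, i.e. 1024 lines) is affected by the
cap.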

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
10 years ago xen/arm: vgic: Correctly calculate GICD_TYPER.ITLinesNumber
Julien Grall [Wed, 1 Apr 2015 16:21:46 +0000 (17:21 +0100)]
xen/arm: vgic: Correctly calculate GICD_TYPER.ITLinesNumber

The formula of GICD_TYPER.ITLinesNumber is 32(N + 1).

As the number of SPIs supported by the domain may not be a multiple of
32, we have to round up the number before using it.

At the same time remove the mask GICD_TYPE_LINES which is pointless.
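
The rounding can be sketched as follows, assuming the 32 private
interrupts (16 SGIs and 16 PPIs) are counted ahead of the SPIs.  The
function name is hypothetical.

```c
#include <assert.h>

/* Field value N such that 32 * (N + 1) covers 32 private IRQs plus nr_spis. */
unsigned int vgic_it_lines_number(unsigned int nr_spis)
{
    unsigned int total = 32u + nr_spis;     /* SGIs and PPIs come first */

    return (total + 31u) / 32u - 1u;        /* round up to a multiple of 32 */
}
```

For example, a domain with a single SPI needs N = 1, i.e. 64 advertised
lines, whereas truncating division would advertise only 32.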

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: gic_route_irq_to_guest: Honor the priority given in parameter
Julien Grall [Wed, 1 Apr 2015 16:21:45 +0000 (17:21 +0100)]
xen/arm: gic_route_irq_to_guest: Honor the priority given in parameter

The priority is already hardcoded in route_irq_to_guest and therefore
can't be controlled by the guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: gic: Add sanity checks gic_route_irq_to_guest
Julien Grall [Wed, 1 Apr 2015 16:21:44 +0000 (17:21 +0100)]
xen/arm: gic: Add sanity checks gic_route_irq_to_guest

With the addition of interrupt assignment to guest, we need to make sure
the guest can't blow up the interrupt management in Xen.

Before associating the IRQ to a vIRQ we need to make sure:
    - the vIRQ is not already associated to another IRQ
    - the guest didn't enable the vIRQ

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago xen/arm: route_irq_to_guest: Check validity of the IRQ
Julien Grall [Wed, 1 Apr 2015 16:21:43 +0000 (17:21 +0100)]
xen/arm: route_irq_to_guest: Check validity of the IRQ

Currently Xen only supports SPIs routing for guest, add a function
is_assignable_irq to check if we can assign a given IRQ to the guest.

Secondly, make sure the vIRQ is not greater than the number of IRQs
configured in the vGIC and that it is an SPI.

Thirdly, when the IRQ is already assigned to the domain, check the user
is not asking to use a different vIRQ than the one already bound.

Finally, desc->arch.type, which contains the IRQ type (i.e. level/edge), must
be correctly configured beforehand. The misconfiguration can happen when:
    - the device has been blacklisted for the current platform
    - the IRQ has not been described in the device tree

Also, use XENLOG_G_ERR in the error message within the function as it will
be later called from a guest.
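
The checks listed above could be sketched roughly as follows. This is not
the code from the patch: struct irq_state, check_route and their fields
are hypothetical, and the error handling is reduced to 0 / -1. The SPI ID
range (32 to 1019) is the one defined by the GIC architecture.

```c
#include <stdbool.h>

#define FIRST_SPI 32u     /* IDs 0-31 are SGIs and PPIs */
#define LAST_SPI  1019u   /* highest SPI ID in the GIC architecture */

/* Hypothetical per-IRQ state; field names are invented for this sketch. */
struct irq_state {
    bool assigned;          /* already routed to the domain? */
    unsigned int virq;      /* vIRQ it is bound to, if assigned */
    bool type_configured;   /* level/edge known, e.g. from the device tree */
};

static bool is_spi(unsigned int irq)
{
    return irq >= FIRST_SPI && irq <= LAST_SPI;
}

/* Returns 0 if routing may proceed, -1 otherwise. */
static int check_route(unsigned int irq, unsigned int virq,
                       unsigned int nr_vgic_irqs,
                       const struct irq_state *st)
{
    if (!is_spi(irq))
        return -1;          /* only SPIs are assignable */
    if (!is_spi(virq) || virq >= nr_vgic_irqs)
        return -1;          /* vIRQ must be an SPI the vGIC provides */
    if (st->assigned && st->virq != virq)
        return -1;          /* already bound to a different vIRQ */
    if (!st->type_configured)
        return -1;          /* blacklisted device or IRQ not in the DT */
    return 0;
}
```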

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Allow virq != irq
Julien Grall [Wed, 1 Apr 2015 16:21:42 +0000 (17:21 +0100)]
xen/arm: Allow virq != irq

Currently, Xen is assuming that the virtual IRQ will always be the same
as IRQ.

Modify route_guest_irq to take the virtual IRQ as a parameter, which allows
Xen to assign a different IRQ number. Also store the vIRQ in the desc
action to easily retrieve the IRQ target when we need to inject the
interrupt.

As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.

At the same time modify the behavior of irq_get_domain. The function now
requires that the irq_desc belongs to an IRQ assigned to a guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: Extend DOMCTL createdomain to support arch configuration
Julien Grall [Wed, 1 Apr 2015 16:21:41 +0000 (17:21 +0100)]
xen: Extend DOMCTL createdomain to support arch configuration

On ARM the virtual GIC may differ between each guest (emulated GIC version,
number of SPIs...). This information is already known at the domain creation
and can never change.

For now only the gic_version is set. In the long run, there will be more
parameters such as the number of SPIs. All will be required to be set at the
same time.

A new arch-specific structure arch_domainconfig has been created. The x86
one doesn't have any specific configuration, so for now a dummy structure
(C-spec compliant) has been created.

Some external tools (qemu, xenstore) may be required to create a domain.
Rather than asking them to take care of the arch-specific domain
configuration, let the current function (xc_domain_create) choose a
default configuration and introduce a new one (xc_domain_create_config).

This patch also drops the previously introduced DOMCTL arm_configure_domain
in Xen 4.5, as it has been made useless.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
10 years agoMAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"
Julien Grall [Wed, 1 Apr 2015 16:21:40 +0000 (17:21 +0100)]
MAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Keir Fraser <keir@xen.org>
10 years agoxen/arm: Introduce xen, passthrough property
Julien Grall [Wed, 1 Apr 2015 16:21:39 +0000 (17:21 +0100)]
xen/arm: Introduce xen, passthrough property

When a device is marked for passthrough (via the new property
"xen,passthrough"), dom0 must not access the device (i.e. must not
load a driver for it), but should be able to manage the MMIO/interrupt
of the passthrough device.

The latter part will allow the toolstack to map MMIO/IRQ when a device
is passed through to a guest.

The property "xen,passthrough" will be translated as 'status="disabled"'
in the device tree to avoid DOM0 using the device. We assume that DOM0 is
able to cope with this property (already the case for Linux, and
required by ePAPR).

Rework the function map_device (renamed into handle_device) to:

* For a given device node:
    - Give permission to manage IRQ/MMIO for this device
    - Retrieve the IRQ configuration (i.e. edge/level) from the device
    tree
* When the device is not marked for guest passthrough:
    - Assign the device to the guest if it's protected by an IOMMU
    - Map the IRQs and MMIOs regions to the guest

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Map disabled device in DOM0
Julien Grall [Wed, 1 Apr 2015 16:21:38 +0000 (17:21 +0100)]
xen/arm: Map disabled device in DOM0

The check to avoid mapping disabled devices in DOM0 was added in
anticipation of device passthrough. However, a brand new property will
be added later to mark devices which will be passed through.

Also, remove the memory type check as we already skipped them earlier in
the function via skip_matches.

Furthermore, some platforms (such as the OMAP) may try to poke a device even
if the property "status" is set to "disabled".

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
10 years agoxen/arm: vgic: Introduce a function to initialize pending_irq
Julien Grall [Wed, 1 Apr 2015 16:21:37 +0000 (17:21 +0100)]
xen/arm: vgic: Introduce a function to initialize pending_irq

The structure pending_irq is initialized in the same way in 2 different
places. Introduce vgic_init_pending_irq to avoid code duplication.

Also move the setting of the irq field into this function as we need to
initialize it once rather than every time an IRQ is injected to the guest.

Finally, use unsigned int for the "irq" field to be consistent with the
virq variable.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/dts: Use unsigned int for MMIO and IRQ index
Julien Grall [Wed, 1 Apr 2015 16:21:36 +0000 (17:21 +0100)]
xen/dts: Use unsigned int for MMIO and IRQ index

There is no reason to use signed integer for an index.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/dts: Allow only IRQ translation that are mapped to main GIC
Julien Grall [Wed, 1 Apr 2015 16:21:35 +0000 (17:21 +0100)]
xen/dts: Allow only IRQ translation that are mapped to main GIC

Xen is only able to handle one GIC controller. Some platforms may contain
other interrupt controllers.

Make sure to only translate IRQ mapped into the GIC handled by Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen/arm: Divide GIC initialization in 2 parts
Julien Grall [Wed, 1 Apr 2015 16:21:34 +0000 (17:21 +0100)]
xen/arm: Divide GIC initialization in 2 parts

Currently the function to translate IRQ from the device tree is set
unconditionally to be able to retrieve serial/timer IRQ before the
GIC has been initialized.

It assumes that the xlate function won't ever change. We may also need to
have the primary interrupt controller very early.

Rework the gic initialization in 2 parts:
    - gic_preinit: Get the interrupt controller device tree node and set
    up GIC and xlate callbacks
    - gic_init: Initialize the interrupt controller and the boot CPU
    interrupts.

The former function will be called just after the IRQ subsystem has been
initialized.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Frediano Ziglio <frediano.ziglio@huawei.com>
Cc: Zoltan Kiss <zoltan.kiss@huawei.com>
10 years agodomctl: don't allow a toolstack domain to call domain_pause() on itself
Andrew Cooper [Wed, 1 Apr 2015 09:08:33 +0000 (10:08 +0100)]
domctl: don't allow a toolstack domain to call domain_pause() on itself

These DOMCTL subops were accidentally declared safe for disaggregation
in the wake of XSA-77.

This is XSA-127 / CVE-2015-2751.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoLimit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)
Konrad Rzeszutek Wilk [Wed, 19 Nov 2014 17:57:11 +0000 (12:57 -0500)]
Limit XEN_DOMCTL_memory_mapping hypercall to only process up to 64 GFNs (or less)

Said hypercall for large BARs can take quite a while. As such
we can require that the hypercall MUST break up the request
in smaller values.

Another approach is to add preemption to it - whether we do the
preemption using hypercall_create_continuation or returning
EAGAIN to userspace (and have it re-invoke the call) - either
way the issue we cannot easily solve is that in 'map_mmio_regions'
if we encounter an error we MUST call 'unmap_mmio_regions' for the
whole BAR region.

Since the preemption would re-use input fields such as nr_mfns,
first_gfn, first_mfn - we would lose the original values -
and only undo what was done in the current round (i.e. ignoring
anything that was done prior to earlier preemptions).

Unless we re-used the return value as 'EAGAIN|nr_mfns_done<<10' but
that puts a limit (since the return value is a long) on the amount
of nr_mfns that can be provided.

This patch sidesteps this problem by:
 - Setting a hard limit of nr_mfns having to be 64 or less.
 - Toolstack adjusts correspondingly to the nr_mfn limit.
 - If there is an error when adding, the toolstack will call the
   remove operation to remove the whole region.
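
The toolstack side of this scheme can be sketched as below. It is only an
outline under stated assumptions: mapping_op is a hypothetical callback
standing in for one hypercall invocation, and map_region_batched and the
stub_* helpers are invented names, not libxc functions.

```c
#define MAX_BATCH 64UL  /* hard limit on nr_mfns per call, per the text */

/* Hypothetical callback type standing in for one hypercall invocation. */
typedef int (*mapping_op)(unsigned long first_gfn, unsigned long first_mfn,
                          unsigned long nr_mfns, void *ctx);

static unsigned long batch_size(unsigned long remaining)
{
    return remaining > MAX_BATCH ? MAX_BATCH : remaining;
}

/* Add the region in batches of at most MAX_BATCH; on error undo all of it. */
static int map_region_batched(unsigned long first_gfn,
                              unsigned long first_mfn,
                              unsigned long nr_mfns,
                              mapping_op add, mapping_op del, void *ctx)
{
    unsigned long done, nr;
    int rc = 0;

    for (done = 0; done < nr_mfns; done += nr) {
        nr = batch_size(nr_mfns - done);
        rc = add(first_gfn + done, first_mfn + done, nr, ctx);
        if (rc)
            break;
    }

    if (rc) {
        /* The caller still knows the original range, so remove it whole. */
        for (done = 0; done < nr_mfns; done += nr) {
            nr = batch_size(nr_mfns - done);
            (void)del(first_gfn + done, first_mfn + done, nr, ctx);
        }
    }

    return rc;
}

/* Example stubs: count calls and fail the add numbered fail_at. */
struct stub_state { int adds, removes, fail_at; };

static int stub_add(unsigned long gfn, unsigned long mfn,
                    unsigned long nr, void *ctx)
{
    struct stub_state *s = ctx;

    (void)gfn; (void)mfn;
    if (nr > MAX_BATCH || s->adds == s->fail_at)
        return -1;
    s->adds++;
    return 0;
}

static int stub_del(unsigned long gfn, unsigned long mfn,
                    unsigned long nr, void *ctx)
{
    struct stub_state *s = ctx;

    (void)gfn; (void)mfn; (void)nr;
    s->removes++;
    return 0;
}
```

Because the original first_gfn/first_mfn/nr_mfns never leave the caller,
the undo path does not depend on any state carried across preemptions.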

The need to break this hypercall down arises because large BARs can take
more than the guest (initial domain usually) time-slice. This has
the negative result that the guest is locked out for a long
duration and is unable to act on any pending events.

We also augment the code to return zero if nr_mfns is zero instead
of trying the hypercall.

This is XSA-125 / CVE-2015-2752.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Tue, 31 Mar 2015 16:29:48 +0000 (17:29 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

10 years agoxentop: add support for qdisks
Charles Arnold [Tue, 24 Mar 2015 02:55:08 +0000 (20:55 -0600)]
xentop: add support for qdisks

Now that Xen uses qdisks by default and qemu does not write out
statistics to sysfs this patch queries the QMP for disk statistics.

This patch depends on libyajl for parsing statistics returned from
QMP. The runtime requires libyajl 2.0.3 or newer for required bug
fixes in yajl_tree_parse().

Libxl is modified to create a new socket dedicated for the use of
libxenstat for querying the block statistics using QMP.

The current APIs remain unchanged. It works within the existing
framework of libxenstat.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: cleanup some misuse of 'cpumap' as parameter
Dario Faggioli [Thu, 26 Mar 2015 08:55:04 +0000 (09:55 +0100)]
libxl: cleanup some misuse of 'cpumap' as parameter

in favour of the more generic 'bitmap', which is better
since these are generic libxl_bitmap_* functions.

Also fix a typo, and remove a stale (and wrong) comment.

No functional change intended.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agolibxl: automatically set soft affinity after vnuma info
Dario Faggioli [Thu, 26 Mar 2015 08:54:57 +0000 (09:54 +0100)]
libxl: automatically set soft affinity after vnuma info

More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu does not have any
soft affinity, automatically build up one, matching the pcpus
of the said pnode.
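
A minimal sketch of the lookup chain described above, assuming pcpu sets
fit in a 64-bit mask (bit i = pcpu i) and that an all-zero mask means "no
soft affinity set". struct vnuma_sketch and both functions are
hypothetical and do not mirror the libxl data structures.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the vNUMA layout. */
struct vnuma_sketch {
    const unsigned int *vcpu_to_vnode;   /* indexed by vcpu */
    const unsigned int *vnode_to_pnode;  /* indexed by vnode */
    const uint64_t *pnode_cpus;          /* indexed by pnode */
};

/* pcpus of the pnode backing the vnode this vcpu belongs to. */
static uint64_t pnode_cpus_of_vcpu(const struct vnuma_sketch *v,
                                   unsigned int vcpu)
{
    return v->pnode_cpus[v->vnode_to_pnode[v->vcpu_to_vnode[vcpu]]];
}

/* Keep an explicit soft affinity; otherwise build one from the pnode. */
static uint64_t effective_soft_affinity(const struct vnuma_sketch *v,
                                        unsigned int vcpu, uint64_t soft)
{
    return soft ? soft : pnode_cpus_of_vcpu(v, vcpu);
}
```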

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agolibxl: check whether vcpu affinity and vnuma info match
Dario Faggioli [Thu, 26 Mar 2015 08:54:48 +0000 (09:54 +0100)]
libxl: check whether vcpu affinity and vnuma info match

More specifically, vcpus are assigned to a vnode, which in
turn is associated with a pnode. If a vcpu also has, in its
(hard or soft) affinity, some pcpus that are not part of the
said pnode, print a warning to the user.
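
The condition that triggers the warning amounts to a subset test. A
sketch, again assuming pcpu sets are 64-bit masks (bit i = pcpu i);
stray_pcpus is a made-up name, not a libxl function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pcpus present in the vcpu's (hard or soft)
 * affinity that are not part of its pnode. A non-zero result is the
 * case in which a warning would be printed.
 */
static uint64_t stray_pcpus(uint64_t affinity, uint64_t pnode_cpus)
{
    return affinity & ~pnode_cpus;
}
```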

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
10 years agoQEMU_TAG update
Ian Jackson [Tue, 31 Mar 2015 15:29:19 +0000 (16:29 +0100)]
QEMU_TAG update

10 years agoxen/passthrough: Support a single iommu_domain per xen domain per SMMU
Robbie VanVossen [Tue, 24 Mar 2015 20:48:19 +0000 (16:48 -0400)]
xen/passthrough: Support a single iommu_domain per xen domain per SMMU

If multiple devices are being passed through to the same domain and they
share a single SMMU, then they only require a single iommu_domain.

In arm_smmu_assign_dev, before a new iommu_domain is created, the
xen_domain->contexts is checked for any iommu_domains that are already
assigned to a device that uses the same SMMU as the current device. If one
is found, attach the device to that iommu_domain. If one isn't
found, create a new iommu_domain just like before.

The arm_smmu_deassign_dev function assumes that there is a single
device per iommu_domain. This meant that when the first device was
deassigned, the iommu_domain was freed and when another device was
deassigned a crash occurred in xen.

To fix this, a reference counter was added to the iommu_domain struct.
When an arm_smmu_xen_device references an iommu_domain, the
iommu_domain's ref is incremented. When that reference is removed, the
iommu_domain's ref is decremented. The iommu_domain will only be freed
when the ref is 0.
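
The reference counting scheme can be illustrated with the sketch below.
It is a generic, non-atomic example: struct iommu_domain_sketch,
sketch_attach and sketch_detach are hypothetical names, and the real
driver takes its own locks and keeps more state.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an iommu_domain carrying a reference count. */
struct iommu_domain_sketch {
    unsigned int ref;
};

/*
 * Attach a device: reuse the domain already set up for this SMMU if
 * there is one, otherwise allocate a new one. Takes one reference.
 */
static struct iommu_domain_sketch *
sketch_attach(struct iommu_domain_sketch *existing)
{
    struct iommu_domain_sketch *d = existing;

    if (!d) {
        d = calloc(1, sizeof(*d));
        if (!d)
            return NULL;
    }
    d->ref++;
    return d;
}

/* Detach a device: drop one reference, free on the last one.
 * Returns 1 if the domain was freed, 0 if it is still in use. */
static int sketch_detach(struct iommu_domain_sketch *d)
{
    if (--d->ref > 0)
        return 0;
    free(d);
    return 1;
}
```

With two devices attached, the first detach only drops the count; the
domain is freed when the second device goes away.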

Signed-off-by: Robbie VanVossen <robert.vanvossen@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: arm: always omit guest user stack in vcpu_show_execution_state
Ian Campbell [Mon, 30 Mar 2015 11:12:35 +0000 (12:12 +0100)]
xen: arm: always omit guest user stack in vcpu_show_execution_state

Using !usr_mode(regs) only catches arm32 usr mode and not arm64 user
mode, switch to psr_mode_is_user instead.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Allow traps from 32 bit userspace on 64 bit hypervisors again
Ian Campbell [Mon, 30 Mar 2015 11:12:34 +0000 (12:12 +0100)]
xen: arm: Allow traps from 32 bit userspace on 64 bit hypervisors again

This removes the unconditional #undef injected in response to such
traps which was added by the fixes to CVE-2014-5147 / XSA-102 in
c0020e099702 "xen: arm: Handle traps from 32-bit userspace on 64-bit
kernel as undef", we now handle such traps correctly.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Dump guest state when invalid trap state is detected
Ian Campbell [Mon, 30 Mar 2015 11:12:33 +0000 (12:12 +0100)]
xen: arm: Dump guest state when invalid trap state is detected

By adding GUEST_BUG_ON locally to traps.c.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle remaining traps from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:32 +0000 (12:12 +0100)]
xen: arm: handle remaining traps from userspace

CP14 dbg and general CP register access are both handled with
unconditional injection of #undef from their respective handlers, so
allow these even from 32-bit userspace on a 64-bit kernel.

SMC32 and HVC32 should only come from a guest in AArch32 mode and
SMC64 and HVC64 should only come from a guest in AArch64 mode. Add
appropriate BUG_ONs to all cases.

After this bad_trap is no longer used.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: correctly handle sysreg accesses from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:31 +0000 (12:12 +0100)]
xen: arm: correctly handle sysreg accesses from userspace

Previously we implemented all registers as RAZ/WI even if they
shouldn't be accessible to userspace.

It is not entirely clear whether attempts to access *_EL1 registers
from EL0 will trap to EL1 or EL2, be conservative and treat as an
undef injection.

PMUSERENR_EL0 and MDCCSR_EL0 are R/O to EL0. MDCCSR_EL0 was previously
not handled at all.

Other PM*_EL0 registers are accessible at EL0 only if PMUSERENR_EL0.EN
is set, since we emulate that as RAZ/WI we know that bit cannot be
set.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Handle CP14 32-bit register accesses from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:30 +0000 (12:12 +0100)]
xen: arm: Handle CP14 32-bit register accesses from userspace

Accesses to these from 32-bit userspace would cause a hypervisor
exception (host crash) when running a 64-bit kernel, which is worked
around by the fix to XSA-102. On 32-bit kernels they would be
implemented as RAZ/WI which is incorrect but harmless.

Update as follows:
 - DBGDSCRINT should be R/O.
 - DBGDSCREXT should be EL1 only.
 - DBGOSLAR is WO and EL1 only.
 - DBGVCR, DBGB[VC]R*, DBGW[VC]R*, and DBGOSDLR are EL1 only.

DBGDIDR and DBGDSCRINT are accessible from EL0 if DBGDSCRext.UDCCdis.
Since we emulate that as RAZ/WI we allow access.

When we do not allow an access we now silently inject an undef even in
debug mode since the debugging messages are not helpful (we have
handled the access, by explicitly choosing not to).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Handle CP15 register traps from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:29 +0000 (12:12 +0100)]
xen: arm: Handle CP15 register traps from userspace

Previously userspace access to PM* would have been incorrectly (but
benignly) implemented as RAZ/WI when running on a 32-bit kernel and
would cause a hypervisor exception (host crash) when running a 64-bit
kernel (this was already solved via the fix to XSA-102).

PMINTENSET, PMINTENCLR are EL1 only, but it is not clear whether
attempts to access from EL0 will trap to EL1 or EL2, be conservative
and handle EL0 access with an undef injection.

ACTLR is EL1 only and the ARM ARM states that HCR_EL2.TACR causes
accesses from EL1 to trap. However remain conservative even here and
handle accesses from EL0 by injecting an undef.

PMUSERENR is R/O at EL0 and we implement as RAZ/WI at EL1 as before.

The remaining PM* registers are accessible to EL0 only if
PMUSERENR_EL0.EN is set, since we emulate this as RAZ/WI the bit is
never set so we inject a trap on attempted access. We weren't
previously handling PMCCNTR.

HSR_EC_CP15_32 should never be seen from a 64-bit guest, so BUG_ON if
that occurs.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: drop cache maintenance by set/way trap handling
Ian Campbell [Mon, 30 Mar 2015 11:12:28 +0000 (12:12 +0100)]
xen: arm: drop cache maintenance by set/way trap handling

We do not set HCR_EL2.TSW so we will never see these.

This is undoubtedly wrong, but for now remove the dead code.

However, retain the HSR_SYSREG_* added by the precursor to this patch,
although they aren't used they are factually accurate and may as well
be kept for future use.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: do not handle traps accessing CLIDR_EL1 or CCSIDR_EL1
Ian Campbell [Mon, 30 Mar 2015 11:12:27 +0000 (12:12 +0100)]
xen: arm: do not handle traps accessing CLIDR_EL1 or CCSIDR_EL1

They are trapped only with HCR_EL2.TID2 which we don't set, and in any
case we handled only for 32-bit.

One day we may want to trap and emulate these, but for now don't
bother with the dead code.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Handle 32-bit EL0 on 64-bit EL1 when advancing PC after trap
Ian Campbell [Mon, 30 Mar 2015 11:12:26 +0000 (12:12 +0100)]
xen: arm: Handle 32-bit EL0 on 64-bit EL1 when advancing PC after trap

Fix a coding style issue while in the area.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Use ARMv8 names for CNTHCTL_EL2 bits
Ian Campbell [Mon, 30 Mar 2015 11:12:25 +0000 (12:12 +0100)]
xen: arm: Use ARMv8 names for CNTHCTL_EL2 bits

Rather than using the v8 register names and the v7 bit names, which
makes things needlessly difficult when reading.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: handle accesses to CNTP_CVAL_EL0
Ian Campbell [Mon, 30 Mar 2015 11:12:24 +0000 (12:12 +0100)]
xen: arm: handle accesses to CNTP_CVAL_EL0

All OSes we have run on top of Xen use CNTP_TVAL_EL0 but for
completeness we really should handle CVAL too.

In vtimer_emulate_cp64 pull the propagation of the 64-bit result into
two 32-bit registers out of the switch to avoid duplicating for every
register. We also need to initialise x now since previously the only
implemented register was R/O.
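
Propagating a 64-bit value through a pair of 32-bit registers boils down
to the split and join below. The function names are invented for this
sketch; the low/high ordering is an assumption of the example.

```c
#include <stdint.h>

/* Split a 64-bit value into low and high 32-bit halves. */
static void split_u64(uint64_t x, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)x;
    *hi = (uint32_t)(x >> 32);
}

/* Rebuild the 64-bit value from the two halves. */
static uint64_t join_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```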

While adding HSR_SYSREG_CNTP_CVAL_EL0 also move
HSR_SYSREG_CNTP_CTL_EL0 so it is sorted correctly.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: correctly handle vtimer traps from userspace
Ian Campbell [Mon, 30 Mar 2015 11:12:23 +0000 (12:12 +0100)]
xen: arm: correctly handle vtimer traps from userspace

Previously 32-bit userspace on 32-bit kernel and 64-bit userspace on
64-bit kernel could access these registers irrespective of whether the
kernel had configured them to be allowed to. To fix this:

 - Userspace access to CNTP_CTL_EL0 and CNTP_TVAL_EL0 should be gated
   on CNTKCTL_EL1.EL0PTEN.
 - Userspace access to CNTPCT_EL0 should be gated on
   CNTKCTL_EL1.EL0PCTEN.

When we do not handle an access we now silently inject an undef even
in debug mode since the debugging messages are not helpful (we have
handled the access, by explicitly choosing not to).

The usermode accessibility check is rather repetitive, so a helper
macro is introduced.
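
The gating rule can be sketched as follows. This is not the helper macro
from the patch: enum timer_reg and vtimer_access_allowed are hypothetical,
and the bit positions (EL0PCTEN = bit 0, EL0PTEN = bit 9) are my reading
of the CNTKCTL_EL1 layout and should be checked against the ARM ARM.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTKCTL_EL1 enable bits (positions assumed, see note above). */
#define CNTKCTL_EL1_EL0PCTEN (1u << 0)  /* EL0 access to CNTPCT */
#define CNTKCTL_EL1_EL0PTEN  (1u << 9)  /* EL0 access to CNTP_CTL/TVAL */

/* Hypothetical register identifiers for this sketch. */
enum timer_reg { REG_CNTP_CTL, REG_CNTP_TVAL, REG_CNTPCT };

/* Kernel accesses are always handled; user accesses depend on CNTKCTL. */
static bool vtimer_access_allowed(bool from_user, uint32_t cntkctl,
                                  enum timer_reg reg)
{
    if (!from_user)
        return true;

    switch (reg) {
    case REG_CNTP_CTL:
    case REG_CNTP_TVAL:
        return (cntkctl & CNTKCTL_EL1_EL0PTEN) != 0;
    case REG_CNTPCT:
        return (cntkctl & CNTKCTL_EL1_EL0PCTEN) != 0;
    }
    return false;
}
```

When the function returns false, the behaviour described above is to
inject an undef into the guest rather than emulate the access.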

Since HSR_EC_CP15_64 cannot be taken from a guest in AArch64 mode
except due to a hardware bug switch the associated check to a BUG_ON
(which will be switched to something more appropriate in a subsequent
patch)

Fix a coding style issue in HSR_CPREG64(CNTPCT) while touching similar
code.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Factor out psr_mode_is_user
Ian Campbell [Mon, 30 Mar 2015 11:12:22 +0000 (12:12 +0100)]
xen: arm: Factor out psr_mode_is_user

This embodies the logic on arm64 that userspace can be either 32-bit
or 64-bit. It will be used in other places shortly.

Note that the logic differs slightly because the original (in
inject_abt64_exception) knew that the kernel was 64-bit and could
therefore assume that any 32-bit mode was userspace. Instead the
refactored code explicitly checks for usr mode.
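
A standalone sketch of the check, operating on a raw PSR value rather
than on Xen's register structure. The function name is invented; the mode
encodings are the architectural M[4:0] values (AArch32 User = 0x10,
AArch64 EL0t = 0x00).

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_MASK 0x1fu
#define PSR_MODE_EL0t 0x00u  /* AArch64 EL0 */
#define PSR_MODE_EL1h 0x05u  /* AArch64 EL1, using SP_EL1 */
#define PSR_MODE_USR  0x10u  /* AArch32 User */
#define PSR_MODE_SVC  0x13u  /* AArch32 Supervisor */

/* True if the saved PSR describes userspace, 32-bit or 64-bit. */
static bool psr_mode_is_user_sketch(uint32_t psr)
{
    uint32_t mode = psr & PSR_MODE_MASK;

    return mode == PSR_MODE_EL0t || mode == PSR_MODE_USR;
}
```

Testing only "is this a 32-bit mode" would also classify a 32-bit kernel
in Supervisor mode as userspace, which is why usr mode is checked
explicitly.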

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: Correct PMXEV cp register definitions
Ian Campbell [Mon, 30 Mar 2015 11:12:21 +0000 (12:12 +0100)]
xen: arm: Correct PMXEV cp register definitions

p15,0,c9,c13,1 is PMXEVTYPER not PMXEVCNTR.
p15,0,c9,c13,2 is PMXEVCNTR not PMXEVCNR.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoflask: Update XEN_SYSCTL_cputopoinfo name
Boris Ostrovsky [Mon, 30 Mar 2015 20:17:59 +0000 (16:17 -0400)]
flask: Update XEN_SYSCTL_cputopoinfo name

Commit 2090f14c5cbd ("sysctl: make XEN_SYSCTL_topologyinfo sysctl a
little more efficient") renamed XEN_SYSCTL_topologyinfo to
XEN_SYSCTL_cputopoinfo.

It, however, neglected to update this macro for flask-related files.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxc: Introduce xc_domain_nr_gpfns as a cousin of xc_domain_maximum_gpfn.
Konrad Rzeszutek Wilk [Mon, 30 Mar 2015 14:46:38 +0000 (10:46 -0400)]
libxc: Introduce xc_domain_nr_gpfns as a cousin of xc_domain_maximum_gpfn.

The commit a8f8a590e02d2d2b717257c0bd9a8b396103bdf4
"libxc: Check xc_domain_maximum_gpfn for negative return values"
introduced a regression in tools outside libxc (migrate v2)
which wanted the unfiltered GPFN value. Said commit added
a wrapper which added +1 if there were no errors.

To make it work pre-commit a8f8a59 we add an xc_domain_nr_gpfns
which will add +1 if there are no errors (and change all in-tree
callers to use it). The xc_domain_maximum_gpfn will return the
unfiltered GPFN value.
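
The relationship between the two values can be sketched as below.
nr_gpfns_from_max is a hypothetical helper working on an already
retrieved value; it is not the libxc function, whose signature is not
shown here.

```c
/*
 * Hypothetical helper: turn the result of a "maximum GPFN" query into a
 * GPFN count. Negative values are errors and are passed through
 * unchanged; otherwise the count is the maximum GPFN plus one.
 */
static long nr_gpfns_from_max(long max_gpfn)
{
    if (max_gpfn < 0)
        return max_gpfn;
    return max_gpfn + 1;
}
```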

Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/xen-mfndump: Remove stray semicolon preventing 'dump-m2p' from functioning
Andrew Cooper [Fri, 27 Mar 2015 18:44:52 +0000 (18:44 +0000)]
tools/xen-mfndump: Remove stray semicolon preventing 'dump-m2p' from functioning

Introduced by c/s 1781f00e

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-IDs: 1291939 (stray semicolon), 1291941 (structurally dead code)
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Xen Coverity Team <coverity@xen.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: arm: Fix typo "Falltrough"
Ian Campbell [Wed, 25 Mar 2015 15:34:18 +0000 (15:34 +0000)]
xen: arm: Fix typo "Falltrough"

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agoxen: arm: correctly handle continuations for 64-bit guests
Ian Campbell [Thu, 26 Mar 2015 10:54:04 +0000 (10:54 +0000)]
xen: arm: correctly handle continuations for 64-bit guests

The 64-bit ABI is different to 32-bit:

 - uses x16 as the op register rather than r12.
 - arguments in x0..x5 and not r0..r5. Using rN here potentially
   truncates.
 - return value goes in x0, not r0.

Hypercalls can only be made directly from kernel space, so checking
the domain's size is sufficient.
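
The difference between the two ABIs can be sketched as follows. struct
guest_regs and set_continuation_regs are hypothetical and much simpler
than Xen's register handling; the sketch assumes the AArch32 registers
r0..r12 are viewed as the low halves of x0..x12.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_HYPERCALL_ARGS 6

/* Hypothetical saved guest register file. */
struct guest_regs {
    uint64_t x[31];
};

/* Store the hypercall op and its arguments for a restarted hypercall. */
static void set_continuation_regs(struct guest_regs *regs, bool is_64bit,
                                  uint64_t op,
                                  const uint64_t args[NR_HYPERCALL_ARGS])
{
    unsigned int i;

    if (is_64bit) {
        regs->x[16] = op;                     /* op register is x16 */
        for (i = 0; i < NR_HYPERCALL_ARGS; i++)
            regs->x[i] = args[i];             /* full 64-bit x0..x5 */
    } else {
        regs->x[12] = (uint32_t)op;           /* op register is r12 */
        for (i = 0; i < NR_HYPERCALL_ARGS; i++)
            regs->x[i] = (uint32_t)args[i];   /* r0..r5 hold 32 bits */
    }
}
```

Taking the 32-bit path for a 64-bit guest drops the upper half of any
pointer argument, which matches the spurious -EFAULT described below.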

Spotted due to spurious -EFAULT when destroying a domain, due to the
hypercall's pointer argument being truncated. I'm unclear why I am
only seeing this now.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
10 years agobuild: Reorganize and briefly document "external repo" template in tools/Makefile
George Dunlap [Thu, 26 Mar 2015 12:46:07 +0000 (12:46 +0000)]
build: Reorganize and briefly document "external repo" template in tools/Makefile

No functional changes.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: Comment cleanups
Ian Jackson [Tue, 10 Feb 2015 20:09:49 +0000 (20:09 +0000)]
libxl: Comment cleanups

* Add two comments in libxl_remus_disk_drbd documenting buggy handling
  of the hotplug script exit status.

* Add a section heading for async exec in libxl_aoutils.c

* Mention the right function name (libxl__ev_child_fork, not
  libxl__ev_fork) in libxl_internal.h

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxl: Further fix exit paths from libxl_device_events_handler
Ian Jackson [Tue, 10 Feb 2015 20:09:48 +0000 (20:09 +0000)]
libxl: Further fix exit paths from libxl_device_events_handler

On the success path, do not call GC_FREE explicitly.  Instead, call
AO_INPROGRESS.

GC_FREE will free the gc underlying the long-term ao, which is then
subsequently referenced in backend_watch_callback's call to
libxl__nested_ao_create.  It is a miracle that this ever works at all.

Also, add an `if (rc) goto out;' after the xswatch registration.

After this, libxl_device_events_handler has the conventional and
correct ao initiation pattern.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Roger Pau Monne <roger.pau@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/mkrpm: improve version.release handling
Olaf Hering [Tue, 24 Mar 2015 14:37:42 +0000 (14:37 +0000)]
tools/mkrpm: improve version.release handling

An increasing version and/or release number helps to update existing
packages without --force as in "rpm -Uvh --force xen.rpm". Instead it's
possible to do "rpm -Fvh *.rpm" to update only already installed
packages.

The usage of --force disables essential checks such as file conflict
detection. As a result the new xen.rpm may overwrite files owned by
other packages.

With the current way of calculating version-release it is difficult to
get an increasing release number into the spec file. The release is
always zero unless "make XEN_VENDORVERSION=`date +.%s`" is used,
which has the bad side effect that xen.gz always gets a different
filename every time.

Update mkrpm to recognize PKG_RELEASE=. Its value will be appended to
the Release string. It can be filled with a time stamp, like:
 make rpmball PKG_RELEASE="`date +%Y%m%d%H%M%S`"

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years agohotplug/Linux: add missing backslash in dom0_ip
Olaf Hering [Fri, 27 Mar 2015 10:29:24 +0000 (10:29 +0000)]
hotplug/Linux: add missing backslash in dom0_ip

Without it the actual error message is not written to xenstore.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agolibxc: Make conversion from page count to bytes 32-bit safe
Boris Ostrovsky [Thu, 26 Mar 2015 18:08:44 +0000 (14:08 -0400)]
libxc: Make conversion from page count to bytes 32-bit safe

Commit ba59e2ce935d ("libxc: allocate memory with vNUMA information for
PV guest") creates default vNUMA layout with a single range containing
all memory. The end of the range is calculated by shifting
dom->total_pages by 12 to the left.

On 32-bit dom0 this may result in losing upper bits since total_pages is
a 32-bit type.
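
For illustration only (not part of the patch), a minimal sketch of the
truncation. It assumes 4 KiB pages and models total_pages as uint32_t; the
function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages, matching the shift by 12 described above */

/* Shift carried out in 32 bits: the upper bits are gone before any widening. */
static inline uint64_t pages_to_bytes_truncating(uint32_t total_pages)
{
    return (uint32_t)(total_pages << PAGE_SHIFT);
}

/* Widen to 64 bits first, then shift, so no bits are lost. */
static inline uint64_t pages_to_bytes_safe(uint32_t total_pages)
{
    return (uint64_t)total_pages << PAGE_SHIFT;
}
```

With 0x100000 pages (4 GiB) the truncating variant yields 0, while the
widened variant yields 0x100000000.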

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86_emulate: split the {reg,mem} union in struct operand
Tim Deegan [Fri, 27 Mar 2015 15:13:07 +0000 (16:13 +0100)]
x86_emulate: split the {reg,mem} union in struct operand

In the hopes of making any future errors along the lines of XSA-123
into clean crashes instead of memory corruption bugs.
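
A simplified sketch of the idea, not the actual struct operand: all type,
field and function names below are hypothetical. In the union layout a
memory offset can be misread as a register pointer; with separate fields
the unused pointer can hold a known-bad value (NULL here, by
zero-initialisation), so a mistaken dereference faults instead of
corrupting memory.

```c
#include <stddef.h>

enum operand_type { OP_REG, OP_MEM };

struct mem_ref { unsigned int seg; unsigned long off; };

/* Union layout: the register pointer and the memory reference share storage. */
struct operand_union {
    enum operand_type type;
    union {
        unsigned long *reg;
        struct mem_ref mem;
    } u;
};

/* Split layout: each view has its own storage. */
struct operand_split {
    enum operand_type type;
    unsigned long *reg;
    struct mem_ref mem;
};

/* Returns 1 when the two views start at the same address. */
static inline int views_overlap_union(const struct operand_union *op)
{
    return (const void *)&op->u.reg == (const void *)&op->u.mem;
}

static inline int views_overlap_split(const struct operand_split *op)
{
    return (const void *)&op->reg == (const void *)&op->mem;
}
```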

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoRevert "x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P"
Tim Deegan [Fri, 27 Mar 2015 15:12:17 +0000 (16:12 +0100)]
Revert "x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P"

This reverts commit d639e6a05a0f8ee0e61c6cc4eebba78934ef3648.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Conflicts:
xen/arch/x86/domain.c
xen/arch/x86/mm.c
xen/arch/x86/mm/shadow/multi.c

10 years agoVT-d: improve fault info logging
Jan Beulich [Fri, 27 Mar 2015 14:23:25 +0000 (15:23 +0100)]
VT-d: improve fault info logging

I got repeatedly annoyed by nothing getting logged by default on VT-d
faults (and hence having to tell people to add extra command line
options), and hence I think it is time to redo this code:
Log basic fault information at guest-warning level (rate limited by
default), and show the page walk in verbose rather than only in debug
mode. Break up multi-line message so that each gets a proper log level
attached, at once splitting out the common part. Also don't log
"unknown" faults as interrupt-remapping ones.

As a minor cleanup fix the type of the involved "fault_type" variables.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years agox86: make atomic bitops consistent with non-atomic ones
Jan Beulich [Thu, 26 Mar 2015 10:24:15 +0000 (11:24 +0100)]
x86: make atomic bitops consistent with non-atomic ones

- use int instead of long pointers (matching the 'l' suffix on insns)
- use "+m" instead of a pair of "=m" and "m" in asm() constraints

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86: simplify non-atomic bitops
Jan Beulich [Thu, 26 Mar 2015 10:23:33 +0000 (11:23 +0100)]
x86: simplify non-atomic bitops

- being non-atomic, their pointer arguments shouldn't be volatile-
  qualified
- their (half fake) memory operands can be a single "+m" instead of
  being both an output and an input

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agox86/MSI: fix error handling
Jan Beulich [Thu, 26 Mar 2015 10:19:57 +0000 (11:19 +0100)]
x86/MSI: fix error handling

__setup_msi_irq() needs to undo what it did before calling
write_msi_msg() in case that returned an error.

map_domain_pirq() needs to get rid of the MSI descriptor it
(implicitly) allocated. The case of a setup_msi_irq() failure on a
non-initial multi-vector-MSI interrupt needs special handling: While
the initial IRQ will get freed by the caller (who also passed it to
us), we need to undo the effect setup_msi_irq() had on it. (As a
benefit from the added call to msi_free_irq() we no longer need to
explicitly call destroy_irq() on the non-initial slots.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agoLZ4 : fix the data abort issue
JeHyeon Yeon [Thu, 26 Mar 2015 10:19:10 +0000 (11:19 +0100)]
LZ4 : fix the data abort issue

If part of the compressed data is corrupted, or the compressed data is
entirely fake, memory accesses beyond the limit are possible.

This is the log from my system using lz4 decompression.
   [6502]data abort, halting
   [6503]r0  0x00000000 r1  0x00000000 r2  0xdcea0ffc r3  0xdcea0ffc
   [6509]r4  0xb9ab0bfd r5  0xdcea0ffc r6  0xdcea0ff8 r7  0xdce80000
   [6515]r8  0x00000000 r9  0x00000000 r10 0x00000000 r11 0xb9a98000
   [6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc  0x820149bc
   [6528]spsr 0x400001f3
and the memory addresses of some variables at the moment are
    ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000

As you can see, COPYLENGTH is 8 bytes, so @ref and @op can access memory
beyond @oend.
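
A minimal sketch of the kind of bounds check that rules this out, using the
addresses from the log above. It is an illustration under assumptions, not
the patch itself: the helper name is hypothetical and addresses are modelled
as 32-bit integers.

```c
#include <stdbool.h>
#include <stdint.h>

#define COPYLENGTH 8u  /* bytes moved per copy step */

/* True when an 8-byte step from both ref and op stays within oend. */
static inline bool copy_step_in_bounds(uint32_t ref, uint32_t op, uint32_t oend)
{
    /* Widen before adding so the sum cannot wrap around. */
    return (uint64_t)ref + COPYLENGTH <= oend &&
           (uint64_t)op + COPYLENGTH <= oend;
}
```

With ref = op = 0xdcea0ffc and oend = 0xdcea1000 only 4 bytes remain, so the
check fails and the copy would have to be rejected.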

Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
[Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agox86: don't change affinity with interrupt unmasked
Jan Beulich [Thu, 26 Mar 2015 10:18:28 +0000 (11:18 +0100)]
x86: don't change affinity with interrupt unmasked

With ->startup unmasking the IRQ, setting the affinity afterwards
without masking the IRQ again is invalid namely for MSI (address and
data can't be updated atomically and may - at least for MSI-X - be
cached while unmasked).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agohvmloader: don't treat ROM BAR like other BARs
Jan Beulich [Thu, 26 Mar 2015 10:17:51 +0000 (11:17 +0100)]
hvmloader: don't treat ROM BAR like other BARs

Its low 11 bits have different meaning.
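
For reference, a sketch of the difference based on the PCI register layout:
a memory BAR keeps flags in bits 3:0, while the expansion ROM BAR has its
enable bit in bit 0 and bits 10:1 reserved. The macro and function names are
hypothetical, not hvmloader's.

```c
#include <stdint.h>

#define MEM_BAR_ADDR_MASK (~UINT32_C(0xf))    /* bits 3:0: type and prefetch flags */
#define ROM_BAR_ADDR_MASK (~UINT32_C(0x7ff))  /* bits 10:0: enable bit plus reserved */
#define ROM_BAR_ENABLE    UINT32_C(0x1)       /* bit 0 of the ROM BAR */

/* Base address held in a 32-bit memory BAR. */
static inline uint32_t mem_bar_address(uint32_t bar)
{
    return bar & MEM_BAR_ADDR_MASK;
}

/* Base address held in the expansion ROM BAR. */
static inline uint32_t rom_bar_address(uint32_t bar)
{
    return bar & ROM_BAR_ADDR_MASK;
}
```

For a raw value of 0xf0000401 the memory mask gives 0xf0000400, whereas the
ROM mask gives 0xf0000000 with the enable bit set.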

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years agosysctl: don't overwrite array size variable when it is set on error earlier
Boris Ostrovsky [Thu, 26 Mar 2015 10:13:01 +0000 (11:13 +0100)]
sysctl: don't overwrite array size variable when it is set on error earlier

When querying CPU topology, if caller-provided array size is smaller than
number of online CPUs then, in addition to returning -ENOBUFS, sysctl is
expected to provide back this number. However, this value, stored in 'i',
is overwritten in the subsequent loop's control statement.

Make sure we don't do this by converting the loop to 'while'.
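
A simplified sketch of the resulting control flow, not the actual sysctl
code: the function and parameter names are hypothetical and the per-CPU copy
is elided. The point is that a 'while' loop has no initialiser that could
reset 'i'.

```c
#include <errno.h>

/*
 * Returns 0 or -ENOBUFS. *reported receives the count handed back to
 * the caller, which is the number of online CPUs in both cases.
 */
int query_topology(unsigned int array_size, unsigned int online,
                   unsigned int *reported)
{
    unsigned int i;
    int ret = 0;

    if (array_size < online) {
        ret = -ENOBUFS;
        i = online;   /* value to report back to the caller */
    } else {
        i = 0;
    }

    /* A 'for (i = 0; ...)' here would clobber the value stored above. */
    while (i < online) {
        /* ... copy the entry for CPU i to the caller's array ... */
        i++;
    }

    *reported = i;
    return ret;
}
```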

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>