xenbits.xensource.com Git - xen.git/log
11 years agoevtchn: refactor low-level event channel port ops
David Vrabel [Mon, 14 Oct 2013 08:15:49 +0000 (10:15 +0200)]
evtchn: refactor low-level event channel port ops

Use functions for the low-level event channel port operations
(set/clear pending, unmask, is_pending and is_masked).

Group these functions into a struct evtchn_port_op so they can be
replaced by alternate implementations (for different ABIs) on a
per-domain basis.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
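
The grouping described in the commit above can be pictured as a table of function pointers selected per domain. The following is only a minimal sketch: every name prefixed with demo_ or bitmap_ is hypothetical, the 64-port bitmap is a toy stand-in, and none of this is the actual Xen definition of struct evtchn_port_op.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_domain;

/* Table of low-level port operations; one such table per ABI (hypothetical). */
struct demo_port_ops {
    void (*set_pending)(struct demo_domain *d, unsigned int port);
    void (*clear_pending)(struct demo_domain *d, unsigned int port);
    void (*unmask)(struct demo_domain *d, unsigned int port);
    bool (*is_pending)(const struct demo_domain *d, unsigned int port);
    bool (*is_masked)(const struct demo_domain *d, unsigned int port);
};

struct demo_domain {
    const struct demo_port_ops *port_ops; /* selected on a per-domain basis */
    uint64_t pending;                     /* toy state: ports 0..63 only */
    uint64_t masked;
};

/* One possible implementation backed by flat 64-bit bitmaps. */
static void bitmap_set_pending(struct demo_domain *d, unsigned int port)
{
    d->pending |= UINT64_C(1) << port;
}

static void bitmap_clear_pending(struct demo_domain *d, unsigned int port)
{
    d->pending &= ~(UINT64_C(1) << port);
}

static void bitmap_unmask(struct demo_domain *d, unsigned int port)
{
    d->masked &= ~(UINT64_C(1) << port);
}

static bool bitmap_is_pending(const struct demo_domain *d, unsigned int port)
{
    return (d->pending >> port) & 1;
}

static bool bitmap_is_masked(const struct demo_domain *d, unsigned int port)
{
    return (d->masked >> port) & 1;
}

static const struct demo_port_ops demo_bitmap_ops = {
    .set_pending   = bitmap_set_pending,
    .clear_pending = bitmap_clear_pending,
    .unmask        = bitmap_unmask,
    .is_pending    = bitmap_is_pending,
    .is_masked     = bitmap_is_masked,
};

/* Callers use thin wrappers and never touch the bitmaps directly. */
static inline void demo_port_set_pending(struct demo_domain *d, unsigned int port)
{
    d->port_ops->set_pending(d, port);
}

static inline bool demo_port_is_pending(const struct demo_domain *d, unsigned int port)
{
    return d->port_ops->is_pending(d, port);
}
```

Because callers only go through the wrappers, a domain using a different ABI would just point port_ops at another table.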
11 years agodebug: remove some event channel info from the 'i' and 'q' debug keys
David Vrabel [Mon, 14 Oct 2013 08:14:38 +0000 (10:14 +0200)]
debug: remove some event channel info from the 'i' and 'q' debug keys

The 'i' key would always use VCPU0's selector word when printing the
event channel state. Remove the incorrect output as a subsequent
change will add the (correct) information to the 'e' key instead.

When dumping domain information, printing the state of the VIRQ_DEBUG
port is redundant -- this information is available via the 'e' key.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/HVM: cache emulated instruction for retry processing
Jan Beulich [Mon, 14 Oct 2013 07:54:09 +0000 (09:54 +0200)]
x86/HVM: cache emulated instruction for retry processing

Rather than re-reading the instruction bytes upon retry processing,
stash away and re-use what we already read. That way we can be certain
that the retry won't do something different from what requested the
retry, getting once again closer to real hardware behavior (where what
we use retries for is simply a bus operation, not involving redundant
decoding of instructions).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/HVM: properly deal with hvm_copy_*_guest_phys() errors
Jan Beulich [Mon, 14 Oct 2013 07:53:31 +0000 (09:53 +0200)]
x86/HVM: properly deal with hvm_copy_*_guest_phys() errors

In memory read/write handling the default case should tell the caller
that the operation cannot be handled rather than the operation having
succeeded, so that when new HVMCOPY_* states get added not handling
them explicitly will not result in errors being ignored.

In task switch emulation code stop handling some errors, but not
others.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/HVM: don't ignore hvm_copy_to_guest_phys() errors during I/O intercept
Jan Beulich [Mon, 14 Oct 2013 07:52:33 +0000 (09:52 +0200)]
x86/HVM: don't ignore hvm_copy_to_guest_phys() errors during I/O intercept

Building upon the extended retry logic we can now also make sure to
not ignore errors resulting from writing data back to guest memory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/HVM: fix direct PCI port I/O emulation retry and error handling
Jan Beulich [Mon, 14 Oct 2013 07:51:40 +0000 (09:51 +0200)]
x86/HVM: fix direct PCI port I/O emulation retry and error handling

dpci_ioport_{read,write}() guest memory access failure handling should
be modelled after process_portio_intercept()'s (and others): Upon
encountering an error on other than the first iteration, the count
successfully handled needs to be stored and X86EMUL_OKAY returned, in
order for the generic instruction emulator to update register state
correctly before reporting failure or retrying (both of which would
only happen after re-invoking emulation).

Further we leverage (and slightly extend, due to the above mentioned
need to return X86EMUL_OKAY) the "large MMIO" retry model.

Note that there is still a special case not explicitly taken care of
here: While the first retry on the last iteration of a "rep ins"
correctly recovers the already read data, an eventual subsequent retry
is being handled by the pre-existing mmio-large logic (through
hvmemul_do_io() storing the [recovered] data [again], also taking into
consideration that the emulator converts a single iteration "ins" to
->read_io() plus ->write()).

Also fix an off-by-one in the mmio-large-read logic, and slightly
simplify the copying of the data.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/HVM: properly handle backward string instruction emulation
Jan Beulich [Mon, 14 Oct 2013 07:50:16 +0000 (09:50 +0200)]
x86/HVM: properly handle backward string instruction emulation

Multiplying a signed 32-bit quantity with an unsigned 32-bit quantity
produces an unsigned 32-bit result, yet for emulation of backward
string instructions we need the result sign extended before getting
added to the base address.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
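
The arithmetic issue behind the backward string instruction fix above can be reproduced in isolation. This is a sketch, not the emulator code: the function names are hypothetical, and it assumes a platform where int is 32 bits wide.

```c
#include <stdint.h>

/*
 * Buggy form: 'step' is converted to unsigned int before the multiply,
 * so the 32-bit product is zero-extended when added to the 64-bit base.
 */
static uint64_t demo_addr_buggy(uint64_t base, int32_t step, uint32_t count)
{
    return base + step * count;
}

/*
 * Fixed form: widen to a signed 64-bit type first, so a negative
 * product is sign-extended before being added to the base.
 */
static uint64_t demo_addr_fixed(uint64_t base, int32_t step, uint32_t count)
{
    return base + (uint64_t)((int64_t)step * count);
}
```

With base 0x1000, step -4 and count 3, the fixed form moves 12 bytes backward (0xff4), while the buggy form lands almost 4GiB forward (0x100000ff4).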
11 years agosched: Correct function prototypes
Andrew Cooper [Mon, 14 Oct 2013 07:07:44 +0000 (09:07 +0200)]
sched: Correct function prototypes

struct vcpu pointers are traditionally v rather than d.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/MSI: fix locking in pci_restore_msi_state()
Jan Beulich [Mon, 14 Oct 2013 07:07:02 +0000 (09:07 +0200)]
x86/MSI: fix locking in pci_restore_msi_state()

Right after the loop the lock is being dropped, so all loop exits
should happen with the lock still held.

Reported-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Kristoffer Egefelt <kristoffer@itoc.dk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agosched: fix race between sched_move_domain() and vcpu_wake()
David Vrabel [Mon, 14 Oct 2013 06:58:31 +0000 (08:58 +0200)]
sched: fix race between sched_move_domain() and vcpu_wake()

From: David Vrabel <david.vrabel@citrix.com>

sched_move_domain() changes v->processor for all the domain's VCPUs.
If another domain, softirq etc. triggers a simultaneous call to
vcpu_wake() (e.g., by setting an event channel as pending), then
vcpu_wake() may lock one schedule lock and try to unlock another.

vcpu_schedule_lock() attempts to handle this but only does so for the
window between reading the schedule_lock from the per-CPU data and the
spin_lock() call.  This does not help with sched_move_domain()
changing v->processor between the calls to vcpu_schedule_lock() and
vcpu_schedule_unlock().

Fix the race by taking the schedule_lock for v->processor in
sched_move_domain().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Use vcpu_schedule_lock_irq() (which now returns the lock) to properly
retry the locking should the lock to be used have changed in the course
of acquiring it (issue pointed out by George Dunlap).

Add a comment explaining the state after the v->processor adjustment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoscheduler: adjust internal locking interface
Jan Beulich [Mon, 14 Oct 2013 06:57:56 +0000 (08:57 +0200)]
scheduler: adjust internal locking interface

Make the locking functions return the lock pointers, so they can be
passed to the unlocking functions (which in turn can check that the
lock is still actually providing the intended protection, i.e. the
parameters determining which lock is the right one didn't change).

Further use proper spin lock primitives rather than open coded
local_irq_...() constructs, so that interrupts can be re-enabled as
appropriate while spinning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
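
The interface change described above (the lock function returns the lock it took, and the unlock function receives it back and can verify it) might look roughly like the sketch below. All demo_ names are hypothetical, the spin lock is a toy built on C11 atomics, and interrupt handling is left out entirely; this is not the Xen scheduler code.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NR_CPUS 4

typedef struct { atomic_flag held; } demo_lock_t;

/* One scheduler lock per CPU (toy model). */
static demo_lock_t demo_cpu_lock[DEMO_NR_CPUS] = {
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
    { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT },
};

struct demo_vcpu {
    _Atomic unsigned int processor; /* decides which lock protects the vcpu */
};

static void demo_spin_lock(demo_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin */
}

static void demo_spin_unlock(demo_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Take the lock matching v->processor; retry if it changed meanwhile. */
static demo_lock_t *demo_vcpu_schedule_lock(struct demo_vcpu *v)
{
    for (;;) {
        demo_lock_t *lock = &demo_cpu_lock[atomic_load(&v->processor)];

        demo_spin_lock(lock);
        if (lock == &demo_cpu_lock[atomic_load(&v->processor)])
            return lock;        /* still the right lock */
        demo_spin_unlock(lock); /* v moved; try again */
    }
}

/* The caller hands the lock back, so the pairing can be checked. */
static void demo_vcpu_schedule_unlock(demo_lock_t *lock, struct demo_vcpu *v)
{
    assert(lock == &demo_cpu_lock[atomic_load(&v->processor)]);
    demo_spin_unlock(lock);
}
```

Returning the pointer means the unlock side releases exactly the lock that was acquired, rather than recomputing it from state that may have changed.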
11 years agox86: fix bug_line()
Jan Beulich [Mon, 14 Oct 2013 06:52:18 +0000 (08:52 +0200)]
x86: fix bug_line()

Due to the packing into a bit field together with a relocated field,
the computation can overflow when the relocated field ends up getting a
negative value stored. Hence it isn't sufficient to correct the value
by 1 in this case, but we also need to mask the result to the width of
the original bit field.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoRevert "QEMU_TAG update"
Ian Jackson [Fri, 11 Oct 2013 18:05:31 +0000 (19:05 +0100)]
Revert "QEMU_TAG update"

(My script edited the wrong xen.git branch)

This reverts commit 363cfda13a58eab51a4a85f30c7c740990b53c3a.

11 years agoQEMU_TAG update
Ian Jackson [Fri, 11 Oct 2013 18:04:25 +0000 (19:04 +0100)]
QEMU_TAG update

11 years agolibxl: make libxl__poller_put tolerate p==NULL
Ian Jackson [Fri, 11 Oct 2013 11:10:45 +0000 (12:10 +0100)]
libxl: make libxl__poller_put tolerate p==NULL

This is less fragile, and more in keeping with the usual style of
initialising everything to 0 and freeing things unconditionally.

Correspondingly, remove the tests at the call sites.

Apropos of c1f3f174.  No overall functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86: check for canonical address before doing page walks
Jan Beulich [Fri, 11 Oct 2013 07:31:16 +0000 (09:31 +0200)]
x86: check for canonical address before doing page walks

... as there doesn't really exist any valid mapping for them.

Particularly in the case of do_page_walk() this also avoids returning
non-NULL for such invalid input.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
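
For readers unfamiliar with the term, a canonical-address test can be sketched as follows. The helper name is hypothetical and the sketch assumes 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal; it is not the check used in the Xen source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses: bits 63..47 must be all 0 or all 1. */
static bool demo_is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```

A page walk would be skipped whenever such a check fails, since no mapping can exist for a non-canonical address.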
11 years agox86: use {rd,wr}{fs,gs}base when available
Jan Beulich [Fri, 11 Oct 2013 07:30:31 +0000 (09:30 +0200)]
x86: use {rd,wr}{fs,gs}base when available

... as being intended to be faster than MSR reads/writes.

In the case of emulate_privileged_op() also use these in favor of the
cached (but possibly stale) addresses from arch.pv_vcpu. This allows
entirely removing the code that was the subject of XSA-67.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: add address validity check to guest_map_l1e()
Jan Beulich [Fri, 11 Oct 2013 07:29:43 +0000 (09:29 +0200)]
x86: add address validity check to guest_map_l1e()

Just like for guest_get_eff_l1e() this prevents accessing as page
tables (and with the wrong memory attribute) internal data inside Xen
happening to be mapped with 1Gb pages.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: correct LDT checks
Jan Beulich [Fri, 11 Oct 2013 07:28:26 +0000 (09:28 +0200)]
x86: correct LDT checks

- MMUEXT_SET_LDT should behave as similarly to the LLDT instruction as
  possible: fail only if the base address is non-canonical
- instead LDT descriptor accesses should fault if the descriptor
  address ends up being non-canonical (by ensuring this we at once
  avoid reading an entry from the mach-to-phys table and consider it a
  page table entry)
- fault propagation on using LDT selectors must distinguish #PF and #GP
  (the latter must be raised for a non-canonical descriptor address,
  which also applies to several other uses of propagate_page_fault(),
  and hence the problem is being fixed there)
- map_ldt_shadow_page() should properly wrap addresses for 32-bit VMs

At once remove the odd invocation of map_ldt_shadow_page() from the
MMUEXT_SET_LDT handler: There's nothing really telling us that the
first LDT page is going to be preferred over others.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agolibxl: fix out-of-memory error handling in libxl_list_cpupool
Matthew Daley [Tue, 10 Sep 2013 10:18:46 +0000 (22:18 +1200)]
libxl: fix out-of-memory error handling in libxl_list_cpupool

...otherwise it will return freed memory. All the current users of this
function check already for a NULL return, so use that.

Coverity-ID: 1056194

This is CVE-2013-4371 / XSA-70

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/ocaml: fix erroneous free of cpumap in stub_xc_vcpu_getaffinity
Matthew Daley [Tue, 10 Sep 2013 11:12:45 +0000 (23:12 +1200)]
tools/ocaml: fix erroneous free of cpumap in stub_xc_vcpu_getaffinity

Not sure how it got there...

Coverity-ID: 1056196

This is CVE-2013-4370 / XSA-69

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: fix vif rate parsing
Ian Jackson [Thu, 10 Oct 2013 14:48:55 +0000 (15:48 +0100)]
libxl: fix vif rate parsing

strtok can return NULL here. We don't need to use strtok anyway, so just
use a simple strchr method.

Coverity-ID: 1055642

This is CVE-2013-4369 / XSA-68

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix type. Add test case

Signed-off-by: Ian Campbell <Ian.campbell@citrix.com>
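
The strchr approach mentioned in the vif rate fix above can be illustrated with a small splitter. This is a sketch under assumptions: the helper name is hypothetical, and the "LEFT@RIGHT" shape with an optional "@RIGHT" part is assumed for illustration, since the log does not show the real parser or its syntax.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split 'spec' in place at the first '@'.
 * Returns a pointer to the part after the separator, or NULL when there
 * is no separator (in which case 'spec' is left untouched).
 */
static char *demo_split_at(char *spec)
{
    char *sep = strchr(spec, '@');

    if (sep == NULL)
        return NULL; /* the caller must handle the missing part */

    *sep = '\0';
    return sep + 1;
}
```

The missing-separator case is an explicit return value here, which makes it harder to overlook than an unchecked NULL from a second strtok call.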
11 years agox86: check segment descriptor read result in 64-bit OUTS emulation
Matthew Daley [Thu, 10 Oct 2013 13:19:53 +0000 (15:19 +0200)]
x86: check segment descriptor read result in 64-bit OUTS emulation

When emulating such an operation from a 64-bit context (CS has long
mode set), and the data segment is overridden to FS/GS, the result of
reading the overridden segment's descriptor (read_descriptor) is not
checked. If it fails, data_base is left uninitialized.

This can lead to 8 bytes of Xen's stack being leaked to the guest
(implicitly, i.e. via the address given in a #PF).

Coverity-ID: 1055116

This is CVE-2013-4368 / XSA-67.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Fix formatting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agoxen/arm: Fixing clear_guest_offset macro
Jaeyong Yoo [Fri, 4 Oct 2013 04:44:02 +0000 (13:44 +0900)]
xen/arm: Fixing clear_guest_offset macro

Fix the broken macro 'clear_guest_offset' in arm.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Thu, 10 Oct 2013 11:41:10 +0000 (12:41 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

11 years agolibxl: introduce libxl_node_to_cpumap
Dario Faggioli [Thu, 3 Oct 2013 17:46:02 +0000 (19:46 +0200)]
libxl: introduce libxl_node_to_cpumap

As a helper for the special case (of libxl_nodemap_to_cpumap) when
one wants the cpumap for just one node.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agoxl: fix a typo in main_vcpulist()
Dario Faggioli [Thu, 3 Oct 2013 17:45:47 +0000 (19:45 +0200)]
xl: fix a typo in main_vcpulist()

which was preventing `xl vcpu-list -h' from working.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxl: update the manpage about "cpus=" and NUMA node-affinity
Dario Faggioli [Thu, 3 Oct 2013 17:45:38 +0000 (19:45 +0200)]
xl: update the manpage about "cpus=" and NUMA node-affinity

Since d06b1bf169a01a9c7b0947d7825e58cb455a0ba5 ('libxl: automatic placement
deals with node-affinity') it is no longer true that, if no "cpus=" option
is specified, xl picks up some pCPUs by default and pins the domain there.

In fact, it is the NUMA node-affinity that is affected by automatic
placement, not vCPU to pCPU pinning.

Update the xl config file documentation accordingly, as it seems to have
been forgotten at that time.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agotools/migrate: Fix regression when migrating from older version of Xen
Andrew Cooper [Thu, 10 Oct 2013 11:23:10 +0000 (12:23 +0100)]
tools/migrate: Fix regression when migrating from older version of Xen

Commit 00a4b65f8534c9e6521eab2e6ce796ae36037774 Sep 7 2010
  "libxc: provide notification of final checkpoint to restore end"
broke migration from any version of Xen using tools from prior to that commit.

Older tools have no idea about an XC_SAVE_ID_LAST_CHECKPOINT, causing the
newer tools' xc_domain_restore() to start reading the qemu save record, as
ctx->last_checkpoint is 0.

The failure looks like:
  xc: error: Max batch size exceeded (1970103633). Giving up.
where 1970103633 = 0x756d6551 = *(uint32_t*)"Qemu"

With this fix in place, the behaviour for normal migrations is reverted to how
it was before the regression; the migration is considered non-checkpointed
right from the start.  A XC_SAVE_ID_LAST_CHECKPOINT chunk seen in the
migration stream is a nop.  For checkpointed migrations the behaviour is
unchanged.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> (Remus bits)
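
The "batch size" in the failure message above is just the first four bytes of the string "Qemu" read as a little-endian 32-bit integer. The following sketch reproduces that arithmetic; the helper name is hypothetical and the byte order is decoded explicitly so that the result does not depend on the host.

```c
#include <stdint.h>

/* Decode four bytes as a little-endian 32-bit value. */
static uint32_t demo_le32(const unsigned char *b)
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

Here 'Q', 'e', 'm', 'u' are 0x51, 0x65, 0x6d, 0x75, so demo_le32((const unsigned char *)"Qemu") gives 0x756d6551, which is 1970103633 in decimal.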
11 years agotools: adds tracer on qemu-xen debug configure options
Fabio Fantoni [Fri, 27 Sep 2013 14:00:46 +0000 (16:00 +0200)]
tools: adds tracer on qemu-xen debug configure options

When building tools in debug mode (debug=y), also pass
--enable-trace-backend=stderr when configuring qemu-xen.
This is useful for debugging.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
11 years agoxen/arm32: Call start_xen only on the boot CPU
Julien Grall [Mon, 7 Oct 2013 14:44:35 +0000 (15:44 +0100)]
xen/arm32: Call start_xen only on the boot CPU

The boot CPU can have a CPU ID not equal to zero. Xen needs to check the
logical CPU ID (in r12) to know if the CPU is the boot one.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoqemu-xen: Set localstatedir to /var.
Anthony PERARD [Tue, 8 Oct 2013 12:59:57 +0000 (13:59 +0100)]
qemu-xen: Set localstatedir to /var.

This path is used by the QEMU build system to create the /run directory.
If local-state-dir is not set, the result becomes $prefix/var, which
is not an acceptable path.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoqemu-xen: Disabling build of guest-agent.
Anthony PERARD [Tue, 8 Oct 2013 12:59:56 +0000 (13:59 +0100)]
qemu-xen: Disabling build of guest-agent.

It is not used when QEMU is run with Xen.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agohvm/vidirian: Avoid printing page_to_mfn(NULL) on error paths
Andrew Cooper [Wed, 9 Oct 2013 10:11:48 +0000 (12:11 +0200)]
hvm/vidirian: Avoid printing page_to_mfn(NULL) on error paths

While working in the viridian code, I noticed that 4cb6c4f4941

"x86/hvm: Use get_page_from_gfn() instead of get_gfn()/put_gfn."

introduced two error paths where page_to_mfn(NULL) would be formatted and
presented as a bad MFN.  This provides junk in the warning rather than
something useful.

These two codepaths are fixed up to match their counterpart in
wrmsr_hypervisor_regs()

While auditing the other changes from 4cb6c4f4941, I noticed a small
optimisation which could be made by changing the order of the validity checks
to remove 6 NULL pointer checks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/traps: improvements to {rd,wr}msr_hypervisor_regs()
Andrew Cooper [Wed, 9 Oct 2013 10:10:46 +0000 (12:10 +0200)]
x86/traps: improvements to {rd,wr}msr_hypervisor_regs()

Coverity ID: 1055249 1055250

Coverity was complaining that the switch statements contained dead code in
their default statements.  While this is quite minor, the code flow in
wrmsr_hypervisor_regs() was sufficiently opaque that I felt it appropriate to
fix.

Other improvements include:
 * not shadowing the function parameter 'idx'.
 * use of PAGE_{SHIFT,SIZE} instead of opencoded numbers.
 * a more descriptive error message for attempting to write invalid indices
   for hypercall pages.

There is no behavioural change as a result.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoxen/x86: Remove GB macro in asm-x86/config.h
Julien Grall [Tue, 8 Oct 2013 16:48:33 +0000 (17:48 +0100)]
xen/x86: Remove GB macro in asm-x86/config.h

Commit 983843e "xen: Add macros MB and GB" introduced a generic GB macro.
By mistake, the macro in asm-x86/config.h was not removed. This results in
a compilation error when Xen is built for x86.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoxen/dts: Support Linux initrd DT bindings
Julien Grall [Fri, 27 Sep 2013 16:56:37 +0000 (17:56 +0100)]
xen/dts: Support Linux initrd DT bindings

Linux uses the property linux,initrd-start and linux,initrd-end to know where
the initrd lives in memory.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: Add support to load initrd in dom0
Julien Grall [Fri, 27 Sep 2013 16:56:36 +0000 (17:56 +0100)]
xen/arm: Add support to load initrd in dom0

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/dts: Use ROUNDUP macro instead of the internal ALIGN
Julien Grall [Fri, 27 Sep 2013 16:56:35 +0000 (17:56 +0100)]
xen/dts: Use ROUNDUP macro instead of the internal ALIGN

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: Add macro ROUNDUP
Julien Grall [Fri, 27 Sep 2013 16:56:34 +0000 (17:56 +0100)]
xen: Add macro ROUNDUP

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
11 years agoxen: Add macros MB and GB
Julien Grall [Fri, 27 Sep 2013 16:56:33 +0000 (17:56 +0100)]
xen: Add macros MB and GB

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/HPET: basic cleanup
Andrew Cooper [Tue, 8 Oct 2013 09:09:22 +0000 (11:09 +0200)]
x86/HPET: basic cleanup

* Strip trailing whitespace
* Remove redundant definitions
* Update stale documentation links
* Move hpet_address into __initdata

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoVT-d: fix suspected data race condition in iommu_set_root_entry()
Andrew Cooper [Tue, 8 Oct 2013 09:06:48 +0000 (11:06 +0200)]
VT-d: fix suspected data race condition in iommu_set_root_entry()

Coverity ID: 1054967

Coverity spotted that iommu->root_maddr was optionally allocated within the
protection of the iommu->lock, but was referenced with the protection of the
iommu->register_lock, and freed without any lock.

Luckily, the code as-is is not vulnerable to the potential risks identified.

However, the alloc_pgtable_maddr() is far more appropriately done in
iommu_alloc(), removing a set of spinlock calls, and a possibility for the
iommu setup to fail later than iommu_alloc() with an -ENOMEM.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
11 years agolibxc: add LZ4 decompression support
Jan Beulich [Mon, 7 Oct 2013 07:42:51 +0000 (09:42 +0200)]
libxc: add LZ4 decompression support

Since there's no shared or static library to link against, this simply
re-uses the hypervisor side code. However, I only audited the code
added here for possible security issues, not the referenced code in
the hypervisor tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: add LZ4 decompression support
Kyungsik Lee [Mon, 7 Oct 2013 07:40:35 +0000 (09:40 +0200)]
xen: add LZ4 decompression support

Add support for LZ4 decompression in Xen. LZ4 Decompression APIs for
Xen are based on LZ4 implementation by Yann Collet.

Benchmark Results(PATCH v3)
Compiler: Linaro ARM gcc 4.6.2

1. ARMv7, 1.5GHz based board
   Kernel: linux 3.4
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.7MB            20.1MB/s, 25.2MB/s(UA)
   LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)

2. ARMv7, 1.7GHz based board
   Kernel: linux 3.7
   Uncompressed Kernel Size: 14MB
        Compressed Size  Decompression Speed
   LZO  6.0MB            34.1MB/s, 52.2MB/s(UA)
   LZ4  6.5MB            86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied

This patch set is for adding support for LZ4-compressed Kernel.  LZ4 is a
very fast lossless compression algorithm and it also features an extremely
fast decoder [1].

However, we already have five decompressors, which raises the question
of where to stop adding new ones.  This issue has been discussed and a
conclusion was reached [2].

Russell King said that we should have:

 - one decompressor which is the fastest
 - one decompressor for the highest compression ratio
 - one popular decompressor (eg conventional gzip)

If we have a replacement one for one of these, then it should do exactly
that: replace it.

The benchmark shows an 8% increase in image size versus a 66% increase
in decompression speed compared to LZO (which has been known as the
fastest decompressor in the Kernel).  Therefore the "fast but may not be
small" compression title has clearly been taken by LZ4 [3].

[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347

LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository: http://code.google.com/p/lz4/

Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: Improve information from domain_crash_synchronous
Andrew Cooper [Fri, 4 Oct 2013 10:58:20 +0000 (12:58 +0200)]
x86: Improve information from domain_crash_synchronous

As it currently stands, the string "domain_crash_sync called from entry.S" is
not helpful at identifying why the domain was crashed, and a debug build of
Xen doesn't help the matter.

This patch improves the information printed, by pointing to where the crash
decision was made.

Specific improvements include:
 * Moving the ascii string "domain_crash_sync called from entry.S\n" away from
   some semi-hot code cache lines.
 * Moving the printk into C code (especially as this_cpu() is miserable to use
   in assembly code)
 * Undo the previous confusing situation of having the
   domain_crash_synchronous() as a macro in C code, yet a global symbol in
   assembly code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/traps: Record last extable faulting address
Andrew Cooper [Fri, 4 Oct 2013 10:57:43 +0000 (12:57 +0200)]
x86/traps: Record last extable faulting address

... so the following patch can identify the location of faults leading to a
decision to crash a domain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: allow HVM guests to make console_io hypercall
Konrad Rzeszutek Wilk [Fri, 4 Oct 2013 10:54:38 +0000 (12:54 +0200)]
x86: allow HVM guests to make console_io hypercall

The console_io hypercall is provided for PV guests and for HVM
guests it is done via the 0xe9 port. However the PV hypercall
is more efficient as it takes a string rather than one character
per write.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
11 years agoxsm: clean up unneeded current references
Daniel De Graaf [Fri, 4 Oct 2013 10:52:56 +0000 (12:52 +0200)]
xsm: clean up unneeded current references

Some XSM hooks in dummy.h used current->domain when this was also passed
as a parameter; use the parameter in these cases. There are two hooks
where this does not apply and which are not immediately obvious:
xsm_set_target's parameters are the device model and HVM domains, and
xsm_mem_sharing_op's first parameter is the source of the shared page,
not the domain making the hypercall.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoxsm: forbid PV guest console reads
Daniel De Graaf [Fri, 4 Oct 2013 10:51:44 +0000 (12:51 +0200)]
xsm: forbid PV guest console reads

The CONSOLEIO_read operation was incorrectly allowed to PV guests if the
hypervisor was compiled in debug mode (with VERBOSE defined).

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agox86: make hvm_cpuid() tolerate NULL pointers
Jan Beulich [Fri, 4 Oct 2013 10:32:25 +0000 (12:32 +0200)]
x86: make hvm_cpuid() tolerate NULL pointers

Now that other HVM code started making more extensive use of
hvm_cpuid(), let's not force every caller to declare dummy variables
for output not cared about.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
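
The calling convention described in the hvm_cpuid() change above, where outputs the caller does not care about may be passed as NULL, can be sketched like this. The function is hypothetical and fills in made-up placeholder values; it does not execute CPUID and is not Xen's hvm_cpuid().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in: any of the four output pointers may be NULL.
 * The values written are placeholders derived from 'leaf', not real
 * CPUID data.
 */
static void demo_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                       uint32_t *ecx, uint32_t *edx)
{
    uint32_t dummy;

    /* Redirect unwanted outputs to a local, so the body stays branch-free. */
    if (eax == NULL) eax = &dummy;
    if (ebx == NULL) ebx = &dummy;
    if (ecx == NULL) ecx = &dummy;
    if (edx == NULL) edx = &dummy;

    *eax = leaf;
    *ebx = leaf + 1;
    *ecx = leaf + 2;
    *edx = leaf + 3;
}
```

A caller interested only in one register can then write demo_cpuid(7, NULL, NULL, &ecx, NULL) without declaring throwaway variables.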
11 years agoNested VMX: fix IA32_VMX_CR4_FIXED1 msr emulation
Yang Zhang [Fri, 4 Oct 2013 10:30:09 +0000 (12:30 +0200)]
Nested VMX: fix IA32_VMX_CR4_FIXED1 msr emulation

Currently, a hardcoded value is used for IA32_VMX_CR4_FIXED1. This is wrong.
We should check the guest's cpuid to know which bits are writeable in CR4 by the
guest and allow the guest to set the corresponding bit only when it has the feature.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Cleanup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
11 years agoVMX: clean up capability checks
Jan Beulich [Fri, 4 Oct 2013 10:29:08 +0000 (12:29 +0200)]
VMX: clean up capability checks

VMCS size validation on APs should check against BP's size.

No need for a separate cpu_has_vmx_ins_outs_instr_info variable
anymore.

Use proper symbolics.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
11 years agoNested VMX: check VMX capability before read VMX related MSRs
Yang Zhang [Fri, 4 Oct 2013 10:28:14 +0000 (12:28 +0200)]
Nested VMX: check VMX capability before read VMX related MSRs

VMX MSRs are only available when the CPU supports the VMX feature. In addition,
VMX_TRUE* MSRs are only available when bit 55 of the VMX_BASIC MSR is set.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Cleanup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
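
The bit-55 condition mentioned above reduces to a single bit test on the 64-bit VMX_BASIC value. In this sketch the helper name is hypothetical and the MSR value is passed in as a plain integer; reading the MSR itself is outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bit 55 of a VMX_BASIC value is set (VMX_TRUE* MSRs usable). */
static bool demo_has_true_ctls(uint64_t vmx_basic)
{
    return (vmx_basic >> 55) & 1;
}
```

Code following this pattern would consult such a predicate before touching any of the VMX_TRUE* MSRs.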
11 years agox86/percpu: Force INVALID_PERCPU_AREA into the non-canonical address region
Andrew Cooper [Fri, 4 Oct 2013 10:24:34 +0000 (12:24 +0200)]
x86/percpu: Force INVALID_PERCPU_AREA into the non-canonical address region

This causes accidental uses of per_cpu() on a pcpu with an INVALID_PERCPU_AREA
to result in a #GP for attempting to access the middle of the non-canonical
virtual address region.
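
For reference, a small sketch of what "non-canonical" means here, assuming 48-bit virtual addresses (the helper name is made up for the example): an address is canonical only if bits 63..47 are all equal, so an address around 0x8000000000000000 sits in the middle of the hole and faults on access.

```python
def is_canonical(addr, va_bits=48):
    """True if bits [63:va_bits-1] of a 64-bit address are all 0 or all 1."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones
```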

This is preferable to the current behaviour, where incorrect use of per_cpu()
will result in an effective NULL structure dereference which has security
implications in the context of PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/idle: Fix get_cpu_idle_time()'s interaction with offline pcpus
Andrew Cooper [Fri, 4 Oct 2013 10:23:23 +0000 (12:23 +0200)]
x86/idle: Fix get_cpu_idle_time()'s interaction with offline pcpus

Checking for "idle_vcpu[cpu] != NULL" is insufficient protection against
offline pcpus.  From a hypercall, vcpu_runstate_get() will determine "v !=
current", and try to take the vcpu_schedule_lock().  This will try to look up
per_cpu(schedule_data, v->processor) and promptly suffer a NULL structure
dereference as v->processor's __per_cpu_offset is INVALID_PERCPU_AREA.

One example might look like this:

...
Xen call trace:
   [<ffff82c4c0126ddb>] vcpu_runstate_get+0x50/0x113
   [<ffff82c4c0126ec6>] get_cpu_idle_time+0x28/0x2e
   [<ffff82c4c012b5cb>] do_sysctl+0x3db/0xeb8
   [<ffff82c4c023280d>] compat_hypercall+0xbd/0x116

Pagetable walk from 0000000000000040:
 L4[0x000] = 0000000186df8027 0000000000028207
 L3[0x000] = 0000000188e36027 00000000000261c9
 L2[0x000] = 0000000000000000 ffffffffffffffff

****************************************
Panic on CPU 11:
...

get_cpu_idle_time() has been updated to correctly deal with offline pcpus
itself by returning 0, in the same way as it would if it was missing the
idle_vcpu[] pointer.

In doing so, XENPF_getidletime needed updating to correctly retain its
described behaviour of clearing bits in the cpumap for offline pcpus.
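
The intended behaviour can be modelled roughly as below. This is a simplified sketch with hypothetical names and data structures (sets and dicts instead of cpumasks), not the hypervisor code.

```python
def get_cpu_idle_time(cpu, online_cpus, idle_ns):
    """Return 0 for offline pcpus, as for a missing idle vcpu."""
    if cpu not in online_cpus or cpu not in idle_ns:
        return 0
    return idle_ns[cpu]


def getidletime(cpumap, online_cpus, idle_ns):
    """Model of the platform op: offline pcpus are cleared from the map."""
    result_map = set()
    times = {}
    for cpu in sorted(cpumap):
        if cpu not in online_cpus:
            continue  # bit for an offline pcpu is cleared in the returned map
        result_map.add(cpu)
        times[cpu] = get_cpu_idle_time(cpu, online_cpus, idle_ns)
    return result_map, times
```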

As this crash can only be triggered with toolstack hypercalls, it is not a
security issue and just a simple bug.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agolibxl: correctly handle libxl_get_cpu_topology failure in libxl_{cpu, node}map_to_...
Matthew Daley [Sun, 29 Sep 2013 05:47:37 +0000 (18:47 +1300)]
libxl: correctly handle libxl_get_cpu_topology failure in libxl_{cpu, node}map_to_{node, cpu}map

Initialize nr_cpus to 0 so that if it is unchanged by a failing
libxl_get_cpu_topology, libxl_cputopology_list_free still works OK
afterward.

Coverity-ID: 1055294
Coverity-ID: 1055295
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
11 years agoxen/arm: map_domain_page: reuse slots with avail == 0
Stefano Stabellini [Mon, 30 Sep 2013 12:06:12 +0000 (13:06 +0100)]
xen/arm: map_domain_page: reuse slots with avail == 0

If a slot has avail == 0 but still points to the right mfn, reuse it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: only put poller if already gotten in libxl_event_wait
Matthew Daley [Sun, 29 Sep 2013 05:24:36 +0000 (18:24 +1300)]
libxl: only put poller if already gotten in libxl_event_wait

Coverity-ID: 1055292
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxc: only munmap when something has actually been mapped in change_pte
Matthew Daley [Sun, 29 Sep 2013 01:35:02 +0000 (14:35 +1300)]
libxc: only munmap when something has actually been mapped in change_pte

Coverity-ID: 1055269
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxm-test: fix the ip allocation function
Zhu Yanhai [Mon, 30 Sep 2013 08:12:10 +0000 (16:12 +0800)]
xm-test: fix the ip allocation function

__findFirstOctetIP() is expecting min and max available octets according to
its code, however the caller getFreeIP() gives it the min octet and (max -
min + 1), which is the length instead.
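
The mismatch is easy to reproduce in isolation. The helper below is a hypothetical stand-in for __findFirstOctetIP() with an assumed signature, and the octet range is made up for the example.

```python
def find_first_free_octet(used, min_octet, max_octet):
    """Return the first octet in [min_octet, max_octet] not in `used`, else None."""
    for octet in range(min_octet, max_octet + 1):
        if octet not in used:
            return octet
    return None


MIN_OCTET, MAX_OCTET = 100, 120
used = set(range(100, 115))

# Buggy call: passes the length (21) where the maximum octet is expected,
# so the search range 100..21 is empty and nothing is ever found.
buggy = find_first_free_octet(used, MIN_OCTET, MAX_OCTET - MIN_OCTET + 1)

# Fixed call: passes the maximum octet itself.
fixed = find_first_free_octet(used, MIN_OCTET, MAX_OCTET)
```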

Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm32: don't export v7_init
Julien Grall [Fri, 27 Sep 2013 16:49:52 +0000 (17:49 +0100)]
xen/arm32: don't export v7_init

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxl: fork before execing vncviewer
Ian Campbell [Fri, 27 Sep 2013 10:16:22 +0000 (11:16 +0100)]
xl: fork before execing vncviewer

Otherwise we don't daemonize to monitor the domain.

Heavily cargo-culted from autoconnect-console and only compile tested.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
11 years agolibxl: handle null lists in libxl_string_list_length
Matthew Daley [Fri, 27 Sep 2013 11:29:10 +0000 (23:29 +1200)]
libxl: handle null lists in libxl_string_list_length

After commit b0be2b12 ("libxl: fix libxl_string_list_length and its only
caller") libxl_string_list_length no longer handles null (empty) lists. Fix
so they are handled, returning length 0.
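
A Python model of the fixed behaviour, for illustration only: the real function is C and operates on a NULL-terminated array of strings, modelled here as a list ending in None, with a NULL list modelled as None.

```python
def string_list_length(sl):
    """Count entries before the None terminator; a null list has length 0."""
    if sl is None:
        return 0
    n = 0
    while sl[n] is not None:
        n += 1
    return n
```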

While at it, remove the unnecessary undereferenced null pointer check
and tidy the layout of the function.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86: don't blindly create L3 tables for the direct map
Jan Beulich [Mon, 30 Sep 2013 13:28:12 +0000 (15:28 +0200)]
x86: don't blindly create L3 tables for the direct map

Now that the direct map area can extend all the way up to almost the
end of address space, this is wasteful.

Also fold two almost redundant messages in SRAT parsing into one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: properly set up fbld emulation operand address
Jan Beulich [Mon, 30 Sep 2013 12:18:58 +0000 (14:18 +0200)]
x86: properly set up fbld emulation operand address

This is CVE-2013-4361 / XSA-66.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agox86/mm/shadow: Fix initialization of PV shadow L4 tables.
Tim Deegan [Mon, 30 Sep 2013 12:18:25 +0000 (14:18 +0200)]
x86/mm/shadow: Fix initialization of PV shadow L4 tables.

Shadowed PV L4 tables must have the same Xen mappings as their
unshadowed equivalent.  This is done by copying the Xen entries
verbatim from the idle pagetable, and then using guest_l4_slot()
in the SHADOW_FOREACH_L4E() iterator to avoid touching those entries.

adc5afbf1c70ef55c260fb93e4b8ce5ccb918706 (x86: support up to 16Tb)
changed the definition of ROOT_PAGETABLE_XEN_SLOTS to extend right to
the top of the address space, which causes the shadow code to
copy Xen mappings into guest-kernel-address slots too.

In the common case, all those slots are zero in the idle pagetable,
and no harm is done.  But if any slot above #271 is non-zero, Xen will
crash when that slot is later cleared (it attempts to drop
shadow-pagetable refcounts on its own L4 pagetables).

Fix by using the new ROOT_PAGETABLE_PV_XEN_SLOTS when appropriate.
Monitor pagetables need the full Xen mappings, so they keep using the
old name (with its new semantics).

This is CVE-2013-4356 / XSA-64.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agox86: properly handle hvm_copy_from_guest_{phys,virt}() errors
Jan Beulich [Mon, 30 Sep 2013 12:17:46 +0000 (14:17 +0200)]
x86: properly handle hvm_copy_from_guest_{phys,virt}() errors

Ignoring them generally implies using uninitialized data and, in all
but two of the cases dealt with here, potentially leaking hypervisor
stack contents to guests.

This is CVE-2013-4355 / XSA-63.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/AMD-Vi: Fix IVRS HPET special->handle override
Suravee Suthikulpanit [Mon, 30 Sep 2013 12:00:44 +0000 (14:00 +0200)]
x86/AMD-Vi: Fix IVRS HPET special->handle override

The current logic does not handle the case when HPET special->handle
is invalid in IVRS. On such system, the following message is shown:

(XEN) AMD-Vi: Failed to setup HPET MSI remapping: Wrong HPET

This patch will allow the ivrs_hpet[<handle>]=<sbdf> to override the
IVRS.  Also, it removes struct hpet_sbdf.iommu since it is not
used anywhere in the code.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
11 years agocpupools: update domU's node-affinity on the cpupool_unassign_cpu() path
Dario Faggioli [Mon, 30 Sep 2013 11:59:47 +0000 (13:59 +0200)]
cpupools: update domU's node-affinity on the cpupool_unassign_cpu() path

that is, when a cpu is removed from a pool, as it is happening already
on the cpupool_assign_cpu_*() path (i.e., when a cpu is added to a
pool).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
11 years agoNested VMX: Expose unrestricted guest feature to guest
Yang Zhang [Mon, 30 Sep 2013 11:58:48 +0000 (13:58 +0200)]
Nested VMX: Expose unrestricted guest feature to guest

With the virtual unrestricted guest feature, an L2 guest is allowed to run
with PG cleared. Also, allow PAE to be unset during virtual vmexit emulation.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Eddie.Dong@intel.com
11 years agoxen: arm: move smp_init_cpus to smpboot.c
Ian Campbell [Fri, 27 Sep 2013 09:30:29 +0000 (10:30 +0100)]
xen: arm: move smp_init_cpus to smpboot.c

Seems like a better home.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: split cpu0's domheap mapping PTs out from xen_second
Ian Campbell [Mon, 16 Sep 2013 20:26:48 +0000 (21:26 +0100)]
xen: arm: split cpu0's domheap mapping PTs out from xen_second

Now that bringup has been rewritten we don't need these 4 contiguous pages for
the 1:1 map. So split them out and only allocate them for 32 bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen: arm: configure TCR_EL2 for 40 bit physical address space
Ian Campbell [Mon, 16 Sep 2013 20:39:22 +0000 (21:39 +0100)]
xen: arm: configure TCR_EL2 for 40 bit physical address space

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen: arm: use symbolic names for MPIDR bits.
Ian Campbell [Fri, 20 Sep 2013 16:51:20 +0000 (17:51 +0100)]
xen: arm: use symbolic names for MPIDR bits.

arm32 already uses MPIDR_HWID_MASK, use it on arm64 too. Add MPIDR_{SMP,UP}
(and bitwise equivalents) and use them.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen: arm: rewrite start of day page table and cpu bring up
Ian Campbell [Thu, 29 Aug 2013 15:25:00 +0000 (16:25 +0100)]
xen: arm: rewrite start of day page table and cpu bring up

This is unfortunately a rather large monolithic patch.

Rather than bringing up all CPUs in lockstep as we set up paging and relocate
Xen, instead create a simplified set of dedicated boot time pagetables.

This allows secondary CPUs to remain powered down or in the firmware until we
actually want to enable them. The bringup is now done later on in C and can be
driven by DT etc. I have included code for the vexpress platform, but other
platforms will need to be added.

The mechanism for deciding how to bring up a CPU differs between arm32 and
arm64. On arm32 it is essentially a per-platform property, with the exception
of PSCI which can be implemented globally (but isn't here). On arm64 there is a
per-cpu property in the device tree.

Secondary CPUs are brought up directly into the relocated Xen image, instead of
relying on being able to launch on the unrelocated Xen and hoping that it
hasn't been clobbered.

As part of this change drop support for switching from secure mode to NS HYP as
well as the early CPU kick. Xen now requires that it is launched in NS HYP
mode and that firmware configure things such that secondary CPUs can be woken
up by a primary CPU in HYP mode. This may require fixes to bootloaders or the
use of a boot wrapper.

The changes done here (re)exposed an issue with relocating Xen and the compiler
spilling values to the stack between the copy and the actual switch to the
relocated copy of Xen in setup_pagetables. Therefore switch to doing the copy
and switch in a single asm function where we can control precisely what gets
spilled to the stack etc.

Since we now have a separate set of boot pagetables it is much easier to build
the real Xen pagetables in place before relocating rather than the more complex
approach of rewriting the pagetables in the relocated copy before switching.

This will also enable Xen to be loaded above the 4GB boundary on 64-bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: implement smp initialisation callbacks for exynos5
Ian Campbell [Fri, 27 Sep 2013 09:38:21 +0000 (10:38 +0100)]
xen: arm: implement smp initialisation callbacks for exynos5

These were removed in "xen: arm: rewrite start of day page table and cpu
bring up".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: implement arch/platform SMP and CPU initialisation framework
Ian Campbell [Fri, 20 Sep 2013 22:29:44 +0000 (23:29 +0100)]
xen: arm: implement arch/platform SMP and CPU initialisation framework

Includes an implementation for vexpress using the sysflags interface and
support for the ARMv8 "spin-table" method.

Unused until "rewrite start of day page table and cpu bring up", split out to
simplify review.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: add two new device tree helpers
Ian Campbell [Tue, 17 Sep 2013 01:27:49 +0000 (02:27 +0100)]
xen: arm: add two new device tree helpers

 - dt_property_read_u64
 - dt_find_node_by_type

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: make sure we stay within the memory bank during mm setup
Ian Campbell [Mon, 16 Sep 2013 16:57:08 +0000 (17:57 +0100)]
xen: arm: make sure we stay within the memory bank during mm setup

Otherwise if there is a module in another bank we can run off the end.

Rename *n to *end to make it clearer what is happening.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: Log the raw MIDR on boot.
Ian Campbell [Mon, 16 Sep 2013 14:47:05 +0000 (15:47 +0100)]
xen: arm: Log the raw MIDR on boot.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen: arm: build platform support only on the relevant arch
Ian Campbell [Fri, 27 Sep 2013 09:35:47 +0000 (10:35 +0100)]
xen: arm: build platform support only on the relevant arch

midway, omap5 and exynos are all 32-bit only platforms. This avoids needing
CONFIG_ARM_32 ifdefs around the SMP callbacks on such platforms.

Vexpress is both.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: Load xen under 4GB on 32-bit
Ian Campbell [Wed, 5 Jun 2013 09:08:35 +0000 (10:08 +0100)]
xen: arm: Load xen under 4GB on 32-bit

We need to be able to use a 1:1 mapping during bring up.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agox86/microcode: Check whether the microcode is correct
Konrad Rzeszutek Wilk [Fri, 27 Sep 2013 08:25:08 +0000 (10:25 +0200)]
x86/microcode: Check whether the microcode is correct

We do the microcode update in two steps - the presmp:
'microcode_presmp_init' and when CPUs are brought up: 'microcode_init'.
The former performs the microcode update on the BSP - but
unfortunately it does not check whether the update failed. Which means
that we might later try to apply an incorrect payload to the rest of the
CPUs.

This patch handles this odd situation.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/microcode: Scan the initramfs payload for microcode blob
Konrad Rzeszutek Wilk [Fri, 27 Sep 2013 08:22:55 +0000 (10:22 +0200)]
x86/microcode: Scan the initramfs payload for microcode blob

The Linux kernel is able to update the microcode during early bootup
via inspection of the initramfs blob to see if there is a cpio image
with certain microcode files. Linux is able to function with two (or
more) cpio archives in the initrd because it unpacks all of the cpio
archives.

The format of the early initramfs is nicely documented in Linux's
Documentation/x86/early-microcode.txt:

Early load microcode
====================
By Fenghua Yu <fenghua.yu@intel.com>

Kernel can update microcode in early phase of boot time. Loading microcode early
can fix CPU issues before they are observed during kernel boot time.

Microcode is stored in an initrd file. The microcode is read from the initrd
file and loaded to CPUs during boot time.

The format of the combined initrd image is microcode in cpio format followed by
the initrd image (maybe compressed). Kernel parses the combined initrd image
during boot time. The microcode file in cpio name space is:
kernel/x86/microcode/GenuineIntel.bin

During BSP boot (before SMP starts), if the kernel finds the microcode file in
the initrd file, it parses the microcode and saves matching microcode in memory.
If matching microcode is found, it will be uploaded in BSP and later on in all
APs.

The cached microcode patch is applied when CPUs resume from a sleep state.

There are two legacy user space interfaces to load microcode, either through
/dev/cpu/microcode or through /sys/devices/system/cpu/microcode/reload file
in sysfs.

In addition to these two legacy methods, the early loading method described
here is the third method with which microcode can be uploaded to a system's
CPUs.

The following example script shows how to generate a new combined initrd file in
/boot/initrd-3.5.0.ucode.img with original microcode microcode.bin and
original initrd image /boot/initrd-3.5.0.img.

mkdir initrd
cd initrd
mkdir kernel
mkdir kernel/x86
mkdir kernel/x86/microcode
cp ../microcode.bin kernel/x86/microcode/GenuineIntel.bin
find .|cpio -oc >../ucode.cpio
cd ..
cat ucode.cpio /boot/initrd-3.5.0.img >/boot/initrd-3.5.0.ucode.img

As such this code inspects the initrd to see if the microcode
signatures are present and if so updates the hypervisor.
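
One way such a scan could work is sketched below, assuming the microcode sits in an uncompressed "newc" cpio archive at the start of the initrd. This is not the hypervisor's code; the function names are made up for the example, and cpio_entry() exists only to build test input.

```python
CPIO_MAGIC = b"070701"   # "newc" format magic
CPIO_HDR_LEN = 110       # 6-byte magic + 13 fields of 8 hex digits


def align4(n):
    """Round n up to a multiple of 4, as newc pads names and data."""
    return (n + 3) & ~3


def cpio_find(blob, wanted):
    """Return the data of the entry named `wanted`, or None if absent."""
    off = 0
    while (blob[off:off + 6] == CPIO_MAGIC
           and off + CPIO_HDR_LEN <= len(blob)):
        raw = blob[off + 6:off + CPIO_HDR_LEN].decode("ascii")
        fields = [int(raw[8 * i:8 * i + 8], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = off + CPIO_HDR_LEN
        # namesize counts the trailing NUL, which is not part of the name.
        name = blob[name_start:name_start + namesize - 1].decode("ascii")
        data_start = align4(name_start + namesize)
        if name == "TRAILER!!!":
            break
        if name == wanted:
            return blob[data_start:data_start + filesize]
        off = align4(data_start + filesize)
    return None


def cpio_entry(name, data):
    """Build one newc entry (all metadata fields zero except the sizes)."""
    name_b = name.encode("ascii") + b"\0"
    fields = [0] * 13
    fields[6], fields[11] = len(data), len(name_b)
    out = CPIO_MAGIC + "".join("%08X" % f for f in fields).encode("ascii")
    out += name_b
    out += b"\0" * (align4(len(out)) - len(out))
    out += data
    out += b"\0" * (align4(len(out)) - len(out))
    return out


UCODE_PATH = "kernel/x86/microcode/GenuineIntel.bin"
archive = cpio_entry(UCODE_PATH, b"fake-ucode") + cpio_entry("TRAILER!!!", b"")
initrd = archive + b"rest of initrd (possibly compressed)"
```

Calling cpio_find(initrd, UCODE_PATH) on this synthetic image yields the fake blob, while a path that is not in the archive yields None.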

The option to turn this scan on/off is gated by the 'ucode'
parameter. The options are now:
 'scan'      Scan for the microcode in any multiboot payload.
 <index>     Attempt to load microcode blob (not the cpio archive
             format) from the multiboot payload number.

This slightly alters the 'ucode' parameter, which now accepts exactly one of
the two forms:
  ucode=[<index>|scan]
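
A rough model of the resulting option grammar, for illustration: the parser name and return shape are assumptions, and the real command line handling may accept forms this sketch rejects.

```python
def parse_ucode_option(value):
    """Return ("scan", None) or ("index", n); raise ValueError otherwise."""
    if value == "scan":
        return ("scan", None)
    try:
        return ("index", int(value, 0))
    except ValueError:
        raise ValueError("ucode= expects <index> or 'scan', got %r" % value)
```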

Implementation-wise, the ucode_blob is defined as __initdata.
That is OK from the viewpoint of suspend/resume as the underlying
architecture microcode code (microcode_intel or microcode_amd) ends up saving
the blob in 'struct ucode_cpu_info', which is a per-cpu data
structure (see ucode_cpu_info). The blob is saved when doing the
pre-SMP (for CPU0) and SMP (for the rest) microcode loading.

Naturally if one does a hypercall to update the microcode and it is
newer, then the old per-cpu data is replaced.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agounmodified_drivers: enable build of usbfront driver
Olaf Hering [Fri, 27 Sep 2013 08:18:03 +0000 (10:18 +0200)]
unmodified_drivers: enable build of usbfront driver

Signed-off-by: Olaf Hering <olaf@aepfle.de>
11 years agohvmloader/smbios: Change strncpy to memcpy for anchor strings
Andrew Cooper [Fri, 27 Sep 2013 08:15:28 +0000 (10:15 +0200)]
hvmloader/smbios: Change strncpy to memcpy for anchor strings

Coverity complains about the use of strncpy() to completely fill the anchor
strings, resulting in an unterminated string.

Although the strncpy result is correct, the anchor strings are not strings in
the C sense, and use of memcpy is the prevailing style elsewhere in hvmloader
anyway.

While tidying up the style in this function, also remove some trailing
whitespace and gratuitous cast.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoAMD IOMMU: fix Dom0 device setup failure for host bridges
Suravee Suthikulpanit [Fri, 27 Sep 2013 08:11:49 +0000 (10:11 +0200)]
AMD IOMMU: fix Dom0 device setup failure for host bridges

The host bridge device (i.e. 0x18 for AMD) does not require IOMMU, and
therefore is not included in the IVRS. The current logic tries to map
all PCI devices to an IOMMU. In this case, "xl dmesg" shows the
following message on an AMD system:

(XEN) setup 0000:00:18.0 for d0 failed (-19)
(XEN) setup 0000:00:18.1 for d0 failed (-19)
(XEN) setup 0000:00:18.2 for d0 failed (-19)
(XEN) setup 0000:00:18.3 for d0 failed (-19)
(XEN) setup 0000:00:18.4 for d0 failed (-19)
(XEN) setup 0000:00:18.5 for d0 failed (-19)

This patch adds a new device type (i.e. DEV_TYPE_PCI_HOST_BRIDGE) which
corresponds to PCI class code 0x06 and sub-class 0x00. Then, it uses
this new type to filter when trying to map device to IOMMU.
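
The classification can be sketched as follows, assuming the 24-bit PCI class code layout (base class, sub-class, programming interface). The constant and function names are hypothetical, not the ones used in the patch.

```python
PCI_BASE_CLASS_BRIDGE = 0x06   # base class: bridge device
PCI_SUBCLASS_HOST_BRIDGE = 0x00  # sub-class: host bridge


def is_host_bridge(class_code):
    """class_code is the 24-bit value: base << 16 | sub << 8 | prog-if."""
    base = (class_code >> 16) & 0xFF
    sub = (class_code >> 8) & 0xFF
    return base == PCI_BASE_CLASS_BRIDGE and sub == PCI_SUBCLASS_HOST_BRIDGE


def devices_needing_iommu(devices):
    """Filter out host bridges from a {sbdf: class_code} mapping."""
    return [sbdf for sbdf, cc in sorted(devices.items()) if not is_host_bridge(cc)]
```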

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
On VT-d refuse (un)mapping host bridges for other than the hardware
domain.

Coding style cleanup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
11 years agoxen: support RAM at addresses 0 and 4096
Ian Campbell [Thu, 26 Sep 2013 11:35:42 +0000 (12:35 +0100)]
xen: support RAM at addresses 0 and 4096

Currently the mapping from pages to zones causes the page at zero to go into
zone -1 and the page at 4096 to go into zone 0, which is the Xen zone
(confusing various assertions).

Arrange instead for the mapping to be such that zone 0 is always reserved for
Xen and all other pages map to a zone >= 1.
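
One mapping consistent with this description, written as a sketch: it assumes the zone is derived from the position of the highest set bit of the page frame number, with the page at address 0 being frame 0 and the page at 4096 being frame 1. The actual macros in the tree may differ.

```python
def page_to_zone_old(mfn):
    """Old behaviour: frame 0 lands in zone -1, frame 1 in zone 0 (the Xen zone)."""
    return mfn.bit_length() - 1


def page_to_zone_new(mfn):
    """New behaviour: every frame maps to a zone >= 1; zone 0 stays reserved."""
    return mfn.bit_length() or 1
```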

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Cc: jbeulich@suse.com
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: print the location of the Xen heap on 32 bit
Ian Campbell [Thu, 26 Sep 2013 11:35:41 +0000 (12:35 +0100)]
xen/arm: print the location of the Xen heap on 32 bit

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: rename boot misc region to boot reloc now it has a single purpose
Ian Campbell [Thu, 26 Sep 2013 11:35:40 +0000 (12:35 +0100)]
xen/arm: rename boot misc region to boot reloc now it has a single purpose

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: Support dtb /memreserve/ regions
Ian Campbell [Thu, 26 Sep 2013 11:35:39 +0000 (12:35 +0100)]
xen/arm: Support dtb /memreserve/ regions

This requires a mapping of the DTB during setup_mm. Previously this was in
the BOOT_MISC slot, which is clobbered by setup_pagetables. Split it out
into its own slot which can be preserved.

Also handle these regions as part of consider_modules() and when adding pages
to the heaps to ensure we do not locate any part of Xen or the heaps over
them.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: cope with modules outside of "visible" RAM
Ian Campbell [Thu, 26 Sep 2013 11:35:38 +0000 (12:35 +0100)]
xen/arm: cope with modules outside of "visible" RAM

This can happen if modules are in a bank which we can't cope with e.g. due to
being non-contiguous.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: do not relocate Xen outside of visible RAM
Ian Campbell [Thu, 26 Sep 2013 11:35:37 +0000 (12:35 +0100)]
xen/arm: do not relocate Xen outside of visible RAM

Since we do not handle non-contiguous banks of memory let's avoid relocating
Xen into such a bank. Avoids issues such as free_init_memory releasing pages
which are outside of the frametable.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: Reserve FDT via early module mechanism
Ian Campbell [Thu, 26 Sep 2013 11:35:36 +0000 (12:35 +0100)]
xen/arm: Reserve FDT via early module mechanism

This will stop us putting any heaps or relocating Xen itself over the FDT.

The devicetree will be copied to allocated memory in setup_mm and the
original copy will be freed by discard_initial_modules.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: DOMHEAP_SECOND_PAGES is arm32 specific
Ian Campbell [Thu, 26 Sep 2013 11:35:35 +0000 (12:35 +0100)]
xen/arm: DOMHEAP_SECOND_PAGES is arm32 specific

since 5263507b1b4a "xen: arm: Use a direct mapping of RAM on arm64"

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: ensure the xenheap is 32MB aligned
Ian Campbell [Thu, 26 Sep 2013 11:35:34 +0000 (12:35 +0100)]
xen/arm: ensure the xenheap is 32MB aligned

My patch 08693f5948d8 "xen: arm: reduce the size of the xen heap to max 1/8
RAM size" unintentionally violated the constraint that the xenheap must be
32MB aligned, since we only explicitly align the end of the heap and
xenheap_pages was not a multiple of 32 pages.

Round xenheap pages up to a 32MB boundary.
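
The rounding itself is the usual power-of-two round-up, sketched here under the assumption of 4KB pages (so 32MB is 8192 pages). The names are illustrative only.

```python
PAGE_SHIFT = 12                            # assumes 4KB pages
ALIGN_PAGES = (32 << 20) >> PAGE_SHIFT     # pages per 32MB


def round_up_xenheap_pages(pages):
    """Round a page count up to the next multiple of 32MB worth of pages."""
    return (pages + ALIGN_PAGES - 1) & ~(ALIGN_PAGES - 1)
```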

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen/arm: Don't dump stack when the VCPU is offline
Julien Grall [Wed, 25 Sep 2013 12:12:47 +0000 (13:12 +0100)]
xen/arm: Don't dump stack when the VCPU is offline

When a VCPU is not yet online, its registers contain garbage. This can
cause show_guest_stack to hit BUG() at random.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: arm: use new 64-bit zImage magic numbers for Xen binary
Ian Campbell [Wed, 25 Sep 2013 11:21:51 +0000 (12:21 +0100)]
xen: arm: use new 64-bit zImage magic numbers for Xen binary

Upstream commit 4370eec05a88 "arm64: Expand arm64 image header" ended up
changing the zImage magic (which was actually the initial branch instruction
encoding!). The new header has a proper magic number at a fixed location.

Switch Xen itself to using this format. Neither the bootwrapper nor the
models care about this header themselves and real bootloaders are not widely
used, so now is as good a time as any to switch (as upstream have proven).
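
A loader-side check for the new header could look like this sketch. It assumes the layout described in Linux's arm64 booting documentation, where the 4-byte magic "ARM\x64" sits at offset 56 of a 64-byte header; the helper name is made up.

```python
ARM64_MAGIC_OFFSET = 56        # assumed offset of the magic field
ARM64_MAGIC = b"ARM\x64"       # 0x644d5241 when read as little endian


def looks_like_arm64_image(header):
    """Check the fixed-location magic instead of the first instruction."""
    end = ARM64_MAGIC_OFFSET + len(ARM64_MAGIC)
    return header[ARM64_MAGIC_OFFSET:end] == ARM64_MAGIC
```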

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>