xenbits.xensource.com Git - xen.git/log
10 years ago  move domain to cpupool0 before destroying it
Juergen Gross [Tue, 20 May 2014 13:55:42 +0000 (15:55 +0200)]
move domain to cpupool0 before destroying it

Currently when a domain is destroyed it is removed from the domain_list
before all of its resources, including the cpupool membership, are freed.
This can lead to a situation where the domain is still a member of a cpupool
without for_each_domain_in_cpupool() (or even for_each_domain()) being
able to find it any more. This in turn can result in rejection of removing
the last cpu from a cpupool, because there still seems to be a domain in
the cpupool, even if it can't be found by scanning through all domains.

This situation can be avoided by moving the domain to be destroyed to
cpupool0 first and then removing it from this cpupool BEFORE deleting it from
the domain_list. As cpupool0 is always active and a domain without any cpupool
membership is implicitly regarded as belonging to cpupool0, this poses no
problem.
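
Illustration (not part of the patch): a minimal Python sketch of the ordering
problem described above. The class and function names are hypothetical and
only echo the commit text; they are not Xen's C structures or helpers.

```python
# Hypothetical model; names echo the commit text, not Xen's C code.
class Cpupool:
    def __init__(self, name):
        self.name = name
        self.n_dom = 0  # the pool's own count of member domains


class Domain:
    def __init__(self, pool):
        self.pool = pool
        pool.n_dom += 1


def domains_in_pool(domain_list, pool):
    """Stand-in for for_each_domain_in_cpupool(): scans the global list."""
    return [d for d in domain_list if d.pool is pool]


def can_remove_last_cpu(pool):
    """The pool refuses to give up its last CPU while it counts any member."""
    return pool.n_dom == 0


def destroy_old_order(domain_list, dom):
    # Old order: unlink from the list first; pool membership is freed later.
    domain_list.remove(dom)


def destroy_new_order(domain_list, dom, pool0):
    # New order: move to cpupool0, leave that pool, then unlink from the list.
    dom.pool.n_dom -= 1
    dom.pool = pool0
    pool0.n_dom += 1
    pool0.n_dom -= 1
    dom.pool = None
    domain_list.remove(dom)
```

In the old order there is a window in which the pool still counts a member
that no list scan can find; in the new order the count is already zero by the
time the domain leaves the list.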

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
10 years ago  VT-d: extend error report masking workaround to newer chipsets
Jan Beulich [Tue, 20 May 2014 13:54:01 +0000 (15:54 +0200)]
VT-d: extend error report masking workaround to newer chipsets

Add two more PCI IDs to the set that has been taken care of with a
different workaround long before XSA-59, and (for consistency with the
newer workarounds) log a message here too.

Also move the function-wide comment to the cases it applies to; this
should really have been done by d061d200 ("VT-d: suppress UR signaling
for server chipsets").

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago  VT-d: apply quirks at device setup time rather than only at boot
Jan Beulich [Tue, 20 May 2014 13:53:20 +0000 (15:53 +0200)]
VT-d: apply quirks at device setup time rather than only at boot

Accessing extended config space may not be possible at boot time, e.g.
when the memory space used by MMCFG is reserved only via ACPI tables,
but not in the E820/UEFI memory maps (which we need Dom0 to tell us
about). Consequently the change here still leaves the issue unaddressed
for systems where the extended config space remains inaccessible (due
to firmware bugs, i.e. not properly reserving the address space of
those regions).

With the respective messages now potentially getting logged more than
once, we ought to consider whether we should issue them only if we in
fact were required to do any masking (i.e. if the relevant mask bits
weren't already set).

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
10 years ago  libxl: Rerun flex/bison for xl discard support
Ian Jackson [Mon, 19 May 2014 14:17:03 +0000 (15:17 +0100)]
libxl: Rerun flex/bison for xl discard support

In 417e6b70 I overlooked the requirement to rerun bison/flex.  Do that
now.  The changes are exactly those which are the result of 417e6b70.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
10 years ago  add Yang and Kevin as the new maintainer of VT-d stuff
Xiantao Zhang [Mon, 19 May 2014 14:10:56 +0000 (16:10 +0200)]
add Yang and Kevin as the new maintainer of VT-d stuff

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Restricted the change's effect to what its subject says: Replace the
VT-d maintainers, i.e. drop the new additions for the generic IOMMU
code for the time being.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/misc: post cleanup
Andrew Cooper [Mon, 19 May 2014 12:24:45 +0000 (14:24 +0200)]
x86/misc: post cleanup

* panic() now works on early boot.  Replace EARLY_FAIL()
* Cleanup __set_intr_gate() & friends.  The master IDT is fully constructed on
  early boot, and only subsequently altered on the crash path.  Make them
  private to traps.c, move them into .init, and remove the loop over all idts,
  as __set_intr_gate() will never find an AP to patch. (For some reason,
  leaving out the noinline causes ~1.5k of code bloat from GCC inlining
  everything)
* No need to clear X86_EFLAGS_NT in cpu_init().  This is covered by the eflags
  reset in __high_start().
* Missing '\n' from unexpected MCE printk.
* load_system_tables() is x86 specific.  Move its declaration into an x86 header.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/irqs: move interrupt-stub generation out of C
Andrew Cooper [Mon, 19 May 2014 12:24:04 +0000 (14:24 +0200)]
x86/irqs: move interrupt-stub generation out of C

In addition, generate stubs for reserved exceptions.  These go through the
standard handle_exception mechanism, although the C handler do_reserved_trap()
is a terminal error path.

 * Move all automatic stub generation out of i8259.c and into entry.S.
 * Move patching of the master IDT into trap_init(). Provide ASSERT()s to
   ensure we have fully populated the IDT and haven't accidentally clobbered any
   preexisting traps.
 * Demote TRAP_copro_seg and TRAP_spurious_int to being reserved exceptions
   and remove their custom entry points.
 * Point double_fault's exception_table entry at do_reserved_trap.  We do not
   ever expect to enter a real double fault this way.
 * Acquaint Xen with #VE but leave it reserved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/boot: drop pre-C IDT patching
Andrew Cooper [Mon, 19 May 2014 12:22:28 +0000 (14:22 +0200)]
x86/boot: drop pre-C IDT patching

It is not needed now that __start_xen sets itself up with complete trap
handlers as its first action.  This fixes a potential issue introduced in

  c/s 7e510a7b874
  "x86/boot: move some __high_start code and data into init sections"

which would leave ignore_int (in the .init section) patched into the reserved
exceptions in all IDTs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  libxl: add option for discard support to xl disk configuration
Olaf Hering [Mon, 19 May 2014 09:50:19 +0000 (11:50 +0200)]
libxl: add option for discard support to xl disk configuration

Handle new boolean option discard/no-discard for disk configuration. It
is supposed to disable discard support if file based backing storage was
intentionally created non-sparse to avoid fragmentation of the file.

The option is intended for the backend driver. A new boolean property
"discard-enable" is written to the backend node. An upcoming patch for
qemu will make use of this property. The kernel blkback driver may be
updated as well to disable discard for phy based backing storage.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
10 years ago  libxc: Free logger after printing error message
Jason Andryuk [Fri, 16 May 2014 20:41:17 +0000 (16:41 -0400)]
libxc: Free logger after printing error message

On error, PERROR calls the already destroyed logger, which can segfault.
Re-order the calls, so the logger is still available.
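
Illustration (not part of the patch): a small Python model of the
use-after-destroy ordering described above. The Logger class and both
functions are hypothetical stand-ins; they are not libxc's API.

```python
# Hypothetical stand-in for a logger that must not be used once destroyed.
class Logger:
    def __init__(self):
        self.alive = True
        self.lines = []

    def log(self, message):
        if not self.alive:
            # In C this would be a use-after-free that can segfault.
            raise RuntimeError("logger used after destroy")
        self.lines.append(message)

    def destroy(self):
        self.alive = False


def error_path_old(logger):
    # Old order: tear the logger down, then try to report the error.
    logger.destroy()
    logger.log("open failed")


def error_path_new(logger):
    # New order: report the error while the logger still exists.
    logger.log("open failed")
    logger.destroy()
```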

Signed-off-by: Jason Andryuk <andryuk@aero.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago  x86/boot: correct CR4 setup on APs
Andrew Cooper [Fri, 16 May 2014 15:41:10 +0000 (17:41 +0200)]
x86/boot: correct CR4 setup on APs

It is not safe to load mmu_cr4_features into cr4 early on AP start.  Features
such as MCE require an int 0x18 handler to be set up.

Instead, load the minimum Xen CR4 features early but defer loading the full
'mmu_cr4_features' set until after the IDT has been set up.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/boot: install trap handlers much earlier on boot
Andrew Cooper [Fri, 16 May 2014 15:39:07 +0000 (17:39 +0200)]
x86/boot: install trap handlers much earlier on boot

Patch the trap handlers into the master idt very early on boot, and setup &
load the GDT, IDT, TR and LDT. Load the IDT before the TR so we stand a chance
of catching an invalid TSS exception rather than triple faulting.

This provides full exception support far earlier on boot than previously.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/traps: functional prep work
Andrew Cooper [Fri, 16 May 2014 15:38:16 +0000 (17:38 +0200)]
x86/traps: functional prep work

* Promote certain actions to earlier in __start_xen().
* Declare double_fault and early_page_fault as standard trap handlers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/misc: early cleanup
Andrew Cooper [Fri, 16 May 2014 15:37:46 +0000 (17:37 +0200)]
x86/misc: early cleanup

Various bits of cleanup without functional impact as far as the series goes,
but make subsequent patches cleaner.

* WARN_ON(1) is just WARN().
* Replace hand-rolled stack printing with fatal_trap().
* 16 BSS bytes is overkill for an empty idtr to triple fault with.  Construct
  it on the stack using an appropriate struct, and correct the asm memory
  constraint.
* Fix watchdog asymmetry in panic().  machine_halt() needs just as much
  watchdog care as machine_restart(), but it should be up to the arch
  implementation of machine_{halt,restart}() to play with the watchdog.
* unsigned and const correctness for trapstr(), along with whitespace cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/traps: make the main trap handlers safe for use early during Xen boot
Andrew Cooper [Fri, 16 May 2014 15:37:18 +0000 (17:37 +0200)]
x86/traps: make the main trap handlers safe for use early during Xen boot

Most of this patch is an analysis of the safety of the trap handlers.

Traps 0, 4, 5, 9-12, 16, 17 and 19 all end up in do_trap().  do_trap() is
mostly safe, performing an exception table search and possibly panic()s.

There is one complication with traps 16 and 19 which will see about calling
the fpu_exception_callback.  This involves following current which is not
valid early on boot.  The has_hvm_container_vcpu(curr) check is preceded with
a system_state check, so in the exceedingly unlikely case that Xen takes an
x87/SIMD trap while booting, it will panic() instead of following a bogus
current vcpu.

Traps 1, 3, 6-8, 13 and 15 are completely safe with respect to running during
early boot.  They all have well formed and obvious differences between faults
in Xen and faults in guests, with the Xen faults doing little more than
exception table walks or panic()s.

Trap 2 is a complicated codepath, but appears safe.  For the possible
injection of NMIs into dom0 there is a NULL domain pointer check.  The
possible softirq raised for PCI SERR will not be delivered until we start the
idle vcpu, but is safe.

Trap 14 is very complicated.  The code is certainly unsafe for boot as
fixup_page_fault() will dereference current to find the running domain.  There
exists an explicit do_early_page_fault() handler which shall continue to be
used.

Trap 18 has a default handler before the MCE infrastructure is set up, which
has always been unsafe and liable to deadlock itself with the console lock.
As it is expected never to trigger, and if it did we would be in serious
problems, the simple printk() is replaced with a fatal error path.

Trap 20 (Virtualisation Exception) is currently not implemented.  It is fatal
one way or another, and will become more explicitly so with later changes.
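
Illustration (not part of the patch): the analysis above can be summarised as
a lookup table. This Python sketch only restates the vector groups from the
text; the verdict strings and the classify() helper are hypothetical.

```python
# Vector groups exactly as listed in the analysis; verdict labels are ours.
TRAP_ANALYSIS = {
    "do_trap(), mostly safe": [0, 4, 5, 9, 10, 11, 12, 16, 17, 19],
    "completely safe": [1, 3, 6, 7, 8, 13, 15],
    "NMI path, appears safe": [2],
    "unsafe, keep do_early_page_fault()": [14],
    "default MCE handler replaced with fatal path": [18],
    "virtualisation exception, not implemented, fatal": [20],
}


def classify(trapnr):
    """Return the verdict the analysis gives for one trap number."""
    for verdict, vectors in TRAP_ANALYSIS.items():
        if trapnr in vectors:
            return verdict
    raise ValueError("trap %d is not covered by the analysis" % trapnr)


def covered_vectors():
    """All vectors mentioned by the analysis, sorted."""
    return sorted(v for vectors in TRAP_ANALYSIS.values() for v in vectors)
```

Checking covered_vectors() against range(21) is a quick way to confirm that
the groups cover traps 0 to 20 with no vector listed twice.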

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/traps: make panic and reboot paths safe during early boot
Andrew Cooper [Fri, 16 May 2014 15:36:40 +0000 (17:36 +0200)]
x86/traps: make panic and reboot paths safe during early boot

Reverse two conditions in show_registers().  For an early crash, it is not
safe to dereference 'current' for its HVM status before knowing that it is a
guest vcpu.

Introduce SYS_STATE_smp_boot to distinguish the point at which APs need
considering before boot is complete.  There is one code change required as a
result; .init.text symbols are still in use before Xen is active, so alter its
predicate in is_active_kernel_text().

Make use of SYS_STATE_smp_boot in machine_{halt,restart}().  Before Xen starts
booting the APs, any execution here is certainly the BSP.

When halting or rebooting particularly early, this avoids the risks of a #PF
or #GP when accessing the LAPIC before generic_apic_probe(), as well as trying
to enable interrupts before init_IRQ() is complete.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  x86/traps: mnemonics for system descriptor types
Andrew Cooper [Fri, 16 May 2014 15:35:24 +0000 (17:35 +0200)]
x86/traps: mnemonics for system descriptor types

Avoids some particularly obscure magic numbers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years ago  xen/arm: Drop event_mask in arch_vcpu
Julien Grall [Wed, 14 May 2014 13:14:54 +0000 (14:14 +0100)]
xen/arm: Drop event_mask in arch_vcpu

This field has not been used for a while; its last use was before
commit 4df76b3 "xen/arm: disable the event optimization in the gic" back
in July 2012.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  tools/python: expose xc_getcpuinfo()
Zhigang Wang [Tue, 13 May 2014 20:32:33 +0000 (16:32 -0400)]
tools/python: expose xc_getcpuinfo()

This API can be used to get per physical CPU utilization.

Testing:

      # python
      >>> import xen.lowlevel.xc
      >>> xc = xen.lowlevel.xc.xc()
      >>> xc.getcpuinfo()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: Required argument 'max_cpus' (pos 1) not found
      >>> xc.getcpuinfo(4)
      [{'idletime': 109322086128854}, {'idletime': 109336447648802},
      {'idletime': 109069270544960}, {'idletime': 109065612611363}]
      >>> xc.getcpuinfo(100)
      [{'idletime': 109639015806078}, {'idletime': 109654551195681},
      {'idletime': 109382107891193}, {'idletime': 109382057541119}]
      >>> xc.getcpuinfo(1)
      [{'idletime': 109682068418798}]
      >>> xc.getcpuinfo(2)
      [{'idletime': 109711311201330}, {'idletime': 109728458214729}]
      >>> xc.getcpuinfo(max_cpus=4)
      [{'idletime': 109747116214638}, {'idletime': 109764982453261},
      {'idletime': 109491373228931}, {'idletime': 109489858724432}]
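
Illustration (not part of the patch): a sketch of how two getcpuinfo() samples
might be turned into per-CPU utilization. It assumes that 'idletime' is a
cumulative idle time in nanoseconds, which the session above does not state;
cpu_utilisation() is a hypothetical helper, not part of xen.lowlevel.xc.

```python
def cpu_utilisation(before, after, elapsed_ns):
    """Busy fraction per CPU, in [0, 1], from two getcpuinfo()-style samples.

    before/after: lists of {'idletime': <int>} dicts, one entry per CPU.
    elapsed_ns:   wall-clock time between the samples (assumed nanoseconds,
                  matching the assumed unit of 'idletime').
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    result = []
    for old, new in zip(before, after):
        idle_ns = new["idletime"] - old["idletime"]
        busy = 1.0 - idle_ns / elapsed_ns
        # Clamp to absorb sampling jitter between the two reads.
        result.append(min(1.0, max(0.0, busy)))
    return result
```

A caller would take one xc.getcpuinfo(max_cpus) sample, wait, take a second
one, and pass both lists together with the measured interval.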

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  autoconf: add variable for pass arbitrary options to qemu upstream
Fabio Fantoni [Fri, 9 May 2014 12:06:46 +0000 (14:06 +0200)]
autoconf: add variable for pass arbitrary options to qemu upstream

Add a configure option for passing arbitrary configure options to the
upstream qemu build.

Usage example:
./configure --with-extra-qemuu-configure-args="--enable-spice --enable-usb-redir"

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago  libxl: add stdvga video memory setting with upstream qemu
Fabio Fantoni [Fri, 9 May 2014 13:04:39 +0000 (15:04 +0200)]
libxl: add stdvga video memory setting with upstream qemu

Currently we set the stdvga video memory with qemu-traditional only. Add the
necessary settings for qemu upstream too.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  libxl: fix cirrus vga video memory setting with upstream qemu
Fabio Fantoni [Fri, 9 May 2014 12:55:46 +0000 (14:55 +0200)]
libxl: fix cirrus vga video memory setting with upstream qemu

The Cirrus VGA videoram setting used with upstream qemu is wrong. Qemu
silently ignores the incorrect setting.

Switch to the correct vgamem_mb property which was added in qemu 1.3.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Reviewed-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated changelog. ]

10 years ago  x86/setup: resync the boot stack 8 bytes at a time
Andrew Cooper [Thu, 15 May 2014 13:33:01 +0000 (15:33 +0200)]
x86/setup: resync the boot stack 8 bytes at a time

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago  x86/nmi: be less verbose when testing the NMI watchdog
David Vrabel [Thu, 15 May 2014 13:32:36 +0000 (15:32 +0200)]
x86/nmi: be less verbose when testing the NMI watchdog

There's no need to print all the CPUs that are ok, only the ones that
got stuck.

The resulting output is either:

  Testing NMI watchdog on all CPUs: 1 4 6 stuck

or

  Testing NMI watchdog on all CPUs: ok
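
Illustration (not part of the patch): a Python sketch that produces the two
output forms shown above. watchdog_report() is a hypothetical helper; the
real code prints from C inside check_nmi_watchdog().

```python
def watchdog_report(stuck_cpus):
    """Format the one-line summary: list only stuck CPUs, or say 'ok'."""
    prefix = "Testing NMI watchdog on all CPUs:"
    if not stuck_cpus:
        return prefix + " ok"
    return prefix + " " + " ".join(str(cpu) for cpu in stuck_cpus) + " stuck"
```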

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago  x86/nmi: remove spurious local_irq_enable from check_nmi_watchdog()
David Vrabel [Thu, 15 May 2014 13:32:01 +0000 (15:32 +0200)]
x86/nmi: remove spurious local_irq_enable from check_nmi_watchdog()

All callers of check_nmi_watchdog() already have local irqs enabled so
remove the unpaired local_irq_enable().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago  x86: correct MSI_ADDR_DEST_ID_MASK
Jan Beulich [Thu, 15 May 2014 13:27:37 +0000 (15:27 +0200)]
x86: correct MSI_ADDR_DEST_ID_MASK

This should only cover bits 12-19, in line with MSI_ADDR_DEST_ID_SHIFT.

Also replace a couple of open-coded uses of this shift and mask.
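
Illustration (not part of the patch): a Python sketch of a mask covering only
bits 12-19, derived from the shift so the two cannot disagree. The constant
names follow the commit text; the two helper functions are hypothetical.

```python
MSI_ADDR_DEST_ID_SHIFT = 12
# Eight bits wide, starting at the shift: bits 12-19 and nothing else.
MSI_ADDR_DEST_ID_MASK = 0xFF << MSI_ADDR_DEST_ID_SHIFT


def msi_addr_set_dest_id(dest_id):
    """Place a destination ID into its field of an MSI address."""
    return (dest_id << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK


def msi_addr_get_dest_id(addr):
    """Extract the destination ID field from an MSI address."""
    return (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT
```

Using the named shift and mask in both directions is the kind of replacement
the last sentence refers to, as opposed to open-coding ">> 12" and "& 0xff".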

Signed-off-by: Jan Beulich <jbeulich@suse.com>
10 years ago  switch internal hypercall restart indication from -EAGAIN to -ERESTART
Jan Beulich [Thu, 15 May 2014 13:26:12 +0000 (15:26 +0200)]
switch internal hypercall restart indication from -EAGAIN to -ERESTART

-EAGAIN being a return value we want to return to the actual caller in
a couple of cases makes this unsuitable for restart indication, and x86
already developed two cases where -EAGAIN could not be returned as
intended due to this (both are fixed here as well).
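
Illustration (not part of the patch): a simplified Python model of why a
dedicated restart code helps. The dispatch() loop is hypothetical and much
simpler than Xen's actual restart handling; the numeric values are
illustrative and only need to be distinct.

```python
EAGAIN = 11    # illustrative value: "try again", meant for the real caller
ERESTART = 85  # illustrative value: internal "please restart this call"


def dispatch(handler, max_restarts=16):
    """Re-invoke handler while it asks for a restart; pass anything else on."""
    for _ in range(max_restarts):
        rc = handler()
        if rc != -ERESTART:
            # Includes -EAGAIN, which now reaches the caller unchanged.
            return rc
    raise RuntimeError("handler kept requesting a restart")
```

With a single code doing both jobs, the dispatcher could not tell a handler
that wants to be restarted from one that wants its caller to see -EAGAIN.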

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Reviewed-by: Tim Deegan <tim@xen.org>
10 years ago  tools: arm: remove code to check for a DTB appended to the kernel
Ian Campbell [Wed, 14 May 2014 14:12:01 +0000 (15:12 +0100)]
tools: arm: remove code to check for a DTB appended to the kernel

The code to check for an appended DTB was confusing and unnecessary. Since we
know the size of the kernel binary passed to us we should just load the entire
thing into guest RAM (subject to the limits checks). Removing this code avoids
a whole raft of overflow and alignment issues.

We also need to validate the limits of the segment where we intend to load the
kernel to avoid overflow issues.

For ARM32 we control the load address, but we need to validate the size. The
entry point is only relevant within the guest so we don't need to worry about
that.

For ARM64 we need to validate both the load address (which is the same as the
entry point) and the size.
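
Illustration (not part of the patch): a Python sketch of an overflow-safe
limits check of the kind described above. segment_fits() and its parameter
names are hypothetical. Python integers do not overflow, so the point is the
shape of the comparison: it never computes load_addr + size, which is the
sum that could wrap in fixed-width C arithmetic.

```python
def segment_fits(load_addr, size, ram_base, ram_size):
    """True if [load_addr, load_addr + size) lies within guest RAM.

    Written with subtractions only, so the same comparisons stay correct
    with fixed-width unsigned integers.
    """
    if load_addr < ram_base:
        return False
    offset = load_addr - ram_base      # cannot underflow after the check
    if offset > ram_size:
        return False
    return size <= ram_size - offset   # cannot underflow after the check
```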

This is XSA-95.

Reported-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years ago  xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled
Julien Grall [Tue, 13 May 2014 15:50:27 +0000 (16:50 +0100)]
xen/arm: Don't give IOMMU devices to dom0 when iommu is disabled

When iommu={disable,off,no,false} is given on the Xen command line, the IOMMU
framework won't specify that the device shouldn't be passed through to DOM0.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  MAINTAINERS: Add drivers/passthrough/arm
Julien Grall [Tue, 13 May 2014 15:50:26 +0000 (16:50 +0100)]
MAINTAINERS: Add drivers/passthrough/arm

Add the ARM IOMMU directory to the "ARM ARCHITECTURE" section.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
10 years ago  xen/passthrough: Introduce IOMMU ARM architecture
Julien Grall [Tue, 13 May 2014 15:50:25 +0000 (16:50 +0100)]
xen/passthrough: Introduce IOMMU ARM architecture

This patch contains the architecture to use IOMMUs on ARM. There are no
IOMMU drivers in this patch.

In this implementation, the IOMMU page table is shared with the P2M.

The code runs through the device tree and initializes every IOMMU.
It's possible to have multiple IOMMUs on the same platform, but they must
be handled by the same driver. For now, there is no support for using
multiple IOMMU drivers at runtime.

Each new IOMMU driver should contain:

static const char * const myiommu_dt_compat[] __initconst =
{
    /* List of devices compatible with the driver. Will be matched against
     * the "compatible" property in the device tree.
     */
    NULL,
};

DT_DEVICE_START(myiommu, "MY IOMMU", DEVICE_IOMMU)
        .compatible = myiommu_dt_compat,
        .init = myiommu_init,
DT_DEVICE_END

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
10 years ago  xen/passthrough: iommu: Basic support of device tree assignment
Julien Grall [Tue, 13 May 2014 15:50:24 +0000 (16:50 +0100)]
xen/passthrough: iommu: Basic support of device tree assignment

Add IOMMU helpers to support device tree assignment/deassignment. This patch
introduces 2 new fields in the dt_device_node:
    - is_protected: Whether the device is protected by an IOMMU
    - domain_list: Pointer to the next device assigned to the same
    domain

This commit only contains support for protecting a device with DOM0.
Device passthrough to another guest won't work out of the box.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
10 years ago  xen/arm: Introduce flush_tlb_domain
Julien Grall [Tue, 13 May 2014 15:50:17 +0000 (16:50 +0100)]
xen/arm: Introduce flush_tlb_domain

The pattern p2m_load_VTTBR(d) -> flush_tlb -> p2m_load_VTTBR(current->domain)
is used in a few places.

Replace this usage with flush_tlb_domain, which takes care of this pattern.
This helps the readability of apply_p2m_changes, which is becoming large.
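
Illustration (not part of the patch): a Python model of the three-step
pattern being folded into one helper. FakeCpu is a hypothetical recorder of
operations; the real helper is C code operating on hardware state.

```python
class FakeCpu:
    """Hypothetical stand-in that only records the operations performed."""

    def __init__(self, current_domain):
        self.current_domain = current_domain
        self.ops = []

    def p2m_load_vttbr(self, domain):
        self.ops.append(("load_vttbr", domain))

    def flush_tlb(self):
        self.ops.append(("flush_tlb",))


def flush_tlb_domain(cpu, d):
    """Switch to d's translation context, flush, then switch back."""
    cpu.p2m_load_vttbr(d)
    cpu.flush_tlb()
    cpu.p2m_load_vttbr(cpu.current_domain)
```

Callers then make a single call instead of repeating the three steps, which
also removes the risk of forgetting to restore the current domain's context.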

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- s/lisibility/readability/, s/speficied/specified/ ]

10 years ago  x86/MCE: bypass uninitialized vcpu in vMCE injection
Kai Huang [Wed, 14 May 2014 08:54:39 +0000 (10:54 +0200)]
x86/MCE: bypass uninitialized vcpu in vMCE injection

Dom0 may bring up fewer vCPUs than the Xen hypervisor actually created for
it. In this case, on Intel platforms, vMCE injection into dom0 fails because
the vMCE is injected into an uninitialized vcpu, which crashes dom0.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
10 years ago  iommu: introduce arch specific code
Julien Grall [Wed, 14 May 2014 08:51:37 +0000 (10:51 +0200)]
iommu: introduce arch specific code

Currently the structure hvm_iommu (xen/include/xen/hvm/iommu.h) contains
x86 specific fields.

This patch creates:
    - arch_hvm_iommu structure which will contain architecture-dependent
    fields
    - arch_iommu_domain_{init,destroy} functions to execute arch-specific
    code during domain creation/destruction

Also move iommu_use_hap_pt and domain_hvm_iommu into asm-x86/iommu.h.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
10 years ago  iommu: split generic code
Julien Grall [Wed, 14 May 2014 08:50:22 +0000 (10:50 +0200)]
iommu: split generic code

The generic IOMMU framework code (xen/drivers/passthrough/iommu.c) contains
functions specific to x86 and PCI.

Split the framework into 3 distinct files:
    - iommu.c: contains generic functions shared between x86 and ARM
               (once it is supported)
    - pci.c: contains specific functions for PCI passthrough
    - x86/iommu.c: contains specific functions for x86

io.c contains x86 HVM specific code. Only compile for x86.

This patch is mostly code movement into new files.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago  passthrough: rework hwdom_pvh_reqs to use it also on ARM
Julien Grall [Wed, 14 May 2014 08:49:17 +0000 (10:49 +0200)]
passthrough: rework hwdom_pvh_reqs to use it also on ARM

The hardware domain on ARM will have the same requirements as hwdom PVH when
iommu is enabled. Both PVH and ARM guests have paging mode translate enabled,
so Xen can use it to know if it needs to check the requirements.

Rename the function and remove the word "pvh" from the panic message.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  passthrough: vtd: iommu_set_hwdom_mapping is VTD specific
Julien Grall [Wed, 14 May 2014 08:48:37 +0000 (10:48 +0200)]
passthrough: vtd: iommu_set_hwdom_mapping is VTD specific

This function was exported in a common header. Rename it and move the
declaration into drivers/passthrough/vtd/extern.h.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years ago  passthrough: amd: rename iommu_has_feature into amd_iommu_has_feature
Julien Grall [Wed, 14 May 2014 08:48:16 +0000 (10:48 +0200)]
passthrough: amd: rename iommu_has_feature into amd_iommu_has_feature

This function is AMD specific and the name will clash with a function newly
added to the IOMMU framework.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
10 years ago  passthrough: amd: Remove domain_id from hvm_iommu
Julien Grall [Wed, 14 May 2014 08:47:02 +0000 (10:47 +0200)]
passthrough: amd: Remove domain_id from hvm_iommu

The structure hvm_iommu contains a shadow value of domain->domain_id. There
is no reason to not directly use domain->domain_id.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
10 years ago  x86/traps: do not inline do_trap() into 10 different handlers
Andrew Cooper [Mon, 12 May 2014 15:08:25 +0000 (17:08 +0200)]
x86/traps: do not inline do_trap() into 10 different handlers

Furthermore, trapnr can be pulled from regs->entry_vector to avoid a risk of
mismatching the trap number with the underlying exception state.

This nets 2K of code reduction.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago  x86/tools: expose SMAP to HVM guests
Feng Wu [Mon, 12 May 2014 15:05:33 +0000 (17:05 +0200)]
x86/tools: expose SMAP to HVM guests

This patch exposes the SMAP feature to HVM guests.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years ago  x86/hvm: add SMAP support to HVM guest
Feng Wu [Mon, 12 May 2014 15:04:50 +0000 (17:04 +0200)]
x86/hvm: add SMAP support to HVM guest

New Intel CPUs support SMAP (Supervisor Mode Access Prevention).
SMAP prevents supervisor-mode accesses to any linear address with
a valid translation for which the U/S flag (bit 2) is 1 in every
paging-structure entry controlling the translation for the linear
address.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years ago  x86: enable Supervisor Mode Access Prevention (SMAP) for Xen
Feng Wu [Mon, 12 May 2014 15:03:38 +0000 (17:03 +0200)]
x86: enable Supervisor Mode Access Prevention (SMAP) for Xen

Supervisor Mode Access Prevention (SMAP) is a new security
feature disclosed by Intel, please refer to the following
document:

http://software.intel.com/sites/default/files/319433-014.pdf

If CR4.SMAP = 1, supervisor-mode data accesses are not allowed
to linear addresses that are accessible in user mode. If CPL < 3,
SMAP protections are disabled if EFLAGS.AC = 1. If CPL = 3, SMAP
applies to all supervisor-mode data accesses (these are implicit
supervisor accesses) regardless of the value of EFLAGS.AC.

This patch enables SMAP in Xen to prevent Xen hypervisor from
accessing pv guest data, whose translation paging-structure
entries' U/S flags are all set.
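
Illustration (not part of the patch): the rules quoted above written as a
Python predicate. smap_blocks() and its parameters are hypothetical; the
function considers only supervisor-mode data accesses, and 'page_user' means
the U/S flag is set in every paging-structure entry of the translation.

```python
def smap_blocks(cr4_smap, cpl, eflags_ac, page_user):
    """Would SMAP fault this supervisor-mode data access?

    cr4_smap:  CR4.SMAP is set
    cpl:       current privilege level (0-3)
    eflags_ac: EFLAGS.AC is set
    page_user: the target linear address is accessible in user mode
    """
    if not cr4_smap or not page_user:
        return False
    if cpl < 3:
        # Explicit supervisor access: EFLAGS.AC = 1 disables the protection.
        return not eflags_ac
    # CPL = 3: only implicit supervisor accesses exist, and AC is ignored.
    return True
```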

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago  VMX: disable SMAP feature when guest is in non-paging mode
Feng Wu [Mon, 12 May 2014 15:03:09 +0000 (17:03 +0200)]
VMX: disable SMAP feature when guest is in non-paging mode

SMAP is disabled if CPU is in non-paging mode in hardware.
However Xen always uses paging mode to emulate guest non-paging
mode with HAP. To emulate this behavior, SMAP needs to be manually
disabled when guest switches to non-paging mode.

This logic is similar to that for SMEP.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago  x86: temporarily disable SMAP to legally access user pages in kernel mode
Feng Wu [Mon, 12 May 2014 15:02:25 +0000 (17:02 +0200)]
x86: temporarily disable SMAP to legally access user pages in kernel mode

Use STAC/CLAC to temporarily disable SMAP to allow legal accesses to
user pages in kernel mode.

STAC/CLAC is not needed for compat_create_bounce_frame, since in this
chunk of code, it only accesses the pv guest's kernel stack, which is
in ring 1 for 32-bit pv guests.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
10 years ago  x86: clear AC bit in RFLAGS to protect Xen itself by SMAP
Feng Wu [Mon, 12 May 2014 15:01:47 +0000 (17:01 +0200)]
x86: clear AC bit in RFLAGS to protect Xen itself by SMAP

Clear AC bit in RFLAGS at the beginning of exception, interrupt, hypercall,
so Xen itself can be protected by SMAP mechanism. This patch also sets AC
bit at the beginning of double_fault and fatal_trap() to reduce the likelihood
of taking a further fault while trying to dump state.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 years ago  x86: add support for STAC/CLAC instructions
Feng Wu [Mon, 12 May 2014 15:00:39 +0000 (17:00 +0200)]
x86: add support for STAC/CLAC instructions

The STAC/CLAC instructions are only available when the SMAP feature is
available. On the other hand, they aren't needed if SMAP is not enabled or
before we start to run userspace; in those cases, the functions and macros
do nothing.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agotools/pygrub: Fix error handling if no valid partitions are found
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found

If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.

Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
10 years agoclarify SHUTDOWN_suspend additional argument
Stefano Stabellini [Thu, 8 May 2014 15:43:08 +0000 (16:43 +0100)]
clarify SHUTDOWN_suspend additional argument

Clarify the behaviour of SCHEDOP_shutdown: PV x86 guests need to pass a
third argument, that is unused on HVM and ARM guests.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agotools/libxc: Issue individual DPRINTF()s rather than multiline ones.
Andrew Cooper [Fri, 9 May 2014 09:59:58 +0000 (10:59 +0100)]
tools/libxc: Issue individual DPRINTF()s rather than multiline ones.

For libxc users who log to syslog, this results in legible logging, rather
than long lines with #012's replacing newlines.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
10 years agoxen: arm: bitops take unsigned int
Ian Campbell [Thu, 8 May 2014 15:13:55 +0000 (16:13 +0100)]
xen: arm: bitops take unsigned int

Xen bitmaps can be 4 rather than 8 byte aligned, so use the appropriate type.
Otherwise the compiler can generate unaligned 8 byte accesses and cause traps.
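
A minimal sketch of the idea, not the Xen implementation: if the bitmap is
treated as an array of 32-bit words, every access needs only 4-byte
alignment. The names sketch_set_bit and sketch_test_bit are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: set bit nr in a bitmap that is only 4-byte aligned.
 * Working in 32-bit words avoids 8-byte accesses to such a bitmap. */
static void sketch_set_bit(unsigned int nr, uint32_t *bitmap)
{
    bitmap[nr / 32] |= UINT32_C(1) << (nr % 32);
}

/* Hypothetical helper: return 1 if bit nr is set, 0 otherwise. */
static int sketch_test_bit(unsigned int nr, const uint32_t *bitmap)
{
    return (bitmap[nr / 32] >> (nr % 32)) & 1;
}
```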

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
10 years agopvh dom0: Add checks and restrictions for p2m_is_foreign
Mukesh Rathor [Mon, 12 May 2014 10:10:13 +0000 (12:10 +0200)]
pvh dom0: Add checks and restrictions for p2m_is_foreign

In this patch, we add some checks and restrictions in the relevant
p2m paths for p2m_is_foreign.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
10 years agoadd the facility to limit ranges per rangeset
Paul Durrant [Mon, 12 May 2014 10:04:45 +0000 (12:04 +0200)]
add the facility to limit ranges per rangeset

A subsequent patch exposes rangesets to secondary emulators, so to allow a
limit to be placed on the amount of xenheap that an emulator can cause to be
consumed, the function rangeset_limit() has been created to set the allowed
number of ranges in a rangeset. By default, there is no limit.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoadd an implementation of asprintf() for xen
Paul Durrant [Mon, 12 May 2014 10:03:57 +0000 (12:03 +0200)]
add an implementation of asprintf() for xen

Also needed to fix vsnprintf() et al so it can be called with a NULL buf
(and zero size, of course).
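
The reason a NULL buffer matters can be sketched with the standard C
library, where vsnprintf(NULL, 0, ...) returns the length the output would
need. This is an illustration under that assumption, not the Xen code;
sketch_asprintf is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an asprintf()-style helper, built on the
 * standard C library rather than Xen's own vsnprintf(). */
static int sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int len;

    /* First pass: NULL buffer and zero size, only the length is computed. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return -1;

    *out = malloc((size_t)len + 1);
    if (*out == NULL)
        return -1;

    /* Second pass: format into the exactly sized buffer. */
    va_start(ap, fmt);
    vsnprintf(*out, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return len;
}
```

The caller owns the returned buffer and has to free it.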

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 years agoioreq-server: on-demand creation of ioreq server
Paul Durrant [Mon, 12 May 2014 10:03:19 +0000 (12:03 +0200)]
ioreq-server: on-demand creation of ioreq server

This patch only creates the ioreq server when the legacy HVM parameters
are read (by an emulator).

A lock is introduced to protect access to the ioreq server by multiple
emulator/tool invocations should such an eventuality arise. The guest is
protected by creation of the ioreq server only being done whilst the
domain is paused.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoioreq-server: create basic ioreq server abstraction
Paul Durrant [Mon, 12 May 2014 10:02:20 +0000 (12:02 +0200)]
ioreq-server: create basic ioreq server abstraction

Collect data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions. The
lock in the hvm_ioreq_page served two different purposes and has been
replaced by separate locks in the hvm_ioreq_server structure.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoioreq-server: centralize access to ioreq structures
Paul Durrant [Mon, 12 May 2014 10:01:43 +0000 (12:01 +0200)]
ioreq-server: centralize access to ioreq structures

To simplify creation of the ioreq server abstraction in a subsequent patch,
this patch centralizes all use of the shared ioreq structure and the
buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.

The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume()
because the former may now be passed a data structure on stack, in which
case the barrier is unnecessary.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
10 years agoioreq-server: pre-series tidy up
Paul Durrant [Mon, 12 May 2014 10:00:30 +0000 (12:00 +0200)]
ioreq-server: pre-series tidy up

This patch tidies up various parts of the code that following patches move
around. If these modifications were combined with the code motion it would
be easy to miss them.

There's also some function renaming to reflect purpose and a single
whitespace fix.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 years agoNested VMX: load current_vmcs only when it exists
Edmund H White [Mon, 12 May 2014 09:59:19 +0000 (11:59 +0200)]
Nested VMX: load current_vmcs only when it exists

There may not be a valid VMCS on the current CPU, so only load it when it exists.

The original fix is from Edmund <edmund.h.white@intel.com>.

Signed-off-by: Edmund H White <edmund.h.white@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agopvh dom0: construct_dom0 changes
Mukesh Rathor [Thu, 8 May 2014 12:18:27 +0000 (14:18 +0200)]
pvh dom0: construct_dom0 changes

This patch changes construct_dom0() to boot in pvh mode:
  - Make sure dom0 elf supports pvh mode.
  - Call guest_physmap_add_page for pvh rather than simple p2m setting
  - Map all non-RAM regions 1:1 up to the end region in e820 or 4GB,
    whichever is higher.
  - Allocate p2m, copying calculation from toolstack.
  - Allocate shared info page from the virtual space so that dom0 PT
    can be updated. Then update p2m for it with the actual mfn.
  - Since we build the page tables for pvh same as for pv, in
    pvh_fixup_page_tables_for_hap we replace the mfns with pfns.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agox86: remove c_identify of the struct cpu_dev
Yi Li [Thu, 8 May 2014 12:06:10 +0000 (14:06 +0200)]
x86: remove c_identify of the struct cpu_dev

After commit 44e24f85674d (x86: don't call generic_identify() redundantly)
struct cpu_dev no longer needs c_identify.

Signed-off-by: Yi Li <peteryili@tencent.com>
11 years agognttab: don't flush the TLB on grant ops for auto-translated guests
Roger Pau Monné [Thu, 8 May 2014 12:05:35 +0000 (14:05 +0200)]
gnttab: don't flush the TLB on grant ops for auto-translated guests

For auto-translated guests the p2m code will do the necessary TLB
flushes, so there's no need to perform any TLB flushes in generic
grant table code.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/P2M: p2m_change_type() should pass on error from p2m_set_entry()
Jan Beulich [Thu, 8 May 2014 11:59:33 +0000 (13:59 +0200)]
x86/P2M: p2m_change_type() should pass on error from p2m_set_entry()

Modify the function's name to help eventual backports involving this
function, and in one case where this is trivially possible also stop
ignoring its return value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agox86/P2M: pass on errors from p2m_set_entry()
Jan Beulich [Thu, 8 May 2014 11:58:46 +0000 (13:58 +0200)]
x86/P2M: pass on errors from p2m_set_entry()

... at least in a couple of straightforward cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agodomctl: tighten XEN_DOMCTL_*_permission
Jan Beulich [Thu, 8 May 2014 11:57:12 +0000 (13:57 +0200)]
domctl: tighten XEN_DOMCTL_*_permission

With proper permission (and, for the I/O port case, wrap-around) checks
added (note that for the I/O port case a count of zero is now being
disallowed, in line with I/O memory handling):

XEN_DOMCTL_irq_permission:
XEN_DOMCTL_ioport_permission:

 Of both IRQs and I/O ports there is only a reasonably small amount, so
 there's no excess resource consumption involved here. Additionally
 they both have a specialized XSM hook associated.

XEN_DOMCTL_iomem_permission:

 While this also has a specialized XSM hook associated (just like
 XEN_DOMCTL_{irq,ioport}_permission), it's not clear whether it's
 reasonable to expect XSM to restrict the number of ranges associated
 with a domain via this hook (which is the main resource consumption
 item here).
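
The I/O port checks described above (zero count, wrap-around, upper bound)
might look roughly like the following sketch. The function name and the
SKETCH_MAX_IOPORTS constant are hypothetical, and the actual checks in the
patch may be written differently.

```c
#include <stdbool.h>

#define SKETCH_MAX_IOPORTS 0x10000u  /* x86 has 65536 I/O ports */

/* Hypothetical check: reject empty ranges, ranges whose end wraps around,
 * and ranges extending past the last port. */
static bool sketch_ioport_range_ok(unsigned int first, unsigned int count)
{
    unsigned int end = first + count;   /* may wrap; detected below */

    if (count == 0)
        return false;
    if (end <= first)                   /* unsigned wrap-around */
        return false;
    return end <= SKETCH_MAX_IOPORTS;
}
```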

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Detect comparator values in the past
Don Slutz [Fri, 2 May 2014 20:18:08 +0000 (16:18 -0400)]
hvm/hpet: Detect comparator values in the past

This statement only works using 64-bit arithmetic for the main
counter never changing by more than 2^63.  (Which is a boundary
case that should not happen in my lifetime.)
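
The statement itself is not quoted in this message. A common way to write
such a wrap-safe comparison, shown here only as an assumed illustration
with a hypothetical function name, is to subtract as unsigned and inspect
the sign of the result:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if comparator lies in the past relative to the
 * main counter.  The unsigned subtraction wraps modulo 2^64, and viewing the
 * result as signed gives the right answer while the two values are less
 * than 2^63 apart. */
static bool sketch_comparator_in_past(uint64_t counter, uint64_t comparator)
{
    return (int64_t)(comparator - counter) < 0;
}
```

The conversion to int64_t relies on two's complement behaviour, which
common compilers provide.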

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Prevent master clock equal to comparator while enabled
Don Slutz [Fri, 2 May 2014 20:18:07 +0000 (16:18 -0400)]
hvm/hpet: Prevent master clock equal to comparator while enabled

Based on the software-developers-hpet-spec-1-0a.pdf, the comparator
for a periodic timer will change to the new value when it matches
the master clock.  The current code here uses a very standard
rounding formula of "((x + y - 1) / y) * y".  This is wrong because
in this case you need to go to the next comparator value when "x"
equals "y". Not when "x + 1" equals "y".  In this case "y" is the
period and "x" is the master clock.

The code lines:

    elapsed = hpet_read_maincounter(h, guest_time) +
        period - 1 - comparator;
    comparator += (elapsed / period) * period;

are what matter here.

Using some numbers to help show the issue:

hpet_read_maincounter(h, guest_time) = 130252
period = 62500

comparator       : 130252
elapsed          : 62499
elapsed/period   : 0
comparator_delta : 0
new comparator   : 130252
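
The numbers above can be reproduced with a small sketch. The first function
is the quoted formula; the second is one assumed way of always stepping to
the next comparator value and is not taken from the patch. Both function
names are hypothetical.

```c
#include <stdint.h>

/* Formula quoted above: rounds up, so counter == comparator yields no change. */
static uint64_t sketch_next_cmp_old(uint64_t counter, uint64_t period,
                                    uint64_t comparator)
{
    uint64_t elapsed = counter + period - 1 - comparator;

    return comparator + (elapsed / period) * period;
}

/* Assumed alternative (not taken from the patch): always step to the first
 * comparator value strictly greater than the counter. */
static uint64_t sketch_next_cmp_strict(uint64_t counter, uint64_t period,
                                       uint64_t comparator)
{
    uint64_t elapsed = counter - comparator;  /* assumes counter >= comparator */

    return comparator + (elapsed / period + 1) * period;
}
```

With counter 130252, period 62500 and comparator 130252, the first function
leaves the comparator unchanged, while the second moves it one period ahead.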

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: comparator can only change when master clock is enabled.
Don Slutz [Fri, 2 May 2014 20:18:06 +0000 (16:18 -0400)]
hvm/hpet: comparator can only change when master clock is enabled.

This is based on software-developers-hpet-spec-1-0a.pdf saying:

When the main counter value matches the value in the timer's
comparator register, an interrupt can be generated.  The hardware
will then automatically increase the value in the compare register
by the last value written to that register.

When the overall enable is off (the main count is halted), none of
the compare registers should change.

The code lines:

    elapsed = hpet_read_maincounter(h, guest_time) +
        period - 1 - comparator;
    comparator += (elapsed / period) * period;

are what matter here.  They will always adjust comparator to be no
more than one period away.

Using some numbers to help show the issue:

hpet_read_maincounter(h, guest_time) = 67752
period = 62500
comparator = 255252 == 67752 + 3 * 62500

comparator       : 255252
elapsed          : -125001
elapsed/period   : -2
comparator_delta : -125000
new comparator   : 130252
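
The table can be checked with the following sketch, which assumes signed
64-bit arithmetic as the negative values in the table suggest. The function
name is hypothetical.

```c
#include <stdint.h>

/* The quoted adjustment, written with signed 64-bit values so that the
 * negative intermediate results from the table are visible. */
static int64_t sketch_adjust_comparator(int64_t counter, int64_t period,
                                        int64_t comparator)
{
    int64_t elapsed = counter + period - 1 - comparator;

    /* C division truncates toward zero: -125001 / 62500 == -2. */
    return comparator + (elapsed / period) * period;
}
```

A comparator written three periods ahead of a halted counter is therefore
pulled back by two periods, although nothing should change while the main
counter is stopped.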

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Init comparator64 like comparator.
Don Slutz [Fri, 2 May 2014 20:18:05 +0000 (16:18 -0400)]
hvm/hpet: Init comparator64 like comparator.

The software-developers-hpet-spec-1-0a.pdf says that the comparator
starts as all 1's.  Also make the hidden register comparator64 the same.

Since only the hidden register comparator64 is used by hpet_save, it
needs to start out with the right value.

A disabled hpet (like when a guest is starting), should start with
the value the spec says.  Both the guest (via reading the
comparator) and an administrator using xen-hvmctx, will see all 0's
not all 1's.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: In hpet_save, call hpet_get_comparator.
Don Slutz [Fri, 2 May 2014 20:18:04 +0000 (16:18 -0400)]
hvm/hpet: In hpet_save, call hpet_get_comparator.

This changes save data to consistent/expected values.  It is not
technically required because hpet_get_comparator() will adjust from
any value to the correct value. And hpet_get_comparator() is
effectively called in hpet_load via hpet_set_timer.

However it does look strange to people that the output from
xen-hvmctx for the comparator values does not change when the master
clock does.

The software-developers-hpet-spec-1-0a.pdf says that the comparator
will always be greater than master clock for a periodic timer.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: In hpet_save, correctly compute mc64.
Don Slutz [Fri, 2 May 2014 20:18:03 +0000 (16:18 -0400)]
hvm/hpet: In hpet_save, correctly compute mc64.

When the master clock is not enabled, mc64 has the right value.

Basically do the same thing as hpet_read_maincounter().

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Correctly limit period to a maximum.
Don Slutz [Fri, 2 May 2014 20:18:02 +0000 (16:18 -0400)]
hvm/hpet: Correctly limit period to a maximum.

In the code section after the comment:

    /*
     * Clamp period to reasonable min/max values:
     *  - minimum is 100us, same as timers controlled by vpt.c
     *  - maximum is to prevent overflow in time_after() calculations
     */

The current maximum limit actually allows "bad" values like 0 and 1.
This is because it uses a mask not a maximum.
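
The difference between a mask and a maximum can be sketched as follows. The
limit value and all names are assumptions for illustration and are not
taken from hpet.c.

```c
#include <stdint.h>

#define SKETCH_PERIOD_MAX UINT32_C(0x7fffffff)  /* assumed limit for a 32-bit timer */

/* Masking drops the high bit instead of limiting the value. */
static uint32_t sketch_limit_by_mask(uint32_t period)
{
    return period & SKETCH_PERIOD_MAX;
}

/* A true maximum leaves in-range values alone and caps the rest. */
static uint32_t sketch_limit_by_max(uint32_t period)
{
    return period > SKETCH_PERIOD_MAX ? SKETCH_PERIOD_MAX : period;
}
```

With the mask, a period just above the limit turns into 0 or 1 rather than
into the largest allowed value.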

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Only set comparator or period not both.
Don Slutz [Fri, 2 May 2014 20:18:01 +0000 (16:18 -0400)]
hvm/hpet: Only set comparator or period not both.

The current code sets both.  When setting the comparator, also set
comparator64 (the hidden version).

Based on:

software-developers-hpet-spec-1-0a.pdf

A write call should only change comparator or period, not both.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Only call guest_time_hpet(h) one time per action.
Don Slutz [Fri, 2 May 2014 20:18:00 +0000 (16:18 -0400)]
hvm/hpet: Only call guest_time_hpet(h) one time per action.

This call is expensive and will cause extra time to pass.

The software-developers-hpet-spec-1-0a.pdf does not say how long it
takes after the main clock is enabled before the first change of the
master clock.  Therefore multiple calls to guest_time_hpet(h) are
not needed.  Since each timer is started by a loop, each one's start
time will change with the multiple calls.  In real hardware, there
is no delta based on which timer is being started.

Without this change it is possible for an HVM guest running linux to
get the message:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

on the guest console(s), and the guest will panic.

The Xen hypervisor console will also be flooded with:

vioapic.c:352:d1 Unsupported delivery mode 7
vioapic.c:352:d1 Unsupported delivery mode 7
vioapic.c:352:d1 Unsupported delivery mode 7
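
The timing effect described above can be modelled with a sketch in which
every clock read costs time. The clock, the cost of 10 ticks and all names
are hypothetical; guest_time_hpet() itself is not reproduced here.

```c
#include <stdint.h>

#define SKETCH_NUM_TIMERS 3

/* Hypothetical clock: every read costs time, modelled as 10 ticks. */
static uint64_t sketch_clock;

static uint64_t sketch_read_time(void)
{
    sketch_clock += 10;
    return sketch_clock;
}

/* Reading the clock inside the loop gives each timer a different start. */
static void sketch_start_timers_per_call(uint64_t start[SKETCH_NUM_TIMERS])
{
    for (int i = 0; i < SKETCH_NUM_TIMERS; i++)
        start[i] = sketch_read_time();
}

/* Reading it once and passing the value down gives all timers one start. */
static void sketch_start_timers_once(uint64_t start[SKETCH_NUM_TIMERS])
{
    uint64_t now = sketch_read_time();

    for (int i = 0; i < SKETCH_NUM_TIMERS; i++)
        start[i] = now;
}
```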

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agohvm/hpet: Add manual unit test code.
Don Slutz [Fri, 2 May 2014 20:17:59 +0000 (16:17 -0400)]
hvm/hpet: Add manual unit test code.

Add the code at tools/tests/vhpet.

See comment in tools/tests/vhpet/main.c for details on running
either in a xen source tree or elsewhere.

A basic in source tree usage is:

make -C tools/tests/vhpet run

This reproduces the bug:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

The makefile includes copying hpet.c and hpet.h from the source
tree.  hpet.c is then modified to remove all include files and add the
emul.h include file.

The manual test code has only a few automatic checks that output
messages to stderr:

1) Possible ..MP-BIOS bug: 8254 timer...
   if 1st period is not <= the expected value

2) hpet_set_mode(%ld): T%d Error: Set ...
   if read of comparator != write of comparator in

3) hpet_check_stopped(%ld): T%d Error: Set ...
   if read != write

4) main(%ld): With clock stopped mc64 changed: ...
   if hpet_save returns different master clock values when called
   more than once.

It also generates a lot of output, which is why the suggested way to
use it includes a redirect of stdout to a file.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxenstat: don't leak memory in getBridge
Matthew Daley [Sun, 4 May 2014 08:31:47 +0000 (20:31 +1200)]
xenstat: don't leak memory in getBridge

getBridge's method of returning a result was a little confused:
allocating a result buffer but never using it.

Simplify by instead allowing a result buffer to be passed in and
modifying the single usage to match.

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxenstat: fix unsigned less-than-0 comparison
Matthew Daley [Sun, 4 May 2014 08:31:46 +0000 (20:31 +1200)]
xenstat: fix unsigned less-than-0 comparison

Commit 1438d36f ("xenstat: Fix buffer over-run with new_domains being
negative.") attempted to fix the handling of a negative error result
from xc_domain_getinfolist in xenstat_get_node. However, it forgot to
change the result variable from an unsigned type to a signed one.

Do so, allowing the error result to be handled properly.
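
Why the variable type matters can be shown with a short sketch. The
function names are hypothetical, and sketch_get_count() only stands in for
a call that reports failure as a negative value.

```c
/* Hypothetical stand-in for a call that reports failure as -1. */
static int sketch_get_count(int fail)
{
    return fail ? -1 : 4;
}

/* Stored in an unsigned variable, -1 becomes UINT_MAX and no longer looks
 * like an error.  The widening cast makes that value visible; a direct
 * "n < 0" on the unsigned variable would simply always be false.
 * Returns 1 if the failure was noticed. */
static int sketch_check_unsigned(int fail)
{
    unsigned int n = (unsigned int)sketch_get_count(fail);

    return (long long)n < 0;
}

/* With a signed variable the negative result is detected as intended. */
static int sketch_check_signed(int fail)
{
    int n = sketch_get_count(fail);

    return n < 0;
}
```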

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agonetif.h: Document xen-net{back, front} multi-queue feature
Andrew J. Bennieston [Tue, 6 May 2014 11:03:18 +0000 (12:03 +0100)]
netif.h: Document xen-net{back, front} multi-queue feature

Document the multi-queue feature in terms of XenStore keys to be written
by the backend and by the frontend.

Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/libxl: add direct_io_safe to check-xl-disk-parse
Olaf Hering [Mon, 5 May 2014 13:30:28 +0000 (15:30 +0200)]
tools/libxl: add direct_io_safe to check-xl-disk-parse

Add missing bool "direct_io_safe" to expected output. It was added by
Commit 6ec48cf4 ("libxl: introduce an option for disabling the
non-O_DIRECT workaround"), but check-xl-disk-parse was not updated.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86: reduce redundancy in tsc_[gs]et_info()
Jan Beulich [Wed, 7 May 2014 14:36:11 +0000 (16:36 +0200)]
x86: reduce redundancy in tsc_[gs]et_info()

- some of the case statements are effectively or mostly special cases
  of others, so there's no good reason not to share the code
- in the "get" function, a variable can be made case-wide instead of
  having multiple instances of it (and those even with a pointless
  initializer)
- minor formatting adjustments

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agocredit2: use unique names
Juergen Gross [Wed, 7 May 2014 14:35:24 +0000 (16:35 +0200)]
credit2: use unique names

Avoid names duplicated with the credit scheduler. This makes life easier when
debugging with tools like cscope or crash.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
11 years agox86: merge stuff from asm-x86/x86_64/asm_defns.h to asm-x86/asm_defns.h
Feng Wu [Tue, 6 May 2014 11:55:27 +0000 (13:55 +0200)]
x86: merge stuff from asm-x86/x86_64/asm_defns.h to asm-x86/asm_defns.h

This patch moves stuff unchanged from asm-x86/x86_64/asm_defns.h to
asm-x86/asm_defns.h

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agox86: move common_interrupt to entry.S
Feng Wu [Tue, 6 May 2014 11:54:16 +0000 (13:54 +0200)]
x86: move common_interrupt to entry.S

This patch moves the label common_interrupt from asm_defns.h to entry.S and
converts SAVE_ALL from a C to an assembler macro.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86: define macros CPUINFO_features and CPUINFO_FEATURE_OFFSET
Feng Wu [Tue, 6 May 2014 11:51:27 +0000 (13:51 +0200)]
x86: define macros CPUINFO_features and CPUINFO_FEATURE_OFFSET

This patch defines macros CPUINFO_features and CPUINFO_FEATURE_OFFSET.
CPUINFO_features can be used as the base of the offset for cpu features,
while CPUINFO_FEATURE_OFFSET is used to define the right offset for
specific CPU feature.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Some further cleanup (both to the patch and to surrounding code).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agox86,amd_ucode: verify max allowed patch size before apply
Aravind Gopalakrishnan [Tue, 6 May 2014 11:39:05 +0000 (13:39 +0200)]
x86,amd_ucode: verify max allowed patch size before apply

Each family has a stipulated max patch_size. Use this as
additional sanity check before we apply it.

Also, tone down the amount of debug messages and
follow microcode_intel's implementation of pr_debug.

While at it, fix comment at very top to indicate we support ucode
patch loading from fam10h and higher.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
11 years agox86/time: cpuid_time_leaf() cleanup
Andrew Cooper [Tue, 6 May 2014 11:33:46 +0000 (13:33 +0200)]
x86/time: cpuid_time_leaf() cleanup

* Don't mix uint32_t and unsigned int between prototype and definition
* Don't bitwise or with 0

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agoNPT: temporarily retain page table mapping in do_recalc()
Jan Beulich [Tue, 6 May 2014 11:30:31 +0000 (13:30 +0200)]
NPT: temporarily retain page table mapping in do_recalc()

Commit b3e024f3 ("x86/NPT: don't walk page tables when changing types
on a range") neglected the fact that p2m_next_level() replaces the
previous level's mapping with the new level's one, hence dereferencing
a stale pointer the translation for which may no longer be available
(timing dependent). Add a parameter to that function allowing the
caller to request that the mapping be retained (the unmapping will be
taken care of by the caller then).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
11 years agolibxl: introduce an option for disabling the non-O_DIRECT workaround
Stefano Stabellini [Wed, 30 Apr 2014 15:06:24 +0000 (16:06 +0100)]
libxl: introduce an option for disabling the non-O_DIRECT workaround

Document and implement a new option that permits disk backends which
would otherwise have to avoid O_DIRECT (because of the network memory
lifetime bug) to use it anyway.  This is:
 direct-io-safe   in the xl domain disk config specification
 direct_io_safe   in the libxl disk API
 direct-io-safe   in the backend xenstore interface

Add a reference to xen/include/public/io/blkif.h in
docs/misc/vbd-interface.txt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Felipe Franciosi <felipe@paradoxo.org>
11 years agolibxl: Rerun bison
Ian Jackson [Fri, 2 May 2014 16:47:55 +0000 (17:47 +0100)]
libxl: Rerun bison

This updates libxlu_cfg_y.[ch] to code generated by bison from
Debian wheezy (1:2.5.dfsg-2.1 i386).

There should be no functional change since there is no change to the
source file, but we will inherit bugfixes and behavioural changes from
the new version of bison.  So this is more a matter of hope than
knowledge.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agoxen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle
Julien Grall [Tue, 22 Apr 2014 13:14:24 +0000 (14:14 +0100)]
xen/dts: Add dt_parse_phandle_with_args and dt_parse_phandle

Code adapted from linux drivers/of/base.c (commit ef42c58).

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen/arm: vtimer: rename vcpu_domain_init into domain_vtimer_init
Julien Grall [Thu, 1 May 2014 12:31:15 +0000 (13:31 +0100)]
xen/arm: vtimer: rename vcpu_domain_init into domain_vtimer_init

The current function name vcpu_domain_init doesn't reflect what the function
does and might be misused.

Rename it into domain_vtimer_init.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: bail from placement on non-NUMA boxes
Dario Faggioli [Wed, 30 Apr 2014 15:44:13 +0000 (17:44 +0200)]
libxl: bail from placement on non-NUMA boxes

If there is only 1 NUMA node, there is no need to go through placement
candidate selection, etc.; we can just bail.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/mfn-dump: Fixes to 'dump-p2m'
Andrew Cooper [Thu, 24 Apr 2014 21:06:27 +0000 (22:06 +0100)]
tools/mfn-dump: Fixes to 'dump-p2m'

* Don't walk off the end of p2m_table under the mistaken impression that it
  contains toolstack unsigned longs.  Despite its array type it contains guest
  unsigned longs so unconditionally needs casting to the guest width to use
  correctly.  Furthermore, a 64bit toolstack must be extra careful when it
  finds a 32bit guest's INVALID_MFN.

* Drop 'mapped' and 'pinned' descriptions.  These are both bogus, including all
  uses of the is_mapped() macro.

* Rearrange the type name printing to be more concise.
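
The first point above can be sketched as follows: entries are read at the
guest's width, and a 32-bit guest's all-ones marker is widened explicitly.
The accessor name and SKETCH_INVALID_MFN are hypothetical and do not come
from the tool's source.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_MFN UINT64_MAX  /* hypothetical toolstack-side marker */

/* Hypothetical accessor: read entry idx of a p2m table whose entries are
 * guest_width bytes wide (4 or 8), widening a 32-bit guest's all-ones
 * invalid marker to the 64-bit one. */
static uint64_t sketch_p2m_entry(const void *table, size_t idx,
                                 unsigned int guest_width)
{
    if (guest_width == 4) {
        uint32_t e = ((const uint32_t *)table)[idx];

        return e == UINT32_MAX ? SKETCH_INVALID_MFN : e;
    }
    return ((const uint64_t *)table)[idx];
}
```

Indexing the same table as toolstack unsigned longs on a 64-bit toolstack
would read two 32-bit guest entries at a time and run past the end.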

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/misc: Fix linkage of libxenstore
Andrew Cooper [Thu, 24 Apr 2014 21:17:57 +0000 (22:17 +0100)]
tools/misc: Fix linkage of libxenstore

* xen-mfndump doesn't use xenstore at all.  Don't link against it.

* xen-hptool can include the correct header rather than externing itself a
  single function.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agotools/libxl: fix typo in main_tmem_freeable
Olaf Hering [Tue, 29 Apr 2014 09:09:55 +0000 (11:09 +0200)]
tools/libxl: fix typo in main_tmem_freeable

missing letter 'b'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agovtpmmgr: properly remove t_uint size dependency
Daniel De Graaf [Mon, 28 Apr 2014 23:29:10 +0000 (19:29 -0400)]
vtpmmgr: properly remove t_uint size dependency

Rather than using the internal MPI format for the Diffie-Hellman group,
whose representation depends on the size of the t_uint type, store the
value as a big-endian integer and use mpi_read_binary to convert it in
an architecture-independent manner.  This patch also removes the
unnecessary range check on the exponent which ended up being different
between 32- and 64-bit code.
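
Why a big-endian byte string is architecture-independent can be seen from a
small sketch. It illustrates the byte-order idea only and is not
mpi_read_binary; the function name is hypothetical and the input is limited
to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: interpret len big-endian bytes (len <= 8) as an
 * integer.  The result does not depend on host endianness or word size. */
static uint64_t sketch_read_be(const unsigned char *buf, size_t len)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len; i++)
        v = (v << 8) | buf[i];
    return v;
}
```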

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agobuild: export CC value to SeaBIOS
Roger Pau Monne [Wed, 16 Apr 2014 14:13:31 +0000 (16:13 +0200)]
build: export CC value to SeaBIOS

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agogdbsx: remove cast from ioctl
Roger Pau Monne [Wed, 16 Apr 2014 14:13:30 +0000 (16:13 +0200)]
gdbsx: remove cast from ioctl

The ulong type is not defined on FreeBSD, and the cast seems
pointless, so just remove it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
11 years agoxenstat: add a dummy FreeBSD implementation
Roger Pau Monne [Wed, 16 Apr 2014 14:13:29 +0000 (16:13 +0200)]
xenstat: add a dummy FreeBSD implementation

Add an empty FreeBSD implementation so xenstat can compile on FreeBSD.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>