xenbits.xensource.com Git - people/royger/xen.git/log
people/royger/xen.git
8 years agoxen: allow setting the store pfn HVM parameter dom0_wip11 gitlab/dom0_wip11
Roger Pau Monne [Tue, 6 Sep 2016 08:58:28 +0000 (10:58 +0200)]
xen: allow setting the store pfn HVM parameter

Xen already allows setting the store event channel, and this parameter is
not used by Xen at all.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen/x86: allow HVM hardware domains (PVHv2 Dom0) to perform foreign memory mappings
Roger Pau Monne [Tue, 6 Sep 2016 08:57:06 +0000 (10:57 +0200)]
xen/x86: allow HVM hardware domains (PVHv2 Dom0) to perform foreign memory mappings

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen/x86: add MSI-X emulation to PVHv2 Dom0
Roger Pau Monne [Mon, 19 Sep 2016 10:00:56 +0000 (12:00 +0200)]
xen/x86: add MSI-X emulation to PVHv2 Dom0

This requires adding handlers to the PCI configuration space, plus an MMIO
handler for the MSI-X table; the PBA is left mapped directly into the guest.
The implementation is based on the one already found in the passthrough
code from QEMU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Move registration of the vMSI-X handlers to hvm_domain_initialise.

8 years agox86/msixtbl: disable MSI-X intercepts for domains without an ioreq server
Roger Pau Monne [Thu, 22 Sep 2016 14:52:42 +0000 (16:52 +0200)]
x86/msixtbl: disable MSI-X intercepts for domains without an ioreq server

The current msixtbl intercepts only partially trap MSI-X accesses: the logic
needed to set up PIRQs and bind them to domains is missing. Disable them for
domains without at least an ioreq server (PVH).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
NB: this is a preparatory patch for introducing a complete MSI-X emulation
layer into Xen. Long term the current msixtbl code should be replaced with
the complete MSI-X emulation introduced in later patches.

8 years agoxen/x86: add PCIe emulation
Roger Pau Monne [Fri, 2 Sep 2016 14:59:29 +0000 (16:59 +0200)]
xen/x86: add PCIe emulation

Add a new MMIO handler that traps accesses to PCIe regions, as discovered by
Xen from the MCFG ACPI table. The handler used is the same as the one used
for accesses to the IO PCI configuration space.
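
The decoding step such a handler needs can be sketched as follows. This is a
minimal illustration, assuming the standard PCIe memory-mapped configuration
layout (one 4 KiB page per function, base address corresponding to bus 0).
The struct and function names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; not taken from the Xen patch. */
struct ecam_addr {
    uint8_t  bus;   /* bits 27:20 of the offset into the region */
    uint8_t  dev;   /* bits 19:15 */
    uint8_t  func;  /* bits 14:12 */
    uint16_t reg;   /* bits 11:0, 4 KiB of config space per function */
};

/*
 * Decode a trapped guest physical address that falls inside a
 * memory-mapped configuration region whose base (as reported by an
 * MCFG entry) corresponds to bus 0.
 */
static struct ecam_addr ecam_decode(uint64_t base, uint64_t addr)
{
    struct ecam_addr a;
    uint64_t off;

    assert(addr >= base);
    off = addr - base;

    a.bus  = (uint8_t)((off >> 20) & 0xff);
    a.dev  = (uint8_t)((off >> 15) & 0x1f);
    a.func = (uint8_t)((off >> 12) & 0x07);
    a.reg  = (uint16_t)(off & 0xfff);

    return a;
}
```

Once the access is reduced to bus/device/function/register, it can be handed
to the same code that serves the IO port based configuration mechanism.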

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Move the PCIe handlers from io.c into arch/x86/pci.c.
 - Register the PCIE MMIO handlers directly in hvm_domain_initialise.

8 years agoxen/x86: add all PCI devices to PVHv2 Dom0
Roger Pau Monne [Wed, 31 Aug 2016 10:43:55 +0000 (12:43 +0200)]
xen/x86: add all PCI devices to PVHv2 Dom0

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/vmsi: add MSI emulation for hardware domain
Roger Pau Monne [Fri, 26 Aug 2016 11:06:38 +0000 (13:06 +0200)]
x86/vmsi: add MSI emulation for hardware domain

Import the MSI handlers from QEMU into Xen. This allows Xen to detect
accesses to the MSI registers and correctly set up PIRQs for physical devices
that are then bound to the hardware domain.

The current logic only allows the usage of a single MSI interrupt per
device, so the maximum queue size announced by the device is unconditionally
set to 0 (1 vector only).
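
As an illustration of what forcing the queue size to 0 means, here is a
minimal sketch of filtering the MSI Message Control register. The bit
positions follow the PCI specification; the function names are hypothetical
and the real handlers imported from QEMU may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* MSI Message Control register fields, as laid out by the PCI spec. */
#define MSI_FLAGS_ENABLE 0x0001u /* MSI enable */
#define MSI_FLAGS_QMASK  0x000eu /* Multiple Message Capable, log2(vectors) */
#define MSI_FLAGS_QSIZE  0x0070u /* Multiple Message Enable, log2(vectors) */

/* Number of vectors a device advertises in its Message Control value. */
static unsigned int msi_vectors_advertised(uint16_t ctrl)
{
    unsigned int log2_vectors = (ctrl & MSI_FLAGS_QMASK) >> 1;

    assert(log2_vectors <= 5);   /* encodings 6 and 7 are reserved */

    return 1u << log2_vectors;
}

/* Hypothetical read filter: hide multi-vector support from the guest. */
static uint16_t msi_ctrl_guest_view(uint16_t hw_ctrl)
{
    return hw_ctrl & (uint16_t)~MSI_FLAGS_QMASK;
}
```

With the capable field cleared, a guest that follows the specification will
never try to enable more than one vector.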

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen/x86: route legacy PCI interrupts to Dom0
Roger Pau Monne [Wed, 20 Jul 2016 15:48:46 +0000 (17:48 +0200)]
xen/x86: route legacy PCI interrupts to Dom0

This is done by adding some Dom0 specific logic to the IO APIC emulation
inside of Xen, so that writes to the IO APIC registers that should unmask an
interrupt will take care of setting up this interrupt with Xen. A Dom0
specific EOI handler also has to be used, since Xen doesn't know the
topology of the PCI devices and just has to pass through what Dom0 does.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
8 years agoxen/x86: prevent PVHv2 Dom0 BAR remapping
Roger Pau Monne [Tue, 30 Aug 2016 10:49:14 +0000 (12:49 +0200)]
xen/x86: prevent PVHv2 Dom0 BAR remapping

Add handlers to detect attempts from a PVHv2 Dom0 to change the position of
the PCI BARs and crash the domain in such cases. PCI BAR remapping is not
yet supported for PVHv2 Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Simplify accept handlers.
 - Fix commit message.

8 years agoxen/pci: split code to size BARs from pci_add_device
Roger Pau Monne [Thu, 1 Sep 2016 14:24:50 +0000 (16:24 +0200)]
xen/pci: split code to size BARs from pci_add_device

Because it's also going to be used by other code.
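
For readers unfamiliar with BAR sizing, the core calculation can be sketched
like this. It assumes a 32-bit memory BAR and the usual probe (write all
ones, read the value back); the helper name is hypothetical and the sketch
does not reproduce the code split out of pci_add_device.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_MEM_ADDR_MASK 0xfffffff0u  /* low 4 bits of a memory BAR are flags */

/*
 * Given the value read back from a 32-bit memory BAR after writing
 * 0xffffffff to it, return the size of the region in bytes.
 * A readback with no address bits set means the BAR is not implemented.
 */
static uint64_t bar32_size_from_readback(uint32_t readback)
{
    uint32_t addr_bits;

    assert((readback & 1u) == 0);   /* bit 0 set would mean an IO BAR */

    addr_bits = readback & BAR_MEM_ADDR_MASK;
    if (addr_bits == 0)
        return 0;

    /* The writable address bits are the inverse of (size - 1). */
    return (uint64_t)(uint32_t)~addr_bits + 1;
}
```

A readback of 0xfff00000, for example, corresponds to a 1 MiB region.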

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
8 years agoxen/x86: add the basic infrastructure to import QEMU passthrough code
Roger Pau Monne [Thu, 25 Aug 2016 15:23:14 +0000 (17:23 +0200)]
xen/x86: add the basic infrastructure to import QEMU passthrough code

Most of this code has been picked up from QEMU and modified so it can be
plugged into the internal Xen IO handlers. The structure of the handlers has
been kept quite similar to QEMU, so existing handlers can be imported
without a lot of effort.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v2:
 - Create a macro to convert a size into a bitmask (instead of open coding
   it everywhere).
 - Use xzalloc instead of xmalloc + memset.

8 years agoxen/dcpi: add a dpci passthrough handler for hardware domain
Roger Pau Monne [Wed, 20 Jul 2016 11:41:00 +0000 (13:41 +0200)]
xen/dcpi: add a dpci passthrough handler for hardware domain

This is very similar to the PCI trap used for the traditional PV(H) Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Simplify the accept handler.
 - Simplify read/write handlers by doing addr &= 3.
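
A minimal sketch of why masking with 3 is enough, assuming the legacy
configuration mechanism where port 0xcf8 latches the address and ports
0xcfc to 0xcff carry the data. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the latched CONFIG_ADDRESS value (port 0xcf8) with the data
 * port actually accessed (0xcfc..0xcff) to get the register offset.
 */
static unsigned int cfg_reg_offset(uint32_t cf8, unsigned int port)
{
    unsigned int addr = port;

    assert(port >= 0xcfc && port <= 0xcff);

    addr &= 3;                    /* byte lane within the dword */

    return (cf8 & 0xfc) | addr;   /* dword-aligned register plus lane */
}
```

Because 0xcfc is itself a multiple of 4, the low two bits of the port are
exactly the byte offset inside the selected dword.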

8 years agoxen/x86: setup PVHv2 Dom0 ACPI tables
Roger Pau Monne [Fri, 29 Jul 2016 11:43:27 +0000 (13:43 +0200)]
xen/x86: setup PVHv2 Dom0 ACPI tables

Create a new MADT table that contains the topology exposed to the guest. A
new XSDT table is also created, in order to filter the tables that we want
to expose to the guest, plus the Xen crafted MADT. This in turn requires Xen
to also create a new RSDP in order to make it point to the custom XSDT.

Also, regions marked as E820_ACPI or E820_NVS are identity mapped into Dom0
p2m, plus any top-level ACPI tables that should be accessible to Dom0 and
that don't reside in RAM regions. This is needed because some memory maps
don't properly account for all the memory used by ACPI, so it's common to
find ACPI tables in holes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Completely reworked.

8 years agoxen/x86: hack to setup PVHv2 Dom0 CPUs
Roger Pau Monne [Fri, 29 Jul 2016 11:25:44 +0000 (13:25 +0200)]
xen/x86: hack to setup PVHv2 Dom0 CPUs

Initialize Dom0 BSP/APs and set up the memory and IO permissions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
DO NOT APPLY.

The logic used to set up the CPUID leaves is clearly lacking. This patch will
be rebased on top of Andrew's CPUID work, which will move CPUID setup from
libxc into Xen. For the time being this is needed in order to be able to
boot a PVHv2 Dom0 and test the rest of the patches.

8 years agoxen/x86: parse Dom0 kernel for PVHv2
Roger Pau Monne [Fri, 29 Jul 2016 09:16:53 +0000 (11:16 +0200)]
xen/x86: parse Dom0 kernel for PVHv2

Introduce a helper to parse the Dom0 kernel.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Remove debug messages.
 - Don't hardcode the number of modules to 1.

8 years agoxen/x86: populate PVHv2 Dom0 physical memory map
Roger Pau Monne [Fri, 29 Jul 2016 09:22:17 +0000 (11:22 +0200)]
xen/x86: populate PVHv2 Dom0 physical memory map

Craft the Dom0 e820 memory map and populate it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Introduce get_order_from_bytes_floor as a local function to
   domain_build.c.
 - Remove extra asserts.
 - Make hvm_populate_memory_range return an error code instead of panicking.
 - Fix comments and printks.
 - Use ULL suffix instead of casting to uint64_t.
 - Rename hvm_setup_vmx_unrestricted_guest to
   hvm_setup_vmx_realmode_helpers.
 - Only subtract two pages from the memory calculation, that will be used
   by the MADT replacement.
 - Remove some comments.
 - Remove printing allocation information.
 - Don't stash any pages for the MADT, TSS or ident PT, those will be
   subtracted directly from RAM regions of the memory map.
 - Count the number of iterations before calling process_pending_softirqs
   when populating the memory map.
 - Move the initial call to process_pending_softirqs into construct_dom0,
   and remove the ones from construct_dom0_hvm and construct_dom0_pv.
 - Make memflags global so it can be shared between alloc_chunk and
   hvm_populate_memory_range.

Changes since RFC:
 - Use IS_ALIGNED instead of checking with PAGE_MASK.
 - Use the new %pB specifier in order to print sizes in human readable form.
 - Create a VM86 TSS for hardware that doesn't support unrestricted mode.
 - Subtract guest RAM for the identity page table and the VM86 TSS.
 - Split the creation of the unrestricted mode helper structures to a
   separate function.
 - Use preemption with paging_set_allocation.
 - Use get_order_from_bytes_floor.
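
To make the "floor" behaviour concrete, here is a hypothetical
reimplementation of such a helper. It assumes 4 KiB pages and does not
reproduce the body used in domain_build.c.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages, as on x86 */

/*
 * Largest order such that (1 << order) pages fit in 'size' bytes.
 * Rounds down, unlike a helper that returns the order needed to
 * cover the whole size.
 */
static unsigned int order_from_bytes_floor(uint64_t size)
{
    uint64_t pages = size >> PAGE_SHIFT;
    unsigned int order = 0;

    assert(pages > 0);   /* less than one page has no valid order */

    while (pages >>= 1)
        order++;

    return order;
}
```

Rounding down matters when carving RAM regions into allocations: the chunk
requested must never be larger than the space that is left.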

8 years agoxen/mm: introduce a function to map large chunks of MMIO
Roger Pau Monne [Fri, 30 Sep 2016 14:35:18 +0000 (16:35 +0200)]
xen/mm: introduce a function to map large chunks of MMIO

The current {un}map_mmio_regions implementation performs a maximum number
of loops before giving up and returning to the caller. This is an issue when
mapping large MMIO regions while building the hardware domain. To solve it,
introduce a wrapper around {un}map_mmio_regions that takes care of calling
process_pending_softirqs between consecutive {un}map_mmio_regions calls.
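
The shape of such a wrapper can be sketched with stubs. The return
convention assumed here (0 when done, negative on error, otherwise the
number of pages completed before stopping early) is an assumption of this
sketch, and every *_stub function is a stand-in rather than Xen code.

```c
#include <assert.h>

#define MAX_PAGES_PER_CALL 64UL  /* stand-in for the internal loop limit */

static unsigned long softirq_runs;  /* counts calls, for illustration only */

/* Stub standing in for process_pending_softirqs(). */
static void process_pending_softirqs_stub(void)
{
    softirq_runs++;
}

/*
 * Stub standing in for map_mmio_regions(): returns 0 when the whole
 * range was handled, or the number of pages completed when it stopped
 * early. A real implementation could also return a negative error.
 */
static int map_mmio_regions_stub(unsigned long pfn, unsigned long nr_pages)
{
    (void)pfn;

    if (nr_pages <= MAX_PAGES_PER_CALL)
        return 0;

    return (int)MAX_PAGES_PER_CALL;
}

/* Wrapper: keep calling until done, yielding between calls. */
static int map_large_mmio(unsigned long pfn, unsigned long nr_pages)
{
    for ( ; ; )
    {
        int rc = map_mmio_regions_stub(pfn, nr_pages);

        if (rc <= 0)
            return rc;   /* finished (0) or failed (< 0) */

        assert((unsigned long)rc <= nr_pages);
        pfn += (unsigned long)rc;
        nr_pages -= (unsigned long)rc;
        process_pending_softirqs_stub();
    }
}
```

The unbounded loop has only two exits, completion or error, which matches
the "unbounded for loop with break conditions" mentioned in the changelog.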

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v2:
 - Pull the code into a separate patch.
 - Use an unbounded for loop with break conditions.

8 years agoxen/x86: split Dom0 build into PV and PVHv2
Roger Pau Monne [Thu, 28 Jul 2016 15:14:19 +0000 (17:14 +0200)]
xen/x86: split Dom0 build into PV and PVHv2

Split the Dom0 builder into two different functions, one for PV (and classic
PVH), and another one for PVHv2. Introduce a new command line parameter
called 'dom0' that can be used to request the creation of a PVHv2 Dom0 by
setting the 'hvm' sub-option.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Fix coding style.
 - Introduce a new dom0 option that allows passing several parameters.
   Currently supported ones are hvm and shadow.

Changes since RFC:
 - Add documentation for the new command line option.
 - Simplify the logic in construct_dom0.

8 years agoxen/x86: allow the emulated APICs to be enabled for the hardware domain
Roger Pau Monne [Fri, 28 Oct 2016 09:16:41 +0000 (11:16 +0200)]
xen/x86: allow the emulated APICs to be enabled for the hardware domain

Allow the use of both the emulated local APIC and IO APIC for the hardware
domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Allow all PV guests to use the emulated PIT.

Changes since RFC:
 - Move the emulation flags check to a separate helper.

8 years agox86/vtd: fix mapping of RMRR regions
Roger Pau Monne [Fri, 28 Oct 2016 09:16:40 +0000 (11:16 +0200)]
x86/vtd: fix mapping of RMRR regions

Currently RMRR regions are only mapped to the hardware domain or to
non-translated domains that use an IOMMU. In order to fix this, make sure
set_identity_p2m_entry sets the appropriate IOMMU mappings, and that
clear_identity_p2m_entry also removes them.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen/x86: do the PCI scan unconditionally
Roger Pau Monne [Fri, 28 Oct 2016 09:16:40 +0000 (11:16 +0200)]
xen/x86: do the PCI scan unconditionally

Instead of being tied to the presence of an IOMMU. This avoids doing the
scan in two different places, and although it's only required for PVHv2
guests (that also require an IOMMU), it makes the code slightly easier to
follow.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Feng Wu <feng.wu@intel.com>
---
Changes since v2:
 - Expand the commit message.

8 years agoxen/x86: split the setup of Dom0 permissions to a function
Roger Pau Monne [Fri, 28 Oct 2016 09:16:40 +0000 (11:16 +0200)]
xen/x86: split the setup of Dom0 permissions to a function

So that it can also be used by the PVH-specific domain builder. This is just
code motion, it should not introduce any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes since v2:
 - Fix comment style.
 - Convert i to unsigned int.
 - Restore previous BUG_ON in case of failure (instead of panic).
 - Remove unneeded rc initializer.

8 years agox86/paging: introduce paging_set_allocation
Roger Pau Monne [Fri, 28 Oct 2016 09:16:40 +0000 (11:16 +0200)]
x86/paging: introduce paging_set_allocation

... and remove hap_set_alloc_for_pvh_dom0. While there also change the last
parameter of the {hap/sh}_set_allocation functions to be a boolean.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Tim Deegan <tim@xen.org>
---
Changes since v2:
 - Convert the preempt parameter into a bool.
 - Fix Dom0 builder comment to reflect that paging.mode should be correct
   before calling paging_set_allocation.

Changes since RFC:
 - Make paging_set_allocation preemptible.
 - Move comments.

8 years agoxen/x86: assert that local_events_need_delivery is not called by the idle domain
Roger Pau Monne [Fri, 28 Oct 2016 09:16:39 +0000 (11:16 +0200)]
xen/x86: assert that local_events_need_delivery is not called by the idle domain

It doesn't make sense since the idle domain doesn't receive any events. This
is relevant in order to be sure that hypercall_preempt_check is not called
by the idle domain, which would happen previously when calling
{hap/sh}_set_allocation during domain 0 creation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Expand commit message.

8 years agoxen/x86: allow calling {sh/hap}_set_allocation with the idle domain
Roger Pau Monne [Fri, 28 Oct 2016 09:16:39 +0000 (11:16 +0200)]
xen/x86: allow calling {sh/hap}_set_allocation with the idle domain

... and using the "preempted" parameter. Introduce a new helper that can
be used from both hypercall or idle vcpu context (i.e. during Dom0
creation) in order to check if preemption is needed. If such preemption
happens, the caller should then call process_pending_softirqs in order to
drain the pending softirqs, and then call {sh/hap}_set_allocation again to
continue with its execution.
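
The calling convention described above can be sketched as follows. All
functions are stubs written for this illustration: the preemption check
simply fires every 100 pages, and none of the names are taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned long allocated;    /* pages allocated so far */
static unsigned int  drain_calls;  /* times the caller drained softirqs */

/* Stub for the new helper: pretend work is pending every 100 pages. */
static bool preempt_check_stub(void)
{
    return allocated % 100 == 0;
}

/* Stub for process_pending_softirqs(). */
static void drain_softirqs_stub(void)
{
    drain_calls++;
}

/*
 * Stand-in for {sh/hap}_set_allocation(): grow the pool one page at a
 * time and bail out, setting *preempted, when the check fires.
 */
static int set_allocation_stub(unsigned long target, bool *preempted)
{
    assert(preempted != NULL);

    while (allocated < target)
    {
        allocated++;
        if (allocated < target && preempt_check_stub())
        {
            *preempted = true;
            return 0;
        }
    }

    return 0;
}

/* Caller side, as it might look when building Dom0 from idle vCPU context. */
static int set_allocation_from_builder(unsigned long target)
{
    bool preempted;
    int rc;

    do {
        preempted = false;
        rc = set_allocation_stub(target, &preempted);
        if (preempted)
            drain_softirqs_stub();
    } while (rc == 0 && preempted);

    return rc;
}
```

The callee keeps its progress in shared state, so calling it again after
draining simply resumes where it stopped.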

This allows us to call *_set_allocation() when building domain 0.

While there also convert hypercall_preempt_check to an inline function, and
document it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v3:
 - Introduce general_preempt_check.
 - Convert hypercall_preempt_check to an inline function.

Changes since v2:
 - Fix commit message.

8 years agoxen/x86: fix return value of *_set_allocation functions
Roger Pau Monne [Fri, 28 Oct 2016 09:16:39 +0000 (11:16 +0200)]
xen/x86: fix return value of *_set_allocation functions

Return should be an int.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Tim Deegan <tim@xen.org>
---
Changes since v2:
 - Also fix the callers to treat the return value as an int.
 - Don't convert the pages parameter to unsigned long.

8 years agoxen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests
Roger Pau Monne [Fri, 28 Oct 2016 09:16:39 +0000 (11:16 +0200)]
xen/x86: remove XENFEAT_hvm_pirqs for PVHv2 guests

PVHv2 guests, unlike HVM guests, won't have the option to route interrupts
from physical or emulated devices over event channels using PIRQs. This
applies to both DomU and Dom0 PVHv2 guests.

Introduce a new XEN_X86_EMU_USE_PIRQ to notify Xen whether a HVM guest can
route physical interrupts (even from emulated devices) over event channels,
and is thus allowed to use some of the PHYSDEV ops.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v2:
 - Change local variable name to currd instead of d.
 - Use currd where it makes sense.

8 years agoxen: rtds: Update last_start whenever cur_budget is updated
Meng Xu [Wed, 26 Oct 2016 19:06:29 +0000 (15:06 -0400)]
xen: rtds: Update last_start whenever cur_budget is updated

Make budget accounting code more consistent by making sure the values
used to compute how much budget has been consumed are updated together.

This makes code resilient to calling burn_budget() from more than just
one place -- in case we will need to do that -- without risking subtle
bugs.

No functional changes are intended.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen:rtds: Fix bug in budget accounting
Meng Xu [Wed, 26 Oct 2016 19:06:06 +0000 (15:06 -0400)]
xen:rtds: Fix bug in budget accounting

Bug scenario:
repl_timer_handler() may be called before rt_schedule() for a VCPU.
This situation may happen in two scenarios:
(1) The VCPU misses its deadline because the system is oversubscribed. For
    example, the sum of the VCPUs' utilization on a core is larger than one.
(2) The VCPU has budget = period, which causes the timers for
    rt_schedule() and repl_timer_handler() to fire at the same time.
When the situation happens, it causes the following incorrect behavior:
repl_timer_handler() will update the VCPU period and deadline.
If the VCPU is still the highest priority one, even with the new deadline,
it will continue to run, but with new period and deadline.
Since the budget enforcement timer for the previous period is still armed,
rt_schedule() will still be called in the new period and enforce the budget
for the previous period.
The current burn_budget() will deduct the time spent in previous period from
the budget in current period, which is incorrect.

Fix:
We keep last_start always within the current period for a VCPU, so that
we only deduct the time spent in the current period from the VCPU budget.
We always update last_start whenever we update cur_deadline for a VCPU.
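
A small worked example may help. The sketch below is a heavily simplified
model of the accounting, using the field names from this message; the
'fix' flag is hypothetical and only exists to show the behavior with and
without the change. It is not the scheduler code.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* time in arbitrary units for this sketch */

/* Simplified per-VCPU state; field names follow the commit message. */
struct rt_vcpu {
    s_time_t period;        /* length of one period */
    s_time_t budget;        /* budget granted per period */
    s_time_t cur_budget;    /* budget left in the current period */
    s_time_t cur_deadline;  /* end of the current period */
    s_time_t last_start;    /* point from which consumption is measured */
};

/* Replenish: move to the period that contains 'now'. */
static void rt_update_deadline(struct rt_vcpu *v, s_time_t now, int fix)
{
    assert(v->period > 0);

    while (v->cur_deadline <= now)
        v->cur_deadline += v->period;
    v->cur_budget = v->budget;

    if (fix)
        v->last_start = now;   /* the fix: never charge time before 'now' */
}

/* Charge the time run since last_start against the current budget. */
static void burn_budget(struct rt_vcpu *v, s_time_t now)
{
    s_time_t delta = now - v->last_start;

    v->cur_budget -= delta;
    if (v->cur_budget < 0)
        v->cur_budget = 0;
    v->last_start = now;
}

/* VCPU started running at t=8 in period [0,10); return the budget left. */
static s_time_t scenario(int fix)
{
    struct rt_vcpu v = { .period = 10, .budget = 4, .cur_budget = 4,
                         .cur_deadline = 10, .last_start = 8 };

    rt_update_deadline(&v, 10, fix);  /* replenishment fires first, at t=10 */
    burn_budget(&v, 11);              /* scheduler runs later, at t=11 */

    return v.cur_budget;
}
```

Without the fix the VCPU is charged 3 units (t=8 to t=11) against the new
period's budget even though only 1 unit was spent in that period.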

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reported-by: Dagaen Golomb <dgolomb@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoRevert "keyhandler: rework process of nonirq keyhandler"
Jan Beulich [Wed, 26 Oct 2016 14:13:21 +0000 (16:13 +0200)]
Revert "keyhandler: rework process of nonirq keyhandler"

This reverts commit 610b4eda2ce2b87cccbc8f61bdec01052e54fc66.
It's not useful without ed7e33747d, which got reverted already.

8 years agox86/emul: Move CPUID Faulting fault generation into the emulator
Andrew Cooper [Wed, 26 Oct 2016 11:06:44 +0000 (12:06 +0100)]
x86/emul: Move CPUID Faulting fault generation into the emulator

In hindsight, this is a better position for it, as it avoids opencoding
hvmemul_inject_hw_exception() in hvmemul_cpuid(), and reduces the requirements
on other ops->cpuid() hooks wanting to implement cpuid faulting in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Correct the decoding of SReg3 operands
Andrew Cooper [Fri, 23 Sep 2016 13:48:27 +0000 (14:48 +0100)]
x86/emul: Correct the decoding of SReg3 operands

REX.R is ignored when considering segment register operands, and needs masking
out first.

While fixing this, reorder the user segments in x86_segment to match SReg3
encoding.  This avoids needing a translation table between hardware ordering
and Xen's ordering.
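
A minimal sketch of the masking, assuming the decoder has already merged
REX.R into bit 3 of the register number as it would for general purpose
registers. The enum and function names are hypothetical; only the SReg3
encoding order (ES, CS, SS, DS, FS, GS) comes from the architecture.

```c
#include <assert.h>

/* User segments listed in SReg3 encoding order. */
enum seg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_NONE };

/*
 * 'modrm_reg' is the ModRM.reg field already extended with REX.R as
 * bit 3, the way a decoder would prepare it for general registers.
 */
static enum seg decode_sreg3(unsigned int modrm_reg)
{
    unsigned int sreg;

    assert(modrm_reg <= 15);

    sreg = modrm_reg & 7;   /* REX.R is ignored for segment operands */

    /* Encodings 6 and 7 do not name a segment register. */
    return sreg < (unsigned int)SEG_NONE ? (enum seg)sreg : SEG_NONE;
}
```

Because the enum follows the hardware order, the masked value can be used
directly, with no lookup table in between.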

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Use explicit __attribute__((__packed__)) rather than __packed
Andrew Cooper [Tue, 25 Oct 2016 17:46:39 +0000 (18:46 +0100)]
x86/emul: Use explicit __attribute__((__packed__)) rather than __packed

x86_emulate.h is included by the userspace test harness.  Avoid using
constructs which don't come from standard header files.

Reposition the test harness's inclusion of x86_emulate.h to avoid relying on
any definitions intended for use by x86_emulate.c alone.
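
For illustration, this is what the spelled-out attribute looks like in a
header that has to build both inside the hypervisor and in a plain
userspace program. The struct is hypothetical and the sketch assumes GCC or
Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Spelled-out attribute: needs no project-specific macro definition. */
struct __attribute__((__packed__)) packed_pair {
    uint8_t  tag;
    uint32_t value;   /* would normally be aligned to 4 bytes */
};

/* Confirm that no padding was inserted between the two members. */
static void check_layout(void)
{
    assert(sizeof(struct packed_pair) == 5);
    assert(offsetof(struct packed_pair, value) == 1);
}
```

A shorthand such as __packed only works where some header defines it, which
is not the case for a standalone test harness.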

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen: rtds: always clear the flag when replenishing a depleted vcpu
Meng Xu [Sat, 22 Oct 2016 02:12:02 +0000 (22:12 -0400)]
xen: rtds: always clear the flag when replenishing a depleted vcpu

We should clear the __RTDS_depleted bit once a VCPU budget is replenished.
Because repl_timer_handler may be called after rt_schedule
but before rt_context_saved, the VCPU may be neither on a CPU nor on a queue
while it is in the middle of a context switch.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: remove wrong statement about bug in xenstore
Juergen Gross [Mon, 24 Oct 2016 11:27:17 +0000 (13:27 +0200)]
docs: remove wrong statement about bug in xenstore

docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Remove that
statement.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/oxenstored: Avoid allocating invalid transaction ids
Andrew Cooper [Wed, 26 Oct 2016 09:34:21 +0000 (10:34 +0100)]
tools/oxenstored: Avoid allocating invalid transaction ids

The transaction id of 0 is reserved, meaning "not in a transaction".  It is up
to the xenstored server to allocate transaction ids.  While oxenstored starts
its ids at 1, insufficient care is taken with truncation cases.

A 32bit oxenstored has an int with 31 bits of width, meaning that the
transaction id will wrap around to 0 after 2 billion transactions.

A 64bit oxenstored has an int with 63 bits of width, meaning that once 4
billion transactions are used, the allocated id will be truncated when written
into the uint32_t field in the ring.  This causes the client to reply with the
truncated id, breaking any further attempt to use any transactions.

Limit all transaction ids to the range between 1 and 0x7ffffffe.  This is the
best which can be done without making oxenstored depend on Stdint or Cstruct,
yet still work for 32bit builds.

Also check that the proposed new transaction id isn't currently in use.  For
the first 2 billion transactions there is no chance of a collision, and after
that, the chance is at most 20 (the default open transaction quota) in 2
billion.
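
The allocation policy can be sketched as below. oxenstored itself is
written in OCaml, so this C version is only a model of the rule (ids stay
within 1..0x7ffffffe and ids of open transactions are skipped); the
structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_ID_MAX   0x7ffffffeu  /* highest id handed out */
#define MAX_OPEN_TX 20           /* default open transaction quota */

/* Hypothetical per-connection state. */
struct conn {
    uint32_t next_id;            /* candidate for the next transaction */
    uint32_t open[MAX_OPEN_TX];  /* ids of open transactions, 0 = free slot */
};

static bool id_in_use(const struct conn *c, uint32_t id)
{
    for (int i = 0; i < MAX_OPEN_TX; i++)
        if (c->open[i] == id)
            return true;

    return false;
}

/* Return a fresh id in [1, TX_ID_MAX] that is not currently open. */
static uint32_t alloc_tx_id(struct conn *c)
{
    uint32_t id;

    assert(c != NULL);

    do {
        if (c->next_id < 1 || c->next_id > TX_ID_MAX)
            c->next_id = 1;   /* wrap, skipping the reserved id 0 */
        id = c->next_id++;
    } while (id_in_use(c, id));

    return id;
}
```

Since at most MAX_OPEN_TX ids can be in use at once, the loop always finds
a free id after a handful of iterations.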

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/configure: fix pkg-config install path for FreeBSD
Roger Pau Monne [Tue, 25 Oct 2016 09:53:28 +0000 (11:53 +0200)]
tools/configure: fix pkg-config install path for FreeBSD

pkg-config from FreeBSD ports doesn't have ${prefix}/share/pkgconfig in the
default search path; fix this by having a PKG_INSTALLDIR variable that can
be changed on a per-OS basis.

It would be best to use PKG_INSTALLDIR as defined by the pkg.m4 macro, but
sadly this also reports a wrong value on FreeBSD (${libdir}/pkgconfig, which
expands to /usr/local/lib/pkgconfig by default, and is also _not_ part of
the default pkg-config search path).

This patch should not change the behavior for Linux installs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alexander Nusov <alexander.nusov@nfvexpress.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoUpdate QEMU_UPSTREAM_REVISION
Ian Jackson [Wed, 26 Oct 2016 11:06:17 +0000 (12:06 +0100)]
Update QEMU_UPSTREAM_REVISION

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agolibacpi: require ACPI_BUILD_DIR to be set
Wei Liu [Fri, 14 Oct 2016 17:02:31 +0000 (18:02 +0100)]
libacpi: require ACPI_BUILD_DIR to be set

It's better to have an explicit error than a build failure returned by
gcc.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86: MISALIGNSSE feature depends on SSE
Jan Beulich [Mon, 24 Oct 2016 15:34:17 +0000 (17:34 +0200)]
x86: MISALIGNSSE feature depends on SSE

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86emul: fix XOP decode
Jan Beulich [Mon, 24 Oct 2016 15:33:30 +0000 (17:33 +0200)]
x86emul: fix XOP decode

Commit f09902c456 ("x86emul: add XOP decoding") ended up overwriting b
prior to the last use of its previously stored value. Slightly defer
fetching the main opcode byte.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: initialise nr_dom_vcpus to fix 4a6070ea9
Wei Liu [Mon, 24 Oct 2016 10:11:15 +0000 (11:11 +0100)]
libxl: initialise nr_dom_vcpus to fix 4a6070ea9

Clang complains nr_dom_vcpus may be used uninitialised after
4a6070ea9.

The real issue is vinfo can be NULL and nr_dom_vcpus remains
uninitialised if previous call fails.

Initialise nr_dom_vcpus to 0 at the beginning of the loop to fix the
issue.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/x86: Fixup misc stale issues
Andrew Cooper [Sat, 1 Oct 2016 18:36:12 +0000 (18:36 +0000)]
xen/x86: Fixup misc stale issues

 * Dom0 does now have an arch_config passed.
 * hypercall() and smp_alloc_memory() no longer exist.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/emul: Correctly annotate all push/pop %sreg instructions
Andrew Cooper [Wed, 19 Oct 2016 16:30:36 +0000 (17:30 +0100)]
x86/emul: Correctly annotate all push/pop %sreg instructions

c/s 373923ed9c2 "x86emul: fix pushing of selector registers" redirected
all push %sreg instructions into the general push path.  However, this
ends up hitting the assertion at the head of the push path.

Annotate all push and pop %sreg instructions as Mov, indicating that
they do not read the destination operand.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools: Handle existing link to acpi directory
Boris Ostrovsky [Sun, 23 Oct 2016 23:09:19 +0000 (19:09 -0400)]
tools: Handle existing link to acpi directory

The link to acpi include directory is not removed by Makefile's 'clean'
target. This can lead to make failure when making xen/.dir target if
we try to create the link again.

We can prevent this failure by (1) removing acpi link when cleaning up
and (2) adding '-f' option to 'ln' (just like we do for other targets).

We should also add tools/include/acpi link to .gitignore.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoRevert "timer: process softirq during dumping timer info"
Wei Liu [Fri, 21 Oct 2016 16:51:59 +0000 (17:51 +0100)]
Revert "timer: process softirq during dumping timer info"

This reverts commit ed7e33747da83ce805c00cd457e71075e34f0854.

Assertion is triggered:
(XEN) Assertion '!in_irq() && local_irq_is_enabled()' failed at softirq.c:57

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: avoid considering pCPUs outside of the cpupool during NUMA placement
Dario Faggioli [Fri, 21 Oct 2016 13:49:30 +0000 (15:49 +0200)]
libxl: avoid considering pCPUs outside of the cpupool during NUMA placement

During NUMA automatic placement, the information
of how many vCPUs can run on what NUMA nodes is used,
in order to spread the load as evenly as possible.

Such information is derived from vCPU hard and soft
affinity, but that is not enough. In fact, affinity
can be set to be a superset of the pCPUs that belong
to the cpupool in which a domain is but, of course,
the domain will never run on pCPUs outside of its
cpupool.

Take this into account in the placement algorithm.
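
As an illustration only (a minimal sketch, not the libxl implementation): if
CPU sets are modelled as 64-bit masks, the pCPUs a vCPU can really use are the
intersection of its affinity with the cpupool. The helper names
effective_cpus() and vcpu_can_use_node() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: one bit per pCPU, at most 64 pCPUs for simplicity. */
uint64_t effective_cpus(uint64_t affinity, uint64_t cpupool_cpus)
{
    /* pCPUs outside the cpupool can never be used, whatever the affinity says. */
    return affinity & cpupool_cpus;
}

/* Hypothetical helper: can a vCPU run on any pCPU of the given NUMA node? */
bool vcpu_can_use_node(uint64_t affinity, uint64_t cpupool_cpus,
                       uint64_t node_cpus)
{
    return (effective_cpus(affinity, cpupool_cpus) & node_cpus) != 0;
}
```

With an affinity covering pCPUs 0-7 and a cpupool holding only pCPUs 0-3, a
node made of pCPUs 4-7 would not be counted for that vCPU.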

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs:RTDS: Correct mistakes in feature doc
Meng Xu [Wed, 19 Oct 2016 14:48:39 +0000 (10:48 -0400)]
docs:RTDS: Correct mistakes in feature doc

Correct the mistakes in the example command.
Correct a simple typo.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agovscsiif.h: replace PAGE_SIZE with VSCSIIF_PAGE_SIZE
Stefano Stabellini [Wed, 19 Oct 2016 19:22:35 +0000 (12:22 -0700)]
vscsiif.h: replace PAGE_SIZE with VSCSIIF_PAGE_SIZE

Do not reference PAGE_SIZE directly: it could be undefined, or it could
have different values in the frontend or in the backend.

Define VSCSIIF_PAGE_SIZE as 4096, assuming all users of vscsiif.h have
4K page granularity. Replace PAGE_SIZE with VSCSIIF_PAGE_SIZE.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agousbif.h: replace PAGE_SIZE with USBIF_RING_SIZE
Stefano Stabellini [Wed, 19 Oct 2016 19:22:34 +0000 (12:22 -0700)]
usbif.h: replace PAGE_SIZE with USBIF_RING_SIZE

Do not reference PAGE_SIZE directly: it could be undefined, or it could
have different values in the frontend or in the backend.

Define USBIF_RING_SIZE as 4096, assuming all users of usbif.h have 4K
page granularity. Replace PAGE_SIZE with USBIF_RING_SIZE.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoaltp2m: don't attempt to unshare pages during change_altp2m_gfn op
Tamas K Lengyel [Fri, 14 Oct 2016 00:00:47 +0000 (18:00 -0600)]
altp2m: don't attempt to unshare pages during change_altp2m_gfn op

Attempting to change gfn mappings with altp2m on a memory shared page results
in a lock-order violation (mm locking order violation: 282 > 254), which
crashes the hypervisor. Don't attempt to automatically unshare such pages and
just fall back to failing the op if the page type is not correct.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/Intel: virtualize support for cpuid faulting
Kyle Huey [Thu, 20 Oct 2016 13:44:28 +0000 (06:44 -0700)]
x86/Intel: virtualize support for cpuid faulting

On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
faulting state in vmx_do_cpuid and hvmemul_cpuid. A new function,
hvm_check_cpuid_fault will check if cpuid faulting is enabled and the CPL > 0.
When it returns true, the cpuid handling functions will inject a GP(0). Notably
explicit hardware support for faulting on cpuid is not necessary to emulate
support for an HVM guest.

On PV guests, hardware support is required so that userspace cpuid will trap
to Xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
aren't the control domain, see the comment in intel_ctxt_switch_levelling).
Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
do_general_protection). Once there we simply decline to emulate cpuid if the
CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
handle.
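
The condition described above can be sketched as follows. This is a
simplified model, not the body of hvm_check_cpuid_fault; the name
cpuid_should_fault() and its parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the check described above: fault only when the
 * guest has enabled cpuid faulting and the instruction runs at CPL > 0.
 */
bool cpuid_should_fault(bool faulting_enabled, unsigned int cpl)
{
    return faulting_enabled && cpl > 0;
}
```

When this returns true, the HVM paths would inject a GP(0), while the PV path
would decline to emulate and leave the GP(0) to the guest kernel.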

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere
Kyle Huey [Thu, 20 Oct 2016 13:44:27 +0000 (06:44 -0700)]
x86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere

While we're here, use bool instead of bool_t.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoConfig.mk: use non-debug build for 4.8
Wei Liu [Thu, 20 Oct 2016 13:00:47 +0000 (14:00 +0100)]
Config.mk: use non-debug build for 4.8

Set debug ?= n in preparation for late RCs and eventual release.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/svm: Drop adjustment of X86_FEATURE_APIC
Andrew Cooper [Thu, 1 Sep 2016 09:38:27 +0000 (10:38 +0100)]
x86/svm: Drop adjustment of X86_FEATURE_APIC

The common hvm_cpuid() code already does this.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself
He Chen [Wed, 19 Oct 2016 08:03:24 +0000 (16:03 +0800)]
xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself

SMEP/SMAP is a security feature that prevents the kernel from
unintentionally executing or accessing user addresses; any such access
leads to a page fault.

In earlier code, SMEP/SMAP is enabled (in CR4) for both Xen and HVM
guests. With the SMEP/SMAP bits set in Xen's CR4, the checks are also
enforced for 32-bit PV guests, which then suffer unexpected SMEP/SMAP
page faults when the guest kernel attempts to access user addresses,
even though SMEP/SMAP is disabled for PV guests.

This patch introduces a new boot option value "hvm" for "sm{e,a}p",
which disables SMEP/SMAP for the Xen hypervisor while enabling them for
HVM guests. In this way, 32-bit PV guests will not suffer from the
SMEP/SMAP issue. Users can choose whether to enable SMEP/SMAP for Xen
itself, especially when they are going to run 32-bit PV guests.

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
[Fixed up command line docs]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/vmx: Reduce the verbosity of the vmentry failure error reporting
Andrew Cooper [Thu, 13 Oct 2016 11:12:20 +0000 (12:12 +0100)]
x86/vmx: Reduce the verbosity of the vmentry failure error reporting

Identify the affected vcpu at the start of the message.  While tweaking this
area, add extra newlines between cases.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/vmx: Print the problematic MSR if a vmentry fails
Andrew Cooper [Thu, 13 Oct 2016 10:46:58 +0000 (11:46 +0100)]
x86/vmx: Print the problematic MSR if a vmentry fails

Sample error looks like:

  (XEN) Failed vm entry (exit reason 0x80000022) caused by MSR loading (entry 13).
  (XEN)   msr 0000068a val 1fff800000102af0 (mbz 0)
  (XEN) ************* VMCS Area **************

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: remove explicit rule for libxl_arm_acpi.o
Wei Liu [Tue, 18 Oct 2016 12:43:07 +0000 (13:43 +0100)]
libxl: remove explicit rule for libxl_arm_acpi.o

After 9c635883 ("ARM64: fix libxl build, do not include
../../xen/include") there is nothing special needed to build
libxl_arm_acpi.o. Remove the explicit rule, use predefined one.

Build tested on ARM64.

Suggested-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoARM64: fix libxl build, do not include ../../xen/include
Stefano Stabellini [Tue, 18 Oct 2016 11:32:50 +0000 (12:32 +0100)]
ARM64: fix libxl build, do not include ../../xen/include

Do not include ../../xen/include/ to build libxl_arm_acpi.c: header
files clashing against default headers under /usr/include are present in
that directory.

Link only $(XEN_ROOT)/xen/include/acpi under tools/include instead.

Build tested on ARM64 and x86_64.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/xl: Use %u for uint32_t domids
Ronald Rojas [Mon, 17 Oct 2016 00:16:32 +0000 (20:16 -0400)]
tools/xl: Use %u for uint32_t domids

domid is normally represented by uint32_t, but many format
strings in xl_cmdimpl.c use %d when printing, which is signed.
Use %u instead to print the unsigned integer domid.
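
A minimal sketch of the idea, not taken from xl_cmdimpl.c: print a uint32_t
domid with an unsigned conversion. The helper format_domid() is hypothetical,
and PRIu32 is used here as the strictly portable spelling of the unsigned
conversion for uint32_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a domid into buf using an unsigned conversion. */
int format_domid(char *buf, size_t len, uint32_t domid)
{
    /* PRIu32 expands to the matching unsigned conversion for uint32_t. */
    return snprintf(buf, len, "%" PRIu32, domid);
}
```

A signed conversion would show values above INT_MAX as negative numbers on
common platforms, which is what the unsigned conversion avoids.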

Signed-off-by: Ronald Rojas <ronladred@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibacpi: add back the "G" in "GNU" in licence header
Wei Liu [Fri, 14 Oct 2016 17:02:32 +0000 (18:02 +0100)]
libacpi: add back the "G" in "GNU" in licence header

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agolibacpi: fix arm64 build
Wei Liu [Fri, 14 Oct 2016 17:02:30 +0000 (18:02 +0100)]
libacpi: fix arm64 build

The arm64 build for libacpi was broken due to two reasons:

1. ACPI_BUILD_DIR was appended twice to dsdt_anycpu_arm.c.
2. The inclusion of firmware/Rules.mk overrode XEN_TARGET_ARCH, which
   made CONFIG_ARM disappear.

Fix those by:

1. Correctly generate full path for dsdt_anycpu_arm.c.
2. Include tools/Rules.mk instead, because libacpi/Makefile doesn't rely
   on settings in firmware/Rules.mk.

While at it, use CONFIG_ARM_64 instead of CONFIG_ARM as it is more
accurate.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agodocs: RTDS feature document.
Dario Faggioli [Fri, 14 Oct 2016 10:02:25 +0000 (11:02 +0100)]
docs: RTDS feature document.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: Credit2 feature document.
Dario Faggioli [Fri, 14 Oct 2016 10:01:40 +0000 (11:01 +0100)]
docs: Credit2 feature document.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodocs: Credit1 feature document.
Dario Faggioli [Fri, 14 Oct 2016 10:00:55 +0000 (11:00 +0100)]
docs: Credit1 feature document.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/Viridian: don't depend on undefined register state
Jan Beulich [Fri, 14 Oct 2016 12:09:42 +0000 (14:09 +0200)]
x86/Viridian: don't depend on undefined register state

The high halves of all GPRs are undefined in 32-bit and compat modes,
and the dependency is being obfuscated by our structure field names not
matching architectural register names (it was actually while putting
together a patch to correct this when I noticed the issue here).

For consistency also use the architecturally correct names on the
output side.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
8 years agox86emul: fix pushing of selector registers
Jan Beulich [Fri, 14 Oct 2016 12:09:16 +0000 (14:09 +0200)]
x86emul: fix pushing of selector registers

Both explicit PUSH and far CALL currently push unrelated data (the
segment attributes word) in the high half (attributes and limit in the
64-bit case in the high 48 bits) instead of zero. To avoid having to
apply this and further changes in multiple places, also fold the two
(respectively) far call/jmp instances into one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: honor MXCSR.MM
Jan Beulich [Fri, 14 Oct 2016 12:08:29 +0000 (14:08 +0200)]
x86emul: honor MXCSR.MM

Commit 6dc9ac9f52 ("x86emul: check alignment of SSE and AVX memory
operands") didn't consider a specific AMD mode: Mis-alignment #GP
faults can be masked on some of their hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/hvm: Clobber %cs.L when LME becomes set
Andrew Cooper [Thu, 13 Oct 2016 12:16:47 +0000 (12:16 +0000)]
x86/hvm: Clobber %cs.L when LME becomes set

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/hvm: Correct the position of the %cs L/D checks
Andrew Cooper [Thu, 13 Oct 2016 10:27:28 +0000 (11:27 +0100)]
x86/hvm: Correct the position of the %cs L/D checks

Contrary to the description in the software manuals, in Long Mode, attempts to
load %cs check that D is not set in combination with L before the present flag
is checked.

This can be observed because the L/D check fails with #GP before the presence
check fails with #NP.

This change partially reverts c/s 78ff18c90 "x86: defer not-present segment
checks", taking it back to how it was in the v1 submission.
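
The ordering described above can be modelled as below. This is a sketch
covering only the two checks in question, with every other descriptor check
omitted; cs_load_check() and the EXC_* names are hypothetical.

```c
#include <stdbool.h>

#define EXC_NONE (-1)
#define EXC_NP   11   /* #NP, segment not present */
#define EXC_GP   13   /* #GP, general protection */

/*
 * Hypothetical model of the two checks discussed above when loading %cs in
 * long mode; all other descriptor checks are omitted.
 */
int cs_load_check(bool present, bool l, bool d)
{
    if ( l && d )      /* L and D both set: #GP, checked first */
        return EXC_GP;
    if ( !present )    /* only then the present flag: #NP */
        return EXC_NP;
    return EXC_NONE;
}
```

A not-present descriptor with both L and D set therefore yields #GP rather
than #NP, which is the observable behaviour the message refers to.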

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agotools: check liblzma in configure for rombios
Wei Liu [Thu, 13 Oct 2016 11:03:17 +0000 (12:03 +0100)]
tools: check liblzma in configure for rombios

We upgraded ipxe in 38ab99b2 ("ipxe: update to new commit"). That
version of ipxe requires liblzma to build.

Check that in configure and document this in README.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agox86emul: correct {,F}CMOV and F{,U}COMI{,P} emulation
Jan Beulich [Thu, 13 Oct 2016 11:07:25 +0000 (13:07 +0200)]
x86emul: correct {,F}CMOV and F{,U}COMI{,P} emulation

The FPU ones need to be executed with guest EFLAGS.{C,P,Z}F in context.

We also can't exclude someone wanting to hide the feature from (32-bit)
guests.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agokeyhandler: rework process of nonirq keyhandler
Lan Tianyu [Thu, 13 Oct 2016 11:06:28 +0000 (13:06 +0200)]
keyhandler: rework process of nonirq keyhandler

A keyhandler (e.g. dump_timerq()) may run for a long time in the serial
port driver's timer handler on a large machine with many physical CPUs
when the serial port driver works in poll mode (via the exception
mechanism).

If a timer handler runs for a long time, it blocks nmi_timer_fn() from
feeding the NMI watchdog and causes a Xen hypervisor panic. Inserting
process_pending_softirqs() in the timer handler will not help: when a
timer interrupt arrives, the timer subsystem calls all expired timer
handlers before programming the next timer interrupt, so no timer
interrupt arrives to trigger the timer softirq while a timer handler is
running.

Fix the issue by running nonirq keyhandlers in a tasklet when a debug
key is received from the serial port.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoipxe: update to newer commit
Wei Liu [Mon, 10 Oct 2016 12:50:58 +0000 (13:50 +0100)]
ipxe: update to newer commit

The current commit in tree is rather old. It has come to a point that
cherry-picking commits from upstream isn't trivial anymore.

There is a long term plan to track ipxe upstream, but for the 4.8 release
we should just update ipxe to a newer commit (they are using a rolling
release model now).

Forward-port the one boot prompt patch that is still relevant and retire
the rest which are already in upstream.

Reported-by: Juergen Schinker <ba1020@homie.homelinux.net>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxen/arm: Disable the Cortex-a53-edac
Edgar E. Iglesias [Thu, 6 Oct 2016 16:36:31 +0000 (18:36 +0200)]
xen/arm: Disable the Cortex-a53-edac

Disable the Cortex-a53-edac. Xen does not yet handle reads/writes
to the implementation defined CPUMERRSR register.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/trace: Fix trace metadata page count calculation (revert fbf96e6)
George Dunlap [Fri, 30 Sep 2016 14:42:56 +0000 (15:42 +0100)]
xen/trace: Fix trace metadata page count calculation (revert fbf96e6)

Changeset fbf96e6, "xentrace: correct formula to calculate
t_info_pages", broke the trace metadata page count calculation, by
mistaking t_info_first_offset as denominated in bytes, when in fact it
is denominated in words (uint32_t).

Effectively revert that change, and put a comment there to reduce the
chance that someone will make that mistake in the future.
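
A sketch of the unit issue, under the assumption that the metadata consists
of a header of t_info_first_offset words followed by one uint32_t entry per
trace page per CPU. The helper t_info_pages_needed() and the PAGE_SIZE value
are assumptions made for this example, not the code in the tree.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Hypothetical helper: number of pages needed for the trace metadata.
 * first_offset_words is counted in uint32_t words, not bytes, so it is added
 * to the other word counts before the single conversion to bytes.
 */
unsigned int t_info_pages_needed(unsigned int first_offset_words,
                                 unsigned int nr_cpus,
                                 unsigned int pages_per_cpu)
{
    unsigned int words = first_offset_words + nr_cpus * pages_per_cpu;
    unsigned int bytes = words * sizeof(uint32_t);

    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up to whole pages */
}
```

Treating the offset as bytes would under-count the header by a factor of
sizeof(uint32_t) whenever the total sits near a page boundary.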

Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
8 years agoMakefile: fix (again) EFI part of "symbols: Generate an xen-sym.map
Konrad Rzeszutek Wilk [Mon, 10 Oct 2016 18:10:56 +0000 (11:10 -0700)]
Makefile: fix (again) EFI part of "symbols: Generate an xen-sym.map

This is a follow-up to commit d14fffcc6a7c054db9e337026a3c850152244ac4
"fix EFI part of "symbols: Generate an xen-sym.map" which fixed most of
the issues.

However we still have an issue - The file being installed (xen.efi.map)
does not exist in an ARM64 build (the xen.efi is linked against xen).

The fix can be done two ways:
 a) See if xen.efi.map exists and then copy it
 b) Or link xen.efi.map to xen-syms.map (similar to how xen.efi is linked
    against xen).

The patch chooses the former.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoKconfig: use tab instead of space
Wei Liu [Mon, 10 Oct 2016 09:40:30 +0000 (10:40 +0100)]
Kconfig: use tab instead of space

Previously in d6be2cfc ("xen: make clear gcov support limitation in
Kconfig") and db6c2264 ("xen: add a gcov Kconfig option"), space was
used to indent Kconfig text. Change that to use tab instead.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: defer not-present segment checks
Jan Beulich [Mon, 10 Oct 2016 10:16:49 +0000 (12:16 +0200)]
x86: defer not-present segment checks

Following on from commits 5602e74c60 ("x86emul: correct loading of
%ss") and bdb860d01c ("x86/HVM: correct segment register loading during
task switch") the point of the non-.present checks needs to be refined:
#NP (and its #SS companion), other than suggested by the various
instruction pages in Intel's SDM, gets checked for only after all type
and permission checks. The only checks getting done even later are the
long mode specific ones for system descriptors (which we don't support
yet) and 64-bit code segments (i.e. anything touching other than the
attribute byte).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: replace redundant MTRR MSR definitions
Jan Beulich [Mon, 10 Oct 2016 10:16:06 +0000 (12:16 +0200)]
x86: replace redundant MTRR MSR definitions

We really should have only one set of #define-s for them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/hvm: remove emulation context setting from hvmemul_cmpxchg()
Razvan Cojocaru [Fri, 7 Oct 2016 09:35:58 +0000 (11:35 +0200)]
x86/hvm: remove emulation context setting from hvmemul_cmpxchg()

hvmemul_cmpxchg() sets the read emulation context in p_new instead
of p_old, which is inconsistent (and wrong). Since p_old is
unused in any case and cmpxchg() semantics would be altered even
if it wasn't, remove the emulation context setting code.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
8 years agotimer: process softirq during dumping timer info
Lan Tianyu [Fri, 7 Oct 2016 09:35:26 +0000 (11:35 +0200)]
timer: process softirq during dumping timer info

Dumping timer info may run for a long time on a huge machine with
a lot of physical CPUs. To avoid triggering the NMI watchdog, add
process_pending_softirqs() in the loop of dumping timer info.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agox86emul: check for FPU availability
Jan Beulich [Wed, 5 Oct 2016 12:20:10 +0000 (14:20 +0200)]
x86emul: check for FPU availability

We can't exclude someone wanting to hide the FPU from guests.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
8 years agox86emul: deliver correct math exceptions
Jan Beulich [Wed, 5 Oct 2016 12:19:43 +0000 (14:19 +0200)]
x86emul: deliver correct math exceptions

#MF only applies to x87 instructions. SSE and AVX ones need #XM to be
raised instead, unless CR4.OSXMMEXCPT is clear, in which case #UD needs
to result. (But note that this is only a latent issue - we don't
emulate any instructions so far which could result in #XM.)
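
The rule above can be sketched as a small selector. The function
math_exception_vector() and the macro names are hypothetical; the vector
numbers and the CR4 bit position are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_UD 6    /* #UD, invalid opcode */
#define EXC_MF 16   /* #MF, x87 floating-point error */
#define EXC_XM 19   /* #XM, SIMD floating-point exception */

#define CR4_OSXMMEXCPT (1u << 10)

/* Hypothetical helper: pick the vector for a pending math exception. */
int math_exception_vector(bool is_x87, uint64_t cr4)
{
    if ( is_x87 )
        return EXC_MF;                          /* x87 instructions only */
    return (cr4 & CR4_OSXMMEXCPT) ? EXC_XM : EXC_UD;
}
```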

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86emul: honor guest CR4.OSFXSR and CR4.OSXSAVE
Jan Beulich [Wed, 5 Oct 2016 12:18:42 +0000 (14:18 +0200)]
x86emul: honor guest CR4.OSFXSR and CR4.OSXSAVE

These checks belong into the emulator instead of hvmemul_get_fpu().

The CR0.PE/EFLAGS.VM ones can actually just be ASSERT()ed, as decoding
should make it impossible to get into get_fpu() with them in the wrong
state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoFix to be error handled when 10ms delayed for cpu_on
casionwoo [Tue, 4 Oct 2016 11:04:08 +0000 (20:04 +0900)]
Fix to be error handled when 10ms delayed for cpu_on

The comment in the original code says "wait max 10 ms until cpu is on".
The original code is expected to print "CPU%d power enable failed" if the CPU is not on within 10 ms.
But the actual code does not reach the print even after waiting 10 ms (it actually waits 11 ms, not 10 ms),
because the comparison is as below:
"if ( timeout-- == 0 )"
So I modified the code to wait 10 ms and then print the error statement.
The tables below simulate the original code and the modified code.

Original code)

timeout          delayed time   timeout
(before while)   (mdelay(1))    (timeout--)
  10               1              9
   9               2              8
   8               3              7
   7               4              6
   6               5              5
   5               6              4
   4               7              3
   3               8              2
   2               9              1
   1              10              0
   0              11             -1

Modified code)

timeout          delayed time   timeout
(before while)   (mdelay(1))    (--timeout)
  10               1              9
   9               2              8
   8               3              7
   7               4              6
   6               5              5
   5               6              4
   4               7              3
   3               8              2
   2               9              1
   1              10              0
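
The same simulation can be written as code. This is a stand-alone sketch that
only counts the 1 ms delays; it is not the cpu_on code itself, and both
function names are hypothetical.

```c
/* Count 1 ms delays until the post-decrement test gives up. */
unsigned int delays_post_decrement(int timeout)
{
    unsigned int delayed_ms = 0;

    for ( ; ; )
    {
        delayed_ms++;            /* stands in for mdelay(1) */
        if ( timeout-- == 0 )    /* compares the value before decrementing */
            break;
    }
    return delayed_ms;
}

/* Count 1 ms delays until the pre-decrement test gives up (timeout > 0). */
unsigned int delays_pre_decrement(int timeout)
{
    unsigned int delayed_ms = 0;

    for ( ; ; )
    {
        delayed_ms++;            /* stands in for mdelay(1) */
        if ( --timeout == 0 )    /* compares the value after decrementing */
            break;
    }
    return delayed_ms;
}
```

Starting from timeout = 10, the first variant delays 11 times and the second
10 times, matching the last row of each table.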

Signed-off-by: JEUNGWOO, YOO <casionwoo@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoarm: fix build with gcc6
Jan Beulich [Tue, 4 Oct 2016 10:26:14 +0000 (04:26 -0600)]
arm: fix build with gcc6

Commit e170622f95 ("xen/arm: p2m: Re-implement p2m_set_mem_access using
p2m_{set,get}_entry") eliminated the only user of level_sizes[],
causing gcc6 to warn about the unused variable (as it's a const one
older gcc versions apparently don't care to emit a warning).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86emul: honor guest CR0.TS and CR0.EM
Jan Beulich [Tue, 4 Oct 2016 13:04:46 +0000 (14:04 +0100)]
x86emul: honor guest CR0.TS and CR0.EM

We must not emulate any instructions accessing respective registers
when either of these flags is set in the guest view of the register, or
else we may do so on data not belonging to the guest's current task.

Being architecturally required behavior, the logic gets placed in the
instruction emulator instead of hvmemul_get_fpu(). It should be noted,
though, that hvmemul_get_fpu() being the only current handler for the
get_fpu() callback, we don't have an active problem with CR4: Both
CR4.OSFXSR and CR4.OSXSAVE get handled as necessary by that function.
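
A minimal sketch of the gate described above. The helper
fpu_state_accessible() is hypothetical, and which exception to raise when it
returns false is deliberately left out; the bit positions are the
architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM (1u << 2)  /* emulation: guest runs without an x87 unit */
#define CR0_TS (1u << 3)  /* task switched: FPU state may be another task's */

/* Hypothetical helper: may the emulator touch FPU/SIMD register state? */
bool fpu_state_accessible(uint64_t guest_cr0)
{
    return !(guest_cr0 & (CR0_EM | CR0_TS));
}
```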

This is XSA-190.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoinit-xenstore-domain: remove an unused variable
Jan Beulich [Tue, 4 Oct 2016 10:27:07 +0000 (04:27 -0600)]
init-xenstore-domain: remove an unused variable

Introduced by commit 80dd5b401e ("tools: add --maxmem parameter to
init-xenstore-domain").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: Mark libxl_retrieve_domain_configuration as for external callers only
Ian Jackson [Tue, 4 Oct 2016 09:19:36 +0000 (10:19 +0100)]
libxl: Mark libxl_retrieve_domain_configuration as for external callers only

This function takes the userdata lock.  Incautious use inside libxl
can result in nested acquisition of that lock, and deadlock.

There is no good reason to use this function inside libxl, but it is a
superficially attractive option.  Make future regressions easier to
spot by marking the function for external use only.

Similar arguments apply for the application-facing userdata accessors,
so do those too.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: fix issues in 38cd0664
Wei Liu [Mon, 3 Oct 2016 14:46:02 +0000 (15:46 +0100)]
libxl: fix issues in 38cd0664

A few issues were introduced in 38cd0664 ("libxl/arm: Add the size of
ACPI tables to maxmem"):

1. d_config was not properly initialised and disposed of.
2. using libxl_retrieve_domain_configuration caused thread to
   deadlock itself.

Fix those issues by:

1. properly initialise and dispose of d_config.
2. switch to use libxl__get_domain_configuration.

Note that in theory we can refactor libxl_retrieve_domain_configuration
a bit to get a function without locking, but up until now the calculation of
extra memory only relies on static configuration, hence we use the
stored configuration only.

Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoXen 4.8.0-rc1 preparation
Ian Jackson [Mon, 3 Oct 2016 10:55:26 +0000 (11:55 +0100)]
Xen 4.8.0-rc1 preparation

* Change QEMU_UPSTREAM_REVISION, MINIOS_UPSTREAM_REVISION and
  QEMU_TRADITIONAL_REVISION to refer to the Xen 4.8.0-rc1 tags.

* Change README and xen/Makefile to refer to Xen 4.8.0-rc (note, the
  RC number is not included, so we do not have to update these again).

I reran autogen.sh as per the release checklist and this produced no
changes, as expected.  (Debian jessie i386.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotmem: Batch and squash XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_[FLAGS,NPAGES,UUID]
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 19:10:22 +0000 (15:10 -0400)]
tmem: Batch and squash XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_[FLAGS,NPAGES,UUID]
in one sub-call: XEN_SYSCTL_TMEM_OP_GET_POOLS.

These operations are used during the save process of migration.
Instead of doing 64 hypercalls let's do just one. We modify
the 'struct xen_tmem_client' structure (used in
XEN_SYSCTL_TMEM_OP_[GET|SET]_CLIENT_INFO) to have an extra field
'nr_pools'. Armed with that the code slurping up pages from the
hypervisor can allocate a big enough structure (struct tmem_pool_info)
to contain all the active pools. And then just iterate over each
one and save it in the stream.

We are also re-using one of the subcommands numbers for this,
as such the XEN_SYSCTL_INTERFACE_VERSION should be incremented
and that was done in the patch titled:
"tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE].."

In the xc_tmem_[save|restore] we also added proper memory handling
of the 'buf' and 'pools'. Because of the loops and to make it as
easy as possible to review we add a goto label and for almost
all error conditions jump to it.

The include for inttypes is required for the PRId64 macro to
work (which is needed to compile this code under 32-bit).

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agotmem/xc_tmem_control: Rename 'arg1' to 'len' and 'arg2' to arg.
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 19:10:01 +0000 (15:10 -0400)]
tmem/xc_tmem_control: Rename 'arg1' to 'len' and 'arg2' to arg.

That is what they are used for. Let's make it clearer.

Of all the various sub-commands, the only one that needed
semantic change is XEN_SYSCTL_TMEM_OP_SAVE_BEGIN. That in the
past used 'arg1', and now we are moving it to use 'arg'.
Since that code is only used during migration which is tied
to the toolstack it is OK to change it.

We should increment the XEN_SYSCTL_INTERFACE_VERSION because
of this, and that was fortunately done in the patch titled:
"tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE].."

While at it, also fix xc_tmem_control_oid to properly handle
the 'buf' and bounce it as appropriate.

Acked-by: Andrew cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agotmem: Unify XEN_SYSCTL_TMEM_OP_[[SAVE_[BEGIN|END]|RESTORE_BEGIN]
Konrad Rzeszutek Wilk [Mon, 26 Sep 2016 15:05:09 +0000 (11:05 -0400)]
tmem: Unify XEN_SYSCTL_TMEM_OP_[[SAVE_[BEGIN|END]|RESTORE_BEGIN]

return values. For success they used to be 1 ([SAVE,RESTORE]_BEGIN),
0 if guest did not have any tmem (but only for SAVE_BEGIN), and
-1 for any type of failure.

And SAVE_END (which you would think would mirror SAVE_BEGIN)
had 0 for success and -1 if the guest did not have any tmem enabled for it.

This is confusing. Now the code will return 0 if the operation was
successful. Various XEN_EXX values are returned if tmem is not enabled
or the operation could not be performed.

The xc_tmem.c code only needs one place to check - where we use
SAVE_BEGIN. The place where RESTORE_BEGIN is used will have errno
with the proper error value and return will be -1, so will still
fail properly.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agotmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE]..
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:53:01 +0000 (10:53 -0400)]
tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE]..

Specifically:

XEN_SYSCTL_TMEM_OP_SET_[WEIGHT,COMPRESS] are now done via:

 XEN_SYSCTL_TMEM_SET_CLIENT_INFO

and XEN_SYSCTL_TMEM_OP_SAVE_GET_[VERSION,MAXPOOLS,
CLIENT_WEIGHT, CLIENT_FLAGS] can now be retrieved via:

 XEN_SYSCTL_TMEM_GET_CLIENT_INFO

All this information is now in 'struct xen_tmem_client' and
that is what we pass around.

We also rev up the XEN_SYSCTL_INTERFACE_VERSION as we are
re-using the value number of the deleted ones (and henceforth
the information is retrieved differently).

On the toolstack, prior to this patch, the xc_tmem_control
would use the bounce buffer only when arg1 was set and the cmd
was to list. With the 'XEN_SYSCTL_TMEM_OP_SET_[WEIGHT|COMPRESS]'
that made sense as the 'arg1' would have the value. However
for the other ones (say XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_UUID)
the 'arg1' would be the length of the 'buf'. If this is
confusing, don't despair; the patch titled:
tmem/xc_tmem_control: Rename 'arg1' to 'len' and 'arg2' to arg.
takes care of that.

The astute reader of the toolstack code will discover that
we only used the bounce buffer for LIST, not for any other
subcommands that used 'buf'!?! Which means that the contents
of 'buf' would never be copied back to the caller's 'buf'!

The author is not sure how this could possibly work, perhaps Xen 4.1
(when this was introduced) was more relaxed about the bounce buffer
being enabled. Anyhow this fixes xc_tmem_control to do it for
any subcommand that has 'arg1'.
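
The bounce-buffer pattern referred to here can be sketched in plain C as
below. This is only a model of the copy-in/copy-back idea: it does not use
the real libxc hypercall buffer helpers, and example_hypercall and
example_control are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the hypercall: it writes its result into
 * the buffer it is handed. */
static int example_hypercall(char *hbuf, size_t len)
{
    assert(hbuf != NULL);
    memset(hbuf, 'x', len);
    return 0;
}

/* Bounce in both directions for any subcommand that passes a length,
 * not just for one specific command. */
static int example_control(char *buf, size_t len)
{
    char *bounce;
    int rc;

    if (buf == NULL || len == 0)
        return -1;

    bounce = malloc(len);
    if (bounce == NULL)
        return -1;

    memcpy(bounce, buf, len);            /* copy in */
    rc = example_hypercall(bounce, len);
    if (rc == 0)
        memcpy(buf, bounce, len);        /* copy back to the caller's buffer */
    free(bounce);
    return rc;
}
```

Without the copy-back step the caller's buffer would keep its old
contents, which is the problem described above.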

Lastly some of the checks in xc_tmem_[restore|save] are removed
as they can't ever be reached (not even sure how they could
have been reached in the original submission). One of them
is the check for the weight against -1 when in fact the
hypervisor would never have provided that value.

Now the checks are simple - as the hypercall always returns
->version and ->maxpools (which is mirroring how it was done
prior to this patch). But if one wants to check if a guest
has any tmem activity then the patch titled
"tmem: Batch and squash XEN_SYSCTL_TMEM_OP_SAVE_GET_POOL_
[FLAGS,NPAGES,UUID] in one sub-call: XEN_SYSCTL_TMEM_OP_GET_POOLS."
adds an ->nr_pools to check for that.

Also we add the check for ->version and ->maxpools and remove
the TODO.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years ago tmem/sysctl: Add union in struct xen_sysctl_tmem_op
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:50:32 +0000 (10:50 -0400)]
tmem/sysctl: Add union in struct xen_sysctl_tmem_op

No functional change. We do this to prepare for another
entry to be added in the union. See patch titled:
"tmem/libxc: Squash XEN_SYSCTL_TMEM_OP_[SET|SAVE]"
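
To illustrate why wrapping a member in a union is not a functional change,
here is a minimal sketch. Both structures are hypothetical simplifications
and do not reproduce the real struct xen_sysctl_tmem_op; the size check
assumes a typical ABI and a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" layout: the buffer handle is a plain member. */
struct example_op_before {
    uint32_t cmd;
    uint32_t len;
    uint64_t buf;
};

/* Hypothetical "after" layout: the same member wrapped in a union, so a
 * later patch can add an alternative member of the same size without
 * growing the structure. */
struct example_op_after {
    uint32_t cmd;
    uint32_t len;
    union {
        uint64_t buf;
    } u;
};

/* The wrapping alone is expected to leave the size untouched. */
static_assert(sizeof(struct example_op_before) ==
              sizeof(struct example_op_after),
              "wrapping a member in a union should not change the size");
```

Existing users only need to change how the member is spelled (u.buf in
this sketch instead of buf).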

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years ago tmem: Move client weight, frozen, live_migrating, and compress
Konrad Rzeszutek Wilk [Fri, 30 Sep 2016 14:10:42 +0000 (10:10 -0400)]
tmem: Move client weight, frozen, live_migrating, and compress

into their own structure. This paves the way to make only one hypercall
to retrieve/set this information instead of multiple ones.

Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years ago tmem: Delete deduplication (and tze) code.
Konrad Rzeszutek Wilk [Tue, 27 Sep 2016 13:40:22 +0000 (09:40 -0400)]
tmem: Delete deduplication (and tze) code.

Couple of reasons:
 - It can lead to security issues (see row-hammer, KSM and such
   attacks).
 - Code is quite complex.
 - Deduplication is good if the pages themselves are the same
   but that is hardly guaranteed.
 - We got some gains (if pages are deduped) but at the cost of
   making code less maintainable.
 - tze depends on deduplication code.

As such, deleting it.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>