Olaf Hering [Wed, 1 Oct 2014 16:41:25 +0000 (18:41 +0200)]
tools: remove private copies of includedir and libdir from libxenstat
They are wrong and unused.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:24 +0000 (18:41 +0200)]
Make XENFIRMWAREDIR a subdir of libexecdir
Put the firmware files below libexecdir. This is essentially just a new
name for the existing path. It has the benefit that it can be configured
via --libexecdir= if required.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:23 +0000 (18:41 +0200)]
Add configure --with-sysconfig-leaf-dir=SUBDIR to set CONFIG_LEAF_DIR
Set CONFIG_LEAF_DIR with configure to give control if needed. The
check for the correct value if the option is not specified is tricky.
Since other packages (such as grub2) started to populate also
/etc/default/ a given system may have both directories.
Use "default" only if /etc/sysconfig does not exist. "sysconfig"
remains the default.
Move the variable from StdGNU.mk to Linux.mk because that's the only
place where it is used.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
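The selection rule described in the commit above can be sketched in C. This is only an illustration of the decision logic: the real check lives in configure, the function name pick_leaf_dir() is hypothetical, and the extra test that /etc/default actually exists is an assumption not stated in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the rule in the commit message:
 *  - an explicit --with-sysconfig-leaf-dir=SUBDIR value always wins;
 *  - "default" is used only if /etc/sysconfig does not exist
 *    (and, as an assumption here, /etc/default does);
 *  - otherwise "sysconfig" remains the default.
 */
static const char *pick_leaf_dir(const char *override,
                                 bool sysconfig_exists,
                                 bool default_exists)
{
    if (override != NULL && override[0] != '\0')
        return override;
    if (!sysconfig_exists && default_exists)
        return "default";
    return "sysconfig";
}
```

On a system that has both directories, pick_leaf_dir(NULL, true, true) yields "sysconfig", which is the case the message calls out as tricky.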
Olaf Hering [Wed, 1 Oct 2014 16:41:22 +0000 (18:41 +0200)]
Move variable to set bash_completion.d to Paths.mk
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
This replaces config/Linux.modules with a configure option. As a result
of this change only a single xencommons.in is required, instead of a
xencommons.in.in and sed hackery.
After this change blktap2 and blktap will be loaded at the same time.
This is already done in out-of-tree xencommons scripts, and systemd will
load both modules as well. No harm is expected by loading both modules.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:20 +0000 (18:41 +0200)]
Add configure --enable-rpath
This fixes the tools when xen is configured with --prefix=/odd/path
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:19 +0000 (18:41 +0200)]
Use configure --localstatedir=BASEDIR to set path to /var
This is helpful to test make uninstall with --prefix=/private/dir as
unprivileged user. No change in behaviour is expected by this change.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:18 +0000 (18:41 +0200)]
Use configure --prefix=DIR to set PREFIX
PREFIX is set by configure --prefix=DIR, nothing outside
tools,docs,stubdom is using this variable.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:17 +0000 (18:41 +0200)]
Use configure --libexecdir=BASEDIR to set LIBEXEC
The current usage of LIBEXEC is bogus. It describes the directory for
private xen executables. Other places create their own, similar
libexecdir path as $prefix/lib/xen/*.
Additionally, two other variables are used to describe similar paths:
PRIVATE_BINDIR and PRIVATE_PREFIX
The autoconf documentation refers to libexec as a directory for
executables and stuff which is called by other programs, not by the
user.
Adjust all places that want libexecdir as a target path. LIBEXEC refers
now to the base directory. Three convenience variables are used to refer
to paths to private binaries, libs and include files.
In the systemd files LIBEXEC_BIN is substituted, so this variable has to
be present in autoconf. All other variables are expanded in Paths.mk
because they are only used in Makefiles.
Most users of LIBEXEC are updated to use LIBEXEC_BIN because that is
what they want.
Users of PRIVATE_BINDIR are updated to use LIBEXEC_BIN because that is
what they want. PRIVATE_BINDIR and PRIVATE_PREFIX usage is removed by
this patch, in favour of LIBEXEC_BIN and LIBEXEC.
An internal libxl function was removed. A single helper to retrieve
LIBEXEC_BIN remains.
As suggested by the autoconf documentation, configure appends the
package name to LIBEXEC to make sure the provided directory really
refers to xen. This makes sure "make uninstall" preserves the real
libexecdir.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested, updated QEMU_TRADITIONAL_REVISION to
pickup version which uses LIBEXEC_BIN ]
Olaf Hering [Wed, 1 Oct 2014 16:41:16 +0000 (18:41 +0200)]
Use configure --includedir=DIR to set INCLUDEDIR
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:15 +0000 (18:41 +0200)]
Use configure --docdir=DIR to set DOCDIR
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:14 +0000 (18:41 +0200)]
Use configure --mandir=DIR to set MANDIR
Also move common MAN8DIR and MAN1DIR to Paths.mk.in
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:13 +0000 (18:41 +0200)]
tools: substitute bindir instead of BINDIR
... and same for sbindir and libdir.
Expand usage of exec_prefix so that it does not appear in substituted
variables in systemd files.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:12 +0000 (18:41 +0200)]
Substitute configure variables in Paths.mk.in
This patch lays the groundwork to convert variables used in Makefiles
to the common automake style, i.e. PREFIX becomes prefix, MANDIR becomes
mandir and so on.
The reason is that configure variables such as mandir expand to
${datarootdir}/man, and datarootdir expands to ${prefix}/share. This
requires extra expansion in configure.ac before assigning to MANDIR.
Special care must be taken when variable substitution is done in other
files, such as xencommons.in. All @VARIABLES@ used in these files have
to be the expanded version, or all other variables must be available at
runtime.
This patch by itself changes nothing, but upcoming changes will make use
of the lowercase variables.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:11 +0000 (18:41 +0200)]
remove duplicate variables from config
... because they are already in Paths.mk:
BINDIR, LIBEXEC, PRIVATE_BINDIR, PRIVATE_PREFIX, SBINDIR, SHAREDIR,
XEN_CONFIG_DIR, XENFIRMWAREDIR, XEN_LOCK_DIR, XEN_PAGING_DIR,
XEN_RUN_DIR. Remove unused PKG_XEN_PREFIX, which was also incorrectly
assigned to PRIVATE_PREFIX.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:10 +0000 (18:41 +0200)]
tools/hotplug: substitute XEN_SCRIPT_DIR on FreeBSD
Also remove DESTDIR from the path, this was most likely not intended.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:09 +0000 (18:41 +0200)]
tools/hotplug: use INITD_DIR instead of CONFIG_DIR/init.d|rc.d
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:08 +0000 (18:41 +0200)]
tools/configure.ac: sort AC_CONFIG_FILES
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:07 +0000 (18:41 +0200)]
tools/hotplug: use XEN_SCRIPT_DIR instead of hardcoded path
Helper scripts get installed into XEN_SCRIPT_DIR, but initscripts,
helper scripts and udev rules still refer to the hardcoded location
/etc/xen/scripts/. Update scripts, rules and Makefile to refer to
@XEN_SCRIPT_DIR@ instead.
Update configure.ac to substitute the path in files using
XEN_SCRIPT_DIR. Remove XEN_SCRIPT_DIR from StdGNU.mk and SunOS.mk, it's
already in Paths.mk.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- ran autogen.sh as requested ]
Olaf Hering [Wed, 1 Oct 2014 16:41:05 +0000 (18:41 +0200)]
tools/python: use also LDFLAGS for build
An upcoming change will pass -Wl,-rpath to xc.so. Make sure such LDFLAGS
will be used for python libs.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Wed, 1 Oct 2014 16:41:04 +0000 (18:41 +0200)]
tools/hotplug: fix race during xen.conf creation
A make -j8 will call the xen.conf rule twice. The move-if-changed
macro may fail if the tmp file was already removed by the other make
process. Fix this by letting the all target depend on install.
Also remove the generated file with make clean.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- fixed s/of/if/ typo in commit message ]
Olaf Hering [Wed, 1 Oct 2014 16:41:03 +0000 (18:41 +0200)]
tools: fix make uninstall
The uninstall target does not know about the paths it removes because
the toplevel Makefile does not include the required files.
Move the commands to tools/Makefile because all files come from subdirs
in tools/ anyway. Drop the removal of $(XEN_RUN_DIR) because it gets
created at runtime. Drop the removal of systemd related files because
the wildcard matches everything.
The proper fix is to remove the files and directories in the Makefiles
which install them. But this version is the least intrusive change at
this point.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
xen/arm: check on domain type against hardware support
Some arm64 platforms implement only aarch64 mode. So allow
domains that are only 64-bit
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- s/cpu_has_a32/cpu_has_arm/ as discussed on list ]
xen/arm: Deliver interrupts to vcpu specified in IROUTER
In GICv3 use IROUTER register contents to deliver irq to
specified vcpu.
vgic irouter[irq] is used to represent vcpu number for which
irq affinity is assigned. Bit[31] is used to store IROUTER
bit[31] value to represent irq mode.
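As a rough illustration of the layout described above, the stored value can be treated as a vcpu number with the mode flag kept in bit 31. The helper names below are hypothetical and the sketch does not reproduce the actual vgic code; it only shows the packing the message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the stored value carries the IROUTER bit[31] (irq mode). */
#define IROUTER_MODE_BIT (UINT32_C(1) << 31)

/* Hypothetical: pack a vcpu number and the mode flag into one word. */
static uint32_t irouter_pack(uint32_t vcpu_id, bool mode)
{
    uint32_t v = vcpu_id & ~IROUTER_MODE_BIT;

    return mode ? (v | IROUTER_MODE_BIT) : v;
}

/* Hypothetical: vcpu number the irq is routed to. */
static uint32_t irouter_vcpu(uint32_t stored)
{
    return stored & ~IROUTER_MODE_BIT;
}

/* Hypothetical: the saved routing mode flag. */
static bool irouter_mode(uint32_t stored)
{
    return (stored & IROUTER_MODE_BIT) != 0;
}
```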
Tamas K Lengyel [Mon, 29 Sep 2014 15:55:13 +0000 (17:55 +0200)]
xen/arm: Add p2m_set_permission and p2m_shatter_page helpers.
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Dongxiao Xu [Mon, 6 Oct 2014 10:43:20 +0000 (12:43 +0200)]
x86: enable CMT for each domain RMID
If the CMT service is attached to a domain, its related RMID
will be set to hardware for monitoring when the domain's vcpu is
scheduled in. When the domain's vcpu is scheduled out, RMID 0
(system reserved) will be set for monitoring.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Dongxiao Xu [Mon, 6 Oct 2014 10:42:32 +0000 (12:42 +0200)]
x86: collect global CMT information
This implementation tries to put all policies into user space, thus some
global CMT information needs to be exposed, such as the total RMID count,
L3 upscaling factor, etc.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Dongxiao Xu [Mon, 6 Oct 2014 10:35:55 +0000 (12:35 +0200)]
libxc: provide interface for generic resource access
Xen added a new platform_op hypercall for generic MSR access, and this
is the tool-side change to wrap the hypercall in xc APIs.
For non-preemptible batch resource operations, group them in entries of
xc_resource_op structure. For preemptible ones, use multiple
xc_resource_op structures instead.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add a generic resource access hypercall for tool stack or other
components, e.g., accessing MSR, port I/O, etc.
The resource is abstracted as a resource address/value pair.
The resource access can be any type of XEN_RESOURCE_OP_* (currently
only MSR is supported, and it is white-listed). The resource operations
always run on the cpu that the caller specified. If the caller does not
care about this, it should use the current cpu to eliminate the IPI overhead.
Batch resource operations in one call are also supported but the
max number currently is limited to 2. The operations in a batch are
non-preemptible and execute in their original order. If preemptible
batch is desirable, then multicall mechanism can be used.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Mon, 6 Oct 2014 09:22:04 +0000 (11:22 +0200)]
x86/paging: make log-dirty operations preemptible
Both the freeing and the inspection of the bitmap get done in (nested)
loops which - besides having a rather high iteration count in general,
albeit that would be covered by XSA-77 - have the number of non-trivial
iterations they need to perform (indirectly) controllable by both the
guest they are for and any domain controlling the guest (including the
one running qemu for it).
Note that the tying of the continuations to the invoking domain (which
previously [wrongly] used the invoking vCPU instead) implies that the
tools requesting such operations have to make sure they don't issue
multiple similar operations in parallel.
Note further that this breaks supervisor-mode kernel assumptions in
hypercall_create_continuation() (where regs->eip gets rewound to the
current hypercall stub beginning), but otoh
hypercall_cancel_continuation() doesn't work in that mode either.
Perhaps time to rip out all the remains of that feature?
This is part of CVE-2014-5146 / XSA-97.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
docs, amd_ucode: add AMD container file format notes
This patch introduces documentation notes about AMD container
file formats and where to obtain latest container files from.
Also, we provide a how-to for updating patch level by
concatenating container files along with initrd images.
Misc notes about how Xen handles two containers of same
kind (if/when) they are concatenated together are also included.
Andrew Cooper [Mon, 6 Oct 2014 09:20:12 +0000 (11:20 +0200)]
AMD/guest_iommu: properly disable guest iommu support
AMD Guest IOMMU support was added to allow correct use of PASID and PRI
hardware support with an ATS-aware guest driver.
However, support cannot possibly function as guest_iommu_set_base() has no
callers. This means that its MMIO region's P2M pages are not set to
p2m_mmio_dm, preventing any invocation of the MMIO read/write handlers.
c/s fd186384 "x86/HVM: extend LAPIC shortcuts around P2M lookups" introduces a
path (via hvm_mmio_internal()) where iommu_mmio_handler claims its MMIO range,
and causes __hvm_copy() to fail with HVMCOPY_bad_gfn_to_mfn.
iommu->mmio_base defaults to 0, with a range of 8 pages, and is unilaterally
enabled in any HVM guests when the host IOMMU(s) supports any extended
features.
Unfortunately, HVMLoader's AP boot trampoline executes an `lmsw` instruction
at linear address 0x100c which unconditionally requires emulation. The
instruction fetch in turn fails as __hvm_copy() fails with
HVMCOPY_bad_gfn_to_mfn.
The result is that multi-vcpu HVM guests do not work on newer AMD hardware, if
IOMMU support is enabled in the BIOS.
Change the default mmio_base address to ~0ULL. This prevents
guest_iommu_mmio_range() from actually claiming any physical range
whatsoever, which allows the emulation of `lmsw` to succeed.
Reported-by: Roberto Luongo <rluongo@ready.it> Suggested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Roberto Luongo <rluongo@ready.it> Acked-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Jan Beulich [Mon, 6 Oct 2014 09:15:01 +0000 (11:15 +0200)]
don't allow Dom0 access to IOMMUs' MMIO pages
Just like for LAPIC, IO-APIC, MSI, and HT we shouldn't be granting Dom0
access to these. This implicitly results in these pages also getting
marked reserved in the machine memory map Dom0 uses to determine the
ranges where PCI devices can have their MMIO ranges placed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Mon, 6 Oct 2014 09:13:19 +0000 (11:13 +0200)]
x86: restore reserving of IO-APIC pages in XENMEM_machine_memory_map output
Commit d1222afda4 ("x86: allow Dom0 read-only access to IO-APICs") had
an unintended side effect: By no longer adding IO-APIC pages to Dom0's
iomem_caps these also no longer get reported as reserved in the machine
memory map presented to it (which got added there intentionally by
commit b8a456caed ["x86: improve reporting through
XENMEM_machine_memory_map"] because many BIOSes fail to add these).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Mon, 6 Oct 2014 09:11:28 +0000 (11:11 +0200)]
x86/MSI: fix MSI-X case of freeing IRQ
Commit d1b6d0a024 ("x86: enable multi-vector MSI") went a little too
far with moving things around in msi_free_irqs() in order to streamline
the code: We shouldn't drop the MSI-X control page reference before
calling destroy_irq(), as the latter will call us back via
desc->handler->shutdown() (effectively invoking msi_set_mask_bit()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
amd/seattle: Initial revision of AMD Seattle support
This patch adds initial (minimal) platform support for AMD Seattle,
which mainly just defines the matching ID and specifies the system_off
and system_reset mechanisms.
Initially, the firmware only supports a subset of PSCI-0.2 functions,
system-off and system-reset. The mechanism for bringing up auxiliary processors
still uses spin-table.
Frediano Ziglio [Thu, 2 Oct 2014 15:16:37 +0000 (16:16 +0100)]
xen/arm: Fix crash if last memory section is bigger than 1gb
On arm32 the xenheap has a maximum size of 1GB. On systems with more than 8GB
(so 1/8 total RAM is greater than 1GB) there is no point in searching for a
region with 1/8 of the total RAM when only 1GB will be used. Therefore limit
the maximum size to 1GB before searching.
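The clamp described above amounts to taking the smaller of 1/8 of RAM and 1GB before the search starts. A minimal sketch, assuming a hypothetical helper name and byte-sized arguments (the real code works on the boot memory layout, which is not shown here):

```c
#include <stdint.h>

#define GB(x) ((uint64_t)(x) << 30)

/*
 * Hypothetical helper: size of the xenheap region to search for.
 * Aim for 1/8 of total RAM, but never more than the 1GB arm32 maximum,
 * so the search does not look for a region larger than can be used.
 */
static uint64_t xenheap_target_size(uint64_t total_ram_bytes)
{
    uint64_t size = total_ram_bytes / 8;

    if (size > GB(1))
        size = GB(1);
    return size;
}
```

With 16GB of RAM the unclamped target would be 2GB; the clamp reduces it to 1GB before any region is searched for.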
Jan Beulich [Thu, 2 Oct 2014 15:03:04 +0000 (17:03 +0200)]
x86/APIC: don't make wrong implications on constants
For the physical APIC, oprofile code was abusing APIC_DM_NMI as a mask.
For the virtual APIC, a wrong assumption was made that LVTPC could be
programmed to only fixed or NMI delivery modes. While other modes are
invalid here, we still shouldn't inject an NMI into the guest in such
a case. Instead just do nothing.
In the course of adjusting this it became obvious that what value
vpmu_do_interrupt() returns on its various return paths was pretty
arbitrary. With its only caller ignoring the return value, simply make
the function's return type "void".
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
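To illustrate why using APIC_DM_NMI as a mask is wrong, the sketch below compares a bit test against a comparison of the whole delivery-mode field. The constant values are the conventional ones for bits 10:8 of an LVT entry and are restated here as assumptions; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode occupies bits 10:8 of an LVT entry (assumed values). */
#define APIC_MODE_MASK 0x700u
#define APIC_DM_FIXED  0x000u
#define APIC_DM_NMI    0x400u

/* Problematic pattern: treats the NMI encoding as a single-bit mask,
 * so every mode with bit 10 set (0x500, 0x600, 0x700) also matches. */
static bool is_nmi_by_mask(uint32_t lvt)
{
    return (lvt & APIC_DM_NMI) != 0;
}

/* Intended check: isolate the field, then compare the full encoding. */
static bool is_nmi(uint32_t lvt)
{
    return (lvt & APIC_MODE_MASK) == APIC_DM_NMI;
}

/* Fixed delivery is the all-zero encoding of the same field. */
static bool is_fixed(uint32_t lvt)
{
    return (lvt & APIC_MODE_MASK) == APIC_DM_FIXED;
}
```

A value whose mode field is neither fixed nor NMI fails both is_fixed() and is_nmi(), which corresponds to the "just do nothing" case in the message.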
Roy Franz [Thu, 2 Oct 2014 15:02:23 +0000 (17:02 +0200)]
arm64: create xen.efi binary for arm64
The 'xen' binary for arm64 is both an Image file and a PE/COFF executable,
copy it to xen.efi so that the 'make install' processing is shared with
x86. Prior to this 'make install' was broken on arm64.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roy Franz [Thu, 2 Oct 2014 15:02:04 +0000 (17:02 +0200)]
EFI: update documentation for arm64
The arm64 EFI boot support added a new 'dtb' value to the configuration file.
Update the documentation to describe this and how the configuration file is not
used when GRUB loads the modules. Updates 'ucode' description to indicate that
it is x86 only.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxc: fix mmap leak in xc_unmap_domain_meminfo/xc_map_domain_meminfo
xc_unmap_domain_meminfo uses P2M_FLL_ENTRIES macro instead of P2M_FL_ENTRIES.
Moreover, P2M_FL_ENTRIES macro uses (dinfo->p2m_size) which is always 0 here
as we don't initialize it. The result is that we always unmap just 1 frame.
xc_map_domain_meminfo uses P2M_FLL_ENTRIES macro instead of P2M_FL_ENTRIES
on failure path.
The issue went unnoticed mostly because we use unmap_domain_meminfo and
xc_map_domain_meminfo in one-shot xen-mfndump and xen-hptool (through
xc_exchange_page()) tools. When used in long-running apps (e.g. in xl)
domains become zombies after their death.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 1 Oct 2014 15:13:41 +0000 (16:13 +0100)]
xen/arm: p2m: Correctly initialize cur_offset
{~0,} only initializes the first cell of the array to ~0. The other cells
are initialized to 0.
Explicitly initialize every cell of the array and, at the same time, do the
same for the mappings.
This fixes boot after 82985d7 "xen: arm: handle variable p2m levels
in apply_p2m_changes" on platforms where the root level doesn't have
concatenated tables (such as the Foundation Model).
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
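The initializer pitfall fixed by the commit above is standard C behaviour and can be shown in isolation. In this sketch the array length LEVELS and both function names are hypothetical; only the { ~0, } semantics are the point.

```c
#include <stddef.h>

#define LEVELS 4  /* hypothetical array length, for illustration only */

/* Copies out what a partial brace initializer really produces:
 * element 0 is ~0UL, every remaining element is zero-initialized. */
static void fill_with_brace_init(unsigned long out[LEVELS])
{
    unsigned long tmp[LEVELS] = { ~0UL, };

    for (size_t i = 0; i < LEVELS; i++)
        out[i] = tmp[i];
}

/* The fix in spirit: set every cell explicitly. */
static void fill_explicitly(unsigned long out[LEVELS])
{
    for (size_t i = 0; i < LEVELS; i++)
        out[i] = ~0UL;
}
```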
Jan Beulich [Wed, 1 Oct 2014 12:54:47 +0000 (14:54 +0200)]
x86/HVM: properly bound x2APIC MSR range
While the write path change appears to be purely cosmetic (but still
gets done here for consistency), the read side mistake permitted
accesses beyond the virtual APIC page.
Note that while this isn't fully in line with the specification
(digesting MSRs 0x800-0xBFF for the x2APIC), this is the minimal
possible fix addressing the security issue and getting x2APIC related
code into a consistent shape (elsewhere a 256 rather than 1024 wide
window is being used too). This will be dealt with subsequently.
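A bounded lookup of the kind described above might look like the following sketch. The function name is hypothetical; the 0x800 base and the 256-wide window come from the message, and the 16-byte register stride is the usual xAPIC register spacing. With a 256-entry window the largest offset stays inside one 4096-byte page, whereas a 1024-entry window would not.

```c
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_BASE   0x800u  /* first x2APIC MSR */
#define X2APIC_MSR_COUNT  0x100u  /* 256-wide window, as in the message */
#define APIC_PAGE_SIZE    4096u

/*
 * Hypothetical helper: translate an x2APIC MSR index into a byte offset
 * within the virtual APIC page. Returns false for MSRs outside the
 * window, so callers never index beyond the page.
 */
static bool x2apic_msr_to_offset(uint32_t msr, uint32_t *offset)
{
    if (msr < X2APIC_MSR_BASE || msr >= X2APIC_MSR_BASE + X2APIC_MSR_COUNT)
        return false;

    *offset = (msr - X2APIC_MSR_BASE) << 4;  /* one register per 16 bytes */
    return true;
}
```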
Roger Pau Monne [Wed, 1 Oct 2014 10:42:07 +0000 (12:42 +0200)]
xl: use nic global default values in network-attach
Introduce a new static function that will be used to set the initial nic
config based on the global defaults. This fixes a bug caused by
network-attach not using the default values set in xl.conf(5).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Currently the hypervisor will hang if it hits a WARN_ON.
The implementation uses an undefined instruction, which we define ourselves because
ARM doesn't provide one, to implement BUG/ASSERT/WARN_ON, and sets up the
different tables (one for each type) which contain useful information.
This is based on the x86 implementation (include/asm-x86/bug.h). Unfortunately
the structure can't be shared because many ARM{32,64} gcc versions don't
correctly support %c. The support for executing a function in an exception handler
is also left unimplemented on ARM. Therefore, dump_execution_state is
implemented as WARN().
The current opcode used to go in exception mode may not be undefined on ARM64.
Use the instruction "brk" to generate a software debug exception.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Wed, 1 Oct 2014 10:09:17 +0000 (11:09 +0100)]
tools/libxl: Initialise rc on error paths of libxl_domain_remus_start()
Coverity-ID: 1242320 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca> CC: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Thu, 25 Sep 2014 13:08:14 +0000 (15:08 +0200)]
MAINTAINERS: handle buildsystem
Tweaking configure and friends is most likely not of much interest for
anyone besides the tools maintainers.
List such files, which are currently covered by "THE REST", in the
TOOLSTACK section. Also update list of stubdom related files.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roy Franz [Fri, 26 Sep 2014 22:25:01 +0000 (15:25 -0700)]
Add ARM EFI boot support
This patch adds EFI boot support for ARM based on the previous refactoring of
the x86 EFI boot code. All ARM specific code is in the ARM efi-boot.h header
file, with the main EFI entry point common/efi/boot.c. The PE/COFF header is
open-coded in head.S, which allows us to have a single binary be both an EFI
executable and a normal arm64 IMAGE file. There is currently no PE/COFF
toolchain support for arm64, so it is not possible to create the PE/COFF header
in the same manner as on x86. This also simplifies the build as compared to
x86, as we always build the same executable, whereas x86 builds 2. An ARM
version of efi-bind.h is added, which is based on the x86_64 version with the
x86 specific portions removed. The Makefile in common/efi is different for x86
and ARM, as for ARM we always build in EFI support.
NR_MEM_BANKS is increased, as memory regions are now added from the UEFI memory map,
rather than memory banks from a DTB. The UEFI memory map may be fragmented so a larger
number of regions will be used.
Signed-off-by: Roy Franz <roy.franz@linaro.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- applied vga.h movement fixup patch from Roy, plus moved the xen/vga.h
include before the asm/* ones. ]
Tamas K Lengyel [Wed, 1 Oct 2014 09:40:36 +0000 (11:40 +0200)]
xsm: wrap mem_access blocks into HAS_MEM_ACCESS ifdefs
This patch wraps the XSM code corresponding to the mem_access and
mem_event code-paths into HAS_MEM_ACCESS ifdefs.
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Paul Durrant [Wed, 1 Oct 2014 09:37:06 +0000 (11:37 +0200)]
ioreq-server: handle the lack of a default emulator properly
I started porting QEMU over to use the new ioreq server API and hit a
problem with PCI bus enumeration. With my patches, QEMU only registers to
handle config space accesses for the PCI device it implements, so all other
attempts by the guest to access 0xcfc go nowhere. This was causing the vcpu
to wedge because nothing was completing the I/O.
This patch introduces an I/O completion handler into the hypervisor for the
case where no ioreq server matches a particular request. Read requests are
completed with 0xf's in the data buffer, writes and all other I/O req types
are ignored.
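A minimal sketch of the completion behaviour described above, written in Python rather than the hypervisor's C. The function name and the direction constants are hypothetical and are not Xen identifiers; the sketch only illustrates that an unclaimed read is completed with all bits set for the access width, while a write is dropped.

```python
# Illustrative model only; these names are hypothetical, not Xen identifiers.
IOREQ_READ = "read"
IOREQ_WRITE = "write"


def complete_unclaimed_io(direction, size_bytes):
    """Complete an I/O request that no ioreq server claimed.

    Reads return a value with every bit set for the access width;
    writes (and any other request type) are ignored and return None.
    """
    if direction == IOREQ_READ:
        return (1 << (8 * size_bytes)) - 1
    return None


# A 4-byte read of an unclaimed port yields all ones.
value = complete_unclaimed_io(IOREQ_READ, 4)
```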
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Introduce hvm_dump_emulation_state() to be a common implementation rather
than having the printk() open-coded slightly differently in 3 separate
places.
* Identify the vcpu operating mode to allow for unambiguous decoding of the
instruction bytes.
* A valid instruction can be up to 15 bytes long, but may also be shorter than
the current arbitrary 10 bytes. Print only the fetched bytes, which could
include nothing if the emulation failed due to an inability to fetch the
instruction.
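The last point can be sketched as follows. This is an illustration in Python, not the code of hvm_dump_emulation_state(); the helper name and the placeholder text for the empty case are assumptions.

```python
MAX_INSN_BYTES = 15  # longest valid x86 instruction, as stated above


def format_insn_bytes(buf, fetched):
    """Return a hex dump of only the bytes that were actually fetched.

    'fetched' may be 0 if the instruction fetch itself failed.
    """
    count = max(0, min(fetched, MAX_INSN_BYTES, len(buf)))
    if count == 0:
        return "(none fetched)"  # hypothetical placeholder text
    return " ".join(f"{b:02x}" for b in buf[:count])


# Only the three fetched bytes are printed, not the whole buffer.
dump = format_insn_bytes(bytes([0x0F, 0x01, 0xC1, 0x00, 0x00]), 3)
```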
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Release-acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
FreeBSD doesn't use a qemu-ifup script to set up the network; it is all
done in the hotplug script, as on Linux.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
FreeBSD only allows reading multiples of sector size from raw disk devices
(character devices). This fix should only alter the behaviour of pygrub on
FreeBSD; the other supported OSes will continue using the same size.
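The commit message does not say how the read size is chosen. One way to satisfy the constraint is to round the requested length up to a whole number of sectors, sketched below. The helper name and the 512-byte default are assumptions, not taken from pygrub.

```python
def round_up_to_sector(nbytes, sector_size=512):
    """Round a read length up to the next multiple of sector_size.

    sector_size defaults to 512 purely for illustration; a real caller
    would query the device for its sector size.
    """
    if nbytes < 0 or sector_size <= 0:
        raise ValueError("nbytes must be >= 0 and sector_size > 0")
    return -(-nbytes // sector_size) * sector_size


# A 513-byte request becomes a two-sector read.
read_len = round_up_to_sector(513)
```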
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 25 Sep 2014 14:23:10 +0000 (15:23 +0100)]
docs: fix one typo in pvh.markdown
were -> where
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Roger Pau Monne <roger.pau@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 29 Sep 2014 08:23:01 +0000 (10:23 +0200)]
x86/emulate: support for emulating software event injection
AMD SVM requires all software events to have their injection emulated if
hardware lacks NextRIP support. In addition, `icebp` (opcode 0xf1) injection
requires emulation in all cases, even with hardware NextRIP support.
Emulating full control transfers is overkill for our needs. All that matters
is that guest userspace can't bypass the descriptor DPL check. Any guest OS
which would incur other faults as part of injection is going to end up with a
double fault instead, and won't be in a position to care that the faulting eip
is wrong.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Andrew Cooper [Mon, 29 Sep 2014 08:22:23 +0000 (10:22 +0200)]
x86/hvm: don't discard the SW/HW event distinction from the emulator
Injecting emulator software events as hardware exceptions results in a bypass
of DPL checks. As the emulator doesn't perform DPL checks itself, guest
userspace is capable of bypassing DPL checks and injecting arbitrary events.
Propagating software event information from the emulator allows VMX to now
properly inject software events, including DPL and presence checks, as well
as correct fault/trap frames.
Reported-by: Andrei LUTAS <vlutas@bitdefender.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrei LUTAS <vlutas@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
x86/hvm: remove stray lock release from hvm_ioreq_server_init()
If the HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or HVM_PARAM_BUFIOREQ_EVTCHN
parameters are read while the guest domain is dying, it leads to the following
ASSERT:
The root cause of this issue is the fact that ioreq_server.lock is being
released twice - first in hvm_ioreq_server_init() and then in hvm_create_ioreq_server().
Drop the lock release from hvm_ioreq_server_init() as we don't take it here, do minor
label cleanup.
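The locking rule behind this fix (only the function that takes a lock releases it) can be illustrated with a small Python model. The function names are hypothetical stand-ins for the init and create paths; Python's RuntimeError on releasing an unlocked lock plays the role of the hypervisor ASSERT.

```python
import threading


def init_server(server):
    """Initialise server state. The caller holds the lock, so this
    function must not release it."""
    server["ready"] = True


def create_server(lock):
    """Take the lock, initialise the server, release the lock once."""
    server = {}
    with lock:
        init_server(server)
    return server


def release_twice(lock):
    """Show what a stray second release does to an already free lock."""
    lock.acquire()
    lock.release()
    try:
        lock.release()  # the equivalent of the stray release that was removed
    except RuntimeError:
        return True
    return False


lock = threading.Lock()
server = create_server(lock)
```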
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 26 Sep 2014 14:29:34 +0000 (16:29 +0200)]
mem_event: relax error condition on debug builds
A faulty tool stack can brick a debug hypervisor. Unpleasant while dev/test.
Suggested-by: Andres Lagar Cavilla <andres@lagarcavilla.org> Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Tim Deegan <tim@xen.org>
Tamas K Lengyel [Fri, 26 Sep 2014 14:24:02 +0000 (16:24 +0200)]
relocate p2m_access_t into common and swap the order
We swap the order of the enum of types n ... rwx, so that rwx is at 0, which is
the default setting when mem_access is not in use. This has performance benefits for
non-memaccess paths, as now comparison is to 0 when checking if memaccess is in use,
which is often faster.
We fix one location in nested_hap where the order of the enum made a difference.
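A sketch of why placing rwx at 0 helps, using a Python IntEnum in place of the C enum. Only "rwx is 0" comes from the commit message; the order of the remaining members and all names here are assumptions made for illustration.

```python
from enum import IntEnum


class Access(IntEnum):
    # Only RWX = 0 is taken from the commit message; the order of the
    # other members is assumed for this illustration.
    RWX = 0
    WX = 1
    RX = 2
    X = 3
    RW = 4
    W = 5
    R = 6
    N = 7


def is_restricted(access):
    """With RWX at 0, 'is mem_access in effect?' is a compare against zero."""
    return access != 0


# Zero-initialised entries get the default (unrestricted) access for free.
table = [Access(0)] * 4
```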
Signed-off-by: Tamas K Lengyel <tklengyel@sec.in.tum.de> Acked-by: Tim Deegan <tim@xen.org>
Yang Hongyang [Mon, 7 Jul 2014 02:10:20 +0000 (10:10 +0800)]
MAINTAINERS: update maintained files of Remus
Add Remus specific hotplug scripts and libxl files
to the list of maintained files.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 16 Jul 2014 09:07:43 +0000 (17:07 +0800)]
libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 16 Jul 2014 09:27:43 +0000 (17:27 +0800)]
xl/remus: add a cmdline switch to disable disk replication
Disk replication is enabled by default. This patch adds a cmdline
switch to 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.
Note: Disabling disk replication requires enabling unsafe mode.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Wed, 11 Jun 2014 03:29:44 +0000 (11:29 +0800)]
xl/remus: cmdline switches and config vars to control network buffering
Add two members in libxl_domain_remus_info:
netbuf: whether netbuf is enabled
netbufscript: the path of the script which will be run to set up
and tear down the guest's interface.
Add cmdline switches to 'xl remus' command to enable or disable
network buffering and a domain-specific hotplug script to set up
network buffering.
Add a new config var 'remus.default.netbufscript' to xl.conf, that
allows the user to override the default global script used to
set up network buffering.
Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Thu, 24 Jul 2014 08:47:24 +0000 (16:47 +0800)]
xl/remus: cmdline switch to explicitly enable unsafe configurations
By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.
This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command. While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.
Unless this option is enabled, it will not be possible to
replicate memory checkpoints to /dev/null (blackhole replication),
disable network buffering or disk replication.
As a starter, the use of blackhole replication now requires that
the unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.
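The gating rule described above can be sketched as a simple validation step. This is an illustrative Python model, not the xl implementation; the function name, parameter names and error strings are assumptions.

```python
def check_remus_options(unsafe, blackhole=False, netbuf=True, diskbuf=True):
    """Return a list of problems; an empty list means the options are allowed.

    Defaults mirror the text: network buffering and disk replication are on,
    and checkpoints go to a real standby rather than /dev/null.
    """
    errors = []
    if not unsafe:
        if blackhole:
            errors.append("blackhole replication requires unsafe mode")
        if not netbuf:
            errors.append("disabling network buffering requires unsafe mode")
        if not diskbuf:
            errors.append("disabling disk replication requires unsafe mode")
    return errors


# The default configuration is accepted without the unsafe switch.
default_errors = check_remus_options(unsafe=False)
```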
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 29 Aug 2014 02:16:36 +0000 (10:16 +0800)]
xl/remus: change bool to defbool
Use defbool instead of bool for boolean flags in remus_info struct.
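The point of a "defaultable" boolean is that it has a third state, "not set by the user", so a default can be applied later without overriding an explicit choice. The Python class below is a sketch of that idea only; its method names are illustrative and it is not the libxl type.

```python
class DefBool:
    """Boolean with an extra 'default' (unset) state."""

    def __init__(self):
        self._value = None  # None means "not set by the user"

    def set(self, value):
        self._value = bool(value)

    def is_default(self):
        return self._value is None

    def setdefault(self, value):
        """Apply a default only if the user has not chosen a value."""
        if self._value is None:
            self._value = bool(value)

    def val(self):
        if self._value is None:
            raise ValueError("value read before a default was applied")
        return self._value


# An explicit user choice survives a later setdefault().
diskbuf = DefBool()
diskbuf.set(False)
diskbuf.setdefault(True)
```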
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 18 Jul 2014 09:14:22 +0000 (17:14 +0800)]
libxl/remus: setup and control disk replication for DRBD backends
This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch consists of two parts:
1. Hotplug scripts: The block-drbd-probe script is responsible for
performing sanity checks on the state of the DRBD disk before the
checkpointing process begins. This script should be invoked by
libxl for each of the guest's disk devices, when starting Remus.
2. Remus drbd disk device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) setup() is called for each disk attached to the guest.
During setup():
i) The hotplug script is called to perform the sanity check.
ii) Libxl obtains a handle to the DRBD device (/dev/drbd*) and
subsequently controls disk checkpoint replication using
this handle in the checkpoint callbacks.
c) The preresume() checkpoint callback is executed asynchronously
using libxl__ev_child_fork(), as it may potentially block for more
than a few seconds in case of backup failure.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 18 Jul 2014 07:08:36 +0000 (15:08 +0800)]
libxl/remus: setup and control network output buffering
This patch adds the machinery required for protecting a guest's
network device state. This patch consists of two parts:
1. Hotplug scripts: The remus-netbuf-setup script is responsible for
setting up and tearing down the necessary infrastructure required for
network output buffering. This script should be invoked by libxl for
each of the guest's network interfaces, when starting or stopping Remus.
Apart from returning a success/failure indication via the usual hotplug
entries in xenstore, this script also writes to xenstore the name of
the REMUS_IFB device to be used to control the vif's network output.
The script relies on libnl3 command line utilities to perform various
setup/teardown functions. The script is confined to Linux platforms only
since NetBSD does not seem to have libnl3.
2. Remus network device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) init_subkind_nic() & cleanup_subkind_nic() are called once per Remus
invocation. They establish and free netlink related state respectively.
b) setup() and teardown() are called for each vif attached to the
guest.
During setup():
i) The hotplug script is called to set up a network buffer on a
given vif. The script chooses an available IFB device from
the system, redirects vif egress traffic to the IFB device
and sets up the plug qdisc (output buffer) on the IFB device.
The name of the IFB device is communicated via xenstore to
libxl.
ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
and subsequently controls output buffering using this handle
in the checkpoint callbacks.
During teardown(), the hotplug scripts are called again to remove
the vif->ifb traffic redirection, release the ifb and the plug
qdisc associated with it.
c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
are implemented as synchronous ops as the netlink calls associated
with the qdisc subsystem are very fast.
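To make the idea of an output buffer controlled per checkpoint more concrete, here is a toy Python model of a queue divided into epochs by plugs. It is a simplification and does not use libnl3; the class and method names are hypothetical, and the commit message does not state which checkpoint callback triggers which operation.

```python
from collections import deque


class PlugBuffer:
    """Toy model of an output buffer split into epochs by plugs."""

    def __init__(self):
        # Oldest epoch first; the last entry is the epoch still being filled.
        self._epochs = deque([[]])

    def enqueue(self, packet):
        """Hold an outgoing packet in the current epoch."""
        self._epochs[-1].append(packet)

    def start_new_epoch(self):
        """Insert a plug: packets queued from now on belong to a new epoch."""
        self._epochs.append([])

    def release_oldest_epoch(self):
        """Release everything queued before the oldest plug."""
        if len(self._epochs) < 2:
            return []  # no plug has been inserted, so nothing is releasable
        return self._epochs.popleft()


buf = PlugBuffer()
buf.enqueue("a")
buf.enqueue("b")
buf.start_new_epoch()   # e.g. when a checkpoint is taken
buf.enqueue("c")        # output produced after the checkpoint
released = buf.release_oldest_epoch()
```

In this model, packets "a" and "b" leave the buffer once their epoch is released, while "c" stays held until a later plug and release.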
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 18 Jul 2014 07:02:34 +0000 (15:02 +0800)]
libxl/remus: introduce an abstract Remus device layer
Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.
The following APIs are exposed to libxl:
One-time configuration operations:
*libxl__remus_devices_setup
> Enable output buffering for NICs, setup disk replication, etc.
*libxl__remus_devices_teardown
> Disable network output buffering and disk replication;
teardown any associated external setups like qdiscs for NICs.
Operations executed every checkpoint (in order of invocation):
*libxl__remus_devices_postsuspend
*libxl__remus_devices_preresume
*libxl__remus_devices_commit
Each device type needs to implement the interfaces specified in
the libxl__remus_device_instance_ops if it wishes to support Remus.
The high-level control flow through the Remus device layer is shown below:
callback processing
* Only call the per-device libxl__multidev_one_callback
when the iteration has succeeded or failed.
* The final callback (called by multidev) is a trivial
shim to shuffle the pointers and notify our own caller.
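The shape of such a device layer can be sketched in Python. This is a synchronous simplification for illustration: the libxl layer is callback-driven and asynchronous, and the class names below are hypothetical. Only the operation names (setup, teardown, postsuspend, preresume, commit) and their per-checkpoint order come from the description above.

```python
from abc import ABC, abstractmethod


class RemusDevice(ABC):
    """Interface a device type implements to take part in checkpoints."""

    @abstractmethod
    def setup(self): ...

    @abstractmethod
    def teardown(self): ...

    # Per-checkpoint hooks default to no-ops.
    def postsuspend(self): pass
    def preresume(self): pass
    def commit(self): pass


class RecordingDevice(RemusDevice):
    """Hypothetical device that records which operations were invoked."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def setup(self): self.log.append((self.name, "setup"))
    def teardown(self): self.log.append((self.name, "teardown"))
    def postsuspend(self): self.log.append((self.name, "postsuspend"))
    def preresume(self): self.log.append((self.name, "preresume"))
    def commit(self): self.log.append((self.name, "commit"))


class DeviceLayer:
    """Drives every device without knowing its type."""

    def __init__(self, devices):
        self.devices = list(devices)

    def setup(self):
        for dev in self.devices:
            dev.setup()

    def checkpoint(self):
        # Order of invocation as listed in the commit message.
        for op in ("postsuspend", "preresume", "commit"):
            for dev in self.devices:
                getattr(dev, op)()

    def teardown(self):
        for dev in self.devices:
            dev.teardown()


log = []
layer = DeviceLayer([RecordingDevice("nic", log), RecordingDevice("disk", log)])
layer.setup()
layer.checkpoint()
layer.teardown()
```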
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Yang Hongyang [Fri, 27 Jun 2014 01:43:51 +0000 (09:43 +0800)]
autoconf: add libnl3 dependency for Remus network buffering support
Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
It also provides the ability to configure tools without libnl3 support
i.e., without network buffering support.
When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1. The callers of this API will be
introduced in the rest of the series.
NOTE: This patch changes tools/configure.ac, so please rerun
autogen.sh when applying the patch.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>