Jan Beulich [Tue, 14 Jun 2016 13:08:47 +0000 (15:08 +0200)]
x86/time: use correct (local) time stamp in constant-TSC calibration fast path
This looks like a copy and paste mistake in commit 1b6a99892d ("x86:
Simpler time handling when TSC is constant across all power saving
states"), responsible for occasional many-microsecond cross-CPU skew of
what NOW() returns.
Also improve the correlation between local TSC and stime stamps
obtained at the end of the two calibration handlers: Compute the stime
one from the TSC one, instead of doing another rdtsc() for that
computation.
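A minimal sketch of the second point, assuming a constant-rate TSC: take one TSC sample and derive the stime stamp from it, so the two stamps describe the same instant. The struct, the function names and the kHz-based conversion are illustrative assumptions, not the code in Xen's time handling.

```c
#include <stdint.h>

/* Hypothetical per-CPU calibration record: a TSC value and the system
 * time (in ns) that corresponds to it. */
struct cpu_stamp {
    uint64_t tsc;
    uint64_t stime_ns;
};

/* Convert a TSC delta to nanoseconds for a TSC running at tsc_khz.
 * Plain arithmetic for clarity; it overflows for very large deltas,
 * which real code would avoid with a precomputed mul/shift scale. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t tsc_khz)
{
    return ticks * 1000000ULL / tsc_khz;
}

/* Record a single TSC sample and compute the matching stime from it,
 * instead of sampling the TSC a second time for the stime stamp. */
void stamp_from_tsc(struct cpu_stamp *out, const struct cpu_stamp *base,
                    uint64_t tsc_now, uint64_t tsc_khz)
{
    out->tsc = tsc_now;
    out->stime_ns = base->stime_ns +
                    ticks_to_ns(tsc_now - base->tsc, tsc_khz);
}
```

Because both fields come from the same sample, any delay between two separate reads can no longer show up as skew between them.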
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Euan Harris [Thu, 9 Jun 2016 10:14:10 +0000 (10:14 +0000)]
nested vmx: Validate host VMX MSRs before accessing them
Some VMX MSRs may not exist on certain processor models, or may
be disabled because of configuration settings. It is only safe to
access these MSRs if configuration flags in other MSRs are set. These
prerequisites are listed in the Intel 64 and IA-32 Architectures
Software Developer’s Manual, Vol 3, Appendix A.
nvmx_msr_read_intercept() does not check the prerequisites before
accessing MSR_IA32_VMX_PROCBASED_CTLS2, MSR_IA32_VMX_EPT_VPID_CAP,
MSR_IA32_VMX_VMFUNC on the host. Accessing these MSRs from a nested
VMX guest running on a host which does not support them will cause
Xen to crash with a GPF.
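The prerequisite checks can be sketched as pure functions over already-read capability MSR values. This is an illustration under assumptions: the function and macro names are made up here, the bit positions follow the SDM's control definitions, and a caller is expected to pass 0 for any MSR it was not allowed to read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control bits as numbered in the SDM's VM-execution control tables. */
#define PROC_ACTIVATE_SECONDARY (1u << 31) /* primary proc-based controls */
#define SEC_ENABLE_EPT          (1u << 1)  /* secondary controls */
#define SEC_ENABLE_VPID         (1u << 5)
#define SEC_ENABLE_VMFUNC       (1u << 13)

/* The high 32 bits of a VMX control capability MSR hold the
 * "allowed-1" settings of the corresponding controls. */
uint32_t allowed1(uint64_t cap_msr)
{
    return (uint32_t)(cap_msr >> 32);
}

/* PROCBASED_CTLS2 exists only if secondary controls can be activated. */
bool can_read_procbased_ctls2(uint64_t procbased_ctls)
{
    return (allowed1(procbased_ctls) & PROC_ACTIVATE_SECONDARY) != 0;
}

/* EPT_VPID_CAP exists only if EPT or VPID can be enabled. */
bool can_read_ept_vpid_cap(uint64_t procbased_ctls, uint64_t procbased_ctls2)
{
    return can_read_procbased_ctls2(procbased_ctls) &&
           (allowed1(procbased_ctls2) & (SEC_ENABLE_EPT | SEC_ENABLE_VPID)) != 0;
}

/* VMFUNC exists only if VM functions can be enabled. */
bool can_read_vmfunc(uint64_t procbased_ctls, uint64_t procbased_ctls2)
{
    return can_read_procbased_ctls2(procbased_ctls) &&
           (allowed1(procbased_ctls2) & SEC_ENABLE_VMFUNC) != 0;
}
```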
Signed-off-by: Euan Harris <euan.harris@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 10 Jun 2016 18:11:12 +0000 (19:11 +0100)]
xen/hvm: Fix advertisement of available xstates following c/s c52319642
PKU lives in CPUID.7[0].ECX, not EBX. This causes hardware with BMI1 to
accidentally advertise PKU in CPUID.0xD[0].EAX. Any OS which proceeds to
blindly write this into %xcr0 takes a #GP fault. (Experimentally, Windows
Vista 32bit falls into this category.)
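To illustrate the mix-up: BMI1 and PKU share bit position 3, but in different registers of CPUID leaf 7. The helpers below are hypothetical and only model the register confusion, not the actual Xen code.

```c
#include <stdint.h>

#define CPUID7_EBX_BMI1 (1u << 3)    /* CPUID.7[0].EBX bit 3 */
#define CPUID7_ECX_PKU  (1u << 3)    /* CPUID.7[0].ECX bit 3 */
#define XSTATE_PKRU     (1ull << 9)  /* PKRU state component in %xcr0 */

/* Buggy variant: tests bit 3 of EBX, so BMI1 masquerades as PKU. */
uint64_t pkru_xstate_buggy(uint32_t ebx, uint32_t ecx)
{
    (void)ecx;
    return (ebx & (1u << 3)) ? XSTATE_PKRU : 0;
}

/* Fixed variant: PKU is looked up in ECX. */
uint64_t pkru_xstate_fixed(uint32_t ebx, uint32_t ecx)
{
    (void)ebx;
    return (ecx & CPUID7_ECX_PKU) ? XSTATE_PKRU : 0;
}
```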
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
libxenvchan: Change license of header from Lesser GPL v2.1 to BSD
As the xen/COPYING file says:
"A few files are licensed under both GPL and a weaker BSD-style
license. This includes all files within the subdirectory
include/public, as described in include/public/COPYING. All such files
include the non-GPL license text as a source-code comment. Although
the license text refers generically to "the software", the non-GPL
license applies *only* to those source files that explicitly include
the non-GPL license text."
The libxenvchan.h is under xen/include/public/io directory
and the xen/include/public/COPYING says:
"XEN NOTICE
==========
This copyright applies to all files within this subdirectory and its
subdirectories:
include/public/*.h
include/public/hvm/*.h
include/public/io/*.h
The intention is that these files can be freely copied into the source
tree of an operating system when porting that OS to run on Xen. Doing
so does *not* cause the OS to become subject to the terms of the GPL.
All other files in the Xen source distribution are covered by version
2 of the GNU General Public License except where explicitly stated
otherwise within individual source files.
"
Having libxenvchan.h under Lesser GPL v2.1 when the COPYING file
says otherwise is confusing, to say the least.
Upon consulting with the authors of libxenvchan they said:
"FWIW Neither I, nor ITL staff (as author of original libvchan library)
have anything against converting it to the BSD-style licence."
(Marek Marczykowski-Górecki,
http://lists.xen.org/archives/html/xen-devel/2016-06/msg00995.html)
so as such let's change it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anil Madhavapeddy <anil@recoil.org> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: George Dunlap <George.Dunlap@eu.citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Jason Andryuk <andryuk@aero.org> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Matthew Daley <mattjd@gmail.com> Acked-by: Olaf Hering <olaf@aepfle.de> Acked-by: Roger Pau Monne <roger.pau@entel.upc.edu> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
["I have spoken to my line manager. I can confirm that Citrix is happy
with this proposed change. So:
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
This view from Citrix covers all contributions made to these files in
the course of Citrix's employees' employment, which I think is:
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Roger Pau Monne <roger.pau@entel.upc.edu>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Wei Liu <wei.liu2@citrix.com>
..
[in subsequent email]:
Wei points out that this ought also to include Keir Fraser's
contribution, which was (only) in 2012.
" (from Ian's email)
In a subsequent mail, Wei also points out that David Scott's
contribution is covered by Ian's ack.
]
Andrew Cooper [Fri, 10 Jun 2016 14:47:15 +0000 (15:47 +0100)]
xen/x86: Always print processor information at boot
It is generally useful information, which isn't directly available in the
hypervisor console log.
To get an appropriate string in this_cpu->c_vendor, drop the notion of
gcv_host_late. All relevant information is available even during early
detection, and even Linux (as the ancestor of this code) has dropped the
distinction.
A sample log now looks like:
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6, Model 71, Stepping 1 (raw 00040671)
(XEN) found SMP MP-table at 000fd6c0
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Local variable "j" would be used only when "i == ARRAY_SIZE(main_options)"
is true. Thus, it is not necessary to update "j" when "i ==
ARRAY_SIZE(main_options)" is false.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Wei Liu [Thu, 9 Jun 2016 12:57:40 +0000 (13:57 +0100)]
hotplug/NetBSD: honour XEN_{LOG,RUN}_DIR
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Thu, 9 Jun 2016 12:57:39 +0000 (13:57 +0100)]
hotplug/Linux: honour XEN_LOG_DIR
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Thu, 9 Jun 2016 12:57:38 +0000 (13:57 +0100)]
hotplug/FreeBSD: honour XEN_{LOG,RUN}_DIR
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
In accordance with CODING_STYLE:
- Use 'r' for return values to functions whose return values are a
different error space (like xc_tmem_control, xc_tmem_auth)
libxc functions are supposed to, on failure, set errno and always
return -1 which is the value stored in 'r', therefore use LOGE()
instead of LOGEV() with the 'r' value.
Signed-off-by: Paulina Szubarczyk <paulinaszubarczyk@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
libxl: style cleanups in libxl_device_pci_assignable_list()
Various coding style compliance cleanups, such as arranging for
only one path out of the function, whitespace in loops and if-s,
and r instead of rc for storing non-libxl error codes.
Signed-off-by: Paulina Szubarczyk <paulinaszubarczyk@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 9 May 2016 11:30:55 +0000 (13:30 +0200)]
libxl: Fix libxl_set_memory_target return value
libxl_set_memory_target seems to have the following return values:
'1' : on failure, if the failure happens because of a xenstore error
*or* invalid target
'-1': on error, the setmaxmem and set_pod_target hypercalls
return -1 and set errno appropriately.
'0' : on success
Make it consistently return ERROR_FAIL on failure, unless the
parameters were invalid, in which case return ERROR_INVAL.
In accordance with CODING_STYLE:
1. Leave rc uninitialized, and set when an error is detected
2. Use 'r' for return values to functions whose return values are a
different error space (like xc_domain_setmaxmem and
xc_domain_set_pod_target)
3. Use 'lrc' for return values to local functions libxl__*
where a failure means retry, rather than fail the whole function
(libxl__fill_dom0_memory_info), to reduce the risk of that.
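The rc/r convention described above can be sketched as follows. Everything here is a stand-in: the error values, the fake libxc-style call and the function names are assumptions for illustration, not libxl's real definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins for libxl error codes; the values are assumed. */
enum { SKETCH_ERROR_FAIL = -3, SKETCH_ERROR_INVAL = -6 };

/* Hypothetical libxc-style call: returns -1 and sets errno on failure. */
int fake_xc_setmaxmem(bool fail)
{
    if (fail) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}

int sketch_set_memory_target(long target_kb, bool hypercall_fails)
{
    int rc; /* libxl error space; left uninitialised until an outcome is known */
    int r;  /* foreign (libxc) error space */

    if (target_kb < 0) {
        rc = SKETCH_ERROR_INVAL; /* invalid parameter */
        goto out;
    }

    r = fake_xc_setmaxmem(hypercall_fails);
    if (r) {
        /* r is always -1 here; errno carries the detail, so r is
         * never returned to the caller directly. */
        rc = SKETCH_ERROR_FAIL;
        goto out;
    }

    rc = 0;
out:
    return rc;
}
```

Keeping the two error spaces in differently named variables makes it hard to return a raw -1 from a hypercall wrapper by accident.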
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> Signed-off-by: Paulina Szubarczyk <paulinaszubarczyk@gmail.com> Reviewed-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com>
Functions libxl_tmem_freeze(), libxl_tmem_thaw(), libxl_tmem_set() and
libxl_tmem_shared_auth(), located in libxl.c, return
ERROR_FAIL/ERROR_INVAL or internal error codes from the libxc library.
Improve the main_tmem_* return codes by returning EXIT_{SUCCESS/FAILURE}
according to the return codes of those functions.
Signed-off-by: Paulina Szubarczyk <paulinaszubarczyk@gmail.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Len Brown [Thu, 9 Jun 2016 13:52:27 +0000 (15:52 +0200)]
mwait-idle: add BXT support
Broxton has all the HSW C-states, except C3.
BXT C-state timing is slightly different.
Here we trust the IRTL MSRs as authority
on maximum C-state latency, and override the driver's tables
with the values found in the associated IRTL MSRs.
Further we set the target_residency to 1x maximum latency,
trusting the hardware demotion logic.
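A sketch of the override, assuming the IRTL layout of a 10-bit time value in bits 9:0 and a 3-bit time unit in bits 12:10. The struct and function names are hypothetical; the unit table is modelled on the Linux driver this commit was ported from.

```c
#include <stdint.h>

/* Hypothetical slice of a C-state table entry. */
struct cstate {
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

/* Nanoseconds per unit for each time-unit encoding; 0 marks
 * encodings that this sketch treats as invalid. */
static const uint64_t irtl_ns_units[8] = {
    1, 32, 1024, 32768, 1048576, 33554432, 0, 0
};

/* Decode an IRTL MSR value into microseconds (0 if unusable). */
unsigned int irtl_to_usec(uint64_t irtl)
{
    uint64_t ns_per_unit = irtl_ns_units[(irtl >> 10) & 0x7];

    return (unsigned int)((irtl & 0x3FF) * ns_per_unit / 1000);
}

/* Override the table entry with the MSR-reported latency and set
 * target residency to 1x that latency. A zero result keeps the
 * driver's built-in values. */
void apply_irtl(struct cstate *cs, uint64_t irtl)
{
    unsigned int usec = irtl_to_usec(irtl);

    if (usec == 0)
        return;
    cs->exit_latency_us = usec;
    cs->target_residency_us = usec;
}
```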
Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: 5dcef694860100fd16885f052591b1268b764d21] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Len Brown [Thu, 9 Jun 2016 13:52:05 +0000 (15:52 +0200)]
mwait-idle: add KBL support
KBL is similar to SKL
Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: 3ce093d4de753d6c92cc09366e29d0618a62f542] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Len Brown [Thu, 9 Jun 2016 13:51:43 +0000 (15:51 +0200)]
mwait-idle: add SKX support
SKX is similar to BDX
Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit: f9e71657c2c0a8f1c50884ab45794be2854e158e] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 9 Jun 2016 13:46:22 +0000 (15:46 +0200)]
public/errno: sort entries numerically
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 21 Apr 2016 13:47:12 +0000 (14:47 +0100)]
xen/vsprintf: Avoid returning NULL from number()
In practice this is an unused codepath, as every caller of number() passes an
explicit base of 8, 10 or 16. For all other uses, number() returns a pointer
between the str and end parameters, as do the other similar helper functions.
However, the fact that there is a NULL return path causes Coverity to check
whether the caller makes NULL checks on the return value, and complain.
Change the conditional return into an ASSERT().
No functional change, but this removes 21 instances of NULL_RETURN in
Coverity.
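A simplified illustration of the pattern (not Xen's number() itself): the helper always returns a pointer derived from str, and an unsupported base is a programming error caught by an assertion rather than a NULL return that every caller would need to check.

```c
#include <assert.h>
#include <stddef.h>

/* Write num in the given base at str, never storing at or past end,
 * and return the position just after the last digit. The returned
 * pointer may exceed end when the output is truncated. */
char *put_number(char *str, const char *end, unsigned long num, int base)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(num) * 8]; /* enough for base 2 */
    size_t i = 0;

    /* Previously this kind of check would have been "return NULL". */
    assert(base >= 2 && base <= 16);

    do {
        tmp[i++] = digits[num % base];
        num /= base;
    } while (num);

    while (i > 0) {
        --i;
        if (str < end)
            *str = tmp[i];
        ++str;
    }
    return str;
}
```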
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
--- CC: Jan Beulich <JBeulich@suse.com>
Wei Chen [Fri, 3 Jun 2016 10:07:13 +0000 (18:07 +0800)]
xen/arm: build: add missed dependency for head.S
When we update the header files included by head.S, the build
system does not re-compile head.S, because in the build rules the
dependencies are set to the .*.d files (e.g. DEPS = .*.d) in the same
folder as the Makefile.
But head.S is very special, it was used by the Makefile in the parent
folder: "ALL_OBJS := $(TARGET_SUBARCH)/head.o".
In this case, the build system could not find the dependency in DEPS.
When we update the header files, the build system is unaware of this
update. If we re-build Xen without doing make clean or touching
head.S, the build system will not recompile head.S.
Signed-off-by: Wei Chen <Wei.Chen@linaro.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 8 Jun 2016 14:42:19 +0000 (15:42 +0100)]
libxl: Fix NULL pointer due to XSA-178 fix wrong XS nodename
In "libxl: Do not trust backend for disk eject vdev" (c69871a2fb26 on
xen.git#staging) we changed libxl_evenable_disk_eject to read the
device vdev out of xenstore from the /libxl path, rather than the
backend path, and to read it during setup rather than on each event.
However, the patch has a mistake:
- GCSPRINTF("%s/dev", backend), NULL);
+ GCSPRINTF("%s/vdev", libxl_path), &configured_vdev);
^
Spot the extra "v". This causes configured_vdev always to be NULL.
configured_vdev is passed to [libxl__]strdup.
In Xen 4.6 and later libxl__strdup is used and tolerates NULL.
evg->vdev is set to NULL. This propagates to the `vdev' field in the
generated event. This may or may not cause further trouble, depending
on the calling application. In our osstest test cases it does not
cause any trouble, so the bug goes undetected.
In Xen 4.5 and earlier, the strdup does not tolerate NULL, and libxl
crashes immediately. This has been detected by osstest as a
regression in Xen 4.5.
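The failure mode can be modelled with a toy key/value store standing in for xenstore. The paths, the store contents and both helper names are invented for this sketch; only the "dev" versus "vdev" distinction comes from the description above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for xenstore: only the "dev" node exists. */
struct xs_node {
    const char *path;
    const char *value;
};

static const struct xs_node fake_store[] = {
    { "/libxl/5/device/vbd/51712/dev", "xvda" },
};

/* Returns the stored value, or NULL when the node does not exist. */
const char *fake_xs_read(const char *path)
{
    size_t i;

    for (i = 0; i < sizeof(fake_store) / sizeof(fake_store[0]); i++)
        if (strcmp(fake_store[i].path, path) == 0)
            return fake_store[i].value;
    return NULL;
}

/* NULL-tolerant duplicate, like the behaviour described for 4.6+.
 * Passing NULL to plain strdup() is undefined behaviour, which
 * matches the immediate crash described for 4.5 and earlier. */
char *strdup_or_null(const char *s)
{
    char *copy;

    if (!s)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy)
        strcpy(copy, s);
    return copy;
}
```

Reading ".../vdev" from this store yields NULL, which the tolerant helper silently passes along, while reading ".../dev" yields "xvda".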
IMO this patch should be applied immediately to
xen.git#staging-4.5 (to check that it fixes the osstest regression)
xen.git#staging (to check that it does not break master)
Subject to passes, it should then be propagated to all supported
stable trees and also be mentioned in an update to XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> CC: security@xenproject.org CC: Jan Beulich <jbeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 27c5d7ff8cfdc2e15ff521b4912d69b782a269d7)
Euan Harris [Wed, 8 Jun 2016 12:14:33 +0000 (14:14 +0200)]
nested vmx: intercept guest rdmsr for MSR_IA32_VMX_VMFUNC
Guest reads of MSR_IA32_VMX_VMFUNC should be handled by
the logic in vmx_msr_read_intercept(). Otherwise a guest
can read the raw host value of this MSR, even if nested
vmx is disabled.
Signed-off-by: Euan Harris <euan.harris@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
At the time of registering HVM I/O handler, the HVM domain might
not have been initialized, which means the hvm_domain.io_handler
would be NULL. In the hvm_next_io_handler(), this should be asserted.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
AMD IOMMU: introduce support for IVHD block type 11h
Along with the IVHD block type 10h, newer AMD platforms also come with
type 11h, which is a superset of the older one. Having multiple IVHD
block types in the same platform allows backward compatibility of newer
systems to work with existing drivers. The driver should only parse
the highest-level (newest) type of IVHD block that it can support.
However, the current driver returns an error when it encounters an unknown
IVHD block type. This causes the existing driver to unnecessarily fail IOMMU
initialization on new systems.
This patch introduces new logic, which scans through the IVRS table looking
for the highest-level supported IVHD block type. It also adds support
for the new IVHD block type 11h. More information about the IVHD type 11h
can be found in the AMD I/O Virtualization Technology (IOMMU) Specification
rev 2.62.
http://support.amd.com/TechDocs/48882_IOMMU.pdf
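The scan can be sketched over a plain array of block type bytes. Real code walks variable-length IVRS entries; the flat array and the function names are simplifying assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define IVHD_TYPE_10H 0x10
#define IVHD_TYPE_11H 0x11

/* Types this sketch's "driver" knows how to parse. */
int ivhd_type_supported(uint8_t type)
{
    return type == IVHD_TYPE_10H || type == IVHD_TYPE_11H;
}

/* Return the highest supported IVHD type present, or 0 if none.
 * Unknown types are skipped instead of being treated as an error. */
uint8_t highest_supported_ivhd(const uint8_t *block_types, size_t n)
{
    uint8_t best = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (ivhd_type_supported(block_types[i]) && block_types[i] > best)
            best = block_types[i];
    return best;
}
```

A second pass would then parse only the blocks whose type equals the returned value.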
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 8 Jun 2016 12:12:45 +0000 (14:12 +0200)]
kexec: allow relaxed placement specification via command line
Rather than just allowing a fixed address or fully automatic placement,
also allow for specifying an upper bound. Especially on EFI systems,
where firmware memory use is commonly less predictable than on legacy
BIOS ones, this makes success of the reservation more likely when
automatic placement is not an option (e.g. because of special DMA
restrictions of devices involved in actually carrying out the dump).
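As a rough illustration of placement under an upper bound, the sketch below picks the highest start address in a list of free ranges such that the whole region ends at or below the limit. The range list, the names and the lack of alignment handling are assumptions; this is not the allocator Xen uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Free memory range, end exclusive (hypothetical representation). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Return the highest start address at which size bytes fit entirely
 * below limit, or UINT64_MAX when no range can hold the region. */
uint64_t place_below(const struct range *avail, size_t n,
                     uint64_t size, uint64_t limit)
{
    uint64_t best = UINT64_MAX; /* sentinel: nothing found yet */
    size_t i;

    for (i = 0; i < n; i++) {
        /* Clip the range to the upper bound. */
        uint64_t end = avail[i].end < limit ? avail[i].end : limit;

        if (end > avail[i].start && end - avail[i].start >= size) {
            uint64_t cand = end - size;

            if (best == UINT64_MAX || cand > best)
                best = cand;
        }
    }
    return best;
}
```

Compared with a fixed address, any range below the bound will do, so an unexpected firmware allocation in one spot no longer makes the reservation fail.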
Also take the opportunity to actually add text to the "crashkernel"
entry in the command line option doc.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Doug Goldstein [Wed, 8 Jun 2016 12:11:50 +0000 (14:11 +0200)]
build: convert lock_profile to Kconfig
Convert the 'lock_profile' option to Kconfig as CONFIG_LOCK_PROFILE.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
Doug Goldstein [Wed, 8 Jun 2016 12:11:21 +0000 (14:11 +0200)]
build: convert perfc{,_arrays} to Kconfig
Convert the 'perfc' and 'perfc_arrays' options to Kconfig as
CONFIG_PERF_COUNTERS and CONFIG_PERF_ARRAYS.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Wed, 8 Jun 2016 12:10:35 +0000 (14:10 +0200)]
build: convert frame_pointer to Kconfig
Converts the frame_pointer option to a Kconfig option.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Wed, 8 Jun 2016 12:09:55 +0000 (14:09 +0200)]
build: convert verbose to Kconfig
Convert 'verbose', which was enabled by 'debug=y' to Kconfig as
CONFIG_VERBOSE_DEBUG which is enabled by default when CONFIG_DEBUG is
enabled.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Doug Goldstein [Wed, 8 Jun 2016 12:06:59 +0000 (14:06 +0200)]
build: convert crash_debug to Kconfig
Convert the crash_debug option to Kconfig as CONFIG_CRASH_DEBUG. This
was previously togglable on the command line so this adds a message for
users enabling it from the command line to tell them to enable it from
make menuconfig.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Wed, 8 Jun 2016 12:04:30 +0000 (14:04 +0200)]
build: convert debug to Kconfig
Enabling debug will disable NDEBUG which will result in more debug
prints. There are a number of debugging options for Xen so place the
debug option under a menu for different debugging options to have a way
to group them all together.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Daniel Kiper [Wed, 8 Jun 2016 12:01:53 +0000 (14:01 +0200)]
x86/boot: do not create unwind tables
This way .eh_frame section is not included in *.lnk and *.bin files.
Hence, final e.g. reloc.bin file size is reduced from 408 bytes to
272 bytes and it contains only used code and data.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Platforms supporting Intel NVDIMM are now required to provide
persistency once pmem stores are accepted by the memory subsystem.
This is usually achieved by a platform-level feature known as ADR
(Asynchronous DRAM Refresh) that flushes any memory subsystem write
pending queues on power loss/shutdown. Therefore, the pcommit
instruction, which has not yet shipped on any product (and will not),
is no longer needed and is deprecated.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Haozhong Zhang [Wed, 8 Jun 2016 09:08:55 +0000 (11:08 +0200)]
x86/mce: handle reserved domain ID in XEN_MC_msrinject
Commit 26646f3 "x86/mce: translate passed-in GPA to host machine
address" and commit 4ddf474 "tools/xen-mceinj: Pass in GPA when
injecting through MSR_MCI_ADDR" forgot to consider reserved domain
IDs and mistakenly added the MC_MSRINJ_F_GPADDR flag for them, which in
turn causes the bug reported at
http://lists.xenproject.org/archives/html/xen-devel/2016-05/msg02640.html.
This patch removes the MC_MSRINJ_F_GPADDR flag and checks this when injecting
to reserved domain IDs except DOMID_SELF, and treats the passed-in
address as a host machine address.
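One reading of the rule, as a sketch: the address is a guest physical address only for ordinary domain IDs and DOMID_SELF, and for any other reserved ID the flag is dropped so the address is taken as a machine address. The flag value and function names are illustrative assumptions; the two DOMID constants mirror Xen's public header.

```c
#include <stdint.h>

#define DOMID_FIRST_RESERVED 0x7FF0u
#define DOMID_SELF           0x7FF0u
#define SKETCH_F_GPADDR      0x1u /* illustrative value, not the real flag */

/* Non-reserved IDs and DOMID_SELF name a real guest, so a guest
 * physical address makes sense; other reserved IDs do not. */
int addr_is_guest_physical(uint16_t domid)
{
    return domid < DOMID_FIRST_RESERVED || domid == DOMID_SELF;
}

/* Set or clear the GPADDR flag according to the target domain ID. */
uint32_t msrinject_flags(uint16_t domid, uint32_t flags)
{
    if (addr_is_guest_physical(domid))
        return flags | SKETCH_F_GPADDR;
    return flags & ~SKETCH_F_GPADDR;
}
```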
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Christoph Egger <chegger@amazon.de>
Chris Patterson [Fri, 3 Jun 2016 16:50:10 +0000 (12:50 -0400)]
libfsimage: replace deprecated readdir_r() with readdir()
Replace the usage of readdir_r() with readdir() to address a
compilation error under glibc due to the deprecation of readdir_r
for their next release (2.24) [1, 2].
Add new error checking on readdir(), and fail if error occurs.
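A minimal sketch of readdir() with error checking, using only POSIX calls. Since readdir() returns NULL both at end of directory and on error, errno is cleared before each call and inspected afterwards. The function name is made up for this example.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries of a directory; returns -1 (with errno set)
 * if the directory cannot be opened or a read fails. */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    long count = 0;
    int saved_errno;

    if (!dir)
        return -1;

    for (;;) {
        errno = 0; /* distinguish end-of-directory from failure */
        ent = readdir(dir);
        if (!ent)
            break;
        count++;
    }

    saved_errno = errno; /* closedir() may clobber errno */
    closedir(dir);

    if (saved_errno) {
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

No caller-allocated dirent buffer is needed, which is why helpers that existed only to size that buffer for readdir_r() can go away.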
--
From the GNU libc manual [3]:
"
It is expected that future versions of POSIX will obsolete readdir_r and
mandate the level of thread safety for readdir which is provided by the
GNU C Library and other implementations today.
"
There is a filed bug in the Austin Group Defect Tracker [4] in which 'dalias'
proposes (in comment 0001632) that:
"
I would like to propose an alternate solution. For readdir, replace the text:
"The readdir() function need not be thread-safe."
with:
"If multiple threads call the readdir() function with the same directory
stream argument and without synchronization to preclude simultaneous
access, then the behavior is undefined."
With this change, the clunky readdir_r function is no longer needed or
useful, and should probably be deprecated. As the only reasonable way
to meet the implementation requirements for readdir is to have the dirent
buffer in the DIR structure, this change should not require any change to
existing implementations.
"
Signed-off-by: Chris Patterson <pattersonc@ainfosec.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Chris Patterson [Fri, 3 Jun 2016 16:50:09 +0000 (12:50 -0400)]
libxl: replace deprecated readdir_r() with readdir()
Replace the usage of readdir_r() with readdir() to address a
compilation error under glibc due to the deprecation of readdir_r
for their next release (2.24) [1, 2].
Remove code specific to usage of readdir_r which is no longer required,
such as zalloc_dirent().
--
From the GNU libc manual [3]:
"
It is expected that future versions of POSIX will obsolete readdir_r and
mandate the level of thread safety for readdir which is provided by the
GNU C Library and other implementations today.
"
There is a filed bug in the Austin Group Defect Tracker [4] in which 'dalias'
proposes (in comment 0001632) that:
"
I would like to propose an alternate solution. For readdir, replace the text:
"The readdir() function need not be thread-safe."
with:
"If multiple threads call the readdir() function with the same directory
stream argument and without synchronization to preclude simultaneous
access, then the behavior is undefined."
With this change, the clunky readdir_r function is no longer needed or
useful, and should probably be deprecated. As the only reasonable way
to meet the implementation requirements for readdir is to have the dirent
buffer in the DIR structure, this change should not require any change to
existing implementations.
"
Signed-off-by: Chris Patterson <pattersonc@ainfosec.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 3 Jun 2016 15:21:46 +0000 (16:21 +0100)]
docs: Feature Levelling feature document
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 2 Jun 2016 11:08:42 +0000 (12:08 +0100)]
x86/cpuid: Calculate a guests xfeature_mask from its featureset
libxc currently performs the xstate calculation for guests, and provides the
information to Xen to be used when satisfying CPUID traps. (There is further
work planned to improve this arrangement, but the worst a buggy toolstack can
do is make junk appear in the cpuid leaves for the guest.)
dom0 however has no policy constructed for it, and certain fields filter
straight through from hardware.
Linux queries CPUID.0xD[0].{EAX/EDX} alone to choose a setting for %xcr0, which
is a valid action to take, but features such as MPX and PKRU are not supported
for PV guests. As a result, Linux, using leaked hardware information, fails
to set %xcr0 on newer Skylake hardware with PKRU support, and crashes.
As an interim solution, dynamically calculate the correct xfeature_mask and
xstate_size to report to the guest for CPUID.0xD[0] queries. This ensures that
domains don't see leaked hardware values, even when no cpuid policy is
provided.
Similarly, CPUID.0xD[1].{ECX/EDX} represents the applicable settings for MSR_XSS.
As Xen doesn't yet support any XSS states in guests, unconditionally zero
them.
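Deriving the mask from a featureset can be sketched as below. The featureset struct is a hypothetical simplification (Xen uses a bitmap), and which features a given guest type is offered is policy that this sketch leaves to the caller. The state component bit numbers are the architectural %xcr0 positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural %xcr0 state component bits. */
#define XSTATE_FP       (1ull << 0)
#define XSTATE_SSE      (1ull << 1)
#define XSTATE_YMM      (1ull << 2)
#define XSTATE_BNDREGS  (1ull << 3)
#define XSTATE_BNDCSR   (1ull << 4)
#define XSTATE_OPMASK   (1ull << 5)
#define XSTATE_ZMM      (1ull << 6)
#define XSTATE_HI_ZMM   (1ull << 7)
#define XSTATE_PKRU     (1ull << 9)

/* Hypothetical, simplified view of a guest's featureset. */
struct featureset {
    bool xsave, avx, mpx, avx512f, pku;
};

/* Build the mask from what the guest is offered, never from raw
 * hardware values. */
uint64_t xfeature_mask(const struct featureset *fs)
{
    uint64_t mask;

    if (!fs->xsave)
        return 0;

    mask = XSTATE_FP | XSTATE_SSE;
    if (fs->avx)
        mask |= XSTATE_YMM;
    if (fs->mpx)
        mask |= XSTATE_BNDREGS | XSTATE_BNDCSR;
    if (fs->avx512f)
        mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM;
    if (fs->pku)
        mask |= XSTATE_PKRU;
    return mask;
}
```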
Reported-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Luwei Kang <luwei.kang@intel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 3 Jun 2016 13:28:10 +0000 (15:28 +0200)]
VMX: relax incoming BNDCFGS check
Accepting zero here even when !cpu_has_mpx makes the restore side
symmetric to the save logic (which avoids saving the value if zero),
i.e. makes either side independent of the logic on the other side.
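The symmetry can be sketched as two small predicates. The names are hypothetical, and further validity checks on non-zero values (reserved bits, address canonicality) are deliberately omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Save side: a zero value carries no information and may be skipped. */
bool bndcfgs_should_save(uint64_t bndcfgs)
{
    return bndcfgs != 0;
}

/* Restore side: zero is always acceptable, even without MPX, so a
 * stream that does include a zero record is not rejected. Non-zero
 * values still require MPX. */
bool bndcfgs_restore_ok(bool cpu_has_mpx, uint64_t bndcfgs)
{
    if (bndcfgs == 0)
        return true;
    return cpu_has_mpx;
}
```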
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 2 Jun 2016 13:19:00 +0000 (14:19 +0100)]
xen/arm: Don't free p2m->root in p2m_teardown() before it has been allocated
If p2m_init() didn't complete successfully, (e.g. due to VMID
exhaustion), p2m_teardown() is called and unconditionally tries to free
p2m->root before it has been allocated. free_domheap_pages() doesn't
tolerate NULL pointers.
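The shape of the fix, as a sketch with stand-in types: teardown must cope with a structure whose initialisation stopped part-way. strict_free() models a free routine that must never be handed NULL; it and the struct are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the p2m state. */
struct sketch_p2m {
    void *root;
};

int strict_free_calls;

/* Models a free routine that does not tolerate NULL. */
void strict_free(void *p)
{
    assert(p != NULL);
    strict_free_calls++;
    free(p);
}

/* Safe to call after a failed or partial init: root is freed only
 * if it was actually allocated. */
void sketch_p2m_teardown(struct sketch_p2m *p2m)
{
    if (p2m->root)
        strict_free(p2m->root);
    p2m->root = NULL;
}
```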
This is XSA-181
Reported-by: Aaron Cornelius <Aaron.Cornelius@dornerworks.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
tmem: Move bulk of tmem control functions in its own file.
The functionality that is related to migration is left inside
tmem.c. The list of control operations that are in tmem_control
with XEN_SYSCTL_TMEM_OP prefix are:
tmem: Move global_ individual variables in a global structure.
Put them all in one structure to make it easier to
figure out what can be removed. The structure is called
'tmem_global' as it will be eventually non-static.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
tmem: Wrap atomic_t in struct tmem_statistics as well.
The macros atomic_inc_and_max and atomic_dec_and_assert
also use 'stats' to access them. Had to open-code
access to pool->pgp_count as it would not work anymore.
No functional change.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
s/\.xsplice/\.livepatch/
s/XSPLICE/LIVEPATCH/
s/xsplice/livepatch/
s/livepatch_patch_func/livepatch_func/
s/xSplice/Xen Live Patch/
s/livepatching/livepatch/
s/arch_livepatch_enter/arch_livepatch_quiesce/
s/arch_livepatch_exit/arch_livepatch_revive/
And then modify some of the function arguments
to have two more characters.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 2 Jun 2016 15:10:32 +0000 (16:10 +0100)]
libxl: Document ~/serial/ correctly
xenstore-paths.markdown talked about ~/device/serial/, but that's not
used.
(It is very wrong for this value, which contains a driver domain
filesystem path, to be in the guest's area of xenstore. However, it
is only ever created by libxl and read by xenconsoled. When it is
created, it inherits the read-only permissions of /local/domain/DOMID.
So there is no security bug.)
This is a followup to XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 2 Jun 2016 15:10:30 +0000 (16:10 +0100)]
libxl: Cleanup: Have libxl__alloc_vdev use /libxl
When allocating a vdev for a new disk, look in /libxl/device, rather
than the frontends directory in xenstore.
This is more in line with the other parts of libxl, which ought not to
trust frontends. In this case, though, there is no security bug prior
to this patch because the frontend is the toolstack domain itself.
If libxl__alloc_vdev were ever changed to take a frontend domain
argument, this patch will fix a latent security bug.
This is a followup to XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 5 May 2016 15:17:26 +0000 (16:17 +0100)]
libxl: Do not trust backend for vusb
Read the type from /libxl, rather than the backend. (We still trust
the backend for details such as the number of ports, etc.; these are
not a security problem.)
In getinfo, use the computed frontend path, and the incoming domid,
rather than needlessly reading these values from the backend.
This is part of XSA-178.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: New patch following rebase.
Ian Jackson [Wed, 4 May 2016 15:59:38 +0000 (16:59 +0100)]
libxl: Do not trust backend in channel list
Read the name from /libxl/device. Pass the /libxl path to
libxl__device_channel_from_xenstore.
This removes the final route by which READ_LIBXLDEV might receive a
backend path.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Remove be_path variable which is now no longer used.
Ian Jackson [Wed, 4 May 2016 15:23:57 +0000 (16:23 +0100)]
libxl: Do not trust backend for nic in list
libxl_device_nic_list should use the /libxl path to search for
devices, and for obtaining the device information.
The "type" parameter was always "vif". Abolish it. (In any case,
paths in /libxl/device are named after the frontend type which is
constant, not the backend type which might in future vary.)
Abolish a redundant store to pnic->backend_domid. Before this commit,
that store was not needed because libxl_device_nic_init (called by
libxl__device_nic_from_xenstore) would zero it. Now it overwrites the
correct backend domid with zero; so remove it.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Tue, 3 May 2016 14:40:18 +0000 (15:40 +0100)]
libxl: Have READ_LIBXLDEV use libxl_path rather than be_path
Fix the just-introduced bug in this macro: now it reads the
trustworthy libxl_path. Change the variable name in the two functions
(nic and channel) which use it.
Shuffling the bump in the carpet along, we now introduce three new
bugs: the three call sites pass a backend path where a frontend path
is expected.
No functional change.
This is part of XSA-178.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Wed, 4 May 2016 15:07:02 +0000 (16:07 +0100)]
libxl: Rename READ_BACKEND to READ_LIBXLDEV
We are going to want to change all the functions that use READ_BACKEND
to get untrustworthy information from the backend, to use trustworthy
information from /libxl.
This will involve replacing READ_BACKEND, which reads from be_path,
with a similar macro READ_LIBXLDEV, which reads from libxl_path.
The macro name change generates a lot of clutter in the diff. So we
break it out into this separate patch. Here, we rename the macro, but
the implementation does not really match the new name.
So, another way to look at this is that we have transformed the bug:
* All of the backends use READ_BACKEND, which is unsafe
into the new bug:
* READ_LIBXLDEV actually reads be_path, which is unsafe.
There is no functional change as yet.
This is part of XSA-178.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Wed, 4 May 2016 15:18:36 +0000 (16:18 +0100)]
libxl: Rename libxl__device_{nic,channel}_from_xs_be to _from_xenstore
We are going to change these functions to expect, and be passed, a
/libxl path. So it is wrong that they are called _from_xs_be.
Neither function reads anything which isn't found in both places, so
we can and will change the call sites later.
The only remaining function in libxl called *_from_xs_be relates to
PCI devices, for which the backend domain is hardcoded to 0 throughout
libxl_pci.c.
No functional change.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 29 Apr 2016 17:29:45 +0000 (18:29 +0100)]
libxl: Do not trust backend for disk; fix driver domain disks list
Rework libxl__device_disk_from_xs_be (which takes a backend path) into
libxl__device_disk_from_xenstore (which takes a libxl path).
libxl__device_disk_from_xenstore now finds the backend path itself,
although it doesn't use it any more for most of its functions. We
rename the variable from be_path to backend_path to make sure we
didn't miss any cases.
All the data collection is now done by reading from the copy in
/libxl.
libxl_device_disk_list and its helper libxl__append_disk_list (which
used to be libxl__append_disk_list_of_type) need extensive rework,
because they now need to specify the /libxl path rather than the
backend path.
To do that they enumerate disks by looking in the appropriate area in
/libxl. Previously they scanned various of the backend directories in
dom0 (which was broken for driver domains). It is no longer necessary
to enumerate the various disk backends, because they all use the same
paths in /devices. libxl__device_disk_from_xenstore will parse the
type out of the backend path, for itself. (Indeed, it did so before -
the now-gone type parameter to libxl__append_disk_list_of_type wasn't
used other than to construct the directory to list.)
Finally, remove a redundant store to pdisk->backend_domid in
libxl__append_disk_list[_of_type]. Even before this commit, that
store was not needed because libxl_device_disk_init (called by
libxl__device_disk_from_xenstore) would zero it. Now it overwrites
the correct backend domid with zero; so remove it.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Also fix up COLO reads, following rebase
Ian Jackson [Fri, 29 Apr 2016 15:23:35 +0000 (16:23 +0100)]
libxl: Do not trust backend for disk eject vdev
For disk eject, use configured vdev from /libxl, not backend.
The backend directory is writeable by driver domains. This means that
a malicious driver domain could cause libxl to see a wrong vdev,
confusing the user or the toolstack.
Use the vdev from the /libxl space, rather than the backend.
For convenience, we read the vdev from the /libxl space into the evg
during setup and copy it on each event, rather than reading it afresh
each time (which would in any case involve generating or saving a copy
of the relevant /libxl path).
This is part of XSA-178.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 29 Apr 2016 15:57:14 +0000 (16:57 +0100)]
libxl: Do not trust backend for vtpm in getinfo (uuid)
Use uuid from /libxl, rather than from backend. I think the backend
is not supposed to change the uuid, since it seems to be set by libxl
during setup.
If in fact the backend is supposed to be able to change the uuid, this
patch needs to be dropped and replaced by a patch which makes the vtpm
uuid lookup tolerate bad or missing data.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 29 Apr 2016 16:18:44 +0000 (17:18 +0100)]
libxl: Do not trust backend for vtpm in getinfo (except uuid)
* Do not check the backend for existence. We have already read the
/libxl path so know that the vtpm exists (or is supposed to); if the
backend doesn't exist then that must be the backend's doing.
* Get the frontend path from the /libxl directory.
* The frontend domid is the guest domid, and does not need to be read
from xenstore (!)
We still attempt to read the uuid from the backend. This will be
fixed in the next patch.
This is part of XSA-178.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 29 Apr 2016 15:19:28 +0000 (16:19 +0100)]
libxl: Make copy of every xs backend in /libxl in _generic_add
We want to stop libxl trustingly reading information from the backend
directory (since this is, of course, writeable by the backend, which
might be a semi-trusted driver domain).
In principle it is wrong in current libxl for anything to try to
divine virtual device configuration from xenstore: the JSON domain
config ought to supply that, and xenstore should only tell us which
devices actually exist.
However:
Firstly, there are several existing places where configuration
information is retrieved from xenstore rather than JSON. We do not
want to reengineer this in a security patch.
Secondly, we want to make a security patch which can be backported to
versions of libxl without the JSON configuration machinery.
So we take the expedient approach of keeping a copy of the
configuration somewhere we trust, namely /libxl. This is obviously
fairly low-risk, although it does write significantly more keys in
xenstore.
In this patch we make this change in libxl__device_generic_add. This
is responsible for actually writing the vast majority of device
information to xenstore. There are a few loose ends which will be
dealt with in a moment.
Likewise, changes to readers to use the new location will appear in
further patches.
This is part of XSA-178.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
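The "keep a trusted copy" idea in the commit above can be illustrated with a toy model. This is only a sketch, not libxl code: `toy_store`, `toy_write`, `toy_read` and `toy_device_add` are hypothetical names, the flat table merely stands in for xenstore, and permissions are not modelled at all. It shows each key being written twice, once under the backend directory and once under a /libxl directory, so that a later reader can ignore whatever the backend has since changed.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_STR   96

/* Toy stand-in for xenstore: a flat path -> value table (hypothetical). */
struct toy_store {
    char path[MAX_NODES][MAX_STR];
    char value[MAX_NODES][MAX_STR];
    int used;
};

/* Create or overwrite the node "<dir>/<key>". Returns 0 on success. */
static int toy_write(struct toy_store *s, const char *dir,
                     const char *key, const char *value)
{
    char full[MAX_STR];
    int i, n;

    n = snprintf(full, sizeof(full), "%s/%s", dir, key);
    if (n < 0 || (size_t)n >= sizeof(full) || strlen(value) >= MAX_STR)
        return -1;

    for (i = 0; i < s->used; i++)
        if (!strcmp(s->path[i], full))
            break;

    if (i == s->used) {
        if (s->used == MAX_NODES)
            return -1;
        strcpy(s->path[i], full);
        s->used++;
    }
    strcpy(s->value[i], value);
    return 0;
}

/* Return the value stored at "<dir>/<key>", or NULL if there is none. */
static const char *toy_read(const struct toy_store *s, const char *dir,
                            const char *key)
{
    char full[MAX_STR];
    int i, n;

    n = snprintf(full, sizeof(full), "%s/%s", dir, key);
    if (n < 0 || (size_t)n >= sizeof(full))
        return NULL;

    for (i = 0; i < s->used; i++)
        if (!strcmp(s->path[i], full))
            return s->value[i];
    return NULL;
}

/* Write the key where the backend looks AND in the toolstack's own copy. */
static int toy_device_add(struct toy_store *s, const char *be_path,
                          const char *libxl_path, const char *key,
                          const char *value)
{
    if (toy_write(s, be_path, key, value))
        return -1;
    return toy_write(s, libxl_path, key, value);
}
```

In this model a later `toy_write` to the backend directory (standing in for a misbehaving driver domain) changes only the backend copy; `toy_read` on the /libxl directory still returns the value written at setup time.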
Ian Jackson [Thu, 5 May 2016 15:17:18 +0000 (16:17 +0100)]
libxl: Do not trust frontend for vusb
Do not use the frontend directory for enumerating the vusb devices;
since the frontend could delete them, this could result in devices
being lost and not torn down, etc. Instead, use the /libxl directory
for enumeration. So:
* Replace vusb_be_from_xs_fe with vusb_be_from_xs_libxl
* Change the call sites
* Change various places that use the dompath to use libxl_dom_path
* Rename some `path' variables appropriately (to spot any missed updates)
* Parse backend domid out of backend path rather than reading it from
the frontend (several places)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v3: Whitespace adjustment to parameter list indentation
v2: New patch, following rebase
Ian Jackson [Tue, 3 May 2016 16:24:32 +0000 (17:24 +0100)]
libxl: Do not trust frontend for channel in getinfo
libxl_device_channel_getinfo needs to examine devices without trusting
frontend-controlled data. So:
* Use /libxl to find the backend path.
* Parse the backend path to find the backend domid, rather than
reading it from the frontend.
* Tolerate FRONTEND/tty vanishing.
Note that there is a strange off-by-one error in the computation of
both fe_path and libxl_path in libxl_device_channel_getinfo: the
incoming channel->devid, which is copied to channelinfo->devid, has +1
applied to calculate the frontend path (and, after this patch, the
libxl path). I.e., the devid passed to libxl_device_channel_getinfo
must be one less than the actual devid for the device being asked
about.
This is actually a bug which mirrors a bug in
libxl__append_channel_list, which fills in the devids of the channel
devices it finds with sequentially increasing numbers starting at 0.
In the usual case channels have real devids starting at 1 (because
there is the console, which is devid 0, but not a channel). So these
bugs usually cancel out.
We do not address this problem at this time. This bug does not have
any security implications.
This patch is part of XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
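The way the two devid bugs described above cancel out can be shown with a small arithmetic model. This is a sketch under stated assumptions, not the libxl implementation: the `toy_*` functions are hypothetical, the list side is modelled as handing out 0, 1, 2, ... and the getinfo side as adding 1 before building its path.

```c
#include <stddef.h>

/* Toy model of the list side: devids are handed out as 0, 1, 2, ... */
static void toy_list_assign_devids(int *reported, size_t n)
{
    for (size_t i = 0; i < n; i++)
        reported[i] = (int)i;
}

/* Toy model of the getinfo side: +1 is applied before building the path. */
static int toy_getinfo_path_devid(int reported_devid)
{
    return reported_devid + 1;
}

/*
 * Count how many listed channels would be looked up under their real devid.
 * At most 16 entries are considered in this toy model.
 */
static size_t toy_count_matches(const int *real_devids, size_t n)
{
    int reported[16];
    size_t matches = 0;

    if (n > 16)
        n = 16;

    toy_list_assign_devids(reported, n);
    for (size_t i = 0; i < n; i++)
        if (toy_getinfo_path_devid(reported[i]) == real_devids[i])
            matches++;

    return matches;
}
```

With real devids 1, 2, 3 (the console occupying devid 0) every lookup in this model lands on the right device. A non-contiguous numbering such as 1, 3 is a hypothetical input for which the toy model stops cancelling.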
Ian Jackson [Tue, 3 May 2016 16:01:56 +0000 (17:01 +0100)]
libxl: Do not trust frontend for channel in list
libxl_device_channel_list should not trust frontend-provided data.
So it needs to iterate using the /libxl paths, and read the backend
path out of /libxl.
However, it also filters out pure "consoles", which are channels
without a "name". But the name was stored only in the frontend
directory, which the frontend can delete.
So store the name in the backend too. (Ideally we would store it in
/libxl, where the backend can't write to it either, but
libxl__device_console_add does not currently have access to the xenstore
transaction used by libxl__device_generic_add. Protection against the
backend will come later, in XSA-178.)
Because the libxl paths are defined to be in terms of the frontend
device types, not the backend device types, it is no longer correct
for libxl__append_channel_list to take a type argument. Abolish this
(with no functional effect).
This is part of XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Tue, 3 May 2016 14:58:32 +0000 (15:58 +0100)]
libxl: Do not trust frontend for vtpm list
libxl_device_vtpm_list needs to enumerate and identify devices without
trusting frontend-controlled data. So:
* Use the /libxl path to enumerate vtpms.
* Use the /libxl path to find the corresponding backends.
* Parse the backend path to find the backend domid.
This is part of XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 29 Apr 2016 18:21:51 +0000 (19:21 +0100)]
libxl: Do not trust frontend for disk in getinfo
* Rename the frontend variable to `fe_path' to check we caught them all
* Read the backend path from /libxl, rather than from the frontend
* Parse the backend domid from the backend path, rather than reading it
from the frontend (and add the appropriate error path and initialisation)
This is part of XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
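Parsing the backend domid out of the backend path, as several of these commits do, might look roughly like the following. This is a minimal sketch and not the code from the patch: `parse_backend_domid` is a hypothetical helper, and the path layout `/local/domain/<domid>/backend/...` is an assumption about the usual xenstore arrangement.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract <domid> from a backend path of the assumed
 * form /local/domain/<domid>/backend/...
 * Returns 0 on success, -1 if the path does not have that shape.
 */
static int parse_backend_domid(const char *be_path, unsigned int *domid_out)
{
    static const char prefix[] = "/local/domain/";
    static const char suffix[] = "/backend/";
    const char *p;
    char *end;
    unsigned long val;

    if (!be_path || strncmp(be_path, prefix, sizeof(prefix) - 1))
        return -1;

    p = be_path + sizeof(prefix) - 1;

    /* Insist on a digit so strtoul cannot accept a sign or whitespace. */
    if (!isdigit((unsigned char)*p))
        return -1;

    errno = 0;
    val = strtoul(p, &end, 10);
    if (errno || val > UINT_MAX)
        return -1;

    /* The number must be followed directly by "/backend/". */
    if (strncmp(end, suffix, sizeof(suffix) - 1))
        return -1;

    *domid_out = (unsigned int)val;
    return 0;
}
```

The point of the design is that the path itself comes from /libxl, so the domid derived from it does not depend on anything the frontend can rewrite. The explicit failure return corresponds to the "appropriate error path" the commit message mentions.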
Ian Jackson [Wed, 27 Apr 2016 15:08:49 +0000 (16:08 +0100)]
libxl: Do not trust frontend for disk eject event
Use the /libxl path for interpreting disk eject watch events: do not
read the backend path out of the frontend. Instead, use the version
in /libxl. That avoids us relying on the guest-modifiable
$frontend/backend pointer.
To implement this we store the path
/libxl/$guest/device/vbd/$devid/backend
in the evgen structure.
This is part of XSA-175.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
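Building the /libxl/$guest/device/vbd/$devid/backend path named above could be sketched as follows. `format_libxl_vbd_backend_ptr` is a hypothetical helper written for illustration; libxl itself allocates such strings through its own machinery, which is not shown here.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the path
 *   /libxl/$guest/device/vbd/$devid/backend
 * into buf. Returns 0 on success, -1 on error or if buf is too small.
 */
static int format_libxl_vbd_backend_ptr(char *buf, size_t len,
                                        unsigned int guest_domid, int devid)
{
    int n = snprintf(buf, len, "/libxl/%u/device/vbd/%d/backend",
                     guest_domid, devid);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= len)
        return -1;

    return 0;
}
```

Storing this path once at setup time, rather than following the guest-modifiable $frontend/backend pointer on each event, is what keeps the lookup independent of the guest.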