Don Slutz [Fri, 2 May 2014 20:18:08 +0000 (16:18 -0400)]
hvm/hpet: Detect comparator values in the past
This statement only works using 64-bit arithmetic for the main
63
counter never changing by more then 2 . (Which is a boundary
case that should not happen in my life time.)
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Tim Deegan <tim@xen.org>
Don Slutz [Fri, 2 May 2014 20:18:07 +0000 (16:18 -0400)]
hvm/hpet: Prevent master clock equal to comparator while enabled
Based on the software-developers-hpet-spec-1-0a.pdf, the comparator
for a periodic timer will change to the new value when it matches
the master clock. The current code here uses a very standard
rounding formula of "((x + y - 1) / y) * y". This is wrong because
in this case you need to go to the next comparator value when "x"
equals "y". Not when "x + 1" equals "y". In this case "y" is the
period and "x" is the master clock.
Don Slutz [Fri, 2 May 2014 20:18:06 +0000 (16:18 -0400)]
hvm/hpet: comparator can only change when master clock is enabled.
This is based on software-developers-hpet-spec-1-0a.pdf saying:
When the main counter value matches the value in the timer's
comparator register, an interrupt can be generated. The hardware
will then automatically increase the value in the compare register
by the last value written to that register.
When the overall enable is off (the main count is halted), none of
the compare registers should change.
Don Slutz [Fri, 2 May 2014 20:18:05 +0000 (16:18 -0400)]
hvm/hpet: Init comparator64 like comparator.
The software-developers-hpet-spec-1-0a.pdf says that the comparator
starts as all 1's. Also make the hidden register comparator64 the same.
Since only the hidden register comparator64 is used by hpet_save, it
needs to start out with the right value.
A disabled hpet (like when a guest is starting), should start with
the value the spec says. Both the guest (via reading the
comparator) and an administrator using xen-hvmctx, will see all 0's
not all 1's.
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Don Slutz [Fri, 2 May 2014 20:18:04 +0000 (16:18 -0400)]
hvm/hpet: In hpet_save, call hpet_get_comparator.
This changes save data to consistent/expected values. It is not
technically required because hpet_get_comparator() will adjust from
any value to the correct value. And hpet_get_comparator() is
effectivly called in hpet_load via hpet_set_timer.
However it does look strange to people that the output from
xen-hvmctx for the comparator values do not change when the master
clock does.
The software-developers-hpet-spec-1-0a.pdf says that the comparator
will allways be greater than master clock for a periodic timer.
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Don Slutz [Fri, 2 May 2014 20:18:02 +0000 (16:18 -0400)]
hvm/hpet: Correctly limit period to a maximum.
In the code section after the comment:
/*
* Clamp period to reasonable min/max values:
* - minimum is 100us, same as timers controlled by vpt.c
* - maximum is to prevent overflow in time_after() calculations
*/
The current maximum limit actually allows "bad" values like 0 and 1.
This is because it uses a mask not a maximum.
Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Don Slutz [Fri, 2 May 2014 20:18:00 +0000 (16:18 -0400)]
hvm/hpet: Only call guest_time_hpet(h) one time per action.
This call is expensive and will cause extra time to pass.
The software-developers-hpet-spec-1-0a.pdf does not say how long it
takes after the main clock is enabled before the first change of the
master clock. Therefore multiple calls to guest_time_hpet(h) are
not needed. Since each timer is started by a loop, each ones start
time will change on the multple calls. In the real hardware, there
is not delta based on which timer.
Without this change it is possible for an HVM guest running linux to
get the message:
..MP-BIOS bug: 8254 timer not connected to IO-APIC
On the guest console(s); and the guest will panic.
Don Slutz [Fri, 2 May 2014 20:17:59 +0000 (16:17 -0400)]
hvm/hpet: Add manual unit test code.
Add the code at tools/tests/vhpet.
See comment in tools/tests/vhpet/main.c for details on running
either in a xen source tree or elsewhere.
A basic in source tree usage is:
make -C tools/tests/vhpet run
Does repro the bug:
..MP-BIOS bug: 8254 timer not connected to IO-APIC
The make file includes coping hpet.c and hpet.h from the source
tree. hpet.c is then modifed to remove all include file and add the
emul.h include file.
The manual test code has only a few automatic checks that output
messages to stderr:
1) Possible ..MP-BIOS bug: 8254 timer...
if 1st period is not <= the expected value
2) hpet_set_mode(%ld): T%d Error: Set ...
if read of comparator != write of comparator in
3) hpet_check_stopped(%ld): T%d Error: Set ...
if read != write
4) main(%ld): With clock stopped mc64 changed: ...
if hpet_save returns different master clock values when called
more then once.
It also generates a lot of output, which is why the sugested way to
use includes a redirect of stdout to a file.
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Tim Deegan <tim@xen.org>
Matthew Daley [Sun, 4 May 2014 08:31:47 +0000 (20:31 +1200)]
xenstat: don't leak memory in getBridge
getBridge's method of returning a result was a little confused:
allocating a result buffer but never using it.
Simplify by instead allowing a result buffer to be passed in and
modifying the single usage to match.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Matthew Daley [Sun, 4 May 2014 08:31:46 +0000 (20:31 +1200)]
xenstat: fix unsigned less-than-0 comparison
Commit 1438d36f ("xenstat: Fix buffer over-run with new_domains being
negative.") attempted to fix the handling of a negative error result
from xc_domain_getinfolist in xenstat_get_node. However, it forgot to
change the result variable from an unsigned type to a signed one.
Do so, allowing the error result to be handled properly.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Mon, 5 May 2014 13:30:28 +0000 (15:30 +0200)]
tools/libxl: add direct_io_safe to check-xl-disk-parse
Add missing bool "direct_io_safe" to expected output. It was added by
Commit 6ec48cf4 ("libxl: introduce an option for disabling the
non-O_DIRECT workaround"), but check-xl-disk-parse was not updated.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 7 May 2014 14:36:11 +0000 (16:36 +0200)]
x86: reduce redundancy in tsc_[gs]et_info()
- some of the case statements are effectively or mostly special cases
of others, so there's no good reason not to share the code
- in the "get" function, a variable can be made case-wide instead of
having multiple instance of it (and those even with a pointless
initializer)
- minor formatting adjustments
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Feng Wu [Tue, 6 May 2014 11:51:27 +0000 (13:51 +0200)]
x86: define macros CPUINFO_features and CPUINFO_FEATURE_OFFSET
This patch defines macros CPUINFO_features and CPUINFO_FEATURE_OFFSET.
CPUINFO_features can be used as the base of the offset for cpu features,
while CPUINFO_FEATURE_OFFSET is used to define the right offset for
specific CPU feature.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Some further cleanup (both to the patch and to surrounding code).
Jan Beulich [Tue, 6 May 2014 11:30:31 +0000 (13:30 +0200)]
NPT: temporarily retain page table mapping in do_recalc()
Commit b3e024f3 ("x86/NPT: don't walk page tables when changing types
on a range") neglected the fact that p2m_next_level() replaces the
previous level's mapping with the new level's one, hence dereferencing
a stale pointer the translation for which may no longer be available
(timing dependent). Add a parameter to that function allowing the
caller to request that the mapping be retained (the unmapping will be
taken care of by the caller then).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
libxl: introduce an option for disabling the non-O_DIRECT workaround
Document and implement a new option that permits disk backends which
would otherwise have to avoid O_DIRECT (because of the network memory
lifetime bug) to use it anyway. This is:
direct-io-safe in the xl domain disk config specification
direct_io_safe in the libxl disk API
direct-io-safe in the backend xenstore interface
Add a reference to xen/include/public/io/blkif.h in
docs/misc/vbd-interface.txt.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Felipe Franciosi <felipe@paradoxo.org>
Ian Jackson [Fri, 2 May 2014 16:47:55 +0000 (17:47 +0100)]
libxl: Rerun bison
This updates libxlu_cfg_y.[ch] to code generated by bison from
Debian wheezy (1:2.5.dfsg-2.1 i386).
There should be no functional change since there is no change to the
source file, but we will inherit bugfixes and behavioural changes from
the new version of bison. So this is more a matter of hope than
knowledge.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 24 Apr 2014 21:06:27 +0000 (22:06 +0100)]
tools/mfn-dump: Fixes to 'dump-p2m'
* Don't walk off the end of p2m_table under the mistaken impression that it
contains toolstack unsigned longs. Despite its array type it contains guest
unsigned longs so unconditionally needs casting to the guest width to use
correctly. Furthermore, a 64bit toolstack must be extra careful when it
finds a 32bit guest's INVALID_MFN.
* Drop 'mapped' and 'pinned' descriptions. This are both bogus, including all
uses of the is_mapped() macro.
* Rearrange the type name printing to be more concise.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 24 Apr 2014 21:17:57 +0000 (22:17 +0100)]
tools/misc: Fix linkage of libxenstore
* xen-mfndump doesn't use xenstore at all. Don't link against it.
* xen-hptool can include the correct header rather than externing itself a
single function.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Mon, 28 Apr 2014 23:29:10 +0000 (19:29 -0400)]
vtpmmgr: properly remove t_uint size dependency
Rather than using the internal MPI format for the Diffie-Hellman group,
whose representation depends on the size of the t_uint type, store the
value as a big-endian integer and use mpi_read_binary to convert it in
an architecture-independent manner. This patch also removes the
unnecessary range check on the exponent which ended up being different
between 32- and 64-bit code.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxl: add support for OS-specific names to backend interfaces
libxl__device_nic_devname used to hardcode backend network interfaces
as "vif<domid>.<handle>", remove this limitation and allow libxl to
deal with OS-specific interface names.
xen/arm: IRQ: Do not allow IRQ to be shared between domains and XEN
The current dt_route_irq_to_guest implementation sets IRQ_GUEST even if the
IRQ is correctly setup.
An IRQ can be shared between devices, if the devices are not assigned to the
same domain or Xen, then this could result in routing the IRQ to the domain
instead of Xen ...
Also avoid to relying on wrong the behaviour when Xen is routing an IRQ to
DOM0. Therefore check the return code from route_dt_irq_to_guest in
map_device.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: IRQ: Defer routing IRQ to Xen until setup_irq() call
When an IRQ is handling by Xen, setup is done in 2 steps:
- Route the IRQ to the current CPU and set priorities
- Set up the handler
For PPIs, these steps are called on every cpu. For SPIs, they are only called
on the boot CPU.
Dividing the setup in two step complicates the code when a new driver is
added to Xen (for instance a SMMU driver). Xen can safely route the IRQ
when the driver sets up the interrupt handler.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: IRQ: Require desc.lock be held by callers of hw_irq_controller callbacks
When multiple action are supported, gic_irq_{startup,shutdown} will have
to be called in the same critical section as setup/release.
Otherwise there is a race condition if at the same time CPU A is calling
release_dt_irq and CPU B is calling setup_dt_irq.
This could end up with the IRQ not being enabled.
At the same time, modify gic_irq_{enable,disable} to require desc.lock be held.
With both of theses changes, ARM's locking requirements is the same as x86's.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
This function retrieves a domain from an IRQ. It will be used in several
places (such as do_IRQ) to avoid duplicated code when multiple action will be
supported.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: IRQ: Move IRQ management from gic.c to irq.c
The file gic.c contains functions and variables which is not related to the GIC:
- release_irq
- setup_irq
- gic_route_irq_to_guest
- {,local_}irq_desc
Move all theses functions/variables in irq.c
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: IRQ: move gic {, un}lock in gic_set_irq_properties
The function gic_set_irq_properties is only called in two places:
- gic_route_irq: the gic.lock is only taken for the call to the
former function.
- gic_route_irq_to_guest: the gic.lock is taken for the duration of
the function. But the lock is only useful when gic_set_irq_properties.
So we can safely move the lock in gic_set_irq_properties and restrict the
critical section for the gic.lock in gic_route_irq_to_guest.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: timer: replace timer_dt_irq by timer_get_irq
The function is nearly only used to retrieve the IRQ number.
There is one place where the IRQ type is used (in domain_build.c) but
as the timer IRQ is virtualised for guest we might not have the same property
(e.g active-low level sensitive interrupt).
Replace timer_dt_irq by timer_get_irq which will return the IRQ number.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Fri, 2 May 2014 10:09:48 +0000 (12:09 +0200)]
domctl: perform initial post-XSA-77 auditing
In a number of cases, loops over each vCPU in a domain are involved
here. For large numbers of vCPU-s these may still take some time to
complete, but we're limiting them at a couple of thousand at most, so I
would think this should not by itself be an issue. I wonder though
whether it shouldn't be possible to have XSM restrict the vCPU count
that can be set through XEN_DOMCTL_max_vcpus.
XEN_DOMCTL_pausedomain:
A loop over vcpu_sleep_sync() for each of vCPU in the domain. That
function itself has a loop waiting for the subject vCPU to become non-
runnable, which ought to complete quickly (involving an IPI to be sent
and acted on). No other unbounded resource usage.
XEN_DOMCTL_unpausedomain:
Simply a loop calling vcpu_wake() (not having any loops or other
resource usage itself) for each of vCPU in the domain.
XEN_DOMCTL_getdomaininfo:
Two loops (one over all domains, i.e. bounded by the limit of 32k
domains, and another over all vCPU-s in the domain); no other
unbounded resource usage.
XEN_DOMCTL_getpageframeinfo:
Inquiring just a single MFN, i.e. no loops and no other unbounded
resource usage.
XEN_DOMCTL_getpageframeinfo{2,3}:
Number of inquired MFNs is limited to 1024. Beyond that just like
XEN_DOMCTL_getpageframeinfo.
XEN_DOMCTL_getvcpuinfo:
Only obtaining information on the vCPU, no loops or other resource
usage.
XEN_DOMCTL_setdomainhandle:
Simply a memcpy() of a very limited amount of data.
XEN_DOMCTL_setdebugging:
A domain_{,un}pause() pair (see XEN_DOMCTL_{,un}pausedomain) framing
the setting of a flag.
XEN_DOMCTL_hypercall_init:
Initializing a guest provided page with hypercall stubs. No other
resource consumption.
XEN_DOMCTL_arch_setup:
IA64 leftover, interface structure being removed from the public
header.
XEN_DOMCTL_settimeoffset:
Setting a couple of guest state fields. No other resource consumption.
Involve temporary memory allocations (approximately) bounded by the
number of CPUs in the system / number of nodes built for, which is
okay. Beyond that trivial operation.
XEN_DOMCTL_real_mode_area:
PPC leftover, interface structure being removed from the public
header.
XEN_DOMCTL_resumedomain:
A domain_{,un}pause() pair framing operation very similar to
XEN_DOMCTL_unpausedomain (see above).
XEN_DOMCTL_sendtrigger:
Injects an interrupt (SCI or NMI) without any other resource
consumption.
XEN_DOMCTL_subscribe:
Updates the suspend event channel, i.e. affecting only the controlled
domain.
MAX_CPUID_INPUT bounded loop, which is okay. No other resource
consumption.
XEN_DOMCTL_get_machine_address_size:
Simply obtaining the value set by XEN_DOMCTL_set_machine_address_size
(or the default set at domain creation time).
XEN_DOMCTL_gettscinfo:
XEN_DOMCTL_settscinfo:
Reading/writing of a couple of guest state fields wrapped in a
domain_{,un}pause() pair.
XEN_DOMCTL_audit_p2m:
Enabled only in debug builds.
XEN_DOMCTL_set_max_evtchn:
While the limit set here implies other (subsequent) resource usage,
this is the purpose of the operation.
I also verified that all removed domctls' handlers don't leak
hypervisor memory contents .
Inspected but questionable (and hence left in place for now):
XEN_DOMCTL_max_mem:
While only setting the field capping a domain's allocation (this
implies potential successive resource usage, but that's the purpose of
the operation). However, XSM doesn't see the value that's being set
here, so the net effect would be potential unbounded memory use.
XEN_DOMCTL_set_virq_handler:
This modifies a global array. While that is the purpose of the
operation, if multiple domains are granted permission they can badly
interfere with one another. Hence I'd appreciate a second opinion
here. [Andrew confirms that this being the nature of the operation,
it's fine to be removed from the list - will be done in a 2nd round.]
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 10:09:03 +0000 (12:09 +0200)]
x86: fix guest CPUID handling
The way XEN_DOMCTL_set_cpuid got handled so far allowed for surprises
to the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.
While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Boris Ostrovsky [Fri, 2 May 2014 10:03:36 +0000 (12:03 +0200)]
libxc: allow changing max number of hypervisor cpuid leaves
Add support for changing max number of hypervisor leaves from configuration
file.
This number can be specified using xl's standard 'cpuid' option. Only lowest
8 bits of leaf's 0x4000xx00 eax register are processed, all others are ignored.
The changes allow us to revert commit 80ecb40362365ba77e68fc609de8bd3b7208ae19
which is most likely no longer needed now anyway (Solaris bug that it addressed
has been fixed and backported to earlier releases) but leave possibility of
running unpatched version of Solaris by forcing number of leaves to 2 in the
configuration file.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 2 May 2014 09:53:38 +0000 (11:53 +0200)]
x86/NPT: don't walk entire page tables when globally changing types
Instead leverage the NPF VM exit enforcement by marking just the top
level entries as needing recalculation of their type, building on the
respective range type change modifications.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 09:52:42 +0000 (11:52 +0200)]
x86/NPT: don't walk page tables when changing types on a range
This builds on the fact that in order for no NPF VM exit to occur,
_PAGE_USER must always be set. I.e. by clearing the flag we can force a
VM exit allowing us to do similar lazy type changes as on EPT.
That way, the generic entry-wise code can go away, and we could remove
the range restriction in enforced on HVMOP_track_dirty_vram for XSA-27.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 09:51:46 +0000 (11:51 +0200)]
x86/EPT: don't walk page tables when changing types on a range
This requires a new P2M backend hook and a little bit of extra care on
accounting in the generic function.
Note that even on leaf entries we must not immediately set the new
type (in an attempt to avoid the EPT_MISCONFIG VM exits), since the
global accounting in p2m_change_type_range() gets intentionally done
only after updating page tables (or else the update there would
conflict with the function's own use of p2m_is_logdirty_range()), and
the correct type can only be calculated with that in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 09:50:43 +0000 (11:50 +0200)]
x86/EPT: don't walk entire page tables when globally changing types
Instead leverage the EPT_MISCONFIG VM exit by marking just the top
level entries as needing recalculation of their type, propagating the
the recalculation state down as necessary such that the actual
recalculation gets done upon access.
For this to work, we have to
- restrict the types between which conversions can be done (right now
only the two types involved in log dirty tracking need to be taken
care of)
- remember the ranges that log dirty tracking was requested for as well
as whether global log dirty tracking is in effect
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Paul Durrant [Fri, 2 May 2014 09:46:32 +0000 (11:46 +0200)]
hvm_set_ioreq_page() releases wrong page in error path
The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
tmem_rwlock is unnecessary in tmem_relinquish_pages(), as
such lock is used as gate for hypercalls. However
tmem_relinquish_pages deals with pages that are no longer
owned by any domain - hence there is no need for tmem_rwlock.
Also the function is protected by the 'heap_lock' which
is the only calleer of this function.
This patch drops said lock.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 2 May 2014 08:56:23 +0000 (10:56 +0200)]
x86/HVM: clean up HVMOP_set_mem_type processing
- drop unused variable "mfn"
- consistently do not use "else" when the prior "if" ends in "goto"
- use printk() referencing the target domain instead of gdprintk()
(which references the current domain) and slightly shorten message
- annotate -EINVAL results in paging/shared paths to actually need
switching to -EAGAIN (possible only when preemption logic got fixed
to use -ERESTART)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 08:54:07 +0000 (10:54 +0200)]
x86/EPT: flush cache when (potentially) limiting cachability
While generally such guest side changes ought to be followed by guest
initiated flushes, we're flushing the cache under similar conditions
elsewhere (e.g. when the guest sets CR0.CD), so let's do so here too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 08:51:32 +0000 (10:51 +0200)]
x86/EPT: also force EMT re-evaluation if pinned ranges change
This was inadvertently left out of aa9114ed ("x86/EPT: force
re-evaluation of memory type as necessary"). Note that this
intentionally doesn't use memory_type_changed(): Changes to the pinned
ranges are independent of IOMMU presence, which that function uses to
determine whether to call the underlying p2m_memory_type_changed().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 2 May 2014 08:50:04 +0000 (10:50 +0200)]
x86/EPT: refine direct MMIO checking when determining EMT
With need_iommu() only ever true when iommu_enabled is also true, and
with the former getting set when a PCI device gets added to a guest,
the checks can be consolidated. The range set check are left in place
just in case raw MMIO or I/O port ranges get passed to a guest.
At once drop open-coding of cache_flush_permitted().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tim Deegan <tim@xen.org>
Julien Grall [Thu, 1 May 2014 10:54:40 +0000 (11:54 +0100)]
xen/arm: Correctly save/restore CNTKCTL_EL1
CNTKCTL_EL1 is used by the guest to control access to the timer from
userspace. It therefore needs to be save/restored by Xen as part of
the VCPU state.
By default Linux on ARM64 exposes the timer to userspace. Furthermore on
ARM64, Linux provides helpers in a VDSO (gettimeofday/__do_get_tspec)
that use the timer counter. Conversely, during CPU bring up, Xen will
set CNTKCTL_EL1 to 0 (i.e disallow timer access to the userspace). As
a result, currently, if dom0 has 1 VCPU which is migrated to another
PCPU, init might crash.
Alternatively, a guest (malicious or not) might decide to disable
access to the timer from userspace. If the register is not
save/restored, when a DOM0 VCPU runs again, a similar crash would
result.
Also, drop CNTKCTL_EL1 initialization in init_timer_interrupt. Xen
should let the guest deal with this register.
This is XSA-91 / CVE-2014-3125.
Reported-by: Chen Baozi <baozich@gmail.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
To correctly migrate a PV guest, the toolstack must remove Xen mappings from
the guest pagetables. For 32bit PV guests, the pagetables cannot be walked
from the top so upon encountering an L2 table, the toolstack must decide
whether it contains Xen mappings or not, to avoid corrupting L2s without Xen
mappings.
The migration code performs this search efficiently by knowing that the Xen
mappings will start at a known L2e and point to a known mfn, which will be the
first mfn in the m2p table.
Unfortunately there are two m2p tables in use; the regular and the
compatibility one. The toolstack looks for the first mfn of its own m2p table
in the guest pagetables. This only works if the toolstack is the same bitness
as the 32bit domain being migrated, and leaves a problem for 64bit toolstacks
which will never be able to find its regular m2p in a compat guest.
It appears that this bug for 64bit toolstacks was discovered, but hacked
around in an unsafe manner. The code currently shoots any invalid L2es and
doesn't report a failure for L2 tables in a 32 bit guest, even after the guest
is paused. This means that non Xen entries which should fail the migration
don't, and the guest will resume on the far side with unexpectedly fewer
present pagetable entries.
This patch introduces XENMEM_machphys_compat_mfn_list which permits a 64bit
toolstack to access the compat m2p mfn list, for the purpose of correctly
identifying Xen entries in a 32bit guest.
It is worth noting for completeness that 64bit PV guests don't have any of
these games to play. The Xen mappings are present at a known location in all
L4 tables, so can be safely shot by 32 and 64bit toolstacks without looking at
where the mapping points to.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 29 Apr 2014 13:18:39 +0000 (15:18 +0200)]
x86/boot: move some __high_start code and data into init sections
Half of __high_start is strictly for the BSP and will only be run once on
boot. To complement 'start_secondary', create 'start_bsp' and move it into
the init.text section.
The interrupt handler 'ignore_int' is patched into the BSPs IDT, but fully
replaced with real handlers early during boot. The BSPs IDT is used by APs
until midway through start_secondary, but after the real handlers have been
installed. Therefore, 'ignore_int' can move to init.text. Furthermore, its
strings can move to init.rodata.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 29 Apr 2014 13:11:31 +0000 (15:11 +0200)]
x86/HVM: restrict HVMOP_set_mem_type
Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem the latest when something real sat
at the end of the physical address space; I'm suspecting though that
other things might break with such bogus entries.
Along with that drop a bogus (and otherwise becoming stale) log
message.
Afaict the similar operation in p2m_set_mem_access() is safe.
This is XSA-92.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
We accumulated quite a number of these, despite having a pre-canned
interface for it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 25 Apr 2014 10:13:31 +0000 (12:13 +0200)]
passthrough: allow to suppress SERR and PERR signaling altogether
This is just to have a workaround at hand in case other chipsets (not
covered by the previous two patches) also have similar issues.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Jan Beulich [Fri, 25 Apr 2014 10:12:38 +0000 (12:12 +0200)]
VT-d: suppress UR signaling for desktop chipsets
Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the desktop chipsets dealt with here.
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Jan Beulich [Fri, 25 Apr 2014 10:11:55 +0000 (12:11 +0200)]
VT-d: suppress UR signaling for server chipsets
Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the server chipsets dealt with here.
IDs 0xe00, 0xe01, and 0xe04 ... 0xe0b (Ivytown) aren't needed here -
Intel confirmed the issue to be fixed in hardware there.
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Jan Beulich [Fri, 25 Apr 2014 10:07:22 +0000 (12:07 +0200)]
x86/P2M: simplify write_p2m_entry()
The "table_mfn" parameter really isn't needed anywhere, so it gets
dropped.
The "struct vcpu *" one was always bogus (as was being made up by
paging_write_p2m_entry()), and is not commonly used. It can be easily
enough made up in the one place (sh_unshadow_for_p2m_change()) it is
needed, and we can otherwise pass "struct domain *" instead, properly
reflecting that P2M operations are per-domain.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Daniel De Graaf [Thu, 24 Apr 2014 20:39:10 +0000 (16:39 -0400)]
vtpmmgr: fix 32-bit compilation
The internal MPI word size matches the word size of the platform rather
than using uint32_t/uint64_t, so constant MPI objects need to be
initialized with that in mind.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Wed, 23 Apr 2014 13:19:43 +0000 (14:19 +0100)]
docs: Give advice on dealing with quoting special characters
The man page mentions passing config snippets key=val in xl create.
Unfortunately, the config syntax contain characters which are
interpreted (and often discarded) by the shell before getting passed
in, resulting in a parsing error.
For example:
xl create hvm.cfg cpus="0-3"
The shell will eat the quotes and pass 'cpus=0-3' to xl, which won't
parse properly without the quotes.
Mention this in the man page, and recommend the use of single quotes, as well
as separating multiple arguments with a semicolon, thus:
xl create hvm.cfg 'cpus="0-3"; pci=["1.0","1.1"]'
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 17 Apr 2014 12:57:24 +0000 (13:57 +0100)]
xen: arm: fully implement multicall interface.
I'm not sure what I was smoking at the time of 5d74ad1a082e "xen: arm:
implement do_multicall_call for both 32 and 64-bit" but it is obviously
insufficient since it doesn't actually wire up the hypercall.
Before doing so we need to make the usual adjustments for ARM and turn the
unsigned longs into xen_ulong_t. There is no difference in the resulting
structure for x86.
There are knock on changes to the trace interface, but again they are nops on
x86.
For 32-bit ARM guests we require that the arguments which they pass to a
hypercall via a multicall do not use the upper bits of xen_ulong_t and kill
them if they violate this. This should ensure that no ABI surprises can be
silently lurking when running on a 32-bit hypervisor waiting to pounce when the
same kernel is run on a 64-bit hypervisor. Killing the guest is harsh but it
will be far easier to relax the restriction if it turns out to cause problems
than to tighten it up if we were lax to begin with.
In the interests of clarity and always using explicitly sized types change the
unsigned int in the hypercall arguments to a uint32_t. There is no actual
change here on any platform.
We should consider backporting this to 4.4.1 in case a guest decides they want
to use a multicall in common code e.g. I suggested such a thing while
reviewing a netback change recently.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: keir@xen.org Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>