Ting-Wei Lan [Wed, 5 Aug 2015 17:10:06 +0000 (01:10 +0800)]
VT-d: add iommu=igfx option to workaround graphics issues
When using Linux >= 3.19 (commit 47591df) as dom0 on some Intel Ironlake
devices, it is possible to encounter graphics issues that make the screen
unreadable or crash the system. It was reported in freedesktop bugzilla:
As we still cannot find a proper fix for this problem, this patch adds
iommu=igfx option to control whether Intel graphics IOMMU is enabled.
Running Xen with iommu=no-igfx is similar to running Linux with
intel_iommu=igfx_off, which disables IOMMU for Intel GPU. Users can use
this to work around the problem manually until a fix is available for
the i915 driver.
Signed-off-by: Ting-Wei Lan <lantw44@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 4 Aug 2015 17:16:34 +0000 (18:16 +0100)]
tools/libxl: Prepare to write multiple records with EMULATOR headers
With the newly specified EMULATOR_XENSTORE_DATA record, there are two
libxl records with an emulator subheader. Refactor the existing code to
make future additions easier, and rename some functions for consistency
with the new scheme.
* Calculate the subheader at stream start time, rather than on the fly.
Its contents are not going to change.
* Introduce a new setup_emulator_write() to insert a sub header in the
appropriate place before a blob of data.
* Rename *toolstack_* to *emulator_xenstore_*
* Rename *emulator_* to *emulator_context_*
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Andrew Cooper [Tue, 4 Aug 2015 17:16:32 +0000 (18:16 +0100)]
docs/libxl: Re-specify XENSTORE_DATA as EMULATOR_XENSTORE_DATA
The legacy "toolstack" record as implemented in libxl turns out not to
be 32/64bit safe. As migration v2 has not shipped yet, take this
opportunity to adjust the specification and fix the incompatibility.
Libxl shall lose all knowledge of the old "toolstack" blob and use this
EMULATOR_XENSTORE_DATA record instead. Compatibility shall be handled
by the legacy conversion script.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 4 Aug 2015 17:16:31 +0000 (18:16 +0100)]
tools/libxl: Make libxl__conversion_helper_abort() safe to use
Previously, in the case of an error causing a call to
libxl__conversion_helper_abort() on a stream without legacy conversion,
libxl would fall over a NULL pointer because chs->ao was not set up.
Arrange for all ->ao's to be set up at _init() time, by having each
_init() function assert that their caller has done the right thing.
While doing so, introduce a previously-missing save_helper_init() in
stream_read_init().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Roger Pau Monne [Tue, 4 Aug 2015 10:02:55 +0000 (12:02 +0200)]
libxl: increase hotplug timeout to 40s
The default libxl timeout for hotplug scripts execution is too low: when
launching 40 HVM guests in parallel, all using the same file as disk,
execution times of ~20s are expected. Increase the timeout to 40s in order
to be sure hotplug scripts have enough time to execute.
This is a short term solution.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 3 Aug 2015 17:05:43 +0000 (18:05 +0100)]
x86/gdt: Drop write-only, xalloc()'d array from set_gdt()
It is not used, and can cause a spurious failure of the set_gdt() hypercall in
low memory situations.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Paul Durrant [Fri, 31 Jul 2015 15:34:22 +0000 (16:34 +0100)]
x86/hvm: don't rely on shared ioreq state for completion handling
Both hvm_io_pending() and hvm_wait_for_io() use the shared (with emulator)
ioreq structure to determine whether there is a pending I/O. The latter will
misbehave if the shared state is driven to STATE_IOREQ_NONE by the emulator,
or when the shared ioreq page is cleared for re-insertion into the guest
P2M when the ioreq server is disabled (STATE_IOREQ_NONE == 0) because it
will terminate its wait without calling hvm_io_assist() to adjust Xen's
internal I/O emulation state. This may then lead to an io completion
handler finding incorrect internal emulation state and calling
domain_crash().
This patch fixes the problem by adding a pending flag to the ioreq server's
per-vcpu structure which cannot be directly manipulated by the emulator
and thus can be used to determine whether an I/O is actually pending for
that vcpu on that ioreq server. If an I/O is pending and the shared state
is seen to go to STATE_IOREQ_NONE then it can be treated as an abnormal
completion of emulation (hence the data placed in the shared structure
is not used) and the internal state is adjusted as for a normal completion.
Thus, when a completion handler subsequently runs, the internal state is as
expected and domain_crash() will not be called.
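The pending-flag logic described above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: every sketch_* name is hypothetical, the enum values other than STATE_IOREQ_NONE == 0 are assumed, and passing an all-ones value for the discarded data is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ioreq states; the commit message confirms only STATE_IOREQ_NONE == 0. */
enum {
    STATE_IOREQ_NONE = 0,
    STATE_IOREQ_READY,
    STATE_IOREQ_INPROCESS,
    STATE_IORESP_READY,
};

/* Hypothetical per-vcpu record owned by Xen, out of the emulator's reach. */
struct sketch_ioreq_vcpu {
    bool pending;         /* set when Xen hands a request to the emulator */
    unsigned int assists; /* counts completions, for illustration only */
};

/* Hypothetical stand-in for hvm_io_assist(): adjust internal state. */
static void sketch_io_assist(struct sketch_ioreq_vcpu *sv, uint64_t data)
{
    (void)data;
    sv->pending = false;
    sv->assists++;
}

/*
 * One step of the wait loop. Returns true when the vcpu may continue,
 * false when it must keep waiting for the emulator.
 */
static bool sketch_wait_step(struct sketch_ioreq_vcpu *sv,
                             uint8_t shared_state, uint64_t shared_data)
{
    assert(sv != NULL);

    if (!sv->pending)
        return true;                 /* nothing outstanding */

    switch (shared_state) {
    case STATE_IORESP_READY:         /* normal completion */
        sketch_io_assist(sv, shared_data);
        return true;
    case STATE_IOREQ_NONE:           /* abnormal completion */
        sketch_io_assist(sv, ~0ULL); /* shared data is ignored (assumption) */
        return true;
    default:
        return false;                /* request still in flight */
    }
}
```

The point of the design is that the decision to complete is driven by the private pending flag, so a shared page that is zeroed behind Xen's back still results in exactly one completion.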
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ting-Wei Lan [Thu, 30 Jul 2015 06:51:10 +0000 (14:51 +0800)]
build: use correct qemu path in systemd service file and init script
When --with-system-qemu is used, it is possible that we cannot find
qemu-system-i386 in LIBEXEC_BIN, which can cause errors in the xencommons
init script and the xen-qemu-dom0-disk-backend.service systemd service.
Signed-off-by: Ting-Wei Lan <lantw44@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ravi Sahita [Wed, 29 Jul 2015 16:39:22 +0000 (09:39 -0700)]
x86/hvm.c: Don't tear down altp2m state if it was never set up
Reported-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- replacement subject from Andy ]
Andrew Cooper [Tue, 28 Jul 2015 21:44:37 +0000 (22:44 +0100)]
tools/libxl: Assert that libxl__ao_inprogress_gc() is not called with NULL
libxl__ao_inprogress_gc() is hidden behind various macros used to
construct local variables. Assert() that NULL is not passed, to make
such an error very obvious, rather than a plain segfault at 0.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Tue, 28 Jul 2015 21:44:36 +0000 (22:44 +0100)]
tools/libxl: Only continue stream operations if the stream is still in progress
Part of the callback contract with check_all_finished() is that each
running parallel task shall call it exactly once.
Previously, it was possible for stream_continue() or
write_toolstack_record() to fail and call into check_all_finished(). As
the save helpers callback has fired, it no longer counts as in use,
which causes check_all_finished() to fire the stream callback. Then,
unwinding the stack back and calling check_all_finished() a second time
results in the same conditions being observed, and the stream callback
being fired a second time.
To avoid this, check_all_finished() is called before any other actions
which continue the stream functionality, and the stream is only
continued if it has not been torn down. This guarantees not to continue
stream operations if the stream does not owe a callback to
check_all_finished().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Campbell [Wed, 29 Jul 2015 10:00:36 +0000 (11:00 +0100)]
Replace FSF street address with canonical URL
As recommended in http://www.gnu.org/licenses/gpl-howto.en.html.
This is the result of:
$ git grep -El Mass\|Temple\|Franklin | xargs ./fsf.pl
Where fsf.pl is:
#!/usr/bin/perl -w -pi.bak -0777
my $repl = 'If not, see <http://www.gnu.org/licenses/>.';
my $br = qr/(?:\s*\n\s*(?:[\*\#]|\/\/|\.\\" )?\s*|\s+)/;
my $inwt = qr/[Ii]f${br}not,${br}write${br}(?:to${br})?the${br}Free${br}Software${br}Foundation,(?:${br}Inc\.,)?/;
my $mass = qr/675${br}Mass${br}Ave,?${br}Cambridge,?${br}MA${br}02139,?${br}USA,?\.?/;
my $franklin = qr/51${br}Franklin${br}St(?:reet)?(?:,${br}| - )Fifth${br}Floor,?${br}Boston,?${br}MA,?${br}02110-1301,?${br}USA,?\.?/;
my $temple = qr/59${br}Temple${br}Place(?:,${br}| - )Suite${br}330,?${br}Boston,?${br}MA,?${br}021110?-1307,?${br}USA,?\.?/;
The only remaining mentions of these addresses are in COPYING files which I
haven't touched.
Some of the changed files are imports from elsewhere; however,
filtering them out is tricky. I think it is tolerable to have these
files be modified here and then perhaps reverted on the next sync,
since it's only 1-2 lines and obvious what is going on.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Mon, 27 Jul 2015 16:47:26 +0000 (17:47 +0100)]
tools/libxl: Do not fire the stream callback multiple times
Avoid stacking of check_all_finished() via synchronous teardown of
tasks. If the _abort() functions call back synchronously,
stream->completion_callback() ends up getting called twice, as first
and last check_all_finished() frames observe each task being finished.
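The double-callback problem, and one way of guarding against it, can be sketched in C as below. All names are hypothetical (including the sync_teardown flag, which is an assumption for this illustration), and the sketch models only two tasks; it is not the libxl implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stream with one helper task. */
struct sketch_stream {
    bool running;         /* the stream task itself */
    bool helper_running;  /* a parallel helper task */
    bool sync_teardown;   /* set while aborts may call back synchronously */
    int callbacks_fired;  /* how often the completion callback ran */
};

static void sketch_check_all_finished(struct sketch_stream *s);

/* Each abort finishes its task and reports back synchronously. */
static void sketch_stream_abort(struct sketch_stream *s)
{
    if (s->running) {
        s->running = false;
        sketch_check_all_finished(s);
    }
}

static void sketch_helper_abort(struct sketch_stream *s)
{
    if (s->helper_running) {
        s->helper_running = false;
        sketch_check_all_finished(s);
    }
}

static void sketch_check_all_finished(struct sketch_stream *s)
{
    assert(s != NULL);

    /* Nested frame: the outermost frame will fire the callback. */
    if (s->sync_teardown)
        return;

    s->sync_teardown = true;
    sketch_stream_abort(s);
    sketch_helper_abort(s);
    s->sync_teardown = false;

    /* Only the outermost frame reaches this point. */
    if (!s->running && !s->helper_running)
        s->callbacks_fired++;
}
```

Without the early return, each nested frame would also observe every task as finished and fire the callback again.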
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Mon, 27 Jul 2015 16:47:25 +0000 (17:47 +0100)]
tools/libxl: Do not set stream->rc in stream_complete()
Only ever set stream->rc in check_all_finished(). The first version of
the migration v2 series had separate rc and joined_rc parameters, where
this logic worked. However when combining the two, the teardown path
fails to trigger if stream_complete() records stream->rc itself. A side
effect of this is that stream_done() needs to take an rc parameter.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Currently we always use memory map[] to help hvmloader construct the e820
table, but hvmloader may have relocated RAM to support MMIO allocation or
populated RAM to ensure there is enough room to load OVMF. Either way, we
need to sync these changes into memory map[].
CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Mon, 27 Jul 2015 17:45:08 +0000 (18:45 +0100)]
python/xc: reinstate original implementation of next_bdf
I missed the fact that next_bdf is used to parse user-supplied
strings when reviewing. The user-supplied string is a NUL-terminated,
comma-separated string. The user can supply several PCI devices in that
string. There is, however, no delimiter between different devices, hence
we can't change the syntax of that string.
This patch reinstates the original implementation of next_bdf to
preserve the original syntax. The last argument for xc_assign_device
is always 0.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Mon, 27 Jul 2015 17:45:02 +0000 (18:45 +0100)]
libxl: properly clean up array in libxl_list_cpupool failure path
Document how cpupool_info works. Distinguish success (ERROR_FAIL +
ENOENT) vs failure in libxl_list_cpupool and properly clean up the array
in failure path.
Also switch to libxl__realloc and call libxl_cpupool_{init,dispose}
where appropriate.
There is a change of behaviour. Previously, if memory allocation failed,
the function returned NULL. Now memory allocation failure is fatal. This
is in line with how we deal with memory allocation failure in other
places in libxl though.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Mon, 20 Jul 2015 10:37:59 +0000 (11:37 +0100)]
tools/libx{l, c}: Fix trivial Coverity defects in migration v2 code
All of these are UNUSED_VALUE defects where a default value is
unconditionally overwritten. They are not particularly interesting,
bug wise, but keeping these defects at bay helps prevent real bugs
going unnoticed in the volume.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Mon, 20 Jul 2015 10:37:58 +0000 (11:37 +0100)]
docs: Migration v2 is now no longer draft
Add further instructions to the libxc "Future Extensions" section, and
provide such a section for libxl.
In addition, drop the "In experimental __func__" IPRINTF()s from the
libxc implementations.
Finally, a correction to libxl's "Not Yet Included" section which
should have been amended in c/s 7eaec00 when libxl Remus support was
introduced into the protocol.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Mon, 20 Jul 2015 10:37:57 +0000 (11:37 +0100)]
tools/libx{l, c}: Drop '2' suffixes from xc_domain_{save, restore}2() functions
As there is now only the one implementation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
All handling of device model files is now at the libxl level. Remove
XC_DEVICE_MODEL_RESTORE_FILE and introduce LIBXL_DEVICE_MODEL_RESTORE_FILE in
its place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Mon, 20 Jul 2015 10:37:55 +0000 (11:37 +0100)]
tools/libx{l, c}: Remove the toolstack_{save, restore} callbacks
Update the libxc spec to indicate more sternly that TOOLSTACK records
should no longer be used.
Also, trim further toolstack infrastructure which should have gone in
c/s 39bf4e9 "tools/libxl: Drop all knowledge of toolstack callbacks"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
One complication is that xc_map_m2p() has users in xc_offline_page.c,
xen-mfndump and xen-mceinj. Move its implementation into
xc_offline_page (for want of a better location) beside its current
user.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- drop mentions of removed files from MAINTAINERS ]
Wei Liu [Mon, 27 Jul 2015 14:01:32 +0000 (15:01 +0100)]
libxl: check nesthvm and altp2m in libxl
In ea214001 ("x86/altp2m: add altp2mhvm HVM domain parameter"), a
check was added to ensure nestedhvm and altp2m cannot be enabled at
the same time. That check was added in xl, but in fact it should be in
libxl because it should be the entity that decides whether
the provided configuration is valid.
This patch moves the check to libxl. The code snippet is moved after
calling libxl__domain_build_info_setdefault so that we can:
1. remove libxl_defbool_is_default in `if()';
2. detect mistake in libxl__domain_build_info_setdefault.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Martin Lucina [Fri, 24 Jul 2015 15:29:41 +0000 (17:29 +0200)]
xenconsole: Ensure exclusive access to console using locks
If more than one instance of xenconsole is run against the same DOMID
then each instance will only get some data. This change ensures
exclusive access to the console by obtaining an exclusive lock on
<XEN_LOCK_DIR>/xenconsole.<DOMID>.
The locking strategy used is based on
tools/libxl/libxl_internal.c:libxl__lock_domain_userdata().
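A minimal sketch of taking such an exclusive lock in C, assuming flock(2) is the primitive used. The function name and the path passed in are hypothetical, and the sketch deliberately omits any handling of the lock file being removed between open() and flock().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to take an exclusive, non-blocking lock on 'path'.
 * Returns a descriptor that holds the lock, or -1 if the file cannot be
 * opened or another open file description already holds the lock.
 */
static int sketch_lock_console(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);   /* somebody else owns the console */
        return -1;
    }

    return fd;       /* keep open for as long as the console is in use */
}
```

A second caller using the same path fails immediately instead of silently sharing the console data; closing the returned descriptor releases the lock.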
Signed-off-by: Martin Lucina <martin@lucina.net> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Sun, 26 Jul 2015 21:34:54 +0000 (22:34 +0100)]
libxc: fix memory leak in migration v2
Originally there was only one counter to keep track of pages. It was
used erroneously to keep track of how many pages were mapped and how
many pages needed to be sent. In the end munmap(2) always had 0 as the
length argument, which resulted in leaking the mapping.
This problem was discovered on a 32bit toolstack because 32bit applications
have a notably smaller address space. In fact this bug affects 64bit
toolstacks too.
Use a separate counter to keep track of the number of mapped pages to
solve this problem.
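The two-counter approach can be illustrated with the following C sketch. It is a simplified model under stated assumptions: the function name, the to_send[] flag array and the use of an anonymous mapping are all hypothetical and stand in for the real guest page mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical batch writer: maps nr_pages, counts the pages flagged in
 * to_send[], then unmaps. Returns the length handed to munmap(), or 0 on
 * failure. The count of pages to send is stored in *sent_out.
 */
static size_t sketch_write_batch(const int *to_send, size_t nr_pages,
                                 size_t *sent_out)
{
    size_t nr_mapped = nr_pages;  /* pages covered by the mapping */
    size_t nr_to_send = 0;        /* separate counter: pages to transmit */
    size_t len = nr_mapped * SKETCH_PAGE_SIZE;
    size_t i;
    void *map;

    assert(to_send != NULL && sent_out != NULL);

    if (nr_pages == 0)
        return 0;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return 0;

    for (i = 0; i < nr_pages; i++)
        if (to_send[i])
            nr_to_send++;

    *sent_out = nr_to_send;

    /* The unmap length depends on nr_mapped only, never on nr_to_send. */
    if (munmap(map, len) != 0)
        return 0;

    return len;
}
```

Because the unmap length never depends on the send counter, reusing or resetting that counter can no longer shrink the munmap() length to 0.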
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel [Fri, 24 Jul 2015 11:42:24 +0000 (13:42 +0200)]
xen-access: altp2m testcases
Working altp2m test-case. Extended the test tool to support singlestepping
to better highlight the core feature of altp2m view switching.
Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com> Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Tamas K Lengyel [Fri, 24 Jul 2015 11:42:12 +0000 (13:42 +0200)]
libxc: add support to altp2m hvmops
Wrappers to issue altp2m hvmops.
Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Martin Lucina [Fri, 24 Jul 2015 11:30:48 +0000 (13:30 +0200)]
xenconsole: Allow non-interactive use
If xenconsole is run with stdin closed or redirected to /dev/null,
console_loop() will return immediately due to failure to read from
STDIN_FILENO. This patch tests if stdin and stdout are both connected to
a TTY and, if not, xenconsole will not attempt to read from stdin or
modify stdout terminal attributes.
Existing behaviour when xenconsole is run from a terminal does not
change.
This allows for non-interactive use, eg. running "xl create -c" under
systemd or piping the output of "xl console" to another command.
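The TTY test described above amounts to something like the following C sketch. The function name is hypothetical; isatty(3) is the standard POSIX call.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * True only when both descriptors refer to a terminal. A caller would
 * skip reading from stdin and leave the terminal attributes alone when
 * this returns false.
 */
static bool sketch_is_interactive(int in_fd, int out_fd)
{
    return isatty(in_fd) && isatty(out_fd);
}
```

A caller would typically pass STDIN_FILENO and STDOUT_FILENO; with stdin redirected from /dev/null or stdout connected to a pipe, the result is false.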
Signed-off-by: Martin Lucina <martin@lucina.net> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ravi Sahita [Fri, 24 Jul 2015 11:39:33 +0000 (13:39 +0200)]
x86/altp2m: XSM hooks for altp2m HVM ops
Signed-off-by: Ravi Sahita <ravi.sahita@intel.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Drop now bogus conditional expression from xsm_hvm_altp2mhvm_op()
invocation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:38:28 +0000 (13:38 +0200)]
x86/altp2m: add altp2mhvm HVM domain parameter
The altp2mhvm and nestedhvm parameters are mutually
exclusive and cannot be set together.
Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:36:15 +0000 (13:36 +0200)]
x86/altp2m: add remaining support routines
Add the remaining routines required to support enabling the alternate
p2m functionality.
Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Fix off-by-one in various checks against MAX_ALTP2M. Adjust error code
in p2m_destroy_altp2m_by_id(). Cosmetic adjustments.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:34:46 +0000 (13:34 +0200)]
x86/altp2m: alternate p2m memory events
Add a flag to indicate that a memory event occurred in an alternate p2m
and a field containing the p2m index. Allow any event response to switch
to a different alternate p2m using the same flag and field.
Modify p2m_mem_access_check() to handle alternate p2m's.
Signed-off-by: Ed White <edmund.h.white@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> for the x86 bits. Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tamas K Lengyel <tlengyel@novetta.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Fri, 24 Jul 2015 11:30:44 +0000 (13:30 +0200)]
x86/altp2m: add control of suppress_ve
The existing ept_set_entry() and ept_get_entry() routines are extended
to optionally set/get suppress_ve. Passing -1 will set suppress_ve on
new p2m entries, or retain suppress_ve flag on existing entries.
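The tri-state convention can be sketched in C as below. The helper and its parameter names are hypothetical and only model the rule stated above; they are not the signature of ept_set_entry().

```c
#include <stdbool.h>

/*
 * Resolve a tri-state suppress_ve argument:
 *   0 or 1 : set the bit explicitly;
 *   -1     : default, i.e. set it on a new entry and keep the current
 *            value on an existing entry.
 */
static bool sketch_resolve_sve(int sve, bool entry_exists, bool current_sve)
{
    if (sve != -1)
        return sve != 0;

    return entry_exists ? current_sve : true;
}
```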
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Also adjust the caller in set_identity_p2m_entry().
Ed White [Fri, 24 Jul 2015 11:29:18 +0000 (13:29 +0200)]
VMX/altp2m: add code to support EPTP switching and #VE
Implement and hook up the code to enable VMX support of VMFUNC and #VE.
VMFUNC leaf 0 (EPTP switching) emulation is added in a later patch.
Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:28:00 +0000 (13:28 +0200)]
x86/altp2m: basic data structures and support routines
Add the basic data structures needed to support alternate p2m's and
the functions to initialise them and tear them down.
Although Intel hardware can handle 512 EPTP's per hardware thread
concurrently, only 10 per domain are supported in this patch for
performance reasons.
This change also splits the p2m lock into one lock type for altp2m's
and another type for all other p2m's. The purpose of this is to place
the altp2m list lock between the types, so the list lock can be
acquired whilst holding the host p2m lock.
Signed-off-by: Ed White <edmund.h.white@intel.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Cosmetic adjustments.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:26:02 +0000 (13:26 +0200)]
x86/HVM: hardware alternate p2m support detection
As implemented here, only supported on platforms with VMX HAP.
By default this functionality is force-disabled; it can be enabled
by specifying altp2m=1 on the Xen command line.
Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:25:29 +0000 (13:25 +0200)]
VMX: implement suppress #VE
In preparation for selectively enabling #VE in a later patch, set
suppress #VE on all EPTE's.
Suppress #VE should always be the default condition for two reasons:
it is generally not safe to deliver #VE into a guest unless that guest
has been modified to receive it; and even then for most EPT violations only
the hypervisor is able to handle the violation.
Signed-off-by: Ed White <edmund.h.white@intel.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ed White [Fri, 24 Jul 2015 11:24:51 +0000 (13:24 +0200)]
VMX: VMFUNC and #VE definitions and detection
Currently, neither is enabled globally but may be enabled on a per-VCPU
basis by the altp2m code.
Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
that bit is now hardware-defined.
Signed-off-by: Ed White <edmund.h.white@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 24 Jul 2015 11:23:59 +0000 (13:23 +0200)]
common/domain: helpers to pause a domain while in context
For use on codepaths which would need to use domain_pause() but might be in
the target domain's context. In the case that the target domain is in
context, all other vcpus are paused.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Fri, 24 Jul 2015 10:41:17 +0000 (11:41 +0100)]
tools: libxl: Use correct printf format for uint64_t
Since 25652f232cbe "tools/libxl: detect and avoid conflicts with RDM"
the build is broken for x86_32 and arm32 with:
libxl_dm.c: In function ‘libxl__domain_device_construct_rdm’:
libxl_dm.c:349:13: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 8 has type ‘uint64_t’ [-Werror=format=]
LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
^
libxl_dm.c:352:13: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 8 has type ‘uint64_t’ [-Werror=format=]
LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
Use PRIx64 for these.
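For illustration, a portable way of formatting a uint64_t in hexadecimal with PRIx64 looks like the following C sketch. The helper name is hypothetical and plain snprintf() stands in for libxl's LOG() macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format an RDM conflict message. PRIx64 expands to the conversion
 * specifier matching uint64_t on the target, so the same source builds
 * cleanly on both 32bit and 64bit platforms.
 */
static int sketch_format_rdm_conflict(char *buf, size_t len, uint64_t start)
{
    assert(buf != NULL);

    return snprintf(buf, len, "RDM conflict at 0x%" PRIx64 ".", start);
}
```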
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Tiejun Chen <tiejun.chen@intel.com>
xen/tools: Widen the machine_irq in xc_domain_*bind_pt_irq_int
The DOMCTLs {,un}bind_pt_irq are using uint32_t for the machine_irq
while the helper is using uint8_t.
Currently on ARM, we support SPIs whose IRQ number can go up to
1019, which doesn't fit in a uint8_t. The helpers xc_domain_bind_pt_spi
and xc_domain_unbind_pt_spi correctly take a uint16_t, so
libxc was truncating without notifying the user, which may end up
routing the wrong IRQ.
Fix the problem by widening the machine_irq parameter in
xc_domain_*bind_pt_irq_int.
Note that XEN_DOMCTL_irq_permission has the same problem but it's not
used at the moment on ARM. So we can defer the changes after the release
of Xen 4.7.
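The truncation described above is easy to reproduce in isolation. In this C sketch both helpers are hypothetical stand-ins that only return the IRQ number they received; they do not issue any DOMCTL.

```c
#include <stdint.h>

/* Narrow helper, as before the fix: the argument is cut to 8 bits. */
static uint32_t sketch_bind_irq_narrow(uint8_t machine_irq)
{
    return machine_irq;
}

/* Widened helper: matches the 32-bit field used by the DOMCTL. */
static uint32_t sketch_bind_irq_wide(uint32_t machine_irq)
{
    return machine_irq;
}
```

Passing SPI 1019 through the narrow helper yields 1019 mod 256 = 251, so a different interrupt would be routed, while the wide helper preserves the value.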
In setup_mm(), the value passed as xenheap_megabytes gets
converted to pages and passed to setup_xenheap_mappings(),
which in turn passes it to create_32mb_mappings(), which
contains an ASSERT that the value passed is a multiple of
32MB. So specifying any value that is not an integer multiple
of 32 will cause Xen to hit this assert and fail to boot.
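One way to satisfy the constraint is to round the requested size up to the next multiple of 32 before converting it to pages. The C sketch below shows that arithmetic only; the names are hypothetical, and whether the patch rounds, rejects or documents the value is not stated in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XENHEAP_ALIGN_MB 32u

/* True when 'mb' would pass a multiple-of-32MB check. */
static bool sketch_xenheap_mb_valid(uint32_t mb)
{
    return (mb % SKETCH_XENHEAP_ALIGN_MB) == 0;
}

/*
 * Round 'mb' up to the next multiple of 32. Overflow for values close
 * to UINT32_MAX is not handled in this sketch.
 */
static uint32_t sketch_round_xenheap_mb(uint32_t mb)
{
    return (mb + SKETCH_XENHEAP_ALIGN_MB - 1) & ~(SKETCH_XENHEAP_ALIGN_MB - 1);
}
```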
Signed-off-by: Chris Brand <chris.brand@broadcom.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
sched/cpupool: properly update affinity when removing a cpu from a cpupool
And this time, do it right. In fact, a similar change was
attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
on the cpupool_unassign_cpu() path"). But that was buggy, and got
reverted with 8395b67ab0b8a86.
However, even though reverting was the right thing to do, it
remains true that:
- calling the function is better done in the cpupool cpu removal
code, even if just for symmetry with the cpupool cpu adding path;
- it is not necessary to call it during cpu teardown (for suspend
or shutdown) code as we either are going down and will never
come up (shutdown) or, when coming up, we want everything to be
as before the tearing down process started, and so we would just
undo any update made during the process.
- calling it from the teardown path is not only unnecessary, but
it can trigger an ASSERT(), in case we get, during the process,
to remove the last online pcpu of a domain's node affinity:
Therefore, for all these reasons, move the call from
cpu_disable_scheduler() to cpupool_unassign_cpu_helper().
While there, add some sanity checking (in the latter function), and
make sure that scanning the domain list is done with domlist_read_lock
held, at least when the system is 'live'.
I re-tested the scenario described in here:
http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
which is what led to the revert of 93be8285a79c6, and that is
working ok after this commit.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
The function is called both when we want to remove a cpu
from a cpupool, and during cpu teardown, for suspend or
shutdown. If, however, the boot cpu (cpu 0, most of the
times) is not present in the default cpupool, during
suspend or shutdown, Xen crashes like this:
There also are problems when we try to suspend or shutdown
with a cpupool configured with just one cpu (no matter, in
this case, whether that is the boot cpu or not):
root@Zhaman:~# xl create /etc/xen/test.cfg
root@Zhaman:~# xl cpupool-migrate test Pool-1
root@Zhaman:~# xl cpupool-list -c
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
Pool-1 12
root@Zhaman:~# shutdown -h now
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 12
...
(XEN) Xen call trace:
(XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
(XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
(XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
(XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
(XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
(XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 12:
(XEN) Xen BUG at smpboot.c:895
(XEN) ****************************************
In both cases, the problem is the scheduler not being able
to:
- move all the vcpus to the boot cpu (as the boot cpu is
not in the cpupool), in the former;
- move the vcpus away from a cpu at all (as that is the
only one cpu in the cpupool), in the latter.
The solution is to distinguish, inside cpu_disable_scheduler(),
the two cases of cpupool manipulation and teardown. For
cpupool manipulation, it is correct to ask the scheduler to
take an action, as pathological situations (like there not
being any cpu in the pool to send vcpus to) are taken
care of (i.e., forbidden!) already. For suspend and shutdown,
we don't want the scheduler to be involved at all, as the
final goal is pretty simple: "send all the vcpus to the
boot cpu ASAP", so we just go for it.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
libxc: Expose xc_reserved_device_memory_map to ARM too
The commit 25652f2 "tools/libxl: detect and avoid conflicts with RDM"
introduced the usage of xc_reserved_device_memory_map in the libxl
generic code. But the function is only defined for x86 which breaks the
ARM build.
The hypercall called by this helper is implemented in the generic code
and doesn't contain any x86 specific code. Therefore, it's fine to
expose the helper to ARM.
Signed-off-by: Julien Grall <julien.grall@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. The per-device policy overrides the global policy,
as with other per-device settings.
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
xen/vtd: prevent from assign the device with shared rmrr
Currently we handle devices with a shared RMRR in the simplest
way, by refusing to assign them, since a shared RMRR is
a rare case according to our previous experience. But
later we can group the devices which share an RMRR, and
then allow all devices within a group to be assigned to
the same domain.
CC: Yang Zhang <yang.z.zhang@intel.com> CC: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
A USB RMRR may conflict with the guest BIOS region. In that case, the
previous implementation simply skipped the identity mapping setup. We
can now handle this scenario cleanly with the new policy mechanism, so the
previous hack code can be removed.
CC: Yang Zhang <yang.z.zhang@intel.com> CC: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
libxl: construct e820 map with RDM information for HVM guest
Here we construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader needs this info
later.
Note this guest e820 table is the same as before if the
platform has no RDM or if RDM is disabled (the default).
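As a rough illustration of the table described above, the following Python sketch models a basic guest e820 built from lowmem, highmem and RDM ranges. It is not libxl code: the function name, the tuple layout and the constants are assumptions made for the example, and it does not call XENMEM_set_memory_map.

```python
# Illustrative model only. The names below are hypothetical and do not
# correspond to libxl's real data structures.
E820_RAM = 1
E820_RESERVED = 2
FOUR_GIB = 1 << 32


def build_basic_e820(lowmem_end, highmem_end, rdms):
    """Return a list of (start, size, type) entries.

    lowmem_end  -- end address of guest RAM below 4 GiB
    highmem_end -- end address of guest RAM above 4 GiB, or 0 if none
    rdms        -- iterable of (start, size) reserved device memory ranges
    """
    # Lowmem is always present.
    entries = [(0, lowmem_end, E820_RAM)]
    # RDMs are only added if the platform reports any and RDM is enabled.
    for start, size in rdms:
        entries.append((start, size, E820_RESERVED))
    # Highmem is only added if the guest has memory above 4 GiB.
    if highmem_end > FOUR_GIB:
        entries.append((FOUR_GIB, highmem_end - FOUR_GIB, E820_RAM))
    return entries
```

With an empty `rdms` list the result contains only the RAM entries, which matches the note that the table is unchanged when there is no RDM.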
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Checked-by: Ian Jackson <ian.jackson@eu.citrix.com>
tools: introduce a new parameter to set a predefined rdm boundary
Previously we always fixed that predefined boundary at 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter "rdm_mem_boundary" in the .cfg file.
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Checked-by: Ian Jackson <ian.jackson@eu.citrix.com>
While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields that hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.
RMRR can theoretically reside in address space beyond 4G, but we have never
seen this in the real world. So in order to avoid breaking the highmem layout
we don't resolve highmem conflicts. Note this means a highmem rmrr can still
be supported if there is no conflict.
But in the case of lowmem, RMRRs may be scattered across the whole RAM space.
Multiple RMRR entries in particular make this worse and lead to a complicated
memory layout, and it is then hard to extend hvm_info_table{} so that
hvmloader can work it out. So here we try a simple solution that
avoids breaking the existing layout. When a conflict occurs,
#1. Above a predefined boundary (2G)
- move lowmem_end below the reserved region to resolve the conflict;
#2. Below a predefined boundary (2G)
- check the strict/relaxed policy.
The "strict" policy makes libxl fail. Note that when both policies
are specified on a given region, 'strict' is always preferred.
The "relaxed" policy issues a warning message and also marks this entry INVALID
to indicate we shouldn't expose this entry to hvmloader.
Note that later we need to provide a parameter to set that predefined boundary
dynamically.
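The two conflict cases above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the libxl implementation: the function and exception names are hypothetical, a region starting exactly at the boundary is treated as "below" it, and only lowmem conflicts are considered.

```python
import logging

BOUNDARY = 2 << 30  # the predefined 2G boundary from the text


class RdmConflictError(Exception):
    """Hypothetical error standing in for a libxl failure."""


def resolve_conflict(lowmem_end, rdm_start, rdm_size, policy,
                     boundary=BOUNDARY):
    """Return (new_lowmem_end, expose_to_hvmloader) for one RDM region."""
    if rdm_start >= lowmem_end:
        # The region lies above lowmem, so there is no conflict.
        return lowmem_end, True
    if rdm_start > boundary:
        # Case #1: move lowmem_end below the reserved region.
        return rdm_start, True
    # Case #2: the conflict is below the boundary, so consult the policy.
    if policy == "strict":
        raise RdmConflictError(
            "RDM at %#x (size %#x) conflicts with lowmem"
            % (rdm_start, rdm_size))
    logging.warning("RDM at %#x conflicts with lowmem, hiding it", rdm_start)
    # Relaxed: keep the layout and mark the entry as not exposed.
    return lowmem_end, False
```

For example, with lowmem ending at 3G and a region at 2.5G, lowmem_end drops to 2.5G. The same region at 1G either raises under "strict" or is hidden under "relaxed".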
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v13a: Change `flag' to `flags' in libxl__xc_device_get_rdm.
No functional change. [ Suggested by Tiejun Chen. ]
v13: Mechanical changes to deal with changes to patch 01/
XENMEM_reserved_device_memory_map.
The global RDM parameter, "strategy", allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the future
this parameter may be further extended to allow specifying arbitrary regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set, so we don't
check all rdms. Instead, we just check the rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.
The 'strict/relaxed' policy decides how to handle a conflict when reserving RDM
regions in pfn space. If a conflict exists, 'strict' means an immediate error
so the VM can't keep running, while 'relaxed' allows moving forward with a
warning message.
The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. The per-device policy overrides the global policy,
as with other per-device settings.
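The override rule can be illustrated with a short Python sketch. The function name and the use of `None` for "not set" are assumptions made for the example. The real option parsing lives in the toolstack.

```python
DEFAULT_POLICY = "relaxed"  # default for both global and per-device policy


def effective_policy(global_policy=None, device_policy=None):
    """Pick the RDM policy that applies to one device.

    None means "not specified in the configuration".
    """
    if device_policy is not None:
        # A per-device setting overrides the global one.
        return device_policy
    if global_policy is not None:
        return global_policy
    return DEFAULT_POLICY
```

So a device with no setting of its own inherits the global policy, and a device-level setting wins even when it is the weaker 'relaxed' one.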
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Checked-by: Ian Jackson <ian.jackson@eu.citrix.com>
tools: extend xc_assign_device() to support rdm reservation policy
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.
Note this also brings some fallout to the python usage of xc_assign_device().
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: David Scott <dave.scott@eu.citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxc: Expose new hypercall xc_reserved_device_memory_map
This introduces xc_reserved_device_memory_map, the libxc wrapper
for the hypercall. It lets us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
are exposed. Otherwise we expose only the rdm entries specific to
a given SBDF.
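The selection rule can be modelled in a few lines of Python. This is only a sketch of the filtering semantics described above, not the libxc call itself: the entry layout, the helper names and the numeric value given to PCI_DEV_RDM_ALL are assumptions.

```python
from collections import namedtuple

PCI_DEV_RDM_ALL = 0x1  # assumed value, for illustration only

# Hypothetical entry layout: a page range plus the device it belongs to.
RdmEntry = namedtuple("RdmEntry", "start_pfn nr_pages sbdf")


def make_sbdf(seg, bus, dev, func):
    """Pack segment/bus/device/function in the usual PCI encoding."""
    return (seg << 16) | (bus << 8) | (dev << 3) | func


def select_rdm_entries(entries, flag, sbdf=None):
    """Return all entries, or only those belonging to one SBDF."""
    if flag == PCI_DEV_RDM_ALL:
        return list(entries)
    return [e for e in entries if e.sbdf == sbdf]
```

Passing `PCI_DEV_RDM_ALL` ignores the `sbdf` argument. Any other flag value narrows the result to the entries of the given device.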
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v13: Mechanical changes to deal with changes to patch 01/
XENMEM_reserved_device_memory_map.
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
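The four steps above can be sketched in Python as follows. This is a simplified model, not hvmloader code: entries are plain (addr, size, type) tuples, the function name is hypothetical, and the sketch assumes at most one RAM entry straddles the new lowmem end.

```python
E820_RAM = 1
E820_RESERVED = 2
FOUR_GIB = 1 << 32


def build_final_e820(special, hv_map, lowmem_end, highmem_end):
    """Merge special ranges with the hypervisor map and sort the result."""
    # Step 1: BIOS ranges and other special mappings not in the hypervisor map.
    table = list(special)
    saw_highmem = False
    # Step 2: add the hypervisor-supplied regions.
    for addr, size, kind in hv_map:
        if kind == E820_RAM and addr >= FOUR_GIB:
            # Step 3a: highmem may have grown if memory was relocated.
            size = highmem_end - addr
            saw_highmem = True
        elif kind == E820_RAM and addr < lowmem_end < addr + size:
            # Step 3b: trim lowmem if its end was moved down.
            size = lowmem_end - addr
        table.append((addr, size, kind))
    # Step 3c: add a highmem region if relocation created one.
    if not saw_highmem and highmem_end > FOUR_GIB:
        table.append((FOUR_GIB, highmem_end - FOUR_GIB, E820_RAM))
    # Step 4: sort so the ranges appear in memory order.
    table.sort(key=lambda entry: entry[0])
    return table
```

Sorting last keeps the earlier steps simple, since each of them can append entries without caring about order.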
CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
George Dunlap [Wed, 22 Jul 2015 14:24:49 +0000 (15:24 +0100)]
hvmloader/pci: try to avoid placing BARs in RMRRs
Try to avoid placing PCI BARs over RMRRs:
- If mmio_hole_size is not specified, and the existing MMIO range has
RMRRs in it, and there is space to expand the hole in lowmem without
moving more memory, then make the MMIO hole as large as possible.
- When placing BARs, find the next RMRR higher than the current base
in the lowmem mmio hole. If it overlaps, skip ahead of it and find
the next one.
This certainly won't work in all cases, but it should work in a
significant number of cases. Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.
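The skip-ahead placement described in the second bullet might look roughly like the following Python sketch. It is not the hvmloader implementation: the function names are hypothetical, RMRRs are modelled as half-open (start, end) ranges, and BAR sizes are assumed to be powers of two so that a BAR is naturally aligned to its size.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def place_bar(base, bar_size, hole_end, rmrrs):
    """Return an address for the BAR that avoids all RMRRs, or None.

    base     -- first candidate address in the lowmem MMIO hole
    hole_end -- end (exclusive) of the MMIO hole
    rmrrs    -- iterable of (start, end) half-open reserved ranges
    """
    base = align_up(base, bar_size)
    for start, end in sorted(rmrrs):
        if end <= base:
            continue  # this RMRR lies entirely below the current base
        if start >= base + bar_size:
            break     # the next RMRR is past the BAR, so nothing overlaps
        # Overlap: skip ahead of this RMRR and look at the next one.
        base = align_up(end, bar_size)
    if base + bar_size > hole_end:
        return None   # no room left in the hole
    return base
```

A `None` result corresponds to the case where this approach does not work and a larger mmio_hole_size would be needed.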
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Now we get this map layout by calling XENMEM_memory_map and then
saving it into one global variable, memory_map[]. It should
include the lowmem range, rdm range and highmem range. Note the
rdm range and highmem range may not exist in some cases.
Here we also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of conflict, since this is
another rare possibility in the real world.
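The conflict check against the half-open range can be sketched in Python as below. The two constants are placeholder values chosen for the example and should be treated as assumptions. The function names are also hypothetical, and raising an exception stands in for hvmloader failing.

```python
E820_RESERVED = 2

# Placeholder values for illustration; see hvmloader for the real ones.
RESERVED_MEMORY_DYNAMIC_START = 0xFC001000
RESERVED_MEMORY_DYNAMIC_END = 0xFE000000


def overlaps(start, size, range_start, range_end):
    """True if [start, start + size) intersects [range_start, range_end)."""
    return start < range_end and start + size > range_start


def check_reserved_conflict(memory_map):
    """Fail if any reserved entry overlaps the dynamic allocation range."""
    for addr, size, kind in memory_map:
        if kind == E820_RESERVED and overlaps(
                addr, size,
                RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END):
            raise RuntimeError("reserved memory at %#x conflicts with "
                               "the dynamic allocation range" % addr)
```

Because both ranges are half-open, a reserved region that ends exactly at RESERVED_MEMORY_DYNAMIC_START does not count as a conflict.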
CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
This patch enables XENMEM_memory_map in hvm, so hvmloader can
use it to set up the e820 mappings.
CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
xen/passthrough: extend hypercall to support rdm reservation policy
This patch extends the existing hypercall to support rdm reservation policy.
We return an error or just emit a warning message, depending on whether
the policy is "strict" or "relaxed", when reserving RDM regions in pfn space.
Note that in some special cases, e.g. adding a device to hwdomain or removing a
device from a user domain, 'relaxed' is good enough since this is always safe
for hwdomain.
CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Yang Zhang <yang.z.zhang@intel.com> CC: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
--
v13a: Fix build on ARM by passing 0 for flags to arm_smmu_assign_dev.
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up
the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. GPUs) with RMRRs reported. So instead,
this patch sets up the identity mapping in the p2m layer, regardless of
whether EPT is shared or not. We still keep creating the VT-d table.
We also need to introduce a pair of helpers to create/clear this
sort of identity mapping, as follows:
set_identity_p2m_entry():
If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.
clear_identity_p2m_entry():
We just define a macro to wrap guest_physmap_remove_page(),
returning a value as necessary.
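The three outcomes of set_identity_p2m_entry() can be modelled with a toy p2m in Python. This is a sketch of the described semantics only: the class, the dictionary-based table and the choice of EBUSY as the failure code are assumptions, not the hypervisor's actual code.

```python
import errno


class ToyP2M:
    """Toy gfn -> mfn table used to illustrate the helper semantics."""

    def __init__(self):
        self.table = {}

    def set_identity_entry(self, gfn):
        current = self.table.get(gfn)
        if current is None:
            # The gfn is unoccupied: install the identity mapping.
            self.table[gfn] = gfn
            return 0
        if current == gfn:
            # Already the desired identity mapping: nothing to do.
            return 0
        # Occupied by something else: report failure (assumed error code).
        return -errno.EBUSY

    def clear_identity_entry(self, gfn):
        # Stand-in for wrapping guest_physmap_remove_page().
        self.table.pop(gfn, None)
        return 0
```

Making the "already mapped" case a no-op keeps the helper idempotent, so callers can apply it to the same RMRR more than once.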
CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Yang Zhang <yang.z.zhang@intel.com> CC: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Jan Beulich [Wed, 22 Jul 2015 15:06:01 +0000 (16:06 +0100)]
introduce XENMEM_reserved_device_memory_map
This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v12a: Move interface structure union member to the end, while moving
the whole public header block into a __XEN__ / __XEN_TOOLS__
conditional block.
v12: Restore changes as much as possible to my original version, fixing
a few issues that got introduced after handing it over. Unionize
new public memop interface structure to allow for non-PCI to be
supported later on. Check flags to have all currently undefined
flags clear. Refine adjustments to xen/pci.h.
Jan Beulich [Thu, 23 Jul 2015 12:03:41 +0000 (14:03 +0200)]
x86/MSI: drop bogus NULL check from pci_restore_msi_state()
Commit 372900faf8 ("x86/MSI-X: reduce fiddling with control register
during restore") introduced de-references of pdev before it gets
checked against NULL. Instead of deferring the de-references, drop
the pointless check - both call sites do that check already.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 23 Jul 2015 08:15:39 +0000 (10:15 +0200)]
x86/MSI-X: access MSI-X table only after having enabled MSI-X
As done in Linux by f598282f51 ("PCI: Fix the NIU MSI-X problem in a
better way") and its broken predecessor, make sure we don't access the
MSI-X table without having enabled MSI-X first, using the mask-all flag
instead to prevent interrupts from occurring.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 23 Jul 2015 08:14:59 +0000 (10:14 +0200)]
x86/MSI-X: be more careful during teardown
When a device gets detached from a guest, pciback will clear its
command register, thus disabling both memory and I/O decoding. The
disabled memory decoding, however, has an effect on the MSI-X table
accesses the hypervisor does: These won't have the intended effect
anymore. Even worse, for PCIe devices (but not SR-IOV virtual
functions) such accesses may (will?) be treated as Unsupported
Requests, causing respective errors to be surfaced, potentially in the
form of NMIs that may be fatal to the hypervisor or Dom0 in different
ways. Hence rather than carrying out these accesses, we should avoid
them where we can, and use alternative (e.g. PCI config space based)
mechanisms to achieve at least the same effect.
At this time it continues to be unclear whether this is fixing an
actual bug or is rather just working around bogus (but apparently
common) system behavior.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>