xenbits.xensource.com Git - people/liuw/xen.git/log
6 years ago x86/mm: drop _new suffix for page table APIs [xen-pt-allocation-1.1]
Wei Liu [Fri, 8 Feb 2019 17:19:26 +0000 (17:19 +0000)]
x86/mm: drop _new suffix for page table APIs

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: switch to use domheap page for page tables
Wei Liu [Tue, 5 Feb 2019 17:20:11 +0000 (17:20 +0000)]
x86: switch to use domheap page for page tables

Modify all the _new APIs to handle domheap pages.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: drop old page table APIs
Wei Liu [Tue, 5 Feb 2019 17:06:43 +0000 (17:06 +0000)]
x86/mm: drop old page table APIs

Now that we've switched all users to the new APIs, the old ones aren't
needed anymore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: remove lXe_to_lYe in __start_xen
Wei Liu [Tue, 5 Feb 2019 17:04:56 +0000 (17:04 +0000)]
x86: remove lXe_to_lYe in __start_xen

Properly map and unmap page tables where necessary.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/pv: properly map and unmap page table in dom0_construct_pv
Wei Liu [Tue, 5 Feb 2019 16:35:28 +0000 (16:35 +0000)]
x86/pv: properly map and unmap page table in dom0_construct_pv

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/pv: properly map and unmap page tables in mark_pv_pt_pages_rdonly
Wei Liu [Tue, 5 Feb 2019 16:32:54 +0000 (16:32 +0000)]
x86/pv: properly map and unmap page tables in mark_pv_pt_pages_rdonly

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: remove lXe_to_lYe in cleanup_cpu_root_pgt
Wei Liu [Tue, 5 Feb 2019 13:51:12 +0000 (13:51 +0000)]
x86/smpboot: remove lXe_to_lYe in cleanup_cpu_root_pgt

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in subarch_memory_op
Wei Liu [Tue, 5 Feb 2019 13:47:07 +0000 (13:47 +0000)]
x86_64/mm: map and unmap page tables in subarch_memory_op

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in subarch_init_memory
Wei Liu [Tue, 5 Feb 2019 13:44:22 +0000 (13:44 +0000)]
x86_64/mm: map and unmap page tables in subarch_init_memory

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in cleanup_frame_table
Wei Liu [Tue, 5 Feb 2019 13:35:19 +0000 (13:35 +0000)]
x86_64/mm: map and unmap page tables in cleanup_frame_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in setup_compat_m2p_table
Wei Liu [Tue, 5 Feb 2019 13:25:05 +0000 (13:25 +0000)]
x86_64/mm: map and unmap page tables in setup_compat_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in destroy_m2p_mapping
Wei Liu [Tue, 5 Feb 2019 13:19:43 +0000 (13:19 +0000)]
x86_64/mm: map and unmap page tables in destroy_m2p_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in destroy_compat_m2p_mapping
Wei Liu [Tue, 5 Feb 2019 13:09:18 +0000 (13:09 +0000)]
x86_64/mm: map and unmap page tables in destroy_compat_m2p_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in share_hotadd_m2p_table
Wei Liu [Tue, 5 Feb 2019 13:06:08 +0000 (13:06 +0000)]
x86_64/mm: map and unmap page tables in share_hotadd_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: map and unmap page tables in m2p_mapped
Wei Liu [Tue, 5 Feb 2019 12:56:41 +0000 (12:56 +0000)]
x86_64/mm: map and unmap page tables in m2p_mapped

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/shim: map and unmap page tables in replace_va_mapping
Wei Liu [Tue, 5 Feb 2019 12:48:03 +0000 (12:48 +0000)]
x86/shim: map and unmap page tables in replace_va_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: switch root_pgt to mfn_t and use new APIs
Wei Liu [Tue, 5 Feb 2019 12:02:00 +0000 (12:02 +0000)]
x86: switch root_pgt to mfn_t and use new APIs

This requires moving the declaration of the root page table mfn into
mm.h and modifying setup_cpu_root_pgt to have a single exit path.

We also need to force map_domain_page to use the direct map when
switching per-domain mappings. This is contrary to our end goal of
removing the direct map, but the workaround will go away once we make
map_domain_page context-switch safe in another (large) patch series.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: drop lXe_to_lYe invocations from cleanup_cpu_root_pgt
Wei Liu [Mon, 4 Feb 2019 18:16:30 +0000 (18:16 +0000)]
x86/smpboot: drop lXe_to_lYe invocations from cleanup_cpu_root_pgt

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: switch pl1e to use new APIs in clone_mapping
Wei Liu [Mon, 4 Feb 2019 18:05:58 +0000 (18:05 +0000)]
x86/smpboot: switch pl1e to use new APIs in clone_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: switch pl2e to use new APIs in clone_mapping
Wei Liu [Mon, 4 Feb 2019 18:03:09 +0000 (18:03 +0000)]
x86/smpboot: switch pl2e to use new APIs in clone_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: switch pl3e to use new APIs in clone_mapping
Wei Liu [Mon, 4 Feb 2019 17:57:33 +0000 (17:57 +0000)]
x86/smpboot: switch pl3e to use new APIs in clone_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/smpboot: clone_mapping should have one exit path
Wei Liu [Mon, 4 Feb 2019 17:48:45 +0000 (17:48 +0000)]
x86/smpboot: clone_mapping should have one exit path

We will soon need to clean up page table mappings in the exit path.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
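
A single exit path matters once every return has to release a mapping. The following is only a minimal user-space sketch of that pattern, not the code from this series: `map_table`, `unmap_table`, `live_mappings` and `clone_entry` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ENTRIES 4

/* Hypothetical stand-ins for mapping helpers; not Xen's real API. */
static int live_mappings;                 /* mappings currently outstanding */
static unsigned long table[NR_ENTRIES];   /* pretend page table */

static unsigned long *map_table(void)
{
    live_mappings++;
    return table;
}

static void unmap_table(unsigned long *t)
{
    if (t) {
        assert(live_mappings > 0);
        live_mappings--;
    }
}

/*
 * Write one entry. Every failure jumps to the single exit label, so the
 * mapping (if one was taken) is released exactly once on all paths.
 */
static int clone_entry(size_t idx, unsigned long value)
{
    unsigned long *t = NULL;
    int rc = 0;

    if (value == 0) {
        rc = -1;            /* nothing mapped yet; unmap_table(NULL) is a no-op */
        goto out;
    }

    t = map_table();
    if (idx >= NR_ENTRIES) {
        rc = -2;            /* mapped: the exit path must unmap */
        goto out;
    }

    t[idx] = value;

 out:
    unmap_table(t);
    return rc;
}
```

With early returns instead of `goto out`, the second failure case would need its own `unmap_table` call, which is easy to forget when more mappings are added later.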
6 years ago x86/smpboot: add emacs block
Wei Liu [Mon, 4 Feb 2019 17:45:50 +0000 (17:45 +0000)]
x86/smpboot: add emacs block

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago efi: switch EFI L4 table to use new APIs
Wei Liu [Mon, 4 Feb 2019 17:19:27 +0000 (17:19 +0000)]
efi: switch EFI L4 table to use new APIs

This requires storing the MFN instead of linear address of the L4
table. Adjust code accordingly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago efi: add emacs block to boot.c
Wei Liu [Mon, 4 Feb 2019 17:01:10 +0000 (17:01 +0000)]
efi: add emacs block to boot.c

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago efi: use new page table APIs in efi_init_memory
Wei Liu [Mon, 4 Feb 2019 17:00:59 +0000 (17:00 +0000)]
efi: use new page table APIs in efi_init_memory

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago efi: avoid using global variable in copy_mapping
Wei Liu [Mon, 4 Feb 2019 16:40:34 +0000 (16:40 +0000)]
efi: avoid using global variable in copy_mapping

We will soon switch efi_l4_table to use ephemeral mapping. Make
copy_mapping take a pointer to the mapping instead of using the global
variable.

No functional change intended.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago efi: use new page table APIs in copy_mapping
Wei Liu [Mon, 4 Feb 2019 16:01:03 +0000 (16:01 +0000)]
efi: use new page table APIs in copy_mapping

Inspection shows that ARM doesn't have alloc_xen_pagetable, so this
function is x86-only, which means it is safe for us to change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
XXX test this in gitlab ci to be sure.

6 years ago x86_64/mm: drop lXe_to_lYe invocations from setup_m2p_table
Wei Liu [Thu, 31 Jan 2019 19:04:23 +0000 (19:04 +0000)]
x86_64/mm: drop lXe_to_lYe invocations from setup_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: switch to new APIs in setup_m2p_table
Wei Liu [Thu, 31 Jan 2019 19:01:11 +0000 (19:01 +0000)]
x86_64/mm: switch to new APIs in setup_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: introduce pl2e in setup_m2p_table
Wei Liu [Thu, 31 Jan 2019 18:52:48 +0000 (18:52 +0000)]
x86_64/mm: introduce pl2e in setup_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm.c: remove code that serves no purpose in setup_m2p_table
Wei Liu [Thu, 31 Jan 2019 18:49:36 +0000 (18:49 +0000)]
x86_64/mm.c: remove code that serves no purpose in setup_m2p_table

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: drop l4e_to_l3e invocation from paging_init
Wei Liu [Thu, 31 Jan 2019 18:31:04 +0000 (18:31 +0000)]
x86_64/mm: drop l4e_to_l3e invocation from paging_init

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: switch to new APIs in paging_init
Wei Liu [Tue, 29 Jan 2019 14:40:26 +0000 (14:40 +0000)]
x86_64/mm: switch to new APIs in paging_init

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86_64/mm: introduce pl2e in paging_init
Wei Liu [Thu, 31 Jan 2019 18:06:53 +0000 (18:06 +0000)]
x86_64/mm: introduce pl2e in paging_init

Introduce pl2e so that we can use l2_ro_mpt to point to the page table
itself.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: switch to new APIs in arch_init_memory
Wei Liu [Tue, 29 Jan 2019 14:15:47 +0000 (14:15 +0000)]
x86/mm: switch to new APIs in arch_init_memory

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: drop lXe_to_lYe invocations from modify_xen_mappings
Wei Liu [Fri, 1 Feb 2019 13:15:59 +0000 (13:15 +0000)]
x86/mm: drop lXe_to_lYe invocations from modify_xen_mappings

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: switch to new APIs in modify_xen_mappings
Wei Liu [Tue, 29 Jan 2019 14:03:48 +0000 (14:03 +0000)]
x86/mm: switch to new APIs in modify_xen_mappings

Page tables allocated in that function should be mapped and unmapped
now.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: drop lXe_to_lYe invocations in map_pages_to_xen
Wei Liu [Fri, 1 Feb 2019 12:39:26 +0000 (12:39 +0000)]
x86/mm: drop lXe_to_lYe invocations in map_pages_to_xen

Map and unmap page tables where necessary.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago xxx fixup: avoid shadowing mfn
Wei Liu [Fri, 8 Feb 2019 17:17:15 +0000 (17:17 +0000)]
xxx fixup: avoid shadowing mfn

6 years ago x86/mm: switch to new APIs in map_pages_to_xen
Wei Liu [Tue, 29 Jan 2019 13:56:43 +0000 (13:56 +0000)]
x86/mm: switch to new APIs in map_pages_to_xen

Page tables allocated in that function should be mapped and unmapped
now.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: rewrite virt_to_xen_l1e
Wei Liu [Tue, 29 Jan 2019 13:31:24 +0000 (13:31 +0000)]
x86/mm: rewrite virt_to_xen_l1e

Rewrite this function to use new APIs. Modify its callers to unmap the
pointer returned.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: rewrite virt_to_xen_l2e
Wei Liu [Tue, 29 Jan 2019 13:18:39 +0000 (13:18 +0000)]
x86/mm: rewrite virt_to_xen_l2e

Rewrite that function to use the new APIs. Modify its callers to unmap
the pointer returned.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: rewrite virt_to_xen_l3e
Wei Liu [Tue, 29 Jan 2019 12:42:23 +0000 (12:42 +0000)]
x86/mm: rewrite virt_to_xen_l3e

Rewrite that function to use the new APIs. Modify its callers to unmap
the pointer returned.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: change pl3e to l3t in virt_to_xen_l3e
Wei Liu [Tue, 29 Jan 2019 12:59:55 +0000 (12:59 +0000)]
x86/mm: change pl3e to l3t in virt_to_xen_l3e

We will need to have a variable named pl3e when we rewrite
virt_to_xen_l3e. Change pl3e to l3t to better reflect its purpose.
This will make reviewing later patches easier.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: change pl1e to l1t in virt_to_xen_l1e
Wei Liu [Tue, 29 Jan 2019 12:57:35 +0000 (12:57 +0000)]
x86/mm: change pl1e to l1t in virt_to_xen_l1e

We will need to have a variable named pl1e when we rewrite
virt_to_xen_l1e. Change pl1e to l1t to better reflect its purpose.
This will make reviewing later patches easier.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: change pl2e to l2t in virt_to_xen_l2e
Wei Liu [Tue, 29 Jan 2019 12:54:48 +0000 (12:54 +0000)]
x86/mm: change pl2e to l2t in virt_to_xen_l2e

We will need to have a variable named pl2e when we rewrite
virt_to_xen_l2e. Change pl2e to l2t to better reflect its purpose.
This will make reviewing later patches easier.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: add an end_of_loop label in modify_xen_mappings
Wei Liu [Mon, 28 Jan 2019 18:45:06 +0000 (18:45 +0000)]
x86/mm: add an end_of_loop label in modify_xen_mappings

We will soon need to clean up mappings whenever an iteration of the
outermost loop ends. Add a new label and turn the relevant continues
into gotos.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
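
The `continue`-to-`goto` conversion can be illustrated with a small user-space sketch. It is not taken from modify_xen_mappings; `map_slot`, `unmap_slot`, `live_mappings` and `scrub` are hypothetical names, and the "mapping" is only a counter.

```c
#include <assert.h>
#include <stddef.h>

static int live_mappings;   /* hypothetical count of outstanding mappings */

static int *map_slot(int *slot)
{
    live_mappings++;
    return slot;
}

static void unmap_slot(int *p)
{
    if (p) {
        assert(live_mappings > 0);
        live_mappings--;
    }
}

/* Zero every negative slot and skip the rest. Returns how many changed. */
static int scrub(int *slots, size_t n)
{
    int changed = 0;

    for (size_t i = 0; i < n; i++) {
        int *p = map_slot(&slots[i]);

        if (*p >= 0)
            goto end_of_loop;   /* a plain `continue` would skip the unmap */

        *p = 0;
        changed++;

    end_of_loop:
        unmap_slot(p);          /* runs at the end of every iteration */
    }

    return changed;
}
```

The label gives each iteration one place where per-iteration mappings are dropped, which mirrors the single exit label used for whole-function cleanup.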
6 years ago x86/mm: make sure there is one exit path for modify_xen_mappings
Wei Liu [Mon, 28 Jan 2019 18:41:26 +0000 (18:41 +0000)]
x86/mm: make sure there is one exit path for modify_xen_mappings

We will soon need to handle dynamically mapping / unmapping page
tables in this function.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: add an end_of_loop label in map_pages_to_xen
Wei Liu [Mon, 28 Jan 2019 18:35:52 +0000 (18:35 +0000)]
x86/mm: add an end_of_loop label in map_pages_to_xen

We will soon need to clean up mappings whenever an iteration of the
outermost loop ends. Add a new label and turn the relevant continues
into gotos.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: map_pages_to_xen should have one exit path
Wei Liu [Mon, 28 Jan 2019 18:30:47 +0000 (18:30 +0000)]
x86/mm: map_pages_to_xen should have one exit path

We will soon rewrite the function to handle dynamically mapping and
unmapping of page tables.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: introduce l{1,2}t local variables to modify_xen_mappings
Wei Liu [Mon, 28 Jan 2019 18:10:10 +0000 (18:10 +0000)]
x86/mm: introduce l{1,2}t local variables to modify_xen_mappings

The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped, so
there is no need to track the lifetime of each variable.

We will soon be required to map and unmap page tables, so we need to
track the lifetime of each variable to avoid leaking mappings.

Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: introduce l{1,2}t local variables to map_pages_to_xen
Wei Liu [Mon, 28 Jan 2019 17:54:24 +0000 (17:54 +0000)]
x86/mm: introduce l{1,2}t local variables to map_pages_to_xen

The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped, so
there is no need to track the lifetime of each variable.

We will soon be required to map and unmap page tables, so we need to
track the lifetime of each variable to avoid leaking mappings.

Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: introduce a new set of APIs to manage Xen page tables
Wei Liu [Wed, 23 Jan 2019 15:33:07 +0000 (15:33 +0000)]
x86: introduce a new set of APIs to manage Xen page tables

We are going to switch to using domheap pages for page tables.
A new set of APIs is introduced to allocate, map, unmap and free pages
for page tables.

The allocation and deallocation functions work on mfn_t rather than
page_info, because they are required to work even before the frame
table is set up.

Implement the old functions with the new ones. We will rewrite, site
by site, other mm functions that manipulate page tables to use the new
APIs.

Note that these new APIs still use xenheap pages underneath and no
actual map and unmap is done, so that we don't break Xen halfway. They
will be switched to use domheap and dynamic mappings once usage of the
old APIs is eliminated.

No functional change intended in this patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
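
The allocate / map / unmap / free split described above can be modelled in user space. This is a rough sketch under stated assumptions, not the series' implementation: the names `pt_alloc`, `pt_map`, `pt_unmap` and `pt_free`, the fixed-size pool, and the use of a plain `unsigned long` in place of `mfn_t` are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define NR_FRAMES        8
#define INVALID_FRAME    ((unsigned long)-1)

/* Hypothetical frame pool standing in for the heap. */
static unsigned char pool[NR_FRAMES][SKETCH_PAGE_SIZE];
static bool in_use[NR_FRAMES];
static int live_mappings;   /* lets callers check that nothing leaked */

/* Allocation hands back a frame number, not a pointer. */
static unsigned long pt_alloc(void)
{
    for (unsigned long f = 0; f < NR_FRAMES; f++) {
        if (!in_use[f]) {
            in_use[f] = true;
            return f;
        }
    }
    return INVALID_FRAME;
}

/* A pointer only exists between map and unmap. */
static void *pt_map(unsigned long frame)
{
    assert(frame < NR_FRAMES && in_use[frame]);
    live_mappings++;
    return pool[frame];
}

static void pt_unmap(void *va)
{
    if (va) {
        assert(live_mappings > 0);
        live_mappings--;
    }
}

static void pt_free(unsigned long frame)
{
    if (frame != INVALID_FRAME)
        in_use[frame] = false;
}
```

Because callers hold only a frame number between uses, the backing store can later change from always-mapped memory to memory that needs a temporary mapping without touching the call sites again.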
6 years ago x86: move some xen mm function declarations
Wei Liu [Wed, 23 Jan 2019 15:17:41 +0000 (15:17 +0000)]
x86: move some xen mm function declarations

They were put into page.h but mm.h is more appropriate.

The real reason is that I will be adding some new functions which
take mfn_t. It turns out to be a bit difficult to do in page.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/mm: defer clearing page in virt_to_xen_lXe
Wei Liu [Tue, 22 Jan 2019 16:42:48 +0000 (16:42 +0000)]
x86/mm: defer clearing page in virt_to_xen_lXe

Defer the call to clear_page to the point when we're sure the page is
going to become a page table.

This is a minor optimisation. No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
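
The idea of deferring the clear can be sketched as follows. This is an illustrative user-space model only: `get_or_install` and `pages_cleared` are hypothetical names, `malloc`/`memset` stand in for page allocation and clear_page, and the locking a real implementation needs is reduced to a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

static int pages_cleared;   /* counts how often we paid for a clear */

/* Install a fresh table in *slot unless one is already present. */
static void *get_or_install(void **slot)
{
    void *page;

    assert(slot != NULL);

    page = malloc(SKETCH_PAGE_SIZE);
    if (!page)
        return NULL;

    /* A real implementation would take a lock here and re-check. */
    if (*slot) {
        free(page);          /* someone else installed one: never cleared */
        return *slot;
    }

    memset(page, 0, SKETCH_PAGE_SIZE);   /* clear only once it will be used */
    pages_cleared++;
    *slot = page;
    return page;
}
```

Clearing before the check would zero a page that is then freed immediately whenever another path has already installed a table.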
6 years ago VMX: don't ignore P2M setup error [xen-pt-allocation-1.1-base]
Jan Beulich [Tue, 12 Feb 2019 10:54:57 +0000 (11:54 +0100)]
VMX: don't ignore P2M setup error

set_mmio_p2m_entry() may fail, in particular with -ENOMEM. Don't ignore
such an error, but instead cause domain creation to fail in such a case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago iommu: fix iommu_ops initialization
Juergen Gross [Tue, 12 Feb 2019 10:54:07 +0000 (11:54 +0100)]
iommu: fix iommu_ops initialization

Commit 32a5ea00ec75ef53e ("IOMMU/x86: remove indirection from certain
IOMMU hook accesses") introduced iommu_ops initialized at boot time
with data declared as __initconstrel.

On Intel systems there is another path where iommu_ops is initialized
and this path is relevant on resume after returning from system suspend.
As the initialization data is no longer accessible in this case that
second initialization must be dropped in case the system isn't just
booting.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago asm: handle comments when creating header file
Norbert Manthey [Wed, 6 Feb 2019 14:09:33 +0000 (15:09 +0100)]
asm: handle comments when creating header file

In the early steps of compilation, the asm header files are created, such
as include/asm-$(TARGET_ARCH)/asm-offsets.h. These files depend on the
assembly file arch/$(TARGET_ARCH)/asm-offsets.s, which is generated
beforehand. Depending on the toolchain used, there might be comments in the
assembly files. In particular, the goto-gcc compiler of the bounded model
checker CBMC adds comments that start with a '#' symbol at the beginning
of the line.

This commit adds handling of assembler comments during the creation of the
asm header files, specifically ignoring lines that start with '#', which
indicates a comment for both ARM and x86 assemblers. The goto-as tool
produces exactly this kind of comment.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Signed-off-by: Michael Tautschnig <tautschn@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86/shadow: adjust minimum allocation calculations
Jan Beulich [Mon, 11 Feb 2019 08:09:13 +0000 (09:09 +0100)]
x86/shadow: adjust minimum allocation calculations

A previously bad situation has become worse with the early setting of
->max_vcpus: The value returned by shadow_min_acceptable_pages() has
further grown, and hence now holds back even more memory from use for
the p2m.

Make sh_min_allocation() account for all p2m memory needed for
shadow_enable() to succeed during domain creation (at which point the
domain has no memory at all allocated to it yet, and hence use of
d->tot_pages is meaningless).

Also make shadow_min_acceptable_pages() no longer needlessly add 1 to
the vCPU count.

Finally make the debugging printk() in shadow_alloc_p2m_page() a little
more useful by logging some of the relevant domain settings.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs: features/qemu-depriv formatting fixes
George Dunlap [Thu, 7 Feb 2019 12:41:17 +0000 (12:41 +0000)]
docs: features/qemu-depriv formatting fixes

Need a space between the paragraph and the list so pandoc knows it's a
list.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs: Update credit/credit2 feature docs reflecting new default scheduler
George Dunlap [Thu, 7 Feb 2019 12:05:43 +0000 (12:05 +0000)]
docs: Update credit/credit2 feature docs reflecting new default scheduler

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700
Ian Jackson [Thu, 7 Feb 2019 15:02:27 +0000 (15:02 +0000)]
tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700

These directories ought not to be even world-readable.  If this script
for some reason runs with a lax umask they might be created
overly-writeable.  Avoid any such bug by setting the mode explicitly.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: xencommons: Fixes to Description
Ian Jackson [Thu, 7 Feb 2019 15:02:26 +0000 (15:02 +0000)]
tools: init scripts: xencommons: Fixes to Description

`neeeded' is a typo.  And xend is long gone.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: xencommons: Provides `xen'
Ian Jackson [Thu, 7 Feb 2019 15:02:25 +0000 (15:02 +0000)]
tools: init scripts: xencommons: Provides `xen'

It is useful to have a single `xen' facility (in the LSB Provides
namespace).  That allows other facilities to specify that they should
go after `xen' without needing to know the implementation details.

This service name is already Provide'd by the (fairly different) init
scripts used in Debian.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/arm: gic-v2: deactivate interrupts during initialization
Stefano Stabellini [Tue, 5 Feb 2019 21:38:53 +0000 (13:38 -0800)]
xen/arm: gic-v2: deactivate interrupts during initialization

Interrupts could be ACTIVE at boot. Make sure to deactivate them during
initialization.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
CC: julien.grall@arm.com
CC: peng.fan@nxp.com
CC: jgross@suse.com
6 years ago docs, argo: add design document for Argo
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
docs, argo: add design document for Argo

Document provides a brief introduction to the Argo interdomain
communication mechanism and a detailed description of the granular
locking used within the Argo implementation.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago SUPPORT.md : add new entry for the Argo feature
Christopher Clark [Wed, 6 Feb 2019 09:04:00 +0000 (10:04 +0100)]
SUPPORT.md : add new entry for the Argo feature

Status: Experimental

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago MAINTAINERS: add new section for Argo and self as maintainer
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
MAINTAINERS: add new section for Argo and self as maintainer

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: notify: don't describe rings that cannot be sent to
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
xsm, argo: notify: don't describe rings that cannot be sent to

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for any access to argo by a domain
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
xsm, argo: XSM control for any access to argo by a domain

Will inhibit initialization of the domain's argo data structure to
prevent receiving any messages or notifications and access to any of
the argo hypercall operations.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for argo message send operation
Christopher Clark [Wed, 6 Feb 2019 09:02:00 +0000 (10:02 +0100)]
xsm, argo: XSM control for argo message send operation

Default policy: allow.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for argo register
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
xsm, argo: XSM control for argo register

XSM controls for argo ring registration with two distinct cases, where
the ring being registered is:

1) Single source:  registering a ring for communication to receive messages
                   from a specified single other domain.
   Default policy: allow.

2) Any source:     registering a ring for communication to receive messages
                   from any, or all, other domains (ie. wildcard).
   Default policy: deny, with runtime policy configuration via bootparam.

This commit modifies the signature of core XSM hook functions in order to
apply 'const' to arguments, needed in order for 'const' to be accepted in
signature of functions that invoke them.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the notify op
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the notify op

Queries for data about space availability in registered rings and
causes notification to be sent when space has become available.

The hypercall op populates a supplied data structure with information about
ring state and if insufficient space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.

Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate
(ie. a NULL argument) to trigger such a check and consequent notifications.

Limit the maximum number of notify requests in a single operation to a
simple fixed limit of 256.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the sendv op; evtchn: expose send_guest_global_virq
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the sendv op; evtchn: expose send_guest_global_virq

sendv operation is invoked to perform a synchronous send of buffers
contained in iovs to a remote domain's registered ring.

It takes:
 * A destination address (domid, port) for the ring to send to.
   It performs a most-specific match lookup, to allow for wildcard.
 * A source address, used to inform the destination of where to reply.
 * The address of an array of iovs containing the data to send
 * .. and the length of that array of iovs
 * and a 32-bit message type, available to communicate message context
   data (eg. kernel-to-kernel, separate from the application data).

If insufficient space exists in the destination ring, it will return
-EAGAIN and Xen will notify the caller when sufficient space becomes
available.

Accesses to the ring indices are appropriately atomic. The rings are
mapped into Xen's private address space to write as needed and the
mappings are retained for later use.

Notifications are sent to guests via VIRQ and send_guest_global_virq is
exposed in the change to enable argo to call it. VIRQ_ARGO is claimed
from the VIRQ previously reserved for this purpose (#11).

The VIRQ notification method is used rather than sending events using
evtchn functions directly because:

* no current event channel type is an exact fit for the intended
  behaviour. ECS_IPI is closest, but it disallows migration to
  other VCPUs which is not necessarily a requirement for Argo.

* at the point of argo_init, allocation of an event channel is
  complicated by none of the guest VCPUs being initialized yet
  and the event channel logic expects that a valid event channel
  has a present VCPU.

* at the point of signalling a notification, the VIRQ logic is already
  defensive: if d->vcpu[0] is NULL, the notification is just silently
  dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
  must not be NULL, otherwise a null pointer dereference occurs.

Using a VIRQ removes the need for the guest to query to determine which
event channel notifications will be delivered on. This is also likely to
simplify establishing future L0/L1 nested hypervisor argo communication.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the unregister op
Christopher Clark [Wed, 6 Feb 2019 09:04:00 +0000 (10:04 +0100)]
argo: implement the unregister op

Takes a single argument: a handle to the ring unregistration struct,
which specifies the port and partner domain id or wildcard.

The ring's entry is removed from the hashtable of registered rings;
any entries for pending notifications are removed; and the ring is
unmapped from Xen's address space.

If the ring had been registered to communicate with a single specified
domain (ie. a non-wildcard ring) then the partner domain state is removed
from the partner domain's argo send_info hash table.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the register op
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the register op

The register op is used by a domain to register a region of memory for
receiving messages from either a specified other domain, or, if specifying a
wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

Wildcard any-sender rings are default disabled and registration will be
refused with EPERM unless they have been specifically enabled with the
new mac-permissive flag that is added to the argo boot option here. The
reason why the default for wildcard rings is 'deny' is that there is
currently no means to protect the ring from DoS by a noisy domain
spamming the ring, affecting other domains' ability to send to it. This
will be addressed with XSM policy controls in subsequent work.

Since denying access to any-sender rings is a significant functional
constraint, the new option "mac-permissive" for the argo bootparam
enables overriding this. eg: "argo=1,mac-permissive=1"
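
As a rough illustration of the policy described above, the check could
be sketched as below. The function and parameter names are hypothetical
and not taken from the argo implementation; only the EPERM behaviour and
the default of 'deny' come from this commit message.

```c
#include <stdbool.h>

#define SKETCH_EPERM 1  /* numeric value of EPERM */

/*
 * Hypothetical helper: decide whether a ring registration may proceed.
 * A wildcard (any-sender) ring is refused with -EPERM unless the
 * mac-permissive boot flag has been set; other rings are unaffected.
 */
static int sketch_check_wildcard(bool partner_is_wildcard,
                                 bool mac_permissive)
{
    if ( partner_is_wildcard && !mac_permissive )
        return -SKETCH_EPERM;

    return 0;
}
```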

The p2m type of the memory supplied by the guest for the ring must be
p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
is registered.

This hypercall op and its interface currently only support 4K-sized pages.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/arm: introduce guest_handle_for_field()
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
xen/arm: introduce guest_handle_for_field()

ARM port of c/s bb544585: "introduce guest_handle_for_field()"

This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoerrno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI

EMSGSIZE: Argo's sendv operation will return EMSGSIZE when an excess amount
of data, across all iovs, has been supplied, exceeding either the statically
configured maximum size of a transmittable message, or the (variable) size
of the ring registered by the destination domain.

ECONNREFUSED: Argo's register operation will return ECONNREFUSED if a ring
is being registered to communicate with a specific remote domain that does
exist but is not argo-enabled.

These codes are described by POSIX here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
    EMSGSIZE     : "Message too large"
    ECONNREFUSED : "Connection refused".

The numeric values assigned to each are taken from Linux, as is the case
for the existing error codes.
    EMSGSIZE     : 90
    ECONNREFUSED : 111
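
A simplified sketch of the EMSGSIZE condition described above follows.
The helper name, its parameters and the SKETCH_ constant names are
hypothetical; only the numeric values are taken from this commit. A real
check would also have to account for per-message overhead in the ring,
which is ignored here.

```c
#include <stddef.h>

/* Values as assigned in this commit (taken from Linux). */
#define SKETCH_EMSGSIZE      90
#define SKETCH_ECONNREFUSED 111

/*
 * Hypothetical helper: return 0 if a message of total_len bytes (summed
 * across all iovs) may be sent, or -EMSGSIZE if it exceeds either the
 * static maximum message size or the size of the destination ring.
 */
static int sketch_check_msg_len(size_t total_len, size_t max_msg_size,
                                size_t ring_size)
{
    if ( total_len > max_msg_size || total_len > ring_size )
        return -SKETCH_EMSGSIZE;

    return 0;
}
```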

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoargo: init, destroy and soft-reset, with enable command line opt
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: init, destroy and soft-reset, with enable command line opt

Initialises basic data structures and performs teardown of argo state
for domain shutdown.

Inclusion of the Argo implementation is dependent on CONFIG_ARGO.

Introduces a new Xen command line parameter 'argo': bool to enable/disable
the argo hypercall. Defaults to disabled.

New headers:
  public/argo.h: with definitions of addresses and ring structure, including
  indexes for atomic update for communication between domain and hypervisor.

  xen/argo.h: to expose the hooks for integration into domain lifecycle:
    argo_init: per-domain init of argo data structures for domain_create.
    argo_destroy: teardown for domain_destroy and the error exit
                  path of domain_create.
    argo_soft_reset: reset of domain state for domain_soft_reset.

Adds a new field to struct domain: struct argo_domain *argo;

In accordance with recent work on _domain_destroy, argo_destroy is
idempotent. It will tear down: all rings registered by this domain, all
rings where this domain is the single sender (ie. specified partner,
non-wildcard rings), and all pending notifications where this domain is
awaiting signal about available space in the rings of other domains.

A count will be maintained of the number of rings that a domain has
registered in order to limit it below the fixed maximum limit defined here.

Macros are defined to verify the internal locking state within the argo
implementation. The macros are ASSERTed on entry to functions to validate
and document the required lock state prior to calling.

The hash function for the hashtables that hold ring state is derived from
the string hashing function djb2 (http://www.cse.yorku.ca/~oz/hash.html)
by Daniel J. Bernstein. Basic testing with a limited number of domains and
ports has shown reasonable distribution for the table size.
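
For reference, the textbook form of djb2 can be sketched as below. This
is the classic string hash only, not the derived function used by the
argo hashtables; the name djb2_hash is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook djb2: start from 5381 and fold in each byte as hash * 33 + c. */
static uint32_t djb2_hash(const char *s)
{
    uint32_t hash = 5381;

    assert(s != NULL);

    for ( ; *s != '\0'; s++ )
        hash = (hash << 5) + hash + (unsigned char)*s;  /* hash * 33 + c */

    return hash;
}
```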

The software license on the public header is the BSD license, standard
procedure for the public Xen headers. The public header was originally
posted under a GPL license at: [1]:
https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html

The following ACK by Lars Kurth is to confirm that only people being
employees of Citrix contributed to the header files in the series posted at
[1] and that thus the copyright of the files in question is fully owned by
Citrix. The ACK also confirms that Citrix is happy for the header files to
be published under a BSD license in this series (which is based on [1]).

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoargo: define argo_dprintk for subsystem debugging
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: define argo_dprintk for subsystem debugging

A convenience for working on development of the argo subsystem:
setting a #define variable enables additional debug messages.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years agoargo: introduce the argo_op hypercall boilerplate
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: introduce the argo_op hypercall boilerplate

Presence is gated upon CONFIG_ARGO.

Registers the hypercall previously reserved for this.
Takes 5 arguments, does nothing and returns -ENOSYS.

Implementation will provide a compat ABI so COMPAT_CALL is the selected
macro for the hypercall tables.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoargo: Introduce the Kconfig option to govern inclusion of Argo
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: Introduce the Kconfig option to govern inclusion of Argo

Defines CONFIG_ARGO when enabled. Default: disabled.

When the Kconfig option is enabled, the Argo hypercall implementation
will be included, allowing use of the hypervisor-mediated interdomain
communication mechanism.

Argo is implemented for x86 and ARM hardware platforms.

Availability of the option depends on EXPERT and Argo is currently an
experimental feature.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoarm: gic-v3: deactivate interrupts during initialization
Peng Fan [Tue, 5 Feb 2019 05:55:35 +0000 (05:55 +0000)]
arm: gic-v3: deactivate interrupts during initialization

On i.MX8, we implemented partition reboot which means Cortex-A reboot
will not impact M4 cores and System control Unit core. However GICv3 is
not reset because we also need to support A72 Cluster reboot without
affecting A53 Cluster.

The gic-v3 controller is configured with EOImode set to 1, so during xen
reboot, there is a function call "smp_call_function(halt_this_cpu, NULL, 0);"
but halt_this_cpu never returns, that means other CPUs have no chance to
deactivate the SGI interrupt, because the deactivate_irq operation is at
the end of do_sgi. During the next boot of Xen, CPU0 will issue
GIC_SGI_CALL_FUNCTION to other CPUs. As the Active state for SGI is left
untouched during the reboot, the GIC_SGI_CALL_FUNCTION will still be active
on the non-boot CPUs. This means the interrupt cannot be triggered again
until it gets deactivated.

And according to IHI0069D_gic_architecture_specification, chapter
"8.11.3 GICR_ICACTIVER0, Interrupt Clear-Active Register 0", the RW
field of GICR_ICACTIVER0 resets to a value that is architecturally UNKNOWN.
So make sure all interrupts are deactivated during initialization by
clearing the state.
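
The write-1-to-clear behaviour relied upon here can be modelled in
plain C as below. This is a software model for illustration only: the
struct and function names are hypothetical, and the real fix writes to
the memory-mapped redistributor register rather than to a variable.

```c
#include <stdint.h>

/* Hypothetical model of the active state tracked for interrupts 0-31. */
struct sketch_gic_redist {
    uint32_t active;  /* bit n set: interrupt n is active */
};

/*
 * Model of a write to GICR_ICACTIVER0: each bit written as 1 clears the
 * active state of the corresponding interrupt; bits written as 0 have
 * no effect.
 */
static void sketch_write_icactiver0(struct sketch_gic_redist *r,
                                    uint32_t val)
{
    r->active &= ~val;
}
```

Writing 0xffffffff at initialization therefore deactivates every
interrupt covered by the register, whatever state the previous boot
left behind.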

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agotools: drop obsolete xen-ringwatch
Wei Liu [Mon, 4 Feb 2019 13:58:24 +0000 (13:58 +0000)]
tools: drop obsolete xen-ringwatch

This utility can't possibly work with modern Xen setup: none of the
sysfs path used (under /sys/devices/xen-backend) is documented as
stable ABI in upstream Linux kernel.

Archaeology shows that the path used could have been part of the
xenolinux fork which never got upstreamed.

Its utility is zero nowadays. Drop it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agoxen/arm: irq: End cleanly spurious interrupt
Julien Grall [Mon, 28 Jan 2019 16:00:23 +0000 (16:00 +0000)]
xen/arm: irq: End cleanly spurious interrupt

no_irq_type handlers are used when an IRQ does not have an action attached.
This is useful to detect misconfiguration between the interrupt
controller and the software.

Currently, all the handlers do nothing on a spurious interrupt. This
means that if such an interrupt is received, its priority will not be
dropped and the processor will lose the ability to receive any
interrupt of lower or equal priority.

A spurious interrupt can happen while releasing an interrupt assigned to
a guest (this happens during domain destruction). The interaction is roughly:

CPU0                                CPU1
release_guest_irq(A)
spin_lock(&desc->lock)
gic_remove_irq_from_guest
                                    receive IRQ A
                                    spin_lock(&desc->lock)
    desc->handler->shutdown()
      set_bit(IRQ_DISABLED)
    desc->handler = &no_irq_type
spin_unlock(&desc->lock)
                                    desc->handler->end();
                                    spin_unlock(&desc->lock)

Because the no_irq_type.end callback is implemented as a NOP, CPU1 will
not drop the priority of the interrupt. So the CPU will not be able to
receive any interrupt routed to any guest afterwards.

The problem can be prevented by dropping the priority and deactivating
the interrupt via gic_hw_ops->gic_host_irq->end().

Note that, for now, interrupts used by Xen are safe because they do not
use no_irq_type on release.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agotools/misc: Remove obsolete xen-bugtool
Hans van Kranenburg [Sun, 3 Feb 2019 20:35:18 +0000 (21:35 +0100)]
tools/misc: Remove obsolete xen-bugtool

xen-bugtool relies on code that has been removed in commit 9e8672f1c3
"tools: remove xend and associated python modules", more than 5 years
ago. Remove it, since it confuses users.

    -$ /usr/sbin/xen-bugtool
    Traceback (most recent call last):
      File "/usr/sbin/xen-bugtool", line 9, in <module>
        from xen.util import bugtool
    ImportError: No module named xen.util

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866380
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoautomation: introduce a QEMU smoke test for PVH Dom0
Wei Liu [Thu, 24 Jan 2019 14:03:48 +0000 (14:03 +0000)]
automation: introduce a QEMU smoke test for PVH Dom0

Make qemu-smoke-x86-64.sh take a variant argument. Make two new tests
in test.yaml.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibxl: When restricted, start QEMU paused
Anthony PERARD [Thu, 31 Jan 2019 10:57:48 +0000 (10:57 +0000)]
libxl: When restricted, start QEMU paused

libxl runs the command "cont" later during guest creation; i.e. it
is expecting that QEMU would not do any emulation.  Use the "-S"
command option to achieve this.

Unfortunately, when QEMU is started with "-S", it won't write QEMU's
readiness into xenstore. So only activate this option when we have a
QEMU startup notification via QMP available, i.e. when dm_restrict
is activated.

The -S option has the side-effect of suppressing the startup
notification via xenstore: libxl will only get the notification via
QMP.

It is important to rely only on QMP for notification when we have
QMP available, as (due to a qemu bug) not waiting for that QMP
notification may result in the QMP socket becoming blocked, so that
QEMU stops responding to new connections even if no existing ones
are active.

When the QEMU bug happens, the actions taken by both libxl and QEMU
are roughly as follows:
- libxl connects and handshakes with QEMU, then sends the
  cmd "query-status".
- QEMU prepares and maybe tries to send the response,
  while also writing "running" into xenstore.
- libxl sees via xenstore that QEMU is running and disconnects from the
  QMP socket before receiving the response from the cmd.
=> The QMP socket (monitor) is thereby blocked and will never reply
  to commands on new connections.

This is due to QEMU only responding to one command at a time, and
suspending its monitor (QMP) until the command has been processed and
sent. Disconnecting from the socket doesn't unsuspend the monitor. The
race described here is very likely to happen with QEMU 3.1.50 (during
3.2 development), but can be reproduced with QEMU 3.1.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agox86/svm: Improve diagnostics when svm_get_insn_len() fails
Andrew Cooper [Fri, 30 Nov 2018 13:50:54 +0000 (13:50 +0000)]
x86/svm: Improve diagnostics when svm_get_insn_len() fails

Sadly, a lone:

  (XEN) emulate.c:156:d2v0 svm_get_insn_len: Mismatch between expected and actual instruction: eip = fffff804564139c0

on the console is of no use when trying to identify what went wrong.  Dump as
much state as we can to help diagnose the failure.

  (XEN) Insn mismatch: Expected opcode 0xf0031, modrm 0, got nrip_len 3, emul_len 3
  (XEN) SVM Insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00

Drop the debug-only early exit if the sources of length disagree, because the
only effect it has is to avoid the more detailed analysis of what went wrong.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/svm: Drop enum instruction_index and simplify svm_get_insn_len()
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (17:01 +0000)]
x86/svm: Drop enum instruction_index and simplify svm_get_insn_len()

Passing a 32-bit integer index into an array with entries containing less than
32 bits of data is wasteful, and creates an unnecessary error condition of
passing an out-of-range index.

The width of the X86EMUL_OPC() encoding is currently 20 bits for the
instructions used, which leaves room for a modrm byte.  Drop opc_tab[]
entirely, and encode the expected opcode/modrm information directly.
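
A minimal sketch of such a combined encoding follows. The helper names
and the exact bit layout (modrm in the low 8 bits, opcode above it) are
assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical packing: modrm byte in bits 0-7, opcode encoding in the
 * bits above it. With a 20-bit opcode the result uses 28 bits and so
 * fits in a 32-bit integer.
 */
static uint32_t sketch_insn_enc(uint32_t opc, uint8_t modrm)
{
    assert(opc < (1u << 20));  /* 20 bits, as stated in this commit */

    return (opc << 8) | modrm;
}

/* Recover the opcode part of a packed value. */
static uint32_t sketch_insn_opc(uint32_t enc)
{
    return enc >> 8;
}

/* Recover the modrm byte of a packed value. */
static uint8_t sketch_insn_modrm(uint32_t enc)
{
    return enc & 0xff;
}
```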

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/svm: Remove list functionality from __get_instruction_length_* infrastructure
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (09:01 -0800)]
x86/svm: Remove list functionality from __get_instruction_length_* infrastructure

The existing __get_instruction_length_from_list() has a single user
which uses the list functionality.  That user however should be looking
specifically for INVD or WBINVD, as reported by the vmexit exit reason.

Modify svm_vmexit_do_invalidate_cache() to ask for the correct
instruction, and drop all list functionality from the helper.

Take the opportunity to rename it to svm_get_insn_len(), and drop the
IOIO length handling which has never been used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86emul: correct AVX512BW write masking checks
Jan Beulich [Thu, 31 Jan 2019 10:38:24 +0000 (11:38 +0100)]
x86emul: correct AVX512BW write masking checks

For VPSADBW this likely was a result of bad copy-and-paste.

For VPS{L,R}LDQ comment and code were not in line, but then again the
comment also wasn't fully updated from the AVX2 original it got cloned
from.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agotools: fix build dependency upon generated header(s)
Jan Beulich [Thu, 31 Jan 2019 10:37:56 +0000 (11:37 +0100)]
tools: fix build dependency upon generated header(s)

Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h to potentially change between incremental builds.
Putting the make invocation to produce the header together with the
directory tree creation therefore does not work. Introduce a separate
goal.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/cmdline: Work around some specific command line warnings
Andrew Cooper [Tue, 29 Jan 2019 19:07:40 +0000 (19:07 +0000)]
xen/cmdline: Work around some specific command line warnings

Xen will warn when an unknown parameter is found in the command line.  e.g.

  (d8) [ 1556.334664] (XEN) parameter "pv-shim" unknown!

One case where this goes wrong is a workaround for an old grub bug, which
resulted in "placeholder" being prepended to the command line.

Another case is when booting a CONFIG_PV_SHIM_EXCLUSIVE build, in which the
parsing for the "pv-shim" parameter is discarded.

Introduce ignore_param() and OPT_IGNORE to cope with known cases, where
issuing a warning is the wrong course of action to take.
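
The idea can be sketched as a lookup that distinguishes "known but
deliberately ignored" from "unknown". All names prefixed sketch_ and the
table contents are hypothetical; the sketch does not reproduce the
ignore_param() or OPT_IGNORE definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_opt_type { SKETCH_OPT_BOOL, SKETCH_OPT_IGNORE };
enum sketch_action { SKETCH_PARSE, SKETCH_SKIP, SKETCH_WARN };

struct sketch_param {
    const char *name;
    enum sketch_opt_type type;
};

/* Hypothetical parameter table, including two ignored entries. */
static const struct sketch_param sketch_params[] = {
    { "argo",        SKETCH_OPT_BOOL   },
    { "placeholder", SKETCH_OPT_IGNORE },
    { "pv-shim",     SKETCH_OPT_IGNORE },
};

/*
 * Classify a command line parameter name: parse it, silently skip it,
 * or warn that it is unknown.
 */
static enum sketch_action sketch_classify(const char *name)
{
    size_t i;

    assert(name != NULL);

    for ( i = 0; i < sizeof(sketch_params) / sizeof(sketch_params[0]); i++ )
    {
        if ( strcmp(sketch_params[i].name, name) != 0 )
            continue;

        return sketch_params[i].type == SKETCH_OPT_IGNORE ? SKETCH_SKIP
                                                          : SKETCH_PARSE;
    }

    return SKETCH_WARN;  /* not in the table: warn as before */
}
```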

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/pvh-boot: don't mandate validity of RSDP pointer
Wei Liu [Wed, 30 Jan 2019 13:55:55 +0000 (13:55 +0000)]
x86/pvh-boot: don't mandate validity of RSDP pointer

RSDP is not mandatory according to PVH spec. Remove the BUG_ON. The
guest (xen) will fall back to scanning if necessary.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/arm: gic-vgic: Fix the assert condition in vgic_connect_hw_irq
Andrii Anisov [Fri, 25 Jan 2019 17:06:02 +0000 (19:06 +0200)]
xen/arm: gic-vgic: Fix the assert condition in vgic_connect_hw_irq

Currently, the assert condition in vgic_connect_hw_irq does not
correspond to the comment above it, which results in the assertion
being hit on HW IRQ disconnection.

Fix the condition so it corresponds to the comment and allows IRQ
disconnection on debug builds.

Fixes: ec2a2f1 ("ARM: VGIC: factor out vgic_connect_hw_irq()")
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Suggested-by: Stefan Nuernberger <snu@amazon.de>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
[julieng: Reword the commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibxl: correctly dispose of dominfo list in libxl_name_to_domid
Wei Liu [Tue, 29 Jan 2019 11:37:59 +0000 (11:37 +0000)]
libxl: correctly dispose of dominfo list in libxl_name_to_domid

Tamas reported ssid_label was leaked. Use the designated function to
free dominfo list to fix the leakage.

Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/hvm: Fix bit checking for CR4 and MSR_EFER
Andrew Cooper [Fri, 25 Jan 2019 16:23:46 +0000 (16:23 +0000)]
x86/hvm: Fix bit checking for CR4 and MSR_EFER

Before the cpuid_policy logic came along, %cr4/EFER auditing on migrate-in was
complicated, because at that point no CPUID information had been set for the
guest.  Auditing against the host CPUID was better than nothing, but not
ideal.

Similarly at the time, PVHv1 lacked the "CPUID passed through from hardware"
behaviour which PV guests had, and PVH dom0 had to be special-cased to be able
to boot.

Order of information in the migration stream is still an issue (hence we still
need to keep the restore parameter to cope with a nested virt corner case for
%cr4), but since Xen 4.9, all domains start with a suitable CPUID policy,
which is a more appropriate upper bound than host_cpuid_policy.

Finally, reposition the UMIP logic as it is the only row out of order.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()
Andrew Cooper [Tue, 22 Jan 2019 18:58:56 +0000 (18:58 +0000)]
x86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()

Code clearing the "Suppress VE" bit in an EPT entry isn't necessarily running
in current context.  In ALTP2M_external mode, it definitely is not, and in PV
context, vcpu_altp2m(current) acts upon the HVM union.

Even if we could sensibly resolve the target vCPU, it may legitimately not be
fully set up at this point, so rejecting the EPT modification would be buggy.

There is a path in hvm_hap_nested_page_fault() which explicitly emulates #VE
in the cpu_has_vmx_virt_exceptions case, so the -EOPNOTSUPP part of this
condition is also wrong.

Drop the !sve check entirely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>