xenbits.xensource.com Git - people/royger/xen.git/log
7 years ago xen/test/livepatch/Makefile: Install in DESTDIR/usr/lib/debug/xen-livepatch
Ian Jackson [Wed, 7 Jun 2017 14:00:17 +0000 (15:00 +0100)]
xen/test/livepatch/Makefile: Install in DESTDIR/usr/lib/debug/xen-livepatch

Dumping these patch files in /usr/lib/debug/xen-*.livepatch is a bit
ugly.

Also, refactor the Makefile to have a LIVEPATCHES variable, to reduce
repetition.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years ago public: there's no MMUEXT_SET_FOREIGNDOM
Jan Beulich [Wed, 14 Jun 2017 09:40:02 +0000 (11:40 +0200)]
public: there's no MMUEXT_SET_FOREIGNDOM

Correct respective comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago HVM: clean up hvm_save_one()
Jan Beulich [Wed, 14 Jun 2017 09:39:06 +0000 (11:39 +0200)]
HVM: clean up hvm_save_one()

Eliminate the for_each_vcpu() loop and the associated local variables,
don't override the save handler's return code, and correct formatting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago HVM: sanitize DOMCTL_gethvmcontext_partial handling
Jan Beulich [Wed, 14 Jun 2017 09:38:32 +0000 (11:38 +0200)]
HVM: sanitize DOMCTL_gethvmcontext_partial handling

Have the caller indicate its buffer size, provide a means to query the
needed size, don't ignore the upper halves of type code and instance,
and don't copy partial data.
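
The size negotiation described above can be sketched in standalone C. This is only an illustration of the pattern, not the domctl code: `fetch_record`, `record`, and `ERR_TOO_SMALL` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_TOO_SMALL (-1)   /* hypothetical error code */

/* Hypothetical saved record; stands in for one HVM context entry. */
static const uint8_t record[] = { 0xde, 0xad, 0xbe, 0xef, 0x01, 0x02 };

/*
 * buf == NULL: report the needed size through *size and succeed.
 * Buffer too small: fail without copying anything (no partial data).
 * Otherwise: copy the whole record and report the bytes written.
 */
static int fetch_record(uint8_t *buf, size_t *size)
{
    assert(size != NULL);

    if (buf == NULL) {
        *size = sizeof(record);
        return 0;
    }
    if (*size < sizeof(record))
        return ERR_TOO_SMALL;

    memcpy(buf, record, sizeof(record));
    *size = sizeof(record);
    return 0;
}
```

A caller would first pass a NULL buffer to learn the size, then allocate and call again.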

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago Revert "x86/mm: add temporary debugging code to get_page_from_gfn_p2m()"
Jan Beulich [Wed, 14 Jun 2017 09:36:42 +0000 (11:36 +0200)]
Revert "x86/mm: add temporary debugging code to get_page_from_gfn_p2m()"

This reverts commit 933f966bcdf4f4255b432071fc12c9ee2efb05ef.

Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years ago mm: provide more grep fodder
Wei Liu [Mon, 3 Apr 2017 11:22:39 +0000 (12:22 +0100)]
mm: provide more grep fodder

Define several _* and *_x macros for better grep-ability. This also
helps indexing tools like GNU Global.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago MAINTAINERS: Move rombios and vgabios under x86 maintainership
Andrew Cooper [Tue, 13 Jun 2017 10:37:39 +0000 (11:37 +0100)]
MAINTAINERS: Move rombios and vgabios under x86 maintainership

alongside hvmloader.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago x86/boot: Fix the boot time relocation calculations
Andrew Cooper [Fri, 2 Jun 2017 10:22:17 +0000 (11:22 +0100)]
x86/boot: Fix the boot time relocation calculations

c/s b28044226e1 "x86: make Xen early boot code relocatable" introduces

    mov $sym_offs(__image_base__),%esi

to the legacy boot path.  However, this is by definition 0, which means the
boot code only functions correctly when Xen is loaded at its preferred
physical address (2M at the time of writing).

Xen does cope if loaded at an alternative physical address, if the
MULTIBOOT2_TAG_TYPE_LOAD_BASE_ADDR tag is filled in properly.  While recent
versions of Grub do fill this in appropriately, tboot does not.  (In fact,
tboot loads Xen at the preferred address, but claims a load address of 8M.)

Both Multiboot 1 and 2 specify the execution environment as being flat.  As a
result, Xen needs no help calculating the proper load address.

However, Multiboot specifies %esp as undefined.  Experimentally, using the
entry %esp is fine, but this is certainly no guarantee.  Use a temporary stack
in the first page of RAM, which is one of the safest areas to clobber.

Calculate the load address from %eip alone, and ignore
MULTIBOOT2_TAG_TYPE_LOAD_BASE_ADDR entirely.  This fixes legacy boot under
various versions of tboot.

Finally, set up the stack as soon as possible, which means the BIOS path has a
usable stack for the entirety of its duration.  Use the full available stack
size, rather than limiting to an arbitrary 1k.  One side effect is that the
MB2/EFI path continues to use the EFI stack until the trampoline is entered.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
7 years ago x86/tests: Ignore automatically generated sse*.c files
Andrew Cooper [Mon, 12 Jun 2017 10:21:40 +0000 (11:21 +0100)]
x86/tests: Ignore automatically generated sse*.c files

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago tools/xenstat: fix missing linkage of libxenstat against libyajl
Peter Große [Mon, 12 Jun 2017 23:05:21 +0000 (01:05 +0200)]
tools/xenstat: fix missing linkage of libxenstat against libyajl

This fixes the python bindings, since symbols were missing in libxenstat.
xentop doesn't use any yajl functions, so drop linking libyajl.

Signed-off-by: Peter Große <pegro@friiks.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago libxenstat: use python detected by configure for python bindings
Peter Große [Mon, 12 Jun 2017 23:05:20 +0000 (01:05 +0200)]
libxenstat: use python detected by configure for python bindings

Signed-off-by: Peter Große <pegro@friiks.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago xl.cfg man page cleanup and fixes
Armando Vega [Thu, 8 Jun 2017 18:39:14 +0000 (20:39 +0200)]
xl.cfg man page cleanup and fixes

- fixed some minor numbering and syntax issues in the CPU allocation
  examples for the 'cpus' option
- semantic fixes to make explanations more clear throughout
- fixed all the typos I could see
- general styling and makeup fixes to make everything look more consistent

Signed-off-by: Armando Vega <armando@greenhost.nl>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago x86/boot: re-arrange how/when we do disk I/O
Jan Beulich [Tue, 13 Jun 2017 08:41:10 +0000 (10:41 +0200)]
x86/boot: re-arrange how/when we do disk I/O

We place the trampoline no lower than at 256k, so we have ample space
to read the MBRs of BIOS disks into an aligned buffer right below the
trampoline (not doing so has been found to be a problem on a buggy BIOS
coming with a Skull Canyon NUC). To facilitate that move MBR reading
past EDD info retrieval.

Also add a wrap check to the EDD info retrieval loop, to match that in
the MBR reading one.

Reported-by: Paul Durrant <Paul.Durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Paul Durrant <Paul.Durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago domctl: improve device assignment structure layout and use
Jan Beulich [Tue, 13 Jun 2017 08:39:52 +0000 (10:39 +0200)]
domctl: improve device assignment structure layout and use

Avoid needless gaps. Make flags field mandatory for all three
operations (and rename it to fit the intended future purpose of
possibly holding more than just one flag).

Also correct a typo in a related domctl.h comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years ago x86: limit page type width
Jan Beulich [Tue, 13 Jun 2017 08:38:51 +0000 (10:38 +0200)]
x86: limit page type width

There's no reason to burn 4 bits on page type when we only have 7 types
(plus "none") at present. This requires changing one use of
PGT_shared_page, which so far assumed that the type is both a power of
2 and the only type with the high bit set.
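
A rough sketch of why a narrower type field breaks single-bit tests. The layout and values below are illustrative assumptions, not Xen's actual PGT_* definitions: once the field shrinks to 3 bits, another type may also have the high bit set, so the type has to be compared as a whole field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: a 3-bit type field in the top bits of a 64-bit word. */
#define TYPE_SHIFT 61
#define TYPE_MASK  (UINT64_C(7) << TYPE_SHIFT)
#define TYPE(n)    ((uint64_t)(n) << TYPE_SHIFT)

#define T_shared   TYPE(4)   /* illustrative: only the high bit set */
#define T_other    TYPE(7)   /* illustrative: another type using the high bit */

/* Old-style check: only valid if no other type shares T_shared's bit. */
static bool is_shared_bit_test(uint64_t info)
{
    return (info & T_shared) != 0;
}

/* Field comparison: valid for any encoding of the types. */
static bool is_shared(uint64_t info)
{
    return (info & TYPE_MASK) == T_shared;
}
```

With these values `is_shared_bit_test(T_other)` gives a false positive, while `is_shared(T_other)` does not.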

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/HAP: avoid using bogus/misleading locking
Jan Beulich [Tue, 13 Jun 2017 08:38:02 +0000 (10:38 +0200)]
x86/HAP: avoid using bogus/misleading locking

hap_teardown() unconditionally releases the paging lock and is always
being called without the lock held: Lock acquire should then be
unconditional too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago livepatch: Wrong usage of spinlock on debug console.
Konrad Rzeszutek Wilk [Fri, 9 Jun 2017 13:31:28 +0000 (09:31 -0400)]
livepatch: Wrong usage of spinlock on debug console.

If we have a large number of livepatches and want to print them
on the console using 'xl debug-keys x' we eventually hit
the preemption check:

  if ( i && !(i % 64) )
  {
      spin_unlock(&payload_lock);
      process_pending_softirqs();
      if ( spin_trylock(&payload_lock) )
          return

<facepalm> The effect is that we have just effectively
taken the lock and returned without unlocking!
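
For comparison, a minimal standalone sketch of the intended pattern, assuming a try-lock that returns true on success (as spin_trylock() does). It uses a C11 atomic_flag rather than Xen's spinlocks, and `dump_items`, `try_lock`, `unlock`, and `process_pending_work` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag payload_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired. */
static bool try_lock(void) { return !atomic_flag_test_and_set(&payload_lock); }
static void unlock(void)   { atomic_flag_clear(&payload_lock); }

/* Stand-in for work done while the lock is dropped. */
static void process_pending_work(void) { }

/*
 * Visit `count` items with the lock held, dropping it every 64 items.
 * Returns false if the lock could not be (re)taken. The lock is never
 * held on return, whichever path is taken.
 */
static bool dump_items(unsigned int count)
{
    if (!try_lock())
        return false;

    for (unsigned int i = 0; i < count; i++) {
        if (i && !(i % 64)) {
            unlock();
            process_pending_work();
            /* Bail out only when re-acquiring FAILS; the bug inverted this. */
            if (!try_lock())
                return false;
        }
        /* ... print item i ... */
    }

    unlock();
    return true;
}
```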

Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago SVM: clean up svm_vmcb_isvalid()
Jan Beulich [Mon, 12 Jun 2017 07:32:14 +0000 (09:32 +0200)]
SVM: clean up svm_vmcb_isvalid()

- correct CR3, CR4, and EFER checks
- delete bogus nested paging check
- add vcpu parameter (to include in log messages) and constify vmcb one
- use bool/true/false
- use accessors (and local variables to improve code readability)
- adjust formatting

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years ago x86: drop unused barrier parameter from build_{read,write}_atomic()
Jan Beulich [Mon, 12 Jun 2017 07:31:34 +0000 (09:31 +0200)]
x86: drop unused barrier parameter from build_{read,write}_atomic()

Also take the opportunity and make an attempt at making the macro
definitions readable. Drop pointless casts while doing so.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/mm: drop further relics of translated PV domains
Jan Beulich [Mon, 12 Jun 2017 07:30:53 +0000 (09:30 +0200)]
x86/mm: drop further relics of translated PV domains

For PV domains paging_mode_{refcounts,translate}() are always false as
of commits 4045953527 ("x86/paging: Enforce PG_external == PG_translate
== PG_refcounts") and 92942fd3d4 ("x86/mm: drop
guest_{map,get_eff}_l1e() hooks").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86: get_page_from_gfn() should not return misleading type
Jan Beulich [Mon, 12 Jun 2017 07:29:45 +0000 (09:29 +0200)]
x86: get_page_from_gfn() should not return misleading type

It is not impossible that the page owner is dom_io. While no current
caller cares about this case, let's nevertheless return an appropriate
type even in that case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/apic: Drop CONFIG_IO_APIC
Andrew Cooper [Thu, 1 Jun 2017 15:17:59 +0000 (16:17 +0100)]
x86/apic: Drop CONFIG_IO_APIC

It is unconditionally selected, and compiling out IO-APIC support is not a
useful thing to do these days.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/apic: Drop CONFIG_LOCAL_APIC
Andrew Cooper [Thu, 1 Jun 2017 15:17:02 +0000 (16:17 +0100)]
x86/apic: Drop CONFIG_LOCAL_APIC

It is unconditionally selected, and all 64bit processors have local APICs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/apic: Drop the unused struct local_apic
Andrew Cooper [Thu, 1 Jun 2017 15:15:40 +0000 (16:15 +0100)]
x86/apic: Drop the unused struct local_apic

It is not suitable for Xen's use (being xapic and x2apic compatible), and the
comment doesn't inspire much confidence in its correctness.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago SVM: clean up svm_vmcb_dump()
Jan Beulich [Fri, 9 Jun 2017 12:14:27 +0000 (14:14 +0200)]
SVM: clean up svm_vmcb_dump()

- constify parameter
- use accessors
- drop stray casts
- adjust formatting

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years ago SVM: infer type in VMCB_ACCESSORS()
Jan Beulich [Fri, 9 Jun 2017 12:13:54 +0000 (14:13 +0200)]
SVM: infer type in VMCB_ACCESSORS()

Prevent accidental mistakes by not requiring explicit types to be
specified in the macro invocations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years ago SVM: use VMCB accessors
Jan Beulich [Fri, 9 Jun 2017 12:13:24 +0000 (14:13 +0200)]
SVM: use VMCB accessors

This is particularly relevant for the SET form, to ensure proper clean
bits tracking (albeit in the case here it's benign as CPL and other
segment register attributes share a clean bit).
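
To illustrate why the SET form matters, here is a simplified sketch of an accessor pair that keeps clean bits in sync. The struct layout, the `CLEAN_SEG` value, and the function names are assumptions made for the example and do not reproduce Xen's VMCB_ACCESSORS() output.

```c
#include <assert.h>
#include <stdint.h>

#define CLEAN_SEG (1u << 8)   /* illustrative clean bit for the segment group */

struct vmcb {
    uint32_t cleanbits;   /* bit set: hardware may reuse its cached copy */
    uint8_t  cpl;
};

static uint8_t vmcb_get_cpl(const struct vmcb *v)
{
    return v->cpl;
}

/* Writing through the setter marks the whole group dirty. */
static void vmcb_set_cpl(struct vmcb *v, uint8_t cpl)
{
    v->cpl = cpl;
    v->cleanbits &= ~CLEAN_SEG;
}
```

A direct write to `v->cpl` would leave the clean bit set, so the cached value could be used instead of the new one.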

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years ago kexec: provide user friendly option for memory limit
Simon Crowe [Fri, 9 Jun 2017 12:11:37 +0000 (14:11 +0200)]
kexec: provide user friendly option for memory limit

grub2 requires that the '<' character be escaped, which is
inconvenient for users; provide a more natural specifier.

Signed-off-by: Simon Crowe <Simon.Crowe@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago Config.mk: update seabios
Wei Liu [Wed, 7 Jun 2017 14:37:25 +0000 (15:37 +0100)]
Config.mk: update seabios

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years ago tools: bump some library version numbers to 4.10
Wei Liu [Wed, 7 Jun 2017 13:08:48 +0000 (14:08 +0100)]
tools: bump some library version numbers to 4.10

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years ago x86/pv/domain: clean up switch_compat
Wei Liu [Wed, 26 Apr 2017 15:28:37 +0000 (16:28 +0100)]
x86/pv/domain: clean up switch_compat

Remove the redundant is_pv_domain check. Rearrange setup_compat calls.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/pv/domain: clean up setup_compat_l4
Wei Liu [Wed, 26 Apr 2017 15:24:13 +0000 (16:24 +0100)]
x86/pv/domain: clean up setup_compat_l4

Move updating type_info after clearing the page. Add spaces.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/domain: move HVM specific code to hvm/domain.c
Wei Liu [Mon, 10 Apr 2017 13:14:46 +0000 (14:14 +0100)]
x86/domain: move HVM specific code to hvm/domain.c

Only one function, arch_set_info_hvm_guest, is moved. The
check_segment function is also moved since arch_set_info_hvm_guest is
the only user.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/domain: move PV specific code to pv/domain.c
Wei Liu [Mon, 24 Apr 2017 18:39:11 +0000 (19:39 +0100)]
x86/domain: move PV specific code to pv/domain.c

Move all the PV specific code along with the supporting code to
pv/domain.c.

This in turn requires exporting a few functions in header files. Create
pv/domain.h for that.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/domain: factor out pv_domain_initialise
Wei Liu [Fri, 7 Apr 2017 14:55:43 +0000 (15:55 +0100)]
x86/domain: factor out pv_domain_initialise

Lump everything PV related in arch_domain_create into
pv_domain_initialise.

Though domcr_flags and config are not necessary, the new function is
given those to match hvm counterpart.

Since it cleans up after itself there is no need to clean up in
arch_domain_create in case of failure. Remove the initialiser of rc in
arch_domain_create.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/domain: factor out pv_domain_destroy
Wei Liu [Fri, 7 Apr 2017 14:49:42 +0000 (15:49 +0100)]
x86/domain: factor out pv_domain_destroy

Now this function also frees the perdomain mapping. It is safe to do so
because destroy_perdomain_mapping is idempotent.

Move free_perdomain_mappings after pv_domain_destroy. It is safe to do
so because both destroy_perdomain_mapping and free_perdomain_mappings
are idempotent.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/domain: push per-domain mapping creation down to hvm_domain_initialise
Wei Liu [Mon, 24 Apr 2017 18:00:35 +0000 (19:00 +0100)]
x86/domain: push per-domain mapping creation down to hvm_domain_initialise

We want to have a single entry point to initialise hvm guest.  Push
the per-domain mapping creation down to hvm_domain_initialise.

We can't move setting hap_enabled yet because that field needs to be
set before paging initialisation. Document that.

While at it, supply hvm_domain_initialise with more arguments. Though
they aren't used yet, they might be required in the future.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/domain: factor out pv_vcpu_initialise
Wei Liu [Mon, 24 Apr 2017 17:11:00 +0000 (18:11 +0100)]
x86/domain: factor out pv_vcpu_initialise

Move PV specific vcpu initialisation code to said function, but leave
the only line needed by idle domain in vcpu_initialise.

Use pv_vcpu_destroy in error path to simplify code. It is safe to do so
because the destruction function accepts partially initialised vcpu
struct.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/domain: factor out pv_vcpu_destroy
Wei Liu [Mon, 24 Apr 2017 13:37:48 +0000 (14:37 +0100)]
x86/domain: factor out pv_vcpu_destroy

The function is made idempotent on purpose. Note that free_compat_l4,
release_compat_l4 and pv_destroy_gdt_ldt_l1tab are idempotent already.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/domain: make release_compat_l4 NULL tolerant
Wei Liu [Mon, 24 Apr 2017 13:33:43 +0000 (14:33 +0100)]
x86/domain: make release_compat_l4 NULL tolerant

Push the check in caller down to that function so that it becomes
idempotent.
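
The idea can be sketched as follows, with a hypothetical `struct vcpu_state` and `release_table` standing in for the real structures: the NULL check lives in the callee and the pointer is reset, so calling it twice is harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-vcpu state owning one allocated table. */
struct vcpu_state {
    int *l4_table;
};

/* Safe to call on a partially initialised or already released state. */
static void release_table(struct vcpu_state *v)
{
    if (v->l4_table == NULL)
        return;

    free(v->l4_table);
    v->l4_table = NULL;   /* makes a second call a no-op */
}
```

Callers on error paths can then invoke the release function unconditionally.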

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/domain: provide pv_{create,destroy}_gdt_ldt_l1tab and use them
Wei Liu [Mon, 24 Apr 2017 12:05:19 +0000 (13:05 +0100)]
x86/domain: provide pv_{create,destroy}_gdt_ldt_l1tab and use them

This patch encapsulates the perdomain creation and destruction into
helper functions and uses them where appropriate.

Since destroy_perdomain_mapping is idempotent, it is safe to call the
destruction function multiple times.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/apic: Drop workarounds for Pentium/82489DX erratum
Andrew Cooper [Thu, 1 Jun 2017 14:27:38 +0000 (15:27 +0100)]
x86/apic: Drop workarounds for Pentium/82489DX erratum

CONFIG_X86_GOOD_APIC is unconditionally selected for 64bit builds.  Drop the
related infrastructure including apic_{read,write}_around(), the former of
which had no effect, and the latter which was an alias of apic_write().

No functional change, as confirmed by diffing the before/after disassembly.
(Three __LINE__ numbers are different, but they are `mov $imm, %reg` as part
of a dprintk() call.)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
7 years ago x86/vmx: Fix vmentry failure because of invalid LER on Broadwell
Ross Lagerwall [Tue, 30 May 2017 14:05:04 +0000 (15:05 +0100)]
x86/vmx: Fix vmentry failure because of invalid LER on Broadwell

Occasionally, on certain Broadwell CPUs MSR_IA32_LASTINTTOIP has been
observed to have the top three bits corrupted as though the MSR is using
the LBR_FORMAT_EIP_FLAGS_TSX format. This is incorrect and causes a
vmentry failure -- the MSR should contain an offset into the current
code segment. This is assumed to be erratum BDF14. Work around the issue
by sign-extending into bits 48:63 for MSR_IA32_LASTINT{FROM,TO}IP.
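
As an illustration of the sign extension (a standalone sketch, not the patch itself), bit 47 is copied into bits 48:63, assuming 48-bit virtual addresses. `canonicalise` and `VADDR_BITS` are names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define VADDR_BITS 48   /* assumed virtual address width */

/* Replace bits 48..63 with copies of bit 47. */
static uint64_t canonicalise(uint64_t addr)
{
    const uint64_t upper = ~UINT64_C(0) << VADDR_BITS;       /* bits 48..63 */
    const uint64_t sign  = UINT64_C(1) << (VADDR_BITS - 1);  /* bit 47 */

    return (addr & sign) ? (addr | upper) : (addr & ~upper);
}
```

A value whose top three bits were cleared, such as 0x1fff800012345678, comes back as 0xffff800012345678.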

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years ago ns16550: add support for UART parameters to be specified with name-value pairs
Swapnil Paratey [Wed, 7 Jun 2017 10:35:01 +0000 (12:35 +0200)]
ns16550: add support for UART parameters to be specified with name-value pairs

Add name=value parsing options for com1 and com2 to add flexibility
in setting register values for MMIO UART devices.

Maintain backward compatibility with previous positional parameter
specifications.

eg. com1=115200,8n1,0x3f8,4
eg. com1=115200,8n1,0x3f8,4,reg_width=4,reg_shift=2
eg. com1=baud=115200,parity=n,reg_width=4,reg_shift=2,irq=4
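
A simplified sketch of name=value parsing for the numeric options shown above. It is not the ns16550 parser: `parse_uart_params` and `struct uart_params` are hypothetical, and positional arguments and `parity` are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct uart_params {
    unsigned long baud, reg_width, reg_shift, irq;
};

/* Parse "name=value,name=value,...". Returns 0 on success, -1 on error. */
static int parse_uart_params(const char *conf, struct uart_params *p)
{
    while (*conf != '\0') {
        size_t len = strcspn(conf, ",");          /* length of this item */
        const char *eq = memchr(conf, '=', len);  /* '=' inside the item */
        unsigned long *field;
        size_t name_len;

        if (eq == NULL)
            return -1;                            /* not a name=value pair */
        name_len = (size_t)(eq - conf);

        if (name_len == 4 && !strncmp(conf, "baud", 4))
            field = &p->baud;
        else if (name_len == 9 && !strncmp(conf, "reg_width", 9))
            field = &p->reg_width;
        else if (name_len == 9 && !strncmp(conf, "reg_shift", 9))
            field = &p->reg_shift;
        else if (name_len == 3 && !strncmp(conf, "irq", 3))
            field = &p->irq;
        else
            return -1;                            /* unknown name */

        *field = strtoul(eq + 1, NULL, 0);        /* stops at ',' or NUL */
        conf += len;
        if (*conf == ',')
            conf++;
    }
    return 0;
}
```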

Signed-off-by: Swapnil Paratey <swapnil.paratey@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86: ensure invalidate_icache() definition is visible only when !__ASSEMBLY__
Punit Agrawal [Wed, 7 Jun 2017 10:34:20 +0000 (12:34 +0200)]
x86: ensure invalidate_icache() definition is visible only when !__ASSEMBLY__

Commit edff605421 introduces an empty invalidate_icache() function in
page.h for x86 but mistakenly places it outside the !__ASSEMBLY__
block. This causes build failure on x86.

Address this by moving the function definition to within the existing
!__ASSEMBLY__ block.

Fixes: edff605421 ("Avoid excess icache flushes in populate_physmap() before domain has been created")
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago xen/arm: Remove unused helpers access_ok and array_access_ok
Julien Grall [Tue, 23 May 2017 17:03:36 +0000 (18:03 +0100)]
xen/arm: Remove unused helpers access_ok and array_access_ok

Both helpers access_ok and array_access_ok are not used on ARM. Remove
them.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago Avoid excess icache flushes in populate_physmap() before domain has been created
Punit Agrawal [Fri, 26 May 2017 11:14:07 +0000 (12:14 +0100)]
Avoid excess icache flushes in populate_physmap() before domain has been created

populate_physmap() calls alloc_heap_pages() per requested
extent. alloc_heap_pages() invalidates the entire icache per
extent. During domain creation, the icache invalidations can be deferred
until all the extents have been allocated as there is no risk of
executing stale instructions from the icache.

Introduce a new flag "MEMF_no_icache_flush" to be used to prevent
alloc_heap_pages() from performing icache maintenance operations. Use
the flag in populate_physmap() before the domain has been unpaused and
perform the required icache maintenance at the end of the
allocation.
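
The deferral can be modelled with a small standalone sketch. Only the flag name MEMF_no_icache_flush comes from this commit; its value, the counter, and the functions `alloc_extent` and `populate` are simplified stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

#define MEMF_no_icache_flush (1u << 0)   /* value is illustrative */

static unsigned int icache_flushes;      /* counts simulated invalidations */

static void invalidate_icache(void)
{
    icache_flushes++;
}

/* Stand-in for the per-extent allocator. */
static void alloc_extent(unsigned int memflags)
{
    /* ... allocate pages ... */
    if (!(memflags & MEMF_no_icache_flush))
        invalidate_icache();
}

/* Before the domain has run, skip per-extent flushes and flush once. */
static void populate(unsigned int nr_extents, bool creation_finished)
{
    unsigned int memflags = creation_finished ? 0 : MEMF_no_icache_flush;

    for (unsigned int i = 0; i < nr_extents; i++)
        alloc_extent(memflags);

    if (memflags & MEMF_no_icache_flush)
        invalidate_icache();
}
```

For 16 extents this model performs one invalidation during creation instead of 16.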

One concern is the lack of synchronisation around testing for
"creation_finished". But it seems, in practice the window where it is
out of sync should be small enough to not matter.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago arm: p2m: Prevent redundant icache flushes
Punit Agrawal [Fri, 26 May 2017 11:14:06 +0000 (12:14 +0100)]
arm: p2m: Prevent redundant icache flushes

When toolstack requests flushing the caches, flush_page_to_ram() is
called for each page of the requested domain. This leads to unnecessary
icache invalidation operations.

Let's take the responsibility of performing icache operations and use
the recently introduced flag to prevent redundant icache operations by
flush_page_to_ram().

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago Allow control of icache invalidations when calling flush_page_to_ram()
Punit Agrawal [Fri, 26 May 2017 11:14:05 +0000 (12:14 +0100)]
Allow control of icache invalidations when calling flush_page_to_ram()

flush_page_to_ram() unconditionally drops the icache. In certain
situations this leads to excessive icache flushes when
flush_page_to_ram() ends up being repeatedly called in a loop.

Introduce a parameter to allow callers of flush_page_to_ram() to take
responsibility for synchronising the icache. This is in preparation for
adding logic to make the callers perform the necessary icache
maintenance operations.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago vif-common.sh: Have iptables wait for the xtables lock
George Dunlap [Mon, 5 Jun 2017 10:02:30 +0000 (11:02 +0100)]
vif-common.sh: Have iptables wait for the xtables lock

iptables has a system-wide lock on the xtables.  Strangely though, in
the case of two concurrent invocations, the default is for the
instance not grabbing the lock to exit out rather than waiting for it.
This means that when starting a large number of guests in parallel,
many will fail out with messages like this:

  2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
  2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4

In order to instruct iptables to wait for the lock, you have to
specify '-w'.  Unfortunately, not all versions of iptables have the
'-w' option, so on first invocation check to see if it accepts the -w
command.

Reported-by: Antony Saba <awsaba@gmail.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years ago x86/HAP: don't open code clear_domain_page()
Jan Beulich [Tue, 6 Jun 2017 12:37:12 +0000 (14:37 +0200)]
x86/HAP: don't open code clear_domain_page()

Also drop a stray initializer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86/HVM: correct notion of new CPL in task switch emulation
Jan Beulich [Tue, 6 Jun 2017 12:36:41 +0000 (14:36 +0200)]
x86/HVM: correct notion of new CPL in task switch emulation

Commit aac1df3d03 ("x86/HVM: introduce hvm_get_cpl() and respective
hook") went too far in one aspect: When emulating a task switch we
really shouldn't be looking at what hvm_get_cpl() returns, as we're
switching all segment registers.

The issue manifests as a vmentry failure for 32bit VMs which use task
gates to service interrupts/exceptions, in situations where delivering
the event interrupts user code, and a privilege increase is required.

However, instead of reverting the relevant parts of that commit, have
the caller tell the segment loading function what the new CPL is. This
at once fixes ES being loaded before CS so far having had its checks
done against the old CPL.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86/physdev: factor out the code to allocate and map a pirq
Roger Pau Monné [Tue, 6 Jun 2017 12:35:54 +0000 (14:35 +0200)]
x86/physdev: factor out the code to allocate and map a pirq

Move the code to allocate and map a domain pirq (either GSI or MSI)
into the x86 irq code base, so that it can be used outside of the
physdev ops.

This change shouldn't affect the functionality of the already existing
physdev ops.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago domctl/pt: remove hvm_domid field from bind struct
Roger Pau Monné [Tue, 6 Jun 2017 12:35:01 +0000 (14:35 +0200)]
domctl/pt: remove hvm_domid field from bind struct

This field is unused and serves no purpose.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[jb: bump domctl interface version]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/vlapic: fix two flaws in emulating MSR_IA32_APICBASE
Chao Gao [Tue, 6 Jun 2017 12:34:30 +0000 (14:34 +0200)]
x86/vlapic: fix two flaws in emulating MSR_IA32_APICBASE

According to SDM Chapter ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
-> Extended XAPIC (x2APIC) -> x2APIC State Transitions, the existing code to
handle guest's writing MSR_IA32_APICBASE has two flaws:
1. Transition from x2APIC Mode to Disabled Mode is allowed but wrongly
disabled currently. Fix it by removing the related check.
2. Transition from x2APIC Mode to xAPIC Mode is illegal but wrongly allowed
currently. Considering changing ENABLE bit of the MSR has been handled,
it can be fixed by only allowing transition from xAPIC Mode to x2APIC Mode
(the other two transitions: from x2APIC mode to xAPIC Mode, from disabled mode
to invalid state (EN=0, EXTD=1) are disabled).
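
The transition rules can be summarised in a small checker. This is a sketch written from the SDM state diagram rather than the vlapic code; `apicbase_transition_ok` is a hypothetical name, and the Disabled -> x2APIC case is included from the SDM although the text above does not list it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_EXTD (UINT64_C(1) << 10)   /* x2APIC mode enable */
#define APICBASE_EN   (UINT64_C(1) << 11)   /* APIC global enable */

/* Returns true if writing `new_val` over `old_val` is a legal mode change. */
static bool apicbase_transition_ok(uint64_t old_val, uint64_t new_val)
{
    bool old_en = old_val & APICBASE_EN, old_x2 = old_val & APICBASE_EXTD;
    bool new_en = new_val & APICBASE_EN, new_x2 = new_val & APICBASE_EXTD;

    if (!new_en && new_x2)
        return false;   /* EN=0, EXTD=1 is an invalid state */
    if (old_en && old_x2 && new_en && !new_x2)
        return false;   /* x2APIC -> xAPIC is illegal */
    if (!old_en && new_en && new_x2)
        return false;   /* Disabled -> x2APIC must go through xAPIC */

    return true;
}
```

Note that x2APIC -> Disabled (both bits cleared in one write) passes, which is the first flaw the commit fixes.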

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/PoD: drop a pointless local variable
Jan Beulich [Tue, 6 Jun 2017 12:33:47 +0000 (14:33 +0200)]
x86/PoD: drop a pointless local variable

... and move another one into a more narrow scope.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years ago x86/NPT: deal with fallout from 2Mb/1Gb unmapping change
Jan Beulich [Tue, 6 Jun 2017 12:32:54 +0000 (14:32 +0200)]
x86/NPT: deal with fallout from 2Mb/1Gb unmapping change

Commit efa9596e9d ("x86/mm: fix incorrect unmapping of 2MB and 1GB
pages") left the NPT code untouched, as there is no explicit alignment
check matching the one in EPT code. However, the now more widespread
storing of INVALID_MFN into PTEs requires adjustments:
- calculations when shattering large pages may spill into the p2m type
  field (converting p2m_populate_on_demand to p2m_grant_map_rw) - use
  OR instead of PLUS,
- the use of plain l{2,3}e_from_pfn() in p2m_pt_set_entry() results in
  all upper (flag) bits being clobbered - introduce and use
  p2m_l{2,3}e_from_pfn(), paralleling the existing L1 variant.
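
The first point can be demonstrated with a toy entry format. The layout below (frame number in bits 12..51, type above it) and the type numbers are assumptions made for the example, not the real NPT PTE encoding: with an all-ones frame number, adding a page index carries into the type field, whereas OR leaves it alone.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PFN_MASK    UINT64_C(0x000ffffffffff000)   /* bits 12..51 */
#define TYPE_SHIFT  52
#define INVALID_PFN (PFN_MASK >> PAGE_SHIFT)       /* all ones in the field */

static uint64_t make_entry(uint64_t pfn, unsigned int type)
{
    return ((uint64_t)type << TYPE_SHIFT) | ((pfn << PAGE_SHIFT) & PFN_MASK);
}

static unsigned int entry_type(uint64_t e)
{
    return (unsigned int)(e >> TYPE_SHIFT);
}

/* Derive the i-th small-page entry from a large-page entry. */
static uint64_t split_add(uint64_t large, unsigned int i)
{
    return large + ((uint64_t)i << PAGE_SHIFT);    /* may carry into type */
}

static uint64_t split_or(uint64_t large, unsigned int i)
{
    return large | ((uint64_t)i << PAGE_SHIFT);    /* never touches type */
}
```

For a suitably aligned valid frame number both forms give the same result, so switching to OR only changes the invalid case.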

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years ago xen/public: Correct the HYPERVISOR_dm_op() documentation to match reality
Andrew Cooper [Thu, 1 Jun 2017 13:09:30 +0000 (14:09 +0100)]
xen/public: Correct the HYPERVISOR_dm_op() documentation to match reality

The number of buffers is ahead of the buffer list in the argument list.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years ago x86/pagewalk: Fix pagewalk's handling of instruction fetches
Andrew Cooper [Tue, 23 May 2017 16:32:30 +0000 (16:32 +0000)]
x86/pagewalk: Fix pagewalk's handling of instruction fetches

Despite the claim in the comment (which was based partly on the code already
being like that, and mistaken reasoning because of Xen leaking NX into guest
context), reality differs.

Use of the SMAP feature without NX, or in a 2-level guest, demonstrates an
observable difference between reads and instruction fetches, despite
PFEC_insn_fetch not being reported in the #PF error code.  This demonstrates
that instruction fetches are distinguished from data reads even without
PFEC_insn_fetch being reported.

Alter the pagewalk logic to keep the pagewalk insn_fetch input intact, but
only conditionally report insn_fetch in the error code.  This logic is more
in line with the Intel SDM text:

 * I/D flag (bit 4).
   This flag is 1 if (1) the access causing the page-fault exception was an
   instruction fetch; and (2) either (a) CR4.SMEP = 1; or (b) both (i) CR4.PAE
   = 1 (either PAE paging or 4-level paging is in use); and (ii) IA32_EFER.NXE
   = 1. Otherwise, the flag is 0. This flag describes the access causing the
   page-fault exception, not the access rights specified by paging.

and the AMD SDM text:

 * I/D - Bit 4. If this bit is set to 1, it indicates that the access that
   caused the page fault was an instruction fetch. Otherwise, this bit is
   cleared to 0. This bit is only defined if no-execute feature is enabled
   (EFER.NXE=1 && CR4.PAE=1).

Curiously, the AMD manual doesn't mention SMEP despite some Fam16h processors
and all Fam17h processors supporting it.  Experimentally, it behaves as
described by Intel.
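
As a rough sketch of the reporting rule quoted from the Intel SDM (the
function and macro names here are hypothetical, and this is not the
pagewalk code itself): the walk keeps its insn_fetch input, and only the
error code bit is conditional.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PFEC_INSN_FETCH (UINT32_C(1) << 4)   /* I/D flag, bit 4 */

/* Return the I/D bit to report in the #PF error code.  The access itself
 * is still treated as an instruction fetch by the caller either way. */
uint32_t sketch_report_insn_fetch(bool insn_fetch, bool smep,
                                  bool pae, bool nxe)
{
    if ( insn_fetch && (smep || (pae && nxe)) )
        return SKETCH_PFEC_INSN_FETCH;

    return 0;
}
```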

In addition, add some extra clarification and sanity checking around the use
of NX for the access checks, where it might be reserved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoRevert "x86/hvm: disable pkeys for guests in non-paging mode"
Andrew Cooper [Thu, 25 May 2017 17:17:01 +0000 (18:17 +0100)]
Revert "x86/hvm: disable pkeys for guests in non-paging mode"

This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.

When determining Access Rights, Protection Keys only take effect when CR4.PKE
is set, and 4-level paging is active.  All other circumstances (notably, 32bit
PAE paging) skip the Protection Key control mechanism.

Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
not using paging, as such a guest is necessarily running with EFER.LMA
disabled.

The {RD,WR}PKRU instructions are specified as being legal for use in any
operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
back of an unpaged guest, these instructions yield #UD despite the guest
correctly seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Huaitong Han <huaitong.han@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agostop_machine: fill fn_result only in case of error
Gregory Herrero [Thu, 1 Jun 2017 08:53:04 +0000 (10:53 +0200)]
stop_machine: fill fn_result only in case of error

When stop_machine_run() is called with NR_CPUS as last argument,
fn_result member must be filled only if an error happens since it is
shared across all cpus.

Assume CPU1 detects an error and sets fn_result to -1, then CPU2 doesn't
detect an error and sets fn_result to 0. The error detected by CPU1 will
be ignored.

Note that in case multiple failures occur on different CPUs, only the
last error will be reported.
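
A minimal sequential sketch of the rule, with hypothetical sketch_* names
and the CPUs simulated by a loop rather than run concurrently:

```c
/* Shared across all CPUs, as fn_result is in the NR_CPUS case. */
struct sketch_stopmachine {
    int fn_result;
};

/* Each CPU passes in the return value of its own fn() invocation. */
void sketch_record_result(struct sketch_stopmachine *sm, int ret)
{
    if ( ret )                 /* only overwrite the shared field on error */
        sm->fn_result = ret;
}

/* Simulate n CPUs reporting in order; returns the final shared result. */
int sketch_run_all(const int *per_cpu_ret, int n)
{
    struct sketch_stopmachine sm = { .fn_result = 0 };
    int i;

    for ( i = 0; i < n; i++ )
        sketch_record_result(&sm, per_cpu_ret[i]);

    return sm.fn_result;
}
```

With results { -1, 0 } the error survives; with { -1, 0, -2 } only the last
error is kept, matching the note above.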

Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: partially undo "fix build with gcc 7"
Jan Beulich [Thu, 1 Jun 2017 08:50:25 +0000 (10:50 +0200)]
x86: partially undo "fix build with gcc 7"

While f32400e90c ("x86: fix build with gcc 7")'s change to
compat_array_access_ok() is necessary, I had blindly and needlessly
also added it to array_access_ok(). There's no conditional expression
involved there, so undo it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agosmp: assert that all affected CPUs are online in on_selected_cpus()
Jan Beulich [Thu, 1 Jun 2017 08:49:53 +0000 (10:49 +0200)]
smp: assert that all affected CPUs are online in on_selected_cpus()

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agohvmloader: drop pointless objcopy invocation
Jan Beulich [Thu, 1 Jun 2017 08:49:26 +0000 (10:49 +0200)]
hvmloader: drop pointless objcopy invocation

It doesn't alter the image in any way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/MSI: improve memory usage in struct msi_desc
Jan Beulich [Thu, 1 Jun 2017 08:48:04 +0000 (10:48 +0200)]
x86/MSI: improve memory usage in struct msi_desc

There's no reason to have both a 4-byte hole and 4 bytes of tail
padding.
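
For illustration only (these are not the fields of struct msi_desc), a
sketch of how member order produces a hole plus tail padding on a typical
LP64 ABI, and how reordering removes both:

```c
#include <stddef.h>

/* Hypothetical members.  On a typical LP64 ABI there is a 4-byte hole
 * after 'a' and 4 bytes of tail padding after 'b'. */
struct sketch_before {
    int a;
    void *p;
    int b;
};

/* Same members, with the two ints adjacent: no hole, no tail padding. */
struct sketch_after {
    int a;
    int b;
    void *p;
};

size_t sketch_size_before(void) { return sizeof(struct sketch_before); }
size_t sketch_size_after(void)  { return sizeof(struct sketch_after); }
```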

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/x86: Drop sync_core()
Andrew Cooper [Fri, 19 May 2017 10:01:42 +0000 (11:01 +0100)]
xen/x86: Drop sync_core()

As identified in Linux c/s c198b121b1a1d "x86/asm: Rewrite sync_core() to use
IRET-to-self", sync_core() is only appropriate for two very specific use cases.

Xen doesn't have need of either of these use cases, so drop sync_core() to
avoid any misuse.

In the unlikely event that we do gain a legitimate use for sync_core(), it
should be reintroduced as a mov to %cr2, which has a lower overhead than
cpuid.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86/alternatives: Do not use sync_core() to serialize I$
Borislav Petkov [Sat, 3 Dec 2016 15:02:58 +0000 (16:02 +0100)]
xen/x86/alternatives: Do not use sync_core() to serialize I$

We use sync_core() in the alternatives code to stop speculative
execution of prefetched instructions because we are potentially changing
them and don't want to execute stale bytes.

What it does on most machines is call CPUID which is a serializing
instruction. And that's expensive.

However, the instruction cache is serialized when we're on the local CPU
and are changing the data through the same virtual address. So then, we
don't need the serializing CPUID but a simple control flow change. The
latter is accomplished with a CALL/RET which the noinline causes.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
[Linux commit 34bfab0eaf0fb5c6fb14c6b4013b06cdc7984466]

Ported to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/string: Clean up x86/string.h
Andrew Cooper [Fri, 12 May 2017 14:07:16 +0000 (15:07 +0100)]
x86/string: Clean up x86/string.h

 * None of the GCC docs mention memmove() in its list of builtins even today,
   but 4.1 does have the builtin, meaning that all currently supported
   compilers have it.
 * Consistently use Xen style, matching the common code, and introduce symbol
   definitions for function pointer use.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
7 years agoxen/string: Use compiler __builtin_*() where possible
Andrew Cooper [Tue, 2 Aug 2016 19:55:12 +0000 (19:55 +0000)]
xen/string: Use compiler __builtin_*() where possible

The use of -fno-builtin inhibits these automatic transformations.  Using the
__builtin_*() variants explicitly restores them, so that constructs such as
strlen("literal") are evaluated at compile time, and certain simple
operations are replaced with repeated string operations.

To avoid the macro altering the function names, use the method recommended by
the C specification by enclosing the function name in brackets to avoid the
macro being expanded.  This means that optimisation opportunities continue to
work in the rest of the translation unit.
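
A small sketch of the technique, using a hypothetical sketch_strlen() so as
not to clash with the C library, and assuming a compiler providing
__builtin_strlen() (GCC and Clang do).  A function-like macro is only
expanded when its name is followed by '(', so the parenthesised definition
and the function pointer use below both refer to the real function:

```c
#include <stddef.h>

/* Prototype, then a function-like macro of the same name. */
size_t sketch_strlen(const char *s);
#define sketch_strlen(s) __builtin_strlen(s)

/* The name is enclosed in brackets, so the macro is not expanded here. */
size_t (sketch_strlen)(const char *s)
{
    size_t n = 0;

    while ( s[n] )
        n++;

    return n;
}

/* Expands to the builtin; a compiler may fold this to a constant. */
size_t sketch_via_macro(void)
{
    return sketch_strlen("literal");
}

/* No '(' follows the name, so this takes the address of the function. */
size_t sketch_via_pointer(const char *s)
{
    size_t (*fn)(const char *) = sketch_strlen;

    return fn(s);
}
```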

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/string: Clean up {xen,arm}/string.h
Andrew Cooper [Fri, 12 May 2017 16:15:36 +0000 (17:15 +0100)]
xen/string: Clean up {xen,arm}/string.h

 * Drop __kernel_size_t entirely.  It isn't a useful distinction, especially
   as it means the prototypes don't appear to match their common
   definitions.
 * Introduce __HAVE_ARCH_* guards for strpbrk(), strsep() and strspn(), which
   match their implementation in common/string.c
 * Apply consistent Xen style throughout.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoRevert "tools/libxc: Drop broken xc_{get,set}_hvm_param() functions"
Wei Liu [Wed, 31 May 2017 10:35:25 +0000 (11:35 +0100)]
Revert "tools/libxc: Drop broken xc_{get,set}_hvm_param() functions"

This reverts commit fa4583333ddba6afb7b07ff7eb4d16e1a6a7459c.

QEMU build is broken by that patch.

7 years agoxl man page cleanup and fixes
Armando Vega [Wed, 31 May 2017 06:30:09 +0000 (08:30 +0200)]
xl man page cleanup and fixes

Signed-off-by: Armando Vega <armando@greenhost.nl>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: remove trailing spaces ]
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/libxc: Drop broken xc_{get,set}_hvm_param() functions
Andrew Cooper [Mon, 22 May 2017 11:50:09 +0000 (12:50 +0100)]
tools/libxc: Drop broken xc_{get,set}_hvm_param() functions

xc_{get,set}_hvm_param() are deprecated because they truncate their value
parameter in 32bit builds of libxc, and are therefore unfit for use.

As there is only a single remaining user, switch that user over to
xc_hvm_param_get() and drop these functions completely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoRevert "ns16550: add support for UART parameters to be specifed with name-value pairs"
Jan Beulich [Wed, 31 May 2017 09:18:58 +0000 (11:18 +0200)]
Revert "ns16550: add support for UART parameters to be specifed with name-value pairs"

This reverts commit a91252ff0d219d801f2dc947511c1755fe5b05fe,
as it breaks the build on ARM.

7 years agodocs: remove PVHv1 document
Roger Pau Monné [Wed, 31 May 2017 06:47:57 +0000 (08:47 +0200)]
docs: remove PVHv1 document

The current misc/pvh.markdown document refers to PVHv1, remove it to
avoid confusion with PVHv2 since the PVHv1 code has already been
removed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/vpmu: add cpu hot unplug notifier for vpmu
Luwei Kang [Wed, 31 May 2017 06:41:43 +0000 (08:41 +0200)]
x86/vpmu: add cpu hot unplug notifier for vpmu

Currently, hot unplugging a physical CPU with vpmu enabled may hang the
system, because a remote call is sent to an offlined pCPU. This patch
adds a cpu hot unplug notifier to save the vpmu context before the cpu
goes offline.

Consider one scenario: pCPU N is hot unplugged with vpmu enabled.
The vcpu which was running on this pCPU is switched to another
online cpu. A remote call is then sent to pCPU N to save the
vpmu context before loading it on the new pCPU.
The system hangs in on_selected_cpus() because pCPU N is
offline and cannot respond.

The VPMU_CONTEXT_LOADED check added to vpmu_arch_destroy() before
sending a remote call to save the vpmu context serves two purposes:
a. when a vpmu context has been loaded on a remote pCPU, a remote
   call to save the vpmu context and stop the counters is necessary.
b. the VPMU_CONTEXT_LOADED flag is reset when a pCPU is offlined, so
   this check prevents sending a remote call to an offlined pCPU.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agoacpi: enlarge NUM_FIXMAP_ACPI_PAGES to support larger scale boards
Zhang Bo [Wed, 31 May 2017 06:40:44 +0000 (08:40 +0200)]
acpi: enlarge NUM_FIXMAP_ACPI_PAGES to support larger scale boards

acpi_tb_verify_table()->__acpi_map_table() assumes that no ACPI table
exceeds 4 pages; the tables concerned include SRAT/APIC/ERST etc.
Note that the DSDT is not mapped through acpi_tb_verify_table(), so its
size does not matter here, although it is usually the largest of all the
ACPI tables. The biggest table of concern is therefore the SRAT.
The size of the SRAT is affected by both the number of CPUs and the
number of memory slots: each CPU costs 24B, and each memory slot costs 40B.

Note also that even when the SRAT is smaller than 4 pages, e.g. 14128B,
__acpi_map_table() maps whole pages to reach the table. Suppose the start
address is near the end of the first page:

       1000B    4096B         4096B          4096B      840B
       |___|_____________|______________|______________|____|

Although the table size is within 4 pages, the table may in fact span 5
pages, as shown above. Thus NUM_FIXMAP_ACPI_PAGES should be much
larger nowadays. Otherwise, Xen wrongly concludes that no NUMA
configuration can be found, because it cannot map the SRAT.

Thus, we make NUM_FIXMAP_ACPI_PAGES much larger: 64 pages (256KB). This is
based on the theoretical largest CPU count on mainstream Linux distros
being about 8192, and the number of memory slots being within 1000,
which gives 24B*8192+40B*1000 = 236608B. Meanwhile, because the
IOREMAP_VIRT_* region is 16GB, I think extending it to 256KB is safe enough.
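
The arithmetic above can be checked with a short sketch (the helper names
are hypothetical and a 4096-byte page is assumed):

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Number of pages touched by a table of 'len' bytes whose first byte
 * sits 'start_offset' bytes into a page. */
size_t sketch_pages_spanned(size_t start_offset, size_t len)
{
    return (start_offset % SKETCH_PAGE_SIZE + len + SKETCH_PAGE_SIZE - 1)
           / SKETCH_PAGE_SIZE;
}

/* SRAT size estimate from the per-entry costs quoted above. */
size_t sketch_srat_bytes(size_t cpus, size_t mem_slots)
{
    return 24 * cpus + 40 * mem_slots;
}
```

A 14128B table starting 1000B before a page boundary spans 5 pages, while
the same table page-aligned spans 4.  The 236608B estimate spans at most
59 pages even when unaligned, which fits in 64.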

Of course, there's much more work to do to support large scale boards with
that many (8192) CPUs and 1000 memory slots. We just make life easier for
boards with several hundred CPUs and several TBs of memory.

Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agons16550: add support for UART parameters to be specifed with name-value pairs
Swapnil Paratey [Wed, 31 May 2017 06:40:15 +0000 (08:40 +0200)]
ns16550: add support for UART parameters to be specifed with name-value pairs

Add name=value parsing options for com1 and com2 to add flexibility
in setting register values for MMIO UART devices.

Maintain backward compatibility with previous positional parameter
specifications.

eg. com1=115200,8n1,0x3f8,4
eg. com1=115200,8n1,0x3f8,4,reg_width=4,reg_shift=2
eg. com1=baud=115200,parity=n,reg_width=4,reg_shift=2,irq=4
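
As a rough idea of how such a string can be scanned (this is a standalone
sketch with a hypothetical helper, not the parser added by this patch, and
it handles numeric values only):

```c
#include <stdlib.h>
#include <string.h>

/* Return the numeric value of 'key' in a comma-separated option string,
 * or -1 if the key is absent.  Positional tokens (without '=') and keys
 * with non-matching names are skipped. */
long sketch_uart_param(const char *opts, const char *key)
{
    size_t klen = strlen(key);

    while ( *opts )
    {
        const char *end = strchr(opts, ',');

        if ( !end )
            end = opts + strlen(opts);

        /* Token must be longer than the key and look like "key=...". */
        if ( (size_t)(end - opts) > klen &&
             !strncmp(opts, key, klen) && opts[klen] == '=' )
            return strtol(opts + klen + 1, NULL, 0);

        opts = *end ? end + 1 : end;
    }

    return -1;
}
```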

Signed-off-by: Swapnil Paratey <swapnil.paratey@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mmcfg: set pci_mmcfg_config_num to 0 on error path
Roger Pau Monné [Wed, 31 May 2017 06:39:47 +0000 (08:39 +0200)]
x86/mmcfg: set pci_mmcfg_config_num to 0 on error path

One error path of acpi_parse_mcfg doesn't set pci_mmcfg_config_num to zero, fix
this.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mce: make 'found_error' and 'mce_fatal_cpus' private to mcheck_cmn_handler()
Haozhong Zhang [Wed, 31 May 2017 06:39:22 +0000 (08:39 +0200)]
x86/mce: make 'found_error' and 'mce_fatal_cpus' private to mcheck_cmn_handler()

mcheck_cmn_handler() is the only user of 'found_error' and
'mce_fatal_cpus'.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mce: make mce barriers private to their users
Haozhong Zhang [Wed, 31 May 2017 06:38:21 +0000 (08:38 +0200)]
x86/mce: make mce barriers private to their users

Each of the current mce barriers is actually used by only one function, so
move their definitions into their users. A static mce barrier initializer
is introduced so we can move the initialization of the above mce barriers
to their definitions.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Replace do_guest_trap() with pv_inject_hw_exception()
Andrew Cooper [Mon, 15 May 2017 12:15:21 +0000 (13:15 +0100)]
x86/pv: Replace do_guest_trap() with pv_inject_hw_exception()

do_guest_trap() is now functionally equivalent to pv_inject_hw_exception(),
but with a less useful API as it requires the error code parameter to be
passed implicitly via cpu_user_regs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mtrr: Improvements to control register handling
Andrew Cooper [Tue, 9 May 2017 15:05:08 +0000 (15:05 +0000)]
x86/mtrr: Improvements to control register handling

Use X86_CR0_CD rather than opencoding it (and its inversion).  Drop the
pointless cr0 variable.

Xen always uses CR4.PGE, and altering PGE is a full TLB flush.  There is no
need to call flush_tlb_local() (which itself toggles CR4.PGE rather than
writing to CR3!) as well as clearing CR4.PGE.  The static cr4 variable isn't
needed either.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/tlb: Don't use locked operations in tlbflush_filter()
Andrew Cooper [Mon, 8 May 2017 16:02:09 +0000 (16:02 +0000)]
x86/tlb: Don't use locked operations in tlbflush_filter()

All passed cpumask_t's are context-local and not at risk of concurrent
updates.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/traps: export trapstr()
Wei Liu [Tue, 2 May 2017 18:02:43 +0000 (19:02 +0100)]
x86/traps: export trapstr()

It will be used in common and pv specific code. Export it in traps.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/x86: Remove APIC_INTEGRATED() checks
Andrew Cooper [Tue, 17 Nov 2015 14:41:32 +0000 (14:41 +0000)]
xen/x86: Remove APIC_INTEGRATED() checks

All 64bit processors have integrated APICs.  Xen has no need to attempt to
cope with external APICs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/mm: Fix the odd indentation of the pin_page block of do_mmuext_op()
Andrew Cooper [Fri, 5 May 2017 16:21:28 +0000 (17:21 +0100)]
x86/mm: Fix the odd indentation of the pin_page block of do_mmuext_op()

The pin_page block is missing one level of indentation, which makes the
MMUEXT_UNPIN_TABLE case label appear to be outside of the switch statement.

However, the block isn't needed at all if page is declared with switch level
scope.  This allows for the removal of the identical local declarations for
MMUEXT_UNPIN_TABLE, MMUEXT_NEW_USER_BASEPTR and MMUEXT_CLEAR_PAGE.

While making this adjustment, delete one other piece of trailing whitespace.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Drop unused switch_kernel_stack()
Andrew Cooper [Thu, 4 May 2017 14:19:31 +0000 (15:19 +0100)]
x86/pv: Drop unused switch_kernel_stack()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/mm: make free_perdomain_mappings() idempotent
Wei Liu [Tue, 25 Apr 2017 12:00:17 +0000 (13:00 +0100)]
x86/mm: make free_perdomain_mappings() idempotent

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
7 years agoMakefile: Mention usual targets of subdir Makefiles
Ian Jackson [Thu, 25 May 2017 15:42:12 +0000 (16:42 +0100)]
Makefile: Mention usual targets of subdir Makefiles

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: M A Young <m.a.young@durham.ac.uk>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoBranching 4.9: Fix versions to be 4.10-unstable
Ian Jackson [Tue, 30 May 2017 16:30:03 +0000 (17:30 +0100)]
Branching 4.9: Fix versions to be 4.10-unstable

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoBranch Xen 4.9: Make staging be an unstable branch
Ian Jackson [Tue, 30 May 2017 15:30:58 +0000 (16:30 +0100)]
Branch Xen 4.9: Make staging be an unstable branch

Config.mk
  MINIOS_UPSTREAM_REVISION   } changed from tag to equivalent
  QEMU_TRADITIONAL_REVISION  }  specific commit hash
  QEMU_UPSTREAM_REVISION     now tracking master again

README, xen/Makefile
  Update version number

*/configure
  Reran autoconf; only change is version number

tools/Rules.mk, xen/Kconfig.debug
  Enable debug.
  Reverts 229ff3125b3d "Use non-debug build for Xen 4.9".

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoMakefile: Regularise subdir targets and their dependencies
Ian Jackson [Wed, 24 May 2017 15:54:11 +0000 (16:54 +0100)]
Makefile: Regularise subdir targets and their dependencies

Recent changes to this Makefile have broken some build targets, and
some parallel builds.

Looking at it, I think I have identified the undocumented design
intent in the top-level Makefile.  So in this patch I document it, and
also make it true.

In detail:

 * Add a comment with the new design intent
 * Get rid of the ad-hoc rules for recursing into tools/include,
   and replace them with a pattern rule
 * Add an appropriate dependency on TARGET-tools-public-headers from
   TARGET-tools and TARGET-stubdom (but not dist-*).
 * Get rid of all the separate invocations of $(MAKE) -C tools/include
   which are now obsolete
 * Un-deprecate the simple `tools' etc. targets (aliases for `dist-tools')
   which we seem not to be making any effort to get rid of

I have verified with the following shell script that after my change,
the tree produces the same results for various build targets as
3fafdc28eb98 (before the Makefile-hacking started).

My tests failed as expected for make -C tools, both before and after.

Separately, there is a bug in the Makefiles that `make distclean-tools'
fails.  I have not investigated that bug in detail.

    #!/bin/bash

    set -e
    set -o pipefail

    listings=../listings

    rm -rf $listings
    mkdir $listings

    chks () {
         reskey="C$subdir $*"
         reskey="${reskey// /_}"
         reskey="${reskey//\//:}"
         lk=$listings/$reskey
         for suffix in '' -xen -tools -stubdom -docs; do
             case "$subdir:$suffix" in
             .:*) ;;
             *:) ;;
             *) continue;;
             esac
             git clean -qxdff
             rm -rf $output
             printf '%s' "running -C$subdir suffix=$suffix "
             case "$subdir $suffix" in
             *xen*) ;;
             *) printf 'configure '; ./configure >$lk.cfg 2>&1 ;;
             esac
             fail=''
             for targ in $*; do
                 realtarg=$targ$suffix
                 printf '%s ' "$realtarg"
                 if ! make -C $subdir -j10 $realtarg >${lk}_${realtarg}.log 2>&1
                 then
                    fail=$realtarg
                    break
                 fi
             done
             if [ "$fail" ]; then
               echo fail!
               echo "$fail failed" >$lk.list
             else
               echo ok.
               (test ! -e "$output" || find $output) |sort >$lk.list
             fi
        done
    }

    subdirs='. xen docs tools'

    output=$PWD/dist
    for subdir in $subdirs; do
        chks build clean distclean
    done

    output=$PWD/dist
    subdir=.
    chks dist

    export DESTDIR=$PWD/destdir
    output=$PWD/destdir
    for subdir in $subdirs; do
        chks install
    done

And the output:

    (64)iwj@mariner:~/work/xen.git$ ~/junk/chks
    running -C. suffix= configure build clean distclean ok.
    running -C. suffix=-xen build-xen clean-xen distclean-xen ok.
    running -C. suffix=-tools configure build-tools clean-tools distclean-tools fail!
    running -C. suffix=-stubdom configure build-stubdom clean-stubdom distclean-stubdom ok.
    running -C. suffix=-docs configure build-docs clean-docs distclean-docs ok.
    running -Cxen suffix= build clean distclean ok.
    running -Cdocs suffix= configure build clean distclean ok.
    running -Ctools suffix= configure build fail!
    running -C. suffix= configure dist ok.
    running -C. suffix=-xen dist-xen ok.
    running -C. suffix=-tools configure dist-tools ok.
    running -C. suffix=-stubdom configure dist-stubdom ok.
    running -C. suffix=-docs configure dist-docs ok.
    running -C. suffix= configure install ok.
    running -C. suffix=-xen install-xen ok.
    running -C. suffix=-tools configure install-tools ok.
    running -C. suffix=-stubdom configure install-stubdom ok.
    running -C. suffix=-docs configure install-docs ok.
    running -Cxen suffix= install ok.
    running -Cdocs suffix= configure install ok.
    running -Ctools suffix= configure install fail!
    (64)iwj@mariner:~/work/xen.git$

CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: M A Young <m.a.young@durham.ac.uk>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agotools/include/Makefile: Support `build' target
Ian Jackson [Wed, 24 May 2017 15:53:28 +0000 (16:53 +0100)]
tools/include/Makefile: Support `build' target

This is the only one of the Makefiles invoked with -C from the
toplevel which lacks this target.

CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: M A Young <m.a.young@durham.ac.uk>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/hvmloader: Don't wait for the producer to fill the ring if
Anshul Makkar [Tue, 23 May 2017 14:12:58 +0000 (15:12 +0100)]
x86/hvmloader: Don't wait for the producer to fill the ring if

The condition "if there is space in the ring then wait for the producer
to fill the ring" also evaluates to true when the ring is full. This
leads to a deadlock where the producer is waiting for the consumer
to consume the items and the consumer is waiting for the producer to
fill the ring.

Fix for the issue: check whether the ring is full, and if so break out of
the loop to consume the items from the ring.
E.g. case: prod = 1272, cons = 248.
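
A sketch of why the example indices are ambiguous, assuming a 1024-byte
ring with free-running producer/consumer indices (the size and the
sketch_* names are assumptions for illustration; this does not reproduce
hvmloader's code):

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 1024u
#define SKETCH_MASK(idx) ((idx) & (SKETCH_RING_SIZE - 1))

/* Free-running indices: empty when equal, full when a whole ring apart. */
bool sketch_ring_empty(uint32_t prod, uint32_t cons)
{
    return prod == cons;
}

bool sketch_ring_full(uint32_t prod, uint32_t cons)
{
    return prod - cons == SKETCH_RING_SIZE;
}

/* Comparing masked indices cannot tell "full" from "empty". */
bool sketch_masked_equal(uint32_t prod, uint32_t cons)
{
    return SKETCH_MASK(prod) == SKETCH_MASK(cons);
}
```

For prod = 1272 and cons = 248 the masked indices are both 248, yet the
ring is full rather than empty, so waiting for the producer never ends.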

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoRestore HVM_OP hypercall continuation (partial revert of ae20ccf)
George Dunlap [Mon, 22 May 2017 10:38:31 +0000 (11:38 +0100)]
Restore HVM_OP hypercall continuation (partial revert of ae20ccf)

Commit ae20ccf removed the hypercall continuation logic from the end
of do_hvm_op(), claiming:

"This patch removes the need for handling HVMOP restarts, so that
infrastructure is removed."

That turns out to be false.  The removal of HVMOP_set_mem_type removed
the need to store a start iteration value in the hypercall
continuation, but a grep through hvm.c for ERESTART turns up at least
two places where do_hvm_op() may still need a hypercall continuation:

 * HVMOP_set_hvm_param can return -ERESTART when setting
   HVM_PARAM_IDENT_PT in the event that it fails to acquire the domctl
   lock

 * HVMOP_flush_tlbs can return -ERESTART if several vcpus call it at
   the same time

In both cases, a simple restart (with no stored iteration information)
is necessary.

Add a check for -ERESTART again, along with a comment at the top of
the function regarding the lack of decoding any information from the
op value.

Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
7 years agoxen/arm: p2m: Fix incorrect mapping of superpages
Julien Grall [Fri, 19 May 2017 16:08:39 +0000 (17:08 +0100)]
xen/arm: p2m: Fix incorrect mapping of superpages

The same set of functions is used to set as well as to clean P2M
entries, except that for clean operations INVALID_MFN (~0UL) is passed as a
parameter. Unfortunately, when calculating an appropriate target order
for a particular mapping, INVALID_MFN is taken into account, which leads
to a 4K page target order being set each time, even for 2MB and 1GB
mappings.

This results in the superpage being broken down into 4K mappings, leaving
empty tables allocated.
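
The effect can be sketched as follows.  The helpers are hypothetical and
simplified (orders 0, 9 and 18 for 4K, 2MB and 1GB with a 4K granule);
they only show why an all-ones MFN defeats an alignment-based order
calculation:

```c
#define SKETCH_INVALID_MFN (~0UL)
#define SKETCH_ORDER_2M    9
#define SKETCH_ORDER_1G    18

/* Largest order for which 'mask' is aligned and 'nr' pages remain. */
static unsigned int sketch_order_from_mask(unsigned long mask,
                                           unsigned long nr)
{
    if ( !(mask & ((1UL << SKETCH_ORDER_1G) - 1)) &&
         nr >= (1UL << SKETCH_ORDER_1G) )
        return SKETCH_ORDER_1G;
    if ( !(mask & ((1UL << SKETCH_ORDER_2M) - 1)) &&
         nr >= (1UL << SKETCH_ORDER_2M) )
        return SKETCH_ORDER_2M;
    return 0;
}

/* INVALID_MFN has every low bit set, so the result is always order 0. */
unsigned int sketch_order_buggy(unsigned long gfn, unsigned long mfn,
                                unsigned long nr)
{
    return sketch_order_from_mask(gfn | mfn, nr);
}

/* Leave the MFN out of the alignment mask when removing mappings. */
unsigned int sketch_order_fixed(unsigned long gfn, unsigned long mfn,
                                unsigned long nr)
{
    unsigned long mask = gfn;

    if ( mfn != SKETCH_INVALID_MFN )
        mask |= mfn;

    return sketch_order_from_mask(mask, nr);
}
```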

This was introduced by commit 2ef3e36ec7 "xen/arm: p2m: Introduce
p2m_set_entry and __p2m_set_entry".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/pagewalk: Fix determination of Protection Key access rights
Andrew Cooper [Tue, 16 May 2017 14:47:33 +0000 (15:47 +0100)]
x86/pagewalk: Fix determination of Protection Key access rights

 * When fabricating gl1e's from superpages, propagate the protection key as
   well, so the protection key logic sees the real key as opposed to 0.

 * Experimentally, the protection key checks are performed ahead of the other
   access rights.  In particular, accesses which fail both protection key and
   regular permission checks yield PFEC_prot_key in the resulting pagefault.

 * Protection keys apply to all data accesses to user-mode addresses,
   including accesses from supervisor code.  PKRU WD applies to any data
   write, not just to mappings which are writable.  However, a supervisor
   access without CR0.WP bypasses any protection from protection keys.
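
A sketch of the third point, with hypothetical names.  It follows the PKRU
layout of two bits per key (AD at bit 2*key, WD at bit 2*key+1) and, as an
assumption about the intended reading, applies the CR0.WP exemption to the
WD check only, leaving AD in force for supervisor accesses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Would this access fault on protection key grounds?  'key' is 0..15. */
bool sketch_pkey_fault(uint32_t pkru, unsigned int key, bool user_addr,
                       bool insn_fetch, bool write, bool supervisor,
                       bool cr0_wp)
{
    bool ad = (pkru >> (2 * key)) & 1;       /* access disable */
    bool wd = (pkru >> (2 * key + 1)) & 1;   /* write disable */

    /* Keys only govern data accesses to user-mode addresses. */
    if ( !user_addr || insn_fetch )
        return false;

    if ( ad )
        return true;

    /* WD blocks writes, except supervisor writes with CR0.WP clear. */
    if ( write && wd && (!supervisor || cr0_wp) )
        return true;

    return false;
}
```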

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agohvmloader: avoid tests when they would clobber used memory
Jan Beulich [Fri, 19 May 2017 14:04:38 +0000 (16:04 +0200)]
hvmloader: avoid tests when they would clobber used memory

First of all limit the memory range used for testing to 4Mb: There's no
point placing page tables right above 8Mb when they can equally well
live at the bottom of the chunk at 4Mb - rep_io_test() cares about the
5Mb...7Mb range only anyway. In a subsequent patch this will then also
allow simply looking for an unused 4Mb range (instead of using a build
time determined one).

Extend the "skip tests" condition beyond the "is there enough memory"
question.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Gary Lin <glin@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoUse non-debug build for Xen 4.9
Julien Grall [Thu, 18 May 2017 15:38:29 +0000 (16:38 +0100)]
Use non-debug build for Xen 4.9

Modify Config.mk and Kconfig.debug to disable debug by default in
preparation for late RCs and eventual release.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>