xenbits.xensource.com Git - xen.git/log
xen.git
7 years ago  x86: Support indirect thunks from assembly code
Andrew Cooper [Mon, 18 Dec 2017 13:54:25 +0000 (13:54 +0000)]
x86: Support indirect thunks from assembly code

Introduce INDIRECT_CALL and INDIRECT_JMP which either degrade to a normal
indirect branch, or dispatch to the __x86_indirect_thunk_* symbols.

Update all the manual indirect branches to use the new thunks.  The
indirect branches in the early boot and kexec paths are left intact, as we
can't use the compiled-in thunks at those points.
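
A hedged sketch of the dispatch being described (illustrative only - the
real macros live in Xen's assembly headers, and the CONFIG_INDIRECT_THUNK
name is an assumption here):

    /* Hypothetical sketch, not Xen's actual macros. */
    #ifdef CONFIG_INDIRECT_THUNK
    /* Dispatch through the compiled-in thunk for the chosen register. */
    # define INDIRECT_CALL_RAX "call __x86_indirect_thunk_rax"
    #else
    /* Degrade to a normal indirect branch. */
    # define INDIRECT_CALL_RAX "call *%rax"
    #endif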

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: Support compiling with indirect branch thunks
Andrew Cooper [Mon, 18 Dec 2017 13:54:25 +0000 (13:54 +0000)]
x86: Support compiling with indirect branch thunks

Use -mindirect-branch=thunk-extern/-mindirect-branch-register when available.
To begin with, use the retpoline thunk.  Later work will add alternative
thunks which can be selected at boot time.
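
For background, a minimal sketch of the classic retpoline thunk which
-mindirect-branch=thunk-extern expects the build to provide (shown for
%rax, as toplevel inline asm in C; Xen's real implementation differs):

    asm (
        ".globl __x86_indirect_thunk_rax\n"
        "__x86_indirect_thunk_rax:\n\t"
        "call 1f\n"                /* push the real return target */
        "2:\tpause\n\t"            /* speculation trap */
        "lfence\n\t"
        "jmp 2b\n"
        "1:\tmov %rax, (%rsp)\n\t" /* replace return address with target */
        "ret\n"                    /* architecturally branch to %rax */
    );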

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: allow Meltdown band-aid to be disabled
Jan Beulich [Tue, 16 Jan 2018 16:50:59 +0000 (17:50 +0100)]
x86: allow Meltdown band-aid to be disabled

First of all, we don't need it on AMD systems.  Additionally, allow its use
to be controlled by a command line option.  For best backportability, this
intentionally doesn't use alternative instruction patching to achieve the
intended effect - while we likely want that, it will be a later follow-up.
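
A sketch of the shape such a control could take, assuming Xen's
boolean_param() machinery and an option name of "xpti" (both the name and
the variable are illustrative here):

    /* Hypothetical: command line override for the band-aid. */
    static bool __initdata opt_xpti = true;
    boolean_param("xpti", opt_xpti);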

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: Meltdown band-aid against malicious 64-bit PV guests
Jan Beulich [Tue, 16 Jan 2018 16:49:03 +0000 (17:49 +0100)]
x86: Meltdown band-aid against malicious 64-bit PV guests

This is a very simplistic change limiting the amount of memory a running
64-bit PV guest has mapped (and hence available for attacking): Only the
mappings of stack, IDT, and TSS are being cloned from the direct map
into per-CPU page tables. Guest controlled parts of the page tables are
being copied into those per-CPU page tables upon entry into the guest.
Cross-vCPU synchronization of top level page table entry changes is
being effected by forcing other active vCPU-s of the guest into the
hypervisor.
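
Conceptually, the entry-time synchronisation amounts to something like the
following (a much-simplified sketch; percpu_root_pgt and guest_root_pgt are
illustrative names, the slot constant is from Xen's x86 layout):

    /* Copy the guest-controlled lower half of the guest's root page
     * table into this CPU's private root table before entering the
     * guest; the Xen half remains under Xen's sole control. */
    for ( i = 0; i < ROOT_PAGETABLE_FIRST_XEN_SLOT; i++ )
        percpu_root_pgt[i] = guest_root_pgt[i];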

The change to context_switch() isn't strictly necessary, but there's no
reason to keep switching page tables once a PV guest is being scheduled
out.

This isn't providing full isolation yet, but it should be covering all
pieces of information, exposure of which would otherwise require an XSA.

There is certainly much room for improvement, especially of performance,
here - first and foremost suppressing all the negative effects on AMD
systems.  But in the interest of backportability (including to really old
hypervisors, which may not even have alternative patching), any such
improvements are left out here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/mm: Always set _PAGE_ACCESSED on L4e updates
Andrew Cooper [Fri, 1 Sep 2017 11:15:39 +0000 (12:15 +0100)]
x86/mm: Always set _PAGE_ACCESSED on L4e updates

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/Rules: Use -mskip-rax-setup if the compiler supports it
Andrew Cooper [Fri, 6 Oct 2017 13:21:32 +0000 (13:21 +0000)]
x86/Rules: Use -mskip-rax-setup if the compiler supports it

This option is available from GCC 5 onwards, and was specifically introduced
as an optimisation for Linux.  When calling a variadic function, the ABI
requires the caller to report (in %al) how many floating point arguments are
being passed in vector registers.  Xen, like Linux, doesn't use floating
point arguments, so doesn't need to emit code informing variadic functions
such as printk() that there are zero such arguments.
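
To illustrate the ABI point (a hedged example, not taken from the patch):

    extern void printk(const char *fmt, ...);

    void example(int x)
    {
        /* The SysV x86-64 ABI has the caller report in %al how many
         * vector registers carry arguments, normally via a
         * "xorl %eax, %eax" ahead of this call.  With no floating
         * point arguments anywhere, -mskip-rax-setup elides it. */
        printk("x = %d\n", x);
    }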

The net delta for a release build is:

  add/remove: 0/0 grow/shrink: 35/625 up/down: 603/-5489 (-4886)

with the single biggest change being:

  x86_emulate                               101933  101751    -182

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/bitops: Introduce variable/constant pairs for __{set,clear,change}_bit()
Andrew Cooper [Fri, 29 Dec 2017 12:56:24 +0000 (12:56 +0000)]
x86/bitops: Introduce variable/constant pairs for __{set,clear,change}_bit()

Just as with test_bit, the non-atomic set/clear/change helpers can be better
optimised by the compiler in the case that the nr parameter is constant, and
it often is.

This results in a general replacement of `mov $imm, %reg; bt* %reg, mem`
with the shorter and simpler `op $imm, mem`, also reducing register pressure.
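
The underlying pattern is the usual constant-folding split (a simplified
sketch; the __constant_/__variable_ helper names are illustrative):

    /* Pick a compile-time specialisation when nr is a constant. */
    #define __set_bit(nr, addr)                     \
        (__builtin_constant_p(nr)                   \
         ? __constant_set_bit(nr, addr)             \
         : __variable_set_bit(nr, addr))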

The net diffstat is:
  add/remove: 0/1 grow/shrink: 5/17 up/down: 90/-301 (-211)

As a piece of minor cleanup, drop unnecessary brackets in the test_bit()
macro, and fix the indentation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/sched_rt: Move repl_timer into struct rt_private
Andrew Cooper [Fri, 29 Dec 2017 13:06:14 +0000 (13:06 +0000)]
xen/sched_rt: Move repl_timer into struct rt_private

struct timer is only 48 bytes and repl_timer has a 1-to-1 correspondence with
struct rt_private, so having it referenced by pointer is wasteful.
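
The shape of the change, sketched (other fields elided):

    struct rt_private {
        /* ... other fields ... */
        struct timer repl_timer;  /* was: struct timer *repl_timer; */
    };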

This avoids one memory allocation in rt_init(), and the resulting diffstat is:

  add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-156 (-156)
  function                                     old     new   delta
  rt_switch_sched                              134     133      -1
  rt_context_saved                             278     271      -7
  rt_vcpu_remove                               253     245      -8
  rt_vcpu_sleep                                234     218     -16
  repl_timer_handler                           761     744     -17
  rt_deinit                                     44      20     -24
  rt_init                                      219     136     -83

As an extra bit of cleanup noticed while making this change, there is no need
to call cpumask_clear() on a zeroed memory allocation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
7 years ago  xen/credit2: Drop unnecessary bit test
Andrew Cooper [Fri, 29 Dec 2017 12:56:34 +0000 (12:56 +0000)]
xen/credit2: Drop unnecessary bit test

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
7 years ago  x86/boot: Fix boot following c/s b6c2c7f48a
Andrew Cooper [Thu, 11 Jan 2018 23:38:40 +0000 (23:38 +0000)]
x86/boot: Fix boot following c/s b6c2c7f48a

c/s b6c2c7f48a unfortunately broke booting on affected systems.  Most of the
time, ioemul_handle_quirk() doesn't write a custom stub, and the redundant
call depended on the seemingly-pointless writing of the default stub.

Alter the ioemul_handle_quirk() API to return a boolean indicating whether a
custom stub was written, allowing its caller to know whether it should write
a default stub instead.
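
A sketch of the adjusted calling convention (signatures illustrative, and
write_default_stub() is a hypothetical helper name):

    /* Returns true if a custom stub was written. */
    bool ioemul_handle_quirk(uint8_t opcode, char *io_emul_stub,
                             struct cpu_user_regs *regs);

    /* Caller side: */
    if ( !ioemul_handle_quirk(opcode, stub, regs) )
        write_default_stub(stub);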

Finally, adjust the /* Regular stubs */ comment to make it clearer that the 16
refers to the length of the emul stub opcode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  MAINTAINERS: update my entries to new email address.
Dario Faggioli [Wed, 10 Jan 2018 18:20:34 +0000 (19:20 +0100)]
MAINTAINERS: update my entries to new email address.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/microcode: Use the exported bootstrap_map() function
Andrew Cooper [Tue, 28 Nov 2017 19:11:12 +0000 (19:11 +0000)]
x86/microcode: Use the exported bootstrap_map() function

... rather than obtaining it via function pointer.  The internal ucode_mod_map
function pointer can also be dropped.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/xsm: Use the exported bootstrap_map() function
Andrew Cooper [Tue, 28 Nov 2017 19:07:02 +0000 (19:07 +0000)]
x86/xsm: Use the exported bootstrap_map() function

... rather than obtaining it via function pointer.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years ago  x86/boot: Export bootstrap_map() for use in other translation units
Andrew Cooper [Tue, 28 Nov 2017 19:01:15 +0000 (19:01 +0000)]
x86/boot: Export bootstrap_map() for use in other translation units

There is one static bootstrap_map() function which is passed via function
pointer to all of its users.  This is wasteful.

Export bootstrap_map() for all x86 users, and drop the function pointer
parameter from the construct_dom0*() infrastructure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/ioemul: Account for ioemul_handle_quirk() in stub length check
Andrew Cooper [Tue, 9 Jan 2018 16:28:28 +0000 (16:28 +0000)]
x86/ioemul: Account for ioemul_handle_quirk() in stub length check

The opcode potentially written into ctxt->io_emul_stub[] in the case
that ioemul_handle_quirk() is overriding the default logic isn't
accounted for in the build-time check that the stubs are large enough.

Introduce IOEMUL_QUIRK_STUB_BYTES and use it for both the main and quirk
stub cases.  As a slim optimisation, avoid writing out the default stub
when we know we are going to overwrite it.
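
Conceptually (the value and the exact bound are illustrative, not the
patch's literal code):

    /* Both stub forms must fit in the stub buffer. */
    #define IOEMUL_QUIRK_STUB_BYTES 10 /* assumed length, for illustration */
    BUILD_BUG_ON(IOEMUL_QUIRK_STUB_BYTES > STUB_BUF_SIZE);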

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: Don't use potentially incorrect CPUID values for topology information
Jan H. Schönherr [Sun, 7 Jan 2018 20:28:20 +0000 (12:28 -0800)]
x86: Don't use potentially incorrect CPUID values for topology information

Intel says for CPUID leaf 0Bh:

  "Software must not use EBX[15:0] to enumerate processor
   topology of the system. This value in this field
   (EBX[15:0]) is only intended for display/diagnostic
   purposes. The actual number of logical processors
   available to BIOS/OS/Applications may be different from
   the value of EBX[15:0], depending on software and platform
   hardware configurations."

And yet, we're using them to derive the number of cores in a package
and the number of siblings in a core.

Derive the number of siblings and cores from EAX instead, which is
intended for that.
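
For reference, the EAX-based derivation looks along these lines (a
simplified sketch using Xen's cpuid_count()):

    unsigned int eax, ebx, ecx, edx, threads_per_core;

    /* Leaf 0xb, subleaf 0 (SMT level): EAX[4:0] is the shift to apply
     * to the x2APIC ID to reach the next topology level, which bounds
     * the number of logical processors per core. */
    cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);
    threads_per_core = 1u << (eax & 0x1f);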

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  common/wait: Clarifications to wait infrastructure
Andrew Cooper [Thu, 28 Dec 2017 11:41:13 +0000 (11:41 +0000)]
common/wait: Clarifications to wait infrastructure

This logic is not as clear as it could be.  Add some comments to help.

Rearrange the asm block in __prepare_to_wait() to separate the GPR
saving/restoring from the internal logic.

While tweaking, add an unreachable() following the jmp in
check_wakeup_from_wait().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/entry: Erase guest GPR state on entry to Xen
Andrew Cooper [Wed, 16 Aug 2017 17:08:01 +0000 (17:08 +0000)]
x86/entry: Erase guest GPR state on entry to Xen

This reduces the number of code gadgets which can be attacked with arbitrary
guest-controlled GPR values.

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/hvm: Use SAVE_ALL to construct the cpu_user_regs frame after VMExit
Andrew Cooper [Thu, 17 Aug 2017 14:23:21 +0000 (15:23 +0100)]
x86/hvm: Use SAVE_ALL to construct the cpu_user_regs frame after VMExit

No practical change.

One side effect in debug builds is that %rbp is inverted in the manner
expected by the stack unwinder to indicate an interrupt frame.

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/entry: Rearrange RESTORE_ALL to restore registers in stack order
Andrew Cooper [Wed, 16 Aug 2017 17:07:30 +0000 (18:07 +0100)]
x86/entry: Rearrange RESTORE_ALL to restore registers in stack order

Results in a more predictable (i.e. linear) memory access pattern.

No functional change.

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago  x86/entry: Remove support for partial cpu_user_regs frames
Andrew Cooper [Wed, 16 Aug 2017 17:06:59 +0000 (18:06 +0100)]
x86/entry: Remove support for partial cpu_user_regs frames

Save all GPRs on entry to Xen.

The entry_int82() path is via a DPL1 gate, only usable by 32bit PV guests, so
can get away with only saving the 32bit registers.  All other entrypoints can
be reached from 32 or 64bit contexts.

This is part of XSA-254.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: Introduce a common cpuid_policy_updated()
Andrew Cooper [Thu, 16 Nov 2017 15:42:24 +0000 (15:42 +0000)]
x86: Introduce a common cpuid_policy_updated()

No practical change at the moment, but future changes will need to react
irrespective of guest type.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/hvm: Rename update_guest_vendor() callback to cpuid_policy_changed()
Andrew Cooper [Tue, 14 Nov 2017 19:12:55 +0000 (19:12 +0000)]
x86/hvm: Rename update_guest_vendor() callback to cpuid_policy_changed()

It will shortly be used for more than just changing the vendor.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/alt: Introduce ALTERNATIVE{,_2} macros
Andrew Cooper [Fri, 3 Nov 2017 16:32:59 +0000 (16:32 +0000)]
x86/alt: Introduce ALTERNATIVE{,_2} macros

To help create alternative frames in assembly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/alt: Break out alternative-asm into a separate header file
Andrew Cooper [Fri, 3 Nov 2017 16:28:00 +0000 (16:28 +0000)]
x86/alt: Break out alternative-asm into a separate header file

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/upcall: inject a spurious event after setting upcall vector
Roger Pau Monné [Thu, 4 Jan 2018 13:29:16 +0000 (14:29 +0100)]
x86/upcall: inject a spurious event after setting upcall vector

In case the vCPU has pending events to inject.  This fixes a bug that
happened if the guest mapped the vcpu info area using
VCPUOP_register_vcpu_info without having set up the event channel
upcall, and then set up the upcall vector.

In this scenario the guest would not receive any upcalls, because the
call to VCPUOP_register_vcpu_info would have marked the vCPU as having
pending events, but the vector could not be injected because it was
not yet set up.

This has not caused issues so far because all the consumers first
set up the vector callback and then map the vcpu info page, but there's
no limitation that prevents doing it in the reverse order.
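
The idea of the fix, sketched (simplified, not the literal patch):

    /* After the upcall vector is configured, re-check for already
     * pending events and kick the vCPU so they actually get injected. */
    if ( vcpu_info(v, evtchn_upcall_pending) )
        vcpu_kick(v);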

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/hvm: add MSR old value
Alexandru Isaila [Thu, 4 Jan 2018 13:28:29 +0000 (14:28 +0100)]
x86/hvm: add MSR old value

This patch adds the old value param and the onchangeonly option
to the VM_EVENT_REASON_MOV_TO_MSR event.

The param was added to the vm_event_mov_to_msr struct and to the
hvm_monitor_msr function. Finally I've changed the bool_t param
to a bool for the hvm_msr_write_intercept function.
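
The resulting event payload, sketched (field names as described above;
exact layout illustrative):

    struct vm_event_mov_to_msr {
        uint64_t msr;
        uint64_t new_value;
        uint64_t old_value;  /* newly added */
    };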

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/msr: Free msr_vcpu_policy during vcpu destruction
Andrew Cooper [Thu, 4 Jan 2018 13:27:38 +0000 (14:27 +0100)]
x86/msr: Free msr_vcpu_policy during vcpu destruction

c/s 4187f79dc7 "x86/msr: introduce struct msr_vcpu_policy" introduced a
per-vcpu memory allocation, but failed to free it in the clean vcpu
destruction case.
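
The essence of the fix, sketched:

    /* In the vcpu destruction path: release the per-vCPU MSR policy. */
    xfree(v->arch.msr);
    v->arch.msr = NULL;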

This is XSA-253.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/x86: Correct mandatory and SMP barrier definitions
Andrew Cooper [Fri, 2 Dec 2016 15:00:41 +0000 (15:00 +0000)]
xen/x86: Correct mandatory and SMP barrier definitions

Barriers are a complicated topic, a source of confusion, and their incorrect
use is a common cause of bugs.  It really doesn't help when Xen's API is the
same as Linux's, but its ABI is different.

Bring the two back in line, so programmers stand a chance of actually getting
their usage correct.

Drop the links in the comment, both of which are now stale.  Instead, refer to
the vendor system manuals in a generic way.
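
For illustration, the conventional x86 mapping this moves back towards
(hedged - not necessarily Xen's exact definitions):

    /* Mandatory barriers: real fence instructions. */
    #define mb()      asm volatile ( "mfence" ::: "memory" )
    #define rmb()     asm volatile ( "lfence" ::: "memory" )
    #define wmb()     asm volatile ( "sfence" ::: "memory" )

    /* SMP barriers: x86-TSO already keeps loads ordered with loads and
     * stores ordered with stores, so a compiler barrier suffices; only
     * smp_mb() still needs a real fence. */
    #define smp_mb()  mb()
    #define smp_rmb() barrier()
    #define smp_wmb() barrier()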

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/x86: Drop unnecessary barriers
Andrew Cooper [Fri, 2 Dec 2016 15:00:41 +0000 (15:00 +0000)]
xen/x86: Drop unnecessary barriers

x86's current implementation of wmb() is a compiler barrier.  As a result, the
only change in this patch is to remove an mfence instruction from
cpuidle_disable_deep_cstate().

None of these barriers serve any purpose.  They are not synchronising with
remote cpus, and their compiler-barrier properties are not needed for
correctness purposes.

Furthermore, these wmb()'s specifically do not want to turn into sfence
instructions in future changes where wmb()'s implementation is corrected.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: fix typo in comment of rb_insert_color
Wei Yang [Wed, 3 Jan 2018 11:42:27 +0000 (12:42 +0100)]
rbtree: fix typo in comment of rb_insert_color

In case 1, it passes down the BLACK color from G to p and u, and maintains
the color of n.  By doing so, it maintains the black height of the sub-tree.

While in the comment, it marks the color of n as BLACK.  This is a typo
and is not consistent with the code.

This patch fixes the typo in the comment.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 1b9c53e849aa65776d4f611d99aa09f856518dad]

Ported to Xen for rb_insert_color API.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: low level optimizations in rb_erase()
Michel Lespinasse [Wed, 3 Jan 2018 11:42:07 +0000 (12:42 +0100)]
rbtree: low level optimizations in rb_erase()

Various minor optimizations in rb_erase():
- Avoid multiple loading of node->__rb_parent_color when computing parent
  and color information (possibly not in close sequence, as there might
  be further branches in the algorithm)
- In the 1-child subcase of case 1, copy the __rb_parent_color field from
  the erased node to the child instead of recomputing it from the desired
  parent and color
- When searching for the erased node's successor, differentiate between
  cases 2 and 3 based on whether any left links were followed. This avoids
  a condition later down.
- In case 3, keep a pointer to the erased node's right child so we don't
  have to refetch it later to adjust its parent.
- In the no-children subcase of cases 2 and 3, place the rebalance assignment
  last so that the compiler can remove the following if(rebalance) test.

Also, added some comments to illustrate cases 2 and 3.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4f035ad67f4633c233cb3642711d49b4efc9c82d]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()
Michel Lespinasse [Wed, 3 Jan 2018 11:41:47 +0000 (12:41 +0100)]
rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()

An interesting observation for rb_erase() is that when a node has
exactly one child, the node must be black and the child must be red.
An interesting consequence is that removing such a node can be done by
simply replacing it with its child and making the child black,
which we can do efficiently in rb_erase(). __rb_erase_color() then
only needs to handle the no-children case and can be modified accordingly.
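
In code terms, the 1-child erase then reduces to roughly the following
(a sketch using helpers from this series):

    /* node is black with a single, necessarily red, child: splice the
     * child into node's place and recolor it black - no rebalancing. */
    struct rb_node *child = node->rb_left ?: node->rb_right;

    __rb_change_child(node, child, rb_parent(node), root);
    rb_set_parent_color(child, rb_parent(node), RB_BLACK);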

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 46b6135a7402ac23c5b25f2bd79b03bab8f98278]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: place easiest case first in rb_erase()
Michel Lespinasse [Wed, 3 Jan 2018 11:41:29 +0000 (12:41 +0100)]
rbtree: place easiest case first in rb_erase()

In rb_erase, move the easy case (node to erase has no more than
1 child) first. I feel the code reads easier that way.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 60670b8034d6e2ba860af79c9379b7788d09db73]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: add __rb_change_child() helper function
Michel Lespinasse [Wed, 3 Jan 2018 11:41:11 +0000 (12:41 +0100)]
rbtree: add __rb_change_child() helper function

Add __rb_change_child() as an inline helper function to replace code that
would otherwise be duplicated 4 times in the source.

No changes to binary size or speed.
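
The helper is small enough to show in full (along the lines of the Linux
original; the Xen port is equivalent):

    static inline void __rb_change_child(struct rb_node *old,
                                         struct rb_node *new,
                                         struct rb_node *parent,
                                         struct rb_root *root)
    {
        if ( parent )
        {
            if ( parent->rb_left == old )
                parent->rb_left = new;
            else
                parent->rb_right = new;
        }
        else
            root->rb_node = new;
    }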

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 7abc704ae399fcb9c51ca200b0456f8a975a8011]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: optimize fetching of sibling node
Michel Lespinasse [Wed, 3 Jan 2018 11:40:52 +0000 (12:40 +0100)]
rbtree: optimize fetching of sibling node

When looking to fetch a node's sibling, we went through a sequence of:
- check if node is the parent's left child
- if it is, then fetch the parent's right child

This can be replaced with:
- fetch the parent's right child as an assumed sibling
- check that node is NOT the fetched child

This avoids fetching the parent's left child when node is actually
that child. Saves a bit on code size, though it doesn't seem to make
a large difference in speed.
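
Side by side (illustrative):

    /* Before: test first, then fetch. */
    if ( node == parent->rb_left )
        other = parent->rb_right;
    else
        other = parent->rb_left;

    /* After: fetch the assumed sibling, then correct if it was node. */
    other = parent->rb_right;
    if ( node == other )
        other = parent->rb_left;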

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 59633abf34e2f44b8e772a2c12a92132aa7c2220]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: coding style adjustments
Michel Lespinasse [Wed, 3 Jan 2018 11:40:29 +0000 (12:40 +0100)]
rbtree: coding style adjustments

Set comment and indentation style to be consistent with the Linux coding
style and the rest of the file, as suggested by Peter Zijlstra.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 7ce6ff9e5de99e7b72019c7de82fb438fe1dc5a0]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  simplify xenmem_add_to_physmap_batch()
Jan Beulich [Wed, 3 Jan 2018 10:05:05 +0000 (11:05 +0100)]
simplify xenmem_add_to_physmap_batch()

There's no need for
- advancing the handles and at the same time using
  __copy_{from,to}_guest_offset(),
- an "out" label,
- local variables "done" and (function scope) "rc".

To better reflect its resulting use, also rename the function's "start"
parameter to "extent".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/E820: improve insn selection
Jan Beulich [Wed, 3 Jan 2018 10:04:26 +0000 (11:04 +0100)]
x86/E820: improve insn selection

..., largely to shrink code size a little:
- use TEST instead of CMP with zero immediate
- use MOVZWL instead of AND with 0xffff immediate
- compute final highmem_kb value in registers, accessing memory just
  once

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/E820: don't overrun array
Jan Beulich [Wed, 3 Jan 2018 10:03:56 +0000 (11:03 +0100)]
x86/E820: don't overrun array

The bounds check needs to be done after the increment, not before, or
else it needs to use a one lower immediate. Also use word operations
rather than byte ones for both the increment and the compare (allowing
E820_BIOS_MAX to be more easily bumped, should the need ever arise).
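
In C terms, the corrected logic is along these lines (a sketch - the real
code is boot-time assembly):

    e820[nr] = entry;
    if ( ++nr >= E820_BIOS_MAX )  /* bounds check after the increment */
        goto done;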

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/IRQ: conditionally preserve access permission on map error paths
Jan Beulich [Wed, 3 Jan 2018 10:03:10 +0000 (11:03 +0100)]
x86/IRQ: conditionally preserve access permission on map error paths

Permissions that had been granted before should not be revoked when
handling unrelated errors.

Reported-by: HW42 <hw42@ipsumj.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/Intel: drop another 32-bit leftover
Jan Beulich [Wed, 3 Jan 2018 10:02:10 +0000 (11:02 +0100)]
x86/Intel: drop another 32-bit leftover

None of the models for which MISC_ENABLE MSR access is excluded support
64-bit mode - drop the conditional from early_init_intel().  Also convert
a pointless rdmsr_safe() elsewhere to rdmsrl().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  xen/x86: Replace appropriate mandatory barriers with SMP barriers
Andrew Cooper [Wed, 5 Oct 2016 11:42:15 +0000 (12:42 +0100)]
xen/x86: Replace appropriate mandatory barriers with SMP barriers

There is no functional change.  Xen currently assigns smp_* meaning to
the non-smp_* barriers.

All of these uses just deal with shared memory between multiple
processors, which means that the smp_*() variants are the correct ones to use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/mcheck: Drop unnecessary barriers
Andrew Cooper [Fri, 2 Dec 2016 15:00:41 +0000 (15:00 +0000)]
x86/mcheck: Drop unnecessary barriers

spin_unlock() has full barrier semantics already.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: low level optimizations in __rb_erase_color()
Michel Lespinasse [Wed, 20 Dec 2017 17:03:51 +0000 (18:03 +0100)]
rbtree: low level optimizations in __rb_erase_color()

In __rb_erase_color(), we often already have pointers to the nodes being
rotated and/or know what their colors must be, so we can generate more
efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
functions.

Also when the current node is red or when flipping the sibling's color,
the parent is already known so we can use the more efficient
rb_set_parent_color() function to set the desired color.
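
rb_set_parent_color() packs the parent pointer and color into one store
(along the lines of the Linux original):

    static inline void rb_set_parent_color(struct rb_node *rb,
                                           struct rb_node *p, int color)
    {
        rb->__rb_parent_color = (unsigned long)p | color;
    }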

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 6280d2356fd8ad0936a63c10dc1e6accf48d0c61]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: optimize case selection logic in __rb_erase_color()
Michel Lespinasse [Wed, 20 Dec 2017 17:03:31 +0000 (18:03 +0100)]
rbtree: optimize case selection logic in __rb_erase_color()

In __rb_erase_color(), we have to select one of 3 cases depending on the
color on the 'other' node children.  If both children are black, we flip a
few node colors and iterate.  Otherwise, we do either one or two tree
rotations, depending on the color of the 'other' child opposite to 'node',
and then we are done.

The corresponding logic had duplicate checks for the color of the 'other'
child opposite to 'node'.  It was checking it first to determine if both
children are black, and then to determine how many tree rotations are
required.  Rearrange the logic to avoid that extra check.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit e125d1471a4f8f1bf7ea9a83deb8d23cb40bd712]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: adjust node color in __rb_erase_color() only when necessary
Michel Lespinasse [Wed, 20 Dec 2017 17:03:09 +0000 (18:03 +0100)]
rbtree: adjust node color in __rb_erase_color() only when necessary

In __rb_erase_color(), we were always setting a node to black after
exiting the main loop.  And in one case, after fixing up the tree to
satisfy all rbtree invariants, we were setting the current node to root
just to guarantee a loop exit, at which point the root would be set to
black.  However this is not necessary, as the root of an rbtree is already
known to be black.  The only case where the color flip is required is when
we exit the loop due to the current node being red, and it's easiest to
just do the flip at that point instead of doing it after the loop.

[adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit d6ff1273928ebf15466a85b7e1810cd00e72998b]

Ported only rbtree.c to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: low level optimizations in rb_insert_color()
Michel Lespinasse [Wed, 20 Dec 2017 17:02:50 +0000 (18:02 +0100)]
rbtree: low level optimizations in rb_insert_color()

- Use the newly introduced rb_set_parent_color() function to flip the color
  of nodes whose parent is already known.
- Optimize rb_parent() when the node is known to be red - there is no need
  to mask out the color in that case.
- Flipping gparent's color to red requires us to fetch its rb_parent_color
  field, so we can reuse it as the parent value for the next loop iteration.
- Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
  rotations: we already have pointers to all relevant nodes, and know their
  colors (either because we want to adjust it, or because we've tested it,
  or we can deduce it as black due to the node proximity to a known red node).
  So we can generate more efficient code by making use of the node pointers
  we already have, and setting both the parent and color attributes for
  nodes all at once. Also in Case 2, some node attributes don't have to
  be set because we know another tree rotation (Case 3) will always follow
  and override them.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 5bc9188aa207dafd47eab57df7c4fe5b3d3f636a]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: adjust root color in rb_insert_color() only when necessary
Michel Lespinasse [Wed, 20 Dec 2017 17:02:31 +0000 (18:02 +0100)]
rbtree: adjust root color in rb_insert_color() only when necessary

The root node of an rbtree must always be black.  However,
rb_insert_color() only needs to maintain this invariant when it has been
broken - that is, when it exits the loop due to the current (red) node
being the root.  In all other cases (exiting after tree rotations, or
exiting due to an existing black parent) the invariant is already
satisfied, so there is no need to adjust the root node color.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 6d58452dc066db61acdff7b84671db1b11a3de1c]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: break out of rb_insert_color loop after tree rotation
Michel Lespinasse [Wed, 20 Dec 2017 17:02:12 +0000 (18:02 +0100)]
rbtree: break out of rb_insert_color loop after tree rotation

It is a well known property of rbtrees that insertion never requires more
than two tree rotations.  In our implementation, after one loop iteration
identified one or two necessary tree rotations, we would iterate and look
for more.  However at that point the node's parent would always be black,
which would cause us to exit the loop.

We can make the code flow more obvious by just adding a break statement
after the tree rotations, where we know we are done.  Additionally, in the
cases where two tree rotations are necessary, we don't have to update the
'node' pointer as it wouldn't be used until the next loop iteration, which
we now avoid due to this break statement.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 1f0528653e41ec230c60f5738820e8a544731399]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: move some implementation details from rbtree.h to rbtree.c
Michel Lespinasse [Wed, 20 Dec 2017 17:01:47 +0000 (18:01 +0100)]
rbtree: move some implementation details from rbtree.h to rbtree.c

rbtree users must use the documented APIs to manipulate the tree
structure.  Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit bf7ad8eeab995710c766df49c9c69a8592ca0216]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: empty nodes have no color
Michel Lespinasse [Wed, 20 Dec 2017 17:01:26 +0000 (18:01 +0100)]
rbtree: empty nodes have no color

Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in commit 10fd48f2376d
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.
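
The simplified macros mark an empty node by pointing its parent link at
itself (as in the Linux original):

    #define RB_EMPTY_NODE(node) \
        ((node)->__rb_parent_color == (unsigned long)(node))
    #define RB_CLEAR_NODE(node) \
        ((node)->__rb_parent_color = (unsigned long)(node))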

[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4c199a93a2d36b277a9fd209a0f2793f8460a215]

Ported rbtree.h and rbtree.c changes which are relevant to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  rbtree: remove redundant if()-condition in rb_erase()
Wolfram Strepp [Wed, 20 Dec 2017 17:00:49 +0000 (18:00 +0100)]
rbtree: remove redundant if()-condition in rb_erase()

Furthermore, notice that the initial checks:

            if (!node->rb_left)
                    child = node->rb_right;
            else if (!node->rb_right)
                    child = node->rb_left;
            else
            {
                    ...
            }
guarantee that old->rb_right is set in the final else branch, therefore
we can omit checking that again.

Signed-off-by: Wolfram Strepp <wstrepp@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4b324126e0c6c3a5080ca3ec0981e8766ed6f1ee]

Ported to Xen.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/nops: Switch to the P6 nops as a compile-time default
Andrew Cooper [Wed, 20 Dec 2017 11:52:15 +0000 (11:52 +0000)]
x86/nops: Switch to the P6 nops as a compile-time default

Along with c/s d7128e735031 switching the runtime choice of best nops, switch
the compile-time default to P6 nops.  This is more efficient on most
processors for alternative points which add/remove code, rather than switch
between two different pieces of code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  revert d64ac4d5f2's stray change to Config.mk
Jan Beulich [Wed, 20 Dec 2017 10:37:43 +0000 (11:37 +0100)]
revert d64ac4d5f2's stray change to Config.mk

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/shadow: make 1-bit-disable match 1-bit-enable
Jan Beulich [Wed, 20 Dec 2017 09:12:11 +0000 (10:12 +0100)]
x86/shadow: make 1-bit-disable match 1-bit-enable

shadow_one_bit_enable() sets PG_SH_enable (if not already set of course)
in addition to the bit being requested. Make shadow_one_bit_disable()
behave similarly - clear PG_SH_enable if that's the only bit remaining.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years ago  x86/shadow: ignore sh_pin() failure in one more case
Jan Beulich [Wed, 20 Dec 2017 09:05:16 +0000 (10:05 +0100)]
x86/shadow: ignore sh_pin() failure in one more case

Following what we've already done in the XSA-250 fix, convert another
sh_pin() caller to no longer fail the higher level operation if pinning
fails, as pinning is a performance optimization only in those places.

Suggested-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years ago  x86/shadow: remove pointless loops over all vCPU-s
Jan Beulich [Wed, 20 Dec 2017 09:04:48 +0000 (10:04 +0100)]
x86/shadow: remove pointless loops over all vCPU-s

The vCPU count can be had more directly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years ago  x86/shadow: drop further 32-bit relics
Jan Beulich [Wed, 20 Dec 2017 09:04:16 +0000 (10:04 +0100)]
x86/shadow: drop further 32-bit relics

PV guests don't ever get shadowed in other than 4-level mode anymore;
commit 5a3ce8f85e ("x86/shadow: drop stray name tags from
sh_{guest_get,map}_eff_l1e()") didn't go quite far enough (and there's
a good chance that further cleanup opportunity exists, which I simply
didn't notice).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years ago  x86: introduce NOP9 forms
Jan Beulich [Wed, 20 Dec 2017 09:03:20 +0000 (10:03 +0100)]
x86: introduce NOP9 forms

Both Intel and AMD recommend an operand-size-override-prefixed long NOP
form for covering 9 bytes, so introduce this and use it in p6_nops[] to
allow further reducing the number of NOPs needed when covering larger
ranges.
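
The 9-byte form in question is the 66-prefixed long NOP, i.e.
"nopw 0x0(%rax,%rax,1)" (byte sequence shown for illustration):

    #define P6_NOP9 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00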

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: improve NOP use for AMD CPUs
Jan Beulich [Wed, 20 Dec 2017 09:02:53 +0000 (10:02 +0100)]
x86: improve NOP use for AMD CPUs

For Fam10 and later AMD recommends using the "long" NOP forms. Re-write
the present Intel code into switch() statements and add AMD logic.

Default to "long" forms (which all 64-bit CPUs are supposed to
recognize), overriding to the K8 flavor on those few (older) CPUs.

This at the same time brings us in line again in this regard with
current Linux.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: implement set value flow for MBA
Yi Sun [Tue, 19 Dec 2017 01:49:00 +0000 (02:49 +0100)]
x86: implement set value flow for MBA

This patch implements the set value flow for MBA, including its callback
function and domctl interface.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: implement get value interface for MBA
Yi Sun [Tue, 19 Dec 2017 01:49:00 +0000 (02:49 +0100)]
x86: implement get value interface for MBA

This patch implements the get value domctl interface for MBA.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: implement get hw info flow for MBA
Yi Sun [Tue, 19 Dec 2017 01:50:00 +0000 (02:50 +0100)]
x86: implement get hw info flow for MBA

This patch implements the get HW info flow for MBA, including its callback
function and sysctl interface.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/dom0: remove is_pv_domain leftovers from the PV domain builder
Roger Pau Monné [Wed, 20 Dec 2017 09:00:16 +0000 (10:00 +0100)]
x86/dom0: remove is_pv_domain leftovers from the PV domain builder

Those were added when PVHv1 was sharing the domain builder with PV.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/dom0: remove autotranslate leftovers
Roger Pau Monné [Wed, 20 Dec 2017 08:59:21 +0000 (09:59 +0100)]
x86/dom0: remove autotranslate leftovers

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  libxl/pvh: force PVH guests to use the xenstore shutdown
Roger Pau Monne [Tue, 19 Dec 2017 14:17:52 +0000 (14:17 +0000)]
libxl/pvh: force PVH guests to use the xenstore shutdown

PVH guests are all required to support the xenstore-based shutdown
signalling, since there is no other way for a PVH guest to be
requested to shut down.

For HVM guests we check whether the guest has installed a PV-on-HVM
interrupt callback; that does not make sense for PVH guests.

So for PVH guests, take the PV path: assume that all PVH guests have
suitable xenstore drivers.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years ago  gcov: rename folder and header to coverage
Roger Pau Monne [Thu, 9 Nov 2017 11:15:00 +0000 (12:15 +0100)]
gcov: rename folder and header to coverage

Preparatory change before adding llvm profiling support.
No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years ago  kconfig/gcov: remove gcc version choice from kconfig
Roger Pau Monne [Thu, 9 Nov 2017 11:16:00 +0000 (12:16 +0100)]
kconfig/gcov: remove gcc version choice from kconfig

Use autodetect only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  VMX: drop bogus gpa parameter from __invept()
Jan Beulich [Fri, 15 Dec 2017 10:18:06 +0000 (11:18 +0100)]
VMX: drop bogus gpa parameter from __invept()

Perhaps there once was a plan to have a flush type requiring this, but
the current SDM has no mention of such and all callers pass zero anyway.

Take the opportunity and also change involved types to uint64_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years ago  domctl: improve locking during domain destruction
Jan Beulich [Fri, 15 Dec 2017 10:17:19 +0000 (11:17 +0100)]
domctl: improve locking during domain destruction

There is no need to hold the global domctl lock across domain_kill() -
the domain lock is fully sufficient here, and parallel cleanup after
multiple domains performs quite a bit better this way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: make _get_page_type() a proper counterpart of _put_page_type() again
Jan Beulich [Fri, 15 Dec 2017 10:16:32 +0000 (11:16 +0100)]
x86: make _get_page_type() a proper counterpart of _put_page_type() again

Drop one of the leading underscores and use bool for its "preemptible"
parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: use switch() in _put_page_type()
Jan Beulich [Fri, 15 Dec 2017 10:15:54 +0000 (11:15 +0100)]
x86: use switch() in _put_page_type()

Use this to cheaply add another assertion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: improve _put_page_type() readability
Jan Beulich [Fri, 15 Dec 2017 10:15:16 +0000 (11:15 +0100)]
x86: improve _put_page_type() readability

By limiting the scope of rc it is more obvious that failure can be
reported only if _put_final_page_type() failed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: remove _PAGE_PSE check from get_page_from_l2e()
Jan Beulich [Fri, 15 Dec 2017 10:14:31 +0000 (11:14 +0100)]
x86: remove _PAGE_PSE check from get_page_from_l2e()

With L2_DISALLOW_MASK containing _PAGE_PSE unconditionally as of commit
56fff3e5e9 ("x86: nuke PV superpage option and code") there's no point
anymore in separately checking for the bit.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86: make get_page_from_mfn() return struct page_info *
Jan Beulich [Fri, 15 Dec 2017 10:13:49 +0000 (11:13 +0100)]
x86: make get_page_from_mfn() return struct page_info *

Almost all users of it want it, and it calculates it anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago  x86/HVM: fix hvmemul_rep_outs_set_context()
Jan Beulich [Fri, 15 Dec 2017 10:11:36 +0000 (11:11 +0100)]
x86/HVM: fix hvmemul_rep_outs_set_context()

There were two issues with this function: Its use of
hvmemul_do_pio_buffer() was wrong (the function deals only with
individual port accesses, not repeated ones, i.e. passing it
"*reps * bytes_per_rep" does not have the intended effect). And it
could have processed a larger set of operations in one go than was
probably intended (limited just by the size that xmalloc() can hand
back).

By converting to proper use of hvmemul_do_pio_buffer(), no intermediate
buffer is needed at all. As a result a preemption check is being added.

Also drop unused parameters from the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years ago  x86: implement data structure and CPU init flow for MBA
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: implement data structure and CPU init flow for MBA

This patch implements the main data structures of MBA.

Like the CAT features, MBA HW info has cos_max, which is the maximum thrtl
register number, and thrtl_max, which is the maximum throttle value
(delay value).  It also has a flag indicating whether the throttle
value is linear or non-linear.

One thrtl register of MBA stores a throttle value for one or more
domains.  The throttle value is the delay applied to traffic between
the L2 cache and the next cache level.
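
A sketch of the hardware info being described (names illustrative, not the
patch's exact layout):

    struct mba_hw_info {
        unsigned int cos_max;    /* highest thrtl register index */
        unsigned int thrtl_max;  /* maximum throttle (delay) value */
        bool linear;             /* is the throttle scale linear? */
    };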

This patch also implements init flow for MBA and register stub
callback functions.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: a few optimizations to psr codes
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: a few optimizations to psr codes

This patch refines the PSR code:
1. Change type of 'cat_init_feature' to 'bool' to remove the pointless
   returning of error code.
2. Move printk in 'cat_init_feature' to reduce a return path.
3. Define a local variable 'feat_mask' in 'psr_cpu_init' to reduce calling of
   'cpuid_count_leaf()'.
4. Change 'PSR_INFO_IDX_CAT_FLAG' to 'PSR_INFO_IDX_CAT_FLAGS'.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86: rename 'cbm_type' to 'psr_type' to make it general
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: rename 'cbm_type' to 'psr_type' to make it general

This patch renames 'cbm_type' to 'psr_type' to generalize it.
Then, we can reuse this for all psr allocation features.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  Rename PSR sysctl/domctl interfaces and xsm policy to make them be general
Yi Sun [Tue, 24 Oct 2017 09:33:00 +0000 (11:33 +0200)]
Rename PSR sysctl/domctl interfaces and xsm policy to make them be general

This patch renames the PSR sysctl/domctl interfaces and the related xsm
policy to make them general to all resource allocation features, not only
CAT.  Then, we can reuse the interfaces for all allocation features.

Basically, it changes 'psr_cat_op' to 'psr_alloc', and removes 'CAT_' from
some macros.  E.g.:
1. psr_cat_op -> psr_alloc
2. XEN_DOMCTL_psr_cat_op -> XEN_DOMCTL_psr_alloc
3. XEN_SYSCTL_psr_cat_op -> XEN_SYSCTL_psr_alloc
4. XEN_DOMCTL_PSR_CAT_SET_L3_CBM -> XEN_DOMCTL_PSR_SET_L3_CBM
5. XEN_SYSCTL_PSR_CAT_get_l3_info -> XEN_SYSCTL_PSR_get_l3_info

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years ago  docs: create Memory Bandwidth Allocation (MBA) feature document
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
docs: create Memory Bandwidth Allocation (MBA) feature document

This patch creates the MBA feature document in doc/features/.  It describes
key points of implementing MBA, which is described in detail in the Intel
SDM section "Introduction to Memory Bandwidth Allocation".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years ago  x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()
Andrew Cooper [Wed, 6 Dec 2017 17:46:20 +0000 (17:46 +0000)]
x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()

Since c/s 49de10f3c1718 "x86/hvm: Don't raise #GP behind the emulator's back
for MSR accesses", returning X86EMUL_EXCEPTION has pushed the exception
generation to the top of the call tree.

Using hvm_inject_hw_exception() and returning X86EMUL_EXCEPTION causes a
double #GP injection, which combines to #DF.
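
The problematic shape, sketched (illustrative, not the literal code):

    /* Buggy: this injects #GP twice - once directly, and once via the
     * caller acting on the return value - combining to #DF. */
    hvm_inject_hw_exception(TRAP_gp_fault, 0); /* dropped by this patch */
    return X86EMUL_EXCEPTION;  /* the caller injects the single #GP */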

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/efer: Make {read,write}_efer() into inline helpers
Andrew Cooper [Mon, 23 Oct 2017 09:49:33 +0000 (10:49 +0100)]
x86/efer: Make {read,write}_efer() into inline helpers

There is no need for the overhead of a call to a separate translation unit.
While moving the implementation, update them to use uint64_t over u64.
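
The resulting shape, sketched (Xen caches the EFER value per CPU):

    static inline uint64_t read_efer(void)
    {
        return this_cpu(efer);
    }

    static inline void write_efer(uint64_t val)
    {
        this_cpu(efer) = val;
        wrmsrl(MSR_EFER, val);
    }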

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/domctl: Avoid redundant zeroing in XEN_DOMCTL_get_vcpu_msrs
Andrew Cooper [Fri, 1 Dec 2017 13:16:12 +0000 (13:16 +0000)]
x86/domctl: Avoid redundant zeroing in XEN_DOMCTL_get_vcpu_msrs

Zero the msr structure once at initialisation time, and avoid re-zeroing the
reserved field every time the structure is used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/efi: Fix build with clang-5.0
Andrew Cooper [Wed, 13 Dec 2017 16:55:38 +0000 (16:55 +0000)]
xen/efi: Fix build with clang-5.0

The clang-5.0 build is reliably failing with:

  Error: size of boot.o:.text is 0x01

which is because efi_arch_flush_dcache_area() exists as a single ret
instruction.  Mark it as __init like everything else in the files.

Spotted by Travis.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/microcode: Add support for fam17h microcode loading
Tom Lendacky [Thu, 30 Nov 2017 22:46:40 +0000 (16:46 -0600)]
x86/microcode: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
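
The addition amounts to (as in the referenced Linux commit):

    #define F17H_MPB_MAX_SIZE 3200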

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[Linux commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf]

Ported to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  x86/intel: Drop zeroed-out select_idle_routine() function
Andrew Cooper [Wed, 6 Dec 2017 18:44:15 +0000 (18:44 +0000)]
x86/intel: Drop zeroed-out select_idle_routine() function

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years ago  xen/arm: traps: Merge do_trap_instr_abort_guest and do_trap_data_abort_guest
Julien Grall [Tue, 12 Dec 2017 19:02:12 +0000 (19:02 +0000)]
xen/arm: traps: Merge do_trap_instr_abort_guest and do_trap_data_abort_guest

The two helpers do_trap_instr_abort_guest and do_trap_data_abort_guest
are used to trap stage-2 aborts. While the former handles only prefetch
aborts and the latter only data aborts, they are very similar and do not
warrant separate helpers.

Merging the two will also make stage-2 abort handling easier to
maintain. So consolidate them into a new helper,
do_trap_stage2_abort.
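
A hedged sketch of the consolidated shape (the HSR_EC_* check follows
Xen/ARM conventions; the inner helpers are assumed for illustration):

    static void do_trap_stage2_abort(struct cpu_user_regs *regs,
                                     const union hsr hsr)
    {
        /* One entry point; dispatch on the exception class. */
        bool is_data = (hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);

        if ( is_data )
            handle_data_abort(regs, hsr);     /* assumed helper */
        else
            handle_prefetch_abort(regs, hsr); /* assumed helper */
    }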

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: traps: Move the definition of mmio_info_t in try_handle_mmio
Julien Grall [Tue, 12 Dec 2017 19:02:11 +0000 (19:02 +0000)]
xen/arm: traps: Move the definition of mmio_info_t in try_handle_mmio

mmio_info_t is currently filled by do_trap_data_abort_guest but is only
relevant when emulating an MMIO region.

A follow-up patch will merge stage-2 prefetch abort and stage-2 data abort
handling into a single helper. To prepare for that, mmio_info_t is now
filled by try_handle_mmio.
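
A hedged sketch of the new shape (field names assumed from Xen/ARM's
mmio_info_t; the dispatcher call is illustrative):

    static bool try_handle_mmio(struct cpu_user_regs *regs,
                                const union hsr hsr, paddr_t gpa)
    {
        /* The info structure is now built here, not by the trap handler. */
        mmio_info_t info = {
            .gpa  = gpa,
            .dabt = hsr.dabt,
        };

        return handle_mmio(&info) == 1; /* assumed dispatcher */
    }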

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: traps: Remove the field gva from mmio_info_t
Julien Grall [Tue, 12 Dec 2017 19:02:10 +0000 (19:02 +0000)]
xen/arm: traps: Remove the field gva from mmio_info_t

mmio_info_t is used to gather the information needed to emulate a
region. The guest virtual address is unlikely to be useful and is not
currently used. So remove the gva field from mmio_info_t and replace it
with a local variable.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: p2m: Fold p2m_tlb_flush into p2m_force_tlb_flush_sync
Julien Grall [Tue, 12 Dec 2017 19:02:09 +0000 (19:02 +0000)]
xen/arm: p2m: Fold p2m_tlb_flush into p2m_force_tlb_flush_sync

p2m_tlb_flush is called in 2 places: p2m_alloc_table and
p2m_force_tlb_flush_sync.

p2m_alloc_table is called when the domain is initialized, and its call
could be replaced by one to p2m_force_tlb_flush_sync with the P2M write
locked.

This may seem a bit pointless, but it provides a single API for flushing
and avoids misuse in the P2M code.

So update p2m_alloc_table to use p2m_force_tlb_flush_sync, and fold
p2m_tlb_flush into p2m_force_tlb_flush_sync.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: p2m: Introduce p2m_tlb_flush_sync, export it and use it
Julien Grall [Tue, 12 Dec 2017 19:02:08 +0000 (19:02 +0000)]
xen/arm: p2m: Introduce p2m_tlb_flush_sync, export it and use it

Multiple places in the code need to flush the TLBs, but only when
p2m->need_flush is set.

Rather than open-coding this, introduce a new helper, p2m_tlb_flush_sync,
to do it.

Note that p2m_tlb_flush_sync is exported, as it might be used by other
parts of Xen.
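
The helper is small; a hedged sketch matching the description above:

    void p2m_tlb_flush_sync(struct p2m_domain *p2m)
    {
        /* Only flush when the P2M has pending changes to synchronise. */
        if ( p2m->need_flush )
            p2m_force_tlb_flush_sync(p2m);
    }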

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: p2m: Rename p2m_flush_tlb and p2m_flush_tlb_sync
Julien Grall [Tue, 12 Dec 2017 19:02:07 +0000 (19:02 +0000)]
xen/arm: p2m: Rename p2m_flush_tlb and p2m_flush_tlb_sync

Rename p2m_flush_tlb and p2m_flush_tlb_sync to respectively
p2m_tlb_flush and p2m_force_tlb_flush_sync.

At first glance, inverting 'flush' and 'tlb' might seem pointless, but it
will make it easier in the future to port code from the x86 P2M, or even
to share code with it.

For p2m_flush_tlb_sync, 'force' was added because the TLBs are flushed
unconditionally. A follow-up patch will add a helper to flush the TLBs
only in certain cases.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: domain_build: Use copy_to_guest_phys_flush_dcache in dtb_load
Julien Grall [Tue, 12 Dec 2017 19:02:06 +0000 (19:02 +0000)]
xen/arm: domain_build: Use copy_to_guest_phys_flush_dcache in dtb_load

The function dtb_load deals with IPAs but uses gvirt_to_maddr to do the
translation. This currently works fine because the stage-1 MMU is
disabled.

Rather than relying on that assumption, use the new
copy_to_guest_phys_flush_dcache. This also results in slightly more
comprehensible code.
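
A hedged usage sketch (the kernel_info fields and panic message are
assumed for illustration):

    static void dtb_load(struct kernel_info *kinfo)
    {
        unsigned long left;

        left = copy_to_guest_phys_flush_dcache(kinfo->d, kinfo->dtb_paddr,
                                               kinfo->fdt,
                                               fdt_totalsize(kinfo->fdt));
        if ( left != 0 )
            panic("Unable to copy the DTB to the guest (%lu bytes left)",
                  left);
    }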

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: domain_build: Rework initrd_load to use the generic copy helper
Julien Grall [Tue, 12 Dec 2017 19:02:05 +0000 (19:02 +0000)]
xen/arm: domain_build: Rework initrd_load to use the generic copy helper

The function initrd_load deals with IPAs but uses gvirt_to_maddr to do
the translation. This currently works fine because the stage-1 MMU is
disabled.

Furthermore, the function implements its own copy-to-guest, resulting in
code duplication and making it more difficult to update the page-table
logic (such as adding support for Populate On Demand).

The new copy_to_guest_phys_flush_dcache can be used here by temporarily
mapping the full initrd into the virtual address space.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: kernel: Rework kernel_zimage_load to use the generic copy helper
Julien Grall [Tue, 12 Dec 2017 19:02:04 +0000 (19:02 +0000)]
xen/arm: kernel: Rework kernel_zimage_load to use the generic copy helper

The function kernel_zimage_load deals with IPAs but uses gvirt_to_maddr
to do the translation. This currently works fine because the stage-1 MMU
is disabled.

Furthermore, the function implements its own copy-to-guest, resulting in
code duplication and making it more difficult to update the page-table
logic (such as adding support for Populate On Demand).

The new copy_to_guest_phys_flush_dcache can be used here by temporarily
mapping the full kernel into the virtual address space.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Introduce copy_to_guest_phys_flush_dcache
Julien Grall [Tue, 12 Dec 2017 19:02:03 +0000 (19:02 +0000)]
xen/arm: Introduce copy_to_guest_phys_flush_dcache

This new function will be used in a follow-up patch to copy data to the guest
using the IPA (aka guest physical address) and then clean the cache.
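
Given the flag-based copy_guest described in the commit below, the new
function can be a thin wrapper; a hedged sketch (flag names assumed):

    unsigned long copy_to_guest_phys_flush_dcache(struct domain *d,
                                                  paddr_t gpa,
                                                  void *buf,
                                                  unsigned int len)
    {
        /* Copy by IPA into the guest, then clean the data cache. */
        return copy_guest(buf, gpa, len, d,
                          COPY_to_guest | COPY_ipa | COPY_flush_dcache);
    }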

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Extend copy_to_guest to support copying from/to guest physical address
Julien Grall [Tue, 12 Dec 2017 19:02:02 +0000 (19:02 +0000)]
xen/arm: Extend copy_to_guest to support copying from/to guest physical address

The only differences between copy_to_guest and access_guest_memory_by_ipa are:
    - The latter does not support copying data that crosses a page boundary
    - The former copies from/to a guest VA whilst the latter uses a
      guest PA

copy_to_guest can easily be extended to support copying from/to a guest
physical address. For that, a new flag bit is used to indicate whether a
linear address or an IPA is being used (see the sketch below).

Lastly, access_guest_memory_by_ipa is reimplemented using copy_to_guest.
This also has the benefit of extending its capabilities: it is now
possible to copy data crossing a page boundary.
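
A hedged sketch of the flag encoding described (names and exact bit
positions are assumptions):

    #define COPY_flush_dcache (1U << 0)
    #define COPY_from_guest   (0U << 1)
    #define COPY_to_guest     (1U << 1)
    #define COPY_ipa          (0U << 2)
    #define COPY_linear       (1U << 2)

    /* Callers combine direction and address-space bits, e.g.: */
    /* copy_guest(buf, addr, len, info, COPY_to_guest | COPY_linear); */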

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: guest_copy: Extend the prototype to pass the vCPU
Julien Grall [Tue, 12 Dec 2017 19:02:01 +0000 (19:02 +0000)]
xen/arm: guest_copy: Extend the prototype to pass the vCPU

Currently, guest_copy assumes the copy will only be done for the current
vCPU. copy_guest is meant to be vCPU agnostic, so extend the prototype
to pass the vCPU.

At the same time, encapsulate the vCPU in a union to allow extension
for copying from a guest domain (the IPA case) in the future.
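
A hedged sketch of the union described (names assumed; the gpa arm is
the future extension mentioned above):

    typedef union
    {
        struct
        {
            struct vcpu *v;   /* Copy against this vCPU's address space. */
        } gva;

        struct
        {
            struct domain *d; /* Copy against a domain's IPA space. */
        } gpa;
    } copy_info_t;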

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>