Roger Pau Monne [Thu, 21 Dec 2017 18:49:27 +0000 (18:49 +0000)]
xen/shim: allow DomU to have as many vcpus as available
Since the shim VCPUOP_{up/down} hypercall is wired to the plug/unplug
of CPUs to the shim itself, start the shim DomU with only the BSP
online, and let the guest bring up other CPUs as it needs them.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 20 Dec 2017 14:53:19 +0000 (14:53 +0000)]
xen/shim: crash instead of reboot in shim mode
All guest shutdown operations are forwarded to L0, so the only native
calls to machine_restart happen from crash related paths inside the
hypervisor, hence switch the reboot code to instead issue a crash
shutdown.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 20 Dec 2017 11:09:09 +0000 (11:09 +0000)]
xen/pvshim: use default position for the m2p mappings
When running a 32bit kernel as Dom0 on a 64bit hypervisor the
hypervisor will try to shrink the hypervisor hole to the minimum
needed, and thus requires the Dom0 to use XENMEM_machphys_mapping in
order to fetch the position of the start of the hypervisor virtual
mappings.
Disable this feature when running as a PV shim, since some DomU
kernels don't implement XENMEM_machphys_mapping and break if the m2p
doesn't begin at the default address.
NB: support for the XENMEM_machphys_mapping was added in Linux by
commit 7e7750.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 20 Dec 2017 11:09:06 +0000 (11:09 +0000)]
xen/pvshim: add grant table operations
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v1:
- Use the __ variant of copy_to_guest.
- Return EOPNOTSUPP for not implemented grant table hypercalls.
Roger Pau Monne [Wed, 20 Dec 2017 11:09:06 +0000 (11:09 +0000)]
xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
Note that the unmask and the virq operations are handled by the shim
itself, and that FIFO event channels are not exposed to the guest.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
Changes since v1:
- Use find_first_set_bit instead of ffsl.
- Constify the domain parameter of some functions.
- Indent macro one more level.
- Have a single evtchn_close struct in pv_shim_event_channel_op.
- Add blank lines between switch cases.
- Use -EOPNOTSUPP in order to signal lack of FIFO or PIRQ support.
- Switch evtchn_bind_virq parameter to evtchn_port_t and use 0 to signal
allocation needed.
- Switch evtchn helpers return type to int instead of long.
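The split described in this commit (unmask and VIRQ binding handled by the shim, FIFO and PIRQ ops refused, everything else forwarded to L0) can be pictured roughly as follows. This is a minimal sketch: the op codes, the route values and `shim_evtchn_route` are hypothetical names made up for illustration, not the code in pv_shim_event_channel_op.

```c
#include <errno.h>

/* Hypothetical op codes; the real EVTCHNOP_* values live in Xen's public headers. */
enum shim_evtchn_op {
    SHIM_OP_SEND,
    SHIM_OP_CLOSE,
    SHIM_OP_UNMASK,
    SHIM_OP_BIND_VIRQ,
    SHIM_OP_INIT_FIFO,
    SHIM_OP_BIND_PIRQ,
};

/* Hypothetical routing results (positive, so they never clash with -errno). */
enum shim_route { ROUTE_LOCAL = 1, ROUTE_FORWARD_L0 = 2 };

/*
 * Decide where an event channel op is serviced.
 * Returns a positive route, or -EOPNOTSUPP for ops that are not
 * exposed to the guest (FIFO, PIRQ).
 */
static int shim_evtchn_route(enum shim_evtchn_op op)
{
    switch ( op )
    {
    case SHIM_OP_UNMASK:
    case SHIM_OP_BIND_VIRQ:
        return ROUTE_LOCAL;

    case SHIM_OP_INIT_FIFO:
    case SHIM_OP_BIND_PIRQ:
        return -EOPNOTSUPP;

    default:
        return ROUTE_FORWARD_L0;
    }
}
```

Returning a negative errno value for unsupported ops lets a guest probe for FIFO support and fall back to 2-level event channels.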
Roger Pau Monne [Wed, 20 Dec 2017 11:09:06 +0000 (11:09 +0000)]
xen/pvshim: set correct domid value
If domid is not provided by L0, set domid to 1 by default. Note that L0
not providing the domid can cause trouble if the guest tries to use
its domid instead of DOMID_SELF when performing hypercalls that are
forwarded to the L0 hypervisor.
Since the domain created is no longer the hardware domain add a hook
to the domain shutdown path in order to forward shutdown operations to
the L0 hypervisor.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
Changes since v1:
- s/get_dom0_domid/get_initial_domain_id/.
- Add a comment regarding why dom0 needs to be global.
Roger Pau Monne [Wed, 20 Dec 2017 11:09:06 +0000 (11:09 +0000)]
xen/pvshim: modify Dom0 builder in order to build a DomU
According to the PV ABI the initial virtual memory regions should
contain the xenstore and console pages after the start_info. Also set
the correct values in the start_info for DomU operation.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 20 Dec 2017 11:09:08 +0000 (11:09 +0000)]
sched/null: skip vCPUs on the waitqueue that are blocked
Avoid scheduling vCPUs that are down; there's no point in assigning
them to a pCPU because they are not going to run anyway.
Since down vCPUs are not assigned to pCPUs after this change, force a
rescheduling when a vCPU is brought up if it's on the waitqueue. Also
when scheduling try to pick a vCPU from the runqueue if the pCPU is
running idle.
There's no current way to prevent a user from adding more vcpus to a
pool than there are pcpus (if nothing else, by creating a new VM in a
given pool), or from taking pcpus from a pool in which #vcpus >=
#pcpus.
The null scheduler deals with this by having a queue of "unassigned"
vcpus that are waiting for a free pcpu. When a pcpu becomes
available, it will do the assignment. When a pcpu that has a vcpu
assigned is removed from the pool, that vcpu is assigned to a
different pcpu if one is available; if not, it is put on the list.
In the case of shim mode, this also seems to happen whenever curvcpus
< maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which
to schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule,
of which (maxvcpus-curvcpus) are marked down. In this case, it also
seems that the null scheduler sometimes schedules a down vcpu when
there are up vcpus on the list; meaning that the up vcpus are never
scheduled.
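The behaviour this change is after, namely never handing a pCPU to a down vCPU while a runnable one is still waiting, can be sketched as a simple list walk. The `toy_vcpu` structure and `pick_from_waitqueue` are hypothetical stand-ins and do not mirror the null scheduler's real data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vCPU record; not Xen's struct vcpu. */
struct toy_vcpu {
    int id;
    bool down;               /* true if the vCPU is offline */
    struct toy_vcpu *next;   /* next entry on the waitqueue */
};

/*
 * Return the first vCPU on the waitqueue that is able to run, skipping
 * entries that are down. Returns NULL if every waiting vCPU is down.
 */
static struct toy_vcpu *pick_from_waitqueue(struct toy_vcpu *head)
{
    for ( struct toy_vcpu *v = head; v != NULL; v = v->next )
        if ( !v->down )
            return v;

    return NULL;
}
```

Because down vCPUs are skipped rather than assigned, bringing one up later has to trigger a reschedule, which is the second half of the change.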
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Dario Faggioli <raistlin@linux.it>
---
Changes since v1:
- Force a rescheduling when a vCPU is brought up.
- Try to pick a vCPU from the runqueue if running the idle vCPU.
- Add George Dunlap description of the problem to the commit log.
Roger Pau Monne [Thu, 30 Nov 2017 09:53:26 +0000 (09:53 +0000)]
xen/x86: report domain id on cpuid
Use the ECX register of the hypervisor leaf 5. The EAX register on
this leaf is a flags field that can be used to notice the presence of
the domain id in ECX. Note that this is only available to HVM guests.
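A guest-side consumer would first test the flag in EAX and only then trust ECX. The sketch below works on register values that have already been read, so it stays independent of how CPUID is executed. The flag bit position and the names `TOY_HVM_CPUID_DOMID_PRESENT` and `domid_from_cpuid` are assumptions for illustration; the authoritative value is in Xen's public cpuid header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bit in EAX; check Xen's public cpuid header for the real value. */
#define TOY_HVM_CPUID_DOMID_PRESENT (1u << 4)

/*
 * Given the EAX and ECX values returned by the hypervisor leaf, store the
 * domain id in *domid and return true only if EAX advertises it.
 */
static bool domid_from_cpuid(uint32_t eax, uint32_t ecx, uint32_t *domid)
{
    if ( !(eax & TOY_HVM_CPUID_DOMID_PRESENT) )
        return false;

    *domid = ecx;
    return true;
}
```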
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
- Use leaf 5 instead.
Roger Pau Monne [Tue, 28 Nov 2017 09:54:17 +0000 (09:54 +0000)]
xen/x86: make VGA support selectable
Through a Kconfig option. Enable it by default, and disable it for the
PV-in-PVH shim.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v1:
- Make the VGA option dependent on the shim one.
Roger Pau Monne [Tue, 9 Jan 2018 12:51:37 +0000 (12:51 +0000)]
x86/guest: setup event channel upcall vector
And a dummy event channel upcall handler.
Note that with the current code the underlying Xen (L0) must support
HVMOP_set_evtchn_upcall_vector or else event channel setup is going to
fail. This limitation can be lifted by implementing more event channel
interrupt injection methods as a backup.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Thu, 16 Nov 2017 17:56:18 +0000 (17:56 +0000)]
x86: xen pv clock time source
It is a variant of TSC clock source.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
- Use the mapped vcpu_info.
Roger Pau Monne [Thu, 28 Dec 2017 15:22:34 +0000 (15:22 +0000)]
x86/guest: map per-cpu vcpu_info area.
Mapping the per-vcpu vcpu_info area is required in order to use more
than XEN_LEGACY_MAX_VCPUS.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
- Make vcpu_info_mapped static.
- Add a BUG_ON in case VCPUOP_register_vcpu_info fails.
- Remove one indentation level in hypervisor_setup.
- Make xen_hypercall_vcpu_op return int.
Roger Pau Monne [Tue, 9 Jan 2018 11:19:44 +0000 (11:19 +0000)]
x86/guest: map shared_info page
Use an unpopulated PFN in order to map it.
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v1:
- Use an unpopulated PFN to map the shared_info page.
- Mask all event channels.
- Report XENMEM_add_to_physmap error code in case of failure.
Wei Liu [Wed, 3 Jan 2018 16:50:24 +0000 (16:50 +0000)]
xen/pvshim: keep track of used PFN ranges
Simple infrastructure to keep track of PFN space usage, so that we can
use unpopulated PFNs to map special pages like shared info and grant
table.
As rangeset depends on malloc being ready, hypervisor_setup is
introduced for things that can be initialised late in the process.
Note that PFN space is marked as reserved at least up to 4GiB (or more
if the guest has more memory). This is not a perfect solution but
avoids using the MMIO hole below 4GiB. Ideally the shim (L1) should
have a way to ask the underlying Xen (L0) which memory regions are
populated, unpopulated, or MMIO space.
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Wed, 3 Jan 2018 16:38:54 +0000 (16:38 +0000)]
xen: introduce rangeset_claim_range
Reserve a hole in a rangeset.
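To illustrate what "reserving a hole" means, here is a minimal sketch over a small sorted array of inclusive ranges. The `toy_rangeset` type, the fixed capacity and `toy_claim_range` are hypothetical; Xen's rangeset is list-based and its search order may differ. Overflow at the top of the address space is deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical, simplified range set: sorted, non-overlapping, inclusive ranges. */
#define TOY_MAX_RANGES 8

struct toy_range { unsigned long s, e; };

struct toy_rangeset {
    struct toy_range r[TOY_MAX_RANGES];
    size_t nr;
};

/*
 * Find the lowest hole of 'size' entries, mark it used and return its
 * start in *start. Returns 0 on success, -1 if no room is left.
 */
static int toy_claim_range(struct toy_rangeset *rs, unsigned long size,
                           unsigned long *start)
{
    unsigned long cand = 0;   /* lowest candidate start so far */
    size_t i, pos;

    if ( size == 0 || rs->nr == TOY_MAX_RANGES )
        return -1;

    /* Walk the used ranges until the gap before one of them is big enough. */
    for ( pos = 0; pos < rs->nr; pos++ )
    {
        if ( rs->r[pos].s > cand && rs->r[pos].s - cand >= size )
            break;
        cand = rs->r[pos].e + 1;
    }

    /* Shift later ranges up and record the claimed one in sorted position. */
    for ( i = rs->nr; i > pos; i-- )
        rs->r[i] = rs->r[i - 1];
    rs->r[pos].s = cand;
    rs->r[pos].e = cand + size - 1;
    rs->nr++;

    *start = cand;
    return 0;
}
```

With used ranges [0,9] and [12,19], a request for 4 entries skips the 2-entry gap and lands at 20, while a later request for 2 entries fills the gap at 10.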
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
- Change function name.
- Use a local variable instead of *s.
- Add unlikely to the !prev case.
- Move the function prototype position in the header file.
Andrew Cooper [Thu, 23 Nov 2017 10:59:59 +0000 (10:59 +0000)]
xen/console: Introduce console=xen
This specifies whether to use Xen specific console output. There are
two variants: one is the hypervisor console, the other is the magic
debug port 0xe9.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 28 Nov 2017 18:30:15 +0000 (18:30 +0000)]
x86/fixmap: Modify fix_to_virt() to return a void pointer
Almost all users of fix_to_virt() actually want a pointer. Include the cast
within the definition, so the callers don't need to.
Two users which need the integer value are switched to using __fix_to_virt()
directly. A few users stay fully unchanged, due to GCC's void pointer
arithmetic extension causing the same behaviour. Most users however have
their explicit casting dropped.
No functional change.
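The shape of the change can be shown with a pair of macros: an integer-returning form and a pointer-returning wrapper that carries the cast. The constants and the `toy_` names below are hypothetical; `toy_fix_to_virt_ulong` plays the role of `__fix_to_virt()` and `toy_fix_to_virt` the role of `fix_to_virt()`.

```c
/* Hypothetical layout constants, for illustration only. */
#define TOY_FIXADDR_TOP  0x80000000UL
#define TOY_PAGE_SHIFT   12

/* Integer form: callers that need the address as a number use this. */
#define toy_fix_to_virt_ulong(idx) \
    (TOY_FIXADDR_TOP - ((unsigned long)(idx) << TOY_PAGE_SHIFT))

/* Pointer form: the cast lives in the macro, so callers need none. */
#define toy_fix_to_virt(idx)   ((void *)toy_fix_to_virt_ulong(idx))
```

A caller can now write `unsigned char *p = toy_fix_to_virt(2);` with no explicit cast, since `void *` converts implicitly to any object pointer type in C.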
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Bob Moore [Thu, 8 Aug 2013 04:29:51 +0000 (12:29 +0800)]
ACPICA: Make ACPI Power Management Timer (PM Timer) optional.
PM Timer is now optional.
This support is already in Windows8 and "SHOULD" come out in ACPI 5.0A
(if all goes well).
The change doesn't affect Xen directly, because it does not rely
on the presence of the PM timer.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ported to Xen] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
George Dunlap [Wed, 20 Dec 2017 11:09:08 +0000 (11:09 +0000)]
libxl: Introduce hack to allow PVH mode to add a shim
libxl will look for LIBXL_PVSHIM_PATH and LIBXL_PVSHIM_CMDLINE
environment variables. If the first is present, it will boot with the
shim and the existing kernel / ramdisk. (That is, the shim as the "kernel" and the
kernel and ramdisk both as extra modules.)
If not, it will just boot the kernel / ramdisk directly (that is, with
the kernel as "kernel" and the ramdisk as a module).
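The selection logic reads roughly as below. This is a sketch under assumptions: `toy_boot_plan` and `plan_boot` are hypothetical and not libxl types or functions, and the environment lookups are left to the caller so the function itself stays pure.

```c
#include <stddef.h>

/* Hypothetical boot plan; not a libxl type. */
struct toy_boot_plan {
    const char *kernel;      /* image booted as "the kernel" */
    const char *modules[2];  /* extra modules, NULL when unused */
    const char *cmdline;     /* shim command line, NULL when unused */
};

/*
 * 'shim' and 'shim_cmdline' are the values of LIBXL_PVSHIM_PATH and
 * LIBXL_PVSHIM_CMDLINE (NULL when unset). With a shim, the guest kernel
 * and ramdisk become modules; without one, they are booted directly.
 */
static struct toy_boot_plan plan_boot(const char *shim,
                                      const char *shim_cmdline,
                                      const char *kernel,
                                      const char *ramdisk)
{
    struct toy_boot_plan p = { NULL, { NULL, NULL }, NULL };

    if ( shim != NULL )
    {
        p.kernel = shim;
        p.modules[0] = kernel;
        p.modules[1] = ramdisk;
        p.cmdline = shim_cmdline;
    }
    else
    {
        p.kernel = kernel;
        p.modules[0] = ramdisk;
    }

    return p;
}
```

A caller would typically pass `getenv("LIBXL_PVSHIM_PATH")` and `getenv("LIBXL_PVSHIM_CMDLINE")` as the first two arguments.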
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
To be replaced with proper toolstack side patches
Jonathan Ludlam [Mon, 27 Nov 2017 16:18:58 +0000 (16:18 +0000)]
tools/libxc: Multi modules support
Signed-off-by: Jonathan Ludlam <jonathan.ludlam@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Wed, 27 Dec 2017 11:50:21 +0000 (11:50 +0000)]
x86/upcall: inject a spurious event after setting upcall vector
In case the vCPU has pending events to inject. This fixes a bug that
happened if the guest mapped the vcpu info area using
VCPUOP_register_vcpu_info without having set up the event channel
upcall, and then set up the upcall vector.
In this scenario the guest would not receive any upcalls, because the
call to VCPUOP_register_vcpu_info would have marked the vCPU as having
pending events, but the vector could not be injected because it was
not yet set up.
This has not caused issues so far because all the consumers first
set up the vector callback and then map the vcpu info page, but there's
no limitation that prevents doing it in the inverse order.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 2 Dec 2016 15:00:41 +0000 (15:00 +0000)]
xen/x86: Correct mandatory and SMP barrier definitions
Barriers are a complicated topic, a source of confusion, and their incorrect
use is a common cause of bugs. It really doesn't help when Xen's API is the
same as Linux's, but its ABI is different.
Bring the two back in line, so programmers stand a chance of actually getting
their usage correct.
Drop the links in the comment, both of which are now stale. Instead, refer to
the vendor system manuals in a generic way.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 2 Dec 2016 15:00:41 +0000 (15:00 +0000)]
xen/x86: Drop unnecessary barriers
x86's current implementation of wmb() is a compiler barrier. As a result, the
only change in this patch is to remove an mfence instruction from
cpuidle_disable_deep_cstate().
None of these barriers serve any purpose. They are not synchronising with
remote cpus, and their compiler-barrier properties are not needed for
correctness purposes.
Furthermore, these wmb()'s specifically do not want to turn into sfence
instructions in future changes where wmb()'s implementation is corrected.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Various minor optimizations in rb_erase():
- Avoid multiple loading of node->__rb_parent_color when computing parent
and color information (possibly not in close sequence, as there might
be further branches in the algorithm)
- In the 1-child subcase of case 1, copy the __rb_parent_color field from
the erased node to the child instead of recomputing it from the desired
parent and color
- When searching for the erased node's successor, differentiate between
cases 2 and 3 based on whether any left links were followed. This avoids
a condition later down.
- In case 3, keep a pointer to the erased node's right child so we don't
have to refetch it later to adjust its parent.
- In the no-childs subcase of cases 2 and 3, place the rebalance assignment
last so that the compiler can remove the following if(rebalance) test.
Also, added some comments to illustrate cases 2 and 3.
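The second item in the list relies on the parent pointer and the color living in one word, so both can be copied with a single load and store. A minimal sketch of that packing follows; the `toy_` names are hypothetical and the node layout is simplified compared to the real struct rb_node.

```c
#include <stdint.h>

#define TOY_RB_RED   0
#define TOY_RB_BLACK 1

/* Simplified node: parent pointer and color share one word. */
struct toy_rb_node {
    uintptr_t parent_color;            /* parent pointer | color in bit 0 */
    struct toy_rb_node *left, *right;
};

/* Strip the low bits to recover the parent pointer. */
static struct toy_rb_node *toy_rb_parent(const struct toy_rb_node *n)
{
    return (struct toy_rb_node *)(n->parent_color & ~(uintptr_t)3);
}

/* The color is kept in bit 0 of the packed word. */
static int toy_rb_color(const struct toy_rb_node *n)
{
    return n->parent_color & 1;
}

/* Set parent and color together with one store. */
static void toy_rb_set_parent_color(struct toy_rb_node *n,
                                    struct toy_rb_node *parent, int color)
{
    n->parent_color = (uintptr_t)parent | (uintptr_t)color;
}

/*
 * One-child erase step: the child inherits the erased node's packed
 * word, so parent and color are copied rather than recomputed.
 */
static void toy_inherit_parent_color(struct toy_rb_node *child,
                                     const struct toy_rb_node *erased)
{
    child->parent_color = erased->parent_color;
}
```

The packing works because nodes are at least 4-byte aligned, which leaves the low bits of a node address free to hold the color.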
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4f035ad67f4633c233cb3642711d49b4efc9c82d]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color()
An interesting observation for rb_erase() is that when a node has
exactly one child, the node must be black and the child must be red.
An interesting consequence is that removing such a node can be done by
simply replacing it with its child and making the child black,
which we can do efficiently in rb_erase(). __rb_erase_color() then
only needs to handle the no-childs case and can be modified accordingly.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 46b6135a7402ac23c5b25f2bd79b03bab8f98278]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
When looking to fetch a node's sibling, we went through a sequence of:
- check if node is the parent's left child
- if it is, then fetch the parent's right child
This can be replaced with:
- fetch the parent's right child as an assumed sibling
- check that node is NOT the fetched child
This avoids fetching the parent's left child when node is actually
that child. Saves a bit on code size, though it doesn't seem to make
a large difference in speed.
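The replacement sequence is small enough to show in full. The node type and the function name here are hypothetical and carry only the two child links needed for the illustration.

```c
#include <stddef.h>

/* Minimal node with just the child links needed for the illustration. */
struct sib_node {
    struct sib_node *left, *right;
};

/*
 * Assume the right child is the sibling; only if that turns out to be
 * 'node' itself is the left child loaded instead.
 */
static struct sib_node *sib_of(const struct sib_node *parent,
                               const struct sib_node *node)
{
    struct sib_node *sibling = parent->right;

    if ( node == sibling )
        sibling = parent->left;

    return sibling;
}
```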
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 59633abf34e2f44b8e772a2c12a92132aa7c2220]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 3 Jan 2018 10:05:05 +0000 (11:05 +0100)]
simplify xenmem_add_to_physmap_batch()
There's no need for
- advancing the handles and at the same time using
__copy_{from,to}_guest_offset(),
- an "out" label,
- local variables "done" and (function scope) "rc".
To better reflect its resulting use also rename the function's "start"
parameter to "extent".
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 3 Jan 2018 10:04:26 +0000 (11:04 +0100)]
x86/E820: improve insn selection
..., largely to shrink code size a little:
- use TEST instead of CMP with zero immediate
- use MOVZWL instead of AND with 0xffff immediate
- compute final highmem_bk value in registers, accessing memory just
once
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 3 Jan 2018 10:03:56 +0000 (11:03 +0100)]
x86/E820: don't overrun array
The bounds check needs to be done after the increment, not before, or
else it needs to use a one lower immediate. Also use word operations
rather than byte ones for both the increment and the compare (allowing
E820_BIOS_MAX to be more easily bumped, should the need ever arise).
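The actual fix is in assembly, but the ordering issue is easy to see in C. In the sketch below, `TOY_E820_MAX` stands in for E820_BIOS_MAX and the structure and function names are hypothetical. The point is that the limit is tested after the counter is incremented, so the next store can never land past the end of the array.

```c
#include <stddef.h>

#define TOY_E820_MAX 4   /* hypothetical capacity; stands in for E820_BIOS_MAX */

struct toy_e820 {
    unsigned short nr;                     /* word-sized counter */
    unsigned long long map[TOY_E820_MAX];
};

/*
 * Append entries from 'src' until it is exhausted or the map is full.
 * Returns the number of entries stored.
 */
static unsigned short toy_e820_fill(struct toy_e820 *e,
                                    const unsigned long long *src, size_t n)
{
    size_t i = 0;

    e->nr = 0;
    while ( i < n )
    {
        e->map[e->nr] = src[i++];
        if ( ++e->nr >= TOY_E820_MAX )   /* check after incrementing */
            break;
    }

    return e->nr;
}
```

Checking before the increment with the same limit would allow one store too many, which is the overrun the commit describes.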
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 3 Jan 2018 10:02:10 +0000 (11:02 +0100)]
x86/Intel: drop another 32-bit leftover
None of the models for which MISC_ENABLE MSR access is excluded support
64-bit mode - drop the conditional from early_init_intel(). Also convert
pointless rdmsr_safe() elsewhere to rdmsrl().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 5 Oct 2016 11:42:15 +0000 (12:42 +0100)]
xen/x86: Replace appropriate mandatory barriers with SMP barriers
There is no functional change. Xen currently assigns smp_* meaning to
the non-smp_* barriers.
All of these uses are just to deal with shared memory between multiple
processors, which means that the smp_*() variants are the correct ones to use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
rbtree: low level optimizations in __rb_erase_color()
In __rb_erase_color(), we often already have pointers to the nodes being
rotated and/or know what their colors must be, so we can generate more
efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
functions.
Also when the current node is red or when flipping the sibling's color,
the parent is already known so we can use the more efficient
rb_set_parent_color() function to set the desired color.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 6280d2356fd8ad0936a63c10dc1e6accf48d0c61]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: optimize case selection logic in __rb_erase_color()
In __rb_erase_color(), we have to select one of 3 cases depending on the
color on the 'other' node children. If both children are black, we flip a
few node colors and iterate. Otherwise, we do either one or two tree
rotations, depending on the color of the 'other' child opposite to 'node',
and then we are done.
The corresponding logic had duplicate checks for the color of the 'other'
child opposite to 'node'. It was checking it first to determine if both
children are black, and then to determine how many tree rotations are
required. Rearrange the logic to avoid that extra check.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit e125d1471a4f8f1bf7ea9a83deb8d23cb40bd712]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: adjust node color in __rb_erase_color() only when necessary
In __rb_erase_color(), we were always setting a node to black after
exiting the main loop. And in one case, after fixing up the tree to
satisfy all rbtree invariants, we were setting the current node to root
just to guarantee a loop exit, at which point the root would be set to
black. However this is not necessary, as the root of an rbtree is already
known to be black. The only case where the color flip is required is when
we exit the loop due to the current node being red, and it's easiest to
just do the flip at that point instead of doing it after the loop.
[adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit d6ff1273928ebf15466a85b7e1810cd00e72998b]
Ported only rbtree.c to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: low level optimizations in rb_insert_color()
- Use the newly introduced rb_set_parent_color() function to flip the color
of nodes whose parent is already known.
- Optimize rb_parent() when the node is known to be red - there is no need
to mask out the color in that case.
- Flipping gparent's color to red requires us to fetch its rb_parent_color
field, so we can reuse it as the parent value for the next loop iteration.
- Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
rotations: we already have pointers to all relevant nodes, and know their
colors (either because we want to adjust it, or because we've tested it,
or we can deduce it as black due to the node proximity to a known red node).
So we can generate more efficient code by making use of the node pointers
we already have, and setting both the parent and color attributes for
nodes all at once. Also in Case 2, some node attributes don't have to
be set because we know another tree rotation (Case 3) will always follow
and override them.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 5bc9188aa207dafd47eab57df7c4fe5b3d3f636a]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: adjust root color in rb_insert_color() only when necessary
The root node of an rbtree must always be black. However,
rb_insert_color() only needs to maintain this invariant when it has been
broken - that is, when it exits the loop due to the current (red) node
being the root. In all other cases (exiting after tree rotations, or
exiting due to an existing black parent) the invariant is already
satisfied, so there is no need to adjust the root node color.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 6d58452dc066db61acdff7b84671db1b11a3de1c]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: break out of rb_insert_color loop after tree rotation
It is a well known property of rbtrees that insertion never requires more
than two tree rotations. In our implementation, after one loop iteration
identified one or two necessary tree rotations, we would iterate and look
for more. However at that point the node's parent would always be black,
which would cause us to exit the loop.
We can make the code flow more obvious by just adding a break statement
after the tree rotations, where we know we are done. Additionally, in the
cases where two tree rotations are necessary, we don't have to update the
'node' pointer as it wouldn't be used until the next loop iteration, which
we now avoid due to this break statement.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 1f0528653e41ec230c60f5738820e8a544731399]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
rbtree: move some implementation details from rbtree.h to rbtree.c
rbtree users must use the documented APIs to manipulate the tree
structure. Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit bf7ad8eeab995710c766df49c9c69a8592ca0216]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Empty nodes have no color. We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf37952 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.
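Since an empty node carries no color, the "empty" marker can be a plain self-pointer with no masking. The sketch below shows that idea with hypothetical `toy_` helpers in place of the RB_CLEAR_NODE and RB_EMPTY_NODE macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified node holding only the packed parent/color word. */
struct toy_rbn {
    uintptr_t parent_color;   /* parent pointer | color bit */
};

/* An empty node is marked by pointing its parent field at itself. */
static void toy_rb_clear_node(struct toy_rbn *n)
{
    n->parent_color = (uintptr_t)n;
}

/* No color masking needed: empty nodes have no color to strip. */
static bool toy_rb_empty_node(const struct toy_rbn *n)
{
    return n->parent_color == (uintptr_t)n;
}
```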
I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though. axboe introduced them in commit 10fd48f2376d
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.
One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups. This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.
[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4c199a93a2d36b277a9fd209a0f2793f8460a215]
Ported rbtree.h and rbtree.c changes which are relevant to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wolfram Strepp [Wed, 20 Dec 2017 17:00:49 +0000 (18:00 +0100)]
rbtree: remove redundant if()-condition in rb_erase()
Furthermore, notice that the initial checks:
    if (!node->rb_left)
        child = node->rb_right;
    else if (!node->rb_right)
        child = node->rb_left;
    else
    {
        ...
    }
guarantee that old->rb_right is set in the final else branch, therefore
we can omit checking that again.
Signed-off-by: Wolfram Strepp <wstrepp@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Linux commit 4b324126e0c6c3a5080ca3ec0981e8766ed6f1ee]
Ported to Xen.
Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 20 Dec 2017 11:52:15 +0000 (11:52 +0000)]
x86/nops: Switch to the P6 nops as a compile-time default
Along with c/s d7128e735031 switching the runtime choice of best nops, switch
the compile-time default to P6 nops. This is more efficient on most
processors for alternative points which add/remove code, rather than switch
between two different pieces of code.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 20 Dec 2017 09:12:11 +0000 (10:12 +0100)]
x86/shadow: make 1-bit-disable match 1-bit-enable
shadow_one_bit_enable() sets PG_SH_enable (if not already set of course)
in addition to the bit being requested. Make shadow_one_bit_disable()
behave similarly - clear PG_SH_enable if that's the only bit remaining.
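The intended symmetry can be sketched with plain bit operations. The bit values and function names below are hypothetical; the real PG_* definitions and the surrounding locking are in Xen's shadow code.

```c
/* Hypothetical mode bits; the real PG_* values are defined in Xen's headers. */
#define TOY_PG_SH_enable   (1u << 0)
#define TOY_PG_log_dirty   (1u << 1)
#define TOY_PG_refcounts   (1u << 2)

/* Enabling any mode bit also sets the common "shadow enabled" bit. */
static unsigned int toy_one_bit_enable(unsigned int mode, unsigned int bit)
{
    return mode | TOY_PG_SH_enable | bit;
}

/* Disabling clears the bit, then the enable bit if it is the only one left. */
static unsigned int toy_one_bit_disable(unsigned int mode, unsigned int bit)
{
    mode &= ~bit;
    if ( mode == TOY_PG_SH_enable )
        mode = 0;

    return mode;
}
```

With this pairing, enabling one feature and then disabling it returns the mode to zero instead of leaving a stray enable bit behind.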
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Wed, 20 Dec 2017 09:05:16 +0000 (10:05 +0100)]
x86/shadow: ignore sh_pin() failure in one more case
Following what we've already done in the XSA-250 fix, convert another
sh_pin() caller to no longer fail the higher level operation if pinning
fails, as pinning is a performance optimization only in those places.
Suggested-by: Tim Deegan <tim@xen.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Wed, 20 Dec 2017 09:04:16 +0000 (10:04 +0100)]
x86/shadow: drop further 32-bit relics
PV guests don't ever get shadowed in other than 4-level mode anymore;
commit 5a3ce8f85e ("x86/shadow: drop stray name tags from
sh_{guest_get,map}_eff_l1e()") didn't go quite far enough (and there's
a good chance that further cleanup opportunity exists, which I simply
didn't notice).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Wed, 20 Dec 2017 09:03:20 +0000 (10:03 +0100)]
x86: introduce NOP9 forms
Both Intel and AMD recommend an operand-size-override-prefixed long NOP
form for covering 9 bytes, so introduce this and use it in p6_nops[] to
allow further reducing the number of NOPs needed when covering larger
ranges.
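For reference, the 9-byte form is the usual 8-byte long NOP with an operand-size prefix in front. The sketch below lists the bytes and shows why a longer maximum NOP reduces the instruction count when padding a range. The names `toy_nop9` and `toy_nops_needed` are hypothetical, and the helper is only a ceiling division, not Xen's padding routine.

```c
#include <stddef.h>

/* Operand-size-prefixed long NOP covering 9 bytes: 66 0f 1f 84 00 <disp32>. */
static const unsigned char toy_nop9[9] = {
    0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* NOP instructions needed to pad 'len' bytes with NOPs of at most 'max' bytes. */
static size_t toy_nops_needed(size_t len, size_t max)
{
    return (len + max - 1) / max;
}
```

Padding 18 bytes takes three instructions when the longest NOP is 8 bytes, but only two once a 9-byte form is available.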
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Wed, 20 Dec 2017 09:00:16 +0000 (10:00 +0100)]
x86/dom0: remove is_pv_domain leftovers from the PV domain builder
Those were added when PVHv1 was sharing the domain builder with PV.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 20 Dec 2017 08:59:21 +0000 (09:59 +0100)]
x86/dom0: remove autotranslate leftovers
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>