xenbits.xensource.com Git - xen.git/log
5 years ago xen/grant_table: Rework the prototype of _set_status* for readability
Julien Grall [Mon, 29 Apr 2019 14:05:17 +0000 (15:05 +0100)]
xen/grant_table: Rework the prototype of _set_status* for readability

It is not clear from the parameter names whether domid and gt_version
correspond to the local or the remote domain, and a follow-up patch
will make this even more confusing.

So rename domid (resp. gt_version) to ldomid (resp. rgt_version). At
the same time, re-order the parameters to hopefully make the prototype
more readable.

This is part of XSA-295.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Add an isb() before reading CNTPCT_EL0 to prevent re-ordering
Julien Grall [Mon, 29 Apr 2019 14:05:16 +0000 (15:05 +0100)]
xen/arm: Add an isb() before reading CNTPCT_EL0 to prevent re-ordering

Per D8.2.1 in ARM DDI 0487C.a, "a read to CNTPCT_EL0 can occur
speculatively and out of order relative to other instructions executed
on the same PE."

Add an instruction barrier so that get_cycles() returns an accurate
cycle count when requested. Replace the other direct users of
CNTPCT_EL0 with calls to get_cycles().

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: mm: Protect Xen page-table update with a spinlock
Julien Grall [Mon, 18 Mar 2019 18:06:55 +0000 (18:06 +0000)]
xen/arm: mm: Protect Xen page-table update with a spinlock

The function create_xen_entries() may be called concurrently. For
instance, while the vmap allocation is protected by a spinlock, the
mapping is not.

The implementation of create_xen_entries() contains quite a few TOCTOU
races, such as when allocating the 3rd-level page-tables.

Thankfully, they are pretty hard to reach, as page-tables are allocated
once and never released. Yet it is possible, so protect the updates
with a spinlock to avoid corrupting the page-tables.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii.anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm32: mm: Avoid cleaning the cache for secondary CPUs page-tables
Julien Grall [Sun, 21 Apr 2019 18:53:12 +0000 (19:53 +0100)]
xen/arm32: mm: Avoid cleaning the cache for secondary CPUs page-tables

The page-table walker is configured by TCR_EL2 to use the same
shareability and cacheability as the access performed when updating the
page-tables. This means cleaning the cache for secondary CPUs runtime
page-tables is unnecessary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago argo: correctly report pending message length
Nicholas Tsirakis [Wed, 12 Jun 2019 12:34:45 +0000 (08:34 -0400)]
argo: correctly report pending message length

When a message is requeued in Xen's internal queue, the queue
entry contains the length of the message so that Xen knows to
send a VIRQ to the respective domain when enough space frees up
in the ring. Due to a small bug, however, Xen doesn't populate
the length of the message if a given write fails, so this length is
always reported as zero. This causes Xen to spuriously wake up
a domain even when the ring doesn't have enough space.

This patch makes sure that the msg len is properly reported by
populating it in the event of a write failure.

Signed-off-by: Nicholas Tsirakis <tsirakisn@ainfosec.com>
Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
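The fix can be sketched in miniature (all names and structures here are hypothetical stand-ins, not Xen's actual argo code): the requeue path records the message length, so the wakeup check has a real value to compare against free ring space.

```c
#include <assert.h>

/* Hypothetical pending-queue entry; Xen's real struct differs. */
struct pending_ent {
    unsigned int len;   /* space the sender is waiting for */
};

/* Before the fix, len was left at 0 on a failed write, so any free
 * space looked "enough" and the domain was woken spuriously. */
static void pending_requeue(struct pending_ent *ent, unsigned int msg_len)
{
    ent->len = msg_len;  /* the fix: always record the message length */
}

/* Wake the sender only when the ring can actually hold the message. */
static int should_notify(const struct pending_ent *ent, unsigned int space)
{
    return space >= ent->len;
}
```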
5 years ago xen/arm: mm: Flush the TLBs even if a mapping failed in create_xen_entries
Julien Grall [Mon, 18 Mar 2019 18:01:31 +0000 (18:01 +0000)]
xen/arm: mm: Flush the TLBs even if a mapping failed in create_xen_entries

At the moment, create_xen_entries will only flush the TLBs if the full
range has successfully been updated. This may leave stale entries in
the TLBs if we fail to update some entries.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: tlbflush: Rework TLB helpers
Julien Grall [Wed, 3 Apr 2019 22:53:23 +0000 (23:53 +0100)]
xen/arm: tlbflush: Rework TLB helpers

All the TLB helpers that invalidate all TLB entries use the same
pattern:
    DSB SY
    TLBI ...
    DSB SY
    ISB

This pattern follows the one recommended by the Arm ARM to ensure
visibility of updates to translation tables (see K11.5.2 in ARM DDI
0487D.b).

We have been a bit too eager in Xen and used system-wide DSBs when
they can be limited to the inner-shareable domain.

Furthermore, the first DSB can be restricted further to stores in the
inner-shareable domain, as it is only there to ensure visibility of the
updates to translation table walks.

Lastly, most of the TLB helpers lack documentation.

Rather than trying to update the helpers one by one, this patch
introduces a per-arch macro to generate the TLB helpers. This will make
it easier to update the helpers and their documentation in the future.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
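The macro-generation idea can be illustrated with a toy version in which the barrier and TLBI "instructions" simply append to a log so the generated sequence is visible (the real helpers emit inline assembly; all names here are made up):

```c
#include <assert.h>
#include <string.h>

/* Illustration only: each "instruction" appends to a log instead of
 * emitting assembly, so the generated sequence can be inspected. */
static char seq[128];
static void op(const char *s) { strcat(seq, s); strcat(seq, "; "); }

/* Hypothetical per-arch macro in the spirit of the patch: every helper
 * is generated with the dsb(ishst); tlbi; dsb(ish); isb pattern. */
#define TLB_HELPER(name, tlbi_op)                                   \
    static void name(void)                                          \
    {                                                               \
        op("dsb ishst"); /* make table updates visible to walker */ \
        op(tlbi_op);                                                \
        op("dsb ish");   /* wait for the TLBI to complete */        \
        op("isb");                                                  \
    }

TLB_HELPER(flush_xen_tlb_local, "tlbi alle2")
```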
5 years ago xen/arm: Gather all TLB flush helpers in tlbflush.h
Julien Grall [Thu, 4 Apr 2019 17:35:09 +0000 (18:35 +0100)]
xen/arm: Gather all TLB flush helpers in tlbflush.h

At the moment, TLB helpers are scattered in 2 headers: page.h (for
Xen TLB helpers) and tlbflush.h (for guest TLB helpers).

This patch gathers all of them in tlbflush.h. This will help to
uniformize and update the logic of the helpers in follow-up patches.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: page: Clarify the Xen TLBs helpers name
Julien Grall [Thu, 4 Apr 2019 17:26:51 +0000 (18:26 +0100)]
xen/arm: page: Clarify the Xen TLBs helpers name

Now that we dropped flush_xen_text_tlb_local(), we have only one set of
helpers acting on Xen TLBs. Their naming is quite confusing because the
TLB instructions used act on both data and instruction TLBs.

Take the opportunity to rework the documentation, which can be confusing
to read as it does not match the implementation. Note that the mention
of instruction cache maintenance has been removed, because modifying a
mapping does not require instruction cache maintenance.

Lastly, switch from unsigned long to vaddr_t, as the functions
technically deal with virtual addresses.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Don't boot Xen on platform using AIVIVT instruction caches
Julien Grall [Mon, 13 May 2019 15:02:18 +0000 (16:02 +0100)]
xen/arm: Don't boot Xen on platform using AIVIVT instruction caches

The AIVIVT is a type of instruction cache available on Armv7. This is
the only cache not implementing the IVIPT extension and therefore
requiring specific care.

To simplify maintenance requirements, Xen will not boot on platforms
using an AIVIVT cache.

This should not be an issue, because Xen Arm32 can only boot on a small
number of processors (see arch/arm/arm32/proc-v7.S), none of which
uses an AIVIVT cache.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago x86/boot: Drop vestigial support for pre-SIPI APICs
Andrew Cooper [Wed, 12 Jun 2019 10:28:05 +0000 (11:28 +0100)]
x86/boot: Drop vestigial support for pre-SIPI APICs

The current code in do_boot_cpu() makes a CMOS write (even in the case of an
FADT reduced hardware configuration) and two writes into the BDA for the
start_eip segment and offset.

BDA 0x67 and 0x69 hail from the days of DOS and the 286, when IBM put
together a fast way to return from Protected Mode back to Real Mode (via a
deliberate triple fault).  This vector, when set, redirects the early boot
logic back into OS control.

It is also used by early MP systems, before the Startup IPI message became
standard, which in practice was before Local APICs became integrated into CPU
cores.

Support for non-integrated APICs was dropped in c/s 7b0007af "xen/x86: Remove
APIC_INTEGRATED() checks" because there are no 64-bit capable systems without
them.  Therefore, drop smpboot_{setup,restore}_warm_reset_vector().

Dropping smpboot_setup_warm_reset_vector() also lets us drop
TRAMPOLINE_{HIGH,LOW}, which lets us drop mach_wakecpu.h entirely.  The final
function in smpboot_hooks.h is smpboot_setup_io_apic() and has a single
caller, so expand it inline and delete smpboot_hooks.h as well.

This removes all reliance on CMOS and the BDA from the AP boot path, which is
especially of interest on reduced_hardware boots and EFI systems.

This was discovered while investigating Xen's use of the BDA during kexec.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pv: Add Hygon Dhyana support to emulate MSRs access
Pu Wen [Wed, 12 Jun 2019 12:54:25 +0000 (20:54 +0800)]
x86/pv: Add Hygon Dhyana support to emulate MSRs access

The Hygon Dhyana CPU supports lots of MSRs (such as the perf event select
and counter MSRs, the hardware configuration MSR, the MMIO configuration
base address MSR, and the MPERF/APERF MSRs) as AMD CPUs do, so add Hygon
Dhyana support to the PV emulation infrastructure by using the AMD code path.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/acpi: Add Hygon Dhyana support
Pu Wen [Wed, 12 Jun 2019 12:53:58 +0000 (20:53 +0800)]
x86/acpi: Add Hygon Dhyana support

Add Hygon Dhyana support to the acpi cpufreq and cpuidle subsystems by
using the code path of AMD.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/sched: let sched_switch_sched() return new lock address
Juergen Gross [Tue, 28 May 2019 10:32:16 +0000 (12:32 +0200)]
xen/sched: let sched_switch_sched() return new lock address

Instead of setting the scheduler percpu lock address in each of the
switch_sched instances of the different schedulers do that in
schedule_cpu_switch() which is the single caller of that function.
For that purpose let sched_switch_sched() just return the new lock
address.

This also allows setting the new struct scheduler and struct
schedule_data values in the percpu area in schedule_cpu_switch()
instead of in the schedulers.

It should be noted that in credit2 the lock used to be set while still
holding the global scheduler write lock, which will no longer be true
with the new scheme applied. This is actually no problem, as the write
lock is meant to guard the call of init_pdata(), which is still the case.

While there, turn the full barrier, which was overkill, into an
smp_wmb(), matching the one implicit in taking the lock.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
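A minimal sketch of the new calling convention, with hypothetical stub types (Xen's real code deals in spinlock_t and per-cpu schedule_data, and smp_wmb() is a real barrier, modeled here as a no-op):

```c
#include <assert.h>

/* Stub for illustration: the real smp_wmb() is a store barrier. */
#define smp_wmb() do { } while (0)

struct sched_lock { int dummy; };

/* Hypothetical scheduler hook: it does its per-cpu setup and simply
 * returns the lock the cpu should use from now on. */
static struct sched_lock credit_lock;
static struct sched_lock *sched_switch_sched(unsigned int cpu)
{
    (void)cpu;
    return &credit_lock;
}

/* The single caller now installs the lock pointer itself. */
static struct sched_lock *per_cpu_schedule_lock[4];
static void schedule_cpu_switch(unsigned int cpu)
{
    struct sched_lock *lock = sched_switch_sched(cpu);

    smp_wmb(); /* make prior init visible before publishing the lock */
    per_cpu_schedule_lock[cpu] = lock;
}
```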
5 years ago schedule: move credit scheduler specific member to its privates
Andrii Anisov [Wed, 12 Jun 2019 09:35:50 +0000 (12:35 +0300)]
schedule: move credit scheduler specific member to its privates

The vcpu structure member last_run_time is used by credit scheduler only.
In order to get better encapsulation, it is moved from a generic
structure to the credit scheduler private vcpu definition. Also, rename
the member to last_sched_time in order to reflect that it is the time
when the vcpu went through the scheduling path.

With this move we have slight changes in functionality:
 - last_sched_time is not updated for an idle vcpu. But the idle vcpu is,
   in fact, a per-pcpu stub and never migrates so last_sched_time is
   meaningless for it.
 - The value of last_sched_time is updated on every schedule, even if the
   vcpu is not being changed. It is still ok, because last_sched_time is
   only used for runnable vcpu migration decision, and we have it correct
   at that moment. Scheduling parameters and statistics are tracked by
   other entities.

Reducing code and data usage when not running credit scheduler is another
nice side effect.

While here, also:
  - turn last_sched_time into s_time_t, which is more appropriate.
  - properly const-ify related argument of __csched_vcpu_is_cache_hot().

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago argo: warn sendv() caller when ring is full
Nicholas Tsirakis [Tue, 11 Jun 2019 17:11:24 +0000 (13:11 -0400)]
argo: warn sendv() caller when ring is full

In its current state, if the destination ring is full, sendv()
will requeue the message and return the rc of pending_requeue(),
which will return 0 on success. This prevents the caller from
distinguishing between a successful write and a message that needs to
be resent at a later time.

Instead, capture the -EAGAIN value returned from ringbuf_insert()
and *only* overwrite it if the rc of pending_requeue() is non-zero.
This allows the caller to make intelligent decisions on -EAGAIN and
still be alerted if the pending message fails to requeue.

Signed-off-by: Nicholas Tsirakis <tsirakisn@ainfosec.com>
Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
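The error-propagation logic can be sketched as follows (ringbuf_insert() and pending_requeue() are reduced to toy stubs; only the rc handling mirrors the description above):

```c
#include <assert.h>

#define EAGAIN 11

/* Toy stand-ins for Xen's argo internals (hypothetical). */
static int ring_full;
static int requeue_fails;

static int ringbuf_insert(void)   { return ring_full ? -EAGAIN : 0; }
static int pending_requeue(void)  { return requeue_fails ? -12 : 0; }

/* The fix: keep the -EAGAIN from ringbuf_insert() and only overwrite
 * it if requeueing the message itself fails. */
static int sendv(void)
{
    int ret = ringbuf_insert();

    if ( ret == -EAGAIN )
    {
        int rc = pending_requeue();

        if ( rc )
            ret = rc;   /* requeue failed: report that error instead */
    }
    return ret;
}
```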
5 years ago xen/sched: add inline wrappers for calling per-scheduler functions
Juergen Gross [Tue, 28 May 2019 10:32:15 +0000 (12:32 +0200)]
xen/sched: add inline wrappers for calling per-scheduler functions

Instead of using the SCHED_OP() macro to call the different scheduler
specific functions add inline wrappers for that purpose.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
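As an illustration of the idea (a one-hook toy struct scheduler, not Xen's real one), a typed inline wrapper replaces the macro call and reads naturally at call sites:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified struct scheduler with a single hook, for illustration. */
struct scheduler {
    int (*wake)(struct scheduler *ops, int vcpu);
};

/* Instead of a SCHED_OP(ops, wake, v)-style macro, a typed inline
 * wrapper: type-checked arguments and a guard for an absent hook. */
static inline int sched_wake(struct scheduler *ops, int vcpu)
{
    return ops->wake ? ops->wake(ops, vcpu) : 0;
}

/* A sample hook implementation. */
static int my_wake(struct scheduler *ops, int vcpu)
{
    (void)ops;
    return vcpu + 1;
}
```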
5 years ago xen/sched: only allow schedulers with all mandatory functions available
Juergen Gross [Tue, 28 May 2019 10:32:14 +0000 (12:32 +0200)]
xen/sched: only allow schedulers with all mandatory functions available

Some functions of struct scheduler are mandatory. Check in the
scheduler initialization loop that they are present, and drop
schedulers that do not comply.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
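A sketch of such a presence check, with an illustrative two-hook struct (the actual set of mandatory hooks is defined by Xen, not by this example):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative struct scheduler; the real one has many more hooks. */
struct scheduler {
    const char *name;
    void (*init)(void);
    void (*do_schedule)(void);
};

static void noop(void) { }

/* Registration-time check: a scheduler missing any mandatory hook is
 * dropped from the list of usable schedulers. */
static int scheduler_usable(const struct scheduler *ops)
{
    return ops->init != NULL && ops->do_schedule != NULL;
}
```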
5 years ago automation: Fix CI with the fedora container
Andrew Cooper [Tue, 11 Jun 2019 10:09:06 +0000 (11:09 +0100)]
automation: Fix CI with the fedora container

A recent rebuild of the CI containers switched from Fedora 29 to 30 because
the dockerfile is targeting latest.

Unfortunately, the version of iPXE in master doesn't build with the default
GCC in Fedora 30, which is blocking all CI activity.

Switch from latest to an explicit version, to avoid future breakage.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago x86/AMD: make use of CPUID leaf 0xb when available
Jan Beulich [Tue, 11 Jun 2019 15:21:34 +0000 (17:21 +0200)]
x86/AMD: make use of CPUID leaf 0xb when available

Initially I did simply stumble across a backport of Linux commit
e0ceeae708 ("x86/CPU/hygon: Fix phys_proc_id calculation logic for
multi-die processors") to our kernels. There I got puzzled by the claim
that a similar change isn't needed on the AMD side. As per the web page
cited [1], there aren't supposed to be affected AMD processors, but
according to my reading there are: The EPYC 7000 series comes with 8,
16, 24, or 32 cores, which I take to be 1-, 2-, 3-, or 4-die processors.
And many of them have "1P/2P" in the "socket count" column. Therefore
our calculation, being based on CPUID.80000008.EBX[15:12], would be
similarly wrong on such 2-socket 1- or 2-die systems.

Checking Linux code I then found that they don't even rely on the
calculation we currently use anymore, at least not in the case when
leaf 0xb is available (which is the case on Fam17). Let's follow
Suravee's Linux commit 3986a0a805 ("x86/CPU/AMD: Derive CPU topology
from CPUID function 0xB when available") in this regard to address this.

To avoid logging duplicate information, make the function return bool.
Move its and detect_ht()'s declaration to a private header at the same
time.

[1] https://www.amd.com/en/products/specifications/processors

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
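For reference, a sketch of the leaf 0xb decoding this relies on (the bit positions follow the x86 CPUID extended-topology definition; real code executes CPUID and walks the sub-leaves):

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0xb reports, per sub-leaf, the number of bits to shift
 * the x2APIC ID to reach the next topology level (EAX[4:0]) and the
 * level type (ECX[15:8]: 1 = SMT, 2 = core).  Decoding only. */
static unsigned int level_shift(uint32_t eax) { return eax & 0x1f; }
static unsigned int level_type(uint32_t ecx)  { return (ecx >> 8) & 0xff; }
```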
5 years ago pci: make PCI_SBDF3 return a pci_sbdf_t
Roger Pau Monné [Tue, 11 Jun 2019 15:20:31 +0000 (17:20 +0200)]
pci: make PCI_SBDF3 return a pci_sbdf_t

And adjust its users.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago pci: make PCI_SBDF2 return a pci_sbdf_t
Roger Pau Monné [Tue, 11 Jun 2019 15:19:06 +0000 (17:19 +0200)]
pci: make PCI_SBDF2 return a pci_sbdf_t

And adjust its only user.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago pci: make PCI_SBDF return a pci_sbdf_t
Roger Pau Monné [Tue, 11 Jun 2019 15:18:22 +0000 (17:18 +0200)]
pci: make PCI_SBDF return a pci_sbdf_t

And adjust its only user.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago pci: introduce a pci_sbdf_t field to pci_dev
Roger Pau Monné [Tue, 11 Jun 2019 15:17:38 +0000 (17:17 +0200)]
pci: introduce a pci_sbdf_t field to pci_dev

And use a union with the current seg, bus and devfn fields to make
the fields point to the same underlying data.

No functional change.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
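A rough sketch of such a union (the exact field layout is an assumption here, and bitfield ordering is compiler/ABI-dependent; shown for a typical little-endian ABI): devfn overlays the dev and fn bitfields, and sbdf overlays the whole seg/bus/devfn triplet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pci_sbdf_t-style union; not Xen's exact definition. */
typedef union {
    uint32_t sbdf;              /* the whole segment/bus/devfn value */
    struct {
        union {
            struct {
                uint8_t fn  : 3;  /* function, low 3 bits of devfn */
                uint8_t dev : 5;  /* device,  high 5 bits of devfn */
            };
            uint8_t devfn;
        };
        uint8_t  bus;
        uint16_t seg;
    };
} pci_sbdf_t;
```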
5 years ago pci: introduce a devfn field to pci_sbdf_t
Roger Pau Monné [Tue, 11 Jun 2019 15:16:59 +0000 (17:16 +0200)]
pci: introduce a devfn field to pci_sbdf_t

This is equivalent to the current extfunc field in terms of contents.

Switch the two current users of extfunc to use devfn instead for
correctness.

No functional change.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago pci: rename func field to fn
Roger Pau Monné [Tue, 11 Jun 2019 15:16:19 +0000 (17:16 +0200)]
pci: rename func field to fn

In preparation for adding a devfn field. This makes the naming more
consistent, as the devfn field encloses both the dev and the fn
fields.

No functional change intended.

Requested-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86emul: support AVX512DQ floating point manipulation insns
Jan Beulich [Tue, 11 Jun 2019 15:15:24 +0000 (17:15 +0200)]
x86emul: support AVX512DQ floating point manipulation insns

This completes support of AVX512DQ in the insn emulator.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: support AVX512F floating point manipulation insns
Jan Beulich [Tue, 11 Jun 2019 15:13:36 +0000 (17:13 +0200)]
x86emul: support AVX512F floating point manipulation insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago iommu/arm: add missing return
Oleksandr Tyshchenko [Thu, 30 May 2019 12:02:28 +0000 (15:02 +0300)]
iommu/arm: add missing return

In case iommu_ops has already been set, we should not update it.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: vtimer: Change the return value to void for virt_timer_[save|restore]
Baodong Chen [Mon, 10 Jun 2019 05:07:54 +0000 (13:07 +0800)]
xen/arm: vtimer: Change the return value to void for virt_timer_[save|restore]

virt_timer_{save, restore} always return 0, and none of the callers
actually checks it. So change the return type to void.

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
[julien: Rework the commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: domain: Remove redundant memset for v->arch.saved_context
Baodong Chen [Mon, 10 Jun 2019 05:15:47 +0000 (13:15 +0800)]
xen/arm: domain: Remove redundant memset for v->arch.saved_context

arch.saved_context is already zeroed in alloc_vcpu_struct() by
clear_page(), so there is no need to memset it again in
arch_vcpu_create().

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
[julien: Rework the commit message]
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years ago automation: Add an 'all' target for container maintenance
Andrew Cooper [Mon, 10 Jun 2019 17:52:04 +0000 (18:52 +0100)]
automation: Add an 'all' target for container maintenance

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago automation: add clang and lld 8 tests to gitlab
Roger Pau Monne [Mon, 10 Jun 2019 16:32:46 +0000 (18:32 +0200)]
automation: add clang and lld 8 tests to gitlab

Using clang and lld 8 requires installing the packages from the
official llvm apt repositories, so modify the Debian Docker files for
stretch and unstable to add the llvm repo and install clang and lld
from it.

Also add some jobs to test building Xen with clang 8 and lld.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago xen/arm: gic: Defer the decision to unmask interrupts to do_{LPI, IRQ}()
Andrii Anisov [Mon, 27 May 2019 09:29:30 +0000 (12:29 +0300)]
xen/arm: gic: Defer the decision to unmask interrupts to do_{LPI, IRQ}()

At the moment, interrupts are unmasked by gic_interrupt() before
calling do_{IRQ, LPI}(). In the case of handling an interrupt routed
to guests, its priority will be dropped, via desc->handler->end()
called from do_irq(), with interrupt unmasked.

In other words:
    - Until the priority is dropped, only higher priority interrupt
    can be received. Today, only Xen interrupts have higher priority.
    - As soon as priority is dropped, any interrupt can be received.

This means the purpose of the loop in gic_interrupt() is defeated as
all interrupts may get trapped earlier. To reinstate the purpose of
the loop (and prevent the trap), interrupts should be masked when
dropping the priority.

For interrupts routed to Xen, the priority will always be dropped with
interrupts masked, so the issue is not present. However, it means that
we pointlessly try to mask the interrupts.

To avoid conflicting behavior between the two interrupt handling paths,
gic_interrupt() now keeps interrupts masked and defers the decision
to do_{LPI, IRQ}().

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
[julien: Reword the commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/device-tree: Add ability to handle nodes with interrupts-extended prop
Oleksandr Tyshchenko [Tue, 21 May 2019 17:37:34 +0000 (20:37 +0300)]
xen/device-tree: Add ability to handle nodes with interrupts-extended prop

The "interrupts-extended" property is a special form for use when
a node needs to reference multiple interrupt parents, according to:
Linux/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt

But there are cases where the "interrupts-extended" property is used
outside the /soc node with a single interrupt parent, as an equivalent
of the "interrupt-parent" + "interrupts" pair.

A good example here is the ARCH timer node for the R-Car Gen3/Gen2
family, which is a mandatory device for using Xen on Arm. Without the
ability to handle such nodes, Xen fails to operate.

So this patch adds the support required for Xen to handle nodes with
that property.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
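As an illustration (the values are made up, not the actual R-Car binding), both forms below describe the same single-parent interrupt wiring:

```dts
timer {
        compatible = "arm,armv8-timer";

        /* Classic pair: parent named once, interrupts listed apart. */
        interrupt-parent = <&gic>;
        interrupts = <1 13 0xf08>;

        /* Equivalent extended form: parent given per entry. */
        interrupts-extended = <&gic 1 13 0xf08>;
};
```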
5 years ago xen/device-tree: Add dt_count_phandle_with_args helper
Oleksandr Tyshchenko [Tue, 21 May 2019 17:37:33 +0000 (20:37 +0300)]
xen/device-tree: Add dt_count_phandle_with_args helper

Port the Linux helper of_count_phandle_with_args for counting the
number of phandles in a property.

Please note, this helper is ported from Linux v4.6.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: Rework secondary_start prototype
Julien Grall [Thu, 11 Apr 2019 20:28:50 +0000 (21:28 +0100)]
xen/arm: Rework secondary_start prototype

None of the parameters of secondary_start are actually used, so turn
secondary_start into a function with no parameters.

Also modify the assembly code to avoid setting-up the registers before
calling start_secondary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrii Anisov <andrii.anisov@epam.com>
5 years ago xen/arm: Remove parameter cpuid from start_xen
Julien Grall [Thu, 11 Apr 2019 20:35:25 +0000 (21:35 +0100)]
xen/arm: Remove parameter cpuid from start_xen

The parameter cpuid is not used by start_xen. So remove it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii.anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago x86: Fix boot with CONFIG_XSM enabled following c/s 7177f589ba
Andrew Cooper [Fri, 7 Jun 2019 11:56:48 +0000 (12:56 +0100)]
x86: Fix boot with CONFIG_XSM enabled following c/s 7177f589ba

Currently, booting staging fails with:

  (XEN) Using APIC driver default
  (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d08038f66e>] __x86_indirect_thunk_rax+0xe/0x10
  (XEN) RFLAGS: 0000000000010016   CONTEXT: hypervisor
  (XEN) rax: c2c2c2c2c2c2c2c2   rbx: ffff83003f4cc000   rcx: 0000000000000000
  <snip>
  (XEN) Xen code around <ffff82d08038f66e> (__x86_indirect_thunk_rax+0xe/0x10):
  (XEN)  ae e8 eb fb 48 89 04 24 <c3> 90 e8 05 00 00 00 0f ae e8 eb fb 48 89 0c 24
  (XEN) Xen stack trace from rsp=ffff82d080827d28:
  (XEN)    c2c2c2c2c2c2c2c2 ffff82d080207588 ffff82d080827d68 0000000000000000
  <snip>
  (XEN) Xen call trace:
  (XEN)    [<ffff82d08038f66e>] __x86_indirect_thunk_rax+0xe/0x10
  (XEN)    [<ffff82d0806078a9>] setup_system_domains+0x18/0xab
  (XEN)    [<ffff82d08062d9c8>] __start_xen+0x1ea9/0x2935
  (XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x55
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) GENERAL PROTECTION FAULT
  (XEN) [error_code=0000]
  (XEN) ****************************************

UBSAN (which I happened to have active in my build at the time) identifies the
problem explicitly:

  (XEN) Using APIC driver default
  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in /local/xen.git/xen/include/xsm/xsm.h:309:19
  (XEN) member access within null pointer of type 'struct xsm_operations'
  (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----

"adjust system domain creation (and call it earlier on x86)" didn't account
for the fact that domain_create() depends on XSM already being set up.

Therefore, domain_create() follows xsm_ops->alloc_security_domain() which is
offset 0 from a NULL pointer, meaning that we execute the 16bit IVT until
happening to explode in __x86_indirect_thunk_rax().

There is nothing very interesting that xsm_multiboot_init() does more than
allocating memory, which means that it is safe to move earlier during setup.

The resulting boot now looks like:

  (XEN) Using APIC driver default
  (XEN) XSM Framework v1.0.0 initialized
  (XEN) Flask: 128 avtab hash slots, 283 rules.
  (XEN) Flask: 128 avtab hash slots, 283 rules.
  (XEN) Flask:  4 users, 3 roles, 38 types, 2 bools
  (XEN) Flask:  13 classes, 283 rules
  (XEN) Flask:  Starting in enforcing mode.
  (XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]

and

  (XEN) Using APIC driver default
  (XEN) XSM Framework v1.0.0 initialized
  (XEN) Initialising XSM SILO mode
  (XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/pv: Fix undefined behaviour in check_descriptor()
Andrew Cooper [Thu, 6 Jun 2019 14:44:21 +0000 (15:44 +0100)]
x86/pv: Fix undefined behaviour in check_descriptor()

UBSAN reports:

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in x86_64/mm.c:1108:31
  (XEN) left shift of 255 by 24 places cannot be represented in type 'int'
  (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Tainted:    H ]----
  (XEN) CPU:    60
  (XEN) RIP:    e008:[<ffff82d0802a54ce>] ubsan.c#ubsan_epilogue+0xa/0xc2
  <snip>
  (XEN) Xen call trace:
  (XEN)    [<ffff82d0802a54ce>] ubsan.c#ubsan_epilogue+0xa/0xc2
  (XEN)    [<ffff82d0802a6009>] __ubsan_handle_shift_out_of_bounds+0x15d/0x16c
  (XEN)    [<ffff82d08033abd7>] check_descriptor+0x191/0x3dd
  (XEN)    [<ffff82d0804ef920>] do_update_descriptor+0x7f/0x2b6
  (XEN)    [<ffff82d0804efb75>] compat_update_descriptor+0x1e/0x20
  (XEN)    [<ffff82d0804fa1cc>] pv_hypercall+0x87f/0xa6f
  (XEN)    [<ffff82d080501acb>] do_entry_int82+0x53/0x58
  (XEN)    [<ffff82d08050702b>] entry_int82+0xbb/0xc0
  (XEN)
  (XEN) ================================================================================

As this is a constant, express it in longhand for correctness, and consistency
with the surrounding code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
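The issue in miniature: 255 has type int, so 255 << 24 overflows int and is undefined behaviour, while the longhand unsigned constant is well defined. A sketch, not the actual Xen code:

```c
#include <assert.h>
#include <stdint.h>

/* Undefined:    (255 << 24)  -- shifts a positive int past its range.
 * Well defined: the longhand constant, as the patch uses. */
#define BASE_HI_MASK 0xff000000u

/* Keep only the high byte of a 32-bit value, e.g. a base address. */
static uint32_t keep_base_hi(uint32_t b)
{
    return b & BASE_HI_MASK;
}
```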
5 years ago x86/irq: Fix undefined behaviour in irq_move_cleanup_interrupt()
Andrew Cooper [Thu, 6 Jun 2019 14:26:17 +0000 (15:26 +0100)]
x86/irq: Fix undefined behaviour in irq_move_cleanup_interrupt()

UBSAN reports:

  (XEN) ================================================================================
  (XEN) UBSAN: Undefined behaviour in irq.c:682:22
  (XEN) left shift of 1 by 31 places cannot be represented in type 'int'
  (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----
  (XEN) CPU:    16
  (XEN) RIP:    e008:[<ffff82d0802a54ce>] ubsan.c#ubsan_epilogue+0xa/0xc2
  <snip>
  (XEN) Xen call trace:
  (XEN)    [<ffff82d0802a54ce>] ubsan.c#ubsan_epilogue+0xa/0xc2
  (XEN)    [<ffff82d0802a6009>] __ubsan_handle_shift_out_of_bounds+0x15d/0x16c
  (XEN)    [<ffff82d08031ae77>] irq_move_cleanup_interrupt+0x25c/0x4a0
  (XEN)    [<ffff82d08031b585>] do_IRQ+0x19d/0x104c
  (XEN)    [<ffff82d08050c8ba>] common_interrupt+0x10a/0x120
  (XEN)    [<ffff82d0803b13a6>] cpu_idle.c#acpi_idle_do_entry+0x1de/0x24b
  (XEN)    [<ffff82d0803b1d83>] cpu_idle.c#acpi_processor_idle+0x5c8/0x94e
  (XEN)    [<ffff82d0802fa8d6>] domain.c#idle_loop+0xee/0x101
  (XEN)
  (XEN) ================================================================================

Switch to an unsigned shift, and correct the surrounding style.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
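The same class of bug in miniature: 1 << 31 is undefined for a 32-bit int, while an unsigned operand gives the intended mask. A sketch of the pattern, not Xen's code:

```c
#include <assert.h>
#include <stdint.h>

/* 1 << 31 would shift into the sign bit of int (undefined behaviour);
 * using an unsigned literal makes the shift well defined. */
static uint32_t bit_mask(unsigned int bit)
{
    return 1u << bit;
}
```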
5 years ago xen/arm: Pair call to set_fixmap with call to clear_fixmap in copy_from_paddr
Julien Grall [Sun, 2 Dec 2018 19:12:54 +0000 (19:12 +0000)]
xen/arm: Pair call to set_fixmap with call to clear_fixmap in copy_from_paddr

At the moment, set_fixmap may replace a valid entry without following
the break-before-make sequence. This may result in a TLB conflict abort.

Rather than dealing with break-before-make in set_fixmap, each call to
set_fixmap in copy_from_paddr is paired with a call to clear_fixmap.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: mm: Check start is always before end in {destroy, modify}_xen_mappings
Julien Grall [Wed, 3 Apr 2019 10:14:10 +0000 (11:14 +0100)]
xen/arm: mm: Check start is always before end in {destroy, modify}_xen_mappings

The two helpers {destroy, modify}_xen_mappings don't check that the
start is always before the end. This should never happen, but if it
does, it will result in unexpected behavior.

Catch such issues earlier on by adding an ASSERT in destroy_xen_mappings
and modify_xen_mappings.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: mm: Initialize page-tables earlier
Julien Grall [Fri, 5 Apr 2019 17:47:47 +0000 (18:47 +0100)]
xen/arm: mm: Initialize page-tables earlier

Since commit f60658c6ae "xen/arm: Stop relocating Xen", the function
setup_page_tables() does not require any information from the FDT.

So the initialization of the page-tables can be done much earlier in the
boot process. The earliest setup_page_tables() can be called is after
traps have been initialized, so we can get a backtrace if an error
occurs.

Moving the initialization of the page-tables earlier also avoids the
dance to map the FDT again in the new set of page-tables.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: mm: Introduce DEFINE_PAGE_TABLE{,S} and use it
Julien Grall [Sun, 14 Apr 2019 19:59:28 +0000 (20:59 +0100)]
xen/arm: mm: Introduce DEFINE_PAGE_TABLE{,S} and use it

We have multiple static page-tables defined in arch/arm/mm.c. The
current way to define them is difficult to read and does not help when
making modifications.

Two new helpers DEFINE_PAGE_TABLES (to define multiple page-tables) and
DEFINE_PAGE_TABLE (alias of DEFINE_PAGE_TABLES(..., 1)) are introduced
and now used to define static page-tables.

Note that DEFINE_PAGE_TABLES() alignment differs from what is currently
used for allocating page-tables. This is fine because page-tables are
only required to be aligned to a page-size.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm32: mm: Avoid to zero and clean cache for CPU0 domheap
Julien Grall [Sun, 7 Apr 2019 19:59:22 +0000 (20:59 +0100)]
xen/arm32: mm: Avoid to zero and clean cache for CPU0 domheap

The page-table walker is configured to use the same shareability and
cacheability as the access performed when updating the page-tables. This
means cleaning the cache for CPU0 domheap is unnecessary.

Furthermore, CPU0 page-tables are part of the Xen binary and will
already be zeroed before being used. So it is pointless to zero the
domheap again.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm32: head: Always zero r3 before update a page-table entry
Julien Grall [Mon, 15 Apr 2019 14:37:12 +0000 (15:37 +0100)]
xen/arm32: head: Always zero r3 before update a page-table entry

The boot code is using r2 and r3 to hold the page-table entry value.
While r2 is always updated before storing the value, this is not always
the case for r3.

Thankfully today, r3 will always be zero when we care. But this is
difficult to track and error-prone.

So always zero r3 within the few instructions before writing the
page-table entry.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii.anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm32: head: Don't set MAIR0 and MAIR1
Julien Grall [Sat, 13 Apr 2019 16:00:06 +0000 (17:00 +0100)]
xen/arm32: head: Don't set MAIR0 and MAIR1

The co-processor registers MAIR0 and MAIR1 are managed by EL1, so there
is no need to initialize them during Xen boot.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm32: head: Correctly report the HW CPU ID
Julien Grall [Mon, 15 Apr 2019 12:39:01 +0000 (13:39 +0100)]
xen/arm32: head: Correctly report the HW CPU ID

There is no reason to assume the HW CPU ID will be 0 when the
processor is part of a uniprocessor system. At best, this will result in
conflicting output, as the rest of Xen uses the value directly read from
MPIDR.

So remove the zeroing and logic to check if the CPU is part of a
uniprocessor system.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: p2m: configure stage-2 page table to support upto 42-bit PA systems
Vishnu Pajjuri OS [Thu, 30 May 2019 07:59:46 +0000 (07:59 +0000)]
xen/arm: p2m: configure stage-2 page table to support upto 42-bit PA systems

At the moment, on platforms supporting 42-bit PA, Xen will only expose
40 bits worth of IPA to all domains.

The limitation was to prevent allocating too much memory for the root
page tables, as those platforms only support 3-level page-tables. At the
time, this was deemed acceptable because none of the platforms had
addresses wired above 40 bits.

However, newer platforms take advantage of the full address space. This
will break Dom0 boot, as it can't access anything above 40 bits.

The only way to support 42-bit IPA is to allocate 8 pages for the root
page-tables. This is a bit of a waste of memory, as Xen does not offer
per-guest stage-2 configuration, but it is considered acceptable as
current platforms supporting 42-bit PA have a lot of memory.

In the future, we may want to consider per-guest stage-2 configuration
to reduce the waste.

Signed-off-by: Feng Kan <fengkan@os.amperecomputing.com>
Signed-off-by: Vishnu <vishnu@os.amperecomputing.com>
[julien: rework commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoArm64: further speed-up to hweight{32,64}()
Jan Beulich [Fri, 31 May 2019 09:53:39 +0000 (03:53 -0600)]
Arm64: further speed-up to hweight{32,64}()

According to Linux commit e75bef2a4f ("arm64: Select
ARCH_HAS_FAST_MULTIPLIER") this is a further improvement over the
variant using only bitwise operations on at least some hardware, and no
worse on others.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
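The speed-up comes from collapsing the final byte-wise additions of the classic population count into one multiply. A minimal sketch of the technique (illustrative, not Xen's exact implementation):

```c
#include <assert.h>
#include <stdint.h>

/* SWAR popcount: on hardware with a fast multiplier, the last step sums
 * the four per-byte counts via w * 0x01010101 and takes the top byte,
 * replacing a chain of shifts and adds. */
static unsigned int hweight32_mul(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                        /* 2-bit counts */
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);  /* 4-bit counts */
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;                  /* byte counts  */
    return (w * 0x01010101u) >> 24;                     /* sum of bytes */
}
```

On cores without a fast multiplier, the shift-and-add tail can be cheaper, which is why this is gated on ARCH_HAS_FAST_MULTIPLIER in Linux.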
5 years agoxen: actually skip the first MAX_ORDER bits in pfn_pdx_hole_setup
Stefano Stabellini [Mon, 3 Jun 2019 22:02:44 +0000 (15:02 -0700)]
xen: actually skip the first MAX_ORDER bits in pfn_pdx_hole_setup

pfn_pdx_hole_setup is meant to skip the first MAX_ORDER bits, but
actually it only skips the first MAX_ORDER-1 bits. The issue was
probably introduced by bdb5439c3f ("x86_64: Ensure frame-table
compression leaves MAX_ORDER aligned"): when changing the loop to start
from MAX_ORDER-1, an adjustment by 1 was needed in the call to
find_next_bit() but was not done.

Fix the issue by passing j+1 and i+1 to find_next_zero_bit and
find_next_bit. Also add a check for i >= BITS_PER_LONG because
find_{,next_}zero_bit() are free to assume that their last argument is
less than their middle one.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
CC: andrew.cooper3@citrix.com
CC: JBeulich@suse.com
CC: George.Dunlap@eu.citrix.com
CC: ian.jackson@eu.citrix.com
CC: konrad.wilk@oracle.com
CC: tim@xen.org
CC: wei.liu2@citrix.com
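The off-by-one follows from find_next_bit()'s semantics: a scan starting at offset j considers bit j itself, so skipping j bits requires a start offset of j+1. A simplified model (illustrative only, not Xen's bitop implementation):

```c
#include <assert.h>

/* Simplified model of find_next_bit(): index of the first set bit at or
 * above 'start', or 'size' if none is found. */
static unsigned int find_next_bit_model(unsigned long mask,
                                        unsigned int size,
                                        unsigned int start)
{
    unsigned int i;

    for ( i = start; i < size; i++ )
        if ( mask & (1ul << i) )
            return i;
    return size;
}
```

Starting the scan at j still finds bit j; only a start of j+1 genuinely skips the first j+1 bits, which is why the fix passes j+1 and i+1.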
5 years agoxen/arm: fix nr_pdxs calculation
Stefano Stabellini [Mon, 3 Jun 2019 22:02:43 +0000 (15:02 -0700)]
xen/arm: fix nr_pdxs calculation

pfn_to_pdx expects an address, not a size, as a parameter. Specifically,
it expects the end address; the mask calculations then compensate for
any holes between start and end. Thus, we should pass the end address to
pfn_to_pdx.

The initial pdx is stored in frametable_base_pdx, so we can subtract the
result of pfn_to_pdx(start_address) from nr_pdxs; we know that we don't
need to cover any memory in the range 0-start in the frametable.

Remove the variable `nr_pages' because it is unused.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
CC: JBeulich@suse.com
5 years agotools/libxc: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:48:13 +0000 (21:48 +0800)]
tools/libxc: Add Hygon Dhyana support

Add Hygon Dhyana support to calculate the cpuid policies for creating
PV or HVM guests by using the code path of AMD.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuid: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:48:04 +0000 (21:48 +0800)]
x86/cpuid: Add Hygon Dhyana support

The Hygon Dhyana family 18h processor shares the same cpuid leaves as
the AMD family 17h one. So add Hygon Dhyana support to calculate the
cpuid policies as the AMD CPU does.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/traps: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:47:54 +0000 (21:47 +0800)]
x86/traps: Add Hygon Dhyana support

The Hygon Dhyana processor has a method to get the last exception
source IP from MSR0000_01DD. So add support for it if the boot param
ler is true.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/domctl: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:47:40 +0000 (21:47 +0800)]
x86/domctl: Add Hygon Dhyana support

Add Hygon Dhyana support to update cpuid info for creating PV guest.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/domain: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:47:29 +0000 (21:47 +0800)]
x86/domain: Add Hygon Dhyana support

Add Hygon Dhyana support to handle HyperTransport range.

Also, loading a null selector does not clear bases and limits on Hygon
CPUs, so add Hygon Dhyana support to the function preload_segment.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/apic: Add Hygon Dhyana support
Pu Wen [Thu, 4 Apr 2019 13:46:42 +0000 (21:46 +0800)]
x86/apic: Add Hygon Dhyana support

Add Hygon Dhyana support to use modern APIC.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/spec_ctrl: Add Hygon Dhyana to the respective mitigation machinery
Pu Wen [Thu, 4 Apr 2019 13:46:33 +0000 (21:46 +0800)]
x86/spec_ctrl: Add Hygon Dhyana to the respective mitigation machinery

The Hygon Dhyana CPU has the same speculative execution as AMD family
17h, so share AMD Retpoline and PTI mitigation code with Hygon Dhyana.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpu/mce: Add Hygon Dhyana support to the MCA infrastructure
Pu Wen [Thu, 4 Apr 2019 13:46:23 +0000 (21:46 +0800)]
x86/cpu/mce: Add Hygon Dhyana support to the MCA infrastructure

The machine check architecture for Hygon Dhyana CPU is similar to the
AMD family 17h one. Add vendor checking for Hygon Dhyana to share the
code path of AMD family 17h.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpu/vpmu: Add Hygon Dhyana and AMD Zen support for vPMU
Pu Wen [Thu, 4 Apr 2019 13:46:11 +0000 (21:46 +0800)]
x86/cpu/vpmu: Add Hygon Dhyana and AMD Zen support for vPMU

As the Hygon Dhyana CPU shares a similar PMU architecture with the AMD
family 17h one, add Hygon Dhyana support in vpmu_arch_initialise() and
vpmu_init() by sharing the AMD code path.

Split the common part of amd_vpmu_init() into a static function
_vpmu_init(), making AMD and Hygon call the shared function to
initialize vPMU.

As the current vPMU does not yet support AMD Zen (family 17h), add 0x17
support to amd_vpmu_init().

Also create a function hygon_vpmu_init() for Hygon vPMU initialization.

Both AMD 17h and Hygon 18h have the same performance event select and
counter MSRs as AMD 15h, so reuse the 15h definitions for them.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/cpu/mtrr: Add Hygon Dhyana support to get TOP_MEM2
Pu Wen [Thu, 4 Apr 2019 13:45:56 +0000 (21:45 +0800)]
x86/cpu/mtrr: Add Hygon Dhyana support to get TOP_MEM2

The Hygon Dhyana CPU supports the MSR way to get TOP_MEM2. So add Hygon
Dhyana support to print the value of TOP_MEM2.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/cpu: Fix common cpuid faulting probing for AMD and Hygon
Pu Wen [Thu, 4 Apr 2019 13:45:42 +0000 (21:45 +0800)]
x86/cpu: Fix common cpuid faulting probing for AMD and Hygon

There is no MSR_INTEL_PLATFORM_INFO for AMD and Hygon families. Reading
this MSR will stop the Xen initialization process on some Hygon
systems, or produce #GP(0). So directly return false in the function
probe_cpuid_faulting() if !cpu_has_hypervisor.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpu: Create Hygon Dhyana architecture support file
Pu Wen [Thu, 4 Apr 2019 13:45:03 +0000 (21:45 +0800)]
x86/cpu: Create Hygon Dhyana architecture support file

Add x86 architecture support for a new processor: Hygon Dhyana Family
18h. To make the Hygon initialization flow clearer, carve out code from
amd.c into a separate file hygon.c, and remove unnecessary code for
Hygon Dhyana.

To identify Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON
and vendor ID "HygonGenuine" for system recognition, and fit the new
x86 vendor lookup mechanism.

Hygon can fully use the function early_init_amd(), so make this common
function non-static and call it directly from Hygon code.

Add a separate hygon_get_topology(), which calculates phys_proc_id from
AcpiId[6](see reference [1]).

Reference:
[1] https://git.kernel.org/tip/e0ceeae708cebf22c990c3d703a4ca187dc837f5

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap" and
             64933920c9b "x86/cpu: Drop cpu_devs[] and $VENDOR_init_cpu() hooks"]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
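Several of these rebases mention c/s 0cd074144cb, which renumbered X86_VENDOR_* to form a bitmap; that turns an "AMD or Hygon" vendor check into a single mask test. A sketch with illustrative values (the real bit assignments live in Xen's headers):

```c
#include <assert.h>

/* Illustrative vendor bits only; not Xen's actual values. */
#define X86_VENDOR_INTEL (1u << 0)
#define X86_VENDOR_AMD   (1u << 1)
#define X86_VENDOR_HYGON (1u << 2)

/* One mask test covers both vendors sharing the AMD code paths. */
static int vendor_is_amd_like(unsigned int vendor)
{
    return !!(vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON));
}
```

This is why most of the Hygon patches reduce to widening existing vendor checks rather than duplicating AMD logic.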
5 years agotools/fuzz: Add a cpu-policy fuzzing harness
Andrew Cooper [Thu, 3 Jan 2019 18:03:25 +0000 (18:03 +0000)]
tools/fuzz: Add a cpu-policy fuzzing harness

There is now enough complexity that a fuzzing harness is a good idea, and
enough supporting logic to implement one which AFL seems happy with.

Take the existing recalculate_synth() helper and export it as
x86_cpuid_policy_recalc_synth(), as it is needed by the fuzzing harness.

While editing the MAINTAINERS file, insert a related entry which was
accidentally missed from c/s 919ddc3c0 "tools/cpu-policy: Add unit tests", and
sort the lines.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agolibx86: Helper for clearing out-of-range CPUID leaves
Andrew Cooper [Tue, 21 May 2019 16:56:43 +0000 (17:56 +0100)]
libx86: Helper for clearing out-of-range CPUID leaves

When merging a levelled policy, stale out-of-range leaves may remain.
Introduce a helper to clear them, and test a number of the subtle corner
cases.

The logic based on cpuid_policy_xstates() is liable to need changing when XCR0
has bit 63 defined.  Leave BUILD_BUG_ON()s behind with comments in all
impacted areas, including x86_cpuid_policy_fill_native().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/IRQ: ACKTYPE_NONE cannot make it into irq_guest_eoi_timer_fn()
Jan Beulich [Thu, 6 Jun 2019 14:05:27 +0000 (16:05 +0200)]
x86/IRQ: ACKTYPE_NONE cannot make it into irq_guest_eoi_timer_fn()

action->ack_type is set once before the timer even gets initialized, and
is never changed later. The timer gets activated only for EOI and UNMASK
types. Hence there's no need to have a respective if() in there. Replace
it by an ASSERT().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: bail early from irq_guest_eoi_timer_fn() when nothing is in flight
Jan Beulich [Thu, 6 Jun 2019 14:04:53 +0000 (16:04 +0200)]
x86/IRQ: bail early from irq_guest_eoi_timer_fn() when nothing is in flight

There's no point entering the loop in the function in this case. Instead
there still being something in flight _after_ the loop would be an
actual problem: No timer would be running anymore for issuing the EOI
eventually, and hence this IRQ (and possibly lower priority ones) would
be blocked, perhaps indefinitely.

Issue a warning instead and prefer breaking some (presumably
misbehaving) guest over stalling perhaps the entire system.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: don't keep EOI timer running without need
Jan Beulich [Thu, 6 Jun 2019 14:04:09 +0000 (16:04 +0200)]
x86/IRQ: don't keep EOI timer running without need

The timer needs to remain active only until all pending IRQ instances
have seen EOIs from their respective domains. Stop it when the in-flight
count has reached zero in desc_guest_eoi(). Note that this is race free
(with __do_IRQ_guest()), as the IRQ descriptor lock is being held at
that point.

Also pull up stopping of the timer in __do_IRQ_guest() itself: Instead
of stopping it immediately before re-setting, stop it as soon as we've
made it past any early returns from the function (and hence we're sure
it'll get set again).

Finally bail from the actual timer handler in case we find the timer
already active again by the time we've managed to acquire the IRQ
descriptor lock. Without this we may forcibly EOI an IRQ immediately
after it got sent to a guest. For this, timer_is_active() gets split out
of active_timer(), deliberately moving just one of the two ASSERT()s (to
allow the function to be used also on a never initialized timer).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agomemory: don't depend on guest_handle_subrange_okay() implementation details
Jan Beulich [Thu, 6 Jun 2019 14:03:10 +0000 (16:03 +0200)]
memory: don't depend on guest_handle_subrange_okay() implementation details

guest_handle_subrange_okay() takes inclusive first and last parameters,
i.e. checks that [first, last] is valid. Many callers, however, actually
need to see whether [first, limit) is valid (i.e., limit is non-
inclusive), and to do this they subtract 1 from the size. This is
normally correct, except in cases where first == limit, in which case
guest_handle_subrange_okay() will be passed a second parameter less than
its first.

As it happens, due to the way the math is implemented in x86's
guest_handle_subrange_okay(), the return value turns out to be correct;
but we shouldn't rely on this behavior.

Make sure all callers handle first == limit explicitly before calling
guest_handle_subrange_okay().

Note that the other uses (increase-reservation, populate-physmap, and
decrease-reservation) are already fine due to a suitable check in
do_memory_op().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
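The hazard can be modelled with a small inclusive-bounds checker (hypothetical helper names, not Xen's code): a caller converting a half-open range [first, limit) must special-case first == limit before subtracting 1 from the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of an inclusive-bounds check: is [first, last] within [0, max]? */
static bool subrange_okay(unsigned long first, unsigned long last,
                          unsigned long max)
{
    return first <= last && last <= max;
}

/* Half-open wrapper: handle the empty range before computing limit - 1,
 * otherwise 'last' ends up below 'first' (and wraps when limit == 0). */
static bool subrange_okay_halfopen(unsigned long first, unsigned long limit,
                                   unsigned long max)
{
    if ( first == limit )       /* empty range is trivially fine */
        return true;
    return subrange_okay(first, limit - 1, max);
}
```

The commit makes callers perform exactly this explicit first == limit check instead of relying on how the x86 implementation happens to evaluate inverted bounds.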
5 years agoadjust system domain creation (and call it earlier on x86)
Jan Beulich [Thu, 6 Jun 2019 09:16:57 +0000 (11:16 +0200)]
adjust system domain creation (and call it earlier on x86)

Split out this mostly arch-independent code into a common-code helper
function. (This does away with Arm's arch_init_memory() altogether.)

On x86 this needs to happen before acpi_boot_init(): Commit 9fa94e1058
("x86/ACPI: also parse AMD IOMMU tables early") only appeared to work
fine - it's really broken, and doesn't crash (on non-EFI AMD systems)
only because of there being a mapping of linear address 0 during early
boot. On EFI there is:

 Early fatal page fault at e008:ffff82d08024d58e (cr2=0000000000000220, ec=0000)
 ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----
 CPU:    0
 RIP:    e008:[<ffff82d08024d58e>] pci.c#_pci_hide_device+0x17/0x3a
 RFLAGS: 0000000000010046   CONTEXT: hypervisor
 rax: 0000000000000000   rbx: 0000000000006000   rcx: 0000000000000000
 rdx: ffff83104f2ee9b0   rsi: ffff82e0209e5d48   rdi: ffff83104f2ee9a0
 rbp: ffff82d08081fce0   rsp: ffff82d08081fcb8   r8:  0000000000000000
 r9:  8000000000000000   r10: 0180000000000000   r11: 7fffffffffffffff
 r12: ffff83104f2ee9a0   r13: 0000000000000002   r14: ffff83104f2ee4b0
 r15: 0000000000000064   cr0: 0000000080050033   cr4: 00000000000000a0
 cr3: 000000009f614000   cr2: 0000000000000220
 fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
 ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
 Xen code around <ffff82d08024d58e> (pci.c#_pci_hide_device+0x17/0x3a):
  48 89 47 38 48 8d 57 10 <48> 8b 88 20 02 00 00 48 89 51 08 48 89 4f 10 48
 Xen stack trace from rsp=ffff82d08081fcb8:
[...]
 Xen call trace:
    [<ffff82d08024d58e>] pci.c#_pci_hide_device+0x17/0x3a
[   [<                >] pci_ro_device+...]
    [<ffff82d080617fe1>] amd_iommu_detect_one_acpi+0x161/0x249
    [<ffff82d0806186ac>] iommu_acpi.c#detect_iommu_acpi+0xb5/0xe7
    [<ffff82d08061cde0>] acpi_table_parse+0x61/0x90
    [<ffff82d080619e7d>] amd_iommu_detect_acpi+0x17/0x19
    [<ffff82d08061790b>] acpi_ivrs_init+0x20/0x5b
    [<ffff82d08062e838>] acpi_boot_init+0x301/0x30f
    [<ffff82d080628b10>] __start_xen+0x1daf/0x28a2

 Pagetable walk from 0000000000000220:
  L4[0x000] = 000000009f44f063 ffffffffffffffff
  L3[0x000] = 000000009f44b063 ffffffffffffffff
  L2[0x000] = 0000000000000000 ffffffffffffffff

 ****************************************
 Panic on CPU 0:
 FATAL TRAP: vector = 14 (page fault)
 [error_code=0000] , IN INTERRUPT CONTEXT
 ****************************************

Of course the bug would nevertheless have led to post-boot crashes as
soon as the list would actually get traversed.

Take the opportunity and
- convert BUG_ON()s being moved to panic(),
- add __read_mostly annotations to the dom_* definitions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoPCI: move pdev_list field to common structure
Jan Beulich [Thu, 6 Jun 2019 09:14:58 +0000 (11:14 +0200)]
PCI: move pdev_list field to common structure

Its management shouldn't be arch-specific, and in particular there
should be no need for special precautions when creating the special
domains.

At this occasion
- correct parenthesization of for_each_pdev(),
- stop open-coding for_each_pdev() in vPCI code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: relax locking in irq_guest_eoi_timer_fn()
Jan Beulich [Thu, 6 Jun 2019 09:14:00 +0000 (11:14 +0200)]
x86/IRQ: relax locking in irq_guest_eoi_timer_fn()

This is a timer handler, so it gets entered with IRQs enabled. Therefore
there's no need to save/restore the IRQ masking flag.

Additionally the final switch()'es ACKTYPE_EOI case re-acquires the lock
just for it to be dropped again right away. Do away with this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoarm: rename tiny64.conf to tiny64_defconfig
Volodymyr Babchuk [Thu, 16 May 2019 13:39:00 +0000 (15:39 +0200)]
arm: rename tiny64.conf to tiny64_defconfig

As the build system now supports *_defconfig rules, it is good to be
able to configure a minimal Xen image with

 make tiny64_defconfig

command.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agomakefile: add support for *_defconfig targets
Volodymyr Babchuk [Thu, 6 Jun 2019 09:11:14 +0000 (11:11 +0200)]
makefile: add support for *_defconfig targets

Ease up Xen configuration for non-standard builds, like the
armv8 tiny config.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/bitops: Further reduce the #ifdef-ary in generic_hweight64()
Andrew Cooper [Tue, 4 Jun 2019 12:40:08 +0000 (13:40 +0100)]
xen/bitops: Further reduce the #ifdef-ary in generic_hweight64()

This #ifdef-ary isn't necessary, and the logic can live in a plain if().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/vm-event: Misc fixups
Andrew Cooper [Fri, 31 May 2019 19:54:28 +0000 (12:54 -0700)]
xen/vm-event: Misc fixups

 * Drop redundant brackets, and inline qualifiers.
 * Insert newlines and spaces where appropriate.
 * Drop redundant NDEBUG - gdprintk() is already conditional.  Fix the
   logging level, as gdprintk() already prefixes the guest marker.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
5 years agoxen/vm-event: Fix interactions with the vcpu list
Andrew Cooper [Fri, 31 May 2019 19:29:27 +0000 (12:29 -0700)]
xen/vm-event: Fix interactions with the vcpu list

vm_event_resume() should use domain_vcpu(), rather than opencoding it
without its Spectre v1 safety.

vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu is
NULL, so drop the outer if() and reindent, fixing up style issues.

The comment, which is left alone, is false.  This algorithm still has
starvation issues when there is an asymmetric rate of generated events.

However, the existing logic is sufficiently complicated and fragile that
I don't think I've followed it fully, and because we're trying to
obsolete this interface, the safest course of action is to leave it
alone, rather than to end up making things subtly different.

Therefore, no practical change that callers would notice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
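domain_vcpu() bounds the vcpu index in a way that stays safe under speculation. The underlying trick, modelled here after array_index_mask_nospec() (a sketch, not Xen's exact code), computes a branchless mask that is all-ones for in-bounds indexes and zero otherwise:

```c
#include <assert.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Branchless, so there is no conditional branch for the CPU to
 * mispredict.  The OR has its top bit set exactly when idx >= size
 * (size - 1 - idx wraps), and the arithmetic right shift smears the
 * inverted top bit across the whole word. */
static unsigned long index_mask_nospec(unsigned long idx, unsigned long size)
{
    return ~(long)(idx | (size - 1 - idx)) >> (BITS_PER_LONG - 1);
}
```

A caller would then clamp with `idx & index_mask_nospec(idx, size)` after the architectural bounds check. The right shift of a negative signed value is implementation-defined in C, but behaves as required (arithmetic shift) on the compilers these codebases support.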
5 years agoxen/vm-event: Remove unnecessary vm_event_domain indirection
Andrew Cooper [Fri, 31 May 2019 20:11:15 +0000 (13:11 -0700)]
xen/vm-event: Remove unnecessary vm_event_domain indirection

The use of (*ved)-> leads to poor code generation, as the compiler can't
assume the pointer hasn't changed, and results in hard-to-follow code.

For both vm_event_{en,dis}able(), rename the ved parameter to p_ved, and
work primarily with a local ved pointer.

This has a key advantage in vm_event_enable(), in that the partially
constructed vm_event_domain only becomes globally visible once it is
fully constructed.  As a consequence, the spinlock doesn't need holding.

Furthermore, rearrange the order of operations to be more sensible.
Check for repeated enables and a bad HVM_PARAM before allocating
memory, and gather the trivial setup into one place, dropping the
redundant zeroing.

No practical change that callers will notice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
5 years agoxen/vm-event: Expand vm_event_* spinlock macros and rename the lock
Andrew Cooper [Fri, 31 May 2019 20:57:03 +0000 (13:57 -0700)]
xen/vm-event: Expand vm_event_* spinlock macros and rename the lock

These serve no purpose but to add to the cognitive load of following
the code.  Remove the level of indirection.

Furthermore, the lock protects all data in vm_event_domain, making
ring_lock a poor choice of name.

For vm_event_get_response() and vm_event_grab_slot(), fold the exit
paths to have a single unlock, as the compiler can't make this
optimisation itself.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
5 years agoxen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()
Andrew Cooper [Fri, 31 May 2019 19:35:55 +0000 (12:35 -0700)]
xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()

This parameter isn't used at all.  Furthermore, elide the copyback in
failing cases, as it is only successful paths which generate data which
needs sending back to the caller.

Finally, drop a redundant d == NULL check, as that logic is all common
at the beginning of do_domctl().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agosched_null: superficial clean-ups
Baodong Chen [Mon, 3 Jun 2019 15:56:20 +0000 (17:56 +0200)]
sched_null: superficial clean-ups

* Remove unused dependency 'keyhandler.h'
* Make sched_null_def static

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86: remove alternative_callN usage of ALTERNATIVE asm macro
Roger Pau Monné [Mon, 3 Jun 2019 15:55:37 +0000 (17:55 +0200)]
x86: remove alternative_callN usage of ALTERNATIVE asm macro

There is a bug in llvm that needs to be fixed before switching to use
the alternative assembly macros in inline assembly call sites.
Therefore alternative_callN using inline assembly to generate the
alternative patch sites should be using the ALTERNATIVE C preprocessor
macro rather than the ALTERNATIVE assembly macro. Using the assembly
macro in an inline assembly instance triggers the following bug on
llvm based toolchains:

<instantiation>:1:1: error: invalid symbol redefinition
.L0_orig_s: call *genapic+64(%rip); .L0_orig_e: .L0_diff = (.L0_repl_e1 - .L0_repl_s1) - (...
^
<instantiation>:1:37: error: invalid symbol redefinition
.L0_orig_s: call *genapic+64(%rip); .L0_orig_e: .L0_diff = (.L0_repl_e1 - .L0_repl_s1) - (...
                                    ^
<instantiation>:1:60: error: invalid reassignment of non-absolute variable '.L0_diff'
.L0_orig_s: call *genapic+64(%rip); .L0_orig_e: .L0_diff = (.L0_repl_e1 - .L0_repl_s1) - (...
                                                           ^
<inline asm>:1:2: note: while in macro instantiation
        ALTERNATIVE "call *genapic+64(%rip)", "call .", X86_FEATURE_LM
        ^
<instantiation>:1:156: error: invalid symbol redefinition
  ...- (.L0_orig_e - .L0_orig_s); mknops ((-(.L0_diff > 0)) * .L0_diff); .L0_orig_p:
                                                                         ^
<instantiation>:18:5: error: invalid symbol redefinition
    .L0_repl_s1: call .; .L0_repl_e1:
    ^
<instantiation>:18:26: error: invalid symbol redefinition
    .L0_repl_s1: call .; .L0_repl_e1:
                         ^

This has been reported to upstream llvm:

https://bugs.llvm.org/show_bug.cgi?id=42034

Fixes: 67d01cdb5 ("x86: infrastructure to allow converting certain indirect calls to direct ones")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago
x86: further speed-up to hweight{32,64}()
Jan Beulich [Mon, 3 Jun 2019 15:21:05 +0000 (17:21 +0200)]
x86: further speed-up to hweight{32,64}()

According to Linux commit 0136611c62 ("optimize hweight64 for x86_64")
this is a further improvement over the variant using only bitwise
operations. It's also a slight further code size reduction.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago
bitops: speed up hweight<N>()
Jan Beulich [Mon, 3 Jun 2019 15:20:13 +0000 (17:20 +0200)]
bitops: speed up hweight<N>()

Algorithmically this gets us in line with current Linux, where the same
change did happen about 13 years ago. See in particular Linux commits
f9b4192923 ("bitops: hweight() speedup") and 0136611c62 ("optimize
hweight64 for x86_64").

Kconfig changes for actually setting HAVE_FAST_MULTIPLY will follow.

Take the opportunity and change generic_hweight64()'s return type to
unsigned int.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
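The bit-counting technique the two hweight commits above refer to can be sketched as follows; this is a minimal illustration of the parallel (SWAR) count with the final fast multiply, not Xen's actual implementation, and the function name is invented:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of the generic_hweight32()-style bit count:
 * accumulate 2-, 4-, then 8-bit partial sums in parallel, then use a
 * single multiply (the HAVE_FAST_MULTIPLY case) to add the four byte
 * counts.  The name sketch_hweight32 is hypothetical. */
static unsigned int sketch_hweight32(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                       /* 2-bit sums */
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (w * 0x01010101u) >> 24;  /* top byte = sum of all bytes  */
}
```

Without a fast multiplier, the last step instead folds the byte sums with two more shift-and-add steps, which is why the commit message mentions a HAVE_FAST_MULTIPLY Kconfig switch.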
5 years ago
cpu: change 'cpu_hotplug_[begin|done]' to inline function
Baodong Chen [Mon, 3 Jun 2019 15:18:58 +0000 (17:18 +0200)]
cpu: change 'cpu_hotplug_[begin|done]' to inline function

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago
remove on-stack cpumask from stop_machine_run()
Juergen Gross [Mon, 3 Jun 2019 15:17:51 +0000 (17:17 +0200)]
remove on-stack cpumask from stop_machine_run()

The "allbutself" cpumask in stop_machine_run() is not needed. Instead
of allocating it on the stack it can easily be avoided.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago
notifier: refine 'notifier_head', use 'list_head' directly
Baodong Chen [Mon, 3 Jun 2019 15:16:52 +0000 (17:16 +0200)]
notifier: refine 'notifier_head', use 'list_head' directly

'notifier_block' can be replaced with 'list_head' when used for
'notifier_head'; this makes things a little clearer.

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago
schedule: initialize 'now' when really needed
Baodong Chen [Mon, 3 Jun 2019 15:15:44 +0000 (17:15 +0200)]
schedule: initialize 'now' when really needed

When 'periodic_period' is zero, there is no need to initialize 'now'.

Signed-off-by: Baodong Chen <chenbaodong@mxnavi.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago
x86emul/fuzz: add a state sanity checking function
Jan Beulich [Mon, 3 Jun 2019 15:15:06 +0000 (17:15 +0200)]
x86emul/fuzz: add a state sanity checking function

This is to accompany sanitize_input(). Just like for initial state we
want to have state between two emulated insns sane, at least as far as
assumptions in the main emulator go. Do minimal checking after segment
register, CR, and MSR writes, and roll back to the old value in case of
failure (raising #GP(0) at the same time).

In the particular case observed, a CR0 write clearing CR0.PE was
followed by a VEX-encoded insn, which the decoder accepts based on
guest address size, restricting things just outside of the 64-bit case
(real and virtual modes don't allow VEX-encoded insns). Subsequently
_get_fpu() would then assert that CR0.PE must be set (and EFLAGS.VM
clear) when trying to invoke YMM, ZMM, or OPMASK state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago
x86emul/fuzz: extend canonicalization to 57-bit linear address width case
Jan Beulich [Mon, 3 Jun 2019 15:14:41 +0000 (17:14 +0200)]
x86emul/fuzz: extend canonicalization to 57-bit linear address width case

Don't enforce any other dependencies for now, just like we don't enforce
e.g. PAE enabled as a prereq for long mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
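The canonicality rule the commit above extends to 57-bit widths can be modelled with a short sketch; the helper name is invented, and this is only an illustration of the check (bits [63:width-1] must all equal bit width-1), not the fuzzer's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: a linear address is canonical for a given width
 * when the top bits are a sign-extension of bit (width - 1).  width is
 * 48 with 4-level paging and 57 with LA57 (5-level paging). */
static bool sketch_is_canonical(uint64_t addr, unsigned int width)
{
    unsigned int shift = 64 - width;

    /* Shift the address so bit (width-1) becomes the sign bit, then
     * arithmetic-shift back; a canonical address is unchanged. */
    return ((int64_t)(addr << shift) >> shift) == (int64_t)addr;
}
```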
5 years ago
x86/hvm: Make the altp2m locking in hvm_hap_nested_page_fault() easier to follow
Andrew Cooper [Tue, 23 Oct 2018 10:18:07 +0000 (11:18 +0100)]
x86/hvm: Make the altp2m locking in hvm_hap_nested_page_fault() easier to follow

Drop the ap2m_active boolean, and consistently use the unlocking form:

  if ( p2m != hostp2m )
       __put_gfn(p2m, gfn);
  __put_gfn(hostp2m, gfn);

which makes it clear that we always unlock the altp2m's gfn if it is in use,
and always unlock the hostp2m's gfn.  This also drops the ternary expression
in the logdirty case.

Extend the logdirty comment to identify where the locking violation is liable
to occur.

No (intended) overall change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago
vm_event: Make ‘local’ functions ‘static’
Petre Pircalabu [Thu, 30 May 2019 14:18:17 +0000 (17:18 +0300)]
vm_event: Make ‘local’ functions ‘static’

vm_event_get_response, vm_event_resume, and vm_event_mark_and_pause are
used only in xen/common/vm_event.c.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years ago
x86/mpparse: Don't print "limit reached" for every subsequent processor
Andrew Cooper [Fri, 17 May 2019 18:35:08 +0000 (19:35 +0100)]
x86/mpparse: Don't print "limit reached" for every subsequent processor

When you boot Xen with the default 256 NR_CPUS, on a box with rather more
processors, the resulting spew is unnecessarily verbose.  Instead, print the
message once, e.g.:

 (XEN) ACPI: X2APIC (apic_id[0x115] uid[0x115] enabled)
 (XEN) WARNING: NR_CPUS limit of 256 reached - ignoring further processors
 (XEN) ACPI: X2APIC (apic_id[0x119] uid[0x119] enabled)
 (XEN) ACPI: X2APIC (apic_id[0x11d] uid[0x11d] enabled)
 (XEN) ACPI: X2APIC (apic_id[0x121] uid[0x121] enabled)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago
xen/lib: Introduce printk_once() and replace some opencoded examples
Andrew Cooper [Fri, 17 May 2019 18:30:47 +0000 (19:30 +0100)]
xen/lib: Introduce printk_once() and replace some opencoded examples

Reflow the ZynqMP message for grepability, and fix the omission of a newline.

There is a race condition where multiple CPUs could race to set the once_ boolean.
However, the use of this construct is mainly useful for boot time code, and
the only consequence of the race is a repeated print message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
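The once-only construct described above can be modelled with a small sketch; do_once()/printf_once() are hypothetical stand-ins (Xen's real printk_once() wraps printk()), but the structure, including the benign race on the flag, is the same:

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of the once-only pattern: an unsynchronised static flag guards
 * the statement, so CPUs racing here may, rarely, both execute it --
 * harmless for a boot-time diagnostic.  Each expansion site gets its
 * own static flag. */
#define do_once(stmt)                   \
    do {                                \
        static bool once_;              \
        if ( !once_ )                   \
        {                               \
            once_ = true;               \
            stmt;                       \
        }                               \
    } while ( 0 )

#define printf_once(...) do_once(printf(__VA_ARGS__))
```

The mpparse change above is a typical caller: the "NR_CPUS limit reached" warning goes through the once-construct so only the first ignored processor produces output.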
5 years ago
x86/spec-ctrl: Knights Landing/Mill are retpoline-safe
Andrew Cooper [Fri, 17 May 2019 18:23:55 +0000 (19:23 +0100)]
x86/spec-ctrl: Knights Landing/Mill are retpoline-safe

They are both Airmont-based and should have been included in c/s 17f74242ccf
"x86/spec-ctrl: Extend repoline safey calcuations for eIBRS and Atom parts".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago
x86/vhpet: avoid 'small' time diff test on resume
Paul Durrant [Fri, 31 May 2019 09:40:52 +0000 (11:40 +0200)]
x86/vhpet: avoid 'small' time diff test on resume

It appears that even 64-bit versions of Windows 10, when not using synthetic
timers, will use 32-bit HPET non-periodic timers. There is a test
in hpet_set_timer(), specific to 32-bit timers, that tries to disambiguate
between a comparator value that is in the past and one that is sufficiently
far in the future that it wraps. This is done by assuming that the delta
between the main counter and comparator will be 'small' [1], if the
comparator value is in the past. Unfortunately, more often than not, this
is not the case if the timer is being re-started after a migrate and so
the timer is set to fire far in the future (in excess of a minute in
several observed cases) rather than set to fire immediately. This has a
rather odd symptom where the guest console is alive enough to be able to
deal with mouse pointer re-rendering, but any keyboard activity or mouse
clicks yield no response.

This patch simply adds an extra check of 'creation_finished' into
hpet_set_timer() so that the 'small' time test is omitted when the function
is called to restart timers after migration, and thus any negative delta
causes a timer to fire immediately.

[1] The number of ticks that equate to 0.9765625 milliseconds

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
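The decision the patch above changes can be modelled roughly as follows; the constant and function names are invented for the sketch and do not match Xen's hpet code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: with 32-bit comparators, (comparator - counter)
 * wraps mod 2^32, so a negative signed delta is ambiguous between
 * "already expired" and "wrapped, far in the future".  The heuristic
 * treats only a small negative delta as expired; after migration any
 * negative delta should fire immediately. */
#define SMALL_DELTA_TICKS 14318u  /* invented stand-in for ~1ms of ticks */

static bool sketch_fire_now(uint32_t counter, uint32_t comparator,
                            bool restoring_after_migrate)
{
    int32_t delta = (int32_t)(comparator - counter);

    if ( delta >= 0 )
        return false;               /* genuinely in the future */

    if ( restoring_after_migrate )
        return true;                /* skip the 'small' test   */

    /* Small magnitude => in the past; large => assume it wrapped. */
    return -(int64_t)delta <= (int64_t)SMALL_DELTA_TICKS;
}
```

With this shape of check, a comparator left far in the past by a migration no longer masquerades as a wrapped far-future deadline, which is the hang the commit describes.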
5 years ago
support: remove tmem from SUPPORT.md
Juergen Gross [Fri, 31 May 2019 09:40:38 +0000 (11:40 +0200)]
support: remove tmem from SUPPORT.md

Tmem has been removed. Reflect that in SUPPORT.md.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago
VT-d: change bogus return value of intel_iommu_lookup_page()
Jan Beulich [Fri, 31 May 2019 09:39:49 +0000 (11:39 +0200)]
VT-d: change bogus return value of intel_iommu_lookup_page()

The function passes 0 as "alloc" argument to addr_to_dma_page_maddr(),
so -ENOMEM simply makes no sense (and its use was probably simply a
copy-and-paste effect originating at intel_iommu_map_page()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>