Add a new start address field to struct bootcmdline to easily match a
cmdline to the corresponding bootmodule. This is useful for debugging
(not needed for any functionality today, but it could be).
Instead of printing the index in the cmdline array, print the start
address of the corresponding bootmodule for each cmdline in
early_print_info.
Find addresses, sizes on device tree from kernel_probe.
Find the cmdline from the bootcmdlines array.
Introduce a new boot_module_find_by_addr_and_kind function to match not
just on boot module kind, but also by address so that we can support
multiple domains.
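A minimal sketch of such a lookup, assuming a simplified struct bootmodule. The enum, the field names and the function signature below are illustrative stand-ins, not the definitions from the Xen tree:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Xen types; names and layout are assumptions. */
typedef enum { BOOTMOD_KERNEL, BOOTMOD_RAMDISK } bootmodule_kind;

struct bootmodule {
    bootmodule_kind kind;
    uint64_t start;
    uint64_t size;
};

/* Return the module matching both kind and start address, or NULL. */
static struct bootmodule *find_by_addr_and_kind(struct bootmodule *mods,
                                                size_t nr_mods,
                                                bootmodule_kind kind,
                                                uint64_t start)
{
    for ( size_t i = 0; i < nr_mods; i++ )
        if ( mods[i].kind == kind && mods[i].start == start )
            return &mods[i];

    return NULL;
}
```

Matching on the address as well as the kind is what lets several kernels (one per domain) coexist in the same array.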
Introduce a boot_cmdline_find_by_name function to find the right struct
cmdline based on the device tree node name of the "xen,domain"
compatible node.
Set command line for dom0 in kernel_probe for consistency.
xen/arm: don't add duplicate boot modules, introduce domU flag
Don't add duplicate boot modules (same kind and same start address):
they are freed later, and we don't want to introduce double-free errors.
Introduce a domU flag in struct bootmodule and struct bootcmdline. Set
it for kernels and ramdisks of "xen,domain" nodes to avoid getting
confused in kernel_probe, where we try to guess which is the dom0 kernel
and initrd to be compatible with all versions of the multiboot spec.
boot_module_find_by_kind and boot_cmdline_find_by_kind automatically
check for !domU entries (they are only used for non-domU modules).
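The two rules above (no duplicates, and lookups by kind skipping domU entries) could look roughly like this. It is a sketch under assumptions: the struct layout, add_boot_module() and find_by_kind() are simplified stand-ins, and only MAX_MODULES = 32 is taken from this series:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MODULES 32

typedef enum { BOOTMOD_KERNEL, BOOTMOD_RAMDISK } bootmodule_kind;

struct bootmodule {
    bootmodule_kind kind;
    bool domU;
    uint64_t start;
    uint64_t size;
};

struct bootmodules {
    size_t nr_mods;
    struct bootmodule module[MAX_MODULES];
};

/* Add a module unless one with the same kind and start already exists. */
static struct bootmodule *add_boot_module(struct bootmodules *mods,
                                          bootmodule_kind kind,
                                          uint64_t start, uint64_t size,
                                          bool domU)
{
    struct bootmodule *m;

    for ( size_t i = 0; i < mods->nr_mods; i++ )
    {
        m = &mods->module[i];
        if ( m->kind == kind && m->start == start )
            return m;   /* duplicate: reuse it, so it is freed only once */
    }

    if ( mods->nr_mods == MAX_MODULES )
        return NULL;

    m = &mods->module[mods->nr_mods++];
    m->kind = kind;
    m->domU = domU;
    m->start = start;
    m->size = size;
    return m;
}

/* Lookup by kind alone only ever returns non-domU entries. */
static struct bootmodule *find_by_kind(struct bootmodules *mods,
                                       bootmodule_kind kind)
{
    for ( size_t i = 0; i < mods->nr_mods; i++ )
        if ( mods->module[i].kind == kind && !mods->module[i].domU )
            return &mods->module[i];

    return NULL;
}
```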
Introduce a new array to store the cmdline of each boot module. It is
separate from struct bootmodules. Remove the cmdline field from struct
boot_module. This way, kernels and initrds with the same address in
memory can share struct bootmodule (important because we want them to be
free'd only once), but they can still have their separate bootcmdline
entries.
Add a dt_name field to struct bootcmdline to make it easier to find the
correct entry. Store the name of the "xen,domain" compatible node (for
example "Dom1"). This is a better choice compared to the name of the
"multiboot,kernel" compatible node, because their names are not unique.
For instance there can be more than one "module@0x4c000000" in the
system, but there can only be one "/chosen/Dom1".
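A sketch of the name-based lookup, assuming a simplified struct bootcmdline. The buffer sizes and the helper name cmdline_find_by_name() are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in; the array sizes are assumptions. */
struct bootcmdline {
    char dt_name[64];    /* name of the "xen,domain" node, e.g. "Dom1" */
    char cmdline[256];
};

/* Return the entry whose dt_name matches exactly, or NULL. */
static struct bootcmdline *cmdline_find_by_name(struct bootcmdline *cmds,
                                                size_t nr,
                                                const char *name)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( strcmp(cmds[i].dt_name, name) == 0 )
            return &cmds[i];

    return NULL;
}
```

Because the key is the domain node name rather than a module node name, two domains whose kernels share an address still resolve to distinct command lines.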
Add a pointer to struct kernel_info to point to the cmdline for a given
kernel.
Xen boot modules need to account not just for Dom0 but also for a few
potential DomUs, each of them coming with their own kernel and initrd.
Increase MAX_MODULES to 32 to allow for more DomUs.
xen: allow console_io hypercalls from certain DomUs
Introduce an is_console option to allow certain classes of domUs to use
the Xen console. Specifically, it will be used to give console access to
all domUs started by Xen based on information in the device tree.
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> CC: andrew.cooper3@citrix.com CC: George.Dunlap@eu.citrix.com CC: ian.jackson@eu.citrix.com CC: jbeulich@suse.com CC: konrad.wilk@oracle.com CC: tim@xen.org CC: wei.liu2@citrix.com CC: dgdegra@tycho.nsa.gov
Sergey Dyasli [Wed, 14 Nov 2018 10:23:22 +0000 (10:23 +0000)]
x86/vvmx: correctly report vvmcs size
The size of Xen's virtual vmcs region is 4096 bytes (see comment about
Virtual VMCS layout in include/asm-x86/hvm/vmx/vvmx.h). Correctly report
it to the guest in case when VMCS shadowing is not available instead of
providing H/W value (which is usually smaller).
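As a sketch of what reporting the size involves: this assumes the size is exposed through bits 44:32 of the IA32_VMX_BASIC MSR, as described in the Intel SDM. The macro and function names are illustrative and not taken from the patch:

```c
#include <stdint.h>

/* Bits 44:32 of IA32_VMX_BASIC hold the VMCS region size in bytes. */
#define VMX_BASIC_VMCS_SIZE_SHIFT 32
#define VMX_BASIC_VMCS_SIZE_MASK  (0x1fffULL << VMX_BASIC_VMCS_SIZE_SHIFT)
#define VVMCS_SIZE                4096u

/* Keep every other host field, but advertise the virtual vmcs size. */
static uint64_t vmx_basic_with_vvmcs_size(uint64_t host_basic)
{
    return (host_basic & ~VMX_BASIC_VMCS_SIZE_MASK) |
           ((uint64_t)VVMCS_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT);
}
```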
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Sergey Dyasli [Wed, 14 Nov 2018 10:23:16 +0000 (10:23 +0000)]
x86/nestedhvm: init nv_vvmcxaddr in hvm_vcpu_initialise()
This allows nestedhvm functions that rely on the values inside struct
nestedvcpu to be used safely, independently of the nested virtualisation
(HVM_PARAM_NESTEDHVM) status of a domain.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 8 Nov 2018 18:12:19 +0000 (18:12 +0000)]
x86/hvm: Unify hvm_event_pending()'s API with the !CONFIG_HVM version
This patch should have been part of, or a prerequisite of, c/s 981c9a78 "x86:
provide stubs, declarations and macros in hvm.h" to avoid getting the APIs
out of sync.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 14 Nov 2018 16:50:18 +0000 (17:50 +0100)]
x86: add myself as reviewer
As I've touched quite a lot of this code in order to add PVH and PV
shim support I would like to keep an eye on incoming changes, and
since I'm also attempting to review patches in this area it's going to
be easier if I get CCed on them.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Christian Lindig [Wed, 14 Nov 2018 11:06:49 +0000 (11:06 +0000)]
tools/ocaml: cleanup to reduce compiler warnings
This commit cleans up code to reduce compiler warnings:
* remove unused open statements (warning 33)
* remove unused 'rec' declarations (warning 39)
* remove unused type declarations (warning 34)
* mark unused variables with an underscore (warning 27)
* mark unused value declarations with an underscore (warning 32)
This commit does not include changes to fix compiler warning 52
(matching against strings in exceptions). These changes have no impact
on functionality.
Signed-off-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
This patch adds a couple of regs to the vm_event that are used by
the introspection. The base, limit and ar bits are compressed into a
uint64_t union so as not to enlarge the vm_event.
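To illustrate the idea of packing the three fields into 64 bits, here is a sketch that uses shifts and masks instead of a union. The bit layout (32-bit base, 20-bit limit, 12-bit ar) and every name below are assumptions chosen for the example, not the layout used by the patch:

```c
#include <stdint.h>

/* Hypothetical layout: bits 31:0 base, bits 51:32 limit, bits 63:52 ar. */
static uint64_t seg_pack(uint32_t base, uint32_t limit, uint16_t ar)
{
    return (uint64_t)base |
           ((uint64_t)(limit & 0xfffff) << 32) |
           ((uint64_t)(ar & 0xfff) << 52);
}

static uint32_t seg_base(uint64_t v)  { return (uint32_t)v; }
static uint32_t seg_limit(uint64_t v) { return (uint32_t)(v >> 32) & 0xfffff; }
static uint16_t seg_ar(uint64_t v)    { return (uint16_t)(v >> 52); }
```

A 64-bit base would not fit in this layout, so the sketch only covers segments whose base is known to be 32 bits wide.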
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Daniel De Graaf [Fri, 2 Nov 2018 17:46:11 +0000 (13:46 -0400)]
flask/policy: allow dom0 to use PHYSDEVOP_pci_mmcfg_reserved
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 12 Nov 2018 16:14:57 +0000 (17:14 +0100)]
guest/pvh: special case the low 1MB
When running as a PVH guest Xen only special cases the trampoline
code in the low 1MB, without also reserving the space used by the
relocated metadata or the trampoline stack.
Fix this by always reserving the low 1MB regardless of whether Xen is
running as a guest or natively.
Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Mon, 12 Nov 2018 16:13:57 +0000 (17:13 +0100)]
guest/pvh: fix handling of multiboot module list
When booting Xen as a PVH guest the data in the PVH start info
structure is copied over to a multiboot structure and a module list
array that resides in the .init section of the Xen image. The
resulting multiboot module list is then handed to the generic boot
process using the physical address in mbi->mods_addr.
This works fine as long as the Xen image doesn't relocate itself. If
there is such a relocation, the physical address of the multiboot module
list is no longer valid.
Fix this by handing the virtual address of the module list to the
generic boot process instead of its physical address.
Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Tue, 23 Oct 2018 14:25:13 +0000 (15:25 +0100)]
xen/arm: gic: Relax barrier when sending an SGI
When sending an SGI to another CPU, we require a barrier to ensure that
any pending stores to normal memory are made visible to the recipient
before the interrupt arrives.
For GICv2, rather than using dsb(sy) before writel_gicd, we can instead
use dsb(ishst), since we just need to ensure that any pending normal
writes are visible within the inner-shareable domain before we poke the
GIC.
With this observation, we can then further weaken the barrier to a
dmb(ishst), since other CPUs in the inner-shareable domain must observe
the write to the distributor before the SGI is generated.
A DMB instruction can be used to ensure the relative order of only
memory accesses before and after the barrier. Since writes to system
registers are not memory operations, a DMB barrier is not sufficient for
observability of memory accesses that occur before ICC_SGI1R_EL1
(GICv3).
For GICv3, a DSB instruction ensures that no instructions that appear in
program order after the DSB instruction, can execute until the DSB
instruction has completed.
Andrew Cooper [Fri, 9 Nov 2018 14:14:08 +0000 (14:14 +0000)]
x86/dom0: Avoid using 1G superpages if shadowing may be necessary
The shadow code doesn't support 1G superpages, and will hand #PF[RSVD] back to
guests.
For dom0's with 512GB of RAM or more (and subject to the P2M alignment), Xen's
domain builder might use 1G superpages.
Avoid using 1G superpages (falling back to 2M superpages instead) if there is
a reasonable chance that we may have to shadow dom0. This assumes that there
are no circumstances where we will activate logdirty mode on dom0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Tue, 23 Oct 2018 18:17:07 +0000 (19:17 +0100)]
xen/arm: gic: Ensure ordering between read of INTACK and shared data
When an IPI is generated by a CPU, the pattern looks roughly like:
<write shared data>
dsb(sy);
<write to GIC to signal SGI>
On the receiving CPU we rely on the fact that, once we've taken the
interrupt, then the freshly written shared data must be visible to us.
Put another way, the CPU isn't going to speculate taking an interrupt.
Unfortunately, this assumption turns out to be broken.
Consider that CPUx wants to send an IPI to CPUy, which will cause CPUy
to read some shared_data. Before CPUx has done anything, a random
peripheral raises an IRQ to the GIC and the IRQ line on CPUy is raised.
CPUy then takes the IRQ and starts executing the entry code, heading
towards gic_handle_irq. Furthermore, let's assume that a bunch of the
previous interrupts handled by CPUy were SGIs, so the branch predictor
kicks in and speculates that irqnr will be <16 and we're likely to
head into handle_IPI. The prefetcher then grabs a speculative copy of
shared_data which contains a stale value.
Meanwhile, CPUx gets round to updating shared_data and asking the GIC
to send an SGI to CPUy. Internally, the GIC decides that the SGI is
more important than the peripheral interrupt (which hasn't yet been
ACKed) but doesn't need to do anything to CPUy, because the IRQ line
is already raised.
CPUy then reads the ACK register on the GIC, sees the SGI value which
confirms the branch prediction and we end up with a stale shared_data
value.
This patch fixes the problem by adding an smp_rmb() to the IPI entry
code in do_SGI.
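The ordering requirement can be sketched in portable C11, with atomic fences standing in for the Arm barriers. This is an analogy under assumptions: send_ipi(), handle_ipi() and the sgi_pending flag are hypothetical, and the acquire fence merely plays the role that smp_rmb() plays in do_SGI:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int shared_data;          /* plain data published by the sender */
static atomic_bool sgi_pending;  /* stand-in for the GIC SGI/ACK state */

/* Sender: write the data, issue a barrier, then signal the interrupt. */
static void send_ipi(int value)
{
    shared_data = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sgi_pending, true, memory_order_relaxed);
}

/* Receiver: acknowledge first, then a barrier before reading shared data. */
static bool handle_ipi(int *out)
{
    if ( !atomic_exchange_explicit(&sgi_pending, false, memory_order_relaxed) )
        return false;

    /* Without this fence the read below could be satisfied early (stale). */
    atomic_thread_fence(memory_order_acquire);
    *out = shared_data;
    return true;
}
```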
Julien Grall [Tue, 23 Oct 2018 18:17:06 +0000 (19:17 +0100)]
xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()
Devices that expose their interrupt status registers via system
registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
vgic (although unused by Linux), ...) rely on a context synchronising
operation on the CPU to ensure that the updated status register is
visible to the CPU when handling the interrupt. This usually happens as
a result of taking the IRQ exception in the first place, but there are
two race scenarios where this isn't the case.
For example, let's say we have two peripherals (X and Y), where Y uses a
system register for its interrupt status.
Case 1:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. Y then raises its interrupt line, but the update to its system
register is not yet visible to the CPU
3. The GIC decides to expose Y's interrupt number first in the Ack
register
4. The CPU runs the IRQ handler for Y, but the status register is stale
Case 2:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. CPU reads the interrupt number for X from the Ack register and runs
its IRQ handler
3. Y raises its interrupt line and the Ack register is updated, but
again, the update to its system register is not yet visible to the
CPU.
4. Since the GIC drivers poll the Ack register, we read Y's interrupt
number and run its handler without a context synchronisation
operation, therefore seeing the stale register value.
In either case, we run the risk of missing an IRQ. This patch solves the
problem by ensuring that we execute an ISB in the GIC drivers prior
to invoking the interrupt handler.
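The shape of the fix can be sketched with the hardware accesses abstracted behind function pointers. Everything here (struct gic_ops, gic_handle_irqs() and the mock_* helpers) is hypothetical scaffolding; only the placement of the ISB between the Ack read and the handler reflects the description above:

```c
#include <stdbool.h>
#include <stddef.h>

#define SPURIOUS_IRQ 1023u  /* GIC interrupt ID meaning "nothing pending" */

struct gic_ops {
    unsigned int (*read_ack)(void);
    void (*isb)(void);
    void (*do_irq)(unsigned int irq);
};

/* Poll the Ack register; synchronise context before every handler call. */
static void gic_handle_irqs(const struct gic_ops *ops)
{
    for ( ;; )
    {
        unsigned int irq = ops->read_ack();

        if ( irq == SPURIOUS_IRQ )
            break;

        ops->isb();        /* status registers are now up to date */
        ops->do_irq(irq);
    }
}

/* Mock hardware used to exercise the loop. */
static const unsigned int mock_queue[] = { 27, 30 };
static size_t mock_next, isb_count, handled;
static bool isb_before_handler = true;

static unsigned int mock_read_ack(void)
{
    return mock_next < 2 ? mock_queue[mock_next++] : SPURIOUS_IRQ;
}

static void mock_isb(void) { isb_count++; }

static void mock_do_irq(unsigned int irq)
{
    (void)irq;
    if ( isb_count != handled + 1 )
        isb_before_handler = false;   /* handler ran without a fresh ISB */
    handled++;
}
```

Placing the ISB inside the polling loop covers both cases: the first interrupt after exception entry and any further interrupt picked up by re-reading the Ack register.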
Julien Grall [Wed, 31 Oct 2018 18:13:13 +0000 (18:13 +0000)]
xen/arm: Move vgic_* helpers from gic.h to vgic.h
Keep vgic_* helpers in a single place. At the same time remove gic.h
from event.h since the helpers have now been moved to vgic.h (included by
domain.h).
Julien Grall [Wed, 31 Oct 2018 18:13:02 +0000 (18:13 +0000)]
xen/arm: Move SYSREG accessors in sysregs.h
System register accessors are self-contained and should not be included
everywhere in Xen. Move the accessors to sysregs.h and include the file
when necessary.
With that change, it is not necessary to include processor.h in time.h.
Julien Grall [Wed, 31 Oct 2018 18:12:57 +0000 (18:12 +0000)]
xen/arm: Consolidate CPU identification in cpufeature.{c,h}
At the moment, CPU identification is spread across cpu.c, cpufeature.c,
processor.h, cpufeature.h. It would be better to keep everything
together in a single place.
Julien Grall [Wed, 31 Oct 2018 18:12:55 +0000 (18:12 +0000)]
xen/arm: Remove __init from prototype
In Xen, it is common to add __init to the definition and not the
prototype. Remove the few __init annotations on prototypes, which avoids
the inclusion of init.h in headers.
With these changes, init.h must now be included in some .c files. Also,
add __init where it was missing from a definition.
x86/hvm: clean up the rest of bool_t from vm_event
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Fri, 9 Nov 2018 12:05:28 +0000 (13:05 +0100)]
pass-through: adjust pIRQ migration
For one it is quite pointless to iterate over all pIRQ-s the domain has
when just one is being adjusted. Introduce hvm_migrate_pirq() as an
externally accessible function.
Additionally it is bogus to migrate the pIRQ to a vCPU different from
the one the event is supposed to be posted to - if anything, it might be
worth considering not to migrate the pIRQ at all in the posting case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Thu, 1 Nov 2018 10:16:58 +0000 (10:16 +0000)]
xen/grant_table: Remove stale comment on top of map_grant_ref
Remove the 2 part comment on top of map_grant_ref:
- The first part mentions the return value, which has been void since
2006!
- The second part mentions a local variable 'addr' which does not
exist anymore.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 9 Nov 2018 10:42:10 +0000 (11:42 +0100)]
cpufreq: convert to a single post-init driver (hooks) instance
This reduces the post-init memory footprint, eliminates a pointless
level of indirection at the use sites, and allows for subsequent
alternatives call patching.
Take the opportunity and also add a name to the PowerNow! instance.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Xin Li [Fri, 9 Nov 2018 10:41:30 +0000 (11:41 +0100)]
xsm: remove printing from set_to_dummy_if_null()
Filling a null hook in the xsm_operations structure with the dummy
module's hook generates a debug message. This becomes boot time spew for
modules like silo, which only set a few hooks of their own. So remove the
printing to avoid boot time spew.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Xin Li <xin.li@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Paul Durrant [Fri, 9 Nov 2018 10:40:12 +0000 (11:40 +0100)]
viridian: introduce struct viridian_page
The 'vp_assist' page is currently an example of a guest page which needs to
be kept mapped throughout the life-time of a guest, but there are other
such examples in the specification [1]. This patch therefore introduces a
generic 'viridian_page' type and converts the current vp_assist/apic_assist
related code to use it. Subsequent patches implementing other enlightenments
can then also make use of it.
This patch also renames the 'vp_assist_pending' field in struct
hvm_viridian_vcpu_context to 'apic_assist_pending' to more accurately
reflect its meaning. The term 'vp_assist' applies to the whole page rather
than just the EOI-avoidance enlightenment. New versions of the specification
have defined data structures for other enlightenments within the same page.
Paul Durrant [Fri, 9 Nov 2018 10:39:27 +0000 (11:39 +0100)]
viridian: define type for the 'virtual VP assist page'
The specification [1] defines a type so we should use it, rather than just
OR-ing and AND-ing magic bits.
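A sketch of what such a type can look like, assuming the MSR layout of an enable bit in bit 0, reserved bits 11:1 and a page frame number in bits 63:12. The union and field names are illustrative, and the bit-field order relies on the usual little-endian GCC/Clang behaviour:

```c
#include <stdint.h>

/* Typed view of the MSR value, replacing open-coded masks and shifts. */
union vp_assist_msr {
    uint64_t raw;
    struct {
        uint64_t enabled:1;
        uint64_t reserved:11;
        uint64_t pfn:48;
    } fields;
};

/* Before: (raw & 1) and (raw >> 12). After: named fields. */
static int vp_assist_enabled(union vp_assist_msr msr)
{
    return msr.fields.enabled;
}

static uint64_t vp_assist_pfn(union vp_assist_msr msr)
{
    return msr.fields.pfn;
}
```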
No functional change.
NOTE: The type defined in the specification does include an anonymous
sub-struct in the page type but, as we currently use only the first
element, the struct declaration has been omitted.
Paul Durrant [Fri, 9 Nov 2018 10:38:03 +0000 (11:38 +0100)]
viridian: separate time related enlightenment implementations...
...into new 'time' module.
This patch reduces the size of the main viridian source module by
moving time related enlightenments into their own source module. This is
done in anticipation of implementation of more such enlightenments and
a desire to not further lengthen the main source module when this work
is done.
While moving the code:
- Move the declaration of HV_REFERENCE_TSC_PAGE from the header file into
the new source module, since it is only used there.
- Clean up a bool_t.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Paul Durrant [Fri, 9 Nov 2018 10:36:52 +0000 (11:36 +0100)]
viridian: separate interrupt related enlightenment implementations...
...into new 'synic' module.
The SynIC (synthetic interrupt controller) is specified [1] to be a super-
set of a virtualized LAPIC, and its definition encompasses all
enlightenments related to virtual interrupt control.
This patch reduces the size of the main viridian source module by giving
these enlightenments their own module. This is done in anticipation of
implementation of more such enlightenments and a desire not to further
lengthen the main source module when this work is done.
Whilst moving the code:
- Fix various style issues.
- Move the MSR definitions into the header (since they are now needed in
more than one source module).
Roger Pau Monne [Thu, 8 Nov 2018 14:23:58 +0000 (15:23 +0100)]
amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
also enable C1E mode. Apply the same workaround as done on PV for a
PVH Dom0, which consists of trapping accesses to the SMI command IO
port and disabling C1E if ACPI is enabled.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 8 Nov 2018 14:59:14 +0000 (15:59 +0100)]
x86/genapic: remove indirection from genapic hook accesses
Instead of loading a pointer at each use site, have a single runtime
instance of struct genapic, copying into it from the individual
instances. The individual instances can this way also be moved to .init
(also adjust apic_probe[] at this occasion).
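The pattern can be sketched as follows. The hook set, the two driver instances and generic_apic_probe() are simplified, hypothetical stand-ins; the point is only that the chosen instance is copied by value into one runtime struct, so use sites call genapic.hook() rather than loading a pointer first:

```c
#include <stddef.h>

struct apic_hooks {
    const char *name;
    int (*probe)(void);
    unsigned int (*to_apicid)(unsigned int cpu);
};

static int probe_no(void)  { return 0; }
static int probe_yes(void) { return 1; }
static unsigned int flat_id(unsigned int cpu) { return 1u << cpu; }
static unsigned int phys_id(unsigned int cpu) { return cpu; }

/* Individual instances: only needed during probing (could live in .init). */
static const struct apic_hooks apic_flat = { "flat", probe_no,  flat_id };
static const struct apic_hooks apic_phys = { "phys", probe_yes, phys_id };
static const struct apic_hooks *const apic_probe[] = {
    &apic_flat, &apic_phys, NULL
};

/* The single runtime instance used at every call site. */
static struct apic_hooks genapic;

static void generic_apic_probe(void)
{
    for ( size_t i = 0; apic_probe[i]; i++ )
        if ( apic_probe[i]->probe() )
        {
            genapic = *apic_probe[i];   /* copy by value, not by pointer */
            return;
        }
}
```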
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 7 Nov 2018 08:35:14 +0000 (09:35 +0100)]
p2m: move p2m-common.h inclusion point
The header is (hence its name) supposed to be a helper for the per-arch
p2m.h files. It was never supposed to be included directly, and for the
purpose of putting common function declarations into the common header
it is more helpful if things like p2m_t are already available at the
inclusion point.
This also undoes parts of 02ede7dc03 ("memory: add
check_get_page_from_gfn() as a wrapper..."), which had been there just
because of the unhelpful original way of including p2m-common.h.
Take the opportunity and also ditch a duplicate public/memory.h from the
ARM header.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Sergey Dyasli [Wed, 7 Nov 2018 08:34:17 +0000 (09:34 +0100)]
mm/page_alloc: make bootscrub happen in idle-loop
Scrubbing RAM during boot may take a long time on machines with lots
of RAM. Add 'idle' option to bootscrub which marks all pages dirty
initially so they will eventually be scrubbed in idle-loop on every
online CPU.
It's guaranteed that the allocator will return scrubbed pages by doing
eager scrubbing during allocation (unless MEMF_no_scrub was provided).
Use the new 'idle' option as the default one.
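A toy model of the scheme, to make the three paths concrete: marking pages dirty at boot, scrubbing from the idle loop, and eager scrubbing at allocation. The page structure, heap size and function names are invented for the sketch, and the value of MEMF_no_scrub is an assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES      4
#define PG_SIZE       16   /* tiny pages keep the sketch small */
#define MEMF_no_scrub 1u   /* flag value is an assumption */

struct page {
    bool free, need_scrub;
    unsigned char data[PG_SIZE];
};

static struct page heap[NR_PAGES];

static void scrub_page(struct page *pg)
{
    memset(pg->data, 0, PG_SIZE);
    pg->need_scrub = false;
}

/* bootscrub=idle: mark every page dirty instead of scrubbing at boot. */
static void bootscrub_idle(void)
{
    for ( size_t i = 0; i < NR_PAGES; i++ )
        heap[i].free = heap[i].need_scrub = true;
}

/* Idle loop body: scrub one dirty free page; false once none is left. */
static bool idle_scrub_one(void)
{
    for ( size_t i = 0; i < NR_PAGES; i++ )
        if ( heap[i].free && heap[i].need_scrub )
        {
            scrub_page(&heap[i]);
            return true;
        }

    return false;
}

/* Allocation scrubs eagerly unless the caller passed MEMF_no_scrub. */
static struct page *alloc_page(unsigned int memflags)
{
    for ( size_t i = 0; i < NR_PAGES; i++ )
        if ( heap[i].free )
        {
            heap[i].free = false;
            if ( heap[i].need_scrub && !(memflags & MEMF_no_scrub) )
                scrub_page(&heap[i]);
            return &heap[i];
        }

    return NULL;
}
```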
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 7 Nov 2018 08:33:24 +0000 (09:33 +0100)]
x86: work around HLE host lockup erratum
XACQUIRE prefixed accesses to the 4Mb range of memory starting at 1Gb
are liable to lock up the processor. Disallow use of this memory range.
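A sketch of the range test such a workaround needs, reading the sizes above as 4 MiB and 1 GiB in bytes (an assumption about the units). The macro and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define HLE_BAD_START (1ULL << 30)                    /* 1 GiB */
#define HLE_BAD_END   (HLE_BAD_START + (4ULL << 20))  /* exclusive */

/* True if the half-open range [start, end) touches the affected window. */
static bool overlaps_hle_bad_range(uint64_t start, uint64_t end)
{
    return start < HLE_BAD_END && end > HLE_BAD_START;
}
```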
Unfortunately the available Core Gen7 and Gen8 spec updates are pretty
old, so I can only guess that they're similarly affected when Core Gen6
is and the Xeon counterparts are, too.
This is part of XSA-282.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Sergey Dyasli [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
x86/domctl: Implement XEN_DOMCTL_get_cpu_policy
This finally (after literally years of work!) marks the point where the
toolstack can ask the hypervisor for the current CPUID configuration of a
specific domain.
Introduce a new flask access vector and update the default policies.
Also extend xen-cpuid's --policy mode to be able to take a domid and dump a
specific domain's CPUID and MSR policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 2 Jul 2018 16:05:33 +0000 (16:05 +0000)]
x86: Introduce struct cpu_policy to refer to a group of individual policies
This is prep work for the following patch - please refer to it as well.
When auditing and manipulating policies, it is necessary to do so with a
complete set of policies, due to the interdependences of the contents. A
containing structure like this will allow for clearer APIs and code.
As a first user, this structure is convenient for the mapping used by
XEN_SYSCTL_get_cpu_policy (implemented in the next patch), and for auditing
(later when XEN_DOMCTL_set_cpu_policy is implemented).
At this point, the distinction between *_max and *_default is introduced into
the ABI. For now, *_default is mapped to *_max, but future development work
will result in *_default being a logical subset of *_max.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:50 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise msr_policy objects
As with CPUID, an architectural form is used for representing the MSR data.
It is expected not to change moving forwards, but does have a 32 bit field
(currently reserved) which can be used compatibly if needs be.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 21 Jun 2018 14:35:49 +0000 (16:35 +0200)]
libx86: Introduce a helper to serialise cpuid_policy objects
The serialised form is made up of the leaf, subleaf and data tuple. As this
is the architectural form, it is expected not to change going forwards.
The serialisation of the Xen/Viridian leaves isn't fully implemented yet. It
is just enough to be bug-compatible with the current DOMCTL_set_cpuid
behaviour, but needs further hypervisor work before the toolstack can sensibly
control these values.
x86_cpuid_copy_to_buffer() is implemented using Xen's regular copy_to_guest
primitives, with an API-compatible memcpy() used for the libxc half of the
build.
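A sketch of the serialised form and of a bounds-checked append, using memcpy() as the userspace-style copy. The struct layout, the helper name copy_leaf_to_buffer() and the choice of -ENOBUFS for a full buffer are assumptions made for the example:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* One serialised entry: the leaf, the subleaf and the four data registers. */
struct cpuid_leaf_entry {
    uint32_t leaf, subleaf;
    uint32_t a, b, c, d;
};

/* Append one entry at *pos; returns 0, or -ENOBUFS if the buffer is full. */
static int copy_leaf_to_buffer(struct cpuid_leaf_entry *buf, uint32_t nr,
                               uint32_t *pos, uint32_t leaf, uint32_t subleaf,
                               const uint32_t regs[4])
{
    struct cpuid_leaf_entry e = {
        leaf, subleaf, regs[0], regs[1], regs[2], regs[3]
    };

    if ( *pos >= nr )
        return -ENOBUFS;

    /* Userspace flavour; a hypervisor build would copy to the guest here. */
    memcpy(&buf[*pos], &e, sizeof(e));
    (*pos)++;
    return 0;
}
```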
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Tue, 6 Nov 2018 15:41:25 +0000 (15:41 +0000)]
tools/dm_depriv: Add first cut RLIMITs
Limit the ability of a potentially compromised QEMU to consume system
resources. Key limits:
- RLIMIT_FSIZE (file size): 256KiB
- RLIMIT_NPROC (after uid changes to a unique uid)
NB that we do not yet set RLIMIT_AS (total virtual memory) or
RLIMIT_NOFILE (number of open files), since these require more care
and/or more coordination with QEMU to implement.
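A sketch of a table-driven way to apply such limits with setrlimit(). Only the RLIMIT_FSIZE value comes from the description above. The struct, the RLIMIT_ENTRY macro shape and apply_rlimits() are illustrative, and further entries would be added to the table in the same way:

```c
#include <stddef.h>
#include <sys/resource.h>

struct rlimit_entry {
    int resource;
    struct rlimit limit;
};

/* Soft and hard limit are set to the same value. */
#define RLIMIT_ENTRY(r, v) \
    { .resource = (r), .limit = { .rlim_cur = (v), .rlim_max = (v) } }

const struct rlimit_entry depriv_rlimits[] = {
    RLIMIT_ENTRY(RLIMIT_FSIZE, 256 * 1024),   /* file size: 256KiB */
};

const size_t nr_depriv_rlimits =
    sizeof(depriv_rlimits) / sizeof(depriv_rlimits[0]);

/* Apply every entry; returns 0 on success, -1 on the first failure. */
int apply_rlimits(void)
{
    for ( size_t i = 0; i < nr_depriv_rlimits; i++ )
        if ( setrlimit(depriv_rlimits[i].resource, &depriv_rlimits[i].limit) )
            return -1;

    return 0;
}
```

apply_rlimits() is meant to run in the child process just before it executes the device model, so the limits never affect the toolstack itself.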
Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
Changes since v4:
- Put global headers before local headers (sugg by Paul)
- Move #undef inside the braces (sugg by Paul)
Changes since v3:
- Align RLIMIT_ENTRY list for easier reading
- Fix wrong format string specifier
- Get rid of some trailing whitespace
Changes since v2:
- Use a macro to define rlimit entries
- Use RLIMIT_NLIMITS as an end-of-list marker, rather than -1
- Various style clean-ups
CC: Ian Jackson <ian.jackson@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Anthony Perard <anthony.perard@citrix.com>