that mmio <-> mmio copy is not handled. This means the code in the
stdvga mmio intercept that explicitly handles mmio <-> mmio copy when
hvm_copy_to/from_guest_phys() fails is never going to be executed.
This patch therefore adds a check in hvmemul_do_io_addr() to make sure
mmio <-> mmio is disallowed and then registers standard mmio intercept ops
in stdvga_init().
With this patch all mmio and portio handled within Xen now goes through
process_io_intercept().
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Jul 2015 17:04:00 +0000 (19:04 +0200)]
x86/hvm: unify dpci portio intercept with standard portio intercept
This patch re-works the dpci portio intercepts so that they can be unified
with standard portio handling thereby removing a substantial amount of
code duplication.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Jul 2015 17:04:00 +0000 (19:04 +0200)]
x86/hvm: unify internal portio and mmio intercepts
The implementation of mmio and portio intercepts is unnecessarily different.
This leads to much code duplication. This patch unifies much of the
intercept handling, leaving only distinct handlers for stdvga mmio and dpci
portio. Subsequent patches will unify those handlers.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Jul 2015 17:04:00 +0000 (19:04 +0200)]
x86/hvm: change portio port numbers and sizes to unsigned int
Building on the previous patch, this patch changes portio port numbers
and sizes to unsigned int which then allows the io_handler size field to
reduce to an unsigned int.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Jul 2015 16:27:16 +0000 (18:27 +0200)]
x86/hvm: remove multiple open coded 'chunking' loops
...in hvmemul_read/write()
Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
to reduce code duplication.
NOTE: This patch also introduces a change in 'chunking' around a page
boundary. Previously (for example) an 8 byte access at the last
byte of a page would get carried out as 8 single-byte accesses.
It will now be carried out as a single-byte access, followed by
a 4-byte access, a 2-byte access and then another single-byte
access.
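The new chunking behaviour described in the NOTE can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name, the `SKETCH_*` constants, the 4 KiB page size and the 8-byte maximum chunk are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4 KiB pages */
#define SKETCH_MAX_CHUNK 8u    /* stands in for sizeof(long) on x86-64 */

/*
 * Hypothetical helper: split an access of 'size' bytes at 'addr' into
 * power-of-two chunks that never cross a page boundary. Chunk sizes are
 * stored in out[] (at most 'max' of them); the total count is returned.
 */
size_t sketch_split_access(uint64_t addr, unsigned int size,
                           unsigned int *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);

    while ( size != 0 )
    {
        /* Bytes left before the end of the current page. */
        unsigned int in_page =
            SKETCH_PAGE_SIZE - (unsigned int)(addr % SKETCH_PAGE_SIZE);
        unsigned int part = size < in_page ? size : in_page;
        unsigned int chunk = SKETCH_MAX_CHUNK;

        addr += part;
        size -= part;

        while ( part != 0 )
        {
            /* Shrink to the largest power of two that still fits. */
            while ( chunk > part )
                chunk >>= 1;

            if ( n < max )
                out[n] = chunk;
            n++;
            part -= chunk;
        }
    }

    return n;
}
```

With these assumptions, an 8-byte access starting at offset 4095 yields chunks of 1, 4, 2 and 1 bytes, matching the sequence given in the NOTE.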
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Keir Fraser <keir@xen.org> Cc: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
For the boot CPU, the socket number is not necessarily 0, so
it needs to be computed from the CPU number.
For secondary CPUs, phys_proc_id is not yet valid in the CPU_PREPARE
notifier (cpu_smpboot_alloc), so cpu_to_socket(cpu) can't be used.
Instead, pre-allocate secondary_cpu_mask in cpu_smpboot_alloc()
and later consume it in smp_store_cpu_info().
This patch also changes the socket_cpumask type from 'cpumask_var_t *'
to 'cpumask_t **' so that smaller NR_CPUS works.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Tested-by: Dario Faggioli <dario.faggioli@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Boris Ostrovsky [Thu, 9 Jul 2015 14:52:31 +0000 (16:52 +0200)]
x86/VPMU: add privileged PMU mode
Add support for privileged PMU mode (XENPMU_MODE_ALL), which allows the
privileged domain (dom0) to profile both itself (and the hypervisor) and the
guests. While this mode is on, profiling in guests is disabled.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 14:48:00 +0000 (16:48 +0200)]
x86/VPMU: handle PMU interrupts for PV(H) guests
Add support for handling PMU interrupts for PV(H) guests.
VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush
hypercall. This allows the guest to access PMU MSR values that are stored in
VPMU context which is shared between hypervisor and domain, thus avoiding
traps to hypervisor.
Since the interrupt handler may now force VPMU context save (i.e. set
VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which
until now expected this flag to be set only when the counters were stopped.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
CentOS7 complains that 'ret' might be unused, and indeed this is the case for
`xl psr-hwinfo --cat`.
The logic for selecting which information to print was rather awkward.
Introduce a new 'all' flag which defaults to true, and is cleared if specific
options are selected. This allows for far clearer logic when choosing
whether to print information or not.
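The 'all' pattern can be sketched like this. It is only an illustration under assumptions: the structure, the function name and the `--cmt` option string are hypothetical stand-ins, and the real xl option parsing is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result: which sections should be printed. */
struct sketch_sel {
    bool cmt;
    bool cat;
};

struct sketch_sel sketch_select(int argc, const char *const *argv)
{
    /* 'all' starts out true and is cleared by any specific option. */
    bool all = true, cmt = false, cat = false;

    assert(argc == 0 || argv != NULL);

    for ( int i = 0; i < argc; i++ )
    {
        if ( !strcmp(argv[i], "--cmt") )
        {
            all = false;
            cmt = true;
        }
        else if ( !strcmp(argv[i], "--cat") )
        {
            all = false;
            cat = true;
        }
    }

    /* Each section is printed if everything or that section was requested. */
    return (struct sketch_sel){ .cmt = all || cmt, .cat = all || cat };
}
```

Every print decision then reduces to a test of the form `all || <specific flag>`, so no code path can leave the result variable unset.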
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Chao Peng <chao.p.peng@linux.intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Chao Peng <chao.p.peng@linux.intel.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:55:32 +0000 (13:55 +0200)]
VPMU/AMD: check MSR values before writing to hardware
A number of fields of PMU control MSRs are defined as Reserved. AMD
documentation requires that such fields are preserved when the register
is written by software.
Add checks to amd_vpmu_do_wrmsr() to make sure that guests don't attempt
to modify those bits.
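A check of this kind might look like the sketch below. The function name and the idea of passing the reserved mask as a parameter are assumptions for illustration; the actual masks used by amd_vpmu_do_wrmsr() are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when 'new_val' leaves every bit
 * covered by 'reserved_mask' exactly as it is in 'old_val'.
 */
bool sketch_reserved_bits_preserved(uint64_t old_val, uint64_t new_val,
                                    uint64_t reserved_mask)
{
    /* XOR exposes the changed bits; the mask keeps only the reserved ones. */
    return ((old_val ^ new_val) & reserved_mask) == 0;
}
```

A write that fails this test would be rejected instead of being passed on to hardware.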
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:53:55 +0000 (13:53 +0200)]
x86/VPMU: add support for PMU register handling on PV guests
Intercept accesses to PMU MSRs and process them in VPMU module. If vpmu ops
for VCPU are not initialized (which is the case, for example, for PV guests that
are not "VPMU-enlightened") access to MSRs will return failure.
Dump VPMU state for all domains (HVM and PV) when requested.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:53:03 +0000 (13:53 +0200)]
x86/VPMU: when handling MSR accesses, leave fault injection to callers
The hypervisor cannot easily inject faults into PV guests from the arch-specific
VPMU read/write MSR handlers (unlike in the case of HVM guests).
With this patch vpmu_do_msr() will return an error code to indicate whether an
error was encountered during MSR processing (instead of stating that the access
was to a VPMU register). The caller will then decide how to deal with the error.
As part of this patch we also check for validity of certain MSR accesses right
when we determine which register is being written, as opposed to postponing this
until later.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:51:42 +0000 (13:51 +0200)]
x86/VPMU: save VPMU state for PV guests during context switch
Save VPMU state during context switch for both HVM and PV(H) guests.
A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible
for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs
is_running to be correctly set/cleared. To prepare for that, call context_saved()
before vpmu_switch_to() is executed. (Note that while this change could have
been delayed until that later patch, the changes are harmless to existing code
and so we do it here.)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:50:22 +0000 (13:50 +0200)]
x86/VPMU: initialize PMU for PV(H) guests
Code for initializing/tearing down PMU for PV guests
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:39:53 +0000 (13:39 +0200)]
x86/VPMU: interface for setting PMU mode and flags
Add runtime interface for setting PMU mode and flags. Three main modes are
provided:
* XENPMU_MODE_OFF: PMU is not virtualized
* XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
* XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
can profile itself and the hypervisor.
Note that PMU modes are different from what can be provided at Xen's boot line
with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
Any other value, on the other hand, will cause VPMU mode to be set to
XENPMU_MODE_SELF during boot.
For feature flags only Intel's BTS is currently supported.
Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Boris Ostrovsky [Thu, 9 Jul 2015 11:34:29 +0000 (13:34 +0200)]
x86/VPMU: add public xenpmu.h
Add pmu.h header files, move various macros and structures that will be
shared between hypervisor and PV guests to it.
Move MSR banks out of architectural PMU structures to allow for larger sizes
in the future. The banks are allocated immediately after the context and
PMU structures store offsets to them.
While making these updates, also:
* Remove unused vpmu_domain() macro from vpmu.h
* Convert msraddr_to_bitpos() into an inline and make it a little faster by
realizing that all Intel's PMU-related MSRs are in the lower MSR range.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/arm: gic-v3: Add support of vGICv2 when available
* Modify the GICv3 driver to recognize such a device. I wasn't able
to find a register which tells whether GICv2 is supported on GICv3. The only
way to find out seems to be to check whether the DT node provides GICC and GICV.
* Disable access to ICC_SRE_EL1 to guest using vGICv2
* The LR is slightly different for vGICv2. The interrupt is always
injected with group0.
* Add a comment explaining why Group1 is used for vGICv3.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
A platform may have a GIC compatible with a previous version of the
device.
This allows an unmodified OS to be virtualized on new hardware if the GIC
is compatible with an older version.
When a guest is created, the vGIC will emulate the same version as the
hardware. However, the user can specify the preferred version in the
configuration file (currently only GICv2 and GICv3 are supported).
Signed-off-by: Julien Grall <julien.grall@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(libxl_gic_version is an enum). This is not currently supported by the
ocaml genwrap.py. Add a simple pass which handles simple anonymous
unions as top level members of a struct type, but not more deeply
nested since that would be a much more complex change and is not
currently required.
With Julien's patch applied the relevant resulting change to the .mli
is:
     type type__union = Hvm of type_hvm | Pv of type_pv | Invalid
+    type arch_arm__anon = {
+        gic_version : gic_version;
+    }
+
     type t =
     {
         max_vcpus : int;
@@ -510,6 +522,7 @@ module Domain_build_info : sig
         ramdisk : string option;
         device_tree : string option;
         xl_type : type__union;
+        arch_arm : arch_arm__anon;
     }
     val default : ctx -> ?xl_type:domain_type -> unit -> t
   end
The .ml differs similarly. Without Julien's patch there is no change.
gen_struct is refactored slightly to take the indent level as an
argument, since it is now used at a different level.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Julien Grall <julien.grall@citrix.com> Cc: Dave Scott <Dave.Scott@citrix.com> Cc: Rob Hoes <Rob.Hoes@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Rob Hoes <rob.hoes@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
If hdtype=ahci is set, add an ich9 disk controller in AHCI mode and use it
with upstream qemu to emulate disks instead of IDE.
It doesn't support cdroms, which still use IDE (cdroms will use
"-device ide-cd" as the new qemu parameter).
AHCI requires the new qemu parameter, but for now the other emulated disk
cases remain with the old ones (I did that in another patch, not needed by
this one).
I did it as a libxl parameter, disabled by default, to avoid possible
problems:
- with save/restore/migration (restoring with AHCI a domU that was using
IDE instead)
- windows < 8 without pv drivers (a registry key change is needed for an
AHCI<->IDE change, AFAIK, to avoid a possible blue screen)
- windows XP or older, which may not support AHCI by default.
Setting AHCI via a libxl parameter that defaults to disabled seems the best
solution.
AHCI increases HVM domU boot performance. On a Linux HVM domU I saw up to
only 20% of the previous total boot time, whereas boot time decreased a
lot on W7 domUs for most of the boots I have done. There is only a small
difference in boot time compared to IDE mode on W8 and newer (probably other
Xen improvements or fixes, not AHCI related, are needed).
Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- adjust name of LIBXL_HAVE #define as discussed on list,
fixup pod syntax in xl.cfg.pod.5 ]
These are the xc/xl changes to support Intel Cache Allocation
Technology (CAT).
'xl psr-hwinfo' is updated to show CAT info and two new commands
for CAT are introduced:
- xl psr-cat-cbm-set [-s socket] <domain> <cbm>
Set cache capacity bitmasks(CBM) for a domain.
- xl psr-cat-show <domain>
Show CAT domain information.
Examples:
[root@vmm-psr vmm]# xl psr-hwinfo --cat
Cache Allocation Technology (CAT):
Socket ID : 0
L3 Cache : 12288KB
Maximum COS : 15
CBM length : 12
Default CBM : 0xfff
[root@vmm-psr vmm]# xl psr-cat-cbm-set 0 0xff
[root@vmm-psr vmm]# xl psr-cat-show
Socket ID : 0
L3 Cache : 12288KB
Default CBM : 0xfff
ID NAME CBM
0 Domain-0 0xff
vm_event: rename MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE
By naming, placing and bit shift convention, it could be taken as
implied that MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE are
mem_access event specific flags (instead of being generally
applicable as vm_event flags). This patch renames them to
VM_EVENT_FLAG_EMULATE and VM_EVENT_FLAG_EMULATE_NOWRITE
respectively, and uses bit shifts following the rest of the
VM_EVENT_FLAG_ constants.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tlengyel@novetta.com>
more specifically, of all the symbols and references
to it.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
Unlikely though it may seem, localtime_r could fail, which would result in a
null pointer dereference. In this case, it should log the errno (instead of
the date/time) and continue its logging, as this is still useful.
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 15 Jun 2015 13:50:42 +0000 (14:50 +0100)]
xl: Sane handling of extra config file arguments
Various xl sub-commands take additional parameters containing = as
additional config fragments.
The handling of these config fragments has a number of bugs:
1. Use of a static 1024-byte buffer. (If truncation would occur,
with semi-trusted input, a security risk arises due to quotes
being lost.)
2. Mishandling of the return value from snprintf, so that if
truncation occurs, the to-write pointer is updated with the
wanted-to-write length, resulting in stack corruption. (This is
XSA-137.)
3. Clone-and-hack of the code for constructing the appended
config file.
These are fixed here, by introducing a new function
`string_realloc_append' and using it everywhere. The `extra_info'
buffers are replaced by pointers, which start off NULL and are
explicitly freed on all return paths.
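A growable-string append of the kind described can be sketched as below. This is an assumption-laden illustration, not the `string_realloc_append' from the patch: the name `sketch_realloc_append` and the choice to report failure through a return value are both hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append 'more' to the heap string *accumulate, which
 * is either NULL or a malloc'd NUL-terminated string. Returns 0 on success
 * and -1 on failure, in which case *accumulate is left untouched.
 */
int sketch_realloc_append(char **accumulate, const char *more)
{
    size_t oldlen, morelen;
    char *grown;

    assert(accumulate != NULL && more != NULL);

    oldlen = *accumulate ? strlen(*accumulate) : 0;
    morelen = strlen(more) + 1; /* include the terminating NUL */

    /* Refuse lengths whose sum would overflow size_t. */
    if ( morelen > SIZE_MAX - oldlen )
        return -1;

    grown = realloc(*accumulate, oldlen + morelen);
    if ( grown == NULL )
        return -1;

    /* Copy the new fragment, NUL included, over the old terminator. */
    memcpy(grown + oldlen, more, morelen);
    *accumulate = grown;
    return 0;
}
```

Because the buffer grows to fit, nothing is truncated, and the write offset comes from the real string length instead of from an snprintf return value.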
The separate variable which will become dom_info.extra_config is
abolished (which involves moving the clearing of dom_info).
Additional bugs I observe, not fixed here:
4. The functions which now call string_realloc_append use ad-hoc
error returns, with multiple calls to `return'. This currently
necessitates multiple new calls to `free'.
5. Many of the paths in xl call exit(-rc) where rc is a libxl status
code. This is a ridiculous exit status `convention'.
6. The loops for handling extra config data are clone-and-hacks.
7. Once the extra config buffer is accumulated, it must be combined
with the appropriate main config file. The code to do this
combining is clone-and-hacked too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anthony PERARD [Tue, 7 Jul 2015 15:09:13 +0000 (16:09 +0100)]
libxl: Increase device model startup timeout to 1min.
On a busy host, QEMU may take more than 10s to load and start.
This is likely due to a bug in Linux where the I/O subsystem sometimes
produces high latency under load, resulting in QEMU taking a long time to
load every single dynamic library.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Since XSA-131, qemu-xen restricts access to PCI cfg by default. In
order to allow local configuration, the existing libxl_device_pci
"permissive" flag needs to be plumbed through via the new QMP property
added by the XSA-131 patches.
Versions of QEMU prior to XSA-131 did not support this permissive
property, so we only pass it if it is true. Older versions only
supported permissive mode.
qemu-xen-traditional already supports the permissive mode setting via
xenstore.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Anthony PERARD <anthony.perard@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:39 +0000 (11:51 +0100)]
libxl: Remove linux udev rules
They are no longer needed, having been replaced by a daemon for
driverdomains which will run scripts as necessary.
Worse yet, they seem to be broken for script-based block devices, such
as block-iscsi. This wouldn't matter so much if they were never run
by default; but if you run block-attach without having created a
domain, then the appropriate node to disable running udev scripts will
not have been written yet, and the attach will silently fail.
Rather than try to sort out that issue, just remove them entirely.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 6 Jul 2015 10:51:38 +0000 (11:51 +0100)]
libxl: Make local_initiate_attach more rational
There are a lot of paths through
libxl__device_disk_local_initiate_attach(), but they all really boil
down to one thing: Can we just access the file directly, or do we need
to attach it?
The requirements for direct access are fairly simple:
* Is this local (as opposed to a driver domain)?
* Is this a raw format (as opposed to cooked)?
* Does this have no scripts associated with it?
If it meets all those requirements, we can access it directly;
otherwise we need to attach it.
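The three requirements reduce to a single predicate, sketched here with a hypothetical structure and field names (the real code works on libxl disk structures, which are not reproduced):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a disk for this decision only. */
struct sketch_disk {
    bool backend_is_local; /* false when served by a driver domain */
    bool format_is_raw;    /* false for "cooked" formats */
    const char *script;    /* NULL when no script is associated */
};

/* Direct access is possible only when all three requirements hold. */
bool sketch_can_access_directly(const struct sketch_disk *d)
{
    assert(d != NULL);
    return d->backend_is_local && d->format_is_raw && d->script == NULL;
}
```

Any disk failing the predicate takes the attach path, which is why a disk with a script is attached and not opened directly.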
This fixes a bug where bootloader execution fails for disks with
hotplug scripts.
This should fix a theoretical bug when using a qdisk backend in a
driver domain. (Not tested.)
Based on a patch by Roger Pau Monne <roger.pau@citrix.com>.
Wei Liu [Mon, 6 Jul 2015 13:47:40 +0000 (14:47 +0100)]
libxc: fix PV vNUMA guest memory allocation
In 415b58c1 (tools/libxc: Batch memory allocations for PV guests) the
number of super pages is calculated with the number of total pages. That
is wrong. It breaks PV guest vNUMA. The correct number of super pages
should be derived from the number of pages within that virtual NUMA
node.
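As a worked illustration of the difference, assume 4 KiB pages and 2 MiB super pages, so 512 pages per super page. The names below are hypothetical and only the arithmetic is shown.

```c
#include <stdint.h>

#define SKETCH_SUPERPAGE_2MB_SHIFT 9 /* 512 4 KiB pages per 2 MiB superpage */

/* Whole super pages that fit in one virtual NUMA node of 'node_pages'. */
uint64_t sketch_superpages_for_node(uint64_t node_pages)
{
    return node_pages >> SKETCH_SUPERPAGE_2MB_SHIFT;
}
```

For a guest with two virtual nodes of 1024 pages each, the per-node count is 2. Using the guest total of 2048 pages would give 4, twice what either node can actually hold.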
Also change the name and type of super page variable to match the naming
convention and type of normal page variable. Make the necessary
adjustment to make code compile.
Reported-by: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-and-Tested-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 1 Jul 2015 14:43:07 +0000 (15:43 +0100)]
xen: earlycpio: Pull in latest linux earlycpio.[ch]
AFAICT our current version does not correspond to any version in the
Linux history. This commit resynchronised to the state in Linux
commit 598bae70c2a8e35c8d39b610cca2b32afcf047af.
Differences from upstream: find_cpio_data is __init, printk instead of
pr_*.
This appears to fix Debian bug #785187. "Appears" because my test box
happens to be AMD and the issue is that the (valid) cpio generated by
the Intel ucode is not liked by the old Xen code. I've tested by
hacking the hypervisor to look for the Intel path.
Reported-by: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stephan Seitz <stse+debianbugs@fsing.rootsland.net> Cc: 785187@bugs.debian.org Acked-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:18 +0000 (09:46 +0100)]
xen: arm: consolidate mmio and irq mapping to dom0
The code in the callbacks for dt_for_each_irq_map and
dt_for_each_range is very similar to the code in handle_device for
each non-pci device.
In fact the only major difference is that the irq callback needs to
call irq_set_spi_type in the PCI case. Refactor into a
map_dt_irq_to_domain callback which does the irq_set_spi_type and then
calls map_irq_to_domain which is also used from handle_device.
For mmio map_range_to_domain can already be used directly from
handle_device too. Note that the uses of PAGE_MASK in the
handle_device code here were unnecessary (and already removed from the
map_range_to_domain variant).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:17 +0000 (09:46 +0100)]
xen: arm: Import of_bus PCI entry from Linux (as a dt_bus entry)
This provides specific handlers for the PCI bus relating to matching
and translating. It's mostly similar to the defaults but includes some
additional error checks and other PCI specific bits.
There are some subtle differences in how the generic code vs. the pci
specific code here will handle buggy DTs (i.e. #*-cells which are not
as required by the pci bindings). This will mean we tolerate such
device trees better.
I say "buggy", but actually it's not clear to me from reading "PCI Bus
Binding to Open Firmware" that such DTs are wrong when the device_type is
"pci": e.g. the text says "The value of "#address-cells" for PCI Bus Nodes is
3." and not "A PCI Bus Node must contain a #address-cells property
containing 3", iow the #address-cells might validly be implicit rather
than an actual property. Maybe that interpretation is bogus, but with
this patch we are able to cope with DTs written by people who do
read it like that.
It also gets us the ability to parse the flags (cacheability),
although at the moment we only check them for validity rather than use
them.
Functions/types renamed and reindented (because apparently we do
that for these).
Needs a selection of IORESOURCE_* defines, which I've taken from Linux
and have included locally for now until we figure out where else they
might be needed.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:16 +0000 (09:46 +0100)]
xen: arm: map child MMIO and IRQs to dom0 for PCI bus DT nodes.
This uses the dt_for_each_{irq_map,range} helpers to map the interrupt
and child MMIO regions to dom0. Since PCI busses are enumerable these
resources may not be otherwise described in the DT (although they can
be).
Although PCI is the only bus we handle this way the code should be
generic enough to apply to similar buses in the future.
This replaces the xgene specific mapping. Tested on Mustang and on a
model with a PCI virtio controller.
This patch doesn't stop recursing when it finds such a node, since
double mapping these resources if they do happen to be described is
(or should be) harmless
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:15 +0000 (09:46 +0100)]
xen: arm: drop redundant extra call to vgic_reserve_virq
This is only needed if we are giving the IRQ to dom0 (as opposed to
setting it up for passthrough due to xen,passthrough property). There
is already a call to vgic_reserve_virq inside the if ( need_mapping ),
so drop this one.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:14 +0000 (09:46 +0100)]
xen: dt: add dt_for_each_range helper
This function iterates over a node's ranges property and calls a
callback for each region. For now it only supplies the MMIO range (in
terms of CPU addresses, i.e. already translated).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 08:46:13 +0000 (09:46 +0100)]
xen: dt: add dt_for_each_irq_map helper
This function iterates over a node's interrupt-map property and calls a
callback for each interrupt. For now it only supplies the translated
IRQ since my use case has no need of e.g. child unit address. These
can be added as needed by any future users.
This follows much the same logic as dt_irq_map_raw when parsing the
interrupt-map, but doesn't walk up the tree doing the actual
translation and it iterates over all entries instead of just looking
for the first match.
I looked into refactoring dt_irq_map_raw but I couldn't find a way
which I was confident in, plus I was reluctant to diverge from the
Linux roots of this function any further.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com>
Ian Campbell [Tue, 7 Jul 2015 11:51:54 +0000 (12:51 +0100)]
tools: Rerun autogen.sh with Jessie version of autoconf
I have upgraded the box which I use to do committing (and hence run
autogen.sh on) from Debian Wheezy to Jessie, resulting in an upgrade
from autoconf 2.69-1 to 2.69-8. To avoid noise from this transition
when the next configure.ac change occurs, regenerate those files now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
For CAT, COS is maintained in the hypervisor only, while CBM is exposed to
user space directly to allow getting/setting a domain's cache capacity.
For each specified CBM, the hypervisor will either use an existing COS which
has the same CBM or allocate a new one if the same CBM is not found. If
the allocation fails because not enough COS are available, an error is
returned. Getting/setting always operates on a specified socket.
On multi-socket systems, the interface may be called several times.
In Xen's implementation, the CAT enforcement granularity is per domain.
Because the length of the CBM and the number of COS may differ between
sockets, each domain has a COS ID for each socket. The domain gets COS=0 by
default, and at runtime its COS is allocated dynamically when the user
specifies a CBM for the domain.
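The lookup-or-allocate step can be sketched as below for a single socket. The structure, the function name and the treatment of COS 0 as the always-present default are assumptions made for illustration, not the hypervisor's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket table entry, indexed by COS. */
struct sketch_cos_entry {
    uint64_t cbm;
    unsigned int ref; /* number of domains currently using this COS */
};

/*
 * Return a COS whose CBM equals 'cbm', reusing an in-use entry when one
 * matches and otherwise claiming the first free entry. COS 0 is the
 * default and is never handed out as a "free" slot. Returns -1 when no
 * COS is available.
 */
int sketch_cos_get(struct sketch_cos_entry *map, unsigned int cos_max,
                   uint64_t cbm)
{
    int free_cos = -1;
    unsigned int cos;

    assert(map != NULL);

    for ( cos = 0; cos <= cos_max; cos++ )
    {
        /* Reuse: the entry is live (or is the default) and CBMs match. */
        if ( (cos == 0 || map[cos].ref != 0) && map[cos].cbm == cbm )
        {
            map[cos].ref++;
            return (int)cos;
        }

        /* Remember the first unused non-default entry. */
        if ( cos != 0 && map[cos].ref == 0 && free_cos < 0 )
            free_cos = (int)cos;
    }

    if ( free_cos < 0 )
        return -1;

    map[free_cos].cbm = cbm;
    map[free_cos].ref = 1;
    return free_cos;
}
```

Two domains asking for the same CBM end up sharing one COS with a reference count of 2, which is how a limited number of COS can serve many domains.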
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
For each socket, a COS to CBM mapping structure is maintained for each
COS. The mapping is indexed by COS and the value is the corresponding
CBM. Different VMs may use the same CBM, so a reference count is used to
indicate if the CBM is available.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Detect the Intel Cache Allocation Technology (CAT) feature and store the
cpuid information for later use. Currently only L3 cache allocation is
supported. The L3 CAT features may vary among sockets, so per-socket
feature information is stored. The initialization can happen either at
boot time or when CPUs are hot-plugged after booting.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Use -1 as notifier priority. Fix typos.
Paul Durrant [Tue, 7 Jul 2015 12:40:04 +0000 (14:40 +0200)]
x86/hvm: make sure emulation is retried if domain is shutting down
The addition of commit 2df1aa01 "x86/hvm: remove hvm_io_pending() check
in hvmemul_do_io()" causes a problem in migration because I/O that was
caught by the test of vcpu_start_shutdown_deferral() in
hvm_send_assist_req() is now considered completed rather than requiring
a retry.
This patch fixes the problem by having hvm_send_assist_req() return
X86EMUL_RETRY rather than X86EMUL_OKAY if the
vcpu_start_shutdown_deferral() test fails and then making sure that
the emulation state is reset if the domain is found to be shutting
down.
Reported-by: Don Slutz <don.slutz@gmail.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 12:39:40 +0000 (14:39 +0200)]
x86/hvmloader: improve error handling for xenbus interactions
Consume and ignore all XS_DEBUG packets, and pass the response type back to
the caller of xenbus_recv() so the caller can take appropriate action if an
unexpected reply was received.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jul 2015 12:39:27 +0000 (14:39 +0200)]
x86/hvmloader: avoid data corruption with xenstore reads/writes
The functions ring_read() and ring_write() have logic to try and deal with
partial reads and writes.
However, in all cases where the "while (len)" loop executed twice, data
corruption would occur as the second memcpy() starts from the beginning of
"data" again, rather than from where it got to.
This bug manifested itself as protocol corruption when a reply header crossed
the first wrap of the response ring. However, similar corruption would also
occur if hvmloader observed xenstored performing partial writes of the block
in question, or if hvmloader had to wait for xenstored to make space in either
ring.
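The essence of the fix, advancing the source pointer on every pass of the loop, can be sketched as follows. The ring layout, its 8-byte size and all names are hypothetical, and waiting for the consumer to free space is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RING_SIZE 8u /* hypothetical, tiny so that wrapping is easy */

struct sketch_ring {
    uint8_t buf[SKETCH_RING_SIZE];
    uint32_t prod; /* free-running producer index */
};

/* Copy 'len' bytes into the ring, splitting the copy where it wraps. */
void sketch_ring_write(struct sketch_ring *r, const void *data, size_t len)
{
    const uint8_t *src = data;

    assert(r != NULL && (data != NULL || len == 0));

    while ( len )
    {
        size_t off = r->prod % SKETCH_RING_SIZE;
        size_t part = SKETCH_RING_SIZE - off; /* room before the wrap */

        if ( part > len )
            part = len;

        memcpy(r->buf + off, src, part);

        /* Without this advance, a second pass would copy from the start. */
        src += part;
        r->prod += (uint32_t)part;
        len -= part;
    }
}
```

Writing 4 bytes with the producer index 2 bytes short of the wrap point takes two passes, and the second pass must pick up at the third source byte.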
Reported-by: Adam Kucia <djexit@o2.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
credit1: properly deal with pCPUs not in any cpupool
Ideally, the pCPUs that are 'free', i.e., not assigned
to any cpupool, should not be considered by the scheduler
for load balancing or anything. In Credit1, we fail at
this, because of how we use cpupool_scheduler_cpumask().
In fact, for a free pCPU, cpupool_scheduler_cpumask()
returns a pointer to cpupool_free_cpus, and hence, near
the top of csched_load_balance():
if ( unlikely(!cpumask_test_cpu(cpu, online)) )
goto out;
is false (the pCPU _is_ free!), and we therefore do not
jump to the end right away, as we should. This causes
the following splat when resuming from ACPI S3 with
pCPUs not assigned to any pool:
The cure is:
* use cpupool_online_cpumask(), as a better guard to the
case when the cpu is being offlined;
* explicitly check whether the cpu is free.
SEDF is in a similar situation, so fix it too.
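The two-part guard listed above can be sketched as follows. This is only an illustration under stated assumptions: a 64-bit word stands in for Xen's cpumask type (so cpu must be below 64), and mask_t, mask_test() and skip_balance() are hypothetical names, not the scheduler's API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t; /* hypothetical stand-in for a cpumask */

/* Test whether bit 'cpu' is set in mask 'm' (cpu must be < 64). */
static bool mask_test(unsigned int cpu, mask_t m)
{
    return (m >> cpu) & 1;
}

/*
 * Return true when load balancing should be skipped for 'cpu':
 * either it is not online in its pool (e.g. it is being offlined),
 * or it is free, i.e. not assigned to any cpupool.
 */
static bool skip_balance(unsigned int cpu, mask_t pool_online,
                         mask_t free_cpus)
{
    return !mask_test(cpu, pool_online) || mask_test(cpu, free_cpus);
}
```

Checking the free mask explicitly is what keeps a free pCPU from passing the guard merely because it appears in the mask it was tested against.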
Still in Credit1, we must make sure that free (or offline)
CPUs are not considered "ticklable". Not doing so would impair
the load balancing algorithm, making the scheduler think that
it is possible to 'ask' the pCPU to pick up some work, while
in reality, that will never happen! Evidence of such behavior
is shown in this trace:
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown
In fact, when a pCPU goes down, we want to clear its
bit in the correct cpupool's valid mask, rather than
always in cpupool0's one.
Before this commit, all the pCPUs in the non-default
pool(s) will be considered immediately valid, during
system resume, even the ones that have not been brought
up yet. As a result, the (Credit1) scheduler will attempt
to run its load balancing logic on them, causing the
following Oops:
The reason why the error is a #GP fault is that, without
this commit, we try to access the per-cpu area of a not
yet allocated and initialized pCPU.
In fact, %rax, which is what is used as pointer, is 80007d2f7fccb780, and we also have this:
When dumping scheduling information (debug key 'r'), what
we print as 'Idle cpupool' is pretty much the same as what
we print immediately after as 'Cpupool0'. In fact, if there
are no pCPUs outside of any cpupools, it is exactly the
same.
If there are free pCPUs, there is some valuable information,
but still a lot of duplication:
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 7 Jul 2015 08:39:52 +0000 (10:39 +0200)]
x86/nHVM: generic hook adjustments
Some of the generic hooks were unused altogether - drop them.
Some of the hooks were used only to handle calls from the specific
vendor's code (SVM) - drop them too.
Several more hooks were pointlessly implemented as out-of-line
functions, when most (all?) other HVM hooks use inline ones - make
them inlines. None of them are implemented by only one of SVM or VMX,
so also drop the conditionals. Funnily nhvm_vmcx_hap_enabled(), having
return type bool_t, nevertheless returned -EOPNOTSUPP.
nhvm_vmcx_guest_intercepts_trap() and its hook and implementations are
being made return bool_t, as they should have been from the beginning
(its sole caller only checks for a non-zero result).
Finally, make static whatever can as a result be static.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Tue, 7 Jul 2015 08:34:13 +0000 (10:34 +0200)]
x86: drop is_pv_32on64_domain()
... as being identical to is_pv_32bit_domain() after the x86-32
removal.
In a few cases this includes no longer open-coding is_pv_32bit_vcpu().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:30:12 +0000 (10:30 +0200)]
gnttab: clean up gnttab_set_version()
- drop pointless nr_grant_entries() check from loop over reserved
entries (adding suitable BUILD_BUG_ON()s to validate that)
- adjust types
- rename d to currd
- formatting
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 7 Jul 2015 08:28:25 +0000 (10:28 +0200)]
gnttab: fix out of range shift count
Commit 213f145114 ("gnttab: fix/adjust gnttab_transfer()") wasn't
careful enough in this regard.
Coverity ID: 1306859 Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxc: Fix misleading use of strncpy code in build_hvm_info()
hvm_info->signature is not a string, but a 64-bit int, and is not
NULL terminated. The use of strncpy to populate it is inappropriate and
potentially misleading. A cursory glance might have you thinking someone
had miscounted the length of the string literal - not realising it was
intentionally cropping off the null termination.
Also, since we wish to initialise all of hvm_info->signature, and
certainly no more, the use of sizeof is safer.
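A minimal sketch of the pattern argued for above, assuming an 8-byte signature field and the literal "HVM INFO" (both are assumptions made for illustration; struct info_sketch and set_signature() are hypothetical and do not reproduce the libxc structure):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table header: the signature is 8 raw bytes, not a string. */
struct info_sketch {
    char signature[8];
    uint32_t length;
};

static void set_signature(struct info_sketch *info)
{
    /*
     * The literal is exactly 8 characters long, so sizeof() copies all of
     * the field and nothing more; the literal's trailing NUL is not copied.
     */
    memcpy(info->signature, "HVM INFO", sizeof(info->signature));
}
```

Using memcpy() with sizeof makes it explicit that a fixed-size field is being filled, so a reader is not left wondering whether a string length was miscounted.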
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxc: Prevent dereferencing NULL pointers returned from xc_dom_allocate()
The return from xc_dom_allocate is not checked for a NULL value.
This patch fixes this, causing it to return from the function with an error.
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>