According to the Intel 64 and IA-32 Architectures SDM, Volume 3B,
Appendix B, Intel Nehalem/Westmere processors provide hardware MSRs
that report the core/package C-state residencies. Extend the
sysctl_get_pmstat interface to pass these residencies.
Eliminate redundant ones, fix names (where so far inappropriately
referring to capability structure fields they don't really relate to),
use symbolic names instead of raw numbers, and remove an unusable one.
This matches similar checks done in Linux, since no good can come from
a domain trying to enable both MSI and MSI-X on the same device at the
same time.
x86/cpufreq: pass pointers to cpu masks where possible
This includes replacing the bogus definition of cpumask_test_cpu()
(introduced by c/s 20073) with a Linux compatible one and replacing
the bad uses with cpu_isset().
x86 hvm: Add a new HVMOP to get the current Xen system time
Xen absolute system time, so that it can use SCHEDOP_poll in a
sensible fashion. HVM PV drivers can't use the normal PV clock
because they might have TSC offsets that they don't know about.
iommu: New options iommu=dom-strict and iommu=dom0-passthrough
The former strips dom0 of its usual 1:1 mapping of all memory, and
only provides it with mappings of its own memory, like any other
domain. The latter is a new consistent name for iommu=passthrough.
Currently "make stubdom" on its own fails because it depends on files
being installed by the results of "make tools". This also means that
in some circumstances a parallel "make tools stubdom" (or "make all")
can fail due to races. So make "make stubdom" depend on "make tools"
having completed first.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
The hardware CPUID-levelling features level the feature flags but
don't change the CPU family/model/stepping. Relax the HVM restore
check on family/model/stepping to printk but not veto the load, so
that VMs can be migrated between machines that have been
CPUID-levelled.
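
A minimal sketch of a "warn but don't veto" check, for illustration
only. It is not the Xen code: the function names and the `warn`
callback are hypothetical. The bit layout of the CPUID leaf 1 EAX
signature follows the usual x86 family/model/stepping encoding.

```python
import logging


def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended fields only count for base family 0x6 / 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def check_cpu_signature(saved_eax, host_eax, warn=logging.warning):
    """Report a family/model/stepping mismatch without rejecting the load.

    Always returns True: a mismatch is logged, not treated as an error.
    """
    saved = decode_signature(saved_eax)
    host = decode_signature(host_eax)
    if saved != host:
        warn("CPU signature differs: saved %s, host %s" % (saved, host))
    return True
```

With this shape, a restore between a Nehalem and a Westmere host still
proceeds, and the mismatch shows up only in the log.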
xen: allow HVM save/restore from different changesets
Allow HVM save/restore from different changesets of Xen. The HVM save
records are supposed to be backwards compatible; XenServer
live-migrates between versions of Xen during upgrades.
xen: make the shadow allocation hypercalls include the p2m memory
in the total shadow allocation. This makes the effect of allocation
changes consistent regardless of p2m activity on boot.
Otherwise vcpu_periodic_timer_work() can think the next timer is in
the future (and re-issue it unchanged) while timer_softirq_action()
thinks it's in the past (and fires it immediately), leading to
livelock.
rombios: move the stack to 0x9e000 and protect it with an e820 entry
so that we don't corrupt E820_RAM memory with stack ops in S3 wakeup.
It has to move up so the lowest contiguous RAM area is >= 512KiB.
This relies on the previous fix to let DS != SS
Signed-off-by: Paul Durrant <Paul.Durrant@citrix.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Ian Jackson [Tue, 6 Jul 2010 15:55:49 +0000 (16:55 +0100)]
tools/libxl: allow setting of timer_mode, hpet and vpt_align parameters
Implement parsing for timer_mode, hpet and vpt_align parameters.
These are all HVM-only parameters, and hpet/vpt_align are boolean, so
change their types and place them in the hvm union accordingly. Also,
HPET is x86-only in principle, so make this compile-time conditional
on arch, as viridian already is.
Ian Jackson [Tue, 6 Jul 2010 12:10:14 +0000 (13:10 +0100)]
tools/hotplug: locking.sh script: fix lock directory remains on error bug
_release_lock should be used instead of release_lock.
sigerr is introduced so that it can be redefined by
xen-hotplug-common.sh to a version which writes error status to xenstore.
Ian Jackson [Tue, 6 Jul 2010 10:57:20 +0000 (11:57 +0100)]
tools/xenstore: add XS_RESTRICT operation to C xenstore client libs.
The OCaml xenstored supports the XS_RESTRICT operation, which
deprivileges a dom0 xenstore connection so it can only affect one
domain's entries. Add the relevant definitions to the C libraries
so that callers can use it.
This patch masks the PIC and IOAPIC RTEs before enabling x2APIC, then
unmasks and restores them afterwards. It also actually enables
interrupt remapping before enabling x2APIC, instead of just checking
the interrupt remapping setting. The patch also handles all x2APIC
configuration, including BIOS settings and command-line settings. In
particular, it handles the case where the BIOS hands over in x2APIC
mode (when there is an APIC ID > 255): it checks whether x2APIC is
already enabled by the BIOS and, if so, disables interrupt remapping
and queued invalidation first, then enables them again.
Signed-off-by: Weidong Han <weidong.han@intel.com>
x2APIC/VT-d: improve interrupt remapping and queued invalidation enabling and disabling
x2APIC depends on interrupt remapping, so interrupt remapping needs to
be enabled before x2APIC. Usually x2APIC is not yet enabled
(x2apic_enabled=0) when interrupt remapping is enabled, although it
will be enabled later. The interrupt mode therefore has to be passed
as a parameter to intremap_enable, instead of being derived from
x2apic_enabled. This patch adds a parameter "eim" to intremap_enable
for that purpose. Interrupt remapping and queued invalidation are
already enabled by the time x2APIC is enabled, so there is no need to
enable them again when setting up the iommu. This patch checks whether
interrupt remapping and queued invalidation are already enabled, and
does not enable them again if so. It does the same when disabling:
they are not disabled again if already disabled.
Signed-off-by: Weidong Han <weidong.han@intel.com>
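
The enable/disable behaviour described above can be sketched as
follows. This is a toy model for illustration, not the VT-d driver:
the class, its attributes and the `hw_writes` counter are all
hypothetical. Only the "eim" parameter name comes from the commit
message.

```python
class RemapUnit:
    """Toy model of one remapping hardware unit (hypothetical)."""

    def __init__(self):
        self.qinval_on = False
        self.intremap_on = False
        self.eim = False
        self.hw_writes = 0  # counts simulated register programming

    def enable_qinval(self):
        if self.qinval_on:
            return  # already enabled: nothing to do
        self.hw_writes += 1
        self.qinval_on = True

    def enable_intremap(self, eim):
        # The interrupt mode is passed in by the caller rather than
        # read from a global "x2APIC enabled" flag, which may still
        # be 0 at this point.
        if self.intremap_on:
            return  # already enabled: nothing to do
        self.hw_writes += 1
        self.eim = eim
        self.intremap_on = True

    def disable_intremap(self):
        if not self.intremap_on:
            return  # already disabled: nothing to do
        self.hw_writes += 1
        self.intremap_on = False
```

Calling an enable or disable method twice in a row touches the
simulated hardware only once.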
A drhd is created when the ACPI DMAR table is parsed, but drhd->iommu
is not allocated until iommu setup. However, the iommu is needed by
x2APIC, which enables interrupt remapping before iommu setup. This
patch allocates the iommu when the drhd is created. drhd->ecap can
then be removed because it is the same as iommu->ecap.
Signed-off-by: Weidong Han <weidong.han@intel.com>
VMX: fix ept pages free up when ept superpage split fails:
1) implement ept super page split in a recursive way to
form an ept sub tree before real installation;
2) free an ept sub tree also in a recursive way;
3) change ept_next_level last input parameter from shift
bits # to next walk level.
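
Points 1) and 2) can be sketched as follows. This is an illustration
under stated assumptions, not the EPT code: `PagePool`,
`split_superpage` and `free_subtree` are hypothetical names, tables
are modelled as Python lists, and levels are assumed to be numbered
0 (4KiB), 1 (2MiB), 2 (1GiB) with 512 entries per table.

```python
ENTRIES = 512  # entries per table


class OutOfMemory(Exception):
    pass


class PagePool:
    """Hypothetical allocator with a fixed number of table pages."""

    def __init__(self, pages):
        self.available = pages

    def alloc_table(self):
        if self.available == 0:
            raise OutOfMemory()
        self.available -= 1
        return []

    def free_table(self, table):
        self.available += 1


def free_subtree(pool, node):
    """Recursively free a table and every table below it."""
    if isinstance(node, list):
        for child in node:
            free_subtree(pool, child)
        pool.free_table(node)


def split_superpage(pool, level, target_level, base_frame):
    """Build the sub tree that replaces one superpage entry at `level`.

    Nothing is installed here: the caller installs the returned table
    only once the whole sub tree exists. On allocation failure,
    everything built so far is freed and the exception propagates.
    """
    table = pool.alloc_table()
    child_level = level - 1
    frames_per_child = ENTRIES ** child_level
    try:
        for i in range(ENTRIES):
            frame = base_frame + i * frames_per_child
            if child_level > target_level:
                table.append(
                    split_superpage(pool, child_level, target_level, frame))
            else:
                table.append(("leaf", child_level, frame))
    except OutOfMemory:
        free_subtree(pool, table)
        raise
    return table
```

Because the sub tree is complete before it is installed, a failed
split leaves the original superpage mapping untouched and leaks no
pages.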
This patch enables the AMD OSVW (OS Visible Workaround) feature for
Xen. New AMD errata will have an OSVW id assigned in the future. The OS
is supposed to check the OSVW status MSR to find out whether the CPU
has a specific erratum. Legacy errata are also supported in this patch:
the traditional family/model/stepping approach will be used if the
OSVW feature isn't applicable. This patch is adapted from Hans
Rosenfeld's patch submitted to the Linux kernel.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Hans Rosenfeld <hands.rosenfeld@amd.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
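
A minimal sketch of the decision described above, for illustration
only. The function and parameter names are hypothetical, the MSR
values are passed in as plain integers rather than read from hardware,
and a single 64-bit status register is assumed.

```python
def erratum_applies(osvw_id, osvw_supported, osvw_length, osvw_status,
                    legacy_check):
    """Decide whether the workaround for one erratum is needed.

    osvw_id        -- OSVW id assigned to the erratum
    osvw_supported -- whether the CPU advertises the OSVW feature
    osvw_length    -- number of valid status bits (OSVW id length MSR)
    osvw_status    -- value of the OSVW status MSR
    legacy_check   -- callable doing the family/model/stepping test
    """
    if osvw_supported and osvw_id < osvw_length:
        # The status bit is authoritative for ids below the length.
        return bool((osvw_status >> osvw_id) & 1)
    # OSVW not applicable: fall back to the traditional approach.
    return bool(legacy_check())
```

The legacy check is a callable so that the family/model/stepping
comparison only runs when OSVW cannot answer the question.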
blktap2: make protocol specific usage of shared sring explicit
I don't think protocol specific data really belongs in this header
but since it is already there and we seem to be stuck with it let's at
least make the users explicit lest people get caught out by future new
fields moving the pad field around.
This is the Xen portion of this change. The kernel portion will be
sent separately. There is no dependency between the two.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Daniel Stodden <daniel.stodden@citrix.com>
Cc: Dongxiao Xu <dongxiao.xu@intel.com>
After getting a report of 3.2.3's xenmon crashing Xen (as it turned
out this was because c/s 17000 was backported to that tree without
also applying c/s 17515), I figured that the hypervisor shouldn't rely
on any specific state of the actual trace buffer (as it is shared
writable with Dom0).
[GWD: Volatile qualifiers have been taken out and moved to another
patch]
To make clear what purpose specific variables have and/or where they
got loaded from, the patch also changes the type of some of them to be
explicitly u32/s32, and removes pointless assertions (like checking an
unsigned variable to be >= 0).
I also took the prototype adjustment of __trace_var() as an
opportunity to simplify the TRACE_xD() macros. Similar simplification
could be done on the (quite numerous) direct callers of the function.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Tue, 29 Jun 2010 17:17:44 +0000 (18:17 +0100)]
Use fixed-width types in the memory event interface
Set the types in the public memory_event header file to use
fixed-sized and self-aligned fields rather than "unsigned long". AIUI
this feature only works with 64-bit hypervisors but I think this
change will be necessary to use 32-on-64 dom0 tools.
This breaks compatibility with older builds of the tools, but I can't
see any way to avoid it short of __attribute__((__packed__)).
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Patrick Colp <pjcolp@cs.ubc.ca>
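
To illustrate why "unsigned long" is a problem in a shared interface,
here is a small ctypes sketch. The two structures and their field
names are made up for the example and are not the memory_event
layout.

```python
import ctypes


class EventWithLong(ctypes.Structure):
    # Size and offsets depend on the ABI of whoever builds this:
    # "unsigned long" is 4 bytes in a 32-bit build, 8 in LP64.
    _fields_ = [("flags", ctypes.c_ulong),
                ("gfn", ctypes.c_ulong)]


class EventFixed(ctypes.Structure):
    # Fixed-width fields, ordered so that each one sits at a multiple
    # of its own size: the layout is the same for 32-bit and 64-bit
    # builds.
    _fields_ = [("flags", ctypes.c_uint32),
                ("vcpu_id", ctypes.c_uint32),
                ("gfn", ctypes.c_uint64)]


def layout(struct):
    """Return (total size, {field name: offset}) for a ctypes struct."""
    offsets = {name: getattr(struct, name).offset
               for name, _ in struct._fields_}
    return ctypes.sizeof(struct), offsets
```

`layout(EventFixed)` gives the same answer for a 32-bit tool stack and
a 64-bit hypervisor, whereas `layout(EventWithLong)` does not.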
Ian Jackson [Tue, 29 Jun 2010 15:23:13 +0000 (16:23 +0100)]
tools/pygrub: Fix default when out of range
This patch fixes the pyGrub default value when it is set out of
range. It makes the quiet and interactive modes select the same
default image when the default value for the boot entry is out of
range, i.e. when the guest has a wrong configuration in its boot
loader (e.g. 3 entries with the default mistakenly set to 10).
When the boot entry number is set out of range, it falls back to 0
(the first entry of the boot loader).
Signed-off-by: Michal Novotny <minovotn@redhat.com>
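
A minimal sketch of the fallback rule, for illustration only. This is
not pygrub's code, and the function name is hypothetical.

```python
def effective_default(default, num_entries):
    """Return the index of the boot entry to use.

    Falls back to 0 (the first entry) when `default` is outside the
    range of available entries.
    """
    if 0 <= default < num_entries:
        return default
    return 0
```

For the example in the message, `effective_default(10, 3)` selects
entry 0. Using one helper for both the quiet and the interactive path
is what keeps the two modes in agreement.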
Ian Jackson [Tue, 29 Jun 2010 14:07:17 +0000 (15:07 +0100)]
tools: init.d/xencommons: Wait for xenstored to start before setting dom0 name
On one of my boxes, the xenstore-write setting dom0's name starts
before xenstored is actually ready to handle the connection properly,
resulting in the name set failing. Wait for xenstored to be up and
responding to reads before continuing, timing out after 30 seconds.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
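
The wait-with-timeout logic can be sketched as below. The init script
itself is shell; this Python version only illustrates the loop. The
function name and the `probe` callable (standing in for a xenstore
read that succeeds once xenstored responds) are hypothetical.

```python
import time


def wait_until_ready(probe, timeout=30.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it returns true or `timeout` seconds pass.

    Returns True if the service answered in time, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The caller would only go on to set dom0's name when this returns
True.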
Ian Jackson [Tue, 29 Jun 2010 13:52:51 +0000 (14:52 +0100)]
tools/xend, xm: add a command to get the state of VMs
Add a command "domstate" to get the state of a VM, which is one of
{'shutoff', 'idle', 'shutdown', 'running', 'crashed', 'paused',
'paused by admin'}.
The paused case is split into two conditions: "paused" and "paused by
admin".
"paused by admin" means that the user paused the domain voluntarily,
via "xm pause VM" or the API.
Ian Jackson [Mon, 28 Jun 2010 16:13:43 +0000 (17:13 +0100)]
libxl: Specify no nics to qemu when no emulated nics
qemu will default to one emulated NIC if no network configuration is
specified on the command line. If there are no emulated NICs (i.e.,
no NICs or all NICs are PV), specify no NICs to avoid getting an
emulated NIC by default.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
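
A minimal sketch of the argument construction, for illustration only.
libxl is written in C; the function name and the dict representation
of a NIC are hypothetical. "-net none" and "-net nic,..." are the
legacy qemu network options.

```python
def qemu_net_args(nics):
    """Build the qemu network arguments for a list of NICs.

    Each NIC is a dict with keys 'emulated' (bool), 'mac' and 'model'.
    PV-only NICs produce no qemu arguments of their own.
    """
    args = []
    for nic in nics:
        if nic.get("emulated"):
            args += ["-net",
                     "nic,macaddr=%s,model=%s" % (nic["mac"], nic["model"])]
    if not args:
        # Without this, qemu would create one emulated NIC by default.
        args = ["-net", "none"]
    return args
```

Both an empty NIC list and a list of PV-only NICs yield
`["-net", "none"]`.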