Keir Fraser [Tue, 1 Jun 2010 09:56:07 +0000 (10:56 +0100)]
x86 mtrr: Remove (noop) lock_cpu_hotplug().
CPUs coming online sync themselves with current MTRR state at an
appropriate point anyway.
It's not actually possible for a newly booted CPU to have in-sync MTRR
state immediately. It has to be synced up as part of the normal CPU
bootstrap procedure, which is what we do.
Keir Fraser [Tue, 1 Jun 2010 06:04:35 +0000 (07:04 +0100)]
xc: deal with xen/evtchn and xen/gntdev device names
This patch makes xc_linux properly deal with:
1. discovering and creating device nodes if necessary
2. the new form of xen/<dev> device names soon to be used by the
kernel
This changes the logic slightly:
- If a device node already exists with the proper name, then it uses
it as-is, assuming it has already been correctly created.
- If the path doesn't exist, or it exists but isn't a device node,
and it has successfully found the major/minor for the device, then
(re)create the device node.
Since this logic is identical for gntdev and evtchn, make a common
function to handle both.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
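The decision logic described in this entry can be sketched roughly as follows. This is an illustrative Python model, not the xc_linux code: the function name, the return values, and the "fail" outcome when no major/minor was found are all assumptions made for the example.

```python
import os
import stat


def plan_device_node(path, devnum):
    """Return "use", "create", or "fail" for a device path.

    devnum is a (major, minor) tuple, or None if discovery failed.
    All names here are hypothetical; this is not the xc_linux code.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        mode = None  # path does not exist (or cannot be examined)

    # An existing device node with the proper name is trusted as-is.
    if mode is not None and (stat.S_ISCHR(mode) or stat.S_ISBLK(mode)):
        return "use"

    # Missing, or present but not a device node: (re)create it only if
    # the major/minor numbers were found.
    if devnum is not None:
        return "create"

    # Assumption: without major/minor there is nothing safe to create.
    return "fail"
```

Because the same function serves any device name, it mirrors the point that evtchn and gntdev can share one code path.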
Keir Fraser [Tue, 1 Jun 2010 05:45:44 +0000 (06:45 +0100)]
xen: update_runstate_area for 32 bit PV on HVM guests
The current implementation of update_runstate_area is unable to handle
32 bit PV on HVM guests because the check is_pv_32on64_domain doesn't
cover that case. This patch fixes it.
Keir Fraser [Fri, 28 May 2010 08:38:56 +0000 (09:38 +0100)]
xl/libxtl: Remove glitch in xl migrate log output
* Provide a new XTL_STDIOSTREAM_HIDE_PROGRESS flag in the stdio logger
* Provide a way to adjust the flags after logger setup
* Use these to disable progress output from the migration receiver, as
the sender is also sending progress information.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 08:38:18 +0000 (09:38 +0100)]
libxc: remove \n from strings passed to PERROR
Previously, the code was inconsistent: some calls to PERROR passed \n
and some did not. With the new logging arrangements, passing \n is
definitely incorrect.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 08:37:42 +0000 (09:37 +0100)]
libxc: save/restore error handling fixes
* Make "read_exact" in libxc always set errno.
* Rename "read_exact" macro in xc_domain_restore.c (which shadows
real function) to RDEXACT and change all callers.
* Make RDEXACT anamorphically use xch for error reporting rather than
stderr.
* Call PERROR rather than ERROR when appropriate, so that log messages
include errno.
* Save errno in noncached_write so that its errno value is always
right.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 08:30:19 +0000 (09:30 +0100)]
libxc: eliminate static variables, use xentoollog; API change
This patch eliminates the global variables in libxenctrl (used for
logging and error reporting).
Instead the information which was in the global variables is now in a
new xc_interface* opaque structure, which xc_interface_open returns
instead of the raw file descriptor; furthermore, logging is done via
xentoollog.
There are three new parameters to xc_interface_open to control the
logging, but existing callers can just pass "0" for all three to get
the old behaviour.
All libxc callers have been adjusted accordingly.
Also update QEMU_TAG for corresponding qemu change.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 08:29:15 +0000 (09:29 +0100)]
xtl: New xentoollog mini-library.
We provide a new header file "xentoollog.h" which defines an interface
that libraries and applications can use for logging. This avoids
having to wrap each library's log callbacks up, massage arguments to
log callbacks, and so on.
The library's .o files are within libxc to avoid having to create a
separate lib*.a, but callers do not need to #include xenctrl.h and it
should be regarded as a separate API.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 08:27:40 +0000 (09:27 +0100)]
libelf: Tidy up logging and remove dependency on stdio.
libelf now permits callers to specify logging callback functions,
rather than a FILE*. libelf's non-Xen callers are all libxc users, so
the stdio dependency and the default logging callback function (which
calls vfprintf) is now in libxc.
Xen's use of libelf is unaffected in this patch.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Keir Fraser [Fri, 28 May 2010 07:14:54 +0000 (08:14 +0100)]
HAP: Add hardware capability check for 2MB super page.
When setting a HAP entry, we previously checked the hardware
capability only for 1GB superpages. This patch adds a hardware
capability check for 2MB superpages.
Also, the Intel SDM doesn't exclude the 1GB feature for 32/pae
hosts. Therefore remove the BUG_ON() check in common code.
Keir Fraser [Fri, 28 May 2010 07:10:48 +0000 (08:10 +0100)]
xl: fix PCI resource parsing
The parsing of PCI resources has two problems:
1. it assumes devices are 32-bits, whereas the fields in the
"resources" file can have full 64-bit values
2. it only parses the first resource because the format string is
missing a \n
Fix both of these up, which allows my Intel 82574L to work with MSI-X.
However, this should probably be using a PCI access library rather
than rummaging around in /sys/bus/pci...
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
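As a rough illustration of the two fixes (full-width values, and reading every line rather than only the first), here is a small Python sketch. It assumes each line of the file holds three hexadecimal fields (start, end, flags); the function name and the sample values in the usage note are made up for the example, and this is not the xl C code.

```python
def parse_pci_resources(text):
    """Parse the contents of a sysfs PCI "resources" file.

    Each non-empty line is assumed to hold three hex fields:
    start, end and flags. Python integers are unbounded, so 64-bit
    values are preserved without truncation.
    """
    resources = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, flags = (int(field, 16) for field in fields)
        resources.append((start, end, flags))
    return resources
```

For instance, a line such as `0x00000001f0000000 0x00000001f0003fff 0x000000000014220c` yields a start address above 4GB, which a 32-bit parse would have truncated.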
Keir Fraser [Thu, 27 May 2010 08:04:46 +0000 (09:04 +0100)]
x86: Speed up PV-guest superpage mapping
The current version of superpage mapping takes a PGT_writable
reference to every page in a superpage each time it is mapped. This
is extremely slow, so slow that applications become unusable.
My solution for this is to introduce a superpage table in the
hypervisor, similar to the frametable structure for pages. Currently
this table only has a type_info element. There are three types a
superpage can have, SGT_mark, SGT_dynamic, or SGT_none.
In normal operation, the first time a superpage is mapped, a
PGT_writable reference is taken to each page in the superpage, and the
superpage is set to type SGT_dynamic and the superpage typecount is
incremented. On subsequent mappings and unmappings, only the
superpage typecount changes. On the last unmap, the PGT_writable
reference on each page is removed.
The SGT_mark type is set and cleared through two new MMUEXT
hypercalls, mark_super and unmark_super. When the hypercall is made,
the superpage's type is set to SGT_mark and a PGT_writable reference
is taken to its pages. On unmark, the type is cleared and the
reference removed.
If a page is already set to SGT_dynamic when mark_super is called, the
type is changed to SGT_mark and no additional PGT_writable reference
is taken. If there are still outstanding mappings of this superpage
when unmark_super is called, the type is set to SGT_dynamic and the
PGT_writable reference is not removed.
Fast superpage mapping is only supported on 64 bit hypervisors. For
32 bit hypervisors, superpage mapping is supported but will be
extremely slow.
Signed-off-by: Dave McCracken <dave.mccracken@oracle.com>
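The type and typecount transitions described in this entry can be modelled with a small state machine. The Python sketch below is a toy model only: the class, its fields, and the `page_ref_ops` counter are invented for illustration, and the behaviour of mapping an already-marked superpage (only the typecount changes) is an assumption rather than something stated above.

```python
SGT_NONE, SGT_DYNAMIC, SGT_MARK = "none", "dynamic", "mark"


class Superpage:
    """Toy model of one superpage-table entry (not hypervisor code)."""

    def __init__(self):
        self.type = SGT_NONE
        self.typecount = 0        # outstanding mappings
        self.pages_held = False   # PGT_writable ref held on each page?
        self.page_ref_ops = 0     # how often per-page refs were touched

    def _take_page_refs(self):
        self.pages_held = True
        self.page_ref_ops += 1

    def _drop_page_refs(self):
        self.pages_held = False
        self.page_ref_ops += 1

    def map(self):
        if self.type == SGT_NONE:
            # First mapping: the slow path over every page.
            self._take_page_refs()
            self.type = SGT_DYNAMIC
        self.typecount += 1

    def unmap(self):
        assert self.typecount > 0
        self.typecount -= 1
        if self.typecount == 0 and self.type == SGT_DYNAMIC:
            # Last unmap of a dynamic superpage drops the page refs.
            self._drop_page_refs()
            self.type = SGT_NONE

    def mark(self):
        if self.type == SGT_NONE:
            self._take_page_refs()
        # If already dynamic, reuse the refs taken at first map.
        self.type = SGT_MARK

    def unmark(self):
        assert self.type == SGT_MARK
        if self.typecount > 0:
            # Mappings remain, so keep the refs and revert to dynamic.
            self.type = SGT_DYNAMIC
        else:
            self._drop_page_refs()
            self.type = SGT_NONE
```

In this model, mapping the same superpage many times touches the per-page references only once, which is the source of the speed-up.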
Keir Fraser [Thu, 27 May 2010 07:21:24 +0000 (08:21 +0100)]
xenconsoled: Discard guest console data in bigger chunks
Discard guest console data in bigger chunks so that there are fewer
discontinuities in the console data. Also avoid discarding data if
space is available at the front of the buffer by reclaiming that
space.
Patch from: Christian Limpach <Christian.Limpach@citrix.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 26 May 2010 09:52:15 +0000 (10:52 +0100)]
pyGrub: Use proper bootloader class when entering command manually
Use the proper bootloader class when entering the boot commands
manually (i.e. using the 'c' option). Before this patch the bootloader
was always treated as Grub, which is rather confusing when the user is
using Grub2, ExtLinux or Lilo. With this patch the proper bootloader
image class is used (e.g. Grub2Image for Grub2) when you define the
boot commands manually using the 'c' command in pyGrub.
Also, fix the use of isconfig: if no fs is set in the run_grub()
method, read_config() would fail because it tries to access the
undefined self.cf, which is now set to parser() from cfg_list.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Wed, 26 May 2010 07:15:31 +0000 (08:15 +0100)]
tools: Fix time offset when localtime=0
localtime can be stored in vm config as a string, resulting in
incorrect calculation of rtc_timeoffset. Cast localtime to int
to ensure rtc_timeoffset is calculated properly.
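The underlying pitfall is easy to reproduce in Python: the non-empty string "0" is truthy, so a string-valued localtime=0 behaves as if it were set. A minimal sketch follows; the function name is hypothetical and this is not the actual tools code.

```python
def localtime_enabled(localtime):
    """Return True only if localtime is really non-zero.

    localtime may arrive as an int or as a string such as "0" or "1",
    so it is cast to int before being tested.
    """
    return int(localtime) != 0
```

Without the cast, `bool("0")` evaluates to `True`, which is how a time offset can end up being applied when localtime=0.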
Keir Fraser [Wed, 26 May 2010 07:13:47 +0000 (08:13 +0100)]
xl: Some small fixes
- When using mem-set, I got a suspicious error output:
# xl mem-set 1 256g
setting domid 1 memory to : 268435456
[0] libxl.c:2535:libxl_set_memory_target: memory_dynamic_max must be
less than or equal to memory_static_max
: Success
- String generated by strdup() should be freed
- When using 'xl help', the output for mem-max and mem-set is not as
intended, and it also breaks bash completion; fix it.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Keir Fraser [Wed, 26 May 2010 07:12:15 +0000 (08:12 +0100)]
xl: allow nameless domains to be named
At present, find_domain() will exit(2) if you specify a domain by
number, but that domain doesn't have a corresponding name. However,
nothing seems to critically depend on common_domname being set, and the
test prevents dom0 or other nameless domains from being named. So
just remove the check.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Keir Fraser [Wed, 26 May 2010 07:01:21 +0000 (08:01 +0100)]
x86 shadow: Avoid remove-all-shadows after shadow teardown
If dom0 alters the p2m of a domain that's being destroyed, we can end
up doing a remove-all-shadows after the shadow hash table has been
freed. Since no hash table implies no shadows, just return
immediately.
Keir Fraser [Wed, 26 May 2010 06:59:52 +0000 (07:59 +0100)]
hvm: Handle extreme wallclock offsets safely.
When a VM's wallclock offset is negative enough, gmtime() can be called
with an underflowed uint64, which it then tries to divide into years
by subtraction. Handle the input as a 40-bit signed integer instead.
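One way to picture the fix is as a sign extension of the low 40 bits, so that a slightly negative offset stays small and negative instead of becoming an enormous unsigned value. The Python sketch below only illustrates that arithmetic; the helper name is hypothetical, and how the hypervisor's gmtime() performs the conversion is not shown in this log.

```python
def as_signed(value, bits=40):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    # If the top bit of the field is set, the value is negative.
    return value - (1 << bits) if value & sign_bit else value
```

For example, an offset of -3600 seconds stored in an unsigned 64-bit variable is 2**64 - 3600; treating it as a 40-bit signed quantity recovers -3600.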
Keir Fraser [Tue, 25 May 2010 08:08:34 +0000 (09:08 +0100)]
xend: Add interface name definition support for xend-relocation-address
Add a new feature to the xend-relocation-address option: the address
can now be given as an interface name. This is useful for, e.g., a
cluster environment in which every machine has multiple network
interfaces, only one of which is registered to a private cluster
network. The relocation address then no longer has to be specified
manually on each machine; simply providing the interface name from
which to take the IP address does the job (the interface must have
the same name on all the machines for this to work, of course).
Technically, it reads the interface name and gets its IP address using
the SIOCGIFADDR ioctl call. If the interface doesn't have an address,
i.e. if a non-existent interface or a hostname was provided, the
original name is returned to preserve the old behaviour.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
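A minimal sketch of this lookup with fallback, assuming Linux and IPv4, might look like the following. The function name is hypothetical and this is not the xend code; the ioctl number 0x8915 is the Linux value of SIOCGIFADDR.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl number for "get interface address"


def resolve_relocation_address(name):
    """Return the IPv4 address of interface `name`, or `name` unchanged."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    except OSError:
        return name
    try:
        # struct ifreq starts with the interface name (at most 15 bytes
        # plus a NUL); the kernel fills in a sockaddr after it.
        request = struct.pack("256s", name[:15].encode())
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
        # Bytes 20..23 of the reply hold the IPv4 address.
        return socket.inet_ntoa(reply[20:24])
    except OSError:
        # Not an interface, or it has no address: keep the old
        # behaviour and pass the value through as a host name or IP.
        return name
    finally:
        sock.close()
```

Passing an ordinary host name or IP address therefore returns it untouched, so existing configurations keep working.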
Keir Fraser [Sat, 22 May 2010 05:31:47 +0000 (06:31 +0100)]
x86: TSC handling cleanups (version 2)
"I am removing the tsc_scaled variable that is never actually used
because when tsc needs to be scaled vtsc is 1. I am also making this
more explicit in tsc_set_info. I am also removing hvm_domain.gtsc_khz
that is a duplicate of d->arch.tsc_khz. I am using scale_delta(delta,
&d->arch.ns_to_vtsc) to scale the tsc value before returning it to the
guest like in the pv case. I added a feature flag to specify that the
pvclock algorithm is safe to be used in an HVM guest so that the guest
can now use it without hanging."
Keir Fraser [Fri, 21 May 2010 14:25:10 +0000 (15:25 +0100)]
xl: fix block-attach command parsing
Fix two command-line parsing problems:
- the argc check is wrong: it must be provided with the frontend
device
- the ro/rw mode is optional, so default to rw if it is absent
Also, update the usage message accordingly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keir Fraser [Thu, 20 May 2010 13:12:14 +0000 (14:12 +0100)]
ocaml: fix ocaml xc compilation on 32 bit
cc1: warnings being treated as errors
xc_lib.c: In function 'xc_domain_get_pfn_list':
xc_lib.c:1217: error: assignment from incompatible pointer type
The XEN_DOMCTL_getmemlist interface has been 32/64 invariant since
13594:30af6cfdb05c and uint64_t is now the correct type for the PFN
list on all word sizes.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Wed, 19 May 2010 14:42:03 +0000 (15:42 +0100)]
x86: Streamline the CPU early boot process.
Mainly this involves getting rid of a bunch of cpumasks and replacing
them with a single 'cpu_state' enumeration to track progress and allow
master-slave handshaking.
Cleaning this stuff up is a prerequisite for safely handling slave
failure (e.g., out of memory, invalid slave CPU capabilities,
...). This will get fixed up in a future patch.
Keir Fraser [Wed, 19 May 2010 07:22:06 +0000 (08:22 +0100)]
VT-d: Fix ATS enabling for device assignment
Currently, Xen only enables ATS at boot. When an ATS-capable device is
assigned to a guest, ATS is not actually enabled, because the FLR
performed before assignment disables it, so ATS cannot be used in the
guest. This patch enables ATS in domain_context_mapping. Because the
FLR happens before the domain_context_mapping call, this ensures that
ATS is enabled on assignment and can be used in the guest. This patch
also implements disable_ats_device to disable ATS when the device is
deassigned from a domain.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 19 May 2010 07:20:46 +0000 (08:20 +0100)]
libxl: Compilation and other small fixes
* Some of the library functions such as fscanf, system, and asprintf
are declared with warn_unused_result (ubuntu server 9.10), causing
compilation errors in libxl.
* When using asprintf, the caller is responsible for freeing the
memory.
* memset was passed the wrong size argument in one place (caught by
a builtin gcc check).
Keir Fraser [Tue, 18 May 2010 14:05:54 +0000 (15:05 +0100)]
x86: Pull dynamic memory allocation out of do_boot_cpu().
This has two advantages:
(a) We can move the allocations to a context where we can handle
failure.
(b) We can implement matching deallocations on CPU offline.
Only the idle vcpu structure is now not freed on CPU offline. This
probably does not really matter.
Keir Fraser [Tue, 18 May 2010 10:38:12 +0000 (11:38 +0100)]
xl: allow scaling suffix on memory sizes in mem-set and mem-max
Allow mem-set and mem-max to take 'b', 'k', 'm', 'g' and 't' as
scaling suffixes for bytes, kilobytes, mega, etc. An unadorned number
is still treated as kilobytes so no existing users should be affected.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
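The suffix handling can be sketched as below. This is a Python illustration rather than the xl C code: the function name is hypothetical, and accepting upper-case suffixes and rounding byte counts down to whole kilobytes are assumptions of the sketch. Units are taken as powers of 1024, which agrees with the "256g" to 268435456 conversion shown in the mem-set log elsewhere in this changelog.

```python
def parse_mem_size_kb(text):
    """Convert a size such as "512m" or "2g" to kilobytes.

    A bare number is taken to be kilobytes already.
    """
    # Shift amounts relative to one kilobyte (1024 bytes).
    shifts = {"k": 0, "m": 10, "g": 20, "t": 30}
    text = text.strip().lower()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1]
    if suffix == "b":
        # Bytes: round down to whole kilobytes (an assumption here).
        return int(text[:-1]) // 1024
    if suffix in shifts:
        return int(text[:-1]) << shifts[suffix]
    return int(text)
```

Keeping the unadorned case as kilobytes is what preserves compatibility for existing users.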
Keir Fraser [Tue, 18 May 2010 10:24:04 +0000 (11:24 +0100)]
x86: Allow PV superpages to work with live migration
PV superpages currently do not work with live migration. They fall
over dead when the shadow page table is enabled for dirty tracking.
The HVM support for superpages in this code has been tested and found
to work just fine for PV superpages. This patch modifies the test
macro to allow the code to work with PV superpages.
Keir Fraser [Tue, 18 May 2010 10:21:25 +0000 (11:21 +0100)]
svm: Fix for AMD erratum 383 on Family 10h CPUs
This patch implements the workaround for AMD erratum 383 on family
10h CPUs. It destroys the guest VM when an MC error with a special
pattern is detected. Without this patch, a guest VM failure can
potentially crash the Xen hypervisor and the whole system. The erratum
will be published in the next version of the guide.