Ian Campbell [Wed, 18 Aug 2010 16:09:59 +0000 (17:09 +0100)]
tools/libxl: fix "xl console" for primary console
libxl_console_constype is an enum and can therefore be unsigned so
using -1 as a sentinel for unset in main_console fails to work as
expected.
Arrange for all valid enum values to be > 0 and use 0 as the sentinal
instead.
If the user does not request a specific type then always use the
primary console since using "-n" but not "-t" is not meaningful as we
do not know which type to request.
Also make libxl_console_exec reject invalid values of type.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 18 Aug 2010 16:09:25 +0000 (17:09 +0100)]
tools/libxc: free thread specific hypercall buffer on xc_interface_close
The per-thread hypercall buffer is usually cleaned up on pthread_exit
by the destructor passed to pthread_key_create. However if the calling
application is not threaded then the destructor is never called.
This frees the data for the current thread only but that is OK since
any other threads will be cleaned up by the destructor.
Changed since v1:
* Ensure hcall_buf_pkey is initialised before use. Thanks to
Christoph Egger for his help diagnosing this issue on NetBSD.
* Remove redundant if (hcall_buf) from xc_clean_hcall_buf since
_xc_clean_hcall_buf includes the same check.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Brendan Cully [Wed, 18 Aug 2010 13:50:51 +0000 (14:50 +0100)]
tools/hotplug: 21549:8bcaec29574e breaks vif-script with arguments.
For example, (vif-script 'vif-bridge bridge=eth1') in xend-config.sxp will
cause vif-setup to attempt to execute 'vif-bridge bridge=eth1' due to a
quoting mismatch. The fix appears to be to remove the extra quotes around
"$script" in vif-setup.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 17 Aug 2010 18:32:37 +0000 (19:32 +0100)]
x86 cpuidle: check whether cpu is online in cpu idle control
We observed a 2.6.18.8 dom0 kernel crash when Xen has maxcpus < num
of physical cores (maxcpus=3D4 for a 12-core system). It appeared that
hypervisor doesn't check whether CPU is online or not. This small
patch fixed the issue.
Ian Jackson [Tue, 17 Aug 2010 16:20:53 +0000 (17:20 +0100)]
tools/libxl: compile with -Wmissing-declarations
Since the recent build error caused by mismatch of symbol types due to
missing declarations, build libxl with -Wmissing-declarations.
The patch is mostly straightforward, a one liner in the makefile enables
the flag, a lot of functions in xl_cmdimpl.c needed to be made static
and libxl_paths.c needed to include libxl.h.
The one wart on the patch-set is that flex has a bug where it emits code
with missing declarations for yy(set|get)_column. This can be worked
around by providing the declarations ourselves regardless.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 16 Aug 2010 14:31:04 +0000 (15:31 +0100)]
libxl: move various enum and #defines above datastructure definitions.
These are left behind when the datastructures move to _lixl_types.hg
in a following patch and moving them first makes that pure code motion
patch more obviously correct.
Gianni Tedesco [Mon, 16 Aug 2010 16:15:04 +0000 (17:15 +0100)]
xl: make libxl_uuid2string internal to libxenlight
libxenlight exports a function libxl_uuid2string which is used
internally in several places but has one external caller in xl.
This means that libxl internal callers leak since they were not
expecting to have to free() the UUID since the per-api-call-gc-lifetime
patch.
Convert libxl_uuid2string to be an internal function which participates
in the callers garbage collection. Eliminate string_of_uuid() macro in
favour of "format" and "arguments" macros suitable for printf()-like
functions which are made part of the libxl API and fix-up xl callers to
use that to avoid code duplication and enhance readability.
Values of cpu_weight and cpu_cap are lost after xend restart
For managed domains in state 'halted' I always get default values
for cpu_cap / cpu_weight after xend restart.
This is because the names of parameters differ between a SXP file to
create a VM (here the parameter names are cpu_cap / cpu_weight) and
a SXP file of a managed VM (here vcpus_params (cap 0) (weight 0)).
But XendConfig.py reads only cpu_cap / cpu_weight and if not found,
default values are used.
The patch reads first vcpus_params (cap, weight), if not found then cpu_cap,
cpu_weight and if both parameters are missing it uses the default values.
eXeC001er [Mon, 16 Aug 2010 16:11:30 +0000 (17:11 +0100)]
Fix "Error: Device 51952 not connected" error when using pygrub
The following is the process of booting a DomU with 'mounted-blktap2' (VHD
for example) and 'pygrub' as bootloader:
1. Connect boot-device to Dom0 as '/dev/xpvd'
2. Pygrub get info for load DomU
3. Disconnect boot-device from Dom0
4. Boot DomU
During step 3 the created device is disconnected from Dom0, but
xenstore does not scrape away after the device is disconnected so you
get the following error:
"Error: Device /dev/xvdp (51952, tap2) is already connected."
During step 3 xend calls destroyDevice always with 'tap' as argument.
Gianni Tedesco [Mon, 16 Aug 2010 12:39:19 +0000 (13:39 +0100)]
tools/libxl: remove libxl_free() since there are no more callers
libxl_free() allows allocated memory to be explicitly free'd from a
libxl_gc. Every previous use of this function has now been made
redundant and therefore has been removed. We can safely kill it and
amend the policy accordingly.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Gianni Tedesco [Mon, 16 Aug 2010 12:37:58 +0000 (13:37 +0100)]
tools/libxl: fix memory management bugs in libxl_device_disk_list()
fix invalid free segfault and use-after-free in libxl_device_disk_list()
Gah, libxl_device_disk_list() is returning a lot of pointers to free'd
data. Fix that by replacing libxl_xs_read() with xs_read() in line with
the policy.
Also fix a segfault caused by an erroneous free of the last disk-list
array element rather than the first one. This was causing xl create to
segfault when using the new qemu-dm code-base. Fix that and add a
comment about the fact that this libxl API requires a corresponding
libxl_device_disk_free() function.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
tools: xenconsole[d] and libxl: multiple console support
This patch implements the new protocol for handling pv consoles and
emulated serials as described in the document docs/misc/console.txt.
The changes are:
- xenconsoled: do not write the pty under serial in xenstore if
xenconsoled is handling a consolepath;
- xenconsole: implement support for an explicit console type parameter;
the parameter can be "pv", to specify that the user wants to
connect to a pv console, or "serial", to specify that the user wants to
connect to an emulated serial. If the type parameter hasn't been
specified be the user, xenconsole tries to guess which type of console
it has to connect to, defaulting to pv console for pv guests and
emulated serial for hvm guests.
- xenconsole: use the new xenstore paths;
- libxl: rename libxl_console_constype to libxl_console_consback:
constype is used to to specify whether qemu or xenconsoled provides the
backend, so I renamed it to libxl_console_consback to make it more
obvious that we are talking about backends;
- libxl: add a new libxl_console_constype to specify if the console is
an emulated serial or a pv console;
- libxl: support the new xenconsole "type" command line parameter;
- libxl: use the "output" node under console in xenstore to tell qemu
where do we want the output of this pv console to go;
- remove the legacy "serialpath" from xenconsoled altogether
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Also: update the QEMU_TAG to pull in the qemu part of these changes.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
After animated discussion with several libxl developers we seem to have
agreed on a policy for memory management within libxenlight. These
comments document the policy which is mostly implemented since
21977:51147d5b17c3 but some aspects (comments, function naming) are
guidelines to be followed in future functionality and perhaps to be
implemented by search/replace in future patches.
The document is mostly authored by Ian Jackson but with modifications to
reflect the slightly different functionality that has been implemented
since this was proposed.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Fri, 13 Aug 2010 13:59:03 +0000 (14:59 +0100)]
x86: eliminate bogus IRQ restrictions
As pointed out in
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00077.html
the limits introduced in c/s 20072 are at least
questionable. Eliminate them in favor of a more dynamic approach:
There's no real need for an upper limit on nr_irqs (as anything beyond
nr_irqs_gsi isn't visible to domains anyway), and the split point (and
hence ratio) between GSI and MSI/MSI-X IRQs doesn't need to be hard
coded, but can instead be controlled on the command line in case there
are *very* many GSIs.
The default used for nr_irqs will be rather large with this patch, so
it may not be acceptable without also switching to a sparse irq_desc[]
as was done not so lomg ago in Linux.
The added capping of any domain's nr_pirqs is based on the observation
that no domain can possibly have more than the system wide number of
IRQs. The opposite case may in fact also require some adjustment:
Defaulting the number of non-GSI IRQs available (namely to Dom0) to a
fixed value may not be the best choice going forward, since if there
indeed are very many non-GSI interrupt sources, it won't be possible
for the kernel to make use of them without giving
"extra_guest_irqs=" on the command line (but the goal should be to
allow things to work right by default even on large systems).
Keir Fraser [Fri, 13 Aug 2010 13:58:06 +0000 (14:58 +0100)]
x2APIC: Improve x2APIC suspend/resume
x2apic depends on interrupt remapping, so it should disable interrupt
remapping behind x2apic disabling. And also this patch wraps
__enable_x2apic to get rid of duplicated code.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Fri, 13 Aug 2010 13:57:35 +0000 (14:57 +0100)]
Fix IOAPIC S3 with interrupt remapping enabled
In ioapic_suspend, it reads and saves ioapic RTEs. But when interrupt
remapping is enabled, io_apic_read will call io_apic_read_remap_rte to
convert remapped format interrupt to compatible format, this results
in 'dest' field may be changed in remap_entry_to_ioapic_rte. When in
ioapic_resume, it will write the saved RTEs with incorrect 'dest' to
interrupt remapping table.
Actually it needn't to convert RTEs regardless interrupt remapping is
enabled or not. It just needs to save and restore RTE values
directly. This patch just uses __io_apic_read and __io_apic_write,
which won't call Interrupt remapping functions to convert, to save and
restore RTEs in ioapic_suspend and ioapic_resume. Thus fix this issue.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Fri, 13 Aug 2010 13:56:15 +0000 (14:56 +0100)]
sysctl: Return physinfo.max_{cpu,node}_id as maximum *possible* IDs.
In particular, this fixes setting vcpu affinities via
libxl. Previously, the affinity mask would be narrowed to the maximum
currently-online CPU. So future hotplugged CPUs could not be
expressed.
Removing the include of sys/ptrace.h and threaddb.h exposed a few
places which were using time(2) or gettimeofday(2) without including
time.h or sys/time.h respectively and were relying on an include.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Fri, 13 Aug 2010 07:33:19 +0000 (08:33 +0100)]
credit1: Make weight per-vcpu
Change the meaning of credit1's "weight" parameter to be per-vcpu,
rather than per-VM.
At the moment, the "weight" parameter for a VM is set on a per-VM
basis. This means that when cpu time is scarce, two VMs with the same
weight will be given the same amount of total cpu time, no matter how
many vcpus it has. I.e., if a VM has 1 vcpu, that vcpu will get x% of
cpu time; if a VM has 2 vcpus, each vcpu will get (x/2)% of the cpu
time.
I believe this is a counter-intuitive interface. Users often choose
to add vcpus; when they do so, it's with the expectation that a VM
will need and use more cpu time. In my experience, however, users
rarely change the weight parameter. So the normal course of events is
for a user to decide a VM needs more processing power, add more cpus,
but doesn't change the weight. The VM still gets the same amount of
cpu time, but less efficiently allocated (because it's divided).
The attached patch changes the meaning of the "weight" parameter, to
be per-vcpu. Each vcpu is given the weight. So if you add an extra
vcpu, your VM will get more cpu time as well.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Currently scratch variables allocated by libxl have the same lifetime as
the context. While this is suitable for one off invocations of xl. It is
not so great for a daemon process linking to libxl. In that case there
will be prolific leakage of heap memory.
My proposed solution involves create a new libxl_gc structure, which
contains a pointer to an owning context as well as the garbage
collection data. Top-level library functions which expect to do a lot of
scratch allocations put gc struct on the stack and initialize it with a
macro. Before returning they then call libxl_free_all on this struct.
This means that static helper functions called by such functions will
usually take a gc instead of a ctx as a first parameter.
The patch touches almost every code-path so a close review and testing
would be much appreciated. I have tested with valgrind all of the parts
I could which looked non-straightforward. Suffice to say that it seems
crash-free even if we have exposed a few real memory leaks. These are
for cases where we return eg. block list to an xl caller but there is no
appropriate block_list_free() function to call. Ian Campbells work in
this area should sew up all these loose ends.
Ian Jackson [Thu, 12 Aug 2010 16:06:21 +0000 (17:06 +0100)]
tools/python: Remove non-ASCII characters introduced by fffedd3d70e1.
fffedd3d70e1 introduced into tools/python/xen/util/vscsi_util.py
high-bit-set characters which appear to be UTF-8 for non-breaking
spaces. Replace them with spaces to avoid getting a Python syntax
error.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
split LDLIBS from LDFLAGS to fix link errors in recent toolchains
Linker command lines are order-sensitive.
Move linker options -Lfoo -lfoo from LDFLAGS to LDLIBS and place this new
variable after the objects to link. This resolves build errors in xenpagin
and blktap with recent toolchains.
rename SHLIB_CFLAGS to SHLIB_LDFLAGS
rename LDFLAGS_* to LDLIBS_*
move LDFLAGS usage after CFLAGS in CC calls
remove stale comments in xenpaging Makefile
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Wed, 11 Aug 2010 16:01:02 +0000 (17:01 +0100)]
msi: Avoid uninitialized msi descriptors
When __pci_enable_msix() returns early, output parameter (struct
msi_desc **desc) will not be initialized. On my machine, a Broadcom
BCM5709 nic has both MSI and MSIX capability blocks and when guest
tries to enable msix interrupts but __pci_enable_msix() returns early
for encountering a msi block, the whole system will crash for fatal
page fault immediately.
George Dunlap [Wed, 11 Aug 2010 14:56:21 +0000 (15:56 +0100)]
[Xen-devel] [PATCH] PoD: Fix domain build populate-on-demand cache allocation
Rather than trying to count the number of PoD entries we're putting in, we
simply pass the target # of pages - the vga hole, and let the hypervisor
do the calculation.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
xl: don't use libxl allocator for vcpu_list
This also fixes a bug with an erroneous call to libxl_free().
A destructor for the vcpu list is also implemented which is called from
xl.
"Dube, Lutz" [Wed, 11 Aug 2010 12:18:05 +0000 (13:18 +0100)]
Exception in xen/util/vscsi_util.py while starting xend
We have pscsi device with long scsi ids like 15:0:11:101.
In this case lsscsi prints no "blank" between id and type,
so the following split of the string returns wrong output.
The field physical_HCTL is set to 15:0:11:101]dis.
The patch replaces char "]" by "] ", so split() will return the right
physical_HTCL.
Signed-off-by: Lutz Dube Lutz.Dube@ts.fujitsu.com Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xl: Use gcc hidden visibility attribute
This kills off about 1,000 PLT entries speeding up link time and shaving about
1-2KB off of binary size. It also prevents users of the library erroneously
calling libxl internal functions.
xl: Fix invalid return of internal ptrs via libxl_poolid_to_name
libxl_poolid_to_name has in-and-out-of-library callers. In library callers now use
_libxl_poolid_to_name() which participates in garbage collection and
out-of-library callers are fixed up to free() the domain name.
xl: Fix invalid return of libxl-internal pointers via libxl_domid_to_name()
libxl_domid_to_name has numerous in-and-out-of-library callers. In
library callers now use _libxl_domid_to_name() which participates in
garbage collection and out-of-library callers are fixed up to free() the
domain name.
xl: don't use libxl allocator for nic_list
This also fixes a bug with an erroneous call to libxl_free().
A destructor for the nic list is also implemented which is called from
xl.
xl: allocate version info strings with strdup()
libxl_strdup() will soon have a per-call rather than per-context lifetime so
use strdup() to allocate version info and manually free in context destructor
xl: Return void from libxl_free() and libxl_free_all()
Also abort() if a pointer passed in to libxl_free() cannot be found in the
mini-gc structures. This exposes several real bugs (ie. not just the
libxl_free() something that was allocated via malloc variety).
Andre Przywara [Tue, 10 Aug 2010 14:35:13 +0000 (15:35 +0100)]
xl: Fix xl vcpu-list output on machines with more than 16 cores Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
xc_vcpu_[sg]etaffinity require the number of _bytes_ needed for
holding a CPU map, not the number of bits. Fix this when
calling the function from libxl and rename the misleading variable
name to avoid future confusion.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 9 Aug 2010 17:28:04 +0000 (18:28 +0100)]
x86: Reorder CPUs at boot time to reflect system topology.
This is an attempt to impose some sensible coherent ordering on the
cpu namespace, where previously there was none (we were at the mercy
of BIOS ordering, which varies wildly across systems).
Implement PCI pass-through for multi-function devices. The supported BDF
notation is: BB:DD.* - therefore passing-through a subset of functions or
remapping the function numbers is not supported except for when passing
through a single function which will be a virtual function 0.
Letting qemu automatically select the virtual slot is not supported in
multi-function passthrough since the qemu-dm protocol can not actually
handle that case.
Gianni Tedesco [Mon, 9 Aug 2010 16:43:18 +0000 (17:43 +0100)]
xc: fix segfault in pv domain create if kernel is an invalid image
If libelf calls elf_err() or elf_msg() before elf_set_log() has been
called then it could potentially read an uninitialised log handling
callback function pointer from struct elf_binary. Fix this in libxc by
zeroing the structure before calling elf_init().
The error message when one wants to list a non-existent domain is at
best misleading (libxl_domain_info failed (code -5)).
This patch catches this specific error and tells the user that the
requested domain does not exist:
Error: Domain '42' does not exist.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 9 Aug 2010 15:40:18 +0000 (16:40 +0100)]
Intel EPT: Fix out of range right shift on 32-bit host
Currently, there has a logic to check whether the EPT GFN is exceeding
guest physical address width. It uses right shift(>>) to implement the
check. But the right shift count is greater than the width of the
type(unsigned long = 32) under the PAE. And this will cause guest boot
fail under PAE with EPT supported.
Signed-off-by: Li Xin <xin.li@intel.com> Signed-off-by: Zhang Yang <yang.z.zhang@intel.com>
Keir Fraser [Mon, 9 Aug 2010 15:37:33 +0000 (16:37 +0100)]
Credit1: Tweak reset condition
VMs that don't use their full timeslice are guaranteed to flip back
and forth between "active" and "inactive". If we set credit to 0
when setting "inactive", then when the VM comes back to "active"
again, it will effectively be behind most other vcpus in credit.
This causes the credit1 to effectively discriminate *against*
VMs which use less than their full timeslice.
Instead of setting credit to 0, divide it in half. This gets rid of
some of the system credit while allowing non-cpu-bound VMs to keep
some priority advantage.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Mon, 9 Aug 2010 15:36:07 +0000 (16:36 +0100)]
scheduler: Implement yield for credit1
This patch implements 'yield' for credit1. It does this by attempting
to put yielding vcpu behind a single lower-priority vcpu on the
runqueue. If no lower-priority vcpus are in the queue, it will go at
the back (which if the queue is empty, will also be the front).
Runqueues are sorted every 30ms, so that's the longest this priority
inversion can happen.
For workloads with heavy concurrency hazard, and guest which implement
yield-on-spinlock, this patch significantly increases performance and
total system throughput.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Anthony Perard [Mon, 9 Aug 2010 15:46:02 +0000 (16:46 +0100)]
tools/hotplug, Use udev rules instead of qemu script to setup the bridge.
From: Anthony PERARD <anthony.perard@citrix.com>
This patch adds a second argument to vif-bridge script. It can be "vif"
or "tap". "vif" give the default behavior and "tap" just add the
interface to the found bridge when the action is "add".
Anthony Perard [Mon, 9 Aug 2010 15:46:01 +0000 (16:46 +0100)]
libxl, Introduce the command line handler for the new qemu.
From: Anthony PERARD <anthony.perard@citrix.com>
This patch adds a function to check the version of the device model.
Depending on the version of the DM, the command line arguments will be
built differently.
Keir Fraser [Mon, 9 Aug 2010 15:33:45 +0000 (16:33 +0100)]
vt-d: Fix ioapic_rte_to_remap_entry error path.
When ioapic_rte_to_remap_entry fails, currently it just writes value
to ioapic. But the 'mask' bit may be changed if it writes to the upper
half of RTE. This patch ensures to recover the original value of 'mask'
bit in this case.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Mon, 9 Aug 2010 15:32:45 +0000 (16:32 +0100)]
vt-d: Fix ioapic write order in io_apic_write_remap_rte
At the end of io_apic_write_remap_rte, it writes new entry (remapped
interrupt) to ioapic. But it writes low 32 bits before high 32 bits,
it unmasks interrupt before writing high 32 bits if 'mask' bit in low
32 bits is cleared. Thus it may result in issues. This patch fixes
this issue by writing high 32 bits before low 32 bits.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Weidong Han <weidong.han@intel.com>