Keir Fraser [Wed, 20 May 2009 15:02:50 +0000 (16:02 +0100)]
ACPI/NUMA: Improve SRAT parsing
This is to properly handle SRAT rev 2 extended proximity domain
values.
Also a first step to eliminate the redundant definitions of
ACPI provided table structures (Linux eliminated all of the duplicates
from include/linux/acpi.h in 2.6.21).
Portions based on a Linux patch from Kurt Garloff <garloff@suse.de>
and Alexey Starikovskiy <astarikovskiy@suse.de>.
Keir Fraser [Wed, 20 May 2009 14:38:34 +0000 (15:38 +0100)]
x86-64: also handle virtual aliases of Xen image pages
With the unification of the heaps, the pages freed from the Xen boot
image now can also end up being allocated to a domain, and hence the
respective aliases need handling when such pages get their
cacheability attributes changed.
Rather than establishing multiple mappings with non-WB attributes
(which temporarily still can cause aliasing issues), simply unmap
those pages from the Xen virtual space, and re-map them (to allow
re-establishing of eventual large page mappings) when the cacheability
attribute for them gets restored to normal (WB).
Keir Fraser [Wed, 20 May 2009 14:35:32 +0000 (15:35 +0100)]
x86: don't map more than the allocated space for frame_table
This is to avoid undue virtual address aliases in case the over-mapped
pages happen to get allocated to a domain, and then get their
cacheability attributes changed.
At the same time, use 1GB mappings if possible and reasonable.
Keir Fraser [Tue, 19 May 2009 22:28:25 +0000 (23:28 +0100)]
x86: Fix the P2M audit code.
It currently doesn't even compile; with this patch applied, it
compiles and didn't immediately explode as soon as I started a VM.
I've not given it much testing beyond that, though.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Keir Fraser [Tue, 19 May 2009 13:17:56 +0000 (14:17 +0100)]
stubdom: Rebuild the ocaml runtime libraries with the options needed
if they are to be linked with object files created by ocamlc and the minios
kernel.
This is needed to build stubdoms written in ocaml.
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
Keir Fraser [Tue, 19 May 2009 01:18:48 +0000 (02:18 +0100)]
xend: Make hotplug script timeouts configurable
In some configurations, when dom0 is busy with I/O, it may take
several minutes to complete all hotplug scripts required when a new
domain is being created. As device create timeout is set to 100
seconds, users get "hotplug scripts not working" error instead of a
new domain.
This patch makes both DEVICE_CREATE_TIMEOUT and DEVICE_DESTROY_TIMEOUT
configurable in xend-config.sxp to allow users to easily adapt hotplug
timeouts to their environment.
Keir Fraser [Tue, 19 May 2009 00:37:19 +0000 (01:37 +0100)]
xend: solve issues with xm block-configure command.
In the case of inactive managed domains:
The following error occurs currently. We cannot change the
configuration of the VBD by using xm block-configure. Of course,
using xm block-detach and xm block-attach instead of xm
block-configure, we can change it. However, I'd like to change it by
using xm block-configure.
In the case of active domains:
Another problem occurs after a domain was rebooted. Even if we
change a configuration of a VBD in the domain by using xm
block-configure, the configuration of the VBD is reverted to previous
configuration after the domain was rebooted.
Keir Fraser [Tue, 19 May 2009 00:31:26 +0000 (01:31 +0100)]
x86, cpufreq: fix ondemand governor to take aperf/mperf feedback
APERF/MPERF MSRs provide feedback about the actual frequency over the
elapsed time, which can differ from the frequency requested by the
governor. However, the ondemand governor currently takes that
feedback only on the frequency-down path. We should do that for scale-up too.
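As an illustration of the feedback described above, here is a minimal Python sketch of how a delivered frequency can be estimated from APERF/MPERF deltas. It assumes that MPERF counts at the maximum frequency and APERF at the actual frequency; the function names and the fallback policy are hypothetical and are not taken from the Xen governor code.

```python
def measured_freq_khz(max_freq_khz, aperf_delta, mperf_delta):
    """Estimate the average delivered frequency over a sampling window.

    Assumption: MPERF ticks at the maximum frequency and APERF at the
    actual frequency, so their ratio scales max_freq_khz.
    """
    if mperf_delta <= 0:
        return 0  # no usable sample
    return max_freq_khz * aperf_delta // mperf_delta


def effective_freq_khz(requested_khz, max_freq_khz, aperf_delta, mperf_delta):
    """Prefer hardware feedback over the requested frequency.

    The same estimate is used whether the governor is about to scale up
    or down; if no sample is available, fall back to the requested value.
    """
    measured = measured_freq_khz(max_freq_khz, aperf_delta, mperf_delta)
    return measured if measured else requested_khz
```

The point of the sketch is only that the load calculation on the scale-up path can use the same measured value as the scale-down path.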
Keir Fraser [Fri, 15 May 2009 07:12:39 +0000 (08:12 +0100)]
vt-d: Fix interrupt remapping for multiple IOAPICs
The current IOAPIC interrupt remapping code assumes there is only one
IOAPIC in the system, which causes problems when there is more than
one. This patch extends the ioapic_pin_to_intremap_index[]
array to handle the multiple-IOAPIC case.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Thu, 14 May 2009 14:46:04 +0000 (15:46 +0100)]
xen public: make mmuext_op's vcpumask field const
Linux started to pass around pointers to 'const cpumask_t' a while ago,
and passing such a pointer to set_xen_guest_handle() requires that the
field be a handle for a constant type in order to avoid compiler
warnings.
Keir Fraser [Wed, 13 May 2009 09:39:44 +0000 (10:39 +0100)]
x86 vmx: Ensure debug-mode intercepts for int3 and debug exceptions are
reinstated when resetting the EXCEPTION_BITMAP entry in the VMCS after
exiting real mode.
Keir Fraser [Wed, 13 May 2009 09:28:35 +0000 (10:28 +0100)]
passthrough: Fix PCI hot-plug option parsing
When a PCI function is passed through, extra options may also be
supplied.
In the case of boot-time PCI pass-through the documented format is:
[dom:]bus:dev.slot[@vslot][[,opt]...]
e.g.
00:01.00.1@7,msitranslate=1
In the case of PCI hot-plug the xm pci-attach command takes the
following arguments:
[-o opt[,opt]...] [dom:]bus:dev.slot [vslot]
e.g.
-o msitranslate=1 00:01.00.1 7
xm ends up passing these to qemu-xen as:
[dom:]bus:dev.slot[[,opt]...][@vslot]
e.g.
00:01.00.1,msitranslate=1@7
Note that the option and the vslot are transposed when
compared to the format used by boot-time PCI pass-through.
The parser inside qemu-xen can only handle the format used by
boot-time PCI pass-through and because of this ignores
any options passed by hot-plug.
This patch alters the format used by hot-plug to match the parser.
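The difference between the two orderings can be sketched in a few lines of Python. The helper names below are hypothetical and this is not the xm code; the sketch only shows where the vslot and the options end up in each string.

```python
def boot_time_format(bdf, vslot=None, opts=()):
    """Build [dom:]bus:dev.slot[@vslot][[,opt]...] (vslot before options)."""
    s = bdf
    if vslot is not None:
        s += "@" + str(vslot)
    for opt in opts:
        s += "," + opt
    return s


def old_hotplug_format(bdf, vslot=None, opts=()):
    """Build [dom:]bus:dev.slot[[,opt]...][@vslot] (options before vslot)."""
    s = bdf
    for opt in opts:
        s += "," + opt
    if vslot is not None:
        s += "@" + str(vslot)
    return s


if __name__ == "__main__":
    print(boot_time_format("00:01.00.1", "7", ["msitranslate=1"]))
    print(old_hotplug_format("00:01.00.1", "7", ["msitranslate=1"]))
```

A parser that expects the first layout stops recognising options once they appear before the "@vslot" part, which is why emitting the boot-time layout from hot-plug is the simpler fix.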
Keir Fraser [Fri, 8 May 2009 10:50:12 +0000 (11:50 +0100)]
x86 hvm: hvm_set_callback_irq_level() must not be called in IRQ
context or with IRQs disabled. Ensure this by deferring to tasklet
(softirq) context if required.
Keir Fraser [Thu, 7 May 2009 18:32:10 +0000 (19:32 +0100)]
Permit user to suppress passing --prefix to setup.py
We change all invocations of setup.py as follows:
* use $(PYTHON) instead of `python' so that the user can specify
an alternative python version if they need to. If not set it
defaults to `python' in Config.mk.
* pass --prefix=$(PREFIX) via a new make variable
$(PYTHON_PREFIX_ARG). This allows a user to suppress the
--prefix=... argument entirely by setting PYTHON_PREFIX_ARG=''.
This will work around the bug described here
https://bugs.launchpad.net/ubuntu/+bug/362570
where passing --prefix=/usr/local (which ought to have no effect as
/usr/local is the default prefix) changes which subdirectory
distutils chooses, and results in the files being installed in
site-packages which is not on the default search path.
Users not affected by this python packaging bug should not set
PYTHON_PREFIX_ARG and their builds will not be affected. (Provided
PREFIX did not contain spaces. People who put spaces in PREFIX are
being quite optimistic.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
After the scheduler timer became suspended before entering cpu idle
state, the percpu timer_deadline can be 0, i.e. there is no soft
timer in the queue. This case causes an unexpectedly large residency
percentage in C1 for a purely idle cpu.
Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Update XEN_LINUX_GIT_REMOTEBRANCH to match changes made in upstream
repo. Needed if you want setting KERNELS=linux-2.6-pvops in
config/Linux.mk to work.
Signed-off-by: Alex Zeffert <alex.zeffert@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
The op_pincpu method in SrvDomain.py does not currently work because
it passes a string as the cpumap argument of the domain_pincpu method
in XendDomain.py, whereas domain_pincpu expects a list.
This patch solves the problem as follows.
op_pincpu continues to pass the cpumap argument as a string, because
it cannot supply a list.
Instead, domain_pincpu now expects the cpumap argument to be a
string and converts it into a list itself.
The patch also modifies the two other methods that call
domain_pincpu so that they pass the cpumap argument as a string
instead of a list.
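A minimal sketch of the string-to-list conversion described above. The input syntax (comma-separated CPU numbers and inclusive ranges) and the helper name parse_cpumap are assumptions made for illustration; the actual format accepted by domain_pincpu is not specified here.

```python
def parse_cpumap(cpumap):
    """Convert a cpumap given as a string into a list of CPU numbers.

    Assumed (hypothetical) input format: comma-separated CPU numbers and
    inclusive ranges, e.g. "0,2-3". A non-string iterable is copied as is.
    """
    if not isinstance(cpumap, str):
        return list(cpumap)
    cpus = []
    for part in cpumap.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus
```

Doing the conversion in one place means every caller can pass the same string form.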
Network support is still provided the same way: using the tap
interface, created in qemu using netfront.
The lwip stack is still available to avoid additional compilation
issues.
However, the stubdom no longer has its own vif, which means
that the only vnc server supported is the one in dom0.
You can still enable the vnc server in a stubdom at compile time if
you wish.
Probably the most important change for xen users caused by this patch
is that you no longer have to specify two vifs in the stubdom config
file, but just one:
Prior to changeset 19510:5c69f98c348e - 'xm, xend: Replace "vslt" with
"vslot"', both vslt and vslot were used in the xm code, often fairly
arbitrarily.
However, in the dictionary that describes a pci function both vslt and
vslot were present. vslt stored the slot assigned to the function. And
vslot stored the slot the user requested for the function, or
AUTO_PHP_SLOT if no slot was requested.
With the renaming these two values got merged into a single entry.
This patch un-merges them by renaming what was vslot to
requested_vslot.
So an out of chronological order list of name changes is:
xend: Do not overwrite xauthority and display with empty values
Display and xauthority vars are read from vmConfig['platform'] first,
then they are read again from dev_info.
However, if the user does not set those variables in the config file,
dev_info won't contain them, hence we are going to overwrite the
current significant values with null.
This patch fixes the problem setting display and xauthority to the
current values if dev_info does not contain them.
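The fallback described above can be sketched as follows. The function name and the plain-dictionary representation of the settings are hypothetical; only the keys display and xauthority come from the description.

```python
def merge_display_settings(current, dev_info):
    """Return the settings to use, preferring non-empty dev_info values.

    'current' holds the values already read (e.g. from the platform
    section); a key missing from dev_info, or set to an empty value,
    leaves the current value untouched instead of overwriting it.
    """
    merged = dict(current)
    for key in ("display", "xauthority"):
        value = dev_info.get(key)
        if value:
            merged[key] = value
    return merged
```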
This patch removes the need for a second configuration file for
stubdoms: it is going to be automatically generated by the script
stubdom-dm using command line options and xenstore to find any needed
information.
The configuration script will be placed under /etc/xen/stubdoms and
automatically removed when the domain is destroyed.
The only change needed in xend is not to write the sdl,
opengl and serial command line options for qemu to xenstore, because
stubdoms do not support them.
It is safe to remove those two options from xenstore because qemu does
not use xenstore to read command line options.
Finally this patch fixes blkfront disconnections from backends and
display and xauthority variables for pv guests.
If ${netdev} is a bonding device, brctl addif ${bridge} ${pdev} fails:
can't add ${pdev} to bridge ${bridge}: Invalid argument
This is because ${pdev} has no slaves at this point.
# Notice that ifdown ${netdev} clears the slaves of ${netdev}.
This patch restores the slaves before add_to_bridge2 ${bridge} ${pdev}.
The following changeset broke booting xen-ia64 on some kinds of ia64 boxes.
http://xenbits.xensource.com/ext/ia64/xen-unstable.hg/rev/3fd8f9b34941
tasklet_schedule() calls raise_softirq().
Because raise_softirq() uses per_cpu data, accessing per_cpu before
cpu_init() leads to unexpected behavior.
Event-channel setup: Re-bind if the connection becomes unbound (e.g.,
due to 'slow' domain suspend cancellation), even if the remote port
identifier has not changed.
Domain logging: Only open log file once (don't leak fds) and fix a
small memory leak.
Evtchn changes based on a patch by Jiri Denemark <jdenemar@redhat.com>
x86: fix next->vcpu_dirty_cpumask checking in context_switch()
There was a timing window where flush_tlb_mask() could be called with
an empty mask (triggering a WARN_ON() in send_IPI_mask_flat() along
with APIC errors) because rather than using the already taken snapshot
of next's vcpu_dirty_cpumask struct vcpu's field was used directly,
which can get its only bit cleared by remote CPUs.
Replacing the structure field's use by the local variable then made
the inner cpus_empty() check completely redundant with the one in the
surrounding if()'s condition.
This patch updates the Makefile to download the latest version of
tboot, which supports the interface changes made recently. This
should go into 3.4, since 3.4 supports the new tboot interface.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
A few ioapic redirection entries are initialized by hypervisor before
enabling iommu hardware. This patch copies those entries from ioapic
redirection table into interrupt remapping table after interrupt
remapping table has been allocated.
cpuidle: Add support for Always Running APIC timer, CPUID_0x6_EAX_Bit2.
This bit means the APIC timer continues to run even when CPU is
in deep C-states.
The advantage is that we can use LAPIC timer on these CPUs
always, and there is no need for "slow to read and program"
external timers (HPET/PIT) and the timer broadcast logic
and related code in C-state entry and exit.
Refer to the latest Intel SDM Vol 2A
(http://www.intel.com/products/processor/manuals/index.htm)
x86: avoid EPT scanning errors when splitting superpages during live migration
Since Xen does not lock the p2m table for reads, when
splitting a large page during live migration we must make sure
that the path of EPT entries being modified always remains valid,
because other CPUs may access the superpage entries at the same time.
xend: clean up qemu-dm related items on domain destroy
Some qemu-dm related items might be left behind after the domain is
destroyed:
- xenstore entry, /local/domain/0/device-model/<domid>
- named pipes, /var/run/tap/qemu-{read,write}-<domid>
Extend pt_bind_irq to handle updates of the MSI guest
vector and flag.
Unbinding and rebinding using separate hypercalls may not always be
viable.
For example, the guest may update the MSI address/data on the fly
without disabling it first (e.g. to change delivery/destination);
implementing these updates in such a way may result in interrupt loss.
I observed from xend.log that several domain restart threads can run
simultaneously. This patch makes the restart thread a singleton.
Without this, several core dumps of a domain might be created.
If qemu-dm dies immediately (probably because of a wrong setting),
xend restarts the domain over and over again,
which overloads the system.
There is already logic to avoid restarting too early; however,
it might not work, since the xenstore entry
'xend/previous_restart_time' is volatile: XendDomainInfo.destroy(),
which removes the entry from xenstore, is called in several places.
This patch also prevents restarting too early even at the first
domain creation.
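The idea of refusing restarts that come too soon after a start can be sketched as below. This is not the xend implementation: the class name, the 60-second threshold and the choice to keep timestamps in memory are assumptions made for illustration.

```python
import time

MINIMUM_RESTART_TIME = 60  # seconds; hypothetical threshold


class RestartGuard:
    """Remember when each domain was last started and reject early restarts."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # Assumption: timestamps live here, so they survive even if the
        # per-domain xenstore entries are removed.
        self._last_start = {}

    def record_start(self, domain):
        # Called on every start, including the very first creation.
        self._last_start[domain] = self._clock()

    def may_restart(self, domain):
        started = self._last_start.get(domain)
        if started is None:
            return True
        return self._clock() - started >= MINIMUM_RESTART_TIME
```

Recording the timestamp at the first creation as well is what covers the case where qemu-dm dies immediately after a domain is first created.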
The corruption happens every time we pass a sector aligned buffer
(instead of a page aligned buffer) to blkfront_aio. To trigger the COW
we have to write at least a byte to each page of the buffer, but we
must be careful not to overwrite useful content.
Currently cpufreq HW-ALL coordination is handled the same way as SW-ALL.
However, SW-ALL causes more IPIs, which is bad for cpuidle.
This patch handles HW-ALL coordination differently from
SW-ALL, to improve performance and reduce IPIs. We also
suspend/resume the HW-ALL dbs timer for idle.
Signed-off-by: Yu, Ke <ke.yu@intel.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Tian, Kevin <kevin.tian@intel.com>
This patch makes two small changes to dom0 iptables rules that permit
(and revoke) domU network access.
First:
Currently, a rule intended to allow domU network access is appended to
the end of the FORWARD chain, where it can be preempted by other
rules. This patch causes the rule to be inserted at the top, where
it's more likely to have the intended effect.
Second:
In some cases (e.g. Fedora 9's default iptables configuration), the
first rule alone is insufficient to permit two-way packet flow. This
patch adds a second rule to the FORWARD chain that permits replies to
domU network requests to reach the domU vif.
Signed-off-by: Chris Bookholt <hap10@tycho.ncsc.mil>
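To make the insert-versus-append distinction concrete, the sketch below builds (but does not run) two iptables command lines. The match criteria (physdev matching on the vif, and state matching for replies) are assumptions chosen for illustration and may differ from the rules the hotplug scripts actually install; the function name is hypothetical.

```python
def forward_rules(vif, insert=True):
    """Return iptables argument lists for a domU vif.

    With insert=True the rules use -I, which places them at the top of
    the FORWARD chain; with insert=False they use -A, which appends them
    after whatever rules are already present.
    """
    action = "-I" if insert else "-A"
    # Rule 1 (assumed form): accept packets entering through the vif.
    allow_from_domu = ["iptables", action, "FORWARD",
                       "-m", "physdev", "--physdev-in", vif,
                       "-j", "ACCEPT"]
    # Rule 2 (assumed form): accept replies heading back out to the vif.
    allow_replies = ["iptables", action, "FORWARD",
                     "-m", "state", "--state", "RELATED,ESTABLISHED",
                     "-m", "physdev", "--physdev-out", vif,
                     "-j", "ACCEPT"]
    return [allow_from_domu, allow_replies]


if __name__ == "__main__":
    for rule in forward_rules("vif1.0"):
        print(" ".join(rule))
```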