Big fixes for the new IO-APIC acknowledging method. The problems
were:
1. Some critical Xen interrupts could get blocked behind
unacknowledged guest interrupts. This is avoided by making
all Xen-bound interrupts strictly higher priority.
2. Interrupts must not only be EOIed on the CPU that received
them, but also in reverse order when interrupts are nested.
A whole load of logic has been added to ensure this.
There are two boot parameters relating to all this:
'ioapic_ack=old' -- use the old IO-APIC ACK method
'ioapic_ack=new' -- use the new IO-APIC ACK method (default)
'force_intack' -- periodically force acknowledgement of
interrupts (default is no; useful for debugging)
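The reverse-order rule in point 2 can be illustrated with a toy model.
This is a Python sketch for illustration only, not the hypervisor code:
the class name PendingEoiStack and its methods are hypothetical, and it
assumes one such stack per CPU.

```python
class PendingEoiStack:
    """Toy per-CPU record of interrupts that still await an EOI."""

    def __init__(self):
        self._stack = []     # vectors in the order they were received
        self._ready = set()  # vectors the guest has finished handling

    def receive(self, vector):
        """An interrupt arrived on this CPU, nested inside earlier ones."""
        self._stack.append(vector)

    def guest_done(self, vector):
        """Mark a vector as finished, then EOI as many as ordering allows.

        Returns the vectors actually EOIed, innermost first.
        """
        if vector not in self._stack:
            raise ValueError(f"vector {vector:#x} is not pending on this CPU")
        self._ready.add(vector)
        eoied = []
        # Only the most recently received vector may be EOIed, so keep
        # popping while the top of the stack has been marked as finished.
        while self._stack and self._stack[-1] in self._ready:
            top = self._stack.pop()
            self._ready.discard(top)
            eoied.append(top)
        return eoied
```

In this model a guest that finishes the outer interrupt first gets no
EOI issued until the inner, later-received interrupt is finished too.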
This patch adds a section to the documentation on the late binding
feature for PCI devices. It provides some examples (mostly stolen from
the e-mail which accompanied the late-binding patch) of how to use the
sysfs attributes for late binding.
This patch was revised from the last documentation patch that I
submitted which included this and some documentation on the permissive
flag. I've divided the two sections up and I'd like this one considered
for acceptance now while I revise the permissive flag code.
Currently, it is possible to set the mem-max value lower than
what is currently allocated to the domain, causing the kernel to
crash. This patch validates the value passed in and prevents setting the
value below the current allocation level.
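A minimal sketch of this kind of check, written in Python for
illustration. The function name, the parameter names and the KiB unit
are assumptions; the message does not say where or how the patch
implements the validation.

```python
def validate_mem_max(requested_kib, allocated_kib):
    """Return requested_kib if it is a safe mem-max value.

    Hypothetical helper: rejects any limit below what the domain
    already has allocated, since that is the case described as
    crashing the kernel.
    """
    if requested_kib < allocated_kib:
        raise ValueError(
            "mem-max %d KiB is below the current allocation of %d KiB"
            % (requested_kib, allocated_kib)
        )
    return requested_kib
```

A limit equal to the current allocation is accepted here; only values
strictly below it are refused.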
This patch enables external devices, such as for example a mounted hard
drive image or a TPM, to be migrated to a remote machine. The patch
hooks into the checkpointing (XendCheckpoint.py) code and performs
migration in 4 different steps:
In a 1st step (step = 0 in the code) migration of all devices of a
domain is 'tested', that means their driver implementations (blkif.py,
netif.py, tpmif.py, usbif.py, pciif.py) are queried whether migration is
possible at all. Currently all device representations respond with a
'yes' (=0), although probably a VM mounting a hard drive partition
should respond with a 'no' (-1) already. This first step is a quick
check to see whether devices can be migrated.
The 2nd step is to do whatever can be done before the domain is
suspended. At this point migration of the device could be initiated, if
at all possible.
The 3rd step is to migrate a device after the domain has been suspended,
meaning that it is not scheduled anymore and the VM is 'settled'. All
devices are called again and a good implementation would initiate the
migration in a background process to achieve as much concurrency as
possible.
The 4th step is to synchronize with the 3rd step. At this point the
implementor has to make sure that anything that was initiated in step 3
has completed. Once step 4 has been processed for every device, the VM
will resume on the remote machine.
I have implemented hooks for migration of a virtual TPM in
xen/xend/server/tpmif.py. These hooks call a configurable external
migration tool using the os.popen() call with a fixed command line
parameter set. The implementation refuses to migrate a VM attached to a
virtual TPM if no tool has been provided for migration.
All other devices do not currently overload the 'migrate' method defined
in the DevController.py and therefore will just let migration happen.
I have added hooks for error recovery such that whatever part of
migration has been initiated can be rolled back when any of the devices
fail to migrate in one of the steps. The interface (in tpmif.py) to the
external application now uses os.popen() to allow error handling by
reading the application's output.
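The four steps and the rollback described above can be sketched as
follows. This is an illustrative outline only, not the code in
XendCheckpoint.py or DevController.py: the step constants, the Device
class, the recover() hook and migrate_devices() are hypothetical names.
The return convention (0 for 'yes', anything else for 'no') and the
numbering of steps from 0 follow the description above.

```python
# Hypothetical names for the four steps (step = 0 is the 'test' step).
STEP_TEST, STEP_PRE_SUSPEND, STEP_POST_SUSPEND, STEP_SYNC = range(4)


class Device:
    """Default behaviour: let migration happen at every step."""

    def __init__(self, name):
        self.name = name

    def migrate(self, step):
        return 0  # 0 means 'yes'; -1 would mean 'no'

    def recover(self, step):
        """Undo whatever migrate(step) initiated; nothing by default."""


def migrate_devices(devices, suspend_domain):
    """Run all four steps over all devices, rolling back on failure.

    suspend_domain is a callable invoked once, before the first
    post-suspend step. Resuming the domain after a failed migration
    is outside the scope of this sketch. Returns True on success.
    """
    started = []  # (device, step) pairs that completed, in order
    for step in (STEP_TEST, STEP_PRE_SUSPEND, STEP_POST_SUSPEND, STEP_SYNC):
        if step == STEP_POST_SUSPEND:
            suspend_domain()
        for dev in devices:
            if dev.migrate(step) != 0:
                # Roll back in reverse order of initiation.
                for done_dev, done_step in reversed(started):
                    done_dev.recover(done_step)
                return False
            started.append((dev, step))
    return True
```

A device class that wants to refuse or drive migration would override
migrate() and, if it starts anything that needs undoing, recover().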
Clean up grant_mapping_t. Increase its size from 4 bytes to 8 bytes and
remove the tight encoding of flag and ref. This change is Xen-internal,
so it shouldn't affect the domain API.
Since we don't reset the proto_csum_blank flag in the skb, the
checksum calculation gets done twice, which is not twice as good as
once.
With this patch, TCP/UDP checksum errors from dom0 are fixed, and
domUs can use TCP/UDP without turning off TX checksum offload. Normal
non-VLAN bridged configs still work fine, tested with xm-test.
This is a patch for XenMon which only applies to the userspace tools.
The primary purpose of this patch is to add support for non-polling
access to the xen trace buffers. The hypervisor changes have already
been accepted.
Also included are a few bug fixes and some minor new features:
1. If xenmon is run without first allocating trace buffers (via
'setsize') and enabling them (via 'tbctl'), then this is done
automatically using sensible defaults.
2. There was a bug that caused the first second's worth of data output
from xenmon to be erroneous; this has been fixed.
3. There was a bug that caused xenmon to sometimes not display data for
newly created domains; this has also been fixed.
4. The xenmon display has a 'heartbeat' which flickers once per second.
This is to show that xenmon is still alive, even though the display
isn't changing at all, a situation that can happen sometimes when there
is nothing at all happening on a particular cpu.
5. Added cpu utilization display to the top of the xenmon window.
6. Added a bunch of options in xenmon to control exactly which metrics
are displayed, so the screen doesn't get cluttered with stuff you're not
interested in. The new options are:
--allocated
--noallocated
--blocked
--noblocked
--waited
--nowaited
--excount
--noexcount
--iocount
--noiocount
7. Added an option ("--cpu=N") to xenmon to specify which physical cpu
you'd like data displayed for.
8. Updated the README with information about default trace buffer size, etc.
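The paired display options in item 6 and the "--cpu=N" option in item 7
could be parsed along these lines. This is a sketch using Python's
argparse, which may not be what xenmon itself uses. The defaults (every
metric shown, cpu 0) and the names build_parser and METRICS are
assumptions made for the example.

```python
import argparse

# Metric names taken from the option list above.
METRICS = ("allocated", "blocked", "waited", "excount", "iocount")


def build_parser():
    """Build a parser with a --X / --noX pair per metric, plus --cpu=N."""
    parser = argparse.ArgumentParser(prog="xenmon-sketch")
    for metric in METRICS:
        # Both flags write to the same destination; the default of
        # True (metric shown) is an assumption of this sketch.
        parser.add_argument("--" + metric, dest=metric,
                            action="store_true", default=True,
                            help="display the %s metric" % metric)
        parser.add_argument("--no" + metric, dest=metric,
                            action="store_false",
                            help="hide the %s metric" % metric)
    parser.add_argument("--cpu", type=int, default=0, metavar="N",
                        help="physical cpu to display data for")
    return parser
```

For example, build_parser().parse_args(["--noblocked", "--cpu=1"])
yields options with blocked set to False, cpu set to 1, and the other
metrics left at their defaults.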
Trivial patch to fix x86_64 builds in which XEN_TARGET_ARCH
is specified on the make command line, e.g.:
make XEN_TARGET_ARCH=x86_64
This busted the vmxassist and hvmloader builds, which must
be compiled with -m32. Using "override" in the vmxassist/hvmloader
Makefiles fixes the problem by not allowing this to be
overridden from the command line.
Signed-off-by: Dave Lively <dlively@virtualiron.com>
New IO-APIC ACK method seems to cause problems on some systems
(e.g., Dell 1850). Disable it by default for now, but allow the
new method to be tested by passing boot parameter 'new_ack'
to Xen.
You can tell which ACK method you are using because Xen prints
out "Using old ACK method" or "Using new ACK method" during boot.
This workaround can be removed if/when the problems with the new
ACK method are flushed out.
Fix Xen's interrupt acknowledgement routines on certain
(apparently broken) IO-APIC hardware:
1. Do not mask/unmask the IO-APIC pin during normal ISR
processing. This seems to have really bizarre side effects
on some chipsets.
2. Since we instead tickle the local APIC in the ->end
irq hook function, it *must* run on the CPU that
received the interrupt. Therefore we track which CPUs
need to do final acknowledgement and IPI them if
necessary to do so.
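The bookkeeping in point 2 can be pictured with a small model. This is
a Python illustration of the idea only, not the Xen implementation:
the class PendingAcks and its methods are hypothetical, and the "IPI"
is represented simply as a list of CPUs to notify.

```python
class PendingAcks:
    """Toy record of which CPUs still owe a final ack for each IRQ."""

    def __init__(self):
        self._cpus_by_irq = {}

    def note_received(self, irq, cpu):
        """Remember that this CPU received the IRQ and must ack it later."""
        self._cpus_by_irq.setdefault(irq, set()).add(cpu)

    def finish(self, irq, current_cpu):
        """Clear the record for irq and say who has to acknowledge it.

        Returns (ack_locally, cpus_to_ipi): whether the calling CPU
        must ack, and which other CPUs need an IPI so that the ack
        runs on the CPU that received the interrupt.
        """
        cpus = self._cpus_by_irq.pop(irq, set())
        ack_locally = current_cpu in cpus
        cpus_to_ipi = sorted(cpus - {current_cpu})
        return ack_locally, cpus_to_ipi
```

If the CPU finishing the interrupt is the one that received it, no IPI
is needed; otherwise every receiving CPU appears in the notify list.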
In some cases, say for instance if for some bizarre reason
the tree was checked out of CVS, which doesn't necessarily
store file permissions, mkbuildtree may not be executable.
So run it explicitly via bash.
This fixes the Xen Makefile to allow correct building of cscope, TAGS
and tags. Prior to this the asm directory was not constructed correctly
for the "find" command. "xen\cscope.*" has been added to ".hgignore".
This is the initial patch for SMP PAE guest on x86-64 Xen.
For vcpus=2, the SMP PAE guest can do kernel build successfully.
And it improves the stability of SMP guests.
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com>
Use copy_from_user when accessing linear page table in shadow_fault().
This is safer: a direct access may crash the hypervisor because of some
potential bug. Also remove some trailing whitespace.
The maximum instruction length for both x86-32 and
x86-64 is 15 bytes (including all prefixes, opcode,
ModRM, SIB, displacement, and immediate bytes).
This patch adjusts the MAX_INST_LEN to the correct
value. This should reduce the size of some variables
in the hypervisor code. This patch also does some
minor code clean-up in the vm exit handler for VMX.
When running test 5 in Memtest86+ v1.65, I got a "this opcode is not
supported", so I decided to add it. It's a compare operation, and it's
just the opposite of the already supported one (opcode 0x39), so it's
nothing spectacular. Why there's a page-fault when this instruction gets
executed, I haven't got a clue, but I have a feeling that Memtest86 is
doing something wrong :-( However, this fix may help some other code to
run too...
With this, Test 5 passes all the way through without crashing. I did see
some occasional memory errors in some other tests, and I'm not 100%
sure whether those are caused by the system or they are "real" memory
errors. At some time in the future I may get round to memory testing my
target system...
Signed-off-by: Mats Petersson <mats.petersson@amd.com>
Make event_pending() architecture-specific.
PowerPC needs this because the domain can directly modify the hardware's
"interrupts enabled" bit, and we don't want to patch Linux to replace
all those accesses to use evtchn_upcall_mask instead.
Add a new config option for all backend drivers. This has two benefits:
1. All backend drivers can be disabled or modularised via
one config option.
2. Backend helper routines that are not specific to any particular driver
can be disabled or modularised based on this config option. In
particular this may allow backend drivers plus the service module
to be upgraded separately from the kernel core as and when the backend
interfaces change (and they will).
Fix the test inside all_devices_ready, and move it from xenbus_probe (a
postcore_initcall) to a new late_initcall, so that it happens after the
drivers have initialised.
If the 'cdrom=' option is specified in the definition file but media is
not found in the CD drive then main() in vl.c exits and the guest appears
to hang. This patch modifies vl.c slightly to check for the presence of
media. If the cdrom cannot be opened then the cd entry is removed from
hd_filename[] and bs_table[] allowing the guest to continue initializing.
If the guest requires the CD media then the guest should report, gracefully
or otherwise, that it's missing.
* Move .PHONY directives next to targets,
this makes them a lot harder to miss
* Add missing .PHONY directives
* Remove nonexistent .PHONY directives
* Hopefully I didn't miss anything...
In the case where XEN_PYTHON_NATIVE_INSTALL is in effect,
if DESTDIR is not set then the install will go into a relative
directory rather than under the default prefix (usually /usr).
An alternate solution would be to update the fragments
that do the python install to use $(DESTDIR)/ instead of
$(DESTDIR). This is not an incredible burden as there
are only two such fragments in the tree. However, it
seems prone to error as new makefiles are created
in the future.
build: Remove iptables and python logging helper targets
These targets don't really fit into the build infrastructure;
for instance, there is no facility for them to be removed
on make distclean. I posted a patch that fleshed out the targets,
but Christian Limpach suggested to me that removing them
would be a better idea.
I used the wrong operator in a couple places for putting together some
error messages out of format strings. This patch corrects those
operators and fixes the strings.
Introduce page_to_bus() and use it in pci-dma-xen.c and swiotlb.c. On
xen/ia64 with the P2M/VP model, the pseudo-physical address (gpaddr) is
fully virtualized, so it defines
xen_features(XENFEAT_auto_translated_physmap) = 1. In this case
page_to_phys(page) should return a pseudo-physical address, like
pfn_to_mfn() and its family. However, DMA is not virtualized, so
page_to_phys() can't be used in pci-dma-xen.c and swiotlb.c.
Robustify and add tracing to the IO-APIC update hypercall.
If this patch, and any others that follow it, fix some of the
problems that various users have been seeing then they may
be good candidates for backporting to 3.0.2 (assuming no
regressions for other users).
Do not disable spurious irq debugging in i386 xenlinux. It may
be masking underlying problems, and the problem it was intended
to work around should be fixed properly.
Do not accept empty definition of __XEN_INTERFACE_VERSION__
in xen-compat.h. It leads to building a broken kernel image
where the kernel sources end up using an unexpected interface
version. In the case of Linux, the kernel expects to use
the new sched_op() hypercall but ends up calling the
legacy hypercall -- this breaks poll, reboot, and save/restore.
A more acceptable patch would be to detect the empty
definition in xen-compat.h and give a reasonable #error message
to fail the build: the current error message is confusing.
Support __XEN_INTERFACE_VERSION__ defined to the empty string.
This can happen when building Linux with an old .config file which
doesn't have a value for CONFIG_XEN_INTERFACE_VERSION.
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Do not create blkback vbd kernel thread until fully connected
to frontend driver. Otherwise the kernel thread may crash trying
to access the non-existent shared ring.