This patch updates the documentation and extends the 'xm' man page with
the integrated access control management commands. The man page is a
good place to start exploring these commands.
This patch integrates the new access control management tools into 'xm'
and 'xend' and supports label/ssid translation support for
migration/life-migration/resume.
This patch adds new python access control management scripts, which
integrate into Xen Management and which support the new access control
labels (labels replace the ssidref numbers at the management user
interface).
This patch adds support in the hypervisor for the policy name attribute
introduced into security policies. It also fixes a minor problem related
to handling unsupported boot policies.
This patch adds support in the hypervisor for the policy name attribute
introduced into security policies. It also fixes a minor problem related
to handling unsupported boot policies.
This patch adds a policy name to the policy definition. This policy name
must be unique and must change if the content of the file changes. The
policy name is used to ensure that the XM tools and the hypervisor work
on the same policy, i.e., interpret the security information on domains
consistently. This patch also simplifies the policy management by moving
policy and labels into a single file.
The Xen checksum offload feature attempts to insert a TCP/UDP
checksums into already encrypted packets (esp4) in dom0. Obviously,
it is not possible to insert a checksum into an already encrypted
packet, so this patch inserts the checksum prior to encrypting
packets in net/ipv4/xfrm4_output.c.
To do this cleanly, the TCP/UDP header pointers need to be pointed to
the correct spot, so this functionality has been abstracted into a new
function.
This patch fixes bug 143 (verified by Jim Dykman). Earlier version
verified by Jon McCune.
Signed-off-by: James Dykman <dykman@us.ibm.com> Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Remove update_vcpu_system_time() call from the per-VCPU timer
callback function. It's unnecessary and in fact may occasionally
even run on the wrong CPU.
Avoid flood of PIT interrupts while debugging an hvm guest.
This is rebased to the new PIT code now. It has the same logic as
earlier. PIT tries to catch up the missed timer ticks by injected all
the ticks one by one so that Guest time stays close to the wall clock.
But while debugging a hvm guest if you stop the guest by debugger and
then continue, the guest sees flood of interrupts compensating the
missed ticks for the stopped time. This patch just check if the guest is
being debugged, if yes then it does not try to catch up with the missed
ticks.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@.intel.com>
Fix command-line parsing in a few respects -- be more
generous about what we accept, avoid stack overflow, and
print the command line during boot (rather useful!).
This should fix the 'lapic' and 'nolapic' boot options.
The attached patch allows external devices to migrate. The patch
contains code that allows to at least detect local migration of a
virtual machine and handles this for the virtual TPM (results in a no-op
for local migr.). If migration of a virtual machine with attached vTPM
to another machine is attempted, XenD will return an error.
Propagate information about bad (or good) REGSEL register
of chipset IO-APICs to Xen. If REGSEL is bad (some old SiS
chipsets) then we have a slower read-modify-write routine.
Loosely based on an original patch from Jan Beulich.
Fix the "hda lost interrupt" issue when creating a VMX guest on a PAE
host.
Occasionally when injecting an IDE DMA interrupt into the guest, a
page fault occurs (e.g., because the IDT mapping is not present in
shadow pagetables). This causes an immediate vmexit and, because it
occurred during event delivery, the original VM_ENTRY_INTR_INFO_FIELD
is kept in IDT_VECTORING_INFO_FIELD.
The current code copies IDT_VECTORING_INFO_FIELD back to
VM_ENTRY_INTR_INFO_FIELD, intending that the interrupt will be
injected again on next vmresume.
However, there is a corner case: if, before the next vmresume, a timer
interrupt happened then vmx_intr_assist may overwrite the information
on VM_ENTRY_INTR_INFO_FIELD, and the IDE DMA interrupt is effectively
lost.
This patch checks the IDT_VECTORING_INFO_FIELD in vmx_intr_assist and,
if it is set, copies it to VM_ENTRY_INTR_INFO_FIELD and returns.
Signed-off-by: Yunhong Jiang <Yunhong.jiang@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com>
There are instances where we DO NOT want an hvm guest to run an
MP enabled kernel. In such situations we should have a workaround to
guarantee hvm guests will not detect MP.
For example, in the absence of ACPI and MPS the installation code in some
linux distributions key off the presence of cpuid edx/HTT bit (indicating
the presence of Hyper-Threading Technology) to determine if another
logical processor is present and if so load an MP enabled kernel instead
of a uniprocessor kernel. SMBIOS is also looked at for the same purpose
and presents a potential problem as well. While both approaches for
selecting an MP kernel are debatable (since using MPS or ACPI have long
been the standard for MP detection), these approaches are something we
have to live and work around with because making a change in the fully
virtualized guest is not an option.
To solve the problem we need to hide all secondary processors from the hvm
guest. Since the hvm does not surface MPS tables, we only need to deal
with ACPI, cpuid HTT, and possibly SMBIOS. (I did not have time right
now to look closely at the hvm BIOS to know if SMBIOS is also going to be
a problem.)
Also fixes a logic problem the code path where apic=0 was not
being handled correctly (vmx path only).
Enable migration of a domain to the local machine - some timing
issues needed to be resolved by executing certain code early/later
Restore the physical to machine array such that balloon-allocated
pages can be deallocated.
SVM patch to ensure that PAE bit is set for 32bit guests on 32bit PAE,
by using paging levels>=3 rather than ifdef i386. This patch fixes
the "black screen" hang issue when building w/XEN_TARGET_X86_PAE=y on
32bit.
Tested linux debian and win2003EE guests with pae=1. The linux
guest boots without error, while the windows guest sometimes hits a
bug() in shadow.c. Both VT and SVM encounter the same bug.
In both i386 and x86-64 Linux, using a static variable (and thus
having the potential of missing synchronization there,
as I suspect exists in native Linux) is not needed with the hypercall
approach. In the hypervisor, the patch adds the
needed synchronization.
Build frontend drivers into the -xen kernels rather than as
modules. Most people's initrd-building tools will not know
about these drivers so it will only cause confusion not to
include them in the kernel core image.
Fix the 15_create_smallmem_pos.py test, which was failing because the
set console.limit command in the test was never being run. The select in
Console.py was never timing out because there was always someting to read on
the fd, the OOM messages are constant. So the test would hang.
The fix includes:
1) Changing MEM in 15_create_smallmem_pos.py to 32MBs, which is the default
for the tools that should work.
2) Change the XmConsole init to add an argument to set the console limit
when it's created.
3) Set a default large limit for console so we won't hang in the future.
4) Added a new 16_create_smallmem_neg.py test to handle failure situation.
5) Added comment in README.
Signed-off-by: Daniel Stekloff <dsteklof@us.ibm.com>
Increase size of level-2 initial PDE identity map from first 64MB of
physical RAM to first 1GB of physical RAM. This allows x86_64 xen to boot
larger dom0 images. Without this changes large dom0 images fail to
boot with "Unknown interrupt" on xen console and wedge.
The Xen Hypervisor currently operates a bit differently when the
guest is being debugged. The differences are handling of int3 exception
and missed pit timer injections. The Xen hypervisor should get back to
the normal mode when the gdb connection is closed. With the attached
patch gdbserver properly detaches from the guest when the gdb detaches
or quits.
Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
This patch defines a test_and_clear bitop for cpumask_t pointers.
Also fixes "wrong pointer type" for type specific bitops by using
&foo[0] instead of &foo.
Read the message type out of the message before sending it to xenstored, and
use that saved value when handling the reply. Xenstored will leave the
message type intact, _except_ when returning an error, in which case it will
change the type to XS_ERROR. This meant that we failed to remove a
transaction from our internal list if xenstored returned EAGAIN, as we did not
realise that the message was XS_TRANSACTION_END. This manifested itself as
the intended behaviour until the connection was closed, at which point all of
those failed transactions would erroneously be aborted.
Big fixes for the new IO-APIC acknowledging method. The problems
were:
1. Some critical Xen interrupts could get blocked behind
unacknowledged guest interrupts. This is avoided by making
all Xen-bound interrrupts strictly higher priority.
2. Interrupts must not only be EOIed on the CPU that received
them, but also in reverse order when interrupts are nested.
A whole load of logic has been added to ensure this.
There are two boot parameters relating to all this:
'ioapic_ack=old' -- use the old IO-APIC ACK method
'ioapic_ack=new' -- use the new IO-APIC ACK method (default)
'force_intack' -- periodically force acknowledgement of
interrupts (default is no; useful for debugging)
This patch adds a section to the documentation on the late binding
feature for PCI devices. It provides some examples (mostly stolen from
the e-mail which accompanied the late-binding patch) of how to use the
sysfs attributes for late binding.
This patch was revised from the last documentation patch that I
submitted which included this and some documentation on the permissive
flag. I've divided the two sections up and I'd like this one considered
for acceptance now while I revise the permissive flag code.
Currently, it is possible to set the mem-max value to value lower than
what has been currently allocated to the domain causing the kernel to
crash. This patch validates the value passed in and prevents setting the
value below the current allocation level.
This patch enables external devices, such as for example a mounted hard
drive image or a TPM, to be migrated to a remote machine. The patch
hooks into the checkpointing (XendCheckpoint.py) code and performs
migration in 4 different steps:
In a 1st step (step = 0 in the code) migration of all devices of a
domain is 'tested', that means their driver implementations (blkif.py,
netif.py, tpmif.py, usbif.py, pciif.py) are queried whether migration is
possible at all. Currently all device representations respond with a
'yes' (=0), although probably a VM mounting a hard drive partition
should respond with a 'no' (-1) already. This first step is a quick
check to see whether devices can be migrated.
The 2nd step is to do whatever can be done before the domain is
suspended. At this point migration of the device could be initiated, if
at all possible.
The 3rd step is to migrate a device after the domain has been suspended,
meaning that it is not scheduled anymore and the VM is 'settled'. All
devices are called again and a good implementation would initiate the
migration in a background process to achieve as much concurrency as
possible.
The 4th step is to synchronize with the 3rd step. At this point the
implementor has to make sure that anything that was initiated in step 3
has completed. Once all steps 4 have been processed, the VM will resume
on the remove machine.
I have implemented hooks for migration of a virtual TPM in
xen/xend/server/tpmif.py. These hooks call a configurable external
migration tool using the os.popen() call with a fixed command line
parameter set. The implementation refuses to migrate a VM attached to a
virtual TPM if no tool has been provided for migration.
All other devices do not currently overload the 'migrate' method defined
in the DevController.py and therefore will just let migration happen.
I have added hooks for error recovery such that whatever part of
migration has been initiated can be rolled back when any of the devices
fail to migrate in one of the steps. The interface (in tpmif.py) to the
external application now uses os.popen() to allow error handling by
reading the application's output.
Clean up grant_mapping_t. Increase its size from 4bytes to 8bytes and
removed tight encoding of flag and ref. This change is xen-internal
so this shouldn't affect domain api.
Since we don't reset the proto_csum_blank flag in the skb, the
checksum calculation gets done twice, which is not twice as good as
once.
With this patch, TCP/UDP checksum errors from dom0 are fixed, and
domUs can use TCP/UDP without turning off TX checksum offload. Normal
non-VLAN bridged configs still work fine, tested with xm-test.
This is a patch for XenMon which only applies to the userspace tools.
The primary purpose of this patch is to add support for non-polling
access to the xen trace buffers. The hypervisor changes have already
been accepted.
Also included are a few bug fixes and some minor new features:
1. If xenmon is run without first allocating trace buffers (via
'setsize') and enabling them (via 'tbctl'), then this is done
automatically using sensible defaults.
2. There was a bug that caused the first second's worth of data output
from xenmon to be erroneous; This has been fixed.
3. There was a bug that caused xenmon to sometimes not display data for
newly created domains; This has also been fixed.
4. The xenmon display has a 'heartbeat' which flickers once per second.
This is to show that xenmon is still alive, even though the display
isn't changing at all, a situation that can happen sometimes when there
is nothing at all happening on a particular cpu.
5. Added cpu utilization display to the top of the xenmon window.
6. Added a bunch of options in xenmon to control exactly which metrics
are displayed, so the screen doesn't get cluttered with stuff you're not
interested in. The new options are:
--allocated
--noallocated
--blocked
--noblocked
--waited
--nowaited
--excount
--noexcount
--iocount
--noiocount
7. Added an option ("--cpu=N") to xenmon to specify which physical cpu
you'd like data displayed for.
8. Updated the README with information about default trace buffer size, etc.
Trivial patch to fix x86_64 builds in which XEN_TARGET_ARCH
is specified on the make command line, e.g.:
make XEN_TARGET_ARCH=x86_64
This busted the vmxassist and hvmloader builds, which must
be done -m32. Using "override" in the vmxassist/hvmloader
Makefiles fixes the problem by not allowing this to be
overridden from the command line.
Signed-off-by: Dave Lively <dlively@virtualiron.com>
New IO-APIC ACK method seems to cause problems on some systems
(e.g., Dell 1850). Disable it by default for now, but allow the
new mwethod to be tested by passing boot parameter 'new_ack'
to Xen.
You can tell which ACK method you are using because Xen prints
out "Using old ACK method" or "Using new ACK method" during boot.
This workaround can be removed if/when the problems with the new
ACK method are flushed out.
Fix Xen's interrupt acknowledgement routines on certain
(apparently broken) IO-APIC hardware:
1. Do not mask/unmask the IO-APIC pin during normal ISR
processing. This seems to have really bizarre side effects
on some chipsets.
2. Since we instead tickle the local APIC in the ->end
irq hook function, it *must* run on the CPU that
received the interrupt. Therefore we track which CPUs
need to do final acknowledgement and IPI them if
necessary to do so.
In some cases, say for instance for some bizzare reason
the tree was checked out of CVS, which doens't neccessarily
store file permissions, mkbuildtree may not be executable.
So run them explicitly via bash.