Keir Fraser [Tue, 18 Mar 2008 11:43:42 +0000 (11:43 +0000)]
netfront accel: Reduce length of accel_watch workqueue name
As reported (see link below), the first argument to
create_workqueue is limited to 10 characters on RHEL4U2. This patch
shortens the name and makes it a little more
descriptive.
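A minimal sketch of how such a length limit could be guarded against, assuming the 10-character limit mentioned above. `WQ_NAME_MAX`, `ACCEL_WQ_NAME` and `wq_name_fits()` are hypothetical names for illustration, and the short name used here is not necessarily the one the patch chose.

```c
#include <string.h>

/* Assumed limit, taken from the description above (RHEL4U2). */
#define WQ_NAME_MAX 10

/* Hypothetical short name; the patch's actual choice is not shown here. */
#define ACCEL_WQ_NAME "net_accel"

/* Build-time check: a negative array size is a compile error. */
typedef char accel_wq_name_fits[(sizeof(ACCEL_WQ_NAME) - 1 <= WQ_NAME_MAX) ? 1 : -1];

/* Run-time check for names that are not compile-time constants. */
static int wq_name_fits(const char *name)
{
    return strlen(name) <= WQ_NAME_MAX;
}
```

With this check, the 11-character string "accel_watch" would be rejected, while the shorter hypothetical name passes.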
Keir Fraser [Tue, 18 Mar 2008 11:22:54 +0000 (11:22 +0000)]
xen/i386: re-add and use pre_setup_arch_hook()
It was only during the 2.6.25 merge that I realized that there was a
difference to native code that was not only unnecessary, but even
preventing the Xen version from being more readable and closer to
native both in terms of source code and behavior:
pre_setup_arch_hook() can do everything that (or equivalent to what)
x86-64 does in head64-xen.c. Apart from that it simplifies forward
porting, since certain pieces set up here are required to be available
much earlier in newer Linux.
Keir Fraser [Tue, 18 Mar 2008 11:21:44 +0000 (11:21 +0000)]
xen: satisfy newer modpost requirements, part 2
References to __devinit probe functions are considered valid by newer
modpost if the containing structure is named in certain ways. Use the
_driver suffix where necessary.
Keir Fraser [Tue, 18 Mar 2008 11:18:06 +0000 (11:18 +0000)]
xen pcifront: satisfy newer modpost requirements
pcifront_scan_root() calls pci_scan_bus_parented() possibly after core
kernel initialization, but that latter function is marked __devinit.
While HOTPLUG can be turned off only under EMBEDDED, 2.6.25's modpost
still (validly) catches this as an incorrect reference. Marking
pcifront_scan_root() __init_refok seems too dangerous, however, so
instead the much more streamlined pcifront_backend_changed() is being
marked so, the rest of the offending dependencies are being marked
__devinit, and XEN_PCIDEV_FRONTEND now selects HOTPLUG to make it
independent of any changes in HOTPLUG's prompt visibility.
Keir Fraser [Tue, 18 Mar 2008 11:13:37 +0000 (11:13 +0000)]
xen: validate type and value of the dtor argument of SetPageForeign()
Linux 2.6.25 changes the prototype of pte_free() etc., resulting in
those functions no longer being suitable as a PageForeign destructor. I
had to find out by way of analysing a crash, but for the future it'd
be much better if the build would already indicate a problem with
this.
At the same time, also check the destructor supplied is not NULL.
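The idea can be sketched in plain C as follows. This is an illustration under stated assumptions, not the patch itself: `struct page`, `page_dtor_t`, `SET_PAGE_FOREIGN()` and the destructor signature are simplified stand-ins. Assigning the argument to a typed function pointer lets the compiler diagnose a prototype mismatch, and the assertion rejects NULL.

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Hypothetical destructor type; the real signature may differ. */
typedef void (*page_dtor_t)(struct page *, unsigned int);

/* Reduced stand-in for the kernel's struct page. */
struct page {
    page_dtor_t foreign_dtor;
};

/*
 * The typed local makes the compiler complain when dtor has the wrong
 * prototype; the assert catches a NULL destructor at run time.
 */
#define SET_PAGE_FOREIGN(pg, dtor) do {     \
        page_dtor_t dtor_ = (dtor);         \
        assert(dtor_ != NULL);              \
        (pg)->foreign_dtor = dtor_;         \
    } while (0)

static int dtor_calls;

/* Sample destructor with the expected prototype. */
static void sample_dtor(struct page *pg, unsigned int order)
{
    (void)pg;
    (void)order;
    dtor_calls++;
}

/* Registers the destructor, invokes it once, returns the call count. */
static int demo_set_foreign(void)
{
    struct page pg = { NULL };

    SET_PAGE_FOREIGN(&pg, sample_dtor);
    pg.foreign_dtor(&pg, 0);
    return dtor_calls;
}
```

Passing a function with a different prototype to `SET_PAGE_FOREIGN()` would produce an incompatible-pointer diagnostic at build time instead of a crash later.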
Keir Fraser [Wed, 5 Mar 2008 17:28:41 +0000 (17:28 +0000)]
xenbus: prevent warnings on unhandled enumeration values
XenbusStateReconfiguring/XenbusStateReconfigured were introduced by
c/s 437, but aren't handled in many switch statements. This c/s also
introduced a possibly un-referenced label, which also gets eliminated
here.
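A short sketch of the pattern, assuming a compiler that warns about enumerators missing from a switch (for example GCC's -Wswitch). The enum is a local copy for illustration, and `frontend_state_needs_work()` is a hypothetical handler rather than a function from the tree.

```c
/* Local copy of the state values, for illustration only. */
enum xenbus_state {
    XenbusStateUnknown = 0,
    XenbusStateInitialising,
    XenbusStateInitWait,
    XenbusStateInitialised,
    XenbusStateConnected,
    XenbusStateClosing,
    XenbusStateClosed,
    XenbusStateReconfiguring,
    XenbusStateReconfigured
};

/* Hypothetical handler: returns 1 if the state triggers work, 0 if ignored. */
static int frontend_state_needs_work(enum xenbus_state state)
{
    switch (state) {
    case XenbusStateConnected:
    case XenbusStateClosing:
    case XenbusStateClosed:
        return 1;
    case XenbusStateUnknown:
    case XenbusStateInitialising:
    case XenbusStateInitWait:
    case XenbusStateInitialised:
    case XenbusStateReconfiguring:  /* listed explicitly so the warning stays quiet */
    case XenbusStateReconfigured:
        return 0;
    }
    return 0;  /* not reached for valid enumerators */
}
```

Listing the two newer states explicitly (or adding a default label) keeps the build warning-free while leaving behaviour unchanged.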
Keir Fraser [Wed, 5 Mar 2008 17:28:04 +0000 (17:28 +0000)]
linux: fix pv driver build
When building with -Werror-implicit-function-declaration, the addition
of is_initial_xen_domain() checks in drivers/xen/netfront/accel.c
causes the build to fail.
Additionally, drivers/xen/netfront/netfront.c illegally (and
needlessly) includes xen/hypercall.h directly.
Keir Fraser [Wed, 5 Mar 2008 17:27:36 +0000 (17:27 +0000)]
linux/i386-pae: fix __pte_ma()
While at present there's no use of the macro that would suffer from
this problem, this is a latent bug and should therefore be fixed (just
like __pte() in the native kernel).
Keir Fraser [Wed, 5 Mar 2008 11:09:41 +0000 (11:09 +0000)]
x86 xen: New vcpu_op call to get physical CPU identity.
Some AMD machines have APIC IDs that are not equal to CPU IDs. In
the default Xen configuration, ACPI calls on these machines
can get confused. This shows up most noticeably when running
AMD PowerNow!. The only solution is for dom0 to get the
hypervisor's cpuid to apicid table when needed (ie, when dom0
vcpus are pinned).
Add a vcpu op to Xen to allow dom0 to query the hypervisor for
architecture dependent physical cpu information if dom0 vcpus are
pinned.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Alex Williamson [Fri, 29 Feb 2008 16:14:21 +0000 (09:14 -0700)]
[IA64] kexec: Unpin TLB in the hypervisor
In the case where kexec is being run for a xen dom0 or hypervisor,
this code, present in dom0, will be called from the hypervisor
after ia64_machine_kexec and before going into purgatory.
This code makes a large number of assumptions about various compile
time constants, and thus assumes that these constants are the
same for the hypervisor and dom0. Despite extensive #ifdef work this
has proved to be both fragile and incomplete.
This patch changes things around so that the unpinning work is done
by code provided by the hypervisor, reusing existing code there.
Apart from being a solution that works, it's also likely
a much more maintainable solution: as TLB changes in the hypervisor
code are made, the code paths in the hypervisor are much more likely
to be checked than this one, which lies in a completely different tree.
With this change, and the corresponding change to the hypervisor,
the CPU will be running in "physical mode" when it enters this
relocate_new_kernel code. Previously the code was called with the
CPU in "virtual mode". This is still the case when the code is
called from a regular Linux kernel.
Code that switches the CPU into "physical mode" is still called.
This is primarily left intact to ensure that the register stack
(bspstore) and stack (sp) are updated correctly. It could probably
be trimmed down a bit, but there seems little point.
When called from the hypervisor, in3 is ignored. The number
of parameters could hence be shrunk from 4 to 3, but again,
there seems little point.
This code still assumes that PAGE_SIZE and PAGE_MASK are the
same in the hypervisor and dom0. I am reluctant to fix this
as I think that it is unlikely this will change. But if it does,
the fix should be as easy as passing PAGE_SIZE as in3. I am quite
happy to create a patch if/when it is needed.
As noted above, there is a hypervisor portion of this patch
supplied separately.
Alex Williamson [Fri, 29 Feb 2008 16:06:15 +0000 (09:06 -0700)]
[IA64] kexec: is for privileged guests only
This makes the KEXEC Kconfig option depend on !XEN_UNPRIVILEGED_GUEST, so
that it is not available to unprivileged guests. In other words,
it is only available to non-xen linux or privileged guests.
Some minor #defines relating to kexec have also been
updated.
linux/kexec.h is only needed in contig.c if both XEN and KEXEC
are in operation.
iomem_machine_resource is only used if PROC_IOMEM_MACHINE is in effect.
This does depend on XEN, but also depends on KEXEC and IA64.
Throughout the code #if CONFIG_XEN is used to guard regions.
This is ok, because the relevant code is only active if
KEXEC is configured, and thus implicitly the code is
being compiled with XEN_PRIVILEGED_GUEST.
This is in line with the use of Kconfig on x86_64
(and presumably x86_32, though I did not check).
Ian Campbell [Tue, 26 Feb 2008 17:59:18 +0000 (17:59 +0000)]
Avoid using a separate watch thread due to uninitialised watch->flags.
The xenbus_dev code isn't set up to handle the case where
XBWF_new_thread is set, so there is a potential crash if this flag is
erroneously set. Therefore initialise flags to zero by using kzalloc
rather than kmalloc.
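A user-space analogue of the fix, using calloc() in place of kzalloc(). The `struct watch` layout, `WATCH_NEW_THREAD` and the helper names are hypothetical stand-ins; the point is only that zeroed allocation guarantees the flag starts out clear.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the XBWF_new_thread flag. */
#define WATCH_NEW_THREAD 1UL

/* Reduced stand-in for the watch structure. */
struct watch {
    unsigned long flags;
    const char *node;
};

/* Zeroed allocation: flags is guaranteed to be 0, unlike with malloc(). */
static struct watch *watch_alloc(void)
{
    return calloc(1, sizeof(struct watch));
}

static int watch_wants_thread(const struct watch *w)
{
    return (w->flags & WATCH_NEW_THREAD) != 0;
}

/* Returns 0 if a fresh watch has the flag clear, -1 on allocation failure. */
static int fresh_watch_wants_thread(void)
{
    struct watch *w = watch_alloc();
    int wants;

    if (w == NULL)
        return -1;
    wants = watch_wants_thread(w);
    free(w);
    return wants;
}
```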
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Alex Williamson [Fri, 22 Feb 2008 15:36:10 +0000 (08:36 -0700)]
[IA64] Fix vulnerability of privcmd_mmap
empty_zero_page can be polluted by writing to a page through
privcmd_mmap(), i.e. a user program can hang a privileged
domain (dom0), although root privilege is required.
Resetting the VM_PFNMAP flag is a little bit kludgy, but
fixes the issue.
After this patch is applied, other patches to Qemu become
necessary to create a HVM domain.
Keir Fraser [Fri, 29 Feb 2008 10:29:13 +0000 (10:29 +0000)]
xen: Fix PV resume race against another back-to-back suspend request.
Previously the next suspend request was being dropped on the floor. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 28 Feb 2008 10:55:18 +0000 (10:55 +0000)]
kexec: don't initialise regions in reserve_memory()
There is no need to initialise efi_memmap_res and boot_param_res in
reserve_memory() for the initial xen domain as it is done in
machine_kexec_setup_resources() using values from the kexec hypercall.
Keir Fraser [Thu, 28 Feb 2008 10:54:55 +0000 (10:54 +0000)]
kexec: read more iomem regions from hypervisor
This sets the location of the efi memmap and boot parameter
regions using information provided by the hypervisor,
overriding values derived by dom0 from the virtualised
efi memory regions.
It also creates a xen heap region and uses this as the parent
of per-cpu regions - they belong in hypervisor memory not
dom0 kernel memory.
The xen heap region is inserted into /proc/iomem_machine
* There is also a hypervisor portion of this patch.
* In order for the regions to show up after kexec, patches
to kexec-tools are required. I have posted them
to the kexec mailing list and intend to merge them.
Keir Fraser [Thu, 28 Feb 2008 10:54:20 +0000 (10:54 +0000)]
kexec: add xen_machine_kexec_register_resources() and machine_kexec_register_resources()
Add xen_machine_kexec_register_resources() and
machine_kexec_register_resources() to allow architecture specific
handling of iomem resources.
At this time xen_machine_kexec_register_resources() does the
same parenting of per-cpu resources on all architectures.
And machine_kexec_register_resources does nothing on all
architectures.
Keir Fraser [Thu, 28 Feb 2008 10:53:42 +0000 (10:53 +0000)]
kexec: add parent to per-cpu regions at setup time.
This is slightly more efficient, as xen_machine_kexec_setup_resources()
is called once (before xen_machine_kexec_register_resources()), while
xen_machine_kexec_register_resources() is called once for each EFI
memory region seen by a domain.
More cosmetic than anything else, but it seems more logical to me.
Keir Fraser [Thu, 28 Feb 2008 10:52:47 +0000 (10:52 +0000)]
kexec: Use error path if crash region range can't be accessed
Although the error handling path in xen_machine_kexec_setup_resource()
is somewhat minimal, it ought to be used if HYPERVISOR_kexec_op() fails
when getting the crash kernel region, as this indicates that an error
occurred, not that the crash kernel region is empty.
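The distinction between "the query failed" and "the region is empty" can be sketched as below. This is not the kernel code: `struct range`, `get_range_fn` and the two stub queries are hypothetical stand-ins for the hypercall and its result.

```c
/* Hypothetical result type for a region query. */
struct range {
    unsigned long start;
    unsigned long size;
};

/* Stand-in for the hypercall: returns 0 on success, negative on error. */
typedef int (*get_range_fn)(struct range *out);

/*
 * Returns 0 on success (the region may legitimately be empty) and a
 * negative value when the query itself failed.
 */
static int setup_crash_region(get_range_fn query, struct range *crash)
{
    struct range r;
    int rc = query(&r);

    if (rc < 0)
        return rc;  /* error path: a failure is not "no crash region" */

    *crash = r;     /* size == 0 means no region was reserved */
    return 0;
}

/* Stub that simulates a failing query. */
static int query_fail(struct range *out)
{
    (void)out;
    return -1;
}

/* Stub that simulates a successful query reporting an empty region. */
static int query_empty(struct range *out)
{
    out->start = 0;
    out->size = 0;
    return 0;
}
```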
There would seem to be a potential deadlock in the netfront accelerator
plugin support. When the configured accelerator changes in xenstore,
netfront tries to load the new plugin using request_module(). It does
this from a workqueue work item. request_module() will invoke
modprobe which in some circumstances (I'm not sure exactly what - I've
not managed to reproduce it myself) seems to try to flush the
workqueue, and so it deadlocks. This patch fixes the problem by
giving the accel watch work item its own workqueue, and so modprobe
can successfully flush the system-wide one.
Keir Fraser [Thu, 21 Feb 2008 10:21:34 +0000 (10:21 +0000)]
Solarflare: adjust Kconfig additions
Clean up drivers/xen/Kconfig after the Solarflare additions:
- placement of new items should not disturb menu hierarchy
- dependencies of XEN_NETDEV_ACCEL_SFC_BACKEND were missing
- use tabs for indentation
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
Keir Fraser [Wed, 20 Feb 2008 18:05:47 +0000 (18:05 +0000)]
Solarflare: Various build fixes, and make SFC drivers dependent on x86
From: Kieran Mansley <kmansley@solarflare.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 15 Feb 2008 10:01:06 +0000 (10:01 +0000)]
Xen dom0 arbitrarily assigns APIC ID x to CPU ID x. Make dom0 also
assign the APIC ID to ACPI ID mapping in the same way. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Keir Fraser [Mon, 11 Feb 2008 11:05:27 +0000 (11:05 +0000)]
CVE-2008-0600: Fix exploitable hole in vmsplice() syscall.
Fix is Al Viro's suggested patch for RHEL5. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 11 Feb 2008 10:19:25 +0000 (10:19 +0000)]
fbfront: Clear ring contents on save/restore. Otherwise in some cases
a restored domain loses mouse and keyboard. Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com>
Keir Fraser [Mon, 11 Feb 2008 10:08:57 +0000 (10:08 +0000)]
block: backport Jens Axboe's commit from
Tue, 16 Oct 2007 09:03:56 +0000 (11:03 +0200) bf2de6f5a4faf0197268f18d08969b003b87b6e8
Initial support for data-less (or empty) barrier support
blkback: permit and implement empty barrier. Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Keir Fraser [Mon, 11 Feb 2008 09:55:25 +0000 (09:55 +0000)]
net accel: Fix double-probe of accelerator on suspend_cancel
Fixes a bug in the network acceleration stuff where an accelerator
could get probed with the same interface twice on a suspend-cancel -
once manually in the suspend_cancel handler, and once when the watch
on the accel configuration option fired after being reinstated.
Keir Fraser [Mon, 11 Feb 2008 09:52:49 +0000 (09:52 +0000)]
xen balloon: allocate and free cold pages
To reduce the performance side effects of ballooning, use and return
cold pages. To limit the impact scrubbing of these (and other) pages
has on the cache, also implement a dedicated scrubbing function on x86
which uses non-temporal stores (when available).
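A user-space sketch of scrubbing with non-temporal stores, using SSE2 intrinsics as a stand-in for whatever instructions the patch actually uses. `scrub_page()`, `SCRUB_PAGE_SIZE` and the demo buffer are hypothetical; a plain memset() is the fallback when SSE2 is not available at build time.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define SCRUB_PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Zeroes one page; the page must be 16-byte aligned. */
static void scrub_page(void *page)
{
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    size_t i;

    /* Streaming stores write around the cache instead of filling it. */
    for (i = 0; i < SCRUB_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();  /* order the streaming stores before later accesses */
#else
    memset(page, 0, SCRUB_PAGE_SIZE);
#endif
}

static _Alignas(16) unsigned char demo_page[SCRUB_PAGE_SIZE];

/* Dirties the demo page, scrubs it, and returns 1 if it is all zeroes. */
static int demo_scrub_leaves_zeroes(void)
{
    size_t i;

    memset(demo_page, 0xff, sizeof(demo_page));
    scrub_page(demo_page);
    for (i = 0; i < sizeof(demo_page); i++)
        if (demo_page[i] != 0)
            return 0;
    return 1;
}
```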
Keir Fraser [Mon, 11 Feb 2008 09:49:58 +0000 (09:49 +0000)]
xen/x86: fix and improve xen_limit_pages_to_max_mfn()
- don't do multicall when nr_mcl is zero (and specifically don't
access cr_mcl[nr_mcl - 1] in that case)
- fix CONFIG_XEN_COMPAT <= 0x030002 handling
- don't exchange pages already meeting the restriction (likely
avoiding exchanging anything at all)
- avoid calling kmap functions without CONFIG_XEN_SCRUB_PAGES
- eliminate a few local variables
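The first item above can be illustrated with a small sketch. The `struct multicall` layout, `MC_FLAG_LAST` and `issue_stub()` are hypothetical simplifications; the sketch only shows why an empty batch has to return before touching `cr_mcl[nr_mcl - 1]`.

```c
#include <stddef.h>

/* Simplified stand-in for a multicall entry; the real layout differs. */
struct multicall {
    unsigned long op;
    unsigned long flags;
};

#define MC_FLAG_LAST 1UL  /* hypothetical "final entry" marker */

static unsigned int issued;  /* counts entries handed to the stub below */

/* Stub standing in for the actual batched hypercall. */
static int issue_stub(struct multicall *calls, unsigned int n)
{
    (void)calls;
    issued += n;
    return 0;
}

static int flush_batch(struct multicall *cr_mcl, unsigned int nr_mcl)
{
    if (nr_mcl == 0)
        return 0;  /* empty batch: skip the call and the [nr_mcl - 1] access */

    cr_mcl[nr_mcl - 1].flags |= MC_FLAG_LAST;
    return issue_stub(cr_mcl, nr_mcl);
}
```

Without the early return, `nr_mcl - 1` would wrap around for an unsigned zero and index far outside the array.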
Keir Fraser [Tue, 5 Feb 2008 10:05:19 +0000 (10:05 +0000)]
netback: Fix BUG_ON() on page-flip receive path which would always
trigger and crash the kernel.
Tracked down by Joakim Dahlstedt <jda@bea.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 4 Feb 2008 14:29:03 +0000 (14:29 +0000)]
ebtables: don't compute gap until we know we have an ebt_entry
Original upstream Linux patch by Chuck Ebbert. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 1 Feb 2008 11:11:12 +0000 (11:11 +0000)]
Do not allocate vcpu_guest_context on the stack when initialising a
new VCPU. It is too big for 4kB stacks.
Original patch by Donald Dutile <ddutile@redhat.com> backported from
upstream pv_ops work. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
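A user-space sketch of the same idea: allocate the large context on the heap rather than the stack, and free it once it has been handed over. `struct big_ctxt`, its size, `init_vcpu()` and `accept_ctxt()` are hypothetical stand-ins, with calloc()/free() taking the place of the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a context too large for a 4kB stack. */
struct big_ctxt {
    unsigned char regs[5000];
    unsigned long flags;
};

/* Callback type standing in for the call that consumes the context. */
typedef int (*submit_fn)(unsigned int cpu, const struct big_ctxt *ctxt);

static int init_vcpu(unsigned int cpu, submit_fn submit)
{
    /* Heap allocation keeps the large structure off the stack. */
    struct big_ctxt *ctxt = calloc(1, sizeof(*ctxt));
    int rc;

    if (ctxt == NULL)
        return -1;

    ctxt->flags = cpu;  /* placeholder initialisation */
    rc = submit(cpu, ctxt);
    free(ctxt);         /* the context is only needed for the call */
    return rc;
}

/* Sample consumer: succeeds when the context matches the CPU number. */
static int accept_ctxt(unsigned int cpu, const struct big_ctxt *ctxt)
{
    return ctxt->flags == cpu ? 0 : -1;
}
```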
Keir Fraser [Mon, 21 Jan 2008 11:43:31 +0000 (11:43 +0000)]
blkback/blktap: Check for kthread_should_stop() in inner loop,
mdelay() should be msleep(), and these changes belong in blktap as
well as blkback.
Based on comments and patches from Jan Beulich and Steven Smith. Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 18 Jan 2008 16:52:25 +0000 (16:52 +0000)]
blkback: Request-processing loop is unbounded and hence requires a
yield point. Also, bad request type is a good cause to sleep for a
short while as the frontend has probably gone mad.
Patch by Steven Smith <steven.smith@eu.citrix.com>
Keir Fraser [Fri, 18 Jan 2008 16:35:24 +0000 (16:35 +0000)]
linux/x86: clean up hypercall headers
- don't define HYPERVISOR_hvm_op() for pv guests (requiring to not
include include/xen/hvm.h in non-pv-driver builds)
- remove the custom __STR/STR macros
- remove stringification where not necessary
- reduce instruction size for pv-driver case on x86-64