Steven Smith [Sun, 4 Oct 2009 11:50:46 +0000 (12:50 +0100)]
Add support for automatically creating and destroying bypass rings in response to observed traffic.
This is designed to minimise the overhead of the autobypass machine,
and in particular to minimise the overhead in dom0, potentially at the
cost of not always detecting that a bypass would be useful. In
particular, it isn't triggered by transmit_policy_small packets, and
so if you have a lot of very small packets then no bypass will be
created.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
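The detection policy described above can be pictured roughly as follows. This is only a sketch: the threshold names and values (SMALL_PACKET_BYTES, BYPASS_CREATE_COUNT) and the peer_stats structure are hypothetical and are not taken from the netchannel2 code. It shows that small packets are never counted, so a stream made up entirely of small packets never suggests a bypass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning values; the commit message does not state the real ones. */
#define SMALL_PACKET_BYTES  128u  /* packets at or below this size are ignored */
#define BYPASS_CREATE_COUNT 4u    /* counted packets needed to suggest a bypass */

struct peer_stats {
    unsigned int counted_packets; /* non-small packets seen for this peer */
};

/* Record one observed packet. Returns true when this packet brings the
 * count to (or keeps it at) the level where a bypass looks worthwhile. */
bool observe_packet(struct peer_stats *st, size_t len)
{
    assert(st != NULL);
    if (len <= SMALL_PACKET_BYTES)
        return false;             /* small packets never trigger detection */
    st->counted_packets++;
    return st->counted_packets >= BYPASS_CREATE_COUNT;
}
```

Keeping the per-packet work to one comparison and one increment is in the spirit of the low-overhead goal stated in the message, at the cost of missing small-packet workloads.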
Steven Smith [Sun, 4 Oct 2009 11:46:32 +0000 (12:46 +0100)]
Bypass support, for both frontend and backend.
A bypass is an auxiliary ring attached to a netchannel2 interface
which is used to communicate with a particular remote guest,
completely bypassing the bridge in dom0. This is quite a bit faster,
and can also help to prevent dom0 from becoming a bottleneck on large
systems.
Bypasses are inherently incompatible with packet filtering in domain
0. This is a moderately unusual configuration (there'll usually be a
firewall protecting the dom0 host stack, but bridge filtering is less
common), and we rely on the user turning off bypasses if they're doing
it.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Sun, 4 Oct 2009 11:38:25 +0000 (12:38 +0100)]
Add support for receiver-map mode.
In this mode of operation, the receiving domain maps the sending
domain's buffers, rather than grant-copying them into local memory.
This is marginally faster, but requires the receiving domain to be
somewhat trusted, because:
a) It can see anything else which happens to be on the same page
as the transmit buffer, and
b) It can just hold onto the pages indefinitely, causing a memory leak
in the transmitting domain.
It's therefore only really suitable for talking to a trusted peer, and
we use it in that way.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Sun, 4 Oct 2009 11:25:44 +0000 (12:25 +0100)]
Add a fall-back poller, in case finish messages get stuck somewhere.
We try to avoid the event channel notification when sending finish
messages, for performance reasons, but that can lead to a deadlock if
you have a lot of packets going in one direction and nothing coming
the other way. Fix it by just polling for messages every second when
there are unfinished packets outstanding.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
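A minimal sketch of the fall-back rule from the commit above, assuming a simple pair of counters (ring_state and its fields are hypothetical names, not the driver's own). The poller is armed only while some packets are still unfinished, and then fires once a second.

```c
#include <assert.h>

#define POLL_INTERVAL_MS 1000u  /* "every second", per the commit message */

struct ring_state {
    unsigned int sent;      /* packets handed to the peer */
    unsigned int finished;  /* finish messages processed so far */
};

/* Packets the peer has not yet acknowledged with a finish message. */
unsigned int unfinished_packets(const struct ring_state *rs)
{
    assert(rs->finished <= rs->sent);
    return rs->sent - rs->finished;
}

/* Delay until the next fall-back poll, or 0 if the poller should stay idle. */
unsigned int next_poll_delay_ms(const struct ring_state *rs)
{
    return unfinished_packets(rs) > 0 ? POLL_INTERVAL_MS : 0;
}
```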
Steven Smith [Fri, 2 Oct 2009 16:48:43 +0000 (17:48 +0100)]
Extend the grant tables implementation with an improved allocation batching mechanism.
The current batched allocation mechanism only allows grefs to be
withdrawn from the pre-allocated pool one at a time; the new scheme
allows them to be withdrawn in groups. There aren't currently any
users of this facility, but it will simplify some of the NC2 logic
(coming up shortly).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
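To illustrate withdrawing references in groups, here is a toy pool. All names (gref_pool, pool_claim_group, POOL_SIZE) are hypothetical and this is not the kernel's grant-table API; the all-or-nothing behaviour is a design assumption of the sketch, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

typedef unsigned int gref_t;      /* stand-in for the kernel's grant reference type */

struct gref_pool {
    gref_t free[POOL_SIZE];       /* stack of pre-allocated references */
    size_t nr_free;
};

void pool_init(struct gref_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->free[i] = (gref_t)(100 + i);   /* arbitrary reference numbers */
    p->nr_free = POOL_SIZE;
}

/* Withdraw n references in one call. All-or-nothing, so a caller never
 * ends up holding a partial group that it then has to give back. */
bool pool_claim_group(struct gref_pool *p, size_t n, gref_t *out)
{
    if (n > p->nr_free)
        return false;
    for (size_t i = 0; i < n; i++)
        out[i] = p->free[--p->nr_free];
    return true;
}
```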
Steven Smith [Fri, 2 Oct 2009 16:47:55 +0000 (17:47 +0100)]
Add support for transitive grants.
These allow a domain A which has been granted access on a page of
domain B's memory to issue domain C with a copy-grant on the same
page. This is useful e.g. for forwarding packets between domains.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 16:45:49 +0000 (17:45 +0100)]
Add support for copy only (sub-page) grants.
These are like normal access grants, except:
-- They can't be used to map the page (so can only be used in a
GNTTABOP_copy hypercall).
-- It's possible to grant access with a finer granularity than whole
pages.
-- Xen guarantees that they can be revoked quickly (a normal map
grant can only be revoked with the cooperation of the domain which
has been granted access).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
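The finer-than-page granularity amounts to a bounds check on each copy. The sketch below uses a toy grant record (subpage_grant, copy_allowed and TOY_PAGE_SIZE are hypothetical names) to show the kind of check implied; it is not Xen's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy grant: bytes [page_off, page_off + length) of one page may be copied. */
struct subpage_grant {
    uint16_t page_off;
    uint16_t length;
};

/* True if a copy of len bytes starting at offset off stays inside the window. */
bool copy_allowed(const struct subpage_grant *g, uint32_t off, uint32_t len)
{
    uint32_t start = g->page_off;
    uint32_t end = start + g->length;  /* two 16-bit values cannot overflow 32 bits */

    assert(end <= TOY_PAGE_SIZE);      /* the window must lie within one page */
    if (off < start || off > end)
        return false;
    return len <= end - off;
}
```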
Steven Smith [Fri, 2 Oct 2009 16:03:44 +0000 (17:03 +0100)]
Fix a long-standing memory leak in the grant tables implementation.
According to the interface comments, gnttab_end_foreign_access() is
supposed to free the page once the grant is no longer in use, from a
polling timer, but that was never implemented. Implement it.
This shouldn't make any real difference, because the existing drivers
all arrange that with well-behaved backends references are never ended
while they're still in use, but it tidies things up a bit.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
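The deferred free can be sketched as one tick of a polling timer over the pages still waiting. The structure and function names below (deferred_page, poll_deferred) are hypothetical, and the real code tracks whether a grant is in use through the grant table rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct deferred_page {
    bool grant_in_use;  /* the remote domain is still using the grant */
    bool freed;         /* the page has been released */
};

/* One tick of the polling timer: free every page whose grant is now idle.
 * Returns the number of pages that still have to wait for a later tick. */
size_t poll_deferred(struct deferred_page *pages, size_t n)
{
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].freed)
            continue;
        if (pages[i].grant_in_use)
            pending++;
        else
            pages[i].freed = true;
    }
    return pending;
}
```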
Steven Smith [Fri, 2 Oct 2009 15:56:58 +0000 (16:56 +0100)]
Use the foreign page tracking logic in netback.c. This isn't terribly
useful, but will be necessary if anything else ever introduces
mappings of foreign pages into the network stack.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 15:52:00 +0000 (16:52 +0100)]
Introduce a live_maps facility for tracking which domain foreign pages were mapped from in a reasonably uniform way.
This isn't terribly useful at present, but will make it much easier to
forward mapped packets between domains when there are multiple drivers
loaded which can produce such packets (e.g. netback1 and netback2).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
xen/netback: use smart polling instead of event notification
Netback will not notify netfront until it finds that the netfront has
stopped polling.
Netback will set a flag in xenstore to indicate whether netback
supports the smart polling feature. If there is only one side
supporting it, the communication mechanism will fall back to default,
and the new feature will not be used. The feature is enabled only
when both sides have the flag set in xenstore.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
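The negotiation rule in the smart polling commit above reduces to a logical AND of the two advertised flags. In this sketch the names (endpoint, feature_smart_poll, backend_should_notify) are hypothetical, and "always notify" stands in for the default mechanism, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

struct endpoint {
    bool feature_smart_poll;  /* flag this side published in xenstore */
};

/* The feature is used only when both sides advertise it. */
bool smart_poll_enabled(const struct endpoint *back, const struct endpoint *front)
{
    return back->feature_smart_poll && front->feature_smart_poll;
}

/* Should the backend send an event notification to the frontend? */
bool backend_should_notify(bool smart_poll, bool frontend_polling)
{
    if (!smart_poll)
        return true;           /* simplified default: notify as before */
    return !frontend_polling;  /* notify only once the frontend stopped polling */
}
```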
Change c3219dc868fe3e84070d6da2d0759a834b6f7251, "Completely drop flip
support", was a bit too aggressive in removing code: it removed a chunk
that was used not only for flip but also when a buffer crossed a page
boundary. Reinstate that code.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keir says:
> > Does CONFIG_XEN_NETDEV_PIPELINED_TRANSMITTER need to be a config
> > option? Could/should we always/never set it?
> It doesn't work well with local delivery into dom0, nor even with IP
> fragment reassembly. I don't think we would ever turn it on these days.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jan Beulich [Fri, 6 Mar 2009 08:29:31 +0000 (08:29 +0000)]
linux/netback: unmap tx ring gref when mapping of rx ring gref failed
[ijc-ported from linux-2.6.18-xen.hg 782:51decc39e5e7]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 9 Feb 2009 20:05:50 +0000 (12:05 -0800)]
xen: Include xen/interface/xen.h in grant_table.h
Impact: compile fix
otherwise build fails with CONFIG_XEN_DOM0=n
In file included from /local/scratch/ianc/devel/kernels/paravirt/include/xen/grant_table.h:41,
from /local/scratch/ianc/devel/kernels/paravirt/drivers/pci/xen-iommu.c:13:
/local/scratch/ianc/devel/kernels/paravirt/include/xen/interface/grant_table.h:96: error: expected specifier-qualifier-list before 'domid_t'
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
When running in a Xen domain with device access, we need to make sure
the block subsystem doesn't merge requests across pages which aren't
machine physically contiguous. To do this, we define our own
BIOVEC_PHYS_MERGEABLE. When CONFIG_XEN isn't enabled, or we're not
running in a Xen domain, this has identical behaviour to the normal
implementation. When running under Xen, we also make sure the
underlying machine pages are the same or adjacent.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
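The extra Xen condition described above ("the same or adjacent" machine pages) can be sketched with a toy pseudo-physical to machine table. The table contents and the names p2m, pfn_to_mfn and machine_mergeable are made up for illustration; the real macro also applies the normal mergeability test, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy pseudo-physical to machine frame table, indexed by pfn. */
static const unsigned long p2m[] = { 40, 41, 90, 91, 17 };

unsigned long pfn_to_mfn(size_t pfn)
{
    assert(pfn < sizeof(p2m) / sizeof(p2m[0]));
    return p2m[pfn];
}

/* Two page frames may be merged only if their machine frames are the
 * same or the second directly follows the first. */
bool machine_mergeable(size_t pfn1, size_t pfn2)
{
    unsigned long mfn1 = pfn_to_mfn(pfn1);
    unsigned long mfn2 = pfn_to_mfn(pfn2);

    return mfn1 == mfn2 || mfn1 + 1 == mfn2;
}
```

Here pfns 1 and 2 are adjacent in pseudo-physical space but map to machine frames 41 and 90, so a request spanning them must not be merged.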
Alex Nixon [Mon, 9 Feb 2009 20:05:46 +0000 (12:05 -0800)]
xen: Allow unprivileged Xen domains to create iomap pages
PV DomU domains are allowed to map hardware MFNs for PCI passthrough,
but are not generally allowed to map raw machine pages. In particular,
various pieces of code try to map DMI and ACPI tables in the ISA ROM
range. We disallow _PAGE_IOMAP for those mappings, so that they are
redirected to a set of local zeroed pages we reserve for that purpose.
[ Impact: prevent passthrough of ISA space, as we only allow PCI ]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Stephen Tweedie [Thu, 18 Jun 2009 23:54:24 +0000 (16:54 -0700)]
xen/mtrr: Add mtrr_if support for Xen mtrr
Add a Xen mtrr type, and reorganise mtrr initialisation slightly to
allow the mtrr driver to set up num_var_ranges (Xen needs to do this by
querying the hypervisor itself.)
Only the boot path is handled for now: we set up a xen-specific mtrr_if
and set up the mtrr tables based on hypervisor information, but we don't
yet handle mtrr entry add/delete.
[ Impact: add basic MTRR support (enough to get started) ]
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Stephen Tweedie [Sat, 7 Feb 2009 03:09:47 +0000 (19:09 -0800)]
x86/mtrr: Extend mtrr_if to include num_var_ranges
Reorganise mtrr initialisation slightly to allow the mtrr driver to
set up num_var_ranges. This cleans things up by making each driver
return the number of ranges rather than having a single function with
magic knowledge.
[ Impact: clean up num_var_ranges operation ]
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Christophe Saout [Sat, 17 Jan 2009 16:30:17 +0000 (17:30 +0100)]
x86/paravirt: paravirtualize IO permission bitmap
Paravirtualized x86 systems don't have an exposed TSS, as it is only
directly visible in ring 0. The IO permission bitmap is part of
the TSS, so without a TSS, it must be paravirtualized separately,
like the iopl mask.
[ Impact: make ioperm bitmap work under Xen ]
Signed-off-by: Christophe Saout <chtephan@leto.intern.saout.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen/i386: make sure initial VGA/ISA mappings are not overridden
arch/x86/mm/init_32.c overrides the ISA/VGA mappings with direct mappings
which do not have _PAGE_IOMAP set, thereby making the ISA space inaccessible.
This patch adds to the existing hack to make sure the pre-constructed
ISA mappings are not incorrectly overwritten.
Thanks to Gerd Hoffmann for pointing this out.
[ Impact: Makes 32-bit dom0 VGA work properly. ]
Diagnosed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Get the information about the VGA console hardware from Xen, and put
it into the form the bootloader normally generates, so that the rest
of the kernel can deal with VGA as usual.
[ Impact: make VGA console work in dom0 ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Ian Campbell [Sat, 7 Feb 2009 03:17:22 +0000 (19:17 -0800)]
xen: clear reserved bits in l3 entries given in the initial pagetables
In native PAE, the only flag that may be legitimately set in an L3
entry is Present. When Xen grafts the top-level PAE L3 pagetable
entries into the L4 pagetable, it must also set the other permissions
flags so that the mapped pages are actually accessible.
However, due to a bug in the hypervisor, it validates updates to the L3
entries as formal PAE entries, so it will refuse to validate these
entries with the extra bits required for 4-level pagetables.
This patch simply masks the entries back to the bare PAE level,
leaving Xen to add whatever bits it feels are necessary.
[ Impact: workaround Xen bug in 32-on-64 dom0 ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
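Masking an entry back to the bare PAE level, as the commit above describes, might look like the following. The macro names are hypothetical; the sketch assumes the usual x86 layout (Present in bit 0, frame address in bits 12 to 51) and may not match exactly which bits the actual patch preserves.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT   UINT64_C(0x1)                 /* bit 0 on x86 */
#define PTE_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)  /* frame address, bits 12-51 */

/* Keep only the frame address and the Present flag of an L3 entry,
 * dropping any other permission bits that were set. */
uint64_t mask_l3_entry(uint64_t entry)
{
    return entry & (PTE_ADDR_MASK | PTE_PRESENT);
}
```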
Ian Campbell [Sat, 7 Feb 2009 03:15:40 +0000 (19:15 -0800)]
xen: implement XENMEM_machphys_mapping
This hypercall allows Xen to specify a non-default location for the
machine to physical mapping. This capability is used when running a 32
bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to
exactly the size required.
[ Impact: add Xen hypercall definitions ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Ian Campbell [Sat, 7 Feb 2009 03:09:48 +0000 (19:09 -0800)]
xen/dom0: Use host E820 map
Unlike the non-paravirt Xen port we do not have distinct pseudo-physical
and I/O memory resource-spaces and therefore resources in the two
can clash. Fix this by registering a memory map which matches the
underlying I/O map. Currently this wastes the memory in the reserved
regions. Eventually we should remap this memory to the end of the
address space.
[ Impact: synthesize virtual E820 using real one as template ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen/dom0: use _PAGE_IOMAP in ioremap to do machine mappings
In a Xen domain, ioremap operates on machine addresses, not
pseudo-physical addresses. We use _PAGE_IOMAP to determine whether a
mapping is intended for machine addresses.
[ Impact: allow Xen domain to map real hardware ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Linus Torvalds [Wed, 17 Jun 2009 17:42:21 +0000 (10:42 -0700)]
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Fix some typos in comments
kmemleak: Rename kmemleak_panic to kmemleak_stop
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Catalin Marinas [Wed, 17 Jun 2009 17:29:02 +0000 (18:29 +0100)]
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Kmemleak allocates memory for pointer tracking and it tries to avoid
using GFP_ATOMIC if the caller doesn't require it. However other gfp
flags may be passed by the caller which aren't required by kmemleak.
This patch filters the gfp flags so that only GFP_KERNEL | GFP_ATOMIC
are used.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
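The flag filtering in the kmemleak commit above is a single mask operation. The TOY_GFP_* values below are illustrative only (the real GFP_* bits live in the kernel headers and differ between versions), and tracker_gfp is a hypothetical name.

```c
#include <assert.h>

/* Illustrative flag values only; not the kernel's real GFP_* bits. */
#define TOY_GFP_WAIT   0x01u
#define TOY_GFP_HIGH   0x02u
#define TOY_GFP_IO     0x04u
#define TOY_GFP_FS     0x08u
#define TOY_GFP_ZERO   0x10u  /* a caller-only flag the tracker does not need */
#define TOY_GFP_KERNEL (TOY_GFP_WAIT | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_ATOMIC (TOY_GFP_HIGH)

/* Flags for the tracker's own allocation: keep only the bits that belong
 * to the kernel or atomic sets and drop everything else the caller passed. */
unsigned int tracker_gfp(unsigned int caller_gfp)
{
    return caller_gfp & (TOY_GFP_KERNEL | TOY_GFP_ATOMIC);
}
```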
Linus Torvalds [Wed, 17 Jun 2009 16:51:50 +0000 (09:51 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
[CPUFREQ] powernow-k8: get drv data for correct CPU
[CPUFREQ] powernow-k8: read P-state from HW
[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
[CPUFREQ] minor correction to cpu-freq documentation
[CPUFREQ] powernow-k8.c: mess cleanup
[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic79xx: make driver respect nvram for IU and QAS settings
[SCSI] don't attach ULD to Dell Universal Xport
[SCSI] lpfc 8.3.3 : Update driver version to 8.3.3
[SCSI] lpfc 8.3.3 : Add support for Target Reset handler entrypoint
[SCSI] lpfc 8.3.3 : Fix a couple of spin_lock and memory issues and a crash
[SCSI] lpfc 8.3.3 : FC/FCOE discovery fixes
[SCSI] lpfc 8.3.3 : Fix various SLI-3 vs SLI-4 differences
[SCSI] qla2xxx: Resolve a performance issue in interrupt
[SCSI] cnic, bnx2i: Fix build failure when CONFIG_PCI is not set.
[SCSI] nsp_cs: time_out reaches -1
[SCSI] qla2xxx: fix printk format warnings
[SCSI] ncr53c8xx: div reaches -1
[SCSI] compat: don't perform unneeded copy in sg_io code
[SCSI] zfcp: Update FC pass-through support
[SCSI] zfcp: Add FC pass-through support
[SCSI] FC Pass Thru support
Linus Torvalds [Wed, 17 Jun 2009 16:48:30 +0000 (09:48 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6: (21 commits)
UBI: add reboot notifier
UBI: handle more error codes
UBI: fix multiple spelling typos
UBI: fix kmem_cache_free on error patch
UBI: print amount of reserved PEBs
UBI: improve messages in the WL worker
UBI: make gluebi a separate module
UBI: remove built-in gluebi
UBI: add notification API
UBI: do not switch to R/O mode on read errors
UBI: fix and clean-up error paths in WL worker
UBI: introduce new constants
UBI: fix race condition
UBI: minor serialization fix
UBI: do not panic if volume check fails
UBI: add dump_stack in checking code
UBI: fix races in I/O debugging checks
UBI: small debugging code optimization
UBI: improve debugging messages
UBI: re-name volumes_mutex to device_mutex
...
Linus Torvalds [Wed, 17 Jun 2009 16:46:33 +0000 (09:46 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: start using hrtimers
hrtimer: export ktime_add_safe
UBIFS: do not forget to register BDI device
UBIFS: allow sync option in rootflags
UBIFS: remove dead code
UBIFS: use anonymous device
UBIFS: return proper error code if the compr is not present
UBIFS: return error if link and unlink race
UBIFS: reset no_space flag after inode deletion
Linus Torvalds [Wed, 17 Jun 2009 16:13:52 +0000 (09:13 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (47 commits)
MIPS: Add hibernation support
MIPS: Move Cavium CP0 hwrena impl bits to cpu-feature-overrides.h
MIPS: Allow CPU specific overriding of CP0 hwrena impl bits.
MIPS: Kconfig Add SYS_SUPPORTS_HUGETLBFS and enable it for some systems.
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
MIPS: TLB support for hugetlbfs.
MIPS: Add hugetlbfs page defines.
MIPS: Add support files for hugetlbfs.
MIPS: Remove unused parameters from iPTE_LW.
Staging: Add octeon-ethernet driver files.
MIPS: Export erratum function needed by octeon-ethernet driver.
MIPS: Cavium-Octeon: Add more chip specific feature tests.
MIPS: Cavium-Octeon: Add more board type constants.
MIPS: Export cvmx_sysinfo_get needed by octeon-ethernet driver.
MIPS: Add named alloc functions to OCTEON boot monitor memory allocator.
MIPS: Alchemy: devboards: Convert to gpio calls.
MIPS: Alchemy: xxs1500: use linux gpio api.
MIPS: Alchemy: MTX-1: Use linux gpio api.
MIPS: Alchemy: Rewrite GPIO support.
MIPS: Alchemy: Remove unused au1000_gpio.h header
...
Linus Torvalds [Wed, 17 Jun 2009 15:46:57 +0000 (08:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
get rid of BKL in fs/sysv
get rid of BKL in fs/minix
get rid of BKL in fs/efs
befs ->pust_super() doesn't need BKL
Cleanup of adfs headers
9P doesn't need BKL in ->umount_begin()
fuse doesn't need BKL in ->umount_begin()
No instance of ->bmap() needs BKL
remove unlock_kernel() left accidentally
ext4: avoid unnecessary spinlock in critical POSIX ACL path
ext3: avoid unnecessary spinlock in critical POSIX ACL path
Wu Zhangjin [Thu, 4 Jun 2009 12:27:10 +0000 (20:27 +0800)]
MIPS: Add hibernation support
[Ralf: SMP support requires CPU hotplugging which MIPS currently doesn't
support. As implemented in this patch cache and tlb flushing will also be
invoked with interrupts disabled so smp_call_function() will blow up in
charming ways. So limit to !SMP.]
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Yan Hua <yanh@lemote.com>
Reviewed-by: Arnaud Patard <apatard@mandriva.com>
Reviewed-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Wu Zhangjin <wuzj@lemote.com>
Signed-off-by: Hu Hongbing <huhb@lemote.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 28 May 2009 00:47:45 +0000 (17:47 -0700)]
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
As part of adding hugetlbfs support for MIPS, I am adding a new
Kconfig variable 'SYS_SUPPORTS_HUGETLBFS'. Since some MIPS CPU
variants don't yet support it, we can enable selection of HUGETLBFS on
a system-by-system basis from arch/mips/Kconfig.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
CC: William Irwin <wli@holomorphy.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>