Qing He [Tue, 18 Aug 2009 05:45:13 +0000 (13:45 +0800)]
xen: add msi support for dom0
This patch adds MSI support for dom0. Based on the
arch_setup_msi_irqs hook, xen_setup_msi_irqs is called
if we're running in a Xen domain. Interrupt remapping is not handled,
since that feature isn't exposed to Xen domains at this time.
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Merge branch 'rebase/dom0/apic' into rebase/dom0/msi
* rebase/dom0/apic: (7035 commits)
xen: disable MSI
xen: fix legacy irq setup, make ioapic-less machines work.
xen: set pirq name to something useful.
xen: dynamically allocate irq & event structures
xen: initialize irq 0 too
xen: use acpi_get_override_irq() to get triggering for legacy irqs
xen: don't setup acpi interrupt unless there is one
xen: pre-initialize legacy irqs early
xen: bind pirq to vector and event channel
xen: direct irq registration to pirq event channels
xen/apic: identity map gsi->irqs
x86/io_apic: add get_nr_irqs_gsi()
xen: implement pirq type event channels
xen: create dummy ioapic mapping
xen: hook io_apic read/write operations
xen/dom0: handle acpi lapic parsing in Xen dom0
kmemleak: Fix some typos in comments
kmemleak: Rename kmemleak_panic to kmemleak_stop
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Documentation/vm/Makefile: don't try to build slqbinfo
...
Weidong Han [Thu, 30 Jul 2009 06:33:01 +0000 (14:33 +0800)]
pv-ops: register xen pci notifier
Register the notifier to handle hot-plug devices and SR-IOV devices
for the Xen hypervisor. When a device is hot-added or removed, the
notifier adds it to or removes it from Xen via hypercalls.
Changes in v3:
It isn't necessary to explicitly initialize elements to 0, because the
initializer will do it implicitly. Remove the unnecessary initialization.
Changes in v2:
Remove inline #ifdef and the awkward dangling else/#endif construction,
and rather than using memset, use variable declaration and initializer
to assign the elements in xen_add_device.
Signed-off-by: Weidong Han <weidong.han@intel.com>
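The v2/v3 notes above rely on a C rule: members not named in an initializer are implicitly zeroed. A minimal sketch of the two styles follows. The structure and function names are hypothetical stand-ins, not the actual hypercall argument type used by xen_add_device.

```c
#include <string.h>

/* Hypothetical stand-in for a hypercall argument structure. */
struct sketch_device_add {
    unsigned char bus;
    unsigned char devfn;
    unsigned char is_extfn;
    unsigned char is_virtfn;
};

/* Older style: clear everything with memset, then assign fields. */
static struct sketch_device_add sketch_args_memset(unsigned char bus,
                                                   unsigned char devfn)
{
    struct sketch_device_add a;

    memset(&a, 0, sizeof(a));
    a.bus = bus;
    a.devfn = devfn;
    return a;
}

/*
 * Declaration plus initializer: members that are not named
 * (is_extfn, is_virtfn) are implicitly initialized to 0, so
 * writing ".is_extfn = 0" would be redundant.
 */
static struct sketch_device_add sketch_args_init(unsigned char bus,
                                                 unsigned char devfn)
{
    struct sketch_device_add a = {
        .bus = bus,
        .devfn = devfn,
    };

    return a;
}
```

Both helpers produce the same member values; the second avoids the separate memset call and the explicit zero assignments.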
Dynamically allocate the irq_info and evtchn_to_irq arrays, so that
1) the irq_info array scales to the actual number of possible irqs,
and 2) we don't needlessly increase the static size of the kernel
when we aren't running under Xen.
Derived from a patch by Mike Travis <travis@sgi.com>.
[ Impact: reduce memory usage ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
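As a rough userspace illustration of the dynamic allocation described above: the tables are sized at run time instead of being static arrays. The struct layout, the function names and the use of calloc/malloc are assumptions made for this sketch; the kernel code uses its own allocators and types.

```c
#include <stdlib.h>

/* Hypothetical per-irq record; the real irq_info layout differs. */
struct sketch_irq_info {
    int type;
    int evtchn;
};

struct sketch_irq_tables {
    struct sketch_irq_info *irq_info; /* one entry per possible irq */
    int *evtchn_to_irq;               /* -1 means "not bound" */
    size_t nr_irqs;
    size_t nr_evtchns;
};

/* Allocate both tables at run time; returns 0 on success, -1 on failure. */
static int sketch_irq_tables_init(struct sketch_irq_tables *t,
                                  size_t nr_irqs, size_t nr_evtchns)
{
    size_t i;

    t->irq_info = calloc(nr_irqs, sizeof(*t->irq_info));
    t->evtchn_to_irq = malloc(nr_evtchns * sizeof(*t->evtchn_to_irq));
    if (!t->irq_info || !t->evtchn_to_irq) {
        free(t->irq_info);
        free(t->evtchn_to_irq);
        t->irq_info = NULL;
        t->evtchn_to_irq = NULL;
        return -1;
    }

    for (i = 0; i < nr_evtchns; i++)
        t->evtchn_to_irq[i] = -1;

    t->nr_irqs = nr_irqs;
    t->nr_evtchns = nr_evtchns;
    return 0;
}

/* Release both tables and reset the pointers. */
static void sketch_irq_tables_free(struct sketch_irq_tables *t)
{
    free(t->irq_info);
    free(t->evtchn_to_irq);
    t->irq_info = NULL;
    t->evtchn_to_irq = NULL;
}
```

If the init function is never called (the "not running under Xen" case), no memory is consumed beyond the small table descriptor.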
Ian Campbell [Mon, 9 Feb 2009 20:05:48 +0000 (12:05 -0800)]
xen: pre-initialize legacy irqs early
Various legacy devices, such as IDE, assume their legacy interrupts are
already initialized and are immediately usable. Pre-initialize all the
legacy interrupts.
[ Impact: ISA/legacy device compat ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Having converted a dev+pin to a gsi, and that gsi to an irq, and
allocated a vector for the irq, we must program the IO APIC to deliver
an interrupt on a pin to the vector, so Xen can deliver it as an event
channel.
Given the pirq, we can get the gsi and vector. We map the gsi to a
specific IO APIC's pin, and set the routing entry.
(We were passing the ACPI triggering and polarity levels directly into
the apic - but they have reversed values. The result was that
all the level-triggered interrupts were edge, and vice-versa.
It's surprising that anything worked at all, but now AHCI works
for me.
Thanks to Gerd Hoffmann for noticing this.)
[ Impact: program IO APICs under Xen ]
Diagnosed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
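The triggering/polarity mix-up described above is the kind of bug an explicit translation step avoids. The sketch below is illustrative only: the constant names are hypothetical and the numeric encodings are assumptions chosen for the example, so check the real ACPI and IO APIC definitions before relying on them.

```c
/* Assumed ACPI-side encodings (illustrative values only). */
enum { SKETCH_ACPI_LEVEL = 0, SKETCH_ACPI_EDGE = 1 };
enum { SKETCH_ACPI_ACTIVE_HIGH = 0, SKETCH_ACPI_ACTIVE_LOW = 1 };

/* Assumed IO APIC routing-entry encodings (illustrative values only). */
enum { SKETCH_APIC_EDGE = 0, SKETCH_APIC_LEVEL = 1 };
enum { SKETCH_APIC_ACTIVE_HIGH = 0, SKETCH_APIC_ACTIVE_LOW = 1 };

/*
 * Translate by comparing against named constants rather than passing
 * the raw ACPI number through, so the result does not depend on the
 * two encodings happening to agree.
 */
static int sketch_acpi_to_apic_trigger(int acpi_trigger)
{
    return acpi_trigger == SKETCH_ACPI_EDGE ? SKETCH_APIC_EDGE
                                            : SKETCH_APIC_LEVEL;
}

static int sketch_acpi_to_apic_polarity(int acpi_polarity)
{
    return acpi_polarity == SKETCH_ACPI_ACTIVE_HIGH
               ? SKETCH_APIC_ACTIVE_HIGH
               : SKETCH_APIC_ACTIVE_LOW;
}
```

With the assumed values, passing the raw ACPI trigger straight through would turn a level-triggered source into an edge-triggered one, which matches the symptom described in the message.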
xen: direct irq registration to pirq event channels
Impact: route hardware interrupts via Xen
This patch puts the hooks into place so that when the interrupt
subsystem registers an irq, it gets routed via Xen (if we're running
under Xen).
The first step is to get a gsi for a particular device+pin. We use
the normal acpi interrupt routing to do the mapping.
We reserve enough irq space to fit the hardware interrupt sources in,
so we can allocate the irq == gsi, as we do in the native case;
software events will get allocated irqs above that.
Having allocated an irq, we ask Xen to allocate a vector, and then
bind that pirq/vector to an event channel. When the hardware raises
an interrupt on a vector, Xen signals us on the corresponding event
channel, which gets routed to the irq and delivered to the appropriate
device driver.
This patch does everything except set up the IO APIC pin routing to
the vector.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
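The irq numbering scheme described above (irq == gsi for hardware sources, software events allocated above that range) can be sketched as a toy allocator. The limits and function names here are hypothetical; the real code derives the size of the reserved range from the hardware and goes on to ask Xen for a vector and bind an event channel, which is not modelled.

```c
/* Hypothetical sizes: 24 hardware sources, 64 irqs in total. */
#define SKETCH_NR_IRQS_GSI 24
#define SKETCH_NR_IRQS     64

static unsigned char sketch_irq_used[SKETCH_NR_IRQS];

/* Hardware interrupt: identity-map the gsi to an irq. */
static int sketch_irq_for_gsi(unsigned int gsi)
{
    if (gsi >= SKETCH_NR_IRQS_GSI)
        return -1;              /* outside the reserved range */
    sketch_irq_used[gsi] = 1;
    return (int)gsi;
}

/* Software event: take the first free irq above the gsi range. */
static int sketch_alloc_dynamic_irq(void)
{
    int irq;

    for (irq = SKETCH_NR_IRQS_GSI; irq < SKETCH_NR_IRQS; irq++) {
        if (!sketch_irq_used[irq]) {
            sketch_irq_used[irq] = 1;
            return irq;
        }
    }
    return -1;                  /* irq space exhausted */
}
```

Keeping the two ranges disjoint means a hardware gsi can never collide with an irq already handed out to a software event.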
A privileged PV Xen domain can get direct access to hardware. In
order for this to be useful, it must be able to get hardware
interrupts.
Being a PV Xen domain, all interrupts are delivered as event channels.
PIRQ event channels are bound to a pirq number and an interrupt
vector. When an IO APIC raises a hardware interrupt on that vector, it
is delivered as an event channel, which we can deliver to the
appropriate device driver(s).
This patch simply implements the infrastructure for dealing with pirq
event channels.
[ Impact: integrate hardware interrupts into Xen's event scheme ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
We don't allow direct access to the IO apic, so make sure that any
request to map it just "maps" non-present pages. We should see any
attempts at direct access explode nicely.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
In Xen, writes to the IO APIC are paravirtualized via hypercalls, so
implement the appropriate operations.
This version of the patch just hooks the io_apic read/write functions
directly, rather than introducing another layer of indirection. The
xen_initial_domain() tests compile to 0 if CONFIG_XEN_DOM0 isn't set,
and are cheap if it is.
(An alternative would be to add io_apic_ops, and point them to the Xen
implementation as needed. HPA deemed this extra level of indirection to
be excessive.)
[Impact: paravirtualize io_apic_read/write] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
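A minimal sketch of the "test compiles to 0" idea from the message above. All names are hypothetical stand-ins (SKETCH_XEN_DOM0 plays the role of CONFIG_XEN_DOM0, and the read helpers return made-up values rather than touching hardware or issuing a hypercall).

```c
/* Build with -DSKETCH_XEN_DOM0 to mimic CONFIG_XEN_DOM0=y. */
#ifdef SKETCH_XEN_DOM0
static int sketch_is_initial_domain;    /* would be set during early boot */
#define sketch_initial_domain() (sketch_is_initial_domain)
#else
/* Constant 0: the compiler can drop the Xen branch entirely. */
#define sketch_initial_domain() (0)
#endif

/* Stand-in for the native register read. */
static unsigned int sketch_native_read(unsigned int reg)
{
    return reg + 0x100;
}

/* Stand-in for the paravirtualized read (a hypercall in real life). */
static unsigned int sketch_xen_read(unsigned int reg)
{
    return reg + 0x200;
}

/* The hook: one cheap test, no extra ops table or function pointer. */
static unsigned int sketch_io_apic_read(unsigned int reg)
{
    if (sketch_initial_domain())
        return sketch_xen_read(reg);
    return sketch_native_read(reg);
}
```

The alternative mentioned in the message, an ops structure with function pointers, would add an indirect call on every access even when Xen support is compiled out.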
Impact: ignore local apics, which are not usable under Xen
When running in Xen dom0, we still want to parse the ACPI tables to
find out about local and IO apics, but we don't want to actually use
the lapics.
Put a couple of tests for Xen to prevent lapics from being mapped or
accessed. This is very Xen-specific behaviour, so there didn't seem to
be any point in adding more indirection.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Linus Torvalds [Wed, 17 Jun 2009 17:42:21 +0000 (10:42 -0700)]
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Fix some typos in comments
kmemleak: Rename kmemleak_panic to kmemleak_stop
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Catalin Marinas [Wed, 17 Jun 2009 17:29:02 +0000 (18:29 +0100)]
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Kmemleak allocates memory for pointer tracking and it tries to avoid
using GFP_ATOMIC if the caller doesn't require it. However other gfp
flags may be passed by the caller which aren't required by kmemleak.
This patch filters the gfp flags so that only GFP_KERNEL | GFP_ATOMIC
are used.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
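The flag filtering described above amounts to a bitwise AND with a mask. The bit values below are hypothetical and chosen only for illustration; they are not the kernel's real gfp bit assignments.

```c
/* Hypothetical flag bits, for illustration only. */
#define SKETCH_GFP_KERNEL 0x07u  /* assumed: may wait, do I/O, call into fs */
#define SKETCH_GFP_ATOMIC 0x08u  /* assumed: high-priority, no sleeping */
#define SKETCH_GFP_ZERO   0x10u  /* caller-specific: zero the memory */
#define SKETCH_GFP_DMA    0x20u  /* caller-specific: DMA-capable zone */

/* Only these bits are meaningful for the tracker's own allocations. */
#define SKETCH_GFP_TRACK_MASK (SKETCH_GFP_KERNEL | SKETCH_GFP_ATOMIC)

/* Drop every caller flag that the internal allocation does not need. */
static unsigned int sketch_filter_gfp(unsigned int caller_gfp)
{
    return caller_gfp & SKETCH_GFP_TRACK_MASK;
}
```

For example, a caller passing the kernel, zero and DMA flags together would have the internal allocation done with the kernel flags alone.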
Linus Torvalds [Wed, 17 Jun 2009 16:51:50 +0000 (09:51 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
[CPUFREQ] powernow-k8: get drv data for correct CPU
[CPUFREQ] powernow-k8: read P-state from HW
[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
[CPUFREQ] minor correction to cpu-freq documentation
[CPUFREQ] powernow-k8.c: mess cleanup
[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic79xx: make driver respect nvram for IU and QAS settings
[SCSI] don't attach ULD to Dell Universal Xport
[SCSI] lpfc 8.3.3 : Update driver version to 8.3.3
[SCSI] lpfc 8.3.3 : Add support for Target Reset handler entrypoint
[SCSI] lpfc 8.3.3 : Fix a couple of spin_lock and memory issues and a crash
[SCSI] lpfc 8.3.3 : FC/FCOE discovery fixes
[SCSI] lpfc 8.3.3 : Fix various SLI-3 vs SLI-4 differences
[SCSI] qla2xxx: Resolve a performance issue in interrupt
[SCSI] cnic, bnx2i: Fix build failure when CONFIG_PCI is not set.
[SCSI] nsp_cs: time_out reaches -1
[SCSI] qla2xxx: fix printk format warnings
[SCSI] ncr53c8xx: div reaches -1
[SCSI] compat: don't perform unneeded copy in sg_io code
[SCSI] zfcp: Update FC pass-through support
[SCSI] zfcp: Add FC pass-through support
[SCSI] FC Pass Thru support
Linus Torvalds [Wed, 17 Jun 2009 16:48:30 +0000 (09:48 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6: (21 commits)
UBI: add reboot notifier
UBI: handle more error codes
UBI: fix multiple spelling typos
UBI: fix kmem_cache_free on error patch
UBI: print amount of reserved PEBs
UBI: improve messages in the WL worker
UBI: make gluebi a separate module
UBI: remove built-in gluebi
UBI: add notification API
UBI: do not switch to R/O mode on read errors
UBI: fix and clean-up error paths in WL worker
UBI: introduce new constants
UBI: fix race condition
UBI: minor serialization fix
UBI: do not panic if volume check fails
UBI: add dump_stack in checking code
UBI: fix races in I/O debugging checks
UBI: small debugging code optimization
UBI: improve debugging messages
UBI: re-name volumes_mutex to device_mutex
...
Linus Torvalds [Wed, 17 Jun 2009 16:46:33 +0000 (09:46 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: start using hrtimers
hrtimer: export ktime_add_safe
UBIFS: do not forget to register BDI device
UBIFS: allow sync option in rootflags
UBIFS: remove dead code
UBIFS: use anonymous device
UBIFS: return proper error code if the compr is not present
UBIFS: return error if link and unlink race
UBIFS: reset no_space flag after inode deletion
Linus Torvalds [Wed, 17 Jun 2009 16:13:52 +0000 (09:13 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (47 commits)
MIPS: Add hibernation support
MIPS: Move Cavium CP0 hwrena impl bits to cpu-feature-overrides.h
MIPS: Allow CPU specific overriding of CP0 hwrena impl bits.
MIPS: Kconfig Add SYS_SUPPORTS_HUGETLBFS and enable it for some systems.
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
MIPS: TLB support for hugetlbfs.
MIPS: Add hugetlbfs page defines.
MIPS: Add support files for hugetlbfs.
MIPS: Remove unused parameters from iPTE_LW.
Staging: Add octeon-ethernet driver files.
MIPS: Export erratum function needed by octeon-ethernet driver.
MIPS: Cavium-Octeon: Add more chip specific feature tests.
MIPS: Cavium-Octeon: Add more board type constants.
MIPS: Export cvmx_sysinfo_get needed by octeon-ethernet driver.
MIPS: Add named alloc functions to OCTEON boot monitor memory allocator.
MIPS: Alchemy: devboards: Convert to gpio calls.
MIPS: Alchemy: xxs1500: use linux gpio api.
MIPS: Alchemy: MTX-1: Use linux gpio api.
MIPS: Alchemy: Rewrite GPIO support.
MIPS: Alchemy: Remove unused au1000_gpio.h header
...
Linus Torvalds [Wed, 17 Jun 2009 15:46:57 +0000 (08:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
get rid of BKL in fs/sysv
get rid of BKL in fs/minix
get rid of BKL in fs/efs
befs ->pust_super() doesn't need BKL
Cleanup of adfs headers
9P doesn't need BKL in ->umount_begin()
fuse doesn't need BKL in ->umount_begin()
No instance of ->bmap() needs BKL
remove unlock_kernel() left accidentally
ext4: avoid unnecessary spinlock in critical POSIX ACL path
ext3: avoid unnecessary spinlock in critical POSIX ACL path
Wu Zhangjin [Thu, 4 Jun 2009 12:27:10 +0000 (20:27 +0800)]
MIPS: Add hibernation support
[Ralf: SMP support requires CPU hotplugging which MIPS currently doesn't
support. As implemented in this patch cache and tlb flushing will also be
invoked with interrupts disabled so smp_call_function() will blow up in
charming ways. So limit to !SMP.]
Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Yan Hua <yanh@lemote.com> Reviewed-by: Arnaud Patard <apatard@mandriva.com> Reviewed-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Wu Zhangjin <wuzj@lemote.com> Signed-off-by: Hu Hongbing <huhb@lemote.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 28 May 2009 00:47:45 +0000 (17:47 -0700)]
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
As part of adding hugetlbfs support for MIPS, I am adding a new
kconfig variable 'SYS_SUPPORTS_HUGETLBFS'. Since some mips cpu
variants don't yet support it, we can enable selection of HUGETLBFS on
a system by system basis from the arch/mips/Kconfig.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> CC: William Irwin <wli@holomorphy.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 6 May 2009 00:35:21 +0000 (17:35 -0700)]
Staging: Add octeon-ethernet driver files.
The octeon-ethernet driver supports the sgmii, rgmii, spi, and xaui
ports present on the Cavium OCTEON family of SOCs. These SOCs are
multi-core mips64 processors with existing support over in arch/mips.
The driver files can be categorized into three basic groups:
1) Register definitions, these are named cvmx-*-defs.h
2) Main driver code, these have names that don't start cvmx-.
3) Interface specific functions and other utility code, names starting
with cvmx-
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Manuel Lauss [Sat, 6 Jun 2009 12:09:55 +0000 (14:09 +0200)]
MIPS: Alchemy: Rewrite GPIO support.
The current in-kernel Alchemy GPIO support is far too inflexible for
all my use cases. To address this, the following changes are made:
* create generic functions which deal with manipulating the on-chip
GPIO1/2 blocks. Such functions are universally useful.
* Macros for GPIO2 shared interrupt management and block control.
* support for both built-in CONFIG_GPIOLIB and fast, inlined GPIO macros.
If CONFIG_GPIOLIB is not enabled, provide linux gpio framework
compatibility by directly inlining the GPIO1/2 functions. GPIO access
is limited to on-chip ones and they can be accessed as documented in
the datasheets (GPIO0-31 and 200-215).
If CONFIG_GPIOLIB is selected, two (2) gpio_chip-s, one for GPIO1 and
one for GPIO2, are registered. GPIOs can still be accessed by using
the numberspace established in the databooks.
However this is not yet flexible enough for my uses: My Alchemy
systems have a documented "external" gpio interface (fixed, different
numberspace) and can support a variety of baseboards, some of which
are equipped with I2C gpio expanders. I want to be able to provide
the default 16 GPIOs of the CPU board numbered as 0..15 and also
support gpio expanders, if present, starting as gpio16.
To achieve this, a new Kconfig symbol for Alchemy is introduced,
CONFIG_ALCHEMY_GPIO_INDIRECT, which boards can enable to signal
that they don't want the Alchemy numberspace exposed to the outside
world, but instead want to provide their own. Boards are now
responsible for providing the linux gpio interface glue code (either in a
custom gpio.h header (in board include directory) or with gpio_chips).
To make the board-specific inlined gpio functions work, the MIPS
Makefile must be changed so that the mach-au1x00/gpio.h header is
included _after_ the board headers, by moving the inclusion of
the mach-au1x00/ to the end of the header list.
See arch/mips/include/asm/mach-au1x00/gpio.h for more info.
Ralf Baechle [Wed, 17 Jun 2009 10:06:28 +0000 (11:06 +0100)]
MIPS: ioctl.h: Cleanup.
o Rewrite to use <asm-generic/ioctl.h>. Cuts down the file from 40 to
16 lines.
o Delete _IOC_VOID, _IOC_OUT, _IOC_IN and _IOC_INOUT. They were added
for 2.1.14 but I was not able to find any user - not even historical
ones.
Imre Kaloz [Tue, 2 Jun 2009 12:22:06 +0000 (14:22 +0200)]
MIPS: Sibyte: Remove standalone kernel support
CFE is the only supported and used bootloader on the SiByte boards,
the standalone kernel support has never been used outside Broadcom.
Remove it and make the kernel use CFE by default.
Signed-off-by: Imre Kaloz <kaloz@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Thu, 21 May 2009 17:49:47 +0000 (19:49 +0200)]
MIPS: RB532: Check irq number when handling GPIO interrupts
This patch makes sure that we are not going to clear
or change the interrupt status of a GPIO interrupt
greater than 13, as this is the maximum number of GPIO
interrupt sources (p.232 of the RC32434 reference manual).
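A minimal sketch of the bounds check described above, assuming 13 is the highest valid GPIO interrupt index. The names and the status-word model are hypothetical; the real handler works on the RC32434 interrupt registers.

```c
/* Assumed highest valid GPIO interrupt index (per the message above). */
#define SKETCH_GPIO_IRQ_MAX 13u

/*
 * Clear the status bit for a GPIO interrupt, but leave the status word
 * untouched when the index is out of range.
 */
static unsigned int sketch_clear_gpio_irq(unsigned int status,
                                          unsigned int gpio_irq)
{
    if (gpio_irq > SKETCH_GPIO_IRQ_MAX)
        return status;          /* not a GPIO interrupt source */
    return status & ~(1u << gpio_irq);
}
```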
David Daney [Tue, 12 May 2009 19:41:53 +0000 (12:41 -0700)]
MIPS: Allow R2 CPUs to turn off generation of 'ehb' instructions.
Some CPUs do not need ehb instructions after writing CP0 registers.
By allowing ehb generation to be overridden in
cpu-feature-overrides.h, we can save a few instructions in the TLB
handler hot paths.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 20 May 2009 18:40:59 +0000 (11:40 -0700)]
MIPS: Fold the TLB refill at the vmalloc path if possible.
Try to fold the 64-bit TLB refill handler opportunistically at the
beginning of the vmalloc path so as to avoid splitting execution flow in
half and wasting cycles for a branch required at that point then. Resort
to doing the split if either of the newly created parts would not fit into
its designated slot.
Original-patch-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 20 May 2009 18:40:58 +0000 (11:40 -0700)]
MIPS: Replace some magic numbers with symbolic values in tlbex.c
The logic used to split the r4000 refill handler is liberally
sprinkled with magic numbers. We attempt to explain what they are and
normalize them against a new symbolic value (MIPS64_REFILL_INSNS).
CC: David VomLehn <dvomlehn@cisco.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Jun 2009 10:06:24 +0000 (11:06 +0100)]
MIPS: SB1250: Sort out merge mistake.
A wrong resolution of a merge conflict made the recently deleted wrong
error check in sb1250_set_affinity reappear. Send the zombie back to the empire
of the undead.
were incorrectly merged.
The former removes one pair of lock/unlock_kernel(), but the latter adds
several unlock_kernel(). As a result, a few unlock_kernel() calls were left behind.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Theodore Ts'o [Mon, 8 Jun 2009 19:22:25 +0000 (15:22 -0400)]
ext4: avoid unnecessary spinlock in critical POSIX ACL path
If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.
That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACL's associated with the files in
question.
ext4 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. Which is a noticeable overhead even if it's a private
lock, especially on CPU's where the serialization is expensive (eg Intel
Netburst aka 'P4').
For the special case of not actually having any ACL's, all that locking is
unnecessary. Even if somebody else were to be changing the ACL's on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.
So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.
(This commit was ported from a patch originally authored by Linus for
ext3.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
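The speculative load described above can be sketched in userspace C11. This is not the ext3/ext4 code: the types, the sentinel, the spinlock built from atomic_flag and the lock counter are all assumptions made for this illustration, and reference counting of the ACL is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_acl {
    int entries;
};

/* Sentinel address meaning "cache not filled in yet" (hypothetical). */
static struct sketch_acl sketch_acl_not_cached;

struct sketch_inode {
    atomic_flag lock;                       /* stand-in for inode->i_lock */
    struct sketch_acl *_Atomic cached_acl;  /* NULL means "no ACL" */
    unsigned long lock_acquisitions;        /* instrumentation only */
};

static void sketch_inode_init(struct sketch_inode *inode,
                              struct sketch_acl *acl)
{
    atomic_flag_clear(&inode->lock);
    atomic_init(&inode->cached_acl, acl);
    inode->lock_acquisitions = 0;
}

static struct sketch_acl *sketch_get_cached_acl(struct sketch_inode *inode)
{
    /* Speculative, lock-free load. */
    struct sketch_acl *acl =
        atomic_load_explicit(&inode->cached_acl, memory_order_acquire);

    /* Common case: we saw "no ACL", so use that answer as-is. */
    if (acl == NULL)
        return NULL;

    /* Cached entry or unfilled cache: take the lock and re-load. */
    while (atomic_flag_test_and_set_explicit(&inode->lock,
                                             memory_order_acquire))
        ;   /* spin */
    inode->lock_acquisitions++;
    acl = atomic_load_explicit(&inode->cached_acl, memory_order_relaxed);
    atomic_flag_clear_explicit(&inode->lock, memory_order_release);

    /* The caller must still handle &sketch_acl_not_cached by reading from disk. */
    return acl;
}
```

In the no-ACL case the lock is never touched, which is where the saving comes from.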
Linus Torvalds [Mon, 8 Jun 2009 19:22:24 +0000 (15:22 -0400)]
ext3: avoid unnecessary spinlock in critical POSIX ACL path
If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.
That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACL's associated with the files in
question.
ext3 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. Which is a noticeable overhead even if it's a private
lock, especially on CPU's where the serialization is expensive (eg Intel
Netburst aka 'P4').
For the special case of not actually having any ACL's, all that locking is
unnecessary. Even if somebody else were to be changing the ACL's on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.
So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.
This is noticeable even on Nehalem, which does locking quite well (much
better than P4). From lmbench:
Processor, Processes - times in microseconds - smaller is better
--------------------------------------------------------------------
Host      OS             Mhz null null      open slct fork exec   sh
                             call  I/O stat clos  TCP proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
 - before:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
 - after:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148
where you can see what appears to be a roughly 3% improvement in stat
and open/close latencies from just the removal of the locking overhead.
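The "roughly 3%" figure can be checked from the numbers above by averaging the three runs for the stat and open/close columns. The helper names are hypothetical; the values are copied from the lmbench rows.

```c
/* stat and open/close latencies (microseconds) from the lmbench rows. */
static const double sketch_stat_before[3] = { 0.95, 0.95, 0.95 };
static const double sketch_stat_after[3]  = { 0.92, 0.92, 0.92 };
static const double sketch_open_before[3] = { 1.45, 1.48, 1.42 };
static const double sketch_open_after[3]  = { 1.44, 1.39, 1.39 };

/* Arithmetic mean of three samples. */
static double sketch_mean3(const double v[3])
{
    return (v[0] + v[1] + v[2]) / 3.0;
}

/* Relative improvement of the mean latency, in percent. */
static double sketch_improvement_pct(const double before[3],
                                     const double after[3])
{
    double b = sketch_mean3(before);

    return (b - sketch_mean3(after)) / b * 100.0;
}
```

Calling sketch_improvement_pct on the stat rows gives a little over 3%, and on the open/close rows just under 3%, consistent with the estimate in the message.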
Of course, this only matters for files you don't own (the owner never
needs to do the ACL checks), but that's the common case for libraries,
header files, and executables. As well as for the base components of any
absolute pathname, even if you are the owner of the final file.
[ At some point we probably want to move this ACL caching logic entirely
into the VFS layer (and only call down to the filesystem when
uncached), but in the meantime this improves ext3 a bit.
A similar fix to btrfs makes a much bigger difference (15x improvement
in lmbench) due to broken caching. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>