Steven Smith [Sun, 4 Oct 2009 11:50:46 +0000 (12:50 +0100)]
Add support for automatically creating and destroying bypass rings in response to observed traffic.
This is designed to minimise the overhead of the autobypass machine,
and in particular to minimise the overhead in dom0, potentially at the
cost of not always detecting that a bypass would be useful. In
particular, it isn't triggered by transmit_policy_small packets, and
so if you have a lot of very small packets then no bypass will be
created.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Sun, 4 Oct 2009 11:46:32 +0000 (12:46 +0100)]
Bypass support, for both frontend and backend.
A bypass is an auxiliary ring attached to a netchannel2 interface
which is used to communicate with a particular remote guest,
completely bypassing the bridge in dom0. This is quite a bit faster,
and can also help to prevent dom0 from becoming a bottleneck on large
systems.
Bypasses are inherently incompatible with packet filtering in domain
0. This is a moderately unusual configuration (there'll usually be a
firewall protecting the dom0 host stack, but bridge filtering is less
common), and we rely on the user turning off bypasses if they're doing
it.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Sun, 4 Oct 2009 11:38:25 +0000 (12:38 +0100)]
Add support for receiver-map mode.
In this mode of operation, the receiving domain maps the sending
domain's buffers, rather than grant-copying them into local memory.
This is marginally faster, but requires the receiving domain to be
somewhat trusted, because:
a) It can see anything else which happens to be on the same page
as the transmit buffer, and
b) It can just hold onto the pages indefinitely, causing a memory leak
in the transmitting domain.
It's therefore only really suitable for talking to a trusted peer, and
we use it in that way.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Sun, 4 Oct 2009 11:25:44 +0000 (12:25 +0100)]
Add a fall-back poller, in case finish messages get stuck somewhere.
We try to avoid the event channel notification when sending finish
messages, for performance reasons, but that can lead to a deadlock if
you have a lot of packets going in one direction and nothing coming
the other way. Fix it by just polling for messages every second when
there are unfinished packets outstanding.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 16:48:43 +0000 (17:48 +0100)]
Extend the grant tables implementation with an improved allocation batching mechanism.
The current batched allocation mechanism only allows grefs to be
withdrawn from the pre-allocated pool one at a time; the new scheme
allows them to be withdrawn in groups. There aren't currently any
users of this facility, but it will simplify some of the NC2 logic
(coming up shortly).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 16:47:55 +0000 (17:47 +0100)]
Add support for transitive grants.
These allow a domain A which has been granted access on a page of
domain B's memory to issue domain C with a copy-grant on the same
page. This is useful e.g. for forwarding packets between domains.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 16:45:49 +0000 (17:45 +0100)]
Add support for copy only (sub-page) grants.
These are like normal access grants, except:
-- They can't be used to map the page (so can only be used in a
GNTTABOP_copy hypercall).
-- It's possible to grant access with a finer granularity than whole
pages.
-- Xen guarantees that they can be revoked quickly (a normal map
grant can only be revoked with the cooperation of the domain which
has been granted access).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 16:03:44 +0000 (17:03 +0100)]
Fix a long-standing memory leak in the grant tables implementation.
According to the interface comments, gnttab_end_foreign_access() is
supposed to free the page once the grant is no longer in use, from a
polling timer, but that was never implemented. Implement it.
This shouldn't make any real difference, because the existing drivers
all arrange that with well-behaved backends references are never ended
while they're still in use, but it tidies things up a bit.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 15:56:58 +0000 (16:56 +0100)]
Use the foreign page tracking logic in netback.c. This isn't terribly
useful, but will be necessary if anything else ever introduces
mappings of foreign pages into the network stack.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 15:52:00 +0000 (16:52 +0100)]
Introduce a live_maps facility for tracking which domain foreign pages were mapped from in a reasonably uniform way.
This isn't terribly useful at present, but will make it much easier to
forward mapped packets between domains when there are multiple drivers
loaded which can produce such packets (e.g. netback1 and netback2).
Signed-off-by: Steven Smith <steven.smith@citrix.com>
* xen/dom0/drm:
drm: free vmalloc dma memory properly
drm: make sure all pages in vmalloc area are really DMA-ready
drm: recompute vma->vm_page_prot after changing vm_flags
Having allocated pages with drm_alloc_coherent, we need to free them with
drm_free_coherent. We're still not preserving the bus addresses so
we're hoping the backend dma_ops implementation can cope with the ones
we're making up.
Derived from patch by Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
drm: make sure all pages in vmalloc area are really DMA-ready
Under Xen, vmalloc_32() isn't guaranteed to return pages which are really
under 4G in machine physical addresses (only in virtual pseudo-physical
addresses). To work around this, implement a vmalloc variant which
allocates each page with dma_alloc_coherent() to guarantee that each
page is suitable for the device in question.
(This is playing rather fast-and-loose with dma_alloc_coherent by
assuming that vfree will correctly release the pages. That happens
to be OK for Xen, but will quite likely break if its drawing memory
from another type of IOMMU... So I don't think this alone is the correct
solution.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Merge commit 'stable-2.6.31/master' into xen/master
* commit 'stable-2.6.31/master': (46 commits)
Linux 2.6.31.1
powerpc/pseries: Fix to handle slb resize across migration
PCI: Unhide the SMBus on the Compaq Evo D510 USDT
PCI quirk: update 82576 device ids in SR-IOV quirks list
libata: fix off-by-one error in ata_tf_read_block()
KVM: limit lapic periodic timer frequency
KVM: x86 emulator: fix jmp far decoding (opcode 0xea)
KVM: MMU: make __kvm_mmu_free_some_pages handle empty list
KVM: x86 emulator: Implement zero-extended immediate decoding
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: x86: Disallow hypercalls for guest callers in rings > 0
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Check cpl before emulating debug register access
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: do not batch pte updates from interrupt context
ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem
x86, pat: Fix cacheflush address in change_page_attr_set_clr()
PCI: apply nv_msi_ht_cap_quirk on resume too
x86/i386: Make sure stack-protector segment base is cache aligned
x86: Fix x86_model test in es7000_apic_is_cluster()
...
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
One more form factor for Compaq Evo D510, which needs the same quirk
as the other form factors. Apparently there's no hardware monitoring
chip on that one, but SPD EEPROMs, so it's still worth unhiding the
SMBus.
ata_tf_read_block() has off-by-one error when converting CHS address
to LBA. The bug isn't very visible because ata_tf_read_block() is
used only when generating sense data for a failed RW command and CHS
addressing isn't used too often these days.
Don't call adjust_vmx_controls() two times for the same control.
It restores options that were dropped earlier. This loses us the cr8
exit control, which causes a massive performance regression Windows x64.
So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.
To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall base in
the future once required.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The use of __pa() to calculate the address of a C-visible symbol
is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h
for details.
It should be replaced with __pa_symbol(), that does the correct math here,
by taking relocations into account. This ensures the correct wallclock data
structure physical address is passed to the hypervisor.
Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Debug registers may only be accessed from cpl 0. Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.
Commit b8bcfe997e4 made paravirt pte updates synchronous in interrupt
context.
Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode
internally, so a pte update from interrupt context during a lazy mmu
operation can be batched while it should be performed synchronously.
Let's suppose a highmem page is kmap'd with kmap(). A pkmap entry is
used, the page mapped to it, and the virtual cache is dirtied. Then
kunmap() is used which does virtually nothing except for decrementing a
usage count.
Then, let's suppose the _same_ page gets mapped using kmap_atomic().
It is therefore mapped onto a fixmap entry instead, which has a
different virtual address unaware of the dirty cache data for that page
sitting in the pkmap mapping.
Fortunately it is easy to know if a pkmap mapping still exists for that
page and use it directly with kmap_atomic(), thanks to kmap_high_get().
And actual testing with a printk in the added code path shows that this
condition is actually met *extremely* frequently. Seems that we've been
quite lucky that things have worked so well with highmem so far.
Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix address passed to cpa_flush_range() when changing page
attributes from WB to UC. The address (*addr) is
modified by __change_page_attr_set_clr(). The result is that
the pages being flushed start at the _end_ of the changed range
instead of the beginning.
This should be considered for 2.6.30-stable and 2.6.31-stable.
Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
http://bugzilla.kernel.org/show_bug.cgi?id=12542 reports that with the
quirk not applied on resume, msi stops working after resuming and mcp78s
ahci fails due to IRQ mis-delivery. Apply it on resume too.
In Intel Atom microarchitecture, the address generation unit
assumes that the segment base will be 0 by default. Non-zero
segment base will cause load and store operations to experience
a delay.
- If the segment base isn't aligned to a cache line
boundary, the max throughput of memory operations is
reduced to one [e]very 9 cycles.
[...]
Assembly/Compiler Coding Rule 15. (H impact, ML generality)
For Intel Atom processors, use segments with base set to 0
whenever possible; avoid non-zero segment base address that is
not aligned to cache line boundary at all cost.
We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot
SMP: Allowing 32 CPUs, 16 hotplug CPUs
so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. This mlx4 bug means that mlx4 can't even
initialize at all on this quite mainstream system.
Reported-by: Eli Cohen <eli@mellanox.co.il> Tested-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the volume is changed continuously (e.g., when the user drags a
volume slider with the mouse), the driver does lots of I2C writes.
Apparently, the sound chip can get confused when we poll the I2C status
register too much, and fails to complete a read from it. On the PCI-E
models, the PCI-E/PCI bridge gets upset by this and generates a machine
check exception.
To avoid this, this patch replaces the polling with an unconditional
wait that is guaranteed to be long enough.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Tested-by: Johann Messner <johann.messner at jku.at> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix minimum period size for cs46xx cards. This fixes a problem in the
case where neither a period size nor a buffer size is passed to ALSA;
this is the case in Audacious, OpenAL, and others.
Signed-off-by: Sophie Hamilton <kernel@theblob.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As early pci resume has already restored config for host
bridge and graphics device, don't need to restore it again,
This removes an original order hack for graphics device restore.
This fixed the resume hang issue found by Alan Stern on 845G,
caused by extra config restore on graphics device.
Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Dave Airlie <airlied@linux.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.
Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
pseries_lpar_enable_pmcs, which tells the hypervisor that we want
to use the PMU, and sets the "PMU in use" flag in the lppaca.
This flag tells the hypervisor whether it needs to save and restore
the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
as it context-switches the PMU between tasks, and leaves it clear
when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
which does nothing because it has already been called. In particular
it doesn't set the "PMU in use" flag.
This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag. It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.
This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.
Reported-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Ellerman reported stack-frame size warnings being produced
for power_check_constraints(), which uses an 8*8 array of u64 and
two 8*8 arrays of unsigned long, which are currently allocated on the
stack, along with some other smaller variables. These arrays come
to 1.5kB on 64-bit or 1kB on 32-bit, which is a bit too much for the
stack.
This fixes the problem by putting these arrays in the existing
per-cpu cpu_hw_counters struct. This is OK because two of the call
sites have interrupts disabled already; for the third call site we
use get_cpu_var, which disables preemption, so we know we won't
get a context switch while we're in power_check_constraints().
Note that power_check_constraints() can be called during context
switch but is not called from interrupts.
Reported-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, if a group is created where the group leader is
initially disabled but a non-leader member is initially
enabled, and then the leader is subsequently enabled some time
later, the time_enabled for the non-leader member will reflect
the whole time since it was created, not just the time since
the leader was enabled.
This is incorrect, because all of the members are effectively
disabled while the leader is disabled, since none of the
members can go on the PMU if the leader can't.
Thus we have to update the ->tstamp_enabled for all the enabled
group members when a group leader is enabled, so that the
time_enabled computation only counts the time since the leader
was enabled.
Similarly, when disabling a group leader we have to update the
time_enabled and time_running for all of the group members.
Also, in update_counter_times, we have to treat a counter whose
group leader is disabled as being disabled.
My 353d5c30c666580347515da609dd74a2b8e9b828 "mm: fix hugetlb bug due to
user_shm_unlock call" broke the CONFIG_SYSVIPC !CONFIG_MMU build of both
2.6.31 and 2.6.30.6: "undefined reference to `user_shm_unlock'".
gcc didn't understand my comment! so couldn't figure out to optimize
away user_shm_unlock() from the error path in the hugetlb-less case, as
it does elsewhere. Help it to do so, in a language it understands.
Commit b8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea ("dm log: remove incorrect
field from userspace table output") added a call to strstr() with a
single-character "needle" string parameter.
Unfortunately some versions of gcc replace such calls to strstr() by calls
to strchr() behind our back. This causes linking errors if strchr() is
defined as an inline function in <asm/string.h> (e.g. on m68k):
When probing the device in tpm_tis_init the call request_locality
uses timeout_a, which wasn't being initalized until after
request_locality. This results in request_locality falsely timing
out if the chip is still starting. Move the initialization to before
request_locality.
This probably only matters for embedded cases (ie mine), a BIOS likely
gets the TPM into a state where this code path isn't necessary.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After applying the patch:
$ ./hello
Trace trap # user-mode execution after execve() finishes
If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
suggested checking for PROT_WRITE in the bss logic. I think it makes
most sense to simply apply the bss logic only when there is bss.
This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.
Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"Ath5k: unify resets"
introduced a regression into 2.6.28 where the PCU registers are never
initialized, due to ath5k_reset() always passing true for change_channel.
We subsequently program a lot of these registers but several may start
in an unknown state.
Reported-by: Forrest Zhang <forrest@hifulltech.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The find_ie() function uses a size_t for the len parameter, and
directly uses len as a loop variable. If any received packets
are malformed, it is possible for the decrease of len to overflow,
and since the result is unsigned, the loop will not terminate.
Change it to a signed int so the loop conditional works for
negative values.
Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a memory leak in the libsrp function srp_ring_free().
It is not documented whether or not this function should free the ring
pointer itself. But the source code of the callers of this function
(srp_target_alloc() and srp_target_free()) makes it clear that
srp_ring_free() should deallocate the ring pointer itself. Furthermore,
the patch below makes srp_ring_free() deallocate all memory allocated by
srp_ring_alloc().
This patch affects the ibmvstgt driver, which is the only in-tree driver
that calls the srp_ring_free() function (indirectly).
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Chris Webb reported:
p0# uname -a
Linux f7ea8425-d45b-490f-a738-d181d0df6963.host.elastichosts.com 2.6.30.4-elastic-lon-p #2 SMP PREEMPT Thu Aug 20 14:30:50 BST 2009 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
p0# zgrep SCAN_ASYNC /proc/config.gz
# CONFIG_SCSI_SCAN_ASYNC is not set
The problem is caused because the async scanning split in sd.c doesn't hold
any reference to the device when it kicks off the async piece. What's
happening is that an iSCSI disconnect is destorying the device again *before*
the async sd scanning thread even starts. Fix this by taking a reference
before starting the thread and dropping it again when the thread completes.
Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch modifies the slave_configure callback so the messages that get sent
to system log for RAID1E volumes contain the string "RAID10" instead of
"RAID1E". These messages contain information regarding what kind of scsi device
is being added. Certain OEMS can enable displaying the RAID10 string instead of
RAID1E via manufacturing page 10. The driver will read this config page at
driver load time, then determine from the GenericFlags0 bits whether display
the RAID10 or RAID1E string, also even drive count is taken into consideration.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changing SDEV Running state from interrupt context. Previously It was
handle in work queue thread. With this change It will not wait for work
queue thread to execute scsih_ublock_io_device to put SDEV into Running
state. This will reduce delay for Device becoming RUNNING.
Modified this patch considering James comment "Not to change SDEV state
using scsi_device_set_state API, instead use scsi_internal_device_unblock
scsi_internal_device_block API"
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch renames the flag for indicating host reset from
ioc_reset_in_progress to shost_recovery. It also removes the spin locks
surrounding the setting of this flag, which are unnecessary. Sanity checks on
the shost_recovery flag were added thru out the code so as to prevent sending
firmware commands during host reset. Also, the setting of the shost state to
SHOST_RECOVERY was removed to prevent deadlocks, this is actually better
handled by the shost_recovery flag.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Following host reset its possible that the controller firmware could
assign new handles for devices, as well as adding or deleting devices. There is
code in the driver that will rescan the topology folowing host reset; updating
device handles, and remove devices that are no longer responding. This patch
will improve the responsivness by moving this rescaning from the delayed hotplug
worker thread to immediately following the host reset.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <Eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the allocation fails in sg_build_indirect(), an oops happens in
the error path. It's caused by an obvious typo.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reported-by: Bob Tracy <rct@gherkin.frus.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The 32 and 64-bit versions of sys_iopl were needlessly different:
- The 32-bit version ignored its function argument and directly
fetched the parameter from struct regs, presumably code dating
from the neolithic era.
- The 64-bit version never called set_iopl_mask(). Usually this
is a no-op for 64-bit, but it is also a pvop, which meant that
that iopl never worked for 64-bit guests under Xen.
- Both versions use the archaic style of getting pt_regs passed
to them. We can get the current pt_regs with task_pt_regs, which
avoids worrying about the fine details of stack layout.
We can use the body of the 32-bit function with a few small
adjustments to the function definition.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen/swiotlb: make sure GART iommu doesn't initialize if we want Xen swiotlb
So the kernel sets the dma_ops to the Xen SWIOTLB, and it
allocates an extra 64MB chunk of memory for the GART, which is not
used, and ... somehow all of the ioremap_nocache functions stop working
correctly. Maybe the ioremap_nocache does use some of that memory that
the gart_iommu_hole_init allocated?
With this patch, the GART is forcefully disabled, and the kernel boots fine
(with 6GB, 8GB, etc).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
evtchn_unbind_from_user() is called under a lock, so there's no need to
worry about the ordering of unbind_from_irqhandler vs clearing the port
per-user data.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Remove large local because it provokes compile warnings; replace it
with a static __initdata variable. Also rearrange some declarations
for improved prettyness.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Merge branch 'rebase/dom0/swiotlb-new' into rebase/master
* rebase/dom0/swiotlb-new:
Fix calling order wherein iommu_detected would be set after software IO TLB was initialized causing double IO TLB allocation.
Scan an e820 table and release any memory which lies between e820 entries,
as it won't be used and would just be wasted. At present this is just to
release any memory beyond the end of the e820 map, but it will also deal
with holes being punched in the map.
Derived from patch by Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Fix calling order wherein iommu_detected would be set after software IO TLB was initialized causing double IO TLB allocation.
The iommu_detected flag was set _after_ the Software IO TLB was initialized.
This caused the Xen software IO TLB to be init and right after that the software IO TLB.
Merging xen_swiotlb_init and xen_swiotlb_init_alloc in one function that sets
this argument fixes the issue.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>