Steven Smith [Thu, 1 Oct 2009 09:54:10 +0000 (10:54 +0100)]
Add netchannel2 VMQ support to an old version of the ixgbe driver.
This is a bit of a mess, and doesn't really want to be applied as-is,
but might be useful for testing.
The VMQ patch which I have is against version 1.3.56.5 of the driver,
whereas the current 2.6.27 tree has version 2.0.34.3. I don't
currently have access to any VMQ-capable hardware, and won't be at
Citrix long enough to acquire any, so this patch just rolls the driver
back to 1.3.56.5 and adds VMQ support to that.
The original VMQ patch was
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
My only contribution was to run combinediff, but FWIW that's
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Thu, 1 Oct 2009 09:05:08 +0000 (10:05 +0100)]
NC2 VMQ support.
This only includes the transmit half, because the receiver uses an
unmodified posted buffers mode implementation.
This includes various bits of patches which were
Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Steven Smith <steven.smith@citrix.com>
All bugs are mine, of course.
Steven Smith [Wed, 30 Sep 2009 16:25:00 +0000 (17:25 +0100)]
Posted buffer mode support.
In this mode, domains are expected to pre-post a number of receive
buffers to their peer, and the peer will then copy packets into those
buffers when it wants to transmit. This is similar to the way
netchannel1 worked.
This isn't particularly useful by itself, because the software-only
implementation is slower than the other transmission modes, and is
disabled unless you set a #define, but it's necessary for VMQ support.
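For illustration, roughly the shape of the transmit side: copy packet data into a buffer the peer has pre-posted, using GNTTABOP_copy with the destination named by the peer's grant reference. The struct posted_buffer and the helper are hypothetical; only the gnttab_copy interface is taken from the Xen headers.

    /* Hypothetical descriptor for a buffer the peer has posted to us. */
    struct posted_buffer {
            grant_ref_t gref;       /* copy-grant on the peer's buffer */
            unsigned int size;      /* bytes available in the buffer */
    };

    /* Copy @len bytes of local packet data into a posted buffer.
     * Sketch only: batching and partial-fill handling omitted. */
    static int copy_into_posted_buffer(struct posted_buffer *buf,
                                       domid_t peer_domid,
                                       struct page *local_page,
                                       unsigned int offset,
                                       unsigned int len)
    {
            struct gnttab_copy op = {
                    .source.u.gmfn = pfn_to_mfn(page_to_pfn(local_page)),
                    .source.domid  = DOMID_SELF,
                    .source.offset = offset,
                    .dest.u.ref    = buf->gref,
                    .dest.domid    = peer_domid,
                    .dest.offset   = 0,
                    .len           = len,
                    .flags         = GNTCOPY_dest_gref,
            };

            if (HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1))
                    return -EFAULT;
            return op.status == GNTST_okay ? 0 : -EIO;
    }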
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Thu, 1 Oct 2009 09:21:17 +0000 (10:21 +0100)]
Add the basic VMQ APIs. Nobody uses or implements them at the moment,
but that will change shortly.
This includes various bits of patches which were
Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Steven Smith <steven.smith@citrix.com>
All bugs are mine, of course.
Steven Smith [Fri, 2 Oct 2009 09:29:19 +0000 (10:29 +0100)]
Add support for automatically creating and destroying bypass rings
in response to observed traffic.
This is designed to minimise the overhead of the autobypass machinery,
and in particular to minimise the overhead in dom0, potentially at the
cost of not always detecting that a bypass would be useful. In
particular, it isn't triggered by transmit_policy_small packets, and
so if you have a lot of very small packets then no bypass will be
created.
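The heuristic is roughly of this shape (every name below is made up for illustration, not the actual implementation): count packets per remote peer and suggest a bypass once a threshold is crossed, with small packets never reaching the accounting at all.

    /* Hypothetical per-peer traffic counter for the autobypass heuristic. */
    #define AUTOBYPASS_THRESHOLD 1024   /* packets before suggesting a bypass */

    void suggest_bypass(domid_t remote_domid);   /* hypothetical hook into dom0 */

    struct peer_stats {
            domid_t remote_domid;
            unsigned long packets;
            int bypass_suggested;
    };

    static void autobypass_account_packet(struct peer_stats *stats)
    {
            /* transmit_policy_small packets never reach this accounting,
             * which is why all-small-packet workloads get no bypass. */
            if (++stats->packets >= AUTOBYPASS_THRESHOLD &&
                !stats->bypass_suggested) {
                    stats->bypass_suggested = 1;
                    suggest_bypass(stats->remote_domid);
            }
    }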
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Wed, 30 Sep 2009 15:24:37 +0000 (16:24 +0100)]
Bypass support, for both frontend and backend.
A bypass is an auxiliary ring attached to a netchannel2 interface
which is used to communicate with a particular remote guest,
completely bypassing the bridge in dom0. This is quite a bit faster,
and can also help to prevent dom0 from becoming a bottleneck on large
systems.
Bypasses are inherently incompatible with packet filtering in domain
0. Filtering on the bridge is a moderately unusual configuration
(there will usually be a firewall protecting the dom0 host stack, but
filtering guest-to-guest traffic on the bridge is less common), and we
rely on the user turning bypasses off if they do need it.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Wed, 30 Sep 2009 12:28:28 +0000 (13:28 +0100)]
Add support for receiver-map mode.
In this mode of operation, the receiving domain maps the sending
domain's buffers, rather than grant-copying them into local memory.
This is marginally faster, but requires the receiving domain to be
somewhat trusted, because:
a) It can see anything else which happens to be on the same page
as the transmit buffer, and
b) It can just hold onto the pages indefinitely, causing a memory leak
in the transmitting domain.
It's therefore only really suitable for talking to a trusted peer, and
we use it in that way.
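The receiver-side operation is a plain grant map rather than a grant copy; a minimal sketch, where gnttab_set_map_op() and the map hypercall are the standard grant-table interfaces and the surrounding function is illustrative only.

    /* Map a granted transmit buffer from the sending domain instead of
     * grant-copying it.  Sketch only: real code must keep the handle so
     * the page can be unmapped (and the grant released) later. */
    static int map_peer_tx_buffer(grant_ref_t gref, domid_t peer_domid,
                                  void *vaddr, grant_handle_t *handle)
    {
            struct gnttab_map_grant_ref op;

            gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map,
                              gref, peer_domid);
            if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
                    BUG();
            if (op.status != GNTST_okay)
                    return -EINVAL;
            *handle = op.handle;
            return 0;
    }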
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Wed, 30 Sep 2009 09:53:53 +0000 (10:53 +0100)]
Add a fall-back poller, in case finish messages get stuck somewhere.
We try to avoid the event channel notification when sending finish
messages, for performance reasons, but that can lead to a deadlock if
you have a lot of packets going in one direction and nothing coming
the other way. Fix it by just polling for messages every second when
there are unfinished packets outstanding.
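The shape of the fallback is a one-second timer that re-walks the ring while anything is outstanding; the structure and the ring-processing helper below are hypothetical, only the 2.6.27-era timer API is real.

    /* Hypothetical per-interface state; only the timer API here is real. */
    struct nc2_poll_state {
            struct timer_list poll_timer;
            atomic_t nr_outstanding_packets;
    };

    void nc2_process_ring(struct nc2_poll_state *ps);   /* hypothetical */

    static void nc2_poll_timer_fn(unsigned long data)
    {
            struct nc2_poll_state *ps = (struct nc2_poll_state *)data;

            /* Consume any finish messages which arrived without an event
             * channel notification, then re-arm if work is still pending. */
            nc2_process_ring(ps);
            if (atomic_read(&ps->nr_outstanding_packets))
                    mod_timer(&ps->poll_timer, jiffies + HZ);
    }

    /* Armed when the first unfinished packet goes out. */
    static void nc2_arm_poll_timer(struct nc2_poll_state *ps)
    {
            setup_timer(&ps->poll_timer, nc2_poll_timer_fn, (unsigned long)ps);
            mod_timer(&ps->poll_timer, jiffies + HZ);
    }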
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 15:50:37 +0000 (16:50 +0100)]
Extend the grant tables implementation with an improved allocation batching mechanism.
The current batched allocation mechanism only allows grefs to be
withdrawn from the pre-allocated pool one at a time; the new scheme
allows them to be withdrawn in groups. There aren't currently any
users of this facility, but it will simplify some of the NC2 logic
(coming up shortly).
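To show the difference in shape: callers already pre-allocate a pool with gnttab_alloc_grant_references() and then claim grefs one at a time; the batched interface collapses the claim loop into one call. The batch-claim name in the comment is hypothetical.

    static int grab_sixteen_grefs(grant_ref_t *grefs)
    {
            grant_ref_t head;
            int i, gref;

            /* Pre-allocate a pool of 16 grefs (existing interface). */
            if (gnttab_alloc_grant_references(16, &head) < 0)
                    return -ENOSPC;

            /* Existing scheme: withdraw them from the pool one at a time. */
            for (i = 0; i < 16; i++) {
                    gref = gnttab_claim_grant_reference(&head);
                    BUG_ON(gref < 0);        /* pool holds 16, cannot fail */
                    grefs[i] = gref;
            }

            /* With the batched interface the loop becomes a single call,
             * roughly (name and signature hypothetical):
             *
             *     gnttab_claim_grant_references(&head, 16, grefs);
             */
            return 0;
    }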
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 15:45:29 +0000 (16:45 +0100)]
Add support for transitive grants.
These allow a domain A which has been granted access on a page of
domain B's memory to issue domain C with a copy-grant on the same
page. This is useful e.g. for forwarding packets between domains.
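Conceptually, domain A republishes the grant it holds from B as a new entry that names C; C then uses A's gref in an ordinary GNTTABOP_copy and Xen follows the chain back to B's page. The sketch below uses the Xen public grant_table.h v2 entry layout and a hypothetical pointer to A's mapped v2 grant table; this tree's actual code may differ.

    static void make_transitive_grant(union grant_entry_v2 *gnttab_v2_table,
                                      grant_ref_t new_ref, domid_t domid_C,
                                      domid_t domid_B, grant_ref_t gref_from_B)
    {
            union grant_entry_v2 *entry = &gnttab_v2_table[new_ref];

            entry->transitive.trans_domid = domid_B;     /* real owner of the page */
            entry->transitive.gref        = gref_from_B; /* the grant B gave us */
            entry->hdr.domid              = domid_C;     /* who may use this entry */
            wmb();  /* body must be visible before the type flag */
            entry->hdr.flags              = GTF_transitive;
    }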
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 15:41:23 +0000 (16:41 +0100)]
Add support for copy only (sub-page) grants. These are like normal
access grants, except:
-- They can't be used to map the page (so can only be used in a
GNTTABOP_copy hypercall).
-- It's possible to grant access with a finer granularity than whole
pages.
-- Xen guarantees that they can be revoked quickly (a normal map
grant can only be revoked with the cooperation of the domain which
has been granted access).
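A granting-side sketch, with the same caveats as the transitive example above (Xen public v2 entry layout, hypothetical table pointer): the page_off/length pair is what gives the sub-page granularity, and the sub_page flag is what stops the gref being used with GNTTABOP_map_grant_ref.

    static void make_subpage_grant(union grant_entry_v2 *gnttab_v2_table,
                                   grant_ref_t ref, domid_t peer,
                                   unsigned long mfn, unsigned int offset,
                                   unsigned int len)
    {
            union grant_entry_v2 *entry = &gnttab_v2_table[ref];

            entry->sub_page.frame    = mfn;
            entry->sub_page.page_off = offset;  /* start of the granted region */
            entry->sub_page.length   = len;     /* finer than a whole page */
            entry->hdr.domid         = peer;
            wmb();
            /* GTF_sub_page restricts this gref to GNTTABOP_copy. */
            entry->hdr.flags         = GTF_permit_access | GTF_sub_page;
    }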
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 15:30:09 +0000 (16:30 +0100)]
Fix a long-standing memory leak in the grant tables implementation.
According to the interface comments, gnttab_end_foreign_access() is
supposed to free the page once the grant is no longer in use, deferring
the free to a polling timer if the peer still holds it, but that was
never implemented. Implement it.
This shouldn't make any real difference, because the existing drivers
all arrange that with well-behaved backends references are never ended
while they're still in use, but it tidies things up a bit.
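A sketch of the deferred free this implies (list handling and locking elided; gnttab_end_foreign_access_ref(), gnttab_free_grant_reference() and free_page() are the real interfaces, the entry structure is illustrative):

    /* Entry on a list of (ref, page) pairs that couldn't be ended
     * immediately because the other end still had them mapped. */
    struct deferred_entry {
            struct list_head list;
            grant_ref_t ref;
            int ro;
            unsigned long page;
    };

    /* Called from a timer: retry ending the grant and free the page once
     * the remote domain has really let go of it. */
    static void gnttab_try_end(struct deferred_entry *e)
    {
            if (gnttab_end_foreign_access_ref(e->ref, e->ro)) {
                    gnttab_free_grant_reference(e->ref);
                    if (e->page)
                            free_page(e->page);
                    list_del(&e->list);
                    kfree(e);
            }
            /* otherwise leave it on the list for the next timer tick */
    }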
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 15:01:53 +0000 (16:01 +0100)]
Use the foreign page tracking logic in netback.c. This isn't terribly
useful, but will be necessary if anything else ever introduces mappings
of foreign pages into the network stack.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Tue, 29 Sep 2009 14:54:09 +0000 (15:54 +0100)]
Introduce a live_maps facility for tracking which domain foreign pages
were mapped from in a reasonably uniform way.
This isn't terribly useful at present, but will make it much easier to
forward mapped packets between domains when there are multiple drivers
loaded which can produce such packets (e.g. netback1 and netback2).
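A rough sketch of what such a facility might look like (all names here are hypothetical): each driver that maps foreign pages registers a tracker and tags its pages, so forwarding code can ask which domain any given page came from without caring which driver mapped it.

    /* Hypothetical uniform tracker for mapped foreign pages. */
    struct foreign_page_tracker {
            domid_t (*get_owner)(struct page *page);
            grant_handle_t (*get_handle)(struct page *page);
    };

    #define MAX_TRACKERS 4
    static struct foreign_page_tracker *trackers[MAX_TRACKERS];

    /* A driver (netback1, netback2, ...) tags each foreign page it maps
     * with its tracker id; forwarding code looks the owner up later. */
    static domid_t lookup_foreign_owner(struct page *page,
                                        unsigned int tracker_id)
    {
            BUG_ON(tracker_id >= MAX_TRACKERS || !trackers[tracker_id]);
            return trackers[tracker_id]->get_owner(page);
    }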
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
Add a new ioctl to /proc/xen/privcmd which allows domctls to be performed
without using the generic hypercall interface, so that they are available
on restricted fds.
This requires an unfortunate amount of fiddling with headers so that
XEN_GUEST_HANDLE_64 and uint64_aligned_t are available in kernel
space.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
Watch the online node in the backend area, as well as the state node
in the frontend area, and fire the frontend state-changed watch
whenever either changes. This lets us catch the case where a device
shuts down in a domU and is then detached with xm from dom0.
Otherwise the backend doesn't shut down correctly: online was still
set when the frontend shut down, and we don't get another kick when it
becomes unset.
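In other words, the backend gains a second watch alongside the frontend state watch; the path construction and the placeholder handler name below are illustrative, register_xenbus_watch() and the callback signature are the standard xenbus API of this era.

    void existing_state_watch_handler(struct xenbus_watch *watch,
                                      const char **vec, unsigned int len);

    static struct xenbus_watch online_watch;

    static void online_changed(struct xenbus_watch *watch,
                               const char **vec, unsigned int len)
    {
            /* Re-run the existing frontend-state handler so it can notice
             * online going to 0 even though state hasn't changed again. */
            existing_state_watch_handler(watch, vec, len);
    }

    static int watch_backend_online(struct xenbus_device *dev)
    {
            online_watch.node = kasprintf(GFP_KERNEL, "%s/online",
                                          dev->nodename);
            online_watch.callback = online_changed;
            return online_watch.node ? register_xenbus_watch(&online_watch)
                                     : -ENOMEM;
    }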
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
__gnttab_dma_map_page() can be called from a softirq (via the network
transmit softirq, for example), so gnttab_copy_grant_page() needs to
take gnttab_dma_lock in an interrupt-safe manner.
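I.e. the usual conversion to the irq-safe locking pattern; a sketch of the shape only, with the body of the function elided:

    static DEFINE_SPINLOCK(gnttab_dma_lock);

    int gnttab_copy_grant_page(grant_ref_t ref, struct page **pagep)
    {
            unsigned long flags;

            /* May race with __gnttab_dma_map_page() running in softirq
             * context, so a plain spin_lock() is not enough here. */
            spin_lock_irqsave(&gnttab_dma_lock, flags);
            /* ... swap the page under the grant reference ... */
            spin_unlock_irqrestore(&gnttab_dma_lock, flags);

            return 0;
    }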
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
There's no point in sending lots of little packets to a copying
receiver if we can instead arrange to copy them all into a single RX
buffer. We need to copy anyway, so there's no overhead here, and this
is a little bit easier on the receiving domain's network stack.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
Ensure that packet csums are computed correctly when sending a GSO
packet to an interface which supports scatter-gather but not transmit
checksum offloads.
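Not the actual patch, but the requirement it enforces is the usual skb_checksum_help() step applied to each segment once the GSO packet has been split, since the device can't resolve a CHECKSUM_PARTIAL packet itself (the wrapper function here is illustrative):

    /* After skb_gso_segment(), fix up each segment if the outgoing
     * device supports scatter-gather but not checksum offload. */
    static int checksum_segments(struct sk_buff *segs, struct net_device *dev)
    {
            struct sk_buff *seg;

            for (seg = segs; seg; seg = seg->next) {
                    if (seg->ip_summed == CHECKSUM_PARTIAL &&
                        !(dev->features & NETIF_F_IP_CSUM) &&
                        skb_checksum_help(seg))
                            return -EIO;    /* caller drops the packet */
            }
            return 0;
    }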
Signed-off-by: Steven Smith <ssmith@xensource.com>
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
[NETBACK] Try to pull a minimum of 72 bytes into the skb data area
when receiving a packet into netback. The previous number, 64, tended
to place a fragment boundary in the middle of the TCP header options
and led to unnecessary fragmentation in Windows <-> Windows
networking.
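For reference, the intent is simply that the first 72 bytes (enough for Ethernet + IP + TCP including typical options) always land in the linear data area; netback achieves this when it builds the skb, the helper below just shows the idea with generic skb calls (the constant name follows netback convention):

    #define PKT_PROT_LEN 72    /* was 64 */

    /* Make sure the protocol headers never straddle a fragment boundary. */
    static int pull_protocol_headers(struct sk_buff *skb)
    {
            unsigned int target = min_t(unsigned int, PKT_PROT_LEN, skb->len);

            return pskb_may_pull(skb, target) ? 0 : -ENOMEM;
    }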
Signed-off-by: Steven Smith <ssmith@xensource.com>
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
It is possible for a frontend to generate a TSO request which doesn't
actually need segmentation (i.e. with size < MTU). Make sure this
doesn't crash the backend.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
The Windows drivers push the network frontend to state Closed, then
Initialised, then Closed again as part of device disable. Make sure
the backend doesn't get stuck in the Closed state.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
Arrange that netback waits for the hotplug scripts to complete before
going to state Connected. WHQL gets quite upset if it sends packets
which don't arrive, and that can happen if our hotplug scripts are
slow and don't hook the network interface up to the bridge in time.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
It turns out that Windows occasionally generates packets in which the
IP and TCP headers are in different fragments. Make sure that the
backends can handle this.
Steven Smith [Fri, 2 Oct 2009 11:58:56 +0000 (12:58 +0100)]
CA-27974: Fix blktap shutdown race due to improper event ordering.
Writing shutdown-done before switching the device state to Closed (6)
opens a small race window: the agent can remove the device directory
just before the write to the 'state' field recreates it. xenbus then
fails to remove the device, since removal is keyed on the directory
existing. Because shutdown-done and the connection state are largely
independent, reordering the writes so that shutdown-done is written
last is both safe and necessary. Add a comment explaining this detail.
Steven Smith [Fri, 2 Oct 2009 11:58:57 +0000 (12:58 +0100)]
Close block devices when the pv drivers take over and flush the buffer cache.
- close and free the block devices in qemu when we switch to pv drivers in
the guest
- use BLKFLSBUF to flush the buffer cache, both in qemu and in blkback
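The qemu side of the flush is just the standard block-device ioctl; a minimal userspace illustration (error handling reduced to a message):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>      /* BLKFLSBUF */

    /* Flush the buffer cache for a block device before handing it over
     * to the guest's PV drivers, then close it. */
    static void flush_and_close(int fd)
    {
            if (ioctl(fd, BLKFLSBUF, 0) < 0)
                    perror("BLKFLSBUF");
            close(fd);
    }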