This patch adds TCP Segmentation Offload (TSO) support to the
backend.
It also advertises this fact through xenbus so that the frontend can
detect this and send through TSO requests only if it is supported.
This is done using an extra request slot which is indicated by a flag
in the first slot. In the future, checksum offload can be done in the
same way.
Even though only TSO is supported for now, the code actually supports
GSO so it can be applied to any other protocol. The only missing bit
is the detection of host support for a specific GSO protocol. Once
that is added we can advertise all supported protocols to the guest.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Disabled for now, as in the domU->dom0 direction.
This patch adds scatter-and-gather transmission support to the
backend. This allows the MTU to be raised now and opens up the
possibility of TSO in the future.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds scatter-and-gather support to the frontend. It also
advertises this fact through xenbus so that the backend can detect
this and send through SG requests only if it is supported.
SG support is required to support skbs larger than one page. This
in turn is needed for either jumbo MTU or TSO. One of these is
required to bring local networking performance up to a level that
is acceptable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a tx queue to the backend if the frontend supports rx
refill notification. A queue is needed because SG/TSO greatly reduces
the number of packets that can be stored in the rx ring. Given an rx
ring with 256 entries, a maximum TSO packet can occupy as many as 18
entries, meaning that the entire ring can only hold 14 packets. This
is too small at high bandwidths with large TCP RX windows.
Having a tx queue does not present a new security risk, as the queue is
a fixed-size buffer just like the rx ring. So each guest can only hold a
fixed amount of memory (proportional to the tx queue length) on the
host.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We cannot BUG_ON netbk_queue_full when !netbk_can_queue, as this can be
triggered by a misbehaving client. Set req_event appropriately when
stopping the packet queue, or we will not receive a notification.
[XEN] There is some suspicion that we may enter an infinite
#PF loop due to broken spurious page-fault detection.
Beef up the tracing on that code path so we can catch
some useful info if it happens.
Signed-off-by: Keir Fraser <keir@xensource.com>
Add a transaction_started field to the xenstored connection structure
instead of browsing the list of transactions each time.
Bump the default transaction limit to 10, and make it configurable
through the command line.
Signed-off-by: Vincent Hanquez <vincent@xensource.com>
[TPM] Remove some stale code from the TPM backend driver. The code
was used for sending vTPM control commands, but this is now all done
by the hotplug scripts.
This patch adds support to the frontend for notifying the backend whenever
the rx ring is refilled. This is required in order for the backend to
get a tx queue.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Modified to send a notification only if the req_event index is set
appropriately.
[NET] back: Replace netif->status with netif_carrier_ok
The connection status to the frontend can be represented using
netif_carrier_ok instead of netif->status. As a result, we delay
the construction of the dev qdisc until the carrier comes on. This
is a prerequisite for adding a tx queue.
By the same token, netif->active is now simply the conjunction of
netif_running and netif_carrier_ok so it too can be removed.
Because netif_carrier_off/netif_carrier_on and rtnl_lock all entail
memory barriers, there is no need to have extra memory barriers around
them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch moves all rx request pushing to network_alloc_rx_buffers.
This is needed to reduce churn for TSO. More importantly, this makes
it easier to send notifications when adding rx requests, which is
required for having a queue in dom0.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[NET] front: Clean up rx ring recovery.
Signed-off-by: Keir Fraser <keir@xensource.com>
This is an update to c/s
10855:03c8002068d9d60c7bbfc2f41af975e09b2e2211
which should have contained the following changeset message
(rather than 'Merge.').
[NET] front: Stop using rx->id
With the current protocol for transferring packets from dom0 to domU,
the rx->id field is useless because it can be derived from the rx
request ring ID. In particular,
rx->id = (ring_id & NET_RX_RING_SIZE - 1) + 1;
This formula works because the rx response to each request always
occupies the same slot that the request arrived in. This in turn is a
consequence of the fact that each packet only occupies one slot.
The other important reason that this works for dom0=>domU but not
domU=>dom0 is that the resource associated with the rx->id is freed
immediately, while in the domU=>dom0 case the resource is held until
the skb is liberated by dom0.
Using this formula, we can essentially remove rx->id from the protocol,
freeing up space that could instead be used by things like TSO. The
only constraint is that the backend must obey the rule that each id
must only be used in the response that occupies the same slot as the
request.
The actual field of rx->id is still maintained for compatibility with
older backends.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[PCI] Basic documentation for the per-device permissive
flag and the two policy files. However, the general intent of this
patch set is to avoid the need for user interaction, so documentation
is somewhat sparse.
Signed-off-by: Chris Bookholt <hap10@tycho.ncsc.mil>
[PCI] Two policy files written in what is intended to be human-readable SXP.
1. xend-pci-quirks.sxp:
Specifies which PCI device(s) may write to a set of PCI configuration
space registers. A quirky PCI device is identified by its vendor ID,
device ID, subvendor ID, and subdevice ID. If a matching entry is
found, the corresponding fields will be sent to the PCI bus manager.
Fields are composed of a register, size, and mask -- although the mask
field is currently unused.
The included policy file is for a range of tg3 devices, which is the
only type of quirky device I know about. Users with other quirky
devices are invited to either add entries to this policy file or add
an entry in the permissive file, described next. In either case, send an
email to the xen-devel list to make the device known.
2. xend-pci-permissive.sxp
Lists PCI devices that pciback should not prevent from writing to
their configuration space. This can be necessary if, for example, a new
Tigon3 device is released with different PCI vendor/device values
such that no entry in xend-pci-quirks.sxp is triggered.
Signed-off-by: Chris Bookholt <hap10@tycho.ncsc.mil>
[PCI] Allow per-device configuration for fine-grained control over PCI
configuration space writes, with a goal that was also previously
described by Ryan:
"Permissive mode should only be used as a fall back for unknown
devices.
I think the correct solution for dealing with these device-specific
configuration space registers is to identify them and add the
device-specific fields to the overlay. This patch adds a special
configuration space handler for network cards based on the tg3 linux
network device driver. This handler should allow for reads/writes to
all of the configuration space registers that the tg3 driver requires."
This patch attempts to address concerns with Ryan's original
submission by moving policy from the dom0 kernel into dom0 user-space.
As new quirky devices emerge they can be incorporated into the user-space
policy. An added benefit is that changes to the policy are effective
for domains created after the changes are written (no need to rebuild
the hypervisor or restart xend).
Signed-off-by: Chris Bookholt <hap10@tycho.ncsc.mil>
[qemu] Fix reads on unreported memory addresses.
The function cpu_physical_memory_rw() assumes that any address that is
not MMIO-related is RAM. This is incorrect: before making that
assumption, we should check that the address is less than the guest
physical memory size, ram_size.
From: Cui, Dexuan <dexuan.cui@intel.com>
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
[NET] back: Replace netif->active with netif_carrier_ok
The connection status to the frontend can be represented using
netif_carrier_ok instead of netif->active. As a result, we delay
the construction of the dev qdisc until the carrier comes on. This
is a prerequisite for adding a tx queue.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[NET] back: Make use of the simplicity of tasklets in net_rx_action
Tasklets have the property that each one is running on only one CPU at
any time. This means that you don't have to worry about the tasklet
racing against itself. Therefore any resources used by just a single
tasklet do not need to be guarded by locks.
Since net_rx_action is the only user of alloc_mfn, we can remove the
mfn_lock that guards it.
The notify_list array is huge by Linux standards, so placing it on the
stack is unsafe. Since net_rx_action is not re-entrant, we can simply
make it static.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Clear MPT l2 entries when allocating monitor pagetable
Changeset 10734 removed the code clearing MPT l2 entries; however,
these entries contain stale values copied from idle_pg_table_l2.
[IA64] Fix of C/S 10529:4260eb8c08740de0000081c61a6237ffcb95b2d5 for IA64.
When a page is zapped from a domain, its reference count is checked.
This results in a false-positive alert on Xen/IA64, because a page that
is 'in use' has a reference count of 2 on Xen/IA64:
- one because the page is assigned to the guest domain's
pseudo-physical address space. This is dropped by
guest_physmap_remove_page().
- one because the page is allocated to the domain.
This is dropped by the following put_page().
[qemu] Fix -net tap option when no ifname is specified.
An uninitialized ifname can cause qemu to quit. If the first character
of ifname is not \0, qemu treats it as a valid interface name and
configures /dev/net/tun to use it. That configuration fails and qemu
exits.
Based on a patch from: Steve Dobbelstein <steved@us.ibm.com>
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
[powerpc] domain building fixes for linux kexec model
The following updates are included:
- No stack allocation is necessary
- Some buggy kernels require r13 to be zeroed
- The DTB must be loaded from a fixed address; we are using
"/root/DomU.dtb" until the tools can build the DTB on their
own.
- Although we give the PFN of the store and console pages to the new
domain, we must make sure the MFN is given to the tools.
[qemu] Initialize VGA from within qemu for when the BIOS doesn't do so.
On Xen/x86, the VGA BIOS is copied to 0xC0000 by the guest firmware.
On the ia64 platform, however, the native firmware depends on some
VGA state being initialized at power-on, and so does the guest
firmware. That is why the VGA BIOS initialization stub is required for
VTi domains, to match the platform requirement.
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
[qemu] Re-calculate color_table after color depth reset.
A VNC client may reset the color depth after connecting, so if we don't
recalculate color_table, the monitor/console background is displayed
incorrectly.
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Fix a coexistence issue between cirrus and rtl8139 in the new qemu-dm.
The root cause is that if two MMIO spaces are contiguous, qemu may use
the previous MMIO space's read/write handlers to serve the current
request.
Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
[HVM] Sync p2m table across all vcpus on x86_32p xen.
We found that VGA acceleration does not work on SMP VMX guests on
x86_32p Xen. This is caused by the way the p2m table is currently
constructed: when the VMX guest is created, only the first l2 page-table
slot that maps p2m table pages is copied into the monitor page tables of
vcpus other than vcpu0. VGA acceleration, however, creates p2m table
entries beyond the first l2 page-table slot after the HVM guest is
created, so only vcpu0 sees these p2m entries, and the other vcpus
cannot do VGA acceleration.
[SVM] Correct the compile-time comparison of CONFIG_PAGING_LEVELS for
64-bit and 32-bit PAE guests. This code affects accesses to the CR4
register by the SVM guest.
Fix Linux so that it does not set a timeout if there are no pending
timers. Fix Xen so that it does not immediately fire a timer event if
it sees a very long timeout -- sometimes this means that there are
no pending timers.