* tag 'win-socket-pull-request' of https://gitlab.com/marcandre.lureau/qemu: (25 commits)
monitor: restrict command getfd to POSIX hosts
qtest: enable vnc-display test on win32
libqtest: make qtest_qmp_add_client work on win32
qmp: add 'get-win32-socket'
monitor: release the lock before calling close()
qmp: 'add_client' actually expects sockets
osdep: implement qemu_socketpair() for win32
tests/docker: fix a win32 error due to portability
char: do not double-close fd when failing to add client
tests: fix path separator, use g_build_filename()
win32: replace closesocket() with close() wrapper
os-posix: remove useless ioctlsocket() define
win32: avoid mixing SOCKET and file descriptor space
slirp: open-code qemu_socket_(un)select()
slirp: unregister the win32 SOCKET
main-loop: remove qemu_fd_register(), win32/slirp/socket specific
aio/win32: aio_set_fd_handler() only supports SOCKET
aio: make aio_set_fd_poll() static to aio-posix.c
win32/socket: introduce qemu_socket_unselect() helper
win32/socket: introduce qemu_socket_select() helper
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A process with enough capabilities can duplicate a socket to QEMU. Add a
QMP command to import it and add it to the monitor fd list, so it can be
later used by other commands.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230306122751.2355515-9-marcandre.lureau@redhat.com>
Whether it is SPICE, VNC, D-Bus, or the socket chardev, they all
actually expect a socket kind or will fail in different ways at runtime.
Throw an error early if the given 'add_client' fd is not a socket, and
close it to avoid leaks.
This allows to replace the close() call with a more correct & portable
closesocket() version.
(this will allow importing sockets on Windows with a specialized command
in the following patch, while keeping the remaining monitor associated
sockets/add_client code & usage untouched)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230306122751.2355515-6-marcandre.lureau@redhat.com>
Manually implement a socketpair() function, using UNIX sockets and
simple peer credential checking.
QEMU doesn't make much use of socketpair, beside vhost-user which is not
available for win32 at this point. However, I intend to use it for
writing some new portable tests.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230306122751.2355515-5-marcandre.lureau@redhat.com>
char: do not double-close fd when failing to add client
The caller is already closing the fd on failure.
Fixes: c3054a6e6a ("char: Factor out qmp_add_client() parts and move to chardev/") Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230306122751.2355515-3-marcandre.lureau@redhat.com>
win32: avoid mixing SOCKET and file descriptor space
Until now, a win32 SOCKET handle is often cast to an int file
descriptor, as this is what other OS use for sockets. When necessary,
QEMU eventually queries whether it's a socket with the help of
fd_is_socket(). However, there is no guarantee of conflict between the
fd and SOCKET space. Such conflict would have surprising consequences,
we shouldn't mix them.
Also, it is often forgotten that SOCKET must be closed with
closesocket(), and not close().
Instead, let's make the win32 socket wrapper functions return and take a
file descriptor, and let util/ wrappers do the fd/SOCKET conversion as
necessary. A bit of adaptation is necessary in io/ as well.
Unfortunately, we can't drop closesocket() usage, despite
_open_osfhandle() documentation claiming transfer of ownership, testing
shows bad behaviour if you forget to call closesocket().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-Id: <20230221124802.4103554-15-marcandre.lureau@redhat.com>
Presumably, this is what should happen when the SOCKET is to be removed.
(it probably worked until now because closesocket() does it implicitly,
but we never now how the slirp library could use the SOCKET later)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-Id: <20230221124802.4103554-13-marcandre.lureau@redhat.com>
This is a wrapper for WSAEventSelect, with Error handling. By default,
it will produce a warning, so callers don't have to be modified
now, and yet we can spot potential mis-use.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-Id: <20230221124802.4103554-7-marcandre.lureau@redhat.com>
Fortunately, qemu_fork() is no longer used since commit a95570e3e4d6 ("io/command: use glib GSpawn, instead of open-coding
fork/exec"). (GSpawn uses posix_spawn() whenever possible instead)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230221124802.4103554-2-marcandre.lureau@redhat.com>
Fix gdt on i386/x86_64
Handle traps on sparc
Add translation for argument of msync
Emulate CLONE_PIDFD flag in clone
handle netlink flag NLA_F_NESTED
fix sockaddr_in6 endianness
Fix brk() to release pages
fill out task state in /proc/self/stat
add support for xtensa FDPIC
Fix unaligned memory access in prlimit64 syscall
add target to host netlink conversions
fix timerfd read endianness conversion
Fix access to /proc/self/exe
Add strace for prlimit64() syscall
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin:
qed: remove spurious BDRV_POLL_WHILE()
iotests/308: Add test for 'write -zu'
block/fuse: Let PUNCH_HOLE write zeroes
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* tag 'pull-gitdm-100323-1' of https://gitlab.com/stsquad/qemu:
contrib/gitdm: add Idan to IBM's group map
contrib/gitdm: Add Facebook the domain map
contrib/gitdm: add Tsukasa as an individual contributor
contrib/gitdm: Add Ventana Micro Systems to the domain map
contrib/gitdm: Add VRULL to the domain map
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Sat, 11 Mar 2023 17:17:17 +0000 (17:17 +0000)]
Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging
# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJkCvgFAAoJEO8Ells5jWIRHiUH/jhydpJHIqnAPxHQAwGtmyhb
# 9Z52UOzW5V6KxfZJ+bQ4RPFkS2UwcxmeadPHY4zvvJTVBLAgG3QVgP4igj8CXKCI
# xRnwMgTNeu655kZQ5P/elTwdBTCJFODk7Egg/bH3H1ZiUhXBhVRhK7q/wMgtlZkZ
# Kexo6txCK4d941RNzEh45ZaGhdELE+B+D7cRuQgBs/DXZtJpsyEzBbP8KYSMHuER
# AXfWo0YIBYj7X3ek9D6j0pbOkB61vqtYd7W6xV4iDrJCcFBIOspJbbBb1tGCHola
# AXo5/OhRmiQnp/c/HTbJIDbrj0sq/r7LxYK4zY1x7UPbewHS9R+wz+FfqSmoBF0=
# =056y
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 10 Mar 2023 09:27:33 GMT
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* tag 'net-pull-request' of https://github.com/jasowang/qemu: (44 commits)
ebpf: fix compatibility with libbpf 1.0+
docs/system/devices/igb: Add igb documentation
tests/avocado: Add igb test
igb: Introduce qtest for igb device
tests/qtest/libqos/e1000e: Export macreg functions
tests/qtest/e1000e-test: Fabricate ethernet header
Intrdocue igb device emulation
e1000: Split header files
pcie: Introduce pcie_sriov_num_vfs
net/eth: Introduce EthL4HdrProto
e1000e: Implement system clock
net/eth: Report if headers are actually present
e1000e: Count CRC in Tx statistics
e1000: Count CRC in Tx statistics
e1000e: Combine rx traces
MAINTAINERS: Add e1000e test files
MAINTAINERS: Add Akihiko Odaki as a e1000e reviewer
e1000e: Do not assert when MSI-X is disabled later
hw/net/net_tx_pkt: Check the payload length
hw/net/net_tx_pkt: Implement TCP segmentation
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
linux-user: fix bug about incorrect base addresss of gdt on i386 and x86_64
On linux user mode, CPUX86State::gdt::base from Different CPUX86State
Objects have same value, It is incorrect! Every CPUX86State::gdt::base
Must points to independent memory space.
The other types, such as FSR_FTT_UNIMPFPOP, should not appear,
because we enable normal emulation of missing insns at the
start of sparc_cpu_realizefn().
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230216054516.1267305-15-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Since qemu does not implement a sparc coprocessor, all such
instructions raise this trap. Because of that, we never raise
the coprocessor exception trap, which would be vector 0x28.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230216054516.1267305-13-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
For the most part priviledged opcodes are ifdefed out of the
user-only sparc translator, which will then incorrectly produce
illegal opcode traps. But there are some code paths that
properly raise TT_PRIV_INSN, so we must handle it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230216054516.1267305-11-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
These traps are present for sparc64 with ilp32, aka sparc32plus.
Enabling them means adjusting the defines over in signal.c,
and fixing an incorrect usage of abi_ulong when we really meant
the full register, target_ulong.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230216054516.1267305-7-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
For sparc32, 0x88 is the "Slowaris" system call, currently BAD_TRAP
in the kernel's ttable_32.S. For sparc64, 0x110 is tl0_linux32, the
sparc32 trap, now folded into the TARGET_ABI32 case via TT_TRAP.
For sparc64, there does still exist trap 0x111 as tl0_oldlinux64,
which was replaced by 0x16d as tl0_linux64 in 1998. Since no one
has noticed, don't bother implementing it now.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230216054516.1267305-3-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Helge Deller [Tue, 29 Nov 2022 11:08:20 +0000 (12:08 +0100)]
linux-user: Emulate CLONE_PIDFD flag in clone()
Add emulation for the CLONE_PIDFD flag of the clone() syscall.
This flag was added in Linux kernel 5.2.
Successfully tested on a x86-64 Linux host with hppa-linux target.
Can be verified by running the testsuite of the qcoro debian package,
which breaks hard and kills the currently logged-in user without this
patch.
Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <Y4XoJCpvUA1JD7Sj@p100>
[lv: define CLONE_PIDFD if it is not] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Helge Deller [Tue, 31 Jan 2023 17:18:36 +0000 (18:18 +0100)]
linux-user: Provide print_raw_param64() for 64-bit values
Add a new function print_raw_param64() to print 64-bit values in the
same way as print_raw_param(). This prevents that qemu_log() is used to
work around the problem that print_raw_param() can only print 32-bit
values when compiled for 32-bit targets.
Additionally convert the existing 64-bit users in print_timespec64(),
print_rlimit64() and print_preadwrite64() over to this new function and
drop some unneccessary spaces.
Helge Deller [Sun, 25 Dec 2022 08:23:19 +0000 (09:23 +0100)]
linux-user: Fix brk() to release pages
The current brk() implementation does not de-allocate pages if a lower
address is given compared to earlier brk() calls.
But according to the manpage, brk() shall deallocate memory in this case
and currently it breaks a real-world application, specifically building
the debian gcl package in qemu-user.
Fix this issue by reworking the qemu brk() implementation.
Tested with the C-code testcase included in qemu commit 4d1de87c750, and
by building debian package of gcl in a hppa-linux guest on a x86-64
host.
Ilya Leoshkevich [Fri, 24 Feb 2023 00:39:06 +0000 (01:39 +0100)]
linux-user: Fix unaligned memory access in prlimit64 syscall
target_rlimit64 contains uint64_t fields, so it's 8-byte aligned on
some hosts, while some guests may align their respective type on a
4-byte boundary. This may lead to an unaligned access, which is an UB.
Fix by defining the fields as abi_ullong. This makes the host alignment
match that of the guest, and lets the compiler know that it should emit
code that can deal with the guest alignment.
While at it, also use __get_user() and __put_user() instead of
tswap64().
Helge Deller [Mon, 5 Dec 2022 11:38:25 +0000 (12:38 +0100)]
linux-user: Fix access to /proc/self/exe
When accsssing /proc/self/exe from a userspace program, linux-user tries
to resolve the name via realpath(), which may fail if the process
changed the working directory in the meantime.
An example:
- a userspace program ist started with ./testprogram
- the program runs chdir("/tmp")
- then the program calls readlink("/proc/self/exe")
- linux-user tries to run realpath("./testprogram") which fails
because ./testprogram isn't in /tmp
- readlink() will return -ENOENT back to the program
Avoid this issue by resolving the full path name of the started process
at startup of linux-user and store it in real_exec_path[]. This then
simplifies the emulation of readlink() and readlinkat() as well, because
they can simply copy the path string to userspace.
I noticed this bug because the testsuite of the debian package "pandoc"
failed on linux-user while it succeeded on real hardware. The full log
is here:
https://buildd.debian.org/status/fetch.php?pkg=pandoc&arch=hppa&ver=2.17.1.1-1.1%2Bb1&stamp=1670153210&raw=0
Alex Bennée [Mon, 19 Dec 2022 12:19:11 +0000 (12:19 +0000)]
contrib/gitdm: Add Facebook the domain map
A number of Facebook developers contribute to the project. Peter can
you confirm your want pjd.dev contributions counted here or as
an individual contributor?
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Cc: Iris Chen <irischenlj@fb.com> Cc: Daniel Müller <muellerd@fb.com> Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Message-Id: <20221219121914.851488-9-alex.bennee@linaro.org>
Alex Bennée [Mon, 19 Dec 2022 12:19:09 +0000 (12:19 +0000)]
contrib/gitdm: add Tsukasa as an individual contributor
I wasn't sure if you want to be added as an individual contributor or
an academic so please confirm.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Message-Id: <20221219121914.851488-7-alex.bennee@linaro.org>
Alex Bennée [Mon, 19 Dec 2022 12:19:06 +0000 (12:19 +0000)]
contrib/gitdm: Add VRULL to the domain map
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Message-Id: <20221219121914.851488-4-alex.bennee@linaro.org>
Peter Maydell [Fri, 10 Mar 2023 14:31:37 +0000 (14:31 +0000)]
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, fixes
Several features that landed at the last possible moment:
Passthrough HDM decoder emulation
Refactor cryptodev
RAS error emulation and injection
acpi-index support on non-hotpluggable slots
Dynamically switch to vhost shadow virtqueues at vdpa net migration
Plus a couple of bugfixes that look important to have in the release.
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (72 commits)
virtio: fix reachable assertion due to stale value of cached region size
hw/virtio/vhost-user: avoid using unitialized errp
hw/pxb-cxl: Support passthrough HDM Decoders unless overridden
hw/pci: Add pcie_count_ds_port() and pcie_find_port_first() helpers
hw/mem/cxl_type3: Add CXL RAS Error Injection Support.
hw/pci/aer: Make PCIE AER error injection facility available for other emulation to use.
hw/cxl: Fix endian issues in CXL RAS capability defaults / masks
hw/mem/cxl-type3: Add AER extended capability
hw/pci-bridge/cxl_root_port: Wire up MSI
hw/pci-bridge/cxl_root_port: Wire up AER
hw/pci/aer: Add missing routing for AER errors
hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
pcihp: add ACPI PCI hotplug specific is_hotpluggable_bus() callback
pcihp: move fields enabling hotplug into AcpiPciHpState
acpi: pci: move out ACPI PCI hotplug generator from generic slot generator build_append_pci_bus_devices()
acpi: pci: move BSEL into build_append_pcihp_slots()
acpi: pci: drop BSEL usage when deciding that device isn't hotpluggable
pci: move acpi-index uniqueness check to generic PCI device code
tests: acpi: update expected blobs
tests: acpi: add non zero function device with acpi-index on non-hotpluggble bus
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stefan Hajnoczi [Thu, 9 Mar 2023 16:31:34 +0000 (11:31 -0500)]
qed: remove spurious BDRV_POLL_WHILE()
This looks like a copy-paste or merge error. BDRV_POLL_WHILE() is
already called above. It's not needed in the qemu_in_coroutine() case.
Fixes: 9fb4dfc570ce ("qed: make bdrv_qed_do_open a coroutine_fn") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230309163134.398707-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hanna Czenczek [Mon, 27 Feb 2023 10:47:25 +0000 (11:47 +0100)]
iotests/308: Add test for 'write -zu'
Try writing zeroes to a FUSE export while allowing the area to be
unmapped; block/file-posix.c generally implements writing zeroes with
BDRV_REQ_MAY_UNMAP ('write -zu') by calling fallocate(PUNCH_HOLE). This
used to lead to a blk_pdiscard() in the FUSE export, which may or may
not lead to the area being zeroed. HEAD^ fixed this to use
blk_pwrite_zeroes() instead (again with BDRV_REQ_MAY_UNMAP), so verify
that running `qemu-io 'write -zu'` on a FUSE exports always results in
zeroes being written.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230227104725.33511-3-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hanna Czenczek [Mon, 27 Feb 2023 10:47:24 +0000 (11:47 +0100)]
block/fuse: Let PUNCH_HOLE write zeroes
fallocate(2) says about PUNCH_HOLE: "After a successful call, subsequent
reads from this range will return zeros." As it is, PUNCH_HOLE is
implemented as a call to blk_pdiscard(), which does not guarantee this.
We must call blk_pwrite_zeroes() instead. The difference to ZERO_RANGE
is that we pass the `BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK` flags to
the call -- the storage is supposed to be unmapped, and a slow fallback
by actually writing zeroes as data is not allowed.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/1507 Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230227104725.33511-2-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Maydell [Fri, 10 Mar 2023 11:31:33 +0000 (11:31 +0000)]
Merge tag 'qga-pull-2023-03-08' of github.com:kostyanf14/qemu into staging
qga-pull-2023-03-08
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEwsLBCepDxjwUI+uE711egWG6hOcFAmQI6V0ACgkQ711egWG6
# hOegDBAAk9S6bszwuvPUIupNofujYkFKrrgHiujTOmPVXvD52C8FhojKTlW3d1QT
# f50vkMkMgavewPzsJU0SAu9kX80uOprDZwUYZ/3awSRPgL1zfFnZSZj5C/Pk4tD2
# 9rD8YjVPgvRpIhqZGTaAr97NFpigslMdba8SaucHPA1FmwRUzs1lWCX2hK9ewWuD
# /3/6Dy9mVoFGdEru2kNO5uZFUVsfatZMUQS8oOdgwHtYRkVwV7olPglQZ/iqACor
# yegxAt5tUL3WJIiYAVntiGSos0QnD7AgrGnSM5398uA4/oMVdehpAf5TyUOJ0QEy
# aq51TGZQ6Vc/0sYrsO65zaNXNsgNx1jAl7BcBleawyrdM8q/ILStXelk3MFR7Dbz
# dNi5NNHK4acEStk5XJZHc+bPQybjeWGCsQY9NBO5zLmZO2gCWnjN/nWxT6ivAgzF
# JlYfiiuLku/sZBGun7giHsKQ0EFeMzi+DdKsX3AoJhA+RJ/XEa88MTIh4EIK/tsj
# BwoPtrngsHvkazwgpb1Fa204kTAhmjx+2bpyEiNAxcRTgShxXIsm09xGOi5w8z3Q
# 48kLmkPL/xwKLImh1hx4z612VhCwdhMaLgKmri5i99jWoKJqwpnUf04JtEvyYM0d
# ErlBQsz1GOjVZTm9yCtqZwjZeM5kK83lRu7fUxmtTTA3G1H/EAM=
# =wWvd
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 08 Mar 2023 20:00:29 GMT
# gpg: using RSA key C2C2C109EA43C63C1423EB84EF5D5E8161BA84E7
# gpg: Good signature from "Kostiantyn Kostiuk (Upstream PR sign) <kkostiuk@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: C2C2 C109 EA43 C63C 1423 EB84 EF5D 5E81 61BA 84E7
* tag 'qga-pull-2023-03-08' of github.com:kostyanf14/qemu:
qga/win/vss: requester_freeze changes
qga/win/vss: query VSS backup type
qga/win/installer: add VssOption to installer
qga/win32: Use rundll for VSS installation
qga/win32: Remove change action from MSI installer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Shreesh Adiga [Sun, 18 Dec 2022 14:39:27 +0000 (20:09 +0530)]
ebpf: fix compatibility with libbpf 1.0+
The current implementation fails to load on a system with
libbpf 1.0 and reports that legacy map definitions in 'maps'
section are not supported by libbpf v1.0+. This commit updates
the Makefile to add BTF (-g flag) and appropriately updates
the maps in rss.bpf.c and update the skeleton file in repo.
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Fri, 10 Mar 2023 06:11:25 +0000 (14:11 +0800)]
igb: Introduce qtest for igb device
This change is derived from qtest for e1000e device.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Thomas Huth <thuth@redhat.com>
[Jason: make qtest work for win32 (only hotplug)] Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 9 Mar 2023 03:54:57 +0000 (11:54 +0800)]
Intrdocue igb device emulation
This change introduces emulation for the Intel 82576 adapter, AKA igb.
The details of the device will be provided by the documentation that
will follow this change.
This initial implementation of igb does not cover the full feature set,
but it selectively implements changes necessary to pass tests of Linut
Test Project, and Windows HLK. The below is the list of the implemented
changes; anything not listed here is not implemented:
New features:
- igb advanced descriptor handling
- Support of 16 queues
- SRRCTL.BSIZEPACKET register field
- SRRCTL.RDMTS register field
- Tx descriptor completion writeback
- Extended RA registers
- VMDq feature
- MRQC "Multiple Receive Queues Enable" register field
- DTXSWC.Loopback_en register field
- VMOLR.ROMPE register field
- VMOLR.AUPE register field
- VLVF.VLAN_id register field
- VLVF.VI_En register field
- VF
- Mailbox
- Reset
- Extended interrupt registers
- Default values for IGP01E1000 PHY registers
Akihiko Odaki [Thu, 23 Feb 2023 10:50:51 +0000 (19:50 +0900)]
e1000: Split header files
Some definitions in the header files are invalid for igb so extract
them to new header files to keep igb from referring to them.
Signed-off-by: Gal Hammer <gal.hammer@sap.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:50:50 +0000 (19:50 +0900)]
pcie: Introduce pcie_sriov_num_vfs
igb can use this function to change its behavior depending on the
number of virtual functions currently enabled.
Signed-off-by: Gal Hammer <gal.hammer@sap.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:50:49 +0000 (19:50 +0900)]
net/eth: Introduce EthL4HdrProto
igb, a new network device emulation, will need SCTP checksum offloading.
Currently eth_get_protocols() has a bool parameter for each protocol
currently it supports, but there will be a bit too many parameters if
we add yet another protocol.
Introduce an enum type, EthL4HdrProto to represent all L4 protocols
eth_get_protocols() support with one parameter.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:18 +0000 (19:20 +0900)]
e1000e: Implement system clock
The system clock is necessary to implement PTP features. While we are
not implementing PTP features for e1000e yet, we do have a plan to
implement them for igb, a new network device derived from e1000e,
so add system clock to the common base first.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:17 +0000 (19:20 +0900)]
net/eth: Report if headers are actually present
The values returned by eth_get_protocols() are used to perform RSS,
checksumming and segmentation. Even when a packet signals the use of the
protocols which these operations can be applied to, the headers for them
may not be present because of too short packet or fragmentation, for
example. In such a case, the operations cannot be applied safely.
Report the presence of headers instead of whether the use of the
protocols are indicated with eth_get_protocols(). This also makes
corresponding changes to the callers of eth_get_protocols() to match
with its new signature and to remove redundant checks for fragmentation.
Fixes: 75020a7021 ("Common definitions for VMWARE devices") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:16 +0000 (19:20 +0900)]
e1000e: Count CRC in Tx statistics
The datasheet 8.19.29 "Good Packets Transmitted Count - GPTC (0x04080;
RC)" says:
> This register counts the number of good (no errors) packets
> transmitted. A good transmit packet is considered one that is 64 or
> more bytes in length (from <Destination Address> through <CRC>,
> inclusively) in length.
It also says similar for the other Tx statistics registers. Add the
number of bytes for CRC to those registers.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:15 +0000 (19:20 +0900)]
e1000: Count CRC in Tx statistics
The Software Developer's Manual 13.7.4.5 "Packets Transmitted (64 Bytes)
Count" says:
> This register counts the number of packets transmitted that are
> exactly 64 bytes (from <Destination Address> through <CRC>,
> inclusively) in length.
It also says similar for the other Tx statistics registers. Add the
number of bytes for CRC to those registers.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:14 +0000 (19:20 +0900)]
e1000e: Combine rx traces
Whether a packet will be written back to the guest depends on the
remaining space of the queue. Therefore, e1000e_rx_written_to_guest and
e1000e_rx_not_written_to_guest should log the index of the queue instead
of generated interrupts. This also removes the need of
e1000e_rx_rss_dispatched_to_queue, which logs the queue index.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:13 +0000 (19:20 +0900)]
MAINTAINERS: Add e1000e test files
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:12 +0000 (19:20 +0900)]
MAINTAINERS: Add Akihiko Odaki as a e1000e reviewer
I want to know to be notified when there is a new change for e1000e
as e1000e is similar to igb and such a change may also be applicable for
igb.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:11 +0000 (19:20 +0900)]
e1000e: Do not assert when MSI-X is disabled later
Assertions will fail if MSI-X gets disabled while a timer for MSI-X
interrupts is running so remove them to avoid abortions. Fortunately,
nothing bad happens even if the assertions won't trigger as
msix_notify(), called by timer handlers, does nothing when MSI-X is
disabled.
This bug was found by Alexander Bulekov when fuzzing igb, a new
device implementation derived from e1000e:
https://patchew.org/QEMU/20230129053316.1071513-1-alxndr@bu.edu/
The fixed test case is:
fuzz/crash_aea040166819193cf9fedb810c6d100221da721a
Fixes: 6f3fbe4ed0 ("net: Introduce e1000e device emulation") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:09 +0000 (19:20 +0900)]
hw/net/net_tx_pkt: Implement TCP segmentation
There was no proper implementation of TCP segmentation before this
change, and net_tx_pkt relied solely on IPv4 fragmentation. Not only
this is not aligned with the specification, but it also resulted in
corrupted IPv6 packets.
This is particularly problematic for the igb, a new proposed device
implementation; igb provides loopback feature for VMDq and the feature
relies on software segmentation.
Implement proper TCP segmentation in net_tx_pkt to fix such a scenario.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:08 +0000 (19:20 +0900)]
e1000e: Perform software segmentation for loopback
e1000e didn't perform software segmentation for loopback if virtio-net
header is enabled, which is wrong.
To fix the problem, introduce net_tx_pkt_send_custom(), which allows the
caller to specify whether offloading should be assumed or not.
net_tx_pkt_send_custom() also allows the caller to provide a custom
sending function. Packets with virtio-net headers and ones without
virtio-net headers will be provided at the same time so the function
can choose the preferred version. In case of e1000e loopback, it prefers
to have virtio-net headers as they allows to skip the checksum
verification if VIRTIO_NET_HDR_F_DATA_VALID is set.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:07 +0000 (19:20 +0900)]
hw/net/net_rx_pkt: Remove net_rx_pkt_has_virt_hdr
When virtio-net header is not set, net_rx_pkt_get_vhdr() returns
zero-filled virtio_net_hdr, which is actually valid. In fact, tap device
uses zero-filled virtio_net_hdr when virtio-net header is not provided
by the peer. Therefore, we can just remove net_rx_pkt_has_virt_hdr() and
always assume NetTxPkt has a valid virtio-net header.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:05 +0000 (19:20 +0900)]
net: Strip virtio-net header when dumping
filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumable. Unfortunately currently there is no LinkType for
virtio-net so for now strip virtio-net header to convert the output to
Ethernet.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:03 +0000 (19:20 +0900)]
net: Check L4 header size
net_tx_pkt_build_vheader() inspects TCP header but had no check for
the header size, resulting in an undefined behavior. Check the header
size and drop the packet if the header is too small.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:02 +0000 (19:20 +0900)]
e1000e: Remove extra pointer indirection
e1000e_write_packet_to_guest() passes the reference of variable ba as a
pointer to an array, and that pointer indirection is just unnecessary;
all functions which uses the passed reference performs no pointer
operation on the pointer and they simply dereference the passed
pointer. Remove the extra pointer indirection.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:01 +0000 (19:20 +0900)]
e1000e: Set MII_ANER_NWAY
This keeps Windows driver 12.18.9.23 from generating an event with ID
30. The description of the event is as follows:
> Intel(R) 82574L Gigabit Network Connection
> PROBLEM: The network adapter is configured for auto-negotiation but
> the link partner is not. This may result in a duplex mismatch.
> ACTION: Configure the link partner for auto-negotiation.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
Akihiko Odaki [Thu, 23 Feb 2023 10:20:00 +0000 (19:20 +0900)]
e1000e: Introduce e1000_rx_desc_union
Before this change, e1000e_write_packet_to_guest() allocated the
receive descriptor buffer as an array of uint8_t. This does not ensure
the buffer is sufficiently aligned.
Introduce e1000_rx_desc_union type, a union type of all receive
descriptor types to correct this.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>