xen/console: Introduce a new console driver for Xen guest
The current Xen console driver is crashing very quickly when using it on ARM
guest. This is because of the console lock is recursive which may lead to
recurse on the tty lock and/or corrupt the ring pointer.
Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.
Over the year, code has been added to support various new feature but the
driver has not been reworked. This brings to have code related to the
hypervisor console in ring specific function...
This new driver has been rewritten with this idea to only
have a small set of specific function to write either via the ring or the
hypercall.
Note that HVM support has been left aside for now because it requires external
feature to be used on ARM which are not yet upstreamed. A follow-up patch will
be sent with the ARM guest support.
This new console driver will be added in the build in the following patch.. It
has been divided to help reviewing.
List of items that may be good to have but not mandatory:
- Avoid to flush for each character written when using the tty.
- Use a ops structure to distinguish hypervisor vs ring helpers
- Support multiple console
adrian [Sat, 3 Oct 2015 05:44:05 +0000 (05:44 +0000)]
rum(4): some non-functional changes / cleanup
* Remove unused sc_txtap_len/sc_rxtap_len fields.
* Remove unused ackrate variable.
* Remove unneded warning in rum_update_mcast().
* Use nitems().
* Replace some hardcoded values for RT2573_MAC_CSR1 register.
* Remove second argument for RUM_LOCK_ASSERT() - it is always the same.
grehan [Fri, 2 Oct 2015 21:09:49 +0000 (21:09 +0000)]
Simple sysctl-like firmware query interface. Similar in operation
to the qemu one, and uses the same i/o ports but with different
messaging. Requires the 'bootrom' option to be enabled.
This is used by UEFI (and potentially other BIOSs/firmware) to
request information from bhyve. Currently, only the number of
vCPUs is made available, with more to follow.
A very large thankyou to Ben Perrault who helped out testing
an earlier version of this, and bhyve/Windows in general.
Reviewed by: tychon
Discussed with: neel
Sponsored by: Nahanni Systems
kib [Fri, 2 Oct 2015 13:25:59 +0000 (13:25 +0000)]
Do not set 'flush to zero' VFPSCR_FZ bit by default. The correct
implementation of IEEE 754 arithmetic depends on denormals operating
correctly. Both perl test suite and paranoia tripped over the
setting.
Reported by: Stefan Parvu <sparvu@kronometrix.org>
Discussed with: andrew
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Fri, 2 Oct 2015 13:21:08 +0000 (13:21 +0000)]
FreeBSD does not support SMP on ARMv5. Since processor is always
self-consistent, there is no need in anything but compiler barrier in
the implementation of atomic_thread_fence_*() on ARMv5. Split
implementation of fences for ARMv4/5 and ARMv6; the former use
compiler barriers, the later also perform hardware barriers.
An issue which is fixed by the change is the faults from the CP15
coprocessor accesses in the user mode. This was uncovered by the
pthread_once() changes in r287556.
Reported by: Mattia Rossi <mattia.rossi.mailinglists@gmail.com>
Discussed with: alc, cognet, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
emaste [Fri, 2 Oct 2015 13:16:06 +0000 (13:16 +0000)]
Add debug file extension to kldxref(8) after r288176
After r288176 kernel debug files have the extension .debug. They also
moved to /usr/lib/debug/boot/kernel by default so in the normal case
kldxref does not encounter them. A src.conf(5) setting may be used to
continue installing them in /boot/kernel though, so have kldxref skip
.debug files in addition to .symbols files.
Reported by: fabient
Sponsored by: The FreeBSD Foundation
This change adds the bits that are necessary to fetch system call
arguments and return values from trapframes for CloudABI. This allows us
to properly print system calls with the right name. We need to make sure
that we properly convert error numbers when system calls fail.
We still need to improve truss to pretty-print some of the system calls
that have flags.
araujo [Fri, 2 Oct 2015 08:58:50 +0000 (08:58 +0000)]
The rpc.yppasswdd has an option to not allow shell changes (-s), but is
always passed a shell by the remote yppasswd. If an NIS client overrides the
shell provided by the ypserv, then yppasswd (pam_unix, actually, afaict)
will pass this new shell to the yppasswdd. If this shell has been set on the
client to a shell which is invalid on the server, a user will never be able
to change their password on the client.
bdrewery [Fri, 2 Oct 2015 07:00:43 +0000 (07:00 +0000)]
META_MODE: For some reason meta mode cannot generate the intermediate tab.c
files. Split up all of the targets to be more clear on how they are generated
to fix the problem.
bdrewery [Fri, 2 Oct 2015 06:24:09 +0000 (06:24 +0000)]
META_MODE: Fix stage_links not running in the right order without -j.
This fixes staging errors for non-parallel builds that have LINKS.
Creating hardlinks must always happen after the actual files are installed.
The staging code was protected by an .ORDER statement that only affected
parallel -j builds but not non-parallel builds. Fix this by making the
real stage_links.SET (stage_links.links, stage_links.mlinks, etc) targets
depend on the main targets for all of the other possible staging needs. For
example, stage_links.links will depend on stage_as and stage_files, which have
their own dependencies to stage_as.prog or stage_files.prog or stage_files.SET,
which is enough to satistfy the ordering.
Also remove the requirement that symlinks be created last, as they can
safely be made without the source being present unlike hardlinks. This also
fixes symlinks to come before hardlinks as it is possible, in theory, to
hardlink a symlink. This is not actually supported here though.
jhb [Thu, 1 Oct 2015 18:18:58 +0000 (18:18 +0000)]
Rather than groveling around in a socket address structure for a socket
address's length (and then overriding it if it "looks wrong"), use the
next argument to the system call to determine the length. This is more
reliable since this is what the kernel depends on anyway and is also
simpler.
jhb [Thu, 1 Oct 2015 17:50:41 +0000 (17:50 +0000)]
The id_t type used to pass IDs to wait6(2) and procctl(6) is a 64-bit
integer. Fix the argument decoding to treat this as a quad instead of an
int. This includes using QUAD_ALIGN and QUAD_SLOTS as necessary. To
continue printing IDs in decimal, add a new QuadHex argument type that
prints a 64-bit integer in hex, use QuadHex for the existing off_t arguments,
repurpose Quad to print a 64-bit integer in decimal, and use Quad for id_t
arguments.
This fixes the decoding of wait6(2) and procctl(2) on 32-bit platforms.
jhb [Thu, 1 Oct 2015 17:28:07 +0000 (17:28 +0000)]
- Remove extra integer argument from truncate() and ftruncate(). This is
probably fallout from the removal of the extra padding argument before
off_t in 7. However, that padding still exists for 32-bit powerpc, so
use QUAD_ALIGN.
- Fix QUAD_ALIGN to be zero for powerpc64. It should only be set to 1
for 32-bit platforms that add padding to align 64-bit arguments.
jhb [Thu, 1 Oct 2015 16:59:07 +0000 (16:59 +0000)]
Most error cases in i915_gem_do_execbuffer() jump to one of two labels to
release resources (such as unholding pages) when errors occur. Some
recently added error checks return immediately instead of jumping to a
label resulting in leaks. Fix these to jump to a label to do cleanup
instead.
markj [Thu, 1 Oct 2015 16:34:53 +0000 (16:34 +0000)]
Ensure that vop_stdadvise() does not call getblk() on vnodes that have an
empty bufobj. Otherwise, vnodes belonging to filesystems that do not use the
buffer cache may trigger assertion failures.
andrew [Thu, 1 Oct 2015 12:09:05 +0000 (12:09 +0000)]
An IPI must be cleared before it is handled otherwise next IPI could be
missed. In other words, if a new request for an IPI is sent while the
previous request is being handled but the IPI is not cleared yet, the
clearing of the previous IPI request also clears the new one and the
handling is missed.
There are only three MP interrupt controllers in ARM now. Two of them are
fixed by this change, the third one is correct, probably only just by
accident. The fix is minimalistic as new interrupt framework is awaited.
It was debugged on RPi2 where missing IPI handling together with SCHED_ULE
led to situation in which tdq_ipipending was not cleared and so IPI_PREEMPT
was stopped to be sent. Various odditys were found related to slow system
response time like various events timed out, and slow console response.
cperciva [Thu, 1 Oct 2015 10:52:26 +0000 (10:52 +0000)]
Disable suspend when we're shutting down. This solves the "tell FreeBSD
to shut down; close laptop lid" scenario which otherwise tended to end
with a laptop overheating or the battery dying.
The implementation uses a new sysctl, kern.suspend_blocked; init(8) sets
this while rc.suspend runs, and the ACPI sleep code ignores requests while
the sysctl is set.
Discussed on: freebsd-acpi (35 emails)
MFC after: 1 week
andrew [Thu, 1 Oct 2015 09:36:18 +0000 (09:36 +0000)]
Update the arm64 ttys file to enable the correct uart based on which device
the console is attached to. This makes this file identical to the 32-bit
arm version of this file.
Obtained from: EuroBSDCon devsummit
Sponsored by: ABT Systems Ltd
As a step towards the elimination of PG_CACHED pages, rework the handling
of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
the head of the inactive queue instead of being cached.
This affects the implementation of POSIX_FADV_NOREUSE as well, since it
works by applying POSIX_FADV_DONTNEED to file ranges after they have been
read or written. At that point the corresponding buffers may still be
dirty, so the previous implementation would coalesce successive ranges and
apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
dirty buffers would eventually be cached. To preserve this behaviour in an
efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
the pages backing a VMIO buf to be placed at the head of the inactive queue
when the buf is released. POSIX_FADV_NOREUSE then works by setting this
flag in bufs that underlie the specified range.
Shell syntax is too complicated to detect command substitution and unquoted
operators reliably without implementing much of sh's parser. Therefore, have
sh do this detection.
While changing sh's support anyway, also read input from a pipe instead of
arguments to avoid {ARG_MAX} limits and improve privacy, and output count
and length using 16 instead of 8 digits.
The basic concept is:
execl("/bin/sh", "sh", "-c", "freebsd_wordexp ${1:+\"$1\"} -f "$2",
"", flags & WRDE_NOCMD ? "-p" : "", <pipe with words>);
The WRDE_BADCHAR error is still implemented in libc. POSIX requires us to
fail strings containing unquoted braces with code WRDE_BADCHAR. Since this
is normally not a syntax error in sh, there is still a need for checking
code in libc, we_check().
The new we_check() is an optimistic check that all the characters
<newline> | & ; < > ( ) { }
are quoted. To avoid duplicating too much sh logic, such characters are
permitted when quoting characters are seen, even if the quoting characters
may themselves be quoted. This code reports all WRDE_BADCHAR errors; bad
characters that get past it and are a syntax error in sh return WRDE_SYNTAX.
Although many implementations of WRDE_NOCMD erroneously allow some command
substitutions (and ours even documented this), there appears to be code that
relies on its security (codesearch.debian.net shows quite a few uses).
Passing untrusted data to wordexp() still exposes a denial of service
possibility and a fairly large attack surface.
Replace most of the beforeinstall: hack with FILES mechanism.
This now generates the files into the OBJDIR as needed. Some of the files
are installed directly from the src directory. Files which are generated
from the src directory are renamed to .in to generate them and avoid
colliding with the checked-in file when CURDIR=OBJDIR.
The remaining beforeinstall: handling still needs to be reworked as it does
not work well with staging for packaging.
META_MODE: Remove unneeded groff/tmac special GENDIRDEPS_FILTER.
This is converting the path usr/share/tmac.*stage to something else, but
nothing ever installs or reads from such a path. They might look in
stage.*usr/share/tmac, but that's not what this is matching. Additionally
the .dirdeps match all of the tmac files back to gnu/usr.bin/groff/tmac
fine.
Several changes to truss.
- Refactor the interface between the ABI-independent code and the
ABI-specific backends. The backends now provide smaller hooks to
fetch system call arguments and return values. The rest of the
system call entry and exit handling that was previously duplicated
among all the backends has been moved to one place.
- Merge the loop when waiting for an event with the loop for handling stops.
This also means not emulating a procfs-like interface on top of ptrace().
Instead, use a single event loop that fetches process events via waitid().
Among other things this allows us to report the full 32-bit exit value.
- Use PT_FOLLOW_FORK to follow new child processes instead of forking a new
truss process for each new child. This allows one truss process to monitor
a tree of processes and truss -c should now display one total for the
entire tree instead of separate summaries per process.
- Use the recently added fields to ptrace_lwpinfo to determine the current
system call number and argument count. The latter is especially useful
and fixes a regression since the conversion from procfs. truss now
generally prints the correct number of arguments for most system calls
rather than printing extra arguments for any call not listed in the
table in syscalls.c.
- Actually check the new ABI when processes call exec. The comments claimed
that this happened but it was not being done (perhaps this was another
regression in the conversion to ptrace()). If the new ABI after exec
is not supported, truss detaches from the process. If truss does not
support the ABI for a newly executed process the process is killed
before it returns from exec.
- Along with the refactor, teach the various ABI-specific backends to
fetch both return values, not just the first. Use this to properly
report the full 64-bit return value from lseek(). In addition, the
handler for "pipe" now pulls the pair of descriptors out of the
return values (which is the true kernel system call interface) but
displays them as an argument (which matches the interface exported by
libc).
- Each ABI handler adds entries to a linker set rather than requiring
a statically defined table of handlers in main.c.
- The arm and mips system call fetching code was changed to follow the
same pattern as amd64 (and the in-kernel handler) of fetching register
arguments first and then reading any remaining arguments from the
stack. This should fix indirect system call arguments on at least
arm.
- The mipsn32 and n64 ABIs will now look for arguments in A4 through A7.
- Use register %ebp for the 6th system call argument for Linux/i386 ABIs
to match the in-kernel argument fetch code.
- For powerpc binaries on a powerpc64 system, fetch the extra arguments
on the stack as 32-bit values that are then copied into the 64-bit
argument array instead of reading the 32-bit values directly into the
64-bit array.
Reviewed by: kib (earlier version)
Tested on: amd64 (FreeBSD/amd64 & i386), i386, arm (earlier version)
Tested on: powerpc64 (FreeBSD/powerpc64 & powerpc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D3575
Skip the B_flag testcase to stop blowing up freebsd-current@ with
"test failure emails" because kyua report-jenkins doesn't properly
escape non-printable chars
Take extra reference to security policy before calling crypto_dispatch().
Currently we perform crypto requests for IPSEC synchronous for most of
crypto providers (software, aesni) and only VIA padlock calls crypto
callback asynchronous. In synchronous mode it is possible, that security
policy will be removed during the processing crypto request. And crypto
callback will release the last reference to SP. Then upon return into
ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already
freed request. To prevent this we will take extra reference to SP.
Have lockstat(1) trace locks by name rather than by address.
Previously, lockstat(1) would use a lock's address as its identifier when
consuming data describing lock contention and hold events. After collecting
the requested data, it would use ksyms(4) to resolve lock addresses to
names. Of course, this doesn't work too well for locks contained in
dynamically-allocated memory. This change modifies lockstat(1) to trace the
lock names obtained from the base struct lock_object instead, leading to
output that is generally much more useful.
This change also removes the -c option, which is used to coalesce data for
locks in an array. It's not possible to support this option without also
tracing lock addresses, and since lock arrays in which the lock names are
distinct are not very common in FreeBSD, it's simpler to just remove the
option.
adrian [Wed, 30 Sep 2015 05:19:16 +0000 (05:19 +0000)]
modify the rssi logic a bit to actually return a useful rssi.
The fullmac firmware doesn't seem to populate a useful rssi indicator
in the RX descriptor, so if one plotted said values, they'd basically
look like garbage.
The reference driver implements a "get current rssi" firmware command
which I guess is really meant for station operation only (as hostap
operation would need rssi per station, not a single firmware read.)
So:
* populate sc_currssi during each calibration run;
* use this in the RX path instead of trying to reconstruct the RSSI
value and passing it around as a pointer;
* do up a quick hack to map the rssi hardware value to some useful
signal level;
* the survey results provide an RSSI value between 0..100, so just
do another quick hack to map it into some usefulish signal level;
* supply a faked noise floor - I haven't yet found how to pull it
out of the firmware.
The scan results and the station RSSI information is now more useful
for indicating signal strength / distance.
Stop hard-coding a 32-bit data model for USDT tests, and just use the native
model. This was causing many of the tests to fail on amd64 since USDT
support for 32-bit programs is currently non-functional.
When processing ICMP need frag message, ignore the suggested MTU unless it
is smaller than the current one for this connection. This is behavior
specified by RFC 1191, and this is how original BSD stack behaved, but this
was unintentionally regressed in r182851.
Reported & tested by: Richard Russo <russor whatsapp.com>
Differential Revision: D3567
Sponsored by: Nginx, Inc.
When stopping ugidfw, it is not enough to just try unloading the module. If
the module is built-in to the kernel then the kldunload will fail. Rather
than do this just check if there are rules and then remove them all.
Add requirement on FILESYSTEMS to ensure /usr is present for /usr/sbin/ugidfw
and /usr/bin/xargs. This was already effectively the ordering from rcorder(8).
The Sun RPC framework uses a netbuf structure to represent the
transport specific form of a universal transport address. The
structure is expected to be opaque to consumers. In the current
implementation, the structure contains a pointer to a buffer
that holds the actual address.
In rpcbind(8), netbuf structures are copied directly, which would
result in two netbuf structures that reference to one shared
address buffer. When one of the two netbuf structures is freed,
access to the other netbuf structure would result in an undefined
result that may crash the rpcbind(8) daemon.
Fix this by making a copy of the buffer that is going to be freed
instead of doing a shallow copy.
In addition to the ubldr file, also copy ubldr.bin to the
MS-DOS partition. This will help with transitioning to
a single arm/armv6 userland build which could be used for
all FreeBSD/armv6 images without UBLDR_LOADADDR being set
for each board (ultimately requiring a separate buildworld
for each currently).
Requested by: ian
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
When XSAVE support was added on amd64, the FPU save area was moved
out of 'struct pcb' and into a variable-sized region after the
structure. The kgdb code currently only reads the pcb. It does not
read in the FPU save area but instead passes stack garbage as the
FPU's saved context. Fixing this would mean determining the proper
size of the area and fetching it. However, this state is not saved
for running CPUs in stoppcbs[], so the callback would also have to
know to ignore those pcbs. Instead, just remove the call since it is
of limited usefulness. It results in kgdb reporting the state of the
FPU/SIMD registers in userland, not their current values in the kernel.
In particular, it does not report the correct state for any code in
the kernel which does use the FPU and would report incorrect values
in that case.