Andrei Tatar [Wed, 22 Jan 2025 22:16:37 +0000 (23:16 +0100)]
lib/posix-timerfd: Output correct old_value on set
Previously settime() would output the old timerfd setting verbatim, as
an absolute deadline; this contradicts timerfd_settime(2) which clearly
states that old_value should be output with the same semantics as
gettime() -- relative time remaining until the next expiration.
This change makes settime() calculate and output this time correctly.
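The conversion from an absolute deadline to the relative value expected in old_value can be sketched as below. This is a minimal illustration, not the code from this commit: `remaining_until` and its parameters are hypothetical names, and both timestamps are assumed to be normalized and taken from the same clock.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: time left from `now` until `deadline`,
 * clamped to zero once the deadline has passed. */
static struct timespec remaining_until(struct timespec deadline,
				       struct timespec now)
{
	struct timespec rem = {0, 0};

	/* Both inputs are assumed to be normalized. */
	assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);
	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

	if (deadline.tv_sec < now.tv_sec ||
	    (deadline.tv_sec == now.tv_sec &&
	     deadline.tv_nsec <= now.tv_nsec))
		return rem; /* already expired */

	rem.tv_sec = deadline.tv_sec - now.tv_sec;
	rem.tv_nsec = deadline.tv_nsec - now.tv_nsec;
	if (rem.tv_nsec < 0) { /* borrow one second */
		rem.tv_sec -= 1;
		rem.tv_nsec += 1000000000L;
	}
	return rem;
}
```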
Andrei Tatar [Wed, 22 Jan 2025 22:26:44 +0000 (23:26 +0100)]
lib/posix-timerfd: Fix counter on subsequent reads
The internal counter of a timerfd is returned on read() and reset to 0,
and should be increased by 1 for every subsequent expiration.
A logic error in the current code makes every update set the counter to
the total expirations rather than since last read, leading to subsequent
successful reads returning wrong counts.
This change fixes this error.
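The intended counter behaviour can be sketched as follows. The structure and function names are hypothetical and do not mirror the posix-timerfd internals; the sketch only shows a count that accumulates expirations since the last read and is reset by the read.

```c
#include <assert.h>
#include <stdint.h>

struct tfd_counter {
	uint64_t total_seen; /* expirations already accounted for */
	uint64_t pending;    /* expirations since the last read */
};

/* The timer is observed to have expired `total` times overall.
 * Only expirations not seen before are added to `pending`. */
static void tfd_update(struct tfd_counter *c, uint64_t total)
{
	assert(total >= c->total_seen);
	c->pending += total - c->total_seen;
	c->total_seen = total;
}

/* read(): return the pending count and reset it to 0. */
static uint64_t tfd_read(struct tfd_counter *c)
{
	uint64_t n = c->pending;

	c->pending = 0;
	return n;
}
```

With three expirations before the first read and two more before the second, the reads return 3 and 2; the faulty logic described above would have returned 5 on the second read.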
Andrei Tatar [Wed, 25 Sep 2024 15:38:05 +0000 (17:38 +0200)]
lib/posix-socket: Give name to opened sockets
This change names newly opened sockets as such:
- generic sockets: "socket"
- sockets opened by a call to accept(): "socket:accepted"
- sockets opened by socketpair(): "socket:pair"
Andrei Tatar [Wed, 25 Sep 2024 15:30:36 +0000 (17:30 +0200)]
lib/posix-fdtab: Allow named files & partial open
This change expands the fdtab API, separating the action of creating a
new open file description from that of associating it with an fd. This
allows callers to perform additional initialization on the open file
description before the fd goes live.
One notable such init is filling in the `name` field of the ofile,
which the API additions now support and take care to allocate space for.
It was a conscious decision not to mandate that the fdtab fill in the
name itself, as drivers may construct names in any number of ways other
than having a string on hand. Thus, to avoid a redundant memcpy, a
driver can choose to fill in the field itself.
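A toy model of the two-step open described above is sketched below. All `po_` names are hypothetical and are not the fdtab API; the sketch only illustrates creating a description with space reserved for a name, letting the caller fill it in, and publishing the fd afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h> /* for the caller's strcpy()/strcmp() */

#define PO_MAXFD 8

struct po_ofile {
	size_t name_cap; /* bytes reserved for the name, incl. NUL */
	char name[];     /* filled in by the caller (driver) */
};

struct po_fdtab {
	struct po_ofile *slot[PO_MAXFD];
};

/* Step 1: create the open file description, reserving room for a
 * name of up to `name_len` characters. No fd refers to it yet. */
static struct po_ofile *po_ofile_new(size_t name_len)
{
	struct po_ofile *of = calloc(1, sizeof(*of) + name_len + 1);

	if (of)
		of->name_cap = name_len + 1;
	return of;
}

/* Step 2: publish the description under the lowest free fd. */
static int po_fdtab_install(struct po_fdtab *tab, struct po_ofile *of)
{
	assert(of != NULL);
	for (int fd = 0; fd < PO_MAXFD; fd++) {
		if (!tab->slot[fd]) {
			tab->slot[fd] = of;
			return fd;
		}
	}
	return -1; /* table full */
}
```

Between the two steps the caller can write the name directly into `of->name`, which is the copy-free path the commit message argues for.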
Andrei Tatar [Wed, 19 Feb 2025 11:54:31 +0000 (12:54 +0100)]
lib/posix-fdtab: Fix wrong return in exec handler
Previously fdtab_handle_execve would return 0 on success, as per common
convention. This is however wrong for event handlers, as these require
specific exit codes on success; in this case UK_EVENT_HANDLED_CONT.
This change fixes this oversight.
Andrei Tatar [Tue, 23 Jan 2024 18:44:06 +0000 (19:44 +0100)]
lib/posix-fdtab: Handle fdtab duplication on clone
This change adds logic to handle fdtab references on clone:
- if CLONE_FILES: child inherits a reference to parent's tab
- if !CLONE_FILES: child allocates new fdtab duplicate of parent's,
populated with new references to the same open file descriptions hosted
by the parent fdtab.
The initial duplication logic is rudimentary and does not provide any
ordering guarantees w.r.t. syscalls modifying the original table (open,
dup, close), under the assumption that it won't trigger race conditions
in the wild. Please revisit if this turns out to be overly optimistic.
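The two cases can be sketched with a toy table. The `cl_` names are hypothetical, the flag value is the Linux value of CLONE_FILES, and the plain (non-atomic) counters deliberately mirror the absence of ordering guarantees noted above.

```c
#include <assert.h>
#include <stdlib.h>

#define CL_CLONE_FILES 0x00000400UL /* Linux value of CLONE_FILES */
#define CL_MAXFD 4

struct cl_ofile {
	int refcnt;
};

struct cl_fdtab {
	int refcnt;
	struct cl_ofile *slot[CL_MAXFD];
};

/* Return the fdtab the child should use after clone(). */
static struct cl_fdtab *cl_fdtab_clone(struct cl_fdtab *parent,
				       unsigned long flags)
{
	struct cl_fdtab *child;

	assert(parent != NULL && parent->refcnt > 0);
	if (flags & CL_CLONE_FILES) {
		parent->refcnt++; /* child shares the parent's table */
		return parent;
	}

	child = calloc(1, sizeof(*child));
	if (!child)
		return NULL;
	child->refcnt = 1;
	for (int fd = 0; fd < CL_MAXFD; fd++) {
		child->slot[fd] = parent->slot[fd];
		if (child->slot[fd])
			child->slot[fd]->refcnt++; /* same description, new ref */
	}
	return child;
}
```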
Andrei Tatar [Tue, 23 Jan 2024 18:37:36 +0000 (19:37 +0100)]
lib/posix-fdtab: Add value assert in fmap_xchg
This change adds an assert to sanity-check the value to be exchanged
into the fmap. Exchanging in a NULL value is an invalid operation and
breaks the semantics of fmap, leaving the data structure in a
potentially unsound state.
Calling code should never normally do this; the assert serves as extra
precaution for future development.
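The kind of check described can be sketched with C11 atomics. `toy_slot_xchg` is a hypothetical stand-in, not the fmap_xchg implementation; it assumes that NULL marks an empty slot, which is why exchanging NULL in is rejected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Swap `newval` into the slot and return the previous occupant.
 * NULL is assumed to mean "empty", so it must never be exchanged in. */
static void *toy_slot_xchg(_Atomic(void *) *slot, void *newval)
{
	assert(newval != NULL);
	return atomic_exchange(slot, newval);
}
```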
Andrei Tatar [Tue, 23 Jan 2024 18:30:25 +0000 (19:30 +0100)]
lib/posix-fdtab: Support independent fdtab refs
This change adds the config option LIBPOSIX_FDTAB_MULTITAB that enables
independent per-thread fdtab references. If enabled, each thread will
look up an fdtab reference in its TLS memory.
The static initial fdtab remains, and continues to be used.
To this end, fdtabs are now refcounted and have non-static lifetimes.
New threads inherit their parent's ref, with other components or
their callbacks responsible for initializing a meaningful value.
The init thread, however, holds a reference to the static init fdtab.
Sergiu Moga [Fri, 14 Feb 2025 14:22:00 +0000 (16:22 +0200)]
lib/ukcpio: Use `uk_syscall_do_` instead of `uk_syscall_r_` symbols
Since `uk_syscall_r_` symbols tend to also invoke the system call enter
and exit tables if the `syscall_shim` library is enabled, replace such
calls with the `uk_syscall_do_` symbol variant which does not involve
any tables.
Checkpatch-Ignore: FUNCTION_ARGUMENTS
Checkpatch-Ignore: AVOID_EXTERNS
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1590
Since the CPIO library depends on the `uk_syscall_r_` symbols
exported by the VFSCore library, which can only export them if the
`syscall_shim` library is enabled, add a dependency on
`syscall_shim`.
This will be undone in the near future by the deprecation of VFSCore,
but for now do this so we don't break builds that use CPIO without
`syscall_shim`.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1590
Sergiu Moga [Mon, 17 Feb 2025 13:50:56 +0000 (15:50 +0200)]
lib/vfscore: Register `creat` using the raw variant
`creat` was registered as non-raw a long time ago, back
when the registration policy was different. Now, all system calls
should be registered as raw. Do so for `creat` as well and make it
call the `uk_syscall_do_` variant of `open` in order to avoid
invoking the system call enter/exit tables twice.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1591
Sergiu Moga [Fri, 14 Feb 2025 14:22:00 +0000 (16:22 +0200)]
lib/vfscore: Use `uk_syscall_do_` instead of `uk_syscall_r_` symbols
Since `uk_syscall_r_` symbols tend to also invoke the system call enter
and exit tables if the `syscall_shim` library is enabled, replace such
calls with the `uk_syscall_do_` symbol variant which does not involve
any tables.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1591
Michalis Pappas [Fri, 5 Jan 2024 19:09:32 +0000 (20:09 +0100)]
lib/nolibc: Adapt musl imported `signal.h` to nolibc
The replaced values of SIGRTMIN and SIGRTMAX are effectively
the return values of musl's __libc_current_sigrtmin() and
__libc_current_sigrtmax() defined in musl/src/signal/sigrtmin.c
and musl/src/signal/sigrtmax.c
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1244
Michalis Pappas [Mon, 8 Jan 2024 15:57:19 +0000 (16:57 +0100)]
lib/nolibc: Rename musl-imported/arch/aarch64
Rename `musl-imported/arch/aarch64` to `musl-imported/arch/arm64`
to fix build errors caused by an incorrect include path generated
relative to $(ARCH).
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1244
Add the kernel internal variant of `exit_group`, `pprocess_exit`.
This allows kernel internal code to call this system call's logic
without having the syscall shim wrapper logic intervene.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1585
Sergiu Moga [Fri, 7 Feb 2025 12:27:45 +0000 (14:27 +0200)]
lib/syscall_shim: Avoid nested syscalls within binary syscalls
The binary system call handler calls `uk_syscall6_r_e`, which ends up
calling either the `uk_syscall_r_` or the `uk_syscall_r_e_` variant of
the syscall wrappers, which also iterate through the system call
enter/exit tables. However, the binary system call itself also runs
through these tables already, therefore avoid this from happening twice
by calling the `uk_syscall_do_` and `uk_syscall_do_e` variants of the
system calls through `uk_syscall6_do_e`.
Since both the `uk_syscall_r_`/`uk_syscall_r_e` and the
`uk_syscall_e`/`uk_syscall_e_e` variants end up calling the syscall enter
and exit tables, we need a new, third variant that guarantees calling
nothing but the actual implementation logic of the syscall itself.
Thus, introduce such a variant: the `uk_syscall_do_` variant. This
is exactly what the `uk_syscall_r_` variant was prior to the
introduction of the system call tables.
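The layering can be illustrated with a toy model. None of the `toy_` names exist in syscall_shim; the sketch only shows why a handler that already runs the tables should dispatch to a table-free variant.

```c
#include <assert.h>

static int toy_table_runs; /* how often the enter/exit tables ran */

static void toy_run_tables(void)
{
	toy_table_runs++;
}

/* "do" flavour: nothing but the syscall's own logic. */
static long toy_do_getpid(void)
{
	return 1;
}

/* "r" flavour: the same logic, wrapped by the tables. */
static long toy_r_getpid(void)
{
	toy_run_tables();
	return toy_do_getpid();
}

/* Binary syscall handler: it runs the tables itself, so it calls
 * the "do" flavour to avoid running them a second time. */
static long toy_binary_handler(void)
{
	toy_run_tables();
	return toy_do_getpid();
}
```

Had `toy_binary_handler` called `toy_r_getpid` instead, a single binary system call would have run the tables twice.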
Sergiu Moga [Tue, 28 Jan 2025 19:00:58 +0000 (21:00 +0200)]
lib/syscall_shim: Put bin syscall dbg handler in `syscall_entertab`
Move the debug handler printing to a `syscall_entertab` handler.
Since we want this to be printed on binary system call entry only, make
sure to check for it.
Lastly, add an assertion for a nested depth of 1. It should be
impossible that this would be different from 1 as it would either mean
a kernel internal system call invoked the binary handler somehow or the
TLS counter nesting variable is corrupted.
Sergiu Moga [Tue, 28 Jan 2025 19:00:58 +0000 (21:00 +0200)]
lib/syscall_shim: Move binary syscall strace to `syscall_exittab`
Move `strace` printing to a `syscall_exittab` handler.
Since we want this to be printed on binary system calls exit only, make
sure to check for it.
Lastly, add an assertion for a nested depth of 1. It should be
impossible that this would be different from 1 as it would either mean
a kernel internal system call invoked the binary handler somehow or the
TLS counter nesting variable is corrupted.
Define two routine tables: syscall_entertab and syscall_exittab.
These tables are iterated over on system call entry and exit,
respectively.
By registering a routine in one of these tables, one can have a custom
function called whenever a system call is entered or exited.
The order these routines are executed in is dictated by their
priority: a lower priority value means earlier.
Note that there may exist some nested system calls, e.g. system call
handler invoked through a binary system call ending up calling a system
call of its own. To deal with such cases, introduce a TLS variable for
keeping track of when we enter/exit a syscall, binary or native. This way
we are able to let registered handlers know whether they are in a nested
context or not.
Importantly, this TLS variable must be reset to 0 in the context of
exiting execve, since the process is born anew with a fresh counter.
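The mechanism can be sketched as below. This is an illustration with hypothetical `toy_` names and a plain sorted array; it makes no claim about how syscall_shim actually registers or stores its handlers.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_handler {
	int prio;             /* lower value runs earlier */
	void (*fn)(int depth); /* depth > 1 means a nested syscall */
};

static _Thread_local int toy_depth; /* 0 outside any syscall */

static int toy_log[8], toy_log_len; /* records handler invocations */
static void toy_first(int depth) { toy_log[toy_log_len++] = 10 + depth; }
static void toy_second(int depth) { toy_log[toy_log_len++] = 20 + depth; }

static int toy_cmp(const void *a, const void *b)
{
	const struct toy_handler *x = a, *y = b;

	return (x->prio > y->prio) - (x->prio < y->prio);
}

/* Order a table by ascending priority value. */
static void toy_sort(struct toy_handler *tab, size_t n)
{
	qsort(tab, n, sizeof(*tab), toy_cmp);
}

static void toy_syscall_enter(const struct toy_handler *tab, size_t n)
{
	toy_depth++;
	for (size_t i = 0; i < n; i++)
		tab[i].fn(toy_depth);
}

static void toy_syscall_exit(const struct toy_handler *tab, size_t n)
{
	assert(toy_depth > 0);
	for (size_t i = 0; i < n; i++)
		tab[i].fn(toy_depth);
	toy_depth--;
}

/* execve(): the new image starts with a fresh counter. */
static void toy_execve_reset(void)
{
	toy_depth = 0;
}
```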
Sergiu Moga [Mon, 3 Feb 2025 12:03:44 +0000 (14:03 +0200)]
lib/syscall_shim: Add execenv argument to PRINTD variant
The __UK_SYSCALL_EXECENV_PRINTD variant wrongly expects no execenv
argument despite using an execenv. It has only worked so far because it
has only been used in a spot where an execenv variable is declared in
the function. Fix this by making __UK_SYSCALL_EXECENV_PRINTD take an
additional execenv argument.
Add the kernel internal variant of `getpid`, `uk_sys_getpid`. This
allows kernel internal code to call this system call's logic without
having the syscall shim wrapper logic intervene.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1583
Add the kernel internal variant of `getppid`, `uk_sys_getppid`. This
allows kernel internal code to call this system call's logic without
having the syscall shim wrapper logic intervene.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1583
Sergiu Moga [Fri, 7 Feb 2025 09:36:29 +0000 (11:36 +0200)]
lib/posix-futex: Use kernel internal `uk_sys_gettid`
Use the kernel internal variant of `gettid`, `uk_sys_gettid`. Unlike
`uk_syscall_r_gettid`, this variant does not involve any syscall shim
wrapper logic, but rather simply calls the system call's logic directly.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1583
Add the kernel internal variant of `gettid`, `uk_sys_gettid`. This
allows kernel internal code to call this system call's logic without
having the syscall shim wrapper logic intervene.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1583
Sergiu Moga [Fri, 7 Feb 2025 10:17:48 +0000 (12:17 +0200)]
lib/vfscore: Use kernel internal `clock_gettime` variant
Use the kernel internal variant of `clock_gettime`,
`uk_sys_clock_gettime`. This helps avoid unnecessary execution of
the syscall wrappers' logic of syscall shim that would have otherwise
been run through `uk_syscall_r_clock_gettime`. Additionally, make
sure to also handle errors of said syscall.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1586
Sergiu Moga [Fri, 14 Feb 2025 09:09:43 +0000 (11:09 +0200)]
lib/ramfs: Explicitly mention that there are no public symbols
`exportsyms.uk` of RAMFS had a `none` dummy string so that no symbols
are exported, otherwise the `objcopy --keep-global-symbols=` would have
failed due to an empty file being given as argument.
However, this is inconsistent with other such files which just had
`# No public symbols` which is just as effective. Thus make it
consistent and do the same here.
Sergiu Moga [Sat, 8 Feb 2025 11:38:30 +0000 (13:38 +0200)]
lib/posix-process: Remove leftover `uk_syscall_` symbols of `clone3`
Commit cee6ce09d720 ("lib/posix-process: remove clone3 syscall")
tried to remove `clone3` but forgot to remove its `uk_syscall_` symbols
from `exportsyms.uk`.
Sergiu Moga [Thu, 6 Feb 2025 09:35:15 +0000 (11:35 +0200)]
lib: Remove unneeded `uk_syscall_*` symbols from all `exportsyms.uk`
The syscall shim library is now able to automatically generate and
export `uk_syscall_*` symbols by preprocessing each library's defined
`UK_SYSCALLS_PROVIDED-*`. Thus, we no longer need to manually add these
in each exportsyms.uk.
Note that for `posix-futex` we had to keep an exportsyms.uk
containing `none`, so that the library still does not spill any
symbols.
Sergiu Moga [Thu, 6 Feb 2025 09:32:38 +0000 (11:32 +0200)]
lib/syscall_shim: Autogenerate `uk_syscall_*` symbols at build time
Introduce a new AWK script that autogenerates an additional, custom
defined, exportsyms that contains all `uk_syscall_*` symbols as per
each library's `UK_SYSCALLS_PROVIDED-*`.
This will help in reducing exportsyms.uk noise from now on as we no
longer have to manually add these supposedly internal symbols.
Sergiu Moga [Thu, 6 Feb 2025 09:26:55 +0000 (11:26 +0200)]
Introduce `EACHOLIB_EXPORTS`
Introduce the equivalent of `EACHOLIB_LOCALS`, but for exportsyms.uk.
Now, one is able to add custom defined/named exportsyms.uk, in addition
to an already existing exportsyms.uk. If no exportsyms.uk exists, then,
unlike `EACHOLIB_LOCALS` which is processed regardless of whether a
localsyms.uk exists or not, `EACHOLIB_EXPORTS` will be ignored.
However, its result files may still be built as per Makefile recipe
dependencies.
Andrei Tatar [Mon, 4 Mar 2024 18:58:36 +0000 (19:58 +0100)]
lib/ukfile: Remove unused parts of pollqueue API
This change removes the following from the pollqueue API:
- uk_pollq_init: initializers & init values fill its usecase better
- uk_pollq_edge*: semantics were unclear & not useful nor used
Add the kernel internal variants of `prlimit64`, `getrlimit` and
`setrlimit`: `uk_sys_prlimit64`, `uk_sys_getrlimit` and
`uk_sys_setrlimit` respectively.
This allows kernel internal code to call these system calls' logic
without having the syscall shim wrapper logic intervene.
Note how only `uk_sys_prlimit64` has been added to `exportsyms.uk`
since the others are defined as inline.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1584
Sergiu Moga [Fri, 7 Feb 2025 10:25:11 +0000 (12:25 +0200)]
lib/ramfs: Use kernel internal `clock_gettime` variant
Use the kernel internal variant of `clock_gettime`,
`uk_sys_clock_gettime`.
This helps avoid unnecessary execution of the syscall wrappers'
logic of syscall shim that would have otherwise been run through
`uk_syscall_r_clock_gettime`.
Lastly, since RAMFS now uses a definition available only through
posix-time, add a dependency on it in the Config.uk. Ideally,
this would have been a `depends on HAVE_TIME` plus an `imply`, but since
posix-time is currently the only library offering time services,
do it like this for now.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1587
Andrei Tatar [Thu, 6 Feb 2025 12:49:13 +0000 (13:49 +0100)]
lib/ukfile: API: Change iovec length to size_t
Previously the ukfile API would take the size of a passed iovec as a
signed int, a design oversight copied over from the binary syscall API.
Negative iovec lengths make no sense and should not be exposed by our
internal API.
This change makes iovec lengths unsigned size_t for ukfiles.
The posix-socket socket ops API is similarly changed to use size_t.
External socket implementations will need updating.
Andrei Tatar [Thu, 6 Feb 2025 12:20:44 +0000 (13:20 +0100)]
lib/ukfile: API: Change file offset to size_t
This change makes the ukfile API natively use size_t as type for
file offsets. The previous use of the signed off_t was unwieldy, as
negative file offsets had no meaning or use in the internal API.
Best to correct this mistake sooner rather than later.
This makes posix-fdio syscalls the only entry points with signed file
offset args, allowing validation to be isolated.
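The isolated validation could look like the sketch below. `toy_check_offset` is a hypothetical helper, not a posix-fdio function; it assumes negative offsets are rejected with EINVAL, as pread(2) does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Validate a signed offset at the syscall boundary and hand the
 * internal API an unsigned one. Returns 0 or a negative errno. */
static int toy_check_offset(off_t user_off, size_t *out)
{
	assert(out != NULL);
	if (user_off < 0)
		return -EINVAL; /* negative offsets stop here */
	if ((uintmax_t)user_off > SIZE_MAX)
		return -EOVERFLOW; /* only reachable if off_t is wider */
	*out = (size_t)user_off;
	return 0;
}
```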
Andrei Tatar [Thu, 6 Feb 2025 15:38:57 +0000 (16:38 +0100)]
lib/posix-fdio: Adopt fch* syscalls from vfscore
This change moves the implementation of the fchmod and fchown syscalls
from vfscore to posix-fdio, allowing shim operation on both ukfiles and
legacy vfscore files.
In addition, this move fixes a file ref leak in vfscore's fchmod due to
a missing fdrop after fd lookup.
Andrei Tatar [Thu, 6 Feb 2025 15:33:03 +0000 (16:33 +0100)]
lib/posix-fdio: Adopt file space mgmt syscalls
This change moves the implementation of the ftruncate and fallocate
syscalls from vfscore to posix-fdio, allowing shim operation on both
ukfiles and legacy vfscore files alike.
In addition, the related fadvise64 syscall is implemented, stubbed out
as success for vfscore files.
Andrei Tatar [Thu, 6 Feb 2025 15:27:58 +0000 (16:27 +0100)]
lib/posix-fdio: Adopt f*sync syscalls from vfscore
This change moves the implementation of the fsync and fdatasync syscalls
from vfscore to posix-fdio, allowing shim operation on both ukfiles and
legacy vfscore files.
Andrei Tatar [Tue, 24 Sep 2024 15:03:34 +0000 (17:03 +0200)]
lib/ukfile: Add try acquire operation
This change adds a try_acquire operation on files which allows callers
holding a weakref to attempt to take a strong ref. This call may fail if
no other strongrefs are currently held (and thus finalizers and the
destructor are scheduled to run).
Andrei Tatar [Tue, 24 Sep 2024 14:54:53 +0000 (16:54 +0200)]
include/uk/weak_refcount: Add try acquire op
This change adds a `try_acquire` operation to strong/weak refcounts,
allowing a caller holding a weakref to attempt to acquire a strong ref.
This call may fail if no other strong references exist.
Andrei Tatar [Thu, 6 Feb 2025 11:28:50 +0000 (12:28 +0100)]
lib/posix-fdio: Fix wrong computation of dev_t
Previously posix-fdio would compute the value of a dev_t from a
major/minor number pair wrong by naive bit shifting. The correct
computation is more involved and should use makedev() defined in
<sys/sysmacros.h>.
This change fixes this oversight, making stat() output correct.
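The difference shows up once a minor number no longer fits in 8 bits. The sketch below assumes a Linux libc that provides makedev(), major() and minor() in <sys/sysmacros.h>; `toy_naive_dev` is a hypothetical example of naive shifting and is not the code that was replaced.

```c
#include <assert.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Naive packing: only round-trips while minor fits in 8 bits. */
static unsigned long toy_naive_dev(unsigned int maj, unsigned int min)
{
	return ((unsigned long)maj << 8) | min;
}

/* Packing via makedev(), checked to round-trip through major()/minor(). */
static dev_t toy_dev(unsigned int maj, unsigned int min)
{
	dev_t d = makedev(maj, min);

	assert(major(d) == maj && minor(d) == min);
	return d;
}
```

For major 8 and minor 300, the naive value decodes to major 9 when shifted back, because the high bits of the minor overlap the major field.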
This change adds configuration guards around syscall declarations in
Makefiles that depend on posix-fdtab to be enabled. This allows builds
of these libs with syscall_shim but fdtab disabled, decoupling them from
the fdtab completely.
Andrei Tatar [Thu, 22 Feb 2024 20:32:38 +0000 (21:32 +0100)]
lib/posix-fdio: Remove posix-fdtab dependency
This change removes the hard dependency on posix-fdtab, allowing
posix-fdio to be used with anonymous open file descriptions (i.e., raw
uk_ofiles). It remains selected as a soft dependency by default.
File descriptor related syscalls are now only provided if posix-fdtab is
also selected.
Andrei Tatar [Thu, 22 Feb 2024 20:28:22 +0000 (21:28 +0100)]
lib/posix-fd: Move open file headers into own lib
This change merges the headers uk/ofile.h and uk/posix-fd.h and moves
them into their own library, providing shared definitions related to
open file descriptions for both posix-fdio and posix-fdtab, without
introducing an undue dependency between the two.
Andrei Tatar [Thu, 22 Feb 2024 19:40:25 +0000 (20:40 +0100)]
lib/posix-socket: Remove posix-fdtab dependency
This change makes posix-fdtab no longer a hard dependency of
posix-socket, allowing socket operations using the internal API without
needing a userspace-visible fdtab.
Andrei Tatar [Thu, 22 Feb 2024 19:38:00 +0000 (20:38 +0100)]
lib/posix-timerfd: Remove posix-fdtab dependency
This change makes posix-fdtab no longer a hard dependency of
posix-timerfd, allowing non-fd parts of its functionality to work
without the former selected.
Andrei Tatar [Thu, 22 Feb 2024 19:33:58 +0000 (20:33 +0100)]
lib/posix-pipe: Remove posix-fdtab dependency
This change makes posix-fdtab no longer a hard dependency of posix-pipe,
allowing non-fd parts of its functionality to work without the former
selected.
Andrei Tatar [Thu, 22 Feb 2024 19:28:24 +0000 (20:28 +0100)]
lib/posix-eventfd: Remove posix-fdtab dependency
This change makes posix-fdtab no longer a hard dependency of
posix-eventfd, allowing non-fd parts of its functionality to work
without the former selected.
Sergiu Moga [Mon, 3 Feb 2025 09:38:57 +0000 (11:38 +0200)]
lib/ukvmem: Use correct stack guard size macro for tests
During the upstreaming of 5587fd88 (lib/ukvmem: Make stack VMA guards
size configurable and end-to-end), following some renames, the macro
used in the tests was not updated as well, which results in build errors
for the ukvmem tests. Fix this by using the proper macro name.
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1573
The vfork() syscall is equivalent to calling clone() with the flags
parameter set to CLONE_VM | CLONE_VFORK | SIGCHLD. Update clone() to
support CLONE_VFORK and CLONE_VM. Implement vfork() as a wrapper of
clone().
For more info see vfork(2).
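The wrapper relationship can be sketched as follows. The constants are the Linux values, redefined locally under `TOY_` names, and `toy_clone` is a stand-in that only records its flags; none of this is the posix-process implementation.

```c
#include <assert.h>

/* Linux values, redefined locally so the sketch needs no _GNU_SOURCE. */
#define TOY_CSIGNAL     0x000000ffUL /* low byte: child exit signal */
#define TOY_CLONE_VM    0x00000100UL
#define TOY_CLONE_VFORK 0x00004000UL
#define TOY_SIGCHLD     17UL /* x86 and Arm; differs on some arches */

/* Stand-in for clone(): records what it was asked to do. */
static unsigned long toy_last_flags;

static long toy_clone(unsigned long flags)
{
	toy_last_flags = flags;
	return 0;
}

/* vfork() expressed as a thin wrapper around clone(). */
static long toy_vfork(void)
{
	return toy_clone(TOY_CLONE_VM | TOY_CLONE_VFORK | TOY_SIGCHLD);
}
```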
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
Michalis Pappas [Wed, 15 Nov 2023 15:24:52 +0000 (16:24 +0100)]
lib/vfscore: Handle CLONE_VM
vfork() sets the CLONE_VM and CLONE_VFORK flags. This triggers an
error in the clone handlers of vfscore as CLONE_FS is not set. Update
the handlers to additionally check against CLONE_VM, as that also
implies that the parent and child share filesystem state.
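The updated check amounts to a predicate like the one below. The name `toy_shares_fs_state` is hypothetical and the constants are the Linux flag values; the actual vfscore handlers may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CLONE_VM 0x00000100UL /* Linux value of CLONE_VM */
#define TOY_CLONE_FS 0x00000200UL /* Linux value of CLONE_FS */

/* Parent and child share filesystem state if either flag is set. */
static bool toy_shares_fs_state(unsigned long flags)
{
	return (flags & (TOY_CLONE_FS | TOY_CLONE_VM)) != 0;
}
```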
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
Add a field to posix_thread to keep track of its parent. This is
populated during the creation of a posix_thread, and it is used
for deriving the parent's state in execve() / exit().
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
Michalis Pappas [Fri, 3 Nov 2023 09:45:59 +0000 (10:45 +0100)]
lib/posix-process: Add posix_thread state
The state provides information on whether a posix_thread is
running, blocked, or exited.
Notice that posix_thread_state is only updated by operations
at the posix_process / posix_thread level and may not
always be in sync with the state of the underlying uk_thread.
This specifically applies to the POSIX_THREAD_RUNNING state,
which may not be accurate e.g. if the underlying uk_thread
blocks at the scheduler due to a lock.
On the other hand, the variants of POSIX_THREAD_BLOCKED always
reflect the state of a posix_thread, as it is certain that the
underlying uk_thread will also be blocked from the scheduler.
Given the above, a check against POSIX_THREAD_RUNNING should only be
used to check that a posix_thread is neither terminated nor
blocked.
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
Michalis Pappas [Thu, 4 Jan 2024 09:46:12 +0000 (10:46 +0100)]
lib/posix-process: Migrate tid2pprocess / tid2pthread to process.h
Migrate the definitions of tid2pthread() and tid2pprocess() to
the private process.h to make them available to the rest of the
library. This additionally requires migrating the definitions of
struct posix_process and struct posix_thread.
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
Add a definition of ARG_MAX to limits.h. POSIX defines ARG_MAX as the
number of bytes available for the combined arguments and env vars
of a new process. Whether that additionally includes null terminators,
pointers, or alignment bytes is implementation-defined.
Signed-off-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Andrei Tatar <andrei@unikraft.io>
GitHub-Closes: #1386
libukbinfmt provides a minimal framework to register handlers of
executable files. Typical examples include binary executables like
ELF objects, or interpreted files like *nix scripts that use the
sha-bang sequence to specify an interpreter.
This commit only implements the functionality required to register
and execute loaders within the kernel's scope. Additional
functionality incl. application support via Linux's `binfmt_misc`
API shall be added as a future extension.
Clang (18) requires a space between an identifier and a literal for
preprocessor string concatenation. Otherwise, the build fails with the
error: "C++11 requires a space between literal and identifier".
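The spaced form looks like the sketch below, which uses the standard PRIu64 macro as the identifier being concatenated. `TOY_FMT` and `toy_format` are hypothetical names, and the example does not reproduce the exact lines this commit touches.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Spaces separate the string literals from the macro identifier;
 * the unspaced spelling would be "val=%"PRIu64"\n". */
#define TOY_FMT "val=%" PRIu64 "\n"

/* Format `v` into `buf`; returns what snprintf() returns. */
static int toy_format(char *buf, size_t len, uint64_t v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, TOY_FMT, v);
}
```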
Andrei Tatar [Tue, 17 Dec 2024 19:48:25 +0000 (20:48 +0100)]
lib/vfscore: Fix missing syscall declaration
This change adds a declaration for `uk_syscall_r_fstat` as this syscall
is no longer implemented in vfscore and thus no longer implicitly
declared, previously causing a build warning.
GCC 14 and Clang no longer accept implicitly declared functions and will
error out in such situations.
Signed-off-by: Andrei Tatar <andrei@unikraft.io> Reviewed-by: Stefan Jumarea <stefanjumarea02@gmail.com> Reviewed-by: Razvan Deaconescu <razvand@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1556
Michalis Pappas [Wed, 11 Dec 2024 13:39:27 +0000 (14:39 +0100)]
lib/ukrandom: Improve boot messages
Have libukrandom print an info message about the CSPRNG's seed source.
Since initialization of the library is driver-centric, drivers that
fail to probe can't know if the library can be initialized by another
driver, so the only way to know which driver was used at initialization
is to print an informational message from libukrandom.
Add debug messages for improved diagnostics on cmdline and dtb init,
as well as when drivers are ignored if libukrandom has been initialized
already.
Add a warning when seeding via the cmdline and dtb to make sure that
the user is aware that these are potentially insecure methods.
Florian Rommel [Wed, 30 Oct 2024 13:52:37 +0000 (14:52 +0100)]
drivers/ukconsole/vga: Add initial sanity check
On initialization, ensure that the VGA controller is present and in
the expected state, otherwise cancel the initialization and
registration of the VGA driver, but continue booting.
For platforms without a VGA controller (such as Firecracker), this
avoids aborts due to an illicit registration of a non-existing VGA
memory region.
Signed-off-by: Florian Rommel <mail@florommel.de> Approved-by: Simon Kuenzer <simon@unikraft.io> Reviewed-by: Simon Kuenzer <simon@unikraft.io>
GitHub-Closes: #1530
Simon Kuenzer [Wed, 11 Dec 2024 21:32:51 +0000 (13:32 -0800)]
lib/ukconsole: Enable as default
This commit enables ukconsole as default. This is something that should
never be done for libraries or options because a fresh and clean
configuration should always lead to a minimal unikernel configuration and
KConfig has no option to "unselect" a setting.
We consider the console as special case because otherwise no kernel
messages are visible for initial configurations.
Simon Kuenzer [Wed, 11 Dec 2024 02:04:21 +0000 (18:04 -0800)]
drivers/xengnttab: Move grant table to own library
This commit moves the code for the Xen grant table from the Xen platform
library to a separate driver library. This follows the goal of decomposing
`libxenplat`. All Xen drivers that require grant table support (`9pfront`,
`blkfront`, `netfront`) will specify their dependency to `libxengnttab`
with `select`. This simplifies the automatic configuration of dependencies.
For example, if an application requires networking and selects `uknetdev`,
`netfront` is automatically added to a build and will automatically resolve
its dependencies.
Simon Kuenzer [Tue, 10 Dec 2024 15:06:22 +0000 (07:06 -0800)]
plat/xen: Export public API of libxenplat
As part of the platform re-arch, this commit treats `libxenplat` as a
regular driver library that exports an API. Although it is intended that
`libxenplat` will be further decomposed into several driver libraries
(e.g., CPU, memory), this step spares other Xen drivers and services
from cross-including headers and compile definitions from `libxenplat`
that were technically defined as platform-internal. Such an approach
breaks the convention of regular libraries and is a typical source of
problems with Makefile processing order and symbol linking errors.
As part of platform re-arch, the definition of a platform-internal scope
is intended to get removed because it caused confusion and restrictions
(e.g., application code was not able to directly access driver APIs that
were within platform scope).
With this change, the compile flag `CONFIG_PARAVIRT` is namespaced to
`XEN_PARAVIRT`, as well.
Simon Kuenzer [Mon, 9 Dec 2024 00:04:47 +0000 (01:04 +0100)]
drivers/xenbus: Properly export public API
This commit cleanly defines and exports the public API of the XenBus
driver. This is done so that other Xen drivers or services no longer
need to cross-include the XenBus headers, which were technically
defined as "internal". Such an approach breaks library convention and
is a typical source of problems with Makefile processing order and
symbol linking errors.
Simon Kuenzer [Fri, 20 Sep 2024 12:13:13 +0000 (14:13 +0200)]
plat/xen: Introduce `HAVE_XENBUS`
This commit introduces `HAVE_XENBUS` as a replacement for `PLAT_XEN`. This
prepares the Xen drivers for potential re-use by platforms that implement
devices according to the Xen standard.
Simon Kuenzer [Thu, 19 Sep 2024 15:25:15 +0000 (17:25 +0200)]
drivers/*: Introduce `HAVE_IBMPC`
This commit introduces the (invisible) feature option `HAVE_IBMPC`, which
describes that a platform uses non-discoverable devices that can be found
under an established address and operated according to the IBM PC/AT
standard.
At the moment this focuses on VGA compatible adapters and UART controllers.
The suboptions `HAVE_IBMPC_NS16550`, `HAVE_IBMPC_VGA` can be used instead
if only a subset of the devices are used by a platform. For example,
`HAVE_IBMPC_NS16550` enables the port-io mode of ns16550: Serial devices
are addressed under well-defined addresses: `0x3f8` (COM1), `0x2f8` (COM2).
Simon Kuenzer [Thu, 19 Sep 2024 15:20:58 +0000 (17:20 +0200)]
drivers/vgacons: Rename `libukconsole_vga` to `libvgacons`
Rename the driver library and driver directory to `libvgacons` to provide
a naming scheme that is more appropriate for general purpose devices and
to highlight that this driver is intended for text-mode only.
Configuration options are adapted accordingly.
Simon Kuenzer [Thu, 19 Sep 2024 15:40:40 +0000 (17:40 +0200)]
drivers/pl011: Rename `libukconsole_pl011` to `libpl011`
Rename the driver library and driver directory to `libpl011` to provide
a naming scheme that is more appropriate for general purpose devices. The
name `pl011` is considered a precise description of the driver as it is
the name of the actual serial I/O controller.
Configuration options are adapted accordingly.
Simon Kuenzer [Thu, 19 Sep 2024 15:35:41 +0000 (17:35 +0200)]
drivers/ns16550: Rename `libukconsole_ns16550` to `libns16550`
Rename the driver library and driver directory to `libns16550` to provide
a naming scheme that is more appropriate for general purpose devices. The
name `ns16550` is considered a precise description of the driver as it is
the name of the actual serial I/O chip.
Configuration options are adapted accordingly.
Simon Kuenzer [Thu, 19 Sep 2024 14:23:25 +0000 (16:23 +0200)]
drivers/xenemgcon: Depend on `libukconsole`
Makes the Xen emergency console driver dependent on `libukconsole` instead
of selecting it. This fits our current driver model: only include drivers
when there is an application need.