Ian Jackson [Thu, 7 Nov 2019 17:46:48 +0000 (17:46 +0000)]
make-flight: Drop all win10 tests in all flights
These are failing and have been for some time and it does not appear
that anyone has the capability to fix them. Running them in these
circumstances seems wasteful.
Effect is to drop test-*-win10-* jobs (checked with
standalone-generate-dump-flight-runvars).
CC: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Ian Jackson [Wed, 6 Nov 2019 16:49:36 +0000 (16:49 +0000)]
sg-report-host-history: Reduce limit from 2000 to 200
Currently the "sg-report-host-history" part of most flights is taking
an inordinate amount of time. Hours. These are serialised and this
is a big problem, seriously impeding throughput.
Reducing this limit by a factor of 10 will reduce the available
history when we are looking at host-specific problems. It is an
emergency fix.
I am working on an arrangement which will avoid having to rescan all
of history each time and which will instead reuse previous output.
CC: Jürgen Groß <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 4 Nov 2019 16:50:38 +0000 (16:50 +0000)]
adhoc-revtuple-generator: Bisect over 5000 commits (really)
In e9b0653875b3 we changed one of the `1000' values to `5000'. But
this magic number had been duplicated. Urgh!
The result is that adhoc-revtuple-generator might generate a weirdly
truncated output which causes cs-bisection-step to fail with messages
like this:
*** not RelvUp at 3d40147282670d597b336be5599b5cc4c2ff7ddd at ./cs-bisection-step line 554.
*** not RelvDown at 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df at ./cs-bisection-step line 554.
*** not RelvUp at 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df at ./cs-bisection-step line 554.
...
Use of uninitialized value in concatenation (.) or string at ./cs-bisection-step line 747.
Should test .
BROKEN see earlier errors. at ./cs-bisection-step line 1454, <SVGI> line 10089.
Fix this by (i) plumbing the magic value we already edited properly
back to the (command-line controlled) global variable (ii) changing
the global variable from 1000 to 5000.
git-grep '\b1000\b' still produces a fair amount of output but most
of it is timeouts, which is fair enough. There is also a flight
count limit in sg-report-flight, which limits how far back it is
willing to look. We don't want to change that here.
With this change, cs-bisection-step on the currently-failing freebsd
build job does this:
Searching for interesting versions
Result found: flight 141420 (pass), for basis pass
Result found: flight 143397 (fail), for basis failure
Need to reproduce basis pass (pass); had 1 already.
Should test 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df.
This looks plausible: it is picking up where it left off before the
basis pass fell over its horizon.
CC: Roger Pau Monné <royger@FreeBSD.org>
CC: Jürgen Groß <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
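The duplicated-limit problem described above can be illustrated with a minimal Python sketch. It is not the Perl of adhoc-revtuple-generator: the names `DEFAULT_MAX_COMMITS`, `parse_args`, `--max-commits` and `truncate_history` are hypothetical, and only show one command-line-controlled value being plumbed through instead of a second literal.

```python
import argparse

# Single source of truth for the limit (hypothetical name; 5000 is the
# value the commit message settles on).
DEFAULT_MAX_COMMITS = 5000


def parse_args(argv):
    """Parse a hypothetical --max-commits option, defaulting to the global."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-commits", type=int, default=DEFAULT_MAX_COMMITS)
    return parser.parse_args(argv)


def truncate_history(commits, max_commits):
    """Keep at most max_commits entries; the limit is a parameter, not a literal."""
    return commits[:max_commits]


if __name__ == "__main__":
    args = parse_args([])
    history = [f"commit{i}" for i in range(6000)]
    print(len(truncate_history(history, args.max_commits)))
```

Because the truncation function takes the limit as an argument, changing the default in one place is enough.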
Ian Jackson [Thu, 31 Oct 2019 14:57:45 +0000 (14:57 +0000)]
grub2 setup: Set GRUB_TERMINAL=console, if no other setting
The default for d-i, if it doesn't know better, is to let update-grub
set grub's terminal to `gfxterm'. But in osstest we do not really
ever want to use a graphical console.
Let us discuss some of the cases in a bit more detail:
On UEFI systems with a serial console, the UEFI console ought typically
to be (and in our installation is, in all cases so far) linked to the
serial console. So GRUB_TERMINAL=console would be right for UEFI.
This appears to be correct on the albanas, our one pair of in-service
x86 boxes with a UEFI firmware configuration.
But on x86 systems, we generally pass console=ttyS... arguments on the
d-i command line, and d-i arranges for GRUB_TERMINAL=serial and
appropriate other settings. We already have a workaround that changes
that to "serial console", which is fine whether "console" means a VGA
console we don't look at, or some kind of BIOS console redirection.
This currently works on all our x86 machines, including the UEFI ones.
On our ThunderX (arm64) boxes, `gfxterm' does not work at all.
`console', does, because it goes to the UEFI console which UEFI sends
to the serial port.
The best approach to unpicking this mess seems to be to apply a
default setting of GRUB_TERMINAL=console. The effect of this is to
change `gfxterm' to `console' in grub.cfg. In practice all our x86
boxes (including our x86 UEFI boxes, where `console' would work) have
it set to `serial' (modified by us to `serial console') so remain
unchanged.
The net result is that on ARM, we now set `GRUB_TERMINAL=console', and
we now get all of the bootloader serial output on the rochesters.
I have tested this on:
rochester0 - arm64 uefi ThunderX, used not to work
laxton1 - arm64 uefi SoftIron
albana0 - x86 uefi
huxelrebe0 - x86 bios
arndale-westfield - armhf u-boot
cubietruck-gleizes - armhf u-boot
Thanks to Brian Woods for poking at rochester0 and making the key
suggestions.
CC: Brian Woods <brian.woods@xilinx.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Jürgen Groß <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 30 Oct 2019 16:04:49 +0000 (16:04 +0000)]
xl guest creation: Pause 10s to work around libxl/blkback races
In ea6626f7edd9eb40a3510eaf6816a77cac4f63d0
guest_prepare_disk: Only do the umount if we set an env var
we removed (in the usual case) a check for the guest disk
already being mounted in dom0 etc. This check is there for
ad-hoc testing.
We removed it because it exposes what we think is an annoying race in
blkback.
Unfortunately this change seems to have made guest-rapid-restart races
worse rather than better. Steps test-* guest-start/debian.repeat seem
to fail a lot more now.
We are in the throes of preparing the Xen 4.13 release. These guest
restart races have existed for a long time.
Bodge this for now by adding a sleep :-/.
We do this in the xl toolstack, during domain creation. And also in
the libvirt toolstack because that uses xl but doesn't inherit the
sleep from the Osstest module.
Release-acked-by: Jürgen Groß <jgross@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
We aren't actually interested in bisecting the FreeBSD
version (usually, the anointed version) which was used as the platform
for the failed builds. We are thereby making the assumption that any
build failure (or indeed test failure) is the result of changes to the
recent FreeBSD being actually built or used, not the version being
used as a build host.
Achieve ignoring this by having other_revision_job_suffix return a new
magic value DISCARD, which all callers must know means `skip
this one'. There are three call sites:
In cs-bisection-step:flight_rmap, we skip those rows in the Perl
loop. (We can't skip them conveniently in the SQL because we can't
refer to the column `othrev'; we'd have to duplicate the expression,
or have a subquery. This doesn't seem likely to matter much.)
In cs-bisection-step:preparejob, we always compare the returned suffix
with a fixed value (which eventually came from the previous call). So
DISCARD will never match. No change is needed here.
In Osstest.pm:main_revision_job_cond, we compare the returned suffix
with ''. Again, it will never match and no change is needed.
I have checked that now a cs-bisection-step run chooses a single
FreeBSD master commit to try to build.
CC: Roger Pau Monné <royger@FreeBSD.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
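The three call sites described above can be sketched in Python as follows. This is an illustration only, not the Perl in cs-bisection-step or Osstest.pm: the job-name matching rule, the string representation of DISCARD and the helper names `rows_to_consider` and `is_main_revision_job` are assumptions.

```python
# Sentinel meaning "skip this one" (the commit calls it DISCARD; its
# representation here as a plain string is an assumption).
DISCARD = "DISCARD"


def other_revision_job_suffix(job_name):
    """Toy stand-in: return a suffix for a job, or DISCARD for jobs to ignore.

    The matching rules below are invented for illustration only.
    """
    if job_name.endswith("-freebsd-host"):
        return DISCARD
    if job_name.endswith("-prev"):
        return "-prev"
    return ""


def rows_to_consider(job_names):
    """Like the flight_rmap case: drop DISCARD rows in the loop, not in SQL."""
    kept = []
    for job in job_names:
        suffix = other_revision_job_suffix(job)
        if suffix == DISCARD:
            continue
        kept.append((job, suffix))
    return kept


def is_main_revision_job(job_name):
    """Like main_revision_job_cond: comparing with '' never matches DISCARD."""
    return other_revision_job_suffix(job_name) == ""
```

Only the looping caller needs an explicit check; the callers that compare against a fixed suffix reject DISCARD without any change.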
Ian Jackson [Thu, 24 Oct 2019 10:46:13 +0000 (11:46 +0100)]
power_cycle_sleep: Change default sleep to 15s
5s is so short that when a host fails to respond we aren't sure if it
was just very idle and ran off its PSU's internal energy storage for
that period.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 23 Oct 2019 14:55:06 +0000 (15:55 +0100)]
make-flight: Drop arm64 with Linux before 4.10
The driver for the laxtons' network cards is not in 4.4 (and that's
quite old). Our ThunderX's may even require something more recent but
we will cross that bridge when we see it.
Effect is to drop the following jobs:
linux-4.1 *arm64*
linux-4.4 *arm64*
linux-4.9 *arm64*
(Checked by eyeballing standalone-generate-dump-flight-runvars diff.)
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Now we have two lists of things not supported on ARM: one of branches
where that's inherent in the branch somehow, and one for those where
the kernel is simply too old. The latter are going to differ between
armhf and arm64.
No functional change.
(Verified with standalone-generate-dump-flight-runvars.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 21 Oct 2019 13:23:55 +0000 (14:23 +0100)]
guest_prepare_disk: Only do the umount if we set an env var
This call to guest_umount_lv is here for the benefit of ad-hoc reruns
of (eg) ts-guest-start, to tidy up any ad-hoc messing about (eg from
earlier runs of ts-debian-fixup or something). It is not needed in
production runs.
Serendipitously, this osstest code discovered a bug in the Linux
blkback: when tearing down, it sets the backend state to 6 before it
has closed the underlying block devices. This ultimately means that
after "xl destroy" or "xl shutdown -w" there is a period when the
guest's open handle onto its storage is still open. This is wrong.
This detection depends on us winning a tricky race. So it shows up in
osstest as a very low probability heisenbug. The bug is currently in
all versions of Linux and causing a bit of a nuisance.
It would be best to add a proper check for this bug. However, this is
quite fiddly: really, it ought to be done as close to the xl command
completion as possible, in the same ssh invocation. That would
involve a fair bit of plumbing and ad-hocery. I don't think that
would be proportionate for such a low-impact bug.
So instead in this patch I just disable this cleanup code in the
troublesome case, unless it is explicitly requested by the user
setting OSSTEST_GUEST_DISK_MOUNT_CLEANUP to a trueish value. (This
would be reasonably convenient for the ad-hoc testing that this call
serves.)
Thanks to Roger for diagnosing the Linux kernel bug.
CC: Jürgen Groß <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
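The opt-in behaviour can be sketched in Python as below. This is a simplified illustration rather than the osstest Perl: the helper names are hypothetical, the "troublesome case" condition is left out, and "trueish" is approximated with Perl-like rules (unset, empty and "0" count as false), which is an assumption.

```python
import os


def cleanup_requested(environ=None):
    """Return True if OSSTEST_GUEST_DISK_MOUNT_CLEANUP is set to a trueish value.

    Unset, empty and "0" are treated as false (assumed Perl-like semantics).
    """
    env = os.environ if environ is None else environ
    value = env.get("OSSTEST_GUEST_DISK_MOUNT_CLEANUP", "")
    return value not in ("", "0")


def guest_prepare_disk(umount, environ=None):
    """Call the (hypothetical) umount callback only when cleanup was requested."""
    if cleanup_requested(environ):
        umount()
        return True
    return False
```

An ad-hoc rerun would then set the variable explicitly, while production runs leave it unset and skip the umount.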
Ian Jackson [Thu, 3 Oct 2019 09:48:38 +0000 (10:48 +0100)]
Osstest.pm: Fix main_revision_job_cond after 0964bab7a9ea
In
other_revision_job_suffix: Take and pass referring runvar name
we updated main_revision_job_cond to pass a dummy 'x' for the new
parameter. But the parameter is an SQL expression, not a value,
and so an extra pair of quotes is needed.
This error broke sg-check-tested and this fix fixes it.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
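The difference between passing a value and passing an SQL expression can be shown with a small Python sketch. It uses an in-memory SQLite database purely as a stand-in, and `suffix_query` is a hypothetical helper, not the real other_revision_job_suffix.

```python
import sqlite3


def suffix_query(refer_expr):
    """Build a query that splices refer_expr in as an SQL expression."""
    return f"SELECT {refer_expr} || '_suffix'"


conn = sqlite3.connect(":memory:")

# Passing the bare value: SQL parses x as a column name, so this fails.
try:
    conn.execute(suffix_query("x"))
    bare_ok = True
except sqlite3.OperationalError:
    bare_ok = False

# Passing an SQL string literal, with the extra pair of quotes, works.
quoted_result = conn.execute(suffix_query("'x'")).fetchone()[0]
```

The dummy argument therefore has to be the text `'x'`, quotes included, for the generated query to be valid.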
Ian Jackson [Mon, 30 Sep 2019 14:16:04 +0000 (15:16 +0100)]
freebsd build job bisection: add special case
other_revision_job_suffix contains ad-hoc code which returns an
identifier distinguishing certain jobs which are expected to refer to
different revisions within their flight.
Add the special case for freebsdbuildjob's recursion.
After this change we are now willing to tolerate the fact that a
freebsd build job has as input multiple different revisions of
freebsd.
cs-bisection-step has code to avoid creating recursive build jobs: the
created top-level job will therefore reuse the same freebsdbuildjob as
the template. Hopefully that will be the previously anointed one and
still be available.
The bisector wants to repro on the same host as before. This means it
won't necessarily use the most recent pass as the basis build. So
long as the previous build has not been expired, this is fine. It
does involve building an earlier freebsd on a later one but this
should be OK.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 29 Aug 2019 09:12:44 +0000 (10:12 +0100)]
mg-hosts mknetbootdir: Introduce and require -F<firmware>
If one runs
./mg-hosts mknetbootdir HOST
before having sorted out all the host configuration, it uses the
default configuration value for the host's firmware kind, which is
"bios". If the configuration is then changed, things don't work.
This is confusing.
So ask the user to specify one or more -F<firmware>, or -Fany.
CC: Dominic Brekau <dominic.brekau@credativ.de>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
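A required, repeatable option of this shape can be sketched with Python's argparse. This is not how mg-hosts itself parses its arguments; the parser layout and the function name `parse_mknetbootdir_args` are assumptions made for illustration.

```python
import argparse


def parse_mknetbootdir_args(argv):
    """Parse a hypothetical `mknetbootdir -F<firmware>... HOST` command line.

    -F may be given more than once and must be given at least once, so
    there is no silent fallback to a default firmware kind.
    """
    parser = argparse.ArgumentParser(prog="mknetbootdir")
    parser.add_argument(
        "-F",
        dest="firmware",
        action="append",
        required=True,
        help="firmware kind to set up, or 'any'",
    )
    parser.add_argument("host")
    return parser.parse_args(argv)
```

With this sketch, `parse_mknetbootdir_args(["-Fany", "HOST"])` succeeds, while omitting -F makes argparse exit with a usage error.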
Ian Jackson [Tue, 2 Jul 2019 16:26:02 +0000 (17:26 +0100)]
ts-hosts-allocate-Executive: Treat "no suitable host" as starved
In particular, this means that
* platform-* jobs will not cause problems in old Xen branches when
a platform supports only newer Xen
* commissioning flights will complain less about the architectures
that aren't included in the particular set of hosts
The motivation for this patch, now, is that the first of these applies
to `platform-thunderx', recently introduced in the Xen Project colo.
CC: Julien Grall <julien.grall@arm.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 2 Jul 2019 16:16:47 +0000 (17:16 +0100)]
cr-daily-branch, mfi-common: Use tested seabios and ovmf for -prev
Introduce {TREE,REVISION}_{OVMF,SEABIOS}_PREV, so that -prev builds
use the tested ovmf too. This should be true in all branches,
including xen-unstable. (In the seabios and ovmf branches, there
are no -prev builds.)
Checked with standalone-generate-dump-flight-runvars
and the result is to add these runvars
revision_ovmf
revision_ovmf
revision_seabios
revision_seabios
to jobs
build-i386-prev
build-amd64-prev
in the branches
xen-*-testing values are baseline
xen-unstable values are the empty string
The empty string is equivalent to unset: see config in ts-xen-build.
CC: Wei Liu <wl@xen.org>
CC: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
The glob syntax here was wrong, and the code in cs-adjust-flight did not
handle it properly either. So --rebuild -r has not worked since it
first appeared in: a1e0e5846f7bb7d82a5db1d7cd643b9f5ca1b9a9
mg-repro-flight: Provide --rebuild to make variant build jobs
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Fri, 17 May 2019 17:24:12 +0000 (17:24 +0000)]
mg-repro-setup: Introduce `statictask' variable
We are going to make a mode where we don't set OSSTEST_TASK. The
result is that our subprocesses will do whatever they usually do.
Those are mg-allocate (which would allocate for our static task) and
mg-execute-flight which will make a dynamic task. We must therefore
prevent mg-allocate from running since the allocations would not be
useable for the flight execution.
No functional change yet, since nothing sets statictask=false and
therefore OSSTEST_TASK would always be set.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 30 May 2019 16:47:42 +0000 (17:47 +0100)]
ts-kernel-build: Disable CONFIG_ARCH_QCOM in Xen Project CI
drivers/firmware/qcom_scm.c:469:47: error: passing argument 3 of `dma_alloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
This is fixed by
firmware: qcom_scm: Use proper types for dma mappings
but this is not present in all relevant stable branches.
We currently have no Qualcomm hardware in the Xen Project test lab so
we do not need this enabled.
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: linux-arm-msm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Stephen Boyd <swboyd@chromium.org>
CC: Andy Gross <agross@kernel.org>
CC: Bjorn Andersson <bjorn.andersson@linaro.org>
CC: Avaneesh Kumar Dwivedi <akdwived@codeaurora.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 17 May 2019 13:39:47 +0000 (13:39 +0000)]
installs: Disable cron
The presence of cron causes leak check failures, since cron may run
processes that the leak checker detects. Disable it, since none of
our installs live long enough for this to matter.
Do this in host_install_postboot_complete since it seems to me like we
don't want this in guests any more than we want it in hosts.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 3 May 2019 13:38:53 +0000 (13:38 +0000)]
cr-daily-branch: seabios: "usually" use xen-tested-master
This branch is supposed to be suitable for all versions of Xen.
Conversely, older versions of seabios do not build on newer
compilers (as provided, eg, in stretch).
So, for "branches" other than xen-unstable and xen-unstable-smoke, use
the usual "determine_version" machinery, which will select
xen-tested-master for branches other than the seabios branch itself.
No change for the seabios "branch", nor for xen-unstable*. The effect
is to switch xen-*-testing, qemu-*, linux-*, etc., to all use seabios
xen-tested-master.
Ian Jackson [Thu, 2 May 2019 15:59:16 +0000 (16:59 +0100)]
mg-repro-flight: Provide --rebuild to make variant build jobs too
This allows a single command to repro a particular job with a variety
of different source code.
The implementation technique is:
- run the build job in a separate flight, so that it can run
with a separate task which gives its host up after the build
- do much of the heavy lifting of runvar fiddling etc. in
a new helper routine in cs-adjust-flight
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Add a missing `continue' (without which everything goes quite wrong)
Ian Jackson [Wed, 1 May 2019 10:43:08 +0000 (11:43 +0100)]
Drop Xen 4.5 and earlier
These releases are out of security support. They are known not to
build on Debian stretch, which is what we are using, and we do not
intend to ever update them to fix that.
Xen 4.6 is also out of security support but we want osstest to be able
to continue to build it so that we can test 4.6->4.7 migration, for
the purposes of testing Xen 4.7, which is still supported right now.
So we have recently applied some build fixes to the 4.6 tree, and for
now we retain 4.6 in osstest so that build fixes applied to
staging-4.6 can propagate to stable-4.6.
CC: committers@xenproject.org
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 26 Apr 2019 14:58:47 +0000 (15:58 +0100)]
cross builds: Build armhf kernels on amd64 hosts
Our armhf hosts are devboards and very slow, as well as scarce. It
takes 17ks or so for a kernel build. This will go *much* faster on
an amd64 box and we have lots of those too.
standalone-generate-dump-flight-runvars shows that the only change is
to change host_arch from armhf to amd64 in build-armhf-pvops jobs.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
---
v2: Fix typo in commit message.
Ian Jackson [Fri, 26 Apr 2019 14:22:55 +0000 (15:22 +0100)]
cross builds: mfi-common: Prepare for kernel cross building
Introduce job_create_build_crossable, which takes a target->host
architecture map in its arguments, and use it for build-kern,
passing an empty architecture map.
Overall functional change is only to add
host_arch=$arch
to the kernel build jobs, which has no ultimate effect because it's
the same as the arch=$arch. (Difference in flight construction
verified with standalone-generate-dump-flight-runvars.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
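The effect of a target->host architecture map can be sketched in Python. The real job_create_build_crossable lives in mfi-common (shell); the function name `job_build_runvars` and the dictionary representation used here are assumptions for illustration.

```python
def job_build_runvars(arch, cross_map=None):
    """Return the arch-related runvars for a kernel build job.

    cross_map maps a target architecture to the host architecture that
    should build it; targets not in the map are built natively.
    """
    cross_map = cross_map or {}
    return {"arch": arch, "host_arch": cross_map.get(arch, arch)}
```

With an empty map, host_arch always equals arch, which is why this step has no ultimate effect. A map entry such as armhf to amd64 would be what moves the armhf kernel builds onto amd64 hosts.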
Ian Jackson [Fri, 26 Apr 2019 13:44:04 +0000 (14:44 +0100)]
arch replumbing: ts-debian-di-install: Use $gho->{Arch}
This is just tidying up. The only effect is that now these would
honour $r{all_guest_arch} as a fallback. But right now,
$r{GUEST_arch} will always be set, and that is what ends up in
$gho->{Arch}.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
This comment was lamenting the very problem we are fixing now. It
would now be possible to test i386->amd64 tools migration, by writing
an appropriate test job with different src_host_arch and
dst_host_arch etc.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 26 Apr 2019 13:27:12 +0000 (14:27 +0100)]
arch replumbing: Replace many $r{arch} with $[g]ho->{Arch}
No functional change with existing flights. But the effect is that
now, generally, ts-* scripts and the support code will honour
host_arch, if it is set, in preference to arch.
This patch contains only replacements of $r{arch} with $ho->{Arch} or
$gho->{Arch}. In fact, perhaps surprisingly, there were no locations
where $gho was wanted rather than $ho (I have double checked this).
Exceptions, where we left $r{arch} alone, are:
* make-flight: a comment, which we are about to deal with;
* ts-kernel-build: we are going to support cross building and
$r{arch} is going to be the architecture of the kernel we want
rather than of the build host.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 26 Apr 2019 13:07:08 +0000 (14:07 +0100)]
TestSupport: target_var: Use host_V for host variables
Change `target_var' to set `IDENT_V' rather than just V. For
compatibility with older flights and older flight construction,
look for plain V too when looking up the variable.
And, we now look at all_host_V before V. This has no functional
change with existing flights, because existing flights only have
all_host_suite
all_host_di_version
all_host_os
and we never set the corresponding V form of those variables.
So with existing flights the only functional change is a change to
synth runvars, to add HOST_ to the name.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
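The lookup order can be sketched in Python as below. This is an illustration for host variables only, not the Perl in TestSupport: the dictionary of runvars is a stand-in, and the exact precedence (IDENT_V, then all_host_V, then plain V) is inferred from the description above.

```python
def target_var(runvars, ident, name):
    """Look up variable `name` for the host identified by `ident`.

    Order assumed here: IDENT_V, then all_host_V, then plain V (kept for
    older flights). Returns None if nothing is set.
    """
    for key in (f"{ident}_{name}", f"all_host_{name}", name):
        if key in runvars:
            return runvars[key]
    return None
```

An older flight that only sets plain `arch` still resolves, while a newer flight can override it per host with, say, `host_arch`.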
Ian Jackson [Tue, 16 Apr 2019 14:30:29 +0000 (15:30 +0100)]
starvation: Abandon jobs which are unreasonably delaying their flight
Sometimes, due to a shortage of available resources, a flight might be
delayed because a handful of jobs are waiting much longer than the
rest. Add a heuristic which causes these jobs to be abandoned.
We consider ourselves starving if we are starving now, based on the
most optimistic start time seen in the last I.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 18 Jul 2018 12:21:27 +0000 (13:21 +0100)]
starvation: Infrastructure for jobs which are delaying their flights
Provide hostalloc_starvation_* in Osstest::Executive, and a comment
saying what we are going to do. And provide a demo utility which
prints the effect of some particular runvar value on a range of
situations.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 18 Jul 2018 12:20:39 +0000 (13:20 +0100)]
starvation: Use "starved" for hostalloc_maxwait_max
Previously this was "broken".
We mustn't just call `broken' inside attempt_allocation because that
runs in a db transaction. Instead, we arrange that attempt_allocation
returns 2, which threads its way back out to the return value from
alloc_resources, and then call broken there.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
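The pattern of returning a special value out of the transaction, and acting on it only afterwards, can be sketched in Python with SQLite. This is a toy model: the `hosts` table, the "no free host" condition and the `run_job` wrapper are invented for illustration, and only the value 2 and the function names attempt_allocation and alloc_resources come from the description above.

```python
import sqlite3

STARVED = 2  # special return value, as in the commit message


def attempt_allocation(conn, wanted_host):
    """Runs inside the db transaction, so it must not report status itself."""
    row = conn.execute(
        "SELECT name FROM hosts WHERE name = ? AND free = 1", (wanted_host,)
    ).fetchone()
    if row is None:
        return STARVED
    conn.execute("UPDATE hosts SET free = 0 WHERE name = ?", (wanted_host,))
    return 0


def alloc_resources(conn, wanted_host):
    """Run the attempt in a transaction and thread its result back out."""
    try:
        result = attempt_allocation(conn, wanted_host)
    except Exception:
        conn.rollback()
        raise
    if result == 0:
        conn.commit()
    else:
        conn.rollback()
    return result


def run_job(conn, wanted_host):
    """Act on the result only once the transaction has finished."""
    if alloc_resources(conn, wanted_host) == STARVED:
        return "starved"  # stands in for the status call made outside the transaction
    return "allocated"
```

The status is decided inside the transaction but acted on after commit or rollback, so the status update cannot be lost with a rolled-back transaction.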
Ian Jackson [Tue, 16 Apr 2019 10:51:16 +0000 (11:51 +0100)]
tcl/JobDB-Executive: Do not squash "starved" status
ts-hosts-allocate is going to set the job status to `starved'
sometimes, and then die. `starved' needs to be added to the list of
job statuses that sg-run-job leaves alone.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 16 Apr 2019 17:46:42 +0000 (18:46 +0100)]
step handling: Preserve step states set by ts-* scripts
sg-run-job would unconditionally set the step state to the value it
calculated, which would usually be `pass' or `fail' or
`broken' (according to the recipe).
Relax this interface somewhat to allow a test script to set the step
status itself: specifically, do not overwrite an existing status of
aborted broken starved
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
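The relaxed rule amounts to a small decision function, sketched here in Python. sg-run-job itself is Tcl; the names `PRESERVED_STATUSES` and `final_step_status` are hypothetical.

```python
# Statuses a ts-* script may set itself and which are then left alone.
PRESERVED_STATUSES = {"aborted", "broken", "starved"}


def final_step_status(existing, computed):
    """Keep a preserved status already set by the test script; otherwise
    use the value calculated by the job runner."""
    if existing in PRESERVED_STATUSES:
        return existing
    return computed
```

A script that marks its step `starved` and then dies would therefore keep that status instead of having it replaced by `fail`.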
Ian Jackson [Thu, 18 Apr 2019 16:18:53 +0000 (17:18 +0100)]
starvation: Teach archaeologists about starved job state
sg-report-flight is a bit awkward. It thinks mostly about step
status, not job status. So, when justifying, if we don't find a step,
and the job state is starved, we treat the step as starved.
If there are only starved steps, then we don't have evidence that this
is a regression, because the test wasn't run in the baseline.
If there are other steps we look at those instead.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 18 Apr 2019 12:17:50 +0000 (13:17 +0100)]
starvation: Teach sg-report-flight about starved step state
We are going to use this for situations where the resources to run the
test weren't available. In general we are going to treat this as not
a regression.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 20 Jul 2018 15:11:37 +0000 (16:11 +0100)]
alloc_resources: Support special abandonment values
This gives a way for the caller's $resourcecall to signal something
interesting, back to its main loop. This is useful for calling
broken, for example: that can't be done within $resourcecall because
$resourcecall operates within the allocation db transaction (which
ought to be rolled back...)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 25 Apr 2019 15:04:56 +0000 (16:04 +0100)]
builds: Run i386 builds on amd64 kernels
Most hardware that supports i386 supports amd64 too. When doing
builds we do need the right userland, but we don't actually care what
the kernel is doing. With Linux, 32-on-64 is good for that.
Especially, there is a kernel regression (evident in the Debian
stretch kernel, but not present in jessie's) where 32-bit Linux
mismanages the memory on hosts with moderately large amounts of
RAM (ie, significantly more RAM than can be addressed at once),
resulting in what amounts to a near-stall of the paging system. Since
the paging system is used for filesystem writes too, the effect is
that commands run by builds can take totally unreasonable amounts of
time. Ie, this version of Linux is broken when i386 PAE is needed.
In practice this is causing significant trouble in the Xen Project CI.
This kernel bug probably won't affect our test jobs because
(i) we use our own kernels, so we would probably detect this
regression when switching kernel branches etc. (ii) test jobs
run with a dom0_mem setting which avoids the preconditions for the
particular bug.
CC: Juergen Gross <jgross@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 25 Apr 2019 14:58:00 +0000 (15:58 +0100)]
Debian: preferred arch: Apply setarch to sshd
Many build systems (including Xen's, and autoconf) use uname to try to
discern the system's architecture. When running i386 userland on an
amd64 kernel, this gives the wrong answer. These build systems then
go off and try to do a sort of cross compile thing, and, generally,
fall over.
The uname -m value (which is what is at issue) is an inherited process
property. Linux provides a utility `setarch' which changes this. We
need to apply this to all builds; and it is not really convenient to
add an adverbial command to every build via the existing ssh build
shell rune mechanisms.
A fairly simple way to get the right behaviour is to wrap sshd
instead. sshd doesn't mind what `personality' it sees. Replacing
/usr/sbin/sshd with a wrapper shell script might break
start-stop-daemon's attempts to shut down or restart sshd but we don't
care about that in osstest (certainly not on build installs, where
this feature is to be used).
Nothing uses this yet.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
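A wrapper of the kind described can be sketched as follows, using Python only to generate the shell script text. The function name `setarch_wrapper_script`, the choice of `i386` as the setarch argument and the idea of moving the real binary to a separate path are assumptions; this is not the script osstest installs.

```python
import shlex


def setarch_wrapper_script(real_sshd, arch="i386"):
    """Return the text of a small shell wrapper that execs sshd under setarch.

    Every process started by the wrapped sshd then inherits the changed
    uname -m personality.
    """
    return (
        "#!/bin/sh\n"
        f"exec setarch {shlex.quote(arch)} {shlex.quote(real_sshd)} \"$@\"\n"
    )
```

Wrapping the daemon once means no individual build command needs its own `setarch` prefix.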