Ian Jackson [Tue, 21 May 2019 16:06:50 +0000 (17:06 +0100)]
ts-host-reuse: New script, to do reuse state changes
This will be made part of the test job recipes.
We calculate the sharing scope (sharetype) by reference to a lot of
runvars, etc.
This version of the script is rather far from the finished working
one, but it seems better to preserve the actual history for how it got
the way it is.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 8 Nov 2017 16:29:07 +0000 (16:29 +0000)]
plan_search: Track last sharing state to determine $share_reuse
What matters for the purpose of $share_reuse is not whether the host
is actually being _shared_ (ie, there are other concurrent allocations
and therefore a concurrent Event with Share information). What we
really want to know is whether the *last* use of this host was a
suitable sharing setup - because we actually want to know if we will
be able to skip our setup.
So track that explicitly. (The slightly odd structure, where there
are two loops in one, means that we reset $last_eshare when we go onto
the next $req ie the next host to check.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 27 Oct 2017 16:52:49 +0000 (17:52 +0100)]
host allocation: selecthost: allow sort-of-selection of prospective hosts
If one passes a trueish value for $prospective, selecthost does not
worry about whether any host has actually been selected. It does a
limited amount of prep work.
This will be useful if we want to know some of the non-host-specific
information selecthost computes - in particular, $ho->{Suite} etc.
No functional change with existing callers.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 27 Oct 2017 14:42:39 +0000 (15:42 +0100)]
host allocation: *_shared_mark_ready: Make $sharetype check optional
We are going to want to be able to set shares to other than ready,
without double-checking the sharetype.
The change to the UPDATE statement makes no difference because
resource_check_allocated_core has just got that sharetype out of the
db. (This does remove one safety check against bugs, sadly.)
No functional change for existing callers.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 3 Nov 2017 17:40:42 +0000 (17:40 +0000)]
ts-hosts-allocate-Executive: Fix handling of failed preps for same sharing
This code was previously unreachable. It ought to be executed when
all the shares are allocatable or prep: in that case, we can unshare
and re-share the host.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 24 Aug 2020 17:54:18 +0000 (18:54 +0100)]
Debian: osstest-erase-other-disks: Slightly guard against races
Apparently it can happen that something decides to rescan a partition
table, removing a partition block device, while it is being zeroed:
    osstest-erase-other-disks-6081: hd devices present after: /dev/hd*
    osstest-erase-other-disks-6081: Erasing /dev/sda
    osstest-erase-other-disks-6081: Erasing /dev/sda1
    osstest-erase-other-disks-6081: /dev/sda1 is no longer a block device!
To try to narrow the window during which this race occurs, do not care
if the thing we just zeroed no longer exists after we zeroed it.
We still bomb out if it exists but is not a block device - that would
probably mean we had written it out as a file.
This is all quite unfortunate.
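As an illustration only (the real script is a shell script in the installer environment, and the function name here is hypothetical), the post-erase check described above can be sketched like this: a device that has vanished is tolerated, while something that exists but is not a block device is still fatal.

```python
import os
import stat


def check_after_erase(path):
    """Classify `path` after it has been zeroed.

    Returns "gone" if it no longer exists (tolerated: a partition
    rescan may have removed it), "ok" if it is still a block device,
    and raises if it exists but is something else (eg a regular file
    we wrote out by mistake).
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "gone"
    if not stat.S_ISBLK(st.st_mode):
        raise RuntimeError(f"{path} exists but is not a block device")
    return "ok"
```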
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 25 Aug 2020 11:02:13 +0000 (12:02 +0100)]
abolish "kernkind"; desupport non-pvops kernels
This was for distinguishing the old-style Xenolinux kernels from pvops
kernels.
We have not actually tested any non-pvops kernels for a very very long
time. Delete this now because the runvar is slightly in the way of
test host reuse.
(Sorry for the wide CC but it seems better to make sure anyone who
might object can do so.)
All this machinery exists just to configure the guest console
device (Xenolinux used "xvc" rather than "hvc") and the guest root
block device (Xenolinux stole "hda"/"sda" rather than using "xvda").
Specifically, in this commit:
* In what is now target_setup_rootdev_console_inittab, do not
look at any kernkind runvar and simply do what we would if
it were "pvops" or unset, as it is in all current jobs.
* Remove the runvar from all job creation and example runes.
(This has no functional change even for jobs running with
the previous osstest code because we have defaulted to "pvops"
for a very long time.)
We retain the setting of the shell variable "kernbuild", because that
ends up in build jobs' names. All our kernel build jobs now end in
-pvops and I intend to retain that name component since abolishing it
is nontrivial.
We move this earlier. This is OK because it depends only on the
console runvar (inside the sub; this is set by target_kernkind_check),
$ho and $gho (which are set by this point); and $mountpoint$ (which is
set by access()).
No functional change.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 30 Oct 2017 11:36:16 +0000 (11:36 +0000)]
show_abs_time: Represent undef $timet as <undef>
This can happen, for example, if a badly broken flight has steps which
are STARTING and have NULL in the start time column, and is then
reported using sg-report-flight.
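A minimal sketch of the behaviour, in Python rather than the Perl of the real function; the timestamp format used here is an assumption made for the example and is not taken from osstest.

```python
import time


def show_abs_time(timet):
    """Format an epoch time, representing a missing value as <undef>."""
    if timet is None:
        # Eg a STARTING step whose start time column is NULL.
        return "<undef>"
    # Assumed format, for illustration only.
    return time.strftime("%Y-%m-%d %H:%M:%S Z", time.gmtime(timet))
```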
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 1 Oct 2020 14:17:44 +0000 (15:17 +0100)]
Tolerate lack of platform-specific hosts in old Xen branches
Right now we have a situation where these can't all be made to work
because some older Xen branches are hard to make work on
current Debian stable, and we have some hardware (which we have tagged
as specific "platforms") which doesn't work with oldstable.
This seems like a general problem, so fix it this way.
Note that we still treat these failed allocations as failures, so they
are subject to regression analysis and ought not to appear willy-nilly
on existing branches.
Runvar dump shows the addition of this runvar
    hostalloc_missing_expected=1
to
    qemu-upstream-4.6-testing
    xen-4.6-testing
    ...
    qemu-upstream-4.14-testing
    xen-4.14-testing
inclusive.
Set MF_SIMULATE_PLATFORMS to a suitable value if it is
not *set*. (Distinguishing unset from set to empty.)
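The unset-versus-empty distinction can be sketched as follows. This is an illustration in Python, not the shell used by the real scripts; the function name and the default value are hypothetical.

```python
def simulate_platforms(environ, default):
    """Return the MF_SIMULATE_PLATFORMS setting to use.

    The default is applied only when the variable is absent.
    A variable that is present but empty is respected as-is.
    """
    if "MF_SIMULATE_PLATFORMS" not in environ:
        return default
    return environ["MF_SIMULATE_PLATFORMS"]
```

In real use one would pass os.environ; note that environ.get(name) or default would be wrong here, because it treats the empty string like an unset variable.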
I have verified that this, plus the preceding commits to
cri-getplatforms, produces no change in the output of
    MF_SIMULATE_PLATFORMS='' OSSTEST_CONFIG=standalone-config-example eatmydata ./standalone-generate-dump-flight-runvars
Without the MF_SIMULATE_PLATFORMS setting it adds several new jobs to
each flight, named things like this:
    test-amd64-$arch1-xl-simplat-$arch2-$suite
The purpose of this right now is to provide a way to dry-run test the
next change.
Ian Jackson [Thu, 1 Oct 2020 15:36:17 +0000 (16:36 +0100)]
cri-getplatforms: Honour new MF_SIMULATE_PLATFORMS env var
This is to be expanded by the shell, using eval, so that it can refer
to $xenarch, $suite and $blessing.
No functional change if this variable is unset, or empty. If it is
set to a single space, cri-getplatforms produces no output (as it does
anyway in standalone mode).
Ian Jackson [Thu, 1 Oct 2020 14:18:39 +0000 (15:18 +0100)]
ts-hosts-allocate-Executive: Allow to tolerate missing resources
Now, a job can specify that lack of a suitable host should be treated
as a plain test failure (ie, subject to the usual regression analysis)
rather than as an infrastructure or configuration problem.
This will be useful for some tests which don't work in some branches
because of lack of suitable hardware. We want to avoid encoding our
hardware availability situation in make-flight.
Ian Jackson [Thu, 1 Oct 2020 16:02:48 +0000 (17:02 +0100)]
sg-run-job: Preserve step state "fail" if set by test script
If the test script exits nonzero but after setting the step status to
'fail', we can leave it that way. This is particularly relevant if
the iffail in the job spec says 'broken' or something. After this
change, a step can decide to override that.
An alternative would be to have the step script exit zero, but of
course that would (generally) leave the job to continue running more
steps!
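The decision can be sketched as below. This is a simplified illustration in Python (sg-run-job itself is Tcl); the function name, the argument names and the handling of a zero exit are assumptions made for the example.

```python
def final_step_status(exit_code, status_set_by_script, iffail="fail"):
    """Pick the step status to record once the test script has exited."""
    if exit_code == 0:
        # Assumed for the sketch: a clean exit counts as a pass.
        return "pass"
    if status_set_by_script == "fail":
        # The script already declared a plain failure: preserve it,
        # even if the job spec's iffail says eg 'broken'.
        return "fail"
    return iffail
```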
Ian Jackson [Thu, 24 Sep 2020 16:14:25 +0000 (16:14 +0000)]
TftiDiVersion: Update to latest installer for stretch
The stretch (Debian oldstable) kernel has been updated, causing our
Xen 4.10 tests (which are still using stretch) to break. This update
seems to fix it.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <iwj@xenproject.org>
Ian Jackson [Wed, 19 Aug 2020 14:59:09 +0000 (15:59 +0100)]
schema: Provide index on flights by start time
We often use flight number as a proxy for ordering, but this is not
always appropriate and not always done (and sometimes it's a bit of a
bodge).
Provide an index to find flights by start time. This significantly
speeds up the host allocation $equivstatusq query, and the duration
estimator.
(I have tested this by creating a trial index in the production
database. That index can be dropped again, preferably after this
commit makes it to production.)
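A small self-contained sketch of the idea, using SQLite in place of the production PostgreSQL database. The table layout, the column name "started" and the index name are assumptions made for the example, not the actual schema change.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE flights ("
    " flight INTEGER PRIMARY KEY, branch TEXT, started INTEGER)"
)
db.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [(n, "xen-unstable", 1597000000 + n * 3600) for n in range(1, 101)],
)

# Hypothetical index name; lets us find flights by start time
# without relying on the flight number as a proxy for ordering.
db.execute("CREATE INDEX flights_by_started ON flights (started)")

plan = db.execute(
    "EXPLAIN QUERY PLAN"
    " SELECT flight FROM flights WHERE started >= ? ORDER BY started DESC",
    (1597000000 + 50 * 3600,),
).fetchall()
# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should mention the index, showing that the range query on start time no longer needs a full table scan.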
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 19 Aug 2020 12:00:58 +0000 (13:00 +0100)]
host allocation: Memoise duration estimates
We look at our own branch to estimate durations. If somehow we are
one of multiple concurrent flights on this branch with the appropriate
blessing, we don't mind not noticing the doings of our peer flights, so
it is fine if our estimates are a bit out of date.
So it is fine to use an estimate no older than our own runtime.
Right now we generate a new duration estimator during each queueing
round, because it contains a statement handle and we must disconnect
from the db while waiting. So the internal memo table gets thrown
away each time and is useless.
To actually memoise, pass our own hash which lives as long as we do.
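The pattern can be sketched as follows, in Python rather than Perl; all names and the sample job are hypothetical. The estimator is rebuilt every round, but the memo table is owned by the caller and so survives.

```python
def make_duration_estimator(query, memo=None):
    """Build an estimator; results are cached in the caller's `memo`."""
    if memo is None:
        memo = {}  # Private table: discarded along with the estimator.

    def estimate(job):
        if job not in memo:
            memo[job] = query(job)
        return memo[job]

    return estimate


calls = []


def slow_query(job):
    """Stand-in for the database query; records each invocation."""
    calls.append(job)
    return 1200


long_lived_memo = {}  # Lives as long as the allocating process does.
for _round in range(3):
    # A fresh estimator per queueing round, as after a db reconnect.
    estimator = make_duration_estimator(slow_query, long_lived_memo)
    estimator("test-amd64-amd64-xl")
```

With the shared table, the stand-in query runs once across the three rounds; without it, each round would query again.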
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 19 Aug 2020 11:13:23 +0000 (12:13 +0100)]
resource allocation: Provide OSSTEST_ALLOC_FAKE_PLAN test facility
Set this variable (to a data-plan.final.pl, say) and it becomes
possible to test host allocation programs without actually allocating
anything and without engaging with the queue system.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 19 Aug 2020 12:05:22 +0000 (13:05 +0100)]
ts-hosts-allocate-Executive: Fix broken call to $duration_estimator
The debug subref is passed to the constructor (and indeed we do that).
The final argument to the actual estimator is $uptoincl_testid (but we
didn't say $will_uptoincl_testid, so it is ignored).
The code was wrong, but with no effect. So no functional change.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 10 Aug 2020 15:57:44 +0000 (16:57 +0100)]
history reporting (nfc): Provide cache_set_task_print
This takes a string which gets added to the cache messages. This
will allow us to distinguish the output from different processes
when using parallel by fork.
Nothing sets this yet.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 10 Aug 2020 15:10:50 +0000 (16:10 +0100)]
sg-report-job-history (nfc): Query hosts runvars in one go
Rather than doing one query for each entry in @hostvarcols, do one
query for all the relevant runvars. This is quite a bit faster and
will enable us to use the cache.
This is correct because @hostvarcols was the union of all the host
runvars, so this produces the same answers as the individual queries.
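The equivalence can be sketched with SQLite as below. The runvars table layout, the sample rows and the function names are assumptions made for the example; the real code is Perl against PostgreSQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runvars (flight INTEGER, job TEXT, name TEXT, val TEXT)")
db.executemany(
    "INSERT INTO runvars VALUES (?, ?, ?, ?)",
    [
        (1, "j", "host", "alpha"),
        (1, "j", "src_host", "alpha"),
        (1, "j", "dst_host", "beta"),
        (1, "j", "arch", "amd64"),  # Not a host runvar: must be ignored.
    ],
)
hostvarcols = ["host", "src_host", "dst_host"]


def hosts_per_column(flight, job):
    """Old approach: one query for each entry in hostvarcols."""
    out = {}
    for name in hostvarcols:
        row = db.execute(
            "SELECT val FROM runvars WHERE flight=? AND job=? AND name=?",
            (flight, job, name),
        ).fetchone()
        if row is not None:
            out[name] = row[0]
    return out


def hosts_in_one_go(flight, job):
    """New approach: a single query covering all the host runvars."""
    marks = ",".join("?" * len(hostvarcols))
    rows = db.execute(
        "SELECT name, val FROM runvars"
        f" WHERE flight=? AND job=? AND name IN ({marks})",
        (flight, job, *hostvarcols),
    )
    return dict(rows)
```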
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 10 Aug 2020 14:43:33 +0000 (15:43 +0100)]
sg-report-job-history: Refactor "ALL" handling
* Make an explicit entry ALL in @branches, rather than implicitly
processing ALL as well.
* Consequently, put explicit ALL entries in @tasks too, rather than
putting in entries without a branch name.
* Pass ALL to processjobbranch rather than undef, and turn it into
the internally-used undef at the start.
When used with --flight (findflight), this has no functional change.
When used with --job, ALL must now be included in the branch
list passed to --branches. The only in-tree call is with --flight.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 4 Aug 2020 16:23:18 +0000 (17:23 +0100)]
sg-report-job-history: Use one child per report
Rather than one child per job, which then did one report per branch.
This will mean we can use the cache machinery, which is rather global
so wouldn't cope well with processing multiple job history reports
within a process.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 6 Aug 2020 11:57:31 +0000 (12:57 +0100)]
parallel by fork: Disconnect $dbh_tests as well as undefing it
If the caller is buggy and has statement handles still open, they can
still "work" even if we have thrown away the db handle.
Where, after forking, "work" means "use the same connection in
multiple processes simultaneously, without locking". This could
result in arbitrary crazy behaviour (eg, TLS crypto failures).
No functional change with existing callers since they don't have this
bug.
Ian Jackson [Tue, 4 Aug 2020 13:41:09 +0000 (14:41 +0100)]
history reporting: Skip undefined keys
This makes it work if the caller's cached hash contains a key which
is bound to undef.
sg-report-host-history already does this, which currently causes:
    Use of uninitialized value $_ in substitution (s///) at Osstest/HistoryReport.pm line 134.
    Use of uninitialized value $_ in printf at Osstest/HistoryReport.pm line 135.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>