xenbits.xensource.com Git - osstest.git/log
osstest.git
4 years agoflight other job reporting: Put nulls last in the report
Ian Jackson [Thu, 3 Sep 2020 15:33:14 +0000 (16:33 +0100)]
flight other job reporting: Put nulls last in the report

Cosmetic change only, but this makes the results easier to understand.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agosg-report-flight: Improvements to other job (share/reuse) reporting
Ian Jackson [Fri, 2 Oct 2020 15:19:29 +0000 (16:19 +0100)]
sg-report-flight: Improvements to other job (share/reuse) reporting

* Prefer to show "prep" (purple) rather than "share".
* Show our own relationship, in particular to show if it was prep.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agosg-report-flight: Reformat slightly
Ian Jackson [Fri, 2 Oct 2020 15:19:04 +0000 (16:19 +0100)]
sg-report-flight: Reformat slightly

This is more regular and will make the next commit easier to
understand.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agohost reuse: New protocol between sg-run-job and ts-host-reuse
Ian Jackson [Thu, 3 Sep 2020 10:58:30 +0000 (11:58 +0100)]
host reuse: New protocol between sg-run-job and ts-host-reuse

Abolish post-test-ok (which runs only if successful) and replace it
with final (which sets the runvar to indicate finality, and runs
regardless).

This allows a subsequent job which reuses the host to see that this
job had finished using the host.  This is relevant for builds, where a
host can be reused even after a failed job.

"Lies", where we claim the use of the host was done, are
avoided (barring unlikely races) because selecthost de-finalises the
runvar.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
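
The difference between a success-only hook and a "final" hook that runs regardless can be pictured as a try/finally. This is a minimal Python sketch, not the sg-run-job (Tcl) code; `run_job` and `finalise` are hypothetical names standing in for the job steps and for whatever marks the runvar as final.

```python
def run_job(steps, finalise):
    """Run each step in order; always call finalise(), even after a failure.

    Hypothetical illustration: a 'post-test-ok' style hook would sit after
    the loop and be skipped on failure, whereas 'final' behaves like the
    finally clause below.
    """
    try:
        for step in steps:
            step()
    finally:
        # Runs whether the steps succeeded or raised, so a later job
        # reusing the host can see that this job has finished with it.
        finalise()
```

With this shape a failed build still reaches the finalisation step, which is the case the commit message singles out.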
4 years agohost reuse: ts-host-reuse: Prepare for argument handling
Ian Jackson [Thu, 3 Sep 2020 10:57:29 +0000 (11:57 +0100)]
host reuse: ts-host-reuse: Prepare for argument handling

No functional change.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agohost reuse: sg-run-job: Rename post-test-ok parameter
Ian Jackson [Thu, 3 Sep 2020 10:47:55 +0000 (11:47 +0100)]
host reuse: sg-run-job: Rename post-test-ok parameter

This is more accurate.

No overall functional change.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoresource reporting: Report host reuse/sharing in job report
Ian Jackson [Fri, 28 Aug 2020 13:38:17 +0000 (14:38 +0100)]
resource reporting: Report host reuse/sharing in job report

Compatibility: in principle this might generate erroneous reports
which omit sharing/reuse information for allocations made by jobs
using older versions of osstest.

However, we do not share or reuse hosts across different osstest
versions, so this cannot occur.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoresource reporting, nfc: split a here document
Ian Jackson [Fri, 28 Aug 2020 13:07:57 +0000 (14:07 +0100)]
resource reporting, nfc: split a here document

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-report-flight: Refactor runvar access
Ian Jackson [Thu, 27 Aug 2020 18:11:37 +0000 (19:11 +0100)]
sg-report-flight: Refactor runvar access

Collect the runvars query into local perl variables.  This will allow
us to reuse the information without going back to the db.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost lifecycle: Record lifecycle in db and runvar
Ian Jackson [Tue, 25 Aug 2020 19:13:22 +0000 (20:13 +0100)]
host lifecycle: Record lifecycle in db and runvar

This is just the calls to host_update_lifecycle_info.
Now the db table is needed.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost lifecycle: Prevent referential integrity violation
Ian Jackson [Thu, 27 Aug 2020 17:48:36 +0000 (18:48 +0100)]
host lifecycle: Prevent referential integrity violation

We can't use normal constraints for either of these, sadly.

We can make the constraints into a single query which says "OK".

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost lifecycle: Fix detection of concurrent jobs
Ian Jackson [Wed, 7 Oct 2020 16:36:50 +0000 (17:36 +0100)]
host lifecycle: Fix detection of concurrent jobs

The previous algorithm was wrong here.

This commit was originally considerably later than the previous one.
I'm avoiding squashing this commit, to make future archaeology easier.
The effect of the bug is to report other tasks as live too often, so
hosts show up as shared rather than reused.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agohost lifecycle: Machinery, db, for tracking relevant events
Ian Jackson [Tue, 25 Aug 2020 17:34:42 +0000 (18:34 +0100)]
host lifecycle: Machinery, db, for tracking relevant events

When we reuse test hosts, we want to be able to give a list of the
other jobs which might be responsible for any problem.

In principle it would be possible to do this by digging into the
db's history tables like sg-report-host-history does, but this is
quite slow and also I don't have enough confidence in that approach to
use it for this application.

So instead, track the host lifecycle explicitly.

The approach taken is a hybrid one.  I first considered two and a half
approaches:

 1. Permanently record all host/share allocations and share state
    changes in a host history table.  But it is nontrivial to update
    all the allocation machinery to keep this table up to date.  It is
    also nontrivial to extract the necessary information from such a
    table: the allocation information would have to be correlated,
    using timestamps, with the steps table.  That's slow and complex.
    We had such a table but it was never used for these reasons;
    I dropped that empty table recently.

 1b. Like 1 but explicitly put a lifecycle sequence number in the
    allocations table.  This would make it easy to find relevant
    events but would involve even more complicated logic during
    allocation.

 2. Record the host's lifecycle information in a file on the host.
    This means it gets wiped whenever the host does and makes finding
    the relevant jobs easy: read the file during logs capture, and
    we'll find everything of relevance.  It then has to be permanently
    stored somewhere it can be used for logging and archaeology: a
    per-job runvar giving the relevant host history, up to the point
    where that job finished, does that job nicely.  However, this
    has a serious problem: if the host crashes hard, we may not be
    able to recover the complete information about why!  We really
    want the information recorded outside the host in question.

So I've taken a hybrid approach: effectively replicate the per-host
file from (2), but put the information in the database.  This
necessitates a call to clear the host lifecycle history, which we make
at the *end* of the host install.  As a bonus this might let us more
easily identify if there are particular jobs that leave hosts in
states that are hard to recover from, and it will make total host
failure quite obvious because the host install log report will have a
list of the failed attempts (longer in each successive job).

For build jobs we only record the setup job, and concurrent jobs, in
the runvar.  This does not seem to have been a problem so far, and
this avoids having to do work on other allocations (eg, mg-allocate).
It also avoids having very long lists of previous builds listed in
every build job.

Test jobs are only shared within a flight and with much more limited
scope so the same considerations don't arise.  But by the same token,
we also do not need to adjust mg-allocate etc., since the user ought
not to allocate shares of test hosts unless they know what they are
doing.

In this commit we introduce:
 * The database table
 * The runvar syntax
 * The function for recording the lifecycle events

We have what amounts to an ad-hoc compression scheme for the
information in the lifecycle runvars.  Otherwise this data might get
quite voluminous, which can make various other db queries slow.

There isn't a very good way to represent out-of-job tasks in the
lifecycle runvar.  We could maybe put in something from the tasks
table, but the entry in the tasks table might be gone by now and that
would involve quoting (and it might be quite large).

But this will only matter when a shared/reused host has been manually
messed with, and recording the task is sufficient to
 (1) note the fact of such interference
 (2) if the task is static, or still going when the job reports,
      can actually be put in the report.
 (3) failing that provide something which could be grepped for in logs

We do not call the recording function yet, so the db update is merely
preparatory.

There is a bug in this patch: the calculation of $olive is wrong.
This will be fixed in a moment.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
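
The hybrid approach described above (a per-host history kept in the database, appended to as jobs use the host and cleared at the end of host install) can be sketched as follows. This is an illustration under stated assumptions, not the osstest schema: the table name `host_lifecycle`, its columns, and the function names are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical lifecycle table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE host_lifecycle ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " hostname TEXT NOT NULL,"
        " job TEXT NOT NULL)"
    )
    return db

def record_use(db, hostname, job):
    """Append one lifecycle event: this job used this host."""
    db.execute(
        "INSERT INTO host_lifecycle (hostname, job) VALUES (?, ?)",
        (hostname, job),
    )

def clear_history(db, hostname):
    """Forget the host's history, as would happen after a fresh install."""
    db.execute("DELETE FROM host_lifecycle WHERE hostname = ?", (hostname,))

def history(db, hostname):
    """Return the jobs that have used the host since it was last cleared."""
    rows = db.execute(
        "SELECT job FROM host_lifecycle WHERE hostname = ? ORDER BY seq",
        (hostname,),
    )
    return [row[0] for row in rows]
```

Because the history lives outside the host, it survives a hard crash of the host itself, which is the weakness the message identifies in approach (2).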
4 years agohost reuse: Make share type hash more easily greppable
Ian Jackson [Fri, 21 Aug 2020 14:44:58 +0000 (15:44 +0100)]
host reuse: Make share type hash more easily greppable

Use - and _ to make up the base64 alphabet instead of + and /

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
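
To illustrate why the alternative alphabet helps, here is a minimal Python sketch of hashing a set of runvar values and encoding the digest with `-` and `_` instead of `+` and `/`. The choice of SHA-256, the `name=value` canonical form, and the function name `share_type_hash` are assumptions made for the example; the log does not say which hash or input format osstest uses.

```python
import base64
import hashlib

def share_type_hash(runvar_values):
    """Return a short digest of the runvars that is easy to grep for."""
    # Assumption: a stable "name=value" rendering, sorted by name, so that
    # the same runvars always produce the same digest.
    canonical = "\n".join(
        f"{name}={value}" for name, value in sorted(runvar_values.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # urlsafe_b64encode substitutes '-' for '+' and '_' for '/', so the
    # result contains no regex or shell metacharacters.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```

A digest containing `+` would need escaping in many regular expression dialects, and `/` is awkward in paths; the URL-safe alphabet avoids both.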
4 years agohost reuse: Hash the share type
Ian Jackson [Fri, 21 Aug 2020 10:25:06 +0000 (11:25 +0100)]
host reuse: Hash the share type

We don't really want to duplicate (triplicate, actually) lots of the
runvars.  This will make the runvars table needlessly bloated.

So hash the values.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotest host reuse: Switch to principled sharing scope runvar scheme
Ian Jackson [Fri, 21 Aug 2020 14:22:19 +0000 (15:22 +0100)]
test host reuse: Switch to principled sharing scope runvar scheme

* When selecthost is passed an @host ident, indicating prep work,
  engage restricted runvar access.  If no call to sharing_for_build
  was made, this means it can access only the runvars in
  the default value of @accessible_runvar_pats.

* Make the sharetype for host reuse be based on the values of
  precisely those same runvars, rather than using an adhoc scheme.

The set of covered runvars is bigger now as a result of testing...

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agorunvar access: Introduce effects_gone_before_share_reuse
Ian Jackson [Fri, 21 Aug 2020 11:36:10 +0000 (12:36 +0100)]
runvar access: Introduce effects_gone_before_share_reuse

The syslog server, and its port, is used for things that happen in
this job, but the syslog server is torn down and a new one started,
when the host is reused.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agorunvar access: Introduce sharing_for_build
Ian Jackson [Fri, 21 Aug 2020 11:47:44 +0000 (12:47 +0100)]
runvar access: Introduce sharing_for_build

Builds don't have so much contingent setup.  We don't track the
runvars; we just rely on the share-* hostflag set in the job.

But selecthost() is going to automatically enable runvar access
control for shared/reused hosts.  So, provide a way to disable that.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agorunvar access: Use runvar_glob for dmrestrict runvar search
Ian Jackson [Thu, 20 Aug 2020 16:39:58 +0000 (17:39 +0100)]
runvar access: Use runvar_glob for dmrestrict runvar search

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agorunvar access: Provide runvar_glob
Ian Jackson [Fri, 21 Aug 2020 11:47:02 +0000 (12:47 +0100)]
runvar access: Provide runvar_glob

We will need this because when runvar access is restricted, accessing
via %r directly won't work.  We want to see what patterns the code is
interested in (so that interest in a nonexistent runvar is properly
tracked).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
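
The idea of asking for runvars by pattern, so that interest is recorded even when nothing matches, can be sketched like this. It is a Python illustration only; the class `Runvars`, its `glob` method, and the `patterns_seen` list are hypothetical and do not reproduce the Perl `runvar_glob` interface.

```python
import fnmatch

class Runvars:
    """Hypothetical runvar store that remembers which patterns were queried."""

    def __init__(self, values):
        self._values = dict(values)
        self.patterns_seen = []

    def glob(self, *patterns):
        """Return the runvars matching any pattern, and record the patterns."""
        # The patterns are recorded before matching, so interest in a
        # runvar that does not exist is still tracked.
        self.patterns_seen.extend(patterns)
        return {
            name: value
            for name, value in sorted(self._values.items())
            if any(fnmatch.fnmatchcase(name, p) for p in patterns)
        }
```

Iterating over the underlying mapping directly would reveal only the runvars that exist, not the ones the caller was looking for.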
4 years agorunvar access: Introduce access control machinery
Ian Jackson [Fri, 21 Aug 2020 11:43:31 +0000 (12:43 +0100)]
runvar access: Introduce access control machinery

This will allow us to trap accesses, during test host setup, to
runvars which weren't included in the calculation of the sharing
scope.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoTestSupport: Provide runvar_is_synth
Ian Jackson [Thu, 20 Aug 2020 20:32:48 +0000 (21:32 +0100)]
TestSupport: Provide runvar_is_synth

Internally we use a hash, %r_notsynth.  This allows us to avoid
adding code to store_runvar etc.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosubst_netboot_template: Do not use all of %r
Ian Jackson [Thu, 20 Aug 2020 16:49:31 +0000 (17:49 +0100)]
subst_netboot_template: Do not use all of %r

Instead of copying all of %r into %v, have the template substitutor
fall back to %r from %v.

This is going to be important when we have host-reuse-related access
control to %r.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
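
Falling back from one mapping to another at lookup time, rather than copying everything up front, might look like the following Python sketch. The `%name%` placeholder syntax and the function name `substitute` are assumptions for the example; the real netboot template syntax is not shown in the log.

```python
import re
from collections import ChainMap

def substitute(template, local_vars, runvars):
    """Expand %name% placeholders, preferring local_vars over runvars."""
    # ChainMap consults local_vars first and runvars only on a miss, so
    # only the runvars the template actually names are ever read.
    lookup = ChainMap(local_vars, runvars)

    def replace(match):
        return str(lookup[match.group(1)])

    return re.sub(r"%(\w+)%", replace, template)
```

Because runvars are read one name at a time, an access-control layer can see exactly which ones the template depends on.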
4 years agohost reuse: Bump host share reuse bonus
Ian Jackson [Wed, 22 Nov 2017 11:39:39 +0000 (11:39 +0000)]
host reuse: Bump host share reuse bonus

In test jobs this is now contending with the variation bonus.

If we fail to vary properly this time, we get another go in the next
flight, so this is not so critical.

This increases the amount of test host reuse.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost reuse: Use literal for the hosts_infraprioritygroup runvar
Ian Jackson [Mon, 24 Aug 2020 11:03:11 +0000 (12:03 +0100)]
host reuse: Use literal for the hosts_infraprioritygroup runvar

At some point this might make the database smarter about indexing.
It's certainly clearer.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost reuse: Jiggle the infra-priority a bit, within a flight
Ian Jackson [Wed, 22 Nov 2017 11:38:05 +0000 (11:38 +0000)]
host reuse: Jiggle the infra-priority a bit, within a flight

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: Group jobs by their reuse parameters
Ian Jackson [Fri, 17 Nov 2017 16:49:42 +0000 (16:49 +0000)]
host allocation: Group jobs by their reuse parameters

This promotes reuse by arranging that jobs that can reuse a host get
to run consecutively.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost reuse: Reuse test hosts within a flight
Ian Jackson [Tue, 21 May 2019 16:06:24 +0000 (17:06 +0100)]
host reuse: Reuse test hosts within a flight

Mark the host shareable, and unshareable, as appropriate.

There is still a lot more cleanup and improvement to do.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoshared/reuse: Rely on @ for ts-host-ping-check
Ian Jackson [Fri, 21 Aug 2020 16:40:07 +0000 (17:40 +0100)]
shared/reuse: Rely on @ for ts-host-ping-check

Remove the check for SharedReady.

The existence of this check is perplexing.  It was introduced in
  ts-host-ping-check: Do not run if host is being reused
in 8f1dc3f7c401 (from 2015).

At that time we only shared build hosts, and build hosts never ran this
script.  So I don't understand what that was hoping to achieve.  Maybe
it made some difference in a now-lost pre-rebase situation.

Anyway, in our current tree I think we want to rerun the
ts-host-ping-check when we reuse a test host.  My change to add @ to
parts of per-host-prep in sg-run-job deliberately omitted the step
with testid host-ping-check-xen/@.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost reuse: sg-run-job: per-host prep: Use @ for per-host-ts
Ian Jackson [Thu, 20 Aug 2020 14:13:12 +0000 (15:13 +0100)]
host reuse: sg-run-job: per-host prep: Use @ for per-host-ts

These are the steps that will be skipped when we reuse a test host.

No functional change yet since we don't allocate the host shared yet.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoshared/reuse: Use @ for freebsd host prep
Ian Jackson [Tue, 21 May 2019 16:37:42 +0000 (17:37 +0100)]
shared/reuse: Use @ for freebsd host prep

These are all the relevant call sites for ts-freebsd-host-install and
ts-freebsd-build-prep.  (There's a ts-freebsd-host-install in
ts-memdisk-try-append but that's for host examination and does not
use or want sharing or reuse.)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoshared/reuse: Use @ for ts-host-install
Ian Jackson [Mon, 30 Oct 2017 18:09:41 +0000 (18:09 +0000)]
shared/reuse: Use @ for ts-host-install

Pass @ from sg-run-job.  These are all the call sites for
ts-host-install-*, so we can lose the open-coded test for SharedReady.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoshared/reuse: Use @ for ts-xen-build-prep
Ian Jackson [Wed, 22 May 2019 15:44:40 +0000 (16:44 +0100)]
shared/reuse: Use @ for ts-xen-build-prep

Pass @ from sg-run-job.  This is the only call site for
ts-xen-build-prep, so it can lose the open-coded test for SharedReady.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-run-job: Detect improper use of @ iffail with run-ts
Ian Jackson [Wed, 22 May 2019 16:42:16 +0000 (17:42 +0100)]
sg-run-job: Detect improper use of @ iffail with run-ts

Only per-host-ts understands this.  This is a bit of a bear trap, so
arrange to bail rather than putting strange step status values with
`@' at the front in the database...

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-run-job: New @ iffail tag for prep tasks
Ian Jackson [Wed, 22 May 2019 15:34:42 +0000 (16:34 +0100)]
sg-run-job: New @ iffail tag for prep tasks

Currently no call sites, so no functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: print sharing info in debug output
Ian Jackson [Mon, 6 Nov 2017 18:07:39 +0000 (18:07 +0000)]
ts-hosts-allocate-Executive: print sharing info in debug output

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: selecthost(): Support @IDENT for reuse
Ian Jackson [Tue, 21 May 2019 16:30:43 +0000 (17:30 +0100)]
host allocation: selecthost(): Support @IDENT for reuse

This is the first part of a central way to control host reuse, rather
than having to write code in each ts-* script to check Shared etc.

No functional change with existing callers.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-host-reuse: Add some missing runvars to the host sharing control
Ian Jackson [Mon, 20 Nov 2017 16:12:56 +0000 (16:12 +0000)]
ts-host-reuse: Add some missing runvars to the host sharing control

Add some missing runvars to the host sharing control.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-host-reuse: Do not depend on bios
Ian Jackson [Mon, 20 Nov 2017 16:07:32 +0000 (16:07 +0000)]
ts-host-reuse: Do not depend on bios

Weirdly, this is only used for guests.  Really, it should be a
target_var, not a raw runvar applying to all guests, since it can be
guest-specific.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agots-host-reuse: tolerate unremoveable lv
Ian Jackson [Fri, 17 Nov 2017 14:05:34 +0000 (14:05 +0000)]
ts-host-reuse: tolerate unremoveable lv

It might be a symlink in the pair tests.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agots-host-reuse: New script, to do reuse state changes
Ian Jackson [Tue, 21 May 2019 16:06:50 +0000 (17:06 +0100)]
ts-host-reuse: New script, to do reuse state changes

This will be made part of the test job recipes.

We calculate the sharing scope (sharetype) by reference to a lot of
runvars, etc.

This version of the script is rather far from the finished working
one, but it seems better to preserve the actual history for how it got
the way it is.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: Better message for hosts abandoned mid-test
Ian Jackson [Mon, 6 Nov 2017 17:23:34 +0000 (17:23 +0000)]
ts-hosts-allocate-Executive: Better message for hosts abandoned mid-test

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoresource reporting, nfc: Break out report_rogue_task_description
Ian Jackson [Fri, 28 Aug 2020 15:53:18 +0000 (16:53 +0100)]
resource reporting, nfc: Break out report_rogue_task_description

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoresource reporting: Print username when listing "rogue tasks"
Ian Jackson [Fri, 28 Aug 2020 15:45:53 +0000 (16:45 +0100)]
resource reporting: Print username when listing "rogue tasks"

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoplan_search: Track last sharing state to determine $share_reuse
Ian Jackson [Wed, 8 Nov 2017 16:29:07 +0000 (16:29 +0000)]
plan_search: Track last sharing state to determine $share_reuse

What matters for the purpose of $share_reuse is not whether the host
is actually being _shared_ (ie, there are other concurrent allocations
and therefore a concurrent Event with Share information).  What we
really want to know is whether the *last* use of this host was a
suitable sharing setup - because we actually want to know if we will
be able to skip our setup.

So track that explicitly.  (The slightly odd structure, where there
are two loops in one, means that we reset $last_eshare when we go onto
the next $req ie the next host to check.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoplan search: Move $share_compat_ok further up the file
Ian Jackson [Wed, 8 Nov 2017 16:43:34 +0000 (16:43 +0000)]
plan search: Move $share_compat_ok further up the file

We are going to want to use this outside the loop.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoplan_search: Use plan's Wear information rather than tracking it ourselves
Ian Jackson [Wed, 8 Nov 2017 16:39:37 +0000 (16:39 +0000)]
plan_search: Use plan's Wear information rather than tracking it ourselves

There is no reason not to use this information from the plan.
Not computing it ourselves saves some confusing logic here.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoplan_search: Improve debugging of $share_compat_ok->()
Ian Jackson [Wed, 8 Nov 2017 16:36:07 +0000 (16:36 +0000)]
plan_search: Improve debugging of $share_compat_ok->()

No change other than to debugging output.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoplan_search: Break out $share_compat_ok
Ian Jackson [Wed, 8 Nov 2017 16:16:29 +0000 (16:16 +0000)]
plan_search: Break out $share_compat_ok

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: *_shared_mark_ready: Only prod when $newstate is ready
Ian Jackson [Mon, 30 Oct 2017 17:25:43 +0000 (17:25 +0000)]
host allocation: *_shared_mark_ready: Only prod when $newstate is ready

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: Support new reuse-* magic hostflag
Ian Jackson [Mon, 30 Oct 2017 16:33:50 +0000 (16:33 +0000)]
host allocation: Support new reuse-* magic hostflag

This is like share-* except it has different MaxTasks and MaxWear
parameters.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: *_shared_mark_ready: allow alternative $oldtypes
Ian Jackson [Fri, 27 Oct 2017 17:23:41 +0000 (18:23 +0100)]
host allocation: *_shared_mark_ready: allow alternative $oldtypes

$oldtype may now be a hashref, where keys mapping to truthy values are
permitted for the sharetype precondition.

No functional change for existing callers.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
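
A precondition that accepts either a single expected value or a set of permitted values can be sketched as below. This is a Python analogue with a hypothetical function name; a Python dict stands in for the Perl hashref.

```python
def oldtype_permitted(current_sharetype, oldtype):
    """Check the sharetype precondition.

    oldtype is either a single expected sharetype (the existing callers'
    form) or a dict whose keys with truthy values are all acceptable.
    """
    if isinstance(oldtype, dict):
        # Missing keys and keys mapped to falsy values are both rejected.
        return bool(oldtype.get(current_sharetype))
    return current_sharetype == oldtype
```

Callers that pass a plain string get the same equality test as before, which is why existing callers see no functional change.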
4 years agohost allocation: selecthost: allow sort-of-selection of prospective hosts
Ian Jackson [Fri, 27 Oct 2017 16:52:49 +0000 (17:52 +0100)]
host allocation: selecthost: allow sort-of-selection of prospective hosts

If one passes a trueish value for $prospective, selecthost does not
worry about whether any host has actually been selected.  It does a
limited amount of prep work.

This will be useful if we want to know some of the non-host-specific
information selecthost computes - in particular, $ho->{Suite} etc.

No functional change with existing callers.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: *_shared_mark_ready: Make $sharetype check optional
Ian Jackson [Fri, 27 Oct 2017 14:42:39 +0000 (15:42 +0100)]
host allocation: *_shared_mark_ready: Make $sharetype check optional

We are going to want to be able to set shares to other than ready,
without double-checking the sharetype.

The change to the UPDATE statement makes no difference because
resource_check_allocated_core has just got that sharetype out of the
db.  (This does remove one safety check against bugs, sadly.)

No functional change for existing callers.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: *_shared_mark_ready: Allow other states
Ian Jackson [Fri, 27 Oct 2017 14:41:31 +0000 (15:41 +0100)]
host allocation: *_shared_mark_ready: Allow other states

Generalise these functions so they can set the state to something
other than `ready', and so that they can expect a state other than
`prep'.

No functional change with existing callers.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agodb_retry: Make the sleeps random and increasing
Ian Jackson [Tue, 21 Nov 2017 17:18:09 +0000 (17:18 +0000)]
db_retry: Make the sleeps random and increasing

When there's a thundering herd, this can run out of retries.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
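
Random, increasing sleeps between retries keep a thundering herd from colliding again on every attempt. The following Python sketch shows the general technique; the function name, the parameters, the use of `RuntimeError` as the retryable failure, and the particular delay formula are all assumptions, not the Perl `db_retry` code.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds, sleeping longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # out of retries: let the caller see the failure
            # The delay grows with the attempt number; the random factor
            # (between 0.5 and 1.5) spreads competing retriers apart.
            sleep(base_delay * attempt * (0.5 + rng()))
```

The `sleep` and `rng` parameters exist only so the behaviour can be exercised deterministically, for example by passing a list's `append` method as `sleep`.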
4 years agosg-run-job: Use +! in per-host-ts implementation
Ian Jackson [Wed, 22 May 2019 15:34:11 +0000 (16:34 +0100)]
sg-run-job: Use +! in per-host-ts implementation

This makes this slightly clearer, even more so in a moment.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-run-job: support +! for *only* adding things to TESTID
Ian Jackson [Tue, 21 May 2019 15:43:51 +0000 (16:43 +0100)]
sg-run-job: support +! for *only* adding things to TESTID

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: Fix handling of failed preps for same sharing
Ian Jackson [Fri, 3 Nov 2017 17:40:42 +0000 (17:40 +0000)]
ts-hosts-allocate-Executive: Fix handling of failed preps for same sharing

This code was previously unreachable.  It ought to be executed when
all the shares are allocatable or prep: in that case, we can unshare
and re-share the host.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: Executive: Honour $xparams{InfraPriority}
Ian Jackson [Fri, 17 Nov 2017 15:33:02 +0000 (15:33 +0000)]
host allocation: Executive: Honour $xparams{InfraPriority}

And pass it to ms-queuedaemon.  No functional change with existing
callers since no-one sets this yet.

Forthcoming test host sharing machinery uses this.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost allocation: Remove some unnecessary definedness tests
Ian Jackson [Fri, 17 Nov 2017 15:31:41 +0000 (15:31 +0000)]
host allocation: Remove some unnecessary definedness tests

$set_info->() already checks for undef, and returns immediately in
that case.  So there is no point checking at the call site.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoDebian: osstest-erase-other-disks: Slightly guard against races
Ian Jackson [Mon, 24 Aug 2020 17:54:18 +0000 (18:54 +0100)]
Debian: osstest-erase-other-disks: Slightly guard against races

Apparently it can happen that something decides to rescan a partition
table, removing a partition block device, while it is being zeroed:

 osstest-erase-other-disks-6081: hd devices present after: /dev/hd*
 osstest-erase-other-disks-6081: Erasing /dev/sda
 osstest-erase-other-disks-6081: Erasing /dev/sda1
 osstest-erase-other-disks-6081: /dev/sda1 is no longer a block device!

To try to narrow the window during which this race occurs, do not care
if the thing we just zeroed no longer exists after we zeroed it.

We still bomb out if it exists but is not a block device - that would
probably mean we had written it out as a file.

This is all quite unfortunate.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
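
The post-erase check described above distinguishes three outcomes: the device node has vanished (tolerated), it is still a block device (fine), or something else now sits at that path (fatal). A Python sketch of that decision follows; the function name and return values are hypothetical, and this is not the script's own code.

```python
import os
import stat

def check_after_erase(path):
    """Classify the path after zeroing it.

    Returns "gone" if the path no longer exists, "block" if it is still a
    block device, and raises if it exists but is something else.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # A partition rescan may have removed the node; do not care.
        return "gone"
    if stat.S_ISBLK(mode):
        return "block"
    # Existing but not a block device probably means the zeroing wrote
    # out a regular file instead of erasing a disk.
    raise RuntimeError(f"{path} is no longer a block device!")
```

This narrows the race window rather than closing it: the node can still disappear between the erase and any earlier check.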
4 years agoabolish "kernkind"; desupport non-pvops kernels
Ian Jackson [Tue, 25 Aug 2020 11:02:13 +0000 (12:02 +0100)]
abolish "kernkind"; desupport non-pvops kernels

This was for distinguishing the old-style Xenolinux kernels from pvops
kernels.

We have not actually tested any non-pvops kernels for a very very long
time.  Delete this now because the runvar is slightly in the way of
test host reuse.

(Sorry for the wide CC but it seems better to make sure anyone who
might object can do so.)

All this machinery exists just to configure the guest console
device (Xenolinux used "xvc" rather than "hvc") and the guest root
block device (Xenolinux stole "hda"/"sda" rather than using "xvda").

Specifically, in this commit:
 * In what is now target_setup_rootdev_console_inittab, do not
   look at any kernkind runvar and simply do what we would if
   it were "pvops" or unset, as it is in all current jobs.
 * Remove the runvar from all jobs creation and example runes.
   (This has no functional change even for jobs running with
   the previous osstest code because we have defaulted to "pvops"
   for a very long time.)

We retain the setting of the shell variable "kernbuild", because that
ends up in build jobs' names.  All our kernel build jobs now end in
-pvops and I intend to retain that name component since abolishing it
is nontrivial.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Juergen Gross <jgross@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Paul Durrant <paul@xen.org>
CC: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
CC: Andrew Cooper <Andrew.Cooper3@citrix.com>
CC: Olivier Lambert <olivier.lambert@vates.fr>
4 years agotarget setup refactoring: Add a doc comment
Ian Jackson [Tue, 25 Aug 2020 11:08:42 +0000 (12:08 +0100)]
target setup refactoring: Add a doc comment

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotarget setup refactoring: Merge target_kernkind_*
Ian Jackson [Tue, 25 Aug 2020 11:00:47 +0000 (12:00 +0100)]
target setup refactoring: Merge target_kernkind_*

Combine these two functions.  Rename them to a name which doesn't
mention "kernkind".

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotarget setup refactoring: Move target_kernkind_console_inittab
Ian Jackson [Tue, 25 Aug 2020 10:51:27 +0000 (11:51 +0100)]
target setup refactoring: Move target_kernkind_console_inittab

We move this earlier.  This is OK because it depends only on the
console runvar (inside the sub; this is set by target_kernkind_check),
$ho and $gho (which are set by this point); and $mountpoint (which is
set by access()).

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agotarget setup refactoring: Move target_kernkind_check
Ian Jackson [Tue, 25 Aug 2020 10:49:08 +0000 (11:49 +0100)]
target setup refactoring: Move target_kernkind_check

This is OK because nothing in access() looks at the rootdev or console
runvars, which are what target_kernkind_check sets.

No functional change other than perhaps to log output.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agocr-publish-flight-logs: Fix abs_time calls
Ian Jackson [Mon, 24 Aug 2020 11:00:16 +0000 (12:00 +0100)]
cr-publish-flight-logs: Fix abs_time calls

There was a missing space in these messages, since they were
introduced in 31b7cae19fe1
  timing traces: cr-publish-flight-logs: Report more progress

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost reuse: ms-planner: Abbreviate reporting of test shares
Ian Jackson [Fri, 4 Sep 2020 20:58:51 +0000 (21:58 +0100)]
host reuse: ms-planner: Abbreviate reporting of test shares

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost reuse: ms-planner: Do not show reuse as shared in the plan
Ian Jackson [Mon, 30 Oct 2017 16:52:24 +0000 (16:52 +0000)]
host reuse: ms-planner: Do not show reuse as shared in the plan

If the number of shares is 1, do not show it as shared, and also
ignore the Unshare events.

This clarifies the display, especially when used with forthcoming test
host reuse work.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agohost reuse: ms-planner: Bring some variables forward
Ian Jackson [Mon, 30 Oct 2017 16:50:20 +0000 (16:50 +0000)]
host reuse: ms-planner: Bring some variables forward

Move the scope of $share earlier in cmd_show_html, and also introduce
$shared in the colour computation.  This makes the next changes easier.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: Add a comment about a warning
Ian Jackson [Fri, 3 Nov 2017 17:40:30 +0000 (17:40 +0000)]
ts-hosts-allocate-Executive: Add a comment about a warning

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoshow_abs_time: Represent undef $timet as <undef>
Ian Jackson [Mon, 30 Oct 2017 11:36:16 +0000 (11:36 +0000)]
show_abs_time: Represent undef $timet as <undef>

This can happen, for example, if a badly broken flight has steps which
are STARTING and have NULL in the start time column, and is then
reported using sg-report-flight.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agosg-run-job: Improve some internal API docs
Ian Jackson [Fri, 2 Oct 2020 15:00:28 +0000 (16:00 +0100)]
sg-run-job: Improve some internal API docs

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agosg-run-job: Minor whitespace (formatting) changes
Ian Jackson [Tue, 21 May 2019 16:35:23 +0000 (17:35 +0100)]
sg-run-job: Minor whitespace (formatting) changes

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoREADME.planner: Document magic job hostflags
Ian Jackson [Mon, 30 Oct 2017 16:32:27 +0000 (16:32 +0000)]
README.planner: Document magic job hostflags

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoExecutive.pm planner: fix typo
Ian Jackson [Mon, 6 Nov 2017 18:07:24 +0000 (18:07 +0000)]
Executive.pm planner: fix typo

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
4 years agoms-queuedaemon: Update for newer Tcl's socket channel ids
Ian Jackson [Fri, 21 Aug 2020 10:37:51 +0000 (11:37 +0100)]
ms-queuedaemon: Update for newer Tcl's socket channel ids

Now we have things like "sock55599edaf050" where previously we had
something like "sock142".  So the output is misaligned.

Bump the sizes.  And with these longer names, when showing the front
of the queue only print the full first entry and the start of the next
one.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoTolerate lack of platform-specific hosts in old Xen branches
Ian Jackson [Thu, 1 Oct 2020 14:17:44 +0000 (15:17 +0100)]
Tolerate lack of platform-specific hosts in old Xen branches

Right now we have a situation where these can't all be made to work
because some older Xen branches are hard to make work on
current Debian stable, and we have some hardware (which we have tagged
as specific "platforms") which doesn't work with oldstable.

This seems like a general problem, so fix it this way.

Note that we still treat these failed allocations as failures, so they
are subject to regression analysis and ought not to appear willy-nilly
on existing branches.

Runvar dump shows the addition of this runvar
   hostalloc_missing_expected=1
to
   qemu-upstream-4.6-testing
   xen-4.6-testing
   ...
   qemu-upstream-4.14-testing
   xen-4.14-testing
inclusive.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agostandalone-generate-dump-flight-runvars: Simulate cri-getplatforms
Ian Jackson [Thu, 1 Oct 2020 15:36:29 +0000 (16:36 +0100)]
standalone-generate-dump-flight-runvars: Simulate cri-getplatforms

Set MF_SIMULATE_PLATFORMS to a suitable value if it is
not *set*.  (Distinguishing unset from set to empty.)

I have verified that this, plus the preceding commits to
cri-getplatforms, produces no change in the output of
  MF_SIMULATE_PLATFORMS='' OSSTEST_CONFIG=standalone-config-example eatmydata ./standalone-generate-dump-flight-runvars

Without the MF_SIMULATE_PLATFORMS setting it adds several new jobs to
each flight, named things like this:
  test-amd64-$arch1-xl-simplat-$arch2-$suite

The purpose of this right now is to provide a way to dry-run test the
next change.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
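The unset-versus-empty distinction described above can be sketched in POSIX shell as follows.  This is an illustration only, not the script's actual code: the function name apply_default and the default value are hypothetical.

```sh
#!/bin/sh
# Hypothetical default; the value the real script uses is not shown in this log.
default_simulated_platforms='simplat-example'

apply_default () {
    # ${VAR+x} expands to "x" when VAR is set (even to the empty string)
    # and to nothing when VAR is unset, so this test is true only if unset.
    if [ -z "${MF_SIMULATE_PLATFORMS+x}" ]; then
        MF_SIMULATE_PLATFORMS=$default_simulated_platforms
    fi
}
```

With this shape, MF_SIMULATE_PLATFORMS='' is left alone, while a completely unset variable receives the default.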
4 years agocri-getplatforms: Honour new MF_SIMULATE_PLATFORMS env var
Ian Jackson [Thu, 1 Oct 2020 15:36:17 +0000 (16:36 +0100)]
cri-getplatforms: Honour new MF_SIMULATE_PLATFORMS env var

This is to be expanded by the shell, using eval, so that it can refer
to $xenarch, $suite and $blessing.

No functional change if this variable is unset, or empty.  If it is
set to a single space, cri-getplatforms produces no output (as it does
anyway in standalone mode).

Signed-off-by: Ian Jackson <iwj@xenproject.org>
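A minimal sketch of the behaviour described above, assuming the variable holds a whitespace-separated list of platform names.  The functions get_platforms and real_platforms are hypothetical stand-ins and are not taken from cri-getplatforms.

```sh
#!/bin/sh
# Hypothetical stand-in for the normal (non-simulated) platform lookup.
real_platforms () {
    echo real-platform
}

get_platforms () {
    # Unset or empty: behave exactly as before.
    if [ -z "${MF_SIMULATE_PLATFORMS-}" ]; then
        real_platforms
        return
    fi
    # eval lets the value refer to $xenarch, $suite and $blessing.
    # A value of a single space yields no words, hence no output.
    eval "set -- $MF_SIMULATE_PLATFORMS"
    for p in "$@"; do
        printf '%s\n' "$p"
    done
}
```

For example, with xenarch=amd64 and suite=buster, setting MF_SIMULATE_PLATFORMS='simplat-$xenarch-$suite' makes this sketch print simplat-amd64-buster.  Because the value goes through eval, it must come from a trusted source.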
4 years agocri-getplatforms: Give names to xenarch and suite
Ian Jackson [Thu, 1 Oct 2020 15:35:56 +0000 (16:35 +0100)]
cri-getplatforms: Give names to xenarch and suite

No functional change.  This will be useful in a moment.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agots-hosts-allocate-Executive: Allow to tolerate missing resources
Ian Jackson [Thu, 1 Oct 2020 14:18:39 +0000 (15:18 +0100)]
ts-hosts-allocate-Executive: Allow to tolerate missing resources

Now, a job can specify that lack of a suitable host should be treated
as a plain test failure (ie, subject to the usual regression analysis)
rather than as an infrastructure or configuration problem.

This will be useful for some tests which don't work in some branches
because of lack of suitable hardware.  We want to avoid encoding our
hardware availability situation in make-flight.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agosg-run-job: Preserve step state "fail" if set by test script
Ian Jackson [Thu, 1 Oct 2020 16:02:48 +0000 (17:02 +0100)]
sg-run-job: Preserve step state "fail" if set by test script

If the test script exits nonzero but after setting the step status to
'fail', we can leave it that way.  This is particularly relevant if
the iffail in the job spec says 'broken' or something.  After this
change, a step can decide to override that.

An alternative would be to have the step script exit zero, but of
course that would (generally) leave the job to continue running more
steps!

Signed-off-by: Ian Jackson <iwj@xenproject.org>
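The decision described above can be sketched as a small shell function.  This is not sg-run-job's actual code: the function name, the argument order, and the assumption that a zero exit maps to "pass" are all illustrative.

```sh
#!/bin/sh
# resolve_step_status EXIT_CODE STATUS_SET_BY_SCRIPT IFFAIL
# Prints the status the step should end up with.
resolve_step_status () {
    if [ "$1" -eq 0 ]; then
        # Assumption: a zero exit means the step passed.
        echo pass
    elif [ "$2" = fail ]; then
        # The script already recorded 'fail': keep it, overriding iffail.
        echo fail
    else
        # Otherwise fall back to the job spec's iffail (e.g. 'broken').
        echo "$3"
    fi
}
```

So a script that exits nonzero after setting 'fail' yields fail even when iffail is 'broken', while a script that exits nonzero without doing so still yields broken.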
4 years agostandalone: Use mkdir -p
Ian Jackson [Thu, 1 Oct 2020 14:18:33 +0000 (15:18 +0100)]
standalone: Use mkdir -p

These two mkdir calls could fail if
standalone-generate-dump-flight-runvars is run without a log
directory, because they were not concurrency-correct.

mkdir -p should fix that.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
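To illustrate why mkdir -p helps here, the sketch below contrasts it with a check-then-create pattern.  That the old code resembled the racy pattern is an assumption, and ensure_dir is a hypothetical helper name.

```sh
#!/bin/sh
# Racy pattern (assumed, for illustration):
#   [ -d "$dir" ] || mkdir "$dir"
# Two processes can both see the directory missing; the slower mkdir
# then fails because the directory now exists.

ensure_dir () {
    # mkdir -p creates missing parents and succeeds if the directory
    # already exists, so concurrent callers do not trip over each other.
    mkdir -p -- "$1"
}
```

Calling ensure_dir on the same path from several processes at once, or repeatedly from one, leaves the directory in place without an error.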
4 years agoExecutive: Fix an undef warning message
Ian Jackson [Thu, 1 Oct 2020 14:08:29 +0000 (15:08 +0100)]
Executive: Fix an undef warning message

$onhost can be undef too

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoUpdate TftpDiVersion_buster
Ian Jackson [Mon, 28 Sep 2020 12:05:52 +0000 (13:05 +0100)]
Update TftpDiVersion_buster

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoTftpDiVersion: Update to latest installer for stretch
Ian Jackson [Thu, 24 Sep 2020 16:14:25 +0000 (16:14 +0000)]
TftpDiVersion: Update to latest installer for stretch

The stretch (Debian oldstable) kernel has been updated, causing our
Xen 4.10 tests (which are still using stretch) to break.  This update
seems to fix it.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoTCP fix: Do not wait for ownerdaemon to speak
Ian Jackson [Mon, 28 Sep 2020 11:43:30 +0000 (12:43 +0100)]
TCP fix: Do not wait for ownerdaemon to speak

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoTCP fix: Do not wait for queuedaemon to speak
Ian Jackson [Mon, 28 Sep 2020 11:41:13 +0000 (12:41 +0100)]
TCP fix: Do not wait for queuedaemon to speak

This depends on the preceding daemonlib patch and an ms-queuedaemon
restart.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agodaemonlib: Provide a "noop" command
Ian Jackson [Mon, 28 Sep 2020 11:37:33 +0000 (12:37 +0100)]
daemonlib: Provide a "noop" command

We are going to want clients to speak before waiting for the server
banner.  A noop command is useful for that.

Putting this here makes it apply to both ownerdaemon and queuedaemon.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoschema: Provide index on flights by start time
Ian Jackson [Wed, 19 Aug 2020 14:59:09 +0000 (15:59 +0100)]
schema: Provide index on flights by start time

We often use flight number as a proxy for ordering, but this is not
always appropriate and not always done (and sometimes it's a bit of a
bodge).

Provide an index to find flights by start time.  This significantly
speeds up the host allocation $equivstatusq query, and the duration
estimator.

(I have tested this by creating a trial index in the production
database.  That index can be dropped again, preferably after this
commit makes it to production.)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: Do a pre-check
Ian Jackson [Wed, 19 Aug 2020 11:52:04 +0000 (12:52 +0100)]
ts-hosts-allocate-Executive: Do a pre-check

Call attempt_allocation with an empty plan and $mayalloc=0.

In the usual case this will arrange to prime our memoisation caches
before we get involved with the queueing system.

It will also arrange for various errors to be reported sooner.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Improved error and result handling

4 years agohost allocation: Memoise $equivstatus query results
Ian Jackson [Wed, 19 Aug 2020 12:09:49 +0000 (13:09 +0100)]
host allocation: Memoise $equivstatus query results

This provides a very significant speedup.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agohost allocation: Memoise duration estimates
Ian Jackson [Wed, 19 Aug 2020 12:00:58 +0000 (13:00 +0100)]
host allocation: Memoise duration estimates

We look at our own branch to estimate durations.  If somehow we are
one of multiple concurrent flights on this branch with the appropriate
blessing, we don't mind not noticing the doings of our peer flights,
even if that means our estimates are a bit out of date.

So it is fine to use an estimate no older than our own runtime.

Right now we generate a new duration estimator during each queueing
round, because it contains a statement handle and we must disconnect
from the db while waiting.  So the internal memo table gets thrown
away each time and is useless.

To actually memoise, pass our own hash which lives as long as we do.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
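The idea of a caller-owned memo table that outlives each estimator can be sketched in shell using a cache directory in place of a Perl hash.  Everything here is hypothetical: slow_estimate stands in for the database query, the returned numbers are made up, and keys are assumed to be safe as file names.

```sh
#!/bin/sh
# Hypothetical expensive lookup.  Each real computation appends a line
# to $calls_log so that cache hits can be told apart from misses.
slow_estimate () {
    echo x >>"$calls_log"
    echo $(( ${#1} * 100 ))
}

# memo_estimate CACHE_DIR KEY
# The cache directory is owned by the caller, so it survives even if
# the estimator itself is rebuilt on every queueing round.
memo_estimate () {
    cache_file=$1/$2
    if [ ! -f "$cache_file" ]; then
        slow_estimate "$2" >"$cache_file"
    fi
    cat "$cache_file"
}
```

The caller creates the cache directory once (for example with mktemp -d) and passes the same directory on every round; only the first request for a given key reaches slow_estimate.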
4 years agoduration estimates: Memoise results
Ian Jackson [Wed, 19 Aug 2020 11:55:20 +0000 (12:55 +0100)]
duration estimates: Memoise results

The caller may provide a memoisation hash.  If they don't, we embed
one in the estimator.

The estimator contains a db statement handle so shouldn't be so
long-lived that this gives significantly wrong answers.

I am aiming this work at ts-hosts-allocate-Executive, but it is
possible that this might speed up sg-report-flight.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoresource allocation: Provide OSSTEST_ALLOC_FAKE_PLAN test facility
Ian Jackson [Wed, 19 Aug 2020 11:13:23 +0000 (12:13 +0100)]
resource allocation: Provide OSSTEST_ALLOC_FAKE_PLAN test facility

Set this variable (to a data-plan.final.pl, say) and it becomes
possible to test host allocation programs without actually allocating
anything and without engaging with the queue system.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agots-hosts-allocate-Executive: Fix broken call to $duration_estimator
Ian Jackson [Wed, 19 Aug 2020 12:05:22 +0000 (13:05 +0100)]
ts-hosts-allocate-Executive: Fix broken call to $duration_estimator

The debug subref is passed to the constructor (and indeed we do that).
The final argument to the actual estimator is $uptoincl_testid (but we
didn't say $will_uptoincl_testid, so it is ignored).

The code was wrong, but with no effect.  So no functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-report-job-history: Increase default limit
Ian Jackson [Mon, 10 Aug 2020 16:36:35 +0000 (17:36 +0100)]
sg-report-job-history: Increase default limit

Now this is a *lot* faster, we can print a lot more history.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agosg-report-job-history: Provide --time-limit
Ian Jackson [Mon, 10 Aug 2020 16:32:58 +0000 (17:32 +0100)]
sg-report-job-history: Provide --time-limit

Calculate a minflight based on the time limit, and set the time limit
to a year ago by default.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>