Wei Liu [Tue, 26 Jul 2016 11:16:29 +0000 (12:16 +0100)]
Introduce ts-xtf-run
This is the main script for running XTF. It first performs a
selftest, and then runs each XTF test case as a substep.
It does the following things:
1. Run self tests for each individual environment and record the result.
2. Collect tests according to available environments.
3. Run the collected tests one by one.
The script may exit early if it detects that the test host is down or
that xtf-runner has returned an unrecognised exit code.
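The three phases above can be sketched roughly as follows. This is an illustrative Python sketch, not the ts-xtf-run code itself: the function names, the `run_test` callback, the environment names used in examples and the exit-code table are all assumptions made for the illustration.

```python
# Illustrative sketch only. The exit-code table below is a hypothetical
# mapping chosen for the example, not taken from xtf-runner.
KNOWN_EXIT_CODES = {0: "pass", 3: "skip", 4: "fail", 5: "fail"}


class UnrecognisedExit(Exception):
    """The runner returned an exit code we cannot interpret."""


def classify(code):
    try:
        return KNOWN_EXIT_CODES[code]
    except KeyError:
        raise UnrecognisedExit(code)


def run_all(environments, tests_by_env, run_test):
    """run_test(name) -> integer exit code; supplied by the caller."""
    results = {}

    # 1. Self-test each environment and record the result.
    usable = []
    for env in environments:
        name = "selftest/" + env
        results[name] = classify(run_test(name))
        if results[name] == "pass":
            usable.append(env)

    # 2. Collect tests only for environments whose self-test passed.
    collected = [t for env in usable for t in tests_by_env.get(env, [])]

    # 3. Run the collected tests one by one. An UnrecognisedExit
    #    propagates to the caller, which models the early exit.
    for test in collected:
        results[test] = classify(run_test(test))
    return results
```

In this sketch an unrecognised exit code aborts the whole run instead of being recorded as a failure, which mirrors the early-exit behaviour described above.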
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 21 Jul 2016 14:37:48 +0000 (15:37 +0100)]
ts-xen-build: always compile in FEP support
By default FEP depends on the debug flag. When we are near release the debug
flag will be turned off. In order to test a release build, we explicitly
enable FEP in the build configuration.
Since we target Xen versions that already have Kconfig support, only a
Kconfig option is created for now.
We can easily add a config option for older Xen when necessary.
Note that this only compiles in FEP support. To enable it a user needs
to explicitly specify fep=1 in hypervisor command line.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jul 2016 18:35:12 +0000 (19:35 +0100)]
Executive: Support substeps
ts-* scripts can now create `substeps'. For the purposes of
archaeology etc., a substep is just like a step. But it does not
correspond to a single specific ts-* invocation.
Instead, it is started and finished explicitly as required.
The whole job implementation code needs to explicitly assign a unique
stable testid to each substep.
The `script' parameter is stored in the `step' field in the database,
which is used only for reporting. These do not need to be unique.
All substeps started should also be finished, by the end of the
job. If this is not done, the job will be regarded as broken (if it
is not already failed or aborted). (But a substep might be finished
by a different ts-* script to the one that started it.)
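The bookkeeping rules above can be illustrated with a small Python sketch. It is not the Executive implementation: the class, its method names and the status strings are assumptions chosen for the example.

```python
# Illustrative sketch only; SubstepTracker and its methods are hypothetical.
class SubstepTracker:
    def __init__(self):
        self._steps = {}      # testid -> {"script": ..., "status": ...}
        self._open = set()    # testids started but not yet finished

    def start(self, testid, script):
        # testids must be unique within the job; `script' is stored
        # only for reporting and need not be unique.
        if testid in self._steps:
            raise ValueError("testid %r already used in this job" % testid)
        self._steps[testid] = {"script": script, "status": "running"}
        self._open.add(testid)

    def finish(self, testid, status):
        # May be called by a different script from the one that started it.
        self._open.remove(testid)
        self._steps[testid]["status"] = status

    def job_status(self, status_so_far):
        # Unfinished substeps make the job broken, unless it has
        # already failed or been aborted.
        if self._open and status_so_far not in ("fail", "aborted"):
            return "broken"
        return status_so_far
```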
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Wed, 6 Jul 2016 14:22:21 +0000 (15:22 +0100)]
ts-hosts-allocate-Executive: Support diverse-CLASS hostflag
Specifically:
* Parse it out of the hostflags when constructing the hid
* Look for the `hostalloc-diverse-FLIGHT-CLASS' ClientNote in
the resource plan, to avoid inappropriately planning to reuse hosts.
* Look for the `diversehosts_CLASS' runvar in other jobs in this flight,
to find out who might have allocated with the same CLASS. (This
sort of duplicates information in *hostflags and *host, but digging
the information out of the latter two would be very tiresome.)
* Check each of the above for each candidate host.
* Set the ClientNote when we are preparing a booking.
* Set the runvar when we do the allocation.
* Document the ClientNote.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 8 Jul 2016 19:02:39 +0000 (20:02 +0100)]
step status skip: Implement in sg-report-flight
* When we are doing archaeology, searching for flight(s) which ran a
particular testid, ignore all flights where the testid was skipped.
* In a flight we are examining for failures we need to justify, do not
regard `skip' as a failure which requires investigation. We
thus treat `skip' in such a flight very like `pass'.
* Assign a colour (dark grey, almost like the background) and display
priority (very low) to `skip', so that they turn up nicely in the
HTML grids.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 8 Jul 2016 18:57:53 +0000 (19:57 +0100)]
step status skip: Ignore in report_run_getinfo
report_run_getinfo is trying to generate some HTML to describe a job's
(current) status. It sometimes looks at the steps to find
`interesting' information to report.
Completely ignore steps with status `skip' for this purpose, just like
we ignore ones with status `pass'.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 8 Jul 2016 18:56:05 +0000 (19:56 +0100)]
step status skip: Ignore in cs-bisection-step
cs-bisection-step wants to completely ignore all skipped steps. So we
adjust the one query which doesn't already insist on particular status
values, to filter out `skip'.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 8 Jul 2016 18:30:58 +0000 (19:30 +0100)]
Executive: Previous duration estimator: use overall time, not sum of steps
Some jobs run steps in parallel. Do not add up all the individual
step durations. Instead, calculate the duration as the time between
first step start and last step finish.
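A minimal Python sketch of the difference, assuming each step is represented as a (started, finished) pair of timestamps in seconds (a representation invented for this example):

```python
# Illustrative sketch only; the (started, finished) tuples are an assumption.
def job_duration(steps):
    """Wall-clock duration: last step finish minus first step start."""
    steps = list(steps)
    if not steps:
        return 0
    first_start = min(started for started, _ in steps)
    last_finish = max(finished for _, finished in steps)
    return last_finish - first_start


def summed_duration(steps):
    """The old approach: overestimates when steps overlap."""
    return sum(finished - started for started, finished in steps)
```

With two steps running in parallel over the same 100 seconds, the summed figure is 200 while the wall-clock figure is 100.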
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 1 Jul 2016 15:46:08 +0000 (16:46 +0100)]
rumprun: `rumpbake' our executables and run them with `rumprun'
(Well, our one executable: xenstore-ls)
Modern rumprun requires the output of the linker to be `baked' (second
link phase, where the complete unikernel is assembled).
This has to be done as part of the build, because it needs all the
rumpkernel libraries. It generates a single image file - there is no
longer any disk image or config file produced by the rump ecosystem.
The baked file needs to be provided in a dist. We have
ts-rumprun-bake take command line arguments specifying which things to
bake. It reads the runvars for the source executables and creates a
single dist output containing the images. There are now `executables'
and `images'.
Furthermore modern rumprun requires the image to be run with
`rumprun'. One underlying reason is that it wants to pass the command
line and some other config parameters to the guest via xenstore, in
/local/domain/GUEST/rumprun/cfg. To do this outside xl requires the
domain to be created paused. Another is to abstract away details of
the actual execution environment (compared to other unikernel
execution models).
rumprun has a mode (-D -T) in which it would be possible to fish the
configuration and the desired json object (for the cfg) out of the
tempfile it creates. It might also be possible for osstest to
construct these out of whole cloth.
However, this would be undesirable because it would break if rumprun
changed (in particular, if the interface to the domain creation
changed).
And because of the cfg wrinkle it still wouldn't let us construct a
domain config file which could be passed to
toolstack($ho)->guest_create.
So instead we invent Osstest::Rumprun::rumprun_guest_create, which
invokes rumprun. rumprun implicitly invokes xl.
The config editing which was previously done by ts-rumprun-demo-setup
is now done by passing appropriate options to rumprun.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 14:44:41 +0000 (15:44 +0100)]
Xen built versions: ts-xen-build: check versions of Xen subtrees, only
ts-xen-build has a check that the actually-built versions of the
various subtrees are right. This allows it to spot if the machinery
for specifying the subtree revision hasn't worked.
However, this machinery is troublesome: it assumes that the value
specified in the revision_TREE runvar is a commit id, just like the
value specified in built_revision_TREE. This is, currently, true in
flights created by cr-daily-branch and cs-try-bisect.
But it is not necessarily true for flights created other ways. In
principle it would be possible to look into each checked out subtree,
and use git-rev-parse (and its equivalent for other VCSs) to check
whether the specified revision is right (by comparing it to
origin/<revision_TREE>, not <revision_TREE>, I guess). This is quite
fiddly.
The reason this is causing trouble now is that some of the ad-hoc rump
kernel flights I'm currently making contain non-git-revision-id values
for the revision_TREE for parts of the rumprun build.
So for now, limiting this check to TREEs which are actually Xen
subtrees will fix the problem for me (and this will be necessary for
the fuller fix, which I describe above). So do that.
Specifically:
* Add a new WHERE clause to the query statement, so that it selects
only the row for one specific tree
* Run the query once for each tree in %xensubtrees
This leaves the query overly-complicated, but this doesn't matter,
because if and when we make a fuller fix we'll throw this entire query
away. So it is easier to put off rewriting it in the hope that this
will never be needed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 14:43:00 +0000 (15:43 +0100)]
Xen built versions: Move list of subtrees to BuildSupport
Turn the adhoc list of tree names and subdirectories in
collect_xen_built_versions into a hash, which we iterate over.
Doing this in a data-driven way allows us to provide this information
to callers of collect_xen_built_versions, which is going to be helpful
in a moment.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 30 Jun 2016 14:19:15 +0000 (15:19 +0100)]
rumprun: ts-rumprun-build: Update for newer Xen
Newer Xen needs more work to make it cross compile for rump.
* Pass --host=TARGET to configure. This is needed so that configure
knows that we are deliberately cross compiling. (Otherwise it
tries to run target binaries on the host, and crashes when that fails.)
* Pass CROSS_COMPILE in the environment. This arranges for the Xen
Makefiles to run the right compiler, ie $(CROSS_COMPILE)gcc.
* Put the rump compiler directory on PATH, so that the Xen Makefiles
can find it.
* Pass HOSTCC=gcc in the environment; otherwise it tries to use the
default CC (which is $(CROSS_COMPILE)gcc), when building
build-system-internal tools which are to be run on the host as part
of the build.
The need for this could be avoided by setting XEN_TARGET_ARCH to the
rump architecture, but then we would have to provide a Xen arch
config file for that architecture, which would be meaningless since
we are not actually building a hypervisor, and would have to contain
various dummy information.
NB in this commit message I use Xen terminology for cross arch names:
  Xen           GCC/GNU       Meaning                          Example for
  terminology   terminology                                    rump cross build

  host          build         Native architecture of           i586-linux-gnu
                              the environment in which
                              we are running the build.

  target        host          Foreign architecture on          i486-rumprun-netbsdelf
                              which the objects etc.
                              which we are now building
                              will eventually be run.

  n/a           target        Used only when building a "Canadian"
                              cross compiler: the 2nd foreign
                              architecture for which the compiler which
                              we are now building (on the `build(gnu)'
                              arch) will, when we run it, produce
                              binaries (when it is run on the
                              `host(gnu)' arch).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 17:30:21 +0000 (18:30 +0100)]
Executive: Allow out-of-order manipulations of flights intended play
Flights being operated on by a developer hacking about with the code,
which were created with intended blessing `play', are usually blessed
`running' or `broken' or something. So the safety catch bypass needs
to look at the intended blessing too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
The order of results reported by sg-report-flight determines which
testid the bisector will try to work on first. It also determines the
order in which failures are shown in the email reports. We currently
sort them by the duration estimate (for each failure's containing job).
We should prefer earlier steps. So change the first sort key to be
the duration estimate only for the steps leading up to the step of
interest for each failure. (By passing the testid to the duration
estimator.)
Since the granularity is in seconds, this may still not distinguish
when there are fast steps. So as a secondary sort criterion, use the
stepno.
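The two-level ordering can be sketched in Python as below. The dictionary fields and the `estimate_duration` callback are assumptions made for the example; they do not reflect the data structures in sg-report-flight.

```python
# Illustrative sketch only; field names and the callback are hypothetical.
def order_failures(failures, estimate_duration):
    """Sort failures so that earlier-failing steps come first.

    estimate_duration(job, testid) -> estimated seconds from job start
    up to and including the step `testid'.
    """
    return sorted(
        failures,
        key=lambda f: (estimate_duration(f["job"], f["testid"]), f["stepno"]),
    )
```

When two failures have the same estimate (to the second), the one with the lower stepno sorts first.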
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
squash! sg-report-flight: Report earlier, earlier step failures
Ian Jackson [Wed, 10 Aug 2016 16:10:39 +0000 (17:10 +0100)]
duration_estimator: Be able to estimate job duration up to a particular step
If this is passed, we are interested only in the duration up to and
including the specified test step. (If the specified test step is not
present or didn't have a recorded finish, we look at the whole job.)
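The fallback rule can be illustrated with a short Python sketch. The step dictionaries and function name are assumptions for the example, not the duration_estimator interface.

```python
# Illustrative sketch only; the step records are a hypothetical shape.
def duration_up_to(steps, testid=None):
    """steps: list of dicts with 'testid', 'started', 'finished'
    ('finished' may be None if no finish was recorded)."""
    if not steps:
        return 0
    job_start = min(s["started"] for s in steps)

    if testid is not None:
        for s in steps:
            if s["testid"] == testid and s["finished"] is not None:
                # Duration up to and including the step of interest.
                return s["finished"] - job_start

    # Step absent, or no recorded finish: look at the whole job.
    finished = [s["finished"] for s in steps if s["finished"] is not None]
    return max(finished) - job_start if finished else 0
```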
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 3 Aug 2016 14:49:34 +0000 (15:49 +0100)]
ts-xen-build-prep: Use .gitconfig so _everything_ uses git cache
In particular, when xen.git clones a subtree, whose url we didn't
specify in the runvars, we end up using the url from xen.git's
Config.mk.
Arrange to use the git cache for all git urls, via the insteadOf
feature.
Note that the git config url insteadOf feature is backwards: one
sets the config variable "url.NEW-URL.insteadOf" to the value OLD-URL.
So the key is the value, and the value is the key.
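To make the reversed key/value relationship concrete, here is a small Python sketch that generates the corresponding .gitconfig stanzas and models the prefix rewriting git performs (when several insteadOf prefixes match, the longest wins). The helper names and the cache URL used with them are hypothetical; ts-xen-build-prep's actual code is not shown here.

```python
# Illustrative sketch only; helper names are hypothetical.
def insteadof_config(rewrites):
    """rewrites: dict mapping OLD url prefix -> NEW url prefix.

    Note the inversion: NEW appears in the section name (the key),
    OLD is the value of insteadOf.
    """
    lines = []
    for old, new in rewrites.items():
        lines.append('[url "%s"]' % new)
        lines.append("\tinsteadOf = %s" % old)
    return "\n".join(lines) + "\n"


def rewrite(url, rewrites):
    """Model of git's rewriting: the longest matching OLD prefix wins."""
    matches = [old for old in rewrites if url.startswith(old)]
    if not matches:
        return url
    old = max(matches, key=len)
    return rewrites[old] + url[len(old):]
```

For example, mapping `git://xenbits.xen.org/` to a cache prefix means any clone of a URL beginning with that prefix goes via the cache, including subtree URLs that come from Config.mk and not from the runvars.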
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Mon, 25 Jul 2016 10:19:18 +0000 (11:19 +0100)]
cr-ensure-disk-space: Correct stdout output
d221996eea64 "cr-ensure-disk-space: Run check_space before taking
lock" introduced an additional call to check_space but check_space
prints the start of a message (with no newline) expecting
iteration_proceed to print the rest.
Move $|=1 up appropriately and add a couple of messages in the right
place. This involves calling quit_ok rather than exit 0.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 25 Feb 2016 12:31:09 +0000 (12:31 +0000)]
mg-list-all-branches: Do not match ${BRANCHES+= ... }
This is not valid shell syntax and should not appear. The confusion
seems to have arisen because of the need to match BRANCHES+=...
(without the surrounding { }).
This results in no change to the output. (I seem to have collected
this patch some time ago as part of some fixes to mg-list-all-branches
which have by now been applied, but not managed to write up and post
this specific change.)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 19 Jul 2016 16:25:48 +0000 (17:25 +0100)]
cr-ensure-disk-space: Run check_space before taking lock
This allows cr-ensure-disk-space to be a noop if there is enough
space, even if run on a host which doesn't have access to the relevant
lock directory.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 19 Jul 2016 14:06:58 +0000 (15:06 +0100)]
mfi-common: Do not set di_version runvar to empty string in build jobs
2601498df77c "mfi-common: Do not set di_version runvar to empty string"
fixed the test jobs but not the build jobs, because the setting of
hostos_runvars was (it seems) cloned-and-hacked, and it fixed only one
instance.
Now that we have set_hostos_runvars, use it in create_build_jobs too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 14:59:30 +0000 (15:59 +0100)]
Bisection truncation: Stop a bisection job after the step of interest
Set the `truncate_testid' runvar when we create a bisection flight.
Thus, the bisection will stop when it has collected the data point we
wanted. This is especially useful if the failing step is early in a
long job: passes do not have to wait for the whole rest of the job to
run.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 14:30:56 +0000 (15:30 +0100)]
Job truncation: sg-run-job support truncation by setting global `truncate'
Arrange for a global variable `truncate' to be honoured. It is
initialised to 0. If it becomes 1 then:
* spawn-ts does not spawn jobs any more (reap-ts reaps these non-jobs
immediately), unless they are marked with ! in their iffail
* per-host-ts does not try to spawn anything any more, likewise
(strictly, we could leave checking truncate to spawn-ts, but this
way is clearer).
* These not-spawned jobs count as successful when reaped, unlike
jobs not spawned due to the presence of `abort'.
* At the end of the job, if things otherwise went OK, we set the
status to `truncated' rather than `pass'.
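The rules above can be modelled in a few lines of Python. This is only a sketch of the semantics: sg-run-job is not written in Python, and the function names, the `state` dictionary, the tagged-tuple handles and the order in which `abort` and `truncate` are checked are assumptions made for the example.

```python
# Illustrative sketch only; names and handle shapes are hypothetical.
def spawn_ts(iffail, state, start):
    """Return a reap handle as a tagged tuple.

    start() actually launches the script and returns a process handle.
    """
    forced = iffail.startswith("!")      # `!' means: run regardless
    if state["abort"] and not forced:
        return ("fail",)                 # not spawned; counts as failed
    if state["truncate"] and not forced:
        return ("ok",)                   # not spawned; counts as successful
    return ("running", start())


def reap_ts(handle, wait):
    """Return True for success. wait(proc) -> bool for real processes."""
    kind = handle[0]
    if kind == "fail":
        return False
    if kind == "ok":
        return True
    return wait(handle[1])


def final_status(all_ok, truncate):
    if not all_ok:
        return "fail"
    return "truncated" if truncate else "pass"
```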
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 15:27:50 +0000 (16:27 +0100)]
sg-run-job: Change spawn-ts internal representation of reap handles
Previously, spawn-ts would pass reap-ts (via its caller) either a
filehandle, or an empty string meaning `when this is reaped, count it
as failed'.
We are going to want to represent `when this is reaped, count it as
successful', too. So change the representation to a variadic list,
with an enum type field at the front.
NB: oddly, reap-ts returns 1 for success.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 14:20:00 +0000 (15:20 +0100)]
sg-run-job: Break out iffail-check
Both spawn-ts and per-host-ts do some processing of `iffail' values:
* Strip any leading !, which means "run this even if the job
is being stopped due to error";
* Turn `.' into `fail'.
The first of these is currently only done by per-host-ts, which checks
ok. We are going to want to do something more sophisticated when
truncating flights. So we introduce a new helper.
For now spawn-ts passes 1 for okexpr so its iffail-check always
returns 1 so it doesn't check the return value.
No functional change yet.
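As a rough model of what such a helper does, here is a Python sketch. The real helper lives in sg-run-job and is not reproduced here; the function signature and return shape are assumptions for illustration.

```python
# Illustrative sketch only; signature and return shape are hypothetical.
def iffail_check(iffail, ok):
    """Return (should_run, normalised_iffail).

    A leading `!' means "run this even if the job is being stopped due
    to error"; `.' is shorthand for `fail'.
    """
    forced = iffail.startswith("!")
    if forced:
        iffail = iffail[1:]
    if iffail == ".":
        iffail = "fail"
    return (bool(ok) or forced), iffail
```

A caller which always passes a true `ok` (as spawn-ts does for now) always gets a true first element back, so it can safely ignore it.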
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 14:08:20 +0000 (15:08 +0100)]
Job truncation: Tolerate `truncated' job status
We are going to introduce a new `truncated' job status, which means
that the job went OK until sg-run-job decided not to continue with it
because it had done all that was requested.
(This will be used for bisection, to stop a bisection job after the
step of interest.)
Its properties are:
* In summary HTML `truncated' shows up as a green job status, like `pass'.
* The duration estimator _does_ look at truncated jobs. (Note that
it only looks on the specific branch, so only when organising a
bisection will it look at bisections.)
* Consequently the host allocator for bisections will expect the
duration to be that of the last flight where this job passed,
failed or was truncated, which is correct.
* When the host allocator is choosing a host for non-bisections it
won't consider these truncated jobs because they ought not to
appear in main branch flights. If they do they count more as
fails than passes (that is, they do make the job sticky).
* sg-execute-flight expects that sg-run-job might set the job
status to `truncated' and then exit with status 0.
* sg-report-flight does not look for an interesting failing step
when the job is truncated (ie for this purpose it's like pass).
* sg-report-flight doesn't consider truncated jobs to indicate
trouble, and handles truncated properly in Subject line generation.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Once upon a time we tried to run one bisector for all the branches.
But that doesn't work because they would overwrite each others' mros,
making the bisector flap as main flights finish.
If sharing a bisector working tree is desirable, something more
complex will be needed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 14 Jul 2016 15:40:00 +0000 (16:40 +0100)]
Bisection: Do not try to inhibit queue
In an effort to try to increase the chance that the next bisection
step will get the same host quickly, cri-bisect uses
mg-queue-inhibit to inhibit all resource allocation for 5 minutes.
With the increasing size of the test facility and the increasing
number of bisector instances running, this is starting to become a
very crude hammer indeed.
And this is largely ineffective anyway as we try bisections every 15
minutes but only inhibit for 5 minutes.
Disable it, until we have a better answer.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 14 Jul 2016 15:26:30 +0000 (16:26 +0100)]
cr-try-bisect: Use WAITSTART of when we started bisecting this testid
Otherwise bisection jobs get queued up very late.
The intent is that once we have a regression, we /start/ bisecting it
roughly FCFS along with other flights, but then it gets priority until
the bisection is done.
Then the next bisection in the same branch will have to wait again, to
start.
We implement this by keeping a stamp file, whose timestamp shows when
we started bisecting this testid and this step.
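The stamp-file idea can be sketched in Python as follows. The function name and the stamp path are hypothetical; cr-try-bisect itself is not reproduced here.

```python
import os


# Illustrative sketch only; the function and path naming are hypothetical.
def bisection_waitstart(stamp_path):
    """Return the time (seconds since the epoch) at which bisection of
    this testid began, creating the stamp file on first use."""
    try:
        # O_EXCL: only create the stamp if it does not already exist,
        # so a later call never resets the recorded start time.
        fd = os.open(stamp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    except FileExistsError:
        pass
    else:
        os.close(fd)
    return int(os.stat(stamp_path).st_mtime)
```

Every later bisection step for the same testid then reuses the original timestamp, so it keeps its place in the queue; removing the stamp when the bisection completes makes the next bisection start afresh.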
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 15 Jul 2016 18:55:11 +0000 (18:55 +0000)]
ms-flights-summary: Invent a `prep.alloc.' pseudo job state
This allows us to separate out `preparing' jobs into ones which are in
our data plan and ones which are not. The ones which are not may not
have quite started to run ts-hosts-allocate, or may still be in the
planning queue and not made it into the projection.
In either case we don't have an estimated finish time for them.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 11:28:18 +0000 (12:28 +0100)]
tcl daemons: transaction: Support db autoreconnect
Provide an `autoreconnect' argument which will automatically reconnect
to the db if the connection has been lost. It will make only one
reconnection attempt.
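The single-retry behaviour can be sketched in Python like this. The daemons are written in Tcl, so this is only a model: the `db` object with `handle()` and `reconnect()` methods and the `ConnectionLost` exception are assumptions for the example.

```python
# Illustrative sketch only; the db interface is hypothetical.
class ConnectionLost(Exception):
    pass


def transaction(db, body, autoreconnect=False):
    """Run body(handle). If the connection was lost and autoreconnect
    is set, reconnect once and try again."""
    try:
        return body(db.handle())
    except ConnectionLost:
        if not autoreconnect:
            raise
    # Exactly one reconnection attempt: a second ConnectionLost
    # from the retry below propagates to the caller.
    db.reconnect()
    return body(db.handle())
```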
No functional change yet because no call sites have been changed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 11:23:12 +0000 (12:23 +0100)]
tcl daemons: transaction: Only try ROLLBACK when necessary
In the deadlock case, we need to ROLLBACK. In other error cases we
are going to close the connection. And in those other cases the
ROLLBACK might fail, causing our error recovery to go wrong.
So do ROLLBACK only on the single path where we might continue to use
the connection.
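A Python model of this error-path split is given below. It is a sketch under assumptions: the `Deadlock` exception, the `conn` object with `execute()` and `close()`, and the retry limit are invented for the example and do not describe the Tcl implementation.

```python
# Illustrative sketch only; exception and connection interface are hypothetical.
class Deadlock(Exception):
    pass


def run_transaction(conn, body, max_tries=3):
    for _ in range(max_tries):
        conn.execute("BEGIN")
        try:
            result = body(conn)
            conn.execute("COMMIT")
            return result
        except Deadlock:
            # We will keep using this connection, so clean up properly.
            conn.execute("ROLLBACK")
        except Exception:
            # The connection is being discarded; a ROLLBACK here could
            # itself fail and confuse the error recovery, so skip it.
            conn.close()
            raise
    raise Deadlock("giving up after %d tries" % max_tries)
```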
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jul 2016 12:14:26 +0000 (13:14 +0100)]
tcl daemons: Break out db-ensure-open and db-ensure-closed
To be able to deliberately reconnect to the database, in case of
error, we need functions which actually work with dbh, rather than
simply the refcount.
No functional change as yet.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>