Ian Jackson [Fri, 1 Jul 2016 11:23:12 +0000 (12:23 +0100)]
tcl daemons: transaction: Only try ROLLBACK when necessary
In the deadlock case, we need to ROLLBACK. In other error cases we
are going to close the connection. And in those other cases the
ROLLBACK might fail, causing our error recovery to go wrong.
So do ROLLBACK only on the single path where we might continue to use
the connection.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jul 2016 12:14:26 +0000 (13:14 +0100)]
tcl daemons: Break out db-ensure-open and db-ensure-closed
To be able to deliberately reconnect to the database, in case of
error, we need functions which actually work with dbh, rather than
simply the refcount.
No functional change as yet.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 1 Jul 2016 11:17:37 +0000 (12:17 +0100)]
tcl daemons: jobdb::transaction: Improve two message generation sites
* Use logputs rather than puts to report transaction deadlock retry
* Use $ei and $ec rather than $errorInfo and $errorCode when calling
error due to too many deadlock retries. This has no functional change
but is less fragile in case of future addition of new calls to catch
between the main catch and this throw.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jan 2016 19:30:42 +0000 (19:30 +0000)]
ms-ownerdaemon: Cope with db restart. Retry recording dead tasks.
In chan-destroy-stuff, instead of accessing the db directly, add the
dead task(s) to a queue, and arrange to look at that queue.
Errors are handled by setting an `after' handler which we cancel if we
are successful.
The after handler requeues a queue run attempt as the first thing
(which will arrange that a further retry will occur if things are
still broken) and then attempts to reconnect to the database.
I have tested this with a test instance by renaming the `tasks' table
under its feet, and it functions as expected.
DEPLOYMENT NOTE: The owner daemon cannot be restarted without shutting
everything down. So this update should first be deployed in
Cambridge, probably, to see how it goes. Also, it is less critical in
the main Xen production test lab because there the db and the owner
daemon are co-hosted on the same VM.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Put back the `unset tasks' which was mistakenly removed. The
effect of its lack is to fail to clear out the task list for
previous uses of the channel (which is named after the fd); this
is mostly harmless apart from log spam but causes the usual
case to be something like
OK created-task 456354 ownd [10.80.227.94]:44852-876
rather than
OK created-task 456354 ownd [10.80.227.94]:44852-876
which some of the clients (rightly) don't expect.
Ian Jackson [Thu, 7 Jan 2016 18:47:03 +0000 (18:47 +0000)]
Database locking: Tcl: Retry only on DEADLOCK DETECTED
Use the new errorCode coming out of db-execute* to tell when the error
is that we got a database deadlock, which is the situation in which we
should retry.
This involves combining the two catch blocks, so that there is only
one error handling strategy. Previously errors on COMMIT would be
retried and others would not. Now errors anywhere might be retried
but only if the DB indicated deadlock.
We now unconditionally execute ROLLBACK. This is more correct, since
we always previously executed BEGIN.
And, we pass the errorInfo and errorCode from the $body to the caller.
I have tested this with a test db instance, using contrived means to
generate a database deadlock, and it does actually retry.
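The retry policy described in this commit can be sketched roughly as follows. This is a Python illustration only (the daemons themselves are Tcl); `DeadlockDetected`, `FakeConn`, `transaction` and the retry limit are hypothetical names invented for the sketch, not the jobdb code.

```python
class DeadlockDetected(Exception):
    """Hypothetical stand-in for an error whose errorCode reports a deadlock."""


class FakeConn:
    """Toy connection which only records the statements it is given."""

    def __init__(self):
        self.log = []

    def execute(self, stmt):
        self.log.append(stmt)


def transaction(conn, body, max_retries=3):
    """Run body inside BEGIN/COMMIT, retrying only if a deadlock was reported."""
    for attempt in range(max_retries + 1):
        conn.execute("BEGIN")
        try:
            result = body(conn)
            conn.execute("COMMIT")
            return result
        except Exception as err:
            # We always executed BEGIN, so always ROLLBACK.
            conn.execute("ROLLBACK")
            if not isinstance(err, DeadlockDetected) or attempt == max_retries:
                raise  # hand the original error on to the caller


# Usage: the body deadlocks twice and then succeeds.
failures = [DeadlockDetected(), DeadlockDetected()]


def body(conn):
    if failures:
        raise failures.pop()
    conn.execute("UPDATE")
    return "done"


conn = FakeConn()
result = transaction(conn, body)
```

There is a single error path: an error from the body or from COMMIT is handled the same way, and only the kind of error decides whether to go round again.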
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 7 Jan 2016 18:22:53 +0000 (18:22 +0000)]
Database locking: Tcl: for errorCode, use pg_exec, not pg_execute
We would like to be able to retry db transactions. To do this we need
to know why they failed (if they did).
But pg_execute does not set errorCode. (This is clearly a bug.) And
since it immediately discards a failed statement, any error
information has been lost by the time pg_execute returns.
So, instead, use pg_exec, and manually mess about with fishing
suitable information out of a failed statement handle, and generating
an appropriate errorCode.
There are no current consumers of this errorCode: that will come in a
moment.
A wrinkle is that as a result it is no longer possible to use
db-execute on a SELECT statement nor db-execute-array on a non-SELECT
statement. This is because 1. the `ok' status that we have to
check for is different for statements which are commands and ones
which return tuples and 2. we need to fish a different return value out
of the statement handle (-cmdTuples vs -numTuples). But all uses in
the codebase are now fine for this distinction.
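A rough sketch of the command/tuples distinction, in Python for illustration only. The status names are the libpq result status names; `finish_statement`, `StatementError` and the dict-shaped `result` are hypothetical stand-ins for the statement handle, not the actual db-execute code.

```python
class StatementError(Exception):
    """Hypothetical error raised when a statement did not end as expected."""


# For each wrapper: the status which counts as success, and the field which
# holds the value to return.
EXPECTATIONS = {
    "db-execute": ("PGRES_COMMAND_OK", "cmdTuples"),
    "db-execute-array": ("PGRES_TUPLES_OK", "numTuples"),
}


def finish_statement(wrapper, result):
    ok_status, count_field = EXPECTATIONS[wrapper]
    if result["status"] != ok_status:
        raise StatementError(wrapper + ": unexpected status " + result["status"])
    return result[count_field]


# Usage: a SELECT pushed through the non-SELECT wrapper is rejected.
try:
    finish_statement("db-execute", {"status": "PGRES_TUPLES_OK", "numTuples": 1})
    rejected = False
except StatementError:
    rejected = True
```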
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Put emsg at the start of errorInfo; things that print errors that
print errorInfo typically print _only_ errorInfo.
Ian Jackson [Thu, 14 Jul 2016 12:11:44 +0000 (13:11 +0100)]
Database locking: Tcl: Use db-execute-array for SELECT in sg-execute-flight
We are going to make it wrong to use db-execute for SELECT statements.
Convert the existing violation site, which uses db-execute, into
db-execute-array (providing a dummy arrayvar).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Dropped change to become-task which is actually wrong and should
be db-update-1, anyway. This will now be fixed in a separate patch.
Ian Jackson [Fri, 1 Jul 2016 18:40:41 +0000 (19:40 +0100)]
Tcl: Use tclsh8.5
I have checked that tclsh8.5 and TclX work on osstest.test-lab (and
also osstest.xs.citrite.net). TclX seems to be provided by tcl8.4 but
work with tcl8.5 (at least on wheezy and jessie).
Deployment note: hosts running Debian wheezy (including
osstest.xs.citrite.net, the Citrix Cambridge instance), will need
OSSTEST_DAEMON_TCLSH=tclsh8.4 in ~/.xen-osstest/settings.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 8 Jul 2016 10:48:39 +0000 (11:48 +0100)]
invoke-daemon: Honour OSSTEST_DAEMON_TCLSH
It appears that tcl8.5 in wheezy has a serious bug which makes `after
idle' not always work. tcl8.4 has been working well in wheezy but is
not in jessie, where tcl8.5 works (and tcl8.6 has a serious event loop
bug - Debian #826741).
So we need to use different versions of Tcl on different hosts.
Allow this to be specified in ~/.xen-osstest/settings.
This affects only:
- invoke-daemon (which is normally run from inittab)
- mg-schema-test-database
sg-run-job and sg-execute-flight are not affected. They do not
currently use `after idle' so that is OK for now.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 8 Jul 2016 10:57:32 +0000 (11:57 +0100)]
mg-schema-test-database: Change default minflight to -100
It is tiresome to try to create a test db for playing with and have to
wait for a big copy. Better to create a small one by default; if the
user has forgotten to specify a minflight, they can always drop it and
run it again.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jul 2016 15:58:04 +0000 (16:58 +0100)]
mg-allocate: Do not treat already-allocated resources as satisfactory
This was always rather odd for ./mg-allocate HOSTNAME but makes the
more sophisticated uses like ./mg-allocate '{FLAG,FLAG,...}' very much
less useful.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Jul 2016 11:18:33 +0000 (12:18 +0100)]
mg-allocate: Fix "issteallable" call
81cac5a1656e "mg-allocate: Support --steal" introduced an erroneous
call to the subref $issteallable, using { } instead of ( ), producing
this error:
Not a HASH reference at ./mg-allocate line 225.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 27 Jun 2016 15:49:52 +0000 (16:49 +0100)]
cr-daily-branch: libvirt: use frozen version on stable branches
libvirt master might increase its LIBXL_API_VERSION. When this feeds
through osstest it can cause the push gates of Xen stable branches to
break.
So for stable Xen branches do not track libvirt upstream. Instead,
use a frozen revision. (Only for main push gate tests of stable Xen
branches.)
The frozen branch is never going to be updated so it is not suitable
for other kinds of uses. In particular it won't get security fixes.
So we call the refs osstest/frozen/xen-K.L-testing to discourage
users from using them.
Deployment note: The Xen release checklist needs a new item "add this
frozen libvirt branch".
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 27 Jun 2016 11:25:14 +0000 (12:25 +0100)]
ts-xen-build-prep: Do not install Ocaml on squeeze or wheezy
squeeze doesn't (didn't) have it at all. wheezy doesn't have ocamlopt
on armhf, and the Xen build system (in the old branches where this is
relevant) seems not to be able to test this.
In any case we use these old Debian suites when testing old Xen
branches, which were (when they were current) built without ocaml.
This partially reverts "ts-xen-build-prep: Install Ocaml" bbe1a9b2a6c0.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: David Scott <dave@recoil.org>
CC: Jan Beulich <JBeulich@suse.com>
Ian Jackson [Tue, 22 Mar 2016 19:40:53 +0000 (19:40 +0000)]
mg-hosts serial attach: Provide serial-attach command
This is like running sympathy -r or xenuse by hand, except that it
checks that you have the host allocated, and looks up in the database
what the right rune is.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 22 Mar 2016 16:57:49 +0000 (16:57 +0000)]
Executive: Provide findtask_spec
This will allow code elsewhere to look up tasks other than the one
specified in OSSTEST_TASK. No callers of findtask_spec yet, so no
functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 22 Apr 2016 15:25:04 +0000 (16:25 +0100)]
ts-xen-build-prep: Install Ocaml
This will result in the Xen build system building, and then
preferring, oxenstored.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Scott <dave@recoil.org>
Ian Jackson [Fri, 22 Apr 2016 14:46:30 +0000 (15:46 +0100)]
crontab: Drop linux-mingo-tip-master linux-next linux-linus
It appears that no-one is looking at the output. These have not had a
push to the tested output branch for at least 250 days (742 days in
the case of linux-linus!) and the reports don't seem to be generating
any bugfixing activity.
There is a plan to do some Xen testing in Zero-day but even if that
doesn't lead to anything we would still be just where we are now.
So drop these to save our test bandwidth for more useful work.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Roger Pau Monne <roger.pau@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anshul Makkar <anshul.makkar@citrix.com>
Ian Jackson [Mon, 11 Apr 2016 16:17:30 +0000 (17:17 +0100)]
sg-run-job: nested: Report nested log capture failure as `fail'
Previously this was `broken' (ie, infrastructure failure), which is
not really true - the usual reason is that the L0 has crashed, so that
efforts to manipulate the L1 do not succeed.
Tested using OSSTEST_SIMULATE and this:
diff --git a/sg-run-job b/sg-run-job
index 8b2d5e1..0f8e278 100755
--- a/sg-run-job
+++ b/sg-run-job
@@ -181,6 +181,11 @@ proc spawn-ts {iffail testid args} {
set xprefix {}
if {[var-or-default env(OSSTEST_SIMULATE) 0]} { set xprefix echo }
+puts stderr ">$ts $real_args"
+ switch -glob "$ts $real_args" {
+ {ts-logs-capture *} { set xprefix "bomb $xprefix" }
+ }
+
set log [jobdb::step-log-filename $flight $jobinfo(job) $stepno $ts]
set redirects {< /dev/null}
if {[string length $log]} {
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 11 Apr 2016 16:12:01 +0000 (17:12 +0100)]
mfi-common: Do not set di_version runvar to empty string
Specifically, do not set all_host_di_version to the shell variable
$di_version unless the latter has a nonempty value. A set but empty
value for all_host_di_version does not default to the version for the
specific suite. So this produces install failures.
This bug seems to have been introduced fairly recently, as fallout
from recent di_version handling changes.
diffing standalone-generate-dump-flight-runvars shows the expected
changes.
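The difference between an unset runvar and a set-but-empty one can be sketched like this. It is a Python illustration only (mfi-common is shell); `host_runvars`, `effective_di_version` and the per-suite default values are hypothetical.

```python
# Made-up per-suite defaults, purely for illustration.
SUITE_DI_VERSION = {"wheezy": "WHEEZY-DEFAULT", "jessie": "JESSIE-DEFAULT"}


def host_runvars(suite, di_version=""):
    runvars = {"all_host_suite": suite}
    if di_version:
        # Only set the runvar when there is a nonempty value to record.
        runvars["all_host_di_version"] = di_version
    return runvars


def effective_di_version(runvars):
    # Only an absent runvar falls back to the suite default; a runvar which
    # is set but empty is taken at face value.
    if "all_host_di_version" in runvars:
        return runvars["all_host_di_version"]
    return SUITE_DI_VERSION[runvars["all_host_suite"]]


broken = {"all_host_suite": "jessie", "all_host_di_version": ""}
fixed = host_runvars("jessie")
```

Here `effective_di_version(broken)` yields the empty string, which is the situation that produced install failures, whereas `fixed` falls back to the suite's own version.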
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 11 Apr 2016 15:31:28 +0000 (16:31 +0100)]
make-distros-flight: Always set guest_suite and defguestsuite
Abolish the shell variables $gsuite and $debian_suite (which were
referred to only in make-distros-flight) and set and use the variables
guest_suite and defguestsuite.
These variables are used by the machinery in mfi-common to populate
the runvars.
No functional change (as seen in standalone-generate-dump-flight-runvars,
with mg-list-all-branches edited to use crontab-cambridge).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 19 Feb 2016 17:52:16 +0000 (17:52 +0000)]
mg-list-all-branches: avoid mistakenly generating `.' in the output
The regex in mg-list-all-branches assumes that the BRANCHES= will
either be a singleton entry separated from the following command by a
hard tab or a single quoted list of space separated entries, however
the xen-unstable-coverity line is singleton separated from the command
by a single space.
We could fix this by using a hard tab, but that ends up aligning
things in an aesthetically displeasing way, and relying on hard tabs
is fragile.
Instead, improve the parsing in mg-list-all-branches: break out a
couple of semantically (as well as syntactically) common regexp
elements into variables, and then provide two regexps: one which
matches shell "assign default values" substitutions, and the other
which matches the ordinary shell assignments.
We use an empty pair of () in the first regexp to make sure that they
both produce the branch name list in $2. (It would be possible to use
named capture groups but I'm not sure whether all our perls are recent
enough.)
I have verified that the actual difference in output right now is just
to remove the erroneous `.' entry.
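The empty-group trick can be sketched as follows, in Python for illustration only (mg-list-all-branches is Perl). Both patterns and the sample crontab-like lines are hypothetical reconstructions, not the regexps in the patch.

```python
import re

# Common elements, broken out so both patterns share them.
BRANCH = r"[-0-9a-z.]+"
BRANCH_LIST = "(" + BRANCH + r"(?:\s+" + BRANCH + ")*)"

# Shell "assign default values" substitution.  The empty () is a placeholder
# so that the branch list lands in group 2, as in the pattern below.
ASSIGN_DEFAULT = re.compile(r"\$\{BRANCHES:=()" + BRANCH_LIST + r"\}")

# Ordinary assignment: group 1 is the optional quote, group 2 the list.
ASSIGN_PLAIN = re.compile(r"\bBRANCHES=(['\"]?)" + BRANCH_LIST + r"\1\s")


def branches(line):
    """Return the branch names found in one line, or an empty list."""
    for pattern in (ASSIGN_DEFAULT, ASSIGN_PLAIN):
        m = pattern.search(line)
        if m:
            return m.group(2).split()  # same group number for both patterns
    return []
```

Because the ordinary assignment must be followed by whitespace after the closing quote (if any), a singleton entry separated from `./command` by a single space does not pick up the leading `.` of the command.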
Reported-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 30 Mar 2016 10:33:01 +0000 (11:33 +0100)]
TestSupport: Move cfg_tftp_di_version from Debian.pm
Strictly speaking this is a Debian-specific function. But it is
called by selecthost. TestSupport does not `use Osstest::Debian'
right now. As a result, currently, if $suite is not set or
TftpDiVersion_$suite is not set, the program will crash with
Undefined subroutine &Osstest::TestSupport::cfg_tftp_di_version called at Osstest/TestSupport.pm line 865.
Fix this by moving cfg_tftp_di_version to TestSupport, where it is
needed.
It would be possible to make the boundary between Osstest::TestSupport
and Osstest::Debian firmer by having selecthost explicitly call a
selecthost_do_debian_things (perhaps optionally, or as specified by
the caller).
But that would be quite a palaver. It is much more convenient to fudge the
issue. (Of course if we have similar requirements for other OS's we
can put them in TestSupport too, provided they're not too big and
tangly.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Change `use' in getconfig_TftpDiVersion_suite
Ian Jackson [Wed, 23 Mar 2016 11:28:22 +0000 (11:28 +0000)]
coverity: Rename branch names to `smoke', not `smoked'
c/s d94637b6 "coverity: run tests on smoked rather than master"
used `smoked' in several places, including as the name of the
input branch (which is already established as `smoke'), and the name
of the coverity-tested branch.
But we call this `smoke', not `smoked'.
After this patch `git-grep smoked' produces no output, as it did in d94637b6~.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 23 Mar 2016 11:17:10 +0000 (11:17 +0000)]
cri-getprevxenbranch: Only ever return xen-X.Y-testing
Only consider xen-[0-9]* as candidates either for returning, or for
matching the current branch.
The effect is that attempts to ask for the `previous Xen branch' of
anything other than a Xen stable branch give the latest Xen stable
branch, which I think is correct.
This fixes a bug where the `previous branch' of xen-unstable-smoke was
considered to be xen-unstable-coverity (!)
This bug would not have been of any consequence, except that the
coverity tested branch name in xen.bit changed in
c/s d94637b6 "coverity: run tests on smoked rather than master"
and had not been created, so that cr-daily-branch would crash for
most branches because the (largely irrelevant) invocation of
`./ap-fetch-version-old xen-unstable-coverity' would fail.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 18 Mar 2016 17:48:44 +0000 (17:48 +0000)]
make-flight: Set debianhvm_suite for Debian HVM tests
do_hvm_debian_test_one uses usual_debianhvm_image which honours the
prevailing value of $guestsuite. However, it does not provide an
explicit suite setting in the runvars.
As a consequence, the test code will expect the image to install
whatever the default suite is. If guestsuite is not the default
suite, there is a mismatch. At the very least, the wrong
suite-specific workarounds will be applied.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Campbell [Tue, 23 Feb 2016 10:46:30 +0000 (10:46 +0000)]
coverity: run tests on smoked rather than master.
In retrospect there isn't much point in deferring coverity until the
tree has been through a full test and it just results in potentially
longer gaps between runs with larger numbers of commits included (for
example the run on Sunday was skipped because master hadn't moved
forward since Wednesday).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 19 Feb 2016 16:21:44 +0000 (16:21 +0000)]
sg-report-flight: move "started" column to the left in Step table
With the current ordering of status => started I frequently (more
often than not) read the failing step as "(stepno,testid,script)
failed at <time>" (where <time> is actually the start time, not the
fail time).
Move the "started" column to the left of the "status" column, on the
basis that "(stepno,testid,script) started at $time and failed" reads
more (chrono)logically.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 18 Mar 2016 15:22:32 +0000 (15:22 +0000)]
Various: Honour suite-specific TftpDiVersion
Replace references to $c{TftpDiVersion} in the general osstest code
with calls to cfg_tftp_di_version. This means that the suite-specific
d-i version will, in general, be honoured (as is correct).
In preseed_create, we also honour $ho->{DiVersion}. Often this won't
be set, but it might be (for example, by selecthost finding di_version
runvars).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 18 Mar 2016 15:18:18 +0000 (15:18 +0000)]
mg-debian-installer-update: Use getconfig_TftpDiVersion_suite
The computed value of the `tftpdiversion' shell variable is used only
to see if it is equal to `current'; if so, we update it.
Whether this is done should depend on the effective TftpDiVersion for
the specific suite, not on the default global. So use
getconfig_TftpDiVersion_suite.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Campbell [Wed, 17 Feb 2016 16:13:11 +0000 (16:13 +0000)]
Increase priority of xen-unstable-coverity
Since we are limited on the number of these we can do per week (to 2)
we would like these to happen fairly promptly after the time given in
the crontab, otherwise we can potentially end up with the Wednesday
run not actually happening until late Saturday, right before the
Sunday run which might happen right away.
Therefore specify OSSTEST_RESOURCE_PRIORITY=-15, which is right behind
xen-unstable-smoke in priority order.
We don't have much data yet but based on what we have so far
ts-coverity-build takes up to 1000s (around a quarter of an hour) and
ts-coverity-upload a little over half an hour. So including host
install (if needed, it can use a share of an existing build host if
one is around) the whole thing comes in at well under an hour, so
having this slip to the head of the queue is unlikely to cause
problems.
Also put mg-allocate and mg-blockage in the correct order in the doc.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 17 Feb 2016 10:50:01 +0000 (10:50 +0000)]
mg-show-flight-runvars: avoid "SELECT .. AND TRUE" for sqlite
c5e29f93fb6e "mg-show-flight-runvars: recurse on buildjobs upon
request" broke standalone mode with:
Error: no such column: TRUE
from sqlite. Do as is done for $syntcond and use (1=1) instead.
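A small sketch of the neutral-condition idiom, in Python with the standard sqlite3 module for illustration only (mg-show-flight-runvars is Perl). The `runvars_query` helper and the toy table contents are hypothetical.

```python
import sqlite3


def runvars_query(extra_cond=None):
    # "(1=1)" is a neutral condition which is plain SQL arithmetic, so it
    # does not depend on the database knowing a TRUE keyword.
    cond = extra_cond or "(1=1)"
    return ("SELECT name, val FROM runvars WHERE flight = ? AND "
            + cond + " ORDER BY name")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runvars (flight INTEGER, name TEXT, val TEXT)")
db.executemany(
    "INSERT INTO runvars VALUES (?, ?, ?)",
    [(1, "arch", "amd64"), (1, "buildjob", "build-amd64"), (2, "arch", "armhf")],
)

all_rows = db.execute(runvars_query(), (1,)).fetchall()
some_rows = db.execute(runvars_query("name = 'arch'"), (1,)).fetchall()
```

Keeping the `AND <condition>` slot always filled means the query text has the same shape whether or not a real extra condition is supplied.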
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:57 +0000 (14:28 +0000)]
make-flight: Use older Debian for host and guest OS with older Xen
Sometimes when updating osstest to use a newer version of Debian as a
baseline we find that the new compiler or other tools pick up latent
errors in older code bases for which the fixes are invasive or
otherwise inappropriate for a stable branch.
This is the case with Debian Jessie and Xen 4.3 and earlier, so
restrict those branches to keep using Wheezy.
This only applies to xen-X.Y-testing branches and
qemu-upstream-X.Y-testing branches since other branches all use
xen-unstable as their Xen.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:56 +0000 (14:28 +0000)]
mfi-common: usual_debianhvm_image: derive version from $guestsuite
This more likely matches the caller's intention.
Move the setting into production-config* alongside the Suite and
TftpDiVersion settings. Continue to support $DEBIAN_IMAGE_VERSION as an
override. The value for Wheezy is from what was replaced
in 610ea1628363 "Switch to Debian 8.0 (jessie) as OS for test hosts".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:55 +0000 (14:28 +0000)]
Qualify TftpDiVersion with the suite.
This allows the version to differ e.g. between Wheezy and Jessie.
Update production-config* to set TftpDiVersion_jessie instead of just
TftpDiVersion, also add TftpDiVersion_wheezy using the version
replaced in commit f610ea162836 "Switch to Debian 8.0 (jessie) as OS
for test hosts".
In mfi-common we need to check for TftpDiVersion_$suite (_$guestsuite)
and TftpDiVersion manually since getconfig in that context will not
see any DebianSuite override in the environment.
This ensures that when a non-default suite is configured a
corresponding useful version of DI is selected.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:52 +0000 (14:28 +0000)]
ts-debian-di-install: Allow Di Version to come from runvars
and following the lead of the suite arrange for a version selected
from the defaults to be written back to the runvars.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- missing s/diversion/di_version/ in ts-debian-di-install,
drop unnecessary \ wrapping from $di_path assignment ]
Ian Campbell [Mon, 18 Jan 2016 14:28:51 +0000 (14:28 +0000)]
ts-host-install: Support DiVersion coming from runvars
To do so initialise $ho->{DiVersion} in selecthost and use it in
ts-host-install.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc missing s/diversion/di_version in selecthost ]
Ian Campbell [Mon, 18 Jan 2016 14:28:50 +0000 (14:28 +0000)]
mfi-common: Always add debian_suite to debian_runvars
This adds an explicit debian_suite to some jobs which didn't already
have one, meaning that those jobs will remain the same when cloned for
a bisect and run in a tree where $c{DebianGuestSuite} has changed
since the original construction.
No expected semantic change.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:49 +0000 (14:28 +0000)]
mfi-common: always add host suite to hostos_runvars
This avoids situations where production-config* has changed
DebianSuite but the bisector is still picking up baselines etc from
before the change and reusing their runvars (without suite) with an
inconsistent config.
Switch selecthost() to use target_var when querying the suite. This
means it will check the "{ident}_suite" runvar first as before but
fallback to just looking at the "all_host_suite" runvar. We also
change the existing host_suite to all_host_suite in mfi-common so
that test_matrix_iterate() needn't worry about ident=host vs
=src_host/dst_host etc (of course this can still be overridden if
desired by using src_host_suite etc, but nowhere does).
Other uses of $c{DebianSuite} have been abolished already.
Note that "$suite != $defsuite" is not true for any current production
invocation of osstest. If this was ever true then we would have set
the host_suite runvar, whereas now we always set all_host_suite.
However any old flights with host_suite would still be interpreted
the same. Note also that the "$suite != $defsuite" case was previously
broken for the -pair tests since the host idents there are 'src_host'
and 'dst_host', so the previous code would have fallen back to
$c{DebianSuite} without looking at the host_suite runvar.
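The lookup order described above can be sketched like this, in Python for illustration only (selecthost is Perl). The function below is a hypothetical simplification which borrows the name target_var; it is not the real implementation.

```python
def target_var(runvars, ident, name):
    """Look up '<ident>_<name>' first, then fall back to 'all_host_<name>'."""
    for key in (ident + "_" + name, "all_host_" + name):
        if key in runvars:
            return runvars[key]
    return None


# Usage: a -pair job where only the source host overrides the suite.
runvars = {"all_host_suite": "jessie", "src_host_suite": "wheezy"}
src_suite = target_var(runvars, "src_host", "suite")
dst_suite = target_var(runvars, "dst_host", "suite")
```

With a single all_host_suite runvar, the src_host and dst_host idents of the -pair tests resolve without each needing its own runvar, and an old flight which has only host_suite still resolves for ident `host`.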
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 14:28:46 +0000 (14:28 +0000)]
Debian: Abolish $suite and $xopts{Suite} from preseed_* interfaces.
Generating a preseed for a suite which does not match the ->{Suite} of
the underlying guest or host object does not seem useful, so remove
this option and use ->{Suite} instead.
For guests ->{Suite} is set by debian_guest_suite() (which is called
from preseed_guest_create(), although it is often also called prior to
that) and by selectguest()
For hosts $ho->{Suite} is initialised by selecthost if we are in the
context of a $job (and if we aren't we had best not be trying to
reinstall a host).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 5 Feb 2016 09:30:39 +0000 (09:30 +0000)]
Add a weekly coverity flight
This primarily consists of ts-coverity-{build,upload} and
make-coverity-flight which constructs the sole job.
The branch is named "xen-unstable-coverity" which matches various xen*
in the cr-* scripts. Places which needed special treatment are
handled by matching xen-*-coverity, which leaves the possibility of
xen-4.7-testing-coverity etc in the future. But note that care would
be needed, since coverity's tracking of new vs existing issues would
likely be confused by uploading different branches without
differentiating them somehow (I don't know how this is supposed to work).
The most recently scanned revision is pushed to a new
coverity-scanned/master branch in the usual xen.git, tests are run on
the master branch.
I initially thought that $c{CoverityEmail} would need to be an actual
account registered with scan, however a manual experiment using
email=security@xen.org was accepted by the service. An "analysis
complete" message was sent to security@ while individual results mails
were sent to each member of the coverity project who was configured to
receive them. I think this is what we want. The "analysis complete"
mail contained no sensitive data, but also no real information other
than "success" (or presumably "failure" if that were to be the case).
I think going to security@ is probably OK.
The upload URL defaults to a dummy local URL, which will fail (it
would be possible in principle to put a stunt CGI there though). When
run with "cr-daily-branch --real" (i.e. in full on production mode)
then this is set instead to the value of CoverityUploadUrl from the
config (production-config etc). This means that adhoc and play runs
still exercise all the code (but the curl will fail) while --real runs
upload to a site-configurable location. (Note that the URL includes
the coverity project name, which would likely differ for different
instances).
I have run this via cr-daily-branch --real on the production infra
and it did upload as expected (flight 80516). Since
master==coverity-tested/master at this point it came out as a baseline
test which didn't attempt ap-push, which I would have expected to fail
anyway since it was running as my user in the colo which cannot push
to osstest@xenbits.
In my experiments the curl command took ~35 minutes to complete (rate
in the 100-200k range). Not sure if this is a problem, but use curl
--max-time passing it an hour to bound things. Note that curl is run
on the controller (via system_checked). timeout etc.
Note that the token must be supplied with </path/to/token and not
@/path/to/token. The latter appears to the server as a file upload
rather than a text field in a form which doesn't work. In early
attempts I thought that the trailing \n in /path/to/token might be an
issue and hence wrote a big comment. However having discovered < vs @
I am no longer 100% sure that is the case, but I left the comment
anyway since I can observe on the wire that the \n is included in the
upload (but each test takes ~35 mins and there is a ratelimit on the
server side too).
A final niggle is that the description field in the web ui ends up as:
80516:\ git://xenbits.xen.org/xen.git\ 9937763265d9597e5f2439249b16d995842cdf0
(i.e. spaces are \ escaped). I've confirmed with curl --trace-ascii
that the uploaded data is not escaped (this is from an earlier attempt
which did not include the flight number):
Due to the limitations on the numbers of uploads I've not experimented
with possible fixes yet (e.g. URL escaping the upload). Worst case we
either live with it or adjust the syntax to avoid the problematic
characters.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell [Tue, 19 Jan 2016 12:48:08 +0000 (12:48 +0000)]
make-flight: Support specifying a mini-os tree+revision
This is useful for standalone or adhoc use as well as (presumably)
bisection.
There is no ap-* or cr-daily-* integration here because I didn't need
it (i.e. I'm not intending to create a new mini-os branch here).
In order to cope with Xen <= 4.5 where extras/mini-os exists but is
part of xen.git and not something cloned from elsewhere add a
$optional argument (itself optional) to dir_identify_vcs which if true
causes dir_identify_vcs to return 'none' instead of failing.
Previously dir_identify_vcs failed with:
bash: line 5: fail: command not found
because the fail command is undefined. Instead echo fail and use that
to trigger the $optional handling.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 18 Jan 2016 15:54:15 +0000 (15:54 +0000)]
stop allowing libvirt failures
In Feb/Mar 2015 (not long after adding the libvirt tests) we appear to
have added test-@@-libvirt@@ to the set of allowed failures in
response to some issues with libvirtd crashing.
However looking at the history of test-@@-libvirt@@ on all branches
both in the COLO and in Cambridge (which was the production instance
back then) I don't see any evidence that this issue is still ongoing
(which matches my recollection of it having been fixed).
Therefore remove the entries allowing libvirt failures.
This effectively reverts:
00023a5af6ff allow files: Allow all libvirt test failures on other branches
83b8c8eafb18 allow.all: Do not regard libvirt guest start failures as regressions
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 6 Jan 2016 11:08:43 +0000 (11:08 +0000)]
sg-report-job-history: alternate color of osstest column only when it changes
Currently the bgcolor of the osstest column alternates on each line,
rather than only when it changes as the other revision columns do.
A given flight might touch multiple osstest revisions (although in
practice they rarely do) but it seems reasonable to simply consider
any change as a change.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 20 Jan 2016 15:06:20 +0000 (15:06 +0000)]
Debian: erase-other-disks: erase partitions first
It seems that when sdX is zeroed there is some chance that sdX[0-9]
will disappear before we get to them.
When partman comes along and recreates the partitions it is likely
that they will occupy the same disk space as before (since d-i's
autopartition is deterministic), meaning that LVM will find the old
PV headers again.
This is in particular problematic on multi disk systems where we end
up with an LV spanning sda5 and sdb. sdb is successfully erased here
but sda5 is not, however LVM will still find the LV with missing PV,
which is sufficient to trigger partman-lvm's checks for erasing
devices which weren't explicitly listed, resulting in:
!! ERROR: Unable to automatically remove LVM data
Because the volume group(s) on the selected device also consist of physical
volumes on other devices, it is not considered safe to remove its LVM data
automatically. If you wish to use this device for partitioning, please remove
its LVM data first.
which cannot be preseeded around.
If the autopartitioning is not deterministic (as might be the case
when installing a different version of Debian to last time) then
going from layout A -> B -> A' risks B (by chance) not destroying the
headers created by A, meaning that A' will find them and suffer again
from the problem above. This is handled via the use of
ts-host-install-twice which will cause A' to run twice, i.e. A -> B
-> (A' -> A''). In this case A' will fail as above, but A'' will
start up seeing the partition layout put in place by A' (which matches
A) and erase those partitions, leading to success later on.
Also erase partitions for all sd/hd? not just sda+hda.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>