Ian Jackson [Thu, 22 Dec 2016 13:43:01 +0000 (13:43 +0000)]
ts-xen-install: Pass `noreboot' to Xen
This prevents Xen from rebooting the host, if Xen crashes.
This reboot serves no function in osstest, since a crashed host will
be automatically power cycled to recover it. (Firstly, during log
collection, a renewed attempt to boot from the hard disk; then, during
the next test, netboot to wipe the machine to reinstall it.)
But the reboot does make logs more confusing, and we suspect that the
reboot loops which can occur (eg if the version of Xen and Linux being
tested always crashes on boot) might be implicated in our test boxes
occasionally forgetting their boot order.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 20 Dec 2016 17:15:26 +0000 (17:15 +0000)]
db retry: Use HandleError and exceptions to detect when to retry
It appears that sometimes, $dbh->state could be overwritten before
$mjobdb->need_retry got to run. $dbh->err and $@ would still be
right. I have not been able to explain this; I suspect that there is
something exciting going on in the eval exception trapping.
To try to isolate the problem, I developed this different approach: we
use the HandleError database handle feature, to cause all the
retriable errors to be thrown as a dedicated exception class.
We can then simply test ref $@. (We don't care about subclassing
here. And we don't want isa because $@ might just be a string.)
This is, in general, more reliable, as it cannot treat any other
failures as db retry failures.
Osstest::Executive and Osstest::JobDB::Executive become slightly more
intertwined with this patch.
Sadly this does not seem to completely eliminate the problem. It does
allow us to present clearer evidence of problems in the underlying
DBI, DBD or PostgreSQL layers...
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 20 Dec 2016 17:29:57 +0000 (17:29 +0000)]
db retry, bisect: Cache build reuse investigations
If we previously searched for builds to reuse, trust our previous
answers. We will only have seen data from committed transactions and
we will only have looked at jobs in completed flights, which won't
have changed.
So any previously reusable build is still reusable. (Unless its
stash check failed, in which case we might want to search again if we
do a db retry.) We don't care much about missing any
recently-finished jobs - there is a much bigger and unavoidable race
there anyway, where multiple bisections on different branches may
choose to pointlessly rebuild the same thing at the same time.
Not doing this search over and over again is important because it is a
very wide ranging search, which will often cause database transaction
serialisation errors. Without some caching here, we may never
converge.
In principle we could do this another way: we could make a readonly
transaction which did all the searching. But that's a more awkward
way to organise the code because our search uses a temporary table
which we then construct the flight from.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 20 Dec 2016 17:21:16 +0000 (17:21 +0000)]
db retry, bisection: Reset %jobs_created on db retry
%jobs_created is used for memoisation while populating the destination
flight. We need to reset it when we restart flight construction,
because those jobs were created in the discarded transaction.
Otherwise we could create a flight with missing jobs.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
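The memoisation hazard described in the %jobs_created commit above can be sketched in shell. This is only an illustration of the idea, not the Perl code in osstest: the job names, the `create_job` and `populate_flight` helpers, and the simulated "first attempt is rolled back" rule are all hypothetical.

```shell
jobs_created=''     # memo: names of jobs created in the current attempt
created_count=0

create_job () {
    case " $jobs_created " in
        *" $1 "*) return 0 ;;        # memo hit: do not create it again
    esac
    jobs_created="$jobs_created $1"
    created_count=$((created_count + 1))
}

populate_flight () {
    # Jobs from a discarded transaction no longer exist, so forget them.
    jobs_created=''
    created_count=0
    create_job build-amd64
    create_job test-amd64-amd64-pv
    create_job build-amd64           # referenced twice, created once
}

attempt=0
while :; do
    attempt=$((attempt + 1))
    populate_flight
    # Simulated outcome: the first attempt is rolled back and retried.
    [ "$attempt" -ge 2 ] && break
done

echo "attempt $attempt created $created_count jobs"
```

If `populate_flight` did not clear `jobs_created`, the second attempt would treat every job as already created and the committed flight would be missing them.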
Ian Jackson [Mon, 12 Dec 2016 13:21:53 +0000 (13:21 +0000)]
mg-schema-test-database: Wrap some withtest psql_do in subshells
Otherwise the effect of withtest persists for the rest of the script,
which is not what is wanted! As it happens, there are no accesses to
the real db after this point, so this bug is latent.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
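A minimal sketch of why the subshell matters in the mg-schema-test-database change above. The `withtest` here is a hypothetical stand-in that merely sets a shell variable before running its arguments; the real helper may work differently, and `DB_TARGET` and `show_target` are invented for the illustration.

```shell
DB_TARGET=real

# Hypothetical stand-in: select the test database, then run the command.
withtest () { DB_TARGET=test; "$@"; }
show_target () { printf '%s\n' "$DB_TARGET"; }

# Wrapped in a subshell: the assignment is confined to the parentheses.
( withtest show_target )
after_subshell=$DB_TARGET

# Unwrapped: the assignment persists for the rest of the script.
withtest show_target
after_plain=$DB_TARGET

printf 'after subshell: %s\nafter plain call: %s\n' \
    "$after_subshell" "$after_plain"
```

Both calls run `show_target` against the test setting, but only the unwrapped call leaves `DB_TARGET` changed afterwards.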
Ian Jackson [Mon, 12 Dec 2016 13:18:29 +0000 (13:18 +0000)]
mg-schema-test-database: Revamp sequence handling
The initial value (at creation time) of a sequence appears in the
schema, but is not of any consequence. To avoid the schema diff check
failing in databases created in a slightly different way, it is
necessary to copy the actual original initial sequence value for each
sequence.
Replace the sequence handling code with a setup which, for each
sequence, copies the START WITH and calculates a fresh RESTART WITH.
This replaces both the unconditional copy (done with pgdump) and the
special calculation of the next flight number. Now all sequences have
the "bump the number somewhat" treatment, which seems nice.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
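As a rough sketch of the per-sequence treatment in the commit above: keep the original START WITH and compute a bumped RESTART WITH. The sequence name, the input values and the size of the bump are all hypothetical; the commit does not say how "somewhat" is calculated. The generated statement uses the standard PostgreSQL ALTER SEQUENCE syntax.

```shell
# Print an ALTER SEQUENCE statement for one sequence.
#   $1 = sequence name, $2 = original START WITH, $3 = current last value
seq_alter_sql () {
    name=$1
    orig_start=$2
    last_value=$3
    bump=1000                        # hypothetical "bump somewhat" amount
    restart=$((last_value + bump))
    printf 'ALTER SEQUENCE %s START WITH %s RESTART WITH %s;\n' \
        "$name" "$orig_start" "$restart"
}

sql=$(seq_alter_sql flights_flight_seq 1 50000)
printf '%s\n' "$sql"
```

Copying START WITH keeps the schema diff quiet, while the bumped RESTART WITH is what actually determines the next value handed out.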
Ian Jackson [Wed, 30 Nov 2016 13:58:56 +0000 (13:58 +0000)]
Executive database: stub out use of LOCK TABLES
We want to improve database performance, and one of the problems is
excessive locking. PostgreSQL now has predicate locking, and we
have, we think, eliminated all the places that do not handle a
database transaction failure. So we can rely on optimistic
concurrency.
So, eliminate all uses of LOCK TABLES.
However, I'm not quite sure that all of the above is actually true -
particularly in relation to our own error handling. So, we want to
leave ourselves an escape hatch and an easy reversion path.
The approach adopted is to change the semantics of the transaction
support routines (one in Perl, and one in Tcl) so that the meaning of
all the existing call sites is changed to "do not lock any tables".
But the facility for table locking is retained and any call sites
which still need locking or fixing can use a new parameter format to
say they actually want the locking.
Hopefully this will turn out to be unnecessary. In that case, in due
course, we can strip out all the locking machinery, abolish all the
corresponding parameters, and so on.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 9 Dec 2016 19:08:14 +0000 (19:08 +0000)]
sg-check-tested: Lift work into new `search' sub, and indirect output
* Replace open-coded prints to stdout with calls to new `output' sub
* Move main body into new `search' sub
* Exit from `search' with `return' rather than `exit 0'
Overall, no functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 9 Dec 2016 18:53:14 +0000 (18:53 +0000)]
sg-execute-flight: Lift db work into retry loop
* Open $mro.new at the start of the loop, rather than during
argument parsing. So if we retry, it gets rewritten.
* Move the stdout output to the end of the script.
We tolerate that the html is moved into place even though we might be
about to rewrite it.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 9 Dec 2016 17:03:10 +0000 (17:03 +0000)]
Executive database: Handle database transaction retry in resourcecall
Previously, call failures of $resourcecall were fatal. But it might
fail due to a need for db retry. In that case, simply rethrow the
exception for handling by db_retry.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 5 Dec 2016 15:18:57 +0000 (15:18 +0000)]
tcl: Abolish with-db in favour of transaction
Everything needs to be in a proper transaction, with retries.
Replace the one call site which does an ad hoc BEGIN/COMMIT with a
call to transaction. The body here is already idempotent, so making
it be a loop body is fine.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Mon, 5 Dec 2016 15:33:40 +0000 (15:33 +0000)]
tcl: sg-execute-flight: Reorganise check to use a transaction
We are going to remove the locking. with-db is going to have to go.
Make this a transaction retry loop. That means moving the
non-idempotent parts out of the transaction loop.
The changed control flow means that:
* If we are done, the 2nd query is now run uselessly. It ought not
to find any jobs because no jobs ought to become queued during
the transaction. So this is harmless.
* Just for clarity, we calculate nrunning again after the transaction.
Only nqueued is an output from the transaction, formally. Of course
the transaction can update running but that is not in the database.
* We have to bless the flight outside the transaction, or we are
creating (and mishandling) reentrant transactions.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 30 Nov 2016 13:59:01 +0000 (13:59 +0000)]
Executive database: set isolation level in Perl
The Perl was lacking SET TRANSACTION ISOLATION LEVEL SERIALIZABLE,
which is sadly not the default. Currently that does not matter
because of all the table locking, but we are about to abolish that.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 7 Dec 2016 13:23:52 +0000 (13:23 +0000)]
tcl: JobDB-Executive: Do not crash due to ROLLBACK on transaction retry
If the transaction fails, we have called db-close by the time we
discover it's for retry, so the dbh may be closed (depending whether
we have a persistent dbh).
So run ROLLBACK only if dbopen indicates that the db connection is
still open.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 29 Nov 2016 16:57:50 +0000 (16:57 +0000)]
rumprun: Be more optimistic in allows
The old rumpuserxen allow file (which wants to override the global
ignore for the xenstorels.repeat failure) was no longer effective
because the branch name (in the filename) was out of date. Fix that.
Also no longer ignore one-off xenstorels failures in the primary
branches. The failure probability seems really very low now.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Avoids this crash, at the end of the job:
expected boolean value but got ""
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 8 Nov 2016 15:06:12 +0000 (15:06 +0000)]
ts-xen-build: Enable CONFIG_EXPERT
This requires an environment variable set in the build environment,
too. (There is an argument amongst hypervisor maintainers about
whether this requirement in xen.git is a good idea; but, nevertheless,
it is currently there in several existing trees, so we need to set
it.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
standalone-reset: Use the $suite when recreating soft links.
Commit ef3a6f2162ced5cfeb08b437315b69ad1ddbc5ed:
"Add -$suite suffix to TftpDiVersion in code"
forgot to include the $suite parameter when re-linking
current-$suite against the <date>-$suite.
Specifically after we have downloaded the debian files the
$TftpDir has:
konrad konrad 4096 paź 26 13:29 2016-10-26-jessi
konrad konrad 17 paź 26 13:29 current-jessie -> 2016-10-26-jessi
we end up removing the symlink (current-jessie) and then
recreating it as:
konrad konrad 4096 paź 26 13:29 2016-10-26-jessi
konrad konrad 10 paź 26 13:29 current-jessie -> 2016-10-2
which is wrong as there is no '2016-10-2' directory.
The patch is to add the $suite in the linking.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
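The relinking problem in the standalone-reset commit above can be reproduced in a scratch directory. This is a sketch with illustrative values (suite `jessie`, date `2016-10-26`, a temporary directory standing in for the Tftp directory), not the script's actual code.

```shell
tftpdir=$(mktemp -d)                 # scratch stand-in for the Tftp directory
suite=jessie
date=2016-10-26
mkdir "$tftpdir/$date-$suite"

# Buggy form: the link target omits the -$suite suffix, so it dangles.
ln -s "$date" "$tftpdir/current-$suite"
if [ -d "$tftpdir/current-$suite" ]; then buggy_ok=yes; else buggy_ok=no; fi

# Fixed form: the target carries the same suffix as the directory.
rm -f "$tftpdir/current-$suite"
ln -s "$date-$suite" "$tftpdir/current-$suite"
target=$(readlink "$tftpdir/current-$suite")
if [ -d "$tftpdir/current-$suite" ]; then fixed_ok=yes; else fixed_ok=no; fi

echo "buggy link resolves: $buggy_ok; fixed link resolves: $fixed_ok ($target)"
rm -rf "$tftpdir"
```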
README: Mention the / requirement in Tftp[Tmp|Pxe]Dir
That is the / should be part of the directory name. Otherwise
we get strange files such as: pxelinux.cfgC0A86A3C
in the TftpPath directory.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
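To illustrate the trailing-slash requirement from the README commit above: if the configured directory is simply prefixed to the per-host file name, a missing slash fuses the two. The sketch below assumes plain string concatenation and assumes that C0A86A3C is the hex form of a host address (192.168.106.60); both are inferences, not taken from the osstest source.

```shell
# Upper-case hex rendering of the four address octets.
ip_hex=$(printf '%02X%02X%02X%02X' 192 168 106 60)

pxe_without_slash='pxelinux.cfg'
pxe_with_slash='pxelinux.cfg/'

wrong="${pxe_without_slash}${ip_hex}"    # a stray file next to the directory
right="${pxe_with_slash}${ip_hex}"       # a file inside the directory

printf '%s\n%s\n' "$wrong" "$right"
```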
---
v2: Slightly improved wording.
If the user forgot to include 'dhcp3' on the parameter
line point out the error to the user.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Still print out the arguments to the constructor, hopefully
non-confusingly.
In the Xen Project colo, we do not grant our hosts (including the
controller) general internet access.
We did have the IP address of scan.coverity.com configured in our
firewall. But Coverity use a CDN now and the IP address is not even
slightly stable any more.
Using the squid works (using CONNECT). So do that.
I have not been able to test this end-to-end, since I didn't want to
do a real upload. However I have copied and pasted the command line that
ts-coverity-upload now attempts, and modified it slightly, and
verified that it then manages to get the appropriate 401 error from
scan.coverity.com.
Deployment note: Sites doing Coverity uploads who have an http proxy
configured but which do not want to use it for these uploads need to
set the config option CoverityHttpProxy to the empty value.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Jackson [Mon, 17 Oct 2016 14:01:07 +0000 (15:01 +0100)]
README: Be less confusing about Tftp settings
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Give the scope-free version as the basic explanation
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
Ian Jackson [Tue, 18 Oct 2016 15:46:20 +0000 (16:46 +0100)]
mgi-common: Support empty (unset) HttpProxy properly
mg_update_proxy ends up being set to the empty string, so the
${...:+-x} form is needed to expand only non-empty values to `-x'.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
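The difference between the two parameter expansion forms mentioned in the mgi-common commit above is easy to check in isolation. The variable names and the proxy URL below are placeholders, not the ones used in mgi-common.

```shell
empty=''
full='http://proxy.example:3128'

# With the colon, a set-but-empty value counts as absent.
colon_empty="${empty:+-x}"
colon_full="${full:+-x}"

# Without the colon, any set value (even an empty one) triggers `-x'.
nocolon_empty="${empty+-x}"

printf 'colon/empty=[%s] colon/full=[%s] nocolon/empty=[%s]\n' \
    "$colon_empty" "$colon_full" "$nocolon_empty"
```

So only the `:+` form avoids passing a bare `-x` when the proxy setting is present but empty.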
Ian Jackson [Tue, 18 Oct 2016 15:22:33 +0000 (16:22 +0100)]
standalone-reset: mkdir some directories
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
Ian Jackson [Tue, 18 Oct 2016 14:50:16 +0000 (15:50 +0100)]
standalone-reset: Fix when TftpDiVersion not set in config
In b8134c7fa60d "mg-debian-installer-update: Print the correct value
for TftpDiVersion", the output of mg-debian-installer-update was
changed to be a config fragment. But standalone-reset expected it to
be just the date value, and was not updated.
Update it now. And leave a comment in mg-debian-installer-update to
stop this happening again.
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Oct 2016 14:33:03 +0000 (15:33 +0100)]
support check: Reverse sense of return values
The toolstack()->check_blah functions would return an exit status.
This is very confusing. Instead, have them return a booleanish value
representing the support status: ie, truthy if supported.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Oct 2016 14:24:45 +0000 (15:24 +0100)]
support check: Provide helper function to print nice log message
Makes ts-migrate-support-check and ts-saverestore-support-check
slightly clearer.
This function takes $yes, which is truthish if the feature is
supported. We are going to replace use of exit status truth values in
the various check functions in just a moment.
No functional change other than to log output.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 6 Oct 2016 16:38:29 +0000 (17:38 +0100)]
libvirt: Check /capabilities/host/migration_features/live for live migration
libvirt is capable of advertising this separately from
/capabilities/host/migration_features, so if save/restore is supported
but live migration is not, this will do the right thing.
We would have preferred libvirt to advertise
/capabilities/host/migration_features/save
or something, but it doesn't right now, so we continue to use
/capabilities/host/migration_features
to detect save/restore support.
If libvirt changes its feature presentation, then at some future point
we should change osstest too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Martin Kletzander <mkletzan@redhat.com>
CC: Jim Fehlig <jfehlig@suse.com>
---
v3: Call correct function name.
Ian Jackson [Tue, 4 Oct 2016 16:24:17 +0000 (17:24 +0100)]
libvirt: Do not attempt save/restore when migration not advertised
Currently, osstest wrongly thinks that ARM can do save/restore,
because `virsh help' does mention the save command (on all
architectures).
So, additionally, check the virsh capabilities xpath
/capabilities/host/migration_features
to try to see whether this host supports migration.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v3: Removed questioning and uncertain remarks from commit message,
following appropriate confirmation from libvirt folks.
Get sense of conditional combination right.
Ian Jackson [Tue, 4 Oct 2016 16:15:55 +0000 (17:15 +0100)]
libvirt: Check migration capabilities using proper XML parser
Do not grep the virsh capabilities output (!) Instead, parse the XML
using perl's XML modules and look for the specific feature flag using
an XPATH pattern.
Xen could in principle (and is expected to, in the future, on ARM)
support save/restore but not live migration. Currently it supports
neither on ARM. libvirt's capabilities system does in principle
capture this distinction, but only in an ad hoc way.
For now, this osstest commit has no ultimate functional change (with
libvirt output as it currently appears on our real hosts).
Deployment note: Requires libxml-libxml-perl to be installed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Jim Fehlig <jfehlig@suse.com>
---
v3: Mention newly-required Perl libraries in README and commit message
Get answers and syntax right (!)
Ian Jackson [Thu, 6 Oct 2016 18:39:02 +0000 (19:39 +0100)]
make-flight: XTF: honour $bfi (ie build flight)
If make-flight is run with a $buildflight argument, it does not create
any build jobs. The test jobs are supposed to refer to the build jobs.
This was not done correctly for the XTF tests. Add the missing ${bfi}.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 5 Oct 2016 13:39:11 +0000 (14:39 +0100)]
Support guest-specific "toolstack" for guest creation
Some guests need creation in a special way. For example, rump kernels
are ideally started with rumprun. Honour a guest var which specifies
a toolstack name.
Osstest::TestSupport::toolstack now takes an optional $gho so it can
do this lookup when appropriate.
After creation the guest is necessarily managed with the toolstack for
the host, so we honour this (ie we pass the $gho) only for create.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 29 Sep 2016 12:57:05 +0000 (13:57 +0100)]
rump-test-net: setsockopt V6ONLY off
NetBSD (unlike Linux) has the V6ONLY socket option turned on by
default. So to work in the rump kernel environment when tested with
IPv4 we need to adjust this setting.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 29 Sep 2016 10:36:23 +0000 (11:36 +0100)]
rump-test-net: New test program
The rump kernel WOPR test is no more, so we reimplement it. This test
program simply listens on a TCP socket and says hi when you connect to
it. It's a portable program. So far, this has been tested on Linux,
but not in the rump environment.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 6 Oct 2016 15:49:28 +0000 (16:49 +0100)]
mg-allocate: Provide command line way to list allocated resources
Freely shareable resources don't appear in the plan, and the plan is
not always immediately updated, and is generally not always a
convenient interface. Provide a command line way to list allocated
resources.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Thu, 6 Oct 2016 12:10:20 +0000 (13:10 +0100)]
sg-report-flight: Avoid some warnings when reporting unexecuted jobs
If no steps in a job are executed, there can be a failure with a
synthetic step row, containing a stepno of ''. This causes a perl
warning when compared with <=>:
Argument "" isn't numeric in numeric comparison (<=>) at ./sg-report-flight line 774.
Fix this by replacing falseish values with 0.
Bug introduced in 0e09a8b00ec6 "sg-report-flight: Report earlier,
earlier step failures".
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 4 Oct 2016 18:09:18 +0000 (18:09 +0000)]
host allocation: Fix duration estimate to not include host allocation
In 720f08cb9052 "Executive: Previous duration estimator: use overall
time, not sum of steps" we introduced a bug: the condition to exclude
the host allocation time is now not effective if there are any steps
before host allocation. Usually there are.
This means that the host allocation duration estimator has been
including the host allocation time from previous jobs, which is quite
wrong.
Fix this by subtracting the maximum duration of any host allocation
step. Hopefully there will only be one.
If any host allocation runs concurrently with other steps (including
other host allocations) then this will start to give wrong answers.
But there are other reasons why we wouldn't want to do that.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: Fix sql syntax.
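The arithmetic of the duration estimate fix above, as a small sketch. The real change is in SQL; here the step names, the durations and the overall time are made-up figures, and `hosts-allocate` is assumed to be how a host allocation step is identified.

```shell
# One "step-name duration-in-seconds" pair per line (made-up figures).
steps='build-prep 40
hosts-allocate 900
host-install 600
guest-start 120'

overall=1660                         # overall job time, including allocation

# Largest duration among the host allocation steps (0 if there are none).
max_alloc=$(printf '%s\n' "$steps" |
    awk '$1 ~ /^hosts-allocate/ && $2 > m { m = $2 } END { print m + 0 }')

estimate=$((overall - max_alloc))
echo "estimate excluding host allocation: ${estimate}s"
```

As the commit message notes, this stays correct only while host allocation does not overlap with other steps.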