Ian Jackson [Fri, 18 Sep 2015 14:35:47 +0000 (15:35 +0100)]
ts-debian-hvm-install: Set $gsuite after $gho
$gsuite was set from guest_var, but before $gho was set, leading to an
undefined value warning from Perl.
This would ignore any guest-specific suite runvars. AFAICT these are
set by some of the jobs in make-distros-flight. I think the effect of
this change is to apply workarounds for the intended suite, rather
than for wheezy.
(Although there is another assignment to $gho later in
ts-debian-hvm-install, for stage 2, the stage 2 code does some trivial
TestSupport calls and does not need $gsuite. So there is no need to
make arrangements to assign to $gsuite - or, for that matter, $kernel
or $ramdisk, in that path.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 18 Sep 2015 14:30:57 +0000 (15:30 +0100)]
ts-debian-hvm-install: Cope with images containing only isolinux
debian-7.2.0-i386-CD-1.iso contains no grub, only isolinux.
If the specified EFI grub file does not exist, fall back to isolinux.
This requires a -c option as well, according to
https://wiki.debian.org/DebianInstaller/Modify/CD
Only try to set up a grub config if we are booting grub. (The i386
image in question does not contain a [debian]/boot/grub directory.)
If boot/grub/efi.img _does_ exist (ie, for other existing tests), the
only difference in behaviour is to reorder slightly the options to
genisoimage: `-b boot/grub/efi.img' now occurs after `-no-emul-boot
-r' rather than before.
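
The fallback described above can be sketched roughly as follows. This is an illustration in Python, not the Perl in ts-debian-hvm-install; the function name and the isolinux paths (isolinux/isolinux.bin, isolinux/boot.cat) are assumptions based on the usual Debian CD layout, and `exists` is an injectable hook added only so the logic can be exercised without a real ISO tree.

```python
import os


def genisoimage_boot_args(iso_root, grub_image="boot/grub/efi.img",
                          exists=os.path.exists):
    """Return (bootfile, genisoimage options) for an unpacked ISO tree.

    Prefer the grub EFI image; if it is absent, fall back to isolinux.
    The isolinux paths are assumptions, not taken from the commit.
    """
    # The common options now come first; -b follows them in both cases.
    args = ["-no-emul-boot", "-r"]
    if exists(os.path.join(iso_root, grub_image)):
        bootfile = grub_image
        args += ["-b", bootfile]
    else:
        bootfile = "isolinux/isolinux.bin"
        # isolinux additionally needs a boot catalogue (-c).
        args += ["-b", bootfile, "-c", "isolinux/boot.cat"]
    return bootfile, args
```

A caller would only go on to write a grub config when the returned bootfile is the grub image.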
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: Log $bootfile value.
Preseed generation now happens later due to previous patch;
so $bootfile setting now also deferred. Context change only.
Ian Jackson [Mon, 21 Sep 2015 14:13:23 +0000 (15:13 +0100)]
ts-debian-hvm-install: Defer preseed generation
Defer preseed file generation until after we have fetched and looked
inside the install image, because we are going to want to make changes
to the preseed file based on the image contents.
No overall functional change, although some things happen in a
different order now, and the ISO manipulation takes place in two calls
to target_cmd_root rather than one.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: New patch. Needed because otherwise the test for the grub install
image (to be introduced in the next patch) happens before the ISO
is unpacked, and we would then always fall back to isolinux.
Ian Jackson [Fri, 18 Sep 2015 13:57:47 +0000 (14:57 +0100)]
ts-debian-hvm-install, etc.: Do not hardcode in-iso path
ts-debian-hvm-install hardcoded `install.amd' as the directory in the
.iso in which to find the kernel and initrd. This is wrong for
architectures other than amd64.
Instead, pass this information in runvars (as is done for the netinst
installs in make-distros-flight), and honour it in
ts-debian-hvm-install.
If the runvars are not set, default to the previous hardcoded values.
(This arranges that clones of old flights still work with new osstest,
eg for bisection.)
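
As a small illustration of the defaulting rule (Python, not the actual Perl): the runvar name `<guest>_isodir` below is hypothetical, chosen only for the sketch; the default `install.amd` is the previously hardcoded value mentioned above.

```python
def iso_boot_dir(runvars, guest):
    """Directory inside the .iso holding kernel and initrd.

    Falls back to the old hardcoded value so that clones of old
    flights, which lack the runvar, keep working.
    The runvar name is an assumption made for this example.
    """
    return runvars.get(f"{guest}_isodir", "install.amd")
```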
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 8 Jul 2015 15:21:05 +0000 (16:21 +0100)]
ap-*: Be able to fetch and push xen.git#smoke
The branches and push gates are now:
xen.git#staging -[xen-unstable-smoke]-> #smoke -[xen-unstable]-> #master
Deployment note: When this passes the osstest self-push-gate, the main
xen-unstable flight will start using smoke as an input. Therefore,
until the new cronjob is installed to run the xen-unstable-smoke
tests, an automatic process should keep xen.git#smoke up to date with
xen.git#staging. Eg, running in screen in xen@xenbits:~/git/xen.git:
while sleep 1800; do git fetch . staging:smoke; done
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: `xen.git#smoked' branch name changed to `#smoke'
Ian Jackson [Thu, 17 Sep 2015 15:17:25 +0000 (16:17 +0100)]
cr-daily-branch: Use mg-adjust-flight-makexrefs to have smoke tests reuse builds
The smoke tests are for testing xen-unstable. We want to avoid
building anything else. So arrange to reuse previous builds by
calling mg-adjust-flight-makexrefs.
We rebuild libvirt too. This is necessary because libvirt is built
against xen.git, and uses ABI-unstable APIs, so we need a libvirt
built against the right xen.git. This means, for the smoke tests, we
need to build libvirt ourselves. Currently this build seems to take
416 sends (from host allocation, which we - perhaps naively - hope
will be able to reuse the host from the just-finished build job).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: Keep build-amd64-libvirt too.
v3: Add a comment about the --blessings=real
v2: New patch
Ian Jackson [Fri, 3 Jul 2015 18:56:45 +0000 (19:56 +0100)]
Provide xen-unstable-smoke branch
Introduce support for branch=qemu-xen-unstable-smoke which has
xenbranch=xen-unstable-smoke.
In make-flight, this contains a very limited set of jobs
test-amd64-amd64-libvirt
test-amd64-amd64-xl-qemuu-debianhvm-i386
test-armhf-armhf-xl
and the builds they depend on.
The debianhvm job exists only in this flight, and is generated by
having branch_debianhvm_arch return i386 instead of amd64. This is so
that this branch contains a 32-bit x86 guest as well as a 64-bit one.
We override host allocator parameters to make this flight not care
about host stickiness: it just takes whatever comes to hand. These
runvars are marked `synth' so that cs-bisection-step and
cs-adjust-flight do not copy them, as discussed in previous patches.
Later we will arrange to reuse previous builds for the build artefacts
which aren't intended subjects of the smoke test.
(Deployment note: This needs images/debian-7.2.0-i386-CD-1.iso which I
have already placed in the Cambridge and Xen Project instances.)
In ap-common we need to arrange to use the same qemu trees as for
xen-unstable, rather than looking for special smoke ones.
In select_xenbranch xen-unstable-smoke is mostly like xen-unstable.
There are only two places in osstest where xenbranch `xen-unstable' is
treated specially and only one of them needs adjusting to match
xen-unstable-smoke too.
The new branch `xen-unstable-smoke' has a `prev' branch of
`xen-unstable' according to cri-getprevxenbranch, which is technically
wrong, but this is not important because xen-unstable-smoke has no
prev tests.
We are going to sort out the push gate ref plumbing in xen.git in the
next osstest patch.
Also, use a branch-settings file to set the new branch's resource
priority to -20 to make it run ahead of anything else automatic.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: Introduce xenbranch_forqemu into ap-common.
Combine patches for make-flight and cr-*. make-flight includes
cri-common and ap-common (which is arguably a layering violation,
but there we are).
Dropped ack from Ian Campbell.
Set `qemuubranch' correctly in select_xenbranch.
v2: Generate all the jobs that this flight's tests use, and add
note about this to the commit message.
Mention `synth'-ness of hostalloc runvars in commit message.
Image is in Xen Project test colo too.
Ian Jackson [Fri, 3 Jul 2015 18:55:32 +0000 (19:55 +0100)]
make-flight: Run job_create_test_filter_callback on true job name
job_create_test would pass $job to job_create_test_filter_callback but
then later maybe append -xsm to it. Fix this.
No functional change for existing in-tree code because all existing
tests of the $job end in *.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Fix textual conflict after dropping "make-flight: Allow separate
specification of pre-built Xen vs others"
Ian Jackson [Thu, 17 Sep 2015 12:43:17 +0000 (13:43 +0100)]
cs-adjust-flight: Provide `jobs-del' operation
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Document the new operation
v2: New patch
Ian Jackson [Wed, 16 Sep 2015 15:48:05 +0000 (16:48 +0100)]
sg-check-tested: New --pass-job= option
Specifies that returned information should relate to a flight in which
a particular job existed and passed. The option can be repeated if
desired (to specify flights in which _all_ those jobs passed).
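
The intended semantics of repeating the option can be sketched like this. It is a Python illustration over an in-memory mapping, not the SQL that sg-check-tested runs; the data layout and function name are assumptions.

```python
def flights_passing_all(flights, pass_jobs):
    """Return flight numbers in which every job in pass_jobs exists and passed.

    flights maps flight number -> {job name: status}; a job that is
    missing from a flight counts as not having passed.
    """
    return sorted(
        flight
        for flight, jobs in flights.items()
        if all(jobs.get(job) == "pass" for job in pass_jobs)
    )
```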
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: New patch
Ian Jackson [Mon, 21 Sep 2015 13:35:15 +0000 (13:35 +0000)]
sg-report-flight: Better searching for used revisions
The old algorithm used for determining which flight might be a
suitable test of a particular revision was rather crude, in two ways:
* It would look at _all_ jobs in a flight referred to from the flight
of interest, not just at the relevant jobs;
* It would only look at the direct referents of the flight in
question. So for example, if a flight of interest contained
test-amd64-i386-libvirt, it would find a referenced
build-i386-libvirt in another flight, but that build refers to
build-i386, and it would not look at that (unless it happened to be
in the same flight).
Fix this by redoing the revision archaeology, with some $why tracking
to explain how we found a particular revision.
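
A rough sketch of the transitive search with `why' tracking, in Python rather than the Perl of sg-report-flight. It assumes that build references are runvars whose names end in `buildjob' with values of the form `flight.job' (or just `job' for the same flight) and that revisions are in `revision_*' runvars; the function name and data layout are illustrative only.

```python
def find_revisions(jobs, start_flight, start_job):
    """Walk build references transitively, starting from one job.

    jobs maps (flight, job) -> {runvar name: value}.
    Returns {tree: (revision, why)}, where why records the chain of
    jobs through which the revision was found.
    """
    found = {}
    seen = set()
    todo = [(start_flight, start_job, f"{start_flight}.{start_job}")]
    while todo:
        flight, job, why = todo.pop()
        if (flight, job) in seen:
            continue
        seen.add((flight, job))
        for name, value in sorted(jobs.get((flight, job), {}).items()):
            if name.startswith("revision_"):
                found.setdefault(name[len("revision_"):], (value, why))
            elif name.endswith("buildjob"):
                ref_flight, _, ref_job = value.rpartition(".")
                ref_flight = int(ref_flight) if ref_flight else flight
                # Only the referenced job is followed, not its whole flight.
                todo.append((ref_flight, ref_job,
                             f"{why} -> {ref_flight}.{ref_job}"))
    return found
```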
cs-bisection-step and sg-check-tested arguably ought to do it this
way too. But I am leaving centralising this new logic, and using it
in those other programs, for another day.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: New patch
Ian Jackson [Thu, 2 Jul 2015 17:27:28 +0000 (18:27 +0100)]
ts-host-alloc-Executive: Honour various hostalloc_* runvars
We honour
hostalloc_maxbonus_variation
hostalloc_bonus_previousfail
hostalloc_bonus_sharereuse
and make them default to their previous values.
These should be set as `synth' runvars during flight construction, so
that they are not copied into flights generated by cs-bisection-step
or cs-adjust-flight. cs-bisection-step makes its own arrangements for
host specification. So should the caller of cs-adjust-flight (perhaps
via the blessing system).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Incorporate note about `synth'-ness in commit message.
Ian Jackson [Wed, 16 Sep 2015 12:14:47 +0000 (13:14 +0100)]
standalone-generate-dump-flight-runvars: Show synth runvars
Pass -a to mg-show-flight-runvars. That way when we use the new
cs-job-create feature to set synth runvars during creation, we will
see them in the dump.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 16 Sep 2015 12:09:01 +0000 (13:09 +0100)]
mg-show-flight-runvars: Decorate synth runvar names with ~
Make mg-show-flight-runvars -a append ~ to the names of synth runvars.
(This is consistent with the new syntax in cs-job-create.)
We do this by editing $row[1] (and $colws[1]) so we can avoid
disturbing the general column format calculation and printing.
We switch to fetchrow_array rather than fetchrow_arrayref. This is
clearer and also avoids having to copy $row (because the value in the
DB $row from fetchrow_arrayref would be readonly).
We have to check for $synth eq 'f' as well as $synth being boolean
false, because SQLite's typeless nature (or, to put it another way,
DBD::SQLite's failure to look at the schema) means that a boolean
field's value of 'f' or 't' is simply returned as a string to Perl.
But of course "f" is trueish in Perl.
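
The same pitfall exists in Python, where a non-empty string such as "f" is also true, so the check can be illustrated there. This is a sketch only; the helper names are made up for the example and the real code is Perl.

```python
def is_synth(value):
    """True if the DB's synth column value means 'synthetic'.

    PostgreSQL-style drivers may hand back a real boolean, while
    SQLite hands back the strings 't' or 'f'; 'f' must be treated as
    false even though it is a non-empty (truthy) string.
    """
    return bool(value) and value != "f"


def decorate(name, synth):
    """Append ~ to the names of synth runvars, as in the -a output."""
    return name + "~" if is_synth(synth) else name
```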
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Use fetchrow_array instead.
Do not mistakenly drop the $synthcond assignment (!)
Ian Jackson [Thu, 2 Jul 2015 17:33:19 +0000 (18:33 +0100)]
cs-job-create: Permit creation of `synth' runvars
This will be useful for some hostalloc_* runvars which we are going to
introduce shortly.
This is going to be the way to set a runvar which is not copied by
cs-bisection-step or cs-adjust-flight. Using `synth' for this is
arguably slightly wrong but it does the right thing in all existing
cases. The alternative would be a schema change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Defer some of the discussion to later commit messages.
Ian Campbell [Wed, 16 Sep 2015 11:47:43 +0000 (12:47 +0100)]
Add support for selecting resources based on their properties.
In particular for allocating hosts based on host properties.
To do this we extend the hostflags syntax with "condition:arg1:arg2".
This specifies that the candidate host must pass the condition given
the arguments.
Each "condition" is a new module in the Osstest::ResourceCondition
namespace. For each condition an object is constructed using the given
arguments (split on ':') and stored in $hid.
When allocating, the object's ->check method is called for each
candidate host, giving $restype and $resname, and returns true or false
depending on whether the given host meets the condition.
Only a single condition is implemented here, "PropMinVer", which
requires that a given property on the resource has at least the given
value when compared as a version string. Enforce that the database and
the resource property both use the canonical CamelCase naming, by means
of the propname_check function newly added here. Lack of the property
being compared is taken as "no restriction" and hence is allowed.
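
The shape of such a condition can be sketched as follows. This is Python, not the Osstest::ResourceCondition Perl module; here check() takes a dict of properties instead of looking them up from $restype and $resname, the version comparison is a simplification (numeric components only), and `ExampleProp' in the usage below is a hypothetical property name.

```python
import re


class PropMinVer:
    """Condition: property must be at least a minimum version."""

    def __init__(self, prop, minver):
        self.prop = prop
        self.minver = minver

    @staticmethod
    def _key(version):
        # Simplified version ordering: compare the numeric components.
        return [int(part) for part in re.findall(r"\d+", version)]

    def check(self, props):
        value = props.get(self.prop)
        if value is None:
            return True  # missing property means "no restriction"
        return self._key(value) >= self._key(self.minver)


CONDITIONS = {"PropMinVer": PropMinVer}


def parse_condition(hostflag):
    """Turn "condition:arg1:arg2" into a condition object."""
    name, *args = hostflag.split(":")
    return CONDITIONS[name](*args)
```

For example, parse_condition("PropMinVer:ExampleProp:3.16") accepts a host whose ExampleProp is 3.16.7, rejects one whose value is 3.2, and accepts a host with no such property at all.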
Osstest::cfgvar_re is exported for use in the new propname_check
function.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 3 Jul 2015 16:54:28 +0000 (17:54 +0100)]
cr-ensure-disk-space: Take the flights db lock
This eliminates the race with cs-bisection-step (and other
flight-construction tools which might reuse previous flights, provided
that they also do not pass previous flight numbers from hand to hand
with the db unlocked).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 3 Jul 2015 16:39:41 +0000 (17:39 +0100)]
cr-ensure-disk-space: Look at referring flights
Previously the flight to delete was simply the one with the lowest
flight number. Now we sort flights not by their own flight number,
but by the highest flight number of any referencing flight.
This means that flights whose builds are being reused are kept as long
as the reusing flights.
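
The new ordering can be illustrated like this (Python sketch, not the SQL in cr-ensure-disk-space). The function name and the representation of references as (referring, referenced) pairs are assumptions, and only direct references are considered.

```python
def deletion_order(flights, references):
    """Order flights for deletion, earliest candidates first.

    Each flight is keyed by the highest flight number of any flight
    that refers to it (a flight counts as referring to itself), so a
    flight whose builds are reused is kept as long as its users.
    """
    newest_user = {flight: flight for flight in flights}
    for referrer, referenced in references:
        if referenced in newest_user:
            newest_user[referenced] = max(newest_user[referenced], referrer)
    return sorted(flights, key=lambda flight: (newest_user[flight], flight))
```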
This almost-entirely fixes a largely-theoretical race in the way
cs-bisection-step works (where the flight's logs and build outputs
might be deleted between the setup and execution of the referring
flight).
A smaller race still exists because the stash check in
cs-bisection-step occurs before the being-created flight is visible to
other db clients. We will have to fix this by taking the flights
lock.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 27 Aug 2015 15:37:17 +0000 (16:37 +0100)]
ap-push: Use refs/heads/ for destinations
When the destination is a branch, specify refs/heads/ explicitly.
This makes ap-push work even if the ref does not yet exist on the
destination.
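
In other words the push uses a fully qualified destination refspec, so git does not have to work out what kind of ref a not-yet-existing name should become. A minimal sketch of building such a command line (Python; the helper name is made up, and ap-push itself is a shell script):

```python
def push_argv(remote_url, revision, branch):
    """Build a git push command with an explicit refs/heads/ destination."""
    return ["git", "push", remote_url, f"{revision}:refs/heads/{branch}"]
```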
There is no functional change for an existing installation pushing to
an existing branch. But for a hypothetical new installation, this
would be necessary.
And, more relevantly, when new "branches" are invented, the use of an
existing ap-push case as a template will generate a new case which
creates the branch as is necessary.
I leave the more complex osstest case alone. It's not clear to me
whether the destination ref not existing is an installation problem of
such severity that indeed ap-push should fail.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Also you can say AP_FETCH_MEMO_KEEP=1 to make it reuse old
information, which is useful for making comparisons.
For a further speed improvement, one can use `eatmydata'. This is not
the default because it risks corruption of `standalone.db' which is
used for other purposes too. Add a comment about possibly improving
this.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 28 Aug 2015 18:09:27 +0000 (19:09 +0100)]
memoise: New utility
Give this a GPLv2+ licence so that we can move it into some other
FLOSS package later.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Do not use FSF street address in copyright notice.
Ship a copy of GPL-3.
Ian Jackson [Tue, 15 Sep 2015 17:24:17 +0000 (18:24 +0100)]
standalone: Set very long SQLite3 busy timeout in Perl
Without this, big standalone-generate-dump-flight-runvars jobs may be
trying to serialise so much work that SQLite3 times out. And we are
about to introduce an optimisation which makes this much more likely.
In standalone mode we probably don't care much about this timeout at
all. (It might even be that the user is using sqlite(3) and has
effectively locked the database interactively for an extended period.)
We would prefer to rely on the user to stop anything that seems to
have become stuck. So set the timeout to 10ks.
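
For comparison, the equivalent setting with Python's sqlite3 module looks like this (the commit itself changes the Perl side; the function name here is illustrative). Python's timeout parameter is in seconds, so 10ks is 10000.

```python
import sqlite3

BUSY_TIMEOUT_SECONDS = 10_000  # 10ks: effectively "wait for the user"


def open_standalone_db(path):
    """Open the standalone database with a very long busy timeout."""
    return sqlite3.connect(path, timeout=BUSY_TIMEOUT_SECONDS)
```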
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 16 Sep 2015 12:34:28 +0000 (13:34 +0100)]
standalone: Do not blunder on after errors
./standalone's with_logging function would _log_ errors, but it
wouldn't exit immediately. As a result, the script would blunder on.
Normally it wouldn't do very much more since most of the with_logging
calls are the last thing it does - but the exit status would be wrong
(0, from echo).
As a result, for example, standalone-generate-dump-flight-runvars
would never properly report make-flight failures.
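
The general shape of the fix, log the failure and then propagate the command's own status rather than that of the logging step, can be sketched as follows. This is a Python analogue under assumed names, not the shell function in ./standalone.

```python
import subprocess
import sys


def with_logging(logfile, argv):
    """Run argv with output appended to logfile; return its exit status."""
    with open(logfile, "ab") as log:
        status = subprocess.run(
            argv, stdout=log, stderr=subprocess.STDOUT
        ).returncode
    if status != 0:
        # Report the failure, but keep the command's status, not ours.
        print(f"FAILED (status {status}), see {logfile}", file=sys.stderr)
    return status
```

A caller would then stop at once, for example with sys.exit(status) when the status is nonzero, instead of carrying on.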
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Mon, 14 Sep 2015 10:55:08 +0000 (11:55 +0100)]
Executive: Abolish use of the `configdb'
This was a database used by networking infrastructure on the
now-obsolete XenClient network in the Citrix Cambridge office (which
used some management tools developed by Mythic Beasts).
The production database in Cambridge no longer has the configdb, and
both instances have `HostDB_Executive_NoConfigDB 1' in the
configuration. We think it very unlikely that anyone has a similar
arrangement.
Remove all the code for accessing this database. We leave the config
settings `NoConfigDB' for now, for the benefit of ad-hoc trees which
are not immediately updated but which use their site's official
production-config. They can be deleted later.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
But cs-bisection-step foolishly assumed that the --graph-out argument
did not contain any shell metacharacters. Fix this.
Specifically:
* Change invocations of perl's open to use the 3-argument form
* Change invocations of system to pass individual arguments rather
than constructing a shell script fragment and relying on the shell
to split it up.
* In particular, in the png processing pipeline, use the "sh -ec
<script> x <arg>..." technique to pass the input and output
filenames in a way that does not expose them to the shell's parser.
To avoid making this code more tangled than it already is, also
break out the construction of what is now $scriptlet.
* Escape metacharacters in the URIs we put in the html output.
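
The "sh -ec <script> x <arg>..." technique from the list above can be demonstrated as follows. This is a Python sketch with a stand-in scriptlet (a plain copy rather than the real png pipeline); the function name is made up. The script text is a constant, and the filenames reach it only as "$1" and "$2", so the shell never parses them.

```python
import subprocess


def copy_via_shell(src, dst):
    """Copy src to dst through a shell pipeline without quoting hazards."""
    # Fixed script; "x" fills $0, and the filenames become $1 and $2.
    scriptlet = 'cat <"$1" >"$2"'
    subprocess.run(["sh", "-ec", scriptlet, "x", src, dst], check=True)
```

Passing the arguments as a list also means no shell is involved in splitting the outer command line.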
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 14 Sep 2015 10:20:06 +0000 (11:20 +0100)]
crontab-cambridge: Change the days when we run a given distro-debian suite
The weekly CD images which are used by the snapshot flight are
generated Sunday-Monday, so running that on a Saturday as we have been
doing ensures that it will take at least two iterations/weeks to get
any issues fixed.
Also the current ordering of the existing releases made it hard to
decide where to insert a new release (e.g. Stretch).
So reorder as:
- Run the Sid daily run on a Monday
- Run the Snapshot run on a Tuesday (to pick up the weekly builds
from Monday)
- Run Squeeze on Wednesday and continue with newer releases
chronologically from there.
New releases can then be added at the end (wrapping the days).
Also add some blank lines to aid clarity.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 11 Sep 2015 10:07:03 +0000 (11:07 +0100)]
ts-xen-build: Do not set QEMU_REMOTE unless $r{tree_qemu} is set
4.4 and earlier do not check if QEMU_REMOTE is empty before using it.
From 4.5 onwards, if QEMU_REMOTE is empty then the default is used.
This should fix the build-*-prev job for 4.5 and earlier. In this job
we deliberately don't specify tree_qemu since we want whatever
that branch gives us.
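
The rule amounts to omitting the setting altogether when there is no value, rather than emitting it empty. A sketch in Python (ts-xen-build is Perl; the helper name and the dict representation of the build settings are assumptions):

```python
def qemu_settings(runvars):
    """Return the QEMU-related build settings to emit."""
    settings = {}
    tree_qemu = runvars.get("tree_qemu")
    if tree_qemu:
        # Only mention QEMU_REMOTE when we actually have a tree;
        # older Xen versions would use an empty value as-is.
        settings["QEMU_REMOTE"] = tree_qemu
    return settings
```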
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 7 Sep 2015 12:58:29 +0000 (13:58 +0100)]
ts-xen-install: Rewrite /etc/hosts to comment out 127.0.1.1 entry
Debian creates an entry such as:
127.0.1.1 lace-bug.xs.citrite.net lace-bug
This causes local lookups of the FQDN to get 127.0.1.1, which is
unhelpful if you are looking for an address to bind to and were hoping
to get the public IP address, as libvirt does on the target host for
migration.
Here we remove (actually, comment) any 127.0.1.1 line in /etc/hosts.
This means that lookups of a host's own name (fqdn or just dn) now rely
on DNS, which may not be ideal. However for a host which uses DHCP I'm
not aware of a way to keep /etc/hosts up to date with the actual IP
address the machine has. In our infra the test host IP addresses are
all static, but I don't think we want to rely on that any more than we
already do.
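
The rewrite itself is a one-line transformation. A Python sketch of it (the real change is in ts-xen-install, in Perl; the function name is illustrative):

```python
import re


def comment_out_127_0_1_1(hosts_text):
    """Prefix any line starting with 127.0.1.1 with '#'.

    Lines that are already commented are left alone, so applying the
    function twice gives the same result as applying it once.
    """
    return re.sub(r"(?m)^(?=127\.0\.1\.1\s)", "#", hosts_text)
```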
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 15:18:37 +0000 (16:18 +0100)]
cambridge: arrange to test each new baseline
Provide a new cr-daily-branch setting OSSTEST_BASELINES_ONLY which
causes it to only attempt to test the current baseline (if it is
untested) and never the tip version. Such tests will not result in any
push.
Each new baseline is tested exactly once (i.e. we aren't repeating
hoping for a pass), hence the correct revision is just the one tested
by the last run on the branch.
Add a cronjob to Cambridge which runs in this manner, ensuring that
there will usually be some sort of reasonably up to date baseline for
any given branch which can be used for comparisons in adhoc testing or
bisections.
This will also give us some data on the success of various branches on
the set of machines in Cambridge, which can be useful/interesting.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 31 Jul 2015 10:58:48 +0000 (11:58 +0100)]
Osstest/TestSupport: Hide $ho->{Toolstack} from casual use
This should only be accessed via toolstack($ho), which is responsible
for caching the value. Rename the field to _Toolstack to deter code
from using it.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 6 Aug 2015 17:03:28 +0000 (18:03 +0100)]
ts-xen-install: Add dom0_mem runvar to control dom0 memory
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 9 Sep 2015 15:46:21 +0000 (16:46 +0100)]
production-config: Update TftpDiVersion
I have already run mg-debian-installer-update-all
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- also update production-config-cambridge ]
Ian Jackson [Mon, 7 Sep 2015 13:00:51 +0000 (14:00 +0100)]
Manual allocation: Break out manual_allocation_base_jobinfo from mg-blockage
This is called `jobinfo' because it ought to be used in
alloc_resources's JobInfo xparam, rather than an Xinfo in the booking:
JobInfo is per planning client; Xinfo is per individual resource.
mg-blockage currently gets this wrong; we will fix that shortly.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
v3: Fix "joinfo" to "jobinfo".
Ian Jackson [Mon, 7 Sep 2015 13:15:25 +0000 (14:15 +0100)]
Manual allocation: Report better info in plan for rogue tasks
(This will only take effect as such tasks appear in the plan for the
first time. Ie, once a rogue task is found, the plan is populated by
whatever version of the planner is running at that time. So the
effect will not be immediately visible.)
Signed-off-by: Ian Jackson <iwj@osstest.xs.citrite.net>
---
v2: New patch
Ian Jackson [Mon, 7 Sep 2015 14:14:10 +0000 (15:14 +0100)]
Planner: ms-queuedaemon: Better log message for Tcl `after idle'
This does not mean the planner is `idle' in any general sense of the
word. It just means that the Tcl event loop has finished processing
outstanding events. Change the debug message to be less confusing.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Fri, 4 Sep 2015 16:44:21 +0000 (17:44 +0100)]
Planner: Remove O(n^2) problem from plan restart
Change `./ms-planner unprocessed' to take a file of infos on stdin,
and when we restart the planning, invoke it once.
(This would be an incompatible change to the planner, needing a
queuedaemon restart, if this patch were applied separately from the
previous "Report unprocessed planning clients".)
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Mon, 7 Sep 2015 14:08:19 +0000 (15:08 +0100)]
Planner: Report unprocessed planning clients
With recent changes, it can happen that a queue daemon client is not
given an opportunity to report itself in the plan. This makes the
plan incomplete.
(For resource-plan.html, because the planning run was restarted to try
to quickly allocate new resources; for resource-projection.html,
because it's an old client that doesn't support feature-noalloc.)
When this happens, provide an explicit indication of this in the plan:
* Invent a new entry Unprocessed in data-*.pl for this information.
* Display the first 50 in ms-planner show-html.
* Provide a new ms-planner invocation `unprocessed' to record one.
* Note unprocessed when we skip a client due to !feature-noalloc.
* Note unprocessed for remaining queue when we restart planning.
For now this algorithm can be rather unfortunately O(n^2) when
draining the planning queue, because each `ms-planner unprocessed'
invocation adds only one job but needs to read and write the whole
plan. This will be fixed shortly.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Thu, 3 Sep 2015 11:46:27 +0000 (12:46 +0100)]
Plan reporting: Provide get-last-plan queuedaemon command
This allows retrieval, by monitoring clients which are not
participating in the planning queue, of the finished projection, or
the unfinished plan as it was at the time of last restart.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Fix invocation of return-plan-to-client.
Use data-W.final.pl, not data-W-final.pl, to fit
with existing .gitignore, and be slightly neater.
Ian Jackson [Wed, 2 Sep 2015 14:12:48 +0000 (15:12 +0100)]
Planner: ms-queuedaemon: Restart planning when resources become free
This solves a performance problem with the existing planner.
The problem is that with a large installation, and a big queue, a full
plan can take a long time to prepare. (In our current installation,
perhaps as long as half an hour.) Any resource which becomes free
during one plan run cannot be allocated to a new job until the next
plan run starts. This means resources (test machines) are often
sitting around idle.
Fix this by restarting the planning process as soon as any new
resource becomes free. This means that jobs at the front of the queue
get a chance to allocate it right away, so it will probably be
allocated soon.
If it is only interesting to jobs later in the queue, then there may
be a delay in reallocating it, but presumably the resource is not much
in demand and those later jobs will allocate it when they get a bit
closer to the head.
But, there is a problem with this: it means that the plan is generally
never completed. So we have no overview any more of when which
flights will finish and what the overall queue is like. We solve this
problem by running a second instance of the planner algorithm, all the
way to completion, in a `dummy' mode where no actual resource
allocation takes place. This second `projection' instance comes into
being whenever the main `plan' instance is restarted, and it inherits
the planning state from the main `plan' instance.
Global livelock (where we keep restarting the plan but never manage to
allocate anything) is not possible because each restart involves a new
resource becoming free. If nothing gets allocated because we can't
get that far before being restarted, then eventually there will be
nothing left allocated to become newly free.
Starvation, of a form, is possible: a late-in-queue job which wants a
resource available right now might have difficulty allocating it
because the planner is spending its effort rescheduling early-in-queue
jobs which want resources which are in greater demand - so that the
late-in-queue job never gets called. Arguably this is an appropriate
allocation of planning time.
With this arrangement we can generate two reports: a `plan' report
containing the short term plan which was used for actual resource
allocation, and which is frequently restarted and therefore not
necessarily complete; and a `projection' report which contains a
complete plan for all work the system is currently aware of, but which
is less-frequently updated.
Because planner clients do not contain the planning algorithm state,
the only client change needed is the ability to run in a `dummy' mode
without actual allocation; this is the `noalloc' feature earlier in
this series.
The main work is in ms-queuedaemon. We have prepared the ground for
multiple instances of the planning algorithm; from the point of view
of ms-queuedaemon, an instance of the planning algorithm is mainly a
walk over the job queue. So we call them `walkers'.
Therefore, what we do here is introduce a new `projection' walker,
as follows:
Add `projection' to the global list of possible walkers.
Invent a new section of code, the `restarter', which is responsible
for managing the relationship between the two walkers. (It uses
direct knowledge of the queue state data structures, etc., to avoid
having to invent a complete formal interface to a walker.)
If we ever finish the plan walker's queue, we update both the
projection report output and the plan report output, from the same
plan. Finishing the projection walker's queue means we have a
complete projection, but we don't touch the plan.
In principle the plan walker might overtake the
projection walker, and then complete, write out a complete and up to
date plan as the projection, and that the projection walker would then
complete and overwrite the projection with less-up-to-date
information. We don't explicitly exclude this. Of course such a
result will be rectified soon enough by another planning run.
The restarter can ask the database for the list of currently-available
resources, and can therefore detect when resources become newly free.
The rest of the code remains largely ignorant of the operation of the
restarter. There are a few hooks:
runneeded-perhaps-start notifies the restarter when we start the
plan; this is used by the restarter to record the set of free
resources at the start of a planning run, so that it can see later
whether any /new/ resources have become free.
restarter-maybe-provoke-restart is called when we get notification
from the owner daemon that resources may have become idle. We
look for newly-idle resources, and if there are any, and we are
running the plan walker, we directly edit the plan walker's queue to
put RESTART at the front.
queuerun-perhaps-step spots the special entry RESTART in its queue and
calls back into the restarter when it finds it. This deferred
approach is necessary because we can't do the restart operation while
a client is thinking (because we would have to change that client's
cogitation from the `live, can allocate' mode to the `dummy, cannot
allocate' mode; and because that would make the code more complex).
The main work is done in the restarter-restart-now hook. It reports
the current (incomplete) plan, and then checks to see if a projection
walker is running; if it is, it leaves it alone, and simply abandons
the current plan run and arranges for a new run to be started. If a
projection walker is not running it copies all the plan walker's state
(including the data-plan.pl disk file containing the plan-in-progress)
to the projection walker, and sets the projection walker going.
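
The flow described above can be sketched roughly as follows. This is
an illustration in Python, not the daemon's own code: the class and
method names are hypothetical, "the plan walker is running" is
approximated by a non-empty plan queue, and walker state is reduced to
the queue itself.

```python
RESTART = "RESTART"  # special queue entry, as in the description above


class Walkers:
    """Toy model of the plan and projection walkers (hypothetical)."""

    def __init__(self, all_clients, free_at_start):
        self.all_clients = list(all_clients)
        self.queue = {"plan": list(all_clients), "projection": []}
        self.projection_running = False
        # Free resources recorded when the planning run started.
        self.free_at_start = set(free_at_start)
        self.log = []

    def maybe_provoke_restart(self, free_now):
        # Only /new/ free resources are interesting.
        newly_free = set(free_now) - self.free_at_start
        plan_queue = self.queue["plan"]
        if newly_free and plan_queue and plan_queue[0] != RESTART:
            plan_queue.insert(0, RESTART)  # deferred: handled by the next step

    def perhaps_step(self, free_now):
        if not self.queue["plan"]:
            return
        entry = self.queue["plan"].pop(0)
        if entry == RESTART:
            self.restart_now(free_now)
            return  # the restarter has taken the flow of control
        self.log.append(("plan asks", entry))

    def restart_now(self, free_now):
        self.log.append(("report incomplete plan",))
        if not self.projection_running:
            # Hand the remaining work over to the projection walker.
            self.queue["projection"] = list(self.queue["plan"])
            self.projection_running = True
        # Abandon the current plan run and begin a new one.
        self.queue["plan"] = list(self.all_clients)
        self.free_at_start = set(free_now)


w = Walkers(["c1", "c2", "c3"], free_at_start={"h1"})
w.perhaps_step({"h1"})                  # plan walker asks c1
w.maybe_provoke_restart({"h1", "h2"})   # h2 is newly free
w.perhaps_step({"h1", "h2"})            # RESTART found; the restarter runs
```

After these calls, in this sketch, the projection walker holds the
remainder of the old queue and the plan walker starts again from the
top.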
We update .gitignore to ignore data-plan.* and data-projection.*.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Update .gitignore too.
Use `walker-globals' not `walker-runvars' (which does not exist).
Remove wrap damage `#' from comment.
Fix typo in commit message.
Fix several silly bugs in for-free-resources
Fix three silly bugs relating to handling of $newly_free
Fix a wrong bracket syntax error in restarter-maybe-provoke-restart
Properly return from queuerun-perhaps-step on RESTART;
restarter-restart-now has taken the flow of control.
Reorder operations in restarter-restart-now so as to make it work
Correct some wrong log messages in restarter-restart-now
Add a log message when we restart planning
Minor code layout changes
In notify-to-think, process feature-noalloc properly
Ian Jackson [Tue, 1 Sep 2015 18:04:53 +0000 (19:04 +0100)]
Planner: ms-queuedaemon: Break out queuerun-finished/<walker>
This formalises the queue-completed interface, allowing parts outside
the queuerun machinery to cleanly be notified when a queue is
completed, and relieving the queuerun-perhaps-step of the need to know
what to do for the end of any particular walker's queue.
Currently there is still only one walker, `plan'.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
If multiple walkers want to ask the same chan, we want to serialise
them. This is actually straightforward: Firstly, we arrange that
each walker finishing a thought will prompt _all_ walkers to
reconsider whether they need to continue. Then we can simply do
nothing if we want to ask a chan to think while another walker is
already waiting for it, since that other walker will prompt us later.
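
A rough sketch of this serialisation, in Python for illustration only
(the names are hypothetical and the real daemon's data structures are
not shown here):

```python
class Scheduler:
    """Toy model: several walkers, each with a queue of chans to ask."""

    def __init__(self, queues):
        # walker name -> chans still to ask, in order
        self.queues = {w: list(q) for w, q in queues.items()}
        self.thinking_for = {}  # chan -> walker currently waiting for it
        self.log = []

    def perhaps_step(self, w):
        if w in self.thinking_for.values():
            return  # this walker is already waiting for a chan
        if not self.queues[w]:
            return
        chan = self.queues[w][0]
        if chan in self.thinking_for:
            return  # another walker is waiting for it; it will prompt us later
        self.queues[w].pop(0)
        self.thinking_for[chan] = w
        self.log.append((w, chan))

    def thought_finished(self, chan):
        del self.thinking_for[chan]
        for w in self.queues:  # prompt _all_ walkers to reconsider
            self.perhaps_step(w)


s = Scheduler({"plan": ["a", "b"], "projection": ["a"]})
s.perhaps_step("plan")        # plan asks chan a
s.perhaps_step("projection")  # chan a is busy: do nothing
s.thought_finished("a")       # both walkers are prompted
```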
Still no actual functional change because there is still only one
walker.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 15:54:04 +0000 (16:54 +0100)]
Planner: ms-queuedaemon: Prep for multiple walkers
We are going to introduce multiple concurrent streams of planning
processing, called `walkers'.
Prepare the ground for this with some formulaic changes which will
otherwise greatly clutter substantive patches.
(A client will still only think for one walker at once, because that's
what the client protocol expects - and anything else would be far too
confusing.)
General:
* Introduce the concept of a `walker' to ms-queuedaemon.
* Provide a list of the walkers which might exist, `walkers'.
* Provide some helper procedures for iterating over these,
and easily accessing their state.
Queue handling:
* Add a new `w' argument to many procs: specifically, most of the
procs in the section `machinery for running the queue'.
* Log the walker ($w) at the start of all relevant log messages.
* Pass the -w option to ms-planner and ms-planner-debug.
* Add safety catches which will crash the ms-queuedaemon if it finds
it is asking the same client to think for more than one walker.
* we-are-thinking and check-we-are-thinking tell the caller what
walker the client is thinking for.
* In the resource-plan.html filename, replace `plan' with the walker
filename.
Elsewhere:
* Teach dequeue-chan to deal with all the walkers, including
maybe the (one) walker for which the client is thinking.
* Teach log-state to report on all the walkers.
* In the runneeded logic, hardcode `plan' as the walker to use.
There is still actually only one walker.
No overall functional change, except to some log messages.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Fix walker-globals to import the $w/$v from #0, ie the global scope
Correct invocation of upvar in walker-globals
Use walker-globals everywhere, not obsolete name walker-vars
Do not pass w to do-book-resources (which does not want it
because it uses chan-we-are-thinking)
Ian Jackson [Tue, 1 Sep 2015 15:52:17 +0000 (16:52 +0100)]
Planner: ms-planner support -w option
We are going to introduce multiple concurrent streams of planning
processing, called `walkers' in ms-queuedaemon. The work-in-progress
plan is stored, server-side, during planning, in data-plan.pl. But we
need to have more than one of these.
Update ms-planner and ms-planner-debug to honour a -w option, to
specify a replacement for the word `plan' in `data-plan.pl'.
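
As an illustration of the filename substitution (a Python sketch with
hypothetical function names; the exact option syntax accepted by
ms-planner is not stated here and is an assumption):

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Default `plan' keeps the existing behaviour when -w is not given.
    parser.add_argument("-w", dest="walker", default="plan")
    return parser.parse_args(argv)


def data_filename(walker):
    # Replace the word `plan' in `data-plan.pl' with the walker name.
    return f"data-{walker}.pl"


default_name = data_filename(parse_args([]).walker)
projection_name = data_filename(parse_args(["-w", "projection"]).walker)
```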
No overall functional change, since nothing uses these options yet.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 13:56:46 +0000 (14:56 +0100)]
Planner: client side: New `!OK think noalloc' protocol
Introduce a way for the queue daemon to tell its client that it must
not allocate anything in this planning iteration.
In the client:
* Advertise the new feature via set-info.
* Accept the `noalloc' part of `!OK think noalloc';
* Print that in our log message;
* Honour it by passing it to $resourcecall.
And document the new protocol. However, there is no server-side yet,
so this does not yet introduce any overall change to the system.
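
A minimal sketch of how a client might interpret the message, in
Python for illustration. Only the two message forms `!OK think' and
`!OK think noalloc' are taken from the description above; the function
name and the handling of anything else are assumptions.

```python
def parse_think(line):
    """Return True if allocation is allowed, False for noalloc,
    or None if the line is not a think message at all."""
    words = line.split()
    if words[:2] != ["!OK", "think"]:
        return None
    rest = words[2:]
    if rest == []:
        return True
    if rest == ["noalloc"]:
        return False
    raise ValueError(f"unexpected think arguments: {rest!r}")


mayalloc = parse_think("!OK think noalloc")
```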
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 13:50:52 +0000 (14:50 +0100)]
Planner: client side: $mayalloc parameter to $resourcecall->()
Add a new parameter to $resourcecall which allows the alloc_resources
loop in Osstest::Executive to specify to its clients that on this
occasion they should not make any actual allocations.
The callers of alloc_resources are all adjusted to honour this new
parameter:
* ts-hosts-allocate-Executive avoids allocating unless $mayalloc
* mg-allocate avoids allocating unless $mayalloc
* mg-blockage never allocates anyway.
Currently we always pass 1, so no functional change.
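
The intended behaviour of a caller can be sketched like this. It is a
Python illustration with hypothetical names, not the Perl interface of
Osstest::Executive: the callback still works out what it would take,
but only allocates when told it may.

```python
def make_resourcecall(free_hosts, allocated):
    """Build a callback which plans always but allocates only if allowed."""

    def resourcecall(mayalloc):
        if not free_hosts:
            return None
        candidate = free_hosts[0]
        if mayalloc:
            # Actual allocation happens only when permitted.
            allocated.append(free_hosts.pop(0))
        return candidate

    return resourcecall


free_hosts = ["host1", "host2"]
allocated = []
call = make_resourcecall(free_hosts, allocated)
dry = call(False)   # reports host1, allocates nothing
real = call(True)   # reports host1 and allocates it
```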
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Add missing my $mayalloc. ($plan is global.)
Ian Jackson [Tue, 1 Sep 2015 18:15:32 +0000 (19:15 +0100)]
Planner: Fix indefinite holdoff
runneeded-ensure-will would always reset the runneeded_holdoff_after
timer. So no new queue run would start until no runneeded-ensure-will
had occurred for (currently) 30s.
Instead, only start the timer if it's not already running.
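
The difference between the two behaviours can be shown with a small
simulation. This is a Python sketch with a simulated clock and
hypothetical names; only the 30s figure and the start-if-not-running
rule come from the description above.

```python
class Holdoff:
    """Simulated holdoff timer; the caller supplies the current time."""

    def __init__(self, delay=30, reset_always=False):
        self.delay = delay
        self.reset_always = reset_always  # True models the old behaviour
        self.deadline = None              # None: timer not running

    def ensure_will(self, now):
        # Fixed behaviour: only start the timer if it is not already running.
        if self.reset_always or self.deadline is None:
            self.deadline = now + self.delay

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline


def first_run_time(holdoff, request_times, horizon):
    """Return the first simulated second at which a queue run could start."""
    requests = set(request_times)
    for now in range(horizon):
        if now in requests:
            holdoff.ensure_will(now)
        if holdoff.expired(now):
            return now
    return None


# A request every 10s for the first 100s.
requests = range(0, 100, 10)
fixed = first_run_time(Holdoff(), requests, 200)
old = first_run_time(Holdoff(reset_always=True), requests, 200)
```

In this simulation the fixed timer allows a run 30s after the first
request, whereas the old behaviour keeps postponing it until 30s after
the last one.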
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 1 Apr 2015 16:55:12 +0000 (17:55 +0100)]
ms-planner: Propagate a booking's Job to the plan
This needs to be done in several places:
- When booking resources (cmd: book-resources), to initially propagate
from the booking (e.g. from ts-hosts-allocate-Executive's input).
- On reset (cmd: reset) so that the Events corresponding to actual
allocations retain their Job.
- When retrieving the plan (cmd: get-plan), so it would be available
for logging etc.
The Job is added by a following patch "ts-hosts-allocate-Executive:
Add the requesting Job to the booking".
This patch has been deployed on the Cambridge instance for testing
with no ill-effects.
cmd_reset does not include a ->Job for jobs which are "(preparing)",
corresponding to a job which is going to use a shared host which is
currently being installed by another job. I was unable to figure out a
way to include these.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 15:43:47 +0000 (16:43 +0100)]
Disable proxy for all preseeded wget
At least in some contexts scripts can be run with http_proxy pointing
to the apt proxy (I noticed it in /usr/lib/base-installer.d/ hook used
for ucode installation).
Since all of these particular fetches are from a known-to-be-local
webserver, just disable proxying altogether.
With busybox wget in d-i this is done with the -Y argument.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 16:52:41 +0000 (17:52 +0100)]
Debian: Create /boot/boot -> . symlink on ARM when PvMenuLst enabled
This is under the same conditional as the nobootloader confirmation
one, since they effectively both stem from the lack of a boot loader
and the consequential use of the pv-grub-menu package.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 16:52:39 +0000 (17:52 +0100)]
Debian: ARM: only apply no bootloader workaround if xopts{PvMenuLst}
This workaround is only necessary because of how pv-grub-menu works,
so we should only apply both or neither of them.
This results in a long line and I'm about to add a second workaround
to this block, so switch to a regular if block instead of postfixing
on the one command. Move the comment inside that block in preparation
for other workarounds as well.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>