Ian Jackson [Thu, 2 Jul 2015 17:33:19 +0000 (18:33 +0100)]
cs-job-create: Permit creation of `synth' runvars
This will be useful for some hostalloc_* runvars which we are going to
introduce shortly.
This is going to be the way to set a runvar which is not copied by
cs-bisection-step or cs-adjust-flight. Using `synth' for this is
arguably slightly wrong but it does the right thing in all existing
cases. The alternative would be a schema change.
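A minimal Python sketch of the semantics described above. It assumes a hypothetical in-memory model in which each runvar carries a synth flag; the class, function and runvar names are illustrative, not osstest's. Tools that clone jobs (as cs-bisection-step and cs-adjust-flight do) skip synth runvars, so a value set this way is not carried over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runvar:
    # Hypothetical in-memory model of one row of the runvars table.
    name: str
    val: str
    synth: bool = False


def copy_runvars(runvars):
    """Return the runvars that a job-copying tool would carry over.

    Synth runvars are skipped. That is what makes `synth' usable for
    values (such as the forthcoming hostalloc_* runvars) that must not
    be copied into a new flight.
    """
    return [rv for rv in runvars if not rv.synth]


# Illustrative runvar names only.
job = [
    Runvar("example_setting", "1"),
    Runvar("hostalloc_example", "x", synth=True),
]
print([rv.name for rv in copy_runvars(job)])
```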
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Defer some of the discussion to later commit messages.
Ian Campbell [Wed, 16 Sep 2015 11:47:43 +0000 (12:47 +0100)]
Add support for selecting resources based on their properties.
In particular for allocating hosts based on host properties.
To do this we extend the hostflags syntax with "condition:arg1:arg2".
This specifies that the candidate host must pass the condition given
the arguments.
Each "condition" is a new module in the Osstest::ResourceCondition
namespace. For each condition an object is constructed using the given
arguments (split on ':') and stored in $hid.
When allocating, the object's ->check method is called for each
candidate host with $restype and $resname; it returns true or false
depending on whether the given host meets the condition.
Only a single condition is implemented here: "PropMinVer", which
requires that a given property on the resource has at least the given
value when compared as a version string. Enforce that the database and
the resource property both use the canonical CamelCase naming, using
the propname_check function newly added here. Absence of the property
being compared is taken as "no restriction" and hence is allowed.
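A rough Python model of how a "condition:arg1:arg2" hostflag could become an object with a check method. It is a sketch under stated assumptions, not the Perl module:

- The real ->check receives only $restype and $resname and looks the property up itself. Here a hypothetical `props` mapping is passed in instead.
- The exact version-string comparison is not described above, so the sketch compares numeric components only.
- `ExampleProp` is a made-up property name.

```python
import re


def version_key(v):
    # Simplified stand-in for a version-string comparison: compare the
    # numeric components in order, so "4.10" sorts after "4.6".
    return tuple(int(part) for part in re.findall(r"\d+", v))


class PropMinVer:
    """Sketch of the hostflags condition "PropMinVer:<Prop>:<MinVer>"."""

    def __init__(self, prop, minver):
        self.prop = prop
        self.minver = minver

    def check(self, restype, resname, props):
        # `props` is a hypothetical mapping of the resource's properties.
        val = props.get(self.prop)
        if val is None:
            return True  # property absent: "no restriction"
        return version_key(val) >= version_key(self.minver)


CONDITIONS = {"PropMinVer": PropMinVer}


def parse_condition(flag):
    # Split on ':' and construct the named condition with the rest.
    name, *args = flag.split(":")
    return CONDITIONS[name](*args)


cond = parse_condition("PropMinVer:ExampleProp:4.6")
print(cond.check("host", "lace-bug", {"ExampleProp": "4.10"}))
```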
Osstest::cfgvar_re is exported for use in the new propname_check
function.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 3 Jul 2015 16:54:28 +0000 (17:54 +0100)]
cr-ensure-disk-space: Take the flights db lock
This eliminates the race with cs-bisection-step (and other
flight-construction tools which might reuse previous flights, provided
that they also do not pass previous flight numbers from hand to hand
with the db unlocked).
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 3 Jul 2015 16:39:41 +0000 (17:39 +0100)]
cr-ensure-disk-space: Look at referring flights
Previously the flight to delete was simply the one with the lowest
flight number. Now we sort flights not by their own flight number,
but by the highest flight number of any referencing flight.
This means that flights whose builds are being reused are kept as long
as the reusing flights.
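The new ordering can be sketched in Python as a sort key. This assumes a hypothetical mapping from each flight to the flights that reuse its builds. It also assumes a flight with no referrers keeps its own number as its key, which the description above implies but does not spell out.

```python
def deletion_order(flights, refs):
    """Order flights for deletion, first-to-delete first.

    flights: iterable of flight numbers.
    refs: hypothetical mapping flight -> flight numbers reusing its builds.
    Each flight is keyed by the highest number among itself and its
    referrers, so it survives as long as the flights that reuse it.
    """
    def key(f):
        return max({f} | set(refs.get(f, ())))
    return sorted(flights, key=key)


refs = {100: {250}}  # flight 250 reuses builds from flight 100
print(deletion_order([100, 150, 200], refs))
```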
This almost-entirely fixes a largely-theoretical race in the way
cs-bisection-step works (where the flight's logs and build outputs
might be deleted between the setup and execution of the referring
flight).
A smaller race still exists because the stash check in
cs-bisection-step occurs before the being-created flight is visible to
other db clients. We will have to fix this by taking the flights
lock.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 27 Aug 2015 15:37:17 +0000 (16:37 +0100)]
ap-push: Use refs/heads/ for destinations
When the destination is a branch, specify refs/heads/ explicitly.
This makes ap-push work even if the ref does not yet exist on the
destination.
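As a small illustration, here is a Python helper (hypothetical, not part of ap-push) that qualifies a branch destination. When the source being pushed is a commit id rather than a branch name, git cannot infer where an unqualified destination that does not yet exist on the remote belongs. Spelling out refs/heads/ removes that ambiguity and lets the push create the branch.

```python
def push_refspec(src, dest):
    """Build a `git push` refspec with a fully qualified destination.

    Destinations already starting with "refs/" are left alone; bare
    branch names are prefixed with "refs/heads/".
    """
    if not dest.startswith("refs/"):
        dest = "refs/heads/" + dest
    return f"{src}:{dest}"


print(push_refspec("abc123", "example-branch"))
```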
There is no functional change for an existing installation pushing to
an existing branch. But for a hypothetical new installation, this
would be necessary.
And, more relevantly, when new "branches" are invented, the use of an
existing ap-push case as a template will generate a new case which
creates the branch as necessary.
I leave the more complex osstest case alone. It's not clear to me
whether the destination ref not existing is an installation problem of
such severity that indeed ap-push should fail.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Also you can say AP_FETCH_MEMO_KEEP=1 to make it reuse old
information, which is useful for making comparisons.
For a further speed improvement, one can use `eatmydata'. This is not
the default because it risks corruption of `standalone.db' which is
used for other purposes too. Add a comment about possibly improving
this.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 28 Aug 2015 18:09:27 +0000 (19:09 +0100)]
memoise: New utility
Give this a GPLv2+ licence so that we can move it into some other
FLOSS package later.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Do not use FSF street address in copyright notice.
Ship a copy of GPL-3.
Ian Jackson [Tue, 15 Sep 2015 17:24:17 +0000 (18:24 +0100)]
standalone: Set very long SQLite3 busy timeout in Perl
Without this, big standalone-generate-dump-flight-runvars jobs may
try to serialise so much work that SQLite3 times out. And we are
about to introduce an optimisation which makes this much more likely.
In standalone mode we probably don't care much about this timeout at
all. (It might even be that the user is using sqlite(3) and has
effectively locked the database interactively for an extended period.)
We would prefer to rely on the user to stop anything that seems to
have become stuck. So set the timeout to 10ks (10,000 seconds).
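The Perl change itself is not reproduced here. As a rough analogue, this is how the same 10ks busy timeout could be set with Python's standard sqlite3 module; the function name is hypothetical.

```python
import sqlite3

BUSY_TIMEOUT_S = 10_000  # 10ks, a little under three hours


def open_standalone_db(path):
    # sqlite3.connect's `timeout` is SQLite's busy timeout, in seconds.
    conn = sqlite3.connect(path, timeout=BUSY_TIMEOUT_S)
    # Set the same value explicitly via the PRAGMA, in milliseconds.
    conn.execute(f"PRAGMA busy_timeout = {BUSY_TIMEOUT_S * 1000}")
    return conn


conn = open_standalone_db(":memory:")
print(conn.execute("PRAGMA busy_timeout").fetchone()[0])
```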
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 16 Sep 2015 12:34:28 +0000 (13:34 +0100)]
standalone: Do not blunder on after errors
./standalone's with_logging function would _log_ errors, but it
wouldn't exit immediately. As a result, the script would blunder on.
Normally it wouldn't do very much more since most of the with_logging
calls are the last thing it does - but the exit status would be wrong
(0, from echo).
As a result, for example, standalone-generate-dump-flight-runvars
would never properly report make-flight failures.
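./standalone is a shell script. The following is only a Python model of the fixed with_logging behaviour, with a hypothetical signature: the command's output goes to the log, and a failure stops the script with a non-zero status instead of falling through to a final `echo` that exits 0.

```python
import subprocess
import sys


def with_logging(logfile, argv):
    """Run argv with its output appended to logfile; exit on failure."""
    with open(logfile, "ab") as log:
        proc = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    if proc.returncode != 0:
        print(f"{argv[0]} failed with status {proc.returncode}", file=sys.stderr)
        # A negative returncode means "killed by a signal"; still fail.
        sys.exit(proc.returncode if proc.returncode > 0 else 1)
```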
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Mon, 14 Sep 2015 10:55:08 +0000 (11:55 +0100)]
Executive: Abolish use of the `configdb'
This was a database used by networking infrastructure on the
now-obsolete XenClient network in the Citrix Cambridge office (which
used some management tools developed by Mythic Beasts).
The production database in Cambridge no longer has the configdb, and
both instances have `HostDB_Executive_NoConfigDB 1' in the
configuration. We think it very unlikely that anyone has a similar
arrangement.
Remove all the code for accessing this database. We leave the config
settings `NoConfigDB' for now, for the benefit of ad-hoc trees which
are not immediately updated but which use their site's official
production-config. They can be deleted later.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
But cs-bisection-step foolishly assumed that the --graph-out argument
did not contain any shell metacharacters. Fix this.
Specifically:
* Change invocations of perl's open to use the 3-argument form
* Change invocations of system to pass individual arguments rather
than constructing a shell script fragment and relying on the shell
to split it up.
* In particular, in the png processing pipeline, use the "sh -ec
<script> x <arg>..." technique to pass the input and output
filenames in a way that does not expose them to the shell's parser.
To avoid making this code more tangled than it already is, also
break out the construction of what is now $scriptlet.
* Escape metacharacters in the URIs we put in the html output.
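The techniques listed above can be sketched in Python; the fix itself is in Perl. The filenames reach `sh -ec` as positional parameters ("x" fills $0), so the shell never parses them. Assumptions:

- `cat` stands in for the real png-processing commands.
- HTML-escaping with `html.escape` is one reasonable reading of "escape metacharacters".
- The helper names are hypothetical.

```python
import html
import subprocess


def run_pipeline(infile, outfile):
    # Fixed script text; filenames arrive as "$1" and "$2" only.
    script = 'cat <"$1" >"$2"'
    subprocess.run(["sh", "-ec", script, "x", infile, outfile], check=True)


def html_link(uri, text):
    # Escape &, <, >, " and ' so neither value can break out of the markup.
    return f'<a href="{html.escape(uri, quote=True)}">{html.escape(text)}</a>'
```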
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 14 Sep 2015 10:20:06 +0000 (11:20 +0100)]
crontab-cambridge: Change the days when we run a given distro-debian suite
The weekly CD images which are used by the snapshot flight are
generated Sunday-Monday, so running that on a Saturday as we have been
doing ensures that it will take at least two iterations/weeks to get
any issues fixed.
Also the current ordering of the existing releases made it hard to
decide where to insert a new release (e.g. Stretch).
So reorder as:
- Run the Sid daily run on a Monday
- Run the Snapshot run on a Tuesday (to pick up the weekly builds
from Monday)
- Run Squeeze on Wednesday and continue with newer releases
chronologically from there.
New releases can then be added at the end (wrapping the days).
Also add some blank lines to aid clarity.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 11 Sep 2015 10:07:03 +0000 (11:07 +0100)]
ts-xen-build: Do not set QEMU_REMOTE unless $r{tree_qemu} is set
4.4 and earlier do not check if QEMU_REMOTE is empty before using it.
From 4.5 onwards, if QEMU_REMOTE is empty then the default is used.
This should fix the build-*-prev job for 4.5 and earlier. In this job
we deliberately don't specify tree_qemu since we want whatever
that branch gives us.
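The rule can be modelled in Python as a hypothetical helper that builds the build environment from a dict of runvars. QEMU_REMOTE is exported only when tree_qemu is set, so on 4.4 and earlier (which use QEMU_REMOTE without checking whether it is empty) the branch's own default applies.

```python
def xen_build_env(runvars):
    """Extra environment for the Xen build (hypothetical helper).

    runvars: dict of the job's runvars.
    """
    env = {}
    tree_qemu = runvars.get("tree_qemu")
    if tree_qemu:  # unset or empty: leave QEMU_REMOTE out entirely
        env["QEMU_REMOTE"] = tree_qemu
    return env
```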
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 7 Sep 2015 12:58:29 +0000 (13:58 +0100)]
ts-xen-install: Rewrite /etc/hosts to comment out 127.0.1.1 entry
Debian creates an entry such as:
127.0.1.1 lace-bug.xs.citrite.net lace-bug
This causes local lookups of the FQDN to get 127.0.1.1, which is
unhelpful if you are looking for an address to bind to and were hoping
to get the public IP address, as libvirt does on the target host for
migration.
Here we remove (actually, comment) any 127.0.1.1 line in /etc/hosts.
This means that lookups of a host's own name (FQDN or short hostname)
now rely on DNS, which may not be ideal. However, for a host which uses
DHCP I'm not aware of a way to keep /etc/hosts up to date with the
actual IP address the machine has. In our infrastructure the test host
IP addresses are all static, but I don't think we want to rely on that
any more than we already do.
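A Python sketch of the /etc/hosts rewrite, operating on file text rather than the remote file the real script edits; the function name is hypothetical. Lines whose first field is 127.0.1.1 gain a leading '#', and every other line, including ones already commented out, is left unchanged.

```python
def comment_out_loopback_alias(hosts_text):
    """Comment out any 127.0.1.1 entry in /etc/hosts-format text."""
    out = []
    for line in hosts_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == "127.0.1.1":
            line = "#" + line
        out.append(line)
    return "".join(out)


sample = "127.0.0.1\tlocalhost\n127.0.1.1\tlace-bug.xs.citrite.net lace-bug\n"
print(comment_out_loopback_alias(sample), end="")
```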
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 15:18:37 +0000 (16:18 +0100)]
cambridge: arrange to test each new baseline
Provide a new cr-daily-branch setting OSSTEST_BASELINES_ONLY which
causes it to only attempt to test the current baseline (if it is
untested) and never the tip version. Such tests will not result in any
push.
Each new baseline is tested exactly once (i.e. we do not repeat the
test hoping for a pass); hence the correct revision is just the one
tested by the last run on the branch.
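The decision cr-daily-branch makes can be sketched in Python. `baselines_only` models OSSTEST_BASELINES_ONLY, `tested` is a hypothetical set of revisions already tested on the branch, and the ordinary (non-baseline) path is deliberately simplified.

```python
def choose_revision(baselines_only, baseline, tip, tested):
    """Return (revision_to_test, may_push), or None if nothing to do."""
    if baselines_only:
        if baseline in tested:
            return None           # each baseline is tested exactly once
        return (baseline, False)  # never the tip, and never a push
    # Ordinary runs, simplified here to "test the tip, push on success".
    return (tip, True)
```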
Add a cronjob to Cambridge which runs in this manner, ensuring that
there will usually be some sort of reasonably up to date baseline for
any given branch which can be used for comparisons in adhoc testing or
bisections.
This will also give us some data on the success of various branches on
the set of machines in Cambridge, which can be useful/interesting.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 31 Jul 2015 10:58:48 +0000 (11:58 +0100)]
Osstest/TestSupport: Hide $ho->{Toolstack} from casual use
This should only be accessed via toolstack($ho), which is responsible
for caching the value. Rename the field to _Toolstack to deter code
from using it.
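The same pattern in Python terms, as a sketch only: a leading underscore marks the cached field as private, and a single accessor fills the cache on first use. `lookup_toolstack` and its return value are hypothetical stand-ins for the real lookup.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self._toolstack = None  # cache; use toolstack(ho), not this field


def lookup_toolstack(ho):
    # Hypothetical stand-in for the real (more expensive) lookup.
    return "xl"


def toolstack(ho):
    """The sanctioned accessor: compute once, then return the cache."""
    if ho._toolstack is None:
        ho._toolstack = lookup_toolstack(ho)
    return ho._toolstack
```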
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Thu, 6 Aug 2015 17:03:28 +0000 (18:03 +0100)]
ts-xen-install: Add dom0_mem runvar to control dom0 memory
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 9 Sep 2015 15:46:21 +0000 (16:46 +0100)]
production-config: Update TftpDiVersion
I have already run mg-debian-installer-update-all
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- also update production-config-cambridge ]
Ian Jackson [Mon, 7 Sep 2015 13:00:51 +0000 (14:00 +0100)]
Manual allocation: Break out manual_allocation_base_jobinfo from mg-blockage
This is called `jobinfo' because it ought to be used in
alloc_resources's JobInfo xparam, rather than an Xinfo in the booking:
JobInfo is per planning client; Xinfo is per individual resource.
mg-blockage currently gets this wrong; we will fix that shortly.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
v3: Fix "joinfo" to "jobinfo".
Ian Jackson [Mon, 7 Sep 2015 13:15:25 +0000 (14:15 +0100)]
Manual allocation: Report better info in plan for rogue tasks
(This will only take effect as such tasks appear in the plan for the
first time. I.e., once a rogue task is found, the plan is populated by
whatever version of the planner is running at that time. So the
effect will not be immediately visible.)
Signed-off-by: Ian Jackson <iwj@osstest.xs.citrite.net>
---
v2: New patch
Ian Jackson [Mon, 7 Sep 2015 14:14:10 +0000 (15:14 +0100)]
Planner: ms-queuedaemon: Better log message for Tcl `after idle'
This does not mean the planner is `idle' in any general sense of the
word. It just means that the Tcl event loop has finished processing
outstanding events. Change the debug message to be less confusing.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Fri, 4 Sep 2015 16:44:21 +0000 (17:44 +0100)]
Planner: Remove O(n^2) problem from plan restart
Change `./ms-planner unprocessed' to take a file of infos on stdin,
and when we restart the planning, invoke it once.
(This would be an incompatible change to the planner, needing a
queuedaemon restart, if this patch were applied separately from the
previous "Report unprocessed planning clients".)
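The batching idea can be sketched in Python, with JSON standing in for the Perl data-*.pl files (an assumption for illustration). Recording n entries with n separate one-entry invocations reads and rewrites the whole plan n times, which is O(n^2). Recording them all in one read-modify-write avoids that.

```python
import json


def load_plan(path):
    with open(path) as f:
        return json.load(f)


def save_plan(path, plan):
    with open(path, "w") as f:
        json.dump(plan, f)


def record_unprocessed(path, infos):
    """Append every info to the plan's Unprocessed list in one pass."""
    plan = load_plan(path)
    plan.setdefault("Unprocessed", []).extend(infos)
    save_plan(path, plan)
```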
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Mon, 7 Sep 2015 14:08:19 +0000 (15:08 +0100)]
Planner: Report unprocessed planning clients
With recent changes, it can happen that a queue daemon client is not
given an opportunity to report itself in the plan. This makes the
plan incomplete.
(For resource-plan.html, because the planning run was restarted to try
to quickly allocate new resources; for resource-projection.html,
because it's an old client that doesn't support feature-noalloc.)
When this happens, provide an explicit indication of this in the plan:
* Invent a new entry Unprocessed in data-*.pl for this information.
* Display the first 50 in ms-planner show-html.
* Provide a new ms-planner invocation `unprocessed' to record one.
* Note unprocessed when we skip a client due to !feature-noalloc.
* Note unprocessed for remaining queue when we restart planning.
For now this algorithm can be rather unfortunately O(n^2) when
draining the planning queue, because each `ms-planner unprocessed'
invocation adds only one job but needs to read and write the whole
plan. This will be fixed shortly.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New patch
Ian Jackson [Thu, 3 Sep 2015 11:46:27 +0000 (12:46 +0100)]
Plan reporting: Provide get-last-plan queuedaemon command
This allows retrieval, by monitoring clients which are not
participating in the planning queue, of the finished projection, or
the unfinished plan as it was at the time of last restart.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Fix invocation of return-plan-to-client.
Use data-W.final.pl, not data-W-final.pl, to fit
with existing .gitignore, and be slightly neater.
Ian Jackson [Wed, 2 Sep 2015 14:12:48 +0000 (15:12 +0100)]
Planner: ms-queuedaemon: Restart planning when resources become free
This solves a performance problem with the existing planner.
The problem is that with a large installation, and a big queue, a full
plan can take a long time to prepare. (In our current installation,
perhaps as long as half an hour.) Any resource which becomes free
during one plan run cannot be allocated to a new job until the next
plan run starts. This means resources (test machines) are often
sitting around idle.
Fix this by restarting the planning process as soon as any new
resource becomes free. This means that jobs at the front of the queue
get a chance to allocate it right away, so it will probably be
allocated soon.
If it is only interesting to jobs later in the queue, then there may
be a delay in reallocating it, but presumably the resource is not much
in demand and those later jobs will allocate it when they get a bit
closer to the head.
But, there is a problem with this: it means that the plan is generally
never completed. So we no longer have an overview of when each
flight will finish and what the overall queue is like. We solve this
problem by running a second instance of the planner algorithm, all the
way to completion, in a `dummy' mode where no actual resource
allocation takes place. This second `projection' instance comes into
being whenever the main `plan' instance is restarted, and it inherits
the planning state from the main `plan' instance.
Global livelock (where we keep restarting the plan but never manage to
allocate anything) is not possible because each restart involves a new
resource becoming free. If nothing gets allocated because we can't
get that far before being restarted, then eventually there will be
nothing left allocated to become newly free.
Starvation, of a form, is possible: a late-in-queue job which wants a
resource available right now might have difficulty allocating it
because the planner is spending its effort rescheduling early-in-queue
jobs which want resources which are in greater demand - so that the
late-in-queue job never gets called. Arguably this is an appropriate
allocation of planning time.
With this arrangement we can generate two reports: a `plan' report
containing the short term plan which was used for actual resource
allocation, and which is frequently restarted and therefore not
necessarily complete; and a `projection' report which contains a
complete plan for all work the system is currently aware of, but which
is less-frequently updated.
Because planner clients do not contain the planning algorithm state,
the only client change needed is the ability to run in a `dummy' mode
without actual allocation; this is the `noalloc' feature earlier in
this series.
The main work is in ms-queuedaemon. We have prepared the ground for
multiple instances of the planning algorithm; from the point of view
of ms-queuedaemon, an instance of the planning algorithm is mainly a
walk over the job queue. So we call them `walkers'.
Therefore, what we do here is introduce a new `projection' walker,
as follows:
Add `projection' to the global list of possible walkers.
Invent a new section of code, the `restarter', which is responsible
for managing the relationship between the two walkers. (It uses
direct knowledge of the queue state data structures, etc., to avoid
having to invent a complete formal interface to a walker.)
If we ever finish the plan walker's queue, we update both the
projection report output and the plan report output, from the same
plan. Finishing the projection walker's queue means we have a
complete projection, but we don't touch the plan.
In principle it might happen that the plan walker might overtake the
projection walker, and then complete, write out a complete and up to
date plan as the projection, and that the projection walker would then
complete and overwrite the projection with less-up-to-date
information. We don't explicitly exclude this. Of course such a
result will be rectified soon enough by another planning run.
The restarter can ask the database for the list of currently-available
resources, and can therefore detect when resources become newly free.
The rest of the code remains largely ignorant of the operation of the
restarter. There are a few hooks:
runneeded-perhaps-start notifies the restarter when we start the
plan; this is used by the restarter to record the set of free
resources at the start of a planning run, so that it can see later
whether any /new/ resources have become free.
restarter-maybe-provoke-restart is called when we get notification
from the owner daemon that resources may have become idle. We
look for newly-idle resources, and if there are any, and we are
running the plan walker, we directly edit the plan walker's queue to
put RESTART at the front.
queuerun-perhaps-step spots the special entry RESTART in its queue and
calls back into the restarter when it finds it. This deferred
approach is necessary because we can't do the restart operation while
a client is thinking (because we would have to change that client's
cogitation from the `live, can allocate' mode to the `dummy, cannot
allocate' mode; and because that would make the code more complex).
The main work is done in the restarter-restart-now hook. It reports
the current (incomplete) plan, and then checks to see if a projection
walker is running; if it is, it leaves it alone, and simply abandons
the current plan run and arranges for a new run to be started. If a
projection walker is not running it copies all the plan walker's state
(including the data-plan.pl disk file containing the plan-in-progress)
to the projection walker, and sets the projection walker going.
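A much-simplified Python model of the restarter hooks described above. The real code is Tcl inside ms-queuedaemon. Here each walker is a hypothetical dict holding a queue of client names and a `state` that stands in for data-<walker>.pl, and "reporting" the incomplete plan is modelled as keeping a snapshot.

```python
import copy


class Restarter:
    def __init__(self):
        self.walkers = {"plan": None, "projection": None}  # None: not running
        self.free_at_start = set()
        self.last_plan_report = None

    def plan_started(self, queue, free_now):
        # runneeded-perhaps-start: record what was free at the start.
        self.walkers["plan"] = {"queue": list(queue), "state": {}}
        self.free_at_start = set(free_now)

    def maybe_provoke_restart(self, free_now):
        # Only resources not free at the start of this run count. If any
        # exist, put RESTART at the head of the plan walker's queue; the
        # restart itself happens when the queue runner reaches it, never
        # while a client is thinking.
        plan = self.walkers["plan"]
        newly_free = set(free_now) - self.free_at_start
        if plan is not None and newly_free and plan["queue"][:1] != ["RESTART"]:
            plan["queue"].insert(0, "RESTART")
        return newly_free

    def restart_now(self, full_queue, free_now):
        # restarter-restart-now: report the incomplete plan, hand it to
        # the projection walker if that is idle, then start afresh.
        plan = self.walkers["plan"]
        plan["queue"].remove("RESTART")
        self.last_plan_report = copy.deepcopy(plan["state"])
        if self.walkers["projection"] is None:
            self.walkers["projection"] = copy.deepcopy(plan)
        self.plan_started(full_queue, free_now)
```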
We update .gitignore to ignore data-plan.* and data-projection.*.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Update .gitignore too.
Use `walker-globals' not `walker-runvars' (which does not exist).
Remove wrap damage `#' from comment.
Fix typo in commit message.
Fix several silly bugs in for-free-resources
Fix three silly bugs relating to handling of $newly_free
Fix a wrong bracket syntax error in restarter-maybe-provoke-restart
Properly return from queuerun-perhaps-step on RESTART;
restarter-restart-now has taken the flow of control.
Reorder operations in restarter-restart-now so as to make it work
Correct some wrong log messages in restarter-restart-now
Add a log message when we restart planning
Minor code layout changes
In notify-to-think, process feature-noalloc properly
Ian Jackson [Tue, 1 Sep 2015 18:04:53 +0000 (19:04 +0100)]
Planner: ms-queuedaemon: Break out queuerun-finished/<walker>
This formalises the queue-completed interface, allowing parts outside
the queuerun machinery to cleanly be notified when a queue is
completed, and relieving the queuerun-perhaps-step of the need to know
what to do for the end of any particular walker's queue.
Currently there is still only one walker, `plan'.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
If multiple walkers want to ask the same chan, we want to serialise
them. This is actually straightforward. Firstly, we arrange that
each walker finishing a thought will prompt _all_ walkers to
reconsider whether they need to continue. Then, if we want a chan to
think but another walker is already waiting for it, we can simply do
nothing, since that other walker will prompt us later.
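The serialisation rule can be modelled in Python as a sketch; the class and function names are hypothetical, not ms-queuedaemon's. A walker backs off if the chan is busy. When a chan finishes thinking, every walker is prompted again, so the one that backed off gets its turn.

```python
class Chan:
    def __init__(self, name):
        self.name = name
        self.thinking_for = None  # walker this client is thinking for


class Walker:
    def __init__(self, name, queue):
        self.name = name
        self.queue = list(queue)

    def step(self):
        """Ask the next chan to think, unless another walker has it."""
        if not self.queue:
            return
        chan = self.queue[0]
        if chan.thinking_for is not None:
            return  # busy; its walker will prompt us when it finishes
        chan.thinking_for = self
        self.queue.pop(0)


def finished_thinking(chan, all_walkers):
    # A finished thought prompts _all_ walkers to reconsider.
    chan.thinking_for = None
    for w in all_walkers:
        w.step()
```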
Still no actual functional change because there is still only one
walker.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 15:54:04 +0000 (16:54 +0100)]
Planner: ms-queuedaemon: Prep for multiple walkers
We are going to introduce multiple concurrent streams of planning
processing, called `walkers'.
Prepare the ground for this with some formulaic changes which will
otherwise greatly clutter substantive patches.
(A client will still only think for one walker at once, because that's
what the client protocol expects - and anything else would be far too
confusing.)
General:
* Introduce the concept of a `walker' to ms-queuedaemon.
* Provide a list of the walkers which might exist, `walkers'
* Provide some helper procedures for iterating over these,
and easily accessing their state.
Queue handling:
* Add a new `w' argument to many procs: specifically, most of the
procs in the section `machinery for running the queue'.
* Log the walker ($w) at the start of all relevant log messages.
* Pass the -w option to ms-planner and ms-planner-debug.
* Add safety catches which will crash the ms-queuedaemon if it finds
it is asking the same client to think for more than one walker.
* we-are-thinking and check-we-are-thinking tell the caller what
walker the client is thinking for.
* In the resource-plan.html filename, replace `plan' with the walker
filename.
Elsewhere:
* Teach dequeue-chan to deal with all the walkers, including
maybe the (one) walker for which the client is thinking.
* Teach log-state to report on all the walkers.
* In the runneeded logic, hardcode `plan' as the walker to use.
There is still actually only one walker.
No overall functional change, except to some log messages.
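The safety catch and the we-are-thinking query could look like this in Python. This is a sketch with hypothetical names: the daemon keeps a map from client to walker and crashes rather than let one client think for two walkers at once.

```python
class SingleWalkerGuard:
    def __init__(self):
        self.thinking = {}  # client -> walker it is thinking for

    def start(self, client, walker):
        current = self.thinking.get(client)
        if current is not None:
            raise RuntimeError(
                f"{client} asked to think for {walker} "
                f"while already thinking for {current}")
        self.thinking[client] = walker

    def we_are_thinking(self, client):
        # Tell the caller which walker, if any, the client is thinking for.
        return self.thinking.get(client)

    def done(self, client):
        del self.thinking[client]
```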
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Fix walker-globals to import the $w/$v from #0, ie the global scope
Correct invocation of upvar in walker-globals
Use walker-globals everywhere, not obsolete name walker-vars
Do not pass w to do-book-resources (which does not want it
because it uses chan-we-are-thinking)
Ian Jackson [Tue, 1 Sep 2015 15:52:17 +0000 (16:52 +0100)]
Planner: ms-planner support -w option
We are going to introduce multiple concurrent streams of planning
processing, called `walkers' in ms-queuedaemon. The work-in-progress
plan is stored, server-side, during planning, in data-plan.pl. But we
need to have more than one of these.
Update ms-planner and ms-planner-debug to honour a -w option, to
specify a replacement for the word `plan' in `data-plan.pl'.
No overall functional change, since nothing uses these options yet.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 13:56:46 +0000 (14:56 +0100)]
Planner: client side: New `!OK think noalloc' protocol
Introduce a way for the queue daemon to tell its client that it must
not allocate anything in this planning iteration.
In the client:
* Advertise the new feature via set-info.
* Accept the `noalloc' part of `!OK think noalloc';
* Print that in our log message;
* Honour it by passing it to $resourcecall.
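A Python sketch of the client-side parsing. It assumes the existing permission-to-think message is exactly `!OK think` and that `noalloc` is the only new option; the function name is hypothetical. The result would then be handed to $resourcecall as its mayalloc argument.

```python
def parse_think(line):
    """Return mayalloc for a `!OK think [noalloc]' line."""
    words = line.split()
    if words[:2] != ["!OK", "think"]:
        raise ValueError(f"unexpected reply: {line!r}")
    extra = words[2:]
    if extra == []:
        return True   # ordinary planning: allocation permitted
    if extra == ["noalloc"]:
        return False  # plan, but do not allocate anything
    raise ValueError(f"unknown think options: {extra!r}")
```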
And document the new protocol. However, there is no server-side yet,
so this does not yet introduce any overall change to the system.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Tue, 1 Sep 2015 13:50:52 +0000 (14:50 +0100)]
Planner: client side: $mayalloc parameter to $resourcecall->()
Add a new parameter to $resourcecall which allows the alloc_resources
loop in Osstest::Executive to specify to its clients that on this
occasion they should not make any actual allocations.
The callers of alloc_resources are all adjusted to honour this new
parameter:
* ts-hosts-allocate-Executive avoids allocating unless $mayalloc
* mg-allocate avoids allocating unless $mayalloc
* mg-blockage never allocates anyway.
Currently we always pass 1, so no functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Add missing my $mayalloc. ($plan is global.)
Ian Jackson [Tue, 1 Sep 2015 18:15:32 +0000 (19:15 +0100)]
Planner: Fix indefinite holdoff
runneeded-ensure-will would always reset the runneeded_holdoff_after
timer. So no new queue run would start until no runneeded-ensure-will
has occurred for (currently) 30s.
Instead, only start the timer if it's not already running.
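The difference between the buggy and fixed timer handling, modelled in Python with an explicit clock. The class is hypothetical; the real code uses a Tcl `after` timer. Because ensure_will no longer resets a running timer, a steady stream of calls can no longer postpone the queue run forever.

```python
class Holdoff:
    def __init__(self, delay):
        self.delay = delay
        self.deadline = None  # None: no timer running

    def ensure_will(self, now):
        # Fixed behaviour: start the timer only if none is running.
        if self.deadline is None:
            self.deadline = now + self.delay

    def should_run(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False
```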
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 1 Apr 2015 16:55:12 +0000 (17:55 +0100)]
ms-planner: Propagate a booking's Job to the plan
This needs to be done in several places:
- When booking resources (cmd: book-resources), to initially propagate
from the booking (e.g. from ts-hosts-allocate-Executive's input).
- On reset (cmd: reset) so that the Events corresponding to actual
allocations retain their Job.
- When retrieving the plan (cmd: get-plan), so that it is available
for logging etc.
The Job is added by a following patch "ts-hosts-allocate-Executive:
Add the requesting Job to the booking".
This patch has been deployed on the Cambridge instance for testing
with no ill-effects.
cmd_reset does not include a ->Job for jobs which are "(preparing)",
corresponding to a job which is going to use a shared host which is
currently being installed by another job. I was unable to figure out a
way to include these.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 15:43:47 +0000 (16:43 +0100)]
Disable proxy for all preseeded wget
At least in some contexts scripts can be run with http_proxy pointing
to the apt proxy (I noticed it in /usr/lib/base-installer.d/ hook used
for ucode installation).
Since all of these particular fetches are from a webserver known to be
local, just disable proxying altogether.
With busybox wget in d-i this is done with the -Y argument.
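As a sketch, a hypothetical Python helper that builds such a preseed fetch command. `-Y off` is busybox wget's proxy on/off switch, so an inherited http_proxy is ignored. The arguments are shell-quoted because the result is a shell command line.

```python
import shlex


def preseed_fetch(url, dest):
    """Shell command fetching url to dest with proxying disabled."""
    return " ".join(["wget", "-Y", "off", "-O", shlex.quote(dest), shlex.quote(url)])
```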
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 16:52:41 +0000 (17:52 +0100)]
Debian: Create /boot/boot -> . symlink on ARM when PvMenuLst enabled
This is under the same conditional as the nobootloader confirmation
one, since they effectively both stem from the lack of a boot loader
and the consequential use of the pv-grub-menu package.
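A Python sketch of the conditional symlink; the real change is part of the Debian install tooling. The helper name and its boolean parameters are hypothetical. The link is created only on ARM with PvMenuLst enabled, mirroring the nobootloader workaround's condition.

```python
import os


def maybe_make_boot_boot_link(boot_dir, is_arm, pv_menu_lst):
    """Create <boot_dir>/boot -> . when is_arm and pv_menu_lst hold."""
    if not (is_arm and pv_menu_lst):
        return False
    link = os.path.join(boot_dir, "boot")
    if not os.path.lexists(link):
        os.symlink(".", link)
    return True
```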
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 16:52:39 +0000 (17:52 +0100)]
Debian: ARM: only apply no bootloader workaround if xopts{PvMenuLst}
This workaround is only necessary because of how pv-grub-menu works,
so we should only apply both or neither of them.
This results in a long line and I'm about to add a second workaround
to this block, so switch to a regular if block instead of postfixing
on the one command. Move the comment inside that block in preparation
for other workarounds as well.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 13 Aug 2015 16:52:36 +0000 (17:52 +0100)]
ts-debian-di-install: Use exit/poweroff in preference to exit/always_halt
always_halt results in d-i calling "halt", which does not necessarily
power off the host (it seems to for x86/PV Xen guests, but does not for
ARM). Using exit/poweroff calls "poweroff", which is equivalent to
"halt -p"; doing so results in ARM guests powering off as desired.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 4 Sep 2015 10:46:37 +0000 (11:46 +0100)]
cs-bisection-step: Properly handle external job refs in template
cs-bisection-step has had, for a long time, code which is supposed to
handle the situation where the template flight contains build job
references to other flights.
However:
- The regexp to spot these other-flight job reference runvars would
never match because it said \s where \S was probably intended (and
. would be better);
- If it were to match, the flight and job arguments to the recursive
preparejob invocation were the wrong way round. preparejob takes
the job name first.
Fix these two bugs. Now it does seem to work properly.
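A Python illustration of both bugs. It assumes an other-flight job reference looks like "<flight>.<job>"; both that format and the regexps below are hypothetical reconstructions, not the Perl source. With `\s` the job part can only match whitespace, so a real reference never matches. The recursive call must also pass the job name first.

```python
import re

BUGGY = re.compile(r"^(\d+)\.(\s+)$")  # \s: job part must be whitespace
FIXED = re.compile(r"^(\d+)\.(.+)$")   # `.' as the message suggests


def preparejob(job, flight):
    # Hypothetical stand-in; like the real one it takes the job first.
    return f"prepare {job} from flight {flight}"


def handle_ref(value):
    m = FIXED.match(value)
    if not m:
        return None
    flight, job = m.groups()
    return preparejob(job, flight)
```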
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 4 Sep 2015 10:38:35 +0000 (11:38 +0100)]
cs-bisection-step: Print our command line at the start
The usual approach for debugging cs-bisection-step is to repro the
problem (with --max-flight), which is most easily done by copying the
command line provided during a run which did the wrong thing.
Print the command line at startup, so that it appears in the report.
This will save us grobbling through the logs and cron mail.
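In Python terms, printing a reusable command line might look like this; the helper is hypothetical, since cs-bisection-step is Perl. `shlex.join` (Python 3.8+) quotes each argument, so the printed line can be pasted back into a shell, for example with --max-flight added.

```python
import shlex
import sys


def format_command_line(argv):
    # Quote each word so the line survives copy-and-paste into a shell.
    return shlex.join(argv)


print("running:", format_command_line(sys.argv))
```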
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Find all the places where adhoc-revtuple-generator runs subprograms
and have it add set -x (either by adding $OSSTEST_AHRTG_SETX to an
existing set -e, or using $setx which is either : or `set -x').
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 3 Sep 2015 10:39:37 +0000 (11:39 +0100)]
cr-daily-branch: Make sg-report-flight ignore bisections
sg-report-flight when testing X' (with a baseline of X) can justify a
failure of T(X',Y,Z) with a bisection failure of T(X,Y'',Z).
If Y'' breaks T then this makes it look to sg-report-flight like T was
already broken in X; cr-daily-branch could then push X' even though it
is actually broken.
This happened rarely, because cr-daily-branch's sg-report-flight would
only look at flights on the right branch, so only a bisection of T on
that branch can cause this, but nevertheless this can produce bad
pushes.
So: have cr-daily-branch pass a --blessings option to sg-report-flight,
so that it only looks at (usually) `real' flights, rather than the
default of both `real' and `real-bisect'.
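A toy Python model of why the blessing filter matters. It is NOT
sg-report-flight's actual algorithm: Flight, find_justification and
the flight numbers are hypothetical, and "justification" is reduced to
"the test already failed in some eligible earlier flight":

```python
from dataclasses import dataclass, field


@dataclass
class Flight:
    number: int
    blessing: str                              # e.g. "real" or "real-bisect"
    failed: set = field(default_factory=set)   # tests that failed in this flight


def find_justification(test, history, blessings=("real", "real-bisect")):
    """Toy model: return the most recent flight, among those whose blessing
    is in `blessings`, in which `test` had already failed, else None."""
    for f in sorted(history, key=lambda f: f.number, reverse=True):
        if f.blessing in blessings and test in f.failed:
            return f
    return None


history = [
    Flight(100, "real"),                # baseline X: T passed
    Flight(101, "real-bisect", {"T"}),  # bisection ran T(X,Y'',Z): it failed
]

# Default blessings: the bisection failure "justifies" T failing in X'.
print(find_justification("T", history).number)                # 101
# Restricted to `real', nothing justifies the failure, so X' is not pushed.
print(find_justification("T", history, blessings=("real",)))  # None
```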
An alternative, more complicated, approach would be for
sg-report-flight to compare versions of Y, Z, et al, when looking for
justifications, but I'm not sure this is desirable because it would
effectively reset the heisenbug compensator each time any other tree
changed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 28 Aug 2015 11:28:26 +0000 (11:28 +0000)]
Other-revision-jobs: Update cs-bisection-step
This is rather more subtle. We want to be able to bisect over all the
relevant inputs.
What we actually want to do if one of the *prev* tests fails is to
treat the "previous Xen branch" as a separate "tree" when bisecting,
so each revision tuple has both "current" and "old" Xen versions.
That way if the stable-4.x branch has broken forward migration, we
will report it properly.
Indeed, this needs to be extended not just to the Xen revision, but
all the inputs to the *prev* build.
We achieve this with the new concept of an `other-revision job suffix',
introduced in the previous patch. The bisector now works internally
always with tree names which are `<tree>[ <suffix>]' (delimited by a
space). (Henceforth, we'll call `[ <suffix>]' the `othrev'.)
That is, all the revisions specified in prev build jobs are treated as
revisions of different trees from the revisions of apparently-same trees
in non-prev jobs.
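A minimal Python sketch of the decorated naming, assuming tree names
contain no spaces and that the only othrev so far is ' prev'. The
helpers are hypothetical; the sketch also shows how keying on the
decorated name lets main and prev build jobs disagree without being
flagged as a mixed revision:

```python
def decorated_name(tree, othrev):
    """Internal bisector tree name `<tree>[ <suffix>]'.
    othrev is '' for main-revision jobs, else e.g. ' prev'."""
    return tree + othrev


def split_name(name):
    """Undo decorated_name: return (tree, othrev)."""
    tree, sep, suffix = name.partition(" ")
    return tree, sep + suffix


def rev_tuple(rows):
    """rows: (tree, othrev, revision) triples from one flight's build jobs.
    Returns {decorated name: revision}; raises on a genuine mixed revision."""
    result = {}
    for tree, othrev, rev in rows:
        name = decorated_name(tree, othrev)
        # Main and prev jobs map to different names, so they never clash.
        if result.setdefault(name, rev) != rev:
            raise ValueError(f"mixed revisions for {name!r}")
    return result


rows = [("xen", "", "aaa"), ("xen", "", "aaa"), ("xen", " prev", "bbb")]
print(rev_tuple(rows))  # {'xen': 'aaa', 'xen prev': 'bbb'}
```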
The specific changes needed to cs-bisection-step are very small. We
only need to adjust the code which reads and writes the database:
* When we do the cross join on urls and revisions which generates the
rev tuple for a particular flight, also have the database compute
the othrev for each tree. Then, print the othrev in the debug
output, and append it to the tree name.
That resulting name is used everywhere:
It affects `mixed revision' detection, so we consider build-*-prev
jobs with differing revisions to be problematic, as are main-revision
build jobs with differing revisions, but we treat each category of build
job separately so the fact that the prev and main build jobs have
different revisions is fine.
The name is used for the key that is returned from flight_rmap.
Thence it is used for the Name in @treeinfos, and therefore the
results from flight_rtuple will be in terms of this decorated tree
namespace.
* When we are preparing a new job to go, we need to (effectively) undo
this transformation. The query which finds the `tree_' variables
for a particular tree name is arranged to take an additional
parameter, which is the othrev. If the othrev does not match the
job, the name is not returned in the results.
Actually, because both the job and the othrev are query parameters,
what happens is either that they match (ie, the othrev in the tree
name from @treeinfos is indeed the othrev for the job we are
constructing) in which case we process the variable as before; or
they don't match, in which case the query contains contradictory
conditions in its AND clauses, and returns no rows.
So the ultimate effect is that we process each Name from @treeinfos
only if it is for this kind of job. This slightly convoluted
implementation arises from the fact that the job-to-othrev mapping
is implemented as SQL, so we need to ask the database.
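A sketch of the contradictory-AND trick using Python's sqlite3, under
stated assumptions: a simplified runvars(flight, job, name, val) table,
a hypothetical CASE expression standing in for the job-to-othrev
mapping, and a hypothetical tree_runvars helper. A mismatched othrev
simply yields no rows:

```python
import sqlite3

# Hypothetical SQL mapping from a job name to its othrev ('' or ' prev').
OTHREV_SQL = "(CASE WHEN job LIKE 'build-%-prev' THEN ' prev' ELSE '' END)"


def tree_runvars(db, flight, job, tree, othrev):
    """Fetch the tree_<tree> runvar for one job, but only if the othrev
    taken from the decorated tree name matches the job's own othrev.
    On a mismatch the AND clauses contradict each other: no rows."""
    query = f"""SELECT name, val FROM runvars
                WHERE flight = ? AND job = ? AND name = ?
                  AND {OTHREV_SQL} = ?"""
    return db.execute(query, (flight, job, "tree_" + tree, othrev)).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runvars (flight INTEGER, job TEXT, name TEXT, val TEXT)")
db.executemany(
    "INSERT INTO runvars VALUES (?, ?, ?, ?)",
    [
        (1, "build-amd64", "tree_xen", "git://example.org/xen.git"),
        (1, "build-amd64-prev", "tree_xen", "git://example.org/xen.git"),
    ],
)

print(tree_runvars(db, 1, "build-amd64-prev", "xen", " prev"))  # one row
print(tree_runvars(db, 1, "build-amd64", "xen", " prev"))       # []
```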
There is no need to change any of the output processing and reporting,
because "<tree> prev" is a perfectly good thing to print in all the
relevant contexts.
And there is no need to change how we drive adhoc-revtuple-generator,
because we do not pass it tree names at all, only urls.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 28 Aug 2015 14:17:09 +0000 (15:17 +0100)]
Other-revision-jobs: Provide other_revision_job_suffix
This is a string, a function of the job name, that identifies the
class of `other revisions'. It is empty for main-revision jobs
and currently there is only `<delimiter>prev' for build-*-prev.
We are going to use this in the bisector.
Reimplement main_revision_job_cond in terms of this. No functional
change, except that the SQL optimiser may have more work to do.
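A Python sketch of the shape of these two functions, assuming both
build SQL expression strings from an SQL expression for the job name.
The LIKE pattern, the quote check and the exact signatures are
assumptions, not Osstest.pm's code; sqlite3 is used only to evaluate
the generated expressions:

```python
import sqlite3


def other_revision_job_suffix(jobexpr, delim):
    """Return an SQL expression: '' for main-revision jobs, or
    <delim>prev for build-*-prev jobs (the only class so far)."""
    if "'" in delim:
        raise ValueError("delimiter must not contain a quote")
    return f"(CASE WHEN {jobexpr} LIKE 'build-%-prev' THEN '{delim}prev' ELSE '' END)"


def main_revision_job_cond(jobexpr):
    """Main-revision jobs are exactly those with an empty suffix."""
    return f"({other_revision_job_suffix(jobexpr, ' ')} = '')"


db = sqlite3.connect(":memory:")
for job in ["build-amd64", "build-i386-prev"]:
    suffix, is_main = db.execute(
        f"SELECT {other_revision_job_suffix('?', ' ')}, {main_revision_job_cond('?')}",
        (job, job),
    ).fetchone()
    print(repr(job), repr(suffix), is_main)
```

Expressing the main-revision test via the suffix keeps a single
definition of which jobs are special, at the cost of a slightly more
complex expression for the SQL optimiser.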
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 28 Aug 2015 11:16:40 +0000 (11:16 +0000)]
Other-revision-jobs: Provide central test
Since 75fbbc19 "Arrange to test migration from the previous Xen
version", some flights have contained additional jobs build-*-prev,
which build a different revision of xen.git.
However, this violates an existing assumption in several of the
automatic archaeologists, namely that a flight should contain only
runvars referring to a single revision of a tree.
We will need to adjust all the places where this assumption is baked
in. The question arises as to how the code in general is supposed to
know. There are many possible schemes, but almost all of them would
involve some kind of schema change and/or would be violated by
now-recorded history.
For now we adopt the following rule: the job name tells you. That is,
revision runvars in jobs with certain job names are disregarded. We
call non-disregarded jobs `main-revision jobs', since they use the
`main' revisions of everything, and others `other-revision jobs'.
We provide a single function in Osstest.pm which takes as argument a
SQL expression string representing a job name, and returns a SQL
expression string evaluating to a boolean, specifying whether the job
is a main revision job. This can be used in queries.
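A Python sketch of such a central test in use, assuming a simplified
runvars(flight, job, name, val) table and a hypothetical
NOT LIKE 'build-%-prev' condition. With the condition in the query, a
flight containing build-*-prev once again appears to have a single
revision per tree:

```python
import sqlite3


def main_revision_job_cond(jobexpr):
    """Hypothetical central test: a job is a main-revision job
    unless its name looks like build-*-prev."""
    return f"({jobexpr} NOT LIKE 'build-%-prev')"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runvars (flight INTEGER, job TEXT, name TEXT, val TEXT)")
db.executemany(
    "INSERT INTO runvars VALUES (?, ?, ?, ?)",
    [
        (1, "build-amd64", "revision_xen", "aaa"),
        (1, "build-i386", "revision_xen", "aaa"),
        (1, "build-amd64-prev", "revision_xen", "bbb"),
    ],
)

# '_' is a LIKE wildcard, hence the escaped 'revision\_%' pattern.
query = (
    "SELECT DISTINCT name, val FROM runvars"
    " WHERE flight = ? AND name LIKE 'revision\\_%' ESCAPE '\\'"
    f" AND {main_revision_job_cond('job')}"
)
print(db.execute(query, (1,)).fetchall())  # [('revision_xen', 'aaa')]
```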
In subsequent patches I will go through all plausibly-relevant output
from
git-grep 'revision_\|revision\\\\_'
and update each piece in turn.
There are obviously-irrelevant hits in TestSupport (build_clone and
store_vcs_revision) and in BuildSupport.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Tue, 11 Aug 2015 20:25:09 +0000 (21:25 +0100)]
Toolstack/libvirt: use URI in migration command
Virsh migrate expects a URI, not a host. We don't actually care what
kind of transport it uses, the main objective is to test migration, so
use xen+ssh for the time being.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- added --live and factored out $duri ]
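A Python sketch of the resulting command line, mirroring the
$dho/$dst/$duri naming. The real code is Perl in Osstest's
Toolstack/libvirt; virsh_migrate_cmd, the dict model of the host
object and the hostname "host2" are hypothetical. Only the host's Name
reaches the command line, never the host object itself:

```python
import shlex


def virsh_migrate_cmd(dho, gn, live=True):
    """Build the virsh command line to run on the source host.
    `dho` is the destination host object (modelled here as a dict)."""
    dst = dho["Name"]
    duri = f"xen+ssh://{dst}"  # transport is incidental; we only test migration
    args = ["virsh", "migrate"]
    if live:
        args.append("--live")
    args += [gn, duri]
    return " ".join(shlex.quote(a) for a in args)


print(virsh_migrate_cmd({"Name": "host2"}, "debian.guest.osstest"))
# virsh migrate --live debian.guest.osstest xen+ssh://host2
```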
Ian Campbell [Tue, 11 Aug 2015 16:25:10 +0000 (17:25 +0100)]
Arrange to test migration from the previous Xen version
There are several steps to this:
- Identify $prevxenbranch, that is the branch which precedes
$xenbranch.
- Create appropriate build jobs.
- Add support in ts-xen-install for overriding {xen,}buildjob on a
per-ident basis
- Add a new recipe test-pair-oneway which only migrates from
src_host to dst_host and not the reverse
- Create appropriate test jobs, overriding the default builds for
src_host.
Currently we only do this for xen* branches and using xl, but in the
future we may wish to add to the libvirt branch too.
In make-flight if REVISION_PREVXEN is not supplied (e.g. called from
standalone-reset or by hand etc) then we create the build-$arch-prev jobs
with no revision_xen, same as build-$arch.
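A Python sketch of that fallback, assuming each build job is described
by a dict of runvars. make_build_jobs is hypothetical and models only
the arch and revision_xen runvars; real jobs carry many more. When a
revision is not supplied, revision_xen is simply omitted:

```python
def make_build_jobs(arch, revision_xen=None, revision_prevxen=None):
    """Return {job name: runvars} for the main and -prev Xen build jobs."""

    def runvars(rev):
        rv = {"arch": arch}
        if rev is not None:
            rv["revision_xen"] = rev  # omitted entirely when not supplied
        return rv

    return {
        f"build-{arch}": runvars(revision_xen),
        f"build-{arch}-prev": runvars(revision_prevxen),
    }


print(make_build_jobs("amd64"))
# {'build-amd64': {'arch': 'amd64'}, 'build-amd64-prev': {'arch': 'amd64'}}
```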
It would be nice to try and reuse the builds from the last flight
which tested the $prevxenbranch baseline. I've not done that here.
Ian Campbell [Wed, 5 Aug 2015 12:48:27 +0000 (13:48 +0100)]
libvirt: Pass correct arguments to virsh migrate
$dst is a host hash/object, resulting in:
2015-08-04 22:35:25 Z executing ssh ... root@172.16.144.34 virsh
migrate debian.guest.osstest HASH(0x28f4310)
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `virsh migrate debian.guest.osstest HASH(0x28f4310)'
Switch to using the same pattern as xl.pm, which is to call the
argument (containing the host hash) $dho and for $dst to be a local
variable containing $dho->{Name}.
Also s/$ho/$sho/ to match xl.pm, since I think that is clearer about
what role everything has.
Fix the prototype too while editing this function.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Mon, 27 Jul 2015 12:51:27 +0000 (13:51 +0100)]
ts-debian-hvm-install: Use xargs -0 to avoid massive filelist in logs.
The current arrangement is a bit odd; I'm not sure why it would be
that way, and it results in a huge list of files in the middle of the
log which is rather boring to scroll through.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 27 Jul 2015 12:51:26 +0000 (13:51 +0100)]
ts-debian-hvm-install: use di_installcmdline_core
This is primarily to get DEBIAN_FRONTEND=text, for easier-to-read
logging.
Previously the command line consisted of the console and
preseed/file=/preseed.cfg. After this it is more complex.
The preseed file uses file= which is an alias for preseed/file. Extra
options are given including DEBIAN_FRONTEND and DEBCONF_DEBUG and the
following are preseeded via the command line:
The previously implied options "auto=true preseed" are now explicit.
In addition the following harmless (in this context) options are
added:
hw-detect/load_firmware=
hostname=
netcfg/dhcp_timeout=
netcfg/choose_interface=
The caller could also cause debconf/priority to be set, but doesn't
here.
ts-debian-di-install in the distro test series also uses
di_installcmdline_core for guest installs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>