Hi.

Now I'm going to talk some more about osstest, the automatic test
framework used for continuous integration by the Xen Project.

osstest has a number of interesting, and possibly rare or unique,
architectural features which make it flexible and powerful, but which
aren't always obvious if you don't work with it from day to day.


[ architecture slide ]

osstest has what I like to call a controllerless architecture.

Most other CI systems have a single controller host - often, a single
controller process - which coordinates and directs all the system's
activities.

osstest does not.
You can see the architecture in the slide.

The shared infrastructure is very limited.  There is the database,
which records test results, and also has the master record of which
test harness job is currently using which resource.  And, there are
two simple coordination daemons that allow different test harness
processes to collaboratively share the pool of test resources.

There is no central task generator and no central resource planner.
In the production system, individual harness instances (for example,
the instance which runs the xen-unstable tests) are typically invoked
from cron, and decide whether there is work to do and what that work
is.

The instances are called upon, in a simple queue based on their
priority and start time, to book test hosts.  Each harness instance
gets to decide for itself which resources (ie, which test boxes) it
would prefer to use.
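
To make that concrete, here is a minimal sketch of the idea, not osstest's actual code.  The names `Instance`, `choose_host` and `allocate` are hypothetical, and it assumes that a lower priority number is served first and that each instance's policy is simply "first free host from my preference list".

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Instance:
    # Ordering uses only (priority, start_time); both rules are assumptions.
    priority: int
    start_time: float
    name: str = field(compare=False)
    preferred_hosts: list = field(compare=False, default_factory=list)

    def choose_host(self, free_hosts):
        """Each instance applies its own policy; here, first free preferred host."""
        for host in self.preferred_hosts:
            if host in free_hosts:
                return host
        return None


def allocate(instances, free_hosts):
    """Call on instances in queue order and let each one pick its own host."""
    queue = list(instances)
    heapq.heapify(queue)
    free = set(free_hosts)
    bookings = {}
    while queue:
        inst = heapq.heappop(queue)
        host = inst.choose_host(free)
        if host is not None:
            free.remove(host)
            bookings[inst.name] = host
    return bookings
```

The point of the sketch is that the shared part only decides whose turn it is; the choice of host lives in the instance, so two instances running different code can apply different policies.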

This distributed arrangement has a number of interesting advantages.

Multiple different versions of the osstest codebase can be running in
a single shared environment.

This greatly simplifies updates.  New versions of osstest can be
deployed seamlessly, without stopping operation: existing instances
will pick up the new code at a convenient point, after finishing one
set of tests and before starting on a new run.

This principle is used to allow osstest to self-audit: when osstest is
updated, the new version will run in parallel with the old and test
itself.  Most kinds of faults in the new code - particularly, faults
in the primary test implementations - will be detected, so that full
deployment of a broken version is avoided.

Even updates to the policy and heuristics of test box selection can be
handled this way, since there is no centralised planning policy.

This also means that different harness instances can run on different
hosts - although, currently, the production instance in Massachusetts
does not make use of this facility.

There are theoretical scaling advantages to this architecture, too.
For example, it would be possible in the future to shard the database
tables containing the test results.  These potential
scaling advantages are largely unrealised at the moment, since we
haven't needed them yet.


[ tolerable FAIL 110386 slide ]

Another somewhat unusual aspect of osstest is its approach to handling
failures.

Most existing CI systems, on code deployment paths or gating pushes to
public branches, regard test failures as blocking.  Marking a particular
test failure as expected is done manually.

This requirement for manual intervention complicates the
co-development of mainline code, and the corresponding tests: it
imposes a sequencing requirement; or it requires a test administrator
to manually intervene when a test starts passing, to remove the test
from the list of expected failures.

When osstest is collating results of a test run, it will automatically
analyse the test history of the particular set of branches of the
codebase to determine whether a particular failed test is expected to
fail, or represents a regression.

Not only does this eliminate part of the triage burden, it also means
that the very same battery of tests can be run against both older and
newer code: features not present in older code give test failures, but
not regressions.
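
As a rough illustration of the principle only: the function below is a hypothetical simplification, and it assumes that "expected to fail" means "has never passed in the relevant history".  The real analysis has to be more careful about which earlier runs count as a baseline.

```python
def classify(test, result, history):
    """Classify one test result against earlier runs of the same branches.

    history is a list of dicts mapping test name to "pass" or "fail",
    one dict per earlier run (an assumed, simplified data shape).
    """
    if result == "pass":
        return "pass"
    # A failure only counts as a regression if the test has passed before.
    ever_passed = any(run.get(test) == "pass" for run in history)
    return "regression" if ever_passed else "tolerable failure"
```

A test for a feature that older code lacks has no earlier pass in that branch's history, so its failure is reported but does not block.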


[ bisector slide ]

osstest's data-driven approach is also evident in its automatic
cross-tree bisector: if a test is seen to fail, osstest will
automatically search for a suitable previous pass of the same test on
the same host, and attempt to reproduce both the previous pass and the
current failure.

If that success and failure are reproducible, osstest will start to
bisect across all the changes in between.  If the bug is reproducible
and was introduced by one of those commits, osstest will normally be
able to pinpoint it.

The overall goal is to try to reduce the need for human triage.

osstest can do this even though the test which failed is probably
using different versions of several different components, each with
its own revision history.  osstest will combine the different git
trees for the different versions into a notional consolidated history.
You can see an example of this on the slide here.  Each box represents
a combination of versions of the four different input source trees.
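
The search itself can be sketched as follows.  This is a simplified illustration and not osstest's implementation: it assumes the consolidated history has been flattened into a linear list of version combinations, whereas real git histories form a graph, and the names `bisect`, `combos` and `passes` are hypothetical.

```python
def bisect(combos, passes):
    """Find the first failing combination of versions.

    combos is an ordered list of version combinations (one tuple per box),
    where combos[0] is known to pass and combos[-1] is known to fail.
    passes(combo) runs the test on that combination and returns True or False.
    Assumes a single, reproducible pass-to-fail transition.
    """
    good, bad = 0, len(combos) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(combos[mid]):
            good = mid
        else:
            bad = mid
    return combos[bad]
```

For example, with two trees and the combinations `[(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]`, a bug introduced by revision 1 of the second tree is found at `(1, 1)` after testing only two intermediate combinations.  The assumption of a reproducible transition is exactly what breaks down for intermittent bugs.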


There is still room for improvement.  The biggest problem is that
neither the regression analysis, nor the automatic bisector, works
very well with intermittent bugs.  And we have rather too many
intermittent bugs in our codebases.  I have a half-formed plan to deal
with these too, although mechanically investigating a true heisenbug
is necessarily rather time- and resource-consuming.


[ standalone slide 1 - architecture ]

Setting up a production instance of any CI system is not entirely
trivial and osstest is no exception.  Here's that osstest architecture
slide again.

But we want people to be able to easily add tests, and to benchtest
them.  Most of those developers don't want to get involved with such a
complicated edifice.

[ standalone simple slide ]

In order to help with this, osstest provides a standalone mode in
which it does not require any database or daemons.  And you can just
run the individual test step scripts, interactively.

It's still not entirely trivial, but the remaining complexity is
mostly irreducible, and arises from one of two main sources.  Firstly,
many of the tests need some help from your overall network
environment, so you have to give osstest a config file to tell it
about your dhcp server, for example.  Secondly, to test Xen this way
you necessarily have to set up a complete Xen instance, and some
guests, on a separate test box.  osstest can automate that for you,
if you choose to use the relevant functionality and give it the
right background information.


[ final slide ]

There is a lot more I could say about how osstest works of course.  I
hope I've given you a flavour of some of its more interesting
features.

I hope you'll be curious and perhaps try it out in standalone mode.

I'd like to take questions now.