Dan Smith [Fri, 2 Dec 2016 20:14:30 +0000 (12:14 -0800)]
Setup CellsV2 environment in base test
This makes us automatically set up a usable CellsV2 environment in
the base test case, if we're setting up database stuff. If the test
uses the DB normally, we get a cell0, a real cell, and create
hostmappings for any compute services that we start. If we're not
a DB-using test, we mock out the cell mapping stuff so that
everything appears to just be in the same cell.
This includes a fix for the nova-manage tests, which need to control
their own cells destiny now.
This includes a fix to the pci tests for libvirt, which were starting
the same compute service twice in a row. That no longer works because
we fail to create the duplicate hostmapping record, but we should not
have been doing that anyway. This makes us only create it once.
This includes a fix to the connection switching test to set up the
database fixtures itself since it requires a specific environment
that we now confuse by always going through the CellDatabases fixture.
Dan Smith [Thu, 8 Dec 2016 20:25:37 +0000 (12:25 -0800)]
Cleanup after any failed libvirt spawn
When we go to spawn a libvirt domain, we catch a few types of exceptions
and perform cleanup before failing the operation. For some reason, we
don't do this universally, which means that we leave things like network
devices lying around (from plug_vifs()). If a delete comes later, it
should clean those things up. However, if a subsequent failure prevents
that, and especially if we do a local delete at the API, we'll leak those
interfaces.
As seen in at least one real-world situation, this can cause us to leak
interfaces until we have tens of thousands of them on the system, which
then causes secondary failures.
Since we run the cleanup() routine for certain failures, it certainly
seems appropriate to run it always and not leave residue until a
successful delete is performed.
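The intended behaviour can be sketched as below. This is a minimal
illustration and not the driver code: spawn_domain, plug_vifs,
create_domain and cleanup are hypothetical stand-ins passed in as
callables.

```python
def spawn_domain(plug_vifs, create_domain, cleanup):
    """Run the spawn steps; on any failure, clean up and re-raise."""
    try:
        plug_vifs()
        create_domain()
    except Exception:
        # Catching Exception (rather than a few selected types) means
        # every failure path releases what plug_vifs() set up.
        cleanup()
        raise


# Usage: a failure after plug_vifs() still releases the interface.
plugged = []


def fake_plug_vifs():
    plugged.append("tap0")


def fake_create_domain():
    raise RuntimeError("domain creation failed")


def fake_cleanup():
    plugged.clear()


try:
    spawn_domain(fake_plug_vifs, fake_create_domain, fake_cleanup)
except RuntimeError:
    # The original error still propagates to the caller.
    pass
```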
Andrea Frittoli [Mon, 5 Dec 2016 15:34:11 +0000 (15:34 +0000)]
Guestfs handle no passwd or group in image
When setting ownership of a file or directory, the guestfs driver
looks for the /etc/passwd and/or /etc/group files. If they are not
found, the current driver lets the augeas RuntimeError through,
which does not produce a very helpful error message.
Fix that by handling the original exception and raising a
Nova exception with more details in it.
Matt Riedemann [Thu, 8 Dec 2016 18:44:54 +0000 (13:44 -0500)]
Don't trace on ImageNotFound in delete_image_on_error
The point of the delete_image_on_error decorator is to
clean up an image used during snapshot operations, so it
makes little sense to log an exception trace if the image
delete fails because the image no longer exists, which it
might not since the _snapshot_instance method will proactively
delete non-active images in certain situations.
So let's just handle the ImageNotFound and ignore it.
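A rough sketch of the pattern, assuming a simplified decorator that
takes a hypothetical delete_image callable and a stand-in
ImageNotFound class (the real decorator's signature is not shown in
this message):

```python
import functools
import logging

LOG = logging.getLogger(__name__)


class ImageNotFound(Exception):
    """Stand-in for Nova's ImageNotFound exception."""


def delete_image_on_error(delete_image):
    """Build a decorator that deletes the image if the wrapped call fails.

    ``delete_image`` is a hypothetical callable taking an image id.
    """
    def decorator(function):
        @functools.wraps(function)
        def wrapper(image_id, *args, **kwargs):
            try:
                return function(image_id, *args, **kwargs)
            except Exception:
                try:
                    delete_image(image_id)
                except ImageNotFound:
                    # The image is already gone: nothing to clean up
                    # and nothing worth a traceback.
                    pass
                except Exception:
                    LOG.exception("Error while cleaning up image %s",
                                  image_id)
                # Always re-raise the original failure.
                raise
        return wrapper
    return decorator
```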
Sean Dague [Thu, 8 Dec 2016 15:09:06 +0000 (10:09 -0500)]
Bump prlimit cpu time for qemu from 2 to 8
Users have reported failures when operating on large (30+ GB) disk
files on slow NFS backends. The prlimit of cpu_time 2 is suspected to
be the cause, because if folks hot patch in a qemu-img call that runs
before the prlimited one, the prlimited one succeeds.
This increases the allowed cpu timeout, as well as tweaking the error
message so that we return something more prescriptive when the
qemu-img command fails with prlimit abort.
In the original bug (#1449062), the main mitigation concern was a
carefully crafted image that gets qemu-img to generate > 1G of json,
and hence could be a node attack vector. cpu_time was never mentioned,
and I think it was added originally as a belt and suspenders addition.
As such, bumping it to 8 seconds shouldn't impact our protection in any
real way.
Sylvain Bauza [Wed, 2 Nov 2016 11:28:02 +0000 (12:28 +0100)]
Extend get_all_by_filters to support resource criteria
Given the scheduler wants to know which RPs can support a set of different
requests, each one having a resource class with an amount, we need to
modify the current ResourceProviderList method for returning a subset.
The proposal for the request parameter is a dictionary of amounts keyed
by the resource class name.
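To illustrate the shape of the proposed request parameter, here is an
in-memory sketch. The real method queries the database; the
providers_satisfying helper and the capacity data below are
hypothetical.

```python
def providers_satisfying(providers, resources):
    """Return names of providers with enough capacity for every class.

    ``providers`` maps a provider name to its free capacity per resource
    class; ``resources`` maps a resource class name to a requested amount.
    """
    return [
        name
        for name, free in providers.items()
        if all(free.get(rc, 0) >= amount for rc, amount in resources.items())
    ]


providers = {
    "rp1": {"VCPU": 8, "MEMORY_MB": 4096, "DISK_GB": 100},
    "rp2": {"VCPU": 2, "MEMORY_MB": 8192},
}
# Amounts keyed by resource class name, as proposed above.
request = {"VCPU": 4, "MEMORY_MB": 2048}
matching = providers_satisfying(providers, request)
```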
ChangBo Guo(gcb) [Thu, 24 Mar 2016 12:33:50 +0000 (20:33 +0800)]
Don't use 'updated_at' to check service's status
Commit b9bae02af2168ad64d3b3d28c97c3853cee73272 introduced
'last_seen_up' to check service status in Liberty. Nova used
'updated_at' for that check before Liberty, so we can remove the
legacy item now.
Note: 'last_seen_up' may be null at the beginning, so we still
need 'created_at' in this case.
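The fallback can be sketched as follows. The is_up function and the
60 second threshold are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def is_up(last_seen_up, created_at, now, down_time=timedelta(seconds=60)):
    """Treat a service as up if its last heartbeat is recent enough."""
    # last_seen_up is None until the first heartbeat is recorded, so
    # fall back to the creation time in that case.
    last_heartbeat = last_seen_up or created_at
    return now - last_heartbeat <= down_time
```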
Matthew Booth [Wed, 7 Dec 2016 14:45:40 +0000 (14:45 +0000)]
libvirt: Fix initialising of LVM ephemeral disks
The LVM backend expects to write directly to the target disk rather
than to the image cache when initialising an ephemeral disk. This is
confounded by Image.cache(), which doesn't call the given callback
(_create_ephemeral in this case), if the target already exists.
Dan Smith [Fri, 2 Dec 2016 20:12:24 +0000 (12:12 -0800)]
Fix up non-cells-aware context managers in test_db_api
We recently converted all the db_api calls to use pick_context_manager
so that they will work with CellsV2. There were, however, still some
instances in test_db_api that did the old thing for testing low-level
bits. This cleans those up in preparation for cells patches to come.
Since we have to use the pick decorators, this converts several things
to use a decorated closure for that reason. No real functional change
though.
Dan Smith [Thu, 1 Dec 2016 16:13:01 +0000 (08:13 -0800)]
Add SingleCellSimple fixture
This fixture makes it trivial to mock out all the cell listing stuff
for the simple case where you just want to assume a single cell that
is configured as the default database.
Pavel Kholkin [Thu, 1 Dec 2016 15:37:28 +0000 (18:37 +0300)]
[proxy-api] microversion 2.39 deprecates image-metadata proxy API
Almost all proxy APIs were deprecated in microversion 2.36.
However, the image-metadata sub-resource of images was missed.
This patch deprecates the image-metadata API from 2.39.
Dan Smith [Thu, 10 Nov 2016 21:19:32 +0000 (13:19 -0800)]
Make RPCFixture support multiple connections
For testing cells, we will need to track the driver instances that
we give out by url. This normally just works with a conventional
oslo.messaging driver, but the fake driver keeps internal data
structures for simulating its bus. If we end up with clients creating
a new instance of the driver in the rpc switching code, we'll never
be able to send messages to services because we'll always have
private/separate data structures.
So, this makes the fixture wrap the transport creation stuff
and unify references by url. In order to make this work, some
retooling of rpc.init() is done, which makes it more in line with
the recent additions we had for wrapping transport initialization
per connection anyway.
For now, a lot of our tests can't handle the possibility of
multiple RPC connections due to them looking at the global
transport_url configuration. So for the moment, even though this
makes the fixture support multiple independent connections, we
collapse any such attempts down to a single connection to the
default broker.
Note: this requires a fix in oslo.messaging 5.14.0
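The idea of unifying references by url can be sketched as a small
cache. TransportCache and its factory argument are hypothetical names
and do not reflect the fixture's actual interface.

```python
class TransportCache:
    """Hand out one transport object per URL."""

    def __init__(self, factory):
        # ``factory`` is a hypothetical callable creating a transport
        # for a given URL.
        self._factory = factory
        self._transports = {}

    def get(self, url):
        # Reuse the existing transport so every client of the same URL
        # shares the fake driver's internal data structures.
        if url not in self._transports:
            self._transports[url] = self._factory(url)
        return self._transports[url]


cache = TransportCache(lambda url: object())
first = cache.get("fake://cell1")
second = cache.get("fake://cell1")
other = cache.get("fake://cell2")
```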
tests: avoid starting compute service twice in sriov functional test
SRIOV functional tests that start two or more guests were able to start
the compute service multiple times with the same hostname, which affected the
correctness of the tests.
This patch makes sure that the compute service is started only once.
Closes-Bug: #1647776
Co-Authored-By: Dan Smith <dansmith@redhat.com>
Change-Id: I8556ce068571d8e496e6fba756c1977c1d2c3ca1
tests: generate correct pci addresses for fake pci devices
The fakelibvirt library was not generating correct pci addresses for
its fake pci devices. The PCI slot field would remain constant in all
generated devices.
While this issue is transparent to most of the tests,
test_create_server_with_PF_no_VF is affected, as it
looks up VFs by their addresses.
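One way to generate distinct addresses is sketched below. The
fake_pci_addresses helper, the bus number, and the slot/function
layout are assumptions, not the fakelibvirt code.

```python
def fake_pci_addresses(count, domain=0, bus=0x81, functions_per_slot=8):
    """Yield distinct PCI addresses in domain:bus:slot.function form."""
    for index in range(count):
        # Advance the slot once all functions of the current slot are
        # used, instead of leaving the slot field constant.
        slot, function = divmod(index, functions_per_slot)
        yield "%04x:%02x:%02x.%d" % (domain, bus, slot, function)


addresses = list(fake_pci_addresses(10))
```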
In latest devstack, nova-serialproxy fails to start because it crashes
when it tries to register the CLI options.
The issue is that it tries to register an array of options by invoking
conf.register_cli_opt(), whereas multiple options need to be registered
through conf.register_cli_opts().
Matthew Booth [Tue, 22 Nov 2016 12:02:18 +0000 (12:02 +0000)]
libvirt: Mock imagebackend template funcs in ImageBackendFixture
This represents a small change to how we test the arguments passed to
a template function. Most tests which test cache() currently don't
directly test the callback function. Some test the callback function
which was passed to cache(), but this is undesirable as:
* It breaks untestably if you replace it with a wrapper
* You can't test the arguments which were passed to it
To make this easier to test, and because a subsequent change alters
this slightly in ways we want to make obvious, we update
ImageBackendFixture to execute the callback function when cache() is
called. We pre-emptively mock all callback methods so they are not
actually called. Tests can assert on these mocks to check that the
intended callback was called, and the arguments used.
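The testing approach can be sketched with unittest.mock. FakeImage is
a hypothetical stand-in for what the fixture provides, and the
argument names are illustrative.

```python
from unittest import mock


class FakeImage:
    """Stand-in image backend whose cache() always runs the callback."""

    def cache(self, fetch_func, filename, **kwargs):
        fetch_func(target=filename, **kwargs)


# The callback is a mock, so nothing is really created on disk.
create_ephemeral = mock.Mock(name="_create_ephemeral")
FakeImage().cache(fetch_func=create_ephemeral, filename="ephemeral_1",
                  ephemeral_size=1)

# The test can now check both the callback and its arguments.
create_ephemeral.assert_called_once_with(target="ephemeral_1",
                                         ephemeral_size=1)
```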
Matt Riedemann [Mon, 5 Dec 2016 21:24:05 +0000 (16:24 -0500)]
Handle MarkerNotFound from cell0 database
When listing instances in the cellv2 world we look them up
from three locations:
1. Build requests which exist before the instances are created
in the cell database (after the scheduler picks a host to
build the instance). Currently instances and build requests
are both created before casting to conductor, but that's going
away in Ocata with the support for multiple cellsv2 cells.
2. The cell0 database for instances which failed to get scheduled
to a compute host (and therefore a cell).
3. The actual cell database that the instance lives in. Currently
that's only a single traditional nova database, but could be one
of multiple cellsv2 cells when we add that support in Ocata.
If a marker is passed in when listing instances and the marker
instance lives in an actual cell database, we'll get a MarkerNotFound
failure from cell0 because the instance doesn't exist in cell0, but we
check cell0 before we check the cell database. This makes the instance
listing short-circuit and fail with a 400 from the REST API.
This patch simply handles the MarkerNotFound when listing instances
from the cell0 database and ignores it so we can continue onto the
cell database.
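A simplified sketch of the handling, using plain lists in place of the
databases. list_from, list_instances and the MarkerNotFound class are
stand-ins, and only the case described above (marker in the cell
database) is covered.

```python
class MarkerNotFound(Exception):
    """Stand-in for Nova's MarkerNotFound exception."""


def list_from(db, marker=None):
    """List instance ids from one database, starting after ``marker``."""
    if marker is None:
        return list(db)
    if marker not in db:
        raise MarkerNotFound(marker)
    return db[db.index(marker) + 1:]


def list_instances(cell0, cell, marker=None):
    """Check cell0 first, then the cell database."""
    try:
        cell0_instances = list_from(cell0, marker)
    except MarkerNotFound:
        # The marker lives in the cell database, not cell0, so ignore
        # the error and continue on to the cell database.
        cell0_instances = []
    return cell0_instances + list_from(cell, marker)
```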
bhagyashris [Mon, 17 Oct 2016 13:59:21 +0000 (19:29 +0530)]
Handle ImageNotFound exception during instance backup
If a user has already backed up an instance several times and then
executes the backup API with rotation 1, nova will delete the
previously created images exceeding the rotation limit.
If the user has already deleted one of those images in advance, the
backup operation won't be able to delete all images exceeding the
rotation limit, causing an API failure.
This patch handles ImageNotFound exception during deleting backup
images, logs a warning message and continues deleting all of the
remaining images.
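The rotation loop can be sketched as follows. rotate_backups, the
delete_image callable and the ImageNotFound class are hypothetical
stand-ins for the real compute manager code.

```python
import logging

LOG = logging.getLogger(__name__)


class ImageNotFound(Exception):
    """Stand-in for Nova's ImageNotFound exception."""


def rotate_backups(image_ids, rotation, delete_image):
    """Delete images beyond ``rotation``; return the ids deleted.

    ``image_ids`` is ordered newest first.
    """
    deleted = []
    for image_id in image_ids[rotation:]:
        try:
            delete_image(image_id)
            deleted.append(image_id)
        except ImageNotFound:
            # Someone removed this image already: warn and keep going
            # so the remaining images are still deleted.
            LOG.warning("Failed to find image %s to delete", image_id)
    return deleted
```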
melanie witt [Fri, 18 Nov 2016 17:18:24 +0000 (17:18 +0000)]
Add a CellDatabases test fixture
As we progress with the Cells v2 scheduling interaction work, we need
to be able to have switching between multiple databases work in our
functional tests. The existing Database fixture doesn't work in this
case because each connection switch via target_cell results in a new,
empty sqlite database, and main_context_manager is global in the DB
API and always points at the same sqlite database.
This adds a fixture that creates a new sqlite database per cell
database, runs migrations, and keeps track of the databases using
identifiers provided when cell databases are added to the fixture.
It patches get_context_manager, create_context_manager, and target_cell
to return the matching database connection according to identifier,
simulating switching between multiple databases in a single test.
Markus Zoeller [Tue, 6 Dec 2016 10:40:25 +0000 (11:40 +0100)]
libvirt: virtlogd: use virtlogd for char devices
This change makes actual usage of the "logd" sub-element for char devices.
The two REST APIs ``os-getConsoleOutput`` and ``os-getSerialConsole`` can
now be satisfied at the same time. This is valid for any combination of:
* char device element: "console", "serial"
* char device type: "tcp", "pty"
There is also no need to create multiple different device types anymore.
If we have a tcp device, we don't need the pty device anymore. The logging
will be done in the tcp device.
Markus Zoeller [Mon, 7 Nov 2016 09:01:45 +0000 (10:01 +0100)]
libvirt: create consoles in an understandable/extensible way
This change refactors the way the consoles of a libvirt guest get
created. This is basically just a reshuffle of code and an extraction
of methods with the goal to make the plethora of conditionals easier
to understand. Also, future enhancements should be easier this way.
For example, the blueprint libvirt-virtlogd (targeted for Ocata) will
have to be integrated in this console creation flow.
During the implementation I noticed that the host caps are *not*
needed for creation. That was an unnecessary special case for s390x
which didn't make any sense as the guest arch is the important piece.
That's the reason I dropped the "caps" parameter of the method
"_create_consoles". That also made it necessary to adjust the unit
tests.
I also took the chance to rename the "guest" parameter, which represents
the domain *configuration object*, to "guest_cfg". That name is used
(almost) everywhere else in the libvirt driver.
jichenjc [Fri, 18 Nov 2016 20:51:15 +0000 (04:51 +0800)]
Add more log when delete orphan node
We have the following log when deleting an orphan node:
INFO nova.compute.manager Deleting orphan compute node xx
We might need to know why those nodes are removed, so
we need additional log info about the removal.
However, the log is not complete about why a node is removed and
what is removed unless we dig into the database layer