In order for a PV domain to set up DMA from a passed-through device to
one of its pages, the page must be mapped in the IOMMU. On the other
hand, before a PV page may be used as a "special" page type (such as a
pagetable or descriptor table), it _must not_ be writable in the IOMMU
(otherwise a malicious guest could DMA arbitrary page tables into the
memory, bypassing Xen's safety checks); and Xen's current rule is to
have such pages not in the IOMMU at all.
At the moment, in order to accomplish this, the code borrows HVM
domain's "physmap" concept: When a page is assigned to a guest,
guess_physmap_add_entry() is called, which for PV guests, will create
a writable IOMMU mapping; and when a page is removed,
guest_physmap_remove_entry() is called, which will remove the mapping.
Additionally, when a page gains the PGT_writable page type, the page
will be added into the IOMMU; and when the page changes away from a
PGT_writable type, the page will be removed from the IOMMU.
Unfortunately, borrowing the "physmap" concept from HVM domains is
problematic. HVM domains have a lock on their p2m tables, ensuring
synchronization between modifications to the p2m; and all hypercall
parameters must first be translated through the p2m before being used.
Trying to mix this locked-and-gated approach with PV's lock-free
approach leads to several races and inconsistencies:
* A race between a page being assigned and it being put into the
physmap; for example:
- P1: call populate_physmap() { A = allocate_domheap_pages() }
- P2: Guess page A's mfn, and call decrease_reservation(A). A is owned by the domain,
and so Xen will clear the PGC_allocated bit and free the page
- P1: finishes populate_physmap() { guest_physmap_add_entry() }
Now the domain has a writable IOMMU mapping to a page it no longer owns.
* Pages start out as type PGT_none, but with a writable IOMMU mapping.
If a guest uses a page as a page table without ever having created a
writable mapping, the IOMMU mapping will not be removed; the guest
will have a writable IOMMU mapping to a page it is currently using
as a page table.
* A newly-allocated page can be DMA'd into with no special actions on
the part of the guest; However, if a page is promoted to a
non-writable type, the page must be mapped with a writable type before
DMA'ing to it again, or the transaction will fail.
To fix this, do away with the "PV physmap" concept entirely, and
replace it with the following IOMMU discipline for PV guests:
- (type == PGT_writable) <=> in iommu (even if type_count == 0)
- Upon a final put_page(), check to see if type is PGT_writable; if so,
iommu_unmap.
In order to achieve that:
- Remove PV IOMMU related code from guest_physmap_*
- Repurpose cleanup_page_cacheattr() into a general
cleanup_page_mappings() function, which will both fix up Xen
mappings for pages with special cache attributes, and also check for
a PGT_writable type and remove pages if appropriate.
- For compatibility with current guests, grab-and-release a
PGT_writable_page type for PV guests in guest_physmap_add_entry().
This will cause most "normal" guest pages to start out life with
PGT_writable_page type (and thus an IOMMU mapping), but no type
count (so that they can be used as special cases at will).
Also, note that there is one exception to to the "PGT_writable => in
iommu" rule: xenheap pages shared with guests may be given a
PGT_writable type with one type reference. This reference prevents
the type from changing, which in turn prevents page from gaining an
IOMMU mapping in get_page_type(). It's not clear whether this was
intentional or not, but it's not something to change in a security
update.
This is XSA-288.
Reported-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>