xenbits.xensource.com Git

x86/setup: don't relocate the VGA hole.

Copying the contents of the VGA hole is at best pointless and at worst
dangerous. Booting Xen on Xen, it causes a very long delay as each
byte is referred to qemu.

Since we were already discarding the first 1MB of the relocated area,
just avoid copying it in the first place.

Reported-by: Jon Ludlam <jonathan.ludlam@eu.citrix.com>
Signed-off-by: Tim Deegan <tim@xen.org>
master changeset: 0b76ce20de85ad7c23c47ee3275020859b91d46b
master date: 2013-02-14 12:20:58 +0000

Add .gitignore

Copy .gitignore from staging-4.2 current tip
(ie from 3f5e3cd97398468d624cf907979c2bb12ff7ee7e).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: Fix memset(&p,0,sizeof(p)) idiom in several places.

gcc 4.8 identifies several places where code of the form memset(x, 0,
sizeof(x)); is used incorrectly, meaning that less memory is set to
zero than required.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Committed-by: Keir Fraser <keir@xen.org>
(cherry picked from commit d119301b5816b39b5ba722a2f8b301b37e8e34bd)

libxl: Fix uninitialized variable in libxl_create_stubdom

It is used for result domid from libxl__domain_make, but actually this
function have assert on an initial value.

This patch is intended for xen-4.1 only - 4.2 and later have reworked
this part of code already containing the fix.

Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

AMD IOMMU: don't BUG() when we don't have to

find_iommu_for_device() can easily return NULL instead, as all of its
callers are prepared for that.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master changeset: f547d42ec0306cdceffb8f7603c7e6f8977cf398
master date: 2013-02-18 09:37:35 +0100

xenoprof: avoid division by 0

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master changeset: 085f1f2d3aee1a35dfc7ca2f4423e51fa654010c
master date: 2013-02-15 09:42:02 +0100

gcc4.8 build fix: Add -Wno-unused-local-typedefs to CFLAGS.

Based on a patch by M A Young <m.a.young@durham.ac.uk>

Signed-off-by: Keir Fraser <keir@xen.org>
master changeset: 511278b4e239df00de7b33f7b42d8d5d7e52221b
master date: 2013-02-13 17:03:31 +0000

xen: sched_credit: improve picking up the idle CPU for a VCPU

In _csched_cpu_pick() we try to select the best possible CPU for
running a VCPU, considering the characteristics of the underlying
hardware (i.e., how many threads, core, sockets, and how busy they
are). What we want is "the idle execution vehicle with the most
idling neighbours in its grouping".

In order to achieve it, we select a CPU from the VCPU's affinity,
giving preference to its current processor if possible, as the basis
for the comparison with all the other CPUs. Problem is, to discount
the VCPU itself when computing this "idleness" (in an attempt to be
fair wrt its current processor), we arbitrarily and unconditionally
consider that selected CPU as idle, even when it is not the case,
for instance:
1. If the CPU is not the one where the VCPU is running (perhaps due
    to the affinity being changed);
2. The CPU is where the VCPU is running, but it has other VCPUs in
    its runq, so it won't go idle even if the VCPU in question goes.

This is exemplified in the trace below:

]  3.466115364 x|------|------| d10v1   22005(2:2:5) 3 [ a 1 8 ]
   ... ... ...
   3.466122856 x|------|------| d10v1 runstate_change d10v1
   running->offline
   3.466123046 x|------|------| d?v? runstate_change d32767v0
   runnable->running
   ... ... ...
]  3.466126887 x|------|------| d32767v0   28004(2:8:4) 3 [ a 1 8 ]

22005(...) line (the first line) means _csched_cpu_pick() was called
on VCPU 1 of domain 10, while it is running on CPU 0, and it choose
CPU 8, which is busy ('|'), even if there are plenty of idle
CPUs. That is because, as a consequence of changing the VCPU affinity,
CPU 8 was chosen as the basis for the comparison, and therefore
considered idle (its bit gets unconditionally set in the bitmask
representing the idle CPUs). 28004(...) line means the VCPU is woken
up and queued on CPU 8's runq, where it waits for a context switch or
a migration, in order to be able to execute.

This change fixes things by only considering the "guessed" CPU idle if
the VCPU in question is both running there and is its only runnable
VCPU.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
xen-unstable changeset: 26287:127c2c47d440
xen-unstable date: Tue Dec 18 18:10:18 UTC 2012

AMD IOMMU: also spot missing IO-APIC entries in IVRS table

Apart from dealing duplicate conflicting entries, we also have to
handle firmware omitting IO-APIC entries in IVRS altogether. Not doing
so has resulted in c/s 26517:601139e2b0db to crash such systems during
boot (whereas with the change here the IOMMU gets disabled just as is
being done in the other cases, i.e. unless global tables are being
used).

Debugging this issue has also pointed out that the debug log output is
pretty ugly to look at - consolidate the output, and add one extra
item for the IVHD special entries, so that future issues are easier
to analyze.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26531:e68f14b9e739
xen-unstable date: Thu Feb 14 08:40:52 UTC 2013

tools/ocaml: oxenstored: correctly handle a full ring.

Change 26521:2c0fd406f02c (part of XSA-38 / CVE-2013-0215) incorrectly
caused us to ignore rather than process a completely full ring. Check if
producer and consumer are equal before masking to avoid this, since prod ==
cons + PAGE_SIZE after masking becomes prod == cons.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26539:759574df84a6
Backport-requested-by: security@xen.org
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

unmodified_drivers: __devinit was removed in linux-3.8

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Merge with __init handling.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26526:a37aa55c3cbc
xen-unstable date: Tue Feb 12 10:29:51 UTC 2013

x86: restore (optional) forwarding of PCI SERR induced NMI to Dom0

c/s 22949:54fe1011f86b removed the forwarding of NMIs to Dom0 when they
were caused by PCI SERR. NMI buttons as well as BMCs (like HP's iLO)
may however want such events to be seen in Dom0 (e.g. to trigger a
dump).

Therefore restore most of the functionality which named c/s removed
(adjusted for subsequent changes, and adjusting the public interface to
use the modern term, retaining the old one for backwards
compatibility).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26440:5af4f2ab06f3
xen-unstable date: Tue Jan 22 08:33:10 UTC 2013

x86/AMD: Enable WC+ memory type on family 10 processors

In some cases BIOS may not enable WC+ memory type on family 10 processors,
instead converting what would be WC+ memory to CD type. On guests using
nested pages this could result in performance degradation. This patch
enables WC+.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
xen-unstable changeset: 26427:8f6dd5dc5d6c
xen-unstable date: Fri Jan 18 11:20:58 UTC 2013

oxenstored: Enforce a maximum message size of 4096 bytes

The maximum size of a message is part of the protocol spec in
xen/include/public/io/xs_wire.h

Before this patch a client which sends an overly large message can
cause a buffer read overrun.

Note if a badly-behaved client sends a very large message
then it will be difficult for them to make their connection
work again-- they will probably need to reboot.

This is a security issue, part of XSA-38 / CVE-2013-0215.

Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26522:ffd30e7388ad
Backport-requested-by: security@xen.org
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools/ocaml: oxenstored: Be more paranoid about ring reading

oxenstored makes use of the OCaml Xenbus bindings, in which the
function xs_ring_read in tools/ocaml/libs/xb/xs_ring_stubs.c is used
to read from the shared memory Xenstore ring.

This function does not correctly handle all possible (prod, cons)
states when MASK_XENSTORE_IDX(prod) > MASK_XENSTORE_IDX(cons).

The root cause is the use of the unmasked values of prod and cons to
calculate to_read.  If prod is set to an out-of-range value, the ring
peer can cause to_read to be too large or even negative.  This allows
the ring peer to force oxenstored to read and write out of range for
the buffers leading to a crash or possibly to privilege escalation.

Correct this by masking the values of cons and prod at the start, so
we only deal with masked values.  This makes the logic simpler, as
semantically inappropriate values of the upper bits of the ring
pointers are simply ignored.

The same vulnerability does not exist in the ring writer because the
only use made of the unmasked value is the check which prevents the
prod pointer overtaking the cons pointer.  A ring peer which defeats
this check will suffer only lost data.

However, additionally, precautions need to be taken to ensure that
req_cons and req_prod are only read once in each function.  Without
the use of volatile or some asm construct, the compiler can "prove"
that req_cons and req_prod do not change unexpectedly and is permitted
to "amplify" the read of (say) req_cons into two reads at different
times, giving two different values for use as cons, and then use the
two sources of cons interchangeably.  (The use of xen_mb() does not
forbid this.)

Therefore do the reads of req_cons and req_prod through a volatile
pointer in both xs_ring_read and xs_ring_write.

This is currently believed to be a theoretical vulnerability as we are
not aware of any compilers which amplify reads in this way.

This is a security issue, part of XSA-38 / CVE-2013-0215.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Matthew Daley <mattjd@gmail.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26521:2c0fd406f02c
Backport-requested-by: security@xen.org
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

AMD,IOMMU: Make per-device interrupt remapping table default

Using global interrupt remapping table may be insecure, as
described by XSA-36. This patch makes per-device mode default.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Moved warning in amd_iov_detect() to location covering all cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26519:1af531e7bc2f
xen-unstable date: Tue Feb 5 14:22:11 UTC 2013

AMD,IOMMU: Disable IOMMU if SATA Combined mode is on

AMD's SP5100 chipset can be placed into SATA Combined mode
that may cause prevent dom0 from booting when IOMMU is
enabled and per-device interrupt remapping table is used.
While SP5100 erratum 28 requires BIOSes to disable this mode,
some may still use it.

This patch checks whether this mode is on and, if per-device
table is in use, disables IOMMU.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Flipped operands of && in amd_iommu_init() to make the message issued
by amd_sp5100_erratum28() match reality (when amd_iommu_perdev_intremap
is zero, there's really no point in calling the function).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26518:e379a23b0465
xen-unstable date: Tue Feb 5 14:21:25 UTC 2013

AMD,IOMMU: Clean up old entries in remapping tables when creating new one

When changing the affinity of an IRQ associated with a passed
through PCI device, clear previous mapping.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
In addition, because some BIOSes may incorrectly program IVRS
entries for IOAPIC try to check for entry's consistency. Specifically,
if conflicting entries are found disable IOMMU if per-device
remapping table is used. If entries refer to bogus IOAPIC IDs
disable IOMMU unconditionally

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
xen-unstable changeset: 26517:601139e2b0db
xen-unstable date: Tue Feb 5 14:20:47 UTC 2013

ACPI: acpi_table_parse() should return handler's error code

Currently, the error code returned by acpi_table_parse()'s handler
is ignored. This patch will propagate handler's return value to
acpi_table_parse()'s caller.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26516:32d4516a97f0
xen-unstable date: Tue Feb 5 14:18:18 UTC 2013

QEMU_TAG update

x86: compat_show_guest_stack() should not truncate MFN

Re-using "addr" here was a mistake, as it is a 32-bit quantity.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26332:8e942f2f3b45
xen-unstable date: Mon Jan 7 12:28:29 UTC 2013

x86/mm: Fix loop increment in paging_log_dirty_range()

In 23417:53ef1f35a0f8 (the fix for XSA-27 / CVE-2012-5511), the
loop variable gets incremented twice, so the loop only clears every
second page of the bitmap. This might cause the tools to think that
pages are dirty when they are not.

Reported-by: Steven Noonan <snoonan@amazon.com>
Reported-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>

Config.mk: delete accidentally introduced drivel

QEMU_TAG update

VT-d: fix interrupt remapping source validation for devices behind legacy bridges

Using SVT_VERIFY_BUS here doesn't make sense; native Linux also
uses SVT_VERIFY_SID_SQ here instead.

This is XSA-33 / CVE-2012-5634.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26340:19fd1237ff0d
xen-unstable date: Wed Jan 9 16:13:26 UTC 2013

x86, amd: Disable way access filter on Piledriver CPUs

The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.

The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.

The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.

The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.

More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf

CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.

Signed-off-by: Andre Przywara <osp@andrep.de>
Add and use MSR_AMD64_IC_CFG. Update the value whenever it is found to
not have all bits set, rather than just when it's zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26294:5fb0b8b838da
xen-unstable date: Wed Dec 19 10:42:09 UTC 2012

IOMMU/ATS: fix maximum queue depth calculation

The capabilities register field is a 5-bit value, and the 5 bits all
being zero actually means 32 entries.

Under the assumption that amd_iommu_flush_iotlb() really just tried
to correct for the miscalculation above when adding 32 to the value,
that adjustment is also being removed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Wei Huang <wei.huang2@amd.com>
xen-unstable changeset: 26235:670b07e8d738
xen-unstable date: Wed Dec 5 08:52:14 UTC 2012

x86/HPET: fix FSB interrupt masking

HPET_TN_FSB is not really suitable for masking interrupts - it merely
switches between the two delivery methods. The right way of masking is
through the HPET_TN_ENABLE bit (which really is an interrupt enable,
not a counter enable or some such). This is even more so with certain
chip sets not even allowing HPET_TN_FSB to be cleared on some of the
channels.

Further, all the setup of the channel should happen before actually
enabling the interrupt, which requires splitting legacy and FSB logic.

Finally this also fixes an S3 resume problem (HPET_TN_FSB did not get
set in hpet_broadcast_resume(), and hpet_msi_unmask() doesn't get
called from the general resume code either afaict).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26183:c139ca92edca
xen-unstable date: Thu Nov 22 09:03:23 UTC 2012

passthrough/PCI: replace improper uses of pci_find_next_cap()

Using pci_find_next_cap() without prior pci_find_cap_offset() is bogus
(and possibly wrong, given that the latter doesn't check the
PCI_STATUS_CAP_LIST flag, which so far was checked in an open-coded way
only for the non-bridge case).

Once at it, fold the two calls into one, as we need its result in any
case.

Question is whether, without any caller left, pci_find_next_cap()
should be purged as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
xen-unstable changeset: 26179:ae6fb202b233
xen-unstable date: Tue Nov 20 07:58:31 UTC 2012

XZ: Fix incorrect XZ_BUF_ERROR

From: Lasse Collin <lasse.collin@tukaani.org>

xz_dec_run() could incorrectly return XZ_BUF_ERROR if all of the
following was true:

- The caller knows how many bytes of output to expect and only
   provides
   that much output space.

- When the last output bytes are decoded, the caller-provided input
   buffer ends right before the LZMA2 end of payload marker.  So LZMA2
   won't provide more output anymore, but it won't know it yet and
   thus
   won't return XZ_STREAM_END yet.

- A BCJ filter is in use and it hasn't left any unfiltered bytes in
   the
   temp buffer.  This can happen with any BCJ filter, but in practice
   it's more likely with filters other than the x86 BCJ.

This fixes <https://bugzilla.redhat.com/show_bug.cgi?id=3D735408>
where Squashfs thinks that a valid file system is corrupt.

This also fixes a similar bug in single-call mode where the
uncompressed size of a block using BCJ + LZMA2 was 0 bytes and caller
provided no output space.  Many empty .xz files don't contain any
blocks and thus don't trigger this bug.

This also tweaks a closely related detail: xz_dec_bcj_run() could call
xz_dec_lzma2_run() to decode into temp buffer when it was known to be
useless.  This was harmless although it wasted a minuscule number of
CPU cycles.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23870:5c97b02f48fc
xen-unstable date: Thu Sep 22 17:34:27 UTC 2011

XZ decompressor: Fix decoding of empty LZMA2 streams

From: Lasse Collin <lasse.collin@tukaani.org>

The old code considered valid empty LZMA2 streams to be corrupt.
Note that a typical empty .xz file has no LZMA2 data at all,
and thus most .xz files having no uncompressed data are handled
correctly even without this fix.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23869:db1ea4b127cd
xen-unstable date: Thu Sep 22 17:33:48 UTC 2011

Add Dom0 xz kernel decompression

Largely taken from Linux 2.6.38 and made build/work for Xen.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 23001:9eb9948904cd
xen-unstable date: Wed Mar 9 16:18:58 UTC 2011

update Xen version to 4.1.5-pre

Added signature for changeset 12c4c4c0a715

Added tag RELEASE-4.1.4 for changeset 12c4c4c0a715

update Xen version to 4.1.4

libxl: revert 23428:93e17b0cd035 "avoid blktap2 deadlock"

This results in additional leakage in xenstore according to the
automated tests.

Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

libxl: avoid blktap2 deadlock on cleanup

Establishes correct cleanup behavior for blktap devices. This patch
implements the release of the backend device before calling for
the destruction of the userspace component of the blktap device.

Without this patch the kernel xen-blkback driver deadlocks with
the blktap2 user control plane until the IPC channel is terminated by the
timeout on the select() call. This results in a noticeable delay
in the termination of the guest and causes the blktap minor
number which had been allocated to be orphaned.

Signed-off-by: Greg Wettstein <greg@enjellic.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

From: Ian Campbell <ian.campbell@citrix.com>

libxl: attempt to cleanup tapdisk processes on disk backend destroy.

This patch properly terminates the tapdisk2 process(es) started
to service a virtual block device.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 23883:7998217630e2
xen-unstable date: Wed Sep 28 16:42:11 2011 +0100
Signed-off-by: Greg Wettstein <greg@enjellic.com>
Backport-requested-by: Greg Wettstein <greg@enjellic.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

Added signature for changeset 0125069bc1b2

Added tag 4.1.4-rc2 for changeset 0125069bc1b2

update Xen version to 4.1.4-rc2

x86/hap: Fix memory leak of domain->arch.hvm_domain.dirty_vram

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 26203:b5cb6cccc32c
xen-unstable date: Thu Nov 29 11:01:00 UTC 2012

MAINTAINERS: Reference stable maintenance policy

I also couldn't resist fixing a typo and adding a reference to
http://wiki.xen.org/wiki/Submitting_Xen_Patches for the normal case as
well.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26238:53805e238cca
xen-unstable date: Thu Dec 6 09:56:53 UTC 2012

memop: limit guest specified extent order

Allowing unbounded order values here causes almost unbounded loops
and/or partially incomplete requests, particularly in PoD code.

The added range checks in populate_physmap(), decrease_reservation(),
and the "in" one in memory_exchange() architecturally all could use
PADDR_BITS - PAGE_SHIFT, and are being artificially constrained to
MAX_ORDER.

This is XSA-31 / CVE-2012-5515.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

xen: fix error handling of guest_physmap_mark_populate_on_demand()

The only user of the "out" label bypasses a necessary unlock, thus
enabling the caller to lock up Xen.

Also, the function was never meant to be called by a guest for itself,
so rather than inspecting the code paths in depth for potential other
problems this might cause, and adjusting e.g. the non-guest printk()
in the above error path, just disallow the guest access to it.

Finally, the printk() (considering its potential of spamming the log,
the more that it's not using XENLOG_GUEST), is being converted to
P2M_DEBUG(), as debugging is what it apparently was added for in the
first place.

This is XSA-30 / CVE-2012-5514.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

xen: add missing guest address range checks to XENMEM_exchange handlers

Ever since its existence (3.0.3 iirc) the handler for this has been
using non address range checking guest memory accessors (i.e.
the ones prefixed with two underscores) without first range
checking the accessed space (via guest_handle_okay()), allowing
a guest to access and overwrite hypervisor memory.

This is XSA-29 / CVE-2012-5513.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

x86/HVM: range check xen_hvm_set_mem_access.hvmmem_access before use

Otherwise an out of bounds array access can happen if changing the
default access is being requested, which - if it doesn't crash Xen -
would subsequently allow reading arbitrary memory through
HVMOP_get_mem_access (again, unless that operation crashes Xen).

This is XSA-28 / CVE-2012-5512.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

hvm: Limit the size of large HVM op batches

Doing large p2m updates for HVMOP_track_dirty_vram without preemption
ties up the physical processor. Integrating preemption into the p2m
updates is hard so simply limit to 1GB which is sufficient for a 15000
* 15000 * 32bpp framebuffer.

For HVMOP_modified_memory and HVMOP_set_mem_type preemptible add the
necessary machinery to handle preemption.

This is CVE-2012-5511 / XSA-27.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
x86/paging: Don't allocate user-controlled amounts of stack memory.

This is XSA-27 / CVE-2012-5511.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
v2: Provide definition of GB to fix x86-32 compile.

Signed-off-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

gnttab: fix releasing of memory upon switches between versions

gnttab_unpopulate_status_frames() incompletely freed the pages
previously used as status frame in that they did not get removed from
the domain's xenpage_list, thus causing subsequent list corruption
when those pages did get allocated again for the same or another purpose.

Similarly, grant_table_create() and gnttab_grow_table() both improperly
clean up in the event of an error - pages already shared with the guest
can't be freed by just passing them to free_xenheap_page(). Fix this by
sharing the pages only after all allocations succeeded.

This is CVE-2012-5510 / XSA-26.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

x86/time: fix scale_delta() inline assembly

The way it was coded, it clobbered %rdx without telling the compiler.
This generally didn't cause any problems except when there are two back
to back invocations (as in plt_overflow()), as in that case the
compiler may validly assume that it can re-use for the second instance
the value loaded into %rdx before the first one.

Once at it, also properly relax the second operand of "mul" (there's no
need for it to be in %rdx, or a register at all), and switch away from
using explicit register names in the instruction operands.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26188:16bf7f3069a7
xen-unstable date: Mon Nov 26 16:20:39 UTC 2012

fix backport oversight in 23383:addf106cc90f

This fixes an omission in said backport (of -unstable
25931:149805919569): While the XEN_DOMCTL_memory_mapping code
pointlessly sets "ret" to zero, the XEN_DOMCTL_ioport_mapping code
needs to because of an XSM call (leaving ret set to zero when reaching
the code in question) present in -unstable, but absent in 4.1-testing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Added signature for changeset 500194a883bd

Added tag 4.1.4-rc1 for changeset 500194a883bd

update Xen version to 4.1.4-rc1

compat/gnttab: Prevent infinite loop in compat code

c/s 20281:95ea2052b41b, which introduces Grant Table version 2
hypercalls introduces a vulnerability whereby the compat hypercall
handler can fall into an infinite loop.

If the watchdog is enabled, Xen will die after the timeout.

This is a security problem, XSA-24 / CVE-2012-4539.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26151:b64a7d868f06
Backport-requested-by: security@xen.org
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

xen/mm/shadow: check toplevel pagetables are present before unhooking them.

If the guest has not fully populated its top-level PAE entries when it calls
HVMOP_pagetable_dying, the shadow code could try to unhook entries from
MFN 0. Add a check to avoid that case.

This issue was introduced by c/s 21239:b9d2db109cf5.

This is a security problem, XSA-23 / CVE-2012-4538.

Signed-off-by: Tim Deegan <tim@xen.org>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/physmap: Prevent incorrect updates of m2p mappings

In certain conditions, such as low memory, set_p2m_entry() can fail.
Currently, the p2m and m2p tables will get out of sync because we still
update the m2p table after the p2m update has failed.

If that happens, subsequent guest-invoked memory operations can cause
BUG()s and ASSERT()s to kill Xen.

This is fixed by only updating the m2p table iff the p2m was
successfully updated.

This is a security problem, XSA-22 / CVE-2012-4537.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/physdev: Range check pirq parameter from guests

Otherwise Xen will read beyond either end of the struct
domain.arch.pirq_emuirq array, usually resulting in a fatal page fault.

This vulnerability was introduced by c/s 23241:d21100f1d00e, which adds
a call to domain_pirq_to_emuirq() which uses the guest provided pirq
value before range checking it, and was fixed by c/s 23573:584c2e5e03d9
which changed the behaviour of the domain_pirq_to_emuirq() macro to use
radix trees instead of a flat array.

This is XSA-21 / CVE-2012-4536.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

VCPU/timers: Prevent overflow in calculations, leading to DoS vulnerability

The timer action for a vcpu periodic timer is to calculate the next
expiry time, and to reinsert itself into the timer queue. If the
deadline ends up in the past, Xen never leaves __do_softirq(). The
affected PCPU will stay in an infinite loop until Xen is killed by the
watchdog (if enabled).

This is a security problem, XSA-20 / CVE-2012-4535.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26148:bf58b94b3cef
Backport-requested-by: security@xen.org
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/mm x86 shadow: Fix typo in sh_invlpg sl3 page presence check

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 26134:279bbf2a0b48
xen-unstable date: Mon Nov 12 10:17:00 UTC 2012

tmem: Prevent NULL dereference on error case

If the client / pool IDs given to tmemc_save_get_next_page are invalid,
the calculation of pagesize will dereference NULL.

Fix this by moving the calculation below the appropriate NULL check.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
xen-unstable changeset: 26132:286ef4ced216
xen-unstable date: Mon Nov 12 08:34:57 UTC 2012

QEMU_TAG update

xend/pvscsi: update sysfs parser for Linux 3.0

The sysfs parser for /sys/bus/scsi/devices understands only the layout
of kernel version 2.6.16. This looks as follows:

/sys/bus/scsi/devices/1:0:0:0/block:sda is a symlink to /sys/block/sda/
/sys/bus/scsi/devices/1:0:0:0/scsi_generic:sg1 is a symlink to /sys/class/scsi_generic/sg1

Both directories contain a 'dev' file with the major:minor information.
This patch updates the used regex strings to match also the colon to
make it more robust against possible future changes.

In kernel version 3.0 the layout changed:
/sys/bus/scsi/devices/ contains now additional symlinks to directories
such as host1 and target1:0:0. This patch ignores these as they do not
point to the desired scsi devices. They just clutter the devices array.

The directory layout in '1:0:0:0' changed as well, the 'type:name'
notation was replaced with 'type/name' directories:

/sys/bus/scsi/devices/1:0:0:0/block/sda/
/sys/bus/scsi/devices/1:0:0:0/scsi_generic/sg1/

Both directories contain a 'dev' file with the major:minor information.
This patch adds additional code to walk the subdir to find the 'dev'
file to make sure the given subdirectory is really the kernel name.

In addition this patch makes sure devname is not None.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26010:cff10030c6ea
Backport-requested-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25915:839e5d95d483
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

xend/pvscsi: fix usage of persistant device names for SCSI devices

Currently the callers of vscsi_get_scsidevices() do not pass a mask
string. This will call "lsscsi -g '[]'", which causes a lsscsi syntax
error. As a result the sysfs parser _vscsi_get_scsidevices() is used.
But this parser is broken and the specified names in the config file are
not found.

Using a mask '*' if no mask was given will call lsscsi correctly and the
following config is parsed correctly:

vscsi=[
'/dev/sg3, 0:0:0:0',
'/dev/disk/by-id/wwn-0x600508b4000cf1c30000800000410000, 0:0:0:1'
]

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26009:2dbfa4d2e107
Backport-requested-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25914:b8916af165b9
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

xend/pvscsi: fix passing of SCSI control LUNs

Currently pvscsi can not pass SCSI devices that have just a scsi_generic node.
In the following example sg3 is a control LUN for the disk sdd.
But vscsi=['4:0:2:0,0:0:0:0'] does not work because the internal 'devname'
variable remains None. Later writing p-devname to xenstore fails because None
is not a valid string variable.

Since devname is used for just informational purpose use sg also as devname.

carron:~ $ lsscsi -g
[0:0:0:0]    disk    ATA      FK0032CAAZP      HPF2  /dev/sda   /dev/sg0
[4:0:0:0]    disk    HP       P2000G3 FC/iSCSI T100  /dev/sdb   /dev/sg1
[4:0:1:0]    disk    HP       P2000G3 FC/iSCSI T100  /dev/sdc   /dev/sg2
[4:0:2:0]    storage HP       HSV400           0950  -         /dev/sg3
[4:0:2:1]    disk    HP       HSV400           0950  /dev/sdd   /dev/sg4
[4:0:3:0]    storage HP       HSV400           0950  -         /dev/sg5
[4:0:3:1]    disk    HP       HSV400           0950  /dev/sde   /dev/sg6

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26008:eecb528583d7
Backport-requested-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25913:16ced2f387b9
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

tools: xend: fix wrong condition check for xml file

In commit e8d40584, it intended to check xml file size and when empty will
return, the condition should be "if os.path.getsize(xml_path) == 0" rather
then "if not os.path.getsize(xml_path) == 0".

Signed-off-by: Chuang Cao <chuang.cao@oracle.com>
Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26088:dd64a1bdbe3a
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25905:82b61b99d15d
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

pygrub: correct typo in --args assignment

If pygrub was called with --args="some thing", then this string should
be append to the kernel command line. But the last changeset
25941:795c493fe561 contained a typo, it assigns 'args' instead of 'arg'.

Rename the local variable which holds the string from the domain config
file to avoid further confusion.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26018:ecc7627ca6d7
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25899:dbb1872bbb97
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

xenballoond.init: remove 4 from default runlevel

Remove 4 from default runlevel in xenballoond.init.

Similar to what changeset 24847:0900b1c905f1 does in xencommons, remove
runlevel 4 from the other runlevel scripts. LSB defines runlevel 4 as
reserved for local use, the local sysadmin is responsible for symlink
creation in rc4.d.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26007:fe756682cc7f
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25897:dcd4bf824284
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

hotplug/Linux: Remove tracing (bash -x) from network-nat script

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26006:8b6870d686d6
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25896:6adf0c7937bf
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

pygrub: always append --args

If a bootloader entry in menu.lst has no additional kernel command line
options listed and the domU.cfg has 'bootargs="--args=something"' the
additional arguments from the config file are not passed to the kernel.
The reason for that incorrect behaviour is that run_grub appends arg
only if the parsed config file has arguments listed.

Fix this by appending args from image section and the config file separatly.
To avoid adding to a NoneType initialize grubcfg['args'] to an empty string.
This does not change behaviour but simplifies the code which appends the
string.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 25941:795c493fe561
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25891:7e91c668bae2
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

docs: correct formatting errors in xmdomain.cfg

This patch corrects the following errors produced by pod2man:

Hey! The above document had some coding errors, which are explained
below:

Around line 301:
    You can't have =items (as at line 305) unless the first thing after
    the =over is an =item

Around line 311:
    '=item' outside of any '=over'

Signed-off-by: Matt Wilson <msw@amazon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 25840:c7e4b7e64303
Backport-requested-by: Ian Campbell <Ian.Campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-4.2-testing changeset: 25885:c23d938e3e64
Backport-requested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86: don't special case first IO-APIC

It has always been puzzling me why the first IO-APIC gets special cased
in two places, and finally Xen got run on a system where this breaks:

(XEN) ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 16, version 17, address 0xfecff000, GSI 0-2
(XEN) ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3])
(XEN) IOAPIC[1]: apic_id 15, version 17, address 0xfec00000, GSI 3-38
(XEN) ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39])
(XEN) IOAPIC[2]: apic_id 14, version 17, address 0xfec01000, GSI 39-74
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl)

i.e. all legacy IRQs (apart from the timer one, but the firmware passed
data doesn't look right for that case anyway, as both Xen and native
Linux are falling back to use the virtual wire setup for IRQ0,
apparently rather using pin 2 of the first IO-APIC) are being handled
by the second IO-APIC.

This at once eliminates the possibility of an unmasked RTE getting
written without having got a vector put in place (in
setup_IO_APIC_irqs()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26102:22e08c9ac770
xen-unstable date: Wed Oct 24 15:51:48 UTC 2012

hvm: handle PoD and grant pages in HVMOP_get_mem_type

During kexec in a ballooned PVonHVM guest the new kernel needs to check
each pfn if its backed by a mfn to find ballooned pages. Currently all
PoD and grant pages will appear as HVMMEM_mmio_dm, so the new kernel has
to assume they are ballooned. This is wrong: PoD pages may turn into
real RAM at runtime, grant pages are also RAM.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
xen-unstable changeset: 26093:4ae08ca5500f
xen-unstable date: Fri Oct 19 14:09:05 UTC 2012

x86/HPET: obtain proper lock for changing IRQ affinity

The IRQ descriptor lock should be held while adjusting the affinity of
any IRQ; the HPET channel lock isn't sufficient to protect namely
against races with moving the IRQ to a different CPU.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26063:1f4be6ee4619
xen-unstable date: Wed Oct 17 12:13:20 UTC 2012

x86/oprof: adjust off-by-one counter range checks

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26061:4b4c0c7a6031
xen-unstable date: Wed Oct 17 09:23:10 UTC 2012

More efficient TLB-flush filtering in alloc_heap_pages().

Rather than per-cpu filtering for every page in a super-page
allocation, simply remember the most recent TLB timestamp across all
allocated pages, and filter on that, just once, at the end of the
function.

For large-CPU systems, doing 2MB allocations during domain creation,
this cuts down the domain creation time *massively*.

TODO: It may make sense to move the filtering out into some callers,
such as memory.c:populate_physmap() and
memory.c:increase_reservation(), so that the filtering can be moved
outside their loops, too.

Signed-off-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26056:177fdda0be56
xen-unstable date: Mon Oct 15 15:38:11 UTC 2012

x86/xenoprof: fix kernel/user mode detection for HVM

While trying oprofile under Xen, I noticed that HVM passive domain's
kernel addresses were showing up as user application. It turns out
under HVM get_cpu_user_regs()->cs contains 0x0000beef.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Don't cast away const-ness. Use SS instead of CS to determine ring.
Special-case real and protected mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26055:14e32621dbaf
xen-unstable date: Mon Oct 15 13:04:51 UTC 2012

x86/amd: Fix xen_apic_write warnings in Dom0

[    0.020294] ------------[ cut here ]------------
[    0.020311] WARNING: at arch/x86/xen/enlighten.c:730
xen_apic_write+0x15/0x17()
[    0.020318] Hardware name: empty
[    0.020323] Modules linked in:
[    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
[    0.020340] Call Trace:
[    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
[    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
[    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
[    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
[    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
[    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
[    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
[    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
[    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
[    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
[    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
[    0.020500] ---[ end trace a7919e7f17c0a725 ]---

Kernel function check_hw_exists() writes 0xabcd to msr 0xc0010201 (Performance Event
Counter 0) and read it again to check if it is running as dom0. Early amd cpus does
not reset perf counters during warm reboot. If the kernel is booted with bare metal
and then as a dom0, the content of msr 0xc0010201 will stay and the checking will
pass and PMU will be enabled unexpectedly.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Don't reset the counters when used for the NMI watchdog.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26054:983108e1b56b
xen-unstable date: Mon Oct 15 13:03:36 UTC 2012

hvmloader: Do not zero the wallclock fields in shared-info.

These fields need to be valid at all times. Hypervisor ensures this
even across 32/64-bit guest transitions.

This fixes a bug where wallclock time is incorrect for booting 32-bit
HVM guests.

This should be backported to Xen 4.1 and 4.2.

Signed-off-by: Keir Fraser <keir@xen.org>
Tested-and-Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen-unstable changeset: 25908:12fa949b9060
xen-unstable date: Fri Sep 14 18:47:57 UTC 2012

libxc: builder: limit maximum size of kernel/ramdisk.

Allowing user supplied kernels of arbitrary sizes, especially during
decompression, can swallow up dom0 memory leading to either virtual
address space exhaustion in the builder process or allocation
failures/OOM killing of both toolstack and unrelated processes.

We disable these checks when building in a stub domain for pvgrub
since this uses the guest's own memory and is isolated.

Decompression of gzip compressed kernels and ramdisks has been safe
since 14954:58205257517d (Xen 3.1.0 onwards).

This is XSA-25 / CVE-2012-4544.

Also make explicit checks for buffer overflows in various
decompression routines. These were already ruled out due to other
properties of the code but check them as a belt-and-braces measure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ Includes 25589:60f09d1ab1fe for CVE-2012-2625 ]

x86: check remote MMIO remap permissions

When a domain is mapping pages from a different pg_owner domain, the
iomem_access checks are currently only applied to the pg_owner domain,
potentially allowing a domain with a more restrictive iomem_access
policy to have the pages mapped into its page tables. To catch this,
also check the owner of the page tables. The current domain does not
need to be checked because the ability to manipulate a domain's page
tables implies full access to the target domain, so checking that
domain's permission is sufficient.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25952:8278d7d8fa48
xen-unstable date: Wed Sep 26 09:56:07 UTC 2012

x86: tighten checks in XEN_DOMCTL_memory_mapping handler

Properly checking the MFN implies knowing the physical address width
supported by the platform, so to obtain this consistently the
respective code gets moved out of the MTRR subdir.

Btw., the model specific workaround in that code is likely unnecessary
- I believe those CPU models don't support 64-bit mode. But I wasn't
able to formally verify this, so I preferred to retain that code for
now.

But domctl code here also was lacking other error checks (as was,
looking at it again from that angle) the XEN_DOMCTL_ioport_mapping one.
Besides adding the missing checks, printing is also added for the case
where revoking access permissions didn't work (as that may have
implications for the host operator, e.g. wanting to not pass through
affected devices to another guest until the one previously using them
did actually die).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25931:149805919569
xen-unstable date: Thu Sep 20 07:21:53 UTC 2012

x86: properly check XEN_DOMCTL_ioport_mapping arguments for invalid range

In particular, the case of "np" being a very large value wasn't handled
correctly. The range start checks also were off by one (except that in
practice, when "np" is properly range checked, this would still have
been caught by the range end checks).

Also, is a GFN wrap in XEN_DOMCTL_memory_mapping really okay?

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25927:3e3959413b2f
xen-unstable date: Wed Sep 19 07:27:55 UTC 2012

VT-d: split .ack and .disable DMA-MSI actors

Calling irq_complete_move() from .disable is wrong, breaking S3 resume.

Comparing with all other .ack actors, it was also missing a call to
move_{native,masked}_irq(). As the actor is masking its interrupt
anyway (albeit it's not immediately obvious why), the latter is the
better choice.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
xen-unstable changeset: 25836:7d216f026f71
xen-unstable date: Mon Sep 10 07:45:30 UTC 2012

adjust a few RCU domain locking calls

x86's do_physdev_op() had a case where the locking was entirely
superfluous. Its physdev_map_pirq() further had a case where the lock
was being obtained too early, needlessly complicating early exit paths.

Grant table code had two open coded instances of
rcu_lock_target_domain_by_id(), and a third code section could be
consolidated by using the newly introduced helper function.

The memory hypercall code had two more instances of open coding
rcu_lock_target_domain_by_id(), but note that here this is not just
cleanup, but also fixes an error return path in memory_exchange() to
actually return an error.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25835:c70d70d85306
xen-unstable date: Fri Sep 7 15:58:12 UTC 2012

x86/MSI: fix 2nd S3 resume with interrupt remapping enabled

The first resume from S3 was corrupting internal data structures (in
that pci_restore_msi_state() updated the globally stored MSI message
from traditional to interrupt remapped format, which would then be
translated a second time during the second resume, breaking interrupt
delivery).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25834:0376c85caaf3
xen-unstable date: Fri Sep 7 15:57:10 UTC 2012

make domain_create() return a proper error code

While triggered by the XSA-9 fix, this really is of more general use;
that fix just pointed out very sharply that the current situation
with all domain creation failures reported to user (tools) space as
-ENOMEM is very unfortunate (actively misleading users _and_ support
personnel).

Pull over the pointer <-> error code conversion infrastructure from
Linux, and use it in domain_create() and all it callers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 25808:4746414def65
xen-unstable date: Mon Sep 3 07:40:38 UTC 2012

tmem: bump pool version to 1 to fix restore issue when tmem enabled

Restore fails when tmem is enabled both in hypervisor and guest. This
is due to spec version mismatch when restoring a pool.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25929:fee83ac77d8c
xen-unstable date: Wed Sep 19 15:38:47 UTC 2012

tmem: cleanup

- one more case of checking for a specific rather than any error
- drop redundant casts

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25860:e4cb84111610
xen-unstable date: Tue Sep 11 12:19:29 UTC 2012

tmem: fixup 2010 cleanup patch that breaks tmem save/restore

20918:a3fa6d444b25 "Fix domain reference leaks" (in Feb 2010, by Jan)
does some cleanup in addition to the leak fixes. Unfortunately, that
cleanup inadvertently resulted in an incorrect fallthrough in a switch
statement which breaks tmem save/restore.

That broken patch was apparently applied to 4.0-testing and 4.1-testing
so those are broken as well.

What is the process now for requesting back-patches to 4.0 and 4.1?

(Side note: This does not by itself entirely fix save/restore in 4.2.)

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25859:16e0392c6594
xen-unstable date: Tue Sep 11 12:19:03 UTC 2012

tmem: reduce severity of log messages

Otherwise they can be used by a guest to spam the hypervisor log with
all settings at their defaults.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
xen-unstable changeset: 25858:0520982a602a
xen-unstable date: Tue Sep 11 12:18:36 UTC 2012

tmem: properly drop lock on error path in do_tmem_op()

Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25857:109ea6a0c23a
xen-unstable date: Tue Sep 11 12:18:26 UTC 2012

tmem: properly drop lock on error path in do_tmem_get()

Also remove a bogus assertion.

Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25856:83b97a59888b
xen-unstable date: Tue Sep 11 12:18:08 UTC 2012

tmem: detect arithmetic overflow in tmh_copy_{from,to}_client()

This implies adjusting callers to deal with errors other than -EFAULT
and removing some comments which would otherwise become stale.

Reported-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25855:33b8c42a87ec
xen-unstable date: Tue Sep 11 12:17:59 UTC 2012

tmem: don't access guest memory without using the accessors intended for this

This is not permitted, not even for buffers coming from Dom0 (and it
would also break the moment Dom0 runs in HVM mode). An implication from
the changes here is that tmh_copy_page() can't be used anymore for
control operations calling tmh_copy_{from,to}_client() (as those pass
the buffer by virtual address rather than MFN).

Note that tmemc_save_get_next_page() previously didn't set the returned
handle's pool_id field, while the new code does. It need to be
confirmed that this is not a problem (otherwise the copy-out operation
will require further tmh_...() abstractions to be added).

Further note that the patch removes (rather than adjusts) an invalid
call to unmap_domain_page() (no matching map_domain_page()) from
tmh_compress_from_client() and adds a missing one to an error return
path in tmh_copy_from_client().

Finally note that the patch adds a previously missing return statement
to cli_get_page() (without which that function could de-reference a
NULL pointer, triggerable from guest mode).

This is part of XSA-15 / CVE-2012-3497.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25854:ccd60ed6c555
xen-unstable date: Tue Sep 11 12:17:49 UTC 2012

tmem: check for a valid client ("domain") in the save subops

This is part of XSA-15 / CVE-2012-3497.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
xen-unstable changeset: 25853:f53c5aadbba9
xen-unstable date: Tue Sep 11 12:17:27 UTC 2012

tmem: check the pool_id is valid when destroying a tmem pool

This is part of XSA-15 / CVE-2012-3497.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25852:d189d99ef00c
xen-unstable date: Tue Sep 11 12:06:54 UTC 2012

tmem: consistently make pool_id a uint32_t

Treating it as an int could allow a malicious guest to provide a
negative pool_Id, by passing the MAX_POOLS_PER_DOMAIN limit check and
allowing access to the negative offsets of the pool array.

This is part of XSA-15 / CVE-2012-3497.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 25851:fcf567acc92a
xen-unstable date: Tue Sep 11 12:06:43 UTC 2012