xenbits.xensource.com Git - xen.git/log
xen.git
12 years agox86/HPET: mask interrupt while changing affinity
Jan Beulich [Mon, 18 Mar 2013 16:13:32 +0000 (17:13 +0100)]
x86/HPET: mask interrupt while changing affinity

While being unable to reproduce the "No irq handler for vector ..."
messages observed on other systems, the change done by 5dc3fd2 ('x86:
extend diagnostics for "No irq handler for vector" messages') appears
to point at the lack of masking - at least I can't see what else might
be wrong with the HPET MSI code that could trigger these warnings.

While at it, also adjust the message printed by aforementioned commit
to not pointlessly insert spaces - we don't need aligned tabular output
here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agoMAINTAINERS: Remove myself from AMD IOMMU maintainer
Wei Wang [Mon, 18 Mar 2013 10:57:54 +0000 (11:57 +0100)]
MAINTAINERS: Remove myself from AMD IOMMU maintainer

Signed-off-by: Wei Wang <wawei@amazon.com>
12 years agoQEMU_TAG update
Ian Jackson [Fri, 15 Mar 2013 18:29:36 +0000 (18:29 +0000)]
QEMU_TAG update

12 years agoxl: add vif.default.script
Roger Pau Monne [Wed, 13 Mar 2013 17:42:17 +0000 (17:42 +0000)]
xl: add vif.default.script

Replace vifscript with vif.default.script. The old config option is
kept for backwards compatibility.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agoxl: add vif.default.bridge
Roger Pau Monne [Wed, 13 Mar 2013 17:42:17 +0000 (17:42 +0000)]
xl: add vif.default.bridge

This is a replacement for the defaultbridge xl.conf option. The now
deprecated defaultbridge is still supported.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agoxl: allow specifying a default gatewaydev in xl.conf
Roger Pau Monne [Wed, 13 Mar 2013 17:42:17 +0000 (17:42 +0000)]
xl: allow specifying a default gatewaydev in xl.conf

This adds a new global option in the xl configuration file called
"vif.default.gatewaydev", that is used to specify the default
gatewaydev to use when none is passed in the vif specification.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ulf Kreutzberg <ulf.kreutzberg@hosteurope.de>
Cc: Ulf Kreutzberg <ulf.kreutzberg@hosteurope.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agoxl/libxl: add gatewaydev/netdev to vif specification
Roger Pau Monne [Wed, 13 Mar 2013 17:42:17 +0000 (17:42 +0000)]
xl/libxl: add gatewaydev/netdev to vif specification

This option is used by the vif-route hotplug script. A new more
descriptive name is used, "gatewaydev", but "netdev" is also supported
as a deprecated backwards compatible option.

This option was supported in the past, according to
http://wiki.xen.org/wiki/Vif-route, so we should also support it in
libxl.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ulf Kreutzberg <ulf.kreutzberg@hosteurope.de>
Cc: Ulf Kreutzberg <ulf.kreutzberg@hosteurope.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agox86/mm: avoid undefined behavior in IS_NIL()
Xi Wang [Fri, 15 Mar 2013 09:26:17 +0000 (10:26 +0100)]
x86/mm: avoid undefined behavior in IS_NIL()

Since pointer overflow is undefined behavior in C, some compilers such
as clang optimize away the check !((ptr) + 1) in the macro IS_NIL().

This patch fixes the issue by casting the pointer type to uintptr_t,
the operations of which are well-defined.

Signed-off-by: Xi Wang <xi@mit.edu>

With that, we also need to avoid the overflow in NIL().

Note that either part of the change makes the respective macro
unsuitable for use with "void".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
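
The idea behind the IS_NIL()/NIL() change above can be illustrated with a minimal sketch. The macro bodies below are a hypothetical re-creation written for this note, not text copied from the Xen tree, and `struct item` and `item_is_nil()` are invented for illustration. The point is only that arithmetic on `uintptr_t` wraps in a well-defined way, whereas `ptr + 1` overflowing is undefined and may be optimized away.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical re-creation of the two macros (assumption: not copied from
 * the Xen tree). NIL(type) is the address sizeof(type) bytes below the
 * wrap-around point of the address space, built without pointer arithmetic.
 */
#define NIL(type)   ((type *)-sizeof(type))

/* Unsigned integer arithmetic on uintptr_t wraps to zero in a defined way. */
#define IS_NIL(ptr) (!((uintptr_t)(ptr) + sizeof(*(ptr))))

/* Example element type, purely illustrative. */
struct item {
    int a;
    int b;
};

/* Returns non-zero if p is the NIL sentinel for struct item. */
static int item_is_nil(const struct item *p)
{
    return IS_NIL(p);
}
```

Because both macros rely on `sizeof`, neither can be used with `void`, which matches the note in the commit message.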
12 years agotools: libxl: unbreak build after ec41430ef6a7
Ian Campbell [Thu, 14 Mar 2013 09:45:57 +0000 (09:45 +0000)]
tools: libxl: unbreak build after ec41430ef6a7

libxl_create.c: In function ‘libxl__domain_build_info_setdefault’:
libxl_create.c:109: error: ‘info’ undeclared (first use in this function)
libxl_create.c:109: error: (Each undeclared identifier is reported only once
libxl_create.c:109: error: for each function it appears in.)
cc1: warnings being treated as errors
libxl_create.c:108: error: suggest explicit braces to avoid ambiguous ‘else’
libxl_create.c: At top level:
libxl_create.c:141: error: expected identifier or ‘(’ before ‘if’
...

Fix is to insert the missing opening brace and s/info/b_info/ in one spot.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agox86: extend diagnostics for "No irq handler for vector" messages
Jan Beulich [Thu, 14 Mar 2013 11:10:53 +0000 (12:10 +0100)]
x86: extend diagnostics for "No irq handler for vector" messages

By storing the inverted IRQ number in vector_irq[], we may be able to
spot which IRQ a vector was used for most recently, thus hopefully
permitting to understand why these messages trigger on certain systems.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
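
As a rough illustration of the "inverted IRQ number" idea in the commit above, the sketch below keeps the old IRQ in the table when a vector is released, but stores it bit-inverted so that it reads as negative (unbound). The table size, the `INT_MIN` "never used" marker, and all function names are assumptions made for this sketch.

```c
#include <limits.h>

#define NR_VECTORS        256      /* assumed table size for the sketch */
#define VECTOR_NEVER_USED INT_MIN  /* hypothetical "never bound" marker */

static int vector_irq[NR_VECTORS];

static void init_vectors(void)
{
    for (int v = 0; v < NR_VECTORS; v++)
        vector_irq[v] = VECTOR_NEVER_USED;
}

static void bind_vector(int vector, int irq)
{
    vector_irq[vector] = irq;
}

/* On release, keep the IRQ number but invert it so it reads as unbound. */
static void release_vector(int vector)
{
    int irq = vector_irq[vector];

    if (irq >= 0)
        vector_irq[vector] = ~irq;
}

static int vector_is_bound(int vector)
{
    return vector_irq[vector] >= 0;
}

/* IRQ most recently bound to the vector, or -1 if none ever was. */
static int last_irq_for_vector(int vector)
{
    int v = vector_irq[vector];

    if (v == VECTOR_NEVER_USED)
        return -1;
    return v >= 0 ? v : ~v;
}
```

A diagnostic path that finds an unbound vector could then print `last_irq_for_vector()` to hint at which IRQ used it last.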
12 years agox86/mem_access: check for errors in p2m->set_entry().
Tim Deegan [Thu, 7 Mar 2013 14:23:05 +0000 (14:23 +0000)]
x86/mem_access: check for errors in p2m->set_entry().

These calls ought always to succeed.  Assert that they do rather than
ignoring the return value.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Aravindh Puthiyaparambil <aravindh@virtuata.com>
12 years agox86/mem_sharing: check for errors in p2m->set_entry().
Tim Deegan [Thu, 7 Mar 2013 14:08:24 +0000 (14:08 +0000)]
x86/mem_sharing: check for errors in p2m->set_entry().

This call ought always to succeed.  Assert that it does rather than
ignoring the return value.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
12 years agox86/ept: check for errors in a few callers of ept_set_entry.
Tim Deegan [Thu, 7 Mar 2013 13:22:32 +0000 (13:22 +0000)]
x86/ept: check for errors in a few callers of ept_set_entry.

AFAICT in all these cases we have the p2m lock and have just checked
that the p2m trie is populated so the call should succeed.  Make it
explicit with ASSERT() rather than just ignoring the result.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
12 years agox86/mm: warn if we ever run out of shadow/hap pool for p2m/lgd ops.
Tim Deegan [Thu, 7 Mar 2013 12:49:52 +0000 (12:49 +0000)]
x86/mm: warn if we ever run out of shadow/hap pool for p2m/lgd ops.

Even if the error propagates up through the p2m ops to the caller,
it'll look like ENOMEM, which won't be obviously a shadow-pool problem.

Warn on the console, once per domain.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
12 years agox86/mm: use bool_t for flags in shadow-pagetable structs
Tim Deegan [Thu, 7 Mar 2013 12:37:12 +0000 (12:37 +0000)]
x86/mm: use bool_t for flags in shadow-pagetable structs

and reshuffle the domain struct to pack a little better.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
12 years agolibxl: use qemu-xen (upstream QEMU) as device model by default
Stefano Stabellini [Tue, 4 Dec 2012 13:06:35 +0000 (13:06 +0000)]
libxl: use qemu-xen (upstream QEMU) as device model by default

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
12 years agolibxl: move check for existence of qemuu device model
Ian Jackson [Fri, 1 Mar 2013 17:17:04 +0000 (17:17 +0000)]
libxl: move check for existence of qemuu device model

The stat in libxl__domain_build_info_setdefault's default device model
logic works to fall back to qemu-xen-traditional whenever the
executable for qemu-xen is not found.

We are going to use qemu-xen-traditional in more cases, so break this
check out into its own if statement.

Also add a pair of braces to make the if() statement symmetrical.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
12 years agolibxl: move libxl_device_action to idl
Roger Pau Monne [Wed, 23 Jan 2013 17:55:39 +0000 (17:55 +0000)]
libxl: move libxl_device_action to idl

Move to idl for ease of expansion and auto-generated functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agolibxl: remove double check in NetBSD hotplug
Roger Pau Monne [Wed, 23 Jan 2013 17:55:44 +0000 (17:55 +0000)]
libxl: remove double check in NetBSD hotplug

Remove a duplicated check performed in libxl__get_hotplug_script_info
for NetBSD

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agolibxl: don't launch more than one tapdisk process for each disk
Roger Pau Monne [Tue, 5 Mar 2013 17:06:29 +0000 (17:06 +0000)]
libxl: don't launch more than one tapdisk process for each disk

When adding a disk, don't launch multiple tapdisk instances for the
same disk. If a transaction fails in device_disk_add, reuse the same
tapdisk for further tries instead of creating a new instance each
time a transaction fails.

Reported-by: Darren Shepherd <darren.s.shepherd@gmail.com>
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Tested-by: Darren Shepherd <darren.s.shepherd@gmail.com>
12 years agoxen: arm: create dom0 DTB /hypervisor/ node dynamically.
Ian Campbell [Mon, 18 Feb 2013 15:20:36 +0000 (15:20 +0000)]
xen: arm: create dom0 DTB /hypervisor/ node dynamically.

I initially added hypervisor-new and confirmed via /proc/device-model
that the content is the same before changing it to drop and replace
an existing node.

NB: There is an ambiguity in the compatibility property.
linux/arch/arm/boot/dts/xenvm-4.2.dts says "xen,xen-4.2" while
Documentation/devicetree/bindings/arm/xen.txt says "xen,xen-4.3". I have
used the actual hypervisor version as discussed in
http://marc.info/?l=xen-devel&m=135963416631423

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen: strip xen, multiboot-module nodes from dom0 device tree
Ian Campbell [Mon, 18 Feb 2013 15:20:35 +0000 (15:20 +0000)]
xen: strip xen, multiboot-module nodes from dom0 device tree

These nodes are used by Xen to find the initial modules.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen: arm: parse modules from DT during early boot.
Ian Campbell [Mon, 18 Feb 2013 15:20:34 +0000 (15:20 +0000)]
xen: arm: parse modules from DT during early boot.

The bootloader should populate /chosen/modules/module@<N>/ for each
module it wishes to pass to the hypervisor. The content of these nodes
is described in docs/misc/arm/device-tree/booting.txt

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agodtb: correct handling of #address-cells and #size-cells.
Ian Campbell [Mon, 18 Feb 2013 15:20:33 +0000 (15:20 +0000)]
dtb: correct handling of #address-cells and #size-cells.

If a node does not have #*-cells then the parent's value should be
used. Previously we assumed zero, which is useless.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
12 years agoxen: correct BITS_PER_EVTCHN_WORD on arm
Ian Campbell [Wed, 27 Feb 2013 13:14:54 +0000 (13:14 +0000)]
xen: correct BITS_PER_EVTCHN_WORD on arm

This is always 64-bit on ARM, not BITS_PER_LONG

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agocoverage: fix on ARM
Ian Campbell [Fri, 22 Feb 2013 10:57:40 +0000 (10:57 +0000)]
coverage: fix on ARM

Use a list of pointers to simplify the handling of 32- vs 64-bit.

Also on ARM the section name is ".init_array" and not ".ctors".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
[ ijc -- tweak whitespace per Frediano's comment ]

12 years agox86/MCA: suppress bank clearing for certain injected events
Jan Beulich [Tue, 12 Mar 2013 14:53:30 +0000 (15:53 +0100)]
x86/MCA: suppress bank clearing for certain injected events

The bits indicating validity of the ADDR and MISC bank MSRs may be
injected in a way that isn't consistent with what the underlying
hardware implements: while the bank must be valid for injection to
work, the auxiliary MSRs may not be implemented (and hence cause #GP
upon access) if the hardware never sets the corresponding valid bits.

Consequently we need to do the clearing writes only if no value was
interposed for the respective MSR (which also makes sense the other way
around: there's no point in clearing a hardware register when all data
read came from software). Of course this all requires the injection
tool to do things in a consistent way (but that had been a requirement
before already).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ren Yongjie <yongjie.ren@intel.com>
Acked-by: Liu Jinsong <jinsong.liu@intel.com>
12 years agovpmu intel: pass through cpuid bits when BTS is enabled
Dietmar Hahn [Tue, 12 Mar 2013 14:37:45 +0000 (15:37 +0100)]
vpmu intel: pass through cpuid bits when BTS is enabled

This patch passes the original cpuid bits for X86_FEATURE_DTES64 (64-bit
DS Area) and X86_FEATURE_DSCPL (CPL Qualified Debug Store) to the guest
when the BTS feature is switched on. I forgot this when I did this BTS
emulation.

Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
12 years agopowernow: add fixups for AMD P-state figures
Konrad Rzeszutek Wilk [Tue, 12 Mar 2013 14:34:22 +0000 (15:34 +0100)]
powernow: add fixups for AMD P-state figures

In the Linux kernel, these two git commits:

f594065faf4f9067c2283a34619fc0714e79a98d
  ACPI: Add fixups for AMD P-state figures
9855d8ce41a7801548a05d844db2f46c3e810166
  ACPI: Check MSR valid bit before using P-state frequencies

Try to fix the issue that "some AMD systems may round the
frequencies in ACPI tables to 100MHz boundaries. We can obtain the real
frequencies from MSRs, so add a quirk to fix these frequencies up
on AMD systems." (from f594065..)

In discussion (around 9855d8..) "it turned out that indeed real
HW/BIOSes may choose to not set the valid bit and thus mark the
P-state as invalid. So this could be considered a fix for broken
BIOSes." (from 9855d8..)

which is great for Linux. Unfortunately, when the Linux kernel
tries to do the RDMSR under Xen, it fails to get the right
value (it gets zero) as Xen traps it and returns zero. Hence
when dom0 uploads the P-states they will be unmodified, and
we should take care of updating the frequencies with the right
values.

I've tested it under Dell Inc. PowerEdge T105 /0RR825, BIOS 1.3.2
08/20/2008 where this quirk can be observed (x86 == 0x10, model == 2).
Also on other AMD (x86 == 0x12, A8-3850; x86 = 0x14, AMD E-350) to
make sure the quirk is not applied there.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: stefan.bader@canonical.com

Do the MSR access here (and while at it, also the one reading
MSR_PSTATE_CUR_LIMIT) on the target CPU, and bound the loop over
amd_fixup_frequency() by max_hw_pstate (matching the one in
powernow_cpufreq_cpu_init()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
12 years agommu: Introduce XENMEM_claim_pages (subop of memory ops)
Dan Magenheimer [Mon, 11 Mar 2013 16:13:42 +0000 (16:13 +0000)]
mmu: Introduce XENMEM_claim_pages (subop of memory ops)

When guests' memory consumption is volatile (multiple guests
ballooning up/down) we are presented with the problem of
being able to determine exactly how much memory there is
for allocation of new guests without negatively impacting
existing guests. Note that the existing models (xapi, xend)
drive the memory consumption from the tool-stack and assume
that the guest will eventually hit the memory target. Other
models, such as the dynamic memory utilized by tmem, do this
differently - the guest drives the memory consumption (up
to the d->max_pages ceiling). With dynamic memory model, the
guest frequently can balloon up and down as it sees fit.
This presents the problem to the toolstack that it does not
know atomically how much free memory there is (as the information
gets stale the moment the d->tot_pages information is provided
to the tool-stack), and hence when starting a guest can fail
during the memory creation process. Especially if the process
is done in parallel. In a nutshell, what we need is an atomic
value of all domains' tot_pages during the allocation of guests.
Naturally holding a lock for such a long time is unacceptable.
Hence the goal of this hypercall is to attempt to atomically and very
quickly determine if there are sufficient pages available in the
system and, if so, "set aside" that quantity of pages for future
allocations by that domain.  Unlike an existing hypercall such as
increase_reservation or populate_physmap, specific physical
pageframes are not assigned to the domain because this
cannot be done sufficiently quickly (especially for very large
allocations in an arbitrarily fragmented system) and so the
existing mechanisms result in classic time-of-check-time-of-use
(TOCTOU) races.  One can think of claiming as similar to a
"lazy" allocation, but subsequent hypercalls are required
to do the actual physical pageframe allocation.

Note that one of the effects of this hypercall is that from the
perspective of other running guests - suddenly there is
a new guest occupying X amount of pages. This means that when
they try to balloon up they will hit the system-wide ceiling of
available free memory (if the total sum of the existing d->max_pages
>= host memory). This is OK - as that is part of the overcommit.
What we DO NOT want to do is dictate what their ceiling should be
(d->max_pages) as that is risky and can lead to guests OOM-ing.
It is something the guest needs to figure out.

In order for a toolstack to "get" information about whether
a domain has a claim and, if so, how large, and also for
the toolstack to measure the total system-wide claim, a
second subop has been added and exposed through domctl
and libxl (see "xen: XENMEM_claim_pages: xc").

== Alternative solutions ==
There has been a variety of discussion on whether the problem this
hypercall solves can be handled in user-space, such as:
 - For all the existing guests, set their d->max_pages temporarily
   to d->tot_pages and create the domain. This forces those
   domains to stay at their current consumption level (fyi, this is what
   the tmem freeze call is doing). The disadvantage of this is
   that it needlessly forces the guests to stay at their current memory
   usage instead of allowing them to decide the optimal target.
 - Account only using d->max_pages of how much free memory there is.
   This ignores ballooning changes and any over-commit scenario. This
   is similar to the scenario where the sum of all d->max_pages (and
   the one to be allocated now) on the host is smaller than the available
   free memory. As such it ignores the over-commit problem.
 - Provide a ring/FIFO along with an event channel to notify a userspace
   daemon of guests' memory consumption. This daemon can then provide
   up-to-date information to the toolstack of how much free memory
   there is. This duplicates what the hypervisor is already doing and
   introduces latency issues, leaving the toolstack catching its breath
   as there might be millions of these updates on a heavily used machine.
   There might not be any quiescent state ever, and the toolstack will
   heavily consume CPU cycles and not ever provide up-to-date information.

It has been noted that this claim mechanism solves the
underlying problem (slow failure of domain creation) for
a large class of domains but not all, specifically not
handling (but also not making the problem worse for) PV
domains that specify the "superpages" flag, and 32-bit PV
domains on large RAM systems.  These will be addressed at a
later time.

Code overview:

Though the hypercall simply does arithmetic within locks,
some of the semantics in the code may be a bit subtle.

The key variables (d->unclaimed_pages and total_unclaimed_pages)
start at zero if no claim has yet been staked for any domain.
(Perhaps a better name is "claimed_but_not_yet_possessed" but that's
a bit unwieldy.)  If no claim hypercalls are executed, there
should be no impact on existing usage.

When a claim is successfully staked by a domain, it is like a
watermark but there is no record kept of the size of the claim.
Instead, d->unclaimed_pages is set to the difference between
d->tot_pages and the claim.  When d->tot_pages increases or decreases,
d->unclaimed_pages atomically decreases or increases.  Once
d->unclaimed_pages reaches zero, the claim is satisfied and
d->unclaimed_pages stays at zero -- unless a new claim is
subsequently staked.

The systemwide variable total_unclaimed_pages is always the sum
of d->unclaimed_pages, across all domains.  A non-domain-
specific heap allocation will fail if total_unclaimed_pages
exceeds free (plus, on tmem enabled systems, freeable) pages.

Claim semantics could be modified by flags.  The initial
implementation had three flags, which discerned whether the
caller would like tmem freeable pages to be considered
in determining whether or not the claim can be successfully
staked. These were removed in later patches and there are no
flags.

A claim can be cancelled by requesting a claim with the
number of pages being zero.

A second subop returns the total outstanding claimed pages
systemwide.

Note: Save/restore/migrate may need to be modified,
else it can be documented that all claims are cancelled.

This version of the proposed XENMEM_claim_pages hypercall/subop patch
takes into account review feedback from Jan and Keir and IanC and
Matthew Daley, plus some fixes found via runtime debugging.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
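
The accounting described in the "Code overview" of the XENMEM_claim_pages commit above can be sketched roughly as follows. This is a toy, single-threaded model: every name in it (`struct domain_acct`, `claim_pages()`, `adjust_tot_pages()`, `anonymous_alloc_ok()`, `free_pages`) is invented for illustration, locking is omitted, and the failure conditions are assumptions that follow the commit text rather than the actual hypervisor code.

```c
/* Illustrative names only; they mirror the commit text, not Xen's source. */
struct domain_acct {
    long tot_pages;        /* pages the domain currently possesses */
    long unclaimed_pages;  /* claimed but not yet possessed */
};

static long free_pages;             /* pages currently free in the heap */
static long total_unclaimed_pages;  /* sum of unclaimed_pages, all domains */

/* Stake a claim; pages == 0 cancels it. Returns 0 on success, -1 on failure. */
static int claim_pages(struct domain_acct *d, long pages)
{
    long others = total_unclaimed_pages - d->unclaimed_pages;
    long need = pages > d->tot_pages ? pages - d->tot_pages : 0;

    /* Only the difference to tot_pages must be backed by unreserved memory. */
    if (need > 0 && need > free_pages - others)
        return -1;

    d->unclaimed_pages = need;
    total_unclaimed_pages = others + need;
    return 0;
}

/* Called when pages are really given to (delta > 0) or taken from a domain. */
static void adjust_tot_pages(struct domain_acct *d, long delta)
{
    long after;

    d->tot_pages += delta;
    free_pages -= delta;

    if (d->unclaimed_pages == 0)
        return;                 /* no claim outstanding, or already satisfied */

    after = d->unclaimed_pages - delta;
    if (after < 0)
        after = 0;              /* claim satisfied; it stays at zero */
    total_unclaimed_pages += after - d->unclaimed_pages;
    d->unclaimed_pages = after;
}

/* A non-domain-specific allocation must leave the claimed pages alone. */
static int anonymous_alloc_ok(long pages)
{
    return pages <= free_pages - total_unclaimed_pages;
}
```

Note that no record of the claim size itself is kept; only the shrinking difference is tracked, as the commit message describes.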
12 years agocredit2: Reset until the front of the runqueue is positive
George Dunlap [Mon, 11 Mar 2013 08:57:11 +0000 (09:57 +0100)]
credit2: Reset until the front of the runqueue is positive

Under normal circumstances, snext->credit should never be less than
-CSCHED_MIN_TIMER.  However, under some circumstances, a vcpu with low
credits may be allowed to run long enough that its credits are
actually less than -CSCHED_CREDIT_INIT.

(Instances have been observed, for example, where a vcpu with 200us of
credit was allowed to run for 11ms, giving it -10.8ms of credit.  Thus
it was still negative even after the reset.)

If this is the case for snext, we simply want to keep moving everyone
up until it is in the black again.  This is fair because none of the
other vcpus want to run at the moment.

Rather than loop, just detect how many times we want to add
CSCHED_CREDIT_INIT.  Try to avoid integer divides and multiplies in
the common case.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
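
The "rather than loop" step in the credit2 reset commit above might look like the following sketch. `CREDIT_INIT` is a stand-in value in arbitrary units and `reset_increment()` is a hypothetical helper, neither taken from the scheduler source. The common case returns early without any divide or multiply.

```c
#include <stdint.h>

/* Hypothetical stand-in for CSCHED_CREDIT_INIT, in arbitrary units. */
#define CREDIT_INIT 10000

/*
 * Amount to add to every vcpu's credit at reset time so that the vcpu at
 * the front of the runqueue (snext) ends up with non-negative credit.
 */
static int64_t reset_increment(int64_t snext_credit)
{
    int64_t m = 1;

    /* Common case: a single CREDIT_INIT suffices, no divide or multiply. */
    if (snext_credit >= -CREDIT_INIT)
        return CREDIT_INIT;

    /* Rare case: work out how many extra multiples are needed. */
    m += (-snext_credit) / CREDIT_INIT;
    return m * CREDIT_INIT;
}
```

With the figures from the commit message scaled to these units, a credit of -10800 yields an increment of 20000, leaving the vcpu at 9200.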
12 years agocredit2: Fix erroneous ASSERT
George Dunlap [Mon, 11 Mar 2013 08:56:02 +0000 (09:56 +0100)]
credit2: Fix erroneous ASSERT

In order to avoid high-frequency cpu migration, vcpus may in fact be
scheduled slightly out-of-order.  Account for this situation properly.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
12 years agox86/vPMU: change Intel model numbers from decimal to hex
Konrad Rzeszutek Wilk [Fri, 8 Mar 2013 15:22:43 +0000 (16:22 +0100)]
x86/vPMU: change Intel model numbers from decimal to hex

Suggested-by: "Nakajima, Jun" <jun.nakajima@intel.com>
Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
12 years agox86/vPMU: add missing Merom, Westmere, and Nehalem models
Konrad Rzeszutek Wilk [Fri, 8 Mar 2013 15:21:03 +0000 (16:21 +0100)]
x86/vPMU: add missing Merom, Westmere, and Nehalem models

Mainly 22 (Merom-L); 30 (Nehalem); and 37, 44 (Westmere).

A comprehensive list is available at:
http://software.intel.com/en-us/articles/intel-architecture-and-processor-identification-with-cpuid-model-and-family-numbers

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
12 years agox86/vPMU: provide comments for which Intel model is what
Konrad Rzeszutek Wilk [Fri, 8 Mar 2013 15:18:15 +0000 (16:18 +0100)]
x86/vPMU: provide comments for which Intel model is what

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoMAINTAINERS: Update my mail address
Christoph Egger [Fri, 8 Mar 2013 13:34:38 +0000 (14:34 +0100)]
MAINTAINERS: Update my mail address

Remove myself as AMD SVM maintainer.

Signed-off-by: Christoph Egger <chegger@amazon.de>
12 years agoremove Andre from the SVM maintainers list
Andre Przywara [Fri, 8 Mar 2013 13:29:32 +0000 (14:29 +0100)]
remove Andre from the SVM maintainers list

Signed-off-by: Andre Przywara <andre.przywara@calxeda.com>
12 years agoAMD: update MAINTAINERS file
Suravee Suthikulpanit [Fri, 8 Mar 2013 13:28:22 +0000 (14:28 +0100)]
AMD: update MAINTAINERS file

Adding AMD engineers to the list of AMD-specific components' maintainers.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
12 years agox86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses
Jan Beulich [Fri, 8 Mar 2013 13:05:34 +0000 (14:05 +0100)]
x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses

This adds two new physdev operations for Dom0 to invoke when resource
allocation for devices is known to be complete, so that the hypervisor
can arrange for the respective MMIO ranges to be marked read-only
before an eventual guest getting such a device assigned even gets
started, such that it won't be able to set up writable mappings for
these MMIO ranges before Xen has a chance to protect them.

This also addresses another issue with the code being modified here,
in that so far write protection for the address ranges in question got
set up only once during the lifetime of a device (i.e. until either
system shutdown or device hot removal), while teardown happened when
the last interrupt was disposed of by the guest (which at least allowed
the tables to be writable when the device got assigned to a second
guest [instance] after the first terminated).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
12 years agosched: always ask the scheduler to re-place the vcpu when the affinity changes
George Dunlap [Fri, 8 Mar 2013 08:43:40 +0000 (09:43 +0100)]
sched: always ask the scheduler to re-place the vcpu when the affinity changes

It's probably a good idea to re-evaluate placement whenever the
affinity changes.

This additionally has the benefit of removing scheduler-specific
exceptions introduced in git c/s e6a6fd63.

The conditionals surrounding vcpu_migrate() are left pending a re-work
of the logic to avoid the common case calling vcpu_migrate() twice (once
here, and once in context_saved()).

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
12 years agotools/xenconsoled: Initialise pointers before trying to use them
Andrew Cooper [Thu, 7 Mar 2013 16:20:50 +0000 (16:20 +0000)]
tools/xenconsoled: Initialise pointers before trying to use them

This is a regression introduced by

"Switch from select() to poll() in xenconsoled's IO loop."
  hg c/s 26405:7359c3122c5d
  git cc5434c933153c4b8812d1df901f8915c22830a8

which results in reliable segfaults during VM power operations.

Switch to calloc(3) in an effort to avoid similar problems with changes in the
future.

Signed-off-by: Marcus Granado <marcus.granado@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
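
The reasoning behind switching to calloc(3) in the xenconsoled fix above can be shown with a small sketch. The structure and function names are invented for illustration and are not xenconsoled's. On mainstream platforms an all-zero-bytes pointer is a null pointer, so members added later start out as NULL rather than as garbage.

```c
#include <stdlib.h>

/* Illustrative type; not the structure used by xenconsoled. */
struct console_state {
    char *buffer;
    struct console_state *next;
};

/* Allocate a zero-filled state object; returns NULL on allocation failure. */
static struct console_state *console_state_new(void)
{
    /* With malloc(), buffer and next would hold indeterminate values. */
    return calloc(1, sizeof(struct console_state));
}
```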
12 years agofix domain unlocking in some xsm error paths
Matthew Daley [Wed, 6 Mar 2013 16:10:26 +0000 (17:10 +0100)]
fix domain unlocking in some xsm error paths

A couple of xsm error/access-denied code paths in hypercalls neglect to
unlock a previously locked domain. Fix by ensuring the domains are
unlocked correctly.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
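
The class of bug fixed by the xsm commit above, and the usual shape of the fix, can be sketched as below. The lock type, the `permitted` argument standing in for an XSM check, and `do_op()` are all toy constructs for this note. The access-denied path jumps to a common exit label instead of returning with the lock still held.

```c
/* Toy stand-in for a domain lock, for illustration only. */
struct dom_lock {
    int locked;
};

static void dom_lock_acquire(struct dom_lock *d) { d->locked = 1; }
static void dom_lock_release(struct dom_lock *d) { d->locked = 0; }

/* 'permitted' plays the role of the result of an access-control check. */
static int do_op(struct dom_lock *d, int permitted)
{
    int rc = 0;

    dom_lock_acquire(d);

    if (!permitted) {
        rc = -1;    /* access denied */
        goto out;   /* a bare 'return rc;' here would leak the lock */
    }

    /* ... the actual operation would go here ... */

out:
    dom_lock_release(d);
    return rc;
}
```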
12 years agochange arguments of do_kexec_op and compat_set_timer_op prototypes
Robbie VanVossen [Wed, 6 Mar 2013 16:08:08 +0000 (17:08 +0100)]
change arguments of do_kexec_op and compat_set_timer_op prototypes

... to match the actual functions.

Signed-off-by: Robbie VanVossen <robert.vanvossen@dornerworks.com>

Also make sure the source files defining these symbols include the
header declaring them (had we done so, the problem would have been
noticed long ago).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86/shadow: don't use PV LDT area for cross-pages access emulation
Jan Beulich [Tue, 5 Mar 2013 07:51:10 +0000 (08:51 +0100)]
x86/shadow: don't use PV LDT area for cross-pages access emulation

As of 703ac3a ("x86: introduce create_perdomain_mapping()"), the page
tables for this range don't get set up anymore for non-PV guests. And
the way this was done was marked as a hack rather than a proper
mechanism anyway. Use vmap() instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxentrace: fix off-by-one in calculate_tbuf_size
Olaf Hering [Mon, 4 Mar 2013 12:42:17 +0000 (13:42 +0100)]
xentrace: fix off-by-one in calculate_tbuf_size

Commit "xentrace: reduce trace buffer size to something mfn_offset can
reach" contains an off-by-one bug. max_mfn_offset needs to be reduced by
exactly the value of t_info_first_offset.

If the system has two cpus and the number of requested trace pages is
very large, the final number of trace pages + the offset will not fit
into a short. As a result the variable offset in alloc_trace_bufs() will
wrap while allocating buffers for the second cpu. Later
share_xen_page_with_privileged_guests() will be called with a wrong page
and the ASSERT in this function triggers. If the ASSERT is compiled out
by running a non-debug hypervisor, the asserts in xentrace itself
trigger: "cons" is not aligned because the very last trace page for the
second cpu is a random mfn.

Thanks to Jan for the quick analysis.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
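
The bound discussed in the xentrace commit above can be approximated with the sketch below. It assumes, for illustration only, that per-cpu page offsets are stored in 16-bit fields and that each cpu's slice starts at `first_offset + cpu * pages`. The function names and this layout are assumptions, not the code of calculate_tbuf_size().

```c
#include <stdint.h>

/*
 * Clamp the requested per-cpu page count so that every offset, including
 * the end of the last cpu's slice, still fits in 16 bits.
 */
static uint32_t clamp_trace_pages(uint32_t requested, uint32_t ncpus,
                                  uint16_t first_offset)
{
    uint32_t max_mfn_offset = UINT16_MAX;
    uint32_t limit;

    /* Reduce by exactly first_offset; being off by one lets the sum wrap. */
    max_mfn_offset -= first_offset;
    limit = max_mfn_offset / ncpus;

    return requested > limit ? limit : requested;
}

/* Offset one past the last page of the last cpu's slice (no truncation). */
static uint32_t end_offset(uint32_t pages, uint32_t ncpus,
                           uint16_t first_offset)
{
    return first_offset + ncpus * pages;
}
```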
12 years agocredit2: track residual from divisions done during accounting
George Dunlap [Mon, 4 Mar 2013 12:39:19 +0000 (13:39 +0100)]
credit2: track residual from divisions done during accounting

This should help with under-accounting of vCPU-s running for extremely
short periods of time, but becoming runnable again at a high frequency.

Don't bother subtracting the residual from the runtime, as it can only ever
add up to one nanosecond, and will end up being debited during the next
reset interval anyway.

Original-patch-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
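
Tracking the residual of a division, as the commit above describes, can be sketched like this. The structure, field, and function names are hypothetical. The remainder of each time-to-credit division is carried into the next one, so many very short runs are not all rounded down to zero.

```c
#include <stdint.h>

/* Illustrative per-vcpu accounting state. */
struct vcpu_acct {
    uint32_t weight;    /* this vcpu's weight; must be non-zero */
    int64_t  residual;  /* remainder carried over from the last division */
};

/* Convert run time into credit to debit, carrying the remainder forward. */
static int64_t time_to_credit(struct vcpu_acct *v, int64_t time_ns,
                              uint32_t max_weight)
{
    int64_t val = time_ns * max_weight + v->residual;

    v->residual = val % v->weight;
    return val / v->weight;
}
```

For a vcpu of weight 3 with a maximum weight of 1, three runs of 1ns each are charged 0, 0 and 1 credit; without the residual all three would be charged 0.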
12 years agocredit2: Avoid extra c2t calculation in csched_runtime
George Dunlap [Mon, 4 Mar 2013 12:38:45 +0000 (13:38 +0100)]
credit2: Avoid extra c2t calculation in csched_runtime

csched_runtime() needs to call the c2t() function to change credits
into time.  The c2t() function, however, is expensive, as it requires
an integer division.

c2t() was being called twice, once for the main vcpu's credit and once
for the difference between its credit and the next in the queue.  But
this is unnecessary; by calculating in "credit" first, we can make it
so that we just do one conversion later in the algorithm.

This also adds more documentation describing the intended algorithm,
along with a relevant assertion.

The effect of the new code should be the same as the old code.
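
The idea of comparing in credit first and converting once can be sketched as below. The conversion function, its `rate` parameter, and both runtime helpers are assumptions made for illustration; the real c2t() and csched_runtime() differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical credit-to-time conversion; the division is the costly part. */
static int64_t c2t_sketch(int64_t credit, int64_t rate)
{
    assert(rate > 0);
    return credit / rate;
}

/* Two conversions: convert both candidates, then take the minimum. */
static int64_t runtime_two_conversions(int64_t credit, int64_t next_credit,
                                       int64_t rate)
{
    int64_t t_self = c2t_sketch(credit, rate);
    int64_t t_diff = c2t_sketch(credit - next_credit, rate);

    return t_diff < t_self ? t_diff : t_self;
}

/* One conversion: take the minimum in credit, then convert once. */
static int64_t runtime_one_conversion(int64_t credit, int64_t next_credit,
                                      int64_t rate)
{
    int64_t diff = credit - next_credit;
    int64_t rt_credit = diff < credit ? diff : credit;

    return c2t_sketch(rt_credit, rate);
}
```

Because the conversion never decreases as credit grows, taking the minimum before or after it gives the same result, so the second form saves one division.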

Spotted-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
12 years agocredit1: Use atomic bit operations for the flags structure
George Dunlap [Mon, 4 Mar 2013 12:37:39 +0000 (13:37 +0100)]
credit1: Use atomic bit operations for the flags structure

The flags structure is not protected by locks (or more precisely,
it is protected using an inconsistent set of locks); we therefore need
to make sure that all accesses are atomic-safe.  This is particularly
important in the case of the PARKED flag, which if clobbered while
changing the YIELD bit will leave a vcpu wedged in an offline state.

Using the atomic bitops also requires us to change the size of the "flags"
element.
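
As an illustration of why atomic read-modify-write matters for a shared flags word, here is a sketch using C11 atomics. Xen has its own bitops; the C11 calls are a stand-in, and the bit positions chosen for PARKED and YIELD are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit positions are assumed for illustration only. */
#define FLAG_PARKED_SKETCH (1u << 0)
#define FLAG_YIELD_SKETCH  (1u << 1)

/* Atomically set one flag without disturbing the others. */
static void set_flag(atomic_uint *flags, unsigned int bit)
{
    atomic_fetch_or(flags, bit);
}

/* Atomically clear one flag without disturbing the others. */
static void clear_flag(atomic_uint *flags, unsigned int bit)
{
    atomic_fetch_and(flags, ~bit);
}

/* Read a single flag. */
static int test_flag(atomic_uint *flags, unsigned int bit)
{
    return (atomic_load(flags) & bit) != 0;
}
```

A plain `flags &= ~YIELD` is a separate load and store, so a concurrent update of PARKED between the two can be lost; the atomic forms avoid that.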

Spotted-by: Igor Pavlikevich <ipavlikevich@gmail.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
12 years agox86: make x86_mcinfo_reserve() clear its result buffer
Jan Beulich [Mon, 4 Mar 2013 09:25:24 +0000 (10:25 +0100)]
x86: make x86_mcinfo_reserve() clear its result buffer

... instead of all but one of its callers.

Also adjust the corresponding sizeof() expressions to specify the
pointed-to type of the result variable rather than the literal type
(so that a type change of the variable will imply the size to get
adjusted too).
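
Both points can be sketched in a few lines: a reserve function that clears the buffer itself, and a caller that sizes the request from the pointed-to type. The record type and the malloc-based reserve function are hypothetical stand-ins, not x86_mcinfo_reserve() itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
struct record_sketch {
    int type;
    int size;
    char payload[24];
};

/* Hypothetical reserve function: it clears the buffer itself, so callers
 * no longer need their own memset(). */
static void *reserve_sketch(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}
```

A caller would write `struct record_sketch *r = reserve_sketch(sizeof(*r));` so that changing the type of `r` adjusts the size automatically.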

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
12 years agox86: reduce irq_cpustat_t's __softirq_pending to 32 bits
Jan Beulich [Mon, 4 Mar 2013 09:22:10 +0000 (10:22 +0100)]
x86: reduce irq_cpustat_t's __softirq_pending to 32 bits

Assembly code was already only accessing the low 32 bits of it, and
we're far away from using all 32 bits of it.

Noticed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86: don't rely on __softirq_pending to be the first field in irq_cpustat_t
Jan Beulich [Mon, 4 Mar 2013 09:20:57 +0000 (10:20 +0100)]
x86: don't rely on __softirq_pending to be the first field in irq_cpustat_t

This is even more so as the field doesn't have a comment to that effect
in the structure definition.

Once modifying the respective assembly code, also convert the
IRQSTAT_shift users to do a 32-bit shift only (as we won't support 48M
CPUs any time soon) and use "cmpl" instead of "testl" when checking the
field (both reducing code size).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86: defer processing events on the NMI exit path
Jan Beulich [Mon, 4 Mar 2013 09:19:34 +0000 (10:19 +0100)]
x86: defer processing events on the NMI exit path

Otherwise, we may end up in the scheduler, keeping NMIs masked for a
possibly unbounded period of time (until whenever the next IRET gets
executed). Enforce timely event processing by sending a self IPI.

Of course it's open for discussion whether to always use the straight
exit path from handle_ist_exception.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agoSEDF: avoid gathering vCPU-s on pCPU0
Jan Beulich [Mon, 4 Mar 2013 09:17:52 +0000 (10:17 +0100)]
SEDF: avoid gathering vCPU-s on pCPU0

The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.
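
The difference between "always the first CPU in the mask" and "prefer the CPU the vCPU is already on" can be sketched with a plain 64-bit mask. The helper names are hypothetical, and falling back to the first CPU when the current one is not allowed is an assumption of this sketch, not necessarily what sedf_pick_cpu() does.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest set bit of the mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/* Prefer the CPU the vCPU is currently on when the mask allows it;
 * otherwise fall back to the first allowed CPU (fallback is assumed). */
static int pick_cpu_sketch(uint64_t allowed, int current_cpu)
{
    if (current_cpu >= 0 && current_cpu < 64 &&
        (allowed & (UINT64_C(1) << current_cpu)))
        return current_cpu;
    return first_cpu(allowed);
}
```

With a mask covering CPUs 0 to 3, `first_cpu()` always answers 0, which is how every vCPU ended up on pCPU0, while `pick_cpu_sketch()` leaves a vCPU on CPU 2 where it is.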

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86: make certain memory sub-ops return valid values
Jan Beulich [Mon, 4 Mar 2013 09:16:04 +0000 (10:16 +0100)]
x86: make certain memory sub-ops return valid values

When a domain's shared info field "max_pfn" is zero,
domain_get_maximum_gpfn() so far returned ULONG_MAX, which
do_memory_op() in turn converted to -1 (i.e. -EPERM). Make the former
always return a sensible number (i.e. zero if the field was zero) and
have the latter no longer truncate return values.
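
The arithmetic behind the bogus return value can be sketched as follows. The assumption that the old code computed `max_pfn - 1` is inferred from the description, and the function names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Assumed old behaviour: a zero field wraps around to ULONG_MAX. */
static unsigned long max_gpfn_old(unsigned long max_pfn)
{
    return max_pfn - 1;
}

/* New behaviour as described: a zero field yields zero. */
static unsigned long max_gpfn_new(unsigned long max_pfn)
{
    return max_pfn ? max_pfn - 1 : 0;
}

/* Truncating the result to int turns ULONG_MAX into -1 on the usual
 * two's complement targets (the conversion is implementation-defined). */
static int truncate_to_int(unsigned long value)
{
    return (int)value;
}
```

A caller that sees -1 cannot tell it apart from an error code, which is why both the wrap and the truncation had to go.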

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agofix compat memory exchange op splitting
Jan Beulich [Fri, 1 Mar 2013 15:59:49 +0000 (16:59 +0100)]
fix compat memory exchange op splitting

A shift with a negative count was erroneously used here, yielding
undefined behavior.
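
For reference, a shift whose direction depends on the sign of a count can be written without ever passing a negative or oversized count to the shift operators. This helper is a generic sketch and is not taken from the compat memory exchange code.

```c
#include <assert.h>
#include <limits.h>

/* Shift left for positive counts and right for negative ones.
 * Counts whose magnitude reaches the type width yield 0 instead of
 * invoking undefined behavior. */
static unsigned long shift_by(unsigned long value, int count)
{
    const int width = (int)(sizeof(value) * CHAR_BIT);

    if (count <= -width || count >= width)
        return 0;
    return count >= 0 ? value << count : value >> -count;
}
```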

Reported-by: Xi Wang <xi@mit.edu>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agotools/xentoollog: update tty detection in stdiostream_progress
Olaf Hering [Wed, 27 Feb 2013 14:16:36 +0000 (14:16 +0000)]
tools/xentoollog: update tty detection in stdiostream_progress

As suggested by IanJ:
Check isatty only once to preserve the errno of ->progress users, and to
reduce the noise in strace output.
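
A sketch of the "check once and remember" pattern follows. The state structure and function name are hypothetical, and saving and restoring errno around the single isatty() call is an extra precaution added here, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical per-logger state. */
struct progress_state_sketch {
    int fd;
    int tty_checked; /* non-zero once isatty() has been consulted */
    int is_tty;      /* cached answer */
};

/* Ask isatty() only on the first call; later calls reuse the answer and
 * therefore never touch errno or show up again in strace output. */
static int progress_is_tty(struct progress_state_sketch *s)
{
    if (!s->tty_checked) {
        int saved_errno = errno;

        s->is_tty = isatty(s->fd);
        errno = saved_errno;
        s->tty_checked = 1;
    }
    return s->is_tty;
}
```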

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agobuild: rename deb target as debball
Ian Campbell [Wed, 27 Feb 2013 11:16:47 +0000 (11:16 +0000)]
build: rename deb target as debball

"debball" by analogy with "tarball". Attempt to clarify the purpose of this
target in the comment.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agoAvoid stale pointer when moving domain to another cpupool
Juergen Gross [Thu, 28 Feb 2013 14:56:45 +0000 (14:56 +0000)]
Avoid stale pointer when moving domain to another cpupool

When a domain is moved to another cpupool the scheduler private data pointers
in vcpu and domain structures must never point to an already freed memory
area.

While at it, simplify sched_init_vcpu() by using DOM2OP instead of VCPU2OP.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
12 years agovmx: fix handling of NMI VMEXIT.
Tim Deegan [Thu, 28 Feb 2013 12:42:15 +0000 (12:42 +0000)]
vmx: fix handling of NMI VMEXIT.

Call do_nmi() directly and explicitly re-enable NMIs rather than
raising an NMI through the APIC. Since NMIs are disabled after the
VMEXIT, the raised NMI would be blocked until the next IRET
instruction (i.e. the next real interrupt, or after scheduling a PV
guest) and in the meantime the guest will spin taking NMI VMEXITS.

Also, handle NMIs before re-enabling interrupts, since if we handle an
interrupt (and therefore IRET) before calling do_nmi(), we may end up
running the NMI handler with NMIs enabled.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
12 years agox86/mm: fix invalid unlinking of nested p2m tables
Matthew Daley [Thu, 28 Feb 2013 05:16:04 +0000 (18:16 +1300)]
x86/mm: fix invalid unlinking of nested p2m tables

Commit 90805dc (c/s 26387:4056e5a3d815) ("EPT: Make ept data stucture or
operations neutral") makes nested p2m tables be unlinked from the host
p2m table before their destruction (in p2m_teardown_nestedp2m).
However, by this time the host p2m table has already been torn down,
leading to a possible race condition where another allocation between
the two kinds of table being torn down can lead to a linked list
assertion with debug=y builds or memory corruption on debug=n ones.

Fix by swapping the order the two kinds of table are torn down in. While
at it, remove the condition in p2m_final_teardown, as it is already
checked identically in p2m_teardown_hostp2m itself.

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agox86: use linear L1 page table for map_domain_page() page table manipulation
Jan Beulich [Thu, 28 Feb 2013 10:10:53 +0000 (11:10 +0100)]
x86: use linear L1 page table for map_domain_page() page table manipulation

This saves allocation of a Xen heap page for tracking the L1 page table
pointers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86: rework hypercall argument translation area setup
Jan Beulich [Thu, 28 Feb 2013 10:09:39 +0000 (11:09 +0100)]
x86: rework hypercall argument translation area setup

... using the new per-domain mapping management functions, adding
destroy_perdomain_mapping() to the previously introduced pair.

Rather than using an order-1 Xen heap allocation, use (currently 2)
individual domain heap pages to populate space in the per-domain
mapping area.

Also fix a benign off-by-one mistake in is_compat_arg_xlat_range().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agox86: introduce create_perdomain_mapping()
Jan Beulich [Thu, 28 Feb 2013 10:08:13 +0000 (11:08 +0100)]
x86: introduce create_perdomain_mapping()

... as well as free_perdomain_mappings(), and use them to carry out the
existing per-domain mapping setup/teardown. This at once makes the
setup of the first sub-range PV domain specific (with idle domains also
excluded), as the GDT/LDT mapping area is needed only for those.

Also fix an improperly scaled BUILD_BUG_ON() expression in
mapcache_domain_init().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agocredit1: track residual from divisions done during accounting
Jan Beulich [Thu, 28 Feb 2013 10:06:42 +0000 (11:06 +0100)]
credit1: track residual from divisions done during accounting

This should help with under-accounting of vCPU-s running for extremely
short periods of time, but becoming runnable again at a high frequency.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
12 years agox86: minor fix for rdmsrl
Liu Jinsong [Thu, 28 Feb 2013 08:22:41 +0000 (09:22 +0100)]
x86: minor fix for rdmsrl

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
12 years agoMAINTAINERS: Remove Jeremy from pv_ops maintainer list.
Keir Fraser [Tue, 26 Feb 2013 16:51:28 +0000 (16:51 +0000)]
MAINTAINERS: Remove Jeremy from pv_ops maintainer list.

Signed-off-by: Keir Fraser <keir@xen.org>
12 years agoCREDITS: First checkin.
Konrad Rzeszutek Wilk [Tue, 26 Feb 2013 16:18:34 +0000 (16:18 +0000)]
CREDITS: First checkin.

Adding Jeremy and moving him from the MAINTAINERS file.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
12 years agoMAINTAINERS: Provide proper URL to the upstream Linux development tree for Xen.
Konrad Rzeszutek Wilk [Tue, 26 Feb 2013 16:18:09 +0000 (16:18 +0000)]
MAINTAINERS: Provide proper URL to the upstream Linux development tree for Xen.

And also put my name behind the maintainership.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
12 years agolibxl: Made it possible to use 'timer='delay_for_missed_ticks'
Konrad Rzeszutek Wilk [Mon, 25 Feb 2013 16:30:18 +0000 (11:30 -0500)]
libxl: Made it possible to use 'timer='delay_for_missed_ticks'

The assertion only allows values of 1 (no_delay_for_missed_ticks)
up to 3 (one_missed_tick_pending). It should be possible to
use the value of 0 (delay_for_missed_ticks) for the timer configuration
option.
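
The corrected range check can be sketched as below. The enumerator names for 0, 1 and 3 follow the commit message; the name used for value 2 is an assumption, and the validity helper is illustrative rather than the libxl code.

```c
#include <assert.h>

/* Timer modes; values 0, 1 and 3 are named in the commit message,
 * the name for 2 is assumed. */
enum timer_mode_sketch {
    DELAY_FOR_MISSED_TICKS = 0,
    NO_DELAY_FOR_MISSED_TICKS = 1,
    NO_MISSED_TICKS_PENDING = 2,
    ONE_MISSED_TICK_PENDING = 3
};

/* Accept the whole range, including 0, which the old assertion rejected. */
static int timer_mode_valid(int mode)
{
    return mode >= DELAY_FOR_MISSED_TICKS && mode <= ONE_MISSED_TICK_PENDING;
}
```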

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
12 years agotools: foreign: ensure 64 bit values are properly aligned for arm
Ian Campbell [Tue, 26 Feb 2013 10:12:46 +0000 (10:12 +0000)]
tools: foreign: ensure 64 bit values are properly aligned for arm

When building the foreign headers on x86_32 we use '#pragma pack(4)' and
therefore need to explicitly align types which should be aligned to 8-byte
boundaries.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
12 years agox86: fix CMCI injection
Jan Beulich [Tue, 26 Feb 2013 09:15:56 +0000 (10:15 +0100)]
x86: fix CMCI injection

This fixes the wrong use of literal vector 0xF7 with an "int"
instruction (invalidated by 25113:14609be41f36) and the fact that doing
the injection via a software interrupt was never valid anyway (because
cmci_interrupt() acks the LAPIC, which does the wrong thing if the
interrupt didn't get delivered through it).

In order to do the latter, the patch introduces send_IPI_self(), at once
removing two open-coded uses of "genapic" in the IRQ handling code.

Reported-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Yongjie Ren <yongjie.ren@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
12 years agoIOMMU, AMD Family15h Model10-1Fh erratum 746 Workaround
Suravee Suthikulpanit [Tue, 26 Feb 2013 09:14:53 +0000 (10:14 +0100)]
IOMMU, AMD Family15h Model10-1Fh erratum 746 Workaround

The IOMMU may stop processing page translations due to a perceived lack
of credits for writing upstream peripheral page service request (PPR)
or event logs. If the L2B miscellaneous clock gating feature is enabled
the IOMMU does not properly register credits after the log request has
completed, leading to a potential system hang.

BIOSes are supposed to disable L2B miscellaneous clock gating by setting
L2_L2B_CK_GATE_CONTROL[CKGateL2BMiscDisable](D0F2xF4_x90[2]) = 1b. This
patch corrects that for those which do not enable this workaround.
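
In terms of register arithmetic, the workaround amounts to setting bit 2 of the value read from the register if the BIOS left it clear. The sketch below shows only that bit manipulation; the macro and function names are hypothetical and the indirect register access itself is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 2 of L2_L2B_CK_GATE_CONTROL (CKGateL2BMiscDisable), per the text. */
#define CK_GATE_L2B_MISC_DISABLE_SKETCH (UINT32_C(1) << 2)

/* Non-zero if the BIOS did not apply the workaround. */
static int needs_workaround(uint32_t reg)
{
    return (reg & CK_GATE_L2B_MISC_DISABLE_SKETCH) == 0;
}

/* Value to write back: same register contents with bit 2 set. */
static uint32_t apply_workaround(uint32_t reg)
{
    return reg | CK_GATE_L2B_MISC_DISABLE_SKETCH;
}
```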

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
12 years agoAMD IOMMU: cover all functions of a device even if ACPI only tells us of func 0
Jan Beulich [Tue, 26 Feb 2013 09:12:57 +0000 (10:12 +0100)]
AMD IOMMU: cover all functions of a device even if ACPI only tells us of func 0

This ought to work as all functions of a device have the same place in
the bus topology, i.e. use the same IOMMU.

Also fix the type of ivrs_bdf_entries (when it's 'unsigned short' and
the last device found on a segment is ff:1f.x, it would otherwise end
up being zero).
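
The overflow is easy to reproduce: ff:1f.7 encodes to 0xffff, and adding one in a 16-bit type wraps to zero. The macro and helper names below are illustrative, not the ones used in the IOMMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Encode bus:device.function into a 16-bit BDF value. */
#define PCI_BDF_SKETCH(bus, dev, fn) \
    ((((bus) & 0xffu) << 8) | (((dev) & 0x1fu) << 3) | ((fn) & 0x7u))

/* Entry count kept in 16 bits: wraps to 0 when last_bdf is 0xffff. */
static uint16_t entries_16bit(unsigned int last_bdf)
{
    return (uint16_t)(last_bdf + 1);
}

/* Entry count kept in a wider type: 0x10000 is representable. */
static unsigned int entries_wide(unsigned int last_bdf)
{
    return last_bdf + 1;
}
```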

And drop the bogus 'last_bdf' static variable, which conflicted anyway
with various functions' parameters.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
12 years agoQEMU_TAG update
Ian Jackson [Mon, 25 Feb 2013 16:45:14 +0000 (16:45 +0000)]
QEMU_TAG update

12 years agox86: fix null pointer dereference in intel_get_extended_msrs()
Xi Wang [Mon, 25 Feb 2013 11:44:25 +0000 (12:44 +0100)]
x86: fix null pointer dereference in intel_get_extended_msrs()

`memset(&mc_ext, 0, ...)' leads to a buffer overflow and a subsequent
null pointer dereference.  Replace `&mc_ext' with `mc_ext'.
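
The correct form of the call can be sketched as follows. The structure layout is made up for illustration; only the pointer-versus-address-of-pointer distinction is taken from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the extended MSR record. */
struct mc_ext_sketch {
    unsigned int count;
    unsigned long msrs[32];
};

/* Clear the object the pointer refers to. Passing &mc_ext instead would
 * address the pointer variable itself: the write would run past it and
 * zero the pointer, which is then dereferenced as NULL. */
static void clear_ext(struct mc_ext_sketch *mc_ext)
{
    memset(mc_ext, 0, sizeof(*mc_ext));
}
```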

Signed-off-by: Xi Wang <xi@mit.edu>
12 years agoQEMU_TAG update
Ian Jackson [Fri, 22 Feb 2013 18:16:54 +0000 (18:16 +0000)]
QEMU_TAG update

12 years agotools/flask: add FLASK policy to build
Daniel De Graaf [Wed, 13 Feb 2013 16:06:57 +0000 (16:06 +0000)]
tools/flask: add FLASK policy to build

This patch enables the compilation of the FLASK policy as part of the
tools build if the needed prerequisites are present.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
12 years agoflask/policy: rework policy build system
Daniel De Graaf [Wed, 13 Feb 2013 16:07:05 +0000 (16:07 +0000)]
flask/policy: rework policy build system

This adds the ability to define security classes and access vectors in
FLASK policy not defined by the hypervisor, for the use of stub domains
or applications without their own security policies.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
12 years agoflask/policy: sort dom0 accesses
Daniel De Graaf [Wed, 13 Feb 2013 16:06:57 +0000 (16:06 +0000)]
flask/policy: sort dom0 accesses

For the example policy shipped with Xen, it makes sense to allow dom0
access to all system calls so that policy does not need to be updated
for each new hypervisor or toolstack feature used.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
12 years agobuild: Fix distclean when repo location changes
Andrei Lifchits [Wed, 20 Feb 2013 16:54:03 +0000 (16:54 +0000)]
build: Fix distclean when repo location changes

If the path to xen-unstable.hg changes (i.e. you move the repo), the symlinks
inside xen-unstable.hg/stubdom/libxc-x86_[32|64]/ all become broken, which
breaks distclean because make attempts to clean inside those first and fails to
find Makefile (which is also a symlink).

Signed-off-by: Andrei Lifchits <andrei.lifchits@citrix.com>
12 years agoxend: Only add cpuid and cpuid_check to sexpr once
Olaf Hering [Thu, 14 Feb 2013 17:18:56 +0000 (17:18 +0000)]
xend: Only add cpuid and cpuid_check to sexpr once

tools/xend: Only add cpuid and cpuid_check to sexpr once

When converting a XendConfig object to sexpr, cpuid and cpuid_check
were being emitted twice in the resulting sexpr.  The first conversion
writes incorrect sexpr, causing parsing of the sexpr to fail when xend
is restarted and domain sexpr files in /var/lib/xend/domains/<dom-uuid>
are read and parsed.

This patch skips the first conversion, and uses only the custom
cpuid{_check} conversion methods called later.  It is not pretty, but
is the least invasive fix in this complex code.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
12 years agodoc: Improve xc_domain_restore inline documentation
Frediano Ziglio [Thu, 14 Feb 2013 14:10:15 +0000 (14:10 +0000)]
doc: Improve xc_domain_restore inline documentation

It was not clear that xc_domain_restore did not resume the machine.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
12 years agoxen: arm: implement cpuinfo
Ian Campbell [Fri, 22 Feb 2013 08:58:25 +0000 (08:58 +0000)]
xen: arm: implement cpuinfo

Use it to:

 - Only context switch ThumbEE state if the processor implements it. In
   particular the ARMv8 FastModels do not.
 - Detect the generic timer, and therefore call identify_cpu before
   init_xen_time.

Also improve the boot time messages a bit.

I haven't added decoding for all of the CPUID words, it seems like overkill
for the moment.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Cc: stefano.stabellini@citrix.com
12 years agoxen: arm: Fix guest mode for 64-bit
Ian Campbell [Fri, 22 Feb 2013 08:58:24 +0000 (08:58 +0000)]
xen: arm: Fix guest mode for 64-bit

Need to check for the 64-bit EL2 modes, not 32-bit HYP mode.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: print arm64 not arm32 in xen info when appropriate.
Ian Campbell [Fri, 22 Feb 2013 08:58:23 +0000 (08:58 +0000)]
xen: arm: print arm64 not arm32 in xen info when appropriate.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: Explicitly setup VPIDR & VMPIDR at start of day
Ian Campbell [Fri, 22 Feb 2013 08:58:22 +0000 (08:58 +0000)]
xen: arm: Explicitly setup VPIDR & VMPIDR at start of day

These are supposed to reset to the value of the underlying hardware
but appear not to do so on at least some v8 models. There's no harm in
setting them explicitly.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: Use generic mem{cpy, move, set, zero} on 64-bit
Ian Campbell [Fri, 22 Feb 2013 08:58:21 +0000 (08:58 +0000)]
xen: arm: Use generic mem{cpy, move, set, zero} on 64-bit

No optimised versions are available in Linux yet (meaning I couldn't
copy them).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: Enable VFP is a nop on 64-bit.
Ian Campbell [Fri, 22 Feb 2013 08:58:20 +0000 (08:58 +0000)]
xen: arm: Enable VFP is a nop on 64-bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: implement do_multicall_call for both 32 and 64-bit
Ian Campbell [Fri, 22 Feb 2013 08:58:19 +0000 (08:58 +0000)]
xen: arm: implement do_multicall_call for both 32 and 64-bit

Obviously nothing is actually making multicalls even on 32-bit so
this isn't tested.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: guest stage 1 walks on 64-bit hypervisor
Ian Campbell [Fri, 22 Feb 2013 08:58:18 +0000 (08:58 +0000)]
xen: arm: guest stage 1 walks on 64-bit hypervisor

Still only supports non-LPAE 32-bit guests.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: handle 32-bit guest CP register traps on 64-bit hypervisor
Ian Campbell [Fri, 22 Feb 2013 08:58:17 +0000 (08:58 +0000)]
xen: arm: handle 32-bit guest CP register traps on 64-bit hypervisor

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: select_user_reg support for 64-bit hypervisor
Ian Campbell [Fri, 22 Feb 2013 08:58:16 +0000 (08:58 +0000)]
xen: arm: select_user_reg support for 64-bit hypervisor

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: Use 64-bit compatible registers in vtimer.
Ian Campbell [Fri, 22 Feb 2013 08:58:15 +0000 (08:58 +0000)]
xen: arm: Use 64-bit compatible registers in vtimer.

Also, don't crash the host if we fail to emulate a vtimer access,
just kill the guest.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: p2m: use 64-bit compatible registers.
Ian Campbell [Fri, 22 Feb 2013 08:58:14 +0000 (08:58 +0000)]
xen: arm: p2m: use 64-bit compatible registers.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: time: use 64-bit compatible registers
Ian Campbell [Fri, 22 Feb 2013 08:58:13 +0000 (08:58 +0000)]
xen: arm: time: use 64-bit compatible registers

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: gic: use 64-bit compatible registers
Ian Campbell [Fri, 22 Feb 2013 08:58:12 +0000 (08:58 +0000)]
xen: arm: gic: use 64-bit compatible registers

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: make dom0 builder work on 64-bit hypervisor
Ian Campbell [Fri, 22 Feb 2013 08:58:11 +0000 (08:58 +0000)]
xen: arm: make dom0 builder work on 64-bit hypervisor

This still only builds a 32-bit dom0, although it lays a bit of
simple ground work for 64-bit dom0.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: show_registers() support for 64-bit.
Ian Campbell [Fri, 22 Feb 2013 08:58:10 +0000 (08:58 +0000)]
xen: arm: show_registers() support for 64-bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm: guest context switching.
Ian Campbell [Fri, 22 Feb 2013 08:58:09 +0000 (08:58 +0000)]
xen: arm: guest context switching.

One side effect of this is that we now save the full 64-bit
TTBR[0,1] even on a 32-bit hypervisor. This is needed anyway to
support LPAE guests (although this patch doesn't implement anything
other than the context switch).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
12 years agoxen: arm64: percpu variable support.
Ian Campbell [Fri, 22 Feb 2013 08:58:08 +0000 (08:58 +0000)]
xen: arm64: percpu variable support.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>