xenbits.xensource.com Git - xen.git/log
13 years ago x86-64/EFI: 2.0 header extensions
Jan Beulich [Sat, 17 Sep 2011 15:27:06 +0000 (16:27 +0100)]
x86-64/EFI: 2.0 header extensions

Updates from gnu-efi 3.0m. UEFI 2.0 runtime services additions taken
from EDK 1.06.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86/vmx: don't call __vmxoff() blindly
Jan Beulich [Sat, 17 Sep 2011 15:26:37 +0000 (16:26 +0100)]
x86/vmx: don't call __vmxoff() blindly

If vmx_vcpu_up() failed, __vmxon() would generally not have got
(successfully) executed, and in that case __vmxoff() will #UD.

Additionally, any panic() during early resume (namely the tboot
related one) would cause vmx_cpu_down() to get executed without
vmx_cpu_up() having run before.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years ago x86/tboot: make resume error messages visible
Jan Beulich [Sat, 17 Sep 2011 15:25:53 +0000 (16:25 +0100)]
x86/tboot: make resume error messages visible

With tboot_s3_resume() running before console_resume(), the error
messages so far printed by it are mostly guaranteed to go into
nirvana.  Latch MACs into a static variable instead, and issue the
messages right before calling panic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years ago xen: Move tsc reliability check until after CPUs have booted
George Dunlap [Sat, 17 Sep 2011 15:22:54 +0000 (16:22 +0100)]
xen: Move tsc reliability check until after CPUs have booted

AMD CPUs by default enable X86_FEATURE_TSC_RELIABLE, and depend upon a
later check to disable this feature if TSC drift is detected.
Unfortunately, this check is done in time.c:init_xen_time(), which is
done before any secondary CPUs are brought up, and is thus guaranteed
to succeed.

This patch moves the check into its own function, and calls it after
cpus are brought up.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago x86/hvm: Tidy up the viridian code a little and flesh out the APIC
Paul Durrant [Sat, 17 Sep 2011 15:22:13 +0000 (16:22 +0100)]
x86/hvm: Tidy up the viridian code a little and flesh out the APIC
assist MSR handling code.

We don't say that we handle that MSR but Windows assumes it. In
Windows 7 it just wrote to the MSR and we used to handle that
ok. Windows 8 also reads from the MSR so we need to keep a record of
the contents.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
13 years ago xen/xsm: Compile error due to naming clash between XSM and EFI runtime
James Carter [Sat, 17 Sep 2011 15:20:58 +0000 (16:20 +0100)]
xen/xsm: Compile error due to naming clash between XSM and EFI runtime

The problem is that efi_runtime_call is the name of both a function in
xen/arch/x86/efi/runtime.c and a member of the xsm_operations struct
in xen/include/xsm/xsm.h. This causes the macro "#define
efi_runtime_call(x) efi_compat_runtime_call(x)" on line 15 of
xen/arch/x86/x86_64/platform_hypercall.c to cause the above compile
error.

Renaming the XSM struct member fixes the problem.

Signed-off-by: James Carter <jwcart2@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
13 years ago Avoid race in schedule() when switching schedulers
Juergen Gross [Sat, 17 Sep 2011 15:19:26 +0000 (16:19 +0100)]
Avoid race in schedule() when switching schedulers

Selecting the scheduler to call must be done under lock. Otherwise a
race might occur when switching schedulers in a cpupool.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago mem_event: use different ringbuffers for share, paging and access
Olaf Hering [Fri, 16 Sep 2011 11:19:26 +0000 (12:19 +0100)]
mem_event: use different ringbuffers for share, paging and access

Up to now a single ring buffer was used for mem_share, xenpaging and
xen-access.  Each helper would have to cooperate and pull only its own
requests from the ring.  Unfortunately this was not implemented. And
even if it was, it would make the whole concept fragile because a crash
or early exit of one helper would stall the others.

What happened up to now is that active xenpaging + memory_sharing would
push memsharing requests in the buffer. xenpaging is not prepared for
such requests.

This patch creates an independent ring buffer for mem_share, xenpaging
and xen-access and also adds new functions to enable xenpaging and
xen-access. The xc_mem_event_enable/xc_mem_event_disable functions will
be removed. The various XEN_DOMCTL_MEM_EVENT_* macros were cleaned up.
Due to the removal the API changed, so the SONAME will be changed too.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago mem_event: pass mem_event_domain pointer to mem_event functions
Olaf Hering [Fri, 16 Sep 2011 11:13:31 +0000 (12:13 +0100)]
mem_event: pass mem_event_domain pointer to mem_event functions

Pass a struct mem_event_domain pointer to the various mem_event
functions.  This will be used in a subsequent patch which creates
different ring buffers for the memshare, xenpaging and memaccess
functionality.

Remove the struct domain argument from some functions.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago libxc: Enable cpuid performance counter leaf for HVM
Juergen Gross [Thu, 15 Sep 2011 14:26:07 +0000 (15:26 +0100)]
libxc: Enable cpuid performance counter leaf for HVM

In HVM domains the usable performance counters can be checked
automatically only if cpuid leaf 0x0000000a is accessible.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago xenstored: allow guest to shutdown all its watches/transactions
Olaf Hering [Thu, 15 Sep 2011 10:08:05 +0000 (11:08 +0100)]
xenstored: allow guest to shutdown all its watches/transactions

During kexec all old watches have to be removed, otherwise the new
kernel will receive unexpected events. Allow a guest to reset itself
and clean up all of its watches and transactions.

Add a new XS_RESET_WATCHES command to do the reset on behalf of the
guest.

(Changes by iwj: specify the argument to be a single nul byte.  Permit
read-only clients to use the new command.)

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago tools: Revert seabios and upstream qemu build changes
Ian Jackson [Wed, 14 Sep 2011 10:38:13 +0000 (11:38 +0100)]
tools: Revert seabios and upstream qemu build changes

These have broken the build and it seems to be difficult to fix.  So
we will revert the whole lot for now, and await corrected patch(es).

Revert "fix the build when CONFIG_QEMU is specified by the user"
Revert "tools: fix permissions of git-checkout.sh"
Revert "scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh."
Revert "Clone and build Seabios by default"
Revert "Clone and build upstream Qemu by default"
Revert "Rename ioemu-dir as qemu-xen-traditional-dir"
Revert "Move the ioemu-dir-find shell script to an external file"

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago fix the build when CONFIG_QEMU is specified by the user
Stefano Stabellini [Tue, 13 Sep 2011 14:46:47 +0000 (15:46 +0100)]
fix the build when CONFIG_QEMU is specified by the user

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Committed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago tools: fix permissions of git-checkout.sh
Ian Jackson [Tue, 13 Sep 2011 13:52:22 +0000 (14:52 +0100)]
tools: fix permissions of git-checkout.sh

23828:0d21b68f528b introduced a new scripts/git-checkout.sh, but it
had the wrong permissions.  chmod +x it, and add a blank line at the
end to make sure it actually gets updated.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh.
Keir Fraser [Tue, 13 Sep 2011 10:20:57 +0000 (11:20 +0100)]
scripts/git-checkout.sh: Is not bash specific. Invoke with /bin/sh.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago xen,credit1: Add variable timeslice
George Dunlap [Tue, 13 Sep 2011 09:43:43 +0000 (10:43 +0100)]
xen,credit1: Add variable timeslice

Add a xen command-line parameter, sched_credit_tslice_ms,
to set the timeslice of the credit1 scheduler.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago IRQ: IO-APIC support End Of Interrupt for older IO-APICs
Andrew Cooper [Tue, 13 Sep 2011 09:33:10 +0000 (10:33 +0100)]
IRQ: IO-APIC support End Of Interrupt for older IO-APICs

The old io_apic_eoi() function using the EOI register only works for
IO-APICs with a version of 0x20.  Older IO-APICs do not have an EOI
register so line level interrupts have to be EOI'd by flipping the
mode to edge and back, which clears the IRR and Delivery Status bits.

This patch replaces the current io_apic_eoi() function with one which
takes into account the version of the IO-APIC and EOI's
appropriately.

v2: make recursive call to __io_apic_eoi() to reduce code size.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
13 years ago xen: if mapping GSIs we run out of pirq < nr_irqs_gsi, use the others
Stefano Stabellini [Tue, 13 Sep 2011 09:32:24 +0000 (10:32 +0100)]
xen: if mapping GSIs we run out of pirq < nr_irqs_gsi, use the others

PV on HVM guests can have more GSIs than the host, in that case we
could run out of pirq < nr_irqs_gsi. When that happens use pirq >=
nr_irqs_gsi rather than returning an error.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Benjamin Schweikert <b.schweikert@googlemail.com>
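
The fallback described in this commit can be sketched as a two-range search. This is a minimal illustration, not the Xen implementation: `struct ex_pirq_table`, `EX_NR_PIRQS`, `ex_find_free()` and `ex_alloc_gsi_pirq()` are hypothetical names, and the real code also handles locking and per-domain state.

```c
#include <stdbool.h>

#define EX_NR_PIRQS 32              /* hypothetical table size */

struct ex_pirq_table {
    bool used[EX_NR_PIRQS];         /* true if the pirq is already allocated */
    int nr_irqs_gsi;                /* pirqs below this are the GSI range */
};

/* Return the first free pirq in [lo, hi), or -1 if the range is full. */
static int ex_find_free(const struct ex_pirq_table *t, int lo, int hi)
{
    for (int i = lo; i < hi; i++)
        if (!t->used[i])
            return i;
    return -1;
}

/* Prefer a pirq below nr_irqs_gsi; if none is left, use the range above it. */
static int ex_alloc_gsi_pirq(struct ex_pirq_table *t)
{
    int pirq = ex_find_free(t, 0, t->nr_irqs_gsi);

    if (pirq < 0)
        pirq = ex_find_free(t, t->nr_irqs_gsi, EX_NR_PIRQS);
    if (pirq >= 0)
        t->used[pirq] = true;
    return pirq;                    /* -1 only when every pirq is in use */
}
```

With `nr_irqs_gsi` set to 2, the third allocation falls through to pirq 2 instead of failing.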
13 years ago Clone and build Seabios by default
Stefano Stabellini [Tue, 13 Sep 2011 09:30:09 +0000 (10:30 +0100)]
Clone and build Seabios by default

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago Clone and build upstream Qemu by default
Stefano Stabellini [Tue, 13 Sep 2011 09:29:14 +0000 (10:29 +0100)]
Clone and build upstream Qemu by default

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago Rename ioemu-dir as qemu-xen-traditional-dir
Stefano Stabellini [Tue, 13 Sep 2011 09:27:53 +0000 (10:27 +0100)]
Rename ioemu-dir as qemu-xen-traditional-dir

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago Move the ioemu-dir-find shell script to an external file
Stefano Stabellini [Tue, 13 Sep 2011 09:27:20 +0000 (10:27 +0100)]
Move the ioemu-dir-find shell script to an external file

Add support for configuring upstream qemu and rename ioemu-remote to
ioemu-dir-remote.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago xenpaging: use batch of pages during final page-in
Olaf Hering [Tue, 13 Sep 2011 09:25:32 +0000 (10:25 +0100)]
xenpaging: use batch of pages during final page-in

Map up to RING_SIZE pages in exit path to fill the ring instead of
populating one page at a time.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
13 years ago hvmloader: don't clear acpi_info after filling in some fields
Ian Campbell [Tue, 13 Sep 2011 09:22:03 +0000 (10:22 +0100)]
hvmloader: don't clear acpi_info after filling in some fields

In particular the madt_lapic0_addr and madt_csum_addr fields are
filled in while building the tables.

This fixes a bluescreen on shutdown with certain versions of Windows.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Tested-and-acked-by: Christoph Egger <Christoph.Egger@amd.com>
13 years ago x86/mm: use new page-order interfaces in nested HAP code
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: use new page-order interfaces in nested HAP code
to make 2M and 1G mappings in the nested p2m tables.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago x86/mm: adjust paging interface to return superpage sizes
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: adjust paging interface to return superpage sizes
to the caller of paging_ga_to_gfn_cr3()

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago x86/mm: adjust p2m interface to return superpage sizes
Tim Deegan [Thu, 8 Sep 2011 14:13:06 +0000 (15:13 +0100)]
x86/mm: adjust p2m interface to return superpage sizes

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago p2m-ept: remove map_domain_page check
Olaf Hering [Wed, 7 Sep 2011 09:37:48 +0000 (10:37 +0100)]
p2m-ept: remove map_domain_page check

map_domain_page() cannot fail, so remove the ASSERT in ept_set_entry().

Signed-off-by: Olaf Hering <olaf@aepfle.de>
13 years ago x86: remove unnecessary indirection from irq_complete_move()'s sole parameter
Jan Beulich [Wed, 7 Sep 2011 09:37:20 +0000 (10:37 +0100)]
x86: remove unnecessary indirection from irq_complete_move()'s sole parameter

Signed-off-by: Jan Beulich <jbeulich@suse.com>
13 years ago bitmap_scnlistprintf() should always zero-terminate its output buffer
Jan Beulich [Wed, 7 Sep 2011 09:36:55 +0000 (10:36 +0100)]
bitmap_scnlistprintf() should always zero-terminate its output buffer

... as long as it has non-zero size. So far this would not happen if
the passed in CPU mask was empty.

Also fix the comment describing the return value to actually match
reality.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
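
The empty-mask case can be illustrated with a small stand-alone formatter. This is a sketch of the idea only and does not reproduce `bitmap_scnlistprintf()`: `ex_format_bit_list()` is a hypothetical helper that works on a single `unsigned long` and assumes `nbits` does not exceed the width of that type.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the set bits among the low `nbits` bits of
 * `bits` as a list with ranges, e.g. "0-2,5". The buffer is terminated
 * before the loop, so an empty mask still yields a valid empty string.
 * Returns the number of characters stored, excluding the terminator.
 */
static int ex_format_bit_list(char *buf, size_t size,
                              unsigned long bits, unsigned int nbits)
{
    size_t len = 0;
    unsigned int i = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    while (i < nbits) {
        unsigned int start, end;
        int n;

        if (!(bits & (1UL << i))) {
            i++;
            continue;
        }
        start = i;
        while (i + 1 < nbits && (bits & (1UL << (i + 1))))
            i++;                    /* extend the run of consecutive bits */
        end = i++;

        if (start == end)
            n = snprintf(buf + len, size - len, "%s%u",
                         len ? "," : "", start);
        else
            n = snprintf(buf + len, size - len, "%s%u-%u",
                         len ? "," : "", start, end);
        if (n < 0) {
            buf[len] = '\0';        /* keep the string valid on error */
            break;
        }
        if ((size_t)n >= size - len) {
            len = size - 1;         /* truncated; snprintf terminated it */
            break;
        }
        len += (size_t)n;
    }
    return (int)len;
}
```

Terminating the buffer up front, rather than relying on the first formatted element to do so, is what makes the empty-mask path safe.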
13 years ago docs: Fix 'make docs'
Keir Fraser [Tue, 6 Sep 2011 14:49:40 +0000 (15:49 +0100)]
docs: Fix 'make docs'

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago mem_event: use mem_event_mark_and_pause() in mem_event_check_ring()
Olaf Hering [Mon, 5 Sep 2011 14:10:28 +0000 (15:10 +0100)]
mem_event: use mem_event_mark_and_pause() in mem_event_check_ring()

Signed-off-by: Olaf Hering <olaf@aepfle.de>
13 years ago mem_event: add ref counting for free request slots
Olaf Hering [Mon, 5 Sep 2011 14:10:09 +0000 (15:10 +0100)]
mem_event: add ref counting for free request slots

If mem_event_check_ring() is called by many vcpus at the same time
before any of them has also called mem_event_put_request(), all of the
callers must assume there are enough free slots available in the ring.

Record the number of request producers in mem_event_check_ring() to
keep track of available free slots.

Add a new mem_event_put_req_producers() function to release a request
attempt made in mem_event_check_ring(). It's required for
p2m_mem_paging_populate() because that function can only modify the
p2m type if there are free request slots. But in some cases
p2m_mem_paging_populate() does not actually have to produce another
request when it is known that the same request was already made
earlier by a different vcpu.

mem_event_check_ring() cannot return a reference to a free request
slot because there could be multiple references for different vcpus
and the order of mem_event_put_request() calls is not known. As a
result, incomplete requests could be consumed by the ring user.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
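
The producer accounting described in this commit can be sketched as a reservation counter next to the free-slot count. This is a single-threaded illustration under assumed names (`struct ex_ring`, `ex_ring_check()`, `ex_ring_put_producer()`, `ex_ring_put_request()`); the real functions operate on the shared ring under a lock.

```c
struct ex_ring {
    unsigned int free_slots;   /* request slots not yet filled */
    unsigned int producers;    /* callers that reserved a slot but have not
                                  produced a request yet */
};

/* Reserve a slot. Returns 0 on success, -1 if every free slot is claimed. */
static int ex_ring_check(struct ex_ring *r)
{
    if (r->producers >= r->free_slots)
        return -1;
    r->producers++;
    return 0;
}

/* Give a reservation back without producing a request. */
static void ex_ring_put_producer(struct ex_ring *r)
{
    r->producers--;
}

/* Produce the request: the reservation turns into a used slot. */
static void ex_ring_put_request(struct ex_ring *r)
{
    r->free_slots--;
    r->producers--;
}
```

Counting reservations rather than handing out slot references matches the reasoning in the message: the order in which reserving vcpus eventually put their requests is not known.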
13 years ago IRQ: Introduce old_vector to irq_cfg
Andrew Cooper [Mon, 5 Sep 2011 14:09:24 +0000 (15:09 +0100)]
IRQ: Introduce old_vector to irq_cfg

Introduce old_vector to irq_cfg with the same principle as
old_cpu_mask.  This removes a brute force loop from
__clear_irq_vector(), and paves the way to correct bitrotten logic
elsewhere in the irq code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

13 years ago IRQ: Fold irq_status into irq_cfg
Andrew Cooper [Mon, 5 Sep 2011 14:08:38 +0000 (15:08 +0100)]
IRQ: Fold irq_status into irq_cfg

irq_status is an int for each of nr_irqs which represents a single
boolean variable.  Fold it into the bitfield in irq_cfg, which saves
768 bytes per CPU with per-cpu IDTs in use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

13 years ago IRQ: Remove bit-rotten code
Andrew Cooper [Mon, 5 Sep 2011 14:02:11 +0000 (15:02 +0100)]
IRQ: Remove bit-rotten code

irq_desc.depth is a write only variable.
LEGACY_IRQ_FROM_VECTOR(vec) is never referenced.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago xen, vtd: Fix device check for devices behind PCIe-to-PCI bridges
George Dunlap [Mon, 5 Sep 2011 14:00:46 +0000 (15:00 +0100)]
xen, vtd: Fix device check for devices behind PCIe-to-PCI bridges

On some systems, requests from devices behind a PCIe-to-PCI bridge all
appear to the IOMMU as though they come from slot 0, function 0
on that device; so the mapping code must punch a hole for X:0.0 in the
IOMMU for such devices.  When punching the hole, if that device has
already been mapped once, we simply need to check ownership to make
sure it's legal.  To do so, domain_context_mapping_one() will look up
the device for the mapping with pci_get_pdev() and look for the owner.

However, if there is no device in X:0.0, this look up will fail.

Rather than returning -ENODEV in this situation (causing a failure in
mapping the device), try to get the domain ownership from the iommu
context mapping itself.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago xen: Add global irq_vector_map option, set if using AMD global intremap tables
George Dunlap [Mon, 5 Sep 2011 14:00:15 +0000 (15:00 +0100)]
xen: Add global irq_vector_map option, set if using AMD global intremap tables

As mentioned in previous changesets, AMD IOMMU interrupt
remapping tables only look at the vector, not the destination
id of an interrupt.  This means that all IRQs going through
the same interrupt remapping table need to *not* share vectors.

The irq "vector map" functionality was originally introduced
after a patch which disabled global AMD IOMMUs entirely.  That
patch has since been reverted, meaning that AMD intremap tables
can either be per-device or global.

This patch therefore introduces a global irq vector map option,
and enables it if we're using an AMD IOMMU with a global
interrupt remapping table.

This patch removes the "irq-perdev-vector-map" boolean
command-line option and replaces it with "irq_vector_map",
which can have one of three values: none, global, or per-device.

Setting the irq_vector_map to any value will override the
default that the AMD code sets.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years ago ns16550: Simplify UART and UART-interrupt probing logic.
Keir Fraser [Fri, 2 Sep 2011 13:56:26 +0000 (14:56 +0100)]
ns16550: Simplify UART and UART-interrupt probing logic.

1. No need to check for UART existence in the polling routine. We
already check for UART existence during boot-time initialisation (see
check_existence() function).

2. No obvious need to send a dummy character. The poll routine will
run until a character is eventually sent, but for the most common use
of serial ports (console logging) that will happen almost immediately.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago xen/x86: only support >128 CPUs on x86_64
Ian Campbell [Thu, 1 Sep 2011 16:46:43 +0000 (17:46 +0100)]
xen/x86: only support >128 CPUs on x86_64

32 bit cannot cope with 256 cpus and hits:

    /* At least half the ioremap space should be available to us. */
    BUILD_BUG_ON(IOREMAP_VIRT_START + (IOREMAP_MBYTES << 19) >=
    FIXADDR_START);

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
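
For readers unfamiliar with the check quoted above, a compile-time assertion of this kind can be sketched as follows. The macro is a common idiom and not necessarily Xen's definition, and the `EX_` constants are hypothetical values chosen only so that the example compiles.

```c
/*
 * Common compile-time assertion idiom: when cond is true the array size
 * is negative and compilation fails; when false the expression is a no-op.
 */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical layout constants, for illustration only. */
#define EX_IOREMAP_VIRT_START 0x10000000UL
#define EX_IOREMAP_MBYTES     16UL
#define EX_FIXADDR_START      0x20000000UL

/* Returns the size in bytes of half the ioremap space. */
static unsigned long ex_half_ioremap_bytes(void)
{
    /* (MBYTES << 19) is half of (MBYTES << 20), i.e. half the space. */
    BUILD_BUG_ON(EX_IOREMAP_VIRT_START + (EX_IOREMAP_MBYTES << 19) >=
                 EX_FIXADDR_START);
    return EX_IOREMAP_MBYTES << 19;
}
```

If the constants were changed so that the condition held, the build would stop at this line instead of producing a hypervisor with an overlapping address layout.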
13 years ago x86/mm: use defines for page sizes rather than hardcoding them.
Tim Deegan [Thu, 1 Sep 2011 08:39:25 +0000 (09:39 +0100)]
x86/mm: use defines for page sizes rather than hardcoding them.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago xen: get_free_pirq: make sure that the returned pirq is allocated
Stefano Stabellini [Wed, 31 Aug 2011 14:23:49 +0000 (15:23 +0100)]
xen: get_free_pirq: make sure that the returned pirq is allocated

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs
Stefano Stabellini [Wed, 31 Aug 2011 14:23:34 +0000 (15:23 +0100)]
xen: __hvm_pci_intx_assert should check for gsis remapped onto pirqs

If the isa irq corresponding to a particular gsi is disabled while the
gsi is enabled, __hvm_pci_intx_assert will always inject the gsi
through the vioapic, even if the gsi has been remapped onto a pirq.
This patch makes sure that even in this case we inject the
notification appropriately.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago xen: fix hvm_domain_use_pirq's behavior
Stefano Stabellini [Wed, 31 Aug 2011 14:23:12 +0000 (15:23 +0100)]
xen: fix hvm_domain_use_pirq's behavior

hvm_domain_use_pirq should return true when the guest is using a
certain pirq, no matter if the corresponding event channel is
currently enabled or disabled.  As an additional complication, qemu is
going to request pirqs for passthrough devices even for Xen unaware
HVM guests, so we need to wait for an event channel to be connected
before considering the pirq of a passthrough device as "in use".

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
13 years ago IRQ: manually EOI migrating line interrupts
Andrew Cooper [Wed, 31 Aug 2011 14:19:24 +0000 (15:19 +0100)]
IRQ: manually EOI migrating line interrupts

When migrating IO-APIC line level interrupts between PCPUs, the
migration code rewrites the IO-APIC entry to point to the new
CPU/Vector before EOI'ing it.

The EOI process says that EOI'ing the Local APIC will cause a
broadcast with the vector number, which the IO-APIC must listen for to
clear the IRR and Status bits.

In the case of migrating, the IO-APIC has already been
reprogrammed so the EOI broadcast with the old vector fails to match
the new vector, leaving the IO-APIC with an outstanding vector,
preventing any more use of that line interrupt.  This causes a lockup
especially when your root device is using PCI INTA (megaraid_sas
driver *ahem*).

However, the problem is mostly hidden because send_cleanup_vector()
causes a cleanup of all moving vectors on the current PCPU in such a
way which does not cause the problem, and if the problem has occurred,
the writes it makes to the IO-APIC clear the IRR and Status bits,
which resolves the lockup.

This fix is distinctly a temporary hack, waiting on a cleanup of the
irq code.  It checks for the edge case where we have moved the irq,
and manually EOI's the old vector with the IO-APIC which correctly
clears the IRR and Status bits.  Also, it protects the code which
updates irq_cfg by disabling interrupts.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
13 years ago x86: add irq count for IPIs
Kevin Tian [Wed, 31 Aug 2011 14:18:23 +0000 (15:18 +0100)]
x86: add irq count for IPIs

Such a count is useful to assist decision making in the cpuidle governor.
Without this patch, only device interrupts going through do_IRQ are
counted.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
13 years ago vpmu: Add processors Westmere E7-8837 and SandyBridge i5-2500 to the vpmu list
Dietmar Hahn [Wed, 31 Aug 2011 14:17:45 +0000 (15:17 +0100)]
vpmu: Add processors Westmere E7-8837 and SandyBridge i5-2500 to the vpmu list

Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
13 years ago x86: Increase the default NR_CPUS to 256
Laszlo Ersek [Wed, 31 Aug 2011 14:16:14 +0000 (15:16 +0100)]
x86: Increase the default NR_CPUS to 256

Changeset 21012:ef845a385014 bumped the default to 128 about one and a
half years ago. Increase it now to 256, as systems with e.g. 160
logical CPUs are becoming (have become) common.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
13 years ago nestedsvm: VMRUN doesn't use nextrip
Christoph Egger [Wed, 31 Aug 2011 14:15:41 +0000 (15:15 +0100)]
nestedsvm: VMRUN doesn't use nextrip

VMRUN does not use nextrip. So remove pointless assignment.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
13 years ago x86-64: Fix off-by-one error in __addr_ok() macro
Keir Fraser [Wed, 31 Aug 2011 14:14:49 +0000 (15:14 +0100)]
x86-64: Fix off-by-one error in __addr_ok() macro

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago Merge
Ian Jackson [Tue, 30 Aug 2011 10:46:58 +0000 (11:46 +0100)]
Merge

13 years ago Config.mk: Include optional .config file *first* rather than *last*
Keir Fraser [Sat, 27 Aug 2011 11:20:19 +0000 (12:20 +0100)]
Config.mk: Include optional .config file *first* rather than *last*

Allows the core of Config.mk to correctly respond to any configuration
overrides specified in the .config file.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago x86: drop unused parameter from msi_compose_msg() and setup_msi_irq()
Jan Beulich [Sat, 27 Aug 2011 11:15:07 +0000 (12:15 +0100)]
x86: drop unused parameter from msi_compose_msg() and setup_msi_irq()

This particularly eliminates the bogus passing of NULL by hpet.c.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years ago x86: work around certain Intel BIOSes causing (transient) hangs during boot
Jan Beulich [Sat, 27 Aug 2011 11:14:38 +0000 (12:14 +0100)]
x86: work around certain Intel BIOSes causing (transient) hangs during boot

They apparently leave the USB legacy emulation bits set in ICH10's
SMI Control and Enable register, but fail to handle the resulting SMIs
gracefully. The hangs can apparently extend indefinitely, but are
commonly observed to last between a few seconds and a minute.

This assumes that only ICH10-based systems on Intel main boards with
Intel BIOS may be affected. Until Intel comes up with a more precise
identification of affected BIOSes, all Intel ones on Intel boards
will get this workaround applied.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years ago x86-64: allow mapping mmcfg space for high numbered PCI segments
Jan Beulich [Sat, 27 Aug 2011 11:13:39 +0000 (12:13 +0100)]
x86-64: allow mapping mmcfg space for high numbered PCI segments

Rather than using the segment number directly when determining the
virtual address for a particular mmconfig block, use the array index
instead. Thus a system with (perhaps significantly) less than 2048 PCI
segments, but with some having numbers beyond 2047 can actually have
all its mmconfig blocks mapped.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
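
The change from segment number to array index can be sketched like this. The names and the base address are hypothetical; the only assumption taken from the PCI Express configuration layout is that one segment's mmconfig block covers 256 buses of 1 MiB each.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MMCFG_VIRT_START 0x100000000ULL  /* hypothetical window base */
#define EX_MMCFG_BLOCK_SIZE (256ULL << 20)  /* 256 buses x 1 MiB per segment */

struct ex_mmcfg_entry {
    uint16_t segment;                       /* PCI segment number */
};

/*
 * Return the virtual address of the block for `segment`, derived from the
 * entry's position in the table rather than from the segment number itself.
 * Returns 0 if the segment is not listed.
 */
static uint64_t ex_mmcfg_virt(const struct ex_mmcfg_entry *tbl, size_t n,
                              uint16_t segment)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].segment == segment)
            return EX_MMCFG_VIRT_START + i * EX_MMCFG_BLOCK_SIZE;
    return 0;
}
```

A table holding segments 0 and 5000 then needs only two blocks of virtual address space, whereas indexing by segment number would place the second block 5000 blocks into the window.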
13 years ago Add missing 'break' statement.
Kaushik Kumar Ram [Fri, 26 Aug 2011 13:58:41 +0000 (14:58 +0100)]
Add missing 'break' statement.

Without the 'break', assigning a pci device to a PV guest results in an abort,
since the code always falls through to the default abort case in the switch
statement.

Signed-off-by: Kaushik Kumar Ram <kaushik@rice.edu>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years ago Update my email address in MAINTAINERS
Tim Deegan [Fri, 26 Aug 2011 12:06:39 +0000 (13:06 +0100)]
Update my email address in MAINTAINERS

Signed-off-by: Tim Deegan <tim@xen.org>
13 years ago x86/mm/p2m: use defines for page sizes
Christoph Egger [Fri, 26 Aug 2011 12:00:52 +0000 (13:00 +0100)]
x86/mm/p2m: use defines for page sizes

Use defines for page sizes instead of hardcoding the value.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago passthrough: Turn on IOMMU/HAP pagetable sharing by default.
Tim Deegan [Thu, 25 Aug 2011 11:03:14 +0000 (12:03 +0100)]
passthrough: Turn on IOMMU/HAP pagetable sharing by default.

Signed-off-by: Tim Deegan <tim@xen.org>
13 years ago x86: don't limit dom0's maximum reservation by the available memory
David Vrabel [Wed, 24 Aug 2011 08:33:10 +0000 (09:33 +0100)]
x86: don't limit dom0's maximum reservation by the available memory

Set dom0's initial maximum reservation using the max value supplied in
the dom0_mem command line option without limiting it by the available
memory.

This allows dom0 to make use of any hotplugged memory without having
to also adjust the maximum reservation.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
13 years ago Passthrough: fix iommu_use_hap_pt() to use hap_enabled()
Tim Deegan [Tue, 23 Aug 2011 09:54:27 +0000 (10:54 +0100)]
Passthrough: fix iommu_use_hap_pt() to use hap_enabled()

In line with 22924:86000076dcee, paging_mode_hap(d) shouldn't be
used in HAP internals that are called during HAP setup.

Signed-off-by: Tim Deegan <tim@xen.org>
13 years ago IOMMU: only try to share IOMMU and HAP tables for domains with P2M.
Tim Deegan [Tue, 23 Aug 2011 09:43:25 +0000 (10:43 +0100)]
IOMMU: only try to share IOMMU and HAP tables for domains with P2M.
This makes the check more precise, and brings VTd in line with AMD code.

Signed-off-by: Tim Deegan <tim@xen.org>
13 years ago VT-d: Explicitly test EPT capabilities during IOMMU init
Tim Deegan [Tue, 23 Aug 2011 09:43:20 +0000 (10:43 +0100)]
VT-d: Explicitly test EPT capabilities during IOMMU init
because the cached version isn't set up until the EPT init happens.

Signed-off-by: Tim Deegan <tim@xen.org>
13 years ago x86: Fix up irq vector map logic
George Dunlap [Mon, 22 Aug 2011 15:15:33 +0000 (16:15 +0100)]
x86: Fix up irq vector map logic

We need to make sure that cfg->used_vector is only cleared once;
otherwise there may be a race condition that allows the same vector to
be assigned twice, defeating the whole purpose of the map.

This makes two changes:
* __clear_irq_vector() only clears the vector if the irq is not being
moved
* smp_irq_move_cleanup_interrupt() only clears used_vector if this
is the last place it's being used (move_cleanup_count==0 after
decrement).

Also make use of asserts more consistent, to catch this kind of logic
bug in the future.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
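
The two rules listed in this commit can be modelled with a heavily simplified state structure. This is a sketch under stated assumptions, not the Xen code: `struct ex_irq_cfg` and both functions are hypothetical, and a single flag stands in for the vector's entry in the used_vector map.

```c
#include <stdbool.h>

struct ex_irq_cfg {
    bool move_in_progress;            /* irq is being moved to another CPU */
    unsigned int move_cleanup_count;  /* CPUs that still have to clean up */
    bool in_vector_map;               /* stand-in for the used_vector entry */
};

/* Rule 1: do not clear the map entry while the irq is being moved. */
static void ex_clear_irq_vector(struct ex_irq_cfg *cfg)
{
    if (cfg->move_in_progress)
        return;
    cfg->in_vector_map = false;
}

/* Rule 2: only the last cleanup (count reaches 0) clears the map entry. */
static void ex_move_cleanup(struct ex_irq_cfg *cfg)
{
    if (cfg->move_cleanup_count == 0)
        return;
    if (--cfg->move_cleanup_count == 0)
        cfg->in_vector_map = false;
}
```

Under these rules the entry is cleared exactly once per move, which is the property the commit relies on to keep a vector from being assigned twice.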
13 years ago Adjust non-debug ASSERT() definition to avoid unused-variable warnings.
Keir Fraser [Mon, 22 Aug 2011 15:15:19 +0000 (16:15 +0100)]
Adjust non-debug ASSERT() definition to avoid unused-variable warnings.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years ago nested-p2m: suppress np2m flushes during p2m setup
Christoph Egger [Mon, 22 Aug 2011 13:37:29 +0000 (14:37 +0100)]
nested-p2m: suppress np2m flushes during p2m setup

There is no need to send IPIs within p2m_alloc_table() via
set_p2m_entry().

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
13 years ago ACPI: add _PDC input override mechanism
Jan Beulich [Mon, 22 Aug 2011 09:12:36 +0000 (10:12 +0100)]
ACPI: add _PDC input override mechanism

In order to have Dom0 call _PDC with input fully representing Xen's
capabilities, and in order to avoid building knowledge of Xen
implementation details into Dom0, this provides a mechanism by which
the Dom0 kernel can, once it filled the _PDC input buffer according to
its own knowledge, present the buffer to Xen to apply overrides for
the parts of the C-, P-, and T-state management that it controls. This
is particularly to address the dependency of Xen using MWAIT to enter
certain C-states on the availability of the break-on-interrupt
extension (which the Dom0 kernel should have no need to know about).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years ago x86/IO-APIC: clear remoteIRR in clear_IO_APIC_pin()
Jan Beulich [Mon, 22 Aug 2011 09:11:10 +0000 (10:11 +0100)]
x86/IO-APIC: clear remoteIRR in clear_IO_APIC_pin()

It was found that in a crash scenario, the remoteIRR bit in an IO-APIC
RTE could be left set, causing problems when bringing up a kdump
kernel. While this generally is most important to be taken care of in
the new kernel (which usually would be a native one), it still seems
desirable to also address this problem in Xen so that (a) the problem
doesn't bite Xen when used as a secondary emergency kernel and (b) an
attempt is being made to save un-fixed secondary kernels from running
into said problem.

Based on a Linux patch from suresh.b.siddha@intel.com.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agopm: don't truncate processors' ACPI IDs to 8 bits
Jan Beulich [Mon, 22 Aug 2011 09:10:39 +0000 (10:10 +0100)]
pm: don't truncate processors' ACPI IDs to 8 bits

This is just another adjustment to allow systems with very many CPUs
(or unusual ACPI IDs) to be properly power-managed.
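
As a rough illustration of why an 8-bit field is a problem here, the sketch
below shows two distinct ACPI processor IDs becoming indistinguishable once
truncated. The helper names are hypothetical and are not taken from the
patch.

```c
#include <stdint.h>

/* Hypothetical helper: the value left over when an ACPI processor ID
 * is squeezed into an 8-bit field. */
static uint32_t acpi_id_truncated(uint32_t acpi_id)
{
    return (uint8_t)acpi_id;
}

/* Two different IDs collide after truncation when they differ only
 * in the bits above bit 7. */
static int acpi_ids_collide_when_truncated(uint32_t a, uint32_t b)
{
    return a != b && acpi_id_truncated(a) == acpi_id_truncated(b);
}
```

For example, IDs 0x005 and 0x105 both truncate to 0x05, so per-processor
power management data keyed on the truncated value could not tell them apart.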

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agoAMD IOMMU: remove iommu tlb flush for non-present entries
Wei Wang [Mon, 22 Aug 2011 09:10:04 +0000 (10:10 +0100)]
AMD IOMMU: remove iommu tlb flush for non-present entries

Fixes dom0 boot on some systems.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
13 years agox86: use 'dom0_mem' to limit the number of pages for dom0
David Vrabel [Mon, 22 Aug 2011 09:05:27 +0000 (10:05 +0100)]
x86: use 'dom0_mem' to limit the number of pages for dom0

Use the 'dom0_mem' command line option to set the maximum number of
pages for dom0.  dom0 can then use the XENMEM_maximum_reservation
memory op to automatically find this limit and reduce the size of any
page tables etc.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
13 years agonestedhvm: avoid endless loop of nested page faults
Tim Deegan [Fri, 19 Aug 2011 12:29:27 +0000 (13:29 +0100)]
nestedhvm: avoid endless loop of nested page faults

Stop sending IPIs to flush the nested-on-nested pagetable
after write operations. Instead flush the TLB only.
This fixes an endless loop of nested page faults after
adding an entry to the nested-on-nested pagetable.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
13 years agonestedhvm: do not send IPIs twice
Tim Deegan [Fri, 19 Aug 2011 12:29:25 +0000 (13:29 +0100)]
nestedhvm: do not send IPIs twice

In p2m_get_nestedp2m() there is no need to send IPIs via
nestedhvm_vmcx_flushtlb() since p2m_flush_table() already
did that.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Tim Deegan <tim@xen.org>
13 years agox86/KEXEC: disable hpet legacy broadcasts earlier
Andrew Cooper [Fri, 19 Aug 2011 08:58:22 +0000 (09:58 +0100)]
x86/KEXEC: disable hpet legacy broadcasts earlier

On x2apic machines which booted in xapic mode,
hpet_disable_legacy_broadcast() sends an event check IPI to all online
processors.  This leads to a protection fault as the genapic blindly
pokes x2apic MSRs while the local apic is in xapic mode.

One option is to change genapic when we shut down the local apic, but
there are still problems with trying to IPI processors in the online
processor map which are actually sitting in NMI loops.

Another option is to have each CPU take itself out of the online CPU
map during the NMI shootdown.

Realistically however, disabling hpet legacy broadcasts earlier in the
kexec path is the easiest fix to the problem.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
13 years agomini-os: work around ld bug causing stupid CTOR count
Jeremy Fitzhardinge [Fri, 19 Aug 2011 08:57:42 +0000 (09:57 +0100)]
mini-os: work around ld bug causing stupid CTOR count

I'm seeing pvgrub crashing when running CTORs.  It appears it's because
the magic in the linker script is generating junk.  If I get ld to
output a map, I see:

.ctors          0x0000000000097000       0x18
                0x0000000000097000                __CTOR_LIST__ = .
                0x0000000000097000        0x4 LONG 0x25c04
                (((__CTOR_END__ - __CTOR_LIST__) / 0x4) - 0x2)
 *(.ctors)
 .ctors         0x0000000000097004       0x10
                /home/jeremy/hg/xen/unstable/stubdom/mini-os-x86_32-grub/mini-os.o
                0x0000000000097014        0x4 LONG 0x0
                0x0000000000097018                __CTOR_END__ = .

In other words, somehow ((0x97018-0x97000) / 4) - 2 = 0x25c04

The specific crash is that the ctor loop tries to call the NULL
sentinel.  I'm seeing the same with the DTOR list.

Avoid this by terminating the loop with the NULL sentinel, and get rid
of the CTOR count entirely.
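
A minimal sketch of the two ideas, not the actual mini-os code: ctor_count()
evaluates the expression from the map output with the addresses shown there,
and run_ctors() walks a table until it reaches the NULL sentinel, so no count
word is needed. All names below (ctor_fn, ctor_count, run_ctors, demo_ctor)
are made up for illustration.

```c
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* The linker-script expression: number of entries between the two
 * symbols, minus the leading count word and the trailing sentinel. */
static unsigned long ctor_count(unsigned long list, unsigned long end,
                                unsigned long entry_size)
{
    return (end - list) / entry_size - 2;
}

/* Call every entry up to (not including) the NULL sentinel and
 * return how many were called. */
static size_t run_ctors(const ctor_fn *table)
{
    size_t n = 0;

    for (; *table != NULL; table++, n++)
        (*table)();
    return n;
}

/* Illustration-only constructor that records how often it ran. */
static int demo_calls;
static void demo_ctor(void)
{
    demo_calls++;
}
```

With the addresses from the map, ctor_count(0x97000, 0x97018, 4) evaluates to
4, which matches the 0x10 bytes of .ctors input and is nowhere near 0x25c04.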

From: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Keir Fraser <keir@xen.org>
13 years agox86-64/EFI: construct EDD data from device path protocol information
Jan Beulich [Fri, 19 Aug 2011 08:55:20 +0000 (09:55 +0100)]
x86-64/EFI: construct EDD data from device path protocol information

In the absence of a BIOS to handle INT13 requests, this information
must be constructed artificially instead when booted from EFI.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agox86: trampoline cleanup
Jan Beulich [Fri, 19 Aug 2011 08:54:53 +0000 (09:54 +0100)]
x86: trampoline cleanup

To make future changes less error prone, and to slightly simplify a
possible future conversion to a relocatable trampoline even for the
multiboot path (pretty desirable given that we had to change the
trampoline base a number of times to escape collisions with firmware
placed data),
- remove final uses of bootsym_phys() from trampoline.S, allowing the
  symbol to be undefined before including this file (to make sure no
  new references get added)
- replace two easy to deal with uses of bootsym_phys() in head.S
- remove an easy to replace reference to BOOT_TRAMPOLINE

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agox86: make run-time part of trampoline relocatable
Jan Beulich [Fri, 19 Aug 2011 08:54:26 +0000 (09:54 +0100)]
x86: make run-time part of trampoline relocatable

In order to eliminate an initial hack in the EFI boot code (where
memory for the trampoline was just "claimed" instead of properly
allocated), the trampoline code must no longer make assumptions about the
address at which it would be located. For the time being, the fixed
address is being retained for the traditional multiboot path.

As an additional benefit (at least from my pov) it allows confining
the visibility of the BOOT_TRAMPOLINE definition to just the boot
code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agox86: simplify (and fix) clear_IO_APIC{,_pin}()
Jan Beulich [Tue, 16 Aug 2011 14:05:55 +0000 (15:05 +0100)]
x86: simplify (and fix) clear_IO_APIC{,_pin}()

These are used during bootup and (emergency) shutdown only, and their
only purpose is to get the actual IO-APIC's RTE(s) cleared.
Consequently, only the "raw" accessors should be used (and the ones
going through interrupt remapping code can be skipped), with the
exception of determining the delivery mode: This one must always go
through the interrupt remapping path, as in the VT-d case the actual
IO-APIC's RTE will have the delivery mode always set to zero (which
before possibly could have resulted in such an entry getting cleared
in the "raw" pass, though I haven't observed this case in practice).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agopassthrough: don't use open coded IO-APIC accesses
Jan Beulich [Tue, 16 Aug 2011 14:05:30 +0000 (15:05 +0100)]
passthrough: don't use open coded IO-APIC accesses

This makes the respective functions quite a bit more legible.

Since this requires fiddling with __ioapic_{read,write}_entry() anyway,
make them and their wrappers have their argument types match those of
__io_apic_{read,write}() (int -> unsigned int).

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agox86-64/mmcfg: relax base address restriction
Jan Beulich [Tue, 16 Aug 2011 14:05:03 +0000 (15:05 +0100)]
x86-64/mmcfg: relax base address restriction

Following what Linux did quite a while ago, don't generally disallow
MMCFG base addresses to live above the 4GB boundary: New systems are
assumed to be fine, and SGI ones are, too.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agoRevert 23733:fbf3768e5934 "AMD IOMMU: remove global ..."
Ian Jackson [Tue, 16 Aug 2011 14:04:19 +0000 (15:04 +0100)]
Revert 23733:fbf3768e5934 "AMD IOMMU: remove global ..."

23733:fbf3768e5934 causes xen-unstable not to boot on several of the
xen.org AMD test systems.  We get an endless series of these:

  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x00a0, fault
  address = 0xfdf8f10144

I have constructed the attached patch which reverts c/s 23733
(adjusted for conflicts due to subsequent patches).  With this
reversion Xen once more boots on these machines.

23733 has been in the tree for some time now, causing this breakage,
and has already been fingered by the automatic bisector and discussed
on xen-devel as the cause of boot failures.  I think it is now time to
revert it pending a correct fix to the original problem.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years agoamd iommu: Automatic page coalescing
Wei Wang [Tue, 16 Aug 2011 14:03:11 +0000 (15:03 +0100)]
amd iommu: Automatic page coalescing

This patch implements automatic page coalescing when a separate IO page
table is used. It uses the ignored bits in an IOMMU PDE to cache how many
entries at the next lower page level are suitable for coalescing, and then
builds a superpage entry when all lower entries are contiguous.  This
patch has been tested for weeks, mainly with graphics devices and
3DMark Vantage.
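
A rough sketch of the contiguity condition only. It does not model the
counts cached in the PDE's ignored bits, and the names (PTES_PER_TABLE,
can_coalesce) and the 512-entry table size are assumptions for illustration,
not taken from the patch.

```c
#include <stdint.h>

#define PTES_PER_TABLE 512u /* assumed: 4 KiB table of 8-byte entries */

/* Return 1 if all entries of one lowest-level table map a naturally
 * aligned, physically contiguous run of frames, so that a single
 * superpage entry could replace them; otherwise return 0. */
static int can_coalesce(const uint64_t *mfn)
{
    if (mfn[0] % PTES_PER_TABLE != 0)
        return 0;
    for (unsigned int i = 1; i < PTES_PER_TABLE; i++)
        if (mfn[i] != mfn[0] + i)
            return 0;
    return 1;
}
```

Caching a running count, as the commit message describes, would avoid
rescanning the whole table on every update; the full scan here is only the
simplest way to state the condition.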

Signed-off-by: Wei Wang <wei.wang2@amd.com>
13 years agox86/PCI-MSI: properly determine VF BAR values
Jan Beulich [Sat, 13 Aug 2011 09:14:58 +0000 (10:14 +0100)]
x86/PCI-MSI: properly determine VF BAR values

As was discussed a couple of times on this list, SR-IOV virtual
functions have their BARs read as zero - the physical function's
SR-IOV capability structure must be consulted instead. The bogus
warnings people complained about are being eliminated with this
change.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agox86: IRQ fix incorrect logic in __clear_irq_vector
Andrew Cooper [Sat, 13 Aug 2011 09:14:28 +0000 (10:14 +0100)]
x86: IRQ fix incorrect logic in __clear_irq_vector

In the old code, tmp_mask is the cpu_and of cfg->cpu_mask and
cpu_online_map.  However, in the usual case of moving an IRQ from one
PCPU to another because the scheduler decides it's a good idea,
cfg->cpu_mask and cfg->old_cpu_mask do not intersect.  This causes the
old cpu vector_irq table to keep the irq reference when it shouldn't.

This leads to a resource leak if a domain is shut down while an irq has
a move pending, which results in Xen's create_irq() eventually failing
with -ENOSPC when all vector_irq tables are full of stale references.
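
The effect can be sketched with plain bitmasks, assuming a much simplified
model with one vector slot per CPU. The names clear_refs and stale_refs, the
NR_CPUS value, and the table layout are illustrative and do not mirror Xen's
real data structures.

```c
#include <stdint.h>

#define NR_CPUS 4
#define NO_IRQ  (-1)

/* Drop the IRQ reference on every CPU selected by mask. */
static void clear_refs(int vector_irq[NR_CPUS], uint32_t mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & (1u << cpu))
            vector_irq[cpu] = NO_IRQ;
}

/* Count the CPUs whose table still refers to irq. */
static int stale_refs(const int vector_irq[NR_CPUS], int irq)
{
    int n = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (vector_irq[cpu] == irq)
            n++;
    return n;
}
```

If an IRQ is moving from CPU 0 (old mask) to CPU 2 (current mask), clearing
with the current mask ANDed with the online map leaves the entry on CPU 0
behind; clearing with the old mask ANDed with the online map removes it.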

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
13 years agox86/amd: Add support for read-only APERF/MPERF
Mark Langsdorf [Sat, 13 Aug 2011 09:13:38 +0000 (10:13 +0100)]
x86/amd: Add support for read-only APERF/MPERF

AMD is adding support for a read-only mode of the APERF
and MPERF MSRs. When this mode is enabled, writes to
these registers are ignored and do not reset the registers.
This allows multiple well-behaved programs to share the
use of the registers even if a poorly behaved program
attempts to reset them. Support for this feature is
indicated by a CPUID bit.

AMD has been recommending that well-behaved software
avoid resetting the APERF and MPERF MSRs. Enabling
this feature should not change the behavior of well-
behaved software. This change has been tested with the
turbostat and cpufreq-aperf applications.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
13 years agoVT-d: don't reject valid DMAR/ATSR tables on systems with multiple PCI segments
Jan Beulich [Sat, 13 Aug 2011 09:12:49 +0000 (10:12 +0100)]
VT-d: don't reject valid DMAR/ATSR tables on systems with multiple PCI segments

On multi-PCI-segment systems, each segment has to be expected to have
an include-all DRHD and an all-ports ATSR, so the firmware consistency
check incorrectly rejects valid configurations there (which is
particularly problematic when the firmware also pre-enabled x2apic
mode, as the system will panic in that case due to being unable to
enable interrupt remapping). Thus constrain the check to just segment
0 for now; once full multi-segment support is there (which I'm working
on), it can be revisited whether we'd want to track this per segment,
or whether we trust the firmware of such large systems.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
13 years agoPassthrough: disable bus-mastering on any card that causes an IOMMU fault.
Tim Deegan [Fri, 12 Aug 2011 10:29:24 +0000 (11:29 +0100)]
Passthrough: disable bus-mastering on any card that causes an IOMMU fault.

This stops the card from raising back-to-back faults and live-locking
the CPU that handles them.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Wei Wang2 <wei.wang2@amd.com>
Acked-by: Allen M Kay <allen.m.kay@intel.com>
13 years agohvmloader: relicense hvmloader xenbus implementation under more
Ian Campbell [Wed, 10 Aug 2011 13:43:34 +0000 (14:43 +0100)]
hvmloader: relicense hvmloader xenbus implementation under more
liberal terms.

This code is a great example of a simple xenbus implementation and we
would like to reuse it in projects with non-GPLv2 license
(specifically in this case SeaBIOS which is GPLv3).

I picked the license from extras/mini-os/COPYING (a two-clause BSD
style license) since mini-os exists for much the same purpose.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
13 years agoACPI ERST: Revert change to erst_check_table() to be more permissive.
Keir Fraser [Tue, 9 Aug 2011 17:06:43 +0000 (18:06 +0100)]
ACPI ERST: Revert change to erst_check_table() to be more permissive.

The more permissive check accepts tables that Xen apparently cannot
handle, which causes boot failures on many systems.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years agoRevert 23757:f5176c177b99 "xenstored: allow guests to reintroduce themselves"
Ian Jackson [Tue, 9 Aug 2011 16:48:16 +0000 (17:48 +0100)]
Revert 23757:f5176c177b99 "xenstored: allow guests to reintroduce themselves"

This patch seems to have been applied by mistake, despite adverse
comments on the list and a lack of an appropriate ack.

Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
13 years agohvmloader: Move init_vm86_tss() back into common code.
Keir Fraser [Tue, 9 Aug 2011 10:33:40 +0000 (11:33 +0100)]
hvmloader: Move init_vm86_tss() back into common code.

It is not BIOS specific.

Signed-off-by: Keir Fraser <keir@xen.org>
13 years agoxenstored: allow guests to reintroduce themselves
Olaf Hering [Tue, 9 Aug 2011 07:53:40 +0000 (08:53 +0100)]
xenstored: allow guests to reintroduce themselves

During kexec all old watches have to be removed, otherwise the new
kernel will receive unexpected events. Allow a guest to introduce
itself and clean up all of its watches.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
13 years agohvmloader: Enable SCI if QEMU has it disabled.
Keir Fraser [Thu, 28 Jul 2011 14:40:54 +0000 (15:40 +0100)]
hvmloader: Enable SCI if QEMU has it disabled.

When booting a Windows guest, the OS reports an issue with ACPI (in
a BSOD). The exact issue is "SCI_EN never becomes set in PM1 Control
Register." (quoted from WinDbg help).

So this patch sets the SCI_EN flag if it is not yet set.

Reported-by: Tobias Geiger <tobias.geiger@vido.info>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
13 years agox86/mm: Handle 1GiB superpages in the pagetable walker.
Tim Deegan [Thu, 28 Jul 2011 12:45:09 +0000 (13:45 +0100)]
x86/mm: Handle 1GiB superpages in the pagetable walker.

This allows HAP guests to use 1GiB superpages.  Shadow and PV guests
still can't use them without more support in shadow/* and mm.c.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
13 years agoxen: AMD IOMMU: Automatically enable per-device vector maps
George Dunlap [Tue, 26 Jul 2011 17:37:32 +0000 (18:37 +0100)]
xen: AMD IOMMU: Automatically enable per-device vector maps

Automatically enable per-device vector maps when using IOMMU,
unless disabled specifically by an IOMMU parameter.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years agoxen: Option to allow per-device vector maps for MSI IRQs
George Dunlap [Tue, 26 Jul 2011 17:37:16 +0000 (18:37 +0100)]
xen: Option to allow per-device vector maps for MSI IRQs

Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.

This prevents irqs from the same device from being assigned
the same vector on different pcpus.  This is required for systems
using an AMD IOMMU, since the intremap tables on AMD only look at
vector, and not destination ID.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years agoxen: Infrastructure to allow irqs to share vector maps
George Dunlap [Tue, 26 Jul 2011 17:36:58 +0000 (18:36 +0100)]
xen: Infrastructure to allow irqs to share vector maps

Laying the groundwork for per-device vector maps.  This generic
code allows any irq to point to a vector map; all irqs sharing the
same vector map will avoid sharing vectors.
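
One way to picture a shared vector map is a bitmap of vectors already in
use, consulted by every irq that points at it. This is only a sketch under
that assumption; struct vector_map and vmap_alloc are hypothetical names, not
the identifiers used in the patch.

```c
#include <stdint.h>

#define NR_VECTORS 256

/* One bit per vector; set means some irq sharing this map uses it. */
struct vector_map {
    uint8_t used[NR_VECTORS / 8];
};

/* Pick the lowest free vector in [first, last], mark it used and
 * return it, or return -1 if the whole range is taken. */
static int vmap_alloc(struct vector_map *m, int first, int last)
{
    for (int v = first; v <= last; v++) {
        uint8_t bit = (uint8_t)(1u << (v % 8));

        if (!(m->used[v / 8] & bit)) {
            m->used[v / 8] |= bit;
            return v;
        }
    }
    return -1;
}
```

Two irqs allocating from the same map can never receive the same vector,
regardless of which pcpu each one targets.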

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
13 years agoNested VMX: fix error paths in emulation of VMLAUNCH and VMRESUME.
Tim Deegan [Tue, 26 Jul 2011 16:00:25 +0000 (17:00 +0100)]
Nested VMX: fix error paths in emulation of VMLAUNCH and VMRESUME.

These instructions don't fault on bad VMCS pointers, they set bits in
RFLAGS and continue execution.
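
For reference, the flag convention can be sketched as below, following the
VMfailInvalid and VMfailValid conventions described in the Intel SDM. The
helper and macro names are illustrative and are not Xen's; the VMfailValid
case additionally records an error number in the VMCS, which is not modelled
here.

```c
#include <stdint.h>

#define FLAG_CF 0x0001ull
#define FLAG_PF 0x0004ull
#define FLAG_AF 0x0010ull
#define FLAG_ZF 0x0040ull
#define FLAG_SF 0x0080ull
#define FLAG_OF 0x0800ull
#define ARITH_FLAGS (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

/* VMfailInvalid: no usable current VMCS. CF is set and the other
 * arithmetic flags are cleared. */
static uint64_t vmfail_invalid(uint64_t rflags)
{
    return (rflags & ~ARITH_FLAGS) | FLAG_CF;
}

/* VMfailValid: a current VMCS exists. ZF is set and the other
 * arithmetic flags are cleared. */
static uint64_t vmfail_valid(uint64_t rflags)
{
    return (rflags & ~ARITH_FLAGS) | FLAG_ZF;
}
```

In both cases the guest resumes at the next instruction and is expected to
test the flags itself, rather than receiving an exception.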

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
13 years agoNested VMX: always mark VVMCS as not-launched on VMCLEAR.
Tim Deegan [Tue, 26 Jul 2011 16:00:24 +0000 (17:00 +0100)]
Nested VMX: always mark VVMCS as not-launched on VMCLEAR.

The SDM says to flush changes and clear the launch state even if this
isn't the "current VMCS".  KVM relies on this behaviour, so take the
warning printk away as well.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>