Andrew Cooper [Fri, 14 Mar 2014 08:43:37 +0000 (09:43 +0100)]
common: shuffle use of __attribute__((packed))
This introduces a formal define in compiler.h, and is otherwise a manual
shuffling of __attribute__((packed)) statements to __packed at the head of the
structure.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
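As a rough illustration of the shuffle described in the commit above, the sketch below contrasts a trailing __attribute__((packed)) with a __packed define placed at the head of the structure. The define is only an assumed shape of what compiler.h provides, and both structures and their fields are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the define; the real one lives in Xen's compiler.h. */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Old style: the attribute trails the closing brace. */
struct desc_old {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

/* New style: __packed sits at the head of the structure. */
struct __packed desc_new {
    uint16_t limit;
    uint64_t base;
};
```

Both spellings should give the same layout: 10 bytes, with base at offset 2.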
Andrew Cooper [Fri, 14 Mar 2014 08:42:28 +0000 (09:42 +0100)]
functional cleanup for __attribute__((packed)) changes
This is to separate the functional changes from the noop consistency changes.
* Pack struct cper_mce_record rather than creating a structure named __packed
* Remove unreferenced struct xgt_desc
* Use two u16's rather than two u32 16-bit bitfields
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Also drop now pointless (and always having been bogus) pack pragmas.
If we failed to open an xc interface, using xch to log an error will end in
tears. Print to stderr instead, as we are bailing immediately later.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Coverity-id: 1191885 Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Andrew Cooper [Thu, 13 Mar 2014 13:38:37 +0000 (14:38 +0100)]
console: Traditional console timestamps including milliseconds
Suggested-by: Don Slutz <dslutz@verizon.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Thu, 13 Mar 2014 13:37:58 +0000 (14:37 +0100)]
console: provide timestamps as an offset since boot
This adds a new "Linux style" console timestamp method, which is shorter and
more useful than the current date/time timestamps with single-second
granularity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
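A minimal sketch of what a "Linux style" offset-since-boot timestamp could look like. The exact field widths and the helper name format_boot_timestamp() are assumptions for illustration, not taken from the Xen console code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a nanosecond uptime as "[sssss.uuuuuu] ".
 * The layout (5-wide seconds, 6-digit microseconds) is an assumption.
 */
static int format_boot_timestamp(char *buf, size_t len, uint64_t ns_since_boot)
{
    uint64_t sec = ns_since_boot / 1000000000ULL;
    uint64_t usec = (ns_since_boot % 1000000000ULL) / 1000ULL;

    return snprintf(buf, len, "[%5" PRIu64 ".%06" PRIu64 "] ", sec, usec);
}
```

Compared with a date/time stamp of single-second granularity, the prefix is both shorter and fine-grained enough to order messages emitted within the same second.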
Jan Beulich [Thu, 13 Mar 2014 13:27:51 +0000 (14:27 +0100)]
x86: make hypercall preemption checks consistent
- never preempt on the first iteration (ensure forward progress)
- never preempt on the last iteration (pointless/wasteful)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 13 Mar 2014 13:26:35 +0000 (14:26 +0100)]
common: make hypercall preemption checks consistent
- never preempt on the first iteration (ensure forward progress)
- do cheap checks first
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
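The rules listed in the two preemption commits above can be sketched as a single loop. This is an illustration only: process_batch() and the preempt_* probes are hypothetical stand-ins, not Xen's hypercall_preempt_check() or any real handler.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a potentially expensive preemption probe. */
static bool preempt_always(void) { return true; }
static bool preempt_never(void) { return false; }

/*
 * Process items [start, count) and return the index to resume from.
 * A return value equal to count means the whole batch completed.
 */
static unsigned long process_batch(unsigned long start, unsigned long count,
                                   bool (*preempt_pending)(void))
{
    unsigned long i;

    for ( i = start; i < count; i++ )
    {
        /* ... per-item work would go here ... */

        /*
         * The check runs only after an item has been handled, so every
         * invocation makes forward progress. The cheap comparison comes
         * first and also rules out preempting once the last item is done.
         */
        if ( i + 1 < count && preempt_pending() )
            return i + 1; /* a caller would set up a continuation here */
    }

    return count;
}
```

Even with a probe that always reports pending work, each call advances by one item and the final item never triggers a continuation.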
Ian Jackson [Mon, 24 Feb 2014 14:19:15 +0000 (14:19 +0000)]
libxl: Fix carefd lock leak in save callout
If libxl_pipe fails we leave the carefd locked, which translates to
the atfork lock remaining held. This would probably cause the process
to deadlock shortly afterwards.
Of course libxl_pipe is very unlikely to fail unless things are
already going very badly. This bug has not been observed anywhere as
far as we are aware.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com>
Ian Jackson [Mon, 24 Feb 2014 14:19:14 +0000 (14:19 +0000)]
libxl: Hold the atfork lock while closing carefd
This avoids the process being forked while a carefd is recorded in the
list but the actual fd has been closed. If that happened, a
subsequent libxl_postfork_child_noexec would attempt to close the fd
again. If we are lucky that results in a harmless warning; but if we
are unlucky the fd number has been reused and we close an unrelated
fd.
This race has not been observed anywhere as far as we are aware.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell [Tue, 14 Jan 2014 16:55:04 +0000 (16:55 +0000)]
xen: arm: correctly write release target in smp_spin_table_cpu_up
flush_xen_data_tlb_range_va() is clearly bogus since it flushes the tlb, not
the data cache. Perhaps what was meant was flush_xen_dcache(), but the address
was mapped with ioremap_nocache and hence isn't cached in the first place.
Accesses should be via writeq though, so do that.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Andrew Cooper [Tue, 25 Feb 2014 10:54:14 +0000 (10:54 +0000)]
tools/xen-mceinj: Fix dependency for the install rule
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Liu Jinsong <jinsong.liu@intel.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
Instead of having hard-coded values. We only do PCI vendors
as Jan requested and put all PCI device vendors in one
new file.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Sorted them based on their numerical values per Jan's review] Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
serial: Separate the PCI device ids and parameters (v1)
This will allow us to re-use the parameters for multiple PCI
devices.
No functional change.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: s/nr/idx/ of the enum, use __initconst and const by Jan's review] Reviewed-by: Jan Beulich <jbeulich@suse.com>
but since I don't have any of those cards this patch does not
enable it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Init for ARM and add offset to virt addr]
[v2: Remove the offset usage] Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
serial: Fix COM1 assumption if pci_uart_config did not find the AMT card.
The io_base by default is set to be 0x3f8 for COM1 and 0x2f8 for COM2
in __setup_xen. Then we call 'ns16550_init' which copies those in
the appropriate uart, which then calls 'ns16550_parse_port_config'
to deal with parameter parsing. If the 'amt' parameter has been
specified we further call the 'pci_uart_config' code which scans the PCI bus.
If it does not find the AMT device it would overwrite the io_base with
0x3f8 regardless of whether this is COM1 or COM2 - but only if 'amt'
parameter had been specified.
The overwrite is a way to set it back to the failsafe defaults -
except for COM2 it is bogus.
Note again - if an AMT card is found, this over-write will not happen.
This in theory (as I don't have a machine with two COM ports
readily available) means that if the user specified 'com2=9600,8n1,amt'
and the device did not have an AMT serial device, instead of using
0x2f8 for the io_base it ends up using 0x3f8 - and we don't get the
output on COM2. If the user had done 'com2=9600,8n1' we would never
get in this path so this bug would never manifest itself
(because we don't end up scanning for the AMT device).
We also unconditionally reset the IRQ value - so we would never get the
proper interrupt when falling back to the legacy 0x3f8 and 0x2f8 COM ports.
That is OK - as we would end up using the polling mode - while
not the best - it still would work.
Lastly the clock_hz is also set to the default one (UART_CLOCK_HZ,
which is the same for legacy COM1 and COM2 ports)- that is strictly
not a bug, but it is redundant and not needed.
This bug was introduced with the original AMT support and I cannot
recall why it was done that way - it is a bug.
Fix it by saving the original io_base before starting the
scan of the PCI bus. If we don't find a serial PCI device (because
we did not exit out of the loop using return) then
assign the original io_base value back.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Also remove the irq override spotted by Jan]
[v2: Add more details to the commit description] Reviewed-by: Jan Beulich <jbeulich@suse.com>
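The fix described in the commit above (save io_base before the scan, restore it if nothing is found) can be sketched as follows. The struct, the function name and the representation of the PCI scan as an array of candidate bases are all hypothetical simplifications, not the ns16550 driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the UART state; only io_base matters here. */
struct uart_sketch {
    unsigned long io_base;
};

/*
 * Hypothetical scan: 'candidates' holds the I/O base of each PCI device
 * examined, with 0 meaning "not a usable serial device".
 */
static bool scan_for_pci_uart(struct uart_sketch *uart,
                              const unsigned long *candidates, size_t n)
{
    unsigned long orig_base = uart->io_base; /* COM1 or COM2 default */
    size_t i;

    uart->io_base = 0; /* the scan starts from a clean slate */
    for ( i = 0; i < n; i++ )
    {
        if ( candidates[i] == 0 )
            continue;
        uart->io_base = candidates[i];
        return true; /* found a usable device */
    }

    /* Nothing found: restore the caller's value, not a hard-coded 0x3f8. */
    uart->io_base = orig_base;
    return false;
}
```

With this shape, a COM2 port that starts at 0x2f8 keeps 0x2f8 when no PCI serial device turns up.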
serial: Skip over PCIe device which have no quirks (fix AMT regression).
The "ns16550: Add support for UART present in Broadcom TruManage
capable NetXtreme chips" implies that only devices that have
an MMIO BAR and are in the quirks table should be processed.
Even the comment at the end says so:
If we have an io_base, then we succeeded in the lookup
But the code was checking for the !io_base - which is to say if
the io_base was 0 then we would skip scanning. But io_base
always has a value - it is set by 'ns16550_init' to a default
value - so it would never hit the 'continue' path.
This means that if we have a communication device followed by
a serial AMT device we would pick the communication device instead
of the AMT device.
See:
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
Subsystem: Intel Corporation Device 2008
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at fb12a000 (64-bit, non-prefetchable) [size=16]
00:16.3 Serial controller: Intel Corporation Cougar Point KT Controller (rev 04) (prog-if 02 [16550])
Subsystem: Intel Corporation Device 2008
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 17
I/O ports at f0e0 [size=8]
Memory at fb129000 (32-bit, non-prefetchable) [size=4K]
pci 0000:00:16.0: [8086:1c3a] type 00 class 0x078000
pci 0000:00:16.3: [8086:1c3d] type 00 class 0x070002
And Xen picks 00:16.0 as its console when using 'com1=115200,8n1,amt'.
This patch fixes it and allows us to use AMT again by zeroing
out io_base. If the scan did not work, the io_base is
set back to a default value (the 'pci_uart_config' does that
already at its end).
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> CC: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> CC: Thomas Lendacky <Thomas.Lendacky@amd.com> CC: Keir Fraser <keir@xen.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 18 Feb 2014 15:59:05 +0000 (15:59 +0000)]
tools/libxl: Don't read off the end of tinfo[]
It is very common for BIOSes to advertise more cpus than are actually present
on the system, and mark some of them as offline. This is what Xen does to
allow for later CPU hotplug, and what BIOSes common to multiple different
systems do to save fully rewriting the MADT in memory.
An excerpt from `xl info` might look like:
...
nr_cpus : 2
max_cpu_id : 3
...
Which shows 4 CPUs in the MADT, but only 2 online (as this particular box is
the dual-core rather than the quad-core SKU of its particular brand)
Because of the way Xen exposes this information, a libxl_cputopology array is
bounded by 'nr_cpus', while cpu bitmaps are bounded by 'max_cpu_id + 1'.
The current libxl code has two places which erroneously assume that a
libxl_cputopology array is as long as the number of bits found in a cpu
bitmap, and valgrind complains:
==14961== Invalid read of size 4
==14961== at 0x407AB7F: libxl__get_numa_candidate (libxl_numa.c:230)
==14961== by 0x407030B: libxl__build_pre (libxl_dom.c:167)
==14961== by 0x406246F: libxl__domain_build (libxl_create.c:371)
...
==14961== Address 0x4324788 is 8 bytes after a block of size 24 alloc'd
==14961== at 0x402669D: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==14961== by 0x4075BB9: libxl__zalloc (libxl_internal.c:83)
==14961== by 0x4052F87: libxl_get_cpu_topology (libxl.c:4408)
==14961== by 0x407A899: libxl__get_numa_candidate (libxl_numa.c:342)
...
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
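To illustrate the bounds mismatch described in the commit above, the sketch below walks a cpu bitmap while also respecting the length of the topology array. The types and the helper count_cpus_on_node() are hypothetical; the real libxl structures carry more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down topology entry. */
struct cpu_topo_sketch {
    uint32_t node;
};

/*
 * Count CPUs that are set in 'map' and belong to 'node'.
 * 'map' holds nr_bits (max_cpu_id + 1) bits, while 'tinfo' has only
 * nr_cpus entries, so the loop must stop at whichever bound is smaller.
 */
static unsigned int count_cpus_on_node(const struct cpu_topo_sketch *tinfo,
                                       size_t nr_cpus,
                                       const uint8_t *map, size_t nr_bits,
                                       uint32_t node)
{
    unsigned int count = 0;
    size_t i;

    for ( i = 0; i < nr_bits && i < nr_cpus; i++ )
    {
        if ( !(map[i / 8] & (1u << (i % 8))) )
            continue;
        if ( tinfo[i].node == node )
            count++;
    }

    return count;
}
```

Using the numbers from the `xl info` excerpt, a 4-bit map combined with a 2-entry array only ever reads tinfo[0] and tinfo[1].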
Ian Jackson [Wed, 19 Feb 2014 14:03:30 +0000 (14:03 +0000)]
xl: Comment error handling in dolog
Coverity-ID: 1087116 Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> CC: coverity@xenproject.org
Ian Jackson [Wed, 19 Feb 2014 14:03:29 +0000 (14:03 +0000)]
libxl: Fix error path in libxl_device_events_handler
libxl_device_events_handler would fail to call AO_ABORT if it failed;
instead it would simply return rc. (This leaves the egc etc. from the
now-abolished stack frame potentially live, and leaves the ctx
locked.)
In xl, this is of no consequence, because xl will immediately exit in
this situation. This is very likely to be true in any other callers
(of which we don't know of any, anyway).
Coverity-ID: 1181840 Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> CC: coverity@xenproject.org
Wei Liu [Tue, 28 Jan 2014 15:38:01 +0000 (15:38 +0000)]
xl: honor more top level vfb options
SDL and keymap options for VFB can now also be specified as top
level options. Documentation is also updated.
This fixes bug #31 and further possible problems.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Joby Poriyath [Tue, 4 Feb 2014 18:10:35 +0000 (18:10 +0000)]
xen/pygrub: grub2/grub.cfg from RHEL 7 has new commands in menuentry
menuentry in grub2/grub.cfg uses linux16 and initrd16 commands
instead of linux and initrd. Due to this RHEL 7 (beta) guest failed to
boot after the installation.
In addition to this, RHEL 7 menu entries have two different single-quote
delimited strings on the same line, and the greedy grouping for menuentry
parsing gets both strings, and the options in between.
Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: george.dunlap@citrix.com
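pygrub itself is written in Python and uses regular expressions; purely to illustrate the greedy-versus-first-match problem described in the commit above, here is a small C sketch. The helper first_quoted() and any menuentry line fed to it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the text between the first single quote and the NEXT single quote
 * (the non-greedy behaviour). Looking for the closing quote with strrchr()
 * instead would mimic the greedy match and swallow everything up to the
 * last quote on the line.
 * Returns 0 on success, -1 if no quoted string is found or it does not fit.
 */
static int first_quoted(const char *line, char *out, size_t len)
{
    const char *q1 = strchr(line, '\'');
    const char *q2;
    size_t n;

    if ( q1 == NULL )
        return -1;
    q2 = strchr(q1 + 1, '\'');
    if ( q2 == NULL )
        return -1;

    n = (size_t)(q2 - q1 - 1);
    if ( n >= len )
        return -1;

    memcpy(out, q1 + 1, n);
    out[n] = '\0';
    return 0;
}
```

For a line carrying two quoted strings, only the first one (the title) is returned, and the options between the two strings are left alone.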
Chen Baozi [Sun, 16 Feb 2014 16:09:26 +0000 (00:09 +0800)]
xen/arm{32, 64}: fix section shift when mapping 2MB block in boot page table
Section shift for level-2 page table should be #21 rather than #20. Besides,
since there are {FIRST,SECOND,THIRD}_SHIFT macros defined in asm/page.h, use
these macros instead of hard-coded shift value.
Signed-off-by: Chen Baozi <baozich@gmail.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
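The arithmetic behind the shift fix above, as a short sketch. The real code is boot-time assembly; the macro values below assume a 4KB granule with 9 index bits per level, and second_table_index() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed layout with a 4KB granule: 9 bits of index per level. */
#define THIRD_SHIFT     12
#define SECOND_SHIFT    (THIRD_SHIFT + 9)   /* 21: one entry maps 2MB */
#define FIRST_SHIFT     (SECOND_SHIFT + 9)  /* 30: one entry maps 1GB */
#define LPAE_ENTRY_MASK 0x1ffu

/* Index of the level-2 entry covering a virtual address. */
static unsigned int second_table_index(uint64_t va)
{
    return (unsigned int)((va >> SECOND_SHIFT) & LPAE_ENTRY_MASK);
}
```

A shift of 20 would advance the index every 1MB, so a 2MB block would end up in the wrong slot; with 21 the index advances once per 2MB.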
xen/arm: pass a struct pending_irq* as parameter to gic helper functions
gic_add_to_lr_pending and gic_set_lr should take a struct pending_irq*
as parameter instead of the virtual_irq number and the priority
separately and doing yet another irq_to_pending lookup.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Thu, 6 Mar 2014 16:13:46 +0000 (17:13 +0100)]
docs: remove ia64 from kexec_and_kdump.txt
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Mukesh Rathor [Tue, 11 Mar 2014 12:56:50 +0000 (13:56 +0100)]
pvh: call pit_init for pvh also
During halt of a pvh guest, the guest may do speaker shutdown. This
results in call to handle_speaker_io in xen. It will hang on the vpit
spin lock because it has not been initialized.
Since pit_init is called for both pv and hvm as well, the call is
moved to a more generic place.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 11 Mar 2014 12:53:38 +0000 (13:53 +0100)]
x86/MTRR: consolidation
- use a single set of manifest constants (the ones from msr-index.h)
- drop unnecessary MSR index constants
- get hvm_msr_{read,write}_intercept() in line with the rest of the
MTRR emulation code regarding the number of emulated MTRRs
- remove use of hardcoded numbers where expressions can be used (at
once serving as documentation)
- centrally check mtrr_state.have_fixed in get_fixed_ranges(), making
unnecessary the cpu_has_mtrr check in mtrr_save_fixed_ranges
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
George Dunlap [Mon, 10 Mar 2014 12:46:56 +0000 (12:46 +0000)]
Add a "make rpmball" target
Build a simplistic dummy package, similar to "make debball", for
developers on rpm-based systems.
[ Fixed some trailing whitespace -iwj ]
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Dario Faggioli <dario.faggioli@citrix.com> CC: Olaf Hering <olaf@aepfle.de> CC: Don Slutz <dslutz@verizon.com> CC: M A Young <m.a.young@durham.ac.uk> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Tue, 11 Mar 2014 09:30:50 +0000 (10:30 +0100)]
tools/libxc: pass errno to callers of xc_domain_save
Callers of xc_domain_save use errno to print diagnostics if the call
fails. But xc_domain_save does not preserve the actual errno in case of
a failure.
This change preserves errno in all cases where code jumps to the label
"out". In addition a new label "exit" is added to catch also code which
used to do just "return 1".
Now libxl_save_helper:complete can print the actual error string.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
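The errno-preserving exit path described in the commit above follows a common C pattern, sketched here. save_sketch() and noisy_cleanup() are hypothetical and do not reflect the structure of xc_domain_save itself.

```c
#include <errno.h>

/* Hypothetical cleanup step that clobbers errno as a side effect. */
static void noisy_cleanup(void)
{
    errno = EBADF;
}

/* Returns 0 on success, 1 on failure with errno describing the failure. */
static int save_sketch(int fail_errno)
{
    int rc = 1;
    int saved_errno;

    if ( fail_errno != 0 )
    {
        errno = fail_errno; /* simulate a failing step */
        goto out;
    }
    rc = 0;

 out:
    saved_errno = errno;   /* capture before cleanup can overwrite it */
    noisy_cleanup();
    errno = saved_errno;   /* hand the original error back to the caller */
    return rc;
}
```

A caller that prints strerror(errno) after a failure then reports the original cause rather than whatever the cleanup left behind.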
Andrew Cooper [Tue, 25 Feb 2014 18:46:14 +0000 (18:46 +0000)]
tools/ocaml: Ignore more OCaml test binaries
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: David Scott <dave.scott@eu.citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Julien Grall [Mon, 10 Mar 2014 13:40:50 +0000 (14:40 +0100)]
xmalloc: handle correctly page allocation when align > size
When align is greater than size, we need to derive the order from
align during multiple page allocation. I guess that was the goal of
commit fb034f42 "xmalloc: make close-to-PAGE_SIZE allocations more efficient".
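A sketch of the idea in the xmalloc commit above: when whole pages are allocated, take the order from whichever of size and align is larger. It assumes a 4KB page and a page allocator whose 2^order blocks are naturally aligned to their own size; both helper names are hypothetical and this is not the xmalloc code.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL

/* Smallest order such that (PAGE_SIZE_SKETCH << order) >= bytes. */
static unsigned int order_from_bytes(size_t bytes)
{
    unsigned int order = 0;

    while ( (PAGE_SIZE_SKETCH << order) < bytes )
        order++;

    return order;
}

/* Pick the allocation order from whichever of size and align is larger. */
static unsigned int whole_page_order(size_t size, size_t align)
{
    return order_from_bytes(size > align ? size : align);
}
```

For example, a one-page request with 16KB alignment yields order 2 rather than order 0.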
Daniel De Graaf [Tue, 4 Mar 2014 22:51:34 +0000 (17:51 -0500)]
xenstored: add --master-domid to support domain builder
When a domain builder stub domain is used, the initial xenstore
connection to domain 0 may use a different domain ID as the endpoint;
allow this domain ID to be specified on the command line.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com>
George Dunlap [Tue, 4 Mar 2014 13:38:19 +0000 (13:38 +0000)]
xl: Add "seize" option to PCI devices
The "seize" option tells the toolstack to attempt to automatically
unbind devices and re-bind them to the pciback driver. This should
make creating VMs that habitually use pass-through (such as driver domain
VMs and gaming VMs) easier to use and manage.
[Whitespace error fixed by iwj.]
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Tim Deegan [Mon, 10 Mar 2014 10:18:49 +0000 (11:18 +0100)]
x86/time: always count s_time from Xen boot
Timestamped printks() can call NOW() before init_xen_time().
Set a baseline TSC as soon as we've calibrated the TSC rate,
so that NOW() consistently counts from boot time.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 10 Mar 2014 10:18:05 +0000 (11:18 +0100)]
x86/schedule: remove noreturn from schedule_tail() function pointer
XenServer has recently had a support case where this bugframe in
context_switch() was hit, presumably from a corrupt function pointer as the
vcpu pointer was fine.
On balance, it is better to leave the bugframe around for peace of mind in
exceptional circumstances, than to use the optimisations provided by noreturn.
At any meaningful levels of optimisation, the noreturn causes the bugframe to
be optimised out, meaning that any exceptional returns fall into unlikely
branches, which will result in very weird behaviour.
The unreachable() in BUG() does the useful part of noreturn for us, allowing
the compiler not to mess about restoring stack frames etc, but causes a ud2
instruction to be present.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Len Brown [Mon, 10 Mar 2014 10:14:25 +0000 (11:14 +0100)]
x86/mwait_idle: support Intel Atom Processor C2000 product family
Support the "Intel(R) Atom(TM) Processor C2000 Product Family",
formerly code-named Avoton. It is based on the next generation
Intel Atom processor architecture, formerly code-named Silvermont.
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 10 Mar 2014 10:11:28 +0000 (11:11 +0100)]
kexec: identify which cpu the kexec image is being executed on
A patch to this effect has been in XenServer for a little while, and has
proved to be a useful debugging point for servers which behave
differently when crashing on a non-bootstrap processor.
Moving the printk() from kexec_panic() to one_cpu_only() means that it will
only be printed for the cpu which wins the race along the kexec path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Mon, 10 Mar 2014 10:04:36 +0000 (11:04 +0100)]
x86/HVM: consolidate passthrough handling in epte_get_entry_emt()
It is inconsistent to depend on iommu_enabled alone: For a guest
without devices passed through to it, it is of no concern whether the
IOMMU is enabled.
There's one rather special case to take care of: VMX code marks the
LAPIC access page as MMIO. The added assertion needs to take this into
consideration, and the subsequent handling of the direct MMIO case was
inconsistent too: That page would have been WB in the absence of an
IOMMU, but UC in the presence of it, while in fact the cacheability of
this page is entirely unrelated to an IOMMU being in use.
Jan Beulich [Mon, 10 Mar 2014 10:03:53 +0000 (11:03 +0100)]
x86/HVM: fix memory type merging in epte_get_entry_emt()
Using the minimum numeric value of guest and host specified memory
types is too simplistic - it works correctly only for a subset of
types. It is in particular the WT/WP combination that needs conversion
to UC if the two types conflict.
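To make the WT/WP point in the commit above concrete, the sketch below compares the plain numeric minimum with a rule that special-cases the conflict. The encodings are the standard x86 memory type values; only the WT/WP case comes from the text, and keeping the minimum for every other pair is an assumption, not the actual epte_get_entry_emt() logic.

```c
/* x86 memory type encodings. */
enum memtype {
    MT_UC = 0, /* uncacheable */
    MT_WC = 1, /* write combining */
    MT_WT = 4, /* write through */
    MT_WP = 5, /* write protected */
    MT_WB = 6  /* write back */
};

/* Old rule: take the numerically smaller type. */
static enum memtype merge_min(enum memtype guest, enum memtype host)
{
    return guest < host ? guest : host;
}

/* Sketch of a refined rule: a WT/WP conflict collapses to UC. */
static enum memtype merge_refined(enum memtype guest, enum memtype host)
{
    if ( guest == host )
        return guest;
    if ( (guest == MT_WT && host == MT_WP) ||
         (guest == MT_WP && host == MT_WT) )
        return MT_UC;

    return merge_min(guest, host); /* assumption: unchanged for other pairs */
}
```

With the old rule, WT merged with WP gives WT simply because 4 is less than 5; the refined sketch returns UC for that pair.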
Dongxiao Xu [Mon, 10 Mar 2014 10:02:25 +0000 (11:02 +0100)]
x86/hvm: refine the judgment on IDENT_PT for EMT
When trying to get the EPT EMT type, the check on
HVM_PARAM_IDENT_PT is incorrect: it always returns the WB type if
the parameter is not set. Remove the related code.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
We can't fully drop the dependency yet, but we should certainly avoid
overriding cases already properly handled. The reason for this is that
the guest setting up its MTRRs happens _after_ the EPT tables got
already constructed, and no code is in place to propagate this to the
EPT code. Without this check we're forcing the guest to run with all of
its memory uncachable until something happens to re-write every single
EPT entry. But of course this has to be just a temporary solution.
In the same spirit we should defer the "very early" (when the guest is
still being constructed and has no vCPU yet) override to the last
possible point.
George Dunlap [Thu, 6 Mar 2014 11:19:39 +0000 (12:19 +0100)]
credit: change default timeslice to 5ms
The 30ms timeslice was chosen nearly a decade ago now, with cpu
"burning" workloads in mind. In the mean time, processors have gotten
faster and VMEXITs have gotten faster. A timeslice of 30ms has a
major cost when running latency-sensitive workloads like network or
audio streaming: getting caught behind just one or two other VMs can
introduce a processing delay of up to 60ms, and the "round-robin"
nature of the credit scheduler means this delay may be introduced
every time the VM yields for periods of time.
The XenServer performance team at Citrix have done extensive testing
with various timeslices, including 30ms, 10ms, 5ms, and 2ms. None of
the workloads exhibited any performance degradation with a 5ms
timeslice.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
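The delay figure quoted in the commit above is simple arithmetic, shown here as a tiny helper. The function name is hypothetical and the model (each VM ahead in the queue consumes one full timeslice) is a simplification of the credit scheduler.

```c
/* Worst-case wait, in ms, when 'vms_ahead' VMs each use a full timeslice. */
static unsigned int worst_case_delay_ms(unsigned int timeslice_ms,
                                        unsigned int vms_ahead)
{
    return timeslice_ms * vms_ahead;
}
```

With two VMs ahead, a 30ms timeslice gives up to 60ms of delay, while a 5ms timeslice gives up to 10ms under the same model.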
Tim Deegan [Thu, 28 Nov 2013 15:40:48 +0000 (15:40 +0000)]
bitmaps/bitops: Clarify tests for small constant size.
No semantic changes, just makes the control flow a bit clearer.
I was looking at this because the (-!__builtin_constant_p(x) | x__)
formula is too clever for Coverity, but in fact it always takes me a
minute or two to understand it too. :)
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
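For readers puzzling over the formula quoted in the commit above, here is a paraphrase next to a spelled-out equivalent. The macro names are hypothetical and the threshold (bits in a long) is an assumption about how the test is used; the sketch relies on GCC's __builtin_constant_p.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * "Clever" form: for a non-constant n, -!__builtin_constant_p(n) is -1,
 * so the OR yields all ones and the comparison fails. For a constant n
 * the term is 0 and the comparison is simply n <= BITS_PER_LONG_SKETCH.
 */
#define SMALL_CONST_CLEVER(n) \
    (((unsigned long)(-!__builtin_constant_p(n)) | (unsigned long)(n)) \
     <= BITS_PER_LONG_SKETCH)

/* Spelled-out equivalent: constant AND small enough. */
#define SMALL_CONST_PLAIN(n) \
    (__builtin_constant_p(n) && (unsigned long)(n) <= BITS_PER_LONG_SKETCH)
```

Both forms agree for small constants, large constants and non-constant values; the second one just states the two conditions separately.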
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Andrew Cooper [Tue, 4 Mar 2014 10:19:20 +0000 (11:19 +0100)]
x86: identify reset_stack_and_jump() as noreturn
reset_stack_and_jump() is actually a macro, but can effectively become noreturn
by giving it an unreachable() declaration.
Propagate the 'noreturn-ness' up through the direct and indirect callers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:18:28 +0000 (11:18 +0100)]
misc cleanup as a result of the previous patches
This includes:
* A stale comment in sh_skip_sync()
* A dead forever loop in __bug()
* A prototype for machine_power_off() which is unimplemented in any architecture
* Replacing a for(;;); loop with unreachable()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:17:03 +0000 (11:17 +0100)]
identify panic and reboot/halt functions as noreturn
On an x86 build (GCC Debian 4.7.2-5), this substantially reduces the size of
.text and .init.text sections.
Experimentally, even in a non-debug build, GCC uses `call` rather than `jmp`
so there should be no impact on any stack trace generation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Make a formal define for noreturn in compiler.h, and fix up opencoded uses of
__attribute__((noreturn)). This includes removing redundant uses with
function definitions which have a public declaration.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
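A minimal sketch of the kind of define the commit above describes, with the annotation placed once on the declaration. Xen's define is simply called noreturn; it is renamed here to avoid clashing with <stdnoreturn.h>, and fatal() and checked_div() are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed shape of the define; the real one lives in Xen's compiler.h. */
#define noreturn_fn __attribute__((noreturn))

/* Annotated once, on the public declaration. */
noreturn_fn void fatal(const char *msg);

/* The definition does not need to repeat the attribute. */
void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

static int checked_div(int a, int b)
{
    if ( b == 0 )
        fatal("division by zero");

    /* The compiler knows control only reaches here when b != 0. */
    return a / b;
}
```

Because the compiler can drop code after a call to a noreturn function, annotating panic-style helpers tends to shrink their callers, in line with the size reduction reported in the panic and reboot/halt commit above.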
Andrew Cooper [Tue, 4 Mar 2014 10:14:53 +0000 (11:14 +0100)]
x86/crash: fix up declaration of do_nmi_crash()
... so it can correctly be annotated as noreturn. Move the declaration of
nmi_crash() to be effectively private in crash.c
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 10:01:57 +0000 (11:01 +0100)]
correctly use gcc's -x option
In Linux the improper use was found to cause problems with certain
distributed build environments. Even if not directly affecting us, be
on the safe side.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 09:52:20 +0000 (10:52 +0100)]
IOMMU: generalize and correct softirq processing during Dom0 device setup
c/s 21039:95f5a4ce8f24 ("VT-d: reduce default verbosity") having put a
call to process_pending_softirqs() in VT-d's domain_context_mapping()
was wrong in two ways: For one we shouldn't be doing this when setting
up a device during DomU assignment. And then - I didn't check whether
that was the case already back then - we shouldn't call that function
with the pcidevs_lock (or in fact any spin lock) held.
Move the "preemption" into generic code, at once dealing with further
actual (too much output elsewhere - particularly on systems with very
many host bridge like devices - having been observed to still cause the
watchdog to trigger when enabled) and potential (other IOMMU code may
also end up being too verbose) issues.
Do the "preemption" once per device actually being set up when in
verbose mode, and once per bus otherwise.
Note that dropping pcidevs_lock around the process_pending_softirqs()
invocation is specifically not a problem here: We're in an __init
function and aren't racing with potential additions/removals of PCI
devices. Not acquiring the lock in setup_dom0_pci_devices() otoh is not
an option, as there are too many places that assert the lock being
held.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Wei Liu [Fri, 28 Feb 2014 16:35:15 +0000 (17:35 +0100)]
mm: ensure useful progress in decrease_reservation
During my fun time playing with the balloon driver I found that the hypervisor's
preemption check kept decrease_reservation from doing any useful work
for 32 bit guests, resulting in hanging the guests.
As Andrew suggested, we can force the check to fail for the first
iteration to ensure progress. We did this in d3a55d7d9 "x86/mm: Ensure
useful progress in alloc_l2_table()" already.
After this change I cannot see the hang caused by continuation logic
anymore.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>