Huang Ying [Wed, 2 Mar 2011 07:56:20 +0000 (08:56 +0100)]
KVM, MCE, unpoison memory address across reboot
In Linux kernel HWPoison processing implementation, the virtual
address in processes mapping the error physical memory page is marked
as HWPoison. So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.
If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS. If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system. But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated. That is, guest system can not recover via rebooting.
In fact, across rebooting, the contents of guest physical memory page
need not to be kept. We can allocate a new host physical page to
back the corresponding guest physical address.
This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
to clear the corresponding page table entry, so that make it possible
to allocate a new page to recover the issue.
Huang Ying [Wed, 2 Mar 2011 07:56:19 +0000 (08:56 +0100)]
Add qemu_ram_remap
qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again. This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.
Jan Kiszka [Wed, 2 Mar 2011 07:56:17 +0000 (08:56 +0100)]
kvm: x86: Clean up kvm_setup_mce
There is nothing to abstract here. Fold kvm_setup_mce into its caller
and fix up the error reporting (return code of kvm_vcpu_ioctl holds the
error value).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Huang Ying <ying.huang@intel.com> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 2 Mar 2011 07:56:16 +0000 (08:56 +0100)]
kvm: x86: Consolidate TCG and KVM MCE injection code
This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
in the VCPUState also for KVM.
We have to drop the MSRs writeback restrictions for this purpose which
is now safe as every uncoordinated MSR injection is removed with this
patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Huang Ying <ying.huang@intel.com> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 2 Mar 2011 07:56:15 +0000 (08:56 +0100)]
x86: Run qemu_inject_x86_mce on target VCPU
We will use the current TCG-only MCE injection path for KVM as well, and
then this read-modify-write of the target VCPU state has to be performed
synchronously in the corresponding thread.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 2 Mar 2011 07:56:14 +0000 (08:56 +0100)]
kvm: x86: Inject pending MCE events on state writeback
The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.
As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).
This will also allow to unify TCG and KVM injection code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Huang Ying <ying.huang@intel.com> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 2 Mar 2011 07:56:11 +0000 (08:56 +0100)]
Synchronize VCPU states before reset
This is required to support keeping VCPU states across a system reset.
If we do not read the current state before the reset,
cpu_synchronize_all_post_reset may write back incorrect state
information.
The first user of this will be MCE MSR synchronization which currently
works around the missing cpu_synchronize_all_states.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 2 Mar 2011 07:56:10 +0000 (08:56 +0100)]
x86: Optionally avoid injecting AO MCEs while others are pending
Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Huang Ying <ying.huang@intel.com> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Paolo Bonzini [Sat, 12 Mar 2011 16:43:57 +0000 (17:43 +0100)]
always qemu_cpu_kick after unhalting a cpu
This ensures env->halt_cond is broadcast, and the loop in
qemu_tcg_wait_io_event and qemu_kvm_wait_io_event is exited
naturally rather than through a timeout.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Sat, 12 Mar 2011 16:43:54 +0000 (17:43 +0100)]
add assertions on the owner of a QemuMutex
These are already present in the Win32 implementation, add them to
the pthread wrappers as well. Use PTHREAD_MUTEX_ERRORCHECK for mutex
operations. Later we'll add tracking of the owner for cond_signal/broadcast.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Sat, 12 Mar 2011 16:43:52 +0000 (17:43 +0100)]
add win32 qemu-thread implementation
For now, qemu_cond_timedwait and qemu_mutex_timedlock are left as
POSIX-only functions. They can be removed later, once the patches
that remove their uses are in.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Jan Kiszka [Sat, 12 Mar 2011 16:43:51 +0000 (17:43 +0100)]
Refactor thread retrieval and check
We have qemu_cpu_self and qemu_thread_self. The latter is retrieving the
current thread, the former is checking for equality (using CPUState). We
also have qemu_thread_equal which is only used like qemu_cpu_self.
This refactors the interfaces, creating qemu_cpu_is_self and
qemu_thread_is_self as well ass qemu_thread_get_self.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix this by limiting the access to the allowed range.
MultiArcadeMachineEmulator has newer versions of fmopl,
but using these requires more efforts.
Cc: Blue Swirl <blauwirbel@gmail.com> Reviewed-by: malc <av1474@comtv.ru> Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
William Dauchy [Sun, 6 Mar 2011 21:27:18 +0000 (22:27 +0100)]
moving eeprom initialization
The initialization should not be only on reset but also when initializing
the device.
It resolves a bug when hot plugging a pci network device: the mac address
was always null.
Signed-off-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Blue Swirl [Sat, 12 Mar 2011 09:52:25 +0000 (09:52 +0000)]
pc: fix wrong CMOS values for floppy drives
Before commit 63ffb564dca94f8bda01ed6d209784104630a4d2, states for
floppy drives were calculated in fdc.c:fd_revalidate(). There it is
also considered whether a disk is inserted or not. The commit didn't copy
the logic completely to pc.c, which caused a regression.
Fix by adding the same check also to pc.c.
Reported-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Tested-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead of fixing the wrong parameter value, bitmap_clear(), bitmap_set
and width_mask were removed, and bitmap_intersect() was replaced by
!bitmap_empty(). The new operation is much shorter and equivalent to
the old operations.
The declarations of the dirty bitmaps in vnc.h were also wrong for 64 bit
hosts because of a rounding effect: for these hosts, VNC_MAX_WIDTH is no
longer a multiple of (16 * BITS_PER_LONG), so the rounded value of
VNC_DIRTY_WORDS was too small.
Fix both declarations by using the macro which is designed for this
purpose.
Cc: Corentin Chary <corentincj@iksaif.net> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Gerhard Wiesinger <lists@wiesinger.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
client_migrate_info was merged badly, placing it between the command
and the documentation for another command. In addition it did not
respect the general rule of hmp-commands.hx, of having command
definition before the documentation.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Vincent Palatin [Thu, 10 Mar 2011 20:47:46 +0000 (15:47 -0500)]
Fix performance regression in qemu_get_ram_ptr
When the commit f471a17e9d869df3c6573f7ec02c4725676d6f3a converted the
ram_blocks structure to QLIST, it also removed the conditional check before
switching the current block at the beginning of the list.
In the common use case where ram_blocks has a few blocks with only one
frequently accessed (the main RAM), this has a performance impact as it
performs the useless list operations on each call (which are on a really
hot path).
On my machine emulation (ARM on amd64), this patch reduces the
percentage of CPU time spent in qemu_get_ram_ptr from 6.3% to 2.1% in the
profiling of a full boot.
Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
First, sysbus_init_irq shan't be called on on-stack variables. Indeed,
it only stores a passed pointer in qdev and the stored irq is later
populated, so we get a nice write-to-stack bug.
Second, irq for pxa27x should probably be handled in a more gentler way,
as we should check if we have events to raise this irq.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Stefan Hajnoczi [Sat, 26 Feb 2011 18:38:39 +0000 (18:38 +0000)]
simpletrace: Thread-safe tracing
Trace events outside the global mutex cannot be used with the simple
trace backend since it is not thread-safe. There is no check to prevent
them being enabled so people sometimes learn this the hard way.
This patch restructures the simple trace backend with a ring buffer
suitable for multiple concurrent writers. A writeout thread empties the
trace buffer when threshold fill levels are reached. Should the
writeout thread be unable to keep up with trace generation, records will
simply be dropped.
Each time events are dropped a special record is written to the trace
file indicating how many events were dropped. The event ID is
0xfffffffffffffffe and its signature is dropped(uint32_t count).
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Michael Walle [Thu, 17 Feb 2011 22:45:12 +0000 (23:45 +0100)]
lm32: system control model
This patch add support for a system control block. It is supposed to
act as helper for the emulated program. E.g. shutting down the VM or
printing test results. This model is intended for testing purposes only and
doesn't fit to any real hardware. Therefore, it is not added to any board
by default. Instead a user has to add it explicitly with the '-device'
commandline parameter.
Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Michael Walle [Thu, 17 Feb 2011 22:45:08 +0000 (23:45 +0100)]
lm32: juart model
This patch adds the JTAG UART model. It is accessed through special control
registers and opcodes. Therefore the translation uses callbacks to this
model.
Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Michael Walle [Thu, 17 Feb 2011 22:45:07 +0000 (23:45 +0100)]
lm32: interrupt controller model
This patch adds the interrupt controller of the lm32. Because the PIC is
accessed through special control registers and opcodes, there are callbacks
from the lm32 translation code to this model.
Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Peter Maydell [Tue, 22 Feb 2011 18:19:43 +0000 (18:19 +0000)]
target-arm: Implement a minimal set of cp14 debug registers
Newer ARM kernels try to probe for whether the CPU has hardware breakpoint
support. For this to work QEMU has to implement a minimal set of the cp14
debug registers. The architecture requires v7 cores to implement debug
and so there is no defined way to report its absence; however in practice
returning a zero DBGDIDR (ie with a reserved value for "debug architecture
version") should cause well-written hw debug users to do the right thing.
We also implement DBGDRAR and DBGDSAR as RAZ, indicating no memory mapped
debug components.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Peter Maydell [Sun, 6 Mar 2011 21:39:54 +0000 (21:39 +0000)]
target-arm: Remove ad-hoc leak checking code
This commit removes the ad-hoc resource leak checking code from
target-arm. This includes replacing all uses of new_tmp() with
tcg_temp_new_i32() and all uses of dead_tmp() with
tcg_temp_free_i32().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Peter Maydell [Sun, 6 Mar 2011 21:39:53 +0000 (21:39 +0000)]
tcg: Add support for debugging leakage of temporaries
Add support (if CONFIG_DEBUG_TCG is defined) for debugging leakage
of temporary variables. Generally any temporaries created by
a target while it is translating an instruction should be freed
by the end of that instruction; otherwise carefully crafted
guest code could cause TCG to run out of temporaries and assert.
By calling tcg_check_temp_count() after each instruction we can
check that we are not leaking temporaries in this way.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
target-arm: Integrate secondary CPU reset in arm_boot
Integrate secondary CPU reset into arm_boot, removing it from realview.c.
On non-Linux systems secondary CPUs start with the same entry as the boot
CPU.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Peter Maydell [Sun, 6 Mar 2011 20:32:09 +0000 (20:32 +0000)]
target-arm: Set carry flag correctly for Thumb2 ORNS
The code for Thumb2 ORNS (or negated and set flags) was trashing
a TCG input register which was needed later for use in calculating
flags, with the effect that the carry flag was always set with
the wrong sense. Fix this by using the TCG orc op instead of
separate not and or ops.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Jes Sorensen [Thu, 17 Feb 2011 12:26:05 +0000 (13:26 +0100)]
tracetool: Add optional argument to specify dtrace probe names
Optional feature allowing a user to generate the probe list to match
the name of the binary, in case they wish to install qemu under a
different name than qemu-{system,user},<arch>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefaha@linux.vnet.ibm.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Peter Maydell [Tue, 1 Mar 2011 17:35:19 +0000 (17:35 +0000)]
target-arm: Handle VMOV between two core and VFP single regs
Fix two bugs in the translation of the instructions VMOV sa,sb,rx,ry and
VMOV rx,ry,sa,sb (which copy between a pair of ARM core registers and a
pair of VFP single precision registers):
* An incorrect condition meant these instruction patterns were being
treated as load/store multiple, which resulted in the generation
of bad code and a runtime segfault
* The order of the core register pair was reversed so the values would
go to the wrong registers
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Peter Maydell [Fri, 25 Feb 2011 15:04:12 +0000 (15:04 +0000)]
target-arm: Don't decode old cp15 WFI instructions on v7 cores
In v7 of the ARM architecture, WFI (wait for interrupt) is a first-class
instruction, but in previous versions this functionality was provided
via a cp15 coprocessor register. Add correct feature checks to the
decoding of the cp15 WFI instructions so that they behave correctly
for newer cores. In particular, the old 0,c7,c8,2 encoding used on
ARM940 has been reused for VA-to-PA translation in v6 and v7.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>