xenbits.xensource.com Git - xen.git/log
xen.git
15 years agoFix domain reference leaks
Keir Fraser [Wed, 10 Feb 2010 09:18:43 +0000 (09:18 +0000)]
Fix domain reference leaks

Besides two unlikely/rarely hit ones in x86 code, the main offender
was tmh_client_from_cli_id(), which didn't even have a counterpart
(albeit it had a comment correctly saying that it causes d->refcnt to
get incremented). Unfortunately(?) this required a bit of code
restructuring (as I needed to change the code anyway, I also fixed
a couple of missing bounds checks which would sooner or later be
reported as security vulnerabilities), so I would hope Dan could give
it his blessing before it gets applied.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: MCE fixes
Keir Fraser [Wed, 10 Feb 2010 09:18:11 +0000 (09:18 +0000)]
x86: MCE fixes

- fill_vmsr_data() leaked a domain reference; since the caller already
  obtained one, there's no need to obtain another one here
- intel_UCR_handler() could call put_domain() with a NULL pointer
- mcheck_mca_logout() updated a local data structure that wasn't used
  after the update

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agotmem: Disable by default: enable with Xen boot param 'tmem'
Keir Fraser [Wed, 10 Feb 2010 09:09:35 +0000 (09:09 +0000)]
tmem: Disable by default: enable with Xen boot param 'tmem'

This reverts 20758:4e56f809ddbf and 20655:3c5b5c4c1d79

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxend: Enlarge the memory balloon size for domain creation since shadow
Keir Fraser [Wed, 10 Feb 2010 09:07:48 +0000 (09:07 +0000)]
xend: Enlarge the memory balloon size for domain creation since shadow
pre-allocation size has changed from 1M to 4M.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
15 years agoxenpm: Fix ia64 build
Keir Fraser [Wed, 10 Feb 2010 09:06:59 +0000 (09:06 +0000)]
xenpm: Fix ia64 build

cpuid_eax() is x86-specific.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agoDump machine check context for fatal machine check
Keir Fraser [Mon, 8 Feb 2010 10:18:51 +0000 (10:18 +0000)]
Dump machine check context for fatal machine check

This small patch makes the Xen hypervisor always dump the machine check
context; previously it printed nothing when a fatal MCE happened. It
also adds a check for a NULL pointer.

It also changes the address passed to the guest to always use the guest
mfn. This should benefit non-translated guests.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoDon't scrub broken pages
Keir Fraser [Mon, 8 Feb 2010 10:18:14 +0000 (10:18 +0000)]
Don't scrub broken pages

Don't touch poison pages when scrubbing pages. Consuming a poison
page will contaminate the CPU context and may cause a system crash.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoSome time-handling fixes.
Keir Fraser [Mon, 8 Feb 2010 10:14:48 +0000 (10:14 +0000)]
Some time-handling fixes.

Fixes my domU boot hangs (when using vtsc) due to vtsc_offset being less
than the local cpu's stime_local_stamp, leading to bogus
vcpu_time_info.tsc_timestamp.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agotools/xenconsole: fix Segmentation fault
Keir Fraser [Mon, 8 Feb 2010 08:50:03 +0000 (08:50 +0000)]
tools/xenconsole: fix Segmentation fault

A segmentation fault occurs if DOMID isn't specified.
Add a check that prints an error message in this situation.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agoHandle bogus serial ports that appear normal, but don't generate
Keir Fraser [Mon, 8 Feb 2010 08:49:19 +0000 (08:49 +0000)]
Handle bogus serial ports that appear normal, but don't generate
interrupts e.g. the "remote serial console" on Blades.

Authored-by: Gary Grebus <Gary.Grebus@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
15 years agoxenpm: Allow user to enable/disable dbs governor turbo mode.
Keir Fraser [Mon, 8 Feb 2010 08:48:40 +0000 (08:48 +0000)]
xenpm: Allow user to enable/disable dbs governor turbo mode.

Signed-off-by: Lu Guanqun <guanqun.lu@intel.com>
15 years agovmx: Don't enable irq for machine check vmexit handling
Keir Fraser [Mon, 8 Feb 2010 08:43:25 +0000 (08:43 +0000)]
vmx: Don't enable irq for machine check vmexit handling

We should not enable irq for machine check VMExit

In changeset 18658:824892134573, IRQs are enabled during VMExit except
for external interrupts. The exception should apply to machine checks
as well, because:
a) The mce_logout_lock should be held in irq_disabled context.
b) The machine check event should be handled as quickly as possible;
enabling IRQs would lengthen the handling period greatly.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoremus: Only build kernel module if parent kernel has IMQ configured.
Keir Fraser [Mon, 8 Feb 2010 08:41:51 +0000 (08:41 +0000)]
remus: Only build kernel module if parent kernel has IMQ configured.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoblktap2: Fix non-Linux build
Keir Fraser [Fri, 5 Feb 2010 13:57:20 +0000 (13:57 +0000)]
blktap2: Fix non-Linux build

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoRemus: fix ia64 build
Keir Fraser [Fri, 5 Feb 2010 10:37:24 +0000 (10:37 +0000)]
Remus: fix ia64 build

This patch fixes the following error:
  /xen-unstable.hg/tools/remus/kmod/sch_queue.c: In function
  `is_foreign':
  /xen-unstable.hg/tools/remus/kmod/sch_queue.c:51: error:
  `phys_to_machine_mapping' undeclared (first use in this function)

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agoUpdate QEMU_TAG to d463cfee458bacfe1ee3cb9c3b4ce46a6be06edf
Keir Fraser [Fri, 5 Feb 2010 10:36:43 +0000 (10:36 +0000)]
Update QEMU_TAG to d463cfee458bacfe1ee3cb9c3b4ce46a6be06edf

15 years agolibxl: Properly parse vbd names
Keir Fraser [Fri, 5 Feb 2010 10:36:17 +0000 (10:36 +0000)]
libxl: Properly parse vbd names

Implement proper parsing of vbd names, as documented here:
  From: Ian Jackson <Ian.Jackson@eu.citrix.com>
  Subject: Xen vbd numbering
  Date: Wed, 03 Feb 2010 16:51:47 GMT
  Message-ID: <19305.43376.600816.817077@mariner.uk.xensource.com>
  http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00183.html

Previously, xvd and numerical specification were broken in libxl.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
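
As a rough illustration of the kind of parsing involved, the Python
sketch below maps an xvd name or a plain number to a vbd number. It is
not the libxl code: the name parse_vbd_name is hypothetical, and the
encoding used (major 202 with 16 minors per disk, plus an extended form
with bit 28 set for larger disk or partition numbers) is an assumption
based on the usual Xen/Linux convention. The xen-devel message
referenced above remains the authority. Unrecognised names raise an
error instead of yielding a sentinel value.

```python
import re

XVD_MAJOR = 202  # assumed major number for xvd devices


def parse_vbd_name(name):
    """Return a vbd number for 'xvd<letters>[<partition>]' or a plain number.

    Hypothetical helper for illustration only; raises ValueError when the
    name is not recognised.
    """
    # Plain decimal or hexadecimal specification is taken as-is.
    if re.fullmatch(r"\d+", name):
        return int(name)
    if re.fullmatch(r"0x[0-9a-fA-F]+", name):
        return int(name, 16)

    m = re.fullmatch(r"xvd([a-z]+)(\d*)", name)
    if not m:
        raise ValueError("unrecognised virtual device name: %r" % name)

    # Letters count like spreadsheet columns: a=0 ... z=25, aa=26, ...
    disk = 0
    for ch in m.group(1):
        disk = disk * 26 + (ord(ch) - ord("a") + 1)
    disk -= 1
    part = int(m.group(2)) if m.group(2) else 0

    if disk < 16 and part < 16:
        return (XVD_MAJOR << 8) | (disk << 4) | part
    # Assumed extended encoding for larger disk/partition numbers.
    return (1 << 28) | (disk << 8) | part
```

Under these assumptions "xvda" and "xvda1" differ only in the low four
bits, and a string such as "bogus" is rejected outright.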
15 years agolibxl: notice if vbd virt device specifier ("path") unrecognised
Keir Fraser [Fri, 5 Feb 2010 10:35:57 +0000 (10:35 +0000)]
libxl: notice if vbd virt device specifier ("path") unrecognised

Previously, specifying a virtual device string for the vbd that couldn't
be parsed would result in an attempt to actually create the device
with vbd number -1!

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agoRevert changeset 20898:8c1889297084
Keir Fraser [Thu, 4 Feb 2010 19:40:19 +0000 (19:40 +0000)]
Revert changeset 20898:8c1889297084

15 years agolibxc: Reorder functions in xc_misc.c to avoid weak symbol problem
Keir Fraser [Thu, 4 Feb 2010 13:16:39 +0000 (13:16 +0000)]
libxc: Reorder functions in xc_misc.c to avoid weak symbol problem

Using a function, and then declaring it weak later, has undefined
behaviour:
  cc1: warnings being treated as errors
  xc_misc.c:388: error: weak declaration of 'xc_map_foreign_bulk'
  after first use results in unspecified behavior

So swap the functions xc_map_foreign_pages and xc_map_foreign_bulk.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agolibxenlight: Do not build libconfig, but require it as a prerequisite
Keir Fraser [Thu, 4 Feb 2010 13:16:03 +0000 (13:16 +0000)]
libxenlight: Do not build libconfig, but require it as a prerequisite

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agomem hotplug: Fix an incorrect sanity check in memory add
Keir Fraser [Thu, 4 Feb 2010 13:09:30 +0000 (13:09 +0000)]
mem hotplug: Fix an incorrect sanity check in memory add

Currently, memory hot-add fails if the newly added memory is bigger
than the current max_pages. This check is wrong, considering that a
user may hot-add the riser card with the biggest address first.

This patch fixes the issue. It checks whether all of the newly added
memory is unpopulated; if so, the hot-add is allowed.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoVT-d, tools: Intel IGD passthrough 2/2: hvm config file example
Keir Fraser [Thu, 4 Feb 2010 09:09:13 +0000 (09:09 +0000)]
VT-d, tools: Intel IGD passthrough 2/2: hvm config file example

Add an option to the hvm config file to enable graphics passthrough,
including discrete graphics and IGD. To pass graphics through to a
guest, set gfx_passthru=1 and also specify the graphics device BDF in
the pci passthrough option, e.g. pci=['xx:xx.x'], in the hvm config file.

Signed-off-by: Weidong Han <weidong.han@intel.com>
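
A minimal sketch of the relevant lines of an hvm config file (these
files use Python syntax). The BDF 00:02.0 is only a placeholder chosen
for illustration and must be replaced with the address of the actual
graphics device on the host.

```python
# Enable graphics passthrough for this HVM guest.
gfx_passthru = 1

# Pass the graphics device through by BDF (placeholder address, an assumption).
pci = ['00:02.0']
```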
15 years agoVT-d, tools: Intel IGD passthrough 1/2: vendor-specific FLR
Keir Fraser [Thu, 4 Feb 2010 09:08:05 +0000 (09:08 +0000)]
VT-d, tools: Intel IGD passthrough 1/2: vendor-specific FLR

Because the FLR capability of the IGD is not exposed on some platforms,
this patch uses a vendor-specific FLR to reset those IGDs.

Signed-off-by: Weidong Han <weidong.han@intel.com>
15 years agox86: Intel EPT entry structure changes.
Keir Fraser [Thu, 4 Feb 2010 09:04:33 +0000 (09:04 +0000)]
x86: Intel EPT entry structure changes.

 - Intel SDM defines bit6 in EPT page table entry as "Ignore PAT
   memory type", so change the abbreviation from "igmt" to "ipat".

 - Change the mfn and avail2 fields according to SDM definition.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
15 years agotools: Do not append trailing slash to XEN_ROOT in Makefiles
Keir Fraser [Thu, 4 Feb 2010 09:03:42 +0000 (09:03 +0000)]
tools: Do not append trailing slash to XEN_ROOT in Makefiles

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agox86: make max_mfn returned from XENMEM_machphys_mapping dynamic
Keir Fraser [Thu, 4 Feb 2010 08:53:49 +0000 (08:53 +0000)]
x86: make max_mfn returned from XENMEM_machphys_mapping dynamic

This helps debugging in the guest kernels, as MFNs there can then
be range checked based on the reported value.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: fix frame table initialization when hotplug memory regions were detected
Keir Fraser [Thu, 4 Feb 2010 08:53:16 +0000 (08:53 +0000)]
x86: fix frame table initialization when hotplug memory regions were detected

max_idx is not a pdx, and hence needs to be converted to one in all
cases where it is being passed to pdx_to_page().

Also, just like for max_pdx, the conversion result of max_idx may
point into an address space hole, and hence it must not be used
directly as an argument to pdx_to_page(). Note that this doesn't apply
to the arguments passed to memset(), as the size argument would be
zero in the case of hitting an address space hole.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoVT-d: fix a bug in enable_ats_device
Keir Fraser [Wed, 3 Feb 2010 09:46:38 +0000 (09:46 +0000)]
VT-d: fix a bug in enable_ats_device

enable_ats_device should enable ATS if a matching atsr unit is found
for a device, and not enable it if there is no matching atsr unit. The
current code does the opposite. This patch fixes it.

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agolibxc: Check there's enough memory for segments we're creating
Keir Fraser [Wed, 3 Feb 2010 09:46:01 +0000 (09:46 +0000)]
libxc: Check there's enough memory for segments we're creating

Previously, xc_dom_alloc_segment would go ahead even if the segment
we're trying to create is too big for the domain's RAM (or the
requested addr is out of range).  It would pass invalid parameters to
xc_dom_seg_to_ptr giving undefined behaviour.

Fixing xc_dom_seg_to_ptr to fail is not sufficient because we want to
provide a comprehensible explanation to the caller - which may
ultimately be the user.

In particular, with this change attempting "xl create" with a ramdisk
image bigger than the guest's specified RAM will provide a useful
error message mentioning the ramdisk.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agolibxc: Check full range of pfns for xc_dom_pfn_to_ptr
Keir Fraser [Wed, 3 Feb 2010 09:45:40 +0000 (09:45 +0000)]
libxc: Check full range of pfns for xc_dom_pfn_to_ptr

Previously, passing a valid pfn but an overly large count to
xc_dom_pfn_to_ptr, and functions which call it, would run off the end
of the pfn array giving undefined behaviour.

It is tempting to change this check to an assert, as no callers should
be providing invalid parameters here.  But this is probably best not
done while frozen for 4.0.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
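
The range check described above can be modelled in a few lines of
Python. This is only a sketch of the idea, not the libxc code: the
function name pfn_range_ok and the parameter total_pages are
hypothetical. The point is that both the starting pfn and the end of
the requested range must lie inside the array.

```python
def pfn_range_ok(pfn, count, total_pages):
    """Return True if pages [pfn, pfn + count) all lie within total_pages.

    Hypothetical helper; the comparison is written as a subtraction so
    that an equivalent C version could not overflow on pfn + count.
    """
    if pfn < 0 or count < 0:
        return False
    if pfn >= total_pages:
        return False
    return count <= total_pages - pfn
```

A valid pfn with an overly large count, the case this changeset
addresses, is rejected by the final comparison.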
15 years agoxl: Do not duplicate last line of config file when trying compat mode.
Keir Fraser [Wed, 3 Feb 2010 09:45:25 +0000 (09:45 +0000)]
xl: Do not duplicate last line of config file when trying compat mode.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agoxend: Disallow "/" in domain names
Keir Fraser [Wed, 3 Feb 2010 09:45:02 +0000 (09:45 +0000)]
xend: Disallow "/" in domain names

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
15 years agotboot: fix S3 issue for Intel Trusted Execution Technology.
Keir Fraser [Wed, 3 Feb 2010 09:44:12 +0000 (09:44 +0000)]
tboot: fix S3 issue for Intel Trusted Execution Technology.

Unmapped pages cause a page fault when they are MACed, which
ultimately causes S3 to fail.

Signed-off-by: Shane Wang <shane.wang@intel.com>
15 years agotools/xen-detect: fix printing xen version
Keir Fraser [Wed, 3 Feb 2010 09:42:45 +0000 (09:42 +0000)]
tools/xen-detect: fix printing xen version

check_for_xen() should return xen version rather than
boolean true if signature XenVMM is found.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agolibxc: Export do_mca hypercall to user space tools.
Keir Fraser [Wed, 3 Feb 2010 09:38:57 +0000 (09:38 +0000)]
libxc: Export do_mca hypercall to user space tools.

This is mainly for software-triggered MCE operation, so that test
suites can trigger an MCE from software.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoRemus: ensure kernel modules are built before attempting to install
Keir Fraser [Wed, 3 Feb 2010 09:38:00 +0000 (09:38 +0000)]
Remus: ensure kernel modules are built before attempting to install
them

make tools seems to skip straight to the install target.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
15 years agoRemus: use imqebt instead of ebtables
Keir Fraser [Wed, 3 Feb 2010 09:37:40 +0000 (09:37 +0000)]
Remus: use imqebt instead of ebtables

I added a standalone IMQ-aware ebtables version in the initial Remus
submission, but forgot to actually activate it. This fixes that
oversight.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
15 years agoxentrace: Disable tracing, then read records one more time.
Keir Fraser [Wed, 3 Feb 2010 09:36:37 +0000 (09:36 +0000)]
xentrace: Disable tracing, then read records one more time.

When interrupted, first disable tracing, then read through the records
one last time.

Without this patch, it's possible to get traces which interact (such
as runstate changes) on processors with higher numbers, while missing
the corresponding traces generated on lower-numbered processors.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoxentrace: Clear lost records when disabling tracing
Keir Fraser [Wed, 3 Feb 2010 09:35:56 +0000 (09:35 +0000)]
xentrace: Clear lost records when disabling tracing

This patch clears the "lost records" flag on each cpu when tracing is
disabled.

Without this patch, the next time tracing starts, cpus with lost
records will generate lost record traces, even though buffers are
empty and no tracing has recently happened.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoxentrace: Trace p2m events
Keir Fraser [Wed, 3 Feb 2010 09:35:23 +0000 (09:35 +0000)]
xentrace: Trace p2m events

Add more tracing to aid in debugging ballooning / PoD:
* Nested page faults for EPT/NPT systems
* set_p2m_entry
* Decrease reservation (for ballooning)
* PoD populate, zero reclaim, superpage splinter

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoUpdate QEMU_TAG to 575ed1016f6fba1c6a6cd32a828cb468bdee96bb
Keir Fraser [Wed, 3 Feb 2010 09:33:12 +0000 (09:33 +0000)]
Update QEMU_TAG to 575ed1016f6fba1c6a6cd32a828cb468bdee96bb

15 years agohvmloader: Fix CPU hotplug notify handler in ACPI DSDT.
Keir Fraser [Wed, 3 Feb 2010 09:16:11 +0000 (09:16 +0000)]
hvmloader: Fix CPU hotplug notify handler in ACPI DSDT.

By merging PRSC and NTFY methods we simplify the code, improve
efficiency, and fix a bug where PRSC iterated 0-255 but NTFY could
only handle CPU numbers 0-127.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agopygrub: support parsing of syslinux configuration files
Keir Fraser [Mon, 1 Feb 2010 14:03:47 +0000 (14:03 +0000)]
pygrub: support parsing of syslinux configuration files

Allows booting from ISOs which use isolinux as well as guests using
extlinux.

Also add a copyright header to GrubConf.py; I think the grub2 support
added last year qualifies as a substantial change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
15 years agohvm, s3: HVM guest RTCs become unsync'ed across host S3.
Keir Fraser [Mon, 1 Feb 2010 14:03:06 +0000 (14:03 +0000)]
hvm, s3: HVM guest RTCs become unsync'ed across host S3.

Signed-off-by: Kamala Narasimhan <kamala.narasimhan@citrix.com>
15 years agotools/gtraceview: fix SIGFPE
Keir Fraser [Fri, 29 Jan 2010 08:59:46 +0000 (08:59 +0000)]
tools/gtraceview: fix SIGFPE

If there are 0 or 1 valid records in the xentrace file,
SIGFPE will occur. Fix it.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agoblktap2: Prefer AIO eventfd support on kernels >= 2.6.22
Keir Fraser [Fri, 29 Jan 2010 08:55:27 +0000 (08:55 +0000)]
blktap2: Prefer AIO eventfd support on kernels >= 2.6.22

Mainline kernel support for eventfd(2) in linux aio was added between
2.6.21 and 2.6.22. Libaio after 0.3.107 has the header file, but
presently few systems support it. Neither do we rely on an up-to-date
libc6.

Instead, this patch adds a header which defines custom iocb_common
struct, and works around a potentially missing sys/eventfd.h.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
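
To illustrate the version threshold mentioned above, the sketch below
compares a kernel release string against 2.6.22. It is a Python model
only and not how blktap2 decides: the function name kernel_at_least and
the idea of parsing a uname-style release string are assumptions made
for illustration.

```python
import re


def kernel_at_least(release, minimum=(2, 6, 22)):
    """Return True if a release string such as '2.6.22-xen' is >= minimum.

    Hypothetical helper; a missing third component counts as 0 and an
    unparsable string is treated as too old.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return False
    version = tuple(int(p) if p else 0 for p in m.groups())
    return version >= minimum
```

Comparing tuples rather than strings avoids mistakes such as treating
"2.6.9" as newer than "2.6.22".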
15 years agoblktap2: Separate tapdisk raw I/O into different backends.
Keir Fraser [Fri, 29 Jan 2010 08:54:51 +0000 (08:54 +0000)]
blktap2: Separate tapdisk raw I/O into different backends.

Hide tapdisk support for different raw I/O interfaces behind a new
struct tio. Libaio continues to dominate the interface, requiring
everyone to dispatch iocb/ioevent structs.

Backends:
 - lio:  Kernel AIO via libaio.
 - rwio: Canonical read/write() mode.

Misc:
 - Fixes a bug in tapdisk-vbd which locks up the sync io mode.
 - Wants a PERROR macro in blktaplib.h
 - Removes dead code in qcow2raw to make it link again.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jake Wires <jake.wires@citrix.com>
15 years agoblktap2: Sort out tapdisk AIO init.
Keir Fraser [Fri, 29 Jan 2010 08:54:22 +0000 (08:54 +0000)]
blktap2: Sort out tapdisk AIO init.

Move event callback registration into tapdisk-queue. This should also
obsolete the dummy pollfd pipe in the synchronous I/O case.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
15 years agoblktap2: Sort out tapdisk IPC init.
Keir Fraser [Fri, 29 Jan 2010 08:53:52 +0000 (08:53 +0000)]
blktap2: Sort out tapdisk IPC init.

Move I/O and event callbacks setup out of tapdisk-server, into
tapdisk-ipc.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
15 years agolibelf: make elf_phdr_is_loadable load read-only segments.
Keir Fraser [Fri, 29 Jan 2010 07:14:32 +0000 (07:14 +0000)]
libelf: make elf_phdr_is_loadable load read-only segments.

From: Brad Plant <bplant@iinet.net.au>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agopv-on-hvm: Correct the order of the argument of out*()
Keir Fraser [Fri, 29 Jan 2010 07:10:28 +0000 (07:10 +0000)]
pv-on-hvm: Correct the order of the argument of out*()

The order of the arguments of outl() is wrong.
The correct order is outl(value, port). The wrong order causes a kernel panic.

The same applies to outw().

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agox86 mca: Be more careful for printk in MCE context
Keir Fraser [Fri, 29 Jan 2010 06:50:23 +0000 (06:50 +0000)]
x86 mca: Be more careful for printk in MCE context

An MCE may happen in printk context, and calling printk again in MCE
context will then cause a deadlock.

A new level (mce_critical) is added to mce_verbosity for printk in MCE
context. This level is only for developers who are aware of this issue.
In mce_panic, force console unlock.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86 mca: Add MCE broadcast checking.
Keir Fraser [Fri, 29 Jan 2010 06:49:42 +0000 (06:49 +0000)]
x86 mca: Add MCE broadcast checking.

Some platforms broadcast MCEs to all logical processors, while others
do not. Distinguishing these platforms is helpful for a
unified MCA handler.

"mce_fb" is an option to emulate broadcast MCA on non-broadcast
platforms. This is mainly for software-triggered MCA.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86 mca: Fix the vMCE address translation for HVM guest.
Keir Fraser [Fri, 29 Jan 2010 06:49:13 +0000 (06:49 +0000)]
x86 mca: Fix the vMCE address translation for HVM guest.

Fix address translation when we inject a virtual MCE into an HVM guest.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86 mca: Add the missed put_domain in UCR handler function.
Keir Fraser [Fri, 29 Jan 2010 06:48:37 +0000 (06:48 +0000)]
x86 mca: Add the missed put_domain in UCR handler function.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86 mca: Don't GP fault when guest writes non 0s or 1s to MCA CTL MSRs.
Keir Fraser [Fri, 29 Jan 2010 06:48:00 +0000 (06:48 +0000)]
x86 mca: Don't GP fault when guest writes non 0s or 1s to MCA CTL MSRs.

a) For the Mci_CTL MSR, the guest can write any value to it. When read
back, the value is ANDed with the physical value. Some bits in the
physical value can be 0, either because they are read-only in hardware
(e.g. masked by AMD's Mci_CTL_MASK), or because Xen didn't enable them.
    If the guest writes some bit as 0 while that bit is 1 in the host,
    we will not inject MCEs for that bank into the guest, as we can't
    distinguish whether the MCE was caused by the guest-cleared bit.

b) For the MCG_CTL MSR, the guest can write any value to it. When read
back, the value is ANDed with the physical value.
    If the guest does not write all 1s, then in mca_ctl_conflict() we
    simply do not inject any vMCE into the guest if some bit is set in
    the physical MSR while it is cleared in the guest's vMCG_CTL MSR.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
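
The two rules above amount to simple bit arithmetic, modelled here in
Python. This is a sketch under stated assumptions, not the hypervisor
code: read_ctl and ctl_conflict are hypothetical names, and MSR values
are plain non-negative integers.

```python
def read_ctl(guest_value, host_value):
    """Value the guest reads back: what it wrote, ANDed with the host value."""
    return guest_value & host_value


def ctl_conflict(guest_value, host_value):
    """True if some bit is set in the host MSR but cleared by the guest.

    In that case no vMCE would be injected, since the error may stem
    from a source the guest has disabled.
    """
    return (host_value & ~guest_value) != 0
```

For example, a guest that writes all 1s never conflicts, whereas
clearing any bit that the host has set does.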
15 years agox86 mca: Handle the vMCA bank correctly
Keir Fraser [Fri, 29 Jan 2010 06:47:24 +0000 (06:47 +0000)]
x86 mca: Handle the vMCA bank correctly

Currently the virtual MCE MSR code assumes that all MSRs in the range
0 to MAX_NR_BANKS are always MCE MSRs, which is not always correct. With
this patch, mce_rdmsr/mce_wrmsr only handle vMCE MSRs in the range from 0
to the number of MCA banks in the host platform.
Please note that some MSRs beyond the current MCA banks in the host
platform are really MCA MSRs, which should be handled by the general MSR
handler.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86: Clean up c/s 20844:ca0759a08057
Keir Fraser [Fri, 29 Jan 2010 06:45:45 +0000 (06:45 +0000)]
x86: Clean up c/s 20844:ca0759a08057

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxend: destroy restored domain when its device doesn't exist
Keir Fraser [Wed, 27 Jan 2010 08:59:47 +0000 (08:59 +0000)]
xend: destroy restored domain when its device doesn't exist

A migrated domain keeps on running even though its disk doesn't
exist. This situation is undesirable.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
15 years agopygrub: improve grub 2 support
Keir Fraser [Tue, 26 Jan 2010 15:54:40 +0000 (15:54 +0000)]
pygrub: improve grub 2 support

* The "default" value can be a quoted string (containing an integer)
  so strip the quotes before interpreting.
* The "set" command takes a variable with an arbitrary name so instead
  of whitelisting the ones to ignore simply silently accept any set
  command with an unknown variable.
* Ignore the echo command.
* Handle the function { ... } syntax. Previously pygrub would error
  out with a syntax error on the closing "}" because it thought it was
  the closing bracket of a menuentry.

This makes pygrub2 work with the configuration files generated by
Debian Squeeze today.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
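
The first point above, stripping quotes from the "default" value before
interpreting it, can be sketched as follows. This is not the pygrub
code; parse_default is a hypothetical helper and it assumes the value
is an integer index once the quotes are removed.

```python
def parse_default(value):
    """Return the default entry index from a grub 2 'default' value.

    Hypothetical helper: accepts 0, "0" or '0' style values and raises
    ValueError if what remains is not an integer.
    """
    stripped = value.strip().strip("\"'")
    return int(stripped)
```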
15 years agox86: Polarity-switch method only effective in non-directed EOI case.
Keir Fraser [Tue, 26 Jan 2010 15:54:09 +0000 (15:54 +0000)]
x86: Polarity-switch method only effective in non-directed EOI case.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
15 years agox86: reduce EOI stack's size in per-cpu area.
Keir Fraser [Tue, 26 Jan 2010 15:53:52 +0000 (15:53 +0000)]
x86: reduce EOI stack's size in per-cpu area.

Only dynamic vectors use the EOI stack, so the size
can be safely reduced to NR_DYNAMIC_VECTORS.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>

15 years agox86: Directly clear all pending EOIs once MSI info changed
Keir Fraser [Tue, 26 Jan 2010 15:53:01 +0000 (15:53 +0000)]
x86: Directly clear all pending EOIs once MSI info changed

For unmaskable MSI, the deferred EOI policy only aims to avoid
IRQ storms. It should be safe to clear pending
EOIs in advance when guest irq migration occurs, because the next
interrupt's EOI write is still deferred, and so still avoids a
storm.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
15 years agox86: Revert Cset 20334:dcc5d5d954e9
Keir Fraser [Tue, 26 Jan 2010 15:52:30 +0000 (15:52 +0000)]
x86: Revert Cset 20334:dcc5d5d954e9

Recording old MSI info doesn't solve all the corner cases
when guest's irq migration occurs.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>

15 years agoUpdate Xen version to 4.0.0-rc3-pre
Keir Fraser [Tue, 26 Jan 2010 15:51:53 +0000 (15:51 +0000)]
Update Xen version to 4.0.0-rc3-pre

15 years agoAdded tag 4.0.0-rc2 for changeset e5e4573bcaba
Keir Fraser [Tue, 26 Jan 2010 14:15:05 +0000 (14:15 +0000)]
Added tag 4.0.0-rc2 for changeset e5e4573bcaba

15 years agoUpdate Xen version to 4.0.0-rc2 4.0.0-rc2
Keir Fraser [Tue, 26 Jan 2010 14:15:01 +0000 (14:15 +0000)]
Update Xen version to 4.0.0-rc2

15 years agoVT-d: add "iommu=workaround_bios_bug" option
Keir Fraser [Tue, 26 Jan 2010 07:51:20 +0000 (07:51 +0000)]
VT-d: add "iommu=workaround_bios_bug" option

Add this option to work around BIOS bugs. Currently it ignores a DRHD
if "all" devices under its scope are not pci discoverable. This
works around a BIOS bug on some platforms to make VT-d work. But note
that this option doesn't guarantee security, because it might ignore
a DRHD.

So there are 3 options which handle BIOS bugs differently:
  iommu=1 (default): If a non-existent device is detected under a DRHD's
scope, or an incorrect RMRR setting is found (base_address > end_address),
disable VT-d completely in Xen with warning messages. This guarantees
security when VT-d is enabled, or just disables VT-d to let Xen work
without VT-d.
  iommu=force: force VT-d to be enabled in Xen. If VT-d cannot be
enabled, Xen will crash. This is mainly for users who must have
VT-d.
  iommu=workaround_bios_bug: work around some BIOS bugs to make
VT-d still work.  This might be insecure because there might be a
device not protected by any DRHD if the device is re-enabled by
malicious s/w.  This is for users who want to use VT-d regardless of
security.

Signed-off-by: Weidong Han <weidong.han@intel.com>
15 years agotools/xsm: Expose Flask XSM AVC functions to user-space
Keir Fraser [Tue, 26 Jan 2010 07:50:04 +0000 (07:50 +0000)]
tools/xsm: Expose Flask XSM AVC functions to user-space

This patch exposes the flask_access, flask_avc_cachestats,
flask_avc_hashstats, flask_getavc_threshold, flask_setavc_threshold,
and flask_policyvers functions to user-space. A python wrapper was
created for the flask_access function to facilitate policy based
user-space access control decisions. flask.h was renamed to libflask.h
to remove a naming conflict.

Signed-off-by: Machon Gregory <mbgrego@tycho.ncsc.mil>

15 years agolibxl: Fix libconfig install directory
Keir Fraser [Sat, 23 Jan 2010 08:28:01 +0000 (08:28 +0000)]
libxl: Fix libconfig install directory

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
15 years agopv-on-hvm: Only unplug emulated devices if requested via module parameter.
Keir Fraser [Sat, 23 Jan 2010 08:26:23 +0000 (08:26 +0000)]
pv-on-hvm: Only unplug emulated devices if requested via module parameter.

dev_unplug=[all,][ide-disks,][aux-ide-disks,][nics]

ide-disks: Unplug all emulated IDE disks (but not CD-ROMs)
aux-ide-disks: As above, but doesn't touch primary IDE master
nics: Unplug all emulated NICs
all: ide-disks and nics

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
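
The option syntax above can be modelled with a short Python sketch. It
is not the module's own parser (which lives in kernel code);
parse_dev_unplug is a hypothetical name, and rejecting unknown tokens
with an error is an assumption made for illustration.

```python
VALID_CLASSES = {"ide-disks", "aux-ide-disks", "nics"}


def parse_dev_unplug(arg):
    """Return the set of device classes selected by a dev_unplug= value.

    'all' expands to ide-disks and nics, as described above.
    """
    selected = set()
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty items such as a trailing comma
        if token == "all":
            selected |= {"ide-disks", "nics"}
        elif token in VALID_CLASSES:
            selected.add(token)
        else:
            raise ValueError("unknown dev_unplug class: %r" % token)
    return selected
```

An empty value selects nothing, which matches the intent that emulated
devices are only unplugged when explicitly requested.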
15 years agoVT-d: improve RMRR validity checking
Keir Fraser [Sat, 23 Jan 2010 08:23:24 +0000 (08:23 +0000)]
VT-d: improve RMRR validity checking

In order to make Xen more defensive against VT-d related BIOS issues,
this patch ignores a DRHD if all devices under its scope are not pci
discoverable, and regards a DRHD as invalid and then disables the whole
of VT-d if some devices under its scope are not pci discoverable. But if
iommu=force is set, it will enable all DRHDs reported by the BIOS, to
avoid any security vulnerability from malicious s/w re-enabling
"supposed disabled" devices.  Please note that we don't know whether the
devices under the "Include_all" DRHD exist or not, because the scope of
the "Include_all" DRHD doesn't enumerate common pci devices; it only
enumerates I/OxAPIC and HPET devices.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
15 years agoGet libconfig tarball from xenbits
Keir Fraser [Fri, 22 Jan 2010 13:32:26 +0000 (13:32 +0000)]
Get libconfig tarball from xenbits

Download libconfig.tar.gz from xenbits.org extfiles rather than from
upstream.  This insulates us from upstream networking failures and any
upstream changes to the files hosted etc.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agox86: check if desc->action is NULL when unbinding guest pirq
Keir Fraser [Fri, 22 Jan 2010 11:01:18 +0000 (11:01 +0000)]
x86: check if desc->action is NULL when unbinding guest pirq

Before the igb PF driver is unloaded, dom0 doesn't unload the igbvf
driver automatically. When the igb driver is unloaded, it invokes the
PHYSDEVOP_manage_pci_remove hypercall to remove the VFs, and xen frees
the msi irqs by pci_cleanup_msi() -> ... -> dynamic_irq_cleanup() and
sets the desc->action to NULL.  The igbvf driver learns that the VF is
disappearing via a hook ndo_stop() in dev_close() and tries to unbind
the pirq, and xen would crash as the desc->action is NULL now.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
15 years agoblktap: fix blktapctrl abort
Keir Fraser [Fri, 22 Jan 2010 11:00:45 +0000 (11:00 +0000)]
blktap: fix blktapctrl abort

On rebooting an hvm guest, the blktapctrl daemon dies.

gdb shows the following call trace:
(gdb) where
#0  0x00000039d1830155 in raise () from /lib64/libc.so.6
#1  0x00000039d1831bf0 in abort () from /lib64/libc.so.6
#2  0x00000039d186a38b in __libc_message () from /lib64/libc.so.6
#3  0x00000039d1871634 in _int_free () from /lib64/libc.so.6
#4  0x00000039d1874c5c in free () from /lib64/libc.so.6
#5  0x0000003320a01bdd in ueblktap_probe (h=0x6073b0,
    w=<value optimized out>, bepath_im=<value optimized out>) at
    xenbus.c:270
#6  0x0000003320a020e0 in xs_fire_next_watch (h=0x6073b0) at
xs_api.c:355
#7  0x0000000000401785 in main (argc=<value optimized out>,
    argv=<value optimized out>) at blktapctrl.c:907

There is a case where "/local/domain/0/backend/tap/<dom_id>" exists but
"/local/domain/<dom_id>/vm" is not in the xenstore.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
15 years agolibxc: mmapbatch-v2 adjustments
Keir Fraser [Fri, 22 Jan 2010 10:59:51 +0000 (10:59 +0000)]
libxc: mmapbatch-v2 adjustments

Just like the kernel, the fallback implementation of
xc_map_foreign_bulk() should clear the error indication array upon
success.

Also, a few allocations were needlessly using calloc() instead of
malloc().

Finally, in xc_domain_save() allocate the error indicator array once
(along with the other arrays) instead of using realloc() (without
error checking) in the loop body.
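
As a rough illustration of the last two points, the sketch below allocates the error array once, checks the allocation, and clears the array for each batch. `MAX_BATCH` and `save_in_batches()` are hypothetical names invented for this example; the real xc_domain_save() is considerably more involved.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 1024  /* hypothetical batch size, for illustration only */

/*
 * Walks `total` pages in batches of at most MAX_BATCH.
 * Returns the number of batches processed, or -1 if allocation fails.
 */
long save_in_batches(size_t total)
{
    long batches = 0;
    size_t done = 0;

    /* Allocate once, up front, and check the result. */
    int *pfn_err = malloc(MAX_BATCH * sizeof(*pfn_err));
    if (pfn_err == NULL)
        return -1;

    while (done < total) {
        size_t batch = total - done;
        if (batch > MAX_BATCH)
            batch = MAX_BATCH;

        /* Stand-in for a successful bulk map: clear the error indications. */
        memset(pfn_err, 0, batch * sizeof(*pfn_err));

        done += batch;
        batches++;
    }

    free(pfn_err);
    return batches;
}
```

Because the array is sized for the largest batch, the loop body never needs to grow it, so there is no unchecked realloc() on the hot path.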

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agolibxc: New hcall_buf_{prep,release} pre-mlock interface
Keir Fraser [Fri, 22 Jan 2010 10:59:03 +0000 (10:59 +0000)]
libxc: New hcall_buf_{prep,release} pre-mlock interface

Allow certain performance-critical hypercall wrappers to register data
buffers via a new interface which allows them to be 'bounced' into a
pre-mlock'ed page-sized per-thread data area. This saves the cost of
mlock/munlock on every such hypercall, which can be very expensive on
modern kernels.
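
The bouncing idea might look roughly like the sketch below. It is not the libxc implementation: `bounce_prep()`, `bounce_release()`, and `BOUNCE_SIZE` are hypothetical names, the page size is assumed to be 4096 bytes, and a NULL return is taken to mean that the caller falls back to locking its own buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define BOUNCE_SIZE 4096  /* assumed page size */

/* One page-sized bounce area per thread, locked on first use. */
static _Thread_local struct {
    _Alignas(BOUNCE_SIZE) unsigned char page[BOUNCE_SIZE];
    int locked;   /* 1 once mlock() has succeeded for this thread */
    void *user;   /* caller buffer currently bounced, or NULL */
    size_t len;   /* length of the bounced data */
} bounce;

/*
 * Copies `buf` into the pre-locked area and returns the address to pass
 * to the hypercall, or NULL if the caller must use the slow path.
 */
void *bounce_prep(void *buf, size_t len)
{
    if (len > BOUNCE_SIZE || bounce.user != NULL)
        return NULL;

    if (!bounce.locked) {
        if (mlock(bounce.page, BOUNCE_SIZE) != 0)
            return NULL;      /* e.g. RLIMIT_MEMLOCK too low */
        bounce.locked = 1;    /* pay the mlock() cost only once */
    }

    memcpy(bounce.page, buf, len);
    bounce.user = buf;
    bounce.len = len;
    return bounce.page;
}

/* Copies results back to the caller's buffer and frees the bounce area. */
void bounce_release(void *p)
{
    assert(p == (void *)bounce.page && bounce.user != NULL);
    memcpy(bounce.user, bounce.page, bounce.len);
    bounce.user = NULL;
}
```

The per-call cost becomes two memcpy() calls of at most one page, in place of an mlock()/munlock() pair.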

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agox86: kill msix_flush_writes()
Keir Fraser [Thu, 21 Jan 2010 15:13:00 +0000 (15:13 +0000)]
x86: kill msix_flush_writes()

The (only) two callers of it don't need it, as the MSI-X case of
msi_set_mask_bit() already does the necessary readl().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: dump full IRQ affinity
Keir Fraser [Thu, 21 Jan 2010 15:12:38 +0000 (15:12 +0000)]
x86: dump full IRQ affinity

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: add keyhandler to dump MSI state
Keir Fraser [Thu, 21 Jan 2010 15:12:17 +0000 (15:12 +0000)]
x86: add keyhandler to dump MSI state

Equivalent to dumping IO-APIC state; the question is whether this
ought to live on its own key (as done here), or whether it should be
chained to from the 'i' handler.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoxend: Dis-allow device assignment if PoD is enabled.
Keir Fraser [Thu, 21 Jan 2010 14:40:05 +0000 (14:40 +0000)]
xend: Dis-allow device assignment if PoD is enabled.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
15 years agotools: fix sysfs error path
Keir Fraser [Thu, 21 Jan 2010 11:27:11 +0000 (11:27 +0000)]
tools: fix sysfs error path

Attached patch fixes sysfs error path.
NetBSD also has a /proc/mounts file but no sysfs.
On Linux you can test this with sysfs not mounted.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
15 years agoVT-d: warn on bogus RMRR entry
Keir Fraser [Thu, 21 Jan 2010 11:26:26 +0000 (11:26 +0000)]
VT-d: warn on bogus RMRR entry

Signed-off-by: Weidong Han <weidong.han@intel.com>
15 years agoxentrace: XC_PAGE_SIZE should be used
Keir Fraser [Thu, 21 Jan 2010 09:13:46 +0000 (09:13 +0000)]
xentrace: XC_PAGE_SIZE should be used

20827:fad80160c001 cannot be compiled on ia64:
  xentrace.c:647: error: 'PAGE_SIZE' undeclared (first use in this

This patch fixes it.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agoVT-d: improve RMRR validity checking
Keir Fraser [Thu, 21 Jan 2010 09:12:01 +0000 (09:12 +0000)]
VT-d: improve RMRR validity checking

Currently, Xen rigorously checks the RMRR ranges and disables VT-d if
an RMRR range is set incorrectly in the BIOS. However, an RMRR can be
ignored if the devices under its scope are not PCI discoverable,
because the RMRR won't be used by non-existent or disabled devices.

This patch ignores an RMRR if the devices under its scope are not PCI
discoverable, and only checks the validity of RMRRs that are actually
used. To avoid duplicating the PCI device detection code, this patch
defines a function, pci_device_detect, for it.

Signed-off-by: Weidong Han <weidong.han@intel.com>
15 years agoVT-d: handle return value of deassign_device
Keir Fraser [Thu, 21 Jan 2010 09:11:06 +0000 (09:11 +0000)]
VT-d: handle return value of deassign_device

deassign_device may fail, so its failure needs to be captured for
appropriate handling. This patch captures the return values of
deassign_device and prints error messages if it fails.

In addition, this patch also fixes some code style issues.

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agolibxc: Unbreak HVM live migration after 0b138a019292.
Keir Fraser [Thu, 21 Jan 2010 09:03:20 +0000 (09:03 +0000)]
libxc: Unbreak HVM live migration after 0b138a019292.

0b138a019292 was a little too ambitious replacing xc_map_foreign_batch
with xc_map_foreign_pages in xc_domain_restore. With HVM, some of the
mappings are expected to fail (as "XTAB" pages).

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
15 years agoxend: Unbreak live migration with tapdisk2 after 20691:054042ba73b6
Keir Fraser [Thu, 21 Jan 2010 09:03:00 +0000 (09:03 +0000)]
xend: Unbreak live migration with tapdisk2 after 20691:054042ba73b6

vm.image does not exist at this point in the restore process.
I haven't looked at the memory_sharing code. It's likely something
better is needed to make that work across relocation.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
15 years agolibxl, hvm: Add support to trigger power or sleep button events
Keir Fraser [Wed, 20 Jan 2010 20:36:19 +0000 (20:36 +0000)]
libxl, hvm: Add support to trigger power or sleep button events

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
15 years agohvm: Add ACPI fixed sleep button
Keir Fraser [Wed, 20 Jan 2010 20:34:19 +0000 (20:34 +0000)]
hvm: Add ACPI fixed sleep button

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
15 years agoxentrace: Per-cpu xentrace buffers
Keir Fraser [Wed, 20 Jan 2010 20:33:35 +0000 (20:33 +0000)]
xentrace: Per-cpu xentrace buffers

In the current xentrace configuration, xentrace buffers are all
allocated in a single contiguous chunk, and then divided among logical
cpus, one buffer per cpu.  The size of an allocatable chunk is fairly
limited, in my experience about 128 pages (512KiB).  As the number of
logical cores increases, this means a much smaller maximum trace
buffer per cpu; on my dual-socket quad-core nehalem box with
hyperthreading (16 logical cpus), that comes to 8 pages per logical
cpu.
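
The arithmetic behind those figures can be checked with a small sketch. `pages_per_cpu()` and `PAGE_BYTES` are names made up for this example; the 4096-byte page size follows from 512KiB / 128 pages.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u  /* 512 KiB / 128 pages */

/* Pages each logical cpu gets when one chunk is split evenly. */
static size_t pages_per_cpu(size_t chunk_pages, unsigned int ncpus)
{
    assert(ncpus > 0);
    return chunk_pages / ncpus;
}
```

For a 128-page chunk and 16 logical cpus, `pages_per_cpu(128, 16)` gives 8 pages, i.e. 32KiB of trace buffer per cpu.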

This patch addresses this issue by allocating per-cpu buffers
separately.

Signed-off-by: George Dunlap <dunlapg@umich.edu>
15 years agoxend: Fix 20825:49a2c1069e14
Keir Fraser [Wed, 20 Jan 2010 09:51:38 +0000 (09:51 +0000)]
xend: Fix 20825:49a2c1069e14

When converting a Python Int, sizeof(long) already returns the byte
length rather than the bit length, so do not divide by 8.
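
A minimal sketch of the unit mix-up, using hypothetical helper names that do not appear in the xend code:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* sizeof yields a size in bytes, so this is already a byte length. */
static size_t long_byte_len(void)
{
    return sizeof(long);
}

/* The bit length needs an explicit multiplication by CHAR_BIT. */
static size_t long_bit_len(void)
{
    assert(CHAR_BIT >= 8);  /* guaranteed by the C standard */
    return sizeof(long) * CHAR_BIT;
}

/* The mistaken variant: treats sizeof(long) as bits and divides by 8. */
static size_t long_byte_len_buggy(void)
{
    return sizeof(long) / 8;
}
```

On a platform with an 8-byte long the mistaken variant yields 1 byte, and with a 4-byte long it yields 0, so the destination buffer would be far too short.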

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxend: Properly interpret vcpu_avail Long Integer in xc.hvm_build().
Keir Fraser [Wed, 20 Jan 2010 09:33:59 +0000 (09:33 +0000)]
xend: Properly interpret vcpu_avail Long Integer in xc.hvm_build().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoEnable IOMMU by default.
Keir Fraser [Tue, 19 Jan 2010 15:44:54 +0000 (15:44 +0000)]
Enable IOMMU by default.

Can be disabled with 'iommu=0' boot parameter.

Note that iommu_inclusive_mapping is now also enabled by default, to
deal with systems with broken BIOS tables specifying bad RMRRs. Old
behaviour can be specified via 'iommu_inclusive_mapping=0'.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agox86: Clean up TSC_RELIABLE handling after 20705:a74aca4b9386
Keir Fraser [Tue, 19 Jan 2010 10:56:59 +0000 (10:56 +0000)]
x86: Clean up TSC_RELIABLE handling after 20705:a74aca4b9386

Set the feature by default and disable it if we can detect TSC warp,
rather than leaving the feature cleared and setting it if we happen
not to detect TSC warp.
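
The change in default can be sketched as follows. `FEATURE_TSC_RELIABLE` and `init_features()` are hypothetical; the sketch only mirrors the "set by default, clear on detection" logic, not Xen's cpufeature handling.

```c
#include <assert.h>
#include <stdbool.h>

#define FEATURE_TSC_RELIABLE (1u << 0)  /* hypothetical feature bit */

/* Start with the feature set; clear it only if warp was detected. */
static unsigned int init_features(bool warp_detected)
{
    unsigned int features = FEATURE_TSC_RELIABLE;

    if (warp_detected)
        features &= ~FEATURE_TSC_RELIABLE;

    /* Only this one bit is ever in play in the sketch. */
    assert((features & ~FEATURE_TSC_RELIABLE) == 0);
    return features;
}
```

Under this ordering, a warp check that finds nothing leaves the feature set, where the old ordering needed a successful "no warp" result to set it.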

This way round fixes dom0 kernel boot for Masaki Kanno.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxc_domain_save: allocate pfn_err before use
Keir Fraser [Tue, 19 Jan 2010 09:40:30 +0000 (09:40 +0000)]
xc_domain_save: allocate pfn_err before use

Due to recent changes related to xc_map_foreign_bulk, xc_domain_save
segfaults because it tries to use pfn_err without allocating it first.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
15 years agolibxl: fix "xl list" output
Keir Fraser [Mon, 18 Jan 2010 14:49:00 +0000 (14:49 +0000)]
libxl: fix "xl list" output

This simple patch fixes the "xl list" output and cleans
libxl_list_domain after the recent API changes to list domains and
VMs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
15 years agominios: implement xc_map_foreign_bulk
Keir Fraser [Mon, 18 Jan 2010 14:48:18 +0000 (14:48 +0000)]
minios: implement xc_map_foreign_bulk

In order to do so it modifies map_frames_ex and do_map_frames to take
an int *err as parameter and return any error that way.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
15 years agoRevert 20746:042b371d8728 --- Breaks stubdoms.
Keir Fraser [Mon, 18 Jan 2010 10:37:28 +0000 (10:37 +0000)]
Revert 20746:042b371d8728 --- Breaks stubdoms.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>