xenbits.xensource.com Git - xen.git/log
xen.git
15 years agox86: fix NUMA handling (c/s 20599:e5a757ce7845)
Keir Fraser [Fri, 8 Jan 2010 11:22:41 +0000 (11:22 +0000)]
x86: fix NUMA handling (c/s 20599:e5a757ce7845)

c/s 20599 caused the hash shift to become significantly smaller on
systems with an SRAT like this

(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 100000-80000000
(XEN) SRAT: Node 1 PXM 1 80000000-d0000000
(XEN) SRAT: Node 1 PXM 1 100000000-130000000

Combined with the static size of the memnodemap[] array, NUMA therefore
got disabled on such systems. The backport from Linux was really
incomplete, as Linux had much earlier already introduced a dynamically
allocated memnodemap[].
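
To illustrate why a small shift and a static array do not go together, here is a minimal sketch of the shift computation. It assumes the shift is the lowest bit set in any range start address (the approach of the Linux code this was backported from); compute_hash_shift and memnodemap_size are hypothetical names, and byte addresses are used instead of page or pdx indices for readability.

```python
# Illustrative sketch only, not the Xen implementation.
# (node, start, end) byte ranges taken from the SRAT output above.
SRAT = [
    (0, 0x0, 0xA0000),
    (0, 0x100000, 0x80000000),
    (1, 0x80000000, 0xD0000000),
    (1, 0x100000000, 0x130000000),
]


def compute_hash_shift(ranges):
    """Return the position of the lowest bit set in any range start."""
    bitfield = 0
    for _node, start, end in ranges:
        if start < end:  # skip empty ranges
            bitfield |= start
    if bitfield == 0:
        raise ValueError("no non-zero start address to derive a shift from")
    # bitfield & -bitfield isolates the lowest set bit.
    return (bitfield & -bitfield).bit_length() - 1


def memnodemap_size(ranges, shift):
    """Number of map entries needed to cover memory up to the highest end."""
    top = max(end for _node, _start, end in ranges)
    return (top >> shift) + 1


shift = compute_hash_shift(SRAT)
print(shift, memnodemap_size(SRAT, shift))
```

In this sketch the range starting at 0x100000 limits the shift, so the map needs several thousand entries for this layout, which is more than a small fixed-size array can hold.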

Further, doing to/from pdx translations on addresses just past a valid
range is not correct, as it may strip/fail to insert non-zero bits in
this case.

Finally, using 63 as the cover-it-all shift value is invalid on 32bit,
since pdx values are unsigned long.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoHandle PoD case in hvm_hap_nested_page_fault()
Keir Fraser [Wed, 6 Jan 2010 12:45:23 +0000 (12:45 +0000)]
Handle PoD case in hvm_hap_nested_page_fault()

The new combined nested page fault handling doesn't consider the case
where the gfn_to_mfn() translation caused the page to be transparently
populated.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agosysctl: Return max_node_id rather than nr_nodes from physinfo command.
Keir Fraser [Wed, 6 Jan 2010 10:13:55 +0000 (10:13 +0000)]
sysctl: Return max_node_id rather than nr_nodes from physinfo command.

Python extension continues to synthesise a nr_nodes value.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agox86: XEN_DOMCTL_MEM_SHARING_OP_CONTROL should not depend on HAP.
Keir Fraser [Wed, 6 Jan 2010 09:39:01 +0000 (09:39 +0000)]
x86: XEN_DOMCTL_MEM_SHARING_OP_CONTROL should not depend on HAP.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agolibxl: apply CPUID policy for all types of VMs in all situations
Keir Fraser [Wed, 6 Jan 2010 08:25:07 +0000 (08:25 +0000)]
libxl: apply CPUID policy for all types of VMs in all situations

Apply CPUID policy to all types of VMs in all situations. Otherwise PV
VMs get no cpuid flags. It would be interesting if someone tested
libxl on PV before pushing dozens of patches.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
15 years ago[IA64] Fix ia64 build
Keir Fraser [Wed, 6 Jan 2010 08:20:11 +0000 (08:20 +0000)]
[IA64] Fix ia64 build

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agotmem: Only enable by default for x86_64
Keir Fraser [Wed, 6 Jan 2010 08:18:04 +0000 (08:18 +0000)]
tmem: Only enable by default for x86_64

While tmem has gotten limited testing with a 32-bit Xen, it
has severe limitations due to 32-bit heap restrictions.
So, turn it off by default for 32-bit so nobody accidentally
runs into this.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

15 years agoxend: passthrough: also do_FLR when a device is assigned.
Keir Fraser [Wed, 6 Jan 2010 08:17:20 +0000 (08:17 +0000)]
xend: passthrough: also do_FLR when a device is assigned.

To work around a race condition in guest hotplug, c/s
18338:7c10be016e4 disabled do_FLR when we create a guest or 'xm
pci-attach' a device into a guest, so now we actually only do_FLR when
a guest is destroyed or on 'xm pci-detach'.

By moving the FLR-related checking/do_FLR logic a little earlier, this
patch re-enables do_FLR in these 2 cases disabled by 18338.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
15 years agolibxenlight: install libxl.h
Keir Fraser [Tue, 5 Jan 2010 08:40:18 +0000 (08:40 +0000)]
libxenlight: install libxl.h

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxenlight: remove any uuid dependencies from xl
Keir Fraser [Tue, 5 Jan 2010 08:39:40 +0000 (08:39 +0000)]
libxenlight: remove any uuid dependencies from xl

The uuid handling in create, and now in create_device_model, requires
the client to fill in the uuid field. The uuid field happens to be
exactly the same size as a standard uuid (sixteen 8-bit values).

A stubdom needs to have a uuid when created, so it uses the one in
create_device_model.

This permits the client library to generate the uuid in any way it sees
fit (even if it is not compliant with any standard), and simplifies
installation of the libxenlight header.

xl is converted from a libuuid-generated uuid to one generated through
the random() C call. This needs to be fixed if anyone plans to use xl
for anything serious apart from developing libxl.
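
As a rough illustration of a client filling the sixteen-byte field from a plain random source, consider the sketch below. It is written in Python rather than C, and random_uuid_bytes and format_uuid are hypothetical names that are not part of xl or libxl.

```python
import random


def random_uuid_bytes(rng=random):
    """Return sixteen 8-bit values, the size of a standard uuid.

    No version or variant bits are set, so the result is not compliant
    with any uuid standard, which matches the caveat above.
    """
    return bytes(rng.getrandbits(8) for _ in range(16))


def format_uuid(raw):
    """Render 16 raw bytes in the usual 8-4-4-4-12 hex grouping."""
    if len(raw) != 16:
        raise ValueError("a uuid field holds exactly 16 bytes")
    h = raw.hex()
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))


print(format_uuid(random_uuid_bytes()))
```

A general-purpose pseudo-random generator gives no uniqueness guarantees, which is why the message flags this as something to fix before serious use.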

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agonuma: Correct handling node with CPU populated but no memory populated
Keir Fraser [Tue, 5 Jan 2010 08:38:23 +0000 (08:38 +0000)]
numa: Correct handling node with CPU populated but no memory populated

In changeset 20599, a node that has no memory populated is marked
parsed, but not online. However, if there are CPUs populated in this
node, the corresponding CPU mapping (i.e. cpu_to_node) is still
set up to point to the offline node, which will cause trouble for
memory allocation.

This patch changes init_cpu_to_node() and srat_detect_node() to
take into account the case where the node is offline.

Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.

One thing left is to update the cpu_to_node mapping after memory is
populated by memory hot-add.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
This is a reintroduction of 20726:ddb8c5e798f9, which I incorrectly
reverted in 20745:d3215a968db9

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxend: Pass -vcpu-avail option to QEMU now it is supported.
Keir Fraser [Tue, 5 Jan 2010 08:36:54 +0000 (08:36 +0000)]
xend: Pass -vcpu-avail option to QEMU now it is supported.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoUpdate QEMU_TAG to 2621a102cd74cd6691bed30f638581639fcb141d
Keir Fraser [Tue, 5 Jan 2010 08:35:31 +0000 (08:35 +0000)]
Update QEMU_TAG to 2621a102cd74cd6691bed30f638581639fcb141d

15 years agoRevert incorrect comment change introduced by 20720:ddb3646ad681
Keir Fraser [Tue, 5 Jan 2010 08:34:55 +0000 (08:34 +0000)]
Revert incorrect comment change introduced by 20720:ddb3646ad681

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agodomctl: Fix command-number clashes and place all #defines together to
Keir Fraser [Mon, 4 Jan 2010 10:35:16 +0000 (10:35 +0000)]
domctl: Fix command-number clashes and place all #defines together to
avoid the problem in future.

From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoVT-d: fix iommu_domid for PCI/PCIx devices assignment
Keir Fraser [Mon, 4 Jan 2010 09:07:28 +0000 (09:07 +0000)]
VT-d: fix iommu_domid for PCI/PCIx devices assignment

Currently, iommu_domid and domid_map are cleared at the end of
domain_context_unmap_one() if no other devices under the same iommu are
owned by this domain. But when a PCI/PCIx device is assigned to a guest,
its upstream bridge is also assigned to the guest, and they use the
same iommu_domid. On deassignment, iommu_domid and domid_map
are cleared in domain_context_unmap_one() for the assigned PCI/PCIx
device, so the following domain_context_unmap_one() for its upstream
bridge cannot get a valid iommu_domid. This causes PCI/PCIx
device re-assignment to fail.

This patch moves the iommu_domid and domid_map clearing code to the
end of domain_context_unmap(), where all dependent
domain_context_unmap_one() calls have completed, thus fixing the above
issue.

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agoVT-d: fix iommu_domain_destroy
Keir Fraser [Mon, 4 Jan 2010 09:06:36 +0000 (09:06 +0000)]
VT-d: fix iommu_domain_destroy

Currently, the g2m_ioport list and mapped_rmrrs are never released
in iommu_domain_destroy, because the function returns before that
code is reached. This is a potential leak. This patch releases them,
and thus avoids the potential leak.

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agoVT-d: clean up dynamic page mapping
Keir Fraser [Mon, 4 Jan 2010 09:06:02 +0000 (09:06 +0000)]
VT-d: clean up dynamic page mapping

Before the dynamic VT-d page table for hvm guests (changeset 20152),
need_iommu was only used for PV guests, and pages were mapped into VT-d
for PV guests in get_page_type and grant table. Now that need_iommu is
used for both hvm and pv guests, this patch keeps that code PV-only,
because pages need not be mapped there for hvm domains.

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agoxend: Allow disable QEMU monitor by setting option to 0 in config file.
Keir Fraser [Mon, 4 Jan 2010 09:04:53 +0000 (09:04 +0000)]
xend: Allow disable QEMU monitor by setting option to 0 in config file.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
15 years agoRevert 20726:ddb8c5e798f9
Keir Fraser [Mon, 4 Jan 2010 09:03:42 +0000 (09:03 +0000)]
Revert 20726:ddb8c5e798f9

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agox86: In mmuext_op(), MMUEXT_[UN]PIN_* must respect 'foreigndom'...
Keir Fraser [Wed, 30 Dec 2009 13:10:03 +0000 (13:10 +0000)]
x86: In mmuext_op(), MMUEXT_[UN]PIN_* must respect 'foreigndom'...

... and *only* those subcommands respect 'foreigndom', according to
documentation in public header xen.h.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agohvmloader: mp table fix
Keir Fraser [Wed, 30 Dec 2009 12:49:10 +0000 (12:49 +0000)]
hvmloader: mp table fix

The bug causes noapic PAE Windows 2k3 boot failure.

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
15 years agolibxl: add a versioning number to ctx_init that permit to detect
Keir Fraser [Wed, 30 Dec 2009 12:47:09 +0000 (12:47 +0000)]
libxl: add a versioning number to ctx_init that permit to detect
incompatible client.

At the moment, if the version of the library is not exactly the same
as the one used by the client, ctx_init returns ERROR_VERSION. However,
the same mechanism can be used in the future to support
older versions and offer a compatibility layer.
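
The exact-match check described above can be sketched as follows. This is a Python illustration, not libxl code: the numeric values of LIBRARY_VERSION and ERROR_VERSION are assumptions, and the ctx_init signature here is hypothetical.

```python
LIBRARY_VERSION = 1   # hypothetical value compiled into the library
ERROR_VERSION = -1    # name from the message above; numeric value assumed


def ctx_init(client_version):
    """Refuse to initialise when the client was built against another version."""
    if client_version != LIBRARY_VERSION:
        return ERROR_VERSION
    # A future compatibility layer could accept a range of older
    # versions here instead of requiring an exact match.
    return 0


print(ctx_init(LIBRARY_VERSION), ctx_init(LIBRARY_VERSION + 1))
```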

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: define errors as an enum instead of define random values.
Keir Fraser [Wed, 30 Dec 2009 12:46:16 +0000 (12:46 +0000)]
libxl: define errors as an enum instead of define random values.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: add a get_shutdown_reason
Keir Fraser [Wed, 30 Dec 2009 12:45:41 +0000 (12:45 +0000)]
libxl: add a get_shutdown_reason

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: remove API for dominfolist and list that returns xc_dominfo.
Keir Fraser [Wed, 30 Dec 2009 12:45:13 +0000 (12:45 +0000)]
libxl: remove API for dominfolist and list that returns xc_dominfo.

Fix up xl and the parts of libxl that use those APIs to use a simpler,
faster and less wasteful API (one that doesn't need to get the info
about all domains when looking for one specific domain).

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: add useful xc flags in the xl_dominfo structure
Keir Fraser [Wed, 30 Dec 2009 12:44:37 +0000 (12:44 +0000)]
libxl: add useful xc flags in the xl_dominfo structure

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: remove waitpid wrapper that doesn't do anything
Keir Fraser [Wed, 30 Dec 2009 12:43:57 +0000 (12:43 +0000)]
libxl: remove waitpid wrapper that doesn't do anything

if the waitpid callback isn't defined just call normal waitpid

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: hide internal logging from client
Keir Fraser [Wed, 30 Dec 2009 12:43:19 +0000 (12:43 +0000)]
libxl: hide internal logging from client

Reimplement simple logging in xl. The XL_LOG facilities are a
means for the library to communicate back to the client, not
a logging library that may be redundant with what the client
uses.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: remove structure init from the library and structure domid
Keir Fraser [Wed, 30 Dec 2009 12:42:41 +0000 (12:42 +0000)]
libxl: remove structure init from the library and structure domid
fixup completely

Structure init is more appropriately done in the client of the library.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: remove useless smac in the nic_info structure
Keir Fraser [Wed, 30 Dec 2009 12:42:01 +0000 (12:42 +0000)]
libxl: remove useless smac in the nic_info structure

the string representing the mac is easily recomputed from the mac
array
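
For illustration, recomputing the string form from the six-byte array takes only a few lines. The sketch below is Python rather than C, and mac_to_string is a hypothetical helper name, not a libxl function.

```python
def mac_to_string(mac):
    """Format a 6-byte MAC array as colon-separated lowercase hex."""
    if len(mac) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    return ":".join(f"{byte:02x}" for byte in mac)


print(mac_to_string([0x00, 0x16, 0x3E, 0x01, 0x02, 0xFF]))
```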

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: coding styles cleanup
Keir Fraser [Wed, 30 Dec 2009 12:41:22 +0000 (12:41 +0000)]
libxl: coding styles cleanup

Simplify some lines, and keep the xl style consistent with itself.
Use libxl_sprintf instead of snprintf/sprintf.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agolibxl: change semantics of ctx_free and remove ctx_close
Keir Fraser [Wed, 30 Dec 2009 12:40:44 +0000 (12:40 +0000)]
libxl: change semantics of ctx_free and remove ctx_close

ctx_close isn't used anywhere, and free reallocates the GC array, which
is quite surprising and leads to a memory leak in xl.c.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
15 years agox86: Initialise percpu areas as early as possible during bootstrap.
Keir Fraser [Tue, 29 Dec 2009 15:11:47 +0000 (15:11 +0000)]
x86: Initialise percpu areas as early as possible during bootstrap.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoXendAPI: After VBD_destroy and VIF_destroy the managed config must be saved
Keir Fraser [Tue, 29 Dec 2009 15:04:17 +0000 (15:04 +0000)]
XendAPI: After VBD_destroy and VIF_destroy the managed config must be saved

...otherwise already deleted devices appear again in configuration
after a xend restart.

Signed-off-by: Lutz Dube <Lutz.Dube@ts.fujitsu.com>
15 years agomemshr: Must be built on ia64 as well as x86, as blktap depends on it.
Keir Fraser [Mon, 28 Dec 2009 10:55:50 +0000 (10:55 +0000)]
memshr: Must be built on ia64 as well as x86, as blktap depends on it.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agox86, passthrough: Enhance guest's interrupt affinity setting.
Keir Fraser [Mon, 28 Dec 2009 09:39:23 +0000 (09:39 +0000)]
x86, passthrough: Enhance guest's interrupt affinity setting.

When the guest uses logical flat destination mode for interrupt delivery,
the vector doesn't change but the destination can still change, so the
check condition should be enhanced.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
15 years agohvmloader: Fix Windows XP standby with cirrus VGA
Keir Fraser [Mon, 28 Dec 2009 09:38:34 +0000 (09:38 +0000)]
hvmloader: Fix Windows XP standby with cirrus VGA

Fix it by telling OSPM not to power down the vga card on entering S3
state. The trick works for XP and Windows 2003, but Vista still refuses
to allow S3.

It is picked from kvm-userspace.git commit 60e85d, author "Gleb
Natapov".

Signed-off-by: Yu Ke <ke.yu@intel.com>
15 years agonuma: Correct handling node with CPU populated but no memory populated
Keir Fraser [Mon, 28 Dec 2009 09:36:51 +0000 (09:36 +0000)]
numa: Correct handling node with CPU populated but no memory populated

In changeset 20599, a node that has no memory populated is marked
parsed, but not online. However, if there are CPUs populated in this
node, the corresponding CPU mapping (i.e. cpu_to_node) is still
set up to point to the offline node, which will cause trouble for
memory allocation.

This patch changes init_cpu_to_node() and srat_detect_node() to
take into account the case where the node is offline.

Now the apicid_to_node is only used to keep the mapping between
cpu/node provided by BIOS, and should not be used for memory
allocation anymore.

One thing left is to update the cpu_to_node mapping after memory is
populated by memory hot-add.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agogrant_table: Build fixes for IA64.
Keir Fraser [Mon, 28 Dec 2009 09:32:39 +0000 (09:32 +0000)]
grant_table: Build fixes for IA64.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agomemshr: Build fixes
Keir Fraser [Mon, 28 Dec 2009 09:14:16 +0000 (09:14 +0000)]
memshr: Build fixes

 * Build memshr/xenpaging on x86/Linux only
 * Remove dependency on GCC 4.1+ __sync_*() intrinsics.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
15 years agox86: Fix mfn/page handling in do_mmuext_op().
Keir Fraser [Thu, 24 Dec 2009 15:59:44 +0000 (15:59 +0000)]
x86: Fix mfn/page handling in do_mmuext_op().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoevtchn: Do not free d->poll_mask until domain is being deallocated.
Keir Fraser [Thu, 24 Dec 2009 12:14:09 +0000 (12:14 +0000)]
evtchn: Do not free d->poll_mask until domain is being deallocated.

Avoids crash on dereference of poll_mask after domain_kill().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agohvmloader: Only mark LTP1 present in ACPI DSDT if parport really is there.
Keir Fraser [Thu, 24 Dec 2009 09:10:25 +0000 (09:10 +0000)]
hvmloader: Only mark LTP1 present in ACPI DSDT if parport really is there.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
15 years agox86/mm: early put_page when XENMEM_add_to_physmap(XENMAPSPACE_gmfn)
Keir Fraser [Thu, 24 Dec 2009 09:06:12 +0000 (09:06 +0000)]
x86/mm: early put_page when XENMEM_add_to_physmap(XENMAPSPACE_gmfn)

When using a stub domain, xen massively complains as follows:

(XEN) sh error: sh_remove_all_mappings(): can't find all mappings of mfn be3c5: c=8000000000000004 t=00000000
(XEN) sh error: sh_remove_all_mappings(): can't find all mappings of mfn be3c4: c=8000000000000004 t=00000000
...

This comes from the XENMEM_add_to_physmap hypercall from hvmloader.

The guest_physmap_remove_page function calls sh_remove_all_mappings(),
which checks the reference count of the page. Calling
guest_physmap_remove_page while temporarily holding a get_page
reference is therefore obviously wrong. An early put_page is harmless
here since domain_lock is acquired.

Also, the restore program seems not to have complained about extra
mappings for a long time; the stub domain does instead. Thus the
comment in sh_remove_all_mappings() is rewritten.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
15 years agoxend: Fix 'monitor' domain config parameter.
Keir Fraser [Thu, 24 Dec 2009 08:59:47 +0000 (08:59 +0000)]
xend: Fix 'monitor' domain config parameter.

Introduce new 'monitor_path' parameter, so that 'monitor' can revert
to its old type and meaning.

Fixes domain reboot and save/restore.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxen-detect: Add command-line arguments.
Keir Fraser [Wed, 23 Dec 2009 08:22:13 +0000 (08:22 +0000)]
xen-detect: Add command-line arguments.

 - Usage info
 - Quiesce normal output
 - Affect exit status if running in unexpected context

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoxend: Extra qemu options: parallel,serial,monitor
Keir Fraser [Wed, 23 Dec 2009 07:36:33 +0000 (07:36 +0000)]
xend: Extra qemu options: parallel,serial,monitor

Allows par/ser ports to be configured with a path to backing device.
Allows qemu monitor to be disabled or redirected.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
15 years agopygrub, reiserfs: Fix on-disk structure definition.
Keir Fraser [Wed, 23 Dec 2009 07:27:21 +0000 (07:27 +0000)]
pygrub, reiserfs: Fix on-disk structure definition.

Without this patch pyGRUB could not read ReiserFS.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
15 years agoRemove videoram option from vfb config.
Keir Fraser [Wed, 23 Dec 2009 07:26:31 +0000 (07:26 +0000)]
Remove videoram option from vfb config.
This option is only valid in main config.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
15 years agoReplace process_pending_timers() with process_pending_softirqs().
Keir Fraser [Tue, 22 Dec 2009 18:35:34 +0000 (18:35 +0000)]
Replace process_pending_timers() with process_pending_softirqs().

This ensures that any critical softirqs are handled in a timely manner
(e.g., TIME_CALIBRATE_SOFTIRQ) while still avoiding being preempted by
the scheduler (by SCHEDULE_SOFTIRQ), which is the reason for avoiding
use of do_softirq() directly.
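
The idea of servicing every pending softirq except the scheduler one can be sketched as below. This is a simplified Python model, not the hypervisor code: the bit numbers and the handler table are assumptions, and TIMER_SOFTIRQ is included only as an example of another softirq.

```python
# Bit numbers are hypothetical; only the names SCHEDULE_SOFTIRQ and
# TIME_CALIBRATE_SOFTIRQ come from the message above.
TIMER_SOFTIRQ = 0
SCHEDULE_SOFTIRQ = 1
TIME_CALIBRATE_SOFTIRQ = 2


def process_pending_softirqs(pending, handlers):
    """Run handlers for all pending softirqs except SCHEDULE_SOFTIRQ.

    Returns the bits left pending, i.e. the scheduler bit if it was set,
    so the caller is never preempted from inside this function.
    """
    ignore = 1 << SCHEDULE_SOFTIRQ
    todo = pending & ~ignore
    nr = 0
    while todo:
        if todo & 1:
            handlers[nr]()
        todo >>= 1
        nr += 1
    return pending & ignore


calls = []
handlers = {
    TIMER_SOFTIRQ: lambda: calls.append("timer"),
    SCHEDULE_SOFTIRQ: lambda: calls.append("schedule"),
    TIME_CALIBRATE_SOFTIRQ: lambda: calls.append("calibrate"),
}
left = process_pending_softirqs(0b111, handlers)
print(calls, bin(left))
```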

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agopaging: Updates to public grant table header file.
Keir Fraser [Tue, 22 Dec 2009 18:18:07 +0000 (18:18 +0000)]
paging: Updates to public grant table header file.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoVT-d: improve RMRR region handling
Keir Fraser [Tue, 22 Dec 2009 13:39:12 +0000 (13:39 +0000)]
VT-d: improve RMRR region handling

This patch improves RMRR region handling as follows:

1) Get rid of duplicated RMRR mappings: different devices may share the
same RMRR regions. When they are assigned to the same guest, each RMRR
region only needs to be mapped once, because RMRR regions must be
identity mapped. Add an array of mapped RMRRs to achieve this.

2) There is no need to call domain_context_mapping to map the device
again in iommu_prepare_rmrr_dev, so rename iommu_prepare_rmrr_dev to
rmrr_identity_mapping, which is a more suitable name.

3) A device may have more than one RMRR region; remove the "break" in
intel_iommu_add_device to let it map all RMRR regions of the device.
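
Points 1) and 3) can be sketched together as follows. This is a Python model under stated assumptions, not the VT-d code: regions are hypothetical half-open (base, end) tuples, map_page stands in for whatever installs a mapping, and add_device is a stand-in name for the device-add path.

```python
PAGE_SIZE = 0x1000


def rmrr_identity_mapping(mapped_rmrrs, rmrr, map_page):
    """Identity-map one region for a domain unless it is already mapped.

    mapped_rmrrs is the per-domain list of regions mapped so far.
    Returns True if new mappings were installed.
    """
    if rmrr in mapped_rmrrs:
        return False  # shared with a previously assigned device
    base, end = rmrr
    for addr in range(base, end, PAGE_SIZE):
        map_page(addr, addr)  # guest frame == machine frame
    mapped_rmrrs.append(rmrr)
    return True


def add_device(mapped_rmrrs, device_rmrrs, map_page):
    """Map every region of the device; there is no early break."""
    for rmrr in device_rmrrs:
        rmrr_identity_mapping(mapped_rmrrs, rmrr, map_page)


pages, mapped = [], []
record = lambda gfn, mfn: pages.append((gfn, mfn))
add_device(mapped, [(0x1000, 0x3000), (0x8000, 0x9000)], record)
add_device(mapped, [(0x1000, 0x3000)], record)  # shared region: no remap
print(len(pages), mapped)
```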

Signed-off-by: Weidong Han <Weidong.han@intel.com>
15 years agodomctl/sysctl: Clean up definitions
Keir Fraser [Tue, 22 Dec 2009 11:33:15 +0000 (11:33 +0000)]
domctl/sysctl: Clean up definitions
 - Use fixed-width types only
 - Use named unions only
 - Bump domctl version number

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoRevert 20709:085627544270
Keir Fraser [Mon, 21 Dec 2009 16:51:40 +0000 (16:51 +0000)]
Revert 20709:085627544270

15 years agoxend: Enable vHPET in HVM guests by default.
Keir Fraser [Mon, 21 Dec 2009 10:50:28 +0000 (10:50 +0000)]
xend: Enable vHPET in HVM guests by default.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoCheck m2p/compat m2p table for new added memory.
Keir Fraser [Mon, 21 Dec 2009 10:48:01 +0000 (10:48 +0000)]
Check m2p/compat m2p table for new added memory.

As we allocate the m2p/compat m2p/frametable page tables from newly added
memory, we want to make sure the new range can hold the new page
tables. This is because the m2p/frametable need to be aligned and cover
more than the newly added range.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoFix bugs in frame table setup function when memory hot-add.
Keir Fraser [Mon, 21 Dec 2009 10:47:34 +0000 (10:47 +0000)]
Fix bugs in frame table setup function when memory hot-add.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoClean up memory hotplug functions.
Keir Fraser [Mon, 21 Dec 2009 10:47:21 +0000 (10:47 +0000)]
Clean up memory hotplug functions.

Move the range checking to mem_hotadd_check.
Add more error handling, to restore the node information, unmap iommu
page tables, destroy xen mapping when error happens.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agoVerify TSC sync even on systems with constant and non-stop TSC.
Keir Fraser [Mon, 21 Dec 2009 10:41:26 +0000 (10:41 +0000)]
Verify TSC sync even on systems with constant and non-stop TSC.
We now reserve X86_FEATURE_TSC_RELIABLE for those systems
that have been verified.

For the record... Jeremy was right!  (there, I said it ;-)

See linux patch described here:
http://patchwork.kernel.org/patch/68397/

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
15 years agoxenpaging: Add checks for p2m_is_valid() after calls to gfn_to_mfn()
Keir Fraser [Mon, 21 Dec 2009 10:40:51 +0000 (10:40 +0000)]
xenpaging: Add checks for p2m_is_valid() after calls to gfn_to_mfn()
that replace calls to gmfn_to_mfn(), which does the check internally.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoxenstore: Fix memory leak in command 'xenstore rm'
Keir Fraser [Mon, 21 Dec 2009 10:39:48 +0000 (10:39 +0000)]
xenstore: Fix memory leak in command 'xenstore rm'

When option '-t' is used to do a tidy remove, xs_directory() is called
to check whether there are sibling directories.
The returned pointer should be passed to free() after this check.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agoxenstore: Fix the method of get options and the usage
Keir Fraser [Fri, 18 Dec 2009 07:53:27 +0000 (07:53 +0000)]
xenstore: Fix the method of get options and the usage

Add long option '--flat' corresponding to short option '-f',
and allow it only for subcommand 'ls' (because it is in fact
useless for subcommands 'read' and 'list').
Also fix the usage text of subcommands 'ls', 'list' and 'chmod'.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agox86_32: Build fix in xenpaging tool.
Keir Fraser [Fri, 18 Dec 2009 07:52:03 +0000 (07:52 +0000)]
x86_32: Build fix in xenpaging tool.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agonetbsd: Build fix (do not build memshr).
Keir Fraser [Fri, 18 Dec 2009 07:51:43 +0000 (07:51 +0000)]
netbsd: Build fix (do not build memshr).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoMake Citrix copyright strings consistent.
Keir Fraser [Fri, 18 Dec 2009 07:42:09 +0000 (07:42 +0000)]
Make Citrix copyright strings consistent.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agoguest_walk.c: Remove commented out p2m paging type check code
Keir Fraser [Fri, 18 Dec 2009 07:33:52 +0000 (07:33 +0000)]
guest_walk.c: Remove commented out p2m paging type check code

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agomemshr: Include unistd.h for sleep().
Keir Fraser [Fri, 18 Dec 2009 07:33:09 +0000 (07:33 +0000)]
memshr: Include unistd.h for sleep().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
15 years agomini-os: Fix build error when !HAVE_LIBC
Keir Fraser [Fri, 18 Dec 2009 07:31:02 +0000 (07:31 +0000)]
mini-os: Fix build error when !HAVE_LIBC

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
15 years agox86_32: Build fixes after page-sharing patches.
Keir Fraser [Thu, 17 Dec 2009 16:09:19 +0000 (16:09 +0000)]
x86_32: Build fixes after page-sharing patches.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
15 years agoMaintains/cleans-up the sharing map. At the moment a simple FIFO policy is
Keir Fraser [Thu, 17 Dec 2009 06:27:57 +0000 (06:27 +0000)]
Maintains/cleans-up the sharing map. At the moment a simple FIFO policy is
applied.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoReads from read only parent disk images are intercepted, and are used to detect
Keir Fraser [Thu, 17 Dec 2009 06:27:57 +0000 (06:27 +0000)]
Reads from read only parent disk images are intercepted, and are used to detect
potentially sharable memory pages.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoMultiple tapdisk2 processes may use the same parent disk images (later used to
Keir Fraser [Thu, 17 Dec 2009 06:27:57 +0000 (06:27 +0000)]
Multiple tapdisk2 processes may use the same parent disk images (later used to
detect sharable memory pages). This patch establishes unique id for each disk
image opened by tapdisk2, and stores it in shared memory region, thus making it
available to the remaining tapdisk2s.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoAdds 'memory_sharing' option to domain config scripts. It passes domain id to
Keir Fraser [Thu, 17 Dec 2009 06:27:57 +0000 (06:27 +0000)]
Adds 'memory_sharing' option to domain config scripts. It passes domain id to
the tapdisk2 process if sharing is enabled (tapdisk2 is not normally aware what
domain it is working for).

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoGeneric bi-directional map, and related initialisation functions. At the moment
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Generic bi-directional map, and related initialisation functions. At the moment
a single map is used to store mappings between sharing handles and disk blocks.
This is used to share pages which store data read from the same blocks on
(virtual) disk.
Note that the map is stored in a shared memory region, as it needs to be
accessed by multiple tapdisk processes. This complicates memory allocation
(malloc cannot be used), prevents pointers from being stored directly (as the
shared memory region may be, and is, mapped at different base addresses), and
finally means pthread locks need to be multi-process aware.
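
The lookup behaviour of such a bi-directional map can be sketched as below. This Python model deliberately ignores the shared-memory constraints described above (custom allocation, offsets instead of pointers, process-shared locks); BidirMap and its method names are hypothetical.

```python
class BidirMap:
    """One-to-one map between sharing handles and disk blocks."""

    def __init__(self):
        self._block_of = {}   # handle -> block
        self._handle_of = {}  # block -> handle

    def insert(self, handle, block):
        # Drop stale pairs first so both directions stay one-to-one.
        self.remove_handle(handle)
        self.remove_block(block)
        self._block_of[handle] = block
        self._handle_of[block] = handle

    def remove_handle(self, handle):
        block = self._block_of.pop(handle, None)
        if block is not None:
            del self._handle_of[block]

    def remove_block(self, block):
        handle = self._handle_of.pop(block, None)
        if handle is not None:
            del self._block_of[handle]

    def block_for(self, handle):
        return self._block_of.get(handle)

    def handle_for(self, block):
        return self._handle_of.get(block)


m = BidirMap()
m.insert(1, ("disk0", 42))  # block keyed by (disk id, block number)
print(m.handle_for(("disk0", 42)), m.block_for(1))
```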

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoSupport for -EAGAIN from xc_gnttab_map_grant_ref.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Support for -EAGAIN from xc_gnttab_map_grant_ref.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoInterfaces to memshr domctls.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Interfaces to memshr domctls.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoRequest re-coalescing for qcow disks. qcow driver had the habit of breaking each
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Request re-coalescing for qcow disks. qcow driver had the habit of breaking each
(4K) block read into 8 (512-byte) sector reads. This is inefficient, and also
prevents the sharing detector from working, as it is based on page-size reads.
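
Re-coalescing can be pictured as merging runs of consecutive sector reads back into one request. The sketch below is a Python illustration with a hypothetical coalesce helper, not the qcow driver logic.

```python
SECTOR_SIZE = 512


def coalesce(sectors):
    """Merge consecutive sector numbers into (first_sector, count) requests."""
    requests = []
    for sector in sectors:
        if requests and requests[-1][0] + requests[-1][1] == sector:
            first, count = requests[-1]
            requests[-1] = (first, count + 1)  # extend the current run
        else:
            requests.append((sector, 1))       # start a new run
    return requests


# Eight 512-byte sector reads collapse into one 4K request.
merged = coalesce(range(8, 16))
print(merged, merged[0][1] * SECTOR_SIZE)
```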

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoAudit code for memory sharing.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Audit code for memory sharing.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoDomctls defined for all relevant memory sharing operations.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Domctls defined for all relevant memory sharing operations.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoHAP fault handling for shared pages.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
HAP fault handling for shared pages.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoForeign mappings need to verify if the underlying pages are sharable/shared. If
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Foreign mappings need to verify if the underlying pages are sharable/shared. If
so, only RO mappings are allowed to go ahead. If an RW mapping to
sharable/shared page is requested, the GFN will be unshared (if there are free
pages for private copies) or an error returned otherwise. Note that all tools
(libxc + backends) which map foreign mappings need to check for error return
values.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoThis patch establishes a new abstraction of sharing handles (encoded as a 64bit
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
This patch establishes a new abstraction of sharing handles (encoded as a 64bit
int), each corresponding to a single sharable page. Externally all sharing related
operations (e.g. nominate/share) will use sharing handles, thus solving a lot of
consistency problems (like: is this sharable page still the same sharable page
as before).
Internally, sharing handles can be translated to MFNs (using a newly created
hashtable), and then for each MFN a doubly linked list of GFNs translating to
this MFN is maintained. Finally, the sharing handle is stored in the page_info structure
for each sharable MFN.
All this allows pages to be shared and unshared efficiently. However, at the moment a
single lock is used to protect the sharing handle hash table. For scalability
reasons, the locking needs to be made more granular.
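
The two lookup structures described above can be modelled roughly as follows. This is a Python sketch with hypothetical names (SharingTable, nominate, share, gfns); plain dicts and lists stand in for the hash table and the doubly linked lists, and no locking is shown.

```python
import itertools


class SharingTable:
    """Handle -> MFN lookup plus, per MFN, the GFNs that translate to it."""

    def __init__(self):
        self._handles = itertools.count(1)
        self._mfn_of = {}   # sharing handle -> MFN
        self._gfns_of = {}  # MFN -> list of (domain id, GFN)

    def nominate(self, domid, gfn, mfn):
        """Make a page sharable and return a fresh handle for it."""
        handle = next(self._handles)
        self._mfn_of[handle] = mfn
        self._gfns_of[mfn] = [(domid, gfn)]
        return handle

    def share(self, source_handle, client_handle):
        """Point every GFN of the client page at the source page's MFN."""
        source_mfn = self._mfn_of[source_handle]
        client_mfn = self._mfn_of.pop(client_handle)
        self._gfns_of[source_mfn].extend(self._gfns_of.pop(client_mfn))
        return source_mfn

    def gfns(self, handle):
        """A stale handle raises KeyError instead of naming a wrong page."""
        return list(self._gfns_of[self._mfn_of[handle]])


table = SharingTable()
h1 = table.nominate(1, 0x10, 0xAAA)
h2 = table.nominate(2, 0x20, 0xBBB)
table.share(h1, h2)
print(table.gfns(h1))
```

Because callers only hold handles, a page that has changed identity since nomination is detected by a failed lookup rather than silently shared.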

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoThe internal Xen x86 emulator is fixed to handle shared/sharable pages correctly.
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
The internal Xen x86 emulator is fixed to handle shared/sharable pages correctly.
If pages cannot be unshared immediately (due to lack of free memory required to
create private copies) the VCPU under emulation is paused, and the emulator
returns X86EMUL_RETRY, which will get resolved after some memory is freed back
to Xen (possibly through host paging).

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoM2P translation cannot be handled through flat table with only one slot per MFN
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
M2P translation cannot be handled through flat table with only one slot per MFN
when an MFN is shared. However, all existing calls can either infer the GFN (for
example p2m table destructor) or will not need to know GFN for shared pages.
This patch identifies and fixes all the M2P accessors, either by removing the
translation altogether or by making the relevant modifications. Shared MFNs have
a special value of SHARED_M2P_ENTRY stored in their M2P table slot.
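
As an illustration of the sentinel approach, an accessor might refuse to
return a GFN for a shared MFN. SHARED_M2P_ENTRY is the name used above, but
its value here, the table size and m2p_lookup() are assumptions made for the
sketch.

```c
#include <stdbool.h>

#define NR_MFNS 8                         /* illustrative table size */
#define SHARED_M2P_ENTRY (~0UL - 1UL)     /* value is an assumption */

/* Flat M2P table: one slot per MFN. */
static unsigned long m2p[NR_MFNS];

/*
 * Look up the GFN for an MFN. Returns false when no single GFN exists,
 * i.e. the MFN is out of range or marked as shared.
 */
static bool m2p_lookup(unsigned long mfn, unsigned long *gfn)
{
    if (mfn >= NR_MFNS || m2p[mfn] == SHARED_M2P_ENTRY)
        return false;
    *gfn = m2p[mfn];
    return true;
}
```

Callers that previously read the table directly would either handle the
false return or obtain the GFN from their own context.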

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoSharable/shared pages need to be unshared in response to a write attempt. This
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
Sharable/shared pages need to be unshared in response to a write attempt. This
is handled through custom gfn_to_mfn translation functions called from the generic
host page table page fault handler. This should handle both SVM and VTX alike.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoThis patch defines a new P2M type used for sharable/shared pages. It also
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
This patch defines a new P2M type used for sharable/shared pages. It also
implements the basic functions to nominate GFNs for sharing, and to break
sharing (either by making the page 'private' or creating a private copy),
mem_sharing_nominate_page() and mem_sharing_unshare_page() respectively. Note
pages cannot be shared yet, because there is no efficient way to find all GFNs
mapping to the two MFNs scheduled for sharing.

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoThis patch defines a new PGT type called PGT_shared_page and a new synthetic
Keir Fraser [Thu, 17 Dec 2009 06:27:56 +0000 (06:27 +0000)]
This patch defines a new PGT type called PGT_shared_page and a new synthetic
domain called 'dom_cow'. In order to share a page, the type needs to be changed
to PGT_shared_page and the owner to dom_cow. Only pages with PGT_none, and no
type count are allowed to become sharable. Conversely, sharable pages can only be
made 'private' if type count equals one. page_make_sharable() and
page_make_private() handle these transitions.
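
The two transitions can be pictured as a tiny state machine. This is a toy
model only: the struct, the owner enum, PGT_other and the function signatures
are hypothetical, and setting the type count to one on sharing is an
assumption chosen to match the "equals one" rule above.

```c
#include <stdbool.h>

/* PGT_other stands in for every other page type (hypothetical). */
enum pg_type { PGT_none, PGT_other, PGT_shared_page };
enum pg_owner { OWNER_GUEST, OWNER_DOM_COW };

/* Toy stand-in for the relevant page_info fields. */
struct page {
    enum pg_type type;
    unsigned int type_count;
    enum pg_owner owner;
};

/* Only untyped pages with no type count may become sharable. */
static bool page_make_sharable(struct page *pg)
{
    if (pg->type != PGT_none || pg->type_count != 0)
        return false;
    pg->type = PGT_shared_page;
    pg->type_count = 1;           /* assumption: sharing holds one type ref */
    pg->owner = OWNER_DOM_COW;
    return true;
}

/* Only a shared page with exactly one type reference may go private. */
static bool page_make_private(struct page *pg)
{
    if (pg->type != PGT_shared_page || pg->type_count != 1)
        return false;
    pg->type = PGT_none;
    pg->type_count = 0;
    pg->owner = OWNER_GUEST;
    return true;
}
```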

Signed-off-by: Grzegorz Milos <Grzegorz.Milos@citrix.com>
15 years agoUser-land tool for memory paging.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
User-land tool for memory paging.

This tool will page out the specified number of pages from the specified
domain. When a paged out page is accessed, Xen will issue a request and
notify the tool over an event channel. The tool will process the request,
page the page in, and notify Xen.

The current (default) policy tracks the 1024 most recently paged in pages
and will not choose to evict any of those. This is done with the assumption
that if a page is accessed, it is likely to be accessed again soon.
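
One simple way to realise such a policy is a ring buffer of the last 1024
paged-in GFNs that the victim selection skips. The sketch below is not the
tool's code; mru_record(), mru_contains(), choose_victim() and INVALID_GFN
are hypothetical names, and the linear scan is chosen for brevity rather than
speed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MRU_SIZE    1024          /* window size taken from the text above */
#define INVALID_GFN (~0UL)        /* hypothetical "no page" marker */

static unsigned long mru[MRU_SIZE];
static size_t mru_next;           /* slot that the next record overwrites */

static void mru_init(void)
{
    for (size_t i = 0; i < MRU_SIZE; i++)
        mru[i] = INVALID_GFN;
    mru_next = 0;
}

/* Remember that gfn was just paged in, overwriting the oldest record. */
static void mru_record(unsigned long gfn)
{
    mru[mru_next] = gfn;
    mru_next = (mru_next + 1) % MRU_SIZE;
}

static bool mru_contains(unsigned long gfn)
{
    for (size_t i = 0; i < MRU_SIZE; i++)
        if (mru[i] == gfn)
            return true;
    return false;
}

/* Pick the lowest GFN below nr_gfns that was not recently paged in. */
static unsigned long choose_victim(unsigned long nr_gfns)
{
    for (unsigned long gfn = 0; gfn < nr_gfns; gfn++)
        if (!mru_contains(gfn))
            return gfn;
    return INVALID_GFN;
}
```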

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agolibxc interface support for memory paging domctls.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
libxc interface support for memory paging domctls.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agolibxc support of memory paging.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
libxc support of memory paging.

libxc accepts the new return code from privcmd mmap, which indicates a page
being mapped is actually paged out. Spin until the page is paged in and return
as normal to the caller. This allows memory paging to work transparently with
existing tools.

Since libxc runs in user-space, as does the pager, both processes will be
scheduled and run. This enables the page to be paged in without needing to
spin in kernel mode (which would cause a dead-lock).
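
The user-space retry can be sketched as a loop around a mapping attempt.
Everything here is hypothetical: map_with_retry(), the map_fn callback and
the fake_map() backend are illustrative stand-ins, the -ENOENT convention is
borrowed from the related MMU update change, and the retry bound is an
addition of this sketch (the text above describes spinning until the page is
in).

```c
#include <errno.h>

/* Hypothetical mapping attempt: 0 on success, -ENOENT if paged out. */
typedef int (*map_fn)(unsigned long gfn, void *ctx);

/* Retry while the page is reported as paged out, up to max_tries times. */
static int map_with_retry(map_fn try_map, unsigned long gfn, void *ctx,
                          unsigned int max_tries)
{
    int rc = -ENOENT;

    for (unsigned int i = 0; i < max_tries && rc == -ENOENT; i++)
        rc = try_map(gfn, ctx);
    return rc;
}

/* Toy backend: the page becomes available after *ctx more attempts. */
static int fake_map(unsigned long gfn, void *ctx)
{
    unsigned int *remaining = ctx;

    (void)gfn;
    if (*remaining > 0) {
        (*remaining)--;
        return -ENOENT;
    }
    return 0;
}
```

Because each attempt returns to user space, the pager process gets scheduled
between attempts, which is the point made in the paragraph above.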

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoMemory paging domctl support, which is a sub-operation of the generic memory
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Memory paging domctl support, which is a sub-operation of the generic memory
event domctl support.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoAdd memory paging support for MMU updates (mapping a domain's memory).
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Add memory paging support for MMU updates (mapping a domain's memory).

If Domain-0 tries to map a page that has been paged out, then propagate an
error so that it knows to try again. If the page is paged out, request that
it be paged back in. If the page is in the process of being paged in, then
just keep returning the error until it is paged back in.

This requires the co-operation of the Domain-0 kernel's privcmd mmap
functions. The kernel can't simply spin waiting for the page, as this will
cause a dead-lock (since the paging tool lives in Domain-0 user-space and if
it's spinning in kernel space, it will never return to user-space to allow the
page to be paged back in). There is a complementary Linux patch which sees
ENOENT, which is not returned by any other part of this code, and marks the
PFN of that page specially to indicate it was paged out (much like what it
does with PFNs that are within the range of a domain's memory but are not
presently mapped).
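
The hypervisor-side behaviour described in the first paragraph amounts to a
three-state check. The model below is only illustrative: the state names,
struct gpage and map_for_dom0() are hypothetical, and the request counter
stands in for notifying the pager.

```c
#include <errno.h>

/* Hypothetical page states for this sketch. */
enum page_state { PG_PRESENT, PG_PAGED_OUT, PG_PAGING_IN };

struct gpage {
    enum page_state state;
    unsigned int requests;        /* page-in requests sent to the pager */
};

/* Returns 0 if the page can be mapped now, -ENOENT if the caller must retry. */
static int map_for_dom0(struct gpage *p)
{
    switch (p->state) {
    case PG_PAGED_OUT:
        p->requests++;            /* ask for the page to be paged back in */
        p->state = PG_PAGING_IN;
        return -ENOENT;
    case PG_PAGING_IN:
        return -ENOENT;           /* already requested; just report the error */
    default:
        return 0;
    }
}
```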

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoSupport for Memory paging in grant table mappings.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Support for Memory paging in grant table mappings.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoMemory paging support for HVM guest emulation.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Memory paging support for HVM guest emulation.

A new HVMCOPY return value, HVMCOPY_gfn_paged_out is defined to indicate that
a gfn was paged out. This value and PFEC_page_paged, as appropriate, are
caught and passed up as X86EMUL_RETRY to the emulator. This will cause the
emulator to keep retrying the operation until it succeeds (once the page has
been paged in).
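
The translation of copy results into emulator return codes might look
roughly like this. HVMCOPY_gfn_paged_out, PFEC_page_paged and X86EMUL_RETRY
are the names used above; the other enumerators, all numeric values, the bit
position and copy_result_to_emul() are assumptions, and every non-paging
failure is collapsed into a single code for brevity.

```c
/* Enumerator values are illustrative, not Xen's. */
enum hvm_copy_result {
    HVMCOPY_okay,
    HVMCOPY_bad_gva_to_gfn,
    HVMCOPY_gfn_paged_out
};

enum x86emul_rc { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

#define PFEC_page_paged (1u << 5) /* bit position is an assumption */

/* Map a copy result (plus page fault error code) to an emulator code. */
static enum x86emul_rc copy_result_to_emul(enum hvm_copy_result rc,
                                           unsigned int pfec)
{
    if (rc == HVMCOPY_okay)
        return X86EMUL_OKAY;
    if (rc == HVMCOPY_gfn_paged_out)
        return X86EMUL_RETRY;     /* target gfn is paged out */
    if (rc == HVMCOPY_bad_gva_to_gfn && (pfec & PFEC_page_paged))
        return X86EMUL_RETRY;     /* walk hit a paged-out page */
    return X86EMUL_UNHANDLEABLE;  /* simplified: all other failures */
}
```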

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agohap_gva_to_gfn paging support. Return PFEC_page_paged when a paged
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
hap_gva_to_gfn paging support. Return PFEC_page_paged when a paged
out page is found. Ensure top-level page table page and l1 entry
are paged in. If an intermediary page table page is paged out,
propagate error to caller.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoBase paging support for HVM guests.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Base paging support for HVM guests.

This includes paging support for HVMOPs, HAP nested paging, and HVM map entry.
In all cases, the page is paged in automatically and an error returned,
indicating that the failed operation should be retried.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoPaging support for guest walk tables to page in l1-l3 page table pages.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
Paging support for guest walk tables to page in l1-l3 page table pages.

A new page flag has been added to indicate that a paged out page was found
while walking the page tables. The paging in code is automatically called,
so the flag is only an indicator that the operation should be retried, not
that the page should be paged in.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoEPT specific P2M support for new paging types.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
EPT specific P2M support for new paging types.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
15 years agoNew P2M types for memory paging and supporting functions.
Keir Fraser [Thu, 17 Dec 2009 06:27:55 +0000 (06:27 +0000)]
New P2M types for memory paging and supporting functions.
Several new types need to be added to represent the various different stages
a page can be in while being paged out/in. Xen will sometimes make different
decisions based on these types.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>